Developer tip - Best resources to learn Git

This post is more of a developer tip.

Charles Duan has written one of the best descriptions of the Git data model, explaining how Git commands manipulate that data model under the hood.

As he puts it:

You can only really use Git if you understand how Git works.

Now go read it here.

Thank you, Charles, for such a nice write-up!

If you liked this post, you can share it with your followers or follow me on Twitter!

NewRelic's default conf drags Play Scala app

Last week I learned about an important tweak to the NewRelic configuration that avoids performance issues with the Play Scala web framework.

While benchmarking our Play Scala app, I was surprised by the poor performance numbers, given that the Play framework claims to be one of the highest-performing web frameworks around. The app could not serve more than 200 to 230 requests per second for a simple DB lookup. Some serious digging revealed that NewRelic's default configuration for the Play framework adds a serious drag on the app's performance.

I contacted folks at NewRelic and they were quick to respond. Here is what I learned from them about the instrumentation:

In order to properly track async activity, we instrument Scala promises and futures. Unfortunately, naming the tracers after these classes is not helpful at all. In an effort to improve trace segment naming, I take a Thread.stacktrace and navigate up the stack to find a useful caller and name based on that. The call to Thread.stacktrace can be relatively expensive and result in significant overhead. The setting we suggested disables this functionality and has reduced overhead significantly for several other customers.
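To see why that stack walk hurts, here is a minimal Scala sketch of what stack-based naming implies. It is not NewRelic's implementation: StackNamingSketch, nameFromStack, the skip-prefix heuristic and the timing loop are all hypothetical. The point is that every traced Future or Promise call would pay for Thread.currentThread().getStackTrace(), which materializes the entire stack, while a fixed segment name costs next to nothing.

```scala
object StackNamingSketch {

  // Hypothetical naming heuristic (not NewRelic's code): skip frames whose
  // class name starts with one of skipPrefixes and name the trace segment
  // after the first remaining frame. In this toy the name it returns is beside
  // the point; the cost of capturing the stack is what matters.
  def nameFromStack(skipPrefixes: Seq[String]): String =
    Thread.currentThread().getStackTrace() // materializes the whole stack: the costly part
      .find(frame => !skipPrefixes.exists(p => frame.getClassName.startsWith(p)))
      .map(frame => s"${frame.getClassName}.${frame.getMethodName}")
      .getOrElse("unknown")

  // Rough timing helper: evaluates body n times and returns elapsed milliseconds.
  def timeMillis(n: Int)(body: => Unit): Long = {
    val start = System.nanoTime()
    var i = 0
    while (i < n) { body; i += 1 }
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    val skip = Seq("java.", "scala.", "jdk.", "sun.")
    val n = 100000
    var sink = 0 // consumes results so the JIT cannot drop the work entirely
    val withStack = timeMillis(n) { sink += nameFromStack(skip).length }
    val fixedName = timeMillis(n) { sink += "scala.concurrent.Future".length }
    println(s"stack-based naming: $withStack ms, fixed name: $fixedName ms (sink=$sink)")
  }
}
```

Timings will vary with the JVM, JIT warm-up and stack depth (a real Play request has a much deeper stack than this toy), so treat the output as a rough illustration rather than a benchmark.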

So, all you need to do is turn off stack_based_naming in your newrelic.yml. Remember to put this setting under the transaction_tracer section.

transaction_tracer:
      stack_based_naming: false

I compared both setups by running Apache Benchmark (ab) with 10K requests at a concurrency of 10, with and without the configuration tweak.

The difference in throughput is striking: about 1300 requests per second with the tweak versus 220 requests per second without it, roughly a 6x improvement.

In my opinion, NewRelic should turn off stack_based_naming by default.

Hope you find this tip helpful. If I had known about it, it would have saved me a couple of days :).


Scala Concurrent Downloader example using Future

I spent this weekend learning more about Scala Futures, and believe me, the posts below are the best-written introductions to Scala futures and promises.

In order to learn it for real, I started with an exercise: building a concurrent URL downloader. The basic idea is, given n URLs, how do you download all n of them concurrently?

import play.api.libs.ws._ // We are using the WebService client from the Play framework
import scala.concurrent._
import java.util.concurrent.Executors
import scala.util.{Success, Failure, Try}

object ConcurrentDownloader {

  def main(args: Array[String]): Unit = {

    // Futures need an execution context to run their callbacks.
    // WS requests are non-blocking, so a single thread is enough here.
    val executorService = Executors.newFixedThreadPool(1)
    implicit val executionContext: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(executorService)

    // The set of URLs to be downloaded
    val urls = List("http://www.google.com",
                    "http://yahoo.com",
                    "http://bing.com",
                    "http://jskdlsycds.com", // invalid URL 1
                    "http://amazon.com",
                    "http://hackerne.ws",
                    "http://firstpost.com",
                    "http://rediff.com",
                    "http://wowslskdleodd.com") // invalid URL 2

    // Create one future per URL; the requests start running in the background
    val futures = urls.map { url => WS.url(url).get() }

    // A nice little trick to convert a Future[T] into a Future[Try[T]] that
    // always succeeds, so one failed download cannot fail the whole batch
    def futureToFutureTry[T](f: Future[T]): Future[Try[T]] =
      f.map(Success(_)).recover { case x => Failure(x) }

    val futureListOfTrys = futures.map(futureToFutureTry(_))

    // Combine all those futures into a single future of a list of results
    val fseq = Future.sequence(futureListOfTrys)

    fseq onComplete {
      case Success(l) =>
        var sCount = 0
        var fCount = 0
        l.foreach {
          case Success(resp) =>
            sCount += 1
            println(s"status....=${resp.status}")
          case Failure(t) =>
            fCount += 1
            println(s"failure $t")
        }
        println(s"success=$sCount, failures=$fCount")
        executorService.shutdown() // release the pool thread so the JVM can exit
      case Failure(ex) =>
        println(s"failure $ex")
        executorService.shutdown()
    }
  }
}

There you go, you have a nice little concurrent URL downloader. Writing a concurrent program without managing threads, locks or shared data structures by hand feels so refreshing!
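If you want to play with the Future[T] to Future[Try[T]] trick without Play or network access, here is a minimal self-contained sketch. FutureTrySketch and fakeFetch are hypothetical: fakeFetch stands in for WS.url(url).get() and simply fails for any URL containing "invalid". Because every lifted future succeeds, Future.sequence completes with one result per URL instead of failing as a whole because of a single bad URL.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object FutureTrySketch {
  // The global pool uses daemon threads, so the JVM can exit after main.
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Same trick as in the downloader: turn a Future[T] into a Future[Try[T]]
  // that always succeeds, carrying the failure (if any) as a value.
  def futureToFutureTry[T](f: Future[T]): Future[Try[T]] =
    f.map(Success(_)).recover { case x => Failure(x) }

  // Hypothetical stand-in for WS.url(url).get(): "downloads" a status code
  // and fails for any URL containing "invalid".
  def fakeFetch(url: String): Future[Int] = Future {
    if (url.contains("invalid")) throw new RuntimeException(s"cannot resolve $url")
    200
  }

  def main(args: Array[String]): Unit = {
    val urls = List("http://a.example", "http://invalid.example", "http://b.example")
    // One always-successful future per URL, combined into a single future
    val all: Future[List[Try[Int]]] =
      Future.sequence(urls.map(url => futureToFutureTry(fakeFetch(url))))
    // Blocking is fine in a demo's main; avoid Await inside real request handlers
    val results = Await.result(all, 5.seconds)
    val (ok, failed) = results.partition(_.isSuccess)
    println(s"success=${ok.size}, failures=${failed.size}") // prints success=2, failures=1
  }
}
```

On Scala 2.12 and later, Future also has a transform overload that takes a Try[T] => Try[S], so f.transform(Success(_)) should produce the same Future[Try[T]] in a single call.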

Learning Scala -- my experience and advice

Learning a new programming language has been on my list for quite some time, and fortunately I got a chance to try out Scala for one of our projects at work a month ago. So, it's been a month of learning Scala now. Though it's too early to form an opinion, I must say it has NOT been an easy language to learn so far. I think one of the reasons is that the language is very expressive, with pretty much every construct in the basket; that makes it a very powerful tool but a difficult one to master. So, one piece of advice: don't get discouraged if you feel lost, and be patient while you are learning it, because it is going to take a bit longer than you expected. Hang in there!!!

As I go on this journey, I am discovering useful resources every day, so I thought of documenting them in a blog post. I plan to maintain this post as a living document.

Books: 

If you are looking for a quick introduction to the language, I would recommend "Scala for the Impatient" by Cay S. Horstmann. It is a well-written book with the right balance between explaining key concepts and showing their application. But if you are in for a long and patient read, "Programming in Scala" by Martin Odersky explains the key concepts at a much more fundamental and detailed level. I am using both books and cross-referencing them depending on how deep I want to dive into a particular topic. I have yet to explore other books, but these two make a good start. The good thing is that both of them are available as Kindle editions.

Video Lectures: 

If you prefer video courses, there is a very good course, "Functional Programming Principles in Scala" by Martin Odersky, on Coursera. The course is divided into video lectures of 10-20 minutes each, which makes them easy to consume.

Blogs/Online Articles: 

There are some really useful blog posts about Scala out there, but a few deserve special mention here.

Tooling: 

Both Eclipse and IntelliJ IDEA have good support for Scala. I end up using IntelliJ for my day-to-day coding, but Eclipse has very good support for Scala worksheets, which I find really handy for trying out and learning new concepts. For some reason, Scala worksheets do not work in IntelliJ IDEA. I also use Vim for quick edits, and I found the vim-scala plugin pretty handy.

Let me know what your experience learning Scala has been, and chime in with links to resources you found useful.

Drake - Make for Data!!!

Very recently, I started my journey into the big data world, and it's an exciting time to see new technologies popping up. I came across a really handy tool, which I used in a project at work. It's called Drake, "Make for Data", as the folks at Factual call it. You can read more about it in their blog post.

I will write a detailed post about how I used Drake to simplify my life at work, but in the meantime I wanted to share this very good introductory video by Artem Boytsov, the author himself.

I would highly recommend adding it to your toolkit if you are dealing with complex data workflows in your projects. Have a look and let me know what you think!!!