Violet Hill


Sunil Arora

Sep 18 / 8:25am

PyCon India Presentation on Redis And Python

Pune (India) hosted what was probably the second-biggest PyCon event this year. It was the third PyCon to be held in India, and it was a very successful event.

I gave a comprehensive tutorial on Redis and Python. I am embedding the slides from that session below.

I hope you will find it useful. Ping me @_sunil_ if you have any questions.

Filed under  //  Redis   pune pycon 2011   pycon india   python   sunil arora  


Jul 30 / 11:20am

Another Redis use case: Centralized logging

Log analysis had become a difficult task in our production environment at work because logs are spread across different machines and different files. We wanted all the exception logs from all of our apps to be tracked centrally and viewed in a single console.

Once again, we found a good use case for Redis. Our strategy is to push all critical logs into a Redis list and have a background worker continuously pull them from the list and write them to a log file.

As we use Python for all our backend work, I quickly wrote a log handler that pushes log messages into Redis.

The RedisLogHandler class looks like this:

To hook up this RedisLogHandler to your application logger, all you need to do is the following:
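Here is a minimal sketch of such a handler and how it might be hooked up, assuming the redis-py client (only its rpush method is used, so any object with that method works). The list key app:logs, the JSON entry format and the attach_redis_logging helper are hypothetical choices for illustration, not the author's code:

```python
import json
import logging


class RedisLogHandler(logging.Handler):
    """Push formatted log records onto a Redis list (sketch).

    `client` is assumed to be a redis-py style client; only its
    rpush(key, value) method is called.
    """

    def __init__(self, client, key="app:logs", level=logging.ERROR):
        super().__init__(level)
        self.client = client
        self.key = key  # hypothetical list name

    def emit(self, record):
        try:
            entry = {
                "logger": record.name,
                "level": record.levelname,
                "time": record.created,
                # format() also appends the traceback for logger.exception().
                "message": self.format(record),
            }
            self.client.rpush(self.key, json.dumps(entry))
        except Exception:
            # A logging failure must never crash the application.
            self.handleError(record)


def attach_redis_logging(client, logger_name="myapp"):
    """Attach the handler to an application logger (hypothetical name)."""
    logger = logging.getLogger(logger_name)
    handler = RedisLogHandler(client)
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

In an application this would be something like `attach_redis_logging(redis.Redis(host="localhost", port=6379))`, after which every `logger.error(...)` or `logger.exception(...)` call also lands in the Redis list. The handler level is set to ERROR so routine INFO chatter stays out of the central list.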

I quickly hooked it up to all of our background jobs in Celery, and I can see it working. Let me know what you think of this solution.
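The background worker from the strategy above could look roughly like this sketch. It assumes a redis-py client, whose blpop(keys, timeout) returns a (key, value) pair or None on timeout, and a hypothetical JSON entry format with level, logger and message fields; drain_logs and stop_when_idle are illustrative names:

```python
import json


def drain_logs(client, key, log_path, timeout=5, stop_when_idle=False):
    """Move JSON log entries from a Redis list into a plain log file (sketch).

    BLPOP blocks for up to `timeout` seconds waiting for an entry, so the
    worker does not busy-poll Redis. With stop_when_idle=False it runs forever.
    Returns the number of entries written.
    """
    written = 0
    with open(log_path, "a", encoding="utf-8") as out:
        while True:
            item = client.blpop([key], timeout=timeout)
            if item is None:
                if stop_when_idle:
                    break  # nothing arrived before the timeout
                continue
            _, raw = item
            if isinstance(raw, bytes):
                raw = raw.decode("utf-8")
            entry = json.loads(raw)
            out.write(f"{entry['level']} {entry['logger']}: {entry['message']}\n")
            out.flush()  # make each entry visible to `tail -f` right away
            written += 1
    return written
```

Because BLPOP removes the entry as it returns it, several such workers could share one list without writing the same entry twice.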


Jun 18 / 12:55pm

Redis Lua Scripting

Redis is one of my favorite data storage platforms, and I won't miss a single chance to use it wherever I can. One of Redis's biggest strengths is that you can model your data in its most natural form: key-value pairs, hashes, lists and sets. I prefer thinking of data storage in terms of lists, hashes and sets instead of tables :).

Redis provides a very good set of APIs (commands) that lets you perform the majority of the operations you would want on these data types with a single command.

But in the past, I encountered situations where the solution design required the following:
  • Executing more than one Redis command.
  • The outcome of one command determining whether the other commands run or not.
  • Atomic (transactional) execution of both of the above.
  • A high-contention environment for such operations, so optimistic locking schemes won't work.
Situations like these pretty much made me turn away from Redis in the past.

This is going to change now. About a month ago, @antirez (the creator of Redis) released Lua scripting support in a separate branch, hacked together over a weekend (which shows what a terrific hacker he is).

Lua scripting support in Redis solves the majority of the problems of the kind described above. It has great potential, and I can't wait to see what the community does with it.

@antirez's post about scripting support provides good details. You can read the blog post here (http://antirez.com/post/scripting-branch-released.html).

I spent some time today trying out scripting, so I thought I'd do a quick write-up of what I did.

I started with the simple goal of implementing two new commands, zpop and zrevpop, for the sorted set data type using scripting.

1. ZPOP: pops the element with the lowest score out of a sorted set.
2. ZREVPOP: pops the element with the highest score out of a sorted set.

Setup:
Follow these steps to get the scripting version of Redis running on your machine.
1. git clone https://github.com/antirez/redis.git
2. cd redis
3. git checkout scripting
4. make
Now you should have all the binaries (redis-server and redis-cli) ready in the src folder. Start the server by running the redis-server binary.

If you need a quick crash course on Lua for writing basic scripts, follow this link (http://www.lua.org/manual/5.1/).

ZPOP Implementation:
Redis provides a redis.call interface for invoking Redis commands from Lua code. Here is the Lua script for the zpop command.
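As a sketch of what such a script can look like, here it is embedded in Python and sent through redis-py's eval(script, numkeys, *keys_and_args), so the read and the removal run atomically on the server. The script uses ZRANGE followed by ZREM, which is one plausible way to write it rather than the author's exact listing; ZPOP_LUA and zpop are illustrative names:

```python
# Lua source for ZPOP: read the lowest-scored member, remove it, return it.
# KEYS[1] is the sorted set name passed after numkeys in EVAL.
ZPOP_LUA = """
local members = redis.call('ZRANGE', KEYS[1], 0, 0)
if members[1] then
    redis.call('ZREM', KEYS[1], members[1])
end
return members
"""


def zpop(client, key):
    """Pop the lowest-scored member of `key` atomically (sketch).

    `client` is assumed to be a redis-py client. The Lua table comes back
    as a list: one member, or empty if the set was empty.
    """
    return client.eval(ZPOP_LUA, 1, key)
```

Returning the table (rather than a bare string) is what makes redis-cli print the result as `1) "a"`. For what it's worth, Redis 5.0 later added native ZPOPMIN and ZPOPMAX commands that cover this case without a script.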

Testing:
The Redis server implements a new command, EVAL, to invoke Lua scripts. The syntax goes like this:
EVAL <body> <num_keys_in_args> [<arg1> <arg2> ... <arg_N>]
You can test your Lua script with the redis-cli program.

Let's populate a sorted set named zset by inserting a, b and c with scores 1, 2 and 3 respectively.
./redis-cli zadd zset 1 a
./redis-cli zadd zset 2 b
./redis-cli zadd zset 3 c

Here is the command to test the zpop.lua file. Each call should pop the lowest-scored remaining member:
./redis-cli eval "$(cat path-to-zpop.lua-file)" 1 zset
1) "a"
./redis-cli eval "$(cat path-to-zpop.lua-file)" 1 zset
1) "b"
Along similar lines, zrevpop can be implemented using the following Lua script.
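A matching sketch for zrevpop only swaps ZRANGE for ZREVRANGE, so the highest-scored member is taken. The script text and the ZREVPOP_LUA/zrevpop names are illustrative, and a redis-py client is assumed:

```python
# Lua source for ZREVPOP: same as ZPOP, but read from the high-score end.
ZREVPOP_LUA = """
local members = redis.call('ZREVRANGE', KEYS[1], 0, 0)
if members[1] then
    redis.call('ZREM', KEYS[1], members[1])
end
return members
"""


def zrevpop(client, key):
    """Pop the highest-scored member of `key` atomically (sketch)."""
    return client.eval(ZREVPOP_LUA, 1, key)
```

Instead of sending the script body on every call, redis-py's `register_script()` can wrap it: `pop_highest = r.register_script(ZREVPOP_LUA)` followed by `pop_highest(keys=["zset"])` invokes it by its SHA1 via EVALSHA.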

I am going to do a follow-up post with some more complex examples to demonstrate the true potential of Lua scripting.

Thanks, @antirez, for keeping this project on the edge by innovating at every level of it.
Filed under  //  Redis   lua   scripting  


Apr 6 / 9:32am

CoffeeScript with Backbone.js Example

I have been spending some time researching development strategies for building JavaScript-heavy applications, and I have been having some fun playing with Backbone.js and CoffeeScript. I did not find any good tutorials that explain how to use these two awesome projects together, so I thought of writing one myself.

Before we start, here is a quick intro to Backbone.js and CoffeeScript, taken from their websites.

1. Backbone.js: Backbone supplies structure to JavaScript-heavy applications by providing models with key-value binding and custom events, collections with a rich API of enumerable functions, views with declarative event handling, and connects it all to your existing application over a RESTful JSON interface.

2. CoffeeScript: CoffeeScript is a little language that compiles into JavaScript. Underneath all of those embarrassing braces and semicolons, JavaScript has always had a gorgeous object model at its heart. CoffeeScript is an attempt to expose the good parts of JavaScript in a simple way.

To demonstrate the use of Backbone.js and CoffeeScript, let me first describe the simple example that we are going to build today. You can see the final example here. Basically, we will build a color-box configurator that takes a size and a color for a square box to be drawn. Changes to the configuration should be reflected on screen in real time, the kind of functionality you see in the visual-designer part of an IDE.

Coffeescript and Backbone.js example

 

The example consists of the following parts:

1. Configuration Model: contains three configuration properties: the height, width and color of the box. We create a ConfigModel class that inherits from Backbone.Model; its initialize method (which gets invoked when an instance is created) sets up the default values of the three properties.

 

2. Configuration View: this view contains two input text boxes, initialized with default values, and is responsible for capturing input and updating the underlying model. We define two form fields, color-input and width-input, for taking the color and size of the square. Note the identifiers of these two elements (color-input/width-input) and of the config container (config-input), as they will be used in the CoffeeScript part of the code.

The ConfigInputView class defines the presentation layer of the configuration input and inherits from Backbone.View. Its initialize method links the underlying model with this instance of ConfigInputView. The events field of the class binds keyup events on both input boxes to the updateConfig function, which simply updates the underlying model's three properties: height, width and color. Note the fat arrow "=>" here, which specifies that updateConfig should be invoked in the context of this ConfigInputView instance.

 

3. ColorBoxView: this view is responsible for presenting a single colored box according to the configuration, and it responds to changes in the configuration model. Note that there can be more than one instance of this view, and all of them respond to changes in the configuration. We define ColorBoxView, which inherits from Backbone.View. The view's initialize method compiles a template, which is also defined below. We define a render method that simply redraws the color box by generating HTML for the containing element from the current configuration. Note that we bind the render method to be invoked on any change event of the underlying model, the config model, which changes on every keystroke in either of the two input boxes. Defining render with the fat arrow "=>" again makes sure that it is invoked in the context of the ColorBoxView instance.

This is the HTML template that we use to present a color box. We are using the jQuery template library. The two data pieces ${width} and ${color} are filled in by the render function to generate the markup.
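The ${...} placeholders are plain named substitution. Python's string.Template happens to use the same ${name} syntax, so the idea can be sketched in Python; this is an analogy only, not the jQuery template API, and the box markup and render_box name are hypothetical:

```python
from string import Template

# Hypothetical markup for one square box; ${width} is used for both
# dimensions because the box is square.
BOX_TEMPLATE = Template(
    '<div class="color-box" '
    'style="width: ${width}px; height: ${width}px; background: ${color};"></div>'
)


def render_box(width, color):
    """Fill the two data pieces to produce the box HTML."""
    return BOX_TEMPLATE.substitute(width=width, color=color)
```

For example, `render_box(50, "red")` produces `<div class="color-box" style="width: 50px; height: 50px; background: red;"></div>`. As with the compiled jQuery template, the template is parsed once and re-filled on every render.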

 

4. Controller: the controller is the core that binds the above three components together. The controller class inherits from Backbone.Controller and instantiates one ConfigModel, one ConfigInputView and five ColorBoxView instances. Each ColorBoxView gets the ConfigModel instance as its underlying model.
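The wiring of one model to one input view and five box views is the observer pattern: the boxes subscribe to the model's change event, and the input view only ever writes to the model. Here is a rough Python analogy of that flow; Model, bind_change, build_controller and the default values are hypothetical stand-ins, not Backbone's API:

```python
class Model:
    """Tiny stand-in for a Backbone model: attributes plus change listeners."""

    def __init__(self, **defaults):
        self.attributes = dict(defaults)
        self._listeners = []

    def bind_change(self, callback):
        self._listeners.append(callback)

    def set(self, **changes):
        # Only fire listeners if some value actually changed.
        changed = {k: v for k, v in changes.items() if self.attributes.get(k) != v}
        self.attributes.update(changed)
        if changed:
            for callback in self._listeners:
                callback(self)


class ConfigInputView:
    """Captures input and writes it to the model; never touches the boxes."""

    def __init__(self, model):
        self.model = model

    def update_config(self, size, color):
        # One size field drives both height and width of the square.
        self.model.set(height=size, width=size, color=color)


class ColorBoxView:
    """Redraws itself whenever the shared model changes."""

    def __init__(self, model):
        self.html = ""
        # A bound method keeps `self`, much like CoffeeScript's fat arrow.
        model.bind_change(self.render)
        self.render(model)

    def render(self, model):
        a = model.attributes
        self.html = (f'<div style="width:{a["width"]}px;height:{a["height"]}px;'
                     f'background:{a["color"]}"></div>')


def build_controller():
    """One model, one input view, five box views sharing the model."""
    config = Model(height=100, width=100, color="red")  # hypothetical defaults
    inputs = ConfigInputView(config)
    boxes = [ColorBoxView(config) for _ in range(5)]
    return config, inputs, boxes
```

Calling `inputs.update_config(40, "blue")` updates the model once, and all five boxes re-render from it; the input view needs no reference to any box.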

 

Lastly, we define a global init method that creates an instance of ColorBoxController; it is invoked on jQuery's document-ready event.

I am sure you will have a few questions about the post; please comment below or tweet me @_sunil_.


Jan 16 / 6:54am

Facebook Python Library Documentation

This weekend, I spent some time exploring the documentation tools available for Python. I think Sphinx is the best documentation project I have ever come across; the official Python documentation itself is built with Sphinx.

I thought of trying Sphinx for documenting the official Facebook Python SDK. After a solid 4-5 hours of work, here it is:

Hope you guys like it. I plan to keep it updated with more writeups.

#cheers


Nov 15 / 10:01am

Shit happens, It does!

shit happens

Picture by Anant S.

We've all had embarrassing moments in our careers that involved inadvertently wreaking havoc on a production system. When it happens, for a second you (so desperately) want to believe it didn't. You are almost too afraid to even check that it actually happened.

GitHub went through an outage yesterday, and Chris was brave enough to explain how it happened; the Hacker News post then generated a good buzz around the subject. While reading the comments on both threads, I hand-picked a few interesting stories about production mishaps. Here they are:

seldo: My worst was discovering I had written a unique ID generator which was (due to me typing "==" instead of "!="), producing duplicate IDs -- and not only that, it was producing them at exponentially increasing rates -- and every duplicate ID was destroying an association in our database, making it unclear what records belonged to who.

pixdamix: Mine was for a French social networking site 4 years ago. They used to send emails every day saying "hey, look at the people you might know". The links in the email would automatically log the user in to the website. When I sent the code live, it took 2 days (and more than 50,000 emails) to find out that when I sent an email to person Z about person Y, the link logged Z in to Y's account.

SkyMarshal: sent a test email to thousands of customers in your prod database encouraging them to use web check-in for their non-existent flight tomorrow. Yeah, did that five years ago, talk about heart-attack-inducing. Quickly remedied by sending a second email to the same test set, thankfully, but that's the kind of mistake you never forget.

lambda: The Dreamhost case in which they typed the wrong year in their billing code; charged many of their users for an extra year of service, to the tune of $7.5 million.  (http://blog.dreamhost.com/2008/01/15/um-whoops/). 

Jabbles: When the Google engineer added "/" to the list of bad URLs, thereby marking every single website (!) as possibly dangerous. http://googleblog.blogspot.com/2009/01/this-site-may-harm-your-computer-on.html Now that would be a bad feeling.

dacort: At BigDoor, the outage was caused by a test script that was meant to refresh a test database when it was inadvertently run against our production systems. This script truncated 79 gigabytes of data in a matter of seconds. http://www.bigdoor.com/blog/bigdoor-api-service-has-been-restored/

joeybaker: I ran datamapper auto_migrate! against the company blog mysql db with 5 years worth of posts and my sysadmin soon realized we didn't have a backup. Three fun filled days of Google cache scraping got us back up and running.

Would love to hear about your production mishaps if any :). 


Jul 28 / 9:07am

Parsing signed_request parameter in Python based Facebook Canvas application

Recently Facebook announced a new way of passing information about the user who is viewing your Facebook canvas application: the "signed_request" parameter, which is implemented on top of a new signature scheme based on the OAuth 2.0 proposal. Facebook's documentation describes "signed_request" as follows:

The signed_request parameter is a simple way to make sure that the data you're receiving is the actual data sent by Facebook. It is signed using your application secret which is only known by you and Facebook. If someone were to make a change to the data, the signature would no longer validate as they wouldn't know your application secret to also update the signature.

Facebook's python-sdk does not support parsing the signed_request parameter. Today at work I had to write a code snippet for parsing "signed_request", so I thought of sharing it here.

I know there is some cryptic code in base64_url_decode; it is there because translate and maketrans do not work well with unicode strings. Anyway, if you have any questions, just drop a line in the comments below or message me @_sunil_.
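For reference, here is a stdlib-only sketch of the same parsing for Python 3. A signed_request is two base64url strings joined by a dot: an HMAC-SHA256 signature, computed with the app secret over the encoded payload, followed by the JSON payload itself. This version re-adds the stripped padding and uses base64.urlsafe_b64decode instead of translate/maketrans; the function names are illustrative, not the original snippet:

```python
import base64
import hashlib
import hmac
import json


def base64_url_decode(data):
    """Decode URL-safe base64, re-adding the '=' padding that is stripped."""
    if isinstance(data, str):
        data = data.encode("ascii")
    data += b"=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(data)


def parse_signed_request(signed_request, app_secret):
    """Return the payload dict, or None if the request is malformed or forged."""
    try:
        encoded_sig, payload = signed_request.split(".", 1)
        sig = base64_url_decode(encoded_sig)
        expected = hmac.new(app_secret.encode("utf-8"),
                            payload.encode("ascii"), hashlib.sha256).digest()
        # Check the signature before trusting anything in the payload;
        # compare_digest runs in constant time.
        if not hmac.compare_digest(sig, expected):
            return None
        data = json.loads(base64_url_decode(payload))
    except ValueError:
        # Missing ".", bad base64 and bad JSON all raise ValueError subclasses.
        return None
    if not isinstance(data, dict):
        return None
    if str(data.get("algorithm", "")).upper() != "HMAC-SHA256":
        return None
    return data
```

Note that the HMAC is computed over the still-encoded payload string, not over the decoded JSON, which is why the payload is only decoded after the signature check.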

Filed under  //  canvas   canvas application   facebook   python   request   signed  


Jul 25 / 10:36am

Pankhon ko... Hawa zara see lagne do....

It's been quite a while since I wrote my last blog post, and for a change, this time I am not going to talk about anything technical :). The idea for this post came from the feeling I had after watching "Udaan" (from Anurag Kashyap's production house) at Inox today. Then, when I came home, 3 Idiots was being screened on TV. There have been quite a few movies that shook me from inside and forced me to think hard about life and whether I AM LIVING IT RIGHT. These movies shake the age-old fundamentals that most of us have been brought up with.

So in today's blog post, I am going to celebrate this feeling of freedom and the "TAKE ON THE WORLD" spirit, and refresh the thoughts these movies carry. The best song to set the theme is "Pankhon ko..." from Rocket Singh. It's a beautiful song with thoughtful lyrics by Jaideep Sahni. Enjoy!!!

Pankhon ko hawa zara si lagne do
Dil bole soya tha ab jagne do
Dil dil mein hain dil ki tammana sau
Do sau hon chalo zara sa tapne do
Udne do udne do
Hawa zara si lagne do soya tha ab jagne do
Pankhon ko hawa zara si lagne do

Dhoop khili jism garam sa hai
Suraj yahin yeh bharam sa hai
Bikhri huyi raahein hazaron sau
Thaamo koi phir bhatakne do
Udne do udne do

Dil ki patang chahon mein gotey khaati hai
Dheel toh do dekho kahan pe jaati hai
Uljhe nahin toh kaise suljhoge
Bikhre nahin toh kaise nikhroge
Udne do udne do

1. The first one is "Rocket Singh".

2. The second one is "Udaan".

3. The third one is "3 Idiots".

4. The fourth one is "Taare Zameen Par".

5. The fifth one is "Rang De Basanti".

Guys, do share your thoughts and the movies that rate high in this category! #cheers

Filed under  //  3 idiots   bollywood   rang de basanti   rocket singh   udaan  


May 13 / 11:39am

Why StackOverFlow hates Ruby and Loves C#

No wonder the StackOverflow folks decided to go with C#. The answer is simple: C# is going to make a lot of money for them :) and it works really hard for them.

If you are interested in the economics of programming languages on StackOverflow, you are going to find this one amusing.

An hour ago, just out of curiosity, I thought of checking the popularity of programming languages on StackOverflow. I was looking for simple representative figures to compare the engagement value of programming languages there, and the simplest I could think of was the number of questions tagged with each language. I chose four languages: C#, Java, Python and Ruby.

For simplicity, let's assume that each question asked about a language on StackOverflow generates 1 USD (I know that in the long run the value of each question will be way over 1 USD as it grows over time, but let's keep it fixed at 1 USD). Then here are some interesting economics that each programming language creates for StackOverflow.

1. Dollars earned so far from the four languages: poor Ruby :)

[Chart: dollars earned so far, per language]

2. Dollars earned this month: poorer Ruby :)

[Chart: dollars earned this month, per language]

3. Dollars earned this week: somebody save Ruby :)

[Chart: dollars earned this week, per language]

On a serious note, the stats for all four languages are interesting, and they puzzled me when I tried to draw some logic out of them. Oh yes, the money graphs were made using the 'Piles of Samples' visualization from Google. You can reach me @_sunil_.

 

Filed under  //  C#   Java   Ruby   funny   python   stackoverflow  


May 9 / 5:15am

7 things one can do to scale up a web application

scaling up

Recently at work, we had to undertake a quick exercise of scaling up our web application, which taught me a few things that I thought I'd share with the community. We use the following technology stack at work:

  • Python as our primary language for most of our work at the backend
  • Pylons (Webframework)
  • MongoDB (NoSQL datastore)
  • Redis (Cache)

Let's jump into the seven steps that worked for us; I hope most of them can be applied to any web application.

1. Profile your web application: to understand the execution pattern from a performance perspective, the first step is to profile the application. Profiling can bring out some interesting insights about your application: within 10 minutes of profiling we were able to identify some very important (low-hanging) performance fixes. Profiling also lets you quickly measure the difference each performance fix makes.

These days most web frameworks provide profiling tools. For our web framework, Pylons, we used ProfileMiddleware from the paste package.

  • You need to install the python-profiler package. The following command should do the trick if you are using Ubuntu:

sudo apt-get install python-profiler

  • Add the following lines to your Pylons application's middleware.py, i.e. <app_name>/config/middleware.py:

from paste.debug.profile import ProfileMiddleware

# Add this in the CUSTOM MIDDLEWARE section of make_app()
app = ProfileMiddleware(app, config, log_filename='profile.log.tmp', limit=40)

With the above steps performed, you should see profiler output on the console (stdout) if you are running in development mode. Now identify the code paths that are the real bottlenecks.
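Outside the web stack, the standard library's cProfile and pstats can profile a single action the same way, which is handy for re-measuring one hot spot after a fix. A minimal sketch; profile_call and the slow_page example are hypothetical:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=10, **kwargs):
    """Run func under cProfile; return (result, top-`limit` report as text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


def slow_page(n):
    """Hypothetical hot spot: builds a string by repeated concatenation."""
    out = ""
    for i in range(n):
        out += str(i)
    return len(out)
```

Running `result, report = profile_call(slow_page, 1000)` and printing `report` shows the same kind of per-function table the middleware writes, limited to the top entries like its `limit=40` setting.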

2. DB query profiling: since most web apps are powered by some kind of data store, profiling data-store queries gives you interesting insights into the slower operations in your application. In our case we use MongoDB, which provides command-line switches for verbose mode (-v up to -vvvvv, for different verbosity levels) to understand the query execution happening at the server. It helped us identify the most frequent and slowest queries, and in 90% of the cases all we needed to do was define indexes on our collections. Things may not be that simple in your case, but it will at least give your engineers enough to understand what needs to be attacked in the application.
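Server-side verbose logging can be complemented from the application side by timing each query call and logging the slow ones. This is a generic technique, not something MongoDB or its drivers provide; the log_slow decorator and the slow_queries logger name are hypothetical:

```python
import functools
import logging
import time

log = logging.getLogger("slow_queries")


def log_slow(threshold_ms=100):
    """Decorator: log a warning for any call slower than threshold_ms."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if the query raised, so failures are timed too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                if elapsed_ms > threshold_ms:
                    log.warning("%s took %.1f ms", func.__name__, elapsed_ms)
        return wrapper
    return decorator
```

Wrapping each data-access function with, say, `@log_slow(threshold_ms=50)` yields a list of slow call sites that can be matched against missing indexes. MongoDB's own database profiler (`db.setProfilingLevel`) gives the server-side view of the same information.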

3. Enable data caching: caching can be your biggest friend for scaling up, and it can be done at various levels. The right caching strategy depends on the use cases in your application, and for popular ones like page-level caching, most frameworks provide support out of the box; for Pylons, the Beaker cache module supports most caching use cases. As an example, in our case we observed that most of our application pages could be cached for visitors who are not logged in, so we wrote a custom caching module to enable page-level caching in non-logged-in mode. We are now in the process of going one step further down, to data-level caching even for the logged-in version. (I am going to do a follow-up blog post on the caching work that we did.)
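The page-level caching for visitors who are not logged in can be sketched as follows. The in-memory PageCache with a time-to-live stands in for whatever backend you actually use (Beaker, or Redis from the stack above); PageCache, serve_page and the 60-second TTL are hypothetical:

```python
import time


class PageCache:
    """In-memory page cache with a per-entry TTL (stand-in for a real backend)."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is None:
            return None
        expires_at, value = hit
        if self.clock() >= expires_at:
            del self._store[key]  # stale: force a re-render
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def serve_page(cache, path, logged_in, render):
    """Serve cached HTML to anonymous visitors; always render for logged-in users."""
    if logged_in:
        return render(path)  # personalized pages bypass the cache
    html = cache.get(path)
    if html is None:
        html = render(path)
        cache.set(path, html)
    return html
```

Keying on the path alone is a simplification; a real page cache usually has to include the query string and anything else that changes the anonymous page.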

4. Background certain tasks: while improving the response time of some requests in our web application, we found a lot of things that did not need to be performed inline in the request-handling path and could be performed as background jobs. There are standard off-the-shelf components for this in most web stacks these days, e.g. Resque if you are using Ruby on Rails. In our case, we used the Python-based Celery for backgrounding certain tasks.
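The core idea, handing work to something outside the request path and returning immediately, can be sketched with the standard library's queue and a worker thread. This is a stand-in for Celery, not its API: it is in-process and loses queued jobs on restart, which is exactly what Celery's broker-backed queue avoids. BackgroundWorker is a hypothetical name:

```python
import logging
import queue
import threading

log = logging.getLogger(__name__)


class BackgroundWorker:
    """Run jobs off the request path on a daemon thread (stdlib stand-in)."""

    def __init__(self):
        self.jobs = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            func, args = self.jobs.get()
            try:
                func(*args)
            except Exception:
                # A failed job must not kill the worker thread.
                log.exception("background job %r failed", func)
            finally:
                self.jobs.task_done()

    def submit(self, func, *args):
        """Enqueue the job and return immediately; the request does not wait."""
        self.jobs.put((func, args))
```

A request handler would call something like `worker.submit(send_welcome_email, user_id)` (hypothetical job) and respond right away, while the single worker thread processes jobs in FIFO order.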

5. Combining JS/CSS: we observed that 17 CSS and 9 JS files were being included in different pages of our application, leading to 26 requests, which was bad from both the server and the page-load perspective. Simply by combining all the JS files into one file and all the CSS files into another, we cut 23 requests from our server, which improved page-load performance as well. Most web frameworks provide minification/combining modules for JS/CSS files; in our case, we used the MinificationWebHelpers module, whose javascript_link and stylesheet_link helpers need to be passed extra flags, as shown below.

${h.stylesheet_link('/css/ext-libs/jqModal.css',
                    '/css/ext-libs/jquery-ui-1.8.custom.css',
                    '/css/ext-libs/jquery.jcarousel.css',
                    ...........
                    '/css/explore/exp_common.css',
                    minified = True, combined = True, combined_filename = 'app.css')}
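Conceptually, combined=True boils down to concatenating the files in order and serving the result as one file. A minimal sketch of that step (minification left out); combine_files is hypothetical, and it does not rewrite relative url() paths, which a real combiner may need to handle when the source files live in different directories:

```python
from pathlib import Path


def combine_files(paths, output_path):
    """Concatenate assets into one file so a page makes one request, not many.

    Files are joined in the given order, because the CSS cascade (and JS
    dependency order) depends on it. Returns the number of files combined.
    """
    parts = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        # A comment marker per source file keeps the output debuggable.
        parts.append(f"/* {Path(path).name} */\n{text.rstrip()}\n")
    Path(output_path).write_text("\n".join(parts), encoding="utf-8")
    return len(parts)
```

Run at deploy time over the stylesheet list, this produces a single app.css, so a page pays for one request instead of one per file.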

6. Serve static content from another server: if your web application contains a lot of static content like images, it is a good idea to serve it from a service like Amazon S3 that is better suited for this purpose. This further cuts down on the requests served by your web server. We used Amazon S3 for serving our images. Some of our content is not exactly static, like user images that change when a user uploads a new one; for these we used the Amazon S3 API (via Python's boto library) to push new or changed images to S3 on the fly. You can take further advantage of hosting images on S3 by enabling Amazon's CDN service (CloudFront) to serve this content from Amazon's CDN infrastructure, which can further improve page-load performance.

7. Correct logging strategy: this one is very low-hanging and may not be a problem in your case, but we observed that a lot of logging was enabled in our production setup and needed to be bumped down in log level. A quick one-hour exercise led to assigning the right log levels to all the noisy log statements.
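With Python's logging module, the fix can live in one place: a quiet production default plus per-logger overrides for the modules you still want to hear from or want silenced. A sketch; configure_logging and the logger names are hypothetical:

```python
import logging


def configure_logging(levels, default=logging.WARNING):
    """Apply a production default level plus per-logger overrides."""
    logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    logging.getLogger().setLevel(default)  # root: quiet unless overridden
    for name, level in levels.items():
        logging.getLogger(name).setLevel(level)


# Hypothetical logger names: the app at INFO, one noisy module at WARNING.
PRODUCTION_LEVELS = {
    "myapp": logging.INFO,
    "myapp.search": logging.WARNING,
}
```

Because logger levels are inherited down the dotted hierarchy, `configure_logging(PRODUCTION_LEVELS)` leaves every other `myapp.*` module at INFO while only `myapp.search` is bumped down, without touching individual log statements.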

I hope you will find the above tips useful. It would be great to hear about tips you have applied in your own app. We are pretty much done with the vertical scaling exercise, and I am going to follow up this post with the horizontal scaling exercise that we are starting this week.

Filed under  //  performance   pylons   python   web application scalability  
