Canvas Tetris

Here's a Tetris game I wrote during my spare time in the weekend. It's one HTML file and it uses the HTML canvas element only.

You're gonna need a HTML 5 browser to see it in action. That means Firefox 3.5+, Safari 4 or Google Chrome.

I would very much appreciate remarks on the code :)

Here's the link

Django vs Pylons

This is an attempt to make a non-biased comparison between Django and Pylons. I did a brief comparison of the two at work since we needed a web framework for our new project.

We are making a web application that will serve as a frontend to a bank database. The application is going to allow bank employees to insert and track interbank loans and their repayments. The database is actually abstracted behind a layer built with Twisted, so the web application will only use XMLRPC calls to communicate with the database, hence no ORM.

As of the moment of writing, Django is in version 1.1 and Pylons is in version 0.9.7. Both of the frameworks are mature and tested in production in some major websites. Django has Washington Post and pylons has Reddit.

The frameworks are quite different in philosophy, although both follow the same MVC paradigm. Django has more magic and less code, while pylons has more code and less magic. Pylons is essentially a bare-bones wrapper around the WSGI specification that uses 3rd party modules for templating, database interaction, routing and just about anything else, while Django is aimed towards rapid development of web applications and has everything packed inside of it - it's own template system, routing and ORM. This allows Django to establish high reusability for code between different projects. On the other hand, the developer is limited to one ORM and templating system.

Let's go point by point. These are not all the relevant features in a web framework by any means, but they should give the reader a general overview of the differences and the strengths of the frameworks in the most important areas.

Read the rest of this entry »

Details are everything

Attention to details can make or break a product. I was just downloading dropbox and I noticed a facinating thing. On the download page there are instructions for installing. It's just a few images of your browser download window and the installation confirm popup telling you where to click. Now the fascinating thing is that those are images of YOUR browser. They change depending on the browser or operating system you use to visit the site so that there are no confusions. Now this is what I call attention to detail. That is why Blizzard is making the finest games and Microsoft is making crappy OS-es. Kudos Dropbox!

Maybe exceptions are not so awesome after all

Maybe I got a bit rusty on my 1 month vacation, but today I ran into a simple problem that I spent 2 hours solving. I Am working on a website made in Django and I decided to make some changes to the model in one of the apps in the project. I currently have 3 apps - game, content and members. One of the classes of the content model imports a class from the game model, but I decided to remove that particular class.

After syncing the DB I noticed that the database tables for the content app are missing.This was strange, because no errors were reported. I tried to do a "python manage.py sqlall content" and the shell threw an error that it couldn't find the content app on my PYTHONPATH.

Error: App with label content could not be found. Are you sure your INSTALLED_APPS setting is correct?

I tried all kinds of different stuff until i found a mailing list that said that if you have import errors in some module, you cannot import it, which is to be expected, but the error messages that come up are all but informative. The message that my PYTHONPATH is incomplete in no way suggests that I have an import error.

What does this have to do with exceptions ?

Well, I'll talk about another example first: While I was writing a screen scraper, I used a lot of regular expressions. The regular expressions return a match object if they successfully make a match and None if they don't. If you try and call the .group() method on a Match object - it does the job, but calling it on a None object throws and AtrributeError. I used to catch this exception in the upper layers of my program in order to avoid crashing the script on invalid input. This turned out to be a bad idea since all kinds of other stuff throws an AttributeError and makes the debugging a living hell.

I suspect that the same thing is going on in Django. Failure to import inside a module throws an ImportError which Django interprets as a missing module - only the module isn't missing - its a "submodule" that is missing. But the exceptions are the same, no matter on which level they happen.

Using exceptions correctly requires defining your own classes for every particular error that you may run into. This is a lot of extra code and need's to be weighed against the old-fashioned error codes.

A recursive django template tag

Here is my attempt to create a "silver bullet" tag for printing tree structures with the Django templating language. It's far from a silver bullet, tho, but it can do basic stuff:

It's a modification of the standard "for" tag and i have kept the counter, counter0, first and last variables, only this time they are not attributes to forloop, but rather to recurseloop. Check the docstring for more info.

For example, if you need to print comments that have other comments as replies, you would want the comments to appear one below the other, but the replies to be indented a bit. Let's say 20 pixels per level. So the code would be:

{% load file_that_contains_recurse_tag %}
{% recurse comment in comments children="replies" indent=(0,20) %}
  <div style='margin-left:{{indent}}px;'>
    {{ comment.text }}
  </div>
{% endrecurse %}

This tag will expect a list of comments (top-level) that have a property named 'replies' which contain other comments.

indent is an argument that will start with the float value of 0 and get increased by 20 on each depth level of the recursion. You can pass as many variables like indent as you like. They must be in the form

name=(float,float)

or strings will also work but must be enclosed in quotes

name=("string","string")

Caveat: You must not leave blank spaces between the equal signs when assigning children and additional incremented arguments. I will fix this later.

A second scenario is when you need the parent element to contain the children, like in unordered/ordered lists. In that case you can use the {% yield %} tag inside the recurse block. This tag will output the HTML between the recurse and endrecurse tags if there are any children in the current iteration item, or it will output nothing if there are no children.

{% recurse comment in comments children="replies" indent=(0,20) %}
<div style='margin-left:20px;'>
  {{ comment.text }}
  {% yield %}
  </div>
{% endrecurse %}

The yield tag (as for now) can only be used directly inside the recurse block, much like the {% else %} tag can only be used directly inside the if-endif block. This means that you can't make code like

{% recurse comment in comments children="replies" indent=(0,20) %}
<div style='margin-left:{{indent}}px;'>
  ({{ comment.text }})
</div>
  {% if cond %}
    {% yield %}
  {% endif %}
{% endrecurse %}

This will fail with an invalid block tag exception. You can check the docstring of the do_recurse function inside the code for more info. Also, you may want to check this code for errors since I just wrote it today, and haven't had much time to test it.

Download and comment on any errors you find :)

The Leopard 10.5.7 update, the keychain and the clock.

I installed the 10.5.7 update for Mac OS X Leopard and after logging in i got a prompt that the system date was reset (to 1990-something). I am using the automatic date setting which gets the current date and time from apple.com so I figured that it was no biggie. But after a short while I noticed that I have no wireless network connection and that the password for the network was blank, which is odd, since I am supposed to have this password saved in the keychain.

I opened up the keychain access and checked the "show password" on the wireless network in my office (I don't remember it, ofc ) and i got a message that "Access to this item is restricted.".

After a lot of digging around (on another laptop) I found out that if the current system date is earlier than the creation date of your keychain file (~/Library/Keychain/login.keychain) you cannot view the items in it. So here's a loophole: No internet - can't set correct system date - no correct system date - no internet ....

Luckily for me, I devised an extremely clever plan - I manually set the date to today, got the keychain working, and then switched back to auto-date. Ha-ha.

Python 2.6, Leopard 10.5.6 and psycopg 2.0.10

Psycopg2 is the postgreSQL adapter for Django. If you are trying to set it up on Mac OSX (using the supplied setup.py script) you may run into an error saying

File "setup.py", line 219, in finalize_options
NameError: global name 'w' is not defined

This is a simple thing to fix: open the setup.py file coming with psycopg2 and edit the line 219 like this:

except (Warning, w):

Remove the braces

except Warning, w:

In python 2.6 (at least) putting exceptions in brackets will catch multiple exceptions in this case Warnings and 'w's. The author clearly meant to get the Warning instance as 'w'. Also, do not forget to edit the setup.cfg file on line 28 and type in your pg_config path. Mine is

pg_config=/Library/PostgreSQL/8.3/bin/pg_config

and so should yours be if you used the default installer package for mac OS X. Oh, yeah, you need to install postgres before installing psycopg2.

Maze Drawer

This piece of python code is going to draw some mazes in a .PNG format. It's a side-product of the solution to the Google Code Jam practice problem B called "Always turn left". The script takes the input data sets and draws the mazes described with every test case. You will need Python with Python Imaging Library (PIL) installed. If you're working on OS X like me, check out this link for some tips on how to install PIL.

To run the script type "python left.py" and it will ask for a file path containing the test cases. It will create a png image for each test case in the directory from which the program is run.

Download script + data set and make some mazes

Some problem solving and how it’s easier with python

There's a project I work on that required me to make an import utility for a CRM. The import should get a comma separated values file of clients and information about clients, and save it to the database. The database is split across several tables, so in the `clients` table I normally don't keep the name of the company, but just a foreign key. Now, our client is not very good with numbers and she needed to import files in which she could enter the name of the company instead of the database ID. A spreadsheet row representing a client looks like this:

FirstName | LastName | Email               | Company
John      | Doe      | johndoe@example.com | Coca Cola

But the database row in the `clients` table looks like this:

first_name | last_name | email            | company_id
John       | Doe       | john@example.com | 2

What I need to do is search for the company named 'Coca Cola' in the `companies` table and replace the name with it's ID. This is all fine except for one problem - typos. Moreover, the user could write "Apple Computer Inc." instead of "Apple Inc.". So I needed a way to compare the input strings with the ones in the database.

After poking around I found out about the Levenshtein distance between strings, but that solved only half of my problems - the typo part. The distance would be very small between "Apple" and "Aple" but very big between "ACME International Inc." and "International ACME Inc.", and the latter two are obviously the same.

I devised the following method to compare entries:

  1. Split up the terms by words and eliminate blanks
  2. Get the Levenshtein distance between each word from the first term and each word from the second term. Comparing "Apple Computer Inc." with "Apple Inc." for example, will give a matrix of 6 distances. lev_matrix
  3. Get the shortest term (one with less words, not the one with less characters). It has 2 words in this case. Then choose the smallest values from each row. When you pick the smallest row value, you cannot pick anymore values from that column. This means that the word in the column is the best match for some word in the rows.
  4. Add these values up and add the difference between the word count of the 2 terms - and you have a score for the similarity of the terms. If the score is zero, they are the same. We are adding +1 for each extra word, but this can be weighted if needed. The point is that we don't care much for extra words since company names can have many words in them, but they are often called by one or two words.

But there is a problem with step 3. If, for example, a column has the lowest values for more than one row, we always choose the first, and this practice is not always the best answer. For instance, matching "Fast Cats" with "Fats Cats" (notice the typo) gets a total score of 3 - matching Cats to Fats and Fast to Cats, which is wrong - it will be 2 if we match Fast to Fats and Cats to Cats, which is the intended solution.

fast_cats

So to be sure we have the best match, we need to always have the lowest sum that is unique across rows and columns. One solution is to make all permutations of the words in the columns and join them to a single permutation of the words in the rows then see which one has the lowest score. If the words in the rows are fewer then we need to get all permutations P(n,k) of the words in the columns, where n is the number of columns and k is the number of rows. This is a O(n!) algorithm but it's the best that I could think of - practically the same problem as finding every possible way to place 8 rooks on a chess table without making them attack each other.

And finally, here is the part where we get to write some code. I need a function that can calculate all permutations consisted of k elements out of a larger set consisted of n elements (k <= n).

I decided first to write the algorithm in Python because it's cleaner and easier to think, and then to rewrite it in PHP. The first attempt was really, really sucky and I won't talk about it because I'm a bit embarassed. But I wasn't aware of a neat thing that Python has: the yield statement. The darn thing can be written in 6 lines with it:

def permutations(the_set, n):
  if n==0:
    yield []
  else:
    for i in xrange( len( the_set ) ):
      for x in permutations( the_set[0:i] + the_set[i+1:], n-1 ):
        yield [the_set[i]]+x

I will go into the yield statement later, maybe I will extend this post, but for now, I'll say that it allows you to make a function that will calculate the combinations on the fly, without storing them in a huge list and then returning the list. It sort of lazy-loads the list of combinations when needed. There is no such thing in PHP (as far as I know). So here's my best shot at the function in PHP:

function permutations( $array, $size )
{
  $result = array();
  $x = count($array);
  for( $i=0; $i<$x; $i++ ) {
    $copy = $array; // copy: array_splice gets the arg by reference
    $item = array_splice( $copy, $i, 1 );
    if( $size == 1 )
      $result[] = $item;
    else {
      $rest = permutations( $copy , $size - 1 );
      foreach( $rest as $r )
        $result[] = array_merge($item, $r);
    }
  }
  return $result;
}

There really are excessive parts of the PHP code like storing the final result, but more importantly copying the array each time because array_splice takes the array argument by reference and modifies it ( talking about orthogonality ), plus its twice as long as the python code and half as readable.

Anyway, to get back at my original problem - the solution worked in terms of accuracy (at least for the first few test cases), but I fear it's going to be slow for large datasets. I have around 7 fields to compare with each respective table of the database, each table having 100 records on average; each record is 3 words long on average which gives 6 permutations per comparison. Importing a list of 1000 clients would require 1000*7*100*6 = 4,200,000 comparisons, plus 700,000 calls to the permutations function (not counting the recursive calls :). I still think that it's better than hammering the database with 7000 fulltext serach queries, not to mention moving the database tables to MyISAM and indexing a bunch of fields. After all, it's an import. I could put one of those useless progress indicators like when you're starting Windows.

Fun with pythonchallenge.com

This site is ultra cool. I should be studying for my exams, but this is bloody addictive: http://pythonchallenge.com

Anyway, I made it to the 10th level where you can see a bull saying: len(a[30]) = ? so obviously i need to calculate the length of the 31st member of a. Now, we don't know what a is, but fortunately, the kaw does. Clicking on it will give you this:

a = [1, 11, 21, 1211, 111221,

A sequence of some sort ... i spent like half an hour trying to decipher it and it didn't make any sense, so I did what every sane person would do: consult the On-line encyclopedia of integer sequences and it turns out it's the look-and-say sequence. How was I supposed to know that? It goes like this: first we have ONE one (1), so we say it: ONE ONE. That's 11. Now we have TWO ones so that's 21. Then we have ONE Two and ONE One (1211). Aham... So to cut things short, here's what I wrote to calculate the length of the 30th member:


def make_look_and_say( members = 1 ):
  seq = ["1"]
  for x in range(0,members):
    new_member = ""
    char = seq[-1][0]
    streak = 1
    for c in seq[-1][1:]:
      if c == char:
        streak += 1
      else:
        new_member += str(streak) + char
        streak = 1
        char = c
    else:
      new_member += str(streak) + char
    seq.append( new_member )
  return seq

print len(make_look_and_say(31)[30])

Now I'm stuck at level 11 and it seems that there is some image that has pixels interpolated or extrapolated, but I really cannot guess what it is. I suppose I'll read up on image libraries some other time.

Anyway, the challenge is awesome, you should try it. There really is only one python specific question (involving pickles) and everything else can be solved (up to level 10 of course :) with any other programming language .