<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Codegrind &#187; Programming</title>
	<atom:link href="http://jordanovski.com/category/programming/feed" rel="self" type="application/rss+xml" />
	<link>http://jordanovski.com</link>
	<description>Homepage of Dusko Jordanovski</description>
	<lastBuildDate>Fri, 18 Jun 2010 17:37:08 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Maybe exceptions are not so awesome after all</title>
		<link>http://jordanovski.com/maybe-exceptions-are-not-so-awesome-after-all</link>
		<comments>http://jordanovski.com/maybe-exceptions-are-not-so-awesome-after-all#comments</comments>
		<pubDate>Sat, 15 Aug 2009 19:32:59 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://jordanovski.com/?p=188</guid>
		<description><![CDATA[Maybe I got a bit rusty on my one-month vacation, but today I ran into a simple problem that I spent 2 hours solving. I am working on a website made in Django and I decided to make some changes to the model in one of the apps in the project. I currently have [...]]]></description>
			<content:encoded><![CDATA[<p>Maybe I got a bit rusty on my one-month vacation, but today I ran into a simple problem that I spent 2 hours solving. I am working on a website made in Django and I decided to make some changes to the model in one of the apps in the project. I currently have 3 apps: game, content and members. One of the classes of the content model imports a class from the game model, but I decided to remove that particular class.</p>
<p>After syncing the DB I noticed that the database tables for the content app were missing. This was strange, because no errors were reported. I tried running "python manage.py sqlall content", and the shell threw an error saying that it couldn't find the content app on my PYTHONPATH.</p>
<p>Error: App with label content could not be found. Are you sure your INSTALLED_APPS setting is correct?</p>
<p>I tried all kinds of different things until I found a mailing list post saying that if a module contains import errors, you cannot import it. That is to be expected, but the error messages that come up are anything but informative. A message saying that my PYTHONPATH is incomplete in no way suggests that I have an import error.</p>
<p>What does this have to do with exceptions?</p>
<p>Well, I'll talk about another example first. While I was writing a screen scraper, I used a lot of regular expressions. A regular expression returns a match object if it successfully makes a match and None if it doesn't. Calling the .group() method on a match object does the job, but calling it on None throws an AttributeError. I used to catch this exception in the upper layers of my program in order to avoid crashing the script on invalid input. This turned out to be a bad idea, since all kinds of other code throws an AttributeError too, which makes debugging a living hell.</p>
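<p>One way around the broad catch is to test for None right where the match is made. This is only a minimal sketch: the pattern and the extract_price function are hypothetical and stand in for whatever the scraper actually looks for.</p>

```python
import re

# Hypothetical pattern, used only for illustration.
PRICE_RE = re.compile(r"(\d+)\.(\d{2})")

def extract_price(text):
    """Return the price in cents, or None when the text contains no price."""
    match = PRICE_RE.search(text)
    if match is None:
        # Deal with "no match" here, instead of letting None.group()
        # raise an AttributeError that an upper layer has to guess about.
        return None
    return int(match.group(1)) * 100 + int(match.group(2))
```

<p>With this shape, an AttributeError reaching the upper layers can only mean a real bug, so there is no reason to catch it there.</p>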
<p>I suspect that the same thing is going on in Django. A failed import inside a module throws an ImportError, which Django interprets as a missing module. Only the module isn't missing: it's a "submodule" that is missing. But the exceptions are the same, no matter on which level they happen.</p>
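<p>Modern Python (3.6 and later) makes the two cases easier to tell apart, because ModuleNotFoundError carries the name of the module that could not be found. The sketch below is not how Django does it; load_app is a hypothetical helper, and it assumes a top-level module name without dots.</p>

```python
import importlib

def load_app(name):
    """Import a top-level module, separating "not found" from "broken inside"."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        if exc.name == name:
            # The module we asked for really is missing.
            raise LookupError("module %r is not on the path" % name) from exc
        # The module was found, but something it imports is missing.
        raise ImportError(
            "module %r exists, but it failed to import %r" % (name, exc.name)
        ) from exc
```

<p>A plain ImportError, such as "cannot import name" after a class has been removed, is not caught here and propagates with its original message.</p>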
<p>Using exceptions correctly requires defining your own classes for every particular error that you may run into. This is a lot of extra code and needs to be weighed against old-fashioned error codes.</p>
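<p>For the scraper case, the extra code might look roughly like this. All names here (ScrapeError, PatternNotFound, first_group, scrape_titles) are made up for the sketch, and so is the pattern.</p>

```python
import re

class ScrapeError(Exception):
    """Base class for errors raised by this hypothetical scraper."""

class PatternNotFound(ScrapeError):
    """The expected pattern did not occur in the input."""

def first_group(pattern, text):
    match = re.search(pattern, text)
    if match is None:
        raise PatternNotFound("no match for %r" % pattern)
    return match.group(1)

def scrape_titles(pages):
    """Collect one title per page, or None for pages without a title."""
    titles = []
    for page in pages:
        try:
            titles.append(first_group(r"title: (\w+)", page))
        except PatternNotFound:
            # Only the "invalid input" case is swallowed; an AttributeError
            # or any other bug still surfaces with its own traceback.
            titles.append(None)
    return titles
```

<p>The upper layer now catches exactly one kind of failure, which is the trade the paragraph above describes: more classes, but no guessing about where an exception came from.</p>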
]]></content:encoded>
			<wfw:commentRss>http://jordanovski.com/maybe-exceptions-are-not-so-awesome-after-all/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A recursive django template tag</title>
		<link>http://jordanovski.com/a-recursive-django-template-tag</link>
		<comments>http://jordanovski.com/a-recursive-django-template-tag#comments</comments>
		<pubDate>Fri, 05 Jun 2009 19:37:39 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[node]]></category>
		<category><![CDATA[recursion]]></category>
		<category><![CDATA[recursive]]></category>
		<category><![CDATA[template tag]]></category>

		<guid isPermaLink="false">http://jordanovski.com/?p=169</guid>
		<description><![CDATA[Here is my attempt to create a "silver bullet" tag for printing tree structures with the Django templating language. It's far from a silver bullet, though, but it can do basic stuff:
It's a modification of the standard "for" tag and I have kept the counter, counter0, first and last variables, only this time they are [...]]]></description>
			<content:encoded><![CDATA[<p>Here is my attempt to create a "silver bullet" tag for printing tree structures with the Django templating language. It's far from a silver bullet, though, but it can do basic stuff:</p>
<p>It's a modification of the standard "for" tag and I have kept the counter, counter0, first and last variables, only this time they are not attributes of forloop, but rather of recurseloop. Check the docstring for more info.</p>
<p>For example, if you need to print comments that have other comments as replies, you would want the comments to appear one below the other, but the replies to be indented a bit. Let's say 20 pixels per level. So the code would be:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> load file_that_contains_recurse_tag <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span>
<span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> recurse comment <span style="color: #ff7700;font-weight:bold;">in</span> comments children=<span style="color: #483d8b;">&quot;replies&quot;</span> indent=<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span>
  <span style="color: #66cc66;">&lt;</span>div style=<span style="color: #483d8b;">'margin-left:{{indent}}px;'</span><span style="color: #66cc66;">&gt;</span>
    <span style="color: black;">&#123;</span><span style="color: black;">&#123;</span> comment.<span style="color: black;">text</span> <span style="color: black;">&#125;</span><span style="color: black;">&#125;</span>
  <span style="color: #66cc66;">&lt;</span>/div<span style="color: #66cc66;">&gt;</span>
<span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> endrecurse <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span></pre></div></div>

<p>This tag expects a list of top-level comments, each with a property named 'replies' that contains other comments.</p>
<p>indent is an argument that will start with the float value of 0 and get increased by 20 on each depth level of the recursion. You can pass as many variables like indent as you like. They must be in the form</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">name=<span style="color: black;">&#40;</span><span style="color: #008000;">float</span>,<span style="color: #008000;">float</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Strings also work, but they must be enclosed in quotes:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">name=<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;string&quot;</span>,<span style="color: #483d8b;">&quot;string&quot;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Caveat: You must not leave blank spaces around the equals signs when assigning children and the additional incremented arguments. I will fix this later.</p>
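<p>To make the behaviour of an incremented argument like indent concrete, here is a plain-Python sketch of the same traversal. It is not the tag's implementation: walk is a hypothetical function, and comments are assumed to be dicts with a "replies" list.</p>

```python
def walk(items, children="replies", start=0.0, step=20.0, depth=0):
    """Yield (indent, item) pairs depth-first; indent grows by step per level."""
    for item in items:
        yield start + depth * step, item
        # Recurse into the children one level deeper.
        yield from walk(item.get(children, []), children, start, step, depth + 1)

comments = [
    {"text": "first", "replies": [
        {"text": "reply", "replies": [
            {"text": "nested", "replies": []},
        ]},
    ]},
    {"text": "second", "replies": []},
]

flat = [(indent, c["text"]) for indent, c in walk(comments)]
```

<p>Here flat lists the comments in display order with indents 0.0, 20.0, 40.0 and 0.0, which matches what indent=(0,20) is described to do.</p>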
<p>A second scenario is when you need the parent element to <strong>contain</strong> the children, like in unordered/ordered lists. In that case you can use the {% yield %} tag inside the recurse block. This tag outputs the HTML between the recurse and endrecurse tags if there are any children in the current iteration item, or nothing if there are no children.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> recurse comment <span style="color: #ff7700;font-weight:bold;">in</span> comments children=<span style="color: #483d8b;">&quot;replies&quot;</span> indent=<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span>
<span style="color: #66cc66;">&lt;</span>div style=<span style="color: #483d8b;">'margin-left:20px;'</span><span style="color: #66cc66;">&gt;</span>
  <span style="color: black;">&#123;</span><span style="color: black;">&#123;</span> comment.<span style="color: black;">text</span> <span style="color: black;">&#125;</span><span style="color: black;">&#125;</span>
  <span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> <span style="color: #ff7700;font-weight:bold;">yield</span> <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span>
  <span style="color: #66cc66;">&lt;</span>/div<span style="color: #66cc66;">&gt;</span>
<span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> endrecurse <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span></pre></div></div>

<p>The yield tag (for now) can <strong>only</strong> be used directly inside the recurse block, much like the {% else %} tag can only be used directly inside the if-endif block. This means that you can't write code like</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> recurse comment <span style="color: #ff7700;font-weight:bold;">in</span> comments children=<span style="color: #483d8b;">&quot;replies&quot;</span> indent=<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span>
<span style="color: #66cc66;">&lt;</span>div style=<span style="color: #483d8b;">'margin-left:{{indent}}px;'</span><span style="color: #66cc66;">&gt;</span>
  <span style="color: black;">&#40;</span><span style="color: black;">&#123;</span><span style="color: black;">&#123;</span> comment.<span style="color: black;">text</span> <span style="color: black;">&#125;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
<span style="color: #66cc66;">&lt;</span>/div<span style="color: #66cc66;">&gt;</span>
  <span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> <span style="color: #ff7700;font-weight:bold;">if</span> cond <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span>
    <span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> <span style="color: #ff7700;font-weight:bold;">yield</span> <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span>
  <span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> endif <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span>
<span style="color: black;">&#123;</span><span style="color: #66cc66;">%</span> endrecurse <span style="color: #66cc66;">%</span><span style="color: black;">&#125;</span></pre></div></div>

<p>This will fail with an invalid block tag exception. You can check the docstring of the do_recurse function inside the code for more info. Also, you may want to check this code for errors since I just wrote it today, and haven't had much time to test it.</p>
<p><a href="http://jordanovski.com/wp-content/uploads/recurse.zip">Download and comment on any errors you find</a> :)</p>
]]></content:encoded>
			<wfw:commentRss>http://jordanovski.com/a-recursive-django-template-tag/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Maze Drawer</title>
		<link>http://jordanovski.com/always-turn-left-draw-mazes</link>
		<comments>http://jordanovski.com/always-turn-left-draw-mazes#comments</comments>
		<pubDate>Wed, 01 Apr 2009 12:56:35 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[always turn left]]></category>
		<category><![CDATA[code jam]]></category>
		<category><![CDATA[draw maze]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[help]]></category>
		<category><![CDATA[practice]]></category>

		<guid isPermaLink="false">http://jordanovski.com/?p=126</guid>
		<description><![CDATA[This piece of Python code is going to draw some mazes in a .PNG format. It's a by-product of the solution to the Google Code Jam practice problem B called "Always turn left". The script takes the input data sets and draws the mazes described with every test case. You will need Python with Python [...]]]></description>
			<content:encoded><![CDATA[<p>This piece of Python code is going to draw some mazes in a .PNG format. It's a by-product of the solution to the <a href="http://code.google.com/codejam/contest">Google Code Jam</a> practice problem B called "Always turn left". The script takes the input data sets and draws the mazes described with every test case. You will need Python with the Python Imaging Library (PIL) installed. If you're working on OS X like me, check out <a href="http://passingcuriosity.com/2009/installing-pil-on-mac-os-x-leopard/">this link</a> for some tips on how to install PIL.</p>
<p>To run the script, type "python left.py" and it will ask for the path of a file containing the test cases. It will create a PNG image for each test case in the directory from which the program is run.</p>
<p><a href="http://jordanovski.com/wp-content/uploads/left.zip">Download script + data set and make some mazes</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jordanovski.com/always-turn-left-draw-mazes/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Some problem solving and how it&#8217;s easier with python</title>
		<link>http://jordanovski.com/some-problem-solving-and-how-its-easier-with-python</link>
		<comments>http://jordanovski.com/some-problem-solving-and-how-its-easier-with-python#comments</comments>
		<pubDate>Mon, 23 Mar 2009 02:12:48 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[generators]]></category>
		<category><![CDATA[text comparison]]></category>

		<guid isPermaLink="false">http://jordanovski.com/?p=111</guid>
		<description><![CDATA[There's a project I work on that required me to make an import utility for a CRM. The import should get a comma-separated values file of clients and information about clients, and save it to the database. The database is split across several tables, so in the `clients` table I normally don't keep the [...]]]></description>
			<content:encoded><![CDATA[<p>There's a project I work on that required me to make an import utility for a CRM. The import should get a comma-separated values file of clients and information about clients, and save it to the database. The database is split across several tables, so in the `clients` table I normally don't keep the name of the company, but just a foreign key. Now, our client is not very good with numbers and she needed to import files in which she could enter the name of the company instead of the database ID. A spreadsheet row representing a client looks like this:</p>
<pre><code style="font-family: monaco,consolas,monospace;">FirstName | LastName | Email               | Company
John      | Doe      | johndoe@example.com | Coca Cola
</code></pre>
<p>But the database row in the `clients` table looks like this:</p>
<pre><code style="font-family: monaco,consolas,monospace;">first_name | last_name | email               | company_id
John       | Doe       | johndoe@example.com | 2
</code></pre>
<p>What I need to do is search for the company named 'Coca Cola' in the `companies` table and replace the name with its ID. This is all fine except for one problem - typos. Moreover, the user could write "Apple Computer Inc." instead of "Apple Inc.". So I needed a way to compare the input strings with the ones in the database.</p>
<p>After poking around I found out about the <a href="http://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein distance</a> between strings, but that solved only half of my problems - the typo part. The distance would be very small between "Apple" and "Aple" but very big between "ACME International Inc." and "International ACME Inc.", and the latter two are obviously the same.</p>
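<p>For reference, the distance itself fits in a few lines. This is the standard dynamic-programming formulation kept to two rows, written as a sketch rather than taken from the import utility.</p>

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]
```

<p>levenshtein("Apple", "Aple") is 1, while swapping whole words in a longer name costs many single-character edits, which is the half of the problem the distance alone does not solve.</p>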
<p>I devised the following method to compare entries:</p>
<ol>
<li>Split up the terms by words and eliminate blanks</li>
<li>Get the Levenshtein distance between each word from the first term and each word from the second term. Comparing "Apple Computer Inc." with "Apple Inc." for example, will give a matrix of 6 distances. <img class="size-full wp-image-112 aligncenter" style="margin-top: 5px; margin-bottom: 5px;" title="lev_matrix" src="http://jordanovski.com/wp-content/uploads/lev_matrix.png" alt="lev_matrix" width="347" height="227" /></li>
<li>Get the shortest term (the one with fewer words, not the one with fewer characters). It has 2 words in this case. Then choose the smallest value from each <em>row</em>. Once you pick the smallest value in a <em>row</em>, you cannot pick any more values from that <em>column</em>. This means that the word in the <em>column</em> is the best match for some word in the <em>rows</em>.</li>
<li>Add these values up and add the difference between the word count of the 2 terms - and you have a score for the similarity of the terms. If the score is zero, they are the same. We are adding +1 for each extra word, but this can be weighted if needed. The point is that we don't care much for extra words since company names can have many words in them, but they are often called by one or two words.</li>
</ol>
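<p>Steps 3 and 4 above can be sketched as follows, starting from an already computed distance matrix. The function name greedy_score, the extra_word_weight parameter and the numbers in the sample matrix are all illustrative assumptions, not values from the figure.</p>

```python
def greedy_score(matrix, extra_word_weight=1):
    """Score two terms from their word-distance matrix, row by row.

    matrix[r][c] is the distance between word r of the shorter term
    and word c of the longer term; the matrix must not be empty.
    """
    used = set()
    total = 0
    for row in matrix:
        # Smallest value in this row among the columns not taken yet.
        col = min((c for c in range(len(row)) if c not in used),
                  key=lambda c: row[c])
        used.add(col)
        total += row[col]
    extra_words = len(matrix[0]) - len(matrix)
    return total + extra_word_weight * extra_words

# Made-up distances for a 2-word term against a 3-word term.
sample = [
    [0, 7, 5],
    [5, 6, 0],
]
score = greedy_score(sample)
```

<p>Both rows find a zero, so score is 1: nothing for the matched words plus 1 for the single extra word.</p>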
<p style="text-align: left;">But there is a problem with step 3. If, for example, a column has the lowest values for more than one row, we always choose the first, and this practice does not always give the best answer. For instance, matching "Fast Cats" with "Fats Cats" (notice the typo) gets a total score of 3 - matching <em>Cats</em> to <em>Fats</em> and <em>Fast</em> to <em>Cats</em>, which is wrong. The score would be 2 if we matched <em>Fast</em> to <em>Fats</em> and <em>Cats</em> to <em>Cats</em>, which is the intended solution.</p>
<p style="text-align: left;"><img class="aligncenter size-full wp-image-113" style="margin:5px;" title="fast_cats" src="http://jordanovski.com/wp-content/uploads/fast_cats.png" alt="fast_cats" width="207" height="97" /></p>
<p style="text-align: left;">So to be sure we have the best match, we always need the lowest sum whose picks are unique across rows and columns. One solution is to generate all permutations of the words in the <em>columns</em>, pair each of them with the fixed order of the words in the <em>rows</em>, and see which one has the lowest score. If the words in the rows are fewer, then we need all permutations <strong>P(n,k)</strong> of the words in the <em>columns</em>, where <strong>n</strong> is the number of columns and <strong>k</strong> is the number of rows. This is an O(n!) algorithm, but it's the best that I could think of - practically the same problem as finding every possible way to place 8 rooks on a chessboard without any of them attacking each other.</p>
<p style="text-align: left;">And finally, here is the part where we get to write some code. I need a function that can calculate all permutations consisting of <strong>k</strong> elements out of a larger set consisting of <strong>n</strong> elements (<strong>k</strong> &lt;= <strong>n</strong>).</p>
<p style="text-align: left;">I decided first to write the algorithm in Python because it's cleaner and easier to think in, and then to rewrite it in PHP. The first attempt was really, really sucky and I won't talk about it because I'm a bit embarrassed. But I wasn't aware of a neat thing that Python has: the <strong>yield</strong> statement. The darn thing can be written in 6 lines with it:</p>
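<p>The exhaustive search can be sketched on top of a distance matrix with the standard library's itertools.permutations, which produces exactly the P(n,k) arrangements. The name best_score, the extra_word_weight parameter and the sample numbers are assumptions made for this example.</p>

```python
from itertools import permutations

def best_score(matrix, extra_word_weight=1):
    """Lowest total over every way to give each row its own column.

    matrix[r][c] is the distance between word r of the shorter term
    and word c of the longer term (k rows, n columns, k not above n).
    """
    k, n = len(matrix), len(matrix[0])
    best = min(
        # cols[r] is the column assigned to row r in this arrangement.
        sum(matrix[r][c] for r, c in enumerate(cols))
        for cols in permutations(range(n), k)
    )
    return best + extra_word_weight * (n - k)

# Made-up distances where picking row by row goes wrong.
sample = [
    [1, 0, 4],
    [3, 1, 5],
]
score = best_score(sample)
```

<p>On this matrix a row-by-row pick takes the 0 and is then stuck with the 3, for a sum of 3. The exhaustive search finds 1 + 1 = 2, so score is 3 once the extra word is added.</p>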
<pre><code style="font-family: monaco,consolas,monospace;">def permutations(the_set, n):
  if n==0:
    yield []
  else:
    for i in range( len( the_set ) ):
      for x in permutations( the_set[0:i] + the_set[i+1:], n-1 ):
        yield [the_set[i]]+x
</code></pre>
<p>I will go into the yield statement later, maybe I will extend this post, but for now, I'll say that it allows you to make a function that will calculate the permutations on the fly, without storing them in a huge list and then returning the list. It sort of lazy-loads the list of permutations when needed. There is no such thing in PHP (as far as I know). So here's my best shot at the function in PHP:</p>
<pre><code style="font-family: monaco,consolas,monospace;">function permutations( $array, $size )
{
  $result = array();
  $x = count($array);
  for( $i=0; $i&lt;$x; $i++ ) {
    $copy = $array; // copy: array_splice gets the arg by reference
    $item = array_splice( $copy, $i, 1 );
    if( $size == 1 )
      $result[] = $item;
    else {
      $rest = permutations( $copy , $size - 1 );
      foreach( $rest as $r )
        $result[] = array_merge($item, $r);
    }
  }
  return $result;
}
</code></pre>
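<p>The difference shows up when only a few results are needed. The sketch below repeats the Python generator (with range in place of xrange so that it also runs on Python 3) and takes just the first three permutations with itertools.islice; the input letters are arbitrary.</p>

```python
from itertools import islice

def permutations(the_set, n):
    # Same generator as the Python version in this post.
    if n == 0:
        yield []
    else:
        for i in range(len(the_set)):
            for rest in permutations(the_set[:i] + the_set[i + 1:], n - 1):
                yield [the_set[i]] + rest

letters = list("abcdefghij")
# There are 30240 permutations of 5 letters out of 10, but only the
# first three are ever computed here.
first_three = list(islice(permutations(letters, 5), 3))
```

<p>first_three holds abcde, abcdf and abcdg as lists of letters. A version that returns a full list has to build all 30240 entries before the caller sees the first one.</p>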
<p>There really are excessive parts in the PHP code, like storing the final result and, more importantly, copying the array each time because array_splice takes the array argument by reference and modifies it (so much for orthogonality). On top of that, it's twice as long as the Python code and half as readable.</p>
<p>Anyway, to get back to my original problem - the solution worked in terms of accuracy (at least for the first few test cases), but I fear it's going to be slow for large datasets. I have around 7 fields to compare with each respective table of the database, each table having 100 records on average; each record is 3 words long on average, which gives 6 permutations per comparison. Importing a list of 1000 clients would require 1000*7*100*6 = 4,200,000 comparisons, plus 700,000 calls to the permutations function (not counting the recursive calls :). I still think that it's better than hammering the database with 7000 fulltext search queries, not to mention moving the database tables to MyISAM and indexing a bunch of fields. After all, it's an import. I could put one of those useless progress indicators like when you're starting Windows.</p>
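<p>The estimate is easy to redo for other sizes. The variable names below are made up; the numbers are the averages quoted above.</p>

```python
from math import factorial

clients = 1000
fields_per_client = 7
records_per_table = 100
words_per_record = 3

# 3 words against 3 words: 3! = 6 arrangements per comparison.
arrangements = factorial(words_per_record)

# One call to permutations per (client, field, record) triple.
permutation_calls = clients * fields_per_client * records_per_table
comparisons = permutation_calls * arrangements

# The alternative: one fulltext query per field of every client.
fulltext_queries = clients * fields_per_client
```

<p>Because the number of arrangements is a factorial, records that average 5 words instead of 3 would multiply the comparisons by 20.</p>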
]]></content:encoded>
			<wfw:commentRss>http://jordanovski.com/some-problem-solving-and-how-its-easier-with-python/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Get your copy of bronze framework</title>
		<link>http://jordanovski.com/get-your-copy-of-bronze-framework</link>
		<comments>http://jordanovski.com/get-your-copy-of-bronze-framework#comments</comments>
		<pubDate>Wed, 25 Feb 2009 02:07:01 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://jordanovski.com/?p=56</guid>
		<description><![CDATA[Bronze framework is a PHP MVC framework for building web applications. It was inspired by many other frameworks that can be found around. It uses a system of internal redirects and a recursive front controller to allow the developer to reuse code as much as possible, and avoid repetition.
For now, the only thing you can [...]]]></description>
			<content:encoded><![CDATA[<p>Bronze framework is a <a href="http://php.net">PHP</a> <a href="http://en.wikipedia.org/wiki/Model-view-controller">MVC</a> framework for building web applications. It was inspired by many other frameworks that can be found around. It uses a system of internal redirects and a recursive front controller to allow the developer to reuse code as much as possible, and avoid repetition.</p>
<p>For now, all you can do is get the source and consult it :) I need some time to write proper documentation.</p>
<p>Anyway, download it <a href="http://code.google.com/p/bronze-framework/">here</a>, and please comment on it <a href="http://jordanovski.com/bronze">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jordanovski.com/get-your-copy-of-bronze-framework/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
