thoughts, ideas, code and other things...

Saturday, July 17, 2010

Poorly implementing caching in python - eye opener

So after looking at some slides on caching function returns in JavaScript, I was keen on trying it out in Python. And LOL, I came up with this logic -
if val in cache.keys():
    return cache[val]
# do the right thing...

But though "if val in cache.keys()" sounds very human-friendly, it would suck for a very big cache, and so it does in the following test.

I guess I'm not using timeit in the classical way, where I would pass some statements as a string and ask it to run them N times; passing variables from existing code into the statement string is a pain (tried global etc., didn't work). Hence a simple time-diff test.
# -*- coding: utf-8 -*-
import random
from timeit import Timer

class PlainColorParser:
    def parse(self, value):
        rgb = int(value[1:], 16)
        r = rgb >> 16 & 0xff
        g = rgb >> 8 & 0xff
        b = rgb & 0xff
        return (r, g, b)

    def __call__(self, value):
        return self.parse(value)

class NoBitColorParser:
    def __call__(self, value):
        return (int(value[1:3], 16), int(value[3:5], 16), int(value[5:7], 16))

class PoorCachedColorParser(PlainColorParser):
    def __init__(self):
        self.cache = {}

    def __call__(self, value):
        if value in self.cache.keys():
            return self.cache[value]
        self.cache[value] = self.parse(value)
        return self.cache[value]

class CachedExceptionColorParser(PlainColorParser):
    def __init__(self):
        self.cache = {}

    def __call__(self, value):
        try:
            return self.cache[value]
        except KeyError:
            self.cache[value] = self.parse(value)
            return self.cache[value]

class CachedColorParser(PlainColorParser):
    def __init__(self):
        self.cache = {}

    def __call__(self, value):
        if value in self.cache:
            return self.cache[value]
        self.cache[value] = self.parse(value)
        return self.cache[value]

if __name__ == "__main__":
    t = Timer()
    pccParse = PoorCachedColorParser()
    cecParse = CachedExceptionColorParser()
    ccParse = CachedColorParser()
    pcParse = PlainColorParser()
    nbParse = NoBitColorParser()

    # set up some random data to test
    colors = []
    for i in xrange(100000):
        colors.append("#" + hex(random.randint(0xff0000, 0xff0aff))[2:])

    def timeDiff(obj):
        start = t.timer()
        for c in colors:
            obj(c)
        stop = t.timer()
        return stop - start

    # ---- test poorly cached
    print "Poorly Cached - %.2fs" % timeDiff(pccParse)

    # ---- test exception cached
    print "Exception Cached - %.2fs" % timeDiff(cecParse)

    # ---- test cached
    print "Cached - %.2fs" % timeDiff(ccParse)

    # ---- test uncached
    print "Non Cached - %.2fs" % timeDiff(pcParse)

    # ---- test no bitwise, uncached
    print "Not Bitwise, Non Cached - %.2fs" % timeDiff(nbParse)

So we've got five classes representing different ways of parsing an HTML hex color code, e.g. "#f00f00", into a tuple of (r, g, b) integers. PlainColorParser and NoBitColorParser could easily be plain functions with no need for classes over them, as they do not cache, but to keep them on somewhat equal footing with the three cached ones, I've wrapped them in classes.

NoBitColorParser does string manipulation and parses an integer three times before returning a tuple. PlainColorParser does better than that: it uses bit shifts and AND masks to filter out the components after one round of parsing an integer from the string. PoorCachedColorParser does caching in the obvious way ("if it's there in cache keys ... else ..."), and CachedExceptionColorParser complies with the philosophy of "Fail early, fail often", which is quite interesting :D
but recent findings reveal that CachedColorParser is the right, fast, pythonic way.

What the test does is generate 100000 random color codes, picked from a range of 2816 colors (0xff0aff - 0xff0000 + 1). Obviously many colors are bound to get repeated, says the pigeonhole principle.
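The hit rate implied by that pigeonhole argument can be checked directly. A quick sketch, using the same value range and draw count as the test above (with 100000 draws over only 2816 values, nearly every value shows up, so roughly 97% of lookups should hit the cache):

```python
import random

# same range as the test above: 2816 distinct colors, 100000 draws
LOW, HIGH, N = 0xff0000, 0xff0aff, 100000

codes = ["#%06x" % random.randint(LOW, HIGH) for _ in range(N)]
distinct = len(set(codes))

# at most 2816 distinct codes can appear, so at least N - 2816 draws repeat
print("distinct codes: %d of %d draws" % (distinct, N))
print("cache hit rate: %.1f%%" % (100.0 * (N - distinct) / N))
```

So the cached parsers spend almost all their time on dictionary lookups, which is exactly what makes the lookup strategy matter so much.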

Here's what goes on in an average run on my machine -
abhishekmishra@mbp [~/code]> python
Poorly Cached - 340.23s
Exception Cached - 0.30s
Cached - 0.18s
Non Cached - 0.15s
Not Bitwise, Non Cached - 0.16s

"if value in self.cache.keys():" in PoorCachedColorParser gives you a thumbs down with a sucky performance, obviously not the right thing to do!!! (I was mistaken)

CachedExceptionColorParser gives a sweet 0.30s, "Fail early, Fail often" works :)
But wait, CachedColorParser goes even a bit further with just 0.18s.

NoBitColorParser is suckier than the non-cached PlainColorParser, which points out that string ops and parsing integers are a costly affair.
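That parsing cost is easy to measure in isolation. A rough micro-benchmark of the two styles (function names are mine; absolute timings will vary by machine, this only shows the comparison):

```python
from timeit import timeit

value = "#f00f00"

def bitwise(value):
    # one int() parse, then shifts and masks
    rgb = int(value[1:], 16)
    return (rgb >> 16 & 0xff, rgb >> 8 & 0xff, rgb & 0xff)

def slices(value):
    # three separate int() parses on string slices
    return (int(value[1:3], 16), int(value[3:5], 16), int(value[5:7], 16))

t_bit = timeit(lambda: bitwise(value), number=100000)
t_slc = timeit(lambda: slices(value), number=100000)
print("bitwise: %.3fs  slices: %.3fs" % (t_bit, t_slc))
```

Both return the same tuple; the difference is one integer parse versus three.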

So much for food to my sleeplessness. Oh I remember doing something similar in PyMos too :D.

More updates -

Looking at Dhananjay's code on BangPypers, I think I was too excited to throw in the idiom of fail fast in this place; it can be done in a cleaner way. So instead of -
try:
    return self.cache[value]
except KeyError:
    self.cache[value] = self.parse(value)
    return self.cache[value]

You could just write it in a much cleaner way -

if value in self.cache:
    return self.cache[value]
self.cache[value] = self.parse(value)
return self.cache[value]

So the issue was with cache.keys(), which now seems like an obviously slow and shitty way.
The new stats reveal that even the try/except fail-fast approach is not the right way.

Notice the almost-double time difference between the "try ... except" way and "if value in self.cache".
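The root cause: in Python 2, dict.keys() builds a fresh list on every call, so "value in cache.keys()" pays for a list allocation plus a linear scan per lookup, while "value in cache" is a single hash lookup. (Python 3's keys() returns a view with O(1) membership, so the trap is gone there.) A sketch that emulates the old cost with an explicit list:

```python
from timeit import timeit

# a cache the size of the color test's value range
cache = {"#%06x" % n: None for n in range(2816)}
probe = "#000afe"  # a key that is present in the cache

# hash lookup: O(1) per probe
t_dict = timeit(lambda: probe in cache, number=10000)
# emulates Python 2's cache.keys(): build a list, then scan it, O(n) per probe
t_list = timeit(lambda: probe in list(cache.keys()), number=10000)

print("in cache: %.4fs   in list(cache.keys()): %.4fs" % (t_dict, t_list))
```

The gap grows linearly with cache size, which is why the 340s blowup only shows once the cache holds thousands of entries.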

Lesson learnt :)
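For what it's worth, the winning pattern generalizes beyond color parsing: any pure function can be cached the same way with a tiny decorator. A sketch (my own generalization, not from the code above; functools.lru_cache arrived later, in Python 3.2):

```python
def memoize(fn):
    """Cache fn's results, keyed by its (hashable) positional arguments."""
    cache = {}
    def wrapper(*args):
        if args in cache:        # plain 'in' on the dict, never .keys()
            return cache[args]
        cache[args] = fn(*args)
        return cache[args]
    return wrapper

@memoize
def parse_color(value):
    rgb = int(value[1:], 16)
    return (rgb >> 16 & 0xff, rgb >> 8 & 0xff, rgb & 0xff)

print(parse_color("#f00f00"))  # (240, 15, 0); repeat calls hit the cache
```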


Wednesday, July 14, 2010

अपने घर के चोर (Thieves of our own house)

You find it to be a sorry ass pussy system 
that fails to respond to situations in time, 
you read up, watch stuff and blame on them for "not doing anything", 
but hardly would it strike to your conditioned imagination 
that they might just be so pre-occupied in doing something else 
that you could have never imagined. 

And yet you passive cunt, you take it all for granted
and sleep for the next day to come by,
expecting everything to be normal.

Who knows for everything,
that you quoted the system to be pussy about,
over your cup of tea and a morning newspaper,
there were these "अपने घर के चोर",
thieves, filling their pockets.

On a side note,
Now that things have become so well connected,
the systems are so well informed, fed by numerous channels,
and fingers just like the ones that typed out this piece of brainf*ck,
clicking out zillions of preferences,
we're not far from the point when
it would be possible to crunch these into
a whole new understanding, meaning, life.

Just that in this spiderweb,
analyzing nodes, reaching from one to another,
predicting their minds and way of thinking has become easier.