IdeaMonk

thoughts, ideas, code and other things...

Wednesday, September 16, 2009

Who stole all that bandwidth? or was it a design accident, was it a python ... ?



That's what happened to Web2Hunter yesterday: downtime due to an exceeded quota. I never thought 1 GB a day could ever be exceeded... but now I actually have to think about saving every possible bit. Here is my explanation for the downtime of Web2Hunter -



  1. I post about Web2Hunter on HN, which drives a good deal of traffic to Web2Hunter.
  2. Many users throw requests simultaneously, thanks to the 5-second ajax loop which never ends, for the sake of Web2Hunter's simplicity/chaos or randomness.
  3. Web2Hunter validates domains using Google; this contributes very little to the outgoing bandwidth, so no problem so far.
  4. Google gives back HTML, each response roughly 3-4 KB in size. Given the surge in traffic and the 5-second ajax loop, this means a lot of incoming bandwidth (see the sketch after this list).
  5. Again, the response to the ajax request is minute (a domain name), so it has nothing much to do with outgoing bandwidth quotas.
  6. So, even though things are ajaxified, the poor design used to suck up a lot of bandwidth internally.
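To make the bandwidth math concrete, here is a minimal sketch of what the old Google-based check might have looked like on App Engine. The query URL, the parsing heuristic and the function name are my guesses for illustration, not the actual Web2Hunter code - the point is simply that every check pulls a whole HTML page into the app.

    from google.appengine.api import urlfetch
    import urllib

    def looks_available_via_google(domain):
        # Old approach (sketch): fetch a full Google results page for the domain,
        # roughly 3-4 KB of *incoming* bandwidth for every single check.
        url = 'http://www.google.com/search?q=' + urllib.quote(domain)
        result = urlfetch.fetch(url)
        # Crude, purely illustrative heuristic: domain absent from the HTML ~ unused.
        return result.status_code == 200 and domain not in result.content

With the 5-second ajax loop, each open browser tab triggers about 17,000 such checks a day - roughly 60 MB of incoming bandwidth per tab - so it doesn't take many visitors to chew through a 1 GB/day quota.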
Solutions - things have been working fine since, and the incoming bandwidth quota isn't climbing every 5 minutes as it used to do yesterday.
  • The Ajaxwhois API came out as a life saver. It returns a few bytes of JSON instead of a chunk of HTML, and I don't even need to parse the JSON, for it is simple enough to make out from its contents.
  • It could be possible for someone to misuse/proxy the domain-finding URL that the ajax loop fetches every 5 seconds. To counter that I've tried to filter the requesting client by Referer information. This should reduce the chances of hotlinking/sucking -

    # only serve the suggestion when the Referer header points back to the site itself
    if self.request.headers.get('Referer') is not None:
        if self.request.headers.get('Referer').find('web2hunter.appspot.com') != -1:
            # show your content
            pass  # ...

  • Images are to be converted to JPEG - obviously every bit counts, and we really don't need fancy PNGs when there's no need for translucency (a quick conversion sketch follows this list).
  • Another one yet to be implemented - since Web2Hunter does its magic of random combination to get you cool names, one good idea to save some more bandwidth would be to store all the domains that turn out to be unavailable, so we never look them up again. Hmmm... but that means we lose out on any domain marked unavailable today if it expires tomorrow. Hmm... I guess this idea would also need monthly truncation/refreshing of the stored unavailable list. Not good for maintenance... (see the caching sketch after this list)
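For the image point, the conversion is a one-off job done before deploying. A minimal sketch using PIL (the file names are just placeholders):

    from PIL import Image

    # Drop the alpha channel and re-save as JPEG; quality 85 is usually
    # visually fine and much smaller than a 24-bit PNG for screenshots.
    img = Image.open('screenshot.png').convert('RGB')
    img.save('screenshot.jpg', 'JPEG', quality=85)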
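As for the unavailable-domain list, one way to avoid the monthly truncation headache would be to cache the negative results in App Engine's memcache with an expiry, so stale entries fall out on their own. This is only a sketch of the idea - the key prefix, the 30-day window and the function names are mine, not anything Web2Hunter actually does:

    from google.appengine.api import memcache

    THIRTY_DAYS = 30 * 24 * 3600  # entries expire by themselves, no manual cleanup

    def is_known_unavailable(domain):
        # True if this domain was seen as taken within the last 30 days.
        return memcache.get('taken:' + domain) is not None

    def remember_unavailable(domain):
        # Cache the negative result; memcache drops it after THIRTY_DAYS,
        # so a domain that frees up later does get re-checked eventually.
        memcache.set('taken:' + domain, 1, time=THIRTY_DAYS)

That trades up to a month of staleness for skipping the lookup entirely on repeat hits.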
So, till then I hope Web2Hunter runs smoothly on the Engine without any glitches. So far, even a one-hour test has not been able to push the incoming bandwidth past 12% :)
