A few months ago the Tech Team at dorthy.com was tasked with creating an in-house URL-shortening service that lets users share dorthy URLs over services like Twitter.
We settled quickly on two technologies that would allow for rapid development and impressive performance.
Redis was chosen for persistence. It is an extremely fast data structure store that provides simple replication and scaling behind a clean programmatic interface. We chose Twisted as our server framework because its simple event-driven networking architecture, built around the Deferred paradigm, lends itself to rapid development and excellent performance for short-duration, non-computationally-intensive tasks.
In this example we will illustrate how to create a simple protocol that allows requests for setting and getting URLs to be sent over the wire in JSON format. For a more robust system one may choose a remote protocol such as Google’s Protocol Buffers or Apache Thrift, which is well supported in the Twisted framework.
Setting up a basic Twisted server is beyond the scope of this tutorial; however, the entire project can be downloaded here.
Encoding and Decoding functions
First let’s take a look at the core functions of the system, which create a “tiny” URL like “http://dorthy.com/a/2Xg” from a “big” URL like “http://www.dorthy.com/discover/h22f1/go-skiing-in-park-city-utah”. We would like the system to work with whatever base “tiny” URL we choose, so we really only care about the last few characters of the tiny URL, in this case “2Xg”. We’d also like to generate as many of these URLs as possible using the fewest characters. To do this we’ll use a numbering system whose digits are 0-9 plus all upper- and lowercase letters; these should be fairly easy to type on all kinds of devices. That gives us 10 + 26 + 26 = 62 available characters.
We could store the “tiny” strings in Redis as keys with the big URLs as values, and keep track of the next key to be allocated in the server’s memory. But what if we one day want to deploy more URL-shortening servers and have them allocate new tiny URLs independently? For that reason we use Redis’s INCR command to allocate new numbers on the Redis side. INCR takes a key, increases its value by 1 each time it is called, and returns the result; because the operation is atomic, every client gets a unique, one-time-use key.
We will use a couple of functions in the included base62.py file to convert our “tiny” URLs to and from integers. NOTE: the GLYPHS variable is our list of 62 characters we will be using for our tiny URLs.
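def b62enc(number):
    # BASE and GLYPHS are module-level constants in base62.py
    try:
        number = int(number)
    except (ValueError, TypeError):
        return -1
    if number < 0:
        return -1
    if number == 0:
        # Without this case the recursion below would return an empty string
        return GLYPHS[0]
    def tobase(remain, result=''):
        """Recursive function to encode"""
        if remain == 0:
            return result
        else:
            # Floor division keeps remain an integer on Python 2 and 3
            return tobase(remain // BASE, GLYPHS[remain % BASE] + result)
    return tobase(number)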
def b62dec(decodeme):
    decodeme = str(decodeme)
    if not decodeme.isalnum():
        return -1
    power = 0
    result = 0
    # Walk the string from its least significant glyph
    for letter in decodeme[::-1]:
        number = GLYPHS.index(letter)
        result += number * (BASE ** power)
        power = power + 1
    return result
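To see the two directions working together, here is a minimal self-contained sketch. The article does not show how GLYPHS is ordered, so the alphabet below (digits, then lowercase, then uppercase) is an assumption; the real base62.py may order it differently and so produce different strings for the same number. The names encode and decode are illustrative, and the loops are iterative rather than recursive.

```python
import string

# Assumed alphabet order: digits, lowercase, uppercase (62 glyphs in total).
GLYPHS = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(GLYPHS)


def encode(number):
    """Convert a non-negative integer to its base-62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return GLYPHS[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(GLYPHS[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def decode(text):
    """Convert a base-62 string back to an integer."""
    result = 0
    for glyph in text:
        result = result * BASE + GLYPHS.index(glyph)
    return result
```

With this alphabet, three characters cover 62 ** 3 = 238,328 distinct keys, and every extra character multiplies that capacity by 62.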
Setting up our Protocol - Subclassing Twisted’s base Protocol
We’d like to set up a server now, to listen on the wire for some JSON encoded requests for getting and setting shortened urls and fire back responses. In this example we’ll simply subclass twisted.internet.protocol.Protocol and define our own handlers for some functions. The core functions will be detailed below and a few others can be seen in the example file.
When our server receives some data, the dataReceived function in our class will be called with the data as the parameter. We will decode that data and return an error if the data is incorrectly formatted or contains an invalid command. If the request looks solid, we’ll pass it on to a handler. This code is below.
def dataReceived(self, data):
    """Main handler of incoming data"""
    try:
        message = jdecode(data)
        command = message['request'].lower()
        url = message['url']
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed JSON, or a missing or non-string field
        self.transport.write(jencode(ERROR_MSG))
        self.transport.loseConnection()
        return
    if command not in COMMANDS:
        self.transport.write(jencode(ERROR_MSG))
        self.transport.loseConnection()
        return
    if command == 'get':
        self.getHandler(url)
    elif command == 'set':
        self.setHandler(url)
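The validation step can also be pulled out into a plain function that is easy to exercise without a running server. This is only a sketch: it uses the standard library json module rather than the jdecode helper above, and both the parse_request name and the contents of COMMANDS are assumptions made for illustration.

```python
import json

COMMANDS = ("get", "set")  # assumed contents of the COMMANDS constant


def parse_request(data):
    """Return (command, url) for a well-formed request, or None otherwise."""
    try:
        if isinstance(data, bytes):
            data = data.decode("utf-8")
        message = json.loads(data)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(message, dict):
        return None
    command = message.get("request")
    url = message.get("url")
    if not isinstance(command, str) or not isinstance(url, str):
        return None
    command = command.lower()
    if command not in COMMANDS:
        return None
    return command, url
```

A handler built this way only has to check for None before dispatching, which keeps the error path in one place.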
Connecting to and interacting with Redis in a Twisted Environment
This is where the magic starts. The handler functions are wrapped in the @defer.inlineCallbacks decorator. This decorator, along with the txredis client library, lets us write network calls that read like blocking code inside an event-driven environment like Twisted, without blocking the reactor or spawning another thread. These functions create a Redis client, connect it to the server, and issue commands using generator syntax: each yield hands control back to the reactor until the result is ready. After receiving results from their Redis actions they add their data to a “pre-fab” response object and send it along to the client.
@defer.inlineCallbacks
def setHandler(self, url):
    """Handles a 'set' request by allocating a new key for a big url
    and replying with its b62 encoded string"""
    clientCreator = protocol.ClientCreator(reactor, Redis, **REDIS_CONFIG)
    redis = yield clientCreator.connectTCP(REDIS_HOST, REDIS_PORT)
    # INCR hands out a unique integer, which becomes the storage key
    newkey = yield redis.incr(REDIS_INDEX)
    yield redis.set(newkey, url)
    # Copy the template so the shared OK_MSG dict is never mutated
    ret = dict(OK_MSG)
    ret['command'] = 'set'
    ret['tiny'] = b62enc(newkey)
    ret['big'] = url
    self.sendIt(jencode(ret))
@defer.inlineCallbacks
def getHandler(self, tiny):
    """Handles a 'get' request by looking up the big url
    for a given b62 encoded string"""
    key = b62dec(tiny)
    if key < 0:
        self.sendIt(jencode(ERROR_MSG))
        return
    clientCreator = protocol.ClientCreator(reactor, Redis, **REDIS_CONFIG)
    redis = yield clientCreator.connectTCP(REDIS_HOST, REDIS_PORT)
    url = yield redis.get(key)
    # Copy the template so the shared OK_MSG dict is never mutated
    ret = dict(OK_MSG)
    ret['command'] = 'get'
    # Echo back the tiny string the client asked for, not the decoded integer
    ret['tiny'] = tiny
    ret['big'] = url
    self.sendIt(jencode(ret))
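The allocation logic in these handlers can be followed without Twisted or a Redis server. The sketch below replaces Redis with a small in-memory stand-in (FakeStore, a hypothetical class offering only incr, set and get) and runs synchronously, so it shows the order of operations and nothing about the asynchronous behaviour. The counter key name and the alphabet order are also assumptions.

```python
import string

# Assumed alphabet order; the real GLYPHS may differ.
GLYPHS = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(GLYPHS)
COUNTER_KEY = "next.url.id"  # hypothetical name for the shared counter


class FakeStore:
    """In-memory stand-in for the three Redis commands the handlers use."""

    def __init__(self):
        self.data = {}

    def incr(self, key):
        # Like INCR: a missing key counts as 0, then add one and return it.
        self.data[key] = int(self.data.get(key, 0)) + 1
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def encode(number):
    out = ""
    while number:
        number, remainder = divmod(number, BASE)
        out = GLYPHS[remainder] + out
    return out or GLYPHS[0]


def decode(text):
    result = 0
    for glyph in text:
        result = result * BASE + GLYPHS.index(glyph)
    return result


def shorten(store, big_url):
    """Mirror setHandler: allocate a number, store the URL, return the tiny string."""
    new_key = store.incr(COUNTER_KEY)
    store.set(new_key, big_url)
    return encode(new_key)


def expand(store, tiny):
    """Mirror getHandler: decode the tiny string and look the URL up."""
    return store.get(decode(tiny))
```

Because the counter lives in the store rather than in any one server process, several servers sharing the same store would each receive a different number from incr and so never hand out the same tiny string.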
The rest of the wiring to expose these functions to the network and do some logging can be seen in the source files provided with this tutorial. To run the server, make sure you have the required libraries (twisted, simplejson, txredis) and a Redis server running on its default port, then simply execute the shorterurl.py file with Python. You can test your server using telnet as follows:
$ telnet 127.0.0.1 5757
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"request": "set", "url": "http://www.dorthy.com/discover/h22f1/go-skiing-in-park-city-utah"} <-- you send this
{"big": "http://www.dorthy.com/discover/h22f1/go-skiing-in-park-city-utah",
"command": "set", "tiny": "2Xg", "respose": "ok"} <-- you get this
$ telnet 127.0.0.1 5757
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"request": "get", "url": "2Xg"} <-- you send this
{"big": "http://www.dorthy.com/discover/h22f1/go-skiing-in-park-city-utah",
"command": "get", "tiny": "2Xg", "respose": "ok"} <-- you get this
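The same exchange can be scripted instead of typed into telnet. The following is a rough client sketch using only the standard library; the function names are made up for illustration, the port 5757 is taken from the telnet session above, and it assumes the server replies with exactly one JSON object per request.

```python
import json
import socket


def build_request(command, url):
    """Serialize a request in the {"request": ..., "url": ...} shape shown above."""
    return json.dumps({"request": command, "url": url}).encode("utf-8")


def send_request(command, url, host="127.0.0.1", port=5757, timeout=5.0):
    """Send one request and return the decoded JSON reply.

    Assumes the server writes a single JSON object in response.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(command, url))
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break  # server closed the connection
            chunks.append(chunk)
            try:
                return json.loads(b"".join(chunks).decode("utf-8"))
            except ValueError:
                continue  # reply is incomplete so far; keep reading
    # Raises ValueError if the server closed without sending valid JSON.
    return json.loads(b"".join(chunks).decode("utf-8"))
```

With the server running, send_request("set", "http://www.dorthy.com/discover/h22f1/go-skiing-in-park-city-utah") should return a dictionary like the reply in the first telnet session, and passing its "tiny" value to send_request("get", ...) should return the original URL.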
That is it for this tutorial, but the dorthy.com Tech Team would like to send many thanks to the folks contributing to Twisted, to Antirez and the rest of the Redis contributors, and to Dorian Raymer for the txredis library.