Asynchronous Python with gevent

Spamming web servers with HTTP GETs since the mid-2000s.

Python hasn't always made asynchronous programming easy. Other languages are arguably better suited to asynchronous work, but sometimes you just don't want to switch languages.

gevent has been a relatively straightforward way to do async networking in Python for at least a decade now.

Take, for example, a script that first sends a number of synchronous HTTP GET requests to a web server, then sends the same number of requests asynchronously. Running it shows a higher average request time for the asynchronous batch (due to overhead introduced by gevent), but a significantly lower total elapsed time:

$ ./get.py 50 https://google.com
Target: https://google.com
Request count: 50
Testing synchronous requests...
  Total elapsed time    15.10 seconds
  Average request time  0.30 seconds
Testing asynchronous requests (gevent)...
  Total elapsed time    2.14 seconds
  Average request time  1.39 seconds
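
The gap between the two totals comes from overlapping the waits rather than from any single request getting faster. Here is a minimal sketch of that effect, assuming gevent is installed. It is not get.py: fake_request, run_sequential, run_concurrent, and the 0.05-second delay are all made up for illustration, and gevent.sleep() stands in for network latency so nothing touches a real server.

```python
import time

import gevent


def fake_request(delay=0.05):
    """Hypothetical stand-in for an HTTP GET: wait, then report the time taken."""
    start = time.perf_counter()
    gevent.sleep(delay)  # cooperative sleep: other greenlets run while this one waits
    return time.perf_counter() - start


def run_sequential(count, delay=0.05):
    """Run the fake requests one after another; return (total, per-request times)."""
    start = time.perf_counter()
    times = [fake_request(delay) for _ in range(count)]
    return time.perf_counter() - start, times


def run_concurrent(count, delay=0.05):
    """Run the fake requests as greenlets; return (total, per-request times)."""
    start = time.perf_counter()
    greenlets = [gevent.spawn(fake_request, delay) for _ in range(count)]
    gevent.joinall(greenlets)
    return time.perf_counter() - start, [g.value for g in greenlets]


if __name__ == "__main__":
    for label, runner in (("sequential", run_sequential), ("concurrent", run_concurrent)):
        total, times = runner(10)
        print(f"{label}: total {total:.2f}s, average {sum(times) / len(times):.2f}s")
```

In this toy version the per-request average stays close to the delay in both modes, because a sleeping greenlet costs almost nothing. The higher average in the real run above comes from effects the simulation doesn't model.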

Here's a simplified version of the script:

#!/usr/bin/env python

from gevent import monkey, joinall, spawn
monkey.patch_all()

import requests

def get(url):
    return requests.get(url).status_code

running_greenlets = [spawn(get, 'https://google.com') for _ in range(50)]
finished_greenlets = joinall(running_greenlets)
results = [greenlet.value for greenlet in finished_greenlets]
print(results)

The script breaks down like so:

Line 3 - import monkey, joinall, and spawn from gevent.
Line 4 - immediately patch portions of the standard library with gevent-friendly replacements. This call to monkey.patch_all() should be done as early as possible in program execution.
Lines 8-9 - define a function that synchronously completes the required task and returns a result. In the full script it times an HTTP request/response cycle against a target URL; in this simplified version it just returns the response's status code.
Line 11 - call spawn() once for each target, passing it the synchronous function followed by any other arguments. Under the hood, each call to spawn() creates a Greenlet object that is scheduled to run the synchronous function.
Line 12 - call joinall() to wait for each greenlet to finish its execution of get().
Line 13 - for each greenlet returned by joinall(), access get()'s return value through the .value property.
Line 14 - do something with the results.
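
One thing the walkthrough above glosses over is failure: if the spawned function raises, the greenlet's .value stays None and the exception is stored on .exception instead. Here is a sketch of separating the two, assuming gevent is installed. flaky_task and collect are hypothetical names, and flaky_task is a stand-in for get() so the example runs without a network.

```python
import gevent


def flaky_task(n):
    """Hypothetical stand-in for get(): fails for odd inputs."""
    gevent.sleep(0.01)  # pretend to wait on the network
    if n % 2:
        raise ValueError(f"task {n} failed")
    return n * 10


def collect(greenlets):
    """Split finished greenlets into return values and exceptions."""
    results, errors = [], []
    for g in greenlets:
        if g.successful():  # finished without raising
            results.append(g.value)
        else:
            errors.append(g.exception)
    return results, errors


if __name__ == "__main__":
    greenlets = [gevent.spawn(flaky_task, n) for n in range(4)]
    gevent.joinall(greenlets)
    results, errors = collect(greenlets)
    print(results)
    print([str(e) for e in errors])
```

Iterating over the original list, rather than over whatever joinall() returns, keeps the results in spawn order. By default gevent also prints the traceback of each failed greenlet to stderr, so expect some noise when running this.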

This feels like a pretty straightforward and approachable way to do async without too much boilerplate or complexity.
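
If launching every greenlet at once is more spam than the target deserves, gevent.pool.Pool can cap how many run at a time. Here is a rough sketch, assuming gevent is installed. tracked_task and run_with_pool are made-up names, and the task only sleeps and counts how many copies of itself are active so the cap can be observed.

```python
import gevent
from gevent.pool import Pool


def tracked_task(n, state):
    """Hypothetical task that records how many copies of itself run at once."""
    state["running"] += 1
    state["peak"] = max(state["peak"], state["running"])
    gevent.sleep(0.01)  # stand-in for a network wait
    state["running"] -= 1
    return n


def run_with_pool(count, size):
    """Run `count` tasks with at most `size` greenlets active at a time."""
    state = {"running": 0, "peak": 0}
    pool = Pool(size)
    # pool.spawn() blocks while the pool is full, which is what enforces the cap
    greenlets = [pool.spawn(tracked_task, n, state) for n in range(count)]
    pool.join()  # wait for the remaining greenlets to finish
    return [g.value for g in greenlets], state["peak"]


if __name__ == "__main__":
    values, peak = run_with_pool(10, 3)
    print(values, "peak concurrency:", peak)
```

Swapping spawn() for pool.spawn() is the only change to the pattern; the .value access works the same way.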

Need more information? There is a useful introduction and the docs are excellent. Ditching the Task Queue for Gevent was an interesting read as well, although I'm not convinced moving all your task processing into the same process as your web server is a good thing.

I haven't tried out Python 3's asyncio yet, but that is on the to-do list.

Posted on: Mon 05 November 2018

Category: tech – Tags: tech