IT Detective Mission 0.1 documentation

The Hacker’s Guide to Scaling Python

Author: Julien Danjou
Date of finishing: 21/03/2018


I had already read The Hacker's Guide to Python, Julien Danjou's previous book, and in my view it is a must-read for any Python developer. That book focused on the common pitfalls Python developers run into and gave easy solutions to each of them. This new book focuses on scaling Python and covers a wide range of solutions.

The book is very pleasant to read, thanks to two main points:

  • the balance between theory, definitions, and code examples is perfect
  • the format is closer to a story, with lots of interviews

This book is one of my recommendations for your 2018 library. You can buy it online; it takes only a few seconds.

Table of contents

  1. Scaling?
  2. CPU Scaling
  3. Event Loops
  4. Functional Programming
  5. Queue-Based Distribution
  6. Designing for Failure
  7. Lock Management
  8. Group membership
  9. Building REST API
  10. Deploying on PaaS
  11. Testing Distributed Systems
  12. Caching
  13. Performance

Learned in the book

  • Keep everything stateless except the database
  • Consensus algorithms: Paxos / Raft
  • Functional programming is an easy way to stay stateless in your business application
  • Gabbi: a Python library for testing HTTP APIs, with tests written in YAML. wsgi-intercept is also a good tool for testing.
  • Threads: different parts of the same process run concurrently, on the same processor or not, as determined by the OS scheduler. In Python, threads mostly improve throughput when the work is I/O-bound. It is very rare to gain more than 120% with Python threads, mainly because of the GIL.
  • Use a requests Session to avoid repeating the TCP handshake on every request
  • wrk and siege as HTTP benchmarking tools
  • HTTP streaming is possible with Server-Sent Events in HTML5, but it has also been available since HTTP 1.1 with Transfer-Encoding: chunked, and with the WebSocket protocol. LISTEN / NOTIFY in PostgreSQL can be used to stream data.
  • API best practices are available in the HTTP RFCs, HATEOAS, the OpenAPI Initiative, and the OpenStack API Working Group Guidelines
  • The Linux perf tool is very handy for benchmarking (perf stat, perf record, perf report …)
  • dogpile.cache: a caching library that can be used to cache SQL queries
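
To make the point about threads and IO more concrete, here is a small sketch of my own (it is not taken from the book). It assumes a hypothetical `fake_io_call` function that stands in for a network or disk call by sleeping, which releases the GIL just like real blocking IO does. The function names and the 0.1 second delay are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(item):
    """Hypothetical stand-in for a blocking IO call (network, disk...)."""
    time.sleep(0.1)  # sleeping releases the GIL, like real blocking IO
    return item * 2


def run_serial(items):
    """Call fake_io_call on each item, one after the other."""
    return [fake_io_call(item) for item in items]


def run_threaded(items, workers=8):
    """Call fake_io_call on each item using a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs
        return list(pool.map(fake_io_call, items))


def timed(func, *args):
    """Return (result, elapsed seconds) for a call to func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    items = list(range(8))
    _, serial_time = timed(run_serial, items)
    _, threaded_time = timed(run_threaded, items)
    print(f"serial:   {serial_time:.2f}s")
    print(f"threaded: {threaded_time:.2f}s")
```

With waits like these, the threaded version should finish much faster because the threads spend their time waiting rather than computing. If `fake_io_call` did CPU-heavy work instead, the GIL would keep the threads from running Python code in parallel and the gain would mostly disappear.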

Some rules to keep in mind:

  • Don’t use a framework if you are starting from scratch. P.216
  • Use middleware to handle authentication, logging, encoding, exception catching… P.217
  • Meter everything […] Do not trust averages, trust the percentiles and volumes comparison over time. P.175
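
The last rule is easy to illustrate with a few lines of Python. This is my own minimal sketch, not code from the book: the `summarize` helper and the latency numbers are made up for the example. It uses `statistics.quantiles` from the standard library (Python 3.8+) to compare the average with a few percentiles.

```python
import statistics


def summarize(latencies_ms):
    """Return the mean and a few percentiles of a list of latencies."""
    # n=100 yields 99 cut points: cuts[0] is p1, cuts[98] is p99
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


if __name__ == "__main__":
    # Made-up sample: 98 fast requests and 2 very slow ones
    latencies_ms = [10] * 98 + [1000] * 2
    for name, value in summarize(latencies_ms).items():
        print(f"{name}: {value:.1f} ms")
```

In this made-up sample the mean lands far from both the typical request and the slow ones, so it describes nobody's actual experience. The percentiles show the two populations separately, which is why they are the more trustworthy numbers to track over time.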