This talk will explore how free-threading unlocks practical new ways to build the next generation of Python programs. We introduce Parallelopedia: a multi-threaded, asyncio-based HTTP server that performs PyTorch model inference in parallel within a single Python process: first against a GPT-2 model we trained ourselves, then against more contemporary models via HuggingFace (GPT-OSS, Qwen, etc.).
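To give a flavor of the core pattern, here is a minimal sketch (not Parallelopedia's actual code; the model choice, worker count, and request handling are illustrative assumptions) of an asyncio HTTP server that hands PyTorch inference off to a thread pool. On a free-threaded build of CPython, the worker threads run forward passes truly in parallel; on a GIL build, they serialize.

```python
# Sketch: asyncio HTTP server offloading PyTorch inference to worker threads.
# Under free-threaded CPython (PEP 703), the forward passes run in parallel.
import asyncio
from concurrent.futures import ThreadPoolExecutor

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer  # assumed model choice

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

pool = ThreadPoolExecutor(max_workers=8)  # worker count is illustrative

def generate(prompt: str) -> str:
    # Runs in a worker thread; inference_mode() skips autograd bookkeeping.
    with torch.inference_mode():
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=32)
    return tokenizer.decode(out[0], skip_special_tokens=True)

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    request_line = (await reader.readline()).decode()
    prompt = request_line.split(" ")[1].lstrip("/") or "Hello"
    # Hand the CPU-heavy forward pass to the pool so the event loop stays free.
    loop = asyncio.get_running_loop()
    text = await loop.run_in_executor(pool, generate, prompt)
    body = text.encode()
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

The design point is that run_in_executor keeps the event loop responsive while threads, rather than subprocesses, do the heavy lifting, so all workers share a single copy of the model weights.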
Then, because why not, we straight up just include the entirety of Wikipedia in the same process. Granted, it's an XML dump from 2015, but it's a 50GB dump. Using some simple NumPy and datrie data structures (which we can now load super-fast in parallel thanks to free-threading, as sketched below), we expose a very responsive web interface for keyword-searching the entirety of Wikipedia, all within the same Python asyncio HTTP server that is doing PyTorch model inference in parallel.
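A minimal sketch of that parallel-load pattern follows; the shard filenames and index layout are assumptions for illustration, not the talk's actual data layout. The idea is that trie and NumPy deserialization across threads genuinely overlaps under free-threading, so startup costs roughly what the slowest shard costs.

```python
# Sketch: loading search-index shards in parallel with threads.
# Under free-threaded CPython the deserialization work truly overlaps.
from concurrent.futures import ThreadPoolExecutor

import datrie
import numpy as np

TRIE_SHARDS = [f"index/keywords_{i}.trie" for i in range(8)]  # hypothetical paths
OFFSET_SHARDS = [f"index/offsets_{i}.npy" for i in range(8)]  # hypothetical paths

def load_trie(path: str) -> datrie.Trie:
    # Deserializes a previously saved keyword trie from disk.
    return datrie.Trie.load(path)

def load_offsets(path: str) -> np.ndarray:
    # mmap_mode="r" maps byte offsets into the XML dump without copying.
    return np.load(path, mmap_mode="r")

with ThreadPoolExecutor(max_workers=16) as pool:
    tries = list(pool.map(load_trie, TRIE_SHARDS))
    offsets = list(pool.map(load_offsets, OFFSET_SHARDS))

# Prefix search is then a cheap in-memory lookup, e.g. tries[0].keys("pytho").
```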
Attendees can expect to leave the session with a handful of pragmatic techniques for leveraging Python free-threading today, as well as an appreciation for what's possible tomorrow.