This talk will explore how free-threading unlocks practical new ways to build the next generation of Python programs. We introduce Parallelopedia: a multi-threaded, asyncio-based HTTP server that performs PyTorch model inference in parallel within a single Python process: first against a GPT-2 model we trained ourselves, then against more contemporary models via HuggingFace (GPT-OSS, Qwen, etc.).
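To give a flavor of the core pattern, here is a minimal sketch (not Parallelopedia's actual code; the model choice, worker count, and request handling are illustrative assumptions) of an asyncio HTTP server that hands PyTorch inference off to a thread pool. On a free-threaded build of CPython, the worker threads run forward passes truly in parallel; on a GIL build, they serialize.

```python
# Sketch: asyncio HTTP server offloading PyTorch inference to worker threads.
# Under free-threaded CPython (PEP 703), the forward passes run in parallel.
import asyncio
from concurrent.futures import ThreadPoolExecutor

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer  # assumed model choice

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

pool = ThreadPoolExecutor(max_workers=8)  # worker count is illustrative

def generate(prompt: str) -> str:
    # Runs in a worker thread; inference_mode() skips autograd bookkeeping.
    with torch.inference_mode():
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=32)
    return tokenizer.decode(out[0], skip_special_tokens=True)

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    request_line = (await reader.readline()).decode()
    prompt = request_line.split(" ")[1].lstrip("/") or "Hello"
    # Hand the CPU-heavy forward pass to the pool so the event loop stays free.
    loop = asyncio.get_running_loop()
    text = await loop.run_in_executor(pool, generate, prompt)
    body = text.encode()
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

The design point is that run_in_executor keeps the event loop responsive while threads, rather than subprocesses, do the heavy lifting, so all workers share a single copy of the model weights.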
Then, because why not, we straight up just include the entirety of Wikipedia in the same process. Granted, it's an XML dump from 2015, but it's a 50GB dump. Using some simple NumPy and datrie data structures (which we can now load super-fast in parallel thanks to free-threading, as sketched below), we expose a very responsive web interface for keyword-searching the entirety of Wikipedia, all within the same Python asyncio HTTP server that is doing PyTorch model inference in parallel.
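A minimal sketch of that parallel-load pattern follows; the shard filenames and index layout are assumptions for illustration, not the talk's actual data layout. The idea is that trie and NumPy deserialization across threads genuinely overlaps under free-threading, so startup costs roughly what the slowest shard costs.

```python
# Sketch: loading search-index shards in parallel with threads.
# Under free-threaded CPython the deserialization work truly overlaps.
from concurrent.futures import ThreadPoolExecutor

import datrie
import numpy as np

TRIE_SHARDS = [f"index/keywords_{i}.trie" for i in range(8)]  # hypothetical paths
OFFSET_SHARDS = [f"index/offsets_{i}.npy" for i in range(8)]  # hypothetical paths

def load_trie(path: str) -> datrie.Trie:
    # Deserializes a previously saved keyword trie from disk.
    return datrie.Trie.load(path)

def load_offsets(path: str) -> np.ndarray:
    # mmap_mode="r" maps byte offsets into the XML dump without copying.
    return np.load(path, mmap_mode="r")

with ThreadPoolExecutor(max_workers=16) as pool:
    tries = list(pool.map(load_trie, TRIE_SHARDS))
    offsets = list(pool.map(load_offsets, OFFSET_SHARDS))

# Prefix search is then a cheap in-memory lookup, e.g. tries[0].keys("pytho").
```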
Attendees can expect to leave the session with a handful of pragmatic techniques for leveraging Python free-threading today, as well as an appreciation for what's possible tomorrow.