demucs-onnx

The canonical way to export HT-Demucs / Demucs music source separation models to ONNX and run them. Inference is pure numpy + onnxruntime (no PyTorch), export is a one-liner pipeline that fixes the four known blockers in torch.onnx.export, and the browser gets a copy-pasteable onnxruntime-web path. Powers the StemSplit production stack.

Install via pip · View on GitHub · View on PyPI · Hugging Face models

Quick start

pip install 'demucs-onnx[mp3]'

# One command -> karaoke instrumental as a shareable MP3.
demucs-onnx separate song.mp3 out/ --karaoke --mp3
# writes out/karaoke.mp3  (drums + bass + other, vocals removed)

# Or every stem, automatically picking the best GPU on this host.
demucs-onnx separate song.mp3 out/

# 6-stem mode with guitar + piano (new in v0.3):
demucs-onnx separate song.mp3 out/ --model htdemucs_6s

# Scaffold a browser demo and try it locally:
demucs-onnx browser-demo /tmp/browser_demo
cd /tmp/browser_demo && python -m http.server 8080

What you get

A pure-numpy + onnxruntime inference path that runs HT-Demucs FT, htdemucs (single 4-stem), and htdemucs_6s (6-stem with guitar + piano) with no PyTorch dependency. Install footprint drops from ~2 GB (PyTorch) to ~50 MB (onnxruntime).
A one-call ONNX export pipeline — export_to_onnx("htdemucs_ft", ...) — that applies all four blocker patches and parity-checks the output before writing.
Independent, grep-able patch modules (stft.py, mha.py, pos_embed.py, segment.py) so you can lift any single fix into a different project.
onnxruntime-web recipes for Vite, Webpack, esbuild, Next.js, and Rollup, plus a zero-build vanilla HTML demo and a React + Vite demo emitted by demucs-onnx browser-demo.
Nine ONNX model repositories on Hugging Face, downloaded automatically on first use.

The 4 blockers

For the entire history of the demucs repo (2021 – 2026), no package on PyPI has shipped working ONNX export tooling for HT-Demucs. Searching GitHub turns up half a dozen abandoned forks, each stuck on one of four blockers and none with a working .onnx file to show for it. The official demucs README does not mention ONNX.

We solved all four. See Export your own for the full write-up with code references.

#	Blocker	Fix
1	`torch.stft` complex64 outputs	`Conv1d` with sin/cos kernels — `RealSTFT` / `RealISTFT`
2	`model.segment = Fraction(39, 5)`	Coerce to `float`
3	`random.randrange` in pos-embedding	Hardcode `shift=0`
4	`aten::_native_multi_head_attention` has no ONNX symbolic	Drop-in MHA forward built from `Linear` / `bmm` / `softmax`
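
The fix for blocker 1 rests on a simple fact: each DFT bin is a dot product of the frame with one cosine row and one sine row, so a strided convolution with fixed real kernels yields the real and imaginary parts without ever creating a complex tensor. The pure-Python sketch below illustrates only that idea. It is not the library's `RealSTFT`: it omits the analysis window and padding, and the names `sincos_kernels`, `real_stft`, and `complex_dft_frame` are hypothetical.

```python
import cmath
import math


def sincos_kernels(n_fft):
    """Build one cosine row and one (negated) sine row per frequency bin."""
    bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    sin_k = [[-math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    return cos_k, sin_k


def real_stft(signal, n_fft, hop):
    """Slide the kernels over the signal with stride `hop`.

    This is what a Conv1d with kernel_size=n_fft and stride=hop computes.
    Returns a list of (real_parts, imag_parts) tuples, one per frame.
    """
    cos_k, sin_k = sincos_kernels(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft]
        real = [sum(x * c for x, c in zip(frame, row)) for row in cos_k]
        imag = [sum(x * s for x, s in zip(frame, row)) for row in sin_k]
        frames.append((real, imag))
    return frames


def complex_dft_frame(frame):
    """Reference: the same bins computed with complex exponentials."""
    n_fft = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, x in enumerate(frame))
            for k in range(n_fft // 2 + 1)]


# Toy signal; real audio would be far longer.
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
frames = real_stft(signal, n_fft=16, hop=4)

# Compare the first frame against the complex-valued reference.
reference = complex_dft_frame(signal[:16])
first_real, first_imag = frames[0]
max_err = max(abs(complex(r, i) - z)
              for r, i, z in zip(first_real, first_imag, reference))
```

Here `max_err` stays at floating-point round-off level, which is why replacing the complex op with two real-valued kernels costs no accuracy. The inverse transform follows the same pattern with transposed kernels.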

Net result, end-to-end parity vs PyTorch fp32:

Model	Max abs diff (random input, shape 1×2×343980)
`htdemucs`	6.62 × 10⁻⁴
`htdemucs_6s`	2.42 × 10⁻⁴
`htdemucs_ft` drums	1.63 × 10⁻⁴
`htdemucs_ft` bass	1.42 × 10⁻⁴
`htdemucs_ft` other	1.71 × 10⁻⁴
`htdemucs_ft` vocals	1.55 × 10⁻⁴
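
As a rough illustration of how a figure like those above is obtained, the sketch below computes a maximum absolute difference between two outputs and shows where the input length 343980 plausibly comes from. The 44.1 kHz sample rate is an assumption not stated on this page, and `max_abs_diff` is a hypothetical helper, not part of the package. The two lists are synthetic stand-ins for PyTorch and ONNX outputs.

```python
from fractions import Fraction
import random

# Assumption: the models operate on 44.1 kHz audio.
SAMPLE_RATE = 44_100

# Segment length from blocker 2 in the table above.
segment = Fraction(39, 5)
segment_seconds = float(segment)                # coerced to a plain float
segment_samples = round(segment * SAMPLE_RATE)  # exact rational product


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Synthetic stand-ins for a reference output and a slightly perturbed one.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
candidate = [x + rng.uniform(-1e-4, 1e-4) for x in reference]

diff = max_abs_diff(reference, candidate)
```

Under that sample-rate assumption, `segment_samples` matches the 343980-sample input length in the table header. A max-abs metric is a strict choice: a single outlier sample sets the reported value, so it bounds the worst case rather than the average error.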

The ONNX graph also runs on onnxruntime CPU at 1.31× the speed of PyTorch CPU, measured on an Apple M4 Pro (no GPU).

Where next

Install — pip extras for MP3 output and the export pipeline.
CLI reference — every flag, with copyable examples.
Python API — autogenerated from docstrings.
Browser support — Vite, Webpack, vanilla HTML, React.
Models — every model in the registry with size, speed, quality.
Export your own — the canonical reference for the four blockers, with full code examples.
Comparison — honest comparison vs spleeter, raw demucs, and cloud APIs.

Skip the infrastructure

Don't want to bundle a 316 MB model in your app, manage a GPU pool, or write overlap-add chunking? Use the StemSplit API instead — same models under the hood, hosted for you, with credits and a dashboard.