demucs-onnx
The canonical way to run and export HT-Demucs / Demucs music source separation as ONNX. Pure numpy + onnxruntime at inference (no PyTorch), a one-liner export pipeline that fixes the four known blockers in torch.onnx.export, and a copy-pasteable onnxruntime-web path for the browser. Powers the StemSplit production stack.
Quick start
pip install 'demucs-onnx[mp3]'
# One command -> karaoke instrumental as a shareable MP3.
demucs-onnx separate song.mp3 out/ --karaoke --mp3
# writes out/karaoke.mp3 (drums + bass + other, vocals removed)
# Or every stem, automatically picking the best GPU on this host.
demucs-onnx separate song.mp3 out/
# 6-stem mode with guitar + piano (new in v0.3):
demucs-onnx separate song.mp3 out/ --model htdemucs_6s
# Scaffold a browser demo and try it locally:
demucs-onnx browser-demo /tmp/browser_demo
cd /tmp/browser_demo && python -m http.server 8080
What you get
- A pure-numpy + onnxruntime inference path that runs HT-Demucs FT, htdemucs (single 4-stem), and htdemucs_6s (6-stem with guitar + piano) with no PyTorch dependency. Install footprint drops from ~2 GB (PyTorch) to ~50 MB (onnxruntime).
- A one-call ONNX export pipeline, export_to_onnx("htdemucs_ft", ...), that applies all four blocker patches and parity-checks the output before writing.
- Independent, grep-able patch modules (stft.py, mha.py, pos_embed.py, segment.py) so you can lift any single fix into a different project.
- onnxruntime-web recipes for Vite, Webpack, esbuild, Next.js, and Rollup, plus a zero-build vanilla HTML demo and a React + Vite demo emitted by demucs-onnx browser-demo.
- 9 ONNX model repos on Hugging Face, auto-downloaded on first use.
The 4 blockers
For the entire history of the demucs
repo (2021 – 2026) nobody on PyPI has shipped working ONNX export
tooling for HT-Demucs. Searching GitHub turns up half a dozen abandoned
forks, all stuck on one of four blockers, all without a working .onnx
file to show for it. The official demucs README has no mention of ONNX.
We solved it. See Export your own for the full write-up with code references.
| # | Blocker | Fix |
|---|---|---|
| 1 | torch.stft complex64 outputs | Conv1d with sin/cos kernels (RealSTFT / RealISTFT) |
| 2 | model.segment = Fraction(39, 5) | Coerce to float |
| 3 | random.randrange in pos-embedding | Hardcode shift=0 |
| 4 | aten::_native_multi_head_attention has no ONNX symbolic | Drop-in MHA forward built from Linear / bmm / softmax |
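To make blocker 1 concrete, here is a minimal sketch of the idea behind a real-valued STFT: correlate each frame with fixed cosine and sine kernels and carry the real and imaginary parts as two ordinary real tensors, so no complex64 value ever appears in the graph. This is an illustration in plain Python, not the package's RealSTFT module. The helper names (real_dft_kernels, real_stft, complex_stft) are hypothetical, and the sketch leaves out the windowing, padding, and normalization that a drop-in replacement for torch.stft would also need.

```python
import cmath
import math


def real_dft_kernels(n_fft):
    """Build cos/sin kernels, one row per frequency bin 0..n_fft//2."""
    bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    # Negative sine, because exp(-j*theta) = cos(theta) - j*sin(theta).
    sin_k = [[-math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    return cos_k, sin_k


def real_stft(signal, n_fft, hop):
    """Strided correlation of the signal with every kernel row.

    This is the arithmetic a Conv1d with stride=hop performs when its
    weights are the fixed kernels above. Returns (real, imag), each
    indexed [frame][bin]. No window and no padding (simplification).
    """
    cos_k, sin_k = real_dft_kernels(n_fft)
    real, imag = [], []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft]
        real.append([sum(w * x for w, x in zip(row, frame)) for row in cos_k])
        imag.append([sum(w * x for w, x in zip(row, frame)) for row in sin_k])
    return real, imag


def complex_stft(signal, n_fft, hop):
    """Reference: direct complex DFT of each frame."""
    out = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft]
        out.append([
            sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, x in enumerate(frame))
            for k in range(n_fft // 2 + 1)
        ])
    return out


# Toy signal: two sinusoids, 64 samples.
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(64)]
real, imag = real_stft(signal, n_fft=16, hop=4)
ref = complex_stft(signal, n_fft=16, hop=4)

# Largest disagreement between the real-valued path and the complex one.
max_err = max(
    abs(complex(r, i) - c)
    for rr, ii, cc in zip(real, imag, ref)
    for r, i, c in zip(rr, ii, cc)
)
print(f"frames={len(real)} bins={len(real[0])} max_err={max_err:.2e}")
```

The two paths agree to floating-point rounding, which is why the real-valued version can stand in for the complex one at export time.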
Net result, end-to-end parity vs PyTorch fp32:
| Model | max abs diff (random 1×2×343980) |
|---|---|
| htdemucs | 6.62 × 10⁻⁴ |
| htdemucs_6s | 2.42 × 10⁻⁴ |
| htdemucs_ft drums | 1.63 × 10⁻⁴ |
| htdemucs_ft bass | 1.42 × 10⁻⁴ |
| htdemucs_ft other | 1.71 × 10⁻⁴ |
| htdemucs_ft vocals | 1.55 × 10⁻⁴ |
…and the ONNX graph runs in onnxruntime CPU at 1.31× the speed of
PyTorch CPU on Apple M4 Pro (no GPU).
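For readers who want to reproduce this kind of parity number, the sketch below shows the two pieces of arithmetic involved. It assumes a 44.1 kHz sample rate, under which the Fraction(39, 5) segment from the blockers table works out to exactly 343980 samples; that appears to be where the last dimension of the 1×2×343980 test input comes from (the other two being, presumably, batch 1 and stereo 2). The max_abs_diff helper is hypothetical, and the two lists are random stand-ins, not real model outputs.

```python
import random
from fractions import Fraction

SAMPLE_RATE = 44_100        # assumption: the models operate at 44.1 kHz
SEGMENT = Fraction(39, 5)   # model.segment, as listed in the blockers table

# Blocker 2's fix is a plain coercion: Fraction -> float.
segment_seconds = float(SEGMENT)

# Fraction arithmetic keeps the sample count exact (no float rounding).
num_samples = int(SEGMENT * SAMPLE_RATE)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Stand-ins for the outputs of two backends on the same input:
# a reference signal and a copy perturbed by at most 1e-4 per sample.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
candidate = [x + rng.uniform(-1e-4, 1e-4) for x in reference]

diff = max_abs_diff(reference, candidate)
print(f"segment={segment_seconds}s samples={num_samples} diff={diff:.2e}")
```

In a real check, reference and candidate would be the flattened PyTorch and onnxruntime outputs for the same random input.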
Where next
- Install — pip extras for MP3 output and the export pipeline.
- CLI reference — every flag, with copyable examples.
- Python API — autogenerated from docstrings.
- Browser support — Vite, Webpack, vanilla HTML, React.
- Models — every model in the registry with size, speed, quality.
- Export your own — the canonical reference for the four blockers, with full code examples.
- Comparison — honest comparison vs spleeter, raw demucs, and cloud APIs.
Skip the infrastructure
Don't want to bundle a 316 MB model in your app, manage a GPU pool, or write overlap-add chunking? Use the StemSplit API instead — same models under the hood, hosted for you, with credits and a dashboard.