
demucs-onnx

The canonical way to run and export HT-Demucs / Demucs music source separation as ONNX. Pure numpy + onnxruntime at inference (no PyTorch), a one-liner export pipeline that fixes the four known blockers in torch.onnx.export, and a copy-pasteable onnxruntime-web path for the browser. Powers the StemSplit production stack.

Install via pip · View on GitHub · View on PyPI · Hugging Face models

Quick start

pip install 'demucs-onnx[mp3]'

# One command -> karaoke instrumental as a shareable MP3.
demucs-onnx separate song.mp3 out/ --karaoke --mp3
# writes out/karaoke.mp3  (drums + bass + other, vocals removed)

# Or every stem, automatically picking the best GPU on this host.
demucs-onnx separate song.mp3 out/

# 6-stem mode with guitar + piano (new in v0.3):
demucs-onnx separate song.mp3 out/ --model htdemucs_6s

# Scaffold a browser demo and try it locally:
demucs-onnx browser-demo /tmp/browser_demo
cd /tmp/browser_demo && python -m http.server 8080

What you get

  • A pure-numpy + onnxruntime inference path that runs HT-Demucs FT, htdemucs (single 4-stem), and htdemucs_6s (6-stem with guitar + piano) with no PyTorch dependency. Install footprint drops from ~2 GB (PyTorch) to ~50 MB (onnxruntime).
  • A one-call ONNX export pipeline, export_to_onnx("htdemucs_ft", ...), that applies all four blocker patches and parity-checks the output before writing.
  • Independent, grep-able patch modules (stft.py, mha.py, pos_embed.py, segment.py) so you can lift any single fix into a different project.
  • onnxruntime-web recipes for Vite, Webpack, esbuild, Next.js, and Rollup, plus a zero-build vanilla HTML demo and a React + Vite demo emitted by demucs-onnx browser-demo.
  • 9 ONNX model repos on Hugging Face, auto-downloaded on first use.

The 4 blockers

For the entire history of the demucs repo (2021 – 2026) nobody on PyPI has shipped working ONNX export tooling for HT-Demucs. Searching GitHub turns up half a dozen abandoned forks, all stuck on one of four blockers, all without a working .onnx file to show for it. The official demucs README has no mention of ONNX.

We solved it. See Export your own for the full write-up with code references.

| # | Blocker | Fix |
|---|---------|-----|
| 1 | torch.stft complex64 outputs | Conv1d with sin/cos kernels (RealSTFT / RealISTFT) |
| 2 | model.segment = Fraction(39, 5) | Coerce to float |
| 3 | random.randrange in pos-embedding | Hardcode shift=0 |
| 4 | aten::_native_multi_head_attention has no ONNX symbolic | Drop-in MHA forward built from Linear / bmm / softmax |
Net result, end-to-end parity vs PyTorch fp32:

| Model | Max abs diff (random 1×2×343980) |
|-------|----------------------------------|
| htdemucs | 6.62 × 10⁻⁴ |
| htdemucs_6s | 2.42 × 10⁻⁴ |
| htdemucs_ft drums | 1.63 × 10⁻⁴ |
| htdemucs_ft bass | 1.42 × 10⁻⁴ |
| htdemucs_ft other | 1.71 × 10⁻⁴ |
| htdemucs_ft vocals | 1.55 × 10⁻⁴ |

…and the ONNX graph runs in onnxruntime CPU at 1.31× the speed of PyTorch CPU on Apple M4 Pro (no GPU).
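For readers unfamiliar with the parity metric, the following sketch shows how a maximum absolute difference can be computed, and how the 343980-sample test length relates to the Fraction(39, 5) segment from blocker 2. Two things here are assumptions rather than facts from this page: the 44.1 kHz sample rate, and the reading of 1×2×343980 as batch × channels × samples. The helper name max_abs_diff is hypothetical, and the two lists are random stand-ins for the PyTorch and ONNX outputs, not real model results.

```python
from fractions import Fraction
import random

SAMPLE_RATE = 44100                # assumption: 44.1 kHz audio
SEGMENT_SECONDS = Fraction(39, 5)  # segment value quoted in blocker 2

# Exact rational arithmetic: 39/5 s * 44100 Hz gives a whole number of samples.
segment_samples = int(SEGMENT_SECONDS * SAMPLE_RATE)

# The coercion from blocker 2: a plain float instead of a Fraction.
segment_seconds_float = float(SEGMENT_SECONDS)


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of samples")
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Stand-in data: a random reference and a slightly perturbed copy of it.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
candidate = [r + rng.uniform(-1e-4, 1e-4) for r in reference]

diff = max_abs_diff(reference, candidate)
print(segment_samples, segment_seconds_float, f"{diff:.2e}")
```

A maximum (rather than mean) difference is a strict metric: a single badly mismatched sample anywhere in the output is enough to raise it.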

Where next

  • Install — pip extras for MP3 output and the export pipeline.
  • CLI reference — every flag, with copyable examples.
  • Python API — autogenerated from docstrings.
  • Browser support — Vite, Webpack, vanilla HTML, React.
  • Models — every model in the registry with size, speed, quality.
  • Export your own — the canonical reference for the four blockers, with full code examples.
  • Comparison — honest comparison vs spleeter, raw demucs, and cloud APIs.

Skip the infrastructure

Don't want to bundle a 316 MB model in your app, manage a GPU pool, or write overlap-add chunking? Use the StemSplit API instead — same models under the hood, hosted for you, with credits and a dashboard.