Models

demucs-onnx v0.3.0 publishes 9 ONNX repos under the StemSplitio Hugging Face org. All auto-download on first use; nothing else is required.

At a glance

| Alias | Kind | Stems | fp32 size | fp16weights size | Best for |
|---|---|---|---|---|---|
| htdemucs_ft | 4-specialist bag | drums, bass, other, vocals | 1.26 GB (4×316 MB) | 660 MB (4×166 MB) | Best SDR. Default. |
| htdemucs | Single file | drums, bass, other, vocals | 316 MB | 166 MB | Fastest 4-stem startup, ~30% faster than the bag. |
| htdemucs_6s | Single file | drums, bass, other, vocals, guitar, piano | 258 MB | 136 MB | Need guitar / piano stems. The only ONNX export of the 6-stem variant. |
| htdemucs_ft_drums | Single specialist | drums | 316 MB | 166 MB | Drum extraction, beat transcription. |
| htdemucs_ft_bass | Single specialist | bass | 316 MB | 166 MB | Bassline isolation, mix rebalancing. |
| htdemucs_ft_other | Single specialist | other | 316 MB | 166 MB | Karaoke instrumental (pair with vocals). |
| htdemucs_ft_vocals | Single specialist | vocals | 316 MB | 166 MB | #1 open-source vocal SDR: vocal removal, acapella, karaoke. |

fp16weights files store weights in fp16 and compute in fp32 at runtime: same RAM, same speed, and a max abs diff of ~6 × 10⁻⁵ vs the fp32 weights. Use them.
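To make the storage idea concrete, here is a minimal sketch of what "fp16 on disk, fp32 at runtime" does to a single weight: each value is rounded to half precision once and then widened again before any arithmetic. It uses only the Python standard library and random stand-in values, not real model weights, and it is not the package's loader. It also does not reproduce the ~6 × 10⁻⁵ figure quoted above, which depends on the actual weights.

```python
import random
import struct


def to_fp16_and_back(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value, then widen it again."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


random.seed(0)
# Stand-in weights in [-1, 1]; real model weights are an assumption away from this.
weights = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
restored = [to_fp16_and_back(w) for w in weights]

# The only loss happens at storage time; later arithmetic runs at full width.
max_abs_diff = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs diff after fp16 round trip: {max_abs_diff:.2e}")
```

For values with magnitude below 1, half precision spaces representable numbers at most 2⁻¹¹ apart, so the rounding error per weight stays at or below 2⁻¹².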


Quality

Parity vs PyTorch fp32 (random (1, 2, 343980) input):

| Repo | Max abs diff | Tolerance |
|---|---|---|
| htdemucs-ft-drums-onnx | 1.63 × 10⁻⁴ | 1 × 10⁻³ |
| htdemucs-ft-bass-onnx | 1.42 × 10⁻⁴ | 1 × 10⁻³ |
| htdemucs-ft-other-onnx | 1.71 × 10⁻⁴ | 1 × 10⁻³ |
| htdemucs-ft-vocals-onnx | 1.55 × 10⁻⁴ | 1 × 10⁻³ |
| htdemucs-onnx | 6.62 × 10⁻⁴ | 1 × 10⁻³ |
| htdemucs-6s-onnx | 2.42 × 10⁻⁴ | 1 × 10⁻³ |
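The parity figures above are maximum absolute differences between two outputs for the same input, compared against a fixed tolerance. A minimal sketch of that check follows; the helper names are hypothetical, and the short lists stand in for flattened PyTorch and ONNX outputs, which in a real check would come from running both models on the same tensor.

```python
from typing import Sequence

TOLERANCE = 1e-3  # tolerance column from the table above


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equally long outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of samples")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def within_parity(reference: Sequence[float], candidate: Sequence[float],
                  tolerance: float = TOLERANCE) -> bool:
    """True when the candidate output stays within the tolerance of the reference."""
    return max_abs_diff(reference, candidate) <= tolerance


# Toy stand-ins for flattened model outputs (illustrative values only).
pytorch_out = [0.10, -0.25, 0.40]
onnx_out = [0.1001, -0.2502, 0.3999]
print(max_abs_diff(pytorch_out, onnx_out), within_parity(pytorch_out, onnx_out))
```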

Official MUSDB18-HQ SDR (median across 50 test songs, BSS Eval v4):

| Model | drums | bass | other | vocals |
|---|---|---|---|---|
| htdemucs_ft | 10.11 dB | 9.7 dB | 7.0 dB | 9.19 dB |
| htdemucs | ~9.5 dB | ~9.0 dB | ~5.5 dB | ~8.8 dB |
| htdemucs_6s | ~9.5 dB | ~9.0 dB | ~5.5 dB | ~8.5 dB |

htdemucs_6s has lower vocals/other SDR because the model also predicts guitar and piano: "other" becomes more specific (less of a catch-all), trading vocal recall for guitar/piano clarity.
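For readers unfamiliar with the metric, the sketch below computes a simplified signal-to-distortion ratio and takes a median across songs. It is deliberately not BSS Eval v4, which evaluates windowed frames and allows for filtering of the reference; it assumes the plain definition of 10·log₁₀ of reference energy over error energy. The signals and the per-song list are synthetic.

```python
import math
from statistics import median


def simple_sdr(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: reference energy over error energy (not BSS Eval v4)."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + eps) / (error + eps))


# Synthetic example: the estimate is the reference scaled by 0.9,
# so the error has 1% of the reference energy.
reference = [math.sin(0.01 * n) for n in range(1000)]
estimate = [0.9 * r for r in reference]
track_sdr = simple_sdr(reference, estimate)  # about 20 dB

# A benchmark-style summary reports the median over per-song scores (made-up list).
per_song = [track_sdr, track_sdr + 1.0, track_sdr - 2.0]
print(round(track_sdr, 2), round(median(per_song), 2))
```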

Full benchmark across every popular open-source separator: StemSplitio/stem-separation-benchmark-2026.


Performance

Single 7.8 s segment on Apple M4 Pro (CPU EP):

| Mode | Per segment | Per 3-min song | RTF |
|---|---|---|---|
| separate(model="htdemucs_ft", stems=("vocals",)) | 1.6 s | ~22 s | 0.20 |
| separate(model="htdemucs_ft") (bag, 4 stems) | 6.4 s | ~88 s | 0.49 |
| separate(model="htdemucs") (single file, 4 stems) | 1.6 s | ~22 s | 0.20 |
| separate(model="htdemucs_6s") (single file, 6 stems) | 1.6 s | ~22 s | 0.20 |
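RTF (real-time factor) is conventionally processing time divided by audio duration, so values below 1 mean faster than real time. The table does not state which duration each RTF figure is measured against, so the sketch below only applies that generic definition to two timings taken from the table; the function name is hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# One 7.8 s segment processed in 1.6 s (single-model rows in the table).
segment_rtf = real_time_factor(1.6, 7.8)

# A 3-minute (180 s) song processed in about 88 s (4-specialist bag row).
song_rtf = real_time_factor(88.0, 180.0)

print(f"per segment: {segment_rtf:.2f}, per song: {song_rtf:.2f}")
```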

CUDA / DirectML / CoreML EPs are typically ≥ 5× faster on real GPUs.
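When driving the downloaded .onnx files through ONNX Runtime directly, the execution provider is chosen at session creation. The sketch below orders the providers named above by preference and falls back to CPU. The provider strings are ONNX Runtime's own identifiers; the preference order and the helper name are assumptions, and whether demucs-onnx exposes an equivalent option is not stated here.

```python
# ONNX Runtime provider identifiers, in an assumed order of preference.
PREFERRED = [
    "CUDAExecutionProvider",
    "DmlExecutionProvider",      # DirectML
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available):
    """Keep the preferred providers that are actually available, CPU as the fallback."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or ["CPUExecutionProvider"]


# With ONNX Runtime installed, the list would come from the runtime itself:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("htdemucs_fp16weights.onnx", providers=providers)
print(pick_providers(["CoreMLExecutionProvider", "CPUExecutionProvider"]))
```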


Programmatic listing

from demucs_onnx import list_models
import json
print(json.dumps(list_models(), indent=2))

Or via the CLI:

demucs-onnx list-models

Direct download URLs

Every model is downloadable without huggingface_hub. The URLs are stable and CDN-hosted by HF.

# Single-file 4-stem htdemucs (fp16weights variant).
curl -OL https://huggingface.co/StemSplitio/htdemucs-onnx/resolve/main/htdemucs_fp16weights.onnx

# 6-stem variant.
curl -OL https://huggingface.co/StemSplitio/htdemucs-6s-onnx/resolve/main/htdemucs_6s_fp16weights.onnx

# Vocals specialist (the most-used file).
curl -OL https://huggingface.co/StemSplitio/htdemucs-ft-vocals-onnx/resolve/main/htdemucs_ft_vocals_fp16weights.onnx
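The same downloads can be scripted from Python using only the standard library. The sketch below rebuilds the URL from a model alias using the naming pattern visible in the three curl commands above; extending that pattern to the other aliases is an assumption, and both helper names are hypothetical.

```python
import urllib.request
from pathlib import Path

BASE = "https://huggingface.co/StemSplitio"


def fp16_url(alias: str) -> str:
    """Build the fp16weights URL for an alias, following the pattern of the curl examples."""
    repo = alias.replace("_", "-") + "-onnx"  # e.g. htdemucs_6s -> htdemucs-6s-onnx
    return f"{BASE}/{repo}/resolve/main/{alias}_fp16weights.onnx"


def download(alias: str, dest_dir: str = ".") -> Path:
    """Fetch the file into dest_dir and return its local path."""
    url = fp16_url(alias)
    target = Path(dest_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


print(fp16_url("htdemucs_ft_vocals"))
# download("htdemucs_ft_vocals")  # uncomment to fetch the file (about 166 MB)
```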

License

All ONNX files are MIT-licensed, matching the original HT-Demucs. Please cite the original authors if you use them in research; see the "Skip the infrastructure" section on the home page for the BibTeX entry.