ISMIR 2026 Submission · Audio Demo

MixtureTT: Flexible Timbre Transfer from Polyphonic Mixtures via Joint Stem Diffusion

Anonymous Authors · Affiliation withheld during review

Abstract

Timbre transfer aims to modify the timbral identity of a musical recording while preserving its original melody and rhythm. While single-instrument timbre transfer has made substantial progress, existing approaches to multi-instrument settings rely on separate-then-transfer pipelines, which propagate source separation artifacts and produce timbres that are incoherent across stems. This paper proposes MixtureTT, to the best of our knowledge the first system for flexible per-stem timbre transfer directly from a polyphonic mixture. Given a mixture and a separate timbre reference for each target voice, MixtureTT jointly transfers all stems to the specified instruments through a shared diffusion process. By modeling both per-stem content and cross-stem harmonic dependencies, the proposed joint stem diffusion transformer eliminates cascaded separation error, reduces inference cost by a factor equal to the number of stems, and yields more coherent multi-stem outputs. Despite operating under a strictly harder input condition, MixtureTT outperforms single-instrument baselines on both objective and subjective metrics in evaluations on the SATB choral dataset, demonstrating the need for dedicated multi-instrument timbre transfer over naive separate-then-transfer pipelines. The proposed joint setting also consistently exceeds an equivalent single-stem ablation, confirming that cross-stem modeling is essential for mixture-level timbre transfer.
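The inference-cost claim in the abstract can be illustrated with a small counting sketch. It is not the authors' implementation: the function name `denoiser_calls`, the 50-step sampling schedule, and the assumption that one joint forward pass costs about the same as one single-stem forward pass are all hypothetical, and the cost of the separation model in a separate-then-transfer pipeline is ignored.

```python
def denoiser_calls(num_stems: int, num_steps: int, joint: bool) -> int:
    """Count diffusion-network forward passes needed to convert one mixture.

    Assumption (not from the paper): one forward pass per sampling step.
    A per-stem pipeline runs the full sampling loop once per stem, while a
    joint model runs it once for all stems together.
    """
    if num_stems < 1 or num_steps < 1:
        raise ValueError("num_stems and num_steps must be positive")
    return num_steps if joint else num_stems * num_steps


# Four stems as in the quartet examples; 50 sampling steps is a made-up value.
stems, steps = 4, 50
per_stem_calls = denoiser_calls(stems, steps, joint=False)
joint_calls = denoiser_calls(stems, steps, joint=True)

print(per_stem_calls, joint_calls, per_stem_calls // joint_calls)
# prints: 200 50 4
```

Under these assumptions the ratio equals the number of stems regardless of the step count, which matches the factor stated in the abstract.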

MixtureTT overview

Audio Samples

Example 01: Brass Quartet to String Quartet
Source mixture (input): Mixture, Trumpet, Horn, Trombone, Tuba
Timbre references: Mixture, Violin, Violin, Viola, Cello

Converted outputs

Method | S1 Violin 1 | S2 Violin 2 | S3 Viola | S4 Cello | Remix
new-joint (ours · joint training)
new-single (ours · single-stem variant)
base-diff (baseline · diffusion)
base-vae (baseline · VAE)
Example 02: Brass Quartet to Woodwind Quartet
Source mixture (input): Mixture, Trumpet, Horn, Trombone, Tuba
Timbre references: Mixture, Flute, Oboe, Clarinet, Bassoon

Converted outputs

Method | S1 Flute | S2 Oboe | S3 Clarinet | S4 Bassoon | Remix
new-joint (ours · joint training)
new-single (ours · single-stem variant)
base-diff (baseline · diffusion)
base-vae (baseline · VAE)
Example 03: String Quartet to Brass Quartet
Source mixture (input): Mixture, Violin, Violin, Viola, Cello
Timbre references: Mixture, Trumpet, Horn, Trombone, Tuba

Converted outputs

Method | S1 Trumpet | S2 Horn | S3 Trombone | S4 Tuba | Remix
new-joint (ours · joint training)
new-single (ours · single-stem variant)
base-diff (baseline · diffusion)
base-vae (baseline · VAE)
Example 04: String Quartet to Woodwind Quartet
Source mixture (input): Mixture, Violin, Violin, Viola, Cello
Timbre references: Mixture, Flute, Oboe, Clarinet, Bassoon

Converted outputs

Method | S1 Flute | S2 Oboe | S3 Clarinet | S4 Bassoon | Remix
new-joint (ours · joint training)
new-single (ours · single-stem variant)
base-diff (baseline · diffusion)
base-vae (baseline · VAE)
Example 05: Woodwind Quartet to Brass Quartet
Source mixture (input): Mixture, Flute, Oboe, Clarinet, Bassoon
Timbre references: Mixture, Trumpet, Horn, Trombone, Tuba

Converted outputs

Method | S1 Trumpet | S2 Horn | S3 Trombone | S4 Tuba | Remix
new-joint (ours · joint training)
new-single (ours · single-stem variant)
base-diff (baseline · diffusion)
base-vae (baseline · VAE)
Example 06: Woodwind Quartet to String Quartet
Source mixture (input): Mixture, Flute, Oboe, Clarinet, Bassoon
Timbre references: Mixture, Violin, Violin, Viola, Cello

Converted outputs

Method | S1 Violin 1 | S2 Violin 2 | S3 Viola | S4 Cello | Remix
new-joint (ours · joint training)
new-single (ours · single-stem variant)
base-diff (baseline · diffusion)
base-vae (baseline · VAE)