Distribution Preserving Source Separation with Time Frequency Predictive Models

This is the demonstration page for the paper "Distribution Preserving Source Separation with Time Frequency Predictive Models", presenting the samples used in the MUSHRA-style listening test.

Info

Abstract

We provide an example of a distribution preserving source separation method, which aims at addressing perceptual shortcomings of state-of-the-art methods. Our approach uses unconditioned generative models of signal sources. Reconstruction is achieved by means of mix-consistent sampling from a distribution conditioned on a realization of a mix. The separated signals follow their respective source distributions, which provides an advantage when separation results are evaluated in a listening test.
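As a rough illustration of the mix-consistent sampling idea, below is a minimal sketch for the simplest possible case: two sources whose prior at each time-frequency bin is approximated by an independent Gaussian. This is not the paper's algorithm (which uses time-frequency predictive generative models as source priors); the function name and prior parameters are hypothetical placeholders.

import numpy as np

def mix_consistent_sample(mix, mu1, var1, mu2, var2, rng=None):
    """Sample (s1, s2) from p(s1, s2 | s1 + s2 = mix) when s1 ~ N(mu1, var1)
    and s2 ~ N(mu2, var2) are independent, applied element-wise."""
    rng = np.random.default_rng() if rng is None else rng
    # For independent Gaussian priors, the conditional of s1 given the sum is Gaussian:
    #   mean = mu1 + var1 / (var1 + var2) * (mix - mu1 - mu2)
    #   var  = var1 * var2 / (var1 + var2)
    cond_mean = mu1 + var1 / (var1 + var2) * (mix - mu1 - mu2)
    cond_var = var1 * var2 / (var1 + var2)
    s1 = cond_mean + np.sqrt(cond_var) * rng.standard_normal(np.shape(mix))
    s2 = mix - s1  # mix consistency holds exactly by construction
    return s1, s2

# Toy usage with placeholder prior parameters (purely illustrative).
mix = np.array([0.3, -0.1, 0.7])
s1, s2 = mix_consistent_sample(mix, mu1=0.2, var1=0.5, mu2=0.0, var2=0.1)
assert np.allclose(s1 + s2, mix)

Because the second source is taken as the residual of the mix, each draw sums exactly to the observed mixture while still following its conditional source distribution.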

Reference

Pedro J. Villasana T., Janusz Klejsa, Lars Villemoes, Per Hedelin (2023). Distribution Preserving Source Separation with Time Frequency Predictive Models.

MUSHRA items

This section contains the 10 items used in the MUSHRA-style listening test. We trained the models on the VCTK [1] and SUPRA [2] datasets; these items were never seen during training.

[Audio table: items 1–10. Each item provides players for the Mixture, the Sources, the DPSS, IRM, and PNF separation results, and a 3.5 kHz low-pass anchor.]

Additional material

VCTK+MAESTRO

In this section we used the MAESTRO [3] dataset to train the piano model.

[Audio table: items 1–5. Each item provides players for the Mixture, the Sources, and the DPSS separation result.]

Out-of-distribution mixes

This section contains examples of out-of-distribution mixes. We do not have the original components of these mixtures, and the models were not trained on the source types present in them.

[Audio table: items 1–5. Each item provides players for the out-of-distribution mix and the DPSS separation result.]

References

[1] Junichi Yamagishi, Christophe Veaux, Kirsten MacDonald, et al., "CSTR VCTK Corpus: English multi-speaker corpus for CSTR voice cloning toolkit (version 0.92)," University of Edinburgh. The Centre for Speech Technology Research (CSTR), 2019.
[2] Zhengshan Shi, Craig Sapp, Kumaran Arul, Jerry McBride, and Julius O. Smith III, "SUPRA: Digitizing the Stanford University Piano Roll Archive," in ISMIR, 2019, pp. 517–523.
[3] Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck, "Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset," in International Conference on Learning Representations (ICLR), 2019.