Audio-focused loss functions in PyTorch

auraloss

A collection of audio-focused loss functions in PyTorch.

Setup

pip install auraloss

Usage

import torch
import auraloss

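# Multi-resolution STFT loss with its default settings
mrstft = auraloss.freq.MultiResolutionSTFTLoss()

# Random example signals shaped (batch, channels, samples)
pred = torch.rand(8, 1, 44100)
target = torch.rand(8, 1, 44100)

loss = mrstft(pred, target)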

Loss functions

We categorize the loss functions as either time-domain or frequency-domain approaches. Additionally, we include perceptual transforms.

| Loss function | Interface | Reference |
|---|---|---|
| Time domain | | |
| Error-to-signal ratio (ESR) | auraloss.time.ESRLoss() | Wright & Välimäki, 2019 |
| DC error (DC) | auraloss.time.DCLoss() | Wright & Välimäki, 2019 |
| Log hyperbolic cosine (Log-cosh) | auraloss.time.LogCoshLoss() | Chen et al., 2019 |
| Signal-to-noise ratio (SNR) | auraloss.time.SNRLoss() | |
| Scale-invariant signal-to-distortion ratio (SI-SDR) | auraloss.time.SISDRLoss() | Le Roux et al., 2018 |
| Scale-dependent signal-to-distortion ratio (SD-SDR) | auraloss.time.SDSDRLoss() | Le Roux et al., 2018 |
| Frequency domain | | |
| Aggregate STFT | auraloss.freq.STFTLoss() | Arik et al., 2018 |
| Aggregate Mel-scaled STFT | auraloss.freq.MelSTFTLoss(sample_rate) | |
| Multi-resolution STFT | auraloss.freq.MultiResolutionSTFTLoss() | Yamamoto et al., 2019* |
| Random-resolution STFT | auraloss.freq.RandomResolutionSTFTLoss() | Steinmetz & Reiss, 2020 |
| Sum and difference STFT loss | auraloss.freq.SumAndDifferenceSTFTLoss() | Steinmetz et al., 2020 |
| Perceptual transforms | | |
| Sum and difference signal transform | auraloss.perceptual.SumAndDifference() | |
| FIR pre-emphasis filters | auraloss.perceptual.FIRFilter() | Wright & Välimäki, 2019 |

* Wang et al., 2019 also propose a multi-resolution spectral loss (which Engel et al., 2020 follow), but it does not include both the log-magnitude (L1 distance) and spectral convergence terms that were introduced in Arik et al., 2018 and later extended to the multi-resolution case in Yamamoto et al., 2019.
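The time-domain entries in the table and the two perceptual transforms reduce to short formulas. The sketch below writes them out in plain PyTorch so the definitions are easy to compare side by side. It follows the standard definitions from the cited papers as we understand them and is not auraloss's own code: the helper names, the EPS constant, and the 0.85 pre-emphasis coefficient are assumptions for illustration. Signals are assumed to be shaped (batch, channels, samples), and each loss reduces over the last axis.

```python
import torch

EPS = 1e-8  # assumed small constant that avoids division by zero and log(0)

def esr(pred, target):
    # Error-to-signal ratio: error energy relative to target energy.
    return ((target - pred) ** 2).sum(-1) / ((target ** 2).sum(-1) + EPS)

def dc(pred, target):
    # DC error: squared mean of the error, normalised by the target's mean power.
    return (target - pred).mean(-1) ** 2 / ((target ** 2).mean(-1) + EPS)

def log_cosh(pred, target):
    # Roughly quadratic for small errors and roughly linear for large ones.
    return torch.log(torch.cosh(pred - target) + EPS).mean(-1)

def neg_snr(pred, target):
    # Negative SNR in dB, so minimising the loss maximises the SNR.
    ratio = (target ** 2).sum(-1) / (((target - pred) ** 2).sum(-1) + EPS)
    return -10 * torch.log10(ratio + EPS)

def neg_sdr(pred, target, scale_invariant=True):
    # Optimal scaling alpha obtained by projecting pred onto target.
    alpha = (pred * target).sum(-1, keepdim=True) / ((target ** 2).sum(-1, keepdim=True) + EPS)
    scaled_target = alpha * target
    # SI-SDR measures the error against the scaled target; SD-SDR against the raw target.
    error = scaled_target - pred if scale_invariant else target - pred
    ratio = (scaled_target ** 2).sum(-1) / ((error ** 2).sum(-1) + EPS)
    return -10 * torch.log10(ratio + EPS)

def sum_and_difference(x):
    # Stereo input (batch, 2, samples) -> sum (L + R) and difference (L - R) signals.
    return x[:, 0, :] + x[:, 1, :], x[:, 0, :] - x[:, 1, :]

def pre_emphasis(x, coef=0.85):
    # First-order FIR high-pass y[n] = x[n] - coef * x[n-1]; coef is illustrative.
    return torch.cat([x[..., :1], x[..., 1:] - coef * x[..., :-1]], dim=-1)
```

As an example of the SI-SDR/SD-SDR distinction: with pred = 2 * target, the optimal scaling absorbs the gain, so the scale-invariant loss becomes very negative. The scale-dependent version still sees the raw error: its SDR is 10 log10(4), about 6.02 dB, so the loss stays near -6.02.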

Examples

Currently we include an example that uses a set of these loss functions to train a TCN to model an analog dynamic range compressor. For details, see examples/compressor. We provide pre-trained models, evaluation scripts to compute the metrics in the paper, and scripts to retrain the models.

The STFTLoss class also supports more advanced configurations. For example, you can compute both linear- and log-scaled STFT magnitude errors, as in Engel et al., 2020. In this case, we do not include the spectral convergence term.

stft_loss = auraloss.freq.STFTLoss(w_log_mag=1.0,
                                   w_lin_mag=1.0,
                                   w_sc=0.0)
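The following minimal single-resolution sketch shows what the three weights control, written in plain PyTorch. It combines spectral convergence, the L1 distance between log magnitudes, and the L1 distance between linear magnitudes. Each term is scaled by a weight named after the w_sc, w_log_mag, and w_lin_mag arguments above. This is an illustration under assumptions, not auraloss's implementation. The helper names, the Hann window, the default frame sizes, and the eps constant are our own choices.

```python
import torch
import torch.nn.functional as F

def stft_mag(x, fft_size=1024, hop_size=256, win_length=1024, eps=1e-8):
    # Fold (batch, channels, samples) into (batch * channels, samples) for torch.stft.
    x = x.reshape(-1, x.shape[-1])
    window = torch.hann_window(win_length, device=x.device)
    X = torch.stft(x, n_fft=fft_size, hop_length=hop_size, win_length=win_length,
                   window=window, return_complex=True)
    # eps keeps the magnitude strictly positive so its log stays finite.
    return torch.sqrt(X.real ** 2 + X.imag ** 2 + eps)

def stft_loss(pred, target, w_sc=1.0, w_log_mag=1.0, w_lin_mag=0.0, **stft_kwargs):
    P, T = stft_mag(pred, **stft_kwargs), stft_mag(target, **stft_kwargs)
    # Spectral convergence: Frobenius norm of the magnitude error relative to the target.
    sc = torch.linalg.norm(T - P, dim=(-2, -1)) / torch.linalg.norm(T, dim=(-2, -1))
    # L1 distances between log-scaled and linear-scaled magnitudes.
    log_mag = F.l1_loss(torch.log(P), torch.log(T))
    lin_mag = F.l1_loss(P, T)
    return w_sc * sc.mean() + w_log_mag * log_mag + w_lin_mag * lin_mag
```

Calling stft_loss(pred, target, w_sc=0.0, w_log_mag=1.0, w_lin_mag=1.0) mirrors the configuration above. Setting w_sc to zero simply drops the spectral convergence term from the weighted sum.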

There is also a Mel-scaled STFT loss, which has some special requirements: you must set the sample rate and specify the device on which the tensors you compare will be.

sample_rate = 44100
melstft_loss = auraloss.freq.MelSTFTLoss(sample_rate, device="cuda")
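Here is one plausible reason the Mel variant needs these extra arguments. A Mel filterbank is a fixed matrix built in advance, and its entries depend on the sample rate, because the sample rate determines which frequency in Hz each FFT bin represents. As a precomputed tensor, it also has to live on the same device as the spectrograms it multiplies. The sketch below builds a triangular filterbank on the HTK Mel scale in plain PyTorch. The function names and the choice of the HTK formula are assumptions, and auraloss may construct its filterbank differently.

```python
import torch

def hz_to_mel(f):
    # HTK-style Mel scale (one of several common conventions).
    return 2595.0 * torch.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels, device="cpu"):
    # Frequency (Hz) of each one-sided FFT bin: this is where the sample rate enters.
    bin_freqs = torch.linspace(0, sample_rate / 2, n_fft // 2 + 1, device=device)
    # n_mels + 2 edge frequencies spaced evenly on the Mel scale up to Nyquist.
    mel_max = hz_to_mel(torch.tensor(sample_rate / 2.0)).item()
    edges = mel_to_hz(torch.linspace(0.0, mel_max, n_mels + 2, device=device))
    lower, centre, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    # Triangular filters: rise from lower to centre, fall from centre to upper.
    rising = (bin_freqs - lower) / (centre - lower)
    falling = (upper - bin_freqs) / (upper - centre)
    return torch.clamp(torch.minimum(rising, falling), min=0.0)  # (n_mels, n_fft // 2 + 1)
```

A magnitude spectrogram shaped (batch, n_fft // 2 + 1, frames) can then be projected with fb @ spec, which yields (batch, n_mels, frames). Building the same 64-band filterbank at 16000 Hz instead of 44100 Hz gives a different matrix, which is why the sample rate cannot be inferred from the audio tensors alone.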

You can also build a multi-resolution Mel-scaled STFT loss, here with 64 Mel bins. Again, make sure you pass the device where the tensors you are comparing will be.

mrmelstft_loss = auraloss.freq.MultiResolutionSTFTLoss(scale="mel", 
                                                       n_bins=64,
                                                       sample_rate=sample_rate,
                                                       device="cuda")
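The multi-resolution idea can be sketched as averaging one spectral distance over several STFT configurations, so that no single time-frequency trade-off dominates the objective. The version below uses only a log-magnitude L1 term. It creates each analysis window on the input's device, which addresses the same device-matching concern raised above. The (fft_size, hop_size, win_length) triples and all function names are illustrative assumptions, not auraloss's defaults. A Mel-scaled variant like the one configured above would also project each magnitude spectrogram onto a Mel filterbank (here, 64 bands) before taking the log.

```python
import torch
import torch.nn.functional as F

# Illustrative (fft_size, hop_size, win_length) triples; not necessarily auraloss's defaults.
RESOLUTIONS = [(512, 50, 240), (1024, 120, 600), (2048, 240, 1200)]

def log_mag(x, fft_size, hop_size, win_length, eps=1e-7):
    # Fold (batch, channels, samples) into (batch * channels, samples).
    x = x.reshape(-1, x.shape[-1])
    # Creating the window on x.device keeps every tensor on the same device.
    window = torch.hann_window(win_length, device=x.device)
    X = torch.stft(x, n_fft=fft_size, hop_length=hop_size, win_length=win_length,
                   window=window, return_complex=True)
    return torch.log(X.abs() + eps)

def multi_resolution_loss(pred, target, resolutions=RESOLUTIONS):
    # One log-magnitude L1 distance per resolution, then a plain average.
    losses = [F.l1_loss(log_mag(pred, *r), log_mag(target, *r)) for r in resolutions]
    return torch.stack(losses).mean()
```

Because each window is allocated on pred's or target's device, the sketch fails loudly if the two inputs sit on different devices, rather than silently mixing CPU and GPU tensors.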

Development

We currently have no tests, though they are coming soon, so use caution for now. Future loss functions will target neural-network-based perceptual losses, which tend to be more sophisticated than those included so far.

If you are interested in adding a loss function, please make a pull request.

Cite

If you use this code in your work, please consider citing us.

@inproceedings{steinmetz2020auraloss,
    title={auraloss: {A}udio focused loss functions in {PyTorch}},
    author={Steinmetz, Christian J. and Reiss, Joshua D.},
    booktitle={Digital Music Research Network One-day Workshop (DMRN+15)},
    year={2020}}


Download files

Source Distribution

auraloss-0.2.0.tar.gz (12.0 kB)

Uploaded Source

Built Distribution

auraloss-0.2.0-py3-none-any.whl (15.4 kB)

Uploaded Python 3

File details

Details for the file auraloss-0.2.0.tar.gz.

File metadata

  • Download URL: auraloss-0.2.0.tar.gz
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for auraloss-0.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 451bfc549d308a53473805855a4ec8a868c648cb26630f93af9beadb53d51895 |
| MD5 | fdb6551456809da2658c8c2257fbf059 |
| BLAKE2b-256 | aed64e108da932bc2de10a5a236622f95320bd1acbc28f650dd28b56e5011a1c |

File details

Details for the file auraloss-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: auraloss-0.2.0-py3-none-any.whl
  • Size: 15.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for auraloss-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 95ca26690ed697a0899bb04f1b4eac0ed44b4ef0d30c8f1ea264986dc3a076dc |
| MD5 | 37d0906601bb320a7293bab67ebcc945 |
| BLAKE2b-256 | 01e51f1ab8c0631707d528d50a531e826ce13cfcba459e69e379e98901486a33 |
