Audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Setup

pip install audiomentations

Optional requirements

If you want to use the Mp3Compression transform, you need to install additional dependencies that are optional.

Run pip install audiomentations[extras]. Then install ffmpeg, via e.g. conda or from the official ffmpeg download page.

Usage example

from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

SAMPLE_RATE = 16000

augmenter = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
])

samples = np.zeros((20,), dtype=np.float32)
samples = augmenter(samples=samples, sample_rate=SAMPLE_RATE)

Go to audiomentations/augmentations/transforms.py to see the transforms you can apply, and what arguments they have.

Transforms

`AddBackgroundNoise`

Added in v0.9.0

Mix in another sound, e.g. a background noise. Useful if your original sound is clean and you want to simulate an environment where background noise is present.

Can also be used for mixup, as in https://arxiv.org/pdf/1710.09412.pdf

A folder of (background noise) sounds to be mixed in must be specified. These sounds should ideally be at least as long as the input sounds to be transformed. Otherwise, the background sound will be repeated, which may sound unnatural.

Note that the gain of the added noise is relative to the amount of signal in the input. This implies that if the input is completely silent, no noise will be added.
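
The relationship between the noise gain and the input level can be sketched with plain NumPy. This is only an illustration, not the library's implementation: the helper names `rms` and `mix_at_snr`, and the choice to express the relative gain as a signal-to-noise ratio in dB, are assumptions made for this example.

```python
import numpy as np


def rms(samples: np.ndarray) -> float:
    """Root mean square level of a signal."""
    return float(np.sqrt(np.mean(np.square(samples))))


def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `signal` at roughly the given SNR (hypothetical helper)."""
    # Repeat the noise if it is shorter than the signal, then trim to length.
    repeats = int(np.ceil(len(signal) / len(noise)))
    noise = np.tile(noise, repeats)[: len(signal)]

    signal_rms = rms(signal)
    noise_rms = rms(noise)
    if signal_rms == 0.0 or noise_rms == 0.0:
        # A silent input gives a target noise level of zero: nothing is added.
        return signal.copy()

    # Scale the noise so that 20 * log10(signal_rms / scaled_noise_rms) == snr_db.
    target_noise_rms = signal_rms / (10 ** (snr_db / 20))
    return signal + noise * (target_noise_rms / noise_rms)


rng = np.random.default_rng(0)
speech = 0.1 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
hum = rng.normal(0.0, 1.0, 4000).astype(np.float32)  # shorter than the input, so it repeats

noisy = mix_at_snr(speech, hum, snr_db=10.0)
silent = mix_at_snr(np.zeros(16000, dtype=np.float32), hum, snr_db=10.0)
```

Because the noise level is derived from the input's RMS, the all-zero input in `silent` comes back unchanged.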

`AddGaussianNoise`

Added in v0.1.0

Add Gaussian noise to the samples.
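
A minimal NumPy sketch of the idea, assuming the noise amplitude is drawn uniformly between `min_amplitude` and `max_amplitude` (the parameter names from the usage example). The function `add_gaussian_noise` is a hypothetical stand-in, not the library's code.

```python
import numpy as np


def add_gaussian_noise(samples, min_amplitude=0.001, max_amplitude=0.015, rng=None):
    """Add white Gaussian noise with a randomly drawn amplitude (illustrative only)."""
    if rng is None:
        rng = np.random.default_rng()
    # Draw one noise amplitude per call from the configured range.
    amplitude = rng.uniform(min_amplitude, max_amplitude)
    noise = rng.standard_normal(len(samples)).astype(np.float32)
    # Cast back to float32 so the output dtype matches typical audio input.
    return (samples + amplitude * noise).astype(np.float32)


silence = np.zeros(16000, dtype=np.float32)
noisy = add_gaussian_noise(silence, rng=np.random.default_rng(42))
```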

`AddGaussianSNR`

Added in v0.7.0

Add Gaussian noise to the samples with a random signal-to-noise ratio (SNR).

`AddImpulseResponse`

Added in v0.7.0

Convolve the audio with a random impulse response. Impulse responses can be created using e.g. http://tulrich.com/recording/ir_capture/

Impulse responses are represented as wav files in the given ir_path.
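
Convolution with an impulse response can be sketched as follows. This assumes the impulse response has already been loaded into a NumPy array; `apply_impulse_response` is a hypothetical helper, and trimming the result back to the input length is a choice made for this example rather than documented library behaviour.

```python
import numpy as np


def apply_impulse_response(samples, impulse_response):
    """Convolve audio with an impulse response and keep the input length."""
    wet = np.convolve(samples, impulse_response, mode="full")
    # The full convolution is longer than the input; drop the tail here.
    return wet[: len(samples)].astype(samples.dtype)


# A single click, and a toy impulse response: direct sound plus one quieter echo.
dry = np.array([1.0, 0.0, 0.0, 0.0, 0.0], dtype=np.float32)
echo_ir = np.array([1.0, 0.0, 0.5], dtype=np.float32)

wet = apply_impulse_response(dry, echo_ir)  # the click followed by its echo two samples later
```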

`AddShortNoises`

Added in v0.9.0

Mix in various (bursts of overlapping) sounds with random pauses between. Useful if your original sound is clean and you want to simulate an environment where short noises sometimes occur.

A folder of (noise) sounds to be mixed in must be specified.

`ClippingDistortion`

Added in v0.8.0

Distort signal by clipping a random percentage of points

The percentage of points that will be clipped is drawn from a uniform distribution between the two input parameters min_percentile_threshold and max_percentile_threshold. If, for instance, 30% is drawn, the samples are clipped if they are below the 15th or above the 85th percentile.
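
The 30% case can be worked through with NumPy. This is a sketch of the percentile arithmetic only; `clip_by_percentile` is a hypothetical helper and the percentage is fixed here instead of being drawn at random.

```python
import numpy as np


def clip_by_percentile(samples, percent_to_clip):
    """Clip the lowest and highest values so that `percent_to_clip` percent are affected."""
    # Split the percentage evenly between the bottom and the top of the value range.
    lower_percentile = percent_to_clip / 2
    upper_percentile = 100 - lower_percentile
    lower, upper = np.percentile(samples, [lower_percentile, upper_percentile])
    return np.clip(samples, lower, upper).astype(samples.dtype)


# 101 evenly spaced values from -1 to 1, so the percentiles are easy to read off.
samples = np.linspace(-1.0, 1.0, 101).astype(np.float32)
clipped = clip_by_percentile(samples, 30)  # thresholds land near -0.7 and 0.7
```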

`FrequencyMask`

Added in v0.7.0

Mask some frequency band on the spectrogram. Inspired by https://arxiv.org/pdf/1904.08779.pdf
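
One simple way to remove a frequency band is to zero the matching bins of the signal's FFT. The sketch below uses that swapped-in technique for illustration; it is not how the library implements the transform, and `mask_frequency_band` with its `low_hz` and `high_hz` parameters is hypothetical.

```python
import numpy as np


def mask_frequency_band(samples, sample_rate, low_hz, high_hz):
    """Silence all frequency content between low_hz and high_hz (FFT-based sketch)."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    # Zero every bin whose centre frequency falls inside the masked band.
    spectrum[(freqs >= low_hz) & (freqs <= high_hz)] = 0
    return np.fft.irfft(spectrum, n=len(samples)).astype(samples.dtype)


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate  # one second of audio
low_tone = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)
high_tone = (0.5 * np.sin(2 * np.pi * 2000 * t)).astype(np.float32)
mixture = low_tone + high_tone

# Masking 1500-2500 Hz removes the 2000 Hz tone and leaves the 440 Hz tone.
masked = mask_frequency_band(mixture, sample_rate, low_hz=1500, high_hz=2500)
```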

`Gain`

Added in v0.11.0

Multiply the audio by a random amplitude factor to reduce or increase the volume. This technique can help a model become somewhat invariant to the overall gain of the input audio.

Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping
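
The out-of-range warning is easy to reproduce in a few lines. This sketch assumes the gain is expressed in decibels and converted to an amplitude factor; `apply_gain` and the range of -12 to 12 dB are choices made for this example, not values taken from the library.

```python
import numpy as np


def apply_gain(samples, gain_db):
    """Scale samples by a gain given in decibels (hypothetical helper)."""
    factor = 10 ** (gain_db / 20)  # +6 dB is roughly a factor of 2
    return samples * factor


rng = np.random.default_rng(0)
gain_db = rng.uniform(-12.0, 12.0)  # assumed range, chosen only for this example

tone = (0.8 * np.sin(2 * np.pi * np.arange(1000) / 100)).astype(np.float32)
randomly_scaled = apply_gain(tone, gain_db)

# A fixed +6 dB boost pushes the 0.8 peak to about 1.6, outside [-1, 1].
boosted = apply_gain(tone, 6.0)
# One way to stay in range is to hard-clip afterwards, at the cost of distortion.
clipped = np.clip(boosted, -1.0, 1.0)
```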

`Mp3Compression`

Added in v0.12.0

Compress the audio using an MP3 encoder to lower the audio quality. This may help machine learning models deal with compressed, low-quality audio.

This transform depends on either lameenc or pydub/ffmpeg.

Note that bitrates below 32 kbps are only supported for low sample rates (up to 24000 Hz).

Note: When using the lameenc backend, the output may be slightly longer than the input due to the fact that the LAME encoder inserts some silence at the beginning of the audio.

`Normalize`

Added in v0.6.0

Apply a constant amount of gain, so that the highest signal level present in the sound becomes 0 dBFS, i.e. the loudest level allowed if all samples must be between -1 and 1. Also known as peak normalization.
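
Peak normalization amounts to dividing by the largest absolute sample value. A minimal sketch, with a guard for all-zero input; `peak_normalize` is a hypothetical helper, not the library's code.

```python
import numpy as np


def peak_normalize(samples):
    """Scale the signal so its largest absolute sample value becomes 1.0 (0 dBFS)."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        # All-zero input: there is nothing to scale, and dividing would fail.
        return samples.copy()
    return samples / peak


quiet = np.array([0.1, -0.25, 0.05], dtype=np.float32)
normalized = peak_normalize(quiet)  # the -0.25 peak becomes -1.0
silent = peak_normalize(np.zeros(3, dtype=np.float32))
```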

`PitchShift`

Added in v0.4.0

Pitch shift the sound up or down without changing the tempo

`PolarityInversion`

Added in v0.11.0

Flip the audio samples upside-down, reversing their polarity. In other words, multiply the waveform by -1, so negative values become positive, and vice versa. The result will sound the same compared to the original when played back in isolation. However, when mixed with other audio sources, the result may be different. This waveform inversion technique is sometimes used for audio cancellation or obtaining the difference between two waveforms. However, in the context of audio data augmentation, this transform can be useful when training phase-aware machine learning models.
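
The cancellation effect can be checked numerically. The signals below are made up for this example; the only operation taken from the description above is the multiplication by -1.

```python
import numpy as np

t = np.arange(1600) / 16000
voice = np.sin(2 * np.pi * 220 * t).astype(np.float32)
other = (0.5 * np.sin(2 * np.pi * 220 * t)).astype(np.float32)

inverted = -voice  # polarity inversion is a multiplication by -1

# The original plus its inverted copy cancels to silence.
cancelled = voice + inverted

# Mixed with another in-phase source, the two versions give different results.
mix_original = other + voice     # the sources reinforce each other
mix_inverted = other + inverted  # the sources partially cancel
```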

`Resample`

Added in v0.8.0

Resample signal using librosa.core.resample

To downsample only, set both the minimum and the maximum sampling rate lower than the original sampling rate. To upsample only, set both higher than the original sampling rate.

`Shift`

Added in v0.5.0

Shift the samples forwards or backwards, with or without rollover
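
The difference between shifting with and without rollover can be sketched with `np.roll`. The helper `shift` and its `num_places` argument are hypothetical; in the usage example the shift is given as a fraction of the signal length instead.

```python
import numpy as np


def shift(samples, num_places, rollover=True):
    """Shift samples forwards (positive) or backwards (negative) by num_places."""
    shifted = np.roll(samples, num_places)  # np.roll returns a new array
    if not rollover:
        # Replace the wrapped-around part with silence.
        if num_places > 0:
            shifted[:num_places] = 0
        elif num_places < 0:
            shifted[num_places:] = 0
    return shifted


samples = np.array([1, 2, 3, 4, 5], dtype=np.float32)

wrapped = shift(samples, 2)                   # [4, 5, 1, 2, 3]
padded = shift(samples, 2, rollover=False)    # [0, 0, 1, 2, 3]
backward = shift(samples, -2, rollover=False) # [3, 4, 5, 0, 0]
```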

`TimeMask`

Added in v0.7.0

Make a randomly chosen part of the audio silent. Inspired by https://arxiv.org/pdf/1904.08779.pdf
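
A minimal sketch of silencing a random region, assuming the masked length is capped at a fraction of the signal. The helper `time_mask` and its `max_fraction` parameter are hypothetical, and no fade is applied at the edges of the masked region.

```python
import numpy as np


def time_mask(samples, max_fraction=0.5, rng=None):
    """Zero out a randomly placed region; returns the result, start index and length."""
    if rng is None:
        rng = np.random.default_rng()
    # Pick how many samples to silence, then where the silent region starts.
    max_length = int(len(samples) * max_fraction)
    mask_length = int(rng.integers(0, max_length + 1))
    start = int(rng.integers(0, len(samples) - mask_length + 1))
    masked = samples.copy()
    masked[start : start + mask_length] = 0.0
    return masked, start, mask_length


ones = np.ones(1000, dtype=np.float32)
masked, start, mask_length = time_mask(ones, rng=np.random.default_rng(7))
```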

`TimeStretch`

Added in v0.2.0

Time stretch the signal without changing the pitch

Known limitations

Mostly, only mono audio in float32 format (with sample values between -1 and 1) is supported. Only a few of the transforms support multichannel audio. See also #55
The code runs on CPU, not GPU. For a GPU-compatible version, check out pytorch-audiomentations
Multiprocessing is not officially supported yet. See also #46

Contributions are welcome!

Version history

v0.12.1 (2020-09-28)

Speed up AddBackgroundNoise, AddShortNoises and AddImpulseResponse by loading wav files with scipy or wavio instead of librosa.

v0.12.0 (2020-09-23)

Implement Mp3Compression
Python <= 3.5 is no longer officially supported, since Python 3.5 has reached end-of-life
Expand range of supported librosa versions
Officially support multichannel audio in Gain and PolarityInversion
Add m4a and opus to the list of recognized audio filename extensions
Breaking change: Internal util functions are no longer exposed directly. If you were doing e.g. from audiomentations import calculate_rms, now you have to do from audiomentations.core.utils import calculate_rms

v0.11.0 (2020-08-27)

Implement Gain and PolarityInversion. Thanks to Spijkervet for the inspiration.

v0.10.1 (2020-07-27)

Improve the performance of AddBackgroundNoise and AddShortNoises by optimizing the implementation of calculate_rms.
Improve compatibility of output files written by the demo script. Thanks to xwJohn.
Fix division by zero bug in Normalize. Thanks to ZFTurbo.

v0.10.0 (2020-05-05)

Breaking change: AddImpulseResponse, AddBackgroundNoise and AddShortNoises now include subfolders when searching for files. This is useful when your sound files are organized in subfolders.
AddImpulseResponse, AddBackgroundNoise and AddShortNoises now support aiff files in addition to flac, mp3, ogg and wav
Fix filter instability bug in FrequencyMask. Thanks to kvilouras.

v0.9.0 (2020-02-20)

Disregard non-audio files when looking for impulse response files
Remember randomized/chosen effect parameters. This allows for freezing the parameters and applying the same effect to multiple sounds. Use transform.freeze_parameters() and transform.unfreeze_parameters() for this.
Fix a bug in ClippingDistortion where the min_percentile_threshold was not respected as expected.
Implement transform.serialize_parameters(). Useful for when you want to store metadata on how a sound was perturbed.
Switch to a faster convolve implementation. This makes AddImpulseResponse significantly faster.
Add a rollover parameter to Shift. This allows for introducing silence instead of a wrapped part of the sound.
Expand supported range of librosa versions
Add support for flac in AddImpulseResponse
Implement AddBackgroundNoise transform. Useful for when you want to add background noise to all of your sound. You need to give it a folder of background noises to choose from.
Implement AddShortNoises. Useful for when you want to add (bursts of) short noise sounds to your input audio.
Improve handling of empty input

v0.8.0 (2020-01-28)

Add shuffle parameter in Compose
Add Resample transformation
Add ClippingDistortion transformation
Add fade parameter to TimeMask

Thanks to askskro

v0.7.0 (2020-01-14)

Add new transforms:

AddImpulseResponse
FrequencyMask
TimeMask
AddGaussianSNR

Thanks to karpnv

v0.6.0 (2019-05-27)

Implement peak normalization

v0.5.0 (2019-02-23)

Implement Shift transform
Ensure p is within bounds

v0.4.0 (2019-02-19)

Implement PitchShift transform
Fix output dtype of AddGaussianNoise

v0.3.0 (2019-02-19)

Implement leave_length_unchanged in TimeStretch

v0.2.0 (2019-02-18)

Add TimeStretch transform
Parametrize AddGaussianNoise

v0.1.0 (2019-02-15)

Initial release. Includes only one transform: AddGaussianNoise

Development

Install the dependencies specified in requirements.txt

Code style

Format the code with black

Run tests and measure code coverage

pytest

Generate demo sounds for empirical evaluation

python -m demo.demo

Alternatives

Acknowledgements

Thanks to Nomono for backing audiomentations.

Thanks to all contributors who help improve audiomentations.
