Skip to main content

cut sample enclosed noise regions out of sound files

Project description

This repository contains cutbynoise - a tool for cutting regions that are delimited by certain samples (a.k.a. jingles, markers or templates) out of larger sound files.

For example, you can use it on a multi-hour long podcast episode MP3 file where the producer inserts audio advertisements and marks such inserts with static samples.

That way you can easily listen again and again to an advertisement you found interesting.

Alternatively, you can listen to an episode without having to manually pause and skip over advertisements you have grown to hate and whose markers have conditioned you to do so, already.

Example usage:

$ ./cutbynoise.py -b ldn-begin-short.flac -o ldn372-clean.mp3 ldn372.mp3 --trash del -v
004.7 INF Aligning ldn-begin-short.flac with ldn372.mp3 (overlap=True) ...    [__main__]
006.5 INF Template matches at: [1918.26029091 1994.97614545 4029.22829091 4112.20814545] (s)    [__main__]
006.5 INF All template match positions: [1918.26029091 1994.97614545 4029.22829091 4112.20814545] (s)    [__main__]
006.5 INF Noisy regions: [[1918.26029091 1994.97614545]  [4029.22829091 4112.20814545]] (s)    [__main__]
006.5 INF Duration(s) of noisy regions: [76.71585455 82.97985455] (s)    [__main__]
006.5 INF Writing noise to del/01.mp3 ...    [__main__]
006.9 INF Writing noise to del/02.mp3 ...    [__main__]
007.1 INF Writing cut ldn372-clean.mp3 ...    [__main__]

NB: In this case the begin template also matches the end one. In case the region is delimited by truly different templates you can specify the second one via -e.

How it works

Cutbynoise cross-correlates each template with the input sound file. To speed this up and reduce memory usage, only the mono signal is cross-correlated, the signals are down-sampled, the cross-correlation is computed via convolution, using FFT and the overlap-add method.

In a direct implementation, cross-correlating a shorter signal with a longer one basically means computing the dot product over a sliding window. Peaks in the resulting vector are possible template matches. Thus, it can be seen as a variant of the sequence alignment problem.

For the heavy-lifting it uses ffmpeg (via ffmpeg-python), i.e. for reading and resampling the waveform signals and cutting the input file, and SciPy for the cross-correlation.

Related Methods

In case the begin and end templates of regions aren't very static one can look into standard methods used in speech recognition, i.e. especially Dynamic Time Warping (DTW). See also e.g. the Dynamic Time Warping Suite which provides packages for Python and R (FWIW, I haven't tested them, yet).

When it comes to podcasts with ads, some of them provide a premium feed that also contains all the episodes without any ads. Thus, if it fits your budget and you want to support the podcast hosts you can consider to simply pay for such a premium feed on a regular basis, instead of fiddling with cutbynoise.

However, not all podcasts (understandably) can be bothered to provide multiple feeds. For example, AFAICS, the c't uplink podcast is only available with ads, i.e. even if you have a c't Plus paid subscription.

Also, at some point, everyone's budget is limited, i.e. one can only financially support so many projects.

As always, one can also simply unsubscribe podcasts that contain too many annoying ads. Arguably, there are enough podcasts available that don't contain any advertisements.

Getting Started

Cutbynoise can be installed via pip or directly be executed, given that the required dependencies (cf. pyproject.toml) are installed.

On Fedora, you can install the dependencies system-wide like this:

dnf install python3-ffmpeg-python python3-scipy python3-numpy ffmpeg

For creating suitable templates (samples) that delimit your regions you can use your favourite audio editor. For example, you can use the open-source Audacity for this task (cf. File -> Export -> Export Selected Audio ...), which is packaged by many Linux distributions.

Limitations

As is, cutbynoise doesn't take extra care of additional meta-data contained in the input file. Especially chapter marks aren't adjusted, i.e. in the output file some chapter marks might be off by whatever amount was cut. Also, embedded images might not remain in the output file.

FWIW, both limitations might be fixed by moderate extra effort, e.g. by finding the right ffmpeg options and possibly adjusting chapter marks programmatically.

License

Cutbynoise is licensed under the GPL version 3 or later.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cutbynoise-0.0.1.tar.gz (7.0 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page