Brent M. Spell

Trimming Silence with Gaussian Mixtures
June 27, 2022

Removing silence from audio is a common task in speech machine learning applications, including wakeword/keyword detection, speech recognition, and text-to-speech. By stripping silent segments, we can reduce the computation wasted when training on a speech corpus. Doing…
Audio Bandwidth Extension with GANs
April 23, 2022

Many generative tasks in machine learning for speech synthesize audio at relatively low sample rates, usually 16 kHz or 24 kHz. For example, it is common for a text-to-speech pipeline to include a synthesizer that generates a mel spectrogram from text, followed by a vocoder model that…
Yin Pitch Estimator in PyTorch
March 19, 2022

An estimator of the fundamental frequency, or pitch, of an audio signal is a useful tool for many audio machine learning applications. For example, the pitch contour is used as an input feature in the Mellotron singing synthesizer and the DDSP sound generator. Pitch vectors can also…
Efficient Oscillator Synthesis
February 19, 2022

Oscillators are basic building blocks for several sound generation algorithms, such as additive, subtractive, and frequency modulation (FM) synthesis. For digital synthesizers, these waveforms are represented by points sampled from a continuous periodic function. In most cases, we…
Piecewise Interpolation in TensorFlow
January 29, 2022

NumPy's interp is a handy function for generating an array from a piecewise linear mapping defined by a set of control points. For example, here is a linear plot of today's U.S. Treasury yield curve…