  1. June 27, 2022

    Removing silence from audio is a common task in speech machine learning applications, including wakeword/keyword detection, speech recognition, and text-to-speech. By stripping silent segments, we can reduce the computation wasted when training on a speech corpus. Doing…

  2. April 23, 2022

    Many generative tasks in machine learning for speech synthesize audio at relatively low sample rates, usually 16 kHz or 24 kHz. For example, it is common for a text-to-speech pipeline to include a synthesizer that generates a mel spectrogram from text, followed by a vocoder model that…

  3. March 19, 2022

    An estimator of the fundamental frequency, or pitch, of an audio signal is a useful tool for many audio machine learning applications. For example, the pitch contour is used as an input feature in the Mellotron singing synthesizer and the DDSP sound generator. Pitch vectors can also…

  4. February 19, 2022

    Oscillators are basic building blocks for several sound generation algorithms, such as additive, subtractive, and frequency modulation (FM) synthesis. For digital synthesizers, these waveforms are represented by points sampled from a continuous periodic function. In most cases, we…

  5. January 29, 2022

    NumPy's interp is a handy function for generating an array from a piecewise linear mapping defined by a set of control points. For example, here is a linear plot of today's U.S. Treasury yield curve…
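
To illustrate the kind of piecewise linear mapping the last entry describes, here is a minimal sketch of `np.interp`. The control points `xp` and `fp` are made-up values chosen for illustration, not real yield data, and the grid size is arbitrary.

```python
import numpy as np

# Hypothetical control points (illustrative only, not real yield data).
xp = np.array([0.0, 1.0, 3.0])  # x-coordinates, must be increasing
fp = np.array([0.0, 2.0, 1.0])  # values of the mapping at those x-coordinates

# Evaluate the piecewise linear mapping on an evenly spaced grid.
x = np.linspace(0.0, 3.0, 7)    # 0.0, 0.5, 1.0, ..., 3.0
y = np.interp(x, xp, fp)

print(y)
```

Each output value lies on the straight line joining the two nearest control points, so the result rises linearly from 0 to 2 over the first segment and falls linearly from 2 to 1 over the second.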