Prajol Shrestha

Blog and Portfolio page.


Speech and Audio Processing

A. Basics

  1. Human Speech production & Hearing
    • Speech Production: Anatomy & Models
    • Hearing: Anatomy & Psychoacoustics
  2. Signal Representation
    • Long-term description vs Short-term description
    • Stochastic Properties
    • Spectral representation & Cepstral representation

B. Source Coding for speech & audio signals

  1. Data compression
  2. Quantization
  3. Linear Prediction
  4. Coding in Time Domain
  5. Coding in Frequency Domain
    • MP3, AAC

C. Basics of Automatic Speech Recognition (ASR)

  1. Basics
  2. Approaches to ASR
    • Acoustic-phonetic approach
    • Pattern Recognition approach
    • AI approach
  3. Acoustic Modeling
    • Feature Extraction
    • Feature Transformation
    • Pattern Comparison
  4. Hidden Markov Models
    • Properties of HMM
    • Evaluation of model: Forward Algorithm
    • Search for hidden state sequence of observations: Viterbi Algorithm
    • Optimization of model by training: Baum-Welch Algorithm

D. Basics of Text-to-Speech (TTS) Conversion

  1. System Architecture
  2. Text Analysis
  3. Phonetic Analysis
  4. Prosody
  5. Speech Synthesis
    • Basics
    • Formant Synthesis
    • Concatenative Synthesis
    • Prosodic Modification of Speech: OLA, SOLA, PSOLA

E. Signal Enhancement

  1. Signal Processing Methods
    • Single-channel acquisition & reproduction
    • Multi-channel acquisition & reproduction: Beamforming
  2. Acoustic echo cancellation (AEC)
  3. Noise Reduction
  4. Dereverberation
  5. MIMO Systems for Blind Signal Acquisition: TRINICON