MelKWS Engine
A simple, resource-efficient hardware accelerator for Keyword Spotting (KWS) applications that uses log-mel spectrograms as its audio features.
Architecture
Description
- Input:
  - The input audio stream is sampled at a fixed rate, for example 16 kHz.
  - Each audio frame consists of a fixed number of samples.
- Log-Mel Spectrogram Computation:
  - A lightweight log-mel spectrogram module extracts features from the input audio stream.
- Keyword Detection:
  - The accelerator detects the presence or absence of a single predefined keyword or command based on the computed log-mel spectrograms.
- Output:
  - A binary flag signal indicates the presence or absence of the keyword in the input audio stream.
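The feature path described above can be modelled in software before it is committed to hardware. The following is a minimal pure-Python sketch of a log-mel front end, not the accelerator's actual implementation. Only the 16 kHz sample rate comes from the description; the frame length (400 samples), hop (160 samples), FFT size (512), number of mel bands (40), the Hann window, and the mel-scale formula are all assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # Hz; the example rate given in the description
FRAME_LEN = 400       # assumption: 25 ms frames
HOP_LEN = 160         # assumption: 10 ms hop, so consecutive frames overlap
N_FFT = 512           # assumption: frames are zero-padded to a power of two
N_MELS = 40           # assumption: number of mel bands


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sr / 2)
    # n_mels + 2 band edges, expressed as fractional FFT bin positions
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) * n_fft / sr
             for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            rising = (b - lo) / (mid - lo)
            falling = (hi - b) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_frames(samples, bank):
    """Frame, window, transform, and compress a list of audio samples."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (FRAME_LEN - 1))
              for n in range(FRAME_LEN)]  # Hann window
    frames = []
    for start in range(0, len(samples) - FRAME_LEN + 1, HOP_LEN):
        frame = [samples[start + n] * window[n] for n in range(FRAME_LEN)]
        frame += [0.0] * (N_FFT - FRAME_LEN)          # zero-pad to N_FFT
        spectrum = fft(frame)[: N_FFT // 2 + 1]       # keep non-negative bins
        power = [abs(c) ** 2 for c in spectrum]
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([math.log(e + 1e-6) for e in mel])  # small floor avoids log(0)
    return frames


# Usage: 0.1 s of a 1 kHz test tone (a stand-in for real audio)
bank = mel_filterbank()
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(1600)]
features = log_mel_frames(tone, bank)
print(len(features), "frames x", len(features[0]), "mel bands")
```

A model like this is mainly useful as a golden reference: the same input samples can be fed to the hardware and the two outputs compared within a tolerance that reflects the chosen fixed-point precision.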
Architecture Choice
- Input Interface:
  - Purpose: Handles incoming audio samples, ensuring they are correctly timed and formatted for processing.
  - Components:
    - Sample buffer: Temporarily stores incoming audio samples.
    - Control logic: Manages the flow of samples based on system state and input validity.
- Pre-processing:
  - Purpose: Applies necessary pre-processing steps to the audio samples, such as framing and windowing.
  - Components:
    - Frame buffer: Segments the continuous audio stream into overlapping frames.
    - Window function: Applies a windowing function to each frame to minimize spectral leakage.
- FFT Module:
  - Purpose: Converts time-domain audio frames into frequency-domain representations using the Fast Fourier Transform (FFT).
  - Components:
    - FFT processor: Computes the FFT of each windowed frame.
- Mel Filterbank Processing:
  - Purpose: Applies a set of Mel-scaled filters to the FFT output to extract frequency bands that approximate human auditory perception.
  - Components:
    - Filterbank: A collection of band-pass filters spaced on the Mel scale.
    - Energy computation: Calculates the energy in each Mel band.
- Feature Extraction:
  - Purpose: Optionally extracts additional features from the Mel spectrogram, such as MFCCs (Mel Frequency Cepstral Coefficients), if required by the keyword detection logic.
  - Components:
    - Feature extractor: Calculates MFCCs or other features from the Mel spectrogram.
- Dynamic Precision Adjustment:
  - Purpose: Adjusts the precision of the FFT or Mel spectrogram data to optimize for computational efficiency or resource usage.
  - Components:
    - Precision control: Dynamically adjusts data bit-width based on configurable criteria.
- Logarithmic Compression:
  - Purpose: Applies logarithmic compression to the Mel spectrogram to better match the non-linear perception of loudness in the human auditory system.
  - Components:
    - Logarithmic function: Computes the logarithm of Mel spectrogram values.
- Keyword Detection Logic:
  - Purpose: Analyzes the log-Mel spectrogram (and possibly additional features) to detect the presence of specific keywords.
  - Components:
    - Detection algorithm: Implements simple thresholding or a more complex pattern-matching or machine-learning algorithm to identify keywords.
    - Keyword selector: Allows dynamic selection of the keyword(s) to be detected.
- Output Interface:
  - Purpose: Indicates the detection result, such as the presence of a keyword.
  - Components:
    - Detection output: Signals when a keyword has been detected.
    - Status indicators: Provide additional information about the detection process, such as confidence levels.
- Integration and Control (Top):
  - System Controller: Coordinates the operation of all stages, managing state transitions, processing flow, and synchronization.
  - Clock and Reset Management: Ensures all components operate synchronously and can be reset to a known state.
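To make the back end of this architecture concrete, here is a small Python behavioural sketch of the precision control, keyword detection, keyword selector, and output stages. It is an illustration under stated assumptions, not the design's RTL. The source leaves the detection algorithm open, so cosine-similarity template matching with a threshold is used here as a stand-in. The names `quantize`, `KeywordDetector`, and `Detection`, the 8-bit width, the full-scale value, the threshold, and the toy templates are all hypothetical.

```python
import math
from dataclasses import dataclass


def quantize(values, bits, full_scale):
    """Precision control: map values to signed integer codes of `bits` bits."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    step = full_scale / levels
    # Round to the nearest code and saturate at the representable range
    return [max(-levels, min(levels, round(v / step))) for v in values]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Detection:
    keyword_present: bool  # the binary detection flag
    confidence: float      # status indicator: similarity score


class KeywordDetector:
    """Hypothetical model of the detection logic and output interface."""

    def __init__(self, templates, threshold=0.9, bits=8, full_scale=16.0):
        self.bits = bits
        self.full_scale = full_scale
        self.threshold = threshold
        # Templates are stored at the same precision as incoming features
        self.templates = {
            name: quantize(t, bits, full_scale) for name, t in templates.items()
        }
        self.selected = next(iter(self.templates))

    def select(self, keyword):
        """Keyword selector: choose which stored template to match against."""
        if keyword not in self.templates:
            raise KeyError(keyword)
        self.selected = keyword

    def detect(self, features):
        codes = quantize(features, self.bits, self.full_scale)
        score = cosine_similarity(codes, self.templates[self.selected])
        return Detection(score >= self.threshold, score)


# Usage with toy 4-element feature vectors (real log-Mel features are larger)
detector = KeywordDetector({"yes": [1.0, 4.0, 2.0, 0.5],
                            "no": [4.0, 0.5, 0.5, 3.0]})
result = detector.detect([1.1, 3.9, 2.1, 0.4])
print(result.keyword_present, round(result.confidence, 3))
```

Quantizing the templates and the incoming features with the same bit-width keeps the comparison consistent when the precision setting changes, which is one plausible way to pair the precision control with the detection logic.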
Important Note
Forked from the Caravel User Project
Refer to README for a quickstart of how to use caravel_user_project
Refer to README for this sample project documentation.
Refer to the following readthedocs for how to add cocotb tests to your project.