
Ultra-low-power Audio Feature Extraction using Time-Mode Analog Signal Processing Circuits

On-device audio recognition, in particular keyword spotting, will be instrumental to realizing the promise of pervasive intelligence. On-device operation demands ultra-low power and compact area. The state of the art in fully integrated keyword spotting chips reveals that the power and area bottleneck is not the backend keyword spotting classifier but the frontend audio feature extractor. This motivates research into frontend audio feature extraction that is both power- and area-efficient.

First, we introduce the topic of ultra-low-power audio feature extraction using time-mode analog signal processing circuits. Second, we present an analog audio feature extractor chip that achieves the lowest power per feature and the lowest area per feature when compared, respectively, with the most area-efficient and the most power-efficient published analog audio feature extractor chips. Despite this state-of-the-art efficiency, the chip maintains competitive keyword spotting accuracy when interfaced with a standard, small-footprint software backend neural network. The chip's efficiency is due to a pair of novel circuit techniques we developed, both based on time-mode analog signal processing. Time-mode processing is a paradigm favored by technology scaling, in which analog information is encoded in the timing of digital edges, enabling digital gates to perform analog signal processing.
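To make the time-mode idea concrete, here is a minimal, idealized Python sketch. It is not taken from the thesis, and it makes two assumptions. First, each value is encoded as the arrival time of a single rising edge. Second, gates are ideal, with zero propagation delay except where a delay is added explicitly. Under those assumptions, ordinary digital gates compute simple arithmetic on the encoded values:

- an AND gate outputs the later edge (max);
- an OR gate outputs the earlier edge (min);
- a buffer chain adds a constant delay;
- an XOR gate produces a pulse whose width is the difference of its two input times.

The names `Edge`, `and_gate`, `or_gate`, `delay`, and `xor_pulse_width` are hypothetical. They do not describe the chip's actual circuit techniques, which the abstract does not detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical model: a single rising edge whose arrival time t carries the value."""
    t: float


def and_gate(a: Edge, b: Edge) -> Edge:
    # An ideal AND goes high only once both inputs are high, so the later edge wins: max.
    return Edge(max(a.t, b.t))


def or_gate(a: Edge, b: Edge) -> Edge:
    # An ideal OR goes high as soon as either input is high, so the earlier edge wins: min.
    return Edge(min(a.t, b.t))


def delay(a: Edge, d: float) -> Edge:
    # A buffer chain with total propagation delay d adds a constant to the encoded value.
    return Edge(a.t + d)


def xor_pulse_width(a: Edge, b: Edge) -> float:
    # For step inputs that rise once and stay high, XOR is high only between
    # the two edges, so its output pulse width is |a - b| (time-mode subtraction).
    return abs(a.t - b.t)
```

For example, with edges at 3.0 ns and 5.5 ns, the AND output rises at 5.5 ns and the OR output at 3.0 ns. A 2.0 ns delay moves the first edge to 5.0 ns, and the XOR pulse is 2.5 ns wide. Real gates add finite delays and timing jitter that this model ignores.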

Third, we present a theory-based analysis of one of the two circuit techniques.

Fourth, we present theory- and simulation-based progress towards what would be a novel type of analog filter, the "Time-Mode Analog Filter". Such a filter would use only the horizontal time axis to represent and process continuous-valued information, and it would be built from nothing more than digital gates.

Fifth, and finally, we present a simulation-based study of state-of-the-art analog audio feature extractor chips. It finds that the power consumption of their critical block, the analog filterbank, can be reduced by one and a half orders of magnitude while degrading downstream keyword spotting accuracy by only a couple of percent. This result paves the way towards more rigorous system-level design of audio recognition systems.
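For readers unfamiliar with filterbank-based audio features, the following Python sketch models a generic band-pass filterbank feature extractor in software. Each channel band-pass filters the input, full-wave rectifies it, averages it over a frame, and log-compresses the result. This gives one feature per channel per frame.

This signal chain is an assumption about a typical frontend, not the thesis's chip architecture. The biquad design (taken from the widely used RBJ Audio EQ Cookbook), the channel center frequencies, the quality factor `q`, the frame length, and the function names `bandpass_biquad` and `filterbank_features` are all hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def bandpass_biquad(x: Sequence[float], f0: float, q: float, fs: float) -> List[float]:
    """Band-pass biquad (RBJ Audio EQ Cookbook, 0 dB peak gain), direct form I."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0; b1 is zero for this band-pass design.
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    y: List[float] = []
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def filterbank_features(x: Sequence[float], fs: float, centers: Sequence[float],
                        q: float = 2.0, frame: int = 256) -> List[List[float]]:
    """Return one row per frame holding one log-compressed amplitude per channel."""
    # Band-pass filter and then full-wave rectify every channel.
    rectified = [[abs(v) for v in bandpass_biquad(x, fc, q, fs)] for fc in centers]
    features: List[List[float]] = []
    for k in range(len(x) // frame):
        row = []
        for ch in rectified:
            # The frame average approximates the channel envelope; log compresses its range.
            mean_abs = sum(ch[k * frame:(k + 1) * frame]) / frame
            row.append(math.log(mean_abs + 1e-9))  # small floor avoids log(0)
        features.append(row)
    return features
```

Take a 1 kHz tone sampled at 16 kHz and split 4096 samples into 256-sample frames across channels centered at 250 Hz, 1 kHz, and 4 kHz. This yields 16 rows of 3 features each. Once the filters have settled, the 1 kHz channel gives the largest feature. In an analog chip, each of these stages would be a circuit block rather than a loop, and the filterbank is the part the abstract identifies as the power-critical block.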

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/v7gr-f229
Date: January 2023
Creators: Kinget, Peter R.
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
