AutoML for DSP pipeline optimization

Huang Kevin
Nov 30, 2021

Automated machine learning (AutoML) is widely regarded as a promising approach to hyperparameter optimization for machine-learning pipelines and to neural architecture search. However, very little of the literature discusses applying AutoML to digital signal processing (DSP) pipelines. Traditionally, DSP pipeline design has required a great deal of expert experience and trial and error. In my opinion, DSP pipeline design can gain several benefits from AutoML:

  • Extend what a simple model can do through composite preprocessing design, making it possible to drop a heavyweight neural-network training framework.
  • Identify the right combination of preprocessor(s) and neural network model, which can dramatically reduce the size of the neural network. In ref[1], Dai et al. showed that it would take 18 weight layers of neural network to replace a simple classical audio preprocessing step, the Mel spectrogram. In ref[2], Samiul Based Shuvo et al. demonstrated that with carefully designed preprocessing, including signal decomposition and wavelet transform, one can downsize the neural network for lung sound classification.
  • Facilitate conversion of the model into C/C++ code ready for hardware deployment, because the ML pipeline can be designed to consist of controllable building blocks, such as preprocessors, that have C/C++ counterparts.
  • Instead of a single complex end-to-end neural network, a composite pipeline is composed of interpretable building blocks with a clear meaning in signal processing, such as the FFT or the wavelet transform.
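To make the idea of "interpretable building blocks" concrete, here is a minimal sketch of a composite preprocessing chain. It is not the Coretronic AutoML system; the block names, the `PIPELINE` list and `run_pipeline` are hypothetical and only NumPy is assumed. Each block has a plain signal-processing meaning, and every intermediate output can be inspected.

```python
import numpy as np

def fft_magnitude(x):
    """One-sided magnitude spectrum of a real-valued signal."""
    return np.abs(np.fft.rfft(x))

def log_compress(x):
    """Log compression to reduce the dynamic range of the spectrum."""
    return np.log1p(x)

def band_energies(x, n_bands=8):
    """Average the spectrum into a few coarse frequency bands."""
    return np.array([band.mean() for band in np.array_split(x, n_bands)])

# Hypothetical pipeline: an ordered list of named, swappable blocks.
PIPELINE = [
    ("fft_magnitude", fft_magnitude),
    ("log_compress", log_compress),
    ("band_energies", band_energies),
]

def run_pipeline(signal, steps=PIPELINE):
    """Apply each block in order and keep every intermediate output."""
    trace = {}
    out = signal
    for name, block in steps:
        out = block(out)
        trace[name] = out
    return out, trace

# Toy input: one second of a 10 Hz sine sampled at 256 Hz.
fs = 256
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 10 * t)
features, trace = run_pipeline(signal)
```

Because each step is a small named function, a search procedure can swap, reorder or tune blocks, and each block maps naturally onto a C/C++ routine with the same role.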

In this post, I will illustrate these points by solving an ECG classification problem with the Coretronic AutoML system.

MIT-BIH ECG dataset

To compare my result with the MathWorks example[3], I used the same ECG dataset from the MathWorks GitHub repository, which is sourced from PhysioNet. It contains 162 ECG recordings in 3 diagnostic classes. By combining the wavelet transform with deep learning, the MATLAB example reaches a test accuracy of 96.875% with GoogLeNet and 93.75% with SqueezeNet.
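With so few recordings, test accuracy moves in coarse steps. The test-set size is not stated here, so the sketch below assumes a held-out set of 32 recordings (roughly 20% of 162). Under that assumption the two reported accuracies are consistent with 31 and 30 correct predictions, meaning the gap between the two networks is a single recording.

```python
from fractions import Fraction

def accuracy_percent(correct, total):
    """Accuracy in percent, computed exactly before converting to float."""
    return float(Fraction(correct, total) * 100)

n_test = 32  # assumption: about 20% of the 162 recordings

googlenet = accuracy_percent(31, n_test)    # matches the reported 96.875%
squeezenet = accuracy_percent(30, n_test)   # matches the reported 93.75%
step = accuracy_percent(1, n_test)          # change caused by one recording
```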

Pipeline search via Coretronic AutoML system

I followed the same procedure as ref[3], using 80% of the data as the training set and the rest as the test set. Next, I fed the training set into our Coretronic AutoML system. After one day of pipeline search, some interesting patterns had already emerged in what the AutoML system recommended. It strongly recommends combining a spectrogram with a vanilla 2D convolutional neural network to solve this problem. Moreover, it suggests that applying a continuous wavelet transform before the spectrogram might help.
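An 80/20 split of this kind can be sketched as follows. This is a generic per-class (stratified) split, not necessarily the exact helper used in ref[3] or in this experiment. The class names and counts (96 ARR, 30 CHF, 36 NSR) are taken from the description in the MathWorks example and should be treated as an assumption here; `stratified_split` is a hypothetical name.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Assumed class counts, as described in the MathWorks example [3].
labels = ["ARR"] * 96 + ["CHF"] * 30 + ["NSR"] * 36
train_idx, test_idx = stratified_split(labels)
```

Splitting per class keeps the class proportions of the small test set close to those of the full dataset, which matters when one class has three times as many recordings as another.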

Top 10 pipelines recommended by the Coretronic AutoML system

Check and compare the top-1 recommended pipeline

Based on the system’s recommendations, I reconstructed the top-1 recommended pipeline, CWT-Spectrogram-2D CNN, with the given set of hyperparameters, and then examined its performance on the test set. Boom! The test accuracy reaches 96.97% with far fewer parameters. Compared with the deep neural networks used in the MATLAB example, namely GoogLeNet and SqueezeNet, our neural network is much shallower and uses no complex operators. This shows that with a well-designed combination of preprocessors, we need neither a very large neural network nor complex operators inside it.
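The preprocessing half of such a pipeline can be sketched with NumPy alone. The exact configuration of the recommended pipeline (wavelet, scales, window length) is not given in this post, so everything below is an assumption: a complex Morlet CWT computed via the FFT, where each scale acts as a band-pass filter, followed by a Hann-windowed spectrogram of each filtered signal, stacked as image channels for a 2D CNN. The function names are hypothetical, and the wavelet normalization is only correct up to a constant factor.

```python
import numpy as np

def morlet_cwt(x, scales, w0=6.0):
    """CWT with an analytic Morlet wavelet, evaluated in the Fourier domain."""
    n = len(x)
    omega = 2 * np.pi * np.fft.fftfreq(n)   # angular frequency, rad/sample
    x_hat = np.fft.fft(x)
    out = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        # Fourier transform of the scaled wavelet (positive frequencies only).
        psi_hat = np.pi ** -0.25 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        out[i] = np.fft.ifft(x_hat * np.sqrt(s) * psi_hat)
    return out

def spectrogram(x, n_fft=64, hop=16):
    """Power spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[k * hop:k * hop + n_fft] * window for k in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2   # (freq_bins, frames)

def cwt_spectrogram(x, scales, n_fft=64, hop=16):
    """Stack one spectrogram per CWT scale: (n_scales, freq_bins, frames)."""
    band_signals = morlet_cwt(x, scales).real
    return np.stack([spectrogram(row, n_fft, hop) for row in band_signals])

# Toy input: a sine at the centre frequency of the scale-8 wavelet.
n = np.arange(1024)
f0 = 6.0 / (2 * np.pi * 8)          # cycles per sample
x = np.sin(2 * np.pi * f0 * n)
scales = [4, 8, 16]
image = cwt_spectrogram(x, scales)  # multi-channel input for a 2D CNN
```

In this toy example almost all of the energy lands in the channel for scale 8, which is the kind of structure a shallow 2D CNN can pick up without having to learn the time-frequency decomposition itself.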

Detailed profile of the top-1 recommended pipeline
Recommended vanilla 2D CNN model with around 835K parameters
Table of model performance from our system and MATLAB
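A parameter count like the one quoted above is obtained by summing the weights and biases of every layer. The layer sizes below are hypothetical and are not the recommended model (its architecture is only shown in the figure); the sketch just shows how the count for a vanilla 2D CNN is derived.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a square 2D convolution."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels

def dense_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (in_features + 1) * out_features

# Hypothetical vanilla 2D CNN for a 3-class problem.
layers = [
    ("conv1", conv2d_params(3, 16, 3)),
    ("conv2", conv2d_params(16, 32, 3)),
    ("conv3", conv2d_params(32, 64, 3)),
    ("dense1", dense_params(64 * 4 * 4, 128)),
    ("dense2", dense_params(128, 3)),
]
total = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name:>7}: {count:>8,d}")
print(f"  total: {total:>8,d}")
```

In small networks of this shape the first fully connected layer usually dominates the count, so the size of the feature map handed over by the preprocessing and convolution stages largely decides the model size.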

Conclusion

In this post, I have discussed the potential benefits of combining AutoML with DSP for pipeline interpretability, model compression and model deployment to C/C++ systems, and demonstrated on an ECG dataset how this approach can compress a neural network model without sacrificing performance. The Coretronic AutoML system has accelerated DSP pipeline design for our sensor products, including accelerometers[4], microphones[4] and spectrometers[5]. In the future, I hope AutoML will deliver more tiny, edge-deployable AI pipelines.

Ref:

[1] Dai, W., Dai, C., Qu, S., Li, J. & Das, S. Very Deep Convolutional Neural Networks for Raw Waveforms. arXiv:1610.00087 [cs] (2016).

[2] Shuvo, S. B., Ali, S. N., Swapnil, S. I., Hasan, T. & Bhuiyan, M. I. H. A Lightweight CNN Model for Detecting Respiratory Diseases From Lung Auscultation Sounds Using EMD-CWT-Based Hybrid Scalogram. IEEE J. Biomed. Health Inform. 25, 2595–2603 (2021).

[3] Classify Time Series Using Wavelet Analysis and Deep Learning https://www.mathworks.com/help/wavelet/ug/classify-time-series-using-wavelet-analysis-and-deep-learning.html

[4] Coretronic MEMS Corporation https://www.coretronicmems.com/

[5] Innospectra http://www.inno-spectra.com/

Huang Kevin

Algorithm engineer at semiconductor company with background in physics