YAMNet for Audio Classification

Tiya Vaj

  • Pre-trained Model: YAMNet is a deep learning model trained to predict 521 audio event classes.
  • Dataset: Trained on 1,574,587 10-second YouTube soundtrack excerpts from the AudioSet dataset (unbalanced train segments).
  • Architecture: Uses the MobileNet_v1 depthwise-separable convolution architecture for efficient computation.
  • Audio Processing: Designed to process audio files sampled at 16 kHz and make predictions at a 10 Hz frame rate.
  • Performance:
      • d-prime: 2.318
      • Balanced mAP: 0.306
      • Balanced average lwlrap: 0.393 (lwlrap: label-weighted label-ranking average precision, described in the DCASE 2019 Task 2 overview paper)
  • Keras Model: The release includes Keras code for constructing the model and applying it to input audio files; a minimal usage sketch follows this list.
  • Purpose: Released as a baseline for audio event classification and to inspire new applications.
  • Improvements: The model includes refinements to handle the challenges of imbalanced class priors and weak labels.
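
For readers who want to try the model directly, here is a minimal sketch of loading YAMNet from its TensorFlow Hub release and classifying a WAV file. The file name speech.wav is a placeholder, and librosa is just one convenient choice for producing the 16 kHz mono float waveform the model expects.

```python
# Minimal sketch: classify an audio clip with YAMNet from TensorFlow Hub.
# Assumes tensorflow, tensorflow_hub, librosa, and numpy are installed;
# 'speech.wav' is a placeholder path.
import csv

import librosa
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Load the pre-trained YAMNet model from TensorFlow Hub.
model = hub.load('https://tfhub.dev/google/yamnet/1')

# Read the 521 class names from the class map shipped with the model.
class_map_path = model.class_map_path().numpy().decode('utf-8')
with tf.io.gfile.GFile(class_map_path) as f:
    class_names = [row['display_name'] for row in csv.DictReader(f)]

# YAMNet expects a mono float32 waveform in [-1.0, 1.0] sampled at 16 kHz;
# librosa resamples and downmixes to mono on load.
waveform, _ = librosa.load('speech.wav', sr=16000, mono=True)

# scores: (num_frames, 521) per-frame class scores;
# embeddings: (num_frames, 1024) features; spectrogram: the log-mel input.
scores, embeddings, spectrogram = model(waveform)

# Average frame-level scores and report the top 5 classes for the whole clip.
mean_scores = scores.numpy().mean(axis=0)
for i in np.argsort(mean_scores)[::-1][:5]:
    print(f'{class_names[i]}: {mean_scores[i]:.3f}')
```

Averaging the per-frame scores, as above, is the simplest way to turn YAMNet's frame-level predictions into a single clip-level label; applications that need finer temporal localization can work with the per-frame scores directly.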


Written by Tiya Vaj

Ph.D. research scholar in NLP, passionate about data-driven work for social good. Let's connect: https://www.linkedin.com/in/tiya-v-076648128/
