- Pre-trained Model: YAMNet is a deep learning model trained to predict 521 audio event classes.
- Dataset: Trained on 1,574,587 10-second YouTube soundtrack excerpts from the AudioSet dataset (unbalanced train segments).
- Architecture: Uses the MobileNet_v1 depthwise-separable convolution architecture for efficient computation (a minimal sketch of one such block appears after this list).
- Audio Processing: Designed to process mono audio sampled at 16 kHz and to make predictions at a 10 Hz frame rate (see the inference sketch after this list).
- Performance:
  - d-prime: 2.318
  - Balanced mAP: 0.306
  - Balanced average lwlrap: 0.393 (lwlrap: label-weighted label-ranking average precision, described in the DCASE 2019 Task 2 Overview Paper; a sketch of the metric computation follows this list).
- Keras Model: Includes Keras code for constructing the model and applying it to input audio files.
- Purpose: Released as a baseline for audio event classification and to inspire new applications.
- Improvements: The model includes refinements to handle the challenges of imbalanced class priors and weak labels.
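
The core building block of MobileNet_v1 is the depthwise-separable convolution: a depthwise 3x3 convolution for spatial mixing followed by a pointwise 1x1 convolution for channel mixing, which is far cheaper than a full 3x3 convolution. The Keras sketch below illustrates one such block; the filter count and strides are illustrative placeholders rather than YAMNet's actual layer configuration, though the 96x64 input does match YAMNet's log-mel patch shape.

```python
# Sketch: one MobileNet_v1-style depthwise-separable convolution block in Keras.
# Filter count and strides are illustrative, not YAMNet's actual configuration;
# the 96x64x1 input matches YAMNet's log-mel patch (96 frames x 64 mel bands).
import tensorflow as tf

def separable_conv_block(x, filters, strides=1):
    # Depthwise 3x3: one filter per input channel (spatial mixing only).
    x = tf.keras.layers.DepthwiseConv2D(3, strides=strides, padding='same',
                                        use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    # Pointwise 1x1: mixes channels; this is where most of the parameters live.
    x = tf.keras.layers.Conv2D(filters, 1, padding='same', use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

inputs = tf.keras.Input(shape=(96, 64, 1))
outputs = separable_conv_block(inputs, filters=32, strides=2)
model = tf.keras.Model(inputs, outputs)
model.summary()
```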
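As a sketch of end-to-end usage, the snippet below runs the TensorFlow Hub release of YAMNet on a WAV file: it downmixes to mono, resamples to 16 kHz, and reads the class map that ships as a model asset. The file name `audio.wav` and the assumption of 16-bit PCM input are placeholders.

```python
# Sketch: running the TF Hub release of YAMNet on a WAV file.
# 'audio.wav' is a placeholder path; input is assumed to be 16-bit PCM.
import csv
import io
import numpy as np
import scipy.signal
import tensorflow as tf
import tensorflow_hub as hub
from scipy.io import wavfile

model = hub.load('https://tfhub.dev/google/yamnet/1')

sr, wav = wavfile.read('audio.wav')
wav = wav.astype(np.float32) / 32768.0  # int16 PCM -> float32 in [-1.0, 1.0]
if wav.ndim > 1:
    wav = wav.mean(axis=1)              # downmix to mono
if sr != 16000:
    wav = scipy.signal.resample(wav, int(round(len(wav) * 16000 / sr)))  # to 16 kHz

# scores: [frames, 521] per-frame class scores; embeddings: [frames, 1024].
scores, embeddings, spectrogram = model(wav)

# The class map CSV (index, mid, display_name) ships as a model asset.
class_map = tf.io.read_file(model.class_map_path()).numpy().decode('utf-8')
class_names = [row['display_name'] for row in csv.DictReader(io.StringIO(class_map))]
print('Top class:', class_names[int(np.argmax(scores.numpy().mean(axis=0)))])
```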
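On the metrics above: d-prime is a detection-theory measure of class separation, derived in the AudioSet literature from ROC AUC as d' = sqrt(2) * Phi^-1(AUC), and lwlrap averages, over every positive label in the evaluation set, the label-ranking precision at that label's rank. The sketch below is meant to illustrate the metric definitions, not to reproduce the exact evaluation setup behind the figures above.

```python
# Sketch: lwlrap (label-weighted label-ranking average precision) as defined
# in the DCASE 2019 Task 2 overview paper, plus d-prime from ROC AUC.
# `truth` and `scores` are [num_samples, num_classes]; `truth` is binary.
import numpy as np
from scipy.stats import norm

def lwlrap(truth, scores):
    precisions = []
    for y, s in zip(truth, scores):
        pos = np.flatnonzero(y)                  # indices of true labels
        if pos.size == 0:
            continue                             # samples with no labels are skipped
        ranks = np.argsort(np.argsort(-s)) + 1   # 1-based rank of every class score
        for c in pos:
            # Precision at label c's rank: true labels ranked at or above it / rank.
            precisions.append(np.sum(ranks[pos] <= ranks[c]) / ranks[c])
    # "Label-weighted": every positive label in the set counts equally.
    return float(np.mean(precisions))

def d_prime(auc):
    # Detection-theory d' derived from ROC AUC, as used in the AudioSet literature.
    return np.sqrt(2) * norm.ppf(auc)

# Toy check: a perfect ranking yields lwlrap = 1.0.
truth = np.array([[1, 0, 1], [0, 1, 0]])
scores = np.array([[0.9, 0.1, 0.8], [0.2, 0.7, 0.1]])
print(lwlrap(truth, scores))  # 1.0
```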