7 Multi-label text classification datasets

Tiya Vaj
2 min read · Jun 18, 2022


1. Netflix Movies and TV Shows from Kaggle

Dataset link: https://www.kaggle.com/datasets/shivamb/netflix-shows

Tutorial: https://medium.com/swlh/multi-label-text-classification-with-scikit-learn-and-tensorflow-257f9ee30536

2. StackSample: 10% of Stack Overflow Q&A from Kaggle

Dataset link: https://www.kaggle.com/datasets/stackoverflow/stacksample

Tutorial: https://pianalytix.com/multi-label-text-classification/

3. Reuters dataset from UCI

Tutorial: https://medium.com/technovators/machine-learning-based-multi-label-text-classification-9a0e17f88bb4

4. Toxic Comment Classification Challenge

Tutorials:

https://towardsdatascience.com/multi-label-text-classification-with-scikit-learn-30714b7819c5

https://stackabuse.com/python-for-nlp-multi-label-text-classification-with-keras/

5. Topic Modeling for Research Articles

Tutorial:

6. mysql, python, and php tags dataset

Tutorial: https://www.section.io/engineering-education/multi-label-classification-with-scikit-multilearn/

7. Arxiv dataset

Tutorial: https://medium.com/analytics-vidhya/building-multi-label-text-classifiers-for-arxiv-paper-abstract-dataset-1cc5353b3e96
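What all of these datasets have in common is that one text can carry several labels at once, so before training the labels are usually turned into a 0/1 matrix with one column per label. Here is a minimal sketch of that step in plain Python, assuming the labels arrive as a comma-separated string. The toy rows and the helper names `split_labels` and `multi_hot_encode` are made up for illustration and are not taken from any of the tutorials above.

```python
# Toy rows standing in for any multi-label text dataset: (text, comma-separated labels).
# Both the texts and the label names are invented for illustration.
rows = [
    ("A detective hunts a killer in Oslo.", "Crime, Thriller"),
    ("Two friends open a bakery.", "Comedy"),
    ("A heist crew plans one last job.", "Crime, Comedy, Thriller"),
]


def split_labels(raw):
    """Split a comma-separated label string into a list of stripped labels."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def multi_hot_encode(label_lists):
    """Return (sorted label vocabulary, one 0/1 row per example)."""
    vocab = sorted({label for labels in label_lists for label in labels})
    index = {label: i for i, label in enumerate(vocab)}
    matrix = []
    for labels in label_lists:
        row = [0] * len(vocab)
        for label in labels:
            row[index[label]] = 1  # mark every label this example carries
        matrix.append(row)
    return vocab, matrix


texts = [text for text, _ in rows]
vocab, y = multi_hot_encode([split_labels(raw) for _, raw in rows])

print(vocab)
for text, row in zip(texts, y):
    print(row, text)
```

In practice, scikit-learn's `MultiLabelBinarizer` does the same job; the hand-written version is only meant to show what the target matrix looks like.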

Finally, I just found a multi-label classification dataset repository with plenty of multi-label datasets that you can have a look at: https://www.uco.es/kdis/mllresources/

Feel free to suggest any other datasets related to multi-label text classification in the comments.

Enjoy learning, everyone!


Tiya Vaj

Ph.D. research scholar in NLP, passionate about data-driven approaches for social good. Let's connect here: https://www.linkedin.com/in/tiya-v-076648128/