The differences between BERT and mBERT

Tiya Vaj
2 min read · Feb 29, 2024

The main differences between BERT (Bidirectional Encoder Representations from Transformers) and mBERT (Multilingual BERT) lie in their design and intended use:

1. Language Coverage:
— BERT: BERT was initially trained on English language data and primarily focused on English NLP tasks.
— mBERT: mBERT, on the other hand, is trained on data from multiple languages. It is designed to be a multilingual model capable of understanding and processing text in various languages without the need for language-specific models.
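The "one model, many languages" point can be seen directly in code. Below is a minimal sketch using the Hugging Face `transformers` library (an assumption of this example; the post itself names no tooling): the public mBERT checkpoint `bert-base-multilingual-cased` tokenizes English, German, and Hindi with a single shared tokenizer, no language-specific model required.

```python
# Sketch: one mBERT tokenizer handles several languages.
# Assumes the `transformers` library and network access to download
# the "bert-base-multilingual-cased" checkpoint files.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

sentences = [
    "The weather is nice today.",   # English
    "Das Wetter ist heute schön.",  # German
    "आज मौसम अच्छा है।",              # Hindi
]
for s in sentences:
    # The same shared WordPiece vocabulary segments all three sentences.
    print(tokenizer.tokenize(s))
```

An English-only BERT checkpoint (e.g. `bert-base-cased`) would map most of the non-Latin-script text to `[UNK]` pieces instead.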

2. Pre-training Data:
— BERT: BERT’s pre-training corpus consists of English text: the BooksCorpus and English Wikipedia.
— mBERT: mBERT is trained on monolingual Wikipedia text covering 104 languages; there is no parallel data and no explicit cross-lingual objective. Even so, the shared model and shared vocabulary allow mBERT to capture cross-lingual relationships and transfer knowledge across languages.

3. Vocabulary:
— BERT: BERT uses a WordPiece vocabulary of roughly 30,000 tokens tailored to English text, including whole words, subword pieces, and special tokens.
— mBERT: mBERT’s WordPiece vocabulary (roughly 120,000 tokens) is shared across all of its training languages, allowing it to tokenize and process text in many languages and scripts with a single model.
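To make the shared-vocabulary idea concrete, here is a toy, dependency-free sketch (the vocabulary below is invented for illustration and is not mBERT's real vocabulary): a single WordPiece-style greedy segmenter can break words from several languages into pieces from one shared inventory, falling back to subwords when a whole word is missing.

```python
# Toy shared vocabulary mixing English, German, and Spanish material.
# "##" marks a continuation piece, as in WordPiece.
SHARED_VOCAB = {
    "the", "house", "haus", "casa",   # whole words
    "##s", "un", "##seen",            # subword pieces
}

def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first segmentation, WordPiece-style."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand    # continuation pieces get the prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return [unk]              # no segmentation possible
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece("houses", SHARED_VOCAB))  # ['house', '##s']
print(wordpiece("unseen", SHARED_VOCAB))  # ['un', '##seen']
print(wordpiece("casas", SHARED_VOCAB))   # ['casa', '##s']  (Spanish plural)
```

Because English and Spanish plurals both end up reusing the same `##s` piece, the model sees related forms across languages through shared token IDs.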

4. Fine-tuning and Transfer Learning:
— BERT: BERT models are typically fine-tuned on downstream tasks using task-specific labeled data, primarily for English NLP tasks.
— mBERT: mBERT can be fine-tuned on labeled data from any language included in its multilingual training corpus. It can transfer knowledge across languages, making it useful for low-resource languages or cross-lingual tasks.
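The cross-lingual transfer pattern — fine-tune on labels in a high-resource language, apply to another language with no labels — can be sketched with a toy, dependency-free stand-in. Real mBERT transfers through shared contextual representations; this example only mimics the pattern with shared character n-gram features and a perceptron, and all the data below is invented for illustration.

```python
# Toy sketch of zero-shot cross-lingual transfer: train on ENGLISH
# sentiment labels only, then apply the unchanged classifier to
# SPANISH text. Shared subword material in cognates (e.g.
# "excelente"/"excellent", "terrible"/"terrible") carries the signal,
# loosely analogous to mBERT's shared vocabulary and representations.

def featurize(text, n=4):
    """Character n-gram features shared across languages."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def train(examples, epochs=20):
    """Plain perceptron: +1/-1 labels, update on mistakes."""
    w = {}
    for _ in range(epochs):
        for text, label in examples:
            score = sum(w.get(f, 0.0) for f in featurize(text))
            if score * label <= 0:            # misclassified -> update
                for f in featurize(text):
                    w[f] = w.get(f, 0.0) + label
    return w

def predict(w, text):
    return 1 if sum(w.get(f, 0.0) for f in featurize(text)) > 0 else -1

# Labeled training data exists only in English (the high-resource language).
english = [
    ("an excellent fantastic film", 1),
    ("a horrible terrible film", -1),
    ("excellent acting, fantastic story", 1),
    ("terrible plot, horrible acting", -1),
]
w = train(english)

# Zero-shot evaluation on Spanish: no Spanish labels were ever seen.
print(predict(w, "una película excelente"))  # 1  (positive transfers)
print(predict(w, "una película terrible"))   # -1 (negative transfers)
```

With mBERT the same recipe is standard practice: fine-tune the full model on, say, English labeled data, then evaluate directly on other languages, which is why it is attractive for low-resource settings.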

In summary, while BERT is specialized for English language understanding tasks, mBERT is a multilingual model capable of handling text in multiple languages. mBERT’s broader language coverage and ability to transfer knowledge across languages make it particularly useful for multilingual and cross-lingual NLP applications.



Tiya Vaj

Ph.D. research scholar in NLP, passionate about data-driven approaches for social good.