
Problems with RNNs and How Attention Weights Solved Them

Tiya Vaj
2 min read · Jan 19, 2025


The attention mechanism is indeed inspired by the idea of context in Recurrent Neural Networks (RNNs), but it represents a more flexible and powerful way to manage context.

Here’s how it evolved from RNNs:

RNNs and Context:

  • RNNs are designed to process sequences (like sentences or time-series data) by maintaining a hidden state that carries information from previous time steps. This hidden state is used to capture the context from the earlier parts of the sequence.
  • In an RNN, the context is passed from one step to the next in the form of a hidden state, and this hidden state gets updated as new data comes in. This means that the model relies on the current hidden state to remember previous context.
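A minimal sketch of that idea in NumPy (the weight names W_xh, W_hh and the sizes are illustrative, not from the article): the hidden state h is the only carrier of context, and it is overwritten a little at every step.

```python
import numpy as np

# Minimal vanilla RNN step: the hidden state h carries context forward
# and is updated as each new input arrives.
hidden_size, input_size = 8, 4
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden

def rnn_step(h_prev, x_t):
    # The new hidden state mixes the previous context with the current input.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev)

h = np.zeros(hidden_size)                   # context starts empty
sequence = rng.normal(size=(10, input_size))
for x_t in sequence:                        # context is passed from step to step
    h = rnn_step(h, x_t)
# After the loop, h is the model's only summary of the entire sequence.
```

Notice that whatever the model needs to remember from step 1 has to survive every overwrite up to the final step.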

Problem with RNNs:

  • The key issue with RNNs (especially vanilla RNNs, and even LSTMs/GRUs) is that as the sequence gets longer, the model struggles to retain distant information, largely because of the vanishing gradient problem.
  • Even though LSTMs and GRUs were designed to solve some of these issues, RNNs still tend to focus on the most recent parts of the sequence and can forget important earlier parts, especially in long sequences.
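A toy illustration (not the article's code) of why those long-range gradients vanish: backpropagating through T time steps multiplies T recurrent Jacobians together, and when their norms sit below 1 the product decays exponentially with distance.

```python
import numpy as np

# Backprop through time multiplies the recurrent Jacobian at every step.
# With small recurrent weights (and tanh derivatives <= 1), the product
# shrinks exponentially, so distant tokens barely influence learning.
rng = np.random.default_rng(0)
hidden_size = 8
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # small recurrent weights

grad = np.eye(hidden_size)
for t in range(50):                      # go 50 steps back in time
    grad = grad @ W_hh
    if t in (0, 9, 49):
        print(f"after {t+1} steps: gradient norm ~ {np.linalg.norm(grad):.2e}")
```

The printed norms collapse toward zero, which is exactly the "forgetting important earlier parts" behaviour described above.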

Attention Mechanism:

  • The attention mechanism was introduced as a way to explicitly focus on different parts of the input sequence when producing each output, instead of squeezing everything into a single hidden state (see the sketch below).
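
A minimal sketch of (scaled dot-product) attention, the standard formulation of the mechanism described here; the names and sizes are illustrative assumptions, not the article's code. The key point: every input position gets a weight, so the model can draw on distant context directly rather than through a chain of hidden-state updates.

```python
import numpy as np

def attention(queries, keys, values):
    # Compare each query with every input position, then take a softmax
    # over the scores to get attention weights that sum to 1.
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is a weighted sum over *all* positions, not just the most recent one.
    return weights @ values, weights

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
encoder_states = rng.normal(size=(seq_len, d_model))  # one vector per input token
query = rng.normal(size=(1, d_model))                 # e.g. the current decoder state

context, weights = attention(query, encoder_states, encoder_states)
print(weights.round(2))  # high weight can land on any position, near or far
```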

Written by Tiya Vaj

Ph.D. research scholar in NLP, passionate about data-driven work for social good. Let's connect here: https://www.linkedin.com/in/tiya-v-076648128/
