What is the activation function?

Tiya Vaj
2 min read · Apr 27, 2024

In a neural network, the activation function acts like a gatekeeper, deciding how much information a neuron should pass on to the next layer. It takes the weighted sum of inputs from the previous layer (along with a bias term) and applies a mathematical function to determine the neuron’s output.

Let’s denote the weights from the previous layer as w_j (where j refers to the index of the input neuron), the inputs from the previous layer as x_j, and the bias term as b.

The weighted sum can be expressed as:

Σ_j (w_j * x_j) + b

This represents the sum of all the products of weights and inputs, along with the bias term.

The activation function f takes this weighted sum as input and produces the neuron’s output (activation):

y = f(Σ_j (w_j * x_j) + b)

Here, y represents the output of the current neuron.
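
To make this concrete, here is a minimal sketch in NumPy that computes a single neuron’s output from the formula above, with the sigmoid chosen as the activation function f. The weights, inputs, and bias values are purely illustrative, not taken from any trained model.

```python
import numpy as np

def sigmoid(z):
    """Squash the weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs from the previous layer, their weights, and a bias term.
x = np.array([0.5, -1.2, 3.0])   # inputs x_j
w = np.array([0.4, 0.1, -0.6])   # weights w_j
b = 0.2                          # bias

z = np.dot(w, x) + b             # weighted sum: Σ_j (w_j * x_j) + b
y = sigmoid(z)                   # neuron's activation: y = f(z)

print(f"weighted sum z = {z:.3f}, activation y = {y:.3f}")
```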

By applying the activation function, the neuron regulates the information it transmits based on the strength of the weighted input it receives. Neurons with high activation values will pass on more information to the next layer, while those with low activation values will contribute less. This selective gating mechanism is what allows neural networks to learn complex patterns and make decisions.

Here’s a breakdown of the key roles of activation functions:

Introducing Non-Linearity:

  • Without activation functions, a neural network can only model linear relationships between inputs and outputs, no matter how many layers it has. This is a major limitation, as real-world data often has complex, non-linear patterns (the sketch after this list demonstrates the issue).
  • Activation functions add a crucial layer of complexity, allowing neural networks to learn these non-linear patterns. Imagine the difference between a straight line and a winding road — activation functions enable networks to navigate the curves of real-world data.
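
A quick way to see why non-linearity matters: two layers with no activation function collapse into a single linear map, so depth alone buys nothing. The sketch below uses random, illustrative weights to demonstrate the collapse and show how inserting a ReLU between the layers breaks it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function: each is just a matrix multiply.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Passing x through both layers...
h = W1 @ x
out_two_layers = W2 @ h

# ...is identical to one linear layer whose weight matrix is W2 @ W1.
out_one_layer = (W2 @ W1) @ x
print(np.allclose(out_two_layers, out_one_layer))  # True: no extra expressive power

# With a non-linearity (here ReLU) between the layers, the collapse no longer holds.
relu = lambda z: np.maximum(0.0, z)
out_nonlinear = W2 @ relu(W1 @ x)
print(np.allclose(out_nonlinear, out_one_layer))   # generally False
```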

Adding Expressiveness:

  • By introducing non-linearity, activation functions make neural networks more expressive. They allow the network to create a wider range of outputs, essential for tasks like image recognition or speech classification.
  • Think of activation functions as adding a variety of brushes to an artist’s palette. With more tools, the network can create a richer and more nuanced representation of the data.

Computational Efficiency:

  • Some activation functions are computationally simpler than others. This can be important for training large neural networks, where efficiency becomes a concern (see the timing sketch after this list).
  • The choice of activation function often involves a balance between expressiveness and computational cost.
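
As a rough illustration of this cost difference, the sketch below times ReLU (a single element-wise comparison) against the sigmoid (which requires an exponential) on the same array. Exact numbers will vary with hardware and array size; the point is only the relative cost.

```python
import timeit
import numpy as np

z = np.random.default_rng(0).normal(size=1_000_000)

# ReLU needs only an element-wise max; the sigmoid also needs an exponential.
relu_time = timeit.timeit(lambda: np.maximum(0.0, z), number=100)
sigmoid_time = timeit.timeit(lambda: 1.0 / (1.0 + np.exp(-z)), number=100)

print(f"ReLU:    {relu_time:.3f} s for 100 runs")
print(f"Sigmoid: {sigmoid_time:.3f} s for 100 runs")
```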

Here are some common examples of activation functions used in neural networks (each is sketched in code after the list):

  • Sigmoid Function: Squashes its input to a value between 0 and 1, often used in the output layer for binary classification.
  • ReLU (Rectified Linear Unit): Outputs the input directly if it’s positive, otherwise outputs zero. Popular for its efficiency and effectiveness in many tasks.
  • TanH (Hyperbolic Tangent): Outputs a value between -1 and 1. It is similar in shape to the sigmoid but zero-centered, which often leads to faster convergence during training.
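
For reference, here is a small NumPy sketch of these three functions applied to the same inputs; the input values are chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Pass positive values through unchanged; clamp negatives to zero."""
    return np.maximum(0.0, z)

def tanh(z):
    """Map any real number into (-1, 1); zero-centered, unlike the sigmoid."""
    return np.tanh(z)

z = np.linspace(-3, 3, 7)
print("z      :", z)
print("sigmoid:", np.round(sigmoid(z), 3))
print("relu   :", np.round(relu(z), 3))
print("tanh   :", np.round(tanh(z), 3))
```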

Choosing the right activation function depends on the specific problem and network architecture.

Overall, activation functions are fundamental building blocks of neural networks. They add the necessary non-linearity and expressiveness to enable networks to learn complex patterns and perform a wide range of tasks.


Tiya Vaj

Ph.D. research scholar in NLP, passionate about data-driven approaches for social good. Let's connect here: https://www.linkedin.com/in/tiya-v-076648128/