Freezing some initial layers, often referred to as the “feature extraction layers,” is a common practice in transfer learning when adapting a pre-trained neural network to a new task. The main reason to freeze these layers is to leverage the features learned from a large, diverse dataset during pre-training while fine-tuning only the later, more task-specific layers of the network. Here’s why you might want to freeze some initial layers (a short code sketch follows the list):
1. Feature Reusability: The initial layers of a deep neural network, especially in convolutional neural networks (CNNs), learn low-level features like edges, textures, and basic shapes. These features are generic and transferable across many tasks and datasets. By freezing these layers, you preserve these learned features, which can be extremely useful for the new task.
2. Data Efficiency: Updating every parameter of a large pre-trained network on a limited amount of task-specific data can lead to overfitting, because the model has far more capacity than the new dataset can constrain. Freezing the initial layers restricts the number of parameters that can be updated during training, reducing the risk of overfitting when data is scarce.
3. Faster Convergence: Since the initial layers have already learned generic features, they provide a good starting point for the model. By keeping them fixed, you can achieve faster convergence during training because the model doesn’t need to relearn basic patterns.
4. Stability: Freezing the initial layers can stabilize training. When you fine-tune a pre-trained model, the newly added, randomly initialized task-specific layers produce large, noisy gradients at the start of training, and these can propagate back and disrupt the pre-trained weights. Keeping the earlier layers fixed ensures the low-level features remain consistent, which helps stabilize the training process.
5. Regularization: The fixed initial layers act as a form of regularization. They impose a constraint on the model’s capacity, making it less likely to fit the noise in the new data and improving generalization.
6. Resource Efficiency: Training deep neural networks with millions of parameters is computationally expensive and time-consuming. Freezing the initial layers means fewer gradients and optimizer states need to be computed and stored, making fine-tuning faster and less memory-intensive.
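To make the mechanics concrete, here is a minimal sketch in PyTorch, assuming a recent torchvision and ResNet-18 as the pre-trained model; the 10-class head, learning rate, and optimizer choice are illustrative assumptions rather than recommendations:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on ImageNet (requires torchvision >= 0.13 for the weights API).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every pre-trained parameter so no gradients are computed for them.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new head for the new task
# (here: a hypothetical 10-class problem). Newly created layers default to
# requires_grad=True, so only this head will be updated during training.
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

With this setup, the frozen backbone acts purely as a fixed feature extractor, and only the new head is trained on the task-specific data.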
In practice, you can choose how many initial layers to freeze based on the specific task and dataset. It’s common to freeze most of the initial layers and fine-tune only the later, more task-specific layers. As a rough guideline, the smaller your dataset and the closer your task is to the pre-training data, the more layers you can afford to keep frozen; finding the best configuration for your particular problem usually takes some experimentation. A sketch of this kind of partial unfreezing follows below.
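Here is a sketch of partial unfreezing on the same ResNet-18 backbone, keeping the early stages frozen and fine-tuning only the last residual stage plus the new head; how many stages to unfreeze, and the per-group learning rates, are assumptions to tune for your own task:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 10)  # hypothetical 10-class head

for name, param in model.named_parameters():
    # "layer4" and "fc" name the last residual stage and the classification
    # head in torchvision's ResNet; everything else stays frozen.
    param.requires_grad = name.startswith(("layer4", "fc"))

# A smaller learning rate for the unfrozen pre-trained layers than for the
# new head is a common choice; the exact values here are illustrative.
optimizer = torch.optim.Adam(
    [
        {"params": model.layer4.parameters(), "lr": 1e-4},
        {"params": model.fc.parameters(), "lr": 1e-3},
    ]
)
```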