How to Implement MSLR: Step-by-Step Tutorial
This tutorial walks through a practical, step-by-step process for implementing MSLR (Multi-Stage Learning Rate scheduling), treated here as a progressive learning-rate strategy. If you had a different expansion of MSLR in mind (e.g., Mean Sea Level Rise, Minimum Sum of Least Residuals, or another domain-specific acronym), note that this guide uses the machine-learning scheduling interpretation and shows a complete implementation you can adapt.
Prerequisites
- Basic experience with Python and PyTorch or TensorFlow.
- Installed packages: numpy, matplotlib, and either torch (recommended) or tensorflow.
- A dataset and model ready for training.
Overview of the approach
MSLR here means dividing training into distinct stages with different learning-rate policies per stage (warmup, base, decay, fine-tune). Typical stages:
- Warmup — raise LR from a small value to the base LR.
- Base — keep a stable base LR for main training.
- Decay — reduce LR (step, cosine, or exponential) to refine weights.
- Fine-tune — very small LR for last epochs.
Step 1 — Choose stage lengths and LR values
- Total epochs: 100 (example).
- Warmup: 5 epochs, LR from 1e-6 to 1e-3.
- Base: 60 epochs, LR = 1e-3.
- Decay: 30 epochs, LR decays from 1e-3 to 1e-5 (cosine or exponential).
- Fine-tune: optional extra 5–10 epochs at 1e-6.
Make decisions based on dataset size and model complexity.
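The stage choices above can be sketched as a single piecewise function of the epoch index. This is a minimal pure-Python sketch (the function name is illustrative); the optional fine-tune epochs are folded into the decay stage so the three stages cover every epoch:

```python
import math


def mslr_lr(epoch, warmup_epochs=5, base_epochs=60, decay_epochs=35,
            lr_init=1e-6, lr_base=1e-3, lr_final=1e-5):
    """Learning rate at a given epoch under the multi-stage schedule."""
    if epoch < warmup_epochs:
        # Linear warmup from lr_init toward lr_base.
        t = (epoch + 1) / warmup_epochs
        return lr_init * (1 - t) + lr_base * t
    if epoch < warmup_epochs + base_epochs:
        # Base stage: hold the base LR.
        return lr_base
    # Cosine decay from lr_base down to lr_final.
    t = (epoch - warmup_epochs - base_epochs) / decay_epochs
    return lr_final + (lr_base - lr_final) * 0.5 * (1 + math.cos(math.pi * t))
```

Printing `mslr_lr(epoch)` for a few epochs is a quick way to confirm the stage boundaries land where you expect before committing to a long run.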
Step 2 — Implement schedulers (PyTorch example)
Code below demonstrates MSLR with PyTorch using a custom scheduler that switches policies between stages.
```python
import math

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR


def make_ms_lr_scheduler(optimizer, total_epochs, warmup_epochs=5,
                         base_epochs=60, decay_epochs=35,
                         lr_init=1e-6, lr_base=1e-3, lr_final=1e-5):
    # decay_epochs=35 folds the optional fine-tune epochs into the decay
    # stage so the three stages sum to total_epochs.
    assert warmup_epochs + base_epochs + decay_epochs == total_epochs

    def lr_lambda(epoch):
        # LambdaLR multiplies the optimizer's initial LR by this factor,
        # so each branch returns a multiplier relative to lr_base.
        if epoch < warmup_epochs:
            # Linear warmup from lr_init to lr_base.
            t = (epoch + 1) / warmup_epochs
            return (lr_init * (1 - t) + lr_base * t) / lr_base
        elif epoch < warmup_epochs + base_epochs:
            # Base stage: hold lr_base.
            return 1.0
        else:
            # Cosine decay from lr_base to lr_final over decay_epochs.
            d = epoch - (warmup_epochs + base_epochs)
            t = d / decay_epochs
            cos_decay = 0.5 * (1 + math.cos(math.pi * t))
            lr = lr_final + (lr_base - lr_final) * cos_decay
            return lr / lr_base

    return LambdaLR(optimizer, lr_lambda)


# Example usage:
model = ...  # your nn.Module
optimizer = SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = make_ms_lr_scheduler(optimizer, total_epochs=100,
                                 warmup_epochs=5, base_epochs=60,
                                 decay_epochs=35,
                                 lr_init=1e-6, lr_base=1e-3, lr_final=1e-5)

for epoch in range(100):
    train_one_epoch(...)  # your training loop
    validate(...)         # optional
    scheduler.step()
```
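Before a long run, it helps to sanity-check the schedule by stepping it over a dummy optimizer and plotting the per-epoch LR. This is a self-contained sketch (it inlines the same stage logic rather than importing the helper above, and uses matplotlib from the prerequisites; the output filename is illustrative):

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

# Dummy parameter and optimizer exist only to drive the scheduler.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = SGD([param], lr=1e-3)

warmup, base, decay = 5, 60, 35
lr_init, lr_base, lr_final = 1e-6, 1e-3, 1e-5


def lr_lambda(epoch):
    if epoch < warmup:
        t = (epoch + 1) / warmup
        return (lr_init * (1 - t) + lr_base * t) / lr_base
    if epoch < warmup + base:
        return 1.0
    t = (epoch - warmup - base) / decay
    cos_decay = 0.5 * (1 + math.cos(math.pi * t))
    return (lr_final + (lr_base - lr_final) * cos_decay) / lr_base


scheduler = LambdaLR(optimizer, lr_lambda)

lrs = []
for _ in range(100):
    lrs.append(scheduler.get_last_lr()[0])  # LR in effect this epoch
    optimizer.step()
    scheduler.step()

plt.plot(lrs)
plt.xlabel("epoch")
plt.ylabel("learning rate")
plt.yscale("log")
plt.savefig("mslr_schedule.png")
```

The log-scale y-axis makes the warmup ramp and the cosine tail easy to inspect at a glance.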
Step 3 — TensorFlow/Keras equivalent
Use callbacks to adjust learning rate per epoch. Example using a custom callback:
```python
import math

import tensorflow as tf


class MSLRCallback(tf.keras.callbacks.Callback):
    def __init__(self, total_epochs, warmup_epochs, base_epochs, decay_epochs,
                 lr_init, lr_base, lr_final):
        super().__init__()
        self.total_epochs = total_epochs
        self.warmup_epochs = warmup_epochs
        self.base_epochs = base_epochs
        self.decay_epochs = decay_epochs
        self.lr_init = lr_init
        self.lr_base = lr_base
        self.lr_final = lr_final

    def on_epoch_begin(self, epoch, logs=None):
        if epoch < self.warmup_epochs:
            # Linear warmup from lr_init to lr_base.
            t = (epoch + 1) / self.warmup_epochs
            lr = self.lr_init * (1 - t) + self.lr_base * t
        elif epoch < self.warmup_epochs + self.base_epochs:
            # Base stage: hold lr_base.
            lr = self.lr_base
        else:
            # Cosine decay from lr_base to lr_final.
            d = epoch - (self.warmup_epochs + self.base_epochs)
            t = d / self.decay_epochs
            cos_decay = 0.5 * (1 + math.cos(math.pi * t))
            lr = self.lr_final + (self.lr_base - self.lr_final) * cos_decay
        tf.keras.backend.set_value(self.model.optimizer.lr, lr)
```
Add MSLRCallback(…) to callbacks when calling model.fit().
Step 4 — Integrate with other training components
- Use weight decay/regularization as needed.
- Combine with gradient clipping, mixed precision, or distributed training without changing LR logic.
- Log LR per epoch for debugging (print or TensorBoard).
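For the logging point above, a tiny helper that reads the current LR straight off the optimizer is enough. This is a sketch (the helper name is illustrative); it only assumes the object exposes `param_groups` the way PyTorch optimizers do:

```python
def current_lrs(optimizer):
    """Return the learning rate of each parameter group on the optimizer."""
    return [group["lr"] for group in optimizer.param_groups]


# In the training loop:
# for epoch in range(total_epochs):
#     train_one_epoch(...)
#     print(f"epoch {epoch}: lr={current_lrs(optimizer)}")
#     scheduler.step()
```

The same values can be sent to TensorBoard with `add_scalar` instead of printed.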
Step 5 — Monitor and tune
- Track loss, accuracy, and LR schedule.
- If training diverges during warmup: reduce warmup slope (longer warmup or lower base LR).
- If convergence stalls: try slower decay (longer base) or different decay shape (exponential, step).
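If you swap in the exponential decay shape mentioned above, the decay stage becomes a geometric interpolation between the base and final LR instead of a cosine one. A minimal sketch (function name illustrative):

```python
def exp_decay_lr(d, decay_epochs, lr_base=1e-3, lr_final=1e-5):
    """LR after d epochs of an exponential decay stage (0 <= d <= decay_epochs).

    Interpolates geometrically: lr_base at d=0, lr_final at d=decay_epochs.
    """
    t = d / decay_epochs
    return lr_base * (lr_final / lr_base) ** t
```

This drops the LR faster early in the decay stage than cosine does; step decay, by contrast, holds the LR flat between discrete drops.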
Troubleshooting quick tips
- Overfitting late in training: shorten base, increase decay, or add regularization.
- Underfitting: increase base LR or lengthen base stage.
- Noisy training: use longer warmup and gradient clipping.
Example hyperparameter presets
- Small dataset: total 50 epochs — warmup 3, base 30, decay 17.
- Large dataset / large model: total 200 epochs — warmup 10, base 140, decay 50.
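The presets above can be kept in a small dictionary so a run only has to name one. A sketch (the preset names and helper are illustrative); the assertion catches stage lengths that no longer sum to the total:

```python
PRESETS = {
    "small": {"total_epochs": 50, "warmup": 3, "base": 30, "decay": 17},
    "large": {"total_epochs": 200, "warmup": 10, "base": 140, "decay": 50},
}


def stage_lengths(name):
    """Return (warmup, base, decay) for a named preset, validating the sum."""
    p = PRESETS[name]
    assert p["warmup"] + p["base"] + p["decay"] == p["total_epochs"]
    return p["warmup"], p["base"], p["decay"]
```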
Summary
Implement MSLR by splitting training into warmup, base, decay, and optional fine-tune stages; encode those stages in a scheduler or callback; monitor metrics and adjust stage lengths and LR endpoints to suit your model and data.