lmp.util.optim
Optimization utilities.
- lmp.util.optim.get_optimizer(beta1: float, beta2: float, eps: float, lr: float, model: BaseModel, weight_decay: float) → AdamW
Get AdamW optimizer.
- Parameters
beta1 (float) – First coefficient of gradient moving average.
beta2 (float) – Second coefficient of gradient moving average.
eps (float) – Term added to the denominator for numerical stability.
lr (float) – Learning rate of gradient descent.
model (lmp.model.BaseModel) – Language model to be optimized.
weight_decay (float) – Weight decay coefficient.
- Returns
Language model optimizer.
- Return type
torch.optim.AdamW
See also
torch.optim.AdamW
AdamW algorithm.
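Examples
A minimal usage sketch. Here ``model`` is assumed to be an already-constructed lmp.model.BaseModel instance (e.g. built by the training script), and the hyperparameter values are illustrative, not library defaults.
>>> import lmp.util.optim
>>> optim = lmp.util.optim.get_optimizer(
...   beta1=0.9,          # first moment decay rate
...   beta2=0.999,        # second moment decay rate
...   eps=1e-8,           # numerical stability term
...   lr=1e-4,            # peak learning rate
...   model=model,        # assumed ``lmp.model.BaseModel`` instance
...   weight_decay=1e-2,  # decoupled weight decay coefficient
... )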
- lmp.util.optim.get_scheduler(optim: AdamW, total_step: int, warmup_step: int) → LambdaLR
Get a linear decay scheduler with linear warmup.
The learning rate first increases linearly (warmup) to the optimizer's learning rate, then decays linearly to 0.
- Parameters
optim (torch.optim.AdamW) – Optimizer to be scheduled.
total_step (int) – Total training step.
warmup_step (int) – Learning rate warmup step.
- Returns
Optimizer learning rate scheduler.
- Return type
torch.optim.lr_scheduler.LambdaLR
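Examples
A usage sketch continuing the get_optimizer example above; ``optim`` is the optimizer built there, and the step counts are illustrative. The scheduler is stepped once per training step, after the optimizer.
>>> import lmp.util.optim
>>> schd = lmp.util.optim.get_scheduler(
...   optim=optim,
...   total_step=10000,
...   warmup_step=1000,
... )
>>> for step in range(10000):
...   ...             # forward / backward pass, then
...   optim.step()    # update parameters, then
...   schd.step()     # advance the learning rate schedule
The schedule's shape follows from the description above: the base learning rate is scaled by a piecewise-linear factor. The ``lr_factor`` function below is a hypothetical illustration of that factor, not necessarily the library's exact implementation.
>>> def lr_factor(step: int, warmup_step: int, total_step: int) -> float:
...   """Presumed piecewise-linear multiplier on the base learning rate."""
...   if step < warmup_step:
...     # Linear warmup: 0 -> 1 over ``warmup_step`` steps.
...     return step / max(1, warmup_step)
...   # Linear decay: 1 -> 0 over the remaining steps.
...   return max(0.0, (total_step - step) / max(1, total_step - warmup_step))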