lmp.util.optim

Optimization utilities.

lmp.util.optim.get_optimizer(beta1: float, beta2: float, eps: float, lr: float, model: BaseModel, weight_decay: float) → AdamW

Get AdamW optimizer.

Parameters
  • beta1 (float) – First beta coefficient of AdamW, i.e. the exponential decay rate of the gradient (first moment) moving average.

  • beta2 (float) – Second beta coefficient of AdamW, i.e. the exponential decay rate of the squared gradient (second moment) moving average.

  • eps (float) – Term added to the denominator to improve numerical stability.

  • lr (float) – Learning rate of gradient descent.

  • model (lmp.model.BaseModel) – Language model to be optimized.

  • weight_decay (float) – Weight decay coefficient.

Returns

Language model optimizer.

Return type

torch.optim.AdamW

See also

torch.optim.AdamW

AdamW algorithm.
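
A minimal usage sketch, assuming model is an already constructed lmp.model.BaseModel subclass instance; the hyperparameter values below are only illustrative:

  import lmp.util.optim

  # model is assumed to be an already constructed lmp.model.BaseModel subclass.
  optim = lmp.util.optim.get_optimizer(
    beta1=0.9,
    beta2=0.999,
    eps=1e-8,
    lr=1e-4,
    model=model,
    weight_decay=1e-2,
  )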

lmp.util.optim.get_scheduler(optim: AdamW, total_step: int, warmup_step: int) → LambdaLR

Get linear decay scheduler with linear warm-up.

The learning rate first increases linearly (warms up) from 0 to the value specified in the optimizer, then decays linearly to 0 by the end of training.
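
The multiplicative factor applied to the base learning rate at each step can be written out as the sketch below; lr_factor is a hypothetical helper shown only to make the rule explicit and is not part of lmp:

  def lr_factor(step: int, warmup_step: int, total_step: int) -> float:
    """Illustrative linear warm-up / linear decay factor for the base learning rate."""
    if step < warmup_step:
      # Warm up: grow linearly from 0 to 1 over the warm-up steps.
      return step / max(1, warmup_step)
    # Decay: shrink linearly from 1 to 0, reaching 0 at total_step.
    return max(0.0, (total_step - step) / max(1, total_step - warmup_step))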

Parameters
  • optim (torch.optim.AdamW) – Optimizer to be scheduled.

  • total_step (int) – Total number of training steps.

  • warmup_step (int) – Number of learning rate warm-up steps.

Returns

Optimizer learning rate scheduler.

Return type

torch.optim.lr_scheduler.LambdaLR
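
A usage sketch continuing the optimizer example above; the step counts are only illustrative:

  import lmp.util.optim

  # optim is the AdamW optimizer returned by lmp.util.optim.get_optimizer.
  # Warm up over the first 10000 steps, then decay linearly to 0 at step 100000.
  scheduler = lmp.util.optim.get_scheduler(
    optim=optim,
    total_step=100000,
    warmup_step=10000,
  )

  # Step the scheduler once per optimizer update.
  for _ in range(100000):
    ...  # forward pass, loss computation, loss.backward()
    optim.step()
    scheduler.step()
    optim.zero_grad()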