lmp.util.optim

Optimization utilities.

lmp.util.optim.get_optimizer(beta1: float, beta2: float, eps: float, lr: float, model: BaseModel, weight_decay: float) → AdamW

Get AdamW optimizer.

Parameters
  • beta1 (float) – First beta coefficient of AdamW, i.e. the exponential decay rate of the gradient (first moment) moving average.

  • beta2 (float) – Second beta coefficient of AdamW, i.e. the exponential decay rate of the squared gradient (second moment) moving average.

  • eps (float) – Term added to the denominator to improve numerical stability.

  • lr (float) – Learning rate of gradient descent.

  • model (lmp.model.BaseModel) – Language model to be optimized.

  • weight_decay (float) – Weight decay coefficient.

Returns

Language model optimizer.

Return type

torch.optim.AdamW

See also

torch.optim.AdamW

AdamW algorithm.
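
A minimal usage sketch, assuming model is an already constructed lmp.model.BaseModel subclass instance; the hyperparameter values below are only illustrative:

  import lmp.util.optim

  # model is assumed to be an already constructed lmp.model.BaseModel subclass.
  optim = lmp.util.optim.get_optimizer(
    beta1=0.9,
    beta2=0.999,
    eps=1e-8,
    lr=1e-4,
    model=model,
    weight_decay=1e-2,
  )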

lmp.util.optim.get_scheduler(optim: AdamW, total_step: int, warmup_step: int) → LambdaLR

Get linear decay scheduler with linear warm-up.

The learning rate first increases linearly (warms up) from 0 to the value specified in the optimizer, then decays linearly to 0 by the end of training.
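
The multiplicative factor applied to the base learning rate at each step can be written out as the sketch below; lr_factor is a hypothetical helper shown only to make the rule explicit and is not part of lmp:

  def lr_factor(step: int, warmup_step: int, total_step: int) -> float:
    """Illustrative linear warm-up / linear decay factor for the base learning rate."""
    if step < warmup_step:
      # Warm up: grow linearly from 0 to 1 over the warm-up steps.
      return step / max(1, warmup_step)
    # Decay: shrink linearly from 1 to 0, reaching 0 at total_step.
    return max(0.0, (total_step - step) / max(1, total_step - warmup_step))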

Parameters
  • optim (torch.optim.AdamW) – Optimizer to be scheduled.

  • total_step (int) – Total number of training steps.

  • warmup_step (int) – Number of learning rate warm-up steps.

Returns

Optimizer learning rate scheduler.

Return type

torch.optim.lr_scheduler.LambdaLR
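
A usage sketch continuing the optimizer example above; the step counts are only illustrative:

  import lmp.util.optim

  # optim is the AdamW optimizer returned by lmp.util.optim.get_optimizer.
  # Warm up over the first 10000 steps, then decay linearly to 0 at step 100000.
  scheduler = lmp.util.optim.get_scheduler(
    optim=optim,
    total_step=100000,
    warmup_step=10000,
  )

  # Step the scheduler once per optimizer update.
  for _ in range(100000):
    ...  # forward pass, loss computation, loss.backward()
    optim.step()
    scheduler.step()
    optim.zero_grad()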