lmp.script.train_model

Use this script to train a language model on a particular dataset.

This script is usually run after training a tokenizer. Training performance is shown on both the CLI and TensorBoard. Use pipenv run tensorboard to launch TensorBoard, then open http://localhost:6006/ in your browser to see the model's training performance.

See also

lmp.model

All available language models.

lmp.script.train_tknzr

Tokenizer training script.

Examples

The following example trains an Elman Net model ElmanNet on the Wiki-Text-2 dataset WikiText2Dset using the train version.

python -m lmp.script.train_model Elman-Net \
  --batch_size 32 \
  --beta1 0.9 \
  --beta2 0.999 \
  --ckpt_step 1000 \
  --d_emb 100 \
  --d_hid 100 \
  --dset_name wiki-text-2 \
  --eps 1e-8 \
  --exp_name my_model_exp \
  --init_lower -0.1 \
  --init_upper 0.1 \
  --label_smoothing 0.0 \
  --log_step 500 \
  --lr 1e-3 \
  --max_norm 10 \
  --max_seq_len 32 \
  --n_lyr 1 \
  --p_emb 0.5 \
  --p_hid 0.1 \
  --stride 32 \
  --tknzr_exp_name my_tknzr_exp \
  --total_step 10000 \
  --ver train \
  --warmup_step 5000 \
  --weight_decay 0.0

The training results are saved under the path project_root/exp/my_model_exp and can be reused by other scripts. Checkpoints are saved every --ckpt_step steps and performance is logged every --log_step steps.
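
The exact saving and logging logic lives inside the script itself; the following is a minimal Python sketch, using only the flag values above, of the checkpoint and logging cadence those two flags imply.

ckpt_step, log_step, total_step = 1000, 500, 10000

ckpt_steps = [step for step in range(1, total_step + 1) if step % ckpt_step == 0]
log_steps = [step for step in range(1, total_step + 1) if step % log_step == 0]

print(len(ckpt_steps), ckpt_steps[:3])  # 10 [1000, 2000, 3000]
print(len(log_steps), log_steps[:3])    # 20 [500, 1000, 1500]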

One can increase --total_step to train for more steps. Be careful: the model might overfit the dataset if it is trained for too many steps.

python -m lmp.script.train_model Elman-Net \
  --batch_size 32 \
  --beta1 0.9 \
  --beta2 0.999 \
  --ckpt_step 1000 \
  --d_emb 100 \
  --d_hid 100 \
  --dset_name wiki-text-2 \
  --eps 1e-8 \
  --exp_name my_model_exp \
  --init_lower -0.1 \
  --init_upper 0.1 \
  --label_smoothing 0.0 \
  --log_step 500 \
  --lr 1e-3 \
  --max_norm 10 \
  --max_seq_len 32 \
  --n_lyr 1 \
  --p_emb 0.5 \
  --p_hid 0.1 \
  --stride 32 \
  --tknzr_exp_name my_tknzr_exp \
  --total_step 100000 \
  --ver train \
  --warmup_step 5000 \
  --weight_decay 0.0

One can reduce overfitting in the following ways (see the sketch after the example command below for how these flags map onto PyTorch constructs):

  • Increase --batch_size. This increases sample variance and makes the model harder to optimize.

  • Increase --weight_decay. This increases the L2 penalty and keeps the model's output differences small when it is given inputs with large variance.

  • Reduce the number of model parameters (in ElmanNet this means reducing --d_emb or --d_hid). This lowers model capacity and makes it hard to memorize every sample, so the model is forced to learn and exploit patterns shared across samples.

  • Use dropout (in ElmanNet this means increasing --p_emb or --p_hid). Dropout is a way to perform model ensembling without the cost of training multiple model instances.

  • Use label smoothing so that the model is not optimized to predict exactly 0 or 1. This can be done by setting --label_smoothing to a positive value.

  • Use any combination of the tricks above.

python -m lmp.script.train_model Elman-Net \
  --batch_size 32 \
  --beta1 0.9 \
  --beta2 0.999 \
  --ckpt_step 1000 \
  --d_emb 50 \
  --d_hid 50 \
  --dset_name wiki-text-2 \
  --eps 1e-8 \
  --exp_name my_model_exp \
  --init_lower -0.1 \
  --init_upper 0.1 \
  --label_smoothing 0.0 \
  --log_step 500 \
  --lr 1e-3 \
  --max_norm 10 \
  --max_seq_len 32 \
  --n_lyr 1 \
  --p_emb 0.5 \
  --p_hid 0.5 \
  --stride 32 \
  --tknzr_exp_name my_tknzr_exp \
  --total_step 10000 \
  --ver train \
  --warmup_step 5000 \
  --weight_decay 1e-1
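
The regularization flags above are applied inside the model and loss implementations; the following is a minimal sketch, not the actual lmp code, of how dropout and label smoothing map onto standard PyTorch constructs. The tiny stand-in network and vocab_size below are illustrative assumptions only.

import torch

d_emb, d_hid, vocab_size = 50, 50, 1000  # --d_emb, --d_hid; vocab_size is an assumed value

emb = torch.nn.Embedding(vocab_size, d_emb)
emb_dropout = torch.nn.Dropout(p=0.5)  # --p_emb
hid = torch.nn.Linear(d_emb, d_hid)
hid_dropout = torch.nn.Dropout(p=0.5)  # --p_hid
out = torch.nn.Linear(d_hid, vocab_size)

# --label_smoothing maps onto the label_smoothing argument of cross entropy;
# a positive value keeps the model from being pushed toward exact 0/1 predictions.
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

x = torch.randint(0, vocab_size, (32, 32))  # (batch_size, max_seq_len)
h = hid_dropout(torch.tanh(hid(emb_dropout(emb(x)))))
logits = out(h)
loss = loss_fn(logits.reshape(-1, vocab_size), x.reshape(-1))  # dummy targets, for illustration only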

We use torch.optim.AdamW to perform optimization. Use --beta1, --beta2, --eps, --lr and --weight_decay to adjust the optimization hyperparameters. We also use --max_norm to perform gradient clipping, which avoids gradient explosion. A sketch of how these flags map onto PyTorch follows the example command below.

python -m lmp.script.train_model Elman-Net \
  --batch_size 32 \
  --beta1 0.95 \
  --beta2 0.98 \
  --ckpt_step 1000 \
  --d_emb 100 \
  --d_hid 100 \
  --dset_name wiki-text-2 \
  --eps 1e-6 \
  --exp_name my_model_exp \
  --init_lower -0.1 \
  --init_upper 0.1 \
  --label_smoothing 0.0 \
  --log_step 500 \
  --lr 5e-4 \
  --max_norm 0.1 \
  --max_seq_len 32 \
  --n_lyr 1 \
  --p_emb 0.5 \
  --p_hid 0.1 \
  --stride 32 \
  --tknzr_exp_name my_tknzr_exp \
  --total_step 10000 \
  --ver train \
  --warmup_step 5000 \
  --weight_decay 0.0
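
The optimization step itself happens inside the script; the following is a minimal sketch, using the flag values from the command above and a stand-in model, of how these flags map onto torch.optim.AdamW and gradient clipping with torch.nn.utils.clip_grad_norm_.

import torch

model = torch.nn.Linear(100, 100)  # stand-in for the actual language model

optim = torch.optim.AdamW(
  model.parameters(),
  lr=5e-4,             # --lr
  betas=(0.95, 0.98),  # --beta1, --beta2
  eps=1e-6,            # --eps
  weight_decay=0.0,    # --weight_decay
)

loss = model(torch.randn(32, 100)).pow(2).mean()  # dummy loss
loss.backward()

# Clip the gradient norm to --max_norm to avoid gradient explosion.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
optim.step()
optim.zero_grad()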

You can use the -h or --help options to get a list of available language models.

python -m lmp.script.train_model -h

You can use the -h or --help options on a specific language model to get a list of supported CLI arguments.

python -m lmp.script.train_model Elman-Net -h

lmp.script.train_model.main(argv: List[str]) → None

Script entry point.

Parameters

argv (list[str]) – List of CLI arguments.

Return type

None

lmp.script.train_model.parse_args(argv: List[str]) → Namespace

Parse CLI arguments.

Parameters

argv (list[str]) – List of CLI arguments.

See also

sys.argv

Python CLI arguments interface.

Returns

Parsed CLI arguments.

Return type

argparse.Namespace
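
Both functions can also be called programmatically. The following is a minimal sketch reusing the arguments of the first CLI example above; the attribute access args.exp_name is an assumption based on argparse's default --flag to attribute mapping.

from lmp.script.train_model import main, parse_args

argv = [
  'Elman-Net',
  '--batch_size', '32',
  '--beta1', '0.9',
  '--beta2', '0.999',
  '--ckpt_step', '1000',
  '--d_emb', '100',
  '--d_hid', '100',
  '--dset_name', 'wiki-text-2',
  '--eps', '1e-8',
  '--exp_name', 'my_model_exp',
  '--init_lower', '-0.1',
  '--init_upper', '0.1',
  '--label_smoothing', '0.0',
  '--log_step', '500',
  '--lr', '1e-3',
  '--max_norm', '10',
  '--max_seq_len', '32',
  '--n_lyr', '1',
  '--p_emb', '0.5',
  '--p_hid', '0.1',
  '--stride', '32',
  '--tknzr_exp_name', 'my_tknzr_exp',
  '--total_step', '10000',
  '--ver', 'train',
  '--warmup_step', '5000',
  '--weight_decay', '0.0',
]

args = parse_args(argv)  # inspect parsed arguments without training
print(args.exp_name)     # 'my_model_exp', assuming argparse's default dest naming

main(argv)  # runs the full training loop, equivalent to the CLI example above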