Resume Training from a Checkpoint
This page documents the resume command of advanced/checkpoint.py in mint-quickstart.
Two resume modes
- Weights only: the script first tries create_training_client_from_state(path) for auto-detection. If the metadata lookup returns 404 for a raw checkpoint path, it falls back to create_lora_training_client(...) plus load_state(path) using MINT_BASE_MODEL / MINT_LORA_RANK (or their defaults).
- With optimizer: create_lora_training_client(...) plus load_state_with_optimizer(path) preserves optimizer momentum, but requires MINT_BASE_MODEL and MINT_LORA_RANK.
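The two modes can be sketched as a single helper. This is a minimal illustration, not the script's actual code: the function name open_training_client, the is_not_found check (which assumes the 404 surfaces as an exception with a status_code attribute), and the default model and rank (copied from the example log on this page) are all assumptions.

```python
import os

# Assumed defaults, taken from the example log on this page.
DEFAULT_BASE_MODEL = "Qwen/Qwen3-0.6B"
DEFAULT_LORA_RANK = 16


def is_not_found(exc):
    """Hypothetical check: treat an exception with status_code == 404 as 'not found'."""
    return getattr(exc, "status_code", None) == 404


def open_training_client(service_client, path, with_optimizer=False, env=None):
    """Return a training client restored from `path` (hypothetical helper)."""
    env = os.environ if env is None else env

    if with_optimizer:
        # Both variables are required in this mode; a missing one raises KeyError.
        model = env["MINT_BASE_MODEL"]
        rank = int(env["MINT_LORA_RANK"])
        tc = service_client.create_lora_training_client(base_model=model, rank=rank)
        tc.load_state_with_optimizer(path).result()
        return tc

    try:
        # Weights only: let the service detect model and rank from metadata.
        return service_client.create_training_client_from_state(path)
    except Exception as exc:
        if not is_not_found(exc):
            raise

    # Fallback for raw checkpoint paths: explicit model/rank from env or defaults.
    model = env.get("MINT_BASE_MODEL", DEFAULT_BASE_MODEL)
    rank = int(env.get("MINT_LORA_RANK", DEFAULT_LORA_RANK))
    tc = service_client.create_lora_training_client(base_model=model, rank=rank)
    tc.load_state(path).result()
    return tc
```

Only the weights-only branch falls back on a 404. The optimizer branch never attempts auto-detection, which is why it needs both environment variables up front.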
Use the MinT endpoint that matches your region:
- Mainland China: https://mint-cn.macaron.xin/
- Outside Mainland China: https://mint.macaron.xin/
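If region selection happens in code, a small lookup keeps the two URLs in one place. This is only a sketch: the region keys "cn" and "global" and the function name endpoint_for are hypothetical, and this page does not say how the URL is passed to the client.

```python
# The two endpoints listed above, keyed by hypothetical region labels.
MINT_ENDPOINTS = {
    "cn": "https://mint-cn.macaron.xin/",
    "global": "https://mint.macaron.xin/",
}


def endpoint_for(region):
    """Return the MinT endpoint for a region label (hypothetical helper)."""
    try:
        return MINT_ENDPOINTS[region]
    except KeyError:
        known = ", ".join(sorted(MINT_ENDPOINTS))
        raise ValueError(f"unknown region {region!r}; expected one of: {known}") from None
```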
Commands
# Weights only
export MINT_API_KEY=sk-...
python advanced/checkpoint.py resume tinker://<run-id>/weights/<checkpoint-name>
# Preserve optimizer state
export MINT_API_KEY=sk-...
export MINT_BASE_MODEL=Qwen/Qwen3-0.6B
export MINT_LORA_RANK=16
python advanced/checkpoint.py resume tinker://<run-id>/weights/<checkpoint-name> --with-optimizer --steps 3
Useful flags:
- --steps: number of post-resume SFT steps to run
- --lr: learning rate for those steps
- --save-name: name of the checkpoint saved after the resume steps finish
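A command line of this shape could be parsed with argparse roughly as follows. This is a sketch of an equivalent interface, not the script's actual parser. The flag names come from this page, but the real defaults are not documented here, so every default below is left as None.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented resume flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="checkpoint.py")
    sub = parser.add_subparsers(dest="command", required=True)

    resume = sub.add_parser("resume", help="resume training from a checkpoint path")
    resume.add_argument("path", help="checkpoint path, e.g. tinker://<run-id>/weights/<checkpoint-name>")
    resume.add_argument("--with-optimizer", action="store_true",
                        help="also restore optimizer state")
    # Defaults are placeholders; the script's real defaults are not stated on this page.
    resume.add_argument("--steps", type=int, default=None,
                        help="number of post-resume SFT steps to run")
    resume.add_argument("--lr", type=float, default=None,
                        help="learning rate for those steps")
    resume.add_argument("--save-name", default=None,
                        help="name of the checkpoint saved after the resume steps finish")
    return parser


args = build_parser().parse_args(
    ["resume", "tinker://<run-id>/weights/<checkpoint-name>", "--with-optimizer", "--steps", "3"]
)
```

argparse converts the dashes in flag names to underscores, so the parsed values are available as args.with_optimizer, args.steps, args.lr, and args.save_name.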
Core APIs
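# Weights only: auto-detect model and rank from the checkpoint metadata
tc = service_client.create_training_client_from_state(resume_path)
# Weights only, fallback: explicit model and rank, then load the weights
tc = service_client.create_lora_training_client(base_model=model, rank=rank)
tc.load_state(resume_path).result()
# With optimizer: explicit model and rank, then load weights and optimizer state
tc = service_client.create_lora_training_client(base_model=model, rank=rank)
tc.load_state_with_optimizer(resume_path).result()
Expected output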
[resume] path=tinker://.../weights/my-ckpt-state with_optimizer=False steps=3
[resume] creating training client from state (optimizer resets)...
[resume] auto-detect state metadata lookup returned 404; retrying with explicit model/rank from env/defaults
[resume] fallback to explicit training client: model=Qwen/Qwen3-0.6B rank=16
[resume] loading state from tinker://.../weights/my-ckpt-state...
[resume] loaded, running 3 SFT step(s)...
[resume] step 1/3 done
[resume] saved: tinker://.../weights/resumed-checkpoint
Common failure cases
- the checkpoint path is missing or invalid
- --with-optimizer is used without matching MINT_BASE_MODEL / MINT_LORA_RANK
- the checkpoint was saved for a different adapter shape than the new client
- the base model is unavailable for your account
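Some of these failures can be caught locally before any request is sent. The check below is a minimal sketch: the function name preflight_errors is hypothetical, and the tinker:// prefix test is an assumption based on the example paths on this page. An adapter-shape mismatch or an unavailable base model can only be reported by the service, so neither is covered.

```python
def preflight_errors(path, with_optimizer, env):
    """Return a list of human-readable problems found before resuming (hypothetical helper)."""
    errors = []

    # Assumption: valid checkpoint paths use the tinker:// scheme shown in the examples.
    if not path or not path.startswith("tinker://"):
        errors.append("checkpoint path is missing or does not start with tinker://")

    if with_optimizer:
        # Optimizer resume needs an explicit model and rank.
        for name in ("MINT_BASE_MODEL", "MINT_LORA_RANK"):
            if not env.get(name):
                errors.append(f"{name} must be set when --with-optimizer is used")

    rank = env.get("MINT_LORA_RANK")
    if rank and not rank.isdigit():
        errors.append("MINT_LORA_RANK must be an integer")

    return errors
```

A caller could pass os.environ as env and stop with a non-zero exit code when the returned list is not empty.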