the models were finetuned today. on an h100, the 7b took only a few hours, and gemma 4b even less. the 70b, on the other hand, is a beefy one, especially without fsdp or deepspeed, which would've sped up finetuning considerably but are tough to set up.
with claude code, making plots and performance reports is so easy. ad hoc scripts are all one-shotted.