learned from this guide by openai that if you do LoRA finetuning, you should target the projection layers inside the expert modules as well.
```python
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()
```
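instead of hardcoding a few layer indices, you could generate the `target_parameters` list for every decoder layer. a minimal sketch, assuming the parameter names follow the same `{layer}.mlp.experts.{name}` pattern as the config above (`expert_target_parameters` is a hypothetical helper, not part of peft):

```python
def expert_target_parameters(num_layers, names=("gate_up_proj", "down_proj")):
    # build "{layer}.mlp.experts.{name}" strings for every layer index,
    # matching the naming pattern used in the LoraConfig above
    return [
        f"{layer}.mlp.experts.{name}"
        for layer in range(num_layers)
        for name in names
    ]

# e.g. pass expert_target_parameters(24) as target_parameters to LoraConfig
```

note that targeting experts in every layer grows the trainable parameter count accordingly, so check `print_trainable_parameters()` after.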
i still have issues finetuning this model, and it's not clear to me that i'm doing things right. but i am having fun. i just wish this were a month-long research project rather than a one-week speedrun to get my project done.