learned from this guide by openai that if you do LoRA finetuning, you should target the projection layers inside the expert modules as well.
```python
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
    target_parameters=[
        "7.mlp.experts.gate_up_proj",
        "7.mlp.experts.down_proj",
        "15.mlp.experts.gate_up_proj",
        "15.mlp.experts.down_proj",
        "23.mlp.experts.gate_up_proj",
        "23.mlp.experts.down_proj",
    ],
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()
```
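instead of hardcoding a few layer indices, you could generate the `target_parameters` list for every decoder layer. a minimal sketch, assuming the parameter names follow the same `{layer}.mlp.experts.{name}` pattern as the config above (`expert_target_parameters` is a hypothetical helper, not part of peft):

```python
def expert_target_parameters(num_layers, names=("gate_up_proj", "down_proj")):
    # build "{layer}.mlp.experts.{name}" strings for every layer index,
    # matching the naming pattern used in the LoraConfig above
    return [
        f"{layer}.mlp.experts.{name}"
        for layer in range(num_layers)
        for name in names
    ]

# e.g. pass expert_target_parameters(24) as target_parameters to LoraConfig
```

note that targeting experts in every layer grows the trainable parameter count accordingly, so check `print_trainable_parameters()` after.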
i still have issues finetuning this model, and it's not clear to me that i'm doing things right. but i am having fun. i just wish this were a month-long research project rather than a one-week speedrun to get my project done.