spent almost the entire day classifying documents with local models
- google/gemma-3-4b-it
- ibm-granite/granite-vision-3.2-2b
- HuggingFaceTB/SmolVLM-256M-Instruct
- h2oai/h2ovl-mississippi-800m
- a CPU-compatible version
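
for reference, roughly what the loop looks like in transformers for the smallest one (SmolVLM-256M-Instruct). this is a minimal sketch, not my exact script: the label set and the image path are placeholders, and the chat-template / Vision2Seq pattern is the standard one from the model card, so adapt per model.

```python
# minimal sketch: classify one document image with SmolVLM-256M-Instruct
# (label set and "doc.png" are placeholders, not from the original post)
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL = "HuggingFaceTB/SmolVLM-256M-Instruct"
processor = AutoProcessor.from_pretrained(MODEL)
model = AutoModelForVision2Seq.from_pretrained(MODEL)

image = Image.open("doc.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Classify this document. Answer with exactly "
                                  "one of: invoice, receipt, contract, other."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10)
# note: the decoded string includes the prompt; the label is at the tail
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```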
what i learned:
- ollama: handy for pulling and running models locally, but flash attention doesn't work on mac
- prompting doesn't work as well on smaller models, so keep instructions short and force a constrained answer (first sketch below)
- always go for the simplest approach first and ask: how would i solve this without LLMs? (baseline sketch below)
- image inputs are slow
- CPU-only inference is slow too (quick timing check below)
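
to make the ollama + prompting points concrete, a sketch using the ollama python client after something like `ollama pull gemma3:4b` (the model tag, label set, and image path are my assumptions here, not fixed choices). the main trick for small models: demand a single word and validate it anyway.

```python
# sketch: constrained classification prompt via the ollama python client
# (model tag "gemma3:4b", labels, and path are assumptions; use what you pulled)
import ollama

LABELS = ["invoice", "receipt", "contract", "other"]

def classify(image_path: str) -> str:
    response = ollama.chat(
        model="gemma3:4b",
        messages=[{
            "role": "user",
            # keep it short and force a one-word answer: small models
            # drift badly on long instructions
            "content": "Classify this document. Reply with exactly one "
                       f"word from: {', '.join(LABELS)}.",
            "images": [image_path],
        }],
    )
    answer = response["message"]["content"].strip().lower()
    # small models ignore constraints sometimes, so validate and fall back
    return answer if answer in LABELS else "other"

print(classify("doc.png"))
```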
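
and for "simplest approach first": if the documents have extractable text (OCR, pdftotext), a TF-IDF + logistic regression baseline is what i'd try before any VLM. sketch below with tiny illustrative data, assuming you can label a small sample by hand.

```python
# sketch of a no-LLM baseline: TF-IDF + logistic regression on extracted text
# (all training data here is illustrative; supply your own labeled sample)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "invoice no. 4211, total due within 30 days",
    "thanks for your purchase, total $12.99",
]
labels = ["invoice", "receipt"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["amount due: $540.00, payment terms net 30"]))
```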
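
the "slow" complaints are easy to put numbers on. a trivial wrapper around any classifier call (here the hypothetical `classify()` from the ollama sketch above) shows per-document latency:

```python
# quick-and-dirty latency check around a single classification call
import time

t0 = time.perf_counter()
label = classify("doc.png")  # hypothetical classify() from the sketch above
print(f"{label} in {time.perf_counter() - t0:.1f}s")
```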