Also developers often want more ram, and if youre on the mac side, the M series ram works as video ram for loading and running models, so there’s a good chance you can already run something better than is typical of others, and apple is focusing on this by adding more NPUs and increasing memory bandwidth. They arent good at training, but can do inference.
Why would a developer likely have a high end GPU? Writing code doesn’t use a GPU.
There is a significant overlap between developers and gamers.
Most developers use their work provided machines, which aren’t gaming machines with giant GPUs because again, GPUs don’t help development at all.
Also developers often want more ram, and if youre on the mac side, the M series ram works as video ram for loading and running models, so there’s a good chance you can already run something better than is typical of others, and apple is focusing on this by adding more NPUs and increasing memory bandwidth. They arent good at training, but can do inference.
I’m on a MacBook with M2, 32GB ram. Literally just tried:
Well, I guess I’ll try again next year.
For context: my home pc is running gemma4:31b just fine. It’s also a beefy ass desktop, though.
Are you running an mlx model? If not, try that. My m4 macbook runs qwen3.6-35b-a3b lightning fast. Has its issues, but fast nonetheless.
What kind of context length can you get with that, and how much ram?
You might be doing something wrong, models that size shouldn’t be that slow if properly configured on a 32gb m2
You need a metal optimized client and model, not the same models you’d run on your desktop machine.