the M5 Max is a $3,500 toy

the only number that matters for local AI: tokens/sec

what actually matters for LLM inference (in order):
memory bandwidth → determines tok/s
VRAM/unified memory → determines model size
compute → barely matters for inference

25 tokens/sec on 70B sounds impressive until you realize agentic AI needs 100+ to not feel broken (napkin math below)

M5 is a great laptop, but not server material
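
the napkin math, as a sketch: decode is memory-bound, so every generated token has to stream the full weight set from memory, which caps tok/s at roughly bandwidth / model size. the bandwidth and quantization numbers below are illustrative assumptions, not measured M5 Max specs

```python
# roofline upper bound for decode speed: each token reads all active weights,
# so tok/s <= memory bandwidth / model size in bytes
def decode_tok_s(bandwidth_gb_s: float, params_billions: float, bytes_per_param: float) -> float:
    model_gb = params_billions * bytes_per_param  # weights only; ignores KV-cache traffic
    return bandwidth_gb_s / model_gb

# assumed numbers for illustration: ~550 GB/s unified memory bandwidth,
# 4-bit quantized weights (~0.5 bytes/param)
print(decode_tok_s(550, 70, 0.5))  # ~15.7 tok/s on a 70B -- best case, real world is lower
print(decode_tok_s(550, 8, 0.5))   # ~137 tok/s on an 8B -- why small models feel fast
```

same chip, same bandwidth: the 8B clears the 100+ tok/s bar and the 70B never will. no amount of compute fixes that, only more bandwidth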