Run arbitrary open models on your Mac — unified memory, quantization that actually preserves quality, and the honest limits of a single-user inference node