Ship a capable ~3B LLM inside your iOS app — guided generation, tool calling, and when to still reach for Claude