Stop Training Your LLM Router: Zero-Shot Confidence Beats Supervised Baselines
TL;DR: We tested whether you actually need labeled training data to route queries between a cheap local LLM and an expensive cloud model. You don’t. Average token log-probability — available for free from the first query — matches supervised routing in-distribution and crushes it when the query distribution shifts. We tested across 3 model families, 2 datasets, ~4,500 queries, and $123 in cloud costs. All code and data are open.
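The routing rule in the TL;DR can be sketched in a few lines: score the local model's draft answer by its mean per-token log-probability, and escalate to the cloud model only when that score falls below a threshold. This is a minimal illustration, not the post's actual implementation; the function names and the threshold value here are hypothetical.

```python
def avg_logprob(token_logprobs: list[float]) -> float:
    """Mean per-token log-probability of the local model's draft answer."""
    return sum(token_logprobs) / len(token_logprobs)

def route(token_logprobs: list[float], threshold: float = -1.0) -> str:
    """Route to 'local' if the draft looks confident enough, else 'cloud'.

    threshold is a hypothetical placeholder; in practice it would be
    chosen per model family, since log-prob scales differ across models.
    """
    return "local" if avg_logprob(token_logprobs) >= threshold else "cloud"

# Confident draft: tokens near probability 1, so logprobs near 0.
print(route([-0.05, -0.10, -0.02]))  # local
# Uncertain draft: low-probability tokens drag the mean down.
print(route([-2.3, -1.8, -3.1]))     # cloud
```

Most local inference stacks can return per-token log-probabilities alongside the generated text, which is why this signal is "available for free from the first query": no labels, no trained router, just the draft the local model was going to produce anyway.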
