Tether AI Research Group has released its next-generation medical AI model, QVAC MedPsy, which can run locally on low-power hardware such as smartphones and wearable devices without relying on cloud servers, while outperforming several larger state-of-the-art (SOTA) models on multiple medical benchmarks.

According to official figures, the 1.7-billion-parameter version of QVAC MedPsy achieved an average score of 62.62 across seven closed-ended medical benchmarks, beating Google's MedGemma-1.5-4B-it by 11.42 points despite having less than half as many parameters. On realistic clinical evaluations such as HealthBench Hard, the model even surpassed MedGemma 27B, which has nearly 16 times as many parameters. The 4-billion-parameter version achieved an average score of 70.54, outperforming models nearly seven times its size on several medical reasoning evaluations.

Tether attributes this "high performance from a small model" to post-training optimization for medical reasoning, reinforcement learning, and training on high-quality medical data.

Compared with traditional cloud-based AI architectures, QVAC MedPsy significantly reduces inference costs: its 4-billion-parameter version generates an average of roughly 909 tokens per response, far fewer than the 2,953 tokens of comparable systems, which translates into lower latency and compute costs. The model is also available in a GGUF-quantized version suitable for local deployment on mobile and edge devices.

Tether CEO Paolo Ardoino said the model's core goal is to improve efficiency rather than simply scale up parameters, enabling medical AI to run directly on hospital systems or end-user devices and avoiding the upload of sensitive medical data to the cloud.