News

In a significant move to empower developers and teams working with large language models (LLMs), OpenAI has introduced the Evals API, a new toolset that brings programmatic evaluation capabilities to ...
Additionally, existing evaluation metrics often fail to align with human preferences. Our agent-based evaluation emphasizes a flexible, scalable, and evolving system to keep up with the rapid ...