We are assembling a small reviewer bench for people who want deeper exposure to how strong agent systems are actually tested. Share your background in agent evals, benchmarks, and tooling, and we may invite you into hands-on review work that sharpens your instincts for testing real-world agents.
We're looking for hands-on experience with benchmarking, rubric design, or review workflows.
Tell us what you have used: custom harnesses, trace tools, or public eval stacks.
Give us enough signal to match reviewers to the right work.