About the job
We are partnering with a top AI research organization to evaluate and improve how coding assistants reason, act, and communicate during development workflows. We’re seeking technically sharp experts (especially those with experience in code review, testing, or documentation) to assess full transcripts of user–AI coding conversations. This short-term, fully remote engagement helps shape the future of developer-assisting AI systems.
Key Responsibilities
Review long-form transcripts between users and AI coding assistants
Analyze the AI’s logic, execution, and stated actions in detail
Score each transcript using a 10-point rubric across multiple criteria
Optionally write brief justifications citing examples from the dialogue
Detect mismatches between claims and actions (e.g., saying “I’ll run tests” but not doing so)
Ideal Qualifications
Top choices:
Senior or Staff Engineers with deep code review experience and execution insight
QA Engineers with strong verification and consistency-checking habits
Technical Writers or Documentation Specialists skilled at comparing instructions vs. implementation
Also a strong fit:
Backend or Full-Stack Developers comfortable with function calls, APIs, and test workflows
DevOps or SRE professionals familiar with tool orchestration and system behavior analysis
Languages and Tools:
Proficiency in Python is helpful (most transcripts are Python-based)
Familiarity with other languages like JavaScript, TypeScript, Java, C++, Go, Ruby, Rust, or Bash is a plus
Comfort with Git workflows, testing frameworks, and debugging tools is valuable
More About the Opportunity
Remote and asynchronous — complete tasks on your own schedule
Each transcript batch must be completed within 5 hours of starting (there is no cap on the number of tasks you can take on)
Flexible, task-based engagement with potential for recurring batches
Compensation & Contract Terms
Competitive hourly rates based on geography and experience
Contractors will be classified as independent service providers
Payments issued weekly via Stripe Connect
Application Process
Submit your resume to begin
If selected, you’ll receive rubric documentation and access to the evaluation platform
Most applicants hear back within a few business days