1. Role Overview
We are collaborating with a leading AI research team to advance DeepResearch-2-App pipelines that simulate real-world code generation tasks, and we’re seeking senior-level software engineers to serve as independent evaluators and supervisors in this process. You’ll help assess and refine AI-generated code across a wide range of domain-specific scenarios, with a focus on feasibility, functionality, and test coverage. This is a part-time, project-based contract ideal for highly experienced engineers looking to contribute to cutting-edge AI evaluation.
2. Key Responsibilities
Review domain-generated prompts and assess their feasibility from a coding perspective
Supervise model outputs and validate Dockerfile execution
Design and implement 40–60 unit tests per evaluation set
Review peer-generated unit tests for completeness and robustness
Execute unit tests and confirm code performance and reliability
3. Ideal Qualifications
6+ years of professional software engineering experience
Deep specialization in backend or full-stack development, with testing and evaluation experience
Strong ability to assess technical feasibility and debug complex systems
Experience with Docker and automated testing frameworks
Detail-oriented mindset and ability to provide structured technical feedback
4. More About the Opportunity
Remote and asynchronous — set your own schedule
Estimated workload: ~20 hours per week
Project-based contract, with ongoing need for evaluations
5. Compensation & Contract Terms
$120/hour for all services rendered
Paid weekly via Stripe Connect
You’ll be classified as an independent contractor
6. Application Process
Submit your resume to get started
Complete a brief form to detail your technical expertise
If selected, you’ll receive onboarding materials and sample tasks
We consider all qualified applicants without regard to legally protected characteristics and provide reasonable accommodations upon request.
7. Contract and Payment Terms
You will be engaged as an independent contractor.
This is a fully remote role that can be completed on your own schedule.
Projects can be extended, shortened, or concluded early depending on needs and performance.
Your work will not involve access to confidential or proprietary information from any employer, client, or institution.
Payments are made weekly via Stripe or Wise, based on services rendered.