Position:
SWE Expert
Type:
Hourly contract
Compensation:
$70-$150 per hour
Commitment:
Project-based
Location:
Remote
Role Responsibilities
Translate high-level AI evaluation objectives into structured, testable deliverables with defined inputs, outputs, and success criteria
Create documentation describing expected behavior, constraints, and edge cases for evaluation workflows
Develop lightweight automation scripts that generate artifacts, validate outputs, and enforce formatting requirements
Build deterministic Python verifier scripts that confirm task completion through output or final-state validation
Design prompts and evaluation tasks that reliably trigger intended workflow behavior while preventing instruction leakage
Implement robust error handling and clear failure messaging in verification tooling
Develop negative-control or baseline approaches that test whether evaluation systems correctly distinguish valid solutions from invalid ones
Maintain well-structured, reproducible artifacts with consistent naming and version control practices
Requirements
Strong Python skills including scripting, file system operations, parsing, and deterministic validation logic
Experience with automated evaluation, testing frameworks, or verification workflows
Familiarity with prompt design and evaluation methodologies for large language models
Ability to create structured technical documentation in formats such as Markdown
Experience with developer tooling such as Git, command-line workflows, virtual environments, and dependency management
Understanding of reproducible evaluation practices and deterministic task design
Strong communication skills with the ability to produce clear specifications and controlled project scope
Application Process (takes about 20 minutes)
Upload resume
Interview (15 min)
Submit form