CodeTiburon is looking for a Senior Python Developer to join our team remotely.
Project: Agent Engineering Platform (AI Systems Control Layer)
An Israel-based startup is building a platform that helps teams design, observe, evaluate, and optimize AI agents systematically, as a software engineering discipline rather than prompt guesswork. The platform provides the infrastructure needed for production-grade AI agents, including evaluation execution, orchestration, gating, reliability, and optimization workflows grounded in real usage data.
Required Skills:
10+ years of experience building production-grade software systems
Strong backend expertise in Python
Experience building microservices and distributed systems
Knowledge of async/event-driven architectures, retries, scheduling, idempotency
Strong understanding of designing clean and stable APIs (service-to-service + SDKs/CLI)
Experience ensuring scale and correctness
Strong testing mindset: unit/integration tests, contract tests, CI/CD gates
Experience with observability: metrics, logs, tracing, profiling, performance/cost budgeting
Cloud and platform fundamentals: Docker, Kubernetes, CI/CD pipelines
Comfortable working with AWS / GCP / Azure
Intermediate+ spoken and written English
Will be a plus:
Strong SRE/observability experience (profiling, tracing, incident response patterns)
Infrastructure-as-code (Terraform, Pulumi, etc.)
Security hardening / production readiness practices
Data/ML backend experience (retrieval systems, vector DBs, evaluation datasets)
Familiarity with frameworks like Ray, Optuna, or other compute/optimization systems
Ability to contribute to light full-stack development (React/TypeScript)
Your key accountabilities and responsibilities will include:
Own and evolve Python microservices and distributed workflows end-to-end
Build and harden distributed evaluation pipelines (execution, scheduling, retries, idempotency, fault tolerance)
Design and maintain stable, well-documented APIs (including versioning, SDK-first ergonomics, backward compatibility)
Engineer services for reliability and scale in multi-tenant production systems
Drive platform observability: metrics, logging, tracing, profiling
Work with static code analysis tools and program transformation
Contribute when needed to platform UI features (React/TypeScript)
What we offer:
Remote-friendly work with overlap with Israel/Europe time zones
Paid leave and holidays
Small, high-autonomy founding team with real ownership
Working product with SDK
Deep technical challenges: distributed systems, pipelines, orchestration, optimization
If this sounds like you and you have most of the skills and qualifications above, please send your CV.
We sincerely thank all applicants for applying; if we like what we see and feel you are a match for our position, we will be in touch.