Company overview
Avantos is building the industry's first AI-native operating system for financial services, redefining how firms onboard clients, deliver advice, and manage core servicing workflows. Our platform unifies fragmented data, automates complex processes, and embeds intelligent decision-making across every step of the client lifecycle.
We partner with leading financial institutions and are scaling rapidly. We're an execution-driven, design-obsessed, product-led team composed of founders and leaders from Wharton, MIT, top design programs, and prior unicorn SaaS companies. We move fast, solve deep industry problems, and build technology that puts users back in control of their workflows.
If you love client impact, product design, complex problem solving, and bringing AI-enabled change to real-world businesses, Avantos is where you will thrive.
Job summary
We're seeking a Senior DevOps Engineer / Site Reliability Engineer to own and evolve our infrastructure, reliability, and deployment practices. You'll be responsible for building the foundational platform that enables our engineering teams to ship quickly and reliably while maintaining the security and compliance standards required in financial services.
Responsibilities
Design, implement, and maintain our AWS cloud infrastructure using infrastructure-as-code principles with Terraform
Build and optimize CI/CD pipelines to enable rapid, safe deployments across multiple environments
Own the observability strategy: implement comprehensive monitoring, logging, and alerting using Datadog and other tooling
Architect and manage containerized workloads on ECS Fargate and evaluate migration paths to Kubernetes
Establish and enforce security best practices, working closely with compliance teams on financial services requirements
Design and implement disaster recovery, backup, and business continuity strategies
Optimize system performance, cost efficiency, and resource utilization across AWS services
Collaborate with engineering teams to improve service reliability, reduce toil, and establish SLOs/SLIs
Participate in incident response and conduct thorough post-mortems to drive continuous improvement
Mentor engineers on DevOps practices, cloud architecture patterns, and operational excellence
Your skills will include
8+ years of experience in DevOps, SRE, or infrastructure engineering roles
Expert-level proficiency with AWS services, including ECS Fargate, ALB, Cognito, S3, and SQS
Deep hands-on experience with Terraform for managing complex, multi-account AWS environments
Strong scripting and automation skills in Python and/or Bash
Proven experience designing and implementing CI/CD pipelines (GitHub Actions, ArgoCD, or similar)
Solid understanding of containerization technologies (Docker) and orchestration platforms (Kubernetes/ECS)
Experience with observability and monitoring tools (Datadog, CloudWatch, or equivalent)
Deep knowledge of networking, security, and AWS best practices
Strong problem-solving abilities and experience troubleshooting complex distributed systems
Excellent communication skills and ability to work cross-functionally with engineering teams
Nice-to-haves
Experience in financial services or highly regulated industries
Familiarity with event-driven architectures and message queue systems (Kafka, SQS)
Experience with PostgreSQL performance tuning and RDS management
Knowledge of microservices architecture patterns and service mesh technologies
Experience with security tooling, vulnerability scanning, and compliance frameworks
Familiarity with our application stack (Golang, Next.js, PostgreSQL)
Experience managing AI/ML infrastructure and Amazon Bedrock
What we offer
Competitive compensation + meaningful equity
Opportunity to build production infrastructure from the ground up for a rapidly scaling AI platform
A culture optimized for engineering excellence, focus, deep work, and ownership, not a ticket factory
Remote work flexibility