We are a dynamic, well-funded, and innovative startup in the security industry, dedicated to making AI secure. Our cutting-edge product is set to make a significant impact in the market. We are aggressively pursuing our goals and are looking for a highly skilled and motivated individual to join our team as a Site Reliability Engineer.

Key Responsibilities:

Design, build, and maintain scalable AWS infrastructure with a focus on high availability and fault tolerance.

Design and configure ECS scaling strategies.

Optimize, monitor, and automate Amazon RDS (PostgreSQL) performance, backups, and failover strategies.

Implement disaster recovery plans, backup solutions, and system restoration procedures.

Develop and maintain infrastructure-as-code (IaC) using Terraform or CloudFormation.

Create monitoring and alerting systems using CloudWatch, Prometheus, Grafana, or Datadog.

Enhance CI/CD pipelines to improve deployment automation and system resilience.

Perform incident management, troubleshoot production issues, and conduct post-mortems.

Collaborate with engineering teams to ensure best practices in application reliability and performance.

Stay up to date with AWS services and industry best practices to drive continuous improvement.

Qualifications:

3+ years of experience in SRE, DevOps, or Cloud Engineering roles.

Previous experience in a high-scale production environment.

Strong expertise in AWS services, particularly EC2, ECS, RDS, S3, IAM, and VPC.

Knowledge of event-driven architectures using AWS Lambda and SNS/SQS.

Hands-on experience managing databases in production environments.

Proficiency in Terraform, CloudFormation, or CDK for infrastructure automation.

Experience with containerization (Docker, ECS, Kubernetes).

Solid understanding of Linux systems, networking, and security best practices.

Proficiency in scripting (Python or Bash) for automation.

Strong troubleshooting and incident response skills.

Experience with monitoring and logging tools like CloudWatch, Prometheus, Grafana, or Datadog.

Experience working for a startup.

What We Offer:

An exciting and challenging work environment where you can make a real impact.

Competitive compensation and benefits package.

Opportunity to shape the industry and share in proportionately great upside.

The chance to work with a passionate and talented team on a groundbreaking product.

If you are a highly technical, hands-on professional with a passion for building secure and scalable SaaS solutions, we want to hear from you. Join us and be part of our mission to transform the AI landscape.

Job Type: Full-time

Pay: $70,000 to $95,000 per year

Benefits:

Dental care

Extended health care

Paid time off

Work Location: In person

AWS Site Reliability Engineer (SRE)