👨🏻‍💻 postech.work

Site Reliability Engineer

TalentoHCM • 🌐 In Person • 💵 $102,539 - $168,062

In Person Posted 5 days, 6 hours ago

Job Description

On site in Miami

Contract

Talento has partnered with an enterprise organization on a search for a Site Reliability Engineer (SRE) based in Miami, FL. The SRE ensures the availability, performance, security, and reliability of the organization’s infrastructure and applications. The role focuses on building automation, improving system resilience, managing observability platforms, and supporting incident response processes. The SRE works closely with infrastructure, security, and application teams to maintain a highly scalable and stable environment.

Requirements

5+ years of experience in Site Reliability Engineering, DevOps, or related infrastructure roles

Strong knowledge of cloud platforms (AWS, Azure, or GCP)

Proficiency in automation and scripting (Python, Bash, PowerShell, etc.)

Experience with CI/CD pipelines and infrastructure-as-code tools (Terraform, CloudFormation, etc.)

Hands-on experience with observability/monitoring tools (Datadog, Prometheus, Grafana, Splunk, etc.)

Understanding of networking, security best practices, and system performance tuning

Experience supporting production environments and participating in incident response

Strong troubleshooting skills and ability to diagnose complex system issues

Responsibilities

Improve system uptime, resilience, performance, and overall security posture

Develop automation for deployments, monitoring, alerting, and infrastructure scaling

Manage observability platforms, including dashboards, alerts, and log pipelines

Lead and support incident response workflows to restore services quickly and prevent recurrences

Implement best practices for reliability engineering, performance optimization, and security hardening

Collaborate with infrastructure, cybersecurity, and application teams to ensure reliable system integrations

Maintain documentation, runbooks, and operational standards for system reliability

Continuously evaluate and implement tools to enhance performance, automation, and monitoring
