Chief Site Reliability Engineer

EPAM Systems • 🌐 Remote

Posted 1 week ago

Job Description

Become a key member of our Enterprise Technology team as a Chief Site Reliability Engineer, overseeing critical infrastructure and enterprise applications.

You will leverage your expertise in Site Reliability Engineering, CI/CD, cloud platforms, Kubernetes, and security to build resilient and scalable systems. If you are driven to lead innovation and maintain high availability, we invite you to join us.

Responsibilities

Oversee and enhance enterprise application infrastructure through advanced DevOps strategies

Design and manage CI/CD pipelines to facilitate efficient and dependable software delivery

Administer and upgrade Kubernetes clusters, ensuring scalability and robust security

Create and maintain automation tools and scripts primarily in Python

Direct cloud infrastructure operations on Amazon Web Services and Microsoft Azure, with an emphasis on security and identity management

Collaborate with development teams to refine infrastructure as code practices using Terraform

Monitor system performance and implement proactive reliability measures

Coordinate operational requests and maintenance activities effectively

Diagnose and resolve complex infrastructure and deployment challenges

Ensure adherence to security standards and company policies across all systems

Document infrastructure setups and standard operating procedures comprehensively

Lead disaster recovery and business continuity initiatives

Continuously assess emerging technologies to enhance system reliability and efficiency

Requirements

At least 7 years of experience in Site Reliability Engineering or equivalent DevOps roles

Advanced proficiency in Python

Comprehensive experience with Amazon Web Services and Microsoft Azure, including API usage, authentication, and serverless solutions

Deep understanding of cloud networking, Kubernetes cluster management, security, IAM, and configuration automation

Strong knowledge of CI/CD workflows, source control systems, containerization, and infrastructure as code with Terraform

Proven expertise in enabling and improving IaaS environments

Demonstrated success in managing enterprise-scale software development and deployments

Thorough understanding of automation techniques related to CI/CD and IaaS

Exceptional analytical and complex problem-solving abilities

Effective management of operational requests and maintenance processes

Strong communication skills, with English proficiency at the B2+ level
