We are seeking a (Senior) Site Reliability Engineer with a strong DevOps mindset to drive automation, delivery excellence, and infrastructure scalability for our high-throughput payment platform. You will partner with engineering teams to streamline CI/CD pipelines, implement GitOps workflows, and build internal tools that improve developer productivity and operational reliability.
Responsibilities
Operate and manage large-scale systems with high availability and resilience requirements.
Build internal tools and scripts to eliminate manual work and streamline SRE/DevOps tasks.
Automate infrastructure provisioning, configuration, deployment, and monitoring across on-premises and cloud (e.g., AWS) environments.
Collaborate with development teams to design and maintain scalable, reliable, and secure systems.
Apply security best practices and meet compliance requirements (e.g., PCI DSS, ISO 27001) across the infrastructure.
Monitor systems and respond to incidents 24/7, with a focus on eliminating root causes.
Continuously improve system performance, scalability, and reliability.
Requirements
3+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
Strong Linux systems background with a solid understanding of OS-level debugging and performance tuning.
Expertise in CI/CD, automation, and monitoring tools (e.g., Jenkins, GitLab CI, Terraform, ArgoCD, Prometheus, Grafana).
Extensive experience building scalable and resilient Jenkins pipelines for build, test, and deployment workflows.
Proficiency in programming and scripting languages such as Python, Go, or Bash.
Deep knowledge of Kubernetes, container orchestration, and containerization best practices.
Familiarity with microservices architectures, observability, service meshes, and API gateways.
Experience with distributed systems technologies such as Kafka, Redis, MySQL, MongoDB, and etcd.