Job Description
You will play a key role in building, maintaining, and optimizing a shared cloud infrastructure platform that supports multiple software development teams. The role requires a combination of operational engineering, automation, and systems thinking.
Key Responsibilities:
Design and implement cloud infrastructure platforms (compute, storage, networking, identity, observability, etc.) serving dozens of microservices.
Operate and optimize Kubernetes systems, CI/CD pipelines, GitOps, service mesh, secrets management, autoscaling, etc.
Build centralized monitoring, alerting, and logging systems (Cloud Logging, Prometheus, Grafana, Loki, ELK, Datadog, etc.).
Automate the full infrastructure lifecycle using Terraform, Helm, Ansible, DroneCI, or equivalent.
Ensure scalability, security, stability, and performance of the entire cloud platform.
Work closely with DevOps, Security, Database, Networking, and product development teams.
Requirements:
Minimum 3 years’ experience operating cloud systems: GCP, AWS, Azure, or Oracle Cloud Infrastructure (OCI).
Strong knowledge of infrastructure: networking (VPC, DNS, load balancers), security, IAM, autoscaling, HA/DR, and cost management.
Hands-on experience with Kubernetes: cluster setup, upgrades, networking (CNI), storage (PVC), monitoring, etc.
Proficiency with tools: Terraform, Helm, Docker, GitLab CI/GitHub Actions, DroneCI, Buildpacks, etc.
Deep knowledge of Linux: kernel, cgroups, CLI commands.
Experience with incident response, log debugging, tracing, and performance tuning.
Ability to script with Bash or Go.
Strong system analysis skills, proactivity, and good communication with stakeholders.
Preferred Qualifications:
Experience implementing GitOps (DroneCI, Kustomize).
Experience deploying service mesh and ingress technologies (Envoy, Ingress Nginx, Istio, etc.).
Experience with cloud-native observability stack (OpenTelemetry, Loki, Prometheus, Grafana, ELK, etc.).
Experience working in fast-paced CI/CD environments (hundreds of deploys/day).
Cloud and Kubernetes certifications: CKA, CKAD, AWS/GCP/Azure Professional, etc.
Understanding of software architecture and experience debugging Java applications.
Mentoring or internal training skills.