Position Summary
The candidate will be responsible for automated deployments, ensuring the highest reliability and scalability of our Production services, and efficiently managing our cloud platform infrastructure.
Our ideal candidate is a professional with experience in automating deployments with modern configuration and deployment management systems. The candidate needs broad knowledge of systems, servers, load balancers, storage, security, and networking, plus some background in programming. We use cloud infrastructure (AWS), containerization, and CI/CD processes.
Responsibilities
Build, scale, and monitor various highly complex applications in our cloud platform infrastructure.
Build and maintain highly available systems on containerization (Docker & Kubernetes).
Manage and support multitier architecture focusing on web technology stack (CDN, Reverse Proxy, Application, DB).
Work with application developers to automate and accelerate the testing, release, and deployment of applications into a runtime environment, quickly and reliably.
Improve reliability and performance of test and build processes.
Design and maintain automated release channels.
Proactively look for ways to automate the installation and upkeep of build tools and dependencies.
Review and recommend solutions and tools to improve the software development process.
Manage pre- and post-release code merges and code branching strategies.
Mentor and teach existing team members. The ideal candidate must have experience clearly explaining solutions to complex problems and must demonstrate the ability to lead and impart knowledge effectively to junior team members.
Skills & Qualifications
Bachelor’s degree in IT/Computer Science or related field; 4+ years in DevOps.
Hands-on experience with CI/CD pipelines and scripting (Python, Java, Go, PowerShell, etc.).
Good knowledge of Linux systems and Kubernetes in production environments.
Understanding of distributed data platforms (Kafka, Flink, Cassandra; ClickHouse is a plus).
Skilled in automating hybrid-cloud (AWS & On-Prem) infrastructure and using Terraform and Ansible for configuration management.
Proficient in AWS services (EC2, S3, EKS, IAM, RDS) and Unix/Shell scripting.
Knowledge of monitoring tools (Prometheus, Grafana) and improving system availability, latency, and reliability.
Experienced in incident response and root cause analysis (RCA) across OS, network, and database in SaaS/IaaS environments, applying SRE best practices.
Strong problem-solving and cost-optimization mindset; passionate about automation.
Team player, proactive and adaptable in a dynamic environment.