About the Role
We are seeking an exceptionally skilled and experienced Principal DevOps Engineer to join our team. This role is pivotal in designing, implementing, and maintaining our next-generation cloud infrastructure and CI/CD pipelines. The ideal candidate will be a deep technical expert focused on enhancing system reliability, scalability, security, and, critically, cost efficiency. This specialist must possess a profound understanding of the AWS ecosystem, Kubernetes containerization, automation scripting, and cybersecurity principles.
Key Responsibilities
1. Infrastructure and Systems Mastery
Operating System Expertise: Expert proficiency in both Windows and Linux operating
systems, capable of skillfully managing system configurations and resolving complex,
system-level issues.
Network Engineering: Expert knowledge of network principles; proficiently configure
and maintain wired and wireless intranet networks and VPNs.
Cloud Infrastructure Management (AWS):
Act as an AWS Services Expert, responsible for configuring, maintaining, and optimizing the AWS infrastructure, including security and IAM permissions, cross-region communication, traffic monitoring, routing configuration, various storage services (S3, EBS, etc.), and RDS databases.
AWS Billing Service Specialist: Adept at understanding and analyzing the cost structure of AWS expenditures and implementing appropriate adjustments and architectural changes to achieve continuous cost reduction and optimization.
Containerization and Orchestration:
Expert knowledge of Kubernetes (K8s) and Docker Compose, with a deep understanding of cloud container management and operational lifecycle.
Proficiently use ArgoCD and FluxCD (GitOps) for automated deployment and management of K8s applications.
2. Automation and CI/CD Pipeline
Continuous Integration/Continuous Deployment (CI/CD): Proficient in configuring,
scripting, and maintaining Jenkins or GitHub Actions to build and optimize robust
CI/CD pipelines.
Deployment Proficiency: Skillfully deploy various application types to AWS, including
Amazon EKS, ECS, Fargate, Lambda, and other serverless/container platforms.
Service Exposure: Expertly leverage AWS API Gateway and CloudFront for secure
and high-performance application publishing and distribution.
Scripting and Automation: Expert in Python and Shell scripting for automating
operational tasks, infrastructure management, and data handling.
Specific Service Maintenance: Proficiently configure and maintain a CodePush
Server.
3. Security, Monitoring, and Data
Security Specialization: Act as an expert in Cyber Security and Network Security,
implementing and enforcing robust security measures across the infrastructure and
applications.
SSO Implementation: Proficiently configure and maintain SSO (Single Sign-On)
services and integrate them into application code (e.g., Google, Okta, Azure, etc.).
Monitoring and Observability:
Expert at monitoring large-scale cloud services, identifying system health issues and performance bottlenecks, and providing effective solutions.
Skilled in monitoring microservice metrics in K8s, diagnosing performance degradation and spikes, and implementing auto-scaling solutions to meet business demands.
Database Management: MySQL database administration expert, proficient in
transactions, locks, and complex SQL query analysis and performance tuning.
Data Warehousing: Experienced in ETL (Extract, Transform, Load) of large-scale data
into a data warehouse; familiar with Google BigQuery and Snowflake.
4. Additional Skills and Documentation
AI Experience (Plus): Practical experience in setting up a local LLM environment.
Documentation: Highly proficient in writing and organizing technical documentation,
processes, and operational runbooks.
Required Qualifications (Summary)
Deep practical expertise in AWS, Kubernetes, CI/CD, automation, and network
security.
Excellent troubleshooting, problem-solving, and performance optimization skills.
Strong communication and collaboration skills.