Expert.ai is the premier artificial intelligence platform for language understanding. Its unique hybrid approach to natural language (NL) combines symbolic human-like comprehension and machine learning to transform language-intensive processes into practical knowledge, providing the insight required to improve decision making throughout organizations.
Our mission is simple: we want to create technology that transforms language into knowledge and insight!
Our vision is to make anyone an expert with simple, powerful AI tools that capture the value of natural language.
With your help, Expert.ai will continue to create experts all over the world. Join us to make a difference!
We are looking for an open-minded, highly motivated person to join our Technology Infrastructure team on a permanent contract.
In this role you will be responsible for designing, implementing, and maintaining the cloud infrastructure that powers our AI platform and enables our development teams to deliver innovative solutions. You will be part of a dynamic and growing Expert.ai team, working at the core of our technology stack. You will architect and manage highly available, scalable Kubernetes and OpenShift environments, implement robust backup and disaster recovery strategies, and ensure our infrastructure meets the highest standards of reliability and security.
This is an excellent opportunity to increase your knowledge and gain experience in managing enterprise-grade cloud infrastructure at scale. You will use your technical expertise to drive automation, optimize infrastructure performance and costs, and implement cutting-edge DevOps practices. This role at Expert.ai is a great opportunity to shape our infrastructure strategy, ensure platform reliability, and drive innovation in cloud technologies.
Our headquarters are in Modena; work-from-home flexibility can be evaluated according to the position and the candidate's experience.
What you will do:
As a Senior Cloud Engineer on our Platform team, you will:
Design, implement, and maintain highly available, scalable cloud infrastructure
Manage and optimize Kubernetes and OpenShift container orchestration platforms
Develop and maintain Helm charts for application deployment and configuration management
Implement and manage backup strategies, disaster recovery, and business continuity plans
Ensure high availability (HA) and fault tolerance across all critical systems
Monitor, troubleshoot, and optimize cloud infrastructure performance and costs
Implement Infrastructure as Code (IaC) practices and automation workflows
Work collaboratively with development teams, architects, and operations to enable continuous delivery
Participate in on-call rotations and incident response procedures
Lead cloud migration projects and infrastructure modernization initiatives
Who you are:
Education & Experience:
At least 6 years of hands-on experience in cloud infrastructure, platform engineering, and DevOps
Expert-level experience managing production Kubernetes and OpenShift environments at scale
Proven track record of designing and implementing high-availability architectures
Required skills:
Expert proficiency in cloud platforms: AWS, Azure, or hybrid cloud environments
Strong scripting skills (at least Python and Bash)
Deep experience with different Kubernetes distributions (AKS, EKS, K3s, OpenShift)
Experience with infrastructure design patterns and best practices
Infrastructure as Code tools (Terraform)
GitOps deployment pattern
CI/CD pipeline design and maintenance
Cloud networking and security principles
Monitoring and logging tools (Prometheus, Grafana, ELK/EFK)
Git-based SCM tools (GitLab)
Nice-to-Have:
Familiarity with backup and disaster recovery tools (Velero, Kasten K10/Kanister)
Secrets management (Vault, AWS Secrets Manager)
Exposure to cloud migration and modernization strategies
Business continuity strategies and HA architectures
Documentation and knowledge sharing practices
Cloud and OpenShift certifications are a plus
Soft Skills & Collaboration:
Excellent communication skills in English (oral and written)
Strong problem-solving, troubleshooting, and debugging capabilities
Ability to collaborate closely with the operations team to resolve incidents and outages effectively, even in high-pressure situations
Collaborative approach with development and operations teams
What We Value:
Passion for automation, efficiency, and infrastructure best practices
Commitment to reliability, security, and operational excellence
Ability to thrive in fast-paced, innovative environments
Proactive approach to learning emerging cloud technologies and trends
Strong focus on documentation, knowledge sharing, and team enablement
Experience with incident management and post-mortem analysis
We offer you:
Variety of exciting challenges with ample opportunities for development and training in a truly global landscape
Culture and values that focus on teamwork, innovation and passion for artificial intelligence and language
Flexible working arrangements and attention to work-life balance
Equal opportunity employment experience that values difference and diversity
Customized induction & onboarding training that facilitates the initiation process and accelerates your integration into our daily business activities
Performance appraisal process that provides an annual assessment of competencies, target achievement, and areas for improvement
Welfare platform where you can buy services and goods