VSOL is a digital enabler with a mission to help public and private organizations evolve their businesses through data and technology. We provide an end-to-end service from consulting to execution that drives the growth and innovation of our clients. As VSOL is in a phase of rapid expansion, we offer a dynamic, creative environment that accelerates your personal and professional development. We are looking for talented individuals eager to develop in international markets while contributing to the company’s future in a constructive and supportive manner.

Responsibilities:

Lead deployment and management of web applications, ensuring stability, scalability and reliability.

Design and manage reliability solutions for hybrid environments (cloud and on-premises), optimizing for availability and performance.

Orchestrate and administer containerized applications using Kubernetes, focusing on efficient deployment and runtime management.

Administer systems, including Geographic Information System (GIS) platforms and databases (SQL Server), maintaining data integrity and high performance.

Analyze and mitigate service disruptions, developing strategic preventative measures to minimize downtime.

Understanding of network engineering principles.

Participate in evaluation and integration of new technologies, enhancing service reliability and operational capabilities.

Develop and automate critical system health metrics, using tools like the ELK Stack.

Manage major incident response efforts, ensuring effective resolution to maintain system stability.

Coordinate with cross-functional teams to align SRE practices with business objectives and IT standards.

Create and review technical documentation for system architecture and operational procedures.

Ensure regulatory compliance and conduct security assessments, implementing best practices to protect system integrity.

Participate in pager-duty rotations, resolving critical incidents.

Note:

The position may require international travel for continuous periods of 6 months. Candidates must accept this requirement as part of the position.

Requirements:

Over 4 years of experience with cloud environments and containerization technologies, including designing and implementing scalable, resilient infrastructure solutions using platforms such as GCP (and other cloud platforms) and Kubernetes.

Experience with monitoring and logging tools such as ELK Stack.

Demonstrated excellence in network management, advanced troubleshooting, and system optimization, with a focus on enhancing efficiency and reducing downtime.

Experience in IT, with advanced expertise in network engineering and system administration.

Experience in site reliability practices; any experience with GIS platforms is a plus.

Strong skills in scripting and automation; proficiency with Python and Bash is a big plus.

Good knowledge of GitOps tools (e.g., Argo CD, FluxCD).

Knowledge of security frameworks and compliance standards.

Qualifications:

Bachelor’s degree in Computer Science, Information Technology, or a related field.

Cisco Certified Network Associate (CCNA) is a plus.

Certified Kubernetes Administrator (CKA) is a plus.

Written and spoken English communication skills at CEFR B1 level or above.

Why you’ll love working here:

An English-speaking start-up environment, with the opportunity to be part of an innovation team and global projects

Onsite opportunities in UAE (United Arab Emirates) and KSA (Kingdom of Saudi Arabia)

13th-month salary bonus

Premium health insurance for employees and family members (depending on level), annual health check, and government insurance during probation

14+ days of annual leave and 5 days of outing leave

Lunch allowance and free parking

Taxi & phone allowance (depending on level)

Senior Site Reliability Engineer

Job Description
