Site Reliability Engineer

DT One • 🌐 In Person

Posted 5 days, 6 hours ago

Job Description

About DT One

Keeping more people, more connected, more often

DT One was founded with the aim of providing mobile carriers with the infrastructure and services they need to help migrant workers stay in touch with their family and friends back home.

Today, we operate a leading global network for mobile top-up solutions, innovative mobile rewards, and Phone-to-Phone solutions.

Our global network delivers better infrastructure and access to digital communications for over five billion people across emerging economies, enabling them to stay better connected and, as a result, participate more actively in the global economy.

As a company, we're forward-thinking, adaptable, and solutions-focused. We work closely with our network partners to provide them with valuable market insights and intelligent mobile technology that delivers more value to their business, and that ultimately benefits the end-consumer.

For more information, visit our website: www.dtone.com.

Context of the role

At DT One, we count on our Site Reliability Engineers (SREs) to empower our users with a rich feature set, high availability, and extreme performance. As we expand our platform infrastructure and applications, we are seeking talented Site Reliability Engineers to maintain, improve, and flawlessly operate our environments. We are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a globally distributed team to develop real-world solutions and positive user experiences at every interaction.

Key Responsibilities

Run the production environment by monitoring availability and taking a holistic view of system health

Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve

Establish and guarantee service level objectives for platform infrastructure and applications

Provide primary operational support and engineering for multiple large distributed software applications, including on-call shifts

Build software and systems to manage network infrastructure, platform infrastructure, and applications

Improve reliability, quality, security, and time-to-market of our suite of software solutions

Partner with development teams to improve services through rigorous testing and release procedures

Participate in system design consulting, platform management, and capacity planning

Document every action, turning findings into repeatable actions and then into future automation

Professional Skills/Qualifications

Bachelor's degree in computer science or another highly technical scientific discipline

Ability to program (structured and object-oriented) in one or more high-level languages, such as Golang, Python, Ruby, or JavaScript

Experience with AWS cloud infrastructure management and related services

Experience with Infrastructure as Code and Configuration Management concepts and related tools and technologies, such as Terraform and Ansible

Hands-on experience with Linux administration, command-line interface, and shell scripting

Experience with dynamic resource management frameworks and technologies, such as Kubernetes and Nomad

Experience with source code management tools and related workflows

Experience with continuous integration and continuous deployment concepts and related tools and technologies, such as Jenkins, GitLab CI, and Bitbucket Pipelines

A proactive approach to spotting problems, areas for improvement, and performance bottlenecks

Good communication skills in English

Preferred Qualifications

Previous success in technical engineering

Previous experience operating multiple large distributed software applications

Previous experience defining and implementing deployment and release standards

Experience with database administration and performance tuning for systems such as PostgreSQL, MySQL, Elasticsearch, and Redis

Experience with monitoring tools, such as Prometheus, Datadog, and New Relic

Experience with VPN configuration and administration

Coding experience beyond simple scripts

Strong mindset oriented toward Site Reliability principles

Sharing and mentoring mindset

Sound like you? Apply now!
