👨🏻‍💻 postech.work

Staff Site Reliability Engineer

Techcombank (TCB) • 🌐 In Person

Posted 2 days, 20 hours ago

Job Description

About the Role

As a Staff Site Reliability Engineer (SRE) in the Data Engineering team, you will be a strategic technical leader responsible for driving reliability, scalability, and operational excellence across our data platforms. You will work across teams and divisions to influence architecture, establish best practices, and embed SRE principles into the fabric of our data engineering culture.

This is a high-impact role where you will shape the future of data reliability at Techcombank, mentor engineers, and lead initiatives that span multiple teams and domains.

Key Responsibilities

Lead the design and architecture of resilient, scalable, and observable data systems and pipelines.

Drive cross-functional collaboration across engineering, platform, governance, and business teams to align on reliability goals.

Define and enforce SLOs, SLIs, and error budgets at the platform and division level.

Champion SRE culture and practices across the organization, influencing engineering standards and operational maturity.

Lead incident response and postmortem processes, ensuring learnings are institutionalized.

Govern and automate change management and deployment processes for data infrastructure.

Guide the adoption of monitoring, alerting, and observability tools and practices across teams.

Mentor engineers and contribute to the technical growth of the SRE and Data Engineering community.

Partner with leadership to align reliability initiatives with business and regulatory priorities.

Qualifications

Required:

7+ years of experience in SRE, DevOps, or Data Engineering roles, with at least 2 years in a technical leadership or staff-level capacity.

Proven experience designing and operating large-scale, distributed data systems.

Deep understanding of SRE principles, error budgeting, and incident management.

Strong programming skills in Python, Scala, and SQL.

Demonstrated ability to influence across teams and divisions, driving adoption of best practices.

Experience with solution architecture and cross-functional technical leadership.

Preferred:

Experience with Databricks, Apache Spark, Kafka/Flink, and Delta Lake.

Familiarity with data governance tools like Collibra and Unity Catalog.

Knowledge of GraphQL, data APIs, and data mesh principles.
