👨🏻‍💻 postech.work

Site Reliability Engineer

Hasaki.vn • 🌐 In Person

Posted 13 hours, 22 minutes ago

Job Description

RESPONSIBILITIES

- Infrastructure Management:

Design and maintain highly available, scalable cloud infrastructure on AWS (or GCP/Azure), using Kubernetes for orchestration.

Manage and optimize databases like MySQL (for transactional and analytical workloads) and Elasticsearch (for search and geo queries).

- Reliability and Performance:

Ensure platform uptime of 99.99%, minimizing downtime during peak traffic (e.g., 20:00-22:00 in Vietnam or Black Friday in the U.S.).

Monitor and optimize system performance using tools like Prometheus, Grafana, and ELK Stack (Elasticsearch, Logstash, Kibana).

Reduce latency for critical services (e.g., product search under 200 ms, transaction processing under 500 ms).
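For context, a 99.99% uptime target leaves very little room for downtime. The arithmetic can be sketched as below, assuming a 30-day month and a 365-day year; the function name `downtime_budget_minutes` is illustrative only and is not part of any tooling named in this posting.

```python
def downtime_budget_minutes(slo_percent: float, period_days: float) -> float:
    """Return the minutes of downtime allowed in a period for an availability target."""
    period_minutes = period_days * 24 * 60
    # The allowed downtime is the fraction of the period NOT covered by the target.
    return period_minutes * (1 - slo_percent / 100)


# Assumed periods: a 30-day month and a 365-day year.
monthly = downtime_budget_minutes(99.99, 30)
yearly = downtime_budget_minutes(99.99, 365)
print(f"{monthly:.2f} min per 30 days, {yearly:.2f} min per year")
```

Under these assumptions the budget works out to roughly 4.3 minutes per 30 days, or about 52.6 minutes per year.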

- Automation and CI/CD:

Automate infrastructure provisioning and deployment using Terraform, Ansible, or Helm.

Maintain CI/CD pipelines with GitHub Actions or Jenkins, ensuring zero-downtime deployments.

Develop scripts to automate repetitive tasks (e.g., auto-scaling nodes, refreshing Elasticsearch indices).

- Incident Response:

Lead incident response for outages or performance degradation, using PagerDuty or Opsgenie for 24/7 alerting.

Create and maintain runbooks for common issues (e.g., Elasticsearch query timeouts).

Conduct post-mortems to identify root causes and implement preventive measures.

- Security and Compliance:

Ensure compliance with CCPA, PCI DSS, and GDPR (for EU customers), securing data with TLS 1.3, AWS KMS, and encryption.

Protect against DDoS attacks using AWS WAF and Cloudflare.

Implement audit logs and data masking for sensitive data (e.g., user locations, payment details).

- Collaboration:

Work with Backend, Frontend, and Data teams to optimize microservices (e.g., Search, Payment, Logistics).

Integrate analytics tools like Google Analytics (GA4) to track user engagement (e.g., Average Engagement Time).

Support message queues like Beanstalkd/Kafka for asynchronous notifications (e.g., order updates).

REQUIREMENTS

Strong scripting skills in Python, Go, or Bash for automation.

Expertise in Terraform, Ansible, or Helm for infrastructure-as-code.

Knowledge of CI/CD pipelines (GitHub Actions, Jenkins).

Experience with message queues (Beanstalkd, Kafka).

Soft skills:

Strong problem-solving and analytical skills, with a focus on root cause analysis.

Excellent communication and collaboration skills to work with cross-functional teams.

Ability to thrive in a fast-paced startup environment.

BENEFITS

Salary range: negotiable;

13th-month salary, annual salary review;

Insurance will be paid according to company policy;

Working devices provided, parking fee allowance (if any);

Professional, dynamic, and friendly working environment and culture;

LOCATION:

568 Luy Ban Bich St, Tan Phu Ward, Ho Chi Minh City.

If you need any further information, contact me, or send your CV to thuyntt9@hasaki.vn to apply.

Please note that only shortlisted candidates will be contacted. Thank you for your interest. ^^
