About The Team
The mission of the Shopee Tech Ops MRE (Machine Reliability Engineering) team is to keep the Shopee network and hardware layer running efficiently and sustainably 24x7. The team builds and maintains massive hardware clusters for SRE, balancing capacity, cost, and hardware performance, and provides sustainable hardware resources and stable network support services. MRE works with the data center team to design and optimize network architecture; selects appropriate hardware configurations through hardware testing and evaluation against business requirements; customizes a stable and efficient OS; improves traditional operations through engineering and service-oriented approaches; and builds a complete hardware monitoring system to make fault handling more efficient.
Job Description
Manage automated installation of Linux operating systems
Troubleshoot server hardware and OS issues
Oversee server asset management throughout the hardware lifecycle
Develop automation tools for operations
Collaborate with cross-functional teams to ensure efficient system performance and reliability
Requirements
Expertise in Linux automated installation processes and methodologies.
In-depth knowledge of the Linux OS, with proficiency in common commands and tools.
Strong analytical skills to diagnose and resolve OS issues, including storage and network problems.
Proficiency in Shell scripting and at least one programming language (e.g., Python, Go).
Basic understanding of bare metal servers and data center infrastructure.
Foundational knowledge of containers and orchestration technologies.