Are you looking for a career that makes a positive difference in your life and in the lives of learners and educators across the globe? Do you want to work with fun and social people in a positive and engaged virtual office environment?
We are hiring a Senior Site Reliability Engineer who will build and support reliable, high-capacity, and well-performing systems in support of our mission to protect and improve our customer platforms, with an ever-watchful eye on reliability, security, performance, cost, and operational excellence.
We call this work Site Reliability Engineering.
As a Site Reliability Engineer within a small team, you will collaborate in a DevOps model with product development teams to design, deploy, and manage automation tools that increase predictability and shorten time to market while reducing cost.
Our stack
Code: Java, PHP, Node.js, and Go
RDBMS: Oracle, PostgreSQL, MySQL
Cache: Couchbase, Redis, ElastiCache, DynamoDB
Containers: ECS, EKS, K8S, Docker
Cloud: Amazon Web Services (AWS)
Telemetry: New Relic, CloudWatch
Build: Jenkins, CircleCI, GitHub Actions
Run: PagerDuty, Exigence
Infrastructure-as-Code: Terraform, CloudFormation
Your contributions
Cloud Engineering
Hands-on design, analysis, development, and troubleshooting of highly distributed, large-scale production systems and event-driven, cloud-based services
Ensure repeatability, traceability, and transparency of our infrastructure automation (infrastructure-as-code, monitoring-as-code)
Keep learning the AWS ecosystem, and participate in game day scenarios and professional conferences
Collaborate with development teams to design solutions for enterprise applications on our software stack
Actively monitor AWS costs and use optimizer tooling to maximize ROI while maintaining Service Level Objectives
Observability Engineering
Own reliability, uptime, system security, cost, operations, capacity, resiliency, and performance, including the analysis of each
Define, monitor, and report on service level indicators for application workloads
Support on-call rotations for operational duties that have not been addressed with automation, with an eye for correcting issues that result in on-call alarms
Maintain telemetry that improves visibility into our applications' performance and business metrics and keeps operational workload in check
Develop, communicate, collaborate on, and monitor standard processes to promote the long-term health and sustainability of operational development tasks
DevSecOps
Support healthy software development practices, including complying with agile software development methodology, building standards for code reviews, work packaging, and continuous delivery
Partner with the Cybersecurity team to develop plans and automation to respond to new risks and vulnerabilities
Systems Engineering
Collaborate with Systems Administrators to coordinate middleware, network, storage, database, Windows, Linux, and VMware maintenance
Automate legacy on-premises system maintenance and migrate those systems to the cloud via thoughtful redesign
Resiliency Engineering
Collaborate with dev teams to identify failure points and blast radius of systems
Validate effectiveness of monitoring and observability configurations
Coordinate failure injection testing
Observe and document steady-state production levels and growth patterns
Plan and forecast for seasonal growth, communicate trend lines to leadership, and enhance infrastructure scaling plans to accommodate 2x planned load
Coordinate improvements of existing software and infrastructure to meet resiliency goals
Must-haves for this role
We are looking for a Senior Site Reliability Engineer who can work with cross-functional teams.
The candidate must have strong experience in Terraform.
This person should be able to work with stakeholders and should have experience leading P1 and P2 teams.
Candidates should have experience supporting on-call rotations for operational duties that have not been addressed with automation, with an eye for correcting issues that result in on-call alarms.
Candidates must have EKS and K8S experience.
Candidates should be good communicators.
Interviews
Two rounds of interviews will be conducted:
One interview with the hiring manager.
One panel interview with the team.