Role & Responsibilities
As a DevOps/SRE Senior Engineer, you will act as the production readiness steward for versatile Gateway products and their integration with other platforms. You will partner with development teams to design, implement, and support services with a focus on operational resilience, automation, and compliance.
Key Responsibilities:
Lifecycle Ownership:
Engage in and improve the entire service lifecycle—from design and deployment to operations and continuous improvement.
Operational Readiness:
Ensure system availability, capacity, performance, monitoring, and self-healing capabilities are embedded throughout delivery.
Incident Management:
Practice sustainable incident response, lead blameless postmortems, and optimize Mean Time to Recovery (MTTR).
Automation & CI/CD:
Develop and maintain automation pipelines for certificate renewal, traffic routing, alerting, and compliance reporting using tools like Ansible, Venafi, and XLR templates.
Support CI/CD pipelines for software promotion and operational gating.
Reliability Engineering:
Scale systems sustainably through automation and advocate for changes that improve reliability and velocity.
Compliance & Risk Management:
Drive initiatives for Safety & Soundness, PCI compliance, threat/toil reduction, and ITSM defect resolution.
Monitoring & Observability:
Implement robust logging, monitoring, and alerting standards to ensure system health and proactive issue detection. This requires hands-on experience configuring monitoring and alerting in Dynatrace and Splunk.
Collaboration:
Work with global teams across multiple time zones and mentor junior engineers.
Continuous Improvement:
Provide feedback loops to development teams on resiliency gaps and operational enhancements.
Rotational On-Call & Flexibility:
Participate in rotational on-call support for critical production systems.
Demonstrate flexibility to take on additional responsibilities and ad-hoc duties as needed to support team and organizational goals.
All About You (Skills & Qualifications)
Experience:
5+ years in DevOps or Site Reliability Engineering (SRE) roles.
Technical Expertise:
Strong understanding of NGINX configuration and gRPC event-driven architectures.
Proficiency in DevOps tools: Chef, Jenkins, Groovy, shell scripting, Bitbucket, Git, Ansible, XLR.
Experience with AWS infrastructure, secure access practices, and cloud-native deployments.
Security & Compliance:
Awareness of certificate lifecycle management, mutual TLS, the SSL handshake, SSH keys, and encryption standards.
Familiarity with ITSM processes, compliance frameworks, and incident management.
Networking & Systems:
Knowledge of client-server relationships, network layers (L1–L7), load balancers (F5 BIG-IP), and application firewalls.
Ability to analyze stack traces, TCP dumps, heap/thread dumps, and perform OS-level troubleshooting.
Authentication & Authorization:
Intermediate understanding of Active Directory, SAML, LTPA, SSO, OAuth.
Soft Skills:
Strong documentation and communication skills.
Ability to collaborate with cross-functional teams and mentor junior engineers.
Preferred Experience:
Building self-healing systems and operational resiliency frameworks.
Implementing observability solutions for distributed systems.
Driving DevOps transformation and automation best practices in large-scale environments.