Job Description
Job Overview
Redis is seeking a Site Reliability Engineer to join its Cloud Operations team in Redmond, WA. The role involves working on large-scale systems and supporting customers by ensuring the reliability and stability of Redis managed services. Candidates will engage in complex troubleshooting, collaborate with engineering teams, and participate in on-call rotations to maintain service continuity.
Technical Requirements
Required Skills
- Linux/Unix
- Networking (TCP/IP)
- Scripting languages (Bash, Python)
- Azure cloud infrastructure
- Alerting and monitoring systems (Prometheus, Grafana, ELK, Splunk)
Preferred Skills
- C# programming
- NoSQL databases (especially Redis)
- Infrastructure as code tools (Terraform, Pulumi)
- Deployment and configuration management tools (GitHub Actions, Jenkins, Ansible, Chef)
Experience Level
At least 4 years of experience in infrastructure/CloudOps/SRE domains and 3 years in troubleshooting real-time production systems.
Responsibilities
- Engage in complex troubleshooting and manage technical escalations within a Follow-the-Sun support model.
- Leverage software development skills to create automation tools and runbooks.
- Collaborate with engineering teams during service-impacting incidents.
- Participate in on-call rotations for critical support.
Additional Information
- Location: Redmond, WA
- Type: Full-time
- Compensation: Competitive salary with benefits