Job Summary:
We are looking for a Site Reliability Engineer for the SRE team in India who will be responsible for ensuring the reliability, availability, and performance of software systems by applying software engineering principles to operations.
Responsibilities:
- Ensure the reliability, scalability, and performance of our company's production environment, which includes a complex architecture spanning multiple servers, deployments, and various cloud technologies.
- Collaborate with cross-functional teams, work independently, and prioritize effectively in a fast-paced environment.
- Oversee and enhance monitoring of the production environment to ensure optimal performance and functionality across the technology stack.
- Support our 24/7 operations and participate in on-call rotations to ensure timely incident response and resolution.
- Address and resolve unexpected service issues, and build tools and automation that reduce the likelihood of future problems.
Requirements:
- Minimum 3 years of experience in an SRE/DevOps position for SaaS-based products.
- Experience managing mission-critical production environments.
- Experience with version control tools such as Git and Bitbucket.
- Experience in establishing CI/CD procedures with Jenkins.
- Working knowledge of databases.
- Experience managing AWS infrastructure, with proficiency across multiple AWS services, including networking, EC2, VPC, EKS, ELB/NLB, API Gateway, Cognito, and more.
- Experience with monitoring tools such as Datadog, ELK, Prometheus, and Grafana.
- Experience managing Linux infrastructure.
- Experience with Bash or Python.
- Experience with IaC tools such as CloudFormation, CDK, or Terraform.
- Experience with Kubernetes and container management.
- Excellent written and verbal communication skills in English.
- Strong teamwork, a positive attitude, and a high level of integrity.
- Exceptional organizational skills, thorough attention to detail, and strong commitment to the task at hand.
- Sharp intellect, the ability to pick up new information quickly, and strong self-motivation.
Advantages:
- Knowledge of additional cloud platforms (Azure, GCP, etc.).
- Understanding of applications based on Java, Maven, and Node.js.
- Experience with serverless architecture.