About the Role:
We are seeking a driven and passionate DevOps & Site Reliability Engineer (SRE) to join our growing team. You will play a key role in building, deploying, and maintaining our highly scalable infrastructure across multiple cloud platforms (AWS, GCP). You will be responsible for automating software delivery pipelines, implementing continuous integration and continuous delivery (CI/CD) practices, and ensuring the reliability and performance of our systems.
Responsibilities:
Design, develop, and implement automated infrastructure provisioning and configuration management using tools like Terraform and Ansible.
Configure and manage Kubernetes clusters for containerized application deployments.
Implement CI/CD pipelines using tools like GitHub Actions, Jenkins, and Helm charts.
Monitor and optimize system performance using tools like Prometheus, Grafana, Elastic Stack, and APM tools.
Investigate and troubleshoot system incidents and outages.
Develop and implement automation scripts for repetitive tasks.
Collaborate with developers and other engineers to ensure smooth deployments and operations.
Stay up to date with the latest trends and technologies in DevOps and SRE.
Required Skills and Experience:
2+ years of experience as a DevOps Engineer or SRE.
Strong understanding of cloud platforms (AWS, GCP) and their services.
Proficient in Kubernetes and container orchestration technologies.
Experience with Git and GitHub, ideally with GitHub Actions CI/CD workflows.
Experience with infrastructure as code (IaC) tools like Terraform.