Principal Site Reliability Engineer (SRE)
Atlan
Software Engineering
Mexico City, Mexico
Posted on Wednesday, October 11, 2023
About the Role
- As the Principal SRE, you will lead platform-first initiatives to ensure the scalability, reliability, and performance of our technology platform. You will also play a pivotal role in improving the availability of our critical systems and services.
What will you do?
- Lead and drive platform-first initiatives, with a focus on scalability, reliability, and performance of our technology platform.
- Design, build, and maintain robust infrastructure supporting our distributed systems, leveraging technologies such as Kubernetes, Kafka, Postgres, Cassandra, and Redis.
- Implement monitoring and alerting systems to maintain high availability and performance, with a dedicated focus on SLA and availability metrics.
- Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
- Design and implement strategies, tooling, and processes to enhance system uptime and reliability.
- Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
- Align the platform with customer needs and business goals by working closely with cross-functional teams.
- Develop and maintain CI/CD pipelines for seamless deployment and release management.
What makes you a match?
- Proven expertise in software development and engineering, with a strong emphasis on building large-scale distributed systems.
- Proficiency in at least one programming language commonly used for building distributed systems, such as Golang, Java, or Python.
- Extensive experience with cloud infrastructure providers (AWS, Azure, or GCP) and with developing distributed systems using cloud services.
- Strong expertise in container orchestration platforms, specifically Kubernetes. Certified Kubernetes Administrator (CKA) certification is a plus.
- Exceptional problem-solving skills and a passion for developing robust, scalable, and secure solutions.
- Excellent communication skills to effectively collaborate with cross-functional teams.
- Ability to share impactful tech stories, demonstrating the results of your technical contributions.