As a Site Reliability Engineer (SRE), you'll be part of a team responsible for ensuring Conservice's platform can run at massive scale. You will join the founding members of our first SRE team and help set the strategy for its success. In this role, you will continue developing the skills needed to help us deliver our products reliably, securely, cost-effectively, and with maximum performance. You will work in an Agile environment, estimate assigned work, create designs, provide task breakdowns, and provide support across C#, ASP.NET, JavaScript, HTML5, and React, as well as data and observability technologies such as SQL Server, NoSQL, Elasticsearch, and Datadog.
- Work in an Agile team environment where team members collaborate on strategies that lead to continuous improvement.
- Serve as a role model and mentor for writing well-architected, well-tested, and easy-to-understand services using software engineering methodologies (e.g., Infrastructure as Code, continuous integration and delivery, architecture reviews).
- Work directly with development teams to define build pipelines and tools that proactively automate away painful manual tasks and toil.
- Employ observability platforms to collect, monitor, and analyze system metrics to maintain and improve the health of our systems.
- Find ways to continuously improve our platform.
- Help and train other engineers.
- Prior experience architecting cloud-based solutions on Amazon Web Services (AWS)
- Must have experience with programming languages and frameworks used in SRE and operations engineering (e.g., TypeScript, Python, Go, Bash)
- Strong understanding of the PDLC (Product Development Life Cycle) and the Agile software development methodology
- Strong understanding of the tools and techniques used for high availability, disaster recovery, and avoiding, mitigating, and managing failure conditions in a microservices environment, at both the infrastructure and application levels