Senior Site Reliability Engineer
4 weeks ago
Senior Site Reliability Engineer (SRE) - (Brazil) | Articul8 AI

Position Overview
We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform. As an SRE, you will bridge the gap between development and operations, implementing automation and best practices to maintain our service reliability objectives while supporting rapid innovation.

Key Responsibilities
Architect and maintain scalable, highly available infrastructure for our GenAI platform.
Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.
Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.
Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.
Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.
Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.
Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.
Optimize infrastructure for performance, scalability, and cost-effectiveness, especially for high-demand AI workloads.
Implement and enforce security best practices across all systems and environments.
Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.

Required Qualifications
Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
5+ years of experience in DevOps, SRE, or similar roles.
Strong experience with cloud platforms (AWS, GCP, or Azure).
Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.).
Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.).
Solid background in containerization technologies (Docker, Kubernetes).
Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.).
Strong understanding of CI/CD pipelines and automation.
Exceptional troubleshooting and problem-solving skills, including the ability to diagnose complex systems.

Preferred Qualifications
Experience supporting AI/ML systems in production.
Knowledge of GPU infrastructure management and optimization.
Familiarity with distributed systems and high-performance computing.
Experience with database systems (SQL and NoSQL).
Certifications in cloud platforms (AWS, GCP, Azure).
Experience with chaos engineering and resilience testing.
Knowledge of security best practices and compliance requirements.
-
Senior Site Reliability Engineer
1 week ago
Belo Horizonte, Brazil | K2 Solutions | Full-time
Hybrid work in the Pinheiros/SP area, 3 days per week in the office. We are hiring a Senior Site Reliability Engineer (SRE) to join our team and play an essential role in maintaining, automating, and improving the reliability of the systems that power the company's logistics network across multiple regions. This person...
-
Senior Site Reliability Engineer
3 days ago
Belo Horizonte, Brazil | Canonical | Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world's leading public cloud and silicon providers, and...
-
Senior Site Reliability Engineer
1 week ago
Belo Horizonte, Minas Gerais, Brazil | YAPP | Full-time | R$80,000 - R$120,000 per year
Getrak, a leading SaaS platform for vehicle tracking, monitoring, and security, is looking for a Senior Site Reliability Engineer (SRE) to join the Technology and Product team. Working in a high-scale, mission-critical environment, you will be responsible for ensuring the reliability, availability, and performance of our platform, which...
-
Site Reliability Engineer
3 days ago
Belo Horizonte, Brazil | Bairesdev | Full-time
Site Reliability Engineer - Remote Work | REF#******
We are looking for a Site Reliability Engineer to administer and support the entire cloud-hosted project infrastructure while implementing CI/CD pipelines to automate deployments.
What You Will Do
Ensure high service availability, performance, security, and...
-
Site Reliability Engineer
3 weeks ago
Belo Horizonte, Brazil | Gauge | Full-time
We are a Stefanini Group company. Specializing in digital marketing, we use an integrated approach that combines technology, data intelligence, design, and deep knowledge of consumer behavior. Our focus is on maximizing our partners' results, offering solutions that range from strategic consulting to...
-
Senior Site Reliability Engineer
20 hours ago
Belo Horizonte, Brazil | YAPP | Full-time
Getrak, a leading SaaS platform for vehicle tracking, monitoring, and security, is looking for a Senior Site Reliability Engineer (SRE) to join the Technology and Product team. Working in a high-scale, mission-critical environment, you will be responsible for ensuring the reliability, availability, and performance of our platform, which...
-
Site Reliability Engineer Sr
7 days ago
Belo Horizonte, Brazil | Mercado Eletrônico | Full-time
Mercado Eletrônico is the Latin American leader in B2B procurement management solutions. Its technologies and services for procurement departments help companies achieve greater savings, agility, governance, and collaboration. With offices in Brazil, the United States, Mexico, and Portugal, it counts more than 1 million suppliers, 10 thousand...
-
Site Reliability Engineer
3 weeks ago
Belo Horizonte, Brazil | Canonical | Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our...
-
Site Reliability Engineer
1 week ago
Belo Horizonte, Brazil | Canonical | Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers...
-
Site Reliability Engineer
2 weeks ago
Belo Horizonte, Brazil | AgileEngine | Full-time
Site Reliability Engineer (Middle/Senior) ID38916
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards. If...