Senior Site Reliability Engineer

1 week ago


Remote, Brazil · Articul8 · Full-time
About Us

Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. Our platform empowers organizations to leverage the power of artificial intelligence in a reliable, scalable, and secure environment.

Position Overview

We are seeking an experienced Site Reliability Engineer (SRE) specializing in chaos engineering and monitoring to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform. As a Senior SRE and Chaos Engineering Specialist, you will create and run chaos experiments to validate our systems' resilience against real-world failures. You will also bridge the gap between development and operations, implementing automation and best practices to maintain our service reliability objectives while supporting rapid innovation.

Key Responsibilities
  • Architect and maintain scalable, highly available infrastructure for our GenAI platform.

  • Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.

  • Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.

  • Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.

  • Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.

  • Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.

  • Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.

  • Optimize infrastructure for performance, scalability, and cost-effectiveness—especially for high-demand AI workloads.

  • Implement and enforce security best practices across all systems and environments.

  • Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.

Qualifications

Required
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience

  • 5+ years of experience in DevOps, SRE, or similar roles

  • Strong experience with cloud platforms (AWS, GCP, or Azure)

  • Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)

  • Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)

  • Solid background in containerization technologies (Docker, Kubernetes)

  • Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)

  • Strong understanding of CI/CD pipelines and automation

  • Exceptional troubleshooting and problem-solving skills, including the ability to diagnose issues in complex systems

  • Experience with chaos engineering tools such as Chaos Monkey, Gremlin, or similar frameworks

  • Familiarity with container orchestration platforms like Kubernetes and related chaos tools

Preferred
  • Experience supporting AI/ML systems in production

  • Knowledge of GPU infrastructure management and optimization

  • Familiarity with distributed systems and high-performance computing

  • Experience with database systems (SQL and NoSQL)

  • Certifications in cloud platforms (AWS, GCP, Azure)

  • Experience with chaos engineering and resilience testing

  • Knowledge of security best practices and compliance requirements

Ready to shape the future of resilient software systems? Apply now and help drive the reliability of tomorrow's AI at Articul8 AI.
NOTE: This position is available via CLT contract only. Thank you.


