Site Reliability Engineer

2 days ago


Remote, Brazil AgileEngine Full-time R$80,000 - R$120,000 per year

Important: after confirming your application on this platform, you'll receive an email with the next step: completing your application on our internal site, LaunchPod. So keep an eye on your inbox and don't miss this step — without it, the process can't move forward.

What you will do

  • Shift: Monday – Thursday 8AM – 7PM PST (11AM – 10PM EST) with rotating on-call;
  • Manage alerts daily, check systems, and escalate issues as needed;
  • Be part of a team that provides 24×7 on-call support for critical SaaS events;
  • Be available in case of emergencies when team members are not available or need help;
  • Document issues and remediation steps;
  • Proactively create appropriate monitors in the EKS/K8S ecosystem;
  • Deploy to EKS/K8s cluster using Terraform and Helm;
  • Learn and maintain existing infrastructure running under Docker Swarm;
  • Improve existing infrastructure health by implementing checks and scripts to correct known issues;
  • Maintain and develop deployment code;
  • Automate manual tasks;
  • Implement/integrate new technologies in our Cloud Infrastructure;
  • Collaborate with other teams and departments to provide the highest level of support and assistance;
  • Apply a real customer focus when planning deployments/updates, keeping the customer front of mind and considering the impact on them before making changes;
  • Work closely on solutions with Support, Customer Success, Migration, and Professional Services teams to provide best-in-class SaaS service to our customers;
  • Perform RCA and take necessary corrective actions to prevent the recurrence of issues;
  • Create and assign alert-related actions to the appropriate team after the investigation;
  • Handle support requests for environment-specific actions;
  • Identify and provide automation requirements to improve RCA.

Must haves

  • 2+ years of professional experience;
  • Experience working with Datadog;
  • Hands-on experience as an AWS Cloud Engineer;
  • Working knowledge of EKS/Terraform/Helm;
  • Working experience with Docker and Docker Swarm;
  • Good understanding of AWS IAM roles and policies;
  • Experience logging and monitoring AWS resources using CloudWatch logs;
  • Experience working in a Linux environment;
  • Proficient in Bash and/or Python scripting;
  • A strong understanding of web technologies such as REST APIs;
  • Working experience with monitoring solutions, such as Grafana and Prometheus;
  • Excellent oral and written communication skills;
  • Customer-facing communication skills to explain issues and RCAs to customers effectively;
  • Experience in Product/Application Support for SaaS-based products;
  • Understanding of APIs, Databases, Systems Architecture, and Design;
  • Experience designing, implementing, and operating in a DevSecOps environment;
  • Excellent communication skills, both written and verbal;
  • Ability to work independently as well as within a collaborative environment;
  • A technical aptitude with the desire to learn new and evolving technologies;
  • Upper-Intermediate English level.

About us

AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.

If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you.

Perks and benefits

  • Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
  • Competitive compensation: We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
  • A selection of exciting projects: Join projects that build modern solutions for top-tier clients, including Fortune 500 enterprises and leading product brands.
  • Flextime: Tailor your schedule for an optimal work-life balance, with the option of working from home or going to the office, whichever makes you happiest and most productive.

Job Type: Full-time



  • Remote, Brazil Rocket Full-time US$90,000 - US$120,000 per year

    Engineering/Remote. Job Title: Senior Site Reliability Engineer. Level: Senior. Working Hours: Full Time (40h/week). Contract: Contractor. Location: Remote. Your Team: You will report to our Head of Infrastructure and Deployment and join the Engineering team. The Site Reliability Engineering (SRE) team is dedicated to engineering, maintaining, and continuously improving...

  • Site Reliability Engineer

    2 weeks ago


    Remote, Brazil Objective Full-time R$90,000 - R$120,000 per year

    We are passionate about technology, creativity, and challenges. If you enjoy challenges, constant learning, and value personal connections, join us. We value diversity and believe it is fundamental to innovation and to delivering value to our clients. All of our openings are open to everyone, with or without disabilities,...


  • Remote, Brazil Mercos Full-time R$80,000 - R$120,000 per year

    If you have experience with cloud infrastructure, keep up to date with new technologies, and want to build immutable, reproducible infrastructure, this position is for you. Here, you will be encouraged to simplify and automate the infrastructure as much as possible and to question the architecture and the chosen technologies, and the answer will not be "because...


  • Remote, Brazil Foxbit Full-time R$80,000 - R$120,000 per year

    We are looking for a DBRE (Database Reliability Engineer) to strengthen our Platform team and ensure the reliability, performance, and efficiency of our entire data infrastructure at one of the largest cryptocurrency exchanges in Brazil. The DBRE's main objective is, together with the SRE, Development, and Security teams, to ensure that...


  • Remote, Brazil vortigo Full-time R$80,000 - R$120,000 per year

    We are Vortigo. We were born with the purpose of creating mobile apps for a world in constant motion, but we did not stop there. We have expanded our scope, and today we develop software to help companies and startups with their digital transformation. Our team is made up of people who are passionate about giant challenges, changing the experience...


  • Remote, Brazil Clicksign Full-time R$90,000 - R$120,000 per year

    About Clicksign: We are a leading Brazilian electronic signature company. In essence, we facilitate relationships between people and companies in the digital environment. Behind our cutting-edge technology and focus on security, our mission is to make the world grow by making digital relationships ever more intelligent. How we work: Our essence...

  • Data Engineer

    2 days ago


    Remote, Brazil Ambush Consulting Full-time R$90,000 - R$120,000 per year

    Ambush is a People Company. But what does that mean exactly? It means we care about our people as much as we care about building great products. We take a human-centered approach to identifying, retaining and integrating highly-talented, long-term remote people into America's best product and development team. We began our consulting journey in 2015 and have...

  • Senior AI Engineer

    1 week ago


    Remote, Brazil Lean Tech Full-time R$60,000 - R$180,000 per year

    Position Summary: As a Principal AI Engineer and Thought Leader, you will spearhead the design, development, and deployment of cutting-edge AI solutions that drive operational and product innovation. You will work closely with cross-functional teams to implement AI-driven improvements, optimize workflows, and create meaningful business impact. This role...


  • Remote, Brazil High 5 Games Full-time R$120,000 - R$240,000 per year

    We are looking for a Machine Learning Engineer (MLE) to design, build, and optimize our machine learning operations. You will play a crucial role in scaling AI models from research to production, ensuring smooth model deployment, monitoring, and lifecycle management across our Google Cloud Platform (GCP) infrastructure. You'll work closely with data...


  • Remote, Brazil High 5 Games Full-time R$90,000 - R$120,000 per year

    We are looking for a Machine Learning Engineer (MLE) to design, build, and optimize our machine learning operations. You will play a crucial role in scaling AI models from research to production, ensuring smooth model deployment, monitoring, and lifecycle management across our Google Cloud Platform (GCP) infrastructure. You'll work closely with data...