Staff Site Reliability Engineer

1 day ago

Ituiutaba, Brazil Nearsure Full-time

Explore the Nearsure experience

Join our close-knit LATAM remote team: Connect through fun activities like coffee breaks, tech talks, and games with your teammates and management.
Say goodbye to micromanagement: We champion autonomy, open communication, and respect for diversity as our core values.
Your well-being matters: Our People Care team is here from day one to support you with everything from time-off requests to wellness check-ins.
Plus, our Accounts Management team ensures smooth, effective client relationships, so you can focus on what you do best.

Ready to grow with us? Here's what we offer you by joining us

Competitive USD salary: We value your skills and contributions.
100% remote work: While you can work from anywhere, you're always welcome to connect with teammates and grow your network at our coworking spaces across LATAM.
Paid time off: Take the time you need according to your country's regulations, all while receiving your full salary. Rest, recharge, and come back stronger.
National holidays celebrated: Take time off to celebrate important events and traditions with loved ones, fully embracing your culture.
Sick leave: Focus on your health without the stress. Take the necessary time to recover and feel better.
Refundable Annual Credit: Spend it on the perks you love to enhance your work-life balance.
Team-building activities: Join us for coffee breaks, tech talks, and after-work gatherings to bond with your Nearsure family and feel part of our vibrant community.
Birthday day off: Enjoy an extra day off during your birthday week to celebrate in style with friends and family.

About the project

As a Staff Site Reliability Engineer, you will own and optimize OpenTelemetry pipelines, enabling scalable and efficient observability. You'll build tools that empower teams, support incident response, and drive best practices. Your work ensures a reliable, secure infrastructure and actionable alerting across the organization.

What your day-to-day work will look like

Design, implement, and maintain observability pipelines across the three main signals (logs, metrics, and traces), ensuring standardized, scalable, and efficient data ingestion.
Optimize ingestion strategies to balance cost, performance, and usability.
Build self-service automation and tooling that enables development teams to instrument and leverage observability without requiring manual intervention from the SRE team.
Drive adoption of best practices while ensuring teams own their telemetry.
Design the processes, playbooks, checklists, and automations for them and other engineers to follow during an incident.
Interact with members from almost all teams across the business to understand their monitoring, alerting, and SLO/SLA requirements, and design systems and processes that ensure we meet or exceed these requirements.
Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development.
Leverage Infrastructure-as-Code (IaC) to provision and manage monitoring tools, alerting rules, and our observability configurations across OTEL pipelines.
Design base-level requirements for new and existing services to ensure that all client infrastructure and code are monitored consistently and accurately at a basic level.
Take full ownership of client infrastructure reliability, ensuring adherence to key availability and security KPIs.

This would make you the ideal candidate

Bachelor's degree in Computer Science, Engineering, or a related field.
8+ years of experience working as a Site Reliability Engineer or in a very similar role, with a focus on observability.
5+ years of experience working with cloud (AWS).
5+ years of experience working with IaC tools (Terraform) and GitOps CI/CD solutions (Argo CD, GitHub Actions, or similar).
4+ years of experience working with open-source monitoring and logging tools such as Grafana, Prometheus, Elastic/OpenSearch, Loki, and Tempo.
4+ years of experience working with Kubernetes, including its core components, deployment methodologies, and monitoring best practices.
Strong scripting abilities (Python, Go, or similar) for automating observability tasks.
Experience in managing observability: SLIs, SLOs, log transformation, cardinality management, business and resilience metrics, the four golden signals, and distributed tracing.
Experience with automated alerting workflows.
Exposure to OpenTelemetry pipelines.
An advanced level of English is required for this role, as you will work with US clients. Effective communication in English is essential to deliver the best solutions to our clients and expand your horizons.

What to expect from our hiring process

1. Let's chat about your experience.
2. Impress our recruiters, and you'll move on to a technical interview with our top developers.
3. Nail that, and you'll meet our client: your final step to joining our amazing team.

At Nearsure, we're dedicated to solving complex business challenges through cutting-edge technology, and we believe in the power of tailored solutions. Whether you are passionate about transforming businesses with Generative AI, building innovative software products, or implementing comprehensive enterprise platform solutions, we invite you to be part of our dynamic team.

We would love to hear from you if you are eager to make an impact and join a collaborative team that values creativity and expertise. Let's work together to shape the future of technology.

Apply now

By applying to this position, you authorize Nearsure to collect, store, transfer, and process your personal data in accordance with our Privacy Policy. For more information, please review our Privacy Policy.

Remote Site Reliability Engineer

1 day ago

Ituiutaba, Brazil Indi Staffing Services Full-time

At INDI, we're passionate about empowering individuals and businesses worldwide. Our cutting-edge recruiters connect leading companies with top talent, fostering a dynamic environment where innovation thrives. Join us in shaping the future of work. Overview of the role: We are looking for a Site Reliability Engineer to build and maintain highly reliable,...
Site Reliability Engineer

1 day ago

Ituiutaba, Brazil Huge Networks Full-time

Huge Networks is a technology company specializing in cybersecurity, IP connectivity, and high-availability infrastructure. We operate critical systems and deliver robust internet services, ensuring reliability and stability for customers who depend on our operation throughout Latin America. We are looking for a Site...
Site Reliability Engineer

1 day ago

Ituiutaba, Brazil Personetics Full-time

About the company: Personetics is shaping the Cognitive Banking era, harnessing AI to help banks anticipate customer needs, provide actionable insights, and deliver intelligent financial guidance. Our platform continuously analyzes and leverages real-time transactional data, enabling banks to proactively support customers in managing their finances and...
Site Reliability Engineer

1 day ago

Ituiutaba, Brazil Gauge Full-time

We are a Grupo Stefanini company. Specializing in digital marketing, we use an integrated approach that combines technology, data intelligence, design, and deep knowledge of consumer behavior. Our focus is on boosting our partners' results, offering solutions that range from strategic consulting to...
Site Reliability Engineer

5 days ago

Ituiutaba, Brazil Gauge Full-time

We are a Grupo Stefanini company. Specializing in digital marketing, we use an integrated approach that combines technology, data intelligence, design, and deep knowledge of consumer behavior. Our focus is on boosting our partners' results, offering solutions that range from strategic consulting to...
Site Reliability Engineer

1 day ago

Ituiutaba, Brazil Grupo Foxbit Full-time

We are looking for an SRE (Site Reliability Engineer) to help us ensure the stability, security, and scalability of one of the largest cryptocurrency exchanges in Brazil! The main goal of the SRE team is, together with Development and Security, to ensure system reliability, monitor, improve performance, and automate...
Site Civil Engineer

1 day ago

Ituiutaba, Brazil Alec Holdings Full-time

Site Engineer - Civil | UAE. Ready to build what others only imagine? As a Site Engineer at ALEC, you'll turn blueprints into landmarks, managing execution with precision, speed, and impact. This role is based on-site in the UAE and will require relocation. ALEC will be holding interviews in the following cities: São Paulo, Brazil; Rio de Janeiro, Brazil; Bogota,...
Reliability Expert

5 days ago

Ituiutaba, Brazil Bebeereliability Full-time

Job Description: We are seeking a skilled Site Reliability Engineer to fill this key role. The primary focus of this position is incident resolution via the Critical Issue Response System (CIRS) and providing regular updates until successful resolution. Responsibilities: Handle major incidents using the CIRS platform. Perform in-depth application troubleshooting,...
Site Reliability Engineer

2 weeks ago

Ituiutaba, Minas Gerais, Brazil Vericode Full-time

About us: If you like challenges and want to show your full potential, we want to meet you. Vericode values an inclusive team full of diversity in its many forms. All of our openings are open to people with disabilities. #VemSerVericode Responsibilities and duties: Act as a facilitator between the teams...
Senior DevOps Engineer

7 days ago

Ituiutaba, Brazil Itg Software, Inc. Full-time

Calling all DevOps Engineers! We're looking for an experienced DevOps Engineer who wants to do more than just keep the lights on. This is a chance to design, build, and secure production-grade, cloud-native platforms that power real-world impact. About the Role: You'll be at the heart of our engineering team, scaling infrastructure, automating workflows, and...
