
Senior Site Reliability Engineer
2 weeks ago
What's the opportunity?
We're looking for a Site Reliability Engineer (SRE) to join our team.
As an SRE, you're expected to ask key questions like:
What data do we need to understand how our systems are performing?
How do we collect that data?
What patterns are we looking for, and what do they mean?
Who needs to be alerted when something isn't working?
Are there any systems where we need more or better data?
An SRE designs systems and processes to answer these questions and automates support and response wherever possible.
Responsibilities:
Own OpenTelemetry Pipelines: Design, implement, and maintain observability pipelines across logs, metrics, and traces, ensuring standardized, scalable, and efficient data ingestion. Optimize ingestion strategies for cost, performance, and usability.
Empower Engineering Teams: Build self-service automation and tooling that lets development teams implement observability without needing manual SRE support. Drive best practices and ensure teams take ownership of their telemetry.
Support Incident Management: Act as the engineering arm of the Incident Management Team by designing playbooks, processes, checklists, and automations that support teams during incidents.
Collaborate Across Teams: Work with teams across the business to understand their monitoring, alerting, and SLO/SLA needs. Design solutions that meet or exceed these requirements and influence architectural decisions from the start to ensure scalability and resilience.
Automate Observability Infrastructure: Use Infrastructure-as-Code (IaC) to manage monitoring tools, alert rules, and observability configurations across OpenTelemetry (OTel) pipelines.
Define Baseline Observability Standards: Create base-level requirements to ensure all infrastructure and code are monitored consistently and accurately.
Own Technical and Security Health: Take full ownership of infrastructure reliability and ensure alignment with key availability and security KPIs.
Optimize Alerting Systems: Continuously fine-tune alerting to reduce noise, ensure alerts are actionable, and improve response efficiency.
If you have:
4+ years of experience as an SRE or in a similar observability-focused role.
Strong Kubernetes expertise, including components, deployment practices, and monitoring.
Familiarity with OpenTelemetry, including setting up collectors, instrumentation, and pipeline optimization.
Experience with tools like Grafana, Prometheus, Loki, New Relic, or Datadog.
Hands-on experience with Infrastructure-as-Code (Terraform) and GitOps CI/CD (e.g., ArgoCD, GitHub Actions).
Experience integrating incident platforms (PagerDuty, Jira) into alerting workflows.
Strong scripting skills (Python, Go, etc.) to automate observability tasks.
A problem-solving mindset and ability to collaborate across teams to improve reliability.
It's a plus:
Cloud experience, especially with AWS and ECS workloads.
Experience managing observability pipelines at scale in high-throughput environments.
Familiarity with Configuration-as-Code tools (Ansible, Chef, or SaltStack).
Experience with database performance monitoring in large-scale distributed systems.