
Senior Site Reliability Engineer
1 week ago
What's the opportunity?
We're looking for a Site Reliability Engineer (SRE) to join our team.
As an SRE, you're expected to ask key questions like:
What data do we need to understand how our systems are performing?
How do we collect that data?
What patterns are we looking for, and what do they mean?
Who needs to be alerted when something isn't working?
Are there any systems where we need more or better data?
An SRE designs systems and processes to answer these questions and automate support and response wherever possible.
Responsibilities:
Own OpenTelemetry Pipelines: Design, implement, and maintain observability pipelines across logs, metrics, and traces, ensuring standardized, scalable, and efficient data ingestion. Optimize ingestion strategies for cost, performance, and usability.
Empower Engineering Teams: Build self-service automation and tooling that lets development teams implement observability without needing manual SRE support. Drive best practices and ensure teams take ownership of their telemetry.
Support Incident Management: Act as the engineering arm of the Incident Management Team—designing playbooks, processes, checklists, and automations to support teams during incidents.
Collaborate Across Teams: Work with teams across the business to understand their monitoring, alerting, and SLO/SLA needs. Design solutions that meet or exceed these requirements and influence architectural decisions from the start to ensure scalability and resilience.
Automate Observability Infrastructure: Use Infrastructure-as-Code (IaC) to manage monitoring tools, alert rules, and observability configurations across OTEL pipelines.
Define Baseline Observability Standards: Create base-level requirements to ensure all infrastructure and code is monitored consistently and accurately.
Own Technical and Security Health: Take full ownership of infrastructure reliability and ensure alignment with key availability and security KPIs.
Optimize Alerting Systems: Continuously fine-tune alerting to reduce noise, ensure alerts are actionable, and improve response efficiency.
Requirements:
4+ years of experience as an SRE or in a similar observability-focused role.
Strong Kubernetes expertise, including components, deployment practices, and monitoring.
Familiarity with OpenTelemetry—setting up collectors, instrumentation, and pipeline optimization.
Experience with tools like Grafana, Prometheus, Loki, New Relic, or Datadog.
Hands-on experience with Infrastructure-as-Code (Terraform) and GitOps CI/CD (e.g., ArgoCD, GitHub Actions).
Experience integrating incident platforms (PagerDuty, Jira) into alerting workflows.
Strong scripting skills (Python, Go, etc.) to automate observability tasks.
A problem-solving mindset and ability to collaborate across teams to improve reliability.
Nice to have:
Cloud experience, especially with AWS and ECS workloads.
Experience managing observability pipelines at scale in high-throughput environments.
Familiarity with Configuration-as-Code tools (Ansible, Chef, or SaltStack).
Experience with database performance monitoring in large-scale distributed systems.
-
Site Reliability Engineer
2 weeks ago
Salvador, Bahia, Brazil · WEX · Full-time · R$104,000 - R$130,878 per year
About the Team/Role: We are seeking a Software Development Engineer Level 3 to join our SRE team dedicated to the Mobility line of business. This role is for a professional with a software development background who will apply SRE principles to ensure the reliability, scalability, and performance of our complex software systems. The ideal candidate will have...
-
Site Reliability Engineer
3 weeks ago
Salvador, Bahia, Brazil · AgileEngine · Full-time
Site Reliability Engineer (Middle), ID38916. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us...
-
Senior Data Engineer
1 week ago
Salvador, Bahia, Brazil · Pride Global · Full-time
We're Hiring: Senior Data Engineer (MLOps) | Remote from Brazil | Fluent English required | USD hourly pay. Location: Remote (Brazil only). Language: Fluent English required. Are you passionate about building scalable data platforms and cutting-edge MLOps solutions? Do you want to work with a top-tier US company revolutionizing e-commerce and circular...
-
Senior Network Engineer
4 weeks ago
Salvador, Bahia, Brazil · Acronis · Full-time
-
Senior Network Engineer
1 week ago
Salvador, Bahia, Brazil · Acronis · Full-time
-
Linux Site Reliability Consultant
3 weeks ago
Salvador, Bahia, Brazil · Pythian · Full-time
Overview: Site Reliability Consultant. Brazil | Remote | Work from home. One position available for the following time zone: PST. Why Pythian: At Pythian, we are experts in strategic database and analytics services, driving digital transformation and operational excellence. Pythian, a multinational company, was founded in 1997 and started by ensuring the...
-
Site Reliability Expert
2 weeks ago
Salvador, Bahia, Brazil · beBeeReliability · Full-time · US$150,000 - US$170,000
Job Overview: We are seeking a highly skilled System Reliability Engineer to join our team. As a key technology leader, advisor for our clients, and mentor for other team members, you will be responsible for designing, implementing, and maintaining scalable and reliable infrastructure solutions. Key Responsibilities: Operate, maintain, and administer solutions...
-
Golang Software Engineer
3 weeks ago
Salvador, Bahia, Brazil · AgileEngine · Full-time
Golang Software Engineer (Senior/Lead), ID37218. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards. If you're...
-
Golang Software Engineer
1 week ago
Salvador, Bahia, Brazil · AgileEngine · Full-time
Golang Software Engineer (Senior/Lead), ID37218. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards. If you're...