Site Reliability Engineer
4 weeks ago
About the Company
This company operates a global computing platform that enables businesses to programmatically deploy single-tenant Bare Metal instances across multiple regions worldwide.
They are a team of passionate engineers working at the intersection of hardware, software, and network infrastructure, building the fastest, most developer-centric single-tenant cloud infrastructure on the market. If you share this passion, this role offers the opportunity to help shape the future of internet-scale infrastructure.
This position is being managed in partnership with an external recruitment consultancy supporting the company throughout the hiring process.
Summary
The Reliability team is responsible for the health and resilience of the infrastructure powering a global bare metal cloud platform. As a Senior Site Reliability Engineer (SRE), you'll focus on building reliable, observable, and self-healing systems at scale.
SREs here operate at the intersection of software engineering and infrastructure, designing tools that automate operations, improve incident response, and enhance observability so that the platform delivers high performance and reliability to customers worldwide.
This role is ideal for engineers passionate about reliability, automation, distributed systems, and bringing cloud-like experiences to bare metal environments.
Key Responsibilities
Continuously improve platform reliability and performance.
Design, build, and maintain tools to automate operational workflows and incident response.
Implement and enhance observability systems (monitoring, alerting, tracing).
Collaborate with engineering and platform teams to design scalable and resilient systems.
Participate in on-call rotations and lead post-incident reviews with a learning-focused approach.
Develop and document operational playbooks and processes.
Contribute to defining SLOs/SLIs and driving reliability metrics across teams.
Skills & Qualifications
Required:
Fluent verbal and written English communication skills
Advanced experience with Linux/Unix in production environments
Hands-on experience with Kubernetes and container orchestration
Proficiency with IaC tools (e.g., Terraform, Ansible)
Experience with observability stacks (Prometheus, Grafana, Loki, ELK, etc.)
Proficiency with scripting/programming languages such as Bash, Python, Go, or Ruby
Working knowledge of Git and CI/CD pipelines
Experience with incident response and root cause analysis
Knowledge of cloud-native reliability and security best practices
What's Offered
Contractor engagement (PJ)
Paid Time Off
Competitive compensation package
Wellness benefit (Wellhub / Gympass equivalent)
Annual performance-based bonus
Flexible working hours
Opportunities for technical and career growth
-
Site Reliability Engineer
2 weeks ago
Manaus, Brazil | Review All | Full-time
About the Company: This company operates a global computing platform that enables businesses to programmatically deploy single-tenant Bare Metal instances across multiple regions worldwide. They are a team of passionate engineers working at the intersection of hardware, software, and network infrastructure, building the fastest, most developer-centric...
-
Site Reliability Engineer
3 weeks ago
Manaus, Brazil | Review ALL | Full-time
About the Company: This company operates a global computing platform that enables businesses to programmatically deploy single-tenant Bare Metal instances across multiple regions worldwide. They are a team of passionate engineers working at the intersection of hardware, software, and network infrastructure, building the fastest, most developer-centric...
-
Site Reliability Engineer
4 weeks ago
Manaus, Brazil | Review ALL | Full-time
About the Company: This company operates a global computing platform that enables businesses to programmatically deploy single-tenant Bare Metal instances across multiple regions worldwide. They are a team of passionate engineers working at the intersection of hardware, software, and network infrastructure, building the fastest, most developer-centric...
-
Site Reliability Engineer
4 weeks ago
Manaus, Brazil | MetaCTO | Full-time
About Us: At MetaCTO, we specialize in helping startups and growing companies turn visionary ideas into successful digital products through expert app development and fractional CTO services. As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, scalability, and security of the backend infrastructure that powers...
-
Senior Site Reliability
3 weeks ago
Manaus, Brazil | Canonical | Full-time
Senior Site Reliability / GitOps Engineer. Canonical is a leading provider of open source software and operating...
-
Site Reliability Engineer
7 days ago
Manaus, Brazil | Canonical | Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers...
-
Site Reliability Engineer
2 weeks ago
Manaus, Brazil | Canonical | Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include...
-
Senior Site Reliability Engineer
1 day ago
Manaus, Brazil | Canonical | Full-time
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI,...
-
Site Reliability Engineer
7 days ago
Manaus, Brazil | Psm Company | Full-time
About the role: PSM Company specializes in identifying talent for IT and telecom roles as well as for operational and administrative areas. Our track record of success is built on a business model that delivers accuracy and quality in the selection process, low turnover, and freedom from risks and liabilities...