Site Reliability Engineer

1 day ago


Ananindeua, Brazil | Review ALL | Full-time

About the Company

This company operates a global computing platform that enables businesses to programmatically deploy single-tenant bare metal instances across multiple regions worldwide. It is a team of passionate engineers working at the intersection of hardware, software, and network infrastructure, building the fastest, most developer-centric single-tenant cloud infrastructure on the market. If you share this passion, this role offers the opportunity to help shape the future of internet-scale infrastructure.

Summary

The Reliability team is responsible for the health and resilience of the infrastructure powering a global bare metal cloud platform. As a Senior Site Reliability Engineer (SRE), you'll focus on building reliable, observable, and self-healing systems at scale. SREs here operate at the intersection of software engineering and infrastructure: they design tools that automate operations, improve incident response, and enhance observability, ensuring the platform delivers high performance and reliability to customers worldwide.

Key Responsibilities

• Continuously improve platform reliability and performance.
• Design, build, and maintain tools to automate operational workflows and incident response.
• Implement and enhance observability systems (monitoring, alerting, tracing).
• Collaborate with engineering and platform teams to design scalable and resilient systems.
• Participate in on-call rotations and lead post-incident reviews with a learning-focused approach.
• Develop and document operational playbooks and processes.
• Help define SLOs/SLIs and drive reliability metrics across teams.

Skills & Qualifications

• Fluent spoken and written English.
• Advanced experience with Linux/Unix in production environments.
• Hands-on experience with Kubernetes and container orchestration.
• Proficiency with IaC tools (e.g., Terraform, Ansible).
• Experience with observability stacks (Prometheus, Grafana, Loki, ELK, etc.).
• Proficiency with scripting/programming languages such as Bash, Python, Go, or Ruby.
• Working knowledge of Git and CI/CD pipelines.
• Experience with incident response and root cause analysis.
• Knowledge of cloud-native reliability and security best practices.

What's Offered

• Contractor engagement (PJ)
• Paid time off
• Competitive compensation package
• Wellness benefit (Wellhub/Gympass equivalent)
• Annual performance-based bonus
• Flexible working hours
• Opportunities for technical and career growth



  • Ananindeua, Brazil | Vortigo Digital | Full-time

    We are Vortigo. We were founded to create mobile apps for a world in constant motion, but we didn't stop there. We have expanded our scope, and today we develop software that helps companies and startups through their digital transformation. Our team is made up of people who are passionate about huge challenges, changing the experience...


  • Ananindeua, Brazil | Tata Consultancy Services | Full-time

    Join one of the biggest IT Services companies in the world! Here you can transform your career! Why join TCS? Here at TCS we believe that people make the difference, that's why we live a culture of unlimited learning full of opportunities for improvement and mutual development. The ideal scenario to expand ideas through the right tools, contributing to our...


  • Ananindeua, Brazil | Nearsure | Full-time

    Explore the Nearsure experience! Join our close-knit LATAM remote team: Connect through fun activities like coffee breaks, tech talks, and games with your team-mates and management. Say goodbye to micromanagement! We champion autonomy, open communication, and respect for diversity as our core values. ⚖️ Your well-being matters: Our People Care...