Site Reliability Engineer

2 days ago


Porto Alegre, Brazil Review All Full-time

About the Company

This company operates a global computing platform that enables businesses to programmatically deploy single-tenant Bare Metal instances across multiple regions worldwide. They are a team of passionate engineers working at the intersection of hardware, software, and network infrastructure, building the fastest, most developer-centric single-tenant cloud infrastructure on the market. If you share this passion, this role offers the opportunity to help shape the future of internet-scale infrastructure.

This position is being managed in partnership with an external recruitment consultancy supporting the company throughout the hiring process.

Summary

The Reliability team is responsible for the health and resilience of the infrastructure powering a global bare metal cloud platform. As a Senior Site Reliability Engineer (SRE), you'll focus on building reliable, observable, and self-healing systems at scale. SREs here operate at the intersection of software engineering and infrastructure, designing tools that automate operations, improve incident response, and enhance observability, ensuring the platform delivers high performance and reliability to customers worldwide.

This role is ideal for engineers passionate about reliability, automation, distributed systems, and bringing cloud-like experiences to bare metal environments.

Key Responsibilities

  • Continuously improve platform reliability and performance.
  • Design, build, and maintain tools to automate operational workflows and incident response.
  • Implement and enhance observability systems (monitoring, alerting, tracing).
  • Collaborate with engineering and platform teams to design scalable and resilient systems.
  • Participate in on-call rotations and lead post-incident reviews with a learning-focused approach.
  • Develop and document operational playbooks and processes.
  • Contribute to defining SLOs/SLIs and driving reliability metrics across teams.

Skills & Qualifications

Required:

  • Fluent verbal and written English communication skills
  • Advanced experience with Linux/Unix in production environments
  • Hands-on experience with Kubernetes and container orchestration
  • Proficiency with IaC tools (e.g., Terraform, Ansible)
  • Experience with observability stacks (Prometheus, Grafana, Loki, ELK, etc.)
  • Proficiency with scripting/programming languages such as Bash, Python, Go, or Ruby
  • Working knowledge of Git and CI/CD pipelines
  • Experience with incident response and root cause analysis
  • Knowledge of cloud-native reliability and security best practices

What's Offered

  • Contractor engagement (PJ)
  • Paid Time Off
  • Competitive compensation package
  • Wellness benefit (Wellhub / Gympass equivalent)
  • Annual performance-based bonus
  • Flexible working hours
  • Opportunities for technical and career growth


  • Site Reliability Engineer

    2 weeks ago


    Porto Alegre, Brazil Canonical Full-time

    Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's...

  • Site Reliability Engineer

    1 week ago


    Porto Alegre, Brazil Canonical Full-time

    Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's...


  • Porto Alegre, Brazil MetaCTO Full-time

    About Us: At MetaCTO, we specialize in helping startups and growing companies turn visionary ideas into successful digital products through expert app development and fractional CTO services. As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, scalability, and security of the backend infrastructure that powers...


  • Porto Alegre, Brazil MetaCTO Full-time

    About Us: At MetaCTO, we specialize in helping startups and growing companies turn visionary ideas into successful digital products through expert app development and fractional CTO services. As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, scalability, and security of the backend infrastructure that powers...


  • Porto Alegre, Brazil WEX Full-time

    Mid-level Site Reliability Engineer at WEX. About the Team/Role: The WEX Site Reliability Engineering (SRE) team seeks individuals passionate about developing software and solutions for observability, incident response, reliability, performance, operational excellence, and compliance. As...

  • Site Reliability Engineer

    1 week ago


    Porto Alegre, Brazil Azion Full-time

    About Azion: We are a global leader in the application and security industry. Our platform allows companies to operate with agility, reducing latency and increasing the reliability...


  • Porto Alegre, Brazil Wex Full-time

    The WEX Site Reliability Engineering (SRE) team seeks individuals passionate about developing software and solutions for observability, incident response, reliability, performance, operational excellence, and compliance. As part of the Site Reliability Engineering organization, you will support internal stakeholders and Payment Platform teams, tackling...


  • Porto Alegre, Brazil Wex Full-time

    About the Team/Role: We are seeking a Software Development Engineer Level 3 to join our SRE team dedicated to the Mobility line of business. This role is for a professional with a software development background who will apply SRE principles to ensure the reliability, scalability, and performance of our complex software systems. The ideal candidate will have...


  • Porto Alegre, Rio Grande do Sul, Brazil Wex Full-time R$80,000 - R$120,000 per year

    About the Team/Role: We are seeking a Software Development Engineer Level 3 to join our SRE team dedicated to the Mobility line of business. This role is for a professional with a software development background who will apply SRE principles to ensure the reliability, scalability, and performance of our complex software systems. The ideal candidate will have...


  • Porto Alegre, Brazil Netvagas Full-time

    About Azion: We are a global leader in the application and security industry. Our platform allows companies to operate with agility, reducing latency and increasing the reliability of their applications. We are focused on simplifying application building and looking for passionate and innovative individuals to join our team! At Azion you will have the opportunity...