Senior Site Reliability Engineer

2 days ago


São Paulo, State of São Paulo, Brazil, Sigma Software, Full-time, R$80,000 - R$120,000 per year
Company Description

As a Site Reliability Engineer, you will work as an integral member of product teams, helping to build, deploy, and monitor cloud services reliably. You will contribute to complex software development projects to maintain essential, revenue-critical services. Additionally, you will actively develop code and build frameworks to monitor services deployed in production, driving reliability and performance on a large scale. You will be responsible for ensuring the reliability, availability, and performance of our Elasticsearch infrastructure. 

CUSTOMER

ConnectWise is the world's leading software company dedicated to the success of IT solution providers. Its vision is to power a thriving IT ecosystem that transforms what's possible for SMBs, and it does this by empowering IT solution providers with unmatched software, services, and community to help them achieve their most ambitious vision of success. IT service providers use the tools being developed to automate the work they do for small and medium-sized businesses (SMBs), such as backup and restore, security, and administrative tasks on Microsoft 365 tenants.

Job Description
  • Build systems and infrastructure for monitoring complex, large-scale distributed systems 
  • Identify stability and performance issues, and collaborate with developers to triage critical issues in production systems 
  • Represent the SRE organization in design reviews and operational readiness exercises for new and existing services 
  • Devise ways to actively monitor system throughput, capacity, and reliability 
  • Debug complex systems and evolve a running environment without causing downtime 
  • Engage in service capacity planning and demand forecasting, as well as software performance analysis and system tuning 
  • Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization 
  • Monitor and troubleshoot Elasticsearch performance issues and outages 
Qualifications
  • Fundamental knowledge of technologies across a broad range of disciplines, including virtualization, storage, networking, server, and security 
  • Bachelor's degree in computer science or equivalent work experience as a System Administrator with programming skills 
  • Understanding of systems and application design, including the operational trade-offs of various designs 
  • Experience with monitoring and logging solutions such as Prometheus, Grafana, and ELK stack 
  • Proficiency in scripting languages such as Python 
  • Experience with infrastructure-as-code tools, such as Terraform or CloudFormation 
  • Strong understanding of Linux system administration and networking concepts 
  • Demonstrable knowledge of Unix, TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures 
  • Experience in analyzing logs and troubleshooting large-scale, distributed systems 

WOULD BE A PLUS 

  • Experience with instrumenting and monitoring production systems using tools such as ELK stack, Zabbix, Nagios, Statsd/Graphite, APM, etc.
  • Experience with AWS infrastructure (including EC2, S3, VPC, Security Groups, RDS) and related services
  • Practical knowledge of Docker, Vagrant, and configuration management tools like Ansible, Chef, or Puppet
  • Experience with one or more general-purpose programming or scripting languages, including but not limited to Python, Bash, Perl, or Go 
Additional Information

PERSONAL PROFILE

  • Excellent troubleshooting and problem-solving skills 
  • Ability to work independently and collaboratively in a fast-paced environment 
  • Strong communication and interpersonal skills 
  • Excellent organizational and time management skills


  • São Paulo, Brazil K2 Solutions Full-time

    Hybrid work in the Pinheiros area, São Paulo (SP), 3 days a week in the office. We are selecting a Senior Site Reliability Engineer - SRE to join our team and play an essential role in maintaining, automating, and improving the reliability of the systems that power the company's logistics network across multiple regions. This person...


  • São Paulo, Brazil Lend Full-time

    We are looking for a Senior Site Reliability Engineer to design, operate, and evolve a credit infrastructure that will transform the Brazilian financial market. You will be responsible for ensuring that our platform is reliable, scalable, secure, and cost-efficient, directly impacting our customers and shaping how credit will be...


  • São Paulo, Brazil Canonical Full-time

    Senior Site Reliability / Gitops Engineer role at Canonical. Posted 1 day ago.


  • São Paulo, Brazil Chainlink Labs Full-time

    Senior Site Reliability Engineer role at Chainlink Labs. Posted 2 weeks ago. About Us: Chainlink Labs is the primary contributing developer of Chainlink, the decentralized...


  • São Paulo, Brazil Mouts TI Full-time

    At Mouts TI, we deliver solutions that drive digital transformation in an agile, efficient, and uncomplicated way. We are looking for an SRE (Site Reliability Engineer) to work on-site, focusing on infrastructure, automation, and observability in mission-critical environments. Responsibilities: Implement and manage observability solutions


  • São Paulo, Brazil PayRetailers Full-time

    Site Reliability Engineer. Join PayRetailers in São Paulo. We are expanding across Latin America and Africa, building cutting‑edge payment solutions. We value creativity, growth, and collaboration. About the role: Site Reliability Engineers are guardians of our reliability promise. They deliver a highly reliable, resilient, and cost‑efficient platform...


  • São Paulo, Brazil INDI Staffing Services Full-time

    Overview: We are looking for a Site Reliability Engineer to build and maintain highly reliable, scalable, and secure OpenShift/Kubernetes clusters. Approach the problem of building and maintaining production systems from a software engineering perspective with a focus on automation and reliability. Responsibilities: Build, automate, and maintain...

