Site Reliability Expert

6 hours ago


Salvador, Bahia, Brazil | beBeeReliability | Full-time | US$150,000 - US$170,000
Job Overview

We are seeking a highly skilled Site Reliability Engineer to join our team. As a key technology leader, advisor to our clients, and mentor to other team members, you will be responsible for designing, implementing, and maintaining scalable and reliable infrastructure solutions.

Key Responsibilities
  • Operate, maintain, and administer solutions that contribute to the operational efficiency, availability, and visibility of customer infrastructure.
  • Plan maintenance activities, design documentation, and standard procedures.
  • Provide root cause analysis reports for outages/incidents (ITIL - Problem Management).
  • Observe and provide feedback on the current state of the client's infrastructure, identifying opportunities to improve resiliency, reduce incident occurrence, and automate repetitive administrative and operational tasks.
  • Contribute to, improve, and maintain team documentation about client systems and infrastructure, procedures, policies, and schedules.
  • Gather and document information about client environments through audit activities, analyzing the information to identify opportunities for improvement and application of best practices.
  • Work collaboratively with teammates to contribute to the continuous improvement of our working culture.
  • Act as a technology leader for clients, driving client discussions on technology road maps.
  • Participate in an on-call rotation in an escalation capacity.
Required Skills and Qualifications
  • Experience working with Google Cloud and AWS (including infrastructure-as-code deployment with CloudFormation, Terraform, OpsWorks, etc.).
  • Scripting and automation of administrative tasks using Python and Scala are mandatory.
  • Solid understanding of microservices architecture and container technologies (Kubernetes is a must; Docker, LXC, etc.).
  • Clear understanding of software development lifecycles and best practices from an infrastructure point of view (PRs, merge, rebase, etc.).
  • Understanding of the end-to-end operation of a 'Business System' as opposed to its individual components.
  • Comprehensive systems hardware and network troubleshooting experience.
  • Installation, configuration, performance tuning, and cloud migration of common Linux distributions.
  • TCP/IP networking, NIC bonding, and network services configuration (DNS, NTP, DHCP, SMTP, etc.).
  • Operation and administration of virtual infrastructure, including experience with at least one hypervisor (VMware, Hyper-V, KVM, etc.).
  • Ability to describe IaaS, PaaS, SaaS, pros and cons of each, use cases for virtualization and cloud.
  • Administration of web servers and supporting technologies, including network load balancers.
  • Experience with the design, development, and deployment of Puppet.
  • System and application error investigation and troubleshooting of access/availability issues, including deep multi-system root cause analysis.
  • Experience managing networking devices, such as switches and firewalls from a variety of vendors.
  • Solid understanding of DevOps tools, processes, and culture.
  • Ability to pick up new technologies quickly.
  • Ability to provide accurate work scheduling and task estimations for work delivery.
What We Offer
  • A competitive total rewards package.
  • The flexibility to work remotely from home, with no daily commute to an office.
  • The opportunity to collaborate with some of the best and brightest in the industry.
  • Significant training allowance to hone your skills or learn new ones.
  • Professional development days, training, certification, and more.
  • Annual budget to personalize your work environment.
  • Wellness budget to make yourself a priority.
  • Generous amount of paid vacation and sick days.
  • Day off to volunteer for your favorite charity.


  • Salvador, Bahia, Brazil | WEX | Full-time | US$104,000 - US$130,878 per year

    About the Team/Role: The WEX Site Reliability Engineering (SRE) team seeks individuals passionate about developing software and solutions for observability, incident response, reliability, performance, operational excellence, and compliance. As part of the Site Reliability Engineering organization, you will support internal stakeholders and Payment Platform...


  • Salvador, Bahia, Brazil | TAG IMF | Full-time

    Getting to know TAG: We are a technology company, a Financial Market Infrastructure (Infraestrutura do Mercado Financeiro, IMF), authorized and regulated by the Banco Central. We enable asset management through modern and innovative platforms and tools. Our focus is on developing effective solutions for Brazil's payment, credit, and financial markets. De...


  • Salvador, Bahia, Brazil | Pythian | Full-time

    Overview: Site Reliability Consultant. Brazil | Remote | Work from Home. One available position for the following time zone: PST. Why Pythian: At Pythian, we are experts in strategic database and analytics services, driving digital transformation and operational excellence. Pythian, a multinational company, was founded in 1997 and started by ensuring the...


  • Salvador, Bahia, Brazil | beBeeSiteReliabilityEngineer | Full-time | US$90,000 - US$120,000

    Job Summary: We are seeking a skilled Site Reliability Engineer (Middle) to join our team. This individual will be responsible for ensuring the smooth operation of our IT systems, managing alerts, and escalating issues as needed. Key Responsibilities: Manage alerts daily and check systems for any issues. Escalate critical issues to the appropriate...


  • Salvador, Bahia, Brazil | AgileEngine | Full-time

    Overview Join to apply for the Site Reliability Engineer (Middle) ID38916 role at AgileEngine. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us...


  • Salvador, Bahia, Brazil | beBeeDeployment | Full-time | R$90,353 - R$109,976

    We are seeking a seasoned IT professional to fill the role of Deployment Reliability Engineer.


  • Salvador, Bahia, Brazil | beBeeEnvironmental | Full-time | R$60,000 - R$95,000

    Job Description: As a skilled Investigation & Remediation Consultant, you will play a key role in implementing innovative solutions for clients with complex technical and regulatory issues. This is an excellent opportunity to work with experienced professionals and contribute to the development of sustainable approaches. You will be responsible for: Developing...


  • Salvador, Bahia, Brazil | beBeeReliability | Full-time | US$120,000 - US$140,000

    Site Reliability Engineer: We are seeking a skilled Site Reliability Engineer to join our team. In this role, you will be responsible for ensuring the operational efficiency, availability, and visibility of our clients' infrastructure. As a Site Reliability Engineer, you will operate, maintain, and administer solutions contributing to customer infrastructure's...


  • Salvador, Bahia, Brazil | beBeeRegulatory | Full-time | R$63,000 - R$97,800

    Job Overview: The Senior Regulatory Specialist serves as the primary point of contact for investigative sites during site start-up activities and maintenance, ensuring timely delivery of high-quality results. Maintain awareness of regulatory legislation, guidance, and best practices in assigned countries. Coordinate collection and organization of data and...