
Site Reliability Expert
6 hours ago
We are seeking a highly skilled Site Reliability Engineer to join our team. As a key technology leader, advisor for our clients, and mentor for other team members, you will be responsible for designing, implementing, and maintaining scalable and reliable infrastructure solutions.
Key Responsibilities
- Operate, maintain, and administer solutions contributing to customer infrastructure's operational efficiency, availability, and visibility.
- Plan maintenance activities, design documentation, and standard procedures.
- Provide root cause analysis reports for outages/incidents (ITIL - Problem Management).
- Observe and provide feedback on the current state of the client's infrastructure, identifying opportunities to improve resiliency, reduce incident occurrence, and automate repetitive administrative and operational tasks.
- Contribute to, improve, and maintain team documentation about client systems and infrastructure, procedures, policies, and schedules.
- Gather and document information about client environments through audit activities, analyzing the information to identify opportunities for improvement and application of best practices.
- Work collaboratively with teammates to contribute to the continuous improvement of our working culture.
- Act as a technology leader for clients, driving client discussions on technology road maps.
- Participate in an on-call rotation in an escalation capacity.
- Experience working with Google Cloud and AWS (including infrastructure-as-code deployment with CloudFormation, Terraform, OpsWorks, etc.).
- Scripting and automation of administrative tasks using Python and Scala is mandatory.
- Solid understanding of microservices architecture and container technologies (Kubernetes is a must; Docker, LXC, etc.).
- Clear understanding of software development lifecycles and best practices from an infrastructure point of view (PRs, merge, rebase, etc).
- Understanding of the end-to-end operations of a 'Business System' versus its individual components.
- Comprehensive systems hardware and network troubleshooting experience.
- Common Linux distribution platform installation, configuration, performance tuning, and cloud migration.
- TCP/IP networking, NIC bonding, and network services configuration (DNS, NTP, DHCP, SMTP, etc).
- Operation and administration of virtual infrastructure, including experience with at least one hypervisor (VMware, Hyper-V, KVM, etc.).
- Ability to describe IaaS, PaaS, and SaaS, the pros and cons of each, and use cases for virtualization and cloud.
- Administration of web servers and supporting technologies, including network load balancers.
- Experience with the design, development, and deployment of Puppet.
- System and application error investigation, troubleshooting of access/availability issues including deep multi-system root cause analysis.
- Experience managing networking devices, such as switches and firewalls from a variety of vendors.
- Solid understanding of DevOps tools, processes, and culture.
- Ability to pick up new technologies quickly.
- Ability to provide accurate work scheduling and task estimations for work delivery.
- A competitive total rewards package.
- The flexibility to work remotely from home with no daily travel requirement to an office.
- The opportunity to collaborate with some of the best and brightest in the industry.
- Significant training allowance to hone your skills or learn new ones.
- Professional development days, training, certification, and more.
- Annual budget to personalize your work environment.
- Wellness budget to make yourself a priority.
- Generous amount of paid vacation and sick days.
- Day off to volunteer for your favorite charity.
-
Mid level Site Reliability Engineer
22 hours ago
Salvador, Bahia, Brazil | WEX | Full-time | US$104,000 - US$130,878 per year
About the Team/Role: The WEX Site Reliability Engineering (SRE) team seeks individuals passionate about developing software and solutions for observability, incident response, reliability, performance, operational excellence, and compliance. As part of the Site Reliability Engineering organization, you will support internal stakeholders and Payment Platform...
-
Junior Site Reliability Engineer
4 weeks ago
Salvador, Bahia, Brazil | TAG IMF | Full-time
Getting to know TAG: We are a technology company, a Financial Market Infrastructure (Infraestrutura do Mercado Financeiro, IMF), authorized and regulated by the Central Bank. We enable asset management through modern, innovative platforms and tools. Our focus is on developing effective solutions for Brazil's payment, credit, and financial markets. De...
-
Linux Site Reliability Consultant
1 week ago
Salvador, Bahia, Brazil | Pythian | Full-time
Overview: Site Reliability Consultant. Brazil | Remote | Work from Home. One available position for the following time zone: PST.
Why Pythian: At Pythian, we are experts in strategic database and analytics services, driving digital transformation and operational excellence. Pythian, a multinational company, was founded in 1997 and started by ensuring the...
-
Linux Site Reliability Consultant
6 days ago
Salvador, Bahia, Brazil | Pythian | Full-time
Overview: Site Reliability Consultant. Brazil | Remote | Work from Home. One available position for the following time zone: PST.
Why Pythian: At Pythian, we are experts in strategic database and analytics services, driving digital transformation and operational excellence. Pythian, a multinational company, was founded in 1997 and started by ensuring the...
-
Reliability Engineer
6 days ago
Salvador, Bahia, Brazil | beBeeSiteReliabilityEngineer | Full-time | US$90,000 - US$120,000
Job Summary: We are seeking a skilled Site Reliability Engineer (Middle) to join our team. This individual will be responsible for ensuring the smooth operation of our IT systems, managing alerts, and escalating issues as needed.
Key Responsibilities: Manage alerts daily and check systems for any issues. Escalate critical issues to the appropriate...
-
Site Reliability Engineer
6 days ago
Salvador, Bahia, Brazil | AgileEngine | Full-time
Overview: Join to apply for the Site Reliability Engineer (Middle) ID38916 role at AgileEngine. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us...
-
SAP Ariba Cloud Delivery Expert
1 week ago
Salvador, Bahia, Brazil | beBeeDeployment | Full-time | R$90,353 - R$109,976
We are seeking a seasoned IT professional to fill the role of Deployment Reliability Engineer.
-
Site Remediation Specialist
6 days ago
Salvador, Bahia, Brazil | beBeeEnvironmental | Full-time | R$60,000 - R$95,000
Job Description: As a skilled Investigation & Remediation Consultant, you will play a key role in implementing innovative solutions for clients with complex technical and regulatory issues. This is an excellent opportunity to work with experienced professionals and contribute to the development of sustainable approaches.
You will be responsible for: Developing...
-
Cloud Infrastructure Professional
6 days ago
Salvador, Bahia, Brazil | beBeeReliability | Full-time | US$120,000 - US$140,000
Site Reliability Engineer: We are seeking a skilled Site Reliability Engineer to join our team. In this role, you will be responsible for ensuring the operational efficiency, availability, and visibility of our clients' infrastructure. As a Site Reliability Engineer, you will operate, maintain, and administer solutions contributing to customer infrastructure's...
-
Global Regulatory Expert
7 days ago
Salvador, Bahia, Brazil | beBeeRegulatory | Full-time | R$63,000 - R$97,800
Job Overview: The Senior Regulatory Specialist serves as the primary point of contact for investigative sites during site start-up activities and maintenance, ensuring timely delivery of high-quality results. Maintain awareness of regulatory legislation, guidance, and best practices in assigned countries. Coordinate collection and organization of data and...