Current jobs related to Site Reliability Engineer - São Paulo - Pluxee

Site Reliability Engineer

2 weeks ago

São Paulo, São Paulo, Brazil Kaizen Gaming Group Full-time

About the Role: We are seeking an experienced Site Reliability Engineer to join our team at Kaizen Gaming Group. As a Site Reliability Engineer, you will play a critical role in ensuring the uptime and reliability of our applications and infrastructure. Key Responsibilities: Collaborate with a team of engineers to enable and enhance operational workflows of...

Site Reliability Engineer 3

3 weeks ago

São Paulo, São Paulo, Brazil LatamCent Full-time

About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at LatamCent. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our software systems. Key Responsibilities: Design, develop, and deploy scalable and highly available software systems. Collaborate with...

Senior Site Reliability Engineer

2 weeks ago

São Paulo, São Paulo, Brazil Myskysys Full-time

Job Title: Senior Site Reliability Engineer. We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Myskysys. As a key member of our infrastructure team, you will play a critical role in ensuring the reliability, scalability, and performance of our digital platforms and infrastructure. Key Responsibilities: Troubleshoot and resolve...

Senior Site Reliability Engineer

3 weeks ago

São Paulo, São Paulo, Brazil Sigma Software Group Full-time

About the Role: We are seeking a highly motivated and experienced Senior/Principal Site Reliability Engineer to join our mature project team at Sigma Software Group. Key Responsibilities: Design and implement scalable and reliable systems to ensure high availability and performance. Collaborate with cross-functional teams to identify and prioritize technical...

Senior/Principal Site Reliability Engineer

3 weeks ago

São Paulo, Brazil Sigma Software Group Full-time

We have an excellent opportunity for a bright, smart, and highly motivated Senior/Principal Site Reliability Engineer to join our mature project team.

Senior Site Reliability Engineer

4 days ago

São Paulo, São Paulo, Brazil SkySys Full-time

Job Title: Senior Site Reliability Engineer. At SkySys, we are seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will play a critical role in ensuring the reliability, scalability, and performance of our digital platforms and infrastructure. Key Responsibilities: Troubleshoot and resolve...

Senior Site Reliability Engineer

1 week ago

São Paulo, São Paulo, Brazil SkySys Full-time

Job Title: Senior Site Reliability Engineer. At SkySys, we are seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will play a critical role in ensuring the reliability, scalability, and performance of our digital platforms and infrastructure. Key Responsibilities: Troubleshoot and resolve...

Site Reliability Engineer

2 weeks ago

São Paulo, São Paulo, Brazil Pluxee Full-time

Job Summary: We are seeking a highly skilled Site Reliability Engineer and Incident Manager to join our team in Brazil/Chile. As a key member of our Run & Operations team, you will be responsible for ensuring the reliability and resiliency of our systems and services, driving a culture of continuous improvement, and leading incident response efforts. Key...

Senior Site Reliability Engineer

2 weeks ago

São Paulo, São Paulo, Brazil SkySys Full-time

Job Summary: At SkySys, we are seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for ensuring the reliability, scalability, and performance of our digital platforms and infrastructure. Main...

Senior Site Reliability

2 weeks ago

São Paulo, Brazil SkySys Full-time

Role: Senior Site Reliability Engineer. Position Type: Full-Time Contract (40 hrs/week). Contract Duration: 6-8 Months+. Work Hours: Eastern Standard Time (EST). Work Schedule: 8 hours/day (Mon-Fri). Location: Hybrid (combination of on-site and remote work), on site 2-3 times a month. Overview: The Site Reliability Engineer (SRE) plays a critical role in...

Senior Site Reliability

2 weeks ago

São Paulo, Brazil SkySys Full-time

Role: Senior Site Reliability Engineer. Position Type: Full-Time Contract (40 hrs/week). Contract Duration: 6-8 Months+. Work Hours: Eastern Standard Time (EST). Work Schedule: 8 hours/day (Mon-Fri). Location: 100% Remote in Brazil. Overview: The Site Reliability Engineer (SRE) plays a critical role in ensuring the reliability, scalability, and performance of...

Site Reliability Engineer

2 weeks ago

São Paulo, São Paulo, Brazil Understanding Recruitment Group Full-time

About the Role: We are seeking a highly skilled Site Reliability Engineer to join our cybersecurity organization, which utilizes AI/ML to develop cutting-edge enterprise security products at scale. With a global team serving millions of users, this is an excellent opportunity to work at a large scale and make a significant impact. Your role as a Site...

Site Reliability Engineer

2 weeks ago

São Paulo, São Paulo, Brazil Ebury Full-time

About Ebury: Ebury is a hyper-growth FinTech firm, recognized as one of the top 15 European Fintechs to work for by AltFi. We offer a range of innovative products, including FX risk management, trade finance, currency accounts, international payments, and API integration. Job Description: We are seeking a skilled Site Reliability Engineer to join our team in São...

Senior site reliability engineer

2 months ago

São Paulo, Brazil Netvagas Full-time

About the Role: Our company employs a diverse array of systems and technologies to deliver our products. As a Senior Site Reliability Engineer, you will work closely with software engineering teams to focus on software development and infrastructure design, providing expertise in performance, stability, and scalability. You will report to the Site Reliability...

Senior Site Reliability Engineer

1 week ago

São Paulo, São Paulo, Brazil Grupo Hub Full-time

About the Role: At Grupo Hub, we're committed to delivering exceptional products and services through our diverse array of systems and technologies. As a Senior Site Reliability Engineer, you'll collaborate closely with software engineering teams to drive software development and infrastructure design, focusing on performance, stability, and scalability....

Senior Site Reliability Engineer

7 days ago

São Paulo, São Paulo, Brazil Grupo Hub Full-time

About the Role: At Grupo Hub, we're committed to delivering exceptional products and services through our diverse array of systems and technologies. As a Senior Site Reliability Engineer, you'll collaborate closely with software engineering teams to focus on software development and infrastructure design, providing expertise in performance, stability, and...

Site Reliability Engineer

1 month ago

São Paulo, Brazil Pentasia Full-time

My client is looking for Mid-level and Senior Site Reliability Engineers with strong skills and a focus on development and automation to address complex challenges. This is a remote position in Brazil working on a payroll system, and the candidate needs to be at the office once a quarter to meet the team in São Paulo. Remote in Brazil. Salary: Circa R$15k - R$18k...

Site Reliability Engineer

2 weeks ago

São Paulo, Brazil Pentasia Full-time

My client is looking for a Mid-level Site Reliability Engineer with strong skills and a focus on development and automation to address complex challenges. This is a remote position in Brazil working on a payroll system, and the candidate needs to be at the office once a quarter to meet the team in São Paulo. Remote in Brazil. Salary: Circa R$14k - R$16k gross/month...

Site Reliability Engineer

2 weeks ago

São Paulo, São Paulo, Brazil Pentasia Full-time

About the Role: Pentasia is seeking a talented Mid-Level Site Reliability Engineer to join our team in Brazil. As a key member of our engineering team, you will be responsible for ensuring the reliability and availability of our applications and infrastructure. Key Responsibilities: Collaborate with fellow engineers to optimize and enhance operational processes,...

Site Reliability Engineer

3 weeks ago

São Paulo, São Paulo, Brazil Kaizen Gaming Group Full-time

About Kaizen Gaming Group: Kaizen Gaming Group is a leading GameTech company in Greece and one of the fastest-growing in the world, operating in 16 markets with 2 brands, Betano & Stoiximan. We are a diverse team of over 2,500 professionals from 40+ nationalities, spread across 3 continents. Our team is proud to be among the Best Workplaces in Europe and...

Site Reliability Engineer

3 months ago

São Paulo, Brazil Pluxee Full-time

SRE and Incident Manager - Brazil / Chile (Latin America)

Pluxee is the leading global employee benefits and engagement partner that opens up a world of opportunities to help people enjoy more of what really matters.

We believe that living life to the full means making the most of every moment and sharing experiences with the people we care about. To make these experiences meaningful, fulfilling, and personalized, we combine our 45+ years of experience with the agility and energy of a new digital brand.

The SRE and Incident Manager - LATAM will be responsible for overseeing the entire SRE (Site Reliability Engineering) and incident management process, ensuring that the infrastructure and applications under his/her scope are designed, developed, and implemented with reliability and resiliency, and increasing efficiency through automation, which ultimately improves customer satisfaction and retention. He/she should also proactively drive a culture of continuous improvement.

During the operational phase of the systems under his/her scope, he/she is responsible for the timely resolution of incidents and for minimizing the impact on our systems and on the services offered to clients, consumers, and merchants in several Latin American countries.

He / She will be a part of a distributed international Run & Operations team that collaborates with other cross-functional teams to drive continuous improvement of our incident response capabilities at all levels (local, regional and global).

Our systems and services are built on a cloud-native, distributed architecture and are integrated with country systems. The current functional scope comprises an identity management system, a store locator, several web portals, a couple of mobile applications, and several payment capabilities.

Technical expertise: solid experience with Azure services such as App Services, Functions, Application Gateways, and APIM; an understanding of common server-side frameworks such as React.js, Node.js, PHP Symfony, and .NET; and experience with at least one observability platform, preferably Datadog.

Process-centric approach: strive to understand and optimize the end-to-end flow of work, rather than focusing solely on individual tasks or functional silos.

Assertive communication: clearly express his/her thoughts, needs, and boundaries in a direct and respectful manner, while considering the need to engage multiple parties (internal and external) towards a common goal.

Your next challenge:

The SRE and Incident Manager - LATAM will have the following responsibilities to ensure reliability and resiliency of the systems under his/her scope:

Ensuring reliability - getting systems back to steady state as quickly as possible
Eliminating toil - automating wherever possible
Blameless postmortems - driving better cross-team collaboration
Observing what matters - gaining full visibility into system health
Being proactive - living and breathing SLOs to identify and remediate issues before SLAs are violated
Architecting for resiliency - informing architectural design decisions to build more reliable systems
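
As an illustration of the SLO point above, tracking an error budget is one common way to catch problems before an SLA is breached. The following is a minimal sketch only: the `ErrorBudget` class, the 99.9% target, and the 80% alert threshold are hypothetical values chosen for the example and do not describe Pluxee's actual tooling or targets.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for a single SLO."""

    slo_target: float      # assumed target, e.g. 0.999 means 99.9% of requests succeed
    total_requests: int    # requests observed in the SLO window
    failed_requests: int   # requests that violated the SLO in the same window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests that may fail without breaking the SLO.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def consumed_fraction(self) -> float:
        # Fraction of the budget already spent; values >= 1.0 mean the SLO is violated.
        allowed = self.allowed_failures
        if allowed == 0:
            return float("inf") if self.failed_requests else 0.0
        return self.failed_requests / allowed

    def should_alert(self, threshold: float = 0.8) -> bool:
        # Alert before the budget is exhausted (threshold is an assumed value).
        return self.consumed_fraction >= threshold


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=900)
    print(f"budget consumed: {budget.consumed_fraction:.0%}")
    print("alert" if budget.should_alert() else "ok")
```

Alerting on the fraction of budget consumed, rather than on raw error counts, leaves time to remediate while the service is still inside its SLO, which in turn is normally set stricter than the contractual SLA.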

The SRE and Incident Manager - LATAM will have the following responsibilities during the lifecycle of an incident or crisis:
Review and maintain the incident and crisis management processes, escalation paths, and communication plans.
Review and maintain an incident and crisis response plan, outlining steps, procedures, and roles during incidents.
Identify and log incidents promptly, using monitoring tools, alerts, and reports.
Ensure incidents are prioritized according to predefined criteria, allocating appropriate resources based on the incident's impact.
Coordinate and communicate effectively with technical teams, support staff, and stakeholders throughout the incident lifecycle.
Initiate escalation processes when incidents cannot be resolved within predefined timeframes or by initial response teams.
Lead or participate in incident investigations and root cause analysis, identifying underlying causes and recommending preventive measures.
Maintain accurate records of incidents, including timelines, actions taken, and resolutions.
Lead or participate in the preparation of incident reports and root cause analysis reports.
Share relevant reports with stakeholders, highlighting trends, recurring issues, and improvement opportunities.
Drive continuous improvement of the incident management process by analysing incident data, identifying patterns, and suggesting enhancements.

You’re a match:

The Incident Manager - LATAM will be responsible for the following technical activities:
Review and track patch management activities to ensure that applications and services are up to date with the latest security patches, bug fixes, and upgrades.
Coordinate and schedule regular maintenance windows for applying patches or implementing upgrades, minimizing disruption to users.
Track the support lifecycle of all configuration items supporting the systems and services under his/her responsibility.
Collaborate with other stakeholders to ensure timely upgrades of software and services so that they continue to receive the proper support and service levels.
Track capacity and utilization of resources and collaborate with other stakeholders to ensure services remain available, even under high loads.
Regularly review monitoring data with Tech Leads and DevOps to identify performance issues, capacity bottlenecks, or potential risks.
Proactively propose new monitors and adjustments to monitoring thresholds, aiming to detect events and incidents as early as possible.
Review and maintain process documentation, including support procedures, runbooks, knowledge articles, and FAQs (Frequently Asked Questions), for all systems and services under his/her responsibility.
Review contracts from vendors and service providers to ensure they fully support the SLAs (Service Level Agreements) agreed with internal clients.
Propose architectural changes to increase system resiliency, based on insights and experience gained from incidents and maintenance activities.
Conduct training sessions, workshops, or awareness programs to educate staff and countries on the support model, incident response procedures, and best practices.
Be the main point of contact from Run & Operations in matters related to the Release Management process that may affect service availability.

SKILLS REQUIRED:

Soft Skills: Fluency in English and Spanish (mandatory); Portuguese is a plus. Assertive communication, Leadership, Problem-Solving, Stress Management, Collaboration and Teamwork, Conflict Resolution, Continuous Learning and Adaptability.

Knowledge / Experience: Technical Knowledge, Incident Management Tools, Troubleshooting and Root Cause Analysis, Application Stacks Knowledge, Incident Response and Recovery, IT Infrastructure and Networks, Data Analysis and Reporting, IT Security and Compliance.

To get this challenge:

Video call discussion with TA Expert.
Video call discussion with Hiring Manager.
Video call discussion with Operation Manager.
Video call discussion with HRBP.

Your team

IT TEAM

Your Location:

Preferred: Brazil / Chile (Latin America)
