SRE Architect

4 weeks ago

Blumenau, Santa Catarina, Brazil · EPAM Systems · Full-time

Overview

We are seeking a highly skilled Site Reliability Engineer/Architect (SRE) to join our innovative and fast-paced team.

In this role, you will be responsible for designing and implementing modern SRE practices to enhance the reliability and scalability of our enterprise-grade Generative AI (GenAI) integration platform. You will play a vital role in driving operational excellence by adopting advanced methodologies and tools while collaborating with key stakeholders across technical and business units.

Responsibilities

Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to establish reliability standards and monitor system health
Architect resilient production systems using methodologies like canary deployments, shadow traffic, and testing-in-production
Develop incident management strategies and automate on-call operations to minimize downtime and improve system stability
Enhance observability frameworks with logging, tracing, and monitoring for real-time visibility and proactive troubleshooting
Automate tasks related to scalability, performance optimization, and operational processes for improved efficiency
Collaborate with engineering teams to integrate SRE principles into system design and development
Provide strategic leadership for implementing site reliability solutions in multi-cloud, multi-tenant environments for enterprise applications
Advise executive stakeholders with insights and recommendations to align SRE strategies with organizational goals
Promote a culture of innovation and operational reliability through mentoring and industry-leading best practices
Ensure the platform's infrastructure supports high availability and scalability in partnership with architecture and DevOps teams
Drive continuous improvement by identifying opportunities for process innovation and optimization

Requirements

10+ years of professional experience in SRE, DevOps, or related areas, including managing production systems
Expertise in SRE practices such as SLOs, SLIs, canary testing, and incident management
Proficiency with cloud technologies like AWS, Google Cloud Platform, or Azure, with hands-on experience in multi-cloud setups
Background in observability tools such as Prometheus, Grafana, or ELK Stack, as well as monitoring distributed systems
Skills in automation platforms such as Terraform, Ansible, or Kubernetes, enabling infrastructure-as-code adoption
Familiarity with programming languages like Python, Go, or Bash for building automation solutions
Strong understanding of CI/CD pipelines, containerization technologies, and orchestration frameworks
Competency in system architecture for fault tolerance, redundancy, and performance optimization
History of collaborating effectively with diverse stakeholders, from technical teams to executive management
Background in managing enterprise-scale systems and multi-tenant platform deployments

Nice to have

Knowledge of Generative AI platforms and integration techniques
Understanding of managed database services, including Amazon RDS, Google Spanner, or Azure SQL
Familiarity with security practices for enterprise platforms and multi-cloud infrastructures
Background in contributing to technical roadmaps for distributed systems at scale
Capability to lead initiatives involving Chaos Engineering or disaster recovery strategies

We offer

International projects with top brands
Work with global teams of highly skilled, diverse peers
Employee financial programs
Paid time off and sick leave
Upskilling, reskilling and certification courses
Unlimited access to the LinkedIn Learning library and 22,000+ courses
Global career opportunities
Volunteer and community involvement opportunities
EPAM Employee Groups
Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Information Technology, Engineering, and Business Development

Industries

Software Development, IT Services and IT Consulting, and Venture Capital and Private Equity Principals


Cloud Infrastructure Engineer

1 week ago

Blumenau, Santa Catarina, Brazil · Pulsati · Full-time · R$80,000 - R$120,000 per year

Come join the Pulsati team! We develop technology solutions for healthcare companies of all sizes and segments. PULSATI's key ingredient is investing in innovation-minded people. Together, we have more than 30 years of experience in healthcare. Since then, we have been joining forces for a purpose...

DevSecOps Analyst (SRE)

1 week ago

Blumenau, Brazil · WIIPO LAB · Full-time

Our purpose is to offer companies and people freedom and power of choice, combined with the experience of a technology company. Having financial services and flexible benefits intelligently connected is what drives us. Come aboard this rocket with us: be a #Wiiper! **ABOUT THE OPPORTUNITY**: At Wiipo, we are a fintech that...

Cloud Infrastructure Engineer | DevOps

3 days ago

Blumenau, Brazil · Pulsati · Full-time

Cloud Infrastructure Engineer | DevOps. Come join the Pulsati team! We develop technology solutions for healthcare companies of all sizes and segments. PULSATI's key ingredient is investing in innovation-minded people. Together, we have more than 30 years of experience in healthcare. We have,...

Systems Architect (Mid-level)

2 weeks ago

Blumenau, Brazil · Runtalent · Full-time

Overview: We are @Runtalent. With innovative DNA, we are well established in the technology market and have specialized in IT solutions for almost two decades. We have kept pace with every technological advance of recent years and are in this race for digital transformation together. We have an opening for: Systems Architect, Mid-level (remote). Come and learn more...
