
SRE Architect
5 days ago
We are seeking a highly skilled Site Reliability Engineer/Architect (SRE) to join our innovative and fast-paced team.
In this role, you will be responsible for designing and implementing modern SRE practices to enhance the reliability and scalability of our enterprise-grade Generative AI (GenAI) integration platform. You will play a vital role in driving operational excellence by adopting advanced methodologies and tools while collaborating with key stakeholders across technical and business units.
Responsibilities
- Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to establish reliability standards and monitor system health
- Architect resilient production systems using methodologies like canary deployments, shadow traffic, and testing-in-production
- Develop incident management strategies and automate on-call operations to minimize downtime and improve system stability
- Enhance observability frameworks with logging, tracing, and monitoring for real-time visibility and proactive troubleshooting
- Automate tasks related to scalability, performance optimization, and operational processes for improved efficiency
- Collaborate with engineering teams to integrate SRE principles into system design and development
- Provide strategic leadership for implementing site reliability solutions in multi-cloud, multi-tenant environments for enterprise applications
- Advise executive stakeholders with insights and recommendations to align SRE strategies with organizational goals
- Promote a culture of innovation and operational reliability through mentoring and industry-leading best practices
- Ensure the platform's infrastructure supports high availability and scalability in partnership with architecture and DevOps teams
- Drive continuous improvement by identifying opportunities for process innovation and optimization
Requirements
- 10+ years of professional experience in SRE, DevOps, or related areas, including managing production systems
- Expertise in SRE practices such as SLOs, SLIs, canary testing, and incident management
- Proficiency with cloud technologies like AWS, Google Cloud Platform, or Azure, with hands-on experience in multi-cloud setups
- Background in observability tools such as Prometheus, Grafana, or ELK Stack, as well as monitoring distributed systems
- Skills in automation platforms such as Terraform, Ansible, or Kubernetes, enabling infrastructure-as-code adoption
- Familiarity with programming languages like Python, Go, or Bash for building automation solutions
- Strong understanding of CI/CD pipelines, containerization technologies, and orchestration frameworks
- Competency in system architecture for fault tolerance, redundancy, and performance optimization
- History of collaborating effectively with diverse stakeholders, from technical teams to executive management
- Background in managing enterprise-scale systems and multi-tenant platform deployments
- Knowledge of Generative AI platforms and integration techniques
- Understanding of managed database services, including Amazon RDS, Google Spanner, or Azure SQL
- Familiarity with security practices for enterprise platforms and multi-cloud infrastructures
- Background in contributing to technical roadmaps for distributed systems at scale
- Capability to lead initiatives involving Chaos Engineering or disaster recovery strategies
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Information Technology, Engineering, and Business Development
- Industries: Software Development, IT Services and IT Consulting, and Venture Capital and Private Equity Principals
-
Cloud Platform Architect
1 week ago
Brazil, beBeeInfrastructure, Full-time, US$160,000 - US$190,000
Senior Infrastructure Engineer
We are seeking a seasoned Senior Infrastructure Engineer to join our infrastructure team in managing and modernizing our hybrid cloud environment. This position requires an experienced engineer with deep expertise in VMware, Hyper-V, and Azure cloud platforms.
About the Role
You will play a critical role in infrastructure projects...
-
DevOps Engineer
3 weeks ago
Brazil, ITPROTECH, Full-time
Our client is seeking an SRE/DevOps Engineer to design, build, and maintain the scalable, reliable, and high-performance infrastructure that powers our client's core services. You'll be working for a US-based Internet Service Provider.
Key Responsibilities:
Architect, deploy, and manage robust cloud infrastructure on AWS and/or GCP, ensuring high...
-
Reliable System Architect
7 days ago
Brazil, beBeeDevOps, Full-time, US$90,000 - US$120,000
Job Description
We're seeking a highly skilled DevOps Engineer to join our global team. As a key member of our SRE team, you will be responsible for designing and optimizing architectures for reliability and scalability.
Our stack is heavy on EKS and MongoDB/Atlas, and you'll be tackling database contention, scaling challenges, and complex deployments every day....
-
Platform Architect
1 week ago
Brazil, beBeeDevops, Full-time, US$90,000 - US$130,000
Transforming IT Operations and Software Delivery
You'll lead the design and presentation of innovative solutions to enhance IT operations and software delivery. Working closely with sales teams and clients, you'll translate challenges into modern DevOps architectures using cloud, IaC, CI/CD, and containerization.
The ideal candidate will have strong knowledge...
-
Enterprise System Reliability Engineer
5 days ago
Brazil, beBeeReliability, Full-time, US$120,000 - US$150,000
Site Reliability Engineer
We are seeking an experienced engineer to lead our SRE initiatives and ensure the high availability of our enterprise-grade systems.
This role involves designing and implementing modern SRE practices, adopting advanced methodologies, and collaborating with technical stakeholders to drive operational excellence. The ideal candidate...
-
Ruby on Rails Architect
3 weeks ago
Brazil, Housecall Pro, Full-time
To be considered for this role, please submit an updated resume translated to English.
Who is Housecall...
-
Ruby on Rails Architect
2 days ago
Brazil, Housecall Pro, Full-time
To be considered for this role, please submit an updated resume translated to English.
Who is Housecall...
-
DevOps Engineer
4 weeks ago
Brazil, ITPROTECH, Full-time
Our client is seeking an SRE/DevOps Engineer to design, build, and maintain the scalable, reliable, and high-performance infrastructure that powers our client's core services. You'll be working for a US-based Internet Service Provider.
Key Responsibilities:
Architect, deploy, and manage robust cloud infrastructure on AWS and/or GCP, ensuring high...
-
Cloud Operations Engineer
6 days ago
Brazil, beBeeOperacional, Full-time, R$80,000 - R$108,000
Cloud Operations Engineer
We are looking for someone to join our cloud and operations team. This person will be key to maintaining, automating, and evolving our cloud infrastructure, ensuring operational, secure, and highly available environments.
Mission:
As part of the cloud engineering and SRE structure, your...