SRE Architect

6 days ago


Caxias do Sul, Brazil EPAM Systems Full-time

Overview

We are seeking a highly skilled Site Reliability Engineer/Architect (SRE) to join our innovative and fast-paced team. In this role, you will be responsible for designing and implementing modern SRE practices to enhance the reliability and scalability of our enterprise-grade Generative AI (GenAI) integration platform. You will play a vital role in driving operational excellence by adopting advanced methodologies and tools while collaborating with key stakeholders across technical and business units.

Responsibilities

  • Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to establish reliability standards and monitor system health
  • Architect resilient production systems using methodologies like canary deployments, shadow traffic, and testing-in-production
  • Develop incident management strategies and automate on-call operations to minimize downtime and improve system stability
  • Enhance observability frameworks with logging, tracing, and monitoring for real-time visibility and proactive troubleshooting
  • Automate tasks related to scalability, performance optimization, and operational processes for improved efficiency
  • Collaborate with engineering teams to integrate SRE principles into system design and development
  • Provide strategic leadership for implementing site reliability solutions in multi-cloud, multi-tenant environments for enterprise applications
  • Advise executive stakeholders with insights and recommendations to align SRE strategies with organizational goals
  • Promote a culture of innovation and operational reliability through mentoring and industry-leading best practices
  • Ensure the platform's infrastructure supports high availability and scalability in partnership with architecture and DevOps teams
  • Drive continuous improvement by identifying opportunities for process innovation and optimization

Requirements

  • 10+ years of professional experience in SRE, DevOps, or related areas, including managing production systems
  • Expertise in SRE practices such as SLOs, SLIs, canary testing, and incident management
  • Proficiency with cloud technologies like AWS, Google Cloud Platform, or Azure, with hands-on experience in multi-cloud setups
  • Background in observability tools such as Prometheus, Grafana, or ELK Stack, as well as monitoring distributed systems
  • Skills in automation platforms such as Terraform, Ansible, or Kubernetes, enabling infrastructure-as-code adoption
  • Familiarity with programming languages like Python, Go, or Bash for building automation solutions
  • Strong understanding of CI/CD pipelines, containerization technologies, and orchestration frameworks
  • Competency in system architecture for fault tolerance, redundancy, and performance optimization
  • History of collaborating effectively with diverse stakeholders, from technical teams to executive management
  • Background in managing enterprise-scale systems and multi-tenant platform deployments

Nice to have

  • Knowledge of Generative AI platforms and integration techniques
  • Understanding of managed database services, including Amazon RDS, Google Spanner, or Azure SQL
  • Familiarity with security practices for enterprise platforms and multi-cloud infrastructures
  • Background in contributing to technical roadmaps for distributed systems at scale
  • Capability to lead initiatives involving Chaos Engineering or disaster recovery strategies

We offer

  • International projects with top brands
  • Work with global teams of highly skilled, diverse peers
  • Employee financial programs
  • Paid time off and sick leave
  • Upskilling, reskilling and certification courses
  • Unlimited access to the LinkedIn Learning library and 22,000+ courses
  • Global career opportunities
  • Volunteer and community involvement opportunities
  • EPAM Employee Groups
  • Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology, Engineering, and Business Development
Industries: Software Development, IT Services and IT Consulting, and Venture Capital and Private Equity Principals


  • Senior DevOps Engineer

    3 weeks ago


    Caxias do Sul, Brazil Wizdaa Full-time

    Role Overview Lead development of internal Kubernetes platform enabling scalable application deployment through GitOps. Engineer solutions for deployment complexity, database migrations, multi-environment management, and developer productivity. Drive DevOps practices including CI/CD automation, infrastructure operations, system reliability, and cross-team...


  • Jaraguá do Sul, Brazil Sankhya Gestão de Negócios Full-time

    We are looking for a Senior Cloud Operations Engineer (AWS) to join our Cloud and Operations team. This professional will be key to sustaining, automating, and evolving our cloud infrastructure, ensuring high-performing, secure, and highly available environments to support the scalability of our SaaS products. MISSION: As part...


  • Jaraguá do Sul, Brazil Sankhya Gestão De Negócios Full-time

    We are looking for a Mid-level Cloud Operations Engineer (AWS) to join our Cloud and Operations team. This professional will be key to sustaining, automating, and evolving our cloud infrastructure, ensuring high-performing, secure, and highly available environments to support the scalability of our SaaS products. MISSION: As part of...

  • Solutions Architect I

    2 weeks ago


    Caxias do Sul, Brazil Randoncorp Full-time

    Do you have experience in solution architecture and want a broader role, influencing decisions, evaluating technologies, and ensuring that solutions are sustainable, scalable, and aligned with the business? Do you enjoy combining systemic vision, technical skill, and pragmatism to support leaders and teams in decision-making? This position may be the...

  • Solutions Architect

    4 weeks ago


    São Bernardo do Campo, Brazil Netvagas Full-time

    Apply quickly by email: Requirements and qualifications: Experience in the banking or financial sector. Solid experience with AWS (ECS, Lambda, API Gateway, DynamoDB, RDS, S3, VPC). Strong knowledge of software engineering, including design patterns, DDD, and clean architecture. Development in Node.js and TypeScript with a focus on architectures...

  • Senior Java Support

    2 weeks ago


    São Bernardo do Campo, Brazil DaCodes. Full-time

    We are a high-impact software and digital transformation expert firm with over 10 years of experience, creating innovative solutions for clients in LATAM and the United States. Our team of more than 220 talented #DaCoders—including developers, architects, UX/UI designers, PMs, QA testers, and more—collaborates on diverse projects across various...
