Senior Site Reliability Engineer
2 weeks ago
Where elite tech talent meets world-class opportunities
At Xenon7, we work with leading enterprises and innovative startups on exciting, cutting-edge projects that leverage the latest technologies across various domains of IT including Data, Web, Infrastructure, AI, and many others. Our expertise in IT solutions development and on-demand resources allows us to partner with clients on transformative initiatives, driving innovation and business growth. Whether it's empowering global organizations or collaborating with trailblazing startups, we are committed to delivering advanced, impactful solutions that meet today's most complex challenges.
About the Client:
Join one of Egypt's premier financial institutions, renowned for its extensive suite of banking services, including Institutional Banking, Personal Banking, and Islamic Banking. With a global presence through over 50 branches and correspondents, we serve a diverse and dynamic clientele. As we embark on a groundbreaking digital transformation journey, we are committed to leveraging the latest technologies to establish a state-of-the-art data architecture that will redefine our performance and service delivery.
Position Overview
The Senior Site Reliability Engineer is a technical leadership role responsible for designing, implementing, and maintaining highly available, scalable, and secure on-premise infrastructure for critical banking applications, including the Mobile Banking and Internet Banking platforms. This role leads SRE initiatives, mentors junior engineers, drives continuous improvement in production support, and leads the observability strategy using OpenShift, Kubernetes, Prometheus, Grafana, and the ELK Stack in an on-premise data center.
Key Responsibilities
· Design and architect highly available and scalable OpenShift/Kubernetes infrastructure for banking applications on on-premise servers
· Lead and implement comprehensive monitoring and observability strategy using Prometheus and Grafana
· Design and oversee centralized logging infrastructure using ELK Stack (Elasticsearch, Logstash, Kibana)
· Lead SRE best practices implementation and adoption of production support standards across teams
· Mentor and coach junior SRE and DevOps engineers on OpenShift, Kubernetes, monitoring, and production support
· Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) with measurable metrics
· Lead incident response strategy, post-incident reviews, and drive continuous improvement in production stability
· Architect and implement advanced alerting, monitoring dashboards, and visualization strategies using Prometheus and Grafana
· Design automation frameworks and tools to reduce operational toil and improve production efficiency
· Lead OpenShift/Kubernetes cluster upgrades, security patches, and infrastructure modernization on-premise
· Establish production support procedures, on-call rotation policies, and escalation frameworks
· Optimize system performance, cost, and resource utilization across containerized on-premise infrastructure
· Conduct capacity planning, performance optimization, and infrastructure scaling initiatives
· Lead technical architecture reviews and infrastructure design decisions for banking applications
· Manage on-premise data center resources and infrastructure planning
· Participate in 24/7 on-call rotation and escalation for critical production incidents
· Ensure compliance, security hardening, and disaster recovery procedures for financial systems
Qualifications
· BSc in Computer Science, Information Technology, Software Engineering, or related field
· years of hands-on SRE, DevOps, or Production Engineering experience
· years of experience leading SRE teams or managing production support operations
· years of hands-on experience managing OpenShift and Kubernetes on on-premise infrastructure
· Expert-level experience with Prometheus for monitoring and alerting in production
· Expert-level experience with Grafana for creating comprehensive monitoring dashboards
· Advanced experience with ELK Stack (Elasticsearch, Logstash, Kibana) for logging and log analysis
· Proven experience designing and scaling production systems for high-traffic banking applications
· Deep expertise in Linux/Unix system administration and container networking
· Advanced knowledge of CI/CD automation and deployment strategies
· Hands-on experience with database management, tuning, and optimization on-premises
· Strong experience with infrastructure automation and Infrastructure as Code
· Proven 24/7 production support experience in mission-critical environments
· Experience managing on-premise data center infrastructure
· Proven leadership skills and ability to mentor junior engineers
· Excellent communication skills and ability to present to executive stakeholders
· Experience in financial services or banking sector is highly preferred
-
Senior Site Reliability Engineer
2 weeks ago
Remote, Brazil Swile Full-time
At Swile, we believe that good products can help reduce friction in daily professional life and boost employee satisfaction. Today, we provide innovative solutions in various areas such as Fintech, Travel, HR, and Employee Benefits to more than 5.5 million users in 85,000 companies in France and Brazil. Your role as a Senior Site Reliability Engineer (SRE)...
-
Site Reliability Engineer
3 weeks ago
Brazil Quantum World Technologies Inc. Full-time
We are seeking a Site Reliability Engineer (SRE) who is passionate about large-scale infrastructure and eager to develop deeper expertise in PostgreSQL. In this role, you will join the Database Engineering organization and help strengthen the reliability, resilience, and automation of our database platform. This position is an excellent fit for an...
-
Senior Site Reliability Engineer
2 weeks ago
Brazil YAPP Full-time
Getrak, a leader in SaaS platforms for vehicle tracking, monitoring, and security, is looking for a Senior Site Reliability Engineer (SRE) to join its Technology and Product team. Working in a high-scale, mission-critical environment, you will be responsible for ensuring the reliability, availability, and performance of our platform, which...
-
Site Reliability Engineer Sr
3 weeks ago
Brazil Mercado Eletrônico Full-time
Mercado Eletrônico is the Latin American leader in B2B procurement management solutions. Its technologies and services for purchasing departments help companies achieve greater savings, agility, governance, and collaboration. With offices in Brazil, the United States, Mexico, and Portugal, it counts more than 1 million suppliers, 10 thousand...
-
Site Reliability Engineer
2 weeks ago
Brazil Xenon7 Full-time
About us:
Where elite tech talent meets world-class opportunities
At Xenon7, we work with leading enterprises and innovative startups on exciting, cutting-edge projects that leverage the latest technologies across various domains of IT including Data, Web, Infrastructure, AI, and many others. Our expertise in IT solutions development and on-demand resources...
-
Site Reliability Engineer
1 week ago
Índio do Brasil Softensity Inc Full-time
Summary
We at Softensity are looking for a Site Reliability Engineer (SRE). This is a dynamic and hands-on role within a global, collaborative SRE environment. The SRE Technical Member will contribute to building resilient systems, automating operations, and ensuring the platform meets high standards for performance, reliability, and security.
You will be...
-
Senior DevOps Engineer
2 weeks ago
Índio do Brasil Edison Smart Full-time
Senior DevOps Engineer | Remote ±3 GMT (Brazil) | €200 per day | 12 months
Role: Senior DevOps Engineer
Location: Remote ±3 GMT (Brazil)
Rate / Salary: €200 per day
Duration: 12 months (extension likely)
Language: English
I'm partnering with a leading consultancy delivering a large-scale Cloud, DevOps & Data transformation for a major global organisation in...
-
Senior Platform Engineer
3 weeks ago
Brazil beBeeEngineering Full-time
Job Title: Senior Platform Engineer
We are looking for an experienced Senior Platform Engineer to join our team. As a key member of our platform engineering team, you will be responsible for designing and optimizing architectures for reliability and scalability.
Key Responsibilities
· Design and optimize architectures for reliability and scalability
· Debug and...
-
Staff Devops Site Reliability Engineer
5 days ago
Brazil Housecall Pro Full-time
TO BE CONSIDERED FOR THIS ROLE, PLEASE SUBMIT AN UPDATED RESUME TRANSLATED TO ENGLISH
Why Housecall Pro?
Help us build solutions that build better lives. At Housecall Pro, we show up to work every day to make a difference for real people: the home service professionals that support America's 100 million homes. We're all about the Pro, and dedicate our days to...