Site Reliability Engineer
2 weeks ago
Where elite tech talent meets world-class opportunities
At Xenon7, we work with leading enterprises and innovative startups on exciting, cutting-edge projects that leverage the latest technologies across various domains of IT including Data, Web, Infrastructure, AI, and many others. Our expertise in IT solutions development and on-demand resources allows us to partner with clients on transformative initiatives, driving innovation and business growth. Whether it's empowering global organizations or collaborating with trailblazing startups, we are committed to delivering advanced, impactful solutions that meet today's most complex challenges.
About the Client:
Join one of Egypt's premier financial institutions, renowned for its extensive suite of banking services, including Institutional Banking, Personal Banking, and Islamic Banking. With a global presence through over 50 branches and correspondents, we serve a diverse and dynamic clientele. As we embark on a groundbreaking digital transformation journey, we are committed to leveraging the latest technologies to establish a state-of-the-art data architecture that will redefine our performance and service delivery.
Position Overview
The Site Reliability Engineer (SRE) is responsible for ensuring the stability, performance, and reliability of the Bank's critical applications, particularly the Mobile Banking and Internet Banking platforms. This role bridges development and operations teams, implementing automation solutions, monitoring system health, and providing 24/7 operational support to maintain seamless banking services for customers on on-premise infrastructure.
Key Responsibilities
· Monitor and maintain the reliability and performance of Mobile Banking and Internet Banking applications using Prometheus and Grafana dashboards
· Manage and support OpenShift/Kubernetes infrastructure for containerized banking applications on on-premise servers
· Respond to and resolve production incidents with minimal mean time to resolution (MTTR)
· Implement and maintain centralized logging solutions using ELK Stack (Elasticsearch, Logstash, Kibana) for application troubleshooting
· Develop and execute runbooks and automation scripts to reduce manual operational toil in OpenShift environments
· Provide 24/7 production support and on-call rotation for critical banking services
· Analyze logs and metrics from Prometheus and the ELK Stack to identify performance bottlenecks and reliability issues
· Conduct root cause analysis (RCA) on incidents and implement preventive measures
· Optimize Kubernetes/OpenShift deployments, pod management, and resource allocation on-premise
· Implement alerting strategies and threshold management in Prometheus and Grafana
· Support infrastructure scaling, capacity planning, and load balancing in production environments
· Implement security best practices and compliance requirements for financial systems in containerized environments
· Manage on-premise data center infrastructure and server resources
· Document operational procedures and troubleshooting guides, and create knowledge base articles
Qualifications
· BSc in Computer Science, Information Technology, Software Engineering, or related field
· years of hands-on experience in SRE, DevOps, or Production Engineering roles
· Hands-on experience supporting production applications in Kubernetes/OpenShift environments
· Strong experience with OpenShift container platform administration and troubleshooting on on-premise infrastructure
· Proficiency with Prometheus for metrics collection and monitoring
· Proficiency with Grafana for dashboard creation and visualization
· Experience with ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging
· Strong understanding of Linux/Unix operating systems and networking fundamentals
· Practical experience with CI/CD tools and automation frameworks
· Proficiency in at least one programming/scripting language (Python, Go, or Bash)
· Experience with database management (SQL and NoSQL) on-premise
· Excellent troubleshooting and analytical skills for production support
· Strong communication skills and ability to work in cross-functional teams
· Experience in 24/7 production support environments
· Experience with on-premise data center infrastructure management
· Previous experience in financial services or banking sector is a plus
-
Senior Site Reliability Engineer
2 weeks ago
Remote, Brazil Swile Full-time
At Swile, we believe that good products can help reduce friction in daily professional life and boost employee satisfaction. Today, we provide innovative solutions in various areas such as Fintech, Travel, HR, and Employee Benefits to more than 5.5 million users in 85,000 companies in France and Brazil. Your role as a Senior Site Reliability Engineer (SRE)...
-
Site Reliability Engineer
6 days ago
Brazil Conquest One Full-time
Position: Senior SRE. Conversational English is essential. Hybrid: on-site 2x per week in Jardim Paulista (Av. Nove de Julho, São Paulo/SP) + 3x per week working from home. Contract type: CLT. Working hours: 09:00 to 18:00. We are looking for a Senior Site Reliability Engineer to play a strategic role in the transformation...
-
Senior Site Reliability Engineer
2 weeks ago
Brazil YAPP Full-time
Getrak, a leader in SaaS platforms for vehicle tracking, monitoring, and security, is looking for a Senior Site Reliability Engineer (SRE) to join the Technology and Product team. Working in a high-scale, mission-critical environment, you will be responsible for ensuring the reliability, availability, and performance of our platform, which...
-
Site reliability engineer
2 weeks ago
Brazil Psm Company Full-time
About the role: PSM Company specializes in identifying talent for IT / Telecom as well as for operational and administrative areas. Our success story is built on a business model that delivers accuracy and quality in the selection process, low turnover, and freedom from risks and liabilities...
-
Site reliability engineer
2 days ago
Brazil WSO2 Full-time
About WSO2
Founded in 2005, WSO2 is the largest independent software vendor providing open-source API management, integration, and identity and access management (IAM) products. WSO2's products and platforms, including our next-gen internal developer platform, Choreo, empower organizations to leverage the full potential of APIs for secure delivery of...
-
Site Reliability Engineer
2 weeks ago
Vitória, Brazil Psm Company Full-time
About the role: PSM Company specializes in identifying talent for IT / Telecom as well as for operational and administrative areas. Our success story is built on a business model that delivers accuracy and quality in the selection process, low turnover, and freedom from risks and liabilities...
-
Senior Site Reliability Engineer
3 weeks ago
Brazil Stone Full-time
Who is Stone Tech? Stone was founded with the purpose of leading the transformation of the payments industry, striving to offer the best solutions for entrepreneurs in Brazil. With that in mind, we built Stone Tech: the union of the technology teams of Stone Co. and the group's financial companies that recognize the entrepreneurial potential of...
-
Senior Site Reliability Engineer
2 weeks ago
Brazil Xenon7 Full-time
About us:
Where elite tech talent meets world-class opportunities
At Xenon7, we work with leading enterprises and innovative startups on exciting, cutting-edge projects that leverage the latest technologies across various domains of IT including Data, Web, Infrastructure, AI, and many others. Our expertise in IT solutions development and on-demand resources...
-
Site Reliability Engineering
1 day ago
Brazil CWI Software Full-time
We are looking for a Senior Site Reliability Engineer (SRE) to work on highly critical, large-scale digital platforms, ensuring the availability, performance, and reliability of production systems. You will work in agile squads, with strong integration between engineering and operations, applying advanced practices in observability, automation, and...
-
Site Reliability Engineering
2 days ago
Brazil CWI Software Full-time
We are looking for a Senior Site Reliability Engineer (SRE) to work on highly critical, large-scale digital platforms, ensuring the availability, performance, and reliability of production systems. You will work in agile squads, with strong integration between engineering and operations, applying advanced practices in observability, automation, and...