Site Reliability Engineer

2 weeks ago


Brazil Xenon7 Full-time
About us:

Where elite tech talent meets world-class opportunities

At Xenon7, we work with leading enterprises and innovative startups on exciting, cutting-edge projects that leverage the latest technologies across various domains of IT including Data, Web, Infrastructure, AI, and many others. Our expertise in IT solutions development and on-demand resources allows us to partner with clients on transformative initiatives, driving innovation and business growth. Whether it's empowering global organizations or collaborating with trailblazing startups, we are committed to delivering advanced, impactful solutions that meet today's most complex challenges.

About the Client:

Join one of Egypt's premier financial institutions, renowned for its extensive suite of banking services, including Institutional Banking, Personal Banking, and Islamic Banking. With a global presence through over 50 branches and correspondents, we serve a diverse and dynamic clientele. As we embark on a groundbreaking digital transformation journey, we are committed to leveraging the latest technologies to establish a state-of-the-art data architecture that will redefine our performance and service delivery.

Position Overview

The Site Reliability Engineer (SRE) is responsible for ensuring the stability, performance, and reliability of the Bank's critical applications, particularly the Mobile Banking and Internet Banking platforms. This role bridges development and operations teams by implementing automation solutions, monitoring system health, and providing 24/7 operational support to maintain seamless banking services for customers, all hosted on on-premise infrastructure.

Key Responsibilities

·       Monitor and maintain the reliability and performance of Mobile Banking and Internet Banking applications using Prometheus and Grafana dashboards

·       Manage and support OpenShift/Kubernetes infrastructure for containerized banking applications on on-premise servers

·       Respond to and resolve production incidents with minimal mean time to resolution (MTTR)

·       Implement and maintain centralized logging solutions using ELK Stack (Elasticsearch, Logstash, Kibana) for application troubleshooting

·       Develop and execute runbooks and automation scripts to reduce manual operational toil in OpenShift environments

·       Provide 24/7 production support and on-call rotation for critical banking services

·       Analyze logs and metrics from Prometheus and the ELK Stack to identify performance bottlenecks and reliability issues

·       Conduct root cause analysis (RCA) on incidents and implement preventive measures

·       Optimize Kubernetes/OpenShift deployments, pod management, and resource allocation on-premise

·       Implement alerting strategies and threshold management in Prometheus and Grafana

·       Support infrastructure scaling, capacity planning, and load balancing in production environments

·       Implement security best practices and compliance requirements for financial systems in containerized environments

·       Manage on-premise data center infrastructure and server resources

·       Document operational procedures and troubleshooting guides, and create knowledge base articles

Qualifications

·       BSc in Computer Science, Information Technology, Software Engineering, or a related field

· years of hands-on experience in SRE, DevOps, or Production Engineering roles

·       Hands-on experience supporting production applications in Kubernetes/OpenShift environments

·       Strong experience with OpenShift container platform administration and troubleshooting on on-premise infrastructure

·       Proficiency with Prometheus for metrics collection and monitoring

·       Proficiency with Grafana for dashboard creation and visualization

·       Experience with ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging

·       Strong understanding of Linux/Unix operating systems and networking fundamentals

·       Practical experience with CI/CD tools and automation frameworks

·       Proficiency in at least one programming/scripting language (Python, Go, or Bash)

·       Experience with database management (SQL and NoSQL) on-premise

·       Excellent troubleshooting and analytical skills for production support

·       Strong communication skills and ability to work in cross-functional teams

·       Experience in 24/7 production support environments

·       Experience with on-premise data center infrastructure management

·       Previous experience in the financial services or banking sector is a plus


  • Site Reliability Engineer

    3 weeks ago


    Brazil Quantum World Technologies Inc. Full-time

    We are seeking a Site Reliability Engineer (SRE) who is passionate about large-scale infrastructure and eager to develop deeper expertise in PostgreSQL. In this role, you will join the Database Engineering organization and help strengthen the reliability, resilience, and automation of our database platform. This position is an excellent fit for an...


  • Remote, Brazil Swile Full-time

    At Swile, we believe that good products can help reduce friction in daily professional life and boost employee satisfaction. Today, we provide innovative solutions in various areas such as Fintech, Travel, HR, and Employee Benefits to more than 5.5 million users in 85,000 companies in France and Brazil. Your role as a Senior Site Reliability Engineer (SRE)...


  • Brazil YAPP Full-time

    Getrak, a leader in SaaS platforms for vehicle tracking, monitoring, and security, is looking for a Senior Site Reliability Engineer (SRE) to join its Technology and Product team. Working in a high-scale, mission-critical environment, you will be responsible for ensuring the reliability, availability, and performance of our platform, which...

  • Site Reliability Engineer Sr

    3 weeks ago


    Brazil Mercado Eletrônico Full-time

    Mercado Eletrônico is the Latin American leader in B2B procurement management solutions. Its technologies and services for procurement departments help companies achieve greater savings, agility, governance, and collaboration. With offices in Brazil, the United States, Mexico, and Portugal, it counts more than 1 million suppliers, 10 thousand...

  • Site Reliability Engineer

    1 week ago


    Índio do Brasil Softensity Inc Full-time

    Summary: We at Softensity are looking for a Site Reliability Engineer (SRE). This is a dynamic and hands-on role within a global, collaborative SRE environment. The SRE Technical Member will contribute to building resilient systems, automating operations, and ensuring the platform meets high standards for performance, reliability, and security. You will be...


  • Brazil Housecall Pro Full-time

    TO BE CONSIDERED FOR THIS ROLE, PLEASE SUBMIT AN UPDATED RESUME TRANSLATED TO ENGLISH. Why Housecall Pro? Help us build solutions that build better lives. At Housecall Pro, we show up to work every day to make a difference for real people: the home service professionals that support America's 100 million homes. We're all about the Pro, and dedicate our days to...


  • Site Reliability Engineer Sr

    2 weeks ago


    Índio do Brasil Mercado Eletrônico Full-time

    Mercado Eletrônico is the Latin American leader in B2B procurement management solutions. Its technologies and services for procurement departments help companies achieve greater savings, agility, governance, and collaboration. With offices in Brazil, the United States, Mexico, and Portugal, it counts more than 1 million suppliers, 10 thousand...


  • Brazil Stone Full-time

    Who is Stone Tech? Stone was founded with the purpose of playing a leading role in transforming the payments industry, striving to offer the best solutions for entrepreneurs in Brazil. With that in mind, we built Stone Tech! The union of the Stone Co. technology teams and the group's financial companies that recognize the entrepreneurial potential...