Site Reliability Engineer

4 weeks ago


Várzea Grande, Brazil | MetaCTO | Full-time

About Us

At MetaCTO, we specialize in helping startups and growing companies turn visionary ideas into successful digital products through expert app development and fractional CTO services. As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, scalability, and security of the backend infrastructure that powers innovative applications for our clients. This role will involve managing cloud environments, optimizing databases, automating deployments, and improving system observability.

Job Description

As a Site Reliability Engineer (SRE) at MetaCTO, you will be responsible for designing, implementing, and maintaining highly available, scalable, and secure infrastructure solutions. You will collaborate with software engineers to improve system performance, automate operations, and ensure the smooth functioning of critical backend services. You'll work extensively with cloud platforms like AWS, leveraging technologies such as Terraform, Docker, Kubernetes, and CI/CD pipelines to enhance system reliability.

Responsibilities

- Architect, build, and maintain cloud infrastructure on AWS (Lambda, EC2, RDS, S3, EKS, SQS, CloudWatch).
- Manage and optimize databases (MySQL, PostgreSQL) for performance, reliability, and security.
- Implement monitoring, alerting, and logging solutions to ensure system health and performance, specifically using Zabbix and Elastic Logging.
- Design and maintain CI/CD pipelines for automated deployment and scaling of applications.
- Work with containerization and orchestration tools such as Docker and Kubernetes.
- Develop and enforce security best practices for cloud environments and infrastructure.
- Automate operational processes using Infrastructure-as-Code (Terraform, CloudFormation) and scripting languages like Python or Bash.
- Troubleshoot and resolve infrastructure-related incidents and optimize system performance.
- Collaborate with backend engineers to ensure high availability, fault tolerance, and scalable system design, with a strong focus on Django-based applications.

Qualifications

- 5-10 years of experience in Site Reliability Engineering (SRE), DevOps, or Cloud Engineering roles.
- Strong expertise in AWS cloud services (EC2, RDS, S3, Lambda, CloudFront, EKS, SQS, IAM).
- Hands-on experience with containerization (Docker) and orchestration (Kubernetes, ECS, or EKS).
- Deep knowledge of relational databases (MySQL, PostgreSQL), including performance tuning, query optimization, monitoring, and migration management.
- Proficiency in Infrastructure-as-Code tools such as Terraform, CloudFormation, or Pulumi.
- Strong experience with CI/CD pipelines and automation tools (GitHub Actions, Jenkins, CircleCI, or GitLab CI/CD).
- Proficiency with monitoring tools, specifically Zabbix, and logging solutions such as Elastic Logging.
- Scripting experience with Python, Bash, or Go for automating operational tasks.
- Experience working with Django-based applications in a cloud environment.
- Experience implementing security best practices for cloud-based applications.
- Knowledge of distributed systems and microservices architecture.

Preferred Skills

- AWS certifications (Solutions Architect, DevOps Engineer).
- Experience with serverless computing and event-driven architectures.
- Familiarity with message queue services (SQS, RabbitMQ, Kafka).
- Understanding of zero-downtime deployments and disaster recovery strategies.

Position Details

- Type: Full-Time
- Location: 100% Remote
- Hours: US Pacific Time

How to Apply

If you are passionate about scalability, automation, and reliability, and thrive in a collaborative, fast-paced environment, we'd love to hear from you. Please submit your resume and an optional brief cover letter outlining your relevant experience.

MetaCTO is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

