Site Reliability Engineer
4 days ago
About Us
At MetaCTO, we specialize in helping startups and growing companies turn visionary ideas into successful digital products through expert app development and fractional CTO services. As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, scalability, and security of the backend infrastructure that powers innovative applications for our clients. This role will involve managing cloud environments, optimizing databases, automating deployments, and improving system observability.
Job Description
As a Site Reliability Engineer (SRE) at MetaCTO, you will be responsible for designing, implementing, and maintaining highly available, scalable, and secure infrastructure solutions. You will collaborate with software engineers to improve system performance, automate operations, and ensure the smooth functioning of critical backend services. You’ll work extensively with cloud platforms like AWS, leveraging technologies such as Terraform, Docker, Kubernetes, and CI/CD pipelines to enhance system reliability.
Responsibilities
- Architect, build, and maintain cloud infrastructure on AWS (Lambda, EC2, RDS, S3, EKS, SQS, CloudWatch).
- Manage and optimize databases (MySQL, PostgreSQL) for performance, reliability, and security.
- Implement monitoring, alerting, and logging solutions, specifically with Zabbix and Elastic Logging, to ensure system health and performance.
- Design and maintain CI/CD pipelines for automated deployment and scaling of applications.
- Work with containerization and orchestration tools such as Docker and Kubernetes.
- Develop and enforce security best practices for cloud environments and infrastructure.
- Automate operational processes using Infrastructure-as-Code (Terraform, CloudFormation) and scripting languages like Python or Bash.
- Troubleshoot and resolve infrastructure-related incidents and optimize system performance.
- Collaborate with backend engineers to ensure high availability, fault tolerance, and scalable system design, with a strong focus on Django-based applications.
Qualifications
- 5-10 years of experience in Site Reliability Engineering (SRE), DevOps, or Cloud Engineering roles.
- Strong expertise in AWS cloud services (EC2, RDS, S3, Lambda, CloudFront, EKS, SQS, IAM).
- Hands-on experience with containerization (Docker) and orchestration (Kubernetes, ECS, or EKS).
- Deep knowledge of relational databases (MySQL, PostgreSQL), including performance tuning, query optimization, monitoring, and migration management.
- Proficiency in Infrastructure-as-Code tools such as Terraform, CloudFormation, or Pulumi.
- Strong experience with CI/CD pipelines and automation tools (GitHub Actions, Jenkins, CircleCI, or GitLab CI/CD).
- Proficiency in monitoring tools, specifically Zabbix, and logging solutions like Elastic Logging.
- Scripting experience with Python, Bash, or Go for automating operational tasks.
- Experience working with Django-based applications in a cloud environment.
- Experience implementing security best practices for cloud-based applications.
- Knowledge of distributed systems and microservices architecture.
Preferred Skills
- AWS certifications (Solutions Architect, DevOps Engineer) are a plus.
- Experience with serverless computing and event-driven architectures.
- Familiarity with message queue services (SQS, RabbitMQ, Kafka).
- Understanding of zero-downtime deployments and disaster recovery strategies.
Position Details
- Type: Full-Time
- Location: 100% Remote
- Hours: US Pacific Time
How to Apply
If you are passionate about scalability, automation, and reliability, and thrive in a collaborative, fast-paced environment, we’d love to hear from you. Please submit your resume and an optional brief cover letter outlining your relevant experience.
MetaCTO is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.