Site Reliability Engineer

7 days ago


Brasília, Brazil AgileEngine Full-time

Job Description

AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.

WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you.

WHAT YOU WILL DO
- Shift: Monday – Thursday, 8AM – 7PM PST (11AM – 10PM EST), with rotating on-call;
- On-call shifts: every 6 weeks, one week as primary responder and the following week as secondary;
- Manage alerts daily, check systems, and escalate issues as needed;
- Be part of a team that provides 24×7 on-call support for critical SaaS events;
- Be available in case of emergencies when team members are not available or need help;
- Document issues and remediation steps;
- Proactively create appropriate monitors in the EKS/K8s ecosystem;
- Deploy to EKS/K8s clusters using Terraform and Helm;
- Learn and maintain existing infrastructure running under Docker Swarm;
- Improve existing infrastructure health by implementing checks and scripts to correct known issues;
- Maintain and develop deployment code;
- Automate manual tasks;
- Implement/integrate new technologies in our Cloud Infrastructure;
- Collaborate with other teams and departments to provide the highest level of support and assistance;
- Apply a real customer focus when planning deployments/updates, keeping the customer front of mind and considering the impact on them before making changes;
- Work closely on solutions with Support, Customer Success, Migration, and Professional Services teams to provide best-in-class SaaS service to our customers;
- Perform RCA and take necessary corrective actions to prevent the recurrence of issues;
- Create and assign alert-related actions to the appropriate team after the investigation;
- Handle support requests for environment-specific actions;
- Identify and provide automation requirements to improve RCA.

MUST HAVES
- 2+ years of professional experience;
- Experience working with Datadog;
- Hands-on experience as an AWS Cloud Engineer;
- Working knowledge of EKS/Terraform/Helm;
- Working experience with Docker and Docker Swarm;
- Good understanding of AWS IAM roles and policies;
- Experience logging and monitoring AWS resources using CloudWatch logs;
- Experience working in a Linux environment;
- Proficient in Bash and/or Python scripting;
- A strong understanding of web technologies such as REST APIs;
- Working experience with monitoring solutions, such as Grafana and Prometheus;
- Excellent oral and written communication skills;
- Customer-facing communication skills to effectively explain issues and RCAs to customers;
- Experience in Product/Application Support for SaaS-based products;
- Understanding of APIs, Databases, Systems Architecture, and Design;
- Designing, implementing, and operating in a DevSecOps environment;
- Ability to work independently as well as within a collaborative environment;
- A technical aptitude with the desire to learn new and evolving technologies;
- Upper-Intermediate English level.

NICE TO HAVES
- Experience with GCP or Azure;
- Certifications: AWS Certified DevOps Engineer – Professional or AWS Certified Advanced Networking – Specialty.

PERKS AND BENEFITS
- Professional growth: Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.
- A selection of exciting projects: Join projects with modern solutions development and top-tier clients that include Fortune 500 enterprises and leading product brands.
- Flextime: Tailor your schedule for an optimal work-life balance, with the option of working from home or going to the office, whatever makes you the happiest and most productive.


  • Senior Site Reliability

    3 weeks ago


    Brasília, Brazil Canonical Full-time

    Senior Site Reliability / GitOps Engineer at Canonical...


  • Brasília, Brazil Aubay Portugal Full-time

    Aubay Portugal is a multinational French company, in Portugal since 2007. We have offices in Lisbon and Oporto and we are a specialized consultant in Management, Implementation, Development and Maintenance of Information Systems. We have more than 150 active partners and we operate in sectors such as banking, insurance, telecommunications, services, energy...

  • Site Reliability Engineer

    4 weeks ago


    Brasília, Brazil Gauge Full-time

    We are a Stefanini Group company. Specializing in digital marketing, we use an integrated approach that combines technology, data intelligence, design, and deep knowledge of consumer behavior. Our focus is on boosting our partners' results, offering solutions that range from strategic consulting to...

  • Site Reliability Engineer

    3 weeks ago


    Brasília, Brazil Gauge Full-time

    We are a Stefanini Group company. Specializing in digital marketing, we use an integrated approach that combines technology, data intelligence, design, and deep knowledge of consumer behavior. Our focus is on boosting our partners' results, offering solutions that range from strategic consulting to...

  • Site Reliability Engineer

    2 weeks ago


    Brasília, Distrito Federal, Brazil Aubay Portugal Full-time R$80,000 - R$120,000 per year

    Aubay Portugal is a multinational French company, in Portugal since 2007. We have offices in Lisbon and Oporto and we are a specialized consultant in Management, Implementation, Development and Maintenance of Information Systems. We have more than 150 active partners and we operate in sectors such as banking, insurance, telecommunications, services, energy...

  • Site Reliability Engineer

    3 weeks ago


    Brasília, Brazil AgileEngine Full-time

    Overview: Site Reliability Engineer (Middle) ID38916. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML,...

  • Site Reliability Engineer

    4 weeks ago


    Brasília, Brazil AgileEngine Full-time

    Overview: Site Reliability Engineer (Middle) ID38916. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML,...

  • Site Reliability Engineer

    2 weeks ago


    Brasília, Brazil Aubay Portugal Full-time

    Site Reliability Engineer - Relocation to Portugal. Aubay Portugal is a multinational French company, in Portugal since 2007. We have offices in Lisbon and Oporto and we are a specialized consultant in Management, Implementation, Development and Maintenance of Information Systems. We have more than 150 active...

  • Site Reliability Engineer

    2 weeks ago


    Brasília, Brazil AgileEngine Full-time

    Overview: Site Reliability Engineer (Middle) ID38916 – AgileEngine. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and startups across 17+ industries. We rank among leaders in application development and AI/ML, and our people-first culture has earned Best Place to Work awards. If you're looking for a place...

  • Site Reliability Engineer

    1 week ago


    Brasília, Brazil AgileEngine Full-time

    Overview: Site Reliability Engineer (Middle) ID38916 – AgileEngine. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and startups across 17+ industries. We rank among leaders in application development and AI/ML, and our people-first culture has earned Best Place to Work awards. If you're looking for a place to...