
Senior Site Reliability Engineer
3 weeks ago
About Us
Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. Our platform empowers organizations to leverage the power of artificial intelligence in a reliable, scalable, and secure environment.
Position Overview
We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform. As an SRE, you will bridge the gap between development and operations, implementing automation and best practices to maintain our service reliability objectives while supporting rapid innovation.
Key Responsibilities
- Architect and maintain scalable, highly available infrastructure for our GenAI platform.
- Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.
- Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.
- Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.
- Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.
- Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.
- Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.
- Optimize infrastructure for performance, scalability, and cost-effectiveness—especially for high-demand AI workloads.
- Implement and enforce security best practices across all systems and environments.
- Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.
Required
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
- 5+ years of experience in DevOps, SRE, or similar roles
- Strong experience with cloud platforms (AWS, GCP, or Azure)
- Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)
- Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)
- Solid background in containerization technologies (Docker, Kubernetes)
- Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)
- Strong understanding of CI/CD pipelines and automation
- Exceptional troubleshooting and problem-solving skills, including the ability to debug complex systems
- Experience supporting AI/ML systems in production
- Knowledge of GPU infrastructure management and optimization
- Familiarity with distributed systems and high-performance computing
- Experience with database systems (SQL and NoSQL)
- Certifications in cloud platforms (AWS, GCP, Azure)
- Experience with chaos engineering and resilience testing
- Knowledge of security best practices and compliance requirements
Seniority level
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Engineering and Information Technology
- Industries: Technology, Information and Internet
Front-end Developer - Remote
Joinville, Santa Catarina, Brazil · 4 months ago
-
Senior DevOps Site Reliability Engineer
3 weeks ago
Brazil · Housecall Pro · Full-time
TO BE CONSIDERED FOR THIS ROLE, PLEASE SUBMIT AN UPDATED RESUME TRANSLATED TO ENGLISH. Who is Housecall Pro? Housecall Pro is...
-
Senior Site Reliability Engineer
3 weeks ago
Brazil · DuckDuckGo · Full-time
Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...
-
Site Reliability Engineer
2 weeks ago
Brazil · Aubay Portugal · Full-time
Aubay Portugal is a multinational French company, present in Portugal since 2007. We have offices in Lisbon and Oporto and we are a specialized consultant in Management, Implementation, Development and Maintenance of Information Systems. We have more than 150 active partners and we operate in sectors such as banking, insurance, telecommunications, services, energy...
-
Site Reliability Engineer
2 weeks ago
Brazil · Seedify · Full-time · US$90,000 - US$120,000 per year
Seedify is a leading cryptocurrency launchpad platform dedicated to fostering innovation and success in the Web3 space. Our mission is to identify and assist promising teams and projects and offer outstanding returns to our investor base. Job Description: We are seeking a highly skilled Site Reliability Engineer with extensive experience in DevOps,...
-
Site Reliability Engineer
7 days ago
Brazil · Kraken · Full-time
Overview: Site Reliability Engineer - Data Platform role at Kraken. Join our Data Infrastructure team to uphold the reliability, scalability, and efficiency of our data platform. Responsibilities: Design the data governance mechanisms that ensure our lakehouse is easy to interact with, secure, and in compliance with all applicable regulations. Implement the...
-
Site Reliability Engineer
3 weeks ago
Brazil · Parfin · Full-time
About Parfin: Parfin is the leading web3 infrastructure provider in Latin America. We offer institutions an end-to-end solution for digital asset custody, trading, tokenization, and management. Our clients include some of the largest banks and crypto-native companies in Latin America. We accelerate institutional adoption of web3 by creating solutions that...
-
Site Reliability Engineer
3 weeks ago
Brazil · Aubay Portugal · Full-time
Site Reliability Engineer - Relocation to Portugal. Aubay Portugal is a multinational French company, with offices in Lisbon and Oporto. We are a specialized consultant in Management, Implementation, Development and Maintenance of Information Systems, and we operate in sectors such as banking, insurance, telecommunications, services, energy and transport...
-
Site Reliability Engineer
3 weeks ago
Brazil · Bernoulli Educação · Full-time
If your eyes light up, come be part of Bernoulli. We are made of people who believe in the transformative power of education: creative, determined people who enjoy learning. Professionals who see the...
-
Site Reliability Engineer
3 weeks ago
Brazil · Pythian · Full-time
Site Reliability Engineer | Multiple timezones available | Remote | Work from Home. Why Pythian: At Pythian, we are experts in strategic database and analytics services, driving digital transformation and operational excellence. Pythian, a multinational company, was founded in 1997 and started by ensuring the reliability and performance of mission-critical...
-
Senior Site Reliability Engineer
2 weeks ago
Brazil · Articul8 · Full-time · US$120,000 - US$200,000 per year
About Us: Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. Our platform empowers organizations to leverage the power of artificial intelligence in a reliable, scalable, and secure environment. Position Overview: We are seeking an experienced Site Reliability Engineer (SRE) to...