ML Data Pipeline Engineer
2 days ago
We're seeking a Data Pipeline Engineer to own and evolve our exercise recognition training data infrastructure. You'll manage the end-to-end pipeline that collects, synchronizes, validates, and prepares IMU sensor and video data for ML model training. This role combines systems engineering, data quality automation, and hands-on problem-solving in a production environment.

What You'll Do

Pipeline Operations & Improvement
  Maintain and enhance our multi-source data collection system: IMU sensors (via mobile app) and synchronized video streams from gym-based cameras.
  Improve video capture software robustness, particularly handling network interruptions and operational monitoring.
  Deploy and monitor services in remote Linux environments with appropriate DevOps practices.

Data Quality & Validation
  Evolve our Python-based QC engine that validates data pre- and post-annotation.
  Implement checks for IMU-video time synchronization, sensor health, and measurement consistency.
  Apply digital signal processing techniques to identify sensor failures, connectivity issues, and measurement irregularities.
  Develop validation logic comparing annotations against sensor data to ensure temporal alignment.

Analysis & Troubleshooting
  Perform ad-hoc analysis on ~1,200+ workout tasks to classify failure modes.
  Identify whether issues stem from pipeline bugs, sensor problems, or annotation errors.
  Prioritize engineering work based on data quality impact and coordinate with the annotation team on fixes.

Tooling and Visualization
  Maintain and extend our NextJS UI serving annotators, data scientists, and stakeholders.
  Create visualizations (Chart.js) for QC metrics and signal analysis.
  Integrate with the LabelStudio annotation interface.

What You Bring

Required
  Strong Python programming skills, particularly for data processing pipelines.
  Experience with time-series data and digital signal processing.
  Comfortable working in Linux environments and deploying/monitoring remote services.
  Ability to debug complex multi-component systems (sensors, video, networks, sync).
  Data quality mindset: designing validation rules, tracking metrics, investigating anomalies.
  SQL/database experience for managing pipeline metadata.

Highly Valued
  Video processing experience (RTSP streams, encoding, OCR).
  Working with sensor/IoT data and handling connectivity challenges.
  NextJS or modern web frameworks for data tooling.
  DevOps practices: containerization, monitoring, logging, alerting.
  Experience with annotation pipelines and ML training data workflows.
  Background in biomechanics, sports science, or wearable sensors.

Tech Stack
  Languages: Python (primary), JavaScript/TypeScript (NextJS UI)
  Data: IMU sensor streams, video (RTSP), time-series analysis, DSP
  Tools: LabelStudio, Chart.js, Linux/bash, OCR libraries
  Infrastructure: Remote deployment, monitoring systems

You'll Thrive Here If You
  Enjoy detective work: diagnosing why data doesn't match expectations.
  Balance pragmatism with quality: shipping improvements while maintaining reliability.
  Communicate well across technical and non-technical stakeholders.
  Can work autonomously in a small, mission-driven team.
-
Senior ML Engineer, Ad Performance
4 weeks ago
Teresina, Brazil | Launch Potato | Full-time
Overview: WHO ARE WE? Launch Potato is a profitable digital media company that reaches 30M+ monthly visitors through brands such as FinanceBuzz, All About...
-
Principal ML Engineer, Recommendation Systems
3 weeks ago
Teresina, Brazil | Launch Potato | Full-time
WHO ARE WE? Launch Potato is a profitable digital media company that reaches 30M+ monthly visitors through brands such as FinanceBuzz, All About Cookies, and OnlyInYourState. As The Discovery and Conversion Company, our...
-
Data Engineer, Data Lake
4 weeks ago
Teresina, Brazil | Visa Inc. | Full-time
What you'll do: Data Teams play a central role at Pismo, providing data-intensive analytics for internal product teams, while also processing and delivering TBs of high-quality data, mainly in streaming form, needed by our partners to produce outstanding products downstream. Operating mainly in Brazil, the data team recently extended its scope to accelerate...
-
Data Engineer
2 weeks ago
Teresina, Brazil | Avra | Full-time
Avra is a deep tech data intelligence platform that translates the complexity of small and medium-sized businesses into strategic decisions for large corporations. We build our own foundation models from scratch, without relying on third-party solutions, to deliver innovative insights that power major banks and fintechs...
-
Analytics Engineer
2 weeks ago
Teresina, Brazil | AgileEngine | Full-time
Overview: Analytics Engineer (MarTech) ID43406. AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple...
-
Data Warehouse Architect
3 hours ago
Teresina, Brazil | AVM Consulting Inc | Full-time
Databricks Data Warehouse Architect. About the role: One of the largest companies in the world in the gaming industry is seeking a hands-on Data Architect with Data Warehouse Engineer expertise in Databricks (DBX) and AWS-native data services to spearhead the design and implementation of a new data warehouse instance for a major product line. This role will...
-
Site Reliability Engineer
2 weeks ago
Teresina, Brazil | Ryz Labs | Full-time
Remote position within South America. RYZ is seeking a Site Reliability Engineer to join one of our clients, who is...
-
Data Engineer | Specialist
3 weeks ago
Teresina, Brazil | Compass UOL | Full-time
Job description. Main responsibilities: Build data pipelines to ingest data with Airflow/Spark; implement access controls and auditing to protect sensitive data and ensure regulatory compliance; migrate and integrate existing data into the new platform, ensuring continuity of operations and data integrity;...
-
Senior Machine Learning
2 weeks ago
Teresina, Brazil | BairesDev | Full-time
Senior Machine Learning & LLM Engineer - Remote Work | REF#. At BairesDev, we've been leading the way in technology projects for over 15...
-
Senior Data Engineer
3 weeks ago
Teresina, Brazil | Swapcard | Full-time
Our Mission: Swapcard is the leading AI-powered event platform designed to drive revenue growth and foster meaningful connections at in-person and hybrid events. We recognize the importance of teamwork in successful events; that's why Swapcard is fueled by a team of innovators who are passionate about helping organizers build future-proof events. Our Vision...