ML Data Pipeline Engineer

2 days ago


Teresina, Brazil · Prosigliere · Full-time

We're seeking a Data Pipeline Engineer to own and evolve our exercise-recognition training data infrastructure. You'll manage the end-to-end pipeline that collects, synchronizes, validates, and prepares IMU sensor and video data for ML model training. The role combines systems engineering, data quality automation, and hands-on problem-solving in a production environment.

What You'll Do

Pipeline Operations & Improvement
• Maintain and enhance our multi-source data collection system: IMU sensors (via a mobile app) and synchronized video streams from gym-based cameras.
• Improve the robustness of the video capture software, particularly its handling of network interruptions and its operational monitoring.
• Deploy and monitor services in remote Linux environments using appropriate DevOps practices.

Data Quality & Validation
• Evolve our Python-based QC engine, which validates data before and after annotation.
• Implement checks for IMU-video time synchronization, sensor health, and measurement consistency.
• Apply digital signal processing techniques to identify sensor failures, connectivity issues, and measurement irregularities.
• Develop validation logic that compares annotations against sensor data to ensure temporal alignment.

Analysis & Troubleshooting
• Perform ad hoc analysis across 1,200+ workout tasks to classify failure modes.
• Determine whether issues stem from pipeline bugs, sensor problems, or annotation errors.
• Prioritize engineering work by its impact on data quality, and coordinate fixes with the annotation team.

Tooling & Visualization
• Maintain and extend our Next.js UI, which serves annotators, data scientists, and stakeholders.
• Build Chart.js visualizations for QC metrics and signal analysis.
• Integrate with the Label Studio annotation interface.

What You Bring

Required
• Strong Python programming skills, particularly for data processing pipelines
• Experience with time-series data and digital signal processing
• Comfort working in Linux environments and deploying and monitoring remote services
• Ability to debug complex multi-component systems (sensors, video, networks, synchronization)
• A data quality mindset: designing validation rules, tracking metrics, and investigating anomalies
• SQL/database experience for managing pipeline metadata

Highly Valued
• Video processing experience (RTSP streams, encoding, OCR)
• Experience working with sensor/IoT data and handling connectivity challenges
• Next.js or other modern web frameworks for data tooling
• DevOps practices: containerization, monitoring, logging, and alerting
• Experience with annotation pipelines and ML training data workflows
• A background in biomechanics, sports science, or wearable sensors

Tech Stack
• Languages: Python (primary), JavaScript/TypeScript (Next.js UI)
• Data: IMU sensor streams, video (RTSP), time-series analysis, DSP
• Tools: Label Studio, Chart.js, Linux/bash, OCR libraries
• Infrastructure: remote deployment, monitoring systems

You'll Thrive Here If You
• Enjoy detective work: diagnosing why data doesn't match expectations
• Balance pragmatism with quality: shipping improvements while maintaining reliability
• Communicate well with both technical and non-technical stakeholders
• Can work autonomously in a small, mission-driven team



  • Senior ML Engineer, Ad Performance

    Teresina, Brazil · Launch Potato · Full-time

    WHO ARE WE? Launch Potato is a profitable digital media company that reaches over 30M monthly visitors through brands such as FinanceBuzz, All About...


  • Principal ML Engineer, Recommendation Systems

    Teresina, Brazil · Launch Potato · Full-time

    WHO ARE WE? Launch Potato is a profitable digital media company that reaches over 30M monthly visitors through brands such as FinanceBuzz, All About Cookies, and OnlyInYourState. As The Discovery and Conversion Company, our...

  • Data Engineer, Data Lake

    4 weeks ago

    Teresina, Brazil · Visa Inc. · Full-time

    What you'll do: Data teams play a central role at Pismo, providing data-intensive analytics for internal product teams while also processing and delivering terabytes of high-quality data, mostly in streaming form, that our partners need to build outstanding downstream products. Operating mainly in Brazil, the data team recently extended its scope to accelerate...

  • Data Engineer

    2 weeks ago

    Teresina, Brazil · Avra · Full-time

    Avra is a deep-tech data intelligence platform that translates the complexity of small and medium-sized businesses into strategic decisions for large corporations. We build our own foundation models from scratch, without relying on third-party solutions, to deliver innovative insights that empower major banks and fintechs...

  • Analytics Engineer (MarTech) ID43406

    2 weeks ago

    Teresina, Brazil · AgileEngine · Full-time

    AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple...


  • Databricks Data Warehouse Architect

    Teresina, Brazil · AVM Consulting Inc · Full-time

    About the role: One of the largest companies in the gaming industry is seeking a hands-on data architect with data warehouse engineering expertise in Databricks (DBX) and AWS-native data services to spearhead the design and implementation of a new data warehouse instance for a major product line. This role will...

  • Site Reliability Engineer

    2 weeks ago

    Teresina, Brazil · Ryz Labs · Full-time

    Remote position within South America. RYZ is seeking a Site Reliability Engineer to join one of our clients, who is...

  • Data Engineer | Specialist

    3 weeks ago

    Teresina, Brazil · Compass UOL · Full-time

    Job description. Main responsibilities: build data pipelines for data ingestion with Airflow/Spark; implement access controls and auditing to protect sensitive data and ensure regulatory compliance; migrate and integrate existing data into the new platform, ensuring operational continuity and data integrity;...

  • Senior Machine Learning & LLM Engineer (Remote)

    2 weeks ago

    Teresina, Brazil · BairesDev · Full-time

    At BairesDev, we've been leading the way in technology projects for over 15...

  • Senior Data Engineer

    3 weeks ago

    Teresina, Brazil · Swapcard · Full-time

    Our mission: Swapcard is the leading AI-powered event platform designed to drive revenue growth and foster meaningful connections at in-person and hybrid events. We recognize the importance of teamwork in successful events; that's why Swapcard is fueled by a team of innovators who are passionate about helping organizers build future-proof events. Our vision...