Lead Data Engineer

14 hours ago


João Pessoa, Brazil · Fusemachines · Full-time


About Fusemachines
Fusemachines is a leading AI strategy, talent, and education services provider. Founded by Sameer Maskey, Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in four countries (Nepal, the United States, Canada, and the Dominican Republic) and more than 450 employees, Fusemachines seeks to bring its global expertise in AI to transform companies around the world.

Location: Remote (Full-time)
About The Role
This is a remote full-time position, responsible for designing, building, testing, optimizing and maintaining the infrastructure and code required for data integration, storage, processing, pipelines and analytics (BI, visualization and Advanced Analytics) from ingestion to consumption, implementing data flow controls, and ensuring high data quality and accessibility for analytics and business intelligence purposes. This role requires a strong foundation in programming, and a keen understanding of how to integrate and manage data effectively across various storage systems and technologies.
We're looking for someone who can ramp up quickly, contribute right away, and lead the work in Data & Analytics, helping with everything from backlog definition to architecture decisions, and providing technical leadership to the rest of the team with minimal oversight.
We are looking for a skilled Sr. Data Engineer/Technical Lead with a strong background in Python, SQL, PySpark, Redshift, and AWS cloud-based large-scale data solutions, and a passion for data quality, performance, and cost optimization. The ideal candidate will develop in an Agile environment and would ideally also have GCP experience, to contribute to the migration from AWS to GCP.
This role is perfect for an individual passionate about leading, leveraging data to drive insights, improve decision-making, and support the strategic goals of the organization through innovative data engineering solutions.
Qualification / Skill Set Requirement:

  • Must have a full-time Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field
  • 5+ years of real-world data engineering development experience in AWS and GCP (certifications preferred). Strong expertise in Python, SQL, PySpark and AWS in an Agile environment, with a proven track record of building and optimizing data pipelines, architectures, and datasets, and proven experience in data storage, modeling, management, lake, warehousing, processing/transformation, integration, cleansing, validation and analytics
  • Senior person who can understand requirements and design end to end solutions with minimal oversight
  • Strong programming Skills in one or more languages such as Python, Scala, and proficient in writing efficient and optimized code for data integration, storage, processing and manipulation
  • Strong knowledge of SDLC tools and technologies, including project management software (Jira or similar), source code management (GitHub or similar), CI/CD systems (GitHub Actions, AWS CodeBuild, or similar) and binary repository managers (AWS CodeArtifact or similar)
  • Good understanding of data modeling and database design principles, with the ability to design and implement efficient database schemas that meet the requirements of the data architecture and support data solutions
  • Strong SQL skills and experience working with complex data sets, Enterprise Data Warehouses and writing advanced SQL queries. Proficient with relational databases (RDS, MySQL, Postgres, or similar) and NoSQL databases (Cassandra, MongoDB, Neo4j, etc.)
  • Skilled in Data Integration from different sources such as APIs, databases, flat files, event streaming.
  • Strong experience implementing data pipelines and efficient ELT/ETL processes, batch and real-time, in AWS and with open-source solutions, including the ability to develop custom integration solutions as needed. This covers data integration from different sources such as APIs (PoS integrations are a plus), ERP systems (Oracle and Allegra are a plus), databases, flat files, Apache Parquet, and event streaming, as well as cleansing, transformation, and validation of the data
  • Strong experience with scalable and distributed Data Technologies such as Spark/PySpark, DBT and Kafka, to be able to handle large volumes of data
  • Experience with stream-processing systems such as Storm or Spark Streaming is a plus
  • Strong experience in designing and implementing Data Warehousing solutions in AWS with Redshift. Demonstrated experience in designing and implementing efficient ELT/ETL processes that extract data from source systems, transform it (DBT), and load it into the data warehouse
  • Strong experience in Orchestration using Apache Airflow
  • Expert in Cloud Computing in AWS, including deep knowledge of a variety of AWS services like Lambda, Kinesis, S3, Lake Formation, EC2, EMR, ECS/ECR, IAM, CloudWatch, etc
  • Good understanding of Data Quality and Governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent
  • Good understanding of BI solutions including Looker and LookML (Looker Modeling Language)
  • Strong knowledge and hands-on experience with DevOps principles, tools, and technologies (GitHub and AWS DevOps), including continuous integration and continuous delivery (CI/CD), infrastructure as code (IaC – Terraform), configuration management, automated testing, performance tuning, and cost management and optimization
  • Good Problem-Solving skills: being able to troubleshoot data processing pipelines and identify performance bottlenecks and other issues
  • Possesses strong leadership skills with a willingness to lead, create ideas, and be assertive
  • Strong project management and organizational skills
  • Excellent communication skills to collaborate with cross-functional teams, including business users, data architects, DevOps/DataOps/MLOps engineers, data analysts, data scientists, developers, and operations teams. Essential to convey complex technical concepts and insights to non-technical stakeholders effectively
  • Ability to document processes, procedures, and deployment configurations
Responsibilities:
  • Design, implement, deploy, test and maintain highly scalable and efficient data architectures, defining and maintaining standards and best practices for data management independently with minimal guidance
  • Ensure the scalability, reliability, quality and performance of data systems
  • Mentor and guide junior and mid-level data engineers
  • Collaborate with Product, Engineering, Data Scientists and Analysts to understand data requirements and develop data solutions, including reusable components
  • Evaluate and implement new technologies and tools to improve data integration, data processing and analysis
  • Design architecture, observability and testing strategies, and building reliable infrastructure and data pipelines
  • Take ownership of the storage layer and data management tasks, including schema design, indexing, and performance tuning
  • Swiftly address and resolve complex data engineering issues and incidents, and eliminate bottlenecks in SQL queries and database operations
  • Conduct Discovery on existing Data Infrastructure and Proposed Architecture
  • Evaluate and implement cutting-edge technologies and methodologies and continue learning and expanding skills in data engineering and cloud platforms, to improve and modernize existing data systems
  • Evaluate, design, and implement data governance solutions: cataloging, lineage, quality and data governance frameworks that are suitable for a modern analytics solution, considering industry-standard best practices and patterns.
  • Define and document data engineering architectures, processes and data flows
  • Assess best practices and design schemas that match business needs for delivering a modern analytics solution (descriptive, diagnostic, predictive, prescriptive)
  • Be an active member of our Agile team, participating in all ceremonies and continuous improvement activities
Equal Opportunity Employer: Race, Color, Religion, Sex, Sexual Orientation, Gender Identity, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status.
Seniority level
  • Mid-Senior level
Employment type
  • Contract
Job function
  • Information Technology
Industries
  • Internet Publishing



