This role focuses on designing and building scalable data infrastructure to support advanced autonomous systems. The position is responsible for transforming large-scale multimodal sensor data into high-quality, structured datasets that are ready for downstream processing and machine learning workflows.
The work involves establishing foundational systems and architectural decisions that will support long-term scalability, including how data is recorded, ingested, stored, versioned, labeled, and served. The environment handles hundreds of terabytes of data generated from LiDAR, cameras, IMU, GPS, and radar across multiple platforms.
Key Responsibilities
On-vehicle data recording pipeline: Design and manage high-throughput recording systems, including topic selection, multi-GB/s write pipelines, and efficient data formats (MCAP/rosbag2). Oversee on-platform storage and ensure reliable data transfer to cloud environments with integrity checks. Ensure timestamp accuracy and synchronization across recorded data.
Data lake architecture: Design and maintain scalable storage solutions across S3, FSx/Lustre, and GCS. Define data organization, regional placement, caching strategies, retention policies, data lineage, and cost optimization.
Dataset pipeline development: Build pipelines that convert raw sensor data into structured, training-ready datasets. Ensure accurate time alignment across modalities, including ego-pose, calibration metadata, and scenario tagging.
Versioning and dataset management: Implement robust dataset versioning and discovery processes. Evaluate and deploy tools such as DVC, LakeFS, Deep Lake, and FiftyOne, ensuring datasets are reproducible, traceable, and easily accessible.
Dataset format design: Contribute to defining efficient on-disk dataset formats, focusing on write performance and optimized I/O for large-scale training workloads.
Annotation workflows: Develop and manage annotation pipelines, including defining vendor handoff formats, ingesting labeled data, performing quality control, handling schema evolution, and supporting iterative dataset improvements.
Required Experience
5+ years of experience building production-grade data infrastructure, ideally involving large-scale multimodal or sensor data (e.g., robotics, autonomous systems, geospatial, or scientific domains)
Strong proficiency in Python, with the ability to work with C++ for ROS2 and pipeline-related tooling
Hands-on experience with cloud storage and distributed systems (S3, GCS, FSx, Lustre), including performance and cost optimization
Experience with dataset versioning and ML data tools such as DVC, LakeFS, Deep Lake, FiftyOne, or similar platforms
Preferred Qualifications
Background in autonomous systems or mobile platforms, particularly in complex or unstructured environments
Experience working with large-scale annotation workflows and external labeling providers
Familiarity with distributed training approaches (e.g., DDP, FSDP) to support efficient collaboration with machine learning infrastructure