At Apna, data is central to how we build products, understand users, improve employer outcomes, power recommendations, and scale decision-making. This role gives you the opportunity to build the backbone of Apna's data platform and influence how data is used across the company.
You will work on real-world, high-scale problems across jobs, users, employers, communities, matching, growth, and AI-driven systems.
About The Role
Apna is looking for a Lead / Staff Data Engineer to build and scale our core data platform. In this role, you will work on large-scale data pipelines, lakehouse architecture, query platforms, workflow orchestration, and data reliability systems that power analytics, product intelligence, machine learning, business dashboards, experimentation, and operational decision-making across Apna.
We are looking for someone who can think deeply about data architecture, design reliable pipelines, improve data quality, and help build a platform that can scale with Apna's growth.
What You'll Own
You will be responsible for designing, building, and operating critical parts of Apna's data platform, including:
Building scalable batch and near-real-time data pipelines across product, business, growth, and ML use cases
Designing and improving our lakehouse architecture using technologies like Apache Hudi
Working with query engines such as Presto / Trino for large-scale analytical workloads
Building and maintaining orchestration workflows using Apache Airflow
Creating reusable data models, curated datasets, and reliable data marts for analytics and product teams
Improving data platform reliability, observability, SLA tracking, lineage, and data quality checks
Optimizing storage, compute, query performance, and pipeline costs
Partnering with product, analytics, ML, and backend engineering teams to understand data needs and convert them into scalable platform solutions
Driving engineering standards around data modeling, schema evolution, partitioning, deduplication, backfills, replayability, and pipeline ownership
Mentoring data engineers and influencing architecture decisions across teams
What We're Looking For
Must Have
Strong experience in data engineering, preferably at scale
Hands-on experience with Apache Airflow or similar orchestration systems
Strong knowledge of Presto / Trino or other distributed query engines
Good understanding of Apache Hudi concepts such as:
Copy-on-write vs merge-on-read
Upserts and deletes
Incremental reads
Compaction
Clustering
Timeline and commits
Schema evolution
Partitioning strategy
Strong knowledge of distributed data processing and storage systems
Ability to design and build reliable ETL / ELT pipelines
Strong SQL skills and ability to debug complex data issues
Good understanding of different data architectures, including:
Data warehouse
Data lake
Lakehouse
Lambda architecture
Kappa architecture
Medallion architecture
Event-driven data architecture
Experience with data modeling for analytics and reporting
Strong programming skills in at least one language such as Python, Java, or Scala
Ability to reason about trade-offs between freshness, cost, reliability, latency, and complexity
Strong debugging and production ownership mindset
Good to Have
Experience with Kafka, Spark, Flink, Hive, Iceberg, Delta Lake, or BigQuery
Experience building internal data platforms or self-serve data infrastructure
Experience with data quality frameworks such as Great Expectations, Deequ, Soda, or custom validation systems
Exposure to ML feature pipelines or feature stores
Experience with metadata management, data catalogs, lineage, and governance
Experience with cloud infrastructure such as AWS, GCP, or Azure
Understanding of privacy, compliance, PII handling, and access control in data systems
What Success Looks Like
In this role, success means:
Critical business and product datasets are reliable, discoverable, and trusted
Pipelines are observable, recoverable, and have clear SLAs
Query performance improves across major analytical workloads
Data freshness and quality issues are significantly reduced
Teams can build on top of the data platform faster without reinventing pipelines
The platform can scale with Apna's user, job, employer, and engagement data