Talentmate
India
16th January 2026
2601-8900-7
About Us
CLOUDSUFI, a Google Cloud Premier Partner, is a leading global provider of data-driven digital transformation for cloud-based enterprises. With a global presence and a focus on Software & Platforms, Life Sciences and Healthcare, Retail, CPG, Financial Services, and Supply Chain, CLOUDSUFI is positioned to meet customers where they are in their data monetization journey.
Our Values
We are a passionate and empathetic team that prioritizes human values. Our purpose is to elevate the quality of lives for our family, customers, partners and the community.
Equal Opportunity Statement
CLOUDSUFI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified candidates receive consideration for employment without regard to race, colour, religion, gender, gender identity or expression, sexual orientation, or national origin. We provide equal opportunities in employment, advancement, and all other areas of our workplace. Please explore more at https://www.cloudsufi.com/
Job Summary
We are seeking a highly skilled and motivated Data Engineer to join our Development POD for the Integration Project. The ideal candidate will be responsible for designing, building, and maintaining robust data pipelines to ingest, clean, transform, and integrate diverse public datasets into our knowledge graph. This role requires a strong understanding of Google Cloud Platform (GCP) services, data engineering best practices, and a commitment to data quality and scalability.
Key Responsibilities
ETL Development: Design, develop, and optimize data ingestion, cleaning, and transformation pipelines for various data sources (e.g., CSV, API, XLS, JSON, SDMX) using Google Cloud Platform services (Cloud Run, Dataflow) and Python; a minimal pipeline sketch follows this list.
Schema Mapping & Modeling: Work with LLM-based auto-schematization tools to map source data to our schema.org vocabulary, defining appropriate Statistical Variables (SVs) and generating MCF/TMCF files.
Entity Resolution & ID Generation: Implement processes for accurately matching new entities with existing IDs or generating unique, standardized IDs for new entities.
Knowledge Graph Integration: Integrate transformed data into the Knowledge Graph, ensuring proper versioning and adherence to existing standards.
API Development: Develop and enhance REST and SPARQL APIs via Apigee to enable efficient access to integrated data for internal and external stakeholders.
Data Validation & Quality Assurance: Implement comprehensive data validation and quality checks (statistical, schema, anomaly detection) to ensure data integrity, accuracy, and freshness. Troubleshoot and resolve data import errors.
Automation & Optimization: Collaborate with the Automation POD to leverage and integrate intelligent assets for data identification, profiling, cleaning, schema mapping, and validation, aiming for significant reduction in manual effort.
Collaboration: Work closely with cross-functional teams, including Managed Service POD, Automation POD, and relevant stakeholders.
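To make the ETL and validation responsibilities above concrete, here is a minimal sketch of the kind of Apache Beam (Python) pipeline this role would build: read a CSV from Cloud Storage, parse and validate rows, and append them to BigQuery. The bucket, table, column names, and schema are hypothetical placeholders for illustration, not part of the actual project.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_csv_line(line):
    """Yield a record dict for well-formed rows; drop ragged lines.

    Naive comma split -- a production pipeline would use the csv module
    or a schema-aware reader to handle quoted fields.
    """
    fields = [f.strip() for f in line.split(",")]
    if len(fields) == 3:
        yield {"country": fields[0], "year": fields[1], "value": fields[2]}


def is_valid(record):
    """Quality gate: keep only rows whose measured value is numeric."""
    try:
        float(record["value"])
        return True
    except ValueError:
        return False


def cast_value(record):
    """Cast the measured value to float so it matches the BigQuery schema."""
    record["value"] = float(record["value"])
    return record


def run():
    # DirectRunner by default; pass --runner=DataflowRunner (plus project,
    # region, and temp_location) to execute the same code on Dataflow.
    with beam.Pipeline(options=PipelineOptions()) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText(
                "gs://example-bucket/source.csv", skip_header_lines=1)
            | "Parse" >> beam.FlatMap(parse_csv_line)
            | "Validate" >> beam.Filter(is_valid)
            | "Cast" >> beam.Map(cast_value)
            | "Load" >> beam.io.WriteToBigQuery(
                "example-project:staging.observations",
                schema="country:STRING,year:STRING,value:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )


if __name__ == "__main__":
    run()
```

The same code runs unchanged locally or on Dataflow, which is the usual Beam development loop: validate against a sample with the DirectRunner, then scale out on the managed runner.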
Qualifications And Skills
Education: Bachelor's or Master's degree in Computer Science, Data Engineering, Information Technology, or a related quantitative field.
Experience: 3+ years of proven experience as a Data Engineer, with a strong portfolio of successfully implemented data pipelines.
Programming Languages: Proficiency in Python for data manipulation, scripting, and pipeline development.
Cloud Platforms and Tools: Expertise in Google Cloud Platform (GCP) services, including Cloud Storage, Cloud SQL, Cloud Run, Dataflow, Pub/Sub, BigQuery, and Apigee. Proficiency with Git-based version control.
Core Competencies:
Must Have - SQL, Python, BigQuery, GCP Dataflow / Apache Beam, Google Cloud Storage (GCS)
Must Have - Proven ability in comprehensive data wrangling, cleaning, and transforming complex datasets from various formats (e.g., API, CSV, XLS, JSON)
Secondary Skills - SPARQL, Schema.org, Apigee, CI/CD (Cloud Build), GCP, Cloud Data Fusion, Data Modelling
Solid understanding of data modeling, schema design, and knowledge graph concepts (e.g., Schema.org, RDF, SPARQL, JSON-LD); a small query sketch follows this list.
Experience with data validation techniques and tools.
Familiarity with CI/CD practices and the ability to work in an Agile framework.
Strong problem-solving skills and keen attention to detail.
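As a small illustration of the knowledge graph concepts listed above (Schema.org, RDF, SPARQL, JSON-LD), the following self-contained sketch loads one hypothetical JSON-LD entity into an rdflib graph and queries it with SPARQL. The entity, IRI, and values are invented for the example, and it assumes rdflib 6+, which parses JSON-LD natively.

```python
from rdflib import Graph

# One hypothetical schema.org entity serialized as JSON-LD.
JSONLD = """
{
  "@context": {"@vocab": "https://schema.org/"},
  "@id": "https://example.org/place/IN",
  "@type": "Place",
  "name": "India"
}
"""

g = Graph()
g.parse(data=JSONLD, format="json-ld")

# SPARQL: find every Place and its name.
QUERY = """
PREFIX schema: <https://schema.org/>
SELECT ?place ?name WHERE {
  ?place a schema:Place ;
         schema:name ?name .
}
"""

for row in g.query(QUERY):
    print(row.place, row.name)
```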
Preferred Qualifications:
Experience with LLM-based tools or concepts for data automation (e.g., auto-schematization).
Familiarity with similar large-scale public dataset integration initiatives.
Experience with multilingual data integration.
| Role Level: | Mid-Level | Work Type: | Full-Time |
|---|---|---|---|
| Country: | India | City: | Gautam Buddha Nagar, Uttar Pradesh |
| Company Website: | https://www.cloudsufi.com/ | Job Function: | Engineering |
| Company Industry/ Sector: | Software Development | | |
Searching, interviewing, and hiring are all part of professional life. The idea behind the TALENTMATE portal is to help professionals with each of these by bringing the requisites together under one roof. Whether you're hunting for your next job opportunity or looking for potential employers, we're here to lend you a helping hand.
Disclaimer: talentmate.com is only a platform to bring jobseekers and employers together. Applicants are advised to research the bona fides of prospective employers independently. We do NOT endorse any requests for money payments and strictly advise against sharing personal or bank-related information. We also recommend you visit Security Advice for more information. If you suspect any fraud or malpractice, email us at abuse@talentmate.com.