Virtusa
United Arab Emirates
1st August 2025
2507-2552-164
We are looking for an AI/ML-focused Data Engineer who brings deep expertise in building intelligent data pipelines for unstructured content and is experienced in integrating with modern machine learning ecosystems. The ideal candidate will have hands-on experience in PySpark and Python, with a strong focus on document classification, cleansing, quality metrics, and the ability to work with LLMs, vector databases, and Retrieval-Augmented Generation (RAG) frameworks. Candidates will play a critical role in bridging data engineering and machine learning, enabling the development of AI-first applications across the enterprise.
Key Responsibilities
Build robust, scalable data processing pipelines for unstructured documents (PDFs, emails, forms, etc.) using PySpark and Python.
Implement document cleansing, classification, and enrichment techniques to prepare high-quality data for AI/ML applications.
Develop and integrate data workflows that feed into LLM-based pipelines and support vector-based retrieval using RAG architectures.
Engineer vector embeddings, document chunking, and metadata tagging for semantic search and question-answering systems.
Collaborate closely with AI architects, AI/Data engineers, and platform teams to design end-to-end AI solutions.
Communicate data readiness, pipeline quality, and model integration strategies clearly to both technical and non-technical stakeholders.
Apply Agile methodologies and CI/CD best practices to deliver continuously evolving AI capabilities.
Required Skills
5+ years of overall commercial experience, including 2+ years in a relevant role.
Strong proficiency in PySpark and distributed data frameworks.
Solid experience in core Python, including ML/AI libraries (e.g., Hugging Face Transformers, LangChain, FAISS).
Proven expertise in processing unstructured data and document intelligence (OCR, NLP, classification, tagging).
Familiarity with vector databases (e.g., Redis) and embedding models for RAG pipelines.
Understanding of the LLM lifecycle, including fine-tuning, inference, and prompt engineering.
Experience working in agile environments, collaborating with cross-functional teams.
Excellent communication skills with the ability to interface with both technical and business stakeholders.
Role Level: | Associate | Work Type: | Full-Time |
---|---|---|---|
Country: | United Arab Emirates | City: | Dubai |
Company Website: | http://www.virtusa.com | Job Function: | Data Science & AI |
Company Industry/Sector: | IT Services and IT Consulting | | |
Virtusa Corporation provides digital engineering and technology services to Forbes Global 2000 companies worldwide. Our Engineering First approach ensures we can execute all ideas and creatively solve pressing business challenges. With industry expertise and empowered agile teams, we prioritize execution early in the process for impactful results. We combine logic, creativity and curiosity to build, solve, and create. Every day, we help clients engage with new technology paradigms, creatively building solutions that solve their most pressing business challenges and move them to the forefront of their industry. Join us at Virtusa, an equal opportunity employer that values inclusion and diversity. Check out Virtusa.com/careers to find out more.