GCP Infrastructure Engineer - Google Cloud Terraform Python Bash GKE CI CD
Talentmate
India
21st October 2025
2510-3589-473
Job Description
Before applying for a job, select your preferred language from the options available at the top right of this page.
Discover your next opportunity at an organization that ranks among the 500 largest companies in the world. Explore innovative opportunities, experience our rewarding culture, and work with talented teams that push you to grow every day. We know what it takes to lead UPS into the future: passionate people with a unique combination of skills. If you have the qualities, motivation, autonomy, or leadership to lead teams, there are roles suited to your aspirations and to your skills, today and tomorrow.
Job Summary
We are seeking a highly skilled GCP Infrastructure Engineer to design, build, and manage the cloud infrastructure that powers Generative AI (GenAI) applications at scale. In this role, you will leverage Google Cloud Platform (GCP) Vertex AI, IBM Watsonx, and containerization technologies such as Docker and Kubernetes (GKE) to deliver secure, scalable, and high-performance AI solutions. You will own the end-to-end infrastructure lifecycle — from design and provisioning to automation, monitoring, and optimization — while enabling data scientists and ML engineers to seamlessly deploy and operate GenAI workloads.
Key Responsibilities
Cloud Infrastructure & Platform Engineering
Design, provision, and maintain scalable, secure, and cost-efficient infrastructure for GenAI applications on GCP.
Deploy and manage containerized workloads using Docker and Kubernetes (GKE).
Configure and optimize Vertex AI and IBM Watsonx platforms for training, fine-tuning, and serving LLMs and other generative models.
Implement high-performance GPU/TPU clusters to support distributed training and large-scale inference.
Ensure business continuity through backup, disaster recovery, and multi-region deployments.
Automation & Reliability
Develop and maintain Infrastructure as Code (IaC) templates with Terraform or Cloud Deployment Manager.
Adopt GitOps practices (Flux) for infrastructure lifecycle management.
Build and optimize CI/CD pipelines for data pipelines, model workflows, and GenAI applications.
Apply SRE principles (SLIs, SLOs, SLAs) to guarantee platform reliability and uptime.
Security, Governance & Compliance
Embed DevSecOps best practices across the infrastructure lifecycle, including policy-as-code, vulnerability scanning, and secrets management.
Enforce identity and access management (IAM), network segmentation, and data encryption in compliance with standards (HIPAA, SOX, GDPR, FedRAMP).
Collaborate with enterprise security and compliance teams to implement governance frameworks for GenAI platforms.
Monitoring, Observability & Cost Optimization
Implement observability stacks (Prometheus, Grafana, Cloud Monitoring, Datadog) for both infrastructure health and ML-specific metrics (model drift, data anomalies).
Define KPIs to monitor system health, performance, and adoption across AI workloads.
Optimize cloud cost efficiency for GPU/TPU-intensive workloads using autoscaling, preemptible instances, and utilization monitoring.
Collaboration & Enablement
Partner with data scientists, ML engineers, and software teams to streamline GenAI application development and deployment.
Provide onboarding, documentation, and reusable templates to enable faster adoption of AI infrastructure.
Stay current with the latest advancements in GenAI, cloud-native infrastructure, and container orchestration.
Required Education
Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
Required Experience
8+ years of experience in cloud infrastructure engineering, DevOps, or platform engineering.
Experience with GenAI use cases (chatbots, content generation, code assistants, etc.).
Strong hands-on expertise with Google Cloud Platform (GCP), especially Vertex AI.
Experience with IBM Watsonx for AI application deployment and management.
Proven skills in Docker, Kubernetes (GKE), and container orchestration at scale.
Proficiency in Python, Bash, or other relevant scripting languages.
Strong understanding of cloud networking, IAM, and security best practices.
Experience with CI/CD tools (GitHub Actions, GitLab CI, Jenkins) and IaC tools (Terraform, Pulumi, Ansible, Deployment Manager).
Familiarity with data pipelines and integration tools (Dataflow, Apache Beam, Pub/Sub, Kafka).
Excellent problem-solving, debugging, and communication skills.
Preferred Experience
Experience in MLOps practices for model deployment, monitoring, and retraining.
Exposure to multi-cloud or hybrid cloud environments (GCP, AWS, Azure, on-prem).
Hands-on experience with feature stores (Vertex AI Feature Store, Feast) and ML observability tools (EvidentlyAI, Fiddler).
Knowledge of distributed training frameworks (Horovod, DeepSpeed, PyTorch Distributed).
Contributions to open-source projects in infrastructure, MLOps, or GenAI.
Experience managing infrastructure in regulated industries.
Preferred Certifications
Google Cloud Certified - Professional Cloud Architect
Google Cloud Certified - Machine Learning Engineer
Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
IBM Certified Watsonx Generative AI Engineer – Associate
IBM Certified Solution Architect - Cloud Pak for Data
Other relevant certifications in AI, Machine Learning, or Cloud-Native technologies.
Contract Type
Permanent (CDI)
At UPS, equal opportunity, fair treatment, and an inclusive work environment are key values to which we are committed.