Entity:
Technology
Job Family Group:
IT&S Group
Job Description:
bp is navigating the most significant transition in its 100+ year history with the aim of becoming a leading renewable energy provider and achieving net zero carbon emissions by 2050. To facilitate this transition, senior AI Platform Engineers are crucial in architecting and maintaining the infrastructure that supports AI and ML initiatives across the organization.
We are seeking experienced AI Platform Engineers with a strong background in Azure and AWS cloud services, Databricks, OpenAI, MLOps practices, and Azure DevOps to join our technology team. As a key member, you will play a vital role in leading the design, development, and optimization of scalable and reliable AI platforms.
As a Senior AI Platform Engineer, you will deploy and operate cloud infrastructure and automation pipelines to support AI/ML and generative AI workloads across Azure, AWS, and Databricks. You will build secure, scalable environments (VMs, containers, Kubernetes clusters, Databricks workspaces) to deploy cloud AI services such as Azure OpenAI, Azure ML, AWS SageMaker, and AWS Bedrock. You will focus on continuous integration and deployment (CI/CD) and infrastructure-as-code (ARM/Bicep, AWS CDK, etc.) to automate model release cycles, while embedding security, compliance, monitoring, and observability (logging, Prometheus/Grafana) to ensure service reliability and quick issue resolution. You will collaborate with data scientists and application teams to integrate generative AI features (e.g., chatbots, co-pilot assistants) into enterprise applications.
Qualification
Bachelor’s or master’s degree in computer science, engineering, information systems, or another numerate discipline, or equivalent experience.
Key Responsibilities
- Lead the design and implementation of scalable AI/ML infrastructure on Azure and AWS.
- Build and manage cloud-native infrastructure (Azure, AWS, Databricks) for AI workloads using Infrastructure-as-Code (IaC) tools like Terraform and Bicep.
- Create reusable self-service tooling, templates, and CI/CD workflows for data scientists and ML engineers.
- Govern AI systems with access control, audit trails, policy enforcement, and compliance monitoring (e.g., GDPR).
- Implement GenAI workloads using Azure AI Foundry, Azure AI Hub, Azure OpenAI, Amazon Bedrock, Anthropic Claude, Hugging Face, LangChain, etc.
- Implement infrastructure and DevOps practices for Agentic AI solutions using native Azure and AWS AI services.
- Collaborate with security and architecture teams to embed cloud security best practices in the AI platform.
- Contribute to incident response, troubleshooting, and root cause analysis of ML and GenAI workload failures and latency issues.
- Implement MLOps practices to manage and optimize the lifecycle of machine learning models, including monitoring, versioning, and retraining.
- Collaborate with data scientists, software engineers, and other stakeholders to ensure effective integration of AI solutions within the business.
- Stay up to date with the latest advancements in AI, cloud computing, and DevOps practices, and integrate relevant technologies into the platform.
- Review weekly or bi-weekly cloud cost reports, and identify and lead efforts on cloud cost-saving opportunities.
- Mentor junior engineers, providing technical leadership and fostering a culture of continuous learning.
- Ensure compliance with industry standards and best practices for data security and privacy.
Requirements
- 8 to 10+ years of experience in platform engineering, with a proven track record of designing, deploying, and managing scalable and secure cloud-based infrastructures, leveraging both Azure and AWS services.
- Experience with Azure services such as Azure AI services, Azure Search, Azure ML, Databricks, Azure Kubernetes Service, and AWS services like AWS SageMaker, AWS Bedrock and AWS Lambda.
- Exposure to Generative AI and Agentic AI ecosystems such as Azure OpenAI, Azure AI Foundry, Azure AI Hub, Bedrock, Anthropic Claude, OpenAI API, LlamaCloud, LangChain.
- Understanding of token usage, LLM prompt injection risks, jailbreak attempts, and mitigation techniques.
- Strong knowledge of governance, audit, observability, and compliance in cloud-based GenAI and ML ecosystems.
- Understanding of the Azure AI Evaluation SDK and AI Red Teaming prompt security scans.
- Good to have: experience with code assistant tools such as GitHub Copilot, Cursor, and Claude Code.
- Expertise in Azure DevOps or AWS CodePipeline, including setting up and managing CI/CD pipelines.
- Advanced experience with Azure Blob Storage, Cosmos DB, SQL, Key Vault, AWS S3, DynamoDB, AWS RDS, etc., and their integrations with AI services.
- Advanced understanding of networking concepts, including DNS management, load balancing, VPNs, and virtual networks (VNets).
- Advanced understanding of security concepts, including IAM roles, identities, Azure policies, and AWS SCPs.
- Experience with advanced authentication and authorization concepts across various cloud providers and platforms.
- Must have experience with Azure Policy, AWS SCP, AWS IAM, audit logging, Azure RBAC, etc.
- Mastery of infrastructure-as-code tools such as Azure ARM / Bicep, Terraform, CloudFormation, or equivalent.
- Proficiency in networking, DNS, load balancers, and cloud engineering services.
- Knowledge of Python programming and AI/ML libraries (TensorFlow, PyTorch, scikit-learn, etc.).
- Experience with containerization and orchestration tools such as Docker and Kubernetes.
- Good to have: knowledge of Azure Bot Framework, APIM, Application Gateway, M365 offerings such as M365 Copilot, AWS CDK, and the AWS SDK for Python (Boto3).
- Experience with monitoring tools like Grafana, Prometheus, Application Insights, Log Analytics Workspaces, and Azure Monitor.
- Strong problem-solving and analytical skills.
- Strong communication and collaboration skills to work effectively with diverse teams.
- Proven leadership abilities to guide and mentor junior engineers.
Travel Requirement:
Up to 10% travel should be expected with this role.
Relocation Assistance:
This role is eligible for relocation within country
Remote Type:
This position is a hybrid of office/remote working
Skills:
Agility core practices, Analytics, API and platform design, Business Analysis, Cloud Platforms, Coaching, Communication, Configuration management and release, Continuous deployment and release, Data Structures and Algorithms (Inactive), Digital Project Management, Documentation and knowledge sharing, Facilitation, Information Security, iOS and Android development, Mentoring, Metrics definition and instrumentation, NoSql data modelling, Relational Data Modelling, Risk Management, Scripting, Service operations and resiliency, Software Design and Development, Source control and code management
Legal Disclaimer:
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, socioeconomic status, neurodiversity/neurocognitive functioning, veteran status or disability status. Individuals with an accessibility need may request an adjustment/accommodation related to bp’s recruiting process (e.g., accessing the job application, completing required assessments, participating in telephone screenings or interviews, etc.). If you would like to request an adjustment/accommodation related to the recruitment process, please contact us.
If you are selected for a position and depending upon your role, your employment may be contingent upon adherence to local policy. This may include pre-placement drug screening, medical review of physical fitness for the role, and background checks.