Talentmate
India
5th June 2026
2606-2039-234
As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed: we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform. We work on large-scale distributed systems, processing almost 3 trillion events per day, and this traffic is growing daily. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.
About The Role:
We’re seeking a Sr. Engineer - ML Platform to maintain and optimize CrowdStrike’s mission-critical ML infrastructure. You’ll diagnose complex distributed systems issues and ensure platform reliability for infrastructure processing billions of events daily.
What You’ll Do:
Platform Reliability & Debugging:
- Diagnose and resolve issues across Ray, Spark, Airflow, MLflow, JupyterHub, Kubeflow, and SLURM
- Perform root cause analysis on production incidents affecting training and inference pipelines
- Debug performance bottlenecks, resource contention, memory leaks, and scheduling conflicts
- Develop debugging tools and diagnostic frameworks

System Optimization & Performance:
- Profile and optimize Ray clusters and Spark jobs on K8s and Cloud (EMR/Dataproc)
- Troubleshoot JupyterHub spawner issues, kernel crashes, and resource allocation
- Optimize SLURM job scheduling, GPU allocation, and HPC cluster utilization

Infrastructure & Monitoring:
- Build observability solutions and automated health checks
- Develop runbooks, alerting workflows, and incident response procedures
- Maintain platform stability metrics (SLAs, error rates, latency)

Collaboration:
- Partner with ML and ML Platform engineers to resolve workflow issues
- Conduct post-mortems and mentor on debugging techniques
What You’ll Need:
| Role Level: | Not Applicable | Work Type: | Full-Time |
|---|---|---|---|
| Country: | India | City: | Bengaluru, Karnataka |
| Company Website: | http://www.crowdstrike.com | Job Function: | Data Science & AI |
| Company Industry/ Sector: | Computer and Network Security | | |