Overview
The Windows Servicing & Delivery (WSD) team investigates and remediates security vulnerabilities and high-severity reliability issues across the Windows platform. The Storage & File Systems team within WSD owns NTFS, ReFS, Storage Spaces Direct (S2D), Windows Server Failover Clustering (WSFC), Cluster Shared Volumes (CSV), the Volume Shadow Copy Service (VSS), and the full Windows storage driver stack — from NVMe and iSCSI miniport drivers through to the file system minifilter layer and user-mode storage management APIs.
This role sits at the intersection of kernel engineering and enterprise customer reliability. You will resolve the most complex ICMs escalated by top-tier enterprise and cloud customers — issues that have defeated Tier 1 and Tier 2 support and require deep ownership of source code, cluster state machines, and file system on-disk structures. Alongside security vulnerability work, you will own reliability fixes for S2D rebuild storms, CSV failover edge cases, NTFS metadata corruption, and NVMe queue-depth exhaustion scenarios that impact Fortune 500 production environments.
Responsibilities
- Own end-to-end resolution of critical ICMs escalated from top enterprise customers — analyze memory dumps, ETW traces, Storage Spaces logs, and cluster event logs to root-cause failures in S2D, WSFC, CSV, NTFS, and ReFS that cannot be resolved by field support.
- Investigate and fix security vulnerabilities in the Windows storage stack: privilege escalation through NTFS reparse points and junctions, information disclosure via uninitialized kernel pool in file system drivers, and denial-of-service through crafted on-disk structures in ReFS or NTFS.
- Design and implement reliability and correctness fixes in kernel-mode storage miniport drivers (StorPort, NVMe, iSCSI, SMB Direct/RDMA) and file system filter drivers — owning the full fix lifecycle from root cause through regression test to servicing release.
- Work directly with Storage Spaces Direct (S2D): diagnose and fix rebuild, rebalance, and fault-domain logic errors; investigate cache tier promotion/demotion bugs; resolve pool fragmentation and storage bus layer (SBL) issues in hyper-converged deployments.
- Maintain and harden Windows Server Failover Clustering (WSFC) and Cluster Shared Volumes (CSV): resolve quorum edge cases, CSV ownership transfer failures, cluster validation regressions, and inter-node storage arbitration deadlocks.
- Contribute to the Volume Shadow Copy Service (VSS) and Windows Backup infrastructure: fix provider/requester interaction bugs, VSS writer timeouts in large-scale environments, and shadow copy metadata consistency failures.
- Develop diagnostic tooling and automated regression suites for the storage stack — including kernel debugger extensions (!sdt, !storport analysis), ETW provider instrumentation, and Storage Spaces health model validation.
- Collaborate with MSRC for coordinated disclosure and patch delivery on storage-related CVEs; participate in threat modeling and security design reviews for new file system and storage features.
- Engage directly with enterprise customers and Partner Technical Advisors (PTAs) during active outages to provide expert-level guidance and expedite fix delivery through the servicing pipeline.
- Mentor engineers; raise the technical bar through code reviews, design reviews, and active participation in WSD hiring loops.
Qualifications
Required Qualifications:
Kernel & Storage Driver Engineering
- 8+ years of software engineering with deep expertise in C and C++ for Windows kernel-mode development.
- Hands-on experience with Windows storage driver stack: StorPort miniport drivers, storage filter drivers, or file system minifilter drivers — understanding of IRP flow, completion routines, and cancel-safe queue management.
- Solid grounding in Windows kernel fundamentals.
- Demonstrated ability to perform crash dump analysis and live kernel debugging using WinDbg.
File Systems
- Working knowledge of NTFS on-disk structures: MFT record layout, attribute types, USN journal, and the NTFS log file for crash recovery.
- Familiarity with ReFS (Resilient File System): B+ tree metadata structure, integrity streams, block cloning, and the differences in crash recovery model versus NTFS.
- Experience debugging file system corruption scenarios: cross-linked clusters, orphaned MFT records, directory entry inconsistencies, and reparse point cycles.
- Understanding of Windows file system minifilter architecture: altitude registration, pre/post operation callbacks.
Clustering & High Availability Storage
- Hands-on experience with Windows Server Failover Clustering (WSFC): quorum models (Node Majority, Disk Witness, Cloud Witness), cluster network configuration, and the cluster API.
- Deep understanding of Cluster Shared Volumes (CSV): CSV file system (CSVFS) redirected vs. direct I/O modes, CSV ownership arbitration, and coordination with the Storage Bus Layer.
- Experience with Storage Spaces Direct (S2D): storage pool creation, virtual disk provisioning, cache tier architecture (NVMe + SSD + HDD), fault domain awareness, and rebuild/rebalance behavior under node and drive failure.
- Familiarity with storage connectivity protocols in clustered environments: SMB Direct (RDMA), iSCSI multipath (MPIO/DSM), NVMe-oF, and Fibre Channel HBA integration with StorPort.
Customer Escalation & Diagnostics
- Proven ability to work high-urgency customer escalations (ICMs / CritSits): triage under time pressure, communicate root cause to non-technical stakeholders, and deliver targeted fixes through the Windows servicing pipeline.
- Experience reading and interpreting Storage Spaces diagnostic packages, cluster logs, and ETW traces (StorPort, ReFS, NTFS providers) to reconstruct failure timelines.
- Familiarity with Microsoft Support tooling: ProcMon and xperf captures, and WPA (Windows Performance Analyzer) for I/O latency profiling.
Preferred Qualifications:
- Experience with Azure Stack HCI: S2D on validated hardware, stretched clustering across sites, Azure Arc integration, and software-defined storage policy management.
- Knowledge of NVMe specification internals: submission/completion queue mechanics, NVMe error log page analysis, and namespace management — beyond just driver-level consumption.
- Familiarity with SMB protocol internals (SMBv3): persistent handles, witness service (SWN), transparent failover, and scale-out file server (SOFS) architecture.
- Experience with deduplication and compression engines (Windows Data Deduplication): chunk store architecture, scrubbing, and garbage collection edge cases.
- Knowledge of Windows BitLocker full-volume encryption integration with clustered storage and its interaction with CSV and S2D volumes.
- Published CVE credits, conference presentations, or technical blog posts on file system or storage security topics.
- MS/BS in Computer Science, Electrical Engineering, or a closely related field.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.