At PwC, our people in infrastructure focus on designing and implementing robust, secure IT systems that support business operations. They enable the smooth functioning of networks, servers, and data centres to optimise performance and minimise downtime. Those in cloud operations at PwC will focus on managing and optimising cloud infrastructure and services to enable seamless operations and high availability for clients. You will be responsible for monitoring, troubleshooting, and implementing industry-leading practices for cloud-based systems.
Focused on relationships, you are building meaningful client connections, and learning how to manage and inspire others. Navigating increasingly complex situations, you are growing your personal brand, deepening technical expertise and awareness of your strengths. You are expected to anticipate the needs of your teams and clients, and to deliver quality. Embracing increased ambiguity, you are comfortable when the path forward isn’t clear, you ask questions, and you use these moments as opportunities to grow.
Skills
Examples of the skills, knowledge, and experiences you need to lead and deliver value at this level include but are not limited to:
- Respond effectively to the diverse perspectives, needs, and feelings of others.
- Use a broad range of tools, methodologies and techniques to generate new ideas and solve problems.
- Use critical thinking to break down complex concepts.
- Understand the broader objectives of your project or role and how your work fits into the overall strategy.
- Develop a deeper understanding of the business context and how it is changing.
- Use reflection to develop self-awareness, enhance strengths and address development areas.
- Interpret data to inform insights and recommendations.
- Uphold and reinforce professional and technical standards (e.g. refer to specific PwC tax and audit guidance), the Firm's code of conduct, and independence requirements.
Instructions
- Please update areas marked in red.
- Link to Tips & Tricks for Writing PwC Job Description
- Quick Tips for Reviewing your JD!
- Make sure you have the appropriate header sentence based on the level of the JD (e.g. a Manager-level role should start with the appropriate descriptor, “Demonstrates extensive abilities and/or a proven record of success as a team leader:”). The appropriate header can be found in the Tips and Tricks document provided above.
- Be mindful of grammatical consistency. The list should be either all verb-driven or all noun-driven (but not both).
- When listing requirements under the required or preferred skills section, each bullet should end with a semi-colon (;), except for the last bullet, which should end with a period (.).
Job Profile Name: *TC/Recruiting to Update*
Child Name: *TC/Recruiting to Update*
Global LoS: *TC/Recruiting to Update*
Global Network: *TC/Recruiting to Update*
Global Competency Network: *TC/Recruiting to Update*
Go-To-Market: Managed Services
Sector: Not Applicable
Programme Type: Experienced
Additional Responsibilities: (This field may be used to describe the daily role, duties and/or purpose of this Job Profile/Job Description. The field is limited to 500 characters, including spaces.)
Leads reliability improvements across applications, platforms, and cloud systems. Drives automation, enhances observability, optimizes performance, and conducts root-cause analysis. Partners with engineering teams to reduce toil, improve operational maturity, and strengthen service resilience.
Minimum Degree Required: Bachelor’s
Degree Preferred: Bachelor’s or Master’s degree in Science, Computer Science, or Engineering
Minimum Years of Experience: 5-7 year(s)
Certifications Required: None
Certifications Preferred: AWS Solutions Architect Associate; Azure Administrator; Kubernetes CKA; Terraform Associate; ITIL Foundation; observability certifications; scripting and coding certifications.
Required / Mandatory Knowledge/Skills: (character count limit 5000) *PLEASE ONLY USE THIS FIELD IF THIS IS A MUST HAVE SKILL FOR APPLICANT*
- Strong understanding of SRE practices including SLIs/SLOs, error budgets, service health, and operational KPIs;
- Ability to automate operational tasks using Python, Shell, PowerShell, Go, or similar languages;
- Experience improving alerting systems, reducing noise, and refining observability instrumentation;
- Proficiency with cloud platforms and core services (compute, storage, networking, serverless);
- Experience executing root-cause analysis and problem management;
- Ability to lead incident response and coordinate cross-team troubleshooting;
- Experience identifying systemic reliability gaps and proposing engineering solutions;
- Ability to design performance tests, validate reliability risks, and assess scalability;
- Strong communication skills for partnering with development, operations, and leadership.
Preferred Knowledge/Skills: (character count limit 5000)* PLEASE MAKE THIS A BULLETED LIST WHERE EACH SENTENCE STARTS WITH THE SAME VERB TENSE (I.E. PROVIDES, DEVELOPS, FACILITATES, ETC.)
- Leads tuning of monitoring rules, dashboards, and reliability metrics;
- Leads development of automation to reduce operational toil and manual interventions;
- Leads incident response actions and service stabilization procedures;
- Leads post-incident reviews and contributes to long-term fixes;
- Leads resilience initiatives such as chaos testing and failover drills;
- Leads capacity forecasting and risk identification;
- Leads refinement of operational standards, documentation, and runbooks;
- Leads collaboration with product and engineering teams to embed reliability requirements.