At PwC, our people in infrastructure focus on designing and implementing robust, secure IT systems that support business operations. They enable the smooth functioning of networks, servers, and data centres to optimise performance and minimise downtime. Those in cloud operations at PwC will focus on managing and optimising cloud infrastructure and services to enable seamless operations and high availability for clients. You will be responsible for monitoring, troubleshooting, and implementing industry leading practices for cloud-based systems.
Driven by curiosity, you are a reliable, contributing member of a team. In our fast-paced environment, you are expected to adapt to working with a variety of clients and team members, each presenting varying challenges and scope. Every experience is an opportunity to learn and grow. You are expected to take ownership and consistently deliver quality work that drives value for our clients and success as a team. As you navigate through the Firm, you build a brand for yourself, opening doors to more opportunities.
Skills
Examples of the skills, knowledge, and experiences you need to lead and deliver value at this level include but are not limited to:
- Apply a learning mindset and take ownership for your own development.
- Appreciate diverse perspectives, needs, and feelings of others.
- Adopt habits to sustain high performance and develop your potential.
- Actively listen, ask questions to check understanding, and clearly express ideas.
- Seek, reflect, act on, and give feedback.
- Gather information from a range of sources to analyse facts and discern patterns.
- Commit to understanding how the business works and building commercial awareness.
- Learn and apply professional and technical standards (e.g. refer to specific PwC tax and audit guidance), and uphold the Firm's code of conduct and independence requirements.
Instructions
- Please update areas marked in red
- Link to Tips & Tricks for Writing PwC Job Description
- Quick Tips for Reviewing your JD!
- Make sure you have the appropriate header sentence based on the level of the JD (e.g. a Manager-level role should start with the appropriate descriptor: “Demonstrates extensive abilities and/or a proven record of success as a team leader:”). The appropriate header can be found in the Tips and Tricks document provided above.
- Be mindful of grammatical consistency. The list should be either all verb-driven or all noun-driven (but not both).
- When listing requirements under the required or preferred skills section, each sentence should end with a semicolon (;), except for the last bullet, which should end with a period (.).
Job Profile Name: *TC/Recruiting to Update*
Child Name: *TC/Recruiting to Update*
Global LoS: *TC/Recruiting to Update*
Global Network: *TC/Recruiting to Update*
Global Competency Network: *TC/Recruiting to Update*
Go-To-Market: Managed Services
Sector: Not Applicable
Programme Type: Experienced
Additional Responsibilities: (This field may be used to describe the daily role, duties and/or purpose of this Job Profile/Job Description. The field is limited to 500 characters, including spaces.)
Supports day-to-day reliability operations by monitoring system health, responding to incidents, improving runbook quality, and assisting in automation efforts. Helps ensure availability, performance, and operational stability across cloud and on-prem environments while learning core SRE practices.
Minimum Degree Required: Bachelor's
Degree Preferred: Bachelor's or Master's in Science, Computer Science, or Engineering
Minimum Years of Experience: 2-4 year(s)
Certifications Required: None
Certifications Preferred: AWS Cloud Practitioner; Azure Fundamentals; ITIL Foundation; observability, scripting, and coding certifications are also welcome.
Required / Mandatory Knowledge/Skills: (character count limit 5000) *PLEASE ONLY USE THIS FIELD IF THIS IS A MUST HAVE SKILL FOR APPLICANT*
- Foundational understanding of SRE principles such as SLIs, SLOs, error budgets, and reliability metrics;
- Experience monitoring system availability, latency, capacity, and resource utilization;
- Basic scripting skills (Python, Shell, PowerShell, or Go);
- Ability to execute runbooks, respond to incidents, and escalate appropriately;
- Familiarity with cloud platforms (AWS, Azure, or GCP);
- Basic troubleshooting across Linux or Windows environments;
- Understanding of logging, metrics, and alerting tools;
- Interest in automation, workflow optimization, and reducing manual tasks;
- Ability to document operational procedures and update knowledge repositories.
Preferred Knowledge/Skills: (character count limit 5000) *PLEASE MAKE THIS A BULLETED LIST WHERE EACH SENTENCE STARTS WITH THE SAME VERB TENSE (E.G. PROVIDES, DEVELOPS, FACILITATES, ETC.)*
- Supports monitoring and alert tuning for high-availability services;
- Supports incident response efforts by collecting logs, metrics, and diagnostic data;
- Supports development of small automation scripts to eliminate repetitive tasks;
- Supports execution of resilience tests, failover validation, and recovery checks;
- Supports deployment validation and post-implementation health checks;
- Supports capacity reviews and basic scalability assessments;
- Supports refinement of runbooks, SOPs, and operational documenta