Join the UAE’s largest bank and one of the world’s largest and safest financial institutions. Our focus is on creating value for our employees, customers, shareholders and communities, and on growing through differentiation, agility and innovation. We are looking for top talent, and your success is our success. Accelerate your growth and advance your career as you help us reach our goals. Be ready to make your mark at a top company in an exciting and dynamic industry.
Senior Engineer - Alerting and Incident Management
Job Description
Overall objectives
To establish and maintain an effective, intelligent, and timely alerting framework across infrastructure, application, and business services.
To coordinate and continuously improve the incident management lifecycle with a focus on early detection, rapid response, and root cause accountability.
To integrate observability data (logs, metrics, traces) into a unified alerting and incident response workflow.
To reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through automation, clear escalation paths, and operational discipline.
Role Specific Responsibilities
Manage and continuously improve the incident response process, including triage, escalation, status communications, and resolution tracking.
Act as the incident commander during major outages or high-severity issues, coordinating technical teams toward resolution.
Maintain and govern on-call schedules, escalation paths, and responder playbooks.
Integrate observability tools with incident management platforms to enable real-time, contextual alerting.
Lead and document root cause analysis (RCA) and ensure completion of follow-up actions and preventive measures.
Report on incident metrics and trends, identifying areas for resilience and process improvement.
General Functional Responsibilities
Maintain detailed documentation on alert rules, incident workflows, contact rosters, and escalation trees.
Ensure compliance with regulatory, audit, and risk management requirements related to incident response and system availability.
Collaborate with monitoring, logging, and APM peers to align telemetry signals with operational response.
Work with development, infrastructure, and support teams to embed alert and incident management best practices in SDLC and change management.
Participate in regular incident simulations and on-call readiness drills.
Drive continuous improvement through retrospective reviews, blameless post-mortems, and incident automation.
Qualifications
Core competencies required
Strong experience with alert management platforms such as Opsgenie, Splunk On-Call (formerly VictorOps), or ServiceNow Event Management.
Familiarity with routing rules, escalation policies, noise suppression, on-call schedules, and alert deduplication.
Deep understanding of the end-to-end incident management process—detection, triage, escalation, communication, and closure.
Proficient in running major incident bridges, documenting timelines, and leading post-incident reviews (PIRs/RCAs).
Calm and assertive in high-pressure incident scenarios.
Excellent communicator, able to coordinate with technical and business stakeholders during incidents.