End-to-end service ownership: design for telemetry, security, resiliency, scalability, and performance; lead sizing/architecture; drive service health reviews and process simplification.
Incident management and prevention: lead postmortems/RCAs, coordinate fixes, define repair items, and implement data-driven prevention and continuous improvement.
AI/ML and GenAI delivery: design and integrate solutions with LLMs, RAG, agentic workflows, and conversational AI; build low-latency model serving and retraining pipelines.
Application engineering: develop performant microservices for distributed, containerized, cloud-native systems.
Automation: eliminate toil by automating operational workflows, recovery procedures, code delivery, and configuration management; build internal tools and reusable scripts/services to accelerate delivery and reduce errors.
Observability: define and implement monitoring, logging, alerting, and tracing strategies; establish SLOs/SLIs/error budgets; improve diagnostics and performance visibility for rapid triage.
Cross-functional collaboration: partner with product, operations, and data teams to translate requirements into secure, scalable solutions; communicate effectively with technical and non-technical stakeholders.
Minimum Qualifications
BS/MS in Computer Science or related field; 10+ years of software engineering in cloud environments.
Strong in distributed systems/microservices using java / python; SQL/data modeling; python for AI/automation.
SRE/DevOps expertise: systems and networking fundamentals, application security, observability, performance analysis, and incident response.
Proven SDLC excellence: code quality, reviews, version control, CI/CD, testing, and release engineering.
Excellent written and verbal communication; English fluency.
Preferred/Technical Skills
AI/ML/GenAI: experience with foundational models, RAG, agentic architectures; model deployment, optimization, monitoring, and retraining.
Cloud and containers: experience with containerization, orchestration, and resilient, fault-tolerant microservices.
Observability: hands-on experience designing dashboards, alerts, traces, logs, and metrics; defining SLOs/SLIs and error budgets; on-call readiness and runbook quality.
Operations: performance tuning across java / python and SQL for large-scale enterprise applications; strong Linux/Unix expertise; capacity planning and reliability reviews.
Automation and scripting: proficiency in scripting to automate operational workflows, build tooling, and CI/CD tasks (e.g., shell scripting, python, configuration-as-code, task runners).
Familiarity with enterprise ERP applications and standard DevOps tooling and practices.
Qualifications
Career Level - IC4
About Us
As a world leader in cloud solutions, Oracle uses tomorrow’s technology to tackle today’s challenges. We’ve partnered with industry-leaders in almost every sector—and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing accommodation-request_mb@oracle.com or by calling +1 888 404 2494 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
Searching, interviewing and hiring are all part of the professional life. The TALENTMATE Portal idea is to fill and help professionals doing one of them by bringing together the requisites under One Roof. Whether you're hunting for your Next Job Opportunity or Looking for Potential Employers, we're here to lend you a Helping Hand.
Disclaimer: talentmate.com is only a platform to bring jobseekers & employers together.
Applicants
are
advised to research the bonafides of the prospective employer independently. We do NOT
endorse any
requests for money payments and strictly advice against sharing personal or bank related
information. We
also recommend you visit Security Advice for more information. If you suspect any fraud
or
malpractice,
email us at abuse@talentmate.com.
You have successfully saved for this job. Please check
saved
jobs
list
Applied
You have successfully applied for this job. Please check
applied
jobs list
Do you want to share the
link?
Please click any of the below options to share the job
details.
Report this job
Success
Successfully updated
Success
Successfully updated
Thank you
Reported Successfully.
Copied
This job link has been copied to clipboard!
Apply Job
Upload your Profile Picture
Accepted Formats: jpg, png
Upto 2MB in size
Your application for Principal Service Reliability Engineer
has been successfully submitted!
To increase your chances of getting shortlisted, we recommend completing your profile.
Employers prioritize candidates with full profiles, and a completed profile could set you apart in the
selection process.
Why complete your profile?
Higher Visibility: Complete profiles are more likely to be viewed by employers.
Better Match: Showcase your skills and experience to improve your fit.
Stand Out: Highlight your full potential to make a stronger impression.
Complete your profile now to give your application the best chance!