The Junior Site Reliability Engineer (SRE) is responsible for ensuring the availability, performance, and reliability of production systems hosted on Google Cloud Platform (GCP), with a strong focus on voice and real-time communication services. This role provides L2 production support, actively manages incidents, and drives root cause analysis to prevent recurrence. You will work closely with engineering, network, and operations teams to improve system resilience, automate operational tasks, and meet SLA commitments. The ideal candidate brings a strong mix of cloud reliability engineering and voice/VoIP technical expertise in a live production environment.
Specific Duties And Responsibilities
Monitor Production Systems: Use monitoring tools (e.g., Cloud Monitoring) to ensure the health and performance of cloud-based production systems on Google Cloud Platform (GCP).
Incident Management: Respond to production incidents, triage issues, and ensure timely resolution. Perform root cause analysis (RCA) and document findings.
Performance Tuning: Analyze system performance, identify bottlenecks, and recommend improvements to service reliability, scalability, and speed.
System Alerts and Incident Escalation: Set up and maintain system alerts to proactively detect issues. Escalate critical issues to appropriate teams and ensure swift resolution.
Collaboration with Engineering: Work closely with development and operations teams to ensure smooth production releases, provide feedback on system performance, and implement monitoring solutions for new services.
System Documentation: Maintain documentation of system configurations, monitoring setups, and incident resolutions to support knowledge sharing across teams.
Service Level Agreements (SLAs): Track and report on SLA performance, ensuring that production services meet predefined availability and reliability standards.
Proactive System Health Checks: Conduct routine system health checks, reviewing logs and performance metrics, to ensure system uptime.
Disaster Recovery and Backup: Monitor backup systems and ensure that disaster recovery procedures are in place and tested.
COMPETENCIES
Core Competencies
3+ years of experience in cloud production support, Site Reliability Engineering, or System Reliability roles
3+ years of hands-on experience with Google Cloud Platform (GCP), including Compute Engine, GKE, Cloud Monitoring, Cloud Logging, and Cloud Storage
3+ years of experience using monitoring and observability tools to track system health and performance
3+ years of experience analyzing system performance metrics (CPU, memory, disk, network) and diagnosing issues
3+ years of experience managing incidents and troubleshooting live production systems
3+ years of experience in scripting or automation using Bash, Python, or similar languages
Complementary Competencies
Strong experience with VoIP and UC technologies including SIP, RTP/SRTP, WebRTC, SBCs (Ribbon, Oracle, AudioCodes), SIP trunks, gateways, and voice codecs (G.711, G.729)
Proven ability to troubleshoot IP telephony and real-time communications using tools such as Wireshark and network analyzers
Solid understanding of network fundamentals (TCP/IP, VLANs, routing, switching, QoS) and voice security best practices (TLS, SRTP, firewalls)
Experience integrating voice, contact center (ACD/IVR), and UC platforms within cloud-native and hybrid environments
Proficiency in automation and scripting for voice and system management (Python, Bash, PowerShell)
Experience with observability and monitoring tools (Prometheus, Grafana, Zabbix, Elastic Stack)
Hands-on exposure to network and VoIP analysis tools such as Netscout NG1 and Wireshark
Familiarity with automation and CI/CD tools (Ansible, n8n, Jenkins, GitLab CI/CD)
Exposure to multi-cloud environments (AWS, Azure)
Certifications (Preferred)
CCNA (Collaboration) or CompTIA Network+
Cloud certifications (GCP, AWS, or Azure)
Qualifications
Educational Qualifications
Bachelor’s degree in computer science, Information Technology, or related field.