Job Description

Job Title: Site Reliability Engineer

Level: Senior

Working Hours: Full Time (40h/Week)

Contract: Contractor

Location: Remote (APAC)

Your Team 👥

You will report to our Head Of Infrastructure and Deployment and join the Engineering team. The Site Reliability Engineering (SRE) team is dedicated to engineering, maintaining, and continuously improving the reliability, scalability, and performance of all critical Rocket.Chat systems and services. Our mission is to ensure an exceptional and uninterrupted experience for our users and customers, bridging the gap between development and operations to deliver value efficiently and automatically. On TheOrg you can view the complete structure of our organisation, including information about every team member, hiring managers and the size of each department.

Your Responsibilities ✏️

As a Senior Site Reliability Engineer, you will play a critical role in enhancing the reliability, performance, and scalability of Rocket.Chat's entire ecosystem. You will apply software engineering principles to infrastructure and operations, proactively preventing outages, optimizing system efficiency, and ensuring that new features and services are delivered with the highest standards of stability. Your expertise will be instrumental in delivering exceptional user experiences across our core platform, internal infrastructure, and customer-facing services.

Mandatory Hard Skills 🎯

  • Strong background in software engineering with expertise in large-scale distributed systems.
  • Expertise in Kubernetes, including operator development, and cloud platforms (e.g., AWS, GCP, Azure, OVH).
  • Proficiency in programming/scripting languages such as Go, Python, or Bash for tooling and operator development.
  • Deep, hands-on experience with monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, Loki).
  • Experience with Infrastructure as Code (IaC) tools like Terraform, Pulumi, or Ansible, and with CI/CD pipelines using tools like ArgoCD.
  • Solid understanding of networking fundamentals (TCP/IP, DNS, routing) and security principles.
  • Familiarity with database technologies such as MongoDB or Redis.

Desirable Hard Skills 💕

  • Practical experience with chaos engineering principles and tools.
  • Experience with disaster recovery planning, testing, and implementation.
  • Familiarity with agile management tools such as Jira.

Soft Skills ✨

  • Proactive Mindset: Anticipate and address potential issues before they impact users.
  • Collaboration: Work seamlessly with other teams, sharing knowledge and expertise to drive reliability.
  • Problem-Solving: Strong troubleshooting and analytical skills to identify the root cause of complex issues across diverse technical stacks.
  • Leadership: Guide and inspire team members, especially during incidents, and effectively communicate with both technical and non-technical stakeholders.
  • Data-Driven Decisions: Base decisions on metrics and data to drive improvements.
  • Passion: Genuine enthusiasm for what you do and how it contributes to our company's mission.
  • Dream: Proactively seek out opportunities and challenges to achieve extraordinary results. If you're someone who takes initiative and is always striving to improve, you'll fit right in.
  • Own: Take ownership of your work, set high standards for yourself, and be accountable for outcomes, demonstrating a strong sense of responsibility and commitment. Take full responsibility for the reliability and performance of all Rocket.Chat services and infrastructure.
  • Trust: Recognize the importance of trust and support, and actively work towards a collaborative and inclusive workplace.
  • Share: Communicate openly and transparently to ensure clarity and honesty in interactions.

What You'll Do 🖥️

  • Engineer & Operate Deployment & Platform Services: Design, develop, and maintain the Kubernetes Operators at the core of our managed hosting offerings, ensuring their reliability, scalability, and robust error handling.
  • Manage & Optimize Core Infrastructure: Oversee the reliability and performance of foundational infrastructure, including multiple Kubernetes clusters and critical services like ArgoCD, Traefik, and our monitoring stack.
  • Ensure Service Reliability & Uptime: Define, monitor, and enforce SLOs for all critical services, manage error budgets, and implement robust monitoring, alerting, and logging solutions.
  • Automate Operations & Reduce Toil: Develop and maintain automation frameworks for deployment, configuration, and operational tasks, building internal tools to streamline SRE workflows.
  • Lead Incident Management & On-Call Response: Act as a primary responder for critical alerts, lead blameless post-mortems, and continuously improve runbook documentation and disaster recovery plans.
  • Foster Cross-Functional Collaboration: Engage early in the product lifecycle to ensure reliability is built-in, and collaborate with Engineering, Security, and QA to integrate reliability best practices.
  • Implement Advanced Reliability Practices: Conduct proactive load testing, performance analysis, and chaos engineering experiments to identify system weaknesses and improve fault tolerance.

Benefits ✨

  • Fully Remote & Flexible Working Hours
  • Flexible Paid Time Off, Holidays and Vacation
  • Company Laptop
  • Remote Benefit
  • iTalki, Courses and Books
  • Stock Options
  • Multicultural Environment
  • Vibrant Company Culture

Check out our handbook to dive into each of our awesome benefits! At Rocket.Chat, we have tailored base pay ranges according to work locations. This approach ensures that we can competitively and consistently compensate our employees across different geographic markets.

  • While we define an initial seniority level and budget for each role, this can be adjusted during the hiring process. The selection process itself — including interviews and assessments — helps us better understand where the candidate fits within our career framework and which grade they should be positioned in.
  • To ensure fairness and consistency, all applications are accepted exclusively via our Careers site. Submissions through other channels will not be taken into consideration.

About Rocket.Chat 🚀

Rocket.Chat is the world's largest open-source communications platform. Built for organizations needing more control over their communications, Rocket.Chat Secure CommsOS™ is a communication platform that unifies messaging, voice, video, AI, and mission-critical applications, ensuring uncompromising security, compliance, and operational efficiency for governments, defense, and critical infrastructure organizations operating in highly regulated environments.

Tens of millions of users in over 150 countries and organizations such as Deutsche Bahn, the U.S. Navy, and Credit Suisse trust Rocket.Chat every day to keep their communications completely private and secure. At Rocket.Chat, we believe in reconnecting the world, one conversation at a time!

See yourself in that? Then apply now! Check out our handbook for more information about our rocket.

If you're interested in keeping up with new roles at Rocket.Chat, you can now set up custom job alerts. Just click the link, pick the types of roles you want to hear about, and get notified whenever there's a match.


Job Details

Role Level: Entry-Level
Work Type: Full-Time
Country: India
City: Greater Bengaluru Area
Company Website: https://rocket.chat
Job Function: Engineering
Company Industry/Sector: Software Development
