Talentmate
United Arab Emirates
1st September 2025
2509-3235-14
Lead Site Reliability Engineer (SRE)
Department: IT
Position Overview
Emcode architects and operates the sovereign telematics backbone for the UAE's most critical government and enterprise entities. We are seeking a Lead Site Reliability Engineer (SRE) to take ownership of the deployment, maintenance, and resilience of our mission-critical SaaS/PaaS infrastructure. This role is the bedrock of our commitment to delivering 99.99% uptime and guaranteeing absolute data sovereignty.
The Lead SRE will be the subject matter expert for our entire production environment, ensuring the scalability, security, and performance of the platforms that process over 400 billion data points annually. The ideal candidate is a master of Kubernetes distributions (e.g., Rancher, OpenShift) and an expert in managing large-scale, distributed data systems like Cassandra, ScyllaDB, and PostgreSQL. You will be responsible for the automation, security hardening, and disaster recovery strategies that underpin our national-scale telematics ecosystems, including SecurePath and Shahin.
Key Responsibilities
Infrastructure Architecture & Automation
● Design, deploy, and maintain the scalable, secure, and resilient Rancher Kubernetes-based infrastructure for all Emcode SaaS and PaaS offerings.
● Automate infrastructure provisioning, configuration management, and application deployment pipelines to enhance velocity and reliability.
● Manage and optimize our high-throughput, distributed data clusters, including Cassandra, ScyllaDB, PostgreSQL, MongoDB, and Elasticsearch, as well as Kafka and/or RabbitMQ brokers, ensuring data integrity and performance.
● Develop and maintain sophisticated monitoring, logging, and alerting systems to ensure proactive issue identification and resolution.
System Resilience & Security
● Master and manage all aspects of our Linux-based environment, primarily on Ubuntu Server, ensuring systems are hardened, patched, and configured according to industry best practices.
● Architect, implement, and regularly test comprehensive disaster recovery and business continuity plans to uphold our stringent 99.99% uptime Service Level Agreement (SLA).
● Implement and enforce rigorous security protocols across the entire infrastructure stack, protecting sensitive telematics data and ensuring compliance with DESC and SIRA standards.
● Conduct performance tuning, capacity planning, and cost optimization for our sovereign self-hosted and cloud infrastructure.
Operational Excellence & Collaboration
● Serve as the highest point of escalation for complex infrastructure-related incidents, leading troubleshooting and resolution efforts.
● Collaborate closely with the Software Engineering team to create and refine CI/CD pipelines for our Go-based microservices and other applications.
● Create and maintain detailed documentation for infrastructure architecture, system configurations, and operational procedures.
● Provide mentorship and technical guidance to other members of the technology team.
Required Qualifications
● A Bachelor’s degree in Computer Science, Systems Engineering, or a related technical field.
● A minimum of 8 years of hands-on experience in a Site Reliability Engineering (SRE), DevOps, or Systems Engineering role, managing large-scale, 24/7 production environments.
● Expert-level mastery of self-hosted Kubernetes is an absolute necessity, including cluster design, deployment, scaling, and security.
● Proven experience in deploying and managing large-scale distributed NoSQL databases (Cassandra, ScyllaDB) and relational databases (PostgreSQL).
● Deep proficiency in Linux administration, particularly with Ubuntu Server and/or SUSE Enterprise, including server hardening procedures and an understanding of system hard and soft limits.
● Strong scripting and automation skills (e.g., Bash, Python, GoLang) and experience with Infrastructure as Code (IaC) tools (e.g., Pulumi, Terraform, Ansible).
● Deep proficiency in software load balancing (HAProxy) and hardware load balancing (FortiGate, Barracuda, Palo Alto), with a track record of successful implementations on high-load websites.
● Previous experience with network-attached storage (NAS), storage area networks (SAN), and self-hosted S3 solutions (e.g., MinIO).
Desired Skills & Competencies
● Security Acumen: A profound understanding of modern infrastructure security principles, including network security, access control, and vulnerability management.
● Disaster Recovery: Verifiable experience in designing and executing successful disaster recovery drills and strategies for mission-critical systems.
● Database Management: Advanced knowledge of database performance tuning, replication, and backup/recovery strategies.
● Go Language: Familiarity with Go (Golang) and experience supporting the deployment of Go-based applications and tools is highly advantageous. Familiarity with Python is also valued.
● Problem-Solving: Elite analytical and troubleshooting skills with the ability to diagnose and resolve complex issues across the entire technology stack.
● Ownership: A proactive and accountable professional who thrives in a high-stakes environment and takes ultimate ownership of infrastructure stability and performance.
● Collaboration: Strong ability to work effectively with software development teams to build a culture of reliability and operational excellence.
Role Level: | Entry-Level | Work Type: | Full-Time |
---|---|---|---|
Country: | United Arab Emirates | City: | Dubai |
Company Website: | https://www.uxe.ai/ | Job Function: | Information Technology (IT) |
Company Industry/Sector: | Other |