125 Site Reliability Engineer jobs in Singapore

Site Reliability Engineer

Singapore, Singapore Thales

Posted today

Job Description

Location: Singapore, Singapore

Thales people architect identity management and data protection solutions at the heart of digital security. Businesses and governments rely on us to bring trust to the billions of digital interactions they have with people. Our technologies and services help banks exchange funds, people cross borders, energy become smarter, and much more. Over 30,000 organizations already rely on us to verify identities, grant access to digital services, analyze data, and encrypt information to secure the connected world.

Established in Singapore since 1973, Thales supports aerospace, defence, security, ground transportation, and digital identity sectors, employing over 2,100 people across all business areas.

Responsibilities:
  1. Manage ODC products in GCP following the SRE approach within a DevOps team.
  2. Develop and maintain Infrastructure as Code (IaC) and automation tools.
  3. Provide technical direction for high-impact changes requiring Tier II input.
  4. Support operations tasks to shape the product roadmap and ensure operational readiness.
  5. Complete handovers to Tier I and II teams to meet SLAs.
  6. Monitor systems with real-time monitoring tools.
  7. Review technical documentation for products and clients.
  8. Maintain the integrity of solution baselines and architecture.
  9. Guide the evolution of services and conduct technical analyses.
  10. Deploy Thales products in cloud environments.
  11. Perform performance tuning, technological updates, and monitoring.
  12. Conduct onboarding testing, communicate risks, and develop mitigation plans.
  13. Provide 24/7 on-call support in shifts.
Requirements:
  1. Degree in Computer Science or related field.
  2. Hands-on experience with Kubernetes and GCP in a production environment.
  3. 4+ years in infrastructure and application development and deployment.
  4. Knowledge of Agile methodologies and service delivery best practices.
  5. Experience with GCP, monitoring tools, networking, Linux.
  6. Strong skills in system integration, automation tools like GitLab, and scripting (Shell/Python).
  7. Proficiency in Linux, TCP/IP, HTTP(S).
  8. Knowledge of Docker, Kubernetes, and cloud services; Kubernetes certification and telecom experience are highly preferred.
  9. Scrum certification is a plus.
Other Information:
  • Location: One North
  • Hours: Monday-Friday, 9am-6pm
  • On-call support in rotation (average once every 2 months)

At Thales, we offer careers, not just jobs. With a global presence of 80,000 employees, our mobility policy supports career development worldwide. We embrace flexibility for a smarter way of working. Great journeys start here—apply now!


Site Reliability Engineer

Singapore, Singapore Point72 Asset Management, L.P

Posted today

Job Description

A Career with Point72’s Technology Team

As Point72 reimagines the future of investing, our Technology group is constantly improving our company’s IT infrastructure, positioning us at the forefront of a rapidly evolving technology landscape. We’re a team of experts experimenting, discovering new ways to harness the power of open-source solutions, and embracing enterprise agile methodology. We encourage professional development to ensure you bring innovative ideas to our products while satisfying your own intellectual curiosity.


What you’ll do

You will play a highly critical operational role where you will apply a combination of software and systems engineering skills to develop and maintain a complex set of distributed, real-time systems that serve critical stakeholders in Point72’s Global Macro business.

You will focus on optimizing the operations of existing systems and infrastructure in an efficient manner, through a strict adherence to automation and tooling. Specifically, you will:

  • Build out foundational technical components of an extensive SRE program across multiple complex systems, both new and existing
  • Collaborate with our development and quant teams to ensure that ongoing change is consistent with a pre-determined, measurable set of SLOs spanning multiple complex user interactions with our systems
  • Monitor system capacity and performance, identifying and addressing potential future bottlenecks and sources of instability before they become impactful to our stakeholders
  • Review and provide feedback on automation code developed by peers to maintain high standards of code quality and efficiency
  • Troubleshoot and resolve system issues, analyzing their impact on infrastructure and service operations
  • Participate in or lead design reviews with peers and stakeholders, evaluating and selecting the best technologies and automation strategies for our needs

What’s required

We are looking for highly motivated, proactive engineers, who enjoy the challenge of working with complex systems and environments. Specifically, you should have:

  • Demonstrated work experience as a site reliability engineer or similar role
  • Strong coding skills: Python and PowerShell are required, along with the ability to read C# and write basic applications in it
  • Experience with Windows and Linux based operating systems, Cloud-based services and infrastructure (AWS), Infrastructure as Code (Terraform) and Configuration Management (Ansible)
  • Experience with container technologies: Docker, Kubernetes, AWS EKS and ECS
  • Self-motivated individual with great communication and interpersonal skills, and a very strong sense of ownership
  • Commitment to the highest ethical standards

We take care of our people

We invest in our people, their careers, their health, and their well-being. When you work here, we provide:

  • Health care benefits
  • Generous parental and family leave policies
  • Mental and physical wellness programs
  • Volunteer opportunities
  • Non-profit matching gift program
  • Support for employee-led affinity groups representing women, minorities and the LGBT+ community
  • Tuition assistance

About Point72

Point72 Asset Management is a global firm led by Steven Cohen that invests in multiple asset classes and strategies worldwide. Resting on more than a quarter-century of investing experience, we seek to be the industry’s premier asset manager through delivering superior risk-adjusted returns, adhering to the highest ethical standards, and offering the greatest opportunities to the industry’s brightest talent. We’re inventing the future of finance by revolutionizing how we develop our people and how we use data to shape our thinking. For more information, visit



Site Reliability Engineer

Singapore, Singapore LANDI Global

Posted today

Job Description

As a Site Reliability Engineer, you will assist in the operation and maintenance of LANDI Global's infrastructure. Your responsibilities will include supporting the platform's reliability and performance while learning from senior engineers.

Key Responsibilities:

  • Help build and maintain platform infrastructure across various environments.
  • Collaborate with the R&D team to ensure platform availability and scalability.
  • Assist in implementing monitoring and alerting systems for timely issue resolution.
  • Support the maintenance of Disaster Recovery plans for business continuity.
  • Analyze performance metrics and contribute to cost optimization strategies.
  • Participate in automated testing and CI/CD processes, and help improve deployment efficiency.
  • Help manage incident reporting and change management processes.
  • Provide operational support for platforms and assist with production issues.
  • Participate in a 24/7 standby rotation.

EXPERIENCE

  • At least 3 years of experience in a similar capacity
  • Excellent oral and written communication in English.

PREFERRED SKILLS

Candidates should ideally have experience in some of the following technologies:

  • Experience in various cloud technologies (e.g. AWS, Azure)
  • Experience in high-level programming or scripting languages
  • Experience in monitoring tools (e.g. Prometheus, Grafana, Zabbix)
  • Experience in configuration management tools (e.g. Ansible, Chef, Puppet)
  • Experience in SQL databases (e.g. Postgres, MySQL)
  • Experience in load balancing and reverse proxies (e.g. Nginx)
  • Experience in CI/CD tools (e.g. Jenkins, GitLab)
  • Experience in containerization (e.g. Docker, K8s)

Seniority level: Associate
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: IT Services and IT Consulting


Site Reliability Engineer

Singapore, Singapore Apple Inc.

Posted today

Job Description

There is a lot that goes into building the most secure yet user-friendly devices in the world. We are a unique Software Development group with a charter to secure our platforms, which include iOS software, iOS devices, and Mac. We build solutions that are used by our customers, engineering teams, and manufacturing environments.

We are looking for a Site Reliability Engineer (SRE) who will be responsible for deploying, monitoring, troubleshooting, and developing tools for all of the team's solutions. The SRE position requires a mix of strategic engineering and design along with hands-on technical work. You will have experience as a Systems Administrator or Programmer who has moved on to DevOps/automation in your career. You will configure, tune, and tackle multi-tiered systems to achieve optimal application performance, stability, and availability. You will work closely with the systems engineers, network engineers, database administrators, monitoring team, and information security team. For this position, strict application security and high availability requirements must be met consistently to achieve optimal solutions.

This is a rare team focused on security initiatives, providing critical IT solutions across most of Apple’s product lines. These solutions are used from the manufacturing space all the way to customer-facing products. We are looking for a hardworking self-starter who can excel in a dynamic environment and bring their passion to ensuring the quality and reliability of the solutions we maintain.

Description

Review hardware, software infrastructure and application functionality for optimization. Identify performance bottlenecks. Responsible for the full system lifecycle including configuration, code deployment in user acceptance test and production environments. Monitor infrastructure and application services and drive incident management. Collaborate with Apple's production support team, application engineers, project managers, systems engineers, network engineers, database administrators, and QA team to effectively ensure availability and reliability of solutions.

Minimum Qualifications
  • Unix or Linux administration and performance tuning skills, with 3 to 5 years of leading services in a large-scale *nix environment.
  • Runtime configuration and troubleshooting of Java and JVM technologies, or proficiency in Python, Go, or another scripting language.
  • Experience with DevOps tools, processes, and culture.
  • Proven automation experience using Ansible, Chef, Jenkins, or Puppet.
Preferred Qualifications
  • Oracle DB knowledge and troubleshooting skills.
  • Infrastructure knowledge of Networks, load balancers, Firewalls and WAF.
  • SDLC and release engineering, including source code repositories and build tools such as SVN and Git.
  • Network, System and Application Security knowledge.
  • Experience with Kafka or other message queueing technology a plus.

Site Reliability Engineer

Singapore, Singapore Visier Solutions Inc

Posted today

Job Description

Visier is the global leader in AI-powered people analytics, workforce planning, and compensation management solutions, helping organizations gain a Workforce AI Edge. With over 60,000 customers in 75 countries, including enterprises like BASF, Panasonic, and Ford Motor Company, we empower businesses to understand the relationship between people and work, adapt faster to change, and drive better outcomes.

Backed by leading investors and valued at $1B, Visier is at the forefront of transforming the HR landscape through innovation and data-driven insights. Join us in our mission to unlock the business-transforming potential of people data.

Visier is seeking a skilled Site Reliability Engineer to join our dynamic team. As an SRE at Visier, you'll directly contribute to the reliability and scalability of our cloud-based analytics platform. You'll work alongside experienced engineers, tackling complex infrastructure challenges and mastering essential SRE practices. You'll gain hands-on experience with technologies like Kubernetes, Kafka, Cassandra, and a comprehensive suite of AWS services, building a strong foundation to develop your career with Visier in the future!

What you'll be doing.
  • Architect and Automate: Design, develop, and deploy infrastructure as code using Terraform and Packer, ensuring high availability and scalability in our AWS environment.
  • Pipeline Mastery: Build and optimize robust CI/CD pipelines with Jenkins and Groovy, streamlining deployment processes and enhancing release velocity.
  • Security by Design: Implement and enhance security best practices within our infrastructure, safeguarding sensitive data and ensuring compliance.
  • Infrastructure Optimization: Troubleshoot, monitor, and improve the performance and reliability of critical infrastructure services, including Kong API gateway, Cassandra, PostgreSQL, Consul, Vault, and Kafka.
  • Develop SRE Tools: Write and maintain automation scripts and tools using Python to streamline operational tasks and improve efficiency.
  • Contribute to System Design: Participate in system design discussions and contribute to the evolution of our infrastructure architecture.
  • Incident Response: Participate in on-call rotations and contribute to incident response efforts, learning to diagnose and resolve production issues.
What you'll bring to the table.
  • A strong foundation in software development principles and a passion for infrastructure as code.
  • Proficiency in at least one programming language (Python, Java, Scala, Go, etc.).
  • A solid understanding of Linux systems administration and networking fundamentals.
  • A proactive approach to problem-solving, with a strong sense of ownership and accountability.
  • A desire to learn and grow in a fast-paced, collaborative environment.
  • A strong understanding of CI/CD concepts.
  • A desire to learn and improve security practices.
Bonus Points:
  • Experience with AWS services (EC2, S3, RDS, etc.) and related tools.
  • Familiarity with containerization and orchestration technologies (Docker, Kubernetes).
  • Knowledge of configuration management tools (Ansible, Chef, Puppet).
  • Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack).
  • Experience with Infrastructure as code tools.

Most importantly, you share our values.

Diversity, Equity & Inclusion

Visier is committed to creating a diverse and inclusive workplace to ensure every employee feels a sense of belonging and is connected to their work, their team and Visier. It is imperative that we take every opportunity to measure, track and advance this commitment. Building a diverse and inclusive workplace is essential to the success of Visier and the well-being of our employees. The information you provide helps make our diversity data actionable.


Site Reliability Engineer

Singapore, Singapore Tek Systems

Posted today

Job Description

A leading global gaming and technology company is seeking a highly capable Site Reliability Engineer (SRE) to join their team in Singapore. This is a mission-critical role where you’ll own the reliability, scalability, and performance of complex distributed systems supporting a global platform. You’ll work at the intersection of software development and operations—designing robust systems, responding to live incidents, and driving automation across infrastructure and CI/CD processes.

The Position:

  • Monitor production systems using tools like Grafana and New Relic to detect performance issues and security vulnerabilities.
  • Respond to live incidents and outages, perform root cause analysis, and drive postmortem documentation and learning.
  • Maintain up-to-date operational runbooks for common issues and workflows.
  • Collaborate closely with developers to streamline production releases, patches, and deployment workflows.
  • Manage infrastructure across cloud environments (primarily AWS), and optimize CI/CD pipelines for reliability and efficiency.
  • Handle capacity planning, system performance tuning, and implement infrastructure-as-code using tools like Terraform.

The Candidate:

  • Comes from a backend or full-stack development background and is comfortable coding in languages such as Java, JavaScript/TypeScript, or Bash.
  • Has experience running services at scale in cloud environments like AWS, with a strong understanding of Linux.
  • Thinks like a software engineer, but with the mindset of an operator, proactively preventing outages and continuously improving systems.
  • Is adept at debugging under pressure, analyzing logs/metrics, and communicating clearly during incidents.
  • Is passionate about automation, observability, and creating self-healing systems.

Preferred Qualifications

  • 3+ years of experience in site reliability engineering, DevOps, or software engineering roles.
  • Proven skills in:
      ◦ Monitoring & alerting tools (Grafana, New Relic)
      ◦ CI/CD pipelines (Git, Jenkins, GitHub Actions, etc.)
      ◦ Container orchestration (Docker, Kubernetes)
      ◦ Infrastructure-as-code (Terraform, CloudFormation, Ansible)
      ◦ Managing and securing AWS environments
  • Understanding of authentication/authorization protocols (OAuth, JWT, OpenID)
  • Familiarity with SQL/NoSQL databases (PostgreSQL, Redis, MongoDB)
  • Strong interpersonal skills and a collaborative approach to working with cross-functional teams.

We regret to inform you that only shortlisted candidates will be contacted.

EA Registration No: R22105541, TAY ZHIHENG, DARIUS

Allegis Group Singapore Pte Ltd, Company Reg No. 200909448N, EA License No. 10C4544


Site Reliability Engineer

Singapore, Singapore Sleek

Posted today

Job Description

About Sleek

Through proprietary software and AI, along with a focus on customer delight, Sleek makes the back-office easy for micro SMEs.

We give Entrepreneurs time back to focus on what they love doing - growing their business and being with customers. With a surging number of Entrepreneurs globally, we are innovating in a highly lucrative space.

We operate 3 business segments:

Corporate Secretary: Automating the company incorporation, secretarial, filing, Nominee Director, mailroom and immigration processes via custom online robots and SleekSign. We are the market leaders in Singapore with ~5% market share of all new business incorporations.

Accounting & Bookkeeping: Redefining what it means to do Accounting, Bookkeeping, Tax and Payroll thanks to our proprietary SleekBooks ledger, AI tools and exceptional customer service

Business banking: Overcoming a key challenge for Entrepreneurs by offering digital banking services to new businesses

Sleek launched in 2017 and now has around 15,000 customers across our offices in Singapore, Hong Kong, Australia and the UK. We have around 450 staff with an intact startup mindset.

We have achieved >70% compound annual growth in Revenue over the last 5 years and as a result have been recognised by The Financial Times, The Straits Times, Forbes and LinkedIn as one of the fastest growing companies in Asia. Backed by world-class investors, we are on track to be one of the few cash flow positive, tech-enabled unicorns based out of Singapore.

Some Other Great Things About Working At Sleek.

Humility and kindness: Humility is a core attribute we hire for, which means we have a culture of not taking ourselves too seriously and being able to laugh. Kindness is also incredibly important. We are committed to creating and nurturing a diverse and inclusive environment.

Flexibility: You'll be able to work from home 5 days per week. If you need to start early or start late to cater to your family or other needs, we don't mind, so long as you get your work done and proactively communicate. You can also work fully remote from anywhere in the world for 1 month each year.

Financial benefits: We pay competitive market salaries and provide staff with generous paid time off and holiday schedules. Certain staff at Sleek are also eligible for our employee share ownership plan and can share in the upside of our stellar growth trajectory as we work toward listing on a prominent stock exchange in the Asia Pacific region.

Personal growth: You'll get a lot of responsibility and autonomy at Sleek - we move at a fast pace so you'll be making decisions, making mistakes and learning. There's also a range of internal and external facing training programmes we run. We're also at the forefront of utilising AI in our space and are developing a regional centre of AI excellence. It is our intention that if you leave Sleek, you leave as a more well-rounded person and professional.

Sleek is also a proudly certified B Corp. Since we started our journey in 2017, we've been committed to building Sleek as a force for good. In just over 5 years, we've joined a community of industry leaders like Patagonia, Ben & Jerry's, and P&G who are building an inclusive, equitable, and regenerative economy. We have planted over 29,271 trees to reforest our ecosystem and saved 7 tons of paper from landfills by processing over 1.4M pages through SleekSign. We aim to be Carbon Neutral by 2030.

About The Role

We are looking for a Service Reliability Engineer who is excited about the Mission and Outcomes below over the next 6-12 months.

Mission: The primary mission of a Service Reliability Engineer is to serve as the highest level of technical escalation within the support team. This role focuses on resolving complex technical issues that cannot be addressed by frontline support, ensuring minimal downtime, and maintaining the highest level of customer satisfaction. The engineer collaborates with development teams, product management, and other stakeholders to diagnose, troubleshoot, and resolve issues, as well as to contribute to continuous improvement initiatives that enhance the overall quality of products and services.

Outcomes:

  • Collaborate with friendly Product, Tech and Data teams to reproduce and resolve an array of enquiries
  • Support our internal business stakeholders on questions about Sleek platform functionality
  • Provide technical support through our channels via Zendesk, JIRA, and phone
  • Identify fixes and develop code
  • Configure servers and networks in the Cloud
  • Update database records to keep client records up-to-date
  • Update knowledge base to improve self-servicing

To do this, you will have a minimum of 5 years' experience as a Software Engineer, and you will most likely be located in Singapore, the Philippines, India or Malaysia.

Behavioural fit is also important at Sleek, and we will be looking for candidates that have a proven track record of embodying the below attributes in their recent roles:

Ownership: This shows reliability and helps build trust within the team. We move fast and need to know that everyone will see things through to completion and proactively help to get things back on track when challenges arise. Accountability is really important to us.

Humility: There is so much we don't know. Humility allows for open-mindedness to feedback and a willingness to learn from others. It paves the way for collaboration and creates a positive work environment. It is a key ingredient of self awareness and emotional intelligence.

Structured Thinking: Our business is complex with many layers (many services, many countries, many cultures). Regardless of whether you're more analytical or creative in nature, being able to show sound judgement is important to us. It ensures solutions are pragmatic and balance the needs of the organisation, team and customers.

Data driven: We are a data rich business with ~15,000 small customers. Each decision we make can impact many more people than we realise - so it's critical that we use sound data to support our strategies and review the success of our initiatives.

Can have tough conversations in a positive way: It's not a matter of if, but when difficult interpersonal situations arise. Disagreement, conflict and disappointment are a given in a fast moving business where people care about their work. People that proactively have tough conversations with kindness build empathy, trust and great working relationships.

Analytical Mindset: You have a keen eye for detail and a methodical approach to dissecting problems. You excel at analysing complex systems and processes to identify weaknesses and inefficiencies, and your ability to evaluate multiple scenarios enables you to devise the best testing strategies. You apply data-driven decisions to enhance testing coverage and performance metrics, ensuring the highest standards of software quality.

Collaboration-Driven: You thrive in a cross-functional team environment, working closely with developers, product managers, and operations teams to ensure alignment on requirements and testing goals. You communicate effectively, advocate for quality throughout the development process, and proactively address potential issues before they arise, fostering a culture of shared responsibility for delivering exceptional software.

Requirements

Performance Standard

  • Possess a strong academic foundation, ideally from reputable universities
  • Demonstrated track record working in well-established or recognized organizations
  • Over 5 years of experience as a Software Engineer
  • Able to demonstrate an attention to detail
  • Shows customer attentiveness and has faced customers in past positions
  • Strong analytical and problem-solving skills
  • Experience with JavaScript tech stack (Node, NestJS, React, Vue)
  • Experience with MongoDB, RDBMS (PostgreSQL, MySQL, etc.)
  • Experience with Cloud Platforms (e.g. AWS)
  • Bitbucket and GitHub (CI/CD knowledge)
  • Proficiency in reading logs, especially Splunk
  • Understanding of load testing tools and how to analyze their results


Benefits

  • Remote/hybrid arrangements

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Other
Industries: IT Services and IT Consulting

Site Reliability Engineer

Singapore, Singapore Hamilton Barnes

Posted today

Job Description

Job Title: Site Reliability Engineer

Employment Type: Full-Time / Permanent

Eligibility: Singapore Citizens only (due to project and clearance requirements)

About the Role

Our client, a respected IT solutions provider supporting national infrastructure and enterprise systems, is seeking a Site Reliability Engineer (SRE) to join their growing operations team. This role is ideal for candidates with strong infrastructure and automation experience who are passionate about system reliability, scalability, and performance in secure production environments.

Key Responsibilities

  • Ensure availability, performance, and scalability of production systems and services
  • Develop and maintain CI/CD pipelines and automated deployment processes
  • Implement monitoring, alerting, and observability tools
  • Manage infrastructure using Infrastructure-as-Code (IaC) tools such as Terraform or Ansible
  • Troubleshoot and resolve production issues across the stack (infrastructure, applications, network)
  • Drive incident management, root cause analysis, and long-term remediation
  • Collaborate closely with development and DevOps teams to improve system design and reliability
  • Maintain operational documentation, SOPs, and runbooks

Requirements

  • Singapore Citizen (due to project and clearance eligibility)
  • Diploma or Degree in Computer Science, Engineering, or related field
  • 3 to 6 years of experience in SRE, DevOps, or infrastructure roles
  • Proficient with Linux/Unix systems and public cloud platforms (AWS, Azure, or GCP)
  • Hands-on experience with containerization (Docker, Kubernetes)
  • Strong scripting abilities (e.g., Python, Bash, or Go)
  • Familiarity with CI/CD tools such as Jenkins, GitLab CI, or equivalent
  • Experience with monitoring/logging platforms (Prometheus, Grafana, ELK, etc.)
  • Understanding of networking concepts including DNS, routing, load balancing, and firewalls
  • Strong problem-solving skills and a proactive mindset

Why Apply

  • Contribute to secure, high-impact projects across the public sector and regulated industries
  • Join a team focused on automation, resilience, and infrastructure modernization
  • Long-term career growth in a structured, well-established organisation

How to Apply

If your background aligns with the above and you're looking to make a meaningful impact in a secure, production-grade environment, please send your resume to

Note: Only Singapore Citizens will be considered for this role due to clearance requirements.

Seniority level: Not Applicable
Employment type: Full-time
Job function: Information Technology
Industries: IT Services and IT Consulting

Site Reliability Engineer

Singapore, Singapore StarHub

Posted today

Job Description

We are looking for a talented and motivated Site Reliability Engineer (SRE) to join our team. This role requires a mix of infrastructure expertise, hands-on observability experience, and DevOps skills. As an SRE, you will be instrumental in building reliable, scalable, and efficient systems. The ideal candidate will have hands-on experience with Terraform, Ansible, and log analytics tools, combined with proficiency in working with Linux, Kubernetes, and AIOps platforms.

Key Responsibilities

  • Design, deploy, and manage scalable infrastructure using Infrastructure as Code (IaC) tools such as Terraform, Ansible and GitHub.
  • Implement and maintain observability solutions using ELK and the Grafana suite (e.g. Loki, Tempo, Mimir, and Prometheus), ensuring complete monitoring, logging, and tracing coverage.
  • Leverage OpenTelemetry to instrument applications and collect telemetry data for performance insights and system health.
  • Automate configuration and operational tasks using Ansible to reduce manual effort.
  • Manage and monitor Kubernetes clusters and Linux-based systems to ensure optimal performance and availability.
  • Integrate and support SNMP-based Network Performance Monitoring (NPM) tools like SolarWinds, SevOne, or OpsRamp for network observability.
  • Implement event management systems and AIOps platforms for proactive incident detection, correlation, and automated resolution.
  • Collaborate with DevOps teams to build and maintain CI/CD pipelines for continuous integration and delivery.
  • Perform incident management, conduct post-incident reviews, and drive long-term improvements through root-cause analysis.
  • Maintain detailed documentation for infrastructure, automation workflows, troubleshooting procedures, and operational best practices.


Required Expertise And Experience

  • At least 3 years of experience in SRE, DevOps, or a related engineering role.
  • Proficiency in Infrastructure as Code (IaC) using Terraform to manage complex infrastructure.
  • Hands-on experience with log analytics and observability tools, including ELK (Elasticsearch, Logstash, Kibana) and the Grafana suite (Loki, Tempo, Mimir, Prometheus).
  • Knowledge and experience with OpenTelemetry for distributed tracing and telemetry collection.
  • Experience working with Kubernetes clusters and Linux-based systems in production environments.
  • Expertise in automation using Ansible to streamline configuration and deployment processes.
  • Knowledge of SNMP-based NPM tools such as SolarWinds, SevOne, or OpsRamp for network monitoring.
  • Experience with AIOps platforms for event correlation and automated incident management.
  • Strong background in CI/CD practices, with hands-on involvement in building pipelines for software delivery.


Required Skills And Qualifications

  • Technical Skills:
    • Infrastructure management with Terraform.
    • Observability with ELK, Grafana suite, and OpenTelemetry.
    • Automation using Ansible.
    • Kubernetes orchestration and Linux system administration.
    • Expertise in SNMP-based NPM tools (SolarWinds, SevOne, or OpsRamp).
    • Experience with AIOps and event management platforms.
  • Soft Skills:
    • Strong problem-solving abilities with a focus on automation and continuous improvement.
    • Excellent communication and collaboration skills across cross-functional teams.
    • Ability to thrive in a dynamic, fast-paced environment and manage multiple priorities.
  • Preferred Knowledge:
    • Familiarity with GitOps practices for infrastructure management.
    • Understanding of Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
    • Security awareness and experience implementing secure infrastructure.
  • Education:
    • Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent work experience.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Telecommunications

Site Reliability Engineer

Singapore, Singapore Aptitude Asia Limited

Posted today

Job Description

Our client, a top-tier hedge fund, is looking to hire a talented Site Reliability Engineer to join their growing SRE team in Singapore.

Job Responsibilities:

  • Ensure high reliability, availability, and performance of applications throughout their lifecycle.
  • Automate repetitive tasks and systematically address recurring issues.
  • Generate innovative ideas for application improvements and participate in their implementation.
  • Analyze end-to-end workflows across various technologies and business processes.
  • Manage incident response and resolution while promoting the SRE philosophy across teams.

Job Requirements:

  • Solid foundation in computer science principles, including data structures and algorithms, as well as distributed systems.
  • Proficiency in at least one modern programming language, with a preference for Python or Java.
  • Familiarity with contemporary software development practices, including testing, version control, and CI/CD.
  • Experience with SQL and databases is highly desirable; knowledge of web technologies (JavaScript, CSS, React) is a plus.
  • Strong communication skills, both written and verbal, along with a proactive and entrepreneurial mindset.