125 Site Reliability Engineer jobs in Singapore
Site Reliability Engineer
Posted today
Job Description
Join to apply for the Site Reliability Engineer role at Thales
Location: Singapore, Singapore
Thales people architect identity management and data protection solutions at the heart of digital security. Business and governments rely on us to bring trust to billions of digital interactions they have with people. Our technologies and services help banks exchange funds, people cross borders, energy become smarter, and much more. Over 30,000 organizations already rely on us to verify identities, grant access to digital services, analyze data, and encrypt information to secure the connected world.
Established in Singapore since 1973, Thales supports aerospace, defence, security, ground transportation, and digital identity sectors, employing over 2,100 people across all business areas.
Responsibilities:
- Manage ODC products in GCP Cloud following the SRE approach within a DevOps team.
- Develop and maintain Infrastructure as Code (IaC) and automation tools.
- Provide technical direction for high-impact changes requiring Tier II input.
- Support operations tasks to shape the product roadmap and ensure operational readiness.
- Complete handovers to Tier I and II teams to meet SLAs.
- Monitor systems with real-time monitoring tools.
- Review technical documentation for products and clients.
- Maintain the integrity of solution baselines and architecture.
- Guide the evolution of services and conduct technical analyses.
- Deploy Thales products in cloud environments.
- Perform performance tuning, technological updates, and monitoring.
- Conduct onboarding testing, communicate risks, and develop mitigation plans.
- Provide 24/7 on-call support in shifts.
Requirements:
- Degree in Computer Science or related field.
- Hands-on experience with Kubernetes and GCP in a production environment.
- 4+ years in infrastructure and application development and deployment.
- Knowledge of Agile methodologies and service delivery best practices.
- Experience with GCP, monitoring tools, networking, Linux.
- Strong skills in system integration, automation tools like Gitlab, and scripting (Shell/Python).
- Proficiency in Linux, TCP/IP, HTTP(S).
- Knowledge of Docker, Kubernetes, and cloud services; Kubernetes certification and telecom experience are highly preferred.
- Scrum certification is a plus.
- Location: One North
- Hours: Monday-Friday, 9am-6pm
- On-call support in rotation (average once every 2 months)
At Thales, we offer careers, not just jobs. With a global presence of 80,000 employees, our mobility policy supports career development worldwide. We embrace flexibility for a smarter way of working. Great journeys start here—apply now!
Site Reliability Engineer
Posted today
Job Description
As Point72 reimagines the future of investing, our Technology group is constantly improving our company’s IT infrastructure, positioning us at the forefront of a rapidly evolving technology landscape. We’re a team of experts experimenting, discovering new ways to harness the power of open-source solutions, and embracing enterprise agile methodology. We encourage professional development to ensure you bring innovative ideas to our products while satisfying your own intellectual curiosity.
What you’ll do
You will play a highly critical operational role where you will apply a combination of software and systems engineering skills to develop and maintain a complex set of distributed, real-time systems that serve critical stakeholders in Point72’s Global Macro business.
You will focus on optimizing the operations of existing systems and infrastructure in an efficient manner, through a strict adherence to automation and tooling. Specifically, you will:
- Build out foundational technical components of an extensive SRE program across multiple complex systems, both new and existing
- Collaborate with our development and quant teams to ensure that ongoing change is consistent with a pre-determined, measurable set of SLOs spanning multiple complex user interactions with our systems
- Monitor system capacity and performance, identifying and addressing potential future bottlenecks and sources of instability before they become impactful to our stakeholders
- Review and provide feedback on automation code developed by peers to maintain high standards of code quality and efficiency
- Troubleshoot and resolve system issues, analyzing their impact on infrastructure and service operations
- Participate in or lead design reviews with peers and stakeholders, evaluating and selecting the best technologies and automation strategies for our needs
What’s required
We are looking for highly motivated, proactive engineers, who enjoy the challenge of working with complex systems and environments. Specifically, you should have:
- Demonstrated work experience as a site reliability engineer or similar role
- Strong coding skills: Python and PowerShell are required, along with the ability to read C# and write basic applications in it
- Experience with Windows and Linux based operating systems, Cloud-based services and infrastructure (AWS), Infrastructure as Code (Terraform) and Configuration Management (Ansible)
- Experience with container technologies: Docker, Kubernetes, AWS EKS and ECS
- Self-motivated individual with great communication and interpersonal skills, and a very strong sense of ownership
- Commitment to the highest ethical standards
We take care of our people
We invest in our people, their careers, their health, and their well-being. When you work here, we provide:
- Health care benefits
- Generous parental and family leave policies
- Mental and physical wellness programs
- Volunteer opportunities
- Non-profit matching gift program
- Support for employee-led affinity groups representing women, minorities and the LGBT+ community
- Tuition assistance
About Point72
Point72 Asset Management is a global firm led by Steven Cohen that invests in multiple asset classes and strategies worldwide. Resting on more than a quarter-century of investing experience, we seek to be the industry’s premier asset manager through delivering superior risk-adjusted returns, adhering to the highest ethical standards, and offering the greatest opportunities to the industry’s brightest talent. We’re inventing the future of finance by revolutionizing how we develop our people and how we use data to shape our thinking. For more information, visit
Site Reliability Engineer
Posted today
Job Description
As a Site Reliability Engineer, you will assist in the operation and maintenance of LANDI Global's infrastructure. Your responsibilities will include supporting the platform's reliability and performance while learning from senior engineers.
Key Responsibilities:
- Help build and maintain platform infrastructures across various environments.
- Collaborate with the R&D team to ensure platform availability and scalability.
- Assist in implementing monitoring and alerting systems for timely issue resolution.
- Support the maintenance of Disaster Recovery plans for business continuity.
- Analyze performance metrics and contribute to cost optimization strategies.
- Participate in automated testing, CI/CD processes, and deployment efficiency.
- Help manage incident reporting and change management processes.
- Provide operational support for platforms and assist with production issues.
- Participate in a 24/7 standby rotation.
EXPERIENCE
- At least 3 years of experience in a similar capacity
- Excellent oral and written communication in English.
PREFERRED SKILLS
Candidates should ideally have experience in some of the following technologies:
- Experience in various cloud technologies (e.g. AWS, Azure)
- Experience in high-level programming or scripting languages
- Experience in monitoring tools (e.g. Prometheus, Grafana, Zabbix)
- Experience in configuration management tools (e.g. Ansible, Chef, Puppet)
- Experience in SQL databases (e.g. Postgres, MySQL)
- Experience in load balancing and reverse proxies (e.g. Nginx)
- Experience in CI/CD tools (e.g. Jenkins, GitLab)
- Experience in containerization (e.g. Docker, K8s)
- Seniority level: Associate
- Employment type: Full-time
- Job function: Engineering and Information Technology
- Industries: IT Services and IT Consulting
Site Reliability Engineer
Posted today
Job Description
There is a lot that goes into building the most secure yet user-friendly devices in the world. We are a unique software development group with a charter to secure our platforms, which include iOS software, iOS devices, and Mac. We build solutions that are used by our customers, engineering teams, and manufacturing environments. We are looking for a Site Reliability Engineer (SRE) who will be responsible for deploying, monitoring, troubleshooting, and developing tools for all of the team's solutions. The SRE position requires a mix of strategic engineering and design along with hands-on technical work. You will have experience as a systems administrator or programmer who has moved into DevOps and automation. You will configure, tune, and troubleshoot multi-tiered systems to achieve optimal application performance, stability, and availability. You will work closely with the systems engineers, network engineers, database administrators, monitoring team, and information security team. For this position, strict application security and high availability requirements must be met consistently to achieve optimal solutions.
This team focuses on security initiatives and provides critical IT solutions across most of Apple's product lines, from the manufacturing space all the way to customer-facing solutions. We are looking for a hardworking self-starter who can excel in a dynamic environment and bring their passion to ensuring the quality and reliability of the solutions we maintain.
Description
Review hardware, software infrastructure and application functionality for optimization. Identify performance bottlenecks. Responsible for the full system lifecycle including configuration, code deployment in user acceptance test and production environments. Monitor infrastructure and application services and drive incident management. Collaborate with Apple's production support team, application engineers, project managers, systems engineers, network engineers, database administrators, and QA team to effectively ensure availability and reliability of solutions.
Minimum Qualifications
- Unix or Linux administration and performance tuning skills, with 3 to 5 years of leading services in a large-scale *nix environment.
- Java and JVM runtime configuration and troubleshooting, or proficiency in Python, Go, or another scripting language.
- Experience with DevOps tools, processes, and culture.
- Validated experience with Automation skills using Ansible, Chef, Jenkins, Puppet.
- Oracle DB knowledge and troubleshooting skills.
- Infrastructure knowledge of Networks, load balancers, Firewalls and WAF.
- SDLC and release engineering including source code repository and build tools including SVN and GIT.
- Network, System and Application Security knowledge.
- Experience with Kafka or other message queueing technology a plus.
Site Reliability Engineer
Posted today
Job Description
Visier is the global leader in AI-powered people analytics, workforce planning, and compensation management solutions, helping organizations gain a Workforce AI Edge. With over 60,000 customers in 75 countries, including enterprises like BASF, Panasonic, and Ford Motor Company, we empower businesses to understand the relationship between people and work, adapt faster to change, and drive better outcomes.
Backed by leading investors and valued at $1B, Visier is at the forefront of transforming the HR landscape through innovation and data-driven insights. Join us in our mission to unlock the business-transforming potential of people data.
Visier is seeking a skilled Site Reliability Engineer to join our dynamic team. As an SRE at Visier, you'll directly contribute to the reliability and scalability of our cloud-based analytics platform. You'll work alongside experienced engineers, tackling complex infrastructure challenges and mastering essential SRE practices. You'll gain hands-on experience with technologies like Kubernetes, Kafka, Cassandra, and a comprehensive suite of AWS services, building a strong foundation to develop your career with Visier in the future!
What you'll be doing
- Architect and Automate: Design, develop, and deploy infrastructure as code using Terraform and Packer, ensuring high availability and scalability in our AWS environment.
- Pipeline Mastery: Build and optimize robust CI/CD pipelines with Jenkins and Groovy, streamlining deployment processes and enhancing release velocity.
- Security by Design: Implement and enhance security best practices within our infrastructure, safeguarding sensitive data and ensuring compliance.
- Infrastructure Optimization: Troubleshoot, monitor, and improve the performance and reliability of critical infrastructure services, including Kong API gateway, Cassandra, PostgreSQL, Consul, Vault, and Kafka.
- Develop SRE Tools: Write and maintain automation scripts and tools using Python to streamline operational tasks and improve efficiency.
- Contribute to System Design: Participate in system design discussions and contribute to the evolution of our infrastructure architecture.
- Incident Response: Participate in on-call rotations and contribute to incident response efforts, learning to diagnose and resolve production issues.
- A strong foundation in software development principles and a passion for infrastructure as code.
- Proficiency in at least one programming language (Python, Java, Scala, Go, etc.).
- A solid understanding of Linux systems administration and networking fundamentals.
- A proactive approach to problem-solving, with a strong sense of ownership and accountability.
- A desire to learn and grow in a fast-paced, collaborative environment.
- A strong understanding of CI/CD concepts.
- A desire to learn and improve security practices.
- Experience with AWS services (EC2, S3, RDS, etc.) and related tools.
- Familiarity with containerization and orchestration technologies (Docker, Kubernetes).
- Knowledge of configuration management tools (Ansible, Chef, Puppet).
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack).
- Experience with Infrastructure as code tools.
Most importantly, you share our values.
Diversity, Equity & Inclusion
Visier is committed to creating a diverse and inclusive workplace to ensure every employee feels a sense of belonging and is connected to their work, their team and Visier. It is imperative that we take every opportunity to measure, track and advance this commitment. Building a diverse and inclusive workplace is essential to the success of Visier and the well-being of our employees. The information you provide helps make our diversity data actionable.
Site Reliability Engineer
Posted today
Job Description
A leading global gaming and technology company is seeking a highly capable Site Reliability Engineer (SRE) to join their team in Singapore. This is a mission-critical role where you’ll own the reliability, scalability, and performance of complex distributed systems supporting a global platform. You’ll work at the intersection of software development and operations—designing robust systems, responding to live incidents, and driving automation across infrastructure and CI/CD processes.
The Position:
- Monitor production systems using tools like Grafana and New Relic to detect performance issues and security vulnerabilities.
- Respond to live incidents and outages, perform root cause analysis, and drive postmortem documentation and learning.
- Maintain up-to-date operational runbooks for common issues and workflows.
- Collaborate closely with developers to streamline production releases, patches, and deployment workflows.
- Manage infrastructure across cloud environments (primarily AWS), and optimize CI/CD pipelines for reliability and efficiency.
- Handle capacity planning, system performance tuning, and implement infrastructure-as-code using tools like Terraform.
The Candidate:
- Comes from a backend or full-stack development background and is comfortable coding in languages such as Java, JavaScript/TypeScript, or Bash.
- Has experience running services at scale in cloud environments like AWS, with a strong understanding of Linux.
- Thinks like a software engineer, but with the mindset of an operator, proactively preventing outages and continuously improving systems.
- Is adept at debugging under pressure, analyzing logs/metrics, and communicating clearly during incidents.
- Is passionate about automation, observability, and creating self-healing systems.
Preferred Qualifications
- 3+ years of experience in site reliability engineering, DevOps, or software engineering roles.
- Proven skills in:
  - Monitoring & alerting tools (Grafana, New Relic)
  - CI/CD pipelines (Git, Jenkins, GitHub Actions, etc.)
  - Container orchestration (Docker, Kubernetes)
  - Infrastructure-as-code (Terraform, CloudFormation, Ansible)
  - Managing and securing AWS environments
- Understanding of authentication/authorization protocols (OAuth, JWT, OpenID)
- Familiarity with SQL/NoSQL databases (PostgreSQL, Redis, MongoDB)
- Strong interpersonal skills and a collaborative approach to working with cross-functional teams.
We regret to inform you that only shortlisted candidates will be contacted.
EA Registration No: R22105541, TAY ZHIHENG, DARIUS
Allegis Group Singapore Pte Ltd, Company Reg No. 200909448N, EA License No. 10C4544
Site Reliability Engineer
Posted today
Job Description
About Sleek
Through proprietary software and AI, along with a focus on customer delight, Sleek makes the back-office easy for micro SMEs.
We give Entrepreneurs time back to focus on what they love doing - growing their business and being with customers. With a surging number of Entrepreneurs globally, we are innovating in a highly lucrative space.
We operate 3 business segments:
Corporate Secretary: Automating the company incorporation, secretarial, filing, Nominee Director, mailroom and immigration processes via custom online robots and SleekSign. We are the market leaders in Singapore with ~5% market share of all new business incorporations.
Accounting & Bookkeeping: Redefining what it means to do Accounting, Bookkeeping, Tax and Payroll thanks to our proprietary SleekBooks ledger, AI tools and exceptional customer service
Business banking: Overcoming a key challenge for Entrepreneurs by offering digital banking services to new businesses
Sleek launched in 2017 and now has around 15,000 customers across our offices in Singapore, Hong Kong, Australia and the UK. We have around 450 staff with an intact startup mindset.
We have achieved >70% compound annual growth in Revenue over the last 5 years and as a result have been recognised by The Financial Times, The Straits Times, Forbes and LinkedIn as one of the fastest growing companies in Asia. Backed by world-class investors, we are on track to be one of the few cash flow positive, tech-enabled unicorns based out of Singapore.
Some Other Great Things About Working At Sleek.
Humility and kindness: Humility is a core attribute we hire for, which means we have a culture of not taking ourselves too seriously and being able to laugh. Kindness is also incredibly important. We are committed to creating and nurturing a diverse and inclusive environment.
Flexibility: You'll be able to work from home 5 days per week. If you need to start early or start late to cater to your family or other needs, we don't mind, so long as you get your work done and proactively communicate. You can also work fully remotely from anywhere in the world for 1 month each year.
Financial benefits: We pay competitive market salaries and provide staff with generous paid time off and holiday schedules. Certain staff at Sleek are also eligible for our employee share ownership plan and can share in the upside of our stellar growth trajectory as we work toward listing on a prominent stock exchange in the Asia Pacific region.
Personal growth: You'll get a lot of responsibility and autonomy at Sleek - we move at a fast pace so you'll be making decisions, making mistakes and learning. There's also a range of internal and external facing training programmes we run. We're also at the forefront of utilising AI in our space and are developing a regional centre of AI excellence. It is our intention that if you leave Sleek, you leave as a more well-rounded person and professional.
Sleek is also a proudly certified B Corp. Since we started our journey in 2017, we've been committed to building Sleek as a force for good. In just over 5 years, we've joined a community of industry leaders like Patagonia, Ben & Jerry's, and P&G who are building an inclusive, equitable, and regenerative economy. We have planted over 29,271 trees to reforest our ecosystem and saved 7 tons of paper from landfills by processing over 1.4M pages through SleekSign. We aim to be Carbon Neutral by 2030.
About The Role
We are looking for a Service Reliability Engineer who is excited about the Mission and Outcomes below over the next 6-12 months.
Mission: The primary mission of a Service Reliability Engineer is to serve as the highest level of technical escalation within the support team. This role focuses on resolving complex technical issues that cannot be addressed by frontline support, ensuring minimal downtime, and maintaining the highest level of customer satisfaction. The engineer collaborates with development teams, product management, and other stakeholders to diagnose, troubleshoot, and resolve issues, as well as to contribute to continuous improvement initiatives that enhance the overall quality of products and services.
Outcomes:
- Collaborate with friendly Product, Tech and Data teams to reproduce and resolve an array of enquiries
- Support our internal business stakeholders on questions about Sleek platform functionality
- Provide technical support through our channels via Zendesk, JIRA, and phone
- Identify fixes and develop code
- Configure servers and networks in the Cloud
- Update database records to keep client records up-to-date
- Update knowledge base to improve self-servicing
Behavioural fit is also important at Sleek, and we will be looking for candidates that have a proven track record of embodying the below attributes in their recent roles:
Ownership: This shows reliability and helps build trust within the team. We move fast and need to know that everyone will see things through to completion and proactively help to get things back on track when challenges arise. Accountability is really important to us.
Humility: There is so much we don't know. Humility allows for open-mindedness to feedback and a willingness to learn from others. It paves the way for collaboration and creates a positive work environment. It is a key ingredient of self awareness and emotional intelligence.
Structured Thinking: Our business is complex with many layers (many services, many countries, many cultures). Regardless of whether you're more analytical or creative in nature, being able to show sound judgement is important to us. It ensures solutions are pragmatic and balance the needs of the organisation, team and customers.
Data driven: We are a data-rich business with around 15,000 small customers. Each decision we make can impact many more people than we realise, so it's critical that we use sound data to support our strategies and review the success of our initiatives.
Can have tough conversations in a positive way: It's not a matter of if, but when difficult interpersonal situations arise. Disagreement, conflict and disappointment are a given in a fast moving business where people care about their work. People that proactively have tough conversations with kindness build empathy, trust and great working relationships.
Analytical Mindset: You have a keen eye for detail and a methodical approach to dissecting problems. You excel at analysing complex systems and processes to identify weaknesses and inefficiencies, and your ability to evaluate multiple scenarios enables you to devise the best testing strategies. You apply data-driven decisions to enhance testing coverage and performance metrics, ensuring the highest standards of software quality.
Collaboration-Driven: You thrive in a cross-functional team environment, working closely with developers, product managers, and operations teams to ensure alignment on requirements and testing goals. You communicate effectively, advocate for quality throughout the development process, and proactively address potential issues before they arise, fostering a culture of shared responsibility for delivering exceptional software.
Requirements
Performance Standard
- Possess a strong academic foundation, ideally from reputable universities
- Demonstrated track record working in well-established or recognized organizations
- Over 5 years of experience as a Software Engineer
- Able to demonstrate an attention to detail
- Shows customer attentiveness and has worked in customer-facing positions
- Strong analytical and problem-solving skills
- Experience with JavaScript tech stack (Node, NestJS, React, Vue)
- Experience with MongoDB, RDBMS (PostgreSQL, MySQL, etc.)
- Experience with Cloud Platforms (e.g. AWS)
- Bitbucket and GitHub (CI/CD knowledge)
- Proficiency in reading logs, especially Splunk
- Understanding of load testing tools and how to analyze their results
- Remote/hybrid arrangements
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Other
- Industries: IT Services and IT Consulting
Site Reliability Engineer
Posted today
Job Description
Direct message the job poster from Hamilton Barnes
Job Title: Site Reliability Engineer
Employment Type: Full-Time / Permanent
Eligibility: Singapore Citizens only (due to project and clearance requirements)
About the Role
Our client, a respected IT solutions provider supporting national infrastructure and enterprise systems, is seeking a Site Reliability Engineer (SRE) to join their growing operations team. This role is ideal for candidates with strong infrastructure and automation experience who are passionate about system reliability, scalability, and performance in secure production environments.
Key Responsibilities
- Ensure availability, performance, and scalability of production systems and services
- Develop and maintain CI/CD pipelines and automated deployment processes
- Implement monitoring, alerting, and observability tools
- Manage infrastructure using Infrastructure-as-Code (IaC) tools such as Terraform or Ansible
- Troubleshoot and resolve production issues across the stack (infrastructure, applications, network)
- Drive incident management, root cause analysis, and long-term remediation
- Collaborate closely with development and DevOps teams to improve system design and reliability
- Maintain operational documentation, SOPs, and runbooks
Requirements
- Singapore Citizen (due to project and clearance eligibility)
- Diploma or Degree in Computer Science, Engineering, or related field
- 3 to 6 years of experience in SRE, DevOps, or infrastructure roles
- Proficient with Linux/Unix systems and public cloud platforms (AWS, Azure, or GCP)
- Hands-on experience with containerization (Docker, Kubernetes)
- Strong scripting abilities (e.g., Python, Bash, or Go)
- Familiarity with CI/CD tools such as Jenkins, GitLab CI, or equivalent
- Experience with monitoring/logging platforms (Prometheus, Grafana, ELK, etc.)
- Understanding of networking concepts including DNS, routing, load balancing, and firewalls
- Strong problem-solving skills and a proactive mindset
Why Apply
- Contribute to secure, high-impact projects across the public sector and regulated industries
- Join a team focused on automation, resilience, and infrastructure modernization
- Long-term career growth in a structured, well-established organisation
How to Apply
If your background aligns with the above and you're looking to make a meaningful impact in a secure, production-grade environment, please send your resume to
Note: Only Singapore Citizens will be considered for this role due to clearance requirements.
- Seniority level: Not Applicable
- Employment type: Full-time
- Job function: Information Technology
- Industries: IT Services and IT Consulting
Referrals increase your chances of interviewing at Hamilton Barnes by 2x
Sign in to set job alerts for “Site Reliability Engineer” roles. Site Reliability Engineer Intern - 2025 Start Production Engineer / Site Reliability Engineer Software Engineer Intern, Dev Infra - 2025 Start Customer Engineer, Data Analytics and AI, Google Cloud WeChat - Senior Site Reliability Engineer Software Engineer, AI Acceleration, Android Information Technology - Cloud/DevOps Engineer Site Reliability Engineer (EMEA, Japan, Singapore, Australia) Head of Engineering, Systems & Services - APAC Software Development Engineer In Test Intern, Trust and Safety Engineering (2025 Start) Backend Software Engineer, Global LIVE Fund Safety Intern- 2025 Start Site Reliability Engineer-(Fresh-Grad)(A98145) Software Development Engineer in Test Intern , TikTok - 2025 Start Site Reliability Engineer (SRE) (GovTech)We’re unlocking community knowledge in a new way. Experts add insights directly into each article, started with the help of AI.
Site Reliability Engineer
Posted today
Job Description
Join to apply for the Site Reliability Engineer role at StarHub
We are looking for a talented and motivated Site Reliability Engineer (SRE) to join our team. This role requires a mix of infrastructure expertise, hands-on observability experience, and DevOps skills. As an SRE, you will be instrumental in building reliable, scalable, and efficient systems. The ideal candidate will have hands-on experience with Terraform, Ansible, and log analytics tools, combined with proficiency in working with Linux, Kubernetes, and AIOps platforms.
Key Responsibilities
- Design, deploy, and manage scalable infrastructure using Infrastructure as Code (IaC) tools such as Terraform, Ansible and GitHub.
- Implement and maintain observability solutions using ELK and the Grafana suite (e.g. Loki, Tempo, Mimir, and Prometheus) to ensure complete monitoring, logging, and tracing coverage.
- Leverage OpenTelemetry to instrument applications and collect telemetry data for performance insights and system health.
- Automate configuration and operational tasks using Ansible to reduce manual effort.
- Manage and monitor Kubernetes clusters and Linux-based systems to ensure optimal performance and availability.
- Integrate and support SNMP-based Network Performance Monitoring (NPM) tools like SolarWinds, SevOne, or OpsRamp for network observability.
- Implement event management systems and AIOps platforms for proactive incident detection, correlation, and automated resolution.
- Collaborate with DevOps teams to build and maintain CI/CD pipelines for continuous integration and delivery.
- Perform incident management, conduct post-incident reviews, and drive long-term improvements through root-cause analysis.
- Maintain detailed documentation for infrastructure, automation workflows, troubleshooting procedures, and operational best practices.
Requirements
- At least 3 years of experience in SRE, DevOps, or a related engineering role.
- Proficiency in Infrastructure as Code (IaC) using Terraform to manage complex infrastructure.
- Hands-on experience with log analytics and observability tools, including ELK (Elasticsearch, Logstash, Kibana) and the Grafana suite (Loki, Tempo, Mimir, Prometheus).
- Knowledge and experience with OpenTelemetry for distributed tracing and telemetry collection.
- Experience working with Kubernetes clusters and Linux-based systems in production environments.
- Expertise in automation using Ansible to streamline configuration and deployment processes.
- Knowledge of SNMP-based NPM tools such as SolarWinds, SevOne, or OpsRamp for network monitoring.
- Experience with AIOps platforms for event correlation and automated incident management.
- Strong background in CI/CD practices, with hands-on involvement in building pipelines for software delivery.
- Technical Skills:
  - Infrastructure management with Terraform.
  - Observability with ELK, Grafana suite, and OpenTelemetry.
  - Automation using Ansible.
  - Kubernetes orchestration and Linux system administration.
  - Expertise in SNMP-based NPM tools (SolarWinds, SevOne, or OpsRamp).
  - Experience with AIOps and event management platforms.
- Soft Skills:
  - Strong problem-solving abilities with a focus on automation and continuous improvement.
  - Excellent communication and collaboration skills across cross-functional teams.
  - Ability to thrive in a dynamic, fast-paced environment and manage multiple priorities.
- Preferred Knowledge:
  - Familiarity with GitOps practices for infrastructure management.
  - Understanding of Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
  - Security awareness and experience implementing secure infrastructure.
- Education:
  - Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
- Seniority level Mid-Senior level
- Employment type Full-time
- Job function Engineering and Information Technology
- Industries Telecommunications
Site Reliability Engineer
Posted today
Job Description
Our client, a top-tier hedge fund, is looking to hire a talented Site Reliability Engineer to join their growing SRE team in Singapore.
Job Responsibilities:
- Ensure high reliability, availability, and performance of applications throughout their lifecycle.
- Automate repetitive tasks and systematically address recurring issues.
- Generate innovative ideas for application improvements and participate in their implementation.
- Analyze end-to-end workflows across various technologies and business processes.
- Manage incident response and resolution while promoting the SRE philosophy across teams.
Job Requirements:
- Solid foundation in computer science principles, including data structures and algorithms, as well as distributed systems.
- Proficiency in at least one modern programming language, with a preference for Python or Java.
- Familiarity with contemporary software development practices, including testing, version control, and CI/CD.
- Experience with SQL and databases is highly desirable; knowledge of web technologies (JavaScript, CSS, React) is a plus.
- Strong communication skills, both written and verbal, along with a proactive and entrepreneurial mindset.