838 DevOps Engineer jobs in Singapore
Site Reliability Engineer/Cloud Engineer
Posted today
Job Description & Requirements
Join a global leader in gaming to manage the reliability of game-related platforms and infrastructure across both cloud and on-premise environments.
Responsibilities:
- Handle deployment, change management, issue triage, and infrastructure management for overseas games and related components and systems, e.g. the game monitoring system and login services.
- Build monitoring and dashboards for game observability, and ensure the game is reliable, scalable, and secure.
- Understand the game architecture; analyze, evaluate, and respond to potential risks such as latent faults and performance bottlenecks.
- Responsible for daily communication and coordination between various teams.
Requirements:
- Bachelor's Degree or above in Computer Science or a comparable field.
- More than 3 years of operations experience with Linux and Windows operating systems.
- A strong sense of responsibility and team spirit.
- Proficiency in scripting and query languages such as Bash, Python, and SQL.
- Good understanding of cloud environments such as AWS or Azure.
- Experience with containerization technologies such as Docker and orchestration platforms like Kubernetes is a plus.
- Experience with worldwide online game live operations is a plus.
Skills: Scalability, Kubernetes, Azure, Scripting, Bash, Reliability, Reliability Engineering, Networking, Python, Containerization, Windows, Docker, Ansible, Java, Orchestration, Linux
Cloud Site Reliability Engineer
Posted today
Job Description:
As a Cloud Site Reliability Engineer, you will be instrumental in ensuring the reliability, scalability, and performance of our hybrid cloud infrastructure across Azure and AWS. You will collaborate with engineering and cloud platform teams to build resilient, observable, and automated systems that support rapid delivery and high availability of services.
Key Responsibilities:
- Lead SRE initiatives to improve availability, reliability, and performance of cloud-native and hybrid applications.
- Design and implement observability frameworks across Azure and AWS using tools like CloudWatch, Azure Monitor, Prometheus, and Grafana.
- Drive automation and infrastructure-as-code practices to reduce operational toil and streamline deployments.
- Collaborate with application teams to define and implement SLIs, SLOs, and Error Budgets for cloud-hosted services.
- Champion chaos engineering and resilience testing across Azure and AWS environments.
- Work with enterprise teams to deploy and scale SRE enablers such as service mesh, auto-scaling, and CI/CD pipelines.
- Establish and enforce cloud infrastructure deployment standards, including blue-green and canary deployments.
- Support cloud migration strategies, cutover planning, and testing for applications transitioning between Azure and AWS.
Requirements:
- Minimum 10 years of experience in SRE or Cloud Engineering, preferably within the banking or financial services sector.
- Deep expertise in Azure and AWS cloud platforms, including compute, networking, storage, and security services.
- Strong understanding of ITIL and SRE frameworks, with the ability to integrate traditional operations with modern cloud practices.
- Proven leadership in coordinating with application teams and vendors for cloud deployment and migration planning.
- Hands-on experience with infrastructure-as-code tools (e.g., Terraform, Bicep, CloudFormation) and scripting (Bash, Python).
- Certifications in AWS (e.g., Solutions Architect, DevOps Engineer) and Azure (e.g., Azure Administrator, Azure Solutions Architect) are highly desirable.
- Experience with monitoring and alerting tools across both cloud platforms.
- Solid grasp of SRE principles: Toil reduction, SLIs/SLOs, Error Budgets, MTTD/MTTR.
- Strong interpersonal and communication skills to foster collaboration across teams and stakeholders.
- Agile mindset with experience in DevOps, CI/CD, and cloud-native development practices.
Cloud Site Reliability Engineer
Posted today
Job Description
Cloud Site Reliability Engineer (AWS)
An excellent opportunity has just arisen for a Cloud Site Reliability Engineer (AWS) to join a global technology leader supporting secure, mission-critical cloud systems.
Job Purpose:
You'll play a key role in ensuring uptime, automation, and compliance across AWS environments while working alongside an experienced team in a well-established organization.
Job Responsibilities:
- Manage AWS cloud services ensuring uptime, scalability, and security compliance.
- Build and maintain automation pipelines with Terraform, CloudFormation, and Ansible.
- Oversee Linux and Windows patching cycles with compliance and audit readiness.
- Monitor and troubleshoot performance issues, preventing incidents and downtime.
- Document runbooks and support strict regulatory and security standards.
Job Requirements:
- 7+ years in Cloud/DevOps, with 5+ in regulated environments.
- Deep AWS expertise across compute, storage, networking, and security.
- Strong skills in Terraform, CloudFormation, and automation practices.
- AWS Certified Solutions Architect + RHCE/Windows certification.
- Experience with compliance frameworks (GovTech, Healthcare, Banking).
Why apply?
- Work on mission-critical AWS platforms with a global tech leader.
- Join a compliance-driven, high-availability engineering environment.
- Drive automation, reliability, and innovation in secure cloud operations.
Do you want to work with experienced engineers in a well-established organization delivering secure, high-availability government cloud systems? Let's talk at
PERSOLKELLY Singapore Pte Ltd
• EA License No.01C4394
• EA Registration No. R (Naveen Vasudevan)
By sending us your personal data and curriculum vitae (CV), you are deemed to consent to PERSOLKELLY Singapore Pte Ltd and its affiliates to collect, use and disclose your personal data for the purposes set out in the Privacy Policy available at You acknowledge that you have read, understood, and agree with the Privacy Policy.
***
Skills: Patch Installation, Terraform, Scalability, Pipelines, AWS, Storage, Windows Server, Reliability Engineering, Networking, Windows, Site Reliability Engineering, Cloud Services, Cloud, Ansible, AWS Lambda, Linux, Amazon Cloud, RHCE
Cloud Site Reliability Engineer
Posted today
Job Description
Cloud Site Reliability Engineer
An excellent Cloud Site Reliability Engineer (AWS) opportunity has just arisen in a global brand supporting mission-critical government systems.
Job Purpose:
Ensure reliable, secure, and automated cloud operations supporting mission-critical systems and compliance needs.
Job Responsibilities:
- Manage and support AWS cloud services ensuring uptime, scalability, and security compliance.
- Design and maintain Infrastructure-as-Code pipelines using Terraform, CloudFormation, and Ansible.
- Oversee operating system patching cycles across Linux and Windows environments efficiently.
- Monitor and troubleshoot performance issues, proactively preventing incidents and downtime.
- Document processes, maintain runbooks, and adhere to strict compliance and audit standards.
Job Requirements:
- 8+ years' cloud/DevOps experience with 5+ years in regulated environments.
- Strong AWS expertise across compute, networking, databases, and security services.
- Hands-on experience with Terraform, CloudFormation, Ansible, and automation practices.
- Certified AWS Solutions Architect (Associate/Professional) with RHCE or Windows certification.
- Familiarity with ITIL, incident response, and compliance-driven cloud operations.
The successful Cloud Site Reliability Engineer (AWS) must possess deep AWS expertise, strong automation skills, and proven experience in uptime-critical, compliance-driven environments.
Join a cutting-edge technology team in a growing market, ensuring uptime, automation, and reliability of large-scale AWS platforms—let's talk at
PERSOLKELLY Singapore Pte Ltd
• EA License No.01C4394
• EA Registration No. R (Naveen Vasudevan)
By sending us your personal data and curriculum vitae (CV), you are deemed to consent to PERSOLKELLY Singapore Pte Ltd and its affiliates to collect, use and disclose your personal data for the purposes set out in the Privacy Policy available at You acknowledge that you have read, understood, and agree with the Privacy Policy.
***
Skills: Terraform, AWS, Patch Management, Nginx, Scripting, RHEL, Windows Server, WSUS, Reliability Engineering, Windows, Site Reliability Engineering, Cloud Services, Cloud, Ansible, AWS Lambda, SSL
Site Reliability Engineer
Posted today
Job Description
Technology
Site Reliability Engineer (Global) - TikTok Server Arch
Location: Singapore
Employment Type: Regular
Job Code: A
Responsibilities
This position is with TikTok's Stability Assurance Team. The team is responsible for ensuring that the services provided by TikTok are highly reliable and low-latency. Reliability assurance is complex and systematic for any massive application system, and the team focuses on optimizing the application architecture end to end, driven by data analysis, with automatic and intelligent failure recovery.
Job Responsibilities:
1. Ensure the online stability of TikTok and improve product SLA through systematic disaster recovery abilities, standardized emergency mechanisms, and intelligent analysis.
2. Identify system risks and promote governance through comprehensive and multi-perspective quality data.
3. Establish TikTok's unified standards and specifications, design and develop a one-stop operation platform, and enhance efficiency across multiple fields.
4. Collaborate closely with developers to implement best practices in SRE.
Qualifications
Minimum Qualifications:
1. Bachelor's degree or above in a computer-related field.
2. Solid foundational knowledge of computer software; understanding of Linux operating systems, storage, network IO, and related principles.
3. Ability to solve problems systematically, strong communication skills, and a sense of ownership.
Preferred Qualifications:
- Minimum 3-5 years of relevant work experience at a large-scale internet business.
Job Information
About TikTok
TikTok is the leading destination for short-form mobile video. At TikTok, our mission is to inspire creativity and bring joy. TikTok's global headquarters are in Los Angeles and Singapore, and we also have offices in New York City, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.
Why Join Us
Inspiring creativity is at the core of TikTok's mission. Our innovative product is built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and bring joy - a mission we work towards every day.
We strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. Every challenge is an opportunity to learn and innovate as one team. We're resilient and embrace challenges as they come. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our company, and our users. When we create and grow together, the possibilities are limitless. Join us.
Diversity & Inclusion
TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
Site Reliability Engineer
Posted today
Job Description
Explore your next opportunity at a Fortune Global 500 organization. Envision innovative possibilities, experience our rewarding culture, and work with talented teams that help you become better every day. We know what it takes to lead UPS into tomorrow—people with a unique combination of skill + passion. If you have the qualities and drive to lead yourself or teams, there are roles ready to cultivate your skills and take you to the next level.
Job Description:
Job Summary:
We are seeking a skilled and proactive Site Reliability Engineer (SRE) with 5–8 years of experience and deep expertise in Google Cloud Platform (GCP). The ideal candidate will be responsible for the reliability, availability, and performance of cloud-based applications and infrastructure. You will collaborate with development, operations, and security teams to build and maintain scalable, secure, and highly available systems.
Key Responsibilities:
- Design, develop, and maintain reliable, scalable, and highly available systems on GCP.
- Build and manage CI/CD pipelines, infrastructure as code (IaC), and monitoring solutions.
- Proactively monitor and manage system performance, uptime, and capacity using observability tools.
- Troubleshoot and resolve infrastructure and application-level issues in real time.
- Implement and maintain disaster recovery, failover mechanisms, and backup strategies.
- Automate repetitive tasks and processes to improve efficiency and reduce toil.
- Participate in on-call rotations, incident management, and root cause analysis (RCA).
- Ensure compliance with security standards, privacy regulations, and governance policies.
- Collaborate with cross-functional teams to support DevOps and SRE best practices.
- Drive improvements in SLAs, SLOs, and error budgets through data-driven insights.
Required Qualifications:
- 5–8 years of relevant experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer.
- Strong hands-on experience with Google Cloud Platform (GCP) – Compute Engine, GKE, Cloud Functions, Cloud Storage, IAM, BigQuery, etc.
- Proficiency in Infrastructure as Code tools like Terraform, Deployment Manager, or CloudFormation.
- Experience with Kubernetes, Docker, and container orchestration.
- Proficiency in scripting languages like Python, Shell, or Go.
- Deep understanding of monitoring and logging tools such as Prometheus, Grafana, Stackdriver, or Datadog.
- Knowledge of CI/CD tools such as Jenkins, GitLab CI, or Cloud Build.
- Experience with incident response, postmortem analysis, and site reliability principles.
- Strong problem-solving and communication skills.
Preferred Qualifications:
- GCP certifications (e.g., Professional Cloud DevOps Engineer, Cloud Architect).
- Exposure to multi-cloud environments or hybrid cloud infrastructure.
- Familiarity with Agile and ITIL frameworks.
- Experience working in regulated environments with compliance standards (e.g., ISO, SOC2).
Employee Type: Permanent
UPS is committed to providing a workplace free of discrimination, harassment, and retaliation.