1,317 Devops Engineers jobs in Singapore

Site Reliability Engineer

Singapore, Singapore MANPOWER STAFFING SERVICES (SINGAPORE) PTE LTD

Posted 2 days ago

Job Description

Responsibilities:

  • Responsible for deployment, change, issue triage and infrastructure management of overseas games and related components and systems, e.g. the game monitoring system and login services.
  • Responsible for monitoring and dashboarding for game observability, and ensure the game is reliable, scalable and secure.
  • Understand the game architecture, analyze, evaluate and respond to potential risks, such as hidden troubles and performance bottlenecks.
  • Responsible for daily communication and coordination between various teams.

Requirements:

  • Bachelor’s Degree or above in Computer Science or comparable field.
  • More than 3 years of operations experience with Linux and Windows operating systems.
  • Good knowledge of cloud and containerization.
  • Proficiency in scripting languages such as Bash, Python, or SQL is a plus.
  • Experience with worldwide online game live operations is a plus.
  • Have a high sense of responsibility and teamwork spirit.

Site Reliability Engineer

Singapore, Singapore Huawei International Pte Ltd

Posted 3 days ago

Job Description

Responsibilities

  • To be responsible for reliability, availability, user experience, capacity planning, toil reduction, process enhancement and digitalization of the cloud-based internet services.

  • Handle SRE role for assigned cloud services owning the KPIs for reliability, issue to resolution, service deployment, business continuity management, security policy planning, capacity planning, toil reduction through automation.

  • Introduce service governance initiatives based on latest technologies to consistently increase reliability and user experience components of Huawei mobile services on cloud to provide world class user experience with high reliability.

  • Make effective use of our world-class AIOps and autonomous service governance platform to find new ways to streamline processes and improve alert accuracy, time-series trend analysis, anomaly detection, and risk identification.

  • Support platform/service expansions, migrations to new architectures, upgrades and drill activities across different technology domains.

  • Incorporate mature chaos engineering for risk identification, IPDRR for security, and comprehensive automation frameworks to reduce ops effort to the lowest possible level and free up time and space for the team's engineering work.

Requirements and Qualifications

  • Bachelor/Master of computer science engineering or related majors

  • Have knowledge of Linux, Network, Database, Containers, Container management systems, etc.

  • Have knowledge of at least one programming language or scripting such as Java, Python, Shell, Ansible, Terraform

  • Have knowledge in big data analytics.

  • Have explored new technology trends, open-source technologies, and methodologies in the internet service domain.

Site Reliability Engineer

Singapore, Singapore RigNet

Posted 3 days ago

Job Description

About us

One team. Global challenges. Infinite opportunities. At Viasat, we’re on a mission to deliver connections with the capacity to change the world. For more than 35 years, Viasat has helped shape how consumers, businesses, governments and militaries around the globe communicate. We’re looking for people who think big, act fearlessly, and create an inclusive environment that drives positive impact to join our team.


What you'll do

The Customer Engineering team is a group of highly technical engineers who are tasked with maintaining and developing the reliability, scalability, and performance of the Service to different Enterprise Customers. The Customer Engineering Team is empowered to drive technical resolutions across the technology stack from hardware through to application and all stops in between. The team is also responsible to build and maintain Alerts to proactively monitor the service and act as the technical liaison between Customer facing teams and the Engineering teams.


The day-to-day

As a Site Reliability Engineer, you will:

  • Identify and investigate potential and actual customer performance problems, recommend, and prioritize remediation, and assess effectiveness of remediation actions
  • Participate in and provide feedback on product design, especially regarding reliability and availability
  • Drive initiatives with partner teams to improve the reliability and performance of the Service through improved system design
  • Drive a culture of intolerance to manual activity which results in a highly automated environment delivering scalable solutions
  • Work closely with Customer facing teams (Technical Account Managers and Program Teams) to understand and prioritize the Customer issues
  • Drive monitoring and automation initiatives
  • Create and present Performance reports for technical and management stakeholders
  • Work closely with Engineering teams to communicate and prioritize the service impacting issues
  • Reproduce and test the Customer issues in the Lab
  • Develop Automated scripts and tools to Enable monitoring of the Service
  • Be part of on-call rotations

What you'll need

Requirements

  • 5+ years experience in troubleshooting and triage of technical issues in a fast paced environment, to support customers.
  • 5+ years experience in Network Operations or Product Support
  • Advanced knowledge of modern programming languages, especially Python
  • An ability to understand large complex systems and a passion to constantly improve environments
  • Strong networking knowledge: TCP/IP, IPSEC, VPN, NAT, Routing Protocols, AAA
  • Set priorities and work efficiently in a fast-paced environment
  • Demonstrated ability to deliver results on time with high quality and attention to detail
  • Demonstrated ability to work with ambiguous requirements, adapt, and learn
  • Experience with data analytics tools (Splunk, Kibana)
  • Keen (data-driven) decision making skills under incomplete information
  • Excellent face-to-face and remote customer rapport
  • Bachelor’s degree in electrical engineering, Computer Science, or Computer Engineering
  • Up to 10% travel

What will help you on the job

  • Experience analyzing data and trending to gain operational efficiencies
  • Telecom or related operational service experience, especially wireless networks
  • Previous technical role in a DevOps/SRE workflow
  • Experience with Satcom technology
  • Experience/knowledge GCP, AWS, Big Query

EEO Statement

Viasat is proud to be an equal opportunity employer, seeking to create a welcoming and diverse environment. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, ancestry, physical or mental disability, medical condition, marital status, genetics, age, or veteran status or any other applicable legally protected status or characteristic. If you would like to request an accommodation on the basis of disability for completing this on-line application, please click here .

Site Reliability Engineer

Tower Research Capital

Posted 3 days ago

Job Description

Tower Research Capital is a leading quantitative trading firm founded in 1998. Tower has built its business on a high-performance platform and independent trading teams. We have a 25+ year track record of innovation and a reputation for discovering unique market opportunities.

Tower is home to some of the world’s best systematic trading and engineering talent. We empower portfolio managers to build their teams and strategies independently while providing the economies of scale that come from a large, global organization.

Engineers thrive at Tower while developing electronic trading infrastructure at a world class level. Our engineers solve challenging problems in the realms of low-latency programming, FPGA technology, hardware acceleration and machine learning. Our ongoing investment in top engineering talent and technology ensures our platform remains unmatched in terms of functionality, scalability and performance.

At Tower, every employee plays a role in our success. Our Business Support teams are essential to building and maintaining the platform that powers everything we do — combining market access, data, compute, and research infrastructure with risk management, compliance, and a full suite of business services. Our Business Support teams enable our trading and engineering teams to perform at their best.

At Tower, employees will find a stimulating, results-oriented environment where highly intelligent and motivated colleagues inspire each other to reach their greatest potential.

Responsibilities

  • Overseeing and ensuring the continuous operation of the firm's Linux-based trading infrastructure, addressing day-to-day operational needs
  • Providing second-level support, including:
    • Rapid response to emergencies
    • Implementing scheduled updates and deployments
    • In-depth analysis and resolution of performance issues
    • Engaging in a rotational on-call schedule, including early morning and weekend shifts, to provide timely support
  • Contributing towards the development of automated solutions for server provisioning, configuration, and monitoring, targeting a scalable management of thousands of servers
  • Engaging in interactions with the Trading and Core Engineering teams
  • Managing essential Core services such as DHCP, LDAP, DNS, and NFS for on-prem and hosted data centers as well as public clouds
  • Participating in an on-call rotation and occasional weekend shifts

Qualifications

  • Sound expertise in Linux production environments
  • Basic knowledge of Python and Bash scripting
  • Engagement with automation and monitoring tool sets
  • Comprehensive knowledge of operating system principles, with a particular focus on Linux internals
  • Familiarity with Intel-based server hardware and components
  • Competence in server-side networking, including understanding network protocols and configurations
  • Familiarity with cloud services and architectural solutions
  • Experience in designing, building, and troubleshooting complex systems
  • Good problem-solving skills, underpinned by a methodical approach to technical challenges. This includes an ability to communicate effectively, demonstrating strong interpersonal skills, a sense of responsibility, and a commitment to driving projects to completion.
  • Sense of ownership and drive

Preferred Qualifications

  • Involvement in open source or personal projects showcasing a passion for innovation and collaboration
  • Experience in High Frequency Trading, Quantitative Finance or working in low latency environment is advantageous but not a strict requirement

Candidate Attributes

  • Organized, responsible, and meticulous
  • Strong communicator
  • Proactive and willing to take initiative
  • Able to manage and prioritize multiple tasks
  • Excellent at supporting Linux Production environments
  • Able to work both within a team and independently

Benefits

Tower’s headquarters are in the historic Equitable Building, right in the heart of NYC’s Financial District and our impact is global, with over a dozen offices around the world.

At Tower, we believe work should be both challenging and enjoyable. That is why we foster a culture where smart, driven people thrive – without the egos. Our open concept workplace, casual dress code, and well-stocked kitchens reflect the value we place on a friendly, collaborative environment where everyone is respected, and great ideas win.

Our benefits include:

  • Generous paid time off policies
  • Savings plans and other financial wellness tools available in each region
  • Hybrid working opportunities
  • Free breakfast, lunch and snacks daily
  • In-office wellness experiences and reimbursement for select wellness expenses (e.g., gym, personal training and more)
  • Company-sponsored sports teams and fitness events (JPM Corporate Challenge, Cycle for Survival, Wall Street Rides FAR and more)
  • Volunteer opportunities and charitable giving
  • Social events, happy hours, treats and celebrations throughout the year
  • Workshops and continuous learning opportunities

At Tower, you’ll find a collaborative and welcoming culture, a diverse team and a workplace that values both performance and enjoyment. No unnecessary hierarchy. No ego. Just great people doing great work – together.

Tower Research Capital is an equal opportunity employer.

Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Engineering and Information Technology

Site Reliability Engineer

Singapore, Singapore NetEase

Posted 5 days ago

Job Description

About NetEase Games:

As a leading internet technology company based in China, NetEase, Inc. (NASDAQ: NTES and HKEX:999, “NetEase”) provides premium online services centered around content creation. With extensive offerings across its expanding gaming ecosystem, the Company develops and operates some of China’s most popular and longest-running mobile and PC games. Powered by industry-leading in-house R&D capabilities in China and globally, NetEase creates superior gaming experiences, inspires players, and passionately delivers value for its thriving community worldwide. By infusing play with culture and education with technology, NetEase transforms gaming into a meaningful vehicle to build a more entertaining and enlightened world.

NetEase’s ESG initiatives are among the best in the global media and entertainment industry, earning it a distinction as one of the S&P Global Industry Movers and an “A” rating from MSCI. For more information, please visit:

Job Description:

  • Site Reliability Engineering (SRE) refers to using software engineering methods to manage systems, solve problems, and achieve operational automation to reduce trivial tasks and improve service availability. Responsibilities include but are not limited to:
  • Manage the operational work of NetEase Interactive Entertainment services, such as Eggy Party, Marvel Rivals, UU Accelerator, Ace Racer, and other online services, as well as internal research projects.
  • Design and select basic runtime environments (including servers, virtualization, cloud services, networks, databases, etc.) for game servers based on different games' service architecture, performance requirements, and business conditions, providing high-quality and efficient operational services at controllable costs.
  • Establish and monitor various operational metrics and customize data analysis standards.
  • Collaborate with product departments to identify issues, optimize technical architecture, and enhance user experience based on game and infrastructure conditions.
  • Participate in in-depth research on cutting-edge open-source software, virtualization, databases, and web services, and develop technical solutions for business implementation.

Job Requirements:

  • Bachelor's degree or above, majors in computer science, networking, communications, automation, or related fields are preferred.
  • Familiar with the Linux operating system; knowledgeable about computer network architectures and common network protocols such as TCP/IP and HTTP.
  • Proficient in at least one programming language, including but not limited to C/C++, Shell, Python, Golang, Rust, or Java.
  • Passionate about open-source; experience or knowledge in open-source software such as Linux, Nginx, MySQL, K8S, and Istio is preferred.
  • Strong logical thinking, communication, and learning abilities; adept at research and problem-solving.
  • Skilled at teamwork, with a strong sense of collective honor, responsibility, and service awareness.
  • Open to trying new things, with excellent problem-solving skills and strong technical sensitivity; experience in contributing to open-source communities is a plus.
  • Proficiency in Chinese is required for this role, as daily communication and collaboration with key stakeholders and team members based in China are essential to the responsibilities of the position.

Site Reliability Engineer

Singapore, Singapore HCLTech

Posted 6 days ago

Job Description

This role combines software and systems engineering to build, run, and maintain highly performant, distributed, fault-tolerant and resilient financial systems. Site Reliability Engineers focus on ensuring a joyful customer journey.

As a Site Reliability Engineer you will be filling a mission-critical role ensuring that the systems are healthy, monitored, automated, fault tolerant and designed to scale.

You will collaborate and work closely with engineering teams to continually improve production services, facilitating fast delivery of new products, and reducing downtime.

  • Drive Site Reliability Engineering agenda to improve availability, reliability, and performance of services
  • Drive observability for our applications.
  • Drive optimise-operate initiatives, for example reducing operational toil
  • Work with application teams in setting up SLI, SLO and Error budget for their applications
  • Work with enterprise team in deploying SRE enablers/initiatives.

Requirements

  • At least 6-8 years IT experience with at least 3 years in a project deployment capacity, preferably gained in IT banking environment or a system integrator environment
  • The candidate should have knowledge of leveraging LLMs and deploying solutions for different Gen-AI use cases.
  • The candidate should have strong infrastructure/technical background with knowledge on Open Systems platform. Moderate information security knowledge
  • Have a good understanding of ITIL & SRE processes & practices
  • Have good leadership skills in working with application teams and service providers in defining infrastructure deployment plan, cutover/migration strategy and test plan.
  • Able to formulate and establish infrastructure deployment standards.
  • Good people management, vendor management and project management skills
  • Agile, AWS certification preferred
  • Able to create Bash/Python scripts for infra deployment
  • Must be able to practice SRE & Chaos Engineering principles
  • Understands key SRE concepts such as Toil, SLI, SLO, Error Budgets, MTTD, MTTR, etc
  • Strong, committed, and reliable team player, able to take direction but also willing to contribute to discussions on design and strategy.
  • Possess strong interpersonal and communication skills to be able to deal with and form good relationships with other technology teams through day to day support and project work
  • Strong background in machine learning and deep learning algorithms.
  • Proficiency in Python for developing Gen-AI models.
  • Ability to design and implement scalable and efficient AI systems.
  • Skills in data preprocessing and feature engineering for AI model training.
  • Ability to stay updated with the latest advancements in generative AI research and incorporate them into work.
  • Expert level knowledge of different OS (AIX, LINUX, WINTEL, Solaris) for BAU support, upgrades & maintenance.
  • Knowledge on OS Security & hardening.
  • Knowledge / hands on experience on Patch Management.
  • In-depth knowledge of LVM, SAN allocation & File System increase, Create new file systems in Cluster / Non-cluster environment.
  • ESXi, vSphere systems administration and support including vMotion, HA, DRS, vCenter Operations Manager, vCenter Service Manager, vCenter Configuration Manager, Site Recovery Manager.
  • Administering cloud-based & OpenShift based Infrastructure deployment. Administration tasks include provisioning/de-provisioning of resources.
  • Support audit and Infrastructure / network security scans, Disaster Recovery and security related drills.
  • Capacity review & performance management across all platform systems.
  • Knowledge on Middleware components such as JBOSS, APACHE, WebSphere Application server & MQ.
  • Knowledge on SSL Certificate procurement process & renewals.
  • Having knowledge on MariaDB, Oracle & DB2 databases Backup, DB restarts, access issues, DB Upgrade support.
  • Very good understanding of SAN configuration EMC/Hitachi LUNs on UNIX (AIX/Solaris/Linux) servers.
  • Manage Firewall, GTM & LTM configuration requests.
  • Ability to develop simple/complex shell scripts as per requirements and for automation.
  • Effective in dealing with crisis calls / critical issues for business-critical services.
  • Proven experience in technically guiding teams in productivity driven environment.
  • Worked in at least two of the areas of IT Infrastructure support i.e. Production Support, Application Support & infrastructure Support.
  • Explore, learn and deploy new technologies that will help the company to reduce cost or improve operational efficiencies.
  • Excellent troubleshooting and analytical skills
  • Communication and interpersonal skills.
  • Working across cultures & able to work 24*7

Secondary Skills: Unix, CHAOS Engineering, DB / MQ Administration, Network (DNS, Firewall, GTM/LTM, VLAN).

Seniority level
  • Mid-Senior level
Employment type
  • Contract
Job function
  • Information Technology
Industries
  • IT Services and IT Consulting

Site Reliability Engineer

Singapore, Singapore M-DAQ GLOBAL PTE. LTD.

Posted 7 days ago

Job Description

About Us

At M-DAQ Global, we're on a mission to create a World without Currency Borders. We are a pioneering fintech group specialising in foreign exchange (FX) & payment solutions that facilitate seamless cross-border transactions for businesses worldwide.

Headquartered in Singapore, our vibrant and diverse team spans six countries and territories. We foster a dynamic environment where individuals can contribute to a comprehensive suite of solutions, from advanced FX and streamlined collections to AI-driven onboarding and enhanced risk management. If you're passionate about making a tangible impact in the global financial landscape, and eager to grow within a company that's constantly innovating, M-DAQ Global offers a unique opportunity.

Join us and be part of the team powering faster, smarter cross-border payment and FX solutions for Asia and the world.

For more information, please visit:

Responsibilities

  • Primarily responsible for day-to-day support of the Ecommerce Platform Application, FX clients and Stockbrokers using FX.
  • Perform monitoring using available tools and media
  • Write Monitoring Tools in Unix, Python, SQL or Lambda, strictly adhering to the monitoring framework
  • Assist Clients Success Team with onboarding activities.
  • Troubleshoot incidents, providing enough in-depth detail to help developers quickly zoom in on the code and identify any bugs
  • Manage incidents following internal incident management process and procedures till they are resolved
  • Ensure incidents are logged and root cause analysis updated and concerned parties are advised
  • Involved in Problem Management
  • Ensure timely email responses and stakeholder updates
  • Participate in depth QA/Testing of the application at the end of a Sprint
  • Able to grasp the permutation of inputs to develop Test Cases and Test Scenarios for exhaustive testing against new applications or new functionalities
  • Code Test Cases using Selenium, Cucumber, Scripts, Python and Postman
  • Ensure production systems are adequately monitored

Requirements

  • Degree in Computer Science / Info Technology
  • Able to code in Java, Python and Unix/Bash is a must
  • Able to code in Selenium would be an advantage
  • Some knowledge in AWS platform/tools would be an advantage
  • Some knowledge of the FX, payment and e-commerce businesses would be an advantage
  • Equipped with analytical and problem-solving skills
  • Able to learn new functionalities and new systems quickly and present testing permutations for Test Cases and Test Scenarios
  • Knowledge of FIX Protocol is highly desirable but not a must
  • Result driven and customer oriented
  • Prior experience in managing clients from China and Japan will be advantageous
  • Proficient in spoken and written Simplified Chinese
  • Ability to speak/write Japanese will be advantageous
  • Team player, resourceful, independent and a good communicator

Why Us?

  • Make a positive impact on the world’s economy by creating a World without Currency Borders™
  • Team Innovation Mindset, People-Oriented
  • Challenging environment, offering great opportunities to learn and grow
  • Creative and Innovative Workplace
  • We offer competitive remuneration, including employee stock options and employee benefits

Site Reliability Engineer

Singapore, Singapore Tencent

Posted 7 days ago

Job Description

About Tencent

Tencent is an Internet-based platform company founded in Shenzhen, China, in 1998. We use technology to enrich the lives of Internet users and assist the digital upgrade of enterprises. Our mission is "Value for Users, Tech for Good". We embrace a culture of teamwork & creativity and are driven by our values - Integrity, Proactivity, Collaboration and Creativity.

We are rapidly expanding our international operations and are looking for top talent to propel us forward. Combining the results-oriented nature of a startup with the resources of a profitable and leading Internet company, Tencent offers a unique opportunity for aspiring individuals to thrive.

Department Introduction

Since the first Hadoop cluster was built in January 2009, Tencent Data Platform Department has continuously explored and practiced with the mission of "data-driven, maximizing business value". The platform's computing power elastic resource pool has exceeded 200,000 units, and the daily data access reaches 35 trillion pieces, forming a data service-oriented platform around the full data lifecycle. It provides real-time, mass, and accurate personalized recommendation services driven by full-process data for Tencent's advertising business, delivering professional data foundation and application solutions across the company's various business units.

  • Responsible for the operation and maintenance of big data clusters (Hadoop, HBase, Hive, Yarn, Spark, etc.).
  • Optimize cluster performance, manage scaling and contraction.
  • Monitor clusters, handle alarms, troubleshoot faults, perform data backups, and assist users with issues.

Requirements

  • Bachelor's degree in Computer Science or related field.
  • Experience in big data cluster operation and maintenance is preferred.
  • Proficient in any programming language such as Python, Shell, or Java; familiar with Linux environment, configuration, management, and optimization.
  • Knowledge of TCP/IP, HTTP protocols, and network, security, database, and computer architecture fundamentals.
  • Ability to communicate bilingually in English and Mandarin to collaborate with international stakeholders and teams based in China.
Seniority level
  • Mid-Senior level
Employment type
  • Full-time
Job function
  • Information Technology
Industries
  • Software Development, Internet Marketplace Platforms, and Technology, Information and Media

Site Reliability Engineer

Singapore, Singapore ABAXX SINGAPORE PTE. LTD.

Posted 7 days ago

Job Description

Site Reliability Engineer - Networking

We are seeking a competent candidate to join our Infrastructure Team in its mission of building and operating a MAS-regulated marketplace and clearing house. This role is ideal for someone with a strong foundation in AWS services, infrastructure as code, and cloud security, who is passionate about building scalable, secure, and compliant cloud environments.

In this role, you will work alongside experienced engineers in a collaborative and demanding environment, contributing to the development and operation of mission-critical platforms that support real-time trading and clearing services. You will also be instrumental in ensuring system reliability, scalability, and performance across our technology stack.

Key Responsibilities

Cloud Infrastructure Engineering

  • Design, implement, and manage scalable AWS infrastructure using Terraform.
  • Architect secure and efficient network topologies using VPCs, subnets, route tables, and security groups.
  • Manage AWS Control Tower for multi-account governance and compliance.
  • Deploy and manage Kubernetes (K8s) clusters for container orchestration.
  • Integrate and maintain Elastic Stack for observability and monitoring.
  • Configure and manage Cloudflare for DNS, WAF, and edge security.

Infrastructure & Security Operations

  • Own day-to-day infrastructure operations, including monitoring, patching, and performance tuning.
  • Implement and maintain Cloud Security Posture Management (CSPM) tools to ensure continuous compliance and risk visibility.
  • Identify, prioritize, and remediate vulnerabilities across cloud workloads and infrastructure.
  • Collaborate with security teams to enforce best practices in IAM, encryption, and data protection.
  • Participate in incident response and root cause analysis for infrastructure-related issues.

Automation & Collaboration

  • Automate infrastructure provisioning and configuration using CI/CD pipelines.
  • Work closely with application development teams to support application delivery and platform reliability.
  • Document infrastructure designs, operational procedures, and security controls.

Required Qualifications

  • 3+ years of hands-on experience with AWS cloud services.
  • Proficiency in Terraform and infrastructure as code principles.
  • Experience with Cloudflare, AWS Control Tower, and Kubernetes.
  • Strong understanding of AWS observability.
  • Proven experience in network design and security group/subnet architecture.
  • Familiarity with CSPM tools.
  • Experience with vulnerability management and remediation workflows.
  • Strong scripting skills (e.g., Bash, Python) and CI/CD tooling.
  • Experience with Ansible and Packer for automation and image creation.
  • Excellent troubleshooting and communication skills.

Preferred Qualifications

  • Experience with GitOps tools (e.g., ArgoCD, Flux).
  • Knowledge of compliance frameworks (e.g., CIS, NIST, ISO 27001).
  • Familiarity with container security and runtime protection tools.
  • Hands-on experience with PagerDuty or similar incident response platforms.

Site Reliability Engineer

Singapore, Singapore IDEMIA Group

Posted 7 days ago

Job Description

This role plays a critical part in ensuring reliability, scalability, and performance of our systems and services. You will work closely with development and operations teams to build and maintain robust infrastructure and tools that support high availability, monitoring and rapid deployment.

Key Missions

• Maintain platforms or products after go-live by measuring and monitoring their availability, performance and overall system health.
• Recover platforms or products during production incidents to meet targeted service-level agreements.
• Set up, enhance and maintain observability tools.
• Assist in incident response, perform root cause analysis, and postmortem documentation.
• Develop tools/applications/scripts to improve operational efficiency.
• Maintain and enhance CI/CD pipelines.
• Collaborate with software engineers to design scalable and resilient systems.
• Participate in on-call and on-site rotations and contribute to reducing alert fatigue.
• Document processes, configurations, and best practices.
• Support other software efficiency improvement initiatives.

Profile & Other Information

• 1-3 years’ experience in software development, DevOps or SRE.
• Curious, a strong communicator, ready to work in a fast-paced environment, and willing to pick up new skills and technologies as necessary.
• Degree in Electrical / Electronics / Computer Engineering / Computer Science or a relevant discipline.

• Basic understanding of Linux/Unix systems and shell scripting.
• Familiarity with cloud platforms (e.g., AWS, Azure, GCP).
• Exposure to containerization tools (e.g., Docker, Kubernetes).
• Experience with monitoring tools (e.g., Prometheus, Grafana, ELK).
• Knowledge of CI/CD tools (e.g., Jenkins, GitLab, Bitbucket, Jira).
• Programming/scripting skills in Python, Java, or Bash.
• Understanding of networking fundamentals and system security.

• Good written and verbal communication skills.
• Self-motivated, independent and a good team player
• Able to work under pressure in a fast-paced environment
• Innovative, proactive mindset and with a focus on continuous improvement
• Strong analytical and problem-solving skills
