12 High Performance Computing jobs in Singapore
High-Performance Computing
Posted today
Job Description
Summary:
We are seeking a highly experienced and driven High-Performance Computing (HPC) Engineer or Scientist to support our Linux-based HPC environment which includes compute clusters, parallel storage and high-speed networking used by researchers, staff and students. This role also involves customer-facing responsibilities including application support, system optimization and AI/DL workload integration.
Key Responsibilities:
- Lead the administration and operation of HPC Linux clusters, storage systems and high-speed networks.
- Provide hands-on support for HPC system software including cluster management, parallel file systems and job schedulers (e.g., SLURM, PBS).
- Troubleshoot and resolve issues across hardware, software, OS and networking layers.
- Collaborate with software engineers to support AI/deep learning applications, and with desktop engineers to provide user support.
- Advise and train researchers, postdocs and students on application development, debugging, optimization and parallelization.
- Plan and execute HPC application tuning and parallelization on behalf of users and research projects.
- Supervise end-user production operations on HPC infrastructure and provide expert-level guidance on job management and code execution.
- Engage with end users to support numerical simulation applications including weather forecasting, aeronautics and climate modeling.
- Support Distributed Data Parallel (DDP) training across multi-GPU setups for scaling deep learning models.
- Assist users with AI/ML frameworks such as TensorFlow, PyTorch, Hugging Face Transformers and PyTorch Lightning.
- Use debugging and performance tools such as Allinea DDT, TotalView and compiler analyzers to debug and optimize HPC workloads.
- Deliver HPC training workshops, documentation and onboarding for new users.
Job Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, Physics or a related field.
- At least 5 years of experience with large-scale HPC systems including cluster operations and user support.
- In-depth knowledge of parallel programming and code optimization using:
Languages: Fortran, C, C++
Libraries: MPI, OpenMP
- Proficient in Linux OS administration and scripting for system automation.
- Strong understanding of HPC system architecture, performance tuning and resource management.
- Experience with numerical simulations, such as weather forecasting, climate modeling and CFD applications.
- Familiarity with AI/DL tools and scalable training across multi-GPU environments.
- Experience with HPC job scheduling systems such as SLURM, LSF or PBS.
Desired Attributes:
- Highly self-motivated and results-driven with strong problem-solving skills.
- Team player with excellent collaboration and interpersonal skills.
- Strong verbal and written communication skills.
- Confident in presenting technical topics to diverse audiences.
- Proactive and resourceful with a strong customer service orientation.
Please send your detailed resume in MS Word format with the following details:
- Education level
- Work experience
- Each employment background
- Reason for leaving each employment
- Last drawn salary
- Expected salary
- Date of availability
TensorFlow
Service Orientation
MPI
Fortran
Tuning
PyTorch
Climate
Performance Tuning
Simulations
Resource Management
System Architecture
Parallel Programming
C++
High-Performance Computing Engineer
Posted 2 days ago
Job Description
The Digital Division leads the digital transformation of DSO through the master planning and policies, delivering digital capabilities through IT infrastructure, and providing one stop service to corporate and R&D Divisions. The Digital Division will transform the way we work, our workplace, and the capabilities we deliver to the MINDEF/SAF and for the security of Singapore.
People are DSO’s greatest asset. You will get to realise your career aspirations and develop your own niche either as a deep technical expert or a leader in the team. With frequent career dialogues and a robust training and development framework, we will provide you with the necessary development tools for you to reach your potential. You will also be recognised and rewarded through competitive remuneration packages and scholarship opportunities.
High-Performance Computing Engineer (3 Year Contract)
In this role, you will:
- Ensure the reliable operation of the central GPU clusters used for AI training and the High-Performance Computing (HPC) clusters
- Advise users on workload execution and optimization strategies
- Provide users with support for the resources they need
- Support the maintenance and troubleshooting of AI and HPC infrastructure to ensure system stability. Work with the OEM vendor for troubleshooting and part replacements
- Manage day-to-day operations of the GPU cluster, HPC cluster, distributed storage system and other associated IT infrastructure (e.g., head nodes)
JOB REQUIREMENTS
- Degree in Computer Engineering / Computer Science / Electrical & Electronic Engineering
- Proficient in UNIX/Linux operating systems and command-line interfaces (e.g., Ubuntu, Red Hat)
- Familiar with monitoring tools (e.g., Prometheus, Grafana, PRTG, Environet)
- Good knowledge of and experience in HPC performance optimization and troubleshooting
- Proven working knowledge of HPC systems and software
- Strong programming skills in Python and Bash scripting
- Familiarity with HPC schedulers (e.g., SLURM), container orchestration (e.g., Kubernetes) and GPU-based systems
- Experience with HPC scheduling and workload management tools (Run.AI and SLURM preferred)
- Experience in managing parallel file systems (e.g., Lustre), with a strong understanding of HPC storage principles
- Experience with cluster management software (e.g., BCM)
- Proficient in Python and Bash scripting for automation tasks
- Experience with container technologies (e.g., Docker); container orchestration using Kubernetes is a plus
High-Performance Computing Engineer
Posted today
Job Description
DSO National Laboratories (DSO) is Singapore’s largest defence research and development (R&D) organisation, with the critical mission to develop technological solutions to sharpen the cutting edge of Singapore's national security. At DSO, you will develop more than just a career. This is where you will make a real impact and shape the future of defence across the spectrum of air, land, sea, space and cyberspace.
The Digital Division leads the digital transformation of DSO through the master planning and policies, delivering digital capabilities through IT infrastructure, and providing one stop service to corporate and R&D Divisions. The Digital Division will transform the way we work, our workplace, and the capabilities we deliver to the MINDEF/SAF and for the security of Singapore.
People are DSO’s greatest asset. You will get to realise your career aspirations and develop your own niche either as a deep technical expert or a leader in the team. With frequent career dialogues and a robust training and development framework, we will provide you with the necessary development tools for you to reach your potential. You will also be recognised and rewarded through competitive remuneration packages and scholarship opportunities.
High-Performance Computing Engineer (3 Year Contract)
In this role, you will:
- Ensure the reliable operation of the central GPU clusters used for AI training and the High-Performance Computing (HPC) clusters
- Advise users on workload execution and optimization strategies
- Provide users with support for the resources they need
- Support the maintenance and troubleshooting of AI and HPC infrastructure to ensure system stability. Work with the OEM vendor for troubleshooting and part replacements
- Manage day-to-day operations of the GPU cluster, HPC cluster, distributed storage system and other associated IT infrastructure (e.g., head nodes)
JOB REQUIREMENTS
- Degree in Computer Engineering / Computer Science / Electrical & Electronic Engineering
- Proficient in UNIX/Linux operating systems and command-line interfaces (e.g., Ubuntu, Red Hat)
- Familiar with monitoring tools (e.g., Prometheus, Grafana, PRTG, Environet)
- Good knowledge of and experience in HPC performance optimization and troubleshooting
- Proven working knowledge of HPC systems and software
- Strong programming skills in Python and Bash scripting
- Familiarity with HPC schedulers (e.g., SLURM), container orchestration (e.g., Kubernetes) and GPU-based systems
- Experience with HPC scheduling and workload management tools (Run.AI and SLURM preferred)
- Experience in managing parallel file systems (e.g., Lustre), with a strong understanding of HPC storage principles
- Experience with cluster management software (e.g., BCM)
- Proficient in Python and Bash scripting for automation tasks
- Experience with container technologies (e.g., Docker); container orchestration using Kubernetes is a plus
SKILLS
PARALLEL COMPUTING, DISTRIBUTED SYSTEMS, CLUSTER MANAGEMENT
JOB ID: 704878
EXPERIENCE: 2 ~ 4 years
High-Performance Computing Expert
Posted today
Job Description
We are seeking a highly experienced and driven HPC engineer or scientist to support our Linux-based HPC environment which includes compute clusters, parallel storage and high-speed networking used by researchers, staff and students. This role involves customer-facing responsibilities including application support, system optimization and AI/DL workload integration.
As a key member of the team, you will lead the administration and operation of HPC Linux clusters, storage systems and high-speed networks. You will provide hands-on support for HPC system software including cluster management, parallel file systems and job schedulers (e.g., SLURM, PBS).
Troubleshooting and resolving issues across hardware, software, OS and networking layers is also a critical component of this role. You will collaborate with software engineers to support AI/deep learning applications, and with desktop engineers to provide user support.
Additionally, you will advise and train researchers, postdocs and students on application development, debugging, optimization and parallelization. Planning and executing HPC application tuning and parallelization on behalf of users and research projects is also an essential part of this position.
This role requires strong expertise in numerical simulations, such as weather forecasting, climate modeling and CFD applications. Familiarity with AI/DL tools and scalable training across multi-GPU environments is also desired.
High-Performance Computing Innovator
Posted today
Job Description
Join our team of innovators and help shape the future of high-performance computing. We are a cutting-edge semiconductor company that is developing next-generation server-class SoCs for data centers, cloud computing, and AI acceleration.
As an experienced Server Chip Architect, you will define and drive the architecture of innovative, scalable, and power-efficient CPU solutions. You will collaborate across hardware, software, and verification teams to bring world-class server processors from concept to production.
Key Responsibilities:
- Analyze Product Requirements
- Define High-Level Architecture
- Develop Hardware Specifications
- Provide Technical Leadership
- Review Verification Plans
- Evaluate Existing Hardware Modules
Job Requirements:
- Bachelor's Degree in Electrical Engineering or Computer Engineering with 8+ Years of Industry Experience
- Proven Track Record in Driving System Architecture Decisions
- Familiarity with Interface Peripherals such as Ethernet, PCIe, I2S, SPI, I2C
- In-Depth Knowledge of CPU Microarchitecture, Cache Hierarchy, Coherency Protocols, and Interconnect Fabrics
- Hands-On Experience in System-Level Performance Modeling
High-Performance Computing Senior Engineer
Posted 5 days ago
Job Description
The Digital Division leads the digital transformation of DSO through the master planning and policies, delivering digital capabilities through IT infrastructure, and providing one stop service to corporate and R&D Divisions. The Digital Division will transform the way we work, our workplace, and the capabilities we deliver to the MINDEF/SAF and for the security of Singapore.
People are DSO’s greatest asset. You will get to realise your career aspirations and develop your own niche either as a deep technical expert or a leader in the team. With frequent career dialogues and a robust training and development framework, we will provide you with the necessary development tools for you to reach your potential. You will also be recognised and rewarded through competitive remuneration packages and scholarship opportunities.
High-Performance Computing Senior Engineer
In this role, you will:
- Ensure the reliable operation of the central GPU clusters used for AI training and the High-Performance Computing (HPC) clusters
- Advise users on workload execution and optimization strategies
- Provide users with support for the resources they need
- Support the maintenance and troubleshooting of AI and HPC infrastructure to ensure system stability. Work with the OEM vendor for troubleshooting and part replacements
- Manage day-to-day operations of the GPU cluster, HPC cluster, distributed storage system and other associated IT infrastructure (e.g. head nodes)
JOB REQUIREMENTS
- Degree in Computer Engineering / Computer Science
- Experience with HPC scheduling and workload management tools (Run.AI and SLURM preferred)
- Experience in managing parallel file systems (e.g., Lustre), with a strong understanding of HPC storage principles
- Experience with cluster management software (e.g., BCM)
- Proficient in Python and Bash scripting for automation tasks
- Experience with container technologies (e.g., Docker); container orchestration using Kubernetes is a plus
- Understanding of basic network protocols (e.g., DHCP, DNS, SSH, SCP, SMTP)
- Proficient in UNIX/Linux operating systems and command-line interfaces (e.g., Ubuntu, Red Hat)
- Familiar with monitoring tools (e.g., Prometheus, Grafana, PRTG, Environet)
- Good knowledge of and experience in HPC performance optimization and troubleshooting
- Proven working knowledge of HPC systems and software
- Strong programming skills in Python and Bash scripting
- Familiarity with HPC schedulers (e.g., SLURM), container orchestration (e.g., Kubernetes) and GPU-based systems
High-Performance Computing Senior Engineer
Posted today
Job Description
DSO National Laboratories (DSO) is Singapore’s largest defence research and development (R&D) organisation, with the critical mission to develop technological solutions to sharpen the cutting edge of Singapore's national security. At DSO, you will develop more than just a career. This is where you will make a real impact and shape the future of defence across the spectrum of air, land, sea, space and cyberspace.
The Digital Division leads the digital transformation of DSO through the master planning and policies, delivering digital capabilities through IT infrastructure, and providing one stop service to corporate and R&D Divisions. The Digital Division will transform the way we work, our workplace, and the capabilities we deliver to the MINDEF/SAF and for the security of Singapore.
People are DSO’s greatest asset. You will get to realise your career aspirations and develop your own niche either as a deep technical expert or a leader in the team. With frequent career dialogues and a robust training and development framework, we will provide you with the necessary development tools for you to reach your potential. You will also be recognised and rewarded through competitive remuneration packages and scholarship opportunities.
High-Performance Computing Senior Engineer
In this role, you will:
- Ensure the reliable operation of the central GPU clusters used for AI training and the High-Performance Computing (HPC) clusters
- Advise users on workload execution and optimization strategies
- Provide users with support for the resources they need
- Support the maintenance and troubleshooting of AI and HPC infrastructure to ensure system stability. Work with the OEM vendor for troubleshooting and part replacements
- Manage day-to-day operations of the GPU cluster, HPC cluster, distributed storage system and other associated IT infrastructure (e.g. head nodes)
JOB REQUIREMENTS
- Degree in Computer Engineering / Computer Science
- Experience with HPC scheduling and workload management tools (Run.AI and SLURM preferred)
- Experience in managing parallel file systems (e.g., Lustre), with a strong understanding of HPC storage principles
- Experience with cluster management software (e.g., BCM)
- Proficient in Python and Bash scripting for automation tasks
- Experience with container technologies (e.g., Docker); container orchestration using Kubernetes is a plus
- Understanding of basic network protocols (e.g., DHCP, DNS, SSH, SCP, SMTP)
- Proficient in UNIX/Linux operating systems and command-line interfaces (e.g., Ubuntu, Red Hat)
- Familiar with monitoring tools (e.g., Prometheus, Grafana, PRTG, Environet)
- Good knowledge of and experience in HPC performance optimization and troubleshooting
- Proven working knowledge of HPC systems and software
- Strong programming skills in Python and Bash scripting
- Familiarity with HPC schedulers (e.g., SLURM), container orchestration (e.g., Kubernetes) and GPU-based systems
SKILLS
PARALLEL COMPUTING, DISTRIBUTED SYSTEMS, CLUSTER MANAGEMENT
JOB ID: 702249
EXPERIENCE: 5 ~ 10 years
DIVISION: DIGITAL
TYPE: PERMANENT
FIELD: SOFTWARE DEVELOPMENT
Advanced High-Performance Computing Engineer
Posted today
Job Description
We are seeking a skilled System Level Technical Lead to drive and lead the GPU debug for Data Center New Product Introduction (NPI).
About the Role
This is an exciting opportunity to join our high-end test engineering team as a System Level Technical Lead. You will be responsible for providing leadership to meet business milestones, cost and quality targets in the GPU system-level test area.
You will collaborate with internal teams on the GPU swim lane, from initial ASIC bring-up to HVM (High Volume Manufacturing), and solve complex, novel and non-recurring problems.
High-Performance Computing Infrastructure Engineer
Posted today
Job Description
Data Centre Engineer Opportunity
Firmus Technologies is seeking a skilled Data Centre Engineer to join our Operations team, supporting the daily operations and maintenance of our AI-accelerated high-performance computing (HPC) infrastructure.
This role will work closely with Field Service Engineers, HPC and Network Engineering teams, and assist the Global Operations Centre (GOC). It offers a unique opportunity to contribute directly to the stability and growth of cutting-edge AI infrastructure.
Key Responsibilities:
- Support the deployment, configuration and maintenance of high-end GPU servers, storage servers, networking equipment and software components in highly secure environments.
- Perform hardware diagnostics, system functionality checks and firmware updates as required.
- Collaborate with engineering teams to assist in deploying tailored customer environments (e.g., bare-metal systems, HPC clusters, Kubernetes, Slurm).
- Serve as first line of engineering support for onsite operational issues, including troubleshooting hardware, network and software problems.
- Troubleshoot incidents, escalate critical issues and provide feedback to appropriate teams for improvements.
- Participate in an on-call rotation to ensure 24/7 availability and responsiveness to critical issues.
- Provide technical support to the GOC Support Specialist team in troubleshooting HPC-related problems.
- Document incident details, resolutions, and lessons learned to enhance future problem-solving.
- Maintain clear, accurate, and up-to-date documentation to promote effective knowledge sharing across the team.
- Communicate effectively with GOC, HPC Engineers, internal teams, stakeholders, and end-users to ensure alignment on issue resolution.
Requirements:
- Bachelor's degree in computer engineering, computer science, or a related technical field.
- 5+ years of experience in field service technical areas.
- Strong understanding of server hardware technology, Linux environments and troubleshooting hardware problems, with adherence to physical and system-level security standards.
- Experience with scripting languages (e.g., Bash, Python).
- Familiarity with workload managers and cluster software (e.g., Slurm, Kubernetes, Nvidia BCM) and observability tools (e.g., Prometheus, Grafana, ELK).
- Excellent problem-solving and analytical skills.
- Ability to work independently and as part of a team.
- Strong communication skills, both written and verbal.
About Us:
We are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.
Join us in our mission to revolutionize the AI industry through sustainable practices and cutting-edge engineering. Apply now to be part of shaping the future of sustainable AI infrastructure.
Senior System Developer - High Performance Computing
Posted today
Job Description
We are seeking a skilled system developer to design, develop and deploy high-performance algorithms for our cutting-edge systems. You will tackle complex problems in real-time data processing, computer vision, and high-performance computing (HPC) while collaborating with cross-functional teams to drive innovation.
Key Responsibilities:
- Design and implement efficient algorithms for various applications, including risk modeling, 3D rendering, and route optimization.
- Develop robust, low-latency C++ code (C++17/20) for mission-critical systems.
- Optimize algorithm performance for speed and memory efficiency on CPU, GPU, and embedded systems.
- Explore novel approaches using machine learning, statistics, and geometric methods to solve open-ended problems.
- Collaborate with software engineers, data scientists, and product teams to integrate algorithms into production.
Technical Qualifications:
- 3+ years of experience in modern C++ development (C++11/14/17) in production environments.
- Expertise in algorithm design, complexity analysis, and data structures (trees, graphs, hash tables).
- Strong mathematical foundation in linear algebra, calculus, and probability.
- Familiarity with performance tools (Valgrind, gprof, VTune) and low-latency systems.
Domain Knowledge:
- Fintech: Pricing models, quantitative finance.
- Gaming: Physics engines, pathfinding.
- HPC/Embedded: CUDA, OpenMP, ARM optimization.
Experience & Education:
- 3–5+ years in algorithm-intensive roles (e.g., HFT, game engine dev, robotics).
- Education: BS/MS/PhD in Computer Science, Engineering, Math, or related field.
- Portfolio: Public GitHub repo or white papers demonstrating algorithm work (strongly preferred).