1,043 Reliability Engineering jobs in Singapore
Reliability Engineering Professional
Posted today
Job Description
We are seeking a Principal Engineer to lead reliability initiatives within our Quality and Reliability Engineering team. Key responsibilities include overseeing laboratory operations, guiding reliability testing for advanced process technologies, and ensuring equipment and methods meet the highest standards. This leadership role combines technology, operations, and mentoring.
Key Responsibilities:
- Lead reliability qualifications and monitoring for new technologies.
- Oversee wafer-level and product-level reliability testing programs.
- Improve test methods and lab capabilities to meet evolving needs.
- Support fab monitoring, customer requests, and engineering evaluations.
- Ensure smooth day-to-day lab operations and high equipment uptime.
- Promote a culture of safety, quality, and continuous improvement.
- Mentor engineers and build technical expertise within the team.
Requirements
- Degree in Engineering or Science (Mechanical, Chemical, or related).
- 10+ years in semiconductor/wafer fab, with proven leadership in reliability.
- Deep knowledge of WLR/PLR methods, reliability mechanisms, and test standards.
- Experience across major process technologies such as Automotive, Logic, HV, Flash/NVM.
- Familiarity with global standards (AEC-Q100, JEDEC, JEP001).
Reliability Engineering Senior Manager
Posted today
Job Description
SSMC (Systems on Silicon Manufacturing Company Pte. Ltd.) is a joint venture between NXP and TSMC. We offer flexible and cost-effective semiconductor fabrication solutions by maintaining a fully equipped SMIF cleanroom environment, 100% equipment automation and proven wafer-manufacturing processes.
We're looking for innovative, passionate, and talented people like you to join our team.
We're searching for a Manager / Senior MTS to join the diverse team in our QRE Department, to support Reliability Laboratory Operations and manage PLR and WLR Reliability Test Equipment (Preventive Maintenance, Calibration). The role also leads High Voltage (HV) Process Technologies Reliability Tests and supports Fab Monitoring / Qualification / Customer Issues / Engineering Change Evaluations.
What you will be working on:
- Lead and Set Up New Process Technology Reliability Qualification
- Define and Execute New Process Technology Reliability Qualification Plan Requirements to meet Technology Milestones
- Lead and Set Up New Process Technology Reliability Monitoring
- Conduct Process/Wafer Level Reliability (WLR) Tests and Analysis
- Conduct Product Level Reliability (PLR) Tests and Analysis
- Support Fab Monitoring / Qualification / Customer Issues / Engineering Change Evaluations and Perform Reliability Risk Assessments
- Develop and Set Up New or Enhanced Process and Product Reliability Tests / Analysis / Methodologies / Capabilities / Techniques
- Schedule & Prioritize Reliability Test Requests (Manpower, Skills, Tool resources)
- Keep in line with Industry and Mother-fabs' Reliability Tests & Requirement Trends / Development
- Support Reliability Laboratory Operations and Manage PLR and WLR Reliability Test Equipment (Preventive Maintenance, Calibration). Maintain Day-to-Day Reliability Laboratory Operations, Equipment Uptime
- Drive Continuous Improvement in Safety, Quality, Productivity of work processes and environment to achieve assigned department targets
- Training, Coaching and Development of Reliability Engineers
More about you:
- Master / Degree in Science or Engineering in Mechanical, Chemical Engineering or equivalent
- Extensive Experience: >10 years in Wafer Fab / Semiconductor Environment and Leading Role in WLR / PLR Reliability.
- In-depth understanding of Technologies, Trends and Needs
- Experience with major Process Technologies like Automotive, Logic, High Voltage, FLASH / EE / Non-Volatile-Memory (NVM), General Purpose Processes.
- In-depth Knowledge of Front-End / Back-End Reliability Mechanisms and Test Methodology (GOI, TDDB, HCI, NBTI, BTS, JS, PID, ESD, LU, EM, SV, Low-K IMD) (HTOL, EFR, IFR, THB, HAST, TMCL, TH, HTS, Pre-Con, Reflow)
- Good knowledge of International Standards & Requirements on Process & Product Reliability (AEC-Q100, JEDEC, JEP001)
SSMC is firmly committed to upholding equal employment opportunities for all individuals. We strictly adhere to the Tripartite Guidelines on Fair Employment Practices (TGFEP), the Singapore Workplace Fairness Act 2025 (WFA 2025), and the Singapore Code of Advertising Practice. All qualified applicants will receive non-discriminatory consideration for employment on the basis of merit and regardless of age, race, gender, religion, marital status and family responsibilities, or disability, or any other attributes as protected by the relevant laws.
Engineering Assistant (Reliability Engineering)
Posted 2 days ago
Job Description
Responsibilities:
- Undergo training in reliability tests to understand the test requirements and workflow.
- Prepare and submit laboratory reliability reports per laboratory requirements.
- Perform data analysis to identify any abnormal test results or testing errors.
- Report findings to Engineers and assist in preparing the action plan.
- Work with supervisors and shift technicians to clarify and/or perform follow-up actions such as amendments or retests if required.
- For product issues, provide findings and follow up with the Failure Analysis team to obtain the FA result.
- For laboratory errors, work with supervisors under the engineers' instruction to devise corrective actions and arrange a retest if necessary.
Requirements:
- Fresh Diploma or Nitec holder in Electronic/Electrical Engineering or equivalent.
- 2 years of relevant working experience in a manufacturing/lab environment will be advantageous.
- Proficient in Microsoft Excel.
- Positive mindset and willingness to learn.
- Transport pick-up at Bedok MRT station.
Work Location: Chai Chee (Bedok)
Work Days: Mondays to Fridays, 8.15am to 5.15pm
Reliability Engineering Manager - Information Security
Posted today
Job Description
Imagine what you could accomplish here. Bring your passion, creativity, and dedication, and there will be no limit to what you can achieve. This is not just another SRE role; it's a chance to help redefine how reliability engineering is practiced at hyper-scale. Our team is building the platforms that will autonomously operate Apple's core information security systems, setting a new bar for how critical services are managed.
Description
We are seeking exceptional engineers who thrive at the intersection of reliability, software development and automation - individuals driven to push the boundaries of what's possible. The ideal candidate has a strong foundation in modern SRE practices and a proven ability to design and implement software that solves operational challenges. You'll break new ground using the most advanced tools and approaches available, developing automation that doesn't just keep pace with scale but anticipates, reacts and stays ahead of it. You will work closely with Security Engineering, Threat Detection, Incident Response and other internal functions to ensure the scalability, availability and security of the tools and infrastructure that support our cybersecurity mission. Join us, and help build the future of self-managing systems at one of the most innovative companies in the world.
Responsibilities
- Inspire, mentor, and grow a high-performing team of SREs dedicated to automating and scaling Apple's core security platforms.
- Champion operational excellence by building resilient monitoring, alerting, and automated remediation practices that minimize downtime and manual effort.
- Advance infrastructure-as-code and automation to eliminate toil, improve consistency, and accelerate delivery of secure, reliable services.
- Partner closely with InfoSec stakeholders to translate security requirements into scalable, supportable, and performant solutions.
- Own the reliability of critical security systems (including SIEM, SOAR, telemetry, and vulnerability management), ensuring availability, performance, and capacity keep pace with business demand.
- Lead incident response with confidence, driving resolution of outages and infrastructure issues while fostering a blameless, learning-oriented culture.
- Define and enforce SLOs/SLIs for InfoSec services, using data to measure success and continuously improve.
- Collaborate across engineering and IT to embed best practices in CI/CD, containerization, and service orchestration.
- Uphold strong security hygiene and compliance, aligning with both internal standards and external regulatory requirements.
- Set direction and priorities for the team, managing resources, timelines, and initiatives to maximize impact.
Minimum Qualifications
- 5+ years of experience in SRE or Service Infrastructure roles, including 2+ years in a leadership or managerial role
- Strong understanding of modern SRE practices, including observability, automation, and reliability engineering
- Experience with cloud platforms (AWS, GCP) and infrastructure-as-code tools (Pulumi, Terraform, Ansible, etc.)
- Familiarity with container technologies (Docker, Kubernetes) and CI/CD pipelines
- Excellent communication skills with an ability to collaborate across technical and non-technical teams
Preferred Qualifications
- Bachelor's degree in Computer Science, or a related field, or equivalent practical experience
- Prior experience working in or closely with Information Security teams
- The ability to contribute and review code in Python, Go, Swift or other scripting languages
- Experience operating with Scrum/Agile development methodologies
- Ability to cultivate an environment that emphasizes collaboration, accountability, and excellence
- Experience managing systems that support InfoSec functions (e.g., security monitoring, log aggregation, scanning tools)
- Ability to work under pressure and manage difficult situations in a dynamic work environment
- Passion for high-quality code, unit-tests, documentation, and production services
- Previous experience working on a global team with a 24/7 support model
Senior Software Engineer, Site Reliability Engineering
Posted today
Job Description
We are a team that designs, develops, maintains, and improves software for various ventures projects, i.e., projects that are adjacent to our core businesses and are bootstrapped fast with a lean team. You will be actively involved in the design of various components behind scalable applications, from frontend UI to backend infrastructure.
What you’ll be doing
Ensure the entire stack is healthy, with hardware, software, application and network operating at optimal performance
Perform deep dives into both systemic and latent reliability issues; partnering with other software and DevOps engineers across the organization to design, implement and roll out fixes
Continuously improve availability, reliability, and observability and reduce the burden of human toil with tooling and automation
Lead and drive SRE initiatives to improve operational efficiency
Represent the SRE team in system design reviews and operational readiness exercises for new and existing services
What you need
Experience coding in Ruby and/or Go
Familiar with GitOps principles and tools (GitHub Actions, Docker, Kubernetes)
Experience in designing, analyzing, and troubleshooting large-scale distributed systems
Curiosity about finding root causes in incidents and outages
Ability to build alignment, cultivate relationships, and drive impact
A mindset for designing fault-tolerant system architecture
Comfort with being uncomfortable in ambiguous situations
Involvement with incident management and response
Desire to grow expertise, inform, and educate others
A fast learner able to pick up various technologies, with a “get things done” mentality
Humble to embrace better ideas from others, eager to make things better, open to challenges and possibilities
Desirable
Familiar with cloud platforms and microservice-based architecture (AWS is a big plus)
Familiar with monitoring tools (e.g. Datadog, OpenTelemetry)
Familiar with CI/CD tools (e.g. GitHub Actions)
Familiar with IaC tools (e.g. Terraform, Spacelift)
Experience in designing resilient system architecture
Experience in optimizing the performance of large-scale production systems
Life @ Crypto.com
Empowered to think big. Try new opportunities while working with a talented, ambitious and supportive team.
Transformational and proactive working environment. Empower employees to find thoughtful and innovative solutions.
Growth from within. We help to develop new skill-sets that would impact the shaping of your personal and professional growth.
Work Culture. Our colleagues are some of the best in the industry; we are all here to help and support one another.
One cohesive team. Engage stakeholders to achieve our ultimate goal - Cryptocurrency in every wallet.
Work Flexibility Adoption. Flexi-work hours and hybrid or remote set-up.
Explore career alternatives through us: our internal mobility program offers employees a new scope.
Work Perks: Crypto.com Visa card provided upon joining.
Benefits
Competitive salary.
Attractive annual leave entitlement including: birthday, work anniversary.
Our Crypto.com benefits packages vary depending on region requirements; you can learn more from our talent acquisition team.
About Crypto.com:
Founded in 2016, Crypto.com serves more than 80 million customers and is the world's fastest growing global cryptocurrency platform. Our vision is simple: Cryptocurrency in Every Wallet. Built on a foundation of security, privacy, and compliance, Crypto.com is committed to accelerating the adoption of cryptocurrency through innovation and empowering the next generation of builders, creators, and entrepreneurs to develop a fairer and more equitable digital ecosystem.
Crypto.com is an equal opportunities employer and we are committed to creating an environment where opportunities are presented to everyone in a fair and transparent way. Crypto.com values diversity and inclusion, seeking candidates with a variety of backgrounds, perspectives, and skills that complement and strengthen our team.
Personal data provided by applicants will be used for recruitment purposes only.
Please note that only shortlisted candidates will be contacted.
Senior Manager – Site Reliability Engineering (SRE)
Posted today
Job Description
Nice to Meet You! We are Dropsuite, a NinjaOne Company!
Site Ops teams are a blend of pragmatic operators and software craftspeople that apply sound engineering principles, operational discipline, and mature automation to our operating environments.
We are seeking a seasoned Senior Manager – Site Reliability Engineering (SRE) to lead a high-impact team focused on building resilient, scalable infrastructure and ensuring platform reliability across our cloud environments. This role combines strategic leadership with deep technical expertise in automation, observability, and modern DevOps practices to drive operational excellence and service uptime.
Work Arrangement
- Full-time position
- Hybrid work model (2 days per week in the office)
- Monday to Friday, 5-day work week (flexible work schedule)
- Eligible to reside and work in Singapore (Singapore Citizens / PRs preferred)
This position is open exclusively to candidates who reside in and are authorised to work in Singapore. Only shortlisted candidates will be contacted.
Key Accountabilities
- Define and implement SRE roadmaps aligned with business objectives and SLAs.
- Collaborate with service owners to define SLOs supporting SLA commitments.
- Deliver platform SLI insights through reports and observability tools.
- Integrate reliability best practices into engineering and product workflows.
- Lead initiatives on uptime, monitoring, incident response, and optimization.
- Manage incident response processes, on-call rotations, and playbooks.
- Set infrastructure resiliency standards for cloud-native environments.
- Optimize architecture for scalability, fault tolerance, and cost efficiency.
- Ensure production systems meet security and compliance requirements.
- Provide strategic leadership and mentorship to drive team growth and performance.
- Design scalable and resilient systems architecture.
- Recruit, mentor, and retain high-performing SRE talent.
- Develop growth and training plans for SRE team members.
- Foster a reliability-focused, customer-centric team culture.
Qualifications and Competencies
- Bachelor's degree in Computer Science or a related field.
- Cloud certification in AWS, Azure, or GCP preferred.
- 8+ years in Software Engineering or Site Reliability Engineering.
- 3+ years in team management or technical leadership.
- Expert-level Linux administration, scripting, and troubleshooting.
- Strong hands-on experience with CI/CD and SDLC practices.
- Deep passion for automation, security, and self-service.
- Proficient in AWS, GCP, and/or Azure cloud platforms.
- Skilled in infrastructure-as-code tools like Terraform, CloudFormation, Helm, and Ansible.
- Experienced with containers, Kubernetes, and microservice architectures.
- Excellent verbal and written communication skills.
Why Join Us
At Dropsuite, now proudly part of NinjaOne, we are on a mission to safeguard business information and help businesses stay in business. We are a global, fast-growing, partner-centric company building secure, scalable, and highly usable cloud backup technologies for businesses of all sizes. Today, we perform billions of backups daily for organizations across more than 100 countries.
As we enter an exciting new chapter with NinjaOne—a leader in endpoint management, security, and IT automation—our combined strengths enable us to drive even greater impact, innovation, and global scale. Together, we are building a world-class platform that empowers IT teams with simplicity, performance, and reliability.
At our core, we are a team of hungry owners: we are tenacious in our pursuit of excellence and take full ownership in everything we do. We are deeply customer-focused, collaborative, and solutions-driven. We play as a team—respecting, supporting, and elevating one another every step of the way.
Join us as we shape the future of IT and data protection—powered by passion, purpose, and the spirit of ownership.
Rewards That Go Beyond
- Competitive compensation
- Hybrid work model
- 18 days of annual leave (with accrual up to 20 days)
- Entitled to Singapore Public Holidays
- Other leave benefits, such as Wedding leave
- Health Insurance for you and your dependents
- Growth opportunities
- Work in a global company with meaningful work, highly skilled colleagues, and an amazing culture
Diversity and Inclusion Statement
Dropsuite is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status.
As part of our recruitment process, we may collect personal data to support hiring-related activities such as screening, assessment, and communication. This information is collected solely for recruitment purposes and handled in accordance with applicable data protection and privacy regulations. Your data will be treated with strict confidentiality and used only to facilitate your application with us.
Your Career Growth Starts Here. Apply Now
Principal Specialist, Platforms Reliability Engineering (Networks)
Posted today
Job Description
Overview
Singtel Networks is transforming to enable the digital generation of tomorrow. We are introducing new capabilities in 5G, Cloud, Analytics, Digital Commerce, Software Engineering, and Cyber Security to enhance our core competencies and deliver innovative and differentiated services for our customers. We are committed to inclusion and diversity and upskilling all individuals. We build Singtel’s Networks of tomorrow and empower every generation to live, work and play in new ways.
We are an Employer of Choice and strive for a vibrant, diverse and inclusive workforce with a fair, performance-based culture that is collaborative.
Vaccination policy:
We are committed to a safe and healthy environment for our employees and customers and will require all prospective employees to be fully vaccinated.
Responsibilities
Lead the design, development, and operation of scalable platforms including NSB, SO, NDB and NDC to enable Telco APIs and digital services across Consumer and Enterprise domains.
Institutionalize DevOps methodologies infused with AI to optimize planning, coding, testing, and deployment cycles; deliver secure, scalable microservices infrastructure through AI-enhanced CI/CD pipelines and cloud-native technologies, enabling self-optimizing, resilient, and adaptive systems.
Research and explore new technologies in platform engineering, automation, cybersecurity, and cloud computing (hybrid/multi/edge) for incorporation into platform architecture and solutions.
Manage and align various teams and stakeholders, including top management, to ensure timely and secure delivery of key building blocks for Autonomous Networks.
Collaborate with business units to gather and prioritize requirements for platform delivery.
Oversee service management for production platforms, ensuring reliable operations in accordance with network SLAs and IMDA regulations through proactive monitoring and reporting.
Manage change processes to ensure software releases align with internal change management and deployment protocols.
Lead incident management efforts, including troubleshooting and root cause analysis of platform issues.
Qualifications
Bachelor's Degree in IT/Computer Science/Computer Engineering or relevant discipline.
Minimum 12 years of working experience in DevOps automation, containerization, platform engineering and site reliability engineering.
Experience in platforms engineering with strong understanding of containerization, API gateway and enterprise integration.
Strong knowledge of software development automation tools (e.g. Ansible, Terraform, Nexus, Jenkins, SoapUI, SonarQube).
Strong scripting skills (e.g. Python, Bash, JavaScript, Ruby).
Strong understanding and experience in virtualization and networking in a container environment, such as OpenShift/Kubernetes.
Strong understanding of cloud computing/container deployment and management (AWS/Azure/OpenStack, etc.).
Breadth of knowledge – OS, system administration, networking, infrastructure, storage, distributed computing, cloud computing.
Strong understanding of Agile projects (SCRUM/KANBAN) and tools (e.g., JIRA).
Experience in project planning and management activities, including finance and procurement, and in translating business requirements into actionable deliverables.
Rewards and Benefits
Full suite of health and wellness benefits
Ongoing training and development programs
Internal mobility opportunities
As we enter an exciting new chapter with NinjaOne—a leader in endpoint management, security, and IT automation—our combined strengths enable us to drive even greater impact, innovation, and global scale. Together, we are building a world-class platform that empowers IT teams with simplicity, performance, and reliability.
At our core, we are a team of hungry owners: we are tenacious in our pursuit of excellence and take full ownership in everything we do. We are deeply customer-focused, collaborative, and solutions-driven. We play as a team—respecting, supporting, and elevating one another every step of the way.
Join us as we shape the future of IT and data protection—powered by passion, purpose, and the spirit of ownership.
Rewards That Go Beyond
- Competitive compensation
- Hybrid work model
- 18 days of annual leave (with accrual up to 20 days)
- Entitled to Singapore Public Holidays
- Other leave benefits, such as Wedding leave
- Health Insurance for you and your dependents
- Growth opportunities
- Work in a global company with meaningful work, highly skilled colleagues, and an amazing culture
Diversity and Inclusion Statement
Dropsuite is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status.
As part of our recruitment process, we may collect personal data to support hiring-related activities such as screening, assessment, and communication. This information is collected solely for recruitment purposes and handled in accordance with applicable data protection and privacy regulations. Your data will be treated with strict confidentiality and used only to facilitate your application with us.
Your Career Growth Starts Here. Apply Now!