952 Reliability Engineering jobs in Singapore
Reliability Engineering Specialist
Posted today
Job Description
WHAT YOU DO AT AMD CHANGES EVERYTHING
We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.
AMD together we advance_
THE ROLE:
Join a dynamic global team dedicated to advanced reliability testing of module and system boards of AMD's cutting-edge products. Collaborate closely with cross-functional teams across AMD Global Operations & Quality, and Data Center organizations on accelerator-product system setup and reliability testing.
KEY RESPONSIBILITIES:
- System-level setup and testing:
- Plan, execute, and optimize system-level setups for accelerator products, including server rack and system configurations.
- Ensure seamless integration and functionality of server systems with advanced cooling solutions and environmental management systems.
- Validate and maintain reliability test scripts for automated and manual testing processes.
- Reliability assessment and testing:
- Conduct comprehensive reliability assessments of accelerator systems, focusing on mechanical, thermal, and electrical stress factors.
- Design and implement environmental stress tests to simulate data center conditions, including operational stress, thermal cycling, and signal and power integrity.
- Evaluate material interactions and their impact on product reliability, ensuring robustness in diverse operating environments.
- Analyze results to identify potential reliability risks and areas for design improvement.
- Functional testing and fault isolation:
- Perform detailed functional testing to evaluate system performance under various operational conditions.
- Identify, isolate, and troubleshoot faults using advanced diagnostic tools and methodologies.
- Failure analysis and reporting:
- Perform root cause analysis for identified reliability failures and develop corrective actions for design and process enhancement.
- Collaborate with cross-functional teams to conduct root cause analysis of reliability testing failures.
- Collaboration and documentation:
- Work closely with design, manufacturing, and quality teams to align reliability goals with overall product requirements.
- Generate comprehensive reports detailing reliability test results, analysis, and recommendations.
- Maintain meticulous records of testing methodologies and outcomes for future reference and continuous improvement initiatives.
- Mentorship:
- Effectively mentor junior engineers, providing guidance in both technical domains and professional skill development to foster growth and team success.
PREFERRED EXPERIENCE:
- Knowledge of reliability engineering principles, product lifecycle, and standards in high-performance computing environments.
- Proven experience in system-level setup and testing for accelerator products or similar technologies.
- Proficiency in developing and executing reliability test scripts and protocols.
- Familiarity with reliability standards and best practices in high-performance computing environments.
- Familiarity with data center environmental management, server rack/system configurations, and integrated cooling solutions.
- Strong understanding of environmental stress factors, including thermal, mechanical, and electrical stresses, in server systems (L6–L10).
- Expertise in failure analysis techniques, including root cause analysis and fault isolation methodologies.
- Excellent written and verbal communication skills for clear reporting and collaboration.
- Strong analytical, problem-solving, and communication skills.
- Experience with reliability testing tools, simulation software and statistical tools is an added advantage.
- Knowledge in project and risk management is an added advantage.
- Self-starter and able to independently drive tasks to completion.
- Ability to structure and execute complex analysis, draw insights, and communicate summary conclusions/recommendations to senior management and AMD customers/partners.
- Ability to network, build relationships, and collaborate to drive effective decision-making across multiple functions and levels within AMD.
ACADEMIC CREDENTIALS:
- Bachelor’s or Master’s degree in Electrical/Electronics Engineering (EE) or a related field.
LOCATION:
Singapore
Benefits offered are described in AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
Reliability Engineering Specialist
Posted today
Job Description
Reliability Engineering Expert
We are seeking talented and driven professionals to join our team of reliability engineers. This role involves helping organizations enhance the availability, performance, and resilience of their applications and services through the deployment and administration of Observability Platforms.
Responsibilities:
- Deploy and manage Observability platforms and agents for ingesting metrics, logs, and traces from various sources.
- Parse and organize logs to extract relevant fields and data for processing and filtering.
- Assist developers in instrumenting application code to collect custom Application Performance Monitoring (APM) data.
- Record, script, and manage synthetic monitors for testing purposes.
- Capture user sessions and data for real user monitoring (RUM).
- Set up alerts and notifications for proactive monitoring.
- Generate dashboards, visualizations, and reports to provide actionable insights.
- Participate in and support root cause analysis (RCA) and application/service profiling sessions.
- Educate and assist teams in leveraging observability tools effectively.
Requirements:
- At least 2 years of experience working with modern observability platforms.
- Familiarity with observability concepts and standards such as OpenTelemetry.
- Experience with observability tools like the Elastic Stack for monitoring cloud infrastructure and application performance.
- Knowledge of developing, instrumenting, and profiling applications to enhance performance and reliability.
VP of Site Reliability Engineering
Posted 1 day ago
Job Description
- Technology is key to enabling the DBS vision of being the leading bank in Asia. To meet the challenges arising from the ever-evolving technological advancements and increasing sophistication and demands of customers, there is a need for deft Technology Risk Managers to ensure robust risk governance.
- As a member of the Technology Risk Management team, you will oversee a global portfolio of technology risk management activities (includes participating in any technology risk management related initiatives), with a focus on:
- Targeted Risk Reviews
- Policy/Standard/Guide enforcement validation
- Thematic risk analysis for IT risks
- This role ensures that DBS Bank's technology risk framework aligns with global regulatory requirements (MAS, HKMA, RBI, GDPR, etc.), industry best practices (NIST, ISO 27001, COBIT), and internal policies, while identifying vulnerabilities and recommending mitigation strategies.
- The position requires a strategic leader who can identify systemic risks, drive audit remediation, and enhance governance across all regions where DBS operates.
- Accountable for managing internal and external reviews/audits from audit planning (such as request for information (RFI), opening meeting, etc.), fieldwork (such as RFI, issue discussion, etc.), to reporting and closing meeting.
- Responsible for monitoring and validating the closure of management actions, arising from internal and external reviews/audits, including regulator inspection reviews.
- Review new or revised processes, provide a risk opinion, and ensure proper approvals and documentation.
- Collaborate with the different technology teams to conduct post implementation review of new / revised processes to provide assurance.
- Drive automation (e.g., data analytics, AI/ML) for continuous auditing.
- Prepare and develop technology risk insights (such as IT audit thematic and trend analysis) to be presented at forums (such as technology risk forums, etc.).
- Engage and collaborate with technology stakeholders to proactively identify risks at a detailed and technical level and ensure that IT is effectively driving remediation activities and to continuously improve IT risk posture.
- Proactive in forging effective engagement with key stakeholders relating to risk & control matters.
- Provide risk assessment and advisory as required:
- Evaluate the effectiveness of IT risk governance, security policies, and control frameworks.
- Provide actionable recommendations to senior management for risk mitigation.
- Subject matter expert in Site Reliability Engineering.
- Manage technology risk initiatives and target reviews.
Required Experience
- At least 12 years (SVP) / 8 years (VP) of experience, preferably with exposure to risk management (in control functions, including technology).
- Demonstrated experience in identifying, assessing, and advising on technology risks.
- Excellent organizational, problem solving, interpersonal and operating skills to effectively drive the IT Risk agenda with IT functions.
- Strong communication skills at all levels -- able to effectively communicate with IT and senior management, as well as line staff to drive IT risk mitigation initiatives and other IT risk management related areas.
- Experience in driving IT risk management in the digital age is a plus.
- Knowledge of Information Security, System Resiliency & Availability & Software development practices and frameworks and regulatory requirements preferred.
- Subject matter expertise in Site Reliability Engineering, including but not limited to the following areas:
- SDLC governance (including CI/CD and SQA)
- DevOps, Release & deployment
- Change management
- Problem/Incident management
- Disaster recovery
- Good technical competencies and exposure to IT application or infrastructure development, support and management.
- Demonstrated experience of leveraging data and analytics to get stakeholder buy-in is a plus.
- Strong executive communication (for Technology EXCO-level reporting).
- Ability to translate technical risks into business impact.
- Leadership in driving cultural change toward risk awareness.
- Bachelor's/Master's in Computer Science, or related field.
- Certifications (Required): CISA, CISSP, CRISC, CISM, or equivalent.
- Preferred: ISO 27001 Lead Auditor, AWS/Azure Security, CCSP.
Senior Software Engineer, Site Reliability Engineering
Posted 1 day ago
Job Description
We are a team that designs, develops, maintains, and improves software for various ventures projects, i.e., projects that are adjacent to our core businesses and are bootstrapped fast with a lean team. You will be actively involved in the design of various components behind scalable applications, from frontend UI to backend infrastructure.
- Ensure entire stack is healthy: hardware, software, application and network are operating at optimal performance
- Perform deep dives into both systemic and latent reliability issues; partnering with other software and DevOps engineers across the organization to design, implement and roll out fixes
- Continuously improve availability, reliability, and observability and reduce the burden of human toil with tooling and automation
- Lead and drive SRE initiatives to improve operation efficiencies
- Represent the SRE team in system design reviews and operational readiness exercises for new and existing services
- Experience coding in Ruby and/or Go
- Familiar with GitOps principles and tools (GitHub Actions, Docker, Kubernetes)
- Experience in designing, analyzing, and troubleshooting large-scale distributed systems
- Curiosity about finding root causes in incidents and outages
- Ability to develop alignment, cultivate relationships, and drive impact
- Mindset for designing fault-tolerant system architecture
- Comfort with being uncomfortable in ambiguous situations
- Involvement with incident management and response
- Desire to grow expertise, inform, and educate others
- Able to pick up various technologies; a fast learner with a “get things done” mentality
- Humble to embrace better ideas from others, eager to make things better, open to challenges and possibilities
- Familiar with cloud platforms and microservice-based architecture (AWS is a big plus)
- Familiar with monitoring tools (e.g. Datadog, OpenTelemetry)
- Familiar with CI/CD tools (e.g. GitHub Actions)
- Familiar with IaC tools (e.g. Terraform, Spacelift)
- Experience in designing resilient system architecture
- Experience in optimizing performance of large-scale production systems
Life @ Crypto.com
Empowered to think big. Try new opportunities while working with a talented, ambitious and supportive team.
Transformational and proactive working environment. Empower employees to find thoughtful and innovative solutions.
Growth from within. We help to develop new skill-sets that would impact the shaping of your personal and professional growth.
Work Culture. Our colleagues are some of the best in the industry; we are all here to help and support one another.
One cohesive team. Engage stakeholders to achieve our ultimate goal - Cryptocurrency in every wallet.
Work Flexibility Adoption. Flexi-work hour and hybrid or remote set-up
Aspire career alternatives through us - our internal mobility program offers employees a new scope.
Work Perks: crypto.com visa card provided upon joining
Are you ready to kickstart your future with us?
Benefits
Competitive salary
Attractive annual leave entitlement including: birthday, work anniversary
Work Flexibility Adoption. Flexi-work hour and hybrid or remote set-up
Aspire career alternatives through us. Our internal mobility program can offer employees a diverse scope.
Work Perks: crypto.com visa card provided upon joining
Our Crypto.com benefits packages vary depending on region requirements, you can learn more from our talent acquisition team.
About Crypto.com:
Founded in 2016, Crypto.com serves more than 80 million customers and is the world's fastest growing global cryptocurrency platform. Our vision is simple: Cryptocurrency in Every Wallet. Built on a foundation of security, privacy, and compliance, Crypto.com is committed to accelerating the adoption of cryptocurrency through innovation and empowering the next generation of builders, creators, and entrepreneurs to develop a fairer and more equitable digital ecosystem.
Learn more at
Crypto.com is an equal opportunities employer and we are committed to creating an environment where opportunities are presented to everyone in a fair and transparent way. Crypto.com values diversity and inclusion, seeking candidates with a variety of backgrounds, perspectives, and skills that complement and strengthen our team.
Personal data provided by applicants will be used for recruitment purposes only.
Please note that only shortlisted candidates will be contacted.
Senior Manager – Site Reliability Engineering (SRE)
Posted today
Job Description
Nice to Meet You
We are Dropsuite, a NinjaOne Company
Site Ops teams are a blend of pragmatic operators and software craftspeople that apply sound engineering principles, operational discipline, and mature automation to our operating environments.
We are seeking a seasoned Senior Manager – Site Reliability Engineering (SRE) to lead a high-impact team focused on building resilient, scalable infrastructure and ensuring platform reliability across our cloud environments. This role combines strategic leadership with deep technical expertise in automation, observability, and modern DevOps practices to drive operational excellence and service uptime.
Work Arrangement
- Full-time position
- Hybrid work model (2 days per week in the office)
- Monday to Friday, 5-day work week (flexible work schedule)
- Eligible to reside and work in Singapore (Singapore Citizens / PRs preferred)
This position is open exclusively to candidates who reside in and are authorised to work in Singapore. Only shortlisted candidates will be contacted.
Key Accountabilities
- Define and implement SRE roadmaps aligned with business objectives and SLAs.
- Collaborate with service owners to define SLOs supporting SLA commitments.
- Deliver platform SLI insights through reports and observability tools.
- Integrate reliability best practices into engineering and product workflows.
- Lead initiatives on uptime, monitoring, incident response, and optimization.
- Manage incident response processes, on-call rotations, and playbooks.
- Set infrastructure resiliency standards for cloud-native environments.
- Optimize architecture for scalability, fault tolerance, and cost efficiency.
- Ensure production systems meet security and compliance requirements.
- Provide strategic leadership and mentorship to drive team growth and performance.
- Design scalable and resilient systems architecture.
- Recruit, mentor, and retain high-performing SRE talent.
- Develop growth and training plans for SRE team members.
- Foster a reliability-focused, customer-centric team culture.
Qualifications and Competencies
- Bachelor's degree in Computer Science or a related field.
- Cloud certification in AWS, Azure, or GCP preferred.
- 8+ years in Software Engineering or Site Reliability Engineering.
- 3+ years in team management or technical leadership.
- Expert-level Linux administration, scripting, and troubleshooting.
- Strong hands-on experience with CI/CD and SDLC practices.
- Deep passion for automation, security, and self-service.
- Proficient in AWS, GCP, and/or Azure cloud platforms.
- Skilled in infrastructure-as-code tools like Terraform, CloudFormation, Helm, and Ansible.
- Experienced with containers, Kubernetes, and microservice architectures.
- Excellent verbal and written communication skills.
Why Join Us
At Dropsuite, now proudly part of NinjaOne, we are on a mission to safeguard business information and help businesses stay in business. We are a global, fast-growing, partner-centric company building secure, scalable, and highly usable cloud backup technologies for businesses of all sizes. Today, we perform billions of backups daily for organizations across more than 100 countries.
As we enter an exciting new chapter with NinjaOne—a leader in endpoint management, security, and IT automation—our combined strengths enable us to drive even greater impact, innovation, and global scale. Together, we are building a world-class platform that empowers IT teams with simplicity, performance, and reliability.
At our core, we are a team of hungry owners: we are tenacious in our pursuit of excellence and take full ownership in everything we do. We are deeply customer-focused, collaborative, and solutions-driven. We play as a team—respecting, supporting, and elevating one another every step of the way.
Join us as we shape the future of IT and data protection—powered by passion, purpose, and the spirit of ownership.
Rewards That Go Beyond
- Competitive compensation
- Hybrid work model
- 18 days of annual leave (with accrual up to 20 days)
- Entitled to Singapore Public Holidays
- Other leave benefits, such as Wedding leave
- Health Insurance for you and your dependents
- Growth opportunities
- Work in a global company with meaningful work, highly skilled colleagues, and an amazing culture
Diversity and Inclusion Statement
Dropsuite is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status.
As part of our recruitment process, we may collect personal data to support hiring-related activities such as screening, assessment, and communication. This information is collected solely for recruitment purposes and handled in accordance with applicable data protection and privacy regulations. Your data will be treated with strict confidentiality and used only to facilitate your application with us.
Your Career Growth Starts Here. Apply Now
Skills
Troubleshooting
Scalability
Operational Excellence
Kubernetes
Azure
Ubuntu
Software Engineering
Scripting
Reliability
Administration Management
Reliability Engineering
Technical Consultation
GCP
Ansible
Linux
Principal Network Development Engineer - Network Reliability Engineering

Posted 28 days ago
Job Description
**About the Role:**
As a Principal Engineer within NRE, you will be responsible for ensuring the reliability, scalability, and security of OCI's network infrastructure. You will apply engineering principles to measure and automate the network's reliability, aligning it with Oracle's service-level objectives. This role will involve resolving complex network issues, collaborating across teams, and driving automation efforts that enhance the overall operational efficiency of the OCI network. You'll work with a team dedicated to proactively preventing network disruptions, performing root-cause analysis, and delivering innovative solutions that ensure the smooth operation of a global network environment.
**What You'll Do:**
+ **Lead Network Reliability Efforts**: Develop, automate, and optimize network services that ensure high availability and performance across OCI's global infrastructure.
+ **Network Lifecycle Management**: Drive key programs to manage and maintain the network lifecycle, defining objectives and coordinating delivery milestones to meet organizational goals.
+ **Troubleshoot and Resolve Complex Network Issues**: Serve as the technical expert for network events, providing Tier 2 support and leading efforts to quickly restore services.
+ **Drive Automation**: Develop scripts and automation tools to improve operational efficiency, reduce manual interventions, and support a rapidly evolving network environment.
+ **Collaborate Across Teams**: Work closely with cross-functional teams, including engineering, product, and vendor partners, to design, implement, and optimize network solutions that meet the needs of both the business and end-users.
+ **Mentor and Lead**: Provide technical leadership and mentorship to junior engineers, helping them develop their skills and grow within the organization.
+ **Innovate and Influence**: Contribute to the roadmap for new network technologies, tools, and methodologies that enhance OCI's network performance and reliability.
Career Level - IC4
**Responsibilities**
**What You'll Need to Succeed:**
+ **Technical Expertise**: Extensive experience in network engineering, with a strong background in protocols like **MPLS, BGP, OSPF, IS-IS, TCP/IP, IPv4, IPv6, DNS**, and **DHCP**. Experience with **VxLAN**, **EVPN**, and **SDN technologies** is a plus.
+ **Automation Skills**: Proficiency in scripting or programming, ideally with **Python**, to develop solutions that automate network operations and troubleshooting.
+ **Deep Understanding of Networking**: Strong knowledge of networking protocols, monitoring tools, telemetry solutions, and network modeling techniques (e.g., **YANG, OpenConfig, NETCONF**).
+ **Experience in Cloud or ISP Environments**: Proven track record in large-scale cloud or ISP network environments, ideally supporting complex, multi-cloud infrastructures.
+ **Problem-Solving Mindset**: Excellent analytical and troubleshooting skills, with a focus on proactive identification and resolution of network issues.
+ **Collaboration and Leadership**: Ability to work effectively in a fast-paced, cross-functional team environment. Experience leading technical teams or projects is highly desirable.
**Preferred Experience:**
+ Experience with **network modeling** and **automation frameworks** for large-scale networks.
+ Familiarity with **cloud-native network architectures** and modern network management tools.
+ Experience with **network monitoring**, **telemetry** systems, and **telemetry-based decision-making**.
**Additional Information:**
+ This role requires participation in an **on-call rotation** to provide 24/7 support for critical network events and incidents.
+ You will work in a **high-impact, high-visibility role** with opportunities for technical leadership and career advancement.
+ This role is open to Singaporeans and PRs only.
+ This role will involve the successful applicant working on government projects which may require security clearance being obtained and maintained as a condition of employment.
**What We Offer:**
+ **Impact at Scale**: Work on projects that support millions of users and some of the largest organizations in the world.
+ **Global Reach**: Collaborate with engineers, leaders, and vendors across the globe to build and operate Oracle Cloud's network.
+ **Innovation and Growth**: Opportunity to work with cutting-edge technologies and drive innovation in a fast-evolving field.
+ **Supportive Culture**: A culture of collaboration, continuous learning, and growth, where your contributions matter.
**About Us**
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's challenges. We've partnered with industry leaders in almost every sector, and continue to thrive after 40+ years of change by operating with integrity.
We know that true innovation starts when everyone is empowered to contribute. That's why we're committed to growing an inclusive workforce that promotes opportunities for all.
Oracle careers open the door to global opportunities where work-life balance flourishes. We offer competitive benefits based on parity and consistency and support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing or by calling +1 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
Senior/Expert Engineer, Site Reliability Engineering (Garena)
Posted 5 days ago
Job Description
- Deep dive into development lines, learning and understanding the mechanism of every application component, and promoting product scalability, stability and performance.
- Set up, manage, and maintain product/middleware/big-data applications and services.
- Perform regular and ad-hoc server-side deployments, performance fine-tuning and troubleshooting.
- Design and develop automations for our workflow.
- Capacity and Resource management.
- Responsible for the full-chain stress test to enhance the performance and remove redundancy of applications.
- Prepare routine operation documentation.
Job Requirements
- Bachelor’s or higher degree in Computer Science, Engineering, Information Systems or related fields.
- Minimum 3 years of relevant full-time working experience in Site Reliability Engineer roles.
- Extensive hands-on knowledge of Linux operating systems (Ubuntu, CentOS, etc.).
- Extensive hands-on knowledge of Kubernetes and its ecosystem.
- Knowledge of computer networking (TCP/IP, DNS, etc.) and operating systems.
- Hands-on experience with at least one of the programming languages: Bash, Python, Go.
- Strong analytical and problem-solving skills with the ability to thrive under high-pressure situations.
- Fast learning ability and a good team player.
- Detail-oriented, cautious and prudent.
Process Improvement Professional
Posted today
Job Description
We are seeking a skilled Process Excellence Specialist to drive process improvement initiatives across our terminal operations. The successful candidate will be responsible for identifying and implementing operational excellence solutions, leveraging Lean and Six Sigma methodologies.
Process Improvement Specialist
Posted today
Job Description
Seeking a detail-oriented professional to fill a unique role that combines process auditing and administrative responsibilities. In this capacity, you will leverage your analytical skills to ensure operational efficiency by developing and implementing processes that align with business objectives. Regular audits will be conducted to identify areas for improvement, and collaboration with various departments will be necessary to ensure compliance with industry standards and company policies.
Key Responsibilities
- Develop and implement processes that align with business requirements.
- Conduct regular audits to identify areas for improvement.
- Collaborate with various departments to ensure compliance with company policies and industry standards.
- Strong analytical and problem-solving skills.
- Ability to work independently and collaboratively as part of a team.
- Excellent communication and interpersonal skills.
Our organization offers a dynamic work environment, opportunities for growth and development, and competitive compensation and benefits packages.
Others
This is an excellent opportunity for individuals who are passionate about process improvement and enjoy working in a fast-paced environment.