1,313 Reliability Engineer jobs in Singapore
Reliability Engineer
Posted today
Job Description
For more than 100 years, John Crane has equipped global process industries to meet mission-critical challenges. Our customers depend on John Crane to ensure their operations run efficiently and effectively. We deliver on that promise with technologies that maximize reliability, innovations that improve efficiency and services that enable a proven rapid response.
As pioneers of progress, we are committed to leveraging our legacy of technology leadership, innovative solutions and service excellence to help customers achieve their net-zero ambitions. We're fully committed to supporting global efforts that address climate change and are taking concrete steps to reach net carbon neutrality by 2050. Building a sustainable future, today.
Purpose of Role:
To provide on-site support and manage the reliability contract for mechanical seals for customers in Singapore.
Key Activities:
- Assist and guide seal installation/removal on site
- On-site pump/seal initial assessment and inspection
- Support commissioning of seal system
- Perform 5-point checks on the equipment prior to seal installation
- Conduct site tours, identify preventive activities (e.g., monitoring plan 11/13/23 for temperatures, pressures for plan 52/53A/B/C, flow monitoring on plan 54, bearing oil condition monitoring and general leakage on the equipment) and implement recommendations proactively to eliminate potential causes of seal failures.
- Co-ordinate seal activation and tracking
- Ensure and monitor timely repair of seals
- Prepare Inspection Reports and perform failure analysis
- Conduct Root Cause Analyses (RCAs) for bad actors in the plant
- Be responsible for reliability data management, trending, analysis, and reporting.
- Conduct monthly (Alliance Improvement team), quarterly, and yearly meetings with stakeholders
- Prepare monthly and yearly reports for external stakeholders.
- Inventory management and optimization and conduct audits
- Gain understanding of systematic and chronic failures and develop plan accordingly to eliminate root cause of failures
Key Accountability:
- Maintain the equipment & seal database and failure history using John Crane Interface Reliability Management software for all equipment within the scope of the contract.
- Perform root cause analysis, failure mode assessment, recommendations, and reporting.
- Monitor KPIs on a periodic basis
- Understand customers' needs, review installations, make engineering proposals to optimize arrangements, quantify benefits, and prepare and present proposals.
Education:
- Bachelor's degree in Engineering, preferably in Mechanical Engineering.
Experience:
- 3-8 Years
Technical Skills:
- Sound Knowledge of pumps & mechanical seals in Oil & Gas or Petrochem Industry.
- Knowledge of API Standards.
- Field experience in Installation, commissioning & maintenance of Pumps & seals.
- Exposure to Condition Monitoring techniques such as Vibration Analysis, Ultrasound Analysis, and Motion Amplification will be an added advantage.
People Skills
- Self-driven & Result Oriented
- Problem-solving attitude
Responsible:
- Mechanical seal refurbishment (may include cleaning)
- Mechanical seal installation/dismantling
- Plant tour
- Physical inventory (seal parts)
We believe that different perspectives and backgrounds are what make a company flourish. All qualified applicants will receive equal consideration for employment regardless of color, religion, gender, sexual orientation, gender identity, national origin, economic status, disability, age, or any other legally protected characteristics. We are proud to be an inclusive company with values grounded in equality and ethics, where we celebrate, support, and embrace diversity.
At no time during the hiring process will Smiths Group, nor any of our recruitment partners, ever request payment to enable participation – including, but not limited to, interviews or testing. Avoid fraudulent requests by applying to jobs directly through our careers website (Careers - Smiths Group plc).
Reliability Engineer
Posted today
Job Description
About Us
Applied Materials is an innovation-driven company. We are the leader in materials engineering solutions used to produce virtually every new chip and advanced display in the world. Our ability to identify emerging technologies that can complement and leverage Applied's materials engineering expertise is critical to the company's continued growth.
About The Role
We are seeking a motivated and hands-on Reliability Engineer to support the introduction and scale-up of our next-generation semiconductor manufacturing line. This role will be pivotal in ensuring robust reliability performance through technology transfer, early-stage validation, and high-volume production. You will collaborate across engineering, design, quality, and manufacturing teams to influence product development decisions and ensure smooth ramp-up from pilot to production.
Responsibilities:
- Develop and execute advanced reliability test strategies across the product lifecycle, from concept through mass production.
- Lead and document reliability risk assessments including Failure Mode and Effects Analysis (FMEA).
- Manage the Reliability Lab, including test equipment calibration, setup of chambers/fixtures, and maintaining best-in-class testing standards.
- Prepare detailed reliability test plans, results, and reports with a focus on prioritization and on-time execution.
- Provide design guidance and influence multi-disciplinary teams to improve product robustness and long-term reliability.
- Support technology transfer and process integration efforts during factory ramp-up, ensuring efficient and disciplined execution.
- Collaborate globally across time zones, with flexibility to travel between Southeast Asia and the United States (up to 15%).
Qualifications:
- Bachelor's or Master's in Engineering (Electrical, Electronics, Microelectronics, Physics, Materials Science, Chemical Engineering, or related).
- Strong foundation in semiconductor processes (deposition, etch, lithography, metrology) and device fundamentals.
- Proficiency in reliability testing, statistical analysis, and acceleration models, with a track record of executing programs and mitigating risk in manufacturing or R&D.
- Familiarity with advanced failure-analysis tools and methods (e.g., optical microscopy, X-ray/CT, SEM/EDX).
- Proven track record in driving process transfer, reliability test execution, and risk mitigation in a manufacturing or R&D environment.
- Experience with consumer electronics or semiconductor reliability programs.
Reliability Engineer
Posted today
Job Description
The Maintenance & Reliability Engineer will play a crucial role in ensuring the reliability of our manufacturing facility.
Responsibilities:
- Ensure all maintenance activities comply with company guidelines and ensure closure of issues
- Lead investigations, RCA, and CAPA actions. Analyse maintenance data and issues raised to identify trends, root causes of failures, and opportunities for improvement
- Support the implementation and continuous improvement of the reliability program, including predictive and preventive maintenance strategies
- Coach and ensure safe and proper techniques to monitor equipment well-being
- Propose and recommend cost justification for the maintenance work
Requirements:
- Degree in Mechanical / Electrical Engineering or related discipline
- Minimum 4-8 years of relevant working experience in maintenance and reliability
- Proficiency in maintenance management software, reliability analysis tools and data analysis software
- Strong understanding of predictive and preventive maintenance techniques, root cause analysis, and reliability-centered maintenance.
- Good problem-solving, communication and stakeholder management skills
Reliability Engineer
Posted today
Job Description
About Us
Applied Materials is an innovation-driven company. We are the leader in materials engineering solutions used to produce virtually every new chip and advanced display in the world. Our ability to identify emerging technologies that can complement and leverage Applied's materials engineering expertise is critical to the company's continued growth.
About this role
We are seeking a motivated and hands-on Reliability Engineer to support the introduction and scale-up of our next-generation semiconductor manufacturing line. This role will be pivotal in ensuring robust reliability performance through technology transfer, early-stage validation, and high-volume production. You will collaborate across engineering, design, quality, and manufacturing teams to influence product development decisions and ensure smooth ramp-up from pilot to production.
Responsibilities:
- Develop and execute advanced reliability test strategies across the product lifecycle, from concept through mass production.
- Lead and document reliability risk assessments including Failure Mode and Effects Analysis (FMEA).
- Manage the Reliability Lab, including test equipment calibration, setup of chambers/fixtures, and maintaining best-in-class testing standards.
- Prepare detailed reliability test plans, results, and reports with a focus on prioritization and on-time execution.
- Provide design guidance and influence multi-disciplinary teams to improve product robustness and long-term reliability.
- Support technology transfer and process integration efforts during factory ramp-up, ensuring efficient and disciplined execution.
- Collaborate globally across time zones, with flexibility to travel between Southeast Asia and the United States (up to 15%).
Qualifications:
- Bachelor's or Master's in Engineering (Electrical, Electronics, Microelectronics, Physics, Materials Science, Chemical Engineering, or related).
- Strong foundation in semiconductor processes (deposition, etch, lithography, metrology) and device fundamentals.
- Proficiency in reliability testing, statistical analysis, and acceleration models, with a track record of executing programs and mitigating risk in manufacturing or R&D.
- Familiarity with advanced failure-analysis tools and methods (e.g., optical microscopy, X-ray/CT, SEM/EDX).
- Proven track record in driving process transfer, reliability test execution, and risk mitigation in a manufacturing or R&D environment.
- Experience with consumer electronics or semiconductor reliability programs.
Work Location:
- North
Materials Science
Test Equipment
Factory
FMEA
Technology Transfer
Process Integration
Physics
Microscopy
Reliability
Materials Engineering
Engineering Design
Metrology
Test Execution
Chemical Engineering
Electronics
Calibration
Reliability Engineer
Posted today
Job Description
About the role
The Maintenance & Reliability Engineer will play a crucial role in ensuring the reliability and efficiency of our manufacturing operations. This position is responsible for driving key performance indicators (KPIs) and supporting the reliability program in the maintenance department. The ideal candidate will also assist the team in various improvement projects to enhance overall plant performance by providing professional engineering and technical leadership to the Engineering & Maintenance function in the plant.
Your responsibilities:
- Ensure all maintenance activities comply with safety, health, and environmental regulations.
- Support the implementation and continuous improvement of the reliability program, including predictive and preventive maintenance strategies.
- Collaborate with cross-functional teams to identify, plan, and execute improvement projects aimed at enhancing equipment performance and reliability.
- Lead investigation, RCA and CAPA actions. Analyse maintenance data to identify trends, root causes of failures, and opportunities for improvement.
- Maintain accurate records of maintenance activities, equipment performance, and reliability metrics.
- Assist in training maintenance personnel on best practices and new technologies related to equipment maintenance and reliability.
- Assist the maintenance department in tracking and achieving KPIs related to equipment reliability, downtime, and maintenance costs.
- Coordinate Preventive Maintenance (PM) optimization to reduce maintenance costs using Lean Six Sigma methodologies.
- Oversee and maintain HACCP processes, including X-ray, metal detector, and pasteurization CCP.
- Develop new SOPs and periodically update existing ones.
- Assist the project team with new or upgraded project activities.
- Foster and maintain strong relationships within and outside the organization, proactively developing competitive strategies to support growth and productivity.
- Support the Maintenance Manager in fostering a culture of high compliance regarding equipment maintenance plans, calibration, and food safety-related equipment and processes.
- Recognize the connection between robust maintenance practices and their impact on food safety and product quality.
- Promote the use of technology in engineering, maintenance, and reliability engineering, integrating best practices by focusing on external competitors and other industries.
- Maintain good relationships within and outside the organization, serving as a community interface on strategic areas affecting the communities where MJN operates, and representing the company's views to foster good relations with local, national, and regulatory agencies.
- Education: Bachelor's degree in Engineering or a related field.
- Experience: Minimum of 3 years of experience in maintenance and reliability engineering, preferably in a food or pharma manufacturing environment.
- Technical Skills: Proficiency in maintenance management software (e.g., CMMS), reliability analysis tools, and data analysis software.
- Knowledge: Experience in reliability maintenance strategies. Strong understanding of predictive and preventive maintenance techniques, root cause analysis, and reliability-centered maintenance.
- Problem-Solving: Excellent analytical and problem-solving skills with a proactive approach to identifying and addressing issues.
- Communication: Strong verbal and written communication skills, with the ability to effectively collaborate with cross-functional teams.
- Adaptability: Ability to work in a fast-paced environment and manage multiple priorities.
Maintenance Management
Preventive Maintenance
CAPA
Food Safety
Metal
Root Cause Analysis
Reliability
Manufacturing Operations
Equipment Maintenance
Adaptability
Reliability Engineering
HACCP
Technical Leadership
Calibration
Lean Six Sigma
Reliability Engineer
Posted today
Job Description
Company Description
For more than 100 years, John Crane has equipped global process industries to meet mission-critical challenges. Our customers depend on John Crane to ensure their operations run efficiently and effectively. We deliver on that promise with technologies that maximize reliability, innovations that improve efficiency and services that enable a proven rapid response. As pioneers of progress, we are committed to leveraging our legacy of technology leadership, innovative solutions and service excellence to help customers achieve their net-zero ambitions. We’re fully committed to supporting global efforts that address climate change and are taking concrete steps to reach net carbon neutrality by 2050. Building a sustainable future, today.
We believe that different perspectives and backgrounds are what make a company flourish. All qualified applicants will receive equal consideration for employment regardless of color, religion, gender, sexual orientation, gender identity, national origin, economic status, disability, age, or any other legally protected characteristics. We are proud to be an inclusive company with values grounded in equality and ethics, where we celebrate, support, and embrace diversity.
At no time during the hiring process will Smiths Group, nor any of our recruitment partners, ever request payment to enable participation – including, but not limited to, interviews or testing. Avoid fraudulent requests by applying to jobs directly through our careers website (Careers - Smiths Group plc).
We are an equal opportunity employer and value diversity at our company. We encourage applications from all qualified individuals without regard to race, color, religion, sex, national origin, disability, age, or any other status protected by law.
Job Description
Purpose of Role: To provide on-site support and manage the reliability contract for mechanical seals for customers in Singapore.
Key Activities:
Assist and guide seal installation / removal on site
On site pump / seal initial assessment and inspection
Support commissioning of seal system
Perform 5-point checks on the equipment prior to seal installation
Conduct site tours, identify preventive activities (for example: monitoring plan 11/13/23 for temperatures, pressures for plan 52/53A/B/C, flow monitoring on plan 54, bearing oil condition monitoring and general leakage on the equipment) and implement recommendations proactively to eliminate potential causes of seal failures
Co-ordinate seal activation and tracking
Ensure and monitor timely repair of seals
Prepare inspection reports and perform failure analysis
Conduct Root Cause Analyses (RCAs) for bad actors in the plant
Be responsible for reliability data management, trending, analysis, and reporting
Conduct monthly (Alliance Improvement team)/quarterly/ yearly meetings with stakeholders
Prepare monthly and yearly reports for external stakeholders
Inventory management and optimization and conduct audits
Gain understanding of systematic and chronic failures and develop plan accordingly to eliminate root cause of failures
Key Accountability:
Maintain the equipment & seal database and failure history using John Crane Interface Reliability Management software for all equipment within the scope of the contract
Perform root cause analysis, failure mode assessment, recommendations, and reporting
Monitor KPIs on a periodic basis
Understand customers’ needs, review installations, make engineering proposals to optimize arrangements, quantify benefits, prepare and present proposals
Qualifications
Education:
Bachelor's degree in Engineering, preferably in Mechanical Engineering
Experience:
3-8 Years
Technical Skills:
Sound Knowledge of pumps & mechanical seals in Oil & Gas or Petrochem Industry
Knowledge of API Standards
Field experience in Installation, commissioning & maintenance of Pumps & seals
Exposure to Condition Monitoring techniques such as Vibration Analysis, Ultrasound Analysis, and Motion Amplification will be an added advantage
People Skills
Self-driven & Result Oriented
Problem-solving attitude
Additional Information
Responsible:
Mechanical seal refurbishment (may include cleaning)
Mechanical seal installation/dismantling
Plant tour
Physical inventory (seal parts)
Reliability Engineer
Posted 9 days ago
Job Description
The Maintenance & Reliability Engineer will play a crucial role in ensuring the reliability and efficiency of our manufacturing operations. This position is responsible for driving key performance indicators (KPIs) and supporting the reliability program in the maintenance department. The ideal candidate will also assist the team in various improvement projects to enhance overall plant performance by providing professional engineering and technical leadership to the Engineering & Maintenance function in the plant.
Your responsibilities:
- Ensure all maintenance activities comply with safety, health, and environmental regulations.
- Support the implementation and continuous improvement of the reliability program, including predictive and preventive maintenance strategies.
- Collaborate with cross-functional teams to identify, plan, and execute improvement projects aimed at enhancing equipment performance and reliability.
- Lead investigation, RCA and CAPA actions. Analyse maintenance data to identify trends, root causes of failures, and opportunities for improvement.
- Maintain accurate records of maintenance activities, equipment performance, and reliability metrics.
- Assist in training maintenance personnel on best practices and new technologies related to equipment maintenance and reliability.
- Assist the maintenance department in tracking and achieving KPIs related to equipment reliability, downtime, and maintenance costs.
- Coordinate Preventive Maintenance (PM) optimization to reduce maintenance costs using Lean Six Sigma methodologies.
- Oversee and maintain HACCP processes, including X-ray, metal detector, and pasteurization CCP.
- Develop new SOPs and periodically update existing ones.
- Assist the project team with new or upgraded project activities.
- Foster and maintain strong relationships within and outside the organization, proactively developing competitive strategies to support growth and productivity.
- Support the Maintenance Manager in fostering a culture of high compliance regarding equipment maintenance plans, calibration, and food safety-related equipment and processes.
- Recognize the connection between robust maintenance practices and their impact on food safety and product quality.
- Promote the use of technology in engineering, maintenance, and reliability engineering, integrating best practices by focusing on external competitors and other industries.
- Maintain good relationships within and outside the organization, serving as a community interface on strategic areas affecting the communities where MJN operates, and representing the company's views to foster good relations with local, national, and regulatory agencies.
- Education: Bachelor's degree in Engineering or a related field.
- Experience: Minimum of 3 years of experience in maintenance and reliability engineering, preferably in a food or pharma manufacturing environment.
- Technical Skills: Proficiency in maintenance management software (e.g., CMMS), reliability analysis tools, and data analysis software.
- Knowledge: Experience in reliability maintenance strategies. Strong understanding of predictive and preventive maintenance techniques, root cause analysis, and reliability-centered maintenance.
- Problem-Solving: Excellent analytical and problem-solving skills with a proactive approach to identifying and addressing issues.
- Communication: Strong verbal and written communication skills, with the ability to effectively collaborate with cross-functional teams.
- Adaptability: Ability to work in a fast-paced environment and manage multiple priorities.
Lead Reliability Engineer
Posted 6 days ago
Job Description
Applied Materials is a global leader in materials engineering solutions used to produce virtually every new chip and advanced display in the world. We design, build and service cutting-edge equipment that helps our customers manufacture display and semiconductor chips - the brains of devices we use every day. As the foundation of the global electronics industry, Applied enables the exciting technologies that literally connect our world - like AI and IoT. If you want to push the boundaries of materials science and engineering to create next generation technology, join us to deliver material innovation that changes the world.
**What We Offer**
Location:
Singapore, SGP
You'll benefit from a supportive work culture that encourages you to learn, develop, and grow your career as you take on challenges and drive innovative solutions for our customers. We empower our team to push the boundaries of what is possible-while learning every day in a supportive leading global company. Visit our Careers website to learn more.
At Applied Materials, we care about the health and wellbeing of our employees. We're committed to providing programs and support that encourage personal and professional growth and care for you at work, at home, or wherever you may go. Learn more about our benefits.
**Key Responsibilities**
Design, collect data, analyze and compile reports on a wide range of complex process engineering experiments for multiple products, within safety guidelines
Utilize techniques to characterize hardware, define methods and apply new technologies to characterize hardware, and/or perform hardware characterization on a wide range of complex systems for multiple products, within safety guidelines
Generate internal and external documentation for products, presentations, technical reports and generate process engineering specifications
Develop, plan and execute process engineering projects, within safety guidelines
Train engineers in techniques for measuring film properties and guide them in interpreting the data, new methodologies, and troubleshooting techniques; resolve a wide range of complex process engineering issues/problems for multiple products
Interact with customers to resolve a wide range of complex process engineering issues/problems with limited to no supervision
Design and implement new technology, products and analytical instrumentation
Identify, select and work with vendors and suppliers with limited to no supervision
**Functional Knowledge**
+ Demonstrates depth and/or breadth of expertise in own specialized discipline or field
**Business Expertise**
+ Interprets internal/external business challenges and recommends best practices to improve products, processes or services
**Leadership**
+ May lead functional teams or projects with moderate resource requirements, risk, and/or complexity
**Problem Solving**
+ Leads others to solve complex problems; uses sophisticated analytical thought to exercise judgment and identify innovative solutions
**Impact**
+ Impacts the achievement of customer, operational, project or service objectives; work is guided by functional policies
**Interpersonal Skills**
+ Communicates difficult concepts and negotiates with others to adopt a different point of view
**Additional Information**
**Time Type:**
Full time
**Employee Type:**
Assignee / Regular
**Travel:**
Yes, 50% of the Time
**Relocation Eligible:**
No
Applied Materials is an Equal Opportunity Employer. Qualified applicants will receive consideration for employment without regard to race, color, national origin, citizenship, ancestry, religion, creed, sex, sexual orientation, gender identity, age, disability, veteran or military status, or any other basis prohibited by law.
Site Reliability Engineer
Posted today
Job Description
Company Description
Higogame is a trailblazer in the mobile gaming and entertainment industry. Since our inception in late 2020, we have been dedicated to transforming the gaming landscape in Southeast Asia and beyond, delivering innovative and immersive experiences that engage millions of players around the globe.
- Our revenue has seen remarkable growth year after year, with operations extending across multiple regions worldwide.
- In just three years, we've risen to become one of the top two games of our kind in the local market.
- We proudly serve around 2 million active users daily and have a total monthly active user base of 5 million worldwide.
- Our team consists of over 200 talented employees, including a robust R&D division of more than 100 experts.
- We offer exceptional career development opportunities and foster a multinational culture that empowers everyone to reach their full potential.
Join us as we continue to push the boundaries of mobile gaming.
Job Responsibilities:
Responsible for the full lifecycle management of the company's global/multi-region infrastructure. Lead the setup of the Singapore physical data center and deep operations of the Google Cloud (GCP) platform. Drive automation and intelligent operations systems to ensure high availability, low cost, and scalable business operations. The role requires both traditional data center operations experience and cloud-native technical vision, acting as a key technical backbone connecting physical resources with cloud capabilities.
I. Core Responsibilities
1. Physical Data Center Planning & Implementation
- Lead end-to-end management of self-built/hosted data centers: requirements analysis, architecture design (network/power/cooling/cabling), equipment selection (servers/switches/UPS), construction acceptance, and post-operations optimization.
- Design multi-data center disaster recovery architectures (e.g., active-active, or two sites with three centers), including cross-site synchronization and failover strategies to ensure business continuity.
- Manage internal resource backup/disaster recovery, including art assets, code, and other data assets.
2. Google Cloud (GCP) Deep Operations & Optimization
- Design and manage GCP architecture (Compute Engine, VPC, Cloud Storage, GKE, BigQuery, etc.), supporting cloud migration and hybrid cloud deployment of core business systems.
- Lead full lifecycle management of cloud resources, including cost optimization (reserved instances, autoscaling, idle resource reclamation), performance tuning (network latency, storage IOPS, compute utilization), and security hardening (IAM governance, encryption policies, vulnerability scanning).
- Build cloud-native ops systems using Cloud Monitoring/Logging for real-time alerting and fault detection.
3. Automation & Intelligent Operations Systems
- Lead development and integration of operations toolchains (e.g., Ansible/Puppet automation, Prometheus+Grafana monitoring, ELK logging) to shift operations from manual to platform-based and intelligent.
- Integrate CI/CD pipelines with cloud platforms, optimizing deployment efficiency and stability of containerized (K8s) and serverless (Cloud Functions) workloads.
- Lead root cause analysis (RCA) and postmortems of major incidents, deliver improvement plans, and strengthen contingency planning and drills (e.g., data center power outage, cloud region failure).
4. Cross-Team Collaboration & Technical Enablement
- Collaborate with R&D, QA, and Product teams to provide infrastructure support for rapid business delivery.
- Develop operations standards and technical documentation, drive team knowledge sharing, and mentor junior engineers.
II. Requirements
Basic Qualifications
- Bachelor's degree or higher in Computer Science, Network Engineering, Cloud Computing, or related fields.
- 5+ years in IT operations, including 3+ years in physical data center build/ops, and 2+ years of hands-on GCP experience (must provide project examples).
- Experience in large-scale distributed systems, with solid knowledge of Linux, network protocols (TCP/IP, SDN), and high availability database architectures (MySQL/Redis).
- Must be able to converse in Mandarin due to the need to travel to China and communicate with Chinese-speaking stakeholders.
- Must be able to travel (up to 50% of the time)
Technical Skills
Data Center
- Familiar with infrastructure (power/cooling/fire safety/cabling), and optimization metrics like PUE/CUE.
- Experience in IDC hosting, custom data center builds, or third-party acceptance audits. Knowledge of industry standards (e.g., GB50174 Data Center Design Standard).
Google Cloud (GCP)
- Proficient in GCP core services: GCE, VPC, Cloud SQL/Spanner, GKE.
- Skilled in GCP cost management (Budgets & Alerts, preemptible VMs, storage tiers).
- Strong in GCP security: IAM, VPC Service Controls, Cloud Firewall, KMS.
Automation & Toolchains
- Skilled with Terraform/Ansible for IaC, scripting in Shell/Python/Go for ops tooling.
- Experienced in Prometheus+Grafana monitoring, ELK/OpenTelemetry for logging & tracing.
- Hands-on Kubernetes operations (scaling, node management, Helm) and CI/CD pipeline integration (Jenkins/GitLab CI).
Soft Skills
- Strong troubleshooting skills and resilience, able to quickly resolve complex incidents (e.g., data center outage, regional cloud failure).
- Excellent cross-team communication and project leadership skills.
- Fast learner, stays updated on cloud-native (CNCF), AIOps, and industry trends.
III. Nice-to-Haves
- GCP certifications (e.g., Professional Cloud Architect, Associate Cloud Engineer) or ITIL/ISO 20000.
- Led/participated in large-scale data center builds (multi-million) or GCP ops projects with million+ annual cloud spend.
- Experience in hybrid cloud (GCP + on-premises) or edge computing ops.
- Published blogs, open-source contributions, or active participation in tech communities (GitHub, CNCF events).
IV. What We Offer
- Competitive salary
- Global platform: Participate in building multi-region intelligent operations infrastructure.
- Growth: Internal tech sharing, external conferences, certification & training support.
- Work environment: Flat management, flexible hours, free snacks, comprehensive medical, hospital and dental coverage
Site Reliability Engineer
Posted today
Job Description
Imagine what you could accomplish here. Bring your passion, creativity, and dedication, and there will be no limit to what you can achieve. This is not just another SRE role - it's a chance to help redefine how reliability engineering is practiced at hyper-scale. Our team is building the platforms that will autonomously operate Apple's core information security systems, setting a new bar for how critical services are managed.
Description
We are seeking exceptional engineers who thrive at the intersection of reliability, software development and automation - individuals driven to push the boundaries of what's possible. The ideal candidate has a strong foundation in modern SRE practices and a proven ability to design and implement software that solves operational challenges. You'll break new ground using the most advanced tools and approaches available, developing automation that doesn't just keep pace with scale but anticipates, reacts and stays ahead of it. You will work closely with Security Engineering, Threat Detection, Incident Response and other internal functions to ensure the scalability, availability and security of the tools and infrastructure that support Apple's cybersecurity mission. Join us, and help build the future of self-managing systems at one of the most innovative companies in the world.
Responsibilities
Our team is highly collaborative, working closely with partner teams to deliver the best results for Apple. We strive to find the best solution while also considering the need to get things done efficiently for each engineering challenge we face. Good ideas are valued and rewarded.
As an SRE in Apple Information Security, you will:
- Operate, monitor, and triage all aspects of our production and non-production environments
- Pioneer and implement the next-generation telemetry system for AIS services
- Establish alert handling procedures and runbooks, and collaborate with our global security team
- Automate deployment and orchestration of services into the cloud environment as well as other routine processes
- Actively participate in capacity planning and disaster recovery exercises
- Interact with and support partner teams across the enterprise
- Cultivate and maintain relationships with internal and external third-party vendors
Minimum Qualifications
- Bachelor's degree in Computer Science, or a related field, or equivalent practical experience
- Proven experience in Site Reliability Engineering or a related field
- Strong programming skills: Python, Go or Swift
- Experience working with cloud compute environments like AWS, GCP or Azure
- Experience with infrastructure as code (IaC), configuration management, CI/CD, and automation, e.g., Terraform, Pulumi, CloudFormation, Ansible, Chef, Puppet, Jenkins
- Cloud deployment and CI/CD problem diagnosis and troubleshooting
Preferred Qualifications
- Experience or experimentation building systems that leverage Agentic AI principles, tools, platforms and frameworks
- Strong understanding and experience in implementing monitoring and observability tools like Splunk, Grafana, Prometheus
- Experience building and operating container orchestration systems and microservices (Docker, Kubernetes, Vagrant)
- Experience administering and troubleshooting Linux systems including the usage of standard Linux utilities
- Experience in shell scripting (e.g., bash/zsh) and system administration
- Experience with measuring, analyzing, and optimizing system performance
- Passion for high-quality code, tests, documentation and production services
- Participation in an on-call rotation
Explore reliability engineer jobs, focusing on roles that demand expertise in risk assessment and