202 Security Monitoring jobs in Singapore
Command Centre Operator (Security Monitoring Operations, West)
Posted 13 days ago
Job Description
Who We Are Looking For & What Role You Will Play
- Perform monitoring, analysis and escalation of real-time operation events.
- Monitor attendance records of all AETOS employees and provide detailed reports to management.
- Monitor security access control points and perform basic administration of clearance for persons and vehicles in the system.
- Monitor alarm alerts, activate the response team and prepare basic reports upon completion of each operation.
- Monitor video analytics CCTV and respond in accordance with procedures.
- Perform inbound and outbound calls.
- Manage and resolve clients' enquiries.
- Any other duties/jobs assigned by supervisor.
What Knowledge & Experience We Require From You
- Candidate must possess at least GCE N/O level.
- Able to speak and write fluent English.
- Proficient in Microsoft Office (Word, Excel, Outlook).
- Able to work 12-hour shifts, including weekends and public holidays (depending on roster).
- Relevant training will be provided.
Incident Response Lead
Posted today
Job Description
At Tetra Pak we commit to making food safe and available, everywhere; and we protect what's good – protecting food, protecting people, and protecting the planet. By doing so we touch millions of people's lives every day.
And we need people like you to make it happen.
We empower you to reach your potential with opportunities to make an impact to be proud of – for food, people and the planet.
The Incident Response (IR) Lead leads a 24/7 virtual team who monitor and respond to ISIRT major incidents. This role requires management of Incident Response activities and team communication with SOC analysts, SME and other IT technical personnel. This role is also required to work closely with stakeholders and cybersecurity’s leadership team. Additionally, the Incident Response Lead will ensure staff members prioritize their work related to suspected and confirmed incidents, which may vary in severity and impact. The Incident Response Lead will direct analysts to investigate, validate, remediate and communicate known details about the incident and is a point of contact for escalation.
Due to coverage requirements, this is a permanent position based in a country within the Asia time zone.
What you will do
Role and responsibilities:
The Incident Response Lead analyzes and organizes incoming work to help the team prioritize complex tasks. As a central figure, the Incident Response Lead brings order to a fast-paced, constantly evolving operation and enforces the policies, playbooks and methodologies that have been adopted as the best course of action.
Personal, organizational, communication and analytical skills are vital, as well as the ability to communicate effectively with cybersecurity leadership. This role requires technical aptitude, and managers are also expected to be adept at working well with people who will be under stress and subject to burnout.
Key Responsibilities:
• Manage a team of incident responders for ISIRT response and interact with cybersecurity leadership and business stakeholders.
• Coordinate and ensure ISIRT incidents are prioritized at all hours of the day.
• Implement a cross-functional team of analysts working closely with cybersecurity, IT and developers.
• Serve as a point of escalation and incident commander.
• Review ISIRT incidents that may be related to ransomware, host compromise, account compromise, phishing, anomalous user behavior, third parties and data leakage.
• Ensure the ISIRT response team is following processes embraced by leadership and adhering to best practices.
• Measure and give feedback to the team to improve mean time to respond, key performance indicators (KPIs) and service-level objectives.
• Proactively adjust to upcoming company changes affecting the operation to modify ISIRT response processes.
• Possess advanced knowledge of attackers’ methods of escalation; lateral movement; and tactics, techniques and procedures.
• Present incident analysis and trend reporting to leadership, highlighting KPIs.
• Review events and process effectiveness and make recommendations for change to leadership.
• Require participation in ISIRT tabletop exercises designed to identify gaps, improve skills, enhance communication and engage with key stakeholders.
• Oversee IR playbooks, policies, procedures and guidelines to ensure they align with industry best practices.
• Collaborate with infrastructure, IT, vulnerability, threat intelligence and application security leads.
• Participate in monitoring internal and external events and stay tightly aligned with infrastructure and third-party, hosted, on-premises and end-user systems.
• Review and communicate ISIRT incident details from initial investigation through root cause analysis and post-mortem.
• Maintain operational rigor and recognize when team members need time away to refocus and refresh.
• Identify strengths and weaknesses in ISIRT team members and provide training to improve skills and knowledge.
• Remain current with emerging threats and share knowledge with colleagues to improve incident response.
• Perform other duties as assigned.
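As a rough illustration of the mean time to respond metric named in the responsibilities above, the sketch below averages the gap between detection and first response across a set of incidents. The posting does not define the metric, so this definition, the function name and the sample timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_respond(incidents):
    """Average gap between detection and first response.

    `incidents` is a list of (detected_at, responded_at) datetime pairs.
    This definition of the metric is an assumption, not taken from the posting.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    gaps = [responded - detected for detected, responded in incidents]
    # sum() needs a timedelta start value; dividing by an int yields a timedelta.
    return sum(gaps, timedelta()) / len(gaps)


# Hypothetical sample data for illustration only.
sample = [
    (datetime(2025, 1, 1, 9, 0), datetime(2025, 1, 1, 9, 10)),
    (datetime(2025, 1, 2, 14, 0), datetime(2025, 1, 2, 14, 20)),
]
print(mean_time_to_respond(sample))
```

A team would normally pull these timestamps from its ticketing or SIEM tooling rather than hard-code them.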
Strong organizational and team management skills are required to excel in this role, as well as previous experience in security administration, IR and security operations center (SOC) roles.
- Seven-plus years’ experience in security administration and SOC, with three-plus years’ security IR.
- Demonstrated experience leading people both in person and remotely distributed.
- Self-aware and capable of remaining calm under intense pressure.
- Strong written and oral communication skills across varying levels of the organization.
- Excellent judgment and the ability to make quick decisions when working with complex situations.
- Organized, with the ability to prioritize and respond within defined SLAs and maintain composure.
- Understanding of threats and vulnerabilities, as well as principles of ISIRT incident response and chain of custody.
- Knowledge of multiple solutions such as security orchestration, automation and response; SIEM; threat intelligence platform; directory services; malware sandboxes; vulnerability management; MITRE ATT&CK; IR playbooks; and endpoint/extended detection and response.
- General familiarity with one or more of the following, among others: NIST, ISO 27001, NIS 2, CRA.
- Track record of acting with integrity, taking pride in work, seeking to excel, and being curious and flexible.
- High degree of integrity, trustworthiness, professionalism and character.
Education Requirements:
- Bachelor’s degree preferred in cybersecurity, computer science, engineering or related field.
- Certification in CRISC, CISSP, CISA, CISM will be a plus.
We Offer You
- A variety of exciting challenges with ample opportunities for development and training in a truly global landscape
- A culture that pioneers a spirit of innovation where our industry experts drive visible results
- An equal opportunity employment experience that values diversity and inclusion
- Market competitive compensation and benefits with flexible working arrangements
Apply Now
If you are inspired to be part of our promise to protect what’s good for food, people, and the planet, apply through our careers page.
If you have any questions about your application, please contact Ephraim Kwa.
Diversity, equity, and inclusion is an everyday part of how we work. We give people a place to belong and support to thrive, an environment where everyone can be comfortable being themselves and has equal opportunities to grow and succeed. We embrace difference, celebrate people for who they are, and for the diversity they bring that helps us better understand and connect with our customers and communities worldwide.
Incident Response Analyst
Posted today
Job Description
Responsibilities:
• Deliver data centre operations support across multiple data centres
• Respond to all alarms/alerts set in Data Center Infrastructure Management (DCIM), Server Automation Operations System (SAOS), CCTV, Access Control Systems (ACS), and other functions (EHS, Security, etc.).
• Provide deep understanding and intelligence of the criticality and impact of the incidents to the resolver groups.
• Keep detailed records of alarm-handling activities, including actions taken and resolutions, in ticketing tools, and file incident reports.
• Be available to coordinate as an incident commander in the event of an issue.
• Support program managers, facilitate project deliverables, and improve overall operational and engineering initiatives.
• Conduct root cause analysis (RCA) to trace recurring problems to their source.
• Employ in-depth questioning and analysis techniques such as five whys to determine the underlying cause of the incident or problem.
• Handle the ticketing system.
• Perform duties in compliance with SOP.
Requirements:
• Diploma/Degree in Information Technology.
• 2+ years' experience in a command center, service center, or similar 24x7 operations center environment.
• Ability to quickly triage multiple incidents and assign the right priority based on risk and confidence levels.
• Knowledge of technical elements associated with systems such as IP Networks, DC Environment and Server Health.
• Outstanding verbal and written communication skills; able to work with minimal direction and meet goals, with attention to detail and an eye for continuous improvement.
• Ability to interact successfully at all levels of the organization, including with clients, while functioning as a team player.
• Basic working knowledge of data protection policies such as GDPR and the need to keep sensitive information secure.
Switches
Troubleshooting
Incident Response
Hardware
Ticketing
Data Center
Root Cause Analysis
Information Technology
Access Control
CCTV
IP
Networking
Attention to Details
network servers
Routers
Cabling
Security Incident Response
Cybersecurity Incident Response Lead
Posted 2 days ago
Job Description
Team Lead – Cyber Defence & Incident Response
Overview:
We are seeking an experienced Team Lead (AVP/VP level) to oversee our Cyber Defence and Response function. This role plays a critical part in safeguarding the organization by leading our incident response and threat intelligence efforts. You will serve as the incident commander during major cyber incidents and will be responsible for end-to-end management of threat response activities including threat intelligence, incident handling, vulnerability assessment, and cyber readiness exercises.
You will lead a small internal team (2 headcount) and work closely with trusted external vendors. While we operate a separate SOC function, this role is more strategic and command-focused, not hands-on technical SOC work. We're looking for someone who can lead and run this function, with strong stakeholder management and crisis coordination experience.
Key Responsibilities:
- Lead the Threat Response and Intelligence function, including proactive threat hunting, incident response, and vulnerability management.
- Act as the incident commander during critical cyber incidents—coordinate containment, impact assessment, root cause analysis, and response actions.
- Regularly engage with internal stakeholders and external partners to ensure a strong, collaborative cybersecurity posture.
- Conduct regular cybersecurity drills, tabletop exercises, and simulations to ensure organizational readiness.
- Keep the organization up to date with emerging threats, attack vectors, and evolving cyber risk trends.
- Work closely with the CISO and other leadership to provide incident reports, regulatory updates, and strategic insights.
- Build and mature the cyber defence capability, ensuring that processes, playbooks, and response procedures are efficient, clear, and continually improved.
- Mentor and guide junior team members and vendors supporting the function.
Qualifications & Skills:
- Strong experience in cybersecurity, with deep expertise in incident response, threat intelligence, and vulnerability management.
- Proven experience leading response teams during high-impact incidents and reporting to senior leadership and regulators.
- Professional cybersecurity certifications such as GCIH, CISSP, GCIA, or similar are required.
- Excellent verbal and written communication skills, with the ability to communicate complex incidents in a clear and structured manner.
- A proactive, analytical, and strategic mindset, able to approach challenges with a holistic view.
- Strong stakeholder management and crisis leadership skills; capable of operating under pressure.
- Passionate about team collaboration, knowledge sharing, and driving a resilient security culture.
- Self-motivated, accountable, and results-driven, with the ability to manage time and quality effectively.
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Information Technology
- Industries: Information Services
Security Specialist (Incident Response)
Posted 4 days ago
Job Description
- Engage in digital forensics and incident response efforts, including investigating complex and large-scale cyberattacks. This includes analyzing logs, performing host and network forensics, and examining malicious software.
- Take part in proactive threat hunting operations, identifying advanced threats and targeted attacks within client environments, and support security evaluations and simulation exercises.
- Detect and analyze indicators of compromise (IOCs) and understand adversaries’ tools, techniques, and procedures (TTPs) to determine the occurrence and impact of security breaches.
- Enhance and apply tools and processes to strengthen the organization's capabilities in investigation and threat detection.
- Work closely with internal IT and cybersecurity teams throughout the course of an investigation.
- Produce detailed and professional reports summarizing investigation findings and insights.
Manager, Incident Response & Management
Posted 4 days ago
Job Description
Who we are
About Stripe
Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.
About the team
The Incident Response team is a global 24/7 team responsible for driving incident response and management from detection to resolution. Stripe is proud of its five 9s API reliability and this team is at the forefront of ensuring we keep it that way - working hand-in-hand with Reliability Eng and across the Tech Org. This team of incident response managers (IRM) is defined by our sense of ownership and how we drive incidents to resolution - marshaling the necessary cross-functional resources to respond to and resolve service outages, critical bugs, security attacks and anything that significantly impacts the users of our products. The team is user-first and ensures appropriate external communications from Stripe and senior management to keep our users informed of disruption to their experience of Stripe. The team is highly skilled in incident troubleshooting, program management, incident classifications, incident communications, incident escalation and technical adeptness as incidents can arise from anywhere and cut across products and orgs in Stripe.
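To put the "five 9s" figure above in concrete terms, the sketch below converts an availability target into the downtime it permits over a period. The function name and the 365-day year are assumptions chosen for illustration; the posting does not say how the figure is measured.

```python
def allowed_downtime_seconds(availability: float, period_days: float = 365.0) -> float:
    """Seconds of downtime permitted over the period at the given availability.

    `availability` is a fraction between 0 and 1, for example 0.99999 for five nines.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_days * 24 * 60 * 60


five_nines = allowed_downtime_seconds(0.99999)
print(f"{five_nines / 60:.2f} minutes of downtime per 365-day year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year, which is why rapid detection and escalation matter so much for a team like this.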
What you’ll do
This position entails leading and optimizing Stripe's incident management processes and automation, ensuring efficiency and adherence to stringent incident response metrics. As the head of the incident response team, you will establish and maintain a best-in-class incident response framework, upholding the reliability standards expected of Stripe. Responsibilities include but are not limited to incident classification, escalation, and notification management, along with accountability for key incident response metrics (TTx). You will generate actionable insights to drive continuous improvement, collaborating with engineering leadership to refine incident detection, response, user communication, and tooling efficacy. Leadership and development of a highly effective 24/7 global incident response management team, characterized by urgency, programmatic ownership of incidents and communications, and the capacity to engage engineering teams, are crucial. Additionally, you will manage incident communications across multiple channels for executive and end-user audiences, and identify automation opportunities to streamline incident response workflows, thereby safeguarding users and minimizing disruption to their operations.
Responsibilities
- Lead the global 24/7 team of regional managers and incident response managers with ability to be hands-on and support frontline on-call with speed, cross-functional collaboration and escalation.
- Develop and own Stripe's incident response and management strategy and cross-functional roadmap, ensuring it aligns with the company's reputation for reliability.
- Spearhead and manage Stripe's AI-First strategy for automation of incident response workflows, partnering with the engineering team to implement required tooling enhancements.
- Enhance Stripe's incident response by leading and implementing improvements derived from analyzing user-facing incidents and extracting actionable insights and learnings.
- Collaborate closely with executive leadership, engineering, and operations teams to lead significant programs and reshape workflows and metrics concerning reliability and incident operations.
- Manage relevant TTx metrics, particularly those related to communication and escalation. Collaborate with engineering leadership to implement necessary improvements for each metric.
- Develop user-focused metrics and data to guide Stripe's incident response, reliability strategy, and user communications (including RCAs), ensuring impactful decision-making.
We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.
Minimum requirements
- 5+ years of management experience, including 2+ years of experience managing managers with a proven record in building, growing and transforming teams.
- Extensive experience (4+ years) leading incident response for complex, large-scale distributed services with high SLOs/SLAs, coupled with deep expertise in crisis management.
- Demonstrated ability to lead, influence other leaders and deliver complex strategic projects involving multiple stakeholders
- Strong analytical skills, and the ability to use data to drive business decisions
- Possesses proficiency in basic incident troubleshooting and a reasonable understanding of system architecture. Fluent in using SQL, Splunk, or similar query languages.
- Exceptional communication abilities, capable of adapting incident updates for diverse audiences (executives, external users, internal teams).
- Affinity for a fast paced work environment, crafting strategic and rapid fixes to high intensity problems with a keen eye for detail and a high bar for quality
- Comfort navigating ambiguity, while identifying areas for process improvement and establishing best practices
- Experience managing geographically dispersed teams
- Experience using infrastructure and application monitoring tools such as Prometheus, Sentry and others
- Experience in incident response at a high-growth technology company, preferably within the payments or e-commerce sectors.
- Proven ability to apply Agentic and Generative AI to revolutionize incident response, coupled with a strong grasp of current industry trends in the incident response domain.
- Demonstrated history of driving engineering and process enhancements to improve incident response efficiency within a rapidly expanding technology organization.
Office-assigned Stripes spend at least 50% of the time in a given month in their local office or with users. This strikes a balance between bringing people together for in-person collaboration and learning from each other, while supporting flexibility about how to do this in a way that makes sense for individuals and their teams.
The annual salary range for this role in the primary location is S$208,000 - S$312,000. This range may change if you are hired in another location. For sales roles, the range provided is the role’s On Target Earnings (“OTE”) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role. This salary range may be inclusive of several career levels at Stripe and will be narrowed during the interview process based on a number of factors, including the candidate’s experience, qualifications, and specific location. Applicants interested in this role and who are not located in the primary location may request the annual salary range for their location during the interview process.
Specific benefits and details about what compensation is included in the salary range listed above will vary depending on the applicant’s location and can be discussed in more detail during the interview process. Benefits/additional compensation for this role may include: equity, company bonus or sales commissions/bonuses; retirement plans; health benefits; and wellness stipends.
Office locations: Singapore
Team: Infrastructure & Corporate Tech
Job type: Full time
Manager, Incident Response & Management
Posted 9 days ago
Job Description
Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.
About the team
The Incident Ops team is a global 24/7 team responsible for driving incident response and management of incidents from detection to resolution. Stripe is proud of its five 9s reliability and this team is at the forefront of ensuring we keep it that way - working hand-in-hand with Reliability Eng and across the Tech Org. This team of incident response managers (IRM) is defined by our sense of ownership and how we drive incidents to resolution - marshaling the necessary cross-functional resources to respond to and resolve service outages, critical bugs, security attacks and anything that significantly impacts the users of our products. The team is user-first and ensures appropriate external communications from Stripe and senior management to keep our users informed of disruption to their experience of Stripe. The team is skilled in program management, communications, incident handling and technical adeptness as incidents can arise from anywhere and cut across products and orgs in Stripe.
What you’ll do
As the Manager of Incident Response Managers, you’ll evolve a world class incident response team in APAC to maintain a high bar of reliability expected of Stripe and by Stripe’s users. You’ll work hand-in-hand with regional IRM teams in AMER and EMEA to ensure solid 24/7 coverage for how we detect, respond to incidents, communicate to users, improve related tooling and measure impact. You will lead and nurture a high-performing IRM team based in APAC who has a strong sense of urgency, focused on identifying incident impact, rapidly assembling incident responders, driving incident communications, and mitigating impact as quickly as possible. As a result, you’ll be seen as the protector of our users - in minimizing the impact of incidents on their business and ensuring that Stripe is always thinking of our users.
Responsibilities
- Manage a team of frontline incident response managers
- Provide coaching and development to each team member
- Coordinate and manage incident resolution with speed, cross-functional collaboration, and accuracy, with a global and broad set of stakeholders.
- Facilitate post incident reviews to identify technical or process problems which need to be remediated
- Contribute to incident root cause analysis, identifying remediation opportunities for Incident Operations, partner teams on operations and engineering to execute upon.
- Formulate strategy and deliver on communications to both internal stakeholders and Stripe’s users.
- Collaborate with engineering and operations teams to align on and execute upon on-going improvements to processes, tooling, metrics, and the Incident Management framework.
- Influence and make decisions through interpretation of data and consolidation of input from multiple stakeholders.
We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.
Minimum requirements
- Have 5+ years of direct people management experience and are an excellent coach
- Have 3+ years of experience within a Major Incident Management team
- Demonstrated employee and team development
- Enjoy a fast paced work environment, crafting strategic and rapid fixes to high intensity problems with a keen eye for detail and a high bar for quality
- Comfortable navigating ambiguity, while identifying areas for process improvement and establishing best practices
- Strong written and verbal communication skills, able to deliver effective messaging to all levels of a technical organization
- Can problem solve and translate complicated technical issues into solutions, while keeping a users-first mindset
- Have an ability to execute on and deliver complex operational projects involving multiple stakeholders especially in partnering with engineering
- Have a technical background, are proficient in SQL, Splunk, or equivalent query languages, and can use data to drive business decisions based on analytical research
- Experience using infrastructure and application monitoring tools such as Signalfx, Prometheus, Sentry, Grafana and others
- Experience at a high-growth technology company, especially within the payments or e-commerce space, and in particular in incident response
- Experience working with both cloud and third-party solution providers
- Experience with managing user-facing communications strategy during sensitive situations such as outages
Hybrid work at Stripe
Office-assigned Stripes spend at least 50% of the time in a given month in their local office or with users. This strikes a balance between bringing people together for in-person collaboration and learning from each other, while supporting flexibility about how to do this in a way that makes sense for individuals and their teams.
Manager, Incident Response & Management
Posted 13 days ago
Job Viewed
Job Description
Who we are
About Stripe
Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.
About the teamThe Incident Response team is a global 24/7 team responsible for driving incident response and management from detection to resolution. Stripe is proud of its five 9s API reliability and this team is at the forefront of ensuring we keep it that way - working hand-in-hand with Reliability Eng and across the Tech Org. This team of incident response managers (IRM) is defined by our sense of ownership and how we drive incidents to resolution - marshaling the necessary cross-functional resources to respond to and resolve service outages, critical bugs, security attacks and anything that significantly impacts the users of our products. The team is user-first and ensures appropriate external communications from Stripe and senior management to keep our users informed of disruption to their experience of Stripe. The team is highly skilled in incident troubleshooting, program management, incident classifications, incident communications, incident escalation and technical adeptness as incidents can arise from anywhere and cut across products and orgs in Stripe.
What you’ll do
This position entails leading and optimizing Stripe's incident management processes and automation, ensuring efficiency and adherence to stringent incident response metrics. As the head of the incident response team, you will establish and maintain a best-in-class incident response framework, upholding the reliability standards expected of Stripe. Responsibilities include but are not limited to incident classification, escalation, and notification management, along with accountability for key incident response metrics (TTx). You will generate actionable insights to drive continuous improvement, collaborating with engineering leadership to refine incident detection, response, user communication, and tooling efficacy. Leadership and development of a highly effective 24/7 global incident response management team, characterized by urgency, programmatic ownership of incidents and communications, and the capacity to engage engineering teams, are crucial. Additionally, you will manage incident communications across multiple channels for executive and end-user audiences, and identify automation opportunities to streamline incident response workflows, thereby safeguarding users and minimizing disruption to their operations.
Responsibilities
- Lead the global 24/7 team of regional managers and incident response managers with ability to be hands-on and support frontline on-call with speed, cross-functional collaboration and escalation
- Develop and own Stripe's incident response and management strategy and cross-functional roadmap, ensuring it aligns with the company's reputation for reliability.
- Spearhead and manage Stripe's AI-First strategy for automation of incident response workflows, partnering with the engineering team to implement required tooling enhancements.
- Enhance Stripe's incident response by leading and implementing improvements derived from analyzing user-facing incidents and extracting actionable insights and learnings.
- Collaborate closely with executive leadership, engineering, and operations teams to lead significant programs and reshape workflows and metrics concerning reliability and incident operations.
- Manage relevant TTx metrics, particularly those related to communication and escalation. Collaborate with engineering leadership to implement necessary improvements for each metric.
- Develop user-focused metrics and data to guide Stripe's incident response, reliability strategy, and user communications (including RCAs), ensuring impactful decision-making.
We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.
Minimum requirements
- 5+ years of management experience, including 2+ years of experience managing managers with a proven record in building, growing and transforming teams.
- Extensive experience (4+ years) leading incident response for complex, large-scale distributed services with high SLOs/SLAs, coupled with deep expertise in crisis management.
- Demonstrated ability to lead, influence other leaders and deliver complex strategic projects involving multiple stakeholders
- Strong analytical skills, and the ability to use data to drive business decisions
- Possesses proficiency in basic incident troubleshooting and a reasonable understanding of system architecture. Fluent in using SQL, Splunk, or similar query languages.
- Exceptional communication abilities, capable of adapting incident updates for diverse audiences (executives, external users, internal teams).
- Affinity for a fast paced work environment, crafting strategic and rapid fixes to high intensity problems with a keen eye for detail and a high bar for quality
- Comfort navigating ambiguity, while identifying areas for process improvement and establishing best practices
- Experience managing geographically dispersed teams
- Experience using infrastructure and application monitoring tools such as Prometheus, Sentry and others
- Experience in incident response at a high-growth technology company, preferably within the payments or e-commerce sectors.
- Proven ability to apply Agentic and Generative AI to revolutionize incident response, coupled with a strong grasp of current industry trends in the incident response domain.
- Demonstrated history of driving engineering and process enhancements to improve incident response efficiency within a rapidly expanding technology organization.
The annual salary range for this role in the primary location is S$208,000 - S$312,000. This range may change if you are hired in another location. For sales roles, the range provided is the role’s On Target Earnings (“OTE”) range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role. This salary range may be inclusive of several career levels at Stripe and will be narrowed during the interview process based on a number of factors, including the candidate’s experience, qualifications, and specific location. Applicants interested in this role and who are not located in the primary location may request the annual salary range for their location during the interview process.
Specific benefits and details about what compensation is included in the salary range listed above will vary depending on the applicant’s location and can be discussed in more detail during the interview process. Benefits/additional compensation for this role may include: equity, company bonus or sales commissions/bonuses; retirement plans; health benefits; and wellness stipends.
At Stripe, we're looking for people with passion, grit, and integrity. You're encouraged to apply even if your experience doesn't precisely match the job description. Your skills and passion will stand out—and set you apart—especially if your career has taken some extraordinary twists and turns. At Stripe, we welcome diverse perspectives and people who think rigorously and aren't afraid to challenge assumptions. Join us.