1,206 SRE jobs in Singapore

Site Reliability Engineer (SRE)

Singapore, Singapore Sea

Posted 23 days ago

Job Description

Responsibilities
  • Develop and maintain scripts to retrieve and process data from Google Workspace and Zoom, including users, groups, meeting rooms, licenses, activity logs, and configurations.
  • Normalize and structure data for analysis, reporting, and alerting.
  • Build automated alerting systems to identify anomalies, policy violations, or operational issues in Workspace or Zoom environments.
  • Design workflows to automate tasks such as account cleanup, license management, and configuration enforcement.
  • Build a secure internal web platform to standardize administrative actions, including reporting, dashboards, and visualizations.
  • Collaborate with IT Services and Support teams to prioritize features and gather automation requirements.
  • Implement Git workflows for collaboration and maintain well-documented code.
  • Deploy tools in containerized environments like Docker and support infrastructure such as databases and authentication mechanisms.
Requirements
  • 3–5 years of experience in software development, automation, or internal systems.
  • Proficiency in scripting/backend languages like Python, Node.js, or Go.
  • Experience with APIs such as Google Workspace Admin SDK, Zoom API, GAM.
  • Familiarity with Git and collaborative workflows.
  • Strong problem-solving skills and ability to work independently.
Additional Details
  • Seniority level: Entry level
  • Employment type: Full-time
  • Job function: Information Technology
  • Industries: Technology, Internet

Site Reliability Engineer (SRE)

Singapore, Singapore Percept Solutions

Posted 23 days ago

Job Description

  • Design and implement new solutions, and enhance and integrate existing ones, to ensure proactive monitoring
  • Collaborate with internal teams and vendors to identify, monitor, and improve Service Level Objectives and Indicators
  • Support incident management, investigation, resolution, and post-mortems
  • Performance monitoring and capacity management
  • Automate manual operational tasks for self-healing
  • Administer and provide operational support for monitoring tools
  • Deployment and patching
  • System configuration
  • User access management
  • Incident management and investigation
  • Report and dashboard generation

Job Requirements
  • SRE and automation tools such as Ansible and Jenkins
  • Monitoring solutions such as Zabbix, Dynatrace, CloudWatch, eG, and SolarWinds
  • Dashboard visualization tools such as Grafana
  • Proficient in SQL scripting for data analytics
  • Familiar with database technologies such as Oracle, MySQL, and MS SQL
  • Familiar with Windows, Unix, and Linux OS environments

EA Licence No.:18S9405 / EA Reg. No.:R

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: IT Services and IT Consulting


Site Reliability Engineer (SRE)

Singapore, Singapore Percept Solutions

Posted today

Job Description

  • Design and implement new solutions, and enhance and integrate existing ones, to ensure proactive monitoring
  • Collaborate with internal teams and vendors to identify, monitor, and improve Service Level Objectives and Indicators
  • Support incident management, investigation, resolution, and post-mortems
  • Performance monitoring and capacity management
  • Automate manual operational tasks for self-healing
  • Administer and provide operational support for monitoring tools
  • Deployment and patching
  • System configuration
  • User access management
  • Incident management and investigation
  • Report and dashboard generation

Job Requirements
  • SRE and automation tools such as Ansible and Jenkins
  • Monitoring solutions such as Zabbix, Dynatrace, CloudWatch, eG, and SolarWinds
  • Dashboard visualization tools such as Grafana
  • Proficient in SQL scripting for data analytics
  • Familiar with database technologies such as Oracle, MySQL, and MS SQL
  • Familiar with Windows, Unix, and Linux OS environments

EA Licence No.:18S9405 / EA Reg. No.:R
  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: IT Services and IT Consulting

Site Reliability Engineer (SRE)

Singapore, Singapore Sea

Posted today

Job Description

Responsibilities
  • Develop and maintain scripts to retrieve and process data from Google Workspace and Zoom, including users, groups, meeting rooms, licenses, activity logs, and configurations.
  • Normalize and structure data for analysis, reporting, and alerting.
  • Build automated alerting systems to identify anomalies, policy violations, or operational issues in Workspace or Zoom environments.
  • Design workflows to automate tasks such as account cleanup, license management, and configuration enforcement.
  • Build a secure internal web platform to standardize administrative actions, including reporting, dashboards, and visualizations.
  • Collaborate with IT Services and Support teams to prioritize features and gather automation requirements.
  • Implement Git workflows for collaboration and maintain well-documented code.
  • Deploy tools in containerized environments like Docker and support infrastructure such as databases and authentication mechanisms.
Requirements
  • 3–5 years of experience in software development, automation, or internal systems.
  • Proficiency in scripting/backend languages like Python, Node.js, or Go.
  • Experience with APIs such as Google Workspace Admin SDK, Zoom API, GAM.
  • Familiarity with Git and collaborative workflows.
  • Strong problem-solving skills and ability to work independently.
Additional Details
  • Seniority level: Entry level
  • Employment type: Full-time
  • Job function: Information Technology
  • Industries: Technology, Internet

Site Reliability Engineer (SRE)

068895 $8500 Monthly COFFEE MEETS BAGEL PTE. LTD.

Posted 2 days ago

Job Description

We are a global dating app created to give everyone a chance at love. The sense of belonging and connectedness we get from relationships helps us survive and thrive, and we’re working to make it a little easier for people to find that. We’re inspired by the stories we hear from employees, friends, and family who have used our app to transform their lives, and you, too, can make a difference by joining us!

We are looking for a talented Senior Site Reliability Engineer to help design the future of dating. This individual will bring extensive experience in running large-scale data sources in the cloud and will be responsible for modernizing our data source handling and maintaining our core infrastructure and services on AWS.

This role will be based in Singapore and report directly to the CTO.

Responsibilities:
  • Architect, develop, and maintain our core infrastructure and services on AWS, focusing on high availability, performance, and scalability.
  • Specific AWS services of interest include EC2, RDS, S3, ElastiCache, CloudWatch, Redshift, OpenSearch, and VPC.
  • Implement and manage continuous deployment processes to achieve seamless deployment of services with minimal downtime.
  • Monitor system performance, identify bottlenecks, and apply necessary optimizations to ensure the smooth operation of our services.
  • Develop and maintain automated tools for infrastructure provisioning, configuration, and deployment.
  • Work closely with development teams to integrate infrastructure builds and operational best practices into the software development lifecycle.
  • Conduct root cause analysis for production errors and implement strategies to prevent future occurrences.
  • Manage and optimize network configurations to ensure secure and efficient data flow and access.
  • Administer and maintain databases, ensuring their reliability, performance, and security.
  • Lead capacity planning efforts to ensure that our infrastructure scales in line with demand while optimizing costs and maintaining performance.
  • Modernize data source handling (Redshift, Postgres, RDS, etc.).
  • Manage Kubernetes workloads.
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 5+ years of industry experience.
  • Proven experience as an SRE, DevOps Engineer, or similar role in a cloud-based environment.
  • Strong expertise in AWS services and tools.
  • Proficient understanding of networking principles, transport, and application protocols, especially TCP/IP, BGP, DNS, TLS, and HTTP/S.
  • Experience with database administration, including performance tuning, backup and recovery processes, and security management.
  • Proficiency in scripting languages (e.g., Python, Bash) and automation tools (e.g., Terraform).
  • Excellent problem-solving skills and the ability to work independently or as part of a team.
  • Strong written and verbal communication: fluent in English (both written and verbal); proficiency in Chinese is an advantage for working with Chinese stakeholders.
  • Significant experience in capacity planning and cost management within cloud environments.
  • Experience with Kubernetes.
  • Familiarity with Terraform for general systems maintenance.
  • Experience with data sources like Redshift, Postgres (Citus, Patroni), and RDS.
Preferred Qualifications:
  • AWS SysOps Administrator Associate or AWS Solutions Architect Professional (SAP) certification.
  • Experience with Spotinst for cost optimization.
  • Familiarity with additional scripting languages such as Go or JavaScript.

If you're passionate about tackling big challenges and have the skills to help us shape the future of online dating, we want to hear from you!


Site Reliability Engineer (SRE) (GovTech)

Singapore, Singapore AvePoint

Posted 15 days ago

Job Description

We are seeking a skilled and passionate Engineer to join our team to build and operate a Whole-of-Government (WoG) runtime platform.

As a Site Reliability Engineer, you will be responsible for designing and operating GitLab, AWS and Kubernetes-based infrastructure and solutions that power our platform, to ensure the stability, scalability, and performance of our runtime platform.

Responsibilities

As a Site Reliability Engineer, you will be responsible for:

Toil Reduction & Automation

  • Identify repetitive tasks and develop automation via CI/CD pipelines, ensuring integration with cross-functional teams to reduce manual intervention and improve operational efficiency.
Observability & System Health
  • Implement comprehensive observability solutions (logs, metrics, traces, alerts) around the four Golden Signals (latency, traffic, errors, saturation), and build automation for proactive system health assessments and self-remediation.
Production Support & Incident Management
  • Participate in on-call rotations, promptly respond to incidents to minimize MTTR, and conduct thorough post-incident reviews to implement preventive measures and improve system resilience.
Security & Compliance
  • Design and implement solutions that are secure and compliant by collaborating with dedicated security teams, conducting regular audits, and integrating advanced vulnerability scanning tools.
Maintenance, Optimisation & Performance
  • Identify and resolve performance bottlenecks and operational issues, define and track KPIs (e.g., MTTR, system uptime, cost efficiency), and drive ongoing optimisation efforts.
Strategic Customer Engagement
  • Act as a technical advisor for tenants, guiding them on containerization and best practices for cloud-native deployments, and participate in strategic initiatives to enhance platform scalability and performance.
Knowledge Sharing & Documentation
  • Develop and maintain detailed playbooks, runbooks, and documentation to facilitate team-wide knowledge sharing, streamline incident response, and ensure that critical processes are well understood across the team.
Continuous Learning & Innovation
  • Stay current with the latest AWS, Kubernetes, and industry developments, and proactively recommend improvements and innovative solutions to maintain a competitive and reliable platform.
Requirements
  • Bachelor's degree or Diploma in Computer Science, Engineering, or a related field (or equivalent experience).
  • Proven experience as a Site Reliability Engineer or similar role, with a strong background in containerization, orchestration, and cloud-native technologies.
  • Proven ability to troubleshoot and resolve complex technical issues in containerized applications.
  • Demonstrated experience with incident management, including post-incident reviews and continuous improvement.
  • Strong documentation skills and experience in knowledge sharing across teams.
  • Deep understanding of AWS, Kubernetes (including AWS EKS), and operational best practices, with familiarity in multi-cloud or hybrid environments.
  • Solid grasp of networking, security, and storage in both AWS and Kubernetes contexts.
  • Experience integrating Kubernetes with AWS cloud technologies (e.g., Secrets Manager, Load Balancers) and using infrastructure-as-code (Terraform or similar).
  • Hands-on experience with containerization tools (Kubernetes, Kustomize, Helm) and automation scripting (Go, Python, Bash, or equivalent).
  • Ability to write and maintain automated tests or conduct thorough manual testing for automation scripts, ensuring the reliability and effectiveness of automated solutions.
  • Familiarity with CI/CD tools (GitLab CI/CD, ArgoCD) and version control systems (Git).
  • Experience with observability/monitoring tools (Prometheus, Grafana, ELK Stack) and defining SLOs and Error Budgets.
  • Certifications such as Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) are a plus.
  • Experience with developing Kubernetes operators using Go, service mesh technologies, and Chaos Engineering is a plus.
Soft Skills
  • Proactive in identifying problems and recommending strategic solutions.
  • Excellent problem-solving skills with a robust analytical mindset.
  • Clear, concise, and effective communication skills; adept at collaborating across cross-functional teams, including development, security, and customer-facing groups.
  • Ability to remain calm and effective under pressure, especially during incident response.
  • Adaptability to rapid change with a continuous learning mindset, sharing knowledge to foster team growth.
  • Customer-focused with the ability to translate technical insights into understandable, actionable guidance.
  • Leadership and mentoring capabilities that contribute to the development of a resilient and collaborative team environment are a plus.

Any personal data you share with us during the application process will be processed strictly in compliance with applicable data protection laws and our Privacy Notice.

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Data Security Software Products


Site Reliability Engineer (SRE) (GovTech)

Singapore, Singapore AvePoint

Posted 24 days ago

Job Description

We are seeking a skilled and passionate Engineer to join our team to build and operate a Whole-of-Government (WoG) runtime platform.

As a Site Reliability Engineer, you will be responsible for designing and operating GitLab, AWS and Kubernetes-based infrastructure and solutions that power our platform, to ensure the stability, scalability, and performance of our runtime platform.

Responsibilities:

As a Site Reliability Engineer, you will be responsible for:
Toil Reduction & Automation
• Identify repetitive tasks and develop automation via CI/CD pipelines, ensuring integration with cross-functional teams to reduce manual intervention and improve operational efficiency.
Observability & System Health
• Implement comprehensive observability solutions (logs, metrics, traces, alerts) around the four Golden Signals (latency, traffic, errors, saturation), and build automation for proactive system health assessments and self-remediation.
Production Support & Incident Management
• Participate in on-call rotations, promptly respond to incidents to minimize MTTR, and conduct thorough post-incident reviews to implement preventive measures and improve system resilience.
Security & Compliance
• Design and implement solutions that are secure and compliant by collaborating with dedicated security teams, conducting regular audits, and integrating advanced vulnerability scanning tools.

Maintenance, Optimisation & Performance
• Identify and resolve performance bottlenecks and operational issues, define and track KPIs (e.g., MTTR, system uptime, cost efficiency), and drive ongoing optimisation efforts.
Strategic Customer Engagement
• Act as a technical advisor for tenants, guiding them on containerization and best practices for cloud-native deployments, and participate in strategic initiatives to enhance platform scalability and performance.
Knowledge Sharing & Documentation
• Develop and maintain detailed playbooks, runbooks, and documentation to facilitate team-wide knowledge sharing, streamline incident response, and ensure that critical processes are well understood across the team.
Continuous Learning & Innovation
• Stay current with the latest AWS, Kubernetes, and industry developments, and proactively recommend improvements and innovative solutions to maintain a competitive and reliable platform.

Requirements:

• Bachelor's degree or Diploma in Computer Science, Engineering, or a related field (or equivalent experience).
• Proven experience as a Site Reliability Engineer or similar role, with a strong background in containerization, orchestration, and cloud-native technologies.
• Proven ability to troubleshoot and resolve complex technical issues in containerized applications.
• Demonstrated experience with incident management, including post-incident reviews and continuous improvement.
• Strong documentation skills and experience in knowledge sharing across teams.
• Deep understanding of AWS, Kubernetes (including AWS EKS), and operational best practices, with familiarity in multi-cloud or hybrid environments.
• Solid grasp of networking, security, and storage in both AWS and Kubernetes contexts.
• Experience integrating Kubernetes with AWS cloud technologies (e.g., Secrets Manager, Load Balancers) and using infrastructure-as-code (Terraform or similar).
• Hands-on experience with containerization tools (Kubernetes, Kustomize, Helm) and automation scripting (Go, Python, Bash, or equivalent).
• Ability to write and maintain automated tests or conduct thorough manual testing for automation scripts, ensuring the reliability and effectiveness of automated solutions.
• Familiarity with CI/CD tools (GitLab CI/CD, ArgoCD) and version control systems (Git).
• Experience with observability/monitoring tools (Prometheus, Grafana, ELK Stack) and defining SLOs and Error Budgets.
• Certifications such as Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) are a plus.
• Experience with developing Kubernetes operators using Go, service mesh technologies, and Chaos Engineering is a plus.

Soft skills:

• Proactive in identifying problems and recommending strategic solutions.
• Excellent problem-solving skills with a robust analytical mindset.
• Clear, concise, and effective communication skills; adept at collaborating across cross-functional teams, including development, security, and customer-facing groups.
• Ability to remain calm and effective under pressure, especially during incident response.
• Adaptability to rapid change with a continuous learning mindset, sharing knowledge to foster team growth.
• Customer-focused with the ability to translate technical insights into understandable, actionable guidance.
• Leadership and mentoring capabilities that contribute to the development of a resilient and collaborative team environment are a plus.

Any personal data you share with us during the application process will be processed strictly in compliance with applicable data protection laws and our Privacy Notice.

Site Reliability Engineer (SRE) (GovTech)

Singapore, Singapore AvePoint

Posted today

Job Description

We are seeking a skilled and passionate Engineer to join our team to build and operate a Whole-of-Government (WoG) runtime platform.
As a Site Reliability Engineer, you will be responsible for designing and operating GitLab, AWS and Kubernetes-based infrastructure and solutions that power our platform, to ensure the stability, scalability, and performance of our runtime platform.
Responsibilities
As a Site Reliability Engineer, you will be responsible for:
Toil Reduction & Automation
  • Identify repetitive tasks and develop automation via CI/CD pipelines, ensuring integration with cross-functional teams to reduce manual intervention and improve operational efficiency.
Observability & System Health
  • Implement comprehensive observability solutions (logs, metrics, traces, alerts) around the four Golden Signals (latency, traffic, errors, saturation), and build automation for proactive system health assessments and self-remediation.
Production Support & Incident Management
  • Participate in on-call rotations, promptly respond to incidents to minimize MTTR, and conduct thorough post-incident reviews to implement preventive measures and improve system resilience.
Security & Compliance
  • Design and implement solutions that are secure and compliant by collaborating with dedicated security teams, conducting regular audits, and integrating advanced vulnerability scanning tools.
Maintenance, Optimisation & Performance
  • Identify and resolve performance bottlenecks and operational issues, define and track KPIs (e.g., MTTR, system uptime, cost efficiency), and drive ongoing optimisation efforts.
Strategic Customer Engagement
  • Act as a technical advisor for tenants, guiding them on containerization and best practices for cloud-native deployments, and participate in strategic initiatives to enhance platform scalability and performance.
Knowledge Sharing & Documentation
  • Develop and maintain detailed playbooks, runbooks, and documentation to facilitate team-wide knowledge sharing, streamline incident response, and ensure that critical processes are well understood across the team.
Continuous Learning & Innovation
  • Stay current with the latest AWS, Kubernetes, and industry developments, and proactively recommend improvements and innovative solutions to maintain a competitive and reliable platform.
Requirements
  • Bachelor's degree or Diploma in Computer Science, Engineering, or a related field (or equivalent experience).
  • Proven experience as a Site Reliability Engineer or similar role, with a strong background in containerization, orchestration, and cloud-native technologies.
  • Proven ability to troubleshoot and resolve complex technical issues in containerized applications.
  • Demonstrated experience with incident management, including post-incident reviews and continuous improvement.
  • Strong documentation skills and experience in knowledge sharing across teams.
  • Deep understanding of AWS, Kubernetes (including AWS EKS), and operational best practices, with familiarity in multi-cloud or hybrid environments.
  • Solid grasp of networking, security, and storage in both AWS and Kubernetes contexts.
  • Experience integrating Kubernetes with AWS cloud technologies (e.g., Secrets Manager, Load Balancers) and using infrastructure-as-code (Terraform or similar).
  • Hands-on experience with containerization tools (Kubernetes, Kustomize, Helm) and automation scripting (Go, Python, Bash, or equivalent).
  • Ability to write and maintain automated tests or conduct thorough manual testing for automation scripts, ensuring the reliability and effectiveness of automated solutions.
  • Familiarity with CI/CD tools (GitLab CI/CD, ArgoCD) and version control systems (Git).
  • Experience with observability/monitoring tools (Prometheus, Grafana, ELK Stack) and defining SLOs and Error Budgets.
  • Certifications such as Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) are a plus.
  • Experience with developing Kubernetes operators using Go, service mesh technologies, and Chaos Engineering is a plus.
Soft Skills
  • Proactive in identifying problems and recommending strategic solutions.
  • Excellent problem-solving skills with a robust analytical mindset.
  • Clear, concise, and effective communication skills; adept at collaborating across cross-functional teams, including development, security, and customer-facing groups.
  • Ability to remain calm and effective under pressure, especially during incident response.
  • Adaptability to rapid change with a continuous learning mindset, sharing knowledge to foster team growth.
  • Customer-focused with the ability to translate technical insights into understandable, actionable guidance.
  • Leadership and mentoring capabilities that contribute to the development of a resilient and collaborative team environment are a plus.
Any personal data you share with us during the application process will be processed strictly in compliance with applicable data protection laws and our Privacy Notice.
  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Job function: Engineering and Information Technology
  • Industries: Data Security Software Products

Tech Lead (SRE) - Cloud Infrastructure

Singapore, Singapore Refine Group

Posted 10 days ago

Job Description

ByteDance will prioritize applicants who have the right to work in Singapore without requiring sponsorship.

About ByteDance

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. Our products include TikTok, Toutiao, Douyin, and Xigua, making it easier and more fun for people to connect, consume, and create content.

Why Join Us

Creation is at the core of ByteDance's purpose. Our teams drive innovation and growth, turning challenges into opportunities to learn and improve. We foster a culture of courage, collaboration, and impact.

Team Introduction

The Site Reliability Engineering (SRE) team combines software and systems engineering to design and operate large-scale, distributed, and resilient systems.

Within TikTok's Infrastructure SRE, our focus is on ensuring the reliability and uptime of our infrastructure services, supporting rapid improvements through automation and system optimization.

The Role

As a Tech Lead, you will guide and build a team of software and system engineers, establishing efficient processes and promoting best engineering practices. You will coordinate with other teams and the user community.

Responsibilities
  • Build and lead the SRE team, including recruitment, training, system operation, and fostering a strong team culture.
  • Oversee software system development and organizational unit integration.
  • Develop long-term technical strategies with clear milestones to enhance team capabilities.
  • Guide Proof-of-Concept and solution development, ensuring security and risk considerations.
  • Establish protocols for access management, configuration, disaster recovery, and fault handling.
  • Create monitoring frameworks and promote automated, intelligent governance within a service-oriented architecture.
  • Collaborate with development teams to ensure system reliability from design to launch, advancing automation in operations and maintenance.
  • Improve communication and collaboration with business teams, refining processes and business architecture.
Qualifications

What you should have:

  • Bachelor's Degree in Computer Science or related field, with over 5 years of professional experience, including at least 3 in R&D.
  • Proficiency in Linux systems, networking, and managing large-scale distributed systems.
  • Strong planning, summarization, and project management skills.
  • Responsibility, proactive attitude, and problem-solving skills.
  • Experience with cloud platforms is a plus; experience in large-scale storage, scheduling, big data, or intelligent operations is preferred.

ByteDance values diversity and is committed to creating an inclusive environment where employees are valued for their unique perspectives. We aim to reflect the communities we serve and foster a workplace of creativity and innovation.


Tech Lead (SRE) - Cloud Infrastructure

Singapore, Singapore Refine Group

Posted today

Job Description

ByteDance will prioritize applicants who have the right to work in Singapore without requiring sponsorship.
About ByteDance
Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. Our products include TikTok, Toutiao, Douyin, and Xigua, making it easier and more fun for people to connect, consume, and create content.
Why Join Us
Creation is at the core of ByteDance's purpose. Our teams drive innovation and growth, turning challenges into opportunities to learn and improve. We foster a culture of courage, collaboration, and impact.
Team Introduction
The Site Reliability Engineering (SRE) team combines software and systems engineering to design and operate large-scale, distributed, and resilient systems.
Within TikTok's Infrastructure SRE, our focus is on ensuring the reliability and uptime of our infrastructure services, supporting rapid improvements through automation and system optimization.
The Role
As a Tech Lead, you will guide and build a team of software and system engineers, establishing efficient processes and promoting best engineering practices. You will coordinate with other teams and the user community.
Responsibilities
  • Build and lead the SRE team, including recruitment, training, system operation, and fostering a strong team culture.
  • Oversee software system development and organizational unit integration.
  • Develop long-term technical strategies with clear milestones to enhance team capabilities.
  • Guide Proof-of-Concept and solution development, ensuring security and risk considerations are addressed.
  • Establish protocols for access management, configuration, disaster recovery, and fault handling.
  • Create monitoring frameworks and promote automated, intelligent governance within a service-oriented architecture.
  • Collaborate with development teams to ensure system reliability from design to launch, advancing automation in operations and maintenance.
  • Improve communication and collaboration with business teams, refining processes and business architecture.
Qualifications
What you should have:
  • Bachelor's Degree in Computer Science or related field, with over 5 years of professional experience, including at least 3 in R&D.
  • Proficiency in Linux systems, networking, and managing large-scale distributed systems.
  • Strong planning, summarization, and project management skills.
  • A strong sense of responsibility, a proactive attitude, and problem-solving skills.
  • Experience with cloud platforms is a plus; experience in large-scale storage, scheduling, big data, or intelligent operations is preferred.
ByteDance values diversity and is committed to creating an inclusive environment where employees are valued for their unique perspectives. We aim to reflect the communities we serve and foster a workplace of creativity and innovation.
 
