334 SRE Manager jobs in Singapore
Senior Manager – Site Reliability Engineering (SRE)
Posted today
Job Description
Nice to Meet You
We are Dropsuite, a NinjaOne Company
Site Ops teams are a blend of pragmatic operators and software craftspeople that apply sound engineering principles, operational discipline, and mature automation to our operating environments.
We are seeking a seasoned Senior Manager – Site Reliability Engineering (SRE) to lead a high-impact team focused on building resilient, scalable infrastructure and ensuring platform reliability across our cloud environments. This role combines strategic leadership with deep technical expertise in automation, observability, and modern DevOps practices to drive operational excellence and service uptime.
Work Arrangement
- Full-time position
- Hybrid work model (2 days per week in the office)
- Monday to Friday, 5-day work week (flexible work schedule)
- Eligible to reside and work in Singapore (Singapore Citizens / PRs preferred)
This position is open exclusively to candidates who reside in and are authorised to work in Singapore. Only shortlisted candidates will be contacted.
Key Accountabilities
- Define and implement SRE roadmaps aligned with business objectives and SLAs.
- Collaborate with service owners to define SLOs supporting SLA commitments.
- Deliver platform SLI insights through reports and observability tools.
- Integrate reliability best practices into engineering and product workflows.
- Lead initiatives on uptime, monitoring, incident response, and optimization.
- Manage incident response processes, on-call rotations, and playbooks.
- Set infrastructure resiliency standards for cloud-native environments.
- Optimize architecture for scalability, fault tolerance, and cost efficiency.
- Ensure production systems meet security and compliance requirements.
- Provide strategic leadership and mentorship to drive team growth and performance.
- Design scalable and resilient systems architecture.
- Recruit, mentor, and retain high-performing SRE talent.
- Develop growth and training plans for SRE team members.
- Foster a reliability-focused, customer-centric team culture.
Qualifications and Competencies
- Bachelor's degree in Computer Science or a related field.
- Cloud certification in AWS, Azure, or GCP preferred.
- 8+ years in Software Engineering or Site Reliability Engineering.
- 3+ years in team management or technical leadership.
- Expert-level Linux administration, scripting, and troubleshooting.
- Strong hands-on experience with CI/CD and SDLC practices.
- Deep passion for automation, security, and self-service.
- Proficient in AWS, GCP, and/or Azure cloud platforms.
- Skilled in infrastructure-as-code tools like Terraform, CloudFormation, Helm, and Ansible.
- Experienced with containers, Kubernetes, and microservice architectures.
- Excellent verbal and written communication skills.
Why Join Us
At Dropsuite, now proudly part of NinjaOne, we are on a mission to safeguard business information and help businesses stay in business. We are a global, fast-growing, partner-centric company building secure, scalable, and highly usable cloud backup technologies for businesses of all sizes. Today, we perform billions of backups daily for organizations across more than 100 countries.
As we enter an exciting new chapter with NinjaOne—a leader in endpoint management, security, and IT automation—our combined strengths enable us to drive even greater impact, innovation, and global scale. Together, we are building a world-class platform that empowers IT teams with simplicity, performance, and reliability.
At our core, we are a team of hungry owners: we are tenacious in our pursuit of excellence and take full ownership in everything we do. We are deeply customer-focused, collaborative, and solutions-driven. We play as a team—respecting, supporting, and elevating one another every step of the way.
Join us as we shape the future of IT and data protection—powered by passion, purpose, and the spirit of ownership.
Rewards That Go Beyond
- Competitive compensation
- Hybrid work model
- 18 days of annual leave (with accrual up to 20 days)
- Entitled to Singapore Public Holidays
- Other leave benefits, such as Wedding leave
- Health Insurance for you and your dependents
- Growth opportunities
- Work in a global company with meaningful work, highly skilled colleagues, and an amazing culture
Diversity and Inclusion Statement
Dropsuite is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status.
As part of our recruitment process, we may collect personal data to support hiring-related activities such as screening, assessment, and communication. This information is collected solely for recruitment purposes and handled in accordance with applicable data protection and privacy regulations. Your data will be treated with strict confidentiality and used only to facilitate your application with us.
Your Career Growth Starts Here. Apply Now
Troubleshooting
Scalability
Operational Excellence
Kubernetes
Azure
Ubuntu
Software Engineering
Scripting
Reliability
Administration Management
Reliability Engineering
Technical Consultation
GCP
Ansible
Linux
Cloud Infrastructure
Posted today
Job Description
An excellent Cloud Infrastructure & Delivery Director opportunity has just arisen at a globally recognized tech partner, delivering cutting-edge cloud and infrastructure services across a dynamic client portfolio.
Job Purpose:
Lead and improve cloud and IT service delivery operations for enterprise-scale clients and platforms.
Job Responsibilities:
- Oversee cloud and infrastructure services to meet performance, security, and client satisfaction goals.
- Drive service excellence through proactive operations, automation, and process improvements.
- Manage client relationships, SLAs, and ensure timely resolution of escalated service issues.
- Lead and coach a high-performing team across service delivery and technical support functions.
- Collaborate with internal teams to align solutions with client and business needs.
Job Requirements:
- 10+ years in IT service delivery with 5+ years in a senior leadership role.
- Strong knowledge of AWS, Azure, or GCP in hybrid cloud environments.
- Experience working in a System Integrator (SI) or multi-client delivery model.
- Skilled in service management tools like Splunk, ServiceNow, Dynatrace, or Elastic.
- ITIL v4 certification required; cloud or PMP certifications are a plus.
Work with a global tech leader delivering secure, scalable cloud and infrastructure services-apply now at
PERSOLKELLY Singapore Pte Ltd
• EA License No.01C4394
• EA Registration No. R1330844 (Naveen Vasudevan)
By sending us your personal data and curriculum vitae (CV), you are deemed to consent to PERSOLKELLY Singapore Pte Ltd and its affiliates to collect, use and disclose your personal data for the purposes set out in the Privacy Policy available at You acknowledge that you have read, understood, and agree with the Privacy Policy.
***
Senior Software Engineer, Site Reliability Engineering
Posted 2 days ago
Job Description
We are a team that designs, develops, maintains, and improves software for various venture projects, i.e., projects that are adjacent to our core businesses and are bootstrapped fast with a lean team. You will be actively involved in the design of various components behind scalable applications, from frontend UI to backend infrastructure.
What you’ll be doing
- Ensure the entire stack is healthy, with hardware, software, application, and network operating at optimal performance
- Perform deep dives into both systemic and latent reliability issues, partnering with other software and DevOps engineers across the organization to design, implement, and roll out fixes
- Continuously improve availability, reliability, and observability, and reduce the burden of human toil with tooling and automation
- Lead and drive SRE initiatives to improve operational efficiency
- Represent the SRE team in system design reviews and operational readiness exercises for new and existing services
- Experience coding in Ruby and/or Go
- Familiarity with GitOps principles and tools (GitHub Actions, Docker, Kubernetes)
- Experience in designing, analyzing, and troubleshooting large-scale distributed systems
- Curiosity about finding root causes in incidents and outages
- Ability to build alignment, cultivate relationships, and drive impact
- A mindset for designing fault-tolerant system architecture
- Comfort with being uncomfortable in ambiguous situations
- Involvement with incident management and response
- Desire to grow expertise, inform, and educate others
- A fast learner who can pick up various technologies and has a “get things done” mentality
- Humility to embrace better ideas from others, eagerness to make things better, and openness to challenges and possibilities
- Familiarity with cloud platforms and microservice-based architecture (AWS is a big plus)
- Familiarity with monitoring tools (e.g. Datadog, OpenTelemetry)
- Familiarity with CI/CD tools (e.g. GitHub Actions)
- Familiarity with IaC tools (e.g. Terraform, Spacelift)
- Experience in designing resilient system architecture
- Experience in optimizing the performance of large-scale production systems
Life @ Crypto.com
Empowered to think big. Try new opportunities while working with a talented, ambitious and supportive team.
Transformational and proactive working environment. Empower employees to find thoughtful and innovative solutions.
Growth from within. We help to develop new skill-sets that would impact the shaping of your personal and professional growth.
Work Culture. Our colleagues are some of the best in the industry; we are all here to help and support one another.
One cohesive team. Engage stakeholders to achieve our ultimate goal - Cryptocurrency in every wallet.
Work Flexibility Adoption. Flexi-work hour and hybrid or remote set-up
Aspire career alternatives through us - our internal mobility program offers employees a new scope.
Work Perks: crypto.com visa card provided upon joining
Are you ready to kickstart your future with us?
Benefits
Competitive salary
Attractive annual leave entitlement, including birthday and work anniversary leave
Work Flexibility Adoption. Flexi-work hour and hybrid or remote set-up
Aspire career alternatives through us. Our internal mobility program can offer employees a diverse scope.
Work Perks: crypto.com visa card provided upon joining
Our Crypto.com benefits packages vary depending on regional requirements; you can learn more from our talent acquisition team.
About Crypto.com:
Founded in 2016, Crypto.com serves more than 80 million customers and is the world's fastest growing global cryptocurrency platform. Our vision is simple: Cryptocurrency in Every Wallet. Built on a foundation of security, privacy, and compliance, Crypto.com is committed to accelerating the adoption of cryptocurrency through innovation and empowering the next generation of builders, creators, and entrepreneurs to develop a fairer and more equitable digital ecosystem.
Learn more at
Crypto.com is an equal opportunities employer and we are committed to creating an environment where opportunities are presented to everyone in a fair and transparent way. Crypto.com values diversity and inclusion, seeking candidates with a variety of backgrounds, perspectives, and skills that complement and strengthen our team.
Personal data provided by applicants will be used for recruitment purposes only.
Please note that only shortlisted candidates will be contacted.
Senior Software Engineer, Site Reliability Engineering
Posted 13 days ago
Job Description
We are a team that designs, develops, maintains, and improves software for various venture projects, i.e., projects that are adjacent to our core businesses and are bootstrapped fast with a lean team. You will be actively involved in the design of various components behind scalable applications, from frontend UI to backend infrastructure.
What you’ll be doing
- Ensure the entire stack is healthy, with hardware, software, application, and network operating at optimal performance
- Perform deep dives into both systemic and latent reliability issues, partnering with other software and DevOps engineers across the organization to design, implement, and roll out fixes
- Continuously improve availability, reliability, and observability, and reduce the burden of human toil with tooling and automation
- Lead and drive SRE initiatives to improve operational efficiency
- Represent the SRE team in system design reviews and operational readiness exercises for new and existing services
- Experience coding in Ruby and/or Go
- Familiarity with GitOps principles and tools (GitHub Actions, Docker, Kubernetes)
- Experience in designing, analyzing, and troubleshooting large-scale distributed systems
- Curiosity about finding root causes in incidents and outages
- Ability to build alignment, cultivate relationships, and drive impact
- A mindset for designing fault-tolerant system architecture
- Comfort with being uncomfortable in ambiguous situations
- Involvement with incident management and response
- Desire to grow expertise, inform, and educate others
- A fast learner who can pick up various technologies and has a “get things done” mentality
- Humility to embrace better ideas from others, eagerness to make things better, and openness to challenges and possibilities
- Familiarity with cloud platforms and microservice-based architecture (AWS is a big plus)
- Familiarity with monitoring tools (e.g. Datadog, OpenTelemetry)
- Familiarity with CI/CD tools (e.g. GitHub Actions)
- Familiarity with IaC tools (e.g. Terraform, Spacelift)
- Experience in designing resilient system architecture
- Experience in optimizing the performance of large-scale production systems
Empowered to think big. Try new opportunities while working with a talented, ambitious and supportive team.
Transformational and proactive working environment. Empower employees to find thoughtful and innovative solutions.
Growth from within. We help to develop new skill-sets that would impact the shaping of your personal and professional growth.
Work Culture. Our colleagues are some of the best in the industry; we are all here to help and support one another.
One cohesive team. Engage stakeholders to achieve our ultimate goal - Cryptocurrency in every wallet.
Work Flexibility Adoption. Flexi-work hour and hybrid or remote set-up.
Aspire career alternatives through us - our internal mobility program offers employees a new scope.
Work Perks: crypto.com visa card provided upon joining.
Benefits
Competitive salary.
Attractive annual leave entitlement, including birthday and work anniversary leave.
Work Flexibility Adoption. Flexi-work hour and hybrid or remote set-up.
Aspire career alternatives through us. Our internal mobility program can offer employees a diverse scope.
Work Perks: crypto.com visa card provided upon joining.
Our Crypto.com benefits packages vary depending on regional requirements; you can learn more from our talent acquisition team.
About Crypto.com:
Founded in 2016, Crypto.com serves more than 80 million customers and is the world's fastest growing global cryptocurrency platform. Our vision is simple: Cryptocurrency in Every Wallet. Built on a foundation of security, privacy, and compliance, Crypto.com is committed to accelerating the adoption of cryptocurrency through innovation and empowering the next generation of builders, creators, and entrepreneurs to develop a fairer and more equitable digital ecosystem.
Learn more at
Crypto.com is an equal opportunities employer and we are committed to creating an environment where opportunities are presented to everyone in a fair and transparent way. Crypto.com values diversity and inclusion, seeking candidates with a variety of backgrounds, perspectives, and skills that complement and strengthen our team.
Personal data provided by applicants will be used for recruitment purposes only.
Please note that only shortlisted candidates will be contacted.
Cloud Infrastructure Engineer
Posted 2 days ago
Job Description
Assurity Trusted Solutions (ATS) is a wholly owned subsidiary of the Government Technology Agency (GovTech). As a Trusted Partner over the last decade, ATS offers a comprehensive suite of products and services ranging from infrastructure and operational services, authentication services, governance and assurance services as well as managed processes. In a dynamic digital and cyber landscape, where trust & collaboration are key, ATS continues to drive mutually beneficial business outcomes through collaboration with GovTech, government agencies and commercial partners to mitigate cyber risks and bolster security postures.
Responsibilities:
- Design, deploy, and optimize Kubernetes clusters using the Nvidia software stack to support large language model applications.
- Collaborate with cross-functional teams to integrate Nvidia GPU resources effectively within Kubernetes environments, ensuring optimal performance.
- Implement and manage infrastructure as code (IaC) for Nvidia GPU configurations, focusing on scalability and high availability.
- Monitor, troubleshoot, and resolve issues related to both Kubernetes clusters and Nvidia GPU resources to maintain a reliable and performant infrastructure.
- Stay abreast of industry best practices and emerging technologies related to Kubernetes and the Nvidia GPU ecosystem.
- Work closely with development teams to automate deployment processes, leveraging Nvidia GPU capabilities, and streamline workflows.
- Implement security best practices to safeguard Kubernetes environments, Nvidia GPU resources, and sensitive data.
- Participate in on-call rotation and provide timely response to incidents, minimizing downtime for language model applications.
- Contribute to capacity planning and performance tuning activities, considering the demands of large-scale language model applications utilizing Nvidia GPU acceleration.
- Document infrastructure configurations, processes, and procedures, facilitating knowledge sharing and team member onboarding.
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Proven experience in designing, implementing, and managing on-premises infrastructure solutions.
- Strong knowledge of server virtualization, storage systems, and network infrastructure.
- Hands-on experience with cloud-native technologies and deployment strategies.
- Proven experience designing, deploying, and managing Kubernetes clusters on platforms such as SUSE Rancher or Red Hat OpenShift.
- Strong understanding of containerization (e.g. Docker), orchestration tools such as Kubernetes, and Nvidia GPU acceleration technologies.
- Proficiency in scripting, automation, and configuration management using tools such as Chef, Ansible, Terraform, or similar.
- Familiarity with infrastructure-as-code principles and tools (e.g., Helm, Kubernetes manifests).
- Experience with large-scale language model applications, particularly leveraging Nvidia GPU acceleration, is highly desirable.
- Solid knowledge of networking concepts, Kubernetes networking models, and integration with Nvidia GPU resources.
- Excellent problem-solving and troubleshooting skills, with a proactive approach to system optimization.
- Strong communication skills for effective collaboration in a team-oriented, agile environment.
Join us and discover a meaningful and exciting career with Assurity Trusted Solutions!
The remuneration package will be commensurate with your qualifications and experience. Interested applicants, please click "Apply Now".
We thank you for your interest and please note that only shortlisted candidates will be notified.
By submitting your application, you agree that your personal data may be collected, used and disclosed by Assurity Trusted Solutions Pte. Ltd. (ATS), GovTech and their service providers and agents in accordance with ATS’s privacy statement which can be found at: or such other successor site.
- A wholly-owned subsidiary of GovTech.
- We promote a learning culture and encourage you to grow and learn.
Cloud Infrastructure Engineer
Posted 2 days ago
Job Description
As a Cloud Infrastructure Engineer, you will run and support the company’s software application infrastructure in the cloud, ensuring the optimized use of resources to scale.
What will you do
- Design, develop, and implement cloud infrastructure as code using Terraform.
- Manage and optimize cloud resources for cost-efficiency and performance.
- Secure infrastructure by implementing best practices and adhering to security compliance standards.
- Implement robust monitoring and alerting systems for infrastructure resources.
- Collaborate with developers and operations teams to integrate infrastructure changes seamlessly into the software development lifecycle.
- Champion DevOps best practices for cloud infrastructure management.
- Set up, monitor, and support CI/CD pipelines on premises or in the cloud.
- Stay up to date on the latest AWS services and Terraform features.
What do we expect
- Proven experience as a Cloud Infrastructure Engineer with at least 3 years of experience (with a focus on AWS cloud infrastructure).
- Implementation proficiency in service control policy and account governance (Control Tower, OU management, etc.).
- Ability to enable and configure AWS VPC Flow Logs, load balancer logging, Direct Connect, AWS VPN, TGX, etc.
- Experience managing Unix/Linux environments.
- Extensive experience in scripting in languages such as Python/Ruby/Perl and shell scripting.
- In-depth knowledge and experience with AWS services (EC2, S3, VPC, IAM, etc.).
- Expert-level proficiency in Terraform, including writing reusable modules and leveraging best practices.
- Able to troubleshoot infrastructure issues in cloud, identify root cause and implement improvements and fixes.
- Good knowledge of microservices architecture, container technologies (Docker, Kubernetes).
- Expertise in Git and GitOps philosophy.
- Good people skills, teamwork, and collaboration with cross-functional and cross-geo teams.
- Self-motivated, willing to learn new technologies continuously.
- Maintain documentation for design, integration, testing, and deployment.
- Relevant cloud platform certifications (e.g. AWS)
- Some experience with other cloud providers like Azure and GCP.
- Experience with CI/CD pipelines and integrating infrastructure provisioning with development workflows.
- Knowledge of mobile and web applications development.
- Knowledge of big data and data analytics infrastructure in cloud.
Our offer
Cloud Infrastructure Engineer
Posted 3 days ago
Job Description
Overview
We are seeking bright and friendly individuals with excellent communication skills, willingness to learn and apply new technology and a track record for providing great customer service. You will have opportunities to work in a variety of challenging IT environments to quickly build up your IT and communication skills.
What will you be doing?
As a Cloud Infrastructure Engineer, you should be familiar with either AWS or Azure. You will carry out onsite implementation of AWS/Azure services based on the system infrastructure design, and troubleshoot and resolve infrastructure issues together with the vendor and customer.
Your responsibilities will include:
- Designing the system infrastructure architecture by using AWS/Azure native services.
- Implementing the infrastructure into cloud environments using AWS/Azure
- Tracking and monitoring the status for provisioning of each AWS/Azure
- Troubleshooting and resolving infrastructure issues during implementation and maintenance.
OK, I’m interested… is this the job for me?
We look for people who value agility, passion, and teamwork; those who can bring fresh ideas to the table and want the opportunity to learn, grow, and expand their careers. Bring your aptitude and build upon what you do best for our customers, partners, team, and you.
Other qualities you will need to be a fit for this role include:
- Mandatory 2+ years’ experience using Amazon Web Services (AWS) or Microsoft Azure and its various components.
- Experience in setting up, maintaining, and monitoring AWS services such as EC2, RDS, Lambda, API Gateway, S3, CloudWatch, ALB, etc.
- Experience in setting up, maintaining, and monitoring Azure services such as Virtual Machines, Managed Disks, Azure SQL, Load Balancer, Application Gateway, etc.
- Experience implementing and maintaining secure and robust environments.
- Strong experience with SQL Server/Oracle.
- Knowledge of network technologies including VPC, Subnet, and Security Group for AWS, and VNet, NSG, and ASG for Azure.
- Experience with Continuous Integration and Continuous Delivery solutions.
- Experience with Infrastructure as Code to provision AWS services.
- Good troubleshooting and organizational skills.
- AWS Certified Solutions Architect – Professional certification or Microsoft Certified Solutions Architect – Expert certification preferred.
- Experience working on Government on Commercial Cloud (GCC) preferred.
Check out our careers blog for content on our people, culture, and workplace!
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Any personal data you share with us during the application process will be processed strictly in compliance with applicable data protection laws and our Privacy Notice.
Cloud Infrastructure Specialist
Posted today
Job Description
We are seeking a skilled System Engineer (Cloud Platform) to collaborate with our team and contribute to the development of innovative data protection solutions.
Key Responsibilities
- Collaborate with senior engineers to ensure smooth operation and high availability of production systems.
- Maintain infrastructure, configuration, and monitoring using best practices in software development lifecycle.
- Create, extend, and improve delivery and development processes and systems.
- Participate in deployment and maintenance phases of software.
Technical Skills:
- A minimum of 2 years of experience as a System Administrator or 1 year as a DevOps/SRE engineer.
- Strong Linux administration skills.
- Proficiency in Python, Bash, or Groovy scripting.
- Experience in automation development.
- Knowledge and hands-on experience with configuration management tools (e.g., Ansible).
- Understanding of networking concepts (e.g., NGINX, DNS, DHCP, PXE, firewalls, routing, NAT).
- Familiarity with Infrastructure as Code principles.
- Experience with CI/CD pipelines.
- Knowledge of virtualization and containerization.
- Familiarity with provisioning and orchestration systems.
Soft Skills:
- Excellent communication skills for effective collaboration with teams across multiple countries.
- English proficiency at an upper-intermediate level or higher.
- Experience in multi-cloud or hybrid cloud environments.
- Certifications in Linux, DevOps, or related technologies.
Cloud Infrastructure Specialist
Posted today
Job Description
We are seeking a highly skilled Senior DevOps Engineer to enhance our technology ecosystem.
About the Role
- Minimum of 8 years of experience in DevOps, cloud infrastructure, and CI/CD pipeline management is required.
- Extensive experience with AWS services, including EC2, S3, RDS, Lambda, Airflow, and IAM is essential.
- Strong proficiency in infrastructure as code tools, such as Terraform and CloudFormation, is necessary.
- Experience with GitLab pipelines, including creating, managing, and optimizing CI/CD workflows, is desired.
- Knowledge of Prisma and its integration with various data projects is beneficial.
- Familiarity with containerization and orchestration technologies, such as Docker and Kubernetes, is required.
- Proficiency in scripting languages, such as Python, Bash, or similar, is expected.
- Strong problem-solving skills and ability to troubleshoot complex infrastructure and application issues are necessary.
- Excellent communication and collaboration skills, with a proactive approach to teamwork and project management, are essential.
To succeed in this role, you will need:
- A strong foundation in DevOps principles and practices.
- Excellent technical skills, with expertise in cloud infrastructure, CI/CD pipelines, and infrastructure as code tools.
- Strong problem-solving skills and ability to troubleshoot complex technical issues.
- Excellent communication and collaboration skills, with a proactive approach to teamwork and project management.
Cloud Infrastructure Specialist
Posted today
Job Description
About the role
We are seeking an experienced Cloud Infrastructure Specialist to design, build, and manage our AWS cloud infrastructure. In this position, you will be responsible for ensuring reliability, security, and cost optimization while integrating into on-premises, IaaS, PaaS, and SaaS platforms.
- Design and implement AWS cloud infrastructure with integration into on-premises, IaaS, PaaS, and SaaS platforms.
- Collaborate with application teams to architect, deploy, and maintain highly scalable, secure, cost-effective, and reliable solutions on AWS.
- Oversee cloud infrastructure setup, governance, and operational management.
- Explore new AWS and cloud-native technologies, integrating best practices into the infrastructure.
- Develop and maintain architecture documentation, explaining cloud design concepts and solutions to technical and non-technical teams.
- Implement and manage incident, problem, and change management processes for cloud environments.
- Maintain comprehensive documentation for cloud standards, configurations, and troubleshooting procedures.
- Perform security assessments, compliance audits, and disaster recovery planning.
- Drive infrastructure automation using Terraform, Ansible, or other tools to enhance operational efficiency.
- Optimize cloud environments to reduce costs and improve performance, collaborating with multiple business stakeholders.
- 5+ years of experience in cloud infrastructure engineering, with a strong focus on AWS.
- Expertise in AWS services (EC2, VPC, IAM, Lambda, RDS, S3, CloudFormation, CloudTrail, Config, etc.).
- Experience in cloud networking, security, IAM, CI/CD, and monitoring tools.
- Proficiency in Terraform, Ansible, or other automation tools is an advantage.
- Hands-on experience with Kubernetes (EKS / AKS) and containerized workloads is a plus.
- AWS Solutions Architect (Associate/Professional) certification required; Azure certifications are a plus.
- Strong project management and stakeholder communication skills.
- Experience in managing outsourcing partners/system integrators is an advantage.
This role offers a unique opportunity to take ownership of AWS infrastructure in a fast-paced, dynamic environment while leveraging automation and cloud best practices.
Work Environment
This is a mid-senior level position in the IT Services and IT Consulting industry.