2 jobs in VIPKid

Senior LLM Deployment & Inference Optimization Engineer

Singapore VIPKid

Posted 10 days ago

Job Description

We are looking for an experienced Senior LLM Deployment & Inference Optimization Engineer to build and operate self-hosted inference infrastructure for LLMs, multimodal models, ASR, and TTS systems in the cloud. Your mission is to deliver a stable, low-latency, and cost-efficient inference platform that powers real-time conversations and voice interactions in AI-driven English learning classrooms. This is a senior, cross-functional engineering role focused on deploying, optimizing, and operating open-source inference engines and GPU infrastructure at scale, rather than developing inference kernels from scratch.

Responsibilities

Design, deploy, and operate self-hosted cloud inference services for LLMs, multimodal models, ASR, and TTS systems, building highly available and elastically scalable inference infrastructure.
Optimize and productionize open-source inference frameworks such as vLLM, SGLang, TensorRT-LLM, Triton, and TGI, focusing on throughput, latency, time to first token (TTFT), continuous batching, KV cache optimization, quantization, and parallelization strategies, to achieve the optimal balance between user experience and infrastructure cost.
Manage and optimize GPU resources and infrastructure costs, including instance selection, GPU utilization improvements, scheduling and workload co-location, spot and reserved instance strategies, and cost-per-inference optimization.
Build reliability, observability, and performance management systems for inference services, including monitoring and alerting, load testing, capacity planning, rate limiting, graceful degradation and disaster recovery, and GPU memory management and OOM mitigation, to ensure high SLA performance for real-time production workloads.
Improve model-serving engineering capabilities, including multi-model routing, load balancing, auto-scaling, canary deployments, and rollback mechanisms, to support rapid and reliable model iteration.
Collaborate closely with AI researchers, backend engineers, and application teams to establish an end-to-end path from model development to production deployment.

Requirements

Bachelor's degree or above in Computer Science or a related field.
5+ years of experience in backend engineering, infrastructure engineering, MLOps, or related domains.
Proven production experience with self-hosted model inference systems: independently deployed or led deployment of LLM, multimodal, or speech models in production environments, with responsibility for real-world reliability, scalability, and cost management, not just proof-of-concept or demo deployments.
Strong hands-on experience with one or more of vLLM, SGLang, TensorRT-LLM, Triton Inference Server, and Hugging Face TGI, including the ability to understand their internals and perform advanced service optimization.
Deep understanding of inference optimization techniques, including Transformer inference mechanisms, KV cache, continuous/dynamic batching, quantization (INT8, FP8, AWQ, GPTQ, etc.), tensor parallelism (TP), pipeline parallelism (PP), and PagedAttention, with proven experience tuning and deploying these techniques in production.
Strong knowledge of cloud-native infrastructure and GPU environments: Docker, Kubernetes, and AWS, GCP, Alibaba Cloud, or similar platforms; GPU resource scheduling and utilization optimization; and infrastructure cost optimization.
Solid systems engineering and reliability background: distributed systems, high-concurrency services, high-availability architectures, monitoring and observability, load testing, capacity planning, and production troubleshooting.
Strong data-driven mindset toward SLA and infrastructure efficiency.

Preferred Qualifications

Experience optimizing real-time or streaming inference systems, including streaming generation and low-TTFT workloads.
Experience deploying and accelerating ASR systems, TTS systems, speech models, and multimodal models.
Experience building or operating large-scale GPU clusters, inference scheduling platforms, and model serving platforms.
Familiarity with CUDA programming, GPU kernel optimization, and model compilation technologies such as TensorRT, TVM, and torch.compile.
Understanding of model fine-tuning, distillation, and compression techniques, with awareness of the interplay between training and inference.
Demonstrated success in significantly reducing LLM inference costs and building inference infrastructure from 0 to 1.

Global Human Resources Director

Singapore VIPKid

Posted 19 days ago

Job Description

About the Role

We are seeking a strategic and globally minded Global HR Director to join our leadership team and partner closely with the Founder & CEO in shaping the company’s next phase of growth. This role will lead the end-to-end people strategy across China and international markets, supporting our transformation into an AI-native digital education company and strengthening our global operating model. The ideal candidate combines strong business acumen with deep HR expertise, and is capable of building high-performing, scalable and globally aligned organizations.

Key Responsibilities

1. Global Organization Design & Strategic Enablement

Translate the company’s global business and AI strategy into scalable organizational structures and talent strategies
Lead organization design and transformation across the Singapore global headquarters, China operations and overseas markets
Optimize cross-functional and cross-regional collaboration mechanisms to support rapid execution of strategic initiatives and innovation programs

2. Global Talent Acquisition & Leadership Pipeline

Build and continuously enhance a globally competitive talent acquisition strategy and operating model
Lead executive and critical talent hiring across leadership, international operations and technical functions, ensuring timely and high-quality hiring outcomes
Establish global talent review and succession planning frameworks, maintaining strong visibility over key leadership and successor pipelines across business functions

3. Culture Transformation & Organizational Effectiveness

Shape and reinforce a high-performance, professional and values-driven culture across global teams
Foster an environment centered on ownership, transparency, execution excellence and data-informed decision making
Build effective cross-cultural collaboration mechanisms to strengthen trust, reduce organizational friction and improve overall team cohesion

4. Workforce Planning, Rewards & Performance Management

Lead global workforce planning and people cost management, developing people analytics and productivity frameworks to improve organizational efficiency and ROI
Design and optimize compensation, rewards and long-term incentive structures aligned with business growth and talent priorities
Drive standardized and business-linked performance management systems across domestic and international teams, ensuring performance accountability and organizational alignment

5. Global HR Leadership & Compliance Governance

Lead and develop HR teams across China, Singapore and international markets, elevating HRBP capability and delivery standards
Ensure full alignment with labor laws, employment regulations and workforce compliance requirements across key operating regions
Establish governance and risk-control mechanisms to safeguard organizational integrity and business continuity

Qualifications

Education & Professional Foundation

Bachelor’s degree or above required; MBA or equivalent business education preferred
Strong foundation in Organization Development (OD), performance management, rewards and global employment practices

Experience

10+ years of progressive HR leadership experience within leading internet, education technology or global organizations
Proven experience managing multinational teams and HR operations across China and international markets, including Singapore and Southeast Asia
Experience leading large-scale organizational transformation, international expansion or HQ setup initiatives is highly preferred
Strong track record in workforce budgeting, people cost optimization and enterprise-level performance management design and execution

Leadership Competencies

CEO mindset & business acumen – Thinks beyond traditional HR operations and approaches people strategy from a business and long-term value creation perspective
Strong principles & execution capability – Able to navigate organizational complexity with sound judgment, professionalism and decisiveness
High resilience & adaptability – Thrives in fast-paced, high-growth and cross-time-zone environments with strong ownership and emotional steadiness
Global leadership capability – Demonstrates cultural intelligence and the ability to lead diverse teams across markets and business contexts

Why Join Us

Partner directly with the Founder & CEO on organization strategy and global growth
Play a critical leadership role in building an AI-native, globally scaled education company
Shape the future of people, culture and organizational excellence across international markets
