2 jobs in VIPKid
Senior LLM Deployment & Inference Optimization Engineer
Posted 10 days ago
Job Description
We are looking for an experienced Senior LLM Deployment & Inference Optimization Engineer to build and operate self-hosted inference infrastructure for LLMs, multimodal models, ASR, and TTS systems in the cloud. Your mission is to deliver a stable, low-latency, and cost-efficient inference platform that powers real-time conversations and voice interactions in AI-driven English learning classrooms. This is a senior, cross-functional engineering role focused on deploying, optimizing, and operating open-source inference engines and GPU infrastructure at scale, rather than developing inference kernels from scratch.
Responsibilities
- Design, deploy, and operate self-hosted cloud inference services for LLMs, multimodal models, ASR, and TTS systems, building highly available and elastically scalable inference infrastructure.
- Optimize and productionize open-source inference frameworks such as vLLM, SGLang, TensorRT-LLM, Triton, and TGI, focusing on throughput, latency, time-to-first-token (TTFT), continuous batching, KV cache optimization, quantization, and parallelization strategies.
- Achieve the optimal balance between user experience and infrastructure cost.
- Manage and optimize GPU resources and infrastructure costs, including instance selection, GPU utilization improvements, scheduling and workload co-location, spot and reserved instance strategies, and cost-per-inference optimization.
- Build reliability, observability, and performance management systems for inference services, including monitoring and alerting, load testing, capacity planning, rate limiting, graceful degradation and disaster recovery, and GPU memory management and OOM mitigation.
- Ensure high SLA performance for real-time production workloads.
- Improve model-serving engineering capabilities, including multi-model routing, load balancing, auto-scaling, canary deployments, and rollback mechanisms.
- Support rapid and reliable model iteration
- Collaborate closely with AI researchers, backend engineers, and application teams to establish an end-to-end path from model development to production deployment.
Requirements
- Bachelor's degree or above in Computer Science or a related field.
- 5+ years of experience in backend engineering, infrastructure engineering, MLOps, or related domains.
- Proven production experience with self-hosted model inference systems.
- Independently deployed or led deployment of LLM, multimodal, or speech models in production environments.
- Responsible for real-world reliability, scalability, and cost management, not just proof-of-concept or demo deployments.
- Strong hands-on experience with one or more of vLLM, SGLang, TensorRT-LLM, Triton Inference Server, and Hugging Face TGI, including the ability to understand their internals and perform advanced service optimization.
- Deep understanding of inference optimization techniques, including Transformer inference mechanisms, KV cache, continuous/dynamic batching, quantization (INT8, FP8, AWQ, GPTQ, etc.), tensor parallelism (TP), pipeline parallelism (PP), and PagedAttention, with proven experience tuning and deploying these techniques in production.
- Strong knowledge of cloud-native infrastructure and GPU environments: Docker, Kubernetes, AWS, GCP, Alibaba Cloud, or similar platforms
- GPU resource scheduling and utilization optimization
- Infrastructure cost optimization
- Solid systems engineering and reliability background: distributed systems, high-concurrency services, high-availability architectures, monitoring and observability, load testing, capacity planning, and production troubleshooting.
- Strong data-driven mindset toward SLA and infrastructure efficiency.
Preferred Qualifications
- Experience optimizing real-time or streaming inference systems, including streaming generation and low-TTFT workloads.
- Experience deploying and accelerating ASR systems, TTS systems, speech models, and multimodal models.
- Experience building or operating large-scale GPU clusters, inference scheduling platforms, and model serving platforms.
- Familiarity with CUDA programming, GPU kernel optimization, and model compilation technologies such as TensorRT, TVM, and torch.compile.
- Understanding of model fine-tuning, distillation, and compression techniques, with awareness of the interplay between training and inference.
- Demonstrated success in significantly reducing LLM inference costs and building inference infrastructure from 0 to 1.
Global Human Resources Director
Posted 19 days ago
Job Description
About the Role
We are seeking a strategic and globally minded Global HR Director to join our leadership team and partner closely with the Founder & CEO in shaping the company’s next phase of growth. This role will lead the end-to-end people strategy across China and international markets, supporting our transformation into an AI-native digital education company and strengthening our global operating model. The ideal candidate combines strong business acumen with deep HR expertise, and is capable of building high-performing, scalable and globally aligned organizations.
Key Responsibilities
1. Global Organization Design & Strategic Enablement
- Translate the company’s global business and AI strategy into scalable organizational structures and talent strategies
- Lead organization design and transformation across the Singapore global headquarters, China operations and overseas markets
- Optimize cross-functional and cross-regional collaboration mechanisms to support rapid execution of strategic initiatives and innovation programs
2. Global Talent Acquisition & Leadership Pipeline
- Build and continuously enhance a globally competitive talent acquisition strategy and operating model
- Lead executive and critical talent hiring across leadership, international operations and technical functions, ensuring timely and high-quality hiring outcomes
- Establish global talent review and succession planning frameworks, maintaining strong visibility over key leadership and successor pipelines across business functions
3. Culture Transformation & Organizational Effectiveness
- Shape and reinforce a high-performance, professional and values-driven culture across global teams
- Foster an environment centered on ownership, transparency, execution excellence and data-informed decision making
- Build effective cross-cultural collaboration mechanisms to strengthen trust, reduce organizational friction and improve overall team cohesion
4. Workforce Planning, Rewards & Performance Management
- Lead global workforce planning and people cost management, developing people analytics and productivity frameworks to improve organizational efficiency and ROI
- Design and optimize compensation, rewards and long-term incentive structures aligned with business growth and talent priorities
- Drive standardized and business-linked performance management systems across domestic and international teams, ensuring performance accountability and organizational alignment
5. Global HR Leadership & Compliance Governance
- Lead and develop HR teams across China, Singapore and international markets, elevating HRBP capability and delivery standards
- Ensure full alignment with labor laws, employment regulations and workforce compliance requirements across key operating regions
- Establish governance and risk-control mechanisms to safeguard organizational integrity and business continuity
Qualifications
Education & Professional Foundation
- Bachelor’s degree or above required; MBA or equivalent business education preferred
- Strong foundation in Organization Development (OD), performance management, rewards and global employment practices
Experience
- 10+ years of progressive HR leadership experience within a leading internet, education technology or global organization
- Proven experience managing multinational teams and HR operations across China and international markets, including Singapore and Southeast Asia
- Experience leading large-scale organizational transformation, international expansion or HQ setup initiatives is highly preferred
- Strong track record in workforce budgeting, people cost optimization and enterprise-level performance management design and execution
Leadership Competencies
- CEO mindset & business acumen – Think beyond traditional HR operations and approach people strategy from a business and long-term value creation perspective
- Strong principles & execution capability – Able to navigate organizational complexity with sound judgment, professionalism and decisiveness
- High resilience & adaptability – Thrives in fast-paced, high-growth and cross-time-zone environments with strong ownership and emotional steadiness
- Global leadership capability – Demonstrates cultural intelligence and the ability to lead diverse teams across markets and business contexts
Why Join Us
- Partner directly with the Founder & CEO on organization strategy and global growth
- Play a critical leadership role in building an AI-native, globally scaled education company
- Shape the future of people, culture and organizational excellence across international markets