2 jobs in VIPKid

Senior LLM Deployment & Inference Optimization Engineer

Singapore VIPKid

Posted 10 days ago

Job Description

We are looking for an experienced Senior LLM Deployment & Inference Optimization Engineer to build and operate self-hosted inference infrastructure for LLMs, multimodal models, ASR, and TTS systems in the cloud. Your mission is to deliver a stable, low-latency, and cost-efficient inference platform that powers real-time conversations and voice interactions in AI-driven English learning classrooms. This is a senior, cross-functional engineering role focused on deploying, optimizing, and operating open-source inference engines and GPU infrastructure at scale, rather than developing inference kernels from scratch.


Responsibilities

  • Design, deploy, and operate self-hosted cloud inference services for LLMs, multimodal models, ASR, and TTS systems, building highly available and elastically scalable inference infrastructure.
  • Optimize and productionize open-source inference frameworks such as vLLM, SGLang, TensorRT-LLM, Triton, and TGI, focusing on: Throughput, Latency, Time-to-First-Token (TTFT), Continuous batching, KV cache optimization, Quantization, and Parallelization strategies
  • Achieve the optimal balance between user experience and infrastructure cost.
  • Manage and optimize GPU resources and infrastructure costs, including: Instance selection, GPU utilization improvements, Scheduling and workload co-location, Spot and reserved instance strategies, and Cost-per-inference optimization
  • Build reliability, observability, and performance management systems for inference services, including: Monitoring and alerting, Load testing, Capacity planning, Rate limiting, Graceful degradation and disaster recovery, and GPU memory management and OOM mitigation
  • Ensure high SLA performance for real-time production workloads.
  • Improve model-serving engineering capabilities, including: Multi-model routing, Load balancing, Auto-scaling, Canary deployments, and Rollback mechanisms
  • Support rapid and reliable model iteration
  • Collaborate closely with AI researchers, backend engineers, and application teams to establish an end-to-end path from model development to production deployment.


Requirements

  • Bachelor's degree or above in Computer Science or a related field.
  • 5+ years of experience in backend engineering, infrastructure engineering, MLOps, or related domains.
  • Proven production experience with self-hosted model inference systems
  • Have independently deployed, or led the deployment of, LLM, multimodal, or speech models in production environments.
  • Have been responsible for real-world reliability, scalability, and cost management, not just proof-of-concept or demo deployments.
  • Strong hands-on experience with one or more of: vLLM, SGLang, TensorRT-LLM, Triton Inference Server, and Hugging Face TGI
  • Able to understand their internals and perform advanced service optimization.
  • Deep understanding of inference optimization techniques, including: Transformer inference mechanisms, KV Cache, Continuous/Dynamic Batching, Quantization (INT8, FP8, AWQ, GPTQ, etc.), Tensor Parallelism (TP), Pipeline Parallelism (PP), and PagedAttention
  • Proven experience tuning and deploying these techniques in production.
  • Strong knowledge of cloud-native infrastructure and GPU environments: Docker, Kubernetes, AWS, GCP, Alibaba Cloud, or similar platforms
  • GPU resource scheduling and utilization optimization
  • Infrastructure cost optimization
  • Solid systems engineering and reliability background: Distributed systems, High-concurrency services, High-availability architectures, Monitoring and observability, Load testing, Capacity planning, and Production troubleshooting
  • Strong data-driven mindset toward SLA and infrastructure efficiency.

Preferred Qualifications

  • Experience optimizing real-time or streaming inference systems, including streaming generation and low-TTFT workloads.
  • Experience deploying and accelerating: ASR systems, TTS systems, Speech models, Multimodal models
  • Experience building or operating: Large-scale GPU clusters, Inference scheduling platforms, Model serving platforms
  • Familiarity with: CUDA programming, GPU kernel optimization, and Model compilation technologies such as TensorRT, TVM, and torch.compile
  • Understanding of model fine-tuning, distillation, and compression techniques, with awareness of the interplay between training and inference.
  • Demonstrated success in: Significantly reducing LLM inference costs, and Building inference infrastructure from 0 to 1



Global Human Resources Director

Singapore VIPKid

Posted 19 days ago

Job Description

About the Role

We are seeking a strategic and globally minded Global HR Director to join our leadership team and partner closely with the Founder & CEO in shaping the company’s next phase of growth. This role will lead the end-to-end people strategy across China and international markets, supporting our transformation into an AI-native digital education company and strengthening our global operating model. The ideal candidate combines strong business acumen with deep HR expertise, and is capable of building high-performing, scalable and globally aligned organizations.




Key Responsibilities

1. Global Organization Design & Strategic Enablement

  • Translate the company’s global business and AI strategy into scalable organizational structures and talent strategies
  • Lead organization design and transformation across the Singapore global headquarters, China operations and overseas markets
  • Optimize cross-functional and cross-regional collaboration mechanisms to support rapid execution of strategic initiatives and innovation programs


2. Global Talent Acquisition & Leadership Pipeline

  • Build and continuously enhance a globally competitive talent acquisition strategy and operating model
  • Lead executive and critical talent hiring across leadership, international operations and technical functions, ensuring timely and high-quality hiring outcomes
  • Establish global talent review and succession planning frameworks, maintaining strong visibility over key leadership and successor pipelines across business functions


3. Culture Transformation & Organizational Effectiveness

  • Shape and reinforce a high-performance, professional and values-driven culture across global teams
  • Foster an environment centered on ownership, transparency, execution excellence and data-informed decision making
  • Build effective cross-cultural collaboration mechanisms to strengthen trust, reduce organizational friction and improve overall team cohesion


4. Workforce Planning, Rewards & Performance Management

  • Lead global workforce planning and people cost management, developing people analytics and productivity frameworks to improve organizational efficiency and ROI
  • Design and optimize compensation, rewards and long-term incentive structures aligned with business growth and talent priorities
  • Drive standardized and business-linked performance management systems across domestic and international teams, ensuring performance accountability and organizational alignment


5. Global HR Leadership & Compliance Governance

  • Lead and develop HR teams across China, Singapore and international markets, elevating HRBP capability and delivery standards
  • Ensure full alignment with labor laws, employment regulations and workforce compliance requirements across key operating regions
  • Establish governance and risk-control mechanisms to safeguard organizational integrity and business continuity



Qualifications

Education & Professional Foundation

  • Bachelor’s degree or above required; MBA or equivalent business education preferred
  • Strong foundation in Organization Development (OD), performance management, rewards and global employment practices

Experience

  • 10+ years of progressive HR leadership experience within leading internet, education technology or global organizations
  • Proven experience managing multinational teams and HR operations across China and international markets, including Singapore and Southeast Asia
  • Experience leading large-scale organizational transformation, international expansion or HQ setup initiatives is highly preferred
  • Strong track record in workforce budgeting, people cost optimization and enterprise-level performance management design and execution

Leadership Competencies

  • CEO mindset & business acumen – Think beyond traditional HR operations and approach people strategy from a business and long-term value creation perspective
  • Strong principles & execution capability – Able to navigate organizational complexity with sound judgment, professionalism and decisiveness
  • High resilience & adaptability – Thrives in fast-paced, high-growth and cross-time-zone environments with strong ownership and emotional steadiness
  • Global leadership capability – Demonstrates cultural intelligence and the ability to lead diverse teams across markets and business contexts



Why Join Us

  • Partner directly with the Founder & CEO on organization strategy and global growth
  • Play a critical leadership role in building an AI-native, globally scaled education company
  • Shape the future of people, culture and organizational excellence across international markets