VP of Product, Research and Training Infrastructure
CoreWeave
GPU Cloud company
Livingston, United States
Director
Product
About the role
Lead product strategy and engineering for AI research training infrastructure.
As CoreWeave continues to solidify its position as the Essential Cloud for AI, we are seeking a visionary VP of Product, Research Training Infrastructure.
This executive leader will own the product strategy and engineering execution for the services that power the most ambitious AI research labs in the world.
Key Responsibilities
- Oversee the evolution of SUNK (Slurm on Kubernetes) to provide researchers with deterministic, bare-metal performance through a cloud-native interface.
- Drive the development of next-generation orchestrators and automated training-based evaluation frameworks.
- Build the infrastructure required for sophisticated Reinforcement Learning (RL) and RLHF pipelines.
- Act as the primary technical partner for lead researchers at global AI labs, translating their requirements into actionable product roadmaps.
Requirements
- 15+ years of experience in engineering leadership, including at least 5 years managing large-scale infrastructure at a top-tier research lab or an AI-native cloud provider.
- Deep, hands-on knowledge of Slurm, Kubernetes, and the specific networking requirements (InfiniBand/RDMA) of distributed training clusters.
- Experience supporting frontier model research (pre-training and post-training) and an understanding of the pain points research scientists face.
- Track record of delivering mission-critical services on multi-thousand-GPU clusters.
Tech stack
Kubernetes, Python, Bash, Git, Linux