About This Job
Join a world-class team of builders combining deep hardware and software expertise to invent a revolutionary consumer product! We're looking for engineers to help turbocharge our serving layer: optimizing the performance of cutting-edge LLM, speech, and vision models; experimenting with new compilers to run models across a variety of hardware compute platforms; and writing custom kernels. If you're excited about shaping the future of real-time AI systems, we'd love to talk.
Note: This role requires that candidates already reside within the mainland United States and have at least 4 years of professional industry experience as an ML Performance or Infrastructure Engineer.
Locations: Bellevue, WA; San Francisco, CA; New York City, NY.
Required Qualifications:
• Expert in a differentiable array-computing framework, preferably PyTorch.
• Expert in optimizing machine learning models for reliable serving at high throughput and low latency.
• Significant systems programming experience; e.g., experience working on high-performance server systems — you'd be just as comfortable with the internals of vLLM as with a complex PyTorch codebase.
• Significant performance engineering experience; e.g., bottleneck analysis in high-scale server systems or profiling low-level systems code.
• Familiarity with high-performance LLM serving; e.g., experience with vLLM or SGLang deployment and internals.
• Experience deploying and scaling inference workloads in the cloud using Kubernetes, Ray, etc.
• You like to ship and have a track record of independently leading complex multi-month projects.