Software Engineer, ML Systems & Training Architecture
OpenAI
Generative AI company
San Francisco, United States
$295K - $380K USD
Senior
Software Engineering
About the role
Maintain and improve ML training infrastructure and frameworks to support robotics research.
Hands-on senior engineer improving and maintaining the robotics team’s ML training framework and infrastructure, debugging training jobs and unblocking researchers.
Key Responsibilities
- Review and improve code across training frameworks and adjacent infrastructure.
- Identify and prevent risky or low-quality changes.
- Debug ML training systems, GPUs, clusters, networking, and infrastructure.
- Unblock researchers from broken training jobs and flaky tooling.
- Improve reliability and usability of the training framework.
Requirements
- Strong software engineering fundamentals and code review judgment.
- Experience with ML systems, training frameworks, GPUs, and distributed systems.
- Ability to read and debug unfamiliar codebases quickly.
- Produces high-quality code with pragmatic judgment.
Tech stack
PyTorch, Linux, Git