
Software Engineer, ML Systems & Training Architecture

OpenAI · Generative AI company
San Francisco, United States · $295K - $380K USD · Senior
Software Engineering

About the role

Maintain and improve ML training infrastructure and frameworks to support robotics research.

  • Hands-on senior engineering role: improve and maintain the robotics team’s ML training framework and infrastructure, debug training jobs, and unblock researchers.

Key responsibilities

  • Review and improve code across training frameworks and adjacent infrastructure.
  • Identify and prevent risky or low-quality changes.
  • Debug ML training systems, GPUs, clusters, networking, and infrastructure.
  • Unblock researchers from broken training jobs and flaky tooling.
  • Improve reliability and usability of the training framework.

Requirements

  • Strong software engineering fundamentals and code review judgment.
  • Experience with ML systems, training frameworks, GPUs, and distributed systems.
  • Ability to read and debug unfamiliar codebases quickly.
  • Ability to produce high-quality code with pragmatic judgment.

Tech stack

PyTorch, Linux, Git

Match insights

Tech: PyTorch, Linux, Git
Level: Senior
Location: San Francisco, United States