Sr. Machine Learning Infrastructure Engineer, Optimus

  • Tesla
  • Palo Alto, California
  • Full Time

As a Software Engineer on the Optimus team, you will build the tools and infrastructure to make and measure improvements to neural network architecture, visualize data, assist with exporting and deploying neural networks to the bot, and evaluate experimental results. You will help us automate the entire workflow of training, validation, and production for Optimus. Most importantly, you will see your work repeatedly shipped to and used by thousands of humanoid robots in real-world applications.

  • Build and improve our Python training infrastructure for more stable and faster training

  • Build the tooling and infrastructure for reporting and visualizing model metrics and performance

  • Build the pipelines to run and validate our PyTorch models

  • Manage, analyze, and visualize our training and test datasets

  • Coordinate with the team managing the hardware cluster to maintain high availability and job throughput for machine learning workloads

  • Build and improve tooling to deploy trained neural nets to Tesla hardware

  • Practical experience programming in Python and/or C++

  • Proficiency in system-level software, particularly hardware-software interactions and resource utilization

  • Understanding of modern machine learning concepts and state-of-the-art deep learning

  • Experience working with training frameworks, ideally PyTorch

  • Demonstrated experience scaling neural network training jobs across clusters of GPUs

  • Optional: Previous experience in deep learning deployment

  • Optional: Experience profiling and optimizing CPU-GPU interactions (pipelining compute and transfers, etc.)

Job ID: 486179905
Originally Posted on: 7/21/2025
