As an HPC Infrastructure Engineer, you will play a key role in architecting and operating our on-prem and hybrid compute clusters. You'll collaborate with research teams and application owners to translate requirements into automated, repeatable deployments. Your work ensures that thousands of cores, hundreds of GPUs, and petabytes of storage deliver reliable, high-throughput service for CAE simulations and AI/ML training. This position is open to internal candidates and reports to the HPC Infrastructure Manager.
+ Design, develop, and maintain Ansible playbooks, roles, and collections for provisioning and configuring compute nodes, storage systems, and network services.
+ Automate OS and middleware upgrades, security patching, and routine maintenance tasks across HPC clusters.
+ Leverage Ansible Automation Platform (AAP) to orchestrate workflows, manage inventories, and streamline job templates.
+ Collaborate with SRE and DevOps teams to integrate HPC infrastructure monitoring (e.g., Prometheus, Grafana) and alerting workflows.
+ Work closely with application owners to optimize cluster performance for CAE, data analytics, and AI/ML workloads.
+ Troubleshoot complex hardware and software issues, ranging from network fabric anomalies to scheduler misconfigurations, and drive root-cause analysis.
+ Participate in capacity planning, performance benchmarking, and scalability studies to guide infrastructure investments.
+ Document system designs, runbooks, and standard operating procedures; mentor junior engineers on automation best practices.
+ Support research and engineering teams during peak project phases, including on-call rotations for after-hours issue resolution.
+ Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
+ 3+ years of hands-on experience designing and managing HPC or large-scale Linux infrastructures.
+ Expert-level proficiency with Ansible: playbook authoring, role development, Galaxy publishing, and debugging.
+ Familiarity with Red Hat Ansible Automation Platform (AAP) for enterprise automation and orchestration.
+ Strong Linux administration skills (RHEL/CentOS preferred), including kernel tuning, storage configuration, and network management.
+ Familiarity with HPC workload schedulers such as Slurm or PBS.
+ Experience with performance monitoring tools (Prometheus, Nagios) and metrics visualization (Grafana).
+ Solid scripting ability in Python, Bash, or similar languages to extend automation and integrate APIs.
+ Demonstrated troubleshooting skills across hardware (InfiniBand, Ethernet) and software stacks.
+ Excellent communication and collaboration skills; ability to distill technical concepts for cross-functional stakeholders.
+ Ability to work independently, prioritize tasks in a fast-paced environment, and mentor less experienced team members.
You may not check every box, or your experience may look a little different from what we've outlined, but if you think you can bring value to Ford Motor Company, we encourage you to apply!
**As an established global company, we offer the benefit of choice. You can choose what your Ford future will look like: will your story span the globe, or keep you close to home? Will your career be a deep dive into what you love, or a series of new teams and new skills? Will you be a leader, a changemaker, a technical expert, a culture builder, or all of the above? No matter what you choose, we offer a work life that works for you, including:**
Immediate medical, dental, and prescription drug coverage
Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
Vehicle discount program for employees and family members, and management leases
Tuition assistance
Established and active employee resource groups
Paid time off for individual and team community service
A generous schedule of paid holidays, including the week between Christmas and New Year's Day
Paid time off and the option to purchase additional vacation time.
**For a detailed look at our benefits, click here:** Benefit Summary (
This position is a salary grade **7**.
**_Visa sponsorship is not provided for this role._**
Candidates for positions with Ford Motor Company must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire.
We are an Equal Opportunity Employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, color, age, sex, national origin, sexual orientation, gender identity, disability status or protected veteran status. In the United States, if you need a reasonable accommodation for the online application process due to a disability, please call ....
\#LI-Remote
\#LI-GH2
**Requisition ID** : 48969
Job ID: 488239679
Originally Posted on: 8/6/2025