The University of Vermont
The Research Computing Systems Engineer position at the University of Vermont provides expert support for the Vermont Advanced Computing Center’s (VACC) research computing systems and helps build and maintain state-of-the-art high performance computing (HPC) solutions for our researchers. The position joins a team that supports the HPC hardware, large-scale storage systems, cluster software, and researcher software in the VACC.
The VACC has four research clusters, including multi-thousand-core “big compute” resources, GPU-focused resources, and a massive in-memory database cluster. The role supports the VACC computing and storage hardware infrastructure, as well as research computing services such as Slurm, Open OnDemand, and related technologies. The position directly supports UVM’s goal of enhancing the IT resources and infrastructure available to the UVM research community.
The position works with other VACC IT staff to install, maintain, and support research software and tools used by the UVM research community. In collaboration with Research Computing Facilitators, this position will respond to a wide range of requests submitted by researchers, providing guidance and expertise in software installation and troubleshooting. The position also assesses the needs of faculty and helps adjust the VACC compute resources to better meet those needs for research and academic purposes. The position actively supports academic use of VACC resources, working with faculty to expand HPC into the classroom.
Scripting is an important skill for this position, both for cluster administration and for helping users take advantage of the VACC clusters. A change to any part of the VACC resources is likely to affect hundreds of users working on millions of dollars of research, so systematic attention to detail, careful planning, and sound judgement are critical.
Minimum Qualifications (or equivalent combination of education and experience)
Bachelor’s degree in a STEM-related field and five years as a Linux Systems Administrator/Engineer, or an equivalent combination of education and experience, required. A thorough knowledge of Linux operating systems, network architecture, and Linux shell scripting required. Expertise with at least one systems programming language (C, C++, Python, Perl, Rust, etc.) required. Experience with system and network debugging required. Advanced troubleshooting skills required. Configuration management experience required. Excellent customer service ethic, effective communication skills, and a collaborative approach to teamwork required. Demonstrated success at learning and evaluating new technologies and at determining whether they are appropriate for adoption. Demonstrated appreciation for infrastructure as code. Able to work effectively on both team and independent projects, with the ability to self-direct and adjust to shifts in priorities.
– Significant prior experience with high performance computing systems and scientific software.
– Expertise with HPC technologies, such as job schedulers (e.g., Slurm), package managers (e.g., Spack), and CUDA.
– Experience with virtualization and/or container technologies (VMware, Proxmox, KVM, Podman, Singularity).
– Experience with MongoDB in a sharded architecture.
– Low-latency networking experience (e.g., InfiniBand).
– Experience compiling, installing, and running open-source software.
– Experience debugging and tuning software applications on HPC clusters, and with common scientific libraries and applications.
– Experience deploying Open OnDemand, Jupyter Notebook and supporting technologies.
– Clustered filesystem expertise (e.g., GPFS).
Position is eligible for telework.
Special Conditions: Bargaining unit position. A probationary period may be required, including for current UVM employees. Background check required for this position.