The University of Vermont
The University of Vermont is especially interested in candidates who can contribute to the diversity and excellence of the institution. Applicants are encouraged to include in their cover letter information about how they will further this goal.
This senior-level IT position provides expert support for the Vermont Advanced Computing Center’s (VACC) cluster services. The VACC has three high-performance computing (HPC) research clusters, including multi-thousand-core “big compute” resources and GPU-focused resources. The VACC also has large-scale filesystems, with over 1 petabyte of IBM Spectrum Scale (GPFS) storage, and advanced networking systems, all of which this position is responsible for.
In collaboration with other sysadmins and facilitators, this position is responsible for all hardware and software components of the VACC. This position provides senior level technical expertise to ensure the smooth, reliable, and performant operation of the VACC’s large-scale computing, storage, and networking systems. This position is also responsible for ensuring the security of our primary research computing cluster on campus.
UVM’s commitment to investments in Science, Technology, Engineering, and Mathematics (STEM) means faculty will have increased needs for state-of-the-art computing resources. This position will build, configure, and run the VACC’s computing services, in collaboration with others in the Systems Architecture & Administration department.
This position will help researchers start using the cluster by explaining basic usage and improving system documentation. Troubleshooting and performance debugging are important tasks for this position. The person in this position is likely to be called upon to support research software that is new, unfamiliar, and/or still under development. Experience compiling, installing, and running open-source software is desirable.
Additionally, this position will assess the needs of our faculty and help adjust the VACC compute resources to better meet those needs. The position will work on software improvements to make the cluster easier to access for new researchers and will also facilitate academic use of the cluster. The position needs to thoroughly document procedures and details about the systems it builds and maintains.
Scripting is an important skill for this position, both for cluster administration and for helping users take advantage of the VACC cluster. A change to any part of the VACC resources is likely to affect hundreds of users working on millions of dollars of research, so systematic attention to detail, careful planning, and judgement are critical.
Minimum Qualifications (or equivalent combination of education and experience)
Bachelor’s degree in Computer Science or a technology-related field and five years of systems administration experience in a large-scale, complex server environment required. A thorough knowledge of Linux operating systems, network architecture, and Linux shell scripting is required. Excellent troubleshooting skills. Excellent customer service ethic, effective communication skills, and collaborative teamwork. Proven track record of deploying effective research computing systems. Experience with system and network debugging. Configuration management experience.
Experience with HPC technologies such as Slurm, parallel computing, MPI programming, and CUDA. Low-latency networking experience (e.g., InfiniBand). Experience with Open OnDemand, Jupyter Notebook, and supporting technologies. Clustered filesystem expertise (e.g., GPFS). Expertise with at least one systems programming language (C, Python, Perl, Rust, etc.).
Occasional evening/weekend hours required