The University of Vermont
UVM is especially interested in candidates who can contribute to the diversity and excellence of the institution. Applicants are encouraged to include in their cover letter information about how they will further this goal.
Provide expert support for the Vermont Advanced Computing Center’s (VACC) cluster services, which include three high performance computing (HPC) research clusters (among them multi-thousand-core “big compute” resources and GPU-focused resources), large-scale filesystems with over 1 petabyte of IBM Spectrum Scale (GPFS) storage, and advanced networking systems.
In collaboration with others in the Systems Architecture & Administration department, oversee the hardware and software components of the VACC. Provide senior-level technical expertise to ensure the smooth, reliable, and performant operation of the VACC’s large-scale computing, storage, and networking systems. Ensure the security of the primary research computing cluster on campus. Build, configure, and run the VACC’s computing services, in collaboration with other system administrators and facilitators, to meet UVM’s commitment to investments in Science, Technology, Engineering, and Mathematics (STEM) and to support faculty needs for state-of-the-art computing resources. Help researchers get started on the cluster by explaining basic usage and improving system documentation. Provide troubleshooting and performance debugging. Support research software that is new, unfamiliar, or still under development. Assess faculty needs and help adjust VACC compute resources to better meet them. Develop software improvements that make the cluster easier for new researchers to access and that facilitate academic use of the cluster. Thoroughly document procedures and details of the systems built and maintained. Scripting, both for cluster administration and for helping users take advantage of the VACC cluster, is critical to protecting services for hundreds of users and multi-million-dollar research commitments.
Minimum Qualifications (or equivalent combination of education and experience)
Bachelor’s degree in Computer Science or a technology-related field and four to five years of systems administration experience in a large-scale, complex server environment required. Thorough knowledge of Linux operating systems, network architecture, and Linux shell scripting required. Effective troubleshooting skills required. Strong customer service ethic, effective communication skills, and collaborative teamwork required. Demonstrated experience deploying effective research computing systems required. Experience with system and network debugging required. Configuration management experience required.
Experience with HPC technologies such as Slurm, parallel computing, MPI programming, and CUDA desirable. Low-latency networking experience (e.g., InfiniBand) desirable. Experience with Open OnDemand, Jupyter Notebook, and supporting technologies desirable. Clustered filesystem expertise (e.g., GPFS) desirable. Expertise with at least one programming language used for systems work (C, Python, Perl, Rust) desirable. Experience compiling, installing, and running open-source software desirable.