Lead, Systems Administrator

Power the Future of Discovery with KAUST Supercomputing Core Labs!
Are you ready to take the lead in high-performance computing? KAUST Core Labs is looking for a Lead Systems Administrator to manage and optimize a powerful Linux cluster of 300+ GPU/CPU nodes, high-speed networks, and parallel filesystems. In this dynamic role, you'll combine technical expertise with leadership to ensure world-class performance and reliability.
Position Summary:
Serve as the Lead for the team ensuring smooth operation of the Linux cluster, which consists of 300+ GPU/CPU compute nodes, parallel filesystems and a high-performance network. This is a partly technical, partly people-leadership role that involves supervising 3-4 experienced HPC system administrators. The role also involves developing, implementing and supervising standard operating procedures for the system and the team.
Major Responsibilities:
System operation and upgrade planning to meet laboratory and customer requirements
Workload scheduler policy development and implementation
Support of high-performance filesystems
Network infrastructure management including TCP/IP and HPC networks
Use of scripting languages for node automation and configuration management
Handling of hardware failures and spare-part management
Builds effective relationships with staff, faculty and students throughout the Core Labs.
Manages multiple or significant projects which may require the use of sophisticated project planning techniques.
Plans, schedules, conducts, or coordinates detailed phases of the work of a major project or of a total project of moderate scope.
Identifies technical training needs for staff attached to the area.
Serves as a resource and team member in responding to security and safety incidents.
Creates opportunities to enhance technical methodology or content by expanding existing efforts or developing new ones; may extend technology into new application areas; contributes to or leads major intellectual development activities.
Provides innovative problem-solving approaches to enhance organizational capabilities; uses peer network to expand technical capabilities and identify new research opportunities.
Understands broad strategic objectives and contributes to them; nurtures and maintains relationships with major customers.
May initiate new project concepts; develops technical proposals and makes presentations to potential customers.
Will supervise several scientists, engineers or technicians on assigned work; provides major input to staffing of overall project teams; builds teams and staff to optimize efficiency and cost effectiveness.
Identifies and evaluates candidates for open positions; mentors/trains staff in development of technical, project and business development skills.
Job Requirements:
SLURM workload manager including GPU scheduling
Parallel filesystems (Weka IO, Lustre)
TCP/IP and high-performance networks (InfiniBand)
Proficient in scripting languages (e.g., Bash, Python, Ruby)
Familiar with configuration management tools (Puppet)
Strong documentation skills.
Will have working-level contact with users and suppliers
Demonstrates an analytical and systematic approach to problem solving.
Takes the initiative in identifying and negotiating appropriate development opportunities.
Demonstrates effective communication skills in written and oral English.
Works effectively with other teams in the Supercomputing Laboratory
Plans, schedules and monitors own work (and that of others) competently within limited deadlines and according to relevant legislation and procedures.
Ability to work successfully in a highly collaborative research environment.
Uses discretion in identifying and resolving complex problems and assignments.
Performs a broad range of work, sometimes complex and non-routine, in a variety of environments.
Maintains expert-level knowledge of most of the laboratory systems, including high-performance computing systems administration, high-performance storage administration, or high-performance network administration.
Qualifications and experience:
Bachelor of Science (or equivalent) in a relevant discipline plus 10 years' experience, OR Master of Science (or equivalent) in a relevant discipline plus 7 years' experience, OR Doctor of Philosophy (or equivalent) in a relevant discipline plus 5 years' experience.
About KAUST:
King Abdullah University of Science and Technology (KAUST) is a world-class research and graduate-level university located on the shores of the Red Sea, 80 km north of Jeddah in Saudi Arabia. As a hub for innovation and scientific discovery, KAUST brings together a diverse, international community of researchers, scholars, and students from over 100 countries. KAUST offers access to one of the fastest supercomputers in the world and an attractive compensation package. The Kingdom is undergoing a dynamic transformation, offering exciting opportunities for personal and professional growth in a welcoming and rapidly advancing environment.
Join us and work with Shaheen III, the most powerful supercomputer in the Middle East, driving groundbreaking research and innovation at KAUST.
To apply, visit: https://careers.kaust.edu.sa/job/Lead%2C-Systems-Administrator/1217160401/