The Research Infrastructure (RI) group (within SCD) defines, develops and manages the underpinning scientific computing infrastructure that supports an extensive range of national and international science projects. These include the GridPP Tier1 service, the JASMIN super data cluster and STFC’s central HPC cluster, as well as computing support for STFC facilities such as the ISIS Neutron Source, the Central Laser Facility and the Diamond Light Source.
The RI group currently has a vacancy for a highly motivated Linux Systems and Services Administrator to join the team that manages the extensive computing infrastructure supporting these large science projects, as well as the general-purpose computing resources used by the SCD.
The current infrastructure includes 20,000+ CPU cores, 30 PB+ of online data storage, nearline tape storage, and complex virtualisation and cloud installations, all underpinned by high-performance networking.
The post is based at the STFC Rutherford Appleton Laboratory in Oxfordshire.
List of Duties/Work Programme/Responsibilities
- working as part of the team responsible for installing, maintaining and supporting both the High Performance Computing (HPC) and High Throughput Computing (HTC) services, as well as the general-purpose scientific computing resources managed by the RI group
- ensuring that services operated by the RI group run smoothly and meet their operational commitments
- investigating and resolving operational problems and incidents affecting production services, often acting as a first line of response and escalating to specialists as required
- working alongside other team members, other groups in SCD, STFC and external collaborators to ensure that the services meet the scientists’ requirements
- investigating, recommending and deploying new technologies, services and management tools as required to enhance the service levels provided by the RI group
- taking part in an on-call team
Technical Skills Required
- experience in the management of Linux machines (ideally RHEL/SL/CentOS), preferably in a production environment
- experience in user support roles, preferably in a production environment
Ideally, additional experience in at least one of:
- performance and exception monitoring tools such as Nagios/Icinga or Ganglia
- scientific computing workflows on HPC or HTC clusters including workload schedulers such as Platform LSF or HTCondor
- managing a service platform to a high level of availability
- the configuration and management of large numbers of Linux servers
- large-capacity storage solutions
- Ethernet and TCP/IP networking
Personal Skills and Attributes
- good communication skills, both verbal and written
- a proactive attitude to problem solving, service delivery and continuous improvement
- the ability to work as a team member towards the delivery of both common team and personal objectives
- a commitment to acquiring new skills, given the opportunities to work with a wide range of cutting-edge technologies deployed within the SCD
Any other Relevant Information
- occasional UK and overseas travel