Scientific Computing Services Manager
15 Mar 2017




The Scientific Computing Department (SCD) provides world-class, leading-edge computing and data storage infrastructure to support world-class science within STFC, across the UK and internationally.


The Research Infrastructure (RI) group (within SCD) defines, develops and manages the underpinning scientific computing infrastructure used to support an extensive range of national and international science projects, including the GridPP Tier1 service, the JASMIN super data cluster and STFC’s central HPC cluster, as well as computing support for STFC facilities such as the ISIS Neutron Source, the Central Laser Facility and the Diamond Light Source.
The RI group currently has a vacancy for a highly motivated IT professional or scientist with experience of delivering scientific computing services or e-infrastructure. The successful candidate will join the team that manages the extensive computing infrastructure supporting the large science projects, as well as the general-purpose computing resources used by SCD.
The current infrastructure includes 20,000+ CPU cores, 30 PB+ of online data storage, nearline tape storage, and complex virtualisation and cloud installations, all underpinned by high-performance networking.
The post is based at the STFC Rutherford Appleton Laboratory in Oxfordshire.

List of Duties/Work Programme/Responsibilities

  • ensuring that services run by the RI group operate smoothly and meet their operational commitments by:
      • ensuring operational problems and incidents affecting production services are identified and resolved in a timely manner, often acting as a first line of response and escalating to specialists as required
      • planning interventions, upgrades and scheduled downtime across a complex, interrelated set of services
      • tracking service performance against agreed SLAs/SLDs and reporting to relevant stakeholders on service performance and issues
      • ensuring the team follows agreed best practices and driving a process of continuous improvement to ensure the high availability of production systems and services
  • working alongside other team members, other groups in SCD, STFC and external collaborators to ensure that the services offered meet the scientists’ requirements
  • investigating, recommending and deploying new technologies, services and management tools as required to enhance the service levels provided by the RI group
  • participating in an on-call team

Technical Skills Required

  • ability to manage a large multi-user service platform to a high level of availability
  • ability to support users of computing systems, responding to problems and developing solutions as required in a timely manner
  • strong problem-solving and analysis skills

Ideally, additional experience in at least one of:

  • performance and exception monitoring tools such as Nagios/Icinga or Ganglia
  • managing scientific computing workflows on HPC or HTC clusters
  • system administration procedures and practices

Personal Skills and Attributes

  • good communication skills, both verbal and written
  • a proactive attitude to service delivery and continuous improvement
  • the ability to negotiate and reach consensus with multiple stakeholders, often with conflicting requirements
  • the ability to work well as a team member or team leader, ensuring that the team meets its commitments
  • the ability to plan and manage workloads towards the delivery of both common team and personal objectives
  • a commitment to acquiring new skills, given the opportunities to work with a wide range of cutting-edge services deployed within SCD

Any Other Relevant Information

Occasional UK and overseas travel.

For more information and to apply, click here.