GRACE at UCL
Grace / Legion / Software Stack / Legion DI

State of Research Computing Services: Legion
Legion has been UCL's primary local compute resource since 2007.
Almost none of the original hardware is still in service.
Gradual upgrade over time.
Absorbing other services.
7-year-old core network technology: 1G Ethernet.

State of Research Computing Services: Legion
Gradual upgrade over time means the service is fragmented:
  8 different node types!
  Some have InfiniBand, some don’t!
  PIs buy the hardware they need.

Parallel vs Serial
In general: Iridis 3 is for parallel work, Legion is for high throughput.
Parallel:
  A single job spans multiple nodes.
  Tightly coupled parallelisation, usually in MPI.
  Sensitive to network performance.
  Currently primarily chemistry, physics, engineering.
High throughput:
  Lots (tens of thousands) of independent jobs on different data.
  High I/O.
  Currently primarily biosciences and physics.
  In the future, digital humanities.

[Diagram: Parallel. Input data goes in, output data comes out; many processes on many processors work simultaneously and communicate with each other.]
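The difference between the two workload types can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how jobs are actually run on Legion or Iridis: a thread pool stands in for the batch scheduler, a ring of averaged cells stands in for an MPI code, and all function names (`analyse`, `high_throughput`, `coupled_step`, `tightly_coupled`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse(sample):
    """One independent job: it depends only on its own input data."""
    return sum(sample) / len(sample)


def high_throughput(samples):
    # The jobs share nothing, so they may run in any order and on any worker.
    # A thread pool is an assumed stand-in for a cluster scheduler here.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(analyse, samples))


def coupled_step(cells):
    """One step of a tightly coupled job.

    Every cell needs its neighbours' current values (the ring wraps around),
    so workers holding different cells would have to exchange data each step.
    """
    n = len(cells)
    return [(cells[i - 1] + cells[i] + cells[(i + 1) % n]) / 3 for i in range(n)]


def tightly_coupled(cells, steps):
    # Steps must run in order: step k+1 cannot start until step k has finished
    # everywhere, which is why such jobs are sensitive to network performance.
    for _ in range(steps):
        cells = coupled_step(cells)
    return cells
```

In the high-throughput case, adding workers helps almost without limit because nothing is shared. In the tightly coupled case, the per-step data exchange is what makes the interconnect matter.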

[Diagram: High Throughput. Input data goes in, output data comes out; many processes operate independently of each other and in any order.]

Iridis Retirement
In summer 2015, Southampton were due to retire Iridis.
This meant that we would lose ~71 TeraFlops of compute capacity, and the ability to run large parallel jobs!
We also wanted to retire the original Legion hardware, which was 7 years old, losing another 20 TeraFlops.
Luckily, we had £1.5 million to spend!

State of Research Computing Services: Grace
Grace went “into service” on 2 December 2015.
Completely new service for parallel compute.
All nodes are connected to storage by 40 gigabit InfiniBand.
InfiniBand is the primary network in the cluster (IP over IB, so it looks like a “normal” network).
Designed with the network capacity to double in size over time.
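The capacity figures in the talk can be tallied as a quick check. The sketch below uses only numbers quoted in the slides (the ~71 and ~20 TeraFlops being lost, and the ~180 TeraFlops at which Grace was benchmarked). The variable names are illustrative, and all the figures are approximate.

```python
# Approximate figures quoted in the talk, in TeraFlops.
iridis_loss = 71.0       # capacity lost when Iridis is retired
legion_loss = 20.0       # capacity lost by retiring the original Legion hardware
grace_benchmark = 180.0  # benchmarked performance of Grace

# Capacity that has to be replaced; the talk rounds this to ~90 TeraFlops.
required = iridis_loss + legion_loss

# How many times over Grace covers the requirement.
headroom = grace_benchmark / required

print(f"required: ~{required:.0f} TF, headroom: ~{headroom:.1f}x")
```

On these rounded numbers, Grace delivers roughly twice the capacity being retired.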

To replace UCL’s Iridis 3 service and the retired Legion nodes we required ~90 TeraFlops sustained.
Grace was benchmarked at ~180 TeraFlops.

Grace / Legion
Legion and Grace have a common software stack:
  Red Hat Enterprise Linux + Son of Grid Engine + Environment Modules.
  A common set of:
    Compilers (so you can compile your own code).
    Libraries.
    Applications. It’s likely the application you use is already available, or we can install it for you.
  Scripted builds of applications (so we can easily install new versions for you).
  xCAT management software (which allows us to manage the cluster).
Easy to move between the services (you have the same environment on both machines).

Wherever possible, the UCL Research Computing Platform Services Team’s work is open source and on GitHub.
You can deploy it on your own resources/desktop (application licenses permitting).

The Future: Legion “Data Intensive”
Although Legion now does only high throughput computing, it’s not designed for it.
  Some issues with I/O.
  We need to retire some old hardware.
So the next major upgrade is re-designing Legion for HTC:
  Replace old “Nehalem” nodes.
  Replace/upgrade the 1G Ethernet I/O subsystem.
  Local mirroring of common datasets.
  The then-current iteration of the software stack.
Coming ~summer 2017!

None of this would have been possible without:
UCL: Dr Ian Kirker, Heather Kelly, Brian Alston, Thomas Jones, Luke Sudbery, William Hay, Colin Byelong, Prof. Dario Alfe, Dr Javier Herrero, Dr Jörg Saßmannshausen, Mike Atkins, Greg Dyer
OCF/Lenovo/DDN: Georgina Ellis, Arif Ali, Jagjit Reehal, Jim Roche, Richard Mansfield
And certainly many, many others.
THANKS!

