Do not attempt to simply copy all data over. PLAN the transition and read this document first.
With effect from 1 May 2021, there is a new 3 PB Lustre storage system available. It is mounted on the system as
/mnt/lustre3p, and user directories for all current users have already been created. To make life easy for yourself, create a symbolic link in your home directory:
[jblogs@login2:~]$ ln -s /mnt/lustre3p/users/$(whoami) lustre3p
From your home directory, cd lustre3p will then take you straight to your new Lustre directory.
In job scripts, use the full absolute path to your Lustre directory rather than the symbolic link (in that path, subdir is replaced with the actual name of the subdirectory in your Lustre directory that will be the working directory of the job script).
When you use the symbolic link, every file access goes via your home directory, which is on NFS, and this extra step slows down the job script on each and every file access. Depending on the process and software, using a symbolic link instead of the full path may even prevent the job from executing successfully.
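One way to find the absolute path to put in a job script is to resolve the symbolic link once on the login node. This is a minimal sketch, assuming GNU readlink is available; the helper name resolve_link is made up for illustration.

```shell
# Hypothetical helper: print the absolute path that a symbolic link points to.
# Assumes GNU coreutils readlink (-f follows every link in the path).
resolve_link() {
    readlink -f "$1"
}

# If the lustre3p link exists in your home directory, print its real target.
# Use the printed path (plus your subdirectory) in the job script
# instead of the link itself.
if [ -L "$HOME/lustre3p" ]; then
    resolve_link "$HOME/lustre3p"
fi
```

The path only needs to be looked up once, so the job itself never touches the NFS-hosted link.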
As of 1 May 2021, it is already possible to do new runs directly using the new storage.
The original Lustre storage will be permanently retired shortly afterwards, as it has reached its lifetime limit.
Please refer to the File Systems guide for policy information, but the following points are most relevant:
Generally speaking, large files of several hundred MB can be transferred at high speed, thanks to the inherent speed of both Lustre systems and the high-speed InfiniBand networks connecting everything in the cluster. However, transferring large numbers of small files can be extremely time-consuming, because the transfer of each file has to be negotiated individually. If you have a lot of small files to transfer, consider using tar to “archive” the files into a single stream, which can be copied more efficiently. The following example shows how to do this without first saving an intermediate file. The files are concatenated into a stream, and a pipe is used to untar them into the new directory. Test this on a small directory before using it on real data.
cd /mnt/lustre/users/jblogs
tar cf - MyDataDirectory | tar xf - -C /mnt/lustre3p/users/jblogs
Of course, this assumes that your username is “jblogs”. Please use your own username and directory names. If your files are uncompressed ASCII files, it will probably be faster to compress them with the following version of the process:
cd /mnt/lustre/users/jblogs
tar czf - MyDataDirectory | tar xzf - -C /mnt/lustre3p/users/jblogs
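Either form of the pipe can be rehearsed on throwaway data before touching real files. The sketch below is one possible dry run: the temporary directories stand in for the old and new Lustre directories, and the file names are invented for illustration.

```shell
# Dry run of the tar pipe on scratch data, so no real files are at risk.
src=$(mktemp -d)   # stands in for /mnt/lustre/users/<username>
dst=$(mktemp -d)   # stands in for /mnt/lustre3p/users/<username>

# Create a small directory with a few files, including a nested one.
mkdir -p "$src/MyDataDirectory/sub"
echo "alpha" > "$src/MyDataDirectory/a.txt"
echo "beta"  > "$src/MyDataDirectory/sub/b.txt"

# Same pipe as in the guide: archive to stdout, unpack from stdin in the target.
(cd "$src" && tar cf - MyDataDirectory) | tar xf - -C "$dst"

# diff -r exits non-zero if any file differs or is missing.
if diff -r "$src/MyDataDirectory" "$dst/MyDataDirectory"; then
    copy_ok=yes
else
    copy_ok=no
fi
echo "copy identical: $copy_ok"

# Remove the scratch directories.
rm -rf "$src" "$dst"
```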
Do not use data compression if you haven't tested it first and found it to be beneficial. Many types of data files, such as NetCDF files, are often already compressed, and trying to compress them again will slow the process down substantially.
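A quick way to get a first indication is to compare the size of the tar stream with and without gzip on a representative sample. This is only a sketch: the helper name tar_sizes and the sample data are made up, and the size ratio is just a proxy, so timing both variants of the pipe on real data remains the decisive test.

```shell
# Hypothetical helper: print "<uncompressed bytes> <gzip bytes>" for the
# tar stream of one directory. Runs in a subshell so the cd does not leak.
tar_sizes() (
    parent=$(dirname "$1")
    base=$(basename "$1")
    cd "$parent" || exit 1
    plain=$(tar cf - "$base" | wc -c | tr -d ' ')
    packed=$(tar czf - "$base" | wc -c | tr -d ' ')
    echo "$plain $packed"
)

# Sample data: a plain ASCII file, which should compress well.
sample=$(mktemp -d)
mkdir "$sample/data"
seq 1 20000 > "$sample/data/numbers.txt"

sizes=$(tar_sizes "$sample/data")
plain=${sizes% *}     # first field: uncompressed size
packed=${sizes#* }    # second field: compressed size
echo "uncompressed: $plain bytes, gzip: $packed bytes"

rm -rf "$sample"
```

If the two numbers are close for your own data, compression is unlikely to pay for its CPU cost.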
We have also tested the bbcp utility for this purpose, and it has turned out to be easy to use and fast. bbcp is available on all nodes in the cluster.
bbcp can easily be used to copy data between different computers. However, it also works well when copying data between different directories on the same server, hence it is well suited to the purpose of migrating data from one Lustre storage to another. By default it can use multiple simultaneous streams and it can resume interrupted transfers. The syntax is described here.
This example will take the entire directory DataDirectory and copy it to the new Lustre space of the user jblogs. As ever, PLEASE use your own username and not jblogs!
bbcp -apr -s 4 -P 30 /mnt/lustre/users/jblogs/DataDirectory /mnt/lustre3p/users/jblogs/
The rather cryptic command line options mean the following (the example uses the first five; the remaining four are other options that may be useful):
| Option | Meaning |
|---|---|
| -a | append mode to restart a previously failed copy |
| -p | preserve source mode, ownership, and dates |
| -r | copy subdirectories and their contents (actual files only) |
| -s 4 | number of network streams to use (default is 4) |
| -P 30 | produce a progress message every 30 seconds (the minimum interval is 15 seconds) |
| -v | verbose (provide more informative output, such as transfer rate per file) |
| -O | omit files that already exist at the target node (useful with -r) |
| -c lvl | compress data before sending across the network (default lvl is 1) |
| -t sec | set the time limit, in seconds, for the copy to complete |
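After a copy finishes, it is worth checking that the new directory really matches the old one before anything is deleted. The sketch below is one possible check, not part of bbcp: the helper names tree_manifest and same_tree are invented, cp -r stands in for the bbcp command, and the POSIX cksum utility is used for the checksums.

```shell
# Hypothetical helper: list checksum, size and relative path of every
# regular file under a directory, sorted by path.
tree_manifest() (
    cd "$1" && find . -type f -exec cksum {} + | sort -k 3
)

# Hypothetical helper: succeed only if both trees hold identical files.
same_tree() {
    [ "$(tree_manifest "$1")" = "$(tree_manifest "$2")" ]
}

# Demonstration on scratch data.
old=$(mktemp -d)
new=$(mktemp -d)
mkdir -p "$old/DataDirectory/run1"
echo "result 1" > "$old/DataDirectory/run1/out.dat"
cp -r "$old/DataDirectory" "$new/"   # stand-in for the bbcp command

if same_tree "$old/DataDirectory" "$new/DataDirectory"; then
    echo "trees match"
else
    echo "trees differ"
fi

rm -rf "$old" "$new"
```

Checksumming reads every byte on both sides, so on very large directories it may be more practical to run it on one subdirectory at a time.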