Tutorial 4 (Why making the most of GLiCID's storage spaces can save you a lot of time)


Best Practices for Using Scratch and Storage Spaces

When running jobs on an HPC cluster, it is important to use the right filesystem for the right purpose. This ensures optimal performance, avoids overloading shared filesystems, and keeps your data safe.

1. Work in the /scratch space

  • As explained here, /scratch/nautilus and /scratch/waves are fast, temporary filesystems designed for I/O-intensive workloads.

  • Use it to read and write files during your job execution.

Example in a job script:

mkdir -p /scratch/$USER/myjob
cd /scratch/$USER/myjob
# Run your application here
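
For a batch job, this pattern usually sits at the top of the submission script. The sketch below is a minimal, hypothetical Slurm script: the resource requests, paths, and application name are placeholders to adapt to your own project.

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=01:00:00      # placeholder walltime
#SBATCH --ntasks=1

# Create a per-job working directory on scratch (illustrative path)
WORKDIR=/scratch/$USER/myjob_$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# Run your application here, reading and writing its files on /scratch
./my_application > output.log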

2. Archive your results before moving them

  • After your job finishes, collect small files into a single archive.

  • Use tar to group them, which reduces the number of inodes and simplifies transfers.

Example:

tar -czf myjob.tar.gz results_dir/
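
Before deleting the originals, it can be useful to check that the archive really contains everything. A quick way (assuming the same names as above) is to count the entries in the archive and compare with the source directory:

# Count the entries stored in the archive
tar -tzf myjob.tar.gz | wc -l

# Count the files in the original directory
# (tar also lists directory entries, so the two numbers can differ
#  by the number of subdirectories)
find results_dir/ -type f | wc -l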

3. Transfer to long-term storage

  • Move your archive from /scratch to a slower but larger storage area (e.g. /LAB-DATA/GLiCID, /store, or /home).

  • This storage is designed for capacity and persistence, not speed.

Example:

mv myjob.tar.gz /LAB-DATA/GLiCID/$USER/
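
Note that mv across filesystems is effectively a copy followed by a delete. A slightly more cautious variant of the same step (hypothetical paths) is to copy with rsync and remove the scratch copy only once the transfer has succeeded:

# Copy the archive to long-term storage, then delete the scratch copy
# only if rsync exited successfully
rsync -av myjob.tar.gz /LAB-DATA/GLiCID/$USER/ && rm myjob.tar.gz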

4. Key Points

  • /scratch/nautilus or /scratch/waves → for temporary, fast I/O during computation.

  • /LAB-DATA/GLiCID (or equivalent) → for long-term storage of results.

  • Always archive small files before moving them to large storage.

  • Clean up /scratch/nautilus or /scratch/waves after jobs to free space for other users (see the example below).
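
For example, once the archive has safely reached /LAB-DATA/GLiCID, the temporary job directory can be removed (the path is illustrative; double-check it before running rm -rf):

# Free scratch space once the results have been archived elsewhere
rm -rf /scratch/$USER/myjob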

5. Example: Why Archiving Small Files Before Transfer Is Important

This example shows the difference in performance when handling 20,000 very small files: first directly on /LAB-DATA/GLiCID, and then using /scratch/nautilus for archiving before transfer.
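
The test_tar2.sh script itself is not reproduced here; a minimal bash sketch of a comparable test (file count and naming chosen to mimic the runs below) could look like this:

#!/bin/bash
# Usage: ./small_file_test.sh <target_directory>
set -eu
TESTDIR="$1"
mkdir -p "$TESTDIR"
cd "$TESTDIR"

# Create 20,000 tiny files
for i in $(seq 1 20000); do
    echo "sample data $i" > "file_$i.txt"
done

# Time the creation of a tar.gz archive from the small files
time tar -czf test_1.tar.gz file_*.txt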

5.1. Case 1: Creating tar.gz on /LAB-DATA/GLiCID

time ./test_tar2.sh /LAB-DATA/GLiCID/users/john-d@univ-nantes.fr/test_lab-data/testdir
=== Preparing test files (20000 files) ===
=== Stress test started on /LAB-DATA/GLiCID/.../testdir ===
Total data size: 0.27 MB
--- Iteration 1 ---
Duration: 155 s | Avg. throughput: 0 MB/s
--- Iteration 2 ---
Duration: 141 s | Avg. throughput: 0 MB/s
--- Iteration 3 ---
Duration:  61 s | Avg. throughput: 0 MB/s
=== Cleanup ===
Stress test finished. Archives are in /LAB-DATA/GLiCID/.../testdir

real    6m15s

The 3 tar archives were then extracted on /scratch, and the resulting 20,000 small files were transferred individually with rsync from /scratch to /LAB-DATA/GLiCID:

time rsync -av --progress /scratch/.../testdir .
...
real    16m13s

Result: ~6 minutes for compression on /LAB-DATA/GLiCID and ~16 minutes to transfer the 20,000 tiny files. Transfer rate observed: 0.34 kB/s for small files.


5.2. Case 2: Creating tar.gz on /scratch

time ./test_tar2.sh /scratch/nautilus/users/blondel-a@univ-nantes.fr/testdir
=== Preparing test files (20000 files) ===
=== Stress test started on /scratch/.../testdir ===
Total data size: 0.27 MB
--- Iteration 1 ---
Duration: 0.11 s | Avg. throughput: 2.29 MB/s
--- Iteration 2 ---
Duration: 0.12 s | Avg. throughput: 2.19 MB/s
--- Iteration 3 ---
Duration: 0.12 s | Avg. throughput: 2.19 MB/s
=== Cleanup ===
Stress test finished. Archives are in /scratch/nautilus/users/john-d@univ-nantes.fr/testdir

real    0m24,999s

Then rsync the tar archives to /LAB-DATA/GLiCID:

time rsync -av --progress /scratch/nautilus/users/john-d@univ-nantes.fr/testdir /LAB-DATA/GLiCID/users/john-d@univ-nantes.fr/test_lab-data/

sending incremental file list
testdir/
testdir/test_1.tar  20MB 100%  6.42MB/s  0:00:03
testdir/test_2.tar  20MB 100%  1.65MB/s  0:00:11
testdir/test_3.tar  20MB 100%  5.48MB/s  0:00:03

real    0m17,673s

Result: ~25 seconds to compress 20,000 files on /scratch, and ~17 seconds to transfer the 3 large tar files to /LAB-DATA/GLiCID. Transfer rates observed: up to 6.42 MB/s for large files.


5.3. Conclusion

  • Directly storing and transferring thousands of small files on /LAB-DATA/GLiCID is slow and inefficient.

  • Using /scratch to archive small files into larger .tar.gz files before transfer is much faster.

  • This difference comes from the underlying infrastructure:

    • /scratch is a distributed, high-performance storage system designed for fast I/O during computations.

    • /LAB-DATA/GLiCID is designed for capacity and long-term storage of files, not for handling millions of small I/O operations efficiently.

6. Best Practice

Work on /scratch/nautilus or /scratch/waves for temporary data processing, then compress and transfer your results to /LAB-DATA/GLiCID for medium- or long-term storage, within the limits of your quota.