Tutorial 4 (Why making the most of GLiCID's storage space can save you a lot of time)
Best Practices for Using Scratch and Storage Spaces
When running jobs on an HPC cluster, it is important to use the right filesystem for the right purpose. This ensures optimal performance, avoids overloading shared filesystems, and keeps your data safe.
1. Work in the /scratch space
- As explained here, /scratch/nautilus and /scratch/waves are fast, temporary filesystems designed for I/O-intensive workloads.
- Use them to read and write files during your job execution.
Example in a job script:
mkdir -p /scratch/$USER/myjob   # create the job directory if it does not exist
cd /scratch/$USER/myjob
# Run your application here
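To put this in context, a minimal batch script built around this pattern could look like the sketch below, assuming a Slurm-style scheduler. The job options and the ./my_application executable are placeholders, and the /scratch path follows the example above.

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=01:00:00

# Work in a job-specific directory on the fast /scratch filesystem.
# The path is illustrative; adapt it to your own scratch directory.
WORKDIR=/scratch/$USER/myjob_$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# Run your application here; all heavy I/O stays on /scratch.
./my_application > output.log 2>&1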
2. Archive your results before moving them
- After your job finishes, collect small files into a single archive.
- Use tar to group them, which reduces the number of inodes and simplifies transfers.
Example:
tar -czf myjob.tar.gz results_dir/
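Before removing the original files, it can be worth checking that the archive is readable. A quick sanity check, reusing the archive and directory names from the example above:

# List the archive contents without extracting, to confirm it is readable.
tar -tzf myjob.tar.gz > /dev/null && echo "archive OK"
# Optionally compare file counts between the directory and the archive.
echo "files on disk:    $(find results_dir/ -type f | wc -l)"
echo "files in archive: $(tar -tzf myjob.tar.gz | grep -vc '/$')"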
3. Transfer to long-term storage
- Move your archive from /scratch to a slower but larger storage area (e.g. /store or /home).
- This storage is designed for capacity and persistence, not speed.
Example:
mv myjob.tar.gz /LAB-DATA/GLiCID/$USER/
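Note that mv across filesystems is in fact a copy followed by a delete. For large archives, an rsync-based transfer (shown here with the same destination as in the example above) makes it easy to monitor the copy and to remove the /scratch copy only once the transfer has succeeded:

# Copy the archive to long-term storage, then delete the /scratch copy
# only if the transfer completed successfully.
rsync -av --progress myjob.tar.gz /LAB-DATA/GLiCID/$USER/ \
    && rm myjob.tar.gz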
4. Key Points
- /scratch/nautilus or /scratch/waves → for temporary, fast I/O during computation.
- /LAB-DATA/GLiCID (or equivalent) → for long-term storage of results.
- Always archive small files before moving them to large storage.
- Clean up /scratch/nautilus or /scratch/waves after jobs to free space for other users (see the cleanup sketch below).
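For the cleanup step, one safe pattern is to delete the /scratch job directory only after checking that the archive is present at its destination. The paths below reuse the earlier examples and should be adapted to your own directories:

# Remove the job directory from /scratch once the archive has been
# confirmed at its long-term location.
if [ -f /LAB-DATA/GLiCID/$USER/myjob.tar.gz ]; then
    rm -rf /scratch/$USER/myjob
fi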
5. Example: Why Archiving Small Files Before Transfer Is Important
This example shows the difference in performance when handling 20,000 very small files: first directly on /LAB-DATA/GLiCID, and then using /scratch/nautilus for archiving before transfer.
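The test_tar2.sh script used in the runs below is not reproduced in this tutorial. As a rough stand-in for readers who want to repeat the comparison, the sketch below (written for this example, not the original script) creates 20,000 tiny files in a target directory and times three archiving passes:

#!/bin/bash
# Illustrative stand-in for test_tar2.sh (not the original script):
# create many tiny files in the target directory, then time several
# archiving iterations to measure small-file performance.
TARGET=$1
NFILES=20000

mkdir -p "$TARGET" && cd "$TARGET" || exit 1

echo "=== Preparing test files ($NFILES files) ==="
for i in $(seq 1 "$NFILES"); do
    echo "data $i" > "file_$i.txt"
done

for iter in 1 2 3; do
    echo "--- Iteration $iter ---"
    start=$(date +%s)
    tar -czf "test_$iter.tar.gz" file_*.txt
    end=$(date +%s)
    echo "Duration: $((end - start)) s"
done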
5.1. Case 1: Creating tar.gz on /LAB-DATA/GLiCID
time ./test_tar2.sh /LAB-DATA/GLiCID/users/john-d@univ-nantes.fr/test_lab-data/testdir
=== Preparing test files (20000 files) ===
=== Stress test started on /LAB-DATA/GLiCID/.../testdir ===
Total data size: 0.27 MB
--- Iteration 1 ---
Duration: 155 s | Avg. throughput: 0 MB/s
--- Iteration 2 ---
Duration: 141 s | Avg. throughput: 0 MB/s
--- Iteration 3 ---
Duration: 61 s | Avg. throughput: 0 MB/s
=== Cleanup ===
Stress test finished. Archives are in /LAB-DATA/GLiCID/.../testdir
real 6m15s
Decompression of the 3 tar files and rsync of the 20,000 small files from /scratch to /LAB-DATA/GLiCID:
time rsync -av --progress /scratch/.../testdir .
...
real 16m13s
Result: ~6 minutes for compression on /LAB-DATA/GLiCID and ~16 minutes to transfer 20,000 tiny files. Transfer rate observed: 0.34 kB/s for small files.
5.2. Case 2: Creating tar.gz on /scratch
time ./test_tar2.sh /scratch/nautilus/users/blondel-a@univ-nantes.fr/testdir
=== Preparing test files (20000 files) ===
=== Stress test started on /scratch/.../testdir ===
Total data size: 0.27 MB
--- Iteration 1 ---
Duration: 0.11 s | Avg. throughput: 2.29 MB/s
--- Iteration 2 ---
Duration: 0.12 s | Avg. throughput: 2.19 MB/s
--- Iteration 3 ---
Duration: 0.12 s | Avg. throughput: 2.19 MB/s
=== Cleanup ===
Stress test finished. Archives are in /scratch/nautilus/users/john-d@univ-nantes.fr/testdir
real 0m24,999s
Then rsync the tar archives to /LAB-DATA/GLiCID:
time rsync -av --progress /scratch/nautilus/users/john-d@univ-nantes.fr/testdir /LAB-DATA/GLiCID/users/john-d@univ-nantes.fr/test_lab-data/
sending incremental file list
testdir/
testdir/test_1.tar 20MB 100% 6.42MB/s 0:00:03
testdir/test_2.tar 20MB 100% 1.65MB/s 0:00:11
testdir/test_3.tar 20MB 100% 5.48MB/s 0:00:03
real 0m17,673s
Result: ~25 seconds to compress 20,000 files on /scratch, and ~17 seconds to transfer the 3 large tar files to /LAB-DATA/GLiCID. Transfer rate observed: 6.42 MB/s for large files.
5.3. Conclusion
- Directly storing and transferring thousands of small files on /LAB-DATA/GLiCID is slow and inefficient.
- Using /scratch to archive small files into larger .tar.gz files before transfer is much faster.
- This difference comes from the underlying infrastructure:
- /scratch is a distributed, high-performance storage system designed for fast I/O during computations.
- /LAB-DATA/GLiCID is designed for capacity and long-term storage of files, not for handling millions of small I/O operations efficiently.
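To summarize the workflow described in this tutorial, the sketch below chains the recommended steps at the end of a job: archive on /scratch, transfer the single archive to long-term storage, then clean up. All paths are illustrative and follow the examples used above.

#!/bin/bash
# Sketch of the recommended end-of-job workflow; adapt the paths
# to your own /scratch and /LAB-DATA/GLiCID directories.
SCRATCH_DIR=/scratch/nautilus/users/$USER/myjob
DEST_DIR=/LAB-DATA/GLiCID/$USER

# 1. Archive the many small result files into one tar.gz on /scratch (fast).
cd "$SCRATCH_DIR" || exit 1
tar -czf myjob.tar.gz results_dir/

# 2. Transfer the single large archive to long-term storage.
rsync -av myjob.tar.gz "$DEST_DIR/"

# 3. Clean up /scratch once the archive has arrived safely.
if [ -f "$DEST_DIR/myjob.tar.gz" ]; then
    cd / && rm -rf "$SCRATCH_DIR"
fi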