   Tutorial 1 (buildah/apptainer/unet)

.1. Introduction

This tutorial shows how to run a deep learning algorithm on GLiCID computing clusters using your own Dockerfile. We will use the example of a U-Net semantic segmentation model.

It is broken down into several parts:

  • creating the Apptainer image using Buildah;

  • training an AI model on the Nautilus cluster using a submission script;

  • the Python script which performs the semantic segmentation.

.2. buildah

Docker is not available on GLiCID clusters. Apptainer (formerly Singularity) is available instead.

What if you have been using Docker for a while and already have Dockerfiles? That is not a problem: Buildah can build images from a Dockerfile. Its build command takes a Dockerfile as input and produces an OCI image.

.2.1. create the apptainer image with buildah

Example Dockerfile for the tutorial (Dockerfile_unet-gpu):

FROM tensorflow/tensorflow:2.10.1-gpu
MAINTAINER Aymeric BLONDEl
RUN apt-get update && \
  apt-get upgrade -y && \
  apt-get install -y \
    python-setuptools

#apt install python3.8-venv
RUN apt-get install -y cuda-compat-12-1
RUN apt-get install -y nvidia-cuda-toolkit
RUN apt-get install -y vim

RUN pip install --upgrade pip
RUN pip install tf-keras
RUN pip install matplotlib
RUN pip install numpy
RUN pip install opencv-python

RUN mkdir /scratch
RUN mkdir /scratch/waves
RUN mkdir /scratch/nautilus
WORKDIR /scratch/nautilus
RUN pip install opencv-python-headless==4.5.3.56
RUN pip install scikit-learn

Creating images from Containerfiles with Buildah:

buildah bud -f Dockerfile_unet-gpu -t glicid_unet2:latest .

Buildah behaves much like Podman but remains an independent tool dedicated to building OCI-compliant images. We build an OCI image so that Apptainer can use it without problems.

Apptainer is an open-source container platform that can build and run images from OCI (Open Container Initiative) sources, which keeps images portable.

Using Apptainer with OCI images gives developers a consistent, portable environment for their containerized applications.

We can now list our image:

podman images

Result:

localhost/glicid_unet2 latest 66e0f7f53723 About a minute ago 7.9 GB

.2.2. construction of the apptainer image from the podman OCI image

podman save glicid_unet2:latest -o glicid_unet2.tar
apptainer build glicid_unet2.sif docker-archive:glicid_unet2.tar
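
The three commands of this section (build, save, convert) can be chained from a small script. The sketch below only assembles the command lines and prints them; the helper name `image_build_commands` is hypothetical, and the archive name is assumed to be derived from the .sif name. Note that the same tag is used for the build and the save steps.

```python
import shlex


def image_build_commands(dockerfile, image_tag, sif_path):
    """Return the tutorial's three commands as argv lists, in execution order."""
    # Assumption: the intermediate archive is named after the .sif file.
    archive = sif_path.rsplit(".", 1)[0] + ".tar"
    return [
        ["buildah", "bud", "-f", dockerfile, "-t", image_tag, "."],
        ["podman", "save", "-o", archive, image_tag],
        ["apptainer", "build", sif_path, f"docker-archive:{archive}"],
    ]


commands = image_build_commands(
    "Dockerfile_unet-gpu", "glicid_unet2:latest", "glicid_unet2.sif"
)
for argv in commands:
    # shlex.join renders each argv list as a copy-pastable shell command.
    print(shlex.join(argv))
```

To actually run the steps, each list could be passed to `subprocess.run(argv, check=True)` so that a failing step stops the chain.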

.2.3. moving the image and the dataset to Nautilus

Moving the Apptainer image:

scp glicid_unet2.sif Glicid:/LAB-DATA/GLiCID/projects/johndoeprj/apptainer-img/glicid_unet2.sif

Moving the dataset:

scp images_masks_tests_bande_X10_ab_22_03_24.tar Glicid:/scratch/nautilus/users/john-d@univ-nantes.fr/dataset/
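
The training script later reads its images from a directory (images_masks_tests_bande_X10_ab/train/...), so the archive presumably has to be unpacked on the cluster first. The source shows neither this step nor the layout of the archive; the sketch below is therefore an assumption, with a hypothetical helper `unpack_dataset` that extracts a tar archive and reports its top-level entries.

```python
import tarfile
from pathlib import Path


def unpack_dataset(archive_path, dest_dir):
    """Extract a tar archive into dest_dir; return its sorted top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as archive:
        names = archive.getnames()
        archive.extractall(dest)
    # Keep only the first path component of every member.
    return sorted({name.split("/", 1)[0] for name in names})


# Possible use on Nautilus (paths taken from the scp command above):
# dataset_dir = "/scratch/nautilus/users/john-d@univ-nantes.fr/dataset"
# unpack_dataset(dataset_dir + "/images_masks_tests_bande_X10_ab_22_03_24.tar", dataset_dir)
```

Only extract archives you trust, since tar members can carry unexpected paths.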

On Nautilus, change to the working directory:

cd /scratch/nautilus/users/john-d@univ-nantes.fr/tutounet

.2.4. creation of the submission slurm file

#!/bin/bash
#SBATCH --job-name=unetx100
#SBATCH -p gpu
#SBATCH --output=%x-%j.out
#SBATCH -t 1-00:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1      # Number of OpenMP threads
#SBATCH --nodes=1
#SBATCH --cluster=nautilus
#SBATCH --qos=short
#SBATCH --mem=20G
#SBATCH --gres=gpu:1
#SBATCH --mail-user=john-d@univ-nantes.fr
#SBATCH --mail-type=END,FAIL

source /usr/share/Modules/init/bash
module load guix/latest
module load apptainer/1.1.6
export APPTAINERENV_LD_LIBRARY_PATH="/opt:$LD_LIBRARY_PATH"
export CUDA_VISIBLE_DEVICES=2

apptainer exec --nv \
  -B /opt/software/glicid/biblio_scientifiques/cuda_12.2.0_535.54.03/lib64:/opt \
  -B /scratch/nautilus/users/john-d@univ-nantes.fr/dataset:/scratch/nautilus \
  -B /scratch/nautilus/users/john-d@univ-nantes.fr/tutounet:/usr/local/bin \
  /LAB-DATA/GLiCID/projects/johndoeprj/apptainer-img/glicid_unet2.sif \
  /usr/bin/python3 /usr/local/bin/unet/FIB_SEG/unet-modif-semantic-seg/simple_unet_multi/mric_2_512_nautilus.py

As a reminder: -B binds a cluster storage directory to a directory inside the Apptainer container, and --nv enables NVIDIA GPU support so that a CUDA application can run inside the container.
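
Because of the -B options, the same file has one path inside the container and another on the cluster. The sketch below illustrates the translation for the three binds of the submission script; the helper `host_path` is hypothetical and only knows the binds listed here (Apptainer may also mount other directories by default).

```python
from pathlib import PurePosixPath

# (host directory, container directory) pairs copied from the -B options.
BINDS = [
    ("/opt/software/glicid/biblio_scientifiques/cuda_12.2.0_535.54.03/lib64", "/opt"),
    ("/scratch/nautilus/users/john-d@univ-nantes.fr/dataset", "/scratch/nautilus"),
    ("/scratch/nautilus/users/john-d@univ-nantes.fr/tutounet", "/usr/local/bin"),
]


def host_path(container_path, binds=BINDS):
    """Translate a container path to the host path it is bound to.

    Returns None when the path is not under any of the listed bind targets.
    """
    path = PurePosixPath(container_path)
    # Try the longest container directory first so nested binds resolve correctly.
    for host_dir, container_dir in sorted(binds, key=lambda b: len(b[1]), reverse=True):
        try:
            relative = path.relative_to(container_dir)
        except ValueError:
            continue
        return str(PurePosixPath(host_dir) / relative)
    return None


print(host_path("/usr/local/bin/unet/FIB_SEG/unet-modif-semantic-seg/simple_unet_multi/mric_2_512_nautilus.py"))
print(host_path("/usr/bin/python3"))  # None: not covered by the listed binds
```

The first call shows that the Python script launched by the job must be stored under the tutounet directory on the cluster.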

.2.5. start the model training calculation.

sbatch launch_tf_nautilus.slurm

.2.6. Result

ls /scratch/nautilus/users/john-d@univ-nantes.fr/tutounet/model

train_25_04_band_testnautilus.hdf5

Last line of unetx100-3820104.out:

Accuracy is = 98.42780232429504%
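
The name of the output file follows from the `--output=%x-%j.out` directive of the submission script, where Slurm replaces %x with the job name and %j with the job ID. A minimal sketch of that expansion, handling only these two fields (the function name is hypothetical):

```python
def expand_output_pattern(pattern, job_name, job_id):
    """Expand the %x (job name) and %j (job ID) fields of a Slurm filename pattern."""
    return pattern.replace("%x", job_name).replace("%j", str(job_id))


print(expand_output_pattern("%x-%j.out", "unetx100", 3820104))  # unetx100-3820104.out
```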

.3. Use U-Net for semantic image segmentation.

U-Net is a convolutional neural network for semantic image segmentation. It uses an encoder-decoder architecture with skip connections, which pass encoder feature maps directly to the decoder so that contextual information is combined at different scales. U-Net produces a segmentation mask by assigning a class to each pixel.
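
To make the encoder-decoder structure concrete, the sketch below traces feature-map shapes through a simplified U-Net without building any network. The depth and filter counts are hypothetical and are not taken from the model used in this tutorial, which is not shown.

```python
def unet_feature_shapes(height, width, base_filters=16, depth=4):
    """Trace (height, width, channels) through a simplified U-Net.

    Each encoder level is followed by a 2x2 pooling that halves the spatial
    size while the channel count doubles. Each decoder level upsamples by 2
    and concatenates the encoder feature map of the same resolution.
    """
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    shapes = []
    skips = []
    h, w, filters = height, width, base_filters
    for level in range(depth):
        shapes.append((f"encoder{level + 1}", (h, w, filters)))
        skips.append((h, w, filters))  # kept for the skip connection
        h, w, filters = h // 2, w // 2, filters * 2
    shapes.append(("bottleneck", (h, w, filters)))
    for level in range(depth, 0, -1):
        skip_h, skip_w, skip_filters = skips.pop()
        h, w, filters = h * 2, w * 2, filters // 2
        # Upsampled decoder map and skip map have the same spatial size,
        # so they can be concatenated along the channel axis.
        assert (h, w) == (skip_h, skip_w)
        shapes.append((f"decoder{level}_concat", (h, w, filters + skip_filters)))
    return shapes


for name, shape in unet_feature_shapes(512, 512):
    print(f"{name:16s} {shape}")
```

With 512 x 512 inputs, as in this tutorial, the spatial size shrinks to 32 x 32 at the bottleneck and is restored to 512 x 512 by the decoder, so the output mask has one prediction per input pixel.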

.3.1. Contents of the mric_2_512_nautilus.py file (semantic segmentation of an image via unet)

from mric_unet_model import unet_model  # uses softmax, see note (3)
from keras.utils import normalize
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# Tell TensorFlow (XLA) where the CUDA toolkit is installed in the container
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/lib/cuda"
##############################################################
######### Create Datasets Train and Test  ####################
##############################################################
SIZE_X = 512
SIZE_Y = 512
# number of classes to segment
n_classes=3
# lists that will hold the training images and their masks
train_img = []
train_mask = []
x = []
y = []
# (1) dataset directory as seen from inside the container
rep_img ="/scratch/nautilus/images_masks_tests_bande_X10_ab/train/augmented_imagesx100"
rep_mask ="/scratch/nautilus/images_masks_tests_bande_X10_ab/train/augmented_masksx100"
img_files = os.listdir(rep_img)
paires_img_mask = []
# Each image needs a mask with the same base name (.tif) and the same dimensions
for img_file in img_files:
    img_path = os.path.join(rep_img, img_file)
    nom_masque = os.path.splitext(img_file)[0] + ".tif"
    masque_path = os.path.join(rep_mask, nom_masque)
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    masque = cv2.imread(masque_path, cv2.IMREAD_GRAYSCALE)
    # Skip files OpenCV cannot read so that images and masks stay paired
    if img is not None and masque is not None:
        x.append(cv2.resize(img, (SIZE_X, SIZE_Y), interpolation=cv2.INTER_CUBIC))
        # Nearest-neighbour resizing keeps mask values as valid class labels
        y.append(cv2.resize(masque, (SIZE_X, SIZE_Y), interpolation=cv2.INTER_NEAREST))
# Convert the lists to NumPy arrays, as required by Keras
train_img = np.array(x)
train_masks = np.array(y)
# Encode the mask gray levels as class labels 0..n_classes-1
labelencoder = LabelEncoder()
n, h, w = train_masks.shape
train_masks_reshaped = train_masks.reshape(-1)
train_masks_reshaped_encoded = labelencoder.fit_transform(train_masks_reshaped)
train_masks_encoded_original_shape = train_masks_reshaped_encoded.reshape(n, h, w)
# Sanity check: the masks should contain exactly n_classes distinct labels
print(np.unique(train_masks_encoded_original_shape))
# Add a channel axis and normalize the images
train_img = np.expand_dims(train_img, axis=3)
train_img = normalize(train_img, axis=1)
train_masks_input = np.expand_dims(train_masks_encoded_original_shape, axis=3)
# Keep 10% of the data for testing and use the rest for training
X1, X_test, y1, y_test = train_test_split(train_img, train_masks_input, test_size = 0.10, random_state = 0)
X_train=X1
y_train=y1
from keras.utils import to_categorical
train_masks_cat = to_categorical(y_train, num_classes=n_classes)
y_train_cat = train_masks_cat.reshape((y_train.shape[0], y_train.shape[1], y_train.shape[2], n_classes))
test_masks_cat = to_categorical(y_test, num_classes=n_classes)
y_test_cat = test_masks_cat.reshape((y_test.shape[0], y_test.shape[1], y_test.shape[2], n_classes))
#######################################################
########train model####################################
#######################################################
IMG_HEIGHT = X_train.shape[1]
IMG_WIDTH  = X_train.shape[2]
IMG_CHANNELS = X_train.shape[3]
def get_model():
    return unet_model(n_classes=n_classes, IMG_HEIGHT=IMG_HEIGHT, IMG_WIDTH=IMG_WIDTH, IMG_CHANNELS=IMG_CHANNELS)

model = get_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
# Train the model (fit its parameters to the training data)
history = model.fit(X_train, y_train_cat,
                    batch_size = 4,
                    verbose=1,
                    epochs=10,
                    validation_data=(X_test, y_test_cat),
                    #class_weight=class_weights,
                    shuffle=False)
# (2) save the model; /usr/local/bin is bound to the tutounet directory
model.save('/usr/local/bin/model/train_25_04_band_testnautilus.hdf5')
##############################################
#########Evaluate model#######################
##############################################
_, accuracy = model.evaluate(X_test, y_test_cat)
print("Accuracy is = ", (accuracy * 100.0), "%")
Remember how the directories were bound with Apptainer:
(1) Dataset directory (here: /scratch/nautilus/users/john-d@univ-nantes.fr/dataset:/scratch/nautilus)
(2) Model directory (here: /scratch/nautilus/users/john-d@univ-nantes.fr/tutounet:/usr/local/bin)
(3) To keep this documentation short, the U-Net model itself is not detailed here. Many tutorials on it are available on the web.
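
The label-encoding and one-hot steps of the script can be illustrated on a tiny mask. The sketch below mirrors, on plain Python lists, what LabelEncoder and to_categorical do on the NumPy arrays; the 2 x 2 mask and the helper names are hypothetical.

```python
def encode_labels(mask):
    """Map the distinct gray levels of a 2-D mask to 0..k-1, in sorted order."""
    levels = sorted({value for row in mask for value in row})
    index = {level: i for i, level in enumerate(levels)}
    return [[index[value] for value in row] for row in mask], levels


def one_hot(encoded_mask, n_classes):
    """Replace each class index by a one-hot list of length n_classes."""
    return [
        [[1.0 if c == label else 0.0 for c in range(n_classes)] for label in row]
        for row in encoded_mask
    ]


mask = [[0, 127], [255, 127]]  # hypothetical mask with three gray levels
encoded, levels = encode_labels(mask)
print(levels)                     # [0, 127, 255]
print(encoded)                    # [[0, 1], [2, 1]]
print(one_hot(encoded, 3)[1][0])  # [0.0, 0.0, 1.0]
```

Every distinct gray level becomes its own class, so the masks must contain exactly n_classes distinct values for the one-hot encoding with n_classes channels to be valid.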

.4. Tutorial to come (inference on this model)