Tutorial 1 (buildah/apptainer/unet)
.1. Introduction
This tutorial shows how to run a deep learning algorithm on the GLiCID computing clusters using your own Dockerfile. We will use the example of a U-Net model that performs semantic image segmentation.
It is broken down into several parts:
- creation of the Apptainer image using Buildah
- training of an AI model on the Nautilus cluster using a submission script
- a Python script which performs semantic segmentation
.2. buildah
On GLiCID clusters, you will not find the Docker container solution. Instead, Apptainer (formerly Singularity) is available.
What if you have been using Docker for a while and have some existing Dockerfiles? Not a problem: Buildah can build images using a Dockerfile. Its build command takes a Dockerfile as input and produces an OCI image.
.2.1. create the apptainer image with buildah
Example Dockerfile for the tutorial: Dockerfile_unet-gpu
FROM tensorflow/tensorflow:2.10.1-gpu
MAINTAINER Aymeric BLONDEl
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y \
    python-setuptools
#apt install python3.8-venv
RUN apt-get install -y cuda-compat-12-1
RUN apt-get install -y nvidia-cuda-toolkit
RUN apt-get install -y vim
RUN pip install --upgrade pip
RUN pip install tf-keras
RUN pip install matplotlib
RUN pip install numpy
RUN pip install opencv-python
RUN mkdir /scratch
RUN mkdir /scratch/waves
RUN mkdir /scratch/nautilus
WORKDIR /scratch/nautilus
RUN pip install opencv-python-headless==4.5.3.56
RUN pip install scikit-learn
Creating Images From Containerfiles With Buildah
buildah bud -f Dockerfile_unet-gpu -t glicid_unet2:latest .
The Buildah utility behaves much like Podman, but it is maintained independently of Podman and is dedicated to building OCI-compliant images. We want OCI images so that they can be used with Apptainer without compatibility problems.
Apptainer is an open-source container technology based on the OCI (Open Container Initiative) specifications, which ensures the portability of images.
By using Apptainer with OCI images, developers get a consistent and portable infrastructure for their containerized applications.
We can now list the image:
podman images
result
localhost/glicid_unet2 latest 66e0f7f53723 About a minute ago 7.9 GB
.2.2. construction of the apptainer image from the podman OCI image
podman save glicid_unet2:latest -o glicid_unet2.tar
apptainer build glicid_unet2.sif docker-archive:glicid_unet2.tar
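Optionally, the SIF image can be checked before transferring it, for example with a quick smoke test of the Python interpreter inside it (not part of the original workflow):
apptainer exec glicid_unet2.sif python3 --version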
.2.3. moving the image and the dataset to Nautilus
moving the Apptainer image
scp glicid_unet2.sif Glicid:/LAB-DATA/GLiCID/projects/johndoeprj/apptainer-img/glicid_unet2.sif
moving the dataset
scp images_masks_tests_bande_X10_ab_22_03_24.tar Glicid:/scratch/nautilus/users/john-d@univ-nantes.fr/dataset/
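Once copied, the archive presumably has to be unpacked on the cluster so that the paths used later by the Python script exist (an assumption based on the rep_img/rep_mask paths used below; adapt to your own archive layout):
cd /scratch/nautilus/users/john-d@univ-nantes.fr/dataset
tar xf images_masks_tests_bande_X10_ab_22_03_24.tar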
We then change to the working directory:
cd /scratch/nautilus/users/john-d@univ-nantes.fr/tutounet
.2.4. creation of the Slurm submission file
#!/bin/bash
#SBATCH --job-name=unetx100
#SBATCH -p gpu
#SBATCH --output=%x-%j.out
#SBATCH -t 1-00:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1 # Number of OpenMP threads
#SBATCH --nodes=1
#SBATCH --cluster=nautilus
#SBATCH --qos=short
#SBATCH --mem=20G
#SBATCH --gres=gpu:1
#SBATCH --mail-user=john-d@univ-nantes.fr
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
source /usr/share/Modules/init/bash
module load guix/latest
module load apptainer/1.1.6
export APPTAINERENV_LD_LIBRARY_PATH="/opt:$LD_LIBRARY_PATH"
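# note: with --gres=gpu:1, Slurm normally sets CUDA_VISIBLE_DEVICES to the allocated device itself;
# override it manually (as below) only if you know the node's GPU layout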
export CUDA_VISIBLE_DEVICES=2
apptainer exec --nv \
  -B /opt/software/glicid/biblio_scientifiques/cuda_12.2.0_535.54.03/lib64:/opt \
  -B /scratch/nautilus/users/john-d@univ-nantes.fr/dataset:/scratch/nautilus \
  -B /scratch/nautilus/users/john-d@univ-nantes.fr/tutounet:/usr/local/bin \
  /LAB-DATA/GLiCID/projects/johndoeprj/apptainer-img/glicid_unet2.sif \
  /usr/bin/python3 /usr/local/bin/unet/FIB_SEG/unet-modif-semantic-seg/simple_unet_multi/mric_2_512_nautilus.py
As a reminder: -B binds a cluster storage area to a directory inside the Apptainer container, and --nv enables NVIDIA GPU support so that a CUDA application can run inside the container.
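Assuming the submission file was saved as, say, unetx100.slurm (the name is illustrative), the job is submitted and then monitored with:
sbatch unetx100.slurm
squeue -u $USER -M nautilus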
.3. Use U-Net for semantic image segmentation.
U-Net is a convolutional neural network for semantic image segmentation. It uses an encoder-decoder architecture with skip connections to combine contextual information captured at different scales. U-Net produces accurate masks by identifying and classifying each pixel individually.
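The model itself lives in mric_unet_model.py, which is not reproduced in this tutorial (see note (3) below). To give a rough idea of the pattern, a minimal Keras U-Net with a softmax output could look like the sketch below; the layer counts and filter sizes are illustrative assumptions, not the exact model used:
# minimal illustrative U-Net (assumption: not the exact mric_unet_model.py)
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Conv2DTranspose, concatenate

def unet_model(n_classes, IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS):
    inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
    # encoder: convolutions followed by downsampling
    c1 = Conv2D(16, 3, activation='relu', padding='same')(inputs)
    p1 = MaxPooling2D()(c1)
    c2 = Conv2D(32, 3, activation='relu', padding='same')(p1)
    p2 = MaxPooling2D()(c2)
    # bottleneck
    b = Conv2D(64, 3, activation='relu', padding='same')(p2)
    # decoder: upsampling with skip connections back to the encoder
    u2 = Conv2DTranspose(32, 2, strides=2, padding='same')(b)
    u2 = concatenate([u2, c2])
    c3 = Conv2D(32, 3, activation='relu', padding='same')(u2)
    u1 = Conv2DTranspose(16, 2, strides=2, padding='same')(c3)
    u1 = concatenate([u1, c1])
    c4 = Conv2D(16, 3, activation='relu', padding='same')(u1)
    # per-pixel class probabilities via softmax
    outputs = Conv2D(n_classes, 1, activation='softmax')(c4)
    return Model(inputs, outputs)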
.3.1. Contents of the mric_2_512_nautilus.py file (semantic segmentation of an image via unet)
from mric_unet_model import unet_model #Uses softmax (3)
from keras.utils import normalize
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# tell TensorFlow where CUDA is located
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/lib/cuda"
##############################################################
######### Create Datasets Train and Test ####################
##############################################################
SIZE_X = 512
SIZE_Y = 512
# number of classes to segment
n_classes=3
# lists of training images and masks
x = []
y = []
rep_img = "/scratch/nautilus/images_masks_tests_bande_X10_ab/train/augmented_imagesx100" (1)
rep_mask = "/scratch/nautilus/images_masks_tests_bande_X10_ab/train/augmented_masksx100"
img_files = os.listdir(rep_img)
# Images and masks must have the same dimensions
for img_file in img_files:
    img_path = os.path.join(rep_img, img_file)
    nom_masque = os.path.splitext(img_file)[0] + ".tif"
    masque_path = os.path.join(rep_mask, nom_masque)
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    masque = cv2.imread(masque_path, cv2.IMREAD_GRAYSCALE)
    # keep the pair only if both the image and its mask could be read
    if img is not None and masque is not None:
        x.append(cv2.resize(img, (SIZE_X, SIZE_Y), interpolation=cv2.INTER_CUBIC))
        # masks hold class labels: nearest-neighbour avoids creating new label values
        y.append(cv2.resize(masque, (SIZE_X, SIZE_Y), interpolation=cv2.INTER_NEAREST))
# convert the lists to NumPy arrays, as required by the model
train_img = np.array(x)
train_mask = np.array(y)
# encode the mask labels as consecutive integers 0..n_classes-1
labelencoder = LabelEncoder()
n, h, w = train_mask.shape
train_masks_reshaped = train_mask.reshape(-1)
train_masks_reshaped_encoded = labelencoder.fit_transform(train_masks_reshaped)
train_masks_encoded_original_shape = train_masks_reshaped_encoded.reshape(n, h, w)
print(np.unique(train_masks_encoded_original_shape))  # sanity check: the encoded class values
train_img = np.expand_dims(train_img, axis=3)
train_img = normalize(train_img, axis=1)
train_masks_input = np.expand_dims(train_masks_encoded_original_shape, axis=3)
# 10% of the data for testing, the rest for training
X1, X_test, y1, y_test = train_test_split(train_img, train_masks_input, test_size = 0.10, random_state = 0)
X_train=X1
y_train=y1
from keras.utils import to_categorical
train_masks_cat = to_categorical(y_train, num_classes=n_classes)
y_train_cat = train_masks_cat.reshape((y_train.shape[0], y_train.shape[1], y_train.shape[2], n_classes))
test_masks_cat = to_categorical(y_test, num_classes=n_classes)
y_test_cat = test_masks_cat.reshape((y_test.shape[0], y_test.shape[1], y_test.shape[2], n_classes))
#######################################################
########train model####################################
#######################################################
IMG_HEIGHT = X_train.shape[1]
IMG_WIDTH = X_train.shape[2]
IMG_CHANNELS = X_train.shape[3]
def get_model():
return unet_model(n_classes=n_classes, IMG_HEIGHT=IMG_HEIGHT, IMG_WIDTH=IMG_WIDTH, IMG_CHANNELS=IMG_CHANNELS)
model = get_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
# train the model (parameters to adjust as needed)
history = model.fit(X_train, y_train_cat,
batch_size = 4,
verbose=1,
epochs=10,
validation_data=(X_test, y_test_cat),
#class_weight=class_weights,
shuffle=False)
# save the trained model
model.save('/usr/local/bin/model/train_25_04_band_testnautilus.hdf5') (2)
##############################################
#########Evaluate model#######################
##############################################
_, accuracy = model.evaluate(X_test, y_test_cat)
print("Accuracy is = ", (accuracy * 100.0), "%")
Remember how you bound your directories with apptainer:
(1) Dataset directory (here, /scratch/nautilus/users/john-d@univ-nantes.fr/dataset is bound to /scratch/nautilus)
(2) Model directory (here, /scratch/nautilus/users/john-d@univ-nantes.fr/tutounet is bound to /usr/local/bin)
(3) To keep the documentation light, the U-Net model itself is not detailed here; quite a few tutorials are available on the web.
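As a follow-up outside the tutorial script, the saved model can later be reloaded for inference. A minimal sketch, assuming the same 512×512 grayscale preprocessing as in training; the file names are placeholders:
import cv2
import numpy as np
from keras.models import load_model
from keras.utils import normalize

# placeholder paths: adapt to your own model file and test image
model = load_model('/usr/local/bin/model/train_25_04_band_testnautilus.hdf5')
img = cv2.imread('test_image.tif', cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (512, 512), interpolation=cv2.INTER_CUBIC)
img = np.expand_dims(img, axis=(0, 3))   # add batch and channel dims: (1, 512, 512, 1)
img = normalize(img, axis=1)             # same normalization as in training
pred = model.predict(img)                # shape (1, 512, 512, n_classes)
mask = np.argmax(pred, axis=3)[0]        # per-pixel class labels, shape (512, 512)
cv2.imwrite('prediction.png', (mask * 127).astype(np.uint8))  # spread 3 labels over 0..254 for viewing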