GLiCID

Nextflow (https://www.nextflow.io/)

Nextflow is a workflow system for data-driven computational pipelines. It enables scalable and reproducible scientific workflows using software containers, and it lets you adapt pipelines written in the most common scripting languages.

Install nextflow

This is how I installed Nextflow on GLiCID; feel free to change the directories.

Create the directory packages in your ${HOME}:

mkdir -p ${HOME}/packages/

Install a recent version of OpenJDK (e.g. jdk-23.0.1) from https://jdk.java.net/23/ (older releases are listed under https://jdk.java.net/archive/). For example, download the archive and expand it into ${HOME}/packages/jdk-23.0.1.

Add the JDK to your ${PATH}:

export PATH=${HOME}/packages/jdk-23.0.1/bin:${PATH}

Set the variable JAVA_HOME:

export JAVA_HOME=${HOME}/packages/jdk-23.0.1
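The JDK steps above can be sketched as a single shell function. This is only a minimal sketch: it assumes the archive was already downloaded by hand and that it unpacks into one top-level directory. The function name install_jdk and the archive file name in the usage comment are hypothetical.

```shell
# install_jdk ARCHIVE DEST: expand a JDK tar.gz archive into DEST and print
# the path of the directory it created (hypothetical helper, not part of Nextflow).
install_jdk() {
    local archive="$1" dest="$2" top
    mkdir -p "${dest}" || return 1
    # name of the top-level directory stored in the archive (e.g. jdk-23.0.1)
    top="$(tar -tzf "${archive}" | head -n 1 | cut -d/ -f1)" || return 1
    tar -xzf "${archive}" -C "${dest}" || return 1
    echo "${dest}/${top}"
}

# Usage (the archive name is an assumption, adjust it to the file you downloaded):
# JAVA_HOME="$(install_jdk "${HOME}/openjdk-23.0.1_linux-x64_bin.tar.gz" "${HOME}/packages")"
# export JAVA_HOME
# export PATH="${JAVA_HOME}/bin:${PATH}"
```

Printing the extracted directory lets JAVA_HOME follow whatever version the archive actually contains, instead of hard-coding it.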

Nextflow stores some data in ${HOME}/.nextflow. The content of this .nextflow directory can be deleted; it is filled again at the next invocation of nextflow. To avoid filling your home directory, you can create a symbolic link to LAB-DATA:

mkdir -p  "/LAB-DATA/GLiCID/users/${USER}/.nextflow"
ln -s  "/LAB-DATA/GLiCID/users/${USER}/.nextflow" "${HOME}/.nextflow"
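If ${HOME}/.nextflow already exists as a real directory, ln -s creates the link inside it instead of replacing it. The following is a minimal sketch of a more defensive variant; the function name link_nextflow_home is hypothetical, and copying the existing content over is my assumption of what you would want.

```shell
# link_nextflow_home TARGET LINK: make LINK a symbolic link to TARGET.
# An existing LINK directory is copied into TARGET first, then removed.
link_nextflow_home() {
    local target="$1" link="$2"
    mkdir -p "${target}" || return 1
    if [ -L "${link}" ]; then
        # already a symbolic link: nothing to do
        return 0
    fi
    if [ -d "${link}" ]; then
        # keep what Nextflow already stored, then remove the real directory
        cp -a "${link}/." "${target}/" || return 1
        rm -rf "${link}"
    fi
    ln -s "${target}" "${link}"
}

# Usage:
# link_nextflow_home "/LAB-DATA/GLiCID/users/${USER}/.nextflow" "${HOME}/.nextflow"
```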

Then install nextflow itself:

mkdir -p "${HOME}/packages/nextflow"
cd "${HOME}/packages/nextflow"
curl -s https://get.nextflow.io | bash

Create the directory where nextflow will download its libraries:

mkdir -p "/scratch/nautilus/users/${USER}/.nextflow/capsule"

Prepare a conda directory for nextflow:

mkdir -p /micromamba/${USER}/envs/nextflow-envs

Open ${HOME}/.bash_profile and add the following lines:

export NAUTILUS_SCRATCH=/scratch/nautilus/users/${USER}
# disable the nextflow ANSI log because I want one line per job instead of one line per process
export NXF_ANSI_LOG=false
# where to put conda/micromamba stuff
export NXF_CONDA_CACHEDIR=/micromamba/${USER}/envs/nextflow-envs
# if you want to use nf-core, don't connect to the web to check the version
export NFCORE_NO_VERSION_CHECK=1
# nextflow wants to put files here
export CAPSULE_CACHE_DIR=${NAUTILUS_SCRATCH}/.nextflow/capsule
# put the nextflow home in the scratch instead of your home
export NXF_HOME=${NAUTILUS_SCRATCH}/.nextflow
# where to install singularity/apptainer stuff
export NXF_SINGULARITY_CACHEDIR=${NAUTILUS_SCRATCH}/.singularity/cache
export NXF_APPTAINER_CACHEDIR=${NXF_SINGULARITY_CACHEDIR}
export NXF_SINGULARITY_LIBRARYDIR=${NAUTILUS_SCRATCH}/.singularity/lib
export NXF_APPTAINER_LIBRARYDIR=${NXF_SINGULARITY_LIBRARYDIR}
# set java home
export JAVA_HOME=${HOME}/packages/jdk-23.0.1
export PATH=${JAVA_HOME}/bin:${HOME}/packages/nextflow:${PATH}
# the following line was found to be required if you want to use micromamba with nextflow
export PATH=${HOME}/.local/bin:${PATH}
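Several of these variables point to directories under the scratch space that the earlier steps do not create (the Singularity/Apptainer cache and library directories). Creating them up front is harmless; a minimal sketch, where the function name make_nxf_dirs is hypothetical:

```shell
# make_nxf_dirs BASE: create the directories under BASE that the exported
# variables above point to (capsule cache, singularity cache and library).
make_nxf_dirs() {
    local base="$1"
    mkdir -p \
        "${base}/.nextflow/capsule" \
        "${base}/.singularity/cache" \
        "${base}/.singularity/lib"
}

# Usage:
# make_nxf_dirs "/scratch/nautilus/users/${USER}"
```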

Note: here I add java and nextflow to my PATH, but one could use "modules" instead.

Log out of GLiCID and log in again. Nextflow should now be installed and in your PATH.
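To confirm that the new login picked up the settings, something like the following can be run. It is a minimal sketch; check_setup is a hypothetical helper that only looks for the commands on the PATH.

```shell
# check_setup TOOL...: report whether each command is found on the PATH.
# Returns 0 when all are found, 1 when at least one is missing.
check_setup() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "${tool}" >/dev/null 2>&1; then
            echo "OK      ${tool} -> $(command -v "${tool}")"
        else
            echo "MISSING ${tool}"
            missing=1
        fi
    done
    return "${missing}"
}

# Usage:
# check_setup java nextflow && nextflow -version
```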

Run a workflow with the Bird Cluster

I’m currently using the following config file, ${HOME}/.nextflow/config/nautilus-devel-001.cfg, to run my jobs on the cluster. The memory, time, etc. can be adjusted according to your needs.

/* if you want to use conda with micromamba (the conda scope sits outside the process scope) */
conda.enabled = true
conda.useMicromamba = true

process {
    executor = "slurm"
    clusterOptions = "--qos=short"
    queue = "standard"
    cache = "lenient"

    withLabel:process_single {
        queue = {"standard"}
        cpus={1}
        memory = {2.GB  * task.attempt }
    }
    withLabel:process_low {
        clusterOptions = {task.attempt==1?"--qos=quick":"--qos=medium"}
        queue = {"standard" }
        cpus={1  * task.attempt}
        memory = {5.GB  * task.attempt}
        time = {task.attempt==1?"3h":"24h"}
    }
    withLabel:process_short {
        clusterOptions = {task.attempt==1?"--qos=quick":"--qos=medium"}
        queue = {"standard" }
        cpus={2  * task.attempt}
        memory = {5.GB  * task.attempt}
        time = {task.attempt==1?"3h":"24h"}
    }
    withLabel:process_quick {
        clusterOptions = {task.attempt==1?"--qos=quick":"--qos=medium"}
        queue = {"standard" }
        cpus={1  * task.attempt}
        memory = {5.GB  * task.attempt}
        time = {task.attempt==1?"3h":"24h"}
    }
    withLabel:process_medium {
        clusterOptions = {"--qos=medium"}
        cpus={3*task.attempt}
        memory = {10.GB  * task.attempt}
    }
    withLabel:process_high {
        clusterOptions = {"--qos=long"}
        cpus={10}
        memory = {20.GB  * task.attempt}
        queue = {"standard"}
    }
    withLabel:process_long {
        clusterOptions = {"--qos=long"}
        queue = {"standard"}
    }
    withLabel:process_high_memory {
        clusterOptions = {"--qos=long"}
        queue = {"standard"}
    }

}

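Then use this config file when running a workflow with nextflow. The command below assumes the file is named after the host you launch from, so that ${HOSTNAME} expands to nautilus-devel-001: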

nextflow run -c "${HOME}/.nextflow/config/${HOSTNAME}.cfg" main.nf
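Because the config file name depends on ${HOSTNAME}, it can help to check that the file exists before launching. A minimal sketch, where pick_config is a hypothetical helper and the directory layout is the one used above:

```shell
# pick_config DIR HOST: print DIR/HOST.cfg if that file exists,
# otherwise print a message on stderr and return 1.
pick_config() {
    local cfg="$1/$2.cfg"
    if [ -f "${cfg}" ]; then
        echo "${cfg}"
    else
        echo "no config file for host '$2' in $1" >&2
        return 1
    fi
}

# Usage:
# cfg="$(pick_config "${HOME}/.nextflow/config" "${HOSTNAME}")" \
#     && nextflow run -c "${cfg}" main.nf
```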