GLiCID
Data-driven computational pipelines. Nextflow enables scalable and reproducible scientific workflows using software containers, and supports pipelines written in the most common scripting languages.
Install Nextflow
This is how I installed Nextflow on GLiCID; feel free to change the directories:
Create the directory packages in your ${HOME}:
mkdir -p ${HOME}/packages/
Install a recent version of OpenJDK (e.g. jdk-23.0.1) from https://jdk.java.net/23/ (or look under https://jdk.java.net/archive/ ), then download and expand it into, for example, ${HOME}/packages/jdk-23.0.1.
Add the JDK to your ${PATH}:
export PATH=${HOME}/packages/jdk-23.0.1/bin:${PATH}
Set the variable JAVA_HOME:
export JAVA_HOME=${HOME}/packages/jdk-23.0.1
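The JDK steps above can be sketched end-to-end; the exact tarball URL changes per release, so the download lines below are commented out and the URL is a placeholder to be copied from jdk.java.net:

```shell
# One-shot sketch of the JDK setup above. Copy the real tarball URL from
# https://jdk.java.net/23/ before uncommenting the download/extract lines.
mkdir -p "${HOME}/packages"
# curl -L -o /tmp/openjdk-23.tar.gz "<tarball URL from jdk.java.net>"
# tar -xzf /tmp/openjdk-23.tar.gz -C "${HOME}/packages"
export JAVA_HOME="${HOME}/packages/jdk-23.0.1"
export PATH="${JAVA_HOME}/bin:${PATH}"
# java -version   # sanity check once the JDK is actually unpacked
```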
Nextflow will install some data in ${HOME}/.nextflow. Note that the content of this .nextflow directory can be deleted; it will be re-filled at the next invocation of nextflow. If you want to avoid filling your home, you can create a symbolic link to LAB-DATA:
mkdir -p "/LAB-DATA/GLiCID/users/${USER}/.nextflow"
ln -s "/LAB-DATA/GLiCID/users/${USER}/.nextflow" "${HOME}/.nextflow"
Then install nextflow itself:
mkdir -p "${HOME}/packages/nextflow"
cd "${HOME}/packages/nextflow"
curl -s https://get.nextflow.io | bash
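The installer fetches the latest release. If you want reproducible runs, the nextflow launcher honors the NXF_VER variable to pin a specific version; 24.10.0 below is only an example release number:

```shell
# Pin the Nextflow version used by the launcher (NXF_VER is read by the
# nextflow wrapper script; 24.10.0 is only an example release number).
export NXF_VER=24.10.0
# nextflow -version   # would now download and run exactly this version
```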
Create the directory where nextflow will download its libraries:
mkdir -p "/scratch/nautilus/users/${USER}/.nextflow/capsule"
Prepare a CONDA directory for nextflow:
mkdir -p /micromamba/${USER}/envs/nextflow-envs
Open ${HOME}/.bash_profile and add the following lines:
export NAUTILUS_SCRATCH=/scratch/nautilus/users/${USER}
# disable nextflow logging because I want one line per job instead of one line per process
export NXF_ANSI_LOG=false
# where to put conda/micromamba stuff
export NXF_CONDA_CACHEDIR=/micromamba/${USER}/envs/nextflow-envs
# if you want to use nf-core, don't connect to the web to check the version
export NFCORE_NO_VERSION_CHECK=1
# nextflow wants to put file here
export CAPSULE_CACHE_DIR=${NAUTILUS_SCRATCH}/.nextflow/capsule
# override NF home in the scratch instead of your home
export NXF_HOME=${NAUTILUS_SCRATCH}/.nextflow
# where to install singularity/apptainer stuff
export NXF_SINGULARITY_CACHEDIR=${NAUTILUS_SCRATCH}/.singularity/cache
export NXF_APPTAINER_CACHEDIR=${NXF_SINGULARITY_CACHEDIR}
export NXF_SINGULARITY_LIBRARYDIR=${NAUTILUS_SCRATCH}/.singularity/lib
export NXF_APPTAINER_LIBRARYDIR=${NXF_SINGULARITY_LIBRARYDIR}
# set java home
export JAVA_HOME=${HOME}/packages/jdk-23.0.1
export PATH=${JAVA_HOME}/bin:${HOME}/packages/nextflow:${PATH}
# the following line was found to be required if you want to use micromamba with nextflow
export PATH=${HOME}/.local/bin:${PATH}
Note: here I add Java and Nextflow to my PATH, but one could also imagine using "modules".
Exit GLiCID and log in again. Nextflow should now be installed and in your PATH.
Run a workflow on the Bird cluster
I’m currently using the following config file, ${HOME}/.nextflow/config/nautilus-devel-001.cfg, to run my jobs on the cluster. The memory, time, etc. can be adjusted according to your needs.
/* conda/micromamba support belongs to the conda scope, outside process */
conda.enabled = true
conda.useMicromamba = true

process {
    executor = "slurm"
    clusterOptions = "--qos=short"
    queue = "standard"
    cache = "lenient"
    withLabel:process_single {
        queue  = { "standard" }
        cpus   = { 1 }
        memory = { 2.GB * task.attempt }
    }
    withLabel:process_low {
        clusterOptions = { task.attempt==1 ? "--qos=quick" : "--qos=medium" }
        queue  = { "standard" }
        cpus   = { 1 * task.attempt }
        memory = { 5.GB * task.attempt }
        time   = { task.attempt==1 ? "3h" : "24h" }
    }
    withLabel:process_short {
        clusterOptions = { task.attempt==1 ? "--qos=quick" : "--qos=medium" }
        queue  = { "standard" }
        cpus   = { 2 * task.attempt }
        memory = { 5.GB * task.attempt }
        time   = { task.attempt==1 ? "3h" : "24h" }
    }
    withLabel:process_quick {
        clusterOptions = { task.attempt==1 ? "--qos=quick" : "--qos=medium" }
        queue  = { "standard" }
        cpus   = { 1 * task.attempt }
        memory = { 5.GB * task.attempt }
        time   = { task.attempt==1 ? "3h" : "24h" }
    }
    withLabel:process_medium {
        clusterOptions = { "--qos=medium" }
        cpus   = { 3 * task.attempt }
        memory = { 10.GB * task.attempt }
    }
    withLabel:process_high {
        clusterOptions = { "--qos=long" }
        queue  = { "standard" }
        cpus   = { 10 }
        memory = { 20.GB * task.attempt }
    }
    withLabel:process_long {
        clusterOptions = { "--qos=long" }
        queue  = { "standard" }
    }
    withLabel:process_high_memory {
        clusterOptions = { "--qos=long" }
        queue  = { "standard" }
    }
}
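These withLabel blocks only take effect for processes that declare the corresponding label. A minimal sketch of a hypothetical main.nf (DSL2) whose single process would pick up the process_single settings:

```groovy
// Minimal DSL2 sketch (hypothetical): 'process_single' matches the
// withLabel:process_single block in the config above.
process sayHello {
    label 'process_single'

    output:
    stdout

    script:
    """
    hostname
    """
}

workflow {
    sayHello | view
}
```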
Then use this config file when running a workflow with nextflow:
nextflow run -c "${HOME}/.nextflow/config/${HOSTNAME}.cfg" main.nf
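Since the config file is named after the host, a tiny wrapper can select it and fall back to a shared default; default.cfg is an assumption here, not a file created by the steps above:

```shell
#!/usr/bin/env bash
# Pick the per-host Nextflow config if it exists, otherwise fall back to a
# shared default (default.cfg is hypothetical; adjust to your own setup).
CFG_DIR="${HOME}/.nextflow/config"
CFG="${CFG_DIR}/${HOSTNAME}.cfg"
if [ ! -f "${CFG}" ]; then
    CFG="${CFG_DIR}/default.cfg"
fi
echo "using config: ${CFG}"
# nextflow run -c "${CFG}" main.nf
```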