
longgwas_workshop

Workshop prerequisites

Before starting this workshop, make sure you meet the following prerequisites so you can follow the hands-on sections.

  1. Have access to an Ubuntu Linux distribution, for example:
    • Your personal Ubuntu Linux distribution (e.g. Ubuntu on a VM or Ubuntu on WSL)
    • Ubuntu Linux on a High Performance Computing (HPC) cluster
    • An Ubuntu Linux virtual machine in Terra
  2. Bash version must be 3.2 or later in your Linux distribution

  3. Java Runtime Environment (JRE) 11 or higher
    longgwas now uses Nextflow DSL2. To run longgwas workflows, you need JRE 11 or higher on your Ubuntu Linux system.

  4. Install Docker or Docker Desktop

    Nextflow runs each process that makes up a workflow inside a Docker container, so all workflow dependencies and their versions are specified beforehand. This makes the tool portable and the analyses reproducible.

  5. Install Nextflow

  6. Install git on Ubuntu Linux. We need git to clone the tool from its GitHub remote into a local directory.

  7. Modify /etc/default/grub. To enable Nextflow to manage memory resources within Docker containers, set GRUB_CMDLINE_LINUX as shown below, then run sudo update-grub and reboot:
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
  8. Clone the longgwas tool into your working directory:
git clone --single-branch --branch modularize git@github.com:michael-ta/longitudinal-GWAS-pipeline.git
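The checklist above can be verified with a short Bash script. This is a minimal sketch. It assumes GNU coreutils (`sort -v` is not enough, it needs `sort -V`), which Ubuntu ships by default. It only prints a report and never installs or changes anything. The `/proc/cmdline` check passes only after the GRUB change from item 7 has been applied and the machine has been rebooted.

```shell
#!/usr/bin/env bash
# Informal check of the workshop prerequisites; it only reports, never installs.

# version_ge A B: succeed if dotted version A >= B (relies on GNU `sort -V`)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

report() { printf '%-8s %s\n' "$1" "$2"; }

# Item 2: Bash >= 3.2
bash_v="${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}"
if version_ge "$bash_v" 3.2; then report ok "bash $bash_v"; else report OLD "bash $bash_v"; fi

# Item 3: Java >= 11; `java -version` writes e.g. 'openjdk version "17.0.8" ...' to stderr
if command -v java >/dev/null 2>&1; then
  java_v=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  if version_ge "$java_v" 11; then report ok "java $java_v"; else report OLD "java $java_v"; fi
else
  report MISSING java
fi

# Items 4-6: docker, nextflow and git available on PATH
for tool in docker nextflow git; do
  if command -v "$tool" >/dev/null 2>&1; then report ok "$tool"; else report MISSING "$tool"; fi
done

# Item 7: after update-grub and a reboot, the kernel command line should contain swapaccount=1
if grep -qs 'swapaccount=1' /proc/cmdline; then
  report ok "swapaccount=1"
else
  report MISSING "swapaccount=1 (GRUB change not active yet)"
fi
```

Old Java 8 installs report versions such as `1.8.0_292`. `sort -V` orders these below `11`, so they are correctly flagged as too old.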


Outline

  1. Introduction to Nextflow and longgwas
  2. Get familiar with longgwas components
  3. longgwas workflow summary
  4. Building your first nextflow process
  5. Running longitudinal GWASs
  6. Summary

1. Introduction to Nextflow and longgwas

To get started, we are going to follow a brief presentation to understand what Nextflow is.
Then, we will introduce the longgwas tool. Finally, we will review all longgwas components and capabilities.


2. Get familiar with longgwas components


longgwas is hosted on a GitHub remote, where all of its development takes place.

2.1 Modules, subworkflows and workflows

longgwas has three main components everyone should be familiar with: modules, subworkflows and workflows.

As an example, we could ask ourselves how the GWAS analysis is run within the longgwas workflow after all the QC has been performed.

Which modules are involved? We can look at the contents of the gwasrun module. It contains three process definitions, one per Nextflow file.

In which subworkflow are the modules included? These three modules are included in the rungwas subworkflow. Based on user inputs, this subworkflow runs any of the three main models currently available in longgwas (GLM, CPH, GALLOP-LMM).

Do we see the GWAS subworkflow in the workflow? Yes: the subworkflow is included in the main workflow as one step.
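You can trace the module, subworkflow and workflow chain from a terminal by listing the DSL2 `include` statements in the repository's `.nf` files. The helper `list_includes` below is a hypothetical convenience function and is not part of longgwas. It assumes GNU grep (for `--include`) and only relies on the standard DSL2 syntax `include { NAME } from 'path'`.

```shell
#!/usr/bin/env bash
# list_includes [DIR]: print every Nextflow DSL2 `include { ... } from '...'`
# statement found in *.nf files under DIR (default: current directory),
# formatted as file:line:statement.
list_includes() {
  local dir="${1:-.}"
  grep -rnE --include='*.nf' '^[[:space:]]*include[[:space:]]*\{' "$dir"
}
```

For example, run `list_includes .` from the root of the cloned repository and look for the lines that mention rungwas or gwasrun. The exact file paths depend on the repository layout.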

2.2 Config and yml files

In addition, two other key components allow the workflow to run: the Nextflow config file and the yml parameters file.

2.3 Dockerfile and Docker image hosted on Docker Hub

The Dockerfile specifies all the software, dependencies and versions that longgwas uses to run the main workflow.
However, you no longer need to build the Docker image yourself. We host the longgwas Docker image on Docker Hub, so as long as you have Docker installed, longgwas will automatically pull the image when you run it.

2.4 Documentation pages

longgwas has very good online documentation. It explains how to run longgwas and gives a thorough description of all the parameters the tool supports.


3. Workflow summary

The workflow to run longgwas can be thought of as two fairly simple steps:

We won’t go through the data preparation step in Terra today, as it is out of the scope of this workshop, but I have added an example notebook you can look at. It is available on GitHub, so you can download and reuse it.


4. Building your first nextflow process

We are going to go through three examples that run Nextflow workflows and give you hands-on practice with some Nextflow components. Please clone the GitHub repository if you have not done so yet.

git clone git@github.com:AMCalejandro/longgwas_workshop.git

Example 1

A very simple example to get familiar with the Nextflow process and workflow keywords.
It is a convenient way to understand dataflow: where does my data go after the process runs in the workflow?
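To make the process and workflow keywords concrete, here is a minimal sketch of a one-process DSL2 script. It is a hypothetical stand-in, not the workshop repository's Example 1. It assumes a recent Nextflow release where DSL2 is the default. The shell wrapper writes the script to `hello.nf` and runs it only if `nextflow` is on PATH. Each task runs in its own subdirectory under `work/`, which is where process outputs stay unless a `publishDir` directive copies them elsewhere.

```shell
#!/usr/bin/env bash
# Write a hypothetical one-process DSL2 script to hello.nf.
cat > hello.nf <<'EOF'
// A process declares its inputs, outputs and the script run for each task
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "Hello, ${name}"
    """
}

// The workflow feeds a channel into the process: one task per element
workflow {
    Channel.of('Alice', 'Bob') | SAY_HELLO | view
}
EOF

# Run it only when Nextflow is installed
if command -v nextflow >/dev/null 2>&1; then
  nextflow run hello.nf
else
  echo "nextflow not found; wrote hello.nf only" >&2
fi
```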

Example 2

This example introduces Nextflow channels and shows how they are used together with processes.
We will then add an extra process that uses the data coming out of the first process.
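Chaining two processes through a channel can be sketched as follows. This is a hypothetical two-process DSL2 script and is not the workshop's Example 2. The output channel of `MAKE_FILE` becomes the input channel of `COUNT_LINES`. As before, it assumes a recent Nextflow release with DSL2 as the default, and it runs the script only if `nextflow` is available.

```shell
#!/usr/bin/env bash
# Write a hypothetical two-process DSL2 script to chain.nf.
cat > chain.nf <<'EOF'
// First process: one text file per sample name
process MAKE_FILE {
    input:
    val sample

    output:
    path "${sample}.txt"

    script:
    """
    echo ${sample} > ${sample}.txt
    """
}

// Second process: consumes the files emitted by MAKE_FILE
process COUNT_LINES {
    input:
    path txt

    output:
    stdout

    script:
    """
    wc -l < ${txt}
    """
}

workflow {
    samples = Channel.of('s1', 's2', 's3')  // queue channel with three values
    MAKE_FILE(samples)                      // three tasks, one per value
    COUNT_LINES(MAKE_FILE.out)              // MAKE_FILE.out is itself a channel
    COUNT_LINES.out.view()
}
EOF

if command -v nextflow >/dev/null 2>&1; then
  nextflow run chain.nf
else
  echo "nextflow not found; wrote chain.nf only" >&2
fi
```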

Example 3

A complete example provided by the Nextflow training team. It is great because it makes all the components of a Nextflow script easy to understand.


5. Running longitudinal GWASs

Before getting started, please clone the modularize branch of the longgwas GitHub remote if you have not done so yet.

git clone --single-branch --branch modularize git@github.com:michael-ta/longitudinal-GWAS-pipeline.git

Now that we have seen some basic Nextflow examples, we are going to run our very first job with longgwas. To do so, we will go through the following steps together.

5.1 Run longgwas analysis

We are going to apply several changes to the yml file. Once they are applied, we can run the analysis with the local executor.

To run the analysis, we will repeatedly use the following command:

nextflow run workflows/main.nf \
  -params-file params.yml \
  -profile standard
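Since the command is re-run many times, a small wrapper can save typing. `run_longgwas` is a hypothetical helper and is not part of longgwas. It assumes you run it from the root of the cloned repository. It adds Nextflow's real `-resume` option, which reuses cached results of tasks that already completed, provided the previous `work/` directory still exists. Set `DRY_RUN=1` to print the command instead of running it.

```shell
#!/usr/bin/env bash
# run_longgwas [PARAMS_FILE] [PROFILE]: hypothetical wrapper around
# `nextflow run workflows/main.nf`; defaults to params.yml and the standard profile.
run_longgwas() {
  local params="${1:-params.yml}"
  local profile="${2:-standard}"
  if [ ! -f "$params" ]; then
    echo "params file not found: $params" >&2
    return 1
  fi
  # Note the single dash: -profile is a Nextflow option, --profile would be a pipeline param
  local cmd=(nextflow run workflows/main.nf -params-file "$params" -profile "$profile" -resume)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_longgwas params.yml` prints the full command, and `run_longgwas params.yml` executes it.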

5.2 Demo: longitudinal GWASs with Google Cloud Batch

Now I am going to give a quick demonstration of how to connect to Google Cloud and run our analyses with the Nextflow Google Cloud Batch executor.

6. Summary