UseGalaxy.eu Use Case: Translational Research and Precision Medicine
In this article, we would like to share Ali's experience of using usegalaxy.eu for his research.
The world of bioinformatics moves very quickly, and keeping up with it can often become a race against time. The fragmented nature of tool specifications does not help: it is almost impossible to have every tool installed, along with everything needed to run it. Galaxy has made it easier for bioinformaticians and non-bioinformaticians alike to evaluate different tools quickly. This was the reason for my long interaction with Galaxy recently.
Our lab is at the forefront of translational research and precision medicine. Our core activities include Active Kinome evaluation and multi-omics analyses. Luckily, we are also affiliated with a medical school, which lets us quickly explore bench/computer-to-bedside translation. One such opportunity arose recently, when we were given access to exome sequencing data for a patient with certain psychiatric illnesses. Our lab has an interest in bioenergetic dysfunction in schizophrenia and the genetic variants that allow it to happen. We have published on this earlier (here and [here](https://doi.org/10.1038/s41380-022-01494-x)).
I was then tasked with identifying potential mutations and other single nucleotide variants present in the patient's genome, to determine the patient's genotype with respect to mutations related to bioenergetic dysfunction. The process consisted of the following steps:
- Align the patient's sequencing reads to a reference assembly
- Do variant calling on the aligned data
- Filter to the variants of interest
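
As an illustration of the last step only, here is a minimal sketch of filtering variant calls down to regions of interest. It assumes the calls are in plain-text VCF format and that the regions are given as 1-based inclusive intervals; the function name `filter_vcf_lines`, the example records, and the coordinates are hypothetical and are not taken from our actual analysis, which was done with Galaxy tools.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical region type: (chromosome, start, end), 1-based and inclusive.
Region = Tuple[str, int, int]


def filter_vcf_lines(lines: Iterable[str], regions: List[Region]) -> Iterator[str]:
    """Yield VCF header lines plus the records that fall inside any region."""
    for line in lines:
        # Header lines start with '#'; keep them so the output stays valid VCF.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])  # CHROM and POS columns
        if any(chrom == c and start <= pos <= end for c, start, end in regions):
            yield line


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t150\t.\tA\tG\t50\tPASS\t.\n",
        "chr1\t5000\t.\tC\tT\t50\tPASS\t.\n",
    ]
    for kept in filter_vcf_lines(example, [("chr1", 100, 200)]):
        print(kept, end="")
```

A linear scan over the regions is enough for a handful of genes; a larger region list would call for an interval index instead.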
A specific problem for us, though, was identifying the appropriate set of tools for each step. Step 3, filtering, was easy enough. Alignment and variant calling, however, were a different matter. We have had good experiences using HISAT2 to align RNA-Seq datasets, but is it the optimal aligner when it comes to variant calling? The Broad Institute recommends a variant calling pipeline built on the Genome Analysis Toolkit 4.0 (GATK4.0), but the same question applies: is this the best pipeline? For alignment, people still use multiple reference genomes, from hg19 to hg38 to the latest chm13. We decided to run as many combinations of the available options as we could, to ensure that we did not miss anything.
Galaxy made it easy to set up a fan-out and fan-in workflow that let us run the same dataset through multiple variant discovery pipelines with various parameters. The options included the aligners (HISAT2, minimap2, STAR), the reference genomes (hg38 and chm13), and the variant callers (GATK, bcftools). Understandably, this created a deluge of data. We are incredibly thankful to the Galaxy Governance team for allotting us an expanded quota, which allowed us to experiment with multiple parameter sets and keep the intermediate results safe for provenance reasons.
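
To give a sense of how quickly the fan-out grows, the following minimal sketch enumerates every combination of the options listed above. It is only an illustration: the function name `enumerate_runs` is hypothetical, the real runs were configured as Galaxy workflows rather than in Python, and the sketch assumes the full cross product, which may be more than the set of combinations we actually ran.

```python
from itertools import product
from typing import Dict, List

# Options named in the text above.
ALIGNERS = ["HISAT2", "minimap2", "STAR"]
REFERENCES = ["hg38", "chm13"]
CALLERS = ["gatk", "bcftools"]


def enumerate_runs(
    aligners: List[str], references: List[str], callers: List[str]
) -> List[Dict[str, str]]:
    """Return one dict per (aligner, reference, caller) combination."""
    return [
        {"aligner": a, "reference": r, "caller": c}
        for a, r, c in product(aligners, references, callers)
    ]


if __name__ == "__main__":
    runs = enumerate_runs(ALIGNERS, REFERENCES, CALLERS)
    for run in runs:
        print(run["aligner"], run["reference"], run["caller"], sep="\t")
    print(len(runs), "combinations")  # 3 * 2 * 2 = 12
```

Even before varying any tool parameters, the cross product already yields twelve pipelines, each with its own alignments and variant calls, which is why the storage footprint grows so fast.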