How to generate a T2T BAM file on Galaxy FOR FREE

Galaxy is a web-based platform for reproducible computational analysis – https://usegalaxy.eu/

We can upload our FASTQ or BAM files (hg19, hg38) and convert them into a BAM file in T2T format FOR FREE!!!

 

First of all, you need to create a free account, or log in if you already have one.

Then, depending on which non-T2T file you have (BAM or FASTQ), there are different flows to follow!!!

 

If you have only a non-T2T BAM file

My FTDNA BAM file is smaller and comes from the Big Y-700 analysis, which is NOT really a WGS-like test.

So I downloaded my hg19 BAM from “Dante Labs” and uploaded that file to UseGalaxy, because it is the only WGS-type BAM file I have.

 

step 1 – Upload BAM file

Go to Upload Data > Choose local files > select your *.BAM or *.CRAM file > click Start

 

 

In my case, the 90.2 GB Dante Labs BAM file took 7-8 hours to upload to the server. Other BAM files may be smaller.

So your Internet connection had better be STABLE for at least a day.
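To get a rough idea of how long your own upload will take, here is a small Python sketch of the arithmetic. It assumes decimal gigabytes (1 GB = 10^9 bytes) and a steady upload speed; real uploads have protocol overhead and interruptions, so treat the result as a lower bound. The function names are my own.

```python
def upload_hours(size_gb: float, uplink_mbit_s: float) -> float:
    """Rough upload time in hours.

    Assumes decimal GB (1 GB = 10**9 bytes) and a constant uplink
    speed in megabits per second.
    """
    size_bits = size_gb * 1e9 * 8
    seconds = size_bits / (uplink_mbit_s * 1e6)
    return seconds / 3600


def implied_uplink_mbit_s(size_gb: float, hours: float) -> float:
    """Average uplink speed (Mbit/s) implied by an observed upload time."""
    return size_gb * 1e9 * 8 / (hours * 3600) / 1e6


if __name__ == "__main__":
    # The 90.2 GB upload that took 7-8 hours implies roughly 25-29 Mbit/s.
    for h in (7, 8):
        print(f"{h} h -> {implied_uplink_mbit_s(90.2, h):.1f} Mbit/s")
    # The same file over a slower 10 Mbit/s uplink:
    print(f"90.2 GB at 10 Mbit/s -> {upload_hours(90.2, 10):.1f} h")
```

By this arithmetic, the same 90.2 GB file over a 10 Mbit/s uplink would need about 20 hours, which is why a connection that stays stable for a whole day matters.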

 

When the upload reaches 100%, everything turns green with a checkmark.

 

When the BAM upload has finished, you should also see the file in the list of your files (history) in the top-right pane.

If the job for the BAM file is still orange, it is still doing something (job state: Running…). Wait BEFORE proceeding, or proceed knowing that the next job will be added to the queue.

When that “something” is finished, the file turns green and can be FULLY used.

If needed, there is ALSO a public URL to the BAM file and an option to save it locally.

In my case that is the same file I already have, so I move on and extract FASTQ from it.

 

step 2 – Extract FASTQ from BAM

When the BAM upload has finished, type “Samtools fastx extract FASTA or FASTQ from alignment files” into the Search box in the left pane and select that tool.

In “Output format” select “compressed FASTQ”.
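Conceptually, extracting FASTQ from a BAM takes each primary alignment record and writes its read name, sequence and base qualities back out. The Python sketch below is my own simplified illustration of that idea for plain-text SAM lines (as printed by `samtools view`), NOT the actual samtools code: it skips secondary and supplementary alignments, reverse-complements reads that were aligned to the reverse strand, and appends /1 or /2 for paired reads, which is roughly what `samtools fastq` does by default.

```python
from __future__ import annotations

# SAM FLAG bits used below (from the SAM specification)
FLAG_REVERSE = 0x10
FLAG_READ1 = 0x40
FLAG_READ2 = 0x80
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

# Complement table for reverse-complementing sequences
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line: str) -> str | None:
    """Turn one SAM text line into a 4-line FASTQ record.

    Returns None for header lines and for secondary/supplementary
    alignments, so every original read is written only once.
    """
    if line.startswith("@"):
        return None  # header line, not a read
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return None
    if flag & FLAG_REVERSE:
        # Reverse-strand reads are stored reverse-complemented;
        # undo that to get the read as the sequencer produced it.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & FLAG_READ1:
        name += "/1"
    elif flag & FLAG_READ2:
        name += "/2"
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

Dropping secondary and supplementary records matters because one read can appear several times in a BAM, but it should appear only once in the FASTQ that gets re-mapped.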

 

Click “Run tool”; an orange job then appears in the History pane on the right, and it turns green after a few days.

If the job has been added to the queue instead, it is shown in light brown with a clock icon.

 

!!! Expect to wait a day or two !!! In my case it took around a week.

Anyway, once the FASTQ extraction has started, the job will show some data about it.

 

 

 

step 3 – Map reads with BWA-MEM

 

When the FASTQ extraction has finished (green), type “bwa” into the Search box in the left pane and pick the result “Map with BWA-MEM – map medium and long reads (> 100 bp) against reference genome”.

What it does (description from UseGalaxy.eu, referring to http://arxiv.org/abs/1303.3997):

BWA-MEM is an alignment algorithm for aligning sequence reads or long query sequences against a large reference genome such as human. It automatically chooses between local and end-to-end alignments, supports paired-end reads and performs chimeric alignment. The algorithm is robust to sequencing errors and applicable to a wide range of sequence lengths from 70bp to a few megabases.

This Galaxy tool wraps bwa-mem module of bwa read mapping tool. The Galaxy implementation takes fastq files as input and produces output in BAM format, which can be further processed using various BAM utilities existing in Galaxy (BAMTools, SAMTools, Picard).
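Since BWA-MEM is meant for reads of roughly 70 bp and longer, it may be worth checking the read lengths in the extracted FASTQ before starting a multi-day job. A minimal Python sketch, assuming a gzip-compressed FASTQ with the standard 4 lines per record; the 70 bp threshold comes from the description above, and the function names are my own.

```python
from __future__ import annotations

import gzip
from statistics import median


def read_lengths(fastq_gz_path: str, max_reads: int = 100_000) -> list[int]:
    """Sequence lengths of the first `max_reads` records of a gzipped FASTQ."""
    lengths: list[int] = []
    with gzip.open(fastq_gz_path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # 2nd line of each 4-line record is the sequence
                lengths.append(len(line.rstrip("\n")))
                if len(lengths) >= max_reads:
                    break
    return lengths


def suits_bwa_mem(lengths: list[int], min_len: int = 70) -> bool:
    """True if there are reads and their median length is at least min_len bp."""
    return bool(lengths) and median(lengths) >= min_len
```

Sampling only the first reads keeps this quick even on a ~100 GB file; the median is used so a few trimmed short reads do not skew the answer.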

 

Then set the Tool Parameters:

  • In “Using reference genome” select “Human CHM13 2.0 (T2T Consortium) Jan.2022”
  • In “Single or Paired-end reads” select “Single”
  • Select the FASTQ file created by Galaxy in step 2

 

 

In “Select analysis mode” keep “1. Simple Illumina mode”.

In “BAM sorting mode” keep “Sort by chromosomal coordinates”.

 

Click “Run tool” again, and an orange job appears.

After a day or two the job turns green, and then we can click the diskette icon and SAVE the T2T BAM file to a local drive.

While the job is running, it stays orange in the History pane.

At this point, pay attention to file sizes, because your available disk space may NOT be enough. It may be reasonable to delete OLD BAM/FASTQ files.

In my case I have one BAM file (91 GB), the generated FASTQ file (97 GB) and the work-in-progress T2T BAM file, estimated at 90 GB or more, so in total it’s around 278 GB.
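The same storage arithmetic as a tiny Python sketch (sizes in GB as Galaxy displays them). The 90 GB is only my upfront estimate; the finished T2T BAM ended up at 66.4 GB, and "after cleanup" assumes you delete the old BAM and FASTQ as suggested above.

```python
# Storage arithmetic from this walkthrough (sizes in GB as shown by Galaxy).
hg19_bam = 91.0
fastq = 97.0
t2t_bam_estimate = 90.0  # upfront guess while mapping was still running
t2t_bam_actual = 66.4    # size of the finished T2T BAM

peak_estimate = hg19_bam + fastq + t2t_bam_estimate
final_usage = hg19_bam + fastq + t2t_bam_actual
after_cleanup = t2t_bam_actual  # keep only the T2T BAM

print(f"Estimated peak: {peak_estimate:.1f} GB")  # 278.0 GB
print(f"Actual total:   {final_usage:.1f} GB")    # 254.4 GB
print(f"After cleanup:  {after_cleanup:.1f} GB")  # 66.4 GB
```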

And when the job started, my available capacity did not look sufficient.

After a few days the mapping to T2T finished, and I can now download the new BAM file (66.4 GB).

btw, the disk space was just enough for my hg19 BAM file, the extracted FASTQ file, and the mapped T2T BAM file.

 

 

If you have only non-T2T FASTQ files

I have my FASTQ files from Dante Labs, but I have not tested the flow below.

 

  • Upload Data > Choose local files > select TWO FASTQ files > click Start.
  • Search “bwa” and select “Map with BWA-MEM – map medium and long reads (> 100 bp) against reference genome”.
  • In the drop-down “Using reference genome”, type T2T and select “Human CHM13 2.0 (T2T Consortium) Jan.2022”.
  • In “Single or Paired-end reads” KEEP “Paired”!!!
  • In “Select first set of reads” select the uploaded FASTQ file with number 1.
  • In “Select second set of reads” select the uploaded FASTQ file with number 2.
  • In “Select analysis mode” keep “1. Simple Illumina mode”.
  • In “BAM sorting mode” keep “Sort by chromosomal coordinates”.
  • Click “Run tool”, and an orange job should appear. After a few days it turns green, and then we should be able to DOWNLOAD the T2T BAM file. There will also be a bam.bai file, the index file for the T2T BAM file.
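For anyone who prefers to run this locally instead of on Galaxy, the form settings above roughly correspond to the BWA-MEM + samtools pipeline that this Python helper builds. It is a sketch under assumptions, not what Galaxy runs internally: the reference file name is hypothetical and must already be indexed with `bwa index`, “Simple Illumina mode” is assumed to mean default `bwa mem` options, and coordinate sorting is done with `samtools sort`. Leave `reads2` out for the single-end case.

```python
from __future__ import annotations

import shlex


def bwa_mem_pipeline(ref: str, reads1: str, reads2: str | None = None,
                     out_bam: str = "t2t.sorted.bam", threads: int = 4) -> str:
    """Build a shell command: bwa mem -> samtools sort -> samtools index.

    `ref` is a FASTA already indexed with `bwa index` (hypothetical name).
    Pass `reads2` for paired-end data (R1 and R2 as two files).
    """
    bwa = ["bwa", "mem", "-t", str(threads), ref, reads1]
    if reads2 is not None:
        bwa.append(reads2)
    # "-" makes samtools sort read the SAM stream from bwa via stdin
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    index = ["samtools", "index", out_bam]  # writes <out_bam>.bai
    return f"{shlex.join(bwa)} | {shlex.join(sort)} && {shlex.join(index)}"


if __name__ == "__main__":
    print(bwa_mem_pipeline("chm13v2.0.fa", "sample_1.fastq.gz",
                           "sample_2.fastq.gz"))
```

The printed line can be pasted into a shell where bwa and samtools are installed. Keep in mind that mapping a 30x human genome locally needs a lot of RAM and CPU time, which is exactly what Galaxy provides for free.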

 

 

Format of shared URLs:

 

To the BAM file (available when you click the icon)

https://usegalaxy.eu/api/datasets/4838***4a35/display?to_ext=bam

To the bam.bai index (suggested by bam.iobio.io; the link is downloadable)

https://usegalaxy.eu/api/datasets/4838***4a35/display?to_ext=bam.bai
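Both links follow the same pattern and differ only in `to_ext`, so they can be built from the dataset ID. A small Python sketch; `YOUR_DATASET_ID` is a placeholder because the real ID above is masked, and the function name is my own.

```python
from urllib.parse import urlencode

GALAXY = "https://usegalaxy.eu"


def dataset_display_url(dataset_id: str, ext: str = "bam") -> str:
    """Shared link to a Galaxy dataset; ext="bam.bai" gives the index."""
    query = urlencode({"to_ext": ext})
    return f"{GALAXY}/api/datasets/{dataset_id}/display?{query}"


if __name__ == "__main__":
    # YOUR_DATASET_ID is a placeholder for the ID shown in Galaxy
    for ext in ("bam", "bam.bai"):
        print(dataset_display_url("YOUR_DATASET_ID", ext))
```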


Visualization

UseGalaxy.eu does have a visualisation tool, but it is rather too scientific, and it loads data very SLOWLY.

It’s called Trackster. You need a BAM file to start with (either uploaded or converted). Once the data has loaded, you can look at the chromosome setup.

In my case the newly converted T2T BAM file contains data for chromosomes 1-22, the X chromosome, the Y chromosome and also the mitochondrial chromosome.
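To double-check which chromosomes actually contain reads outside of Trackster, one option is running `samtools idxstats` on the downloaded BAM (with its .bai next to it). It prints one tab-separated line per reference: name, length, mapped reads, unmapped reads. The Python sketch below parses that output. It assumes UCSC-style names (chr1-chr22, chrX, chrY, chrM); the T2T reference used by Galaxy may name its contigs differently, so check your BAM header and adjust `EXPECTED`.

```python
from __future__ import annotations

# ASSUMPTION: UCSC-style names; adjust to match your BAM header.
EXPECTED = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def mapped_counts(idxstats_text: str) -> dict[str, int]:
    """Parse `samtools idxstats` output into {reference name: mapped reads}."""
    counts: dict[str, int] = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name != "*":  # the "*" line holds unplaced unmapped reads
            counts[name] = int(mapped)
    return counts


def chromosomes_without_data(counts: dict[str, int],
                             expected: list[str] = EXPECTED) -> list[str]:
    """Expected chromosomes that are missing or have zero mapped reads."""
    return [c for c in expected if counts.get(c, 0) == 0]
```

An empty list from `chromosomes_without_data` would match what I saw: reads on 1-22, X, Y and the mitochondrial chromosome.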

 

I once uploaded my hg19 and hg38 BAM files to the bam.iobio.io portal, and it worked fine. Now that I have a T2T BAM file and what looks like a shared URL to its bam.bai file, I wanted to visualize the BAM data. But when I used the direct shared URL to the BAM file from UseGalaxy.eu, I got some errors there.

 

Luckily, UseGalaxy.eu DOES HAVE A DIRECT link from its site to bam.iobio.io, and then the data is shown properly.

To start, go to the BAM dataset, find the visualize icon and click it.

 

Then you will be redirected to this interim page, with your dataset preselected: https://usegalaxy.eu/visualizations?dataset_id=4838ba***44a35

There you will have 3 choices:

  • open it in a local version of IGV
  • open it in IGV via a shared public URL
  • the proper URL to the bam.iobio.io portal, which passes the data from UseGalaxy.eu

 

https://usegalaxy.eu/display_application/4838ba*****e44a35/iobio_bam/bam_iobio

This UseGalaxy.eu URL leads to the proper page on the bam.iobio.io portal.
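Like the shared download links, both Galaxy pages used here follow a fixed pattern around the dataset ID. A Python sketch that builds them; the function names are my own, and any ID you pass in is a placeholder for your own (masked above).

```python
from urllib.parse import urlencode

GALAXY = "https://usegalaxy.eu"


def visualizations_page(dataset_id: str) -> str:
    """Interim page that lists the viewers available for a dataset."""
    return f"{GALAXY}/visualizations?{urlencode({'dataset_id': dataset_id})}"


def bam_iobio_launch_url(dataset_id: str) -> str:
    """Galaxy display-application link that opens the BAM in bam.iobio.io."""
    return f"{GALAXY}/display_application/{dataset_id}/iobio_bam/bam_iobio"
```

Opening the second URL (while logged in, or for a shared dataset) should take you straight to bam.iobio.io, which avoids the errors I got when pasting the raw BAM link there myself.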


 

 
