BWA Tutorial - Aligning Sequenced Reads to a Reference Genome

Introduction

BWA (Burrows-Wheeler Aligner) maps DNA sequences against a large reference genome, such as a human or plant genome. It provides three algorithms:

BWA-backtrack - Designed for Illumina reads of up to 100bp.
BWA-SW - For longer reads, from 70bp to 1Mbp.
BWA-MEM - Also for reads from 70bp to 1Mbp. It is the most recent of the three and is generally recommended because it is faster and more accurate. This tutorial uses BWA-MEM.

You can download BWA and Picard tools from the following links:

Setting Up Directories

Create three directories to organize the necessary files:

mkdir Ref_genome # for reference genome

mkdir FastqFiles # for raw fastq files

mkdir BamFiles # for BWA output files
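
The three commands above can also be written as a single call. This is a minimal sketch, assuming it is run from the project's top-level directory; the -p flag makes it safe to re-run because mkdir does not fail when a directory already exists.

```shell
# Create all three working directories in one call.
# -p: do not report an error if a directory already exists.
mkdir -p Ref_genome FastqFiles BamFiles
```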

Processing the Reference Genome

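Before aligning, index the reference genome. Pass the reference FASTA file as the argument to bwa index; the -p option only sets an optional prefix for the output files and does not replace the input file. The command below indexes the FASTA file used in the alignment step, so the index prefix is the FASTA path itself:

bwa index Ref_genome/genome.Garb.CRI.fa

This creates five index files next to the FASTA file in the Ref_genome directory, with the extensions .amb, .ann, .bwt, .pac, and .sa.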
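
One way to confirm that indexing finished is to check for the five files that bwa index writes (.amb, .ann, .bwt, .pac, .sa). The sketch below assumes the default bwa index output; check_bwa_index is a hypothetical helper name, not part of BWA, and the prefix is whatever was given to -p, or the FASTA path if -p was not used.

```shell
# check_bwa_index is a hypothetical helper, not part of BWA.
# It returns 0 only if all five index files exist and are non-empty
# for the given index prefix.
check_bwa_index() {
    prefix=$1
    rc=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example call (prefix taken from the alignment command in this tutorial):
# check_bwa_index Ref_genome/genome.Garb.CRI.fa && echo "index looks complete"
```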

Aligning Reads to the Reference Genome

If you have multiple paired-end reads like:

SRR1.1.fastq.gz SRR1.2.fastq.gz SRR2.1.fastq.gz SRR2.2.fastq.gz ...

Use the following loop to align them using BWA-MEM:

for INDEX in 1 2 3 4;
do
    bwa mem -M -t 8 -R "@RG\tID:COL_${INDEX}\tSM:COL_${INDEX}" Ref_genome/genome.Garb.CRI.fa \
    FastqFiles/SRR${INDEX}.1.fastq.gz \
    FastqFiles/SRR${INDEX}.2.fastq.gz \
    > BamFiles/SRR${INDEX}.sam
done
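
The loop above hard-codes the sample numbers. As an alternative sketch, the sample names can be derived from the files in FastqFiles, skipping any sample whose mate file is missing. The helper name list_paired_samples is hypothetical, and the code assumes the <ID>.1.fastq.gz / <ID>.2.fastq.gz naming shown above.

```shell
# list_paired_samples is a hypothetical helper: it prints one sample ID per
# line for every <ID>.1.fastq.gz that has a matching <ID>.2.fastq.gz.
list_paired_samples() {
    dir=$1
    for r1 in "$dir"/*.1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        sample=$(basename "$r1" .1.fastq.gz)
        if [ -e "$dir/$sample.2.fastq.gz" ]; then
            echo "$sample"
        else
            echo "skipping $sample: no mate file" >&2
        fi
    done
}

# Possible use in the alignment loop:
# for SAMPLE in $(list_paired_samples FastqFiles); do
#     bwa mem -M -t 8 -R "@RG\tID:${SAMPLE}\tSM:${SAMPLE}" Ref_genome/genome.Garb.CRI.fa \
#         FastqFiles/${SAMPLE}.1.fastq.gz FastqFiles/${SAMPLE}.2.fastq.gz \
#         > BamFiles/${SAMPLE}.sam
# done
```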

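Note: The -R option adds a read group (@RG) line to the SAM header and tags every read with the read group ID. A read group is a set of reads produced by a single sequencing run; the ID tag names the read group and the SM tag names the sample it came from. Downstream tools such as GATK require this information to tell samples apart and to account for variability between sequencing runs. The -M option marks shorter split hits as secondary, which keeps the output compatible with Picard.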
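
To make the structure of the read group string explicit, the sketch below builds it from an ID and a sample name. make_read_group is a hypothetical helper, not part of BWA. It prints the separators as a literal backslash followed by t, which bwa mem converts to tab characters when it writes the SAM header.

```shell
# make_read_group is a hypothetical helper that prints an @RG line in the
# form expected by `bwa mem -R`: fields separated by a literal "\t".
make_read_group() {
    id=$1
    sample=$2
    # "\\t" in the printf format emits a backslash followed by "t".
    printf '@RG\\tID:%s\\tSM:%s\n' "$id" "$sample"
}

# Possible use:
# bwa mem -M -t 8 -R "$(make_read_group COL_1 COL_1)" Ref_genome/genome.Garb.CRI.fa \
#     FastqFiles/SRR1.1.fastq.gz FastqFiles/SRR1.2.fastq.gz > BamFiles/SRR1.sam
```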

Converting SAM Files to BAM

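BWA writes its alignments as SAM files. Picard SortSam sorts each file by coordinate and writes it as a BAM file, a compressed binary format that is more efficient for downstream processing. With CREATE_INDEX=true, Picard also writes a BAM index (.bai) next to each sorted file: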

for INDEX in {1..4};
do
    picard SortSam \
    I=BamFiles/SRR${INDEX}.sam \
    O=BamFiles/SRR${INDEX}.sorted.bam \
    SORT_ORDER=coordinate \
    CREATE_INDEX=true
done

Building BAM Index

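Downstream programs need a BAM index to access regions of a file quickly. The SortSam command above already writes one when CREATE_INDEX=true is set, so the following step is only needed if that option was omitted or an index file is missing: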

for INDEX in {1..4}
do
    picard BuildBamIndex \
    I=BamFiles/SRR${INDEX}.sorted.bam
done
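
Before moving on, it can be useful to check that every BAM file has an index. The sketch below lists BAM files without one. find_unindexed_bams is a hypothetical helper; it accepts both the x.bai name that Picard writes for x.bam and the x.bam.bai name that some other tools, such as samtools, use.

```shell
# find_unindexed_bams is a hypothetical helper: it prints every *.bam in a
# directory that has neither <name>.bai nor <name>.bam.bai next to it.
find_unindexed_bams() {
    dir=$1
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue             # the glob matched nothing
        if [ ! -e "${bam%.bam}.bai" ] && [ ! -e "$bam.bai" ]; then
            echo "$bam"
        fi
    done
}

# Possible use:
# find_unindexed_bams BamFiles
```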

References

For more information on BWA, visit the official manual:

BWA Manual

Primary reference:

Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18. PMID: 19451168; PMCID: PMC2705234.