This tutorial provides a step-by-step guide to analyzing VCF (Variant Call Format) files using a Bash script. The script performs basic analysis tasks such as counting variants, identifying SNPs and INDELs, and calculating the transition/transversion (Ts/Tv) ratio. It also generates summary statistics for further analysis.
Before running the script, ensure you have the following tools installed:
bcftools: sudo apt-get install bcftools
vcftools: sudo apt-get install vcftools
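Before running the analysis, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch of such a check. The helper name `require_tools` is illustrative and is not taken from the original script.

```shell
# Hypothetical helper: report every required command that is not on the PATH.
# Returns 0 if all tools are found, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Error: '$tool' is not installed or not on the PATH." >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the two tools this tutorial depends on.
if require_tools bcftools vcftools; then
    echo "All required tools found."
fi
```

Checking up front means a missing tool is reported once, with a clear message, instead of as a failure partway through the analysis.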
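The script performs the following tasks:
Counts the total number of variants.
Counts SNPs and INDELs.
Calculates the transition/transversion (Ts/Tv) ratio.
Calculates allele frequencies.
Writes summary statistics to an output directory.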
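To make the terms concrete, here is a rough awk-only illustration of what the counts and the Ts/Tv ratio mean. It is not how the script, bcftools, or vcftools compute them. The function name `vcf_quick_counts` is hypothetical, and the sketch assumes an uncompressed VCF with biallelic records (a single ALT allele in column 5).

```shell
# Illustrative only: classify biallelic VCF records with awk.
# Multi-allelic records (comma in ALT) are counted in the total only.
vcf_quick_counts() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        {
            total++
            ref = toupper($4); alt = toupper($5)
            if (ref ~ /^[ACGT]$/ && alt ~ /^[ACGT]$/) {
                snps++
                pair = ref alt
                # Transitions: A<->G (purines) and C<->T (pyrimidines).
                if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC") {
                    ts++
                } else {
                    tv++                                # every other base change
                }
            } else if (length(ref) != length(alt) && alt !~ /,/) {
                indels++                                # REF and ALT differ in length
            }
        }
        END {
            printf "total=%d snps=%d indels=%d ts=%d tv=%d", total, snps, indels, ts, tv
            if (tv > 0) printf " tstv=%.2f", ts / tv
            printf "\n"
        }
    ' "$1"
}
```

Calling `vcf_quick_counts your_file.vcf` prints one line of counts. A file with two transitions and one transversion gives `tstv=2.00`. Dedicated tools apply more careful rules, for example at multi-allelic sites, so their numbers may differ.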
To run the script, follow these steps:
chmod +x vcf_analysis.sh
./vcf_analysis.sh your_file.vcf
Results are saved in the vcf_analysis_output directory.
The script generates the following output files:
summary.txt: Contains summary statistics (total variants, SNPs, INDELs, Ts/Tv ratio).
allele_freq.frq: Contains allele frequency data.
TsTv.log: Log file for Ts/Tv ratio calculation.
Here is an example of running the script:
./vcf_analysis.sh Ghir_A13G023930.vd.gene.SNPs.Gbio.1913.3kbud.HAU.recode.vcf
Output:
Analysis complete. Results are in the vcf_analysis_output directory.
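The script itself is not reproduced in this tutorial. As a rough sketch of how the output files described above could be produced, the function below counts records with `bcftools view` and uses `vcftools` for the Ts/Tv summary and allele frequencies. The function name `analyze_vcf`, the layout of summary.txt, and the parsing of the `.TsTv.summary` file are assumptions, not the actual contents of vcf_analysis.sh.

```shell
# Sketch only: one possible way to produce the outputs described above.
# analyze_vcf and the summary.txt layout are assumptions, not the real script.
analyze_vcf() {
    vcf=$1
    outdir=${2:-vcf_analysis_output}

    if [ ! -f "$vcf" ]; then
        echo "Usage: analyze_vcf <file.vcf> [output_dir]" >&2
        return 1
    fi
    mkdir -p "$outdir"

    # Count records without the header (-H); -v restricts to one variant type.
    total=$(bcftools view -H "$vcf" | wc -l | tr -d ' ')
    snps=$(bcftools view -H -v snps "$vcf" | wc -l | tr -d ' ')
    indels=$(bcftools view -H -v indels "$vcf" | wc -l | tr -d ' ')

    # With --out PREFIX, vcftools writes PREFIX.TsTv.summary and PREFIX.log.
    vcftools --vcf "$vcf" --TsTv-summary --out "$outdir/TsTv" 2>/dev/null

    # Assumption: the summary file has tab-separated "Ts" and "Tv" count rows.
    tstv=$(awk -F'\t' '
        $1 == "Ts" { ts = $2 }
        $1 == "Tv" { tv = $2 }
        END { if (tv > 0) printf "%.2f", ts / tv; else printf "NA" }
    ' "$outdir/TsTv.TsTv.summary")

    # Per-site allele frequencies, written to PREFIX.frq.
    vcftools --vcf "$vcf" --freq --out "$outdir/allele_freq" 2>/dev/null

    {
        echo "Total variants: $total"
        echo "SNPs: $snps"
        echo "INDELs: $indels"
        echo "Ts/Tv ratio: $tstv"
    } > "$outdir/summary.txt"

    echo "Analysis complete. Results are in the $outdir directory."
}
```

With both tools installed, `analyze_vcf your_file.vcf` would write summary.txt, allele_freq.frq, and TsTv.log into vcf_analysis_output, along with the additional files vcftools creates for each run.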
This tutorial provides a simple pipeline for basic VCF analysis: variant counts, SNP and INDEL counts, the Ts/Tv ratio, and allele frequencies. You can extend the script to include more advanced analyses as needed. Feel free to contribute to the project on GitHub.