This tutorial explains the analysis performed in the population_genetics.sh script. You can download the script from the link below:
Download population_genetics.sh
The script performs population genetics analyses using a VCF file as input. It calculates nucleotide diversity (π) and FST in overlapping sliding windows, and Tajima's D in fixed, non-overlapping windows across the genome. It then plots the results and writes an HTML summary report.
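input.vcf
: The input VCF file containing genetic variants.
pop1.txt & pop2.txt
: Text files listing the individuals in each of the two populations, one sample ID per line, matching the sample names in the VCF header. They are used for the FST analysis.
The script calculates nucleotide diversity (π) with VCFtools in 10 kb windows that slide in 5 kb steps: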
vcftools --vcf input.vcf --window-pi 10000 --window-pi-step 5000 --out population_analysis_results/stats/pi
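The window arithmetic behind this command can be illustrated with a small Python sketch. This is not VCFtools' own code, and the function names `site_pi` and `windowed_pi` are hypothetical.

The sketch makes these assumptions:
- Sites are biallelic, with known alternate-allele counts.
- The same number of called chromosomes `n` is present at every site, with no missing data.
- VCFtools reports π per base pair for each window. The sketch approximates this by dividing the summed per-site values by the full window length.

Per-site π is the fraction of chromosome pairs that differ, c_ref · c_alt / (n(n − 1)/2).

```python
from typing import List, Tuple


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Per-site diversity at a biallelic site: the fraction of chromosome
    pairs that carry different alleles, c_ref * c_alt / (n * (n - 1) / 2)."""
    ref_count = n_chrom - alt_count
    pairs = n_chrom * (n_chrom - 1) / 2
    return ref_count * alt_count / pairs


def windowed_pi(
    positions: List[int],
    alt_counts: List[int],
    n_chrom: int,
    chrom_len: int,
    window: int = 10_000,
    step: int = 5_000,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, pi_per_bp) for overlapping 1-based windows.

    Assumes n_chrom called chromosomes at every site (no missing data) and
    reports only full-length windows, dropping a trailing partial window."""
    results = []
    start = 1
    while start + window - 1 <= chrom_len:
        end = start + window - 1
        # Sum per-site diversity over the variant sites inside [start, end].
        total = sum(
            site_pi(ac, n_chrom)
            for pos, ac in zip(positions, alt_counts)
            if start <= pos <= end
        )
        # Divide by the window length to express diversity per base pair.
        results.append((start, end, total / window))
        start += step
    return results


# Example: three variant sites on a 20 kb contig, 10 diploid samples (20 chromosomes).
for row in windowed_pi([1200, 7400, 15000], [10, 1, 5], n_chrom=20, chrom_len=20_000):
    print(row)
```

Invariant sites contribute zero to the sum but still count toward the window length. This is why the divisor is the window length rather than the number of variant sites.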
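Tajima's D is calculated in non-overlapping 10 kb windows. Values far from zero can indicate selection, although demographic events such as population expansions or bottlenecks can produce similar signals: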
vcftools --vcf input.vcf --TajimaD 10000 --out population_analysis_results/stats/tajima
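Tajima's D compares two estimates of θ for a window:
- π, the average number of pairwise differences, summed over the window's sites rather than expressed per base.
- Watterson's estimator S/a₁, where S is the number of segregating sites.

The difference between them is scaled by Tajima's standard variance terms. Below is a minimal Python sketch for a single window, not the VCFtools source. It assumes `n` sampled chromosomes at every site with no missing data, and the function name `tajimas_d` is hypothetical.

```python
import math


def tajimas_d(pi_sum: float, seg_sites: int, n_chrom: int) -> float:
    """Tajima's D for one window.

    pi_sum    -- average pairwise differences summed over the window's sites
    seg_sites -- number of segregating (variant) sites, S
    n_chrom   -- number of sampled chromosomes, n (2 x diploid samples)
    Returns NaN when S == 0, where D is undefined.
    """
    if seg_sites == 0:
        return float("nan")
    n = n_chrom
    # Harmonic-number constants from Tajima's variance formula.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator of theta
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi_sum - theta_w) / math.sqrt(variance)


# Example: 10 segregating sites among 20 chromosomes, summed pi of 5.0.
print(tajimas_d(pi_sum=5.0, seg_sites=10, n_chrom=20))
```

An excess of intermediate-frequency variants (π above S/a₁) gives a positive D. An excess of rare variants gives a negative D.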
FST between the two populations is computed with Weir and Cockerham's estimator in 10 kb windows that slide in 5 kb steps:
vcftools --vcf input.vcf --weir-fst-pop pop1.txt --weir-fst-pop pop2.txt --fst-window-size 10000 --fst-window-step 5000 --out population_analysis_results/stats/fst
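VCFtools' --weir-fst-pop option is based on Weir and Cockerham's (1984) variance components. The following Python sketch computes those components for one biallelic site. It assumes you already know each population's sample size (individuals), allele frequency, and observed heterozygote frequency. The helper names `wc_components`, `site_fst`, and `weighted_fst` are hypothetical.

A weighted window estimate is commonly defined as the ratio of summed components, Σa / Σ(a + b + c), rather than the mean of per-site ratios. VCFtools reports both a WEIGHTED_FST and a MEAN_FST column for each window.

```python
from typing import List, Sequence, Tuple

Site = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def wc_components(n: Sequence[float], p: Sequence[float], h: Sequence[float]) -> Tuple[float, float, float]:
    """Weir & Cockerham (1984) variance components (a, b, c) for one biallelic site.

    n -- sample size (individuals) per population
    p -- allele frequency per population
    h -- observed heterozygote frequency per population
    """
    r = len(n)
    n_sum = sum(n)
    n_bar = n_sum / r
    n_c = (n_sum - sum(ni * ni for ni in n) / n_sum) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / n_sum
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / n_sum
    pq = p_bar * (1 - p_bar)
    # Between-population, between-individual, and within-individual components.
    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c


def site_fst(n: Sequence[float], p: Sequence[float], h: Sequence[float]) -> float:
    """Per-site FST = a / (a + b + c)."""
    a, b, c = wc_components(n, p, h)
    return a / (a + b + c)


def weighted_fst(sites: List[Site]) -> float:
    """Window estimate as a ratio of sums across sites."""
    num = den = 0.0
    for n, p, h in sites:
        a, b, c = wc_components(n, p, h)
        num += a
        den += a + b + c
    return num / den


# Example: a fixed difference and a site with identical frequencies, 10 individuals each.
fixed = ([10, 10], [1.0, 0.0], [0.0, 0.0])
shared = ([10, 10], [0.5, 0.5], [0.5, 0.5])
print(site_fst(*fixed), site_fst(*shared), weighted_fst([fixed, shared]))
```

Because of the small-sample corrections, per-site values can be slightly negative when populations do not differ. The ratio-of-sums weighting keeps such sites from dominating a window.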
The script generates plots for each analysis using Python with Matplotlib and Seaborn.
Scatter plot of nucleotide diversity across genomic positions:
plt.scatter(pi['POS'], pi['PI'], s=10, alpha=0.7, color='darkorange')
Tajima's D values plotted along genomic coordinates:
plt.scatter(tajima['POS'], tajima['D'], s=10, alpha=0.7, color='royalblue')
FST values plotted along the genome to show where the two populations are most differentiated:
plt.scatter(fst['POS'], fst['FST'], s=10, alpha=0.7, color='forestgreen')
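VCFtools writes these results as tab-separated tables with a header row, using these columns:

| File | Position column | Value column(s) |
| --- | --- | --- |
| pi.windowed.pi | BIN_START | PI |
| tajima.Tajima.D | BIN_START | TajimaD |
| fst.windowed.weir.fst | BIN_START | WEIGHTED_FST, MEAN_FST |

The POS, D, and FST names in the plotting calls above presumably come from a renaming step in the script that is not shown here. The stdlib-only sketch below reads one of these files into position and value lists suitable for plt.scatter. It skips windows whose value is not a finite number, such as Tajima's D in windows without SNPs. The function name `read_vcftools_table` is hypothetical.

```python
import csv
import math
from typing import List, Optional, Tuple


def read_vcftools_table(
    path: str, x_col: str, y_col: str, chrom: Optional[str] = None
) -> Tuple[List[int], List[float]]:
    """Read a tab-separated VCFtools output file and return (positions, values).

    If chrom is given, only rows for that chromosome are kept. Rows whose
    value is non-numeric or not finite (e.g. 'nan', '-nan') are skipped."""
    positions: List[int] = []
    values: List[float] = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if chrom is not None and row["CHROM"] != chrom:
                continue
            try:
                value = float(row[y_col])
            except ValueError:
                continue  # non-numeric entry; skip this window
            if math.isfinite(value):
                positions.append(int(row[x_col]))
                values.append(value)
    return positions, values


# Usage with Matplotlib (not run here):
# pos, pi_vals = read_vcftools_table(
#     "population_analysis_results/stats/pi.windowed.pi", "BIN_START", "PI", chrom="chr1")
# plt.scatter(pos, pi_vals, s=10, alpha=0.7, color="darkorange")
```

BIN_START restarts at each chromosome, so for a multi-chromosome VCF it is usually clearer to plot one chromosome at a time with the `chrom` filter.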
population_analysis_results/stats/pi.windowed.pi
: Nucleotide diversity results.
population_analysis_results/stats/tajima.Tajima.D
: Tajima's D values.
population_analysis_results/stats/fst.windowed.weir.fst
: Windowed FST values.
population_analysis_results/plots/*.png
: Visualization plots.
population_analysis_results/report.html
: HTML summary report.
This script automates the whole workflow in a single run. Starting from one VCF file, it produces windowed π, Tajima's D, and FST tables, their plots, and an HTML summary report.