Population Genetics Analysis Tutorial

This tutorial explains the analysis performed in the population_genetics.sh script. You can download the script from the link below:

Download population_genetics.sh

1. Introduction

The script performs population genetics analyses using a VCF file as input. It calculates nucleotide diversity (π), Tajima's D, and FST in 10 kb windows across the genome, then plots each statistic against genomic position.

2. Input Files

3. Analysis Steps

3.1 Nucleotide Diversity (π)

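The script uses VCFtools to calculate nucleotide diversity (π) in 10 kb windows that advance in 5 kb steps, so each window overlaps its neighbor by half. VCFtools appends its own suffix to the --out prefix, so the results are written to population_analysis_results/stats/pi.windowed.pi: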

vcftools --vcf input.vcf --window-pi 10000 --window-pi-step 5000 --out population_analysis_results/stats/pi
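To make the statistic concrete, the sketch below computes π by its textbook definition: the average number of differences between all pairs of haplotypes, divided by the window length. It is only an illustration and is not claimed to reproduce VCFtools' exact arithmetic. The function name nucleotide_diversity and the toy haplotypes are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes, window_length):
    """Textbook pi: mean pairwise differences per site within one window.

    haplotypes    -- equal-length strings of alleles at the variable sites
    window_length -- total number of sites in the window (variable or not)
    """
    pairs = list(combinations(haplotypes, 2))
    if not pairs or window_length <= 0:
        raise ValueError("need at least two haplotypes and a positive window length")
    # Count mismatching positions for every pair of haplotypes.
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / len(pairs) / window_length


# Hypothetical toy data: 4 haplotypes, 3 variable sites, in a 10 bp window.
# 0 = reference allele, 1 = alternate allele.
haps = ["000", "001", "011", "110"]
pi = nucleotide_diversity(haps, window_length=10)
print(f"pi = {pi:.4f}")
```

In a real 10 kb window the divisor would be 10000, which is why windowed π values are typically small fractions.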

3.2 Tajima's D

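Tajima's D is calculated in non-overlapping 10 kb windows (the --TajimaD option takes only a bin size and has no step option). Strongly negative values indicate an excess of rare variants, as expected after a selective sweep or a population expansion; strongly positive values indicate an excess of intermediate-frequency variants, as expected under balancing selection or after a population contraction. The results are written to population_analysis_results/stats/tajima.Tajima.D: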

vcftools --vcf input.vcf --TajimaD 10000 --out population_analysis_results/stats/tajima
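For readers who want to see what the statistic measures, here is a minimal sketch of Tajima's D for a single window, using the standard constants from Tajima (1989). It compares the mean number of pairwise differences (k) with the number of segregating sites (S) scaled by sample size. The function name tajimas_d is hypothetical, and n is assumed to be the number of sampled chromosomes (twice the number of diploid individuals).

```python
import math


def tajimas_d(n, S, k):
    """Tajima's D for one window.

    n -- number of sampled chromosomes
    S -- number of segregating sites in the window
    k -- mean number of pairwise differences in the window
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    if S == 0:
        return float("nan")  # D is undefined without segregating sites

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * S + e2 * S * (S - 1)
    if variance <= 0:
        return float("nan")
    # Difference between the two estimators of theta, in standard-deviation units.
    return (k - S / a1) / math.sqrt(variance)


# Example: 10 chromosomes, 16 segregating sites, k = 3.888889
d = tajimas_d(10, 16, 3.888889)
print(f"D = {d:.3f}")  # roughly -1.45, i.e. an excess of rare variants
```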

3.3 Windowed FST

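Weir and Cockerham's FST between two populations is computed in 10 kb windows that advance in 5 kb steps. The files pop1.txt and pop2.txt each list the sample IDs of one population, one ID per line, matching the sample names in the VCF header. The results are written to population_analysis_results/stats/fst.windowed.weir.fst: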

vcftools --vcf input.vcf --weir-fst-pop pop1.txt --weir-fst-pop pop2.txt --fst-window-size 10000 --fst-window-step 5000 --out population_analysis_results/stats/fst
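The two population files passed to --weir-fst-pop are plain text lists of sample IDs. As a sketch of one way to produce them, the helper below writes one file per population from a sample-to-population mapping. The function name write_population_files and the sample IDs are hypothetical; real IDs must match the VCF header exactly.

```python
import tempfile
from pathlib import Path


def write_population_files(sample_to_pop, out_dir):
    """Write <population>.txt files with one sample ID per line.

    Returns a dict mapping each population name to the path written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group sample IDs by population, keeping the input order.
    by_pop = {}
    for sample, pop in sample_to_pop.items():
        by_pop.setdefault(pop, []).append(sample)

    paths = {}
    for pop, samples in by_pop.items():
        path = out_dir / f"{pop}.txt"
        path.write_text("\n".join(samples) + "\n")
        paths[pop] = path
    return paths


# Hypothetical sample IDs; replace with the names from your VCF header.
assignments = {
    "sampleA": "pop1",
    "sampleB": "pop1",
    "sampleC": "pop2",
    "sampleD": "pop2",
}
# A temporary directory keeps this demonstration from touching real files.
paths = write_population_files(assignments, tempfile.mkdtemp())
print(paths["pop1"].read_text())
```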

4. Visualization

The script generates plots for each analysis using Python with Matplotlib and Seaborn.

4.1 Nucleotide Diversity (π)

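Scatter plot of nucleotide diversity across genomic positions. The snippet assumes a table named pi with POS and PI columns; VCFtools itself labels the window coordinates BIN_START and BIN_END, so POS has to be derived (for example, as the window start or midpoint) when the table is loaded: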

plt.scatter(pi['POS'], pi['PI'], s=10, alpha=0.7, color='darkorange')
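As a sketch of how such a plot could be built directly from the VCFtools output, the code below parses a .windowed.pi table (columns CHROM, BIN_START, BIN_END, N_VARIANTS, PI) and uses the window midpoint as the x coordinate. The helper names read_windowed_pi and plot_pi, the example rows, and the output file name are hypothetical; this is not necessarily how population_genetics.sh loads its data.

```python
import csv
import io


def read_windowed_pi(handle):
    """Return (midpoints, pi_values) from a VCFtools .windowed.pi table."""
    xs, ys = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        start, end = int(row["BIN_START"]), int(row["BIN_END"])
        xs.append((start + end) / 2)  # window midpoint as plotting position
        ys.append(float(row["PI"]))
    return xs, ys


def plot_pi(xs, ys, out_png):
    """Scatter plot of windowed pi, styled like the tutorial's snippet."""
    # Imported here so the parsing helper works without Matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 3))
    ax.scatter(xs, ys, s=10, alpha=0.7, color="darkorange")
    ax.set_xlabel("Window midpoint (bp)")
    ax.set_ylabel("Nucleotide diversity (π)")
    fig.tight_layout()
    fig.savefig(out_png, dpi=150)
    plt.close(fig)


# Made-up rows in the .windowed.pi layout, for demonstration only.
example = (
    "CHROM\tBIN_START\tBIN_END\tN_VARIANTS\tPI\n"
    "chr1\t1\t10000\t12\t0.0011\n"
    "chr1\t5001\t15000\t9\t0.0008\n"
)
xs, ys = read_windowed_pi(io.StringIO(example))
# plot_pi(xs, ys, "pi.png")  # uncomment to write the figure
```

This sketch ignores the CHROM column, so it suits a single chromosome; for a whole genome the positions would need to be plotted per chromosome or offset cumulatively.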

4.2 Tajima's D

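Tajima's D values plotted along genomic coordinates. The snippet assumes a table named tajima with POS and D columns; in VCFtools' own output the corresponding columns are named BIN_START and TajimaD: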

plt.scatter(tajima['POS'], tajima['D'], s=10, alpha=0.7, color='royalblue')
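One practical detail when plotting Tajima's D: windows that contain no SNPs have no defined value, and VCFtools typically reports them as nan. The sketch below reads a .Tajima.D table (columns CHROM, BIN_START, N_SNPS, TajimaD) and drops those windows before plotting. The helper name read_tajima_d and the example rows are hypothetical.

```python
import csv
import io
import math


def read_tajima_d(handle):
    """Return (chrom, bin_start, D) tuples, skipping windows with undefined D."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        d = float(row["TajimaD"])  # float() parses the string "nan"
        if math.isnan(d):
            continue  # no segregating sites in this window
        rows.append((row["CHROM"], int(row["BIN_START"]), d))
    return rows


# Made-up rows in the .Tajima.D layout, for demonstration only.
example = (
    "CHROM\tBIN_START\tN_SNPS\tTajimaD\n"
    "chr1\t0\t14\t-1.21\n"
    "chr1\t10000\t0\tnan\n"
    "chr1\t20000\t8\t0.47\n"
)
windows = read_tajima_d(io.StringIO(example))
for chrom, start, d in windows:
    print(chrom, start, d)
```

A horizontal line at D = 0 (for example with Matplotlib's axhline) can make it easier to see which windows deviate from the neutral expectation.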

4.3 Windowed FST

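FST values plotted along the genome to show where the two populations are most differentiated. The snippet assumes a table named fst with POS and FST columns; VCFtools' windowed output instead provides BIN_START, BIN_END, WEIGHTED_FST, and MEAN_FST, so FST must be mapped to one of those two estimates when the table is loaded: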

plt.scatter(fst['POS'], fst['FST'], s=10, alpha=0.7, color='forestgreen')

5. Output Files

6. Conclusion

The script automates a windowed population genomics analysis: starting from a single VCF file, it computes π, Tajima's D, and FST with VCFtools and plots each statistic along the genome.