Introduction to Bash Scripting for Biology

What is Bash Scripting?

Bash scripting is like writing a recipe for your computer: a script is a series of commands, saved in a file, that the computer runs in order.

1. Script Components Explained

#!/bin/bash
# The line above is the shebang line; it must be the first line, with nothing after it
# This is a comment: explanatory text that Bash ignores
echo "Analyzing DNA sequences"  # This is a command

Key Elements of a Bash Script

Shebang Line (#!): First line starting with #!
- Tells the computer this is a Bash script
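- Usually written as #!/bin/bash (#!/usr/bin/env bash is a common alternative when Bash is installed in a different location)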

Comments: Lines starting with #
- Ignored by the computer
- Help explain your code

Commands: Instructions to execute
- Same commands you'd type in terminal
- Example: echo, grep, wc
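
As a small illustration, the three example commands can be tried on a throwaway file. The file name demo.txt and its contents are made up for this sketch.

```shell
#!/bin/bash
# Hypothetical demo file; the name and contents are assumptions for this sketch
printf 'gene1\ngene2\nprotein1\n' > demo.txt

echo "Lines containing 'gene':"   # echo prints text
grep 'gene' demo.txt              # grep prints the lines that match a pattern
wc -l < demo.txt                  # wc -l counts the lines in the file
```

Each line here is something you could also type directly in the terminal; the script simply runs them one after another.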

2. Creating a Basic Analysis Script

#!/bin/bash

# Create sample DNA file
echo ">Sample1
ATGCGATCG
>Sample2
TAGCTAGCTAG" > sequences.fasta

# Count sequences
echo "Number of sequences: $(grep -c '^>' sequences.fasta)"

# Calculate total length
total_length=$(grep -v '^>' sequences.fasta | tr -d '\n' | wc -c)
echo "Total length: $total_length bp"

# Calculate GC% (multiply before dividing so bc's scale=2 does not truncate the ratio early)
gc_count=$(grep -v '^>' sequences.fasta | tr -d '\n' | tr -cd 'GC' | wc -c)
gc_percent=$(echo "scale=2; $gc_count*100/$total_length" | bc)
echo "GC Content: $gc_percent%"

Step-by-Step Explanation

Creating FASTA File:
- echo writes lines to sequences.fasta
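- Lines beginning with > inside the quotes are FASTA headers (sequence names)
- The > after the closing quote is redirection: it sends echo's output into sequences.fasta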

Counting Sequences:
- grep -c '^>' counts lines starting with >
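
The counting step can be checked on a file where one sequence spans two lines. This is a minimal sketch; the file name count_demo.fasta and its records are hypothetical.

```shell
#!/bin/bash
# Hypothetical three-record FASTA file, created only for this sketch
printf '>seqA\nATGC\n>seqB\nGGCC\nTTAA\n>seqC\nCCGG\n' > count_demo.fasta

# grep -c prints the number of matching lines; '^>' matches lines that begin with ">"
n_seqs=$(grep -c '^>' count_demo.fasta)
n_lines=$(wc -l < count_demo.fasta)

# $(( )) is used when printing because some wc versions pad their output with spaces
echo "Header lines (sequences): $n_seqs"
echo "All lines in file: $((n_lines))"
```

Because only header lines are counted, seqB is counted once even though its sequence is wrapped over two lines.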

Calculating Length:
1. grep -v '^>': Exclude header lines
2. tr -d '\n': Remove line breaks
3. wc -c: Count characters
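
To see what each stage of the length pipeline contributes, the stages can be run one at a time. This sketch reuses the two sample sequences from the script; the file name length_demo.fasta is an assumption.

```shell
#!/bin/bash
printf '>Sample1\nATGCGATCG\n>Sample2\nTAGCTAGCTAG\n' > length_demo.fasta

# Stage 1: drop header lines, leaving only the sequence lines
grep -v '^>' length_demo.fasta

# Stage 2: also remove newlines, so the bases form one continuous string
grep -v '^>' length_demo.fasta | tr -d '\n'
echo    # finish the line, since tr removed every newline

# Stage 3: count the remaining characters
with_newlines=$(grep -v '^>' length_demo.fasta | wc -c)
total_length=$(grep -v '^>' length_demo.fasta | tr -d '\n' | wc -c)

echo "Characters including newlines: $((with_newlines))"
echo "Total length: $((total_length)) bp"
```

Skipping the tr step would count one extra character per sequence line, which is why the newlines are removed before wc -c.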

GC% Calculation:
1. tr -cd 'GC': Delete every character except G and C (uppercase only)
2. bc: Command-line calculator; scale=2 keeps two decimal places in the result

3. Running Your Script

Execution Instructions

  1. Save as dna_analysis.sh
  2. Make executable:
    chmod +x dna_analysis.sh
  3. Run:
    ./dna_analysis.sh
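
The same steps can be rehearsed on a throwaway script first. This is a sketch only; hello_demo.sh is a hypothetical file name, and the last line shows a common alternative that does not need chmod.

```shell
#!/bin/bash
# Hypothetical two-line script, written to disk just to demonstrate the steps
printf '#!/bin/bash\necho "Hello from my script"\n' > hello_demo.sh

chmod +x hello_demo.sh     # add execute permission
./hello_demo.sh            # run it; the shebang line selects Bash

bash hello_demo.sh         # alternative: pass the file to bash directly
```

The ./ prefix tells the shell to look for the script in the current directory rather than searching the directories listed in PATH.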

4. Expected Output

Number of sequences: 2
Total length: 20 bp
GC Content: 50.00%

Key Bash Concepts Used

Concept               Example            Purpose
Variables             total_length=...   Store values
Command Substitution  $(...)             Capture command output
Pipes (|)             cmd1 | cmd2        Pass output between commands
Redirection (>)       echo ... > file    Write to file
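
All four concepts can appear together in just a few lines. The following is a minimal sketch; the file name concepts_demo.fasta and its two records are hypothetical.

```shell
#!/bin/bash
# Redirection: write two hypothetical records to a file
printf '>id1\nAAGG\n>id2\nCCTT\n' > concepts_demo.fasta

# Pipes pass the sequence lines from grep to tr to wc;
# command substitution $(...) captures the final count;
# the variable a_count stores it for later use
a_count=$(grep -v '^>' concepts_demo.fasta | tr -cd 'A' | wc -c)

echo "A bases: $((a_count))"
```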

Why This Matters for Biology