This tutorial explains how to use Python and the pandas
library to extract matching data from two files.
import pandas as pd
Load the data from your files using pandas.read_csv()
. Assume the files are tab-separated (TSV) and encoded in Latin-1.
# Reading the first file
df1 = pd.read_csv('File1.txt', sep='\t', encoding='latin1')
# Reading the second file
df2 = pd.read_csv('File2.txt', sep='\t', encoding='latin1')
Filter rows in df2
where values in the column1
match the values in column1
of df1
.
# Extracting matching data
matching_data = df2[df2['column1'].isin(df1['column1'])]
Save the matching data to a new file.
# Saving the extracted data
matching_data.to_csv('extracted_data.txt', sep='\t', index=False)
Here is an example using species data:
File1.txt (list of species to extract):
Specie
Species_A
Species_C
File2.txt (data with species and codes):
Specie Code
Species_A ABC123
Species_B DEF456
Species_C GHI789
The script will produce a file named extracted_data.txt with the following content:
Specie Code
Species_A ABC123
Species_C GHI789
Using pandas
, this script efficiently matches data between two files and saves the result in a new file. Adjust the file names, column names, and formats to suit your dataset.