FASTQ filtering
This tutorial assumes you have already worked through the introduction to reading files (see Reading files).
The following is an example of a small script that filters FASTQ reads. It illustrates the use of multiple functions decorated with @streamable(). Each function is written to work on a single chunk of reads, but with the streamable decorator we can pass it the chunks read from a file and BioNumPy handles the rest for us.
This example also illustrates how to chain multiple functions.
import numpy as np
import bionumpy as bnp
from bionumpy.npdataclassstream import streamable


@streamable()
def filter_reads_on_mean_base_quality(reads, minimum_base_quality=20):
    # Keep reads whose mean base quality is above the threshold
    mask = np.mean(reads.quality, axis=-1) > minimum_base_quality
    return reads[mask]


@streamable()
def filter_reads_on_minimum_base_quality(reads, min_base_quality=5):
    # Keep reads whose lowest base quality is above the threshold
    mask = np.min(reads.quality, axis=-1) > min_base_quality
    return reads[mask]


def main():
    # read_chunks() yields the file one chunk at a time
    reads = bnp.open("example_data/big.fq.gz").read_chunks()
    # Chain the two filters; each one is applied to every chunk
    reads = filter_reads_on_mean_base_quality(reads, 10)
    reads = filter_reads_on_minimum_base_quality(reads, 1)
    print("Number of reads after filtering: ", sum(len(r) for r in reads))


if __name__ == "__main__":
    main()
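To make the idea of writing a function for one chunk and applying it to a stream of chunks more concrete, here is a minimal sketch that uses only NumPy. The chunkwise decorator and the two filter functions are hypothetical names introduced for illustration. This is not BioNumPy's implementation of @streamable(), which may work differently. The sketch also assumes fixed-length reads stored as rows of a 2D array, whereas real FASTQ reads vary in length.

```python
import functools

import numpy as np


def chunkwise(func):
    """Hypothetical stand-in for a streaming decorator (not BioNumPy's code).

    Wraps a function written for a single chunk so that it is applied
    lazily to every chunk of an iterable.
    """
    @functools.wraps(func)
    def wrapper(chunks, *args, **kwargs):
        # Generator expression: no chunk is processed until the result is iterated
        return (func(chunk, *args, **kwargs) for chunk in chunks)
    return wrapper


@chunkwise
def keep_rows_with_mean_above(qualities, threshold):
    # qualities: 2D array with one row of base qualities per read
    return qualities[np.mean(qualities, axis=-1) > threshold]


@chunkwise
def keep_rows_with_min_above(qualities, threshold):
    return qualities[np.min(qualities, axis=-1) > threshold]


# Two toy chunks of base qualities (made-up numbers)
chunks = [
    np.array([[30, 30, 30], [2, 40, 40], [5, 5, 5]]),
    np.array([[20, 20, 1], [25, 15, 12]]),
]

# Chaining: the output stream of the first filter feeds the second
filtered = keep_rows_with_mean_above(iter(chunks), 10)
filtered = keep_rows_with_min_above(filtered, 1)

# Work happens here, one chunk at a time, as the stream is consumed
n_kept = sum(len(chunk) for chunk in filtered)
print("Rows kept:", n_kept)
```

Because each wrapped function returns a lazy stream, chaining the filters does not load all chunks into memory at once. Each chunk passes through both filters when the final sum pulls it through the pipeline.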