Rollable Functions

In many cases, we want to take a simple function that maps a sequence to a scalar and apply it to every subsequence of a given length in a set of sequences. Examples include:

* Hashing kmers
* Using a position weight matrix to compute a score for a sequence
* Finding all occurrences of a substring in a sequence set
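To make the first example concrete, here is a minimal pure-NumPy sketch of kmer hashing that does not use bioNumpy at all. It assumes a hypothetical 2-bit encoding (A=0, C=1, G=2, T=3) and reads each kmer as a base-4 number. The helper names encode, kmer_hash and all_kmer_hashes are illustrative only. Because kmer_hash reduces over the last axis, the same function hashes a single kmer or a whole stack of windows produced by NumPy's sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical 2-bit encoding, not bionumpy's internal one: A=0, C=1, G=2, T=3.
# Letters other than ACGT silently map to 0 in this toy table.
_LOOKUP = np.zeros(256, dtype=np.int64)
for _code, _letter in enumerate("ACGT"):
    _LOOKUP[ord(_letter)] = _code


def encode(seq: str) -> np.ndarray:
    """Convert an ACGT string into an integer array of letter codes."""
    return _LOOKUP[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]


def kmer_hash(kmers: np.ndarray) -> np.ndarray:
    """Read the last axis as a base-4 number (first letter most significant).

    Works for a single kmer of shape (k,) or any stack of shape (..., k).
    """
    k = kmers.shape[-1]
    weights = 4 ** np.arange(k - 1, -1, -1)
    return np.sum(kmers * weights, axis=-1)


def all_kmer_hashes(seq: str, k: int) -> np.ndarray:
    """Hash every length-k subsequence of seq."""
    windows = sliding_window_view(encode(seq), k)  # shape (len(seq) - k + 1, k)
    return kmer_hash(windows)


print(all_kmer_hashes("ACGT", 2))  # [ 1  6 11]
```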

bioNumpy provides this functionality through the RollableFunction class. All you have to do is subclass RollableFunction and implement a broadcastable version of the sequence function as the __call__ method, that is, one that treats the last axis of its input as the letters of a single window and reduces over that axis only. Calling the rolling_window method then applies the function to every subsequence of length window_size in the sequence set. window_size can either be set as self.window_size or passed as an argument to rolling_window.
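The mechanism described above can be sketched in plain NumPy. ToyRollableFunction and GCCount below are hypothetical classes written for illustration; they are not bioNumpy's implementation and only mimic the documented behaviour: rolling_window falls back to self.window_size when no window size is given, builds all windows, and hands them to the broadcastable __call__ in one go.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


class ToyRollableFunction:
    """Hypothetical stand-in for RollableFunction, NOT bioNumpy's implementation."""

    window_size = None  # subclasses may set this as a default

    def __call__(self, sequence):
        raise NotImplementedError

    def rolling_window(self, sequence, window_size=None):
        if window_size is None:
            window_size = self.window_size
        letters = np.array(list(sequence))  # one array element per letter
        # Each row of `windows` is one length-window_size subsequence (a view, no copy)
        windows = sliding_window_view(letters, window_size)
        # The broadcastable __call__ reduces each row to a single value
        return self(windows)


class GCCount(ToyRollableFunction):
    """Hypothetical example: number of G or C letters in a window."""

    window_size = 3

    def __call__(self, sequence):
        return np.sum((sequence == "G") | (sequence == "C"), axis=-1)


counter = GCCount()
print(counter.rolling_window("ACGGTA"))                 # [2 3 2 1]
print(counter.rolling_window("ACGGTA", window_size=2))  # [1 2 2 1 0]
```

Letting __call__ see the whole stack of windows at once keeps the loop inside NumPy instead of calling the function once per position in Python.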

For instance, to find occurrences of “CGGT” in a set of sequences, we can define the following:

from bionumpy.rollable import RollableFunction
from bionumpy.sequences import as_sequence_array
import numpy as np


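class StringMatcher(RollableFunction):
    def __init__(self, matching_sequence):
        # Encode the pattern once so it can be compared letter by letter
        self._matching_sequence = as_sequence_array(matching_sequence)
        # Default window size for rolling_window: the length of the pattern
        self.window_size = len(self._matching_sequence)

    def __call__(self, sequence):
        # Reduce over the last (letter) axis only, so this works on a single
        # sequence as well as on a stack of windows
        return np.all(sequence == self._matching_sequence, axis=-1)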

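The __call__ method simply checks that every letter in the sequence equals the corresponding letter in the matching sequence. Passing axis=-1 to np.all reduces only over the last (letter) axis, which makes the function broadcastable: it accepts a single sequence as well as an array of windows stacked along leading axes.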
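The effect of axis=-1 can be seen with plain NumPy character arrays, independent of bioNumpy's sequence types. The sketch below compares the pattern against one sequence and against two stacked windows; the pattern is broadcast against every row.

```python
import numpy as np

pattern = np.array(list("CGGT"))

# A single sequence: shape (4,) reduces to a single boolean
single = np.all(np.array(list("CGGT")) == pattern, axis=-1)

# Two windows stacked along a leading axis: shape (2, 4).
# The (4,) pattern broadcasts against every row, and axis=-1
# collapses only the letter axis, leaving one result per window.
windows = np.array([list("CGGT"), list("GGTA")])
per_window = np.all(windows == pattern, axis=-1)

print(single)      # True
print(per_window)  # [ True False]
```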

>>> matcher = StringMatcher("CGGT")
>>> matcher("CGGT")
Sequence(True)

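Passing a sequence of a different length to __call__ returns False, because the two arrays have shapes that cannot be broadcast together, so the elementwise comparison fails. NumPy emits a DeprecationWarning here, and newer NumPy versions may raise an error instead: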

>>> matcher("CGGTA")
<stdin>:7: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
False
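If silently getting False (or, on newer NumPy versions, an exception from the comparison itself) is undesirable, one option is to check the last-axis length explicitly before comparing. The function matches_pattern below is a hypothetical pure-NumPy sketch of such a guard, not part of bioNumpy.

```python
import numpy as np


def matches_pattern(sequence, pattern):
    """Hypothetical guarded matcher: compare letters along the last axis."""
    sequence = np.asarray(sequence)
    pattern = np.asarray(pattern)
    if sequence.shape[-1] != pattern.shape[-1]:
        # Fail loudly instead of relying on NumPy's comparison behaviour
        raise ValueError(
            f"expected windows of length {pattern.shape[-1]}, got {sequence.shape[-1]}"
        )
    return np.all(sequence == pattern, axis=-1)


pattern = list("CGGT")
print(matches_pattern(list("CGGT"), pattern))  # True
try:
    matches_pattern(list("CGGTA"), pattern)
except ValueError as err:
    print(err)  # expected windows of length 4, got 5
```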

However, we can use the rolling_window method to match every subsequence of length 4 against “CGGT”:

>>> matcher.rolling_window("CGGTA")
array([ True, False])
>>> matcher.rolling_window(["CGGTA", "ACGGTG"])
RaggedArray([[True, False], [False, True, False]])
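When the sequences have different lengths, each one yields a different number of windows, which is why the result above is a RaggedArray. The sketch below reproduces the same values with plain NumPy by looping over the sequences. The function rolling_match is hypothetical, and storing the rows as one flat array plus per-row lengths is shown only as a common ragged layout, not as a description of bioNumpy's RaggedArray internals.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_match(sequences, pattern):
    """Match `pattern` against every window of every sequence (pure NumPy).

    Sequences shorter than the pattern raise ValueError in sliding_window_view.
    """
    pattern = np.array(list(pattern))
    k = len(pattern)
    rows = []
    for seq in sequences:
        windows = sliding_window_view(np.array(list(seq)), k)  # (len(seq) - k + 1, k)
        rows.append(np.all(windows == pattern, axis=-1))
    return rows


rows = rolling_match(["CGGTA", "ACGGTG"], "CGGT")
print([r.tolist() for r in rows])  # [[True, False], [False, True, False]]

# Flat ragged layout: concatenated values plus the length of each row
flat = np.concatenate(rows)
lengths = np.array([len(r) for r in rows])
print(flat.tolist(), lengths.tolist())  # [True, False, False, True, False] [2, 3]
```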

For examples of rollable function implementations, see:

* Minimizers
* KmerEncoding
* PositionWeightMatrix
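As a further illustration of the same pattern, the sketch below scores windows with a position weight matrix in plain NumPy. The matrix layout (one row per letter in ACGT order, one column per position), the toy weights and the helper names encode and pwm_score are assumptions made for this example; bioNumpy's PositionWeightMatrix may be organised differently. Like the __call__ methods above, pwm_score only reduces over the last axis, so it can score one window or a whole stack of windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

LETTERS = "ACGT"  # assumed row order of the matrix


def encode(seq):
    """Map each letter to its row index in the matrix."""
    return np.array([LETTERS.index(letter) for letter in seq])


def pwm_score(kmers, pwm):
    """Sum pwm[letter, position] over the last axis of `kmers` (shape (..., k))."""
    positions = np.arange(pwm.shape[1])  # broadcasts against the letter axis
    return np.sum(pwm[kmers, positions], axis=-1)


# Toy 2-position matrix: rewards A at position 0 and C at position 1
pwm = np.array([
    [1.0, 0.0],  # A
    [0.0, 2.0],  # C
    [0.0, 0.0],  # G
    [0.0, 0.0],  # T
])

windows = sliding_window_view(encode("ACAC"), pwm.shape[1])  # AC, CA, AC
print(pwm_score(windows, pwm))  # [3. 0. 3.]
```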