Skip to content

API Reference

GSP

gsppy.gsp.GSP

Generalized Sequential Pattern (GSP) Algorithm.

The GSP algorithm is used to find frequent sequential patterns in transactional datasets based on a user-defined minimum support threshold. This implementation is optimized for efficiency with candidate generation, batch processing, and multiprocessing.

Attributes: freq_patterns (List[Dict[Tuple, int]]): Stores discovered frequent sequential patterns at each k-sequence level as dictionaries mapping patterns to their support counts. transactions (List[Tuple]): Preprocessed dataset where each transaction is represented as a tuple of items. unique_candidates (List[Tuple]): List of initial singleton candidates (1-item sequences). max_size (int): Length of the longest transaction in the dataset, used to set the maximum k-sequence for pattern generation.

__init__

__init__(raw_transactions)

Initialize the GSP algorithm with raw transactional data.

Parameters: raw_transactions (List[List]): Input transaction dataset where each transaction is a list of items (e.g., [['A', 'B'], ['B', 'C', 'D']]).

Attributes Initialized: - Processes the input raw transaction dataset. - Computes unique singleton candidates (unique_candidates). - Extracts the maximum transaction size (max_size) from the dataset for limiting the search space.

Raises: ValueError: If the input transaction dataset is empty, contains fewer than two transactions, or is not properly formatted.

search

search(min_support=0.2, max_k=None, backend=None)

Execute the Generalized Sequential Pattern (GSP) mining algorithm.

This method facilitates the discovery of frequent sequential patterns in the input transaction dataset. Patterns are extracted iteratively at each k-sequence level, starting from singleton sequences, until no further frequent patterns can be found.

Parameters: min_support (float): Minimum support threshold as a fraction of total transactions. For example, 0.3 means that a sequence is frequent if it appears in at least 30% of all transactions.

Returns: List[Dict[Tuple[str, ...], int]]: A list of dictionaries containing frequent patterns at each k-sequence level, with patterns as keys and their support counts as values.

Raises: ValueError: If the minimum support threshold is not in the range (0.0, 1.0].

Logs: - Information about the algorithm's start, intermediate progress (candidates filtered), and completion. - Status updates for each iteration until the algorithm terminates.

Example: Basic usage with the default backend:

```python
from gsppy.gsp import GSP

transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
]

gsp = GSP(transactions)
patterns = gsp.search(min_support=0.3)
```

Acceleration utilities

gsppy.accelerate.support_counts

support_counts(transactions, candidates, min_support_abs, batch_size=100, backend=None)

Choose the best available backend for support counting.

Backend selection is controlled by the backend argument when provided, otherwise by the env var GSPPY_BACKEND: - "rust": require Rust extension (raise if missing) - "gpu": try GPU path when available (currently singletons optimized), fall back to CPU for the rest - "python": force pure-Python fallback - otherwise: try Rust first and fall back to Python

Example: Running a search with an explicit backend:

```python
from gsppy.accelerate import support_counts

transactions = [("A", "B"), ("A", "C")]
candidates = [("A",), ("B",), ("A", "B")]
counts = support_counts(transactions, candidates, min_support_abs=1, backend="python")
```