Interactive Examples with Marimo
GSP-Py ships a set of interactive marimo notebooks that demonstrate its main features and use cases. Each notebook is reactive, so you can change inputs and watch the results update as you explore the library.
What is Marimo?
Marimo is a reactive Python notebook that runs as a pure Python program. Unlike traditional notebooks, marimo notebooks are:
- Reactive: Changes propagate automatically through dependent cells
- Reproducible: Cells run in an order determined by their variable dependencies, not by their position on the page
- Git-friendly: Stored as pure Python files (.py), not JSON
- Interactive: Edit code and see results update in real-time
- Shareable: Can be exported to HTML, PDF, or run as web apps
Available Notebooks
1. Sequence Abstraction Example
Learn how to use the Sequence class to work with sequential patterns in a structured way.
Topics covered:
- Traditional dict-based output vs. Sequence objects
- Accessing sequence properties (items, support, length)
- Filtering and analyzing sequences
- Creating custom sequence objects
- Pattern analysis and statistics
Notebook: notebooks/sequence_example.py
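The difference between dict-based output and structured objects can be sketched in plain Python. This is only an illustration: `PatternRecord` and `to_records` are hypothetical names, not GSP-Py's `Sequence` class, and the sample data assumes the dict-based output is a list with one `{pattern: support}` dict per pattern length.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PatternRecord:
    """Hypothetical stand-in for a sequence object (not GSP-Py's Sequence)."""

    items: Tuple[str, ...]
    support: int

    @property
    def length(self) -> int:
        # Number of items in the pattern.
        return len(self.items)


def to_records(levels: List[Dict[Tuple[str, ...], int]]) -> List[PatternRecord]:
    """Flatten per-length dicts of {pattern: support} into a list of records."""
    return [
        PatternRecord(items, support)
        for level in levels
        for items, support in level.items()
    ]


# Illustrative data only (assumed shape: one dict per pattern length).
levels = [
    {("Bread",): 4, ("Milk",): 4},
    {("Bread", "Milk"): 3},
]

records = to_records(levels)

# With objects, filtering reads as attribute access instead of tuple/dict juggling.
long_patterns = [r for r in records if r.length >= 2]
most_supported = max(records, key=lambda r: r.support)
```

The notebook itself shows the library's own `Sequence` objects; refer to it for the actual attribute names and constructor.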
2. Itemset Support Example
Understand how to work with itemsets, where multiple items can occur together at the same time step.
Topics covered:
- Flat vs. itemset sequences
- Market basket analysis with itemsets
- Web clickstream with parallel page views
- Reading itemsets from SPM format
Notebook: notebooks/itemset_example.py
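To make the flat vs. itemset distinction concrete, here is a minimal sketch of how support can be counted when each time step holds a set of items. It uses only the standard library and is not GSP-Py's implementation; the `contains` and `support` helpers and the sample data are assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def contains(sequence: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """True if the pattern's itemsets match distinct elements of the sequence, in order."""
    position = 0
    for element in sequence:
        # Each sequence element can satisfy at most one pattern itemset.
        if position < len(pattern) and pattern[position] <= element:
            position += 1
    return position == len(pattern)


def support(database: List[Sequence[Itemset]], pattern: Sequence[Itemset]) -> int:
    """Count the sequences in the database that contain the pattern."""
    return sum(contains(sequence, pattern) for sequence in database)


# Illustrative market-basket data: each inner frozenset is one visit.
database = [
    [frozenset({"bread", "milk"}), frozenset({"butter"})],
    [frozenset({"bread"}), frozenset({"milk"}), frozenset({"butter"})],
    [frozenset({"milk", "bread", "eggs"}), frozenset({"butter", "jam"})],
]

bought_together = [frozenset({"bread", "milk"}), frozenset({"butter"})]
bought_in_order = [frozenset({"bread"}), frozenset({"milk"})]

together_support = support(database, bought_together)
in_order_support = support(database, bought_in_order)
```

Bread and milk bought in the same visit is a different pattern from bread followed by milk in a later visit, which is why the two supports differ on this data.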
3. DataFrame Integration Example
Learn how to use GSP-Py with Polars and Pandas DataFrames for efficient pattern mining.
Topics covered:
- Polars DataFrame with grouped format
- Pandas DataFrame with sequence format
- Temporal mining with timestamps
- Reading from Parquet files
- Performance comparison
Notebook: notebooks/dataframe_examples.py
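As a rough picture of what a grouped format means, the sketch below turns long-format rows into one ordered sequence per group using only the standard library. The column layout `(sequence_id, timestamp, item)` and the `rows_to_sequences` helper are assumptions for illustration; the notebook shows the column names and DataFrame calls GSP-Py actually expects.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical column layout: (sequence_id, timestamp, item)
Row = Tuple[int, int, str]


def rows_to_sequences(rows: List[Row]) -> List[List[str]]:
    """Group rows by sequence id and order each group's items by timestamp."""
    grouped: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for sequence_id, timestamp, item in rows:
        grouped[sequence_id].append((timestamp, item))
    # Sort ids for a stable result, then sort events by timestamp within each group.
    return [
        [item for _, item in sorted(grouped[sequence_id])]
        for sequence_id in sorted(grouped)
    ]


# Illustrative clickstream rows, deliberately out of order.
rows = [
    (2, 10, "home"),
    (1, 5, "cart"),
    (1, 1, "home"),
    (2, 12, "search"),
    (1, 9, "checkout"),
]

sequences = rows_to_sequences(rows)
```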
4. Hooks Example
Explore how to use preprocessing, postprocessing, and candidate filtering hooks.
Topics covered:
- Custom preprocessing hooks
- Candidate filtering strategies
- Postprocessing transformations
- Practical examples with hooks
Notebook: notebooks/hooks_example.py
5. Custom Hooks
Advanced custom hooks for preprocessing, postprocessing, and candidate filtering.
Topics covered:
- Normalization and filtering hooks
- Length and support constraint filters
- Pattern transformation and enrichment
- Metadata addition and top-k filtering
Notebook: notebooks/custom_hooks.py
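The three kinds of hooks covered by these notebooks can be pictured as small plain functions. The sketch below is not GSP-Py's hook API: the function names, their signatures, and the way they are applied here are hypothetical, and the sample patterns are made up. See the notebooks for how hooks are actually passed to the library.

```python
from typing import Callable, Dict, List, Tuple

Pattern = Tuple[str, ...]
Level = Dict[Pattern, int]


def normalize(transactions: List[List[str]]) -> List[List[str]]:
    """Preprocessing sketch: trim whitespace, lowercase, and drop empty items."""
    return [
        [item.strip().lower() for item in transaction if item.strip()]
        for transaction in transactions
    ]


def max_length_filter(max_len: int) -> Callable[[Pattern, int], bool]:
    """Candidate filter sketch: keep only candidates with at most max_len items."""

    def keep(candidate: Pattern, support: int) -> bool:
        return len(candidate) <= max_len

    return keep


def top_k(levels: List[Level], k: int) -> List[Tuple[Pattern, int]]:
    """Postprocessing sketch: flatten all levels and keep the k most supported patterns."""
    flat = [(pattern, support) for level in levels for pattern, support in level.items()]
    # Sort by descending support, breaking ties by pattern for a stable result.
    return sorted(flat, key=lambda pair: (-pair[1], pair[0]))[:k]


# Illustrative inputs only.
raw = [[" Bread", "MILK "], ["bread", ""], ["Milk", "Eggs"]]
clean = normalize(raw)

levels = [
    {("bread",): 2, ("milk",): 2, ("eggs",): 1},
    {("bread", "milk"): 1},
]
keep = max_length_filter(1)
short_only = [{p: s for p, s in level.items() if keep(p, s)} for level in levels]
best = top_k(levels, 2)
```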
6. Parquet Round-trip Example
Complete workflow for using Parquet files with GSP-Py.
Topics covered:
- Loading transactions from CSV
- Converting to Parquet format
- Running GSP mining on Parquet data
- Exporting results to Parquet
- Handling edge cases and errors
Notebook: notebooks/parquet_roundtrip_example.py
Setup and Installation
Before running the notebooks, you need to install marimo and GSP-Py:
# Install GSP-Py with dataframe support (recommended)
pip install 'gsppy[dataframe]'
# Install marimo
pip install marimo
Running Notebooks Locally
To run these notebooks interactively on your machine:
- Clone the repository:
git clone https://github.com/jacksonpradolima/gsp-py.git
cd gsp-py
- Run a notebook in interactive mode:
marimo edit notebooks/sequence_example.py
This opens the notebook in your browser where you can:
- Modify code and see results update automatically
- Experiment with different parameters
- Add new cells and explore the library
- Save your changes back to the .py file
- Run a notebook as a script:
python notebooks/sequence_example.py
Or use marimo to run it as a read-only app:
marimo run notebooks/sequence_example.py
Working with Marimo Notebooks
Interactive Editing
When you run marimo edit, you get a reactive development environment:
- Edit any cell - Changes propagate automatically to dependent cells
- Add new cells - Use the + button or keyboard shortcuts
- Reorder cells - Drag and drop to reorganize
- View outputs - Rich display of DataFrames, plots, and results
Keyboard Shortcuts
- Cmd/Ctrl + Enter: Run current cell
- Shift + Enter: Run cell and select next
- Cmd/Ctrl + S: Save notebook
Cell Dependencies
Marimo automatically tracks dependencies between cells:
- If you change a variable, all cells using it update
- No need to manually re-run cells in order
- Prevents stale outputs and hidden state bugs
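The idea behind dependency tracking can be sketched with the standard library's `ast` module: find which names each cell defines and reads, then mark every cell that reads a name defined by a changed cell. This is a simplified illustration, not marimo's implementation. The helper names and the sample cells are hypothetical, and real notebooks need to handle cases (functions, imports, deletions) that this sketch ignores.

```python
import ast
from typing import Dict, List, Set, Tuple


def defined_and_used(source: str) -> Tuple[Set[str], Set[str]]:
    """Return (names assigned in the cell, names read but not assigned in it)."""
    defined: Set[str] = set()
    used: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used - defined


def cells_to_rerun(cells: Dict[str, str], changed: str) -> List[str]:
    """List the changed cell plus every cell that depends on it, directly or indirectly."""
    info = {name: defined_and_used(source) for name, source in cells.items()}
    stale = [changed]
    stale_names = set(info[changed][0])
    progressed = True
    while progressed:
        progressed = False
        for name, (defs, uses) in info.items():
            # A cell becomes stale if it reads any name defined by a stale cell.
            if name not in stale and uses & stale_names:
                stale.append(name)
                stale_names |= defs
                progressed = True
    return stale


# Hypothetical notebook: each entry is one cell's source code.
cells = {
    "params": "min_support = 0.3",
    "data": "transactions = [['a', 'b'], ['a', 'c'], ['a', 'b', 'c']]",
    "threshold": "min_count = int(min_support * len(transactions))",
    "report": "summary = f'need at least {min_count} sequences'",
}

after_params_change = cells_to_rerun(cells, "params")
after_report_change = cells_to_rerun(cells, "report")
```

Editing `params` marks `threshold` and `report` as stale but leaves `data` alone, which is the behavior the list above describes.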
Creating Your Own Notebooks
You can create your own marimo notebooks for GSP-Py:
# Create a new notebook, or edit an existing one (the file is created if it does not exist)
marimo edit my_notebook.py
Marimo notebooks are:
- Reactive: Changes propagate automatically
- Reproducible: Run deterministically every time
- Git-friendly: Stored as pure Python files
- Shareable: Export to HTML, PDF, or run as apps