`superleaf`

A library for intuitive and readable data filtering and manipulation, using functional and pipeable syntax.

When trying to select and filter pandas dataframes in somewhat complicated ways, the syntax can quickly get cumbersome. This library provides utilities for quickly and intuitively specifying ways to select and filter based on column and row conditions. It also includes utilities for piping and composing value accessor and manipulation operators on more general datatypes.

Operators

Some of the operators in this library have analogues in the Python standard library operator module, but the ones here can be more flexibly composed and piped.

For example, let's say you want to take the log of all of the values inside one of the fields of a sequence of data containers. You can achieve this with the following code:

import numpy as np
from superleaf.operators import operator
from superlead.getters import attr_getter

exp_field_op = attr_getter("field_a") >> operator(np.log)  # "right shift" operator used for piping
results_iter = map(exp_field_op, data_containers)

This produces an iterator with the same results as the following list comprehension:

results_list = [np.log(datum.field_a) for datum in data_containers]

DataFrame column operators and selection/filtering

Row filtering, especially for complicated combinations of conditions, of pandas dataframes can have cumbersome and seemingly repetitive syntax. The dfilter and Col utilities can be used to more succinctly achieve complicated filtering and selection of portions of dataframes.

For example, consider the following dataframe:

import pandas as pd

df = pd.DataFrame({
    "col1": [   0,   1,   0,   1,    1,      1,   0],
    "col2": [-5.1, 2.2, 0.2, 1.7, -1.1, np.nan, 0.9],
    "col3": [ "A", "A", "C", "B",  "C",    "C", "C"],
})

Let's say we want to select the rows where col1 == 1, col2 is null or negative, and col3 == "C":

sub_df = df[(df["col1"] == 1) & (df["col2"].isna() | (df["col2"] < 0)) & (df["col3"] == "C")]

Using superleaf, we could do this with the following:

from superleaf.operator import ComparisonFunctions as F
from superleaf.dataframe.selection import dfilter

sub_df = dfilter(df, col1=1, col2=(F.isna | F.lt(0)), col3="C")

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
.github/workflows		.github/workflows
src/superleaf		src/superleaf
tests		tests
.flake8		.flake8
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`superleaf`

A library for intuitive and readable data filtering and manipulation, using functional and pipeable syntax.

Operators

DataFrame column operators and selection/filtering

About

Releases

Packages

Languages

License

eschombu/superleaf

Folders and files

Latest commit

History

Repository files navigation

superleaf

A library for intuitive and readable data filtering and manipulation, using functional and pipeable syntax.

Operators

DataFrame column operators and selection/filtering

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

`superleaf`

Packages