Window functions¶

Prerequisites

Start with Basics for conversion fundamentals, Collections for iterable helpers, and Aggregations for reducers used over window frames.

Common SQL equivalents¶

Row numbering and ranking¶

RowIndex() + 1 is equivalent to row_number(), PeerGroupFirstRowIndex() + 1 is equivalent to rank(), and PeerGroupIndex() + 1 is equivalent to dense_rank().

from convtools import conversion as c


data = [
    {"team": "A", "name": "Ada", "score": 20},
    {"team": "A", "name": "Ben", "score": 20},
    {"team": "A", "name": "Cy", "score": 10},
    {"team": "B", "name": "Dee", "score": 15},
    {"team": "B", "name": "Eli", "score": 12},
]

result = (
    c.this.window(
        {
            "name": c.WindowFuncs.Row().item("name"),
            "ranking": (
                c.WindowFuncs.RowIndex() + 1,
                c.WindowFuncs.PeerGroupFirstRowIndex() + 1,
                c.WindowFuncs.PeerGroupIndex() + 1,
            ),
        }
    )
    .over(
        partition_by=c.item("team"),
        order_by=c.item("score").desc(),
    )
    .execute(data)
)
assert result == [
    {"name": "Ada", "ranking": (1, 1, 1)},
    {"name": "Ben", "ranking": (2, 1, 1)},
    {"name": "Cy", "ranking": (3, 3, 2)},
    {"name": "Dee", "ranking": (1, 1, 1)},
    {"name": "Eli", "ranking": (2, 2, 2)},
]

Aggregate over a window¶

Reducers inside window(...) behave like aggregate window functions. This example is equivalent to sum(amount) over (order by day rows between 1 preceding and current row).

from convtools import conversion as c


data = [
    {"day": 1, "amount": 5},
    {"day": 2, "amount": 7},
    {"day": 3, "amount": 3},
    {"day": 4, "amount": 8},
]

result = (
    c.this.window(
        {
            "row": (
                c.WindowFuncs.Row().item("day"),
                c.WindowFuncs.Row().item("amount"),
            ),
            "rolling_sum": c.ReduceFuncs.Sum(c.item("amount")),
        }
    )
    .over(
        order_by=c.item("day"),
        frame_mode="ROWS",
        frame_start=(1, "PRECEDING"),
        frame_end="CURRENT ROW",
    )
    .execute(data)
)
assert result == [
    {"row": (1, 5), "rolling_sum": 5},
    {"row": (2, 7), "rolling_sum": 12},
    {"row": (3, 3), "rolling_sum": 10},
    {"row": (4, 8), "rolling_sum": 11},
]

Lag and lead¶

RowPreceding(offset) is equivalent to lag(...), and RowFollowing(offset) is equivalent to lead(...).

from convtools import conversion as c


data = [
    {"day": 1, "amount": 5},
    {"day": 2, "amount": 7},
    {"day": 3, "amount": 3},
    {"day": 4, "amount": 8},
]

result = (
    c.this.window(
        {
            "row": (
                c.WindowFuncs.Row().item("day"),
                c.WindowFuncs.Row().item("amount"),
            ),
            "neighbors": (
                c.WindowFuncs.RowPreceding(1).item("amount", default=None),
                c.WindowFuncs.RowFollowing(1).item("amount", default=None),
            ),
        }
    )
    .over(order_by=c.item("day"))
    .execute(data)
)
assert result == [
    {"row": (1, 5), "neighbors": (None, 7)},
    {"row": (2, 7), "neighbors": (5, 3)},
    {"row": (3, 3), "neighbors": (7, 8)},
    {"row": (4, 8), "neighbors": (3, None)},
]

`c.WindowFuncs`¶

Function	Arguments	Returns	SQL equivalent
`Row()`	none	The current row from the current partition.	Current row reference
`RowIndex()`	none	Zero-based row index within the current partition.	`row_number() - 1`
`RowPreceding(offset, default=None)`	`offset`: rows before current row; `default`: value when missing	The row at `offset` rows before the current row, or `default`.	`lag(row, offset, default)`
`RowFollowing(offset, default=None)`	`offset`: rows after current row; `default`: value when missing	The row at `offset` rows after the current row, or `default`.	`lead(row, offset, default)`
`PeerGroupFirstRow()`	none	The first row in the current peer group.	First row among ties
`PeerGroupLastRow()`	none	The last row in the current peer group.	Last row among ties
`PeerGroupFirstRowIndex()`	none	Zero-based index of the first row in the current peer group.	`rank() - 1`
`PeerGroupLastRowIndex()`	none	Zero-based index of the last row in the current peer group.	End index of rank peer group
`PeerGroupIndex()`	none	Zero-based peer-group index within the current partition.	`dense_rank() - 1`
`FrameFirstRow(default=None)`	`default`: value for an empty frame	First row in the current frame, or `default`.	`first_value(row)`
`FrameLastRow(default=None)`	`default`: value for an empty frame	Last row in the current frame, or `default`.	`last_value(row)`
`FrameNthRow(n, default=None)`	`n`: zero-based row offset in the frame; `default`: value when missing	The `n`th row in the current frame, or `default`.	`nth_value(row, n + 1)`

Use .item(...), .attr(...), or any other conversion on row-returning functions to extract fields:

c.WindowFuncs.RowPreceding(1).item("amount", default=0)

`.over(...)` parameters¶

Parameter	Default	Description
`partition_by`	not set	Conversion used to split input rows into independent partitions. Use a tuple of conversions for multi-key partitions.
`order_by`	not set	Conversion, or tuple of conversions, used to order rows inside each partition. Equal ordering keys form a peer group. Supports sorting helpers such as `.desc()` and `none_last=True`.
`frame_mode`	`"RANGE"`	Frame interpretation: `"RANGE"` uses ordering-key values, `"ROWS"` uses row offsets, and `"GROUPS"` uses peer-group offsets.
`frame_start`	`"UNBOUNDED PRECEDING"`	Start boundary. Accepts `"UNBOUNDED PRECEDING"`, `"CURRENT ROW"`, or `(offset, "PRECEDING" / "FOLLOWING")`.
`frame_end`	`"CURRENT ROW"`	End boundary. Accepts `"UNBOUNDED FOLLOWING"`, `"CURRENT ROW"`, or `(offset, "PRECEDING" / "FOLLOWING")`.
`frame_exclusion`	`"NO OTHERS"`	Exclusion rule: `"NO OTHERS"`, `"CURRENT ROW"`, `"GROUP"`, or `"TIES"`.

Frame modes follow PostgreSQL terminology:

Mode	Frame offset meaning
`"RANGE"`	Offsets are added to or subtracted from the current row's ordering key. Offset frames require `order_by`.
`"ROWS"`	Offsets are non-negative row counts before or after the current row.
`"GROUPS"`	Offsets are peer-group counts before or after the current peer group.

For available reducers, see c.ReduceFuncs.

Frame mode examples¶

RANGE modeROWS modeGROUPS mode

from datetime import date, timedelta
from convtools import conversion as c


data = [
    {"a": 1, "dt": date(2020, 1, 1), "b": 1},
    {"a": 2, "dt": date(2020, 1, 1), "b": 6},
    {"a": 1, "dt": date(2020, 1, 2), "b": 3},
    {"a": 1, "dt": date(2020, 1, 2), "b": 4},
    {"a": 1, "dt": date(2020, 1, 2), "b": 2},
    {"a": 1, "dt": date(2020, 1, 3), "b": 5},
    {"a": 1, "dt": date(2020, 1, 4), "b": 6},
    {"a": 1, "dt": date(2020, 1, 5), "b": 7},
    {"a": 1, "dt": date(2020, 1, 7), "b": 8},
]

result = (
    c.this.window(
        {
            "sum": c.ReduceFuncs.Sum(c.item("b")),
            "idx": c.WindowFuncs.RowIndex(),
            "frame": c.ReduceFuncs.Array(c.item("b")),
        }
    )
    .over(
        partition_by=c.item("a"),
        order_by=c.item("dt"),
        frame_mode="RANGE",
        frame_start=(timedelta(days=1), "PRECEDING"),
        frame_end=(timedelta(days=1), "FOLLOWING"),
    )
    .execute(data)
)
assert result == [
    {"sum": 10, "idx": 0, "frame": [1, 3, 4, 2]},
    {"sum": 6, "idx": 0, "frame": [6]},
    {"sum": 15, "idx": 1, "frame": [1, 3, 4, 2, 5]},
    {"sum": 15, "idx": 2, "frame": [1, 3, 4, 2, 5]},
    {"sum": 15, "idx": 3, "frame": [1, 3, 4, 2, 5]},
    {"sum": 20, "idx": 4, "frame": [3, 4, 2, 5, 6]},
    {"sum": 18, "idx": 5, "frame": [5, 6, 7]},
    {"sum": 13, "idx": 6, "frame": [6, 7]},
    {"sum": 8, "idx": 7, "frame": [8]},
]

from datetime import date, timedelta
from convtools import conversion as c


data = [
    {"a": 1, "dt": date(2020, 1, 1), "b": 1},
    {"a": 2, "dt": date(2020, 1, 1), "b": 6},
    {"a": 1, "dt": date(2020, 1, 2), "b": 3},
    {"a": 1, "dt": date(2020, 1, 2), "b": 4},
    {"a": 1, "dt": date(2020, 1, 2), "b": 2},
    {"a": 1, "dt": date(2020, 1, 3), "b": 5},
    {"a": 1, "dt": date(2020, 1, 3), "b": None},
    {"a": 1, "dt": date(2020, 1, 4), "b": 6},
    {"a": 1, "dt": date(2020, 1, 5), "b": 7},
    {"a": 1, "dt": date(2020, 1, 7), "b": 8},
]

result = (
    c.this.window(
        {
            "sum": c.ReduceFuncs.Sum(c.item("b")),
            "idx": c.WindowFuncs.RowIndex(),
            "frame": c.ReduceFuncs.Array(c.item("b")),
        }
    )
    .over(
        order_by=c.item("b").desc(none_last=True),
        frame_mode="ROWS",
        frame_start="UNBOUNDED PRECEDING",
        frame_end="CURRENT ROW",
    )
    .execute(data)
)
assert result == [
    {"sum": 42, "idx": 8, "frame": [8, 7, 6, 6, 5, 4, 3, 2, 1]},
    {"sum": 21, "idx": 2, "frame": [8, 7, 6]},
    {"sum": 39, "idx": 6, "frame": [8, 7, 6, 6, 5, 4, 3]},
    {"sum": 36, "idx": 5, "frame": [8, 7, 6, 6, 5, 4]},
    {"sum": 41, "idx": 7, "frame": [8, 7, 6, 6, 5, 4, 3, 2]},
    {"sum": 32, "idx": 4, "frame": [8, 7, 6, 6, 5]},
    {"sum": 42, "idx": 9, "frame": [8, 7, 6, 6, 5, 4, 3, 2, 1, None]},
    {"sum": 27, "idx": 3, "frame": [8, 7, 6, 6]},
    {"sum": 15, "idx": 1, "frame": [8, 7]},
    {"sum": 8, "idx": 0, "frame": [8]},
]

from datetime import date, timedelta
from convtools import conversion as c


data = [
    {"a": 1, "dt": date(2020, 1, 1), "b": 1},
    {"a": 2, "dt": date(2020, 1, 1), "b": 6},
    {"a": 1, "dt": date(2020, 1, 2), "b": 3},
    {"a": 1, "dt": date(2020, 1, 2), "b": 4},
    {"a": 1, "dt": date(2020, 1, 2), "b": 2},
    {"a": 1, "dt": date(2020, 1, 3), "b": 5},
    {"a": 1, "dt": date(2020, 1, 4), "b": 6},
    {"a": 1, "dt": date(2020, 1, 5), "b": 7},
    {"a": 1, "dt": date(2020, 1, 7), "b": 8},
]

result = (
    c.this.window(
        {
            "dt_min": c.ReduceFuncs.Min(c.item("dt")).pipe(str),
            "dt_max": c.ReduceFuncs.Max(c.item("dt")).pipe(str),
            "idx": c.WindowFuncs.RowIndex(),
            "frame": c.ReduceFuncs.Array(c.item("b")),
        }
    )
    .over(
        order_by=c.item("dt"),
        frame_mode="GROUPS",
        frame_start=(1, "PRECEDING"),
        frame_end=(1, "FOLLOWING"),
    )
    .execute(data)
)
# fmt: off
assert result == [
    {"dt_min": "2020-01-01", "dt_max": "2020-01-02", "idx": 0, "frame": [1, 6, 3, 4, 2]},
    {"dt_min": "2020-01-01", "dt_max": "2020-01-02", "idx": 1, "frame": [1, 6, 3, 4, 2]},
    {"dt_min": "2020-01-01", "dt_max": "2020-01-03", "idx": 2, "frame": [1, 6, 3, 4, 2, 5]},
    {"dt_min": "2020-01-01", "dt_max": "2020-01-03", "idx": 3, "frame": [1, 6, 3, 4, 2, 5]},
    {"dt_min": "2020-01-01", "dt_max": "2020-01-03", "idx": 4, "frame": [1, 6, 3, 4, 2, 5]},
    {"dt_min": "2020-01-02", "dt_max": "2020-01-04", "idx": 5, "frame": [3, 4, 2, 5, 6]},
    {"dt_min": "2020-01-03", "dt_max": "2020-01-05", "idx": 6, "frame": [5, 6, 7]},
    {"dt_min": "2020-01-04", "dt_max": "2020-01-07", "idx": 7, "frame": [6, 7, 8]},
    {"dt_min": "2020-01-05", "dt_max": "2020-01-07", "idx": 8, "frame": [7, 8]},
]
# fmt: on