convtools — write transformations as expressions, run them as Python

convtools lets you declare data transformations in plain Python, then compiles them into tiny, optimized Python functions at runtime. You keep your data in native iterables (lists, dicts, generators, CSV streams)—no heavy container required.

Why pick convtools?

  • Stay in Python. Compose transformations as expressions: pipes, filters, joins, group‑bys, reducers, window functions, and more. Then call .gen_converter() to get a real Python function.
  • Stream‑friendly. Works directly on iterators and files; the Table helper processes CSV‑like data without loading everything into memory.
  • Powerful aggregations. Rich reducers (Sum, CountDistinct, MaxRow, ArraySorted, Dict*, TopK…) with per‑reducer where filters and defaults. Nested aggregations are first‑class.
  • Debuggable & inspectable. Print the generated code with debug=True or set global options via c.OptionsCtx. Works with pdb/pydevd.
  • Plays nicely with Pandas/Polars. It’s not a DataFrame; it’s a code‑generation layer. Use it when you want lean, composable transforms over native Python data.

Installation

pip install convtools

60-second tour

1) Build & run a converter

from convtools import conversion as c

# Title‑case a name in an incoming dict
to_title = c.item("name").pipe(str.title).gen_converter()

assert to_title({"name": "jane doe"}) == "Jane Doe"

Under the hood, gen_converter() compiles your expression into an ad‑hoc Python function. For a one‑off call, use .execute(data) instead.
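To make "compiles your expression into a Python function" more concrete, here is a toy sketch of runtime code generation using only the standard library. It is not convtools' implementation, and the generated source is not what convtools emits; `build_title_converter` and `converter` are hypothetical names used purely for illustration.

```python
def build_title_converter(key):
    """Toy code generator: builds source text for one fixed expression,
    compiles it, and returns (function, source)."""
    source = (
        "def converter(data_):\n"
        f"    return str.title(data_[{key!r}])\n"
    )
    namespace = {}
    # Compile the source string and execute it to define `converter`.
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["converter"], source


to_title_plain, generated_source = build_title_converter("name")
print(generated_source)  # inspect the function that was generated
print(to_title_plain({"name": "jane doe"}))
```

The generation cost is paid once; afterwards the result is an ordinary function that can be called repeatedly.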

2) Transform a collection

from convtools import conversion as c

rows = [
    {"name": "ada", "score": 10},
    {"name": "grace", "score": 12},
    {"name": "linus", "score": 9},
]

to_names = (
    c.iter(c.item("name").pipe(str.title))
    .as_type(list)  # return a list, not a generator
    .gen_converter()
)

assert to_names(rows) == ["Ada", "Grace", "Linus"]

This uses c.iter to express a per‑row transform and .as_type(list) to collect the results; without .as_type(list), the converter returns a generator.
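For comparison, the same result written by hand in plain Python might look like the sketch below. It only mirrors the output of the example above; `iter_names` is a hypothetical helper, not part of convtools.

```python
rows = [
    {"name": "ada", "score": 10},
    {"name": "grace", "score": 12},
    {"name": "linus", "score": 9},
]


def iter_names(rows):
    # Per-row transform, evaluated lazily (one row at a time).
    for row in rows:
        yield row["name"].title()


lazy_names = iter_names(rows)   # a generator: nothing computed yet
names = list(lazy_names)        # collecting, like .as_type(list)
print(names)
```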

3) Group & aggregate

from convtools import conversion as c

orders = [
    {"user": "a", "amount": 20, "status": "paid"},
    {"user": "a", "amount": 30, "status": "refunded"},
    {"user": "b", "amount": 15, "status": "paid"},
    {"user": "b", "amount": 10, "status": "paid"},
]

group_and_sum_paid = (
    c.group_by(c.item("user"))
    .aggregate(
        {
            "user": c.item("user"),
            "paid_total": c.ReduceFuncs.Sum(
                c.item("amount"),
                where=c.item("status") == "paid",
            ),
        }
    )
    .sort(key=c.item("paid_total").desc())
    .gen_converter()
)

assert group_and_sum_paid(orders) == [
    {"user": "b", "paid_total": 25},
    {"user": "a", "paid_total": 20},
]

Reducers support where filters and sensible defaults. c.group_by(...).aggregate(...) returns a list you can sort, filter, or map further.
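As a rough reference for what the grouped, filtered sum computes, here is a hand-written plain-Python sketch. It reproduces the output of the example above and is not the code convtools generates; `group_and_sum_paid_plain` is a hypothetical name, and starting each group at 0 is an assumption of this sketch.

```python
orders = [
    {"user": "a", "amount": 20, "status": "paid"},
    {"user": "a", "amount": 30, "status": "refunded"},
    {"user": "b", "amount": 15, "status": "paid"},
    {"user": "b", "amount": 10, "status": "paid"},
]


def group_and_sum_paid_plain(orders):
    totals = {}
    for order in orders:
        # Every user gets a group, even if none of their rows pass the filter.
        totals.setdefault(order["user"], 0)
        # The per-reducer filter: only "paid" rows contribute to the sum.
        if order["status"] == "paid":
            totals[order["user"]] += order["amount"]
    result = [{"user": user, "paid_total": total} for user, total in totals.items()]
    # Largest paid total first.
    result.sort(key=lambda row: row["paid_total"], reverse=True)
    return result


plain_result = group_and_sum_paid_plain(orders)
print(plain_result)
```

Note that the filter applies to the reducer, not to the grouping: a user whose orders were all refunded would still appear in the output of this sketch, with a total of 0.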

4) Join two sequences

from convtools import conversion as c

collection_1 = [
    {"id": 1, "name": "Nick"},
    {"id": 2, "name": "Joash"},
    {"id": 3, "name": "Bob"},
]
collection_2 = [
    {"ID": "3", "age": 17, "country": "GB"},
    {"ID": "2", "age": 21, "country": "US"},
    {"ID": "1", "age": 18, "country": "CA"},
]
input_data = (collection_1, collection_2)

conv = (
    c.join(
        c.item(0),
        c.item(1),
        c.and_(
            c.LEFT.item("id") == c.RIGHT.item("ID").as_type(int),
            c.RIGHT.item("age") >= 18,
        ),
        how="left",
    )
    .pipe(
        c.list_comp(
            {
                "id": c.item(0, "id"),
                "name": c.item(0, "name"),
                "age": c.item(1, "age", default=None),
                "country": c.item(1, "country", default=None),
            }
        )
    )
    .gen_converter()
)

assert conv(input_data) == [
    {"id": 1, "name": "Nick", "age": 18, "country": "CA"},
    {"id": 2, "name": "Joash", "age": 21, "country": "US"},
    {"id": 3, "name": "Bob", "age": None, "country": None},
]

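c.join yields (left, right) tuples, and c.LEFT/c.RIGHT let you express join conditions. With how="left", a left row with no match is paired with None, which is why the example reads right‑side fields with default=None. Hash‑join optimization kicks in on equi‑joins.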
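The idea behind a hash join on an equality condition can be sketched in plain Python: index one side by the join key, then look each row of the other side up in that index. The sketch below is a generic illustration under that assumption, not convtools' generated code; `left_hash_join` is a hypothetical name.

```python
from collections import defaultdict

collection_1 = [
    {"id": 1, "name": "Nick"},
    {"id": 2, "name": "Joash"},
    {"id": 3, "name": "Bob"},
]
collection_2 = [
    {"ID": "3", "age": 17, "country": "GB"},
    {"ID": "2", "age": 21, "country": "US"},
    {"ID": "1", "age": 18, "country": "CA"},
]


def left_hash_join(left_rows, right_rows):
    # Build the hash index over the right side, keyed by the join key.
    # The right-only predicate (age >= 18) is applied while indexing.
    index = defaultdict(list)
    for right in right_rows:
        if right["age"] >= 18:
            index[int(right["ID"])].append(right)
    # Probe the index once per left row; unmatched rows pair with None.
    for left in left_rows:
        matches = index.get(left["id"])
        if matches:
            for right in matches:
                yield left, right
        else:
            yield left, None


joined = [
    {
        "id": left["id"],
        "name": left["name"],
        "age": right["age"] if right else None,
        "country": right["country"] if right else None,
    }
    for left, right in left_hash_join(collection_1, collection_2)
]
print(joined)
```

Indexing one side avoids comparing every left row with every right row, which is what a nested-loop join would do.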


Streaming CSVs with Table

from convtools import conversion as c
from convtools.contrib.tables import Table
from decimal import Decimal

# tests/csvs/orders.csv
"""
order_id,price,qty,status
a,20,2,paid
a,30,3,refunded
b,15,4,paid
b,10,5,paid
"""

# Read a CSV (first row is the header), keep paid rows, and stream out selected columns
pipe = (
    Table.from_csv("tests/csvs/orders.csv", header=True)  # stream in
    .filter(c.col("status") == "paid")  # row-wise filter
    .update(total=c.col("price").as_type(Decimal) * c.col("qty").as_type(int))
    .take("order_id", "total")
    .into_iter_rows(dict)  # stream out or into_csv("output.csv")
)

assert list(pipe) == [
    {"order_id": "a", "total": Decimal("40")},
    {"order_id": "b", "total": Decimal("60")},
    {"order_id": "b", "total": Decimal("50")},
]

Table is optimized for streaming transformations: rename/take/drop/update columns, joins, explode, pivot, and more. Note: Table consumes its input once (it’s an iterator).
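The streaming, consume-once behaviour can be illustrated with the standard library alone. The sketch below performs the same filter and computation as the Table example using `csv.DictReader` and a generator; it is a plain-Python analogue, not how Table is implemented, and `paid_totals` and `CSV_TEXT` are hypothetical names standing in for the file on disk.

```python
import csv
import io
from decimal import Decimal

CSV_TEXT = (
    "order_id,price,qty,status\n"
    "a,20,2,paid\n"
    "a,30,3,refunded\n"
    "b,15,4,paid\n"
    "b,10,5,paid\n"
)


def paid_totals(lines):
    # Rows are read and yielded one at a time; nothing is buffered.
    for row in csv.DictReader(lines):
        if row["status"] == "paid":
            yield {
                "order_id": row["order_id"],
                "total": Decimal(row["price"]) * int(row["qty"]),
            }


stream = paid_totals(io.StringIO(CSV_TEXT))
first_pass = list(stream)   # consumes the stream
second_pass = list(stream)  # already exhausted, so this is empty
print(first_pass)
print(second_pass)
```

If the rows are needed more than once, collect them into a list first or rebuild the pipeline from the source.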


Debugging & generated code

Pass debug=True to .gen_converter(...) or .execute(...) to print the compiled function for inspection. You can also set global debug options with c.OptionsCtx(). Installing black prettifies the printed code automatically.

from convtools import conversion as c

with c.OptionsCtx() as opts:
    opts.debug = True
    c.item(1).gen_converter()

When should I reach for convtools?

  • You need composable transforms over native Python data (lists/dicts/generators/CSV), not a DataFrame.
  • You want to express business rules declaratively and generate fast, readable Python functions.
  • You need aggregations/joins/pipes that you can reuse across scripts and services.

Looking for benchmarks and deeper rationale? See Benefits in the docs and the linked benchmark sources.


Install & use in 3 steps

  1. pip install convtools

  2. from convtools import conversion as c

  3. Build an expression → gen_converter() → call it wherever you need.


Contributing

  • Star the repo and share use‑cases in Discussions; it really helps.

  • To report a bug or suggest enhancements, please open an issue and/or submit a pull request.

  • Reporting a Security Vulnerability: see the security policy.


What’s included in the box?

  • from convtools import conversion as c — the main interface.

  • from convtools.contrib.tables import Table — stream processing of CSV‑like/tabular data.

  • from convtools.contrib import fs — tiny helpers for splitting buffers with custom newlines.


License

MIT License (see LICENSE.txt).