Skip to content

Contrib / Tables

Table is a streaming helper for tabular data. It builds a pipeline of convtools conversions so you can reshape, enrich, filter, and combine rows without loading the entire dataset into memory.

Warning

  • Table is single-pass: the underlying iterable is consumed once. If you need to reuse data, materialize it or reopen the source.
  • If you want to use Table inside conversions, read this first.

Header handling and duplicate columns

header supports multiple forms:

  • True - infer a header from the first row. For list/tuple input, the first row becomes the header and is not part of the data. For dict input, the first row's keys become column names, and that first row's values are still emitted as data.
  • False - treat input as headerless; columns are numbered (COLUMN_0, COLUMN_1, ...).
  • list / tuple - explicit column names.
  • dict - keys are column names, values are indexes (or keys) used to pull data.
  • None - infer only if the first row is a dict; otherwise behave like False. When inferred from a dict, the first row is still emitted as data.

duplicate_columns controls how repeated column names are handled:

  • raise - raise ValueError.
  • mangle - rename duplicates to name, name_1, name_2, ...
  • keep - keep duplicates; when referenced by name, the first one wins.
  • drop - skip duplicates entirely.

Defaults differ by constructor: from_rows uses raise, while from_csv and from_jsonl use mangle.

Read / output rows

Table.from_rows

Initialize a table from an iterable of rows.

Arguments:


Table.into_iter_rows

Return processed rows as an iterator.

Arguments:

  • type_ - must be dict, tuple, or list.
  • include_header - when True, prepend a header row (ignored for dict).
from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    # NO HEADER PROVIDED
    assert list(
        Table.from_rows([(1, 2, 3), (2, 3, 4)]).into_iter_rows(dict)
    ) == [
        {"COLUMN_0": 1, "COLUMN_1": 2, "COLUMN_2": 3},
        {"COLUMN_0": 2, "COLUMN_1": 3, "COLUMN_2": 4},
    ]

    # HEADER IS PROVIDED
    assert list(
        Table.from_rows(
            [[1, 2, 3], [2, 3, 4]], header=["a", "b", "c"]
        ).into_iter_rows(dict)
    ) == [
        {"a": 1, "b": 2, "c": 3},
        {"a": 2, "b": 3, "c": 4},
    ]

    # INFERS HEADER ON DEMAND
    assert list(
        Table.from_rows(
            [["a", "b", "c"], [1, 2, 3], [2, 3, 4]], header=True
        ).into_iter_rows(dict)
    ) == [
        {"a": 1, "b": 2, "c": 3},
        {"a": 2, "b": 3, "c": 4},
    ]

    # INFERS HEADER AUTOMATICALLY BECAUSE THE FIRST ELEMENT IS A DICT
    assert list(
        Table.from_rows([{"a": 1, "b": 2}, {"a": 2, "b": 3}]).into_iter_rows(
            tuple, include_header=True
        )
    ) == [("a", "b"), (1, 2), (2, 3)]

Read CSV-like

Table.from_csv

Initialize a table from a CSV-like file.

Arguments:

  • filepath_or_buffer - a filepath or a buffer acceptable by csv.reader.
  • header - see Header handling and duplicate columns.
  • duplicate_columns - see Header handling and duplicate columns.
  • skip_rows - number of rows to skip before header handling; default is 0.
  • dialect - a dialect acceptable by csv.reader. Use Table.csv_dialect(delimiter="\t") for tab-separated files.
  • encoding - default is utf-8.

Table.into_csv

Write the results to a CSV-like file.

This is a terminal writer method: it consumes the table pipeline, writes rows to the target, and returns None.

Arguments:

  • filepath_or_buffer - a filepath or something csv.writer can write to.
  • include_header - whether to emit the header; default is True.
  • dialect - a dialect acceptable by csv.writer. Use Table.csv_dialect(delimiter="\t") for tab-separated files.
  • encoding - default is utf-8.
from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    # READING CSV
    # file content:
    #   a,b
    #   1,2
    #   2,3
    assert list(
        Table
        .from_csv("tests/csvs/ab.csv", header=True)
        .into_iter_rows(dict)
        # .into_csv("output.csv")  # TO WRITE TO A FILE
    ) == [
        {"a": "1", "b": "2"},
        {"a": "2", "b": "3"},
    ]

    # READING TSV
    # file content:
    #   a\tc
    #   2\t4
    #   1\t3
    assert list(
        Table.from_csv(
            "tests/csvs/ac.csv",
            header=True,
            dialect=Table.csv_dialect(delimiter="\t"),
        ).into_iter_rows(dict)
    ) == [
        {"a": "2", "c": "4"},
        {"a": "1", "c": "3"},
    ]

    # READ TSV + SKIP EXISTING HEADER + REMAP COLUMNS
    # file content:
    #   a\tc
    #   2\t4
    #   1\t3
    assert list(
        Table.from_csv(
            "tests/csvs/ac.csv",
            header={"a": 1, "c": 0},
            skip_rows=1,
            dialect=Table.csv_dialect(delimiter="\t"),
        ).into_iter_rows(dict)
    ) == [
        {"a": "4", "c": "2"},
        {"a": "3", "c": "1"},
    ]

Read JSONL

Table.from_jsonl

Initialize a table from a JSONL (JSON Lines) file. Each line must be a valid JSON object or array. Empty lines are silently skipped.

Arguments:


Table.into_jsonl

Write the results to a JSONL file (one JSON object per line).

This is a terminal writer method: it consumes the table pipeline, writes rows to the target, and returns None.

Arguments:

  • filepath_or_buffer - a filepath or a text buffer to write to.
  • encoding - default is utf-8.
from io import StringIO

from convtools import conversion as c
from convtools.contrib.tables import Table

# READING JSONL (file with one JSON object per line)
output = StringIO(
    '{"a": 1, "b": 2}\n'
    '{"a": 3, "b": 4}\n'
)
assert list(Table.from_jsonl(output).into_iter_rows(dict)) == [
    {"a": 1, "b": 2},
    {"a": 3, "b": 4},
]

# READING JSONL WITH ARRAYS
arrays = StringIO("[1, 2, 3]\n[4, 5, 6]\n")
assert list(
    Table.from_jsonl(arrays, header=["x", "y", "z"]).into_iter_rows(dict)
) == [
    {"x": 1, "y": 2, "z": 3},
    {"x": 4, "y": 5, "z": 6},
]

# READING + TRANSFORMING + WRITING
output.seek(0)
transformed = StringIO()
result = (
    Table.from_jsonl(output)
    .update(c=c.col("a") + c.col("b"))
    .into_jsonl(transformed)
)
assert result is None

transformed.seek(0)
assert list(Table.from_jsonl(transformed).into_iter_rows(dict)) == [
    {"a": 1, "b": 2, "c": 3},
    {"a": 3, "b": 4, "c": 7},
]

Rename, take / rearrange, drop columns

  • rename(columns) renames columns:
  • tuple / list define new column names (length must match the current number of columns).
  • dict maps old column names to new ones.
  • take(*column_names) keeps only specified columns (order matters).
  • take can accept ... to include all non-mentioned columns, which makes reordering easy: table.take("c", "d", ...).
  • drop(*column_names) drops specified columns, keeping the rest as-is.
from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    assert list(
        Table.from_rows([("A", "b", "c"), (1, 2, 3), (2, 3, 4)], header=True)
        .rename({"A": "a"})
        .drop("b")
        .take("c", ...)  # MAKE "c" COLUMN THE FIRST ONE
        .into_iter_rows(dict)
    ) == [{"c": 3, "a": 1}, {"c": 4, "a": 2}]

Add, update columns

Use update to add or replace columns and update_all to apply conversions to every existing column.

  • update(**column_to_conversion) takes keyword arguments where keys are column names and values are conversions applied row-wise.
  • update_all(*conversions) composes conversions and applies them to each value in each column.

Use c.col("column name") to reference column values.

from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    assert list(
        Table.from_rows([(1, -2), (2, -3)], ["a", "b"])
        .update(c=c.col("a") + c.col("b"))  # ADDING NEW COLUMN: "c"
        .update(c=c.call_func(abs, c.col("c")))  # UPDATING NEW COLUMN: "c"
        .into_iter_rows(dict)
    ) == [{"a": 1, "b": -2, "c": 1}, {"a": 2, "b": -3, "c": 1}]

Filter rows

To filter rows, pass a conversion to the filter method. Rows are kept when the conversion returns a truthy value.

Use c.col("column name") to reference column values.

from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    assert list(
        Table.from_rows([(1, -2), (2, -3)], header=["a", "b"])
        .filter(c.col("b") < -2)
        .into_iter_rows(dict)
    ) == [{"a": 2, "b": -3}]

Join tables

Use join to combine two tables. Arguments:

  • table - another table to join with.
  • on - either:
  • a join conversion like c.LEFT.col("a") == c.RIGHT.col("A"), or
  • an iterable of column names to join on.
  • how - one of "inner", "left", "right", "full".
  • suffixes - a tuple of two strings (left and right) to be concatenated with conflicting column names. Default is ("_LEFT", "_RIGHT"). Columns listed in on (when on is an iterable of names) are not suffixed.
from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    # JOIN ON COLUMN NAMES
    assert list(
        Table.from_rows([(1, 2), (2, 3)], ["a", "b"])
        .join(
            Table.from_rows([(1, 3), (2, 4)], ["a", "c"]),
            how="inner",
            on=["a"],
        )
        .into_iter_rows(dict)
    ) == [
        {"a": 1, "b": 2, "c": 3},
        {"a": 2, "b": 3, "c": 4},
    ]

    # JOIN ON CONDITION
    assert list(
        Table.from_rows([(1, 2), (2, 30)], ["a", "b"])
        .join(
            Table.from_rows([(1, 3), (2, 4)], ["a", "c"]),
            how="full",
            on=c.and_(
                c.LEFT.col("a") == c.RIGHT.col("a"),
                c.LEFT.col("b") < c.RIGHT.col("c"),
            ),
        )
        .into_iter_rows(dict)
    ) == [
        {"a_LEFT": 1, "b": 2, "a_RIGHT": 1, "c": 3},
        {"a_LEFT": 2, "b": 30, "a_RIGHT": None, "c": None},
        {"a_LEFT": None, "b": None, "a_RIGHT": 2, "c": 4},
    ]

Chain tables

chain concatenates tables vertically (appends rows).

Arguments:

  • table - table to chain.
  • fill_value - value used to fill gaps when columns don't align; default is None.
from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    assert list(
        Table.from_rows([(1, 2), (2, 3)], ["a", "b"])
        .chain(
            Table.from_rows([(1, 3), (2, 4)], ["a", "c"]),
            fill_value=0,
        )
        .into_iter_rows(dict)
    ) == [
        {"a": 1, "b": 2, "c": 0},
        {"a": 2, "b": 3, "c": 0},
        {"a": 1, "b": 0, "c": 3},
        {"a": 2, "b": 0, "c": 4},
    ]

Zip tables

zip concatenates tables horizontally (combines columns row-by-row).

Arguments:

  • table - table to zip.
  • fill_value - value used to fill gaps; default is None.

Warning

  • Before using this method, please make sure you are not looking for Table.join.
  • Be cautious with using .into_iter_rows(dict) here, because by default zip uses the "keep" duplicate_columns strategy. If column names collide, dict will keep only the first occurrence.
from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    assert list(
        Table.from_rows([(1, 2), (2, 3)], ["a", "b"])
        .zip(
            Table.from_rows([(10, 3), (20, 4)], ["a", "c"]),
            fill_value=0,
        )
        .into_iter_rows(tuple, include_header=True)
    ) == [("a", "b", "a", "c"), (1, 2, 10, 3), (2, 3, 20, 4)]

Explode table

explode transforms a table with columns containing lists into a table with values of those lists, repeating values of other columns.

Arguments:

  • column_name - first column with iterables to explode.
  • *other_column_names - additional columns to explode together.
  • fill_value - value used to pad shorter arrays when exploding multiple columns together; default is None.

When multiple columns are provided, they are exploded together using zip_longest semantics (like PostgreSQL's multiple unnest in the same SELECT). Shorter arrays are padded with fill_value.

from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    # Single column explode
    assert list(
        Table.from_rows([{"a": 1, "b": [1, 2, 3]}, {"a": 10, "b": [4, 5, 6]}])
        .explode("b")
        .into_iter_rows(tuple, include_header=True)
    ) == [
        ("a", "b"),
        (1, 1),
        (1, 2),
        (1, 3),
        (10, 4),
        (10, 5),
        (10, 6),
    ]

    # Multi-column explode with zip_longest semantics
    # Shorter arrays are padded with None
    assert list(
        Table.from_rows(
            [{"a": 1, "b": [2, 3], "c": [10, 20, 30]}],
        )
        .explode("b", "c")
        .into_iter_rows(tuple, include_header=True)
    ) == [
        ("a", "b", "c"),
        (1, 2, 10),
        (1, 3, 20),
        (1, None, 30),
    ]

Wide to long

wide_to_long turns a table from wide to long form, converting each input row into multiple rows with fewer columns.

Arguments:

  • col_for_names - name of the column with names of processed columns.
  • col_for_values - name of the column with values of processed columns.
  • prepare_name - callable or conversion to prepare a name.
  • prepare_value - callable or conversion to prepare a value.
  • keep_cols - column names to keep as-is.
from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    assert list(
        Table.from_rows(
            [{"name": "John", "height": 200, "age": 30, "mood": "good"}]
        )
        .wide_to_long(
            col_for_names="metric", col_for_values="value", keep_cols=("name",)
        )
        .into_iter_rows(dict)
    ) == [
        {"name": "John", "metric": "height", "value": 200},
        {"name": "John", "metric": "age", "value": 30},
        {"name": "John", "metric": "mood", "value": "good"},
    ]

Pivot

pivot aggregates data and creates a pivot table.

Arguments:

  • rows - columns to group by.
  • columns - columns to take names of new columns from.
  • values - mapping of name to reducer of column value/values.
  • prepare_column_names - callable to create column names from column names and reducer name. By default, each generated column is named by joining the pivot column value(s) and reducer name with " - ", after converting every part to str; for example, columns=["currency"] and a "sum" reducer can produce "USD - sum".
from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    assert list(
        Table.from_rows(
            [
                {"dept": 1, "year": 2023, "currency": "USD", "revenue": 100},
                {"dept": 1, "year": 2024, "currency": "USD", "revenue": 300},
                {"dept": 1, "year": 2024, "currency": "CNY", "revenue": 200},
                {"dept": 1, "year": 2024, "currency": "CNY", "revenue": 111},
            ]
        )
        .pivot(
            rows=["year", "dept"],
            columns=["currency"],
            values={
                "sum": c.ReduceFuncs.Sum(c.col("revenue")),
                "min": c.ReduceFuncs.Min(c.col("revenue")),
            },
        )
        .into_iter_rows(dict)
    ) == [
        {
            "CNY - min": None,
            "CNY - sum": None,
            "USD - min": 100,
            "USD - sum": 100,
            "dept": 1,
            "year": 2023,
        },
        {
            "CNY - min": 111,
            "CNY - sum": 311,
            "USD - min": 300,
            "USD - sum": 300,
            "dept": 1,
            "year": 2024,
        },
    ]

Using tables inside conversions

Table cannot be used directly inside other conversions because it would introduce a second code-generation layer.

You can still use piping to callables:

from convtools import conversion as c
from convtools.contrib.tables import Table

with c.OptionsCtx() as options:
    options.debug = True

    input_data = [["a", "b"], [1, 2], [3, 4]]
    converter = (
        c.this.pipe(
            lambda it: Table.from_rows(it, header=True).into_iter_rows(dict)
        )
        .as_type(list)
        .gen_converter()
    )
    assert converter(input_data) == [
        {"a": 1, "b": 2},
        {"a": 3, "b": 4},
    ]

Note

Keep in mind that unlike conversions, Table doesn't have gen_converter, so the code cannot be generated once during a warm-up and reused. Tables generate their code on each run.