Contrib / Tables¶
Table is a streaming helper for tabular data. It builds a pipeline of
convtools conversions so you can reshape, enrich, filter, and combine rows
without loading the entire dataset into memory.
Warning
Tableis single-pass: the underlying iterable is consumed once. If you need to reuse data, materialize it or reopen the source.- If you want to use
Tableinside conversions, read this first.
Header handling and duplicate columns¶
header supports multiple forms:
True- infer a header from the first row. For list/tuple input, the first row becomes the header and is not part of the data. For dict input, the first row's keys become column names, and that first row's values are still emitted as data.False- treat input as headerless; columns are numbered (COLUMN_0,COLUMN_1, ...).list/tuple- explicit column names.dict- keys are column names, values are indexes (or keys) used to pull data.None- infer only if the first row is a dict; otherwise behave likeFalse. When inferred from a dict, the first row is still emitted as data.
duplicate_columns controls how repeated column names are handled:
raise- raiseValueError.mangle- rename duplicates toname,name_1,name_2, ...keep- keep duplicates; when referenced by name, the first one wins.drop- skip duplicates entirely.
Defaults differ by constructor: from_rows uses raise, while from_csv and
from_jsonl use mangle.
Read / output rows¶
Table.from_rows¶
Initialize a table from an iterable of rows.
Arguments:
rows- iterable ofdict,tuple, orlistis treated as a multi-column table; any other type is treated as a single-column table.header- see Header handling and duplicate columns.duplicate_columns- see Header handling and duplicate columns.skip_rows- number of rows to skip before header handling; default is 0.
Table.into_iter_rows¶
Return processed rows as an iterator.
Arguments:
type_- must bedict,tuple, orlist.include_header- whenTrue, prepend a header row (ignored fordict).
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
# NO HEADER PROVIDED
assert list(
Table.from_rows([(1, 2, 3), (2, 3, 4)]).into_iter_rows(dict)
) == [
{"COLUMN_0": 1, "COLUMN_1": 2, "COLUMN_2": 3},
{"COLUMN_0": 2, "COLUMN_1": 3, "COLUMN_2": 4},
]
# HEADER IS PROVIDED
assert list(
Table.from_rows(
[[1, 2, 3], [2, 3, 4]], header=["a", "b", "c"]
).into_iter_rows(dict)
) == [
{"a": 1, "b": 2, "c": 3},
{"a": 2, "b": 3, "c": 4},
]
# INFERS HEADER ON DEMAND
assert list(
Table.from_rows(
[["a", "b", "c"], [1, 2, 3], [2, 3, 4]], header=True
).into_iter_rows(dict)
) == [
{"a": 1, "b": 2, "c": 3},
{"a": 2, "b": 3, "c": 4},
]
# INFERS HEADER AUTOMATICALLY BECAUSE THE FIRST ELEMENT IS A DICT
assert list(
Table.from_rows([{"a": 1, "b": 2}, {"a": 2, "b": 3}]).into_iter_rows(
tuple, include_header=True
)
) == [("a", "b"), (1, 2), (2, 3)]
Read CSV-like¶
Table.from_csv¶
Initialize a table from a CSV-like file.
Arguments:
filepath_or_buffer- a filepath or a buffer acceptable bycsv.reader.header- see Header handling and duplicate columns.duplicate_columns- see Header handling and duplicate columns.skip_rows- number of rows to skip before header handling; default is 0.dialect- a dialect acceptable bycsv.reader. UseTable.csv_dialect(delimiter="\t")for tab-separated files.encoding- default isutf-8.
Table.into_csv¶
Write the results to a CSV-like file.
This is a terminal writer method: it consumes the table pipeline, writes rows
to the target, and returns None.
Arguments:
filepath_or_buffer- a filepath or somethingcsv.writercan write to.include_header- whether to emit the header; default isTrue.dialect- a dialect acceptable bycsv.writer. UseTable.csv_dialect(delimiter="\t")for tab-separated files.encoding- default isutf-8.
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
# READING CSV
# file content:
# a,b
# 1,2
# 2,3
assert list(
Table
.from_csv("tests/csvs/ab.csv", header=True)
.into_iter_rows(dict)
# .into_csv("output.csv") # TO WRITE TO A FILE
) == [
{"a": "1", "b": "2"},
{"a": "2", "b": "3"},
]
# READING TSV
# file content:
# a\tc
# 2\t4
# 1\t3
assert list(
Table.from_csv(
"tests/csvs/ac.csv",
header=True,
dialect=Table.csv_dialect(delimiter="\t"),
).into_iter_rows(dict)
) == [
{"a": "2", "c": "4"},
{"a": "1", "c": "3"},
]
# READ TSV + SKIP EXISTING HEADER + REMAP COLUMNS
# file content:
# a\tc
# 2\t4
# 1\t3
assert list(
Table.from_csv(
"tests/csvs/ac.csv",
header={"a": 1, "c": 0},
skip_rows=1,
dialect=Table.csv_dialect(delimiter="\t"),
).into_iter_rows(dict)
) == [
{"a": "4", "c": "2"},
{"a": "3", "c": "1"},
]
Read JSONL¶
Table.from_jsonl¶
Initialize a table from a JSONL (JSON Lines) file. Each line must be a valid JSON object or array. Empty lines are silently skipped.
Arguments:
filepath_or_buffer- a filepath or a text buffer to read from.header- see Header handling and duplicate columns.duplicate_columns- see Header handling and duplicate columns.skip_rows- number of rows to skip before header handling; default is 0.encoding- default isutf-8.
Table.into_jsonl¶
Write the results to a JSONL file (one JSON object per line).
This is a terminal writer method: it consumes the table pipeline, writes rows
to the target, and returns None.
Arguments:
filepath_or_buffer- a filepath or a text buffer to write to.encoding- default isutf-8.
from io import StringIO
from convtools import conversion as c
from convtools.contrib.tables import Table
# READING JSONL (file with one JSON object per line)
output = StringIO(
'{"a": 1, "b": 2}\n'
'{"a": 3, "b": 4}\n'
)
assert list(Table.from_jsonl(output).into_iter_rows(dict)) == [
{"a": 1, "b": 2},
{"a": 3, "b": 4},
]
# READING JSONL WITH ARRAYS
arrays = StringIO("[1, 2, 3]\n[4, 5, 6]\n")
assert list(
Table.from_jsonl(arrays, header=["x", "y", "z"]).into_iter_rows(dict)
) == [
{"x": 1, "y": 2, "z": 3},
{"x": 4, "y": 5, "z": 6},
]
# READING + TRANSFORMING + WRITING
output.seek(0)
transformed = StringIO()
result = (
Table.from_jsonl(output)
.update(c=c.col("a") + c.col("b"))
.into_jsonl(transformed)
)
assert result is None
transformed.seek(0)
assert list(Table.from_jsonl(transformed).into_iter_rows(dict)) == [
{"a": 1, "b": 2, "c": 3},
{"a": 3, "b": 4, "c": 7},
]
Rename, take / rearrange, drop columns¶
rename(columns)renames columns:tuple/listdefine new column names (length must match the current number of columns).dictmaps old column names to new ones.take(*column_names)keeps only specified columns (order matters).takecan accept...to include all non-mentioned columns, which makes reordering easy:table.take("c", "d", ...).drop(*column_names)drops specified columns, keeping the rest as-is.
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
assert list(
Table.from_rows([("A", "b", "c"), (1, 2, 3), (2, 3, 4)], header=True)
.rename({"A": "a"})
.drop("b")
.take("c", ...) # MAKE "c" COLUMN THE FIRST ONE
.into_iter_rows(dict)
) == [{"c": 3, "a": 1}, {"c": 4, "a": 2}]
Add, update columns¶
Use update to add or replace columns and update_all to apply conversions
to every existing column.
update(**column_to_conversion)takes keyword arguments where keys are column names and values are conversions applied row-wise.update_all(*conversions)composes conversions and applies them to each value in each column.
Use c.col("column name") to reference column values.
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
assert list(
Table.from_rows([(1, -2), (2, -3)], ["a", "b"])
.update(c=c.col("a") + c.col("b")) # ADDING NEW COLUMN: "c"
.update(c=c.call_func(abs, c.col("c"))) # UPDATING NEW COLUMN: "c"
.into_iter_rows(dict)
) == [{"a": 1, "b": -2, "c": 1}, {"a": 2, "b": -3, "c": 1}]
Filter rows¶
To filter rows, pass a conversion to the filter method. Rows are kept when
the conversion returns a truthy value.
Use c.col("column name") to reference column values.
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
assert list(
Table.from_rows([(1, -2), (2, -3)], header=["a", "b"])
.filter(c.col("b") < -2)
.into_iter_rows(dict)
) == [{"a": 2, "b": -3}]
Join tables¶
Use join to combine two tables. Arguments:
table- another table to join with.on- either:- a join conversion like
c.LEFT.col("a") == c.RIGHT.col("A"), or - an iterable of column names to join on.
how- one of"inner","left","right","full".suffixes- a tuple of two strings (left and right) to be concatenated with conflicting column names. Default is("_LEFT", "_RIGHT"). Columns listed inon(whenonis an iterable of names) are not suffixed.
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
# JOIN ON COLUMN NAMES
assert list(
Table.from_rows([(1, 2), (2, 3)], ["a", "b"])
.join(
Table.from_rows([(1, 3), (2, 4)], ["a", "c"]),
how="inner",
on=["a"],
)
.into_iter_rows(dict)
) == [
{"a": 1, "b": 2, "c": 3},
{"a": 2, "b": 3, "c": 4},
]
# JOIN ON CONDITION
assert list(
Table.from_rows([(1, 2), (2, 30)], ["a", "b"])
.join(
Table.from_rows([(1, 3), (2, 4)], ["a", "c"]),
how="full",
on=c.and_(
c.LEFT.col("a") == c.RIGHT.col("a"),
c.LEFT.col("b") < c.RIGHT.col("c"),
),
)
.into_iter_rows(dict)
) == [
{"a_LEFT": 1, "b": 2, "a_RIGHT": 1, "c": 3},
{"a_LEFT": 2, "b": 30, "a_RIGHT": None, "c": None},
{"a_LEFT": None, "b": None, "a_RIGHT": 2, "c": 4},
]
Chain tables¶
chain concatenates tables vertically (appends rows).
Arguments:
table- table to chain.fill_value- value used to fill gaps when columns don't align; default isNone.
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
assert list(
Table.from_rows([(1, 2), (2, 3)], ["a", "b"])
.chain(
Table.from_rows([(1, 3), (2, 4)], ["a", "c"]),
fill_value=0,
)
.into_iter_rows(dict)
) == [
{"a": 1, "b": 2, "c": 0},
{"a": 2, "b": 3, "c": 0},
{"a": 1, "b": 0, "c": 3},
{"a": 2, "b": 0, "c": 4},
]
Zip tables¶
zip concatenates tables horizontally (combines columns row-by-row).
Arguments:
table- table to zip.fill_value- value used to fill gaps; default isNone.
Warning
- Before using this method, please make sure you are not looking for
Table.join. - Be cautious with using
.into_iter_rows(dict)here, because by defaultzipuses the"keep"duplicate_columnsstrategy. If column names collide,dictwill keep only the first occurrence.
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
assert list(
Table.from_rows([(1, 2), (2, 3)], ["a", "b"])
.zip(
Table.from_rows([(10, 3), (20, 4)], ["a", "c"]),
fill_value=0,
)
.into_iter_rows(tuple, include_header=True)
) == [("a", "b", "a", "c"), (1, 2, 10, 3), (2, 3, 20, 4)]
Explode table¶
explode transforms a table with columns containing lists into a table with
values of those lists, repeating values of other columns.
Arguments:
column_name- first column with iterables to explode.*other_column_names- additional columns to explode together.fill_value- value used to pad shorter arrays when exploding multiple columns together; default isNone.
When multiple columns are provided, they are exploded together using
zip_longest semantics (like PostgreSQL's multiple unnest in the same
SELECT). Shorter arrays are padded with fill_value.
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
# Single column explode
assert list(
Table.from_rows([{"a": 1, "b": [1, 2, 3]}, {"a": 10, "b": [4, 5, 6]}])
.explode("b")
.into_iter_rows(tuple, include_header=True)
) == [
("a", "b"),
(1, 1),
(1, 2),
(1, 3),
(10, 4),
(10, 5),
(10, 6),
]
# Multi-column explode with zip_longest semantics
# Shorter arrays are padded with None
assert list(
Table.from_rows(
[{"a": 1, "b": [2, 3], "c": [10, 20, 30]}],
)
.explode("b", "c")
.into_iter_rows(tuple, include_header=True)
) == [
("a", "b", "c"),
(1, 2, 10),
(1, 3, 20),
(1, None, 30),
]
Wide to long¶
wide_to_long turns a table from wide to long form, converting each input row
into multiple rows with fewer columns.
Arguments:
col_for_names- name of the column with names of processed columns.col_for_values- name of the column with values of processed columns.prepare_name- callable or conversion to prepare a name.prepare_value- callable or conversion to prepare a value.keep_cols- column names to keep as-is.
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
assert list(
Table.from_rows(
[{"name": "John", "height": 200, "age": 30, "mood": "good"}]
)
.wide_to_long(
col_for_names="metric", col_for_values="value", keep_cols=("name",)
)
.into_iter_rows(dict)
) == [
{"name": "John", "metric": "height", "value": 200},
{"name": "John", "metric": "age", "value": 30},
{"name": "John", "metric": "mood", "value": "good"},
]
Pivot¶
pivot aggregates data and creates a pivot table.
Arguments:
rows- columns to group by.columns- columns to take names of new columns from.values- mapping of name to reducer of column value/values.prepare_column_names- callable to create column names from column names and reducer name. By default, each generated column is named by joining the pivot column value(s) and reducer name with" - ", after converting every part tostr; for example,columns=["currency"]and a"sum"reducer can produce"USD - sum".
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
assert list(
Table.from_rows(
[
{"dept": 1, "year": 2023, "currency": "USD", "revenue": 100},
{"dept": 1, "year": 2024, "currency": "USD", "revenue": 300},
{"dept": 1, "year": 2024, "currency": "CNY", "revenue": 200},
{"dept": 1, "year": 2024, "currency": "CNY", "revenue": 111},
]
)
.pivot(
rows=["year", "dept"],
columns=["currency"],
values={
"sum": c.ReduceFuncs.Sum(c.col("revenue")),
"min": c.ReduceFuncs.Min(c.col("revenue")),
},
)
.into_iter_rows(dict)
) == [
{
"CNY - min": None,
"CNY - sum": None,
"USD - min": 100,
"USD - sum": 100,
"dept": 1,
"year": 2023,
},
{
"CNY - min": 111,
"CNY - sum": 311,
"USD - min": 300,
"USD - sum": 300,
"dept": 1,
"year": 2024,
},
]
Using tables inside conversions¶
Table cannot be used directly inside other conversions because it would
introduce a second code-generation layer.
You can still use piping to callables:
from convtools import conversion as c
from convtools.contrib.tables import Table
with c.OptionsCtx() as options:
options.debug = True
input_data = [["a", "b"], [1, 2], [3, 4]]
converter = (
c.this.pipe(
lambda it: Table.from_rows(it, header=True).into_iter_rows(dict)
)
.as_type(list)
.gen_converter()
)
assert converter(input_data) == [
{"a": 1, "b": 2},
{"a": 3, "b": 4},
]
Note
Keep in mind that unlike conversions, Table doesn't have gen_converter,
so the code cannot be generated once during a warm-up and reused. Tables
generate their code on each run.