convtools.contrib package

Submodules

convtools.contrib.fs module

Python’s native open() supports neither custom newlines in text mode nor “newlines” (delimiters) in binary mode. The following functions close the gap.

convtools.contrib.fs.split_buffer(buffer, delimiter, chunk_size=32768)[source]

Reads text or binary buffer and splits it by delimiter.

Parameters
  • buffer – buffer to be read

  • delimiter – delimiter to use for splitting

  • chunk_size – chunk size to read at every iteration
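
A minimal usage sketch, assuming NUL-delimited binary records (the file name and the delimiter here are illustrative, not part of the API):

>>> from convtools.contrib.fs import split_buffer
>>> with open("records.bin", "rb") as f:
...     chunks = list(split_buffer(f, b"\x00"))  # one chunk per record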

convtools.contrib.fs.split_buffer_n_decode(buffer, delimiter, chunk_size=32768, encoding='utf-8')[source]

Reads binary buffer, splits it by binary delimiter and yields decoded chunks.

Parameters
  • buffer – buffer to be read

  • delimiter – delimiter to use for splitting

  • chunk_size – chunk size to read at every iteration

  • encoding – encoding to use when decoding a chunk
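
Similarly, a hedged sketch of reading a binary file split by a custom delimiter and decoding each chunk (file name and delimiter are illustrative):

>>> from convtools.contrib.fs import split_buffer_n_decode
>>> with open("records.bin", "rb") as f:
...     for line in split_buffer_n_decode(f, b"\r\n", encoding="utf-8"):
...         print(line)  # each yielded chunk is already a decoded str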

convtools.contrib.tables module

Implements streaming operations on table-like data and csv files.

Conversions are defined at runtime, based on the table headers and the methods called:
  • update

  • take

  • drop

  • join

class convtools.contrib.tables.CloseFileIterator(file_to_close)[source]

Bases: object

Utility to close the corresponding file once the iterator is exhausted.

class convtools.contrib.tables.CustomCsvDialect(delimiter=',', quotechar='"', escapechar=None, doublequote=True, skipinitialspace=False, lineterminator='\r\n', quoting=0)[source]

Bases: csv.Dialect

A helper to define custom csv dialects without defining classes.
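
For example, a tab-separated file might be read like this (file names are illustrative; the dialect is created via the Table.csv_dialect alias documented below):

>>> from convtools.contrib.tables import Table
>>> Table.from_csv(
...     "input.tsv",
...     header=True,
...     dialect=Table.csv_dialect(delimiter="\t"),
... ).into_csv("output.csv")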

class convtools.contrib.tables.Table(row_type: Optional[type], rows_objects: List[Iterable], meta_columns: convtools.columns.MetaColumns, pending_changes: int, pipeline: Optional[convtools.base.BaseConversion] = None, file_to_close=None)[source]

Bases: object

Table conversion exposes streaming operations on table-like data and csv files.

>>> Table.from_csv("input.csv", header=True).update(
...     c=c.col("a") + c.col("b")
... ).into_csv("output.csv")
>>> list(
...     Table.from_rows(
...         [
...             ("a", "b"),
...             (1, 2),
...             (3, 4),
...         ],
...         header=True,
...     )
...     .update(c=c.col("a") + c.col("b"))
...     .into_iter_rows()
... )
[("a", "b", "c"), (1, 2, 3), (3, 4, 7)]
chain(table: convtools.contrib.tables.Table, fill_value=None) convtools.contrib.tables.Table[source]

Chain tables, putting them one after another.

Let’s assume fill_value is set to “ ”:

>>> Table 1      Table 2
>>> | a | b |    | b | c |
>>> | 1 | 2 |    | 3 | 4 |
>>>
>>> table1.chain(table2, fill_value=" ")
>>>
>>> Result:
>>> | a | b | c |
>>> | 1 | 2 |   |
>>> |   | 3 | 4 |
Parameters
  • table – table to be chained

  • fill_value – value to use for filling gaps
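
A runnable counterpart of the illustration above (the expected output is inferred from it):

>>> from convtools.contrib.tables import Table
>>> t1 = Table.from_rows([("a", "b"), (1, 2)], header=True)
>>> t2 = Table.from_rows([("b", "c"), (3, 4)], header=True)
>>> list(t1.chain(t2, fill_value=" ").into_iter_rows(include_header=True))
[('a', 'b', 'c'), (1, 2, ' '), (' ', 3, 4)]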

property columns: List[str]

Exposes the list of column names.

csv_dialect

alias of convtools.contrib.tables.CustomCsvDialect

drop(*column_names: str) convtools.contrib.tables.Table[source]

Drops specified columns, keeping the rest.

Parameters

column_names – columns to drop

embed_conversions() convtools.contrib.tables.Table[source]

There’s no need to call this directly, as it’s done automatically when you use convtools.contrib.tables.Table.update or convtools.contrib.tables.Table.join. See the explanation below:

Since each column is either:
  • simply taken by index (cheap)

  • or obtained by performing an arbitrary convtools conversion (may be expensive)

it’s important to add an intermediate processing stage whenever a conversion depends on a column which may be expensive to calculate.

This method adds a new processing stage to the pipeline, so that all columns are exposed to further conversions as “cheap” ones.
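
For instance, chained updates where the second depends on a freshly computed column rely on this embedding happening automatically (column names here are illustrative):

>>> from convtools import conversion as c
>>> from convtools.contrib.tables import Table
>>> table = Table.from_rows([(1, 2)], header=["a", "b"])
>>> table = table.update(c_sum=c.col("a") + c.col("b"))
>>> table = table.update(doubled=c.col("c_sum") * 2)  # "c_sum" is embedded first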

explode(column_name: str)[source]

Explodes a table to a long format by exploding a column with iterables, e.g.:

>>> | a |   b    |
>>> | 1 | [2, 3] |
>>> | 4 | [5, 6] |
>>>
>>> table.explode("b")
>>>
>>> | a | b |
>>> | 1 | 2 |
>>> | 1 | 3 |
>>> | 4 | 5 |
>>> | 4 | 6 |
Parameters

column_name – column with iterables to be exploded
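
A runnable counterpart of the illustration above:

>>> from convtools.contrib.tables import Table
>>> list(
...     Table.from_rows([(1, [2, 3]), (4, [5, 6])], header=["a", "b"])
...     .explode("b")
...     .into_iter_rows(include_header=True)
... )
[('a', 'b'), (1, 2), (1, 3), (4, 5), (4, 6)]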

filter(condition: convtools.base.BaseConversion) convtools.contrib.tables.Table[source]

Filters table-like data, keeping rows where the condition resolves to True.
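
For example (keep in mind that values read from csv files are strings; file names are illustrative):

>>> from convtools import conversion as c
>>> from convtools.contrib.tables import Table
>>> Table.from_csv("input.csv", header=True).filter(
...     c.col("a") == "1"
... ).into_csv("output.csv")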

classmethod from_csv(filepath_or_buffer: Union[str, TextIO], header: Optional[Union[bool, List[str], Tuple[str], Dict[str, Union[str, int]]]] = None, duplicate_columns: str = 'mangle', skip_rows: int = 0, dialect: Union[str, convtools.contrib.tables.CustomCsvDialect] = 'excel', encoding: str = 'utf-8') convtools.contrib.tables.Table[source]

A method to initialize a table conversion from a csv-like file.

Parameters
  • filepath_or_buffer – a filepath or something csv.reader can read

  • header

    specifies header inference mode:

    • True: takes either the first tuple/list or keys of the first dict as a header

    • False: there’s no header in input data, use numbered columns instead: COLUMN_0, COLUMN_1, etc.

    • list/tuple of str: there’s no header in input data, so this is the header to be used (raises ValueError if numbers of columns don’t match)

    • dict: its keys form the header, values are str/int indices to take values of columns from input rows (raises ValueError if numbers of columns don’t match)

    • None: inspects the first row and if it’s a dict, then takes its keys as a header

  • duplicate_columns

    any of the following (“mangle” by default):

    • “raise”: ValueError is raised if a duplicate column is detected

    • “keep”: duplicate columns are left as is, but when referenced the first one is used

    • “drop”: duplicate columns are skipped

    • “mangle”: names of duplicate columns are mangled like: “name”, “name_1”, “name_2”, etc.

  • skip_rows – number of rows to skip at the beginning. Useful when the input data contains a header but you provide your own; in this case it’s convenient to skip the input’s heading row (see the example below)

  • dialect – a dialect acceptable by csv.reader. There’s a helper: convtools.contrib.tables.Table.csv_dialect to create dialects without defining classes

  • encoding – encoding to pass to open
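
For example, replacing a file’s own header with a custom one (file names are illustrative):

>>> from convtools.contrib.tables import Table
>>> Table.from_csv(
...     "input.csv",
...     header=["a", "b"],  # our own header
...     skip_rows=1,        # skip the heading row present in the file
... ).into_csv("output.csv")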

classmethod from_rows(rows: Iterable[Union[dict, tuple, list, Any]], header: Optional[Union[bool, List[str], Tuple[str], Dict[str, Union[str, int]]]] = None, duplicate_columns: str = 'raise', skip_rows: int = 0, file_to_close=None) convtools.contrib.tables.Table[source]

A method to initialize a table conversion from an iterable of rows.

Parameters
  • rows – can be either an iterable of any objects if no header inference is required OR an iterable of dicts, tuples or lists

  • header

    specifies header inference mode:

    • True: takes either the first tuple/list or keys of the first dict as a header

    • False: there’s no header in input data, use numbered columns instead: COLUMN_0, COLUMN_1, etc.

    • list/tuple of str: there’s no header in input data, so this is the header to be used (raises ValueError if numbers of columns don’t match)

    • dict: its keys form the header, values are str/int indices to take values of columns from input rows (raises ValueError if numbers of columns don’t match)

    • None: inspects the first row and if it’s a dict, then takes its keys as a header

  • duplicate_columns

    any of the following (“raise” by default):

    • “raise”: ValueError is raised if a duplicate column is detected

    • “keep”: duplicate columns are left as is, but when referenced the first one is used

    • “drop”: duplicate columns are skipped

    • “mangle”: names of duplicate columns are mangled like: “name”, “name_1”, “name_2”, etc.

  • skip_rows – number of rows to skip at the beginning. Useful when the input data contains a header but you provide your own; in this case it’s convenient to skip the input’s heading row
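
For example, inferring the header from dicts (header=None) and reading the rows back as dicts (see into_iter_rows below):

>>> from convtools.contrib.tables import Table
>>> list(
...     Table.from_rows([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
...     .into_iter_rows(dict)
... )
[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]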

get_columns() List[str][source]

Exposes the list of column names.

into_csv(filepath_or_buffer: Union[str, TextIO], include_header: bool = True, dialect: Union[str, convtools.contrib.tables.CustomCsvDialect] = 'excel', encoding='utf-8')[source]

Consumes inner rows Iterable and writes processed rows as a csv-like file.

Parameters
  • filepath_or_buffer – a filepath or something csv.writer can write to

  • include_header – whether to write the header row

  • dialect – a dialect acceptable by csv.writer. There’s a helper: convtools.contrib.tables.Table.csv_dialect to create dialects without defining classes

  • encoding – encoding to pass to open
into_iter_rows(type_=<class 'tuple'>, include_header=None) Iterable[Any][source]

Consumes inner rows Iterable and returns Iterable of processed rows.

Parameters
  • type_ – casts output rows to this type; accepts tuple, list and dict (see supported_types below)

  • include_header – whether to include the header row

into_list_of_iterables(type_=<class 'tuple'>, include_header=None) List[Iterable][source]

join(table: convtools.contrib.tables.Table, on: Union[convtools.base.BaseConversion, str, Iterable[str]], how: str, suffixes=('_LEFT', '_RIGHT')) convtools.contrib.tables.Table[source]

Joins the table conversion to another table conversion.

Parameters
  • table – another table conversion to join to self

  • on

    • either a join conversion like c.LEFT.col("a") == c.RIGHT.col("A")

    • or iterable of column names to join on

  • how – either of these: “inner”, “left”, “right”, “outer” (same as “full”)

  • suffixes – tuple of two strings: the first one is the suffix to be added to left columns, having conflicting names with right columns; the second one is added to conflicting right ones. When on is an iterable of strings, these columns are excluded from suffixing.
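
A small sketch of joining on a shared column; since on is an iterable of strings, “id” is not suffixed (the expected output assumes left columns come first):

>>> from convtools.contrib.tables import Table
>>> t1 = Table.from_rows([("id", "name"), (1, "Nick")], header=True)
>>> t2 = Table.from_rows([("id", "age"), (1, 30)], header=True)
>>> list(t1.join(t2, on=["id"], how="inner").into_iter_rows(include_header=True))
[('id', 'name', 'age'), (1, 'Nick', 30)]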

move_rows_objects() List[Iterable][source]

Moves out the rows objects, including files to be closed later.

rename(columns: Union[Tuple[str], List[str], Dict[str, str]]) convtools.contrib.tables.Table[source]

Renames columns. The behavior depends on the type of the columns argument.

Parameters

columns – if tuple/list, it defines the new column names (its length should match the number of columns in the table). If dict, it defines a mapping from old column names to new ones.
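
Brief sketches of both forms (column names are illustrative):

>>> table.rename(["x", "y"])   # tuple/list: must match the column count
>>> table.rename({"a": "x"})   # dict: renames only column "a"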

supported_types = (<class 'tuple'>, <class 'list'>, <class 'dict'>)
take(*column_names: Union[str, Any]) convtools.contrib.tables.Table[source]

Leaves only the specified columns, omitting the rest. Passing ... (Ellipsis) references all non-mentioned columns, so it’s easy both to take a few columns:

>>> table.take("a", "b")

and rearrange them:

>>> table.take("c", "d", ...)
Parameters

column_names – columns to keep

update(**column_to_conversion) convtools.contrib.tables.Table[source]

The main method to mutate table-like data.

Parameters

column_to_conversion – keyword arguments where keys are new or existing column names and values are conversions to be applied row-wise

update_all(*conversions) convtools.contrib.tables.Table[source]

Table-wide mutations, applied to each value of each column.

Parameters

conversions – conversions to apply to each value of each column
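
For example, casting every value to str (a sketch; depending on the convtools version, c.this may need to be called as c.this()):

>>> from convtools import conversion as c
>>> table.update_all(c.this.as_type(str))  # applied to every value of every column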

zip(table: convtools.contrib.tables.Table, fill_value=None) convtools.contrib.tables.Table[source]

Zips one table to another. Before using this method, make sure you are not looking for convtools.contrib.tables.Table.join.

Let’s assume fill_value is set to “ ”:

>>> Table 1      Table 2
>>> | a | b |    | b | c |
>>> | 1 | 2 |    | 3 | 4 |
>>>              | 5 | 6 |
>>>
>>> table1.zip(table2, fill_value=" ")
>>>
>>> Result:
>>> | a | b | b | c |
>>> | 1 | 2 | 3 | 4 |
>>> |   |   | 5 | 6 |
Parameters
  • table – table to be zipped

  • fill_value – value to use for filling gaps
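
A runnable counterpart of the illustration above (the expected output is inferred from it):

>>> from convtools.contrib.tables import Table
>>> t1 = Table.from_rows([("a", "b"), (1, 2)], header=True)
>>> t2 = Table.from_rows([("b", "c"), (3, 4), (5, 6)], header=True)
>>> list(t1.zip(t2, fill_value=" ").into_iter_rows(include_header=True))
[('a', 'b', 'b', 'c'), (1, 2, 3, 4), (' ', ' ', 5, 6)]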

Module contents