convtools QuickStart

1. Installation

pip install convtools

For the sake of conciseness, let’s assume the following import statement is in place:

from convtools import conversion as c

This is the object which exposes the public API.

2. Glossary

3. Intro

Please make sure you’ve read the base info in the Glossary first.

Let’s review the most basic conversions:
  • c.this returns the input untouched

  • c.item makes any number of dictionary/index lookups, supports default=...

  • c.attr makes any number of attribute lookups, supports default=...

  • c.naive returns the object passed to it untouched

  • c.input_arg returns an input argument of the resulting converter

Example #1:

# we'll cover this "c() wrapper" in the next section
converter = c({
    "full_name": c.item("data", "fullName"),
    "age": c.item("data", "age", default=None),
}).gen_converter(debug=True)

input_data = {"data": {"fullName": "John Wick", "age": 18}}
assert converter(input_data) == {"full_name": "John Wick", "age": 18}

Example #2 - just to demonstrate every concept mentioned above:

# we'll cover this "c() wrapper" in the next section
c({
    "input": c.this,
    "naive": c.naive("string to be passed"),
    "input_arg": c.input_arg("dt"),
    "by_keys_and_indexes": c.item("key1", 1),
    "by_attrs": c.attr("keys"),
}).gen_converter(debug=True)

Example #3 (advanced) - keys/indexes/attrs can be conversions themselves:

converter = c.item(c.item("key")).gen_converter(debug=True)
assert converter({"key": "amount", "amount": 15}) == 15
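
And, for completeness, a small sketch of c.attr with a default (the sample objects below are made up for illustration):

from types import SimpleNamespace

converter = c.attr("address", "city", default="unknown").gen_converter(debug=True)

assert converter(SimpleNamespace(address=SimpleNamespace(city="London"))) == "London"
assert converter(SimpleNamespace()) == "unknown"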

These were the most basic conversions. You will see how useful they are when combined with manipulating converter signatures, passing functions / objects into conversions, and sharing conversion parts (honoring the DRY principle).

4. Creating collections - c() wrapper, Optional items, overloaded operators and debugging

Next points to learn:
  1. operators are overloaded for conversions - see the convtools operators reference

  2. every argument passed to a conversion is wrapped with c() wrapper which:

    • leaves conversions untouched

    • rebuilds python dict/list/tuple/set collections as literals

    • everything else is wrapped with c.naive

  3. collections support optional items via c.optional

Note

Whenever you are not sure what code is going to be generated, just pass debug=True to the gen_converter method. It’s also useful to have black installed: when it is available, it is used to format the auto-generated code.

For example, to convert a tuple to a dict:

data_input = (1, 2, 3)

converter = c({
    "sum": c.item(0) + c.item(1) + c.item(2),
    "and_or": c.item(0).and_(c.item(1)).or_(c.item(2)),
    "comparisons": c.item(0) > c.item(1),
}).gen_converter(debug=True)

assert converter(data_input) == {'sum': 6, 'and_or': 2, 'comparisons': False}
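
The same c() wrapper also rebuilds list/tuple/set literals; here is a small sketch (the sample data is made up): the conversions stay conversions, while the plain string is wrapped with c.naive automatically.

converter = c([c.item(0), c.item(-1), "const"]).gen_converter(debug=True)

assert converter([1, 2, 3]) == [1, 3, "const"]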

It’s possible to define an optional key, value or list/set/tuple item, which appears in the output only if a condition is met:

converter = c({
    "exists if 'key' exists": c.optional(c.item("key", default=None)),
    "exists if not None": c.optional(
        c.call_func(lambda i: i+1, c.item("key", default=None)),
        skip_value=None,
    ),
    "exists if 'amount' > 10": c.optional(
        c.call_func(bool, c.item("key", default=None)),
        skip_if=c.item("amount") <= 10,
    ),
    "exists if 'amount' > 10 (same)": c.optional(
        c.call_func(bool, c.item("key", default=None)),
        keep_if=c.item("amount") > 10,
    ),
    # works for keys too
    c.optional(
        "name",
        keep_if=c.item("tos_accepted", default=False)
     ): c.item("name"),
}).gen_converter(debug=True)

5. Passing/calling functions & objects into conversions; defining converter signature

Next:
  • gen_converter takes a signature argument to modify the signature of the resulting converter. There are also 2 shortcuts: method=True for defining methods and class_method=True for classmethods

  • there are 3 different ways of calling functions, see this section for examples:

    • c.call_func - to call a function and pass arguments (each argument is wrapped with the c() wrapper, of course)

    • (...).call_method - to call a method of the conversion result and pass args

    • (...).call - to call the callable the conversion resolves to and pass args (see the sketch after this list)

  • also there are 3 apply counterparts (covered below) for cases where argument unpacking is needed or kwargs keys contain conversions
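
Since the big example below only demonstrates c.call_func and call_method, here is a minimal sketch of (...).call (the input dict is made up): the conversion resolves to a callable, which is then called with another conversion as its argument.

converter = c.item("func").call(c.item("value")).gen_converter(debug=True)

assert converter({"func": str.upper, "value": "abc"}) == "ABC"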

Imagine we have the following:

from datetime import date
from decimal import Decimal

# A function to convert amounts
def convert_currency(
    currency_from: str, currency_to: str, dt: date, amount: Decimal
):
    # ...
    return amount

# OR an object to use to convert amounts
class CurrencyConverter:
    def __init__(self, currency_to="USD"):
        self.currency_to = currency_to

    def convert_currency(self, currency_from, dt, amount):
        # ...
        return amount

currency_converter = CurrencyConverter(currency_to="GBP")

# and some mapping to add company name:
company_id_to_name = {"id821": "Tardygram"}

Let’s prepare the converter to get a dict with company name and USD amount from a tuple:

data_input = ("id821", "EUR", date(2020, 1, 1), Decimal("100"))

converter = c({
    "id": c.item(0),

    # naive makes the mapping available to a generated code
    "company_name": c.naive(company_id_to_name).item(c.item(0)),

    "amount_usd": c.call_func(
        convert_currency,
        c.item(1),
        "USD",
        c.input_arg("kwargs").item("dt"),
        c.item(3),
    ),
    "amount_usd2": c.naive(currency_converter).call_method(
        "convert_currency",
        c.item(1),
        c.input_arg("kwargs").item("dt"),
        c.item(3),
    ),
    # of course we could take "dt" as an argument directly; passing it via
    # **kwargs is done here just for demonstration purposes
}).gen_converter(debug=True, signature="data_, **kwargs")

converter(data_input, dt=date(2020, 1, 1)) == {
    "id": "id821",
    "company_name": "Tardygram",
    "amount_usd": Decimal("110"),
    "amount_usd2": Decimal("110"),
}

Let’s review the apply counterparts:

c.apply_func(f, args, kwargs)
# is same as the following, but works for kwargs with conversions as keys
c.call_func(f, *args, **kwargs)

c.apply(args, kwargs)
c.this.apply(args, kwargs)
# are same as
c.call(*args, **kwargs)
c.this.call(*args, **kwargs)

c.this.apply_method("foo", args, kwargs)
# is same as
c.this.call_method("foo", *args, **kwargs)
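
A hedged sketch of the apply idea, assuming the args/kwargs collections are unpacked at runtime (divmod and the input tuple are made up for illustration):

converter = c.apply_func(divmod, c.this, {}).gen_converter(debug=True)

# under the assumption above, the input tuple is spread into divmod's
# positional arguments: converter((7, 3)) -> (2, 1)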

6. List/dict/set/tuple comprehensions & inline expressions

Next:
  1. the following conversions generate comprehension code: c.generator_comp, c.list_comp, c.tuple_comp, c.set_comp, c.dict_comp and c.iter

  2. every comprehension supports if clauses to filter input:

    • c.list_comp(..., where=condition_conv)

    • c.this.iter(..., where=condition_conv)

  3. to avoid unnecessary function call overhead, there is a way to pass an inline python expression c.inline_expr

Let’s do it all at once:

input_data = [
    {"value": 100, "country": "US"},
    {"value": 15, "country": "CA"},
    {"value": 74, "country": "AU"},
    {"value": 350, "country": "US"},
]

converter = c.list_comp(
    c.item("value").call_method("bit_length"),
    where=c.item("country") == "US"
).sort(
    # working with the resulting item here
    key=lambda item: item,
    reverse=True,
).gen_converter(debug=True)
assert converter(input_data) == [9, 7]  # bit lengths of the two "US" values, sorted descending

This may be useful when you work with dicts whose values are lists:

conv = (
    c.this
    .call_method("items")
    .pipe(
        c.inline_expr(
            "(key, item)"
            " for key, items in {}"
            " for item in items"
            " if key"
        ).pass_args(c.this)
    )
    # of course we could continue doing something interesting here
    # .pipe(
    #     c.group_by(...).aggregate(...)
    # )
).gen_converter(debug=True)
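
A possible usage with made-up input data (the falsy "" key is dropped by the "if key" clause):

assert list(conv({"a": [1, 2], "": [3]})) == [("a", 1), ("a", 2)]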

7.1. Processing collections - I: iter, filter, sort, pipe, label, if, if_multiple

Points to learn:

  1. c.iter iterates through an iterable, applying conversion to each element

  2. c.filter iterates through an iterable, filtering it by a passed conversion, taking items for which the conversion resolves to true

  3. c.sort passes the input to sorted

  4. (...).pipe chains two conversions by passing the result of the first one to the second one. If piping is done at the top level of a resulting conversion (not nested), then it’s going to be represented as several statements in the resulting code.

  5. labels extend pipe and regular conversions functionality:

    • (...).add_label("first_el") allows applying any conversion and then labeling its result, e.g. c.item(0).add_label("first_el")

    • to reference the result c.label("first_el") is used

    • any (...).pipe supports label_input and label_output parameters, both accept either str (a label name) or dict (keys are label names, values are conversions to be applied before labeling)

  6. c.if_ allows building 1 if a else 2 expressions. Not every parameter has to be passed:

    • if a condition is not passed, then the input is used as a condition

    • if any branch is not passed, then the input is passed untouched

  7. c.if_multiple combines multiple condition-result pairs to build a case/when-like conversion

A simple pipe first:

conv = c.iter(c.this * 2).pipe(sum).gen_converter(debug=True)

# OR THE SAME
conv = c.generator_comp(c.this * 2).pipe(sum).gen_converter(debug=True)
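
For instance, with a tiny made-up input, both converters double each element and sum the results:

assert conv([1, 2, 3]) == 12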

A bit more complex one:

from decimal import Decimal

conv = c.dict_comp(
    c.item("name"),
    c.item("transactions").pipe(
        c.list_comp(
            {
                "id": c.item(0).as_type(str),
                "amount": c.item(1).pipe(
                    c.if_(c.this, c.this.as_type(Decimal), None)
                ),
            }
        )
    ),
).gen_converter(debug=True)
assert conv([{"name": "test", "transactions": [(0, 0), (1, 10)]}]) == {
    "test": [
        {"id": "0", "amount": None},
        {"id": "1", "amount": Decimal("10")},
    ]
}

Now let’s use some labels:

conv1 = (
    c.this.add_label("input")
    .pipe(
        c.filter(c.this % 3 == 0),
        label_input={
            "input_type": c.call_func(type, c.this),
        },
    )
    .pipe(
        c.list_comp(c.this.as_type(str)),
        label_output={
            "list_length": c.call_func(len, c.this),
            "separator": c.if_(c.label("list_length") > 10, ",", ";"),
        },
    )
    .pipe({
        "result": c.label("separator").call_method("join", c.this),
        "input_type": c.label("input_type"),
        "input_data": c.label("input"),
    })
    .gen_converter(debug=True)
)
assert conv1(range(30)) == {
    "result": "0;3;6;9;12;15;18;21;24;27",
    "input_type": range,
    "input_data": range(0, 30),
}
assert conv1(range(40)) == {
    "result": "0,3,6,9,12,15,18,21,24,27,30,33,36,39",
    "input_type": range,
    "input_data": range(0, 40),
}

A note on c.if_ internals: if the condition contains any function calls or index/attribute lookups, the input is cached into a variable first, because c.if_ cannot be sure whether it is cheap (or even safe) to evaluate the input code twice.

Let’s finish the section by reviewing how c.if_multiple works:

converter = c.iter(
    c.if_multiple(
        (c.this < 0, c.this * 10),
        (c.this == 0, None),
        else_=5
    )
).gen_converter(debug=True)
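
A possible usage with made-up input (c.iter yields a generator, hence the list call):

assert list(converter([-1, 0, 7])) == [-10, None, 5]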

7.2. Processing collections - II: chunk_by, chunk_by_condition, iter_windows

Points to learn:

  1. c.chunk_by slices an iterable into chunks by element values and/or chunk size

  2. c.chunk_by_condition slices an iterable into chunks based on a condition, which is a function of the current chunk and the current element

  3. c.iter_windows / (...).iter_windows iterates through an iterable and yields tuples, obtained by sliding a window of a given width and moving it by the specified step size, e.g. c.iter_windows(width=7, step=1)

A simple example first:

conv = c.chunk_by(size=1000).gen_converter(debug=True)

# OR USING A CONDITION
# conv = c.chunk_by_condition(c.CHUNK.len() < 1000).gen_converter(debug=True)
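
A hedged usage sketch with a smaller chunk size, assuming chunks come out as lists:

conv_small = c.chunk_by(size=2).as_type(list).gen_converter()

assert conv_small([1, 2, 3, 4, 5]) == [[1, 2], [3, 4], [5]]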

And a more complex one, which runs an aggregation on each chunk:

c.chunk_by(
    c.item("x"),
    size=1000
).aggregate({
    "x": c.ReduceFuncs.Last(c.item("x")),
    "y": c.ReduceFuncs.Sum(c.item("y")),
}).gen_converter(debug=True)

# OR USING A CONDITION
# c.chunk_by_condition(
#     c.and_(
#         c.CHUNK.item(-1, "x") == c.item("x"),
#         c.CHUNK.len() < 1000
#     )
# ).aggregate({
#     "x": c.ReduceFuncs.Last(c.item("x")),
#     "y": c.ReduceFuncs.Sum(c.item("y")),
# }).gen_converter(debug=True)

8. Helper shortcuts

Points to learn:

  1. (...).len() is a shortcut to python’s len

  2. c.min and c.max are shortcuts to python’s min & max

  3. c.zip is python’s zip with batteries: when positional args are provided, it generates tuples; when keyword args are provided, it generates dicts (sketched below)

  4. c.repeat wraps python’s itertools.repeat

  5. c.flatten wraps python’s itertools.chain.from_iterable

  6. c.take_while re-implements itertools.takewhile

  7. c.drop_while re-implements itertools.dropwhile

  8. c.and_then is a shortcut for (...).pipe(c.if_(condition, conversion, c.this)), where condition defaults to bool. It applies the conversion when the condition is true, otherwise it leaves the input untouched

from datetime import timedelta

c.this.len()
c.min(c.this, 5)
c.zip(c.item("list_a"), c.repeat(None))
c.zip(a=c.item("list_a"), b=c.repeat(None))
c.take_while(c.this < 3)
c.drop_while(c.this < 3)
c.item("dt").and_then(c.this + timedelta(days=1))

9. Aggregations

Points to learn:
  1. first, call c.group_by to specify one or more conversions to use as group-by keys (you get a list of results in the end, one per group) OR pass no conversions to aggregate the whole input (which results in a single item)

  2. then call the aggregate method to define the desired output, comprised of:

    • (optional) a container you want to get the results in

    • (optional) group by keys or further conversions of them

    • any number of available out of the box c.ReduceFuncs or further conversions of them

    • any number of custom c.reduce and further conversions of them

  3. c.aggregate is a shortcut for c.group_by().aggregate(...), as sketched right below
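
Here is a minimal sketch of the c.aggregate shortcut (made-up data; no group-by keys, so a single result is produced):

converter = c.aggregate({
    "min": c.ReduceFuncs.Min(c.this),
    "max": c.ReduceFuncs.Max(c.this),
    "sum": c.ReduceFuncs.Sum(c.this),
}).gen_converter(debug=True)

assert converter([3, 1, 2]) == {"min": 1, "max": 3, "sum": 6}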

Rather than provide a lot of boring examples, let’s use the most interesting reduce functions:
  • use the SumOrNone reducer

  • find the row with the max value of one field and return the value of another field

  • take the first value (one per group)

  • use the DictArrayDistinct reducer

  • use the DictSum reducer

from datetime import date, datetime
from decimal import Decimal

from convtools import conversion as c


def test_doc__quickstart_aggregation():
    input_data = [
        {
            "company_name": "Facebrochure",
            "company_hq": "CA",
            "app_name": "Tardygram",
            "date": "2019-01-01",
            "country": "US",
            "sales": Decimal("45678.98"),
        },
        {
            "company_name": "Facebrochure",
            "company_hq": "CA",
            "app_name": "Tardygram",
            "date": "2019-01-02",
            "country": "US",
            "sales": Decimal("86869.12"),
        },
        {
            "company_name": "Facebrochure",
            "company_hq": "CA",
            "app_name": "Tardygram",
            "date": "2019-01-03",
            "country": "CA",
            "sales": Decimal("45000.35"),
        },
        {
            "company_name": "BrainCorp",
            "company_hq": "NY",
            "app_name": "Learn FT",
            "date": "2019-01-01",
            "country": "US",
            "sales": Decimal("86869.12"),
        },
    ]

    # we are going to reuse this reducer
    top_sales_day = c.ReduceFuncs.MaxRow(c.item("sales"))

    # so the result is going to be a list of dicts
    converter = (
        c.group_by(c.item("company_name"))
        .aggregate(
            {
                "company_name": c.item("company_name").call_method("upper"),
                # this would work as well
                # c.item("company_name"): ...,
                "none_sensitive_sum": c.ReduceFuncs.SumOrNone(c.item("sales")),
                # as you can see, next two reduce objects do the same except taking
                # different fields after finding a row with max value.
                # but please check the generated code below, you'll see that it is
                # calculated just once AND then reused to take necessary fields
                "top_sales_app": top_sales_day.item("app_name"),
                "top_sales_day": (
                    top_sales_day.item("date")
                    .pipe(
                        datetime.strptime,
                        "%Y-%m-%d",
                    )
                    .call_method("date")
                ),
                "company_hq": c.ReduceFuncs.First(c.item("company_hq")),
                "app_name_to_countries": c.ReduceFuncs.DictArrayDistinct(
                    c.item("app_name"), c.item("country")
                ),
                "app_name_to_sales": c.ReduceFuncs.DictSum(
                    c.item("app_name"), c.item("sales")
                ),
            }
        )
        .gen_converter()
    )

    assert converter(input_data) == [
        {
            "app_name_to_countries": {"Tardygram": ["US", "CA"]},
            "app_name_to_sales": {"Tardygram": Decimal("177548.45")},
            "company_hq": "CA",
            "company_name": "FACEBROCHURE",
            "none_sensitive_sum": Decimal("177548.45"),
            "top_sales_app": "Tardygram",
            "top_sales_day": date(2019, 1, 2),
        },
        {
            "app_name_to_countries": {"Learn FT": ["US"]},
            "app_name_to_sales": {"Learn FT": Decimal("86869.12")},
            "company_hq": "NY",
            "company_name": "BRAINCORP",
            "none_sensitive_sum": Decimal("86869.12"),
            "top_sales_app": "Learn FT",
            "top_sales_day": date(2019, 1, 1),
        },
    ]

10. Joins

There is JOIN functionality, which returns a generator of joined pairs. Points to learn:

  1. c.join exposes API for joins

    • first two positional arguments are conversions which are considered as 2 iterables to be joined

    • the third argument is a join condition, represented as a conversion based on c.LEFT and c.RIGHT

  2. the following join types are supported (via passing how):

    • inner (default)

    • left

    • right

    • outer

    • cross (inner with condition=True)

Let’s say we want to parse a JSON string, take the two collections, join them on the left id == right id AND right value > 100 condition, and then merge the data of the joined pairs into dicts:

import json

s = '''{"left": [
    {"id": 1, "value": 10},
    {"id": 2, "value": 20}
], "right": [
    {"id": 1, "value": 100},
    {"id": 2, "value": 200}
]}'''
conv1 = (
    c.call_func(json.loads, c.this)
    .pipe(
        c.join(
            c.item("left"),
            c.item("right"),
            c.and_(
                c.LEFT.item("id") == c.RIGHT.item("id"),
                c.RIGHT.item("value") > 100
            ),
            how="left",
        )
    )
    .pipe(
        c.list_comp({
            "id": c.item(0, "id"),
            "value_left": c.item(0, "value"),
            "value_right": c.item(1).and_(c.item(1, "value")),
        })
    )
    .gen_converter(debug=True)
)
assert conv1(s) == [
    {'id': 1, 'value_left': 10, 'value_right': None},
    {'id': 2, 'value_left': 20, 'value_right': 200}
]

11. Mutations

Alongside pipes, there’s a way to tap into any conversion and mutate its result, either with c.iter_mut (which iterates through an iterable and mutates each element) or with (...).tap (which mutates the conversion result in place). The following mutations are available:
  • c.Mut.set_item

  • c.Mut.set_attr

  • c.Mut.del_item

  • c.Mut.del_attr

  • c.Mut.custom

iter_mut example:

input_data = [{"a": 1, "b": 2}]

converter = c.iter_mut(
    c.Mut.set_item("c", c.item("a") + c.item("b")),
    c.Mut.del_item("a"),
    c.Mut.custom(c.this.call_method("update", c.input_arg("data")))
).as_type(list).gen_converter(debug=True)

assert converter(input_data, data={"d": 4}) == [{"b": 2, "c": 3, "d": 4}]

tap example:

input_data = [{"a": 1, "b": 2}]

converter = c.list_comp(
    c.this.tap(
        c.Mut.set_item("c", c.item("a") + c.item("b")),
        c.Mut.del_item("a"),
        c.Mut.custom(c.this.call_method("update", c.input_arg("data")))
    )
).gen_converter(debug=True)

assert converter(input_data, data={"d": 4}) == [{"b": 2, "c": 3, "d": 4}]

12. Cumulative

It’s possible to calculate cumulative metrics by defining two conversions:
  • how to calculate the initial value from the first element encountered

  • how to reduce two elements to one

The main method is c.cumulative OR (...).cumulative; a cumulative can be reset to its initial state via cumulative_reset (used as c.cumulative_reset in the example below):

assert (
    c.iter(c.cumulative(c.this, c.this + c.PREV))
    .as_type(list)
    .execute([0, 1, 2, 3, 4], debug=True)
) == [0, 1, 3, 6, 10]
assert (
    c.iter(
        c.cumulative_reset("abc")
        .iter(c.cumulative(c.this, c.this + c.PREV, label_name="abc"))
        .as_type(list)
    )
    .as_type(list)
    .execute([[0, 1, 2], [3, 4]], debug=True)
) == [[0, 1, 3], [3, 7]]

13. Debugging & setting Options

Compiled converters are debuggable callables which dump the generated code to disk, either to PY_CONVTOOLS_DEBUG_DIR (if this environment variable is defined) or to tempfile.gettempdir(), in any of the following cases:

  • on exception inside a converter

  • on .gen_converter(debug=True)

  • if the breakpoint() method is used.

So there are 3 options to help you debug:

# No. 1: just prints black-formatted code
c.this.gen_converter(debug=True)

# No. 2: both prints black-formatted code & puts a breakpoint after "name"
# lookup
c.list_comp(c.item("name").breakpoint()).gen_converter()
# e.g. what's inside list_comp
c.list_comp(c.breakpoint()).gen_converter()

# No. 3: prints black-formatted code for all converters, generated within
# the context
with c.OptionsCtx() as options:
    options.debug = True
    c.this.gen_converter()

See c.OptionsCtx() API docs for the full list of available options.

14. Details: inner input data passing

There are a few conversions which change the input for subsequent conversions (a small sketch follows the list):
  • comprehensions

    inside a comprehension the input is an item of an iterable

  • pipes

    next conversion gets the result of a previous one

  • filters

    next conversion gets the result of a previous one

  • aggregations

    any further conversions applied either to group-by fields or to reduce objects take the result of the aggregation as their input
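
As a small sketch of the first point (with made-up data): inside c.list_comp the input becomes each element of the iterable, so c.item("a") below reads from each dict rather than from the outer list.

converter = c.list_comp(c.item("a")).gen_converter(debug=True)

assert converter([{"a": 1}, {"a": 2}]) == [1, 2]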