Basics

The idea behind this library is to allow you to dynamically build data transforms, which can be compiled to ad-hoc Python functions.

This means that we need to introduce convtools primitives for the most basic Python operations first, before we get to more complex things like aggregations and joins.

c.this

To start with, here is a function which increments an input by one:

def f(data):
    return data + 1

If we refer to data as c.this, the transform becomes c.this + 1, which is a valid convtools conversion.

So the steps of getting a converter function are:

  1. define the data transform using convtools building blocks (conversions)
  2. call the conversion's gen_converter method to generate ad-hoc code and compile a function that implements the transform
  3. use the resulting function as many times as needed

from convtools import conversion as c

conversion = c.this + 1
converter = conversion.gen_converter(debug=True)

assert converter(1) == 2

Many examples will contain debug=True just so the generated code is visible for those who are curious, not because it's required :)

If the conversion only needs to run once, generating and calling the converter can be shortened to a single execute call:

from convtools import conversion as c

assert (c.this + 1).execute(1, debug=True) == 2

c.item

Of course the above is not enough to work with data. Let's define conversions to perform key/index lookups.

Note that c.item(x) is shorthand for c.this.item(x) - it operates on the input data by default, just like c.this does.

No default

def f(data):
    return data[1]

There are two ways to do it:

  1. c.this.item(1) - to build on top of the previous example
  2. c.item(1) - same, but shorter

from convtools import conversion as c

converter = c.item(1).gen_converter(debug=True)

assert converter([10, 20]) == 20

With default

To suppress KeyError, IndexError and TypeError and get dict.get(key, default)-like behavior for arbitrary data, pass the default argument:

from convtools import conversion as c

converter = c.item(1, default=-1).gen_converter(debug=True)

assert converter([10]) == -1

Multiple indexes / keys

Sometimes you may need to perform multiple subsequent index/key lookups:

from convtools import conversion as c

converter = c.item(1, "value").gen_converter(debug=True)

assert converter([{"value": 10}, {"value": 20}]) == 20

c.attr

Just as c.item takes care of index/key lookups, c.attr performs attribute lookups. So to define the following conversion:

def f(data):
    return data.value

just use c.attr("value").

Should you need to suppress AttributeError, pass the default argument.

Here is an all-in-one example:

from convtools import conversion as c

class Obj:
    def __init__(self, a):
        self.a = a

class Container:
    def __init__(self, obj):
        self.obj = obj

assert c.attr("obj", "a").execute(Container(Obj(1)), debug=True) == 1
assert c.attr("b", default=None).execute(Obj(1), debug=True) is None

c.naive

In fact, we implicitly used c.naive when we implemented the increment conversion: the literal 1 in c.this + 1 was wrapped in it automatically. c.naive is used to make objects, functions, or any other values available inside conversions.

A good example is when we need to achieve something like the following:

VALUE_TO_VERBOSE = {
    1: "ACTIVE",
    2: "INACTIVE",
}

def f(data):
    return VALUE_TO_VERBOSE[data]

Here we made VALUE_TO_VERBOSE available to the function. To build an equivalent conversion, wrap the object to be exposed in c.naive:

from convtools import conversion as c

VALUE_TO_VERBOSE = {
    1: "ACTIVE",
    2: "INACTIVE",
}
converter = c.naive(VALUE_TO_VERBOSE).item(c.this).gen_converter(debug=True)

assert converter(2) == "INACTIVE"

And yes, you can pass conversions as arguments to other conversions (notice the .item(c.this) part).

c.input_arg

Given that we are generating functions, it is useful to be able to add parameters to them.

Let's update our "increment" function to have a keyword-only increment parameter:

def f(data, *, increment):
    return data + increment

To build a conversion like this use c.input_arg("increment") to reference the keyword argument to be passed:

from convtools import conversion as c

converter = (c.this + c.input_arg("increment")).gen_converter(debug=True)

assert converter(10, increment=5) == 15

When to use c.naive vs c.input_arg

  • c.naive: Use when the value is known at converter creation time and won't change between calls (lookup tables, configuration constants, helper functions).
  • c.input_arg: Use when the value varies per call (runtime parameters, user-provided configuration, dynamic thresholds).

Placeholders & Special References

Some conversions change what "current input" means. For example, inside c.iter(...), c.this refers to the current element, while in a join condition c.LEFT and c.RIGHT refer to the current pair of rows being matched.

| Reference | Where to use it | What it points to |
| --- | --- | --- |
| c.this | Anywhere | The current conversion input. Inside iterable helpers and comprehensions, this is the current item. |
| c.naive(value) | Anywhere | A value or function known when the converter is built and exposed to generated code. |
| c.input_arg("name") | Anywhere | A runtime argument of the generated converter. By default these become keyword-only arguments unless signature= is customized. |
| c.label("name") | After labeling data with add_label, label_input, or label_output | The previously saved value for that label. |
| c.LEFT / c.RIGHT | c.join(...) conditions and table joins | The current left and right rows being compared. |
| c.CHUNK | c.chunk_by_condition(...) | The current accumulated chunk while deciding whether the next item belongs to it. |
| c.PREV | c.cumulative(...) reduce expressions | The previous cumulative value. |

Reducers and window expressions also use context-specific inputs. Reducer arguments such as c.ReduceFuncs.Sum(c.item("amount")) are evaluated against each input row. Window conversions follow the same pattern for row expressions, while window functions themselves are provided by c.WindowFuncs.

Calling functions

One of the most important primitive conversions is the one which calls functions. Let's build a conversion which does the following:

from datetime import datetime

def f(data):
    return datetime.strptime(data, "%m/%d/%Y")

We can either:

  1. use c.call_func on datetime.strptime
  2. use call_method on datetime
  3. expose datetime.strptime via c.naive and then call it

from datetime import datetime
from convtools import conversion as c

# Option 1
converter = c.call_func(datetime.strptime, c.this, "%m/%d/%Y").gen_converter(
    debug=True
)

assert converter("12/31/2000") == datetime(2000, 12, 31)

# Option 2
converter = (
    c.naive(datetime)
    .call_method("strptime", c.this, "%m/%d/%Y")
    .gen_converter(debug=True)
)

assert converter("12/31/2000") == datetime(2000, 12, 31)

# Option 3
assert (
    c.naive(datetime.strptime)
    .call(c.this, "%m/%d/%Y")
    .execute("12/31/2000", debug=True)
) == datetime(2000, 12, 31)

Tip

To see which option is faster, look at the generated code. The second variant performs an extra .strptime attribute lookup on every call, which makes it slower. The other two variants perform this lookup only once, when the conversion is built, so a stored and reused converter never repeats it.

Recommendation: For most cases, prefer c.call_func or c.naive(func)(...). Both avoid repeated attribute lookups and produce cleaner generated code.

Calling with *args, **kwargs

Calls of the form f(*args, **kwargs) are slower because args and kwargs are rebuilt on every call, but sometimes they cannot be avoided. The options are:

  1. c.apply_func
  2. apply_method
  3. apply

from convtools import conversion as c

data = {"obj": {"args": (1, 2), "kwargs": {"mode": "verbose"}}}

class A:
    @classmethod
    def f(cls, *args, **kwargs):
        return len(args) + len(kwargs)

# Option 1
converter = c.apply_func(
    A.f, c.item("obj", "args"), c.item("obj", "kwargs")
).gen_converter(debug=True)
assert converter(data) == 3

# Option 2
converter = (
    c.naive(A)
    .apply_method("f", c.item("obj", "args"), c.item("obj", "kwargs"))
    .gen_converter(debug=True)
)
assert converter(data) == 3

# Option 3
converter = (
    c.naive(A.f)
    .apply(c.item("obj", "args"), c.item("obj", "kwargs"))
    .gen_converter(debug=True)
)
assert converter(data) == 3

Operators

from convtools import conversion as c

c(
    {
        "-a": -c.item(0),
        "a + b": c.item(0) + c.item(1),
        "a - b": c.item(0) - c.item(1),
        "a * b": c.item(0) * c.item(1),
        "a ** b": c.item(0) ** c.item(1),
        "a / b": c.item(0) / c.item(1),
        "a // b": c.item(0) // c.item(1),
        "a % b": c.item(0) % c.item(1),
        "a == b": c.item(0) == c.item(1),
        "a >= b": c.item(0) >= c.item(1),
        "a <= b": c.item(0) <= c.item(1),
        "a < b": c.item(0) < c.item(1),
        "a > b": c.item(0) > c.item(1),
        "a or b": c.or_(c.item(0), c.item(1)),
        # "a or b": c.item(0).or_(c.item(1)),
        # "a or b": c.item(0) | c.item(1),
        "a and b": c.and_(c.item(0), c.item(1)),
        # "a and b": c.item(0).and_(c.item(1)),
        # "a and b": c.item(0) & c.item(1),
        "not a": ~c.item(0),
        "a is b": c.item(0).is_(c.item(1)),
        "a is not b": c.item(0).is_not(c.item(1)),
        "a in b": c.item(0).in_(c.item(1)),
        "a not in b": c.item(0).not_in(c.item(1)),
    }
).gen_converter(debug=True)

Converter signature

Sometimes it's required to adjust the automatically generated converter signature. gen_converter accepts several parameters for this:

  1. method=True - prepends self, producing a signature like def converter(self, data_)
  2. class_method=True - prepends cls, producing a signature like def converter(cls, data_), and returns a classmethod
  3. signature="..." - uses the provided function signature verbatim
  4. debug=True - prints and dumps generated code for this converter
  5. converter_name="..." - changes the generated function name prefix

The generated converter uses data_ as the input variable. Include data_ in custom signature= values when the conversion reads the input. Also include any names referenced with c.input_arg("name"); otherwise converter generation raises an error before compiling the function.

from convtools import conversion as c


class A:
    get_one = c.naive(1).gen_converter(class_method=True, debug=True)

    get_two = c.naive(2).gen_converter(method=True, debug=True)

    get_self = c.input_arg("self").gen_converter(signature="self", debug=True)


a = A()

assert A.get_one(None) == 1 and a.get_two(None) == 2 and a.get_self() is a

Debug

When you need to debug a conversion, enable debug mode to inspect the generated Python code. There are two ways:

  1. pass debug=True to gen_converter or execute for one converter build
  2. set options.debug = True inside c.OptionsCtx() for a scoped block of converter builds

c.OptionsCtx is thread-local and restores the previous options when the with block exits. It currently supports one option:

| Option | Default | Meaning |
| --- | --- | --- |
| debug | False | Print generated code during compilation, format it with black when installed, and dump generated source files for debugger tracebacks. |

Passing debug=True to gen_converter temporarily enables the same debug option for that converter and any nested converter functions it generates. Passing debug=True to execute is a shortcut for generating a debug converter and immediately calling it.

Generated source files are written to the directory from PY_CONVTOOLS_DEBUG_DIR, or to py_convtools_debug inside Python's temporary directory when the environment variable is not set. Sources are also dumped if generated code raises an exception, so tracebacks can point to readable files.

from convtools import conversion as c

c.item(1).gen_converter(debug=True)

with c.OptionsCtx() as options:
    options.debug = True
    c.item(1).gen_converter()

Here is a small example of the kind of code debug=True prints:

def _converter(data_):
    try:
        return [
            {
                "id": _i["id"],
                "total": (_i["qty"] * _i["price"]),
            }
            for _i in data_["orders"]
        ]
    except __exceptions_to_dump_sources:
        __convtools__code_storage.dump_sources()
        raise

For interactive debugging, use the breakpoint conversion:

from convtools import conversion as c

c({"a": c.breakpoint()}).gen_converter(debug=True)
# breakpoint is also available as a method on any conversion
c({"a": c.item(0).breakpoint()}).gen_converter(debug=True)

c.breakpoint() wraps the intermediate value, stops at that point, and returns the value unchanged when execution continues. On Python 3.7+ it calls the built-in breakpoint(); on older Python versions it falls back to pdb.set_trace(). When pydevd is already loaded, it uses pydevd.settrace() for IDE debuggers.

Inline expressions

Warning

convtools cannot guard you here: it does not analyze inline code or infer anything about its properties. Avoid inline expressions where possible.

Risks:

  • Code injection: Never pass untrusted user input to inline expressions - they are executed as Python code.
  • Bypasses optimizer: Inline code cannot be analyzed or optimized by convtools.
  • Harder to debug: Errors in inline expressions produce less helpful tracebacks.

There are two ways to pass a custom code expression as a string:

  1. c.escaped_string
  2. c.inline_expr

from convtools import conversion as c

assert c.escaped_string("1 + 1").execute(None, debug=True) == 2
assert c.inline_expr("1 + 1").execute(None, debug=True) == 2

assert (
    c.inline_expr("{} + {}").pass_args(c.this, 1).execute(10, debug=True) == 11
)
assert (
    c.inline_expr("{a} + {b}").pass_args(a=c.this, b=1).execute(10, debug=True)
    == 11
)

Now that we know the basics and how the library works, we are ready to go over more complex conversions in a more cheatsheet-like manner.