Skip to content

Joins

Prerequisites

Start with Basics for conversion fundamentals and Collections for shaping or iterating inputs before joining.

See also

Join conditions use context-specific references; see Placeholders & Special References. For composing join predicates, see Conditions and Pipes.

c.join(left_conversion, right_conversion, condition, how="inner") defines join conversion, which returns an iterator of (left_element, right_element) tuples.

  • left_conversion defines the left part of a join
  • right_conversion defines the right part of a join
  • condition defines which left/right row pairs match. Use c.LEFT and c.RIGHT inside the condition to reference elements of the left and right sequences. Pass True to make a cross join.
  • how is any of "inner" | "left" | "right" | "full". "outer" is accepted as an alias for "full".

See Placeholders & Special References for the full list of context-specific references.

Info

c.join builds a hash map of the right side when keys are equi‑joinable (c.and_(...) / == operators only); memory is O(len(right)). Non‑equi conditions use nested-loop matching and can be much slower on large inputs.

Left and right references

c.LEFT and c.RIGHT point to the current left and right rows being compared. They behave like normal conversions, so you can use item, attr, call_method, operators, and type conversions on them.

The example below joins two collections from a tuple input. The right-side IDs are strings, so the condition casts them with .as_type(int) before comparing. It also keeps only right-side rows where age >= 18.

from convtools import conversion as c

collection_1 = [
    {"id": 1, "name": "Nick"},
    {"id": 2, "name": "Joash"},
    {"id": 3, "name": "Bob"},
]
collection_2 = [
    {"ID": "3", "age": 17, "country": "GB"},
    {"ID": "2", "age": 21, "country": "US"},
    {"ID": "1", "age": 18, "country": "CA"},
]
input_data = (collection_1, collection_2)

conv = (
    c.join(
        c.item(0),
        c.item(1),
        c.and_(
            c.LEFT.item("id") == c.RIGHT.item("ID").as_type(int),
            c.RIGHT.item("age") >= 18,
        ),
        how="left",
    )
    .pipe(
        c.list_comp(
            {
                "id": c.item(0, "id"),
                "name": c.item(0, "name"),
                "age": c.item(1, "age", default=None),
                "country": c.item(1, "country", default=None),
            }
        )
    )
    .gen_converter(debug=True)
)

assert conv(input_data) == [
    {"id": 1, "name": "Nick", "age": 18, "country": "CA"},
    {"id": 2, "name": "Joash", "age": 21, "country": "US"},
    {"id": 3, "name": "Bob", "age": None, "country": None},
]

Multi-key joins

Use c.and_(...) with multiple equality conditions to join on more than one key. Equi-joinable conditions are used to build the right-side hash map.

from convtools import conversion as c


input_data = {
    "orders": [
        {"store": "east", "sku": "A", "qty": 2},
        {"store": "east", "sku": "B", "qty": 1},
        {"store": "west", "sku": "A", "qty": 3},
    ],
    "prices": [
        {"store": "east", "sku": "A", "price": 10},
        {"store": "east", "sku": "B", "price": 8},
        {"store": "west", "sku": "A", "price": 12},
    ],
}

conv = (
    c.join(
        c.item("orders"),
        c.item("prices"),
        c.and_(
            c.LEFT.item("store") == c.RIGHT.item("store"),
            c.LEFT.item("sku") == c.RIGHT.item("sku"),
        ),
    )
    .pipe(
        c.list_comp(
            {
                "store": c.item(0, "store"),
                "sku": c.item(0, "sku"),
                "qty": c.item(0, "qty"),
                "price": c.item(1, "price"),
            }
        )
    )
    .gen_converter()
)

assert conv(input_data) == [
    {"store": "east", "sku": "A", "qty": 2, "price": 10},
    {"store": "east", "sku": "B", "qty": 1, "price": 8},
    {"store": "west", "sku": "A", "qty": 3, "price": 12},
]

Non-equi joins

Conditions do not have to be equality checks. For range matching and other non-equi joins, write the condition directly with operators on c.LEFT and c.RIGHT.

from convtools import conversion as c


input_data = {
    "people": [
        {"name": "Nick", "age": 17},
        {"name": "Ann", "age": 22},
    ],
    "age_bands": [
        {"band": "junior", "min_age": 0, "max_age": 17},
        {"band": "adult", "min_age": 18, "max_age": 64},
    ],
}

conv = (
    c.join(
        c.item("people"),
        c.item("age_bands"),
        c.and_(
            c.LEFT.item("age") >= c.RIGHT.item("min_age"),
            c.LEFT.item("age") <= c.RIGHT.item("max_age"),
        ),
    )
    .pipe(
        c.list_comp(
            {
                "name": c.item(0, "name"),
                "age": c.item(0, "age"),
                "band": c.item(1, "band"),
            }
        )
    )
    .gen_converter()
)

assert conv(input_data) == [
    {"name": "Nick", "age": 17, "band": "junior"},
    {"name": "Ann", "age": 22, "band": "adult"},
]

Duplicate matches

c.join emits every matching pair. If two left rows and two right rows have the same join key, the result contains four pairs for that key. This is the same many-to-many behavior as a relational join.

from convtools import conversion as c


left = [{"id": 1, "value": "a"}, {"id": 1, "value": "b"}]
right = [{"id": 1, "value": "x"}, {"id": 1, "value": "y"}]

conv = (
    c.join(c.item(0), c.item(1), c.LEFT.item("id") == c.RIGHT.item("id"))
    .as_type(list)
    .gen_converter()
)

assert conv((left, right)) == [
    ({"id": 1, "value": "a"}, {"id": 1, "value": "x"}),
    ({"id": 1, "value": "a"}, {"id": 1, "value": "y"}),
    ({"id": 1, "value": "b"}, {"id": 1, "value": "x"}),
    ({"id": 1, "value": "b"}, {"id": 1, "value": "y"}),
]

Memory and performance

When the condition contains equality predicates that compare left expressions to right expressions, c.join uses those predicates as hash keys and stores the right side in memory. Additional one-sided filters can be pushed to the relevant side before matching.

For non-equi conditions, or conditions that cannot be split into hash keys, c.join keeps the right side available and checks candidate pairs in a nested loop. This is flexible, but it is usually more expensive for large inputs.

left, right, and full joins may yield None for the missing side. full joins also track which right rows were already matched so they can emit unmatched right rows at the end.