Joins¶
Prerequisites
Start with Basics for conversion fundamentals and Collections for shaping or iterating inputs before joining.
See also
Join conditions use context-specific references; see Placeholders & Special References. For composing join predicates, see Conditions and Pipes.
c.join(left_conversion, right_conversion, condition, how="inner") defines
join conversion, which returns an iterator of (left_element, right_element)
tuples.
left_conversiondefines the left part of a joinright_conversiondefines the right part of a joinconditiondefines which left/right row pairs match. Usec.LEFTandc.RIGHTinside the condition to reference elements of the left and right sequences. PassTrueto make a cross join.howis any of"inner" | "left" | "right" | "full"."outer"is accepted as an alias for"full".
See Placeholders & Special References for the full list of context-specific references.
Info
c.join builds a hash map of the right side when keys are equi‑joinable
(c.and_(...) / == operators only); memory is O(len(right)). Non‑equi
conditions use nested-loop matching and can be much slower on large inputs.
Left and right references¶
c.LEFT and c.RIGHT point to the current left and right rows being compared.
They behave like normal conversions, so you can use item, attr,
call_method, operators, and type conversions on them.
The example below joins two collections from a tuple input. The right-side IDs
are strings, so the condition casts them with .as_type(int) before comparing.
It also keeps only right-side rows where age >= 18.
from convtools import conversion as c
collection_1 = [
{"id": 1, "name": "Nick"},
{"id": 2, "name": "Joash"},
{"id": 3, "name": "Bob"},
]
collection_2 = [
{"ID": "3", "age": 17, "country": "GB"},
{"ID": "2", "age": 21, "country": "US"},
{"ID": "1", "age": 18, "country": "CA"},
]
input_data = (collection_1, collection_2)
conv = (
c.join(
c.item(0),
c.item(1),
c.and_(
c.LEFT.item("id") == c.RIGHT.item("ID").as_type(int),
c.RIGHT.item("age") >= 18,
),
how="left",
)
.pipe(
c.list_comp(
{
"id": c.item(0, "id"),
"name": c.item(0, "name"),
"age": c.item(1, "age", default=None),
"country": c.item(1, "country", default=None),
}
)
)
.gen_converter(debug=True)
)
assert conv(input_data) == [
{"id": 1, "name": "Nick", "age": 18, "country": "CA"},
{"id": 2, "name": "Joash", "age": 21, "country": "US"},
{"id": 3, "name": "Bob", "age": None, "country": None},
]
Multi-key joins¶
Use c.and_(...) with multiple equality conditions to join on more than one
key. Equi-joinable conditions are used to build the right-side hash map.
from convtools import conversion as c
input_data = {
"orders": [
{"store": "east", "sku": "A", "qty": 2},
{"store": "east", "sku": "B", "qty": 1},
{"store": "west", "sku": "A", "qty": 3},
],
"prices": [
{"store": "east", "sku": "A", "price": 10},
{"store": "east", "sku": "B", "price": 8},
{"store": "west", "sku": "A", "price": 12},
],
}
conv = (
c.join(
c.item("orders"),
c.item("prices"),
c.and_(
c.LEFT.item("store") == c.RIGHT.item("store"),
c.LEFT.item("sku") == c.RIGHT.item("sku"),
),
)
.pipe(
c.list_comp(
{
"store": c.item(0, "store"),
"sku": c.item(0, "sku"),
"qty": c.item(0, "qty"),
"price": c.item(1, "price"),
}
)
)
.gen_converter()
)
assert conv(input_data) == [
{"store": "east", "sku": "A", "qty": 2, "price": 10},
{"store": "east", "sku": "B", "qty": 1, "price": 8},
{"store": "west", "sku": "A", "qty": 3, "price": 12},
]
Non-equi joins¶
Conditions do not have to be equality checks. For range matching and other
non-equi joins, write the condition directly with operators on c.LEFT and
c.RIGHT.
from convtools import conversion as c
input_data = {
"people": [
{"name": "Nick", "age": 17},
{"name": "Ann", "age": 22},
],
"age_bands": [
{"band": "junior", "min_age": 0, "max_age": 17},
{"band": "adult", "min_age": 18, "max_age": 64},
],
}
conv = (
c.join(
c.item("people"),
c.item("age_bands"),
c.and_(
c.LEFT.item("age") >= c.RIGHT.item("min_age"),
c.LEFT.item("age") <= c.RIGHT.item("max_age"),
),
)
.pipe(
c.list_comp(
{
"name": c.item(0, "name"),
"age": c.item(0, "age"),
"band": c.item(1, "band"),
}
)
)
.gen_converter()
)
assert conv(input_data) == [
{"name": "Nick", "age": 17, "band": "junior"},
{"name": "Ann", "age": 22, "band": "adult"},
]
Duplicate matches¶
c.join emits every matching pair. If two left rows and two right rows have the
same join key, the result contains four pairs for that key. This is the same
many-to-many behavior as a relational join.
from convtools import conversion as c
left = [{"id": 1, "value": "a"}, {"id": 1, "value": "b"}]
right = [{"id": 1, "value": "x"}, {"id": 1, "value": "y"}]
conv = (
c.join(c.item(0), c.item(1), c.LEFT.item("id") == c.RIGHT.item("id"))
.as_type(list)
.gen_converter()
)
assert conv((left, right)) == [
({"id": 1, "value": "a"}, {"id": 1, "value": "x"}),
({"id": 1, "value": "a"}, {"id": 1, "value": "y"}),
({"id": 1, "value": "b"}, {"id": 1, "value": "x"}),
({"id": 1, "value": "b"}, {"id": 1, "value": "y"}),
]
Memory and performance¶
When the condition contains equality predicates that compare left expressions to
right expressions, c.join uses those predicates as hash keys and stores the
right side in memory. Additional one-sided filters can be pushed to the relevant
side before matching.
For non-equi conditions, or conditions that cannot be split into hash keys,
c.join keeps the right side available and checks candidate pairs in a nested
loop. This is flexible, but it is usually more expensive for large inputs.
left, right, and full joins may yield None for the missing side. full
joins also track which right rows were already matched so they can emit
unmatched right rows at the end.