Recently I ran into the trouble of having to mimic the behaviour of pandas.merge_asof in SQL. Other databases like kdb, for example, already have this functionality built-in.
The asof joins are really useful when dealing with time-series. A typical scenario where this comes up:
We have two tables:
The timestamps in the two tables might not necessarily match. If we want to merge these two tables so that we know the rate that each trade was executed at, then for each trade with timestamp t, we need to find the most recent rate that has occurred before time t in the fx-rates table.
Using pandas, this is straight-forward:
pandas.merge_asof(trades, fx_rates, on='ts', by='currency')
To get this to work in SQL, we will need to do a bit more work. My first attempt was a bit clumsy but gets the job done – just about.
select *
from (
select f.*, t.*,
rank() over (partition by t.id order by
abs(extract(epoch from (t.ts - f.ts))) ASC, t.ts DESC
) rnk
from fx_rates f
inner join trades t on f.currency = t.currency
where t.ts > f.ts
) e
where e.rnk=1
This is not so offending if the database is very small, but even for small databases sizes we can run into trouble. The main troubles are:
Lateral joins to the rescue.
Lateral joins were introduced in postgresql 9.3. We can use it to write a much simpler query.
The basic idea behind lateral joins is that we “tell” the query planner that the sub-query will be run many times; this allows the it to be optimised.
select *
from (
select *
from trades) t
cross join lateral (
select *
from fx_rates
where currency = t.currency and t.ts > ts
order by ts DESC
limit 1)
This stackoverflow question contains a lot of info and some more interesting links.