Acyr Locatelli

Asof joins in PostgreSQL

Posted at — Oct 4, 2018

Recently I ran into the trouble of having to mimic the behaviour of pandas.merge_asof in SQL. Other databases like kdb, for example, already have this functionality built-in.

ASOF joins

The asof joins are really useful when dealing with time-series. A typical scenario where this comes up:

We have two tables:

The timestamps in the two tables might not necessarily match. If we want to merge these two tables so that we know the rate that each trade was executed at, then for each trade with timestamp t, we need to find the most recent rate that has occurred before time t in the fx-rates table.

Using pandas, this is straight-forward:

 pandas.merge_asof(trades, fx_rates, on='ts', by='currency')

To get this to work in SQL, we will need to do a bit more work. My first attempt was a bit clumsy but gets the job done – just about.

select *
from (
  select f.*, t.*,
    rank() over (partition by t.id order by
      abs(extract(epoch from (t.ts - f.ts))) ASC, t.ts DESC
    ) rnk
  from fx_rates f
  inner join trades t on f.currency = t.currency
  where t.ts > f.ts
) e
where e.rnk=1

This is not so offending if the database is very small, but even for small databases sizes we can run into trouble. The main troubles are:

  1. It is very slow.
  2. It cause large amounts of data to be written to disk by the subquery.

Lateral joins to the rescue.

Lateral joins

Lateral joins were introduced in postgresql 9.3. We can use it to write a much simpler query.

The basic idea behind lateral joins is that we “tell” the query planner that the sub-query will be run many times; this allows the it to be optimised.

select *
from (
  select *
  from trades) t
  cross join lateral (
    select *
    from fx_rates
    where currency = t.currency and t.ts > ts
  order by ts DESC
  limit 1)

References

This stackoverflow question contains a lot of info and some more interesting links.

comments powered by Disqus