The query to get the last update of a dataset can be quite slow, especially once the number of time series in the dataset grows above 90'000.
I wrote an alternative query that seems to perform slightly better, but there probably needs to be an additional index somewhere to make it really go fast.
The existing query is:
```sql
-- Get the last time the dataset was updated
--
-- Returns the created_at timestamp of the series in the given dataset
-- that was most recently updated.
CREATE OR REPLACE FUNCTION timeseries.dataset_get_last_update(p_dataset TEXT)
RETURNS TABLE(name TEXT, updated TIMESTAMPTZ)
AS $$
BEGIN
  CREATE TEMPORARY TABLE tmp_ts_read_keys
  ON COMMIT DROP
  AS (
    SELECT ts_key
    FROM timeseries.catalog AS cat
    WHERE set_id = p_dataset
  );

  RETURN QUERY
  SELECT p_dataset AS name, max(ud.updated) AS updated
  FROM timeseries.ts_get_last_update() AS ud;
END;
$$ LANGUAGE PLPGSQL
SECURITY DEFINER
SET search_path = timeseries, pg_temp;
```
```sql
-- Get the last time time series were updated
--
-- Returns the created_at of the given time series
CREATE OR REPLACE FUNCTION timeseries.ts_get_last_update()
RETURNS TABLE(ts_key TEXT, updated TIMESTAMPTZ)
AS $$
BEGIN
  RETURN QUERY
  SELECT DISTINCT ON (mn.ts_key)
    mn.ts_key, mn.created_at AS updated
  FROM timeseries.timeseries_main AS mn
  JOIN tmp_ts_read_keys AS rd
  USING (ts_key)
  ORDER BY mn.ts_key, mn.created_at DESC;
END;
$$ LANGUAGE PLPGSQL
SECURITY DEFINER
SET search_path = timeseries, pg_temp;
```
The query I wrote is the following:
```sql
SELECT MAX(updated) AS updated
FROM (
  SELECT
    ts_key,
    (
      SELECT MAX(created_at) AS updated
      FROM timeseries.timeseries_main tm
      WHERE tm.ts_key = catalog.ts_key
    )
  FROM timeseries.catalog
  WHERE set_id = '<set_id>'
) AS s;
```
The timings with the ch.fso.cah.inv dataset, which contains 69'366 time series, are as follows:
The difference in time between the first query and the following ones is probably due to Postgres caching the table in memory after the first read, but the time when used in actual code would probably be closer to the first run.
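One way to check the caching hypothesis (a sketch, not something measured in this issue) is to run the query under `EXPLAIN (ANALYZE, BUFFERS)` and compare buffer counts between the first and subsequent runs:

```sql
-- Sketch: the BUFFERS option distinguishes blocks read from disk
-- ("shared read") from blocks served out of shared_buffers
-- ("shared hit"); a cached second run should show mostly hits.
EXPLAIN (ANALYZE, BUFFERS)
SELECT MAX(updated) AS updated
FROM (
  SELECT
    (
      SELECT MAX(created_at)
      FROM timeseries.timeseries_main tm
      WHERE tm.ts_key = catalog.ts_key
    ) AS updated
  FROM timeseries.catalog
  WHERE set_id = '<set_id>'
) AS s;
```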
The timings with the updated query are as follows:
Some data

| Set | N. series | Time old (first run / cached) | Time new (first run / cached) |
|---|---|---|---|
| ch.kof.ivua.ng08 | 120'742 | ~270s / ~270s | ~49s / ~6s |
| ch.kof.ivus.ng08 | 97'595 | ~245-250s / ~245-250s | ~13s / ~4s |
| ch.fso.cah.inv | 69'366 | ~4.5s / ~2.9s | ~2s / ~2s |
| ch.fso.hesta | 4'212 | ~1.6s / ~1.6s | ~1.2s / ~1.2s |
| ch.fso.unemp | 84 | ~1.2s / ~1.1s | ~1.2s / ~1.1s |
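If the faster query is adopted, `dataset_get_last_update` could be rewritten around it without the temporary table or the helper function. This is a sketch only, untested against the real schema:

```sql
-- Sketch: dataset_get_last_update rebuilt around the faster
-- correlated-subquery form; keeps the original signature and
-- security settings, drops tmp_ts_read_keys entirely.
CREATE OR REPLACE FUNCTION timeseries.dataset_get_last_update(p_dataset TEXT)
RETURNS TABLE(name TEXT, updated TIMESTAMPTZ)
AS $$
BEGIN
  RETURN QUERY
  SELECT p_dataset AS name, MAX(s.updated) AS updated
  FROM (
    SELECT
      (
        SELECT MAX(tm.created_at)
        FROM timeseries.timeseries_main tm
        WHERE tm.ts_key = cat.ts_key
      ) AS updated
    FROM timeseries.catalog AS cat
    WHERE cat.set_id = p_dataset
  ) AS s;
END;
$$ LANGUAGE PLPGSQL
SECURITY DEFINER
SET search_path = timeseries, pg_temp;
```

Dropping the temporary table also avoids the catalog churn of creating and dropping a relation on every call.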