以Postgresql为主,再聊聊资料库递迴又见递迴-58码农网

情境：有许多 sensor, 一直传送资料,我们想取得最新的资料.create unlogged table it201011c (  id bigint not null primary key, sensor_id int not null, val int not null);insert into it201011cselect n     , ceil(random() * 1e5)     , random() * 100   from generate_series(1,1e7) n;输入一千万笔资料,注意到这次使用了 unlogged table.为了方便起见,id 越大,代表资料越新.实务上可以使用id或是时间戳.一般的作法,利用 window function 的 row_number() , rank()select id, sensor_id, val  from (select *             , row_number() over(partition by sensor_id order by id desc) rn          from it201011c) a where a.rn = 1; (100000 rows)Time: 14941.510 ms (00:14.942)取得十万笔资料,速度还不赖.explain (analyze,timing,costs)select id, sensor_id, val  from (select *             , row_number() over(partition by sensor_id order by id desc) rn          from it201011c) a where a.rn = 1; +-------------------------------------------------------------------------------------------------------------------------------------------+|                                                                QUERY PLAN                                                                 |+-------------------------------------------------------------------------------------------------------------------------------------------+| Subquery Scan on a  (cost=1658507.15..1983502.60 rows=49999 width=16) (actual time=10466.416..18693.056 rows=100000 loops=1)              ||   Filter: (a.rn = 1)                                                                                                                      ||   Rows Removed by Filter: 9900000                                                                                                         ||   ->  WindowAgg  (cost=1658507.15..1858504.35 rows=9999860 width=24) (actual time=10466.414..17789.984 rows=10000000 loops=1)             ||         ->  Sort  (cost=1658507.15..1683506.80 rows=9999860 width=16) (actual time=10466.401..12646.441 rows=10000000 loops=1)            ||               Sort Key: it201011c.sensor_id, it201011c.id DESC                                                                            ||               Sort Method: external merge  Disk: 254472kB                                                                                 ||               ->  Seq Scan on it201011c  (cost=0.00..154053.60 rows=9999860 width=16) (actual time=2.679..1516.206 rows=10000000 loops=1) || Planning Time: 0.254 ms                                                                                                                   || Execution Time: 19242.739 ms                                                                                                              |+-------------------------------------------------------------------------------------------------------------------------------------------+观察到 Sort Key: it201011c.sensor_id, it201011c.id DESC   执行时间在 19~14 秒.建立 index.create index on it201011c(sensor_id, id desc);commit;analyze it201011c;select id, sensor_id, val  from (select *             , row_number() over(partition by sensor_id order by id desc) rn          from it201011c) a where a.rn = 1;Time: 46248.666 ms (00:46.249)explain (analyze,timing,costs)select id, sensor_id, val  from (select *             , row_number() over(partition by sensor_id order by id desc) rn          from it201011c) a where a.rn = 1; +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+|                                                                               QUERY PLAN                                                                                |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Subquery Scan on a  (cost=0.43..820237.17 rows=49999 width=16) (actual time=3.364..43062.106 rows=100000 loops=1)                                                       ||   Filter: (a.rn = 1)                                                                                                                                                    ||   Rows Removed by Filter: 9900000                                                                                                                                       ||   ->  WindowAgg  (cost=0.43..695238.92 rows=9999860 width=24) (actual time=3.360..41889.384 rows=10000000 loops=1)                                                      ||         ->  Index Scan using it201011c_sensor_id_id_idx on it201011c  (cost=0.43..520241.37 rows=9999860 width=16) (actual time=3.328..35215.809 rows=10000000 loops=1) || Planning Time: 0.172 ms                                                                                                                                                 || Execution Time: 43078.930 ms                                                                                                                                            |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+虽然 index 建立的方式,也符合排序的需求,也正确的使用了 index, 但是实际存取时,反而变慢了....因为我们建立的 index 是 id desc,与 id 生长方面相反,但要取得最新,就是要抓大的,而且还要多透过index,没index 时,还要利用 external merge  Disk: 254472kB, 但反而速度较快.所以资料库的优化,并不是建立 index 就必然加快速度. 看过昨天的利用递迴来加快,那我们来看看如何利用递迴来加快这样的查询.我们透过与昨天的类似的方式,来尝试使用以下 SQL Commandwith recursive t1 as (select sensor_id, val  from it201011c where id in (       select id         from it201011c        where sensor_id is not null        order by sensor_id, id desc        limit 1       )union allselect (select sensor_id, val          from it201011c         where id in (               select id                 from it201011c a                where a.sensor_id > t1.sensor_id                  and a.sensor_id is not null                order by a.sensor_id, a.id desc                limit 1              )       )  from t1 where t1.sensor_id is not null)select sensor_id, val  from t1 where t1.sensor_id is not null;ERROR:  42601: subquery must return only one columnLINE 12: select (select sensor_id, val出现了错误讯息.....不过方法是人想出来的,既然只能一个栏位,那我们就建立 type, 来包含两个栏位,type 本身只算一个栏位.create type ty1011 as (sensor_id int, val int);with recursive t1 as (select (sensor_id, val)::ty1011 as sr  from it201011c where id in (       select id         from it201011c        where sensor_id is not null        order by sensor_id, id desc        limit 1       )union allselect (select (sensor_id, val)::ty1011 as sr          from it201011c         where id in (               select id                 from it201011c a                where a.sensor_id > (t1.sr).sensor_id                  and a.sensor_id is not null                order by a.sensor_id, a.id desc                limit 1              )       )  from t1 where (t1.sr).sensor_id is not null)select (t1.sr).sensor_id, (t1.sr).val  from t1 where t1.* is not null; Time: 12371.356 ms (00:12.371)Time: 5858.400 ms (00:05.858)第一次执行会久一些 12秒,第二次就降到 5秒.这样我们就可以利用递迴,来取得十万个sensor的最新资料,而且速度也较快.Execution Time: 3929.956 ms    <-- 第三次, 查看执行计画时,更快了!执行计画就不列出了,有兴趣的,可以自己测试看看.
给这篇文章的作者打赏

关于作者: 网站小编

相关文章

HBO Max vs.Netflix：当你负担不起两者时如何选择

课内笔记整理---作业系统实务(资安相关篇)

excel vba捞网页数据问题

热门文章

1以Postgresql为主,再聊聊资料库 递迴 又见递迴

210大安全软体爆漏洞

3Seagate希捷外接式硬碟机无响应/无法读取/没有回应？

4[JS] You Don't Know JavaScript [Scope & Closures]

5HERE API Example - 显示 GeoJSON Data

1以Postgresql为主,再聊聊资料库递迴又见递迴