digoal
2013-08-14
PostgreSQL , WITH ORDINALITY , SRF , unnest
PostgreSQL 9.4已经支持unnest(array, array, ....)多参数. 不需要使用WITH ORDINALITY
http://www.postgresql.org/docs/9.4/static/functions-array.html
- | - | - | - |
---|---|---|---|
unnest(anyarray) | setof anyelement | expand an array to a set of rows | unnest(ARRAY[1,2]) |
unnest(anyarray, anyarray [, ...]) | setof anyelement, anyelement [, ...] | expand multiple arrays (possibly of different types) to a set of rows. This is only allowed in the FROM clause; see Section 7.2.1.4 | unnest(ARRAY[1,2],ARRAY['foo','bar','baz']) |
使用WITH ORDINALITY会新增一个类似行号的值, 即一一匹配的行号.
PostgreSQL 的一个补丁, 支持多个数组输入, 一对一的输出行列转换后的结果.
http://www.postgresql.org/message-id/flat/[email protected]#[email protected]
例如pg_stats的most_common_vals和most_common_freqs是两个数组, 在没有这个补丁时, 并排输出需要使用窗口函数进行关联后输出, 或者使用函数来输出.
例如 http://blog.163.com/digoal@126/blog/static/1638770402013710105353862/ 文中用到的关联.
digoal=# select * from
(select row_number() over(partition by r) as rn,ele from (select unnest(most_common_elems::text::int[]) ele,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t1
join
(select row_number() over(partition by r) as rn,freq from (select unnest(most_common_elem_freqs) freq,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t2
on (t1.rn=t2.rn) order by t2.freq desc limit 20;
rn | ele | rn | freq
-----+-----+-----+------------
3 | 2 | 3 | 0.803167
9 | 8 | 9 | 0.802233
7 | 6 | 7 | 0.802133
8 | 7 | 8 | 0.802067
6 | 5 | 6 | 0.801633
4 | 3 | 4 | 0.801367
5 | 4 | 5 | 0.8006
10 | 9 | 10 | 0.799933
2 | 1 | 2 | 0.798167
1 | 0 | 1 | 0.5835
11 | 10 | 11 | 0.5822
92 | 91 | 92 | 0.00263333
320 | 319 | 320 | 0.00256667
15 | 14 | 15 | 0.00246667
769 | 768 | 769 | 0.00243333
961 | 960 | 961 | 0.00243333
42 | 41 | 42 | 0.00236667
503 | 502 | 503 | 0.00236667
651 | 650 | 651 | 0.00236667
664 | 663 | 664 | 0.00236667
(20 rows)
使用本补丁后, 这个SQL可以改成
digoal=# select u.* from pg_stats t,unnest(most_common_elems::text::int[],most_common_elem_freqs) WITH ORDINALITY as u(a,b,r) where t.tablename='test_2' and t.attname='appid' and u.a is not null order by u.b desc limit 20;
a | b | r
-----+------------+-----
2 | 0.803167 | 3
8 | 0.802233 | 9
6 | 0.802133 | 7
7 | 0.802067 | 8
5 | 0.801633 | 6
3 | 0.801367 | 4
4 | 0.8006 | 5
9 | 0.799933 | 10
1 | 0.798167 | 2
0 | 0.5835 | 1
10 | 0.5822 | 11
91 | 0.00263333 | 92
319 | 0.00256667 | 320
14 | 0.00246667 | 15
768 | 0.00243333 | 769
960 | 0.00243333 | 961
41 | 0.00236667 | 42
502 | 0.00236667 | 503
650 | 0.00236667 | 651
663 | 0.00236667 | 664
(20 rows)
使用后大大简化了SQL写法.
补丁安装 :
[root@db-172-16-3-39 soft_bak]# tar -zxvf postgresql-5e3e8e4.tar.gz
[root@db-172-16-3-39 soft_bak]# cd postgresql-5e3e8e4
[root@db-172-16-3-39 postgresql-5e3e8e4]# wget http://www.postgresql.org/message-id/attachment/29844/table-functions.patch
[root@db-172-16-3-39 postgresql-5e3e8e4]# patch -p1 < ./table-functions.patch
patching file src/backend/access/common/tupdesc.c
patching file src/backend/commands/explain.c
patching file src/backend/executor/nodeFunctionscan.c
patching file src/backend/nodes/copyfuncs.c
patching file src/backend/nodes/equalfuncs.c
patching file src/backend/nodes/makefuncs.c
patching file src/backend/nodes/nodeFuncs.c
patching file src/backend/nodes/outfuncs.c
patching file src/backend/nodes/readfuncs.c
patching file src/backend/optimizer/path/allpaths.c
patching file src/backend/optimizer/path/costsize.c
patching file src/backend/optimizer/path/pathkeys.c
patching file src/backend/optimizer/plan/createplan.c
patching file src/backend/optimizer/plan/initsplan.c
patching file src/backend/optimizer/plan/planner.c
patching file src/backend/optimizer/plan/setrefs.c
patching file src/backend/optimizer/plan/subselect.c
patching file src/backend/optimizer/prep/prepjointree.c
patching file src/backend/optimizer/util/clauses.c
patching file src/backend/optimizer/util/pathnode.c
patching file src/backend/parser/gram.y
patching file src/backend/parser/parse_clause.c
patching file src/backend/parser/parse_relation.c
patching file src/backend/rewrite/rewriteHandler.c
patching file src/backend/utils/adt/ruleutils.c
patching file src/include/access/tupdesc.h
patching file src/include/nodes/execnodes.h
patching file src/include/nodes/parsenodes.h
patching file src/include/nodes/plannodes.h
patching file src/include/optimizer/pathnode.h
patching file src/include/optimizer/paths.h
patching file src/include/parser/parse_relation.h
[root@db-172-16-3-39 postgresql-5e3e8e4]# ./configure --prefix=/home/pg94/pgsql9.4devel --with-pgport=2999 --with-perl --with-tcl --with-python --with-openssl --with-pam --without-ldap --with-libxml --with-libxslt --enable-thread-safety --with-wal-blocksize=16 && gmake && gmake install
[root@db-172-16-3-39 postgresql-5e3e8e4]# cd contrib/
[root@db-172-16-3-39 contrib]# gmake && gmake install
[root@db-172-16-3-39 postgresql-5e3e8e4]# su - pg94
pg94@db-172-16-3-39-> initdb -D $PGDATA -E UTF8 --locale=C -W -U postgres
pg_ctl start
补丁实现了table的多srf函数支持
digoal=# select * from table(generate_series(1,10),generate_series(2,5));
?column? | ?column?
----------+----------
1 | 2
2 | 3
3 | 4
4 | 5
5 |
6 |
7 |
8 |
9 |
10 |
(10 rows)
补丁的实施原理,
unnest(a,b,c)
SQL解析时转换成
TABLE(unnest(a),unnest(b),unnest(c))
所以这个补丁实际上并不是改了unnest函数, 而是改了sql parser.
因此unnest(anyarray, anyarray,...) 这个函数是不存在的, 也不能写在select clause中.
所以不能用以下写法
digoal=# select unnest(most_common_elems::text::int[],most_common_elem_freqs) from pg_stats t where t.tablename='test_2' and t.attname='appid' limit 20;
ERROR: function unnest(integer[], real[]) does not exist
LINE 1: select unnest(most_common_elems::text::int[],most_common_ele...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
本补丁的详细介绍
The spec syntax for table function calls, <table function derived table>
in <table reference>, looks like TABLE(func(args...)) AS ...
This patch implements that, plus an extension: it allows multiple
functions, TABLE(func1(...), func2(...), func3(...)) [WITH ORDINALITY]
and defines this as meaning that the functions are to be evaluated in
parallel.
This is implemented by changing RangeFunction, function RTEs, and the
FunctionScan node to take lists of function calls rather than a single
function. The calling convention for SRFs is completely unchanged; each
function returns its own rows (or a tuplestore in materialize mode) just
as before, and FunctionScan combines the results into a single output
tuple (keeping track of which functions are exhausted in order to
correctly fill in nulls on a backwards scan).
Then, a hack in the parser converts unnest(...) appearing as a
func_table (and only there) into a list of unnest() calls, one for each
parameter. So
select ... from unnest(a,b,c)
is converted to
select ... from TABLE(unnest(a),unnest(b),unnest(c))
and if unnest appears as part of an existing list inside TABLE(), it's
expanded to multiple entries there too.
This parser hackery is of course somewhat ugly. But given the objective
of implementing the spec's unnest syntax, it seems to be the least ugly
of the possible approaches. (The hard part of doing it any other way
would be generating the description of the result type; composite array
parameters expand into multiple result columns.)
Overall, it's my intention here to remove as many as feasible of the old
reasons why one might use an SRF in the select list. This should also
address the points that Josh brought up in discussion of ORDINALITY
regarding use of SRF-in-select to unnest multiple arrays.
(As a side issue, this patch also sets up pathkeys for ordinality along
the lines of a patch I suggested to Greg a while back in response to
his.)
Current patch status:
This is a first working cut: no docs, no tests, not enough comments, the
deparse logic probably needs more work (it deparses correctly but the
formatting may be suboptimal). However all the functionality is believed
to be in place.
1. 这个补丁比较适合并列数组的查询需求.
例如本例中用到的pg_stats中的频繁出现的值和频繁出现的值对应的出现频率这2个字段就是一一对应的数组.
1. http://www.postgresql.org/message-id/flat/[email protected]#[email protected]
2. http://blog.163.com/digoal@126/blog/static/16387704020136301599959/
3. SRFs(setof return functions) 返回多行的函数.
4. http://blog.163.com/digoal@126/blog/static/1638770402013710105353862/