相关文档:
PG9.1
Collation支持http://www.postgresql.org/docs/9.1/static/collation.html
字符集的设置http://www.postgresql.org/docs/9.1/static/multibyte.html
PG9.2
Collation支持http://www.postgresql.org/docs/current/static/collation.html
字符集的设置http://www.postgresql.org/docs/current/static/multibyte.html
简介:从PG9.1开始支持collation。collation允许指定每列数据的排序顺序和字符分类,甚至每个操作的。
它缓解了数据库的LC_COLLATE和LC_CTYPE设置无法在数据库创建后更改的限制。
使用示例:
1.排序顺序的设置:
a.初始化数据库是可以设置locale,如果不设置默认参照操作系统的locale。
[postgres@localhost bin]$ ./initdb -D ../data5 --encoding=UTF8 --locale=zh_CN.UTF8
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | zh_CN.utf8 | zh_CN.utf8 |
template0| postgres | UTF8 | zh_CN.utf8 | zh_CN.utf8 | =c/postgres
+
| | | | | postgres=CTc/postgres
template1| postgres | UTF8 | zh_CN.utf8 | zh_CN.utf8 | =c/postgres
+
| | | | | postgres=CTc/postgres
(5 rows)
b.创建数据库时,locale参数的设置必须是数据库集群初始化时设置的locale(也就是template的locale)兼容的:
postgres=# create database testdb with ENCODING='utf8' LC_COLLATE='zh_CN.utf8' LC_CTYPE='zh_CN.utf8'
template=template0;
CREATE DATABASE
c.创建表时,引用的collation必须是所在数据库的编码encoding兼容的。
CREATE TABLE test1 (
a text COLLATE "de_DE",
b text COLLATE "es_ES"
);
2.定义新的collation
语法:
CREATE COLLATION name (
[ LOCALE = locale,]
[ LC_COLLATE = lc_collate,]
[ LC_CTYPE = lc_ctype ]
)
CREATE COLLATION name FROM existing_collation
注意:创建collation时:若设置参数LOCALE的值,则表示LC_COLLATE和LC_CTYPE都设置为LOCALE所设置的值,
此时不能再设置LC_COLLATE和LC_CTYPE;不设置LOCALE值的时候才能设置LC_COLLATE和LC_CTYPE的值,
一般二者设置为相同的值,也可以设置为不同,但一般在实践中是无意义的。
例子:
postgres=# create collation ddd (locale='de_DE.utf8');
CREATE COLLATION
postgres=# select a<b collate "ddd" from tt;
?column?
----------
f
t
f
f
f
t
f
(7 rows)
postgres=# create collation ddd2 from ddd;
CREATE COLLATION
3.排序顺序的使用:
一个表达式的collation derivation可以是隐式的或者显式的。这个区别在多个不同collation出现在一个表达式中
时影响collation的组合。显式的collation derivation出现在使用COLLATE语句的时候;所有其他的collation derivations
都是隐式的。当需要组合多个collation时,例如在一个函数调用中,通常有以下规则:
a.If any input expression has an explicit collation derivation,then all explicitly derived collations among the
input expressions must be the same,otherwise an error is raised. If any explicitly derived collation is present,
that is the result of the collation combination.
b. Otherwise,all input expressions must have the same implicit collation derivation or the default collation.
If any non-default collation is present,that is the result of the collation combination. Otherwise,the result is
the default collation.
c. If there are conflicting non-default implicit collations among the input expressions,then the combination is
deemed to have indeterminate collation. This is not an error condition unless the particular function being
invoked requires knowledge of the collation it should apply. If it does,an error will be raised at run-time.
以表test1的操作为例:
postgres=# \d test1
Table "public.test1"
Column | Type | Modifiers
--------+------+---------------
a | text | collate de_DE
b | text | collate es_ES
①a和b列有冲突的隐式collation。必须为任何一个输入表达式附加一个显式的collation说明。如下:
postgres=# select a<b from test1;
ERROR: could not determine which collation to use for string comparison
HINT: Use the COLLATE clause to set the collation explicitly.
STATEMENT: select a<b from test1;
ERROR: could not determine which collation to use for string comparison
HINT: Use the COLLATE clause to set the collation explicitly.
解决:SELECT a < b COLLATE "de_DE" FROM test1;
或等效的SELECT a COLLATE "de_DE" < b FROM test1;
②a和b列组合时,隐式的collation不一致,则需要显式地为组合指定一个collation。如下:
postgres=# select * from test1 order by a||b;
ERROR: collation mismatch between implicit collations "de_DE" and "es_ES" at character 30
HINT: You can choose the collation by applying the COLLATE clause to one or both expressions.
STATEMENT: select * from test1 order by a||b;
ERROR: collation mismatch between implicit collations "de_DE" and "es_ES"
LINE 1: select * from tt order by a||b;
^
HINT: You can choose the collation by applying the COLLATE clause to one or both expressions.
解决:SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
③Postgresql认为不同的collation对象是不兼容的,即使他们有相同的属性。所以,比如:
postgres=# select a collate "C"<b COLLATE "POSIX" from test1;
ERROR: collation mismatch between explicit collations "C" and "POSIX" at character 24
STATEMENT: select a collate "C"<b COLLATE "POSIX" from test1;
ERROR: collation mismatch between explicit collations "C" and "POSIX"
LINE 1: select a collate "C"<b COLLATE "POSIX" from test1;
tip:默认情况下,不管数据库编码是什么都可以使用C和POSIX的collation。
④当排序顺序不符合自身语言环境的习惯时,可以对排序字段排序顺序的转化来调整。以下是一个
中文排序习惯的转化:
以下排序不符合中文发音排序的习惯:
postgres=# select * from t order by cont;
id | name | cont
----+------+--------------
10 | kkk |从此从此从此
10 | kkk |可可己
10 | kkk |可可己
10 | kkk |可可己
10 | kkk |并不并不
10 | kkk |阿阿阿阿阿阿
(6 rows)
解决:
postgres=# select * from t order by convert_to(cont,'GBK');
id | name | cont
----+------+--------------
10 | kkk |阿阿阿阿阿阿
10 | kkk |并不并不
10 | kkk |从此从此从此
10 | kkk |可可己
10 | kkk |可可己
10 | kkk |可可己
(6 rows)
4.创建索引时也可以由COLLATE语句显式指定collation。
postgres=# create index tiindex on t1(b collate "de_DE" DESC);
CREATE INDEX
更多详细信息,请查阅PG社区相关文档专区http://www.postgresql.org/docs/。
初次发表网址:http://bbs.pgsqldb.com:8079/client/post_show.php?zt_auto_bh=56943