全文搜索

Postgres有内置的功能来处理 全文搜索查询。这就像Postgres中的一个 搜索引擎。

准备工作

在本指南中，我们将使用以下示例数据：

id	title	author	description
1	The Poky Little Puppy	Janette Sebring Lowrey	Puppy is slower than other, bigger animals.
2	The Tale of Peter Rabbit	Beatrix Potter	Rabbit eats some vegetables.
3	Tootle	Gertrude Crampton	Little toy train has big dreams.
4	Green Eggs and Ham	Dr. Seuss	Sam has changing food preferences and eats unusually colored food.
5	Harry Potter and the Goblet of Fire	J.K. Rowling	Fourth year of school starts, big drama ensues.

使用方法

我们在本指南中要介绍的函数是：

`to_tsvector()` {#to-tsvector}#

将你的数据转换为可搜索的 "标记"。to_tsvector()代表 to text search vector。例如：

1select to_tsvector('green eggs and ham')
2
3-- Returns 'egg':2 'green':1 'ham':4

这些令牌统称为 "文件"，Postgres可以用它来进行比较。

`to_tsquery()` {#to-tsquery}#

将查询字符串转换为 令牌来匹配。to_tsquery()代表 to text search query。

这个转换步骤很重要，因为我们要对关键词进行 "模糊匹配"。例如，如果用户搜索 eggs，而某一列的值是 eggs，我们可能仍然希望返回一个匹配。

Match: `@@` {#match}#

@@符号是全文搜索的 "匹配 "符号。它返回to_tsvector结果和to_tsquery结果之间的任何匹配。

以下面的例子为例：

1select *
2from books
3where title = 'Harry';

上面的等于符号（=）对其匹配的内容非常 严格。在全文搜索的情况下，我们可能想找到所有Harry Potter的书，所以我们可以重写上面的例子：

1select *
2from books
3where to_tsvector(title) @@ to_tsquery('Harry');

基本的全文查询

查询一个单列

查找所有 description中包含big单词的 books：

1select
2  *
3from
4  books
5where
6  to_tsvector(description)
7  @@ to_tsquery('big');

查询多列

查找所有description或title包含little单词的books：

1select
2  *
3from
4  books
5where
6  to_tsvector(description || ' ' || title) -- concat columns, but be sure to include a space to separate them!
7  @@ to_tsquery('little');

匹配所有搜索词

为了找到所有description包含 little 和 big两个单词的 books，我们可以使用&符号：

1select
2  *
3from
4  books
5where
6  to_tsvector(description)
7  @@ to_tsquery('little & big'); -- use & for AND in the search query

匹配任何搜索词

查找所有description包含little或big中任何一个单词的books，使用|符号。

1select
2  *
3from
4  books
5where
6  to_tsvector(description)
7  @@ to_tsquery('little | big'); -- use | for OR in the search query

注意：搜索 big如何包括 bigger（或 biggest等等）的结果。

创建索引

现在我们有了全文搜索的功能，让我们来创建一个 索引。这将允许Postgres预先 建立文件，这样就不需要在我们执行查询的时候创建这些文件。这将使我们的查询速度大大加快。

可搜索列

我们在 books表内创建一个新的列fts来存储title和description列的可搜索索引。

我们可以使用Postgres的一个特殊功能，叫做生成的列以确保当title和description列的值发生变化时，该索引会被更新。

1alter table
2  books
3add column
4  fts tsvector generated always as (to_tsvector('english', description || ' ' || title)) stored;
5
6create index books_fts on books using gin (fts); -- generate the index
7
8select id, fts
9from books;

使用新列进行查询

现在我们已经创建并填充了我们的索引，我们可以使用与之前相同的技术来查询：

1select
2  *
3from
4  books
5where
6  fts @@ to_tsquery('little & big');

查询运算符

访问 PostgreSQL: 文本搜索功能和操作符来了解更多的查询运算符，你可以用它来做更高级的 全文查询，例如：

近似符: `<->` {#proximity}#

近似符号对于搜索相隔一定 "距离 "的术语非常有用。例如，要找到 big dreams这个短语，在 big的匹配后面紧跟着dreams的匹配。

1select
2  *
3from
4  books
5where
6  to_tsvector(description) @@ to_tsquery('big <-> dreams');

我们还可以使用<->来查找彼此之间有一定距离的单词。例如，找到year 和school间隔2个词之间的距离。

1select
2  *
3from
4  books
5where
6  to_tsvector(description) @@ to_tsquery('year <2> school');

否定: `!` {#negation}#

否定符号可以用来查找不包含搜索词的短语。例如，找到有big字但没有little字的记录：

1select
2  *
3from
4  books
5where
6  to_tsvector(description) @@ to_tsquery('big & !little');

资源

PostgreSQL: 文本搜索功能和操作符