Nov 5, 2014. Sphinx Search Quick Tour using a MySQL Datasource

You can find a ‘Quick Usage Tour’ in our documentation. In this post, I’m going to walk you through that tour and elaborate on a few things. Enjoy!

Things to consider

  • I’m assuming you’ve already installed MySQL. Sphinx does not require that you use MySQL, but the following examples do.
  • I’m installing Sphinx on Ubuntu 14.04, using the package provided on our download page.

Installing Sphinx on Ubuntu is very easy

Download the package, click it, install it.

The view from Ubuntu’s Software Center:

Selection_052

Oh, and since we’re talking about installing Sphinx on Ubuntu, I want to mention that you might also choose to use our official PPA.

That’s it. Let’s move onto the MySQL side of things.

Example.sql

We’ve included some sample data in the download package.

To get that sample data into MySQL, try this:

mysql test < /usr/share/doc/sphinxsearch/example-conf/example.sql

This will create a table called ‘documents’.

Take a look:

mysql> explain documents;
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | int(11)      | NO   | PRI | NULL    | auto_increment |
| group_id   | int(11)      | NO   |     | NULL    |                |
| group_id2  | int(11)      | NO   |     | NULL    |                |
| date_added | datetime     | NO   |     | NULL    |                |
| title      | varchar(255) | NO   |     | NULL    |                |
| content    | text         | NO   |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

Now, let’s set up the configuration file.

Configuration

We have also provided a sample Sphinx configuration file. To use it, just copy sphinx-min.conf.dist into a file called sphinx.conf.

Like this:

$ cd /usr/share/doc/sphinxsearch/example-conf
$ cp sphinx-min.conf.dist sphinx.conf
$ vi sphinx.conf

In the example above, we’re using ‘vi’ to make changes to sphinx.conf. You can use any text editor you like.

I only had to make one small change to the configuration. I could have created a MySQL user named ‘test’, but instead, I just changed the sql_user in sphinx.conf to an already existing user.

Then, from /usr/share/doc/sphinxsearch/example-conf/, I ran indexer like this:

sudo indexer -c sphinx.conf --all

‘–all’ means that we’ll be indexing all the indexes defined in sphinx.conf.

(**interesting side-note: the plural form of ‘Sphinx’ is ‘Sphinges’ and the plural form of ‘index’ is ‘indices’. In reality, both of these are barely ever used.. So, I’ll be sticking with ‘indexes’… but, if I ever get the chance, I’d love to use ‘Sphinges’ in a sentence!)

Here’s what should happen when you run indexer:

stefo@ubuntu:/usr/share/doc/sphinxsearch/example-conf$ sudo indexer -c sphinx.conf --all
[sudo] password for stefo: 
Sphinx 2.2.5-id64-release (r4825)
Copyright (c) 2001-2014, Andrew Aksyonoff
Copyright (c) 2008-2014, Sphinx Technologies Inc (http://sphinxsearch.com)
 
using config file 'sphinx.conf'...
WARNING: key 'max_matches' was permanently removed from Sphinx configuration. Refer to documentation for details.
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.154 sec, 1247 bytes/sec, 25.85 docs/sec
skipping non-plain index 'testrt'...
total 4 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 12 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg

And, this is what starting searchd should look like:

stefo@ubuntu:/usr/share/doc/sphinxsearch/example-conf$ sudo searchd -c sphinx.conf
Sphinx 2.2.5-id64-release (r4825)
Copyright (c) 2001-2014, Andrew Aksyonoff
Copyright (c) 2008-2014, Sphinx Technologies Inc (http://sphinxsearch.com)
 
using config file 'sphinx.conf'...
WARNING: key 'max_matches' was permanently removed from Sphinx configuration. Refer to documentation for details.
listening on all interfaces, port=9312
listening on all interfaces, port=9306
precaching index 'test1'
precaching index 'testrt'                                   
precached 2 indexes in 0.001 sec

Everything looks good. Let’s move on.

The Fun Stuff

Connect to Sphinx:

mysql -h0 -P9306

Full text search

Search text from our MySQL datasource:

mysql> SELECT * FROM test1 WHERE MATCH('my document');
+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
|    1 |        1 | 1415106370 |
|    2 |        1 | 1415106370 |
+------+----------+------------+
2 rows in set (0.19 sec)

INSERTing into a real time index

Insert some documents into a real time index:

mysql> INSERT INTO testrt VALUES (1, 'this is', 'some sample text', 11), (2, 'some more', 'text here', 22), (3, 'mmooooorrree', 'even more text', 33);
Query OK, 3 rows affected (0.00 sec)

Expressions

mysql> SELECT gid/11 FROM testrt WHERE MATCH('text') GROUP BY gid;
+----------+
| gid/11   |
+----------+
| 1.000000 |
| 2.000000 |
| 3.000000 |
+----------+
3 rows in set (0.00 sec)

Did you notice we’re dividing ‘gid>’ by 11? Sphinx lets you use arbitrary arithmetic expressions both via SphinxQL and SphinxAPI, involving attribute values, internal attributes (document ID and relevance weight), arithmetic operations, a number of built-in functions, and user-defined functions. Read more about all the expressions, functions and operators here.

Also, notice that we’re grouping by ‘>gid’. Read about grouping (sometimes called ‘clustering’) search results here.

ORDER BY

Order the results by ‘gid’, from highest to lowest (DESC):

mysql> SELECT * FROM testrt ORDER BY gid DESC;
+------+------+
| id   | gid  |
+------+------+
|    3 |   33 |
|    2 |   22 |
|    1 |   11 |
+------+------+
3 rows in set (0.00 sec)

SHOW TABLES

Maybe you’d like to see what indexes are available. Try this:

mysql> show tables;
+--------+-------+
| Index  | Type  |
+--------+-------+
| test1  | local |
| testrt | rt    |
+--------+-------+
2 rows in set (0.00 sec)

TRUNCATE

Maybe you’d like to empty a real time index. Try this:

mysql> truncate rtindex testrt;

ALTER

The ALTER statement was added in version 2.2.1-beta. As of 2.2.1-beta, it supports adding one attribute at a time for both plain and RT indexes. The int, bigint, float, bool, multi-valued, multi-valued 64bit, json and string attribute types are supported. Support for multi, multi64, json and string attributes was added in 2.2.2-beta. As of 2.2.2-beta, you can add json and string attributes, but you cannot modify their values. The ability to remove attributes was added in 2.2.2-beta.

mysql> desc plain;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
+------------+-----------+
4 rows in set (0.01 sec)
 
mysql> alter table plain add column test integer;
Query OK, 0 rows affected (0.04 sec)
 
mysql> desc plain;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| group_id   | uint      |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+
5 rows in set (0.00 sec)
 
mysql> alter table plain drop column group_id;
Query OK, 0 rows affected (0.01 sec)
 
mysql> desc plain;
+------------+-----------+
| Field      | Type      |
+------------+-----------+
| id         | bigint    |
| text       | field     |
| date_added | timestamp |
| test       | uint      |
+------------+-----------+
4 rows in set (0.00 sec)

And! As of 2.2.3-beta, ALTER can also reconfigure an existing RT index, so that new tokenization, morphology, and other text processing settings from sphinx.conf take effect on the newly INSERT-ed rows, while retaining the existing rows as they were.

Just make changes to the configuration file and run:

ALTER RTINDEX index RECONFIGURE

SHOW META

Let’s take a look at the information provided by SHOW META:

mysql> SELECT *, WEIGHT() FROM test1 WHERE MATCH('"document one"/1');SHOW META;
+------+----------+------------+----------+
| id   | group_id | date_added | weight() |
+------+----------+------------+----------+
|    1 |        1 | 1415106370 |     2663 |
|    2 |        1 | 1415106370 |     1528 |
+------+----------+------------+----------+
2 rows in set (0.03 sec)
 
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 2        |
| total_found   | 2        |
| time          | 0.032    |
| keyword[0]    | document |
| docs[0]       | 2        |
| hits[0]       | 2        |
| keyword[1]    | one      |
| docs[1]       | 1        |
| hits[1]       | 2        |
+---------------+----------+
9 rows in set (0.02 sec)

SHOW META shows additional meta-information about the latest query such as query time and keyword statistics. IO and CPU counters will only be available if searchd was started with --iostats and --cpustats switches respectively. Additional predicted_time, dist_predicted_time, [{local|dist}]_fetched_[{docs|hits|skips}] counters will only be available if searchd was configured with predicted time costs and if the query had predicted_time in the OPTION clause.

SHOW PROFILE and IN()

Now, let’s explore IN() and SHOW PROFILE:

mysql> SET profiling=1;SELECT * FROM test1 WHERE id IN (1,2,4);SHOW PROFILE;
Query OK, 0 rows affected (0.00 sec)
 
+------+----------+------------+
| id   | group_id | date_added |
+------+----------+------------+
|    1 |        1 | 1415106370 |
|    2 |        1 | 1415106370 |
|    4 |        2 | 1415106370 |
+------+----------+------------+
3 rows in set (0.01 sec)
 
+--------------+----------+----------+---------+
| Status       | Duration | Switches | Percent |
+--------------+----------+----------+---------+
| unknown      | 0.000271 | 4        | 2.14    |
| net_read     | 0.000018 | 1        | 0.14    |
| local_search | 0.012208 | 1        | 96.34   |
| sql_parse    | 0.000077 | 1        | 0.61    |
| fullscan     | 0.000005 | 1        | 0.04    |
| finalize     | 0.000029 | 1        | 0.23    |
| aggregate    | 0.000016 | 2        | 0.13    |
| net_write    | 0.000048 | 1        | 0.38    |
| eval_post    | 0.000000 | 1        | 0.00    |
| total        | 0.012672 | 13       | 0       |
+--------------+----------+----------+---------+
10 rows in set (0.01 sec)

The SHOW PROFILE statement shows a detailed execution profile of the previous SQL statement executed in the current SphinxQL session. Profiling must be enabled in the current session before actually running the statement. We did this with the SET profiling=1 statement. By default, profiling is disabled to avoid potential performance implications.

IN() takes 2 or more arguments, and returns 1 if the first argument (expr) is equal to any of the other arguments (val1..valN), or 0 otherwise. So, in our example, for IN to return 1, id must be equal to 1, 2 or 4.

HAVING

The HAVING clause is used to filter on GROUP BY values:

mysql> SELECT COUNT(*) c, id%3 idd FROM test1 GROUP BY idd HAVING COUNT(*)>0;
+------+------+
| c    | idd  |
+------+------+
|    2 |    1 |
|    1 |    2 |
|    1 |    0 |
+------+------+
3 rows in set (0.00 sec)

CALL KEYWORDS

And finally, let’s look at CALL KEYWORDS:

mysql> CALL KEYWORDS ('one two three', 'test1', 1);
+------+-----------+------------+------+------+
| qpos | tokenized | normalized | docs | hits |
+------+-----------+------------+------+------+
| 1    | one       | one        | 1    | 2    |
| 2    | two       | two        | 1    | 2    |
| 3    | three     | three      | 0    | 0    |
+------+-----------+------------+------+------+
3 rows in set (0.00 sec)

CALL KEYWORDS returns tokenized and normalized forms of the keywords, and, optionally, keyword statistics. Since version 2.2.2-beta it also returns the position of each keyword in the query and all forms of tokenized keywords in the case that lemmatizers were used.

Conclusion

Much of the above can already be found in the ‘Quick Tour’ from our documentation, but I added a few options and a bit more exploration. Hopefully this has helped you get a basic understanding of how to use Sphinx. There are many more goodies to explore, but we’ll save those for another day.

If you have questions, please feel free to ask them.

Happy Sphinxing!


« »

One Response to “Sphinx Search Quick Tour using a MySQL Datasource”

  1. After seeing this guide I think I might actually give this a go thanks.

Leave a Reply