Indexing Basics
索引类型
B-TREE 索引
InnoDB使用的即是B-TREE索引。
存储引擎以不同的方式使用B-TREE索引,性能也各有不同。例如,MyISAM使用前缀压缩技术使得索引更小,但InnoDB则按照原数据格式进行存储。再如MyISAM索引通过数据的物理位置引用被索引的行,而InnoDB则根据主键引用被索引的行。
B-TREE对索引列是顺序组织存储的,所以很适合查找范围数据。下图为B-TREE索引的抽象表示:
B-TREE索引能够加快数据访问的速度,因为存储引擎不再需要进行全表扫描获取需要的数据。
假设有下面这个数据表:
CREATE TABLE People (
last_name varchar(50) ,
first_name varchar(50) dob date,
not null, not null, not null,
gender enum('m', 'f')not null,
key(last_name, first_name, dob) );
The index will contain the values from the last_name, first_name, and dob columns for every row in the table.
!!!索引对多个值进行排序的依据是CREATE TABLE语句中定义索引时的顺序。
Types of queries that can use a B-Tree index. B-Tree indexes work well for lookups by the full key value
, a key range
, or a key prefix
. They are useful only if the lookup uses a leftmost prefix
of the index.
Here are some limitations of B-Tree indexes:
- They are not useful if the lookup
does not start from the leftmost side of the indexed columns
. For example, this index won’t help you find all people named Bill or all people born on a certain date, because those columns are not leftmost in the index. Likewise, you can’t use the index to find people whose last name ends with a par- ticular letter. - You
can’t skip columns in the index
. That is, you won’t be able to find all people whose last name is Smith and who were born on a particular date. If you don’t specify a value for the first_name column, MySQL can use only the first column of the index. - The storage engine can’t optimize accesses with any columns to
the right of the first range condition
. For example, if your query is WHERE last_name="Smith" AND first_name LIKE 'J%' AND dob='1976-12-23', the index access will use only the first two columns in the index, because the LIKE is a range condition (the server can use the rest of the columns for other purposes, though). For a column that has a limited number of values, you can often work around this by specifying equality conditions instead of range conditions. We show detailed examples of this in the indexing case study later in this chapter.
所以索引列的顺序是非常的重要,这些限制都和索引列的顺序有关。在优化的时候可以使用相同的列但顺序不同的索引来满足快速查询的需求。
哈希索引
A hash index is built on a hash table and is useful only for exact lookups that use every column in the index
.
In MySQL, only the Memory storage engine
supports explicit hash indexes. They are the default index type for Memory tables, though Memory tables can have B-Tree indexes, too.
Building your own hash indexes. If your storage engine doesn’t support hash indexes, you can emulate them yourself in a manner similar to that InnoDB uses. This will give you access to some of the desirable properties of hash indexes, such as a very small index size for very long keys
.
An example of when this approach works well is for URL lookups. URLs generally cause B-Tree indexes to become huge, because they’re very long. You’d normally query a table of URLs like this:
mysql> SELECT id FROM url WHERE url="http://www.mysql.com";
But if you remove the index on the url column and add an indexed url_crc column to the table, you can use a query like this:
mysql> SELECT id FROM url WHERE url="http://www.mysql.com" -> AND url_crc=CRC32("http://www.mysql.com");
CRC32 is one of hash functions based on on the "polynomial" division idea. The CRC is acronym for Cyclic Redundancy Code (other variants instead "Code" is "Check" and "Checksum") algorithm. The number 32 is specifying the size of resulting hash value (checksum) - 32 bits. The checksum is used to detect errors after transmission or storage of any piece of information.
This works well because the MySQL query optimizer notices there’s a small, highly selective index on the url_crc column and does an index lookup for entries with that value (1560514994, in this case). Even if several rows have the same url_crc value, it’s very easy to find these rows with a fast integer comparison and then examine them to find the one that matches the full URL exactly. The alternative is to index the full URL as a string, which is much slower.
这样做的一个缺陷是需要维护hash值,可以通过使用触发器来较好的解决这个问题。
If you use this approach, you should not use SHA1()
or MD5()
hash functions. These return very long strings
, which waste a lot of space and result in slower comparisons. They are cryptographically strong functions designed to virtually eliminate collisions, which is not your goal here. Simple hash functions can offer acceptable collision rates with better performance.
If your table has many rows and CRC32() gives too many collisions
, implement your own 64-bit hash function. Make sure you use a function that returns an integer, not a string. One way to implement a 64-bit hash function is to use just part of the value returned by MD5()
.
空间数据索引(R-Tree)
全文索引
索引的优点
Three main benefits proceed from these properties:
- Indexes reduce the amount of data the server has to examine.
- Indexes help the server
avoid sorting
andtemporary tables
. - Indexes turn
random I/O
intosequential I/O
.
高性能索引策略
独立的列
“Isolating” the column means it should not
be part of an expression or be inside a function in the query.如果查询中得列不是独立的,MySQL就不会使用索引。
前缀索引和索引选择性
Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (#T), and ranges from 1/#T to 1.
A prefix of the column is often selective enough to give good performance. If you’re indexing BLOB or TEXT columns, or very long VARCHAR columns, you must define prefix indexes, because MySQL disallows indexing their full length.
创建前缀索引
mysql> ALTER TABLE tableName ADD KEY (colName(len));
前缀索引的一个缺点是MySQL无法用其做orderBy和groupBy。
选择合适的索引列顺序
选择索引列顺序的经验法则: 将选择性最高的列放在前面。
聚族索引
聚族索引并不是一种单独的索引类型,而是一种数据存储方式。
InnoDB将通过主键聚集数据。
覆盖索引
An index that contains (or “covers”) all the data needed to satisfy a query is called a covering index
.
使用索引扫描来做排序
压缩(前缀压缩)索引
MySQL需要单独维护重复的索引,所以有性能上的考虑。重复索引是指在相同的列上
按照相同的顺序
创建的相同类型
的索引。