创建高性能索引

Indexing Basics

索引类型

B-TREE 索引

InnoDB使用的即是B-TREE索引。
存储引擎以不同的方式使用B-TREE索引,性能也各有不同。例如,MyISAM使用前缀压缩技术使得索引更小,但InnoDB则按照原数据格式进行存储。再如MyISAM索引通过数据的物理位置引用被索引的行,而InnoDB则根据主键引用被索引的行。

B-TREE对索引列是顺序组织存储的,所以很适合查找范围数据。下图为B-TREE索引的抽象表示:


B-TREE Index.png

B-TREE索引能够加快数据访问的速度,因为存储引擎不再需要进行全表扫描获取需要的数据。

假设有下面这个数据表:

CREATE TABLE People ( 
last_name varchar(50) ,
first_name varchar(50) dob date,
not null, not null, not null,
gender enum('m', 'f')not null,
key(last_name, first_name, dob) );

The index will contain the values from the last_name, first_name, and dob columns for every row in the table.
!!!索引对多个值进行排序的依据是CREATE TABLE语句中定义索引时的顺序。

Types of queries that can use a B-Tree index. B-Tree indexes work well for lookups by the full key value, a key range, or a key prefix. They are useful only if the lookup uses a leftmost prefix of the index.

Here are some limitations of B-Tree indexes:

  • They are not useful if the lookup does not start from the leftmost side of the indexed columns. For example, this index won’t help you find all people named Bill or all people born on a certain date, because those columns are not leftmost in the index. Likewise, you can’t use the index to find people whose last name ends with a par- ticular letter.
  • You can’t skip columns in the index. That is, you won’t be able to find all people whose last name is Smith and who were born on a particular date. If you don’t specify a value for the first_name column, MySQL can use only the first column of the index.
  • The storage engine can’t optimize accesses with any columns to the right of the first range condition. For example, if your query is WHERE last_name="Smith" AND first_name LIKE 'J%' AND dob='1976-12-23', the index access will use only the first two columns in the index, because the LIKE is a range condition (the server can use the rest of the columns for other purposes, though). For a column that has a limited number of values, you can often work around this by specifying equality conditions instead of range conditions. We show detailed examples of this in the indexing case study later in this chapter.

所以索引列的顺序是非常的重要,这些限制都和索引列的顺序有关。在优化的时候可以使用相同的列但顺序不同的索引来满足快速查询的需求。

哈希索引

A hash index is built on a hash table and is useful only for exact lookups that use every column in the index.
In MySQL, only the Memory storage engine supports explicit hash indexes. They are the default index type for Memory tables, though Memory tables can have B-Tree indexes, too.

Building your own hash indexes. If your storage engine doesn’t support hash indexes, you can emulate them yourself in a manner similar to that InnoDB uses. This will give you access to some of the desirable properties of hash indexes, such as a very small index size for very long keys.

An example of when this approach works well is for URL lookups. URLs generally cause B-Tree indexes to become huge, because they’re very long. You’d normally query a table of URLs like this:

mysql> SELECT id FROM url WHERE url="http://www.mysql.com";

But if you remove the index on the url column and add an indexed url_crc column to the table, you can use a query like this:

mysql> SELECT id FROM url WHERE url="http://www.mysql.com" -> AND url_crc=CRC32("http://www.mysql.com");

CRC32 is one of hash functions based on on the "polynomial" division idea. The CRC is acronym for Cyclic Redundancy Code (other variants instead "Code" is "Check" and "Checksum") algorithm. The number 32 is specifying the size of resulting hash value (checksum) - 32 bits. The checksum is used to detect errors after transmission or storage of any piece of information.

This works well because the MySQL query optimizer notices there’s a small, highly selective index on the url_crc column and does an index lookup for entries with that value (1560514994, in this case). Even if several rows have the same url_crc value, it’s very easy to find these rows with a fast integer comparison and then examine them to find the one that matches the full URL exactly. The alternative is to index the full URL as a string, which is much slower.

这样做的一个缺陷是需要维护hash值,可以通过使用触发器来较好的解决这个问题。

If you use this approach, you should not use SHA1() or MD5() hash functions. These return very long strings, which waste a lot of space and result in slower comparisons. They are cryptographically strong functions designed to virtually eliminate collisions, which is not your goal here. Simple hash functions can offer acceptable collision rates with better performance.

If your table has many rows and CRC32() gives too many collisions, implement your own 64-bit hash function. Make sure you use a function that returns an integer, not a string. One way to implement a 64-bit hash function is to use just part of the value returned by MD5().

空间数据索引(R-Tree)

全文索引

索引的优点

Three main benefits proceed from these properties:

  1. Indexes reduce the amount of data the server has to examine.
  2. Indexes help the server avoid sorting and temporary tables.
  3. Indexes turn random I/O into sequential I/O.

高性能索引策略

独立的列

“Isolating” the column means it should not be part of an expression or be inside a function in the query.如果查询中得列不是独立的,MySQL就不会使用索引。

前缀索引和索引选择性

Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (#T), and ranges from 1/#T to 1.

A prefix of the column is often selective enough to give good performance. If you’re indexing BLOB or TEXT columns, or very long VARCHAR columns, you must define prefix indexes, because MySQL disallows indexing their full length.

创建前缀索引

mysql> ALTER TABLE tableName ADD KEY (colName(len));

前缀索引的一个缺点是MySQL无法用其做orderBy和groupBy。

选择合适的索引列顺序

选择索引列顺序的经验法则: 将选择性最高的列放在前面。

聚族索引

聚族索引并不是一种单独的索引类型,而是一种数据存储方式。
InnoDB将通过主键聚集数据。


cluster_index.png

覆盖索引

An index that contains (or “covers”) all the data needed to satisfy a query is called a covering index.

使用索引扫描来做排序

压缩(前缀压缩)索引

MySQL需要单独维护重复的索引,所以有性能上的考虑。重复索引是指在相同的列上按照相同的顺序创建的相同类型的索引。

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 212,657评论 6 492
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 90,662评论 3 385
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 158,143评论 0 348
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 56,732评论 1 284
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 65,837评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,036评论 1 291
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,126评论 3 410
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 37,868评论 0 268
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,315评论 1 303
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,641评论 2 327
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 38,773评论 1 341
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,470评论 4 333
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,126评论 3 317
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,859评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,095评论 1 267
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,584评论 2 362
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,676评论 2 351

推荐阅读更多精彩内容