2020-11-18 索引

恍然总结(20201202补充)

检索某个记录时,查找类型大概分为线性和B树,快慢的话,数据量大时B树比线性快,如果查找类型相同,查找一遍的比多遍快,比如innodb聚集索引比非聚集索引快的原因就是,虽然都采用B树查找,但是非聚集的只需查询一遍,而非聚集的查多遍。

innodb聚簇索引的索引以及记录数据(注意是两样),存储在同一个名为表.opt的文件(逻辑上),而这个文件则是多个数据页(既有存储索引的页,也有存储记录的页)组成,这些数据页物理存储顺序是以一整棵B+树来维护的,所以当通过该聚簇索引(一般是id列)检索某个记录时,只需要查一遍这个B+树就可以直接触达目标索引对应的记录了。

一个文件(逻辑)的所有数据页由一个B+树维护,而非聚集索引,索引数据和记录数据不在同一个文件上,即非聚集索引数据和记录数据的文件,以数据页为节点分别维护在两颗不同B+树上,所以通过非聚集索引检索某记录时,要先在非聚集索引B+树上找到目标的记录值(即聚集索引,一般为id),然后再去记录数据B+树(也就是聚集索引树)上找到该id对应的的记录(在这里,所有列的都可拿到)。

SQL

There are two types of Indexes in SQL Server:

  1. Clustered Index
  2. Non-Clustered Index

Clustered Index

A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only way, therefore, there can be only one clustered index per table. In SQL Server, the primary key constraint automatically creates a clustered index on that particular column.

The process of creating clustered index is similar to a normal index with one exception. With clustered index, you have to use the keyword “CLUSTERED” before “INDEX”.

The above script creates a clustered index named “IX_tblStudent_Gender_Score” on the student table. This index is created on the “gender” and “total_score” columns. An index that is created on more than one column is called “composite index”.

The above index first sorts all the records in the ascending order of the gender. If gender is same for two or more records, the records are sorted in the descending order of the values in their “total_score” column. You can create a clustered index on a single column as well. Now if you select all the records from the student table, they will be retrieved in the following order:


image.png

简而概之就是 聚集索引 维护的是 每行数据的物理存储顺序,说到物理存储顺序就是以什么顺序存储在磁盘中的,那么存储方式肯定也就只有一种了,所以一个表只能有一个聚集索引。
至于以哪列或者哪几列的record来排序,则是看如何定义这个聚集索引了,创建每张表时,默认情况下自动会维护一个id列作索引,当插入record时,以该列的值升序来插入每个record。
当然也可以自定义聚簇索引,即指定某列或某几列的作为索引,当插入record时,通过被指定的列进行排序存储在物理磁盘中,当被指定列的值重合度较高时,或频繁修改时,这样会不会随着插入或者删除经常会把record掏出来放回去,改来改去,效率应该不高。

Non-Clustered Indexes

A non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is stored at one place and table data is stored in another place. This is similar to a textbook where the book content is located in one place and the index is located in another. This allows for more than one non-clustered index per table.

It is important to mention here that inside the table the data will be sorted by a clustered index. However, inside the non-clustered index data is stored in the specified order. The index contains column values on which the index is created and the address of the record that the column value belongs to.

When a query is issued against a column on which the index is created, the database will first go to the index and look for the address of the corresponding row in the table. It will then go to that row address and fetch other column values. It is due to this additional step that non-clustered indexes are slower than clustered indexes.

image.png

Conclusion

From the discussion we find following differences between clustered and non-clustered indexes.

  1. There can be only one clustered index per table. However, you can create multiple non-clustered indexes on a single table.
  2. Clustered indexes only sort tables. Therefore, they do not consume extra storage. Non-clustered indexes are stored in a separate place from the actual table claiming more storage space.
  3. Clustered indexes are faster than non-clustered indexes since they don’t involve any extra lookup step.
image.png

参考:
https://www.sqlshack.com/what-is-the-difference-between-clustered-and-non-clustered-indexes-in-sql-server/

Mysql

Clustered Index

A clustered index is collocated with the data in the same table space or same disk file. You can consider that a clustered index is a B-Tree index whose leaf nodes are the actual data blocks on disk, since the index & data reside together. This kind of index physically organizes the data on disk as per the logical order of the index key.

Structure of Primary key (clustered) Index:
An index is usually maintained as a B+ Tree on disk & in-memory, and any index is stored in blocks on disk. These blocks are called index blocks. The entries in the index block are always sorted on the index/search key. The leaf index block of the index contains a row locator. For the primary index, the row locator refers to virtual address of the corresponding physical location of the data blocks on disk where rows reside being sorted as per the index key.

In the following diagram, the left side rectangles represent leaf level index blocks, and the right side rectangles represent the data blocks. Logically the data blocks look to be aligned in a sorted order, but as already described earlier, the actual physical locations may be scattered here & there.


image.png

Secondary Index

Any index other than a clustered index is called a secondary index. Secondary indices does not impact physical storage locations unlike primary indices.

Structure of Secondary Index:
In the diagram below, the red coloured rectangles represent secondary index blocks. Secondary index is also maintained in the B+ Tree and it’s sorted as per the key on which the index was created. The leaf nodes contain a copy of the key of the corresponding data in the primary index.

Why do we use composite indices?

Why not define multiple secondary indices on the columns we are interested in?
MySQL uses only one index per table per query except for UNION. (In a UNION, each logical query is run separately, and the results are merged.) So defining multiple indices on multiple columns does not guarantee those indices will be used even if they are part of the query.

MySQL maintains something called index statistics which helps MySQL infer what the data looks like in the system. Index statistics is a generilization though, but based on this meta data, MySQL decides which index is appropriate for the current query.

How does composite index work?

The columns used in composite indices are concatenated together, and those concatenated keys are stored in sorted order using a B+ Tree. When you perform a search, concatenation of your search keys is matched against those of the composite index. Then if there is any mismatch between the ordering of your search keys & ordering of the composite index columns, the index can’t be used.

Covering Index:

A covering index is a special kind of composite index where all the columns specified in the query somewhere exist in the index. So the query optimizer does not need to hit the database to get the data — rather it gets the result from the index itself. Example: we have already defined a composite index on (pan_no, name, age) , so now consider the following query:

覆盖索引就是 比如 某表有 a、b、c列,且创建了a,b,c三列的复合索引com_index,
当通过该复合索引select 某个目标列的record时,这个目标列恰好在是b、c中的某列,
比如select c from table where a=1 and b="a"; 那么刚好c的record就在这个复合索引树中有备份,而无需再去聚集索引树中找c的值了,这种情况就叫做覆盖索引,大概是联合索引覆盖了目标列的情况把,总结这应该算是一种使用联合索引的方法。

SELECT age FROM index_demo WHERE pan_no = 'HJKXS9086W' AND name = 'kousik'

The columns mentioned in the SELECT & WHERE clauses are part of the composite index. So in this case, we can actually get the value of the age column from the composite index itself. Let’s see what the EXPLAIN command shows for this query:

EXPLAIN FORMAT=JSON SELECT age FROM index_demo WHERE pan_no = 'HJKXS9086W' AND name = '111kousik1';
image.png

In the above response, note that there is a key — using_index which is set to true which signifies that the covering index has been used to answer the query.

I don’t know how much covering indices are appreciated in production environments, but apparently it seems to be a good optimization in case the query fits the bill.

Partial Index:

当想把某列设为索引时,这个列的长度确实不固定或者 固定但是很长,我们可以考虑取这列record值的前几个字节作为索引,这样索引树的key就不会,由于太长,导致每个节点上能存的key很少,最后把树高弄得很高,查询时增加磁盘访问次数了。
We already know that Indices speed up our queries at the cost of space. The more indices you have, the more the storage requirement. We have already created an index called secondary_idx_1 on the column name. The column name can contain large values of any length. Also in the index, the row locators’ or row pointers’ metadata have their own size. So overall, an index can have a high storage & memory load.

In MySQL, it’s possible to create an index on the first few bytes of data as well. Example: the following command creates an index on the first 4 bytes of name. Though this method reduces memory overhead by a certain amount, the index can’t eliminate many rows, since in this example the first 4 bytes may be common across many names. Usually this kind of prefix indexing is supported on CHAR ,VARCHAR, BINARY, VARBINARY type of columns.

CREATE INDEX secondary_index_1 ON index_demo (name(4));

参考:
https://www.freecodecamp.org/news/database-indexing-at-a-glance-bb50809d48bd/

Q&A

疑问:两个参考中(前者是Sql 后者是Mysql) 都提到了聚簇索引叶子结点会存储record,第一个参考说record值直接存贮在叶子结点,但是第二个参考却说也存record值,不过是以record对应的逻辑地址(指向物理磁盘地址)形式存在的。

那么对于非聚簇索引上存储的是record的指针(和逻辑地址有区别吗)。
那为何非聚簇索引会更慢呢。
初步推测第二个参考中说的有问题。后来发现第二个说的没有问题 至少可以自圆其说
其圆的方式是 辅助索引key的值存的是聚簇索引的key,然后拿着这个key 再去找对应record 所以确实查一次辅助索引需要 和磁盘打交道两回
那么问题来了 为何 辅助索引key对应值 为何不直接写record 的逻辑地址,这样岂不是可以赶上聚簇索引? 还是有何限制逻辑值放到辅助索引中,是因为落盘的那一瞬间只能确定主键id,还没法确定逻辑地址吗。

那为何SQL中却说 其非聚簇索引 key对应值可以存储record的指针,他是怎么做到的。

解决这些疑问 可能需要补充 如下知识:

  1. 数据如何在磁盘中存储的(直接一个page上放一堆(就是物理上的一堆了),还是一个page只是逻辑上的,真实的数据虽然属于一个page,但是却分布在不同的物理地址?那么磁盘转一次所对应的一份数据属于一page 还是一个磁盘片区?需要详细了解)
  2. 逻辑地址和物理地址关系
  3. SQL 中非聚簇 record的指针的实际含义是?
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 217,826评论 6 506
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,968评论 3 395
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 164,234评论 0 354
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,562评论 1 293
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,611评论 6 392
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,482评论 1 302
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,271评论 3 418
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 39,166评论 0 276
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,608评论 1 314
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,814评论 3 336
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,926评论 1 348
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,644评论 5 346
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,249评论 3 329
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,866评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,991评论 1 269
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 48,063评论 3 370
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,871评论 2 354

推荐阅读更多精彩内容