Concepts
In a relational database, an ID is the unique identifier of a piece of data; every newly created row is assigned a fresh, unique ID. Normally, uniqueness is guaranteed by the database's auto-incrementing ID sequence. In a distributed system, however, relying on the database to generate IDs means every ID-generation request queues up against that single sequence. To avoid this bottleneck, let's look at the common approaches to distributed ID generation.
UUID
A UUID is generated locally:

UUID id = UUID.randomUUID();

Pros:
- Generated locally; fast, with no performance bottleneck

Cons:
- A UUID is long: 32 hex characters plus 4 '-' separators, 36 characters in total
- Used as a database primary key, a UUID makes index lookups inefficient
- There is no increasing trend at all
Note: why do we keep stressing the increasing trend? A common requirement is to query a table and return the results sorted by time:
select * from xx_table where xx = '' order by created_dt desc
If the primary-key ID already carries a time component when it is generated, the query can be rewritten as:
select * from xx_table where xx = '' order by id desc
Because the ID is itself the primary key, sorting by it needs no extra index, saving both time and space.
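The length drawback listed above is easy to verify locally (the class name here is just for illustration):

```java
import java.util.UUID;

public class UuidDemo {
    // Returns the canonical string form of a random (version 4) UUID:
    // 32 hex characters plus 4 '-' separators, 36 characters in total.
    static String newUuidString() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String s = newUuidString();
        System.out.println(s + " (" + s.length() + " chars)");
    }
}
```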
Worker-local generation from the table's max(id)

Each worker in the cluster is configured with a worker id, and every worker then generates IDs with the following algorithm:
- On first use, the worker reads the table's max(id) and caches it
- worker id = worker id < 10 ? "0" + worker id (zero-pad the worker id to two digits)
- newId = max(id) + 100
- nextId = newId.substring(0, newId.length - 2) + worker id

Pros:
- Every ID is generated locally on the worker; only the first call loads max(id) from the DB.

Cons:
- Large numbers of IDs are wasted, and the values are not contiguous. If the cluster has only workers 0~9 and allocation starts from id 100, the first round can only produce 101, 102, 103 ~ 109, and the next round 201~209; 110~199 and 210~299 are wasted entirely.
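The steps above can be sketched as follows; the class name is a placeholder, and the one-time "select max(id)" lookup is passed in as a constructor argument rather than taken from any real DAO:

```java
public class MaxIdWorkerGenerator {
    private final String paddedWorkerId; // worker id 0~99, zero-padded to two digits
    private long cachedMaxId;            // seeded once from "select max(id)" on the table

    // maxIdFromDb stands in for the one-time max(id) lookup (hypothetical here).
    public MaxIdWorkerGenerator(int workerId, long maxIdFromDb) {
        this.paddedWorkerId = workerId < 10 ? "0" + workerId : String.valueOf(workerId);
        this.cachedMaxId = maxIdFromDb;
    }

    public synchronized long nextId() {
        cachedMaxId += 100;                      // newId = max(id) + 100
        String newId = String.valueOf(cachedMaxId);
        // Replace the last two digits with the padded worker id.
        return Long.parseLong(newId.substring(0, newId.length() - 2) + paddedWorkerId);
    }
}
```

With workerId = 1 and max(id) starting at 0, successive calls yield 101, 201, 301, …, which is exactly the waste pattern described in the cons.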
Generation from a timestamp + sequence

Generating IDs from a timestamp plus a sequence is fairly common in practice. One major Chinese microblogging service, for example, builds its post IDs from userid + timestamp.
Today we discuss Twitter's distributed auto-increment ID algorithm.
snowflake is Twitter's open-source distributed ID generation algorithm. Its core idea: pack the ID into a single long, using 41 bits for the millisecond timestamp, 5 bits for the worker id,
5 bits for the data center id, and 12 bits for the sequence within a millisecond, 64 bits in total; the remaining 1 bit is the long's sign placeholder.
A single worker can generate 4096 IDs per millisecond.
The whole cluster can generate 4,194,304 IDs per millisecond.
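Both figures fall straight out of the bit layout: 12 sequence bits give 2^12 ids per worker per millisecond, and the 5 + 5 worker/data-center bits give 2^10 workers (the class and method names below are just for illustration):

```java
public class SnowflakeCapacity {
    static final long SEQUENCE_BITS = 12;
    static final long WORKER_ID_BITS = 5;
    static final long DATA_CENTER_BITS = 5;

    // Ids one worker can emit in a single millisecond: 2^12 = 4096.
    static long perWorkerPerMillis() {
        return 1L << SEQUENCE_BITS;
    }

    // Ids the whole cluster can emit in a single millisecond: 4096 * 1024 = 4,194,304.
    static long clusterPerMillis() {
        return perWorkerPerMillis() << (WORKER_ID_BITS + DATA_CENTER_BITS);
    }

    public static void main(String[] args) {
        System.out.println(perWorkerPerMillis()); // 4096
        System.out.println(clusterPerMillis());   // 4194304
    }
}
```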
Pros:
- Generated IDs increase with time
- Every data center and every worker produce distinct IDs within the same millisecond
- Within one millisecond, a single worker's IDs are increasing
- No dependency on the database

Cons:
- Some products shard databases and tables by taking the ID modulo the shard count. When crossing a millisecond boundary, the first sequence value is always 0, so such IDs are over-represented and the modulo distribution becomes skewed. It is therefore advisable not to reset the sequence to 0 each time, but to a random number between 0 and 9.
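One way to apply that suggestion, sketched here rather than taken from the snowflake source: when the millisecond rolls over, seed the sequence with a random value in 0~9 instead of always 0:

```java
import java.util.concurrent.ThreadLocalRandom;

public class SequenceSeed {
    // On a millisecond rollover, start the sequence at a random value in 0~9
    // instead of always 0, so ids modulo the shard count spread more evenly.
    static long newMillisecondSeed() {
        return ThreadLocalRandom.current().nextLong(10);
    }
}
```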
Appendix: the Java source of a snowflake implementation:
package com.eyesee.snowflake;

import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

/**
 * The twitter snowflake id generator.
 * <pre>
 * The format:
 * 1 + 41 time millis + 5 data center size + 5 worker size + 12 sequence.
 *
 * The total length is: 64
 * </pre>
 *
 * @author jesse
 */
public class SnowflakeIdGenerator {

    private static final long WORKER_ID_BITS = 5L;
    private static final long DATA_CENTER_BITS = 5L;
    // The max worker id is 31.
    private static final long MAX_WORKER_ID = -1L ^ (-1L << WORKER_ID_BITS);
    // The max data center id is 31.
    private static final long MAX_DATA_CENTER_ID = -1L ^ (-1L << DATA_CENTER_BITS);
    private static final long SEQUENCE_BITS = 12L;
    private static final long WORKER_ID_SHIFT = SEQUENCE_BITS;
    // The data center id is shifted left by 12 + 5.
    private static final long DATA_CENTER_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS;
    // The timestamp is shifted left by 12 + 5 + 5.
    private static final long TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS + DATA_CENTER_BITS;
    private static final long SEQUENCE_MASK = -1L ^ (-1L << SEQUENCE_BITS);

    private final long workerId;
    private final long dataCenterId;
    private final AtomicReference<TimestampSequence> timestampSequenceReference =
            new AtomicReference<>(new TimestampSequence());

    /**
     * @param workerId (0~31)
     * @param dataCenterId (0~31)
     */
    public SnowflakeIdGenerator(long workerId, long dataCenterId) {
        if (workerId > MAX_WORKER_ID || workerId < 0) {
            throw new IllegalArgumentException(
                    String.format("worker Id can't be greater than %d or less than 0", MAX_WORKER_ID));
        }
        if (dataCenterId > MAX_DATA_CENTER_ID || dataCenterId < 0) {
            throw new IllegalArgumentException(
                    String.format("datacenter Id can't be greater than %d or less than 0", MAX_DATA_CENTER_ID));
        }
        this.workerId = workerId;
        this.dataCenterId = dataCenterId;
    }

    public TimestampSequence generateNextTimestampSequence() {
        for (;;) {
            long timestamp = timeMillis();
            TimestampSequence oldTimestampSequence = timestampSequenceReference.get();
            if (timestamp < oldTimestampSequence.getTimestamp()) {
                // The clock moved backwards; generating ids now could produce duplicates.
                throw new IllegalStateException("Clock moved backwards, refusing to generate id");
            }
            TimestampSequence newTimestampSequence = new TimestampSequence();
            if (timestamp == oldTimestampSequence.getTimestamp()) {
                long newSequenceValue = (oldTimestampSequence.getSequence() + 1) & SEQUENCE_MASK;
                if (newSequenceValue == 0) {
                    // The sequence is exhausted for this millisecond; retry until the clock moves on.
                    continue;
                }
                newTimestampSequence.setTimestamp(timestamp);
                newTimestampSequence.setSequence(newSequenceValue);
            } else {
                // A new millisecond: reset the sequence to 0.
                newTimestampSequence.setTimestamp(timestamp);
                newTimestampSequence.setSequence(0L);
            }
            if (timestampSequenceReference.compareAndSet(oldTimestampSequence, newTimestampSequence)) {
                return newTimestampSequence;
            }
            // Another thread won the CAS race; retry.
        }
    }

    public long nextId() {
        TimestampSequence timestampSequence = generateNextTimestampSequence();
        // Note: the raw epoch millis are used here; the original snowflake subtracts a
        // custom start epoch first so that the 41 timestamp bits last longer.
        return (timestampSequence.getTimestamp() << TIMESTAMP_SHIFT)
                | (dataCenterId << DATA_CENTER_SHIFT)
                | (workerId << WORKER_ID_SHIFT)
                | timestampSequence.getSequence();
    }

    protected long timeMillis() {
        return System.currentTimeMillis();
    }

    public static class TimestampSequence {
        // The timestamp at which the sequence was generated.
        private long timestamp = -1L;
        // The sequence within one millisecond (0~4095, i.e. 2^12 values).
        private long sequence = 0L;

        public long getTimestamp() {
            return timestamp;
        }

        public void setTimestamp(long timestamp) {
            this.timestamp = timestamp;
        }

        public long getSequence() {
            return sequence;
        }

        public void setSequence(long sequence) {
            this.sequence = sequence;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(timestamp) * 31 + Long.hashCode(sequence);
        }

        @Override
        public boolean equals(Object obj) {
            if (Objects.isNull(obj)) {
                return false;
            }
            if (obj instanceof TimestampSequence) {
                TimestampSequence timestampSequence = (TimestampSequence) obj;
                return this.timestamp == timestampSequence.getTimestamp()
                        && this.sequence == timestampSequence.getSequence();
            }
            return false;
        }
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator idWorker = new SnowflakeIdGenerator(0, 0);
        Runnable runnable = () -> {
            long id = idWorker.nextId();
            System.out.println(id);
        };
        for (int i = 0; i < 1000; i++) {
            new Thread(runnable).start();
        }
    }
}
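To see the layout inside an actual id, the fields can be unpacked again by shifting and masking. This decoder is illustrative, not part of the generator above; it assumes the same 41/5/5/12 layout:

```java
public class SnowflakeDecode {
    // Unpacks an id built with the 41 timestamp / 5 data center / 5 worker / 12 sequence layout.
    static long sequenceOf(long id)   { return id & 0xFFFL; }          // low 12 bits
    static long workerOf(long id)     { return (id >> 12) & 0x1FL; }   // next 5 bits
    static long dataCenterOf(long id) { return (id >> 17) & 0x1FL; }   // next 5 bits
    static long timestampOf(long id)  { return id >>> 22; }            // remaining high bits

    public static void main(String[] args) {
        // Pack known fields, then read them back.
        long id = (123456789L << 22) | (3L << 17) | (7L << 12) | 42L;
        System.out.println(timestampOf(id));  // 123456789
        System.out.println(dataCenterOf(id)); // 3
        System.out.println(workerOf(id));     // 7
        System.out.println(sequenceOf(id));   // 42
    }
}
```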