引言

在使用开源的分布式任异步务调度框架Celery时，每一个Celery task，都会返回一个128位的uuid，那这个uuid是唯一的吗？

在同一台主机上，uuid是唯一的吗？（分同一个进程和不同进程）
在一个分布式系统中的不同的主机上，uuid是唯一的吗？

为了解决心中的疑问，我们来看看Celery中有关uuid生成的相关源代码。

Github中Celery uuid的源代码：
celery/celery/utils/init.py

from kombu.utils.uuid import uuid   # noqa
gen_unique_id = uuid

celery使用到了kombu项目中utils提供的uuid:
kombu是celery使用的Python消息库，也是整个celery项目中一个独立的模块。我们再看看kombu的uuid.

kombu/kombu/utils/uuid.py

"""UUID utilities."""
from __future__ import absolute_import, unicode_literals

from uuid import uuid4


def uuid(_uuid=uuid4):
    """Generate unique id in UUID4 format.
    See Also:
        For now this is provided by :func:`uuid.uuid4`.
    """
    return str(_uuid())

发现，Celery最终使用到了python提供的自带模块uuid。并且是uuid4.

下面来看看python的uuid模块的实现。

python中uuid的实现

关于UUID

UUID 是通用唯一识别码（Universally Unique Identifier）的缩写，是一种软件建构的标准，亦为开放软件基金会组织在分布式计算环境领域的一部分。其目的，是让分布式系统中的所有元素，都能有唯一的辨识信息，而不需要通过中央控制端来做辨识信息的指定。如此一来，每个人都可以创建不与其它人冲突的UUID。在这样的情况下，就不需考虑数据库创建时的名称重复问题。

UUID是128位的全局唯一标识符，通常由32字节的字符串表示。
它可以保证时间和空间的唯一性，也称为GUID，全称为：
UUID Universally unique IDentifier python 中叫UUID
GUID Globally Unique IDentifier C#中叫GUID
它是通过MAC地址、时间戳、命名空间、随机数、伪随机数来保证生成ID的唯一性。

UUID主要有5个算法，也就是5中方法来实现：
uuid1() --- 基于时间戳
由MAC地址、当前时间戳、随机数生成。可以保证全球范围内的唯一性，但MAC的使用同时带来了安全性问题，局域网中可以使用IP来代替MAC。
uuid2() -- 基于分布式计算环境DCE（python 中没有这个函数）
算法与uuid1相同，不同的是把时间戳的前4位置换为POSIX的UID。实际中很少使用
uuid3() --- 基于名字的MD5散列值
通过计算名字和命名空间的MD5散列值得到，保证了同一命名空间中不同名字的唯一性，和不同命名空间的唯一性，但同一命名空间的同一名字生成相同的uuid。
uuid4() --- 基于随机数
由伪随机数得到，有一定的重复概率，该概率可以计算出来。
uuid5() ---基于名字的SHA-1散列值
算法与uuid3相同，不同的是使用secure Hash Algorithm 1算法。

使用方面：
python 中没有基于DCE的，所有uuid2可以忽略；
uuid4存在概率性重复，由无映射性，最好不用；
在Global的分布式计算环境下，最好用uuid1；
有名字的唯一性要求，最好用uuid3或uuid5。

但是不知道为什么celery模块选择使用uuid4.

python3中uuid4的实现方法

先贴出python的源代码

class UUID(object):
    """Instances of the UUID class represent UUIDs as specified in RFC 4122.
    UUID objects are immutable, hashable, and usable as dictionary keys.
    Converting a UUID to a string with str() yields something in the form
    '12345678-1234-1234-1234-123456789abc'.  The UUID constructor accepts
    five possible forms: a similar string of hexadecimal digits, or a tuple
    of six integer fields (with 32-bit, 16-bit, 16-bit, 8-bit, 8-bit, and
    48-bit values respectively) as an argument named 'fields', or a string
    of 16 bytes (with all the integer fields in big-endian order) as an
    argument named 'bytes', or a string of 16 bytes (with the first three
    fields in little-endian order) as an argument named 'bytes_le', or a
    single 128-bit integer as an argument named 'int'.

    UUIDs have these read-only attributes:

        bytes       the UUID as a 16-byte string (containing the six
                    integer fields in big-endian byte order)

        bytes_le    the UUID as a 16-byte string (with time_low, time_mid,
                    and time_hi_version in little-endian byte order)

        fields      a tuple of the six integer fields of the UUID,
                    which are also available as six individual attributes
                    and two derived attributes:

            time_low                the first 32 bits of the UUID
            time_mid                the next 16 bits of the UUID
            time_hi_version         the next 16 bits of the UUID
            clock_seq_hi_variant    the next 8 bits of the UUID
            clock_seq_low           the next 8 bits of the UUID
            node                    the last 48 bits of the UUID

            time                    the 60-bit timestamp
            clock_seq               the 14-bit sequence number

        hex         the UUID as a 32-character i string

        int         the UUID as a 128-bit integer

        urn         the UUID as a URN as specified in RFC 4122

        variant     the UUID variant (one of the constants RESERVED_NCS,
                    RFC_4122, RESERVED_MICROSOFT, or RESERVED_FUTURE)

        version     the UUID version number (1 through 5, meaningful only
                    when the variant is RFC_4122)
    """

    def __init__(self, hex=None, bytes=None, bytes_le=None, fields=None,
                       int=None, version=None):
        r"""Create a UUID from either a string of 32 hexadecimal digits,
        a string of 16 bytes as the 'bytes' argument, a string of 16 bytes
        in little-endian order as the 'bytes_le' argument, a tuple of six
        integers (32-bit time_low, 16-bit time_mid, 16-bit time_hi_version,
        8-bit clock_seq_hi_variant, 8-bit clock_seq_low, 48-bit node) as
        the 'fields' argument, or a single 128-bit integer as the 'int'
        argument.  When a string of hex digits is given, curly braces,
        hyphens, and a URN prefix are all optional.  For example, these
        expressions all yield the same UUID:

        UUID('{12345678-1234-5678-1234-567812345678}')
        UUID('12345678123456781234567812345678')
        UUID('urn:uuid:12345678-1234-5678-1234-567812345678')
        UUID(bytes='\x12\x34\x56\x78'*4)
        UUID(bytes_le='\x78\x56\x34\x12\x34\x12\x78\x56' +
                      '\x12\x34\x56\x78\x12\x34\x56\x78')
        UUID(fields=(0x12345678, 0x1234, 0x5678, 0x12, 0x34, 0x567812345678))
        UUID(int=0x12345678123456781234567812345678)

        Exactly one of 'hex', 'bytes', 'bytes_le', 'fields', or 'int' must
        be given.  The 'version' argument is optional; if given, the resulting
        UUID will have its variant and version set according to RFC 4122,
        overriding the given 'hex', 'bytes', 'bytes_le', 'fields', or 'int'.
        """

        if [hex, bytes, bytes_le, fields, int].count(None) != 4:
            raise TypeError('one of the hex, bytes, bytes_le, fields, '
                            'or int arguments must be given')
        if hex is not None:
            hex = hex.replace('urn:', '').replace('uuid:', '')
            hex = hex.strip('{}').replace('-', '')
            if len(hex) != 32:
                raise ValueError('badly formed hexadecimal UUID string')
            int = int_(hex, 16)
        if bytes_le is not None:
            if len(bytes_le) != 16:
                raise ValueError('bytes_le is not a 16-char string')
            bytes = (bytes_le[4-1::-1] + bytes_le[6-1:4-1:-1] +
                     bytes_le[8-1:6-1:-1] + bytes_le[8:])
        if bytes is not None:
            if len(bytes) != 16:
                raise ValueError('bytes is not a 16-char string')
            assert isinstance(bytes, bytes_), repr(bytes)
            int = int_.from_bytes(bytes, byteorder='big')
        if fields is not None:
            if len(fields) != 6:
                raise ValueError('fields is not a 6-tuple')
            (time_low, time_mid, time_hi_version,
             clock_seq_hi_variant, clock_seq_low, node) = fields
            if not 0 <= time_low < 1<<32:
                raise ValueError('field 1 out of range (need a 32-bit value)')
            if not 0 <= time_mid < 1<<16:
                raise ValueError('field 2 out of range (need a 16-bit value)')
            if not 0 <= time_hi_version < 1<<16:
                raise ValueError('field 3 out of range (need a 16-bit value)')
            if not 0 <= clock_seq_hi_variant < 1<<8:
                raise ValueError('field 4 out of range (need an 8-bit value)')
            if not 0 <= clock_seq_low < 1<<8:
                raise ValueError('field 5 out of range (need an 8-bit value)')
            if not 0 <= node < 1<<48:
                raise ValueError('field 6 out of range (need a 48-bit value)')
            clock_seq = (clock_seq_hi_variant << 8) | clock_seq_low
            int = ((time_low << 96) | (time_mid << 80) |
                   (time_hi_version << 64) | (clock_seq << 48) | node)
        if int is not None:
            if not 0 <= int < 1<<128:
                raise ValueError('int is out of range (need a 128-bit value)')
        if version is not None:
            if not 1 <= version <= 5:
                raise ValueError('illegal version number')
            # Set the variant to RFC 4122.
            int &= ~(0xc000 << 48)
            int |= 0x8000 << 48
            # Set the version number.
            int &= ~(0xf000 << 64)
            int |= version << 76
        self.__dict__['int'] = int

def uuid4():
    """Generate a random UUID."""
    return UUID(bytes=os.urandom(16), version=4)

由源代码可以看出，python3的uuid4，是由一个16 bytes的随机数生成的。最重要的一句代码就是bytes=os.urandom(16)。

Python中os.urandom()的详细说明：

Random numbers
--------------

 function:: urandom(size)

   Return a string of *size* random bytes suitable for cryptographic use.

   This function returns random bytes from an OS-specific randomness source.  The
   returned data should be unpredictable enough for cryptographic applications,
   though its exact quality depends on the OS implementation.

   On Linux, if the ``getrandom()`` syscall is available, it is used in
   blocking mode: block until the system urandom entropy pool is initialized
   (128 bits of entropy are collected by the kernel). See the :pep:`524` for
   the rationale. On Linux, the :func:`getrandom` function can be used to get
   random bytes in non-blocking mode (using the :data:`GRND_NONBLOCK` flag) or
   to poll until the system urandom entropy pool is initialized.

   On a Unix-like system, random bytes are read from the ``/dev/urandom``
   device. If the ``/dev/urandom`` device is not available or not readable, the
   :exc:`NotImplementedError` exception is raised.

从python的官方文档可以看出，在linux系统中，os.urandom随机数的产生，依赖于/dev/urandom文件。

/dev/random和/dev/urandom是Linux系统中提供的随机伪设备，这两个设备的任务，是提供永不为空的随机字节数据流。很多解密程序与安全应用程序（如SSH Keys,SSL Keys等）需要它们提供的随机数据流。
这两个设备的差异在于：/dev/random的random pool依赖于系统中断，因此在系统的中断数不足时，/dev/random设备会一直封锁，尝试读取的进程就会进入等待状态，直到系统的中断数充分够用, /dev/random设备可以保证数据的随机性。/dev/urandom不依赖系统的中断，也就不会造成进程忙等待，但是数据的随机性也不高。

现在可以回答上面提到的2个问题了：
1. 在同一台主机上，uuid是唯一的吗？（分同一个进程和不同进程）
2. 在一个分布式系统中的不同的主机上，uuid是唯一的吗？
答案是： celery使用的uuid4与主机，进程都没有关系，是纯粹的由随机数算法得到的，与平台，机器，时间，命名空间都无关。

暂且相信16bytes的随机数，出现重复的概率极小，但是，在一个大规模的分布式系统中，仍然由概率相同，希望在使用celery的过程中，不要碰到这个坑。

celery的uuid，真的全球唯一吗？

celery的uuid，真的全球唯一吗？

引言

python中uuid的实现

关于UUID

python3中uuid4的实现方法

Python中os.urandom()的详细说明：