CUDA Basics Notes (Part 1)

CUDA GPU Hardware

By convention, "host" refers to the computer's CPU and "device" refers to the graphics card's GPU.

A GPU has multiple streaming multiprocessors (SMs). Each SM contains:

Registers for threads to use
Several memory caches
- Shared memory
- Constant cache
- Texture memory
- L1 cache
Thread schedulers
Several CUDA cores (analogous to stream processors on AMD cards); the number depends on the microarchitecture generation
- Each core consists of an integer arithmetic logic unit (ALU) and a floating-point unit (FPU)
Special function units (SFUs) for transcendental functions such as sine, cosine, and exponentials

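For example, a high-end Kepler card has 15 SMs, each with 12 groups of 16 CUDA cores (192 per SM), for a total of 2880 CUDA cores (although each SM can keep at most 2048 threads resident at once). Good CUDA usage means feeding data to the threads fast enough to keep them busy at all times, which is why understanding the memory hierarchy is so important. By comparison, the GTX 750 has only 512 CUDA cores.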
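The core count quoted above is plain multiplication. As a quick sanity check, here is the arithmetic written out; the variable names are only illustrative and the figures are the Kepler numbers from the paragraph above.

```python
# Kepler figures quoted in the text above.
sm_count = 15          # streaming multiprocessors on the card
groups_per_sm = 12     # groups of CUDA cores inside each SM
cores_per_group = 16   # CUDA cores in each group

cores_per_sm = groups_per_sm * cores_per_group
total_cores = sm_count * cores_per_sm

print(cores_per_sm)  # 192
print(total_cores)   # 2880
```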

Getting GPU Information

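Different NVIDIA GPUs support different sets of CUDA features, so before using CUDA you need to understand not only the card's physical architecture but also its level of CUDA support. NVIDIA uses the Compute Capability to describe the CUDA features a product supports; you can look up a product's Compute Capability on NVIDIA's support pages.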

The Compute Capability describes the features supported by a given piece of CUDA hardware.

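Although the Compute Capability does not describe the GPU architecture itself, both are updated with each new product generation, so the two are correlated.
For example, the number of ALUs per SM varies with the Compute Capability version: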

Compute Capability	1.x	2.0	2.1	3.x	5.x	6.0	6.1
ALUs per SM	8	32	48	192	128	64	128
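The table above can be turned into a small lookup so that the per-SM figure does not have to be hard-coded. This is only a sketch: the names `CORES_PER_SM_EXACT`, `CORES_PER_SM_BY_MAJOR`, and `cores_per_sm` are made up for illustration, and the table covers only the versions listed above, so anything else returns `None`.

```python
# ALUs (CUDA cores) per SM, copied from the table above.
# Versions that differ by minor number are keyed on (major, minor).
CORES_PER_SM_EXACT = {
    (2, 0): 32,
    (2, 1): 48,
    (6, 0): 64,
    (6, 1): 128,
}
# Versions listed as "N.x" in the table are keyed on the major number only.
CORES_PER_SM_BY_MAJOR = {1: 8, 3: 192, 5: 128}


def cores_per_sm(compute_capability):
    """Return ALUs per SM for a (major, minor) tuple, or None if not in the table."""
    major, minor = compute_capability
    if (major, minor) in CORES_PER_SM_EXACT:
        return CORES_PER_SM_EXACT[(major, minor)]
    return CORES_PER_SM_BY_MAJOR.get(major)


print(cores_per_sm((5, 0)))  # 128
print(cores_per_sm((2, 1)))  # 48
```

Returning `None` for unknown versions, rather than guessing, makes it obvious when the table needs to be extended for newer hardware.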

numba's interface can be used to query information about the GPU, for example:

from numba import cuda
my_gpu = cuda.get_current_device()

Get the model name:

print(my_gpu.name)
>> 'GeForce GTX 750'

Get the Compute Capability:


print(my_gpu.COMPUTE_CAPABILITY)
>> (5, 0)

Get the number of SMs:


print(my_gpu.MULTIPROCESSOR_COUNT)
>> 4

Get the total number of CUDA cores:

print(my_gpu.MULTIPROCESSOR_COUNT * 128)  # 128 cores per SM for Compute Capability 5.x
>> 512
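The individual queries above can be gathered into one helper. The sketch below is an assumption-laden illustration, not part of numba: `describe_device` is a hypothetical name, and it accepts any object exposing the `name`, `COMPUTE_CAPABILITY`, and `MULTIPROCESSOR_COUNT` attributes used above. So that it runs without a GPU, it is demonstrated on a stand-in object filled with the GTX 750 values printed above.

```python
from types import SimpleNamespace


def describe_device(device, cores_per_sm):
    """Collect the attributes queried above into a single dict.

    cores_per_sm must be supplied by the caller because it depends on
    the Compute Capability (128 for 5.x, per the table earlier).
    """
    sm_count = device.MULTIPROCESSOR_COUNT
    return {
        "name": device.name,
        "compute_capability": tuple(device.COMPUTE_CAPABILITY),
        "sm_count": sm_count,
        "total_cores": sm_count * cores_per_sm,
    }


# Stand-in for a real device, using the GTX 750 values shown above.
fake_gpu = SimpleNamespace(
    name="GeForce GTX 750",
    COMPUTE_CAPABILITY=(5, 0),
    MULTIPROCESSOR_COUNT=4,
)

info = describe_device(fake_gpu, cores_per_sm=128)
print(info["total_cores"])  # 512
```

On a machine with a CUDA GPU, the object returned by `cuda.get_current_device()` could presumably be passed in place of `fake_gpu`.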
