Note: in GPU programming, the device generally means the GPU, and the host means the CPU.
Parallel-For
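Parallel-For executes the same independent operation for each element of a collection or for each index of an ordered range. The operations are independent if none of them writes to a memory location that another one accesses. Unlike a serial loop, the order in which the operations execute is undefined, and they may run in parallel.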
void Gpu.For(int start, int end, Action<int> op);
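The first two parameters are the bounds of the loop range. The third parameter is an Action that is invoked once for every index in that range and receives the index as its argument. Gpu.For also needs a GPU to run on; for convenience, Alea provides a way to get the default GPU: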
var gpu = Gpu.Default;
var n = ...
gpu.For(0, n, i =>
{
    ...
});
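To make the role of the three arguments concrete, here is a minimal CPU-only sketch of what such a call computes. SerialFor and VisitedIndices are hypothetical helpers written for this illustration, not part of Alea, and the sketch assumes the end bound is exclusive, which is what the loops in this article (running from 0 to an array's Length) suggest. The real GPU version gives no ordering guarantee, so a body that appends to a shared list like this one is only valid in the serial sketch.

```csharp
using System;
using System.Collections.Generic;

public static class ForSemantics
{
    // Hypothetical reference helper (not an Alea API): calls op(i) for
    // every i with start <= i < end, in ascending order.
    public static void SerialFor(int start, int end, Action<int> op)
    {
        for (var i = start; i < end; i++)
        {
            op(i);
        }
    }

    // Records which indices the loop body is called with.
    // Appending to a shared list is safe here only because the loop is serial.
    public static List<int> VisitedIndices(int start, int end)
    {
        var visited = new List<int>();
        SerialFor(start, end, i => visited.Add(i));
        return visited;
    }
}
```

Calling ForSemantics.VisitedIndices(2, 5) visits the indices 2, 3 and 4: the start bound is included and the end bound is not.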
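The figure shows how Parallel-For works: the arguments and the result are captured in a closure, which is then passed to the Parallel-For loop body. Note that the operations in the loop body must be independent of each other; they are not allowed to communicate by writing to shared variables or shared array elements.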
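The independence requirement can be illustrated without a GPU. The sketch below uses the CPU-side Parallel.For from System.Threading.Tasks as a stand-in for the GPU loop; the class and method names are made up for this example. The first method is a valid parallel body because iteration i touches only element i. The second is a running sum, where each result depends on the previous one, so it stays a serial loop.

```csharp
using System.Threading.Tasks;

public static class IndependenceDemo
{
    // Independent: iteration i reads input[i] and writes only output[i],
    // so the iterations may run in any order or at the same time.
    public static int[] SquareAll(int[] input)
    {
        var output = new int[input.Length];
        Parallel.For(0, input.Length, i => output[i] = input[i] * input[i]);
        return output;
    }

    // Not independent: each element depends on the sum of all earlier
    // elements, so this computation is kept as an ordinary serial loop.
    public static int[] RunningSum(int[] input)
    {
        var output = new int[input.Length];
        var total = 0;
        for (var i = 0; i < input.Length; i++)
        {
            total += input[i];
            output[i] = total;
        }
        return output;
    }
}
```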
Here is an example that computes the element-wise sum of two arrays:
using System.Linq;
using Alea;
using Alea.Parallel;
var gpu = Gpu.Default;
var arg1 = Enumerable.Range(0, Length).ToArray();
var arg2 = Enumerable.Range(0, Length).ToArray();
var result = new int[Length];
gpu.For(0, result.Length, i => result[i] = arg1[i] + arg2[i]);
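The action can access data defined outside the loop body, and it can write its results directly into .NET arrays. Memory management in Alea GPU is automatic; later sections explain this further.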
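The closure behaviour itself is plain C# and can be tried on the CPU. In this minimal sketch (ClosureDemo and AddOffset are hypothetical names, and Parallel.For stands in for the GPU loop) the lambda captures two arrays and a local value from the enclosing method, and its writes are visible in the array after the loop returns.

```csharp
using System;
using System.Threading.Tasks;

public static class ClosureDemo
{
    public static int[] AddOffset(int[] input, int offset)
    {
        var result = new int[input.Length];

        // The lambda captures result, input and offset from the enclosing
        // scope; nothing is passed to it except the loop index i.
        Action<int> op = i => result[i] = input[i] + offset;

        // CPU stand-in for the GPU loop: same (start, end, Action<int>) shape.
        Parallel.For(0, input.Length, op);

        // The writes made inside the loop are visible in the captured array.
        return result;
    }
}
```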
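I downloaded the Alea.Parallel sample code, added a few variants of my own, and timed them. Each variant adds two arrays of 10,000,000 elements element by element. The code is as follows: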
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Alea;
using Alea.Parallel;

namespace DeviceQuery
{
    class Program
    {
        static void Main()
        {
            // Each measurement includes creating the input arrays.
            // The first GPU measurement also includes one-time startup
            // costs such as CUDA context creation.
            Stopwatch sw = new Stopwatch();
            sw.Start();
            DelegateWithClosureCpu(); // CPU parallel loop, lambda with closure
            sw.Stop();
            double time1 = sw.Elapsed.TotalMilliseconds;

            sw.Restart();
            DelegateWithClosureGpu(); // GPU parallel loop, lambda with closure
            sw.Stop();
            double time2 = sw.Elapsed.TotalMilliseconds;

            sw.Restart();
            ActionWithClosure(); // GPU parallel loop, Action with closure
            sw.Stop();
            double time3 = sw.Elapsed.TotalMilliseconds;

            sw.Restart();
            ActionFactoryWithClosureGPU(); // GPU parallel loop, Action built by a Func factory
            sw.Stop();
            double time4 = sw.Elapsed.TotalMilliseconds;

            sw.Restart();
            ActionFactoryWithClosureCPU(); // CPU parallel loop, Action built by a Func factory
            sw.Stop();
            double time5 = sw.Elapsed.TotalMilliseconds;

            Console.WriteLine(time1);
            Console.WriteLine(time2);
            Console.WriteLine(time3);
            Console.WriteLine(time4);
            Console.WriteLine(time5);
            Console.ReadKey();
        }

        private const int Length = 10000000;

        public static void DelegateWithClosureCpu()
        {
            var arg1 = Enumerable.Range(0, Length).ToArray();
            var arg2 = Enumerable.Range(0, Length).ToArray();
            var result = new int[Length];
            Parallel.For(0, result.Length, i => result[i] = arg1[i] + arg2[i]);
            // Zip is lazy: this sequence is never enumerated, so it costs almost nothing.
            var expected = arg1.Zip(arg2, (x, y) => x + y);
        }

        [GpuManaged]
        public static void DelegateWithClosureGpu()
        {
            var arg1 = Enumerable.Range(0, Length).ToArray();
            var arg2 = Enumerable.Range(0, Length).ToArray();
            var result = new int[Length];
            Gpu.Default.For(0, result.Length, i => result[i] = arg1[i] + arg2[i]);
            var expected = arg1.Zip(arg2, (x, y) => x + y);
        }

        [GpuManaged]
        public static void ActionWithClosure()
        {
            var gpu = Gpu.Default;
            var arg1 = Enumerable.Range(0, Length).ToArray();
            var arg2 = Enumerable.Range(0, Length).ToArray();
            var result = new int[Length];
            Action<int> op = i => result[i] = arg1[i] + arg2[i];
            gpu.For(0, arg1.Length, op);
            var expected = arg1.Zip(arg2, (x, y) => x + y);
        }

        [GpuManaged]
        public static void ActionFactoryWithClosureGPU()
        {
            var gpu = Gpu.Default;
            var arg1 = Enumerable.Range(0, Length).ToArray();
            var arg2 = Enumerable.Range(0, Length).ToArray();
            var result = new int[Length];
            var expected = new int[Length];
            // The factory binds the target array; arg1 and arg2 are captured by the closure.
            Func<int[], Action<int>> opFactory = res => i => res[i] = arg1[i] + arg2[i];
            gpu.For(0, arg1.Length, opFactory(result));
        }

        // CPU-only variant: no GPU instance or [GpuManaged] attribute is needed here.
        public static void ActionFactoryWithClosureCPU()
        {
            var arg1 = Enumerable.Range(0, Length).ToArray();
            var arg2 = Enumerable.Range(0, Length).ToArray();
            var result = new int[Length];
            var expected = new int[Length];
            Func<int[], Action<int>> opFactory = res => i => res[i] = arg1[i] + arg2[i];
            Parallel.For(0, arg1.Length, opFactory(expected));
        }
    }
}
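Two things are worth keeping in mind when reading timings from the program above: each method is run only once, so the first call of each kind also pays any one-time startup cost, and the expected values are built but never compared with the result. The following is a minimal sketch of how both could be handled, not the method used in the Alea samples. TimingSketch, MeasureMilliseconds and AddAndVerify are hypothetical names, and the CPU Parallel.For is used so that the sketch runs without a GPU.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class TimingSketch
{
    // Hypothetical helper: runs the work once untimed to absorb one-time
    // costs, then returns the average time in milliseconds over `runs` runs.
    public static double MeasureMilliseconds(Action work, int runs = 3)
    {
        work(); // warm-up run, not timed

        var sw = Stopwatch.StartNew();
        for (var r = 0; r < runs; r++)
        {
            work();
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / runs;
    }

    // Adds two arrays in parallel and checks the result against a serial
    // LINQ computation. SequenceEqual forces the lazy Zip to be enumerated.
    public static bool AddAndVerify(int length)
    {
        var arg1 = Enumerable.Range(0, length).ToArray();
        var arg2 = Enumerable.Range(0, length).ToArray();
        var result = new int[length];

        Parallel.For(0, length, i => result[i] = arg1[i] + arg2[i]);

        var expected = arg1.Zip(arg2, (x, y) => x + y);
        return result.SequenceEqual(expected);
    }
}
```

A call such as TimingSketch.MeasureMilliseconds(() => TimingSketch.AddAndVerify(10000000)) would time the CPU variant together with its check; to time only the loop, the array creation and the comparison would need to move outside the measured action.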