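On a Mac you generally cannot use CUDA to run a program on the GPU. CUDA only runs on NVIDIA GPUs, current Macs use Apple Silicon, AMD, or Intel GPUs, and NVIDIA ended macOS support after CUDA 10.2.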

On macOS you use the Metal API instead. Metal is Apple's interface for GPU graphics and compute programming.

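Program overview

Languages: Metal Shading Language (MSL) for the kernel + Objective-C for the host program
What it does: the GPU writes "Hello, GPU!" into a buffer, and the CPU then reads the result and prints it.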

Code

1. Metal kernel file hello.metal

#include <metal_stdlib>
using namespace metal;

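kernel void hello(device char* buffer [[buffer(0)]], uint id [[thread_position_in_grid]]) {
    // Each GPU thread copies one character; sizeof(msg) is 12 because it includes the trailing '\0'
    const char msg[] = "Hello, GPU!";
    // Threads whose index lies past the end of the message write nothing
    if (id < sizeof(msg)) {
        buffer[id] = msg[id];
    }
}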
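To see how the bounds check plays out across the 64 threads that main.m launches, here is a plain C sketch that emulates the grid on the CPU by calling a kernel-like function once per thread index. It is only an illustration under that assumption: `hello_kernel` and `emulate_grid` are hypothetical names, and a real GPU runs the invocations in parallel, not in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* CPU stand-in for the Metal kernel: one call per thread index */
static void hello_kernel(char *buffer, unsigned id) {
    const char msg[] = "Hello, GPU!";
    /* sizeof(msg) == 12: 11 visible characters plus the terminating '\0' */
    if (id < sizeof(msg)) {
        buffer[id] = msg[id];
    }
}

/* Emulate a 1-D grid of grid_size threads by calling the kernel sequentially.
   A real GPU runs these invocations in parallel, in no guaranteed order. */
static void emulate_grid(char *buffer, unsigned grid_size) {
    assert(buffer != NULL);
    memset(buffer, 0, grid_size); /* start from a zeroed buffer */
    for (unsigned id = 0; id < grid_size; id++) {
        hello_kernel(buffer, id);
    }
}
```

Calling `emulate_grid(buf, 64)` on a 64-byte `char` array leaves "Hello, GPU!" and its '\0' in bytes 0 to 11, while bytes 12 to 63 stay zero because those threads fail the `id < sizeof(msg)` check.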

2. Host program main.m

#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

int main() {
    @autoreleasepool {
        id<MTLDevice> device = MTLCreateSystemDefaultDevice();
        if (!device) {
            NSLog(@"No Metal-capable GPU found");
            return 1;
        }
        NSLog(@"Using GPU: %@", [device name]);

        id<MTLCommandQueue> commandQueue = [device newCommandQueue];

        NSError *error = nil;

        // Newer API: newLibraryWithURL:error: takes an NSURL rather than an NSString path
        NSURL *libraryURL = [NSURL fileURLWithPath:@"hello.metallib"];
        id<MTLLibrary> library = [device newLibraryWithURL:libraryURL error:&error];
        if (!library) {
            NSLog(@"Failed to load Metal library: %@", error);
            return 1;
        }

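        id<MTLFunction> function = [library newFunctionWithName:@"hello"];
        if (!function) {
            NSLog(@"Kernel function 'hello' not found in the library");
            return 1;
        }
        id<MTLComputePipelineState> pipeline = [device newComputePipelineStateWithFunction:function error:&error];
        if (!pipeline) {
            NSLog(@"Failed to create compute pipeline: %@", error);
            return 1;
        }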

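        // 64 bytes leaves room for the 12-byte message including its '\0'.
        // Shared storage: CPU and GPU access the same memory, so no copy back is needed.
        const int bufferSize = 64;
        id<MTLBuffer> buffer = [device newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];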

        id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];
        id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
        [encoder setComputePipelineState:pipeline];
        [encoder setBuffer:buffer offset:0 atIndex:0];
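        // One thread per buffer byte (64 threads), one thread per threadgroup.
        // dispatchThreads requires GPU support for non-uniform threadgroups (Apple Silicon has it).
        [encoder dispatchThreads:MTLSizeMake(bufferSize, 1, 1) threadsPerThreadgroup:MTLSizeMake(1, 1, 1)];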
        [encoder endEncoding];

        [commandBuffer commit];
        [commandBuffer waitUntilCompleted];

        char *result = (char *)[buffer contents];
        NSLog(@"Result from GPU: %s", result);
    }
    return 0;
}
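main.m launches exactly 64 threads with dispatchThreads:threadsPerThreadgroup:. With dispatchThreadgroups:threadsPerThreadgroup: you specify a number of threadgroups instead, so the launched thread count is rounded up to a multiple of the threadgroup size, and the kernel's `id < sizeof(msg)` check is what keeps the extra threads harmless. The C helpers below sketch that round-up arithmetic; `threadgroups_needed` and `threads_launched` are hypothetical names, not part of the Metal API.

```c
#include <assert.h>
#include <stddef.h>

/* Number of threadgroups needed to cover grid_size threads (ceiling division),
   i.e. the count you would pass to dispatchThreadgroups:threadsPerThreadgroup: */
static size_t threadgroups_needed(size_t grid_size, size_t threads_per_group) {
    assert(threads_per_group > 0);
    return (grid_size + threads_per_group - 1) / threads_per_group;
}

/* Total threads actually launched; may exceed grid_size, so the kernel needs a bounds check */
static size_t threads_launched(size_t grid_size, size_t threads_per_group) {
    return threadgroups_needed(grid_size, threads_per_group) * threads_per_group;
}
```

For example, a 70-byte buffer with 32 threads per threadgroup needs 3 threadgroups, i.e. 96 threads, 26 of which must write nothing. In Objective-C these counts would go into the MTLSizeMake(...) arguments.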

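Build and run steps

Compile the Metal kernel (this needs the Metal compiler from Xcode; the Command Line Tools alone do not provide it)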

xcrun -sdk macosx metal -c hello.metal -o hello.air
xcrun -sdk macosx metallib hello.air -o hello.metallib

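Compile the host program (-fobjc-arc enables Automatic Reference Counting for the Objective-C objects)

clang -fobjc-arc -framework Metal -framework Foundation main.m -o hello_gpu

Run it from the directory that contains hello.metallib, because main.m loads the library by relative path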

./hello_gpu

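Output (example from an Apple M2 Mac; the GPU name depends on your machine, and NSLog also prefixes each line with a timestamp and process information, omitted here)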

Using GPU: Apple M2
Result from GPU: Hello, GPU!


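How it works

hello.metal is the kernel that runs on the GPU; each GPU thread handles one byte of the buffer.

The host program uses the Metal API to load the kernel onto the GPU and to allocate a buffer. The buffer is created with MTLResourceStorageModeShared, so the CPU and GPU access the same memory (on Apple Silicon this is unified memory, not separate video memory), and no explicit copy back is needed.

The GPU kernel writes the string into the buffer; once waitUntilCompleted returns, the CPU reads it through [buffer contents] and prints it.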
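On the CPU side, the %s in main.m's NSLog trusts that the kernel wrote a terminating '\0' inside the 64-byte buffer. If it did not (for example, with a message of 64 or more characters), the read would run past the end of the buffer. A bounded read avoids that. The plain C sketch below works on the pointer returned by [buffer contents] plus the buffer length; `copy_gpu_string` is a hypothetical helper, not part of Metal or Foundation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a NUL-terminated string out of a GPU-written buffer without reading past buf_len.
   Returns the number of characters copied (excluding the '\0' written to out). */
static size_t copy_gpu_string(const char *contents, size_t buf_len, char *out, size_t out_len) {
    assert(contents != NULL && out != NULL && out_len > 0);
    /* Find the terminator within the buffer, or treat the whole buffer as the string */
    const char *nul = memchr(contents, '\0', buf_len);
    size_t len = nul ? (size_t)(nul - contents) : buf_len;
    if (len > out_len - 1) {
        len = out_len - 1; /* truncate to fit the destination */
    }
    memcpy(out, contents, len);
    out[len] = '\0';
    return len;
}
```

In main.m this could be called as `copy_gpu_string((const char *)[buffer contents], bufferSize, out, sizeof(out))` before logging `out`; Objective-C compiles the C function unchanged.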

2025-07-23: the Metal GPU version of HelloWorld that runs on a Mac and comes closest to the CUDA style