Today we look at how memory is managed when working with the GPU. The Metal framework represents memory resources as MTLBuffer objects, which are typeless, unformatted allocations of memory that can hold any kind of data, and MTLTexture objects, which are formatted allocations of memory holding image data. We only look at buffers in this article.

To create MTLBuffer objects we have three options:

  • makeBuffer(length:options:) creates an MTLBuffer object backed by a new allocation.
  • makeBuffer(bytes:length:options:) creates a new allocation and copies data into it from an existing allocation.
  • makeBuffer(bytesNoCopy:length:options:deallocator:) reuses an existing storage allocation without copying.
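As a quick sketch of the third option (not covered further in this article), the memory passed to bytesNoCopy must be page-aligned and the length a multiple of the page size, so the allocation is typically done with posix_memalign:

```swift
import Metal

// Sketch: wrapping an existing page-aligned allocation in a buffer, no copy.
let device = MTLCreateSystemDefaultDevice()!
let pageSize = Int(getpagesize())
var memory: UnsafeMutableRawPointer? = nil
posix_memalign(&memory, pageSize, pageSize)   // page-aligned, page-sized allocation
let buffer = device.makeBuffer(bytesNoCopy: memory!,
                               length: pageSize,
                               options: [],
                               deallocator: { pointer, _ in free(pointer) })
```

The deallocator closure is invoked when the buffer is destroyed, so Metal never takes ownership of memory it did not allocate.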

Let’s create a couple of buffers and see how data is being sent to the GPU and then sent back to the CPU. We first create a buffer for both input and output data and initialize them to some values:

let count = 1500
var myVector = [Float](repeating: 0, count: count)
let length = count * MemoryLayout<Float>.stride
let outBuffer = device.makeBuffer(bytes: myVector, length: length, options: [])
for index in myVector.indices { myVector[index] = Float(index) }
let inBuffer = device.makeBuffer(bytes: myVector, length: length, options: [])

The MemoryLayout<Type>.stride syntax was introduced in Swift 3 to replace the old strideof(Type) function. Note that we use .stride instead of .size for memory alignment reasons: the stride is the number of bytes from the start of one instance to the start of the next when instances are laid out contiguously in an array, i.e. the size rounded up to the type's alignment. The next step is to tell the command encoder about our buffers:
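The difference between the two only shows up once a type needs padding. A quick sketch with a hypothetical struct (values as reported on a typical 64-bit platform):

```swift
// A struct whose payload is not a multiple of its alignment.
struct Particle {
    var position: (Float, Float, Float) // 12 bytes, 4-byte aligned
    var alive: Bool                     // 1 byte
}

print(MemoryLayout<Particle>.size)   // 13: payload bytes only
print(MemoryLayout<Particle>.stride) // 16: padded up to the 4-byte alignment
print(MemoryLayout<Float>.size)      // 4: for Float, size and stride coincide
```

For Float the two values are identical, which is why the buffer example above works either way, but using .stride is the habit that stays correct for padded types.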

encoder.setBuffer(inBuffer, offset: 0, at: 0)
encoder.setBuffer(outBuffer, offset: 0, at: 1)

Note: the Metal Best Practices Guide states that we should avoid creating buffers when our data is smaller than 4 KB (up to about a thousand Floats, for example). In that case we should simply use the setBytes(_:length:at:) function instead of creating a buffer.
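For example, with a small hypothetical Uniforms struct (the name and fields here are illustrative, not from the article), the data can be handed to the encoder directly:

```swift
// A tiny amount of per-draw data, well under the 4 KB threshold.
struct Uniforms {
    var scale: Float
    var offset: Float
}

var uniforms = Uniforms(scale: 2.0, offset: 0.5)
// No MTLBuffer needed: Metal copies the bytes into the command buffer itself.
encoder.setBytes(&uniforms, length: MemoryLayout<Uniforms>.stride, at: 2)
```

This avoids both the allocation and the lifetime management of a dedicated buffer object for data that changes every frame.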

The final step is to read the data the GPU wrote back. The contents() function returns a raw pointer to the buffer's storage, which we bind to the Float type so we can index it:

let result = outBuffer.contents().bindMemory(to: Float.self, capacity: count)
var data = [Float](repeating:0, count: count)
for i in 0 ..< count { data[i] = result[i] }
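Equivalently, the element-by-element copy loop can be collapsed by wrapping the bound pointer in an UnsafeBufferPointer, which conforms to Sequence:

```swift
// Same readback as above, using the Array(Sequence) initializer.
let result = outBuffer.contents().bindMemory(to: Float.self, capacity: count)
let data = Array(UnsafeBufferPointer(start: result, count: count))
```

Both versions copy the values out of the buffer; after the copy, data is an ordinary Swift array independent of the buffer's lifetime.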

Metal resources must be configured for fast memory access and driver performance optimizations. Resource storage modes let us define the storage location and access permissions for our buffers and textures. If you take a look above where we created our buffers, we used the default option ([]) as the storage mode.

All iOS and tvOS devices support a unified memory model where both the CPU and the GPU share the system memory, while macOS devices support a discrete memory model where the GPU has its own memory. In iOS and tvOS, the Shared mode (MTLStorageModeShared) defines system memory accessible to both CPU and GPU, while Private mode (MTLStorageModePrivate) defines system memory accessible only to the GPU. The Shared mode is the default storage mode on all three operating systems.


Besides these two storage modes, macOS also has a Managed mode (MTLStorageModeManaged) that defines a synchronized memory pair for a resource, with one copy in system memory and another in video memory for faster CPU and GPU local accesses.
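A private buffer cannot be touched from the CPU at all, so data is typically uploaded to it by blitting from a shared staging buffer. A sketch, assuming the device, command queue (here named queue), myVector, and length from the earlier example:

```swift
// Destination lives in GPU-only memory.
let privateBuffer = device.makeBuffer(length: length, options: .storageModePrivate)
// Staging copy in shared (CPU-visible) memory.
let stagingBuffer = device.makeBuffer(bytes: myVector, length: length, options: [])

let commandBuffer = queue.makeCommandBuffer()
let blitEncoder = commandBuffer.makeBlitCommandEncoder()
blitEncoder.copy(from: stagingBuffer, sourceOffset: 0,
                 to: privateBuffer, destinationOffset: 0, size: length)
blitEncoder.endEncoding()
commandBuffer.commit()
```

The extra copy pays off when the data is written once and read by the GPU many times, since private memory gives the driver the most freedom to optimize placement.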


Now let’s look at what happens on the GPU when we send it data buffers. Here is a typical vertex shader example:

vertex Vertices vertex_func(const device Vertices *vertices [[buffer(0)]],
                            constant Uniforms &uniforms [[buffer(1)]],
                            uint vid [[vertex_id]])

The Metal Shading Language implements address space qualifiers to specify the region of memory where a function variable or argument is allocated:

  • device - refers to buffer memory objects allocated from the device memory pool that are both readable and writable, unless the keyword const precedes the qualifier, in which case the objects are read-only.
  • constant - refers to buffer memory objects allocated from the device memory pool but that are read-only. Variables in program scope must be declared in the constant address space and initialized during the declaration statement. The constant address space is optimized for multiple instances executing a graphics or kernel function accessing the same location in the buffer.
  • threadgroup - is used only for variables in kernel functions; they are allocated per threadgroup executing the kernel, are shared by all threads in that threadgroup, and exist only for the lifetime of the threadgroup.
  • thread - refers to the per-thread memory address space. Variables allocated in this address space are not visible to other threads. Variables declared inside a graphics or kernel function are allocated in the thread address space.
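To make the qualifiers concrete, here is a small hypothetical compute kernel (names and logic are illustrative, not from the article) that touches all four address spaces:

```metal
#include <metal_stdlib>
using namespace metal;

kernel void copy_first(const device float *input   [[buffer(0)]],      // device, read-only via const
                       device float *output        [[buffer(1)]],      // device, writable
                       constant uint &count        [[buffer(2)]],      // constant, same for all threads
                       threadgroup float *partials [[threadgroup(0)]], // shared within the threadgroup
                       uint tid [[thread_position_in_threadgroup]],
                       uint gid [[thread_position_in_grid]])
{
    // 'value' is in the thread address space: private to this thread.
    float value = (gid < count) ? input[gid] : 0.0f;
    partials[tid] = value;
    // Make the threadgroup writes visible to all threads in the group.
    threadgroup_barrier(mem_flags::mem_threadgroup);
    if (tid == 0) {
        output[gid] = partials[0];
    }
}
```

The barrier is what makes the threadgroup memory useful: without it, a thread could read partials before its neighbors have written their values.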

As a bonus, let’s also look at another way of accessing memory locations in Swift 3. This code snippet belongs to a previous article, The Model I/O framework, so we will not go into details about voxels again. Just think of an array that we need to iterate over to get values:

let url = Bundle.main.url(forResource: "teapot", withExtension: "obj")!
let asset = MDLAsset(url: url)
let voxelArray = MDLVoxelArray(asset: asset, divisions: 10, patchRadius: 0)
if let data = voxelArray.voxelIndices() {
    data.withUnsafeBytes { (voxels: UnsafePointer<MDLVoxelIndex>) -> Void in
        let count = data.count / MemoryLayout<MDLVoxelIndex>.size
        let position = voxelArray.spatialLocation(ofIndex: voxels.pointee)
        print(position)
    }
}

In this case, the MDLVoxelArray object has a function named spatialLocation(ofIndex:) which lets us iterate through the array by using an UnsafePointer of the MDLVoxelIndex type and accessing the data through the pointee at each location. In the snippet above we are only printing out the first value found at that address, but a simple loop, replacing the body of the closure, lets us get all of them:

var voxelIndex = voxels
for _ in 0..<count {
    let position = voxelArray.spatialLocation(ofIndex: voxelIndex.pointee)
    print(position)
    voxelIndex = voxelIndex.successor()
}

The source code is posted on GitHub as usual.

Until next time!