Today we look at how memory is managed when working with the GPU. The Metal framework defines memory sources as MTLBuffer objects, which are typeless and unformatted allocations of memory (any type of data), and MTLTexture objects, which are formatted allocations of memory holding image data. We only look at buffers in this article.

To create MTLBuffer objects we have three options:
- makeBuffer(length:options:) creates an MTLBuffer object with a new allocation.
- makeBuffer(bytes:length:options:) copies data from an existing allocation into a new allocation.
- makeBuffer(bytesNoCopy:length:options:deallocator:) reuses an existing storage allocation.
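As a quick sketch, the three creation paths might look like this (assuming `device` is an MTLDevice we already obtained; the page-aligned allocation for the no-copy case is illustrative, since bytesNoCopy requires a page-aligned address and length):

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let length = 1500 * MemoryLayout<Float>.stride

// 1. New, uninitialized allocation owned by Metal.
let emptyBuffer = device.makeBuffer(length: length, options: [])

// 2. New allocation initialized by copying our bytes into it.
var values = [Float](repeating: 1, count: 1500)
let copiedBuffer = device.makeBuffer(bytes: values, length: length, options: [])

// 3. Wrap an existing page-aligned allocation without copying.
let pageSize = Int(getpagesize())
let alignedLength = ((length + pageSize - 1) / pageSize) * pageSize
let memory = valloc(alignedLength)!
let wrappedBuffer = device.makeBuffer(bytesNoCopy: memory,
                                      length: alignedLength,
                                      options: [],
                                      deallocator: { pointer, _ in free(pointer) })
```

The third option is useful when another part of the app already owns a large allocation and we want to avoid paying for an extra copy.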
Let’s create a couple of buffers and see how data is sent to the GPU and then back to the CPU. We first create a buffer for both the input and the output data and initialize them to some values:
let count = 1500
var myVector = [Float](repeating: 0, count: count)
let length = count * MemoryLayout<Float>.stride
let outBuffer = device.makeBuffer(bytes: myVector, length: length, options: [])
for (index, _) in myVector.enumerated() { myVector[index] = Float(index) }
let inBuffer = device.makeBuffer(bytes: myVector, length: length, options: [])
The new MemoryLayout<Type>.stride syntax was introduced in Swift 3 to replace the old strideof(Type) function. By the way, we use .stride instead of .size for memory alignment reasons: the stride is the number of bytes moved when a pointer is incremented, so it includes any padding the compiler inserts between consecutive elements. The next step is to tell the command encoder about our buffers:
encoder.setBuffer(inBuffer, offset: 0, at: 0)
encoder.setBuffer(outBuffer, offset: 0, at: 1)
Note: the Metal Best Practices Guide states that we should always avoid creating buffers when our data is less than 4 KB (up to a thousand Floats, for example). In this case we should simply use the setBytes() function instead of creating a buffer.
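For example, a small uniforms struct could be handed to the encoder without any buffer at all. A sketch, where the Uniforms struct is a hypothetical example type:

```swift
struct Uniforms {
    var scale: Float
    var offset: Float
}

var uniforms = Uniforms(scale: 2.0, offset: 0.5)

// For data under 4 KB, pass the bytes straight to the encoder;
// Metal manages the transient storage for us.
encoder.setBytes(&uniforms, length: MemoryLayout<Uniforms>.stride, at: 1)
```

This avoids both the allocation and the lifetime management of a dedicated MTLBuffer for tiny, frequently changing data.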
The final step is to read the data the GPU sent back, by using the contents() function to bind the buffer’s memory to our output type:
let result = outBuffer.contents().bindMemory(to: Float.self, capacity: count)
var data = [Float](repeating: 0, count: count)
for i in 0..<count { data[i] = result[i] }
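As a side note, the element-by-element loop can be replaced with a one-line copy through UnsafeBufferPointer, which is equivalent under the assumption that outBuffer holds exactly count Floats:

```swift
let result = outBuffer.contents().bindMemory(to: Float.self, capacity: count)
let data = Array(UnsafeBufferPointer(start: result, count: count))
```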
Metal resources must be configured for fast memory access and driver performance optimizations. Resource storage modes let us define the storage location and access permissions for our buffers and textures. If you take a look above where we created our buffers, we used the default option ([]) as the storage mode.
All iOS and tvOS devices support a unified memory model where both the CPU and the GPU share the system memory, while macOS devices support a discrete memory model where the GPU has its own memory. In iOS and tvOS, the Shared mode (MTLStorageModeShared) defines system memory accessible to both the CPU and the GPU, while the Private mode (MTLStorageModePrivate) defines memory accessible only to the GPU. The Shared mode is the default storage mode on all three operating systems.

Besides these two storage modes, macOS also has a Managed mode (MTLStorageModeManaged) that defines a synchronized memory pair for a resource, with one copy in system memory and another in video memory for faster local CPU and GPU accesses.
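To make the choice explicit instead of relying on the default, the storage mode is passed through the options argument. A sketch (the managed branch applies to macOS only, where didModifyRange() tells Metal which CPU-side changes to synchronize to the GPU copy):

```swift
// Private: GPU-only memory, ideal for data the CPU never touches again.
let privateBuffer = device.makeBuffer(length: length,
                                      options: .storageModePrivate)

// Shared: CPU and GPU both access system memory (the default mode).
let sharedBuffer = device.makeBuffer(length: length,
                                     options: .storageModeShared)

#if os(macOS)
// Managed: paired copies; mark the range the CPU just modified.
let managedBuffer = device.makeBuffer(bytes: myVector,
                                      length: length,
                                      options: .storageModeManaged)
managedBuffer?.didModifyRange(NSMakeRange(0, length))
#endif
```

Note that a Private buffer cannot be filled directly from the CPU; its contents are populated with a blit from a Shared (or Managed) staging buffer.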

Now let’s look at what happens on the GPU when we send it data buffers. Here is a typical vertex shader example:
vertex Vertices vertex_func(const device Vertices *vertices [[buffer(0)]],
                            constant Uniforms &uniforms [[buffer(1)]],
                            uint vid [[vertex_id]])
{
    ...
}
The Metal Shading Language implements address space qualifiers to specify the region of memory where a function variable or argument is allocated:
- device – refers to buffer memory objects allocated from the device memory pool that are both readable and writable, unless the keyword const precedes it, in which case the objects are only readable.
- constant – refers to buffer memory objects allocated from the device memory pool that are read-only. Variables in program scope must be declared in the constant address space and initialized in the declaration statement. The constant address space is optimized for multiple instances of a graphics or kernel function accessing the same location in the buffer.
- threadgroup – is used to allocate variables used by kernel functions only. They are allocated once per threadgroup executing the kernel, are shared by all threads in that threadgroup, and exist only for the lifetime of the threadgroup executing the kernel.
- thread – refers to the per-thread memory address space. Variables allocated in this address space are not visible to other threads. Variables declared inside a graphics or kernel function are allocated in the thread address space.
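All four qualifiers can be seen together in a small, hypothetical compute kernel sketch (the Uniforms struct, buffer indices, and threadgroup size are assumptions for illustration):

```
kernel void scale_func(const device float *input    [[buffer(0)]],
                       device float *output         [[buffer(1)]],
                       constant Uniforms &uniforms  [[buffer(2)]],
                       threadgroup float *partials  [[threadgroup(0)]],
                       uint tid [[thread_position_in_threadgroup]],
                       uint gid [[thread_position_in_grid]])
{
    // thread address space: private to this thread.
    float value = input[gid] * uniforms.scale;

    // threadgroup address space: visible to every thread in the group.
    partials[tid] = value;
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // device address space: written back to buffer memory.
    output[gid] = partials[tid];
}
```

On the CPU side, the threadgroup argument is backed by calling setThreadgroupMemoryLength(at:) on the compute command encoder rather than by a buffer.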
As a bonus, let’s also look at another way of accessing memory locations in Swift 3. This code snippet belongs to a previous article, The Model I/O framework, so we will not go into details about voxels again. Just think of an array that we need to iterate over to get values:
let url = Bundle.main.url(forResource: "teapot", withExtension: "obj")!
let asset = MDLAsset(url: url)
let voxelArray = MDLVoxelArray(asset: asset, divisions: 10, patchRadius: 0)
if let data = voxelArray.voxelIndices() {
    data.withUnsafeBytes { (voxels: UnsafePointer<MDLVoxelIndex>) -> Void in
        let count = data.count / MemoryLayout<MDLVoxelIndex>.size
        let position = voxelArray.spatialLocation(ofIndex: voxels.pointee)
        print(position)
    }
}
In this case, the MDLVoxelArray object has a function named spatialLocation(ofIndex:) which lets us iterate through the array by using an UnsafePointer of the MDLVoxelIndex type and accessing the data through the pointee at each location. In this example we are only printing out the first value found at that address, but a simple loop will let us get all of them, like this:
var voxelIndex = voxels
for _ in 0..<count {
    let position = voxelArray.spatialLocation(ofIndex: voxelIndex.pointee)
    print(position)
    voxelIndex = voxelIndex.successor()
}
The source code is posted on GitHub, as usual. Until next time!