Last time we looked at how to manipulate vertices from Model I/O objects on the GPU. In this part we show yet another way to create particles, this time using compute threads. We can reuse the playground from last time. We start by modifying the Particle struct in our Metal view delegate class so that it only includes the two members we will update on the GPU, position and velocity:

struct Particle {
    var position: float2
    var velocity: float2
}
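The Swift struct must have exactly the same memory layout as the struct the kernel reads, because the buffer is copied byte for byte. Here is a quick check. It assumes the standard-library SIMD2<Float> type, which simd's float2 is an alias of, so the snippet runs without Metal:

```swift
// Check that the Swift Particle matches the Metal struct:
// two float2 members, 8 bytes each, with no padding.
struct Particle {
    var position: SIMD2<Float>
    var velocity: SIMD2<Float>
}

let particleSize = MemoryLayout<Particle>.size      // bytes one value occupies
let particleStride = MemoryLayout<Particle>.stride  // distance between array elements
print("size: \(particleSize), stride: \(particleStride)")
```

Size and stride are both 16 bytes, so there is no padding between array elements. That means the buffer length can be computed as a plain multiple of the stride.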

We no longer need the timer variable or the translate(by:) and update() methods, so you can delete them. The main change happens inside the initializeBuffers() method:

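func initializeBuffers() {
    for _ in 0 ..< particleCount {
        let particle = Particle(
            // random whole-pixel position anywhere inside the side x side window
            position: float2(Float(arc4random() % UInt32(side)),
                             Float(arc4random() % UInt32(side))),
            // random velocity components in [-0.5, 0.4]
            velocity: float2((Float(arc4random() % 10) - 5) / 10,
                             (Float(arc4random() % 10) - 5) / 10))
        particles.append(particle)
    }
    // stride is the per-element spacing in an array (16 bytes here, same as size)
    let size = particles.count * MemoryLayout<Particle>.stride
    particleBuffer = device.makeBuffer(bytes: &particles, length: size, options: [])
}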

Note: we generate random positions that fill the entire window. For velocities, arc4random() % 10 - 5 yields integers in the range [-5, 4]. We divide those by 10 to slow the particles down, so each velocity component ends up in [-0.5, 0.4].
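arc4random() is only available on Apple platforms. As an alternative, here is a sketch that produces the same ranges with the standard library's Int.random(in:). The makeParticles(count:side:) helper and the 600-pixel side are hypothetical:

```swift
struct Particle {
    var position: SIMD2<Float>
    var velocity: SIMD2<Float>
}

// Hypothetical helper: same ranges as initializeBuffers(), without arc4random.
func makeParticles(count: Int, side: Int) -> [Particle] {
    var particles: [Particle] = []
    particles.reserveCapacity(count)  // avoid repeated reallocation for large counts
    for _ in 0..<count {
        // position: whole-pixel coordinates in [0, side)
        let position = SIMD2<Float>(Float(Int.random(in: 0..<side)),
                                    Float(Int.random(in: 0..<side)))
        // velocity: integers in [-5, 4] scaled by 1/10, i.e. [-0.5, 0.4]
        let velocity = SIMD2<Float>(Float(Int.random(in: -5...4)) / 10,
                                    Float(Int.random(in: -5...4)) / 10)
        particles.append(Particle(position: position, velocity: velocity))
    }
    return particles
}

let particles = makeParticles(count: 10_000, side: 600)
```

Calling reserveCapacity(_:) up front also helps when experimenting with very large particle counts.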

The most important part, however, happens when configuring the command encoder. We make each threadgroup a 2D block. Its width is the pipeline's thread execution width, and its height is the maximum total threads per threadgroup divided by that width. Both values are properties of the compute pipeline state. They are determined by the GPU and the kernel, and they do not change once the pipeline state has been created. The grid itself is one-dimensional, with one thread per particle:

// threadgroup: one SIMD-group wide, as many rows as the pipeline allows
let w = pipelineState.threadExecutionWidth
let h = pipelineState.maxTotalThreadsPerThreadgroup / w
let threadsPerGroup = MTLSizeMake(w, h, 1)
// grid: one thread per particle
let threadsPerGrid = MTLSizeMake(particleCount, 1, 1)
commandEncoder.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerGroup)

Note: dispatchThreads(_:threadsPerThreadgroup:), new in Metal 2, lets us dispatch work without specifying how many threadgroups we want. With the older dispatchThreadgroups(_:threadsPerThreadgroup:) method, we had to compute the threadgroup count ourselves, rounding up, and make the kernel ignore the surplus threads in the last group. The new method calculates the number of groups itself. On GPUs that support nonuniform threadgroups, it shrinks the edge groups when the grid size is not a multiple of the threadgroup size, so no surplus threads are launched.
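To make the difference concrete, this sketch does the bookkeeping that the older dispatchThreadgroups(_:threadsPerThreadgroup:) call leaves to you. It assumes a one-dimensional threadgroup. threadgroupCount(threads:groupWidth:) is a hypothetical helper, and 32 is just a typical execution width:

```swift
// Hypothetical helper: how many 1D threadgroups cover `threads` work items,
// and how many threads in the last group have nothing to do.
func threadgroupCount(threads: Int, groupWidth: Int) -> (groups: Int, idleThreads: Int) {
    // round up so every particle gets a thread
    let groups = (threads + groupWidth - 1) / groupWidth
    // surplus threads that would have to return early in the kernel
    let idle = groups * groupWidth - threads
    return (groups, idle)
}

let result = threadgroupCount(threads: 10_000, groupWidth: 32)
print("groups: \(result.groups), idle threads: \(result.idleThreads)")
```

With 10,000 particles and 32-thread groups, this gives 313 groups and 16 idle threads. A kernel dispatched this way would therefore also need an early `if (id >= particleCount) return;` guard, with the count passed in as an extra argument. dispatchThreads(_:threadsPerThreadgroup:) removes both steps.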

Moving on to the kernel: we first declare a Particle struct that matches the one on the CPU side. Then, inside the kernel, we update the positions and velocities:

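#include <metal_stdlib>
using namespace metal;

// Must match the Swift Particle struct: two float2 members, 16 bytes in total.
struct Particle {
    float2 position;
    float2 velocity;
};

// Kernel name and binding indices are assumptions; match them to your pipeline setup.
kernel void compute(texture2d<half, access::write> output [[texture(0)]],
                    device Particle *particles [[buffer(0)]],
                    uint id [[thread_position_in_grid]]) {
    Particle particle = particles[id];
    float2 position = particle.position;
    float2 velocity = particle.velocity;
    int width = output.get_width();
    int height = output.get_height();
    // bounce: reverse a velocity component once the particle crosses that edge
    if (position.x < 0 || position.x > width) { velocity.x *= -1; }
    if (position.y < 0 || position.y > height) { velocity.y *= -1; }
    position += velocity;
    particle.position = position;
    particle.velocity = velocity;
    particles[id] = particle;
    // draw the particle's pixel plus its four neighbors, skipping any outside the texture
    int2 pos = int2(position);
    const int2 offsets[5] = { int2(0, 0), int2(1, 0), int2(0, 1), int2(-1, 0), int2(0, -1) };
    for (int i = 0; i < 5; i++) {
        int2 p = pos + offsets[i];
        if (p.x >= 0 && p.y >= 0 && p.x < width && p.y < height) {
            output.write(half4(1.0), uint2(p));
        }
    }
}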
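The update rule is plain arithmetic, so it can be mirrored on the CPU to check the bounce logic without a GPU. This is a minimal Swift sketch of one step of the kernel. It uses a hypothetical step(_:width:height:) helper and assumes a 600 by 600 texture:

```swift
struct Particle {
    var position: SIMD2<Float>
    var velocity: SIMD2<Float>
}

// CPU mirror of one kernel step (hypothetical helper, same rule as the shader).
func step(_ p: Particle, width: Float, height: Float) -> Particle {
    var velocity = p.velocity
    // reverse a velocity component once the particle is past that edge
    if p.position.x < 0 || p.position.x > width { velocity.x *= -1 }
    if p.position.y < 0 || p.position.y > height { velocity.y *= -1 }
    return Particle(position: p.position + velocity, velocity: velocity)
}

// a particle just past the right edge, moving right and slightly up
let p = Particle(position: SIMD2<Float>(600.5, 100), velocity: SIMD2<Float>(0.4, -0.2))
let next = step(p, width: 600, height: 600)
print(next.position, next.velocity)
```

The particle is past the right edge, so its horizontal velocity flips from 0.4 to -0.4. The vertical component stays unchanged because the particle is still inside the top and bottom edges.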

Note: we check the bounds, and when a particle crosses an edge we simply reverse the corresponding velocity component so the particle stays on screen. We also use a neat trick when drawing: we write the four neighboring pixels as well, so the particles look a bit larger. Fair warning: be careful when a neighboring pixel falls outside the texture. Metal does not guarantee what an out-of-bounds texture write does, and unsigned coordinates such as pos - uint2(1, 0) wrap around to a huge value when pos.x is 0. So make sure to bounds-check every write as well.
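Here is a sketch of that bounds check, written in Swift so it can run on the CPU. The hypothetical plusPixels(x:y:width:height:) helper returns only the pixels of the plus-shaped dot that lie inside the texture. It uses signed coordinates on purpose, because unsigned ones wrap around instead of going negative:

```swift
// Hypothetical helper: the pixels of a plus-shaped dot centered at (x, y)
// that fall inside a width x height texture.
func plusPixels(x: Int, y: Int, width: Int, height: Int) -> [(Int, Int)] {
    let offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
    return offsets
        .map { (x + $0.0, y + $0.1) }
        .filter { $0.0 >= 0 && $0.1 >= 0 && $0.0 < width && $0.1 < height }
}

// at the top-left corner only three of the five pixels are safe to write
print(plusPixels(x: 0, y: 0, width: 600, height: 600))
```

The same filter applies at the bottom-right corner, where the right and lower neighbors are dropped instead.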

You can set particleCount to 1,000,000 if you want, but generating that many particles on the CPU takes a few seconds before anything is rendered. Because I am rendering in a relatively small window, I use only 10,000 particles so they don't look too crammed. If you run the app, you should see the particles moving around randomly.


This article concludes the particle rendering series. I want to thank FlexMonkey for sharing great insights about compute concepts. The source code is posted on GitHub as usual.

Until next time!