Metal 3 was announced last week at WWDC 2019. Apple presented it along with relevant numbers:

  • Metal can now make 100 times more draw calls than OpenGL.
  • Metal runs on roughly 1.4 billion devices currently.
  • Metal can drive up to 56 TFLOPS of single-precision compute.

Note: To reach 56 TFLOPS, you need the new Mac Pro with two Radeon Pro Vega II Duo cards (4 GPUs in total). The Radeon Pro Vega II Duo is the world’s most powerful graphics card at the moment, capable of 28.3 TFLOPS at FP32 precision. It is currently available only in the Mac Pro, and it uses the Infinity Fabric Link to boost transfers between its two GPUs to 84 GB/s.
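The arithmetic behind that headline number can be checked in a few lines. This is only a sketch; the constant names are made up for illustration, and the per-card figure is the one quoted in the note.

```swift
// FP32 throughput quoted for one Radeon Pro Vega II Duo card (two GPUs per card).
let tflopsPerDuoCard = 28.3
let gpusPerDuoCard = 2

// The Mac Pro configuration from the note: two Duo cards.
let duoCards = 2

let totalGPUs = duoCards * gpusPerDuoCard
let totalTFLOPS = tflopsPerDuoCard * Double(duoCards)

print("GPUs: \(totalGPUs), FP32 TFLOPS: \(totalTFLOPS)")
```

The total comes out slightly above the rounded 56 TFLOPS figure from the keynote.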

The Metal Shading Language is now at version 2.2, while the API is at version 3. In Xcode 11 you can now check the Metal version by passing a MTLSoftwareVersion value to the device's supportsVersion method:

Metal version

Alright, let’s see some of the major additions to the Metal framework this year.

1. The iOS Simulator now supports Metal

Most of the frameworks are now Metal-accelerated in the simulator: UIKit, SpriteKit, SceneKit, Core Animation, Core Image, MapKit and so on. The simulator supports devices with A8 GPUs and later. You can even run two simulators simultaneously for two different targets:

Two targets

The iOS Metal commands are translated to macOS Metal commands, so you benefit from the Mac's underlying GPU hardware. From the simulator menu you can choose which macOS GPU to use:

GPU selection

Performance in the simulator is still below that of a real device, so production code should ultimately be profiled and optimized on the device. Another thing to remember is that textures on the simulator must always use private storage mode. It is easy, however, to cover both cases. When running on the simulator, create a temporary shared buffer, write the texture data into that buffer and then blit it to the private texture:

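// The simulator requires private texture storage; devices can use shared.
#if targetEnvironment(simulator)
textureDescriptor.storageMode = .private
#else
textureDescriptor.storageMode = .shared
#endif

let texture = device.makeTexture(descriptor: textureDescriptor)!
if texture.storageMode == .private {
    // Private textures are not CPU-accessible: stage the data in a shared
    // buffer, then blit it into the texture.
    let tmpBuffer = device.makeBuffer(length: textureSize,
                                      options: .storageModeShared)!
    initWithTextureData(buffer: tmpBuffer)
    blitData(fromBuffer: tmpBuffer, toTexture: texture)
} else {
    // Shared textures can be filled directly from the CPU.
    initWithTextureData(texture: texture)
}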
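The blitData helper is not shown above. Here is one way it could look, as a minimal sketch rather than the actual implementation. It assumes a 2D texture with tightly packed rows, and the extra bytesPerPixel and queue parameters, as well as the packedLayout helper, are hypothetical additions for illustration.

```swift
import Metal

/// Hypothetical helper: row and image sizes for tightly packed 2D pixel data.
func packedLayout(width: Int, height: Int,
                  bytesPerPixel: Int) -> (bytesPerRow: Int, bytesPerImage: Int) {
    let bytesPerRow = width * bytesPerPixel
    return (bytesPerRow, bytesPerRow * height)
}

/// Sketch of a buffer-to-texture copy. Assumes the buffer holds one full
/// image for mip level 0, slice 0, with no padding between rows.
func blitData(fromBuffer buffer: MTLBuffer,
              toTexture texture: MTLTexture,
              bytesPerPixel: Int,
              using queue: MTLCommandQueue) {
    guard let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return }

    let layout = packedLayout(width: texture.width,
                              height: texture.height,
                              bytesPerPixel: bytesPerPixel)
    blit.copy(from: buffer,
              sourceOffset: 0,
              sourceBytesPerRow: layout.bytesPerRow,
              sourceBytesPerImage: layout.bytesPerImage,
              sourceSize: MTLSize(width: texture.width,
                                  height: texture.height,
                                  depth: 1),
              to: texture,
              destinationSlice: 0,
              destinationLevel: 0,
              destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
    blit.endEncoding()

    // Blocking here keeps the sketch simple; real code would likely
    // batch uploads and avoid waiting on every copy.
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
}
```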

2. Simpler GPU families

The Metal Feature Set Tables document was also updated to version 3. It replaces the old feature sets with the new GPU families, as follows:

  • The Apple Family refers to features for all the Apple designed GPUs (the A-series GPUs):

Apple Family

  • The Mac Family refers to features for all the macOS GPUs (Intel, AMD, Nvidia):

Mac Family

  • The Common Family refers to features supported by all devices and platforms:

Common Family

  • The iPad Apps for Mac Family refers to features for iPadOS apps running on macOS:

iPad Mac Family

To determine if Mac 2 family features are available:

if #available(macOS 10.15, iOS 13, tvOS 13, *) {
    if self.device.supportsVersion(.version3_0) {
        if self.device.supportsFamily(.familyMac2) {
            // enable Metal 3 features for the Mac family 2
        }
    } else {
        // enable Metal 2 features (fallback)
    }
} else {
    if self.device.supportsFeatureSet(.featureSet_macOS_GPUFamily2_v1) {
        // enable Metal 2 features (fallback)
    }
}

Here are some of the most common techniques and their family support:

Feature                     Family
Deferred shading            All
Programmable blending       Apple 1 and later
Tile deferred / forward     Common 2 and later
Tile shading                Apple 3 and later
Visibility buffer           Mac 1 and later
Argument buffers            All
Indirect command buffers    Common 2 and later
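One way to use this table in code is to turn it into a lookup and filter it by what the device reports. The sketch below is only an illustration: GPUFamily and Technique are hypothetical types made up for this example (they are not Metal's own enums), and the supports closure stands in for a call such as device.supportsFamily.

```swift
/// Hypothetical stand-in for the families named in the table.
enum GPUFamily: Hashable {
    case apple(Int)
    case mac(Int)
    case common(Int)
}

/// The techniques listed in the table.
enum Technique: CaseIterable {
    case deferredShading, programmableBlending, tiledDeferredForward
    case tileShading, visibilityBuffer, argumentBuffers, indirectCommandBuffers

    /// Minimum family from the table; nil means "All".
    var minimumFamily: GPUFamily? {
        switch self {
        case .deferredShading, .argumentBuffers:
            return nil
        case .programmableBlending:
            return .apple(1)
        case .tiledDeferredForward, .indirectCommandBuffers:
            return .common(2)
        case .tileShading:
            return .apple(3)
        case .visibilityBuffer:
            return .mac(1)
        }
    }
}

/// Returns the techniques whose minimum family the device supports.
/// `supports` is expected to wrap the real device query.
func availableTechniques(supports: (GPUFamily) -> Bool) -> [Technique] {
    return Technique.allCases.filter { technique in
        guard let family = technique.minimumFamily else { return true }
        return supports(family)
    }
}
```

Since a device that supports a family also supports the earlier families in that line, checking only the minimum family per technique should be enough.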
   

3. Ray Tracing and compute

Last year, when the Metal Performance Shaders (MPS) APIs for Ray Tracing were introduced, the calculation of ray-triangle intersections was moved onto the GPU, making Ray Tracing in Metal really appealing for the first time. This year, two other costly and important stages were moved onto the GPU as well: acceleration structure updates and image denoising.

The Acceleration Structure is updated through a process called refitting, which does not rebuild the structure from scratch but rather moves the bounding boxes to where the geometry moved, thus saving precious processing time. Refitting is now done entirely on the GPU:

Refitting
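In code, GPU refitting could look roughly like the sketch below. It assumes a triangle mesh whose vertices are stored as SIMD3<Float> values and are updated in place every frame; the function names are made up for this example.

```swift
import Metal
import MetalPerformanceShaders

/// Builds an acceleration structure that is allowed to be refitted later.
@available(macOS 10.15, iOS 13.0, tvOS 13.0, *)
func makeRefittableStructure(device: MTLDevice,
                             vertexBuffer: MTLBuffer,
                             triangleCount: Int) -> MPSTriangleAccelerationStructure {
    let structure = MPSTriangleAccelerationStructure(device: device)
    structure.vertexBuffer = vertexBuffer
    // Assumption: vertices are packed as SIMD3<Float> (16-byte stride).
    structure.vertexStride = MemoryLayout<SIMD3<Float>>.stride
    structure.triangleCount = triangleCount
    // Refitting has to be requested before the first build.
    structure.usage = .refit
    structure.rebuild()
    return structure
}

/// Call once per frame after the vertex buffer was updated: the refit is
/// encoded into the command buffer and runs on the GPU.
@available(macOS 10.15, iOS 13.0, tvOS 13.0, *)
func encodeRefit(of structure: MPSTriangleAccelerationStructure,
                 into commandBuffer: MTLCommandBuffer) {
    structure.encodeRefit(commandBuffer: commandBuffer)
}
```

Refitting keeps the tree topology, so if the geometry changes a lot a full rebuild may still give better traversal performance.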

Image denoising is handled by a filter based on image processing. The idea behind it is to store normals and depth information each frame, and then compare them with the next frame to see if certain pixels got invalidated. This could happen if the object moved to a different location or if another object occluded it.
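To make the idea concrete, here is a simplified per-pixel validity check written in plain Swift. It is only an illustration of the comparison described above, not how MPS implements it; the GeometrySample type and the two thresholds are assumptions.

```swift
/// Hypothetical per-pixel geometry stored each frame.
struct GeometrySample {
    var depth: Float
    var normal: SIMD3<Float>   // assumed to be unit length
}

/// Returns true if the pixel's history from the previous frame can be
/// reused, that is, if depth and normal did not change too much.
func historyIsValid(current: GeometrySample,
                    previous: GeometrySample,
                    depthTolerance: Float = 0.05,
                    minNormalAgreement: Float = 0.9) -> Bool {
    // Relative depth change catches surfaces that moved or got occluded.
    let relativeDepthChange = abs(current.depth - previous.depth)
        / max(current.depth, Float.leastNormalMagnitude)

    // Dot product of the normals: 1 means same orientation.
    let normalAgreement = (current.normal * previous.normal).sum()

    return relativeDepthChange <= depthTolerance
        && normalAgreement >= minNormalAgreement
}
```

Pixels that fail the check would drop their accumulated history and start over from the current noisy sample.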

The new MPSSVGF classes implement the Spatiotemporal Variance-Guided Filtering denoising algorithm. Denoising is now up to 1000x faster on the GPU than on the CPU.

Denoising

Metal introduced Indirect Command Buffers (ICB) last year as a way to reduce CPU overhead and simplify command execution by reusing commands, but they only worked with rendering. This year MTLIndirectComputeCommand joins MTLIndirectRenderCommand as a command type that can be encoded into an ICB.
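A compute ICB could be set up along these lines. This is a sketch under a few assumptions: each command binds a single buffer at index 0, the pipeline state was created from a descriptor with supportIndirectCommandBuffers enabled, and the function names are made up for this example.

```swift
import Metal

/// Creates an ICB that holds compute dispatches only.
@available(macOS 10.15, iOS 13.0, tvOS 13.0, *)
func makeComputeICB(device: MTLDevice,
                    maxCommands: Int) -> MTLIndirectCommandBuffer? {
    let descriptor = MTLIndirectCommandBufferDescriptor()
    descriptor.commandTypes = .concurrentDispatch
    // Each command carries its own pipeline state and buffer bindings.
    descriptor.inheritPipelineState = false
    descriptor.inheritBuffers = false
    descriptor.maxKernelBufferBindCount = 1
    return device.makeIndirectCommandBuffer(descriptor: descriptor,
                                            maxCommandCount: maxCommands,
                                            options: [])
}

/// Encodes one reusable dispatch at the given slot of the ICB.
@available(macOS 10.15, iOS 13.0, tvOS 13.0, *)
func encodeDispatch(into icb: MTLIndirectCommandBuffer,
                    at index: Int,
                    pipeline: MTLComputePipelineState,
                    data: MTLBuffer,
                    threadgroups: MTLSize,
                    threadsPerThreadgroup: MTLSize) {
    let command = icb.indirectComputeCommandAt(index)
    command.setComputePipelineState(pipeline)
    command.setKernelBuffer(data, offset: 0, at: 0)
    command.concurrentDispatchThreadgroups(threadgroups,
                                           threadsPerThreadgroup: threadsPerThreadgroup)
}

/// Replays the first `count` commands from a compute pass.
@available(macOS 10.15, iOS 13.0, tvOS 13.0, *)
func execute(_ icb: MTLIndirectCommandBuffer,
             count: Int,
             data: MTLBuffer,
             on encoder: MTLComputeCommandEncoder) {
    // Resources referenced by the ICB must be made resident explicitly.
    encoder.useResource(data, usage: [.read, .write])
    encoder.executeCommandsInBuffer(icb, range: 0..<count)
}
```

The commands are encoded once and can then be executed in every frame, which is where the CPU savings come from.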

4. Debugging and Profiling Tools

The GPU frame capture tool now has a Metal Memory Viewer which lets you inspect textures, buffers and heaps. The tool provides detailed information about the storage mode, type and size:

Memory debugger

As for profiling, Instruments now has a Metal Resource Allocations instrument, which also lets you inspect the storage location and provides information about resource utilization and state for each device, display or the shader compiler:

Instrument

5. Other new features in Metal 3

New to iOS and tvOS:

  • Setting pipeline states on Indirect Command Buffers
  • Sourcing Indirect Command Buffer ranges from buffers
  • 16-bit depth textures

New to macOS:

  • Rendering without render pass attachments
  • Command buffer timing
  • Casting between sRGB and non-sRGB texture views
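As an example of command buffer timing on macOS, the GPU time of a command buffer could be measured roughly like this. It is a minimal sketch that blocks until the work completes; the gpuDuration name is made up for this example.

```swift
import Metal

/// Commits an already encoded command buffer, waits for it, and returns
/// the time in seconds the GPU spent executing it.
@available(macOS 10.15, iOS 10.3, tvOS 10.3, *)
func gpuDuration(of commandBuffer: MTLCommandBuffer) -> Double {
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    // Both timestamps are only meaningful once the buffer has completed.
    return commandBuffer.gpuEndTime - commandBuffer.gpuStartTime
}
```

In a real renderer you would more likely read the two timestamps in a completion handler instead of blocking the CPU.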

Many other new features:

  • Heaps support developer driven placement
  • Heaps can track resources
  • macOS blit alignment rules are relaxed to match Apple GPUs
  • Improvement on resource usage
  • Well-defined behavior for texture access
  • Texture custom swizzle
  • Texture sharing across processes
  • iOS texture binding number increased
  • iOS varying limit increased
  • ASTC 3D support for recent Apple GPUs
  • 3D BC textures on all Mac GPUs
  • Visibility buffer (aka occlusion query) size increased to 256K
  • MSL new attributes [[ primitive_id ]] and [[ barycentric_coord ]]

For a complete list of new features, consult the Metal API documentation website. The new sample code is available on the Apple website as well.

Until next time!