A couple of weeks ago, at WWDC 2016, Apple engineers released a new document, the Metal Best Practices Guide, which includes useful information about organizing your code for better performance in your Metal apps. Because the guide is quite extensive, we are only going to outline its main concepts in this article. An efficient Metal app requires:

  • Low CPU overhead.
  • Optimal GPU performance.
  • Continuous processor parallelism.
  • Effective resource management.

1 Resource Management

1.1 Persistent Objects

Best Practice: Create persistent objects early and reuse them often.

The Metal framework provides several protocols to manage persistent objects throughout the lifetime of your app. These objects are expensive to create but are usually initialized once and reused often. Do not create these objects at the beginning of every render or compute loop.

  • Initialize Your Device and Command Queue First
  • Compile Your Functions and Build Your Library at Build Time
  • Build Your Pipelines Once and Reuse Them Often
  • Allocate Resource Storage Up Front

For more information, consult the Persistent Objects section of the documentation.
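Here is a minimal Swift sketch of this idea. The class name `Renderer` and the 256-byte buffer size are hypothetical. The device, command queue, and buffer storage are created once in an initializer, and each frame only creates a cheap, transient command buffer:

```swift
import Metal

// A minimal sketch: persistent Metal objects are created once, in init.
// `Renderer` and `uniformBufferLength` are hypothetical names for illustration.
final class Renderer {
    static let uniformBufferLength = 256  // hypothetical size in bytes

    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let uniformBuffer: MTLBuffer

    init?() {
        // Expensive objects: create them once and keep them for the app's lifetime.
        guard let device = MTLCreateSystemDefaultDevice(),
              let commandQueue = device.makeCommandQueue(),
              // Allocate resource storage up front instead of once per frame.
              let uniformBuffer = device.makeBuffer(length: Renderer.uniformBufferLength,
                                                    options: .storageModeShared)
        else { return nil }
        self.device = device
        self.commandQueue = commandQueue
        self.uniformBuffer = uniformBuffer
    }

    func drawFrame() {
        // Command buffers are cheap, transient objects meant to be created
        // every frame; the device, queue, and buffer above are simply reused.
        guard let commandBuffer = commandQueue.makeCommandBuffer() else { return }
        // ... update uniformBuffer.contents() and encode work here ...
        commandBuffer.commit()
    }
}
```

Libraries and pipeline states belong in the same "create once" group. They are covered under Compilation below.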

1.2 Resource Options

Best Practice: Set appropriate resource storage modes and texture usage options for your resources.

Metal resources must be configured appropriately to take advantage of fast memory access and driver performance optimizations. Resource storage modes allow you to define the storage location and access permissions for your MTLBuffer and MTLTexture objects. Texture usage options allow you to explicitly declare how you intend to use your MTLTexture objects.

  • Familiarize Yourself with Device Memory Models
  • Choose an Appropriate Resource Storage Mode (iOS and tvOS)
  • Choose an Appropriate Resource Storage Mode (OS X)
  • Set Appropriate Texture Usage Flags

For more information, consult the Resource Options section of the documentation.
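As an illustrative Swift sketch, the two functions below set storage modes and usage flags explicitly. The function names, the texture size, and the pixel format are assumptions. A texture that only the GPU touches uses private storage with exactly the usage flags it needs. CPU-written vertex data goes in a shared-storage buffer, which both processors can access on iOS, tvOS, and OS X:

```swift
import Metal

// Hypothetical helper: an offscreen color target that only the GPU accesses.
func makeOffscreenTarget(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm, width: width, height: height, mipmapped: false)
    // GPU-only data: private storage lets the driver place it optimally.
    descriptor.storageMode = .private
    // Declare exactly how the texture is used: rendered into, then sampled.
    descriptor.usage = [.renderTarget, .shaderRead]
    return device.makeTexture(descriptor: descriptor)
}

// Hypothetical helper: vertex data written by the CPU and read by the GPU.
func makeVertexBuffer(device: MTLDevice, vertices: [Float]) -> MTLBuffer? {
    // Shared storage is visible to both the CPU and the GPU.
    return device.makeBuffer(bytes: vertices,
                             length: vertices.count * MemoryLayout<Float>.stride,
                             options: .storageModeShared)
}
```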

1.3 Triple Buffering

Best Practice: Implement a triple buffering model to update dynamic buffer data.

Dynamic buffer data refers to frequently updated data stored in a buffer. To avoid creating new buffers every frame and to minimize processor idle time between frames, the guide strongly recommends implementing a triple buffering model.

  • Prevent Access Conflicts and Reduce Processor Idle Time
  • Reduce Memory Overhead and Frame Latency
  • Allow Time for Command Buffer Transactions
  • Implement a Triple Buffering Model

For more information, consult the Triple Buffering section of the documentation.
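Here is a minimal Swift sketch of the pattern. It assumes a hypothetical generic `TripleBufferRing` type whose `Slot` stands in for a per-frame `MTLBuffer`. A semaphore initialized to three lets the CPU run up to three frames ahead of the GPU. The CPU blocks only when all three buffers are still in use:

```swift
import Dispatch

// Hypothetical type: `Slot` stands in for a per-frame MTLBuffer.
// beginFrame() is assumed to be called from a single render thread.
final class TripleBufferRing<Slot> {
    private let slots: [Slot]
    private let semaphore: DispatchSemaphore
    private var nextIndex = 0

    init(makeSlot: () -> Slot) {
        let framesInFlight = 3  // triple buffering
        slots = (0..<framesInFlight).map { _ in makeSlot() }
        semaphore = DispatchSemaphore(value: framesInFlight)
    }

    // Waits only while the GPU still owns all three slots, then hands out the next one.
    func beginFrame() -> Slot {
        semaphore.wait()
        let slot = slots[nextIndex]
        nextIndex = (nextIndex + 1) % slots.count
        return slot
    }

    // Call once the GPU has finished with the slot (for example, from a command
    // buffer's completed handler). Every beginFrame() needs a matching endFrame().
    func endFrame() {
        semaphore.signal()
    }
}
```

In a Metal app, you would call `beginFrame()` at the start of each frame and write that frame's data into the returned buffer. You would then call `endFrame()` from a handler registered with the command buffer's `addCompletedHandler(_:)`. This way a buffer is rewritten only after the GPU has finished reading it.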

1.4 Buffer Bindings

Best Practice: Use an appropriate method to bind your buffer data to a graphics or compute function.

Metal provides several API options for binding buffer data to a graphics or compute function. The setVertexBytes:length:atIndex: method is the best option for binding a small amount of dynamic buffer data (a transient buffer), less than 4 KB, to a vertex function. If the data is 4 KB or larger, you should create an MTLBuffer once and update its contents as needed.

For more information, consult the Buffer Bindings section of the documentation.
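The sketch below picks the binding method based on data size. In Swift, the guide's setVertexBytes:length:atIndex: is spelled `setVertexBytes(_:length:index:)`. The function name and the `reusableBuffer` parameter are hypothetical. The sketch assumes `reusableBuffer` was allocated up front and is not currently being read by the GPU, for example because it is managed with triple buffering:

```swift
import Metal

// Hypothetical helper: binds `data` to vertex buffer slot `index`.
func bindVertexData(_ data: [Float],
                    encoder: MTLRenderCommandEncoder,
                    reusableBuffer: MTLBuffer,
                    index: Int) {
    let transientDataLimit = 4 * 1024  // 4 KB, the guide's threshold
    let byteCount = data.count * MemoryLayout<Float>.stride
    if byteCount < transientDataLimit {
        // Small, transient data: Metal copies the bytes; no MTLBuffer is needed.
        encoder.setVertexBytes(data, length: byteCount, index: index)
    } else {
        // Larger data: update a buffer that was created once, up front.
        precondition(byteCount <= reusableBuffer.length, "reusable buffer too small")
        reusableBuffer.contents().copyMemory(from: data, byteCount: byteCount)
        encoder.setVertexBuffer(reusableBuffer, offset: 0, index: index)
    }
}
```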

2 Display Management

2.1 Drawables

Best Practice: Hold a drawable as briefly as possible.

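You schedule a drawable's presentation with the command buffer's presentDrawable: method before the command buffer itself is scheduled for execution. However, the drawable is actually presented only after the command buffer has completed execution. Because drawables come from a limited pool, acquire one as late as possible (ideally just before encoding the on-screen render pass) and release it as soon as possible so the system can reuse it.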

  • Use a MetalKit View to Acquire a Drawable

For more information, consult the Drawables section of the documentation.
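With MetalKit, the view manages the drawable pool for you. Below is a minimal Swift sketch of an `MTKViewDelegate`; the `FrameRenderer` name is hypothetical. It encodes any off-screen work first and only then touches `currentRenderPassDescriptor` and `currentDrawable`, so the drawable is held as briefly as possible:

```swift
import MetalKit

// Hypothetical delegate showing when to acquire the drawable.
final class FrameRenderer: NSObject, MTKViewDelegate {
    let commandQueue: MTLCommandQueue

    init(commandQueue: MTLCommandQueue) {
        self.commandQueue = commandQueue
        super.init()
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let commandBuffer = commandQueue.makeCommandBuffer() else { return }

        // 1. Encode off-screen work (shadow maps, compute, ...) first,
        //    without holding a drawable.

        // 2. Acquire the drawable as late as possible: both the render pass
        //    descriptor and currentDrawable draw from the view's pool.
        guard let pass = view.currentRenderPassDescriptor,
              let drawable = view.currentDrawable,
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass)
        else { commandBuffer.commit(); return }

        // 3. Encode the on-screen pass here, then end it.
        encoder.endEncoding()

        // 4. Schedule presentation; it happens after the GPU finishes.
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}
```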

2.2 Native Screen Scale (iOS and tvOS)

Best Practice: Render at the exact pixel size of your target display.

The pixel size of your drawables should always match the exact pixel size of their target display. This is critical to avoid rendering to off-screen pixels or incurring an additional sampling stage.

  • Use a MetalKit View to Support Native Screen Scale

For more information, consult the Native Screen Scale section of the documentation.
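The underlying arithmetic is simple, as this hypothetical Swift helper shows: multiply the view's size in points by the screen's native scale. On iOS, the inputs would come from the view's bounds and `UIScreen.nativeScale`. On devices such as the iPhone 6 Plus, `nativeScale` differs from `scale`, and that mismatch is exactly what would otherwise cost an extra sampling stage. `MTKView` performs this calculation for you:

```swift
// Hypothetical helper: the pixel size a drawable needs so that one rendered
// pixel maps to exactly one physical screen pixel.
func nativeDrawableSize(widthInPoints: Double,
                        heightInPoints: Double,
                        nativeScale: Double) -> (width: Int, height: Int) {
    return (Int((widthInPoints * nativeScale).rounded()),
            Int((heightInPoints * nativeScale).rounded()))
}
```

Without MetalKit, you would apply the same result yourself, for example by setting the view's `contentScaleFactor` to `nativeScale` and sizing the `CAMetalLayer`'s `drawableSize` accordingly.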

2.3 Frame Rate (iOS and tvOS)

Best Practice: For apps that can’t maintain a 60 FPS frame rate, present your drawables at a steady frame rate.

The display refresh rate of iOS devices is 60 Hz. Apps that are consistently unable to complete a frame’s work within this time should target a lower frame rate to avoid jitter. The display refresh rate of tvOS devices is usually, but not always, 60 Hz.

  • Use the Display Link
  • Adjust the Frame Interval
  • Adjust the Drawable Presentation Time

For more information, consult the Frame Rate section of the documentation.
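One way to choose a steady rate is to present every n-th display refresh. On a 60 Hz display, that gives 60, 30, 20, or 15 FPS. The Swift sketch below assumes the app already measures how long a frame takes, and the helper name is hypothetical. It returns the smallest interval whose time budget covers that measurement:

```swift
// Hypothetical helper: how many display refreshes each frame should span so
// that presentation stays steady. `frameTime` is the measured time, in
// seconds, the app needs per frame.
func frameInterval(forFrameTime frameTime: Double, refreshRate: Int = 60) -> Int {
    var interval = 1
    // Interval 1 = 60 FPS, 2 = 30 FPS, 3 = 20 FPS, ... on a 60 Hz display.
    while interval < refreshRate && frameTime > Double(interval) / Double(refreshRate) {
        interval += 1
    }
    return interval
}
```

With an `MTKView`, you could then set `preferredFramesPerSecond` to `60 / interval`. A drawable can also be presented at a specific time with the command buffer's `present(_:atTime:)` method.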

3 Command Generation

3.1 Load and Store Actions

Best Practice: Set appropriate load and store actions for your render targets.

Actions performed on your Metal render targets must be configured appropriately to avoid costly and unnecessary rendering work at the start (load action) or end (store action) of a rendering pass.

  • Choose an Appropriate Load Action
  • Choose an Appropriate Store Action
  • Evaluate Actions Between Rendering Passes

For more information, consult the Load and Store Actions section of the documentation.
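As a Swift sketch of a typical scene pass (the function name is hypothetical, and the textures are supplied by the caller), color is cleared rather than loaded and kept at the end of the pass. Depth is cleared and then discarded, because it is only needed while the pass runs:

```swift
import Metal

// Hypothetical helper: configures load and store actions for a scene pass.
func makeScenePass(colorTexture: MTLTexture?, depthTexture: MTLTexture?) -> MTLRenderPassDescriptor {
    let pass = MTLRenderPassDescriptor()

    // Previous color contents aren't needed, so clear instead of load
    // (.dontCare is cheaper still if every pixel gets overwritten).
    pass.colorAttachments[0].texture = colorTexture
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    // The color result is displayed or sampled later, so store it.
    pass.colorAttachments[0].storeAction = .store

    // Depth only matters during this pass: clear it at the start and
    // discard it at the end to avoid writing it back to memory.
    pass.depthAttachment.texture = depthTexture
    pass.depthAttachment.loadAction = .clear
    pass.depthAttachment.clearDepth = 1.0
    pass.depthAttachment.storeAction = .dontCare
    return pass
}
```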

3.2 Render Command Encoders (iOS and tvOS)

Best Practice: Merge render command encoders when possible.

Eliminating unnecessary render command encoders reduces memory bandwidth usage and increases performance.

  • Evaluate Rendering Pass Order
  • Evaluate Sampling Dependencies
  • Evaluate Actions Between Rendering Passes

For more information, consult the Render Command Encoders section of the documentation.

3.3 Command Buffers

Best Practice: Submit the fewest command buffers per frame without underutilizing the GPU.

Command buffers are the unit of work submission in Metal; they are created by the CPU and executed by the GPU. This relationship allows you to balance CPU and GPU work by adjusting the number of command buffers submitted per frame.

For more information, consult the Command Buffers section of the documentation.

3.4 Indirect Buffers

Best Practice: Use indirect buffers if your draw or dispatch call arguments are dynamically generated by the GPU.

Indirect buffers are MTLBuffer objects with a specific data layout representing draw or dispatch call arguments.

  • Eliminate Unnecessary Data Transfers and Reduce Processor Idle Time

For more information, consult the Indirect Buffers section of the documentation.
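Below is a minimal Swift sketch; the helper names are hypothetical. A buffer laid out as `MTLDrawPrimitivesIndirectArguments` feeds `drawPrimitives(type:indirectBuffer:indirectBufferOffset:)`. In the scenario the guide describes, a compute kernel would write these arguments on the GPU. Here the CPU writes placeholder values only so the layout is visible. Indirect drawing also requires a GPU that supports it:

```swift
import Metal

// Hypothetical helper: a buffer holding one set of indirect draw arguments.
func makeIndirectArgumentBuffer(device: MTLDevice) -> MTLBuffer? {
    let length = MemoryLayout<MTLDrawPrimitivesIndirectArguments>.stride
    guard let buffer = device.makeBuffer(length: length, options: .storageModeShared) else {
        return nil
    }
    // Placeholder values; in practice a GPU kernel would write these.
    let placeholder = MTLDrawPrimitivesIndirectArguments(
        vertexCount: 3, instanceCount: 1, vertexStart: 0, baseInstance: 0)
    buffer.contents().storeBytes(of: placeholder, as: MTLDrawPrimitivesIndirectArguments.self)
    return buffer
}

// Hypothetical helper: the GPU reads the draw arguments directly from the
// buffer, so the CPU never waits to read back GPU-generated values.
func encodeIndirectDraw(encoder: MTLRenderCommandEncoder, arguments: MTLBuffer) {
    encoder.drawPrimitives(type: .triangle, indirectBuffer: arguments, indirectBufferOffset: 0)
}
```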

4 Compilation

4.1 Functions and Libraries

Best Practice: Compile your functions and build your library at build time.

Compiling Metal Shading Language source code is one of the most expensive stages in a Metal app. Metal is designed to minimize this cost by allowing you to compile graphics and compute functions at build time, then load them at runtime as a library.

  • Build Your Library at Build Time
  • Group Your Functions into a Single Library

For more information, consult the Functions and Libraries section of the documentation.

4.2 Pipelines

Best Practice: Build your render and compute pipelines asynchronously.

Having multiple render or compute pipelines allows your app to use different state configurations for specific tasks. Building these pipelines asynchronously maximizes performance and parallelism. It is recommended that you build all known pipelines up front and avoid lazy loading.

For more information, consult the Pipelines section of the documentation.
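Here is a Swift sketch of building known pipelines up front and asynchronously. It uses the completion-handler variant of `makeRenderPipelineState`. The helper name, the function-name pairs, and the pixel format are assumptions. The `library` parameter would normally be the precompiled library from `device.makeDefaultLibrary()`:

```swift
import Dispatch
import Metal

// Hypothetical helper: builds one render pipeline per (vertex, fragment) pair
// concurrently and calls `completion` once, with nil for any that failed.
func buildPipelines(device: MTLDevice,
                    library: MTLLibrary,
                    functionPairs: [(vertex: String, fragment: String)],
                    pixelFormat: MTLPixelFormat,
                    completion: @escaping ([MTLRenderPipelineState?]) -> Void) {
    var results = [MTLRenderPipelineState?](repeating: nil, count: functionPairs.count)
    let resultsQueue = DispatchQueue(label: "pipeline-results")  // serializes writes
    let group = DispatchGroup()

    for (i, pair) in functionPairs.enumerated() {
        // Functions come from a single library compiled at build time.
        guard let vertex = library.makeFunction(name: pair.vertex),
              let fragment = library.makeFunction(name: pair.fragment) else { continue }
        let descriptor = MTLRenderPipelineDescriptor()
        descriptor.vertexFunction = vertex
        descriptor.fragmentFunction = fragment
        descriptor.colorAttachments[0].pixelFormat = pixelFormat

        group.enter()
        // Returns immediately; Metal builds the pipeline in the background and
        // calls the handler when done. A real app would log `error` on failure.
        device.makeRenderPipelineState(descriptor: descriptor) { state, error in
            resultsQueue.sync { results[i] = state }
            group.leave()
        }
    }
    // Runs after every started build has finished.
    group.notify(queue: resultsQueue) { completion(results) }
}
```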

This guide, along with the Metal Programming Guide and the Metal Shading Language Guide (both already updated for iOS 10, tvOS 10, and OS X 10.12), gives you a trilogy of documents containing everything you need to start creating performant Metal apps.

Until next time!