Performance Guide

This performance guide provides tips, instructions and best practices for optimizing RAM, CPU and GPU performance when using the APIs of the Tinman 3D SDK.

Profiling Values

A profiling value is a quantity with a certain unit that is generated at runtime, in order to provide insight into the performance characteristics of the code.

Instances of IProfiler are used to produce and/or consume profiling values. They are attached to other objects via the IProfilerConsumer interface.

The ProfilerGui component provides a simple GUI with an optional 2D overlay, which can be used to browse the hierarchy of profiling values at runtime.

ApplicationLoop

The ApplicationLoop class provides profiling values for CPU/GPU/RAM usage and the amount of time spent invoking the application loop callbacks. See the ApplicationLoop.Profile* constants for details.

TerrainView

The TerrainView class provides a number of profiling values that are specific to the terrain rendering pipeline and data flow. See the TerrainView.Profile* constants for details.

Debug Helpers

Debug helpers are special class fields (static or instance), which can be updated by client code to enable additional debugging features at runtime.

Each debug helper is annotated with DebugHelperAttribute.

Disposable / ShowAllocationStackTraces

When set to true, allocation stack traces of Disposable objects are collected and included in the report of finalized disposables. A finalized disposable usually indicates improper use of the Disposal and Ownership rules.

In DEBUG mode, the report of finalized disposables is written to a file named tinman3d.finalizers.ID.txt (where ID is the name of the enclosing process) in the user’s profile folder (see Environment.SpecialFolder.UserProfile, for example C:\Users\TheUserName). The file is deleted if the report is empty.

This debug helper is only available in C#.

ObjectPoolBase / Interval

Object pools are used in various places in the Tinman 3D SDK. When object pools are used by client code, this debug helper may be used to determine optimal pool parameters and to evaluate the performance of the pool.

See ObjectPoolBase.DebugInterval for details.
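
To illustrate the kind of figures that help when tuning pool parameters, here is a minimal sketch of a generic object pool that counts hits, misses and dropped objects. It is not the ObjectPoolBase class of the Tinman 3D SDK; the names CountingPool, Get, Put, Hits, Misses, Dropped and HitRate are hypothetical and chosen for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pool, for illustration only (not part of the Tinman 3D SDK).
// Not thread-safe; a real pool would add synchronization where needed.
template <typename T>
class CountingPool
{
public:
    explicit CountingPool(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Returns a pooled object if one is available, otherwise allocates a new one.
    std::unique_ptr<T> Get()
    {
        if (!items_.empty())
        {
            std::unique_ptr<T> item = std::move(items_.back());
            items_.pop_back();
            ++hits_;
            return item;
        }
        ++misses_;
        return std::make_unique<T>();
    }

    // Returns an object to the pool; the object is dropped if the pool is full.
    void Put(std::unique_ptr<T> item)
    {
        if (items_.size() < capacity_)
            items_.push_back(std::move(item));
        else
            ++dropped_;
    }

    std::size_t Hits() const { return hits_; }
    std::size_t Misses() const { return misses_; }
    std::size_t Dropped() const { return dropped_; }

    // Fraction of Get calls that were served from the pool (0 if Get was never called).
    double HitRate() const
    {
        const std::size_t total = hits_ + misses_;
        return total == 0 ? 0.0 : static_cast<double>(hits_) / static_cast<double>(total);
    }

private:
    std::size_t capacity_;
    std::vector<std::unique_ptr<T>> items_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
    std::size_t dropped_ = 0;
};
```

In such a sketch, a low hit rate together with many dropped objects hints that the capacity is too small for the workload, while a high hit rate with no drops suggests the pool is sized adequately.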

Monitor / DumpMonitorUsage

When this debug helper is enabled, the calls to Begin of Monitor objects are counted, grouped by caller. A periodic report is output to the standard output stream of the process. The report can be used to track down excessive use of thread synchronization quickly.

This debug helper is only available in C#.
Monitor usage console dump
Monitor.Begin : 217.848 (3.391/s)
#01: ObjectPool`1.GetThreadSafe      = 62.822 (28%) +0/s
#02: ObjectPool`1.PutThreadSafe      = 62.822 (28%) +0/s
#03: MeshBuffer_RefinementThread.Run = 22.251 (10%) +125/s
#04: Heightmap_Dataset.Begin         =  8.548 ( 3%) +156/s
#05: MeshBuffer.UpdateVertexFlags    =  7.415 ( 3%) +41/s
#06: BlockStorage.Begin              =  4.629 ( 2%) +0/s
#07: TaskResultBase.Wait             =  4.351 ( 1%) +21/s
#08: DataCache.CachePageData         =  3.832 ( 1%) +0/s
#09: MeshBuffer.MeshUpdate           =  3.651 ( 1%) +229/s
#10: DataUpdaterList`1.Validate      =  3.390 ( 1%) +154/s
#11: TaskResultBase.NotifyFinished   =  3.069 ( 1%) +15/s
#12: SampleBuffer_Pool.Get           =  2.884 ( 1%) +0/s
#13: TaskVoid`1.Schedule             =  2.203 ( 1%) +0/s
#14: TaskVoid`1.BackToPool           =  2.203 ( 1%) +0/s
- - -

Overuse of thread synchronization will degrade performance. Usually, those calls will appear at the top of the dump and can be used as starting points for additional profiling.
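
The idea of counting lock acquisitions per caller can be sketched in a few lines of generic C++. This is not the Monitor class of the Tinman 3D SDK: CountingLock, Begin, End and Report are hypothetical names, and the caller is passed explicitly as a string, which is an assumption made to keep the example small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, for illustration only (not the Tinman 3D Monitor class).
class CountingLock
{
public:
    // Acquires the lock and records the acquisition under the caller's name.
    void Begin(const char* caller)
    {
        assert(caller != nullptr);
        mutex_.lock();
        // No extra synchronization needed: only the lock holder reaches this line.
        ++counts_[caller];
    }

    // Releases the lock.
    void End() { mutex_.unlock(); }

    // Returns (caller, count) pairs, sorted by descending count.
    // Must not be called between Begin and End on the same thread,
    // because std::mutex is not recursive.
    std::vector<std::pair<std::string, std::uint64_t>> Report()
    {
        std::vector<std::pair<std::string, std::uint64_t>> rows;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            rows.assign(counts_.begin(), counts_.end());
        }
        std::stable_sort(rows.begin(), rows.end(),
            [](const auto& a, const auto& b) { return a.second > b.second; });
        return rows;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::uint64_t> counts_;
};
```

Sorting by descending count puts the most frequent callers first, which mirrors how the dump above is read: the top entries are the natural starting points for further profiling.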

Monitor / DumpWaitTimes

Enabling this debug helper measures the time spent waiting during calls to WaitForNotify of Monitor objects. A periodic report is output to the standard output stream of the process. The report can be used to identify bottlenecks that limit parallel execution.

This debug helper is only available in C#.
Wait times console dump
Monitor.WaitForNotify : 105.390 (5.553/s)
#01: TaskThread.Run                   = 94.767 (89%) +20612/s
#02: DataCache.Access                 =  5.383 ( 5%) +1531/s
#03: MeshBuffer_RefinementThread.Run  =  2.257 ( 2%) +310/s
#04: DataStream_Background.Run        =  2.094 ( 1%) +0/s
#05: TaskPool.Wait                    =    544 ( 0%) +0/s
#06: BlockStorage.WriteWait           =    336 ( 0%) +0/s
#07: TaskResultBase.WaitForce         =      6 ( 0%) +2/s
#08: DataStream_Background.ReadBuffer =      3 ( 0%) +0/s
- - -

Long wait times do not necessarily indicate a performance problem. For example, pooled worker threads may spend most of their time waiting for work to be submitted. On the other hand, when callers spend a lot of time waiting to gain access to shared memory caches, this usually points to a performance problem, such as an improperly sized cache.

See Monitor.DebugDumpWaitTimes for details.
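
Accumulating wait time per caller can be sketched with a scoped timer around the blocking call. The following is a generic illustration under stated assumptions, not the implementation used by the Tinman 3D SDK: WaitTimeLog, ScopedWait, Add and TotalMilliseconds are hypothetical names, and the caller name is supplied by hand.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical helper, for illustration only (not part of the Tinman 3D SDK).
class WaitTimeLog
{
public:
    using Clock = std::chrono::steady_clock;

    // Adds the given wait duration to the running total of the given caller.
    void Add(const std::string& caller, Clock::duration waited)
    {
        assert(waited >= Clock::duration::zero());
        std::lock_guard<std::mutex> guard(mutex_);
        totals_[caller] += waited;
    }

    // Returns the accumulated wait time of a caller in whole milliseconds.
    long long TotalMilliseconds(const std::string& caller)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        const auto it = totals_.find(caller);
        if (it == totals_.end())
            return 0;
        return std::chrono::duration_cast<std::chrono::milliseconds>(it->second).count();
    }

private:
    std::mutex mutex_;
    std::map<std::string, Clock::duration> totals_;
};

// Measures the lifetime of the scope that surrounds a blocking wait
// and reports it to the log when the scope ends.
class ScopedWait
{
public:
    ScopedWait(WaitTimeLog& log, std::string caller)
        : log_(log), caller_(std::move(caller)), start_(WaitTimeLog::Clock::now())
    {
    }

    ~ScopedWait() { log_.Add(caller_, WaitTimeLog::Clock::now() - start_); }

    ScopedWait(const ScopedWait&) = delete;
    ScopedWait& operator=(const ScopedWait&) = delete;

private:
    WaitTimeLog& log_;
    std::string caller_;
    WaitTimeLog::Clock::time_point start_;
};
```

A ScopedWait object would be placed directly around a blocking call, for example a wait on a condition variable, so that only the time spent blocked is attributed to the caller.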

GraphicsContext / EnableLogOutput

Graphics APIs usually provide a debug layer, which provides additional information that can be very helpful for debugging and testing. When this debug helper is enabled, IGraphicsContextFactory objects will enable the debug layer, if available.

DirectX12Context / DumpDescriptorPools

Being a low-level graphics API, Direct3D 12 requires an application to manage CPU/GPU descriptor values in pools. By enabling this debug helper, the descriptor pool usage of Tinman 3D is output periodically to the standard output stream of the process.

DirectX12Context / DumpUploadBuffer

Being a low-level graphics API, Direct3D 12 requires that an application manages data uploads from the CPU to the GPU by itself. This debug helper periodically outputs the state of the internal upload buffer to the standard output stream of the process. This can be used to check that no inadvertent uploads are performed.

General API

This section covers performance problems that might be encountered when using the general-purpose APIs of the Tinman 3D SDK (see Software Architecture).

ApplicationLoop

The main loop of an application is responsible for consuming user input, updating the application state and rendering new graphics frames. The cycles of that loop are often referred to as frames, and application performance is measured in frames per second (FPS).

Basically, there are two options that determine the overall behaviour of an application loop, with respect to performance:

Limit FPS

An application may want to introduce a limit to the frames per second at which the loop runs, for example to reduce GPU power consumption.

Minimize CPU

After a change of application state, a new graphics frame is rendered (see UpdateFrameTime). When idle (i.e. no state changes), an application may prefer to sleep for a short amount of time, instead of busy-waiting to keep as near as possible to the FPS limit.
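
The interplay of these two options can be sketched with a generic loop built on the C++ standard library. This is not the ApplicationLoop class of the Tinman 3D SDK: RunLoop, LoopStats and the callback signatures are hypothetical, and the maxCycles parameter exists only so that the example terminates.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical loop, for illustration only (not the Tinman 3D ApplicationLoop).
struct LoopStats
{
    int cycles = 0;          // Number of loop cycles that were executed.
    int framesRendered = 0;  // Number of cycles in which a frame was rendered.
};

// update: returns true if the application state has changed.
// render: draws one graphics frame.
LoopStats RunLoop(int frameRateLimit, bool sleepOnIdle, int maxCycles,
                  const std::function<bool()>& update,
                  const std::function<void()>& render)
{
    using Clock = std::chrono::steady_clock;
    assert(frameRateLimit > 0);

    // Minimum duration of one loop cycle, derived from the FPS limit.
    const Clock::duration frameTime = std::chrono::duration_cast<Clock::duration>(
        std::chrono::duration<double>(1.0 / frameRateLimit));

    LoopStats stats;
    Clock::time_point nextFrame = Clock::now();

    while (stats.cycles < maxCycles)
    {
        ++stats.cycles;

        // Render a new frame only if the state has changed.
        const bool changed = update();
        if (changed)
        {
            render();
            ++stats.framesRendered;
        }

        // Limit FPS: the next cycle must not start before this point in time.
        const Clock::time_point now = Clock::now();
        nextFrame += frameTime;
        if (nextFrame < now)
            nextFrame = now;  // Running late: do not try to catch up.

        if (sleepOnIdle && !changed)
        {
            // Minimize CPU: give up the core while idle (less precise timing).
            std::this_thread::sleep_until(nextFrame);
        }
        else
        {
            // Busy-wait: stays close to the FPS limit but keeps a core occupied.
            while (Clock::now() < nextFrame)
                std::this_thread::yield();
        }
    }
    return stats;
}
```

Sleeping hands timing over to the operating system scheduler, so idle cycles may overshoot the target time slightly; busy-waiting avoids that at the cost of CPU usage, which is the trade-off described above.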

With Tinman 3D, an application loop may be established using any of the following:

Table 1. Application loop options per API

API                                      | Limit FPS                         | Minimize CPU
ApplicationLoop.Main                     | Implicitly via FrameRateLimit     | Call Thread.Sleep when Idle.
ApplicationLoop.Run, IApplication.Run    | Implicitly via FrameRateLimit     | Set the sleepOnIdle parameter.
IApplicationControl                      | Implicitly via FrameRateLimit     | Always enabled
FrameRateLimit                           | 100 Hz, overridable by subclasses | -
WidgetApplication, IWidget.ToApplication | Set the frameRateLimit parameter. | -
IWidget.Run                              | Set the frameRateLimit parameter. | Set the sleepOnIdle parameter.

Low-level Terrain API

This section covers performance problems that might be encountered when using the Low-level Terrain API.

This section is not yet available, see roadmap for details.

High-level Terrain API

This section covers performance problems that might be encountered when using the High-level Terrain API.

This section is not yet available, see roadmap for details.

Scene API

This section covers performance problems that might be encountered when using the Scene API.

This section is not yet available, see roadmap for details.

C# Specifics

This section covers performance problems that might be encountered when using the C# version of the Tinman 3D SDK.

This section is not yet available, see roadmap for details.

C++ Specifics

This section covers performance problems that might be encountered when using the C++ version of the Tinman 3D SDK.

This section is not yet available, see roadmap for details.