Performance Guide
This performance guide provides tips, instructions and best practices for optimizing RAM, CPU and GPU performance when using the APIs of the Tinman 3D SDK.
Profiling Values
A profiling value is a quantity with a certain unit that is generated at runtime, in order to provide insight into the performance characteristics of the code.
Instances of IProfiler are used to produce and/or consume profiling values. They are attached to other objects via the IProfilerConsumer interface.
The ProfilerGui component provides a simple GUI with an optional 2D overlay, which can be used to browse the hierarchy of profiling values at runtime.
ApplicationLoop
The ApplicationLoop class provides profiling values for CPU/GPU/RAM usage and the amount of time spent invoking the application loop callbacks. See the ApplicationLoop.Profile* constants for details.
TerrainView
The TerrainView class provides a number of profiling values that are specific to the terrain rendering pipeline and data flow. See the TerrainView.Profile* constants for details.
Debug Helpers
Debug helpers are special class fields (static or instance), which can be updated by client code to enable additional debugging features at runtime.
Each debug helper is annotated with DebugHelper.
Disposable / ShowAllocationStackTraces
When set to true, allocation stack traces of Disposable objects are collected and included in the report of finalized disposables (a finalized disposable usually indicates improper use of the Disposal and Ownership rules).
In DEBUG mode, the report of finalized disposables is written to a file named tinman3d.finalizers.ID.txt (where ID is the name of the enclosing process) in the user's profile folder (see Environment.SpecialFolder.UserProfile, for example C:\Users\TheUserName).
The file will be deleted if the report is empty.
This debug helper is only available in C#.
See Disposable.DebugShowAllocationStackTraces for details.
ObjectPoolBase / Interval
Object pools are used in various places in the Tinman 3D SDK. When pools are used by client code, this debug helper may be used to determine optimal pool parameters and to evaluate the performance of the pool.
See ObjectPoolBase.DebugInterval for details.
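As an illustration of what "evaluating the performance of the pool" can mean, the following sketch counts hits, misses and dropped objects for a simple pool. It is not the SDK's ObjectPoolBase: the class `CountingPool` and all of its members are hypothetical names chosen for this example, and the sketch is neither thread-safe nor tied to any Tinman 3D API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pool (NOT ObjectPoolBase): it only shows which counters
// are useful when choosing the capacity of an object pool.
template <typename T>
class CountingPool {
public:
    explicit CountingPool(std::size_t capacity) : capacity_(capacity) {}

    // Reuses a pooled object if one is available (hit),
    // otherwise allocates a new one (miss).
    std::unique_ptr<T> Get() {
        if (!free_.empty()) {
            std::unique_ptr<T> object = std::move(free_.back());
            free_.pop_back();
            ++hits_;
            return object;
        }
        ++misses_;
        return std::make_unique<T>();
    }

    // Returns an object to the pool; it is destroyed if the pool is full.
    void Put(std::unique_ptr<T> object) {
        assert(object != nullptr);
        if (free_.size() < capacity_) {
            free_.push_back(std::move(object));
        } else {
            ++dropped_;
        }
    }

    std::size_t Hits() const { return hits_; }
    std::size_t Misses() const { return misses_; }
    std::size_t Dropped() const { return dropped_; }

    // Fraction of Get() calls that were served from the pool.
    double HitRatio() const {
        const std::size_t total = hits_ + misses_;
        return total == 0
            ? 0.0
            : static_cast<double>(hits_) / static_cast<double>(total);
    }

private:
    std::size_t capacity_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
    std::size_t dropped_ = 0;
    std::vector<std::unique_ptr<T>> free_;
};
```

In this sketch, a low hit ratio combined with many dropped objects would suggest that the capacity is too small for the workload, while a pool that never drops anything may be larger than necessary.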
Monitor / DumpMonitorUsage
When this debug helper is enabled, the calls to Begin of Monitor objects are counted, grouped by caller. A periodic report is output to the standard output stream of the process. The report can be used to track down excessive use of thread synchronization quickly.
This debug helper is only available in C#.
    Monitor.Begin : 217.848 (3.391/s)
    #01: ObjectPool`1.GetThreadSafe = 62.822 (28%) +0/s
    #02: ObjectPool`1.PutThreadSafe = 62.822 (28%) +0/s
    #03: MeshBuffer_RefinementThread.Run = 22.251 (10%) +125/s
    #04: Heightmap_Dataset.Begin = 8.548 ( 3%) +156/s
    #05: MeshBuffer.UpdateVertexFlags = 7.415 ( 3%) +41/s
    #06: BlockStorage.Begin = 4.629 ( 2%) +0/s
    #07: TaskResultBase.Wait = 4.351 ( 1%) +21/s
    #08: DataCache.CachePageData = 3.832 ( 1%) +0/s
    #09: MeshBuffer.MeshUpdate = 3.651 ( 1%) +229/s
    #10: DataUpdaterList`1.Validate = 3.390 ( 1%) +154/s
    #11: TaskResultBase.NotifyFinished = 3.069 ( 1%) +15/s
    #12: SampleBuffer_Pool.Get = 2.884 ( 1%) +0/s
    #13: TaskVoid`1.Schedule = 2.203 ( 1%) +0/s
    #14: TaskVoid`1.BackToPool = 2.203 ( 1%) +0/s
    - - -
Overuse of thread synchronization will degrade performance. Usually, those calls will appear at the top of the dump and can be used as starting points for additional profiling.
See Monitor.DebugDumpMonitorUsage for details.
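The idea behind such a report, counting calls grouped by caller and listing the busiest callers first, can be sketched as follows. This is not how the SDK implements the debug helper: `CallCounter` and its members are hypothetical names, the caller is passed in explicitly as a string, and the sketch is not thread-safe. Percentages are truncated rather than rounded, which is an assumption about the report format.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (NOT part of the SDK): counts calls per caller.
class CallCounter {
public:
    using Row = std::pair<std::string, std::uint64_t>;

    // Records one call that was made by the given caller.
    void Record(const std::string& caller) {
        ++counts_[caller];
        ++total_;
    }

    // Returns (caller, count) rows, sorted by descending call count.
    std::vector<Row> Report() const {
        std::vector<Row> rows(counts_.begin(), counts_.end());
        std::stable_sort(rows.begin(), rows.end(),
                         [](const Row& a, const Row& b) {
                             return a.second > b.second;
                         });
        return rows;
    }

    // Share of all recorded calls made by the caller, in whole percent
    // (truncated). Unknown callers yield 0.
    int Percent(const std::string& caller) const {
        const auto it = counts_.find(caller);
        if (it == counts_.end() || total_ == 0) {
            return 0;
        }
        return static_cast<int>(it->second * 100 / total_);
    }

    std::uint64_t Total() const { return total_; }

private:
    std::map<std::string, std::uint64_t> counts_;
    std::uint64_t total_ = 0;
};
```

Sorting by descending count puts the callers that synchronize most often at the top, which matches the way the dump is meant to be read.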
Monitor / DumpWaitTimes
Enabling this debug helper measures the time spent waiting during calls to WaitForNotify of Monitor objects. A periodic report is output to the standard output stream of the process. The report can be used to identify bottlenecks that limit parallel execution.
This debug helper is only available in C#.
    Monitor.WaitForNotify : 105.390 (5.553/s)
    #01: TaskThread.Run = 94.767 (89%) +20612/s
    #02: DataCache.Access = 5.383 ( 5%) +1531/s
    #03: MeshBuffer_RefinementThread.Run = 2.257 ( 2%) +310/s
    #04: DataStream_Background.Run = 2.094 ( 1%) +0/s
    #05: TaskPool.Wait = 544 ( 0%) +0/s
    #06: BlockStorage.WriteWait = 336 ( 0%) +0/s
    #07: TaskResultBase.WaitForce = 6 ( 0%) +2/s
    #08: DataStream_Background.ReadBuffer = 3 ( 0%) +0/s
    - - -
Long wait times do not necessarily indicate a performance problem. For example, pooled worker threads may spend most of their time waiting for work to be submitted. On the other hand, when callers spend a lot of time waiting to gain access to shared memory caches, this usually points to a performance problem, such as an improperly sized cache.
See Monitor.DebugDumpWaitTimes for details.
GraphicsContext / EnableLogOutput
Graphics APIs usually provide a debug layer, which provides additional information that can be very helpful for debugging and testing. When this debug helper is enabled, IGraphicsContextFactory objects will enable the debug layer, if available.
See GraphicsContext.DebugEnableLogOutput for details.
DirectX12Context / DumpDescriptorPools
Being a low-level graphics API, Direct3D 12 requires an application to manage CPU/GPU descriptor values in pools. By enabling this debug helper, the descriptor pool usage of Tinman 3D is output periodically to the standard output stream of the process.
See DirectX12Context.DebugDumpDescriptorPools for details.
DirectX12Context / DumpUploadBuffer
Being a low-level graphics API, Direct3D 12 requires that an application manages data uploads from the CPU to the GPU by itself. This debug helper periodically outputs the state of the internal upload buffer to the standard output stream of the process. This can be used to check that no inadvertent uploads are performed.
See DirectX12Context.DebugDumpUploadBuffer for details.
General API
This section covers performance problems that might be encountered when using the general-purpose APIs of the Tinman 3D SDK (see Software Architecture).
ApplicationLoop
The main loop of an application is responsible for consuming user input, for updating the application state and for rendering new graphics frames. Often, the cycles of that loop are referred to as frames and the application performance is measured in frames per second (FPS).
Basically, there are two options that determine the overall behaviour of an application loop, with respect to performance:
- Limit FPS: An application may want to introduce a limit to the frames per second at which the loop runs, for example to reduce GPU power consumption.
- Minimize CPU: After a change of application state, a new graphics frame is rendered (see UpdateFrameTime). When idle (i.e. no state changes), an application may prefer to sleep for a short amount of time, instead of busy-waiting to keep as near as possible to the FPS limit.
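The interplay of the two options can be sketched with a small pacing function. This is a generic illustration, not the behaviour of ApplicationLoop or FrameRateLimit: the names `WaitMode`, `Pacing`, `PlanWait` and `Wait` are hypothetical, and the policy (sleep when idle, busy-wait otherwise) is an assumption made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Micros = std::chrono::microseconds;

// How the loop spends the time that is left in the current frame.
enum class WaitMode { None, Spin, Sleep };

struct Pacing {
    WaitMode mode;
    Micros duration;
};

// Hypothetical policy: given an FPS limit and the time already spent on
// the current frame, decide how to wait for the rest of the frame period.
// A limit of zero or less means "no limit".
Pacing PlanWait(int fpsLimit, Micros elapsed, bool idle) {
    assert(elapsed.count() >= 0);
    if (fpsLimit <= 0) {
        return Pacing{WaitMode::None, Micros(0)};
    }
    const Micros period(1000000 / fpsLimit);
    if (elapsed >= period) {
        return Pacing{WaitMode::None, Micros(0)};  // frame is already late
    }
    const Micros remaining = period - elapsed;
    // Idle: yield the CPU. Busy: spin to hit the frame period precisely.
    return Pacing{idle ? WaitMode::Sleep : WaitMode::Spin, remaining};
}

// Carries out a pacing decision.
void Wait(const Pacing& pacing) {
    if (pacing.mode == WaitMode::Sleep) {
        std::this_thread::sleep_for(pacing.duration);
    } else if (pacing.mode == WaitMode::Spin) {
        const auto end = std::chrono::steady_clock::now() + pacing.duration;
        while (std::chrono::steady_clock::now() < end) {
            // busy-wait
        }
    }
}
```

Sleeping lets the operating system schedule other work, but the wake-up time is less precise than busy-waiting, which is why a loop might only sleep when no state changes are pending.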
With Tinman 3D, an application loop may be established using any of the following:
| API | Limit FPS | Minimize CPU |
|---|---|---|
| | Implicitly via FrameRateLimit | Call Thread.Sleep when Idle. |
| | Implicitly via FrameRateLimit | Set the |
| | Implicitly via FrameRateLimit | Always enabled |
| | 100 Hz, overridable by subclasses | - |
| | Set the | - |
| | Set the | Set the |
Low-level Terrain API
This section covers performance problems that might be encountered when using the Low-level Terrain API.
High-level Terrain API
This section covers performance problems that might be encountered when using the High-level Terrain API.
Scene API
This section covers performance problems that might be encountered when using the Scene API.