Performance Guide
This performance guide provides tips, instructions and best practices for optimizing RAM, CPU and GPU performance when using the APIs of the Tinman 3D SDK.
Profiling Values
A profiling value is a quantity with a certain unit that is generated at runtime, in order to provide insight into the performance characteristics of the code.
Instances of IProfiler are used to produce and/or consume profiling values. They are attached to other objects via the IProfilerConsumer interface.
The ProfilerGui component provides a simple GUI with an optional 2D overlay, which can be used to browse the hierarchy of profiling values at runtime.
ApplicationLoop
The ApplicationLoop class provides profiling values for CPU/GPU/RAM usage and the amount of time spent invoking the application loop callbacks. See the ApplicationLoop.Profile* constants for details.
TerrainView
The TerrainView class provides a number of profiling values that are specific to the terrain rendering pipeline and data flow. See the TerrainView.Profile* constants for details.
Debug Helpers
Debug helpers are special class fields (static or instance), which can be updated by client code to enable additional debugging features at runtime.
Each debug helper is annotated with DebugHelperAttribute.
Disposable / ShowAllocationStackTraces
When set to true, allocation stack traces of Disposable objects are collected and included in the report of finalized disposables. A finalized disposable usually indicates improper use of the Disposal and Ownership rules.
In DEBUG mode, the report of finalized disposables is written to a file named tinman3d.finalizers.ID.txt (where ID is the name of the enclosing process) in the user’s profile folder (see Environment.SpecialFolder.UserProfile, for example C:\Users\TheUserName).
The file will be deleted if the report is empty.
This debug helper is only available in C#.
See Disposable.DebugShowAllocationStackTraces for details.
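The idea behind collecting allocation stack traces can be illustrated with a minimal sketch. It is written in Python for brevity and does not use the Tinman 3D API: the class TrackedResource, its show_allocation_stack_traces flag and the report list are hypothetical names chosen for this example.

```python
import traceback


class TrackedResource:
    """Hypothetical disposable that remembers where it was allocated."""

    # Stand-in for a debug helper flag (an assumption, not the SDK field).
    show_allocation_stack_traces = True
    # Messages about instances that were finalized without being disposed.
    report = []

    def __init__(self, name):
        self.name = name
        self.disposed = False
        # Capturing a stack trace is costly, so it is only done on request.
        self.allocated_at = (
            traceback.format_stack()
            if TrackedResource.show_allocation_stack_traces
            else None
        )

    def dispose(self):
        self.disposed = True

    def __del__(self):
        # Reaching the finalizer without dispose() hints at an ownership bug.
        if not self.disposed:
            trace = "".join(self.allocated_at or ["<no stack trace collected>\n"])
            TrackedResource.report.append(
                f"{self.name} was never disposed:\n{trace}"
            )


def leak():
    TrackedResource("leaked")  # dropped without calling dispose()


def well_behaved():
    resource = TrackedResource("ok")
    resource.dispose()


leak()
well_behaved()
```

After running the sketch, the report holds a single entry for the leaked instance, and its stack trace points at the leak function where the object was allocated.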
ObjectPoolBase / Interval
Object pools are used at various places in the Tinman 3D SDK. When object pools are used by client code, this debug helper may be used to determine optimal pool parameters and to evaluate the performance of the pool.
See ObjectPoolBase.DebugInterval for details.
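As a rough illustration of what evaluating pool parameters can mean, the following Python sketch counts pool hits, misses and discards for two capacities. CountingPool and simulate are hypothetical names for this example and are not related to the actual ObjectPoolBase implementation or its report.

```python
class CountingPool:
    """Hypothetical object pool that records usage statistics."""

    def __init__(self, factory, capacity):
        self.factory = factory
        self.capacity = capacity
        self.free = []
        self.hits = 0      # get() served from the pool
        self.misses = 0    # get() had to create a new object
        self.discards = 0  # put() dropped an object because the pool was full

    def get(self):
        if self.free:
            self.hits += 1
            return self.free.pop()
        self.misses += 1
        return self.factory()

    def put(self, item):
        if len(self.free) < self.capacity:
            self.free.append(item)
        else:
            self.discards += 1

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(capacity, burst=8, rounds=10):
    """Repeatedly take `burst` objects from the pool and return them."""
    pool = CountingPool(factory=bytearray, capacity=capacity)
    for _ in range(rounds):
        items = [pool.get() for _ in range(burst)]
        for item in items:
            pool.put(item)
    return pool


small = simulate(capacity=2)
large = simulate(capacity=8)
print(f"capacity 2: hit rate {small.hit_rate():.3f}, discards {small.discards}")
print(f"capacity 8: hit rate {large.hit_rate():.3f}, discards {large.discards}")
```

In this sketch, a pool that is smaller than the typical burst size shows a low hit rate and many discards, which suggests increasing its capacity.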
Monitor / DumpMonitorUsage
When this debug helper is enabled, the calls to Begin of Monitor objects are counted, grouped by caller. A periodic report is output to the standard output stream of the process. The report can be used to track down excessive use of thread synchronization quickly.
This debug helper is only available in C#.
Monitor.Begin : 217.848 (3.391/s)
#01: ObjectPool`1.GetThreadSafe = 62.822 (28%) +0/s
#02: ObjectPool`1.PutThreadSafe = 62.822 (28%) +0/s
#03: MeshBuffer_RefinementThread.Run = 22.251 (10%) +125/s
#04: Heightmap_Dataset.Begin = 8.548 ( 3%) +156/s
#05: MeshBuffer.UpdateVertexFlags = 7.415 ( 3%) +41/s
#06: BlockStorage.Begin = 4.629 ( 2%) +0/s
#07: TaskResultBase.Wait = 4.351 ( 1%) +21/s
#08: DataCache.CachePageData = 3.832 ( 1%) +0/s
#09: MeshBuffer.MeshUpdate = 3.651 ( 1%) +229/s
#10: DataUpdaterList`1.Validate = 3.390 ( 1%) +154/s
#11: TaskResultBase.NotifyFinished = 3.069 ( 1%) +15/s
#12: SampleBuffer_Pool.Get = 2.884 ( 1%) +0/s
#13: TaskVoid`1.Schedule = 2.203 ( 1%) +0/s
#14: TaskVoid`1.BackToPool = 2.203 ( 1%) +0/s
- - -
Overuse of thread synchronization will degrade performance. Usually, the callers responsible for it appear at the top of the dump and can be used as starting points for additional profiling.
See Monitor.DebugDumpMonitorUsage for details.
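When the report is captured from the standard output stream, a small script can pick out the callers that account for most of the calls. The following Python sketch is not part of the SDK. It assumes the entry layout of the sample report above, and it assumes that the dot in the counts is a thousands separator. The names parse_report and hot_callers are hypothetical.

```python
import re

# One report entry, for example: "#04: Heightmap_Dataset.Begin = 8.548 ( 3%) +156/s"
ENTRY = re.compile(
    r"#(\d+):\s+(\S+)\s+=\s+([\d.]+)\s+\(\s*(\d+)%\)\s+\+([\d.]+)/s"
)


def parse_count(text):
    # Assumption: "62.822" means 62822 (dot used as thousands separator).
    return int(text.replace(".", ""))


def parse_report(report):
    """Return one dictionary per entry found in the report text."""
    entries = []
    for match in ENTRY.finditer(report):
        rank, caller, count, percent, rate = match.groups()
        entries.append({
            "rank": int(rank),
            "caller": caller,
            "count": parse_count(count),
            "percent": int(percent),
            "rate_per_second": parse_count(rate),
        })
    return entries


def hot_callers(entries, min_percent=10):
    """Callers that account for at least `min_percent` of all calls."""
    return [e["caller"] for e in entries if e["percent"] >= min_percent]


SAMPLE = """Monitor.Begin : 217.848 (3.391/s)
#01: ObjectPool`1.GetThreadSafe = 62.822 (28%) +0/s
#02: ObjectPool`1.PutThreadSafe = 62.822 (28%) +0/s
#03: MeshBuffer_RefinementThread.Run = 22.251 (10%) +125/s
#04: Heightmap_Dataset.Begin = 8.548 ( 3%) +156/s
#05: MeshBuffer.UpdateVertexFlags = 7.415 ( 3%) +41/s
"""

entries = parse_report(SAMPLE)
print(hot_callers(entries))
```

The listed callers are candidates for a closer look with a profiler. The threshold of 10 percent is an arbitrary choice for this example.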
Monitor / DumpWaitTimes
Enabling this debug helper will measure the time spent waiting during calls to WaitForNotify of Monitor objects. A periodic report is output to the standard output stream of the process. The report can be used to identify bottlenecks that limit parallel execution.
This debug helper is only available in C#.
Monitor.WaitForNotify : 105.390 (5.553/s)
#01: TaskThread.Run = 94.767 (89%) +20612/s
#02: DataCache.Access = 5.383 ( 5%) +1531/s
#03: MeshBuffer_RefinementThread.Run = 2.257 ( 2%) +310/s
#04: DataStream_Background.Run = 2.094 ( 1%) +0/s
#05: TaskPool.Wait = 544 ( 0%) +0/s
#06: BlockStorage.WriteWait = 336 ( 0%) +0/s
#07: TaskResultBase.WaitForce = 6 ( 0%) +2/s
#08: DataStream_Background.ReadBuffer = 3 ( 0%) +0/s
- - -
Long wait times do not necessarily indicate a performance problem. For example, pooled worker threads may spend most of their time waiting for work to be submitted. On the other hand, when callers spend a lot of time waiting to gain access to shared memory caches, this usually points to a performance problem, such as an improperly sized cache.
See Monitor.DebugDumpWaitTimes for details.
GraphicsContext / EnableLogOutput
Graphics APIs usually offer a debug layer that provides additional information, which can be very helpful for debugging and testing. When this debug helper is enabled, IGraphicsContextFactory objects will enable the debug layer, if available.
See GraphicsContext.DebugEnableLogOutput for details.
DirectX12Context / DumpDescriptorPools
Being a low-level graphics API, Direct3D 12 requires an application to manage CPU/GPU descriptor values in pools. By enabling this debug helper, the descriptor pool usage of Tinman 3D is output periodically to the standard output stream of the process.
See DirectX12Context.DebugDumpDescriptorPools for details.
DirectX12Context / DumpUploadBuffer
Being a low-level graphics API, Direct3D 12 requires that an application manages data uploads from the CPU to the GPU by itself. This debug helper periodically outputs the state of the internal upload buffer to the standard output stream of the process. This can be used to check that no inadvertent uploads are performed.
See DirectX12Context.DebugDumpUploadBuffer for details.
General API
This section covers performance problems that might be encountered when using the general-purpose APIs of the Tinman 3D SDK (see Software Architecture).
ApplicationLoop
The main loop of an application is responsible for consuming user input, for updating the application state and for rendering new graphics frames. Often, the cycles of that loop are referred to as frames and the application performance is measured in frames per second (FPS).
Basically, there are two options that determine the overall behaviour of an application loop, with respect to performance:
- Limit FPS: An application may want to introduce a limit to the frames per second at which the loop runs, for example to reduce GPU power consumption.
- Minimize CPU: After a change of application state, a new graphics frame is rendered (see UpdateFrameTime). When idle (i.e. no state changes), an application may prefer to sleep for a short amount of time, instead of busy-waiting to keep as near as possible to the FPS limit.
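The two options can be sketched with a generic loop. The following Python example is a simplified illustration and not the ApplicationLoop implementation: run_loop and its parameters are hypothetical, and the clock and sleep functions are injected so that the behaviour can be observed without real waiting.

```python
import time


def run_loop(update, render, frames, fps_limit=60.0, idle_sleep=0.005,
             clock=time.perf_counter, sleep=time.sleep):
    """Run `frames` loop cycles and return the number of rendered frames.

    `update` returns True when the application state has changed.
    """
    frame_time = 1.0 / fps_limit
    rendered = 0
    for _ in range(frames):
        start = clock()
        if update():
            render()
            rendered += 1
            # Limit FPS: sleep for the rest of this frame's time budget.
            remaining = frame_time - (clock() - start)
            if remaining > 0:
                sleep(remaining)
        else:
            # Minimize CPU: nothing changed, so skip rendering and sleep
            # briefly instead of busy-waiting.
            sleep(idle_sleep)
    return rendered


# Demonstration with a frozen clock and a recording sleep function.
slept = []
changes = iter([True, False, False, True])
rendered = run_loop(update=lambda: next(changes), render=lambda: None,
                    frames=4, fps_limit=50.0,
                    clock=lambda: 0.0, sleep=slept.append)
print(rendered, slept)
```

In the demonstration, only the two cycles with a state change render a frame and wait for the frame budget of 1/50 s. The two idle cycles sleep for the shorter idle interval instead.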
With Tinman 3D, an application loop may be established using any of the following:
API | Limit FPS | Minimize CPU |
---|---|---|
| | Implicitly via FrameRateLimit | Call Thread.Sleep when Idle. |
| | Implicitly via FrameRateLimit | Set the |
| | Implicitly via FrameRateLimit | Always enabled |
| | 100 Hz, overridable by subclasses | - |
| | Set the | - |
| | Set the | Set the |
IModel
The domain model for 3D Models provides the API for loading 3D model files, building 3D models from scratch and using the resulting model structure at runtime, for example to perform collision detection or rendering with the GPU Rendering abstraction layer.
Accessing IModel.Bounds or IModel.Collider may require additional computations to obtain the required spatial data. By default, these computations are performed lazily, potentially causing framerate stuttering when triggered during rendering.
To counteract such problems, any of the following mechanisms may be used:
- Call IModel.PrepareLazy after loading a model or before rendering, to avoid lazy computations later.
- Use a ModelFormat that supports pre-computed IModel.Bounds, for example the CMH format.
- Use the ModelFlags.ComplexGeometry flag in conjunction with a supporting ModelFormat (for example CMH), which will embed pre-computed spatial data into the model (including pre-computed IModel.Bounds) and hence eliminate the need for lazy computations in the first place.
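The effect of preparing lazy data ahead of time can be shown with a small sketch. It is written in Python and does not use the Tinman 3D API: the Model class, its bounds property and the prepare_lazy method are hypothetical stand-ins that only mimic the pattern described above.

```python
class Model:
    """Hypothetical model whose bounding box is computed lazily."""

    def __init__(self, vertices, log):
        self.vertices = vertices
        self.log = log  # records when the expensive computation happens
        self._bounds = None

    @property
    def bounds(self):
        if self._bounds is None:
            # Expensive on large models; runs on first access only.
            self.log.append("compute bounds")
            lows = tuple(min(v[i] for v in self.vertices) for i in range(3))
            highs = tuple(max(v[i] for v in self.vertices) for i in range(3))
            self._bounds = (lows, highs)
        return self._bounds

    def prepare_lazy(self):
        """Force lazy computations now, for example right after loading."""
        _ = self.bounds


def render(model, log):
    log.append("render")
    return model.bounds  # for example, used for visibility culling


log = []
model = Model([(0, 0, 0), (2, 1, -1), (1, 3, 4)], log)
model.prepare_lazy()  # pay the cost at load time
for _ in range(3):
    render(model, log)
print(log)
```

With the call to prepare_lazy, the computation appears once in the log, before the first frame. Without it, the same computation would run inside the first render call.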
ModelReader
When using a ModelFormat that returns a ModelReader object for reading 3D model files, additional options may be specified for controlling how the read model data is going to be interpreted.
The default behaviour merges the hierarchical structure of a 3D model file quite aggressively, in order to reduce the number of generated IModel objects, which in turn allows further optimizations that benefit rendering performance.
This behaviour may not be suitable for complex models that represent whole 3D scenes, where the hierarchy is vital for visibility culling. In such cases, the following options may be used:
- ReadModelFlags.ModelNames: Retains the hierarchical structure of the 3D model file.
- ModelReaderOptions.ModelNames: Retains the hierarchical structure only for specific nodes.
- OpenFlightModelReader.KeepStructure: Retains all group database nodes, which usually represent the 3D scene structure.
OpenFlightModelReader
When using the OpenFlight API to load 3D model files in the *.flt format, lazy loading of IModelGeometry and IModelTexture data forces the OpenFlight database file to be re-opened each time, which can reduce performance significantly.
With OpenFlightModelReader.Database, the OpenFlight database can be kept open, which eliminates this performance bottleneck.
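The difference between re-opening a file for each lazy read and keeping it open can be sketched as follows. The Python example uses a plain temporary file instead of an OpenFlight database, and ChunkSource is a hypothetical class that only counts how often the file gets opened.

```python
import os
import tempfile


class ChunkSource:
    """Hypothetical reader that loads chunks of a file on demand."""

    def __init__(self, path, keep_open):
        self.path = path
        self.keep_open = keep_open
        self.opens = 0  # number of times the file has been opened
        self._handle = None

    def _open(self):
        self.opens += 1
        return open(self.path, "rb")

    def read_chunk(self, offset, size):
        if self.keep_open:
            # Open once, then reuse the handle for all later reads.
            if self._handle is None:
                self._handle = self._open()
            self._handle.seek(offset)
            return self._handle.read(size)
        # Re-open the file for every single read.
        with self._open() as handle:
            handle.seek(offset)
            return handle.read(size)

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


# Create a small sample file with 16 bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(bytes(range(16)))

reopening = ChunkSource(path, keep_open=False)
kept_open = ChunkSource(path, keep_open=True)
offsets = [0, 4, 8, 12]
data_a = [reopening.read_chunk(o, 4) for o in offsets]
data_b = [kept_open.read_chunk(o, 4) for o in offsets]
kept_open.close()
os.remove(path)
print(reopening.opens, kept_open.opens)
```

Both readers return the same data, but the first one opens the file once per chunk. For formats where opening involves parsing headers or building indices, that overhead is paid on every lazy read.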
Low-level Terrain API
This section covers performance problems that might be encountered when using the Low-level Terrain API.
High-level Terrain API
This section covers performance problems that might be encountered when using the High-level Terrain API.
TerrainModel
The presence of the Culling, Query or ShadowReceiver flag in TerrainModel.Flags will cause the IModel.Bounds property of TerrainModel.Model to be retrieved, which may trigger lazy computations at render time, possibly leading to framerate stuttering.
Using the ISpatialQuery interface on a terrain model may also trigger lazy computations, because the IModel.Collider property needs to be retrieved.
Please refer to General API / IModel for details on how to prevent the above from happening.
Scene API
This section covers performance problems that might be encountered when using the Scene API.