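Back in May 2020, Epic released Lumen in the Land of Nanite, a video showcasing the rendering features of Unreal Engine 5. It demonstrated Nanite, which is built on virtualized micropolygon geometry, and Lumen, a real-time global illumination technique, and brought film-quality visuals to real-time games.
At the time Epic promised a UE5 preview build in the first half of 2021, and it kept that promise: UE5 Early Access (EA) shipped in late May 2021. We can therefore study the UE5 editor, toolchain, and new rendering features, together with the UE5 EA source code and the accompanying sample project AncientGame.
An overview of the UE5 editor. The main scene is the AncientWorld map from the AncientGame project released alongside UE5 EA.
Based on UE5 Early Access (EA), this article covers the following aspects of UE5:
To keep the length manageable, the article is split into two parts. Part one covers the editor, the new features, and Nanite; part two covers Lumen, other rendering techniques, and a summary.
Some of the basic rendering concepts used in this article are explained in the table below: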
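Concept | Abbreviation | Chinese name | Description
Lumen | - | 流明 (lumen) | UE5's real-time global illumination technique.
Nanite | - | 纳米机器人 (nanomachine) | UE5's virtualized micropolygon technique.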
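This section covers UE5 installation, the editor, and the new features that differ from UE4.
Step 1: update the Epic Games Launcher and restart it.
Step 2: open the UE5 page and click "Download Early Access".
Step 3: on the "Library" page, find the 5.0.0 button and click it to download.
Switch to the UE5 page, scroll to the very bottom, and click the get-sample button under the sample project that demonstrates UE5's new features:
Open the UE5 page and click the "Access Source Code" button:
Alternatively, open the 5.0 Early Access page directly and download Source Code (zip) or Source Code (tar.gz). After extracting it, set up and build the engine following the same process as UE4.
Open the AncientGame project with the downloaded or self-built UE5 editor. If it starts successfully, the following prompt page appears:
Press Ctrl+Space to bring up the Content Drawer, go to AncientContent/Maps, and open the AncientWorld map:
After the level opens, the screen goes black. Don't panic; this is normal, because UE5 has to preprocess a lot of data before it can render the scene:
Once the lengthy data processing finishes, the main scene of AncientWorld can be previewed:
Compared with UE4, the main UE5 editor window changes noticeably in both layout and UI style. The UI is flatter and looks more like a DCC tool, and the layout emphasizes the level editing area while shrinking areas such as the component panels and the Content Browser:
As shown above, the areas serve the following purposes: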
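1. Menu bar, similar to UE4.
2. Tools for adding components, searching content, the level blueprint, Sequencer, and so on.
3. Map brush editing modes.
4. Play and preview.
5. Settings, including world, project, and plugin settings.
6. World Partition, Data Layer, and related pages.
7. Level Actor list and details panel, similar to UE4.
8. Background task status and source control.
9. Content Browser and command line. The Content Browser can be shown or hidden quickly with Ctrl+Space. The command line no longer requires pressing the ~ key as in UE4, which makes it easier to set console variables and speeds up debugging.
10. Main level editing viewport. Its functionality is similar to UE4, with some differences. For example, the Lit menu gains a Nanite visualization group that displays information about the virtualized micropolygon technique:
The Lit mode of the level viewport adds a Nanite visualization group. The noise in the background is not a bug; it is Nanite's triangle visualization mode.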
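This subsection covers UE5's new rendering features.
Nanite means nanomachine. UE5 uses it as the name of its new-generation mesh processing and shading technique, and the intent is clear: to replace and upgrade traditional culling, rasterization, and shading at mesh-LOD granularity by processing meshes and triangles at a much finer granularity.
Nanite's full name is Nanite Virtualized Geometry. It automatically handles high-precision mesh models and supports pixel-scale triangles, highly detailed surfaces, and massive object counts. It processes only the data that is needed, at the appropriate level, so that surface detail is not lost and no excess data is processed. Before rendering, Nanite preprocesses the mesh data extensively, stores it in a highly compressed, fine-grained binary stream, and handles LOD automatically.
An overview of Nanite in the AncientGame sample project. Top left: the temple in AncientGame; top right: the corresponding micropolygon visualization; bottom left and bottom right: details of the project's boss and of the mountains.
There are three ways to enable Nanite for a mesh in UE5:
Enable it in the import settings when importing a static mesh.
Set it in the mesh properties panel of the mesh editor.
Enable it in bulk through the right-click menu of the Content Browser.
Enabling Nanite brings many benefits:
A Nanite mesh is similar to a traditional static mesh and is still essentially a triangle mesh. The core difference is the large amount of detail and the high degree of data compression. Most importantly, Nanite uses an entirely new system that renders this data format extremely efficiently.
A traditional static mesh needs a flag set to enable Nanite. Nanite meshes support multiple UV sets and vertex colors. Materials are assigned to sections of the mesh, so those materials can use different shading models and dynamic effects (executed in shaders). Material assignments can be swapped dynamically, just as with any other static mesh, and Nanite needs no extra processing to bake materials.
Because Nanite offers better rendering performance and a smaller memory and disk footprint, enable it on static meshes wherever possible. To get the most out of Nanite, a static mesh should ideally meet the following criteria: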
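In the current version, Nanite supports the following component types:
Animated types that Nanite does NOT support include, but are not limited to:
In addition, Nanite does NOT support the following features:
Note that the per-vertex tangents of a Nanite mesh are not stored in the mesh data as they are for traditional static meshes (the official documentation says this is to reduce data size). Instead, tangents are derived on the fly in the pixel shader. Because this tangent space differs from the traditional one, discontinuities may appear at edges.
Nanite also does NOT support materials with the following configurations:
Any Blend Mode other than Opaque.
Deferred decals.
Wireframe mode.
Pixel depth offset.
World position offset.
Custom per-instance data.
Two-sided materials.
Nanite cannot correctly render materials that use the following features (objects may disappear):
Nanite does NOT support the following rendering features:
View-specific object filtering that uses the following: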
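Forward rendering
Stereo rendering in VR mode
Split screen
MSAA
Lighting channels
Ray tracing against fully detailed Nanite meshes
Some visualization modes
Nanite supports PlayStation 5, Xbox Series S|X, and PCs with the following GPUs running the latest drivers:
When monitoring Nanite performance, pay attention to the following:
Aggregate Geometry: many tiny, disconnected elements that merge into a single volume at a distance, such as hair, leaves, and grass. It affects LOD and occlusion culling.
Closely Stacked Surfaces: for surfaces stacked closely beneath the topmost visible surface, Nanite draws all of the stacked layers without taking their mutual occlusion into account.
Top: the normal AncientGame image; bottom: the corresponding instance visualization with Closely Stacked Surfaces. The black areas appear because Nanite does not support dynamic characters.
In most cases Closely Stacked Surfaces reduce draw calls, though in some situations the effect may be the opposite. Moving the camera in the Overdraw visualization mode shows how these stacked surfaces are rendered:
Faceted and Hard-edge Normals: ideally a mesh has fewer vertices than triangles. A vertex-to-triangle ratio of 2:1 or higher may indicate a problem, especially when the triangle count is high. A ratio of 3:1 means the mesh is fully faceted: every triangle has its own three vertices, none shared with another triangle, usually because the normals differ where no smoothing is applied. The two figures below compare faceted and smooth normals: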
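Left: faceted normals; right: smoothing-group normals.
Left: Nanite triangle visualization with faceted normals; right: Nanite triangle visualization with smoothing-group normals. Smoothing-group normals result in fewer triangles being drawn.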
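The vertex-to-triangle ratio rule above can be checked with a few lines of code. The following is a minimal sketch, not engine code: the `EFaceting` enum and `ClassifyFaceting` function are hypothetical names introduced only for illustration, and the thresholds are simply the 2:1 and 3:1 ratios quoted in the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification for illustration; not a UE5 type.
enum class EFaceting { Shared, Suspicious, FullyFaceted };

// Classifies a mesh by its vertex-to-triangle ratio using integer math:
//   ratio >= 3:1 -> fully faceted (no vertex is shared between triangles)
//   ratio >= 2:1 -> suspicious (many hard edges or split normals)
//   otherwise    -> vertices are well shared
inline EFaceting ClassifyFaceting(std::uint64_t NumVerts, std::uint64_t NumTris)
{
    assert(NumTris > 0 && "a mesh needs at least one triangle");
    if (NumVerts >= 3 * NumTris) return EFaceting::FullyFaceted;
    if (NumVerts >= 2 * NumTris) return EFaceting::Suspicious;
    return EFaceting::Shared;
}
```

For example, a mesh with 300 vertices and 100 triangles would be reported as fully faceted, while a smooth mesh with 60 vertices and 100 triangles would not be flagged.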
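Besides the visualization modes mentioned above, UE5 offers several others, and the Overview mode displays all of them at once:
UE5 provides several Nanite visualization modes, each showing different data.
Nanite's Overview mode shows thumbnails of all visualization modes.
The console command NaniteStats displays Nanite statistics for the current frame in real time:
The right-hand side shows Nanite's culling and geometry statistics.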
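Lumen is UE5's fully dynamic global illumination and reflection system, and it is the default GI and reflection method in UE5. Lumen renders diffuse interreflection with infinite bounces and indirect specular reflections in large, detailed environments at scales ranging from millimeters to kilometers.
To enable Lumen, turn on the following options in Project Settings (all enabled by default):
Once Lumen is enabled, Lumen GI replaces SSGI and DFAO, Lumen reflections replace SSR, static lighting is disabled, and all lightmaps are hidden.
Lumen supports the following rendering features:
Lumen GI handles the indirect lighting of scene objects. For example, light bouncing off a directly lit surface tints nearby surfaces, a phenomenon known as color bleeding. Because meshes also block and absorb part of the indirect light, Lumen handles indirect shadowing correctly as well.
Lumen global illumination handles indirect lighting and shadowing dynamically in real time.
Lumen preserves full-resolution normal detail while computing indirect lighting at a lower resolution to reach real-time performance.
Lumen resolves sky lighting in the Final Gather stage, so sky light differs clearly between indoors and outdoors, with interiors being darker. Lumen's sky light also supports low-quality lighting on translucency and GI in volumetric fog.
Emissive materials propagate light through Lumen's Final Gather at no extra cost. However, the size and brightness of the emissive area are limited; beyond those limits, noise artifacts appear.
Lumen resolves indirect specular reflections for materials across the full roughness range.
Lumen also supports reflections on clear coat materials.
In the EA release, Lumen's support for light features is as follows:
Many Lumen parameters can be set in Project Settings and in post-process volumes, such as the software ray tracing mode, detail tracing, global tracing, hardware ray tracing mode, and the GI and reflection settings.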
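Virtual Shadow Maps (VSM) are a new shadow casting method that delivers consistent, high-resolution shadows for film-quality assets and dynamic lighting in large open worlds.
Enable VSM under the shadow map method in Project Settings (enabled by default):
Once enabled, VSM replaces the traditional shadow techniques, including stationary precomputed shadows, distance field shadows, preview shadows, per-object shadows, cascaded shadows, and movable dynamic shadows.
With VSM enabled, Shadow Map Ray Tracing (SMRT) can build on it to produce more accurate and sharper shadows and related effects, including penumbrae, soft shadows, and contact hardening.
With SMRT, a point light gets soft shadows with contact hardening.
Left: PCF blurs and removes important surface detail; right: SMRT gives more plausible soft shadows with contact hardening.
As is well known, traditional rendering uses CSM to optimize directional light shadows. Similarly, UE5 uses clipmaps for directional lights.
A single virtual shadow map does not provide enough resolution to cover larger areas. Directional lights use a clipmap structure that expands around the camera, and each clipmap level has its own 16K VSM. Every clipmap level has the same resolution but covers twice the radius of the previous level.
Left: clipmap visualization; right: VSM page visualization.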
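The doubling rule for clipmap levels is easy to put into numbers. The sketch below is illustrative only and is not taken from the UE5 source: `ClipmapLevelRadius`, `ClipmapTexelSize`, and the `FirstLevelRadius` parameter are hypothetical, and the only facts assumed from the text are the 16K resolution per level and the radius doubling from one level to the next.

```cpp
#include <cassert>
#include <cstdint>

// 16K virtual shadow map resolution per clipmap level, as stated in the text.
constexpr std::uint32_t kVsmResolution = 16384;

// Radius covered by a clipmap level: each level doubles the previous radius,
// so level L covers FirstLevelRadius * 2^L. FirstLevelRadius is a hypothetical
// input (world units), not an engine constant.
inline double ClipmapLevelRadius(double FirstLevelRadius, std::uint32_t Level)
{
    assert(Level < 32 && "shift below must stay within 32 bits");
    return FirstLevelRadius * static_cast<double>(1u << Level);
}

// World-space size of one shadow texel at a given level. The resolution is
// constant, so the texel size doubles with every level as well.
inline double ClipmapTexelSize(double FirstLevelRadius, std::uint32_t Level)
{
    const double Diameter = 2.0 * ClipmapLevelRadius(FirstLevelRadius, Level);
    return Diameter / static_cast<double>(kVsmResolution);
}
```

Under these assumptions, shadow detail falls off smoothly with distance: areas near the camera land in fine levels, and each further level trades half the texel density for twice the coverage.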
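Spot lights use a single 16K VSM with a mip chain, rather than a clipmap, to handle shadow LOD. Point lights use a cube map with one 16K VSM per face, six in total.
Top: spot light result; bottom: visualization of the corresponding single VSM.
To optimize shadow rendering performance, UE5 uses techniques such as caching and Coarse Pages.
Temporal Super Resolution (TSR) is a new-generation temporal anti-aliasing algorithm that replaces traditional TAA. It supports the following features:
TSR can be turned on or off in the project settings. It is enabled by default in UE5.
Left: rendering at native 4K runs at 20.57 fps; right: upscaling 1080p to 4K output with TSR raises the frame rate to 44.22 fps.
UE5 improves several rendering modules for mobile, including: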
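World Partition is a new data management and streaming system, available both in the editor and at runtime. It completely removes the need to manually split the world into countless sublevels in order to manage streaming and reduce data contention.
With World Partition, the world exists as a single persistent level. In the editor, the world is divided into grid cells, and map data can be loaded partially according to the region of interest. When cooking or launching PIE, the world is divided into grid cells optimized for runtime streaming, which become independent streaming levels.
Open the World Partition editor from Window/World Partition in the menu bar. Drag with the left mouse button to select all cells in a range, and right-click to open a context menu with actions to load the selected cells, unload the selected cells, and move the camera to the current cell:
The world also supports Data Layers, HLOD (Hierarchical Level of Detail), Level Instancing, One File Per Actor, and other features.
UE5's animation module adds Full Body IK, Control Rig, Motion Warping, animation tool scripting, and improved Sequencer support.
Full Body IK illustration.
Control Rig illustration.
Motion Warping illustration.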
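UE5's physics also stands out among the new features. The main ones are:
Chaos is UE5's lightweight physics simulation solution, built from the ground up to meet the needs of next-generation games. Its main features are:
1. Rigid Body Dynamics
2. Rigid Body Animation Nodes and Cloth Physics
3. Destruction
4. Ragdoll Physics
5. Vehicles
6. Physics Fields. The Physics Field system lets users directly influence the Chaos physics simulation in a given region of space at runtime. Fields can be configured to affect the simulation in various ways, such as applying forces to rigid bodies, breaking Geometry Collection Clusters, and anchoring or disabling fractured rigid bodies.
7. Fluid Simulation
8. Hair Simulation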
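The Gameplay framework adds Game Features and Modular Gameplay, Data Registries, and the Enhanced Input system.
The Data Registry editor.
UE5 improves memory tracing and debugging through the Memory Insights module of Unreal Insights. It supports the following:
1. View a snapshot of all allocated memory at any given time during a session.
2. Compare snapshots of all allocated memory at two different times.
3. View the call stack of every memory allocation.
4. Identify long-lived and short-lived (temporary) allocations.
5. Find memory leaks.
An overview of the Memory Insights editor in Unreal Insights.
The reliability of the iOS remote build process on Mac has been improved, and a cross-platform library has been added to make interacting with iOS devices over USB more reliable.
There are also audio improvements, a redesigned VR template, Unreal Turnkey, and more.
This chapter compares the UE5 EA source with UE 4.26 and systematically describes the differences. For convenience, Beyond Compare's folder comparison is used:
The two versions differ considerably. The following sections focus on differences in the base modules, the rendering system, and shaders, mainly in these folders: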
Engine\Source\Runtime\Core
Engine\Source\Runtime\CoreUObject
Engine\Source\Runtime\D3D12RHI
Engine\Source\Runtime\Engine
Engine\Source\Runtime\MeshDescription
Engine\Source\Runtime\Renderer
Engine\Source\Runtime\RHI
Engine\Source\Runtime\RHICore
Engine\Shaders
Core:
UObject:
AssetRegistry:
Package:
Serialization:
As can be seen, the core and base modules underwent extensive changes and refactoring touching nearly every area.
RHI:
RHICore:
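Added instance culling modules: InstanceCulling, InstanceCullingManager, and others, with types such as FInstanceCullingRdgParams, EInstanceCullingMode, FInstanceCullingContext, and FInstanceCullingManager. They are mainly used by Nanite.
Virtual texturing adds or improves data read/write and interfaces such as FVirtualTextureFeedbackBuffer, RenderPages, and RenderPagesStandAlone.
Global distance field data (FGlobalDistanceFieldParameterData) adds mipmap and VT data and interfaces.
HairStrands adds types such as EHairBindingType, EHairInterpolationType, and FHairStrandsInstance.
Added or improved the FVertexFactoryShaderPermutationParameters type.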
MeshPassProcessor:
The FinalizeCommand stage of FDynamicPassMeshDrawListContext adds a NewVisibleMeshDrawCommand.Setup step.
Dropped uniform buffer interfaces such as SetInstancedViewUniformBuffer, SetPassUniformBuffer, and GetInstancedViewUniformBuffer.
Added the Nanite module:
Added the Lumen module:
Migrated traditional shader binding interfaces to RDG, e.g. SHADER_PARAMETER_TEXTURE becomes SHADER_PARAMETER_RDG_TEXTURE.
Added or improved RuntimeVirtualTextureProducer, FSceneTextureExtracts, EMobileSceneTextureSetupMode, and Strata.
Enhanced post-processing, e.g. added Temporal Super Resolution (TSR).
Enhanced the ray tracing modules, such as RTGI, RTAO, RTR, RTShadow, and RTSkyLight.
DeferredShadingRenderer:
Added or enhanced rendering modules such as BasePass, DepthPass, SDF, GPUScene, GlobalDistanceField, BVH, GenerateConservativeDepthBuffer, LightRendering, IndirectLightRendering, MeshDrawCommands, Shader, ScreenSpace, Shadow, MobileRender, and more.
From UE 4.26 to 5.0 EA, all the important and fundamental rendering modules were modified to some degree.
Added modules such as IRenderCaptureProvider, NaniteResources, and NaniteStreamingManager.
Added the LevelInstance module, which handles data read/write, packaging, rendering, and editor support for level instancing.
Added modules such as ActorReferencesUtils, AssetCompilingManager, AsyncCompilationHelpers, CanvasRender, ComputeKernelCollection, DerivedMeshDataTaskUtils, InstancedStaticMeshDelegates, InstanceUniformShaderParameters, MeshCardRepresentation, NaniteSceneProxy, ObjectCacheContext, StaticMeshCompiler, and TextureCompiler, covering asset compilation, instancing, buffers, meshes, Nanite, and more.
PrimitiveSceneProxy:
SceneInterface和SceneManagement:
SceneView:
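Enhanced types and interfaces for distance fields, GPUSkinCache, Material, StaticMesh, SkeletalMesh, Texture, and others.
Added Nanite:
Added Lumen:
Added Strata:
Added VirtualShadowMap:
Added InstanceCulling:
Enhanced the HairStrands module, including HairCards, HairScatter, BsdfPlot, ClusterCulling, Shadow, DeepShadow, EnvironmentLighting, GBuffer, Material, Visibility, and Voxel.
Enhanced modules such as PathTracing, RayTracing, SSD, SSRT, and TAA.
Removed the LPV module.
Improved modules for base functionality, materials, shading, lighting, shadows, and dedicated passes, such as AnisotropyPass, BasePass, MobileBasePass, BRDF, BurleySSS, CapsuleLight, RectLight, TranslucentLighting, ClusteredDeferredShading, LightGrid, ForwardLighting, DeferredLight, DiffuseIndirect, DistanceFieldAO, DistanceFieldLighting, DistanceFieldShadow, GlobalDistanceField, Math, Decal, GpuSkin, Halton, MonteCarlo, HZB, LocalVertexFactory, MaterialTemplate, Particle, PlanarReflection, PostProcess, Reflection, SceneData, ShadingCommon, ShadingModels, Shadow, and Volumetric.
The previous subsections show that the biggest changes are the addition of modules and techniques such as Nanite, Lumen, VSM, InstanceCulling, and LevelInstance, along with changes to the related types and interfaces in the Engine and Renderer modules.
The main change in the RHI layer is the unification of the various vertex and index buffers into FRHIBuffer.
The Renderer layer strengthens ray tracing, especially screen-space ray tracing, expands the uses of distance fields, and removes LPV.
The Engine layer was mostly adjusted to match the changes in the RHI and Renderer layers.
This chapter covers the preprocessing, rendering, and optimization techniques of UE5's Nanite virtualized micropolygon geometry.
Searching the UE5 EA source for "Nanite" yields 3026 matches across 195 files:
The scope is far too broad to cover every detail, so after some screening I will concentrate on the Nanite source in the following modules:
This section covers Nanite's basic concepts, types, and background knowledge.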
// Engine\Source\Runtime\Engine\Classes\Engine\EngineTypes.h
// Shadow map method.
namespace EShadowMapMethod
{
enum Type
{
// Traditional shadow maps. Culled per component, which performs poorly in high-poly scenes.
ShadowMaps UMETA(DisplayName = "Shadow Maps"),
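// Renders geometry into virtual depth maps for shadowing. Provides high-quality next-gen shadows with simple setup, and culls efficiently when used with Nanite.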
VirtualShadowMaps UMETA(DisplayName = "Virtual Shadow Maps (Beta)")
};
}
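// Settings applied when building Nanite data.
struct FMeshNaniteSettings
{
// Whether Nanite is enabled for this mesh.
uint8 bEnabled : 1;
// Position precision. The step size is 2^(-PositionPrecision) cm. MIN_int32 means automatic.
int32 PositionPrecision;
// Percentage of triangles to keep from LOD0. 1.0 means no reduction, 0.0 means no triangles.
float PercentTriangles;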
FMeshNaniteSettings(): bEnabled(false), PositionPrecision(MIN_int32), PercentTriangles(0.0f){}
FMeshNaniteSettings(const FMeshNaniteSettings& Other);
bool operator==(const FMeshNaniteSettings& Other) const;
bool operator!=(const FMeshNaniteSettings& Other) const;
};
// Engine\Source\Runtime\Engine\Classes\Engine\StaticMesh.h
class UStaticMesh : public UStreamableRenderAsset, (......)
{
(......)
public:
// Nanite settings of the static mesh.
FMeshNaniteSettings NaniteSettings;
// Returns true if the mesh has valid Nanite render data.
bool HasValidNaniteData() const
{
if (const FStaticMeshRenderData* SMRenderData = GetRenderData())
{
return SMRenderData->NaniteResources.PageStreamingStates.Num() > 0;
}
return false;
}
(......)
// Interfaces for the hi-res source model.
FStaticMeshSourceModel& GetHiResSourceModel();
const FStaticMeshSourceModel& GetHiResSourceModel() const;
FStaticMeshSourceModel&& MoveHiResSourceModel();
void SetHiResSourceModel(FStaticMeshSourceModel&& SourceModel);
bool LoadHiResMeshDescription(FMeshDescription& OutMeshDescription) const;
bool CloneHiResMeshDescription(FMeshDescription& OutMeshDescription) const;
FMeshDescription* CreateHiResMeshDescription();
FMeshDescription* CreateHiResMeshDescription(FMeshDescription MeshDescription);
FMeshDescription* GetHiResMeshDescription() const;
bool IsHiResMeshDescriptionValid() const;
void CommitHiResMeshDescription(const FCommitMeshDescriptionParams& Params);
void ClearHiResMeshDescription();
(......)
private:
// The hi-res source model.
FStaticMeshSourceModel HiResSourceModel;
(......)
};
// Engine\Source\Runtime\Engine\Public\StaticMeshResources.h
// Render data required by a static mesh.
class FStaticMeshRenderData
{
public:
(......)
// Nanite render resources.
Nanite::FResources NaniteResources;
(......)
};
// Engine\Source\Runtime\Engine\Public\Rendering\NaniteResources.h
// Maximum-count constants.
#define MAX_STREAMING_REQUESTS ( 128u * 1024u )
#define MAX_CLUSTER_TRIANGLES 128
#define MAX_CLUSTER_VERTICES 256
#define MAX_CLUSTER_INDICES ( MAX_CLUSTER_TRIANGLES * 3 )
#define MAX_NANITE_UVS 4
#define NUM_ROOT_PAGES 1u
// Whether to use triangle strip indices.
#define USE_STRIP_INDICES 1
// Cluster constants.
#define CLUSTER_PAGE_GPU_SIZE_BITS 17
#define CLUSTER_PAGE_GPU_SIZE ( 1 << CLUSTER_PAGE_GPU_SIZE_BITS )
#define CLUSTER_PAGE_DISK_SIZE ( CLUSTER_PAGE_GPU_SIZE * 2 )
#define MAX_CLUSTERS_PER_PAGE_BITS 10
#define MAX_CLUSTERS_PER_PAGE_MASK ( ( 1 << MAX_CLUSTERS_PER_PAGE_BITS ) - 1 )
#define MAX_CLUSTERS_PER_PAGE ( 1 << MAX_CLUSTERS_PER_PAGE_BITS )
#define MAX_CLUSTERS_PER_GROUP_BITS 9
#define MAX_CLUSTERS_PER_GROUP_MASK ( ( 1 << MAX_CLUSTERS_PER_GROUP_BITS ) - 1 )
#define MAX_CLUSTERS_PER_GROUP ( ( 1 << MAX_CLUSTERS_PER_GROUP_BITS ) - 1 )
#define MAX_CLUSTERS_PER_GROUP_TARGET 128
// Constants for the hierarchy, GPU pages, instances, groups, etc.
#define MAX_HIERACHY_CHILDREN_BITS 6
#define MAX_HIERACHY_CHILDREN ( 1 << MAX_HIERACHY_CHILDREN_BITS )
#define MAX_GPU_PAGES_BITS 14
#define MAX_GPU_PAGES ( 1 << MAX_GPU_PAGES_BITS )
#define MAX_INSTANCES_BITS 24
#define MAX_INSTANCES ( 1 << MAX_INSTANCES_BITS )
#define MAX_NODES_PER_PRIMITIVE_BITS 16
#define MAX_RESOURCE_PAGES_BITS 20
#define MAX_RESOURCE_PAGES (1 << MAX_RESOURCE_PAGES_BITS)
#define MAX_GROUP_PARTS_BITS 3
#define MAX_GROUP_PARTS_MASK ((1 << MAX_GROUP_PARTS_BITS) - 1)
#define MAX_GROUP_PARTS (1 << MAX_GROUP_PARTS_BITS)
#define PERSISTENT_CLUSTER_CULLING_GROUP_SIZE 64
// BVH
#define MAX_BVH_NODE_FANOUT_BITS 3
#define MAX_BVH_NODE_FANOUT (1 << MAX_BVH_NODE_FANOUT_BITS)
#define MAX_BVH_NODES_PER_GROUP (PERSISTENT_CLUSTER_CULLING_GROUP_SIZE / MAX_BVH_NODE_FANOUT)
#define NUM_CULLING_FLAG_BITS 3
#define NUM_PACKED_CLUSTER_FLOAT4S 8
#define MAX_POSITION_QUANTIZATION_BITS 21 // (21*3 = 63) < 64
#define NORMAL_QUANTIZATION_BITS 9
#define MAX_TEXCOORD_QUANTIZATION_BITS 15
#define MAX_COLOR_QUANTIZATION_BITS 8
#define NUM_STREAMING_PRIORITY_CATEGORY_BITS 2
#define STREAMING_PRIORITY_CATEGORY_MASK ((1u << NUM_STREAMING_PRIORITY_CATEGORY_BITS) - 1u)
#define VIEW_FLAG_HZBTEST 0x1
#define MAX_TRANSCODE_GROUPS_PER_PAGE 128
#define VERTEX_COLOR_MODE_WHITE 0
#define VERTEX_COLOR_MODE_CONSTANT 1
#define VERTEX_COLOR_MODE_VARIABLE 2
#define NANITE_USE_SCRATCH_BUFFERS 1
#define NANITE_CLUSTER_FLAG_LEAF 0x1
namespace Nanite
{
// Unsigned integer vector.
struct FUIntVector
{
uint32 X, Y, Z;
bool operator==(const FUIntVector& V) const;
FORCEINLINE friend FArchive& operator<<(FArchive& Ar, FUIntVector& V);
};
// Packed hierarchy node.
struct FPackedHierarchyNode
{
FSphere LODBounds[MAX_BVH_NODE_FANOUT]; // LOD bounds stored as spheres.
struct
{
FVector BoxBoundsCenter;
uint32 MinLODError_MaxParentLODError;
} Misc0[MAX_BVH_NODE_FANOUT];
struct
{
FVector BoxBoundsExtent;
uint32 ChildStartReference;
} Misc1[MAX_BVH_NODE_FANOUT];
struct
{
uint32 ResourcePageIndex_NumPages_GroupPartSize;
} Misc2[MAX_BVH_NODE_FANOUT];
};
// Material triangle.
struct FMaterialTriangle
{
uint32 Index0;
uint32 Index1;
uint32 Index2;
uint32 MaterialIndex;
uint32 RangeCount;
};
// Extracts NumBits bits starting at bit Offset from Value.
uint32 GetBits(uint32 Value, uint32 NumBits, uint32 Offset)
{
uint32 Mask = (1u << NumBits) - 1u;
return (Value >> Offset) & Mask;
}
// Writes Bits into the NumBits-wide field of Value at bit Offset (Bits is not masked, so it must fit in NumBits).
void SetBits(uint32& Value, uint32 Bits, uint32 NumBits, uint32 Offset)
{
uint32 Mask = (1u << NumBits) - 1u;
Mask <<= Offset;
Value = (Value & ~Mask) | (Bits << Offset);
}
// Packed cluster as consumed by the GPU.
struct FPackedCluster
{
// Members required for rasterization.
FIntVector QuantizedPosStart;
uint32 NumVerts_PositionOffset; // NumVerts:9, PositionOffset:23
FVector MeshBoundsMin;
uint32 NumTris_IndexOffset; // NumTris:8, IndexOffset: 24
FVector MeshBoundsDelta;
uint32 BitsPerIndex_QuantizedPosShift_PosBits; // BitsPerIndex:4, QuantizedPosShift:6, QuantizedPosBits:5.5.5
// Members required for culling.
FSphere LODBounds;
FVector BoxBoundsCenter;
uint32 LODErrorAndEdgeLength;
FVector BoxBoundsExtent;
uint32 Flags;
// Members required for materials.
uint32 AttributeOffset_BitsPerAttribute; // AttributeOffset: 22, BitsPerAttribute: 10
uint32 DecodeInfoOffset_NumUVs_ColorMode; // DecodeInfoOffset: 22, NumUVs: 3, ColorMode: 2
uint32 UV_Prec; // U0:4, V0:4, U1:4, V1:4, U2:4, V2:4, U3:4, V3:4
uint32 PackedMaterialInfo;
uint32 ColorMin;
uint32 ColorBits; // R:4, G:4, B:4, A:4
uint32 GroupIndex; // Debug only
uint32 Pad0;
uint32 GetNumVerts() const { return GetBits(NumVerts_PositionOffset, 9, 0); }
uint32 GetPositionOffset() const { return GetBits(NumVerts_PositionOffset, 23, 9); }
uint32 GetNumTris() const { return GetBits(NumTris_IndexOffset, 8, 0); }
uint32 GetIndexOffset() const { return GetBits(NumTris_IndexOffset, 24, 8); }
uint32 GetBitsPerIndex() const { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 4, 0); }
uint32 GetQuantizedPosShift() const { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 6, 4); }
uint32 GetPosBitsX() const { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 5, 10); }
uint32 GetPosBitsY() const { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 5, 15); }
uint32 GetPosBitsZ() const { return GetBits(BitsPerIndex_QuantizedPosShift_PosBits, 5, 20); }
uint32 GetAttributeOffset() const { return GetBits(AttributeOffset_BitsPerAttribute, 22, 0); }
uint32 GetBitsPerAttribute() const { return GetBits(AttributeOffset_BitsPerAttribute, 10, 22); }
void SetNumVerts(uint32 NumVerts) { SetBits(NumVerts_PositionOffset, NumVerts, 9, 0); }
void SetPositionOffset(uint32 Offset) { SetBits(NumVerts_PositionOffset, Offset, 23, 9); }
void SetNumTris(uint32 NumTris) { SetBits(NumTris_IndexOffset, NumTris, 8, 0); }
void SetIndexOffset(uint32 Offset) { SetBits(NumTris_IndexOffset, Offset, 24, 8); }
void SetBitsPerIndex(uint32 BitsPerIndex) { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, BitsPerIndex, 4, 0); }
void SetQuantizedPosShift(uint32 PosShift) { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, PosShift, 6, 4); }
void SetPosBitsX(uint32 NumBits) { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, NumBits, 5, 10); }
void SetPosBitsY(uint32 NumBits) { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, NumBits, 5, 15); }
void SetPosBitsZ(uint32 NumBits) { SetBits(BitsPerIndex_QuantizedPosShift_PosBits, NumBits, 5, 20); }
void SetAttributeOffset(uint32 Offset) { SetBits(AttributeOffset_BitsPerAttribute, Offset, 22, 0); }
void SetBitsPerAttribute(uint32 Bits) { SetBits(AttributeOffset_BitsPerAttribute, Bits, 10, 22); }
void SetDecodeInfoOffset(uint32 Offset) { SetBits(DecodeInfoOffset_NumUVs_ColorMode, Offset, 22, 0); }
void SetNumUVs(uint32 Num) { SetBits(DecodeInfoOffset_NumUVs_ColorMode, Num, 3, 22); }
void SetColorMode(uint32 Mode) { SetBits(DecodeInfoOffset_NumUVs_ColorMode, Mode, 2, 22+3); }
};
// Page streaming state.
struct FPageStreamingState
{
uint32 BulkOffset;
uint32 BulkSize;
uint32 PageUncompressedSize;
uint32 DependenciesStart;
uint32 DependenciesNum;
};
// Hierarchy fixup.
class FHierarchyFixup
{
public:
FHierarchyFixup() {}
FHierarchyFixup( uint32 InPageIndex, uint32 NodeIndex, uint32 ChildIndex, uint32 InClusterGroupPartStartIndex, uint32 PageDependencyStart, uint32 PageDependencyNum )
{
PageIndex = InPageIndex;
HierarchyNodeAndChildIndex = ( NodeIndex << MAX_HIERACHY_CHILDREN_BITS ) | ChildIndex;
ClusterGroupPartStartIndex = InClusterGroupPartStartIndex;
PageDependencyStartAndNum = (PageDependencyStart << MAX_GROUP_PARTS_BITS) | PageDependencyNum;
}
uint32 GetPageIndex() const { return PageIndex; }
uint32 GetNodeIndex() const { return HierarchyNodeAndChildIndex >> MAX_HIERACHY_CHILDREN_BITS; }
uint32 GetChildIndex() const { return HierarchyNodeAndChildIndex & ( MAX_HIERACHY_CHILDREN - 1 ); }
uint32 GetClusterGroupPartStartIndex() const { return ClusterGroupPartStartIndex; }
uint32 GetPageDependencyStart() const { return PageDependencyStartAndNum >> MAX_GROUP_PARTS_BITS; }
uint32 GetPageDependencyNum() const { return PageDependencyStartAndNum & MAX_GROUP_PARTS_MASK; }
uint32 PageIndex;
uint32 HierarchyNodeAndChildIndex;
uint32 ClusterGroupPartStartIndex;
uint32 PageDependencyStartAndNum;
};
// Cluster fixup.
class FClusterFixup
{
public:
FClusterFixup() {}
FClusterFixup( uint32 PageIndex, uint32 ClusterIndex, uint32 PageDependencyStart, uint32 PageDependencyNum )
{
PageAndClusterIndex = ( PageIndex << MAX_CLUSTERS_PER_PAGE_BITS ) | ClusterIndex;
PageDependencyStartAndNum = (PageDependencyStart << MAX_GROUP_PARTS_BITS) | PageDependencyNum;
}
uint32 GetPageIndex() const { return PageAndClusterIndex >> MAX_CLUSTERS_PER_PAGE_BITS; }
uint32 GetClusterIndex() const { return PageAndClusterIndex & (MAX_CLUSTERS_PER_PAGE - 1u); }
uint32 GetPageDependencyStart() const { return PageDependencyStartAndNum >> MAX_GROUP_PARTS_BITS; }
uint32 GetPageDependencyNum() const { return PageDependencyStartAndNum & MAX_GROUP_PARTS_MASK; }
uint32 PageAndClusterIndex;
uint32 PageDependencyStartAndNum;
};
// Page disk header.
struct FPageDiskHeader
{
uint32 GpuSize;
uint32 NumClusters;
uint32 NumRawFloat4s;
uint32 NumTexCoords;
uint32 DecodeInfoOffset;
uint32 StripBitmaskOffset;
uint32 VertexRefBitmaskOffset;
};
// Cluster disk header.
struct FClusterDiskHeader
{
uint32 IndexDataOffset;
uint32 VertexRefDataOffset;
uint32 PositionDataOffset;
uint32 AttributeDataOffset;
uint32 NumPrevRefVerticesBeforeDwords;
uint32 NumPrevNewVerticesBeforeDwords;
};
// Fixup chunk.
class FFixupChunk //TODO: rename to something else
{
public:
struct FHeader
{
uint16 NumClusters = 0;
uint16 NumHierachyFixups = 0;
uint16 NumClusterFixups = 0;
uint16 Pad = 0;
} Header;
uint8 Data[ sizeof(FHierarchyFixup) * MAX_CLUSTERS_PER_PAGE + sizeof( FClusterFixup ) * MAX_CLUSTERS_PER_PAGE ];
FClusterFixup& GetClusterFixup( uint32 Index ) const { check( Index < Header.NumClusterFixups ); return ( (FClusterFixup*)( Data + Header.NumHierachyFixups * sizeof( FHierarchyFixup ) ) )[ Index ]; }
FHierarchyFixup& GetHierarchyFixup( uint32 Index ) const { check( Index < Header.NumHierachyFixups ); return ((FHierarchyFixup*)Data)[ Index ]; }
uint32 GetSize() const { return sizeof( Header ) + Header.NumHierachyFixups * sizeof( FHierarchyFixup ) + Header.NumClusterFixups * sizeof( FClusterFixup ); }
};
// Instance draw parameters.
struct FInstanceDraw
{
uint32 InstanceId;
uint32 ViewId;
};
// Nanite render resources.
struct FResources
{
// Persistent state.
TArray< uint8 > RootClusterPage; // Root page is loaded on resource load, so we always have something to draw.
FByteBulkData StreamableClusterPages; // Remaining pages are streamed on demand.
TArray< uint16 > ImposterAtlas;
TArray< FPackedHierarchyNode > HierarchyNodes;
TArray< uint32 > HierarchyRootOffsets;
TArray< FPageStreamingState > PageStreamingStates;
TArray< uint32 > PageDependencies;
int32 PositionPrecision = 0;
bool bLZCompressed = false;
// Runtime state.
uint32 RuntimeResourceID = 0xFFFFFFFFu;
int32 HierarchyOffset = INDEX_NONE;
int32 RootPageIndex = INDEX_NONE;
uint32 NumHierarchyNodes = 0;
(......)
ENGINE_API void InitResources();
ENGINE_API bool ReleaseResources();
ENGINE_API void Serialize(FArchive& Ar, UObject* Owner);
};
// GPU-side buffers holding Nanite resource data.
class FGlobalResources : public FRenderResource
{
public:
struct PassBuffers
{
// Candidate (i.e. not yet culled) nodes and clusters buffer.
TRefCountPtr<FRDGPooledBuffer> CandidateNodesAndClustersBuffer;
TRefCountPtr<FRDGPooledBuffer> StatsRasterizeArgsSWHWBuffer;
};
uint32 StatsRenderFlags = 0;
uint32 StatsDebugFlags = 0;
public:
virtual void InitRHI() override;
virtual void ReleaseRHI() override;
ENGINE_API void Update(FRDGBuilder& GraphBuilder); // Called once per frame before any Nanite rendering has occurred.
ENGINE_API static uint32 GetMaxCandidateClusters();
ENGINE_API static uint32 GetMaxVisibleClusters();
ENGINE_API static uint32 GetMaxNodes();
(......)
private:
PassBuffers MainPassBuffers;
PassBuffers PostPassBuffers;
class FVertexFactory* VertexFactory = nullptr;
TRefCountPtr<FRDGPooledBuffer> StatsBuffer;
// Dummy structured buffer with stride8
TRefCountPtr<FRDGPooledBuffer> StructureBufferStride8;
#if NANITE_USE_SCRATCH_BUFFERS
TRefCountPtr<FRDGPooledBuffer> PrimaryVisibleClustersBuffer;
// Used for scratch memory (transient only)
TRefCountPtr<FRDGPooledBuffer> ScratchVisibleClustersBuffer;
TRefCountPtr<FRDGPooledBuffer> ScratchOccludedInstancesBuffer;
#endif
};
extern ENGINE_API TGlobalResource< FGlobalResources > GGlobalResources;
} // namespace Nanite
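The packed fields in FPackedCluster all go through the GetBits and SetBits helpers shown in the listing. The following standalone sketch re-implements those two helpers with standard types and packs the `NumVerts:9, PositionOffset:23` pair into one 32-bit word. `PackNumVertsPositionOffset` is a hypothetical helper added for illustration, and the extra `assert` guarding against overflowing a field is an addition that is not present in the listing.

```cpp
#include <cassert>
#include <cstdint>

// Extracts NumBits bits starting at bit Offset (NumBits must be < 32).
inline std::uint32_t GetBits(std::uint32_t Value, std::uint32_t NumBits, std::uint32_t Offset)
{
    const std::uint32_t Mask = (1u << NumBits) - 1u;
    return (Value >> Offset) & Mask;
}

// Writes Bits into the NumBits-wide field at bit Offset, leaving other bits untouched.
inline void SetBits(std::uint32_t& Value, std::uint32_t Bits, std::uint32_t NumBits, std::uint32_t Offset)
{
    std::uint32_t Mask = (1u << NumBits) - 1u;
    assert(Bits <= Mask && "value does not fit in the field"); // guard added for this sketch
    Mask <<= Offset;
    Value = (Value & ~Mask) | (Bits << Offset);
}

// Hypothetical helper: packs NumVerts into bits [0, 9) and PositionOffset into bits [9, 32).
inline std::uint32_t PackNumVertsPositionOffset(std::uint32_t NumVerts, std::uint32_t PositionOffset)
{
    std::uint32_t Packed = 0;
    SetBits(Packed, NumVerts, 9, 0);
    SetBits(Packed, PositionOffset, 23, 9);
    return Packed;
}
```

The field widths appear to follow from the constants in the listing: MAX_CLUSTER_VERTICES is 256, which needs 9 bits, and MAX_CLUSTER_TRIANGLES is 128, which needs the 8 bits reserved for NumTris.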
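Building Nanite data involves many concepts, so they are gathered here.
The most central and fundamental concept in Nanite is the Cluster: a group of adjacent triangles:
Top: normal rendering; middle: triangle visualization; bottom: cluster visualization.
A cluster can be batched dynamically with neighboring clusters, or with clusters from adjacent LODs, so that the image stays seamless with no visible popping. See the video below:
Clusters were not invented by UE; they were used earlier by Ubisoft and the Frostbite engine. See the papers GPU-Driven Rendering Pipeline and Optimizing the Graphics Pipeline with Compute.
The definitions of Cluster and the other basic types are as follows: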
// Engine\Source\Developer\NaniteBuilder\Private\Cluster.h
// Mesh cluster. A model is partitioned into a number of clusters.
class FCluster
{
public:
FCluster();
FCluster( FCluster& SrcCluster, uint32 TriBegin, uint32 TriEnd, const TArray< uint32 >& TriIndexes );
FCluster( const TArray< const FCluster*, TInlineAllocator<16> >& MergeList );
FCluster(const TArray< FStaticMeshBuildVertex >& InVerts,const TArrayView< const uint32 >& InIndexes,
const TArrayView< const int32 >& InMaterialIndexes,const TBitArray<>& InBoundaryEdges,uint32 TriBegin, uint32 TriEnd, const TArray< uint32 >& TriIndexes, uint32 NumTexCoords, bool bHasColors );
// Simplifies the cluster down to the requested number of triangles.
float Simplify( uint32 NumTris );
// Splits the cluster.
void Split( FGraphPartitioner& Partitioner ) const;
(......)
static const uint32 ClusterSize = 128;
// Counters.
uint32 NumVerts = 0;
uint32 NumTris = 0;
uint32 NumTexCoords = 0;
bool bHasColors = false;
// Mesh data.
TArray< float > Verts; // Vertices.
TArray< uint32 > Indexes; // Indices.
TArray< int32 > MaterialIndexes; // Material indices.
TBitArray<> BoundaryEdges; // Boundary edges.
TBitArray<> ExternalEdges; // External edges.
uint32 NumExternalEdges; // Number of external edges.
TMap< uint32, uint32 > AdjacentClusters; // Adjacent clusters.
// Bounds data.
FBounds Bounds; // Bounding box.
FSphere SphereBounds;
FSphere LODBounds;
FVector MeshBoundsMin; // Mesh bounds.
FVector MeshBoundsDelta;
float SurfaceArea = 0.0f;
uint32 GUID = 0;
int32 MipLevel = 0;
// Quantized position data.
TArray<FIntVector> QuantizedPositions;
FIntVector QuantizedPosStart = { 0u, 0u, 0u };
uint32 QuantizedPosShift = 0u;
FIntVector QuantizedPosBits = {};
float EdgeLength = 0.0f;
float LODError = 0.0f;
// Owning group data.
uint32 GroupIndex = MAX_uint32;
uint32 GroupPartIndex = MAX_uint32;
uint32 GeneratingGroupIndex= MAX_uint32;
// Material ranges.
TArray<FMaterialRange, TInlineAllocator<4>> MaterialRanges;
// Strip index data.
FStripDesc StripDesc;
TArray<uint8> StripIndexData;
};
// Engine\Source\Developer\NaniteBuilder\Private\ClusterDAG.h
// Cluster group: a collection of clusters.
struct FClusterGroup
{
// Bounds.
FSphere Bounds;
FSphere LODBounds;
// Error.
float MinLODError;
float MaxParentLODError;
// Mip level and mesh index.
int32 MipLevel;
uint32 MeshIndex;
// Page indices.
uint32 PageIndexStart;
uint32 PageIndexNum;
// Child indices.
TArray< uint32 > Children;
friend FArchive& operator<<(FArchive& Ar, FClusterGroup& Group);
};
// Engine\Source\Developer\NaniteBuilder\Private\NaniteEncode.cpp
// All or part of an FClusterGroup after splitting.
struct FClusterGroupPart
{
TArray<uint32> Clusters; // May be reordered during page allocation, so a list is stored here.
FBounds Bounds; // Bounding box.
uint32 PageIndex; // Page index.
uint32 GroupIndex; // Index of the owning group.
uint32 HierarchyNodeIndex; // Hierarchy node index.
uint32 HierarchyChildIndex; // Hierarchy child index.
uint32 PageClusterOffset; // Offset into the page's cluster list.
};
// Sections of a page.
struct FPageSections
{
uint32 Cluster = 0;
uint32 MaterialTable = 0;
uint32 DecodeInfo = 0;
uint32 Index = 0;
uint32 Position = 0;
uint32 Attribute = 0;
uint32 GetMaterialTableSize() const { return Align(MaterialTable, 16); }
uint32 GetClusterOffset() const { return 0; }
uint32 GetMaterialTableOffset() const { return Cluster; }
uint32 GetDecodeInfoOffset() const { return Cluster + GetMaterialTableSize(); }
uint32 GetIndexOffset() const { return Cluster + GetMaterialTableSize() + DecodeInfo; }
uint32 GetPositionOffset() const { return Cluster + GetMaterialTableSize() + DecodeInfo + Index; }
uint32 GetAttributeOffset() const { return Cluster + GetMaterialTableSize() + DecodeInfo + Index + Position; }
uint32 GetTotal() const { return Cluster + GetMaterialTableSize() + DecodeInfo + Index + Position + Attribute; }
FPageSections GetOffsets() const
{
return FPageSections{ GetClusterOffset(), GetMaterialTableOffset(), GetDecodeInfoOffset(), GetIndexOffset(), GetPositionOffset(), GetAttributeOffset() };
}
void operator+=(const FPageSections& Other)
{
Cluster += Other.Cluster;
MaterialTable += Other.MaterialTable;
DecodeInfo += Other.DecodeInfo;
Index += Other.Index;
Position += Other.Position;
Attribute += Other.Attribute;
}
};
// Cluster page.
struct FPage
{
uint32 PartsStartIndex = 0; // Start index of the FClusterGroupParts.
uint32 PartsNum = 0; // Number of FClusterGroupParts.
uint32 NumClusters = 0; // Number of clusters.
FPageSections GpuSizes; // GPU sizes.
};
// Encoding info.
struct FEncodingInfo
{
uint32 BitsPerIndex; // Bits per index.
uint32 BitsPerAttribute; // Bits per attribute.
uint32 UVPrec; // UV precision.
uint32 ColorMode; // Color mode.
FIntVector4 ColorMin; // Minimum color.
FIntVector4 ColorBits; // Color bit counts.
FPageSections GpuSizes; // GPU sizes.
// UV encoding info.
FGeometryEncodingUVInfo UVInfos[MAX_NANITE_UVS];
};
// Intermediate node of the cluster hierarchy, used while building the hierarchy.
struct FIntermediateNode
{
uint32 PartIndex = MAX_uint32; // FClusterGroupPart index.
uint32 MipLevel = MAX_int32; // Mip level.
bool bLeaf = false; // Whether this is a leaf node.
FBounds Bound; // Bounding box.
TArray< uint32 > Children; // Child list.
};
// Engine\Source\Developer\NaniteBuilder\Private\ImposterAtlas.h
// Atlas that clusters are rasterized into.
class FImposterAtlas
{
public:
static constexpr uint32 AtlasSize = 12;
static constexpr uint32 TileSize = 12;
FImposterAtlas( TArray< uint16 >& InPixels, const FBounds& MeshBounds );
// Rasterizes all triangles of the given cluster into this FImposterAtlas.
void Rasterize( const FIntPoint& TilePos, const FCluster& Cluster, uint32 ClusterIndex );
private:
TArray< uint16 >& Pixels;
FVector BoundsCenter;
FVector BoundsExtent;
FMatrix GetLocalToImposter( const FIntPoint& TilePos ) const;
};
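The FPageSections accessors in the NaniteEncode.cpp listing above describe how the sections of a page are laid out back to back, with only the material table padded to a 16-byte boundary. The sketch below mirrors that arithmetic in standalone C++ so the offsets can be checked by hand. `FPageSectionsSketch` and `AlignUp` are hypothetical stand-ins for the engine's FPageSections and Align, and the section sizes in the usage note are made-up numbers.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Value up to the next multiple of Alignment (which must be a power of two).
inline std::uint32_t AlignUp(std::uint32_t Value, std::uint32_t Alignment)
{
    assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
    return (Value + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical mirror of FPageSections: each member is the byte size of one section.
struct FPageSectionsSketch
{
    std::uint32_t Cluster = 0;
    std::uint32_t MaterialTable = 0;
    std::uint32_t DecodeInfo = 0;
    std::uint32_t Index = 0;
    std::uint32_t Position = 0;
    std::uint32_t Attribute = 0;

    // The material table is the only section that gets padded (to 16 bytes).
    std::uint32_t MaterialTableSize() const { return AlignUp(MaterialTable, 16); }

    // Each offset is the running sum of the sizes of all preceding sections.
    std::uint32_t MaterialTableOffset() const { return Cluster; }
    std::uint32_t DecodeInfoOffset() const { return Cluster + MaterialTableSize(); }
    std::uint32_t IndexOffset() const { return DecodeInfoOffset() + DecodeInfo; }
    std::uint32_t PositionOffset() const { return IndexOffset() + Index; }
    std::uint32_t AttributeOffset() const { return PositionOffset() + Position; }
    std::uint32_t Total() const { return AttributeOffset() + Attribute; }
};
```

With sizes of 128, 20, 8, 100, 40, and 60 bytes, the material table is padded from 20 to 32 bytes, so the decode info starts at 160, the indices at 168, the positions at 268, the attributes at 308, and the page totals 368 bytes.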
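This subsection covers the preprocessing Nanite performs before rendering, including how Nanite's static data is built and the call flow.
Nanite builds the data it needs from the highest-resolution model through BuildNaniteFromHiResSourceModel. It is similar to FStaticMeshBuilder::Build() but skips mesh reduction. This process is called the Nanite fractional cut (Nanite-fractional-cut) and works as follows: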
// Engine\Source\Developer\MeshBuilder\Private\StaticMeshBuilder.cpp
static bool BuildNaniteFromHiResSourceModel(
UStaticMesh* StaticMesh,
const FMeshNaniteSettings NaniteSettings,
FBoxSphereBounds& HiResBoundsOut,
Nanite::FResources& NaniteResourcesOut)
{
// Skip static meshes without a hi-res mesh description.
if (ensure(StaticMesh->IsHiResMeshDescriptionValid()) == false)
{
return false;
}
TRACE_CPUPROFILER_EVENT_SCOPE(FStaticMeshBuilder::BuildNaniteFromHiResSourceModel);
// Fetch the model data.
FMeshDescription HiResMeshDescription = *StaticMesh->GetHiResMeshDescription();
FStaticMeshSourceModel& HiResSrcModel = StaticMesh->GetHiResSourceModel();
FMeshBuildSettings& HiResBuildSettings = HiResSrcModel.BuildSettings;
// Compute tangents, lightmap UVs, and so on.
FMeshDescriptionHelper MeshDescriptionHelper(&HiResBuildSettings);
MeshDescriptionHelper.SetupRenderMeshDescription(StaticMesh, HiResMeshDescription);
// Build temporary RenderData to pass to the subsequent Nanite build stage.
FStaticMeshRenderData HiResTempRenderData;
HiResTempRenderData.AllocateLODResources(1);
// Note that this takes LOD index 0 (i.e. the highest-resolution data).
FStaticMeshLODResources& HiResStaticMeshLOD = HiResTempRenderData.LODResources[0];
HiResStaticMeshLOD.MaxDeviation = 0.0f;
// Prepare the PerSectionIndices array to optimize the index buffer handed to the GPU.
TArray<TArray<uint32>> PerSectionIndices;
PerSectionIndices.AddDefaulted(HiResMeshDescription.PolygonGroups().Num());
HiResStaticMeshLOD.Sections.Empty(HiResMeshDescription.PolygonGroups().Num());
// Build the vertex and index buffers. WedgeMap and RemapVerts are not needed.
TArray<int32> WedgeMap, RemapVerts;
TArray<FStaticMeshBuildVertex> StaticMeshBuildVertices;
BuildVertexBuffer(StaticMesh, HiResMeshDescription, HiResBuildSettings, WedgeMap, HiResStaticMeshLOD.Sections, PerSectionIndices, StaticMeshBuildVertices, MeshDescriptionHelper.GetOverlappingCorners(), RemapVerts);
WedgeMap.Empty();
const uint32 NumTextureCoord = HiResMeshDescription.VertexInstanceAttributes().GetAttributesRef<FVector2D>(MeshAttribute::VertexInstance::TextureCoordinate).GetNumChannels();
// Only the render data and vertex data are needed from here on, so the MeshDescription can be emptied.
HiResMeshDescription.Empty();
// Concatenate the per-section index buffers.
TArray<uint32> CombinedIndices;
bool bNeeds32BitIndices = false;
BuildCombinedSectionIndices(PerSectionIndices, HiResStaticMeshLOD, CombinedIndices, bNeeds32BitIndices);
// Compute the bounds from the hi-res mesh before the Nanite build, because the build modifies StaticMeshBuildVertices.
ComputeBoundsFromVertexList(StaticMeshBuildVertices, HiResBoundsOut.Origin, HiResBoundsOut.BoxExtent, HiResBoundsOut.SphereRadius);
// The Nanite build requires the section material indices to be resolved from the SectionInfoMap already, because the indices are baked into FMaterialTriangles.
for (int32 SectionIndex = 0; SectionIndex < HiResStaticMeshLOD.Sections.Num(); SectionIndex++)
{
HiResStaticMeshLOD.Sections[SectionIndex].MaterialIndex = StaticMesh->GetSectionInfoMap().Get(0, SectionIndex).MaterialIndex;
}
// Run the Nanite build.
{
TRACE_CPUPROFILER_EVENT_SCOPE(FStaticMeshBuilder::BuildNaniteFromHiResSourceModel::Nanite);
Nanite::IBuilderModule& NaniteBuilderModule = Nanite::IBuilderModule::Get();
if (!NaniteBuilderModule.Build(NaniteResourcesOut, StaticMeshBuildVertices, CombinedIndices, HiResStaticMeshLOD.Sections, NumTextureCoord, NaniteSettings))
{
UE_LOG(LogStaticMesh, Error, TEXT("Failed to build Nanite for HiRes static mesh. See previous line(s) for details."));
return false;
}
}
return true;
}
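To make the index-concatenation step concrete, here is a minimal standalone sketch that uses the standard library instead of engine types. `SectionRange`, `CombinedIndexBuffer` and `CombineSectionIndices` are hypothetical names invented for this illustration and are not engine APIs. The sketch only mirrors the idea: record each section's start offset, triangle count and vertex range, and flag when 16-bit indices no longer suffice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for FStaticMeshSection: only the range fields are kept.
struct SectionRange
{
    uint32_t FirstIndex = 0;
    uint32_t NumTriangles = 0;
    uint32_t MinVertexIndex = 0;
    uint32_t MaxVertexIndex = 0;
};

// Hypothetical result type bundling the combined buffer and its metadata.
struct CombinedIndexBuffer
{
    std::vector<uint32_t> Indices;
    std::vector<SectionRange> Sections;
    bool bNeeds32BitIndices = false;
};

// Concatenates the per-section index lists into one buffer and records, for each
// section, where its triangles start and which vertex range they touch.
inline CombinedIndexBuffer CombineSectionIndices(
    const std::vector<std::vector<uint32_t>>& PerSectionIndices)
{
    CombinedIndexBuffer Out;
    for (const std::vector<uint32_t>& SectionIndices : PerSectionIndices)
    {
        // Each section is expected to hold whole triangles.
        assert(SectionIndices.size() % 3 == 0);

        SectionRange Range; // Empty sections keep this all-zero placeholder.
        if (!SectionIndices.empty())
        {
            Range.FirstIndex = static_cast<uint32_t>(Out.Indices.size());
            Range.NumTriangles = static_cast<uint32_t>(SectionIndices.size() / 3);
            Range.MinVertexIndex = *std::min_element(SectionIndices.begin(), SectionIndices.end());
            Range.MaxVertexIndex = *std::max_element(SectionIndices.begin(), SectionIndices.end());

            // Any index above 65535 no longer fits a 16-bit index buffer.
            Out.bNeeds32BitIndices = Out.bNeeds32BitIndices || Range.MaxVertexIndex > 0xFFFFu;

            Out.Indices.insert(Out.Indices.end(), SectionIndices.begin(), SectionIndices.end());
        }
        Out.Sections.push_back(Range);
    }
    return Out;
}
```

For example, feeding it three sections where the middle one is empty and the last one references vertex 70000 yields start offsets 0, 0 and 3 and sets the 32-bit flag.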
The code above relies on several important interfaces, which are analyzed below:
// Engine\Source\Runtime\Engine\Private\StaticMesh.cpp
// Whether a valid hi-res mesh description exists.
bool UStaticMesh::IsHiResMeshDescriptionValid() const
{
const FStaticMeshSourceModel& SourceModel = GetHiResSourceModel();
return SourceModel.IsMeshDescriptionValid();
}
// Engine\Source\Developer\MeshBuilder\Private\MeshDescriptionHelper.cpp
void FMeshDescriptionHelper::SetupRenderMeshDescription(UObject* Owner, FMeshDescription& RenderMeshDescription)
{
TRACE_CPUPROFILER_EVENT_SCOPE(FMeshDescriptionHelper::GetRenderMeshDescription);
UStaticMesh* StaticMesh = Cast<UStaticMesh>(Owner);
const bool bNaniteBuildEnabled = StaticMesh->NaniteSettings.bEnabled;
float ComparisonThreshold = (BuildSettings->bRemoveDegenerates && !bNaniteBuildEnabled) ? THRESH_POINTS_ARE_SAME : 0.0f;
// Make sure polygon normals, tangents and binormals are computed; this also removes degenerate triangles from the render mesh description.
FStaticMeshOperations::ComputeTriangleTangentsAndNormals(RenderMeshDescription, ComparisonThreshold);
FVertexInstanceArray& VertexInstanceArray = RenderMeshDescription.VertexInstances();
FStaticMeshAttributes Attributes(RenderMeshDescription);
TVertexInstanceAttributesRef<FVector> Normals = Attributes.GetVertexInstanceNormals();
TVertexInstanceAttributesRef<FVector> Tangents = Attributes.GetVertexInstanceTangents();
TVertexInstanceAttributesRef<float> BinormalSigns = Attributes.GetVertexInstanceBinormalSigns();
// Find overlapping corners to accelerate adjacency queries.
FStaticMeshOperations::FindOverlappingCorners(OverlappingCorners, RenderMeshDescription, ComparisonThreshold);
// Static meshes always blend the normals of overlapping corners.
EComputeNTBsFlags ComputeNTBsOptions = EComputeNTBsFlags::BlendOverlappingNormals;
ComputeNTBsOptions |= BuildSettings->bComputeWeightedNormals ? EComputeNTBsFlags::WeightedNTBs : EComputeNTBsFlags::None;
ComputeNTBsOptions |= BuildSettings->bRecomputeNormals ? EComputeNTBsFlags::Normals : EComputeNTBsFlags::None;
ComputeNTBsOptions |= BuildSettings->bUseMikkTSpace ? EComputeNTBsFlags::UseMikkTSpace : EComputeNTBsFlags::None;
// Nanite meshes do not compute tangent data.
if (!bNaniteBuildEnabled)
{
ComputeNTBsOptions |= BuildSettings->bRemoveDegenerates ? EComputeNTBsFlags::IgnoreDegenerateTriangles : EComputeNTBsFlags::None;
ComputeNTBsOptions |= BuildSettings->bRecomputeTangents ? EComputeNTBsFlags::Tangents : EComputeNTBsFlags::None;
}
// Compute any missing normals or tangents.
FStaticMeshOperations::ComputeTangentsAndNormals(RenderMeshDescription, ComputeNTBsOptions);
// Generate lightmap UVs.
if (BuildSettings->bGenerateLightmapUVs && VertexInstanceArray.Num() > 0)
{
TVertexInstanceAttributesRef<FVector2D> VertexInstanceUVs = Attributes.GetVertexInstanceUVs();
int32 NumIndices = VertexInstanceUVs.GetNumChannels();
//Verify the src light map channel
if (BuildSettings->SrcLightmapIndex >= NumIndices)
{
BuildSettings->SrcLightmapIndex = 0;
}
//Verify the destination light map channel
if (BuildSettings->DstLightmapIndex >= NumIndices)
{
//Make sure we do not add illegal UV Channel index
if (BuildSettings->DstLightmapIndex >= MAX_MESH_TEXTURE_COORDS_MD)
{
BuildSettings->DstLightmapIndex = MAX_MESH_TEXTURE_COORDS_MD - 1;
}
//Add some unused UVChannel to the mesh description for the lightmapUVs
VertexInstanceUVs.SetNumChannels(BuildSettings->DstLightmapIndex + 1);
BuildSettings->DstLightmapIndex = NumIndices;
}
FStaticMeshOperations::CreateLightMapUVLayout(RenderMeshDescription,
BuildSettings->SrcLightmapIndex,
BuildSettings->DstLightmapIndex,
BuildSettings->MinLightmapResolution,
(ELightmapUVVersion)StaticMesh->GetLightmapUVVersion(),
OverlappingCorners);
}
}
// Engine\Source\Developer\MeshBuilder\Private\StaticMeshBuilder.cpp
// Build the vertex buffer.
void BuildVertexBuffer(
UStaticMesh *StaticMesh
, const FMeshDescription& MeshDescription
, const FMeshBuildSettings& BuildSettings
, TArray<int32>& OutWedgeMap
, FStaticMeshSectionArray& OutSections
, TArray<TArray<uint32> >& OutPerSectionIndices
, TArray< FStaticMeshBuildVertex >& StaticMeshBuildVertices
, const FOverlappingCorners& OverlappingCorners
, TArray<int32>& RemapVerts)
{
TRACE_CPUPROFILER_EVENT_SCOPE(BuildVertexBuffer);
TArray<int32> RemapVertexInstanceID;
// Set up the vertex buffer elements.
const int32 NumVertexInstances = MeshDescription.VertexInstances().GetArraySize();
StaticMeshBuildVertices.Reserve(NumVertexInstances);
FStaticMeshConstAttributes Attributes(MeshDescription);
TPolygonGroupAttributesConstRef<FName> PolygonGroupImportedMaterialSlotNames = Attributes.GetPolygonGroupMaterialSlotNames();
TVertexAttributesConstRef<FVector> VertexPositions = Attributes.GetVertexPositions();
TVertexInstanceAttributesConstRef<FVector> VertexInstanceNormals = Attributes.GetVertexInstanceNormals();
TVertexInstanceAttributesConstRef<FVector> VertexInstanceTangents = Attributes.GetVertexInstanceTangents();
TVertexInstanceAttributesConstRef<float> VertexInstanceBinormalSigns = Attributes.GetVertexInstanceBinormalSigns();
TVertexInstanceAttributesConstRef<FVector4> VertexInstanceColors = Attributes.GetVertexInstanceColors();
TVertexInstanceAttributesConstRef<FVector2D> VertexInstanceUVs = Attributes.GetVertexInstanceUVs();
const bool bHasColors = VertexInstanceColors.IsValid();
const bool bIgnoreTangents = StaticMesh->NaniteSettings.bEnabled;
const uint32 NumTextureCoord = VertexInstanceUVs.GetNumChannels();
const FMatrix ScaleMatrix = FScaleMatrix(BuildSettings.BuildScale3D).Inverse().GetTransposed();
TMap<FPolygonGroupID, int32> PolygonGroupToSectionIndex;
for (const FPolygonGroupID PolygonGroupID : MeshDescription.PolygonGroups().GetElementIDs())
{
int32& SectionIndex = PolygonGroupToSectionIndex.FindOrAdd(PolygonGroupID);
SectionIndex = OutSections.Add(FStaticMeshSection());
FStaticMeshSection& StaticMeshSection = OutSections[SectionIndex];
StaticMeshSection.MaterialIndex = StaticMesh->GetMaterialIndexFromImportedMaterialSlotName(PolygonGroupImportedMaterialSlotNames[PolygonGroupID]);
if (StaticMeshSection.MaterialIndex == INDEX_NONE)
{
StaticMeshSection.MaterialIndex = PolygonGroupID.GetValue();
}
}
int32 ReserveIndicesCount = MeshDescription.Triangles().Num() * 3;
// Fill the remap array with INDEX_NONE.
RemapVerts.AddZeroed(ReserveIndicesCount);
for (int32& RemapIndex : RemapVerts)
{
RemapIndex = INDEX_NONE;
}
// Initialize the wedge map OutWedgeMap.
OutWedgeMap.Reset();
OutWedgeMap.AddZeroed(ReserveIndicesCount);
float VertexComparisonThreshold = BuildSettings.bRemoveDegenerates ? THRESH_POINTS_ARE_SAME : 0.0f;
int32 WedgeIndex = 0;
for (const FTriangleID TriangleID : MeshDescription.Triangles().GetElementIDs())
{
const FPolygonGroupID PolygonGroupID = MeshDescription.GetTrianglePolygonGroup(TriangleID);
const int32 SectionIndex = PolygonGroupToSectionIndex[PolygonGroupID];
TArray<uint32>& SectionIndices = OutPerSectionIndices[SectionIndex];
TArrayView<const FVertexID> VertexIDs = MeshDescription.GetTriangleVertices(TriangleID);
FVector CornerPositions[3];
for (int32 TriVert = 0; TriVert < 3; ++TriVert)
{
CornerPositions[TriVert] = VertexPositions[VertexIDs[TriVert]];
}
FOverlappingThresholds OverlappingThresholds;
OverlappingThresholds.ThresholdPosition = VertexComparisonThreshold;
// Skip degenerate triangles, i.e. those with two coincident corners.
if (PointsEqual(CornerPositions[0], CornerPositions[1], OverlappingThresholds)
|| PointsEqual(CornerPositions[0], CornerPositions[2], OverlappingThresholds)
|| PointsEqual(CornerPositions[1], CornerPositions[2], OverlappingThresholds))
{
WedgeIndex += 3;
continue;
}
TArrayView<const FVertexInstanceID> VertexInstanceIDs = MeshDescription.GetTriangleVertexInstances(TriangleID);
for (int32 TriVert = 0; TriVert < 3; ++TriVert, ++WedgeIndex)
{
const FVertexInstanceID VertexInstanceID = VertexInstanceIDs[TriVert];
const FVector& VertexPosition = CornerPositions[TriVert];
const FVector& VertexInstanceNormal = VertexInstanceNormals[VertexInstanceID];
const FVector& VertexInstanceTangent = VertexInstanceTangents[VertexInstanceID];
const float VertexInstanceBinormalSign = VertexInstanceBinormalSigns[VertexInstanceID];
FStaticMeshBuildVertex StaticMeshVertex;
StaticMeshVertex.Position = VertexPosition * BuildSettings.BuildScale3D;
// For a Nanite mesh, assign a fixed tangent and binormal directly.
if( bIgnoreTangents )
{
StaticMeshVertex.TangentX = FVector( 1.0f, 0.0f, 0.0f );
StaticMeshVertex.TangentY = FVector( 0.0f, 1.0f, 0.0f );
}
else
{
StaticMeshVertex.TangentX = ScaleMatrix.TransformVector(VertexInstanceTangent).GetSafeNormal();
StaticMeshVertex.TangentY = ScaleMatrix.TransformVector(FVector::CrossProduct(VertexInstanceNormal, VertexInstanceTangent) * VertexInstanceBinormalSign).GetSafeNormal();
}
StaticMeshVertex.TangentZ = ScaleMatrix.TransformVector(VertexInstanceNormal).GetSafeNormal();
if (bHasColors)
{
const FVector4& VertexInstanceColor = VertexInstanceColors[VertexInstanceID];
const FLinearColor LinearColor(VertexInstanceColor);
StaticMeshVertex.Color = LinearColor.ToFColor(true);
}
else
{
StaticMeshVertex.Color = FColor::White;
}
const uint32 MaxNumTexCoords = FMath::Min<int32>(MAX_MESH_TEXTURE_COORDS_MD, MAX_STATIC_TEXCOORDS);
for (uint32 UVIndex = 0; UVIndex < MaxNumTexCoords; ++UVIndex)
{
if(UVIndex < NumTextureCoord)
{
StaticMeshVertex.UVs[UVIndex] = VertexInstanceUVs.Get(VertexInstanceID, UVIndex);
}
else
{
StaticMeshVertex.UVs[UVIndex] = FVector2D(0.0f, 0.0f);
}
}
// Never add a duplicate vertex instance. Look up by WedgeIndex, which is what OverlappingCorners is keyed on.
const TArray<int32>& DupVerts = OverlappingCorners.FindIfOverlapping(WedgeIndex);
int32 Index = INDEX_NONE;
for (int32 k = 0; k < DupVerts.Num(); k++)
{
if (DupVerts[k] >= WedgeIndex)
{
break;
}
int32 Location = RemapVerts.IsValidIndex(DupVerts[k]) ? RemapVerts[DupVerts[k]] : INDEX_NONE;
if (Location != INDEX_NONE && AreVerticesEqual(StaticMeshVertex, StaticMeshBuildVertices[Location], VertexComparisonThreshold))
{
Index = Location;
break;
}
}
if (Index == INDEX_NONE)
{
Index = StaticMeshBuildVertices.Add(StaticMeshVertex);
}
RemapVerts[WedgeIndex] = Index;
OutWedgeMap[WedgeIndex] = Index;
SectionIndices.Add( Index );
}
}
// Optimize before the buffers are set up.
if (NumVertexInstances < 100000 * 3)
{
BuildOptimizationHelper::CacheOptimizeVertexAndIndexBuffer(StaticMeshBuildVertices, OutPerSectionIndices, OutWedgeMap);
}
}
// Build the combined section indices.
static void BuildCombinedSectionIndices(
const TArray<TArray<uint32>>& PerSectionIndices,
FStaticMeshLODResources& StaticMeshLODInOut,
TArray<uint32>& CombinedIndicesOut,
bool& bNeeds32BitIndicesOut )
{
bNeeds32BitIndicesOut = false;
for (int32 SectionIndex = 0; SectionIndex < StaticMeshLODInOut.Sections.Num(); SectionIndex++)
{
FStaticMeshSection& Section = StaticMeshLODInOut.Sections[SectionIndex];
const TArray<uint32>& SectionIndices = PerSectionIndices[SectionIndex];
Section.FirstIndex = 0;
Section.NumTriangles = 0;
Section.MinVertexIndex = 0;
Section.MaxVertexIndex = 0;
if (SectionIndices.Num())
{
Section.FirstIndex = CombinedIndicesOut.Num();
Section.NumTriangles = SectionIndices.Num() / 3;
CombinedIndicesOut.AddUninitialized(SectionIndices.Num());
uint32* DestPtr = &CombinedIndicesOut[Section.FirstIndex];
uint32 const* SrcPtr = SectionIndices.GetData();
Section.MinVertexIndex = *SrcPtr;
Section.MaxVertexIndex = *SrcPtr;
for (int32 Index = 0; Index < SectionIndices.Num(); Index++)
{
uint32 VertIndex = *SrcPtr++;
bNeeds32BitIndicesOut |= (VertIndex > MAX_uint16);
Section.MinVertexIndex = FMath::Min<uint32>(VertIndex, Section.MinVertexIndex);
Section.MaxVertexIndex = FMath::Max<uint32>(VertIndex, Section.MaxVertexIndex);
*DestPtr++ = VertIndex;
}
}
}
}
// Compute the bounding box and bounding sphere from the vertices.
static void ComputeBoundsFromVertexList(const TArray<FStaticMeshBuildVertex>& Vertices, FVector& OriginOut, FVector& ExtentOut, float& RadiusOut)
{
// Compute the bounding box.
FBox BoundingBox(ForceInit);
for (int32 VertexIndex = 0; VertexIndex < Vertices.Num(); VertexIndex++)
{
BoundingBox += Vertices[VertexIndex].Position;
}
BoundingBox.GetCenterAndExtents(OriginOut, ExtentOut);
// Compute the sphere, using the box center as the sphere center.
RadiusOut = 0.0f;
for (int32 VertexIndex = 0; VertexIndex < Vertices.Num(); VertexIndex++)
{
RadiusOut = FMath::Max((Vertices[VertexIndex].Position-OriginOut).Size(), RadiusOut);
}
}
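`BuildVertexBuffer` above transforms the vertex normal with `FScaleMatrix(BuildSettings.BuildScale3D).Inverse().GetTransposed()` rather than with the scale itself. The following standalone sketch illustrates why the inverse transpose is the usual choice for normals under non-uniform scale. `Vec3`, `ScalePoint` and `ScaleNormal` are hypothetical helpers written for this example, not engine types, and the sketch assumes a pure diagonal scale, for which the inverse transpose reduces to a per-component division.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type for this illustration.
struct Vec3 { double X, Y, Z; };

inline double Dot(const Vec3& A, const Vec3& B)
{
    return A.X * B.X + A.Y * B.Y + A.Z * B.Z;
}

inline Vec3 Normalized(const Vec3& V)
{
    const double Len = std::sqrt(Dot(V, V));
    assert(Len > 0.0);
    return { V.X / Len, V.Y / Len, V.Z / Len };
}

// Positions, and directions lying in the surface, scale component-wise.
inline Vec3 ScalePoint(const Vec3& P, const Vec3& Scale)
{
    return { P.X * Scale.X, P.Y * Scale.Y, P.Z * Scale.Z };
}

// For a diagonal scale matrix S = diag(sx, sy, sz), the inverse transpose is
// diag(1/sx, 1/sy, 1/sz). Applying it keeps the normal perpendicular to the
// scaled surface; the result is renormalized afterwards.
inline Vec3 ScaleNormal(const Vec3& N, const Vec3& Scale)
{
    assert(Scale.X != 0.0 && Scale.Y != 0.0 && Scale.Z != 0.0);
    return Normalized({ N.X / Scale.X, N.Y / Scale.Y, N.Z / Scale.Z });
}
```

With a scale of (2, 1, 1), a surface direction (1, -1, 0) becomes (2, -1, 0). The normal (1, 1, 0) transformed by `ScaleNormal` stays perpendicular to it, whereas scaling the normal like a position would tilt it away from the surface.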
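Much of the logic above matches the ordinary static mesh path, but it differs in a few places, most visibly in how tangents and degenerate triangles are handled when Nanite is enabled.
This subsection describes how a Nanite mesh is built.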
// Engine\Source\Developer\NaniteBuilder\Private\NaniteBuilder.cpp
bool FBuilderModule::Build(
FResources& Resources,
TArray< FStaticMeshBuildVertex>& Vertices,
TArray< uint32 >& TriangleIndices,
TArray< FStaticMeshSection, TInlineAllocator<1>>& Sections,
uint32 NumTexCoords,
const FMeshNaniteSettings& Settings)
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build);
check(Sections.Num() > 0 && Sections.Num() <= 64);
// Build the array that associates each triangle with a material index.
TArray<int32> MaterialIndices;
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::BuildSections);
// There is one material index per triangle.
MaterialIndices.Reserve(TriangleIndices.Num() / 3);
for (int32 SectionIndex = 0; SectionIndex < Sections.Num(); SectionIndex++)
{
FStaticMeshSection& Section = Sections[SectionIndex];
check(Section.MaterialIndex != INDEX_NONE);
for (uint32 i = 0; i < Section.NumTriangles; ++i)
{
MaterialIndices.Add(Section.MaterialIndex);
}
}
}
TArray<uint32> MeshTriangleCounts;
MeshTriangleCounts.Add(TriangleIndices.Num() / 3);
// Make sure every triangle has a material index.
check(MaterialIndices.Num() * 3 == TriangleIndices.Num());
// Build the Nanite data.
return BuildNaniteData(
Resources,
Vertices,
TriangleIndices,
MaterialIndices,
MeshTriangleCounts,
Sections,
NumTexCoords,
Settings
);
}
// Build the Nanite data.
static bool BuildNaniteData(
FResources& Resources,
TArray< FStaticMeshBuildVertex >& Verts, // TODO: Do not require this vertex type for all users of Nanite
TArray< uint32 >& Indexes,
TArray< int32 >& MaterialIndexes,
TArray<uint32>& MeshTriangleCounts,
TArray< FStaticMeshSection, TInlineAllocator<1> >& Sections,
uint32 NumTexCoords,
const FMeshNaniteSettings& Settings
)
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::BuildData);
if (NumTexCoords > MAX_NANITE_UVS) NumTexCoords = MAX_NANITE_UVS;
FBounds VertexBounds;
uint32 Channel = 255; // Used to detect whether any vertex carries non-white color data.
for( auto& Vert : Verts )
{
VertexBounds += Vert.Position;
Channel &= Vert.Color.R;
Channel &= Vert.Color.G;
Channel &= Vert.Color.B;
Channel &= Vert.Color.A;
}
const uint32 NumMeshes = MeshTriangleCounts.Num();
// Color data exists only when the vertices are not all white.
bool bHasColors = Channel != 255;
TArray< uint32 > ClusterCountPerMesh;
TArray< FCluster > Clusters;
{
uint32 BaseTriangle = 0;
// Iterate over every mesh (one MeshTriangleCounts entry each) and build one or more clusters for it.
for (uint32 NumTriangles : MeshTriangleCounts)
{
uint32 NumClustersBefore = Clusters.Num();
if (NumTriangles)
{
// Build one or more clusters for this mesh. TArrayView provides array views that reuse the existing data without copying.
// ClusterTriangles itself is analyzed further below.
ClusterTriangles(Verts, TArrayView< const uint32 >( &Indexes[BaseTriangle * 3], NumTriangles * 3 ),
TArrayView< const int32 >( &MaterialIndexes[BaseTriangle], NumTriangles ),
Clusters, VertexBounds, NumTexCoords, bHasColors);
}
// Record the number of clusters built for this mesh.
ClusterCountPerMesh.Add(Clusters.Num() - NumClustersBefore);
BaseTriangle += NumTriangles;
}
}
const int32 OldTriangleCount = Indexes.Num() / 3;
const int32 MinTriCount = 2000;
// Whether to replace the original static mesh data with a coarse representation.
const bool bUseCoarseRepresentation = Settings.PercentTriangles < 1.0f && OldTriangleCount > MinTriCount;
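// If the original vertex buffer is going to be replaced with a coarse representation, drop the old copies now,
// since they have already been copied into the cluster representation. Doing this before the longer DAG reduce phase shortens the peak-memory duration.
// This matters most when several huge Nanite meshes are built in parallel.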
if (bUseCoarseRepresentation)
{
check(MeshTriangleCounts.Num() == 1);
Verts.Empty();
Indexes.Empty();
MaterialIndexes.Empty();
}
uint32 Time0 = FPlatformTime::Cycles();
FBounds MeshBounds;
TArray<FClusterGroup> Groups; // List of cluster groups.
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::DAG.Reduce);
uint32 ClusterStart = 0;
for (uint32 MeshIndex = 0; MeshIndex < NumMeshes; MeshIndex++)
{
uint32 NumClusters = ClusterCountPerMesh[MeshIndex];
// Build the DAG (Directed Acyclic Graph) to reduce the triangle count, appending clusters and groups to their arrays.
BuildDAG( Groups, Clusters, ClusterStart, NumClusters, MeshIndex, MeshBounds );
ClusterStart += NumClusters;
}
}
uint32 ReduceTime = FPlatformTime::Cycles();
UE_LOG(LogStaticMesh, Log, TEXT("Reduce [%.2fs]"), FPlatformTime::ToMilliseconds(ReduceTime - Time0) / 1000.0f);
// Use the coarse representation.
if (bUseCoarseRepresentation)
{
const uint32 CoarseStartTime = FPlatformTime::Cycles();
int32 CoarseTriCount = FMath::Max(MinTriCount, int32((float(OldTriangleCount) * Settings.PercentTriangles)));
TArray<FStaticMeshSection, TInlineAllocator<1>> CoarseSections = Sections;
// Build the coarse representation.
BuildCoarseRepresentation(Groups, Clusters, Verts, Indexes, CoarseSections, NumTexCoords, CoarseTriCount);
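// Fix up the mesh section info with the new coarse mesh ranges, respecting the original ordering
// and keeping materials that end up with no assigned triangles (because of decimation).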
for (FStaticMeshSection& Section : Sections)
{
// For each section, try to find a matching entry in the coarse version.
const FStaticMeshSection* CoarseSection = CoarseSections.FindByPredicate(
[&Section](const FStaticMeshSection& CoarseSectionIter)
{
return CoarseSectionIter.MaterialIndex == Section.MaterialIndex;
});
// A matching entry was found.
if (CoarseSection != nullptr)
{
Section.FirstIndex = CoarseSection->FirstIndex;
Section.NumTriangles = CoarseSection->NumTriangles;
Section.MinVertexIndex = CoarseSection->MinVertexIndex;
Section.MaxVertexIndex = CoarseSection->MaxVertexIndex;
}
// No matching entry was found.
else
{
// The section was removed by decimation; set a placeholder entry.
Section.FirstIndex = 0;
Section.NumTriangles = 0;
Section.MinVertexIndex = 0;
Section.MaxVertexIndex = 0;
}
}
const uint32 CoarseEndTime = FPlatformTime::Cycles();
UE_LOG(LogStaticMesh, Log, TEXT("Coarse [%.2fs], original tris: %d, coarse tris: %d"), FPlatformTime::ToMilliseconds(CoarseEndTime - CoarseStartTime) / 1000.0f, OldTriangleCount, CoarseTriCount);
}
uint32 EncodeTime0 = FPlatformTime::Cycles();
// Encode the Nanite mesh.
Encode( Resources, Settings, Clusters, Groups, MeshBounds, NumMeshes, NumTexCoords, bHasColors );
uint32 EncodeTime1 = FPlatformTime::Cycles();
UE_LOG( LogStaticMesh, Log, TEXT("Encode [%.2fs]"), FPlatformTime::ToMilliseconds( EncodeTime1 - EncodeTime0 ) / 1000.0f );
// An imposter is generated only when there is a single mesh.
const bool bGenerateImposter = (NumMeshes == 1);
if (bGenerateImposter)
{
uint32 ImposterStartTime = FPlatformTime::Cycles();
auto& RootChildren = Groups.Last().Children;
// Imposter atlas backed by Resources.ImposterAtlas.
FImposterAtlas ImposterAtlas( Resources.ImposterAtlas, MeshBounds );
// Generate the imposter tiles in parallel.
ParallelFor(FMath::Square(FImposterAtlas::AtlasSize),
[&](int32 TileIndex)
{
FIntPoint TilePos(
TileIndex % FImposterAtlas::AtlasSize,
TileIndex / FImposterAtlas::AtlasSize);
// Iterate over all root child clusters and rasterize them into the ImposterAtlas.
for (int32 ClusterIndex = 0; ClusterIndex < RootChildren.Num(); ClusterIndex++)
{
ImposterAtlas.Rasterize(TilePos, Clusters[RootChildren[ClusterIndex]], ClusterIndex);
}
});
UE_LOG(LogStaticMesh, Log, TEXT("Imposter [%.2fs]"), FPlatformTime::ToMilliseconds(FPlatformTime::Cycles() - ImposterStartTime ) / 1000.0f);
}
uint32 Time1 = FPlatformTime::Cycles();
UE_LOG( LogStaticMesh, Log, TEXT("Nanite build [%.2fs]\n"), FPlatformTime::ToMilliseconds( Time1 - Time0 ) / 1000.0f );
return true;
}
// Build one or more clusters for the given range of triangles.
static void ClusterTriangles(
const TArray< FStaticMeshBuildVertex >& Verts,
const TArrayView< const uint32 >& Indexes,
const TArrayView< const int32 >& MaterialIndexes,
TArray< FCluster >& Clusters, // Append
const FBounds& MeshBounds,
uint32 NumTexCoords,
bool bHasColors )
{
uint32 Time0 = FPlatformTime::Cycles();
LOG_CRC( Verts );
LOG_CRC( Indexes );
uint32 NumTriangles = Indexes.Num() / 3;
// Shared edges.
TArray< uint32 > SharedEdges;
SharedEdges.AddUninitialized( Indexes.Num() );
// Boundary edges.
TBitArray<> BoundaryEdges;
BoundaryEdges.Init( false, Indexes.Num() );
// Edge hash table.
FHashTable EdgeHash( 1 << FMath::FloorLog2( Indexes.Num() ), Indexes.Num() );
// Hash the edges in parallel.
ParallelFor( Indexes.Num(),
[&]( int32 EdgeIndex )
{
uint32 VertIndex0 = Indexes[ EdgeIndex ];
uint32 VertIndex1 = Indexes[ Cycle3( EdgeIndex ) ];
const FVector& Position0 = Verts[ VertIndex0 ].Position;
const FVector& Position1 = Verts[ VertIndex1 ].Position;
uint32 Hash0 = HashPosition( Position0 );
uint32 Hash1 = HashPosition( Position1 );
uint32 Hash = Murmur32( { Hash0, Hash1 } );
// Note that elements are added with the concurrent version, Add_Concurrent.
EdgeHash.Add_Concurrent( Hash, EdgeIndex );
});
const int32 NumDwords = FMath::DivideAndRoundUp( BoundaryEdges.Num(), NumBitsPerDWORD );
ParallelFor( NumDwords,
[&]( int32 DwordIndex )
{
const int32 NumIndexes = Indexes.Num();
const int32 NumBits = FMath::Min( NumBitsPerDWORD, NumIndexes - DwordIndex * NumBitsPerDWORD );
uint32 Mask = 1;
uint32 Dword = 0;
for( int32 BitIndex = 0; BitIndex < NumBits; BitIndex++, Mask <<= 1 )
{
// Compute the edge index.
int32 EdgeIndex = DwordIndex * NumBitsPerDWORD + BitIndex;
uint32 VertIndex0 = Indexes[ EdgeIndex ];
uint32 VertIndex1 = Indexes[ Cycle3( EdgeIndex ) ];
const FVector& Position0 = Verts[ VertIndex0 ].Position;
const FVector& Position1 = Verts[ VertIndex1 ].Position;
uint32 Hash0 = HashPosition( Position0 );
uint32 Hash1 = HashPosition( Position1 );
uint32 Hash = Murmur32( { Hash1, Hash0 } );
// Find the edge that shares both vertices but runs in the opposite direction.
/*
/\
/ \
o-<<-o
o->>-o
\ /
\/
*/
uint32 FoundEdge = ~0u;
for( uint32 OtherEdgeIndex = EdgeHash.First( Hash ); EdgeHash.IsValid( OtherEdgeIndex ); OtherEdgeIndex = EdgeHash.Next( OtherEdgeIndex ) )
{
uint32 OtherVertIndex0 = Indexes[ OtherEdgeIndex ];
uint32 OtherVertIndex1 = Indexes[ Cycle3( OtherEdgeIndex ) ];
if( Position0 == Verts[ OtherVertIndex1 ].Position &&
Position1 == Verts[ OtherVertIndex0 ].Position )
{
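// Found a matching edge.
// The hash table order is not deterministic, so pick a stable match (the smallest index) rather than just the first one.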
FoundEdge = FMath::Min( FoundEdge, OtherEdgeIndex );
}
}
SharedEdges[ EdgeIndex ] = FoundEdge;
if( FoundEdge == ~0u )
{
Dword |= Mask;
}
}
if( Dword )
{
BoundaryEdges.GetData()[ DwordIndex ] = Dword;
}
});
// Disjoint set (union-find) over the triangles.
FDisjointSet DisjointSet( NumTriangles );
for( uint32 EdgeIndex = 0, Num = SharedEdges.Num(); EdgeIndex < Num; EdgeIndex++ )
{
uint32 OtherEdgeIndex = SharedEdges[ EdgeIndex ];
if( OtherEdgeIndex != ~0u )
{
// OtherEdgeIndex is the smallest index matching EdgeIndex.
// ThisEdgeIndex is the smallest index matching OtherEdgeIndex.
uint32 ThisEdgeIndex = SharedEdges[ OtherEdgeIndex ];
check( ThisEdgeIndex != ~0u );
check( ThisEdgeIndex <= EdgeIndex );
if( EdgeIndex > ThisEdgeIndex )
{
// A previous element already points to OtherEdgeIndex.
SharedEdges[ EdgeIndex ] = ~0u;
}
else if( EdgeIndex > OtherEdgeIndex )
{
// Second time this pair is seen: union the two triangles.
DisjointSet.UnionSequential( EdgeIndex / 3, OtherEdgeIndex / 3 );
}
}
}
uint32 BoundaryTime = FPlatformTime::Cycles();
UE_LOG( LogStaticMesh, Log, TEXT("Boundary [%.2fs], tris: %i, UVs %i%s"), FPlatformTime::ToMilliseconds( BoundaryTime - Time0 ) / 1000.0f, Indexes.Num() / 3, NumTexCoords, bHasColors ? TEXT(", Color") : TEXT("") );
LOG_CRC( SharedEdges );
// Partition the triangles.
FGraphPartitioner Partitioner( NumTriangles );
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::PartitionGraph);
// Get the center of a triangle.
auto GetCenter = [ &Verts, &Indexes ]( uint32 TriIndex )
{
FVector Center;
Center = Verts[ Indexes[ TriIndex * 3 + 0 ] ].Position;
Center += Verts[ Indexes[ TriIndex * 3 + 1 ] ].Position;
Center += Verts[ Indexes[ TriIndex * 3 + 2 ] ].Position;
return Center * (1.0f / 3.0f);
};
// Build the locality links.
Partitioner.BuildLocalityLinks( DisjointSet, MeshBounds, GetCenter );
auto* RESTRICT Graph = Partitioner.NewGraph( NumTriangles * 3 );
// Fill in the partitioning data.
for( uint32 i = 0; i < NumTriangles; i++ )
{
Graph->AdjacencyOffset[i] = Graph->Adjacency.Num();
uint32 TriIndex = Partitioner.Indexes[i];
for( int k = 0; k < 3; k++ )
{
uint32 EdgeIndex = SharedEdges[ 3 * TriIndex + k ];
// Add an adjacent edge.
if( EdgeIndex != ~0u )
{
Partitioner.AddAdjacency( Graph, EdgeIndex / 3, 4 * 65 );
}
}
// Add the locality links.
Partitioner.AddLocalityLinks( Graph, TriIndex, 1 );
}
Graph->AdjacencyOffset[ NumTriangles ] = Graph->Adjacency.Num();
// Partition strictly into clusters.
Partitioner.PartitionStrict( Graph, FCluster::ClusterSize - 4, FCluster::ClusterSize, true );
check( Partitioner.Ranges.Num() );
LOG_CRC( Partitioner.Ranges );
}
// Compute the optimal number of clusters.
const uint32 OptimalNumClusters = FMath::DivideAndRoundUp< int32 >( Indexes.Num(), FCluster::ClusterSize * 3 );
uint32 ClusterTime = FPlatformTime::Cycles();
UE_LOG( LogStaticMesh, Log, TEXT("Clustering [%.2fs]. Ratio: %f"), FPlatformTime::ToMilliseconds( ClusterTime - BoundaryTime ) / 1000.0f, (float)Partitioner.Ranges.Num() / OptimalNumClusters );
const uint32 BaseCluster = Clusters.Num();
Clusters.AddDefaulted( Partitioner.Ranges.Num() );
// Author's note: single-threaded when there are more than 32 ranges? Is this inverted?
const bool bSingleThreaded = Partitioner.Ranges.Num() > 32;
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::BuildClusters);
// Build the clusters in parallel.
ParallelFor( Partitioner.Ranges.Num(),
[&]( int32 Index )
{
auto& Range = Partitioner.Ranges[ Index ];
// Create a single cluster instance.
Clusters[ BaseCluster + Index ] = FCluster( Verts,
Indexes,
MaterialIndexes,
BoundaryEdges, Range.Begin, Range.End, Partitioner.Indexes, NumTexCoords, bHasColors );
// A negative value marks it as a leaf.
Clusters[ BaseCluster + Index ].EdgeLength *= -1.0f;
}, bSingleThreaded);
}
uint32 LeavesTime = FPlatformTime::Cycles();
UE_LOG( LogStaticMesh, Log, TEXT("Leaves [%.2fs]"), FPlatformTime::ToMilliseconds( LeavesTime - ClusterTime ) / 1000.0f );
}
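The shared-edge and boundary-edge detection in `ClusterTriangles` can be hard to follow through the hashing and parallel loops, so here is a simplified, single-threaded sketch of the same idea. It is not the engine's implementation. It assumes vertices are already welded, so edges are matched by vertex index pairs in a `std::map` instead of by hashed positions in `FHashTable`. `NextCorner`, `TriangleIslands`, `EdgeAdjacency` and `BuildEdgeAdjacency` are hypothetical names made up for this example. `NextCorner` plays the role that `Cycle3` appears to play in the listing, and `TriangleIslands` stands in for `FDisjointSet`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

constexpr uint32_t NoEdge = ~0u;

// Index of the next corner within the same triangle (0->1, 1->2, 2->0).
inline uint32_t NextCorner(uint32_t EdgeIndex)
{
    return EdgeIndex - EdgeIndex % 3 + (EdgeIndex + 1) % 3;
}

// Minimal union-find; the smallest triangle index is kept as the representative.
struct TriangleIslands
{
    std::vector<uint32_t> Parent;

    explicit TriangleIslands(uint32_t NumTriangles) : Parent(NumTriangles)
    {
        std::iota(Parent.begin(), Parent.end(), 0u);
    }

    uint32_t Find(uint32_t Tri)
    {
        while (Parent[Tri] != Tri)
        {
            Parent[Tri] = Parent[Parent[Tri]]; // Path halving.
            Tri = Parent[Tri];
        }
        return Tri;
    }

    void Union(uint32_t A, uint32_t B)
    {
        A = Find(A);
        B = Find(B);
        if (A != B)
        {
            Parent[std::max(A, B)] = std::min(A, B);
        }
    }
};

struct EdgeAdjacency
{
    std::vector<uint32_t> SharedEdges;    // Opposite half-edge, or NoEdge.
    std::vector<bool>     BoundaryEdges;  // True when no opposite half-edge exists.
    std::vector<uint32_t> TriangleIsland; // Island representative per triangle.
};

inline EdgeAdjacency BuildEdgeAdjacency(const std::vector<uint32_t>& Indexes)
{
    assert(Indexes.size() % 3 == 0);
    const uint32_t NumEdges = static_cast<uint32_t>(Indexes.size());

    // Directed edge (from, to) -> smallest half-edge index with that direction.
    // emplace keeps the first insertion, which is the smallest index here.
    std::map<std::pair<uint32_t, uint32_t>, uint32_t> EdgeLookup;
    for (uint32_t Edge = 0; Edge < NumEdges; ++Edge)
    {
        EdgeLookup.emplace(std::make_pair(Indexes[Edge], Indexes[NextCorner(Edge)]), Edge);
    }

    EdgeAdjacency Out;
    Out.SharedEdges.assign(NumEdges, NoEdge);
    Out.BoundaryEdges.assign(NumEdges, false);
    TriangleIslands Islands(NumEdges / 3);

    for (uint32_t Edge = 0; Edge < NumEdges; ++Edge)
    {
        // Look for the same two vertices traversed in the opposite direction.
        const auto Found = EdgeLookup.find(std::make_pair(Indexes[NextCorner(Edge)], Indexes[Edge]));
        if (Found == EdgeLookup.end())
        {
            Out.BoundaryEdges[Edge] = true;
            continue;
        }
        Out.SharedEdges[Edge] = Found->second;
        Islands.Union(Edge / 3, Found->second / 3);
    }

    Out.TriangleIsland.resize(NumEdges / 3);
    for (uint32_t Tri = 0; Tri < NumEdges / 3; ++Tri)
    {
        Out.TriangleIsland[Tri] = Islands.Find(Tri);
    }
    return Out;
}
```

For the index list `{0,1,2, 2,1,3, 4,5,6}`, half-edges 1 and 3 pair up because they traverse vertices 1 and 2 in opposite directions, every other half-edge is a boundary edge, and the first two triangles end up in one island while the third stays on its own.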
The cluster-building code above uses FGraphPartitioner; its code is analyzed next:
// Engine\Source\Developer\NaniteBuilder\Private\GraphPartitioner.h
(......)
// Includes the third-party open-source library METIS.
#include "metis.h"
(......)
// Graph partitioner used to split elements into clusters.
class FGraphPartitioner
{
public:
// Graph data.
struct FGraphData
{
int32 Offset; // Index offset.
int32 Num; // Element count.
TArray< idx_t > Adjacency; // Adjacency list.
TArray< idx_t > AdjacencyCost; // Adjacency weights.
TArray< idx_t > AdjacencyOffset; // Per-element offsets into the adjacency list.
};
// Half-open range [Begin, End); AddPartition sets End = Offset + Num.
struct FRange
{
uint32 Begin;
uint32 End;
bool operator<( const FRange& Other) const { return Begin < Other.Begin; }
};
TArray< FRange > Ranges;
TArray< uint32 > Indexes;
public:
FGraphPartitioner( uint32 InNumElements );
// Create a new graph data instance.
FGraphData* NewGraph( uint32 NumAdjacency ) const;
// Add an adjacent edge.
void AddAdjacency( FGraphData* Graph, uint32 AdjIndex, idx_t Cost );
// Add locality links.
void AddLocalityLinks( FGraphData* Graph, uint32 Index, idx_t Cost );
// Build locality links.
template< typename FGetCenter >
void BuildLocalityLinks( FDisjointSet& DisjointSet, const FBounds& Bounds, FGetCenter& GetCenter );
// Partition into clusters.
void Partition( FGraphData* Graph, int32 InMinPartitionSize, int32 InMaxPartitionSize );
// Partition strictly into clusters.
void PartitionStrict( FGraphData* Graph, int32 InMinPartitionSize, int32 InMaxPartitionSize, bool bThreaded );
private:
// Bisect the graph.
void BisectGraph( FGraphData* Graph, FGraphData* ChildGraphs[2] );
// Recursively bisect the graph.
void RecursiveBisectGraph( FGraphData* Graph );
uint32 NumElements;
int32 MinPartitionSize = 0;
int32 MaxPartitionSize = 0;
// Number of partitions. Atomic, so that multiple threads can read and write it.
TAtomic< uint32 > NumPartitions;
TArray< idx_t > PartitionIDs;
TArray< int32 > SwappedWith;
TArray< uint32 > SortedTo;
// Locality links.
TMultiMap< uint32, uint32 > LocalityLinks;
};
(......)
// Engine\Source\Developer\NaniteBuilder\Private\GraphPartitioner.cpp
(......)
// Bisect the graph.
void FGraphPartitioner::BisectGraph( FGraphData* Graph, FGraphData* ChildGraphs[2] )
{
ChildGraphs[0] = nullptr;
ChildGraphs[1] = nullptr;
// Callback that adds a partition.
auto AddPartition =
[ this ]( int32 Offset, int32 Num )
{
FRange& Range = Ranges[ NumPartitions++ ];
Range.Begin = Offset;
Range.End = Offset + Num;
};
// If the graph's element count fits within the maximum partition size, add it as a single partition.
if( Graph->Num <= MaxPartitionSize )
{
AddPartition( Graph->Offset, Graph->Num );
return;
}
// Compute the target partition size.
const int32 TargetPartitionSize = ( MinPartitionSize + MaxPartitionSize ) / 2;
const int32 TargetNumPartitions = FMath::Max( 2, FMath::DivideAndRoundNearest( Graph->Num, TargetPartitionSize ) );
check( Graph->AdjacencyOffset.Num() == Graph->Num + 1 );
idx_t NumConstraints = 1;
idx_t NumParts = 2;
idx_t EdgesCut = 0;
real_t PartitionWeights[] = {
float( TargetNumPartitions / 2 ) / TargetNumPartitions,
1.0f - float( TargetNumPartitions / 2 ) / TargetNumPartitions
};
// Set METIS's default options.
idx_t Options[ METIS_NOPTIONS ];
METIS_SetDefaultOptions( Options );
// Allow a loose tolerance at the higher levels; strict balance does not matter until closer to the partition size.
bool bLoose = TargetNumPartitions >= 128 || MaxPartitionSize / MinPartitionSize > 1;
bool bSlow = Graph->Num < 4096;
Options[ METIS_OPTION_UFACTOR ] = bLoose ? 200 : 1;
//Options[ METIS_OPTION_NCUTS ] = Graph->Num < 1024 ? 8 : ( Graph->Num < 4096 ? 4 : 1 );
//Options[ METIS_OPTION_NCUTS ] = bSlow ? 4 : 1;
//Options[ METIS_OPTION_NITER ] = bSlow ? 20 : 10;
//Options[ METIS_OPTION_IPTYPE ] = METIS_IPTYPE_RANDOM;
//Options[ METIS_OPTION_MINCONN ] = 1;
// Call METIS's recursive partitioning.
int r = METIS_PartGraphRecursive(
&Graph->Num,
&NumConstraints, // number of balancing constraints
Graph->AdjacencyOffset.GetData(),
Graph->Adjacency.GetData(),
NULL, // Vert weights
NULL, // Vert sizes for computing the total communication volume
Graph->AdjacencyCost.GetData(), // Edge weights
&NumParts,
PartitionWeights, // Target partition weight
NULL, // Allowed load imbalance tolerance
Options,
&EdgesCut,
PartitionIDs.GetData() + Graph->Offset
);
// Make sure the METIS partitioning succeeded.
if( ensureAlways( r == METIS_OK ) )
{
// Partition the array in place.
// Both sides remain sorted, but the back side ends up in reversed order.
int32 Front = Graph->Offset;
int32 Back = Graph->Offset + Graph->Num - 1;
while( Front <= Back )
{
while( Front <= Back && PartitionIDs[ Front ] == 0 )
{
SwappedWith[ Front ] = Front;
Front++;
}
while( Front <= Back && PartitionIDs[ Back ] == 1 )
{
SwappedWith[ Back ] = Back;
Back--;
}
if( Front < Back )
{
Swap( Indexes[ Front ], Indexes[ Back ] );
SwappedWith[ Front ] = Back;
SwappedWith[ Back ] = Front;
Front++;
Back--;
}
}
int32 Split = Front;
int32 Num[2];
Num[0] = Split - Graph->Offset;
Num[1] = Graph->Offset + Graph->Num - Split;
check( Num[0] > 1 );
check( Num[1] > 1 );
// If both children fit within the maximum partition size, add them directly.
if( Num[0] <= MaxPartitionSize && Num[1] <= MaxPartitionSize )
{
AddPartition( Graph->Offset, Num[0] );
AddPartition( Split, Num[1] );
}
else
{
// Create the two child graph instances.
for( int32 i = 0; i < 2; i++ )
{
ChildGraphs[i] = new FGraphData;
ChildGraphs[i]->Adjacency.Reserve( Graph->Adjacency.Num() >> 1 );
ChildGraphs[i]->AdjacencyCost.Reserve( Graph->Adjacency.Num() >> 1 );
ChildGraphs[i]->AdjacencyOffset.Reserve( Num[i] + 1 );
ChildGraphs[i]->Num = Num[i];
}
ChildGraphs[0]->Offset = Graph->Offset;
ChildGraphs[1]->Offset = Split;
// Iterate over all elements and add the graph's adjacency to ChildGraphs[0] or ChildGraphs[1].
for( int32 i = 0; i < Graph->Num; i++ )
{
// A small trick: when i < ChildGraphs[0]->Num the bool expression indexes ChildGraphs[0], otherwise ChildGraphs[1].
FGraphData* ChildGraph = ChildGraphs[ i >= ChildGraphs[0]->Num ];
ChildGraph->AdjacencyOffset.Add( ChildGraph->Adjacency.Num() );
int32 OrgIndex = SwappedWith[ Graph->Offset + i ] - Graph->Offset;
for( idx_t AdjIndex = Graph->AdjacencyOffset[ OrgIndex ]; AdjIndex < Graph->AdjacencyOffset[ OrgIndex + 1 ]; AdjIndex++ )
{
idx_t Adj = Graph->Adjacency[ AdjIndex ];
idx_t AdjCost = Graph->AdjacencyCost[ AdjIndex ];
// Remap to child
Adj = SwappedWith[ Graph->Offset + Adj ] - ChildGraph->Offset;
// Edge connects to node in this graph
if( 0 <= Adj && Adj < ChildGraph->Num )
{
ChildGraph->Adjacency.Add( Adj );
ChildGraph->AdjacencyCost.Add( AdjCost );
}
}
}
ChildGraphs[0]->AdjacencyOffset.Add( ChildGraphs[0]->Adjacency.Num() );
ChildGraphs[1]->AdjacencyOffset.Add( ChildGraphs[1]->Adjacency.Num() );
}
}
}
// Strict partitioning.
void FGraphPartitioner::PartitionStrict( FGraphData* Graph, int32 InMinPartitionSize, int32 InMaxPartitionSize, bool bThreaded )
{
MinPartitionSize = InMinPartitionSize;
MaxPartitionSize = InMaxPartitionSize;
PartitionIDs.AddUninitialized( NumElements );
SwappedWith.AddUninitialized( NumElements );
// Adding to atomically so size big enough to not need to grow.
int32 NumPartitionsExpected = FMath::DivideAndRoundUp( Graph->Num, MinPartitionSize );
Ranges.AddUninitialized( NumPartitionsExpected * 2 );
NumPartitions = 0;
// Use multiple threads.
if( bThreaded && NumPartitionsExpected > 4 )
{
extern CORE_API int32 GUseNewTaskBackend;
// Use the new task backend.
if (GUseNewTaskBackend)
{
// Local work queue.
TLocalWorkQueue<FGraphData> LocalWork(Graph);
// Self here refers to the lambda itself.
LocalWork.Run(MakeYCombinator([this, &LocalWork](auto Self, FGraphData* Graph) -> void
{
FGraphData* ChildGraphs[2];
// Bisect the graph.
BisectGraph( Graph, ChildGraphs );
delete Graph;
if( ChildGraphs[0] && ChildGraphs[1] )
{
// Process the first child.
// Only add a new task and worker when the remaining work is large enough.
if (ChildGraphs[0]->Num > 256)
{
LocalWork.AddTask(ChildGraphs[0]);
LocalWork.AddWorkers(1);
}
else // Otherwise recurse directly.
{
Self(ChildGraphs[0]);
}
// Process the second child.
Self(ChildGraphs[1]);
}
}));
}
// Otherwise, fall back to the legacy TaskGraph system.
else
{
const ENamedThreads::Type DesiredThread = IsInGameThread() ? ENamedThreads::AnyThread : ENamedThreads::AnyBackgroundThreadNormalTask;
// Build task.
class FBuildTask
{
public:
FBuildTask( FGraphPartitioner* InPartitioner, FGraphData* InGraph, ENamedThreads::Type InDesiredThread)
: Partitioner( InPartitioner )
, Graph( InGraph )
, DesiredThread( InDesiredThread )
{}
void DoTask( ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionEvent )
{
FGraphData* ChildGraphs[2];
Partitioner->BisectGraph( Graph, ChildGraphs );
delete Graph;
if( ChildGraphs[0] && ChildGraphs[1] )
{
if( ChildGraphs[0]->Num > 256 )
{
FGraphEventRef Task = TGraphTask< FBuildTask >::CreateTask().ConstructAndDispatchWhenReady( Partitioner, ChildGraphs[0], DesiredThread);
MyCompletionEvent->DontCompleteUntil( Task );
}
else
{
FBuildTask( Partitioner, ChildGraphs[0], DesiredThread).DoTask( CurrentThread, MyCompletionEvent );
}
FBuildTask( Partitioner, ChildGraphs[1], DesiredThread).DoTask( CurrentThread, MyCompletionEvent );
}
}
static FORCEINLINE TStatId GetStatId()
{
RETURN_QUICK_DECLARE_CYCLE_STAT(FBuildTask, STATGROUP_ThreadPoolAsyncTasks);
}
static FORCEINLINE ESubsequentsMode::Type GetSubsequentsMode() { return ESubsequentsMode::TrackSubsequents; }
FORCEINLINE ENamedThreads::Type GetDesiredThread() const
{
return DesiredThread;
}
private:
FGraphPartitioner* Partitioner;
FGraphData* Graph;
ENamedThreads::Type DesiredThread;
};
FGraphEventRef BuildTask = TGraphTask< FBuildTask >::CreateTask( nullptr ).ConstructAndDispatchWhenReady( this, Graph, DesiredThread);
FTaskGraphInterface::Get().WaitUntilTaskCompletes( BuildTask );
}
}
else
{
RecursiveBisectGraph( Graph );
}
Ranges.SetNum( NumPartitions );
if( bThreaded )
{
// Force a deterministic order
Ranges.Sort();
}
PartitionIDs.Empty();
SwappedWith.Empty();
}
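The MakeYCombinator call in PartitionStrict lets a lambda recurse by receiving itself as its first argument (Self). The pattern can be sketched in standard C++ as follows. This is a minimal illustration under stated assumptions, not the engine's implementation: YCombinatorSketch, MakeYCombinatorSketch, and BisectRanges are hypothetical names, and the range is simply halved instead of being bisected by a graph partitioner.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Y combinator: it stores a callable and passes
// itself as the first argument, so a lambda can call itself recursively.
template <class F>
struct YCombinatorSketch
{
    F Func;

    template <class... Args>
    decltype(auto) operator()(Args&&... InArgs) const
    {
        return Func(*this, std::forward<Args>(InArgs)...);
    }
};

template <class F>
YCombinatorSketch<std::decay_t<F>> MakeYCombinatorSketch(F&& InFunc)
{
    return { std::forward<F>(InFunc) };
}

// Recursively halves [0, Num) until every range holds at most MaxSize items.
// Assumption: plain halving replaces the real graph bisection.
std::vector<std::pair<int, int>> BisectRanges(int Num, int MaxSize)
{
    const int Limit = MaxSize < 1 ? 1 : MaxSize; // guard against endless recursion
    std::vector<std::pair<int, int>> Ranges;
    auto Recurse = MakeYCombinatorSketch(
        [&Ranges, Limit](auto Self, int Begin, int End) -> void
        {
            if (End - Begin <= Limit)
            {
                Ranges.emplace_back(Begin, End);
                return;
            }
            const int Split = Begin + (End - Begin) / 2;
            Self(Begin, Split); // first child
            Self(Split, End);   // second child
        });
    Recurse(0, Num);
    return Ranges;
}
```

Calling BisectRanges(10, 3) yields the ranges [0,2), [2,5), [5,7), [7,10). The explicit return type on the lambda is what allows the recursive call through Self to compile.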
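A few supplementary notes on Nanite's mesh partitioning follow.
METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill-reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes developed in the Karypis Lab. According to its official description, its key features are: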
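- High-quality partitions. According to the METIS documentation, its partitions are consistently better than those produced by other widely used algorithms, and 10% to 50% better than those produced by spectral partitioning algorithms.
- Very fast. METIS is reported to be one to two orders of magnitude faster than other widely used partitioning algorithms. On the workstations and PCs of the time, graphs with millions of vertices could be partitioned into 256 parts in a few seconds.
- Low-fill orderings. The fill-reducing orderings produced by METIS are described as significantly better than those of other widely used algorithms, including multiple minimum degree. For many classes of problems in scientific computing and linear programming, METIS can reduce the storage and computational requirements of sparse matrix factorization by up to an order of magnitude. Unlike multiple minimum degree, the elimination trees produced by METIS are suitable for parallel direct factorization. METIS also computes these orderings very quickly: matrices with millions of rows can be reordered in a few seconds.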
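A parallel version, ParMETIS, is also available. See the official page for details: Family of Graph and Hypergraph Partitioning Software.
The steps, details, and logic of mesh partitioning are fairly involved. In the author's view, Nanite's partitioning approach and intent resemble those described in the notes METIS Three Phases Coarsening Partitioning Uncoarsening and the paper Learning Boundary Edges for 3D-Mesh Segmentation, so the partitioning algorithm and process are explained here with reference to them.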
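When a model needs to be split into several parts, it can be segmented manually, as in the human-made segmentations of the Princeton Segmentation Benchmark, or automatically by an algorithm (figure below).
The figure above contains several image pairs. The left image of each pair is a manual segmentation from the Princeton Segmentation Benchmark (dark lines mark the manually drawn boundaries). The right image is the result of an automatic algorithm (boundaries in red). The automatic segmentation matches the manual one closely.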
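Automatic partitioning algorithms include methods that combine deep learning and computer vision as well as traditional mathematical approaches such as METIS. METIS partitions in three phases: coarsening, partitioning, and uncoarsening (refinement).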
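In the coarsening phase, the goal is a matching that is as large as possible, where a matching is a set of edges that share no vertices. The cited notes describe finding the largest such matching as an NP-complete problem.
During coarsening, finding the largest matching is described as NP-complete. For example, matching (a) is clearly not the largest set of edges without shared vertices, whereas (b) is.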
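Because an exact largest matching is treated as too expensive, a greedy heuristic is a natural fit for coarsening. The sketch below is in the spirit of the heavy-edge matching heuristic described in the METIS literature: edges are visited from heaviest to lightest and accepted when both endpoints are still unmatched. It is an illustration only, not METIS code, and FWeightedEdge, GreedyHeavyEdgeMatching, and CountMatchedEdges are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct FWeightedEdge
{
    int A;
    int B;
    int Weight;
};

// Returns Mate[v]: the vertex matched with v, or -1 if v stays unmatched.
// The result is a maximal matching (no further edge can be added), which is
// not necessarily the largest possible one.
std::vector<int> GreedyHeavyEdgeMatching(int NumVerts, std::vector<FWeightedEdge> Edges)
{
    // Heaviest edges first; stable_sort keeps the input order among ties.
    std::stable_sort(Edges.begin(), Edges.end(),
        [](const FWeightedEdge& X, const FWeightedEdge& Y) { return X.Weight > Y.Weight; });

    std::vector<int> Mate(NumVerts, -1);
    for (const FWeightedEdge& Edge : Edges)
    {
        const bool bValid = Edge.A != Edge.B
            && Edge.A >= 0 && Edge.A < NumVerts
            && Edge.B >= 0 && Edge.B < NumVerts;
        if (bValid && Mate[Edge.A] == -1 && Mate[Edge.B] == -1)
        {
            Mate[Edge.A] = Edge.B;
            Mate[Edge.B] = Edge.A;
        }
    }
    return Mate;
}

// Counts matched pairs (each pair is counted once).
int CountMatchedEdges(const std::vector<int>& Mate)
{
    int Count = 0;
    for (int V = 0; V < (int)Mate.size(); ++V)
    {
        if (Mate[V] > V) { ++Count; }
    }
    return Count;
}
```

On the path 0-1-2-3 with a heavy middle edge, the greedy pass matches only (1,2), while the largest matching would take (0,1) and (2,3). This is the same contrast as cases (a) and (b) in the caption above.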
The partitioning phase takes two steps: first pick a root vertex at random, then run a breadth-first search (BFS) from it, absorbing the vertices that yield fewer cut edges.
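The region-growing idea can be sketched as a plain BFS that grows one side from the root until it reaches a target size, with everything else falling to the other side. This is a simplification under stated assumptions: the root is passed in rather than chosen at random, vertices are absorbed in BFS order rather than by best gain, and FAdjacencyList, GrowPartitionBFS, and CountCutEdges are hypothetical names.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

using FAdjacencyList = std::vector<std::vector<int>>;

// Grows partition 0 from Root in BFS order until it holds TargetSize vertices.
// All other vertices stay in partition 1.
std::vector<int> GrowPartitionBFS(const FAdjacencyList& Adjacency, int Root, int TargetSize)
{
    std::vector<int> Part(Adjacency.size(), 1);
    if (Root < 0 || Root >= (int)Adjacency.size() || TargetSize < 1)
    {
        return Part;
    }

    std::queue<int> Frontier;
    Part[Root] = 0;
    int Count = 1;
    Frontier.push(Root);

    while (!Frontier.empty() && Count < TargetSize)
    {
        const int Vertex = Frontier.front();
        Frontier.pop();
        for (int Neighbor : Adjacency[Vertex])
        {
            if (Part[Neighbor] == 1 && Count < TargetSize)
            {
                Part[Neighbor] = 0;
                ++Count;
                Frontier.push(Neighbor);
            }
        }
    }
    return Part;
}

// Counts undirected edges whose endpoints lie in different partitions.
int CountCutEdges(const FAdjacencyList& Adjacency, const std::vector<int>& Part)
{
    int Cut = 0;
    for (std::size_t V = 0; V < Adjacency.size(); ++V)
    {
        for (int Neighbor : Adjacency[V])
        {
            if ((int)V < Neighbor && Part[V] != Part[Neighbor]) { ++Cut; }
        }
    }
    return Cut;
}
```

On a six-vertex path, growing three vertices from one end gives the split {0,1,2} and {3,4,5} with a single cut edge. Because BFS absorbs neighbors first, the grown region tends to stay compact, which is what keeps the cut small.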
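The key idea of the uncoarsening phase: each parent (coarse) node contains a group of child nodes, and the number of cut edges is reduced by moving vertices from one partition to the other.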
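One refinement step of this kind can be sketched as follows: compute for every vertex the gain of switching sides (edges to the other side minus edges to its own side) and move the vertex with the best positive gain, subject to a minimum partition size. This is an illustrative sketch of the idea, not the refinement METIS actually implements, and FAdjacencyList, CutSize, and MoveBestVertex are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

using FAdjacencyList = std::vector<std::vector<int>>;

// Counts undirected edges whose endpoints lie in different partitions.
int CutSize(const FAdjacencyList& Adjacency, const std::vector<int>& Part)
{
    int Cut = 0;
    for (std::size_t V = 0; V < Adjacency.size(); ++V)
    {
        for (int Neighbor : Adjacency[V])
        {
            if ((int)V < Neighbor && Part[V] != Part[Neighbor]) { ++Cut; }
        }
    }
    return Cut;
}

// Moves the vertex with the largest positive gain to the other partition.
// Part values must be 0 or 1. Returns the moved vertex, or -1 if no move helps.
int MoveBestVertex(const FAdjacencyList& Adjacency, std::vector<int>& Part, int MinPartSize)
{
    int Sizes[2] = { 0, 0 };
    for (int P : Part) { ++Sizes[P]; }

    int BestVertex = -1;
    int BestGain = 0;
    for (int V = 0; V < (int)Adjacency.size(); ++V)
    {
        int External = 0;
        int Internal = 0;
        for (int Neighbor : Adjacency[V])
        {
            if (Part[Neighbor] == Part[V]) { ++Internal; }
            else { ++External; }
        }
        // Moving V turns its external edges into internal ones and vice versa.
        const int Gain = External - Internal;
        const bool bKeepsBalance = Sizes[Part[V]] - 1 >= MinPartSize;
        if (Gain > BestGain && bKeepsBalance)
        {
            BestGain = Gain;
            BestVertex = V;
        }
    }

    if (BestVertex >= 0)
    {
        Part[BestVertex] = 1 - Part[BestVertex];
    }
    return BestVertex;
}
```

For two triangles {0,1,2} and {3,4,5} joined by the edge 2-3, starting from the lopsided split {0,1,2,3} and {4,5}, a single step moves vertex 3 and lowers the cut from 2 to 1. A second step finds no vertex with positive gain and leaves the partition unchanged.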
// Engine\Source\Developer\NaniteBuilder\Private\ClusterDAG.cpp
// Build the directed acyclic graph (DAG) of clusters.
void BuildDAG( TArray< FClusterGroup >& Groups, TArray< FCluster >& Clusters, uint32 ClusterRangeStart, uint32 ClusterRangeNum, uint32 MeshIndex, FBounds& MeshBounds )
{
uint32 LevelOffset = ClusterRangeStart;
TAtomic< uint32 > NumClusters( Clusters.Num() );
uint32 NumExternalEdges = 0;
bool bFirstLevel = true;
while( true )
{
TArrayView< FCluster > LevelClusters( &Clusters[LevelOffset], bFirstLevel ? ClusterRangeNum : (Clusters.Num() - LevelOffset) );
bFirstLevel = false;
for( FCluster& Cluster : LevelClusters )
{
NumExternalEdges += Cluster.NumExternalEdges;
MeshBounds += Cluster.Bounds;
}
if( LevelClusters.Num() < 2 )
break;
// If this level has no more clusters than the maximum group size, put them all into a single group.
if( LevelClusters.Num() <= MaxGroupSize )
{
TArray< uint32, TInlineAllocator< MaxGroupSize > > Children;
uint32 MaxParents = 0;
for( FCluster& Cluster : LevelClusters )
{
MaxParents += FMath::DivideAndRoundUp< uint32 >( Cluster.Indexes.Num(), FCluster::ClusterSize * 6 );
Children.Add( LevelOffset++ );
}
LevelOffset = Clusters.Num();
Clusters.AddDefaulted( MaxParents );
Groups.AddDefaulted( 1 );
// Merge and simplify the clusters through DAGReduce and add them to the corresponding group.
DAGReduce( Groups, Clusters, NumClusters, Children, Groups.Num() - 1, MeshIndex );
// Correct num to atomic count
Clusters.SetNum( NumClusters, false );
continue;
}
// This level has more than MaxGroupSize clusters, so it must be partitioned with FGraphPartitioner.
// External edge structure.
struct FExternalEdge
{
uint32 ClusterIndex;
uint32 EdgeIndex;
};
// List of external edges.
TArray< FExternalEdge > ExternalEdges;
FHashTable ExternalEdgeHash;
TAtomic< uint32 > ExternalEdgeOffset(0);
// The total NumExternalEdges is known, so a hash table that never grows can be allocated.
ExternalEdges.AddUninitialized( NumExternalEdges );
ExternalEdgeHash.Clear( 1 << FMath::FloorLog2( NumExternalEdges ), NumExternalEdges );
NumExternalEdges = 0;
// Add the edges to the hash table in parallel.
ParallelFor( LevelClusters.Num(),
[&]( uint32 ClusterIndex )
{
FCluster& Cluster = LevelClusters[ ClusterIndex ];
for( TConstSetBitIterator<> SetBit( Cluster.ExternalEdges ); SetBit; ++SetBit )
{
uint32 EdgeIndex = SetBit.GetIndex();
uint32 VertIndex0 = Cluster.Indexes[ EdgeIndex ];
uint32 VertIndex1 = Cluster.Indexes[ Cycle3( EdgeIndex ) ];
const FVector& Position0 = Cluster.GetPosition( VertIndex0 );
const FVector& Position1 = Cluster.GetPosition( VertIndex1 );
uint32 Hash0 = HashPosition( Position0 );
uint32 Hash1 = HashPosition( Position1 );
uint32 Hash = Murmur32( { Hash0, Hash1 } );
uint32 ExternalEdgeIndex = ExternalEdgeOffset++;
ExternalEdges[ ExternalEdgeIndex ] = { ClusterIndex, EdgeIndex };
ExternalEdgeHash.Add_Concurrent( Hash, ExternalEdgeIndex );
}
});
check( ExternalEdgeOffset == ExternalEdges.Num() );
TAtomic< uint32 > NumAdjacency(0);
// Find matching edges in other clusters in parallel.
ParallelFor( LevelClusters.Num(),
[&]( uint32 ClusterIndex )
{
FCluster& Cluster = LevelClusters[ ClusterIndex ];
for( TConstSetBitIterator<> SetBit( Cluster.ExternalEdges ); SetBit; ++SetBit )
{
uint32 EdgeIndex = SetBit.GetIndex();
uint32 VertIndex0 = Cluster.Indexes[ EdgeIndex ];
uint32 VertIndex1 = Cluster.Indexes[ Cycle3( EdgeIndex ) ];
const FVector& Position0 = Cluster.GetPosition( VertIndex0 );
const FVector& Position1 = Cluster.GetPosition( VertIndex1 );
uint32 Hash0 = HashPosition( Position0 );
uint32 Hash1 = HashPosition( Position1 );
uint32 Hash = Murmur32( { Hash1, Hash0 } );
for( uint32 ExternalEdgeIndex = ExternalEdgeHash.First( Hash ); ExternalEdgeHash.IsValid( ExternalEdgeIndex ); ExternalEdgeIndex = ExternalEdgeHash.Next( ExternalEdgeIndex ) )
{
FExternalEdge ExternalEdge = ExternalEdges[ ExternalEdgeIndex ];
FCluster& OtherCluster = LevelClusters[ ExternalEdge.ClusterIndex ];
if( OtherCluster.ExternalEdges[ ExternalEdge.EdgeIndex ] )
{
uint32 OtherVertIndex0 = OtherCluster.Indexes[ ExternalEdge.EdgeIndex ];
uint32 OtherVertIndex1 = OtherCluster.Indexes[ Cycle3( ExternalEdge.EdgeIndex ) ];
if( Position0 == OtherCluster.GetPosition( OtherVertIndex1 ) &&
Position1 == OtherCluster.GetPosition( OtherVertIndex0 ) )
{
// Found a matching edge; increment its count.
Cluster.AdjacentClusters.FindOrAdd( ExternalEdge.ClusterIndex, 0 )++;
// Can't break or a triple edge might be non-deterministically connected.
// Need to find all matching, not just first.
}
}
}
}
NumAdjacency += Cluster.AdjacentClusters.Num();
// Force a deterministic order of the adjacency.
Cluster.AdjacentClusters.KeySort(
[ &LevelClusters ]( uint32 A, uint32 B )
{
return LevelClusters[A].GUID < LevelClusters[B].GUID;
} );
});
// Disjoint set (union-find) over the clusters.
FDisjointSet DisjointSet( LevelClusters.Num() );
for( uint32 ClusterIndex = 0; ClusterIndex < (uint32)LevelClusters.Num(); ClusterIndex++ )
{
for( auto& Pair : LevelClusters[ ClusterIndex ].AdjacentClusters )
{
uint32 OtherClusterIndex = Pair.Key;
uint32 Count = LevelClusters[ OtherClusterIndex ].AdjacentClusters.FindChecked( ClusterIndex );
check( Count == Pair.Value );
if( ClusterIndex > OtherClusterIndex )
{
DisjointSet.UnionSequential( ClusterIndex, OtherClusterIndex );
}
}
}
// Partitioner.
FGraphPartitioner Partitioner( LevelClusters.Num() );
// Sort to force a deterministic order.
{
TArray< uint32 > SortedIndexes;
SortedIndexes.AddUninitialized( Partitioner.Indexes.Num() );
RadixSort32( SortedIndexes.GetData(), Partitioner.Indexes.GetData(), Partitioner.Indexes.Num(),
[&]( uint32 Index )
{
return LevelClusters[ Index ].GUID;
} );
Swap( Partitioner.Indexes, SortedIndexes );
}
auto GetCenter = [&]( uint32 Index )
{
FBounds& Bounds = LevelClusters[ Index ].Bounds;
return 0.5f * ( Bounds.Min + Bounds.Max );
};
// Build locality links.
Partitioner.BuildLocalityLinks( DisjointSet, MeshBounds, GetCenter );
auto* RESTRICT Graph = Partitioner.NewGraph( NumAdjacency );
// For every cluster in this level, add its adjacency (weighted by the number of shared edges) and its locality links to the graph.
for( int32 i = 0; i < LevelClusters.Num(); i++ )
{
Graph->AdjacencyOffset[i] = Graph->Adjacency.Num();
uint32 ClusterIndex = Partitioner.Indexes[i];
for( auto& Pair : LevelClusters[ ClusterIndex ].AdjacentClusters )
{
uint32 OtherClusterIndex = Pair.Key;
uint32 NumSharedEdges = Pair.Value;
const auto& Cluster0 = Clusters[ LevelOffset + ClusterIndex ];
const auto& Cluster1 = Clusters[ LevelOffset + OtherClusterIndex ];
bool bSiblings = Cluster0.GroupIndex != MAX_uint32 && Cluster0.GroupIndex == Cluster1.GroupIndex;
Partitioner.AddAdjacency( Graph, OtherClusterIndex, NumSharedEdges * ( bSiblings ? 1 : 16 ) + 4 );
}
Partitioner.AddLocalityLinks( Graph, ClusterIndex, 1 );
}
Graph->AdjacencyOffset[ Graph->Num ] = Graph->Adjacency.Num();
LOG_CRC( Graph->Adjacency );
LOG_CRC( Graph->AdjacencyCost );
LOG_CRC( Graph->AdjacencyOffset );
// Strict partitioning.
Partitioner.PartitionStrict( Graph, MinGroupSize, MaxGroupSize, true );
LOG_CRC( Partitioner.Ranges );
// Compute the maximum number of parents.
uint32 MaxParents = 0;
for( auto& Range : Partitioner.Ranges )
{
uint32 NumParentIndexes = 0;
for( uint32 i = Range.Begin; i < Range.End; i++ )
{
// Global indexing is needed in Reduce()
Partitioner.Indexes[i] += LevelOffset;
NumParentIndexes += Clusters[ Partitioner.Indexes[i] ].Indexes.Num();
}
MaxParents += FMath::DivideAndRoundUp( NumParentIndexes, FCluster::ClusterSize * 6 );
}
LevelOffset = Clusters.Num();
Clusters.AddDefaulted( MaxParents );
Groups.AddDefaulted( Partitioner.Ranges.Num() );
// Run DAGReduce (merge and simplify) in parallel.
ParallelFor( Partitioner.Ranges.Num(),
[&]( int32 PartitionIndex )
{
auto& Range = Partitioner.Ranges[ PartitionIndex ];
TArrayView< uint32 > Children( &Partitioner.Indexes[ Range.Begin ], Range.End - Range.Begin );
uint32 ClusterGroupIndex = PartitionIndex + Groups.Num() - Partitioner.Ranges.Num();
DAGReduce( Groups, Clusters, NumClusters, Children, ClusterGroupIndex, MeshIndex );
});
// Correct num to atomic count
Clusters.SetNum( NumClusters, false );
}
// Max out the root node.
uint32 RootIndex = LevelOffset;
FClusterGroup RootClusterGroup;
RootClusterGroup.Children.Add( RootIndex );
RootClusterGroup.Bounds = Clusters[ RootIndex ].SphereBounds;
RootClusterGroup.LODBounds = FSphere( 0 );
RootClusterGroup.MaxParentLODError = 1e10f;
RootClusterGroup.MinLODError = -1.0f;
RootClusterGroup.MipLevel = Clusters[RootIndex].MipLevel + 1;
RootClusterGroup.MeshIndex = MeshIndex;
Clusters[ RootIndex ].GroupIndex = Groups.Num();
Groups.Add( RootClusterGroup );
}
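The shared-edge search in BuildDAG relies on consistent triangle winding: an edge stored as (v0, v1) in one cluster appears as (v1, v0) in the neighboring cluster, which is why the second pass hashes the two positions in reverse order. The sketch below illustrates that lookup and the edge cost passed to AddAdjacency. It makes several assumptions: integer vertex IDs stand in for the hashed positions, every triangle edge is inspected instead of only the flagged external edges, and FTriCluster, CountSharedEdges, and AdjacencyCost are hypothetical names.

```cpp
#include <array>
#include <map>
#include <utility>
#include <vector>

struct FTriCluster
{
    std::vector<std::array<int, 3>> Triangles; // vertex IDs stand in for positions
};

// For every cluster, counts the edges it shares with each other cluster.
std::vector<std::map<int, int>> CountSharedEdges(const std::vector<FTriCluster>& Clusters)
{
    // Pass 1: register every directed edge together with its owning cluster.
    std::multimap<std::pair<int, int>, int> EdgeOwners;
    for (int C = 0; C < (int)Clusters.size(); ++C)
    {
        for (const auto& Tri : Clusters[C].Triangles)
        {
            for (int K = 0; K < 3; ++K)
            {
                EdgeOwners.emplace(std::make_pair(Tri[K], Tri[(K + 1) % 3]), C);
            }
        }
    }

    // Pass 2: look up the reversed edge; every hit owned by another cluster
    // is a shared edge. All hits are visited, not only the first one.
    std::vector<std::map<int, int>> Adjacent(Clusters.size());
    for (int C = 0; C < (int)Clusters.size(); ++C)
    {
        for (const auto& Tri : Clusters[C].Triangles)
        {
            for (int K = 0; K < 3; ++K)
            {
                const auto Range = EdgeOwners.equal_range(std::make_pair(Tri[(K + 1) % 3], Tri[K]));
                for (auto It = Range.first; It != Range.second; ++It)
                {
                    if (It->second != C) { ++Adjacent[C][It->second]; }
                }
            }
        }
    }
    return Adjacent;
}

// Edge cost as computed in the listing: NumSharedEdges * (bSiblings ? 1 : 16) + 4.
int AdjacencyCost(int NumSharedEdges, bool bSiblings)
{
    return NumSharedEdges * (bSiblings ? 1 : 16) + 4;
}
```

With three shared edges, the cost is 7 for siblings and 52 for non-siblings. A plausible reading is that the partitioner avoids cutting heavy edges, so clusters that came from different groups on the previous level are more likely to end up grouped together, which shifts group boundaries from level to level.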
BuildDAG above calls DAGReduce in several places. Here is a brief look at its implementation:
static void DAGReduce( TArray< FClusterGroup >& Groups, TArray< FCluster >& Clusters, TAtomic< uint32 >& NumClusters, TArrayView< uint32 > Children, int32 GroupIndex, uint32 MeshIndex )
{
check( GroupIndex >= 0 );
// Merge the clusters.
TArray< const FCluster*, TInlineAllocator<16> > MergeList;
for( int32 Child : Children )
{
MergeList.Add( &Clusters[ Child ] );
}
// Force a deterministic order.
MergeList.Sort(
[]( const FCluster& A, const FCluster& B )
{
return A.GUID < B.GUID;
} );
FCluster Merged( MergeList );
int32 NumParents = FMath::DivideAndRoundUp< int32 >( Merged.Indexes.Num(), FCluster::ClusterSize * 6 );
int32 ParentStart = 0;
int32 ParentEnd = 0;
float ParentMaxLODError = 0.0f;
// Note that TargetClusterSize decreases in steps of 2.
for( int32 TargetClusterSize = FCluster::ClusterSize - 2; TargetClusterSize > FCluster::ClusterSize / 2; TargetClusterSize -= 2 )
{
int32 TargetNumTris = NumParents * TargetClusterSize;
// Simplify; returns the maximum LOD error of the parents.
ParentMaxLODError = Merged.Simplify( TargetNumTris );
// Split.
if( NumParents == 1 )
{
ParentEnd = ( NumClusters += NumParents );
ParentStart = ParentEnd - NumParents;
Clusters[ ParentStart ] = Merged;
Clusters[ ParentStart ].Bound();
break;
}
else
{
FGraphPartitioner Partitioner( Merged.Indexes.Num() / 3 );
Merged.Split( Partitioner );
if( Partitioner.Ranges.Num() <= NumParents )
{
NumParents = Partitioner.Ranges.Num();
ParentEnd = ( NumClusters += NumParents );
ParentStart = ParentEnd - NumParents;
int32 Parent = ParentStart;
for( auto& Range : Partitioner.Ranges )
{
Clusters[ Parent ] = FCluster( Merged, Range.Begin, Range.End, Partitioner.Indexes );
Parent++;
}
break;
}
}
}
TArray< FSphere, TInlineAllocator<32> > Children_LODBounds;
TArray< FSphere, TInlineAllocator<32> > Children_SphereBounds;
// Force monotonic nesting.
float ChildMinLODError = MAX_flt;
for( int32 Child : Children )
{
bool bLeaf = Clusters[ Child ].EdgeLength < 0.0f;
float LODError = Clusters[ Child ].LODError;
Children_LODBounds.Add( Clusters[ Child ].LODBounds );
Children_SphereBounds.Add( Clusters[ Child ].SphereBounds );
ChildMinLODError = FMath::Min( ChildMinLODError, bLeaf ? -1.0f : LODError );
ParentMaxLODError = FMath::Max( ParentMaxLODError, LODError );
Clusters[ Child ].GroupIndex = GroupIndex;
Groups[ GroupIndex ].Children.Add( Child );
check( Groups[ GroupIndex ].Children.Num() <= MAX_CLUSTERS_PER_GROUP_TARGET );
}
FSphere ParentLODBounds( Children_LODBounds.GetData(), Children_LODBounds.Num() );
FSphere ParentBounds( Children_SphereBounds.GetData(), Children_SphereBounds.Num() );
// Force all parents to share the same LOD data; they depend on each other.
for( int32 Parent = ParentStart; Parent < ParentEnd; Parent++ )
{
Clusters[ Parent ].LODBounds = ParentLODBounds;
Clusters[ Parent ].LODError = ParentMaxLODError;
Clusters[ Parent ].GeneratingGroupIndex = GroupIndex;
}
Groups[ GroupIndex ].Bounds = ParentBounds;
Groups[ GroupIndex ].LODBounds = ParentLODBounds;
Groups[ GroupIndex ].MinLODError = ChildMinLODError;
Groups[ GroupIndex ].MaxParentLODError = ParentMaxLODError;
Groups[ GroupIndex ].MipLevel = Merged.MipLevel - 1;
Groups[ GroupIndex ].MeshIndex = MeshIndex;
}
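The parent count and the simplification targets in DAGReduce are plain integer arithmetic and can be worked through in isolation. Since every triangle contributes three indices, dividing the merged index count by ClusterSize * 6 yields one parent per 2 * ClusterSize triangles, so each level roughly halves the triangle count. The sketch below assumes a cluster size of 128, a value that this excerpt does not show, and NumParentsFor and CandidateTriTargets are hypothetical helper names.

```cpp
#include <vector>

// Assumption: the value of FCluster::ClusterSize is not shown in this excerpt.
constexpr int AssumedClusterSize = 128;

int DivideAndRoundUp(int Dividend, int Divisor)
{
    return (Dividend + Divisor - 1) / Divisor;
}

// Mirrors: DivideAndRoundUp( Merged.Indexes.Num(), FCluster::ClusterSize * 6 ).
int NumParentsFor(int NumMergedIndexes, int ClusterSize)
{
    return DivideAndRoundUp(NumMergedIndexes, ClusterSize * 6);
}

// Mirrors the loop header of DAGReduce: the per-cluster target starts at
// ClusterSize - 2 and drops by 2 while it stays above ClusterSize / 2.
// Each entry is the TargetNumTris that would be passed to Simplify.
std::vector<int> CandidateTriTargets(int NumParents, int ClusterSize)
{
    std::vector<int> Targets;
    for (int TargetClusterSize = ClusterSize - 2; TargetClusterSize > ClusterSize / 2; TargetClusterSize -= 2)
    {
        Targets.push_back(NumParents * TargetClusterSize);
    }
    return Targets;
}
```

For a group of 8 full clusters (1024 triangles, 3072 indices), this gives 4 parents, a first target of 504 triangles, and a last-resort target of 264 triangles. In the listing the loop stops at the first target for which the split produces no more than NumParents clusters, so the later targets are only fallbacks.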
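BuildCoarseRepresentation builds a coarse representation of the mesh from the input cluster list and cluster group list, and outputs the corresponding vertices, indices, sections, and related data: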
static void BuildCoarseRepresentation(
const TArray<FClusterGroup>& Groups,
const TArray<FCluster>& Clusters,
TArray<FStaticMeshBuildVertex>& Verts,
TArray<uint32>& Indexes,
TArray<FStaticMeshSection, TInlineAllocator<1>>& Sections,
uint32& NumTexCoords,
uint32 TargetNumTris
)
{
FCluster CoarseRepresentation = FindDAGCut(Groups, Clusters, TargetNumTris + 4096);
CoarseRepresentation.Simplify(TargetNumTris);
TArray< FStaticMeshSection, TInlineAllocator<1> > OldSections = Sections;
// Update the UV count of the coarse representation to match the new data.
NumTexCoords = CoarseRepresentation.NumTexCoords;
// Rebuild the vertex data.
Verts.Empty(CoarseRepresentation.NumVerts);
for (uint32 Iter = 0, Num = CoarseRepresentation.NumVerts; Iter < Num; ++Iter)
{
FStaticMeshBuildVertex Vertex = {};
Vertex.Position = CoarseRepresentation.GetPosition(Iter);
Vertex.TangentX = FVector::ZeroVector;
Vertex.TangentY = FVector::ZeroVector;
Vertex.TangentZ = CoarseRepresentation.GetNormal(Iter);
const FVector2D* UVs = CoarseRepresentation.GetUVs(Iter);
for (uint32 UVIndex = 0; UVIndex < NumTexCoords; ++UVIndex)
{
Vertex.UVs[UVIndex] = UVs[UVIndex].ContainsNaN() ? FVector2D::ZeroVector : UVs[UVIndex];
}
if (CoarseRepresentation.bHasColors)
{
Vertex.Color = CoarseRepresentation.GetColor(Iter).ToFColor(false /* sRGB */);
}
Verts.Add(Vertex);
}
TArray<FMaterialTriangle, TInlineAllocator<128>> CoarseMaterialTris;
TArray<FMaterialRange, TInlineAllocator<4>> CoarseMaterialRanges;
// Compute the material ranges of the coarse representation.
BuildMaterialRanges(
CoarseRepresentation.Indexes,
CoarseRepresentation.MaterialIndexes,
CoarseMaterialTris,
CoarseMaterialRanges);
check(CoarseMaterialRanges.Num() <= OldSections.Num());
// Rebuild the section data.
Sections.Reset(CoarseMaterialRanges.Num());
for (const FStaticMeshSection& OldSection : OldSections)
{
// Add new sections based on the computed material ranges.
// Force the material order to match OldSections.
const FMaterialRange* FoundRange = CoarseMaterialRanges.FindByPredicate([&OldSection](const FMaterialRange& Range) { return Range.MaterialIndex == OldSection.MaterialIndex; });
// Sections whose source data did not contain enough triangles may actually be removed from the coarse mesh.
if (FoundRange)
{
// Copy properties from the original mesh section.
FStaticMeshSection Section(OldSection);
// Range of vertices and indices used when rendering this section.
Section.FirstIndex = FoundRange->RangeStart * 3;
Section.NumTriangles = FoundRange->RangeLength;
Section.MinVertexIndex = TNumericLimits<uint32>::Max();
Section.MaxVertexIndex = TNumericLimits<uint32>::Min();
for (uint32 TriangleIndex = 0; TriangleIndex < (FoundRange->RangeStart + FoundRange->RangeLength); ++TriangleIndex)
{
const FMaterialTriangle& Triangle = CoarseMaterialTris[TriangleIndex];
// Update the minimum vertex index.
Section.MinVertexIndex = FMath::Min(Section.MinVertexIndex, Triangle.Index0);
Section.MinVertexIndex = FMath::Min(Section.MinVertexIndex, Triangle.Index1);
Section.MinVertexIndex = FMath::Min(Section.MinVertexIndex, Triangle.Index2);
// Update the maximum vertex index.
Section.MaxVertexIndex = FMath::Max(Section.MaxVertexIndex, Triangle.Index0);
Section.MaxVertexIndex = FMath::Max(Section.MaxVertexIndex, Triangle.Index1);
Section.MaxVertexIndex = FMath::Max(Section.MaxVertexIndex, Triangle.Index2);
}
Sections.Add(Section);
}
}
// Rebuild the index data.
Indexes.Reset();
for (const FMaterialTriangle& Triangle : CoarseMaterialTris)
{
Indexes.Add(Triangle.Index0);
Indexes.Add(Triangle.Index1);
Indexes.Add(Triangle.Index2);
}
// Compute tangents.
CalcTangents(Verts, Indexes);
}
Encode takes the clusters and cluster groups and, according to FMeshNaniteSettings, encodes them into the Nanite resource (FResources):
// Engine\Source\Developer\NaniteBuilder\Private\NaniteEncode.cpp
void Encode(
FResources& Resources,
const FMeshNaniteSettings& Settings,
TArray< FCluster >& Clusters,
TArray< FClusterGroup >& Groups,
const FBounds& MeshBounds,
uint32 NumMeshes,
uint32 NumTexCoords,
bool bHasColors )
{
// Remove degenerate triangles.
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::RemoveDegenerateTriangles);
RemoveDegenerateTriangles( Clusters );
}
// Build material ranges.
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::BuildMaterialRanges);
BuildMaterialRanges( Clusters );
}
// Constrain clusters.
#if USE_CONSTRAINED_CLUSTERS
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::ConstrainClusters);
ConstrainClusters( Groups, Clusters );
}
(......)
#endif
// Compute quantized positions.
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::CalculateQuantizedPositions);
// Must run after the clusters have been constrained and split.
Resources.PositionPrecision = CalculateQuantizedPositionsUniformGrid( Clusters, MeshBounds, Settings );
}
// Print material range statistics.
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::PrintMaterialRangeStats);
PrintMaterialRangeStats( Clusters );
}
TArray<FPage> Pages;
TArray<FClusterGroupPart> GroupParts;
TArray<FEncodingInfo> EncodingInfos;
// Compute encoding infos.
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::CalculateEncodingInfos);
CalculateEncodingInfos(EncodingInfos, Clusters, bHasColors, NumTexCoords);
}
// Assign clusters to pages.
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::AssignClustersToPages);
AssignClustersToPages(Groups, Clusters, EncodingInfos, Pages, GroupParts);
}
// Build the hierarchy nodes of the cluster groups.
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::BuildHierarchyNodes);
BuildHierarchies(Resources, Groups, GroupParts, NumMeshes);
}
// Write the cluster and cluster group data into the pages.
{
TRACE_CPUPROFILER_EVENT_SCOPE(Nanite::Build::WritePages);
WritePages(Resources, Pages, Groups, GroupParts, Clusters, EncodingInfos, NumTexCoords);
}
}
The encoding process above involves several important functions, which are analyzed one by one below:
// Engine\Source\Developer\NaniteBuilder\Private\NaniteEncode.cpp
static void RemoveDegenerateTriangles(TArray<FCluster>& Clusters)
{
// Remove degenerate triangles from all clusters in parallel.
ParallelFor( Clusters.Num(),
[&]( uint32 ClusterIndex )
{
RemoveDegenerateTriangles( Clusters[ ClusterIndex ] );
} );
}
// Remove degenerate triangles from a single cluster.
static void RemoveDegenerateTriangles(FCluster& Cluster)
{
uint32 NumOldTriangles = Cluster.NumTris;
uint32 NumNewTriangles = 0;
for (uint32 OldTriangleIndex = 0; OldTriangleIndex < NumOldTriangles; OldTriangleIndex++)
{
uint32 i0 = Cluster.Indexes[OldTriangleIndex * 3 + 0];
uint32 i1 = Cluster.Indexes[OldTriangleIndex * 3 + 1];
uint32 i2 = Cluster.Indexes[OldTriangleIndex * 3 + 2];
uint32 mi = Cluster.MaterialIndexes[OldTriangleIndex];
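// A non-degenerate triangle must have three distinct vertex indices.
// Author's note: this could perhaps be extended, e.g. by also treating a triangle as degenerate when the distance between any two of its vertices is below some threshold (0.01f).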
if (i0 != i1 && i0 != i2 && i1 != i2)
{
Cluster.Indexes[NumNewTriangles * 3 + 0] = i0;
Cluster.Indexes[NumNewTriangles * 3 + 1] = i1;
Cluster.Indexes[NumNewTriangles * 3 + 2] = i2;
Cluster.MaterialIndexes[NumNewTriangles] = mi;
NumNewTriangles++;
}
}
Cluster.NumTris = NumNewTriangles;
Cluster.Indexes.SetNum(NumNewTriangles * 3);
Cluster.MaterialIndexes.SetNum(NumNewTriangles);
}
// Sort the cluster's triangles into material ranges and add the material ranges to the cluster.
static void BuildMaterialRanges( TArray<FCluster>& Clusters )
{
// Process in parallel.
ParallelFor( Clusters.Num(),
[&]( uint32 ClusterIndex )
{
BuildMaterialRanges( Clusters[ ClusterIndex ] );
} );
}
static void BuildMaterialRanges(FCluster& Cluster)
{
TArray<FMaterialTriangle, TInlineAllocator<128>> MaterialTris;
// Build the material ranges of a single cluster.
BuildMaterialRanges(
Cluster.Indexes,
Cluster.MaterialIndexes,
MaterialTris,
Cluster.MaterialRanges);
// Write the indices back to the cluster.
for (uint32 Triangle = 0; Triangle < Cluster.NumTris; ++Triangle)
{
Cluster.Indexes[Triangle * 3 + 0] = MaterialTris[Triangle].Index0;
Cluster.Indexes[Triangle * 3 + 1] = MaterialTris[Triangle].Index1;
Cluster.Indexes[Triangle * 3 + 2] = MaterialTris[Triangle].Index2;
Cluster.MaterialIndexes[Triangle] = MaterialTris[Triangle].MaterialIndex;
}
}
// Constrain clusters.
static void ConstrainClusters( TArray< FClusterGroup >& ClusterGroups, TArray< FCluster >& Clusters )
{
// Gather statistics.
uint32 TotalOldTriangles = 0;
uint32 TotalOldVertices = 0;
for( const FCluster& Cluster : Clusters )
{
TotalOldTriangles += Cluster.NumTris;
TotalOldVertices += Cluster.NumVerts;
}
// Constrain the clusters in parallel, with or without strip indices.
ParallelFor( Clusters.Num(),
[&]( uint32 i )
{
#if USE_STRIP_INDICES // Use strip indices.
FStripifier Stripifier;
Stripifier.ConstrainAndStripifyCluster(Clusters[i]);
#else // Do not use strip indices.
ConstrainClusterFIFO(Clusters[i]);
#endif
} );
uint32 TotalNewTriangles = 0;
uint32 TotalNewVertices = 0;
// Accumulate the new totals and split oversized clusters.
const uint32 NumOldClusters = Clusters.Num();
for( uint32 i = 0; i < NumOldClusters; i++ )
{
TotalNewTriangles += Clusters[ i ].NumTris;
TotalNewVertices += Clusters[ i ].NumVerts;
// If a cluster has too many vertices (more than 256), split it in two.
if( Clusters[ i ].NumVerts > 256 )
{
FCluster ClusterA, ClusterB;
uint32 NumTrianglesA = Clusters[ i ].NumTris / 2;
uint32 NumTrianglesB = Clusters[ i ].NumTris - NumTrianglesA;
BuildClusterFromClusterTriangleRange( Clusters[ i ], ClusterA, 0, NumTrianglesA );
BuildClusterFromClusterTriangleRange( Clusters[ i ], ClusterB, NumTrianglesA, NumTrianglesB );
Clusters[ i ] = ClusterA;
ClusterGroups[ ClusterB.GroupIndex ].Children.Add( Clusters.Num() );
Clusters.Add( ClusterB );
}
}
// Gather statistics.
uint32 TotalNewTrianglesWithSplits = 0;
uint32 TotalNewVerticesWithSplits = 0;
for( const FCluster& Cluster : Clusters )
{
TotalNewTrianglesWithSplits += Cluster.NumTris;
TotalNewVerticesWithSplits += Cluster.NumVerts;
}
(......)
}
// Compute quantized positions on a uniform grid.
static int32 CalculateQuantizedPositionsUniformGrid(TArray< FCluster >& Clusters, const FBounds& MeshBounds, const FMeshNaniteSettings& Settings)
{
// Simplified global quantization for EA.
const int32 MaxPositionQuantizedValue = (1 << MAX_POSITION_QUANTIZATION_BITS) - 1;
int32 PositionPrecision = Settings.PositionPrecision;
if (PositionPrecision == MIN_int32)
{
// Auto: guess the required precision from the bounds of the leaf level.
const float MaxSize = MeshBounds.GetExtent().GetMax();
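// Heuristic: denser meshes need higher resolution.
// Use the geometric mean of the cluster sizes as a proxy for density.
// Alternative interpretation: the bit precision is the average of what the clusters need.
// For roughly uniformly sized clusters this gives results very similar to the old quantization code.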
double TotalLogSize = 0.0;
int32 TotalNum = 0;
for (const FCluster& Cluster : Clusters)
{
if (Cluster.MipLevel == 0)
{
float ExtentSize = Cluster.Bounds.GetExtent().Size();
if (ExtentSize > 0.0)
{
TotalLogSize += FMath::Log2(ExtentSize);
TotalNum++;
}
}
}
double AvgLogSize = TotalNum > 0 ? TotalLogSize / TotalNum : 0.0;
PositionPrecision = 7 - FMath::RoundToInt(AvgLogSize);
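// Clamp the precision. The user now has to opt in explicitly to the lowest precision settings.
// Those settings are likely to cause issues and contribute very little to disk size savings (about 0.4% on the test project), so they should not be picked automatically.
// Example: a very low-resolution road or building frame may seem to need little precision in isolation, but it still needs fairly high precision in a scene because smaller meshes are placed on top of or inside it.
const int32 AUTO_MIN_PRECISION = 4; // The minimum automatic precision is 1/16 cm.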
PositionPrecision = FMath::Max(PositionPrecision, AUTO_MIN_PRECISION);
}
// Compute the quantization scale.
float QuantizationScale = FMath::Exp2((float)PositionPrecision);
// Make sure all clusters are encodable. A large enough cluster could hit the 21bpc limit; if that happens, scale down until it fits.
for (const FCluster& Cluster : Clusters)
{
const FBounds& Bounds = Cluster.Bounds;
int32 Iterations = 0;
while (true)
{
float MinX = FMath::RoundToFloat(Bounds.Min.X * QuantizationScale);
float MinY = FMath::RoundToFloat(Bounds.Min.Y * QuantizationScale);
float MinZ = FMath::RoundToFloat(Bounds.Min.Z * QuantizationScale);
float MaxX = FMath::RoundToFloat(Bounds.Max.X * QuantizationScale);
float MaxY = FMath::RoundToFloat(Bounds.Max.Y * QuantizationScale);
float MaxZ = FMath::RoundToFloat(Bounds.Max.Z * QuantizationScale);
if (MinX >= (double)MIN_int32 && MinY >= (double)MIN_int32 && MinZ >= (double)MIN_int32 && // MIN_int32/MAX_int32 is not representable in float
MaxX <= (double)MAX_int32 && MaxY <= (double)MAX_int32 && MaxZ <= (double)MAX_int32 &&
((int32)MaxX - (int32)MinX) <= MaxPositionQuantizedValue && ((int32)MaxY - (int32)MinY) <= MaxPositionQuantizedValue && ((int32)MaxZ - (int32)MinZ) <= MaxPositionQuantizedValue)
{
break;
}
QuantizationScale *= 0.5f;
PositionPrecision--;
check(++Iterations < 100); // Endless loop?
}
}
const float RcpQuantizationScale = 1.0f / QuantizationScale;
// Quantize the positions in parallel.
ParallelFor(Clusters.Num(), [&](uint32 ClusterIndex)
{
FCluster& Cluster = Clusters[ClusterIndex];
const uint32 NumClusterVerts = Cluster.NumVerts;
const uint32 ClusterShift = Cluster.QuantizedPosShift;
Cluster.QuantizedPositions.SetNumUninitialized(NumClusterVerts);
// Quantize positions.
FIntVector IntClusterMax = { MIN_int32, MIN_int32, MIN_int32 };
FIntVector IntClusterMin = { MAX_int32, MAX_int32, MAX_int32 };
for (uint32 i = 0; i < NumClusterVerts; i++)
{
const FVector Position = Cluster.GetPosition(i);
FIntVector& IntPosition = Cluster.QuantizedPositions[i];
float PosX = FMath::RoundToFloat(Position.X * QuantizationScale);
float PosY = FMath::RoundToFloat(Position.Y * QuantizationScale);
float PosZ = FMath::RoundToFloat(Position.Z * QuantizationScale);
IntPosition = FIntVector((int32)PosX, (int32)PosY, (int32)PosZ);
IntClusterMax.X = FMath::Max(IntClusterMax.X, IntPosition.X);
IntClusterMax.Y = FMath::Max(IntClusterMax.Y, IntPosition.Y);
IntClusterMax.Z = FMath::Max(IntClusterMax.Z, IntPosition.Z);
IntClusterMin.X = FMath::Min(IntClusterMin.X, IntPosition.X);
IntClusterMin.Y = FMath::Min(IntClusterMin.Y, IntPosition.Y);
IntClusterMin.Z = FMath::Min(IntClusterMin.Z, IntPosition.Z);
}
// Minimum number of bits needed for storage.
const uint32 NumBitsX = FMath::CeilLogTwo(IntClusterMax.X - IntClusterMin.X + 1);
const uint32 NumBitsY = FMath::CeilLogTwo(IntClusterMax.Y - IntClusterMin.Y + 1);
const uint32 NumBitsZ = FMath::CeilLogTwo(IntClusterMax.Z - IntClusterMin.Z + 1);
check(NumBitsX <= MAX_POSITION_QUANTIZATION_BITS);
check(NumBitsY <= MAX_POSITION_QUANTIZATION_BITS);
check(NumBitsZ <= MAX_POSITION_QUANTIZATION_BITS);
for (uint32 i = 0; i < NumClusterVerts; i++)
{
FIntVector& IntPosition = Cluster.QuantizedPositions[i];
// Update the floating-point position with the quantized data.
Cluster.GetPosition(i) = FVector(IntPosition.X * RcpQuantizationScale, IntPosition.Y * RcpQuantizationScale, IntPosition.Z * RcpQuantizationScale);
IntPosition.X -= IntClusterMin.X;
IntPosition.Y -= IntClusterMin.Y;
IntPosition.Z -= IntClusterMin.Z;
check(IntPosition.X >= 0 && IntPosition.X < (1 << NumBitsX));
check(IntPosition.Y >= 0 && IntPosition.Y < (1 << NumBitsY));
check(IntPosition.Z >= 0 && IntPosition.Z < (1 << NumBitsZ));
}
// Update the bounds.
Cluster.Bounds.Min = FVector(IntClusterMin.X * RcpQuantizationScale, IntClusterMin.Y * RcpQuantizationScale, IntClusterMin.Z * RcpQuantizationScale);
Cluster.Bounds.Max = FVector(IntClusterMax.X * RcpQuantizationScale, IntClusterMax.Y * RcpQuantizationScale, IntClusterMax.Z * RcpQuantizationScale);
Cluster.MeshBoundsMin = FVector::ZeroVector;
Cluster.MeshBoundsDelta = FVector(RcpQuantizationScale);
Cluster.QuantizedPosBits = FIntVector(NumBitsX, NumBitsY, NumBitsZ);
Cluster.QuantizedPosStart = IntClusterMin;
Cluster.QuantizedPosShift = 0;
} );
return PositionPrecision;
}
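The arithmetic of CalculateQuantizedPositionsUniformGrid can be followed on plain numbers. The sketch below reproduces three steps from the listing: the automatic precision 7 - round(average log2 extent) clamped to at least 4, the scaling by 2^precision, and the per-axis bit count ceil(log2(max - min + 1)). It is a simplified illustration, not engine code: AutoPositionPrecision, QuantizeCoordinate, and BitsForRange are hypothetical names, and the standard rounding functions used here may differ from the FMath helpers for negative half-way values.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Automatic precision from the extent sizes of the leaf (mip 0) clusters.
int AutoPositionPrecision(const std::vector<float>& LeafExtentSizes)
{
    double TotalLogSize = 0.0;
    int TotalNum = 0;
    for (float ExtentSize : LeafExtentSizes)
    {
        if (ExtentSize > 0.0f)
        {
            TotalLogSize += std::log2(static_cast<double>(ExtentSize));
            ++TotalNum;
        }
    }
    const double AvgLogSize = TotalNum > 0 ? TotalLogSize / TotalNum : 0.0;
    const int Precision = 7 - static_cast<int>(std::lround(AvgLogSize));
    return std::max(Precision, 4); // same clamp as AUTO_MIN_PRECISION
}

// Quantizes one coordinate to the grid with cell size 2^-Precision.
int QuantizeCoordinate(float Value, int Precision)
{
    const float QuantizationScale = std::exp2(static_cast<float>(Precision));
    return static_cast<int>(std::round(Value * QuantizationScale));
}

// Bits needed to store values in [0, MaxQ - MinQ]; assumes MaxQ >= MinQ and
// a range far below 2^31.
unsigned BitsForRange(int MinQ, int MaxQ)
{
    const unsigned Range = static_cast<unsigned>(MaxQ - MinQ + 1);
    unsigned Bits = 0;
    while ((1u << Bits) < Range) { ++Bits; }
    return Bits;
}
```

With leaf clusters of extent size 8, the automatic precision is 7 - 3 = 4, so the grid cell is 1/16 of a unit. A coordinate of 1.5 quantizes to 24, and a cluster spanning the quantized values 0 to 24 on one axis needs 5 bits for that axis.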
// Compute the encoding infos for a set of clusters.
static void CalculateEncodingInfos(TArray<FEncodingInfo>& EncodingInfos, const TArray<Nanite::FCluster>& Clusters, bool bHasColors, uint32 NumTexCoords)
{
uint32 NumClusters = Clusters.Num();
EncodingInfos.SetNumUninitialized(NumClusters);
for (uint32 i = 0; i < NumClusters; i++)
{
CalculateEncodingInfo(EncodingInfos[i], Clusters[i], bHasColors, NumTexCoords);
}
}
// Compute the encoding info for a single cluster.
static void CalculateEncodingInfo(FEncodingInfo& Info, const Nanite::FCluster& Cluster, bool bHasColors, uint32 NumTexCoords)
{
const uint32 NumClusterVerts = Cluster.NumVerts;
const uint32 NumClusterTris = Cluster.NumTris;
FMemory::Memzero(Info);
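// Write triangle indices. Indices are stored in a dense bit stream using ceil(log2(NumClusterVertices)) bits per index. The shaders implement unaligned bit stream reads to support this.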
const uint32 BitsPerIndex = NumClusterVerts > 1 ? (FGenericPlatformMath::FloorLog2(NumClusterVerts - 1) + 1) : 0;
const uint32 BitsPerTriangle = BitsPerIndex + 2 * 5; // Base index + two 5-bit offsets
Info.BitsPerIndex = BitsPerIndex;
// Compute the page section sizes.
FPageSections& GpuSizes = Info.GpuSizes;
GpuSizes.Cluster = sizeof(FPackedCluster);
GpuSizes.MaterialTable = CalcMaterialTableSize(Cluster) * sizeof(uint32);
GpuSizes.DecodeInfo = NumTexCoords * sizeof(FUVRange);
GpuSizes.Index = (NumClusterTris * BitsPerTriangle + 31) / 32 * 4;
#if USE_UNCOMPRESSED_VERTEX_DATA // Use uncompressed vertex data.
const uint32 AttribBytesPerVertex = (3 * sizeof(float) + sizeof(uint32) + NumTexCoords * 2 * sizeof(float));
Info.BitsPerAttribute = AttribBytesPerVertex * 8;
Info.ColorMin = FIntVector4(0, 0, 0, 0);
Info.ColorBits = FIntVector4(8, 8, 8, 8);
Info.ColorMode = VERTEX_COLOR_MODE_VARIABLE;
Info.UVPrec = 0;
GpuSizes.Position = NumClusterVerts * 3 * sizeof(float);
GpuSizes.Attribute = NumClusterVerts * AttribBytesPerVertex;
#else // Use compressed vertex data.
Info.BitsPerAttribute = 2 * NORMAL_QUANTIZATION_BITS;
check(NumClusterVerts > 0);
const bool bIsLeaf = (Cluster.GeneratingGroupIndex == INVALID_GROUP_INDEX);
// Vertex colors.
Info.ColorMode = VERTEX_COLOR_MODE_WHITE;
Info.ColorMin = FIntVector4(255, 255, 255, 255);
if (bHasColors)
{
FIntVector4 ColorMin = FIntVector4( 255, 255, 255, 255);
FIntVector4 ColorMax = FIntVector4( 0, 0, 0, 0);
for (uint32 i = 0; i < NumClusterVerts; i++)
{
FColor Color = Cluster.GetColor(i).ToFColor(false);
ColorMin.X = FMath::Min(ColorMin.X, (int32)Color.R);
ColorMin.Y = FMath::Min(ColorMin.Y, (int32)Color.G);
ColorMin.Z = FMath::Min(ColorMin.Z, (int32)Color.B);
ColorMin.W = FMath::Min(ColorMin.W, (int32)Color.A);
ColorMax.X = FMath::Max(ColorMax.X, (int32)Color.R);
ColorMax.Y = FMath::Max(ColorMax.Y, (int32)Color.G);
ColorMax.Z = FMath::Max(ColorMax.Z, (int32)Color.B);
ColorMax.W = FMath::Max(ColorMax.W, (int32)Color.A);
}
const FIntVector4 ColorDelta = ColorMax - ColorMin;
const int32 R_Bits = FMath::CeilLogTwo(ColorDelta.X + 1);
const int32 G_Bits = FMath::CeilLogTwo(ColorDelta.Y + 1);
const int32 B_Bits = FMath::CeilLogTwo(ColorDelta.Z + 1);
const int32 A_Bits = FMath::CeilLogTwo(ColorDelta.W + 1);
uint32 NumColorBits = R_Bits + G_Bits + B_Bits + A_Bits;
Info.BitsPerAttribute += NumColorBits;
Info.ColorMin = ColorMin;
Info.ColorBits = FIntVector4(R_Bits, G_Bits, B_Bits, A_Bits);
if (NumColorBits > 0)
{
Info.ColorMode = VERTEX_COLOR_MODE_VARIABLE;
}
else
{
if (ColorMin.X == 255 && ColorMin.Y == 255 && ColorMin.Z == 255 && ColorMin.W == 255)
Info.ColorMode = VERTEX_COLOR_MODE_WHITE;
else
Info.ColorMode = VERTEX_COLOR_MODE_CONSTANT;
}
}
for( uint32 UVIndex = 0; UVIndex < NumTexCoords; UVIndex++ )
{
FGeometryEncodingUVInfo& UVInfo = Info.UVInfos[UVIndex];
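// Block-compress the texture coordinates.
// Texture coordinates are stored relative to the cluster's min/max UV.
// UV seams produce very large, sparse bounding rectangles. To mitigate this, the largest gap in U and in V inside the bounding rectangle is excluded from the encoding space.
// Decoding is very simple: UV += (UV >= GapStart) ? GapRange : 0;
// Build sorted arrays of the U and V values.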
TArray<float> UValues;
TArray<float> VValues;
UValues.AddUninitialized(NumClusterVerts);
VValues.AddUninitialized(NumClusterVerts);
for (uint32 i = 0; i < NumClusterVerts; i++)
{
const FVector2D& UV = Cluster.GetUVs(i)[ UVIndex ];
UValues[i] = UV.X;
VValues[i] = UV.Y;
}
UValues.Sort();
VValues.Sort();
// Find the largest gap between consecutive sorted UV values.
FVector2D LargestGapStart = FVector2D(UValues[0], VValues[0]);
FVector2D LargestGapEnd = FVector2D(UValues[0], VValues[0]);
for (uint32 i = 0; i < NumClusterVerts - 1; i++)
{
if (UValues[i + 1] - UValues[i] > LargestGapEnd.X - LargestGapStart.X)
{
LargestGapStart.X = UValues[i];
LargestGapEnd.X = UValues[i + 1];
}
if (VValues[i + 1] - VValues[i] > LargestGapEnd.Y - LargestGapStart.Y)
{
LargestGapStart.Y = VValues[i];
LargestGapEnd.Y = VValues[i + 1];
}
}
const FVector2D UVMin = FVector2D(UValues[0], VValues[0]);
const FVector2D UVMax = FVector2D(UValues[NumClusterVerts - 1], VValues[NumClusterVerts - 1]);
const FVector2D UVDelta = UVMax - UVMin;
const FVector2D UVRcpDelta = FVector2D( UVDelta.X > SMALL_NUMBER ? 1.0f / UVDelta.X : 0.0f,
UVDelta.Y > SMALL_NUMBER ? 1.0f / UVDelta.Y : 0.0f);
const FVector2D NonGapLength = FVector2D::Max(UVDelta - (LargestGapEnd - LargestGapStart), FVector2D(0.0f, 0.0f));
const FVector2D NormalizedGapStart = (LargestGapStart - UVMin) * UVRcpDelta;
const FVector2D NormalizedGapEnd = (LargestGapEnd - UVMin) * UVRcpDelta;
const FVector2D NormalizedNonGapLength = NonGapLength * UVRcpDelta;
#if 1
const float TexCoordUnitPrecision = (1 << 14); // TODO: Implement UI + 'Auto' mode that decides when this is necessary.
int32 TexCoordBitsU = 0;
if (UVDelta.X > 0)
{
// Even when NonGapLength is 0, UVDelta is non-zero, so at least 2 values (1 bit) are needed to distinguish high from low.
int32 NumValues = FMath::Max(FMath::CeilToInt(NonGapLength.X * TexCoordUnitPrecision), 2);
// Limit to 12 bits, which, judging by the temporary hack below, is good enough.
TexCoordBitsU = FMath::Min((int32)FMath::CeilLogTwo(NumValues), 12);
}
int32 TexCoordBitsV = 0;
if (UVDelta.Y > 0)
{
int32 NumValues = FMath::Max(FMath::CeilToInt(NonGapLength.Y * TexCoordUnitPrecision), 2);
TexCoordBitsV = FMath::Min((int32)FMath::CeilLogTwo(NumValues), 12);
}
#else
// Temporary hack to work around encoding issues.
const int32 TexCoordBitsU = 12;
const int32 TexCoordBitsV = 12;
#endif
// Process the UV precision, quantized range and gap sizes.
Info.UVPrec |= ((TexCoordBitsV << 4) | TexCoordBitsU) << (UVIndex * 8);
const int32 TexCoordMaxValueU = (1 << TexCoordBitsU) - 1;
const int32 TexCoordMaxValueV = (1 << TexCoordBitsV) - 1;
const int32 NU = (int32)FMath::Clamp(NormalizedNonGapLength.X > SMALL_NUMBER ? (TexCoordMaxValueU - 2) / NormalizedNonGapLength.X : 0.0f, (float)TexCoordMaxValueU, (float)0xFFFF);
const int32 NV = (int32)FMath::Clamp(NormalizedNonGapLength.Y > SMALL_NUMBER ? (TexCoordMaxValueV - 2) / NormalizedNonGapLength.Y : 0.0f, (float)TexCoordMaxValueV, (float)0xFFFF);
int32 GapStartU = TexCoordMaxValueU + 1;
int32 GapStartV = TexCoordMaxValueV + 1;
int32 GapLengthU = 0;
int32 GapLengthV = 0;
if (NU > TexCoordMaxValueU)
{
GapStartU = int32(NormalizedGapStart.X * NU + 0.5f) + 1;
const int32 GapEndU = int32(NormalizedGapEnd.X * NU + 0.5f);
GapLengthU = FMath::Max(GapEndU - GapStartU, 0);
}
if (NV > TexCoordMaxValueV)
{
GapStartV = int32(NormalizedGapStart.Y * NV + 0.5f) + 1;
const int32 GapEndV = int32(NormalizedGapEnd.Y * NV + 0.5f);
GapLengthV = FMath::Max(GapEndV - GapStartV, 0);
}
UVInfo.UVRange.Min = UVMin;
UVInfo.UVRange.Scale = FVector2D(NU > 0 ? UVDelta.X / NU : 0.0f, NV > 0 ? UVDelta.Y / NV : 0.0f);
check(GapStartU >= 0);
check(GapStartV >= 0);
UVInfo.UVRange.GapStart[0] = GapStartU;
UVInfo.UVRange.GapStart[1] = GapStartV;
UVInfo.UVRange.GapLength[0] = GapLengthU;
UVInfo.UVRange.GapLength[1] = GapLengthV;
UVInfo.UVDelta = UVDelta;
UVInfo.UVRcpDelta = UVRcpDelta;
UVInfo.NU = NU;
UVInfo.NV = NV;
Info.BitsPerAttribute += TexCoordBitsU + TexCoordBitsV;
}
const uint32 PositionBitsPerVertex = Cluster.QuantizedPosBits.X + Cluster.QuantizedPosBits.Y + Cluster.QuantizedPosBits.Z;
GpuSizes.Position = (NumClusterVerts * PositionBitsPerVertex + 31) / 32 * 4;
GpuSizes.Attribute = (NumClusterVerts * Info.BitsPerAttribute + 31) / 32 * 4;
#endif
}
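The gap-exclusion trick used for texture coordinates above can be illustrated in one dimension. The following is a minimal sketch, not the engine code: it works on already-quantized integer values instead of floats, and the names `FGapRange1D`, `FindLargestGap`, `EncodeWithGap` and `DecodeWithGap` are hypothetical. It only shows why removing the widest unused interval reduces the number of bits needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 1D stand-in for the per-axis gap stored in FUVRange.
struct FGapRange1D
{
    int32_t GapStart;   // first original value that lies inside (or beyond) the gap
    int32_t GapLength;  // number of unused integer values skipped by the gap
};

// Sorts the values and finds the widest run of unused integers between neighbours.
FGapRange1D FindLargestGap(std::vector<int32_t> Values)
{
    assert(!Values.empty());
    std::sort(Values.begin(), Values.end());
    FGapRange1D Range{ Values.back() + 1, 0 };  // default: no gap at all
    for (std::size_t i = 0; i + 1 < Values.size(); ++i)
    {
        const int32_t Length = Values[i + 1] - Values[i] - 1;  // unused values in between
        if (Length > Range.GapLength)
        {
            Range.GapStart = Values[i] + 1;
            Range.GapLength = Length;
        }
    }
    return Range;
}

// Encode: values beyond the gap are shifted down, so the gap costs no codes.
int32_t EncodeWithGap(int32_t Value, const FGapRange1D& Range)
{
    return (Value >= Range.GapStart) ? Value - Range.GapLength : Value;
}

// Decode: mirrors the comment "UV += (UV >= GapStart) ? GapRange : 0".
int32_t DecodeWithGap(int32_t Code, const FGapRange1D& Range)
{
    return (Code >= Range.GapStart) ? Code + Range.GapLength : Code;
}
```

For the values 0, 1, 2, 100, 101 the largest code drops from 101 to 4, so 3 bits per value suffice instead of 7.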
/*
Build the streaming pages.
Page layout:
Fixup Chunk (loaded into CPU memory only)
FPackedCluster
MaterialRangeTable
GeometryData
*/
static void AssignClustersToPages(
TArray< FClusterGroup >& ClusterGroups,
TArray< FCluster >& Clusters,
const TArray< FEncodingInfo >& EncodingInfos,
TArray<FPage>& Pages,
TArray<FClusterGroupPart>& Parts
)
{
check(Pages.Num() == 0);
check(Parts.Num() == 0);
const uint32 NumClusterGroups = ClusterGroups.Num();
Pages.AddDefaulted();
SortGroupClusters(ClusterGroups, Clusters);
TArray<uint32> ClusterGroupPermutation = CalculateClusterGroupPermutation(ClusterGroups);
for (uint32 i = 0; i < NumClusterGroups; i++)
{
// Pick the best next group.
uint32 GroupIndex = ClusterGroupPermutation[i];
FClusterGroup& Group = ClusterGroups[GroupIndex];
uint32 GroupStartPage = INVALID_PAGE_INDEX;
for (uint32 ClusterIndex : Group.Children)
{
// Pick the best next cluster.
FCluster& Cluster = Clusters[ClusterIndex];
const FEncodingInfo& EncodingInfo = EncodingInfos[ClusterIndex];
// Add to a page.
FPage* Page = &Pages.Top();
if (Page->GpuSizes.GetTotal() + EncodingInfo.GpuSizes.GetTotal() > CLUSTER_PAGE_GPU_SIZE || Page->NumClusters + 1 > MAX_CLUSTERS_PER_PAGE)
{
// The page is full; start a new one.
Pages.AddDefaulted();
Page = &Pages.Top();
}
// Check whether a new FClusterGroupPart is needed.
if (Page->PartsNum == 0 || Parts[Page->PartsStartIndex + Page->PartsNum - 1].GroupIndex != GroupIndex)
{
if (Page->PartsNum == 0)
{
Page->PartsStartIndex = Parts.Num();
}
Page->PartsNum++;
FClusterGroupPart& Part = Parts.AddDefaulted_GetRef();
Part.GroupIndex = GroupIndex;
}
// Add the cluster to the page.
uint32 PageIndex = Pages.Num() - 1;
uint32 PartIndex = Parts.Num() - 1;
FClusterGroupPart& Part = Parts.Last();
if (Part.Clusters.Num() == 0)
{
Part.PageClusterOffset = Page->NumClusters;
Part.PageIndex = PageIndex;
}
Part.Clusters.Add(ClusterIndex);
check(Part.Clusters.Num() <= MAX_CLUSTERS_PER_GROUP);
Cluster.GroupPartIndex = PartIndex;
if (GroupStartPage == INVALID_PAGE_INDEX)
{
GroupStartPage = PageIndex;
}
Page->GpuSizes += EncodingInfo.GpuSizes;
Page->NumClusters++;
}
Group.PageIndexStart = GroupStartPage;
Group.PageIndexNum = Pages.Num() - GroupStartPage;
check(Group.PageIndexNum >= 1);
check(Group.PageIndexNum <= MAX_GROUP_PARTS_MASK);
}
// Recompute the bounds of each group part.
for (FClusterGroupPart& Part : Parts)
{
check(Part.Clusters.Num() <= MAX_CLUSTERS_PER_GROUP);
check(Part.PageIndex < (uint32)Pages.Num());
FBounds Bounds;
for (uint32 ClusterIndex : Part.Clusters)
{
Bounds += Clusters[ClusterIndex].Bounds;
}
Part.Bounds = Bounds;
}
}
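The core of AssignClustersToPages is a greedy packing loop: clusters are visited in a fixed order and appended to the current page until either the GPU size budget or the cluster count limit would be exceeded. A minimal sketch of just that loop follows, assuming made-up limits (`kPageGpuSize`, `kMaxClustersPerPage`) and a hypothetical `FSimplePage` type; group parts, sorting and bounds are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical limits; the engine uses CLUSTER_PAGE_GPU_SIZE and MAX_CLUSTERS_PER_PAGE.
constexpr uint32_t kPageGpuSize = 1000;
constexpr uint32_t kMaxClustersPerPage = 4;

struct FSimplePage
{
    uint32_t GpuSize = 0;
    uint32_t NumClusters = 0;
};

// Assigns clusters (given by GPU size, already in the desired order) to pages.
// Returns the page index chosen for each cluster.
std::vector<uint32_t> AssignToPages(const std::vector<uint32_t>& ClusterGpuSizes,
                                    std::vector<FSimplePage>& Pages)
{
    assert(Pages.empty());
    Pages.emplace_back();
    std::vector<uint32_t> PageOfCluster;
    for (uint32_t Size : ClusterGpuSizes)
    {
        assert(Size <= kPageGpuSize);  // a single cluster must fit in a page
        FSimplePage* Page = &Pages.back();
        if (Page->GpuSize + Size > kPageGpuSize ||
            Page->NumClusters + 1 > kMaxClustersPerPage)
        {
            // The current page is full; start a new one.
            Pages.emplace_back();
            Page = &Pages.back();
        }
        Page->GpuSize += Size;
        Page->NumClusters++;
        PageOfCluster.push_back(static_cast<uint32_t>(Pages.size() - 1));
    }
    return PageOfCluster;
}
```

Because a page is never revisited once it is closed, the quality of the packing depends entirely on the visiting order, which is presumably why the engine sorts clusters and computes a group permutation first.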
// Build the ClusterGroup hierarchy.
static void BuildHierarchies(FResources& Resources, const TArray<FClusterGroup>& Groups, TArray<FClusterGroupPart>& Parts, uint32 NumMeshes)
{
TArray<TArray<uint32>> PartsByMesh;
PartsByMesh.SetNum(NumMeshes);
// Assign group parts to the meshes they belong to.
const uint32 NumTotalParts = Parts.Num();
for (uint32 PartIndex = 0; PartIndex < NumTotalParts; PartIndex++)
{
FClusterGroupPart& Part = Parts[PartIndex];
PartsByMesh[Groups[Part.GroupIndex].MeshIndex].Add(PartIndex);
}
for (uint32 MeshIndex = 0; MeshIndex < NumMeshes; MeshIndex++)
{
const TArray<uint32>& PartIndices = PartsByMesh[MeshIndex];
const uint32 NumParts = PartIndices.Num();
int32 MaxMipLevel = 0;
for (uint32 i = 0; i < NumParts; i++)
{
MaxMipLevel = FMath::Max(MaxMipLevel, Groups[Parts[PartIndices[i]].GroupIndex].MipLevel);
}
TArray< FIntermediateNode > Nodes;
Nodes.SetNum(NumParts);
// Build leaf nodes for each LOD level of the mesh.
TArray<TArray<uint32>> NodesByMip;
NodesByMip.SetNum(MaxMipLevel + 1);
for (uint32 i = 0; i < NumParts; i++)
{
const uint32 PartIndex = PartIndices[i];
const FClusterGroupPart& Part = Parts[PartIndex];
const FClusterGroup& Group = Groups[Part.GroupIndex];
const int32 MipLevel = Group.MipLevel;
FIntermediateNode& Node = Nodes[i];
Node.Bound = Part.Bounds;
Node.PartIndex = PartIndex;
Node.MipLevel = Group.MipLevel;
Node.bLeaf = true;
NodesByMip[Group.MipLevel].Add(i);
}
uint32 RootIndex = 0;
if (Nodes.Num() == 1)
{
// Only a single leaf node. This needs special handling because the root is always an inner node.
FIntermediateNode& Node = Nodes.AddDefaulted_GetRef();
Node.Children.Add(0);
Node.Bound = Nodes[0].Bound;
RootIndex = 1;
}
else
{
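// Build the hierarchy:
// A Nanite mesh contains cluster data for many LOD levels. Cluster sizes can differ greatly between levels, which makes building a good hierarchy a challenge.
// Besides the visibility bounds, the hierarchy also tracks a conservative LOD error metric for the child nodes.
// Runtime traversal descends as long as the children are visible and the conservative LOD error is not more detailed than what we are looking for.
// Care must be taken when mixing clusters from different LODs, because less detailed clusters can easily inflate both the bounds and the error metric.
// Many LOD-mixing approaches were tried, but building a separate hierarchy per LOD level and then a hierarchy of those hierarchies currently gives the best and most predictable results.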
TArray<uint32> LevelRoots;
for (int32 MipLevel = 0; MipLevel <= MaxMipLevel; MipLevel++)
{
if (NodesByMip[MipLevel].Num() > 0)
{
// Build a hierarchy for this mip level using top-down splitting.
uint32 NodeIndex = BuildHierarchyTopDown(Nodes, NodesByMip[MipLevel], true);
if (Nodes[NodeIndex].bLeaf || Nodes[NodeIndex].Children.Num() == MAX_BVH_NODE_FANOUT)
{
// Leaf or completely filled node: add it directly.
LevelRoots.Add(NodeIndex);
}
else
{
// Incomplete node. Discard it and add its children as roots instead.
LevelRoots.Append(Nodes[NodeIndex].Children);
}
}
}
// Build the top-level hierarchy: a hierarchy of the per-mip hierarchies.
RootIndex = BuildHierarchyTopDown(Nodes, LevelRoots, false);
}
check(Nodes.Num() > 0);
#if BVH_BUILD_WRITE_GRAPHVIZ
WriteDotGraph(Nodes);
#endif
TArray< FHierarchyNode > HierarchyNodes;
BuildHierarchyRecursive(HierarchyNodes, Nodes, Groups, Parts, RootIndex);
// Convert the hierarchy to the packed format.
const uint32 NumHierarchyNodes = HierarchyNodes.Num();
const uint32 PackedBaseIndex = Resources.HierarchyNodes.Num();
Resources.HierarchyRootOffsets.Add(PackedBaseIndex);
Resources.HierarchyNodes.AddDefaulted(NumHierarchyNodes);
for (uint32 i = 0; i < NumHierarchyNodes; i++)
{
// Pack the hierarchy node.
PackHierarchyNode(Resources.HierarchyNodes[PackedBaseIndex + i], HierarchyNodes[i], Groups, Parts);
}
}
}
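The "hierarchy of hierarchies" strategy in BuildHierarchies can be sketched without any geometry. The code below is a simplified illustration under several assumptions: `FSketchNode`, `BuildTopDown`, `BuildHierarchyOfHierarchies` and the fanout `kMaxFanout` are hypothetical, nodes carry no bounds or error metric, and the top-down builder splits the index list into contiguous chunks instead of sorting spatially. It also omits the engine's handling of incomplete level roots and of the single-leaf case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kMaxFanout = 4;  // hypothetical; the engine uses MAX_BVH_NODE_FANOUT

struct FSketchNode
{
    std::vector<uint32_t> Children;  // empty for leaves
    int32_t MipLevel = 0;
    bool bLeaf = false;
};

// Builds a subtree over NodeIndices with at most kMaxFanout children per node
// and returns the index of its root inside Nodes.
uint32_t BuildTopDown(std::vector<FSketchNode>& Nodes, const std::vector<uint32_t>& NodeIndices)
{
    assert(!NodeIndices.empty());
    if (NodeIndices.size() == 1)
        return NodeIndices[0];

    FSketchNode Parent;
    const std::size_t N = NodeIndices.size();
    if (N <= kMaxFanout)
    {
        Parent.Children = NodeIndices;
    }
    else
    {
        // Split into at most kMaxFanout contiguous chunks and recurse.
        const std::size_t ChunkSize = (N + kMaxFanout - 1) / kMaxFanout;
        for (std::size_t Begin = 0; Begin < N; Begin += ChunkSize)
        {
            const std::size_t End = std::min(Begin + ChunkSize, N);
            std::vector<uint32_t> Chunk(NodeIndices.begin() + Begin, NodeIndices.begin() + End);
            Parent.Children.push_back(BuildTopDown(Nodes, Chunk));
        }
    }
    Nodes.push_back(Parent);
    return static_cast<uint32_t>(Nodes.size() - 1);
}

// Builds one hierarchy per mip level, then a hierarchy over the per-level roots.
// On entry Nodes must contain only leaves.
uint32_t BuildHierarchyOfHierarchies(std::vector<FSketchNode>& Nodes)
{
    assert(!Nodes.empty());
    int32_t MaxMip = 0;
    for (const FSketchNode& Node : Nodes)
        MaxMip = std::max(MaxMip, Node.MipLevel);

    std::vector<std::vector<uint32_t>> NodesByMip(MaxMip + 1);
    for (uint32_t i = 0; i < Nodes.size(); ++i)
        NodesByMip[Nodes[i].MipLevel].push_back(i);

    std::vector<uint32_t> LevelRoots;
    for (const std::vector<uint32_t>& Level : NodesByMip)
        if (!Level.empty())
            LevelRoots.push_back(BuildTopDown(Nodes, Level));

    return BuildTopDown(Nodes, LevelRoots);
}
```

Keeping each LOD level in its own subtree means that a coarse, large cluster never shares an inner node with fine, small clusters, which is the inflation problem the source comment describes.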
// Write out the pages.
static void WritePages( FResources& Resources,
TArray<FPage>& Pages,
const TArray<FClusterGroup>& Groups,
const TArray<FClusterGroupPart>& Parts,
const TArray<FCluster>& Clusters,
const TArray<FEncodingInfo>& EncodingInfos,
uint32 NumTexCoords)
{
check(Resources.PageStreamingStates.Num() == 0);
const bool bLZCompress = true;
TArray< uint8 > StreamableBulkData;
const uint32 NumPages = Pages.Num();
const uint32 NumClusters = Clusters.Num();
Resources.PageStreamingStates.SetNum(NumPages);
// Process the fixup chunks.
uint32 TotalGPUSize = 0;
TArray<FFixupChunk> FixupChunks;
FixupChunks.SetNum(NumPages);
for (uint32 PageIndex = 0; PageIndex < NumPages; PageIndex++)
{
const FPage& Page = Pages[PageIndex];
FFixupChunk& FixupChunk = FixupChunks[PageIndex];
FixupChunk.Header.NumClusters = Page.NumClusters;
uint32 NumHierarchyFixups = 0;
for (uint32 i = 0; i < Page.PartsNum; i++)
{
const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
NumHierarchyFixups += Groups[Part.GroupIndex].PageIndexNum;
}
FixupChunk.Header.NumHierachyFixups = NumHierarchyFixups; // NumHierarchyFixups must be set before writing cluster fixups
TotalGPUSize += Page.GpuSizes.GetTotal();
}
// Add external fixups to the pages.
for (const FClusterGroupPart& Part : Parts)
{
check(Part.PageIndex < NumPages);
const FClusterGroup& Group = Groups[Part.GroupIndex];
for (uint32 ClusterPositionInPart = 0; ClusterPositionInPart < (uint32)Part.Clusters.Num(); ClusterPositionInPart++)
{
const FCluster& Cluster = Clusters[Part.Clusters[ClusterPositionInPart]];
if (Cluster.GeneratingGroupIndex != INVALID_GROUP_INDEX)
{
const FClusterGroup& GeneratingGroup = Groups[Cluster.GeneratingGroupIndex];
check(GeneratingGroup.PageIndexNum >= 1);
if (GeneratingGroup.PageIndexStart == Part.PageIndex && GeneratingGroup.PageIndexNum == 1)
continue; // Dependencies already met by current page. Fixup directly instead.
uint32 PageDependencyStart = GeneratingGroup.PageIndexStart;
uint32 PageDependencyNum = GeneratingGroup.PageIndexNum;
RemoveRootPagesFromRange(PageDependencyStart, PageDependencyNum); // Root page should never be a dependency
const FClusterFixup ClusterFixup = FClusterFixup(Part.PageIndex, Part.PageClusterOffset + ClusterPositionInPart, PageDependencyStart, PageDependencyNum);
for (uint32 i = 0; i < GeneratingGroup.PageIndexNum; i++)
{
FFixupChunk& FixupChunk = FixupChunks[GeneratingGroup.PageIndexStart + i];
FixupChunk.GetClusterFixup(FixupChunk.Header.NumClusterFixups++) = ClusterFixup;
}
}
}
}
// Generate page dependencies.
for (uint32 PageIndex = 0; PageIndex < NumPages; PageIndex++)
{
const FFixupChunk& FixupChunk = FixupChunks[PageIndex];
FPageStreamingState& PageStreamingState = Resources.PageStreamingStates[PageIndex];
PageStreamingState.DependenciesStart = Resources.PageDependencies.Num();
for (uint32 i = 0; i < FixupChunk.Header.NumClusterFixups; i++)
{
uint32 FixupPageIndex = FixupChunk.GetClusterFixup(i).GetPageIndex();
check(FixupPageIndex < NumPages);
if (IsRootPage(FixupPageIndex) || FixupPageIndex == PageIndex) // Never emit dependencies to ourselves or a root page.
continue;
// Add only if not already in the set.
// O(n^2), but the number of dependencies should be small in practice.
bool bFound = false;
for (uint32 j = PageStreamingState.DependenciesStart; j < (uint32)Resources.PageDependencies.Num(); j++)
{
if (Resources.PageDependencies[j] == FixupPageIndex)
{
bFound = true;
break;
}
}
if (bFound)
continue;
Resources.PageDependencies.Add(FixupPageIndex);
}
PageStreamingState.DependenciesNum = Resources.PageDependencies.Num() - PageStreamingState.DependenciesStart;
}
// Process the pages.
struct FPageResult
{
TArray<uint8> Data;
uint32 UncompressedSize;
};
TArray< FPageResult > PageResults;
PageResults.SetNum(NumPages);
// Process the pages in parallel.
ParallelFor(NumPages, [&Resources, &Pages, &Groups, &Parts, &Clusters, &EncodingInfos, &FixupChunks, &PageResults, NumTexCoords, bLZCompress](int32 PageIndex)
{
const FPage& Page = Pages[PageIndex];
FFixupChunk& FixupChunk = FixupChunks[PageIndex];
// Add hierarchy fixups.
{
// Parts include the hierarchy fixups for all the other parts of the same group.
uint32 NumHierarchyFixups = 0;
for (uint32 i = 0; i < Page.PartsNum; i++)
{
const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
const FClusterGroup& Group = Groups[Part.GroupIndex];
const uint32 HierarchyRootOffset = Resources.HierarchyRootOffsets[Group.MeshIndex];
uint32 PageDependencyStart = Group.PageIndexStart;
uint32 PageDependencyNum = Group.PageIndexNum;
RemoveRootPagesFromRange(PageDependencyStart, PageDependencyNum);
// Add fixups to all parts of the group
for (uint32 j = 0; j < Group.PageIndexNum; j++)
{
const FPage& Page2 = Pages[Group.PageIndexStart + j];
for (uint32 k = 0; k < Page2.PartsNum; k++)
{
const FClusterGroupPart& Part2 = Parts[Page2.PartsStartIndex + k];
if (Part2.GroupIndex == Part.GroupIndex)
{
const uint32 GlobalHierarchyNodeIndex = HierarchyRootOffset + Part2.HierarchyNodeIndex;
FixupChunk.GetHierarchyFixup(NumHierarchyFixups++) = FHierarchyFixup(Part2.PageIndex, GlobalHierarchyNodeIndex, Part2.HierarchyChildIndex, Part2.PageClusterOffset, PageDependencyStart, PageDependencyNum);
break;
}
}
}
}
check(NumHierarchyFixups == FixupChunk.Header.NumHierachyFixups);
}
// Pack clusters and generate material range data
TArray<uint32> CombinedStripBitmaskData;
TArray<uint32> CombinedVertexRefBitmaskData;
TArray<uint32> CombinedVertexRefData;
TArray<uint8> CombinedIndexData;
TArray<uint8> CombinedPositionData;
TArray<uint8> CombinedAttributeData;
TArray<uint32> MaterialRangeData;
TArray<uint16> CodedVerticesPerCluster;
TArray<uint32> NumVertexBytesPerCluster;
TArray<FPackedCluster> PackedClusters;
PackedClusters.SetNumUninitialized(Page.NumClusters);
CodedVerticesPerCluster.SetNumUninitialized(Page.NumClusters);
NumVertexBytesPerCluster.SetNumUninitialized(Page.NumClusters);
const uint32 NumPackedClusterDwords = Page.NumClusters * sizeof(FPackedCluster) / sizeof(uint32);
FPageSections GpuSectionOffsets = Page.GpuSizes.GetOffsets();
TMap<FVariableVertex, uint32> UniqueVertices;
for (uint32 i = 0; i < Page.PartsNum; i++)
{
const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
for (uint32 j = 0; j < (uint32)Part.Clusters.Num(); j++)
{
const uint32 ClusterIndex = Part.Clusters[j];
const FCluster& Cluster = Clusters[ClusterIndex];
const FEncodingInfo& EncodingInfo = EncodingInfos[ClusterIndex];
const uint32 LocalClusterIndex = Part.PageClusterOffset + j;
FPackedCluster& PackedCluster = PackedClusters[LocalClusterIndex];
PackCluster(PackedCluster, Cluster, EncodingInfos[ClusterIndex], NumTexCoords);
PackedCluster.PackedMaterialInfo = PackMaterialInfo(Cluster, MaterialRangeData, NumPackedClusterDwords);
check((GpuSectionOffsets.Index & 3) == 0);
check((GpuSectionOffsets.Position & 3) == 0);
check((GpuSectionOffsets.Attribute & 3) == 0);
PackedCluster.SetIndexOffset(GpuSectionOffsets.Index);
PackedCluster.SetPositionOffset(GpuSectionOffsets.Position);
PackedCluster.SetAttributeOffset(GpuSectionOffsets.Attribute);
PackedCluster.SetDecodeInfoOffset(GpuSectionOffsets.DecodeInfo);
GpuSectionOffsets += EncodingInfo.GpuSizes;
const uint32 PrevVertexBytes = CombinedPositionData.Num();
uint32 NumCodedVertices = 0;
EncodeGeometryData( LocalClusterIndex, Cluster, EncodingInfo, NumTexCoords,
CombinedStripBitmaskData, CombinedIndexData,
CombinedVertexRefBitmaskData, CombinedVertexRefData, CombinedPositionData, CombinedAttributeData,
UniqueVertices, NumCodedVertices);
NumVertexBytesPerCluster[LocalClusterIndex] = CombinedPositionData.Num() - PrevVertexBytes;
CodedVerticesPerCluster[LocalClusterIndex] = NumCodedVertices;
}
}
check(GpuSectionOffsets.Cluster == Page.GpuSizes.GetMaterialTableOffset());
check(Align(GpuSectionOffsets.MaterialTable, 16) == Page.GpuSizes.GetDecodeInfoOffset());
check(GpuSectionOffsets.DecodeInfo == Page.GpuSizes.GetIndexOffset());
check(GpuSectionOffsets.Index == Page.GpuSizes.GetPositionOffset());
check(GpuSectionOffsets.Position == Page.GpuSizes.GetAttributeOffset());
check(GpuSectionOffsets.Attribute == Page.GpuSizes.GetTotal());
// Dword-align the index data.
CombinedIndexData.SetNumZeroed((CombinedIndexData.Num() + 3) & -4);
// Perform page-internal fixups directly on PackedClusters.
for (uint32 LocalPartIndex = 0; LocalPartIndex < Page.PartsNum; LocalPartIndex++)
{
const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + LocalPartIndex];
const FClusterGroup& Group = Groups[Part.GroupIndex];
uint32 GeneratingGroupIndex = MAX_uint32;
for (uint32 ClusterPositionInPart = 0; ClusterPositionInPart < (uint32)Part.Clusters.Num(); ClusterPositionInPart++)
{
const FCluster& Cluster = Clusters[Part.Clusters[ClusterPositionInPart]];
if (Cluster.GeneratingGroupIndex != INVALID_GROUP_INDEX)
{
const FClusterGroup& GeneratingGroup = Groups[Cluster.GeneratingGroupIndex];
uint32 PageDependencyStart = Group.PageIndexStart;
uint32 PageDependencyNum = Group.PageIndexNum;
RemoveRootPagesFromRange(PageDependencyStart, PageDependencyNum);
if (GeneratingGroup.PageIndexStart == PageIndex && GeneratingGroup.PageIndexNum == 1)
{
// Dependencies already met by the current page; fix up directly.
PackedClusters[Part.PageClusterOffset + ClusterPositionInPart].Flags &= ~NANITE_CLUSTER_FLAG_LEAF; // Mark parent as no longer leaf
}
}
}
}
// Begin the page.
FPageResult& PageResult = PageResults[PageIndex];
PageResult.Data.SetNum(CLUSTER_PAGE_DISK_SIZE);
FBlockPointer PagePointer(PageResult.Data.GetData(), PageResult.Data.Num());
// Disk header.
FPageDiskHeader* PageDiskHeader = PagePointer.Advance<FPageDiskHeader>(1);
// 16-byte align the material range data so it is easy to copy during GPU transcoding.
MaterialRangeData.SetNum(Align(MaterialRangeData.Num(), 4));
static_assert(sizeof(FUVRange) % 16 == 0, "sizeof(FUVRange) must be a multiple of 16");
static_assert(sizeof(FPackedCluster) % 16 == 0, "sizeof(FPackedCluster) must be a multiple of 16");
PageDiskHeader->NumClusters = Page.NumClusters;
PageDiskHeader->GpuSize = Page.GpuSizes.GetTotal();
PageDiskHeader->NumRawFloat4s = Page.NumClusters * (sizeof(FPackedCluster) + NumTexCoords * sizeof(FUVRange)) / 16 + MaterialRangeData.Num() / 4;
PageDiskHeader->NumTexCoords = NumTexCoords;
// Cluster headers.
FClusterDiskHeader* ClusterDiskHeaders = PagePointer.Advance<FClusterDiskHeader>(Page.NumClusters);
// Write the clusters in SOA (Structure-of-Arrays) memory layout.
{
const uint32 NumClusterFloat4Propeties = sizeof(FPackedCluster) / 16;
for (uint32 float4Index = 0; float4Index < NumClusterFloat4Propeties; float4Index++)
{
for (const FPackedCluster& PackedCluster : PackedClusters)
{
uint8* Dst = PagePointer.Advance<uint8>(16);
FMemory::Memcpy(Dst, (uint8*)&PackedCluster + float4Index * 16, 16);
}
}
}
// Material table.
uint32 MaterialTableSize = MaterialRangeData.Num() * MaterialRangeData.GetTypeSize();
uint8* MaterialTable = PagePointer.Advance<uint8>(MaterialTableSize);
FMemory::Memcpy(MaterialTable, MaterialRangeData.GetData(), MaterialTableSize);
check(MaterialTableSize == Page.GpuSizes.GetMaterialTableSize());
// Decode information.
PageDiskHeader->DecodeInfoOffset = PagePointer.Offset();
for (uint32 i = 0; i < Page.PartsNum; i++)
{
const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
for (uint32 j = 0; j < (uint32)Part.Clusters.Num(); j++)
{
const uint32 ClusterIndex = Part.Clusters[j];
FUVRange* DecodeInfo = PagePointer.Advance<FUVRange>(NumTexCoords);
for (uint32 k = 0; k < NumTexCoords; k++)
{
DecodeInfo[k] = EncodingInfos[ClusterIndex].UVInfos[k].UVRange;
}
}
}
// Index data.
{
uint8* IndexData = PagePointer.GetPtr<uint8>();
#if USE_STRIP_INDICES
for (uint32 i = 0; i < Page.PartsNum; i++)
{
const FClusterGroupPart& Part = Parts[Page.PartsStartIndex + i];
for (uint32 j = 0; j < (uint32)Part.Clusters.Num(); j++)
{
const uint32 LocalClusterIndex = Part.PageClusterOffset + j;
const uint32 ClusterIndex = Part.Clusters[j];
const FCluster& Cluster = Clusters[ClusterIndex];
ClusterDiskHeaders[LocalClusterIndex].IndexDataOffset = PagePointer.Offset();
ClusterDiskHeaders[LocalClusterIndex].NumPrevNewVerticesBeforeDwords = Cluster.StripDesc.NumPrevNewVerticesBeforeDwords;
ClusterDiskHeaders[LocalClusterIndex].NumPrevRefVerticesBeforeDwords = Cluster.StripDesc.NumPrevRefVerticesBeforeDwords;
PagePointer.Advance<uint8>(Cluster.StripIndexData.Num());
}
}
uint32 IndexDataSize = CombinedIndexData.Num() * CombinedIndexData.GetTypeSize();
FMemory::Memcpy(IndexData, CombinedIndexData.GetData(), IndexDataSize);
PagePointer.Align(sizeof(uint32));
PageDiskHeader->StripBitmaskOffset = PagePointer.Offset();
uint32 StripBitmaskDataSize = CombinedStripBitmaskData.Num() * CombinedStripBitmaskData.GetTypeSize();
uint8* StripBitmaskData = PagePointer.Advance<uint8>(StripBitmaskDataSize);
FMemory::Memcpy(StripBitmaskData, CombinedStripBitmaskData.GetData(), StripBitmaskDataSize);
#else
for (uint32 i = 0; i < Page.NumClusters; i++)
{
ClusterDiskHeaders[i].IndexDataOffset = PagePointer.Offset();
PagePointer.Advance<uint8>(PackedClusters[i].GetNumTris() * 3);
}
PagePointer.Align(sizeof(uint32));
uint32 IndexDataSize = CombinedIndexData.Num() * CombinedIndexData.GetTypeSize();
FMemory::Memcpy(IndexData, CombinedIndexData.GetData(), IndexDataSize);
#endif
}
// Write the vertex reference bitmask.
{
PageDiskHeader->VertexRefBitmaskOffset = PagePointer.Offset();
const uint32 VertexRefBitmaskSize = Page.NumClusters * (MAX_CLUSTER_VERTICES / 8);
uint8* VertexRefBitmask = PagePointer.Advance<uint8>(VertexRefBitmaskSize);
FMemory::Memcpy(VertexRefBitmask, CombinedVertexRefBitmaskData.GetData(), VertexRefBitmaskSize);
check(CombinedVertexRefBitmaskData.Num() * CombinedVertexRefBitmaskData.GetTypeSize() == VertexRefBitmaskSize);
}
// Write the vertex references.
{
uint8* VertexRefs = PagePointer.GetPtr<uint8>();
for (uint32 i = 0; i < Page.NumClusters; i++)
{
ClusterDiskHeaders[i].VertexRefDataOffset = PagePointer.Offset();
uint32 NumVertexRefs = PackedClusters[i].GetNumVerts() - CodedVerticesPerCluster[i];
PagePointer.Advance<uint32>(NumVertexRefs);
}
FMemory::Memcpy(VertexRefs, CombinedVertexRefData.GetData(), CombinedVertexRefData.Num() * CombinedVertexRefData.GetTypeSize());
}
// Write the positions.
{
uint8* PositionData = PagePointer.GetPtr<uint8>();
for (uint32 i = 0; i < Page.NumClusters; i++)
{
ClusterDiskHeaders[i].PositionDataOffset = PagePointer.Offset();
PagePointer.Advance<uint8>(NumVertexBytesPerCluster[i]);
}
check( (PagePointer.GetPtr<uint8>() - PositionData) == CombinedPositionData.Num() * CombinedPositionData.GetTypeSize());
FMemory::Memcpy(PositionData, CombinedPositionData.GetData(), CombinedPositionData.Num() * CombinedPositionData.GetTypeSize());
}
// Write the attributes.
{
uint8* AttribData = PagePointer.GetPtr<uint8>();
for (uint32 i = 0; i < Page.NumClusters; i++)
{
const uint32 BytesPerAttribute = (PackedClusters[i].GetBitsPerAttribute() + 7) / 8;
ClusterDiskHeaders[i].AttributeDataOffset = PagePointer.Offset();
PagePointer.Advance<uint8>(Align(CodedVerticesPerCluster[i] * BytesPerAttribute, 4));
}
check((uint32)(PagePointer.GetPtr<uint8>() - AttribData) == CombinedAttributeData.Num() * CombinedAttributeData.GetTypeSize());
FMemory::Memcpy(AttribData, CombinedAttributeData.GetData(), CombinedAttributeData.Num()* CombinedAttributeData.GetTypeSize());
}
// Compress the page losslessly with LZ4 (NAME_LZ4), a member of the Lempel-Ziv (LZ) family; Lempel-Ziv-Welch (LZW) is another LZ variant.
// For more on LZW, see: http://athena.ecs.csus.edu/~wang/DLZW.pdf.
if (bLZCompress)
{
TArray<uint8> DataCopy(PageResult.Data.GetData(), PagePointer.Offset());
PageResult.UncompressedSize = DataCopy.Num();
int32 CompressedSize = PageResult.Data.Num();
verify(FCompression::CompressMemory(NAME_LZ4, PageResult.Data.GetData(), CompressedSize, DataCopy.GetData(), DataCopy.Num()));
PageResult.Data.SetNum(CompressedSize, false);
}
else // No compression.
{
PageResult.Data.SetNum(PagePointer.Offset(), false);
PageResult.UncompressedSize = PageResult.Data.Num();
}
});
// Write the pages.
uint32 TotalUncompressedSize = 0;
uint32 TotalCompressedSize = 0;
uint32 TotalFixupSize = 0;
for (uint32 PageIndex = 0; PageIndex < NumPages; PageIndex++)
{
const FPage& Page = Pages[PageIndex];
FFixupChunk& FixupChunk = FixupChunks[PageIndex];
TArray<uint8>& BulkData = IsRootPage(PageIndex) ? Resources.RootClusterPage : StreamableBulkData;
FPageStreamingState& PageStreamingState = Resources.PageStreamingStates[PageIndex];
PageStreamingState.BulkOffset = BulkData.Num();
// Write the fixup chunk.
uint32 FixupChunkSize = FixupChunk.GetSize();
check(FixupChunk.Header.NumHierachyFixups < MAX_CLUSTERS_PER_PAGE);
check(FixupChunk.Header.NumClusterFixups < MAX_CLUSTERS_PER_PAGE);
BulkData.Append((uint8*)&FixupChunk, FixupChunkSize);
TotalFixupSize += FixupChunkSize;
// Copy the page into BulkData.
TArray<uint8>& PageData = PageResults[PageIndex].Data;
BulkData.Append(PageData.GetData(), PageData.Num());
TotalUncompressedSize += PageResults[PageIndex].UncompressedSize;
TotalCompressedSize += PageData.Num();
PageStreamingState.BulkSize = BulkData.Num() - PageStreamingState.BulkOffset;
PageStreamingState.PageUncompressedSize = PageResults[PageIndex].UncompressedSize;
}
uint32 TotalDiskSize = Resources.RootClusterPage.Num() + StreamableBulkData.Num();
UE_LOG(LogStaticMesh, Log, TEXT("WritePages:"), NumPages);
UE_LOG(LogStaticMesh, Log, TEXT(" %d pages written."), NumPages);
UE_LOG(LogStaticMesh, Log, TEXT(" GPU size: %d bytes. %.3f bytes per page. %.3f%% utilization."), TotalGPUSize, TotalGPUSize / float(NumPages), TotalGPUSize / (float(NumPages) * CLUSTER_PAGE_GPU_SIZE) * 100.0f);
UE_LOG(LogStaticMesh, Log, TEXT(" Uncompressed page data: %d bytes. Compressed page data: %d bytes. Fixup data: %d bytes."), TotalUncompressedSize, TotalCompressedSize, TotalFixupSize);
UE_LOG(LogStaticMesh, Log, TEXT(" Total disk size: %d bytes. %.3f bytes per page."), TotalDiskSize, TotalDiskSize/ float(NumPages));
// Store the page data.
Resources.StreamableClusterPages.Lock(LOCK_READ_WRITE);
uint8* Ptr = (uint8*)Resources.StreamableClusterPages.Realloc(StreamableBulkData.Num());
FMemory::Memcpy(Ptr, StreamableBulkData.GetData(), StreamableBulkData.Num());
Resources.StreamableClusterPages.Unlock();
Resources.StreamableClusterPages.SetBulkDataFlags(BULKDATA_Force_NOT_InlinePayload);
Resources.bLZCompressed = bLZCompress;
}
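WritePages stores the packed clusters in Structure-of-Arrays order: the first 16 bytes of every cluster, then the second 16 bytes of every cluster, and so on. The sketch below reproduces only that interleaving, assuming a hypothetical 32-byte `FRecord` in place of FPackedCluster and a hypothetical helper `WriteRecordsSoA`; it is an illustration of the layout, not the engine's writer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 32-byte record standing in for FPackedCluster (two 16-byte rows).
struct FRecord
{
    uint32_t RowA[4];
    uint32_t RowB[4];
};
static_assert(sizeof(FRecord) % 16 == 0, "sizeof(FRecord) must be a multiple of 16");

// Writes the records so that row 0 of every record comes first, then row 1, and so on.
std::vector<uint8_t> WriteRecordsSoA(const std::vector<FRecord>& Records)
{
    constexpr std::size_t RowSize = 16;
    constexpr std::size_t NumRows = sizeof(FRecord) / RowSize;
    std::vector<uint8_t> Out(Records.size() * sizeof(FRecord));
    std::size_t Offset = 0;
    for (std::size_t Row = 0; Row < NumRows; ++Row)
    {
        for (const FRecord& Record : Records)
        {
            const uint8_t* Src = reinterpret_cast<const uint8_t*>(&Record) + Row * RowSize;
            std::memcpy(Out.data() + Offset, Src, RowSize);
            Offset += RowSize;
        }
    }
    assert(Offset == Out.size());
    return Out;
}
```

With this layout, a reader that processes the same field of many clusters at once touches one contiguous run of 16-byte rows, which fits the source's remark about data being easy to copy during GPU transcoding.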
// Engine\Source\Developer\NaniteBuilder\Private\ImposterAtlas.cpp
// Rasterize the given cluster into the imposter.
void FImposterAtlas::Rasterize( const FIntPoint& TilePos, const FCluster& Cluster, uint32 ClusterIndex )
{
constexpr uint32 ViewSize = TileSize;// * SuperSample;
FIntRect Scissor( 0, 0, ViewSize, ViewSize );
// Get the local-to-imposter transform matrix.
FMatrix LocalToImposter = GetLocalToImposter( TilePos );
TArray< FVector, TInlineAllocator<128> > Positions;
Positions.SetNum( Cluster.NumVerts, false );
// Fetch the cluster's vertex positions and transform them into imposter space.
for( uint32 VertIndex = 0; VertIndex < Cluster.NumVerts; VertIndex++ )
{
FVector Position = Cluster.GetPosition( VertIndex );
Position = LocalToImposter.TransformPosition( Position );
Positions[ VertIndex ].X = ( Position.X * 0.5f + 0.5f ) * ViewSize;
Positions[ VertIndex ].Y = ( Position.Y * 0.5f + 0.5f ) * ViewSize;
Positions[ VertIndex ].Z = ( Position.Z * 0.5f + 0.5f ) * 254.0f + 1.0f; // zero is reserved as masked
}
// Loop over all triangles and rasterize them into the imposter.
for( uint32 TriIndex = 0; TriIndex < Cluster.NumTris; TriIndex++ )
{
FVector Verts[3];
Verts[0] = Positions[ Cluster.Indexes[ TriIndex * 3 + 0 ] ];
Verts[1] = Positions[ Cluster.Indexes[ TriIndex * 3 + 1 ] ];
Verts[2] = Positions[ Cluster.Indexes[ TriIndex * 3 + 2 ] ];
// Rasterize the triangle.
RasterizeTri( Verts, Scissor, 0,
// Store the rasterized result.
[&]( int32 x, int32 y, float z )
{
uint32 Depth = FMath::RoundToInt( FMath::Clamp( z, 1.0f, 255.0f ) );
uint16 PixelValue = ( Depth << 8 ) | ( ClusterIndex << 7 ) | TriIndex;
//uint32 PixelIndex = x + y * ViewSize;
uint32 PixelIndex = x + ( y + ( TilePos.X + TilePos.Y * AtlasSize ) * TileSize ) * TileSize;
Pixels[ PixelIndex ] = FMath::Max( Pixels[ PixelIndex ], PixelValue );
} );
}
}
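The pixel format written by FImposterAtlas::Rasterize puts depth in the top 8 bits of a 16-bit value, so taking the maximum of two packed values doubles as a depth test. The sketch below spells this out. It assumes, as the shifts in the source suggest, that the cluster index fits in 1 bit and the triangle index in 7 bits; the helper names `PackImposterPixel`, `UnpackImposterPixel` and `ResolvePixel` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Packs depth (8 bits), cluster index (1 bit) and triangle index (7 bits) into 16 bits.
uint16_t PackImposterPixel(uint32_t Depth, uint32_t ClusterIndex, uint32_t TriIndex)
{
    assert(Depth >= 1 && Depth <= 255);  // zero is reserved as "masked"
    assert(ClusterIndex <= 1);           // assumption: one bit for the cluster index
    assert(TriIndex <= 127);             // assumption: seven bits for the triangle index
    return static_cast<uint16_t>((Depth << 8) | (ClusterIndex << 7) | TriIndex);
}

struct FImposterSample
{
    uint32_t Depth;
    uint32_t ClusterIndex;
    uint32_t TriIndex;
};

// Splits a packed pixel back into its three fields.
FImposterSample UnpackImposterPixel(uint16_t Pixel)
{
    return { uint32_t(Pixel >> 8), uint32_t((Pixel >> 7) & 1u), uint32_t(Pixel & 127u) };
}

// Depth test: the larger packed value wins because depth occupies the high bits.
uint16_t ResolvePixel(uint16_t Existing, uint16_t Incoming)
{
    return std::max(Existing, Incoming);
}
```

A cleared atlas holds zeros, so any rasterized sample (depth at least 1) replaces the "masked" value on its first write.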
// Engine\Source\Developer\NaniteBuilder\Private\Rasterizer.h
// Software-rasterize the given triangle, calling the FWritePixel callback to write the data.
template< typename FWritePixel >
void RasterizeTri( const FVector Verts[3], const FIntRect& ScissorRect, uint32 SubpixelDilate, FWritePixel WritePixel )
{
constexpr uint32 SubpixelBits = 8;
constexpr uint32 SubpixelSamples = 1 << SubpixelBits;
FVector v01 = Verts[1] - Verts[0];
FVector v02 = Verts[2] - Verts[0];
float DetXY = v01.X * v02.Y - v01.Y * v02.X;
if( DetXY >= 0.0f )
{
// Back-face culling.
// If not culling, the vertices would need to be swapped to correct the winding for the rest of the code.
return;
}
FVector2D GradZ;
GradZ.X = ( v01.Z * v02.Y - v01.Y * v02.Z ) / DetXY;
GradZ.Y = ( v01.X * v02.Z - v01.Z * v02.X ) / DetXY;
// 24.8 fixed point
FIntPoint Vert0 = ToIntPoint( Verts[0] * SubpixelSamples );
FIntPoint Vert1 = ToIntPoint( Verts[1] * SubpixelSamples );
FIntPoint Vert2 = ToIntPoint( Verts[2] * SubpixelSamples );
// Rectangular bounding box.
FIntRect RectSubpixel( Vert0, Vert0 );
RectSubpixel.Include( Vert1 );
RectSubpixel.Include( Vert2 );
RectSubpixel.InflateRect( SubpixelDilate );
// Round to the nearest pixel.
FIntRect RectPixel = ( ( RectSubpixel + (SubpixelSamples / 2) - 1 ) ) / SubpixelSamples;
// Clip to the viewport.
RectPixel.Clip( ScissorRect );
// Cull the triangle if it covers no pixels.
if( RectPixel.IsEmpty() )
return;
// 12.8 fixed point
FIntPoint Edge01 = Vert0 - Vert1;
FIntPoint Edge12 = Vert1 - Vert2;
FIntPoint Edge20 = Vert2 - Vert0;
// Adjust MinPixel by the half-pixel offset.
// 12.8 fixed point
// Maximum triangle size = 2047x2047 pixels.
const FIntPoint BaseSubpixel = RectPixel.Min * SubpixelSamples + (SubpixelSamples / 2);
Vert0 -= BaseSubpixel;
Vert1 -= BaseSubpixel;
Vert2 -= BaseSubpixel;
auto EdgeC = [=]( const FIntPoint& Edge, const FIntPoint& Vert )
{
int64 ex = Edge.X;
int64 ey = Edge.Y;
int64 vx = Vert.X;
int64 vy = Vert.Y;
// Half-edge constants
// 24.16 fixed point
int64 C = ey * vx - ex * vy;
// Correct for the fill convention.
// Top left rule for CCW
C -= ( Edge.Y < 0 || ( Edge.Y == 0 && Edge.X > 0 ) ) ? 0 : 1;
// Dilate the edges.
C += ( FMath::Abs( Edge.X ) + FMath::Abs( Edge.Y ) ) * SubpixelDilate;
// Step in pixel increments.
// The low bits are always the same, so they do not matter when testing the sign.
// 24.8 fixed point
return int32( C >> SubpixelBits );
};
int32 C0 = EdgeC( Edge01, Vert0 );
int32 C1 = EdgeC( Edge12, Vert1 );
int32 C2 = EdgeC( Edge20, Vert2 );
float Z0 = Verts[0].Z - ( GradZ.X * Vert0.X + GradZ.Y * Vert0.Y ) / SubpixelSamples;
int32 CY0 = C0;
int32 CY1 = C1;
int32 CY2 = C2;
float ZY = Z0;
// Loop over all pixels in the rectangle and fill those inside the triangle.
for( int32 y = RectPixel.Min.Y; y < RectPixel.Max.Y; y++ )
{
int32 CX0 = CY0;
int32 CX1 = CY1;
int32 CX2 = CY2;
float ZX = ZY;
for( int32 x = RectPixel.Min.X; x < RectPixel.Max.X; x++ )
{
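// If all three edge-function values are non-negative (the sign bit of their OR is clear), the pixel is inside the triangle: call WritePixel to write the data.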
if( ( CX0 | CX1 | CX2 ) >= 0 )
{
WritePixel( x, y, ZX );
}
CX0 -= Edge01.Y;
CX1 -= Edge12.Y;
CX2 -= Edge20.Y;
ZX += GradZ.X;
}
CY0 += Edge01.X;
CY1 += Edge12.X;
CY2 += Edge20.X;
ZY += GradZ.Y;
}
}
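The bounding-box scan and edge test in the listing above can be illustrated with a much smaller, self-contained sketch. It is not the engine's code: `FixedPoint2`, `EdgeFunction`, and `RasterizeTriangle` are hypothetical names, the edge functions are re-evaluated per pixel instead of stepped incrementally, and the top-left fill rule, edge dilation, and depth interpolation are left out. It only keeps the ideas of 24.8 fixed-point vertices, sampling at pixel centers, and the sign-bit test on the OR of the three edge values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

constexpr int32_t kSubpixelBits = 8;                      // 24.8 fixed point, as in the listing
constexpr int32_t kSubpixelSamples = 1 << kSubpixelBits;  // 256 subpixel steps per pixel

// Hypothetical vertex type: screen position in 24.8 fixed point.
struct FixedPoint2 { int32_t X, Y; };

// Signed edge function of the point (Px, Py) against the directed edge A -> B.
static int64_t EdgeFunction(const FixedPoint2& A, const FixedPoint2& B, int64_t Px, int64_t Py)
{
    return (int64_t(B.X) - A.X) * (Py - A.Y) - (int64_t(B.Y) - A.Y) * (Px - A.X);
}

// Returns the (x, y) pixels whose centers fall inside or on the triangle.
// Assumes non-negative vertex coordinates; no fill rule is applied.
std::vector<std::pair<int, int>> RasterizeTriangle(FixedPoint2 V0, FixedPoint2 V1, FixedPoint2 V2,
                                                   int Width, int Height)
{
    std::vector<std::pair<int, int>> Covered;

    // Twice the signed area; zero means a degenerate triangle.
    const int64_t Area2 = EdgeFunction(V0, V1, V2.X, V2.Y);
    if (Area2 == 0) return Covered;
    if (Area2 < 0) std::swap(V1, V2);  // normalize winding so interior values are >= 0

    // Pixel bounding box, clipped to the render target.
    const int MinX = std::max<int32_t>(0, std::min({V0.X, V1.X, V2.X}) >> kSubpixelBits);
    const int MinY = std::max<int32_t>(0, std::min({V0.Y, V1.Y, V2.Y}) >> kSubpixelBits);
    const int MaxX = std::min<int32_t>(Width - 1, std::max({V0.X, V1.X, V2.X}) >> kSubpixelBits);
    const int MaxY = std::min<int32_t>(Height - 1, std::max({V0.Y, V1.Y, V2.Y}) >> kSubpixelBits);

    for (int y = MinY; y <= MaxY; ++y)
    {
        for (int x = MinX; x <= MaxX; ++x)
        {
            // Sample at the pixel center (half-pixel offset).
            const int64_t Px = int64_t(x) * kSubpixelSamples + kSubpixelSamples / 2;
            const int64_t Py = int64_t(y) * kSubpixelSamples + kSubpixelSamples / 2;
            const int64_t E0 = EdgeFunction(V0, V1, Px, Py);
            const int64_t E1 = EdgeFunction(V1, V2, Px, Py);
            const int64_t E2 = EdgeFunction(V2, V0, Px, Py);
            // The OR is non-negative only if every edge value has a clear sign bit.
            if ((E0 | E1 | E2) >= 0)
            {
                Covered.emplace_back(x, y);
            }
        }
    }
    return Covered;
}
```

For a right triangle with legs of 4 and 3 pixels, no pixel center lies exactly on an edge, so the covered pixel count does not depend on the missing fill rule.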
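This subsection summarizes the Nanite data build process. The initial entry point is BuildNaniteFromHiResSourceModel:
The main steps of NaniteBuilderModule.Build are outlined below:
Build the array that associates triangle indices with material indices.
BuildNaniteData: build the Nanite data.
Process vertex colors.
Iterate over all sections and build clusters for each one.
ClusterTriangles: split a section into one or more clusters.
Check whether a coarse representation should replace the original static mesh data.
Call BuildDAG for all sections to build a directed acyclic graph (DAG) that accelerates mesh simplification.
If a coarse representation is used, call BuildCoarseRepresentation to build its data, then fix up the mesh section info with the coarse mesh ranges while respecting the original ordering and preserving materials.
Encode: encode the Nanite mesh.
If needed (only when there is a single section), generate the FImposterAtlas.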
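The Nanite build process uses many optimization techniques, including but not limited to:
It is also worth noting that Nanite does not use the Geometry Image technique that was rumored earlier, although the core ideas are fairly similar.
Reportedly, the UE5 developers include an author who has co-published papers with Professor Xianfeng Gu (顾险峰), a pioneer of geometry images and a student of the renowned mathematician Shing-Tung Yau (丘成桐).
For more on Geometry Images, see Professor Gu's paper Geometry images and his WeChat official account, 老顾谈几何.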
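This section covers the code and logic of Nanite's rendering stage.
UE5 made substantial changes to its rendering modules to support Nanite, summarized as follows:
Engine module:
Renderer module: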
Adds the NaniteRender module, containing types and interfaces such as FNaniteCommandInfo, ENaniteMeshPass, FNaniteDrawListContext, FCullingContext, FRasterContext, FRasterResults, FNaniteShader, FNaniteMaterialVS, FNaniteMeshProcessor, FNaniteMaterialTables, ERasterTechnique, ERasterScheduling, EOutputBufferMode, and FPackedView.
FPrimitiveSceneInfo adds NaniteCommandInfos, NaniteMaterialIds, and LumenPrimitiveIndex, as well as instancing- and ray-tracing-related data and interfaces such as CachedRayTracingMeshCommandsHashPerLOD, bRegisteredWithVelocityData, InstanceDataOffset, and NumInstanceDataEntries.
Adds the NaniteResources module, containing types such as Nanite::FSceneProxy, Nanite::FResources, and Nanite::FVertexFactory.
Adds the NaniteStreamingManager module, containing types such as Nanite::FPageKey, Nanite::FGPUStreamingRequest, Nanite::FStreamingRequest, Nanite::FStreamingPageInfo, Nanite::FRootPageInfo, Nanite::FPendingPage, Nanite::FAsyncState, and Nanite::FStreamingManager.
FPrimitiveSceneProxy adds SupportsNaniteRendering, IsNaniteMesh, bSupportsMeshCardRepresentation, IsAlwaysVisible, GetPrimitiveInstances, RayTracingGroupId, and others.
SceneInterface and SceneManagement add types such as FInstanceCullingManagerResources and use FGPUScenePrimitiveCollector in place of FPrimitiveUniformShaderParameters.
SceneView adds shader bindings such as FViewShaderParameters, PrecomputedIndirectLightingColorScale, GlobalDistanceField, VirtualTexture, PhysicsField, Lumen, Instance, and Page.
Types such as FPrimitiveFlagsCompact add the bIsNaniteMesh flag.
Shader module:
This subsection walks through the main concepts, types, and interfaces involved in Nanite rendering.
InstanceUniformShaderParameters
// Engine\Source\Runtime\Engine\Public\InstanceUniformShaderParameters.h
#define INSTANCE_SCENE_DATA_FLAG_CAST_SHADOWS 0x1
#define INSTANCE_SCENE_DATA_FLAG_DETERMINANT_SIGN 0x2
#define INSTANCE_SCENE_DATA_FLAG_HAS_IMPOSTER 0x4
// Nanite instance info.
class FNaniteInfo
{
public:
uint32 RuntimeResourceID; // Runtime resource ID.
uint32 HierarchyOffset_AndHasImposter; // Hierarchy offset packed together with the has-imposter flag.
FNaniteInfo()
: RuntimeResourceID(0xFFFFFFFFu)
, HierarchyOffset_AndHasImposter(0xFFFFFFFFu)
{
}
FNaniteInfo(uint32 InRuntimeResourceID, int32 InHierarchyOffset, bool bHasImposter)
: RuntimeResourceID(InRuntimeResourceID)
, HierarchyOffset_AndHasImposter((InHierarchyOffset << 1) | (bHasImposter ? 1u : 0u))
{
}
};
// Nanite primitive instance info.
struct FPrimitiveInstance
{
FMatrix InstanceToLocal;
FMatrix PrevInstanceToLocal;
FMatrix LocalToWorld;
FMatrix PrevLocalToWorld;
FVector4 NonUniformScale;
FVector4 InvNonUniformScaleAndDeterminantSign;
FBoxSphereBounds RenderBounds;
FBoxSphereBounds LocalBounds;
FVector4 LightMapAndShadowMapUVBias;
uint32 PrimitiveId; // Primitive ID.
FNaniteInfo NaniteInfo; // Nanite info.
uint32 LastUpdateSceneFrameNumber;
float PerInstanceRandom;
uint32 Flags;
};
(……)
// Declaration of FInstanceUniformShaderParameters. Must strictly match FInstanceSceneData in the shader.
BEGIN_GLOBAL_SHADER_PARAMETER_STRUCT(FInstanceUniformShaderParameters,ENGINE_API)
SHADER_PARAMETER(FMatrix, LocalToWorld)
SHADER_PARAMETER(FMatrix, PrevLocalToWorld)
SHADER_PARAMETER(FVector4, NonUniformScale)
SHADER_PARAMETER(FVector4, InvNonUniformScaleAndDeterminantSign)
SHADER_PARAMETER(FVector, LocalBoundsCenter)
SHADER_PARAMETER(uint32, PrimitiveId)
SHADER_PARAMETER(FVector, LocalBoundsExtent)
SHADER_PARAMETER(uint32, LastUpdateSceneFrameNumber)
SHADER_PARAMETER(uint32, NaniteRuntimeResourceID)
SHADER_PARAMETER(uint32, NaniteHierarchyOffset)
SHADER_PARAMETER(float, PerInstanceRandom)
SHADER_PARAMETER(uint32, Flags)
SHADER_PARAMETER(FVector4, LightMapAndShadowMapUVBias)
END_GLOBAL_SHADER_PARAMETER_STRUCT()
// Instance scene shader data.
struct FInstanceSceneShaderData
{
// Must match GetInstanceData() in SceneData.ush.
enum { InstanceDataStrideInFloat4s = 10 };
FVector4 Data[InstanceDataStrideInFloat4s];
(......)
};
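The FNaniteInfo constructor above packs the hierarchy offset and the imposter flag into one 32-bit word. A minimal sketch of that packing and its inverse follows. The helper names are hypothetical, and how the engine or its shaders actually unpack the value is not shown in the listing, so the unpack side is an assumption derived from the shift-and-OR in the constructor.

```cpp
#include <cassert>
#include <cstdint>

// Pack: the offset occupies the upper 31 bits, the imposter flag the lowest bit.
// Mirrors (InHierarchyOffset << 1) | (bHasImposter ? 1u : 0u) from the listing.
inline uint32_t PackHierarchyOffsetAndImposter(uint32_t HierarchyOffset, bool bHasImposter)
{
    return (HierarchyOffset << 1) | (bHasImposter ? 1u : 0u);
}

// Unpack (assumed inverse of the packing above).
inline uint32_t UnpackHierarchyOffset(uint32_t Packed)
{
    return Packed >> 1;
}

inline bool UnpackHasImposter(uint32_t Packed)
{
    return (Packed & 1u) != 0u;
}
```

Because one bit is spent on the flag, the offset itself must fit in 31 bits; the default value 0xFFFFFFFFu in the listing presumably marks an invalid entry.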
NaniteStreamingManager
// Engine\Source\Runtime\Engine\Public\Rendering\NaniteStreamingManager.h
namespace Nanite
{
// Page key.
struct FPageKey
{
// Runtime resource ID.
uint32 RuntimeResourceID;
// Page index.
uint32 PageIndex;
};
// Key hash.
FORCEINLINE uint32 GetTypeHash( const FPageKey& Key )
{
return Key.RuntimeResourceID * 0xFC6014F9u + Key.PageIndex * 0x58399E77u;
}
// Key comparison.
FORCEINLINE bool operator==( const FPageKey& A, const FPageKey& B )
{
return A.RuntimeResourceID == B.RuntimeResourceID && A.PageIndex == B.PageIndex;
}
FORCEINLINE bool operator!=(const FPageKey& A, const FPageKey& B)
{
return !(A == B);
}
// Request data before deduplication.
struct FGPUStreamingRequest
{
uint32 RuntimeResourceID;
uint32 PageIndex_NumPages;
uint32 Priority;
};
// Request data after deduplication.
struct FStreamingRequest
{
FPageKey Key;
uint32 Priority;
};
// Streaming page info.
struct FStreamingPageInfo
{
FStreamingPageInfo* Next;
FStreamingPageInfo* Prev;
FPageKey RegisteredKey;
FPageKey ResidentKey;
uint32 GPUPageIndex;
uint32 LatestUpdateIndex;
uint32 RefCount;
};
// Root page info.
struct FRootPageInfo
{
uint32 RuntimeResourceID;
uint32 NumClusters;
};
// Pending page.
struct FPendingPage
{
#if !WITH_EDITOR
uint8* MemoryPtr;
FIoRequest Request;
IAsyncReadFileHandle* AsyncHandle;
IAsyncReadRequest* AsyncRequest;
#endif
uint32 GPUPageIndex;
FPageKey InstallKey;
#if !UE_BUILD_SHIPPING
uint32 BytesLeftToStream;
#endif
};
// Async state.
struct FAsyncState
{
FRHIGPUBufferReadback* LatestReadbackBuffer = nullptr;
const uint32* LatestReadbackBufferPtr = nullptr;
uint32 NumReadyPages = 0;
bool bUpdateActive = false;
bool bBuffersTransitionedToWrite = false;
};
// Nanite streaming manager.
class FStreamingManager : public FRenderResource
{
public:
FStreamingManager();
// Initialize/release RHI resources.
virtual void InitRHI() override;
virtual void ReleaseRHI() override;
// Add/remove resources.
void Add( FResources* Resources );
void Remove( FResources* Resources );
// Must be called once per frame before any Nanite rendering occurs, and before EndAsyncUpdate.
ENGINE_API void BeginAsyncUpdate(FRDGBuilder& GraphBuilder);
// Must be called once per frame before any Nanite rendering occurs, and after BeginAsyncUpdate.
ENGINE_API void EndAsyncUpdate(FRDGBuilder& GraphBuilder);
ENGINE_API bool IsAsyncUpdateInProgress();
// Called once per frame after the last request has been added.
ENGINE_API void SubmitFrameStreamingRequests(FRDGBuilder& GraphBuilder);
(......)
private:
friend class FStreamingUpdateTask;
// Heap buffer: holds the data buffer and its upload buffer.
struct FHeapBuffer
{
int32 TotalUpload = 0;
FGrowOnlySpanAllocator Allocator;
FScatterUploadBuffer UploadBuffer;
FRWByteAddressBuffer DataBuffer;
void Release()
{
UploadBuffer.Release();
DataBuffer.Release();
}
};
// FPackedCluster*, GeometryData { Index, Position, TexCoord, TangentX, TangentZ }*
FHeapBuffer ClusterPageData;
FHeapBuffer ClusterPageHeaders;
FScatterUploadBuffer ClusterFixupUploadBuffer;
FHeapBuffer Hierarchy; // Hierarchy.
FHeapBuffer RootPages; // Root pages.
TRefCountPtr< FRDGPooledBuffer > StreamingRequestsBuffer;
uint32 MaxStreamingPages;
uint32 MaxPendingPages;
uint32 MaxPageInstallsPerUpdate;
uint32 MaxStreamingReadbackBuffers;
// Readback data.
uint32 ReadbackBuffersWriteIndex;
uint32 ReadbackBuffersNumPending;
TArray<uint32> NextRootPageVersion;
uint32 NextUpdateIndex;
uint32 NumRegisteredStreamingPages;
uint32 NumPendingPages;
uint32 NextPendingPageIndex;
TArray<FRootPageInfo> RootPageInfos;
#if !UE_BUILD_SHIPPING
uint64 PrevUpdateTick;
#endif
TArray< FRHIGPUBufferReadback* > StreamingRequestReadbackBuffers;
TArray< FResources* > PendingAdds;
TMap< uint32, FResources* > RuntimeResourceMap;
TMap< FPageKey, FStreamingPageInfo* > RegisteredStreamingPagesMap; // This is updated immediately.
TMap< FPageKey, FStreamingPageInfo* > CommittedStreamingPageMap; // This update is deferred to the point where the page has been loaded and committed to memory.
TArray< FStreamingRequest > PrioritizedRequestsHeap;
FStreamingPageInfo StreamingPageLRU;
FStreamingPageInfo* StreamingPageInfoFreeList;
TArray< FStreamingPageInfo > StreamingPageInfos;
// Fixup information for resident streaming pages. It must be kept around so that pages can be uninstalled.
TArray< FFixupChunk* > StreamingPageFixupChunks;
TArray< FPendingPage > PendingPages;
#if !WITH_EDITOR
TArray< uint8 > PendingPageStagingMemory;
#endif
TArray< uint8 > PendingPageStagingMemoryLZ;
FRequestsHashTable* RequestsHashTable = nullptr;
FStreamingPageUploader* PageUploader = nullptr;
FGraphEventArray AsyncTaskEvents;
FAsyncState AsyncState;
// Page operations.
void CollectDependencyPages( FResources* Resources, TSet< FPageKey >& DependencyPages, const FPageKey& Key );
void SelectStreamingPages( FResources* Resources, TArray< FPageKey >& SelectedPages, TSet<FPageKey>& SelectedPagesSet, uint32 RuntimeResourceID, uint32 PageIndex, uint32 MaxSelectedPages );
// Register/unregister pages.
void RegisterStreamingPage( FStreamingPageInfo* Page, const FPageKey& Key );
void UnregisterPage( const FPageKey& Key );
void MovePageToFreeList( FStreamingPageInfo* Page );
void ApplyFixups( const FFixupChunk& FixupChunk, const FResources& Resources, uint32 PageIndex, uint32 GPUPageIndex );
bool ArePageDependenciesCommitted(uint32 RuntimeResourceID, uint32 PageIndex, uint32 DependencyPageStart, uint32 DependencyPageNum);
// Returns whether any work was done and whether the page/hierarchy buffers were transitioned to a compute-writable state.
bool ProcessNewResources( FRDGBuilder& GraphBuilder);
uint32 DetermineReadyPages();
void InstallReadyPages( uint32 NumReadyPages );
// Asynchronous update.
void AsyncUpdate();
void ClearStreamingRequestCount(FRDGBuilder& GraphBuilder, FRDGBufferUAVRef BufferUAVRef);
};
// Declaration of the global Nanite streaming manager.
extern ENGINE_API TGlobalResource< FStreamingManager > GStreamingManager;
}
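The listing distinguishes requests before deduplication (FGPUStreamingRequest) from requests after it (FStreamingRequest keyed by FPageKey). The sketch below shows one plausible way to collapse duplicate page requests with a hash map, reusing the hash constants from GetTypeHash above. It is an illustration only: the engine's FRequestsHashTable is not shown in the listing, the `PageKey`, `StreamingRequest`, and `DeduplicateRequests` names are hypothetical, and keeping the highest priority per page is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for FPageKey and FStreamingRequest.
struct PageKey
{
    uint32_t RuntimeResourceID;
    uint32_t PageIndex;
};

inline bool operator==(const PageKey& A, const PageKey& B)
{
    return A.RuntimeResourceID == B.RuntimeResourceID && A.PageIndex == B.PageIndex;
}

// Same multiplicative hash as GetTypeHash(FPageKey) in the listing.
struct PageKeyHash
{
    std::size_t operator()(const PageKey& Key) const
    {
        return Key.RuntimeResourceID * 0xFC6014F9u + Key.PageIndex * 0x58399E77u;
    }
};

struct StreamingRequest
{
    PageKey Key;
    uint32_t Priority;
};

// Collapses requests for the same page, keeping the highest priority (assumption),
// and returns them sorted by descending priority.
std::vector<StreamingRequest> DeduplicateRequests(const std::vector<StreamingRequest>& Raw)
{
    std::unordered_map<PageKey, uint32_t, PageKeyHash> BestPriority;
    for (const StreamingRequest& Request : Raw)
    {
        auto Result = BestPriority.emplace(Request.Key, Request.Priority);
        if (!Result.second)
        {
            Result.first->second = std::max(Result.first->second, Request.Priority);
        }
    }

    std::vector<StreamingRequest> Unique;
    Unique.reserve(BestPriority.size());
    for (const auto& Entry : BestPriority)
    {
        Unique.push_back({Entry.first, Entry.second});
    }

    // Deterministic order: priority first, then key.
    std::sort(Unique.begin(), Unique.end(), [](const StreamingRequest& A, const StreamingRequest& B)
    {
        if (A.Priority != B.Priority) return A.Priority > B.Priority;
        if (A.Key.RuntimeResourceID != B.Key.RuntimeResourceID) return A.Key.RuntimeResourceID < B.Key.RuntimeResourceID;
        return A.Key.PageIndex < B.Key.PageIndex;
    });
    return Unique;
}
```

Sorting by priority here stands in for the PrioritizedRequestsHeap member of the manager; a real implementation would more likely keep a heap and pop only as many pages as it can install per update.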
NaniteRender
// Engine\Source\Runtime\Renderer\Private\Nanite\NaniteRender.h
static constexpr uint32 NANITE_MAX_MATERIALS = 64;
static constexpr uint32 MAX_VIEWS_PER_CULL_RASTERIZE_PASS_BITS = 12;
static constexpr uint32 MAX_VIEWS_PER_CULL_RASTERIZE_PASS_MASK = ( ( 1 << MAX_VIEWS_PER_CULL_RASTERIZE_PASS_BITS ) - 1 );
static constexpr uint32 MAX_VIEWS_PER_CULL_RASTERIZE_PASS = ( 1 << MAX_VIEWS_PER_CULL_RASTERIZE_PASS_BITS );
(……)
// Nanite uniform buffer parameters.
BEGIN_GLOBAL_SHADER_PARAMETER_STRUCT(FNaniteUniformParameters, )
SHADER_PARAMETER(FIntVector4, SOAStrides)
SHADER_PARAMETER(FIntVector4, MaterialConfig) // .x mode, .yz grid size, .w unused
SHADER_PARAMETER(uint32, MaxNodes)
SHADER_PARAMETER(uint32, MaxVisibleClusters)
SHADER_PARAMETER(uint32, RenderFlags)
SHADER_PARAMETER(FVector4, RectScaleOffset) // xy: scale, zw: offset
SHADER_PARAMETER_SRV(ByteAddressBuffer, ClusterPageData)
SHADER_PARAMETER_SRV(ByteAddressBuffer, ClusterPageHeaders)
SHADER_PARAMETER_SRV(ByteAddressBuffer, VisibleClustersSWHW)
SHADER_PARAMETER_SRV(StructuredBuffer
SHADER_PARAMETER_TEXTURE(Texture2D
SHADER_PARAMETER_TEXTURE(Texture2D
SHADER_PARAMETER_TEXTURE(Texture2D
SHADER_PARAMETER_TEXTURE(Texture2D
END_SHADER_PARAMETER_STRUCT()
(……)
// Rasterization parameters.
BEGIN_SHADER_PARAMETER_STRUCT( FRasterParameters, )
SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< uint >, OutDepthBuffer ) // Depth
SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< UlongType >, OutVisBuffer64 ) // Visibility
SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< UlongType >, OutDbgBuffer64 ) // Debug data
SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< uint >, OutDbgBuffer32 )
SHADER_PARAMETER_RDG_TEXTURE_UAV( RWTexture2D< uint >, LockBuffer ) // Lock buffer
END_SHADER_PARAMETER_STRUCT()
// Nanite draw command info.
class FNaniteCommandInfo
{
public:
static constexpr int32 MAX_STATE_BUCKET_ID = (1 << 14) - 1; // Must match NaniteDataDecode.ush
void SetStateBucketId(int32 InStateBucketId)
{
StateBucketId = InStateBucketId;
}
int32 GetStateBucketId() const
{
check(StateBucketId < MAX_STATE_BUCKET_ID);
return StateBucketId;
}
uint32 GetMaterialId() const
{
return GetMaterialId(GetStateBucketId());
}
static uint32 GetMaterialId(int32 StateBucketId)
{
float DepthId = GetDepthId(StateBucketId);
return *reinterpret_cast<uint32*>(&DepthId);
}
static float GetDepthId(int32 StateBucketId)
{
return float(StateBucketId + 1) / float(MAX_STATE_BUCKET_ID);
}
private:
// Index into FScene::NaniteDrawCommands for the corresponding FMeshDrawCommand.
int32 StateBucketId = INDEX_NONE;
};
struct MeshDrawCommandKeyFuncs;
// Nanite draw list context, similar to its non-Nanite counterpart.
class FNaniteDrawListContext : public FMeshPassDrawListContext
{
public:
FNaniteDrawListContext(FRWLock& InNaniteDrawCommandLock, FStateBucketMap& InNaniteDrawCommands);
virtual FMeshDrawCommand& AddCommand(FMeshDrawCommand& Initializer, uint32 NumElements) override final;
virtual void FinalizeCommand(
const FMeshBatch& MeshBatch,
int32 BatchElementIndex,
int32 DrawPrimitiveId,
int32 ScenePrimitiveId,
ERasterizerFillMode MeshFillMode,
ERasterizerCullMode MeshCullMode,
FMeshDrawCommandSortKey SortKey,
EFVisibleMeshDrawCommandFlags Flags,
const FGraphicsMinimalPipelineStateInitializer& PipelineState,
const FMeshProcessorShaders* ShadersForDebugging,
FMeshDrawCommand& MeshDrawCommand
) override final;
(......)
private:
FRWLock* NaniteDrawCommandLock;
FStateBucketMap* NaniteDrawCommands; // Nanite draw commands.
FNaniteCommandInfo CommandInfo; // Nanite command info.
FMeshDrawCommand MeshDrawCommandForStateBucketing;
};
// Base class for Nanite shaders.
class FNaniteShader : public FGlobalShader
{
public:
(……)
static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters);
static void ModifyCompilationEnvironment(const FGlobalShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment);
};
// Vertex shader that draws a full-screen pass at a specified depth; runs on all platforms.
class FNaniteMaterialVS : public FNaniteShader
{
DECLARE_GLOBAL_SHADER(FNaniteMaterialVS);
BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
SHADER_PARAMETER(float, MaterialDepth)
END_SHADER_PARAMETER_STRUCT()
(......)
void GetShaderBindings(
const FScene* Scene,
ERHIFeatureLevel::Type FeatureLevel,
const FPrimitiveSceneProxy* PrimitiveSceneProxy,
const FMaterialRenderProxy& MaterialRenderProxy,
const FMaterial& Material,
const FMeshPassProcessorRenderState& DrawRenderState,
const FMeshMaterialShaderElementData& ShaderElementData,
FMeshDrawSingleShaderBindings& ShaderBindings) const
{
ShaderBindings.Add(NaniteUniformBuffer, DrawRenderState.GetNaniteUniformBuffer());
}
private:
LAYOUT_FIELD(FShaderParameter, MaterialDepth);
LAYOUT_FIELD(FShaderUniformBufferParameter, NaniteUniformBuffer);
};
// Nanite mesh processor.
class FNaniteMeshProcessor : public FMeshPassProcessor
{
public:
(……)
virtual void AddMeshBatch(const FMeshBatch& RESTRICT MeshBatch, uint64 BatchElementMask, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy, int32 StaticMeshId = -1) override final;
private:
FMeshPassProcessorRenderState PassDrawRenderState;
};
// Creates a Nanite mesh processor instance.
FMeshPassProcessor* CreateNaniteMeshProcessor(const FScene* Scene, const FSceneView* InViewIfDynamicMeshCommand, FMeshPassDrawListContext* InDrawListContext);
// Nanite material tables.
class FNaniteMaterialTables
{
public:
FNaniteMaterialTables(uint32 MaxMaterials = NANITE_MAX_MATERIALS);
~FNaniteMaterialTables();
void Release();
void UpdateBufferState(FRDGBuilder& GraphBuilder, uint32 NumPrimitives);
void Begin(FRHICommandListImmediate& RHICmdList, uint32 NumPrimitives, uint32 NumPrimitiveUpdates);
void* GetDepthTablePtr(uint32 PrimitiveIndex, uint32 EntryCount);
void Finish(FRHICommandListImmediate& RHICmdList);
FRHIShaderResourceView* GetDepthTableSRV() const { return DepthTableDataBuffer.SRV; }
private:
uint32 MaxMaterials = 0;
uint32 NumPrimitiveUpdates = 0;
uint32 NumDepthTableUpdates = 0;
uint32 NumHitProxyTableUpdates = 0;
// GPU data buffers and the scatter buffers used to upload CPU data into them.
FScatterUploadBuffer DepthTableUploadBuffer;
FRWByteAddressBuffer DepthTableDataBuffer;
FScatterUploadBuffer HitProxyTableUploadBuffer;
FRWByteAddressBuffer HitProxyTableDataBuffer;
};
namespace Nanite
{
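// Rasterization technique.
enum class ERasterTechnique : uint8
{
LockBufferFallback = 0, // Uses a fallback lock buffer to approximate 64-bit atomics when they are unavailable (has race conditions).
PlatformAtomics = 1, // Uses 64-bit atomics provided by the platform.
NVAtomics = 2, // Uses 64-bit atomics provided by the NVIDIA extension.
AMDAtomicsD3D11 = 3, // Uses 64-bit atomics provided by the AMD extension (D3D11).
AMDAtomicsD3D12 = 4, // Uses 64-bit atomics provided by the AMD extension (D3D12).
DepthOnly = 5, // Uses 32-bit atomics for depth only, with no payload.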
NumTechniques
};
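// Rasterization scheduling mode.
enum class ERasterScheduling : uint8
{
HardwareOnly = 0, // Rasterize using fixed-function hardware only.
HardwareThenSoftware = 1, // Rasterize large triangles with hardware, then small triangles with software (compute shader).
HardwareAndSoftwareOverlap = 2, // Rasterize large triangles with hardware, overlapped with software (compute shader) rasterization of small triangles.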
};
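// Output buffer mode, used to select the rasterization mode when creating the context.
enum class EOutputBufferMode : uint8
{
VisBuffer, // Visibility buffer; the default mode, outputs IDs and depth.
DepthOnly, // Rasterize only depth to a 32-bit buffer.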
};
// Packed view.
struct FPackedView
{
FMatrix TranslatedWorldToView;
FMatrix TranslatedWorldToClip;
FMatrix ViewToClip;
FMatrix ClipToWorld;
FMatrix PrevTranslatedWorldToView;
FMatrix PrevTranslatedWorldToClip;
FMatrix PrevViewToClip;
FMatrix PrevClipToWorld;
FIntVector4 ViewRect;
FVector4 ViewSizeAndInvSize;
FVector4 ClipSpaceScaleOffset;
FVector4 PreViewTranslation;
FVector4 PrevPreViewTranslation;
FVector4 WorldCameraOrigin;
FVector4 ViewForwardAndNearPlane;
FVector2D LODScales;
float MinBoundsRadiusSq;
uint32 StreamingPriorityCategory_AndFlags;
FIntVector4 TargetLayerIdX_AndMipLevelY_AndNumMipLevelsZ;
FIntVector4 HZBTestViewRect; // In full resolution
// Computes the LOD scales, assuming the view size and projection are already set up. Depends on the global GNaniteMaxPixelsPerEdge.
void UpdateLODScales();
};
// Culling context.
struct FCullingContext
{
FGlobalShaderMap* ShaderMap;
uint32 DrawPassIndex;
uint32 NumInstancesPreCull;
uint32 RenderFlags;
uint32 DebugFlags;
TRefCountPtr<IPooledRenderTarget> PrevHZB; // If non-null, HZB culling is enabled.
FIntRect HZBBuildViewRect;
bool bTwoPassOcclusion;
bool bSupportsMultiplePasses;
FIntVector4 SOAStrides;
FRDGBufferRef MainRasterizeArgsSWHW;
FRDGBufferRef PostRasterizeArgsSWHW;
FRDGBufferRef SafeMainRasterizeArgsSWHW;
FRDGBufferRef SafePostRasterizeArgsSWHW;
FRDGBufferRef MainAndPostPassPersistentStates;
FRDGBufferRef VisibleClustersSWHW;
FRDGBufferRef OccludedInstances;
FRDGBufferRef OccludedInstancesArgs;
FRDGBufferRef TotalPrevDrawClustersBuffer;
FRDGBufferRef StreamingRequests;
FRDGBufferRef ViewsBuffer;
FRDGBufferRef InstanceDrawsBuffer;
FRDGBufferRef StatsBuffer;
};
// Raster context.
struct FRasterContext
{
FGlobalShaderMap* ShaderMap;
FVector2D RcpViewSize;
FIntPoint TextureSize;
ERasterTechnique RasterTechnique;
ERasterScheduling RasterScheduling;
FRasterParameters Parameters;
FRDGTextureRef LockBuffer;
FRDGTextureRef DepthBuffer;
FRDGTextureRef VisBuffer64;
FRDGTextureRef DbgBuffer64;
FRDGTextureRef DbgBuffer32;
uint32 VisualizeModeBitMask;
bool VisualizeActive;
};
// Raster results.
struct FRasterResults
{
FIntVector4 SOAStrides;
uint32 MaxVisibleClusters;
uint32 MaxNodes;
uint32 RenderFlags;
FRDGBufferRef ViewsBuffer{};
FRDGBufferRef VisibleClustersSWHW{};
FRDGTextureRef VisBuffer64{};
FRDGTextureRef DbgBuffer64{};
FRDGTextureRef DbgBuffer32{};
FRDGTextureRef MaterialDepth{};
FRDGTextureRef NaniteMask{};
FRDGTextureRef VelocityBuffer{};
TArray<FVisualizeResult, TInlineAllocator<32>> Visualizations;
};
// Initializes the culling context.
FCullingContext InitCullingContext(FRDGBuilder& GraphBuilder, const FScene& Scene, …);
// Initializes the raster context.
FRasterContext InitRasterContext(FRDGBuilder& GraphBuilder, ERHIFeatureLevel::Type FeatureLevel, …);
// Packed view parameters.
struct FPackedViewParams
{
FViewMatrices ViewMatrices;
FViewMatrices PrevViewMatrices;
FIntRect ViewRect;
FIntPoint RasterContextSize;
uint32 StreamingPriorityCategory = 0;
float MinBoundsRadius = 0.0f;
float LODScaleFactor = 1.0f;
uint32 Flags = 0;
int32 TargetLayerIndex = 0;
int32 PrevTargetLayerIndex = INDEX_NONE;
int32 TargetMipLevel = 0;
int32 TargetMipCount = 1;
FIntRect HZBTestViewRect = {0, 0, 0, 0};
};
FPackedView CreatePackedView( const FPackedViewParams& Params );
FPackedView CreatePackedViewFromViewInfo(const FViewInfo& View, FIntPoint RasterContextSize, …);
// Raster state.
struct FRasterState
{
bool bNearClip = true; // Whether near-plane clipping is enabled.
ERasterizerCullMode CullMode = CM_CW; // Rasterizer cull mode; clockwise by default.
};
// Rasterization with culling.
void CullRasterize(
FRDGBuilder& GraphBuilder,
const FScene& Scene,
const TArray
FCullingContext& CullingContext,
const FRasterContext& RasterContext,
const FRasterState& RasterState = FRasterState(),
const TArray
bool bExtractStats = false
);
// Rasterize into a set of virtual shadow maps.
void CullRasterize(
FRDGBuilder& GraphBuilder,
const FScene& Scene,
const TArray
uint32 NumPrimaryViews, // Number of non-mip views
FCullingContext& CullingContext,
const FRasterContext& RasterContext,
const FRasterState& RasterState = FRasterState(),
const TArray
FVirtualShadowMapArray* VirtualShadowMapArray = nullptr,
bool bExtractStats = false
);
// Extract the raster results.
void ExtractResults(FRDGBuilder& GraphBuilder, const FCullingContext& CullingContext, const FRasterContext& RasterContext, FRasterResults& RasterResults);
// Emit a shadow map.
void EmitShadowMap(FRDGBuilder& GraphBuilder, const FRasterContext& RasterContext, const FRDGTextureRef DepthBuffer, …);
// Emit a cubemap shadow.
void EmitCubemapShadow(FRDGBuilder& GraphBuilder, const FRasterContext& RasterContext, const FRDGTextureRef CubemapDepthBuffer, …);
// Emit depth targets.
void EmitDepthTargets(FRDGBuilder& GraphBuilder, const FScene& Scene, const FViewInfo& View, …);
// Draw the base pass.
void DrawBasePass(FRDGBuilder& GraphBuilder, const FSceneTextures& SceneTextures, const FDBufferTextures& DBufferTextures, const FScene& Scene, const FViewInfo& View, const FRasterResults& RasterResults
);
// Draw the Lumen mesh capture pass.
void DrawLumenMeshCapturePass(FRDGBuilder& GraphBuilder, const FScene& Scene, …);
(……)
}
// Whether Nanite should be rendered.
extern bool ShouldRenderNanite(const FScene* Scene, const FViewInfo& View, bool bCheckForAtomicSupport = true);
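FNaniteCommandInfo in the listing above turns a state bucket index into a depth value and then reinterprets that float's bits as the material ID. The sketch below reproduces that arithmetic outside the engine so it can be checked in isolation. The function names mirror the listing but live in free functions here, `MaterialIdToDepth` is a hypothetical helper, and `std::memcpy` replaces the `reinterpret_cast` as a portable bit cast.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int32_t kMaxStateBucketId = (1 << 14) - 1;  // 16383, as in the listing

// Maps a state bucket index to a depth value in (0, 1].
inline float GetDepthId(int32_t StateBucketId)
{
    return float(StateBucketId + 1) / float(kMaxStateBucketId);
}

// The material ID is the raw bit pattern of the depth value.
inline uint32_t GetMaterialId(int32_t StateBucketId)
{
    const float DepthId = GetDepthId(StateBucketId);
    uint32_t Bits = 0;
    std::memcpy(&Bits, &DepthId, sizeof(Bits));
    return Bits;
}

// Hypothetical inverse: recover the depth value from a material ID.
inline float MaterialIdToDepth(uint32_t MaterialId)
{
    float DepthId = 0.0f;
    std::memcpy(&DepthId, &MaterialId, sizeof(DepthId));
    return DepthId;
}
```

Because every depth value is a positive float, the integer ordering of the material IDs matches the ordering of the depth values. That property is what allows a material to be identified by depth, which fits the MaterialDepth parameter of FNaniteMaterialVS; the listing does not show the depth test itself.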
NaniteSceneProxy
// Engine\Source\Runtime\Engine\Public\NaniteSceneProxy.h
namespace Nanite
{
// Base class for Nanite scene proxies.
class FSceneProxyBase : public FPrimitiveSceneProxy
{
public:
struct FMaterialSection
{
UMaterialInterface* Material = nullptr;
int32 MaterialIndex = INDEX_NONE;
};
public:
ENGINE_API SIZE_T GetTypeHash() const override;
FSceneProxyBase(UPrimitiveComponent* Component)
: FPrimitiveSceneProxy(Component)
{
bIsNaniteMesh = true;
bAlwaysVisible = true;
}
// Checks whether a material can be rendered by Nanite: it must be opaque and must not be a decal, masked, or use normal or separate translucency.
static bool IsNaniteRenderable(FMaterialRelevance MaterialRelevance)
{
return MaterialRelevance.bOpaque &&
!MaterialRelevance.bDecal &&
!MaterialRelevance.bMasked &&
!MaterialRelevance.bNormalTranslucency &&
!MaterialRelevance.bSeparateTranslucency;
}
virtual bool CanBeOccluded() const override;
inline const TArray<FMaterialSection>& GetMaterialSections() const;
inline int32 GetMaterialMaxIndex() const;
virtual const TArray<FPrimitiveInstance>* GetPrimitiveInstances() const;
virtual TArray<FPrimitiveInstance>* GetPrimitiveInstances();
virtual uint8 GetCurrentFirstLODIdx_RenderThread() const override;
protected:
ENGINE_API void DrawStaticElementsInternal(FStaticPrimitiveDrawInterface* PDI, const FLightCacheInterface* LCI);
protected:
TArray
TArray
int32 MaterialMaxIndex = INDEX_NONE;
};
// Nanite scene proxy.
class FSceneProxy : public FSceneProxyBase
{
public:
FSceneProxy(UStaticMeshComponent* Component);
FSceneProxy(UInstancedStaticMeshComponent* Component);
FSceneProxy(UHierarchicalInstancedStaticMeshComponent* Component);
virtual ~FSceneProxy() = default;
public:
// FPrimitiveSceneProxy interface.
virtual FPrimitiveViewRelevance GetViewRelevance(const FSceneView* View) const override;
virtual void GetLightRelevance(const FLightSceneProxy* LightSceneProxy, bool& bDynamic, bool& bRelevant, bool& bLightMapped, bool& bShadowMapped) const override;
// Gather static or dynamic mesh elements.
virtual void DrawStaticElements(FStaticPrimitiveDrawInterface* PDI) override;
virtual void GetDynamicMeshElements(const TArray<const FSceneView*>& Views, const FSceneViewFamily& ViewFamily, uint32 VisibilityMap, FMeshElementCollector& Collector) const override;
// Ray tracing interface.
#if RHI_RAYTRACING
virtual bool IsRayTracingRelevant() const { return true; }
virtual bool IsRayTracingStaticRelevant() const { return false; }
virtual void GetDynamicRayTracingInstances(FRayTracingMaterialGatheringContext& Context, TArray
#endif
virtual uint32 GetMemoryFootprint() const override;
virtual void GetLCIs(FLCIArray& LCIs) override
{
FLightCacheInterface* LCI = &MeshInfo;
LCIs.Add(LCI);
}
// Distance field interface.
virtual void GetDistancefieldAtlasData(const FDistanceFieldVolumeData*& OutDistanceFieldData, float& SelfShadowBias) const override;
virtual void GetDistancefieldInstanceData(TArray<FMatrix>& ObjectLocalToWorldTransforms) const override;
virtual bool HasDistanceFieldRepresentation() const override;
// GI interface.
virtual const FCardRepresentationData* GetMeshCardRepresentation() const override;
virtual int32 GetLightMapCoordinateIndex() const override;
// Get the static mesh.
const UStaticMesh* GetStaticMesh() const
{
return StaticMesh;
}
protected:
virtual void CreateRenderThreadResources() override;
class FMeshInfo : public FLightCacheInterface
{
public:
FMeshInfo(const UStaticMeshComponent* InComponent);
// FLightCacheInterface.
virtual FLightInteraction GetInteraction(const FLightSceneProxy* LightSceneProxy) const override;
private:
TArray<FGuid> IrrelevantLights;
};
bool IsCollisionView(const FEngineShowFlags& EngineShowFlags, bool& bDrawSimpleCollision, bool& bDrawComplexCollision) const;
protected:
FMeshInfo MeshInfo;
FResources* Resources = nullptr;
const FStaticMeshRenderData* RenderData;
const FDistanceFieldVolumeData* DistanceFieldData;
const FCardRepresentationData* CardRepresentationData;
FMaterialRelevance MaterialRelevance;
uint32 bReverseCulling : 1;
uint32 bHasMaterialErrors : 1;
const UStaticMesh* StaticMesh = nullptr;
#if RHI_RAYTRACING
TArray
#endif
(......)
};
} // namespace Nanite
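The material eligibility rule in FSceneProxyBase::IsNaniteRenderable can be exercised on its own with a small mock. `MaterialRelevanceSketch` below is a hypothetical stand-in that carries only the five flags the listing tests; the real FMaterialRelevance has more fields.

```cpp
#include <cassert>

// Hypothetical stand-in for FMaterialRelevance with just the flags used by the check.
struct MaterialRelevanceSketch
{
    bool bOpaque = false;
    bool bDecal = false;
    bool bMasked = false;
    bool bNormalTranslucency = false;
    bool bSeparateTranslucency = false;
};

// Same predicate as in the listing: opaque, and none of the excluded material kinds.
inline bool IsNaniteRenderable(const MaterialRelevanceSketch& Relevance)
{
    return Relevance.bOpaque &&
           !Relevance.bDecal &&
           !Relevance.bMasked &&
           !Relevance.bNormalTranslucency &&
           !Relevance.bSeparateTranslucency;
}
```

A material relevance usually aggregates the flags of every material on a mesh, so a single masked or translucent section is enough to make the whole predicate fail in this sketch.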
RenderUtils
// Engine\Source\Runtime\RenderCore\Public\RenderUtils.h
(……)
// Checks whether the platform supports Nanite rendering.
RENDERCORE_API bool DoesPlatformSupportNanite(EShaderPlatform Platform)
{
// Make sure the current platform has DDPI (FGenericDataDrivenShaderPlatformInfo) definitions.
const bool bValidPlatform = FDataDrivenShaderPlatformInfo::IsValid(Platform);
// Nanite requires GPUScene.
const bool bSupportGPUScene = FDataDrivenShaderPlatformInfo::GetSupportsGPUScene(Platform);
// Nanite-specific check.
const bool bSupportNanite = FDataDrivenShaderPlatformInfo::GetSupportsNanite(Platform);
const bool bFullCheck = bValidPlatform && bSupportGPUScene && bSupportNanite;
return bFullCheck;
}
// Returns true if Nanite should be used.
inline bool UseNanite(EShaderPlatform ShaderPlatform, bool bCheckForAtomicSupport = true);
// Returns true if virtual shadow maps (VSM) should be used.
inline bool UseVirtualShadowMaps(EShaderPlatform ShaderPlatform, const FStaticFeatureLevel FeatureLevel);
// Returns true if non-Nanite VSM should be used. Requires r.Shadow.Virtual.NonNaniteVSM to be non-zero and UseVirtualShadowMaps to return true.
inline bool UseNonNaniteVirtualShadowMaps(EShaderPlatform ShaderPlatform, const FStaticFeatureLevel FeatureLevel);
Other
// Engine\Source\Runtime\Engine\Classes\Components\StaticMeshComponent.h
class ENGINE_API UStaticMeshComponent : public UMeshComponent
{
(……)
uint8 bDisplayNaniteProxyMesh:1; // For Nanite-enabled meshes, if true, only the proxy mesh is displayed.
(......)
};
// Engine\Source\Runtime\Engine\Public\PrimitiveSceneProxy.h
class FPrimitiveSceneProxy
{
inline bool IsNaniteMesh() const
{
return bIsNaniteMesh;
}
(......)
private:
uint8 bIsNaniteMesh : 1; // Whether this is a Nanite mesh.
(......)
};
// Returns true if the given mesh can be rendered via Nanite.
ENGINE_API extern bool SupportsNaniteRendering(const FVertexFactory* RESTRICT VertexFactory, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy);
ENGINE_API extern bool SupportsNaniteRendering(const FVertexFactory* RESTRICT VertexFactory, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy, const class FMaterialRenderProxy* MaterialRenderProxy, ERHIFeatureLevel::Type FeatureLevel);
// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h
struct FMeshPassProcessorRenderState
{
public:
void SetNaniteUniformBuffer(FRHIUniformBuffer* InNaniteUniformBuffer);
FRHIUniformBuffer* GetNaniteUniformBuffer() const;
(......)
private:
FRHIUniformBuffer* NaniteUniformBuffer = nullptr; // Nanite统一缓冲区.
(......)
};
// Engine\Source\Runtime\RenderCore\Public\VertexFactory.h
// Vertex factory flags.
enum class EVertexFactoryFlags : uint32
{
None = 0u,
UsedWithMaterials = 1u << 1,
SupportsStaticLighting = 1u << 2,
SupportsDynamicLighting = 1u << 3,
SupportsPrecisePrevWorldPos = 1u << 4,
SupportsPositionOnly = 1u << 5,
SupportsCachingMeshDrawCommands = 1u << 6,
SupportsPrimitiveIdStream = 1u << 7,
SupportsNaniteRendering = 1u << 8, // Whether Nanite rendering is supported.
};
// Whether Nanite rendering is supported.
bool SupportsNaniteRendering() const { return HasFlags(EVertexFactoryFlags::SupportsNaniteRendering); }
// Engine\Source\Runtime\RHI\Public\RHIDefinitions.h
// Generic data-driven shader platform info.
class RHI_API FGenericDataDrivenShaderPlatformInfo
{
static FORCEINLINE_DEBUGGABLE const bool GetSupportsNanite(const FStaticShaderPlatform Platform)
{
return Infos[Platform].bSupportsNanite;
}
(......)
};
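The listings above gate Nanite on two independent facts: a per-platform capability bit and a per-vertex-factory flag. A minimal sketch of the flag side follows. `EVertexFactoryFlagsSketch`, `HasFlags`, and `CanRenderWithNanite` are hypothetical names, only a few enumerators are reproduced, and the "any bit set" semantics of `HasFlags` is an assumption because the engine's implementation is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Reduced copy of the flag layout from the listing (bit positions preserved).
enum class EVertexFactoryFlagsSketch : uint32_t
{
    None                            = 0u,
    UsedWithMaterials               = 1u << 1,
    SupportsCachingMeshDrawCommands = 1u << 6,
    SupportsNaniteRendering         = 1u << 8,
};

constexpr EVertexFactoryFlagsSketch operator|(EVertexFactoryFlagsSketch A, EVertexFactoryFlagsSketch B)
{
    return static_cast<EVertexFactoryFlagsSketch>(static_cast<uint32_t>(A) | static_cast<uint32_t>(B));
}

// True if any of the requested bits is set (assumed semantics).
constexpr bool HasFlags(EVertexFactoryFlagsSketch Flags, EVertexFactoryFlagsSketch FlagsToCheck)
{
    return (static_cast<uint32_t>(Flags) & static_cast<uint32_t>(FlagsToCheck)) != 0u;
}

// Combines the platform capability with the vertex factory flag.
constexpr bool CanRenderWithNanite(bool bPlatformSupportsNanite, EVertexFactoryFlagsSketch Flags)
{
    return bPlatformSupportsNanite && HasFlags(Flags, EVertexFactoryFlagsSketch::SupportsNaniteRendering);
}
```

In the engine the platform bit would come from FDataDrivenShaderPlatformInfo::GetSupportsNanite; here it is passed in as a plain bool to keep the sketch standalone.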
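Nanite's main rendering steps also take place in FDeferredShadingSceneRenderer::Render. The Nanite-related steps, along with the important steps covered in earlier articles, are outlined below: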
void FDeferredShadingSceneRenderer::Render(FRDGBuilder& GraphBuilder)
{
// Determine whether Nanite rendering is enabled.
const bool bNaniteEnabled = UseNanite(ShaderPlatform) && ViewFamily.EngineShowFlags.NaniteMeshes;
// Update primitive scene infos.
Scene->UpdateAllPrimitiveSceneInfos(GraphBuilder, true);
// Use GPUScene.
FGPUSceneScopeBeginEndHelper GPUSceneScopeBeginEndHelper(Scene->GPUScene, GPUSceneDynamicContext, Scene);
bool bVisualizeNanite = false;
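if (bNaniteEnabled) // Only runs when Nanite is enabled
{
// Update Nanite global resources; needed to manage Nanite's scratch buffers.
Nanite::GGlobalResources.Update(GraphBuilder);
// Begin the asynchronous update of the Nanite streaming manager.
Nanite::GStreamingManager.BeginAsyncUpdate(GraphBuilder);
// Handle Nanite visualization modes.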
FNaniteVisualizationData& NaniteVisualization = GetNaniteVisualizationData();
if (Views.Num() > 0)
{
const FName& NaniteViewMode = Views[0].CurrentNaniteVisualizationMode;
if (NaniteVisualization.Update(NaniteViewMode))
{
ViewFamily.EngineShowFlags.SetVisualizeNanite(true);
}
bVisualizeNanite = NaniteVisualization.IsActive() && ViewFamily.EngineShowFlags.VisualizeNanite;
}
}
(......)
// Whether Nanite materials should be applied.
const bool bShouldApplyNaniteMaterials
= !ViewFamily.EngineShowFlags.ShaderComplexity
&& !ViewFamily.UseDebugViewPS()
&& !ViewFamily.EngineShowFlags.Wireframe
&& !ViewFamily.EngineShowFlags.LightMapDensity;
(......)
// Instance culling manager.
FInstanceCullingManager InstanceCullingManager(GInstanceCullingManagerResources, Scene->GPUScene.IsEnabled());
bDoInitViewAftersPrepass = InitViews(GraphBuilder, ..., InstanceCullingManager);
(......)
// Process GPUScene.
{
(......)
// Update GPUScene.
Scene->GPUScene.Update(GraphBuilder, *Scene);
(......)
// Upload dynamic primitive shader data to the GPU.
for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
{
FViewInfo& View = Views[ViewIndex];
Scene->GPUScene.UploadDynamicPrimitiveShaderDataForView(GraphBuilder, Scene, View);
}
// Instance culling.
{
InstanceCullingManager.CullInstances(GraphBuilder, Scene->GPUScene);
}
(......)
}
(......)
if (bNaniteEnabled)
{
Nanite::ListStatFilters(this);
// Must be called every frame before any Nanite rendering.
Nanite::GStreamingManager.EndAsyncUpdate(GraphBuilder);
}
(......)
// Depth prepass.
RenderPrePass(GraphBuilder, SceneTextures.Depth.Target, InstanceCullingManager);
(......)
// Nanite rasterization.
TArray<Nanite::FRasterResults, TInlineAllocator<2>> NaniteRasterResults;
if (bNaniteEnabled && Views.Num() > 0)
{
LLM_SCOPE_BYTAG(Nanite);
NaniteRasterResults.AddDefaulted(Views.Num());
RDG_GPU_STAT_SCOPE(GraphBuilder, NaniteRaster);
const FIntPoint RasterTextureSize = SceneTextures.Depth.Target->Desc.Extent;
const FViewInfo& PrimaryViewRef = Views[0];
const FIntRect PrimaryViewRect = PrimaryViewRef.ViewRect;
// Primary raster view.
{
Nanite::FRasterState RasterState;
Nanite::FRasterContext RasterContext = Nanite::InitRasterContext(GraphBuilder, FeatureLevel, RasterTextureSize);
const bool bTwoPassOcclusion = true;
const bool bUpdateStreaming = true;
const bool bSupportsMultiplePasses = false;
const bool bForceHWRaster = RasterContext.RasterScheduling == Nanite::ERasterScheduling::HardwareOnly;
const bool bPrimaryContext = true;
const bool bDiscardNonMoving = ViewFamily.EngineShowFlags.DrawOnlyVSMInvalidatingGeo != 0;
// Iterate over all views.
for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
{
const FViewInfo& View = Views[ViewIndex];
// Initialize the culling context.
Nanite::FCullingContext CullingContext = Nanite::InitCullingContext(
GraphBuilder,
*Scene,
!bIsEarlyDepthComplete ? View.PrevViewInfo.NaniteHZB : View.PrevViewInfo.HZB,
View.ViewRect,
bTwoPassOcclusion,
bUpdateStreaming,
bSupportsMultiplePasses,
bForceHWRaster,
bPrimaryContext,
bDiscardNonMoving
);
static FString EmptyFilterName = TEXT(""); // Empty filter represents primary view.
const bool bExtractStats = Nanite::IsStatFilterActive(EmptyFilterName);
Nanite::FPackedView PackedView = Nanite::CreatePackedViewFromViewInfo(View, RasterTextureSize, VIEW_FLAG_HZBTEST, /*StreamingPriorityCategory*/ 3);
// Cull and rasterize.
Nanite::CullRasterize(
GraphBuilder,
*Scene,
{ PackedView },
CullingContext,
RasterContext,
RasterState,
/*OptionalInstanceDraws*/ nullptr,
bExtractStats
);
Nanite::FRasterResults& RasterResults = NaniteRasterResults[ViewIndex];
// If a depth pre-pass is needed, emit the depth targets now.
if (bNeedsPrePass)
{
Nanite::EmitDepthTargets(
GraphBuilder,
*Scene,
Views[ViewIndex],
CullingContext.SOAStrides,
CullingContext.VisibleClustersSWHW,
CullingContext.ViewsBuffer,
SceneTextures.Depth.Target,
RasterContext.VisBuffer64,
RasterResults.MaterialDepth,
RasterResults.NaniteMask,
RasterResults.VelocityBuffer,
bNeedsPrePass
);
}
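// Build the hierarchical Z-buffer (HZB).
if (!bIsEarlyDepthComplete && bTwoPassOcclusion && View.ViewState)
{
// A complete scene depth is not available for the post pass, so the complete HZB cannot be used for the main pass; otherwise it would break the occlusion culling done against the post-pass HZB.
RDG_EVENT_SCOPE(GraphBuilder, "Nanite::BuildHZB");
FRDGTextureRef SceneDepth = SystemTextures.Black;
FRDGTextureRef GraphHZB = nullptr;
// Build the furthest-depth HZB.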
BuildHZBFurthest(
GraphBuilder,
SceneDepth,
RasterContext.VisBuffer64,
PrimaryViewRect,
FeatureLevel,
ShaderPlatform,
TEXT("Nanite.HZB"),
/* OutFurthestHZBTexture = */ &GraphHZB );
GraphBuilder.QueueTextureExtraction( GraphHZB, &View.ViewState->PrevFrameViewInfo.NaniteHZB );
}
Nanite::ExtractResults(GraphBuilder, CullingContext, RasterContext, RasterResults);
}
}
}
(......)
// Render the base pass, including the Nanite base pass.
{
RenderBasePass(GraphBuilder, SceneTextures, DBufferTextures, BasePassDepthStencilAccess, ForwardScreenSpaceShadowMaskTexture, InstanceCullingManager);
AddServiceLocalQueuePass(GraphBuilder);
if (bNaniteEnabled && bShouldApplyNaniteMaterials)
{
for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ++ViewIndex)
{
const FViewInfo& View = Views[ViewIndex];
Nanite::FRasterResults& RasterResults = NaniteRasterResults[ViewIndex];
// If depth was not emitted in the pre-pass, emit it now.
if (!bNeedsPrePass)
{
Nanite::EmitDepthTargets(
GraphBuilder,
*Scene,
Views[ViewIndex],
RasterResults.SOAStrides,
RasterResults.VisibleClustersSWHW,
RasterResults.ViewsBuffer,
SceneTextures.Depth.Target,
RasterResults.VisBuffer64,
RasterResults.MaterialDepth,
RasterResults.NaniteMask,
RasterResults.VelocityBuffer,
bNeedsPrePass
);
}
// Draw the Nanite base pass.
Nanite::DrawBasePass(
GraphBuilder,
SceneTextures,
DBufferTextures,
*Scene,
View,
RasterResults
);
}
}
if (!bAllowReadOnlyDepthBasePass)
{
AddResolveSceneDepthPass(GraphBuilder, Views, SceneTextures.Depth);
}
(......)
}
(......)
if (bNaniteEnabled)
{
// Compute volumetric fog.
if (!bOcclusionBeforeBasePass)
{
ComputeVolumetricFog(GraphBuilder);
}
// Submit this frame's streaming requests.
Nanite::GStreamingManager.SubmitFrameStreamingRequests(GraphBuilder);
}
(......)
// Render deferred lights.
RenderLights(GraphBuilder, SceneTextures, ...);
(......)
// Render translucency.
RenderTranslucency(GraphBuilder, SceneTextures, ...);
(......)
// Post-processing.
AddPostProcessingPasses(GraphBuilder, View, PostProcessingInputs, NaniteResults, InstanceCullingManager);
(......)
}
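As the code shows, the Nanite rendering flow is broadly similar to the regular path: both first update primitive data, the GPUScene, and culling data, then render the base pass and lighting, and finish with translucency and post-processing. There are differences, however: the Nanite path adds stages such as GStreamingManager, FInstanceCullingManager, HZB construction, and Nanite rasterization. The following RenderDoc capture of the AncientGame sample project shows the main UE5-specific steps: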
The UE5 rendering process captured with RenderDoc; the red boxes mark the UE5-specific steps.
Nanite's instance culling is handled by FInstanceCullingManager, which is used throughout FDeferredShadingSceneRenderer::Render. Its definition and the declarations of the related types are as follows:
// Engine\Source\Runtime\Engine\Public\SceneManagement.h
// Resources for instance culling management, used by FInstanceCullingManager.
class FInstanceCullingManagerResources : public FRenderResource
{
public:
// Maximum number of indirect-draw instances: 1024 * 1024 = 1,048,576.
static constexpr uint32 MaxIndirectInstances = 1024 * 1024;
// Initialize and release the RHI resources.
virtual void InitRHI() override;
virtual void ReleaseRHI() override;
// Accessors.
FRHIBuffer* GetInstancesIdBuffer() const { return InstanceIdsBuffer.Buffer; }
FRHIShaderResourceView* GetInstancesIdBufferSrv() const { return InstanceIdsBuffer.SRV.GetReference(); }
FRHIShaderResourceView* GetPageInfoBufferSrv() const { return PageInfoBuffer.SRV.GetReference(); }
FUnorderedAccessViewRHIRef GetInstancesIdBufferUav() const { return InstanceIdsBuffer.UAV; }
FUnorderedAccessViewRHIRef GetPageInfoBufferUav() const { return PageInfoBuffer.UAV; }
private:
FRWBuffer PageInfoBuffer; // Page info buffer.
FRWBuffer InstanceIdsBuffer; // Instance ID buffer.
};
// Global FInstanceCullingManagerResources object.
extern ENGINE_API TGlobalResource<FInstanceCullingManagerResources> GInstanceCullingManagerResources;
// Engine\Source\Runtime\Renderer\Private\InstanceCulling\InstanceCullingManager.h
// Intermediate data for instance culling.
class FInstanceCullingIntermediate
{
public:
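// One visibility bit per instance for each registered view; written by CullInstances.
FRDGBufferRef VisibleInstanceFlags = nullptr;
// Write offset used by all instance ID expansions to allocate space in the global instance ID buffer. Initialized to 0 by CullInstances.
FRDGBufferRef InstanceIdOutOffsetBuffer = nullptr;
// Number of instances.
int32 NumInstances = 0;
// Number of views.
int32 NumViews = 0;
};
// Instance culling result.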
struct FInstanceCullingResult
{
// Indirect draw arguments buffer.
FRDGBufferRef DrawIndirectArgsBuffer = nullptr;
// Instance ID offset buffer.
FRDGBufferRef InstanceIdOffsetBuffer = nullptr;
// Copy the draw parameters into FInstanceCullingDrawParams.
void GetDrawParameters(FInstanceCullingDrawParams &OutParams) const
{
OutParams.DrawIndirectArgsBuffer = DrawIndirectArgsBuffer;
OutParams.InstanceIdOffsetBuffer = InstanceIdOffsetBuffer;
}
// Get the draw parameters, tolerating a null culling result.
static void CondGetDrawParameters(const FInstanceCullingResult* InstanceCullingResult, FInstanceCullingDrawParams& OutParams)
{
if (InstanceCullingResult)
{
InstanceCullingResult->GetDrawParameters(OutParams);
}
else
{
OutParams.DrawIndirectArgsBuffer = nullptr;
OutParams.InstanceIdOffsetBuffer = nullptr;
}
}
};
// Manages the allocation of indirect arguments and culling jobs for all instanced draws, using the GPUScene for culling.
class FInstanceCullingManager
{
public:
FInstanceCullingManager(FInstanceCullingManagerResources& InResources, bool bInIsEnabled);
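// Maximum average number of instances that primitives expand to.
static constexpr uint32 MaxAverageInstanceFactor = 128;
bool IsEnabled() const { return bIsEnabled; }
// Register a view to be culled; returns the view ID.
int32 RegisterView(const Nanite::FPackedViewParams& Params);
int32 RegisterView(const FViewInfo& ViewInfo);
// Cull instances. Must run after the views are initialized and registered, after the GPUScene is updated, and before render commands are submitted.
void CullInstances(FRDGBuilder& GraphBuilder, FGPUScene& GPUScene);
// Filled by CullInstances and consumed when the final culling and rendering are performed.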
FInstanceCullingIntermediate CullingIntermediate;
private:
FInstanceCullingManagerResources& Resources;
TArray<Nanite::FPackedView> CullingViews;
bool bIsEnabled;
(....)
};
Next, let's analyze the code of FInstanceCullingManager::CullInstances:
// Engine\Source\Runtime\Renderer\Private\InstanceCulling\InstanceCullingManager.cpp
void FInstanceCullingManager::CullInstances(FRDGBuilder& GraphBuilder, FGPUScene& GPUScene)
{
#if GPUCULL_TODO
// Get the number of views and instances.
int32 NumViews = CullingViews.Num();
int32 NumInstances = GPUScene.InstanceDataAllocator.GetMaxSize();
RDG_EVENT_SCOPE(GraphBuilder, "CullInstances [%d Views X %d Instances]", NumViews, NumInstances);
(......)
TArray<uint32> NullArray;
NullArray.AddZeroed(1);
// Initialize the intermediate culling data (CullingIntermediate).
CullingIntermediate.InstanceIdOutOffsetBuffer = CreateStructuredBuffer(GraphBuilder, TEXT("InstanceCulling.OutputOffsetBufferOut"), NullArray);
int32 NumInstanceFlagWords = FMath::DivideAndRoundUp(NumInstances, int32(sizeof(uint32) * 8));
CullingIntermediate.NumInstances = NumInstances;
CullingIntermediate.NumViews = NumViews;
if (NumInstances && NumViews) // GPU culling is needed only when there is at least one instance and one view.
{
// Create a buffer that holds one bit per instance for each view.
CullingIntermediate.VisibleInstanceFlags = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateStructuredDesc(sizeof(uint32), NumInstanceFlagWords * NumViews), TEXT("InstanceCulling.VisibleInstanceFlags"));
FRDGBufferUAVRef VisibleInstanceFlagsUAV = GraphBuilder.CreateUAV(CullingIntermediate.VisibleInstanceFlags);
if (CVarCullInstances.GetValueOnRenderThread() != 0)
{
// Clear the UAV.
AddClearUAVPass(GraphBuilder, VisibleInstanceFlagsUAV, 0);
// Set up the parameters of the instance-culling compute shader.
FCullInstancesCs::FParameters* PassParameters = GraphBuilder.AllocParameters<FCullInstancesCs::FParameters>();
// Fetch the instance and primitive data from the GPUScene.
PassParameters->GPUSceneInstanceSceneData = GPUScene.InstanceDataBuffer.SRV;
PassParameters->GPUScenePrimitiveSceneData = GPUScene.PrimitiveBuffer.SRV;
PassParameters->InstanceDataSOAStride = GPUScene.InstanceDataSOAStride;
PassParameters->NumInstances = NumInstances;
PassParameters->NumInstanceFlagWords = NumInstanceFlagWords;
// On the GPU side, a view has type Nanite::FPackedView.
// On the GPU side, InViews has type StructuredBuffer< Nanite::FPackedView >.
PassParameters->InViews = GraphBuilder.CreateSRV(CreateStructuredBuffer(GraphBuilder, TEXT("InstanceCulling.CullingViews"), CullingViews));
PassParameters->NumViews = NumViews;
// Buffer that stores the visibility results.
PassParameters->InstanceVisibilityFlagsOut = VisibleInstanceFlagsUAV;
// The compute shader is FCullInstancesCs, analyzed later.
auto ComputeShader = GetGlobalShaderMap(GMaxRHIFeatureLevel)->GetShader<FCullInstancesCs>();
// Add the culling compute pass.
FComputeShaderUtils::AddPass(
GraphBuilder,
RDG_EVENT_NAME("CullInstancesCs"),
ComputeShader,
PassParameters,
FComputeShaderUtils::GetGroupCount(NumInstances, FCullInstancesCs::NumThreadsPerGroup)
);
}
else // Instance culling is disabled through CVarCullInstances.
{
// Mark every instance as visible.
AddClearUAVPass(GraphBuilder, VisibleInstanceFlagsUAV, 0xFFFFFFFF);
}
}
#endif // GPUCULL_TODO
}
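The buffer sizing in CullInstances can be checked by hand. The following is a minimal sketch in plain C++, not engine code: `DivideAndRoundUp` is a local stand-in that is assumed to round like `FMath::DivideAndRoundUp`, and `NumFlagWords`, `TotalFlagWords` and `GroupCount` are hypothetical helper names introduced only for illustration (the last one is a 1D simplification of what `FComputeShaderUtils::GetGroupCount` is used for above).

```cpp
#include <cstdint>

// Local stand-in for FMath::DivideAndRoundUp (assumed to round the same way).
constexpr int32_t DivideAndRoundUp(int32_t Dividend, int32_t Divisor)
{
    return (Dividend + Divisor - 1) / Divisor;
}

// One visibility bit per instance, packed into 32-bit words.
constexpr int32_t NumFlagWords(int32_t NumInstances)
{
    return DivideAndRoundUp(NumInstances, int32_t(sizeof(uint32_t) * 8));
}

// Size of VisibleInstanceFlags in words: one bit array per registered view.
constexpr int32_t TotalFlagWords(int32_t NumInstances, int32_t NumViews)
{
    return NumFlagWords(NumInstances) * NumViews;
}

// 1D thread-group count for a dispatch with 64 threads per group,
// matching FCullInstancesCs::NumThreadsPerGroup in the listing.
constexpr int32_t GroupCount(int32_t NumInstances, int32_t ThreadsPerGroup = 64)
{
    return DivideAndRoundUp(NumInstances, ThreadsPerGroup);
}

// 100 instances need 4 words per view and 2 thread groups.
static_assert(NumFlagWords(100) == 4, "100 bits fit in 4 words");
static_assert(GroupCount(100) == 2, "100 threads need 2 groups of 64");
```

With 100 instances and 2 views, for example, the flag buffer holds 8 words, and the last thread group contains idle threads, which is why the shader has to guard against an out-of-range InstanceId.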
The logic above builds the parameters of the culling shader FCullInstancesCs and calls FComputeShaderUtils::AddPass to perform the culling. Next, let's look at the code of FCullInstancesCs:
// Engine\Source\Runtime\Renderer\Private\InstanceCulling\InstanceCullingManager.cpp
class FCullInstancesCs : public FGlobalShader
{
DECLARE_GLOBAL_SHADER(FCullInstancesCs);
SHADER_USE_PARAMETER_STRUCT(FCullInstancesCs, FGlobalShader)
public:
static constexpr int32 NumThreadsPerGroup = 64;
static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters)
{
return UseGPUScene(Parameters.Platform);
}
static void ModifyCompilationEnvironment(const FGlobalShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment)
{
FGlobalShader::ModifyCompilationEnvironment(Parameters, OutEnvironment);
OutEnvironment.SetDefine(TEXT("INDIRECT_ARGS_NUM_WORDS"), FInstanceCullingContext::IndirectArgsNumWords);
OutEnvironment.SetDefine(TEXT("VF_SUPPORTS_PRIMITIVE_SCENE_DATA"), 1);
OutEnvironment.SetDefine(TEXT("USE_GLOBAL_GPU_SCENE_DATA"), 1);
OutEnvironment.SetDefine(TEXT("NUM_THREADS_PER_GROUP"), NumThreadsPerGroup);
OutEnvironment.SetDefine(TEXT("NANITE_MULTI_VIEW"), 1);
}
// Declare the parameters used by the shader.
BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
SHADER_PARAMETER_SRV(StructuredBuffer<float4>, GPUSceneInstanceSceneData)
SHADER_PARAMETER_SRV(StructuredBuffer<float4>, GPUScenePrimitiveSceneData)
SHADER_PARAMETER(uint32, InstanceDataSOAStride)
SHADER_PARAMETER_RDG_BUFFER_SRV(StructuredBuffer< Nanite::FPackedView >, InViews)
// Buffer that stores the visibility results.
SHADER_PARAMETER_RDG_BUFFER_UAV(RWStructuredBuffer<uint>, InstanceVisibilityFlagsOut)
SHADER_PARAMETER(int32, NumInstances)
SHADER_PARAMETER(int32, NumInstanceFlagWords)
SHADER_PARAMETER(int32, NumViews)
END_SHADER_PARAMETER_STRUCT()
};
// Implement the shader.
IMPLEMENT_GLOBAL_SHADER(FCullInstancesCs, "/Engine/Private/InstanceCulling/CullInstances.usf", "CullInstancesCs", SF_Compute);
The implementation macro on the last line shows that the shader source used by FCullInstancesCs is CullInstances.usf, analyzed below:
// Engine\Shaders\Private\InstanceCulling\CullInstances.usf
#include "../Common.ush"
#include "../SceneData.ush"
#include "../Nanite/NaniteDataDecode.ush"
#include "../Nanite/HZBCull.ush"
RWStructuredBuffer<uint> InstanceVisibilityFlagsOut;
uint NumInstances;
uint NumInstanceFlagWords;
uint NumViews;
uint InstanceDataSOAStride;
// Entry point of instance culling.
[numthreads(NUM_THREADS_PER_GROUP, 1, 1)]
void CullInstancesCs(uint InstanceId : SV_DispatchThreadID)
{
// Guard against an out-of-range InstanceId.
if (InstanceId >= NumInstances)
{
return;
}
const bool bNearClip = true;
// Load the instance data, then compute the bit mask and the word offset of this instance.
FInstanceSceneData InstanceData = GetInstanceData(InstanceId, InstanceDataSOAStride);
uint WordMask = 1U << (InstanceId % 32U);
uint InstanceWordOffset = InstanceId / 32U;
// Valid if PrimitiveId is not the sentinel 0xFFFFFFFF and the local bounds extent is non-zero.
bool bIsValid = InstanceData.PrimitiveId != 0xFFFFFFFFu && dot(InstanceData.LocalBoundsExtent, InstanceData.LocalBoundsExtent) > 0.0f;
// For every view, test the instance's bounding box against the view frustum.
for (uint ViewId = 0; ViewId < NumViews; ++ViewId)
{
uint Flag = WordMask;
if (bIsValid)
{
FNaniteView NaniteView = GetNaniteView(ViewId);
// Compute the local-to-clip transform.
float4x4 LocalToTranslatedWorld = InstanceData.LocalToWorld;
LocalToTranslatedWorld[3].xyz += NaniteView.PreViewTranslation.xyz;
float4x4 LocalToClip = mul(LocalToTranslatedWorld, NaniteView.TranslatedWorldToClip);
// Box versus frustum intersection test.
FFrustumCullData Cull = BoxCullFrustum(InstanceData.LocalBoundsCenter, InstanceData.LocalBoundsExtent, LocalToClip, bNearClip, false);
if (!Cull.bIsVisible)
{
Flag = 0U;
}
}
// If the instance is visible, set its bit in InstanceVisibilityFlagsOut.
if (Flag != 0U)
{
uint WordOffset = NumInstanceFlagWords * ViewId + InstanceWordOffset;
// Up to 32 threads share one word, so an atomic InterlockedXXX operation is required to avoid a race condition.
InterlockedOr(InstanceVisibilityFlagsOut[WordOffset], Flag);
}
}
}
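The bit packing at the end of CullInstancesCs can be mirrored on the CPU. The sketch below is an illustration under stated assumptions, not engine code: `FVisibilityFlags` and `MarkVisible` are hypothetical names, `std::atomic::fetch_or` plays the role of HLSL's `InterlockedOr`, and the frustum test itself is left out.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-side stand-in for RWStructuredBuffer<uint> InstanceVisibilityFlagsOut (hypothetical type).
struct FVisibilityFlags
{
    uint32_t NumInstanceFlagWords;             // words per view
    std::vector<std::atomic<uint32_t>> Words;  // NumInstanceFlagWords * NumViews words

    FVisibilityFlags(uint32_t NumInstances, uint32_t NumViews)
        : NumInstanceFlagWords((NumInstances + 31u) / 32u)
        , Words(static_cast<std::size_t>(NumInstanceFlagWords) * NumViews)
    {
        for (std::atomic<uint32_t>& Word : Words)
        {
            Word.store(0u); // same effect as clearing the UAV to 0
        }
    }

    // Mirrors the tail of CullInstancesCs: set the bit of InstanceId in the bit array of ViewId.
    void MarkVisible(uint32_t ViewId, uint32_t InstanceId)
    {
        const uint32_t WordMask = 1u << (InstanceId % 32u);
        const uint32_t WordOffset = NumInstanceFlagWords * ViewId + InstanceId / 32u;
        Words[WordOffset].fetch_or(WordMask); // counterpart of InterlockedOr
    }
};
```

Because 32 consecutive instances share one word, a plain read-modify-write from several threads could lose bits; the atomic OR is what makes concurrent writers safe.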
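With the visibility data in VisibleInstanceFlags, later passes can use it to generate draw commands and draw arguments dynamically, which yields a GPU-culled, GPU-driven rendering pipeline.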
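To make the consumer side concrete, here is a minimal CPU sketch of how a later pass could read the packed flags and gather the visible instance IDs of one view. It assumes the layout shown in the shader (NumInstanceFlagWords words per view, one bit per instance); `IsInstanceVisible` and `CollectVisibleInstances` are hypothetical helper names, and the engine performs the equivalent work on the GPU rather than in a loop like this.

```cpp
#include <cstdint>
#include <vector>

// Test the visibility bit of one instance in the bit array of one view.
bool IsInstanceVisible(const std::vector<uint32_t>& Flags, uint32_t NumInstanceFlagWords,
                       uint32_t ViewId, uint32_t InstanceId)
{
    const uint32_t Word = Flags[NumInstanceFlagWords * ViewId + InstanceId / 32u];
    return ((Word >> (InstanceId % 32u)) & 1u) != 0u;
}

// Gather the IDs of all instances that are flagged visible for the given view.
std::vector<uint32_t> CollectVisibleInstances(const std::vector<uint32_t>& Flags,
                                              uint32_t NumInstanceFlagWords,
                                              uint32_t NumInstances, uint32_t ViewId)
{
    std::vector<uint32_t> VisibleIds;
    for (uint32_t InstanceId = 0; InstanceId < NumInstances; ++InstanceId)
    {
        if (IsInstanceVisible(Flags, NumInstanceFlagWords, ViewId, InstanceId))
        {
            VisibleIds.push_back(InstanceId);
        }
    }
    return VisibleIds;
}
```

The compacted ID list corresponds conceptually to what the instance ID buffer and the indirect draw arguments encode, so that only visible instances are drawn.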
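Nanite rasterization mainly consists of building and initializing an FCullingContext instance for each view, then calling CullRasterize, storing the rasterization results, and building the HZB. The key code is as follows: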
for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
{
const FViewInfo& View = Views[ViewIndex];
// Initialize the culling context.
Nanite::FCullingContext CullingContext = Nanite::InitCullingContext(
GraphBuilder, *Scene,
!bIsEarlyDepthComplete ? View.PrevViewInfo.NaniteHZB : View.PrevViewInfo.HZB,
View.ViewRect,
bTwoPassOcclusion, bUpdateStreaming, bSupportsMultiplePasses, bForceHWRaster, bPrimaryContext, bDiscardNonMoving);
static FString EmptyFilterName = TEXT("");
const bool bExtractStats = Nanite::IsStatFilterActive(EmptyFilterName);
Nanite::FPackedView PackedView = Nanite::CreatePackedViewFromViewInfo(View, RasterTextureSize, VIEW_FLAG_HZBTEST, 3);
// Cull and rasterize.
Nanite::CullRasterize(GraphBuilder, *Scene, { PackedView }, CullingContext, RasterContext, RasterState, nullptr, bExtractStats);
Nanite::FRasterResults& RasterResults = NaniteRasterResults[ViewIndex];
// If a depth pre-pass is needed, emit the depth targets now.
if (bNeedsPrePass)
{
Nanite::EmitDepthTargets(GraphBuilder, *Scene, Views[ViewIndex], CullingContext.SOAStrides, CullingContext.VisibleClustersSWHW, CullingContext.ViewsBuffer, SceneTextures.Depth.Target, RasterContext.VisBuffer64,RasterResults.MaterialDepth,RasterResults.NaniteMask,RasterResults.VelocityBuffer,bNeedsPrePass);
}
// Build the HZB.
if (!bIsEarlyDepthComplete && bTwoPassOcclusion && View.ViewState)
{
RDG_EVENT_SCOPE(GraphBuilder, "Nanite::BuildHZB");
FRDGTextureRef SceneDepth = SystemTextures.Black;
FRDGTextureRef GraphHZB = nullptr;
BuildHZBFurthest(GraphBuilder,SceneDepth, RasterContext.VisBuffer64, PrimaryViewRect, FeatureLevel, ShaderPlatform, TEXT("Nanite.HZB"), &GraphHZB );
GraphBuilder.QueueTextureExtraction( GraphHZB, &View.ViewState->PrevFrameViewInfo.NaniteHZB );
}
// Extract the rasterization and culling results.
Nanite::ExtractResults(GraphBuilder, CullingContext, RasterContext, RasterResults);
}
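The idea behind a furthest-depth HZB can be sketched on the CPU. The code below is a simplified illustration, not the implementation of BuildHZBFurthest: it assumes a reversed-Z depth buffer in which smaller values are farther away, power-of-two dimensions, and the hypothetical names `FDepthMip` and `BuildFurthestMipChain`. Each mip texel stores the furthest depth of the 2x2 block below it, so a coarse texel gives a conservative bound for everything it covers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One level of the depth pyramid (hypothetical type).
struct FDepthMip
{
    int Width = 0;
    int Height = 0;
    std::vector<float> Texels; // row-major, Width * Height values
};

// Build the mip chain down to 1x1, keeping the furthest depth of each 2x2 block.
// Assumption: reversed-Z, so "furthest" is the minimum value.
std::vector<FDepthMip> BuildFurthestMipChain(const FDepthMip& Depth)
{
    std::vector<FDepthMip> Chain;
    Chain.push_back(Depth);
    while (Chain.back().Width > 1 || Chain.back().Height > 1)
    {
        const FDepthMip Src = Chain.back(); // copy, because push_back below may reallocate
        FDepthMip Dst;
        Dst.Width = std::max(1, Src.Width / 2);
        Dst.Height = std::max(1, Src.Height / 2);
        Dst.Texels.resize(static_cast<std::size_t>(Dst.Width) * Dst.Height);
        for (int Y = 0; Y < Dst.Height; ++Y)
        {
            for (int X = 0; X < Dst.Width; ++X)
            {
                float Furthest = Src.Texels[(2 * Y) * Src.Width + (2 * X)];
                for (int DY = 0; DY < 2; ++DY)
                {
                    for (int DX = 0; DX < 2; ++DX)
                    {
                        // Clamp so that 1-texel-wide or 1-texel-high levels stay in range.
                        const int SX = std::min(2 * X + DX, Src.Width - 1);
                        const int SY = std::min(2 * Y + DY, Src.Height - 1);
                        Furthest = std::min(Furthest, Src.Texels[SY * Src.Width + SX]);
                    }
                }
                Dst.Texels[Y * Dst.Width + X] = Furthest;
            }
        }
        Chain.push_back(Dst);
    }
    return Chain;
}
```

An occlusion test then picks the mip level whose texels roughly match the screen-space size of a bounding box and compares the nearest depth of the box with the stored furthest depth; if the box is behind it, the box cannot be visible.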
Let's take a closer look at the code of Nanite::CullRasterize:
// Engine\Source\Runtime\Renderer\Private\Nanite\NaniteRender.cpp
void CullRasterize(
FRDGBuilder& GraphBuilder,
const FScene& Scene,
const TArray<FPackedView, SceneRenderingAllocator>& Views,
uint32 NumPrimaryViews, // Number of non-mip views
FCullingContext& CullingContext,
const FRasterContext& RasterContext,
const FRasterState& RasterState,
const TArray<FInstanceDraw, SceneRenderingAllocator>* OptionalInstanceDraws,
// VirtualShadowMapArray is the supplier of virtual to physical translation, probably could abstract this a bit better,
FVirtualShadowMapArray* VirtualShadowMapArray,
bool bExtractStats
)
{
// If there are too many views, split the rasterization into multiple passes. This can only happen for depth-only rendering.
if (Views.Num() > MAX_VIEWS_PER_CULL_RASTERIZE_PASS)
{
CullRasterizeMultiPass(GraphBuilder, Scene, Views, NumPrimaryViews, CullingContext, RasterContext, RasterState, OptionalInstanceDraws, VirtualShadowMapArray, bExtractStats);
return;
}
RDG_EVENT_SCOPE(GraphBuilder, "Nanite::CullRasterize");
(......)
// Create the structured buffer of views.
{
const uint32 ViewsBufferElements = FMath::RoundUpToPowerOfTwo(Views.Num());
CullingContext.ViewsBuffer = CreateStructuredBuffer(GraphBuilder, TEXT("Nanite.Views"), Views.GetTypeSize(), ViewsBufferElements, Views.GetData(), Views.Num() * Views.GetTypeSize());
}
// Set up the instance draws buffer of the culling context.
if (OptionalInstanceDraws)
{
const uint32 InstanceDrawsBufferElements = FMath::RoundUpToPowerOfTwo(OptionalInstanceDraws->Num());
CullingContext.InstanceDrawsBuffer = CreateStructuredBuffer
(
GraphBuilder,
TEXT("Nanite.InstanceDraws"),
OptionalInstanceDraws->GetTypeSize(),
InstanceDrawsBufferElements,
OptionalInstanceDraws->GetData(),
OptionalInstanceDraws->Num() * OptionalInstanceDraws->GetTypeSize()
);
CullingContext.NumInstancesPreCull = OptionalInstanceDraws->Num();
}
else
{
CullingContext.InstanceDrawsBuffer = nullptr;
CullingContext.NumInstancesPreCull = Scene.GPUScene.InstanceDataAllocator.GetMaxSize();
}
(......)
// Culling parameters.
FCullingParameters CullingParameters;
{
CullingParameters.InViews = GraphBuilder.CreateSRV(CullingContext.ViewsBuffer);
CullingParameters.NumViews = Views.Num();
CullingParameters.NumPrimaryViews = NumPrimaryViews;
CullingParameters.DisocclusionLodScaleFactor = GNaniteDisocclusionHack ? 0.01f : 1.0f; // TODO: Get rid of this hack
CullingParameters.HZBTexture = RegisterExternalTextureWithFallback(GraphBuilder, CullingContext.PrevHZB, GSystemTextures.BlackDummy);
CullingParameters.HZBSize = CullingContext.PrevHZB ? CullingContext.PrevHZB->GetDesc().Extent : FVector2D(0.0f);
CullingParameters.HZBSampler = TStaticSamplerState< SF_Point, AM_Clamp, AM_Clamp, AM_Clamp >::GetRHI();
CullingParameters.SOAStrides = CullingContext.SOAStrides;
CullingParameters.MaxCandidateClusters = Nanite::FGlobalResources::GetMaxCandidateClusters();
CullingParameters.MaxVisibleClusters = Nanite::FGlobalResources::GetMaxVisibleClusters();
CullingParameters.RenderFlags = CullingContext.RenderFlags;
CullingParameters.DebugFlags = CullingContext.DebugFlags;
CullingParameters.CompactedViewInfo = nullptr;
CullingParameters.CompactedViewsAllocation = nullptr;
}
FVirtualTargetParameters VirtualTargetParameters;
// Set up the virtual shadow map (VSM) array.
if (VirtualShadowMapArray)
{
VirtualTargetParameters.VirtualShadowMap = VirtualShadowMapArray->GetUniformBuffer(GraphBuilder);
VirtualTargetParameters.PageFlags = GraphBuilder.CreateSRV(VirtualShadowMapArray->PageFlagsRDG, PF_R32_UINT);
VirtualTargetParameters.HPageFlags = GraphBuilder.CreateSRV(VirtualShadowMapArray->HPageFlagsRDG, PF_R32_UINT);
VirtualTargetParameters.PageRectBounds = GraphBuilder.CreateSRV(VirtualShadowMapArray->PageRectBoundsRDG);
// If an HZB from the previous frame is provided, the previous frame's page table is needed as well.
FRDGBufferRef HZBPageTableRDG = VirtualShadowMapArray->PageTableRDG;
if (CullingContext.PrevHZB)
{
check( VirtualShadowMapArray->CacheManager );
TRefCountPtr<FRDGPooledBuffer> HZBPageTable = VirtualShadowMapArray->CacheManager->PrevBuffers.PageTable;
check( HZBPageTable );
HZBPageTableRDG = GraphBuilder.RegisterExternalBuffer( HZBPageTable, TEXT( "Shadow.Virtual.HZBPageTable" ) );
}
VirtualTargetParameters.ShadowHZBPageTable = GraphBuilder.CreateSRV( HZBPageTableRDG, PF_R32_UINT );
}
// Set up the GPUScene data.
FGPUSceneParameters GPUSceneParameters;
GPUSceneParameters.GPUSceneInstanceSceneData = Scene.GPUScene.InstanceDataBuffer.SRV;
GPUSceneParameters.GPUScenePrimitiveSceneData = Scene.GPUScene.PrimitiveBuffer.SRV;
GPUSceneParameters.GPUSceneFrameNumber = Scene.GPUScene.GetSceneFrameNumber();
// Compact the VSM views.
if (VirtualShadowMapArray && CVarCompactVSMViews.GetValueOnRenderThread() != 0)
{
RDG_GPU_STAT_SCOPE(GraphBuilder, NaniteInstanceCullVSM);
// Compact the views to remove needless (empty) mip views. This must be done on the GPU, because only the GPU knows which mips have pages.
const uint32 ViewsBufferElements = FMath::RoundUpToPowerOfTwo(Views.Num());
FRDGBufferRef CompactedViews = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateStructuredDesc(sizeof(FPackedView), ViewsBufferElements), TEXT("Shadow.Virtual.CompactedViews"));
FRDGBufferRef CompactedViewInfo = GraphBuilder.CreateBuffer(FRDGBufferDesc::CreateStructuredDesc(sizeof(FCompactedViewInfo), Views.Num()), TEXT("Shadow.Virtual.CompactedViewInfo"));
const static uint32 TheZeros[2] = { 0U, 0U };
FRDGBufferRef CompactedViewsAllocation = CreateStructuredBuffer(GraphBuilder, TEXT("Shadow.Virtual.CompactedViewsAllocation"), sizeof(uint32), 2, TheZeros, sizeof(TheZeros), ERDGInitialDataFlags::NoCopy);
{
FCompactViewsVSM_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FCompactViewsVSM_CS::FParameters >();
PassParameters->GPUSceneParameters = GPUSceneParameters;
PassParameters->CullingParameters = CullingParameters;
PassParameters->VirtualShadowMap = VirtualTargetParameters;
PassParameters->CompactedViewsOut = GraphBuilder.CreateUAV(CompactedViews);
PassParameters->CompactedViewInfoOut = GraphBuilder.CreateUAV(CompactedViewInfo);
PassParameters->CompactedViewsAllocationOut = GraphBuilder.CreateUAV(CompactedViewsAllocation);
auto ComputeShader = CullingContext.ShaderMap->GetShader<FCompactViewsVSM_CS>();
// Compact the VSM views with a compute shader.
FComputeShaderUtils::AddPass(
GraphBuilder,
RDG_EVENT_NAME("CompactViewsVSM"),
ComputeShader,
PassParameters,
FComputeShaderUtils::GetGroupCount(NumPrimaryViews, 64)
);
}
// Override the original view info with the compacted views.
CullingParameters.InViews = GraphBuilder.CreateSRV(CompactedViews);
CullingContext.ViewsBuffer = CompactedViews;
CullingParameters.CompactedViewInfo = GraphBuilder.CreateSRV(CompactedViewInfo);
CullingParameters.CompactedViewsAllocation = GraphBuilder.CreateSRV(CompactedViewsAllocation);
}
// Initialize the arguments of the culling context.
{
FInitArgs_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FInitArgs_CS::FParameters >();
PassParameters->RenderFlags = CullingParameters.RenderFlags;
PassParameters->OutMainAndPostPassPersistentStates = GraphBuilder.CreateUAV( CullingContext.MainAndPostPassPersistentStates );
PassParameters->InOutMainPassRasterizeArgsSWHW = GraphBuilder.CreateUAV( CullingContext.MainRasterizeArgsSWHW );
uint32 ClampedDrawPassIndex = FMath::Min(CullingContext.DrawPassIndex, 2u);
if (CullingContext.bTwoPassOcclusion)
{
PassParameters->OutOccludedInstancesArgs = GraphBuilder.CreateUAV( CullingContext.OccludedInstancesArgs );
PassParameters->InOutPostPassRasterizeArgsSWHW = GraphBuilder.CreateUAV( CullingContext.PostRasterizeArgsSWHW );
}
if (CullingContext.RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA)
{
PassParameters->InOutTotalPrevDrawClusters = GraphBuilder.CreateUAV(CullingContext.TotalPrevDrawClustersBuffer);
}
else
{
// Use any UAV just to keep render graph happy that something is bound, but the shader doesn't actually touch this.
PassParameters->InOutTotalPrevDrawClusters = PassParameters->OutMainAndPostPassPersistentStates;
}
FInitArgs_CS::FPermutationDomain PermutationVector;
PermutationVector.Set<FInitArgs_CS::FOcclusionCullingDim>( CullingContext.bTwoPassOcclusion );
PermutationVector.Set<FInitArgs_CS::FDrawPassIndexDim>( ClampedDrawPassIndex );
auto ComputeShader = CullingContext.ShaderMap->GetShader< FInitArgs_CS >( PermutationVector );
// The arguments are also initialized with a compute shader.
FComputeShaderUtils::AddPass(
GraphBuilder,
RDG_EVENT_NAME( "InitArgs" ),
ComputeShader,
PassParameters,
FIntVector( 1, 1, 1 )
);
}
// Allocate the candidate buffers; they live only for the duration of CullRasterize.
FRDGBufferRef MainCandidateNodesAndClustersBuffer = nullptr;
FRDGBufferRef PostCandidateNodesAndClustersBuffer = nullptr;
AllocateCandidateBuffers(GraphBuilder, CullingContext.ShaderMap, &MainCandidateNodesAndClustersBuffer, CullingContext.bTwoPassOcclusion ? &PostCandidateNodesAndClustersBuffer : nullptr);
// Instance hierarchy and cluster culling: either the no-occlusion pass or the occlusion main pass.
AddPass_InstanceHierarchyAndClusterCull(
GraphBuilder,
Scene,
CullingParameters,
Views,
NumPrimaryViews,
CullingContext,
RasterContext,
RasterState,
GPUSceneParameters,
MainCandidateNodesAndClustersBuffer,
PostCandidateNodesAndClustersBuffer,
CullingContext.bTwoPassOcclusion ? CULLING_PASS_OCCLUSION_MAIN : CULLING_PASS_NO_OCCLUSION,
VirtualShadowMapArray,
VirtualTargetParameters
);
// Rasterize.
AddPass_Rasterize(
GraphBuilder,
Views,
RasterContext,
RasterState,
CullingContext.SOAStrides,
CullingContext.RenderFlags,
CullingContext.ViewsBuffer,
CullingContext.VisibleClustersSWHW,
nullptr,
CullingContext.SafeMainRasterizeArgsSWHW,
CullingContext.TotalPrevDrawClustersBuffer,
GPUSceneParameters,
true,
VirtualShadowMapArray,
VirtualTargetParameters
);
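// Occlusion post pass. Retest the instances and clusters that were not visible last frame; if they are visible this frame, render them.
if (CullingContext.bTwoPassOcclusion)
{
// Build a closest HZB from the previous frame's occluders to test the remaining occluders against.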
{
RDG_EVENT_SCOPE(GraphBuilder, "BuildPreviousOccluderHZB");
FSceneTextureParameters SceneTextures = GetSceneTextureParameters(GraphBuilder);
FRDGTextureRef SceneDepth = SceneTextures.SceneDepthTexture;
FRDGTextureRef RasterizedDepth = RasterContext.VisBuffer64;
if( RasterContext.RasterTechnique == ERasterTechnique::DepthOnly )
{
SceneDepth = GraphBuilder.RegisterExternalTexture( GSystemTextures.BlackDummy );
RasterizedDepth = RasterContext.DepthBuffer;
}
FRDGTextureRef OutFurthestHZBTexture;
FIntRect ViewRect(0, 0, RasterContext.TextureSize.X, RasterContext.TextureSize.Y);
if (Views.Num() == 1)
{
ViewRect = FIntRect(Views[0].ViewRect.X, Views[0].ViewRect.Y, Views[0].ViewRect.Z, Views[0].ViewRect.W);
}
// Build the HZB.
BuildHZBFurthest(
GraphBuilder,
SceneDepth,
RasterizedDepth,
CullingContext.HZBBuildViewRect,
Scene.GetFeatureLevel(),
Scene.GetShaderPlatform(),
TEXT("Nanite.PreviousOccluderHZB"),
/* OutFurthestHZBTexture = */ &OutFurthestHZBTexture);
CullingParameters.HZBTexture = OutFurthestHZBTexture;
CullingParameters.HZBSize = CullingParameters.HZBTexture->Desc.Extent;
}
// Post pass.
AddPass_InstanceHierarchyAndClusterCull(
GraphBuilder,
Scene,
CullingParameters,
Views,
NumPrimaryViews,
CullingContext,
RasterContext,
RasterState,
GPUSceneParameters,
MainCandidateNodesAndClustersBuffer,
PostCandidateNodesAndClustersBuffer,
CULLING_PASS_OCCLUSION_POST,
VirtualShadowMapArray,
VirtualTargetParameters
);
// Rasterize the post pass.
AddPass_Rasterize(
GraphBuilder,
Views,
RasterContext,
RasterState,
CullingContext.SOAStrides,
CullingContext.RenderFlags,
CullingContext.ViewsBuffer,
CullingContext.VisibleClustersSWHW,
CullingContext.MainRasterizeArgsSWHW,
CullingContext.SafePostRasterizeArgsSWHW,
CullingContext.TotalPrevDrawClustersBuffer,
GPUSceneParameters,
false,
VirtualShadowMapArray,
VirtualTargetParameters
);
}
if (RasterContext.RasterTechnique != ERasterTechnique::DepthOnly)
{
// The pass index and the number of clusters rendered in previous passes are irrelevant for depth-only rendering.
CullingContext.DrawPassIndex++;
CullingContext.RenderFlags |= RENDER_FLAG_HAVE_PREV_DRAW_DATA;
}
(......)
}
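The two-pass occlusion scheme in CullRasterize can be summarized with a toy CPU model. This is a sketch under strong simplifying assumptions, not the engine algorithm: each cluster covers exactly one screen tile, depth is a linear distance where larger means farther (unlike the reversed-Z buffers of the engine), a single value per tile stands in for the HZB, and `FCluster`, `FDepth`, `IsOccluded`, `Rasterize` and `CullRasterizeTwoPass` are hypothetical names.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct FCluster
{
    int Id;
    int Tile;       // the single screen tile this cluster covers (toy assumption)
    float Distance; // linear distance from the camera, larger is farther
};

// Per-tile nearest drawn distance; infinity means nothing has been drawn there.
using FDepth = std::vector<float>;

// A cluster is occluded if it lies behind the occluder distance stored for its tile.
bool IsOccluded(const FCluster& Cluster, const FDepth& HZB)
{
    return Cluster.Distance > HZB[Cluster.Tile];
}

void Rasterize(const FCluster& Cluster, FDepth& Depth)
{
    Depth[Cluster.Tile] = std::min(Depth[Cluster.Tile], Cluster.Distance);
}

// Returns the IDs of the drawn clusters, in draw order.
std::vector<int> CullRasterizeTwoPass(const std::vector<FCluster>& Clusters,
                                      const FDepth& PrevHZB, FDepth& Depth)
{
    std::vector<int> Drawn;
    std::vector<FCluster> Occluded;

    // Main pass: test against the previous frame's HZB and draw what passes.
    for (const FCluster& Cluster : Clusters)
    {
        if (IsOccluded(Cluster, PrevHZB))
        {
            Occluded.push_back(Cluster); // deferred to the post pass
        }
        else
        {
            Rasterize(Cluster, Depth);
            Drawn.push_back(Cluster.Id);
        }
    }

    // Post pass: rebuild the HZB from what was just drawn and retest the rejected clusters.
    const FDepth CurrentHZB = Depth;
    for (const FCluster& Cluster : Occluded)
    {
        if (!IsOccluded(Cluster, CurrentHZB))
        {
            Rasterize(Cluster, Depth);
            Drawn.push_back(Cluster.Id);
        }
    }
    return Drawn;
}
```

The main pass is cheap because it reuses last frame's occluders, while the post pass repairs the cases where that guess was wrong, for example when an occluder from the previous frame has disappeared.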
Next, we turn to the two functions AddPass_InstanceHierarchyAndClusterCull and AddPass_Rasterize, starting with AddPass_InstanceHierarchyAndClusterCull:
void AddPass_InstanceHierarchyAndClusterCull(
FRDGBuilder& GraphBuilder,
const FScene& Scene,
const FCullingParameters& CullingParameters,
const TArray<FPackedView, SceneRenderingAllocator>& Views,
const uint32 NumPrimaryViews,
const FCullingContext& CullingContext,
const FRasterContext& RasterContext,
const FRasterState& RasterState,
const FGPUSceneParameters &GPUSceneParameters,
FRDGBufferRef MainCandidateNodesAndClusters,
FRDGBufferRef PostCandidateNodesAndClusters,
uint32 CullingPass,
FVirtualShadowMapArray *VirtualShadowMapArray,
FVirtualTargetParameters &VirtualTargetParameters
)
{
(......)
const bool bMultiView = Views.Num() > 1 || VirtualShadowMapArray != nullptr;
if (VirtualShadowMapArray)
{
(......)
}
// Instance culling.
else if (CullingContext.NumInstancesPreCull > 0 || CullingPass == CULLING_PASS_OCCLUSION_POST)
{
RDG_GPU_STAT_SCOPE( GraphBuilder, NaniteInstanceCull );
// Set up the parameters of the instance-culling compute shader.
FInstanceCull_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FInstanceCull_CS::FParameters >();
PassParameters->NumInstances = CullingContext.NumInstancesPreCull;
PassParameters->MaxNodes = Nanite::FGlobalResources::GetMaxNodes();
PassParameters->ImposterMaxPixels = GNaniteImposterMaxPixels;
PassParameters->GPUSceneParameters = GPUSceneParameters;
PassParameters->RasterParameters = RasterContext.Parameters;
PassParameters->CullingParameters = CullingParameters;
const ERasterTechnique Technique = RasterContext.RasterTechnique;
PassParameters->OnlyCastShadowsPrimitives = Technique == ERasterTechnique::DepthOnly ? 1 : 0;
PassParameters->ImposterAtlas = Nanite::GStreamingManager.GetRootPagesSRV();
PassParameters->OutMainAndPostPassPersistentStates = GraphBuilder.CreateUAV( CullingContext.MainAndPostPassPersistentStates );
if (CullingContext.StatsBuffer)
{
PassParameters->OutStatsBuffer = GraphBuilder.CreateUAV(CullingContext.StatsBuffer);
}
// Bind different parameters depending on the culling pass.
if( CullingPass == CULLING_PASS_NO_OCCLUSION )
{
if( CullingContext.InstanceDrawsBuffer )
{
PassParameters->InInstanceDraws = GraphBuilder.CreateSRV( CullingContext.InstanceDrawsBuffer );
}
PassParameters->OutCandidateNodesAndClusters = GraphBuilder.CreateUAV( MainCandidateNodesAndClusters);
}
else if( CullingPass == CULLING_PASS_OCCLUSION_MAIN )
{
PassParameters->OutOccludedInstances = GraphBuilder.CreateUAV( CullingContext.OccludedInstances );
PassParameters->OutOccludedInstancesArgs = GraphBuilder.CreateUAV( CullingContext.OccludedInstancesArgs );
PassParameters->OutCandidateNodesAndClusters = GraphBuilder.CreateUAV( MainCandidateNodesAndClusters );
}
else
{
PassParameters->InInstanceDraws = GraphBuilder.CreateSRV( CullingContext.OccludedInstances );
PassParameters->InOccludedInstancesArgs = GraphBuilder.CreateSRV( CullingContext.OccludedInstancesArgs );
PassParameters->OutCandidateNodesAndClusters = GraphBuilder.CreateUAV( PostCandidateNodesAndClusters);
}
check(CullingContext.ViewsBuffer);
// Set up the shader permutation.
const uint32 InstanceCullingPass = CullingContext.InstanceDrawsBuffer != nullptr ? CULLING_PASS_EXPLICIT_LIST : CullingPass;
FInstanceCull_CS::FPermutationDomain PermutationVector;
PermutationVector.Set<FInstanceCull_CS::FCullingPassDim>(InstanceCullingPass);
PermutationVector.Set<FInstanceCull_CS::FMultiViewDim>(bMultiView);
PermutationVector.Set<FInstanceCull_CS::FNearClipDim>(RasterState.bNearClip);
PermutationVector.Set<FInstanceCull_CS::FDebugFlagsDim>(CullingContext.DebugFlags != 0);
PermutationVector.Set<FInstanceCull_CS::FRasterTechniqueDim>(int32(RasterContext.RasterTechnique));
auto ComputeShader = CullingContext.ShaderMap->GetShader<FInstanceCull_CS>(PermutationVector);
// Post-pass instance culling.
if( InstanceCullingPass == CULLING_PASS_OCCLUSION_POST )
{
PassParameters->IndirectArgs = CullingContext.OccludedInstancesArgs;
FComputeShaderUtils::AddPass(
GraphBuilder,
RDG_EVENT_NAME( "Post Pass: InstanceCull" ),
ComputeShader,
PassParameters,
PassParameters->IndirectArgs,
0
);
}
else // Main-pass instance culling.
{
FComputeShaderUtils::AddPass(
GraphBuilder,
InstanceCullingPass == CULLING_PASS_OCCLUSION_MAIN ? RDG_EVENT_NAME( "Main Pass: InstanceCull" ) :
InstanceCullingPass == CULLING_PASS_NO_OCCLUSION ? RDG_EVENT_NAME( "Main Pass: InstanceCull - No occlusion" ) :
RDG_EVENT_NAME( "Main Pass: InstanceCull - Explicit list" ),
ComputeShader,
PassParameters,
FComputeShaderUtils::GetGroupCount(CullingContext.NumInstancesPreCull, 64)
);
}
}
// Cluster culling.
{
RDG_GPU_STAT_SCOPE(GraphBuilder, NaniteClusterCull);
FPersistentClusterCull_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FPersistentClusterCull_CS::FParameters >();
// Cluster culling uses parameters from the GPUScene, GStreamingManager, and so on.
PassParameters->GPUSceneParameters = GPUSceneParameters;
PassParameters->CullingParameters = CullingParameters;
PassParameters->MaxNodes = Nanite::FGlobalResources::GetMaxNodes();
PassParameters->ClusterPageHeaders = Nanite::GStreamingManager.GetClusterPageHeadersSRV();
PassParameters->ClusterPageData = Nanite::GStreamingManager.GetClusterPageDataSRV();
PassParameters->HierarchyBuffer = Nanite::GStreamingManager.GetHierarchySRV();
check(CullingContext.DrawPassIndex == 0 || CullingContext.RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA); // sanity check
// Bind the data of previous draws.
if (CullingContext.RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA)
{
PassParameters->InTotalPrevDrawClusters = GraphBuilder.CreateSRV(CullingContext.TotalPrevDrawClustersBuffer);
}
else
{
FRDGBufferRef Dummy = GraphBuilder.RegisterExternalBuffer(Nanite::GGlobalResources.GetStructureBufferStride8(), TEXT("Nanite.StructuredBufferStride8"));
PassParameters->InTotalPrevDrawClusters = GraphBuilder.CreateSRV(Dummy);
}
PassParameters->MainAndPostPassPersistentStates = GraphBuilder.CreateUAV( CullingContext.MainAndPostPassPersistentStates );
// Candidate nodes and clusters.
if( CullingPass == CULLING_PASS_NO_OCCLUSION || CullingPass == CULLING_PASS_OCCLUSION_MAIN )
{
PassParameters->InOutCandidateNodesAndClusters = GraphBuilder.CreateUAV( MainCandidateNodesAndClusters );
PassParameters->VisibleClustersArgsSWHW = GraphBuilder.CreateUAV( CullingContext.MainRasterizeArgsSWHW );
if( CullingPass == CULLING_PASS_OCCLUSION_MAIN )
{
PassParameters->OutOccludedNodesAndClusters = GraphBuilder.CreateUAV( PostCandidateNodesAndClusters );
}
}
else
{
PassParameters->InOutCandidateNodesAndClusters = GraphBuilder.CreateUAV( PostCandidateNodesAndClusters );
PassParameters->OffsetClustersArgsSWHW = GraphBuilder.CreateSRV( CullingContext.MainRasterizeArgsSWHW );
PassParameters->VisibleClustersArgsSWHW = GraphBuilder.CreateUAV( CullingContext.PostRasterizeArgsSWHW );
}
// Output UAVs: the visible clusters and the streaming requests.
PassParameters->OutVisibleClustersSWHW = GraphBuilder.CreateUAV( CullingContext.VisibleClustersSWHW );
PassParameters->OutStreamingRequests = GraphBuilder.CreateUAV( CullingContext.StreamingRequests );
if (VirtualShadowMapArray)
{
PassParameters->VirtualShadowMap = VirtualTargetParameters;
PassParameters->OutDynamicCasterFlags = GraphBuilder.CreateUAV(VirtualShadowMapArray->DynamicCasterPageFlagsRDG, PF_R32_UINT);
}
if (CullingContext.StatsBuffer)
{
PassParameters->OutStatsBuffer = GraphBuilder.CreateUAV(CullingContext.StatsBuffer);
}
PassParameters->LargePageRectThreshold = CVarLargePageRectThreshold.GetValueOnRenderThread();
check(CullingContext.ViewsBuffer);
// Shader permutation.
FPersistentClusterCull_CS::FPermutationDomain PermutationVector;
PermutationVector.Set<FPersistentClusterCull_CS::FCullingPassDim>(CullingPass);
PermutationVector.Set<FPersistentClusterCull_CS::FMultiViewDim>(bMultiView);
PermutationVector.Set<FPersistentClusterCull_CS::FNearClipDim>(RasterState.bNearClip);
PermutationVector.Set<FPersistentClusterCull_CS::FVirtualTextureTargetDim>(VirtualShadowMapArray != nullptr);
PermutationVector.Set<FPersistentClusterCull_CS::FClusterPerPageDim>(GNaniteClusterPerPage && VirtualShadowMapArray != nullptr);
PermutationVector.Set<FPersistentClusterCull_CS::FDebugFlagsDim>(CullingContext.DebugFlags != 0);
auto ComputeShader = CullingContext.ShaderMap->GetShader<FPersistentClusterCull_CS>(PermutationVector);
// Dispatch the compute shader pass.
FComputeShaderUtils::AddPass(
GraphBuilder,
CullingPass == CULLING_PASS_NO_OCCLUSION ? RDG_EVENT_NAME( "Main Pass: PersistentCull - No occlusion" ) :
CullingPass == CULLING_PASS_OCCLUSION_MAIN ? RDG_EVENT_NAME( "Main Pass: PersistentCull" ) :
RDG_EVENT_NAME( "Post Pass: PersistentCull" ),
ComputeShader,
PassParameters,
FIntVector(GRHIPersistentThreadGroupCount, 1, 1)
);
}
// Compute safe rasterizer arguments so that the subsequent rasterization passes are correct and safe.
{
FCalculateSafeRasterizerArgs_CS::FParameters* PassParameters = GraphBuilder.AllocParameters< FCalculateSafeRasterizerArgs_CS::FParameters >();
const bool bPrevDrawData = (CullingContext.RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA) != 0;
const bool bPostPass = (CullingPass == CULLING_PASS_OCCLUSION_POST) != 0;
if (bPrevDrawData)
{
PassParameters->InTotalPrevDrawClusters = GraphBuilder.CreateSRV(CullingContext.TotalPrevDrawClustersBuffer);
}
if (bPostPass)
{
PassParameters->OffsetClustersArgsSWHW = GraphBuilder.CreateSRV(CullingContext.MainRasterizeArgsSWHW);
PassParameters->InRasterizerArgsSWHW = GraphBuilder.CreateSRV(CullingContext.PostRasterizeArgsSWHW);
PassParameters->OutSafeRasterizerArgsSWHW = GraphBuilder.CreateUAV(CullingContext.SafePostRasterizeArgsSWHW);
}
else
{
PassParameters->InRasterizerArgsSWHW = GraphBuilder.CreateSRV(CullingContext.MainRasterizeArgsSWHW);
PassParameters->OutSafeRasterizerArgsSWHW = GraphBuilder.CreateUAV(CullingContext.SafeMainRasterizeArgsSWHW);
}
PassParameters->MaxVisibleClusters = Nanite::FGlobalResources::GetMaxVisibleClusters();
PassParameters->RenderFlags = CullingContext.RenderFlags;
FCalculateSafeRasterizerArgs_CS::FPermutationDomain PermutationVector;
PermutationVector.Set<FCalculateSafeRasterizerArgs_CS::FHasPrevDrawData>(bPrevDrawData);
PermutationVector.Set<FCalculateSafeRasterizerArgs_CS::FIsPostPass>(bPostPass);
auto ComputeShader = CullingContext.ShaderMap->GetShader< FCalculateSafeRasterizerArgs_CS >(PermutationVector);
FComputeShaderUtils::AddPass(
GraphBuilder,
bPostPass ? RDG_EVENT_NAME("Post Pass: CalculateSafeRasterizerArgs") : RDG_EVENT_NAME("Main Pass: CalculateSafeRasterizerArgs"),
ComputeShader,
PassParameters,
FIntVector(1, 1, 1)
);
}
}
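Two small pieces of arithmetic in the culling code are easy to overlook, so here is a minimal C++ sketch of them. The first is the ceiling division that presumably sits behind `FComputeShaderUtils::GetGroupCount(NumInstancesPreCull, 64)`. The second is a hypothetical clamp in the spirit of `FCalculateSafeRasterizerArgs_CS`: the HW vertex shader shown later indexes visible clusters from the end of the buffer (`(MaxVisibleClusters - 1) - VisibleIndex`) while the SW path indexes from the start, so the two counts together must not exceed the buffer capacity. The names `GroupCount1D`, `FRasterClusterCounts`, and `ClampToCapacity` are illustrative only, and the real shader (not shown in this article) may clamp differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Ceiling division: number of thread groups needed so that groups of
// GroupSize threads cover ItemCount items (one thread per item).
inline uint32_t GroupCount1D(uint32_t ItemCount, uint32_t GroupSize)
{
    assert(GroupSize > 0);
    return (ItemCount + GroupSize - 1u) / GroupSize;
}

// Hypothetical container for the two cluster counters produced by culling.
struct FRasterClusterCounts
{
    uint32_t NumSW; // clusters routed to the software rasterizer
    uint32_t NumHW; // clusters routed to the hardware rasterizer
};

// Illustrative clamp (an assumption, not the engine's code): SW clusters fill
// the shared buffer from the front and HW clusters from the back, so the sum
// of both counts must stay within MaxVisibleClusters.
inline FRasterClusterCounts ClampToCapacity(FRasterClusterCounts Requested, uint32_t MaxVisibleClusters)
{
    FRasterClusterCounts Safe;
    Safe.NumSW = std::min(Requested.NumSW, MaxVisibleClusters);
    Safe.NumHW = std::min(Requested.NumHW, MaxVisibleClusters - Safe.NumSW);
    return Safe;
}
```

With 65 instances and 64 threads per group, `GroupCount1D` yields two groups; the second group simply has 63 idle threads that the shader must mask out with a bounds check.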
The culling code above dispatches several compute shaders; for brevity, their shader code is not analyzed here. Let us now turn to AddPass_Rasterize:
void AddPass_Rasterize(
FRDGBuilder& GraphBuilder,
const TArray<FPackedView, SceneRenderingAllocator>& Views,
const FRasterContext& RasterContext,
const FRasterState& RasterState,
FIntVector4 SOAStrides,
uint32 RenderFlags,
FRDGBufferRef ViewsBuffer,
FRDGBufferRef VisibleClustersSWHW,
FRDGBufferRef ClusterOffsetSWHW,
FRDGBufferRef IndirectArgs,
FRDGBufferRef TotalPrevDrawClustersBuffer,
const FGPUSceneParameters& GPUSceneParameters,
bool bMainPass,
FVirtualShadowMapArray* VirtualShadowMapArray,
FVirtualTargetParameters& VirtualTargetParameters
)
{
(......)
// Allocate the rasterization parameters.
auto* RasterPassParameters = GraphBuilder.AllocParameters<FHWRasterizePS::FParameters>();
auto* CommonPassParameters = &RasterPassParameters->Common;
// Set the cluster page data and page headers.
CommonPassParameters->ClusterPageData = GStreamingManager.GetClusterPageDataSRV();
CommonPassParameters->ClusterPageHeaders = GStreamingManager.GetClusterPageHeadersSRV();
// Views buffer.
if (ViewsBuffer)
{
CommonPassParameters->InViews = GraphBuilder.CreateSRV(ViewsBuffer);
}
// Draw parameters.
CommonPassParameters->GPUSceneParameters = GPUSceneParameters;
CommonPassParameters->RasterParameters = RasterContext.Parameters;
CommonPassParameters->VisualizeModeBitMask = RasterContext.VisualizeModeBitMask;
CommonPassParameters->SOAStrides = SOAStrides;
CommonPassParameters->MaxVisibleClusters = Nanite::FGlobalResources::GetMaxVisibleClusters();
CommonPassParameters->RenderFlags = RenderFlags;
if (RasterState.CullMode == CM_CCW)
{
CommonPassParameters->RenderFlags |= RENDER_FLAG_REVERSE_CULLING;
}
CommonPassParameters->VisibleClustersSWHW = GraphBuilder.CreateSRV(VisibleClustersSWHW);
if (VirtualShadowMapArray)
{
CommonPassParameters->VirtualShadowMap = VirtualTargetParameters;
}
if (!bMainPass)
{
CommonPassParameters->InClusterOffsetSWHW = GraphBuilder.CreateSRV(ClusterOffsetSWHW);
}
CommonPassParameters->IndirectArgs = IndirectArgs;
const bool bHavePrevDrawData = (RenderFlags & RENDER_FLAG_HAVE_PREV_DRAW_DATA);
if (bHavePrevDrawData)
{
CommonPassParameters->InTotalPrevDrawClusters = GraphBuilder.CreateSRV(TotalPrevDrawClustersBuffer);
}
const ERasterTechnique Technique = RasterContext.RasterTechnique;
const ERasterScheduling Scheduling = RasterContext.RasterScheduling;
const bool bNearClip = RasterState.bNearClip;
const bool bMultiView = Views.Num() > 1 || VirtualShadowMapArray != nullptr;
ERDGPassFlags ComputePassFlags = ERDGPassFlags::Compute;
// If hardware and software rasterization are scheduled to overlap, create UAVs flagged with SkipBarrier.
if (Scheduling == ERasterScheduling::HardwareAndSoftwareOverlap)
{
const auto CreateSkipBarrierUAV = [&](auto& InOutUAV)
{
if (InOutUAV)
{
// Recreate the UAV with the ERDGUnorderedAccessViewFlags::SkipBarrier flag.
InOutUAV = GraphBuilder.CreateUAV(InOutUAV->Desc, ERDGUnorderedAccessViewFlags::SkipBarrier);
}
};
// Create SkipBarrier UAVs so that the hardware and software rasterizers can overlap.
CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.OutDepthBuffer);
CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.OutVisBuffer64);
CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.OutDbgBuffer64);
CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.OutDbgBuffer32);
CreateSkipBarrierUAV(CommonPassParameters->RasterParameters.LockBuffer);
ComputePassFlags = ERDGPassFlags::AsyncCompute;
}
FIntRect ViewRect(Views[0].ViewRect.X, Views[0].ViewRect.Y, Views[0].ViewRect.Z, Views[0].ViewRect.W);
if (bMultiView)
{
ViewRect.Min = FIntPoint::ZeroValue;
ViewRect.Max = RasterContext.TextureSize;
}
// Handle virtual shadow maps (VSM).
if (VirtualShadowMapArray)
{
ViewRect.Min = FIntPoint::ZeroValue;
if( GNaniteClusterPerPage )
{
ViewRect.Max = FIntPoint( FVirtualShadowMap::PageSize, FVirtualShadowMap::PageSize ) * FVirtualShadowMap::RasterWindowPages;
}
else
{
ViewRect.Max = FIntPoint( FVirtualShadowMap::VirtualMaxResolutionXY, FVirtualShadowMap::VirtualMaxResolutionXY );
}
}
// First, rasterize with the conventional hardware graphics pipeline.
{
const bool bUsePrimitiveShader = UsePrimitiveShader();
const bool bUseAutoCullingShader =
GRHISupportsPrimitiveShaders &&
!bUsePrimitiveShader &&
GNaniteAutoShaderCulling != 0;
// Set up the vertex shader permutation.
FHWRasterizeVS::FPermutationDomain PermutationVectorVS;
PermutationVectorVS.Set<FHWRasterizeVS::FRasterTechniqueDim>(int32(Technique));
PermutationVectorVS.Set<FHWRasterizeVS::FAddClusterOffset>(bMainPass ? 0 : 1);
PermutationVectorVS.Set<FHWRasterizeVS::FMultiViewDim>(bMultiView);
PermutationVectorVS.Set<FHWRasterizeVS::FPrimShaderDim>(bUsePrimitiveShader);
PermutationVectorVS.Set<FHWRasterizeVS::FAutoShaderCullDim>(bUseAutoCullingShader);
PermutationVectorVS.Set<FHWRasterizeVS::FHasPrevDrawData>(bHavePrevDrawData);
PermutationVectorVS.Set<FHWRasterizeVS::FVisualizeDim>(RasterContext.VisualizeActive && Technique != ERasterTechnique::DepthOnly);
PermutationVectorVS.Set<FHWRasterizeVS::FNearClipDim>(bNearClip);
PermutationVectorVS.Set<FHWRasterizeVS::FVirtualTextureTargetDim>(VirtualShadowMapArray != nullptr);
PermutationVectorVS.Set<FHWRasterizeVS::FClusterPerPageDim>(GNaniteClusterPerPage && VirtualShadowMapArray != nullptr );
// Set up the pixel shader permutation.
FHWRasterizePS::FPermutationDomain PermutationVectorPS;
PermutationVectorPS.Set<FHWRasterizePS::FRasterTechniqueDim>(int32(Technique));
PermutationVectorPS.Set<FHWRasterizePS::FMultiViewDim>(bMultiView);
PermutationVectorPS.Set<FHWRasterizePS::FPrimShaderDim>(bUsePrimitiveShader);
PermutationVectorPS.Set<FHWRasterizePS::FVisualizeDim>(RasterContext.VisualizeActive && Technique != ERasterTechnique::DepthOnly);
PermutationVectorPS.Set<FHWRasterizePS::FNearClipDim>(bNearClip);
PermutationVectorPS.Set<FHWRasterizePS::FVirtualTextureTargetDim>(VirtualShadowMapArray != nullptr);
PermutationVectorPS.Set<FHWRasterizePS::FClusterPerPageDim>( GNaniteClusterPerPage && VirtualShadowMapArray != nullptr );
auto VertexShader = RasterContext.ShaderMap->GetShader<FHWRasterizeVS>(PermutationVectorVS);
auto PixelShader = RasterContext.ShaderMap->GetShader<FHWRasterizePS>(PermutationVectorPS);
// Add the rasterization pass.
GraphBuilder.AddPass(
bMainPass ? RDG_EVENT_NAME("Main Pass: Rasterize") : RDG_EVENT_NAME("Post Pass: Rasterize"),
RasterPassParameters,
ERDGPassFlags::Raster | ERDGPassFlags::SkipRenderPass,
[VertexShader, PixelShader, RasterPassParameters, ViewRect, bUsePrimitiveShader, bMainPass](FRHICommandListImmediate& RHICmdList)
{
// Render pass info.
FRHIRenderPassInfo RPInfo;
// Resolve parameters.
RPInfo.ResolveParameters.DestRect.X1 = ViewRect.Min.X;
RPInfo.ResolveParameters.DestRect.Y1 = ViewRect.Min.Y;
RPInfo.ResolveParameters.DestRect.X2 = ViewRect.Max.X;
RPInfo.ResolveParameters.DestRect.Y2 = ViewRect.Max.Y;
RHICmdList.BeginRenderPass(RPInfo, bMainPass ? TEXT("Main Pass: Rasterize") : TEXT("Post Pass: Rasterize"));
RHICmdList.SetViewport(ViewRect.Min.X, ViewRect.Min.Y, 0.0f, FMath::Min(ViewRect.Max.X, 32767), FMath::Min(ViewRect.Max.Y, 32767), 1.0f);
FGraphicsPipelineStateInitializer GraphicsPSOInit;
RHICmdList.ApplyCachedRenderTargets(GraphicsPSOInit);
// PSO.
GraphicsPSOInit.BlendState = TStaticBlendState<>::GetRHI();
GraphicsPSOInit.RasterizerState = GetStaticRasterizerState<false>(FM_Solid, CM_CW);
GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always>::GetRHI();
GraphicsPSOInit.PrimitiveType = bUsePrimitiveShader ? PT_PointList : PT_TriangleList;
GraphicsPSOInit.BoundShaderState.VertexDeclarationRHI = GEmptyVertexDeclaration.VertexDeclarationRHI;
GraphicsPSOInit.BoundShaderState.VertexShaderRHI = VertexShader.GetVertexShader();
GraphicsPSOInit.BoundShaderState.PixelShaderRHI = PixelShader.GetPixelShader();
SetGraphicsPipelineState( RHICmdList, GraphicsPSOInit );
SetShaderParameters(RHICmdList, VertexShader, VertexShader.GetVertexShader(), RasterPassParameters->Common);
SetShaderParameters(RHICmdList, PixelShader, PixelShader.GetPixelShader(), *RasterPassParameters);
RHICmdList.SetStreamSource( 0, nullptr, 0 );
// Note that this is an indirect draw call, and IndirectArgs is the output of AddPass_InstanceHierarchyAndClusterCull.
RHICmdList.DrawPrimitiveIndirect(RasterPassParameters->Common.IndirectArgs->GetIndirectRHICallBuffer(), 16);
RHICmdList.EndRenderPass();
});
}
// Software rasterization (done in a compute shader).
if (Scheduling != ERasterScheduling::HardwareOnly)
{
// Set up the permutation of the software rasterization compute shader.
FMicropolyRasterizeCS::FPermutationDomain PermutationVectorCS;
PermutationVectorCS.Set<FMicropolyRasterizeCS::FAddClusterOffset>(bMainPass ? 0 : 1);
PermutationVectorCS.Set<FMicropolyRasterizeCS::FMultiViewDim>(bMultiView);
PermutationVectorCS.Set<FMicropolyRasterizeCS::FHasPrevDrawData>(bHavePrevDrawData);
PermutationVectorCS.Set<FMicropolyRasterizeCS::FRasterTechniqueDim>(int32(Technique));
PermutationVectorCS.Set<FMicropolyRasterizeCS::FVisualizeDim>(RasterContext.VisualizeActive && Technique != ERasterTechnique::DepthOnly);
PermutationVectorCS.Set<FMicropolyRasterizeCS::FNearClipDim>(bNearClip);
PermutationVectorCS.Set<FMicropolyRasterizeCS::FVirtualTextureTargetDim>(VirtualShadowMapArray != nullptr);
PermutationVectorCS.Set<FMicropolyRasterizeCS::FClusterPerPageDim>(GNaniteClusterPerPage && VirtualShadowMapArray != nullptr);
auto ComputeShader = RasterContext.ShaderMap->GetShader<FMicropolyRasterizeCS>(PermutationVectorCS);
// Dispatch; the rasterization data and parameters live in CommonPassParameters.
FComputeShaderUtils::AddPass(
GraphBuilder,
bMainPass ? RDG_EVENT_NAME("Main Pass: Rasterize") : RDG_EVENT_NAME("Post Pass: Rasterize"),
ComputePassFlags,
ComputeShader,
CommonPassParameters,
CommonPassParameters->IndirectArgs,
0);
}
}
To dig further into how hardware and software rasterization work, we need to step into their shader logic:
// Engine\Shaders\Private\Nanite\Rasterizer.usf
(......)
// Rasterize one triangle (used by the software rasterizer).
void RasterizeTri(
FNaniteView NaniteView,
int4 ViewRect,
uint PixelValue,
#if VISUALIZE
uint2 VisualizeValues,
#endif
float3 Verts[3],
bool bUsePageTable )
{
float3 v01 = Verts[1] - Verts[0];
float3 v02 = Verts[2] - Verts[0];
// Back-face culling.
float DetXY = v01.x * v02.y - v01.y * v02.x;
if( DetXY >= 0.0f )
{
return;
}
float InvDet = rcp( DetXY );
float2 GradZ;
GradZ.x = ( v01.z * v02.y - v01.y * v02.z ) * InvDet;
GradZ.y = ( v01.x * v02.z - v01.z * v02.x ) * InvDet;
// 16.8 fixed point
float2 Vert0 = Verts[0].xy;
float2 Vert1 = Verts[1].xy;
float2 Vert2 = Verts[2].xy;
// Bounding rectangle.
const float2 MinSubpixel = min3( Vert0, Vert1, Vert2 );
const float2 MaxSubpixel = max3( Vert0, Vert1, Vert2 );
// Round to the nearest pixel.
int2 MinPixel = (int2)floor( ( MinSubpixel + (SUBPIXEL_SAMPLES / 2) - 1 ) * (1.0 / SUBPIXEL_SAMPLES) );
int2 MaxPixel = (int2)floor( ( MaxSubpixel - (SUBPIXEL_SAMPLES / 2) - 1 ) * (1.0 / SUBPIXEL_SAMPLES) );
// Clip to the view rectangle.
MinPixel = max( MinPixel, ViewRect.xy );
MaxPixel = min( MaxPixel, ViewRect.zw - 1 );
// Cull triangles that do not cover any pixel.
if( any( MinPixel > MaxPixel ) )
return;
// Limit the rasterization bounds to a sensible maximum.
MaxPixel = min( MaxPixel, MinPixel + 63 );
// 4.8 fixed point
float2 Edge01 = -v01.xy;
float2 Edge12 = Vert1 - Vert2;
float2 Edge20 = v02.xy;
// Rebase the vertices onto MinPixel, with a half-pixel offset.
// 4.8 fixed point
// Maximum triangle size = 127x127 pixels.
const float2 BaseSubpixel = (float2)MinPixel * SUBPIXEL_SAMPLES + (SUBPIXEL_SAMPLES / 2);
Vert0 -= BaseSubpixel;
Vert1 -= BaseSubpixel;
Vert2 -= BaseSubpixel;
// Half-edge constants.
// 8.16 fixed point
float C0 = Edge01.y * Vert0.x - Edge01.x * Vert0.y;
float C1 = Edge12.y * Vert1.x - Edge12.x * Vert1.y;
float C2 = Edge20.y * Vert2.x - Edge20.x * Vert2.y;
// Correct for the fill convention.
// Top left rule for CCW
C0 -= saturate(Edge01.y + saturate(1.0f - Edge01.x));
C1 -= saturate(Edge12.y + saturate(1.0f - Edge12.x));
C2 -= saturate(Edge20.y + saturate(1.0f - Edge20.x));
float Z0 = Verts[0].z - ( GradZ.x * Vert0.x + GradZ.y * Vert0.y );
GradZ *= SUBPIXEL_SAMPLES;
// Scale the edge constants by 1 / SUBPIXEL_SAMPLES, converting them from subpixel units to pixel units so they can be stepped one pixel at a time.
float CY0 = C0 * (1.0f / SUBPIXEL_SAMPLES);
float CY1 = C1 * (1.0f / SUBPIXEL_SAMPLES);
float CY2 = C2 * (1.0f / SUBPIXEL_SAMPLES);
float ZY = Z0;
// Decide whether to use the scanline path.
#if COMPILER_SUPPORTS_WAVE_VOTE
bool bScanLine = WaveActiveAnyTrue( MaxPixel.x - MinPixel.x > 4 );
#else
bool bScanLine = false;
#endif
if( bScanLine ) // Scanline path.
{
float3 Edge012 = { Edge01.y, Edge12.y, Edge20.y };
bool3 bOpenEdge = Edge012 < 0;
float3 InvEdge012 = Edge012 == 0 ? 1e8 : rcp( Edge012 );
int y = MinPixel.y;
while( true )
{
// No longer fixed point
float3 CrossX = float3( CY0, CY1, CY2 ) * InvEdge012;
float3 MinX = bOpenEdge ? CrossX : 0;
float3 MaxX = bOpenEdge ? MaxPixel.x - MinPixel.x : CrossX;
float x0 = ceil( max3( MinX.x, MinX.y, MinX.z ) );
float x1 = min3( MaxX.x, MaxX.y, MaxX.z );
float ZX = ZY + GradZ.x * x0;
x0 += MinPixel.x;
x1 += MinPixel.x;
// Walk every pixel of the span along x and write its data.
for( float x = x0; x <= x1; x++ )
{
// Write the pixel value and the depth.
WritePixel(OutVisBuffer64, PixelValue, uint2(x,y), ZX, NaniteView, bUsePageTable);
#if VISUALIZE
WritePixel(OutDbgBuffer64, VisualizeValues.x, uint2(x,y), ZX, NaniteView, bUsePageTable);
InterlockedAdd(OutDbgBuffer32[uint2(x,y)], VisualizeValues.y);
#endif
ZX += GradZ.x;
}
if( y >= MaxPixel.y )
break;
// Step one pixel along y.
CY0 += Edge01.x;
CY1 += Edge12.x;
CY2 += Edge20.x;
ZY += GradZ.y;
y++;
}
}
else // Non-scanline path: walk the bounding rectangle and test each pixel against the triangle.
{
int y = MinPixel.y;
while (true)
{
int x = MinPixel.x;
// All three edge values are non-negative, so the pixel is inside the triangle.
if (min3(CY0, CY1, CY2) >= 0)
{
WritePixel(OutVisBuffer64, PixelValue, uint2(x, y), ZY, NaniteView, bUsePageTable);
#if VISUALIZE
WritePixel(OutDbgBuffer64, VisualizeValues.x, uint2(x, y), ZY, NaniteView, bUsePageTable);
InterlockedAdd(OutDbgBuffer32[uint2(x, y)], VisualizeValues.y);
#endif
}
if (x < MaxPixel.x)
{
float CX0 = CY0 - Edge01.y;
float CX1 = CY1 - Edge12.y;
float CX2 = CY2 - Edge20.y;
float ZX = ZY + GradZ.x;
x++;
HOIST_DESCRIPTORS
while (true)
{
if (min3(CX0, CX1, CX2) >= 0)
{
WritePixel(OutVisBuffer64, PixelValue, uint2(x, y), ZX, NaniteView, bUsePageTable);
#if VISUALIZE
WritePixel(OutDbgBuffer64, VisualizeValues.x, uint2(x, y), ZX, NaniteView, bUsePageTable);
InterlockedAdd(OutDbgBuffer32[uint2(x, y)], VisualizeValues.y);
#endif
}
if (x >= MaxPixel.x)
break;
CX0 -= Edge01.y;
CX1 -= Edge12.y;
CX2 -= Edge20.y;
ZX += GradZ.x;
x++;
}
}
if (y >= MaxPixel.y)
break;
CY0 += Edge01.x;
CY1 += Edge12.x;
CY2 += Edge20.x;
ZY += GradZ.y;
y++;
}
}
}
#if USE_CONSTRAINED_CLUSTERS
groupshared float3 GroupVerts[256];
#else
groupshared float3 GroupVerts[384];
#endif
// Check the cull mode. The default is clockwise (CW); if this returns true, counter-clockwise (CCW) culling is required.
bool ReverseWindingOrder(FInstanceSceneData InstanceData)
{
bool bReverseInstanceCull = (InstanceData.InvNonUniformScaleAndDeterminantSign.w < 0.0f);
bool bRasterStateReverseCull = (RenderFlags & RENDER_FLAG_REVERSE_CULLING);
// Logical XOR
return (bReverseInstanceCull != bRasterStateReverseCull);
}
StructuredBuffer< uint2 > InTotalPrevDrawClusters;
Buffer<uint> InClusterOffsetSWHW;
groupshared float4x4 LocalToSubpixelLDS;
// Micropolygon rasterization, the compute shader entry of Nanite's software rasterizer.
[numthreads(128, 1, 1)]
void MicropolyRasterize(
uint VisibleIndex : SV_GroupID,
uint GroupIndex : SV_GroupIndex)
{
// Compute the visible cluster index.
#if HAS_PREV_DRAW_DATA
VisibleIndex += InTotalPrevDrawClusters[0].x;
#endif
#if ADD_CLUSTER_OFFSET
VisibleIndex += InClusterOffsetSWHW[0];
#endif
// Fetch the visible cluster and its instance data.
FVisibleCluster VisibleCluster = GetVisibleCluster( VisibleIndex, VIRTUAL_TEXTURE_TARGET );
FInstanceSceneData InstanceData = GetInstanceData( VisibleCluster.InstanceId );
// Fetch the Nanite view.
FNaniteView NaniteView = GetNaniteView( VisibleCluster.ViewId );
// Fetch the page information.
#if CLUSTER_PER_PAGE
// Scalar
uint2 vPage = VisibleCluster.vPage;
FShadowPhysicalPage pPage = ShadowGetPhysicalPage( CalcPageTableLevelOffset( NaniteView.TargetLayerIndex, NaniteView.TargetMipLevel ) + CalcPageOffsetInLevel( NaniteView.TargetMipLevel, vPage ) );
#endif
float4x4 LocalToSubpixel;
// The instance dynamic data is uniform across the group, so compute it once and keep it in a groupshared variable for later use.
if( GroupIndex == 0 )
{
LocalToSubpixel = CalculateInstanceDynamicData(NaniteView, InstanceData).LocalToClip;
float2 Scale = float2( 0.5, -0.5 ) * NaniteView.ViewSizeAndInvSize.xy * SUBPIXEL_SAMPLES;
float2 Bias = ( 0.5 * NaniteView.ViewSizeAndInvSize.xy + NaniteView.ViewRect.xy ) * SUBPIXEL_SAMPLES + 0.5f;
#if CLUSTER_PER_PAGE
Bias += ( (float2)pPage.PageIndex - (float2)vPage ) * VSM_PAGE_SIZE * SUBPIXEL_SAMPLES;
#endif
LocalToSubpixel._m00_m10_m20_m30 = LocalToSubpixel._m00_m10_m20_m30 * Scale.x + LocalToSubpixel._m03_m13_m23_m33 * Bias.x;
LocalToSubpixel._m01_m11_m21_m31 = LocalToSubpixel._m01_m11_m21_m31 * Scale.y + LocalToSubpixel._m03_m13_m23_m33 * Bias.y;
LocalToSubpixelLDS = LocalToSubpixel;
}
// Use a group memory barrier to synchronize the groupshared data.
GroupMemoryBarrierWithGroupSync();
LocalToSubpixel = LocalToSubpixelLDS;
// Fetch the cluster data.
FCluster Cluster = GetCluster(VisibleCluster.PageIndex, VisibleCluster.ClusterIndex);
UNROLL
for( uint i = 0; i < 2; i++ )
{
uint VertIndex = GroupIndex + i * 128;
if( VertIndex < Cluster.NumVerts )
{
// Transform the vertex and store it in groupshared memory.
float3 PointLocal = DecodePosition( VertIndex, Cluster );
float4 PointClipSubpixel = mul( float4( PointLocal, 1 ), LocalToSubpixel );
float3 Subpixel = PointClipSubpixel.xyz / PointClipSubpixel.w;
GroupVerts[ VertIndex ] = float3(floor(Subpixel.xy), Subpixel.z);
}
}
// Use a group memory barrier to synchronize the groupshared data.
GroupMemoryBarrierWithGroupSync();
int4 ViewRect = NaniteView.ViewRect;
#if CLUSTER_PER_PAGE
ViewRect.xy = pPage.PageIndex * VSM_PAGE_SIZE;
ViewRect.zw = ViewRect.xy + VSM_PAGE_SIZE;
#endif
if (GroupIndex < Cluster.NumTris)
{
// The triangle ID is this thread's index within the group (SV_GroupIndex).
uint TriangleID = GroupIndex;
// Read the triangle indices, flipping the winding order when required.
uint3 TriangleIndices = ReadTriangleIndices(Cluster, TriangleID);
if (ReverseWindingOrder(InstanceData))
{
TriangleIndices = uint3(TriangleIndices.x, TriangleIndices.z, TriangleIndices.y);
}
// Fetch the triangle's vertex positions.
float3 Vertices[3];
Vertices[0] = GroupVerts[TriangleIndices.x];
Vertices[1] = GroupVerts[TriangleIndices.y];
Vertices[2] = GroupVerts[TriangleIndices.z];
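// The pixel value packs the visible cluster index (plus one) into the high bits and the triangle ID into the low 7 bits.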
uint PixelValue = ((VisibleIndex + 1) << 7) | TriangleID;
// Rasterize the triangle, writing its ID and depth.
RasterizeTri(
NaniteView,
ViewRect,
PixelValue,
#if VISUALIZE
GetVisualizeValues(),
#endif
Vertices,
!CLUSTER_PER_PAGE );
}
}
#define PIXEL_VALUE (RASTER_TECHNIQUE != RASTER_TECHNIQUE_DEPTHONLY)
#define VERTEX_TO_TRIANGLE_MASKS (NANITE_PRIM_SHADER && PIXEL_VALUE)
struct VSOut
{
noperspective float DeviceZ : TEXCOORD0;
#if PIXEL_VALUE
nointerpolation uint PixelValue : TEXCOORD1;
#endif
#if NANITE_MULTI_VIEW
nointerpolation int4 ViewRect : TEXCOORD2;
#endif
#if VISUALIZE
nointerpolation uint2 VisualizeValues : TEXCOORD3;
#endif
#if VIRTUAL_TEXTURE_TARGET
nointerpolation int ViewId : TEXCOORD4;
#endif
#if VERTEX_TO_TRIANGLE_MASKS
CUSTOM_INTERPOLATION uint4 ToTriangleMasks : TEXCOORD5;
#endif
float4 Position : SV_Position;
};
// Shared vertex code of the hardware rasterizer: it decodes the vertex from the cluster and transforms it to clip space.
VSOut CommonRasterizerVS(FNaniteView NaniteView, FInstanceSceneData InstanceData, FVisibleCluster VisibleCluster, FCluster Cluster, uint VertIndex, out float4 PointClipNoScaling)
{
VSOut Out;
float4x4 LocalToWorld = InstanceData.LocalToWorld;
float3 PointLocal = DecodePosition( VertIndex, Cluster );
float3 PointRotated = LocalToWorld[0].xyz * PointLocal.xxx + LocalToWorld[1].xyz * PointLocal.yyy + LocalToWorld[2].xyz * PointLocal.zzz;
float3 PointTranslatedWorld = PointRotated + (LocalToWorld[3].xyz + NaniteView.PreViewTranslation.xyz);
float4 PointClip = mul( float4( PointTranslatedWorld, 1 ), NaniteView.TranslatedWorldToClip );
PointClipNoScaling = PointClip;
#if CLUSTER_PER_PAGE
PointClip.xy = NaniteView.ClipSpaceScaleOffset.xy * PointClip.xy + NaniteView.ClipSpaceScaleOffset.zw * PointClip.w;
// Offset 0,0 to be at vPage for a 0, VSM_PAGE_SIZE * VSM_RASTER_WINDOW_PAGES viewport.
PointClip.xy += PointClip.w * ( float2(-2, 2) / VSM_RASTER_WINDOW_PAGES ) * VisibleCluster.vPage;
Out.ViewRect.xy = VisibleCluster.vPage * VSM_PAGE_SIZE;
Out.ViewRect.zw = NaniteView.ViewRect.zw;
#elif NANITE_MULTI_VIEW
PointClip.xy = NaniteView.ClipSpaceScaleOffset.xy * PointClip.xy + NaniteView.ClipSpaceScaleOffset.zw * PointClip.w;
Out.ViewRect = NaniteView.ViewRect;
#endif
#if VIRTUAL_TEXTURE_TARGET
Out.ViewId = VisibleCluster.ViewId;
#endif
Out.Position = PointClip;
Out.DeviceZ = PointClip.z / PointClip.w;
// Shader workaround to avoid HW depth clipping. Should be replaced with rasterizer state ideally.
#if !NEAR_CLIP
Out.Position.z = 0.5f * Out.Position.w;
#endif
#if VISUALIZE
Out.VisualizeValues = GetVisualizeValues();
#endif
return Out;
}
#if NANITE_PRIM_SHADER
#pragma argument(wavemode=wave64)
#pragma argument(realtypes)
struct PrimitiveInput
{
uint Index : PRIM_SHADER_SEM_VERT_INDEX;
uint WaveIndex : PRIM_SHADER_SEM_WAVE_INDEX;
};
struct PrimitiveOutput
{
VSOut Out;
uint PrimExport : PRIM_SHADER_SEM_PRIM_EXPORT;
uint VertCount : PRIM_SHADER_SEM_VERT_COUNT;
uint PrimCount : PRIM_SHADER_SEM_PRIM_COUNT;
};
// Pack the triangle indices; x, y, and z take 10, 10, and 12 bits respectively.
uint PackTriangleExport(uint3 TriangleIndices)
{
return TriangleIndices.x | (TriangleIndices.y << 10) | (TriangleIndices.z << 20);
}
// Unpack the triangle indices.
uint3 UnpackTriangleExport(uint Packed)
{
const uint Index0 = (Packed & 0x3FF); // Low 10 bits.
const uint Index1 = (Packed >> 10) & 0x3FF; // Middle 10 bits.
const uint Index2 = (Packed >> 20); // High 12 bits.
return uint3(Index0, Index1, Index2);
}
#if VERTEX_TO_TRIANGLE_MASKS // Vertex-to-triangle mask mode.
groupshared uint GroupVertexToTriangleMasks[256][4];
#endif
groupshared uint GroupTriangleCount;
groupshared uint GroupVertexCount;
groupshared uint GroupClusterIndex;
PRIM_SHADER_OUTPUT_TRIANGLES
PRIM_SHADER_PRIM_COUNT(1)
PRIM_SHADER_VERT_COUNT(1)
PRIM_SHADER_VERT_LIMIT(256)
PRIM_SHADER_AMP_FACTOR(128)
PRIM_SHADER_AMP_ENABLE
// Hardware rasterization VS entry (primitive shader path, NANITE_PRIM_SHADER enabled).
PrimitiveOutput HWRasterizeVS(PrimitiveInput Input)
{
const uint LaneIndex = WaveGetLaneIndex();
const uint LaneCount = WaveGetLaneCount();
const uint GroupThreadID = LaneIndex + Input.WaveIndex * LaneCount;
if (GroupThreadID == 0)
{
// Input index is only initialized for lane 0, so we need to manually communicate it to all other threads in subgroup (not just wavefront).
GroupClusterIndex = Input.Index;
}
GroupMemoryBarrierWithGroupSync();
// The code below is similar to MicropolyRasterize.
uint VisibleIndex = GroupClusterIndex;
#if HAS_PREV_DRAW_DATA
VisibleIndex += InTotalPrevDrawClusters[0].y;
#endif
#if ADD_CLUSTER_OFFSET
VisibleIndex += InClusterOffsetSWHW[GetHWClusterCounterIndex(RenderFlags)];
#endif
VisibleIndex = (MaxVisibleClusters - 1) - VisibleIndex;
// Should be all scalar.
FVisibleCluster VisibleCluster = GetVisibleCluster( VisibleIndex, VIRTUAL_TEXTURE_TARGET );
FInstanceSceneData InstanceData = GetInstanceData( VisibleCluster.InstanceId );
FNaniteView NaniteView = GetNaniteView( VisibleCluster.ViewId );
FInstanceDynamicData InstanceDynamicData = CalculateInstanceDynamicData(NaniteView, InstanceData);
FCluster Cluster = GetCluster(VisibleCluster.PageIndex, VisibleCluster.ClusterIndex);
#if VERTEX_TO_TRIANGLE_MASKS
if (GroupThreadID < Cluster.NumVerts)
{
GroupVertexToTriangleMasks[GroupThreadID][0] = 0;
GroupVertexToTriangleMasks[GroupThreadID][1] = 0;
GroupVertexToTriangleMasks[GroupThreadID][2] = 0;
GroupVertexToTriangleMasks[GroupThreadID][3] = 0;
}
#endif
GroupMemoryBarrierWithGroupSync();
PrimitiveOutput PrimOutput;
PrimOutput.VertCount = Cluster.NumVerts;
PrimOutput.PrimCount = Cluster.NumTris;
bool bCullTriangle = false;
if (GroupThreadID < Cluster.NumTris)
{
uint TriangleID = GroupThreadID;
uint3 TriangleIndices = ReadTriangleIndices(Cluster, TriangleID);
if (ReverseWindingOrder(InstanceData))
{
TriangleIndices = uint3(TriangleIndices.x, TriangleIndices.z, TriangleIndices.y);
}
#if VERTEX_TO_TRIANGLE_MASKS
const uint DwordIndex = (GroupThreadID >> 5) & 3;
const uint TriangleMask = 1 << (GroupThreadID & 31);
InterlockedOr(GroupVertexToTriangleMasks[TriangleIndices.x][DwordIndex], TriangleMask);
InterlockedOr(GroupVertexToTriangleMasks[TriangleIndices.y][DwordIndex], TriangleMask);
InterlockedOr(GroupVertexToTriangleMasks[TriangleIndices.z][DwordIndex], TriangleMask);
#endif
PrimOutput.PrimExport = PackTriangleExport(TriangleIndices);
}
GroupMemoryBarrierWithGroupSync();
if (GroupThreadID < Cluster.NumVerts)
{
float4 PointClipNoScaling;
// Decode the vertex and transform it to clip space.
PrimOutput.Out = CommonRasterizerVS(NaniteView, InstanceData, VisibleCluster, Cluster, GroupThreadID, PointClipNoScaling);
#if VERTEX_TO_TRIANGLE_MASKS
PrimOutput.Out.PixelValue = ((VisibleIndex + 1) << 7);
PrimOutput.Out.ToTriangleMasks = uint4(GroupVertexToTriangleMasks[GroupThreadID][0],
GroupVertexToTriangleMasks[GroupThreadID][1],
GroupVertexToTriangleMasks[GroupThreadID][2],
GroupVertexToTriangleMasks[GroupThreadID][3]);
#endif
}
return PrimOutput;
}
#else // !NANITE_PRIM_SHADER (regular vertex shader path)
// Hardware rasterization VS entry (regular vertex shader path, without primitive shaders).
VSOut HWRasterizeVS(
uint VertexID : SV_VertexID,
uint VisibleIndex : SV_InstanceID
)
{
#if HAS_PREV_DRAW_DATA
VisibleIndex += InTotalPrevDrawClusters[0].y;
#endif
#if ADD_CLUSTER_OFFSET
VisibleIndex += InClusterOffsetSWHW[GetHWClusterCounterIndex(RenderFlags)];
#endif
VisibleIndex = (MaxVisibleClusters - 1) - VisibleIndex;
uint TriIndex = VertexID / 3;
VertexID = VertexID - TriIndex * 3;
VSOut Out;
Out.Position = float4(0,0,0,1);
Out.DeviceZ = 0.0f;
FVisibleCluster VisibleCluster = GetVisibleCluster( VisibleIndex, VIRTUAL_TEXTURE_TARGET );
FInstanceSceneData InstanceData = GetInstanceData( VisibleCluster.InstanceId );
FNaniteView NaniteView = GetNaniteView( VisibleCluster.ViewId );
FCluster Cluster = GetCluster(VisibleCluster.PageIndex, VisibleCluster.ClusterIndex);
if( TriIndex < Cluster.NumTris )
{
uint3 TriangleIndices = ReadTriangleIndices( Cluster, TriIndex );
if( ReverseWindingOrder( InstanceData ) )
{
TriangleIndices = uint3( TriangleIndices.x, TriangleIndices.z, TriangleIndices.y );
}
uint VertIndex = TriangleIndices[ VertexID ];
float4 PointClipNoScaling;
// Decode the vertex and transform it to clip space.
Out = CommonRasterizerVS(NaniteView, InstanceData, VisibleCluster, Cluster, VertIndex, PointClipNoScaling);
#if PIXEL_VALUE
Out.PixelValue = ((VisibleIndex + 1) << 7) | TriIndex;
#endif
}
return Out;
}
#endif // NANITE_PRIM_SHADER
// Hardware rasterization PS entry.
void HWRasterizePS(VSOut In)
{
uint2 PixelPos = (uint2)In.Position.xy;
uint PixelValue = 0;
#if PIXEL_VALUE
PixelValue = In.PixelValue;
#endif
#if VERTEX_TO_TRIANGLE_MASKS
uint4 Masks0 = LoadParameterCacheP0( In.ToTriangleMasks );
uint4 Masks1 = LoadParameterCacheP1( In.ToTriangleMasks );
uint4 Masks2 = LoadParameterCacheP2( In.ToTriangleMasks );
uint4 Masks = Masks0 & Masks1 & Masks2;
uint TriangleIndex = Masks.x ? firstbitlow( Masks.x ) :
Masks.y ? firstbitlow( Masks.y ) + 32 :
Masks.z ? firstbitlow( Masks.z ) + 64 :
firstbitlow( Masks.w ) + 96;
PixelValue += TriangleIndex;
#endif
#if VIRTUAL_TEXTURE_TARGET
FNaniteView NaniteView = GetNaniteView(In.ViewId);
#else
FNaniteView NaniteView;
#endif
#if CLUSTER_PER_PAGE
PixelPos += In.ViewRect.xy;
if (all(PixelPos < In.ViewRect.zw))
#elif NANITE_MULTI_VIEW
// In multi-view mode every view has its own scissor, so we have to scissor manually.
if (all(PixelPos >= In.ViewRect.xy && PixelPos < In.ViewRect.zw))
#endif
{
// Write the pixel data: the packed cluster and triangle ID (PixelValue) and the depth (In.DeviceZ).
WritePixel(OutVisBuffer64, PixelValue, PixelPos, In.DeviceZ, NaniteView, VIRTUAL_TEXTURE_TARGET);
#if VISUALIZE
WritePixel(OutDbgBuffer64, In.VisualizeValues.x, PixelPos, In.DeviceZ, NaniteView, VIRTUAL_TEXTURE_TARGET);
InterlockedAdd(OutDbgBuffer32[PixelPos], In.VisualizeValues.y);
#endif
}
}
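The non-scanline branch of RasterizeTri boils down to a classic edge-function walk over the triangle's bounding rectangle: evaluate three edge functions at the first pixel, then update them incrementally per pixel and per row. The following is a minimal CPU-side C++ sketch of that idea only. It is not a port of the shader: it uses plain floats instead of fixed-point subpixels, omits the top-left fill rule and the depth gradient, and uses its own winding convention. `FVec2`, `EdgeFunction`, and `RasterizeTriangle` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FVec2 { float X, Y; };

// Signed edge function of P relative to the directed edge A -> B.
static float EdgeFunction(FVec2 A, FVec2 B, FVec2 P)
{
    return (B.X - A.X) * (P.Y - A.Y) - (B.Y - A.Y) * (P.X - A.X);
}

// Marks covered pixels with 1 in Coverage (row-major, Width * Height) and
// returns how many pixels were covered. Pixels are sampled at their centers.
int RasterizeTriangle(FVec2 V0, FVec2 V1, FVec2 V2, int Width, int Height, std::vector<uint8_t>& Coverage)
{
    assert(Coverage.size() == std::size_t(Width) * std::size_t(Height));

    // Cull back-facing and degenerate triangles (this sketch's convention).
    if (EdgeFunction(V0, V1, V2) <= 0.0f)
        return 0;

    // Bounding rectangle, clipped to the render target.
    const int MinX = std::max(0, int(std::floor(std::min({V0.X, V1.X, V2.X}))));
    const int MinY = std::max(0, int(std::floor(std::min({V0.Y, V1.Y, V2.Y}))));
    const int MaxX = std::min(Width - 1, int(std::ceil(std::max({V0.X, V1.X, V2.X}))));
    const int MaxY = std::min(Height - 1, int(std::ceil(std::max({V0.Y, V1.Y, V2.Y}))));
    if (MinX > MaxX || MinY > MaxY)
        return 0;

    // Edge values at the center of the first pixel.
    const FVec2 Start = { MinX + 0.5f, MinY + 0.5f };
    float Row0 = EdgeFunction(V1, V2, Start);
    float Row1 = EdgeFunction(V2, V0, Start);
    float Row2 = EdgeFunction(V0, V1, Start);

    // Moving one pixel in x or y changes each edge value by a constant.
    const float StepX0 = -(V2.Y - V1.Y), StepY0 = V2.X - V1.X;
    const float StepX1 = -(V0.Y - V2.Y), StepY1 = V0.X - V2.X;
    const float StepX2 = -(V1.Y - V0.Y), StepY2 = V1.X - V0.X;

    int Covered = 0;
    for (int Y = MinY; Y <= MaxY; ++Y)
    {
        float E0 = Row0, E1 = Row1, E2 = Row2;
        for (int X = MinX; X <= MaxX; ++X)
        {
            // Inside when all three edge values are non-negative.
            if (E0 >= 0.0f && E1 >= 0.0f && E2 >= 0.0f)
            {
                Coverage[std::size_t(Y) * Width + X] = 1;
                ++Covered;
            }
            E0 += StepX0; E1 += StepX1; E2 += StepX2;
        }
        Row0 += StepY0; Row1 += StepY1; Row2 += StepY2;
    }
    return Covered;
}
```

Because there is no fill rule here, a pixel whose center lies exactly on an edge shared by two triangles would be covered by both; the shader avoids this by biasing the edge constants (the "Top left rule" lines above).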
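From the analysis above, whether rasterization runs in software or in hardware, the only data written out is the cluster ID, the triangle ID, and the depth (plus some extra data in visualization mode). In other words, no real shading happens at this stage. It resembles the BasePass of deferred rendering but outputs far less information, so both bandwidth and video memory usage drop significantly. This is in fact the Visibility Buffer technique; for details, see section 4.2.3.5 Visibility Buffer of "Dissecting the Unreal Rendering System (04): Deferred Rendering Pipeline" (剖析虚幻渲染体系(04)- 延迟渲染管线).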
Storage layout after Nanite rasterization: the ClusterID takes 25 bits, the triangle ID takes 7 bits, and the depth takes 32 bits.
Illustration of the Nanite rasterization results; from top to bottom: ClusterID, triangle ID, and depth.
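The 25 + 7 + 32 bit layout can be made concrete with a small C++ sketch. The 32-bit pixel value follows the shader code shown earlier, `((VisibleIndex + 1) << 7) | TriangleID`. Everything else is an assumption for illustration: that the depth occupies the high 32 bits of the 64-bit word, that the depth is reversed-Z (larger means closer), and that WritePixel resolves visibility with a 64-bit atomic max, emulated here by `std::max`. The function and struct names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint32_t TriangleIdBits = 7;
constexpr uint32_t TriangleIdMask = (1u << TriangleIdBits) - 1u;

// Mirrors the shader: visible cluster index (plus one) above a 7-bit triangle ID.
// The +1 keeps the value non-zero, so 0 can mean "nothing was drawn here".
inline uint32_t PackPixelValue(uint32_t VisibleIndex, uint32_t TriangleID)
{
    assert(TriangleID <= TriangleIdMask);
    assert(VisibleIndex + 1u < (1u << 25));
    return ((VisibleIndex + 1u) << TriangleIdBits) | TriangleID;
}

// Bit pattern of a float. For non-negative floats, comparing the bit patterns
// as unsigned integers gives the same order as comparing the floats.
inline uint32_t FloatToBits(float Value)
{
    uint32_t Bits;
    std::memcpy(&Bits, &Value, sizeof(Bits));
    return Bits;
}

// Assumed layout: depth in the high 32 bits, pixel value in the low 32 bits.
inline uint64_t PackVisPixel(float DeviceZ, uint32_t PixelValue)
{
    return (uint64_t(FloatToBits(DeviceZ)) << 32) | PixelValue;
}

// CPU stand-in for a 64-bit atomic max: the larger depth wins, and the IDs
// of the winning triangle travel along in the low bits.
inline void WritePixelMax(uint64_t& Dest, uint64_t Value)
{
    Dest = std::max(Dest, Value);
}

struct FDecodedPixel
{
    bool bCovered;
    uint32_t VisibleIndex;
    uint32_t TriangleID;
};

inline FDecodedPixel DecodeVisPixel(uint64_t Packed)
{
    const uint32_t PixelValue = uint32_t(Packed & 0xFFFFFFFFu);
    const uint32_t IndexPlusOne = PixelValue >> TriangleIdBits;
    FDecodedPixel Out;
    Out.bCovered = IndexPlusOne != 0u;
    Out.VisibleIndex = Out.bCovered ? IndexPlusOne - 1u : 0u;
    Out.TriangleID = PixelValue & TriangleIdMask;
    return Out;
}
```

Putting the depth in the most significant bits is what lets one atomic operation act as both the depth test and the ID write, which is why neither rasterizer needs a separate depth buffer bound while writing the visibility buffer.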
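After Nanite rasterization there is another important step, Nanite::EmitDepthTargets, whose job is to generate buffer data such as the scene depth, stencil, velocity, and material depth: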
The stencil buffer marks which pixels were rendered by Nanite:
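The most interesting one is the material depth, which records which material covers each front-most pixel of the scene. Essentially, it is a material ID converted to a unique depth value and stored in a depth-stencil texture. In effect, every material gets its own gray value, so that Early Z can be exploited for optimization later on.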
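To see why a unique depth per material is useful, consider this minimal C++ sketch. It maps a material ID to a depth value and then emulates one full-screen pass per material with a depth-equal test, so only the pixels owned by that material survive. The mapping `(MaterialId + 1) / MaxMaterialSlots` and the constant of 2^14 slots are assumptions made for illustration, as are all names; the engine's actual encoding is not shown in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical number of distinct material slots. A power of two keeps every
// quotient below exactly representable as a float, so equality tests are safe.
constexpr uint32_t MaxMaterialSlots = 1u << 14;

// Assumed mapping from a material ID to a unique depth in (0, 1].
inline float MaterialIdToDepth(uint32_t MaterialId)
{
    assert(MaterialId < MaxMaterialSlots);
    return float(MaterialId + 1u) / float(MaxMaterialSlots);
}

// Emulates one full-screen material pass whose depth test is "equal":
// a pixel is shaded only if the material depth buffer holds this material's depth.
inline std::size_t CountPixelsPassingDepthEqual(const std::vector<float>& MaterialDepthBuffer, uint32_t MaterialId)
{
    const float Reference = MaterialIdToDepth(MaterialId);
    std::size_t Shaded = 0;
    for (float Stored : MaterialDepthBuffer)
    {
        if (Stored == Reference)
            ++Shaded;
    }
    return Shaded;
}
```

On a GPU the comparison is done by the fixed-function depth unit before the pixel shader runs, so pixels that belong to other materials are rejected by Early Z without executing any material code.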
This subsection covers how Nanite's BasePass generates the GBuffer. Its main flow inside FDeferredShadingSceneRenderer::Render is as follows:
void FDeferredShadingSceneRenderer::Render(FRDGBuilder& GraphBuilder)
{
(......)
// Render the BasePass (regular meshes first, then Nanite).
{
// Draw the regular BasePass.
RenderBasePass(GraphBuilder, SceneTextures, DBufferTextures, BasePassDepthStencilAccess, ForwardScreenSpaceShadowMaskTexture, InstanceCullingManager);
AddServiceLocalQueuePass(GraphBuilder);
if (bNaniteEnabled && bShouldApplyNaniteMaterials)
{
for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ++ViewIndex)
{
const FViewInfo& View = Views[ViewIndex];
Nanite::FRasterResults& RasterResults = NaniteRasterResults[ViewIndex];
// If depth was not emitted in a pre-pass, emit it now.
if (!bNeedsPrePass)
{
Nanite::EmitDepthTargets(
GraphBuilder,
*Scene,
Views[ViewIndex],
RasterResults.SOAStrides,
RasterResults.VisibleClustersSWHW,
RasterResults.ViewsBuffer,
SceneTextures.Depth.Target,
RasterResults.VisBuffer64,
RasterResults.MaterialDepth,
RasterResults.NaniteMask,
RasterResults.VelocityBuffer,
bNeedsPrePass
);
}
// Draw the Nanite BasePass.
Nanite::DrawBasePass(
GraphBuilder,
SceneTextures,
DBufferTextures,
*Scene,
View,
RasterResults
);
}
}
// Resolve the scene depth.
if (!bAllowReadOnlyDepthBasePass)
{
AddResolveSceneDepthPass(GraphBuilder, Views, SceneTextures.Depth);
}
(......)
}
(......)
}
Note that the BasePass is drawn twice above: once by the traditional RenderBasePass and once by the Nanite version, Nanite::DrawBasePass. Nanite::DrawBasePass is analyzed below:
// Engine\Source\Runtime\Renderer\Private\Nanite\NaniteRender.cpp
void DrawBasePass(
FRDGBuilder& GraphBuilder,
const FSceneTextures& SceneTextures,
const FDBufferTextures& DBufferTextures,
const FScene& Scene,
const FViewInfo& View,
const FRasterResults& RasterResults
)
{
(......)
RDG_EVENT_SCOPE(GraphBuilder, "Nanite::BasePass");
const int32 ViewWidth = View.ViewRect.Max.X - View.ViewRect.Min.X;
const int32 ViewHeight = View.ViewRect.Max.Y - View.ViewRect.Min.Y;
const FIntPoint ViewSize = FIntPoint(ViewWidth, ViewHeight);
const FRDGSystemTextures& SystemTextures = FRDGSystemTextures::Get(GraphBuilder);
FRenderTargetBindingSlots GBufferRenderTargets;
SceneTextures.GetGBufferRenderTargets(ERenderTargetLoadAction::ELoad, GBufferRenderTargets);
// Initialize the texture references.
FRDGTextureRef MaterialDepth = RasterResults.MaterialDepth ? RasterResults.MaterialDepth : SystemTextures.Black;
FRDGTextureRef VisBuffer64 = RasterResults.VisBuffer64 ? RasterResults.VisBuffer64 : SystemTextures.Black;
FRDGTextureRef DbgBuffer64 = RasterResults.DbgBuffer64 ? RasterResults.DbgBuffer64 : SystemTextures.Black;
FRDGTextureRef DbgBuffer32 = RasterResults.DbgBuffer32 ? RasterResults.DbgBuffer32 : SystemTextures.Black;
FRDGBufferRef VisibleClustersSWHW = RasterResults.VisibleClustersSWHW;
// Check the material culling mode. Wave operations require SM6; platforms without them fall back to mode 4.
if (!FDataDrivenShaderPlatformInfo::GetSupportsWaveOperations(GMaxRHIShaderPlatform) &&
(GNaniteMaterialCulling == 1 || GNaniteMaterialCulling == 2))
{
UE_LOG(LogNanite, Warning, TEXT("r.Nanite.MaterialCulling set to %d which requires wave-ops (not supported on this platform), switching to mode 4"), GNaniteMaterialCulling);
GNaniteMaterialCulling = 4;
}
// Use a local copy so this view can be overridden without changing the setting for every view.
int32 NaniteMaterialCulling = GNaniteMaterialCulling;
if ((NaniteMaterialCulling == 1 || NaniteMaterialCulling == 2) && (View.ViewRect.Min.X != 0 || View.ViewRect.Min.Y != 0))
{
NaniteMaterialCulling = 4;
static bool bLoggedAlready = false;
if (!bLoggedAlready)
{
bLoggedAlready = true;
UE_LOG(LogNanite, Warning, TEXT("View has non-zero viewport offset, using material culling mode 4 (overrides r.Nanite.MaterialCulling = %d)."), GNaniteMaterialCulling);
}
}
// Bitmask culling.
const bool b32BitMaskCulling = (NaniteMaterialCulling == 1 || NaniteMaterialCulling == 2);
// Tile-grid culling.
const bool bTileGridCulling = (NaniteMaterialCulling == 3 || NaniteMaterialCulling == 4);
const FIntPoint TileGridDim = bTileGridCulling ? FMath::DivideAndRoundUp(ViewSize, { 64, 64 }) : FIntPoint(1, 1);
// Create the textures and buffers.
FRDGBufferDesc VisibleMaterialsDesc = FRDGBufferDesc::CreateStructuredDesc(4, b32BitMaskCulling ? FNaniteCommandInfo::MAX_STATE_BUCKET_ID+1 : 1);
FRDGBufferRef VisibleMaterials = GraphBuilder.CreateBuffer(VisibleMaterialsDesc, TEXT("Nanite.VisibleMaterials"));
FRDGBufferUAVRef VisibleMaterialsUAV = GraphBuilder.CreateUAV(VisibleMaterials);
FRDGTextureDesc MaterialRangeDesc = FRDGTextureDesc::Create2D(TileGridDim, PF_R32G32_UINT, FClearValueBinding::Black, TexCreate_ShaderResource | TexCreate_UAV);
FRDGTextureRef MaterialRange = GraphBuilder.CreateTexture(MaterialRangeDesc, TEXT("Nanite.MaterialRange"));
FRDGTextureUAVRef MaterialRangeUAV = GraphBuilder.CreateUAV(MaterialRange);
FRDGTextureSRVDesc MaterialRangeSRVDesc = FRDGTextureSRVDesc::Create(MaterialRange);
FRDGTextureSRVRef MaterialRangeSRV = GraphBuilder.CreateSRV(MaterialRangeSRVDesc);
// Clear the buffer and the texture.
AddClearUAVPass(GraphBuilder, VisibleMaterialsUAV, 0);
AddClearUAVPass(GraphBuilder, MaterialRangeUAV, { 0u, 1u, 0u, 0u });
// Classify materials for tile culling.
if (b32BitMaskCulling || bTileGridCulling)
{
FClassifyMaterialsCS::FParameters* PassParameters = GraphBuilder.AllocParameters<FClassifyMaterialsCS::FParameters>();
PassParameters->View = View.ViewUniformBuffer;
PassParameters->VisibleClustersSWHW = GraphBuilder.CreateSRV(VisibleClustersSWHW);
PassParameters->SOAStrides = RasterResults.SOAStrides;
PassParameters->ClusterPageData = Nanite::GStreamingManager.GetClusterPageDataSRV();
PassParameters->ClusterPageHeaders = Nanite::GStreamingManager.GetClusterPageHeadersSRV();
PassParameters->VisBuffer64 = VisBuffer64;
PassParameters->MaterialDepthTable = Scene.MaterialTables[ENaniteMeshPass::BasePass].GetDepthTableSRV();
uint32 DispatchGroupSize = 0;
PassParameters->ViewRect = FIntVector4(View.ViewRect.Min.X, View.ViewRect.Min.Y, View.ViewRect.Max.X, View.ViewRect.Max.Y);
if (b32BitMaskCulling)
{
checkf(View.ViewRect.Min.X == 0 && View.ViewRect.Min.Y == 0, TEXT("Viewport offset support is not implemented."));
DispatchGroupSize = 8;
PassParameters->VisibleMaterials = VisibleMaterialsUAV;
}
else if (bTileGridCulling)
{
DispatchGroupSize = 64;
PassParameters->FetchClamp = View.ViewRect.Max - 1;
PassParameters->MaterialRange = MaterialRangeUAV;
}
const FIntVector DispatchDim = FComputeShaderUtils::GetGroupCount(View.ViewRect.Max - View.ViewRect.Min, DispatchGroupSize);
FClassifyMaterialsCS::FPermutationDomain PermutationVector;
PermutationVector.Set<FClassifyMaterialsCS::FCullingMethodDim>(NaniteMaterialCulling);
auto ComputeShader = View.ShaderMap->GetShader<FClassifyMaterialsCS>(PermutationVector.ToDimensionValueId());
// The material classification CS pass.
FComputeShaderUtils::AddPass(
GraphBuilder,
RDG_EVENT_NAME("Classify Materials"),
ComputeShader,
PassParameters,
DispatchDim
);
}
// Render the GBuffer.
{
// Set up the pass parameters.
FNaniteEmitGBufferParameters* PassParameters = GraphBuilder.AllocParameters<FNaniteEmitGBufferParameters>();
PassParameters->SOAStrides = RasterResults.SOAStrides;
PassParameters->MaxVisibleClusters = RasterResults.MaxVisibleClusters;
PassParameters->MaxNodes = RasterResults.MaxNodes;
PassParameters->RenderFlags = RasterResults.RenderFlags;
PassParameters->ClusterPageData = Nanite::GStreamingManager.GetClusterPageDataSRV();
PassParameters->ClusterPageHeaders = Nanite::GStreamingManager.GetClusterPageHeadersSRV();
PassParameters->VisibleClustersSWHW = GraphBuilder.CreateSRV(VisibleClustersSWHW);
PassParameters->MaterialRange = MaterialRange;
PassParameters->VisibleMaterials = GraphBuilder.CreateSRV(VisibleMaterials, PF_R32_UINT);
PassParameters->VisBuffer64 = VisBuffer64; // Visibility buffer
PassParameters->DbgBuffer64 = DbgBuffer64;
PassParameters->DbgBuffer32 = DbgBuffer32;
PassParameters->RenderTargets = GBufferRenderTargets; // Render targets
// Uniform Buffer
PassParameters->View = View.ViewUniformBuffer; // To get VTFeedbackBuffer
PassParameters->BasePass = CreateOpaqueBasePassUniformBuffer(GraphBuilder, View, 0, {}, DBufferTextures, nullptr);
switch (NaniteMaterialCulling)
{
// Render with an 8x4 grid: 32 bits in total, one bit per tile.
case 1:
case 2:
PassParameters->GridSize.X = 8;
PassParameters->GridSize.Y = 4;
break;
// Render in tiles of 64x64 pixels.
case 3:
case 4:
PassParameters->GridSize = FMath::DivideAndRoundUp(View.ViewRect.Max - View.ViewRect.Min, { 64, 64 });
break;
// Render a single full-screen rectangle.
default:
PassParameters->GridSize.X = 1;
PassParameters->GridSize.Y = 1;
break;
}
const FExclusiveDepthStencil MaterialDepthStencil = UseComputeDepthExport()
? FExclusiveDepthStencil::DepthWrite_StencilNop
: FExclusiveDepthStencil::DepthWrite_StencilWrite;
PassParameters->RenderTargets.DepthStencil = FDepthStencilBinding(
MaterialDepth,
ERenderTargetLoadAction::ELoad,
ERenderTargetLoadAction::ELoad,
MaterialDepthStencil
);
TShaderMapRef<FNaniteMaterialVS> NaniteVertexShader(View.ShaderMap);
// Add the raster pass.
GraphBuilder.AddPass(
RDG_EVENT_NAME("Emit GBuffer"),
PassParameters,
ERDGPassFlags::Raster,
[PassParameters, &Scene, NaniteVertexShader, ViewRect = View.ViewRect, NaniteMaterialCulling](FRHICommandListImmediate& RHICmdList)
{
RHICmdList.SetViewport(ViewRect.Min.X, ViewRect.Min.Y, 0.0f, ViewRect.Max.X, ViewRect.Max.Y, 1.0f);
// Fill in the uniform buffer parameters.
FNaniteUniformParameters UniformParams;
UniformParams.SOAStrides = PassParameters->SOAStrides;
UniformParams.MaxVisibleClusters= PassParameters->MaxVisibleClusters;
UniformParams.MaxNodes = PassParameters->MaxNodes;
UniformParams.RenderFlags = PassParameters->RenderFlags;
UniformParams.MaterialConfig.X = NaniteMaterialCulling;
UniformParams.MaterialConfig.Y = PassParameters->GridSize.X;
UniformParams.MaterialConfig.Z = PassParameters->GridSize.Y;
UniformParams.MaterialConfig.W = 0;
UniformParams.RectScaleOffset = FVector4(1.0f, 1.0f, 0.0f, 0.0f); // Render a rect that covers the entire screen
// Material culling mode.
if (NaniteMaterialCulling == 3 || NaniteMaterialCulling == 4)
{
FIntPoint ScaledSize = PassParameters->GridSize * 64;
UniformParams.RectScaleOffset.X = float(ScaledSize.X) / float(ViewRect.Max.X - ViewRect.Min.X);
UniformParams.RectScaleOffset.Y = float(ScaledSize.Y) / float(ViewRect.Max.Y - ViewRect.Min.Y);
}
// Cluster page and visibility data.
UniformParams.ClusterPageData = PassParameters->ClusterPageData;
UniformParams.ClusterPageHeaders = PassParameters->ClusterPageHeaders;
UniformParams.VisibleClustersSWHW = PassParameters->VisibleClustersSWHW->GetRHI();
// Material data.
UniformParams.MaterialRange = PassParameters->MaterialRange->GetRHI();
UniformParams.VisibleMaterials = PassParameters->VisibleMaterials->GetRHI();
// Visibility data.
UniformParams.VisBuffer64 = PassParameters->VisBuffer64->GetRHI();
UniformParams.DbgBuffer64 = PassParameters->DbgBuffer64->GetRHI();
UniformParams.DbgBuffer32 = PassParameters->DbgBuffer32->GetRHI();
const_cast<FScene&>(Scene).UniformBuffers.NaniteUniformBuffer.UpdateUniformBufferImmediate(UniformParams);
FGraphicsMinimalPipelineStateSet GraphicsMinimalPipelineStateSet;
TArray<FNaniteMaterialPassCommand, SceneRenderingAllocator> NaniteMaterialPassCommands;
// Build the commands for the Nanite material pass.
BuildNaniteMaterialPassCommands(RHICmdList, Scene.NaniteDrawCommands[ENaniteMeshPass::BasePass], NaniteMaterialPassCommands);
FMeshDrawCommandStateCache StateCache;
const uint32 TileCount = UniformParams.MaterialConfig.Y * UniformParams.MaterialConfig.Z; // (W * H)
// Iterate over all material pass commands and submit them one by one.
for (auto CommandsIt = NaniteMaterialPassCommands.CreateConstIterator(); CommandsIt; ++CommandsIt)
{
SubmitNaniteMaterialPassCommand(*CommandsIt, NaniteVertexShader, GraphicsMinimalPipelineStateSet, TileCount, RHICmdList, StateCache);
}
});
}
}
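Before the BasePass is rendered, a material classification pass (Classify Materials) runs; it matters for later steps such as material culling. It uses a compute shader to analyze the full-screen Visibility Buffer and outputs a 20x12 = 240 pixel texture called the material range (format R32G32_UINT). Each pixel encodes the range of materials that appear in the 64x64 screen region covered by the corresponding tile. Visualized, it looks like this: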
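The classification step can be approximated on the CPU with the sketch below. It assumes a per-pixel material ID has already been resolved from the Visibility Buffer, and that ID 0 means "no Nanite material"; those conventions, the `MaterialRange` struct, and `ClassifyMaterials` are hypothetical. The tile count follows the same round-up division as `FMath::DivideAndRoundUp` in the listing above, so a 1280x720 view would give the 20x12 grid mentioned in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kTileSize = 64;  // each range texel covers 64x64 screen pixels

inline int DivideAndRoundUp(int value, int divisor) {
    return (value + divisor - 1) / divisor;
}

// One texel of the material range texture. The range is empty while minId > maxId.
struct MaterialRange { uint32_t minId; uint32_t maxId; };

// materialIds holds one ID per pixel in row-major order.
std::vector<MaterialRange> ClassifyMaterials(const std::vector<uint32_t>& materialIds,
                                             int width, int height) {
    assert(materialIds.size() == std::size_t(width) * std::size_t(height));
    const int tilesX = DivideAndRoundUp(width, kTileSize);
    const int tilesY = DivideAndRoundUp(height, kTileSize);
    std::vector<MaterialRange> ranges(std::size_t(tilesX) * tilesY,
                                      MaterialRange{UINT32_MAX, 0u});
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const uint32_t id = materialIds[std::size_t(y) * width + x];
            if (id == 0u) continue;  // pixel not covered by Nanite
            MaterialRange& r = ranges[std::size_t(y / kTileSize) * tilesX + x / kTileSize];
            r.minId = std::min(r.minId, id);
            r.maxId = std::max(r.maxId, id);
        }
    }
    return ranges;
}
```

On the GPU the min and max would be accumulated with atomics by many threads at once; the serial loops here only show what ends up stored per tile.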
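The Wave Operation used in the code above is a DirectX concept; its Vulkan counterpart is the Subgroup. It is only supported from SM6 onward. For details, see the GDC 2017 talk Wave-Programming-D3D12-Vulkan.
When the code above builds the draw commands for the Nanite material pass, the source data is Scene.NaniteDrawCommands[ENaniteMeshPass::BasePass]. That data is generated in FPrimitiveSceneInfo::UpdateStaticMeshes, through the following call chain: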
// Engine\Source\Runtime\Renderer\Private\PrimitiveSceneInfo.cpp
void FPrimitiveSceneInfo::UpdateStaticMeshes(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos, bool bReAddToDrawLists)
{
(......)
if (bReAddToDrawLists)
{
CacheMeshDrawCommands(RHICmdList, Scene, SceneInfos);
// Cache the Nanite draw commands.
CacheNaniteDrawCommands(RHICmdList, Scene, SceneInfos);
}
}
void FPrimitiveSceneInfo::CacheNaniteDrawCommands(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos)
{
(......)
// Iterate over the scene's primitive scene infos and build Nanite draw commands for each.
for (FPrimitiveSceneInfo* PrimitiveSceneInfo : SceneInfos)
{
BuildNaniteDrawCommands(RHICmdList, Scene, PrimitiveSceneInfo);
}
(......)
}
void BuildNaniteDrawCommands(FRHICommandListImmediate& RHICmdList, FScene* Scene, FPrimitiveSceneInfo* PrimitiveSceneInfo)
{
(......)
for (int32 MeshPass = 0; MeshPass < ENaniteMeshPass::Num; ++MeshPass)
{
FNaniteDrawListContext NaniteDrawListContext(Scene->NaniteDrawCommandLock[MeshPass], Scene->NaniteDrawCommands[MeshPass]);
// Create the Nanite mesh processor.
FMeshPassProcessor* NaniteMeshProcessor = nullptr;
switch (MeshPass)
{
case ENaniteMeshPass::BasePass:
NaniteMeshProcessor = CreateNaniteMeshProcessor(Scene, nullptr, &NaniteDrawListContext);
break;
case ENaniteMeshPass::LumenCardCapture:
NaniteMeshProcessor = CreateLumenCardNaniteMeshProcessor(Scene, nullptr, &NaniteDrawListContext);
break;
default:
check(false);
}
// Iterate over all static meshes and build Nanite draw commands for those that support Nanite rendering.
int32 StaticMeshesCount = PrimitiveSceneInfo->StaticMeshes.Num();
for (int32 MeshIndex = 0; MeshIndex < StaticMeshesCount; ++MeshIndex)
{
FStaticMeshBatchRelevance& MeshRelevance = PrimitiveSceneInfo->StaticMeshRelevances[MeshIndex];
FStaticMeshBatch& Mesh = PrimitiveSceneInfo->StaticMeshes[MeshIndex];
if (MeshRelevance.bSupportsNaniteRendering)
{
uint64 BatchElementMask = ~0ull;
// Add the mesh batch to the mesh processor. The remaining steps match the traditional path and are not traced further.
NaniteMeshProcessor->AddMeshBatch(Mesh, BatchElementMask, Proxy);
FNaniteCommandInfo CommandInfo = NaniteDrawListContext.GetCommandInfoAndReset();
PrimitiveSceneInfo->NaniteCommandInfos[MeshPass].Add(CommandInfo);
const uint32 MaterialDepthId = CommandInfo.GetMaterialId();
const uint32 SectionIndex = Mesh.SegmentIndex;
PrimitiveSceneInfo->NaniteMaterialIds[MeshPass][SectionIndex] = MaterialDepthId;
}
}
NaniteMeshProcessor->~FMeshPassProcessor();
}
(......)
}
Next come the two key functions used by Nanite::DrawBasePass, BuildNaniteMaterialPassCommands and SubmitNaniteMaterialPassCommand:
// Engine\Source\Runtime\Renderer\Private\Nanite\NaniteRender.cpp
// Build the commands for the Nanite material pass.
static void BuildNaniteMaterialPassCommands(
FRHICommandListImmediate& RHICmdList,
const FStateBucketMap& NaniteDrawCommands,
TArray<FNaniteMaterialPassCommand, SceneRenderingAllocator>& OutNaniteMaterialPassCommands)
{
OutNaniteMaterialPassCommands.Reset(NaniteDrawCommands.Num());
FGraphicsMinimalPipelineStateSet GraphicsMinimalPipelineStateSet;
const int32 MaterialSortMode = GNaniteMaterialSortMode;
// Iterate over all Nanite draw commands and build the corresponding FNaniteMaterialPassCommand.
for (auto& Command : NaniteDrawCommands)
{
// Construct the FNaniteMaterialPassCommand instance.
FNaniteMaterialPassCommand PassCommand(Command.Key);
Experimental::FHashElementId SetId = NaniteDrawCommands.FindId(Command.Key);
int32 DrawIdx = SetId.GetIndex();
PassCommand.MaterialDepth = FNaniteCommandInfo::GetDepthId(DrawIdx);
// Replace the default sort key with the pipeline state sort key.
if (MaterialSortMode == 2 && GRHISupportsPipelineStateSortKey)
{
const FMeshDrawCommand& MeshDrawCommand = Command.Key;
const FGraphicsMinimalPipelineStateInitializer& MeshPipelineState = MeshDrawCommand.CachedPipelineId.GetPipelineState(GraphicsMinimalPipelineStateSet);
FGraphicsPipelineState* PipelineState = PipelineStateCache::GetAndOrCreateGraphicsPipelineState(RHICmdList, MeshPipelineState.AsGraphicsPipelineStateInitializer(), EApplyRendertargetOption::DoNothing);
if (PipelineState)
{
const uint64 StateSortKey = PipelineStateCache::RetrieveGraphicsPipelineStateSortKey(PipelineState);
if (StateSortKey != 0)
{
PassCommand.SortKey = StateSortKey;
}
}
}
// Append to the command list.
OutNaniteMaterialPassCommands.Emplace(PassCommand);
}
// Sort the materials.
if (MaterialSortMode != 0)
{
OutNaniteMaterialPassCommands.Sort();
}
}
// Submit a single material pass draw command.
static void SubmitNaniteMaterialPassCommand(
const FMeshDrawCommand& MeshDrawCommand,
const float MaterialDepth,
const TShaderRef<FNaniteMaterialVS>& NaniteVertexShader,
const FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet,
const uint32 InstanceFactor,
FRHICommandList& RHICmdList,
FMeshDrawCommandStateCache& StateCache)
{
// Begin draw submission.
FMeshDrawCommand::SubmitDrawBegin(MeshDrawCommand, GraphicsMinimalPipelineStateSet, nullptr, 0, InstanceFactor, RHICmdList, StateCache);
// All Nanite mesh draw commands share the same VS, whose material depth is assigned at render time.
{
FNaniteMaterialVS::FParameters Parameters;
Parameters.MaterialDepth = MaterialDepth;
SetShaderParameters(RHICmdList, NaniteVertexShader, NaniteVertexShader.GetVertexShader(), Parameters);
}
// End draw submission.
FMeshDrawCommand::SubmitDrawEnd(MeshDrawCommand, InstanceFactor, RHICmdList);
}
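Oddly, the BasePass draw only specifies a VS and no PS. So where is the PS set, or is it simply empty? A RenderDoc frame capture shows that the PS is still the traditional BasePassPixelShader, and that the GBuffer produced by this stage is essentially the same as the traditional one:
Top left: rendered frame; top right: GBufferA; bottom left: GBufferB; bottom right: GBufferC.
Nanite submits its BasePass one material per pass, which means the previously rendered material range texture and material depth can be used for fast culling. Take the following two images as an example:
When the material region in the first image is rendered, pixels are quickly tested and culled against the material depth and the material range. In the second image, a red box means that none of the pixels it covers passed the material range test, so the whole tile is discarded in the vertex shader. Green pixels passed both the depth test and the material range test and are sent to the PS, which writes the GBuffer.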
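The per-tile rejection described above can be sketched as a simple interval test. The code below is a CPU illustration under assumed conventions: `MaterialRange`, `TileMayContain`, and `SurvivingTiles` are hypothetical names, and an empty tile is encoded as `minId > maxId`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Range of material IDs present in one 64x64 tile; empty while minId > maxId.
struct MaterialRange { uint32_t minId; uint32_t maxId; };

// Conservative test: true when the tile may contain the material.
inline bool TileMayContain(const MaterialRange& r, uint32_t materialId) {
    return r.minId <= materialId && materialId <= r.maxId;
}

// Indices of the tiles that still need to be drawn for one material pass.
// A vertex shader could make the same decision per tile instance and collapse
// rejected tiles so that they generate no pixel work at all.
inline std::vector<std::size_t> SurvivingTiles(const std::vector<MaterialRange>& ranges,
                                               uint32_t materialId) {
    std::vector<std::size_t> tiles;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        if (TileMayContain(ranges[i], materialId)) tiles.push_back(i);
    }
    return tiles;
}
```

The range test is conservative: a tile whose range is [3, 7] survives a pass for material 5 even if no pixel in it uses material 5. The per-pixel material depth test then removes those pixels, which is why the two tests are used together.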
Nanite lighting and shadowing are interleaved with the traditional path; both live in RenderLights:
// Engine\Source\Runtime\Renderer\Private\LightRendering.cpp
void FDeferredShadingSceneRenderer::RenderLights(
FRDGBuilder& GraphBuilder,
FMinimalSceneTextures& SceneTextures,
const FTranslucencyLightingVolumeTextures& TranslucencyLightingVolumeTextures,
FRDGTextureRef LightingChannelsTexture,
FSortedLightSetSceneInfo& SortedLightSet)
{
(......)
const FSimpleLightArray &SimpleLights = SortedLightSet.SimpleLights;
const TArray<FSortedLightSceneInfo, SceneRenderingAllocator> &SortedLights = SortedLightSet.SortedLights;
const int32 AttenuationLightStart = SortedLightSet.AttenuationLightStart;
const int32 SimpleLightsEnd = SortedLightSet.SimpleLightsEnd;
(......)
{
RDG_EVENT_SCOPE(GraphBuilder, "DirectLighting");
if (ViewFamily.EngineShowFlags.DirectLighting &&
Strata::IsStrataEnabled() && Strata::IsClassificationEnabled())
{
// Update the stencil buffer once, marking simple/complex Strata materials for all subsequent passes.
Strata::AddStrataStencilPass(GraphBuilder, Views, SceneTextures);
}
(......)
// Unshadowed lights.
if(ViewFamily.EngineShowFlags.DirectLighting)
{
RDG_EVENT_SCOPE(GraphBuilder, "NonShadowedLights");
(......)
}
// Shadowed lights.
{
RDG_EVENT_SCOPE(GraphBuilder, "ShadowedLights");
(......)
// Draw shadowed lights and lights with light functions.
for (int32 LightIndex = AttenuationLightStart; LightIndex < SortedLights.Num(); LightIndex++)
{
(......)
if (bDrawShadows)
{
INC_DWORD_STAT(STAT_NumShadowedLights);
(......)
else // (OcclusionType == FOcclusionType::Shadowmap)
{
(......)
// Clear the shadow mask texture.
ClearShadowMask(ScreenShadowMaskTexture);
// Render the shadow projections.
RenderDeferredShadowProjections(GraphBuilder, SceneTextures, TranslucencyLightingVolumeTextures, &LightSceneInfo, ScreenShadowMaskTexture, ScreenShadowMaskSubPixelTexture, bInjectedTranslucentVolume);
}
bUsedShadowMaskTexture = true;
}
(......)
if (bDirectLighting)
{
const bool bRenderOverlap = false;
// Render a single light.
RenderLight(GraphBuilder, SceneTextures, &LightSceneInfo, ScreenShadowMaskTexture, LightingChannelsTexture, bRenderOverlap);
}
(......)
}
}
}
}
The RenderLights logic in UE5 is very close to that of UE4; the only addition is the initialization of the Strata stencil. Next comes the logic of RenderLight:
void FDeferredShadingSceneRenderer::RenderLight(
FRHICommandList& RHICmdList,
const FViewInfo& View,
const FLightSceneInfo* LightSceneInfo,
FRHITexture* ScreenShadowMaskTexture,
FRHITexture* LightingChannelsTexture,
bool bRenderOverlap, bool bIssueDrawEvent)
{
(......)
// Internal helper that renders the light.
auto RenderInternalLight = [&](bool bStrataFastPath)
{
(......)
// Set the Strata depth-stencil state.
if (Strata::IsStrataEnabled() && Strata::IsClassificationEnabled())
{
GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<
false, CF_Always,
true, CF_Equal, SO_Keep, SO_Keep, SO_Keep,
true, CF_Equal, SO_Keep, SO_Keep, SO_Keep,
Strata::StencilBit, 0x0>::GetRHI();
}
else
{
GraphicsPSOInit.DepthStencilState = TStaticDepthStencilState<false, CF_Always>::GetRHI();
}
(......)
if (LightProxy->GetLightType() == LightType_Directional)
{
(......)
else
{
(......)
FDeferredLightPS::FPermutationDomain PermutationVector;
(......)
// Added the Strata shader permutations.
PermutationVector.Set< FDeferredLightPS::FStrata >(Strata::IsStrataEnabled());
PermutationVector.Set< FDeferredLightPS::FStrataFastPath >(Strata::IsStrataEnabled() && Strata::IsClassificationEnabled() && bStrataFastPath);
TShaderMapRef< FDeferredLightPS > PixelShader( View.ShaderMap, PermutationVector );
(......)
}
(......)
// Set the Strata stencil reference value.
RHICmdList.SetStencilRef(bStrataFastPath ? Strata::StencilBit : 0u);
// Draw the directional light as a full-screen rectangle.
DrawRectangle(
RHICmdList,
0, 0,
View.ViewRect.Width(), View.ViewRect.Height(),
View.ViewRect.Min.X, View.ViewRect.Min.Y,
View.ViewRect.Width(), View.ViewRect.Height(),
View.ViewRect.Size(),
GetSceneTextureExtent(),
VertexShader,
EDRF_UseTriangleOptimization);
}
else // Non-directional (local) light
{
(......)
TShaderMapRef<TDeferredLightVS<true> > VertexShader(View.ShaderMap);
// Whether the camera is inside the light geometry.
const bool bCameraInsideLightGeometry = ((FVector)View.ViewMatrices.GetViewOrigin() - LightBounds.Center).SizeSquared() < FMath::Square(LightBounds.W * 1.05f + View.NearClippingDistance * 2.0f)
|| !View.IsPerspectiveProjection();
// Set the rasterizer and depth state for the bounding geometry; bCameraInsideLightGeometry is passed in here.
SetBoundingGeometryRasterizerAndDepthState(GraphicsPSOInit, View, bCameraInsideLightGeometry);
(......)
else
{
(......)
// Strata.
PermutationVector.Set< FDeferredLightPS::FStrata >(Strata::IsStrataEnabled());
PermutationVector.Set< FDeferredLightPS::FStrataFastPath >(Strata::IsStrataEnabled() && Strata::IsClassificationEnabled() && bStrataFastPath);
TShaderMapRef< FDeferredLightPS > PixelShader( View.ShaderMap, PermutationVector );
(......)
}
(......)
RHICmdList.SetStencilRef(bStrataFastPath ? Strata::StencilBit : 0u);
(......)
// Draw a different shape depending on the type of local light.
if( LightProxy->GetLightType() == LightType_Point ||
LightProxy->GetLightType() == LightType_Rect )
{
StencilingGeometry::DrawSphere(RHICmdList);
}
else if (LightProxy->GetLightType() == LightType_Spot)
{
StencilingGeometry::DrawCone(RHICmdList);
}
}
};
// Draw the light once without the Strata fast path (the UE4-style lighting path).
RenderInternalLight(false);
// If Strata classification is enabled, draw the light again with the Strata fast path.
if (Strata::IsStrataEnabled() && Strata::IsClassificationEnabled())
{
RenderInternalLight(true);
}
}
The lighting shader code likewise only adds Strata support, so it is not covered here.
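It is also worth noting that UE5 computes shadows with Virtual Shadow Maps (VSM), a new shadow mapping method designed to deliver consistent, high-resolution shadows that work with film-quality assets and large, dynamically lit open worlds.
VSM was first proposed in 2007 by Markus Giegl et al. in the paper Queried Virtual Shadow Maps, followed by the improved Fitted Virtual Shadow Maps. Years later, in 2015, Ola Olsson et al. combined it with techniques such as clustered shading and published More Efficient Virtual Shadow Maps for Many Lights.
The core of the technique is that it renders the shadow map adaptively, creating more shadow map resolution where it is needed, without storing information from the previous frame, which makes it suitable for fully dynamic scenes. As a result, it can guarantee sub-pixel accurate shadow map queries, eliminating the projection and perspective aliasing of traditional shadow maps.
VSM uses Virtual Tiled Shadow Mapping; the algorithm is described as follows:
Top: a traditional shadow map shows severe aliasing. Bottom: a 32x32, 2048x2048 QVSM greatly improves shadow precision.
In UE5's implementation, a VSM has a maximum resolution of \(16k \times 16k\) pixels and each tile (page) is \(128 \times 128\), which keeps performance high at a reasonable memory cost. Pages are allocated and rendered only where on-screen pixels need shading, as determined by analyzing the depth buffer. Pages are cached across frames unless the objects or lights that affect them move, which improves performance further.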
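The arithmetic behind these numbers can be made concrete with a small C++ sketch. It only illustrates how a virtual texel address splits into a page coordinate and an offset within the page; the names `PageCoord`, `VirtualTexelToPage`, and `PageTableIndex` are hypothetical, and a flat row-major page table is an assumption, not the engine's data structure.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kVirtualResolution = 16384;  // 16k texels per side
constexpr uint32_t kPageSize = 128;             // texels per page side
constexpr uint32_t kPagesPerSide = kVirtualResolution / kPageSize;  // 128

struct PageCoord {
    uint32_t pageX, pageY;    // which page the texel falls in
    uint32_t texelX, texelY;  // offset inside that page
};

// Split a virtual shadow-map texel address into page and in-page offset.
inline PageCoord VirtualTexelToPage(uint32_t x, uint32_t y) {
    assert(x < kVirtualResolution && y < kVirtualResolution);
    return { x / kPageSize, y / kPageSize, x % kPageSize, y % kPageSize };
}

// Index into an assumed flat, row-major page table with one entry per virtual page.
inline uint32_t PageTableIndex(const PageCoord& c) {
    return c.pageY * kPagesPerSide + c.pageX;
}
```

A single level therefore has 128 x 128 = 16,384 virtual pages. If every texel stored a 32-bit depth value (an assumption), keeping the whole level resident would cost 1 GiB, which shows why pages are only backed by physical memory where visible pixels request them.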
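In addition, UE5 uses the clipmap technique for directional light shadows, replacing CSM to obtain higher shadow map resolution. The clipmap was first proposed in 1998 by Christopher C. Tanner et al. in the paper The Clipmap: A Virtual Mipmap. The core idea is to cap the size of the shadow map mipmap levels: whatever exceeds that cap is clipped and never loaded into memory:
This yields the Clipmap Stack and the Clipmap Pyramid:
When the camera (view) changes, the region mapped by the Clipmap Stack must be remapped and the clipmap data for the new region loaded, so that the Clipmap Stack keeps matching the view:
Clipmap update after the view changes; a toroidal update is used here to improve performance.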
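Toroidal updating boils down to wrap-around addressing: a world cell always maps to the same physical slot, so scrolling the window only overwrites the slots that fall out of view. The sketch below illustrates this in one dimension; `WrapIndex` and `ColumnsToReload` are hypothetical helper names and do not come from the engine.

```cpp
#include <cassert>

// Map a world-space cell coordinate to a slot in a cached window of `size`
// cells. The modulo is adjusted so that negative coordinates wrap correctly.
inline int WrapIndex(int worldCell, int size) {
    assert(size > 0);
    const int m = worldCell % size;
    return m < 0 ? m + size : m;
}

// Number of columns that must be reloaded when the window origin moves by
// dx cells. Everything else stays in place and is reused as is.
inline int ColumnsToReload(int dx, int size) {
    assert(size > 0);
    const int moved = dx < 0 ? -dx : dx;
    return moved < size ? moved : size;
}
```

For a window of 4 columns, moving one cell to the right reloads a single column: world cell 4 lands in slot 0, replacing world cell 0, while cells 1 to 3 keep their slots and their data.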
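Nanite spans offline preprocessing and building, culling at several granularities at render time, rasterization, the BasePass, and the lighting stage. Along the way it applies a large number of data structures, algorithms, rendering techniques, and matching optimizations.
Contrary to earlier rumors, Nanite does not use Geometry Images. Instead it builds coarse representations at several levels on top of Clusters, ClusterGroups, and Pages. This approach makes full use of precomputation to build simplified data and its storage layout ahead of time, so that Nanite data can be reconstructed, indexed, processed, and rendered efficiently at runtime. The downside is that Nanite only supports static meshes.
The Nanite rendering stages are woven into the traditional pipeline: GPUScene update, streaming management, culling, rasterization, BasePass, and readback, in that order. Together they make full use of a GPU-Driven Rendering Pipeline and finally present the Nanite data on the RenderTarget. Each step involves many passes, rendering techniques, and optimizations. For example, culling happens per instance, per cluster, per page, and per triangle, all GPU driven, to reduce traffic between the CPU and the GPU. Rasterization by default mixes a CS software rasterizer with a PS hardware rasterizer: the CS path handles very small triangles (avoiding quad overdraw), while the PS path handles larger ones. After rasterization only cluster/triangle IDs and depth are written (the Visibility Buffer technique), which reduces GBuffer footprint and bandwidth. The BasePass output is the same as the traditional one, stored in GBufferA, GBufferB, and so on. In the later lighting stage, apart from the added Strata support, the logic is essentially the same as before.
In addition, VSM and clipmaps are used to raise shadow quality and cut shadow cost, striking a balance between quality and cost for real-time rendering.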
The following sections are covered in Part 2 of this UE5 special:
6.5 Lumen
6.6 Other rendering techniques
6.7 Summary