5. The algorithm
Our algorithm combines the epipolar sampling of Engelhardt and Dachsbacher [ED10] with the 1D min/max mipmaps of Chen et al. [CBDJ11]. It goes through the following steps:
- Render the scene from the camera and from the light source (two passes: the normal scene pass and the shadow map pass)
- Reconstruct linear camera space Z from the depth buffer
- Build the 1D min/max mipmap (one min/max binary tree per epipolar slice, built from the shadow map)
- Render the coordinate texture (sample placement)
  - Samples are placed along the lines connecting the light source projected position with a user-defined number of points placed equidistantly along the border of the screen
  - If the light source is outside the screen, each line is intersected with the screen border and samples are placed between the intersection points
  - Samples that fall outside the screen are excluded from further processing
- Detect depth discontinuities and refine the initially placed sample locations
  - Initial samples are placed to capture light variation in the absence of occluders (typically at a fixed step)
  - Additional samples are placed at locations where the camera space z difference between neighboring pixels exceeds a threshold
- Perform ray marching for the selected samples (each sample corresponds to a screen pixel; the ray is marched from the camera towards the reconstructed world position of that pixel)
- Interpolate in-scattered radiance in epipolar coordinates for the rest of the samples from the ray marching samples
- Transform in-scattering from epipolar coordinates to a downscaled rectangular buffer
  - Determine the two closest epipolar lines
  - Project the pixel onto the lines and perform bilateral bilinear filtering taking the z difference into account
  - Mark pixels for which there are no appropriate source samples
- Correct in-scattering for the pixels which could not be correctly interpolated from epipolar coordinates by performing ray marching for them
- Upscale the in-scattering image to the original resolution and combine it with the attenuated background
  - Perform bilateral filtering
  - Mark pixels for which there are no appropriate source samples
  - Correct in-scattering for them
While the most important concepts of the original epipolar sampling algorithm [ED10] are preserved, our implementation introduces a number of improvements:
* We use an additional down-sampling step to provide extra control over the speed/quality tradeoff. Usually, downscaling by a factor of 2x does not significantly degrade visual quality, while making the rays look smoother and improving performance.
* Additional in-scattering correction steps are introduced:
* In the original approach, the authors used bilateral filtering along and between epipolar lines when transforming from epipolar geometry back to the original rectangular coordinates, with up to five samples along each epipolar line. If there was no appropriate sample to filter from, they proposed falling back to the next outer epipolar line. We found that this approach still produces rendering artifacts and cannot be implemented efficiently on the GPU because it requires branching.
* We implemented another method: samples that cannot be correctly interpolated are marked with stencil, and an additional ray marching pass with fewer steps is performed for these pixels. Note that this correction is done both when the epipolar in-scattering is transformed into rectangular geometry and when the upscaling is performed.
6. Implementation details
The algorithm implementation follows the steps described in section 5. A number of textures are used to store the intermediate data required during processing; they are summarized in table 1. (The table is not reproduced here; please refer to the original article.)
Table 1: Textures used by the algorithm implementation
The algorithm workflow is summarized in figures 8 and 9, while the rest of this section provides the details.
Fig. 8 covers the preparatory stages of the algorithm:
1. Reconstruct linear camera space Z from the scene depth buffer.
2. Transform the scene into epipolar space. Three textures are involved, storing the epipolar sample coordinates, the corresponding camera space Z (used for the discontinuity checks) and the epipolar depth-stencil. All are N_samples x N_slices in size, where N_samples is the number of samples on each ray cast from the light; for some rays not all samples lie on the screen, which is why off-screen samples are excluded from further processing.
3. Using the coordinate texture and the epipolar camera space Z texture, determine which samples will take part in ray marching and generate the interpolation source texture used to interpolate the rest.
4. From the coordinate texture, compute the UV direction of each slice in the shadow map (a 1D texture).
5. Using the slice UV direction texture, sample the shadow map and build the 1D min/max shadow mipmap (the mipmap is what provides the binary tree).
Fig. 9 covers the in-scattering computation:
6. Using the interpolation source texture (which stores, for each sample, the indices of its two interpolation sources), mark the source samples, i.e. the ones whose in-scattering will be computed by ray marching, in the epipolar depth-stencil buffer.
7. Perform ray marching for the marked samples to compute their in-scattering.
8. Interpolate the in-scattering of the remaining samples from the ray marched ones.
9. Transform the result from epipolar space to rectangular (Cartesian) space.
10. Fix artifacts in the downscaled rectangular result.
11. Upscale the fixed result and blend it with the color buffer.
12. Apply the same fix-up to the upscaled result.
Note that the in-scattering correction at steps 10 and 12 also uses the camera space z, shadow map and slice UV direction textures (needed for the ray marching), which is not shown in fig. 9 for clarity.
The remaining part of this section details all the stages of the algorithm.
6.1. Rendering coordinates texture
Coordinate texture generation is done by rendering a screen-size quad with the texture bound to the pipeline as a render target. The pixel shader GenerateCoordinateTexturePS() performs all the required processing. The depth stencil state is configured to increment the stencil value, so all valid samples are marked by 1 in the stencil while all invalid ones keep the initial 0 value. Samples that fall outside the screen are thus skipped in all further processing.
We assume that the epipolar slice coordinate ranges from 0 to 1. Screen borders are traversed in counterclockwise order starting from the left top corner (fig. 10): values from 0 to 0.25 define locations on the left border, and values in the ranges 0.25 to 0.5, 0.5 to 0.75 and 0.75 to 1.0 define locations on the bottom, right and top borders correspondingly. The values 0, 0.25, 0.5, 0.75 and 1 define locations exactly at the screen corners.
The stage consists of the following steps:
Computing epipolar line exit point
Computing epipolar line entry point given its exit point
Rescaling epipolar line length to provide even texel to screen pixel correspondence
Computing camera space z for the location
As illustrated in fig. 14, without rescaling, epipolar lines that are short on screen (which happens when the light is close to a screen border) would receive far too many samples and waste computation; after rescaling, the sample spacing is roughly one screen pixel, and the sample distribution becomes circular rather than rectangular, which suits light shafts better.
Computing the epipolar line exit point is relatively simple, since it lies on one of the four screen boundaries. The following code computes this location using arithmetic instructions only:
uint uiBoundary = clamp(floor( fEpipolarSlice * 4 ), 0, 3); // Boundary index: 0 - left, 1 - bottom, 2 - right, 3 - top
float fPosOnBoundary = frac( fEpipolarSlice * 4 ); // Fractional position on that boundary
float fBoundaryCoord = -1 + 2*fPosOnBoundary;
// Coordinates on the corresponding boundary:   Left            Bottom          Right           Top
float4 f4BoundaryXCoord = float4(              -1, fBoundaryCoord,              1, -fBoundaryCoord);
float4 f4BoundaryYCoord = float4( -fBoundaryCoord,             -1, fBoundaryCoord,               1);
// Select the right coordinates for the boundary
bool4 b4BoundaryFlags = bool4(uiBoundary.xxxx == uint4(0,1,2,3));
float2 f2ExitPoint = float2(dot(f4BoundaryXCoord, b4BoundaryFlags), dot(f4BoundaryYCoord, b4BoundaryFlags));
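As a worked example, fEpipolarSlice = 0.375 (the middle of the bottom border range) gives uiBoundary = 1, fPosOnBoundary = 0.5 and fBoundaryCoord = 0, so the boundary flags select the second components and f2ExitPoint = (0, -1), the middle of the bottom screen border.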
The next step is a bit more complex: we need to compute the epipolar line entry point given its exit point and the position of the light source on the screen. This is done by the GetEpipolarLineEntryPoint() function. There are two possible cases (fig. 11): if the light is located inside the screen, the entry point is simply the position of the light on the screen; if the light is outside the screen, we have to intersect the epipolar line with the appropriate boundary.
In the second case, the task is to find the first intersection, before the exit point, of the ray connecting the projected light position with the exit point (fig. 12); the entry point therefore always lies on the segment between the light source and the exit point.
For this, we compute signed distances along the ray to the left, bottom, right and top boundaries and find the maximum distance that is less than the distance to the exit point (see fig. 12). We also take care of near-horizontal and near-vertical ray orientations, for which the distances to the left/right or top/bottom boundaries cannot be computed reliably; a flag vector is used to skip such boundaries. The following code snippet accomplishes this task using arithmetic instructions only, avoiding branches:
// Compute direction from the light source to the ray exit point:
float2 f2RayDir = f2ExitPoint.xy - g_LightAttribs.f4LightScreenPos.xy;
float fDistToExitBoundary = length(f2RayDir);
f2RayDir /= fDistToExitBoundary;

// Compute signed distances along the ray from the light position to all four boundaries
// The addition of !b4IsCorrectIntersectionFlag is required to prevent division by zero
// Note that such incorrect lanes will be masked out
bool4 b4IsCorrectIntersectionFlag = abs(f2RayDir.xyxy) > 1e-5;
float4 f4DistToBoundaries = (float4(-1,-1,1,1) - g_LightAttribs.f4LightScreenPos.xyxy) / (f2RayDir.xyxy + !b4IsCorrectIntersectionFlag);

// We now need to find the first intersection BEFORE the intersection with the exit boundary
// This means that we need to find the maximum intersection distance which is less than fDistToExitBoundary
// We thus need to skip all boundaries whose distance is greater than the distance to the exit boundary
// Using -FLT_MAX as the distance to these boundaries will result in skipping them:
b4IsCorrectIntersectionFlag = b4IsCorrectIntersectionFlag && ( f4DistToBoundaries < (fDistToExitBoundary - 1e-4) );
f4DistToBoundaries = b4IsCorrectIntersectionFlag * f4DistToBoundaries + !b4IsCorrectIntersectionFlag * float4(-FLT_MAX, -FLT_MAX, -FLT_MAX, -FLT_MAX);

float fFirstIntersecDist = 0;
fFirstIntersecDist = max(fFirstIntersecDist, f4DistToBoundaries.x);
fFirstIntersecDist = max(fFirstIntersecDist, f4DistToBoundaries.y);
fFirstIntersecDist = max(fFirstIntersecDist, f4DistToBoundaries.z);
fFirstIntersecDist = max(fFirstIntersecDist, f4DistToBoundaries.w);

// Now we can compute the entry point:
f2EntryPoint = g_LightAttribs.f4LightScreenPos.xy + f2RayDir * fFirstIntersecDist;
Note that if the light source is located outside the screen, there are cases when the whole slice is invisible (fig. 13); for such slices the computed entry point also lies outside the screen.
Such pixels are easily detected and discarded, so they are skipped in all further processing:

if( any(abs(f2EntryPoint) > 1+1e-4) )
    discard; // the entry point is outside the [-1,1] range, i.e. off screen
If the light source is located close to a screen boundary, the screen-space length of the epipolar lines can vary significantly. This results in overly dense sampling for short lines, redundant computations and possible aliasing artifacts. To solve this issue, we rescale the epipolar lines by advancing the exit point (fig. 14), striving for a 1:1 correspondence between samples on the epipolar line and screen pixels. The following code updates the epipolar line exit point:
float fEpipolarSliceScreenLen = length( (f2ExitPoint - f2EntryPoint) * g_PPAttribs.m_f2ScreenResolution.xy / 2 );
f2ExitPoint = f2EntryPoint + (f2ExitPoint - f2EntryPoint) * max(g_PPAttribs.m_f2CoordinateTexDim.x / fEpipolarSliceScreenLen, 1);
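For example, assuming the coordinate texture stores 256 samples per slice and an epipolar line is only 128 pixels long on screen, the exit point is pushed out by a factor of max(256/128, 1) = 2, so that consecutive samples end up roughly one screen pixel apart (samples pushed off screen are discarded later).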
This step not only reduces the amount of computation required, but also results in a more natural circular-shaped distribution of samples, as opposed to the rectangular-shaped distribution of the original algorithm.
Finally, we compute the interpolated location of the sample between the entry and exit points and discard samples that fall outside the screen:
f2XY = lerp(f2EntryPoint, f2ExitPoint, fSamplePosOnEpipolarLine);
if( any(abs(f2XY) > 1+1e-4) )
discard;
The shader also outputs the camera space Z coordinate of the epipolar sample, which is used by subsequent steps of the algorithm.
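The reconstruction of linear camera space Z mentioned in the step list of section 5 is not shown in the article; for a conventional projection matrix it boils down to something like the following sketch (g_CameraAttribs.mProj is an assumed constant buffer member, and the exact terms differ when complementary depth buffering is used):

// For post-projection depth z' = (z * Proj._33 + Proj._43) / z, solving for camera space z gives:
float CameraSpaceZFromDepth(float fDepthBufferZ)
{
    return g_CameraAttribs.mProj._43 / (fDepthBufferZ - g_CameraAttribs.mProj._33);
}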
Note that the coordinate texture as well as the camera space z texture are initially cleared with invalid coordinates outside the allowed ranges. This is important, since these values make it possible to skip such pixels in further processing. In the resulting coordinate texture, each row corresponds to one epipolar slice and each texel in a row stores the screen-space coordinates of one sample on that slice.
6.2. Refining sample locations
The next step of the algorithm is refining the initially placed ray marching sample locations by finding depth discontinuities. This stage generates the interpolation source texture, which for each sample contains the indices of two samples from the same slice (left and right below) from which the current sample will be interpolated. The search for the interpolation sources in a depth array depth1d of size N (one ray section), for the current texel at location x, as presented in [ED10], is the following:
left = right = x;
while ( left > 0 )
{
if( abs( depth1d[left-1] - depth1d[left] ) > threshold )
break;
left --;
}
while ( right < N-1 )
{
if( abs( depth1d[right] - depth1d[right+1] ) > threshold )
break;
right ++;
}
If there are no depth discontinuities on the ray section, the interpolation sources are the end points of the section and no additional ray marching samples are needed there. If a depth discontinuity is detected, an extra ray marching sample is placed directly before the break, and the current sample is interpolated from it.
We tried several strategies for the depth discontinuity search. Our first approach was a straightforward pixel shader implementation of the algorithm above, as suggested by [ED10]. It turned out to be too slow, especially when a large initial sampling step (32 or greater) is used.
We then implemented an optimized search algorithm using a compute shader. Its performance is up to 6x higher than that of the original pixel-shader-based approach and is almost independent of the initial sampling step.
The compute shader organizes its work into groups of threads. The number of threads in each group must be at least 32 and not less than the initial sample step, for reasons that will become clear shortly. Each group processes one or several ray sections of one epipolar slice: if the initial sample step Si is less than the total number of threads Nt in a group, the group processes Nt/Si ray sections (for example, 32 threads with a step of 16 process two sections); otherwise (Nt = Si) each group processes one section, so each thread always handles exactly one sample. The location of one sample group within the interpolation source texture is illustrated in fig. 15.
Fig.15: Location of the thread group within coordinate texture
The location of the individual sample processed by one thread of the group is shown in fig. 16.
The number of threads in the group must be large enough to fully utilize the SIMD units, so we set it to at least 128. Our implementation exploits the group shared memory exposed by the compute shader to implement the discontinuity search efficiently. The idea is to first check whether there is a depth break next to each sample in the group and store this information in a compact shared array. Since one bit is sufficient to indicate a depth break, 32 flags can be packed into one uint (for 128 threads the whole group needs only four uints). We use the following shared array:
static const uint g_uiNumPackedFlags = THREAD_GROUP_SIZE/32;
groupshared uint g_uiPackedCamSpaceDiffFlags[ g_uiNumPackedFlags ];
It is now clear why the group size must be at least 32. The array is then used to efficiently search for depth discontinuities.
At the very beginning, the shader checks whether the sample it processes is valid. Recall that the coordinate texture is initially cleared with invalid coordinates, which mark samples outside the screen:
bool bIsValidThread = all( abs(f2SampleLocationPS) < 1+1e-4 );
The shader then loads the camera space z for the current sample and for its right neighbor from the epipolar camera space z texture generated at the previous stage, computes the difference, compares it with the threshold and sets the appropriate flag (the flag is 1 when the neighboring depths are close, i.e. there is no discontinuity). Note the use of the InterlockedOr() function:
// Load camera space Z for this sample and for its right neighbor (remember to use the global sample index)
bool bFlag;
float fCamSpaceZ = g_tex2DEpipolarCamSpaceZ.Load( uint3(uiGlobalSampleInd, uiSliceInd, 0) );
float fRightNeighbCamSpaceZ = g_tex2DEpipolarCamSpaceZ.Load( uint3(uiGlobalSampleInd+1, uiSliceInd, 0) );
// Compare the difference with the threshold
bFlag = abs(fCamSpaceZ - fRightNeighbCamSpaceZ) < g_PPAttribs.m_fRefinementThreshold;
// Set appropriate flag using INTERLOCKED Or:
InterlockedOr( g_uiPackedCamSpaceDiffFlags[uiSampleInd/32], bFlag << (uiSampleInd%32) );
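For example, the thread processing sample 37 of the group writes its flag into bit 37 % 32 = 5 of g_uiPackedCamSpaceDiffFlags[37 / 32] = g_uiPackedCamSpaceDiffFlags[1].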
An important aspect here is that the camera space z texture contains large negative values for all samples that fall outside the screen. As a result, the difference between the last on-screen sample and the first off-screen sample always exceeds the threshold, so a depth discontinuity is detected there.
After all the flags are set, the threads must be synchronized, which is done by a call to the GroupMemoryBarrierWithGroupSync() function.
Next, the shader computes the indices uiInitialSample0Ind and uiInitialSample1Ind of the source ray marching samples at the ends of the current ray section (fig. 17). The initial sample step is adjusted here to provide higher sampling density near the epipole, to account for the high variation of scattering intensity in that region.
If the sample index equals one of the initial sample indices, the sample is a ray marching sample and is interpolated from itself. Otherwise it is necessary to search for the two interpolation sources. Before performing the search itself, it is easy to check whether there is at least one depth break on the current ray section:
bool bNoDepthBreaks = true;
#if INITIAL_SAMPLE_STEP < 32
{
// Check if all uiInitialSampleStep flags starting from
// position uiInitialSample0Ind are set:
int iFlagPackOrder = uiInitialSample0Ind / 32;
int iFlagOrderInPack = uiInitialSample0Ind % 32;
uint uiFlagPack = uiPackedCamSpaceDiffFlags[iFlagPackOrder];
uint uiAllFlagsMask = ((1<<uiInitialSampleStep) - 1);
if( ((uiFlagPack >> iFlagOrderInPack) & uiAllFlagsMask) != uiAllFlagsMask )
bNoDepthBreaks = false;
}
#else
{
for(uint i=0; i < g_uiNumPackedFlags; ++i)
if( uiPackedCamSpaceDiffFlags[i] != 0xFFFFFFFFU )
// If at least one flag is not set, there is a depth break on this section
bNoDepthBreaks = false;
}
#endif
If there are no depth breaks, no further processing is required. If at least one depth break was found on the current ray section, we search for the two interpolation sources. Thanks to the compact packing scheme, the search can be implemented efficiently with bit manipulation and two intrinsic functions, firstbitlow() and firstbithigh(), which return the position of the first non-zero bit starting from the lowest and the highest order bit respectively. If the initial sample step is not greater than 32, the implementation avoids loops entirely; if it is greater than 32, the loop executes at most Si/32 iterations.
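The article does not list the search code itself; the following is only a minimal sketch of how the two interpolation sources could be located with the packed flags and firstbitlow()/firstbithigh(). It assumes the initial sample step is a power of two smaller than 32 and that all indices are group-local (the same indexing as g_uiPackedCamSpaceDiffFlags); the names uiLeftSrc and uiRightSrc are illustrative and not taken from the sample sources.

// Bit i of the packed flags is 1 when there is NO depth break between samples i and i+1
uint uiSectionFlags = g_uiPackedCamSpaceDiffFlags[uiInitialSample0Ind / 32] >> (uiInitialSample0Ind % 32);
uint uiLocalSampleInd = uiSampleInd - uiInitialSample0Ind;
// Breaks strictly below the current sample and breaks at/above it, clipped to the current section
uint uiBreaksToTheLeft  = ~uiSectionFlags &  ((1u << uiLocalSampleInd) - 1u);
uint uiBreaksToTheRight = ~uiSectionFlags & ~((1u << uiLocalSampleInd) - 1u) & ((1u << uiInitialSampleStep) - 1u);
// Nearest break below -> first sample after it; nearest break at/above -> last sample before it
uint uiLeftSrc  = (uiBreaksToTheLeft  != 0) ? uiInitialSample0Ind + firstbithigh(uiBreaksToTheLeft) + 1 : uiInitialSample0Ind;
uint uiRightSrc = (uiBreaksToTheRight != 0) ? uiInitialSample0Ind + firstbitlow(uiBreaksToTheRight)     : uiInitialSample1Ind;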
6.3. Building min/max mipmap
At this stage, the 1D min/max mipmap is constructed. First, an auxiliary 1D texture is rendered which contains the UV direction of each slice in the shadow map. The shader loads the coordinates of the first sample in the slice (the sample with index 0 is the light source position itself, so the first sample is the one with index 1), reprojects this location from camera projection space to light projection space, and computes the direction from the camera position in shadow map UV space to the resulting location. The direction is normalized in such a way that one step along it covers one shadow map texel. Note also that complementary (reversed) depth buffering is used.
float2 RenderSliceUVDirInShadowMapTexturePS(SScreenSizeQuadVSOutput In) : SV_Target
{
// Load location of the first sample in the slice (after the light source, which is 0-th sample)
uint uiSliceInd = In.m_f4Pos.x;
float2 f2FirstSampleInSliceLocationPS = g_tex2DCoordinates.Load( uint3(1, uiSliceInd, 0) );
if( any( abs(f2FirstSampleInSliceLocationPS) > 1 + 1e-4 ) )
return float2(-10000, -10000);
// Reproject the sample location from camera projection space to light space
float4 f4FirstSampleInSlicePosInLightProjSpace = mul( float4(f2FirstSampleInSliceLocationPS, 0, 1), g_LightAttribs.mCameraProjToLightProjSpace);
f4FirstSampleInSlicePosInLightProjSpace /= f4FirstSampleInSlicePosInLightProjSpace.w;
float2 f2FirstSampleInSliceUVInShadowMap = ProjToUV( f4FirstSampleInSlicePosInLightProjSpace.xy );
// Compute direction from the camera pos in light space to the sample pos
float2 f2SliceDir = f2FirstSampleInSliceUVInShadowMap - g_LightAttribs.f4CameraUVAndDepthInShadowMap.xy;
f2SliceDir /= max(abs(f2SliceDir.x), abs(f2SliceDir.y));
return f2SliceDir;
}
The 1D min/max mipmap is S x N_slices texels in size, where S is the resolution of the original shadow map and N_slices is the number of epipolar slices. Each row of the min/max mipmap contains the 1D min/max binary tree for the corresponding epipolar slice. The layout of the texture is shown in the figure below:
Note that the 1D min/max mipmap does not contain the original shadow map, so its first level is downsampled by a factor of two and has S/2 samples. Note also that f2SliceDir is computed in such a way that at most S steps are required to cross the shadow map texture.
The min/max shadow map is created using the well-known flip/flop approach, in which two textures are alternately used as source and destination; the original shadow map is used to initialize the first level of the min/max binary trees. The 1D min/max tree construction starts from the projected light source position. Note that the Gather() instruction is used at this stage to load the four source values that would be used for bilinear interpolation. In practice we do not construct the full binary tree, because it is very unlikely that the coarsest levels would ever be reached; besides, rendering to very low-resolution targets is inefficient on modern GPUs.
float2 InitializeMinMaxShadowMapPS(SScreenSizeQuadVSOutput In) : SV_Target
{
uint uiSliceInd = In.m_f4Pos.y;
// Load slice direction in shadow map
float2 f2SliceDir = g_tex2DSliceUVDirInShadowMap.Load( uint3(uiSliceInd,0,0) );
// Calculate current sample position on the ray
float2 f2CurrUV = g_LightAttribs.f4CameraUVAndDepthInShadowMap.xy + f2SliceDir * floor(In.m_f4Pos.x) * 2.f * g_PPAttribs.m_f2ShadowMapTexelSize;
// Gather 8 depths which will be used for PCF filtering for this sample and its immediate neighbor along the epipolar slice
float4 f4Depths = g_tex2DLightSpaceDepthMap.Gather(samLinearBorder0, f2CurrUV);
// Shift UV to the next sample along the epipolar slice:
f2CurrUV += f2SliceDir * g_PPAttribs.m_f2ShadowMapTexelSize;
float4 f4NeighbDepths = g_tex2DLightSpaceDepthMap.Gather(samLinearBorder0, f2CurrUV);
float4 f4MinDepth = min(f4Depths, f4NeighbDepths);
f4MinDepth.xy = min(f4MinDepth.xy, f4MinDepth.zw);
f4MinDepth.x = min(f4MinDepth.x, f4MinDepth.y);
float4 f4MaxDepth = max(f4Depths, f4NeighbDepths);
f4MaxDepth.xy = max(f4MaxDepth.xy, f4MaxDepth.zw);
f4MaxDepth.x = max(f4MaxDepth.x, f4MaxDepth.y);
return float2(f4MinDepth.x, f4MaxDepth.x);
}
After that, the next levels of the binary trees are constructed by loading two min/max values from the next finer level and computing their min/max.
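Schematically, one step of this flip/flop construction could look like the sketch below. Here g_tex2DMinMaxLightSpaceDepthSrc is whichever texture of the flip/flop pair currently holds the finer level, g_uiSrcLevelXOffset is the x offset of that level inside the texture, and In.m_f4Pos.x is assumed to index texels of the destination level starting from 0; all of these names are assumptions, and the actual sample code may differ.

float2 ComputeCoarserMinMaxLevelPS(SScreenSizeQuadVSOutput In) : SV_Target
{
    uint uiDstTexelX = uint(In.m_f4Pos.x); // texel index within the destination (coarser) level
    uint uiSliceInd  = uint(In.m_f4Pos.y);
    // Each coarser texel covers two texels of the finer level
    float2 f2MinMax0 = g_tex2DMinMaxLightSpaceDepthSrc.Load( uint3(g_uiSrcLevelXOffset + uiDstTexelX*2,     uiSliceInd, 0) );
    float2 f2MinMax1 = g_tex2DMinMaxLightSpaceDepthSrc.Load( uint3(g_uiSrcLevelXOffset + uiDstTexelX*2 + 1, uiSliceInd, 0) );
    return float2( min(f2MinMax0.x, f2MinMax1.x), max(f2MinMax0.y, f2MinMax1.y) );
}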
6.4. Ray marching
After the sample refinement stage, all ray marching samples are marked as being interpolated from themselves. Before performing ray marching, a screen-size quad is rendered with a simple pixel shader that discards all pixels which are not interpolated from themselves:
uint2 ui2InterpolationSources = g_tex2DInterpolationSource.Load( uint3(In.m_f4Pos.xy,0) );
// Ray marching samples are interpolated from themselves, so it is easy to detect them:
if( ui2InterpolationSources.x != ui2InterpolationSources.y )
discard;
The depth stencil state is configured to increment the stencil value, so after this pass all ray marching samples are marked with 2 in the stencil.
After that, another screen-size quad is rendered with the depth stencil state configured to pass only the pixels whose stencil value equals 2 and discard all others. The pixel shader performs ray marching at the selected locations as described by Algorithm 2 in section 4.
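Algorithm 2 itself is defined in section 4 and is not reproduced in this document; purely as an illustration of the per-sample work, a heavily simplified version (ignoring the 1D min/max acceleration and the actual airlight integral the sample evaluates) might look like the sketch below. The names mWorldToLightProjSpace, samComparison and f4LightColorAndIntensity are assumptions that do not appear in the article.

// Simplified sketch only: f3RayStart is the camera position, f3RayEnd the reconstructed world
// position of the pixel; extinction and the phase function are omitted.
float3 MarchRay(float3 f3RayStart, float3 f3RayEnd, float fNumSteps)
{
    float3 f3Step = (f3RayEnd - f3RayStart) / fNumSteps;
    float3 f3CurrPos = f3RayStart;
    float3 f3Inscattering = 0;
    for( float fStepNum = 0; fStepNum < fNumSteps; ++fStepNum )
    {
        f3CurrPos += f3Step;
        // Project the current position into light projection space and test it against the shadow map
        float4 f4PosInLightProjSpace = mul( float4(f3CurrPos, 1), g_LightAttribs.mWorldToLightProjSpace );
        f4PosInLightProjSpace /= f4PosInLightProjSpace.w;
        float2 f2ShadowMapUV = ProjToUV( f4PosInLightProjSpace.xy );
        float fIsLit = g_tex2DLightSpaceDepthMap.SampleCmpLevelZero( samComparison, f2ShadowMapUV, f4PosInLightProjSpace.z );
        // Accumulate sun light in-scattered towards the camera on lit steps
        f3Inscattering += fIsLit * g_LightAttribs.f4LightColorAndIntensity.rgb / fNumSteps;
    }
    return f3Inscattering;
}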
Implementing colored light shafts is straightforward: we only need a stained glass color texture as seen from the light source, and on each iteration of Algorithm 2 we fetch the color from this texture along with the shadow map depth. This color is then used to modulate the sun color. Note that in this case we cannot skip long lit sections of the ray, which makes the 1D min/max optimization less efficient.
6.5. Interpolating in-scattering
The next stage is interpolation of the in-scattered radiance for the rest of the samples from the ray marching ones. For this, the indices of the interpolation sources are loaded from the interpolation source texture and the corresponding samples are loaded from the initial in-scattering texture. This work is done by the following shader:
float3 InterpolateIrradiancePS(SScreenSizeQuadVSOutput In) : SV_Target
{
uint uiSampleInd = In.m_f4Pos.x;
uint uiSliceInd = In.m_f4Pos.y;
// Get interpolation sources
uint2 ui2InterpolationSources = g_tex2DInterpolationSource.Load( uint3(uiSampleInd, uiSliceInd, 0) );
float fInterpolationPos = float(uiSampleInd - ui2InterpolationSources.x) / float( max(ui2InterpolationSources.y - ui2InterpolationSources.x,1) );
float3 f3Src0 = g_tex2DInitialInsctrIrradiance.Load( uint3(ui2InterpolationSources.x, uiSliceInd, 0) );
float3 f3Src1 = g_tex2DInitialInsctrIrradiance.Load( uint3(ui2InterpolationSources.y, uiSliceInd, 0));
// Ray marching samples are interpolated from themselves
return lerp(f3Src0, f3Src1, fInterpolationPos);
}
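For example, a sample at index 21 whose interpolation sources are samples 16 and 32 gets fInterpolationPos = (21 - 16) / (32 - 16) = 0.3125. For a ray marching sample both source indices are equal, so the max(..., 1) term keeps the division well defined, fInterpolationPos becomes 0, and the sample's own value is returned unchanged.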
6.6. Transforming epipolar in-scattering to rectangular coordinates
At this stage the in-scattering image has to be transformed from epipolar coordinates back to rectangular space. For each pixel, we need to determine the two closest epipolar lines, project the pixel onto them and perform bilateral interpolation. Rendering is done with the depth stencil state configured to increment the stencil value; note that a different depth stencil buffer is used at this stage. The following steps are performed:
· The epipolar line which contains the current pixel is identified
· The pixel is projected onto that line
· Bilateral interpolation is performed using the interpolated in-scattering texture and the epipolar camera space z
At the first step, we have to determine which epipolar line the current pixel lies on, i.e. connect the light source projected position with one of the four boundaries and determine the slice entry and exit points. This is not as simple as it may first look, because the light source can be outside the screen; the obvious solution of taking the closest boundary intersection along the ray from the light through the current pixel does not work in that case. The universal solution we implemented is based on determining which of the four sectors the pixel belongs to (fig. 19). Once the sector is known, we can immediately connect the light source with the appropriate border and obtain the exit point. After that, the entry point can be computed with the GetEpipolarLineEntryPoint() function described in section 6.1.
To determine the sector, we first determine in which of the four half-spaces, formed by connecting each of the four screen corners with the light source, the current pixel lies (fig. 20). Simple relations between these half-space tests, which hold for any pixel P, then uniquely identify its sector.
The following code snippet efficiently computes the sector mask vector, which has 1 in the component corresponding to the pixel's sector:
float2 f2RayDir = normalize( In.m_f2PosPS - g_LightAttribs.f4LightScreenPos.xy );
float4 f4HalfSpaceEquationTerms = (In.m_f2PosPS.xxyy - float4(-1,1,-1,1)) * f2RayDir.yyxx;
bool4 b4HalfSpaceFlags = f4HalfSpaceEquationTerms.xyyx < f4HalfSpaceEquationTerms.zzww;
bool4 b4SectorFlags = b4HalfSpaceFlags.wxyz && !b4HalfSpaceFlags.xyzw;
Given this mask, the epipolar line exit point is found by computing the distances to all four boundaries and selecting the required one with b4SectorFlags; the entry point is then computed with GetEpipolarLineEntryPoint():
float4 f4DistToBoundaries = ( float4(-1,-1, 1,1) - g_LightAttribs.f4LightScreenPos.xyxy ) / (f2RayDir.xyxy + float4( abs(f2RayDir.xyxy)<1e-6 ) );
// Select distance to the exit boundary:
float fDistToExitBoundary = dot( b4SectorFlags, f4DistToBoundaries );
// Compute exit point on the boundary:
float2 f2ExitPoint = g_LightAttribs.f4LightScreenPos.xy + f2RayDir * fDistToExitBoundary;
float2 f2EntryPoint = GetEpipolarLineEntryPoint(f2ExitPoint);
Note that the method above works for any light source location, whether it is inside or outside the screen.
The epipolar slice number, corresponding to the ordering presented in section 6.1, can then be computed as follows:
float4 f4EpipolarSlice = float4(0, 0.25, 0.5, 0.75) + (0.5 + float4(-0.5, +0.5, +0.5, -0.5)*f2ExitPoint.yxyx)/4.0;
float fEpipolarSlice = dot(b4SectorFlags, f4EpipolarSlice);
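As a worked example, an exit point in the middle of the bottom border, f2ExitPoint = (0, -1), lies in the bottom sector, so b4SectorFlags selects the second component: fEpipolarSlice = 0.25 + (0.5 + 0.5*0)/4 = 0.375, the middle of the bottom border range [0.25, 0.5] defined in section 6.1.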
The sample location on the epipolar slice is computed by simply projecting the pixel onto the line connecting the entry and exit points. This gives the coordinate f2ScatteredColorUV in the interpolated in-scattering texture to filter from.
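The projection itself is not listed in the article; it amounts to something like the following sketch (half-texel alignment along the slice axis is ignored here, and the variable names are illustrative):

// Project the current pixel onto the entry-exit segment and normalize by the segment length
float2 f2EpipolarDir = f2ExitPoint - f2EntryPoint;
float fEpipolarLenSqr = max( dot(f2EpipolarDir, f2EpipolarDir), 1e-8 );
float fSamplePosOnLine = dot( In.m_f2PosPS - f2EntryPoint, f2EpipolarDir ) / fEpipolarLenSqr;
// u runs along the slice, v selects the slice row
float2 f2ScatteredColorUV = float2( fSamplePosOnLine, fEpipolarSlice );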
To perform bilateral interpolation we need to compute the bilinear weights and the locations of the source samples. The following code snippet computes the bilinear filter weights as well as the texture coordinates f2ScatteredColorIJ of the center of the left bottom source texel:
float2 f2ScatteredColorUVScaled = f2ScatteredColorUV.xy * f2ScatteredColorTexDim.xy - float2(0.5, 0.5);
float2 f2ScatteredColorIJ = floor(f2ScatteredColorUVScaled);
// Get bilinear filtering weights
float2 f2BilinearWeights = f2ScatteredColorUVScaled - f2ScatteredColorIJ;
// Get texture coordinates of the left bottom source texel. Again, offset by 0.5 is essential
// to align with texel center
f2ScatteredColorIJ = (f2ScatteredColorIJ + float2(0.5, 0.5)) / f2ScatteredColorTexDim.xy;
The camera space z values of the four source samples can be obtained with a single Gather() call. An important detail is offsetting the coordinates so that they are at the same distance from all four source texels, which eliminates rounding artifacts:
float4 f4SrcLocationsCamSpaceZ = g_tex2DEpipolarCamSpaceZ.Gather(samLinearClamp, f2ScatteredColorIJ + float2(0.5, 0.5) / f2ScatteredColorTexDim.xy);
After that, the bilateral weights f4BilateralWeights are computed. To filter the in-scattering texture using two fetches instead of four, we use the following trick: a weighted sum of two samples can be obtained with hardware bilinear filtering, for which we only need to compute the appropriate offset:
float fRow0UOffset = f4BilateralWeights.z / max(f4BilateralWeights.z + f4BilateralWeights.w, 0.001);
fRow0UOffset /= f2SrcTexDim.x;
float3 f3Row0WeightedCol = (f4BilateralWeights.z + f4BilateralWeights.w) * tex2DSrcTexture.SampleLevel(Sampler, f2LeftBottomSrcTexelUV + float2(fRow0UOffset, 0), 0, int2(0,0));

float fRow1UOffset = f4BilateralWeights.y / max(f4BilateralWeights.x + f4BilateralWeights.y, 0.001);
fRow1UOffset /= f2SrcTexDim.x;
float3 f3Row1WeightedCol = (f4BilateralWeights.x + f4BilateralWeights.y) * tex2DSrcTexture.SampleLevel(Sampler, f2LeftBottomSrcTexelUV + float2(fRow1UOffset, 0), 0, int2(0,1));

f3ScatteredLight = f3Row0WeightedCol + f3Row1WeightedCol;
Note that this implementation contains no branching (except for discarding invalid pixels) and performs bilateral filtering of the in-scattering texture with just one gather and two bilinear fetches.
If the total bilateral weight is close to zero, there are no appropriate samples from which in-scattering can be computed for this pixel. In this case we discard the pixel, keeping 0 in the stencil.
6.7. Correcting in-scattering
For each pixel which could not be correctly interpolated from the in-scattering texture, we perform an additional ray marching pass. Since these pixels are marked with 0 in the stencil, all we need to do is configure the depth stencil state to pass only pixels whose stencil value equals 0 and discard all others. Note that the 1D min/max mipmap cannot be used at this stage, because it would have to be constructed for each such sample. We also use a lower number of steps here, with no noticeable visual impact.
6.8. Up-scaling in-scattering to original resolution
The final step of the algorithm is up-scaling the downscaled in-scattering texture to the original resolution and applying it to the attenuated background. This step is very similar to transforming the in-scattering image from epipolar coordinates to rectangular ones, with the main difference that the original depth buffer is used to load camera space z and the downscaled in-scattering texture is used to load the in-scattering values. In the same manner, pixels which cannot be correctly interpolated are marked in the stencil at this stage and corrected in a later pass. Note that we also apply the phase function here, because it exhibits high variation near the epipole.

7. Sample structure
The sample project consists of the following files:
* LightScattering.h, LightScattering.cpp – responsible for application startup/shutdown, rendering the scene and the shadow map, and handling user input
* LightSctrPostProcess.h, LightSctrPostProcess.cpp – responsible for all the post-processing steps that create the light scattering effect
* RenderTechnique.h, RenderTechnique.cpp – auxiliary functionality for creating effect techniques
* Common.fxh – common shader definitions
* LightScattering.fx – all the shaders performing the post-processing steps
* RefineSampleLocations.fx – the compute shader performing sample refinement
* Structures.fxh – definitions of the structures used by the shaders
8. How to integrate the technique into your engine
Since the technique is implemented entirely as a post-process, integrating it into an existing engine is relatively simple. The technique is fully implemented in LightSctrPostProcess.h, LightSctrPostProcess.cpp and the shader files Common.fxh, LightScattering.fx, RefineSampleLocations.fx and Structures.fxh. Only the DirectX SDK is required to compile these files.
All the work is done by the PerformPostProcessing() method of the CLightSctrPostProcess class, which takes two structs as arguments:
void PerformPostProcessing(SFrameAttribs &FrameAttribs, SPostProcessingAttribs &PPAttribs)
The first structure defines attributes for the frame to be processed and has the following declaration:
struct SFrameAttribs
{
    ID3D11Device *pd3dDevice;
    ID3D11DeviceContext *pd3dDeviceContext;
    SLightAttribs LightAttribs;
    SCameraAttribs CameraAttribs;
    ID3D11ShaderResourceView *ptex2DSrcColorBufferSRV;
    ID3D11ShaderResourceView *ptex2DDepthBufferSRV;
    ID3D11ShaderResourceView *ptex2DShadowMapSRV;
    ID3D11ShaderResourceView *ptex2DStainedGlassSRV;
    ID3D11RenderTargetView *pDstRTV;
    ID3D11DepthStencilView *pDstDSV;
};
The structure contains two nested structures: SLightAttribs defines the light source parameters (such as light direction and color, light source position on the screen, etc.) and SCameraAttribs defines the camera attributes (position and the world, view and projection matrices). The destination render target to which the post-processed scene is written is specified by the pDstRTV member.
The second argument PPAttribs of the PerformPostProcessing() defines the method parameters such as number of epipolar slices, number of samples, initial sample step etc.
9. Performance
Performance results are given for rendering the scene shown in fig. 21.
The effect configuration is:
Number of epipolar slices: 512
Total number of samples in slice: 256
Initial sample step: 16
Downscale factor: 4
Shadow map resolution: 1024 x 1024
Screen resolution 1024x768
Hardware configuration: 3rd Gen Intel® Core™ processor (code-named Ivy Bridge) with Intel® HD Graphics, 4 GB RAM.
Total time required to render the scene is 18.7 ms, of which 9.906 ms (52.9%) is spent on post processing.
The timings for the individual steps of the algorithm are as follows:
Reconstructing camera space z coordinate: 0.390 ms (3.94 %)
Rendering coordinates texture: 0.595 ms (6.01 %)
Refining sample locations: 0.557 ms (5.62 %)
Constructing min/max shadow map: 0.736 ms (7.43 %)
Ray marching: 2.638 ms (26.63 %)
Interpolation: 0.141 ms (1.42 %)
Transformation to rectangular coords: 0.306 ms (3.09 %)
Fixing depth discontinuities: 1.015 ms (10.25 %)
Upscaling: 2.498 ms (25.22 %)
Fixing depth discontinuities: 1.030 ms (10.40 %)
Total: 9.906 ms (100 %)
Without the 1D min/max mipmap optimization, the ray marching step alone takes 5.651 ms, which is 1.67x slower than the 2.638 + 0.736 = 3.374 ms of the optimized version (ray marching plus min/max map construction). The performance improvement is even higher for higher shadow map resolutions or a higher number of epipolar slices/samples per slice.
Note that according to the timings presented in [ED10] for an ATI Radeon HD4850 GPU, epipolar line generation took 1.2 ms while the discontinuity search took 6.0 ms, i.e. five times longer. In our implementation, thanks to the compute shader optimization, the discontinuity search takes even less time than coordinate texture generation.
Note also that Chen et al. report 55 fps when rendering their test scene on a high-end NVIDIA GTX480 GPU [CBDJ11]. With similar quality settings (4096x4096 shadow map, 1024 epipolar slices with 1024 samples each, 1x downscale factor, colored light shafts, 1280x960 screen resolution) and a scene of similar complexity (fig. 22), we were able to get more than 100 fps on the same graphics card and a similar CPU.
10. Conclusion
The volumetric lighting method presented in this sample efficiently combines epipolar sampling with 1D min/max mipmap construction and a number of optimization tricks to achieve high rendering speed and visual quality on Intel integrated graphics. Being a purely post-processing technique, it can be easily integrated into game engines and other applications.
The source code for the sample can be found on the page linked in the original post. Note that the algorithm presented here handles light shafts from a directional light source only; implementations for other light source types can be downloaded from the linked page. The slides of the corresponding GDC talk are also available there.
11. References
[PSS99] Preetham A.J., Shirley P., Smits B.E. (1999) A practical analytic model for daylight. In: Computer Graphics Proceedings, Annual Conference Series (Proc.SIGGRAPH ’99), pp 91–100.
[NSTN93] Nishita T., Sirai T., Tadamura K., Nakamae E.: Display of the Earth taking into account atmospheric scattering. In SIGGRAPH 93 (1993), ACM, pp. 175–182.
[HP02] Hoffman N., Preetham A. J.: Rendering outdoor light scattering in real time. Proceedings of Game Developer Conference (2002).
[ED10] Engelhardt, T., and Dachsbacher, C. 2010. Epipolar sampling for shadows and crepuscular rays in participating media with single scattering. In Proc. 2010 ACM SIGGRAPH symposium on Interactive 3D Graphics and Games, ACM, 119–125.
[BN08] Bruneton, E., and Neyret, F.: Precomputed atmospheric scattering. Computer Graphics Forum 27:4 (2008), 1079–1086. Proceedings of the 19th Eurographics Symposium on Rendering, special issue.
[CBDJ11] Chen, J., Baran, I., Durand, F., and Jarosz, W. 2011. Real-time volumetric shadows using 1d min-max mipmaps. In Proceedings of the Symposium on Interactive 3D Graphics and Games, 39–46.
[GMF09] Gautron, P., Marvie, J.-E., and Francois, G. 2009. Volumetric shadow mapping. ACM SIGGRAPH 2009 Sketches.
[TIS08] Tevs, A., Ihrke, I., and Seidel, H.-P. 2008. Maximum mipmaps for fast, accurate, and scalable dynamic height field rendering. In Symposium on Interactive 3D Graphics and Games (i3D’08), 183–190.