go to: [part 2][home]


Implementing Ray Tracing on GPU
Part 1


This document is meant as supplement to the pdf document "Ray Tracing on GPU" found in the download area on [home]. The pdf paper doesn't contain shader source code. Shader source code with partial documentation is found here.


Shader Language: GLSL and HLSL

GPU Ray Tracing is presented in shader source code that is implementable in both GLSL and HLSL with minor modifications. In the HLSL part I map datatypes using “#define vec3 float3” etc. to have a common vector data type for both languages. From the standpoint of syntax, GLSL and HLSL are pretty much equal.

Basic Ray Tracing Concept

The paths of individual rays of light are followed from the viewer to their points of origin.
The primary task of a ray tracer is to find intersections of a ray with a scene consisting of a set of geometric primitives.
The GPU ray tracer will be limited to geometric primitives consisting of triangles (mesh based objects).
The first step is to calculate primary rays originating from the camera. The next step is to reflect rays and calculate resulting intersections. If a point is in shadow, another object (“occluder”) is between the point and the light source. This concept is – of course – very simplified, if you want to learn more about ray tracing, you find plenty of resources in the net.

Data Types for the Shader Programs

In both HLSL and GLSL it is possible to define structures. Using structures in shader programs makes souce code is easier to read, nothing else. For the ray tracer, only two structures “Ray” and “Triangle” are used.

struct Ray
{
   vec3 startpoint;
   vec3 direction;
};

struct Triangle
{

   vec3 A;
   vec3 B;
   vec3 C;
};



The HLSL version additionally uses structures for vertex shader input/output semantics, and pixel shader input/output semantics.

Vertex Shader Input Semantics

struct vertex_attribute
{
    float4 position : POSITION;
    float2 texCoord : TEXCOORD0;
};

Vertex Shader Output Semantics

struct varying_data
{
    float4 hposition    : POSITION;
    float2 texCoord     : TEXCOORD0;
};


Pixel Shader Input Semantics

struct PixelShaderInput
{
#ifdef USE_PS_3
    float4 hposition    : POSITION;
#endif
    float2 texCoord     : TEXCOORD0;
#ifdef USE_PS_3    
    float2 screencoord  : VPOS;   

// Pixel Shader 3.0 allows reading of current screen coordinates
#endif
};


Pixel Shader Output Semantics for Multiple Render Targets (MRT)

// (1) Pixel Shader Output Semantics: write to 1 texture
struct pixel
{
   float4 color         : COLOR;
};

// (2) Pixel Shader Output Semantics: write to 2 textures
struct pixel2
{
   float4 color0        : COLOR0;
   float4 color1        : COLOR1;
};

// (3) Pixel Shader Output Semantics: write to 3 textures
struct pixel3
{
   float4 color0        : COLOR0;
   float4 color1        : COLOR1;
   float4 color2        : COLOR2;
};

In GLSL gl_FragData[i] can be used to write to multiple render targets (MRT).

1:1 Texel to Pixel Mapping

varying_data stream_generator( vertex_attribute IN )
{
   varying_data OUT;
 
   OUT.hposition = IN.position;
   OUT.texCoord.x = IN.texCoord.x;
   OUT.texCoord.y = IN.texCoord.y;
   return OUT;
}

AABB Check: Ray hits Box ?

An important utiliy function for a ray tracer is to check if a ray hits an axis aligned box, and return the possible resulting hit-point. The algorithm is based on the Smits method mentioned in “An Efficent and Robust Ray-Box Intersection Algorithm” by Williams et al. The code uses IEEE floating point standard to avoid explicit testing of values: in the IEEE standard division zero returns +INF or –INF. I wrote a little program to test this for NVidia cards and this aspect of the IEEE is implemented correctly on current GPUs.

bool hitbox(Ray r, vec3 m1, vec3 m2, out float tmin, out float tmax) 
 {
   float tymin, tymax, tzmin, tzmax; 
   float flag = 1.0; 
 
    if (r.direction.x >= 0) 
    {
       tmin = (m1.x - r.startpoint.x) / r.direction.x;
         tmax = (m2.x - r.startpoint.x) / r.direction.x;
    }
    else 
    {
       tmin = (m2.x - r.startpoint.x) / r.direction.x;
       tmax = (m1.x - r.startpoint.x) / r.direction.x;
    }
    if (r.direction.y >= 0) 
    {
       tymin = (m1.y - r.startpoint.y) / r.direction.y; 
       tymax = (m2.y - r.startpoint.y) / r.direction.y; 
    }
    else 
    {
       tymin = (m2.y - r.startpoint.y) / r.direction.y; 
       tymax = (m1.y - r.startpoint.y) / r.direction.y; 
    }
     
    if ((tmin > tymax) || (tymin > tmax)) flag = -1.0; 
    if (tymin > tmin) tmin = tymin; 
    if (tymax < tmax) tmax = tymax; 
      
    if (r.direction.z >= 0) 
    {
       tzmin = (m1.z - r.startpoint.z) / r.direction.z; 
       tzmax = (m2.z - r.startpoint.z) / r.direction.z; 
    }
    else 
    {
       tzmin = (m2.z - r.startpoint.z) / r.direction.z; 
       tzmax = (m1.z - r.startpoint.z) / r.direction.z; 
    }
    if ((tmin > tzmax) || (tzmin > tmax)) flag = -1.0; 
    if (tzmin > tmin) tmin = tzmin; 
    if (tzmax < tmax) tmax = tzmax; 
      
    return (flag > 0); 
 }

Point In AABB ?

Another important function is to check if a point is within an axis aligned box. This can be done using a "#define":

#define inbox(v, b1, b2) ((v.x>=b1.x) && (v.y>=b1.y) && (v.z>=b1.z) && (v.x<=b2.x) && (v.y<=b2.y) && (v.z<=b2.z))



Unfortunatly it is not possible to use a vectorized check:
       (v >= b1 && v <=b2)

Early Z-Culling

Early Z-culling is an optimization method provided by some graphics cards, including NVidia GeForce 6 series. It is possible to prevent execution of certain (long) shader programs based of depth culling. In the first pass values are written to a depth mask. Pixels that shouldn’t be processed any further are “masked out”. In the next pass the computational expensive shader will only process the pixels which were not masked out in the previous pass.


go to: [part 2][home]