Let me explain a little extension I have in mind:
MOTIVATION
Real transparency sorting is very slow. Methods like depth peeling or per-triangle sorting are too expensive.
Traditional shadow methods ( shadow maps, shadow volumes, etc… ) have tons of problems and are too tricky ( bias, infinite extrusion caps, perspective aliasing, filtering… )
Physics is coming to the world of GPUs. These days we are witnessing a war between the GPUs and the AGEIA card.
New “parallax” bump methods are coming. The latest is “parallax occlusion mapping” ( I’m sure you saw the wonderful ATI presentation about it ).
There are tons of applications that could use HW acceleration for normal map generation, ambient occlusion or lightmap/PRT computation.
We need a method to cast “rays” from the shaders so we can perform:
- Real and high-accurate penumbra shadows
- Transparency sorting
- Collisions and physics in the GPU
- More accurate “parallax occlusion” bump mapping using a heightmap.
- Medical 3D voxel applications
- Volumetric fog ( distance between closest ray hit and furthest )
- Sub-surface scattering
PROPOSAL
We can create an “object batch”. This is really like an occlusion query batch: we tell the driver that some geometric objects are “grouped”. We could use something like:
GLint query = glCreateObjectQuery();
glStartObjectQuery(query);
glAddToObjectQuery(query,obj.vb,obj.ib,obj.id);//vb is a VBO, ib is an index buffer and id is an identifier for the object ( like in the geometry shaders )
glAddToObjectQuery(query,obj2.vb,obj2.ib,obj2.id);
glAddToObjectQuery(query,obj3.vb,obj3.ib,obj3.id);
//Add more objects to the query forming a "group set"
...
glEndObjectQuery(query);
The driver now calculates the AABB/OBB of the set. It could also build an octree or another spatial structure for future use.
Of course, it would also be good to allow “queries inside queries” to make this hierarchical.
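To make the idea concrete, here is a minimal sketch ( plain C, purely hypothetical driver-side code, not part of any real API ) of what the driver could compute at glEndObjectQuery() time: fold every vertex of the grouped objects into one AABB.

```c
#include <float.h>
#include <stddef.h>

typedef struct { float min[3], max[3]; } AABB;

/* Hypothetical driver-side helper: fold an array of xyz vertices
   ( the concatenated VBOs of the group ) into one bounding box. */
AABB compute_group_aabb(const float *vertices, size_t vertex_count)
{
    AABB box = { {  FLT_MAX,  FLT_MAX,  FLT_MAX },
                 { -FLT_MAX, -FLT_MAX, -FLT_MAX } };
    for (size_t i = 0; i < vertex_count; ++i)
        for (int a = 0; a < 3; ++a) {
            float v = vertices[i * 3 + a];
            if (v < box.min[a]) box.min[a] = v;
            if (v > box.max[a]) box.max[a] = v;
        }
    return box;
}
```

The same fold over the group’s child AABBs instead of raw vertices would give the parent box for the “queries inside queries” hierarchy.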
Ok, now this extension will define a GLSL function called “castRay”:
bool castRay ( in vec3 origin, in vec3 endPoint,
               out vec3 hitPoint, out vec3 hitNormal, out int triangleId, out int primitiveId );
We pass the function the ray origin and end point ( which is origin + (rayDir*distance) ).
The function returns TRUE if there is a hit, and then fills the out params with the hit point, the interpolated normal, and the geometry-shader triangleId and primitiveId.
The function returns FALSE if there is NO hit ( and then the out params are not written ).
This function iterates over all the previously created “object queries”, trying to find a ray-triangle intersection. The driver must internally optimize the “object queries” using AABBs and some kind of hierarchical structure to speed up the intersection tests ( like AGEIA does ).
The “castRay” function should be available in BOTH vertex and fragment shaders ( and perhaps also in the upcoming geometry shaders? )
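The leaf-level test the driver would run inside such a “castRay” could be the classic Möller–Trumbore ray/triangle algorithm. A sketch in plain C ( this is just one standard way to do it, not how any real driver implements it ):

```c
#include <math.h>

/* Möller–Trumbore ray/triangle intersection. Returns 1 on hit and
   writes the hit distance along dir to *t_out; returns 0 on miss. */
int ray_triangle(const float orig[3], const float dir[3],
                 const float v0[3], const float v1[3], const float v2[3],
                 float *t_out)
{
    const float EPS = 1e-7f;
    float e1[3], e2[3], p[3], tv[3], q[3];
    for (int i = 0; i < 3; ++i) { e1[i] = v1[i] - v0[i]; e2[i] = v2[i] - v0[i]; }
    /* p = dir x e2 */
    p[0] = dir[1]*e2[2] - dir[2]*e2[1];
    p[1] = dir[2]*e2[0] - dir[0]*e2[2];
    p[2] = dir[0]*e2[1] - dir[1]*e2[0];
    float det = e1[0]*p[0] + e1[1]*p[1] + e1[2]*p[2];
    if (fabsf(det) < EPS) return 0;            /* ray parallel to triangle */
    float inv = 1.0f / det;
    for (int i = 0; i < 3; ++i) tv[i] = orig[i] - v0[i];
    float u = (tv[0]*p[0] + tv[1]*p[1] + tv[2]*p[2]) * inv;
    if (u < 0.0f || u > 1.0f) return 0;        /* outside first barycentric */
    /* q = tv x e1 */
    q[0] = tv[1]*e1[2] - tv[2]*e1[1];
    q[1] = tv[2]*e1[0] - tv[0]*e1[2];
    q[2] = tv[0]*e1[1] - tv[1]*e1[0];
    float v = (dir[0]*q[0] + dir[1]*q[1] + dir[2]*q[2]) * inv;
    if (v < 0.0f || u + v > 1.0f) return 0;    /* outside triangle */
    float t = (e2[0]*q[0] + e2[1]*q[1] + e2[2]*q[2]) * inv;
    if (t < EPS) return 0;                     /* hit behind the origin */
    *t_out = t;
    return 1;
}
```

The barycentric u,v this computes are also exactly what the driver would need to produce the interpolated hitNormal out param.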
With this, you could do collisions, shadows, etc… inside GLSL.
For example, you could perform raytraced shadows in the fragment shader in a very simple way:
uniform vec3 lightPos;
uniform float lightRange;
uniform sampler2D baseTex;
varying vec3 vPos;
void main ()
{
    vec3 base = texture2D(baseTex,gl_TexCoord[0].st).rgb;
    vec3 hitPos, hitNormal;
    int triangleId, primitiveId;
    //in practice the end point would need a small bias towards the light to avoid hitting the surface itself
    bool inShadow = castRay(lightPos,vPos,hitPos,hitNormal,triangleId,primitiveId);
    if ( inShadow )
    {
        base *= 0.4;
    }
    gl_FragColor = vec4(base,1.0);
}
When the OGL driver finds the “castRay” GLSL instruction, it will iterate over all the “object batches” previously created with the glCreateObjectQuery() function.
If the ray doesn’t touch the group AABB, no test is performed and that triangle set can be skipped SUPERFAST.
If the ray touches the group AABB, it then tests the AABBs inside it ( an iterative process ). Once a “node limit” is reached, it performs ray-triangle hit tests ( yes, this is slow, but it only happens a few times, and this is where the NVIDIA/ATI engineers’ brains should work to optimize it ).
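The “skip the group SUPERFAST” broad phase above is just a segment/AABB overlap check. A sketch with the standard slab method in plain C ( hypothetical driver-side code, parameterized over the segment from origin to endPoint ):

```c
#include <math.h>

typedef struct { float min[3], max[3]; } Box;

/* Slab-method segment/AABB overlap test: returns 1 if the segment
   from orig to end can touch the box, 0 if the whole group of
   triangles inside it can be skipped. */
int segment_hits_aabb(const float orig[3], const float end[3], const Box *b)
{
    float tmin = 0.0f, tmax = 1.0f;           /* segment parameter range */
    for (int a = 0; a < 3; ++a) {
        float d = end[a] - orig[a];
        if (fabsf(d) < 1e-9f) {               /* segment parallel to slab */
            if (orig[a] < b->min[a] || orig[a] > b->max[a]) return 0;
        } else {
            float t0 = (b->min[a] - orig[a]) / d;
            float t1 = (b->max[a] - orig[a]) / d;
            if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
            if (t0 > tmin) tmin = t0;
            if (t1 < tmax) tmax = t1;
            if (tmin > tmax) return 0;        /* slab intervals don't overlap */
        }
    }
    return 1;
}
```

The same test applied recursively to child boxes gives the hierarchical descent described above.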
CONCLUSION
What do you think about this? With this we can perform raytracing on the GPU very easily, and it will be HW/driver optimized very fast. This feature is very good for achieving TONS of effects like the ones mentioned.
Also, “object batch queries” can be used by the driver to perform HW-accelerated simple frustum culling: because the driver knows the AABB or OBB of the group sets, it could skip these objects FAST if they are not visible…
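That culling is cheap to sketch too. One conservative approach ( again hypothetical driver-side C, using the usual “positive vertex” trick ) tests the group AABB against the six frustum planes:

```c
/* Plane stored as n·x + d >= 0 for points on the inside. */
typedef struct { float n[3]; float d; } Plane;

/* Conservative AABB/frustum test: if the box corner furthest along a
   plane's normal is outside that plane, the whole group is invisible
   and can be culled. Returns 1 when the box is fully outside. */
int aabb_outside_frustum(const float mn[3], const float mx[3],
                         const Plane planes[6])
{
    for (int p = 0; p < 6; ++p) {
        /* pick the "positive vertex" for this plane's normal */
        float px = planes[p].n[0] >= 0.0f ? mx[0] : mn[0];
        float py = planes[p].n[1] >= 0.0f ? mx[1] : mn[1];
        float pz = planes[p].n[2] >= 0.0f ? mx[2] : mn[2];
        if (planes[p].n[0]*px + planes[p].n[1]*py + planes[p].n[2]*pz
            + planes[p].d < 0.0f)
            return 1;   /* fully outside this plane: cull the group */
    }
    return 0;           /* inside or intersecting: must be drawn */
}
```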
However, the performance could be bad… That is why the graphics cards should implement some kind of spatial structure to accelerate the ray-triangle collision tests ( like a kd-tree or axis-aligned bounding boxes ). Perhaps dynamic objects shouldn’t be available for this, so we could limit it to static objects to start??? Also, perhaps the driver could forget octrees/kd-trees and just fire a HW occlusion query, drawing all the “object queries” and checking whether the pixel is visible from a camera placed at the ray origin??? Well, “castRay” could be implemented by the graphics engineers using different methods…
“Object batch queries” combined with the “castRay” GLSL instruction ( and combined with the geometry shaders too ) can open a new world for the GPU: raytracing, the next step we are all waiting for to achieve real and amazing new effects!