Post by Jon Harro » Sun, 21 Jan 2007 14:23:28

I was reading up on real time ray tracing and stumbled upon this demo:

It is one of the best such demos that I've seen.

I thought I'd try my hand at remaking the demo using OpenGL instead of ray tracing.

When hardware accelerated, the OpenGL-based renderer can render the same
image over 100x faster than the real time ray tracer, with 1/10th as much code.

But I noticed something else interesting. When my graphics drivers were
broken and OpenGL resorted to software rendering, it was still faster than
the ray tracer.

This leads me to think that a z-buffer (or s-buffer) might be a better way
to render primary rays in a ray tracer. Tracing ray bundles seems like a
first step in this direction.
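
To make the idea concrete, here is a minimal sketch in C (hypothetical types and names, not taken from either demo's code) of resolving primary "rays" with an edge-function rasterizer: the scene is rasterized into a per-pixel primitive-ID buffer, and a hybrid tracer would then shade and spawn shadow/secondary rays from that buffer instead of tracing primaries at all.

```c
#include <assert.h>
#include <string.h>

#define W 8
#define H 8

typedef struct { float x, y; } Vec2;

/* signed area test: which side of edge (a,b) the point p lies on */
static float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

/* Rasterize one triangle into an ID buffer; a hybrid tracer would
   read buf[y][x] to find the primitive hit by the primary ray for
   that pixel (0 here means "no hit"). */
void rasterize_id(Vec2 v0, Vec2 v1, Vec2 v2, int id, int buf[H][W]) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            Vec2 p = { x + 0.5f, y + 0.5f };
            float e0 = edge(v0, v1, p);
            float e1 = edge(v1, v2, p);
            float e2 = edge(v2, v0, p);
            /* inside iff all three edge functions agree in sign */
            if ((e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                (e0 <= 0 && e1 <= 0 && e2 <= 0))
                buf[y][x] = id;
        }
}
```

A real implementation would of course also interpolate depth into a z-buffer so overlapping triangles resolve correctly, which is exactly the coherence a primary-ray tracer has to rediscover per pixel.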

PS: Anyone got any links to better real time ray tracing demos?
Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists


Post by kalle.las » Sat, 27 Jan 2007 20:51:45

You haven't seen much then I guess :)

First of all, have you got any idea how much more FP power your GPU has
compared to the CPU?

You seem not to use shadowing in your example, whereas the ray tracer has
dynamic pixel-perfect self-shadowing. Please add it to your demo and
then come back to compare the two. Vertex lighting is not shadowing.
Another interesting thing would be if you tried to render that
Buddha statue with full detail (~1M triangles) and with (pixel perfect)
shadows. I wouldn't be too surprised if the CPU beat HW accelerated
OpenGL in that scenario, even though it has much less raw power.

Anyway, ray tracing a bunny isn't really the best thing to do with ray
tracing. Add in some reflections-refractions and shadows and then start
comparing RT vs rasterizing in terms of speed, FLOPS used and code length.

As I said, your test scene wasn't good when comparing the two rendering
algorithms. Add shadows to your demo and then we'll talk about if
software OpenGL is faster than software ray tracing.

It wouldn't help all that much since primary rays take only part of the
time in rendering. When you add in shadows and secondary rays then
tracing primaries won't make a big difference any more, especially when
you use things such as MLRTA.

Of course:
Look under "visuals" and "tools, demos & sources". Especially
interesting is the Arauna tracer, available here: #2721

Ahh, the ffconsultancy. Have you already forgotten the thread that was
here around 1.5 years ago?

I'd like to make a request. You already have a program on your
home page that renders a sphereflake using OCaml. What would be the
speed and code length of a comparable OpenGL program?

In case you have forgotten where it is, here is a link:



Post by Jon Harro » Sat, 27 Jan 2007 22:03:17



I'm working on it. There are plenty of OpenGL programs already out there
with shadowing, of course. Look at Quake 4, for example.


I seriously doubt it. Jacco couldn't ray trace 1M pixels in real time, let
alone triangles...

Exactly. Ray tracing only seems to be good for sphereflakes.

Shadows won't make any difference. Reflections of distant objects can be
done with cube maps, so that won't make any difference either.

For accurate close-up reflections and refractions you'd need to use something
akin to ray tracing. With mathematically simple primitives like spheres,
ray tracing has an advantage. So ray tracing is great for reflective
sphereflakes and bad for everything else...

I already added a completely naive shadow rendering algorithm and using
OpenGL is still many times faster than Jacco's ray tracer. I'll optimise my
OpenGL and post it.

Even in real time ray tracers?

The screenshot on that page is about as compelling as Jacco's grainy bunny.

Sure, TBP said that sphereflakes should ray trace "in the blink of an eye".
Then he posted an optimised sphereflake ray tracer to back up his claim.

It took me a year to reply, but TBP's C++ ray tracer turned out to be only
60% faster than the OCaml:

OpenGL would not fare well. However, do you have a ray tracer that can
render that in real time? Many people have claimed that it has been done,
but I've never seen anything like it.

Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists


Post by kalle.las » Sun, 28 Jan 2007 04:35:14

Jon Harrop wrote:

You have no idea how wrong you are:
an ~80k static and 1k dynamic triangle scene with depth fog, reflections,
texture mipmapping, bumpmapping, bilinear filtering and lots of other
things runs at 8-10FPS @800x576 on my Core2Duo CPU. That is at least 4M
primary rays per second with 100% screen coverage. Together with shadow
and secondary rays, around 10M rays per second.
Btw, Jacco has got an eight core machine at his university and is developing a
game based on ray tracing.

And highly detailed scenes with reflections and refractions, things
that multipass rendering handles badly. Anyway, we shouldn't debate over
speed but over efficiency. If you want to debate over speed then at
least use comparable HW. In this case, software OpenGL vs software ray
tracing. Good luck running those pixel shaders on your CPU.

How many lines of code will it add to implement point and cone lights
with nice gradients?
You'd need six passes per point light to get decent shadow quality with
OpenGL-based rasterizing, assuming you use shadow maps. My knowledge about
rasterization is a bit rusty so I might be wrong, but I don't think I'm
too far off.
How much does it cost to create a detailed reflection cubemap and what
kind of quality would it have? I dare you to render the same scene that
Jacco does in his demo with the same effects and claim that OpenGL is
easier to write and more efficient.

Also, some scenes with several translucent objects behind each other
would be nice to see in OpenGL.

Since we don't have widely available HW acceleration for ray tracing
yet, I agree that currently OpenGL is better when you need maximum speed
and don't need to render massive and detailed scenes. Though RT HW is
being developed, and first tests show that RT needs around
half the memory bandwidth of regular rasterizers. With special purpose
chips (GPU, RPU, PPU) FP power is a lot cheaper than memory bandwidth.
In the future, as scene complexity rises, RT will have even more benefits
compared to rasterizing.

Also, if you haven't noticed, all the real time ray tracers you see
are tracing only triangle based scenes.

Please post some screenshots with code too when you are done. I can't
wait to see how simple it is to create pixel perfect dynamic lights
with self shadowing in OpenGL. Even better would be some speed
comparisons with shadows turned on and off. Until you are done with
this, I'd like to see that naive shadow thingie you have already
implemented.
Yes, why not?
Hint: take a look at some of the threads in

You didn't run the latest demo, did you? Anyway, here is the post with
some pictures; the post containing the latest demo is right above it:

There is a followup in the forum to that very same thread:
after lots of tweaking, OCaml is still quite a bit slower than C++,
5-6x slower if I remember correctly. Feel free to post a more optimized
tracer to that thread, but make sure you trace triangulated scenes.

You should join the ompf forum and ask someone to make a scene for that
sphereflake. I know that tbp's tracer is capable of tracing the 1M triangle
Happy Buddha at 15FPS; I think the sphereflake with all its reflections
should be around 5-10FPS on a Core2Duo @512x512 resolution, possibly more
if SIMD is used.



Post by Dr Jon D H » Mon, 29 Jan 2007 07:21:56


That's for 10x fewer triangles though?

I'm keen to see what he can come up with next. :-)

You want to cripple OpenGL to make the comparison fair? You'll be
comparing TBP's C++ with an OCaml newbie's code next... ;-)

I just implemented completely naive geometric shadows in ~90LOC and
I'm getting 10fps on my laptop at 1600x900. My current algorithm is
using dynamic data structures (mainly lists) that get completely
regenerated every frame. I'll optimise it by changing to static data
structures and incremental updates.

There are many different ways to do it. I've used shadow volumes
extruded on the CPU and stencilled in hardware. They're exact. Shadow
maps are probably faster but they're inexact.
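
The silhouette test at the heart of the shadow-volume approach can be sketched in a few lines of C (hypothetical types, assuming per-face normals are available; this is not the demo's actual code): an edge lies on the light silhouette exactly when one of its two adjacent faces points toward the light and the other does not, and only those edges need extruding into shadow-volume quads.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 sub(Vec3 a, Vec3 b) {
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static float dot(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* a face points toward the light if the light lies on its front side */
static int faces_light(Vec3 point_on_face, Vec3 normal, Vec3 light) {
    return dot(normal, sub(light, point_on_face)) > 0.0f;
}

/* an edge shared by faces A and B is on the silhouette when exactly
   one of the two faces is lit; only those edges get extruded */
int edge_on_silhouette(Vec3 pa, Vec3 na, Vec3 pb, Vec3 nb, Vec3 light) {
    return faces_light(pa, na, light) != faces_light(pb, nb, light);
}
```

The payoff mentioned above follows directly: extruding only silhouette edges touches O(sqrt(n)) edges per frame instead of every triangle, which matches the 2fps-to-30fps jump reported later in the thread.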

I think you're right.

I'll try to get his demo up and running (am away from home for the


Gallium Arsenide is the semiconductor of the future. It always has
been and it always will be.


Will do.

I just did it in 90LOC but it is too slow to be very useful.

Currently 1600fps without shadows and 10fps with. :-)

I'll post the trivial implementation as well as an optimised one then.
I think it is interesting to see the simplest possible way to do

Because they can't afford to trace much more than 1 primary and 1
shadow per pixel.

Not yet (no Windows here).

I'll try it when I get back to a Windows box.

The OCaml code was written by a newbie!

I already gave the author lots of help with his OCaml and posted
several improved versions to various groups and lists. The OCaml needs
a complete rewrite. It should be 4x smaller and 4x faster. I'll
probably redo the whole thing myself one day but I'm too busy ATM and
OGL is much more fun. :-)

Rendering the Buddha at 15fps with shadows and without LOD will be
tricky but I'll give it a go...




Post by Dr Jon D H » Tue, 30 Jan 2007 03:25:29

For 1280x800, 4x FSAA, 70k tri bunny on my 800MHz laptop I get:

80fps without shadows (85LOC)
2fps with every triangle extruded per frame (174LOC)
30fps with silhouette edges extruded per frame (231LOC)

for 200k tri dragon:

35fps without shadows.
9fps with silhouette edges extruded per frame.

How fast and long are real time ray tracers at similar scenes? Of
course, we need to run the benchmarks on the same machine...



Post by kalle.las » Tue, 30 Jan 2007 06:34:25

Dr Jon D Harrop wrote:

Want to bet that adding 10x the triangles won't bring performance down
nearly as much as with rasterizing? I would expect around a 1.5-3x
performance decrease from adding 10x more triangles to a scene.

As I asked, are we debating over SW/HW implementation speeds or
algorithmic efficiency? Currently you are trying to compare HW
accelerated rasterizing vs software ray tracing. Is it fair?
If you want to compare HW then perhaps you should compare dRPU8 vs any
GPU you like: -> Publications -> "SaarCOR - A
Hardware-Architecture for Realtime Ray Tracing "

Please upload your code somewhere so I could test it on my PC too.

Say what?
Also please read that dynRPU thingie and compare the transistor counts
and especially needed memory bandwidth to current GPU's.

I can run glxgears at 10k FPS too. FPS that high only shows you aren't
really benchmarking anything ;)

Who says you have to trace packets per pixel? Just read this paper:
they say tracing 16x16 ray packets is rather efficient, even for
secondary and shadow rays.
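
The coherence packets exploit is easy to illustrate: when the rays share an origin, part of the per-ray intersection arithmetic can be hoisted to packet level. A toy C sketch for a sphere (hypothetical code, not from Arauna or the cited paper): in the quadratic t^2 + 2bt + c = 0 for a unit-direction ray, the term c = |o-C|^2 - r^2 depends only on the shared origin and is computed once per packet, while only b = d.(o-C) varies per ray.

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 sub(Vec3 a, Vec3 b) {
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static float dot(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Intersect n unit-length rays with a common origin o against a sphere.
   hit[i] is set to 1 if ray i hits at some t >= 0, else 0. */
void packet_hit_sphere(Vec3 o, const Vec3 *dirs, int n,
                       Vec3 center, float radius, int *hit) {
    Vec3 oc = sub(o, center);
    float c = dot(oc, oc) - radius * radius;   /* shared: once per packet */
    for (int i = 0; i < n; i++) {
        float b = dot(dirs[i], oc);            /* per-ray work */
        /* real roots exist and the larger root -b + sqrt(b*b - c)
           is non-negative (hit in front of, or from inside, the sphere) */
        hit[i] = (b * b - c >= 0.0f) && (-b >= 0.0f || c <= 0.0f);
    }
}
```

A production packet tracer (with SIMD, as in the paper) goes further and shares BVH/kd-tree traversal across the whole 16x16 bundle, which is where most of the win comes from; this sketch only shows the shared-arithmetic principle.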

It works quite nicely through Wine. I don't have Windows installed
either. It shouldn't lose too much performance. The only thing is that
SSE2 is required. Without it, it won't work.

Somehow I doubt you can create much better code but feel free to prove
me wrong. Also don't forget to post it to ompf, it would make an
interesting discussion subject.



Post by kalle.las » Tue, 30 Jan 2007 07:08:30

Dr Jon D Harrop wrote:

Seems you have increased the performance quite a bit. Though
losing around 2.6x performance with only a single (spot?)light is still
rather bad in my opinion. Try the Arauna demo and comment out a
spotlight there and you'll see it has minimal impact on performance.

Gee, increasing the polycount by less than 3x, you lose around 2.5x
performance. Linear scaling sucks, doesn't it :)
Also, it seems that turning on the shadows will decrease performance
around 4x for that 200k model. Would you want to make any guesses on
how much performance would decrease when you have four such models
shaded and each covering a quarter of the screen? What do you think,
how much performance would a ray tracer lose in a similar situation?
Also, what if you have a scene with ten such models: three onscreen,
three out of the view frustum but with shadows partially inside it, and
the rest completely invisible, including their shadows?

Define "long".
As for speed, I don't know. It largely depends on how much of the
screen is covered by the model and on how large a surface the shadow
covers. Before I get some decent tracer working on my PC I can't
give you any hard numbers.

Assuming the screen is 40% covered by something (60% is empty
space with nothing behind it) and the resolution is the same as yours
(4xFSAA replaced by 4 samples per ray), I would guess around 3-6 FPS
on both models on my PC, that is around 12-25M rays per second. My CPU
is easily capable of that in such a scene. If someone implemented
adaptive antialiasing and the algorithms described in the paper I linked
in my last post (the one with 16x16 ray packets) I would assume much
higher FPS.

Unfortunately I don't have the code for the latest tracers, though I
can ask. The problem is that the one used in Arauna currently only
runs on Windows. I have some older and slower version of that
tracer but it isn't completely ported to Linux. I might try to hack
Arauna a bit to make it accept some custom models, though I'm not sure
if that is simple now that it has some dynamic stuff added.

As for PC's, I have Core2Duo e6300 OC'd to 3.1GHz and 6600GT at


Post by Dr Jon D H » Tue, 30 Jan 2007 21:22:57


It isn't very compelling for you to claim that real time ray tracers
can handle 1M triangle scenes with ease, then point to a 100k tri
benchmark and then speculate that the ray tracer would only be
"1.5-3x" slower. We need code and measurements! :-)

I can render the 200k tri dragon full screen with 4xFSAA at 15fps now,
still on an 800MHz laptop...

I want to know if real time ray tracing can compete with rasterising
on modern consumer hardware, e.g. if we were to write a game now,
would it be worth considering ray tracing?

The vast majority of current computers have hardware rasterisation but
little in the way of hardware ray tracing.

The same arguments apply to DEC Alpha vs x86. I'm really interested in
learning about the practical trade-offs. Of course, if nVidia
announced serious hardware accelerated ray tracing in a future GPU,
that would change everything.

I'll put it on the web page when I get back home.

Ray tracing has been the future of rendering for as long as I can
remember. Some things (fusion reactors, ...) will always be a thing of
the future.

I don't doubt it. But I have a GF7900GT, not a dynRPU.


Sounds like a step towards rasterising for primaries to me. They're
exploiting exactly the same coherence.

I don't have Wine on here (does it work from 64-bit?).

I already did. My original OCaml ray tracer was much faster (only 60%
slower than C++) and I posted several major rewrites of his code, e.g.
to completely avoid OOP.

I thought I already had. I certainly posted to the caml-list (fa.caml).




Post by Dr Jon D H » Tue, 30 Jan 2007 22:04:21

It just got a bit faster again...

Yes. There is lots more scope for optimisation though.

Scaling well with respect to the number of lights is no good if you're
slower in all cases.

A ray tracer will have that disadvantage with respect to pixels, so it
is swings and roundabouts. The only thing we can compare objectively
is overall performance. So, I'm now getting 80fps at 512x512 for the
70k tri bunny and 21fps for the 200k tri dragon for only 235LOC.

So Jacco's ray tracer is ~32x slower now. And I haven't finished
optimising or even started approximating...

Thanks to LOD, the triangle count and performance will remain the same.

The same.

You'd have to be careful to avoid extruding silhouettes unnecessarily
but you can always recover the asymptotic complexity of ray tracing,
e.g. using hierarchical occlusion culling.


We should look at dynamic geometry too...

Right, I'll have to benchmark my renderer when I get back to my



Post by kalle.las » Wed, 31 Jan 2007 09:19:49

Dr Jon D Harrop wrote:

I downloaded an almost four months old demo of Phantom's and hacked
around with it a bit. I couldn't use a newer one since they seemed to
have too many things hardcoded, so creating custom scenes was
difficult. I can't currently remember how much the speed has increased
compared to the latest version. Though it will be a bit difficult to
compare since the latest one uses all sorts of funky things such as
bumpmapping, depth fog and reflections. Anyway, here are the results:

The 80k polygon Sponza scene at 18.7 FPS on my 3.1GHz C2D:

The same Sponza with the Happy Buddha statue added, a total of around
80k+1M triangles. ~7.6FPS :

The drop in speed from increasing the polygon count around 12.5 times
is only ~2.5x.

I hope to see a similar scene from you soon. If you intend to use
massive LOD then I'll just request some closeup mugshots where you
have to use the fully detailed model. Normalmapping and other things
won't count ;)

The program I used is here:
Unfortunately, the latest available source is much older than that, from
around May. It currently isn't available on the net but I can upload
it somewhere if you'd like. The only problem is that it works only
on Windows; I couldn't quite finish porting it to Linux. I almost got
a single threaded version working but for some reason it was much slower
than the Windows version.

Depends on what HW you are targeting. If you target a 5GHz quadcore with
128bit wide SSE2 units and SSE4 support then you can use ray tracing
to get reasonable results. If you target lower-end CPUs it will get a bit
difficult unless you start lowering the resolution. So in short, if you
plan to release your game in less than 4 years RT isn't that useful.
Though it is still quite fun to play around with.

Of course, if you target offline rendering then I would say that ray
tracing beats rasterizers quite a bit. With such rendering you usually
can't use rasterizing HW to speed things up, and when run on the same
CPU with complex shading and detailed scenes RT will most probably
win. If you want to prove me wrong then you can start by giving some
performance numbers on that Buddha and Sponza scene when rasterized
using either your own custom written rasterizer (include the LOC number
too!) or software OpenGL :)

I know, but I would expect CPUs to become faster and gain more cores.
Intel will probably have 8-core CPUs with a new and improved
architecture around 2008-2009 running at >5GHz. Together with some of
the SSE4 instructions (single cycle dot product) these would be fast
enough for good quality ray traced games.

Before you say that fast CPUs are a pipe dream, I'll note there has
been talk that Intel will release 3.7GHz 45nm quadcores before this
year is over. Assuming they are as easily OC'able as 65nm C2D's, it
wouldn't be too difficult to make 32nm CPUs run >5GHz in three years.
Intel plans to use every tech node for about two years before it moves
to the smaller one. That means 32nm by 2009 and 22nm by 2011.

Of course, if some HW vendor starts producing RT HW it will suddenly
get a lot more interesting. The only problem is I can't see anyone doing
it in the near future. Perhaps only Intel with its upcoming >50 core
CPUs where some of the cores might be specialized.


Post by Jon Harro » Wed, 31 Jan 2007 23:27:20

On my desktop machine I'm getting 140fps for the 70k tri bunny at 1920x1200
and 35fps for the 200k tri Dragon.


I get 3fps for the Buddha at 1920x1200 but the model really needs LOD.

That won't cause a problem provided I'm using adaptive subdivision. Check
out the planet renderer I wrote several years ago:

Without adaptive LOD, any renderer will struggle with 2^80 triangles. :-)

I get up to 20fps but the image is tiny, only 1/10th the size of my screen.
How many triangles are there in the whole scene?

Interesting. I've got Windows and Linux here now.


I agree that ray tracing will win for non-realtime when you have a warehouse
full of CPUs.

Yes, but GPUs are increasing in speed faster than CPUs.


Absolutely. I don't know how massively-parallel CPUs will change rasterising.

But will it ever get here?

That is impressive for a ray tracer but the image quality is nothing like as
good as the real Quake 4. My guess is that they've avoided all non-trivial
lighting calculations. Anyway, it is biased to try to mimic games because
they are designed for rasterising and not RT. If you want to show off RT
then you really need to use features that are very difficult to do
efficiently without RT, like sphereflakes.

Yes, but immature disciplines often see greater improvements for less
effort.
You can do many good approximations, of course:

Whether you're tracing triangles or spheres, if an OCaml program is 6x
slower than a C++ program you've done something wrong in the OCaml. That
particular OCaml program has many different inefficiencies. I listed all of
the ones that I noticed when the program was first posted and even made
some of the changes myself.

Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists


Post by ander » Thu, 01 Feb 2007 19:11:30

> > Of course, if you target offline rendering then I would say that ray

Absolutely not, I'm afraid. For high-quality rendering, hybrid REYES
rasterizers/raytracers win hands down (I'm thinking specifically of
PhotoRealistic RenderMan here). In fact, when one starts getting into
ridiculously complicated datasets (rendering hair, for example),
raytracing ceases to be a viable option altogether. Of course, it
becomes better for raytracing on 64-bit architectures, but that
assumes you have enough money to buy hundreds of >8-core, >32GB
renderfarm machines, not to mention keeping those suckers cool :)



Post by Jon Harro » Fri, 02 Feb 2007 03:13:54

I should probably add that the OCaml is using a different algorithm that is
asymptotically slower than the one used by the C++.

Dr Jon D Harrop, Flying Frog Consultancy
Objective CAML for Scientists