Thanks, Peter.
You can speed things up by doing the computation on the FPU stack, keeping the intermediate values there. It is quite tricky, but satisfying.
You could also set your own pixels in a DIB and copy the whole block to the screen buffer as a sprite. I'm not sure how that is done in GDI.
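For what it's worth, here is a rough sketch in C of how the DIB route might look. It is only one possible approach, not a tested recipe: the pixels are written straight into a plain memory buffer, and the whole block is then handed to GDI in a single SetDIBitsToDevice call. The names sprite_pixels, pack_rgb, fill_gradient and blit_sprite, and the 64x64 size, are made up for illustration. The GDI part is guarded with #ifdef _WIN32 so the buffer code still compiles elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical sprite size, chosen only for this sketch. */
enum { SPRITE_W = 64, SPRITE_H = 64 };

/* One 32-bit value per pixel, laid out as 0x00RRGGBB.
   This matches the memory layout of a 32 bpp BI_RGB DIB. */
static uint32_t sprite_pixels[SPRITE_W * SPRITE_H];

static uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Write every pixel directly in memory: no per-pixel GDI call.
   Red ramps left to right, green ramps top to bottom. */
static void fill_gradient(uint32_t *pixels, int width, int height)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            uint8_t r = (uint8_t)(x * 255 / (width - 1));
            uint8_t g = (uint8_t)(y * 255 / (height - 1));
            pixels[(size_t)y * (size_t)width + (size_t)x] = pack_rgb(r, g, 0);
        }
    }
}

#ifdef _WIN32
/* Copy the whole buffer to a device context in one call.
   hdc would typically come from BeginPaint or GetDC. */
static void blit_sprite(HDC hdc, int dest_x, int dest_y)
{
    BITMAPINFO bmi;
    ZeroMemory(&bmi, sizeof bmi);
    bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth       = SPRITE_W;
    bmi.bmiHeader.biHeight      = -SPRITE_H;  /* negative height: rows run top-down */
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    SetDIBitsToDevice(hdc, dest_x, dest_y, SPRITE_W, SPRITE_H,
                      0, 0,          /* source x, y */
                      0, SPRITE_H,   /* first scan line, number of scan lines */
                      sprite_pixels, &bmi, DIB_RGB_COLORS);
}
#endif
```

In a WM_PAINT handler you would call fill_gradient(sprite_pixels, SPRITE_W, SPRITE_H) once, then blit_sprite(hdc, x, y) for each frame. If the sprite is redrawn often, CreateDIBSection plus BitBlt from a memory DC is the other common route and may be faster, but I have not measured it.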