Topic: OpenGL Performance Contd.

Mike Lobanovsky

  • Guest
Re: OpenGL Performance Contd.
« Reply #15 on: April 29, 2014, 01:29:33 AM »
Hi Charles,

Thanks a lot for your report!

In Vista Aero, I confirm:
-- nearly perfect movement for VSYNC=None/0 (1 or 2 stumbles per session)
-- visible stuttering for VSYNC=60FPS/1 (up to half a dozen stumbles per horizontal run)
-- no stuttering for VSYNC=30FPS/2 (not comfortable, never used on a PC)

In Vista Classic, I'm seeing stutter everywhere, even at VSYNC=30FPS. The worst case is VSYNC=60FPS.

Is your Classic as good as your Aero or as bad as my Classic?

I do not confirm any problems with audio even against background music (do you use it actually for any useful purpose? ;) )

I'm running Vista Ultimate SP2.

P.S. Our video cards seem to be almost equally old. The stutter may be due to minor incompatibilities of these pieces of HW with modern drivers. Aren't we fighting with windmills, Charles? Petr reports no visible stutter at all on his modern nVidia gear.
« Last Edit: April 29, 2014, 01:37:38 AM by Mike Lobanovsky »

Charles Pegge

  • Guest
Re: OpenGL Performance Contd.
« Reply #16 on: April 29, 2014, 02:41:58 AM »
I switched to classic (XP) display mode, but it does not make any apparent difference.

Stuttering was most frequent when the PC was booted out of hibernation. Vista goes crazy for at least five minutes following reboot, so that does not surprise me.

I think it's beneficial to incorporate a compensation system for animations anyway, so that the dynamics are not affected by a fall in frame rate. Ubuntu/Wine seems to set it to 30Hz.

This is my current frame rate monitor. It measures the elapsed time since the previous frame and adjusts stepn, which is used as a multiplier for all variables controlling velocity. It also counts the number of missed (60Hz) frames for diagnostics.
Code:
  macro TimeFrame
  ===============
  static sys   countn
  static float FramePeriod,stepn
  scope
  static quad   t1,t2
  stepn=1
  t1=t2
  TimeMark t2
  if t1
    FramePeriod=TimeDiff t2,t1
    if FramePeriod>.02              'quantum 60Hz 0.0166
      stepn=round(FramePeriod*60.0) 'moment scale
      countn+=stepn-1               'count missed frames
    end if
  end if
  end scope
  end macro
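
For readers who do not use OxygenBasic, the logic of the macro above can be sketched in Python. This is only an illustration under stated assumptions: the class name FrameTimer, the injectable clock and the constant names are hypothetical and not part of Oxygen's API. The 0.02 s threshold and the 60Hz scale are taken from the macro.

```python
import time

TARGET_HZ = 60.0   # nominal refresh rate assumed by the macro
THRESHOLD = 0.02   # seconds; a longer frame period counts as a late frame


class FrameTimer:
    """Hypothetical Python counterpart of the TimeFrame macro."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock    # injectable so the timer can be tested offline
        self.prev = None      # timestamp of the previous frame (t1)
        self.missed = 0       # running count of missed 60Hz frames (countn)

    def tick(self):
        """Return stepn: how many 60Hz frames the last period spanned."""
        now = self.clock()
        stepn = 1
        if self.prev is not None:
            period = now - self.prev
            if period > THRESHOLD:
                stepn = round(period * TARGET_HZ)  # momentum scale
                self.missed += stepn - 1           # count missed frames
        self.prev = now
        return stepn
```

Calling tick() once per rendered frame and multiplying every velocity by the returned value keeps the motion speed roughly constant when frames are dropped, at the cost of moving in whole-frame jumps.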

PS
I'll stay with classic mode. It is not as pretty, but windows can be cascaded efficiently, with their titles visible.
« Last Edit: April 29, 2014, 02:49:53 AM by Charles Pegge »

Mike Lobanovsky

  • Guest
Re: OpenGL Performance Contd.
« Reply #17 on: April 29, 2014, 03:23:32 AM »
Charles,

Thanks for this additional input.

My time counter is a little simpler than yours:

Code:
  macro TimeFrame()
  =================
  static quad   t1, t2
  static double FramePeriod
  t1 = t2
  TimeMark t2
  FramePeriod = TimeDiff t2, t1
  mult = 1 / (0.01666666666667 / FramePeriod) ' = FramePeriod * 60: 1.0 at 60FPS, 2.0 at 30FPS
  end macro

and its mult parameter is used to adjust linear and angular velocities to make them totally independent of the FPS rate, e.g.:

Code:
.........
' ML
' FPS-corrected step
      my += (mm * mult)
' END ML
..........

or

Code:
.............
' ML
' FPS-corrected step
  ang1 += (angi1 * mult)
' END ML
..............
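
The effect of the mult correction can be checked with a small simulation. The following Python sketch is an illustration only: the function names are hypothetical, and a perfectly constant frame rate is assumed. It shows that an object covers the same distance in the same wall time whether the loop runs at 60 or at 30 FPS.

```python
NOMINAL_PERIOD = 1.0 / 60.0  # seconds per frame at the reference 60FPS


def frame_multiplier(frame_period):
    """Scale factor for per-frame increments: 1.0 at 60FPS, 2.0 at 30FPS."""
    return frame_period / NOMINAL_PERIOD


def simulate(fps, seconds, step_per_frame):
    """Advance a position for `seconds` of wall time at a constant frame rate."""
    period = 1.0 / fps
    position = 0.0
    for _ in range(round(fps * seconds)):
        # FPS-corrected step, as in: my += (mm * mult)
        position += step_per_frame * frame_multiplier(period)
    return position


if __name__ == "__main__":
    print(simulate(60, 2.0, 0.5), simulate(30, 2.0, 0.5))
```

Unlike a rounded integer step, this multiplier varies continuously, so small deviations from the nominal frame period are also compensated.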

Coupled with a continuous single-threaded PeekMessage(PM_REMOVE) + render-when-idle loop, it adds approx. 25% or 26% load on my quad core CPU in all the VSYNC modes, which would equal approx. 50% or 55% load on a dual core CPU. That's perfectly normal for a non-hardware-assisted (no VBOs, no shaders) OpenGL rendering loop. There's no sense in trying to bring it down artificially any lower by using a Windows timer or Sleep() calls that would only interfere with OpenGL's natural flow and timing of events as well as with VSYNC settings. An OpenGL rendering loop is not supposed to run at zero load when there are dynamic objects moving on the screen.

P.S. You'd be surprised at how many people actually like to stay Classic. The more professional you are, the less you are expected to like OS animations and decorations. Not so for me though - I like to have all the bells and whistles on. :)
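
The shape of a render-when-idle loop can be sketched without any Win32 calls. In the Python illustration below, a deque stands in for the Windows message queue and popleft() plays the role of PeekMessage(..., PM_REMOVE). All names are hypothetical, and the max_frames limit exists only so that the example terminates.

```python
from collections import deque


def run_loop(pending, handle, render, max_frames):
    """Drain queued messages first; render a frame whenever the queue is empty."""
    frames = 0
    while frames < max_frames:
        if pending:
            msg = pending.popleft()   # stand-in for PeekMessage(PM_REMOVE)
            if msg == "QUIT":
                break
            handle(msg)               # stand-in for Translate/DispatchMessage
        else:
            render()                  # idle: nothing queued, so draw a frame
            frames += 1
    return frames


if __name__ == "__main__":
    log = []
    run_loop(deque(["SIZE", "KEY"]), log.append, lambda: log.append("render"), 2)
    print(log)
```

Because rendering only happens when no message is waiting, the loop never blocks on input, which is why it keeps one core busy, and also why a flood of window messages delays rendering.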
« Last Edit: April 29, 2014, 03:45:54 AM by Mike Lobanovsky »

Mike Lobanovsky

  • Guest
Re: OpenGL Performance Contd.
« Reply #18 on: April 29, 2014, 04:56:21 AM »
Okay, gentlemen,

Here are my mods for OxygenBasic's OpenGL ControlCraft example. It runs perfectly smoothly for me under my XP though it may exhibit image stutter on other platforms.

1. Please go to Oxygen's \inc folder and back up your OpenglSceneFrame.inc and WinUtil.inc files there, then copy my files of the same names from the attached zip into this folder.

2. Go to the \examples\OpenGl folder and back up its existing ControlPanels.inc and ControlCraft.o2bas files. Copy my files of the same names from my zip into this folder.

3. Do not use these four files for anything other than my example! They may be incompatible with other examples and/or the existing functionality of OxygenBasic!

4. Make sure your driver is adjusted to follow your current program's settings or you won't be able to toggle your VSYNC.

5. Run the ControlCraft.o2bas example the regular way. Select the saucer by left-clicking on its hull and use the arrow keys and/or control panel widgets to navigate it in the scene as you normally would.

6. Click your middle mouse button to toggle VSYNC between None, 60FPS, and 30FPS. Your middle mouse button is the one that's under your mouse wheel.

All my changes to the code in all the four files are enclosed in ' ML / ' END ML blocks.

The sample will run with a minimum load on your CPU (though I actually don't like it). If you see image stutter, please try to comment out all of the function calls starting with timeBeginPeriod(1) down to timeEndPeriod(1) inclusive at the bottom of the WinUtil.inc file. Alternatively, you may try changing individual Sleep() settings in this block (though I repeat I do not like to see those calls there at all. I think upgrading our video cards to newer models would be a wiser solution than trying to control OpenGL timing any better than it controls itself. :) )

Anyway, thanks a lot for your advice and assistance.

Charles Pegge

  • Guest
Re: OpenGL Performance Contd.
« Reply #19 on: April 29, 2014, 08:44:41 AM »
Many thanks, Mike.

I have remapped the paths so that your code will run in its own folder. Now we can compare :)

Your code is a little more hungry than mine at 60fps, taking 5-8% CPU compared to 2-3% as is, or 3-5% with performance displayed in WindowText (ControlCraftFrames.o2bas). It is surprising that setting window text makes a measurable difference. I did not see any dropouts in my test except for 3 at the very start, and when moving or resizing the window. It is highly dependent on how busy the PC is - playing youtubes, Text-To-Speech etc.

Anyway, I will examine your scheme more closely.

Mike Lobanovsky

  • Guest
Re: OpenGL Performance Contd.
« Reply #20 on: April 29, 2014, 02:04:50 PM »
Thank you, Charles.

Quote
I did not see any dropouts in my test...
I am glad it runs OK for you under your Vista. I wasn't able to eliminate stutter under mine though.

Quote
... except for 3 at the very start ...
I guess these must be due to some transitional processes until the app's window message flow stabilizes itself.

Quote
... and when moving or resizing the window.
That's only natural as the message loop is strictly single-threaded and window message handling has absolute priority over OpenGL renders. A Windows timer would run in another thread allowing for OpenGL rendering to occur even when processing other messages such as WM_SIZE. But this scheme doesn't use any Windows timers. Instead, its WM_SIZE enforces canvas repaint by a direct call to Render().

Quote
It is highly dependent on how busy the PC is - playing youtubes, Text-To-Speech etc.
Yes, it is. The best tactic for running a video game would be to have all other apps shut down and the game process priority somewhat elevated. When the focus moves to another window but you still want OpenGL action to go on in the background, you would usually pass render control to a timer similar to your own scheme. And finally, a call to Render() would be suppressed in MainWindow() altogether and moved to a WM_PAINT handling routine in a 3D editor environment where constant rendering of the scene is unnecessary.

Quote
Anyway, I will examine your scheme more closely.
Looking forward to your further comments. :)
« Last Edit: April 29, 2014, 05:06:27 PM by Mike Lobanovsky »

Charles Pegge

  • Guest
Re: OpenGL Performance Contd.
« Reply #21 on: April 30, 2014, 04:54:34 AM »
Hi Mike, a couple of questions:

Does XP default to 60 fps VSYNC?

Does XP opengl require any sleep time, or is this a normal part of VSYNC'ing?


Using PeekMessage, I'm now getting good results without using ARB calls, sleeps or timers (CPU load 3%) :)

Mike Lobanovsky

  • Guest
Re: OpenGL Performance Contd.
« Reply #22 on: April 30, 2014, 06:32:10 AM »
Good afternoon, Charles,

Quote
Does XP default to 60 fps VSYNC?
All Windows platforms default to 60FPS/1. I don't know the real idea behind 30FPS/2 on a PC; perhaps it was originally meant for laptops with on-board video chips. Now that higher-end modern notebooks may, and actually do, have discrete miniature video cards, they also default to 60FPS but 30FPS still holds true for lower-end business-style notebooks with on-board chips.

VSYNC=1 and VSYNC=2 are actually deprecated settings. The bad effect of fixed-rate VSYNC'ing is that once the scene becomes so heavily populated with static scenery and moving objects that the current OpenGL rendering technique (immediate mode, display lists, vertex buffer objects, or vertex shaders plus the associated fixed or programmable pipe line or pixel shaders) cannot cope with the entire frame within the 1/60 sec. period, VSYNC drops down from 60FPS to 30FPS abruptly. When such drops are momentary due to camera movement, image stutter is inevitable.
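
The abrupt drop described above follows from simple arithmetic. The Python sketch below is an idealised model, assuming double buffering, a 60Hz display and a constant render time per frame; the function name is hypothetical. A finished frame has to wait for the next eligible vertical blank, so the frame period rounds up to a whole number of refresh periods.

```python
import math

REFRESH_HZ = 60.0  # assumed display refresh rate


def vsynced_fps(render_time, swap_interval=1):
    """Effective frame rate under fixed-rate VSYNC with double buffering."""
    slot = swap_interval / REFRESH_HZ                  # seconds between eligible swaps
    slots_needed = max(1, math.ceil(render_time / slot))
    return 1.0 / (slots_needed * slot)


if __name__ == "__main__":
    for ms in (10, 16, 17, 34):
        print(ms, "ms per frame ->", round(vsynced_fps(ms / 1000.0)), "FPS")
```

In this model a frame that takes 17 ms instead of 16 ms halves the displayed rate from 60 to 30 FPS, which is the stutter mechanism discussed in the post.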

My XANEngine was dynamically watching the FPS rate and switched off VSYNC below 60FPS immediately, allowing for my custom calibrated timing procedures written in asm to take over in order to ensure smooth rendering until the camera returned into a 61FPS area where 60FPS VSYNC would be switched on again.
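
The switching rule can be pictured as a small two-threshold (hysteresis) controller. The Python sketch below is a guess at the general idea only, not XANEngine's actual code: the class and attribute names are hypothetical, and only the 60 and 61 FPS thresholds come from the post.

```python
class VsyncSwitch:
    """Hypothetical on/off controller using the thresholds quoted in the post."""

    OFF_BELOW = 60.0  # measured FPS under this value turns VSYNC off
    ON_AT = 61.0      # measured FPS at or above this value turns it back on

    def __init__(self):
        self.enabled = True

    def update(self, measured_fps):
        """Feed the latest measured FPS; return whether VSYNC should be on."""
        if self.enabled and measured_fps < self.OFF_BELOW:
            self.enabled = False
        elif not self.enabled and measured_fps >= self.ON_AT:
            self.enabled = True
        return self.enabled


if __name__ == "__main__":
    switch = VsyncSwitch()
    print([switch.update(fps) for fps in (60, 45, 60.5, 61, 75)])
```

The gap between the two thresholds keeps the controller from flipping back and forth when the unsynchronised frame rate hovers around 60.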

Slight image tear potentially seen while the scene runs un-VSYNC'ed is a much lesser evil than image stutter caused by abrupt toggling between 60 and 30 FPS.

Modern adaptive VSYNC'ing implements this strategy directly within a contemporary OpenGL driver. The magic number is -1, which you pass as the swap interval in place of the usual 1 for 60FPS. As soon as the FPS rate drops below 60, the driver switches VSYNC off and your application runs un-VSYNC'ed. When the camera comes back to areas that are easier to render, 60FPS VSYNC is switched on again automatically.
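
On Windows, a negative swap interval is accepted by wglSwapIntervalEXT when the driver advertises the WGL_EXT_swap_control_tear extension. The Python sketch below does not call the driver; it is an idealised model of the resulting frame rate, assuming a 60Hz display and a constant render time, with a hypothetical function name.

```python
REFRESH_HZ = 60.0  # assumed display refresh rate


def adaptive_fps(render_time):
    """Idealised frame rate with adaptive VSYNC (swap interval -1).

    Frames that finish within one refresh period are synchronised as usual.
    Late frames are presented immediately instead of waiting for the next
    vertical blank, at the cost of possible tearing.
    """
    period = 1.0 / REFRESH_HZ
    if render_time <= period:
        return REFRESH_HZ
    return 1.0 / render_time


if __name__ == "__main__":
    for ms in (10, 17, 20):
        print(ms, "ms per frame ->", round(adaptive_fps(ms / 1000.0), 1), "FPS")
```

In this model a 17 ms frame yields roughly 59 FPS with occasional tearing, rather than the 30 FPS that fixed-rate VSYNC would produce.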

Quote
Does XP opengl require any sleep time, or is this a normal part of VSYNC'ing?
No, Sleep() calls with their associated timeBegin/EndPeriod() are there only to satisfy your aspiration for zero CPU load. They are neither necessary nor welcome for OpenGL's own timing or VSYNC'ing.

The matter is, immediate mode and display lists are deprecated features in the OpenGL standard. There have been several attempts to declare them legacy techniques and exclude their support from the standard altogether. Yet it didn't actually happen, for reasons of backwards compatibility and educational/promotional importance. These techniques are, however, no longer being optimised in, or further developed for, modern GPUs.

If you switch your rendering to vertex buffer arrays and fixed or programmable pipeline (texture units), you will see your CPU usage drop close to zero at 60FPS VSYNC at no extra programming costs. Once created and uploaded to the GPU, vertex buffer arrays can also be erased from conventional memory. Static vertex data will not be sent from the CPU to the GPU along the data bus any more and there will be only minimum activity on the CPU side of data bus to switch OpenGL's client-side states related to VBO and texture selection.

This is how the CPU and data bus loads would compare in my profiler in real-time deployment of XANEngine to render a fully textured static scene with approx. 250K non-culled polygons in view (CPU - long blue bar, data bus - long cyan bar):



The leftmost snapshot employs an RDTSC-based timer with finely calibrated mixture of Sleep(1) and Sleep(0) calls in place of VSYNC=60. In all the three cases, a fixed pipeline of 4 texture units is used to render diffuse+bump+gloss+lightmap textures in one pass and a post-processing pass is added to cover the entire canvas with a tinted quad to simulate the light volume the camera is currently in. A x4 antialiased x4 anisotropic canvas is used with 5 max. mipmap levels of texturing.

Finally, in a multi-threaded vertex and pixel shader application 99 per cent of rendering activity will be shifted to your GPU. Your CPU cores will go Cool'n'Quiet while your GPU fans will be roaring like a Boeing and you will start to meditate in all seriousness about the benefits of liquid nitrogen cooling. At that very moment you shall finally learn the attributes of a true gamer's nirvana. :)

Quote
Using PeekMessage, I'm now getting good results without using ARB calls, sleeps or timers (CPU load 3%) :)
Excellent! :D
« Last Edit: April 30, 2014, 09:23:36 AM by Mike Lobanovsky »

Charles Pegge

  • Guest
Re: OpenGL Performance Contd.
« Reply #23 on: April 30, 2014, 11:22:37 PM »
Thanks for your immensely detailed reply, Mike.

My interest is in using Opengl as a regular part of Oxygen's API, which needs to be efficient and feasible on a wide range of hardware. Once, I bought a top-of-the-range PC. It was not a good experience unfortunately. It never worked properly, even after many engineer call-outs and returns. On one occasion, the cooling fan failed on the Radeon card, which filled the room with the unfragrant aroma of baked electronics in a matter of minutes. But the beast made a good radiator for the winter months, and it eventually became spare parts, when the electricity bill indicated it would be cheaper to buy another box :)

I'm updating all my Opengl examples. Winutil and OpenglScene run the show for most of the GUI samples.


Mike Lobanovsky

  • Guest
Re: OpenGL Performance Contd.
« Reply #24 on: April 30, 2014, 11:45:48 PM »
Thanks for your updates, Charles.

My workstation is also a very cosy place to spend winter nights at. The fans are blowing warmth onto my legs as the box stands beneath the desktop, and the temperature in the study is definitely three or four degrees higher than elsewhere in my apartment. :)

Looking forward to an updated O2 example pack!