Oxygen Basic

Information => Open Forum => Topic started by: Arnold on April 01, 2014, 10:51:54 PM

Title: Tiny Benchmark Test
Post by: Arnold on April 01, 2014, 10:51:54 PM
Hello,


In the Qdepartment forum I found this little code snippet which I tried with Oxygenbasic:

This is the code in C:

Code: [Select]

// gcc -O3 prime_test.c -o prime_test.exe
//
#include <stdio.h>
#include <time.h>
 
int main() {
     int n, lim;
     int k, p;
     int pc;

int t1,t2,time_lapsed;
printf ("%s\n","Starting prime numbers:" );
t1=clock();

    pc = 0;
     n  = 1;
     lim = 5000000;
     while (n < lim) {
         k = 3;
         p = 1;
         n = n + 2;
         while (k * k <= n && p) {
             p = n / k * k != n;
             k = k + 2;
         }
         if (p) {
             pc = pc + 1;
         }
     }

t2=clock();
time_lapsed=(t2-t1) / CLOCKS_PER_SEC;

     printf("%d,\n", pc);
printf("%d%s\n", time_lapsed," Seconds");
printf("%s", "Enter ...");
getchar();

     return 0;
 }



This is the code in Oxygenbasic:

Code: [Select]

$ filename "prime_test2.exe"

includepath "$/inc/"

'#include "RTL32.inc"
'#include "RTL64.inc"

include "console.inc"


! GetTickCount lib "kernel32.dll" alias "GetTickCount" () as dword
 
sub main() {
     int n, lim;
     int k, p;
     int pc;

int t1,t2, time_lapsed
print "Starting Prime Numbers:" + cr + cr
t1=GetTickCount()

    pc = 0;
     n  = 1;
     lim = 5000000;
     while (n < lim) {
         k = 3;
         p = 1;
         n = n + 2;
         while (k * k <= n && p) {
             p = n / k * k != n;
             k = k + 2;
         }
         if (p) {
             pc = pc + 1;
         }
     }
     print( pc);
     
t2=GetTickCount()
time_lapsed = (t2-t1) / 1000 + cr

print cr + cr + "Time: " +time_lapsed + " Seconds" + cr + cr
print "Enter ..."
GetKey() 
   
     return 0;
end sub

main()


The result is: 348512
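
The count can be cross-checked independently of either compiler. What follows is a minimal sketch in C, assuming the same loop as above wrapped in a function. The names `count_odd_primes` and `seconds_between` are illustrative choices, not part of the original post. The second helper converts `clock()` ticks with `CLOCKS_PER_SEC` instead of assuming 1000 ticks per second.

```c
#include <assert.h>
#include <time.h>

/* Counts the odd numbers n = 3, 5, 7, ... that pass trial division,
   using the same loop shape as the benchmark: n is advanced before
   it is tested, so the last value tested can exceed lim by one step.
   count_odd_primes is a hypothetical name used for illustration. */
int count_odd_primes(int lim)
{
    int n = 1, pc = 0;
    assert(lim > 0);
    while (n < lim) {
        int k = 3, p = 1;
        n = n + 2;
        while (k * k <= n && p) {
            p = (n / k * k != n);   /* nonzero while k does not divide n */
            k = k + 2;
        }
        if (p)
            pc = pc + 1;
    }
    return pc;
}

/* Portable elapsed-time helper: clock() ticks are converted with
   CLOCKS_PER_SEC rather than a hard-coded divisor. */
double seconds_between(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `count_odd_primes(5000000)` should reproduce the 348512 reported above. The count leaves out 2, because only odd candidates are ever tested.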

I made these observations:
Oxygenbasic has a very flexible syntax: besides understanding assembly language, it can handle a subset of C.
The file size generated with gcc is about 89 KB. The file size generated with Oxygenbasic is about 16 KB.
The execution time on my (old) machine is nearly the same (about 9 to 10 seconds).

So Oxygenbasic is very fast?


Roland
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on April 02, 2014, 01:17:46 PM
Hello Roland,

Both gcc and O2 do exactly the same thing; they compile their respective source scripts into native machine code. The only difference here is that gcc always outputs the code to a disk file for storage and further utilization while O2 can (and very often does) execute it immediately from memory without storing its image to a disk file. So there's no surprise the benchmark results for such a simple and straight-forward integer math task are nearly identical.

OTOH gcc can perform transparently very intricate and clever code optimizations for more elaborate math-intensive scripts.  The -O3 switch you used for gcc compilation enables such optimization to the maximum. The inevitable penalty of optimization is however much, much longer compilation time and also much larger size of resultant disk file which carries in itself many different algorithms and architecture-specific facilities to check for and use in various brands of CPU's your code may run on. Just remove the -O3 switch altogether or change it to -Os to tell gcc to optimize the resultant exe for the smallest size possible. But even then, the resultant exe size will also be largely dependent on the gcc version you're using. The lower (and hence older and simpler) the version, the smaller the output file size.
Title: Re: Tiny Benchmark Test
Post by: Arnold on April 03, 2014, 12:53:57 AM
Hello Mike,

thank you for your reply. My intention was not to emphasize either the benchmark or the file size. There is no doubt that C is a mighty language and gcc is a powerful tool which I (unfortunately) will never really understand. But I found this little code and wanted to know how it works with O2h. The first surprise for me was that Oxygenbasic can understand some C syntax - I did not realize this until then. The second surprise for me was that the execution time was nearly the same using both gcc and o2h on my notebook. And this seems to be very fast compared with other languages. But currently I cannot verify this.

Roland
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on April 03, 2014, 01:57:56 AM
Yes, it generates efficient machine code for integer and float operations. You will often find that there is very little advantage in using assembly code to replace numeric expressions. There are, of course, many opportunities for improvement.

I suspect that GCC optimisation involves turning function calls into inline code. This avoids the overhead of function prologs and epilogs, but it will usually make the binary significantly larger.

To achieve in-lining, procedures can be turned into macros quite easily. (Just make sure that expression params are enclosed in brackets)

sub abc(sys a,b,c, string d)
...
end sub

macro abc(a,b,c,d)
...
end macro

If the procedure uses local variables, then the macro must use a scope to confine them:

macro abc(a,b,c,d)
  scope
  ...
  end scope
end macro
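
The same two precautions apply to C preprocessor macros, which may make the idea easier to see for readers coming from C. This is a minimal sketch that assumes nothing about how OxygenBasic expands macros internally. The names `SQUARE_UNSAFE`, `SQUARE`, `SWAP_INT` and `macro_demo` are made up for illustration.

```c
#include <assert.h>

/* Without brackets the argument text is pasted as-is, so
   SQUARE_UNSAFE(1 + 2) expands to 1 + 2 * 1 + 2. */
#define SQUARE_UNSAFE(x) x * x

/* Bracketing each parameter (and the whole body) keeps the
   expression intact after expansion. */
#define SQUARE(x) ((x) * (x))

/* A macro that needs a local variable wraps it in its own scope,
   so the temporary does not leak into or clash with the caller. */
#define SWAP_INT(a, b)        \
    do {                      \
        int swap_tmp_ = (a);  \
        (a) = (b);            \
        (b) = swap_tmp_;      \
    } while (0)

/* Small demonstration; returns 1 when the macros behave as described. */
int macro_demo(void)
{
    int p = 3, q = 7;
    int tmp = 42;      /* caller's own variable, untouched by the macro */
    SWAP_INT(p, q);
    assert(tmp == 42);
    return SQUARE(1 + 2) == 9 && SQUARE_UNSAFE(1 + 2) == 5
        && p == 7 && q == 3;
}
```

The unbracketed version silently computes a different value, which is the reason for the advice about enclosing expression parameters in brackets.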
Title: Re: Tiny Benchmark Test
Post by: Arnold on April 03, 2014, 08:36:56 AM
This closes a gap in my knowledge and gives new prospects. I found several files which contain macros and wondered about their purpose. But now (sub/func with inline coding) it makes sense.

There is a lot to be discovered in Oxygenbasic. At least for me.

Roland
Title: Re: Tiny Benchmark Test
Post by: Ed Davis on April 04, 2014, 07:13:07 AM
This is pretty cool, that you can compile using C-like syntax.  I thought I'd give it a try, but I'm running into an error that I can't figure out:

Code: [Select]
$ filename "prime_test2.exe"

includepath "$/inc/"

'#include "RTL32.inc"
'#include "RTL64.inc"

include "console.inc"


! GetTickCount lib "kernel32.dll" alias "GetTickCount" () as dword

sub main() {
    int left_edge, right_edge, top_edge, bottom_edge, max_iter,
    x_step, y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char,
    accum, count;

int t1,t2, time_lapsed
print "Starting Mandel accum:" + cr + cr
t1=GetTickCount()

    accum = 0;
    count = 0;
    while (count < 1545) {
        left_edge   = -420;
        right_edge  =  300;
        top_edge    =  300;
        bottom_edge = -300;
        x_step      =  7;
        y_step      =  15;

        max_iter    =  200;

        y0 = top_edge;
        while (y0 > bottom_edge) {
            x0 = left_edge;
            while (x0 < right_edge) {
                y = 0;
                x = 0;
                the_char = ' ';
                x_x = 0;
                y_y = 0;
                i = 0;
                while (i < max_iter && x_x + y_y <= 800) {
                    x_x = (x * x) / 200;
                    y_y = (y * y) / 200;
                    if (x_x + y_y > 800 ) {
                        the_char = '0' + i;
                        if (i > 9) {
                            the_char = '@';
                        }
                    } else {
                        temp = x_x - y_y + x0;
                        if ((x < 0 && y > 0) || (x > 0 && y < 0)) {
                            y = (-1 * ((-1 * (x * y)) / 100)) + y0;
                        } else {
                            y = x * y / 100 + y0;
                        }
                        x = temp;
                    }

                    i = i + 1;
                }
                accum = accum + the_char;

                x0 = x0 + x_step;
            }
            y0 = y0 - y_step;
        }
        if (count % 300 == 0) {
            print(accum);
        }

        count = count + 1;
    }
    printf(accum);


t2=GetTickCount()
time_lapsed = (t2-t1) / 1000 + cr

print cr + cr + "Time: " +time_lapsed + " Seconds" + cr + cr
print "Enter ..."
GetKey()

     return 0;
end sub

main()

But I get:

exo2 -c mandel.o2bas

ERROR:  `end while` or `wend`  expected

WORD:   }
LINE:   65
FILE:   main source
PASS:   1



I just cut and pasted the C version of this, so not sure what I did wrong.

Thanks for any help!
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on April 04, 2014, 08:04:36 AM
Hi Ed,

Curly braces are very hard to follow in multiple nestings, but the problem turned out to be single quotes '0' instead of "0"

Also, printf needs to be print

Code: [Select]

$ filename "prime_test2.exe"

includepath "$/inc/"

'#include "RTL32.inc"
'#include "RTL64.inc"

include "console.inc"


! GetTickCount lib "kernel32.dll" alias "GetTickCount" () as dword

sub main() {
    int left_edge, right_edge, top_edge, bottom_edge, max_iter,
    x_step, y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char,
    accum, count;

int t1,t2, time_lapsed
print "Starting Mandel accum:" + cr + cr
t1=GetTickCount()
    accum = 0;
    count = 0;
    while (count < 1545)
    {
        left_edge   = -420;
        right_edge  =  300;
        top_edge    =  300;
        bottom_edge = -300;
        x_step      =  7;
        y_step      =  15;

        max_iter    =  200;

        y0 = top_edge;
        while (y0 > bottom_edge)
        {
            x0 = left_edge;
            while (x0 < right_edge)
            {
                y = 0;
                x = 0;
                the_char = " ";
                x_x = 0;
                y_y = 0;
                i = 0;
                while (i < max_iter && x_x + y_y <= 800)
                {
                    x_x = (x * x) / 200;
                    y_y = (y * y) / 200;
                    if (x_x + y_y > 800 )
                    {
                        the_char = "0" + i;
                        if (i > 9)
                        {
                            the_char = "@";
                        }
                    }
                    else
                    {
                        temp = x_x - y_y + x0;
                        if ((x < 0 && y > 0) || (x > 0 && y < 0)) {
                            y = (-1 * ((-1 * (x * y)) / 100)) + y0;
                        } else {
                            y = x * y / 100 + y0;
                        }
                        x = temp;
                    }
                    i = i + 1;
                } 'wend
                accum = accum + the_char;
                x0 = x0 + x_step;
            } 'wend
            y0 = y0 - y_step;
        } 'wend
        if (count % 300 == 0) {
            print(accum);
        }

        count = count + 1;
    }'wend
    print accum;


t2=GetTickCount()
time_lapsed = (t2-t1) / 1000 + cr

print cr + cr + "Time: " +time_lapsed + " Seconds" + cr + cr
print "Enter ..."
GetKey()

     return 0;
end sub

main()
Title: Re: Tiny Benchmark Test
Post by: Ed Davis on April 04, 2014, 08:23:23 AM
Quote from: Charles
Hi Ed,

Curly braces are very hard to follow in multiple nestings, but the problem turned out to be single quotes '0' instead of "0"

Also, printf needs to be print

Thanks for the catch!

re: Curly braces nesting. That is why C programmers depend on the editor matching them when you type one or move the cursor to one.  I'd have a hard time keeping up with them otherwise :)

Turns out the double quotes don't translate the same way they do in C - so I just used the number, e.g., 32 for space, 48 for '0', and 64 for '@'.
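
For reference, in C a single-quoted character is an integer constant holding its character code, which is why the original script can add `'0' + i`. On ASCII systems `' '`, `'0'` and `'@'` are 32, 48 and 64, the same numbers used as replacements here. A small sketch follows; the helper name `escape_char` is hypothetical.

```c
#include <assert.h>

/* Returns the character code the benchmark adds to its accumulator
   for a point that escaped at iteration i.
   escape_char is a hypothetical helper name. */
int escape_char(int i)
{
    assert(i >= 0);
    if (i > 9)
        return '@';        /* 64 in ASCII */
    return '0' + i;        /* 48..57 in ASCII */
}
```

A double-quoted `"0"` in C is a string (an array of characters), not a number, so arithmetic on it does not mean the same thing as arithmetic on `'0'`.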

Now I get the result I'm after - 309886830 - and it takes 3 seconds.  The original C version takes one second, so the current result is very good, especially considering how fast Oxygen compiles.

Another question though - I'm not getting the intermediate results - this statement doesn't appear to be correct:
Code: [Select]
        if (count % 300 == 0) {
            print(accum);
        }

Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on April 04, 2014, 10:14:26 AM
% is used to define equates, so modulus is done like this:

mod(count,300)
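
For comparison, here is the C form of the same check as a small sketch that counts how often the intermediate result would be printed. The function name `report_points` is hypothetical.

```c
#include <assert.h>

/* Counts how many loop iterations would print an intermediate result
   when the check is "count is a multiple of step".
   report_points is a hypothetical name for illustration. */
int report_points(int total, int step)
{
    int count, hits = 0;
    assert(step > 0);
    for (count = 0; count < total; count++) {
        if (count % step == 0)   /* C spelling of mod(count, step) = 0 */
            hits++;
    }
    return hits;
}
```

With the benchmark's values, `report_points(1545, 300)` comes to 6: the counts 0, 300, 600, 900, 1200 and 1500.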
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on April 04, 2014, 10:29:25 AM
Hello Ed,

If you want to have a vanilla ANSI C JIT compiler (and also hand-coded JIT Intel-style assembly) within the framework of a BASIC-like interpreter, please have a look at FBSL (http://www.fbsl.net/phpbb2/index.php).

The zip includes raw C and FBSL scripts as well as gcc (both -O3 and unoptimized) and FBSL binaries. Running all three on the end user's computer may give them a more coherent view of benchmark results. A table put together on a computer of unknown configuration does look to me a bit like "data in spherical vacuum". :)

My current config is in my forum signature. The screenshot results were taken under Win XP Sp3 Professional.

Title: Re: Tiny Benchmark Test
Post by: Arnold on April 06, 2014, 11:20:41 AM
Hello,

I tried to transfer the Mandel test to plain basic. If I did it correctly, the code looks like this:

Code: [Select]

$ filename "Mandel2.exe"

includepath "$/inc/"

'#include "RTL32.inc"
'#include "RTL64.inc"

include "console.inc"


! GetTickCount lib "kernel32.dll" alias "GetTickCount" () as dword

sub main()
    int left_edge, right_edge, top_edge, bottom_edge, max_iter,
    x_step, y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char,
    accum, count

    int t1,t2, time_lapsed
    print "Starting Mandel accum:" + cr + cr
    t1=GetTickCount()

    accum = 0
    count = 0
    while count < 1545
   
        left_edge   = -420
        right_edge  =  300
        top_edge    =  300
        bottom_edge = -300
        x_step      =  7
        y_step      =  15

        max_iter    =  200

        y0 = top_edge
        while y0 > bottom_edge       
            x0 = left_edge
            while x0 < right_edge           
                y = 0
                x = 0
                the_char = asc(" ")
                x_x = 0
                y_y = 0
                i = 0
                while (i < max_iter) and (x_x + y_y <= 800)                               
                    x_x = (x * x) / 200
                    y_y = (y * y) / 200
                   
                    if (x_x + y_y) > 800  then
                        the_char = asc("0") + i                       
                        if (i > 9) then                       
                            the_char = asc("@")                       
                        end if
                    else               
                        temp = x_x - y_y + x0
                        if ((x < 0 and y > 0) or (x > 0 and y < 0)) then
                            y = (-1 * ((-1 * (x * y)) / 100)) + y0
                        else
                            y = x * y / 100 + y0
                        end if
                        x = temp
                    end if
                    i = i + 1
                wend
               
                accum = accum + the_char

                x0 = x0 + x_step
            wend
           
            y0 = y0 - y_step
        wend
       
        if mod(count, 300) = 0 then
            print accum + cr
        end if

        count = count + 1
    wend
   
    print accum

    t2=GetTickCount()
    time_lapsed = (t2-t1) / 1000 + cr

    print cr + cr + "Time: " +time_lapsed + " Seconds" + cr + cr
    print "Enter ..."
    GetKey()

end sub

main()


The execution time with my notebook was 5 seconds for gcc, 19 seconds for Mandel_FBSL.exe and 23 seconds for Oxygenbasic. I did not try to optimize anything.
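
When comparing ports, it can help to check the output as well as the time. Below is a sketch of a single pass of the loop as a standalone C function, following the posted scripts. The names `mandel_pass` and `mandel_total` are hypothetical. Every pass adds the same sum to the accumulator, so the total after N passes is N times the value of one pass.

```c
#include <assert.h>

/* One pass over the integer Mandelbrot grid from the scripts above,
   returning the sum of the character codes.
   mandel_pass is a hypothetical name; the body follows the posted loop. */
long mandel_pass(void)
{
    const int left_edge = -420, right_edge = 300;
    const int top_edge = 300, bottom_edge = -300;
    const int x_step = 7, y_step = 15, max_iter = 200;
    long accum = 0;
    int x0, y0;

    for (y0 = top_edge; y0 > bottom_edge; y0 -= y_step) {
        for (x0 = left_edge; x0 < right_edge; x0 += x_step) {
            int x = 0, y = 0, x_x = 0, y_y = 0, i = 0;
            int the_char = ' ';
            while (i < max_iter && x_x + y_y <= 800) {
                x_x = (x * x) / 200;
                y_y = (y * y) / 200;
                if (x_x + y_y > 800) {
                    the_char = (i > 9) ? '@' : '0' + i;
                } else {
                    int temp = x_x - y_y + x0;
                    /* divide the magnitude, then restore the sign */
                    if ((x < 0 && y > 0) || (x > 0 && y < 0))
                        y = (-1 * ((-1 * (x * y)) / 100)) + y0;
                    else
                        y = x * y / 100 + y0;
                    x = temp;
                }
                i = i + 1;
            }
            accum += the_char;
        }
    }
    return accum;
}

/* Total after a number of identical passes. */
long mandel_total(int passes)
{
    assert(passes >= 0);
    return passes * mandel_pass();
}
```

The figure of 309886830 reported earlier in the thread equals 1545 x 200574, so a faithful port would be expected to return 200574 from one pass. Treat that as a check to run, not as a verified figure.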

The result shows that my notebook is a lame box, as the test could be run 5 times faster. When I buy a new PC, I will have a USB stick with this test program with me.

Roland
Title: Re: Tiny Benchmark Test
Post by: Aurel on April 06, 2014, 03:40:05 PM
Well, I don't know how, but I get this:
C unopt - 19 sec
FBSL - 24 sec
o2 - 26 sec
 ::)
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on April 06, 2014, 06:39:54 PM
Aurel,

I think the only path to knowledge lies in your PC config. Perhaps it's time to upgrade.

Here comes the O2 result on my workstation. The script and binary are attached in the zip.

Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on April 06, 2014, 08:31:59 PM
Oxygen's condition logic is not optimised for speed. It uses the same catch-all strategy for most conditions, and has resisted my attempts to improve it so far. I am looking for ways that do not make compiling excessively complicated.
Title: Re: Tiny Benchmark Test
Post by: JRS on April 06, 2014, 08:56:53 PM
You would think with all the senseless loop benchmarking going on someone at CERN is holding a contest for the fastest BASIC in town. Who cares if one BASIC runs a second faster than the other? How about posting something of value that can be used for something?
Title: Re: Tiny Benchmark Test
Post by: Aurel on April 06, 2014, 10:04:54 PM
Mike
Yes perhaps... ::)
BUT I have tested these programs 3 times on both of my old computers
( hehe I love retro stuff)  ;D
and the results are the same... a second more or a second less...
by the way Borland BCC is a little bit faster than gcc   ;D.

I agree with John, these benchmarks are really a little bit stupid.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on April 07, 2014, 12:28:54 PM
Quote from: JRS
Who cares if one BASIC runs a second faster than the other?

Where did you see me comparing BASIC's here, John? I was benchmarking gcc -O3, gcc w/o optimization and FBSL's DynC where all the three compilers are 100% ANSI C standard compliant. Their execution speed is a direct criterion of the quality of resultant machine code they generate. I'm also appending a screenshot of gcc optimized to compile the same script for the smallest exe size possible (-Os option).

I posted my original message in response to Ed Davis' C-related inquiry. And I have added the O2 snippet only in response to Aurel's message to give him an idea of how the O2 variant would look on my machine.

OTOH both O2 and FBSL are much more than just BASIC's. FBSL already has three distinct languages incorporated and interacting while O2 is nearing the same point with its beneficial capability of reading much of assembly and C as direct BASIC inlines.

I am not in the habit of comparing Atari BASIC against Sinclair BASIC. I'm not a necro-monger. I believe that the future belongs to such complete development environments as O2 or FBSL or Terra (http://terralang.org/pldi071-devito.pdf) exactly thanks to their omnivorous nature. And I like to know exactly where I am with my FBSL visions/technologies/implementations amongst my peers. I do not want to end my days the way that unfortunate newly-appeared ClipperBASIC author did.

P.S. And if what I said above still sounds stupid to you, Aurel, then you may call me a fool. Time will show.

Title: Re: Tiny Benchmark Test
Post by: Aurel on April 07, 2014, 09:51:26 PM
Oh Mike... what the heck is wrong with you?
Where do you see me saying that your words are stupid?  :o
I just think that all this benchmarking is a little bit stupid and boring... ok  ;)

Hm... about FBSL... I think that you know where you are.
Sorry man... from a technical point of view FBSL is OK and fine, but
what is all that worth if you don't have users?
And as we can see there is not much interest in FBSL (and not only FBSL); I don't see much
activity on the FBSL forum. Why is FBSL not much more popular?
And as you can see, the situation is almost the same everywhere...
if you have 3-5 active users, what is that? ...nothing.
Wishes are one thing and the situation is another.
But who cares... forget it  ;)

Ha, ClipperBasic is what?
What does he expect on a rubbish site like basicprogramming.org with
users (read: losers) oriented to *unix?
And to be honest, what is ClipperBasic... an unfinished attempt which has nothing to do
with BASIC.
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on April 08, 2014, 02:00:27 AM
I remember Clipper as a DbaseIII compiler. It inspired me to write a replacement for DbaseIII, and avoid costly licences for the company network, where I was working.

Is there any connection with Clipper Basic?

PS:
After looking at this BenchMark, I decided to have another crack at optimising OxygenBasic's conditionals. I think I'm making progress this time.
Title: Re: Tiny Benchmark Test
Post by: Aurel on April 08, 2014, 02:21:24 AM
Quote
After looking at this BenchMark, I decided to have another crack at optimising OxygenBasic's conditionals. I think I'm making progress this time.

That would be good Charles... ;)
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 14, 2014, 07:49:40 AM
Hello community,

To finish off another round of "the integer intensive benchmarks" so popular at BP.org, here are my results for some of the native-code and JIT compilers as well as bytecode and pure interpreters that I checked out under XP Sp3 on the hardware depicted in my signature below.

The reason why I opted for my own table is that I couldn't confirm some of the data presented at BP.org. For instance, my results for such items as GCC, FreeBASIC or Euphoria differ significantly from that data though some other items are in harmony with it. Also, I thought it would be interesting for some of our members to see where FBSL** or thinBasic or Scriba** (that's an interpretative reincarnation of ScriptBASIC) should've stood among their competitors.

The zip attached at the very bottom of this message contains the respective scripts and some precompiled executables where applicable, so that you could try and verify the results on your own HW yourselves.
______________________________
** FBSL's BASIC and Scriba are bytecode interpreters with typeless (a.k.a. Variant-type) variables. They are included in the Interpreters category because 99% of the benchmark code runs in quadruply nested loops where byte coding doesn't help due to difficulties with efficient garbage collection.

Code: [Select]
Native Code Compilers
=====================
MS VC12 with full optimization (-Ox)       1.094 sec   Proprietary
GCC v4.3.3 with full optimization (-O3)    1.312 sec   Open source
FreeBASIC                                  3.209 sec   Open source
MS VB6 console with full optimization      62.00 sec   Proprietary


Just-In-Time Compilers
======================
FBSL Dynamic C                             3.203 sec   Freeware, closed source
Oxygen Basic                               3.593 sec   Open source
LuaJIT                                     3.890 sec   Open source


Bytecode Interpreters
=====================
Euphoria                                   43 sec      Open source
Lua                                        77 sec      Open source
MS VB6 console p-code                      147 sec     Proprietary


Interpreters
============
thinBasic                                  543 sec     Freeware, closed source
FBSL BASIC**                               564 sec     Freeware, closed source
ScriptBASIC (Scriba**)                     625 sec     Open source
LB Booster                                 1375 sec    Freeware, closed source

Native code compilers:

(http://i1240.photobucket.com/albums/gg490/FbslGeek/Native.png)


Just-in-time compilers:

(http://i1240.photobucket.com/albums/gg490/FbslGeek/JIT.png)


Bytecode interpreters:

(http://i1240.photobucket.com/albums/gg490/FbslGeek/Bytecode.png)


Interpreters:

(http://i1240.photobucket.com/albums/gg490/FbslGeek/Interpreters.png)

Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 14, 2014, 12:32:32 PM
Very interesting results, Mike. Interpreters always look bad on integer tests. When doing trig functions, the performance difference is far smaller.

I am intrigued as to how VC12 squeezes the time down to 1 second! With a few simple optimisations to the inner loop, I can get it down from 3.6 to about 2 secs (in terms of your PC performance).

Even hand coding the inner loop in assembler, I can't push it below 2 seconds!
Title: Re: Tiny Benchmark Test
Post by: JRS on May 14, 2014, 01:08:41 PM
Script BASIC is a P-CODE interpreter. SB surely takes a hit in repetitive loop benchmarks as its variables are like variants. (objects)

Quote from: Charles
Even hand coding the inner loop in assembler, I can't push it below 2 seconds!

I don't know what to say. I'm crushed!  >:(
Title: Re: Tiny Benchmark Test
Post by: Ed Davis on May 14, 2014, 01:53:09 PM
Quote
my results for such items as GCC, FreeBASIC or Euphoria differ significantly from that data

Running some .exe's from your supplied archive on my machine:


intmandelGCC:           1.342 seconds (yours was 1.312)
intmandelFB:            3.377 seconds (yours was 3.209)


My FreeBasic script used '/' for division, along with int().  Changing those, and now my version comes in at 3.36, which is almost the same. 
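
In FreeBASIC, `/` is floating-point division and `\` is integer division, which is presumably the change described here. The C sketch below compares the two routes for the kind of scaling the benchmark performs. Both function names are hypothetical, and the floating-point variant only approximates what a `/` followed by a conversion back to integer does.

```c
#include <assert.h>

/* Integer division: the quotient is computed directly on ints. */
int scale_int(int x)
{
    return (x * x) / 200;
}

/* Same value computed via floating point and converted back, which is
   roughly what '/' followed by a truncating conversion does.
   Both names are hypothetical. */
int scale_via_double(int x)
{
    return (int)((double)(x * x) / 200.0);
}

/* Returns 1 if both forms agree for every x in [-limit, limit]. */
int scalings_agree(int limit)
{
    int x;
    assert(limit >= 0 && limit <= 40000);   /* keeps x * x within int range */
    for (x = -limit; x <= limit; x++) {
        if (scale_int(x) != scale_via_double(x))
            return 0;
    }
    return 1;
}
```

The results match for these non-negative operands, so the difference is one of speed: the second form pays for two conversions and a floating-point divide on every iteration.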

Running the Euphoria script from your archive:

(http://semware.com/images/mandel-eu.png)

Definitely no idea why these are so different, from 43 to 24 seconds is a big difference.

Compiling your version of intmandelGCC with a later version of gcc (4.8.1):

(http://semware.com/images/mandel-gcc.png)

Here are the specs for my machine:

Windows 7, Service Pack 1, 64-bit
Intel Core i7-3720QM CPU @2.60GHz
16.0 GB (15.9 usable)


Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 14, 2014, 08:47:43 PM

Could the optimised C code be running in 2 concurrent threads? :)
Title: Re: Tiny Benchmark Test
Post by: JRS on May 14, 2014, 09:26:07 PM
Nope. Single thread.

There is no difference in speed between VC12 32 bit and a 64 bit version of the benchmark.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 15, 2014, 12:21:55 AM
Quote from: JRS
Script BASIC is a P-CODE interpreter. SB surely takes a hit in repetitive loop benchmarks as its variables are like variants. (objects) I don't know what to say. I'm crushed!

You shouldn't be so upset with that, John. FBSL's BASIC is also a bytecode interpreter (thinBasic isn't - it's a pure interpreter) and totally Variant-based; nonetheless it suffers similar heavy performance penalties in loop benchmarks. That's why I made a special reservation about FBSL by classifying it under Interpreters rather than Bytecode Interpreters for this particular benchmark. Perhaps we should add this remark also to Scriba. That's a natural trade-off for the benefits of having typeless variables.

The nature of a Variant variable may change any time within the loop. It may become a string or even a class instance that need to be freed or destroyed before the bunch of them pollute the process stack and heap to the extent of total exhaustion. Same goes about recursion. Where strong data type interpreters enjoy temp var allocation on the function stack that's cleared automatically on function exit, Variants need an explicit malloc'ing and freeing of heap memory chunks. Thus, a simple loop becomes a nightmare for the garbage collector that does the cleanup job plus memory defragmentation whenever possible. Garbage collection adds dozens of auxiliary function calls that slow down the loop dramatically.
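
The cost described above can be pictured with a small tagged-union sketch in C. It is an illustration under simple assumptions and does not reflect how FBSL or Script BASIC are actually implemented. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { V_INT, V_STRING } VariantTag;

typedef struct {
    VariantTag tag;
    union {
        int   i;
        char *s;     /* heap-owned when tag == V_STRING */
    } as;
} Variant;

/* Releases whatever the variant currently owns. */
static void variant_clear(Variant *v)
{
    if (v->tag == V_STRING)
        free(v->as.s);
    v->tag = V_INT;
    v->as.i = 0;
}

void variant_set_int(Variant *v, int value)
{
    variant_clear(v);          /* extra work a typed int never needs */
    v->tag = V_INT;
    v->as.i = value;
}

/* Returns 1 on success, 0 if the allocation failed. */
int variant_set_string(Variant *v, const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return 0;
    memcpy(copy, text, len);
    variant_clear(v);
    v->tag = V_STRING;
    v->as.s = copy;
    return 1;
}

/* Loop that flips the variable between string and int each iteration;
   returns the number of heap allocations it had to make. */
int variant_churn(int iterations)
{
    Variant v = { V_INT, { 0 } };
    int i, allocations = 0;
    assert(iterations >= 0);
    for (i = 0; i < iterations; i++) {
        if (i % 2 == 0)
            allocations += variant_set_string(&v, "temp");
        else
            variant_set_int(&v, i);
    }
    variant_clear(&v);
    return allocations;
}
```

Even in this toy version every assignment has to inspect the tag and possibly free memory first, whereas a statically typed `int` assignment is a single store.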

OTOH Lua is also Variant-based and is exceptionally fast at that. So the problem can be solved somehow, however I don't know how to re-implement Lua's methodology in the existing implementation of FBSL's BASIC without major rework of its engine from the ground up.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 15, 2014, 01:06:24 AM
Thanks for your response, Ed.

Quote from: Ed Davis
My FreeBasic script used '/' for division, along with int().  Changing those, and now my version comes in at 3.36, which is almost the same.
Oh yes, that explains pretty much the difference in our results.

Quote
Definitely no idea why these are so different, from 43 to 24 seconds is a big difference.
I'm using Euphoria's latest build, v4.1 for Windows from their repository. Is that what you've used for the test?

Quote from: Charles
Could the optimised C code be running in 2 concurrent threads? :)
No Charles, definitely not although Euphoria does seem to be employing some sort of code optimization. I think the difference may also be caused by the architectural differences of our CPU's. Ed's machine is i7-based while mine is built around an i5.

Another point: the script takes about 1 second or more to compile while the console is already on the screen before it starts to execute. I don't think their parser is so slow. I'm rather inclined to attribute the lag to some heavy optimization which may be the reason why Euphoria is the fastest bytecode interpreter of all indie implementations.

Quote
Compiling your version of intmandelGCC with a later version of gcc (4.8.1):
Thanks for pointing this out, Ed. I've been using GCC v4.3.3 to compile FBSL for many years because its output code proved to be up to 45% faster than later builds. I'm using a post-build script to fix the crooked PE headers and garbage that v4.3.3 generates in its output binaries under Windows. Evidently it's high time for me to try out v4.8.1 against the FBSL sources. Perhaps I'll be able to enjoy a 30% boost there too. :)
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 15, 2014, 01:40:04 AM
Mike, could you time this one on your PC. I think it goes almost twice the speed of mine, but I would like to see how fast it goes on yours with my optimisations. (3.95 secs on my PC)

The code is very raw and messy. Don't look at it if you are squeamish :)


A few ideas for interpreter optimisation: You may already have these:

String pools for efficiently recycling small strings, and mass allocation/disallocation.

Dynamic tokenisation of source code. Progressively convert source into tokens.

Jump tables for efficiently converting operator tokens into handler calls.
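
As an illustration of the first idea, here is a minimal fixed-size string pool in C built on a free list. It is a sketch under simple assumptions (one slot size, a fixed slot count, no thread safety) and is not taken from any of the interpreters discussed. All names and sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS     64     /* illustrative sizes, not from the thread */
#define POOL_SLOT_SIZE 32

typedef union PoolSlot {
    union PoolSlot *next;             /* valid while the slot is free */
    char text[POOL_SLOT_SIZE];        /* valid while the slot is in use */
} PoolSlot;

typedef struct {
    PoolSlot  slots[POOL_SLOTS];
    PoolSlot *free_list;
} StringPool;

/* Threads every slot onto the free list. */
void pool_init(StringPool *pool)
{
    size_t i;
    pool->free_list = NULL;
    for (i = 0; i < POOL_SLOTS; i++) {
        pool->slots[i].next = pool->free_list;
        pool->free_list = &pool->slots[i];
    }
}

/* Hands out a slot for a short string, or NULL when the pool is empty. */
char *pool_alloc(StringPool *pool)
{
    PoolSlot *slot = pool->free_list;
    if (slot == NULL)
        return NULL;
    pool->free_list = slot->next;
    return slot->text;
}

/* Returns a slot to the pool without calling the general-purpose heap. */
void pool_free(StringPool *pool, char *text)
{
    PoolSlot *slot = (PoolSlot *)text;
    assert(slot >= pool->slots && slot < pool->slots + POOL_SLOTS);
    slot->next = pool->free_list;
    pool->free_list = slot;
}

/* Mass release: rebuilding the free list frees every slot at once. */
void pool_reset(StringPool *pool)
{
    pool_init(pool);
}
```

Allocation and release are each a couple of pointer moves, and a just-freed slot is the next one handed out, which keeps recently used memory warm in the cache.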



Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 15, 2014, 02:13:06 AM
Quote from: Charles
The code is very raw and messy. Don't look at it if you are squeamish :)
I love any code that runs so fast as this! ;D

Quote
I think it goes almost twice the speed of mine, but I would like to see how fast it goes on yours with my optimisations. (3.95 secs on my PC)
AMAZING! Please see the snapshot below taken on fresh reboot into XP Sp3. And do not try to beat Java and .NET. They are cheaters: they run the JIT source that's been heavily optimized when precompiled slowly for the first time and written into a disk file.

Quote
String pools for efficiently recycling small strings, and mass allocation/disallocation.
FBSL uses separate pool allocators for strings, static Variant variables,  temporary Variant variables, and class instances.

Quote
Dynamic tokenisation of source code. Progressively convert source into tokens.
FBSL uses three and a half successive levels of source code processing at app start:
-- Level 1: preprocessor (separate for BASIC/DynAsm and DynC)
-- Level 2: recursive descent parser and tokenizer for BASIC, JIT for DynAsm and DynC
-- Level 3: BASIC bytecode compiler
-- Level 3.5: BASIC bytecode optimizer.

Quote
Jump tables for efficiently converting operator tokens into handler calls.
That's what I'm currently busy with having read the article that Ed pointed Aurel to recently at BP.org. :)

Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 15, 2014, 02:54:42 AM
Thanks Mike,

There are no further optimisations I can think of. We are down to manual assembler, which convinces me that the optimised C code employs the services of at least 2 cores, to be able to produce timings less than half of mine.

Jump Tables and Tokens:

maybe not so simple in C:
Code: [Select]
'SETUP
======
sys j
sys t[]={@AA,@BB,@CC} 'map label addresses into table

'TEST
=====

token=2
j=t[token] : jmp j

jmp fwd done

AA:
print "AA"
jmp fwd done

BB:
print "BB"
jmp fwd done

CC:
print "CC"
jmp fwd done

done:
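
For readers who want the closest standard C counterpart, a table of function pointers indexed by token is one common approach. The sketch below is an illustration with hypothetical token and handler names. It dispatches through calls rather than direct jumps, so it is not an exact equivalent of the label table above.

```c
#include <assert.h>

typedef int (*Handler)(int lhs, int rhs);

static int op_add(int lhs, int rhs) { return lhs + rhs; }
static int op_sub(int lhs, int rhs) { return lhs - rhs; }
static int op_mul(int lhs, int rhs) { return lhs * rhs; }

/* Token values double as indices into the table.
   The token names are hypothetical. */
enum { TOK_ADD, TOK_SUB, TOK_MUL, TOK_COUNT };

static const Handler handlers[TOK_COUNT] = { op_add, op_sub, op_mul };

/* One indexed load and an indirect call replace a chain of comparisons. */
int dispatch(int token, int lhs, int rhs)
{
    assert(token >= 0 && token < TOK_COUNT);
    return handlers[token](lhs, rhs);
}
```

The lookup cost stays the same however many tokens there are, which is the point of replacing a long if/else chain with a table.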
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 15, 2014, 03:26:52 AM
Quote from: Charles
... the optimised C code employs the services of at least 2 cores ...
Possibly though not directly via multi-threading. Also, there may be transparent optimizations with MMX or SSE+ facilities involved.

My IDA Pro shows that the VC12 optimizer uses MMX (GCC v4.3.3 doesn't) to enhance the test code performance. Then, apart from code alignment there's also such a thing as proper UV-pairing of Pentium instructions. So your handcrafted assembly still has a chance to improve on its odds... :)

Quote from: Charles
Jump Tables and Tokens:
....................
maybe not so simple in C:
....................
Yet perfectly feasible for GCC's crooked AT&T assembly ( ;) ) provided parameter passing is added to the scheme. Computed gotos are not part of the C99 standard, but they are available as a GNU C extension ("labels as values") that is fully supported in GCC (and in FBSL's Dynamic C too BTW).
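
A minimal sketch of the computed-goto form follows. It relies on the GNU C "labels as values" extension (`&&label` and `goto *ptr`), so it needs GCC or a compatible compiler such as Clang and is not ISO C. The function name `run_tokens` and the three token meanings are hypothetical.

```c
#include <assert.h>

/* Runs a tiny token program with computed gotos (GNU C extension).
   Tokens: 0 = increment, 1 = double, 2 = halt.
   There is no bounds check on token values; this is only a sketch. */
int run_tokens(const unsigned char *code, int start)
{
    static void *const table[] = { &&do_inc, &&do_dbl, &&do_halt };
    int acc = start;
    assert(code != 0);

    goto *table[*code++];     /* jump straight to the first handler */

do_inc:
    acc += 1;
    goto *table[*code++];     /* each handler dispatches the next token */

do_dbl:
    acc *= 2;
    goto *table[*code++];

do_halt:
    return acc;
}
```

Each handler ends with its own indirect jump instead of returning to a central loop, which is the usual reason interpreters favour this form where the extension is available.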

Thanks a lot!
Title: Re: Tiny Benchmark Test
Post by: Ed Davis on May 15, 2014, 05:13:23 AM
I'm using Euphoria's latest build, v4.1 for Windows from their repository. Is that what you've used for the test?

I used 4.04.  But I should have used 3.1 :)

(http://www.semware.com/images/mandel-eu-all.png)

That 3.1 time is very impressive!  And note: exwc is the Windows Console interpreter.  The name was changed in the 4.0 line.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 15, 2014, 05:33:20 AM
Ed, ;D

This is the most demonstrative demotivator to justify once again what I've always said:

COMMUNITY DRIVEN EQUAL-OPPORTUNITY PROJECTS ARE A HIGHWAY TO HELL!

(http://www.fbsl.net/phpbb2/images/smilies/icon_ml_bananadevil.gif)




[EDIT] In the same vein:

Petr Schreiber of thinBasic has just published a message (http://www.thinbasic.com/community/showthread.php?12427-Brief-peek-at-VB-NET-development) about MS C# and VB.NET compiler code going open source (http://roslyn.codeplex.com/).
Title: Re: Tiny Benchmark Test
Post by: JRS on May 15, 2014, 08:05:35 AM
Mike,

Thanks for your explanation and clarification of the trade-offs of typeless variables. By default, Script BASIC's variables are thread safe and use Peter Verhas's MyAlloc for the memory manager/garbage collector. With a change of a #define, SB can use the standard C malloc/free if SB is only going to be used in a single process environment. I may generate a scriba in this model just for performance testing and comparison.
 
Title: Re: Tiny Benchmark Test
Post by: JRS on May 15, 2014, 08:35:01 AM
Quote from: Mike
Petr Schreiber of thinBasic has just published a message about MS C# and VB.NET compiler code going open source.

.NET jQuery package (http://www.nuget.org/packages/jQuery/) - 6,080,575 downloads from nuget.

I don't think we are playing in the same ballpark.  :-\
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 15, 2014, 09:48:36 AM
We certainly aren't, as the public significance of what we and they are doing is definitively incongruous.

I think open source in both cases may be just a marketing ploy to increase the popularity of either platform among the ranks of independent developers. The more they trust the instruments they are given, the larger their incentive to use them on the respective platform and pay for that in full. Note well the project owners stay immutable in both cases regardless of who actually contributes to the respective products. Sort of public trust management of private assets, hehe...

Still I'm feeling again like I'm talking about matters that aren't exactly within my sphere of competence. :)

Title: Re: Tiny Benchmark Test
Post by: Aurel on May 16, 2014, 01:34:01 AM
Quote
And do not try to beat Java and .NET. They are cheaters: they run the JIT source that's been heavily optimized when precompiled slowly for the first time and written into a disk file.

good to know Mike...and thanks for real results...
also i will add that euphoria is either on the same track as Java or .Net, or is really so optimized
that it is faster than other bytecode interpreters.
Title: Re: Computed Goto's in DynC and GCC
Post by: Mike Lobanovsky on May 16, 2014, 02:28:45 AM
.......
Jump Tables and Tokens:

maybe not so simple in C:
Code: [Select]
'SETUP
'======
sys j
sys t[]={@AA,@BB,@CC} 'map label addresses into table

'TEST
'=====

token=2
j=t[token] : jmp j

jmp fwd done

AA:
print "AA"
jmp fwd done

BB:
print "BB"
jmp fwd done

CC:
print "CC"
jmp fwd done

done:

Hi Charles,

GNU C supports computed goto's so your scheme can be used literally in both FBSL's Dynamic C and GCC alike. Both languages resolve goto to a jmp instruction. The scheme now works for the intended purposes in the FBSL source code.
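
For readers who want to try this on the C side, here is a minimal sketch of the same AA/BB/CC label table in GNU C. It assumes GCC or Clang (the && label-address operator is an extension, not ISO C); the function name dispatch and the bounds check are illustrative additions, and the handlers return their text instead of printing it so the result is easy to check.

```c
/* GNU C sketch of a label-address jump table (labels-as-values extension).
   Builds with GCC or Clang; it is not ISO C. */

/* Returns the text the selected handler stands for, or 0 (NULL)
   when the token has no entry in the table. */
const char *dispatch(int token)
{
    /* &&Label yields the address of a label; the table maps token -> handler. */
    static void *table[] = { &&AA, &&BB, &&CC };

    if (token < 0 || token >= (int)(sizeof table / sizeof table[0]))
        return 0;                /* out-of-range token: no handler */

    goto *table[token];          /* computed goto: a single indirect jump */

AA: return "AA";
BB: return "BB";
CC: return "CC";
}
```

C arrays are zero-based, so dispatch(1) selects the BB handler here.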

Thanks again!

Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 16, 2014, 03:16:18 AM
Excellent Mike, The best possible performance can be obtained for token code. Calling functions has a significant overhead.

You can also optimise integer processing for simple variables and common tasks like iteration.

Assuming you don't want to take the JIT path, of course. I can see the portability advantage of not dealing with specific processors.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 16, 2014, 06:12:54 AM
You can also optimise integer processing for simple variables and common tasks like iteration.

No way, Charles. There are no simple variables in FBSL's BASIC:
 
1. Every little thing is at least a 16-byte COM-style Variant. Besides the Variant proper, strings and compound data types grow sideways branches of associated readable/writable pool memory.

2. Functions grow a downwards tree-like sub-script of parameters, local Variant variables and code proper while the top Variant hosts the return value. DynAsm and DynC blocks are framed as ordinary functions with a sideways branch of associated executable pool memory filled with machine code bytes.

3. Namespaces and/or class instances are tree-like sub-scripts of their own that replicate the general structure of an FBSL main script.

4. Main script is a huge tree wherein:

-- the top Variant hosts the app exit code;
-- the trunk is made up of global Variant variables and global code, if any; and
-- the branches are function sub-trees, and/or namespaces, and/or class instances, if any, with their own functions/methods as described in Item 2.

An FBSL program's memory footprint is like a Christmas fir-tree - very attractive to look at but extremely prickly to the touch whenever you want to change the sources. ;D
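
To make the cost of item 1 concrete, here is a rough C sketch of a 16-byte tagged value. The layout is modelled on the general shape of a COM VARIANT and is purely an assumption: the thread does not show FBSL's actual fields, and the names Variant, V_INT, V_DOUBLE and variant_add are hypothetical. The point it illustrates is that even an integer addition has to inspect type tags before doing any arithmetic.

```c
#include <stdint.h>

/* Hypothetical layout modelled on a 16-byte COM-style Variant;
   FBSL's real field names and type tags are not given in the thread. */
typedef struct {
    uint16_t type;           /* tag: which union member is valid */
    uint16_t reserved[3];    /* padding up to the 8-byte payload */
    union {
        int32_t i;
        double  d;
        char   *s;           /* string data would live in separate memory */
    } value;
} Variant;

enum { V_INT = 1, V_DOUBLE = 2 };   /* illustrative tags only */

/* Adds two numeric Variants: int + int stays int,
   any other combination is promoted to double. */
Variant variant_add(Variant a, Variant b)
{
    Variant r = { 0 };
    if (a.type == V_INT && b.type == V_INT) {
        r.type = V_INT;
        r.value.i = a.value.i + b.value.i;
    } else {
        double x = (a.type == V_INT) ? (double)a.value.i : a.value.d;
        double y = (b.type == V_INT) ? (double)b.value.i : b.value.d;
        r.type = V_DOUBLE;
        r.value.d = x + y;
    }
    return r;
}
```

A statically typed compiler would emit a single add instruction for the int case; here every operation pays for the tag test and a 16-byte copy.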

Quote
Assuming you don't want to take the JIT path, of course.

FBSL v3.5's BASIC will stay as it is. FBSL v4's BASIC will be a strong data type JIT compiler - no more Variants except for the COM layer.

Quote
I can see the portability advantage of not dealing with specific processors.

Hmmm... That is, in the core engine? Yet there can be a range of connectable modules responsive to a call to IsProcessorFeaturePresent() - that's what our "big brothers" have (e.g. MS VC and GCC). And I think it will be reasonable to target Intel architectures for Windows/Linux/Mac mainly, at least at this stage of project development.
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 16, 2014, 10:02:17 PM
Hi Mike,

I found this lecture very informative from the development angle. Have you seen it before?


Bjarne Stroustrup 2012 keynote presentation on C++11

http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style

In particular, his discussion of linked-structures vs linear structures (around 46:00)
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 17, 2014, 02:46:37 PM
Is this what you call ' computed goto ' ?

Quote
Handler Chaining

What I have never seen a C or C++ compiler do for such interpreter
loop code is make one further optimization.
 What if, instead of generating the jump instruction to the top of the "for" loop, the compiler
was smart enough to simply compile in the fetch and dispatch into each handler?
 In other words, if there was some kind of funky "goto" keyword syntax
that would allow you to write this in your source code to hint to the compiler to do that:

        switch(peekb(PC++))
            {
        default: /* undefined opcode! treat as nop */
        case opNop:
            goto case(peekb(PC++));

        case opIncX:
            X++;
            goto case(peekb(PC++));

        case opLdaAbs16:
            addr = peekw(PC);
            PC += 2;
            A = peekb(addr);
            goto case(peekb(PC++));

You would now have an interpreter loop that simply
jumped from instruction handler to instruction handler
without even looping. Unless I missed something obvious,
C and C++ lack the syntax to specify this design pattern,
and the optimizing compilers don't catch it. It is for this reason that for all
of my virtual machine projects over the past 20+ years I have resorted to using
assembly language to implement CPU interpreters. Because in assembly language you
can have a calculated jump target that you branch to from the end of each handler,
 as this x86 example code which represents the typical instruction dispatch code used
in most past versions of Gemulator and SoftMac and Fusion PC:

    movzx ebx,word ptr gs:[esi]
    add esi,2
    jmp fs:[ebx*4]
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 17, 2014, 03:08:13 PM
well ..another interesting site:
http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 17, 2014, 09:41:54 PM
@Charles:

Thanks a lot for both links!

@Aurel:

No, it isn't.

The article Charles pointed to explains computed goto's very clearly. In ordinary C code, you don't know the memory address of a label. The jmp instruction is hardcoded by the compiler at compile time based on the label's relative offset from the command (goto) that actually jumps to the label:

goto Label;
................. // some other code
Label:


But you still don't know what the real address of the label is.

Computed goto's allow the programmer to retrieve the real address of the label, store it e.g. in a table (i.e. in an array) or even perform some mathematic operations on it if necessary, and use the address or the resultant math value as the target for the instruction that actually jumps to the label or address computed in a math expression. The goto syntax in this case is slightly different:

void *target = &&Label; // && retrieves Label address
target += 100; // add 100 bytes to Label address
goto *target; // jump 100 bytes farther in memory than where Label points to
................. // some other code
Label:


The abstract you quoted comes from Ed's article and it only discusses briefly why switch isn't perfect for an ideal interpreting loop. Yet Charles' article does it in much deeper detail and in better English.
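
The "handler chaining" wished for in that abstract can be sketched with GNU C computed gotos. This is only an illustration under stated assumptions: the four opcodes, the NEXT macro and the function name run_chained are made up for the example, it needs GCC or Clang, and it trusts the bytecode to contain only valid opcodes and to end with OP_HALT.

```c
/* Hypothetical 4-opcode bytecode, for illustration only. */
enum { OP_HALT, OP_INC, OP_DEC, OP_DBL };

/* Runs the program and returns the accumulator.
   Assumes code[] holds only valid opcodes and ends with OP_HALT. */
int run_chained(const unsigned char *code)
{
    /* Table order must match the enum order above. */
    static void *handler[] = { &&do_halt, &&do_inc, &&do_dec, &&do_dbl };
    int acc = 0;

    /* Every handler ends with its own fetch-and-dispatch:
       there is no enclosing loop and no switch. */
#define NEXT() goto *handler[*code++]

    NEXT();
do_inc:  acc++;     NEXT();
do_dec:  acc--;     NEXT();
do_dbl:  acc *= 2;  NEXT();
do_halt: return acc;
#undef NEXT
}
```

Feeding it the bytes OP_INC, OP_INC, OP_INC, OP_DBL, OP_DEC, OP_HALT walks the accumulator through 1, 2, 3, 6, 5 and returns 5.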

I would however like to add two points to what is written in both articles:

1. The usual break causes two jumps to be performed in this hypothetical loop:

// there's an implicit BeginWhile label here
while (1) {
    switch (code[pc++]) {
    case OP_ONE:
        DoSomething();
        break;
    default:
        DoNothing();
    } // there's an implicit EndSwitch label here
} // there's an implicit EndWhile label here


- first, to the implicit EndSwitch label, and then, from the EndWhile label to the BeginWhile label where while (1) resides.

Assuming there's nothing to execute in between the EndSwitch and EndWhile labels, a more reasonable solution will be to use continue instead of break - it will spare one extra jump:

// there's an implicit BeginWhile label here
while (1) {
    switch (code[pc++]) {
    case OP_ONE:
        DoSomething();
        continue;
    default:
        DoNothing();
    } // there's an implicit EndSwitch label here
} // there's an implicit EndWhile label here


- here continue jumps directly to the implicit BeginWhile label.

2. All these considerations and timings matter only if the code within DoSomething() and DoNothing() is well-optimized, short and very very fast. If it isn't, you won't see any noticeable or measurable improvement from changing break's to continue's, or the entire switch block, to computed goto's and a jump table. :)
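
A compilable version of the continue variant from point 1 might look like the sketch below. The opcodes and the name run_switch are hypothetical. Note one practical detail: leaving the loop needs return (or a goto), because break inside the switch would only leave the switch. Whether continue really saves a jump depends on the compiler; an optimizer may thread the jump by itself, so measure before relying on it.

```c
/* Hypothetical 3-opcode bytecode, for illustration only. */
enum { SW_HALT, SW_INC, SW_DBL };

/* Returns the accumulator; unknown opcodes are skipped as no-ops.
   Assumes code[] ends with SW_HALT. */
int run_switch(const unsigned char *code)
{
    int acc = 0;
    int pc = 0;

    while (1) {
        switch (code[pc++]) {
        case SW_INC:
            acc++;
            continue;        /* straight back to the top of while */
        case SW_DBL:
            acc *= 2;
            continue;
        case SW_HALT:
            return acc;      /* break here would only leave the switch */
        default:
            continue;        /* unknown opcode: skip it */
        }
    }
}
```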
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 17, 2014, 10:05:47 PM
thanks mike  ;)
If we talk about oxygen ...it looks to me that  if/elseif method is somehow faster than
select/case ..how is that possible ?
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 17, 2014, 10:27:56 PM
Hi Aurel,

I don't know yet how Charles implemented the two blocks at the assembly level but in modern C with an optimizing compiler, there are two differences between a multiline if block and a switch block:

1. A switch block is supposed to pre-scan the incoming value for the range of values the block handles. If the block doesn't contain default: and the value is totally out of its range of handled values, the block is supposed to be skipped altogether. In many cases, pre-scanning is much faster than going through the long long chain of mispredicted case's only to find that no actual case matches the value.

A multiline if will always perform the entire range of comparisons regardless of actual range of values it can handle.

2. A long switch block will be converted to a jump table and computed goto's by the optimizing compiler automatically if a corresponding optimization option is specified. Multiline if blocks do not enjoy such an optimization.
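
As a small illustration of the two forms, here is the same mapping written as an if chain and as a switch over a dense range of values. The token values and function names are made up. Both functions return identical results; what differs is the machine code a given compiler may choose, and that is best checked by inspecting its assembly output (for example with gcc -O2 -S) rather than assumed.

```c
/* Same mapping written both ways; hypothetical token values 0..3. */

/* Chain of comparisons, tested top to bottom. */
int weight_if(int tok)
{
    if (tok == 0)      return 10;
    else if (tok == 1) return 20;
    else if (tok == 2) return 30;
    else if (tok == 3) return 40;
    return -1;                    /* not handled */
}

/* Dense case values: a candidate for a range check plus jump table. */
int weight_switch(int tok)
{
    switch (tok) {
    case 0:  return 10;
    case 1:  return 20;
    case 2:  return 30;
    case 3:  return 40;
    default: return -1;           /* not handled */
    }
}
```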

Now let's wait for Charles to tell us how the two blocks are actually handled in OxygenBasic.
Title: Re: Tiny Benchmark Test
Post by: JRS on May 17, 2014, 11:06:21 PM
Are nested SELECT/CASE's more efficient than an IF/THEN/ELSEIF/ELSE block structure?

SB doesn't have SELECT/CASE syntax. (uses IF/THEN/ELSEIF/ELSE instead) I have never run into a SELECT/CASE that I couldn't easily convert to an IF/THEN/ELSEIF/ELSE structure. Then again, SB isn't a compiler and ease of use with a consistent syntax trumps speed & obscurity.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 12:34:46 AM
Are nested SELECT/CASE's more efficient than an IF/THEN/ELSEIF/ELSE block structure?

Nested Select Case's are exactly as awkward as nested If/ElseIf's, especially if both are long and not properly indented (and preferably highlighted with indentation guides).

I usually frame the outer conditionals in a Select Case block, and the inner ones, in If/ElseIf blocks if I have to. Also, I will never ever use an editor that isn't equipped with auto-indentation and indentation guides features.

Quote
... consistent syntax trumps speed & obscurity.

For obvious reasons that I described in my previous message, the use of multiple if/else if's will not be considered good programming style in the C language. Long chains of conditionals should be framed in a switch block while shorter and simpler if/else's are acceptable in the case sub-blocks.

It's not a matter of obscurity or consistency but rather of implied functionality. Long switch'es are generally executed faster than if/else if's due to inherent pre-scanning, even if not optimized into jump tables and computed goto's altogether.
Title: Re: Tiny Benchmark Test
Post by: JRS on May 18, 2014, 12:38:53 AM
This is as close as I can come emulating a SELECT/CASE in SB.

Code: [Select]
v = 3

IF v = 1 THEN GOTO CASE_1
IF v = 2 THEN GOTO CASE_2
IF v = 3 THEN GOTO CASE_3
GOTO DEFAULT

CASE_1:
  PRINT "ONE\n"
  GOTO BREAK
CASE_2:
  PRINT "TWO\n"
  GOTO BREAK
CASE_3:
  PRINT "THREE\n"
  GOTO BREAK
DEFAULT:
  PRINT "DEFAULT\n"
 
BREAK:   


Last alternative before shutting down.

Code: [Select]
CASE{"1"} = ADDRESS(One())
CASE{"2"} = ADDRESS(Two())
CASE{"3"} = ADDRESS(Three())

SUB One
  PRINT "One\n"
END SUB 

SUB Two
  PRINT "Two\n"
END SUB 

SUB Three
  PRINT "Three\n"
END SUB 

v = 3

ICALL CASE{STR(v)}
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 01:23:34 AM
The first variant is very very close to what a C compiler will do decoding a C if/else if block into equivalent assembly. No code relocations, no pre-scanning - just straightforward conversion of every C equivalent to your IF/THEN GOTO into a corresponding cmp/je pair of assembly instructions.

The second variant is worse - it is one function call slower in each constituent case.

The preprocessing of a real switch block in C will be much more thorough. That's why I'm looking forward to Charles' response as to how his BASIC parser treats the two blocks in OxygenBasic.

On a side note, I'm in favor of both BASIC GoTo and C goto keywords. Bellard's TinyCC which FBSL's DynC is based on uses goto's heavily. I'd say his C compiler wouldn't be so very fast (~30MB of source code per second on a 2GHz computer!) if he didn't use goto's so heavily throughout its source code. C++ goto's may be detrimental to structured programming but ANSI C goto's and BASIC GoTo's are certainly a bliss.
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 01:33:21 AM
Mike thanks again...
I have used in my toy interpreter select to select token ( integer) from integer array
tok[n]...But my main problem is slow interpreter loop ..in this case for/loop.
why?
For loop is not slow in oxygen BUT when i do some internal loop inside this main
for loop then things go slow..no matter what i do in this sub-loop the speed is same
what is really crazy... :( :o
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 01:45:32 AM
I can't say it off the top of my head, Aurel.

Let me take some time to look through your sources if they are available from your site and perhaps I will be able to spot the bottleneck. I don't however promise that I'll do it right away, sorry. :)
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 18, 2014, 02:01:49 AM
OxygenBasic select case is faster than if .. elseif, especially when the variable being tested is indirect, or calculated or an array. This is because the test value is held in the accumulator register, and does not require reloading for each case evaluation.

Another advantage is case ranges. These produce very efficient binary, compared to the if equivalent.

case 1 to 10 : ...

if (a>=1) and (a<=10) ..
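
For readers coming from the C side: GNU C has a comparable case-range extension. The sketch below is C, not OxygenBasic, it needs GCC or Clang (case ranges are not ISO C), and the function names are illustrative. It only shows that the two spellings are equivalent; it makes no claim about the code either compiler generates.

```c
/* GNU C case-range extension (GCC/Clang). The spaces around "..."
   matter: "1...10" would be parsed as a malformed number. */
int in_range_case(int a)
{
    switch (a) {
    case 1 ... 10: return 1;      /* matches 1 through 10 inclusive */
    default:       return 0;
    }
}

/* The equivalent test written as a plain condition. */
int in_range_if(int a)
{
    return (a >= 1) && (a <= 10);
}
```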

Oxygen does not preprocess its case blocks or try to map them into jump tables.

A simple token loop with a jump table: (goto is indispensable for this kind of code :) )

Code: [Select]
'JUMP BY TOKEN
'=============

sys t[256] : t[65]=>{@FAA,@FBB,@FCC}
'
string src = "ABCA" 'representing token stream
sys    e   = len(src)
sys    i   = 0
byte  *b   = strptr(src)-1
sys    eb  = @b+e

NextItem:

  @b++
  if @b>eb then end
  i=b : goto t[i]
  FAA:
  print "AA"
  goto NextItem

  FBB:
  print "BB"
  goto NextItem
  '
  FCC:
  print "CC"
  goto NextItem


The most recent Oxygen supports jump by array.
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 02:11:49 AM
Thanks Mike for interest   :D
I must say that this is pure interpreting ..so there is no any intermediate representation.
From my experience ..in Oxygen is the fastest Do/EndDo loop.
I even try to code some assembler code to increase speed( as far as i understand asm coding  :-\ ).
Ok Charles maybe i am wrong about speed of select vs if/elseif... ;)
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 02:26:13 AM
Hi Charles,

I think case ranges and faster evaluation are already reasons enough to keep two distinct constructs in the language.

And yes, while If/ElseIf allows the programmer to evaluate absolutely unrelated conditions in each comparison, Select Case always evaluates only one quantity against a number of other similar quantities. This enables the compiler to keep the former always in one place (accumulator register in O2 case) without reloading. This will naturally yield a boost compared to ElseIf where a completely different pair of objects may be evaluated and reloading must be done to allow for that.

Thanks also for yet another example of jump tables. The O2 code seems to be exactly as efficient in this regard as pure C. :)
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 18, 2014, 02:30:36 AM
I've made ifs more efficient quite recently, so there is not much difference when testing numbers/equates.

if

mov eax,a
cmp eax,42
jnz _exit_if


case

cmp eax,42
jnz _exit_cas
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 02:48:21 AM
I must say that this is pure interpreting ..so there is no any intermediate representation.

@Aurel:

As you see from Charles' examples, it is extremely easy to implement a fast jump table/computed GoTo's scheme in Oxygen. What is left to do for you is in fact to write a tokenizer which will substitute each of your language keywords in a given program with a corresponding token - an integer within the range of 0 to 255 (byte integers will suffice). You can then store the program's entire succession of tokens in a string and feed it to the loop exactly as shown in Charles' latest example.

That way you will have an extremely fast interpreting loop. Then, the only thing left to do will be to optimize the function for every token (Charles uses simple Print instead of it) to make it run as fast as your main loop, et voila: your toy interpreter becomes the Interpreter. :)
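
The tokenizing step described above can be sketched in C as follows. Everything named here is an assumption made for illustration: the keyword list, the token numbers and the function names token_of and tokenize are hypothetical, matching is case-sensitive, and words are separated only by spaces and newlines. A real tokenizer would also have to carry operands (variable names, numbers) alongside the keyword tokens.

```c
#include <string.h>

/* Hypothetical keyword set and token values, for illustration only. */
enum { T_UNKNOWN = 0, T_SET = 1, T_LINE = 2, T_FOR = 3, T_NEXT = 4 };

static const struct { const char *word; unsigned char tok; } keywords[] = {
    { "set", T_SET }, { "line", T_LINE }, { "for", T_FOR }, { "next", T_NEXT },
};

/* Maps one word of length len to its byte token
   (T_UNKNOWN if it is not a keyword). */
unsigned char token_of(const char *word, size_t len)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i].word) == len &&
            memcmp(keywords[i].word, word, len) == 0)
            return keywords[i].tok;
    return T_UNKNOWN;
}

/* Splits src on spaces/newlines and writes one token byte per word.
   Returns the number of tokens written (never more than cap). */
size_t tokenize(const char *src, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (*src && n < cap) {
        while (*src == ' ' || *src == '\n')          /* skip separators */
            src++;
        const char *start = src;
        while (*src && *src != ' ' && *src != '\n')  /* scan one word */
            src++;
        if (src > start)
            out[n++] = token_of(start, (size_t)(src - start));
    }
    return n;
}
```

The resulting byte array is the kind of token stream that a jump-table loop can consume one byte at a time.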

@Charles:

Yeah, the difference is ultimately 1 clock per comparison.  But you can improve on that yet further by comparing two things simultaneously as both mov and cmp are perfectly UV-pairable. Then the overhead will be just 0.5 clocks.  :D

Further, sources say that Pentium branch predictors are optimized for jz more than other instructions of this mnemonic. Mispredicted jumps lead to processor stalls with heavy clock-wise penalties. If you could compare for zero then your evaluators will be lightning fast. You can even add an extra unconditional jmp at the end if needed as its clocks are fewer than the misprediction penalty. :)
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 18, 2014, 03:03:43 AM
Mike,

Thanks for Pentium jump tips.

Yes, inefficient assembler is too painful to behold, so I am driven to optimise!

switch is also supported, as well as select. It is occasionally used to allow cases to fall through for more case testing and processing. In this situation the accumulator has to be reloaded with the switch variable, before testing further cases.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 03:12:20 AM
It is occasionally used to allow cases to fall through for more case testing and processing. In this situation the accumulator has to be reloaded with the switch variable, before testing further cases.

Yup, that's when the values to compare against don't fall within a contiguous range. But in my practice such cases occur seldom enough. So I guess we can live with an occasional reload here and there. :)
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 03:44:23 AM
Hi Mike...
Quote
What is left to do for you is in fact to write a tokenizer which will substitute each of your language keywords in a given program with a corresponding token - an integer

I think that i already have something like this:
from interpreter:
Code: [Select]
SUB RUNCODE
'print "RUN CODE..."
'print "PC:" + Str(PC)
INT Start,EndCode
LineNum=0 : Start=0 : EndCode=PC :perr=0
'STRING key$=""
sys key


' *** MAIN LOOP *****
'For LineNum = Start TO EndCode
'get keyword...token(int)
mov ecx,0
again:
'key$=arg0[LineNum]
'do
key = tok[LineNum]
'print "KEY:" + key$
PC=LineNum
'

'if selected is true
'>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
select key
CASE R_WFORM
exec_Window()

Case R_IF
LineNum = exec_IF()

Case R_ENDIF
'do nothing if TRUE

Case R_SET
exec_Set()

Case R_TXCOLOR
exec_TxColor()

Case R_WTEXT
exec_WText()

Case R_LINE
exec_LineXY()

Case R_CIRCLE
exec_Circle()

Case R_PIX
exec_PsetXY()

Case R_LOOPTO
exec_FOR()

Case R_SHIFT
exec_NEXT()

Case R_JUMP
LineNum=exec_JUMP()

Case R_WINCOLOR
exec_WINCOLOR()

'controls
Case R_CONTROL
exec_WINCONTROL()

End select

'<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
lineNum++

If perr=1
print "Error -> Exit from Main Loop!"
'EXIT For
goto out
End If
'Next LineNum ---------------------------

IF PeekMessage (&wm,0,0,0,Pm_Remove)>0 ' //peek
       TranslateMessage (&wm)
       DispatchMessage (&wm)
END IF
 

'----------------------------------------
IF lineNum > EndCode then goto out
dec ecx
jnz again
'end do

out:
Return

End SUB 

You can see here small asm code too... :D

In attachment is complete interpreter..
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 09:07:52 AM
And here is a small program which draws a pine tree.
It took 5.5 sec on my old comp  ::)

Code: [Select]
'chTree
DEFn rise, rad, frad, xshorten
DEFn left, top, width, height, bpx, bpy, tpx, tpy
DEFn x1, y1, x2, y2

WFORM 20,5,440,460,#MMS,0,"ChTree"
wColor 235,235,245
defn ht,xs,aa,msto
defn tpxx,tpyy,bpxx,bpyy,minus4

set msto = -100
set minus4 = -4
set bpx=220 , bpy=410 , tpx=bpx
'brown
txcolor 130, 100, 0
for aa,minus4,4
   set bpxx=bpx+aa, bpyy=bpy-390
  line  bpxx, bpy, bpx, bpyy
next aa
'green
txcolor 30,120,40
set rad=160,tpy=bpy-40
for ht,1,40

  for xs,msto,100,40
    set xshorten=xs/100 
    set rise= RND(0.3),tpxx = tpx+(xshorten*rad),tpyy = tpy-rise*rad
    line tpx, tpy, tpxx, tpyy

    for aa,1,30
       set frad=rnd(0.9)*rad
       set x1=tpx+(xshorten*frad)
       set y1=tpy-rise*frad
       set x2=tpx+xshorten*(frad+rad/5)
       set y2=tpy-rise*frad+(-rise+(RND(0.8)-0.4))*(rad/5)
       line x1, y1, x2, y2
       'wait 1
    next aa
 
  next xs

  set rad=rad-4 , tpy=tpy-9
next ht



Title: Re: Tiny Benchmark Test
Post by: JRS on May 18, 2014, 09:38:46 AM
FWIW - Script BASIC is rather flexible with the use of labels. Line numbers in SB are a special form of label. This should freak a few unaware programmers out.  ;D

Code: [Select]
40 FOR x = 1 TO 5
50   IF x = 3 THEN GOTO 10
20   PRINT x,"\n"
30 NEXT
10 END

jrs@laptop:~/sb/sb22/test$ scriba testlnum.sb
1
2
jrs@laptop:~/sb/sb22/test$

Script BASIC also allows you to use the same label name multiple times in your script as long as it isn't in the same scope.

Code: [Select]
SUB One
  IF global_var = 1 THEN GOTO Done
  EXIT SUB
  Done:
END SUB

SUB Two
  IF global_var = 2 THEN GOTO Done
  EXIT SUB
  Done:
END SUB


Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 12:48:45 PM
Hey John,

That line number feature looks like a first class admission fee to BP.org! ;D
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 12:57:01 PM
Aurel,

Where's the awinh.inc file, please?
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 18, 2014, 01:04:27 PM
Hi Mike, You will find a copy in  OxygenBasic: projectsB\Scintilla. This is Aurel's area :)


For old times sake, OxygenBasic accepts line numbers too. As in ScriptBasic, they are treated as labels. I think they might be useful for diagnostics, and providing numerical references to sections.

Because they are labels, exotic formats can be used:

100.01.44A
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 01:05:43 PM
Thanks Charles,

Going there right away.
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 01:15:40 PM
here is awinh.


Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 01:24:00 PM
Thanks Aurel,

Your ruben2.exe compiles but it doesn't run your tree script. It breaks on Set and Get calls (see picture).

Where can I get the BMP's for ruben2.o2bas? What other bells and whistles does it need? Perhaps the absence of these components renders the final exe inoperative?

Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 01:41:16 PM
Mike..i see ,sorry wrong version..
try this one from attachment...


PS:BMP's for ruben2.o2bas ?
Mike ruben2 don't have any bitmaps ,what you see is probably code editor
for ruben called RCode which is just a modification of ASciEdit for OxygenBasic..ok?
The tree code you can save on disk in same folder where you compile ruben interpreter
then just drag & drop file into interpreter exe and must work.
If you need code for code editor there is no problem...

Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 01:53:21 PM
Sorry Aurel,

This one doesn't work for me either. Please give me some other sample script. And what extensions do you use for your scripts? .BAS? .RUB?
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 01:59:48 PM
Mike ..
extension is .rub
But because you use xp i think that must work.
When you say that you can compile ruben2 to exe ..right?
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 02:04:39 PM
Ok Mike in attachment is everything you need to try..
just unpack and run RCode2.exe...
i only can hope that will work

Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 02:09:11 PM
Yes, it compiles fine. Then I drag-and-drop the script onto ruben2.exe. It builds the main window but then reports that it can't find all the vars that are declared in the DEFn rise, rad, frad, xshorten line and aborts.

OK I will try the entire pack. I'll inform you tomorrow about my results.

Thanks,
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 02:11:17 PM
ok... ;)
Title: Re: Tiny Benchmark Test
Post by: JRS on May 18, 2014, 02:22:55 PM
Charles,

This is as creative as I can get with SB line number labels. Anything more needs the : style syntax.

Code: [Select]
0X40 FOR x = 1 TO 5
0X50   IF x = 3 THEN GOTO 0X10
0X20   PRINT x,"\n"
0X30 NEXT
0X10 END

Does this count?  8)

Code: [Select]
One:  FOR x = 1 TO 5
Two:    IF x = 3 THEN GOTO Five
Three:  PRINT x,"\n"
Four: NEXT
Five: END
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 02:38:13 PM
John,

Does that work:

Code: [Select]
&H10 FOR x = 1 TO 5
&H20   IF x = 3 THEN GOTO &H50
&H30   PRINT x,"\n"
&H40 NEXT
&H50 END

That'd be supa-cool BASIC.

;D
Title: Re: Tiny Benchmark Test
Post by: JRS on May 18, 2014, 02:46:08 PM
Yes.

Code: [Select]
&H10 FOR x = 1 TO 5
&H20   IF x = 3 THEN GOTO &H50
&H30   PRINT x,"\n"
&H40 NEXT
&H50 END

jrs@laptop:~/sb/sb22/test$ scriba linenumhex.sb
1
2
jrs@laptop:~/sb/sb22/test$

In Script BASIC numbers (http://www.scriptbasic.org/docs/ug/ug_9.3.html) can use multiple formats and be mixed with any other number format style. Remember, you (or someone else) may have to revisit the code.  :o

P.S. Don't you have SB installed?
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 18, 2014, 02:50:31 PM
OxygenBasic is more relaxed about its line numbers. It will accept anything beginning with 0..9, being the first word in a statement. But it will not allow &h, nor negative numbers.

goto 0x100h.01.04..annex_B

0x100h.01.04..annex_B Print "line numbers in Basic Source code are hereby deprecated"

Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 04:34:56 PM
Remember, you (or someone else) may have to revisit the code. Don't you have SB installed?

LOL John,

Can't you see I'm simply trolling you (gently though) whenever I can? That's my little revenge for the past times of our yet-offline acquaintance.

;D
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 05:14:46 PM
The most recent Oxygen supports jump by array.

Charles,

Exactly what do you mean by "jump by array", please? Just saw it in a later edit to your message.

This is Aurel's area :)

The sandbox, that is? Well, well... :)

Quote
For old times sake, OxygenBasic accepts line numbers too. As in ScriptBasic, they are treated as labels. I think they might be useful for diagnostics, and providing numerical references to sections.

I would've gladly changed FBSL labels' behavior to include line numbers and acquire local visibility scope (all FBSL's labels are global) but Gerome has a hell of a lot of scripts at his work and also at his customers' so he won't let me do it, lazy bones. :)
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 05:37:09 PM
Evidently it's high time for me to try out v4.8.1 against the FBSL sources. Perhaps I'll be able to enjoy a 30% boost there too. :)

I actually did succeed in compiling FBSL v3.5 RC2 with GCC v4.8.1 and it did add a 10% to 15% boost to all my integer and floating-point benchmarks, respectively. But:

1. The size of Fbsl.exe binary has increased by 110KB from its current 606KB to 716KB.

2. GCC's new SEH scheme interferes with my own SEH implementation and doesn't work quite like I want it to. Nor am I able to either disable or override this new feature altogether.

3. The new data and code section layout within the binaries generated by GCC v4.8.1 requires that I rewrite my elaborate hand-coded FBSL executable compilation routine from scratch. I'd rather not spend two more weeks of my time recoding and debugging it again.

That said, I guess FBSL v3.5 will stay as it is even if it makes it run somewhat slower than I'd like it to. That's my final decision.
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 18, 2014, 10:20:42 PM
Mike,
I think you should try the exe from the pack, because ruben2.exe is not compiled
with the latest oxygen.dll. Because of some bugs, errors, or other strange reasons
in the latest release, I have to use an older DLL which works properly for the interpreter.
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 18, 2014, 11:12:28 PM
Mike,
I mean you can jump/goto an address specified by an indexed variable, as well as a simple variable or label: goto a[ b ]
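
For readers more at home in C, a rough analogue of jumping through an indexed variable is a table of function pointers selected by an integer index. This is only a sketch of the idea, not how Oxygen implements goto a[ b ]; the handler names and the dispatch() helper are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handlers; the names are illustrative only. */
static int op_inc(int x)    { return x + 1; }
static int op_double(int x) { return x * 2; }
static int op_negate(int x) { return -x; }

typedef int (*handler_fn)(int);

/* The "array" that is jumped through: index -> code address. */
static const handler_fn handlers[] = { op_inc, op_double, op_negate };

/* Call the handler stored at `index`; an out-of-range index leaves x unchanged. */
static int dispatch(size_t index, int x)
{
    size_t count = sizeof handlers / sizeof handlers[0];
    if (index >= count)
        return x;
    assert(handlers[index] != NULL);
    return handlers[index](x);
}
```

The bounds check matters: a jump through an unchecked index lands at an arbitrary address.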

Aurel,
I am using your Ruben2 source of 26 April as part of my Oxygen-checking regime, using StarFlower.RUB as a test script.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 18, 2014, 11:45:32 PM
Thanks for the clarification, Charles!

Aurel, sorry but the package you sent me doesn't work for me either. It seems like your parser doesn't handle end-of-line sequences correctly on my XP Sp3 Russian. If I try to delete a line in any .RUB script, or insert a line, or shuffle the lines with variable declarations, ruben2 starts to report different variables as missing and refuses to set/get their values.

What alphabet do you use in your native language? Is it a Central European Latin charset?

I'll look into the code later on to experiment and see what may be wrong with it on my platform.
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 19, 2014, 02:12:52 AM
Quote
What alphabet do you use in your native language? Is it a Central European Latin charset?
I'll look into the code later on to experiment and see what may be wrong with it on my platform.

Mike
I use a Croatian keyboard, but I think this has nothing to do with your XP SP3...
I think the problem might be the new oxygen.dll, so I still use the older version...
Attached is a version which works perfectly for me.
I am not sure which version it is :-\


Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 19, 2014, 05:53:55 AM
Hi Aurel,

I can compile your latest Ruben2, and run all the scripts, with the current Oxygen:

http://www.oxygenbasic.org/o2zips/Oxygen.zip
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 19, 2014, 06:03:59 AM
Hi Charles..

So then everything is OK with the interpreter and code editor, right?
Hmm... I really have no idea why it doesn't work on Mike's computer ???
I have never had problems compiling any o2 script from you or Peter,
but often when you add some new things to Oxygen, problems jump out... :-\
Anyway, I will back up this old DLL somewhere to prevent any sort of crashing or errors.

Thanks for testing... :)
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 19, 2014, 07:04:23 AM

Having samples of your code will help me detect unforeseen problems :)

I also did a brief test of AsciEdit2, but it was not a full function test.
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 19, 2014, 07:16:49 AM
Quote
Having samples of your code will help me detect unforeseen problems

Heh.. I see, I am glad that I can help.
ASciEdit2 is an almost complete editor (in terms of simplicity) which can be used for coding
o2 programs... just a few things are missing :-\
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 19, 2014, 09:11:20 AM
The problem is the way Aurel gets his scripts parsed into lines.

He creates a hidden richedit20a control and streams the script file into it. Then he reads lines of code from it one by one using SendMessage(EM_GETLINE) and stores them in his own lines[] array - empty lines, rem lines, code lines - all of them without any filtering.

This is a very dirty hack. This control requires initialization before use and its implementation is very platform-specific. What may work for richedit20a under a (Croatian?) XP SP2 or a British Vista might not work under a Russian XP SP3 without proper initialization: half of the lines loaded into the uninitialized control on my machine are simply not read back into the lines[] array.

I think a better design decision would be to discard the richedit20a control altogether and split the file into lines in a custom-coded Split() function based on LF's and having filtered off CR's. Half of modern text editors save their texts in a Unix format without CR (0xD) characters and line parsing based on CRLF's only may simply not work.
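
A minimal sketch in C of the kind of Split() meant here, assuming the whole script has already been read into a writable, NUL-terminated buffer. The name split_lines() and its signature are made up for illustration; this is not code from ruben2.

```c
#include <assert.h>
#include <stddef.h>

/* Splits buf in place on LF and drops every CR, so CRLF and LF-only
   files give the same result. Pointers to the lines are stored in
   lines[]; at most max_lines are stored and the stored count is returned.
   Empty lines are kept; a trailing LF does not add an extra empty line. */
static size_t split_lines(char *buf, char **lines, size_t max_lines)
{
    assert(buf != NULL && lines != NULL);

    size_t count = 0;
    char *src = buf;    /* read cursor */
    char *dst = buf;    /* write cursor, never ahead of src */
    char *start = buf;  /* start of the line being built */

    for (;; src++) {
        char c = *src;
        if (c == '\r')
            continue;                       /* filter off CRs */
        if (c == '\n' || c == '\0') {
            int at_end = (c == '\0');
            if (!(at_end && dst == start) && count < max_lines)
                lines[count++] = start;
            *dst++ = '\0';                  /* terminate this line */
            start = dst;
            if (at_end)
                break;
        } else {
            *dst++ = c;                     /* keep ordinary characters */
        }
    }
    return count;
}
```

Because CRs are removed while copying, the same routine handles Windows and Unix line endings without a hidden control or any platform-specific initialization.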

Quote
Having samples of your code will help me detect unforeseen problems :)

He-he-he-he-he.... Now I see what you mean. ;D
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 19, 2014, 10:59:11 AM
Quote
This is a very dirty hack
:D
OK, if you say so.. ;)
Yes, it is not the best way...
But I have found what might be wrong with the initialization of the richedit control.
In awinh I have the wrong DLL name:
riched32 =  LoadLibrary "riched32.dll"
but it must be:
riched32 =  LoadLibrary "riched20.dll"
I tried again and it looks like it works...
I can only hope that it will work with your XP SP3 Russian edition.  ;)



Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 19, 2014, 06:52:51 PM
Aurel,

1. I got ruben2 running. The offending routine appeared to be ClearGlobalArray(). It must be commented out, otherwise the process memory gets randomly corrupted. These arrays don't require clearing at the current stage of your project development. They are filled up only once per app run and they are effectively cleared when dimensioned at app start.

I do not think it is entirely your fault; it may still be related to string memory leaks in Oxygen (see my message to Charles below).

2. ruben2 is going to be as slow as it is so long as it uses strings for all its operations. You can't improve it radically without minimizing or eliminating string operations from the loop altogether. You're using strings in very many places where you should use numeric tokens and numeric literals. Your tree runs 3.5 seconds on my PC and I think that is close to as fast as it can be until you eliminate string operations in the interpreting loop.

3. Your drawing functions are leaking GDI objects heavily. The tree generates about 7,500 orphaned pens and brushes. Please open up your Task Manager, go to View->Select Columns... and tick the GDI Objects check box. Now you will be able to see a new GDI Objects column in your table and you will be able to control the quality of drawing functions you add to ruben2.

4. Overwrite the respective functions in your ruben2 with the following code. It will prevent GDI leakage - you'll see that your tree needs only about 55 GDI objects to draw itself.

SUB LineXY (wID as INT,byval x as INT,byval y as INT,byval x1 as INT,byval y1 as INT)
    hdc = GetDC(wID)
    GetSize(wID,0,0,ww,hh)

    int np = CreatePen(PS_SOLID,1,fColor)
    int op = SelectObject(hdc, np)

    MoveToEx hdc,x,y,Byval 0
    LineTo hdc,x1,y1
    BitBlt(hDCmem, 0, 0, ww, hh, hdc, 0, 0, SRCCOPY)
   
    DeleteObject(SelectObject(hdc, op))
    ReleaseDC( wID, hdc)
End SUB

SUB Circle (wID as INT, byval cix as INT, byval ciy as INT, byval cra as INT)
    hdc = GetDC(wID)
    GetSize(wID, 0, 0, ww, hh)
   
    int np = CreatePen(PS_SOLID, 1, fColor)
    int op = SelectObject(hdc, np)

    Ellipse hdc, cix-cra, ciy-cra, cra+cix, cra+ciy
    BitBlt(hDCmem, 0, 0, ww, hh, hdc, 0, 0, SRCCOPY)
   
    DeleteObject(SelectObject(hdc, op))
    ReleaseDC( wID, hdc)
End SUB

Sub FillSolidRect(wID as INT, x As Long, Y As Long, cx As Long, cy As Long, bColor as INT)
    Dim hBr As Long ' rc As RECT
    hDC = GetDC(wID)
    rc.Left = x
    rc.Top = Y
    rc.right = x + cx
    rc.bottom = Y + cy
    hBr = CreateSolidBrush(bColor)
   
    FillRect hDC, rc, hBr
    BitBlt(hDCmem, 0, 0, ww, hh, hdc, 0, 0, SRCCOPY)

    DeleteObject(hBr)
    ReleaseDC(wID, hdc)
End Sub

SUB CleanUp
    DeleteObject(SelectObject(hdcMem, oldBrush))
    DeleteObject(SelectObject(hdcMem, oldPen))
    DeleteObject(SelectObject(hdcMem, oldBmp))
    DeleteDC(hdcMem)
End SUB

Hope this helps.

Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 19, 2014, 07:35:57 PM
Hi Charles,

I think there are still problems with string garbage collection in Oxygen. While discrete functions seem to be clean enough when used with proper notation, some weird cases may still fool the GC and cause severe memory leaks.

For example, Aurel's tree uses 1.5GB of memory while drawing. Other .RUB examples are not much better in this regard. I didn't manage to check every line of his code for proper string handling but I did test separate basic string functions and they seem to be OK at a first glance.

However despite all the functions being perfectly correct and working, the following queer example of mine (and this is legitimate Oxygen syntax!) crashes in about 2 seconds from app start having exhausted 2GB of XP process memory:

$Filename "test.exe"
Include "RTL32.inc"
#Lookahead

String s = Space(2500)

While 1
   Replace s, "A", "A"
WEnd

Function Replace(String t, w, r) As String
   '=======================================
   '
   Sys a, b, lw, lr
   String s = t
   '
   lw = Len(w)
   lr = Len(r)
   a = 1
   ' 
   Do
      a = InStr(a, s, w)
      If a = 0 Then Exit Do
      s = Left(s, a - 1) + r + Mid(s, a + lw)
      a += lr
   End Do
   Return s
End Function

I realize that this is not an expected way to use Replace() in a BASIC but Oxygen's Functions are not supposed to differ from Subs so in a generic sense, this looks like an obvious bug.

You can replace Replace s, "A", "A" (pun not intended) with Space 2500 in this While/WEnd loop and see similar memory leakage.

I also remember Aurel (I wasn't yet registered on the forum then) complaining that the evaluator code you published seemed to leak memory. This is what I think is still happening now massively in his ruben2 code.
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 19, 2014, 11:49:54 PM
Many thanks Mike,

I am reviewing the garbage collector's logging points, and will also trap calls to string functions and float functions which neglect to assign the returned value to a variable.
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 20, 2014, 03:04:31 AM
Thank you very much Mike  :D
Your investigation is great... and yes, you are right about the GDI objects.
In fact I had never used this Task Manager option before.. :-\

About strings... hmm, yes, Oxygen has real problems with strings, but after all my
investigation and trying to do the best I know, I use tronD's (a PureBasic guy) evaluator, which
doesn't create memory leaks...
Thanks again... ;)
Title: Re: Tiny Benchmark Test
Post by: Peter on May 20, 2014, 03:22:55 AM
Hello together,

Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 20, 2014, 04:06:21 AM
No Peter,

Your code will leak GDI pens as heavily as Aurel's did before my correction. We discussed this matter with you in the past and I gave you the appropriate links to MSDN.

You cannot delete a GDI object while it is still selected into a device context, and your DeleteObject iPen will simply fail. The iPen object will stay in hdc until the function is called next time. Then it will be replaced with a new iPen in a new call to SelectObject hdc, iPen but the old pen object will stay undeleted. This is because in the new call, iPen already refers to another newly created pen and this call will also fail like the old one.

You must always store the old GDI objects when you make a call to SelectObject() and re-select them afterwards into the device context with another call to this function. The second SelectObject() call will return the de-selected GDI object's handle that may be used to delete this object.

That said, your Sub should be changed to the following in an efficient C-style notation:

Sub LineXY(int hdc, x, y, a, b, color)
    iPen = CreatePen 0,1,color
    int oldPen = SelectObject(hdc, iPen)
    MoveToEx hdc,x,y,NULL
    LineTo hdc,a,b
    DeleteObject(SelectObject(hdc,oldPen)) // SelectObject returns iPen here
End Sub

or atomically in your more common BASIC-style notation:

Sub LineXY(int hdc, x, y, a, b, color)
    iPen = CreatePen 0,1,color
    int oldPen = SelectObject(hdc, iPen)
    MoveToEx hdc,x,y,NULL
    LineTo hdc,a,b
    SelectObject hdc,oldPen
    DeleteObject iPen
End Sub

Please do not argue, Peter. Your code is outright buggy and you are teaching people wrong things again.
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 20, 2014, 06:12:54 AM
As I said before, I had never looked at GDI objects in Task Manager, so I cannot say whether Peter's functions
leak GDI objects.
Mike, you say 55 GDI objects...
With the new way I get even fewer: 48, which is good I think  :)
Title: Re: Tiny Benchmark Test
Post by: Aurel on May 20, 2014, 07:30:42 AM
Quote
2. ruben2 is going to be as slow as it is so long as it uses strings for all its operations. You can't improve it radically without minimizing or eliminating string operations from the loop altogether. You're using strings in very many places where you should use numeric tokens and numeric literals.

Hi Mike,
I agree with your point that I use a lot of string operations,
BUT not in the main interpreter loop...
The main loop in my case, as you can see, selects on an integer (sys) variable,
not a string, right?
BUT this integer is read from an integer array by index:

key = tok[LineNum]

Maybe that is problematic in terms of speed...
I know that a far better way would be to translate to bytecode.
As I have said many times, I have the code of DLib, which compiles a source program to a bytecode file
that the interpreter (VM) then executes, but I don't know how to properly translate this
PureBasic code to Oxygen; it is very complex and very PureBasic-specific.
I still haven't found the time to try translating Ed's toy2 interpreter to Oxygen, which could give me some
directions.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 20, 2014, 08:14:07 AM
Hi Aurel,

Quote
As I said before, I had never looked at GDI objects in Task Manager, so I cannot say whether Peter's functions leak GDI objects.

This is not a question of trial and error here. Any piece of code written like Peter's function must leak GDI objects because it denies the basic principles of working with the Windows device context. I pointed Peter to the MSDN pages and I even sent him a stand-alone Windows SDK help file while you were still discussing publicly "who is a better programmero".

If he still prefers to ignore both MSDN and my recommendations, then let him do it in his own closed-source code. But let's not let him do it publicly. This is because some newbie may copy-paste this "masterpiece" into their own code and then run around the net calling Windows a piece of sh*t because it is Windows that is allegedly leaking tons of its GDI objects. A newbie cannot tell right from wrong at a glance - but I can.

Quote
Mike, you say 55 GDI objects...
With the new way I get even fewer: 48, which is good I think  :)

In fact, the actual number is highly platform dependent and what is 48 under your XP may be 55 under mine or even 155 under Peter's Windows 7 or Charles' Vista. But this number must stay invariable for as long as the drawing loop runs.

GDI objects are system objects but even though their number can be very large on a given platform, it is still a finite value. Other concurrent programs may suffer a visible lag or even crash if the system runs out of its GDI memory pool due to some ignorant piece of code running in an endless loop.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 20, 2014, 08:26:29 AM
Quote
The main loop in my case, as you can see, selects on an integer (sys) variable,
not a string, right?
BUT this integer is read from an integer array by index:

But the arguments that your evaluator uses in response to a tokenized command come from the string argX[] arrays.

OTOH such code is a very good test case material for Charles to debug and polish his memory garbage collector. :)

Quote
I still haven't found the time to try translating Ed's toy2 interpreter to Oxygen, which could give me some
directions.

Where's that interpreter's code, please? But open up a new thread for that on an appropriate board. We already have so much info here that's irrelevant to the topic starter's original message... :)
Title: Re: Tiny Benchmark Test
Post by: Peter on May 20, 2014, 08:32:00 AM
Do not talk so much nonsense here!
We select for the Device Context! Nothing will be stored in the Device Context!

Select(Hdc, handle) means: the handle is for the Device Context HDC!
OldHandle = Select(hdc, handle) means: You may restore the previous handle, and nothing more.

And again, do not call me an idiot!
Memory leaks are in your work, not in mine.

In other words, our friendship ends now here!
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 20, 2014, 08:53:06 AM
Quote
OldHandle = Select(hdc, handle) means: You may restore the previous handle, and nothing more.
You must restore the old handle to be able to free the new one before you can delete it. If you do not take the new handle out of hdc, you can not delete it because your DeleteObject() call will simply fail silently. You can check it by printing the DeleteObject()'s return value: your calls will return zero which means failure. My calls will return non-zero which means success.

RTFM, Peter.

Quote
And again, do not call me an idiot!
Memory leaks are in your work, not in mine.
Where did you see me calling you an "idiot"?

On the contrary, this is where you're calling me, MSDN, and Microsoft idiots:
Quote
Do not talk so much nonsense here!

Quote
In other words, our friendship ends now here!
Was there any really? ;)

Title: Re: Tiny Benchmark Test
Post by: JRS on May 20, 2014, 08:58:31 AM
Count to 10 first Peter before getting pissed off and leaving the forum. I will not relink your ghost posts again. No one here is perfect and knows all. (maybe Charles  8))

Title: Re: Tiny Benchmark Test
Post by: Aurel on May 20, 2014, 10:26:37 AM
-the basic principles of working with the Windows device context

Mike...
I found these functions on the net, plus some parts from the DLib user functions,
and they are a combination of both.
So, from that point of view, can you tell me whether my functions are OK now
after adding DeleteObject? I think they work fine now.

Yes Mike, you are right about the arguments being strings.
But I see one strange thing in all these interpreted loops with string
arguments.
There is no big difference whether I have some calculations or some other operations
inside the loop... it looks to me like the speed (time) depends on the number of iterations.
Again, maybe I am completely wrong.

OK, about Ed's toy2, I will open a new topic in Interpreters.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 20, 2014, 11:28:12 AM
Quote
So, from that point of view, can you tell me whether my functions are OK now after adding DeleteObject? I think they work fine now.
Yes Aurel, if you are talking about the entire set of drawing functions in ruben2 after you added my corrections, then I can say that they are OK and they do not cause GDI leaks in this particular program.

In order to revise any other 3rd-party functions before you add them to ruben2, or create your own ones, just memorise a few simple rules of handling Windows device contexts and GDI objects:

1. Release the device contexts that you Get in your code when you are done working with them.

2. Delete the device contexts that you have Create'd when they are no longer needed.

3. Always store the GDI objects that are already present in your device contexts which you have Get'ed (gotten) or Create'd, whenever you use SelectObject() to add a new GDI object to a device context. The device contexts are never empty; even newly created DC's already have their default font, pen, brush, and 1x1 px BMP pre-selected into them.

4. Always restore the old GDI objects before you try to delete the new ones with a call to DeleteObject(). GDI objects including BMP's cannot be deleted if they are still selected into a DC.

5. Some drawing functions, e.g. FillRect(), don't need the respective GDI object (in this case, a brush) to be selected into the device context. Such a GDI object can and should be deleted directly with a call to DeleteObject() when it is no longer needed.

6. When deleting a device context, first re-select all the old GDI objects that you have changed with your calls to SelectObject() and only then delete the DC proper. There's no sense in trying to re-select anything into a DC that's already been deleted.

Simple, eh? :)
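
Rules 3 and 4 above can be sketched with a toy model in portable C. To be clear, this is NOT the Win32 API: struct pen, struct dc, select_pen() and delete_pen() are invented stand-ins that only mimic the documented behaviour, namely that SelectObject() returns the previously selected object and that DeleteObject() fails for an object still selected into a DC.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins, not real GDI types: a DC with a single pen slot and a
   pen that remembers whether it is currently selected. */
struct pen { bool alive; bool selected; };
struct dc  { struct pen *current; };

/* Mimics SelectObject(): installs new_pen and returns the old pen. */
static struct pen *select_pen(struct dc *dc, struct pen *new_pen)
{
    assert(dc && new_pen);
    struct pen *old = dc->current;
    if (old)
        old->selected = false;
    new_pen->selected = true;
    dc->current = new_pen;
    return old;
}

/* Mimics DeleteObject(): returns false (failure) while the pen is
   still selected into a DC, or if it was already deleted. */
static bool delete_pen(struct pen *p)
{
    assert(p);
    if (!p->alive || p->selected)
        return false;
    p->alive = false;
    return true;
}
```

In this model, deleting a pen right after selecting it fails, while restoring the old pen first and then deleting succeeds, which is the same pattern as DeleteObject(SelectObject(hdc, oldPen)).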

Quote
There is no big difference whether I have some calculations or some other operations
inside the loop... it looks to me like the speed (time) depends on the number of iterations.
Again, maybe I am completely wrong.
You simply can't see the real speed with which your integer tokens can run in a Select Case- or in a jump table-based loop as long as everything else works 100 times slower with string arguments. You have to change the entire concept of ruben3 to some decent 3rd-party prototype, perhaps, this toy2 proggy.
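
To make the point concrete, here is a minimal sketch in C of an interpreter that touches strings only once, at load time, and then runs on integers alone. The three-instruction language, the token names and the struct layout are all made up for illustration; none of this is ruben2 or toy2 code.

```c
#include <assert.h>
#include <string.h>

enum token { TOK_ADD, TOK_SUB, TOK_MUL, TOK_UNKNOWN };

/* One pre-parsed instruction: numeric token plus numeric argument. */
struct instr { enum token op; int arg; };

/* Load time: the only place where strings are compared. */
static enum token tokenize(const char *keyword)
{
    assert(keyword);
    if (strcmp(keyword, "add") == 0) return TOK_ADD;
    if (strcmp(keyword, "sub") == 0) return TOK_SUB;
    if (strcmp(keyword, "mul") == 0) return TOK_MUL;
    return TOK_UNKNOWN;
}

/* Run time: the hot loop sees integers only, no string arguments. */
static int run(const struct instr *code, int count, int acc)
{
    assert(code || count == 0);
    for (int i = 0; i < count; i++) {
        switch (code[i].op) {
        case TOK_ADD: acc += code[i].arg; break;
        case TOK_SUB: acc -= code[i].arg; break;
        case TOK_MUL: acc *= code[i].arg; break;
        default: break;             /* unknown tokens are skipped */
        }
    }
    return acc;
}
```

The design choice is that arguments are converted to numbers when the script is loaded, so each loop iteration costs a switch and an arithmetic operation rather than string parsing and allocation.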

Quote
OK, about Ed's toy2, I will open a new topic in Interpreters.
Yes, please do.
Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 20, 2014, 11:12:21 PM

Mike,

I picked up a few issues with the GC. Nothing major. Local string arrays were getting logged into the global GC list, instead of the local one.

And it will no longer be possible to call a function without assigning the result to a variable (except for integer returns).

This is your test code, slightly modified. It's a good test for string memory leakage.


Code: [Select]
$Filename "t.exe"
 Include "$/inc/RTL32.inc"
 Include "$/inc/console.inc"
#Lookahead

'String s = Space(2500)
 String s = String(2500,"A")
 

sys i
printl "Press key to begin" : waitkey
While i<1000
   s=Replace s, "A", "A"
   i++
   printl i
Wend
printl "Press key to end" : waitkey

Function Replace(String t, w, r) As String
   '
   Sys a, b, lw, lr, ls
   string s = t
   '
   ls = len(s)
   lw = Len(w)
   lr = Len(r)
   a = 1
   ' 
   Do
      if a>Len s then Exit Do
      a = InStr(a, s, w)
      If a = 0 Then Exit Do
      s = Left(s, a - 1) + r + Mid(s, a + lw)
      a += lr
   End Do
   Return s
End Function

http://www.oxygenbasic.org/o2zips/Oxygen.zip
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 21, 2014, 03:18:41 AM
Hi Charles,

Yes, your "Must assign to a variable" feature is cool! :D Works OK for both internal and external functions.

But garbage collection doesn't work for strings created in external modules. For example,

$Filename "test.exe"
Include "RTL32.inc"
#Lookahead

Declare Function Spaces Lib "test.dll" Alias "Spaces" (ByVal nBytes As Long) As String

String s

While 1
   s = Spaces(2500)
WEnd

would crash immediately with Spaces() implemented in an external test.dll module like this:

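#include <stdlib.h>
#include <string.h>

char* __stdcall __declspec(dllexport) Spaces(const int nBytes)
{
    char* s = (char*)malloc(nBytes+1); // allocated on the DLL's heap
    memset(s,0x20,nBytes);             // fill with spaces
    s[nBytes]=0;                       // NUL-terminate
    return s;                          // nobody ever frees this block
}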

having exhausted the entire 2GB of XP process heap in a fraction of a second.

Do you think it might be reasonable to make the garbage collector guard against such weird cases as well?


Title: Re: Tiny Benchmark Test
Post by: Charles Pegge on May 21, 2014, 04:10:48 AM
Hi Mike,

Yes, Oxygen assumes that returned char* values (as from GetCommandLine) are constants. It has no way of knowing how they were created, and therefore, if they are not constants, they must be handed back to the host system for recycling.

In your example, Oxygen first frees any content of s then copies from the returned char pointer, into s. The final content of s will be released by the GC. (at the end of eternity :) )

I think most libraries deliver their creations as handles, which are explicitly freed when finished with.
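
As a sketch of that hand-back convention in C: the library that allocates also exports the matching release function, and the caller copies what it needs before handing the original back. lib_spaces(), lib_free() and copy_and_release() are hypothetical names for illustration, not part of any real DLL or of Oxygen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library side: allocates a string of n spaces... */
static char *lib_spaces(size_t n)
{
    char *s = malloc(n + 1);
    if (s == NULL)
        return NULL;
    memset(s, ' ', n);
    s[n] = '\0';
    return s;
}

/* ...and exports the matching release, so the same heap frees it. */
static void lib_free(char *s)
{
    free(s);
}

/* Caller side: copy into caller-owned storage, then hand the
   original back. Returns the number of characters copied. */
static size_t copy_and_release(char *dst, size_t dst_size, size_t n)
{
    assert(dst != NULL && dst_size > 0);
    char *tmp = lib_spaces(n);
    if (tmp == NULL) {
        dst[0] = '\0';
        return 0;
    }
    size_t len = strlen(tmp);
    if (len >= dst_size)
        len = dst_size - 1;         /* truncate to fit */
    memcpy(dst, tmp, len);
    dst[len] = '\0';
    lib_free(tmp);                  /* without this, every call leaks */
    return len;
}
```

Calling copy_and_release() in a loop keeps memory use flat, whereas the While/WEnd loop around Spaces(2500) has no release call to pair with each allocation.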

There is a more subtle problem to chew on:

Uninitialised member strings, must be logged to the correct garbage collection list. Since all Oxygen arrays and structures are lazy, if they are passed byref, the function must log any new strings to the global GC list by default. It cannot use its local GC list, or the string member would evaporate when the function terminates.
Title: Re: Tiny Benchmark Test
Post by: Mike Lobanovsky on May 21, 2014, 07:01:30 AM
Thanks Charles,

Yes, I know my example was totally weird because nobody in their right mind would create a DLL like that. I did it simply to illustrate the idea and hear out your reasons why the GC shouldn't react to cases like that. :)