Author Topic: Tally slow ?  (Read 6551 times)

0 Members and 2 Guests are viewing this topic.

Aurel

  • Guest
Tally slow ?
« on: June 28, 2012, 06:24:04 AM »
Hi Charles...
Here is function Tally(src,del) which count delimiters in string ,
put each delimiter position in array dPos[]
and return number of delimiters.
I tested this function in Ebasic and Purebasic and there work fast but i found that in
Oxygen basic need 2 second for such a simple string like :
FORM 0,0,400,300,@min,0,"NewForm"
What might cause such a slow processing?

Code: [Select]
FUNCTION Tally(STRING Main$,STRING Match$) As INT
    DIM i,j,q,mlen,matchlen As INT
DIM t$ As STRING
    mlen = LEN(Main$)
    matchlen = LEN(Match$)
    i = 1
    j = 0
    q = 0
    IF (mlen = 0) OR (matchlen = 0)
        RETURN j
    END IF
   
    WHILE (1)

t$=MID(Main$,i,matchlen)
IF t$ = CHR(39) THEN q = q + 1
IF q=2 THEN q = 0
        IF t$ = Match$ AND q=0
           j = j + 1
'mem del$ position
dpos[j]=i
        END IF

           i = i + 1
        IF i > mlen
           exit while
        END IF
    WEND
'tbreak:
    RETURN j
END FUNCTION

called like:
cc=Tally (src,",")

Peter

  • Guest
Re: Tally slow ?
« Reply #1 on: June 28, 2012, 06:47:36 AM »
Hi Aurel

No problem!

Code: [Select]
sys dpos[100]

FUNCTION Tally(STRING Main$,STRING Match$) As INT
    DIM i,j,q,mlen,matchlen As INT
DIM t$ As STRING
    mlen = LEN(Main$)
    matchlen = LEN(Match$)
    i = 1
    j = 0
    q = 0
    IF matchlen=0 or mlen=0   
       RETURN j
    END IF
   
    WHILE (1)

t$=MID(Main$,i,matchlen)
IF t$ = CHR(39) THEN q = q + 1
IF q=2 THEN q = 0
        IF t$ = Match$ AND q=0
        j = j + 1
'mem del$ position
dpos[j]=i
        END IF

        i = i + 1
        IF i > mlen
        exit while
        END IF
    WEND
'tbreak:
    RETURN j
END FUNCTION

sys c, string sc="hello, banana, and, apple, man, at, home"
c= Tally sc,","
print c

Aurel

  • Guest
Re: Tally slow ?
« Reply #2 on: June 28, 2012, 08:45:37 AM »
Hi Peter...
It looks that is little bit faster but still is not fast enough( i think..)
Here is complete code (require awinh.inc).
I think that something is not good becuse there is problematic part
with array indexes .
And Charles why array is not zero based by default?
When i force Indexbase 0 then i recive few weird errors in For/loop ? ::)
Code: [Select]
$ Filename "NewWindow.exe"
Include "RTL32.inc"
Include "awinh.inc"
#lookahead ' for procedures
'Indexbase 0
'must be in global scope +++++++++++++++++++++++++
string txt = nuls 500000 'buffer
string lines[500000]   'lines array
sys pt = strptr txt    'buffer base pointer
sys i.j,p              'indexes and pointer
string CRLF=chr(13)+chr(10)
string temp$=""
'++++++++++++++++++++++++++++++++++++++++++++++++
% LR_LOADTRANSPARENT = &H20
% LR_LOADMAP3DCOLORS = &H1000
INT TransparentMap3D = LR_LOADTRANSPARENT or LR_LOADMAP3DCOLORS
INT win,win2
INT winstyle
INT button1,button2,button3,button4,button5
INT edit1,edit2,edit3
INT Lbox,static1,static2,static3,richedit1
INT ed1ID,ed2ID,ed3ID
INT b1ID,b2ID,b3ID,b4ID,b5ID,b6ID,b7ID
INT LboxID = 300
INT st1ID,st2ID
INT reID
b1ID=100
b2ID=101
b3ID=102
b4ID=103
b5ID=104
'-----------------
ed1ID=200
ed2ID=201
ed3ID=202
'----------------
st1ID=350
st2ID=351
st3ID=352
'----------------
reID=400
'loadbmp
'##########################################
INT bmpB1,bmpS1
bmpB1 = LoadImage(0,"data/xpBopen.bmp",0,76,20,16)
bmpS1 = LoadImage(0,"data/xpStatic.bmp",0,82,82,16)
'##########################################
STRING src[15000]
'*******************************************
'************* GLOBALS *********************
'*******************************************
String arg[16]
sys dPos[16]
'*******************************************

winstyle = WS_MINMAXSIZE or WS_CLIPCHILDREN
'create window **************************************************
win = SetWindow("Test o2",100,100,640,400,0,winstyle)
'****************************************************************

'create buttons ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
button1 = SetButton(win,280,14,100,24,"OpenFileDialog",0x50000000,0x200,b1ID)
'set bitmap on button 1
'SendMessage button1 , BM_SETIMAGE, 0, bmpB1
button2 = SetButton(win,390,14,80,24,"FILL->ARRAY",0x50000000,0x200,b2ID)
button3 = SetButton(win,490,14,120,24,"Load scrambled-txt",0x50000000,0x200,b3ID)
button4 = SetButton(win,490,62,84,24,"Get Line Count",0x50000000,0x200,b4ID)
button5 = SetButton(win,490,88,84,24,"RUN",0x50000000,0x200,b5ID)

':::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
'create edit control
edit1 = SetEditBox(win,100,80,380,23,"edit 1",0x50004000,0x200,ed1ID)

'create listbox
Lbox = SetListBox(win,100,140,180,150,"LB 1",0x50000140,0x200,LboxID)

'create static control
static1 = SetStatic(win,10,20,254,16," This is a STATIC text control with EX_CLIENTEDGE",0,0x200,st1ID)
'upper edit
static2 = SetEditBox(win,10,40,600,18,"search for string 'NWF' in source -> scrambled.txt",0x50004000,0x200,st1ID)
'crete static control with bitmap
static3 = SetStatic(win,10,60,82,82,"",0x5000030E,0,st3ID)
SendMessage static3 ,370, 0, bmpS1

'create Rich Edit control
INT reStyle = WS_CHILD|WS_VISIBLE|ES_MULTILINE|ES_WANTRETURN|WS_VSCROLL|WS_HSCROLL|ES_AUTOVSCROLL|ES_AUTOHSCROLL|ES_SUNKEN
'INT reStyle = 0x508010C4
richedit1 = SetRichEdit(win,300,120,300,200,"WINDOW 0,0,400,300,@min,0,'OK,ready!'",reStyle ,0x200,reID)

INT LCount = 0
'---------------------------------------------------------------------
'GoSub AddListboxItems
'--------------------------------------------------------------------

'--------------------------------------------------------------------
'WHILE GetMessage (&wm,0,0,0)<>0
'TranslateMessage &wm
'DispatchMessage &wm
'WEND
sys bRet
  '
  Do While bRet := GetMessage (&wm, 0, 0, 0)
    If bRet = -1 then
      'show an error message
    Else
      TranslateMessage &wm
      DispatchMessage &wm
    End If
  Wend



Function WndProc(byval win as long,byval wMsg as long, byval wParam as long,byval lparam as long) as long callback

SELECT wMsg
'----------------------------

CASE WM_DESTROY
DestroyWindow (win)
PostQuitMessage 0
'-------------------------------------------------------------
CASE WM_COMMAND
controlID = LoWord(wParam) 'get control ID
notifyCode = HiWord(wParam) 'get notification message

Select controlID
   CASE b1ID
If notifycode=0 
'Beep(1660,50)
    'Print "Button 1 Clicked!"
Gosub Test
End If

   CASE b2ID ' fill array fast
     If notifycode=0
  FillArray()
 
End If

  CASE b3ID
If notifycode=0
LoadFromFile()
End If

  CASE b4ID
If notifycode=0
GetLineCount()
End If

  CASE b5ID
If notifycode=0
GetLine()
End If




End Select
'-----------------------------------------------------
END SELECT


FUNCTION = DefWindowProc win,wMsg,wParam,lParam


END FUNCTION

'########################################################
SUB AddListboxItems
bstring i$ : i$=""

For n = 0 To 10
i$ = str(n) + "...." + Str(dpos[n])
SendMessage Lbox,LB_ADDSTRING,0,i$
Next n
END SUB
'#########################################################
SUB SetControlText
bstring n$
n$=""

'loop
For i = 0 To 14000
n$ = src[i]
IF n$ = "WINDOW"
print "Command found AT: "+str(i)
'showw()
exit for
END IF
'SendMessage edit1,WM_SETTEXT,0,n$
'UpdateWindow(edit1)
Next

END SUB
'#########################################################
SUB LoadFromFile

string fName = "scrambled.txt"
bstring source =  getfile fName
SendMessage richedit1,WM_SETTEXT,0,source

if not source
    print fName " file not found"
end if

END SUB

'************************************************************

Sub Test
bstring fR
dir=""
bstring sep=chr 0
filter="All Files"+sep+"*.*"+sep+"Text files"+sep+"*.txt"
'filter="text"+sep+"*.txt"+sep+
       '"basic"+sep+"*.bas;*.o2bas"+sep+
       '"include"+sep+"*.inc"+sep+
       '"header"+sep+"*.h"+sep+
       'sep

title="Test File Opening Dialog"
hwnd=0
fR = FileDialog(dir,filter,title,0,0,"txt")

SendMessage edit1,WM_SETTEXT,0,fR
bstring tx =  getfile fR
SendMessage richedit1,WM_SETTEXT,0,tx

End Sub
'*************************************************************

SUB GetLineCount

LCount = SendMessage richedit1,EM_GETLINECOUNT,0,0
bstring count$ = ""
count$ = Str(LCount)
SendMessage edit1,WM_SETTEXT,0,count$
count$=""
END SUB

'*************************************************************

SUB GetLine
INT Lpos
INT LLen
string LText=""
string pText
'---------------
FillArray()
GetLineCount()
'--------------
FOR Lpos = 0 TO LCount

LText = lines[Lpos]

'check command
IF Left(Ltext,6) = "WINDOW" Then
print "Command WINDOW -> found... "

OpenNewWindow()
exit for
END IF

Next
'serch finished exit for
print "EXIT-FOR: "+ str(Lpos)

'test - load file
'with 12560 lines of text
'read line by line ,finished
'cca 7 seconds

END SUB

'////////////////////////////////////////////////////////////////////


'------ FILL ARRAY FAST ------------------------------
Sub FillArray
print "Start..."


'string CRLF=chr(13)+chr(10)
'init...
SendMessage richedit1,WM_GETTEXT,500000,pt

p=pt 'char pointer
b=0  'left boundary
i=0  'lines array index
j=1  'char index
'
byte a at p 'byte linked to pointer p
'
'splitter loop
'=============
'
do
  select a
  case 0
    exit do
  case 10
    b=j
  case 13
    b++
    i++
    lines[i]=mid(txt,b,j-b)
    b=j
  end select
  p++
  j++   
end do

print "Lines: " i CRLF+
      ">" lines[10122] "<" CRLF+
  "Source Loaded into $Array"

End Sub

'------------------------------------------------------

SUB OpenNewWindow
print "SUB WINDOW"
'***********************************************************
Int cc,nArg,n : nArg=7
String buffer=""
bstring tt  'again problem with conversion so i add bString here
temp$ = lines[0]
temp$=Ltrim(temp$)
temp$=Mid(temp$,7,LEN(temp$)-2)
temp$=Ltrim(temp$)
print "Temp$:" + temp$ : tt=temp$
SendMessage edit1,WM_SETTEXT,0,tt
cc=Tally(temp$,",")
print "Number of delimiters: "+ str(cc)
'show coma positions from array dPos[n]
 AddListboxItems()
 ParseArgs()
'if wrong numOfArgs
IF nArg <>(cc+1)
print "Error:Wrong Number of arguments!"
Goto exitW
END IF
'show arguments
for n = 0 TO nArg
buffer = buffer + "arg."+str(n)+ " " + arg[n] + CRLF
next
print buffer

'***********************************************************
'try open new window >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
winstyle = WS_MINMAXSIZE or WS_CLIPCHILDREN 'or WS_CHILD
'create window - parent HWND_DESKTOP
win2 = SetWindow("NewWindow",400,100,400,300,0,winstyle)
Dim wm2 as MSG

IF win2 <> 0
DO WHILE GetMessage(&wm2,0,0,0)
TranslateMessage &wm2
DispatchMessage &wm2

Select @wm2
  CASE WM_Quit
  DestroyWindow (win2)
  PostQuitMessage 0

End select

WEND

END IF

exitW:
END SUB

'<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
FUNCTION Tally(STRING Main$,STRING Match$) As INT
    DIM i,j,q,mlen,matchlen As INT
DIM t$ As STRING
    mlen = LEN(Main$)
    matchlen = LEN(Match$)
    i = 1
    j = 0
    q = 0
    IF (mlen = 0) OR (matchlen = 0)
        RETURN j
    END IF
   
    do

t$=MID(Main$,i,matchlen)
IF t$ = CHR(39) THEN q = q + 1
IF q=2 THEN q = 0
        IF t$ = Match$ AND q=0
           j = j + 1
'mem del$ position
dpos[j]=i
' j = j + 1
        END IF

           i = i + 1
        IF i > mlen
           exit do
        END IF
    end do
'tbreak:
    RETURN j
END FUNCTION
'////////////////////////////////////////////////////////////////////
SUB ParseArgs
'temp$=> 0,0,400,300,@min,0,'OK,ready!'
STRING t$="":t$=temp$
print t$
INT k,i,j
i = 0
j = 1
k = 1

FOR i = 1 TO LEN(t$)
IF i = dPos[k]

arg[k] = MID(t$,j,(i-j))
   arg[k] = Trim(arg[k])
    'print arg[k]
k=k+1
j = i + 1
END IF
NEXT
'+ extract last argument

arg[k] = MID(t$,j,LEN(t$))
PRINT "ParseArgs-finished..."
temp$=""
END SUB

PS...Just click button ' RUN '

Aurel

  • Guest
Re: Tally slow ?
« Reply #3 on: June 28, 2012, 11:59:11 AM »
I really still don't get it why Tally work slow on my computer?
In Peter example there is no other stuff but... ::)
Extract function ParseArgs work fast but i think that require more operation
with strings than Tally..hmmm

Charles Pegge

  • Guest
Re: Tally slow ?
« Reply #4 on: June 28, 2012, 12:40:09 PM »
Hi Aurel,

In general I would avoid creating strings in this sort of function.
Strings (and division and trig functions ) are the most expensive operations.

Indexbase 1 is a bit easier to use. String character indexing is also based to 1. Naturally professionals prefer indexbase 0 :). I would advise using this directive before dimensioning any arrays. However it can also be placed inside a local scope, like a function or class and it will not effect other parts of the program.

This one should approach assembly code speeds:

Code: OxygenBasic
  1.  
  2. function tally(string main,delim, sys *dp, max) as sys
  3. '=====================================================
  4.  
  5. sys a,p,le,ld,tal,ascd
  6.  
  7. le=len main
  8. ld=len delim
  9. if ld=0 or le= 0 then exit function 'NULL STRINGS
  10. ascd=asc delim
  11. '
  12. p=strptr main 'string pointer
  13. byte *b       'pointered byte
  14. @b=p          'map to main string
  15. do
  16.   if b=0 then exit do
  17.   if b=39 then 'SKIP QUOTE
  18.    do
  19.       @b++
  20.       if b=39 or a<=le then exit do
  21.     end do
  22.     @b++
  23.     continue do  
  24.   end if
  25.   if b=ascd then
  26.     a=@b-p+1
  27.     if a=instr(a,main,delim) then 'CONFIRM MATCH
  28.      if tal>=max then exit do 'ARRAY LIMIT EXCEEDED
  29.      tal++
  30.       dp[tal]=a
  31.       @b+=ld
  32.       continue do
  33.     end if
  34.   end if
  35.   @b++
  36. end do
  37. return tal
  38. end function
  39.  
  40. '====
  41. 'TEST
  42. '====
  43.  
  44. sys dp[1000]
  45.  
  46. '4 delimters  !       !       x      !      !
  47. a=tally "apple//orange// 'pear//drop'// / / //banana", "//", dp,1000
  48.  
  49. pr="positions: "
  50. for i=1 to a
  51.   pr+=" " dp[ i ]
  52. next
  53. print pr
  54.  

Charles
« Last Edit: June 28, 2012, 12:50:12 PM by Charles Pegge »

Peter

  • Guest
Re: Tally slow ?
« Reply #5 on: June 28, 2012, 01:10:31 PM »
Aurel,

try this.
Code: [Select]
sys dpos[1000]

Sub Tally (String Main,String Match)
    sys i,j,q,mlen,matchlen, string t
    mlen = LEN(Main)
    matchlen = LEN(Match)
    i=1: j=0: q=0
    iF matchlen=0 or mlen=0 Then Return    
    WHILE (1)
t=MID(Main,i,matchlen)
IF t = CHR(39) THEN q = q+1
IF q=2 THEN q=0
        IF t = Match AND q=0
        inc j
dpos[j]=i
        END IF
        inc i
        IF i> mlen
        Exit While
        End iF
    WEND
    Return j
End Sub

sys c, string sc="hello,,,banana,,,and,,,apple,,,man,,,at,,,home,,,,,,go outside,,,,,,"
c= Tally sc,","
print c

Charles Pegge

  • Guest
Re: Tally slow ?
« Reply #6 on: June 28, 2012, 02:08:50 PM »

The problem here is that string t is created for every character in the text.

Peter

  • Guest
Re: Tally slow ?
« Reply #7 on: June 28, 2012, 02:35:40 PM »
Code: [Select]
sys dpos[1000]
string t[1000]

Sub Tally (String Main,String Match)
    sys i,j,q,mlen,matchlen  
    mlen = LEN(Main)
    matchlen = LEN(Match)
    i=1: j=0: q=0
    iF matchlen=0 or mlen=0 Then Return    
    WHILE (1)
t[j]=MID(Main,i,matchlen)
IF t[j] = CHR(39) THEN q = q+1
IF q=2 THEN q=0
        IF t[j] = Match AND q=0
        inc j
dpos[j]=i
        END IF
        inc i
        IF i> mlen
        Exit While
        End iF
    WEND
    Return j
End Sub

sys c, string sc="HELLO,HELLO,HELLO,HELLO,HELLO,HELLO,HELLO,HELLO,HELLO,HELLO"
c= Tally sc,"HELLO"
print c

Aurel

  • Guest
Re: Tally slow ?
« Reply #8 on: June 29, 2012, 06:29:30 AM »
Hmm...
I know that string processing is time consuming but in this case something is really
strange and not very good.
I tested this function on DLib and work on speed of light ,dlib is just a bycode interpreter.
Then i try even in mine Abasic and is executed instantly to ???
If such a small and slow interpreter can do this in less then a second what
Oxygen doing  ???
when he need so much time,must be something connected with array filling... ::)

Charles Pegge

  • Guest
Re: Tally slow ?
« Reply #9 on: June 29, 2012, 10:33:32 AM »
Hi Aurel,

I did a timing analysis of your function. It runs at about 1 microsecond per byte on my PC, which is quite a lot. By not pulling strings my function runs around 50x faster.

By far the most expensive operations were the string comparisons. Those 2 lines accounted for 90% of the time. The procedures I use are not optimised. They support the worst case scenario where each side is treated as a full string expression rather than a simple string. So I think I could substantially improve performance here, possibly by a factor of 5.

Charles

Charles Pegge

  • Guest
Re: Tally slow ?
« Reply #10 on: June 30, 2012, 10:36:15 AM »
Well, I've got a 3X improvement here:
« Last Edit: July 01, 2012, 04:08:02 AM by Charles Pegge »

Aurel

  • Guest
Re: Tally slow ?
« Reply #11 on: July 01, 2012, 01:40:16 AM »
Thanks Charles.. :)
I understand what you talking about but is not clear to me how same function work
fast on other compilers which are far slower then oxygen in some other things that quick.
Ok i will try now...thanks again... ;)

Aurel

  • Guest
Re: Tally slow ?
« Reply #12 on: July 01, 2012, 02:32:03 AM »
It looks that now work faster but i found one small catch.
this is not executed correctly:
Code: [Select]
IF t$ = "'" THEN q = q + 1BUT this is correctly executed;
Code: [Select]
IF t$ = chr(39) THEN q = q + 1And work array is filled but still first zero member is empty because i
don't force indexbase to NULL.

Charles i hope that i not bothering you to much with this but
looks really strange such a slow respond on this func.
If you remember you show me way to fill string array with 500000 elements
which is executed like you say in a microsecond...
thanks...

Charles Pegge

  • Guest
Re: Tally slow ?
« Reply #13 on: July 01, 2012, 04:04:44 AM »
Hi Aurel,

I hope I've fixed the bug with this update. I could see what was happening.

When performing string operations, compilers will not give great advantage over interpreters. The low level steps required to process strings will be much the same. So you have to code in a different way to take advantage of compiler performance. Work with integers wherever possible, and use buffers to avoid adding little bits of string together.

I know Oxygen is a little faster than it used to be. I have a program which reads all the words in the bible, builds a word list, sorts them into alphabetic order, and tallies the occurance of each word. 2 years ago it took around 0.1 seconds. Now it is down to around 0.087 seconds on the same PC :)

The bible is over 4 Megs, and a vocabulary of over 13000 words.

Charles

[attachment deleted by admin]
« Last Edit: July 01, 2012, 04:09:55 AM by Charles Pegge »

Aurel

  • Guest
Re: Tally slow ?
« Reply #14 on: July 01, 2012, 04:36:24 AM »
Hi Charles...
I hope that i understand whole problematic( 50% for sure  ;D ).
Quote
So you have to code in a different way to take advantage of compiler performance. Work with integers wherever possible, and use buffers to avoid adding little bits of string together.

Hmm..
Well which way you can suggest me ?...
because im not sure how to do this on quick way as possible?

Yes results with bible text are impressive so something similar I need for this operation,right?

thanks for bugfix.. ;)

Aurel