Author Topic: Sequential file reading (Read 5337 times)

Aurel · « **on:** March 02, 2015, 11:08:18 PM »

Is there any example of that?
I mean reading each line of a file and putting its text into one array element.
Thanks.

JRS · « **Reply #1 on:** March 02, 2015, 11:54:17 PM »

In Script BASIC I use SPLITA and use the line ending character as the delimiter.

Code: Script BASIC

OPEN "mytextfile" FOR INPUT AS #1
text = INPUT(LOF(1), 1)
SPLITA text BY "\n" TO txtarray
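
For readers following along outside Script BASIC, here is a rough C sketch of the same idea: cut the loaded text at every LF. It is not how SPLITA is implemented; `split_on_lf` and `strip_trailing_cr` are hypothetical helper names. It also shows one thing to watch for: a CRLF file split on LF alone leaves a CR at the end of each line.

```c
#include <string.h>

/* Hypothetical helper (not Script BASIC's SPLITA): splits `text` in place
   on '\n' only and stores pointers to the pieces in `out`.
   A final piece without a trailing '\n' still counts as a line; a trailing
   '\n' does not create an extra empty line. Returns the number of lines. */
size_t split_on_lf(char *text, char **out, size_t max_out)
{
    size_t n = 0;
    char *start = text;
    while (*start != '\0' && n < max_out) {
        char *nl = strchr(start, '\n');
        out[n++] = start;
        if (nl == NULL)
            break;              /* last line had no terminator */
        *nl = '\0';             /* cut the line here */
        start = nl + 1;
    }
    return n;
}

/* Removes one trailing CR, which is what a CRLF file leaves behind
   when it is split on LF alone. */
void strip_trailing_cr(char *line)
{
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\r')
        line[len - 1] = '\0';
}
```

Calling `split_on_lf` on a writable copy of the text fills the pointer array; running `strip_trailing_cr` over each entry afterwards makes the result the same for LF and CRLF files.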
 

Aurel · « **Reply #2 on:** March 03, 2015, 02:15:30 PM »

There is no such thing as SPLITA in Oxygen.

So I see that we must detect chr(13) or CRLF and put each line into a string array element?

JRS · « **Reply #3 on:** March 03, 2015, 03:59:52 PM »

Quote from: Aurel

Is there any example of that ?

I assumed you were looking for a generic example for direction. Please specify the BASIC next time, as more than O2 is discussed here on the forum. Thanks!

Aurel · « **Reply #4 on:** March 04, 2015, 11:14:47 PM »

WOW,
look at the screenshot...

I just tried the o2 example LineSpliter.o2bas.
I really didn't know that print txt, with txt as the buffer returned from GetFile(), shows the text file already split into lines?
I must know how this works, because I need it to avoid the function Space().

Charles,
where must I look in the Oxygen source?


Charles Pegge · « **Reply #5 on:** March 05, 2015, 04:42:42 AM »

Hi Aurel,

I forgot all about the line splitter. It was originally written for your benefit.

I have revised it to support the reading of Linux LF terminated lines and Mac CR terminated lines, as well as our native CRLF.

Also cleaner indexing and pointering:

Code: OxygenBasic

 
'=============
'LINE SPLITTER
'=============
 
indexbase 1
string lines[100000]   'static lines array
sys i,j              'indexes
 
 
'data:
'=====
 
'txt=getfile "s.txt"
 
'txt=nuls 100000
'sendMessage richedit1,WM_GETTEXT,100000, strptr txt
'
txt="11
22
33
44
55
66
77"
 
'initial
'=======
'
'
b=1  'start of line
i=0  'lines array index
j=1  'char index
'
byte a at (strptr txt)
'
'splitter loop
'=============
'
do
  select a
  case 0 
    if j-b>0 then
      i++
      lines[i]=mid(txt,b,j-b)
    end if
    exit do
  case 10 'LF TERMINATED LINES
    i++
    lines[i]=mid(txt,b,j-b)
    b=j+1
  case 13 'CR AND CRLF TERMINATED LINES
    i++
    lines[i]=mid(txt,b,j-b)
    if a[2]=10 then @a++ : j++ 'SKIP LF
    b=j+1
  end select
  @a++ 'inc pointer for a
  j++    
end do
 
 
cr=chr(13)+chr(10)
print "Lines:  " i cr +
      "Line 5: " lines[5] + cr +
      "Last:   " lines[i]
 

Aurel · « **Reply #6 on:** March 05, 2015, 12:59:16 PM »

Thank you for that, Charles,
BUT
can you explain to me how the command print txt
can split a loaded ASCII .rub file into lines?

Here is the code:

Code: OxygenBasic

'PRINT SPLITTER (messageBox o2 function)

indexbase 1
string src,source
string txt=nuls 100000 'buffer
string lines[100000]   'lines array
sys pt=strptr txt      'buffer base pointer
sys i,j,p              'indexes and pointer


'test data:
'==========
cr=chr(13)+chr(10)
'lc="abc"+cr
'mid (txt,1)=lc+lc+lc+lc+lc
txt = GetFile "C:\o2Basic\OxygenBasic\array1.rub"
print txt 'look this ????
'IT IS WOW !!! print already split file into lines?
'how is that possible ???


Charles Pegge · « **Reply #7 on:** March 05, 2015, 01:36:52 PM »

Hi Aurel,

Print (MessageBox) recognises CRLFs in the text string. The same is also true with console output.
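
To picture why no splitting is needed, here is a deliberately simplified C model of a text display. It is only a sketch, not the actual MessageBox or console code; `Cursor` and `feed_text` are hypothetical names. The text stays one string, and the control characters simply move the output position.

```c
/* Simplified, hypothetical model of a text display: a cursor with a row
   and a column. No array of lines is ever built. */
typedef struct { int row; int col; } Cursor;

/* Feeds one zero-terminated string to the model and returns the final
   cursor position. CR returns to column 0, LF moves down one row,
   any other character advances one column. */
Cursor feed_text(const char *s)
{
    Cursor c = {0, 0};
    for (; *s != '\0'; ++s) {
        if (*s == '\r')
            c.col = 0;
        else if (*s == '\n')
            c.row++;
        else
            c.col++;
    }
    return c;
}
```

Feeding it a three-line CRLF string leaves the cursor on the third row, although the string was never cut into pieces.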

Aurel · « **Reply #8 on:** March 05, 2015, 02:03:41 PM »

Hi Charles,
OK, so can I see the source code of print?
Or, in other words, is that code or method the same as the given function?
I need exactly what print does in this case:
split the file into lines, then put each line into a string array element.
Must I look into o2lexi or somewhere else?

Charles Pegge · « **Reply #9 on:** March 05, 2015, 03:24:20 PM »

Oxygen does not use line arrays. Instead, it encodes the physical line number into the code source text, which is one large string. This ensures that the physical line number passes through several compiler stages, which have different sets of lines.

Aurel · « **Reply #10 on:** March 05, 2015, 10:45:11 PM »

Quote

Oxygen does not use line arrays. Instead, it encodes the physical line number into the code source text, which is one large string. This ensures that the physical line number passes through several compiler stages, which have different sets of lines.

OK Charles,
but then, what should I use to emulate the splitting that print does?
I am confused this time...

Charles Pegge · « **Reply #11 on:** March 06, 2015, 02:06:30 AM »

Hi Aurel,

Here is a more familiar, though slightly less efficient line splitter using asc(txt,j), instead of a byte pointer.

Code: OxygenBasic

 
'=============
'LINE SPLITTER
'=============
 
indexbase 1
string lines[100000]   'static lines array
sys i,j              'indexes
 
 
'data:
'=====
 
'txt=getfile "s.txt"
 
'txt=nuls 100000
'sendMessage richedit1,WM_GETTEXT,100000, strptr txt
'
txt="11
22
33
44
55
66
77
"
 
'initial
'=======
'
'
b=1  'start of line
i=0  'lines array index
j=1  'char index
'
'
'splitter loop
'=============
'
do
  select asc(txt,j)
  case 0
    if j-b>0 then
      i++
      lines[i]=mid(txt,b,j-b)
    end if
    exit do
  case 10 'LF TERMINATED LINES
    i++
    lines[i]=mid(txt,b,j-b)
    b=j+1
  case 13 'CR AND CRLF TERMINATED LINES
    i++
    lines[i]=mid(txt,b,j-b)
    if asc(txt,j+1)=10 then j++ 'SKIP LF
    b=j+1
  end select
  j++    
end do
 
 
cr=chr(13)+chr(10)
print "Lines:  " i cr +
      "Line 5: " lines[5] + cr +
      "Last:   " lines[i]
 

Mike Lobanovsky · « **Reply #12 on:** March 06, 2015, 02:51:44 AM »

Aurel,

Every text file is just one long contiguous string of characters. Where it ends is determined by the file length recorded by the file system; in addition, some older DOS and Windows text files end with a Ctrl+Z character (ASCII code decimal 26), which text-mode readers treat as an end-of-file (EOF) marker.

A file loading function would typically load this contiguous string of characters into one contiguous memory string of characters with only one addition: it would put a zero after the last character to mark the end of an ASCIIZ string (Z here stands for "zero-terminated"). Unicode string loading requires more metamorphoses, and I won't discuss it here for simplicity.
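
As a rough illustration of that loading step, here is a minimal C sketch. `load_text` is a hypothetical name, not Oxygen's GetFile or any FBSL function, and the sketch assumes a platform where seeking to the end of a binary stream reports the file size, which is true of the mainstream ones.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at `path` into a freshly allocated buffer and
   appends a terminating zero. Returns NULL on failure; the caller frees
   the result. Binary mode is used so CR and LF arrive untranslated. */
char *load_text(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    char *buf = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);
            if (buf != NULL) {
                size_t got = fread(buf, 1, (size_t)size, f);
                buf[got] = '\0';    /* the "Z" in ASCIIZ */
            }
        }
    }
    fclose(f);
    return buf;
}
```

The returned buffer is exactly the kind of contiguous, zero-terminated string that the rest of this post talks about.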

In both the file character string and the corresponding memory ASCIIZ string, separate lines of text as we see them in a book are marked with end-of-line (EOL) markers that differ depending on the operating system the text file was created in. The EOL marker would be LF ("line feed", ASCII code decimal 10) in Linux, CR ("carriage return" a.k.a. "caret return", ASCII code decimal 13) in classic Mac OS (Mac OS X itself uses LF), and CRLF (two characters, "carriage return"+"line feed") in Windows. There can be other markers too if the text was created in some rich text editor like, say, MS Word. For example, there may be the so-called "soft line breaks" used for finer line formatting; those would usually be denoted with three sequential characters, CRCRLF.

The existing implementations of Windows system services and controls like console windows, message boxes, multiline edit and rich edit controls do not need to split the ASCIIZ text string into any further separate lines of text. They just print the text character by character until they see a CR character that makes them return the caret to the beginning of the line, and a LF character that makes them "feed the line", i.e. scroll the text one line up, and put the caret one line down on a blank line. The caret (a.k.a. "carriage") is the term to denote a point at which the next "printable" character (that is, a character meaningful to the human reader) is about to be printed. In Windows, it is usually denoted with a blinking underline as in the console window, or a "beam-like" (i.e. capital I-shaped) cursor as in edit and rich edit box windows.

If for any reason you would like to split this contiguous ASCIIZ text string into separate text lines, then you will need a function that would scan the text string characters one by one in a loop. Then if the current character is "printable", the function would append it to the current line string in your line array, and if the current character is one of CR and LF characters, the function would stop appending to the current line string, proceed to the next line string in the array, and start appending the incoming "printable" characters to this next line string. The loop runs until the terminating zero character is met in the original ASCIIZ text string.

This is exactly what Charles' LINE SPLITTER function does. There is no other way to implement this functionality. John's SplitA(), my ArrayFromFile(), and all sorts of BASIC line-by-line text file reader functions are built around a similar splitter.
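
For comparison, the scanning loop described above can be sketched in C roughly like this. It is an illustration only, not Charles' code or the internals of any BASIC; `LineSpan` and `split_lines` are hypothetical names. Instead of copying characters, it records where each line starts and how long it is.

```c
#include <stddef.h>

/* One line, described as a slice of the original text. */
typedef struct { const char *start; size_t len; } LineSpan;

/* Scans zero-terminated `txt` once and records up to `max` lines in `out`.
   LF, CR and CRLF each end a line. Text after the last terminator counts
   as a final line only if it is not empty. Returns the number of lines
   stored. */
size_t split_lines(const char *txt, LineSpan *out, size_t max)
{
    size_t n = 0;
    const char *b = txt;            /* start of the current line */
    const char *p = txt;            /* scan position */

    for (;; ++p) {
        if (*p == '\0') {           /* terminating zero: stop */
            if (p > b && n < max) {
                out[n].start = b;
                out[n].len = (size_t)(p - b);
                n++;
            }
            return n;
        }
        if (*p == '\n' || *p == '\r') {
            if (n == max)
                return n;           /* output array is full */
            out[n].start = b;
            out[n].len = (size_t)(p - b);
            n++;
            if (*p == '\r' && p[1] == '\n')
                ++p;                /* CRLF: skip the LF */
            b = p + 1;
        }
    }
}
```

Because the spans point into the original string, nothing is allocated; copy each span out only if the lines must outlive the buffer.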

Aurel · « **Reply #13 on:** March 06, 2015, 04:55:09 AM »

Thanks Charles,

OK, I will try the last method, even though I understand the byte-pointer method very well too.
Still ..i have feeling that method how command print split file is a sort of secret
or maybe i am wrong?

Here is the code, which I can copy/paste from my browser:

Code: OxygenBasic

'=============
'LINE SPLITTER
'=============

indexbase 1
string lines[100000]   'static lines array
sys i,j              'indexes


'data:
'=====

'txt=getfile "s.txt"

'txt=nuls 100000
'sendMessage richedit1,WM_GETTEXT,100000, strptr txt
'
txt="11
22
33
44
55
66
77
"
'initial
'=======
'
'
b=1  'start of line
i=0  'lines array index
j=1  'char index
'
'
'splitter loop
'=============
'
do
  select asc(txt,j)
  case 0
    if j-b>0 then
      i++
      lines[i]=mid(txt,b,j-b)
    end if
    exit do
  case 10 'LF TERMINATED LINES
    i++
    lines[i]=mid(txt,b,j-b)
    b=j+1
  case 13 'CR AND CRLF TERMINATED LINES
    i++
    lines[i]=mid(txt,b,j-b)
    if asc(txt,j+1)=10 then j++ 'SKIP LF
    b=j+1
  end select
  j++    
end do

 
cr=chr(13)+chr(10)
print "Lines:  " i cr +
      "Line 5: " lines[5] + cr +
      "Last:   " lines[i]

Mike,
there is no need for such a huge explanation, I am not from yesterday, OK?

but maybe you can show internal code of ArrayfromFile() ...

Mike Lobanovsky · « **Reply #14 on:** March 06, 2015, 05:48:51 AM »

No Aurel,

You are not from yesterday. You must be from Mars if, after my huge explanation, you're still having problems with "how command print split file".

[EDIT]

And oh,

Quote

... maybe you can show internal code of ArrayfromFile() ...

Are you sure you could use one?

Code: C

FBSL_FUNCTION(Array_FromFile)
{
        /* parameter 0: file name, parameter 1: line buffer length */
        LPFBVARIANT v = call_get_param_str(0);
        int len = call_get_param_int(1);

        FILE *infile = _fsopen( v->strVal, "r", SH_DENYNO );
        if (!infile) {
                g_errStr = v->strVal;
                return ERR_CALL_UNABLE_TO_OPEN_FILE;
        }

        char *szLine = my_alloc_az(len);

        /* first pass: count the lines */
        int j = 0;
        while (fgets(szLine, len, infile)) j++;

        /* create the result array with one element per line */
        static FBARRAYBOUND ab;
        ab.LBound = g_optionBase;
        ab.UBound = ab.LBound + j - 1;
        *pvret = FBSL_newArray_ab(call->script, call->where, 0, &ab, 0, NULL);
        LPFBVARIANT pData = (*pvret)->parray->pData;

        FBVARIANT e;
        int k = 0;

        /* second pass: copy each line, minus its trailing newline, into the array */
        rewind(infile);
        while( fgets(szLine, len, infile) ) {
                j = strlen(szLine);
                if (j && szLine[j-1] == '\n')
                        szLine[--j] = 0;
                mkStrFbVariant(&e, szLine, j);
                copyFbVariant(&pData[k++], &e);
        }
        fclose(infile);
        my_free_ref(szLine);

        return 0;
}
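
For anyone who wants to try the same two-pass idea without the FBSL internals, here is a rough standalone approximation in plain C. `read_lines` is a hypothetical name, and the FBSL variant and array types are replaced by an ordinary array of strings, so treat it as a sketch of the pattern and not as a port.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Two-pass reader: pass 1 counts fgets() chunks, pass 2 stores them.
   `len` is the chunk buffer size, so a physical line longer than len-1
   characters arrives as several entries. Returns a malloc'ed array of
   malloc'ed strings and sets *count; returns NULL on failure or for an
   empty file. */
char **read_lines(const char *path, int len, size_t *count)
{
    *count = 0;
    if (len < 2)
        return NULL;
    FILE *f = fopen(path, "r");
    if (!f)
        return NULL;
    char *buf = malloc((size_t)len);
    if (!buf) {
        fclose(f);
        return NULL;
    }

    size_t n = 0;
    while (fgets(buf, len, f))                  /* pass 1: count */
        n++;

    char **lines = n ? calloc(n, sizeof *lines) : NULL;
    size_t k = 0;
    if (lines) {
        rewind(f);
        while (k < n && fgets(buf, len, f)) {   /* pass 2: store */
            size_t j = strlen(buf);
            if (j && buf[j - 1] == '\n')
                buf[--j] = '\0';                /* drop the newline */
            lines[k] = malloc(j + 1);
            if (!lines[k])
                break;
            memcpy(lines[k], buf, j + 1);
            k++;
        }
    }
    fclose(f);
    free(buf);
    *count = k;
    return lines;
}
```

The caller frees each string and then the array. Opening in text mode ("r") lets the C runtime on Windows turn CRLF into a single '\n' before `fgets` sees it, which is why only '\n' is stripped here.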
