Topic: A helper tool for indenting source code

Arnold

  • Guest
A helper tool for indenting source code
« on: June 21, 2019, 12:11:54 PM »
Hi Charles,

Your indenter.o2bas in \tools\CodeMan is fascinating. Since I will need some of its functions for another little project, I de-ooped the code. This is certainly not the safest way, but it helped me understand better what happens. By comparing some test files such as anwinh.inc, rtl64.inc and o2scm.o2bas, I added some modifications to function spacer() and sub indent(). The simple checks in sub indent() should probably already be done in function spacer(), but perhaps the modifications so far can give some ideas for indenter.o2bas. I stored the compiled exe file in my environment path, and it will help me format my future code.

Roland

This is the code of CodeIndent.o2bas, formatted with CodeIndent:

Code:
'This is the code of \tools\CodeMan\Indenter.o2bas
'Not OOP
'will not work with unicode text files

$ filename "CodeIndent.exe"
'uses rtl32
'uses rtl64


'FORMATTING SOURCE CODE
'======================
'Modified code to run as a console application
'Multiline comments are not formatted
'line breaks are coded as chr(13,10) generally
'Tabs - chr(9) - are replaced with 4 spaces
'Some simple checks for:
'function=, ==== and ---- blocks (not foolproof)
'Option to change Left Margin and Step Size (default 2, 3)

#console
% TryAttachConsole

uses console

! GetLastError lib "kernel32.dll"
! SetLastError lib "kernel32.dll"

function replace(string t,w,r) as string    'parseutil.inc
========================================
   '
   sys a,b,lw,lr
   string s=t
   '
   lw=len(w)
   lr=len(r)
   a=1
   do
      a=instr(a,s,w)
      if a=0 then exit do
      s=left(s,a-1)+r+mid(s,a+lw)
      a+=lr
   end do
   return s
end function


================
'word management
================

sys    bb   'starting position
sys    bc   'starting position of word
sys    aa   'ascii code of word
sys    le   'length of word
string w    'temp word

sub skiplspace(s as string, i as sys)
=====================================
   sys a
   if i>len(s) then exit sub
   do
      a=asc s,i
      select a
      case 0         : exit do
      case 13        : exit do 'exit at end of a line
      case 10        : exit do
      case 33 to 255 : exit do 'exit at start of word   33 = !
      end select
      i++
   end do
end sub

function getword(string s, sys*b) as string
===========================================
   sys a,c,d
   a=0
   bb=b                                           'bb=starting position
   do
      c=asc s,b
      select c
      case 0, 33 to 255 : exit do
      end select
      b++
   end do
   bc=b                                           'bc=starting position of word
   aa=c                                           'aa=ascii code of word
   select c
   case 34,96 'CAPTURE QUOTES " `
      do
         b++
         d=asc s,b
         if d=0 or d=c then b+=1 : jmp fwd done
      end do
   end select
   do
      c=asc s,b
      select c
      case 0 to 32    : exit do 'white space
      case 35,95      :         ' # _
      case 48 to 57   :         'numbers
      case 65 to 90   :         'capitals
      case 97 to 122  :         'lower case
      case 127 to 255 :         'higher ascii
      case else
         if b=bc then b+=1      ' self-terminating symbol,   'bc=starting position of word
         exit do
      end select
      b++
   end do
   '
   done:
   '
   le=b-bc                      'bc=starting position of word  'le=length of word
   if le then return mid s,bc,le
end function

function instrword(sys i,string s,k) as sys   'find pos of word k in string s
===========================================
   'i=index, s=text, k=word to find
   do
      w=getword s,i                'w=temporary word
      if aa=0 then return 0        'aa=ascii code of word
      if w=k then return bc        'bc=starting position of word
   end do
end function

function instrwordce(sys i,string s,k) as sys     ' not used?
=============================================
   do
      w=getword s,i                'w=temporary word
      if aa=0 then return 0        'aa=ascii code of word
      if aa=58 then return i       ' 58 = :
      if w=k then return i         'i=new index
   end do
end function



================
'buffer handling
================
'
string br    'buffer string
sys    lb    'length of buffer
sys    chars 'accum char count

function buf_get() as string
============================
   return left br,chars
end function

sub append(string s)
====================
   sys a,le
   le=len s
   a=le+chars
   if a>lb
      br+=space(8192) 'stretch
      lb+=8192
   end if
   mid (br,chars+1)=s
   chars+=le
end sub
'


==============
' indenting
==============
'
'SETTINGS
sys     instep  'indent step size
sys     lmargin 'left margin
'SHARED STATES
sys     lcount  'line counter                           - Purpose?
sys     dcount  'indent counter (test)                  - Purpose?
sys     embcmt  'embedded comment flag /* .. */
sys     indi    'indent relative
sys     indo    'indent accumulator

function spacer(string t,s1,s2,sys n) as sys    't=(lowercase) text line, s1,s2=word1,word2, n=next line indent count
============================================
   sys a,b,c,le,i=1          'a=pos of some ascii chars, b=block, c(?), le(?), i=index

   if left(t,3)="sub" then
      if instr(t, "(") = 0 then
         if instr(t,",") > 0 then
            'assembler instruction
            return 1
         end if
      end if
   end if

   b=instrword i,t,s1        'find first word

   if b=1 'start of word
      n*=instep 'create a new indent for the next line
      if s2
         b=instrword i,t,s2 'block closing on same line  , find second word
      else
         b=0                 'no end of block
      end if
      '
      if b then 'check for commented out end-blocks
         if s2="then" then
            b+=4
            skiplspace t,b
            select asc t,b
            case 0,39,59 : indi=n    ' chr(0), ', ;
            case 47                  '/
               if mid(t,b,2)="//" then indi=n
               if mid(t,b,2)="/*" then indi=n
            end select
         end if
         a=instrword 1,t,"'" : if a>0 and a<b then indi=n
         a=instrword 1,t,";" : if a>0 and a<b then indi=n
      else
         indi=n
      end if 's2
      if n=0 then indo-=instep : indi=instep 'else elseif case
      if indi<0 then indo+=indi : indi=0
      if indo<0 then indo=0 'clamp min indentation
      'when indi>0 then the indent will be applied to the next line
      return 1
   end if 's1
end function

function clean(string s) as string
==================================
   sys a,i,e
   string bu=s
   e=len bu
   do
      i++
      if i>e then exit do
      a=asc bu,i
      select a
      case 13,10   : 'do not replace
      case 0 to 31 : mid bu,i," " 'replace with space
      end select
   end do
   return bu
end function

sub indent(string dat)
======================
   '
   string cr=chr(13,10)
   string tab=chr(9)
   sys    a,b  'temp
   sys    ex   'exit flag
   sys    pu=1 'next line position
   string s    'line string
   string t    'line string lowercase
   string bu   'cleaned dat
   '
   bu=clean dat
   do
      if ex then exit do
      a=instr pu,bu,cr
      if a=0 then ex=1 : a=1+len bu
      '
      s=mid bu,pu,a-pu  'extract line
      pu=a+len(cr)      'skip cr
      '
      do 'redocmt:
         '
         s=ltrim rtrim s 'remove previous indents and endspace
         t=lcase s       'view lowercase for keyword matching
         '
         if embcmt       'Comment flag
            a=instr t,"*/"
            if a
               embcmt=0
            end if
            exit do
         else
            if asc(t)=47                 ' /
               if asc(t,2)=42            ' *
                  b=instr(3,t,"*/")
                  if b
                     t=mid(s,b+2)
                     s=space(lmargin+indo)+left(s,b+1)   'indo=indent accumulator
                     append s
                     s=t
                     continue do 'look at rest of line again
                  else
                     embcmt=1
                  end if
               end if
            end if
         end if
         exit do
      end do

      '
      if embcmt=0 then                  'not in multiline comment
         if     spacer(t, "(", ")",1)
         elseif spacer(t, "{", "}",1)
         elseif spacer(t, "select", "",1)
         elseif spacer(t, "sub", "",0)
         elseif spacer(t, "function", "",0)
         elseif spacer(t, "method", "",-1)
         elseif spacer(t, "methods", "",-1)
         elseif spacer(t, "typedef", "",-1)
         elseif spacer(t, "type", "",0)
         elseif spacer(t, "class", "",-1)
         elseif spacer(t, "macro", "",-1)
         elseif spacer(t, "def", "",-1)
         elseif spacer(t, "bind","",-1)
         elseif spacer(t, "declare","",-1)
         elseif spacer(t, "scope", "",1)
         elseif spacer(t, "if", "then",1)
         elseif spacer(t, "elseif", "",0)
         elseif spacer(t, "else", "",0)
         elseif spacer(t, "case", "",0)
         elseif spacer(t, "for", "next",1)
         elseif spacer(t, "do", "",1)
         elseif spacer(t, "loop", "",-1)
         elseif spacer(t, "while", "",1)
         elseif spacer(t, "end", "",-1)
         elseif spacer(t, "endif", "",-1)
         elseif spacer(t, "wend", "",-1)
         elseif spacer(t, "next", "",-1)
         elseif spacer(t, "endsel", "",-1)
         elseif spacer(t, "}", "",-1)
         end if
      end if
      '
      if s=""
         s=cr
      else
         s=space(lmargin+indo)+s+cr             'indo=indent accumulator
         's=string(lmargin+indo," ")+s+cr
      end if


      'only simple checks
      if lcase(mid(s,lmargin+indo+1,11)) = "function  =" then  s = space(instep) & s
      if lcase(mid(s,lmargin+indo+1,10)) = "function =" then   s = space(instep) & s
      if lcase(mid(s,lmargin+indo+1,9))  = "function="  then   s = space(instep) & s
      string tmp_space=space(instep)
      string tmp_s=tmp_space & "==="
      if mid(s,lmargin+1,len(tmp_s)) = tmp_s then s=space(lmargin) & mid(s,lmargin+instep+1)
      tmp_s=tmp_space & "'==="
      if mid(s,lmargin+1,len(tmp_s)) = tmp_s then s=space(lmargin) & mid(s,lmargin+instep+1)
      tmp_s=tmp_space & "---"
      if mid(s,lmargin+1,len(tmp_s)) = tmp_s then s=space(lmargin) & mid(s,lmargin+instep+1)
      tmp_s=tmp_space & "'---"
      if mid(s,lmargin+1,len(tmp_s)) = tmp_s then s=space(lmargin) & mid(s,lmargin+instep+1)
      if left(ltrim(s),1)=")" then indo-=indi-instep : if indo<0 then indo=0

      '
      if asc(s) != 0 then 'Win64 needs this sometimes?
         append s
         indo+=indi                        'indi=indent relative     'indo=indent accumulator
         if indi>0 then dcount+=1
         indi=0
         lcount+=1
      end if
   end do
end sub
'

sub indent_file()
   string Fname, fn, ext, NewFname, ans
   string txt
   int x
   string cr=chr(13,10)
   string tab=chr(9)

   '
   'Clear
   txt=""
   br = ""   'buffer string
   lb = 0    'length of buffer
   chars = 0 'accum char count

   SetConsoleTitle "OxygenBasic Source Code Formatter"

   cls
   print "--- Expects correctly running code ---"
   printl "Filename to format? "
   Fname = input()
   Fname = ltrim rtrim(Fname)
   if len(Fname)=0 then printl "No valid Filename" : exit sub
   for x=len(Fname) to 1 step -1
      if mid(Fname,x,1)="." then exit for
   next
   if x>0 then
      fn=left(Fname,x-1)
      ext=mid(Fname,x+1)
      'simple check
      if lcase(ext)="exe" then printl "Exe files cannot be formatted" : exit sub
      if lcase(ext)="dll" then printl "Dll files cannot be formatted" : exit sub
      NewFname=fn & "_out." & ext
   else
      NewFname=Fname & "_out"
   end if

   'load filename
   txt=getfile Fname
   if len(txt)=0 then printl "Error loading file: " Fname : printl : exit sub

   'replace tab with "    "
   txt=replace(txt,tab,"    ")

   'perhaps unix encoded - change to dos
   string unix_cr=chr(10)
   if instr(txt,cr) = 0 then
      'no cr found, probably unix file
      txt=replace(txt,unix_cr,cr)
   end if

   string num
   int n
   printl "Left Margin (0-5, default = 2) "
   num=rtrim ltrim input()
   if len(num)=0 then
      lmargin=2
   else
      n=val(num)
      if n<0 then
         lmargin=0
      elseif n>5 then
         lmargin=5
      else
         lmargin=n
      end if
   end if
   print "Left Margin = " lmargin

   printl "Indent step size (0-5, default = 3) "
   num=rtrim ltrim input()
   if len(num)=0 then
      instep=3
   else
      n=val(num)
      if n<0 then
         instep=0
      elseif n>5 then
         instep=5
      else
         instep=n
      end if
   end if
   print "Indent step size  = " instep
   printl

   'Indent and save
   indent txt

   SetLastError 0
   putfile NewFname, buf_get
   'Check if successful
   int err=GetLastError
   if err=183 then err=0 'file already exists, will be overwritten
   if err then mbox "Error: Cannot write to file "+ NewFname : exit sub

   'printl : print buf_get

   printl
   printl "Done. Saved as " NewFname : printl
   printl "Reminder:"
   printl "Some items must be checked separately for possible indenting e.g."
   printl "===... , '===..., ---...,'---.... or similar blocks"
   printl "function = ... statements"
   printl "<label>: identifiers , can be found searching for goto, gosub"
   printl "sub as an assembler statement"
   printl "code with asm statements might need adaptation"
   printl "indenting after def, typedef et al. might not be accurate"

end sub


string answer
indent_file()

while 1
   printl
   printl "Format another file? (y/n) "
   loop1:
   answer=ltrim rtrim(input()) : if lcase(answer)="y" then indent_file()
   if lcase(answer)="n" then exit while
wend


Mike Lobanovsky

  • Guest
Re: A helper tool for indenting source code
« Reply #1 on: June 21, 2019, 01:21:34 PM »
Hi Roland,

That's a nice attempt that could grow into a really handy tool for the entire community, especially if integrated directly into some existing O2 editor.

Yet IMHO your code beautifier script is far from perfect in that its output still looks wild from a canonical point of view. To make it minimally usable, it should:

0. Enforce grammatically correct spacing around every delimiter in the language, except inside string literals. For example, =, +, -, *, /, +=, -=, etc. should always have exactly one space both before and after the delimiter, while commas, colons and semicolons should have no whitespace before the delimiter and exactly one space after it. This holds regardless of whether Frenchmen tend to use spaces before their punctuation signs or not, simply because it looks wild, barbaric and illiterate to the rest of the civilized world.

1. Ensure the user has an option to choose between tab and space characters for the leading indentation, and also to set the exact number of space characters per tab if the user prefers to see spaces instead of tab characters when reformatting someone else's mess.

2. Provide an option to automatically erase all whitespace on visibly empty lines, and also all whitespace at the end of both commented and uncommented lines.

3. Etc., etc., etc.

Note that three-space tabbing, no spaces before and/or after function arguments, commas, assignments, etc. will never be allowed in team development practices by a team lead who claims to be in his right senses, regardless of whether it is Charles' or someone else's personally preferred coding style. ;)
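
As a rough illustration of points 0 and 2 above, here is a minimal Python sketch, not taken from CodeIndent or indenter.o2bas. The names `OPERATORS`, `TOKEN` and `normalize_spacing` are hypothetical. It assumes that `"` and backquotes delimit string literals and that `'` starts a comment, and it deliberately leaves plain `-`, `*` and `/` untouched, because unary minus and `//` or `/*` comments would need a real lexer.

```python
import re

# Two-character operators are listed before their one-character prefixes.
# Plain -, * and / are left out on purpose (unary minus, // and /* comments).
OPERATORS = ["+=", "-=", "*=", "/=", "<=", ">=", "<>", "==", "=", "+", "<", ">"]

TOKEN = re.compile(
    r'"[^"]*"?'                  # double-quoted literal (may be unterminated)
    r"|`[^`]*`?"                 # back-quoted literal
    r"|'.*"                      # comment up to the end of the line
    r"|" + "|".join(re.escape(op) for op in OPERATORS) +
    r"|,"                        # comma
    r"|\s+"                      # run of whitespace
    r"|[^\s\"`',=+\-*/<>]+"      # names, numbers and other plain text
    r"|."                        # any leftover single character
)

def normalize_spacing(line):
    """Put one space around the listed operators and after commas,
    leaving string literals, comments and the leading indent untouched."""
    body = line.lstrip()
    indent = line[:len(line) - len(body)]
    out = ""
    for tok in TOKEN.findall(body):
        if tok in OPERATORS:
            out = out.rstrip() + " " + tok + " "
        elif tok == ",":
            out = out.rstrip() + ", "
        elif tok.isspace():
            if not out.endswith(" "):
                out += tok       # keep other spacing as written
        else:
            out += tok
    return indent + out.strip()  # strip() also drops trailing whitespace

print(normalize_spacing('   a+=b ,c : s="x=1,y" \'c=d   '))
```

The tokenizer approach keeps the spacing rules away from literals and comments. A tool built on the compiler's own lexer would be more reliable than this regular expression.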

Arnold

  • Guest
Re: A helper tool for indenting source code
« Reply #2 on: June 21, 2019, 03:22:50 PM »
Hi Mike,

Of course you are right with your arguments. But this was my first attempt to understand the inner workings of indenter.o2bas and to explore a possible extension. After I noticed that some statements would be better placed in function spacer(), I decided to take a little break in order to examine the results of CodeIndent with some more source code files (and perhaps to have Charles point out some logical errors in my code).
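
The general idea of keyword-driven indenting can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the logic of spacer() in indenter.o2bas. The keyword sets and the names `opens_block` and `indent_lines` are hypothetical, and only the first word of each line is inspected.

```python
OPENERS = {"if", "for", "do", "while", "select", "function", "sub"}
CLOSERS = {"end", "endif", "next", "wend", "loop"}
MIDDLES = {"else", "elseif", "case"}   # dedent this line, indent the next

def opens_block(words):
    """An opener starts a block unless it is a single-line 'if ... then stmt'."""
    if words[0] != "if" or "then" not in words:
        return True
    rest = words[words.index("then") + 1:]
    return not rest or rest[0].startswith("'")   # only a comment follows

def indent_lines(lines, step=3, margin=2):
    level = 0
    out = []
    for raw in lines:
        text = raw.strip()                 # drop old indent and trailing spaces
        words = text.lower().split()
        first = words[0] if words else ""
        if first in CLOSERS or first in MIDDLES:
            level = max(level - 1, 0)      # applies to the current line
        out.append(" " * (margin + level * step) + text if text else "")
        if first in MIDDLES or (first in OPENERS and opens_block(words)):
            level += 1                     # applies from the next line on
    return out

sample = ["do", "if a=0 then exit do", "if b then", "a+=1", "else",
          "a-=1", "end if", "end do"]
print("\n".join(indent_lines(sample)))
```

Keeping one level counter that is changed before the line is emitted (closers) or after it (openers) mirrors the split between a current indent and a pending indent for the next line.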

My main goals with this project were to make sure that the code files generally use chr(13,10) for line breaks (Unix files do not seem to be processed) and to avoid tabs, which look so different in different editors. This could be made switchable as an option. At the moment there are only the options for Left Margin and a general Indentation Step. I suppose more options could be added, but then this should probably be a Windows app.
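
The two normalisation goals can be sketched in Python as follows. This is a minimal illustration, with `normalize_text` and `tab_width` as hypothetical names. Unlike the posted CodeIndent code, which converts chr(10) only when no chr(13,10) is found anywhere in the file, this sketch also handles files with mixed line endings.

```python
import re

def normalize_text(text, tab_width=4):
    """Convert every line break to CR+LF and every tab to spaces."""
    # CRLF is listed first so an existing CR+LF pair is not doubled.
    text = re.sub(r"\r\n|\r|\n", "\r\n", text)
    # Fixed replacement, as in CodeIndent, rather than alignment to tab stops.
    return text.replace("\t", " " * tab_width)

print(repr(normalize_text("a\n\tb\r\nc\r")))
```

When reading and writing files from Python, the file should be opened with `newline=""` (or in binary mode) so that Python does not translate the line breaks itself.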

For a first basic indenting pass, the CodeIndent app is certainly helpful. With my own files I noticed that the result is sometimes a bit more compact, because spaces at the end of a line are removed. But of course there are more aspects that should be considered. It would also be interesting to see how to treat Unicode text files.

Is there some basic information available about formatting source code in general? I tried to find some rules on the Internet, but it seems that each language has its own policy.

Roland

Charles Pegge

  • Guest
Re: A helper tool for indenting source code
« Reply #3 on: June 23, 2019, 02:05:04 AM »
Hi Roland and Mike,

I'm going to make the o2 lexing layers modular (lexa.inc and lexi.inc), so they can be used by source-code processing tools following the same reading rules as the compiler. This should make it easier to space names, operators and punctuation according to style.

The unicode question is an interesting one. o2 currently converts Unicode scripts to Ansi, encoding utf-16 characters, and stripping out comments. As this is a destructive process, it won't work well for source-code tools. The ideal solution would be to make the lexers work with wide characters instead of Ansi bytes.
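
For a source-code tool, a non-destructive alternative is to decode the file into characters, process it, and write it back in the original encoding. The following Python sketch shows that round trip; it is an illustration under stated assumptions, not how o2 works. The names `BOMS`, `decode_source` and `encode_source` are hypothetical. Files without a byte order mark are treated as single-byte text, and UTF-32 is not considered.

```python
import codecs

# Byte order marks that are recognised, with the matching codec name.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_source(data):
    """Return (text, encoding, bom) for raw file bytes."""
    for bom, enc in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(enc), enc, bom
    # Assumption: no BOM means single-byte text. latin-1 maps every byte
    # to one character, so the round trip preserves the bytes exactly.
    return data.decode("latin-1"), "latin-1", b""

def encode_source(text, enc, bom):
    """Write the text back with the same encoding and byte order mark."""
    return bom + text.encode(enc)

raw = codecs.BOM_UTF16_LE + "if x then\r\n".encode("utf-16-le")
text, enc, bom = decode_source(raw)
print(enc, repr(text), encode_source(text, enc, bom) == raw)
```

Once the text is a sequence of characters, the indenting logic no longer depends on whether the file was stored as ANSI bytes or wide characters.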