Topic: Parser Skeleton

Charles Pegge

Parser Skeleton
« on: May 16, 2012, 08:23:53 AM »
Essential component for programming languages:

Code: OxygenBasic

string src=quote

--NorwegianParrotScript--

aa = (b+c)*d

--NorwegianParrotScript--


'state variables
'===============

sys ascw, 'ascii code of first word char
    ascn, 'ascii code of next word or end of line
    lenw, 'length of word
    swd,  'start index of word
    ewd,  'boundary index of word
    dtp,  'dot position
    tyw,  'type of word
    opr,  'operator code
    opn   'operand code

function nword(string s,sys *i)
'==============================
sys pt=strptr(s)
sys b,e,p
dtp=0
tyw=0
p=pt+i-1
byte a at p 'a overlays the byte at address p and follows p as it moves
'
'skip leading space and lines
'
do
  select a
  case 0
    exit do
  case 33 to 255
    exit do
  end select
  p++
end do
b=p-pt+1
ascw=a
swd=b
'
'locate boundary of word
'
'quotes: the word includes both quote marks
'
if ascw=34 or ascw=96 then 'or ascw=39 ''
  tyw=3
  do
    p++
    if a=0 then goto endnword 'unterminated quote: stop at end of text
    if a=ascw then
      p++ 'step past the closing quote
      goto endnword
    end if
  end do
end if

do
  select a
  case 0 to 32
    exit do
  case 46
    if dtp=0 then
      dtp=p-pt+1 'DOT position
    end if
    p++ : continue do
  case 48 to 57
    if tyw=0 then tyw=1
    p++ : continue do 'NUMS
  case 65 to 90
    if tyw=0 then tyw=2
    p++ : continue do 'CAPS
  case 97 to 122
    if tyw=0 then tyw=2
    p++ : continue do 'LOWERS
  case 95
    if tyw=0 then tyw=2
    p++ : continue do 'UNDERSCORE
  case 33 to 47
    e=p-pt+1
    if b=e then
      tyw=4 'symbol !%& etc
      p++
    end if
    exit do
  case 58 to 64
    e=p-pt+1
    if b=e then
      tyw=5 'symbol ;:<=>?@ etc
      p++
    end if
    exit do
  case 91 to 96
    e=p-pt+1
    if b=e then
      tyw=6 'symbol [\] etc
      p++
    end if
    exit do
  case 123 to 126
    e=p-pt+1
    if b=e then
      tyw=7 'symbol {|}~ etc
      p++
    end if
    exit do
  end select
  p++ 'any other byte (127 and above) is part of the word
end do
'
endnword:
'--------
'
e=p-pt+1
ewd=e
'
'skip to next word or end of line
'
do
  select a
  case 0
    exit do
  case 10
    exit do
  case 13
    exit do
  case 33 to 255
    exit do
  end select
  p++
end do
ascn=a
lenw=e-b
i=p-pt+1
end function



'states of tyw
'=============
' 1 numbers
' 2 names: upper case, lower case, underscore
' 3 quote
' 4 symbols ! # $ % & ' ( ) * + , - /  (ascii 33 to 47)
' 5 symbols : ; < = > ? @              (ascii 58 to 64)
' 6 symbols [ \ ] ^                    (ascii 91 to 94)
' 7 symbols { | } ~                    (ascii 123 to 126)


function identify(s,i)
'=====================
if lenw=1
  if tyw>3
    opr=ascw
    select case ascw
    case "+"
      if ascn=43 then
        opr+=0x200
      elseif ascn=61
        opr+=0x100
      end if
    case "-"
      if ascn=45
        opr+=0x200
      elseif ascn=61
        opr+=0x100
      end if
    case "*"
      if ascn=61
        opr+=0x100
      end if
    case "/"
      if ascn=61
        opr+=0x100
      end if
    case "<"
      if ascn=61
        opr+=0x100
      end if
    case "="
    case ">"
      if ascn=61
        opr+=0x100
      end if
    case ","
    case ";"
    case ":"
    case "?"
    case "@"
    case "("
    case ")"
    case "["
    case "]"
    case "{"
    case "}"
    end select
    exit function
  end if
end if
end function


string cr,tab,pr
cr=chr(13)+chr(10)
tab=chr(9)
pr="WORDS AND WORD-TYPE:" cr cr

function ProcessScript(string s)
'===============================
string wr
sys i=1
do
  opn=0
  opr=0
  nword s,i
  if lenw=0 then exit do 'end of script
  identify s,i
  wr=mid s,swd,lenw
  pr+=wr tab tyw cr
end do

print pr

end function

ProcessScript src
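For readers without an Oxygen installation, the scanning logic of nword can be sketched in Python roughly as follows. This is an illustrative transcription, not Charles's code: it works on a Python str with 0-based indices, treats the end of the string as the null terminator, and returns a record (the Word class, symbol_group and scan are hypothetical names) instead of setting global state variables. Group numbers follow the tyw table in the code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    start: int      # swd, but 0-based
    length: int     # lenw
    code: int       # ascw: code of the first char (0 at end of text)
    next_code: int  # ascn: code of the next word's first char, CR, LF or 0
    group: int      # tyw: 1 number, 2 name, 3 quote, 4..7 symbol groups
    dot: int        # dtp, but 0-based; -1 when the word has no '.'


def symbol_group(c: int) -> int:
    """tyw group for a single punctuation character (ASCII 33..126)."""
    if c <= 47:
        return 4
    if c <= 64:
        return 5
    if c <= 96:
        return 6
    return 7


def nword(s: str, i: int) -> tuple[Word, int]:
    """Scan the word at or after index i; return it and the resume index."""
    def at(k: int) -> int:
        return ord(s[k]) if k < len(s) else 0   # 0 stands in for the terminator

    p = i
    while 0 < at(p) <= 32:                  # skip leading spaces, tabs, line breaks
        p += 1
    b, code, group, dot = p, at(p), 0, -1
    if code in (34, 96):                    # "..." or `...`: keep both quote marks
        group = 3
        p += 1
        while at(p) not in (0, code):
            p += 1
        if at(p) == code:
            p += 1
    else:
        while True:
            c = at(p)
            if c <= 32:                     # end of text, space or control char
                break
            if c == 46:                     # '.' stays inside the word
                if dot < 0:
                    dot = p
            elif 48 <= c <= 57 or 65 <= c <= 90 or 97 <= c <= 122 or c == 95:
                if group == 0:
                    group = 1 if c <= 57 else 2
            elif c <= 126:                  # punctuation is a one-char word
                if p == b:
                    group = symbol_group(c)
                    p += 1
                break
            p += 1                          # codes above 126 also count as word chars
    e = p
    while at(p) not in (0, 10, 13) and at(p) <= 32:   # skip blanks, stop at line end
        p += 1
    return Word(b, e - b, code, at(p), group, dot), p


def scan(s: str) -> list[tuple[str, int]]:
    """List (word, tyw) pairs, like the listing ProcessScript prints."""
    out, i = [], 0
    while True:
        w, i = nword(s, i)
        if w.length == 0:
            break
        out.append((s[w.start:w.start + w.length], w.group))
    return out
```

For the sample script, scan("aa = (b+c)*d") gives aa 2, = 5, ( 4, b 2, + 4, c 2, ) 4, * 4, d 2.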

Charles

Aurel

Re: Parser Skeleton
« Reply #1 on: May 17, 2012, 03:18:43 AM »
Cool...
I don't know how I missed this topic. ::)
If I understand right, this parser parses the script into tokens, right?

Charles Pegge

Re: Parser Skeleton
« Reply #2 on: May 17, 2012, 06:51:44 AM »

It does not go as far as generating tokens. I only wrote it yesterday. It collects information about each word it scans, like position, length, ascii code and symbol group, then I added an identify skeleton function, which could be part of a tokenising process.
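One common way to go from scanned words to tokens is a table of regular expressions, as in the tokenizer example in the documentation of Python's re module. The sketch below swaps the hand-written scanner for that technique; the token kinds (NUMBER, NAME, STRING, OP) and the Token tuple are illustrative assumptions, not part of Oxygen. Unlike nword, it merges two-character operators such as += and ++ into a single token.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # NUMBER, NAME, STRING or OP (illustrative kinds)
    text: str
    pos: int   # 0-based offset in the source


# Order matters: two-char operators are tried before single punctuation chars.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d*)?"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("STRING", r'"[^"]*"|`[^`]*`'),
    ("OP", r"\+\+|--|[-+*/<>]=|[!-/:-@\[-^{-~]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def tokenize(src: str) -> list[Token]:
    """Split src into tokens; characters that no pattern matches are skipped."""
    return [
        Token(m.lastgroup, m.group(), m.start())
        for m in TOKEN_RE.finditer(src)
        if m.lastgroup != "SKIP"
    ]
```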

Charles

Frankolinox

Re: Parser Skeleton
« Reply #3 on: May 28, 2012, 11:33:51 PM »
Hi Charles, I've tested your parser skeleton example. Good stuff! :)

Question: is the group for "*" (multiplication) the same as for "=" (result 5)? Why not use another one, like (4) or (6)? For me that would be easier to understand ;) Do you have another example at hand? Thanks, best regards, Frank

Charles Pegge

Re: Parser Skeleton
« Reply #4 on: May 29, 2012, 01:48:04 AM »

I think you are referring to group codings, Frank.

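'states of tyw
'=============
' 1 numbers
' 2 names: upper case, lower case, underscore
' 3 quote
' 4 symbols ! # $ % & ' ( ) * + , - /  (ascii 33 to 47)
' 5 symbols : ; < = > ? @              (ascii 58 to 64)
' 6 symbols [ \ ] ^                    (ascii 91 to 94)
' 7 symbols { | } ~                    (ascii 123 to 126)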
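To make the group codes concrete, here is a small Python sketch (tyw_group is a hypothetical helper, not part of Oxygen) that returns the tyw value nword derives from the first character of a word. With the ranges used in nword, '*' (ASCII 42) falls in group 4 and '=' (ASCII 61) in group 5; the group only says which block of the ASCII table a symbol comes from, and the operator itself is identified afterwards by its code.

```python
def tyw_group(ch: str) -> int:
    """tyw value that the first character ch contributes in nword (sketch)."""
    c = ord(ch)
    if 48 <= c <= 57:
        return 1   # number
    if 65 <= c <= 90 or 97 <= c <= 122 or c == 95:
        return 2   # name: upper case, lower case, underscore
    if c in (34, 96):
        return 3   # quote: " or `
    if c == 46:
        return 0   # a '.' sets no group by itself, only the dot position
    if 33 <= c <= 47:
        return 4   # ! # $ % & ' ( ) * + , - /
    if 58 <= c <= 64:
        return 5   # : ; < = > ? @
    if 91 <= c <= 94:
        return 6   # [ \ ] ^
    if 123 <= c <= 126:
        return 7   # { | } ~
    return 0       # blanks, control codes and bytes above 126 set no group
```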



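The actual operator code (based on ASCII) for '*' (multiply) is 42,
and for the '*=' operator it is 42+0x100, i.e. 0x12A (298 decimal).

Similarly, '+' is 43 and '+=' is 43+0x100 = 0x12B (299).

Using the same coding strategy, a doubled operator adds 0x200:
'++' is 43+0x200 = 0x22B (555) and '--' is 45+0x200 = 0x22D (557).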

I had to correct '++'

Code: OxygenBasic

    case "+"
      if ascn=43 then
        opr+=0x200
      elseif ascn=61
        opr+=0x100
      end if
    case "-"
      if ascn=45
        opr+=0x200
      elseif ascn=61
        opr+=0x100
      end if
    case "*"
      if ascn=61
        opr+=0x100
      end if
    case "/"
      if ascn=61
        opr+=0x100
      end if
    case "<"
      if ascn=61
        opr+=0x100
      end if
    case "="
    case ">"
      if ascn=61
        opr+=0x100
      end if
...
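The operator coding can be checked with a short Python sketch of the same arithmetic (operator_code and base_symbol are hypothetical names). Like identify, it only looks at the first character of the following word, passed here as next_ch. Because the modifier sits above the low byte, masking with 0xFF recovers the ASCII code of the base symbol.

```python
def operator_code(ch: str, next_ch: str) -> int:
    """Operator code built the way identify() builds it (sketch).

    ch is a one-character symbol word; next_ch is the first character of the
    following word ("" at end of text).
    """
    opr = ord(ch)
    if ch in ("+", "-") and next_ch == ch:
        opr += 0x200           # ++ or --
    elif ch in ("+", "-", "*", "/", "<", ">") and next_ch == "=":
        opr += 0x100           # += -= *= /= <= >=
    return opr


def base_symbol(opr: int) -> str:
    """Recover the base symbol: the low byte is its ASCII code."""
    return chr(opr & 0xFF)
```

For example, operator_code("*", "=") is 0x12A (298), and base_symbol(0x12A) gives back "*".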

Anyway, nothing is cast in stone. Oxygen has its own mini-interpreter for metaprogramming (resolving constants etc.). It can be found in src/o2meta.bas, function metaval(). I have just rewritten it to improve efficiency. It relies on functions found in o2lex and o2pars; it's all highly interconnected.
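The code of metaval() is not reproduced here, so purely as a generic illustration of what resolving constants involves, here is a tiny recursive-descent evaluator in Python. eval_const and its constants dictionary are hypothetical and do not mirror Oxygen's implementation; the sketch handles numbers, named constants, + - * /, unary minus and parentheses, with * and / binding tighter than + and -.

```python
import re


def eval_const(text: str, consts: dict[str, float]) -> float:
    """Evaluate an expression over numbers and named constants (sketch)."""
    toks = re.findall(r"\d+(?:\.\d*)?|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek() -> str:
        return toks[pos] if pos < len(toks) else ""

    def take() -> str:
        nonlocal pos
        pos += 1
        return toks[pos - 1]

    def factor() -> float:
        t = take()
        if t == "(":
            v = expr()
            take()                     # the closing ')'
            return v
        if t == "-":
            return -factor()           # unary minus
        if t[0].isalpha() or t[0] == "_":
            return consts[t]           # named constant; KeyError if unknown
        return float(t)

    def term() -> float:               # * and / bind tighter than + and -
        v = factor()
        while peek() in ("*", "/"):
            op, rhs = take(), factor()
            v = v * rhs if op == "*" else v / rhs
        return v

    def expr() -> float:
        v = term()
        while peek() in ("+", "-"):
            op, rhs = take(), term()
            v = v + rhs if op == "+" else v - rhs
        return v

    return expr()
```

With b=2, c=3 and d=4, the sample expression (b+c)*d resolves to 20.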

Charles