Parsing with assembly

Started by aurelCB, September 24, 2010, 07:34:48 AM


aurelCB

Hi ...
I have an idea that a parser written with inline asm might parse text or source code
much faster than code written in IWB (I mean first of all in my interpreter, and maybe in some sort of text scanner),
but I don't know how to do this.
I searched through the NASM forum but didn't find anything useful, or any piece of code that
does this.
When I say parser, I mean first of all an asm routine that can split a string into several segments
using a predefined delimiter.
I only found one small static lib on a MASM page, called getst.lib, but without any explanation of how
it works.
Does anyone have an idea how to do this with inline asm in IWBasic?

Thanks in advance... :)
Aurel

Ficko

Hi aurelCB !

There is a sample written in MASM that could be useful for you; it does some kind of token parsing for the "Gold Parser Generator".
It is not exactly what you want, since it is the "engine" part of the parser system, but it may get you started.

http://www.devincook.com/goldparser/engine/assembly-x86/index.htm

Regards,
Ficko

aurelCB

Hi Ficko... thanks for the link.
I only want this:
a$="string1 string2 string3"
split into 3 new strings, nothing else ...

sapero

September 24, 2010, 09:50:14 AM #3 Last Edit: September 24, 2010, 09:59:30 AM by sapero
Aurel, there is an easy-to-use CRT function called strtok:
declare cdecl extern strtok alias _strtok(pointer text, string delimiters),pointer

string a$="string1 string2 string3"

pointer p = strtok(a$, " ") ' first time
while (p)
print *<string>p
p = strtok(NULL, " ") ' next time - NULL
wend

' second iteration
a$="string1,string2 string3"
p = strtok(a$, " ,") ' first time
while (p)
print *<string>p
p = strtok(NULL, " ,") ' next time - NULL
wend


In the first call to strtok, you pass the string to be tokenized. On the second, third, ... call you must pass NULL.
You can break out of this loop at any time.
The variable a$ (the buffer passed as the first parameter to strtok) will be modified: strtok overwrites each delimiter it finds with a terminating zero.
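
To make the "buffer will be modified" point concrete, here is a minimal sketch in C rather than IWBasic, since strtok is a C runtime function. The helper name print_tokens is made up for illustration and is not part of the CRT.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every token of `text` and returns how many
   were found. `text` must be a writable buffer, because strtok overwrites
   each delimiter it consumes with '\0'. */
static int print_tokens(char *text, const char *delims)
{
    int count = 0;

    /* First call gets the buffer, every later call gets NULL. */
    for (char *tok = strtok(text, delims); tok != NULL; tok = strtok(NULL, delims)) {
        printf("token %d: %s\n", ++count, tok);
    }
    return count;
}
```

Called on a buffer holding "string1,string2 string3" with the delimiters " ,", this should report three tokens, and afterwards the buffer reads as just "string1" because the comma has been replaced by a terminator. Keep a copy of the string if the original is still needed.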

Ficko

But if you want it the "hard way": ;D


CONST MaxSplit = 10
DEF A$:STRING
DEF StrPArray[MaxSplit]:INT
A$ = "First;Second;Third"
Split(A$,";")
FOR I = 0 TO MaxSplit - 1
IF StrPArray[I] THEN
PRINT *<STRING>(StrPArray[I])
FREEHEAP StrPArray[I]
ENDIF
NEXT I
WAITCON
END


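' Note: there is no check against MaxSplit; more than MaxSplit pieces would overrun StrPArray.
SUB Split(Inp$:STRING,Delim:CHAR)
DEF Sp:POINTER              ' local at [ebp-4]: start of the current token
_asm
extern AllocHeap
extern __imp_RtlMoveMemory
mov edi, [ebp+8]            ; edi = scan pointer into Inp$
mov [ebp-4], edi            ; Sp = start of the first token
mov esi, $StrPArray         ; esi = current slot in StrPArray
C01:xor ecx, ecx            ; reset the length counter for every token (ecx does not survive the calls)
movzx eax, byte [ebp+12]    ; al = delimiter (reloaded because the calls clobber eax)
C00:cmp byte [edi], 0       ; end of input?
jz Exit
inc ecx                     ; count this character
scasb                       ; compare al with [edi], then edi++
jnz C00                     ; not the delimiter: keep scanning
call AllocString            ; copy the token (ecx-1 characters, without the delimiter)
lea esi, [esi+4]            ; advance to the next array slot
mov [ebp-4], edi            ; the next token starts after the delimiter
jmp C01
Exit: cmp edi, [ebp+8]      ; empty input: nothing to store
jz J0000
inc ecx                     ; no delimiter was counted, so compensate for the dec in AllocString
call AllocString            ; copy the last token
J0000:
_endasm
ENDSUB

_asm
AllocString: push ecx       ; 3rd argument for RtlMoveMemory: the byte count...
dec dword [esp]             ; ...minus one, to drop the delimiter
push ecx                    ; allocate ecx bytes; the spare byte is the terminator (assumes AllocHeap returns zeroed memory)
call AllocHeap
mov [esi], eax              ; store the new string in the current array slot
push dword [ebp-4]          ; source: start of the token
push eax                    ; destination: the new block
call [__imp_RtlMoveMemory]  ; RtlMoveMemory(destination, source, length)
ret
_endasm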
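
For readers who do not follow assembler, the same idea (scan for the delimiter, allocate a block for each piece, copy the piece into it) can be sketched in C. This is only an illustration under stated assumptions, not a translation of the routine above: the name split_copy and the max_out bound are hypothetical, malloc stands in for AllocHeap, and a trailing delimiter does not produce an empty last piece here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `input` at every `delim` and stores a freshly
   allocated copy of each piece in out[0..max_out-1]. Returns the number of
   pieces stored, or -1 if an allocation fails (pieces stored so far are
   freed). `input` is left untouched; the caller frees each out[i]. */
static int split_copy(const char *input, char delim, char **out, int max_out)
{
    int count = 0;
    const char *start = input;

    while (*start != '\0' && count < max_out) {
        const char *end = strchr(start, delim);           /* next delimiter, if any */
        size_t len = end ? (size_t)(end - start) : strlen(start);

        char *piece = malloc(len + 1);                    /* +1 for the terminator */
        if (piece == NULL) {
            while (count > 0) free(out[--count]);
            return -1;
        }
        memcpy(piece, start, len);
        piece[len] = '\0';
        out[count++] = piece;

        if (end == NULL) break;                           /* that was the last piece */
        start = end + 1;                                  /* continue after the delimiter */
    }
    return count;
}
```

With "First;Second;Third" and ';' this should give three separate heap strings. The cost is one allocation and one copy per piece, which is the overhead a split that works inside the original buffer avoids.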

aurelCB

WOW  :o
You guys are incredible...
Thanks Sapero, it looks very, very interesting ....
Thanks Ficko, I will probably need 2 months to understand what is what ::)

I was wondering first of all how much faster assembler code executes compared with
the usual IWB code.
Of course I will try both approaches.
Once again, thanks... ;)

I currently have
one WHILE loop that counts the words in a line, split by spaces;
then, after counting the words,
I have long things like this:
IF wc>0       
    SPos = 1     
    EPos = InStr(abscript[start], " ",SPos) - 1
    If EPos <= 0 Then EPos = Len(abscript[start])
    GW1 = RTrim$(LTrim$(Mid$(abscript[start], SPos, EPos - SPos + 1)))
ENDIF

which I think uses too much time...

Aurel

Ficko

September 25, 2010, 12:00:58 PM #6 Last Edit: September 25, 2010, 12:03:29 PM by Ficko
This is a slightly improved version of the one above.

Not using the heap improves performance, but the original string gets destroyed.
It has a return value giving the number of strings created.


' Splits "Inp$" by "Deliminator" into "RetArray"; returns the number of strings placed into "RetArray"
DECLARE Split(Inp$:STRING,Deliminator:CHAR,RetArray:POINTER),INT
CONST MaxSplit = 10
DEF A$:STRING
DEF StrPArray[MaxSplit]:INT
A$ = "This string will be split into 8 strings"
W = Split(A$," ",StrPArray)
PRINT "Number of strings:",W,"\n"
FOR I = 0 TO W - 1
PRINT *<STRING>(StrPArray[I])
NEXT I
WAITCON
END

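_asm
Split: push ebp
mov ebp, esp
push esi                    ; preserve the registers used below
push edi
push ebx
mov edi, [ebp+8]            ; edi = Inp$ (scan pointer)
mov esi, [ebp+16]           ; esi = RetArray (current slot)
xor ecx, ecx
xor ebx, ebx                ; ebx = number of strings found
movzx eax, byte [ebp+12]    ; al = delimiter, ah = 0
C01:mov [esi], edi          ; store the start of the current piece
inc ebx
C00:cmp byte [edi], 0       ; end of input?
jz Exit
inc ecx                     ; character count (not used in this version)
scasb                       ; compare al with [edi], then edi++
jnz C00                     ; not the delimiter: keep scanning
lea esi, [esi+4]            ; next array slot
mov [edi-1], ah             ; overwrite the delimiter with 0
jmp C01
Exit: mov dword [esi+4], 0          ;Needs one spare slot in RetArray. Can be omitted since we have a return value.
xchg eax, ebx               ; return the count in eax
pop ebx
pop edi
pop esi
leave
ret 0x0C                    ; pop the 3 arguments (12 bytes)
_endasm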
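
The in-place technique (no heap: store a pointer to the start of each piece and overwrite every delimiter with a terminator) looks roughly like this in C. It is a sketch under assumptions: split_in_place and the max_out bound are hypothetical names, and the bound check is an addition that the assembler routine does not have.

```c
#include <stddef.h>

/* Hypothetical helper: splits `text` in place. Every `delim` is overwritten
   with '\0' and out[] receives pointers into `text` itself, so nothing is
   allocated or copied. Returns the number of pointers stored. Like the
   routine above, an empty input yields one (empty) piece. If max_out
   pointers are already stored, the rest of the text is dropped. */
static int split_in_place(char *text, char delim, char **out, int max_out)
{
    int count = 0;

    if (max_out <= 0) return 0;
    out[count++] = text;                    /* first piece starts at the beginning */

    for (char *p = text; *p != '\0'; p++) {
        if (*p == delim) {
            *p = '\0';                      /* terminate the current piece */
            if (count == max_out) break;    /* no room for another pointer */
            out[count++] = p + 1;           /* next piece starts after the delimiter */
        }
    }
    return count;
}
```

For "This string will be split into 8 strings" and ' ' this should return 8. The pointers stay valid only as long as the original buffer does, and the buffer can no longer be read as one string afterwards.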


aurelCB

Cool.... :)
This is exactly what I need.
Thank you very much, Ficko! ;)
I would never have figured out how to do this because I don't know how assembler works :-\
Of course I will test this inside ABasic tonight....

Thanks again,
all the best...

Aurel

aurelCB

I tested it this way and it works in ABasic, but unfortunately there's no speed-up as I expected.
It looks like I must change the way I read the source code, from the current way of reading from the RE control to reading
line by line from a file, or directly from memory.
In fact, this way works fine... ;)

all the best..
Aurel

Ficko

Yep, on modern processors any speed improvement is below what a human can notice, since IW generates pretty good assembler code as well.  :D
You might get some feeling of improvement after compiling 100000 lines of code.

With respect, I don't think you have a source in "ABasic" even close to that. ;D

Of course, you can get an observable improvement if you load your source into memory in one chunk and work with it there, instead of loading it line by line.
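
A rough sketch of the "load it in one chunk" idea, written in C for illustration: read the whole file into a single buffer, then walk that buffer line by line without further file access. The names load_file and next_line are hypothetical, and how much this gains in practice would have to be measured.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into one
   NUL-terminated heap buffer. Returns NULL on failure; the caller frees
   the buffer. If size_out is not NULL it receives the byte count. */
static char *load_file(const char *path, long *size_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    char *buf = NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);
        if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (buf == NULL) return NULL;

    buf[size] = '\0';
    if (size_out != NULL) *size_out = size;
    return buf;
}

/* Hypothetical helper: returns the next line in the buffer and advances
   *cursor past it. The '\n' (and a '\r' just before it) is overwritten
   with '\0'. Returns NULL when the buffer is exhausted. */
static char *next_line(char **cursor)
{
    char *line = *cursor;
    if (line == NULL || *line == '\0') return NULL;

    char *p = line;
    while (*p != '\0' && *p != '\n') p++;
    if (*p == '\n') {
        if (p > line && p[-1] == '\r') p[-1] = '\0';
        *p = '\0';
        *cursor = p + 1;
    } else {
        *cursor = p;                /* last line had no trailing newline */
    }
    return line;
}
```

Typical use would be to call load_file once, set a cursor to the start of the buffer, and call next_line(&cursor) in a loop until it returns NULL. Each returned line could then be handed to an in-place splitter.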

aurelCB

Quote: With respect, I don't think you have a source in "ABasic" even close to that. ;D
Yes, of course, it is much, much smaller.
Quote: Of course, you can get an observable improvement if you load your source into memory in one chunk and work with it there, instead of loading it line by line.
And yes, Ficko, you are probably right.
I once tested reading from a string array and it is faster, and using memory will be
the best option.

all best... :)
Aurel

aurelCB

hmmm ...
I was just looking at this topic again and thinking that maybe the speed problem is in
the FOR/NEXT loop ...
Is there a way to replace the FOR loop with assembler code and then see
how fast it is?