Ficko's Split Assembler Function

Started by billhsln, April 09, 2020, 06:56:09 PM


billhsln

Does anyone else use Ficko's Split subroutine? I use it, and every once in a while it goes to never-never land. It is not consistent, but every so often I find a record that it just aborts on. Does this happen to anyone else? Maybe someone has improved on the subroutine? File length does not seem to be the problem; I have used standard strings and ISTRINGs with 1600 chars or more. So I am just seeing whether anyone else uses it and whether they have had problems. Right now I am trying to split an ISTRING of 800 chars with 47 fields into a pointer array defined with 50 fields.

Thanks,
Bill

PS: here is the code, just in case someone else wants to test.

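' Split created by Ficko
_asm
Split:  push ebp
        mov ebp, esp
        push esi
        push edi
        push ebx
        mov edi, [ebp+8]            ;edi = string to split (first argument)
        mov esi, [ebp+16]           ;esi = pointer array (third argument)
        xor ecx, ecx
        xor ebx, ebx                ;ebx = field count
        movzx eax, byte [ebp+12]    ;al = delimiter (second argument), ah = 0
C01:    mov [esi], edi              ;store start of current field
        inc ebx
C00:    cmp byte [edi], 0           ;end of string?
        jz Exit
        inc ecx
        scasb                       ;compare al with [edi], then step edi (forward if direction flag is clear)
        jnz C00
        lea esi, [esi+4]            ;next slot in pointer array
        mov [edi-1], ah             ;overwrite the delimiter with 0
        jmp C01
Exit:   mov dword [esi+4], 0        ;Can be omitted since we have return value.
        xchg eax, ebx               ;return the field count in eax
        pop ebx
        pop edi
        pop esi
        leave
        ret 0x0C                    ;remove the 12 bytes of arguments
_endasm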
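
For anyone who wants to compare results against a non-assembler version, here is a minimal C sketch of the same idea. It is not Ficko's code: it walks the string once, stores a pointer to the start of each field, and overwrites each delimiter with a zero byte. The name `split_fields` and the `max_fields` parameter are assumptions added for illustration. The one deliberate difference is the bounds check, since the assembler listing writes one pointer per field plus a trailing zero entry without knowing how large the array is. Note also that `scasb` steps `edi` forward only when the CPU direction flag is clear, and the listing does not clear it itself. Whether either point explains the aborts is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Ficko's routine: splits `text` in place on the
   single byte `delim` and stores a pointer to each field in `fields`.
   Returns the number of fields, or -1 if `fields` has too few slots. */
int split_fields(char *text, char delim, char **fields, size_t max_fields)
{
    assert(text != NULL && fields != NULL);
    size_t count = 0;
    char *start = text;

    for (char *p = text; ; ++p) {
        if (*p == delim || *p == '\0') {
            if (count == max_fields)
                return -1;          /* refuse to write past the array */
            fields[count++] = start;
            if (*p == '\0')
                break;              /* end of string: last field stored */
            *p = '\0';              /* terminate this field in place */
            start = p + 1;
        }
    }
    return (int)count;
}
```

Calling `split_fields(buf, ',', fields, 50)` on a writable buffer holding `a,b,,c` returns 4, with the third field empty, so empty fields are kept just as in the assembler version.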
When all else fails, get a bigger hammer.

billhsln

So far, no one else uses this subroutine?

Bill

h3kt0r

I'm in no way knowledgeable in ASM, but whenever I see something like:

Quote: Exit: mov dword [esi+4], 0 ;Can be omitted since we have return value.

I'm tempted to comment it out. If it can be omitted, then omit it, just to see if it produces
different results...

billhsln

It was worth a try, which I did. It still goes to never-never land.

I wish I knew assembler. It works most of the time, and every once in a while it doesn't. It is consistent in that it always aborts on the same file, even if I run it from another program.

Bill

h3kt0r

April 19, 2020, 02:28:23 PM #4 Last Edit: April 19, 2020, 02:34:59 PM by h3kt0r
Perhaps the problem comes from this particular file?
There might be some strange character or escape sequence (EBCDIC?) in it that causes the string parsing
routine to fail.
You could also check the file encoding (UTF8/UTF16, PC/DOS, MAC, UNIX, etc.) and convert it to PC/DOS...

billhsln

Files in 1252-Latin 1 (DOS) work; the one that did not was 65001 (UTF-8) DOS. I converted it to 1252-Latin 1 (DOS) and it still aborts.

It was something I had not thought of, but still no luck. I wish there was a way to get some kind of useful error out of the assembler; I get no message back.

Thanks for the good idea, too bad it didn't turn out to fix the problem.

Bill

aurelCB

Bill,
I am not sure what you are doing with that assembly routine,
but maybe something like a tokenizer can help you, because a tokenizer reads each character and identifies it.
If it cannot identify a character, it can exit with an error and show you which character it could not read.
I hope that makes sense; this can be done in BASIC rather than in assembly.

h3kt0r

Here is a trick that might work: open the culprit file in Notepad, Select All, Copy.
Then run another text editor (like Notepad++, NoteTab or TextPad...) and Paste the content
from clipboard. Then Save the file as new and test this in your program...

billhsln

Pulled it up in UltraEdit, copied it to Notepad and saved. Ran the program, and it aborted. Then copied it into Notepad++ and saved again. Ran the program, and it aborted again.

Still no luck. Like I said, it is a weird problem.

Bill

aurelCB

Bill,
how large is your file?
When I said tokenizer, I probably should have said scanner.
It is the best method, because you can scan each character and detect the error.
I don't see a better option.
Also, as far as I know the ISTRING type is limited to 16484 chars, I think.
Maybe you need a larger buffer of type byte.

billhsln

Maximum record size is 760, which is a header record. The largest detail record is 443.

If it weren't name and address information, which is company private, I would just post the file here for someone else to test.

I just think it's weird that it works on most of the files I use it on, but on about 3 files so far it barfs.

Bill

aurelCB

Wow, Bill,
you have a lot of records there.
Maybe an option is to use a linked list and add an array of 440 items to each node.

Brian

Aurel,

I have tested 11 of Bill's files for him on my PC and didn't get a single error, so he is rather stumped.

Brian

Andy

Bill,

I'm not sure what the ASM routine does, but there must be a common factor to these 3 files.

I presume the function is working with letters a to z and / or numbers 0 to 9? and possibly other characters you can type?

I have had problems with files before. Personally, I would scan what is being sent to the ASM routine, get the ASCII code of each character and write the codes to a new file.

When the routine goes into never-never land, abort the program and look at the new file for the last few ASCII codes written.

Maybe there is a "Null" character somewhere - or something like that?
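
The check described above could be sketched in C roughly as follows. Both function names are hypothetical, and the definition of a "suspect" byte (anything outside printable ASCII other than a tab) is an assumption chosen for illustration; adjust it to whatever the records are expected to contain. The record length is passed in explicitly so that an embedded null does not end the scan early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns the index of the first byte that is neither
   printable ASCII (0x20..0x7E) nor a tab, or -1 if every byte passes. */
long first_suspect_byte(const unsigned char *rec, size_t len)
{
    assert(rec != NULL);
    for (size_t i = 0; i < len; ++i) {
        unsigned char c = rec[i];
        if (c != '\t' && (c < 0x20 || c > 0x7E))
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: writes one "offset: code" line per byte to `out`,
   so the last lines written show what was sent just before a failure. */
void dump_codes(FILE *out, const unsigned char *rec, size_t len)
{
    assert(out != NULL && rec != NULL);
    for (size_t i = 0; i < len; ++i)
        fprintf(out, "%zu: %u\n", i, (unsigned)rec[i]);
}
```

Running `first_suspect_byte` over each record before it is passed to the split routine would show whether the failing records contain a null, a control character or a byte above 0x7E.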

Andy.
Day after day, day after day, we struck nor breath nor motion, as idle as a painted ship upon a painted ocean.

fasecero

I don't know if there is anyone left who knows assembler... but if I were in Bill's place, I would discard the function and write it again with BASIC commands until it mimics the output. For example, this is how to split a string using the C runtime:

$INCLUDE "windowssdk.inc"

OPENCONSOLE
string indata = "hello,how,are,you,have,a,good,one"
pointer token = strtok(indata, ",")

WHILE token
    PRINT *<string>token
    token = strtok(NULL, ",")
ENDWHILE

DO:UNTIL INKEY$ <> ""
CLOSECONSOLE

aurelCB

Quote: but if I were in Bill's place, I would discard the function and write it again with basic
Yes, I agree with you!

Andy

Bill sent me his code plus some files to test.

Although the code worked on Brian's machine, all files failed on mine.

The only conclusion I can come to is the difference in our Operating systems.

A question for Fasecero:

The C runtime you posted works, but can it work for multiple delimiters?

e.g. ", " + chr$(9) (tab etc)....

Thanks,
Andy.
 

Brian

Andy,

Messed about with Fasecero's code, and this takes an input.txt file and writes an output.txt file. It removes the tabs and writes each line out with a space between words, up to 10 lines

Brian

Brian

Hi,

Just been reading up on this function, and realise you can split on more than one delimiter. So this sample removes tabs, commas, pipes and double quotes, and replaces them with just spaces between the words, all in one go.

Brian

fasecero

Quote: The C runtime you posted works, but can it work for multiple delimiters?
Yes, it can: just put all the delimiter characters together in the delimiter string and it should work. Looks like Brian has already worked on your question.
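
As a rough illustration of the multiple-delimiter case in plain C (a sketch, not the code from Brian's sample), the function below passes several delimiter characters to `strtok` at once and rebuilds the line with single spaces between the words. The name `join_tokens` and its output-buffer parameters are assumptions made for this example. Bear in mind that `strtok` treats a run of delimiters as one separator, so empty fields are dropped, which differs from the assembler Split routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the tokens of `line` into `out`, separated by
   single spaces. Any character in `delims` ends a token. `line` is modified
   by strtok. Returns 0 on success, -1 if `out` is too small. */
int join_tokens(char *line, const char *delims, char *out, size_t out_size)
{
    assert(line != NULL && delims != NULL && out != NULL && out_size > 0);
    size_t used = 0;
    out[0] = '\0';

    for (char *tok = strtok(line, delims); tok != NULL; tok = strtok(NULL, delims)) {
        size_t len = strlen(tok);
        size_t sep = (used > 0) ? 1 : 0;   /* space before all but the first */
        if (used + sep + len + 1 > out_size)
            return -1;
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, tok, len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

With the delimiter string `",\t|\""` (comma, tab, pipe and double quote), a line such as `one,<tab>"two"|three,,four` comes out as `one two three four`.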

Andy

Thanks guys,

That's a handy function to know about.

For it to be useful to me I would need to know the following:

1. Which delimiter character was found.
2. The start / end position of the word in the string, bearing in mind that the word might occur more than once in a line.

Thanks,
Andy.
 :)

Andy

April 26, 2020, 03:13:46 AM #22 Last Edit: April 26, 2020, 04:09:13 AM by Andy
Well, with a bit of playing around I've managed to answer my own question.

We have to:

1. Make a copy of the string before we split it into words as the original gets eaten up.
2. Token returns a number, which is actually the pointer value of our string PLUS the delimiter offset.

So if the pointer value to our string was

2000000

And our first delimiter was at position 8 in the string, then Token will return

2000008

Anyway, it seems to work on initial testing.
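
The copy is needed because `strtok` overwrites each delimiter it finds with a zero byte, and the offset falls out because the pointer it returns is the address where the token starts, so subtracting the address of the string gives the position. An alternative that leaves the string untouched is sketched below in C using `strspn` and `strcspn`; the `token_span` struct and the `find_tokens` name are assumptions made for this example, not something posted in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record describing one token inside the original string. */
typedef struct {
    size_t start;   /* offset of the first character of the token */
    size_t end;     /* offset one past the last character of the token */
    char   delim;   /* delimiter that ended the token, '\0' at end of string */
} token_span;

/* Hypothetical helper: fills `spans` with up to `max_spans` tokens found in
   `text` and returns how many were stored. `text` is not modified. */
size_t find_tokens(const char *text, const char *delims,
                   token_span *spans, size_t max_spans)
{
    assert(text != NULL && delims != NULL && spans != NULL);
    size_t count = 0;
    size_t pos = 0;

    while (count < max_spans) {
        pos += strspn(text + pos, delims);          /* skip any delimiters */
        if (text[pos] == '\0')
            break;                                  /* nothing left */
        size_t len = strcspn(text + pos, delims);   /* length of this token */
        spans[count].start = pos;
        spans[count].end = pos + len;
        spans[count].delim = text[pos + len];
        ++count;
        pos += len;
    }
    return count;
}
```

Because every occurrence gets its own start and end offsets, a word that appears more than once in a line is simply reported more than once.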

Andy.

Brian

April 26, 2020, 06:42:31 AM #23 Last Edit: April 26, 2020, 07:00:42 AM by Brian
Andy,

I don't get any text written to output.txt. Can't see why, though

Brian

PS: Sorted it . . .