IonicWind Software

IWBasic => General Questions => Topic started by: BJ on June 25, 2010, 12:50:54 PM

Title: Help with Anagram Check
Post by: BJ on June 25, 2010, 12:50:54 PM
First off.. Hello! ;D  Now on to the problem...
This is working code I ported from a VB project of mine.

In the IWBasic code, it's much slower when the two words are not anagrams, very fast when they are.
In VB, when they are not anagrams it is many times faster than the IWBasic code for the same case.
The IWBasic code is a lot quicker than the VB code when they are anagrams.

When they are anagrams:          IWBasic:    180ms -versus- VB: 1,000ms
When they are not anagrams:    IWBasic: 3,700ms -versus- VB: 1,300ms
These results gotten with the follwing strings and the below code: test & tess  AND  test & test

I am trying to figure out why it's so very slow when the words are not anagrams.  Any ideas?

Thanks!

// Additional info: In VB I've found this to be the quickest for its use --actually, the immediately following method is faster for this use, whoops-- (comparing 1 word against a word list) versus, say, counting the presence of each character for both strings then comparing.  I will likely just sort in-place the input then compare it against a presorted word list. Nevertheless, I am still curious about this.

(Compile as Console EXE and, if you don't have the community command pack, change the last CCGETKEY().)


DECLARE IMPORT,QueryPerformanceCounter(pPerformanceCount as pointer),INT64
DECLARE IMPORT,QueryPerformanceFrequency(pPerformanceCount as pointer),INT64

DOUBLE dOverhead = 0.0
INT64 CountsPerSecond = 0
INT64 sTime = 0
INT64 eTime = 0

SUB TimerInit()
   CountsPerSecond = 0
   sTime = 0
   eTime = 0
   
   QueryPerformanceFrequency(&CountsPerSecond)
   
   QueryPerformanceCounter(sTime)
   For i = 1 To 100
       QueryPerformanceCounter(eTime)
   Next i
   dOverhead = (eTime - sTime) / 100
ENDSUB

SUB TimerStart()
   QueryPerformanceCounter(&sTime)
ENDSUB

SUB TimerEnd(), DOUBLE
   QueryPerformanceCounter(&eTime)
   RETURN 1000 * ((eTime - sTime) - dOverhead) / CountsPerSecond
ENDSUB

' ---

SUB AnagramCheck (inpStr1 as ISTRING BYREF, inpStr2 as ISTRING BYVAL), INT

   If inpStr1 = inpStr2 Then
       RETURN 1
   ENDIF

   INT inpLen = Len(inpStr1)
   
   If Len(inpStr2) <> inpLen Then
       RETURN 0
   ENDIF
   
   INT i = 0
   INT iStr2 = 0
   ISTRING strTmp = ""
   
   For i = 1 To inpLen
       strTmp = Mid$(inpStr2, i, 1) ' inpStr2[i] is referenced twice.
       If INSTR(inpStr1, strTmp, i) = 0 Then ' If letter isn't present in inpStr1 then it's not an anagram.
           RETURN 0
       ENDIF
       
       iStr2 = INSTR(inpStr2, Mid$(inpStr1, i, 1), i) ' Does inpStr1[i] exist in inpStr2 and where?
       If iStr2 = 0 Then
           RETURN 0 ' If not present then it can't be an anagram.
       ENDIF
       
       If i <> iStr2 Then ' If letter isn't already in position...
           inpStr2[i-1] = inpStr2[iStr2-1] ' Swap!
           inpStr2[iStr2-1] = strTmp
           If inpStr1 = inpStr2 Then ' Once the string has been rearranged into the other we're done.
               BREAKFOR
           ENDIF
       ENDIF
   Next i

   RETURN 1
ENDSUB

' ---

int i = 0

string str1
string str2

print "Give me two strings to compare:"

input "String 1: ", str1
input "String 2: ", str2
print "": print "Thinking..."

TimerInit()

TimerStart()
for i = 1 to 459024
   AnagramCheck(str1, str2)
next i
tResult = TimerEnd()

print ""
print "AnagramCheck:" + str$(AnagramCheck(str1, str2)) + " --" + str$(tResult) + "ms"

CCGETKEY()
Title: Re: Help with Anagram Check
Post by: sapero on June 25, 2010, 02:08:56 PM
Hello James, welcome!

First of all : ISTRING strTmp = "" is not correct. The compiler does not show any error, instead it assumes that your string is up to single byte long, like CHAR (type)
When you copy something to it, you may overwrite other variables in current subroutine, causing dead-loops (executive hungs).

For STRING, the size is fixed 255 bytes (including terminating NULL)
For ISTRING, the size is 1 byte, and you must specify the size in array dimension: ISTRING strTmp[maximum_size]

ByRef and ByVal is ignored for strings, the string is always passed by reference. ByRef and ByVal is importand in VB, it has some ansi/unicode meaning.

What phrases do you use if the time is soo long?
Title: Re: Help with Anagram Check
Post by: LarryMc on June 25, 2010, 02:12:34 PM
Sapero

I played with it a little.
I could see the big difference using:
snow - wons   -> fast
snow - wona   -> slow


LarryMc
Title: Re: Help with Anagram Check
Post by: BJ on June 25, 2010, 02:26:12 PM
Thanks for the informative reply.

The phrases I am using are: test - tess

I've changed:  ISTRING strTmp = ""
to:  ISTRING strTmp[1]

Is this correct?  I've also removed the ByVal and ByRef.
Running the speed tests I get the same results, by the way.
Title: Re: Help with Anagram Check
Post by: sapero on June 25, 2010, 02:41:10 PM
ISTRING strTmp has the same size as ISTRING strTmp[1]. You can assign a value using strTmp[0]=byte ONLY!

NOTE: in your case, when this variable is a local (defined inside a subroutine), the space allocated for it is 4 bytes, so you can safely copy strings (up to 3 characters + terminating NULL) to it.

I see now that AnagramCheck is modifying the second string passed to it:
SUB AnagramCheck (..., inpStr2 as STRING)
  inpStr2[i-1] = ...

Each time you call AnagramCheck, the str2 string will be different. The BYVAL keyword in VB internally created a copy of the string when you called AnagramCheck, so to make it "better", just copy inpStr2 to another local string, and use the copy instead inpStr2:
SUB AnagramCheck(inpStr1 as STRING, inpStr2 as STRING),INT

  STRING inpStr2_copy
  if (len(inpStr2) > 255) then return 0*MessageBox(0, "second string is too long", "")
  inpStr2_copy = inpStr2

  ' now use inpStr2_copy instead inpStr2
Title: Re: Help with Anagram Check
Post by: BJ on June 25, 2010, 02:58:43 PM
Thanks again for the replies.

The purpose of modifying the second input string was to avoid what you suggested, which drastically slows it down in my test case.

//EDIT

Oh, I see what's happening.  In the case where they are equal, the string is rearranged into itself. So when the sub is called again all it does is check if the strings are equal, and they are, so then the sub returns. In the other case, it always shifts around the string to see if it's equal.. so what you suggested gave the actual results.

So if for the first argument I use a wordlist entry and the second a temp input string, the wordlist entry will stay the same (so i can add it to a listbox as-is) and the input (temp string) will be shifted around, which doesn't matter at all.

Stupid me...  :-[

Thanks for clearing that up!
Title: Re: Help with Anagram Check
Post by: sapero on June 25, 2010, 03:31:51 PM
I've googled a bit, and picked the first result for "Anagram function fast" - 3.6x faster than the VB example:
sub is_anagram (string w1, string w2),int
uint i, sz

/* The histogram */
int freqtbl[26]

/* Sanity check */
sz = len(w1)
if (sz <> len(w2)) then return false

/* Initialize the histogram */
'bzero(freqtbl, 26*sizeof(int))
for i=0 to 25
freqtbl[i] = 0
next i

/* Read the first string, incrementing the corresponding histogram entry */
for i = 0 to sz-1
if ((w1[i] >= 65) and (w1[i] <= 90))
freqtbl[w1[i]-65]++
elseif ((w1[i] >= 97) and (w1[i] <= 122))
freqtbl[w1[i]-97]++
else
return false
endif
next i

/* Read the second string, decrementing the corresponding histogram entry */
for i = 0 to sz-1
if ((w2[i] >= 65) and (w2[i] <= 90))
if (!freqtbl[w2[i]-65]) then return false
freqtbl[w2[i]-65]--
elseif ((w2[i] >= 97) and (w2[i] <= 122))
if (!freqtbl[w2[i]-97]) then return false
freqtbl[w2[i]-97]--
else
return false
endif
next i

return true
endsub
Title: Re: Help with Anagram Check
Post by: BJ on June 25, 2010, 03:40:46 PM
 :o  Ahh.. Geeze.  That's great! :-X

Profuse thanks, really.  Thanks for your time!!