Hi,
I found a previous thread here about reading the HTML code of a web page. The code does exactly that, but it stores the HTML in a file that you can read later.
This was the code posted:
sub DoEnum()
' input variables:
' a) WINDOW cont - browser window
' get the browser control
IDispatch browser
pointer p = GetPropA(cont.hWnd, "BROWSER")
if (p)
IDispatch tmp = *<comref>p
if (tmp && !tmp->QueryInterface(_IID_IWebBrowser2, &browser))
BSTR bstrHtml = browser.Document.documentElement.outerHTML
if (bstrHtml)
' todo: open a file, write "\xFF\xFE", write *<WSTRING>bstrHtml
BFILE f
OPENFILE(f, "html dump.htm", "w")
WRITE f, "\xFF\xFE" ' unicode LE16 BOM
__WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2
CLOSEFILE f
FreeComString(bstrHtml)
endif
browser->Release()
endif
endif
endsub
Question:
How can the code be changed so that I can put each line of the HTML into a string rather than writing it to a file?
I want to be able to read each line in turn so I can check for a specific word. Can this be done?
Thanks,
Andy.
Quote:
if (bstrHtml)
' todo: open a file, write "\xFF\xFE", write *<WSTRING>bstrHtml
BFILE f
OPENFILE(f, "html dump.htm", "w")
WRITE f, "\xFF\xFE" ' unicode LE16 BOM
__WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2
CLOSEFILE f
FreeComString(bstrHtml)
endif
Change the above to:
if (bstrHtml)
' code here to process *<char>bstrHtml
FreeComString(bstrHtml)
endif
Bill
This is not possible as such, because there are no lines in HTML code. You can embed line breaks in the source code, but the control will usually ignore and discard them. Even spaces and tabs are of no use here.
A 10 MB (or larger) HTML document can be written on a single line, and it will still be valid.
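Because the markup has no real line structure, a keyword check does not need lines at all: the whole buffer can be searched in one pass. Here is a minimal sketch in C (the language of the runtime functions used in this thread), assuming the HTML is available as a null-terminated wide string. `html_contains` is a hypothetical helper name, not part of IWBasic or the browser examples:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if `word` occurs anywhere in `html`,
   0 otherwise. Both arguments are null-terminated wide strings.
   The search is case-sensitive and does not modify the buffer. */
int html_contains(const wchar_t *html, const wchar_t *word)
{
    if (html == NULL || word == NULL)
        return 0;                      /* nothing to search */
    return wcsstr(html, word) != NULL; /* one pass over the whole buffer */
}
```

This approach leaves the string untouched, so it can still be written to a file or freed afterwards exactly as before.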
If you really want to extract the code line by line, tokenize it:
pointer tok = wcstok(bstrHtml, L"\n") ' defined in string.inc and wchar.inc
while (tok)
' *<WSTRING>tok is the first/next line
tok = wcstok(0, L"\n")
wend
But note that <tok> may point to a string of any length. Do not copy it into a normal wstring variable, and do not PRINT it if LEN returns 16KB or more.
Note 2: wcstok modifies the string pointed to by bstrHtml (each delimiter it finds is overwritten with a null terminator).
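The tokenizing loop above, together with the two notes, can be sketched in C as follows. This is an illustration under stated assumptions, not code from the browser examples: `count_lines_with` is a hypothetical helper, and it uses the three-argument standard C form of wcstok, whereas the IWBasic snippet calls the older two-argument Microsoft runtime form. The token is tested before every use, and wcsstr searches it in place, so no token is ever copied into a fixed-size string.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: splits `buf` at '\n' and counts the pieces that
   contain `word`. `buf` must be writable, because wcstok overwrites each
   delimiter it finds with a null terminator. */
int count_lines_with(wchar_t *buf, const wchar_t *word)
{
    wchar_t *state = NULL;   /* position kept between wcstok calls */
    wchar_t *tok;
    int hits = 0;

    if (buf == NULL || word == NULL)
        return 0;

    /* Test tok before using it: wcstok returns NULL when no tokens remain. */
    for (tok = wcstok(buf, L"\n", &state); tok != NULL;
         tok = wcstok(NULL, L"\n", &state)) {
        if (wcsstr(tok, word) != NULL)   /* search in place, no copy */
            hits++;
    }
    return hits;
}
```

After the call, the buffer read as a string holds only its first piece, which is the modification the second note warns about. A loop that fetches the next token and dereferences it before testing for null will read through a null pointer once the tokens run out.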
Thanks for the replies,
The strange thing is that when I add the posted code, it works in the browser_test example but NOT in the browser_test2 example.
I get the following compile errors:
Compiling...
browser_test2.iwb
File: C:\2\projects\browser_test2.iwb (316) Error: Undefined variable cont
File: C:\2\projects\browser_test2.iwb (316) Error: FUNCTION (GetPropA): invalid type in parameter 1 (typeOpr)
File: C:\2\projects\browser_test2.iwb (316) Error: Cannot assign none to pointer
Error(s) in compiling C:\2\projects\browser_test2.iwb
It's complaining about this line:
pointer p = GetPropA(cont.hWnd, "BROWSER")
Any ideas on how I can fix this?
Thanks very much,
Andy.
The browser_test2.iwb example is a multi-browser program. Global window variables have been moved to a linked list g_list.
Change your sub to:
' version A
sub DoEnum(WINDOW cont)
' input variables:
' a) WINDOW cont - browser window
' get the browser control
IDispatch browser = GETBROWSERINTERFACE(cont)
if (browser)
BSTR bstrHtml = browser.Document.documentElement.outerHTML
if (bstrHtml)
' todo: open a file, write "\xFF\xFE", write *<WSTRING>bstrHtml
BFILE f
OPENFILE(f, "html dump.htm", "w")
WRITE f, "\xFF\xFE" ' unicode LE16 BOM
__WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2
CLOSEFILE f
FreeComString(bstrHtml)
endif
browser->Release()
endif
endsub
Or even (without the window variable) to:
' version B
sub DoEnum(IDispatch browser)
BSTR bstrHtml = browser.Document.documentElement.outerHTML
if (bstrHtml)
' todo: open a file, write "\xFF\xFE", write *<WSTRING>bstrHtml
BFILE f
OPENFILE(f, "html dump.htm", "w")
WRITE f, "\xFF\xFE" ' unicode LE16 BOM
__WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2
CLOSEFILE f
FreeComString(bstrHtml)
endif
endsub
Note that the GETBROWSERINTERFACE command returns the IWebBrowser2 interface, cast to the generic IDispatch type, with its reference count incremented. You need to call the Release method when you are finished with it.
Now it all depends on where you want to call DoEnum from.
1. From the global namespace:
a) First you'll need the pointer to the BROWSERDATA structure. It is returned by CreateBrowserWindow() and stored in the pFirstWindow pointer.
b) Call DoEnum(*pFirstWindow.cont) for version A.
2. From handler and browsehandler:
a) The "p" pointer points to the BROWSERDATA structure. The idSave command (button) calls GETBROWSERINTERFACE and executes an OLECMDID_SAVEAS OLE command. Use it as a model for your own handler for a new button. There you can call DoEnum(browser) for version B, or DoEnum(.cont) for version A.
Thanks sapero!
Hi,
I have incorporated the DoEnum routine into my browser (based on the browser_test2 example - thanks sapero).
It works exactly as I want it to, except when the browser opens a "second" dummy window. At that point the browser crashes.
I've amended the DoEnum routine to write a file containing the HTML code of the web page. The file is then read to find a keyword I am looking for.
I don't understand why it only crashes when the second window is opened.
This is the code:
CASE @IDDOCUMENTCOMPLETE
'best place to update toolbar buttons
CONTROLCMD .win, 999, @TBENABLEBUTTON, idBack, BROWSECMD(.cont, @BACKENABLED)
CONTROLCMD .win, 999, @TBENABLEBUTTON, idForward, BROWSECMD(.cont, @FORWARDENABLED)
DoEnum(.cont)
And later the subroutine:
sub DoEnum(WINDOW cont)
OPENFILE(fl,"C:\\2\\dump.txt","W")
' input variables:
' a) WINDOW cont - browser window
' get the browser control
IDispatch browser = GETBROWSERINTERFACE(cont)
if (browser)
BSTR bstrHtml = browser.Document.documentElement.outerHTML
if (bstrHtml)
pointer tok = wcstok(bstrHtml, L"\n") ' defined in
DO
*<WSTRING>tok 'is the first/next line
tok = wcstok(0, L"\n")
ax = *<WSTRING>tok
b$ = ""
b$ = W2S(ax)
b$ = b$ + "\n"
WRITE fl,b$
UNTIL tok = 0
CLOSEFILE fl
FreeComString(bstrHtml)
endif
browser->Release()
ENDIF
DEF myfile as BFILE
DEF str as STRING
DEF ln:string
IF OPENFILE(myfile, "C:\\2\\dump.txt", "r") = 0
DO
ln = ""
ln = space$(254)
READ myfile,ln
IF INSTR(ln,"bbc.co.uk")
'do something here
CLOSEFILE myfile
RETURN
ENDIF
UNTIL EOF(myfile)
CLOSEFILE myfile
ENDIF
RETURN
endsub
This is the bug report:
Problem signature:
Problem Event Name: APPCRASH
Application Name: NewTest New Version.exe
Application Version: 0.0.0.0
Application Timestamp: 4e7877ac
Fault Module Name: NewTest New Version.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 4e7877ac
Exception Code: c0000005
Exception Offset: 0004831a
OS Version: 6.1.7600.2.0.0.256.1
Locale ID: 2057
Additional Information 1: 0a9e
Additional Information 2: 0a9e372d3b4ad19135b953a78882e789
Additional Information 3: 0a9e
Additional Information 4: 0a9e372d3b4ad19135b953a78882e789
Can anyone tell me why it's crashing, and what I must do to fix it, in plain English?
Thanks as always,
Andy.