Browser Control - Window content

Started by Peter, November 26, 2008, 03:05:16 PM


Peter

I've been looking everywhere it feels like, and I don't know how this could be accomplished.

I am loading a web page whose content has to be evaluated first (JavaScript and so on), so looking at the HTML source alone doesn't help. I need to look at the rendered page itself and extract the information I need from there. Does anyone know of a shortcut for this? The page layout may also change from time to time, so I'll have to inspect the contents and decide which parts are useful before processing them.

Cheers.

Rock Ridge Farm (Larry)

PHP is the way to go for web forms.
I use it for data collection and database activities.

Peter

Oh, maybe I didn't explain it as well as I should've.

I'm not writing a server-side application. I'm doing something that needs a bit of "browser automation": the application has to "collect" items from what is visible on the page. The problem arises when JavaScript loads parts of the page dynamically. That's why I need something that doesn't just read the source but lets my application see what the user is seeing. The Browser Control is a brilliant base (it can be kept hidden if needed), but I can't find enough information about intercepting and interpreting what is shown in the browser control window.

Thanks.

LarryMc

I have no idea whether this will help in your situation, but you can load a "webpage" into a form, then "post" the form and read everything that's there.

But being able to "see" what's on a page when it can come in varying formats isn't going to be easy, if it is possible at all.
People can display things on a web page using plain HTML, DHTML, XML transforms, frames, iframes, Java applets, and with or without elaborate CSS files.

I wish you luck, but......

Larry

JohnP

Quote from: Peter on November 27, 2008, 05:04:04 AM
Oh, maybe I didn't explain it as well as I should've.

I'm not writing a server-side application. I'm doing something that needs a bit of "browser automation": the application has to "collect" items from what is visible on the page. The problem arises when JavaScript loads parts of the page dynamically. That's why I need something that doesn't just read the source but lets my application see what the user is seeing. The Browser Control is a brilliant base (it can be kept hidden if needed), but I can't find enough information about intercepting and interpreting what is shown in the browser control window.

Thanks.

Peter,
I'm not sure I fully understand your request, but here goes.

I guess that your program needs to download a web page, examine its HTML code, and then give the user a choice of several options depending on what that code contains.
If I'm roughly on target, then I think you might need to write what is sometimes called a 'web spider'.

If so, I can help a bit, since that is exactly what I am currently writing.
I have written (in a programming language other than EBasic) a large (2000 lines of code) specialist spider to extract and process data from the web.
The spider saves the data to a large text file in .csv format.
It is aimed at a particular website, so is written with that website's layout in mind.
It won't work with any other website, but...

I'm now re-writing the complete spider in EBasic, so that I can use its built-in database capability.

I'm not suggesting that you need all that, but maybe the routines I use to download, examine and extract data, which are more general in nature, might be of assistance?

JohnP

Peter

December 01, 2008, 02:23:22 AM #5 Last Edit: December 01, 2008, 02:50:05 AM by Peter
JohnP:

Not exactly. The HTML is easy to download, but the problem is that the page uses JavaScript to show certain text (some of it auto-updating) in certain places, such as the time and currency values in my case. What I need is to "see" the page the way a normal browsing user would see it. That is, I need to "remote control" and "snoop" inside the browser window, and I know that it is possible. I just don't know how. :)

Edit:
I will also need to simulate text input and clicks on buttons, images, hyperlinks and so on. That is hard to do with GET and POST alone, since many pages generate the code to be "executed" on the fly with JavaScript when an object is clicked.

sapero

December 01, 2008, 10:25:34 AM #6 Last Edit: December 01, 2008, 12:40:20 PM by sapero
Peter, I've created a mini example for you, just to show a bit about browser internals.

It opens the Google homepage, puts a string into the search box and clicks the search button. Once the search results are ready, the 'Support Forums' anchor is clicked and a message box shows who is online.
It also extends the browser library with the missing DOCUMENT_COMPLETE event. I have used the first unused ID for this event.
$include "windowssdk.inc"
$include "mshtml.inc"
$include "exdisp.inc"

WINDOW g_win
UINT   g_ondoccompleted ' callback pointer
setid  "IDDOCCOMPLETE",0x8011 ' first unused id

declare BROWSERCB(WINDOW win, IWebBrowser2 browser)

InstallDocumentCompleteSupport()

OPENWINDOW g_win,0,0,600,400,0x10CF0000,0,"",&win_handler

if (AttachBrowser(g_win) = 0)

IWebBrowser2 g_browser
if (BrowserFromWindow(g_win, &g_browser))
' call OnGoogleOpened after the navigation completes
'update callback
g_ondoccompleted = &OnGoogleOpened
' and navigate
g_browser->Navigate(L"http://www.google.com/webhp?sourceid=navclient&ie=UTF-8",0,0,0,0)
' release browser
g_browser->Release()

waituntil g_win.hwnd = 0
endif
endif




sub OnGoogleOpened(WINDOW win, IWebBrowser2 browser)
IHTMLDocument2 document

if (BrowserGetDocument(browser, &document))

' 1. put "aurora compiler" into search box
' 2. change submit button text
' 3. submit
IHTMLInputElement textfind
if (DocumentGetElementById(document, L"q", _IID_IHTMLInputElement, &textfind))

textfind->put_value(L"aurora compiler")
textfind->Release()

IHTMLInputElement searchbutton
if (DocumentGetElementById(document, L"btnG", _IID_IHTMLInputElement, &searchbutton))

ElementFocus(searchbutton)
searchbutton->put_value(L"Click Me  Click Me  Click Me")

if (MessageBox(win, "search now ?", "", MB_YESNO) = @IDYES)
'update callback
g_ondoccompleted = &OnGoogleResults
ElementClick(searchbutton)
endif
searchbutton->Release()

endif
endif
document->Release()
endif

return
endsub


sub OnGoogleResults(WINDOW win, IWebBrowser2 browser)
int index

' find and click 'Support Forums'

IHTMLElementCollection all
if (BrowserGetAll(browser, &all)) /*browser.document.all*/

' number of elements
int count=0
all->get_length(&count)

' for each element
for index=0 to count-1

IHTMLAnchorElement anchor
if (CollectionGetItem(all, index, _IID_IHTMLAnchorElement, &anchor))

' an anchor has text and href attributes (among others)
BSTR bstrText=0
'if (ElementGetInnerText(anchor, &bstrText))
if (ElementGetAttribute(anchor, L"innerText", &bstrText))

' bstrText is a wstring
if (wcsicmp(bstrText, L"Support Forums") = 0)
' if you are interested in this anchor ...
' ... click it
if (MessageBox(win, "click on 'Support Forums' anchor ?", "Support Forums", MB_YESNO) = @IDYES)
'if (MessageBox(win, "in new window ?", "Support Forums", MB_YESNO) = @IDNO)
anchor->put_target(NULL) ' but in same window
'update callback
g_ondoccompleted = &OnSupportForumsOpened
'else
' anchor->put_target(L"_blank")
'endif
ElementClick(anchor)
' break the FOR
index = count
endif
endif
SysFreeString(bstrText)
endif
anchor->Release()
endif

next index
all->Release()
endif
return
endsub


sub OnSupportForumsOpened(WINDOW win, IWebBrowser2 browser)
' this will be a surprise
int index

IHTMLElementCollection all
if (BrowserGetAll(browser, &all))

' number of elements
int count=0
all->get_length(&count)

' for each element
for index=0 to count-1

IHTMLAnchorElement anchor
if (CollectionGetItem(all, index, _IID_IHTMLAnchorElement, &anchor))

' anchor has text and href
BSTR bstrHref=0
if (ElementGetAttribute(anchor, L"href", &bstrHref))

' bstrHref is a wstring
if (wcsstr(bstrHref, L"?action=who"))

BSTR bstrText=0
'if (ElementGetInnerText(anchor, &bstrText))
if (ElementGetAttribute(anchor, L"innerText", &bstrText))

MessageBoxW(win.hwnd, bstrText, L"Surprise", 0)
SysFreeString(bstrText)
index = count
endif

endif
SysFreeString(bstrHref)
endif
anchor->Release()
endif

next index
all->Release()
endif
return
endsub



sub win_handler
IWebBrowser2 browser
UINT function
SELECT @MESSAGE

CASE @IDCREATE
CENTERWINDOW *<WINDOW>@HITWINDOW

CASE @IDDOCCOMPLETE
if (g_ondoccompleted and BrowserFromWindow(*<WINDOW>@HITWINDOW, &browser))

function = g_ondoccompleted
g_ondoccompleted = 0

!<BROWSERCB>function(*<WINDOW>@HITWINDOW, browser)
browser->Release()
endif

CASE @IDCLOSEWINDOW
CLOSEWINDOW *<WINDOW>@HITWINDOW

ENDSELECT
RETURN
ENDSUB


'================================================================== html util

sub BrowserFromWindow(WINDOW w, pointer ppBrowser),BOOL
' This will return IWebBrowser2 object.
' Call Release() method when finished with it.
pointer p = GetProp(w.hwnd, "BROWSER")
BOOL success = FALSE
*<int>ppBrowser = 0

if (p)
IUnknown unk = *<comref>p
if (unk <> 0)
success = (unk->QueryInterface(_IID_IWebBrowser2, ppBrowser) = 0)
endif
endif

return success
endsub


sub BrowserGetDocument(IWebBrowser2 browser, pointer ppv),BOOL
' This will return IHTMLDocument2
BOOL success = FALSE

IDispatch disp = 0
if ((browser->get_Document(&disp) = 0) and (disp <> 0))
success = (disp->QueryInterface(_IID_IHTMLDocument2, ppv) = 0)
disp->Release()
endif

return success
endsub


sub DocumentGetElementById(IHTMLDocument2 document, LPWSTR id, pointer refiid, pointer ppv),BOOL
BOOL success = FALSE
if (refiid = 0) then refiid = _IID_IHTMLElement

IHTMLDocument3 doc = 0
if (document->QueryInterface(_IID_IHTMLDocument3, &doc) = 0)

IHTMLElement element = 0
if ((doc->getElementById(id, &element) = 0) and (element <> 0))

success = (element->QueryInterface(refiid, ppv) = 0)
element->Release()

endif
doc->Release()
endif
return success
endsub


sub CollectionGetItem(IHTMLElementCollection all, int index, pointer refiid, pointer ppv),BOOL
BOOL success = FALSE
VARIANT vName
VARIANT vIndex
vName.vt     = VT_I4
vIndex.vt    = VT_EMPTY
vName.intVal = index

IDispatch pDisp = 0
if ((all->item(vName, vIndex, &pDisp) = 0) and (pDisp <> 0))
success = (pDisp->QueryInterface(refiid, ppv) = 0)
pDisp->Release()
endif
return success
endsub


sub BrowserGetAll(IWebBrowser2 browser, pointer ppAall),BOOL
' this will return IHTMLElementCollection
BOOL success = FALSE

IHTMLDocument2 document
if (BrowserGetDocument(browser, &document))

if ((document->get_all(ppAall) = 0) and *<int>ppAall)
success = TRUE
endif
document->Release()
endif
return success
endsub


sub ElementClick(IDispatch object)
IHTMLElement element = 0
if (object->QueryInterface(_IID_IHTMLElement, &element) = 0)
element->click()
element->Release()
endif
endsub


sub ElementFocus(IDispatch object)
IHTMLElement2 element = 0
if (object->QueryInterface(_IID_IHTMLElement2, &element) = 0)
element->focus()
element->Release()
endif
endsub

/*
sub ElementGetInnerText(IDispatch object, pointer ppv),BOOL
' this will return BSTR
' you get same result calling ElementGetAttribute(element, L"innerText", &bstrText)
BOOL success = FALSE

IHTMLElement element=0
if (object->QueryInterface(_IID_IHTMLElement, &element) = 0)

if ((element->get_innerText(ppv) = 0) and *<int>ppv)
success = TRUE
endif

element->Release()
endif
return success
endsub*/


sub ElementGetAttribute(IDispatch object, LPWSTR attribute, pointer ppv),BOOL
' this will return BSTR
BOOL success = FALSE

IHTMLElement element=0

if (object->QueryInterface(_IID_IHTMLElement, &element) = 0)

VARIANT v
if (element->getAttribute(attribute, 0, &v) = 0)

if ((v.vt <> VT_UNKNOWN) and (v.vt <> VT_DISPATCH))
VariantChangeType(&v, &v, VARIANT_ALPHABOOL, VT_BSTR)
endif

if (v.vt = VT_BSTR)
*<BSTR>ppv = v.bstrVal
v.vt = VT_EMPTY
success = TRUE
endif

VariantClear(&v)
endif

element->Release()
endif
return success
endsub


'================================================
' This is a trick to receive OnDocumentComplete.
' The library's _DocumentComplete function is empty, 16 bytes long, and takes two parameters.
declare extern _DocumentComplete()
declare _dcpath()

_asm
jmp _skip
_dcpath: ; replacement for _DocumentComplete
push dword _doccompl
ret

align 4
_doccompl:
add  esp,12          ; eat 2 parameters and return address
mov  eax,[esp+4]     ; WebEvents*
mov  edx,[esp+24]    ; pDispParams
mov  edx,[edx]       ; VARIANT[]

push dword [edx+8]   ; VARIANT *URL
push dword [edx+16+8]; IDispatch *pDisp
push dword [eax+12]  ; hwnd
call OnDocumentComplete
ret 0x24             ; return from WebEvents::Invoke
_skip:
_endasm

sub InstallDocumentCompleteSupport()
' overwrite the _DocumentComplete function
WriteProcessMemory(GetCurrentProcess(),&_DocumentComplete,&_dcpath,6,0)
return
endsub


sub OnDocumentComplete(HWND hwnd, IDispatch pDisp, VARIANT URL)
' pDisp - Pointer to the IDispatch interface of the window or frame in which the document has loaded.
' This IDispatch interface can be queried for the IWebBrowser2 interface.
_SendMessage(hwnd, @IDDOCCOMPLETE,0,0)
return
endsub
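
The control flow of the listing above can be illustrated without COM or a browser: before each navigation a callback is stored in g_ondoccompleted, and when the document-complete message arrives the handler copies the callback, clears the slot, and only then calls it. The following is a minimal sketch in Python, not IWBasic, and it only simulates the events. The names Navigator, navigate, fire_document_complete, on_search_page and on_results are hypothetical stand-ins and do not come from the browser library.

```python
class Navigator:
    """Hypothetical stand-in for the browser window plus its message handler."""

    def __init__(self):
        self.on_doc_complete = None  # plays the role of g_ondoccompleted
        self.log = []

    def navigate(self, url, callback):
        # Arm the callback first, then start the (simulated) navigation.
        self.on_doc_complete = callback
        self.log.append("navigate " + url)

    def fire_document_complete(self):
        # Mirrors the @IDDOCCOMPLETE case: copy the slot, clear it, then call.
        callback = self.on_doc_complete
        if callback is None:
            return False
        self.on_doc_complete = None
        callback(self)
        return True


def on_search_page(nav):
    nav.log.append("search page ready")
    # The callback arms the next step itself, as OnGoogleOpened does.
    nav.navigate("results", on_results)


def on_results(nav):
    nav.log.append("results ready")


def run_demo():
    nav = Navigator()
    nav.navigate("search", on_search_page)
    # Simulate three document-complete events; the third finds no callback.
    fired = [nav.fire_document_complete() for _ in range(3)]
    return nav.log, fired
```

Clearing the slot before the call matters: the callback may arm the next step, and clearing afterwards would wipe that out. It also makes each callback one-shot, so a stray completion event (a frame finishing, for example) is ignored.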

Peter

sapero: You amaze me every time you post. It's quite an advanced method, inline asm and all, just to accomplish this, but it sure is a push in the right direction. Thanks!

sapero

I have updated the code: I added the OnGoogleResults function, which searches for an anchor with a given text. If it is found, the anchor is clicked and... (top secret).
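
The matching logic in OnGoogleResults (walk every element, keep the anchors, compare the anchor text case-insensitively, then act on the first hit) can be sketched independently of MSHTML. The Python example below is only an illustration of that comparison: it parses a static HTML string with the standard library, so it does not see script-generated content the way the live document.all collection does. AnchorCollector and find_anchor_by_text are hypothetical names.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element in a static page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text of nested tags (e.g. <b>) is also delivered here.
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._in_anchor = False


def find_anchor_by_text(html, wanted):
    """Return the href of the first anchor whose text matches, ignoring case."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.anchors:
        if text.lower() == wanted.lower():  # same idea as wcsicmp(...) = 0
            return href
    return None
```

In the IWBasic code the hit is clicked through the live element rather than returned as a URL, which is what lets page scripts attached to the anchor run.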

Peter

I can hardly even speak. The implementation is way more advanced than anything that I could come up with, and that's just the internals on how to find controls etc. This could be a priceless library if worked on a little.

The trick with the OnDocumentComplete-function is so far out of my league that I won't even start discussing it.

You're a god.

sapero

December 05, 2008, 06:24:09 PM #10 Last Edit: December 05, 2008, 06:33:16 PM by sapero
The next example shows how to attach your own dispatch class to a DHTML event (onreadystatechange), how to set the src attribute of an image element, and how to append text to an element.

A headers update is required (6th December); it fixes the setAttribute method!
The DHTMLDispatch class does not use a reference counter (AddRef and Release just return 1); one is not required here.
$include "windowssdk.inc"
$include "mshtml.inc"
$include "exdisp.inc"

' required SDK pak from 6th december 2008, or newer

class DHTMLDispatch
declare DHTMLDispatch()
declare virtual QueryInterface(REFIID riid, pointer ppvObject),HRESULT
declare virtual AddRef(),ULONG
declare virtual Release(),ULONG
declare virtual GetTypeInfoCount(pointer pctinfo),HRESULT
declare virtual GetTypeInfo(UINT iTInfo, LCID lcid, ITypeInfo ppTInfo),HRESULT
declare virtual GetIDsOfNames(REFIID riid, LPOLESTR rgszNames[], UINT cNames, LCID lcid, pointer rgDispId),HRESULT
declare virtual Invoke(DISPID dispIdMember, REFIID riid, LCID lcid, USHORT wFlags, DISPPARAMS pDispParams, VARIANT pVarResult, EXCEPINFO pExcepInfo, UINT puArgErr byref)

uint    m_callback /* OnEvent(DHTMLDispatch disp) */
' user data here
IHTMLElement m_image
IHTMLElement m_text
endclass


WINDOW        g_win
DHTMLDispatch g_dLogo



OPENWINDOW g_win,0,0,600,400,0x10CF0000,0,"DHTML events", &win_handler
if (AttachBrowser(g_win) = 0)
BROWSECMD g_win, @BROWSELOAD, "<html><head><base href='http://ionicwind.com/forums/'></head><body><img id=smfLogo border=1><br><br><div id=myInfo style='border: 1px solid'></div></body></html>"

IWebBrowser2 g_browser
if (BrowserFromWindow(g_win, &g_browser))

' InitImageHref will set g_dLogo.m_image to document.smfLogo, g_dLogo.m_text to myInfo,
' image.onreadystatechange to OnImageReadyStateChanged, image.src to SMF logo url.
InitImageHref(g_browser, L"smfLogo", L"Themes/babylon/images/smflogo.gif", g_dLogo)
g_browser->Release()

AppendTextLine(g_dLogo, L"entering message loop")
waituntil g_win.hwnd = 0

' cleanup
if (g_dLogo.m_image <> 0) then g_dLogo.m_image->Release()
if (g_dLogo.m_text <> 0) then g_dLogo.m_text->Release()

endif
endif
end


sub InitImageHref(IWebBrowser2 browser, LPWSTR wszImgId, LPWSTR wszImgSrc, DHTMLDispatch dStateChangeDisp)
VARIANT v

IHTMLDocument2 document
if (BrowserGetDocument(browser, &document))

DocumentGetElementById(document, L"myInfo", _IID_IHTMLElement, &dStateChangeDisp.m_text)

IHTMLElement image
if (DocumentGetElementById(document, wszImgId, _IID_IHTMLElement, &image))

dStateChangeDisp.m_callback = &OnImageReadyStateChanged
dStateChangeDisp.m_image    = image

AppendTextLine(dStateChangeDisp, L"initializing onreadystatechange")
v.vt       = VT_DISPATCH
v.pDispVal = &dStateChangeDisp
ElementSetAttributeEx(image, L"onreadystatechange", v)

AppendTextLine(dStateChangeDisp, L"initializing image.src with " + *<WSTRING>wszImgSrc)
v.vt       = VT_BSTR
v.bstrVal  = SysAllocString(wszImgSrc)
ElementSetAttributeEx(image, L"src", v)
VariantClear(&v)
'image->Release() !! keep a reference in g_dLogo.m_image

endif
document->Release()
endif

AppendTextLine(dStateChangeDisp, L"returning from InitImageHref function")
return
endsub


' called when ready-state of image changes
sub OnImageReadyStateChanged(DHTMLDispatch disp)

VARIANT state

if (ElementGetAttributeEx(disp.m_image, L"readyState", state))

AppendTextLine(disp, L"ready state: " + state.*<WSTRING>bstrVal)

if (wcsicmp(state.bstrVal, L"complete") = 0)
disp.m_image->Release()
disp.m_image = 0
endif

VariantClear(&state)
endif

return
endsub


sub AppendTextLine(DHTMLDispatch disp, wstring wszText)
VARIANT text

if (disp.m_text <> 0)

if (ElementGetAttributeEx(disp.m_text, L"innerHTML", text))

BSTR newString = SysAllocString(text.*<WSTRING>bstrVal + wszText + L"<br>")
SysFreeString(text.bstrVal)
text.bstrVal = newString
ElementSetAttributeEx(disp.m_text, L"innerHTML", text)
SysFreeString(text.bstrVal)

endif
endif

return
endsub


sub win_handler
SELECT @MESSAGE

CASE @IDCREATE
CENTERWINDOW *<WINDOW>@HITWINDOW

CASE @IDCLOSEWINDOW
CLOSEWINDOW *<WINDOW>@HITWINDOW

ENDSELECT
RETURN
ENDSUB

' general dispatch class

sub DHTMLDispatch::DHTMLDispatch()
m_callback = 0
m_image    = 0
m_text     = 0
return
endsub

sub DHTMLDispatch::QueryInterface(REFIID riid, pointer ppvObject),HRESULT
if (IsEqualGUID(riid, _IID_IUnknown) or IsEqualGUID(riid, _IID_IDispatch))
*<pointer>ppvObject = this
AddRef()
return 0
endif
return E_NOINTERFACE
endsub

sub DHTMLDispatch::AddRef(),ULONG
return 1
endsub

sub DHTMLDispatch::Release(),ULONG
return 1
endsub

sub DHTMLDispatch::GetTypeInfoCount(pointer pctinfo),HRESULT
*<int>pctinfo = 0
return 0
endsub

sub DHTMLDispatch::GetTypeInfo(UINT iTInfo, LCID lcid, ITypeInfo ppTInfo),HRESULT
return E_NOINTERFACE
endsub

sub DHTMLDispatch::GetIDsOfNames(REFIID riid, LPOLESTR rgszNames[], UINT cNames, LCID lcid, pointer rgDispId),HRESULT
return E_FAIL
endsub

sub DHTMLDispatch::Invoke(DISPID dispIdMember, REFIID riid, LCID lcid, USHORT wFlags, DISPPARAMS pDispParams, VARIANT pVarResult, EXCEPINFO pExcepInfo, UINT puArgErr byref)
if (m_callback)
declare CB1(DHTMLDispatch d)
!<CB1>m_callback(*<DHTMLDispatch>this)
endif
return
endsub


' html helpers

sub ElementGetAttributeEx(IDispatch object, LPWSTR attribute, VARIANT ppv),BOOL
' this will return BSTR
BOOL success = FALSE

IHTMLElement element=0

if (object->QueryInterface(_IID_IHTMLElement, &element) = 0)
success =  (element->getAttribute(attribute, 0, &ppv) = 0)
element->Release()
endif

return success
endsub


sub ElementSetAttributeEx(IDispatch object, LPWSTR name, VARIANT v)

IHTMLElement element
if (object->QueryInterface(_IID_IHTMLElement, &element) = 0)
element->setAttribute(name, v, 0)
element->Release()
endif

VariantClear(&v)

return
endsub



sub BrowserFromWindow(WINDOW w, pointer ppBrowser),BOOL
' This will return IWebBrowser2 object.
' Call Release() method when finished with it.
pointer p = GetProp(w.hwnd, "BROWSER")
BOOL success = FALSE
*<int>ppBrowser = 0

if (p)
IUnknown unk = *<comref>p
if (unk <> 0)
success = (unk->QueryInterface(_IID_IWebBrowser2, ppBrowser) = 0)
endif
endif

return success
endsub



sub BrowserGetDocument(IWebBrowser2 browser, pointer ppv),BOOL
' This will return IHTMLDocument2
BOOL success = FALSE

IDispatch disp = 0
if ((browser->get_Document(&disp) = 0) and (disp <> 0))
success = (disp->QueryInterface(_IID_IHTMLDocument2, ppv) = 0)
disp->Release()
endif

return success
endsub


sub DocumentGetElementById(IHTMLDocument2 document, LPWSTR id, pointer refiid, pointer ppv),BOOL
BOOL success = FALSE

IHTMLDocument3 doc = 0
if (document->QueryInterface(_IID_IHTMLDocument3, &doc) = 0)

IHTMLElement element = 0
if ((doc->getElementById(id, &element) = 0) and (element <> 0))

success = (element->QueryInterface(refiid, ppv) = 0)
element->Release()

endif
doc->Release()
endif
return success
endsub

sapero

Attached is a project that extends the example above. It has an additional input box and a button. If you click the button, the image is downloaded from the URL typed into the input box.
The URL can be relative to http://www.ionicwind.com/forums/ (see <base href>) or it can be fully qualified.
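
How a relative URL combines with the <base href> can be checked outside the browser. The sketch below uses Python's standard urljoin, which follows the generic URL resolution rules; the browser control does its own resolution, so treat this as an approximation that should agree for simple cases like these. The resolve helper and the example.com address are hypothetical.

```python
from urllib.parse import urljoin

# Base taken from the <base href> in the example page.
BASE = "http://www.ionicwind.com/forums/"


def resolve(src, base=BASE):
    """Resolve an image URL the way a relative reference is joined to a base."""
    return urljoin(base, src)


if __name__ == "__main__":
    # Relative path: appended below the base directory.
    print(resolve("Themes/babylon/images/smflogo.gif"))
    # Fully qualified URL: returned unchanged.
    print(resolve("http://example.com/logo.png"))
    # Leading slash: resolved against the host root, not /forums/.
    print(resolve("/logo.gif"))
```

The third case is the one to watch: a path starting with a slash ignores the /forums/ part of the base.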