Capture info in Browser window

Started by billhsln, March 16, 2011, 11:26:42 AM

billhsln

Is there a way to capture or scan through the source of a Browser window?

Thanks,
Bill
When all else fails, get a bigger hammer.

Egil

Maybe I misunderstand, but I usually find an empty space on the particular web page, Right click and choose "Show Source".

Egil
Support Amateur Radio  -  Have a ham  for dinner!

sapero

March 16, 2011, 12:20:47 PM #2 Last Edit: March 18, 2011, 08:56:45 AM by sapero
I think Bill wants to access the DOM from within his browser application.
I've used the browser_test.iwb example - added a new toolbar button "Enum" with a handler:

INT tbArray[10] ' changed 9 to 10
CONST idEnum = 0x107 ' added
...
tbArray = idBack,idForward,idStop,idRefresh,idHome,0,idSearch,idPrint,idEnum ' added idEnum
...
IF LOADTOOLBAR(win,"",999,tbArray,9,@TBTOP | @TBLIST) ' changed 8 to 9
CONTROLCMD win,999,@TBSETLABELS,"Back|Forward|Stop|Refresh|Home|Search|Print|Enum||" ' added Enum
...
SUB handler
...
CASE idEnum ' added
DoEnum() ' added


And the handler for the new toolbar button:
declare import,GetPropA(int hwnd, string name),int
extern _IID_IWebBrowser2 as GUID
typedef BSTR pointer

sub DoEnum()
   ' input variables:
   ' a) WINDOW cont - browser window

   ' open the console and clear it
   CLOSECONSOLE
   OPENCONSOLE

   ' get the browser control
   IDispatch browser
   pointer p = GetPropA(cont.hWnd, "BROWSER")
   if (p)

      IDispatch tmp = *<comref>p
      if (tmp && !tmp->QueryInterface(_IID_IWebBrowser2, &browser))

         IDispatch all = browser.Document.all
         if (all)
            UINT count = all.length
            INT  itemIndex

            for itemIndex = 0 to count-1
               IDispatch element = all.item(itemIndex)
               if (element)
                  BSTR bstrTagName = element.tagName
                  if (bstrTagName)
                     print *<WSTRING>bstrTagName
                     FreeComString(bstrTagName)
                  endif
                  element->Release()
               endif
            next itemIndex

            all->Release()
         endif
         browser->Release()
      endif
   endif
endsub


Output:
!
HTML
HEAD
TITLE
STYLE
META
BODY
TABLE
TBODY
TR
TD
IMG
TD
H1
TR
TD
FONT
TR
TD
FONT
...

billhsln

What I would like to do is save the HTML of any site I visit in a Browser window to a file.

Thanks,
Bill
When all else fails, get a bigger hammer.

sapero

March 16, 2011, 02:12:30 PM #4 Last Edit: March 17, 2011, 11:12:13 AM by sapero
There are several ways to save the page. The default way is to execute an OLECMDID_SAVEAS command, which lets the user choose the desired format:
sub DoEnum()
   ' input variables:
   ' a) WINDOW cont - browser window

   ' get the browser control
   IDispatch browser
   pointer p = GetPropA(cont.hWnd, "BROWSER")
   if (p)

      IDispatch tmp = *<comref>p
      if (tmp && !tmp->QueryInterface(_IID_IWebBrowser2, &browser))

         const OLECMDID_SAVEAS = 4
         const OLECMDEXECOPT_DODEFAULT = 0
         browser.ExecWB(OLECMDID_SAVEAS, OLECMDEXECOPT_DODEFAULT)
         browser->Release()
      endif
   endif
endsub


OLECMDID_SAVEAS may fail if your browser is strict and honors the "nocache" HTTP option received from the server. But you can always grab the HTML directly from the browser control and save it to a file. It will include all modifications made by JavaScript (document.write stuff):
sub DoEnum()
   ' input variables:
   ' a) WINDOW cont - browser window

   ' get the browser control
   IDispatch browser
   pointer p = GetPropA(cont.hWnd, "BROWSER")
   if (p)

      IDispatch tmp = *<comref>p
      if (tmp && !tmp->QueryInterface(_IID_IWebBrowser2, &browser))

         BSTR bstrHtml = browser.Document.documentElement.outerHTML
         if (bstrHtml)
            ' write a UTF-16 LE byte order mark, then the raw wide-character HTML
            BFILE f
            OPENFILE(f, "html dump.htm", "w")
            WRITE f, "\xFF\xFE" ' UTF-16 LE BOM
            __WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2 ' two bytes per character
            CLOSEFILE f
            FreeComString(bstrHtml)
         endif

         browser->Release()
      endif
   endif
endsub


This will dump the HTML tag and all its children, but will not include frames, because each frame has its own copy of the browser object.

billhsln

I put in the code from the second example and get the following errors:

Compiling...
browser_test.iwb
File: C:\Users\Public\Documents\IWBasic\MyProgs\Search Web\browser_test.iwb (177) syntax error - !
File: C:\Users\Public\Documents\IWBasic\MyProgs\Search Web\browser_test.iwb (179) invalid use of dot operator, unknown type
File: C:\Users\Public\Documents\IWBasic\MyProgs\Search Web\browser_test.iwb (179) Invalid assignment
File: C:\Users\Public\Documents\IWBasic\MyProgs\Search Web\browser_test.iwb (190) Warning: Uninitialized variable: browser - )
File: C:\Users\Public\Documents\IWBasic\MyProgs\Search Web\browser_test.iwb (192) syntax error - endif
Error(s) in compiling "C:\Users\Public\Documents\IWBasic\MyProgs\Search Web\browser_test.iwb"

Full program:

/* a basic test of the Emergence BASIC embedded browser control
Requires EBASIC 1.0 or greater
Compile as a WINDOWS target
*/
WINDOW win,cont,urldlg
STRING caption
INT tbArray[10]

declare import,GetPropA(int hwnd, string name),int
extern _IID_IWebBrowser2 as GUID
typedef BSTR pointer

CONST idBack = 0x100
CONST idForward = 0x101
CONST idStop = 0x102
CONST idRefresh = 0x103
CONST idHome = 0x104
CONST idSearch = 0x105
CONST idPrint = 0x106
CONST idEnum = 0x107
DECLARE IMPORT,GetSysColor(nIndex as INT),UINT
'open our main window and browser containing window
OPENWINDOW win,0,0,800,600,@NOAUTODRAW|@SIZE|@MINBOX|@MAXBOX,0,"BROWSER TEST",&handler
OPENWINDOW cont,0,0,800,580,@NOAUTODRAW|@NOCAPTION,win,"",&browsehandler

'A URL entry window used as a toolbar
OPENWINDOW urldlg,0,0,800,35,@NOAUTODRAW|@NOCAPTION|@BORDER,win,"",&UrlHandler
CONTROL urldlg,@EDIT,"",54,6,500,22,@CTEDITMULTI|@CTEDITRETURN|@CTEDITAUTOV,2
CONTROL urldlg,@STATIC,"URL",21,8,31,16,0x5000010B,3
SETWINDOWCOLOR urldlg,GetSysColor(15)
SETFONT urldlg,"MS Sans Serif",-13,400,0,2

'add a status window for messages
CONTROL win,@STATUS,"Status",0,0,0,0,0,2
'create a toolbar. It will send its messages to the main window
tbArray = idBack,idForward,idStop,idRefresh,idHome,0,idSearch,idPrint,idEnum
IF LOADTOOLBAR(win,"",999,tbArray,9,@TBTOP | @TBLIST)
   CONTROLCMD win,999,@TBSETLABELS,"Back|Forward|Stop|Refresh|Home|Search|Print|Enum||"
   CONTROLCMD win,999,@TBRESIZE
   CONTROLCMD win,999,@TBENABLEBUTTON,idBack,FALSE
   CONTROLCMD win,999,@TBENABLEBUTTON,idForward,FALSE
ENDIF
'Create and attach the browser to the containing window
'ATTACHBROWSER returns 0 on success
IF ATTACHBROWSER(cont) <> 0
   MESSAGEBOX win,"Unable to create embedded browser","error"
   CLOSEWINDOW urldlg
   CLOSEWINDOW cont
   CLOSEWINDOW win
   END
ENDIF

'Set the browser, status window and toolbars initial size
ResizeAll()
'Browse to some page
BROWSECMD cont,@NAVIGATE,"http://www.ionicwind.com"

WAITUNTIL win=0
END

SUB browsehandler
SELECT @CLASS
   CASE @IDBEFORENAV
      BROWSECMD(cont,@GETNAVURL,caption,255)      
      SETCAPTION win,caption
      SETCONTROLTEXT urldlg,2,caption
   CASE @IDNAVCOMPLETE
      BROWSECMD(cont,@GETTITLE,caption,255)
      IF LEN(caption)   THEN SETCAPTION win,caption
      'best place to update toolbar buttons
      CONTROLCMD win,999,@TBENABLEBUTTON,idBack,BROWSECMD(cont,@BACKENABLED)
      CONTROLCMD win,999,@TBENABLEBUTTON,idForward,BROWSECMD(cont,@FORWARDENABLED)
   CASE @IDSTATUSTEXTUPDATE
      BROWSECMD(cont,@GETSTATUSTEXT,caption,255)      
      CONTROLCMD win,2,@SWSETPANETEXT,0,caption
ENDSELECT
RETURN
ENDSUB

SUB handler
SELECT @CLASS
   CASE @IDCLOSEWINDOW
      CLOSEWINDOW urldlg
      CLOSEWINDOW cont
      CLOSEWINDOW win
   CASE @IDSIZE
      ResizeAll()
   CASE @IDCONTROL
      IF @NOTIFYCODE = 0 /* ignore any tooltip messages */
         SELECT @CONTROLID
            CASE idBack
               BROWSECMD cont,@GOBACK
            CASE idForward
               BROWSECMD cont,@GOFORWARD
            CASE idStop
               BROWSECMD cont,@BROWSESTOP
            CASE idRefresh
               BROWSECMD cont,@REFRESH
            CASE idHome
               BROWSECMD cont,@GOHOME
            CASE idSearch
               BROWSECMD cont,@BROWSESEARCH
            CASE idPrint
               BROWSECMD cont,@BROWSEPRINT
            CASE idEnum
               DoEnum()
         ENDSELECT
      ENDIF
ENDSELECT
RETURN
ENDSUB

SUB UrlHandler
   DEF temp as STRING
   DEF rcClient as WINRECT
   SELECT @CLASS
      CASE @IDCONTROL
         SELECT @CONTROLID
            CASE 1
               BROWSECMD cont,@NAVIGATE,GETCONTROLTEXT(urldlg,2)
            CASE 2
               SELECT @NOTIFYCODE
                  CASE @ENCHANGE
                     temp = GETCONTROLTEXT(urldlg,2)
                     IF INSTR(temp,"\n")
                        BROWSECMD cont,@NAVIGATE,temp
                     ENDIF         
               ENDSELECT
         ENDSELECT
      CASE @IDSIZE
         'dynamically adjust the size of the edit control.
         GETCLIENTSIZE urldlg,rcClient.left,rcClient.top,rcClient.right,rcClient.bottom
         SETSIZE urldlg,54,6,rcClient.right - 58,22, 2
   ENDSELECT
RETURN
ENDSUB

SUB ResizeAll
   'use rects for convenience.
   'the Get*Size functions return width and height instead of right and bottom
   WINRECT rcClient,rcStatus,rcUrl
   'tell the status bar to resize itself
   CONTROLCMD win,2,@SWRESIZE
   CONTROLCMD win,999,@TBRESIZE
   GetClientSize win,rcClient.left,rcClient.top,rcClient.right,rcClient.bottom
   'get the size of the statusbar
   GetSize win,rcStatus.left,rcStatus.top,rcStatus.right,rcStatus.bottom,2
   'subtract the 'height' of the status bar control
   rcClient.bottom -= rcStatus.bottom
   'get the size of the toolbar
   GetSize win,rcStatus.left,rcStatus.top,rcStatus.right,rcStatus.bottom,999
   'subtract the 'height' of the toolbar. Do this by increasing the top of the rectangle by
   'the height of the toolbar and also reducing the height of the rectangle accordingly
   rcClient.top += rcStatus.bottom
   rcClient.bottom -= rcStatus.bottom
   'make room for our URL entry window
   GetSize urldlg,rcUrl.left,rcUrl.top,rcUrl.right,rcUrl.bottom
   rcClient.top += rcUrl.bottom
   rcClient.bottom -= rcUrl.bottom
   'set the size of the browsers containing window
   SetSize cont,rcClient.left,rcClient.top,rcClient.right,rcClient.bottom
   'finally set the size of the url window
   SetSize urldlg,rcClient.left,rcClient.top-rcUrl.bottom,rcClient.right,rcUrl.bottom
RETURN
ENDSUB

sub DoEnum()
   ' input variables:
   ' a) WINDOW cont - browser window

   ' get the browser control
   IDispatch browser
   pointer p = GetPropA(cont.hWnd, "BROWSER")
   if (p)

      IDispatch tmp = *<comref>p
      if (browser && !tmp->QueryInterface(_IID_IWebBrowser2, &browser))

         BSTR bstrHtml = browser.Document.documentElement.outerHTML
         if (bstrHtml)
            ' todo: open a file, write "\xFF\xFE", write *<WSTRING>bstrHtml
            BFILE f
            OPENFILE(f, "html dump.htm", "w")
            WRITE f, "\xFF\xFE" ' unicode LE16 BOM
            __WRITE f, *<char>bstrHtml, len(*<WSTRING>bstrHtml)*2
            CLOSEFILE f
            FreeComString(bstrHtml)
         endif

         browser->Release()
      endif
   endif
endsub

Not sure what I must have missed.

Thanks,
Bill
When all else fails, get a bigger hammer.

sapero

In my first reply there are three declarations above sub DoEnum() that you need to keep. I assumed you would use the modified sample and replace only the DoEnum function.
Ah, you have them. My code is for the 2.0 compiler: the ! operator and the dot operator for IDispatch were added in 2.0, so you need to update from EBasic to IWBasic.

EDIT: I have updated my code - there was a bug to the left of QueryInterface (the wrong variable was tested in the IF).

billhsln

I put in your changes and installed version 2.0.  It works exactly the way I wanted.

Thanks, Sapero....

Minor question, the compiler returns with:
File: C:\Documents and Settings\All Users\Documents\IWBasic\MyProgs\Search Web\browser_test.iwb (78) Warning: RETURN value expected.
File: C:\Documents and Settings\All Users\Documents\IWBasic\MyProgs\Search Web\browser_test.iwb (110) Warning: RETURN value expected.
File: C:\Documents and Settings\All Users\Documents\IWBasic\MyProgs\Search Web\browser_test.iwb (134) Warning: RETURN value expected.

The 3 SUBs it complains about are browsehandler, handler and UrlHandler, which do not have a RETURN in them. It still reports these warnings even when I put a RETURN as the last line of the SUB.  Is this a minor bug?

Thanks again,
Bill
When all else fails, get a bigger hammer.

sapero

It's not a bug, just a misleading declaration - it is easy to miss that a window/dialog handler must return a value.

See the listview_noresize example. It has "return TRUE" but no "return FALSE". This means that zero is the default return value, which asks for the message to be handled in the default way.
The previous compiler returned zero from all functions, even those defined as void (not returning anything). In the new compiler you can control this behavior and disable the hidden RETURN statements to produce smaller code. But if you enable this option and forget to return a value from a window/dialog handler, the handler will return random values, causing all received messages to be marked as "handled", which freezes the window and heats the CPU (because of the incorrectly handled WM_PAINT message).

The previous window handler was defined incorrectly:
sub d1_handler
Seeing it, you may think that it does not return any value - and that is what is misleading. It should be defined as follows:
sub d1_handler(),int
This makes it clear that it returns something, and that it is possible to return custom values.

You can still use the first definition, because the compiler detects that d1_handler was used in a call to OPENWINDOW or CREATEDIALOG and marks it as returning INT. But if you forget to return 0, it will warn you about it.

An internal library function that calls d1_handler expects the callback to return a value (zero by default), so you can't define the callback as void.

The new compiler has an extra option: disable the hidden "return 0". If you enable it and do not return a value, the window handler will return random values, causing your window to stop responding. The warning you mentioned tells you that the return value is missing.

If you create a new dialog using the built-in dialog designer, the generated source will include "(),int" and "return 0", so your callback will work correctly with any combination of compiler options.
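
As a rough illustration of the points above, here is how the main window handler from the posted browser_test.iwb listing might look with the explicit "(),int" declaration and a final "return 0". This is only a sketch: it assumes the win, cont and urldlg windows and the ResizeAll sub from that listing, it leaves out the toolbar cases for brevity, and it has not been compiled here.

```
' sketch: explicit INT return type, as in "sub d1_handler(),int"
SUB handler(),INT
   SELECT @CLASS
      CASE @IDCLOSEWINDOW
         CLOSEWINDOW urldlg
         CLOSEWINDOW cont
         CLOSEWINDOW win
      CASE @IDSIZE
         ResizeAll()
   ENDSELECT
   ' zero is the default value: handle the message in the default way
   RETURN 0
ENDSUB
```

Written this way, the handler should not depend on the compiler's hidden "return 0", so the "RETURN value expected" warning should no longer apply to it. The same change would go into browsehandler and UrlHandler.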

billhsln

Thanks for the info.  Will add the RETURN 0's.  Will need to add them into all my programs.  I prefer my compilers to run default without any special conditions.  That way I don't have to remember to change things, if I end up having to reload my system.

Thanks again for all the help.  Program works exactly as I was looking for.  Now, if I find a web page that I think has some neat features, I can quickly save the source code and take a look at it later.

Take care,
Bill
When all else fails, get a bigger hammer.