

XML Parser

Started by LarryMc, March 31, 2007, 12:00:20 PM


LarryMc

Does anyone out there have a wrapper and DLL for a SAX- or DOM-based XML parser?

Larry
LarryMc
Larry McCaughn :)
Author of IWB+, Custom Button Designer library, Custom Chart Designer library, Snippet Manager, IWGrid control library, LM_Image control library

Allan

I've heard of the Expat XML DLL, which has been used in IBP; you could Google for it.

LarryMc

For those who may be interested, I have found a DLL for a non-validating XML parser.

It reads an existing XML file into memory, lets you read, edit, and delete portions or all of the file, and then writes the file back out.

It also lets you create an XML file from scratch in memory and then write it out.

It seems like it will be an excellent way to store program configuration parameters without using the registry.
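The config-file idea can be sketched without the DLL (whose API isn't shown in this thread). Here is a minimal example, assuming Python's standard `xml.etree.ElementTree` module; the `<config>`/`<setting>` layout and the `save_config`/`load_config` names are hypothetical and only for illustration:

```python
import xml.etree.ElementTree as ET

def save_config(settings: dict) -> str:
    """Build a <config> document in memory and serialize it to a string."""
    root = ET.Element("config")  # hypothetical root element name
    for name, value in settings.items():
        # one <setting name="..."> element per parameter, value stored as text
        ET.SubElement(root, "setting", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")

def load_config(xml_text: str) -> dict:
    """Parse the document back and rebuild the name -> value mapping."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): s.text or "" for s in root.iter("setting")}

xml_text = save_config({"width": 640, "height": 480, "title": "Demo"})
print(xml_text)
print(load_config(xml_text))
```

Everything comes back as a string, so numeric settings need converting after loading.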

Anyway, it appears that I have successfully converted the declarations over to EBasic and have actually compiled and run a demo program to create a new xml file.

When I get the details worked out, I will share it.

NOTE: This is the first conversion from another language to EBasic that I have ever accomplished BY MYSELF!!!!! ;D


Jerry Muelver

Larry, that's super! I can't wait to see what you've come up with.  ;D

peterpuk

XML is flexible and can be used for a lot of things, so it will be good to see how it can be used with EB.
Good work... :)
Peter

LarryMc

For those that are interested in xml:

I abandoned the XML DLL mentioned above.
It took entirely too long to generate a new XML file after changes had been made.

I'm working on a new version from scratch.
Right now I can read an XML file, parse it, and store the info in a linked list, then read the linked list and reconstruct the XML in a new file.
It's much faster than before (by a factor of 10), and it's all based on dynamic memory allocation for nodes.

I can also go to the first, last, next, and previous nested nodes, with or without specifying attributes and attribute values.
Node nesting is limited only by available memory.

I have begun work on methods to add, delete, and modify nodes.
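Larry's actual node layout isn't shown, but the navigation he describes (first, last, next, previous, plus add and delete) is what a doubly linked child list provides. Here is a minimal Python sketch with hypothetical names (`Node`, `append`, `unlink`, `find_child`); object references stand in for the EBasic pointers:

```python
class Node:
    """One XML element; its children form a doubly linked list (hypothetical layout)."""

    def __init__(self, tag, attrs=None, text=""):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.text = text
        self.parent = None
        self.first_child = None
        self.last_child = None
        self.next = None  # next sibling
        self.prev = None  # previous sibling

    def append(self, child):
        """Add child as the new last child."""
        child.parent = self
        child.prev, child.next = self.last_child, None
        if self.last_child is None:
            self.first_child = child
        else:
            self.last_child.next = child
        self.last_child = child
        return child

    def unlink(self):
        """Delete this node (and its subtree) from its parent's child list."""
        if self.parent is None:
            return
        if self.prev is None:
            self.parent.first_child = self.next
        else:
            self.prev.next = self.next
        if self.next is None:
            self.parent.last_child = self.prev
        else:
            self.next.prev = self.prev
        self.parent = self.prev = self.next = None

    def find_child(self, tag, attr=None, value=None):
        """Walk forward from first_child, optionally matching one attribute value."""
        node = self.first_child
        while node is not None:
            if node.tag == tag and (attr is None or node.attrs.get(attr) == value):
                return node
            node = node.next
        return None


root = Node("config")
for i in ("1", "2", "3"):
    root.append(Node("item", {"id": i}))
root.find_child("item", "id", "2").unlink()

# Walk the list both ways to show the next/prev links stay consistent.
forward, node = [], root.first_child
while node is not None:
    forward.append(node.attrs["id"])
    node = node.next
backward, node = [], root.last_child
while node is not None:
    backward.append(node.attrs["id"])
    node = node.prev
print(forward, backward)  # ['1', '3'] ['3', '1']
```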

Larry


LarryMc

May 21, 2007, 04:18:33 PM #6 Last Edit: June 02, 2007, 03:44:56 PM by Larry McCaughn
UPDATE:

I've finished my first pass at all the methods for my xml class.
I have started working on examples and documentation (which is always slow).
Below is what the class looks like:
(Removed the list of methods because some of them changed between this posting and the initial release of the library.)


For the few who are interested, maybe it won't be much longer.


Larry

Dogge

Larry, this looks very promising. I'm looking forward to seeing the real thing in action.
/Douglas


LarryMc

June 02, 2007, 03:41:36 PM #8 Last Edit: February 10, 2008, 04:53:24 PM by Larry McCaughn
Done!
The attached zip contains the latest release of my xml library, CXmlLM v2.06.
The zip contains:
   CXmlLM.lib
   CXmlLM.inc
   A help file with an XML primer, a limitations section, and example code for every public method.
   Example .eba source files.
   Example xml files
   A readme.txt file to tell you what to put where.

Be sure to read the license agreement and the warranty disclaimer before using.

I didn't put this in the OOP section because I didn't want to scare people away.
This library can be used with equal ease in console or GUI applications.
It can also be used in full-fledged OOP applications or in what I call "regular ole basic" programs.

My library is not intended to replace MSxml4 for handling xml files.
I traded all that extra functionality for ease of use and a short learning curve.
I feel my library will accomplish what 80%+ of EBasic users would want to do.

I hope it proves useful to someone.
I really did it to see if I could do it. I hated pointers, and you can't go three lines in the library's source without seeing one or more pointers.
I have no idea how I could have written the library without pointers.

Anyway... give me some feedback.

Edit: 8/13/07 V2.06

Jerry Muelver

Thanks, Larry. This may be the tug I needed to get me back into programming.  ;)

Dogge

Very impressive. I hope I can put it to good use in a project I'm working on, and I look forward to studying it in more detail.
Many thanks for sharing.
/Douglas

LarryMc

Thanks for the feedback.

LarryMc

Not that there appears to be any real interest, but here is an update on what I'm doing to my CXmlLM.lib:

I'm currently working on increasing the maximum allowable string lengths from 254 characters.
Tag names, attribute names, and attribute values will be increased to 1024.
Comments and node values will be increased to 100K.

While I'm at it, comments and node values will be allowed to contain embedded CRLFs.

I am also adding the "CDATA" type of data node, which allows nodes to contain embedded <, >, /, and CRLF characters and preserves all spacing (formatting) on output. This will allow data such as HTML markup and the proper display of program source code examples.
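Here is a quick illustration of what CDATA buys you, sketched in Python rather than with CXmlLM's own methods (which aren't shown here). The `cdata()` helper is hypothetical; it also splits any `]]>` in the text, since that sequence would otherwise end the section early:

```python
import xml.etree.ElementTree as ET

def cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any ']]>' so it can't close the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

source = "if (a < b && c > d) {\n    x = 1;   // keep spacing\n}"

# Written as CDATA, the markup characters and indentation go out untouched...
doc = "<example>" + cdata(source) + "</example>"
# ...and a conforming parser hands back exactly the original text.
parsed = ET.fromstring(doc).text
print(parsed == source)  # True

# Without CDATA, the same text has to be escaped as ordinary character data.
el = ET.Element("example")
el.text = source
escaped = ET.tostring(el, encoding="unicode")
print(escaped)
```

One caveat relevant to the CRLF point: conforming XML parsers normalize CRLF line endings to LF before parsing, and that applies inside CDATA too, so a round trip through a standard parser can change line endings even though the spacing survives.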

I'm at the point where I have completely restructured my parser and switched from reading the xml file a line at a time to reading the whole file into memory and then parsing it. The last piece of the parser I have to do is the section that handles attributes.
Once that is finished, I will start working on each of the methods, converting all my doubly linked lists to support variable-length strings instead of the fixed-length strings they use now.

I also have to add new method(s) for handling CDATA.

So... now is the time for the few of you who have given me feedback to put in your 2 cents on what else, if anything, you would like to see.

Larry

czoran

Dear Larry,

I have tried your parser and I think it is impressive, but it is still not good enough for my personal use. I am also in the middle of writing my own, and I am thinking of abandoning it and buying a commercial one if one exists. I am a little tired of experimenting with all these open-source parsers, because what you get in the end is not worth the time invested. The main problem is when you have to parse some specific XML, make changes, and save it back to another file. The resulting XML is very often different in structure from the original and not readable by the application that originally produced it.

So, here are some hints for you based on my experience with other parsers:
- Take care of multiline comments and those which contain special characters (", ?, \, /, & ...).
- Remember that comments can appear anywhere in the XML, not only at the beginning of the file.
- Don't forget to include the "DOCTYPE ..." line, even if your parser does not care about the DTD.
- The whole parsed structure has to be written back to the file, not only the things the parser recognizes.
- Don't split lines or change their order, especially attributes.
- Don't limit the number of characters per line. 254 is funny, 1000 is too small... (Sometimes the whole XML is on one line 98000 characters long.)
- Don't insist that a node must follow the rule of <node>blabla</node>. Why forbid <node blabla/>?
- Reading the whole XML into memory for faster parsing is good only when you have a really small XML file. Think about XML files of 660 MB: you need triple that amount of memory to process one. Reading it line by line is faster.
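Two of these hints can be checked against any conforming parser. Reading `blabla` in the self-closing example as a placeholder for attributes (a bare `<node blabla/>` is not well-formed, since every attribute needs a quoted value), here is a small Python check using the standard `xml.etree.ElementTree` module; attribute order is preserved as of Python 3.8:

```python
import xml.etree.ElementTree as ET

doc = '<root><node id="1"/><node id="2"></node><node z="3" a="4" m="5"/></root>'
root = ET.fromstring(doc)
first, second, third = root.findall("node")

# The self-closing and the explicit open/close forms parse to the same kind of element.
print(first.text, second.text, len(first), len(second))  # None None 0 0

# Attributes keep their document order, so writing the element back
# does not shuffle them.
print(list(third.attrib))                      # ['z', 'a', 'm']
print(ET.tostring(third, encoding="unicode"))  # <node z="3" a="4" m="5" />
```

ElementTree is not a full round-trip tool, though: by default it drops comments (Python 3.8+ can keep them via `TreeBuilder(insert_comments=True)`), and it does not keep the DOCTYPE line, which is exactly the kind of loss described above.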

I hear following sentence very often from XML parsers authors when complain for some bugs: "Please modify or write your XML files in a way that our parser can process them." How to do that when I have XML file with million lines? I want to modify that large file by using parser and not manually!

Regards,
Zoran

Mike Stefanik

Quote: ...reading the whole XML into memory for faster parsing is good only when you have a really small XML file. Think about XML files of 660 MB: you need triple that amount of memory to process one. Reading it line by line is faster.

I don't think you'd need triple the amount of memory, but you would need at least enough memory to store the XML document and the overhead for the node structure. Regardless, what you're asking for is a SAX parser, and it sounds like what he's written is a DOM parser. They both have their use. While SAX will allow you to handle arbitrarily large documents, it is an event-driven sequential single-pass parser. As soon as you have the requirement that you need to access the XML document elements more than one time or out of sequence, then you're going to need a DOM parser which loads the entire document into memory.

And in practical terms, you're going to find very few "real world" XML documents out there that are larger than 500 MB. But for those that are, a SAX parser is the way to go.
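The SAX/DOM distinction is easy to see side by side in Python's standard library, which ships both an event-driven SAX interface (`xml.sax`) and a small DOM implementation (`xml.dom.minidom`). The log document and the `ErrorCounter` class are made up for illustration:

```python
import xml.sax
import xml.dom.minidom

DOC = b"<log><entry level='info'>start</entry><entry level='error'>disk full</entry></log>"

class ErrorCounter(xml.sax.ContentHandler):
    """SAX: callbacks fire once per event, in document order; nothing is kept."""

    def __init__(self):
        super().__init__()
        self.errors = 0

    def startElement(self, name, attrs):
        if name == "entry" and attrs.get("level") == "error":
            self.errors += 1

handler = ErrorCounter()
xml.sax.parseString(DOC, handler)
print(handler.errors)  # 1

# DOM: the whole tree is in memory, so elements can be revisited
# and accessed out of order.
dom = xml.dom.minidom.parseString(DOC)
entries = dom.getElementsByTagName("entry")
print(entries[-1].firstChild.data, entries[0].getAttribute("level"))  # disk full info
```

The SAX handler only ever sees one event at a time, so its memory use doesn't grow with the document; the DOM version can jump to `entries[-1]` and then back to `entries[0]` because the whole tree is held in memory.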

Oh, Larry, as far as the issue of line length is concerned, if you can avoid any artificial limits there, you'll find fewer issues with documents. As czoran has noted, it's not uncommon for the entire XML document to be one or two lines long.
Mike Stefanik
www.catalyst.com
Catalyst Development Corporation

LarryMc


Quote: Take care of multiline comments and those which contain special characters (", ?, \, /, & ...).
Already taken care of.

Quote: Remember that comments can appear anywhere in the XML, not only at the beginning of the file.
True in my original version.

Quote: Don't forget to include the "DOCTYPE ..." line, even if your parser does not care about the DTD.
Already taken care of.

Quote: The whole parsed structure has to be written back to the file, not only the things the parser recognizes.
This will be closer to true in the new version.

Quote: Don't split lines or change their order, especially attributes.
True in my original version.

Quote: Don't limit the number of characters per line. 254 is funny, 1000 is too small... (Sometimes the whole XML is on one line 98000 characters long.)
Did you read my previous post where I said I was changing that?

Quote: Don't insist that a node must follow the rule of <node>blabla</node>. Why forbid <node blabla/>?
Will be added.

Quote: Reading the whole XML into memory for faster parsing is good only when you have a really small XML file. Think about XML files of 660 MB: you need triple that amount of memory to process one. Reading it line by line is faster.
You say to read the file line by line, and that it may be 660 MB long. Above, you say a whole XML file may be on one line.
I'm afraid I'm not smart enough to know how to input a one line file of 660MB into a variable for processing without loading the file into memory.
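For what it's worth, the usual way around this is a streaming parser that is fed fixed-size blocks of bytes rather than lines, so a 660 MB single-line file never has to sit in one variable. The parser keeps its own state between calls, so a block may end in the middle of a tag. Here is a minimal sketch using Python's `xml.parsers.expat` (a binding to the Expat library mentioned earlier in the thread); the `count_elements` name and the 64 KB default block size are arbitrary choices, and `io.BytesIO` stands in for a real file:

```python
import io
import xml.parsers.expat

def count_elements(stream, tag, chunk_size=64 * 1024):
    """Feed bytes to Expat block by block; memory use is bounded by chunk_size."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        if name == tag:
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # more data will follow
    parser.Parse(b"", True)         # signal the end of the document
    return count

# A one-line document; the tiny block size forces tags to be split across reads.
one_line = b"<items>" + b'<item n="x"/>' * 1000 + b"</items>"
print(count_elements(io.BytesIO(one_line), "item", chunk_size=7))  # 1000
```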

When it comes down to it, I still stand by the statement I made when I posted my lib:
Quote: My library is not intended to replace MSxml4 for handling xml files.
I traded all that extra functionality for ease of use and a short learning curve.
I feel my library will accomplish what 80%+ of EBasic users would want to do.
I'm sorry that what you apparently want to do falls in the 20% that I'm not addressing.



LarryMc

Mike

Quote: As czoran has noted, it's not uncommon for the entire XML document to be one or two lines long.
That is why I switched my parser from being line based input to reading the whole file into memory.

In general, my new limits will be:
what can be put between "<" and ">" will be 100K
a tagname can be 1K
an attr name can be 1K
an attr value can be 1K
Since a tag name and multiple attr name/value pairs can be in one set of brackets, their sum cannot exceed the 100K limit.

Also, if my lib is used to read an XML file that has basically no CRLFs, and the file is then modified and saved, my lib inserts the nesting indents and the appropriate CRLFs.
I guess it would be easy enough to pass a parameter to the save routine to disable that formatting on output, for people who want to keep the file packed like that.
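A save routine with that kind of switch might look like the following Python sketch. The `save_xml` name, the `pretty` flag, and the two-space indent are assumptions, and the sketch is deliberately simplified: it trims whitespace around element text and ignores tail text, so it suits data-style XML such as config files rather than mixed content:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def save_xml(elem, pretty=True, indent="  ", level=0):
    """Serialize elem; pretty=True adds nesting indents and newlines, False packs it."""
    pad = indent * level if pretty else ""
    nl = "\n" if pretty else ""
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    # Trim whitespace around text so re-saving pretty output doesn't pile up indents.
    text = escape(elem.text.strip()) if elem.text else ""
    children = list(elem)
    if not children:
        if text:
            return f"{pad}<{elem.tag}{attrs}>{text}</{elem.tag}>{nl}"
        return f"{pad}<{elem.tag}{attrs}/>{nl}"
    inner = "".join(save_xml(c, pretty, indent, level + 1) for c in children)
    return f"{pad}<{elem.tag}{attrs}>{text}{nl}{inner}{pad}</{elem.tag}>{nl}"

root = ET.fromstring('<cfg><win w="640" h="480"/><title>Demo</title></cfg>')
packed = save_xml(root, pretty=False)
pretty_out = save_xml(root)
print(packed)
print(pretty_out)
```

Python 3.9 and later also provide `ET.indent()` for the pretty-printing half of this.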

Since I am an amateur programming hack (with limited skills), I never intended to write a "commercial" library.
I apologize to those who wasted their time downloading my lib thinking they were getting something of that calibre.

Larry


Dogge

Quote from: Larry McCaughn on June 21, 2007, 11:00:46 PM
Not that there appears to be any real interest .....
Hi Larry,
Unfortunately, my project hasn't come to the point where I need to create XML files, but it will very soon. My intent is to use your XML lib to create XML files for storing a special kind of log that will be parsed by external programs, so XML is a good format to use.
I will give you feedback as soon as I start.
Keep up the good work!
/Douglas



Steven Picard

Sorry, Larry, I haven't been looking through the forum very thoroughly lately due to a busy schedule. I am actually interested in this library, and I see what you are doing. I agree that there are many times when just a light XML parser is needed (which is what I need most of the time).

Thanks for sharing!

LarryMc

Thanks for the feedback :)

czoran

Dear Larry,

Please don't take it too hard. My intention was to draw your attention to some weird things found in the real world.
Many "commercial" libraries (not only for XML) fail because they don't take into consideration line and file size, content, memory, speed, etc. So it is better to work on that right from the beginning; otherwise, complete rethinking and rewriting is necessary. I wrote that your lib did not fill my needs, and that's all. And I know that I have some special needs too.
I'm sure your lib is very useful for most of the needs of regular users, and I want to encourage you to continue working on it.

Regards.

LarryMc

czoran

Thank you for ALL your comments.

Larry

Dogge

June 24, 2007, 05:12:24 PM #22 Last Edit: June 25, 2007, 01:08:15 AM by Dogge
Hi Larry, I did my first implementation with your XML lib, and I must say that I am impressed; it works very well once I got the pieces together.
At this point it does what I need, and I haven't used it to its full capability yet.

Question removed after further testing

I will explore further

Thanks
/Douglas

tekrat

Has anyone gotten Microsoft's XML DOM to work through COM? I use it on a daily basis in VBS and it works fine for what I need.

FYI: If you have, I would love to see your code so I could use it in an Emergence BASIC project.

LarryMc


I have posted the new release of my xml library (CXmlLM V2.0) at http://www.ionicwind.com/forums/index.php/topic,1603.msg15670.html#msg15670

Much more flexible than before and now has CDATA capabilities.

Appreciate any and all feedback.