OFBiz uses a lot of XML files. When each XML file is read, it is first parsed into a DOM Document, then the DOM Document is parsed into OFBiz Java objects. This two-step process consumes a lot of memory, and it takes more time than it should.

There is an alternative - what is called event-driven parsing. The XML parser can be set up to convert XML elements directly to the OFBiz Java objects - bypassing the DOM Document build and parse steps. Theoretically, this could provide a huge performance boost, and it would use less memory. In addition, it would solve the problem of huge XML files maxing out server memory during the parse process - like with entity XML import/export.

Has anyone else considered this? Do you think it is worth pursuing?

-Adrian
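A minimal sketch of the event-driven approach described above, using the standard Java SAX API. The ScreenModel class, the "screen" element, and the widgets.xml file name are hypothetical placeholders for illustration, not actual OFBiz code:

import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for an OFBiz Java object built from XML.
class ScreenModel {
    final String name;
    ScreenModel(String name) { this.name = name; }
}

public class EventDrivenParseSketch {
    public static void main(String[] args) throws Exception {
        List<ScreenModel> screens = new ArrayList<>();

        // The handler receives events as the parser streams through the file,
        // so the Java objects are built directly - no DOM Document in between.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("screen".equals(qName)) {
                    screens.add(new ScreenModel(attrs.getValue("name")));
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse("widgets.xml", handler); // placeholder file name
        System.out.println("Parsed " + screens.size() + " screens");
    }
}

Because the objects are created while the parser streams through the document, memory use stays roughly proportional to the objects that are kept rather than to the size of the XML file.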
Optimizing the memory usage and getting past some of these out-of-memory issues would definitely be a big win - but I don't know enough about the event-driven parsing paradigm to speak to the outcome. I guess my vote would be that it's worth a shot for sure.
Cheers,
Tim

--
Tim Ruppert
HotWax Media
http://www.hotwaxmedia.com
o:801.649.6594 f:801.649.6595
I'm guessing you're speaking of SAX parsers when you talk about event-driven parsing. If you take a look at the entity XML import code, it actually is a SAX event-driven parser.

As for other XML readers (like entity defs, widget XML files, etc.), I'd be surprised if a SAX reader resulted in much performance improvement, and it makes the code more complex, so the first step would be to test it on one and do some performance tests to see if it is any faster. The XML reading code already has some simple stuff to test how long it takes, though to test this you should run it 100 times or something so the times are more meaningful (otherwise they are probably less than 1 ms, and possibly not as accurate on such a small time scale).

Anyway, yeah, those are some general thoughts about it at least...

-David
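A rough illustration of the repeated-parse timing suggested above; the file path is a placeholder and this is not an existing OFBiz test:

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;

public class ParseTimingSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path - any widget or entity definition XML file will do.
        File file = new File("SomeScreens.xml");
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

        int runs = 100;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            // Parse the same file repeatedly so the total is large enough to
            // measure; a single parse is often under a millisecond.
            factory.newDocumentBuilder().parse(file);
        }
        long totalMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Average DOM parse time: " + (totalMs / (double) runs) + " ms over " + runs + " runs");
    }
}

The same loop, with the parse call swapped for a SAX version, would give a like-for-like comparison of the two approaches on the same file.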
What files are you talking about that are so huge they can't be parsed with the simpler DOM model?

Entity data files are SAX-based already.

Widget files, scripts, and config files are small, so it's better to keep the simpler algorithm, as David suggested.

Additionally, I already did some memory profiling a while back and interned the long-lived strings from parsed XML. This actually reduced memory usage.

Another thing: the widgets, scripts, and config files are read very infrequently, then cached. The time it takes to parse them is not really a performance consideration.

As an aside, how much swap do you have on your server? Any? Is it being used? Then you don't have enough RAM. If your workload is causing swap to be used, then you haven't correctly identified your workload usage requirements.

The same can be said for the Java maximum memory allocation.
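To make the interning idea concrete, here is a small sketch of interning attribute values pulled from parsed XML; the helper class and method names are made up for illustration and are not the actual OFBiz code referred to above:

import org.w3c.dom.Element;

// Illustrative helper (not the actual OFBiz code): intern attribute values read
// from parsed XML so long-lived model objects share one copy of each repeated
// string, e.g. the same entity or field name appearing in hundreds of elements.
public final class InternedAttributeReader {
    private InternedAttributeReader() {}

    public static String getAttribute(Element element, String name) {
        String value = element.getAttribute(name);
        // intern() returns the canonical instance from the JVM string pool, so
        // duplicate copies created by the parser can be garbage collected.
        return value.isEmpty() ? value : value.intern();
    }
}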
Adam and David,

Thank you for your comments! I'll look into the entity import code some more.

Personally, I don't have an issue with importing large XML files, but I see it come up from time to time on the mailing lists. I remember BJ Freeman had to write his own import code because of some OFBiz limitation.

I'll accept that the widget files, scripts, and config files are too small to be worth optimizing. Having event-driven parsing for those might be an interesting experiment though.

-Adrian
Okay, I did some work on this purely as a learning experience for me. I wanted to learn SAX parsing, so I tried converting the screen widgets to SAX parsing.

I found a small public-domain framework that makes the whole process very easy. Since all of the model screen widgets subclass a single base class, I was able to hook them into the parsing framework by just having the base class subclass one of the framework classes. Model widgets that don't have sub-widgets just needed a new constructor. Model widgets that have sub-widgets needed a little extra code to handle the sub-widgets, but it was no more code than what already exists to handle the DOM version of the sub-widgets.

Overall, it was pretty easy and I was surprised when it worked the very first time I tried it.

If anyone is interested, I would be happy to post the POC code in Jira. Just let me know.

-Adrian
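The framework used for the POC isn't named, so the following is only a rough, hypothetical sketch of the general pattern using plain SAX: a base model-widget class whose subclasses are constructed directly from element attributes, with parent widgets collecting their sub-widgets as the elements arrive. None of these class or element names come from the actual OFBiz code or the POC:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-ins for OFBiz model widget classes.
abstract class ModelWidget {
    final String name;
    ModelWidget(Attributes attrs) { this.name = attrs.getValue("name"); }
    // Widgets that contain sub-widgets override this to collect them.
    void addSubWidget(ModelWidget child) {}
}

class ModelScreen extends ModelWidget {
    final List<ModelWidget> subWidgets = new ArrayList<>();
    ModelScreen(Attributes attrs) { super(attrs); }
    @Override void addSubWidget(ModelWidget child) { subWidgets.add(child); }
}

class ModelLabel extends ModelWidget {
    ModelLabel(Attributes attrs) { super(attrs); }
}

// SAX handler that assembles the widget tree directly, one element at a time.
class WidgetHandler extends DefaultHandler {
    private final Deque<ModelWidget> stack = new ArrayDeque<>();
    ModelWidget root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        ModelWidget widget;
        if ("screen".equals(qName)) {
            widget = new ModelScreen(attrs);
        } else if ("label".equals(qName)) {
            widget = new ModelLabel(attrs);
        } else {
            return; // ignore elements this sketch doesn't model
        }
        if (stack.isEmpty()) {
            root = widget;
        } else {
            stack.peek().addSubWidget(widget);
        }
        stack.push(widget);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("screen".equals(qName) || "label".equals(qName)) {
            stack.pop();
        }
    }
}

In this shape, each widget class only needs a constructor that takes the element's Attributes, which mirrors the "just needed a new constructor" observation above.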
Hi Adrian,
I have no time at the moment, but yes, a page in the wiki with a link to a Jira issue sounds good.

Thanks,
Jacques