Hey everybody,
I just wanted to get a discussion going on nested sets. In the past few weeks, I had to work alot with the ofbiz category structure and I really got the feeling that the way the productcategories are setup, it takes an awful lot of time to run through the categories, slowly querying from level to level and work your way down to the data you need. I got the feeling that we could really improve the data structure behind that by introducing nested trees. I have done a similar thing for a non-ofbiz-based application and the query results are really fast that way (the queries themselves are rather simple to put together to). In case you need some additional information on this topic: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html (just scroll down to nested sets). On a different note, I also noticed that the macro within sidedeepcategory.ftl is awfully complicated, possibly much more complicated than it has to be and I'd be interested in approving the very same. I couldn't get my head around the contentWrapper class, but I think that calling the .getRelated function, could do the trick also and possibly be a simpler solution... any thoughts`? |
On Jan 16, 2009, at 4:39 AM, madppiper wrote: > > Hey everybody, > > I just wanted to get a discussion going on nested sets. In the past > few > weeks, I had to work alot with the ofbiz category structure and I > really got > the feeling that the way the productcategories are setup, it takes > an awful > lot of time to run through the categories, slowly querying from > level to > level and work your way down to the data you need. I got the feeling > that we > could really improve the data structure behind that by introducing > nested > trees. I have done a similar thing for a non-ofbiz-based application > and the > query results are really fast that way (the queries themselves are > rather > simple to put together to). > > In case you need some additional information on this topic: > http://dev.mysql.com/tech-resources/articles/hierarchical-data.html > (just > scroll down to nested sets). Yes, this is a classic approach, and a pretty well documented one. The author sited in that mysql article (Joe Celko) has a number of great books on data structures, especially for relational databases, and worked at a school near Salt Lake City, Utah not far from where I used to live there. I had the pleasure of meeting him and Terry Halpin there, a real treat since they've both done some great work, and continue to do good work and publish places like Intelligent Enterprise and the BPM Forum. It does have various limitations though (too simplistic) and is not capable of functionally equivalent to the category rollup model in OFBiz. Some reasons: 1. not a tree: categories are allowed to have multiple parents (be in multiple categories) as is thus a graph and not a tree and this model does not work with graphs, only with trees 2. the category rollups are effective dated so a category being a sub- category of another can change at any time without a database update, just the passing of time 3. categories are sorted by a sequence field on the rollup, and that sequence can be different for different parent categories This pattern also requires a large number of updates to change anything in the category rollup. To maintain data consistency it is necessary to lock the entire category table, which is bad in large organizations with many product marketers or others dealing with categories. The main reason this isn't an issue in OFBiz is that all category lookups are cached in public facing code. On that note, this could be used in OFBiz but only as an alternative form of caching. In other words, we could have a separate table that models a top-down view of the categories with duplicate records as needed when a category is in multiple places in the tree (ie use a key for that entity that other than the productCategoryId). The real question is, would that help performance at all? The only way to tell is to try to implement and see how it compares to the current caching. My guess based on similar efforts is that the current caching approach will still be far faster so I've never even bothered with implementing this. > On a different note, I also noticed that the macro within > sidedeepcategory.ftl is awfully complicated, possibly much more > complicated > than it has to be and I'd be interested in approving the very same. I > couldn't get my head around the contentWrapper class, but I think that > calling the .getRelated function, could do the trick also and > possibly be a > simpler solution... As always, if your changes don't break existing functionality then your contributions will be welcome. -David |
Administrator
|
Thanks for the clarification David,
Jacques From: "David E Jones" <[hidden email]> > > On Jan 16, 2009, at 4:39 AM, madppiper wrote: > >> >> Hey everybody, >> >> I just wanted to get a discussion going on nested sets. In the past >> few >> weeks, I had to work alot with the ofbiz category structure and I >> really got >> the feeling that the way the productcategories are setup, it takes >> an awful >> lot of time to run through the categories, slowly querying from >> level to >> level and work your way down to the data you need. I got the feeling >> that we >> could really improve the data structure behind that by introducing >> nested >> trees. I have done a similar thing for a non-ofbiz-based application >> and the >> query results are really fast that way (the queries themselves are >> rather >> simple to put together to). >> >> In case you need some additional information on this topic: >> http://dev.mysql.com/tech-resources/articles/hierarchical-data.html >> (just >> scroll down to nested sets). > > Yes, this is a classic approach, and a pretty well documented one. The > author sited in that mysql article (Joe Celko) has a number of great > books on data structures, especially for relational databases, and > worked at a school near Salt Lake City, Utah not far from where I used > to live there. I had the pleasure of meeting him and Terry Halpin > there, a real treat since they've both done some great work, and > continue to do good work and publish places like Intelligent > Enterprise and the BPM Forum. > > It does have various limitations though (too simplistic) and is not > capable of functionally equivalent to the category rollup model in > OFBiz. Some reasons: > > 1. not a tree: categories are allowed to have multiple parents (be in > multiple categories) as is thus a graph and not a tree and this model > does not work with graphs, only with trees > > 2. the category rollups are effective dated so a category being a sub- > category of another can change at any time without a database update, > just the passing of time > > 3. categories are sorted by a sequence field on the rollup, and that > sequence can be different for different parent categories > > This pattern also requires a large number of updates to change > anything in the category rollup. To maintain data consistency it is > necessary to lock the entire category table, which is bad in large > organizations with many product marketers or others dealing with > categories. > > The main reason this isn't an issue in OFBiz is that all category > lookups are cached in public facing code. > > On that note, this could be used in OFBiz but only as an alternative > form of caching. In other words, we could have a separate table that > models a top-down view of the categories with duplicate records as > needed when a category is in multiple places in the tree (ie use a key > for that entity that other than the productCategoryId). > > The real question is, would that help performance at all? The only way > to tell is to try to implement and see how it compares to the current > caching. My guess based on similar efforts is that the current caching > approach will still be far faster so I've never even bothered with > implementing this. > >> On a different note, I also noticed that the macro within >> sidedeepcategory.ftl is awfully complicated, possibly much more >> complicated >> than it has to be and I'd be interested in approving the very same. I >> couldn't get my head around the contentWrapper class, but I think that >> calling the .getRelated function, could do the trick also and >> possibly be a >> simpler solution... > > As always, if your changes don't break existing functionality then > your contributions will be welcome. > > -David > > |
I can only agree with Jacques on this one. To be honest, I didn't think about the reusability of categories in applications, which is why a tree structure doesn't make any sense - absolutely correct. I guess the only way of implementing anything of that sort would be by opening up a 1:n relational table that is tied to the category id, but features an automated id of itself, as well as the rgt and lft values...
... but then again: why bother? I may implement something of that sort if time allows, simply because i think it may be rather fun to see if it does any good. But yes, a complete change of the category strucure of ofbiz is probably a waste of time... once again: well put! :) |
Free forum by Nabble | Edit this page |