Implemantation of nested trees for productCategories

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Implemantation of nested trees for productCategories

madppiper-2
Hey everybody,

I just wanted to get a discussion going on nested sets. In the past few weeks, I had to work alot with the ofbiz category structure and I really got the feeling that the way the productcategories are setup, it takes an awful lot of time to run through the categories, slowly querying from level to level and work your way down to the data you need. I got the feeling that we could really improve the data structure behind that by introducing nested trees. I have done a similar thing for a non-ofbiz-based application and the query results are really fast that way (the queries themselves are rather simple to put together to).

In case you need some additional information on this topic: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html (just scroll down to nested sets).


On a different note, I also noticed that the macro within sidedeepcategory.ftl is awfully complicated, possibly much more complicated than it has to be and I'd be interested in approving the very same. I couldn't get my head around the contentWrapper class, but I think that calling the .getRelated function, could do the trick also and possibly be a simpler solution...

any thoughts`?


Reply | Threaded
Open this post in threaded view
|

Re: Implemantation of nested trees for productCategories

David E Jones-3

On Jan 16, 2009, at 4:39 AM, madppiper wrote:

>
> Hey everybody,
>
> I just wanted to get a discussion going on nested sets. In the past  
> few
> weeks, I had to work alot with the ofbiz category structure and I  
> really got
> the feeling that the way the productcategories are setup, it takes  
> an awful
> lot of time to run through the categories, slowly querying from  
> level to
> level and work your way down to the data you need. I got the feeling  
> that we
> could really improve the data structure behind that by introducing  
> nested
> trees. I have done a similar thing for a non-ofbiz-based application  
> and the
> query results are really fast that way (the queries themselves are  
> rather
> simple to put together to).
>
> In case you need some additional information on this topic:
> http://dev.mysql.com/tech-resources/articles/hierarchical-data.html 
> (just
> scroll down to nested sets).

Yes, this is a classic approach, and a pretty well documented one. The  
author sited in that mysql article (Joe Celko) has a number of great  
books on data structures, especially for relational databases, and  
worked at a school near Salt Lake City, Utah not far from where I used  
to live there. I had the pleasure of meeting him and Terry Halpin  
there, a real treat since they've both done some great work, and  
continue to do good work and publish places like Intelligent  
Enterprise and the BPM Forum.

It does have various limitations though (too simplistic) and is not  
capable of functionally equivalent to the category rollup model in  
OFBiz. Some reasons:

1. not a tree: categories are allowed to have multiple parents (be in  
multiple categories) as is thus a graph and not a tree and this model  
does not work with graphs, only with trees

2. the category rollups are effective dated so a category being a sub-
category of another can change at any time without a database update,  
just the passing of time

3. categories are sorted by a sequence field on the rollup, and that  
sequence can be different for different parent categories

This pattern also requires a large number of updates to change  
anything in the category rollup. To maintain data consistency it is  
necessary to lock the entire category table, which is bad in large  
organizations with many product marketers or others dealing with  
categories.

The main reason this isn't an issue in OFBiz is that all category  
lookups are cached in public facing code.

On that note, this could be used in OFBiz but only as an alternative  
form of caching. In other words, we could have a separate table that  
models a top-down view of the categories with duplicate records as  
needed when a category is in multiple places in the tree (ie use a key  
for that entity that other than the productCategoryId).

The real question is, would that help performance at all? The only way  
to tell is to try to implement and see how it compares to the current  
caching. My guess based on similar efforts is that the current caching  
approach will still be far faster so I've never even bothered with  
implementing this.

> On a different note, I also noticed that the macro within
> sidedeepcategory.ftl is awfully complicated, possibly much more  
> complicated
> than it has to be and I'd be interested in approving the very same. I
> couldn't get my head around the contentWrapper class, but I think that
> calling the .getRelated function, could do the trick also and  
> possibly be a
> simpler solution...

As always, if your changes don't break existing functionality then  
your contributions will be welcome.

-David


Reply | Threaded
Open this post in threaded view
|

Re: Implemantation of nested trees for productCategories

Jacques Le Roux
Administrator
Thanks for the clarification David,

Jacques

From: "David E Jones" <[hidden email]>

>
> On Jan 16, 2009, at 4:39 AM, madppiper wrote:
>
>>
>> Hey everybody,
>>
>> I just wanted to get a discussion going on nested sets. In the past  
>> few
>> weeks, I had to work alot with the ofbiz category structure and I  
>> really got
>> the feeling that the way the productcategories are setup, it takes  
>> an awful
>> lot of time to run through the categories, slowly querying from  
>> level to
>> level and work your way down to the data you need. I got the feeling  
>> that we
>> could really improve the data structure behind that by introducing  
>> nested
>> trees. I have done a similar thing for a non-ofbiz-based application  
>> and the
>> query results are really fast that way (the queries themselves are  
>> rather
>> simple to put together to).
>>
>> In case you need some additional information on this topic:
>> http://dev.mysql.com/tech-resources/articles/hierarchical-data.html 
>> (just
>> scroll down to nested sets).
>
> Yes, this is a classic approach, and a pretty well documented one. The  
> author sited in that mysql article (Joe Celko) has a number of great  
> books on data structures, especially for relational databases, and  
> worked at a school near Salt Lake City, Utah not far from where I used  
> to live there. I had the pleasure of meeting him and Terry Halpin  
> there, a real treat since they've both done some great work, and  
> continue to do good work and publish places like Intelligent  
> Enterprise and the BPM Forum.
>
> It does have various limitations though (too simplistic) and is not  
> capable of functionally equivalent to the category rollup model in  
> OFBiz. Some reasons:
>
> 1. not a tree: categories are allowed to have multiple parents (be in  
> multiple categories) as is thus a graph and not a tree and this model  
> does not work with graphs, only with trees
>
> 2. the category rollups are effective dated so a category being a sub-
> category of another can change at any time without a database update,  
> just the passing of time
>
> 3. categories are sorted by a sequence field on the rollup, and that  
> sequence can be different for different parent categories
>
> This pattern also requires a large number of updates to change  
> anything in the category rollup. To maintain data consistency it is  
> necessary to lock the entire category table, which is bad in large  
> organizations with many product marketers or others dealing with  
> categories.
>
> The main reason this isn't an issue in OFBiz is that all category  
> lookups are cached in public facing code.
>
> On that note, this could be used in OFBiz but only as an alternative  
> form of caching. In other words, we could have a separate table that  
> models a top-down view of the categories with duplicate records as  
> needed when a category is in multiple places in the tree (ie use a key  
> for that entity that other than the productCategoryId).
>
> The real question is, would that help performance at all? The only way  
> to tell is to try to implement and see how it compares to the current  
> caching. My guess based on similar efforts is that the current caching  
> approach will still be far faster so I've never even bothered with  
> implementing this.
>
>> On a different note, I also noticed that the macro within
>> sidedeepcategory.ftl is awfully complicated, possibly much more  
>> complicated
>> than it has to be and I'd be interested in approving the very same. I
>> couldn't get my head around the contentWrapper class, but I think that
>> calling the .getRelated function, could do the trick also and  
>> possibly be a
>> simpler solution...
>
> As always, if your changes don't break existing functionality then  
> your contributions will be welcome.
>
> -David
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Implemantation of nested trees for productCategories

madppiper-2
I can only agree with Jacques on this one. To be honest, I didn't think about the reusability of categories in applications, which is why a tree structure doesn't make any sense - absolutely correct. I guess the only way of implementing anything of that sort would be by opening up a 1:n relational table that is tied to the category id, but features an automated id of itself, as well as the rgt and lft values...

... but then again: why bother? I may implement something of that sort if time allows, simply because i think it may be rather fun to see if it does any good. But yes, a complete change of the category strucure of ofbiz is probably a waste of time...


once again: well put! :)