
Re: EntityListIterator.getResultsSizeAfterPartialList() vs. delegator.getCountByCondition(...)

Posted by Scott Gray-2 on Aug 27, 2009; 10:26pm
URL: http://ofbiz.116.s1.nabble.com/EntityListIterator-getResultsSizeAfterPartialList-vs-delegator-getCountByCondition-tp206959p206968.html

Thanks for the feedback, Adrian. The reason for the additional method
is that you would want this behavior only when paginating, and the
delegator would have no way of determining that.

Regards
Scott

On 28/08/2009, at 3:01 AM, Adrian Crum wrote:

> I would probably skip #2. No need for client code to be concerned  
> with implementation details. We could just modify the existing  
> delegator code to make the best decision on how to handle pagination.
>
> I agree with the rest of it - it sounds like a good plan.
>
> -Adrian
>
> Scott Gray wrote:
>> I can't do that at the moment because I've been modifying the same  
>> script repeatedly to test different situations and it's gotten  
>> pretty messy.  But as soon as I've found a temporary solution to  
>> the problem we're having I'll come back to it, clean it up and post  
>> it so the discussion can continue with more people running tests.
>> What I have been able to determine so far is that regardless of the  
>> database you are using it is ALWAYS faster to retrieve a resultset  
>> limited to the records you are actually going to use.  The problem  
>> is that for pagination we "need" (I'm not sure how badly) to know  
>> the full size of the resultset.  It turns out there is a magic  
>> number of records where it becomes faster to do a separate count  
>> query but only if you are using EntityFindOptions.setMaxRows(int)  
>> on your ELI along with it.  On my machine it is somewhere between  
>> 25,000-50,000 records.
>> Keep in mind also that this is only really a problem for screens  
>> where we paginate through resultsets, in most other cases we always  
>> use the entire resultset.
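[Editor's note: the strategy Scott describes (retrieve only the rows you will actually use, plus a separate count query) can be sketched outside OFBiz. This is a minimal illustration in Python with sqlite3, not OFBiz code; the table and column names are invented for the example, and LIMIT/OFFSET stands in for maxRows/viewIndex.]

```python
import sqlite3

# Build a throwaway in-memory table to paginate over.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_sandbox (job_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO job_sandbox (status) VALUES (?)",
                 [("PENDING",)] * 10000)

def paginate_with_count(conn, view_index, view_size):
    """Fetch one page of rows, then run a separate COUNT(*) for the total."""
    page = conn.execute(
        "SELECT job_id, status FROM job_sandbox ORDER BY job_id LIMIT ? OFFSET ?",
        (view_size, view_index)).fetchall()
    (total,) = conn.execute("SELECT COUNT(*) FROM job_sandbox").fetchone()
    return page, total

# One 20-row page starting at row 50; the database reports the full size.
page, total = paginate_with_count(conn, view_index=50, view_size=20)
```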
>> It was taking too long to test different table sizes so I ended up
>> just filling a table in MySQL with 1,000,000 records and simulating
>> pagination through it (each result is the same query run 5 times
>> and the times are in milliseconds):
>> Here's the result for the way we currently do it with an ELI; there
>> is only one result because it takes too long to test and the result
>> doesn't really vary whether the viewIndex is 50 or 500,000:
>> 0 viewIndex: min 13610, max 37307, avg 27386.2, total 136931
>> Here's the result using an ELI with maxRows and a separate count  
>> query:
>> 50 viewIndex: min 363, max 377, avg 371.8, total 1859
>> 100 viewIndex: min 372, max 462, avg 413.4, total 2067
>> 200 viewIndex: min 385, max 412, avg 394.4, total 1972
>> 400 viewIndex: min 378, max 412, avg 392.2, total 1961
>> 800 viewIndex: min 373, max 1044, avg 510, total 2550
>> 1600 viewIndex: min 390, max 405, avg 397.4, total 1987
>> 3200 viewIndex: min 402, max 433, avg 418.6, total 2093
>> 6400 viewIndex: min 425, max 504, avg 449.8, total 2249
>> 12800 viewIndex: min 459, max 648, avg 536, total 2680
>> 25600 viewIndex: min 570, max 1173, avg 705.8, total 3529
>> 51200 viewIndex: min 756, max 1144, avg 958.8, total 4794
>> 102400 viewIndex: min 1252, max 2810, avg 1576, total 7880
>> 204800 viewIndex: min 2123, max 10120, avg 5253.6, total 26268
>> 409600 viewIndex: min 4212, max 10837, avg 7011, total 35055
>> That's for 1,000,000 records; the ELI by itself is much faster
>> for me on small resultsets but gets progressively slower as the
>> resultset's size increases.
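[Editor's note: for context, a harness of the shape behind the figures above (the same operation run 5 times, reported as min/max/avg/total in milliseconds) might look like the following. This is a generic Python sketch, not the actual test script; the workload is a placeholder.]

```python
import time

def benchmark(fn, runs=5):
    """Run fn `runs` times and report min/max/avg/total wall time in ms."""
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000)
    return {"min": min(times_ms), "max": max(times_ms),
            "avg": sum(times_ms) / runs, "total": sum(times_ms)}

# Placeholder workload standing in for one paginated query.
stats = benchmark(lambda: sum(i * i for i in range(100000)))
print("min %(min).0f, max %(max).0f, avg %(avg).1f, total %(total).0f" % stats)
```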
>> Another issue I encountered is that if I run the first portion of  
>> this test twice in two separate browser windows at the same time  
>> then an out of memory error occurs and the instance locks up until  
>> I restart it.  Should we be able to recover from an out of memory
>> error or should we just concentrate on avoiding them?
>> The only solution I can think of so far is to:
>> 1.  Add the ability for OFBiz to learn when a query becomes high
>> volume, i.e. its result size begins crossing the configurable magic
>> number threshold
>> 2.  Add a new method for pagination to the delegator that can  
>> decide whether or not to set maxRows based on #1 for the  
>> EntityListIterator that it will return
>> 3.  Provide the EntityListIterator with the information required to  
>> be able to perform a separate count query (it needs the delegator  
>> or dao + the where and having conditions or we could just give it a  
>> SQLProcessor ready to go)
>> 4.  Change the ELI's getResultsSizeAfterPartialList() to perform a
>> count query if maxRows was set and the info from #3 was provided
>> 5.  For forms that use the generic performFind service for  
>> paginated results, switch them over to using the performFindList  
>> service and change its implementation to use the new delegator
>> method from #2
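[Editor's note: steps 1 and 2 above could be prototyped roughly as follows. This is a hypothetical Python sketch of the "learning" idea only, not OFBiz delegator code; the class, the query-key strings, and the threshold value are all invented for illustration.]

```python
# Configurable "magic number" threshold (25,000-50,000 on Scott's machine).
MAGIC_NUMBER = 25000

class PaginationAdvisor:
    """Remembers which queries have produced result sets over the threshold."""

    def __init__(self, threshold=MAGIC_NUMBER):
        self.threshold = threshold
        self.high_volume = set()  # query keys seen crossing the threshold

    def record_result_size(self, query_key, size):
        # Step 1: learn when a query becomes high volume.
        if size >= self.threshold:
            self.high_volume.add(query_key)

    def use_count_query(self, query_key):
        # Step 2: decide whether to set maxRows and do a separate count query.
        return query_key in self.high_volume

advisor = PaginationAdvisor()
advisor.record_result_size("findAllOrders", 700000)   # crosses the threshold
advisor.record_result_size("findOpenRequests", 300)   # stays well under it
```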
>> Any thoughts?
>> Thanks
>> Scott
>> On 27/08/2009, at 3:22 AM, Adrian Crum wrote:
>>> Scott,
>>>
>>> It would be helpful if you could post your script in a Jira issues  
>>> so we can run it against various databases. I would like to try it  
>>> on ours.
>>>
>>> -Adrian
>>>
>>> --- On Tue, 8/25/09, Scott Gray <[hidden email]> wrote:
>>>
>>>> From: Scott Gray <[hidden email]>
>>>> Subject: EntityListIterator.getResultsSizeAfterPartialList() vs.  
>>>> delegator.getCountByCondition(...)
>>>> To: [hidden email]
>>>> Date: Tuesday, August 25, 2009, 9:04 PM
>>>> Hi all,
>>>>
>>>> We've had a few slow query problems lately and I've
>>>> narrowed it down to the ResultSet.last() method call in
>>>> EntityListIterator when performed on large result
>>>> sets.  I switched the FindGeneric.groovy script in
>>>> webtools to use findCountByCondition instead of
>>>> getResultsSizeAfterPartialList and the page load time went
>>>> from 40-50s down to 2-3s for a result containing 700,000
>>>> records.
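[Editor's note: the difference Scott measured can be sketched outside OFBiz. This is purely illustrative Python with sqlite3, not OFBiz code: on many JDBC drivers a ResultSet.last() call can degenerate into fetching every row just to learn the size, whereas a COUNT(*) lets the database answer directly.]

```python
import sqlite3

# A throwaway in-memory table standing in for a large entity table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE generic (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO generic (id) VALUES (?)",
                 ((i,) for i in range(1, 5001)))

def size_by_scanning(conn):
    """Pull every row into memory and measure the list (the slow path)."""
    return len(conn.execute("SELECT id FROM generic").fetchall())

def size_by_count(conn):
    """Let the database count the rows (analogous to findCountByCondition)."""
    (n,) = conn.execute("SELECT COUNT(*) FROM generic").fetchone()
    return n
```

Both return the same size; the point is only that the second never transfers the rows.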
>>>>
>>>> Based on that I assumed there was probably some magic
>>>> number depending on your system where it becomes more
>>>> efficient to do a separate count query rather than use the
>>>> ResultSet so I put together a quick test to find out.
>>>> I threw together a script that repeatedly adds 500 rows to
>>>> the jobsandbox and then outputs the average time taken of 3
>>>> attempts to get the list size for each method.  Here's
>>>> a graph of the results using embedded Derby: http://imgur.com/ieR7m
>>>>
>>>> So unless the magic number lies somewhere in the first 500
>>>> records, it looks to me like it's always more efficient to do a
>>>> separate count query.
>>>>
>>>> It makes me wonder if we should be taking a different
>>>> approach to pagination in the form widget and in
>>>> general.  Any thoughts?
>>>>
>>>> Thanks
>>>> Scott
>>>>
>>>>

