Login  Register

Re: EntityListIterator.getResultsSizeAfterPartialList() vs. delegator.getCountByCondition(...)

Posted by Adrian Crum on Aug 27, 2009; 3:01pm
URL: http://ofbiz.116.s1.nabble.com/EntityListIterator-getResultsSizeAfterPartialList-vs-delegator-getCountByCondition-tp206959p206967.html

I would probably skip #2. No need for client code to be concerned with
implementation details. We could just modify the existing delegator code
to make the best decision on how to handle pagination.

I agree with the rest of it - it sounds like a good plan.

-Adrian

Scott Gray wrote:

> I can't do that at the moment because I've been modifying the same
> script repeatedly to test different situations and it's gotten pretty
> messy.  But as soon as I've found a temporary solution to the problem
> we're having I'll come back to it, clean it up and post it so the
> discussion can continue with more people running tests.
>
> What I have been able to determine so far is that regardless of the
> database you are using it is ALWAYS faster to retrieve a resultset
> limited to the records you are actually going to use.  The problem is
> that for pagination we "need" (I'm not sure how badly) to know the full
> size of the resultset.  It turns out there is a magic number of records
> where it becomes faster to do a separate count query but only if you are
> using EntityFindOptions.setMaxRows(int) on your ELI along with it.  On
> my machine it is somewhere between 25,000-50,000 records.
>
> Keep in mind also that this is only really a problem for screens where
> we paginate through resultsets, in most other cases we always use the
> entire resultset.
>
> It was taking too long to test different table sizes so I ended up just
> filling a table in MySql with 1,000,000 records and simulated paginating
> through it (each result is the same query run 5 times and the times are
> in milliseconds):
>
> Here's the result for the way we currently do it with an ELI, there is
> only one result because because it takes too long to test and the result
> doesn't really vary regardless of the viewIndex being 50 or 500,000:
> 0 viewIndex: min 13610, max 37307, avg 27386.2, total 136931
>
> Here's the result using an ELI with maxRows and a separate count query:
> 50 viewIndex: min 363, max 377, avg 371.8, total 1859
> 100 viewIndex: min 372, max 462, avg 413.4, total 2067
> 200 viewIndex: min 385, max 412, avg 394.4, total 1972
> 400 viewIndex: min 378, max 412, avg 392.2, total 1961
> 800 viewIndex: min 373, max 1044, avg 510, total 2550
> 1600 viewIndex: min 390, max 405, avg 397.4, total 1987
> 3200 viewIndex: min 402, max 433, avg 418.6, total 2093
> 6400 viewIndex: min 425, max 504, avg 449.8, total 2249
> 12800 viewIndex: min 459, max 648, avg 536, total 2680
> 25600 viewIndex: min 570, max 1173, avg 705.8, total 3529
> 51200 viewIndex: min 756, max 1144, avg 958.8, total 4794
> 102400 viewIndex: min 1252, max 2810, avg 1576, total 7880
> 204800 viewIndex: min 2123, max 10120, avg 5253.6, total 26268
> 409600 viewIndex: min 4212, max 10837, avg 7011, total 35055
>
> That's for a 1,000,000 records but the ELI by itself is much faster for
> me on small resultsets but gets progressively slower as the resultset's
> size increases.
>
> Another issue I encountered is that if I run the first portion of this
> test twice in two separate browser windows at the same time then an out
> of memory error occurs and the instance locks up until I restart it.  
> Should we be able to recover from an out of memory error or should just
> we just concentrate on avoiding them?
>
> The only solution I can think of so far is to:
> 1.  Add the ability for OFBiz to learn when a query becomes high volume
> i.e. it's resultsize begins crossing the configurable magic number
> threshold
> 2.  Add a new method for pagination to the delegator that can decide
> whether or not to set maxRows based on #1 for the EntityListIterator
> that it will return
> 3.  Provide the EntityListIterator with the information required to be
> able to perform a separate count query (it needs the delegator or dao +
> the where and having conditions or we could just give it a SQLProcessor
> ready to go)
> 4.  Change the ELI's getResultSizeAfterPartialList to perform a count
> query if maxRows was set and the info from #3 was provided
> 5.  For forms that use the generic performFind service for paginated
> results, switch them over to using the performFindList service and
> change it's implementation to use the new delegator method from #2
>
> Any thoughts?
>
> Thanks
> Scott
>
>
> On 27/08/2009, at 3:22 AM, Adrian Crum wrote:
>
>> Scott,
>>
>> It would be helpful if you could post your script in a Jira issues so
>> we can run it against various databases. I would like to try it on ours.
>>
>> -Adrian
>>
>> --- On Tue, 8/25/09, Scott Gray <[hidden email]> wrote:
>>
>>> From: Scott Gray <[hidden email]>
>>> Subject: EntityListIterator.getResultsSizeAfterPartialList() vs.
>>> delegator.getCountByCondition(...)
>>> To: [hidden email]
>>> Date: Tuesday, August 25, 2009, 9:04 PM
>>> Hi all,
>>>
>>> We've had a few slow query problems lately and I've
>>> narrowed it down to the ResultSet.last() method call in
>>> EntityListIterator when performed on large result
>>> sets.  I switched the FindGeneric.groovy script in
>>> webtools to use findCountByCondition instead of
>>> getResultSizeAfterPartialList and the page load time went
>>> from 40-50s down to 2-3s for a result containing 700,000
>>> records.
>>>
>>> Based on that I assumed there was probably some magic
>>> number depending on your system where it becomes more
>>> efficient to do a separate count query rather than use the
>>> ResultSet so I put together a quick test to find out.
>>> I threw together a script that repeatedly adds 500 rows to
>>> the jobsandbox and then outputs the average time taken of 3
>>> attempts to get the list size for each method.  Here's
>>> a graph of the results using embedded Derby: http://imgur.com/ieR7m
>>>
>>> So unless the magic number lies somewhere in the first 500
>>> records it looks to me like it always more efficient to do a
>>> separate count query.
>>>
>>> It makes me wonder if we should be taking a different
>>> approach to pagination in the form widget and in
>>> general.  Any thoughts?
>>>
>>> Thanks
>>> Scott
>>>
>>>
>>
>>
>>
>>
>