>
> On Aug 27, 2009, at 7:57 PM, Scott Gray wrote:
>
>> I just tried using a cursor with MySQL and the following caveat
>> from the docs caught me out:
>>> There are some caveats with this approach. You will have to read
>>> all of the rows in the result set (or close it) before you can
>>> issue any other queries on the connection, or an exception will be
>>> thrown.
>>>
>> Attempting the count query causes the exception to be thrown
>> because the same connection is used. Is it feasible to try and
>> detect a streaming ResultSet on the connection and if present get
>> another from the connection factory? I'll skip the cursor for
>> MySQL for now I think but I'll need to check whether Postgres
>> exhibits the same behavior although their docs make no mention of it.
>
> I'm shooting from the hip here... but I don't think the connection
> pool returns a connection to the pool (for reuse by other threads)
> while it is still be used, ie not until the connection is "closed",
> so I don't think this would happen with Entity Engine connections.
>
> -David
>
>
>> On 28/08/2009, at 12:39 PM, Scott Gray wrote:
>>
>>> Hi David,
>>>
>>> Always doing the separate query would certainly be easier, I think
>>> I'll go with that.
>>>
>>>> You mentioned: "regardless of the database you are using it is
>>>> ALWAYS faster to retrieve a resultset limited." By that do you
>>>> mean including just Derby and MySQL, or have you tried other
>>>> databases? Also, was the JDBC driver setup to have the ResultSet
>>>> backed by a cursor in the database? As for MySQL and Derby, I'm
>>>> not sure how good a job with this sort of thing I'd expect.
>>>> Things are a little different with Postgres and I'd expect better
>>>> results there, and a lot different with Oracle and I'd expect way
>>>> better results there.
>>>
>>> Thanks for calling that out, saying ALWAYS was probably getting a
>>> bit carried away :-)
>>> I've been trying things out with Derby, Postgres and MySQL. By
>>> default both Postgres[1] and MySQL[2] load the entire ResultSet
>>> into memory (Derby does not but calling last() is expensive), the
>>> only way to get hold of a cursor is to specify TYPE_FOWARD_ONLY
>>> and CONCUR_READ_ONLY but I've read of potential problems with this
>>> approach in MySQL[3]. And of course it removes the ability to
>>> call last() to get the size of the result but that can be negated
>>> by using the separate count query. It also causes getPartialList
>>> to fail because you can't jump forward in the resultset (MySQL and
>>> Derby, I haven't tried Postgres on that yet), but we could also
>>> negate that by detecting a forward only result set and using
>>> next() only to get to where we want to be.
>>>
>>> So with all that said here is my revised solution:
>>> 1. Always use a separate count query to get the resultset size in
>>> the ELI
>>> 2. Switch pagination queries to use performFindList rather than
>>> performFind and set maxRows, also switch the resultset type to
>>> FORWARD_ONLY for both services, I think it is rare that we would
>>> want to jump around a resultset except to get the size. Setting
>>> maxRows will cause no harm if using a cursor but if not it'll
>>> speed up queries quite a bit for large resultsets and reduce the
>>> memory consumed (unless the viewIndex is significantly high)
>>> 3. Detect forward only result sets in the ELI and change
>>> getPartialList to iterate to the desired position rather than
>>> using absolute().
>>> 4. MySQL requires you to set the fetchSize to Integer.MIN_VALUE
>>> but SQLProcessor overrides this if the setting is less than zero
>>> so change that to allow any int value to pass through.
>>>
>>> I'll keep testing with MySQL to see if I can reproduce the problem
>>> mentioned in [3]. Even if it is a problem and I have to go
>>> without the cursor, using maxRows should still result in some
>>> improvements.
>>>
>>> Thanks David, your feedback is always appreciated and often helps
>>> me to look at things in a different light.
>>>
>>> Regards
>>> Scott
>>>
>>> [1]
http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor>>> [2]
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
>>> (Skip down to the ResultSet paragraph)
>>> [3]
http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html
>>> (2nd to last sentence)
>>>
>>> On 28/08/2009, at 3:41 AM, David E Jones wrote:
>>>
>>>>
>>>> Scott,
>>>>
>>>> The steps you mentioned sound good. One possible simplification
>>>> is that you could always do a separate query in the
>>>> getResultSizeAfterPartialList method. IMO it is safe to assume
>>>> that the data may be large if the EntityListIterator is being
>>>> used explicitly. In other words, if the programmers knows there
>>>> won't be much data they'll just get a List back instead of an ELI.
>>>>
>>>> You mentioned: "regardless of the database you are using it is
>>>> ALWAYS faster to retrieve a resultset limited." By that do you
>>>> mean including just Derby and MySQL, or have you tried other
>>>> databases? Also, was the JDBC driver setup to have the ResultSet
>>>> backed by a cursor in the database? As for MySQL and Derby, I'm
>>>> not sure how good a job with this sort of thing I'd expect.
>>>> Things are a little different with Postgres and I'd expect better
>>>> results there, and a lot different with Oracle and I'd expect way
>>>> better results there.
>>>>
>>>> -David
>>>>
>>>>
>>>> On Aug 27, 2009, at 2:57 AM, Scott Gray wrote:
>>>>
>>>>> I can't do that at the moment because I've been modifying the
>>>>> same script repeatedly to test different situations and it's
>>>>> gotten pretty messy. But as soon as I've found a temporary
>>>>> solution to the problem we're having I'll come back to it, clean
>>>>> it up and post it so the discussion can continue with more
>>>>> people running tests.
>>>>>
>>>>> What I have been able to determine so far is that regardless of
>>>>> the database you are using it is ALWAYS faster to retrieve a
>>>>> resultset limited to the records you are actually going to use.
>>>>> The problem is that for pagination we "need" (I'm not sure how
>>>>> badly) to know the full size of the resultset. It turns out
>>>>> there is a magic number of records where it becomes faster to do
>>>>> a separate count query but only if you are using
>>>>> EntityFindOptions.setMaxRows(int) on your ELI along with it. On
>>>>> my machine it is somewhere between 25,000-50,000 records.
>>>>>
>>>>> Keep in mind also that this is only really a problem for screens
>>>>> where we paginate through resultsets, in most other cases we
>>>>> always use the entire resultset.
>>>>>
>>>>> It was taking too long to test different table sizes so I ended
>>>>> up just filling a table in MySql with 1,000,000 records and
>>>>> simulated paginating through it (each result is the same query
>>>>> run 5 times and the times are in milliseconds):
>>>>>
>>>>> Here's the result for the way we currently do it with an ELI,
>>>>> there is only one result because because it takes too long to
>>>>> test and the result doesn't really vary regardless of the
>>>>> viewIndex being 50 or 500,000:
>>>>> 0 viewIndex: min 13610, max 37307, avg 27386.2, total 136931
>>>>>
>>>>> Here's the result using an ELI with maxRows and a separate count
>>>>> query:
>>>>> 50 viewIndex: min 363, max 377, avg 371.8, total 1859
>>>>> 100 viewIndex: min 372, max 462, avg 413.4, total 2067
>>>>> 200 viewIndex: min 385, max 412, avg 394.4, total 1972
>>>>> 400 viewIndex: min 378, max 412, avg 392.2, total 1961
>>>>> 800 viewIndex: min 373, max 1044, avg 510, total 2550
>>>>> 1600 viewIndex: min 390, max 405, avg 397.4, total 1987
>>>>> 3200 viewIndex: min 402, max 433, avg 418.6, total 2093
>>>>> 6400 viewIndex: min 425, max 504, avg 449.8, total 2249
>>>>> 12800 viewIndex: min 459, max 648, avg 536, total 2680
>>>>> 25600 viewIndex: min 570, max 1173, avg 705.8, total 3529
>>>>> 51200 viewIndex: min 756, max 1144, avg 958.8, total 4794
>>>>> 102400 viewIndex: min 1252, max 2810, avg 1576, total 7880
>>>>> 204800 viewIndex: min 2123, max 10120, avg 5253.6, total 26268
>>>>> 409600 viewIndex: min 4212, max 10837, avg 7011, total 35055
>>>>>
>>>>> That's for a 1,000,000 records but the ELI by itself is much
>>>>> faster for me on small resultsets but gets progressively slower
>>>>> as the resultset's size increases.
>>>>>
>>>>> Another issue I encountered is that if I run the first portion
>>>>> of this test twice in two separate browser windows at the same
>>>>> time then an out of memory error occurs and the instance locks
>>>>> up until I restart it. Should we be able to recover from an out
>>>>> of memory error or should just we just concentrate on avoiding
>>>>> them?
>>>>>
>>>>> The only solution I can think of so far is to:
>>>>> 1. Add the ability for OFBiz to learn when a query becomes high
>>>>> volume i.e. it's resultsize begins crossing the configurable
>>>>> magic number threshold
>>>>> 2. Add a new method for pagination to the delegator that can
>>>>> decide whether or not to set maxRows based on #1 for the
>>>>> EntityListIterator that it will return
>>>>> 3. Provide the EntityListIterator with the information required
>>>>> to be able to perform a separate count query (it needs the
>>>>> delegator or dao + the where and having conditions or we could
>>>>> just give it a SQLProcessor ready to go)
>>>>> 4. Change the ELI's getResultSizeAfterPartialList to perform a
>>>>> count query if maxRows was set and the info from #3 was provided
>>>>> 5. For forms that use the generic performFind service for
>>>>> paginated results, switch them over to using the performFindList
>>>>> service and change it's implementation to use the new delegator
>>>>> method from #2
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Thanks
>>>>> Scott
>>>>>
>>>>>
>>>>> On 27/08/2009, at 3:22 AM, Adrian Crum wrote:
>>>>>
>>>>>> Scott,
>>>>>>
>>>>>> It would be helpful if you could post your script in a Jira
>>>>>> issues so we can run it against various databases. I would like
>>>>>> to try it on ours.
>>>>>>
>>>>>> -Adrian
>>>>>>
>>>>>> --- On Tue, 8/25/09, Scott Gray <
[hidden email]>
>>>>>> wrote:
>>>>>>
>>>>>>> From: Scott Gray <
[hidden email]>
>>>>>>> Subject: EntityListIterator.getResultsSizeAfterPartialList()
>>>>>>> vs. delegator.getCountByCondition(...)
>>>>>>> To:
[hidden email]
>>>>>>> Date: Tuesday, August 25, 2009, 9:04 PM
>>>>>>> Hi all,
>>>>>>>
>>>>>>> We've had a few slow query problems lately and I've
>>>>>>> narrowed it down to the ResultSet.last() method call in
>>>>>>> EntityListIterator when performed on large result
>>>>>>> sets. I switched the FindGeneric.groovy script in
>>>>>>> webtools to use findCountByCondition instead of
>>>>>>> getResultSizeAfterPartialList and the page load time went
>>>>>>> from 40-50s down to 2-3s for a result containing 700,000
>>>>>>> records.
>>>>>>>
>>>>>>> Based on that I assumed there was probably some magic
>>>>>>> number depending on your system where it becomes more
>>>>>>> efficient to do a separate count query rather than use the
>>>>>>> ResultSet so I put together a quick test to find out.
>>>>>>> I threw together a script that repeatedly adds 500 rows to
>>>>>>> the jobsandbox and then outputs the average time taken of 3
>>>>>>> attempts to get the list size for each method. Here's
>>>>>>> a graph of the results using embedded Derby:
http://imgur.com/ieR7m>>>>>>>
>>>>>>> So unless the magic number lies somewhere in the first 500
>>>>>>> records it looks to me like it always more efficient to do a
>>>>>>> separate count query.
>>>>>>>
>>>>>>> It makes me wonder if we should be taking a different
>>>>>>> approach to pagination in the form widget and in
>>>>>>> general. Any thoughts?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Scott
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>