Login  Register

Re: EntityListIterator.getResultsSizeAfterPartialList() vs. delegator.getCountByCondition(...)

Posted by Scott Gray-2 on Aug 29, 2009; 5:21am
URL: http://ofbiz.116.s1.nabble.com/EntityListIterator-getResultsSizeAfterPartialList-vs-delegator-getCountByCondition-tp206959p206966.html

Is it possible the pool only issues one connection per thread?  Here's  
the relevant portion of the stacktrace:
---- exception report  
----------------------------------------------------------
Failure in operation, rolling back transaction
Exception: org.ofbiz.entity.GenericDataSourceException
Message: SQL Exception while executing the following:SELECT COUNT(1)  
FROM JOB_SANDBOX (Streaming result set  
com.mysql.jdbc.RowDataDynamic@bcdff0 is still active. No statements  
may be issued when any streaming result sets are open and in use on a  
given connection. Ensure that you have called .close() on any active  
streaming result sets before attempting more queries.)
---- cause  
---------------------------------------------------------------------
Exception: java.sql.SQLException
Message: Streaming result set com.mysql.jdbc.RowDataDynamic@bcdff0 is  
still active. No statements may be issued when any streaming result  
sets are open and in use on a given connection. Ensure that you have  
called .close() on any active streaming result sets before attempting  
more queries.
---- stack trace  
---------------------------------------------------------------
java.sql.SQLException: Streaming result set  
com.mysql.jdbc.RowDataDynamic@bcdff0 is still active. No statements  
may be issued when any streaming result sets are open and in use on a  
given connection. Ensure that you have called .close() on any active  
streaming result sets before attempting more queries.
com.mysql.jdbc.SQLError.createSQLException(SQLError.java:930)
com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:
2672)
com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1866)
com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2107)
com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2642)
com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:
2171)
org
.apache
.commons
.dbcp
.DelegatingPreparedStatement
.executeQuery(DelegatingPreparedStatement.java:93)
org
.apache
.commons
.dbcp
.DelegatingPreparedStatement
.executeQuery(DelegatingPreparedStatement.java:93)
org.ofbiz.entity.jdbc.SQLProcessor.executeQuery(SQLProcessor.java:378)
org
.ofbiz
.entity.datasource.GenericDAO.selectCountByCondition(GenericDAO.java:
1075)
org
.ofbiz
.entity
.util
.EntityListIterator
.getResultsSizeAfterPartialList(EntityListIterator.java:524)

Regards
Scott

On 29/08/2009, at 4:52 PM, David E Jones wrote:

>
> On Aug 27, 2009, at 7:57 PM, Scott Gray wrote:
>
>> I just tried using a cursor with MySQL and the following caveat  
>> from the docs caught me out:
>>> There are some caveats with this approach. You will have to read  
>>> all of the rows in the result set (or close it) before you can  
>>> issue any other queries on the connection, or an exception will be  
>>> thrown.
>>>
>> Attempting the count query causes the exception to be thrown  
>> because the same connection is used.  Is it feasible to try and  
>> detect a streaming ResultSet on the connection and if present get  
>> another from the connection factory?  I'll skip the cursor for  
>> MySQL for now I think but I'll need to check whether Postgres  
>> exhibits the same behavior although their docs make no mention of it.
>
> I'm shooting from the hip here... but I don't think the connection  
> pool returns a connection to the pool (for reuse by other threads)  
> while it is still be used, ie not until the connection is "closed",  
> so I don't think this would happen with Entity Engine connections.
>
> -David
>
>
>> On 28/08/2009, at 12:39 PM, Scott Gray wrote:
>>
>>> Hi David,
>>>
>>> Always doing the separate query would certainly be easier, I think  
>>> I'll go with that.
>>>
>>>> You mentioned: "regardless of the database you are using it is  
>>>> ALWAYS faster to retrieve a resultset limited." By that do you  
>>>> mean including just Derby and MySQL, or have you tried other  
>>>> databases? Also, was the JDBC driver setup to have the ResultSet  
>>>> backed by a cursor in the database? As for MySQL and Derby, I'm  
>>>> not sure how good a job with this sort of thing I'd expect.  
>>>> Things are a little different with Postgres and I'd expect better  
>>>> results there, and a lot different with Oracle and I'd expect way  
>>>> better results there.
>>>
>>> Thanks for calling that out, saying ALWAYS was probably getting a  
>>> bit carried away :-)
>>> I've been trying things out with Derby, Postgres and MySQL.  By  
>>> default both Postgres[1] and MySQL[2] load the entire ResultSet  
>>> into memory (Derby does not but calling last() is expensive), the  
>>> only way to get hold of a cursor is to specify TYPE_FOWARD_ONLY  
>>> and CONCUR_READ_ONLY but I've read of potential problems with this  
>>> approach in MySQL[3].  And of course it removes the ability to  
>>> call last() to get the size of the result but that can be negated  
>>> by using the separate count query.  It also causes getPartialList  
>>> to fail because you can't jump forward in the resultset (MySQL and  
>>> Derby, I haven't tried Postgres on that yet), but we could also  
>>> negate that by detecting a forward only result set and using  
>>> next() only to get to where we want to be.
>>>
>>> So with all that said here is my revised solution:
>>> 1.  Always use a separate count query to get the resultset size in  
>>> the ELI
>>> 2.  Switch pagination queries to use performFindList rather than  
>>> performFind and set maxRows, also switch the resultset type to  
>>> FORWARD_ONLY for both services, I think it is rare that we would  
>>> want to jump around a resultset except to get the size.  Setting  
>>> maxRows will cause no harm if using a cursor but if not it'll  
>>> speed up queries quite a bit for large resultsets and reduce the  
>>> memory consumed (unless the viewIndex is significantly high)
>>> 3.  Detect forward only result sets in the ELI and change  
>>> getPartialList to iterate to the desired position rather than  
>>> using absolute().
>>> 4.  MySQL requires you to set the fetchSize to Integer.MIN_VALUE  
>>> but SQLProcessor overrides this if the setting is less than zero  
>>> so change that to allow any int value to pass through.
>>>
>>> I'll keep testing with MySQL to see if I can reproduce the problem  
>>> mentioned in [3].  Even if it is a problem and I have to go  
>>> without the cursor, using maxRows should still result in some  
>>> improvements.
>>>
>>> Thanks David, your feedback is always appreciated and often helps  
>>> me to look at things in a different light.
>>>
>>> Regards
>>> Scott
>>>
>>> [1] http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
>>> [2] http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html 
>>>  (Skip down to the ResultSet paragraph)
>>> [3] http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html 
>>>  (2nd to last sentence)
>>>
>>> On 28/08/2009, at 3:41 AM, David E Jones wrote:
>>>
>>>>
>>>> Scott,
>>>>
>>>> The steps you mentioned sound good. One possible simplification  
>>>> is that you could always do a separate query in the  
>>>> getResultSizeAfterPartialList method. IMO it is safe to assume  
>>>> that the data may be large if the EntityListIterator is being  
>>>> used explicitly. In other words, if the programmers knows there  
>>>> won't be much data they'll just get a List back instead of an ELI.
>>>>
>>>> You mentioned: "regardless of the database you are using it is  
>>>> ALWAYS faster to retrieve a resultset limited." By that do you  
>>>> mean including just Derby and MySQL, or have you tried other  
>>>> databases? Also, was the JDBC driver setup to have the ResultSet  
>>>> backed by a cursor in the database? As for MySQL and Derby, I'm  
>>>> not sure how good a job with this sort of thing I'd expect.  
>>>> Things are a little different with Postgres and I'd expect better  
>>>> results there, and a lot different with Oracle and I'd expect way  
>>>> better results there.
>>>>
>>>> -David
>>>>
>>>>
>>>> On Aug 27, 2009, at 2:57 AM, Scott Gray wrote:
>>>>
>>>>> I can't do that at the moment because I've been modifying the  
>>>>> same script repeatedly to test different situations and it's  
>>>>> gotten pretty messy.  But as soon as I've found a temporary  
>>>>> solution to the problem we're having I'll come back to it, clean  
>>>>> it up and post it so the discussion can continue with more  
>>>>> people running tests.
>>>>>
>>>>> What I have been able to determine so far is that regardless of  
>>>>> the database you are using it is ALWAYS faster to retrieve a  
>>>>> resultset limited to the records you are actually going to use.  
>>>>> The problem is that for pagination we "need" (I'm not sure how  
>>>>> badly) to know the full size of the resultset.  It turns out  
>>>>> there is a magic number of records where it becomes faster to do  
>>>>> a separate count query but only if you are using  
>>>>> EntityFindOptions.setMaxRows(int) on your ELI along with it.  On  
>>>>> my machine it is somewhere between 25,000-50,000 records.
>>>>>
>>>>> Keep in mind also that this is only really a problem for screens  
>>>>> where we paginate through resultsets, in most other cases we  
>>>>> always use the entire resultset.
>>>>>
>>>>> It was taking too long to test different table sizes so I ended  
>>>>> up just filling a table in MySql with 1,000,000 records and  
>>>>> simulated paginating through it (each result is the same query  
>>>>> run 5 times and the times are in milliseconds):
>>>>>
>>>>> Here's the result for the way we currently do it with an ELI,  
>>>>> there is only one result because because it takes too long to  
>>>>> test and the result doesn't really vary regardless of the  
>>>>> viewIndex being 50 or 500,000:
>>>>> 0 viewIndex: min 13610, max 37307, avg 27386.2, total 136931
>>>>>
>>>>> Here's the result using an ELI with maxRows and a separate count  
>>>>> query:
>>>>> 50 viewIndex: min 363, max 377, avg 371.8, total 1859
>>>>> 100 viewIndex: min 372, max 462, avg 413.4, total 2067
>>>>> 200 viewIndex: min 385, max 412, avg 394.4, total 1972
>>>>> 400 viewIndex: min 378, max 412, avg 392.2, total 1961
>>>>> 800 viewIndex: min 373, max 1044, avg 510, total 2550
>>>>> 1600 viewIndex: min 390, max 405, avg 397.4, total 1987
>>>>> 3200 viewIndex: min 402, max 433, avg 418.6, total 2093
>>>>> 6400 viewIndex: min 425, max 504, avg 449.8, total 2249
>>>>> 12800 viewIndex: min 459, max 648, avg 536, total 2680
>>>>> 25600 viewIndex: min 570, max 1173, avg 705.8, total 3529
>>>>> 51200 viewIndex: min 756, max 1144, avg 958.8, total 4794
>>>>> 102400 viewIndex: min 1252, max 2810, avg 1576, total 7880
>>>>> 204800 viewIndex: min 2123, max 10120, avg 5253.6, total 26268
>>>>> 409600 viewIndex: min 4212, max 10837, avg 7011, total 35055
>>>>>
>>>>> That's for a 1,000,000 records but the ELI by itself is much  
>>>>> faster for me on small resultsets but gets progressively slower  
>>>>> as the resultset's size increases.
>>>>>
>>>>> Another issue I encountered is that if I run the first portion  
>>>>> of this test twice in two separate browser windows at the same  
>>>>> time then an out of memory error occurs and the instance locks  
>>>>> up until I restart it.  Should we be able to recover from an out  
>>>>> of memory error or should just we just concentrate on avoiding  
>>>>> them?
>>>>>
>>>>> The only solution I can think of so far is to:
>>>>> 1.  Add the ability for OFBiz to learn when a query becomes high  
>>>>> volume i.e. it's resultsize begins crossing the configurable  
>>>>> magic number threshold
>>>>> 2.  Add a new method for pagination to the delegator that can  
>>>>> decide whether or not to set maxRows based on #1 for the  
>>>>> EntityListIterator that it will return
>>>>> 3.  Provide the EntityListIterator with the information required  
>>>>> to be able to perform a separate count query (it needs the  
>>>>> delegator or dao + the where and having conditions or we could  
>>>>> just give it a SQLProcessor ready to go)
>>>>> 4.  Change the ELI's getResultSizeAfterPartialList to perform a  
>>>>> count query if maxRows was set and the info from #3 was provided
>>>>> 5.  For forms that use the generic performFind service for  
>>>>> paginated results, switch them over to using the performFindList  
>>>>> service and change it's implementation to use the new delegator  
>>>>> method from #2
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Thanks
>>>>> Scott
>>>>>
>>>>>
>>>>> On 27/08/2009, at 3:22 AM, Adrian Crum wrote:
>>>>>
>>>>>> Scott,
>>>>>>
>>>>>> It would be helpful if you could post your script in a Jira  
>>>>>> issues so we can run it against various databases. I would like  
>>>>>> to try it on ours.
>>>>>>
>>>>>> -Adrian
>>>>>>
>>>>>> --- On Tue, 8/25/09, Scott Gray <[hidden email]>  
>>>>>> wrote:
>>>>>>
>>>>>>> From: Scott Gray <[hidden email]>
>>>>>>> Subject: EntityListIterator.getResultsSizeAfterPartialList()  
>>>>>>> vs. delegator.getCountByCondition(...)
>>>>>>> To: [hidden email]
>>>>>>> Date: Tuesday, August 25, 2009, 9:04 PM
>>>>>>> Hi all,
>>>>>>>
>>>>>>> We've had a few slow query problems lately and I've
>>>>>>> narrowed it down to the ResultSet.last() method call in
>>>>>>> EntityListIterator when performed on large result
>>>>>>> sets.  I switched the FindGeneric.groovy script in
>>>>>>> webtools to use findCountByCondition instead of
>>>>>>> getResultSizeAfterPartialList and the page load time went
>>>>>>> from 40-50s down to 2-3s for a result containing 700,000
>>>>>>> records.
>>>>>>>
>>>>>>> Based on that I assumed there was probably some magic
>>>>>>> number depending on your system where it becomes more
>>>>>>> efficient to do a separate count query rather than use the
>>>>>>> ResultSet so I put together a quick test to find out.
>>>>>>> I threw together a script that repeatedly adds 500 rows to
>>>>>>> the jobsandbox and then outputs the average time taken of 3
>>>>>>> attempts to get the list size for each method.  Here's
>>>>>>> a graph of the results using embedded Derby: http://imgur.com/ieR7m
>>>>>>>
>>>>>>> So unless the magic number lies somewhere in the first 500
>>>>>>> records it looks to me like it always more efficient to do a
>>>>>>> separate count query.
>>>>>>>
>>>>>>> It makes me wonder if we should be taking a different
>>>>>>> approach to pagination in the form widget and in
>>>>>>> general.  Any thoughts?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Scott
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


smime.p7s (3K) Download Attachment