[jira] [Commented] (OFBIZ-11789) Implement EntityBatchIterator for large data set queries

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (OFBIZ-11789) Implement EntityBatchIterator for large data set queries

Nicolas Malin (Jira)

    [ https://issues.apache.org/jira/browse/OFBIZ-11789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126641#comment-17126641 ]

Pawan Verma commented on OFBIZ-11789:
-------------------------------------

Implemented: EntityBatchIterator which pages through large resultsets (500 at a time by default) and added support for EntityBatchIterator to EntityQuery via 'queryBatchIterator'(OFBIZ-11789)

Modified GenericDao#makeOffsetString to have OFFSET/FETCH style by default
Modified entityengine's MySQL and Postgres data sources to have LIMIT/OFFSET style, rest can work with default OFFSET/FETCH style

PR Link: [https://github.com/apache/ofbiz-framework/pull/191]

Thanks: [~lektran] for the help.

 

This can be tested using EntityQuery#queryBatchIterator() method for any entity that has large data. See the logs to verify its working. 

 

 

> Implement EntityBatchIterator for large data set queries
> --------------------------------------------------------
>
>                 Key: OFBIZ-11789
>                 URL: https://issues.apache.org/jira/browse/OFBIZ-11789
>             Project: OFBiz
>          Issue Type: Improvement
>          Components: framework/entity
>    Affects Versions: Trunk
>            Reporter: Pawan Verma
>            Assignee: Pawan Verma
>            Priority: Minor
>
> Dev Mail Thread: [https://markmail.org/search/EntityBatchIterator+for+large+data+set+queries]
> While working on the large database we have figured out that very large queries consume all memory and crash ofbiz(because queryIterator() doesn't really work, it's no different from queryList())
> The EntityListIterator attempts to use a cursor to iterate over large result sets but in reality most databases do not give us a cursor unless we ask for it in a very specific way, and instead you get back the full result set and potentially consume a large amount of memory.  For example, the MySql details are here (ResultSet section): [https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html]
> To work around this we've built the EntityBatchIterator which instead basically just pages through results like you would do on a webpage that uses pagination.  The iterator takes an EntityQuery and when next() is called it grabs the first 500 records and returns the first record, call next again and you get the second record, once you've called next 500 times it runs the query again with an offset and grabs the next 500 and so on until there are no more records.
> The main downsides to this approach are:
> 1. Same as when using limit/offset you want to be sure your results will come back in a consistent order to avoid accidentally skipping some rows and seeing other rows twice.
> 2. Because the results are a combination of many queries rather than a single query, some of the data may change while you are paging through it. i.e. if you were to sort by lastUpdatedTxStamp you may see some rows twice as they are updated by other transactions (this might be a good thing in some cases).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)