The translation to English is not good, but I think I understand what
you are saying.

The entity values in the cache MUST be immutable, because multiple threads share the values. To do otherwise would require complicated synchronization code in GenericValue (which would cause blocking and hurt performance).

When I first started working on the entity cache issues, it appeared to me that mutable entity values may have been in the original design (to enable a write-through cache). That is my guess - I am not sure. At some point, the entity values in the cache were made immutable, but the change was incomplete - some cached entity values were immutable and others were not. That is one of the things I fixed - I made sure ALL entity values coming from the cache are immutable.

One way we can eliminate the additional complication of cloning immutable entity values is to wrap the List in a custom Iterator implementation that automatically clones elements as they are retrieved from the List. The drawback is the performance hit - you would be cloning values that might never be modified. I think it is more efficient to clone an entity value only when you intend to modify it.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/19/2015 4:19 PM, Nicolas Malin wrote:
> On 18/03/2015 13:16, Adrian Crum wrote:
>> If you code Delegator calls to avoid the cache, then there is no way
>> for a sysadmin to configure the caching behavior - that bit of code
>> will ALWAYS make a database call.
>>
>> If you make all Delegator calls use the cache, then there is an
>> additional complication that will add a bit more code: the
>> GenericValue instances retrieved from the cache are immutable - if you
>> want to modify them, then you will have to clone them. So, this
>> approach can produce an additional line of code.
>
> I don't see any logical reason why we need to keep a GenericValue that
> came from the cache immutable. In the larger picture, a developer should
> only have to indicate cache usage when he wants to force the cache to be
> used during his process.
> As OFBiz manages transactions, timezone, locale, auto-matching and
> others by default, the entity engine should likewise work with sysadmin
> cache tuning.
>
> As an example, delegator.find("Party", "partyId", partyId) would use the
> default parameters from cache.properties, and afterwards the store on a
> cached GenericValue is the delegator's problem. I see a simple test like
> this:
>
> if (genericValue came from cache) {
>     if (value is already done) {
>         getFromDataBase
>         update Value
>     }
>     else refuse (or not, I have a doubt :) )
> }
> store
>
> Nicolas
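Adrian's auto-cloning Iterator idea can be sketched in plain Java. This is a hypothetical illustration, not OFBiz code: the element copier is passed in as a function, since GenericValue cloning specifics are outside this sketch.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch (not OFBiz code): wrap a shared, cached list so that
// every element handed to the caller is a private copy. The trade-off Adrian
// describes is visible here: we pay for a copy even if the caller never
// modifies the element.
final class CloningIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final UnaryOperator<T> copier; // how to clone one element

    CloningIterator(List<T> source, UnaryOperator<T> copier) {
        this.delegate = source.iterator();
        this.copier = copier;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        // The caller may freely mutate the returned copy; the shared
        // cached element is never exposed directly.
        return copier.apply(delegate.next());
    }
}
```

The cost Adrian points out is exactly the `copier.apply` call on every `next()`, whether or not the caller ever modifies the element.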
You're missing a step that actually causes the issue: prior to the rollback
in 5b, some code within the same transaction retrieves the modified row from
the database again, which puts the modified row in the cache and makes the
change visible to other transactions even though it hasn't yet been
committed.

Because of our service-oriented architecture, this scenario isn't uncommon.
An example is updating an OrderHeader's statusId, which can trigger a number
of SECAs, which in turn are likely to retrieve the OrderHeader row after
being passed only the orderId. If a rollback occurred in one of those
services, the modified row would remain in the cache even though the changes
were never committed.

On 20 Mar 2015 00:06, "Adrian Crum" <[hidden email]> wrote:

> Okay, let's assume processes cannot "see" changes made by another
> transaction until that transaction is committed. Here is how the current
> entity cache works:
>
> 1. A Delegator find method is invoked. The Delegator checks the cache, and
> the SQL SELECT result does not exist in the cache.
> 2. The Delegator executes the SQL SELECT and puts the results in the
> entity cache.
> 3. The SQL SELECT results are returned to the calling process.
> 4. The calling process modifies one of the values (rows) in the SQL SELECT
> result (after cloning the immutable entity value).
> 5a. Something goes wrong and the calling process rolls back the
> transaction before the cloned value is persisted.
> 5b. Something goes wrong and the calling process rolls back the
> transaction after the cloned value is persisted and all related caches
> have been cleared.
> 6. Another process performs the same query as #1.
> 7. The second process gets the results from the cache. The values from the
> cache have not changed because the cloned & modified value (in #4) was not
> put in the cache, nor was it written to the data source.
>
> From my perspective, the scenario you described can only happen if another
> process can see changes that are made in the data source before the
> transaction is committed.
>
> From your perspective, the entity cache is somehow inserting invalid
> values when a transaction is rolled back.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/19/2015 10:41 AM, Scott Gray wrote:
>> I'm sorry but I'm not following what you're proposing. Currently row
>> changes caused within a transaction are available only to queries issued
>> within that same transaction (i.e. read committed), except that the cache
>> breaks this isolation by making them immediately available to any
>> transaction querying that entity. I don't see how this scenario exists
>> outside of the cache unless the logic within the transaction explicitly
>> passes a row off to another transaction, and I'm not aware of any cases
>> like that.
>>
>> On Thu, Mar 19, 2015 at 3:17 AM, Adrian Crum <[hidden email]> wrote:
>>
>>> I call it an edge case because it is easily fixed by changing the
>>> transaction isolation level.
>>>
>>> The behavior you describe is not caused by the entity cache, but by the
>>> transaction isolation level. The same scenario would exist without the
>>> entity cache - where two processes hold a reference to the updated row,
>>> and one process performs a rollback.
>>>
>>> Adrian Crum
>>> Sandglass Software
>>> www.sandglass-software.com
>>>
>>> On 3/19/2015 7:28 AM, Scott Gray wrote:
>>>
>>>> Ah, it's quite a large edge case IMO
>>>>
>>>> On Thu, Mar 19, 2015 at 12:20 AM, Adrian Crum <[hidden email]> wrote:
>>>>
>>>>> That is the edge case I mentioned.
>>>>>
>>>>> Adrian Crum
>>>>> Sandglass Software
>>>>> www.sandglass-software.com
>>>>>
>>>>> On 3/19/2015 6:54 AM, Scott Gray wrote:
>>>>>
>>>>>> I tend to disagree with the "cache everything" approach because the
>>>>>> cache isn't transaction aware.
>>>>>> If you:
>>>>>> 1. update a record
>>>>>> 2. select that same record
>>>>>> 3. encounter a transaction rollback
>>>>>>
>>>>>> Then the cache will still contain the changes that were rolled back.
>>>>>>
>>>>>> Regards
>>>>>> Scott
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 5:16 AM, Adrian Crum <[hidden email]> wrote:
>>>>>>
>>>>>>> I would like to share some insights into the entity cache feature,
>>>>>>> some best practices I like to follow, and some related information.
>>>>>>>
>>>>>>> Some OFBiz experts may disagree with some of my views, and that is
>>>>>>> okay. Different experiences with OFBiz will lead to different
>>>>>>> viewpoints.
>>>>>>>
>>>>>>> The OFBiz entity caching feature is intended to improve performance
>>>>>>> by keeping GenericValue instances in memory - decreasing the number
>>>>>>> of calls to the database.
>>>>>>>
>>>>>>> Background
>>>>>>> ----------
>>>>>>>
>>>>>>> Initially, the entity cache was very unreliable due to a number of
>>>>>>> flaws in its design and in the code that calls it (it was guaranteed
>>>>>>> to produce stale data). As a result, I personally avoided using the
>>>>>>> entity cache feature.
>>>>>>>
>>>>>>> Some time ago, Adam Heath did a lot of work on the entity cache.
>>>>>>> After that, Jacopo and I did a lot of work fixing stale data issues
>>>>>>> in the entity cache. Today, the entity cache is much improved and
>>>>>>> unit tests ensure it produces the correct data (except for one edge
>>>>>>> case that Jacopo has identified).
>>>>>>>
>>>>>>> I mention all of this because the previous quirky behavior led to
>>>>>>> some "best practices" that didn't make much sense. A search through
>>>>>>> the OFBiz mail archives will produce a mountain of conflicting and
>>>>>>> confusing information.
>>>>>>>
>>>>>>> Today
>>>>>>> -----
>>>>>>>
>>>>>>> Since the current entity cache is reliable, there is no reason NOT
>>>>>>> to use it. My preference is to make ALL Delegator calls use the
>>>>>>> cache. If all code uses the cache, then individual entities can have
>>>>>>> their caching characteristics configured outside of code. This
>>>>>>> enables sysadmins to fine-tune entity caches for best performance.
>>>>>>>
>>>>>>> [Some experts might disagree with this approach because the entity
>>>>>>> cache will consume all available memory. But the idea is to
>>>>>>> configure the cache so that doesn't happen.]
>>>>>>>
>>>>>>> If you code Delegator calls to avoid the cache, then there is no way
>>>>>>> for a sysadmin to configure the caching behavior - that bit of code
>>>>>>> will ALWAYS make a database call.
>>>>>>>
>>>>>>> If you make all Delegator calls use the cache, then there is an
>>>>>>> additional complication that will add a bit more code: the
>>>>>>> GenericValue instances retrieved from the cache are immutable - if
>>>>>>> you want to modify them, then you will have to clone them. So, this
>>>>>>> approach can produce an additional line of code.
>>>>>>>
>>>>>>> --
>>>>>>> Adrian Crum
>>>>>>> Sandglass Software
>>>>>>> www.sandglass-software.com
I understand. Yes, that could occur.
But I still believe it is an edge case. ;)

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/19/2015 8:37 PM, Scott Gray wrote:
> You're missing a step that actually causes the issue, prior to the rollback
> in 5b some code within the same transaction retrieves the modified row from
> the database again which puts the modified row in the cache and makes the
> change visible to other transactions even though it hasn't yet been
> committed.
> [...]
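The "additional line of code" Adrian mentions - cloning an immutable cached value before modifying it - can be shown with an unmodifiable map standing in for a cached GenericValue. This is a hypothetical stand-in, not the OFBiz API; the entity fields shown are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Stand-in for an immutable cached GenericValue (illustrative, not OFBiz API).
class CloneBeforeModifyDemo {
    // The cache hands out a shared, read-only view; mutating it throws.
    static Map<String, Object> cacheGet() {
        Map<String, Object> row = new HashMap<>();
        row.put("orderId", "WS10000");
        row.put("statusId", "ORDER_CREATED");
        return Collections.unmodifiableMap(row);
    }

    // The one extra line: copy the cached value before changing any field.
    static Map<String, Object> mutableCopy(Map<String, Object> cached) {
        return new HashMap<>(cached);
    }
}
```

The immutable view is what makes sharing across threads safe without synchronization; the copy is private to the transaction that intends to write.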
Isn't this the kind of issue that something like ehcache handles? It seems
to know the difference between a committed transaction and a transaction
that is in progress and might be rolled back.

Certainly a relational database with transaction support is not going to
allow a process to access data from other processes unless the transaction
is completed.

The cache needs to know the difference between private data (incomplete
transactions) and public data (data previously committed and not in the
process of being changed) and prevent others from using private data from
the cache.

On the bright side, an SOA does make this much more of an edge case, at the
expense of moving transaction rollback higher up the application logic.

Ron

On 19/03/2015 4:55 PM, Adrian Crum wrote:
> I understand. Yes, that could occur.
>
> But I still believe it is an edge case. ;)
> [...]

--
Ron Wheeler
President
Artifact Software Inc
email: [hidden email]
skype: ronaldmwheeler
phone: 866-970-2435, ext 102
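Ron's distinction between private (uncommitted) and public (committed) data can be sketched as a minimal staging cache. This is a conceptual illustration of transaction awareness, not ehcache's actual API and not the current OFBiz implementation; all names are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the "private vs public data" idea: writes made inside a
// transaction stay in a private staging map and become visible to other
// readers only on commit; rollback simply discards the staging map.
final class TxAwareCache<K, V> {
    private final Map<K, V> committed = new HashMap<>(); // public data
    private final Map<K, V> staged = new HashMap<>();    // private, in-flight data

    void putInTransaction(K key, V value) {
        staged.put(key, value);
    }

    V get(K key, boolean sameTransaction) {
        if (sameTransaction && staged.containsKey(key)) {
            return staged.get(key);
        }
        return committed.get(key); // other transactions never see staged data
    }

    void commit() {
        committed.putAll(staged);
        staged.clear();
    }

    void rollback() {
        staged.clear(); // rolled-back changes never leak into public data
    }
}
```

A real implementation would track staging per transaction (e.g. keyed by a transaction id) and deal with concurrency; this sketch only shows why Scott's rollback scenario cannot poison the committed view.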
Yes ehcache supports transactions and would ideally be what we use for
caching. I started work on it and there's a branch in svn for it, but I
haven't had time to continue since December. Unfortunately there were a few
incompatible aspects of the existing OFBiz cache API and the ehcache API
which need to be reconciled before it would be possible to run the two
against the same API and compare them.

I'll restate my opinion that I don't think the lack of transactional
awareness in the OFBiz cache is an "edge case". I think if you try to cache
everything you'll soon encounter strange behavior that will be very
difficult to reproduce and debug. My preference is to cache data that is
read often and updated rarely.

On 20 March 2015 at 17:44, Ron Wheeler <[hidden email]> wrote:
> Isn't this the kind of issue that something like ehcache handles?
> It seems to know the difference between a committed transaction and a
> transaction which is in progress and might be rolled back.
> [...]
+1
Jacques Le 20/03/2015 08:46, Scott Gray a écrit : > Yes ehcache supports transactions and would ideally be what we use for > caching. I started work on it and there's a branch in svn for it but I > haven't had time to continue since December. Unfortunately there were a > few incompatible aspects of the existing OFBiz cache API and the ehcache > API which need to be reconciled before it would be possible to run the two > against the same API and compare them. > > I'll restate my opinion that I don't think the lack of transactional > awareness by the OFBiz cache is an "edge case". I think if you try and > cache everything you'll soon encounter strange behavior that will be very > difficult to reproduce and debug. My preference is to cache data that is > read often and updated rarely. > > On 20 March 2015 at 17:44, Ron Wheeler <[hidden email]> > wrote: > >> Isn't this the kind of issue that something like ehcache handles? >> It seems to know the difference between a committed transaction and a >> transaction which is in progress and might be rolled back. >> >> Certainly a relational database with transaction support is not going to >> allow a process to access data from other processes unless the transaction >> is completed. >> The cache needs to know the difference between private data (incomplete >> transactions) and public data (data previously committed and not in the >> process of being changed) and prevent others from using private data from >> the cache. >> >> On the bright side, an SOA does make this much more of an edge case at the >> expense of moving transaction rollback higher up the application logic. >> >> Ron >> >> >> >> On 19/03/2015 4:55 PM, Adrian Crum wrote: >> >>> I understand. Yes, that could occur. >>> >>> But I still believe it is an edge case. 
;) >>> >>> Adrian Crum >>> Sandglass Software >>> www.sandglass-software.com >>> >>> On 3/19/2015 8:37 PM, Scott Gray wrote: >>> >>>> You're missing a step that actually causes the issue, prior to the >>>> rollback >>>> in 5b some code within the same transaction retrieves the modified row >>>> from >>>> the database again which puts the modified row in the cache and makes the >>>> change visible to other transactions even though it hasn't yet been >>>> committed. >>>> >>>> Because of our service oriented architecture this scenario isn't >>>> uncommon. >>>> An example is updating an OrderHeader's statusId which can trigger a >>>> number >>>> of SECAs which in turn are likely to retrieve the OrderHeader row after >>>> being passed only the orderId. If a rollback occurred in one of those >>>> services, the modified row would remain in the cache even though the >>>> changes were never committed. >>>> On 20 Mar 2015 00:06, "Adrian Crum" <[hidden email]> >>>> wrote: >>>> >>>> Okay, let's assume processes cannot "see" changes made by another >>>>> transaction until that transaction is committed. Here is how the current >>>>> entity cache works: >>>>> >>>>> 1. A Delegator find method is invoked. The Delegator checks the cache, >>>>> and >>>>> the SQL SELECT result does not exist in the cache. >>>>> 2. The Delegator executes the SQL SELECT and puts the results in the >>>>> entity cache. >>>>> 3. The SQL SELECT results are returned to the calling process. >>>>> 4. The calling process modifies one of the values (rows) in the SQL >>>>> SELECT >>>>> result (after cloning the immutable entity value). >>>>> 5a. Something goes wrong and the calling process rolls back the >>>>> transaction before the cloned value is persisted. >>>>> 5b. Something goes wrong and the calling process rolls back the >>>>> transaction after the cloned value is persisted and all related caches >>>>> have >>>>> been cleared. >>>>> 6. Another process performs the same query as #1. >>>>> 7. 
The second process gets the results from the cache. The values from >>>>> the >>>>> cache have not changed because the cloned & modified value (in #4) was >>>>> not >>>>> put in the cache, nor was it written to the data source. >>>>> >>>>> From my perspective, the scenario you described can only happen if >>>>> another >>>>> process can see changes that are made in the data source before the >>>>> transaction is committed. >>>>> >>>>> From your perspective, the entity cache is somehow inserting invalid >>>>> values when a transaction is rolled back. >>>>> >>>>> Adrian Crum >>>>> Sandglass Software >>>>> www.sandglass-software.com >>>>> >>>>> On 3/19/2015 10:41 AM, Scott Gray wrote: >>>>> >>>>> I'm sorry but I'm not following what you're proposing. Currently row >>>>>> changes caused within a transaction are available only to queries >>>>>> issued >>>>>> within that same transaction (i.e. read committed), except that the >>>>>> cache >>>>>> breaks this isolation by making them immediately available to any >>>>>> transaction querying that entity. I don't see how this scenario exists >>>>>> outside of the cache unless the logic within the transaction explicitly >>>>>> passes a row off to another transaction, and I'm not aware of any cases >>>>>> like that. >>>>>> >>>>>> On Thu, Mar 19, 2015 at 3:17 AM, Adrian Crum < >>>>>> [hidden email]> wrote: >>>>>> >>>>>> I call it an edge case because it is easily fixed by changing the >>>>>> >>>>>>> transaction isolation level. >>>>>>> >>>>>>> The behavior you describe is not caused by the entity cache, but by >>>>>>> the >>>>>>> transaction isolation level. The same scenario would exist without the >>>>>>> entity cache - where two processes hold a reference to the updated >>>>>>> row, >>>>>>> and >>>>>>> one process performs a rollback. 
I guess "edge" is a subjective term and I was careful to add "more of an" to allow for different perspectives. In the end, a framework that occasionally gives erroneous results is hard to work with unless the causes of those results can be clearly identified and are easy to avoid through workarounds in the application or framework code. If the errors are going to be random and unavoidable (workload dependent), they really need to be fixed. An accounting system that occasionally gives bad results or triggers factory orders based on phantom backlogs is not very good.

It is not clear to me that caching is or should be part of the key competencies of this group. Is there a critical mass of caching expertise in the group to properly maintain a custom caching system, given the other demands on time and resources? It seems to be one of those technologies (databases, containers, UI frameworks, etc.) that can and should be left to external resources if at all possible.

Assuming that the custom caching solution could handle all of the use cases correctly, the key question is how much effort it would take to fix and maintain the current caching system in comparison to moving to ehcache.

Is there any urgency to fixing caching? Is it broken now, or is it required to support the implementation of a new feature?

Ron

On 20/03/2015 5:56 AM, Jacques Le Roux wrote:
> +1
>
> Jacques
>
> Le 20/03/2015 08:46, Scott Gray a écrit :
>> Yes ehcache supports transactions and would ideally be what we use for
>> caching. I started work on it and there's a branch in svn for it but I
>> haven't had time to continue since December. Unfortunately there were a
>> few incompatible aspects of the existing OFBiz cache API and the ehcache
>> API which need to be reconciled before it would be possible to run the two
>> against the same API and compare them.
>>
>> I'll restate my opinion that I don't think the lack of transactional
>> awareness by the OFBiz cache is an "edge case".
--
Ron Wheeler
President
Artifact Software Inc
email: [hidden email]
skype: ronaldmwheeler
phone: 866-970-2435, ext 102
On 19/03/2015 18:46, Adrian Crum wrote:
> The translation to English is not good, but I think I understand what
> you are saying.

Oops, my apologies!

> The entity values in the cache MUST be immutable - because multiple
> threads share the values. To do otherwise would require complicated
> synchronization code in GenericValue (which would cause blocking and
> hurt performance).
>
> When I first starting working on the entity cache issues, it appeared
> to me that mutable entity values may have been in the original design
> (to enable a write-through cache). That is my guess - I am not sure.
> At some time, the entity values in the cache were made immutable, but
> the change was incomplete - some cached entity values were immutable
> and others were not. That is one of the things I fixed - I made sure
> ALL entity values coming from the cache are immutable.
>
> One way we can eliminate the additional complication of cloning
> immutable entity values is to wrap the List in a custom Iterator
> implementation that automatically clones elements as they are
> retrieved from the List. The drawback is the performance hit - because
> you would be cloning values that might not get modified. I think it is
> more efficient to clone an entity value only when you intend to modify
> it.

Right. Another way would be to add one step where the developer prepares the GenericValue for update.
GenericValue party = delegator.find("Party", "partyId", partyId);
party = party.openForUpdate();
party.set("comments", "groovy");
party.store();

On a list:

List<GenericValue> parties = delegator.findList("Party", null, null, null, null);
List<GenericValue> toStore = new ArrayList<>();
for (GenericValue party : parties) {
    if (/* case 1 */) {
        party = party.openForUpdate();
        party.set("comments", "groovy");
        toStore.add(party);
    }
}

With:

GenericValue.openForUpdate() {
    if (this.isMutable()) return this;
    return this.clone();
}

It's just a draft idea to reconcile sysadmins, developers, and performance.

Nicolas

> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>> Le 18/03/2015 13:16, Adrian Crum a écrit :
>>> If you code Delegator calls to avoid the cache, then there is no way
>>> for a sysadmin to configure the caching behavior - that bit of code
>>> will ALWAYS make a database call.
>>>
>>> If you make all Delegator calls use the cache, then there is an
>>> additional complication that will add a bit more code: the
>>> GenericValue instances retrieved from the cache are immutable - if you
>>> want to modify them, then you will have to clone them. So, this
>>> approach can produce an additional line of code.
>>
>> I don't see any logical reason why we need to keep a GenericValue came
>> from cache as immutable. In large vision, a developper give information
>> on cache or not only he want force the cache using during his process.
>> I see a simple test like that:
>> if (genericValue came from cache) {
>>   if (value is already done) {
>>     getFromDataBase
>>     update Value
>>   }
>>   else refuse (or not I have a doubt :) )
>> }
>> store
>>
>> Nicolas
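Adrian's auto-cloning Iterator idea from earlier in the thread could be sketched roughly as follows. This is an illustrative sketch, not actual OFBiz API: the class name is invented, and the `UnaryOperator` "copier" stands in for `GenericValue.clone()`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Wraps a List of shared, immutable cached values in an Iterator that
// copies each element as it is retrieved, so callers always receive a
// private mutable copy instead of the cached instance.
public final class CloningIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final UnaryOperator<T> copier;

    public CloningIterator(List<T> source, UnaryOperator<T> copier) {
        this.delegate = source.iterator();
        this.copier = copier;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        // Every element handed out is a fresh copy; the cached original
        // is never exposed to the caller.
        return copier.apply(delegate.next());
    }
}
```

As Adrian notes, the drawback is that every element gets copied whether or not the caller intends to modify it, which is why cloning only on intent to modify (or Nicolas' openForUpdate step) may be cheaper.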
Stepping back a little, some history and theory of the entity cache might be helpful. The original intent of the entity cache was a simple way to keep frequently used values/records closer to the code that uses them, i.e. in the application server. One real world example of this is the goal to be able to render ecommerce catalog and product pages without hitting the database. Over time the entity caching was made more complex to handle more caching scenarios, but still left to the developer to determine if caching is appropriate for the code they are writing.

In theory, is it possible to write an entity cache that can be used 100% of the time? IMO the answer is NO. This is almost possible for single record caching, with the cache ultimately becoming an in-memory relational database running on the app server (with full transaction support, etc)... but for List caching it totally kills the whole concept. The current entity cache keeps lists of results by the query condition used to get those results, and this is very different from what a database does, and makes things rather messy and inefficient outside simple use cases.

On top of these big functional issues (which are deal killers IMO), there is also the performance issue. The point, or intent at least, of the entity cache is to improve performance. As the cache gets more complex the performance will suffer, and because of the whole concept of caching results by queries the performance will be WORSE than the DB performance for the same queries in most cases. Databases are quite fast and efficient, and we'll never be able to reproduce their ability to scale and search in something like an in-memory entity cache, especially not considering the massive redundancy and overhead of caching lists of values by condition.

As an example of this in the real world: on a large OFBiz project I worked on that finished last year, we went into production with the entity cache turned OFF, completely DISABLED. Why?
When doing load testing, on a whim one of the guys decided to try it without the entity cache enabled, and the body of JMeter tests that exercised a few dozen of the most common user paths through the system actually ran FASTER. The database (MySQL in this case) was hit over the network, but responded quickly enough to make things work quite well for the various find queries, and FAR faster for updates, especially creates. This project was one of the higher volume projects I'm aware of for OFBiz, at peaks handling sustained processing of around 10 orders per second (36,000 per hour), with some short term peaks much higher, closer to 20-30 orders per second... and longer term peaks hitting over 200k orders in one day (North America only, day time, around a 12 hour window).

I found this to be curious so looked into it a bit more, and the main performance culprit was updates, ESPECIALLY creates on any entity that has an active list cache. Auto-clearing that cache requires running the condition for each cache entry on the record to see if it matches, and if it does then it is cleared. This could be made more efficient by expanding the reverse index concept to index all values of fields in conditions... though that would be fairly complex to implement because of the wide variety of conditions that CAN be performed on fields, and even more so when they are combined with other logic... especially NOTs and ORs. This could potentially increase performance, but would again add yet more complexity and overhead.

To turn this dilemma into a nightmare, consider caching view-entities. In general as systems scale, if you ever have to iterate over stuff your performance is going to get hit REALLY hard compared to indexed and other less-than-n operations.

The main lesson from the story: caching, especially list caching, should ONLY be done in limited cases when the ratio of reads to writes is VERY high, and more particularly the ratio of reads to creates.
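The auto-clearing cost described above can be made concrete with a toy model. This is not OFBiz code: the `Predicate` stands in for an entity condition, and the class name is invented. The point is that with results keyed by query condition, every create must evaluate ALL cached conditions against the new record, so creates cost O(number of cached queries).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Toy model of a condition-keyed list cache: cached results are keyed by
// the query condition that produced them, so clearing on create requires
// testing the new record against every cached condition.
final class ConditionKeyedListCache<R> {
    private final Map<Predicate<R>, List<R>> cache = new ConcurrentHashMap<>();

    void put(Predicate<R> condition, List<R> results) {
        cache.put(condition, results);
    }

    List<R> get(Predicate<R> condition) {
        return cache.get(condition);
    }

    // Called on create: any cached list whose condition matches the new
    // record may now be incomplete, so it has to be dropped. This scan is
    // the per-create cost that grows with the number of cached queries.
    void onCreate(R newRecord) {
        cache.entrySet().removeIf(entry -> entry.getKey().test(newRecord));
    }
}
```

This also shows why a reverse index on condition fields would help (skip the full scan), and why NOTs and ORs make that hard: a simple field index cannot tell which arbitrary boolean combinations a new record satisfies.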
When considering whether to use a cache, this should be weighed carefully, because records are sometimes updated from places that developers are unaware of, sometimes at surprising volumes. For example, it might seem great (and help a lot in dev and lower-scale testing) to cache inventory information for viewing on a category screen, but always go to the DB to avoid stale data on a product detail screen and when adding to the cart. The problem is that with high order volumes the inventory data is pretty much constantly being updated, so the caches are constantly... SLOWLY... being cleared as InventoryDetail records are created for reservations and issuances.

To turn this nightmare into a deal killer, consider multiple application servers and the need for either a (SLOW) distributed cache or (SLOW) distributed cache clearing. These have to go over the network anyway, so you might as well go to the database!

In the case above where we decided to NOT use the entity cache at all, the tests showing that disabling the cache was faster were run on one really beefy server. When we ran it on a cluster of just 2 servers with direct DCC (the best-case scenario for a distributed cache), we not only saw a big performance hit, but also got various run-time errors from stale data.

I really don't see how anyone could back the concept of caching all finds by default... you don't even have to imagine edge cases, just consider the problems ALREADY being faced with more limited caching and how often the entity cache simply isn't a good solution.

As for improving the entity caching in OFBiz, there are some concepts in Moqui that might be useful:

1. add a cache attribute to the entity definition with true, false, and never options; true and false being defaults that can be overridden by code, and never being an absolute (OFBiz does have this option IIRC); this would default to false, with true being a useful setting for common things like Enumeration, StatusItem, etc.

2. add general support in the entity engine find methods for a "for update" parameter; if true, don't cache (and pass this on to the DB to lock the record(s) being queried), also making the value mutable

3. a write-through per-transaction cache; you can do some really cool stuff with this, avoiding most database hits during a transaction until the end, when the changes are dumped to the DB; the Moqui implementation of this concept even looks for cached records that any find condition would require to get results and does the query in-memory, not having to go to the database at all... and for other queries it augments the results with values in the cache

The whole concept of a write-through cache that is limited to the scope of a single transaction shows some of the issues you would run into even if trying to make the entity cache transactional. Especially with more complex finds it just falls apart. The current Moqui implementation handles quite a bit, but there are various things that I've run into testing it with real-world business services that are either a REAL pain to handle (so I haven't yet, though it is conceptually possible) or that I simply can't think of any good way to handle... and for those you simply can't use the write-through cache.

There are some notes in the code about this, and some code/comments to more thoroughly communicate this concept, in this class in Moqui:

https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy

I should also say that my motivation to handle every edge case even for this write-through cache is limited... yes, there is room for improvement handling more scenarios, but how big will the performance increase ACTUALLY be for them? The efforts on this so far have been based on profiling results and making sure there is a significant difference (which there is for many services in Mantle Business Artifacts, though I haven't even come close to testing all of them this way).
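The write-through per-transaction cache idea (item 3) can be sketched as follows. This is NOT the real Moqui TransactionCache.groovy implementation - all names here are invented, and a FakeDb stands in for the database so the example is self-contained. It only illustrates the core idea: reads check the transaction-local buffer first, and writes are held in memory and dumped to the database in one batch when the transaction ends.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a write-through, per-transaction cache.
class TransactionCacheSketch {
    // Stand-in for the real database: a map plus a write counter, so the
    // example can show that writes are deferred until commit.
    static class FakeDb {
        final Map<String, String> rows = new HashMap<>();
        int writeCount = 0;
        String find(String key) { return rows.get(key); }
        void write(String key, String value) { rows.put(key, value); writeCount++; }
    }

    private final FakeDb db;
    private final Map<String, String> pendingWrites = new HashMap<>();

    TransactionCacheSketch(FakeDb db) { this.db = db; }

    // Reads see this transaction's uncommitted writes without touching the DB;
    // anything not buffered falls through to the database.
    String find(String key) {
        String buffered = pendingWrites.get(key);
        return buffered != null ? buffered : db.find(key);
    }

    // Creates/updates only touch the in-memory buffer until commit.
    void createOrUpdate(String key, String value) {
        pendingWrites.put(key, value);
    }

    // At commit time, dump all buffered changes to the database at once.
    void commit() {
        pendingWrites.forEach(db::write);
        pendingWrites.clear();
    }
}
```

The hard part, as noted above, is the find side: a real implementation also has to decide which find conditions can be answered (or augmented) from the buffer, which is where complex queries make the approach fall apart.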
The same concept would apply to a read-only entity cache... some things might be possible to support, but they would NOT improve performance, making them a moot point.

I don't know if I've written enough to convince everyone listening that even attempting a universal read-only entity cache is a useless idea... I'm sure some will still like the idea. If anyone gets into it and wants to try it out in their own branch of OFBiz, great... knock yourself out (probably literally...). But PLEASE no one ever commit something like this to the primary branch in the repo... not EVER.

The fact that the OFBiz entity cache had a more limited ability to handle different scenarios in the past than it does now is not an argument of any sort for taking the entity cache to the ultimate possible end... which theoretically isn't even that far from where it is now.

To apply a more useful standard, the arguments should be for a _useful_ objective, which means increasing performance. I guarantee an always-used find cache will NOT increase performance; it will kill it dead and cause infinite concurrency headaches in the process.

-David

> On 19 Mar 2015, at 10:46, Adrian Crum <[hidden email]> wrote:
>
> [...]
>
>> On 18/03/2015 13:16, Adrian Crum wrote:
>>> [...]
>>
>> [...] Just as OFBiz manages transactions, timezones, locales, auto-matching and so on by default, the entity engine should work from the sysadmin's cache tuning.
>>
>> As an example, delegator.find("Party", "partyId", partyId) would use the default parameter from cache.properties, and afterwards a store on a cached GenericValue is the delegator's problem. I see a simple test like this:
>>
>> if (genericValue came from cache) {
>>     if (value is already done) {
>>         getFromDataBase
>>         update Value
>>     }
>>     else refuse (or not, I have a doubt :) )
>> }
>> store
>>
>> Nicolas
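Adrian's point earlier in the thread - that cached values must stay immutable because multiple threads share them, and that it is more efficient to clone a value only when you intend to modify it - can be sketched like this. The names are invented and the real GenericValue API differs; this is only an illustration of the clone-before-modify rule.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an entity value that is frozen when it comes from
// the cache and must be cloned before modification.
class CachedValueSketch {
    private final Map<String, Object> fields;
    private final boolean mutable;

    private CachedValueSketch(Map<String, Object> fields, boolean mutable) {
        this.fields = fields;
        this.mutable = mutable;
    }

    // Values handed out by the cache are immutable, so every thread can share
    // the same instance without any synchronization.
    static CachedValueSketch fromCache(Map<String, Object> fields) {
        return new CachedValueSketch(Collections.unmodifiableMap(new HashMap<>(fields)), false);
    }

    // Clone only when you actually intend to modify; the copy is private to
    // the caller, so the shared cached instance is never touched.
    CachedValueSketch mutableClone() {
        return new CachedValueSketch(new HashMap<>(fields), true);
    }

    void set(String name, Object value) {
        if (!mutable) {
            throw new UnsupportedOperationException(
                "cached value is immutable; call mutableClone() first");
        }
        fields.put(name, value);
    }

    Object get(String name) { return fields.get(name); }
}
```

This is the cheaper alternative to wrapping cached lists in an auto-cloning Iterator: the clone cost is only paid for values that will actually be modified.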
David,
wow, quite an interesting read!

Thank you for sharing some historical design insights into the entity cache, and the valuable findings from using (or not using) the entity cache in a real-life production scenario.

I have not had the time to dig deeper into the proposed entity cache changes, but I had the feeling that this would be quite a challenging change which would have to be very well thought out and would need some thorough testing.

Regards,

Michael

On 20.03.15 at 22:22, David E. Jones wrote:
> [...]
On 20/03/2015 23:37, Michael Brohl wrote:
> David,
>
> wow, quite an interesting read!
>
> [...]

As Scott said, without the ability to compare the two possible implementations (old and new), it's a risky thing. David's feedback proves it.

Jacques

> [...]
Le 20/03/2015 23:41, Jacques Le Roux a écrit : > Le 20/03/2015 23:37, Michael Brohl a écrit : >> David, >> >> wow, quite an interesting read! >> >> Thank you for sharing some historical design insights for the entity cache and the valuable findings using (or not using) the entity cache in a >> real life production scenario. >> >> I had not the time to dig deeper into the proposed entity cache changes but had the feeling that this would be quite a challenging change which >> have to be very well thought-out and needs some thorough testing. > > Like said Scott, without the ability to compare the 2 possible implementations (old and new) it's a risky thing. David's feedback proves it > > Jacques Ha something I forgot to say also, nowadays with SSDs, which are roughly 10 times faster than HDs, things have changed a bit. Jacques > >> >> Regards, >> >> Michael >> >> >> Am 20.03.15 um 22:22 schrieb David E. Jones: >>> Stepping back a little, some history and theory of the entity cache might be helpful. >>> >>> The original intent of the entity cache was a simple way to keep frequently used values/records closer to the code that uses them, ie in the >>> application server. One real world example of this is the goal to be able to render ecommerce catalog and product pages without hitting the database. >>> >>> Over time the entity caching was made more complex to handle more caching scenarios, but still left to the developer to determine if caching is >>> appropriate for the code they are writing. >>> >>> In theory is it possible to write an entity cache that can be used 100% of the time? IMO the answer is NO. This is almost possible for single >>> record caching, with the cache ultimately becoming an in-memory relational database running on the app server (with full transaction support, >>> etc)... but for List caching it totally kills the whole concept. 
The current entity cache keeps lists of results by the query condition used to >>> get those results and this is very different from what a database does, and makes things rather messy and inefficient outside simple use cases. >>> >>> On top of these big functional issues (which are deal killers IMO), there is also the performance issue. The point, or intent at least, of the >>> entity cache is to improve performance. As the cache gets more complex the performance will suffer, and because of the whole concept of caching >>> results by queries the performance will be WORSE than the DB performance for the same queries in most cases. Databases are quite fast and >>> efficient, and we'll never be able to reproduce their ability to scale and search in something like an in-memory entity cache, especially not >>> considering the massive redundancy and overhead of caching lists of values by condition. >>> >>> As an example of this in the real world: on a large OFBiz project I worked on that finished last year we went into production with the entity >>> cache turned OFF, completely DISABLED. Why? When doing load testing on a whim one of the guys decided to try it without the entity cache enabled, >>> and the body of JMeter tests that exercised a few dozen of the most common user paths through the system actually ran FASTER. The database (MySQL >>> in this case) was hit over the network, but responded quickly enough to make things work quite well for the various find queries, and FAR faster >>> for updates, especially creates. This project was one of the higher volume projects I'm aware of for OFBiz, at peaks handling sustained processing >>> of around 10 orders per second (36,000 per hour), with some short term peaks much higher, closer to 20-30 orders per second... and longer term >>> peaks hitting over 200k orders in one day (north America only day time, around a 12 hour window). 
>>> >>> I found this to be curious so looked into it a bit more and the main performance culprit was updates, ESPECIALLY creates on any entity that has an >>> active list cache. Auto-clearing that cache requires running the condition for each cache entry on the record to see if it matches, and if it does >>> then it is cleared. This could be made more efficient by expanding the reverse index concept to index all values of fields in conditions... though >>> that would be fairly complex to implement because of the wide variety of conditions that CAN be performed on fields, and even moreso when they are >>> combined with other logic... especially NOTs and ORs. This could potentially increase performance, but would again add yet more complexity and >>> overhead. >>> >>> To turn this dilemma into a nightmare, consider caching view-entities. In general as systems scale if you ever have to iterate over stuff your >>> performance is going to get hit REALLY hard compared to indexed and other less than n operations. >>> >>> The main lesson from the story: caching, especially list caching, should ONLY be done in limited cases when the ratio of reads to write is VERY >>> high, and more particularly the ratio of reads to creates. When considering whether to use a cache this should be considered carefully, because >>> records are sometimes updated from places that developers are unaware, sometimes at surprising volumes. For example, it might seem great (and help >>> a lot in dev and lower scale testing) to cache inventory information for viewing on a category screen, but always go to the DB to avoid stale data >>> on a product detail screen and when adding to cart. The problem is that with high order volumes the inventory data is pretty much constantly being >>> updated, so the caches are constantly... SLOWLY... being cleared as InventoryDetail records are created for reservations and issuances. 
> To turn this nightmare into a deal killer, consider multiple application servers and the need for either a (SLOW) distributed cache or (SLOW) distributed cache clearing. These have to go over the network anyway, so you might as well go to the database!
>
> In the case above, where we decided NOT to use the entity cache at all, the tests showing that disabling the cache was faster were run on one really beefy server. When we ran it on a cluster of just 2 servers with direct DCC (the best-case scenario for a distributed cache), we not only saw a big performance hit, but also got various run-time errors from stale data.
>
> I really don't see how anyone could back the concept of caching all finds by default... you don't even have to imagine edge cases, just consider the problems ALREADY being faced with more limited caching and how often the entity cache simply isn't a good solution.
>
> As for improving the entity caching in OFBiz, there are some concepts in Moqui that might be useful:
>
> 1. Add a cache attribute to the entity definition with true, false, and never options; true and false being defaults that can be overridden by code, and never being an absolute (OFBiz does have this option IIRC). This would default to false, true being a useful setting for common things like Enumeration, StatusItem, etc.
>
> 2. Add general support in the entity engine find methods for a "for update" parameter; if true, don't cache (and pass this on to the DB to lock the record(s) being queried), also making the value mutable.
>
> 3. A write-through per-transaction cache; you can do some really cool stuff with this, avoiding most database hits during a transaction until the end, when the changes are dumped to the DB. The Moqui implementation of this concept even looks for cached records that any find condition would require to get results and does the query in-memory, not having to go to the database at all...
> and for other queries it augments the results with values from the cache.
>
> The whole concept of a write-through cache that is limited to the scope of a single transaction shows some of the issues you would run into even if trying to make the entity cache transactional. Especially with more complex finds, it just falls apart. The current Moqui implementation handles quite a bit, but there are various things I've run into while testing it with real-world business services that are either a REAL pain to handle (so I haven't yet, but it is conceptually possible) or that I simply can't think of any good way to handle... and for those you simply can't use the write-through cache.
>
> There are some notes in the code for this, and some code/comments to more thoroughly communicate this concept, in this class in Moqui:
>
> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
>
> I should also say that my motivation to handle every edge case even for this write-through cache is limited... yes, there is room for improvement in handling more scenarios, but how big will the performance increase ACTUALLY be for them? The efforts on this so far have been based on profiling results and making sure there is a significant difference (which there is for many services in Mantle Business Artifacts, though I haven't even come close to testing all of them this way).
>
> The same concept would apply to a read-only entity cache... some things might be possible to support, but would NOT improve performance, making them a moot point.
>
> I don't know if I've written enough to convince everyone listening that even attempting a universal read-only entity cache is a useless idea... I'm sure some will still like the idea. If anyone gets into it and wants to try it out in their own branch of OFBiz, great... knock yourself out (probably literally...).
> But PLEASE, no one ever commit something like this to the primary branch in the repo... not EVER.
>
> The idea that the OFBiz entity cache had a more limited ability to handle different scenarios in the past than it does now is not an argument of any sort for taking the entity cache to the ultimate possible end... which theoretically isn't even that far from where it is now.
>
> To apply a more useful standard, the arguments should be for a _useful_ objective, which means increasing performance. I guarantee an always-used find cache will NOT increase performance; it will kill it dead and cause infinite concurrency headaches in the process.
>
> -David
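As a rough illustration of concept #3 above (the write-through per-transaction cache), here is a minimal sketch. It is not Moqui's TransactionCache — that class handles far more, including serving finds from cached data — just the core buffering idea, with invented names (`TxCacheSketch`, `dbRead`, `dbWrite`) and plain maps standing in for entity values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of a write-through cache scoped to one transaction:
// writes are buffered in a tx-local map and reads are served from it, so the
// database is only touched on a read miss and once at commit.
class TxCacheSketch {
    private final Map<String, Map<String, Object>> pending = new LinkedHashMap<>();
    private final Function<String, Map<String, Object>> dbRead;     // pk -> record
    private final BiConsumer<String, Map<String, Object>> dbWrite;  // pk, record

    TxCacheSketch(Function<String, Map<String, Object>> dbRead,
                  BiConsumer<String, Map<String, Object>> dbWrite) {
        this.dbRead = dbRead;
        this.dbWrite = dbWrite;
    }

    // Reads prefer the transaction's own uncommitted writes.
    Map<String, Object> findOne(String pk) {
        Map<String, Object> buffered = pending.get(pk);
        return buffered != null ? buffered : dbRead.apply(pk);
    }

    // Creates and updates never hit the DB until commit.
    void store(String pk, Map<String, Object> record) {
        pending.put(pk, record);
    }

    // At commit, dump all buffered changes to the database in insertion order.
    void commit() {
        pending.forEach(dbWrite);
        pending.clear();
    }
}
```

The hard part David alludes to is not shown here: making condition-based finds see the buffered-but-uncommitted values, which is where the approach starts to fall apart for complex queries.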
In reply to this post by David E. Jones-2
Thanks for the info David! I agree 100% with everything you said.
There may be some misunderstanding about my advice. I suggested that caching should be configured in the settings file; I did not suggest that everything should be cached all the time. Like you said, JMeter tests can reveal what needs to be cached, and a sysadmin can fine-tune performance by tweaking the cache settings. The problem I mentioned is this: a sysadmin can't improve performance by caching a particular entity if a developer has hard-coded it not to be cached.

Btw, I removed the complicated condition checking in the condition cache because it didn't work. Not only was the system spending a lot of time evaluating long lists of values (each value having a potentially long list of conditions), at the end of the evaluation the result was always a cache miss.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/20/2015 9:22 PM, David E. Jones wrote:
> Stepping back a little, some history and theory of the entity cache might be helpful.
>
> The original intent of the entity cache was a simple way to keep frequently used values/records closer to the code that uses them, i.e. in the application server. One real-world example of this is the goal of being able to render ecommerce catalog and product pages without hitting the database.
>
> Over time the entity caching was made more complex to handle more caching scenarios, but it was still left to the developer to determine if caching is appropriate for the code they are writing.
>
> In theory, is it possible to write an entity cache that can be used 100% of the time? IMO the answer is NO. This is almost possible for single-record caching, with the cache ultimately becoming an in-memory relational database running on the app server (with full transaction support, etc.)... but for List caching it totally kills the whole concept.
> [...]
>
>> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>>> As OFBiz manages transactions, timezone, locale, auto-matching and others by default, the entity engine should work with sysadmin cache tuning.
>>>
>>> As an example, delegator.find("Party", "partyId", partyId) would use the default parameter from cache.properties, and afterwards a store on a cached GenericValue is the delegator's problem. I see a simple test like this:
>>>
>>>     if (genericValue came from cache) {
>>>         if (value has already been modified) {
>>>             getFromDataBase
>>>             update Value
>>>         }
>>>         else refuse (or not, I have a doubt :) )
>>>     }
>>>     store
>>>
>>> Nicolas
Hi all,
I just have to say I learned an insane amount of information from this thread! More than half of this stuff should be documented somewhere in the wiki. And to David Jones: respect! Thank you for sharing this information.

Taher Alkhateeb

----- Original Message -----
From: "Adrian Crum" <[hidden email]>
To: [hidden email]
Sent: Saturday, 21 March, 2015 10:39:09 AM
Subject: Re: Entity Caching

Thanks for the info David! I agree 100% with everything you said. [...]
In reply to this post by Adrian Crum-3
> My preference is to make ALL Delegator calls use the cache.
Perhaps I misunderstood the above sentence? I responded because I don't think caching everything is a good idea.

On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> wrote:
> Thanks for the info David! I agree 100% with everything you said.
>
> There may be some misunderstanding about my advice. I suggested that caching should be configured in the settings file; I did not suggest that everything should be cached all the time.
>
> Like you said, JMeter tests can reveal what needs to be cached, and a sysadmin can fine-tune performance by tweaking the cache settings. The problem I mentioned is this: A sysadmin can't improve performance by caching a particular entity if a developer has hard-coded it not to be cached.
>
> Btw, I removed the complicated condition checking in the condition cache because it didn't work. Not only was the system spending a lot of time evaluating long lists of values (each value having a potentially long list of conditions), at the end of the evaluation the result was always a cache miss.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/20/2015 9:22 PM, David E. Jones wrote:
>>
>> Stepping back a little, some history and theory of the entity cache might be helpful.
>>
>> The original intent of the entity cache was a simple way to keep frequently used values/records closer to the code that uses them, i.e. in the application server. One real-world example of this is the goal to be able to render ecommerce catalog and product pages without hitting the database.
>>
>> Over time the entity caching was made more complex to handle more caching scenarios, but it was still left to the developer to determine if caching is appropriate for the code they are writing.
>>
>> In theory, is it possible to write an entity cache that can be used 100% of the time? IMO the answer is NO. This is almost possible for single-record caching, with the cache ultimately becoming an in-memory relational database running on the app server (with full transaction support, etc.)... but for List caching it totally kills the whole concept. The current entity cache keeps lists of results by the query condition used to get those results; this is very different from what a database does, and it makes things rather messy and inefficient outside simple use cases.
>>
>> On top of these big functional issues (which are deal killers IMO), there is also the performance issue. The point, or intent at least, of the entity cache is to improve performance. As the cache gets more complex the performance will suffer, and because of the whole concept of caching results by queries the performance will be WORSE than the DB performance for the same queries in most cases. Databases are quite fast and efficient, and we'll never be able to reproduce their ability to scale and search in something like an in-memory entity cache, especially not considering the massive redundancy and overhead of caching lists of values by condition.
>>
>> As an example of this in the real world: on a large OFBiz project I worked on that finished last year, we went into production with the entity cache turned OFF, completely DISABLED. Why? When doing load testing, on a whim one of the guys decided to try it without the entity cache enabled, and the body of JMeter tests that exercised a few dozen of the most common user paths through the system actually ran FASTER. The database (MySQL in this case) was hit over the network, but responded quickly enough to make things work quite well for the various find queries, and FAR faster for updates, especially creates. This project was one of the higher-volume projects I'm aware of for OFBiz, at peaks handling sustained processing of around 10 orders per second (36,000 per hour), with some short-term peaks much higher, closer to 20-30 orders per second... and longer-term peaks hitting over 200k orders in one day (North America only, daytime, around a 12-hour window).
>>
>> I found this to be curious so I looked into it a bit more, and the main performance culprit was updates, ESPECIALLY creates on any entity that has an active list cache. Auto-clearing that cache requires running the condition for each cache entry on the record to see if it matches, and if it does then it is cleared. This could be made more efficient by expanding the reverse-index concept to index all values of fields in conditions... though that would be fairly complex to implement because of the wide variety of conditions that CAN be performed on fields, and even more so when they are combined with other logic... especially NOTs and ORs. This could potentially increase performance, but would again add yet more complexity and overhead.
>>
>> To turn this dilemma into a nightmare, consider caching view-entities. In general, as systems scale, if you ever have to iterate over stuff your performance is going to get hit REALLY hard compared to indexed and other less-than-n operations.
>>
>> The main lesson from the story: caching, especially list caching, should ONLY be done in limited cases when the ratio of reads to writes is VERY high, and more particularly the ratio of reads to creates. When considering whether to use a cache this should be considered carefully, because records are sometimes updated from places that developers are unaware of, sometimes at surprising volumes. For example, it might seem great (and help a lot in dev and lower-scale testing) to cache inventory information for viewing on a category screen, but always go to the DB to avoid stale data on a product detail screen and when adding to cart. The problem is that with high order volumes the inventory data is pretty much constantly being updated, so the caches are constantly... SLOWLY... being cleared as InventoryDetail records are created for reservations and issuances.
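The auto-clearing cost David describes can be seen in a toy model of a condition-keyed list cache. Everything here is illustrative (real OFBiz conditions are EntityCondition objects, not predicates, and the class names are made up): the point is that every create must scan all cached entries and evaluate each stored condition against the new record.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy condition-keyed list cache illustrating why creates are expensive:
// clearing requires evaluating every cached query's condition against the
// new record - O(number of cached queries) per create, before any reverse
// indexing. Names are hypothetical, not the OFBiz implementation.
class ToyListCache {
    // one entry per cached query: the condition plus its cached result list
    static class Entry {
        final Predicate<Map<String, Object>> condition;
        final List<Map<String, Object>> results;
        Entry(Predicate<Map<String, Object>> c, List<Map<String, Object>> r) {
            this.condition = c;
            this.results = r;
        }
    }

    private final Map<String, Entry> byQueryKey = new HashMap<>();

    void put(String queryKey, Predicate<Map<String, Object>> cond,
             List<Map<String, Object>> results) {
        byQueryKey.put(queryKey, new Entry(cond, results));
    }

    List<Map<String, Object>> get(String queryKey) {
        Entry e = byQueryKey.get(queryKey);
        return e == null ? null : e.results;
    }

    // Called on create: any cached list whose condition matches the new
    // record is now stale and must be dropped - a full scan of the cache.
    void clearOnCreate(Map<String, Object> newRecord) {
        byQueryKey.values().removeIf(e -> e.condition.test(newRecord));
    }
}
```

With many cached queries per entity, `clearOnCreate` runs on every insert, which matches David's observation that creates were the main performance culprit on entities with an active list cache.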
>> To turn this nightmare into a deal killer, consider multiple application servers and the need for either a (SLOW) distributed cache or (SLOW) distributed cache clearing. These have to go over the network anyway, so you might as well go to the database!
>>
>> In the case above where we decided to NOT use the entity cache at all, the tests were run on one really beefy server, showing that disabling the cache was faster. When we ran it in a cluster of just 2 servers with direct DCC (the best-case scenario for a distributed cache) we not only saw a big performance hit, but also got various run-time errors from stale data.
>>
>> I really don't see how anyone could back the concept of caching all finds by default... you don't even have to imagine edge cases, just consider the problems ALREADY being faced with more limited caching and how often the entity cache simply isn't a good solution.
>>
>> As for improving the entity caching in OFBiz, there are some concepts in Moqui that might be useful:
>>
>> 1. add a cache attribute to the entity definition with true, false, and never options; true and false being defaults that can be overridden by code, and never being an absolute (OFBiz does have this option IIRC); this would default to false, true being a useful setting for common things like Enumeration, StatusItem, etc, etc
>>
>> 2. add general support in the entity engine find methods for a "for update" parameter, and if true don't cache (and pass this on to the DB to lock the record(s) being queried), also making the value mutable
>>
>> 3. a write-through per-transaction cache; you can do some really cool stuff with this, avoiding most database hits during a transaction until the end when the changes are dumped to the DB; the Moqui implementation of this concept even looks for cached records that any find condition would require to get results and does the query in-memory, not having to go to the database at all... and for other queries augments the results with values in the cache
>>
>> The whole concept of a write-through cache that is limited to the scope of a single transaction shows some of the issues you would run into even if trying to make the entity cache transactional. Especially with more complex finds it just falls apart. The current Moqui implementation handles quite a bit, but there are various things that I've run into testing it with real-world business services that are either a REAL pain to handle (so I haven't yet, but it is conceptually possible) or that I simply can't think of any good way to handle... and for those you simply can't use the write-through cache.
>>
>> There are some notes in the code for this, and some code/comments to more thoroughly communicate this concept, in this class in Moqui:
>>
>> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
>>
>> I should also say that my motivation to handle every edge case even for this write-through cache is limited... yes, there is room for improvement handling more scenarios, but how big will the performance increase ACTUALLY be for them? The efforts on this so far have been based on profiling results and making sure there is a significant difference (which there is for many services in Mantle Business Artifacts, though I haven't even come close to testing all of them this way).
>>
>> The same concept would apply to a read-only entity cache... some things might be possible to support, but would NOT improve performance, making them a moot point.
>>
>> I don't know if I've written enough to convince everyone listening that even attempting a universal read-only entity cache is a useless idea... I'm sure some will still like the idea. If anyone gets into it and wants to try it out in their own branch of OFBiz, great... knock yourself out (probably literally...). But PLEASE no one ever commit something like this to the primary branch in the repo... not EVER.
>>
>> The whole idea that the OFBiz entity cache has had a more limited ability to handle different scenarios in the past than it does now is not an argument of any sort supporting the idea of taking the entity cache to the ultimate possible end... which theoretically isn't even that far from where it is now.
>>
>> To apply a more useful standard, the arguments should be for a _useful_ objective, which means increasing performance. I guarantee an always-used find cache will NOT increase performance; it will kill it dead and cause infinite concurrency headaches in the process.
>>
>> -David
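David's first suggestion (a per-entity cache attribute with true, false, and never options, where code may override the soft true/false defaults but never is absolute) boils down to a small resolution rule. The enum and method names below are invented for illustration; this is neither Moqui's nor OFBiz's actual API.

```java
// Sketch of suggestion 1: per-entity cache setting with a hard "never"
// and soft true/false defaults that calling code may override.
// Names are hypothetical, for illustration only.
enum EntityCacheSetting { TRUE, FALSE, NEVER }

final class CachePolicy {
    private CachePolicy() {}

    /**
     * Decide whether a find should use the cache.
     *
     * @param entitySetting the entity definition's cache attribute
     * @param codeOverride  what the calling code asked for, or null if it
     *                      left the decision to configuration
     */
    static boolean useCache(EntityCacheSetting entitySetting, Boolean codeOverride) {
        if (entitySetting == EntityCacheSetting.NEVER) {
            return false; // absolute: code cannot force caching
        }
        if (codeOverride != null) {
            return codeOverride; // an explicit request wins over the soft default
        }
        return entitySetting == EntityCacheSetting.TRUE;
    }
}
```

The key design point is the asymmetry: `NEVER` short-circuits before the code override is consulted, which is what makes it safe for entities that must never serve stale data.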
I will try to say it again, but differently.
If I am a developer, I am not aware of the subtleties of caching various entities. Entity cache settings will be determined during staging. So, I write my code as if everything will be cached - leaving the door open for a sysadmin to configure caching during staging.

During staging, a sysadmin can start off with caching disabled, and then switch on caching for various entities while performance tests are being run. After some time, the sysadmin will have cache settings that provide optimal throughput. Does that mean ALL entities are cached? No, only the ones that need to be.

The point I'm trying to make is this: The decision to cache or not should be made by a sysadmin, not by a developer.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/21/2015 10:08 AM, Scott Gray wrote:
>> My preference is to make ALL Delegator calls use the cache.
>
> Perhaps I misunderstood the above sentence? I responded because I don't think caching everything is a good idea
>
> On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> wrote:
>> Thanks for the info David! I agree 100% with everything you said.
>>
>> There may be some misunderstanding about my advice. I suggested that caching should be configured in the settings file; I did not suggest that everything should be cached all the time.
>>
>> Like you said, JMeter tests can reveal what needs to be cached, and a sysadmin can fine-tune performance by tweaking the cache settings. The problem I mentioned is this: A sysadmin can't improve performance by caching a particular entity if a developer has hard-coded it not to be cached.
>>
>> Btw, I removed the complicated condition checking in the condition cache because it didn't work. Not only was the system spending a lot of time evaluating long lists of values (each value having a potentially long list of conditions), at the end of the evaluation the result was always a cache miss.
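Adrian's workflow implies a lookup like the following, where per-entity flags live in a properties file a sysadmin can edit during staging. The key pattern `cache.entity.<name>` and the default of false are assumptions for illustration, not OFBiz's actual cache.properties format.

```java
import java.util.Properties;

// Sketch of sysadmin-controlled caching: code never hard-codes the
// decision; it asks configuration. The key pattern and false default
// are hypothetical, chosen only to illustrate the idea.
final class EntityCacheConfig {
    private final Properties props;

    EntityCacheConfig(Properties props) {
        this.props = props;
    }

    /** True only if the sysadmin explicitly enabled caching for this entity. */
    boolean isCacheEnabled(String entityName) {
        return Boolean.parseBoolean(
                props.getProperty("cache.entity." + entityName, "false"));
    }
}
```

Application code would then issue the same Delegator call everywhere and let this flag steer the cache behavior, which is exactly the door Adrian wants to leave open for tuning during staging.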
> The decision to cache or not should be made by a sysadmin
I agree. This should be a configuration aspect. But I would suggest setting it to false by default.

We have to take into consideration that in testing and staging environments the data sets to work with can be limited in size compared to the production environment, so performance issues might go unnoticed.

Having it set to false by default can also build merit for the system integrator/sysadmin when he switches the caching configuration from false to true to improve performance (perception). With true as the default and working the other way around, I expect that yielding the same result might prove more difficult.

Best regards,

Pierre Smits

*ORRTIZ.COM <http://www.orrtiz.com>*
Services & Solutions for Cloud-Based Manufacturing, Professional Services and Retail & Trade
http://www.orrtiz.com

On Sat, Mar 21, 2015 at 11:22 AM, Adrian Crum <[hidden email]> wrote:
> I will try to say it again, but differently.
>
> If I am a developer, I am not aware of the subtleties of caching various entities. Entity cache settings will be determined during staging. So, I write my code as if everything will be cached - leaving the door open for a sysadmin to configure caching during staging.
>
> During staging, a sysadmin can start off with caching disabled, and then switch on caching for various entities while performance tests are being run. After some time, the sysadmin will have cache settings that provide optimal throughput. Does that mean ALL entities are cached? No, only the ones that need to be.
>
> The point I'm trying to make is this: The decision to cache or not should be made by a sysadmin, not by a developer.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
I agree with Adrian that caching should be a sysadmin choice.
I would also caution that measuring cache performance during testing is not a very useful activity. Testing tends to exercise one use case once and move on to the next, while in production users tend to do the same thing over and over. Testing might fill a shopping cart a few times and perform a lot of other administrative functions just as many times. In real life, shopping carts are filled much more frequently than catalog updates (one hopes). Using performance numbers from functional testing will be misleading.

The other message that I get from David's discussion is that caching built by professional caching experts (database developers, as he mentioned) worked better than caching systems built by application developers. It is likely that ehcache and the databases' built-in caching functions will outperform caching systems built by OFBiz developers, will handle the main cases better, and will handle edge cases properly. They will probably integrate better and be easier to configure at run-time or during deployment. They will also be easier to tune by the system administrator.

I understand that Adrian needs to fix this quickly. I suppose that caching could be eliminated to solve the problem while a better solution is implemented. Do we know what it will take to add enough ehcache to make the system perform adequately to meet current requirements?

Ron

On 21/03/2015 6:22 AM, Adrian Crum wrote:
> I will try to say it again, but differently.
>
> If I am a developer, I am not aware of the subtleties of caching various entities. Entity cache settings will be determined during staging. So, I write my code as if everything will be cached - leaving the door open for a sysadmin to configure caching during staging.
>
> During staging, a sysadmin can start off with caching disabled, and then switch on caching for various entities while performance tests are being run. After some time, the sysadmin will have cache settings that provide optimal throughput.
Does that mean ALL entities are > cached? No, only the ones that need to be. > > The point I'm trying to make is this: The decision to cache or not > should be made by a sysadmin, not by a developer. > > Adrian Crum > Sandglass Software > www.sandglass-software.com > > On 3/21/2015 10:08 AM, Scott Gray wrote: >>> My preference is to make ALL Delegator calls use the cache. >> >> Perhaps I misunderstood the above sentence? I responded because I don't >> think caching everything is a good idea >> >> On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> >> wrote: >>> >>> Thanks for the info David! I agree 100% with everything you said. >>> >>> There may be some misunderstanding about my advice. I suggested that >> caching should be configured in the settings file, I did not suggest >> that >> everything should be cached all the time. >>> >>> Like you said, JMeter tests can reveal what needs to be cached, and a >> sysadmin can fine-tune performance by tweaking the cache settings. The >> problem I mentioned is this: A sysadmin can't improve performance by >> caching a particular entity if a developer has hard-coded it not to be >> cached. >>> >>> Btw, I removed the complicated condition checking in the condition >>> cache >> because it didn't work. Not only was the system spending a lot of time >> evaluating long lists of values (each value having a potentially long >> list >> of conditions), at the end of the evaluation the result was always a >> cache >> miss. >>> >>> >>> >>> Adrian Crum >>> Sandglass Software >>> www.sandglass-software.com >>> >>> On 3/20/2015 9:22 PM, David E. Jones wrote: >>>> >>>> >>>> Stepping back a little, some history and theory of the entity cache >> might be helpful. >>>> >>>> The original intent of the entity cache was a simple way to keep >> frequently used values/records closer to the code that uses them, ie >> in the >> application server. 
One real world example of this is the goal to be >> able >> to render ecommerce catalog and product pages without hitting the >> database. >>>> >>>> Over time the entity caching was made more complex to handle more >> caching scenarios, but still left to the developer to determine if >> caching >> is appropriate for the code they are writing. >>>> >>>> In theory is it possible to write an entity cache that can be used >>>> 100% >> of the time? IMO the answer is NO. This is almost possible for single >> record caching, with the cache ultimately becoming an in-memory >> relational >> database running on the app server (with full transaction support, >> etc)... >> but for List caching it totally kills the whole concept. The current >> entity >> cache keeps lists of results by the query condition used to get those >> results and this is very different from what a database does, and makes >> things rather messy and inefficient outside simple use cases. >>>> >>>> On top of these big functional issues (which are deal killers IMO), >> there is also the performance issue. The point, or intent at least, >> of the >> entity cache is to improve performance. As the cache gets more >> complex the >> performance will suffer, and because of the whole concept of caching >> results by queries the performance will be WORSE than the DB performance >> for the same queries in most cases. Databases are quite fast and >> efficient, >> and we'll never be able to reproduce their ability to scale and >> search in >> something like an in-memory entity cache, especially not considering the >> massive redundancy and overhead of caching lists of values by condition. >>>> >>>> As an example of this in the real world: on a large OFBiz project I >> worked on that finished last year we went into production with the >> entity >> cache turned OFF, completely DISABLED. Why? 
When doing load testing on a >> whim one of the guys decided to try it without the entity cache enabled, >> and the body of JMeter tests that exercised a few dozen of the most >> common >> user paths through the system actually ran FASTER. The database >> (MySQL in >> this case) was hit over the network, but responded quickly enough to >> make >> things work quite well for the various find queries, and FAR faster for >> updates, especially creates. This project was one of the higher volume >> projects I'm aware of for OFBiz, at peaks handling sustained >> processing of >> around 10 orders per second (36,000 per hour), with some short term >> peaks >> much higher, closer to 20-30 orders per second... and longer term peaks >> hitting over 200k orders in one day (north America only day time, >> around a >> 12 hour window). >>>> >>>> I found this to be curious so looked into it a bit more and the main >> performance culprit was updates, ESPECIALLY creates on any entity >> that has >> an active list cache. Auto-clearing that cache requires running the >> condition for each cache entry on the record to see if it matches, >> and if >> it does then it is cleared. This could be made more efficient by >> expanding >> the reverse index concept to index all values of fields in conditions... >> though that would be fairly complex to implement because of the wide >> variety of conditions that CAN be performed on fields, and even >> moreso when >> they are combined with other logic... especially NOTs and ORs. This >> could >> potentially increase performance, but would again add yet more >> complexity >> and overhead. >>>> >>>> To turn this dilemma into a nightmare, consider caching view-entities. >> In general as systems scale if you ever have to iterate over stuff your >> performance is going to get hit REALLY hard compared to indexed and >> other >> less than n operations. 
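The auto-clearing cost described above can be sketched as follows. This is a hedged illustration with hypothetical classes, not the real OFBiz UtilCache/EntityCondition API: list results are keyed by the condition that produced them, so every create must evaluate every cached condition against the new record, which is why creates get expensive on entities with active list caches.

```java
import java.util.*;
import java.util.function.Predicate;

// Minimal sketch (hypothetical classes, not the real OFBiz API) of a
// list cache keyed by query condition, and the clear-on-create pass
// that scans every cached condition.
public class ConditionListCache {
    // Each cache entry: the condition that produced the list, plus the results.
    static final class Entry {
        final Predicate<Map<String, Object>> condition;
        final List<Map<String, Object>> results;
        Entry(Predicate<Map<String, Object>> condition, List<Map<String, Object>> results) {
            this.condition = condition;
            this.results = results;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String conditionKey, Predicate<Map<String, Object>> condition,
             List<Map<String, Object>> results) {
        entries.put(conditionKey, new Entry(condition, results));
    }

    List<Map<String, Object>> get(String conditionKey) {
        Entry e = entries.get(conditionKey);
        return e == null ? null : e.results;
    }

    // Called on every create: O(number of cached conditions) work, since any
    // entry whose condition matches the new record is now stale.
    void clearMatching(Map<String, Object> newRecord) {
        entries.values().removeIf(e -> e.condition.test(newRecord));
    }

    int size() { return entries.size(); }
}
```

The reverse-index idea mentioned above would replace the linear `clearMatching` scan with a lookup from field values to the conditions that reference them, at the cost of indexing every condition's structure (hard for NOTs and ORs).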
>>>> >>>> The main lesson from the story: caching, especially list caching, >>>> should >> ONLY be done in limited cases when the ratio of reads to write is VERY >> high, and more particularly the ratio of reads to creates. When >> considering >> whether to use a cache this should be considered carefully, because >> records >> are sometimes updated from places that developers are unaware, >> sometimes at >> surprising volumes. For example, it might seem great (and help a lot >> in dev >> and lower scale testing) to cache inventory information for viewing on a >> category screen, but always go to the DB to avoid stale data on a >> product >> detail screen and when adding to cart. The problem is that with high >> order >> volumes the inventory data is pretty much constantly being updated, >> so the >> caches are constantly... SLOWLY... being cleared as InventoryDetail >> records >> are created for reservations and issuances. >>>> >>>> To turn this nightmare into a deal killer, consider multiple >>>> application >> servers and the need for either a (SLOW) distributed cache or (SLOW) >> distributed cache clearing. These have to go over the network anyway, so >> might as well go to the database! >>>> >>>> In the case above where we decided to NOT use the entity cache at all >> the tests were run on one really beefy server showing that disabling the >> cache was faster. When we ran it in a cluster of just 2 servers with >> direct >> DCC (the best case scenario for a distributed cache) we not only saw >> a big >> performance hit, but also got various run-time errors from stale data. >>>> >>>> I really don't how anyone could back the concept of caching all >>>> finds by >> default... you don't even have to imagine edge cases, just consider the >> problems ALREADY being faced with more limited caching and how often the >> entity cache simply isn't a good solution. 
>>>> >>>> As for improving the entity caching in OFBiz, there are some >>>> concepts in >> Moqui that might be useful: >>>> >>>> 1. add a cache attribute to the entity definition with true, false, >>>> and >> never options; true and false being defaults that can be overridden by >> code, and never being an absolute (OFBiz does have this option IIRC); >> this >> would default to false, true being a useful setting for common things >> like >> Enumeration, StatusItem, etc, etc >>>> >>>> 2. add general support in the entity engine find methods for a "for >> update" parameter, and if true don't cache (and pass this on to the >> DB to >> lock the record(s) being queried), also making the value mutable >>>> >>>> 3. a write-through per-transaction cache; you can do some really cool >> stuff with this, avoiding most database hits during a transaction >> until the >> end when the changes are dumped to the DB; the Moqui implementation >> of this >> concept even looks for cached records that any find condition would >> require >> to get results and does the query in-memory, not having to go to the >> database at all... and for other queries augments the results with >> values >> in the cache >>>> >>>> The whole concept of a write-through cache that is limited to the >>>> scope >> of a single transaction shows some of the issues you would run into >> even if >> trying to make the entity cache transactional. Especially with more >> complex >> finds it just falls apart. The current Moqui implementation handles >> quite a >> bit, but there are various things that I've run into testing it with >> real-world business services that are either a REAL pain to handle (so I >> haven't yet, but it is conceptually possible) or that I simply can't >> think >> of any good way to handle... and for those you simply can't use the >> write-through cache. 
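The per-transaction write-through cache (concept 3 above) might be sketched roughly like this. The class and method names are hypothetical stand-ins, not Moqui's actual TransactionCache API: writes are buffered for the duration of one transaction and only dumped to the database at commit, while single-record reads check the buffer before the committed state.

```java
import java.util.*;

// Hedged sketch of a per-transaction write-through cache (concept only;
// hypothetical names, not the Moqui implementation). A plain Map stands
// in for the database.
public class TransactionWriteCache {
    private final Map<String, Map<String, Object>> pendingWrites = new LinkedHashMap<>();
    private final Map<String, Map<String, Object>> database; // stand-in for the real DB

    TransactionWriteCache(Map<String, Map<String, Object>> database) {
        this.database = database;
    }

    // create/update inside the transaction: no database hit yet
    void put(String primaryKey, Map<String, Object> value) {
        pendingWrites.put(primaryKey, new HashMap<>(value));
    }

    // read by primary key: pending writes win over committed DB state
    Map<String, Object> findByPk(String primaryKey) {
        Map<String, Object> pending = pendingWrites.get(primaryKey);
        return pending != null ? pending : database.get(primaryKey);
    }

    // commit: dump all buffered changes to the database in one pass
    void commit() {
        database.putAll(pendingWrites);
        pendingWrites.clear();
    }

    // rollback: discard the buffer, the database is untouched
    void rollback() {
        pendingWrites.clear();
    }
}
```

The hard part David describes starts where this sketch stops: condition-based finds must be answered by combining (or replacing) database results with the buffered values, and for complex conditions that augmentation falls apart.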
>>>> >>>> There are some notes in the code for this, and some code/comments to >> more thoroughly communicate this concept, in this class in Moqui: >>>> >>>> >> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy >> >>>> >>>> I should also say that my motivation to handle every edge case even >>>> for >> this write-through cache is limited... yes there is room for improvement >> handling more scenarios, but how big will the performance increase >> ACTUALLY >> be for them? The efforts on this so far have been based on profiling >> results and making sure there is a significant difference (which >> there is >> for many services in Mantle Business Artifacts, though I haven't even >> come >> close to testing all of them this way). >>>> >>>> The same concept would apply to a read-only entity cache... some >>>> things >> might be possible to support, but would NOT improve performance >> making them >> a moot point. >>>> >>>> I don't know if I've written enough to convince everyone listening >>>> that >> even attempting a universal read-only entity cache is a useless >> idea... I'm >> sure some will still like the idea. If anyone gets into it and wants >> to try >> it out in their own branch of OFBiz, great... knock yourself out >> (probably >> literally...). But PLEASE no one ever commit something like this to the >> primary branch in the repo... not EVER. >>>> >>>> The whole idea that the OFBiz entity cache has had more limited >>>> ability >> to handle different scenarios in the past than it does now is not an >> argument of any sort supporting the idea of taking the entity cache >> to the >> ultimate possible end... which theoretically isn't even that far from >> where >> it is now. >>>> >>>> To apply a more useful standard the arguments should be for a _useful_ >> objective, which means increasing performance. 
I guarantee an always >> used >> find cache will NOT increase performance, it will kill it dead and cause >> infinite concurrency headaches in the process. >>>> >>>> -David >>>> >>>> >>>> >>>> >>>>> On 19 Mar 2015, at 10:46, Adrian Crum < >> [hidden email]> wrote: >>>>> >>>>> The translation to English is not good, but I think I understand what >> you are saying. >>>>> >>>>> The entity values in the cache MUST be immutable - because multiple >> threads share the values. To do otherwise would require complicated >> synchronization code in GenericValue (which would cause blocking and >> hurt >> performance). >>>>> >>>>> When I first starting working on the entity cache issues, it appeared >> to me that mutable entity values may have been in the original design >> (to >> enable a write-through cache). That is my guess - I am not sure. At some >> time, the entity values in the cache were made immutable, but the change >> was incomplete - some cached entity values were immutable and others >> were >> not. That is one of the things I fixed - I made sure ALL entity values >> coming from the cache are immutable. >>>>> >>>>> One way we can eliminate the additional complication of cloning >> immutable entity values is to wrap the List in a custom Iterator >> implementation that automatically clones elements as they are retrieved >> from the List. The drawback is the performance hit - because you >> would be >> cloning values that might not get modified. I think it is more >> efficient to >> clone an entity value only when you intend to modify it. >>>>> >>>>> Adrian Crum >>>>> Sandglass Software >>>>> www.sandglass-software.com >>>>> >>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote: >>>>>> >>>>>> Le 18/03/2015 13:16, Adrian Crum a écrit : >>>>>>> >>>>>>> If you code Delegator calls to avoid the cache, then there is no >>>>>>> way >>>>>>> for a sysadmin to configure the caching behavior - that bit of code >>>>>>> will ALWAYS make a database call. 
>>>>>>> >>>>>>> If you make all Delegator calls use the cache, then there is an >>>>>>> additional complication that will add a bit more code: the >>>>>>> GenericValue instances retrieved from the cache are immutable - >>>>>>> if you >>>>>>> want to modify them, then you will have to clone them. So, this >>>>>>> approach can produce an additional line of code. >>>>>> >>>>>> >>>>>> I don't see any logical reason why we need to keep a GenericValue >>>>>> came >>>>>> from cache as immutable. In large vision, a developper give >>>>>> information >>>>>> on cache or not only he want force the cache using during his >>>>>> process. >>>>>> As OFBiz manage by default transaction, timezone, locale, >>>>>> auto-matching >>>>>> or others. >>>>>> The entity engine would be works with admin sys cache tuning. >>>>>> >>>>>> As example delegator.find("Party", "partyId", partyId) use the >>>>>> default >>>>>> parameter from cache.properties and after the store on a cached >>>>>> GenericValue is a delegator's problem. I see a simple test like >>>>>> that : >>>>>> if (genericValue came from cache) { >>>>>> if (value is already done) { >>>>>> getFromDataBase >>>>>> update Value >>>>>> } >>>>>> else refuse (or not I have a doubt :) ) >>>>>> } >>>>>> store >>>>>> >>>>>> >>>>>> Nicolas >>>> >>>> >> > -- Ron Wheeler President Artifact Software Inc email: [hidden email] skype: ronaldmwheeler phone: 866-970-2435, ext 102 |
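Adrian's immutability rule, and the extra clone step it imposes on callers, could look roughly like this sketch. The `CachedValue` class is a hypothetical stand-in for GenericValue, not the real OFBiz API: the cache locks a value read-only before sharing it across threads, and a caller who intends to modify it pays for a clone.

```java
import java.util.*;

// Sketch of the clone-before-modify rule for cached entity values
// (hypothetical stand-in class, not the real GenericValue). Instances
// handed out by the cache are shared across threads, so they are made
// immutable; mutation requires an explicit clone.
public class CachedValue {
    private final Map<String, Object> fields = new HashMap<>();
    private boolean mutable = true;

    Object get(String name) { return fields.get(name); }

    void set(String name, Object value) {
        if (!mutable) {
            throw new UnsupportedOperationException(
                    "This value came from the cache; clone it before modifying");
        }
        fields.put(name, value);
    }

    // called by the cache before handing the shared instance to callers
    void setImmutable() { this.mutable = false; }

    // the "additional line of code" callers pay when they do intend to modify
    CachedValue cloneValue() {
        CachedValue copy = new CachedValue();
        copy.fields.putAll(this.fields);
        return copy; // the copy is mutable; the shared original is untouched
    }
}
```

This also shows why cloning only on intent to modify beats an iterator that clones every element: the clone cost is paid once per value actually changed, not once per value read.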
Is there a convenient setting for disabling cache completely as David
mentioned he did? On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote: > I agree with Adrian that caching should be a sysadmin choice. > > I would also caution that measuring cache performance during testing is > not a very useful activity. Testing tends to test one use case once and > move on to the next. > In production, users tend to do the same thing over and over. > Testing might fill a shopping cart a few times and do a lot of other > administrative functions as many times . In real life, shopping carts > are filled much more frequently than catalog updates (one hopes). Using > performance numbers from functional testing will be misleading. > > The other message that I get from David's discussion is that caching t > built by professional caching experts (Database developers as he > mentioned) worked better than caching systems built by application > developers. > It is likely that ehcache and the database built-in caching functions > will outperform caching systems built by OFBiz developers and will > handle the main cases better and will handle edge cases properly. They > will probably integrate better and be easier to configure at run-time or > during deployment. They will also be easier to tune by the system > administrator. > > I understand that Adrian needs to fix this quickly. I suppose that > caching could be eliminated to solve the problem while a better solution > is implemented. > > Do we know what it will take to add enough ehcache to make the system > perform adequately to meet current requirements? > > Ron > > > On 21/03/2015 6:22 AM, Adrian Crum wrote: > > I will try to say it again, but differently. > > > > If I am a developer, I am not aware of the subtleties of caching > > various entities. Entity cache settings will be determined during > > staging. So, I write my code as if everything will be cached - leaving > > the door open for a sysadmin to configure caching during staging. 
> > > > During staging, a sysadmin can start off with caching disabled, and > > then switch on caching for various entities while performance tests > > are being run. After some time, the sysadmin will have cache settings > > that provide optimal throughput. Does that mean ALL entities are > > cached? No, only the ones that need to be. > > > > The point I'm trying to make is this: The decision to cache or not > > should be made by a sysadmin, not by a developer. > > > > Adrian Crum > > Sandglass Software > > www.sandglass-software.com > > > > On 3/21/2015 10:08 AM, Scott Gray wrote: > >>> My preference is to make ALL Delegator calls use the cache. > >> > >> Perhaps I misunderstood the above sentence? I responded because I don't > >> think caching everything is a good idea > >> > >> On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> > >> wrote: > >>> > >>> Thanks for the info David! I agree 100% with everything you said. > >>> > >>> There may be some misunderstanding about my advice. I suggested that > >> caching should be configured in the settings file, I did not suggest > >> that > >> everything should be cached all the time. > >>> > >>> Like you said, JMeter tests can reveal what needs to be cached, and a > >> sysadmin can fine-tune performance by tweaking the cache settings. The > >> problem I mentioned is this: A sysadmin can't improve performance by > >> caching a particular entity if a developer has hard-coded it not to be > >> cached. > >>> > >>> Btw, I removed the complicated condition checking in the condition > >>> cache > >> because it didn't work. Not only was the system spending a lot of time > >> evaluating long lists of values (each value having a potentially long > >> list > >> of conditions), at the end of the evaluation the result was always a > >> cache > >> miss. > >>> > >>> > >>> > >>> Adrian Crum > >>> Sandglass Software > >>> www.sandglass-software.com > >>> > >>> On 3/20/2015 9:22 PM, David E. 
Jones wrote: > >>>> > >>>> > >>>> Stepping back a little, some history and theory of the entity cache > >> might be helpful. > >>>> > >>>> The original intent of the entity cache was a simple way to keep > >> frequently used values/records closer to the code that uses them, ie > >> in the > >> application server. One real world example of this is the goal to be > >> able > >> to render ecommerce catalog and product pages without hitting the > >> database. > >>>> > >>>> Over time the entity caching was made more complex to handle more > >> caching scenarios, but still left to the developer to determine if > >> caching > >> is appropriate for the code they are writing. > >>>> > >>>> In theory is it possible to write an entity cache that can be used > >>>> 100% > >> of the time? IMO the answer is NO. This is almost possible for single > >> record caching, with the cache ultimately becoming an in-memory > >> relational > >> database running on the app server (with full transaction support, > >> etc)... > >> but for List caching it totally kills the whole concept. The current > >> entity > >> cache keeps lists of results by the query condition used to get those > >> results and this is very different from what a database does, and makes > >> things rather messy and inefficient outside simple use cases. > >>>> > >>>> On top of these big functional issues (which are deal killers IMO), > >> there is also the performance issue. The point, or intent at least, > >> of the > >> entity cache is to improve performance. As the cache gets more > >> complex the > >> performance will suffer, and because of the whole concept of caching > >> results by queries the performance will be WORSE than the DB performance > >> for the same queries in most cases. 
Databases are quite fast and > >> efficient, > >> and we'll never be able to reproduce their ability to scale and > >> search in > >> something like an in-memory entity cache, especially not considering the > >> massive redundancy and overhead of caching lists of values by condition. > >>>> > >>>> As an example of this in the real world: on a large OFBiz project I > >> worked on that finished last year we went into production with the > >> entity > >> cache turned OFF, completely DISABLED. Why? When doing load testing on a > >> whim one of the guys decided to try it without the entity cache enabled, > >> and the body of JMeter tests that exercised a few dozen of the most > >> common > >> user paths through the system actually ran FASTER. The database > >> (MySQL in > >> this case) was hit over the network, but responded quickly enough to > >> make > >> things work quite well for the various find queries, and FAR faster for > >> updates, especially creates. This project was one of the higher volume > >> projects I'm aware of for OFBiz, at peaks handling sustained > >> processing of > >> around 10 orders per second (36,000 per hour), with some short term > >> peaks > >> much higher, closer to 20-30 orders per second... and longer term peaks > >> hitting over 200k orders in one day (north America only day time, > >> around a > >> 12 hour window). > >>>> > >>>> I found this to be curious so looked into it a bit more and the main > >> performance culprit was updates, ESPECIALLY creates on any entity > >> that has > >> an active list cache. Auto-clearing that cache requires running the > >> condition for each cache entry on the record to see if it matches, > >> and if > >> it does then it is cleared. This could be made more efficient by > >> expanding > >> the reverse index concept to index all values of fields in conditions... 
> >> though that would be fairly complex to implement because of the wide > >> variety of conditions that CAN be performed on fields, and even > >> moreso when > >> they are combined with other logic... especially NOTs and ORs. This > >> could > >> potentially increase performance, but would again add yet more > >> complexity > >> and overhead. > >>>> > >>>> To turn this dilemma into a nightmare, consider caching view-entities. > >> In general as systems scale if you ever have to iterate over stuff your > >> performance is going to get hit REALLY hard compared to indexed and > >> other > >> less than n operations. > >>>> > >>>> The main lesson from the story: caching, especially list caching, > >>>> should > >> ONLY be done in limited cases when the ratio of reads to write is VERY > >> high, and more particularly the ratio of reads to creates. When > >> considering > >> whether to use a cache this should be considered carefully, because > >> records > >> are sometimes updated from places that developers are unaware, > >> sometimes at > >> surprising volumes. For example, it might seem great (and help a lot > >> in dev > >> and lower scale testing) to cache inventory information for viewing on a > >> category screen, but always go to the DB to avoid stale data on a > >> product > >> detail screen and when adding to cart. The problem is that with high > >> order > >> volumes the inventory data is pretty much constantly being updated, > >> so the > >> caches are constantly... SLOWLY... being cleared as InventoryDetail > >> records > >> are created for reservations and issuances. > >>>> > >>>> To turn this nightmare into a deal killer, consider multiple > >>>> application > >> servers and the need for either a (SLOW) distributed cache or (SLOW) > >> distributed cache clearing. These have to go over the network anyway, so > >> might as well go to the database! 
> >>>> > >>>> In the case above where we decided to NOT use the entity cache at all > >> the tests were run on one really beefy server showing that disabling the > >> cache was faster. When we ran it in a cluster of just 2 servers with > >> direct > >> DCC (the best case scenario for a distributed cache) we not only saw > >> a big > >> performance hit, but also got various run-time errors from stale data. > >>>> > >>>> I really don't how anyone could back the concept of caching all > >>>> finds by > >> default... you don't even have to imagine edge cases, just consider the > >> problems ALREADY being faced with more limited caching and how often the > >> entity cache simply isn't a good solution. > >>>> > >>>> As for improving the entity caching in OFBiz, there are some > >>>> concepts in > >> Moqui that might be useful: > >>>> > >>>> 1. add a cache attribute to the entity definition with true, false, > >>>> and > >> never options; true and false being defaults that can be overridden by > >> code, and never being an absolute (OFBiz does have this option IIRC); > >> this > >> would default to false, true being a useful setting for common things > >> like > >> Enumeration, StatusItem, etc, etc > >>>> > >>>> 2. add general support in the entity engine find methods for a "for > >> update" parameter, and if true don't cache (and pass this on to the > >> DB to > >> lock the record(s) being queried), also making the value mutable > >>>> > >>>> 3. a write-through per-transaction cache; you can do some really cool > >> stuff with this, avoiding most database hits during a transaction > >> until the > >> end when the changes are dumped to the DB; the Moqui implementation > >> of this > >> concept even looks for cached records that any find condition would > >> require > >> to get results and does the query in-memory, not having to go to the > >> database at all... 
and for other queries augments the results with > >> values > >> in the cache > >>>> > >>>> The whole concept of a write-through cache that is limited to the > >>>> scope > >> of a single transaction shows some of the issues you would run into > >> even if > >> trying to make the entity cache transactional. Especially with more > >> complex > >> finds it just falls apart. The current Moqui implementation handles > >> quite a > >> bit, but there are various things that I've run into testing it with > >> real-world business services that are either a REAL pain to handle (so I > >> haven't yet, but it is conceptually possible) or that I simply can't > >> think > >> of any good way to handle... and for those you simply can't use the > >> write-through cache. > >>>> > >>>> There are some notes in the code for this, and some code/comments to > >> more thoroughly communicate this concept, in this class in Moqui: > >>>> > >>>> > >> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy > >> > >>>> > >>>> I should also say that my motivation to handle every edge case even > >>>> for > >> this write-through cache is limited... yes there is room for improvement > >> handling more scenarios, but how big will the performance increase > >> ACTUALLY > >> be for them? The efforts on this so far have been based on profiling > >> results and making sure there is a significant difference (which > >> there is > >> for many services in Mantle Business Artifacts, though I haven't even > >> come > >> close to testing all of them this way). > >>>> > >>>> The same concept would apply to a read-only entity cache... some > >>>> things > >> might be possible to support, but would NOT improve performance > >> making them > >> a moot point. > >>>> > >>>> I don't know if I've written enough to convince everyone listening > >>>> that > >> even attempting a universal read-only entity cache is a useless > >> idea... 
I'm > >> sure some will still like the idea. If anyone gets into it and wants > >> to try > >> it out in their own branch of OFBiz, great... knock yourself out > >> (probably > >> literally...). But PLEASE no one ever commit something like this to the > >> primary branch in the repo... not EVER. > >>>> > >>>> The whole idea that the OFBiz entity cache has had more limited > >>>> ability > >> to handle different scenarios in the past than it does now is not an > >> argument of any sort supporting the idea of taking the entity cache > >> to the > >> ultimate possible end... which theoretically isn't even that far from > >> where > >> it is now. > >>>> > >>>> To apply a more useful standard the arguments should be for a _useful_ > >> objective, which means increasing performance. I guarantee an always > >> used > >> find cache will NOT increase performance, it will kill it dead and cause > >> infinite concurrency headaches in the process. > >>>> > >>>> -David > >>>> > >>>> > >>>> > >>>> > >>>>> On 19 Mar 2015, at 10:46, Adrian Crum < > >> [hidden email]> wrote: > >>>>> > >>>>> The translation to English is not good, but I think I understand what > >> you are saying. > >>>>> > >>>>> The entity values in the cache MUST be immutable - because multiple > >> threads share the values. To do otherwise would require complicated > >> synchronization code in GenericValue (which would cause blocking and > >> hurt > >> performance). > >>>>> > >>>>> When I first starting working on the entity cache issues, it appeared > >> to me that mutable entity values may have been in the original design > >> (to > >> enable a write-through cache). That is my guess - I am not sure. At some > >> time, the entity values in the cache were made immutable, but the change > >> was incomplete - some cached entity values were immutable and others > >> were > >> not. That is one of the things I fixed - I made sure ALL entity values > >> coming from the cache are immutable. 
> >>>>> > >>>>> One way we can eliminate the additional complication of cloning > >> immutable entity values is to wrap the List in a custom Iterator > >> implementation that automatically clones elements as they are retrieved > >> from the List. The drawback is the performance hit - because you > >> would be > >> cloning values that might not get modified. I think it is more > >> efficient to > >> clone an entity value only when you intend to modify it. > >>>>> > >>>>> Adrian Crum > >>>>> Sandglass Software > >>>>> www.sandglass-software.com > >>>>> > >>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote: > >>>>>> > >>>>>> Le 18/03/2015 13:16, Adrian Crum a écrit : > >>>>>>> > >>>>>>> If you code Delegator calls to avoid the cache, then there is no > >>>>>>> way > >>>>>>> for a sysadmin to configure the caching behavior - that bit of code > >>>>>>> will ALWAYS make a database call. > >>>>>>> > >>>>>>> If you make all Delegator calls use the cache, then there is an > >>>>>>> additional complication that will add a bit more code: the > >>>>>>> GenericValue instances retrieved from the cache are immutable - > >>>>>>> if you > >>>>>>> want to modify them, then you will have to clone them. So, this > >>>>>>> approach can produce an additional line of code. > >>>>>> > >>>>>> > >>>>>> I don't see any logical reason why we need to keep a GenericValue > >>>>>> came > >>>>>> from cache as immutable. In large vision, a developper give > >>>>>> information > >>>>>> on cache or not only he want force the cache using during his > >>>>>> process. > >>>>>> As OFBiz manage by default transaction, timezone, locale, > >>>>>> auto-matching > >>>>>> or others. > >>>>>> The entity engine would be works with admin sys cache tuning. > >>>>>> > >>>>>> As example delegator.find("Party", "partyId", partyId) use the > >>>>>> default > >>>>>> parameter from cache.properties and after the store on a cached > >>>>>> GenericValue is a delegator's problem. 
I see a simple test like > >>>>>> that : > >>>>>> if (genericValue came from cache) { > >>>>>> if (value is already done) { > >>>>>> getFromDataBase > >>>>>> update Value > >>>>>> } > >>>>>> else refuse (or not I have a doubt :) ) > >>>>>> } > >>>>>> store > >>>>>> > >>>>>> > >>>>>> Nicolas > >>>> > >>>> > >> > > > > |
I don't see an enable/disable setting but
default.maxSize=0 in cache.properties should do it. Adrian Crum Sandglass Software www.sandglass-software.com On 3/22/2015 3:16 AM, Christian Carlow wrote: > Is there a convenient setting for disabling cache completely as David > mentioned he did? > > On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote: >> I agree with Adrian that caching should be a sysadmin choice. >> >> I would also caution that measuring cache performance during testing is >> not a very useful activity. Testing tends to test one use case once and >> move on to the next. >> In production, users tend to do the same thing over and over. >> Testing might fill a shopping cart a few times and do a lot of other >> administrative functions as many times . In real life, shopping carts >> are filled much more frequently than catalog updates (one hopes). Using >> performance numbers from functional testing will be misleading. >> >> The other message that I get from David's discussion is that caching t >> built by professional caching experts (Database developers as he >> mentioned) worked better than caching systems built by application >> developers. >> It is likely that ehcache and the database built-in caching functions >> will outperform caching systems built by OFBiz developers and will >> handle the main cases better and will handle edge cases properly. They >> will probably integrate better and be easier to configure at run-time or >> during deployment. They will also be easier to tune by the system >> administrator. >> >> I understand that Adrian needs to fix this quickly. I suppose that >> caching could be eliminated to solve the problem while a better solution >> is implemented. >> >> Do we know what it will take to add enough ehcache to make the system >> perform adequately to meet current requirements? >> >> Ron >> >> >> On 21/03/2015 6:22 AM, Adrian Crum wrote: >>> I will try to say it again, but differently. 
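Adrian's `default.maxSize=0` tip could sit in `cache.properties` roughly as below. Only the `default.maxSize` line comes from this thread; the per-cache override key is hypothetical and meant only to illustrate the staged re-enable workflow he describes (check the actual cache names in the Webtools cache maintenance page before using any override).

```properties
# Start staging with everything effectively disabled: a maxSize of 0
# means no entries are ever retained (from the tip above).
default.maxSize=0

# Hypothetical per-cache override a sysadmin might add once performance
# tests show a specific cache is worth enabling. The key name here is
# illustrative only; real cache names must be taken from the running system.
entitycache.example.StatusItem.maxSize=1000
```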
>>> >>> If I am a developer, I am not aware of the subtleties of caching >>> various entities. Entity cache settings will be determined during >>> staging. So, I write my code as if everything will be cached - leaving >>> the door open for a sysadmin to configure caching during staging. >>> >>> During staging, a sysadmin can start off with caching disabled, and >>> then switch on caching for various entities while performance tests >>> are being run. After some time, the sysadmin will have cache settings >>> that provide optimal throughput. Does that mean ALL entities are >>> cached? No, only the ones that need to be. >>> >>> The point I'm trying to make is this: The decision to cache or not >>> should be made by a sysadmin, not by a developer. >>> >>> Adrian Crum >>> Sandglass Software >>> www.sandglass-software.com >>> >>> On 3/21/2015 10:08 AM, Scott Gray wrote: >>>>> My preference is to make ALL Delegator calls use the cache. >>>> >>>> Perhaps I misunderstood the above sentence? I responded because I don't >>>> think caching everything is a good idea >>>> >>>> On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> >>>> wrote: >>>>> >>>>> Thanks for the info David! I agree 100% with everything you said. >>>>> >>>>> There may be some misunderstanding about my advice. I suggested that >>>> caching should be configured in the settings file, I did not suggest >>>> that >>>> everything should be cached all the time. >>>>> >>>>> Like you said, JMeter tests can reveal what needs to be cached, and a >>>> sysadmin can fine-tune performance by tweaking the cache settings. The >>>> problem I mentioned is this: A sysadmin can't improve performance by >>>> caching a particular entity if a developer has hard-coded it not to be >>>> cached. >>>>> >>>>> Btw, I removed the complicated condition checking in the condition >>>>> cache >>>> because it didn't work. 
Not only was the system spending a lot of time >>>> evaluating long lists of values (each value having a potentially long >>>> list >>>> of conditions), at the end of the evaluation the result was always a >>>> cache >>>> miss. >>>>> >>>>> >>>>> >>>>> Adrian Crum >>>>> Sandglass Software >>>>> www.sandglass-software.com >>>>> >>>>> On 3/20/2015 9:22 PM, David E. Jones wrote: >>>>>> >>>>>> >>>>>> Stepping back a little, some history and theory of the entity cache >>>> might be helpful. >>>>>> >>>>>> The original intent of the entity cache was a simple way to keep >>>> frequently used values/records closer to the code that uses them, ie >>>> in the >>>> application server. One real world example of this is the goal to be >>>> able >>>> to render ecommerce catalog and product pages without hitting the >>>> database. >>>>>> >>>>>> Over time the entity caching was made more complex to handle more >>>> caching scenarios, but still left to the developer to determine if >>>> caching >>>> is appropriate for the code they are writing. >>>>>> >>>>>> In theory is it possible to write an entity cache that can be used >>>>>> 100% >>>> of the time? IMO the answer is NO. This is almost possible for single >>>> record caching, with the cache ultimately becoming an in-memory >>>> relational >>>> database running on the app server (with full transaction support, >>>> etc)... >>>> but for List caching it totally kills the whole concept. The current >>>> entity >>>> cache keeps lists of results by the query condition used to get those >>>> results and this is very different from what a database does, and makes >>>> things rather messy and inefficient outside simple use cases. >>>>>> >>>>>> On top of these big functional issues (which are deal killers IMO), >>>> there is also the performance issue. The point, or intent at least, >>>> of the >>>> entity cache is to improve performance. 
As the cache gets more >>>> complex the >>>> performance will suffer, and because of the whole concept of caching >>>> results by queries the performance will be WORSE than the DB performance >>>> for the same queries in most cases. Databases are quite fast and >>>> efficient, >>>> and we'll never be able to reproduce their ability to scale and >>>> search in >>>> something like an in-memory entity cache, especially not considering the >>>> massive redundancy and overhead of caching lists of values by condition. >>>>>> >>>>>> As an example of this in the real world: on a large OFBiz project I >>>> worked on that finished last year we went into production with the >>>> entity >>>> cache turned OFF, completely DISABLED. Why? When doing load testing on a >>>> whim one of the guys decided to try it without the entity cache enabled, >>>> and the body of JMeter tests that exercised a few dozen of the most >>>> common >>>> user paths through the system actually ran FASTER. The database >>>> (MySQL in >>>> this case) was hit over the network, but responded quickly enough to >>>> make >>>> things work quite well for the various find queries, and FAR faster for >>>> updates, especially creates. This project was one of the higher volume >>>> projects I'm aware of for OFBiz, at peaks handling sustained >>>> processing of >>>> around 10 orders per second (36,000 per hour), with some short term >>>> peaks >>>> much higher, closer to 20-30 orders per second... and longer term peaks >>>> hitting over 200k orders in one day (North America only day time, >>>> around a >>>> 12 hour window). >>>>>> >>>>>> I found this to be curious so looked into it a bit more and the main >>>> performance culprit was updates, ESPECIALLY creates on any entity >>>> that has >>>> an active list cache. Auto-clearing that cache requires running the >>>> condition for each cache entry on the record to see if it matches, >>>> and if >>>> it does then it is cleared.
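The auto-clearing cost David describes can be sketched as follows. This is a hypothetical toy, not the OFBiz ConditionCache: a list cache keyed by query condition, where every create must evaluate EVERY cached condition against the new record to find stale entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch of a condition-keyed list cache. Reads are cheap, but
// each write pays an O(cached-conditions) scan, each step running an
// arbitrary condition against the new record.
public class ConditionListCache {
    private final Map<Predicate<Map<String, Object>>, List<Map<String, Object>>> cache =
            new ConcurrentHashMap<>();

    public void put(Predicate<Map<String, Object>> condition,
                    List<Map<String, Object>> results) {
        cache.put(condition, new ArrayList<>(results));
    }

    public List<Map<String, Object>> get(Predicate<Map<String, Object>> condition) {
        return cache.get(condition);
    }

    // Called on every create: this full scan is why creates on entities with
    // an active list cache were the main performance culprit.
    public int clearMatching(Map<String, Object> newRecord) {
        int cleared = 0;
        for (Predicate<Map<String, Object>> condition : cache.keySet()) {
            if (condition.test(newRecord)) {
                cache.remove(condition);
                cleared++;
            }
        }
        return cleared;
    }
}
```

A reverse index on condition field values, as suggested later in the thread, would avoid the full scan, at the cost of handling NOTs, ORs, and the rest of the condition algebra.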
This could be made more efficient by >>>> expanding >>>> the reverse index concept to index all values of fields in conditions... >>>> though that would be fairly complex to implement because of the wide >>>> variety of conditions that CAN be performed on fields, and even >>>> more so when >>>> they are combined with other logic... especially NOTs and ORs. This >>>> could >>>> potentially increase performance, but would again add yet more >>>> complexity >>>> and overhead. >>>>>> >>>>>> To turn this dilemma into a nightmare, consider caching view-entities. >>>> In general as systems scale if you ever have to iterate over stuff your >>>> performance is going to get hit REALLY hard compared to indexed and >>>> other >>>> less-than-O(n) operations. >>>>>> >>>>>> The main lesson from the story: caching, especially list caching, >>>>>> should >>>> ONLY be done in limited cases when the ratio of reads to writes is VERY >>>> high, and more particularly the ratio of reads to creates. When >>>> considering >>>> whether to use a cache this should be considered carefully, because >>>> records >>>> are sometimes updated from places that developers are unaware of, >>>> sometimes at >>>> surprising volumes. For example, it might seem great (and help a lot >>>> in dev >>>> and lower scale testing) to cache inventory information for viewing on a >>>> category screen, but always go to the DB to avoid stale data on a >>>> product >>>> detail screen and when adding to cart. The problem is that with high >>>> order >>>> volumes the inventory data is pretty much constantly being updated, >>>> so the >>>> caches are constantly... SLOWLY... being cleared as InventoryDetail >>>> records >>>> are created for reservations and issuances. >>>>>> >>>>>> To turn this nightmare into a deal killer, consider multiple >>>>>> application >>>> servers and the need for either a (SLOW) distributed cache or (SLOW) >>>> distributed cache clearing.
These have to go over the network anyway, so >>>> might as well go to the database! >>>>>> >>>>>> In the case above where we decided to NOT use the entity cache at all >>>> the tests were run on one really beefy server showing that disabling the >>>> cache was faster. When we ran it in a cluster of just 2 servers with >>>> direct >>>> DCC (the best case scenario for a distributed cache) we not only saw >>>> a big >>>> performance hit, but also got various run-time errors from stale data. >>>>>> >>>>>> I really don't know how anyone could back the concept of caching all >>>>>> finds by >>>> default... you don't even have to imagine edge cases, just consider the >>>> problems ALREADY being faced with more limited caching and how often the >>>> entity cache simply isn't a good solution. >>>>>> >>>>>> As for improving the entity caching in OFBiz, there are some >>>>>> concepts in >>>> Moqui that might be useful: >>>>>> >>>>>> 1. add a cache attribute to the entity definition with true, false, >>>>>> and >>>> never options; true and false being defaults that can be overridden by >>>> code, and never being an absolute (OFBiz does have this option IIRC); >>>> this >>>> would default to false, true being a useful setting for common things >>>> like >>>> Enumeration, StatusItem, etc, etc >>>>>> >>>>>> 2. add general support in the entity engine find methods for a "for >>>> update" parameter, and if true don't cache (and pass this on to the >>>> DB to >>>> lock the record(s) being queried), also making the value mutable >>>>>> >>>>>> 3. a write-through per-transaction cache; you can do some really cool >>>> stuff with this, avoiding most database hits during a transaction >>>> until the >>>> end when the changes are dumped to the DB; the Moqui implementation >>>> of this >>>> concept even looks for cached records that any find condition would >>>> require >>>> to get results and does the query in-memory, not having to go to the >>>> database at all...
and for other queries augments the results with >>>> values >>>> in the cache >>>>>> >>>>>> The whole concept of a write-through cache that is limited to the >>>>>> scope >>>> of a single transaction shows some of the issues you would run into >>>> even if >>>> trying to make the entity cache transactional. Especially with more >>>> complex >>>> finds it just falls apart. The current Moqui implementation handles >>>> quite a >>>> bit, but there are various things that I've run into testing it with >>>> real-world business services that are either a REAL pain to handle (so I >>>> haven't yet, but it is conceptually possible) or that I simply can't >>>> think >>>> of any good way to handle... and for those you simply can't use the >>>> write-through cache. >>>>>> >>>>>> There are some notes in the code for this, and some code/comments to >>>> more thoroughly communicate this concept, in this class in Moqui: >>>>>> >>>>>> >>>> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy >>>> >>>>>> >>>>>> I should also say that my motivation to handle every edge case even >>>>>> for >>>> this write-through cache is limited... yes there is room for improvement >>>> handling more scenarios, but how big will the performance increase >>>> ACTUALLY >>>> be for them? The efforts on this so far have been based on profiling >>>> results and making sure there is a significant difference (which >>>> there is >>>> for many services in Mantle Business Artifacts, though I haven't even >>>> come >>>> close to testing all of them this way). >>>>>> >>>>>> The same concept would apply to a read-only entity cache... some >>>>>> things >>>> might be possible to support, but would NOT improve performance >>>> making them >>>> a moot point. >>>>>> >>>>>> I don't know if I've written enough to convince everyone listening >>>>>> that >>>> even attempting a universal read-only entity cache is a useless >>>> idea... 
I'm >>>> sure some will still like the idea. If anyone gets into it and wants >>>> to try >>>> it out in their own branch of OFBiz, great... knock yourself out >>>> (probably >>>> literally...). But PLEASE no one ever commit something like this to the >>>> primary branch in the repo... not EVER. >>>>>> >>>>>> The whole idea that the OFBiz entity cache has had more limited >>>>>> ability >>>> to handle different scenarios in the past than it does now is not an >>>> argument of any sort supporting the idea of taking the entity cache >>>> to the >>>> ultimate possible end... which theoretically isn't even that far from >>>> where >>>> it is now. >>>>>> >>>>>> To apply a more useful standard the arguments should be for a _useful_ >>>> objective, which means increasing performance. I guarantee an always >>>> used >>>> find cache will NOT increase performance, it will kill it dead and cause >>>> infinite concurrency headaches in the process. >>>>>> >>>>>> -David >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On 19 Mar 2015, at 10:46, Adrian Crum < >>>> [hidden email]> wrote: >>>>>>> >>>>>>> The translation to English is not good, but I think I understand what >>>> you are saying. >>>>>>> >>>>>>> The entity values in the cache MUST be immutable - because multiple >>>> threads share the values. To do otherwise would require complicated >>>> synchronization code in GenericValue (which would cause blocking and >>>> hurt >>>> performance). >>>>>>> >>>>>>> When I first started working on the entity cache issues, it appeared >>>> to me that mutable entity values may have been in the original design >>>> (to >>>> enable a write-through cache). That is my guess - I am not sure. At some >>>> time, the entity values in the cache were made immutable, but the change >>>> was incomplete - some cached entity values were immutable and others >>>> were >>>> not. That is one of the things I fixed - I made sure ALL entity values >>>> coming from the cache are immutable.
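The immutability rule Adrian describes, and the clone-only-when-modifying pattern he prefers, can be sketched like this. GenericValue is far more involved; this toy class only models the shared-value immutability flag and the clone step, and its names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: values handed out by the entity cache are shared across
// threads, so they must be immutable; a caller that wants to modify one
// clones it first, rather than paying for synchronization in every value.
public class CachedValue {
    private final Map<String, Object> fields = new HashMap<>();
    private final boolean immutable;

    public CachedValue(Map<String, Object> initial, boolean immutable) {
        this.fields.putAll(initial);
        this.immutable = immutable;
    }

    public Object get(String name) {
        return fields.get(name);
    }

    // Shared cached values reject writes instead of synchronizing them.
    public void set(String name, Object value) {
        if (immutable) {
            throw new UnsupportedOperationException(
                    "this value came from the cache; clone it before modifying");
        }
        fields.put(name, value);
    }

    // Clone only when you intend to modify -- cheaper than eagerly cloning
    // every value retrieved from the cache.
    public CachedValue mutableClone() {
        return new CachedValue(fields, false);
    }
}
```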
One way we can eliminate the additional complication of cloning >>>> immutable entity values is to wrap the List in a custom Iterator >>>> implementation that automatically clones elements as they are retrieved >>>> from the List. The drawback is the performance hit - because you >>>> would be >>>> cloning values that might not get modified. I think it is more >>>> efficient to >>>> clone an entity value only when you intend to modify it. >>>>>>> >>>>>>> Adrian Crum >>>>>>> Sandglass Software >>>>>>> www.sandglass-software.com >>>>>>> >>>>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote: >>>>>>>> >>>>>>>> On 18/03/2015 13:16, Adrian Crum wrote: >>>>>>>>> >>>>>>>>> If you code Delegator calls to avoid the cache, then there is no >>>>>>>>> way >>>>>>>>> for a sysadmin to configure the caching behavior - that bit of code >>>>>>>>> will ALWAYS make a database call. >>>>>>>>> >>>>>>>>> If you make all Delegator calls use the cache, then there is an >>>>>>>>> additional complication that will add a bit more code: the >>>>>>>>> GenericValue instances retrieved from the cache are immutable - >>>>>>>>> if you >>>>>>>>> want to modify them, then you will have to clone them. So, this >>>>>>>>> approach can produce an additional line of code. >>>>>>>> >>>>>>>> >>>>>>>> I don't see any logical reason why we need to keep a GenericValue >>>>>>>> that came >>>>>>>> from the cache immutable. In the larger vision, a developer gives >>>>>>>> information >>>>>>>> about caching only when he wants to force cache use during his >>>>>>>> process, >>>>>>>> just as OFBiz manages transactions, timezone, locale, >>>>>>>> auto-matching >>>>>>>> and others by default. >>>>>>>> The entity engine would then work with the sysadmin's cache tuning. >>>>>>>> >>>>>>>> For example, delegator.find("Party", "partyId", partyId) would use the >>>>>>>> default >>>>>>>> parameters from cache.properties, and afterwards the store on a cached >>>>>>>> GenericValue is the delegator's problem.
I see a simple test like >>>>>>>> this: >>>>>>>> if (genericValue came from cache) { >>>>>>>> if (value is already done) { >>>>>>>> getFromDataBase >>>>>>>> update Value >>>>>>>> } >>>>>>>> else refuse (or not, I have a doubt :) ) >>>>>>>> } >>>>>>>> store >>>>>>>> >>>>>>>> >>>>>>>> Nicolas >>>>>> >>>>>> >>>> >>> >> >> > >
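For reference, the cloning-iterator alternative Adrian weighs earlier in his message (wrapping the List in a custom Iterator that clones each element on retrieval) could look roughly like the sketch below. Names are illustrative, not the OFBiz API, and the drawback he notes is visible in next(): every element is cloned, whether or not the caller ever modifies it.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch: hand callers mutable copies of shared cached values by cloning
// on retrieval, trading CPU and garbage for API convenience.
public class CloningIterator<T> implements Iterator<T> {
    private final Iterator<T> backing;
    private final UnaryOperator<T> cloner;

    public CloningIterator(List<T> list, UnaryOperator<T> cloner) {
        this.backing = list.iterator();
        this.cloner = cloner;
    }

    @Override
    public boolean hasNext() {
        return backing.hasNext();
    }

    @Override
    public T next() {
        // Unconditional clone -- the performance cost of this approach.
        return cloner.apply(backing.next());
    }
}
```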