The translation to English is not good, but I think I understand what
you are saying.

The entity values in the cache MUST be immutable, because multiple threads share the values. To do otherwise would require complicated synchronization code in GenericValue (which would cause blocking and hurt performance).

When I first started working on the entity cache issues, it appeared to me that mutable entity values may have been in the original design (to enable a write-through cache). That is my guess - I am not sure. At some point, the entity values in the cache were made immutable, but the change was incomplete - some cached entity values were immutable and others were not. That is one of the things I fixed - I made sure ALL entity values coming from the cache are immutable.

One way we can eliminate the additional complication of cloning immutable entity values is to wrap the List in a custom Iterator implementation that automatically clones elements as they are retrieved from the List. The drawback is the performance hit - you would be cloning values that might never be modified. I think it is more efficient to clone an entity value only when you intend to modify it.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/19/2015 4:19 PM, Nicolas Malin wrote:
> On 18/03/2015 13:16, Adrian Crum wrote:
>> If you code Delegator calls to avoid the cache, then there is no way
>> for a sysadmin to configure the caching behavior - that bit of code
>> will ALWAYS make a database call.
>>
>> If you make all Delegator calls use the cache, then there is an
>> additional complication that will add a bit more code: the
>> GenericValue instances retrieved from the cache are immutable - if you
>> want to modify them, then you will have to clone them. So, this
>> approach can produce an additional line of code.
>
> I don't see any logical reason why we need to keep a GenericValue that
> came from the cache immutable. In the larger picture, a developer should
> only have to indicate cache usage when he wants to force the cache to be
> used during his process.
> As OFBiz manages transactions, timezone, locale, auto-matching and
> others by default, the entity engine should likewise work with sysadmin
> cache tuning.
>
> As an example, delegator.find("Party", "partyId", partyId) would use the
> default parameters from cache.properties, and afterwards the store on a
> cached GenericValue is the delegator's problem. I see a simple test like
> this:
>
> if (genericValue came from cache) {
>     if (value is already done) {
>         getFromDataBase
>         update Value
>     }
>     else refuse (or not, I have a doubt :) )
> }
> store
>
> Nicolas
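Adrian's auto-cloning Iterator idea can be sketched in plain Java. This is a hypothetical illustration, not OFBiz code: the element copier is passed in as a function, since GenericValue cloning specifics are outside this sketch.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch (not OFBiz code): wrap a shared, cached list so that
// every element handed to the caller is a private copy. The trade-off Adrian
// describes is visible here: we pay for a copy even if the caller never
// modifies the element.
final class CloningIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final UnaryOperator<T> copier; // how to clone one element

    CloningIterator(List<T> source, UnaryOperator<T> copier) {
        this.delegate = source.iterator();
        this.copier = copier;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        // The caller may freely mutate the returned copy; the shared
        // cached element is never exposed directly.
        return copier.apply(delegate.next());
    }
}
```

The cost Adrian points out is exactly the `copier.apply` call on every `next()`, whether or not the caller ever modifies the element.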
You're missing a step that actually causes the issue: prior to the rollback
in 5b, some code within the same transaction retrieves the modified row from
the database again, which puts the modified row in the cache and makes the
change visible to other transactions even though it hasn't yet been
committed.

Because of our service-oriented architecture, this scenario isn't uncommon.
An example is updating an OrderHeader's statusId, which can trigger a number
of SECAs, which in turn are likely to retrieve the OrderHeader row after
being passed only the orderId. If a rollback occurred in one of those
services, the modified row would remain in the cache even though the changes
were never committed.

On 20 Mar 2015 00:06, "Adrian Crum" <[hidden email]> wrote:

> Okay, let's assume processes cannot "see" changes made by another
> transaction until that transaction is committed. Here is how the current
> entity cache works:
>
> 1. A Delegator find method is invoked. The Delegator checks the cache, and
> the SQL SELECT result does not exist in the cache.
> 2. The Delegator executes the SQL SELECT and puts the results in the
> entity cache.
> 3. The SQL SELECT results are returned to the calling process.
> 4. The calling process modifies one of the values (rows) in the SQL SELECT
> result (after cloning the immutable entity value).
> 5a. Something goes wrong and the calling process rolls back the
> transaction before the cloned value is persisted.
> 5b. Something goes wrong and the calling process rolls back the
> transaction after the cloned value is persisted and all related caches
> have been cleared.
> 6. Another process performs the same query as #1.
> 7. The second process gets the results from the cache. The values from the
> cache have not changed because the cloned & modified value (in #4) was not
> put in the cache, nor was it written to the data source.
>
> From my perspective, the scenario you described can only happen if another
> process can see changes that are made in the data source before the
> transaction is committed.
>
> From your perspective, the entity cache is somehow inserting invalid
> values when a transaction is rolled back.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/19/2015 10:41 AM, Scott Gray wrote:
>> I'm sorry but I'm not following what you're proposing. Currently row
>> changes caused within a transaction are available only to queries issued
>> within that same transaction (i.e. read committed), except that the cache
>> breaks this isolation by making them immediately available to any
>> transaction querying that entity. I don't see how this scenario exists
>> outside of the cache unless the logic within the transaction explicitly
>> passes a row off to another transaction, and I'm not aware of any cases
>> like that.
>>
>> On Thu, Mar 19, 2015 at 3:17 AM, Adrian Crum <[hidden email]> wrote:
>>
>>> I call it an edge case because it is easily fixed by changing the
>>> transaction isolation level.
>>>
>>> The behavior you describe is not caused by the entity cache, but by the
>>> transaction isolation level. The same scenario would exist without the
>>> entity cache - where two processes hold a reference to the updated row,
>>> and one process performs a rollback.
>>>
>>> Adrian Crum
>>> Sandglass Software
>>> www.sandglass-software.com
>>>
>>> On 3/19/2015 7:28 AM, Scott Gray wrote:
>>>
>>>> Ah, it's quite a large edge case IMO
>>>>
>>>> On Thu, Mar 19, 2015 at 12:20 AM, Adrian Crum <[hidden email]> wrote:
>>>>
>>>>> That is the edge case I mentioned.
>>>>>
>>>>> Adrian Crum
>>>>> Sandglass Software
>>>>> www.sandglass-software.com
>>>>>
>>>>> On 3/19/2015 6:54 AM, Scott Gray wrote:
>>>>>
>>>>>> I tend to disagree with the "cache everything" approach because the
>>>>>> cache isn't transaction aware.
>>>>>> If you:
>>>>>> 1. update a record
>>>>>> 2. select that same record
>>>>>> 3. encounter a transaction rollback
>>>>>>
>>>>>> Then the cache will still contain the changes that were rolled back.
>>>>>>
>>>>>> Regards
>>>>>> Scott
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 5:16 AM, Adrian Crum <[hidden email]> wrote:
>>>>>>
>>>>>>> I would like to share some insights into the entity cache feature,
>>>>>>> some best practices I like to follow, and some related information.
>>>>>>>
>>>>>>> Some OFBiz experts may disagree with some of my views, and that is
>>>>>>> okay. Different experiences with OFBiz will lead to different
>>>>>>> viewpoints.
>>>>>>>
>>>>>>> The OFBiz entity caching feature is intended to improve performance
>>>>>>> by keeping GenericValue instances in memory - decreasing the number
>>>>>>> of calls to the database.
>>>>>>>
>>>>>>> Background
>>>>>>> ----------
>>>>>>>
>>>>>>> Initially, the entity cache was very unreliable due to a number of
>>>>>>> flaws in its design and in the code that calls it (it was guaranteed
>>>>>>> to produce stale data). As a result, I personally avoided using the
>>>>>>> entity cache feature.
>>>>>>>
>>>>>>> Some time ago, Adam Heath did a lot of work on the entity cache.
>>>>>>> After that, Jacopo and I did a lot of work fixing stale data issues
>>>>>>> in the entity cache. Today, the entity cache is much improved and
>>>>>>> unit tests ensure it produces the correct data (except for one edge
>>>>>>> case that Jacopo has identified).
>>>>>>>
>>>>>>> I mention all of this because the previous quirky behavior led to
>>>>>>> some "best practices" that didn't make much sense. A search through
>>>>>>> the OFBiz mail archives will produce a mountain of conflicting and
>>>>>>> confusing information.
>>>>>>>
>>>>>>> Today
>>>>>>> -----
>>>>>>>
>>>>>>> Since the current entity cache is reliable, there is no reason NOT
>>>>>>> to use it. My preference is to make ALL Delegator calls use the
>>>>>>> cache. If all code uses the cache, then individual entities can have
>>>>>>> their caching characteristics configured outside of code. This
>>>>>>> enables sysadmins to fine-tune entity caches for best performance.
>>>>>>>
>>>>>>> [Some experts might disagree with this approach because the entity
>>>>>>> cache will consume all available memory. But the idea is to
>>>>>>> configure the cache so that doesn't happen.]
>>>>>>>
>>>>>>> If you code Delegator calls to avoid the cache, then there is no way
>>>>>>> for a sysadmin to configure the caching behavior - that bit of code
>>>>>>> will ALWAYS make a database call.
>>>>>>>
>>>>>>> If you make all Delegator calls use the cache, then there is an
>>>>>>> additional complication that will add a bit more code: the
>>>>>>> GenericValue instances retrieved from the cache are immutable - if
>>>>>>> you want to modify them, then you will have to clone them. So, this
>>>>>>> approach can produce an additional line of code.
>>>>>>>
>>>>>>> --
>>>>>>> Adrian Crum
>>>>>>> Sandglass Software
>>>>>>> www.sandglass-software.com
I understand. Yes, that could occur.
But I still believe it is an edge case. ;)

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/19/2015 8:37 PM, Scott Gray wrote:
> You're missing a step that actually causes the issue, prior to the rollback
> in 5b some code within the same transaction retrieves the modified row from
> the database again which puts the modified row in the cache and makes the
> change visible to other transactions even though it hasn't yet been
> committed.
> [...]
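The "additional line of code" Adrian mentions - cloning an immutable cached value before modifying it - can be shown with an unmodifiable map standing in for a cached GenericValue. This is a hypothetical stand-in, not the OFBiz API; the entity fields shown are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Stand-in for an immutable cached GenericValue (illustrative, not OFBiz API).
class CloneBeforeModifyDemo {
    // The cache hands out a shared, read-only view; mutating it throws.
    static Map<String, Object> cacheGet() {
        Map<String, Object> row = new HashMap<>();
        row.put("orderId", "WS10000");
        row.put("statusId", "ORDER_CREATED");
        return Collections.unmodifiableMap(row);
    }

    // The one extra line: copy the cached value before changing any field.
    static Map<String, Object> mutableCopy(Map<String, Object> cached) {
        return new HashMap<>(cached);
    }
}
```

The immutable view is what makes sharing across threads safe without synchronization; the copy is private to the transaction that intends to write.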
Isn't this the kind of issue that something like ehcache handles? It seems
to know the difference between a committed transaction and a transaction
that is in progress and might be rolled back.

Certainly a relational database with transaction support is not going to
allow a process to access data from other processes unless the transaction
is completed.

The cache needs to know the difference between private data (incomplete
transactions) and public data (data previously committed and not in the
process of being changed) and prevent others from using private data from
the cache.

On the bright side, an SOA does make this much more of an edge case, at the
expense of moving transaction rollback higher up the application logic.

Ron

On 19/03/2015 4:55 PM, Adrian Crum wrote:
> I understand. Yes, that could occur.
>
> But I still believe it is an edge case. ;)
> [...]

--
Ron Wheeler
President
Artifact Software Inc
email: [hidden email]
skype: ronaldmwheeler
phone: 866-970-2435, ext 102
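Ron's distinction between private (uncommitted) and public (committed) data can be sketched as a minimal staging cache. This is a conceptual illustration of transaction awareness, not ehcache's actual API and not the current OFBiz implementation; all names are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the "private vs public data" idea: writes made inside a
// transaction stay in a private staging map and become visible to other
// readers only on commit; rollback simply discards the staging map.
final class TxAwareCache<K, V> {
    private final Map<K, V> committed = new HashMap<>(); // public data
    private final Map<K, V> staged = new HashMap<>();    // private, in-flight data

    void putInTransaction(K key, V value) {
        staged.put(key, value);
    }

    V get(K key, boolean sameTransaction) {
        if (sameTransaction && staged.containsKey(key)) {
            return staged.get(key);
        }
        return committed.get(key); // other transactions never see staged data
    }

    void commit() {
        committed.putAll(staged);
        staged.clear();
    }

    void rollback() {
        staged.clear(); // rolled-back changes never leak into public data
    }
}
```

A real implementation would track staging per transaction (e.g. keyed by a transaction id) and deal with concurrency; this sketch only shows why Scott's rollback scenario cannot poison the committed view.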
Yes ehcache supports transactions and would ideally be what we use for
caching. I started work on it and there's a branch in svn for it, but I
haven't had time to continue since December. Unfortunately there were a few
incompatible aspects of the existing OFBiz cache API and the ehcache API
which need to be reconciled before it would be possible to run the two
against the same API and compare them.

I'll restate my opinion that I don't think the lack of transactional
awareness in the OFBiz cache is an "edge case". I think if you try to cache
everything you'll soon encounter strange behavior that will be very
difficult to reproduce and debug. My preference is to cache data that is
read often and updated rarely.

On 20 March 2015 at 17:44, Ron Wheeler <[hidden email]> wrote:
> Isn't this the kind of issue that something like ehcache handles?
> It seems to know the difference between a committed transaction and a
> transaction which is in progress and might be rolled back.
> [...]
+1
Jacques Le 20/03/2015 08:46, Scott Gray a écrit : > Yes ehcache supports transactions and would ideally be what we use for > caching. I started work on it and there's a branch in svn for it but I > haven't had time to continue since December. Unfortunately there were a > few incompatible aspects of the existing OFBiz cache API and the ehcache > API which need to be reconciled before it would be possible to run the two > against the same API and compare them. > > I'll restate my opinion that I don't think the lack of transactional > awareness by the OFBiz cache is an "edge case". I think if you try and > cache everything you'll soon encounter strange behavior that will be very > difficult to reproduce and debug. My preference is to cache data that is > read often and updated rarely. > > On 20 March 2015 at 17:44, Ron Wheeler <[hidden email]> > wrote: > >> Isn't this the kind of issue that something like ehcache handles? >> It seems to know the difference between a committed transaction and a >> transaction which is in progress and might be rolled back. >> >> Certainly a relational database with transaction support is not going to >> allow a process to access data from other processes unless the transaction >> is completed. >> The cache needs to know the difference between private data (incomplete >> transactions) and public data (data previously committed and not in the >> process of being changed) and prevent others from using private data from >> the cache. >> >> On the bright side, an SOA does make this much more of an edge case at the >> expense of moving transaction rollback higher up the application logic. >> >> Ron >> >> >> >> On 19/03/2015 4:55 PM, Adrian Crum wrote: >> >>> I understand. Yes, that could occur. >>> >>> But I still believe it is an edge case. 
;) >>> >>> Adrian Crum >>> Sandglass Software >>> www.sandglass-software.com >>> >>> On 3/19/2015 8:37 PM, Scott Gray wrote: >>> >>>> You're missing a step that actually causes the issue, prior to the >>>> rollback >>>> in 5b some code within the same transaction retrieves the modified row >>>> from >>>> the database again which puts the modified row in the cache and makes the >>>> change visible to other transactions even though it hasn't yet been >>>> committed. >>>> >>>> Because of our service oriented architecture this scenario isn't >>>> uncommon. >>>> An example is updating an OrderHeader's statusId which can trigger a >>>> number >>>> of SECAs which in turn are likely to retrieve the OrderHeader row after >>>> being passed only the orderId. If a rollback occurred in one of those >>>> services, the modified row would remain in the cache even though the >>>> changes were never committed. >>>> On 20 Mar 2015 00:06, "Adrian Crum" <[hidden email]> >>>> wrote: >>>> >>>> Okay, let's assume processes cannot "see" changes made by another >>>>> transaction until that transaction is committed. Here is how the current >>>>> entity cache works: >>>>> >>>>> 1. A Delegator find method is invoked. The Delegator checks the cache, >>>>> and >>>>> the SQL SELECT result does not exist in the cache. >>>>> 2. The Delegator executes the SQL SELECT and puts the results in the >>>>> entity cache. >>>>> 3. The SQL SELECT results are returned to the calling process. >>>>> 4. The calling process modifies one of the values (rows) in the SQL >>>>> SELECT >>>>> result (after cloning the immutable entity value). >>>>> 5a. Something goes wrong and the calling process rolls back the >>>>> transaction before the cloned value is persisted. >>>>> 5b. Something goes wrong and the calling process rolls back the >>>>> transaction after the cloned value is persisted and all related caches >>>>> have >>>>> been cleared. >>>>> 6. Another process performs the same query as #1. >>>>> 7. 
The second process gets the results from the cache. The values from >>>>> the >>>>> cache have not changed because the cloned & modified value (in #4) was >>>>> not >>>>> put in the cache, nor was it written to the data source. >>>>> >>>>> From my perspective, the scenario you described can only happen if >>>>> another >>>>> process can see changes that are made in the data source before the >>>>> transaction is committed. >>>>> >>>>> From your perspective, the entity cache is somehow inserting invalid >>>>> values when a transaction is rolled back. >>>>> >>>>> Adrian Crum >>>>> Sandglass Software >>>>> www.sandglass-software.com >>>>> >>>>> On 3/19/2015 10:41 AM, Scott Gray wrote: >>>>> >>>>> I'm sorry but I'm not following what you're proposing. Currently row >>>>>> changes caused within a transaction are available only to queries >>>>>> issued >>>>>> within that same transaction (i.e. read committed), except that the >>>>>> cache >>>>>> breaks this isolation by making them immediately available to any >>>>>> transaction querying that entity. I don't see how this scenario exists >>>>>> outside of the cache unless the logic within the transaction explicitly >>>>>> passes a row off to another transaction, and I'm not aware of any cases >>>>>> like that. >>>>>> >>>>>> On Thu, Mar 19, 2015 at 3:17 AM, Adrian Crum < >>>>>> [hidden email]> wrote: >>>>>> >>>>>> I call it an edge case because it is easily fixed by changing the >>>>>> >>>>>>> transaction isolation level. >>>>>>> >>>>>>> The behavior you describe is not caused by the entity cache, but by >>>>>>> the >>>>>>> transaction isolation level. The same scenario would exist without the >>>>>>> entity cache - where two processes hold a reference to the updated >>>>>>> row, >>>>>>> and >>>>>>> one process performs a rollback. 
I guess "edge" is a subjective term and I was careful to add "more of an" to allow for different perspectives. In the end, a framework that occasionally gives erroneous results is hard to work with unless the causes of those results can be clearly identified and are easy to avoid through workarounds in the application or framework code. If the errors are going to be random and unavoidable (workload dependent), they really need to be fixed. An accounting system that occasionally gives bad results or triggers factory orders based on phantom backlogs is not very good.

It is not clear to me that caching is or should be part of the key competencies of this group. Is there a critical mass of caching expertise in the group to properly maintain a custom caching system, given the other demands on time and resources? It seems to be one of those technologies (databases, containers, UI frameworks, etc.) that can and should be left to external resources if at all possible.

Assuming that the custom caching solution could handle all of the use cases correctly, the key question is how much effort it would take to fix and maintain the current caching system in comparison to moving to ehcache.

Is there any urgency to fixing caching? Is it broken now, or is it required to support the implementation of a new feature?

Ron

On 20/03/2015 5:56 AM, Jacques Le Roux wrote:
> +1
>
> Jacques
>
> Le 20/03/2015 08:46, Scott Gray a écrit :
>> Yes ehcache supports transactions and would ideally be what we use for
>> caching. I started work on it and there's a branch in svn for it but I
>> haven't had time to continue since December. Unfortunately there were a
>> few incompatible aspects of the existing OFBiz cache API and the ehcache
>> API which need to be reconciled before it would be possible to run the two
>> against the same API and compare them.
>>
>> I'll restate my opinion that I don't think the lack of transactional
>> awareness by the OFBiz cache is an "edge case".
--
Ron Wheeler
President
Artifact Software Inc
email: [hidden email]
skype: ronaldmwheeler
phone: 866-970-2435, ext 102
On 19/03/2015 18:46, Adrian Crum wrote:
> The translation to English is not good, but I think I understand what
> you are saying.

Oops, my apologies!

> The entity values in the cache MUST be immutable - because multiple
> threads share the values. To do otherwise would require complicated
> synchronization code in GenericValue (which would cause blocking and
> hurt performance).
>
> When I first starting working on the entity cache issues, it appeared
> to me that mutable entity values may have been in the original design
> (to enable a write-through cache). That is my guess - I am not sure.
> At some time, the entity values in the cache were made immutable, but
> the change was incomplete - some cached entity values were immutable
> and others were not. That is one of the things I fixed - I made sure
> ALL entity values coming from the cache are immutable.
>
> One way we can eliminate the additional complication of cloning
> immutable entity values is to wrap the List in a custom Iterator
> implementation that automatically clones elements as they are
> retrieved from the List. The drawback is the performance hit - because
> you would be cloning values that might not get modified. I think it is
> more efficient to clone an entity value only when you intend to modify
> it.

Right. Another way would be to add one step where the developer prepares the GenericValue for update.
GenericValue party = delegator.find("Party", "partyId", partyId);
party = party.openForUpdate();
party.set("comments", "groovy");
party.store();

On a list:

List<GenericValue> parties = delegator.findList("Party", null, null, null, null);
List<GenericValue> toStore = new ArrayList<>();
for (GenericValue party : parties) {
    if (/* case 1 */) {
        party = party.openForUpdate();
        party.set("comments", "groovy");
        toStore.add(party);
    }
}

With:

GenericValue.openForUpdate() {
    if (this.isMutable()) return this;
    return this.clone();
}

It's just a draft idea to reconcile sysadmins, developers, and performance.

Nicolas

> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>> Le 18/03/2015 13:16, Adrian Crum a écrit :
>>> If you code Delegator calls to avoid the cache, then there is no way
>>> for a sysadmin to configure the caching behavior - that bit of code
>>> will ALWAYS make a database call.
>>>
>>> If you make all Delegator calls use the cache, then there is an
>>> additional complication that will add a bit more code: the
>>> GenericValue instances retrieved from the cache are immutable - if you
>>> want to modify them, then you will have to clone them. So, this
>>> approach can produce an additional line of code.
>>
>> I don't see any logical reason why we need to keep a GenericValue came
>> from cache as immutable. In large vision, a developper give information
>> on cache or not only he want force the cache using during his process.
>> I see a simple test like that:
>> if (genericValue came from cache) {
>>   if (value is already done) {
>>     getFromDataBase
>>     update Value
>>   }
>>   else refuse (or not I have a doubt :) )
>> }
>> store
>>
>> Nicolas
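Adrian's auto-cloning Iterator idea from earlier in the thread could be sketched roughly as follows. This is an illustrative sketch, not actual OFBiz API: the class name is invented, and the `UnaryOperator` "copier" stands in for `GenericValue.clone()`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Wraps a List of shared, immutable cached values in an Iterator that
// copies each element as it is retrieved, so callers always receive a
// private mutable copy instead of the cached instance.
public final class CloningIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final UnaryOperator<T> copier;

    public CloningIterator(List<T> source, UnaryOperator<T> copier) {
        this.delegate = source.iterator();
        this.copier = copier;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        // Every element handed out is a fresh copy; the cached original
        // is never exposed to the caller.
        return copier.apply(delegate.next());
    }
}
```

As Adrian notes, the drawback is that every element gets copied whether or not the caller intends to modify it, which is why cloning only on intent to modify (or Nicolas' openForUpdate step) may be cheaper.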
Stepping back a little, some history and theory of the entity cache might be helpful. The original intent of the entity cache was a simple way to keep frequently used values/records closer to the code that uses them, i.e. in the application server. One real world example of this is the goal to be able to render ecommerce catalog and product pages without hitting the database. Over time the entity caching was made more complex to handle more caching scenarios, but still left to the developer to determine if caching is appropriate for the code they are writing.

In theory, is it possible to write an entity cache that can be used 100% of the time? IMO the answer is NO. This is almost possible for single record caching, with the cache ultimately becoming an in-memory relational database running on the app server (with full transaction support, etc)... but for List caching it totally kills the whole concept. The current entity cache keeps lists of results by the query condition used to get those results, and this is very different from what a database does, and makes things rather messy and inefficient outside simple use cases.

On top of these big functional issues (which are deal killers IMO), there is also the performance issue. The point, or intent at least, of the entity cache is to improve performance. As the cache gets more complex the performance will suffer, and because of the whole concept of caching results by queries the performance will be WORSE than the DB performance for the same queries in most cases. Databases are quite fast and efficient, and we'll never be able to reproduce their ability to scale and search in something like an in-memory entity cache, especially not considering the massive redundancy and overhead of caching lists of values by condition.

As an example of this in the real world: on a large OFBiz project I worked on that finished last year, we went into production with the entity cache turned OFF, completely DISABLED. Why?
When doing load testing, on a whim one of the guys decided to try it without the entity cache enabled, and the body of JMeter tests that exercised a few dozen of the most common user paths through the system actually ran FASTER. The database (MySQL in this case) was hit over the network, but responded quickly enough to make things work quite well for the various find queries, and FAR faster for updates, especially creates. This project was one of the higher volume projects I'm aware of for OFBiz, at peaks handling sustained processing of around 10 orders per second (36,000 per hour), with some short term peaks much higher, closer to 20-30 orders per second... and longer term peaks hitting over 200k orders in one day (North America only, day time, around a 12 hour window).

I found this to be curious so looked into it a bit more, and the main performance culprit was updates, ESPECIALLY creates on any entity that has an active list cache. Auto-clearing that cache requires running the condition for each cache entry on the record to see if it matches, and if it does then it is cleared. This could be made more efficient by expanding the reverse index concept to index all values of fields in conditions... though that would be fairly complex to implement because of the wide variety of conditions that CAN be performed on fields, and even more so when they are combined with other logic... especially NOTs and ORs. This could potentially increase performance, but would again add yet more complexity and overhead.

To turn this dilemma into a nightmare, consider caching view-entities. In general as systems scale, if you ever have to iterate over stuff your performance is going to get hit REALLY hard compared to indexed and other less-than-n operations.

The main lesson from the story: caching, especially list caching, should ONLY be done in limited cases when the ratio of reads to writes is VERY high, and more particularly the ratio of reads to creates.
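The auto-clearing cost described above can be made concrete with a toy model. This is not OFBiz code: the `Predicate` stands in for an entity condition, and the class name is invented. The point is that with results keyed by query condition, every create must evaluate ALL cached conditions against the new record, so creates cost O(number of cached queries).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Toy model of a condition-keyed list cache: cached results are keyed by
// the query condition that produced them, so clearing on create requires
// testing the new record against every cached condition.
final class ConditionKeyedListCache<R> {
    private final Map<Predicate<R>, List<R>> cache = new ConcurrentHashMap<>();

    void put(Predicate<R> condition, List<R> results) {
        cache.put(condition, results);
    }

    List<R> get(Predicate<R> condition) {
        return cache.get(condition);
    }

    // Called on create: any cached list whose condition matches the new
    // record may now be incomplete, so it has to be dropped. This scan is
    // the per-create cost that grows with the number of cached queries.
    void onCreate(R newRecord) {
        cache.entrySet().removeIf(entry -> entry.getKey().test(newRecord));
    }
}
```

This also shows why a reverse index on condition fields would help (skip the full scan), and why NOTs and ORs make that hard: a simple field index cannot tell which arbitrary boolean combinations a new record satisfies.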
When considering whether to use a cache, this should be weighed carefully, because records are sometimes updated from places that developers are unaware of, sometimes at surprising volumes. For example, it might seem great (and help a lot in dev and lower-scale testing) to cache inventory information for viewing on a category screen, but always go to the DB to avoid stale data on a product detail screen and when adding to the cart. The problem is that with high order volumes the inventory data is pretty much constantly being updated, so the caches are constantly... SLOWLY... being cleared as InventoryDetail records are created for reservations and issuances.

To turn this nightmare into a deal killer, consider multiple application servers and the need for either a (SLOW) distributed cache or (SLOW) distributed cache clearing. These have to go over the network anyway, so you might as well go to the database!

In the case above where we decided to NOT use the entity cache at all, the tests showing that disabling the cache was faster were run on one really beefy server. When we ran it on a cluster of just 2 servers with direct DCC (the best-case scenario for a distributed cache), we not only saw a big performance hit, but also got various run-time errors from stale data.

I really don't see how anyone could back the concept of caching all finds by default... you don't even have to imagine edge cases, just consider the problems ALREADY being faced with more limited caching and how often the entity cache simply isn't a good solution.

As for improving the entity caching in OFBiz, there are some concepts in Moqui that might be useful:

1. add a cache attribute to the entity definition with true, false, and never options; true and false being defaults that can be overridden by code, and never being an absolute (OFBiz does have this option IIRC); this would default to false, with true being a useful setting for common things like Enumeration, StatusItem, etc.

2. add general support in the entity engine find methods for a "for update" parameter; if true, don't cache (and pass this on to the DB to lock the record(s) being queried), also making the value mutable

3. a write-through per-transaction cache; you can do some really cool stuff with this, avoiding most database hits during a transaction until the end, when the changes are dumped to the DB; the Moqui implementation of this concept even looks for cached records that any find condition would require to get results and does the query in-memory, not having to go to the database at all... and for other queries it augments the results with values in the cache

The whole concept of a write-through cache that is limited to the scope of a single transaction shows some of the issues you would run into even if trying to make the entity cache transactional. Especially with more complex finds it just falls apart. The current Moqui implementation handles quite a bit, but there are various things that I've run into testing it with real-world business services that are either a REAL pain to handle (so I haven't yet, though it is conceptually possible) or that I simply can't think of any good way to handle... and for those you simply can't use the write-through cache.

There are some notes in the code about this, and some code/comments to more thoroughly communicate this concept, in this class in Moqui:

https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy

I should also say that my motivation to handle every edge case even for this write-through cache is limited... yes, there is room for improvement handling more scenarios, but how big will the performance increase ACTUALLY be for them? The efforts on this so far have been based on profiling results and making sure there is a significant difference (which there is for many services in Mantle Business Artifacts, though I haven't even come close to testing all of them this way).
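The write-through per-transaction cache idea (item 3) can be sketched as follows. This is NOT the real Moqui TransactionCache.groovy implementation - all names here are invented, and a FakeDb stands in for the database so the example is self-contained. It only illustrates the core idea: reads check the transaction-local buffer first, and writes are held in memory and dumped to the database in one batch when the transaction ends.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a write-through, per-transaction cache.
class TransactionCacheSketch {
    // Stand-in for the real database: a map plus a write counter, so the
    // example can show that writes are deferred until commit.
    static class FakeDb {
        final Map<String, String> rows = new HashMap<>();
        int writeCount = 0;
        String find(String key) { return rows.get(key); }
        void write(String key, String value) { rows.put(key, value); writeCount++; }
    }

    private final FakeDb db;
    private final Map<String, String> pendingWrites = new HashMap<>();

    TransactionCacheSketch(FakeDb db) { this.db = db; }

    // Reads see this transaction's uncommitted writes without touching the DB;
    // anything not buffered falls through to the database.
    String find(String key) {
        String buffered = pendingWrites.get(key);
        return buffered != null ? buffered : db.find(key);
    }

    // Creates/updates only touch the in-memory buffer until commit.
    void createOrUpdate(String key, String value) {
        pendingWrites.put(key, value);
    }

    // At commit time, dump all buffered changes to the database at once.
    void commit() {
        pendingWrites.forEach(db::write);
        pendingWrites.clear();
    }
}
```

The hard part, as noted above, is the find side: a real implementation also has to decide which find conditions can be answered (or augmented) from the buffer, which is where complex queries make the approach fall apart.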
The same concept would apply to a read-only entity cache... some things might be possible to support, but they would NOT improve performance, making them a moot point.

I don't know if I've written enough to convince everyone listening that even attempting a universal read-only entity cache is a useless idea... I'm sure some will still like the idea. If anyone gets into it and wants to try it out in their own branch of OFBiz, great... knock yourself out (probably literally...). But PLEASE no one ever commit something like this to the primary branch in the repo... not EVER.

The fact that the OFBiz entity cache had a more limited ability to handle different scenarios in the past than it does now is not an argument of any sort for taking the entity cache to the ultimate possible end... which theoretically isn't even that far from where it is now.

To apply a more useful standard, the arguments should be for a _useful_ objective, which means increasing performance. I guarantee an always-used find cache will NOT increase performance; it will kill it dead and cause infinite concurrency headaches in the process.

-David

> On 19 Mar 2015, at 10:46, Adrian Crum <[hidden email]> wrote:
>
> [...]
>
>> On 18/03/2015 13:16, Adrian Crum wrote:
>>> [...]
>>
>> [...] Just as OFBiz manages transactions, timezones, locales, auto-matching and so on by default, the entity engine should work from the sysadmin's cache tuning.
>>
>> As an example, delegator.find("Party", "partyId", partyId) would use the default parameter from cache.properties, and afterwards a store on a cached GenericValue is the delegator's problem. I see a simple test like this:
>>
>> if (genericValue came from cache) {
>>     if (value is already done) {
>>         getFromDataBase
>>         update Value
>>     }
>>     else refuse (or not, I have a doubt :) )
>> }
>> store
>>
>> Nicolas
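Adrian's point earlier in the thread - that cached values must stay immutable because multiple threads share them, and that it is more efficient to clone a value only when you intend to modify it - can be sketched like this. The names are invented and the real GenericValue API differs; this is only an illustration of the clone-before-modify rule.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an entity value that is frozen when it comes from
// the cache and must be cloned before modification.
class CachedValueSketch {
    private final Map<String, Object> fields;
    private final boolean mutable;

    private CachedValueSketch(Map<String, Object> fields, boolean mutable) {
        this.fields = fields;
        this.mutable = mutable;
    }

    // Values handed out by the cache are immutable, so every thread can share
    // the same instance without any synchronization.
    static CachedValueSketch fromCache(Map<String, Object> fields) {
        return new CachedValueSketch(Collections.unmodifiableMap(new HashMap<>(fields)), false);
    }

    // Clone only when you actually intend to modify; the copy is private to
    // the caller, so the shared cached instance is never touched.
    CachedValueSketch mutableClone() {
        return new CachedValueSketch(new HashMap<>(fields), true);
    }

    void set(String name, Object value) {
        if (!mutable) {
            throw new UnsupportedOperationException(
                "cached value is immutable; call mutableClone() first");
        }
        fields.put(name, value);
    }

    Object get(String name) { return fields.get(name); }
}
```

This is the cheaper alternative to wrapping cached lists in an auto-cloning Iterator: the clone cost is only paid for values that will actually be modified.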
David,
wow, quite an interesting read!

Thank you for sharing some historical design insights into the entity cache, and the valuable findings from using (or not using) the entity cache in a real-life production scenario.

I have not had the time to dig deeper into the proposed entity cache changes, but I had the feeling that this would be quite a challenging change which would have to be very well thought out and would need some thorough testing.

Regards,

Michael

On 20.03.15 at 22:22, David E. Jones wrote:
> [...]
On 20/03/2015 23:37, Michael Brohl wrote:
> David,
>
> wow, quite an interesting read!
>
> [...]

As Scott said, without the ability to compare the two possible implementations (old and new), it's a risky thing. David's feedback proves it.

Jacques

> [...]
Le 20/03/2015 23:41, Jacques Le Roux a écrit : > Le 20/03/2015 23:37, Michael Brohl a écrit : >> David, >> >> wow, quite an interesting read! >> >> Thank you for sharing some historical design insights for the entity cache and the valuable findings using (or not using) the entity cache in a >> real life production scenario. >> >> I had not the time to dig deeper into the proposed entity cache changes but had the feeling that this would be quite a challenging change which >> have to be very well thought-out and needs some thorough testing. > > Like said Scott, without the ability to compare the 2 possible implementations (old and new) it's a risky thing. David's feedback proves it > > Jacques Ha something I forgot to say also, nowadays with SSDs, which are roughly 10 times faster than HDs, things have changed a bit. Jacques > >> >> Regards, >> >> Michael >> >> >> Am 20.03.15 um 22:22 schrieb David E. Jones: >>> Stepping back a little, some history and theory of the entity cache might be helpful. >>> >>> The original intent of the entity cache was a simple way to keep frequently used values/records closer to the code that uses them, ie in the >>> application server. One real world example of this is the goal to be able to render ecommerce catalog and product pages without hitting the database. >>> >>> Over time the entity caching was made more complex to handle more caching scenarios, but still left to the developer to determine if caching is >>> appropriate for the code they are writing. >>> >>> In theory is it possible to write an entity cache that can be used 100% of the time? IMO the answer is NO. This is almost possible for single >>> record caching, with the cache ultimately becoming an in-memory relational database running on the app server (with full transaction support, >>> etc)... but for List caching it totally kills the whole concept. 
The current entity cache keeps lists of results by the query condition used to >>> get those results and this is very different from what a database does, and makes things rather messy and inefficient outside simple use cases. >>> >>> On top of these big functional issues (which are deal killers IMO), there is also the performance issue. The point, or intent at least, of the >>> entity cache is to improve performance. As the cache gets more complex the performance will suffer, and because of the whole concept of caching >>> results by queries the performance will be WORSE than the DB performance for the same queries in most cases. Databases are quite fast and >>> efficient, and we'll never be able to reproduce their ability to scale and search in something like an in-memory entity cache, especially not >>> considering the massive redundancy and overhead of caching lists of values by condition. >>> >>> As an example of this in the real world: on a large OFBiz project I worked on that finished last year we went into production with the entity >>> cache turned OFF, completely DISABLED. Why? When doing load testing on a whim one of the guys decided to try it without the entity cache enabled, >>> and the body of JMeter tests that exercised a few dozen of the most common user paths through the system actually ran FASTER. The database (MySQL >>> in this case) was hit over the network, but responded quickly enough to make things work quite well for the various find queries, and FAR faster >>> for updates, especially creates. This project was one of the higher volume projects I'm aware of for OFBiz, at peaks handling sustained processing >>> of around 10 orders per second (36,000 per hour), with some short term peaks much higher, closer to 20-30 orders per second... and longer term >>> peaks hitting over 200k orders in one day (north America only day time, around a 12 hour window). 
>>> >>> I found this to be curious so looked into it a bit more and the main performance culprit was updates, ESPECIALLY creates on any entity that has an >>> active list cache. Auto-clearing that cache requires running the condition for each cache entry on the record to see if it matches, and if it does >>> then it is cleared. This could be made more efficient by expanding the reverse index concept to index all values of fields in conditions... though >>> that would be fairly complex to implement because of the wide variety of conditions that CAN be performed on fields, and even moreso when they are >>> combined with other logic... especially NOTs and ORs. This could potentially increase performance, but would again add yet more complexity and >>> overhead. >>> >>> To turn this dilemma into a nightmare, consider caching view-entities. In general as systems scale if you ever have to iterate over stuff your >>> performance is going to get hit REALLY hard compared to indexed and other less than n operations. >>> >>> The main lesson from the story: caching, especially list caching, should ONLY be done in limited cases when the ratio of reads to write is VERY >>> high, and more particularly the ratio of reads to creates. When considering whether to use a cache this should be considered carefully, because >>> records are sometimes updated from places that developers are unaware, sometimes at surprising volumes. For example, it might seem great (and help >>> a lot in dev and lower scale testing) to cache inventory information for viewing on a category screen, but always go to the DB to avoid stale data >>> on a product detail screen and when adding to cart. The problem is that with high order volumes the inventory data is pretty much constantly being >>> updated, so the caches are constantly... SLOWLY... being cleared as InventoryDetail records are created for reservations and issuances. 
> To turn this nightmare into a deal killer, consider multiple application servers and the need for either a (SLOW) distributed cache or (SLOW) distributed cache clearing. These have to go over the network anyway, so you might as well go to the database!
>
> In the case above, where we decided NOT to use the entity cache at all, the tests showing that disabling the cache was faster were run on one really beefy server. When we ran it on a cluster of just 2 servers with direct DCC (the best-case scenario for a distributed cache), we not only saw a big performance hit, but also got various run-time errors from stale data.
>
> I really don't see how anyone could back the concept of caching all finds by default... you don't even have to imagine edge cases, just consider the problems ALREADY being faced with more limited caching and how often the entity cache simply isn't a good solution.
>
> As for improving the entity caching in OFBiz, there are some concepts in Moqui that might be useful:
>
> 1. Add a cache attribute to the entity definition with true, false, and never options; true and false being defaults that can be overridden by code, and never being an absolute (OFBiz does have this option IIRC). This would default to false, true being a useful setting for common things like Enumeration, StatusItem, etc.
>
> 2. Add general support in the entity engine find methods for a "for update" parameter; if true, don't cache (and pass this on to the DB to lock the record(s) being queried), also making the value mutable.
>
> 3. A write-through per-transaction cache; you can do some really cool stuff with this, avoiding most database hits during a transaction until the end, when the changes are dumped to the DB. The Moqui implementation of this concept even looks for cached records that any find condition would require to get results and does the query in-memory, not having to go to the database at all...
> and for other queries it augments the results with values from the cache.
>
> The whole concept of a write-through cache that is limited to the scope of a single transaction shows some of the issues you would run into even if trying to make the entity cache transactional. Especially with more complex finds, it just falls apart. The current Moqui implementation handles quite a bit, but there are various things I've run into while testing it with real-world business services that are either a REAL pain to handle (so I haven't yet, but it is conceptually possible) or that I simply can't think of any good way to handle... and for those you simply can't use the write-through cache.
>
> There are some notes in the code for this, and some code/comments to more thoroughly communicate this concept, in this class in Moqui:
>
> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
>
> I should also say that my motivation to handle every edge case even for this write-through cache is limited... yes, there is room for improvement in handling more scenarios, but how big will the performance increase ACTUALLY be for them? The efforts on this so far have been based on profiling results and making sure there is a significant difference (which there is for many services in Mantle Business Artifacts, though I haven't even come close to testing all of them this way).
>
> The same concept would apply to a read-only entity cache... some things might be possible to support, but would NOT improve performance, making them a moot point.
>
> I don't know if I've written enough to convince everyone listening that even attempting a universal read-only entity cache is a useless idea... I'm sure some will still like the idea. If anyone gets into it and wants to try it out in their own branch of OFBiz, great... knock yourself out (probably literally...).
> But PLEASE, no one ever commit something like this to the primary branch in the repo... not EVER.
>
> The idea that the OFBiz entity cache had a more limited ability to handle different scenarios in the past than it does now is not an argument of any sort for taking the entity cache to the ultimate possible end... which theoretically isn't even that far from where it is now.
>
> To apply a more useful standard, the arguments should be for a _useful_ objective, which means increasing performance. I guarantee an always-used find cache will NOT increase performance; it will kill it dead and cause infinite concurrency headaches in the process.
>
> -David
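As a rough illustration of concept #3 above (the write-through per-transaction cache), here is a minimal sketch. It is not Moqui's TransactionCache — that class handles far more, including serving finds from cached data — just the core buffering idea, with invented names (`TxCacheSketch`, `dbRead`, `dbWrite`) and plain maps standing in for entity values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of a write-through cache scoped to one transaction:
// writes are buffered in a tx-local map and reads are served from it, so the
// database is only touched on a read miss and once at commit.
class TxCacheSketch {
    private final Map<String, Map<String, Object>> pending = new LinkedHashMap<>();
    private final Function<String, Map<String, Object>> dbRead;     // pk -> record
    private final BiConsumer<String, Map<String, Object>> dbWrite;  // pk, record

    TxCacheSketch(Function<String, Map<String, Object>> dbRead,
                  BiConsumer<String, Map<String, Object>> dbWrite) {
        this.dbRead = dbRead;
        this.dbWrite = dbWrite;
    }

    // Reads prefer the transaction's own uncommitted writes.
    Map<String, Object> findOne(String pk) {
        Map<String, Object> buffered = pending.get(pk);
        return buffered != null ? buffered : dbRead.apply(pk);
    }

    // Creates and updates never hit the DB until commit.
    void store(String pk, Map<String, Object> record) {
        pending.put(pk, record);
    }

    // At commit, dump all buffered changes to the database in insertion order.
    void commit() {
        pending.forEach(dbWrite);
        pending.clear();
    }
}
```

The hard part David alludes to is not shown here: making condition-based finds see the buffered-but-uncommitted values, which is where the approach starts to fall apart for complex queries.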
In reply to this post by David E. Jones-2
Thanks for the info David! I agree 100% with everything you said.
There may be some misunderstanding about my advice. I suggested that caching should be configured in the settings file; I did not suggest that everything should be cached all the time. Like you said, JMeter tests can reveal what needs to be cached, and a sysadmin can fine-tune performance by tweaking the cache settings. The problem I mentioned is this: a sysadmin can't improve performance by caching a particular entity if a developer has hard-coded it not to be cached.

Btw, I removed the complicated condition checking in the condition cache because it didn't work. Not only was the system spending a lot of time evaluating long lists of values (each value having a potentially long list of conditions), at the end of the evaluation the result was always a cache miss.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/20/2015 9:22 PM, David E. Jones wrote:
> Stepping back a little, some history and theory of the entity cache might be helpful.
>
> The original intent of the entity cache was a simple way to keep frequently used values/records closer to the code that uses them, i.e. in the application server. One real-world example of this is the goal of being able to render ecommerce catalog and product pages without hitting the database.
>
> Over time the entity caching was made more complex to handle more caching scenarios, but it was still left to the developer to determine if caching is appropriate for the code they are writing.
>
> In theory, is it possible to write an entity cache that can be used 100% of the time? IMO the answer is NO. This is almost possible for single-record caching, with the cache ultimately becoming an in-memory relational database running on the app server (with full transaction support, etc.)... but for List caching it totally kills the whole concept.
> [...]
>
>> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>>> As OFBiz manages transactions, timezone, locale, auto-matching and others by default, the entity engine should work with sysadmin cache tuning.
>>>
>>> As an example, delegator.find("Party", "partyId", partyId) would use the default parameter from cache.properties, and afterwards a store on a cached GenericValue is the delegator's problem. I see a simple test like this:
>>>
>>>     if (genericValue came from cache) {
>>>         if (value has already been modified) {
>>>             getFromDataBase
>>>             update Value
>>>         }
>>>         else refuse (or not, I have a doubt :) )
>>>     }
>>>     store
>>>
>>> Nicolas
Hi all,
I just have to say I learned an insane amount of information from this thread! More than half of this stuff should be documented somewhere in the wiki. And to David Jones: respect! Thank you for sharing this information.

Taher Alkhateeb

----- Original Message -----
From: "Adrian Crum" <[hidden email]>
To: [hidden email]
Sent: Saturday, 21 March, 2015 10:39:09 AM
Subject: Re: Entity Caching

Thanks for the info David! I agree 100% with everything you said. [...]
In reply to this post by Adrian Crum-3
> My preference is to make ALL Delegator calls use the cache.
Perhaps I misunderstood the above sentence? I responded because I don't think caching everything is a good idea.

On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> wrote:
> Thanks for the info David! I agree 100% with everything you said.
>
> There may be some misunderstanding about my advice. I suggested that caching should be configured in the settings file; I did not suggest that everything should be cached all the time.
>
> Like you said, JMeter tests can reveal what needs to be cached, and a sysadmin can fine-tune performance by tweaking the cache settings. The problem I mentioned is this: A sysadmin can't improve performance by caching a particular entity if a developer has hard-coded it not to be cached.
>
> Btw, I removed the complicated condition checking in the condition cache because it didn't work. Not only was the system spending a lot of time evaluating long lists of values (each value having a potentially long list of conditions), at the end of the evaluation the result was always a cache miss.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/20/2015 9:22 PM, David E. Jones wrote:
>>
>> Stepping back a little, some history and theory of the entity cache might be helpful.
>>
>> The original intent of the entity cache was a simple way to keep frequently used values/records closer to the code that uses them, i.e. in the application server. One real-world example of this is the goal to be able to render ecommerce catalog and product pages without hitting the database.
>>
>> Over time the entity caching was made more complex to handle more caching scenarios, but it was still left to the developer to determine if caching is appropriate for the code they are writing.
>>
>> In theory, is it possible to write an entity cache that can be used 100% of the time? IMO the answer is NO. This is almost possible for single-record caching, with the cache ultimately becoming an in-memory relational database running on the app server (with full transaction support, etc.)... but for List caching it totally kills the whole concept. The current entity cache keeps lists of results by the query condition used to get those results; this is very different from what a database does, and it makes things rather messy and inefficient outside simple use cases.
>>
>> On top of these big functional issues (which are deal killers IMO), there is also the performance issue. The point, or intent at least, of the entity cache is to improve performance. As the cache gets more complex the performance will suffer, and because of the whole concept of caching results by queries the performance will be WORSE than the DB performance for the same queries in most cases. Databases are quite fast and efficient, and we'll never be able to reproduce their ability to scale and search in something like an in-memory entity cache, especially not considering the massive redundancy and overhead of caching lists of values by condition.
>>
>> As an example of this in the real world: on a large OFBiz project I worked on that finished last year, we went into production with the entity cache turned OFF, completely DISABLED. Why? When doing load testing, on a whim one of the guys decided to try it without the entity cache enabled, and the body of JMeter tests that exercised a few dozen of the most common user paths through the system actually ran FASTER. The database (MySQL in this case) was hit over the network, but responded quickly enough to make things work quite well for the various find queries, and FAR faster for updates, especially creates. This project was one of the higher-volume projects I'm aware of for OFBiz, at peaks handling sustained processing of around 10 orders per second (36,000 per hour), with some short-term peaks much higher, closer to 20-30 orders per second... and longer-term peaks hitting over 200k orders in one day (North America only, daytime, around a 12-hour window).
>>
>> I found this to be curious so I looked into it a bit more, and the main performance culprit was updates, ESPECIALLY creates on any entity that has an active list cache. Auto-clearing that cache requires running the condition for each cache entry on the record to see if it matches, and if it does then it is cleared. This could be made more efficient by expanding the reverse-index concept to index all values of fields in conditions... though that would be fairly complex to implement because of the wide variety of conditions that CAN be performed on fields, and even more so when they are combined with other logic... especially NOTs and ORs. This could potentially increase performance, but would again add yet more complexity and overhead.
>>
>> To turn this dilemma into a nightmare, consider caching view-entities. In general, as systems scale, if you ever have to iterate over stuff your performance is going to get hit REALLY hard compared to indexed and other less-than-n operations.
>>
>> The main lesson from the story: caching, especially list caching, should ONLY be done in limited cases when the ratio of reads to writes is VERY high, and more particularly the ratio of reads to creates. When considering whether to use a cache this should be considered carefully, because records are sometimes updated from places that developers are unaware of, sometimes at surprising volumes. For example, it might seem great (and help a lot in dev and lower-scale testing) to cache inventory information for viewing on a category screen, but always go to the DB to avoid stale data on a product detail screen and when adding to cart. The problem is that with high order volumes the inventory data is pretty much constantly being updated, so the caches are constantly... SLOWLY... being cleared as InventoryDetail records are created for reservations and issuances.
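The auto-clearing cost David describes can be seen in a toy model of a condition-keyed list cache. Everything here is illustrative (real OFBiz conditions are EntityCondition objects, not predicates, and the class names are made up): the point is that every create must scan all cached entries and evaluate each stored condition against the new record.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy condition-keyed list cache illustrating why creates are expensive:
// clearing requires evaluating every cached query's condition against the
// new record - O(number of cached queries) per create, before any reverse
// indexing. Names are hypothetical, not the OFBiz implementation.
class ToyListCache {
    // one entry per cached query: the condition plus its cached result list
    static class Entry {
        final Predicate<Map<String, Object>> condition;
        final List<Map<String, Object>> results;
        Entry(Predicate<Map<String, Object>> c, List<Map<String, Object>> r) {
            this.condition = c;
            this.results = r;
        }
    }

    private final Map<String, Entry> byQueryKey = new HashMap<>();

    void put(String queryKey, Predicate<Map<String, Object>> cond,
             List<Map<String, Object>> results) {
        byQueryKey.put(queryKey, new Entry(cond, results));
    }

    List<Map<String, Object>> get(String queryKey) {
        Entry e = byQueryKey.get(queryKey);
        return e == null ? null : e.results;
    }

    // Called on create: any cached list whose condition matches the new
    // record is now stale and must be dropped - a full scan of the cache.
    void clearOnCreate(Map<String, Object> newRecord) {
        byQueryKey.values().removeIf(e -> e.condition.test(newRecord));
    }
}
```

With many cached queries per entity, `clearOnCreate` runs on every insert, which matches David's observation that creates were the main performance culprit on entities with an active list cache.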
>> To turn this nightmare into a deal killer, consider multiple application servers and the need for either a (SLOW) distributed cache or (SLOW) distributed cache clearing. These have to go over the network anyway, so you might as well go to the database!
>>
>> In the case above where we decided to NOT use the entity cache at all, the tests were run on one really beefy server, showing that disabling the cache was faster. When we ran it in a cluster of just 2 servers with direct DCC (the best-case scenario for a distributed cache) we not only saw a big performance hit, but also got various run-time errors from stale data.
>>
>> I really don't see how anyone could back the concept of caching all finds by default... you don't even have to imagine edge cases, just consider the problems ALREADY being faced with more limited caching and how often the entity cache simply isn't a good solution.
>>
>> As for improving the entity caching in OFBiz, there are some concepts in Moqui that might be useful:
>>
>> 1. add a cache attribute to the entity definition with true, false, and never options; true and false being defaults that can be overridden by code, and never being an absolute (OFBiz does have this option IIRC); this would default to false, true being a useful setting for common things like Enumeration, StatusItem, etc, etc
>>
>> 2. add general support in the entity engine find methods for a "for update" parameter, and if true don't cache (and pass this on to the DB to lock the record(s) being queried), also making the value mutable
>>
>> 3. a write-through per-transaction cache; you can do some really cool stuff with this, avoiding most database hits during a transaction until the end when the changes are dumped to the DB; the Moqui implementation of this concept even looks for cached records that any find condition would require to get results and does the query in-memory, not having to go to the database at all... and for other queries augments the results with values in the cache
>>
>> The whole concept of a write-through cache that is limited to the scope of a single transaction shows some of the issues you would run into even if trying to make the entity cache transactional. Especially with more complex finds it just falls apart. The current Moqui implementation handles quite a bit, but there are various things that I've run into testing it with real-world business services that are either a REAL pain to handle (so I haven't yet, but it is conceptually possible) or that I simply can't think of any good way to handle... and for those you simply can't use the write-through cache.
>>
>> There are some notes in the code for this, and some code/comments to more thoroughly communicate this concept, in this class in Moqui:
>>
>> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
>>
>> I should also say that my motivation to handle every edge case even for this write-through cache is limited... yes, there is room for improvement handling more scenarios, but how big will the performance increase ACTUALLY be for them? The efforts on this so far have been based on profiling results and making sure there is a significant difference (which there is for many services in Mantle Business Artifacts, though I haven't even come close to testing all of them this way).
>>
>> The same concept would apply to a read-only entity cache... some things might be possible to support, but would NOT improve performance, making them a moot point.
>>
>> I don't know if I've written enough to convince everyone listening that even attempting a universal read-only entity cache is a useless idea... I'm sure some will still like the idea. If anyone gets into it and wants to try it out in their own branch of OFBiz, great... knock yourself out (probably literally...). But PLEASE no one ever commit something like this to the primary branch in the repo... not EVER.
>>
>> The whole idea that the OFBiz entity cache has had a more limited ability to handle different scenarios in the past than it does now is not an argument of any sort supporting the idea of taking the entity cache to the ultimate possible end... which theoretically isn't even that far from where it is now.
>>
>> To apply a more useful standard, the arguments should be for a _useful_ objective, which means increasing performance. I guarantee an always-used find cache will NOT increase performance; it will kill it dead and cause infinite concurrency headaches in the process.
>>
>> -David
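David's first suggestion (a per-entity cache attribute with true, false, and never options, where code may override the soft true/false defaults but never is absolute) boils down to a small resolution rule. The enum and method names below are invented for illustration; this is neither Moqui's nor OFBiz's actual API.

```java
// Sketch of suggestion 1: per-entity cache setting with a hard "never"
// and soft true/false defaults that calling code may override.
// Names are hypothetical, for illustration only.
enum EntityCacheSetting { TRUE, FALSE, NEVER }

final class CachePolicy {
    private CachePolicy() {}

    /**
     * Decide whether a find should use the cache.
     *
     * @param entitySetting the entity definition's cache attribute
     * @param codeOverride  what the calling code asked for, or null if it
     *                      left the decision to configuration
     */
    static boolean useCache(EntityCacheSetting entitySetting, Boolean codeOverride) {
        if (entitySetting == EntityCacheSetting.NEVER) {
            return false; // absolute: code cannot force caching
        }
        if (codeOverride != null) {
            return codeOverride; // an explicit request wins over the soft default
        }
        return entitySetting == EntityCacheSetting.TRUE;
    }
}
```

The key design point is the asymmetry: `NEVER` short-circuits before the code override is consulted, which is what makes it safe for entities that must never serve stale data.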
I will try to say it again, but differently.
If I am a developer, I am not aware of the subtleties of caching various entities. Entity cache settings will be determined during staging. So, I write my code as if everything will be cached - leaving the door open for a sysadmin to configure caching during staging.

During staging, a sysadmin can start off with caching disabled, and then switch on caching for various entities while performance tests are being run. After some time, the sysadmin will have cache settings that provide optimal throughput. Does that mean ALL entities are cached? No, only the ones that need to be.

The point I'm trying to make is this: The decision to cache or not should be made by a sysadmin, not by a developer.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/21/2015 10:08 AM, Scott Gray wrote:
>> My preference is to make ALL Delegator calls use the cache.
>
> Perhaps I misunderstood the above sentence? I responded because I don't think caching everything is a good idea
>
> On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> wrote:
>> Thanks for the info David! I agree 100% with everything you said.
>>
>> There may be some misunderstanding about my advice. I suggested that caching should be configured in the settings file; I did not suggest that everything should be cached all the time.
>>
>> Like you said, JMeter tests can reveal what needs to be cached, and a sysadmin can fine-tune performance by tweaking the cache settings. The problem I mentioned is this: A sysadmin can't improve performance by caching a particular entity if a developer has hard-coded it not to be cached.
>>
>> Btw, I removed the complicated condition checking in the condition cache because it didn't work. Not only was the system spending a lot of time evaluating long lists of values (each value having a potentially long list of conditions), at the end of the evaluation the result was always a cache miss.
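Adrian's workflow implies a lookup like the following, where per-entity flags live in a properties file a sysadmin can edit during staging. The key pattern `cache.entity.<name>` and the default of false are assumptions for illustration, not OFBiz's actual cache.properties format.

```java
import java.util.Properties;

// Sketch of sysadmin-controlled caching: code never hard-codes the
// decision; it asks configuration. The key pattern and false default
// are hypothetical, chosen only to illustrate the idea.
final class EntityCacheConfig {
    private final Properties props;

    EntityCacheConfig(Properties props) {
        this.props = props;
    }

    /** True only if the sysadmin explicitly enabled caching for this entity. */
    boolean isCacheEnabled(String entityName) {
        return Boolean.parseBoolean(
                props.getProperty("cache.entity." + entityName, "false"));
    }
}
```

Application code would then issue the same Delegator call everywhere and let this flag steer the cache behavior, which is exactly the door Adrian wants to leave open for tuning during staging.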
> The decision to cache or not should be made by a sysadmin
I agree. This should be a configuration aspect. But I would suggest setting it to false by default.

We have to take into consideration that in testing and staging environments the data sets to work with can be limited in size compared to the production environment, so performance issues might go unnoticed.

Having it set to false by default can also build merit for the system integrator/sysadmin when he switches the caching configuration from false to true to improve performance (perception). With true as the default and working the other way around, I expect that yielding the same result might prove more difficult.

Best regards,

Pierre Smits

*ORRTIZ.COM <http://www.orrtiz.com>*
Services & Solutions for Cloud-Based Manufacturing, Professional Services and Retail & Trade
http://www.orrtiz.com

On Sat, Mar 21, 2015 at 11:22 AM, Adrian Crum <[hidden email]> wrote:
> I will try to say it again, but differently.
>
> If I am a developer, I am not aware of the subtleties of caching various entities. Entity cache settings will be determined during staging. So, I write my code as if everything will be cached - leaving the door open for a sysadmin to configure caching during staging.
>
> During staging, a sysadmin can start off with caching disabled, and then switch on caching for various entities while performance tests are being run. After some time, the sysadmin will have cache settings that provide optimal throughput. Does that mean ALL entities are cached? No, only the ones that need to be.
>
> The point I'm trying to make is this: The decision to cache or not should be made by a sysadmin, not by a developer.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
I agree with Adrian that caching should be a sysadmin choice.
I would also caution that measuring cache performance during testing is not a very useful activity. Testing tends to exercise one use case once and move on to the next, while in production users tend to do the same thing over and over. Testing might fill a shopping cart a few times and perform a lot of other administrative functions just as many times. In real life, shopping carts are filled much more frequently than catalog updates (one hopes). Using performance numbers from functional testing will be misleading.

The other message that I get from David's discussion is that caching built by professional caching experts (database developers, as he mentioned) worked better than caching systems built by application developers. It is likely that ehcache and the databases' built-in caching functions will outperform caching systems built by OFBiz developers, will handle the main cases better, and will handle edge cases properly. They will probably integrate better and be easier to configure at run-time or during deployment. They will also be easier to tune by the system administrator.

I understand that Adrian needs to fix this quickly. I suppose that caching could be eliminated to solve the problem while a better solution is implemented. Do we know what it will take to add enough ehcache to make the system perform adequately to meet current requirements?

Ron

On 21/03/2015 6:22 AM, Adrian Crum wrote:
> I will try to say it again, but differently.
>
> If I am a developer, I am not aware of the subtleties of caching various entities. Entity cache settings will be determined during staging. So, I write my code as if everything will be cached - leaving the door open for a sysadmin to configure caching during staging.
>
> During staging, a sysadmin can start off with caching disabled, and then switch on caching for various entities while performance tests are being run. After some time, the sysadmin will have cache settings that provide optimal throughput.
Does that mean ALL entities are > cached? No, only the ones that need to be. > > The point I'm trying to make is this: The decision to cache or not > should be made by a sysadmin, not by a developer. > > Adrian Crum > Sandglass Software > www.sandglass-software.com > > On 3/21/2015 10:08 AM, Scott Gray wrote: >>> My preference is to make ALL Delegator calls use the cache. >> >> Perhaps I misunderstood the above sentence? I responded because I don't >> think caching everything is a good idea >> >> On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> >> wrote: >>> >>> Thanks for the info David! I agree 100% with everything you said. >>> >>> There may be some misunderstanding about my advice. I suggested that >> caching should be configured in the settings file, I did not suggest >> that >> everything should be cached all the time. >>> >>> Like you said, JMeter tests can reveal what needs to be cached, and a >> sysadmin can fine-tune performance by tweaking the cache settings. The >> problem I mentioned is this: A sysadmin can't improve performance by >> caching a particular entity if a developer has hard-coded it not to be >> cached. >>> >>> Btw, I removed the complicated condition checking in the condition >>> cache >> because it didn't work. Not only was the system spending a lot of time >> evaluating long lists of values (each value having a potentially long >> list >> of conditions), at the end of the evaluation the result was always a >> cache >> miss. >>> >>> >>> >>> Adrian Crum >>> Sandglass Software >>> www.sandglass-software.com >>> >>> On 3/20/2015 9:22 PM, David E. Jones wrote: >>>> >>>> >>>> Stepping back a little, some history and theory of the entity cache >> might be helpful. >>>> >>>> The original intent of the entity cache was a simple way to keep >> frequently used values/records closer to the code that uses them, ie >> in the >> application server. 
One real world example of this is the goal to be >> able >> to render ecommerce catalog and product pages without hitting the >> database. >>>> >>>> Over time the entity caching was made more complex to handle more >> caching scenarios, but still left to the developer to determine if >> caching >> is appropriate for the code they are writing. >>>> >>>> In theory is it possible to write an entity cache that can be used >>>> 100% >> of the time? IMO the answer is NO. This is almost possible for single >> record caching, with the cache ultimately becoming an in-memory >> relational >> database running on the app server (with full transaction support, >> etc)... >> but for List caching it totally kills the whole concept. The current >> entity >> cache keeps lists of results by the query condition used to get those >> results and this is very different from what a database does, and makes >> things rather messy and inefficient outside simple use cases. >>>> >>>> On top of these big functional issues (which are deal killers IMO), >> there is also the performance issue. The point, or intent at least, >> of the >> entity cache is to improve performance. As the cache gets more >> complex the >> performance will suffer, and because of the whole concept of caching >> results by queries the performance will be WORSE than the DB performance >> for the same queries in most cases. Databases are quite fast and >> efficient, >> and we'll never be able to reproduce their ability to scale and >> search in >> something like an in-memory entity cache, especially not considering the >> massive redundancy and overhead of caching lists of values by condition. >>>> >>>> As an example of this in the real world: on a large OFBiz project I >> worked on that finished last year we went into production with the >> entity >> cache turned OFF, completely DISABLED. Why? 
When doing load testing on a >> whim one of the guys decided to try it without the entity cache enabled, >> and the body of JMeter tests that exercised a few dozen of the most >> common >> user paths through the system actually ran FASTER. The database >> (MySQL in >> this case) was hit over the network, but responded quickly enough to >> make >> things work quite well for the various find queries, and FAR faster for >> updates, especially creates. This project was one of the higher volume >> projects I'm aware of for OFBiz, at peaks handling sustained >> processing of >> around 10 orders per second (36,000 per hour), with some short term >> peaks >> much higher, closer to 20-30 orders per second... and longer term peaks >> hitting over 200k orders in one day (north America only day time, >> around a >> 12 hour window). >>>> >>>> I found this to be curious so looked into it a bit more and the main >> performance culprit was updates, ESPECIALLY creates on any entity >> that has >> an active list cache. Auto-clearing that cache requires running the >> condition for each cache entry on the record to see if it matches, >> and if >> it does then it is cleared. This could be made more efficient by >> expanding >> the reverse index concept to index all values of fields in conditions... >> though that would be fairly complex to implement because of the wide >> variety of conditions that CAN be performed on fields, and even >> moreso when >> they are combined with other logic... especially NOTs and ORs. This >> could >> potentially increase performance, but would again add yet more >> complexity >> and overhead. >>>> >>>> To turn this dilemma into a nightmare, consider caching view-entities. >> In general as systems scale if you ever have to iterate over stuff your >> performance is going to get hit REALLY hard compared to indexed and >> other >> less than n operations. 
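The auto-clearing cost described above can be sketched as follows. This is a hedged illustration with hypothetical classes, not the real OFBiz UtilCache/EntityCondition API: list results are keyed by the condition that produced them, so every create must evaluate every cached condition against the new record, which is why creates get expensive on entities with active list caches.

```java
import java.util.*;
import java.util.function.Predicate;

// Minimal sketch (hypothetical classes, not the real OFBiz API) of a
// list cache keyed by query condition, and the clear-on-create pass
// that scans every cached condition.
public class ConditionListCache {
    // Each cache entry: the condition that produced the list, plus the results.
    static final class Entry {
        final Predicate<Map<String, Object>> condition;
        final List<Map<String, Object>> results;
        Entry(Predicate<Map<String, Object>> condition, List<Map<String, Object>> results) {
            this.condition = condition;
            this.results = results;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String conditionKey, Predicate<Map<String, Object>> condition,
             List<Map<String, Object>> results) {
        entries.put(conditionKey, new Entry(condition, results));
    }

    List<Map<String, Object>> get(String conditionKey) {
        Entry e = entries.get(conditionKey);
        return e == null ? null : e.results;
    }

    // Called on every create: O(number of cached conditions) work, since any
    // entry whose condition matches the new record is now stale.
    void clearMatching(Map<String, Object> newRecord) {
        entries.values().removeIf(e -> e.condition.test(newRecord));
    }

    int size() { return entries.size(); }
}
```

The reverse-index idea mentioned above would replace the linear `clearMatching` scan with a lookup from field values to the conditions that reference them, at the cost of indexing every condition's structure (hard for NOTs and ORs).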
>>>> >>>> The main lesson from the story: caching, especially list caching, >>>> should >> ONLY be done in limited cases when the ratio of reads to write is VERY >> high, and more particularly the ratio of reads to creates. When >> considering >> whether to use a cache this should be considered carefully, because >> records >> are sometimes updated from places that developers are unaware, >> sometimes at >> surprising volumes. For example, it might seem great (and help a lot >> in dev >> and lower scale testing) to cache inventory information for viewing on a >> category screen, but always go to the DB to avoid stale data on a >> product >> detail screen and when adding to cart. The problem is that with high >> order >> volumes the inventory data is pretty much constantly being updated, >> so the >> caches are constantly... SLOWLY... being cleared as InventoryDetail >> records >> are created for reservations and issuances. >>>> >>>> To turn this nightmare into a deal killer, consider multiple >>>> application >> servers and the need for either a (SLOW) distributed cache or (SLOW) >> distributed cache clearing. These have to go over the network anyway, so >> might as well go to the database! >>>> >>>> In the case above where we decided to NOT use the entity cache at all >> the tests were run on one really beefy server showing that disabling the >> cache was faster. When we ran it in a cluster of just 2 servers with >> direct >> DCC (the best case scenario for a distributed cache) we not only saw >> a big >> performance hit, but also got various run-time errors from stale data. >>>> >>>> I really don't how anyone could back the concept of caching all >>>> finds by >> default... you don't even have to imagine edge cases, just consider the >> problems ALREADY being faced with more limited caching and how often the >> entity cache simply isn't a good solution. 
>>>> >>>> As for improving the entity caching in OFBiz, there are some >>>> concepts in >> Moqui that might be useful: >>>> >>>> 1. add a cache attribute to the entity definition with true, false, >>>> and >> never options; true and false being defaults that can be overridden by >> code, and never being an absolute (OFBiz does have this option IIRC); >> this >> would default to false, true being a useful setting for common things >> like >> Enumeration, StatusItem, etc, etc >>>> >>>> 2. add general support in the entity engine find methods for a "for >> update" parameter, and if true don't cache (and pass this on to the >> DB to >> lock the record(s) being queried), also making the value mutable >>>> >>>> 3. a write-through per-transaction cache; you can do some really cool >> stuff with this, avoiding most database hits during a transaction >> until the >> end when the changes are dumped to the DB; the Moqui implementation >> of this >> concept even looks for cached records that any find condition would >> require >> to get results and does the query in-memory, not having to go to the >> database at all... and for other queries augments the results with >> values >> in the cache >>>> >>>> The whole concept of a write-through cache that is limited to the >>>> scope >> of a single transaction shows some of the issues you would run into >> even if >> trying to make the entity cache transactional. Especially with more >> complex >> finds it just falls apart. The current Moqui implementation handles >> quite a >> bit, but there are various things that I've run into testing it with >> real-world business services that are either a REAL pain to handle (so I >> haven't yet, but it is conceptually possible) or that I simply can't >> think >> of any good way to handle... and for those you simply can't use the >> write-through cache. 
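The per-transaction write-through cache (concept 3 above) might be sketched roughly like this. The class and method names are hypothetical stand-ins, not Moqui's actual TransactionCache API: writes are buffered for the duration of one transaction and only dumped to the database at commit, while single-record reads check the buffer before the committed state.

```java
import java.util.*;

// Hedged sketch of a per-transaction write-through cache (concept only;
// hypothetical names, not the Moqui implementation). A plain Map stands
// in for the database.
public class TransactionWriteCache {
    private final Map<String, Map<String, Object>> pendingWrites = new LinkedHashMap<>();
    private final Map<String, Map<String, Object>> database; // stand-in for the real DB

    TransactionWriteCache(Map<String, Map<String, Object>> database) {
        this.database = database;
    }

    // create/update inside the transaction: no database hit yet
    void put(String primaryKey, Map<String, Object> value) {
        pendingWrites.put(primaryKey, new HashMap<>(value));
    }

    // read by primary key: pending writes win over committed DB state
    Map<String, Object> findByPk(String primaryKey) {
        Map<String, Object> pending = pendingWrites.get(primaryKey);
        return pending != null ? pending : database.get(primaryKey);
    }

    // commit: dump all buffered changes to the database in one pass
    void commit() {
        database.putAll(pendingWrites);
        pendingWrites.clear();
    }

    // rollback: discard the buffer, the database is untouched
    void rollback() {
        pendingWrites.clear();
    }
}
```

The hard part David describes starts where this sketch stops: condition-based finds must be answered by combining (or replacing) database results with the buffered values, and for complex conditions that augmentation falls apart.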
>>>> >>>> There are some notes in the code for this, and some code/comments to >> more thoroughly communicate this concept, in this class in Moqui: >>>> >>>> >> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy >> >>>> >>>> I should also say that my motivation to handle every edge case even >>>> for >> this write-through cache is limited... yes there is room for improvement >> handling more scenarios, but how big will the performance increase >> ACTUALLY >> be for them? The efforts on this so far have been based on profiling >> results and making sure there is a significant difference (which >> there is >> for many services in Mantle Business Artifacts, though I haven't even >> come >> close to testing all of them this way). >>>> >>>> The same concept would apply to a read-only entity cache... some >>>> things >> might be possible to support, but would NOT improve performance >> making them >> a moot point. >>>> >>>> I don't know if I've written enough to convince everyone listening >>>> that >> even attempting a universal read-only entity cache is a useless >> idea... I'm >> sure some will still like the idea. If anyone gets into it and wants >> to try >> it out in their own branch of OFBiz, great... knock yourself out >> (probably >> literally...). But PLEASE no one ever commit something like this to the >> primary branch in the repo... not EVER. >>>> >>>> The whole idea that the OFBiz entity cache has had more limited >>>> ability >> to handle different scenarios in the past than it does now is not an >> argument of any sort supporting the idea of taking the entity cache >> to the >> ultimate possible end... which theoretically isn't even that far from >> where >> it is now. >>>> >>>> To apply a more useful standard the arguments should be for a _useful_ >> objective, which means increasing performance. 
I guarantee an always >> used >> find cache will NOT increase performance, it will kill it dead and cause >> infinite concurrency headaches in the process. >>>> >>>> -David >>>> >>>> >>>> >>>> >>>>> On 19 Mar 2015, at 10:46, Adrian Crum < >> [hidden email]> wrote: >>>>> >>>>> The translation to English is not good, but I think I understand what >> you are saying. >>>>> >>>>> The entity values in the cache MUST be immutable - because multiple >> threads share the values. To do otherwise would require complicated >> synchronization code in GenericValue (which would cause blocking and >> hurt >> performance). >>>>> >>>>> When I first starting working on the entity cache issues, it appeared >> to me that mutable entity values may have been in the original design >> (to >> enable a write-through cache). That is my guess - I am not sure. At some >> time, the entity values in the cache were made immutable, but the change >> was incomplete - some cached entity values were immutable and others >> were >> not. That is one of the things I fixed - I made sure ALL entity values >> coming from the cache are immutable. >>>>> >>>>> One way we can eliminate the additional complication of cloning >> immutable entity values is to wrap the List in a custom Iterator >> implementation that automatically clones elements as they are retrieved >> from the List. The drawback is the performance hit - because you >> would be >> cloning values that might not get modified. I think it is more >> efficient to >> clone an entity value only when you intend to modify it. >>>>> >>>>> Adrian Crum >>>>> Sandglass Software >>>>> www.sandglass-software.com >>>>> >>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote: >>>>>> >>>>>> Le 18/03/2015 13:16, Adrian Crum a écrit : >>>>>>> >>>>>>> If you code Delegator calls to avoid the cache, then there is no >>>>>>> way >>>>>>> for a sysadmin to configure the caching behavior - that bit of code >>>>>>> will ALWAYS make a database call. 
>>>>>>> >>>>>>> If you make all Delegator calls use the cache, then there is an >>>>>>> additional complication that will add a bit more code: the >>>>>>> GenericValue instances retrieved from the cache are immutable - >>>>>>> if you >>>>>>> want to modify them, then you will have to clone them. So, this >>>>>>> approach can produce an additional line of code. >>>>>> >>>>>> >>>>>> I don't see any logical reason why we need to keep a GenericValue >>>>>> came >>>>>> from cache as immutable. In large vision, a developper give >>>>>> information >>>>>> on cache or not only he want force the cache using during his >>>>>> process. >>>>>> As OFBiz manage by default transaction, timezone, locale, >>>>>> auto-matching >>>>>> or others. >>>>>> The entity engine would be works with admin sys cache tuning. >>>>>> >>>>>> As example delegator.find("Party", "partyId", partyId) use the >>>>>> default >>>>>> parameter from cache.properties and after the store on a cached >>>>>> GenericValue is a delegator's problem. I see a simple test like >>>>>> that : >>>>>> if (genericValue came from cache) { >>>>>> if (value is already done) { >>>>>> getFromDataBase >>>>>> update Value >>>>>> } >>>>>> else refuse (or not I have a doubt :) ) >>>>>> } >>>>>> store >>>>>> >>>>>> >>>>>> Nicolas >>>> >>>> >> > -- Ron Wheeler President Artifact Software Inc email: [hidden email] skype: ronaldmwheeler phone: 866-970-2435, ext 102 |
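Adrian's immutability rule, and the extra clone step it imposes on callers, could look roughly like this sketch. The `CachedValue` class is a hypothetical stand-in for GenericValue, not the real OFBiz API: the cache locks a value read-only before sharing it across threads, and a caller who intends to modify it pays for a clone.

```java
import java.util.*;

// Sketch of the clone-before-modify rule for cached entity values
// (hypothetical stand-in class, not the real GenericValue). Instances
// handed out by the cache are shared across threads, so they are made
// immutable; mutation requires an explicit clone.
public class CachedValue {
    private final Map<String, Object> fields = new HashMap<>();
    private boolean mutable = true;

    Object get(String name) { return fields.get(name); }

    void set(String name, Object value) {
        if (!mutable) {
            throw new UnsupportedOperationException(
                    "This value came from the cache; clone it before modifying");
        }
        fields.put(name, value);
    }

    // called by the cache before handing the shared instance to callers
    void setImmutable() { this.mutable = false; }

    // the "additional line of code" callers pay when they do intend to modify
    CachedValue cloneValue() {
        CachedValue copy = new CachedValue();
        copy.fields.putAll(this.fields);
        return copy; // the copy is mutable; the shared original is untouched
    }
}
```

This also shows why cloning only on intent to modify beats an iterator that clones every element: the clone cost is paid once per value actually changed, not once per value read.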
Is there a convenient setting for disabling cache completely as David
mentioned he did? On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote: > I agree with Adrian that caching should be a sysadmin choice. > > I would also caution that measuring cache performance during testing is > not a very useful activity. Testing tends to test one use case once and > move on to the next. > In production, users tend to do the same thing over and over. > Testing might fill a shopping cart a few times and do a lot of other > administrative functions as many times . In real life, shopping carts > are filled much more frequently than catalog updates (one hopes). Using > performance numbers from functional testing will be misleading. > > The other message that I get from David's discussion is that caching t > built by professional caching experts (Database developers as he > mentioned) worked better than caching systems built by application > developers. > It is likely that ehcache and the database built-in caching functions > will outperform caching systems built by OFBiz developers and will > handle the main cases better and will handle edge cases properly. They > will probably integrate better and be easier to configure at run-time or > during deployment. They will also be easier to tune by the system > administrator. > > I understand that Adrian needs to fix this quickly. I suppose that > caching could be eliminated to solve the problem while a better solution > is implemented. > > Do we know what it will take to add enough ehcache to make the system > perform adequately to meet current requirements? > > Ron > > > On 21/03/2015 6:22 AM, Adrian Crum wrote: > > I will try to say it again, but differently. > > > > If I am a developer, I am not aware of the subtleties of caching > > various entities. Entity cache settings will be determined during > > staging. So, I write my code as if everything will be cached - leaving > > the door open for a sysadmin to configure caching during staging. 
> > > > During staging, a sysadmin can start off with caching disabled, and > > then switch on caching for various entities while performance tests > > are being run. After some time, the sysadmin will have cache settings > > that provide optimal throughput. Does that mean ALL entities are > > cached? No, only the ones that need to be. > > > > The point I'm trying to make is this: The decision to cache or not > > should be made by a sysadmin, not by a developer. > > > > Adrian Crum > > Sandglass Software > > www.sandglass-software.com > > > > On 3/21/2015 10:08 AM, Scott Gray wrote: > >>> My preference is to make ALL Delegator calls use the cache. > >> > >> Perhaps I misunderstood the above sentence? I responded because I don't > >> think caching everything is a good idea > >> > >> On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> > >> wrote: > >>> > >>> Thanks for the info David! I agree 100% with everything you said. > >>> > >>> There may be some misunderstanding about my advice. I suggested that > >> caching should be configured in the settings file, I did not suggest > >> that > >> everything should be cached all the time. > >>> > >>> Like you said, JMeter tests can reveal what needs to be cached, and a > >> sysadmin can fine-tune performance by tweaking the cache settings. The > >> problem I mentioned is this: A sysadmin can't improve performance by > >> caching a particular entity if a developer has hard-coded it not to be > >> cached. > >>> > >>> Btw, I removed the complicated condition checking in the condition > >>> cache > >> because it didn't work. Not only was the system spending a lot of time > >> evaluating long lists of values (each value having a potentially long > >> list > >> of conditions), at the end of the evaluation the result was always a > >> cache > >> miss. > >>> > >>> > >>> > >>> Adrian Crum > >>> Sandglass Software > >>> www.sandglass-software.com > >>> > >>> On 3/20/2015 9:22 PM, David E. 
Jones wrote: > >>>> > >>>> > >>>> Stepping back a little, some history and theory of the entity cache > >> might be helpful. > >>>> > >>>> The original intent of the entity cache was a simple way to keep > >> frequently used values/records closer to the code that uses them, ie > >> in the > >> application server. One real world example of this is the goal to be > >> able > >> to render ecommerce catalog and product pages without hitting the > >> database. > >>>> > >>>> Over time the entity caching was made more complex to handle more > >> caching scenarios, but still left to the developer to determine if > >> caching > >> is appropriate for the code they are writing. > >>>> > >>>> In theory is it possible to write an entity cache that can be used > >>>> 100% > >> of the time? IMO the answer is NO. This is almost possible for single > >> record caching, with the cache ultimately becoming an in-memory > >> relational > >> database running on the app server (with full transaction support, > >> etc)... > >> but for List caching it totally kills the whole concept. The current > >> entity > >> cache keeps lists of results by the query condition used to get those > >> results and this is very different from what a database does, and makes > >> things rather messy and inefficient outside simple use cases. > >>>> > >>>> On top of these big functional issues (which are deal killers IMO), > >> there is also the performance issue. The point, or intent at least, > >> of the > >> entity cache is to improve performance. As the cache gets more > >> complex the > >> performance will suffer, and because of the whole concept of caching > >> results by queries the performance will be WORSE than the DB performance > >> for the same queries in most cases. 
Databases are quite fast and > >> efficient, > >> and we'll never be able to reproduce their ability to scale and > >> search in > >> something like an in-memory entity cache, especially not considering the > >> massive redundancy and overhead of caching lists of values by condition. > >>>> > >>>> As an example of this in the real world: on a large OFBiz project I > >> worked on that finished last year we went into production with the > >> entity > >> cache turned OFF, completely DISABLED. Why? When doing load testing on a > >> whim one of the guys decided to try it without the entity cache enabled, > >> and the body of JMeter tests that exercised a few dozen of the most > >> common > >> user paths through the system actually ran FASTER. The database > >> (MySQL in > >> this case) was hit over the network, but responded quickly enough to > >> make > >> things work quite well for the various find queries, and FAR faster for > >> updates, especially creates. This project was one of the higher volume > >> projects I'm aware of for OFBiz, at peaks handling sustained > >> processing of > >> around 10 orders per second (36,000 per hour), with some short term > >> peaks > >> much higher, closer to 20-30 orders per second... and longer term peaks > >> hitting over 200k orders in one day (north America only day time, > >> around a > >> 12 hour window). > >>>> > >>>> I found this to be curious so looked into it a bit more and the main > >> performance culprit was updates, ESPECIALLY creates on any entity > >> that has > >> an active list cache. Auto-clearing that cache requires running the > >> condition for each cache entry on the record to see if it matches, > >> and if > >> it does then it is cleared. This could be made more efficient by > >> expanding > >> the reverse index concept to index all values of fields in conditions... 
> >> though that would be fairly complex to implement because of the wide > >> variety of conditions that CAN be performed on fields, and even > >> moreso when > >> they are combined with other logic... especially NOTs and ORs. This > >> could > >> potentially increase performance, but would again add yet more > >> complexity > >> and overhead. > >>>> > >>>> To turn this dilemma into a nightmare, consider caching view-entities. > >> In general as systems scale if you ever have to iterate over stuff your > >> performance is going to get hit REALLY hard compared to indexed and > >> other > >> less than n operations. > >>>> > >>>> The main lesson from the story: caching, especially list caching, > >>>> should > >> ONLY be done in limited cases when the ratio of reads to write is VERY > >> high, and more particularly the ratio of reads to creates. When > >> considering > >> whether to use a cache this should be considered carefully, because > >> records > >> are sometimes updated from places that developers are unaware, > >> sometimes at > >> surprising volumes. For example, it might seem great (and help a lot > >> in dev > >> and lower scale testing) to cache inventory information for viewing on a > >> category screen, but always go to the DB to avoid stale data on a > >> product > >> detail screen and when adding to cart. The problem is that with high > >> order > >> volumes the inventory data is pretty much constantly being updated, > >> so the > >> caches are constantly... SLOWLY... being cleared as InventoryDetail > >> records > >> are created for reservations and issuances. > >>>> > >>>> To turn this nightmare into a deal killer, consider multiple > >>>> application > >> servers and the need for either a (SLOW) distributed cache or (SLOW) > >> distributed cache clearing. These have to go over the network anyway, so > >> might as well go to the database! 
> >>>> > >>>> In the case above where we decided to NOT use the entity cache at all > >> the tests were run on one really beefy server showing that disabling the > >> cache was faster. When we ran it in a cluster of just 2 servers with > >> direct > >> DCC (the best case scenario for a distributed cache) we not only saw > >> a big > >> performance hit, but also got various run-time errors from stale data. > >>>> > >>>> I really don't how anyone could back the concept of caching all > >>>> finds by > >> default... you don't even have to imagine edge cases, just consider the > >> problems ALREADY being faced with more limited caching and how often the > >> entity cache simply isn't a good solution. > >>>> > >>>> As for improving the entity caching in OFBiz, there are some > >>>> concepts in > >> Moqui that might be useful: > >>>> > >>>> 1. add a cache attribute to the entity definition with true, false, > >>>> and > >> never options; true and false being defaults that can be overridden by > >> code, and never being an absolute (OFBiz does have this option IIRC); > >> this > >> would default to false, true being a useful setting for common things > >> like > >> Enumeration, StatusItem, etc, etc > >>>> > >>>> 2. add general support in the entity engine find methods for a "for > >> update" parameter, and if true don't cache (and pass this on to the > >> DB to > >> lock the record(s) being queried), also making the value mutable > >>>> > >>>> 3. a write-through per-transaction cache; you can do some really cool > >> stuff with this, avoiding most database hits during a transaction > >> until the > >> end when the changes are dumped to the DB; the Moqui implementation > >> of this > >> concept even looks for cached records that any find condition would > >> require > >> to get results and does the query in-memory, not having to go to the > >> database at all... 
and for other queries augments the results with > >> values > >> in the cache > >>>> > >>>> The whole concept of a write-through cache that is limited to the > >>>> scope > >> of a single transaction shows some of the issues you would run into > >> even if > >> trying to make the entity cache transactional. Especially with more > >> complex > >> finds it just falls apart. The current Moqui implementation handles > >> quite a > >> bit, but there are various things that I've run into testing it with > >> real-world business services that are either a REAL pain to handle (so I > >> haven't yet, but it is conceptually possible) or that I simply can't > >> think > >> of any good way to handle... and for those you simply can't use the > >> write-through cache. > >>>> > >>>> There are some notes in the code for this, and some code/comments to > >> more thoroughly communicate this concept, in this class in Moqui: > >>>> > >>>> > >> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy > >> > >>>> > >>>> I should also say that my motivation to handle every edge case even > >>>> for > >> this write-through cache is limited... yes there is room for improvement > >> handling more scenarios, but how big will the performance increase > >> ACTUALLY > >> be for them? The efforts on this so far have been based on profiling > >> results and making sure there is a significant difference (which > >> there is > >> for many services in Mantle Business Artifacts, though I haven't even > >> come > >> close to testing all of them this way). > >>>> > >>>> The same concept would apply to a read-only entity cache... some > >>>> things > >> might be possible to support, but would NOT improve performance > >> making them > >> a moot point. > >>>> > >>>> I don't know if I've written enough to convince everyone listening > >>>> that > >> even attempting a universal read-only entity cache is a useless > >> idea... 
I'm > >> sure some will still like the idea. If anyone gets into it and wants > >> to try > >> it out in their own branch of OFBiz, great... knock yourself out > >> (probably > >> literally...). But PLEASE no one ever commit something like this to the > >> primary branch in the repo... not EVER. > >>>> > >>>> The whole idea that the OFBiz entity cache has had more limited > >>>> ability > >> to handle different scenarios in the past than it does now is not an > >> argument of any sort supporting the idea of taking the entity cache > >> to the > >> ultimate possible end... which theoretically isn't even that far from > >> where > >> it is now. > >>>> > >>>> To apply a more useful standard the arguments should be for a _useful_ > >> objective, which means increasing performance. I guarantee an always > >> used > >> find cache will NOT increase performance, it will kill it dead and cause > >> infinite concurrency headaches in the process. > >>>> > >>>> -David > >>>> > >>>> > >>>> > >>>> > >>>>> On 19 Mar 2015, at 10:46, Adrian Crum < > >> [hidden email]> wrote: > >>>>> > >>>>> The translation to English is not good, but I think I understand what > >> you are saying. > >>>>> > >>>>> The entity values in the cache MUST be immutable - because multiple > >> threads share the values. To do otherwise would require complicated > >> synchronization code in GenericValue (which would cause blocking and > >> hurt > >> performance). > >>>>> > >>>>> When I first starting working on the entity cache issues, it appeared > >> to me that mutable entity values may have been in the original design > >> (to > >> enable a write-through cache). That is my guess - I am not sure. At some > >> time, the entity values in the cache were made immutable, but the change > >> was incomplete - some cached entity values were immutable and others > >> were > >> not. That is one of the things I fixed - I made sure ALL entity values > >> coming from the cache are immutable. 
> >>>>> > >>>>> One way we can eliminate the additional complication of cloning > >> immutable entity values is to wrap the List in a custom Iterator > >> implementation that automatically clones elements as they are retrieved > >> from the List. The drawback is the performance hit - because you > >> would be > >> cloning values that might not get modified. I think it is more > >> efficient to > >> clone an entity value only when you intend to modify it. > >>>>> > >>>>> Adrian Crum > >>>>> Sandglass Software > >>>>> www.sandglass-software.com > >>>>> > >>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote: > >>>>>> > >>>>>> Le 18/03/2015 13:16, Adrian Crum a écrit : > >>>>>>> > >>>>>>> If you code Delegator calls to avoid the cache, then there is no > >>>>>>> way > >>>>>>> for a sysadmin to configure the caching behavior - that bit of code > >>>>>>> will ALWAYS make a database call. > >>>>>>> > >>>>>>> If you make all Delegator calls use the cache, then there is an > >>>>>>> additional complication that will add a bit more code: the > >>>>>>> GenericValue instances retrieved from the cache are immutable - > >>>>>>> if you > >>>>>>> want to modify them, then you will have to clone them. So, this > >>>>>>> approach can produce an additional line of code. > >>>>>> > >>>>>> > >>>>>> I don't see any logical reason why we need to keep a GenericValue > >>>>>> came > >>>>>> from cache as immutable. In large vision, a developper give > >>>>>> information > >>>>>> on cache or not only he want force the cache using during his > >>>>>> process. > >>>>>> As OFBiz manage by default transaction, timezone, locale, > >>>>>> auto-matching > >>>>>> or others. > >>>>>> The entity engine would be works with admin sys cache tuning. > >>>>>> > >>>>>> As example delegator.find("Party", "partyId", partyId) use the > >>>>>> default > >>>>>> parameter from cache.properties and after the store on a cached > >>>>>> GenericValue is a delegator's problem. 
I see a simple test like > >>>>>> that : > >>>>>> if (genericValue came from cache) { > >>>>>> if (value is already done) { > >>>>>> getFromDataBase > >>>>>> update Value > >>>>>> } > >>>>>> else refuse (or not I have a doubt :) ) > >>>>>> } > >>>>>> store > >>>>>> > >>>>>> > >>>>>> Nicolas > >>>> > >>>> > >> > > > > |
I don't see an enable/disable setting but
default.maxSize=0 in cache.properties should do it. Adrian Crum Sandglass Software www.sandglass-software.com On 3/22/2015 3:16 AM, Christian Carlow wrote: > Is there a convenient setting for disabling cache completely as David > mentioned he did? > > On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote: >> I agree with Adrian that caching should be a sysadmin choice. >> >> I would also caution that measuring cache performance during testing is >> not a very useful activity. Testing tends to test one use case once and >> move on to the next. >> In production, users tend to do the same thing over and over. >> Testing might fill a shopping cart a few times and do a lot of other >> administrative functions as many times . In real life, shopping carts >> are filled much more frequently than catalog updates (one hopes). Using >> performance numbers from functional testing will be misleading. >> >> The other message that I get from David's discussion is that caching t >> built by professional caching experts (Database developers as he >> mentioned) worked better than caching systems built by application >> developers. >> It is likely that ehcache and the database built-in caching functions >> will outperform caching systems built by OFBiz developers and will >> handle the main cases better and will handle edge cases properly. They >> will probably integrate better and be easier to configure at run-time or >> during deployment. They will also be easier to tune by the system >> administrator. >> >> I understand that Adrian needs to fix this quickly. I suppose that >> caching could be eliminated to solve the problem while a better solution >> is implemented. >> >> Do we know what it will take to add enough ehcache to make the system >> perform adequately to meet current requirements? >> >> Ron >> >> >> On 21/03/2015 6:22 AM, Adrian Crum wrote: >>> I will try to say it again, but differently. 
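Adrian's `default.maxSize=0` tip could sit in `cache.properties` roughly as below. Only the `default.maxSize` line comes from this thread; the per-cache override key is hypothetical and meant only to illustrate the staged re-enable workflow he describes (check the actual cache names in the Webtools cache maintenance page before using any override).

```properties
# Start staging with everything effectively disabled: a maxSize of 0
# means no entries are ever retained (from the tip above).
default.maxSize=0

# Hypothetical per-cache override a sysadmin might add once performance
# tests show a specific cache is worth enabling. The key name here is
# illustrative only; real cache names must be taken from the running system.
entitycache.example.StatusItem.maxSize=1000
```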
>>> >>> If I am a developer, I am not aware of the subtleties of caching >>> various entities. Entity cache settings will be determined during >>> staging. So, I write my code as if everything will be cached - leaving >>> the door open for a sysadmin to configure caching during staging. >>> >>> During staging, a sysadmin can start off with caching disabled, and >>> then switch on caching for various entities while performance tests >>> are being run. After some time, the sysadmin will have cache settings >>> that provide optimal throughput. Does that mean ALL entities are >>> cached? No, only the ones that need to be. >>> >>> The point I'm trying to make is this: The decision to cache or not >>> should be made by a sysadmin, not by a developer. >>> >>> Adrian Crum >>> Sandglass Software >>> www.sandglass-software.com >>> >>> On 3/21/2015 10:08 AM, Scott Gray wrote: >>>>> My preference is to make ALL Delegator calls use the cache. >>>> >>>> Perhaps I misunderstood the above sentence? I responded because I don't >>>> think caching everything is a good idea >>>> >>>> On 21 Mar 2015 20:41, "Adrian Crum" <[hidden email]> >>>> wrote: >>>>> >>>>> Thanks for the info David! I agree 100% with everything you said. >>>>> >>>>> There may be some misunderstanding about my advice. I suggested that >>>> caching should be configured in the settings file, I did not suggest >>>> that >>>> everything should be cached all the time. >>>>> >>>>> Like you said, JMeter tests can reveal what needs to be cached, and a >>>> sysadmin can fine-tune performance by tweaking the cache settings. The >>>> problem I mentioned is this: A sysadmin can't improve performance by >>>> caching a particular entity if a developer has hard-coded it not to be >>>> cached. >>>>> >>>>> Btw, I removed the complicated condition checking in the condition >>>>> cache >>>> because it didn't work. 
Not only was the system spending a lot of time >>>> evaluating long lists of values (each value having a potentially long >>>> list >>>> of conditions), at the end of the evaluation the result was always a >>>> cache >>>> miss. >>>>> >>>>> >>>>> >>>>> Adrian Crum >>>>> Sandglass Software >>>>> www.sandglass-software.com >>>>> >>>>> On 3/20/2015 9:22 PM, David E. Jones wrote: >>>>>> >>>>>> >>>>>> Stepping back a little, some history and theory of the entity cache >>>> might be helpful. >>>>>> >>>>>> The original intent of the entity cache was a simple way to keep >>>> frequently used values/records closer to the code that uses them, ie >>>> in the >>>> application server. One real world example of this is the goal to be >>>> able >>>> to render ecommerce catalog and product pages without hitting the >>>> database. >>>>>> >>>>>> Over time the entity caching was made more complex to handle more >>>> caching scenarios, but still left to the developer to determine if >>>> caching >>>> is appropriate for the code they are writing. >>>>>> >>>>>> In theory is it possible to write an entity cache that can be used >>>>>> 100% >>>> of the time? IMO the answer is NO. This is almost possible for single >>>> record caching, with the cache ultimately becoming an in-memory >>>> relational >>>> database running on the app server (with full transaction support, >>>> etc)... >>>> but for List caching it totally kills the whole concept. The current >>>> entity >>>> cache keeps lists of results by the query condition used to get those >>>> results and this is very different from what a database does, and makes >>>> things rather messy and inefficient outside simple use cases. >>>>>> >>>>>> On top of these big functional issues (which are deal killers IMO), >>>> there is also the performance issue. The point, or intent at least, >>>> of the >>>> entity cache is to improve performance. 
As the cache gets more >>>> complex the >>>> performance will suffer, and because of the whole concept of caching >>>> results by queries the performance will be WORSE than the DB performance >>>> for the same queries in most cases. Databases are quite fast and >>>> efficient, >>>> and we'll never be able to reproduce their ability to scale and >>>> search in >>>> something like an in-memory entity cache, especially not considering the >>>> massive redundancy and overhead of caching lists of values by condition. >>>>>> >>>>>> As an example of this in the real world: on a large OFBiz project I >>>> worked on that finished last year we went into production with the >>>> entity >>>> cache turned OFF, completely DISABLED. Why? When doing load testing on a >>>> whim one of the guys decided to try it without the entity cache enabled, >>>> and the body of JMeter tests that exercised a few dozen of the most >>>> common >>>> user paths through the system actually ran FASTER. The database >>>> (MySQL in >>>> this case) was hit over the network, but responded quickly enough to >>>> make >>>> things work quite well for the various find queries, and FAR faster for >>>> updates, especially creates. This project was one of the higher volume >>>> projects I'm aware of for OFBiz, at peaks handling sustained >>>> processing of >>>> around 10 orders per second (36,000 per hour), with some short term >>>> peaks >>>> much higher, closer to 20-30 orders per second... and longer term peaks >>>> hitting over 200k orders in one day (North America only day time, >>>> around a >>>> 12 hour window). >>>>>> >>>>>> I found this to be curious so looked into it a bit more and the main >>>> performance culprit was updates, ESPECIALLY creates on any entity >>>> that has >>>> an active list cache. Auto-clearing that cache requires running the >>>> condition for each cache entry on the record to see if it matches, >>>> and if >>>> it does then it is cleared.
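The auto-clearing cost David describes can be sketched as follows. This is a hypothetical toy, not the OFBiz ConditionCache: a list cache keyed by query condition, where every create must evaluate EVERY cached condition against the new record to find stale entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch of a condition-keyed list cache. Reads are cheap, but
// each write pays an O(cached-conditions) scan, each step running an
// arbitrary condition against the new record.
public class ConditionListCache {
    private final Map<Predicate<Map<String, Object>>, List<Map<String, Object>>> cache =
            new ConcurrentHashMap<>();

    public void put(Predicate<Map<String, Object>> condition,
                    List<Map<String, Object>> results) {
        cache.put(condition, new ArrayList<>(results));
    }

    public List<Map<String, Object>> get(Predicate<Map<String, Object>> condition) {
        return cache.get(condition);
    }

    // Called on every create: this full scan is why creates on entities with
    // an active list cache were the main performance culprit.
    public int clearMatching(Map<String, Object> newRecord) {
        int cleared = 0;
        for (Predicate<Map<String, Object>> condition : cache.keySet()) {
            if (condition.test(newRecord)) {
                cache.remove(condition);
                cleared++;
            }
        }
        return cleared;
    }
}
```

A reverse index on condition field values, as suggested later in the thread, would avoid the full scan, at the cost of handling NOTs, ORs, and the rest of the condition algebra.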
This could be made more efficient by >>>> expanding >>>> the reverse index concept to index all values of fields in conditions... >>>> though that would be fairly complex to implement because of the wide >>>> variety of conditions that CAN be performed on fields, and even >>>> more so when >>>> they are combined with other logic... especially NOTs and ORs. This >>>> could >>>> potentially increase performance, but would again add yet more >>>> complexity >>>> and overhead. >>>>>> >>>>>> To turn this dilemma into a nightmare, consider caching view-entities. >>>> In general as systems scale if you ever have to iterate over stuff your >>>> performance is going to get hit REALLY hard compared to indexed and >>>> other >>>> less-than-O(n) operations. >>>>>> >>>>>> The main lesson from the story: caching, especially list caching, >>>>>> should >>>> ONLY be done in limited cases when the ratio of reads to writes is VERY >>>> high, and more particularly the ratio of reads to creates. When >>>> considering >>>> whether to use a cache this should be considered carefully, because >>>> records >>>> are sometimes updated from places that developers are unaware of, >>>> sometimes at >>>> surprising volumes. For example, it might seem great (and help a lot >>>> in dev >>>> and lower scale testing) to cache inventory information for viewing on a >>>> category screen, but always go to the DB to avoid stale data on a >>>> product >>>> detail screen and when adding to cart. The problem is that with high >>>> order >>>> volumes the inventory data is pretty much constantly being updated, >>>> so the >>>> caches are constantly... SLOWLY... being cleared as InventoryDetail >>>> records >>>> are created for reservations and issuances. >>>>>> >>>>>> To turn this nightmare into a deal killer, consider multiple >>>>>> application >>>> servers and the need for either a (SLOW) distributed cache or (SLOW) >>>> distributed cache clearing.
These have to go over the network anyway, so >>>> might as well go to the database! >>>>>> >>>>>> In the case above where we decided to NOT use the entity cache at all >>>> the tests were run on one really beefy server showing that disabling the >>>> cache was faster. When we ran it in a cluster of just 2 servers with >>>> direct >>>> DCC (the best case scenario for a distributed cache) we not only saw >>>> a big >>>> performance hit, but also got various run-time errors from stale data. >>>>>> >>>>>> I really don't know how anyone could back the concept of caching all >>>>>> finds by >>>> default... you don't even have to imagine edge cases, just consider the >>>> problems ALREADY being faced with more limited caching and how often the >>>> entity cache simply isn't a good solution. >>>>>> >>>>>> As for improving the entity caching in OFBiz, there are some >>>>>> concepts in >>>> Moqui that might be useful: >>>>>> >>>>>> 1. add a cache attribute to the entity definition with true, false, >>>>>> and >>>> never options; true and false being defaults that can be overridden by >>>> code, and never being an absolute (OFBiz does have this option IIRC); >>>> this >>>> would default to false, true being a useful setting for common things >>>> like >>>> Enumeration, StatusItem, etc, etc >>>>>> >>>>>> 2. add general support in the entity engine find methods for a "for >>>> update" parameter, and if true don't cache (and pass this on to the >>>> DB to >>>> lock the record(s) being queried), also making the value mutable >>>>>> >>>>>> 3. a write-through per-transaction cache; you can do some really cool >>>> stuff with this, avoiding most database hits during a transaction >>>> until the >>>> end when the changes are dumped to the DB; the Moqui implementation >>>> of this >>>> concept even looks for cached records that any find condition would >>>> require >>>> to get results and does the query in-memory, not having to go to the >>>> database at all...
and for other queries augments the results with >>>> values >>>> in the cache >>>>>> >>>>>> The whole concept of a write-through cache that is limited to the >>>>>> scope >>>> of a single transaction shows some of the issues you would run into >>>> even if >>>> trying to make the entity cache transactional. Especially with more >>>> complex >>>> finds it just falls apart. The current Moqui implementation handles >>>> quite a >>>> bit, but there are various things that I've run into testing it with >>>> real-world business services that are either a REAL pain to handle (so I >>>> haven't yet, but it is conceptually possible) or that I simply can't >>>> think >>>> of any good way to handle... and for those you simply can't use the >>>> write-through cache. >>>>>> >>>>>> There are some notes in the code for this, and some code/comments to >>>> more thoroughly communicate this concept, in this class in Moqui: >>>>>> >>>>>> >>>> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy >>>> >>>>>> >>>>>> I should also say that my motivation to handle every edge case even >>>>>> for >>>> this write-through cache is limited... yes there is room for improvement >>>> handling more scenarios, but how big will the performance increase >>>> ACTUALLY >>>> be for them? The efforts on this so far have been based on profiling >>>> results and making sure there is a significant difference (which >>>> there is >>>> for many services in Mantle Business Artifacts, though I haven't even >>>> come >>>> close to testing all of them this way). >>>>>> >>>>>> The same concept would apply to a read-only entity cache... some >>>>>> things >>>> might be possible to support, but would NOT improve performance >>>> making them >>>> a moot point. >>>>>> >>>>>> I don't know if I've written enough to convince everyone listening >>>>>> that >>>> even attempting a universal read-only entity cache is a useless >>>> idea... 
I'm >>>> sure some will still like the idea. If anyone gets into it and wants >>>> to try >>>> it out in their own branch of OFBiz, great... knock yourself out >>>> (probably >>>> literally...). But PLEASE no one ever commit something like this to the >>>> primary branch in the repo... not EVER. >>>>>> >>>>>> The whole idea that the OFBiz entity cache has had more limited >>>>>> ability >>>> to handle different scenarios in the past than it does now is not an >>>> argument of any sort supporting the idea of taking the entity cache >>>> to the >>>> ultimate possible end... which theoretically isn't even that far from >>>> where >>>> it is now. >>>>>> >>>>>> To apply a more useful standard the arguments should be for a _useful_ >>>> objective, which means increasing performance. I guarantee an always >>>> used >>>> find cache will NOT increase performance, it will kill it dead and cause >>>> infinite concurrency headaches in the process. >>>>>> >>>>>> -David >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On 19 Mar 2015, at 10:46, Adrian Crum < >>>> [hidden email]> wrote: >>>>>>> >>>>>>> The translation to English is not good, but I think I understand what >>>> you are saying. >>>>>>> >>>>>>> The entity values in the cache MUST be immutable - because multiple >>>> threads share the values. To do otherwise would require complicated >>>> synchronization code in GenericValue (which would cause blocking and >>>> hurt >>>> performance). >>>>>>> >>>>>>> When I first started working on the entity cache issues, it appeared >>>> to me that mutable entity values may have been in the original design >>>> (to >>>> enable a write-through cache). That is my guess - I am not sure. At some >>>> time, the entity values in the cache were made immutable, but the change >>>> was incomplete - some cached entity values were immutable and others >>>> were >>>> not. That is one of the things I fixed - I made sure ALL entity values >>>> coming from the cache are immutable.
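The immutability rule Adrian describes, and the clone-only-when-modifying pattern he prefers, can be sketched like this. GenericValue is far more involved; this toy class only models the shared-value immutability flag and the clone step, and its names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: values handed out by the entity cache are shared across
// threads, so they must be immutable; a caller that wants to modify one
// clones it first, rather than paying for synchronization in every value.
public class CachedValue {
    private final Map<String, Object> fields = new HashMap<>();
    private final boolean immutable;

    public CachedValue(Map<String, Object> initial, boolean immutable) {
        this.fields.putAll(initial);
        this.immutable = immutable;
    }

    public Object get(String name) {
        return fields.get(name);
    }

    // Shared cached values reject writes instead of synchronizing them.
    public void set(String name, Object value) {
        if (immutable) {
            throw new UnsupportedOperationException(
                    "this value came from the cache; clone it before modifying");
        }
        fields.put(name, value);
    }

    // Clone only when you intend to modify -- cheaper than eagerly cloning
    // every value retrieved from the cache.
    public CachedValue mutableClone() {
        return new CachedValue(fields, false);
    }
}
```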
One way we can eliminate the additional complication of cloning >>>> immutable entity values is to wrap the List in a custom Iterator >>>> implementation that automatically clones elements as they are retrieved >>>> from the List. The drawback is the performance hit - because you >>>> would be >>>> cloning values that might not get modified. I think it is more >>>> efficient to >>>> clone an entity value only when you intend to modify it. >>>>>>> >>>>>>> Adrian Crum >>>>>>> Sandglass Software >>>>>>> www.sandglass-software.com >>>>>>> >>>>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote: >>>>>>>> >>>>>>>> On 18/03/2015 13:16, Adrian Crum wrote: >>>>>>>>> >>>>>>>>> If you code Delegator calls to avoid the cache, then there is no >>>>>>>>> way >>>>>>>>> for a sysadmin to configure the caching behavior - that bit of code >>>>>>>>> will ALWAYS make a database call. >>>>>>>>> >>>>>>>>> If you make all Delegator calls use the cache, then there is an >>>>>>>>> additional complication that will add a bit more code: the >>>>>>>>> GenericValue instances retrieved from the cache are immutable - >>>>>>>>> if you >>>>>>>>> want to modify them, then you will have to clone them. So, this >>>>>>>>> approach can produce an additional line of code. >>>>>>>> >>>>>>>> >>>>>>>> I don't see any logical reason why we need to keep a GenericValue >>>>>>>> that came >>>>>>>> from the cache immutable. In the larger vision, a developer gives >>>>>>>> information >>>>>>>> about caching only when he wants to force cache use during his >>>>>>>> process, >>>>>>>> just as OFBiz manages transactions, timezone, locale, >>>>>>>> auto-matching >>>>>>>> and others by default. >>>>>>>> The entity engine would then work with the sysadmin's cache tuning. >>>>>>>> >>>>>>>> For example, delegator.find("Party", "partyId", partyId) would use the >>>>>>>> default >>>>>>>> parameters from cache.properties, and afterwards the store on a cached >>>>>>>> GenericValue is the delegator's problem.
I see a simple test like >>>>>>>> this: >>>>>>>> if (genericValue came from cache) { >>>>>>>> if (value is already done) { >>>>>>>> getFromDataBase >>>>>>>> update Value >>>>>>>> } >>>>>>>> else refuse (or not, I have a doubt :) ) >>>>>>>> } >>>>>>>> store >>>>>>>> >>>>>>>> >>>>>>>> Nicolas >>>>>> >>>>>> >>>> >>> >> >> > >
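For reference, the cloning-iterator alternative Adrian weighs earlier in his message (wrapping the List in a custom Iterator that clones each element on retrieval) could look roughly like the sketch below. Names are illustrative, not the OFBiz API, and the drawback he notes is visible in next(): every element is cloned, whether or not the caller ever modifies it.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch: hand callers mutable copies of shared cached values by cloning
// on retrieval, trading CPU and garbage for API convenience.
public class CloningIterator<T> implements Iterator<T> {
    private final Iterator<T> backing;
    private final UnaryOperator<T> cloner;

    public CloningIterator(List<T> list, UnaryOperator<T> cloner) {
        this.backing = list.iterator();
        this.cloner = cloner;
    }

    @Override
    public boolean hasNext() {
        return backing.hasNext();
    }

    @Override
    public T next() {
        // Unconditional clone -- the performance cost of this approach.
        return cloner.apply(backing.next());
    }
}
```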