Entity Caching

46 messages

Re: Entity Caching

Adrian Crum-3
Interesting. The current cache code ignores the maxSize setting.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/22/2015 7:38 AM, Adrian Crum wrote:

> I don't see an enable/disable setting but
>
> default.maxSize=0 in cache.properties
>
> should do it.
>
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/22/2015 3:16 AM, Christian Carlow wrote:
>> Is there a convenient setting for disabling cache completely as David
>> mentioned he did?
>>
>> On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote:
>>> I agree with Adrian that caching should be a sysadmin choice.
>>>
>>> I would also caution that measuring cache performance during testing is
>>> not a very useful activity. Testing tends to test one use case once and
>>> move on to the next.
>>> In production, users tend to do the same thing over and over.
>>> Testing might fill a shopping cart a few times and run a lot of other
>>> administrative functions just as many times. In real life, shopping carts
>>> are filled much more frequently than catalogs are updated (one hopes). Using
>>> performance numbers from functional testing will be misleading.
>>>
>>> The other message that I get from David's discussion is that caching
>>> built by professional caching experts (database developers, as he
>>> mentioned) worked better than caching systems built by application
>>> developers.
>>> It is likely that ehcache and the database built-in caching functions
>>> will outperform caching systems built by OFBiz developers and will
>>> handle the main cases better and will handle edge cases properly. They
>>> will probably integrate better and be easier to configure at run-time or
>>> during deployment. They will also be easier to tune by the system
>>> administrator.
>>>
>>> I understand that Adrian needs to fix this quickly. I suppose that
>>> caching could be eliminated to solve the problem while a better solution
>>> is implemented.
>>>
>>> Do we know what it will take to add enough ehcache to make the system
>>> perform adequately to meet current requirements?
>>>
>>> Ron
>>>
>>>
>>> On 21/03/2015 6:22 AM, Adrian Crum wrote:
>>>> I will try to say it again, but differently.
>>>>
>>>> If I am a developer, I am not aware of the subtleties of caching
>>>> various entities. Entity cache settings will be determined during
>>>> staging. So, I write my code as if everything will be cached - leaving
>>>> the door open for a sysadmin to configure caching during staging.
>>>>
>>>> During staging, a sysadmin can start off with caching disabled, and
>>>> then switch on caching for various entities while performance tests
>>>> are being run. After some time, the sysadmin will have cache settings
>>>> that provide optimal throughput. Does that mean ALL entities are
>>>> cached? No, only the ones that need to be.
>>>>
>>>> The point I'm trying to make is this: The decision to cache or not
>>>> should be made by a sysadmin, not by a developer.
>>>>
>>>> Adrian Crum
>>>> Sandglass Software
>>>> www.sandglass-software.com
>>>>
>>>> On 3/21/2015 10:08 AM, Scott Gray wrote:
>>>>>> My preference is to make ALL Delegator calls use the cache.
>>>>>
>>>>> Perhaps I misunderstood the above sentence? I responded because I don't
>>>>> think caching everything is a good idea.
>>>>>
>>>>> On 21 Mar 2015 20:41, "Adrian Crum"
>>>>> <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>> Thanks for the info David! I agree 100% with everything you said.
>>>>>>
>>>>>> There may be some misunderstanding about my advice. I suggested that
>>>>> caching should be configured in the settings file, I did not suggest
>>>>> that
>>>>> everything should be cached all the time.
>>>>>>
>>>>>> Like you said, JMeter tests can reveal what needs to be cached, and a
>>>>> sysadmin can fine-tune performance by tweaking the cache settings. The
>>>>> problem I mentioned is this: A sysadmin can't improve performance by
>>>>> caching a particular entity if a developer has hard-coded it not to be
>>>>> cached.
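The trade-off Adrian describes can be sketched in a few lines. This is a hypothetical mini-API, not the real OFBiz Delegator: the point is only that a flag read from configuration leaves the decision to the sysadmin, while a hard-coded literal takes it away.

```java
import java.util.Properties;

// Hypothetical sketch, not the real OFBiz Delegator API: a find method that
// takes its cache flag from per-entity configuration, so the sysadmin can
// tune caching without touching code.
public class CacheFlagSketch {
    static final Properties cacheProps = new Properties();

    // True when caching is enabled for this entity in configuration.
    static boolean cacheEnabled(String entityName) {
        return Boolean.parseBoolean(
                cacheProps.getProperty("entitycache." + entityName + ".enabled", "false"));
    }

    // The developer passes the configured flag through instead of a literal.
    static String findOne(String entityName, boolean useCache) {
        return useCache ? "served from cache" : "served from database";
    }
}
```

`findOne("Enumeration", cacheEnabled("Enumeration"))` honors whatever the sysadmin configured; `findOne("Enumeration", false)` pins that call to the database no matter how the cache is tuned.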
>>>>>>
>>>>>> Btw, I removed the complicated condition checking in the condition
>>>>>> cache
>>>>> because it didn't work. Not only was the system spending a lot of time
>>>>> evaluating long lists of values (each value having a potentially long
>>>>> list
>>>>> of conditions), at the end of the evaluation the result was always a
>>>>> cache
>>>>> miss.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Adrian Crum
>>>>>> Sandglass Software
>>>>>> www.sandglass-software.com
>>>>>>
>>>>>> On 3/20/2015 9:22 PM, David E. Jones wrote:
>>>>>>>
>>>>>>>
>>>>>>> Stepping back a little, some history and theory of the entity cache
>>>>> might be helpful.
>>>>>>>
>>>>>>> The original intent of the entity cache was a simple way to keep
>>>>> frequently used values/records closer to the code that uses them, ie
>>>>> in the
>>>>> application server. One real world example of this is the goal to be
>>>>> able
>>>>> to render ecommerce catalog and product pages without hitting the
>>>>> database.
>>>>>>>
>>>>>>> Over time the entity caching was made more complex to handle more
>>>>> caching scenarios, but still left to the developer to determine if
>>>>> caching
>>>>> is appropriate for the code they are writing.
>>>>>>>
>>>>>>> In theory is it possible to write an entity cache that can be used
>>>>>>> 100%
>>>>> of the time? IMO the answer is NO. This is almost possible for single
>>>>> record caching, with the cache ultimately becoming an in-memory
>>>>> relational
>>>>> database running on the app server (with full transaction support,
>>>>> etc)...
>>>>> but for List caching it totally kills the whole concept. The current
>>>>> entity
>>>>> cache keeps lists of results by the query condition used to get those
>>>>> results and this is very different from what a database does, and
>>>>> makes
>>>>> things rather messy and inefficient outside simple use cases.
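A minimal sketch of the keying scheme David describes, using plain strings as stand-ins for conditions and rows (hypothetical, not the actual cache code): because entries are keyed by the exact condition, two overlapping queries get separate, redundant entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a list cache keyed by the textual query condition.
// Overlapping conditions produce separate, redundant entries.
public class ConditionListCache {
    final Map<String, List<String>> cache = new HashMap<>();

    List<String> findList(String conditionKey) {
        List<String> hit = cache.get(conditionKey);  // hit only on the exact same condition text
        if (hit != null) return hit;
        List<String> fromDb = new ArrayList<>();     // stand-in for a real database query
        fromDb.add("row matching " + conditionKey);
        cache.put(conditionKey, fromDb);             // cached under this condition only
        return fromDb;
    }
}
```

A database can answer both queries from one set of indexed rows; this cache stores the results twice and has no way to notice the overlap.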
>>>>>>>
>>>>>>> On top of these big functional issues (which are deal killers IMO),
>>>>> there is also the performance issue. The point, or intent at least,
>>>>> of the
>>>>> entity cache is to improve performance. As the cache gets more
>>>>> complex the
>>>>> performance will suffer, and because of the whole concept of caching
>>>>> results by queries the performance will be WORSE than the DB
>>>>> performance
>>>>> for the same queries in most cases. Databases are quite fast and
>>>>> efficient,
>>>>> and we'll never be able to reproduce their ability to scale and
>>>>> search in
>>>>> something like an in-memory entity cache, especially not
>>>>> considering the
>>>>> massive redundancy and overhead of caching lists of values by
>>>>> condition.
>>>>>>>
>>>>>>> As an example of this in the real world: on a large OFBiz project I
>>>>> worked on that finished last year we went into production with the
>>>>> entity
>>>>> cache turned OFF, completely DISABLED. Why? When doing load testing
>>>>> on a
>>>>> whim one of the guys decided to try it without the entity cache
>>>>> enabled,
>>>>> and the body of JMeter tests that exercised a few dozen of the most
>>>>> common
>>>>> user paths through the system actually ran FASTER. The database
>>>>> (MySQL in
>>>>> this case) was hit over the network, but responded quickly enough to
>>>>> make
>>>>> things work quite well for the various find queries, and FAR faster
>>>>> for
>>>>> updates, especially creates. This project was one of the higher volume
>>>>> projects I'm aware of for OFBiz, at peaks handling sustained
>>>>> processing of
>>>>> around 10 orders per second (36,000 per hour), with some short term
>>>>> peaks
>>>>> much higher, closer to 20-30 orders per second... and longer term
>>>>> peaks
>>>>> hitting over 200k orders in one day (North America daytime only,
>>>>> around a 12-hour window).
>>>>>>>
>>>>>>> I found this to be curious so looked into it a bit more and the main
>>>>> performance culprit was updates, ESPECIALLY creates on any entity
>>>>> that has
>>>>> an active list cache. Auto-clearing that cache requires running the
>>>>> condition for each cache entry on the record to see if it matches,
>>>>> and if
>>>>> it does then it is cleared. This could be made more efficient by
>>>>> expanding
>>>>> the reverse index concept to index all values of fields in
>>>>> conditions...
>>>>> though that would be fairly complex to implement because of the wide
>>>>> variety of conditions that CAN be performed on fields, and even
>>>>> moreso when
>>>>> they are combined with other logic... especially NOTs and ORs. This
>>>>> could
>>>>> potentially increase performance, but would again add yet more
>>>>> complexity
>>>>> and overhead.
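The clearing cost described above can be sketched with predicates standing in for entity conditions (hypothetical, not the real implementation): every create must test each cached entry's condition against the new record, so creates pay a price proportional to the number of cached list entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of auto-clearing a list cache on create: each cached
// entry's condition is run against the new record, and matching entries are
// cleared because their cached result lists are now stale.
public class ListCacheClearing {
    // condition (as a predicate over field maps) -> cached result list
    final Map<Predicate<Map<String, Object>>, List<Map<String, Object>>> listCache = new HashMap<>();

    void create(Map<String, Object> newRecord) {
        // O(number of cached conditions) work on every single create:
        listCache.keySet().removeIf(condition -> condition.test(newRecord));
        // ...then the new record would be written to the database.
    }
}
```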
>>>>>>>
>>>>>>> To turn this dilemma into a nightmare, consider caching
>>>>>>> view-entities.
>>>>> In general, as systems scale, if you ever have to iterate over stuff your
>>>>> performance is going to get hit REALLY hard compared to indexed and other
>>>>> sub-O(n) operations.
>>>>>>>
>>>>>>> The main lesson from the story: caching, especially list caching,
>>>>>>> should
>>>>> ONLY be done in limited cases when the ratio of reads to writes is VERY
>>>>> high, and more particularly the ratio of reads to creates. When
>>>>> considering whether to use a cache, this should be weighed carefully,
>>>>> because records are sometimes updated from places developers are unaware
>>>>> of, sometimes at surprising volumes. For example, it might seem great (and
>>>>> help a lot
>>>>> in dev
>>>>> and lower scale testing) to cache inventory information for viewing
>>>>> on a
>>>>> category screen, but always go to the DB to avoid stale data on a
>>>>> product
>>>>> detail screen and when adding to cart. The problem is that with high
>>>>> order
>>>>> volumes the inventory data is pretty much constantly being updated,
>>>>> so the
>>>>> caches are constantly... SLOWLY... being cleared as InventoryDetail
>>>>> records
>>>>> are created for reservations and issuances.
>>>>>>>
>>>>>>> To turn this nightmare into a deal killer, consider multiple
>>>>>>> application
>>>>> servers and the need for either a (SLOW) distributed cache or (SLOW)
>>>>> distributed cache clearing. These have to go over the network
>>>>> anyway, so
>>>>> might as well go to the database!
>>>>>>>
>>>>>>> In the case above where we decided to NOT use the entity cache at
>>>>>>> all
>>>>> the tests were run on one really beefy server showing that
>>>>> disabling the
>>>>> cache was faster. When we ran it in a cluster of just 2 servers with
>>>>> direct
>>>>> DCC (the best case scenario for a distributed cache) we not only saw
>>>>> a big
>>>>> performance hit, but also got various run-time errors from stale data.
>>>>>>>
>>>>>>> I really don't see how anyone could back the concept of caching all
>>>>>>> finds by
>>>>> default... you don't even have to imagine edge cases, just consider
>>>>> the
>>>>> problems ALREADY being faced with more limited caching and how
>>>>> often the
>>>>> entity cache simply isn't a good solution.
>>>>>>>
>>>>>>> As for improving the entity caching in OFBiz, there are some
>>>>>>> concepts in
>>>>> Moqui that might be useful:
>>>>>>>
>>>>>>> 1. add a cache attribute to the entity definition with true, false,
>>>>>>> and
>>>>> never options; true and false being defaults that can be overridden by
>>>>> code, and never being an absolute (OFBiz does have this option IIRC);
>>>>> this
>>>>> would default to false, true being a useful setting for common things
>>>>> like
>>>>> Enumeration, StatusItem, etc, etc
>>>>>>>
>>>>>>> 2. add general support in the entity engine find methods for a "for
>>>>> update" parameter, and if true don't cache (and pass this on to the
>>>>> DB to
>>>>> lock the record(s) being queried), also making the value mutable
>>>>>>>
>>>>>>> 3. a write-through per-transaction cache; you can do some really
>>>>>>> cool
>>>>> stuff with this, avoiding most database hits during a transaction
>>>>> until the
>>>>> end when the changes are dumped to the DB; the Moqui implementation
>>>>> of this
>>>>> concept even looks for cached records that any find condition would
>>>>> require
>>>>> to get results and does the query in-memory, not having to go to the
>>>>> database at all... and for other queries augments the results with
>>>>> values
>>>>> in the cache
>>>>>>>
>>>>>>> The whole concept of a write-through cache that is limited to the
>>>>>>> scope
>>>>> of a single transaction shows some of the issues you would run into
>>>>> even if
>>>>> trying to make the entity cache transactional. Especially with more
>>>>> complex
>>>>> finds it just falls apart. The current Moqui implementation handles
>>>>> quite a
>>>>> bit, but there are various things that I've run into testing it with
>>>>> real-world business services that are either a REAL pain to handle
>>>>> (so I
>>>>> haven't yet, but it is conceptually possible) or that I simply can't
>>>>> think
>>>>> of any good way to handle... and for those you simply can't use the
>>>>> write-through cache.
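A toy version of the per-transaction write-through idea in item 3 (far simpler than Moqui's TransactionCache, and purely illustrative): writes are buffered per transaction, reads consult the buffer first, and the database is only touched at commit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a write-through per-transaction cache. Writes stay
// in a per-transaction buffer; reads see the buffer first; the buffered
// changes are dumped to the database only at commit.
public class TransactionCacheSketch {
    final Map<String, String> database = new HashMap<>();  // stand-in for the real DB
    final Map<String, String> txWrites = new HashMap<>();  // this transaction's buffer

    void store(String primaryKey, String value) {
        txWrites.put(primaryKey, value);                   // no database hit yet
    }

    String findOne(String primaryKey) {
        String buffered = txWrites.get(primaryKey);        // own uncommitted writes win
        return buffered != null ? buffered : database.get(primaryKey);
    }

    void commit() {
        database.putAll(txWrites);                         // dump changes at transaction end
        txWrites.clear();
    }
}
```

The hard part, as David notes, is not this happy path but making arbitrary find conditions see the buffered values, which is where complex finds fall apart.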
>>>>>>>
>>>>>>> There are some notes in the code for this, and some code/comments to
>>>>> more thoroughly communicate this concept, in this class in Moqui:
>>>>>>>
>>>>>>>
>>>>> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
>>>>>
>>>>>
>>>>>>>
>>>>>>> I should also say that my motivation to handle every edge case even
>>>>>>> for
>>>>> this write-through cache is limited... yes there is room for
>>>>> improvement
>>>>> handling more scenarios, but how big will the performance increase
>>>>> ACTUALLY
>>>>> be for them? The efforts on this so far have been based on profiling
>>>>> results and making sure there is a significant difference (which
>>>>> there is
>>>>> for many services in Mantle Business Artifacts, though I haven't even
>>>>> come
>>>>> close to testing all of them this way).
>>>>>>>
>>>>>>> The same concept would apply to a read-only entity cache... some
>>>>>>> things
>>>>> might be possible to support, but would NOT improve performance
>>>>> making them
>>>>> a moot point.
>>>>>>>
>>>>>>> I don't know if I've written enough to convince everyone listening
>>>>>>> that
>>>>> even attempting a universal read-only entity cache is a useless
>>>>> idea... I'm
>>>>> sure some will still like the idea. If anyone gets into it and wants
>>>>> to try
>>>>> it out in their own branch of OFBiz, great... knock yourself out
>>>>> (probably
>>>>> literally...). But PLEASE no one ever commit something like this to
>>>>> the
>>>>> primary branch in the repo... not EVER.
>>>>>>>
>>>>>>> The whole idea that the OFBiz entity cache has had more limited
>>>>>>> ability
>>>>> to handle different scenarios in the past than it does now is not an
>>>>> argument of any sort supporting the idea of taking the entity cache
>>>>> to the
>>>>> ultimate possible end... which theoretically isn't even that far from
>>>>> where
>>>>> it is now.
>>>>>>>
>>>>>>> To apply a more useful standard the arguments should be for a
>>>>>>> _useful_
>>>>> objective, which means increasing performance. I guarantee an always
>>>>> used
>>>>> find cache will NOT increase performance, it will kill it dead and
>>>>> cause
>>>>> infinite concurrency headaches in the process.
>>>>>>>
>>>>>>> -David
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 19 Mar 2015, at 10:46, Adrian Crum <
>>>>> [hidden email]> wrote:
>>>>>>>>
>>>>>>>> The translation to English is not good, but I think I understand
>>>>>>>> what
>>>>> you are saying.
>>>>>>>>
>>>>>>>> The entity values in the cache MUST be immutable - because multiple
>>>>> threads share the values. To do otherwise would require complicated
>>>>> synchronization code in GenericValue (which would cause blocking and
>>>>> hurt
>>>>> performance).
>>>>>>>>
>>>>>>>> When I first started working on the entity cache issues, it
>>>>>>>> appeared
>>>>> to me that mutable entity values may have been in the original design
>>>>> (to
>>>>> enable a write-through cache). That is my guess - I am not sure. At
>>>>> some
>>>>> time, the entity values in the cache were made immutable, but the
>>>>> change
>>>>> was incomplete - some cached entity values were immutable and others
>>>>> were
>>>>> not. That is one of the things I fixed - I made sure ALL entity values
>>>>> coming from the cache are immutable.
>>>>>>>>
>>>>>>>> One way we can eliminate the additional complication of cloning
>>>>> immutable entity values is to wrap the List in a custom Iterator
>>>>> implementation that automatically clones elements as they are
>>>>> retrieved
>>>>> from the List. The drawback is the performance hit - because you
>>>>> would be
>>>>> cloning values that might not get modified. I think it is more
>>>>> efficient to
>>>>> clone an entity value only when you intend to modify it.
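The clone-only-when-modifying rule can be sketched with plain maps (hypothetical, not the real GenericValue class): cached values are handed out wrapped read-only, and a caller makes a mutable copy only when it actually intends to change something, so other threads sharing the cached value are unaffected.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of immutable cached values with explicit clone-on-modify.
public class ImmutableCachedValue {
    final Map<String, Object> fields;

    private ImmutableCachedValue(Map<String, Object> fields, boolean mutable) {
        // Immutable instances wrap a defensive copy so writers get an exception.
        this.fields = mutable
                ? new HashMap<>(fields)
                : Collections.unmodifiableMap(new HashMap<>(fields));
    }

    // Values coming from the cache are always immutable...
    static ImmutableCachedValue fromCache(Map<String, Object> stored) {
        return new ImmutableCachedValue(stored, false);
    }

    // ...and are cloned only when the caller actually intends to modify one.
    ImmutableCachedValue mutableClone() {
        return new ImmutableCachedValue(fields, true);
    }

    void set(String name, Object value) {
        fields.put(name, value);  // throws UnsupportedOperationException if immutable
    }
}
```

This is the cheaper alternative to a cloning iterator: the copy is made per intended modification rather than per retrieved element.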
>>>>>>>>
>>>>>>>> Adrian Crum
>>>>>>>> Sandglass Software
>>>>>>>> www.sandglass-software.com
>>>>>>>>
>>>>>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>>>>>>>>>
>>>>>>>>> On 18/03/2015 13:16, Adrian Crum wrote:
>>>>>>>>>>
>>>>>>>>>> If you code Delegator calls to avoid the cache, then there is no
>>>>>>>>>> way
>>>>>>>>>> for a sysadmin to configure the caching behavior - that bit of
>>>>>>>>>> code
>>>>>>>>>> will ALWAYS make a database call.
>>>>>>>>>>
>>>>>>>>>> If you make all Delegator calls use the cache, then there is an
>>>>>>>>>> additional complication that will add a bit more code: the
>>>>>>>>>> GenericValue instances retrieved from the cache are immutable -
>>>>>>>>>> if you
>>>>>>>>>> want to modify them, then you will have to clone them. So, this
>>>>>>>>>> approach can produce an additional line of code.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't see any logical reason why we need to keep a GenericValue
>>>>>>>>> that came from the cache immutable. In the larger vision, a developer
>>>>>>>>> indicates whether to cache only when he wants to force cache usage
>>>>>>>>> during his process, just as OFBiz manages transactions, timezone,
>>>>>>>>> locale, auto-matching and so on by default.
>>>>>>>>> The entity engine would then work with the sysadmin's cache tuning.
>>>>>>>>>
>>>>>>>>> For example, delegator.find("Party", "partyId", partyId) would use
>>>>>>>>> the default setting from cache.properties, and afterwards storing a
>>>>>>>>> cached GenericValue would be the delegator's problem. I see a simple
>>>>>>>>> test like this:
>>>>>>>>> if (genericValue came from cache) {
>>>>>>>>>       if (value is already done) {
>>>>>>>>>          get from database
>>>>>>>>>          update value
>>>>>>>>>       }
>>>>>>>>>       else refuse (or not, I have a doubt :) )
>>>>>>>>> }
>>>>>>>>> store
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Nicolas
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>

Re: Entity Caching

Jacques Le Roux
Maybe only when it's set to 0?

Jacques

On 22/03/2015 10:38, Adrian Crum wrote:

> Interesting. The current cache code ignores the maxSize setting.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>>>>>> argument of any sort supporting the idea of taking the entity cache
>>>>>> to the
>>>>>> ultimate possible end... which theoretically isn't even that far from
>>>>>> where
>>>>>> it is now.
>>>>>>>>
>>>>>>>> To apply a more useful standard the arguments should be for a
>>>>>>>> _useful_
>>>>>> objective, which means increasing performance. I guarantee an always
>>>>>> used
>>>>>> find cache will NOT increase performance, it will kill it dead and
>>>>>> cause
>>>>>> infinite concurrency headaches in the process.
>>>>>>>>
>>>>>>>> -David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 19 Mar 2015, at 10:46, Adrian Crum <
>>>>>> [hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> The translation to English is not good, but I think I understand
>>>>>>>>> what
>>>>>> you are saying.
>>>>>>>>>
>>>>>>>>> The entity values in the cache MUST be immutable - because multiple
>>>>>> threads share the values. To do otherwise would require complicated
>>>>>> synchronization code in GenericValue (which would cause blocking and
>>>>>> hurt
>>>>>> performance).
>>>>>>>>>
>>>>>>>>> When I first started working on the entity cache issues, it
>>>>>>>>> appeared
>>>>>> to me that mutable entity values may have been in the original design
>>>>>> (to
>>>>>> enable a write-through cache). That is my guess - I am not sure. At
>>>>>> some
>>>>>> time, the entity values in the cache were made immutable, but the
>>>>>> change
>>>>>> was incomplete - some cached entity values were immutable and others
>>>>>> were
>>>>>> not. That is one of the things I fixed - I made sure ALL entity values
>>>>>> coming from the cache are immutable.
>>>>>>>>>
>>>>>>>>> One way we can eliminate the additional complication of cloning
>>>>>> immutable entity values is to wrap the List in a custom Iterator
>>>>>> implementation that automatically clones elements as they are
>>>>>> retrieved
>>>>>> from the List. The drawback is the performance hit - because you
>>>>>> would be
>>>>>> cloning values that might not get modified. I think it is more
>>>>>> efficient to
>>>>>> clone an entity value only when you intend to modify it.
>>>>>>>>>
>>>>>>>>> Adrian Crum
>>>>>>>>> Sandglass Software
>>>>>>>>> www.sandglass-software.com
>>>>>>>>>
>>>>>>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>>>>>>>>>>
>>>>>>>>>> Le 18/03/2015 13:16, Adrian Crum a écrit :
>>>>>>>>>>>
>>>>>>>>>>> If you code Delegator calls to avoid the cache, then there is no
>>>>>>>>>>> way
>>>>>>>>>>> for a sysadmin to configure the caching behavior - that bit of
>>>>>>>>>>> code
>>>>>>>>>>> will ALWAYS make a database call.
>>>>>>>>>>>
>>>>>>>>>>> If you make all Delegator calls use the cache, then there is an
>>>>>>>>>>> additional complication that will add a bit more code: the
>>>>>>>>>>> GenericValue instances retrieved from the cache are immutable -
>>>>>>>>>>> if you
>>>>>>>>>>> want to modify them, then you will have to clone them. So, this
>>>>>>>>>>> approach can produce an additional line of code.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't see any logical reason why we need to keep a GenericValue
>>>>>>>>>> that came from the cache immutable. In the larger vision, a
>>>>>>>>>> developer gives a cache hint only when he wants to force cache use
>>>>>>>>>> during his process, just as OFBiz manages transactions, timezones,
>>>>>>>>>> locales, auto-matching and other concerns by default.
>>>>>>>>>> The entity engine would then work with the sysadmin's cache tuning.
>>>>>>>>>>
>>>>>>>>>> As an example, delegator.find("Party", "partyId", partyId) would use
>>>>>>>>>> the default parameters from cache.properties, and afterwards a store
>>>>>>>>>> on a cached GenericValue is the delegator's problem. I see a simple
>>>>>>>>>> test like this:
>>>>>>>>>> if (genericValue came from cache) {
>>>>>>>>>>       if (value is already done) {
>>>>>>>>>>          getFromDataBase
>>>>>>>>>>          update Value
>>>>>>>>>>       }
>>>>>>>>>>       else refuse (or not I have a doubt :) )
>>>>>>>>>> }
>>>>>>>>>> store
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Nicolas
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>

Re: Entity Caching

Adrian Crum-3
In reply to this post by Adrian Crum-3
Oops, my bad. A maxSize setting of zero means there is no limit.

I spent some time looking through the UtilCache code, and I don't see
any way to disable a cache. Generally speaking, any setting of zero
means there is no limit for that setting.

It would be nice to have an enabled/disabled setting.
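
For reference, the relevant cache.properties pattern looks like this (values are illustrative; as noted, a maxSize of 0 means unlimited rather than disabled):

```properties
# cache.properties -- illustrative
# A maxSize of 0 means "no limit"; it does NOT disable the cache:
default.maxSize=0

# Closest current workaround to "disabled" (suggested later in the thread):
# cap every cache at a single entry instead.
#default.maxSize=1
```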

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/22/2015 9:38 AM, Adrian Crum wrote:

> Interesting. The current cache code ignores the maxSize setting.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/22/2015 7:38 AM, Adrian Crum wrote:
>> I don't see an enable/disable setting but
>>
>> default.maxSize=0 in cache.properties
>>
>> should do it.
>>
>>
>> Adrian Crum
>> Sandglass Software
>> www.sandglass-software.com
>>
>> On 3/22/2015 3:16 AM, Christian Carlow wrote:
>>> Is there a convenient setting for disabling cache completely as David
>>> mentioned he did?
>>>
>>> On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote:
>>>> I agree with Adrian that caching should be a sysadmin choice.
>>>>
>>>> I would also caution that measuring cache performance during testing is
>>>> not a very useful activity. Testing tends to test one use case once and
>>>> move on to the next.
>>>> In production, users tend to do the same thing over and over.
>>>> Testing might fill a shopping cart a few times and do a lot of other
>>>> administrative functions just as many times. In real life, shopping carts
>>>> are filled much more frequently than catalog updates (one hopes). Using
>>>> performance numbers from functional testing will be misleading.
>>>>
>>>> The other message that I get from David's discussion is that caching
>>>> built by professional caching experts (database developers, as he
>>>> mentioned) worked better than caching systems built by application
>>>> developers.
>>>> It is likely that ehcache and the database built-in caching functions
>>>> will outperform caching systems built by OFBiz developers and will
>>>> handle the main cases better and will handle edge cases properly. They
>>>> will probably integrate better and be easier to configure at
>>>> run-time or
>>>> during deployment. They will also be easier to tune by the system
>>>> administrator.
>>>>
>>>> I understand that Adrian needs to fix this quickly. I suppose that
>>>> caching could be eliminated to solve the problem while a better
>>>> solution
>>>> is implemented.
>>>>
>>>> Do we know what it will take to add enough ehcache to make the system
>>>> perform adequately to meet current requirements?
>>>>
>>>> Ron
>>>>
>>>>
>>>> On 21/03/2015 6:22 AM, Adrian Crum wrote:
>>>>> I will try to say it again, but differently.
>>>>>
>>>>> If I am a developer, I am not aware of the subtleties of caching
>>>>> various entities. Entity cache settings will be determined during
>>>>> staging. So, I write my code as if everything will be cached - leaving
>>>>> the door open for a sysadmin to configure caching during staging.
>>>>>
>>>>> During staging, a sysadmin can start off with caching disabled, and
>>>>> then switch on caching for various entities while performance tests
>>>>> are being run. After some time, the sysadmin will have cache settings
>>>>> that provide optimal throughput. Does that mean ALL entities are
>>>>> cached? No, only the ones that need to be.
>>>>>
>>>>> The point I'm trying to make is this: The decision to cache or not
>>>>> should be made by a sysadmin, not by a developer.
>>>>>
>>>>> Adrian Crum
>>>>> Sandglass Software
>>>>> www.sandglass-software.com
>>>>>
>>>>> On 3/21/2015 10:08 AM, Scott Gray wrote:
>>>>>>> My preference is to make ALL Delegator calls use the cache.
>>>>>>
>>>>>> Perhaps I misunderstood the above sentence? I responded because I
>>>>>> don't
>>>>>> think caching everything is a good idea
>>>>>>
>>>>>> On 21 Mar 2015 20:41, "Adrian Crum"
>>>>>> <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Thanks for the info David! I agree 100% with everything you said.
>>>>>>>
>>>>>>> There may be some misunderstanding about my advice. I suggested that
>>>>>> caching should be configured in the settings file, I did not suggest
>>>>>> that
>>>>>> everything should be cached all the time.
>>>>>>>
>>>>>>> Like you said, JMeter tests can reveal what needs to be cached,
>>>>>>> and a
>>>>>> sysadmin can fine-tune performance by tweaking the cache settings.
>>>>>> The
>>>>>> problem I mentioned is this: A sysadmin can't improve performance by
>>>>>> caching a particular entity if a developer has hard-coded it not
>>>>>> to be
>>>>>> cached.
>>>>>>>
>>>>>>> Btw, I removed the complicated condition checking in the condition
>>>>>>> cache
>>>>>> because it didn't work. Not only was the system spending a lot of
>>>>>> time
>>>>>> evaluating long lists of values (each value having a potentially long
>>>>>> list
>>>>>> of conditions), at the end of the evaluation the result was always a
>>>>>> cache
>>>>>> miss.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Adrian Crum
>>>>>>> Sandglass Software
>>>>>>> www.sandglass-software.com
>>>>>>>
>>>>>>> On 3/20/2015 9:22 PM, David E. Jones wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Stepping back a little, some history and theory of the entity cache
>>>>>> might be helpful.
>>>>>>>>
>>>>>>>> The original intent of the entity cache was a simple way to keep
>>>>>> frequently used values/records closer to the code that uses them, ie
>>>>>> in the
>>>>>> application server. One real world example of this is the goal to be
>>>>>> able
>>>>>> to render ecommerce catalog and product pages without hitting the
>>>>>> database.
>>>>>>>>
>>>>>>>> Over time the entity caching was made more complex to handle more
>>>>>> caching scenarios, but still left to the developer to determine if
>>>>>> caching
>>>>>> is appropriate for the code they are writing.
>>>>>>>>
>>>>>>>> In theory is it possible to write an entity cache that can be used
>>>>>>>> 100%
>>>>>> of the time? IMO the answer is NO. This is almost possible for single
>>>>>> record caching, with the cache ultimately becoming an in-memory
>>>>>> relational
>>>>>> database running on the app server (with full transaction support,
>>>>>> etc)...
>>>>>> but for List caching it totally kills the whole concept. The current
>>>>>> entity
>>>>>> cache keeps lists of results by the query condition used to get those
>>>>>> results and this is very different from what a database does, and
>>>>>> makes
>>>>>> things rather messy and inefficient outside simple use cases.
>>>>>>>>
>>>>>>>> On top of these big functional issues (which are deal killers IMO),
>>>>>> there is also the performance issue. The point, or intent at least,
>>>>>> of the
>>>>>> entity cache is to improve performance. As the cache gets more
>>>>>> complex the
>>>>>> performance will suffer, and because of the whole concept of caching
>>>>>> results by queries the performance will be WORSE than the DB
>>>>>> performance
>>>>>> for the same queries in most cases. Databases are quite fast and
>>>>>> efficient,
>>>>>> and we'll never be able to reproduce their ability to scale and
>>>>>> search in
>>>>>> something like an in-memory entity cache, especially not
>>>>>> considering the
>>>>>> massive redundancy and overhead of caching lists of values by
>>>>>> condition.
>>>>>>>>
>>>>>>>> As an example of this in the real world: on a large OFBiz project I
>>>>>> worked on that finished last year we went into production with the
>>>>>> entity
>>>>>> cache turned OFF, completely DISABLED. Why? When doing load testing
>>>>>> on a
>>>>>> whim one of the guys decided to try it without the entity cache
>>>>>> enabled,
>>>>>> and the body of JMeter tests that exercised a few dozen of the most
>>>>>> common
>>>>>> user paths through the system actually ran FASTER. The database
>>>>>> (MySQL in
>>>>>> this case) was hit over the network, but responded quickly enough to
>>>>>> make
>>>>>> things work quite well for the various find queries, and FAR faster
>>>>>> for
>>>>>> updates, especially creates. This project was one of the higher
>>>>>> volume
>>>>>> projects I'm aware of for OFBiz, at peaks handling sustained
>>>>>> processing of
>>>>>> around 10 orders per second (36,000 per hour), with some short term
>>>>>> peaks
>>>>>> much higher, closer to 20-30 orders per second... and longer term
>>>>>> peaks
>>>>>> hitting over 200k orders in one day (North America daytime only,
>>>>>> around a 12-hour window).
>>>>>>>>
>>>>>>>> I found this to be curious so looked into it a bit more and the
>>>>>>>> main
>>>>>> performance culprit was updates, ESPECIALLY creates on any entity
>>>>>> that has
>>>>>> an active list cache. Auto-clearing that cache requires running the
>>>>>> condition for each cache entry on the record to see if it matches,
>>>>>> and if
>>>>>> it does then it is cleared. This could be made more efficient by
>>>>>> expanding
>>>>>> the reverse index concept to index all values of fields in
>>>>>> conditions...
>>>>>> though that would be fairly complex to implement because of the wide
>>>>>> variety of conditions that CAN be performed on fields, and even
>>>>>> moreso when
>>>>>> they are combined with other logic... especially NOTs and ORs. This
>>>>>> could
>>>>>> potentially increase performance, but would again add yet more
>>>>>> complexity
>>>>>> and overhead.
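
[Editorial illustration] The auto-clearing cost David describes can be sketched in plain Java (names and structure are assumptions, not the actual OFBiz cache code): each list-cache entry is keyed by its query condition, so invalidating on a create means evaluating every stored condition against the new record — O(number of cached conditions) work per write, before any reverse-index optimization.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of clearing a list cache on create. Conditions are
// modeled as Predicates over a record (a Map of field -> value).
public class ListCacheClearSketch {
    // condition -> cached result list for that condition
    private final Map<Predicate<Map<String, Object>>, List<Map<String, Object>>> listCache =
            new HashMap<>();

    public void cacheResult(Predicate<Map<String, Object>> condition,
                            List<Map<String, Object>> results) {
        listCache.put(condition, results);
    }

    // Called on every create: scan ALL cached conditions, drop the matches.
    public int clearOnCreate(Map<String, Object> newRecord) {
        int before = listCache.size();
        listCache.keySet().removeIf(cond -> cond.test(newRecord));
        return before - listCache.size();  // number of entries invalidated
    }

    public int size() { return listCache.size(); }

    public static void main(String[] args) {
        ListCacheClearSketch cache = new ListCacheClearSketch();
        cache.cacheResult(r -> "WIDGET".equals(r.get("productId")), List.of());
        cache.cacheResult(r -> "GADGET".equals(r.get("productId")), List.of());

        Map<String, Object> created = new HashMap<>();
        created.put("productId", "WIDGET");
        System.out.println(cache.clearOnCreate(created) + " cleared, "
                + cache.size() + " left"); // prints 1 cleared, 1 left
    }
}
```

With thousands of cached conditions and a high create rate, this scan runs on every write — which is why creates were the main culprit in the profiling described above.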
>>>>>>>>
>>>>>>>> To turn this dilemma into a nightmare, consider caching
>>>>>>>> view-entities.
>>>>>> In general, as systems scale, if you ever have to iterate over stuff
>>>>>> your performance is going to get hit REALLY hard compared to indexed
>>>>>> and other sub-linear (less than n) operations.
>>>>>>>>
>>>>>>>> The main lesson from the story: caching, especially list caching,
>>>>>>>> should
>>>>>> ONLY be done in limited cases when the ratio of reads to write is
>>>>>> VERY
>>>>>> high, and more particularly the ratio of reads to creates. When
>>>>>> considering
>>>>>> whether to use a cache this should be considered carefully, because
>>>>>> records
>>>>>> are sometimes updated from places that developers are unaware,
>>>>>> sometimes at
>>>>>> surprising volumes. For example, it might seem great (and help a lot
>>>>>> in dev
>>>>>> and lower scale testing) to cache inventory information for viewing
>>>>>> on a
>>>>>> category screen, but always go to the DB to avoid stale data on a
>>>>>> product
>>>>>> detail screen and when adding to cart. The problem is that with high
>>>>>> order
>>>>>> volumes the inventory data is pretty much constantly being updated,
>>>>>> so the
>>>>>> caches are constantly... SLOWLY... being cleared as InventoryDetail
>>>>>> records
>>>>>> are created for reservations and issuances.
>>>>>>>>
>>>>>>>> To turn this nightmare into a deal killer, consider multiple
>>>>>>>> application
>>>>>> servers and the need for either a (SLOW) distributed cache or (SLOW)
>>>>>> distributed cache clearing. These have to go over the network
>>>>>> anyway, so
>>>>>> might as well go to the database!
>>>>>>>>
>>>>>>>> In the case above where we decided to NOT use the entity cache at
>>>>>>>> all
>>>>>> the tests were run on one really beefy server showing that
>>>>>> disabling the
>>>>>> cache was faster. When we ran it in a cluster of just 2 servers with
>>>>>> direct
>>>>>> DCC (the best case scenario for a distributed cache) we not only saw
>>>>>> a big
>>>>>> performance hit, but also got various run-time errors from stale
>>>>>> data.
>>>>>>>>
>>>>>>>> I really don't see how anyone could back the concept of caching all
>>>>>>>> finds by
>>>>>> default... you don't even have to imagine edge cases, just consider
>>>>>> the
>>>>>> problems ALREADY being faced with more limited caching and how
>>>>>> often the
>>>>>> entity cache simply isn't a good solution.
>>>>>>>>
>>>>>>>> As for improving the entity caching in OFBiz, there are some
>>>>>>>> concepts in
>>>>>> Moqui that might be useful:
>>>>>>>>
>>>>>>>> 1. add a cache attribute to the entity definition with true, false,
>>>>>>>> and
>>>>>> never options; true and false being defaults that can be
>>>>>> overridden by
>>>>>> code, and never being an absolute (OFBiz does have this option IIRC);
>>>>>> this
>>>>>> would default to false, true being a useful setting for common things
>>>>>> like
>>>>>> Enumeration, StatusItem, etc, etc
>>>>>>>>
>>>>>>>> 2. add general support in the entity engine find methods for a "for
>>>>>> update" parameter, and if true don't cache (and pass this on to the
>>>>>> DB to
>>>>>> lock the record(s) being queried), also making the value mutable
>>>>>>>>
>>>>>>>> 3. a write-through per-transaction cache; you can do some really
>>>>>>>> cool
>>>>>> stuff with this, avoiding most database hits during a transaction
>>>>>> until the
>>>>>> end when the changes are dumped to the DB; the Moqui implementation
>>>>>> of this
>>>>>> concept even looks for cached records that any find condition would
>>>>>> require
>>>>>> to get results and does the query in-memory, not having to go to the
>>>>>> database at all... and for other queries augments the results with
>>>>>> values
>>>>>> in the cache
>>>>>>>>
>>>>>>>> The whole concept of a write-through cache that is limited to the
>>>>>>>> scope
>>>>>> of a single transaction shows some of the issues you would run into
>>>>>> even if
>>>>>> trying to make the entity cache transactional. Especially with more
>>>>>> complex
>>>>>> finds it just falls apart. The current Moqui implementation handles
>>>>>> quite a
>>>>>> bit, but there are various things that I've run into testing it with
>>>>>> real-world business services that are either a REAL pain to handle
>>>>>> (so I
>>>>>> haven't yet, but it is conceptually possible) or that I simply can't
>>>>>> think
>>>>>> of any good way to handle... and for those you simply can't use the
>>>>>> write-through cache.
>>>>>>>>
>>>>>>>> There are some notes in the code for this, and some
>>>>>>>> code/comments to
>>>>>> more thoroughly communicate this concept, in this class in Moqui:
>>>>>>>>
>>>>>>>>
>>>>>> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> I should also say that my motivation to handle every edge case even
>>>>>>>> for
>>>>>> this write-through cache is limited... yes there is room for
>>>>>> improvement
>>>>>> handling more scenarios, but how big will the performance increase
>>>>>> ACTUALLY
>>>>>> be for them? The efforts on this so far have been based on profiling
>>>>>> results and making sure there is a significant difference (which
>>>>>> there is
>>>>>> for many services in Mantle Business Artifacts, though I haven't even
>>>>>> come
>>>>>> close to testing all of them this way).
>>>>>>>>
>>>>>>>> The same concept would apply to a read-only entity cache... some
>>>>>>>> things
>>>>>> might be possible to support, but would NOT improve performance
>>>>>> making them
>>>>>> a moot point.
>>>>>>>>
>>>>>>>> I don't know if I've written enough to convince everyone listening
>>>>>>>> that
>>>>>> even attempting a universal read-only entity cache is a useless
>>>>>> idea... I'm
>>>>>> sure some will still like the idea. If anyone gets into it and wants
>>>>>> to try
>>>>>> it out in their own branch of OFBiz, great... knock yourself out
>>>>>> (probably
>>>>>> literally...). But PLEASE no one ever commit something like this to
>>>>>> the
>>>>>> primary branch in the repo... not EVER.
>>>>>>>>
>>>>>>>> The whole idea that the OFBiz entity cache has had more limited
>>>>>>>> ability
>>>>>> to handle different scenarios in the past than it does now is not an
>>>>>> argument of any sort supporting the idea of taking the entity cache
>>>>>> to the
>>>>>> ultimate possible end... which theoretically isn't even that far from
>>>>>> where
>>>>>> it is now.
>>>>>>>>
>>>>>>>> To apply a more useful standard the arguments should be for a
>>>>>>>> _useful_
>>>>>> objective, which means increasing performance. I guarantee an always
>>>>>> used
>>>>>> find cache will NOT increase performance, it will kill it dead and
>>>>>> cause
>>>>>> infinite concurrency headaches in the process.
>>>>>>>>
>>>>>>>> -David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 19 Mar 2015, at 10:46, Adrian Crum <
>>>>>> [hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> The translation to English is not good, but I think I understand
>>>>>>>>> what
>>>>>> you are saying.
>>>>>>>>>
>>>>>>>>> The entity values in the cache MUST be immutable - because
>>>>>>>>> multiple
>>>>>> threads share the values. To do otherwise would require complicated
>>>>>> synchronization code in GenericValue (which would cause blocking and
>>>>>> hurt
>>>>>> performance).
>>>>>>>>>
>>>>>>>>> When I first started working on the entity cache issues, it
>>>>>>>>> appeared
>>>>>> to me that mutable entity values may have been in the original design
>>>>>> (to
>>>>>> enable a write-through cache). That is my guess - I am not sure. At
>>>>>> some
>>>>>> time, the entity values in the cache were made immutable, but the
>>>>>> change
>>>>>> was incomplete - some cached entity values were immutable and others
>>>>>> were
>>>>>> not. That is one of the things I fixed - I made sure ALL entity
>>>>>> values
>>>>>> coming from the cache are immutable.
>>>>>>>>>
>>>>>>>>> One way we can eliminate the additional complication of cloning
>>>>>> immutable entity values is to wrap the List in a custom Iterator
>>>>>> implementation that automatically clones elements as they are
>>>>>> retrieved
>>>>>> from the List. The drawback is the performance hit - because you
>>>>>> would be
>>>>>> cloning values that might not get modified. I think it is more
>>>>>> efficient to
>>>>>> clone an entity value only when you intend to modify it.
>>>>>>>>>
>>>>>>>>> Adrian Crum
>>>>>>>>> Sandglass Software
>>>>>>>>> www.sandglass-software.com
>>>>>>>>>
>>>>>>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>>>>>>>>>>
>>>>>>>>>> Le 18/03/2015 13:16, Adrian Crum a écrit :
>>>>>>>>>>>
>>>>>>>>>>> If you code Delegator calls to avoid the cache, then there is no
>>>>>>>>>>> way
>>>>>>>>>>> for a sysadmin to configure the caching behavior - that bit of
>>>>>>>>>>> code
>>>>>>>>>>> will ALWAYS make a database call.
>>>>>>>>>>>
>>>>>>>>>>> If you make all Delegator calls use the cache, then there is an
>>>>>>>>>>> additional complication that will add a bit more code: the
>>>>>>>>>>> GenericValue instances retrieved from the cache are immutable -
>>>>>>>>>>> if you
>>>>>>>>>>> want to modify them, then you will have to clone them. So, this
>>>>>>>>>>> approach can produce an additional line of code.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't see any logical reason why we need to keep a GenericValue
>>>>>>>>>> that came from the cache immutable. In the larger vision, a
>>>>>>>>>> developer gives a cache hint only when he wants to force cache use
>>>>>>>>>> during his process, just as OFBiz manages transactions, timezones,
>>>>>>>>>> locales, auto-matching and other concerns by default.
>>>>>>>>>> The entity engine would then work with the sysadmin's cache tuning.
>>>>>>>>>>
>>>>>>>>>> As an example, delegator.find("Party", "partyId", partyId) would use
>>>>>>>>>> the default parameters from cache.properties, and afterwards a store
>>>>>>>>>> on a cached GenericValue is the delegator's problem. I see a simple
>>>>>>>>>> test like this:
>>>>>>>>>> if (genericValue came from cache) {
>>>>>>>>>>       if (value is already done) {
>>>>>>>>>>          getFromDataBase
>>>>>>>>>>          update Value
>>>>>>>>>>       }
>>>>>>>>>>       else refuse (or not I have a doubt :) )
>>>>>>>>>> }
>>>>>>>>>> store
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Nicolas
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>

Re: Entity Caching

Jacques Le Roux
Administrator
Maybe use 1 in the meantime? ;)

Jacques

Le 22/03/2015 11:17, Adrian Crum a écrit :

> Oops, my bad. A maxSize setting of zero means there is no limit.
>
> I spent some time looking through the UtilCache code, and I don't see any way to disable a cache. Generally speaking, any setting of zero means
> there is no limit for that setting.
>
> It would be nice to have an enabled/disabled setting.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/22/2015 9:38 AM, Adrian Crum wrote:
>> Interesting. The current cache code ignores the maxSize setting.
>>
>> Adrian Crum
>> Sandglass Software
>> www.sandglass-software.com
>>
>> On 3/22/2015 7:38 AM, Adrian Crum wrote:
>>> I don't see an enable/disable setting but
>>>
>>> default.maxSize=0 in cache.properties
>>>
>>> should do it.
>>>
>>>
>>> Adrian Crum
>>> Sandglass Software
>>> www.sandglass-software.com
>>>
>>> On 3/22/2015 3:16 AM, Christian Carlow wrote:
>>>> Is there a convenient setting for disabling cache completely as David
>>>> mentioned he did?
>>>>
>>>> On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote:
>>>>> I agree with Adrian that caching should be a sysadmin choice.
>>>>>
>>>>> I would also caution that measuring cache performance during testing is
>>>>> not a very useful activity. Testing tends to test one use case once and
>>>>> move on to the next.
>>>>> In production, users tend to do the same thing over and over.
>>>>> Testing might fill a shopping cart a few times and do a lot of other
>>>>> administrative functions just as many times. In real life, shopping carts
>>>>> are filled much more frequently than catalog updates (one hopes). Using
>>>>> performance numbers from functional testing will be misleading.
>>>>>
>>>>> The other message that I get from David's discussion is that caching
>>>>> built by professional caching experts (database developers, as he
>>>>> mentioned) worked better than caching systems built by application
>>>>> developers.
>>>>> It is likely that ehcache and the database built-in caching functions
>>>>> will outperform caching systems built by OFBiz developers and will
>>>>> handle the main cases better and will handle edge cases properly. They
>>>>> will probably integrate better and be easier to configure at
>>>>> run-time or
>>>>> during deployment. They will also be easier to tune by the system
>>>>> administrator.
>>>>>
>>>>> I understand that Adrian needs to fix this quickly. I suppose that
>>>>> caching could be eliminated to solve the problem while a better
>>>>> solution
>>>>> is implemented.
>>>>>
>>>>> Do we know what it will take to add enough ehcache to make the system
>>>>> perform adequately to meet current requirements?
>>>>>
>>>>> Ron
>>>>>
>>>>>
>>>>> On 21/03/2015 6:22 AM, Adrian Crum wrote:
>>>>>> I will try to say it again, but differently.
>>>>>>
>>>>>> If I am a developer, I am not aware of the subtleties of caching
>>>>>> various entities. Entity cache settings will be determined during
>>>>>> staging. So, I write my code as if everything will be cached - leaving
>>>>>> the door open for a sysadmin to configure caching during staging.
>>>>>>
>>>>>> During staging, a sysadmin can start off with caching disabled, and
>>>>>> then switch on caching for various entities while performance tests
>>>>>> are being run. After some time, the sysadmin will have cache settings
>>>>>> that provide optimal throughput. Does that mean ALL entities are
>>>>>> cached? No, only the ones that need to be.
>>>>>>
>>>>>> The point I'm trying to make is this: The decision to cache or not
>>>>>> should be made by a sysadmin, not by a developer.
>>>>>>
>>>>>> Adrian Crum
>>>>>> Sandglass Software
>>>>>> www.sandglass-software.com
>>>>>>
>>>>>> On 3/21/2015 10:08 AM, Scott Gray wrote:
>>>>>>>> My preference is to make ALL Delegator calls use the cache.
>>>>>>>
>>>>>>> Perhaps I misunderstood the above sentence? I responded because I
>>>>>>> don't
>>>>>>> think caching everything is a good idea
>>>>>>>
>>>>>>> On 21 Mar 2015 20:41, "Adrian Crum"
>>>>>>> <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Thanks for the info David! I agree 100% with everything you said.
>>>>>>>>
>>>>>>>> There may be some misunderstanding about my advice. I suggested that
>>>>>>> caching should be configured in the settings file, I did not suggest
>>>>>>> that
>>>>>>> everything should be cached all the time.
>>>>>>>>
>>>>>>>> Like you said, JMeter tests can reveal what needs to be cached,
>>>>>>>> and a
>>>>>>> sysadmin can fine-tune performance by tweaking the cache settings.
>>>>>>> The
>>>>>>> problem I mentioned is this: A sysadmin can't improve performance by
>>>>>>> caching a particular entity if a developer has hard-coded it not
>>>>>>> to be
>>>>>>> cached.
>>>>>>>>
>>>>>>>> Btw, I removed the complicated condition checking in the condition
>>>>>>>> cache
>>>>>>> because it didn't work. Not only was the system spending a lot of
>>>>>>> time
>>>>>>> evaluating long lists of values (each value having a potentially long
>>>>>>> list
>>>>>>> of conditions), at the end of the evaluation the result was always a
>>>>>>> cache
>>>>>>> miss.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Adrian Crum
>>>>>>>> Sandglass Software
>>>>>>>> www.sandglass-software.com
>>>>>>>>
>>>>>>>> On 3/20/2015 9:22 PM, David E. Jones wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Stepping back a little, some history and theory of the entity cache
>>>>>>> might be helpful.
>>>>>>>>>
>>>>>>>>> The original intent of the entity cache was a simple way to keep
>>>>>>> frequently used values/records closer to the code that uses them, ie
>>>>>>> in the
>>>>>>> application server. One real world example of this is the goal to be
>>>>>>> able
>>>>>>> to render ecommerce catalog and product pages without hitting the
>>>>>>> database.
>>>>>>>>>
>>>>>>>>> Over time the entity caching was made more complex to handle more
>>>>>>> caching scenarios, but still left to the developer to determine if
>>>>>>> caching
>>>>>>> is appropriate for the code they are writing.
>>>>>>>>>
>>>>>>>>> In theory is it possible to write an entity cache that can be used
>>>>>>>>> 100%
>>>>>>> of the time? IMO the answer is NO. This is almost possible for single
>>>>>>> record caching, with the cache ultimately becoming an in-memory
>>>>>>> relational
>>>>>>> database running on the app server (with full transaction support,
>>>>>>> etc)...
>>>>>>> but for List caching it totally kills the whole concept. The current
>>>>>>> entity
>>>>>>> cache keeps lists of results by the query condition used to get those
>>>>>>> results and this is very different from what a database does, and
>>>>>>> makes
>>>>>>> things rather messy and inefficient outside simple use cases.
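The condition-keyed list cache David describes can be sketched as follows (a minimal illustration with hypothetical class and key names, not the actual OFBiz entity cache). Unlike a database, it can only answer a query whose condition is identical to one it has already seen, and each distinct condition stores its own full copy of the result list:

```java
import java.util.*;

// Minimal sketch of a list cache keyed by query condition (illustrative
// names, not the OFBiz API). There is no query planner: a lookup hits
// only on an exact condition-key match, and overlapping conditions each
// keep their own redundant copy of the results.
public class EntityListCache {
    private final Map<String, List<Map<String, Object>>> byCondition = new HashMap<>();

    public void put(String conditionKey, List<Map<String, Object>> results) {
        byCondition.put(conditionKey, List.copyOf(results));
    }

    public List<Map<String, Object>> get(String conditionKey) {
        return byCondition.get(conditionKey); // null == cache miss
    }
}
```

A logically equivalent but differently written condition still misses, which is part of why this is "very different from what a database does".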
>>>>>>>>>
>>>>>>>>> On top of these big functional issues (which are deal killers IMO),
>>>>>>> there is also the performance issue. The point, or intent at least,
>>>>>>> of the
>>>>>>> entity cache is to improve performance. As the cache gets more
>>>>>>> complex the
>>>>>>> performance will suffer, and because of the whole concept of caching
>>>>>>> results by queries the performance will be WORSE than the DB
>>>>>>> performance
>>>>>>> for the same queries in most cases. Databases are quite fast and
>>>>>>> efficient,
>>>>>>> and we'll never be able to reproduce their ability to scale and
>>>>>>> search in
>>>>>>> something like an in-memory entity cache, especially not
>>>>>>> considering the
>>>>>>> massive redundancy and overhead of caching lists of values by
>>>>>>> condition.
>>>>>>>>>
>>>>>>>>> As an example of this in the real world: on a large OFBiz project I
>>>>>>> worked on that finished last year we went into production with the
>>>>>>> entity
>>>>>>> cache turned OFF, completely DISABLED. Why? When doing load testing
>>>>>>> on a
>>>>>>> whim one of the guys decided to try it without the entity cache
>>>>>>> enabled,
>>>>>>> and the body of JMeter tests that exercised a few dozen of the most
>>>>>>> common
>>>>>>> user paths through the system actually ran FASTER. The database
>>>>>>> (MySQL in
>>>>>>> this case) was hit over the network, but responded quickly enough to
>>>>>>> make
>>>>>>> things work quite well for the various find queries, and FAR faster
>>>>>>> for
>>>>>>> updates, especially creates. This project was one of the higher
>>>>>>> volume
>>>>>>> projects I'm aware of for OFBiz, at peaks handling sustained
>>>>>>> processing of
>>>>>>> around 10 orders per second (36,000 per hour), with some short term
>>>>>>> peaks
>>>>>>> much higher, closer to 20-30 orders per second... and longer term
>>>>>>> peaks
>>>>>>> hitting over 200k orders in one day (North America daytime only,

>>>>>>> around a
>>>>>>> 12 hour window).
>>>>>>>>>
>>>>>>>>> I found this to be curious so looked into it a bit more and the
>>>>>>>>> main
>>>>>>> performance culprit was updates, ESPECIALLY creates on any entity
>>>>>>> that has
>>>>>>> an active list cache. Auto-clearing that cache requires running the
>>>>>>> condition for each cache entry on the record to see if it matches,
>>>>>>> and if
>>>>>>> it does then it is cleared. This could be made more efficient by
>>>>>>> expanding
>>>>>>> the reverse index concept to index all values of fields in
>>>>>>> conditions...
>>>>>>> though that would be fairly complex to implement because of the wide
>>>>>>> variety of conditions that CAN be performed on fields, and even
>>>>>>> moreso when
>>>>>>> they are combined with other logic... especially NOTs and ORs. This
>>>>>>> could
>>>>>>> potentially increase performance, but would again add yet more
>>>>>>> complexity
>>>>>>> and overhead.
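The auto-clearing cost described above can be sketched like this (hypothetical classes, not the OFBiz implementation): on every create, the cache must run each cached entry's condition against the new record, an O(number of cached conditions) scan.

```java
import java.util.*;
import java.util.function.Predicate;

// Sketch of auto-clearing a condition-keyed list cache on create.
// Conditions are stored in predicate form; any cached list whose
// condition matches the newly created record is now stale and dropped.
public class ListCacheClearing {
    // condition (as a predicate) -> cached result list for that condition
    private final Map<Predicate<Map<String, Object>>, List<Map<String, Object>>> cache =
            new LinkedHashMap<>();

    public void put(Predicate<Map<String, Object>> condition,
                    List<Map<String, Object>> results) {
        cache.put(condition, new ArrayList<>(results));
    }

    // Called whenever a record is created; returns how many entries
    // had to be cleared. Every create pays for a full scan.
    public int clearOnCreate(Map<String, Object> newRecord) {
        int before = cache.size();
        cache.keySet().removeIf(condition -> condition.test(newRecord));
        return before - cache.size();
    }

    public int size() {
        return cache.size();
    }
}
```

With many distinct cached conditions and a high create rate, this scan runs constantly, which is why a high read-to-create ratio is a precondition for list caching.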
>>>>>>>>>
>>>>>>>>> To turn this dilemma into a nightmare, consider caching
>>>>>>>>> view-entities.
>>>>>>> In general as systems scale if you ever have to iterate over stuff
>>>>>>> your
>>>>>>> performance is going to get hit REALLY hard compared to indexed and
>>>>>>> other
>>>>>>> less-than-O(n) operations.
>>>>>>>>>
>>>>>>>>> The main lesson from the story: caching, especially list caching,
>>>>>>>>> should
>>>>>>> ONLY be done in limited cases when the ratio of reads to write is
>>>>>>> VERY
>>>>>>> high, and more particularly the ratio of reads to creates. When
>>>>>>> considering
>>>>>>> whether to use a cache this should be considered carefully, because
>>>>>>> records
>>>>>>> are sometimes updated from places that developers are unaware,
>>>>>>> sometimes at
>>>>>>> surprising volumes. For example, it might seem great (and help a lot
>>>>>>> in dev
>>>>>>> and lower scale testing) to cache inventory information for viewing
>>>>>>> on a
>>>>>>> category screen, but always go to the DB to avoid stale data on a
>>>>>>> product
>>>>>>> detail screen and when adding to cart. The problem is that with high
>>>>>>> order
>>>>>>> volumes the inventory data is pretty much constantly being updated,
>>>>>>> so the
>>>>>>> caches are constantly... SLOWLY... being cleared as InventoryDetail
>>>>>>> records
>>>>>>> are created for reservations and issuances.
>>>>>>>>>
>>>>>>>>> To turn this nightmare into a deal killer, consider multiple
>>>>>>>>> application
>>>>>>> servers and the need for either a (SLOW) distributed cache or (SLOW)
>>>>>>> distributed cache clearing. These have to go over the network
>>>>>>> anyway, so
>>>>>>> might as well go to the database!
>>>>>>>>>
>>>>>>>>> In the case above where we decided to NOT use the entity cache at
>>>>>>>>> all
>>>>>>> the tests were run on one really beefy server showing that
>>>>>>> disabling the
>>>>>>> cache was faster. When we ran it in a cluster of just 2 servers with
>>>>>>> direct
>>>>>>> DCC (the best case scenario for a distributed cache) we not only saw
>>>>>>> a big
>>>>>>> performance hit, but also got various run-time errors from stale
>>>>>>> data.
>>>>>>>>>
>>>>>>>>> I really don't know how anyone could back the concept of caching all
>>>>>>>>> finds by
>>>>>>> default... you don't even have to imagine edge cases, just consider
>>>>>>> the
>>>>>>> problems ALREADY being faced with more limited caching and how
>>>>>>> often the
>>>>>>> entity cache simply isn't a good solution.
>>>>>>>>>
>>>>>>>>> As for improving the entity caching in OFBiz, there are some
>>>>>>>>> concepts in
>>>>>>> Moqui that might be useful:
>>>>>>>>>
>>>>>>>>> 1. add a cache attribute to the entity definition with true, false,
>>>>>>>>> and
>>>>>>> never options; true and false being defaults that can be
>>>>>>> overridden by
>>>>>>> code, and never being an absolute (OFBiz does have this option IIRC);
>>>>>>> this
>>>>>>> would default to false, true being a useful setting for common things
>>>>>>> like
>>>>>>> Enumeration, StatusItem, etc, etc
>>>>>>>>>
>>>>>>>>> 2. add general support in the entity engine find methods for a "for
>>>>>>> update" parameter, and if true don't cache (and pass this on to the
>>>>>>> DB to
>>>>>>> lock the record(s) being queried), also making the value mutable
>>>>>>>>>
>>>>>>>>> 3. a write-through per-transaction cache; you can do some really
>>>>>>>>> cool
>>>>>>> stuff with this, avoiding most database hits during a transaction
>>>>>>> until the
>>>>>>> end when the changes are dumped to the DB; the Moqui implementation
>>>>>>> of this
>>>>>>> concept even looks for cached records that any find condition would
>>>>>>> require
>>>>>>> to get results and does the query in-memory, not having to go to the
>>>>>>> database at all... and for other queries augments the results with
>>>>>>> values
>>>>>>> in the cache
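The per-transaction write-through idea in point 3 can be sketched minimally as follows (a hypothetical illustration only; Moqui's actual TransactionCache is far more complete). Writes are buffered in memory for the life of one transaction and dumped to the database only at the end; reads within the transaction are served from the buffer first:

```java
import java.util.*;

// Hypothetical sketch of a per-transaction write-through cache.
// create() buffers the value with no DB hit; findOne() reads from the
// buffer; commit() flushes all buffered writes at once (recording the
// keys here stands in for the real INSERT/UPDATE statements).
public class TxWriteThroughCache {
    private final Map<String, Map<String, Object>> pending = new LinkedHashMap<>();
    private final List<String> flushed = new ArrayList<>();

    public void create(String primaryKey, Map<String, Object> value) {
        pending.put(primaryKey, new HashMap<>(value)); // no DB hit yet
    }

    public Map<String, Object> findOne(String primaryKey) {
        // Served from the transaction buffer when present; a real
        // implementation would fall back to a database query here.
        return pending.get(primaryKey);
    }

    public List<String> commit() {
        flushed.addAll(pending.keySet());
        pending.clear();
        return flushed;
    }
}
```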
>>>>>>>>>
>>>>>>>>> The whole concept of a write-through cache that is limited to the
>>>>>>>>> scope
>>>>>>> of a single transaction shows some of the issues you would run into
>>>>>>> even if
>>>>>>> trying to make the entity cache transactional. Especially with more
>>>>>>> complex
>>>>>>> finds it just falls apart. The current Moqui implementation handles
>>>>>>> quite a
>>>>>>> bit, but there are various things that I've run into testing it with
>>>>>>> real-world business services that are either a REAL pain to handle
>>>>>>> (so I
>>>>>>> haven't yet, but it is conceptually possible) or that I simply can't
>>>>>>> think
>>>>>>> of any good way to handle... and for those you simply can't use the
>>>>>>> write-through cache.
>>>>>>>>>
>>>>>>>>> There are some notes in the code for this, and some
>>>>>>>>> code/comments to
>>>>>>> more thoroughly communicate this concept, in this class in Moqui:
>>>>>>>>>
>>>>>>>>>
>>>>>>> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> I should also say that my motivation to handle every edge case even
>>>>>>>>> for
>>>>>>> this write-through cache is limited... yes there is room for
>>>>>>> improvement
>>>>>>> handling more scenarios, but how big will the performance increase
>>>>>>> ACTUALLY
>>>>>>> be for them? The efforts on this so far have been based on profiling
>>>>>>> results and making sure there is a significant difference (which
>>>>>>> there is
>>>>>>> for many services in Mantle Business Artifacts, though I haven't even
>>>>>>> come
>>>>>>> close to testing all of them this way).
>>>>>>>>>
>>>>>>>>> The same concept would apply to a read-only entity cache... some
>>>>>>>>> things
>>>>>>> might be possible to support, but would NOT improve performance
>>>>>>> making them
>>>>>>> a moot point.
>>>>>>>>>
>>>>>>>>> I don't know if I've written enough to convince everyone listening
>>>>>>>>> that
>>>>>>> even attempting a universal read-only entity cache is a useless
>>>>>>> idea... I'm
>>>>>>> sure some will still like the idea. If anyone gets into it and wants
>>>>>>> to try
>>>>>>> it out in their own branch of OFBiz, great... knock yourself out
>>>>>>> (probably
>>>>>>> literally...). But PLEASE no one ever commit something like this to
>>>>>>> the
>>>>>>> primary branch in the repo... not EVER.
>>>>>>>>>
>>>>>>>>> The whole idea that the OFBiz entity cache has had more limited
>>>>>>>>> ability
>>>>>>> to handle different scenarios in the past than it does now is not an
>>>>>>> argument of any sort supporting the idea of taking the entity cache
>>>>>>> to the
>>>>>>> ultimate possible end... which theoretically isn't even that far from
>>>>>>> where
>>>>>>> it is now.
>>>>>>>>>
>>>>>>>>> To apply a more useful standard the arguments should be for a
>>>>>>>>> _useful_
>>>>>>> objective, which means increasing performance. I guarantee an always
>>>>>>> used
>>>>>>> find cache will NOT increase performance, it will kill it dead and
>>>>>>> cause
>>>>>>> infinite concurrency headaches in the process.
>>>>>>>>>
>>>>>>>>> -David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 19 Mar 2015, at 10:46, Adrian Crum <
>>>>>>> [hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> The translation to English is not good, but I think I understand
>>>>>>>>>> what
>>>>>>> you are saying.
>>>>>>>>>>
>>>>>>>>>> The entity values in the cache MUST be immutable - because
>>>>>>>>>> multiple
>>>>>>> threads share the values. To do otherwise would require complicated
>>>>>>> synchronization code in GenericValue (which would cause blocking and
>>>>>>> hurt
>>>>>>> performance).
>>>>>>>>>>
>>>>>>>>>> When I first started working on the entity cache issues, it
>>>>>>>>>> appeared
>>>>>>> to me that mutable entity values may have been in the original design
>>>>>>> (to
>>>>>>> enable a write-through cache). That is my guess - I am not sure. At
>>>>>>> some
>>>>>>> time, the entity values in the cache were made immutable, but the
>>>>>>> change
>>>>>>> was incomplete - some cached entity values were immutable and others
>>>>>>> were
>>>>>>> not. That is one of the things I fixed - I made sure ALL entity
>>>>>>> values
>>>>>>> coming from the cache are immutable.
>>>>>>>>>>
>>>>>>>>>> One way we can eliminate the additional complication of cloning
>>>>>>> immutable entity values is to wrap the List in a custom Iterator
>>>>>>> implementation that automatically clones elements as they are
>>>>>>> retrieved
>>>>>>> from the List. The drawback is the performance hit - because you
>>>>>>> would be
>>>>>>> cloning values that might not get modified. I think it is more
>>>>>>> efficient to
>>>>>>> clone an entity value only when you intend to modify it.
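Adrian's wrapping-iterator idea can be sketched as follows (illustrative only, not the OFBiz GenericValue API): each element of a cached list is defensively copied as it is retrieved, so callers can mutate freely, at the cost of cloning values that may never be modified.

```java
import java.util.*;

// Sketch of an iterator that clones cached values on retrieval. The
// shared, immutable cached element is never exposed to the caller.
public class CloningIterator implements Iterator<Map<String, Object>> {
    private final Iterator<Map<String, Object>> inner;

    public CloningIterator(List<Map<String, Object>> cachedList) {
        this.inner = cachedList.iterator();
    }

    @Override
    public boolean hasNext() {
        return inner.hasNext();
    }

    @Override
    public Map<String, Object> next() {
        // Clone on retrieval: mutations affect only the copy.
        return new HashMap<>(inner.next());
    }
}
```

Cloning only at the point of modification, as Adrian prefers, avoids this per-retrieval copy.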
>>>>>>>>>>
>>>>>>>>>> Adrian Crum
>>>>>>>>>> Sandglass Software
>>>>>>>>>> www.sandglass-software.com
>>>>>>>>>>
>>>>>>>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 18/03/2015 13:16, Adrian Crum wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> If you code Delegator calls to avoid the cache, then there is no
>>>>>>>>>>>> way
>>>>>>>>>>>> for a sysadmin to configure the caching behavior - that bit of
>>>>>>>>>>>> code
>>>>>>>>>>>> will ALWAYS make a database call.
>>>>>>>>>>>>
>>>>>>>>>>>> If you make all Delegator calls use the cache, then there is an
>>>>>>>>>>>> additional complication that will add a bit more code: the
>>>>>>>>>>>> GenericValue instances retrieved from the cache are immutable -
>>>>>>>>>>>> if you
>>>>>>>>>>>> want to modify them, then you will have to clone them. So, this
>>>>>>>>>>>> approach can produce an additional line of code.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I don't see any logical reason why we need to keep a GenericValue
>>>>>>>>>>> that came from the cache immutable. In the larger vision, a
>>>>>>>>>>> developer would only indicate whether to force use of the cache
>>>>>>>>>>> during his process, just as OFBiz manages transactions, timezones,
>>>>>>>>>>> locales, auto-matching and so on by default.
>>>>>>>>>>> The entity engine would then work with the sysadmin's cache tuning.
>>>>>>>>>>>
>>>>>>>>>>> For example, delegator.find("Party", "partyId", partyId) would use
>>>>>>>>>>> the default parameters from cache.properties, and what happens on a
>>>>>>>>>>> store of a cached GenericValue afterwards would be the delegator's
>>>>>>>>>>> problem. I imagine a simple test like this:
>>>>>>>>>>> if (genericValue came from cache) {
>>>>>>>>>>>       if (value is already done) {
>>>>>>>>>>>          getFromDataBase
>>>>>>>>>>>          update Value
>>>>>>>>>>>       }
>>>>>>>>>>>       else refuse (or not I have a doubt :) )
>>>>>>>>>>> }
>>>>>>>>>>> store
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Nicolas
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>

Re: Entity Caching

Adrian Crum-3
I was thinking a maxSize setting of -1 means the cache is disabled. That
would be an easy change to make.

Having a separate enable/disable property will take a lot of rewriting.
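The proposed convention could be as small as a guard in the cache class. A hedged sketch (not current UtilCache code, where 0 means "no limit"; LRU eviction at maxSize is omitted to show only the disable check):

```java
import java.util.*;

// Sketch: treat a negative maxSize as "cache disabled", so every put
// is a no-op and every get is a miss. Illustration only.
public class SimpleCache<K, V> {
    private final int maxSize;
    private final Map<K, V> map = new LinkedHashMap<>();

    public SimpleCache(int maxSize) {
        this.maxSize = maxSize;
    }

    public void put(K key, V value) {
        if (maxSize < 0) {
            return; // -1 => disabled: store nothing
        }
        map.put(key, value);
    }

    public V get(K key) {
        if (maxSize < 0) {
            return null; // disabled: always a miss
        }
        return map.get(key);
    }
}
```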

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/22/2015 10:21 AM, Jacques Le Roux wrote:

> Maybe using 1 in the meantime? ;)
>
> Jacques
>
> On 22/03/2015 11:17, Adrian Crum wrote:
>> Oops, my bad. A maxSize setting of zero means there is no limit.
>>
>> I spent some time looking through the UtilCache code, and I don't see
>> any way to disable a cache. Generally speaking, any setting of zero
>> means there is no limit for that setting.
>>
>> It would be nice to have an enabled/disabled setting.
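If such a setting existed, it might look like the following cache.properties entries (the `.enabled` property and the `product.cache` name are hypothetical; no such setting exists in UtilCache today):

```properties
# Hypothetical -- illustration only, not supported by UtilCache:
default.enabled=true
product.cache.enabled=false

# Existing-style sizing property for comparison (0 = no limit):
default.maxSize=0
```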
>>
>> Adrian Crum
>> Sandglass Software
>> www.sandglass-software.com
>>
>> On 3/22/2015 9:38 AM, Adrian Crum wrote:
>>> Interesting. The current cache code ignores the maxSize setting.
>>>
>>> Adrian Crum
>>> Sandglass Software
>>> www.sandglass-software.com
>>>
>>> On 3/22/2015 7:38 AM, Adrian Crum wrote:
>>>> I don't see an enable/disable setting but
>>>>
>>>> default.maxSize=0 in cache.properties
>>>>
>>>> should do it.
>>>>
>>>>
>>>> Adrian Crum
>>>> Sandglass Software
>>>> www.sandglass-software.com
>>>>
>>>> On 3/22/2015 3:16 AM, Christian Carlow wrote:
>>>>> Is there a convenient setting for disabling cache completely as David
>>>>> mentioned he did?
>>>>>
>>>>> On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote:
>>>>>> I agree with Adrian that caching should be a sysadmin choice.
>>>>>>
>>>>>> I would also caution that measuring cache performance during
>>>>>> testing is
>>>>>> not a very useful activity. Testing tends to test one use case
>>>>>> once and
>>>>>> move on to the next.
>>>>>> In production, users tend to do the same thing over and over.
>>>>>> Testing might fill a shopping cart a few times and do a lot of other
>>>>>> administrative functions as many times. In real life, shopping carts
>>>>>> are filled much more frequently than catalog updates (one hopes).
>>>>>> Using
>>>>>> performance numbers from functional testing will be misleading.
>>>>>>
>>>>>> The other message that I get from David's discussion is that
>>>>>> caching systems
>>>>>> built by professional caching experts (database developers, as he
>>>>>> mentioned) worked better than caching systems built by application
>>>>>> developers.
>>>>>> It is likely that ehcache and the database built-in caching functions
>>>>>> will outperform caching systems built by OFBiz developers and will
>>>>>> handle the main cases better and will handle edge cases properly.
>>>>>> They
>>>>>> will probably integrate better and be easier to configure at
>>>>>> run-time or
>>>>>> during deployment. They will also be easier to tune by the system
>>>>>> administrator.
>>>>>>
>>>>>> I understand that Adrian needs to fix this quickly. I suppose that
>>>>>> caching could be eliminated to solve the problem while a better
>>>>>> solution
>>>>>> is implemented.
>>>>>>
>>>>>> Do we know what it will take to add enough ehcache to make the system
>>>>>> perform adequately to meet current requirements?
>>>>>>
>>>>>> Ron
>>>>>>
>>>>>>
>>>>>> On 21/03/2015 6:22 AM, Adrian Crum wrote:
>>>>>>> I will try to say it again, but differently.
>>>>>>>
>>>>>>> If I am a developer, I am not aware of the subtleties of caching
>>>>>>> various entities. Entity cache settings will be determined during
>>>>>>> staging. So, I write my code as if everything will be cached -
>>>>>>> leaving
>>>>>>> the door open for a sysadmin to configure caching during staging.
>>>>>>>
>>>>>>> During staging, a sysadmin can start off with caching disabled, and
>>>>>>> then switch on caching for various entities while performance tests
>>>>>>> are being run. After some time, the sysadmin will have cache
>>>>>>> settings
>>>>>>> that provide optimal throughput. Does that mean ALL entities are
>>>>>>> cached? No, only the ones that need to be.
>>>>>>>
>>>>>>> The point I'm trying to make is this: The decision to cache or not
>>>>>>> should be made by a sysadmin, not by a developer.
>>>>>>>
>>>>>>> Adrian Crum
>>>>>>> Sandglass Software
>>>>>>> www.sandglass-software.com
>>>>>>>
>>>>>>> On 3/21/2015 10:08 AM, Scott Gray wrote:
>>>>>>>>> My preference is to make ALL Delegator calls use the cache.
>>>>>>>>
>>>>>>>> Perhaps I misunderstood the above sentence? I responded because I
>>>>>>>> don't
>>>>>>>> think caching everything is a good idea
>>>>>>>>
>>>>>>>> On 21 Mar 2015 20:41, "Adrian Crum"
>>>>>>>> <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Thanks for the info David! I agree 100% with everything you said.
>>>>>>>>>
>>>>>>>>> There may be some misunderstanding about my advice. I suggested
>>>>>>>>> that
>>>>>>>> caching should be configured in the settings file, I did not
>>>>>>>> suggest
>>>>>>>> that
>>>>>>>> everything should be cached all the time.
>>>>>>>>>
>>>>>>>>> Like you said, JMeter tests can reveal what needs to be cached,
>>>>>>>>> and a
>>>>>>>> sysadmin can fine-tune performance by tweaking the cache settings.
>>>>>>>> The
>>>>>>>> problem I mentioned is this: A sysadmin can't improve
>>>>>>>> performance by
>>>>>>>> caching a particular entity if a developer has hard-coded it not
>>>>>>>> to be
>>>>>>>> cached.
>>>>>>>>>
>>>>>>>>> Btw, I removed the complicated condition checking in the condition
>>>>>>>>> cache
>>>>>>>> because it didn't work. Not only was the system spending a lot of
>>>>>>>> time
>>>>>>>> evaluating long lists of values (each value having a potentially
>>>>>>>> long
>>>>>>>> list
>>>>>>>> of conditions), at the end of the evaluation the result was
>>>>>>>> always a
>>>>>>>> cache
>>>>>>>> miss.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Adrian Crum
>>>>>>>>> Sandglass Software
>>>>>>>>> www.sandglass-software.com
>>>>>>>>>
>>>>>>>>> On 3/20/2015 9:22 PM, David E. Jones wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Stepping back a little, some history and theory of the entity
>>>>>>>>>> cache
>>>>>>>> might be helpful.
>>>>>>>>>>
>>>>>>>>>> The original intent of the entity cache was a simple way to keep
>>>>>>>> frequently used values/records closer to the code that uses
>>>>>>>> them, ie
>>>>>>>> in the
>>>>>>>> application server. One real world example of this is the goal
>>>>>>>> to be
>>>>>>>> able
>>>>>>>> to render ecommerce catalog and product pages without hitting the
>>>>>>>> database.
>>>>>>>>>>
>>>>>>>>>> Over time the entity caching was made more complex to handle more
>>>>>>>> caching scenarios, but still left to the developer to determine if
>>>>>>>> caching
>>>>>>>> is appropriate for the code they are writing.
>>>>>>>>>>
>>>>>>>>>> In theory is it possible to write an entity cache that can be
>>>>>>>>>> used
>>>>>>>>>> 100%
>>>>>>>> of the time? IMO the answer is NO. This is almost possible for
>>>>>>>> single
>>>>>>>> record caching, with the cache ultimately becoming an in-memory
>>>>>>>> relational
>>>>>>>> database running on the app server (with full transaction support,
>>>>>>>> etc)...
>>>>>>>> but for List caching it totally kills the whole concept. The
>>>>>>>> current
>>>>>>>> entity
>>>>>>>> cache keeps lists of results by the query condition used to get
>>>>>>>> those
>>>>>>>> results and this is very different from what a database does, and
>>>>>>>> makes
>>>>>>>> things rather messy and inefficient outside simple use cases.
>>>>>>>>>>
>>>>>>>>>> On top of these big functional issues (which are deal killers
>>>>>>>>>> IMO),
>>>>>>>> there is also the performance issue. The point, or intent at least,
>>>>>>>> of the
>>>>>>>> entity cache is to improve performance. As the cache gets more
>>>>>>>> complex the
>>>>>>>> performance will suffer, and because of the whole concept of
>>>>>>>> caching
>>>>>>>> results by queries the performance will be WORSE than the DB
>>>>>>>> performance
>>>>>>>> for the same queries in most cases. Databases are quite fast and
>>>>>>>> efficient,
>>>>>>>> and we'll never be able to reproduce their ability to scale and
>>>>>>>> search in
>>>>>>>> something like an in-memory entity cache, especially not
>>>>>>>> considering the
>>>>>>>> massive redundancy and overhead of caching lists of values by
>>>>>>>> condition.
>>>>>>>>>>
>>>>>>>>>> As an example of this in the real world: on a large OFBiz
>>>>>>>>>> project I
>>>>>>>> worked on that finished last year we went into production with the
>>>>>>>> entity
>>>>>>>> cache turned OFF, completely DISABLED. Why? When doing load testing
>>>>>>>> on a
>>>>>>>> whim one of the guys decided to try it without the entity cache
>>>>>>>> enabled,
>>>>>>>> and the body of JMeter tests that exercised a few dozen of the most
>>>>>>>> common
>>>>>>>> user paths through the system actually ran FASTER. The database
>>>>>>>> (MySQL in
>>>>>>>> this case) was hit over the network, but responded quickly
>>>>>>>> enough to
>>>>>>>> make
>>>>>>>> things work quite well for the various find queries, and FAR faster
>>>>>>>> for
>>>>>>>> updates, especially creates. This project was one of the higher
>>>>>>>> volume
>>>>>>>> projects I'm aware of for OFBiz, at peaks handling sustained
>>>>>>>> processing of
>>>>>>>> around 10 orders per second (36,000 per hour), with some short term
>>>>>>>> peaks
>>>>>>>> much higher, closer to 20-30 orders per second... and longer term
>>>>>>>> peaks
>>>>>>>> hitting over 200k orders in one day (north America only day time,
>>>>>>>> around a
>>>>>>>> 12 hour window).
>>>>>>>>>>
>>>>>>>>>> I found this to be curious so looked into it a bit more and the
>>>>>>>>>> main
>>>>>>>> performance culprit was updates, ESPECIALLY creates on any entity
>>>>>>>> that has
>>>>>>>> an active list cache. Auto-clearing that cache requires running the
>>>>>>>> condition for each cache entry on the record to see if it matches,
>>>>>>>> and if
>>>>>>>> it does then it is cleared. This could be made more efficient by
>>>>>>>> expanding
>>>>>>>> the reverse index concept to index all values of fields in
>>>>>>>> conditions...
>>>>>>>> though that would be fairly complex to implement because of the
>>>>>>>> wide
>>>>>>>> variety of conditions that CAN be performed on fields, and even
>>>>>>>> moreso when
>>>>>>>> they are combined with other logic... especially NOTs and ORs. This
>>>>>>>> could
>>>>>>>> potentially increase performance, but would again add yet more
>>>>>>>> complexity
>>>>>>>> and overhead.
>>>>>>>>>>
>>>>>>>>>> To turn this dilemma into a nightmare, consider caching
>>>>>>>>>> view-entities.
>>>>>>>> In general as systems scale if you ever have to iterate over stuff
>>>>>>>> your
>>>>>>>> performance is going to get hit REALLY hard compared to indexed and
>>>>>>>> other
>>>>>>>> less than n operations.
>>>>>>>>>>
>>>>>>>>>> The main lesson from the story: caching, especially list caching,
>>>>>>>>>> should
>>>>>>>> ONLY be done in limited cases when the ratio of reads to write is
>>>>>>>> VERY
>>>>>>>> high, and more particularly the ratio of reads to creates. When
>>>>>>>> considering
>>>>>>>> whether to use a cache this should be considered carefully, because
>>>>>>>> records
>>>>>>>> are sometimes updated from places that developers are unaware,
>>>>>>>> sometimes at
>>>>>>>> surprising volumes. For example, it might seem great (and help a
>>>>>>>> lot
>>>>>>>> in dev
>>>>>>>> and lower scale testing) to cache inventory information for viewing
>>>>>>>> on a
>>>>>>>> category screen, but always go to the DB to avoid stale data on a
>>>>>>>> product
>>>>>>>> detail screen and when adding to cart. The problem is that with
>>>>>>>> high
>>>>>>>> order
>>>>>>>> volumes the inventory data is pretty much constantly being updated,
>>>>>>>> so the
>>>>>>>> caches are constantly... SLOWLY... being cleared as InventoryDetail
>>>>>>>> records
>>>>>>>> are created for reservations and issuances.
>>>>>>>>>>
>>>>>>>>>> To turn this nightmare into a deal killer, consider multiple
>>>>>>>>>> application
>>>>>>>> servers and the need for either a (SLOW) distributed cache or
>>>>>>>> (SLOW)
>>>>>>>> distributed cache clearing. These have to go over the network
>>>>>>>> anyway, so
>>>>>>>> might as well go to the database!
>>>>>>>>>>
>>>>>>>>>> In the case above where we decided to NOT use the entity cache at
>>>>>>>>>> all
>>>>>>>> the tests were run on one really beefy server showing that
>>>>>>>> disabling the
>>>>>>>> cache was faster. When we ran it in a cluster of just 2 servers
>>>>>>>> with
>>>>>>>> direct
>>>>>>>> DCC (the best case scenario for a distributed cache) we not only
>>>>>>>> saw
>>>>>>>> a big
>>>>>>>> performance hit, but also got various run-time errors from stale
>>>>>>>> data.
>>>>>>>>>>
>>>>>>>>>> I really don't how anyone could back the concept of caching all
>>>>>>>>>> finds by
>>>>>>>> default... you don't even have to imagine edge cases, just consider
>>>>>>>> the
>>>>>>>> problems ALREADY being faced with more limited caching and how
>>>>>>>> often the
>>>>>>>> entity cache simply isn't a good solution.
>>>>>>>>>>
>>>>>>>>>> As for improving the entity caching in OFBiz, there are some
>>>>>>>>>> concepts in
>>>>>>>> Moqui that might be useful:
>>>>>>>>>>
>>>>>>>>>> 1. add a cache attribute to the entity definition with true,
>>>>>>>>>> false,
>>>>>>>>>> and
>>>>>>>> never options; true and false being defaults that can be
>>>>>>>> overridden by
>>>>>>>> code, and never being an absolute (OFBiz does have this option
>>>>>>>> IIRC);
>>>>>>>> this
>>>>>>>> would default to false, true being a useful setting for common
>>>>>>>> things
>>>>>>>> like
>>>>>>>> Enumeration, StatusItem, etc, etc
>>>>>>>>>>
>>>>>>>>>> 2. add general support in the entity engine find methods for a
>>>>>>>>>> "for
>>>>>>>> update" parameter, and if true don't cache (and pass this on to the
>>>>>>>> DB to
>>>>>>>> lock the record(s) being queried), also making the value mutable
>>>>>>>>>>
>>>>>>>>>> 3. a write-through per-transaction cache; you can do some really
>>>>>>>>>> cool
>>>>>>>> stuff with this, avoiding most database hits during a transaction
>>>>>>>> until the
>>>>>>>> end when the changes are dumped to the DB; the Moqui implementation
>>>>>>>> of this
>>>>>>>> concept even looks for cached records that any find condition would
>>>>>>>> require
>>>>>>>> to get results and does the query in-memory, not having to go to
>>>>>>>> the
>>>>>>>> database at all... and for other queries augments the results with
>>>>>>>> values
>>>>>>>> in the cache
>>>>>>>>>>
>>>>>>>>>> The whole concept of a write-through cache that is limited to the
>>>>>>>>>> scope
>>>>>>>> of a single transaction shows some of the issues you would run into
>>>>>>>> even if
>>>>>>>> trying to make the entity cache transactional. Especially with more
>>>>>>>> complex
>>>>>>>> finds it just falls apart. The current Moqui implementation handles
>>>>>>>> quite a
>>>>>>>> bit, but there are various things that I've run into testing it
>>>>>>>> with
>>>>>>>> real-world business services that are either a REAL pain to handle
>>>>>>>> (so I
>>>>>>>> haven't yet, but it is conceptually possible) or that I simply
>>>>>>>> can't
>>>>>>>> think
>>>>>>>> of any good way to handle... and for those you simply can't use the
>>>>>>>> write-through cache.
>>>>>>>>>>
>>>>>>>>>> There are some notes in the code for this, and some
>>>>>>>>>> code/comments to
>>>>>>>> more thoroughly communicate this concept, in this class in Moqui:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I should also say that my motivation to handle every edge case
>>>>>>>>>> even
>>>>>>>>>> for
>>>>>>>> this write-through cache is limited... yes there is room for
>>>>>>>> improvement
>>>>>>>> handling more scenarios, but how big will the performance increase
>>>>>>>> ACTUALLY
>>>>>>>> be for them? The efforts on this so far have been based on
>>>>>>>> profiling
>>>>>>>> results and making sure there is a significant difference (which
>>>>>>>> there is
>>>>>>>> for many services in Mantle Business Artifacts, though I haven't
>>>>>>>> even
>>>>>>>> come
>>>>>>>> close to testing all of them this way).
>>>>>>>>>>
>>>>>>>>>> The same concept would apply to a read-only entity cache... some
>>>>>>>>>> things
>>>>>>>> might be possible to support, but would NOT improve performance
>>>>>>>> making them
>>>>>>>> a moot point.
>>>>>>>>>>
>>>>>>>>>> I don't know if I've written enough to convince everyone
>>>>>>>>>> listening
>>>>>>>>>> that
>>>>>>>> even attempting a universal read-only entity cache is a useless
>>>>>>>> idea... I'm
>>>>>>>> sure some will still like the idea. If anyone gets into it and
>>>>>>>> wants
>>>>>>>> to try
>>>>>>>> it out in their own branch of OFBiz, great... knock yourself out
>>>>>>>> (probably
>>>>>>>> literally...). But PLEASE no one ever commit something like this to
>>>>>>>> the
>>>>>>>> primary branch in the repo... not EVER.
>>>>>>>>>>
>>>>>>>>>> The whole idea that the OFBiz entity cache has had more limited
>>>>>>>>>> ability
>>>>>>>> to handle different scenarios in the past than it does now is
>>>>>>>> not an
>>>>>>>> argument of any sort supporting the idea of taking the entity cache
>>>>>>>> to the
>>>>>>>> ultimate possible end... which theoretically isn't even that far
>>>>>>>> from
>>>>>>>> where
>>>>>>>> it is now.
>>>>>>>>>>
>>>>>>>>>> To apply a more useful standard the arguments should be for a
>>>>>>>>>> _useful_
>>>>>>>> objective, which means increasing performance. I guarantee an
>>>>>>>> always
>>>>>>>> used
>>>>>>>> find cache will NOT increase performance, it will kill it dead and
>>>>>>>> cause
>>>>>>>> infinite concurrency headaches in the process.
>>>>>>>>>>
>>>>>>>>>> -David
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 19 Mar 2015, at 10:46, Adrian Crum <
>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> The translation to English is not good, but I think I understand
>>>>>>>>>>> what
>>>>>>>> you are saying.
>>>>>>>>>>>
>>>>>>>>>>> The entity values in the cache MUST be immutable - because
>>>>>>>>>>> multiple
>>>>>>>> threads share the values. To do otherwise would require complicated
>>>>>>>> synchronization code in GenericValue (which would cause blocking
>>>>>>>> and
>>>>>>>> hurt
>>>>>>>> performance).
>>>>>>>>>>>
>>>>>>>>>>> When I first started working on the entity cache issues, it
>>>>>>>>>>> appeared
>>>>>>>> to me that mutable entity values may have been in the original
>>>>>>>> design
>>>>>>>> (to
>>>>>>>> enable a write-through cache). That is my guess - I am not sure. At
>>>>>>>> some
>>>>>>>> time, the entity values in the cache were made immutable, but the
>>>>>>>> change
>>>>>>>> was incomplete - some cached entity values were immutable and
>>>>>>>> others
>>>>>>>> were
>>>>>>>> not. That is one of the things I fixed - I made sure ALL entity
>>>>>>>> values
>>>>>>>> coming from the cache are immutable.
>>>>>>>>>>>
>>>>>>>>>>> One way we can eliminate the additional complication of cloning
>>>>>>>> immutable entity values is to wrap the List in a custom Iterator
>>>>>>>> implementation that automatically clones elements as they are
>>>>>>>> retrieved
>>>>>>>> from the List. The drawback is the performance hit - because you
>>>>>>>> would be
>>>>>>>> cloning values that might not get modified. I think it is more
>>>>>>>> efficient to
>>>>>>>> clone an entity value only when you intend to modify it.
>>>>>>>>>>>
>>>>>>>>>>> Adrian Crum
>>>>>>>>>>> Sandglass Software
>>>>>>>>>>> www.sandglass-software.com
>>>>>>>>>>>
>>>>>>>>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Le 18/03/2015 13:16, Adrian Crum a écrit :
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you code Delegator calls to avoid the cache, then there
>>>>>>>>>>>>> is no
>>>>>>>>>>>>> way
>>>>>>>>>>>>> for a sysadmin to configure the caching behavior - that bit of
>>>>>>>>>>>>> code
>>>>>>>>>>>>> will ALWAYS make a database call.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you make all Delegator calls use the cache, then there
>>>>>>>>>>>>> is an
>>>>>>>>>>>>> additional complication that will add a bit more code: the
>>>>>>>>>>>>> GenericValue instances retrieved from the cache are
>>>>>>>>>>>>> immutable -
>>>>>>>>>>>>> if you
>>>>>>>>>>>>> want to modify them, then you will have to clone them. So,
>>>>>>>>>>>>> this
>>>>>>>>>>>>> approach can produce an additional line of code.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I don't see any logical reason why we need to keep a
>>>>>>>>>>>> GenericValue that came from the cache immutable. In the larger
>>>>>>>>>>>> picture, a developer should only give caching information when
>>>>>>>>>>>> he wants to force use of the cache during his process, just as
>>>>>>>>>>>> OFBiz manages transactions, timezone, locale, auto-matching and
>>>>>>>>>>>> so on by default.
>>>>>>>>>>>> The entity engine would then work with sysadmin cache tuning.
>>>>>>>>>>>>
>>>>>>>>>>>> For example, delegator.find("Party", "partyId", partyId) would
>>>>>>>>>>>> use the default parameters from cache.properties, and afterwards
>>>>>>>>>>>> a store on a cached GenericValue is the delegator's problem. I
>>>>>>>>>>>> picture a simple test like this:
>>>>>>>>>>>> if (genericValue came from cache) {
>>>>>>>>>>>>       if (value is already done) {
>>>>>>>>>>>>          getFromDataBase
>>>>>>>>>>>>          update Value
>>>>>>>>>>>>       }
>>>>>>>>>>>>       else refuse (or not, I have a doubt :) )
>>>>>>>>>>>> }
>>>>>>>>>>>> store
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Nicolas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>
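Adrian's clone-only-when-you-modify advice above can be sketched roughly as follows. This is an illustrative toy, not the actual OFBiz GenericValue API; the class and method names are invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Adrian's pattern: the cache hands out immutable
// values shared across threads, and a caller clones a value only when it
// actually intends to modify it. Not the real OFBiz GenericValue API.
public class CachedValueSketch {
    private final Map<String, Object> fields;
    private final boolean mutable;

    private CachedValueSketch(Map<String, Object> fields, boolean mutable) {
        this.fields = fields;
        this.mutable = mutable;
    }

    // What the cache returns: an immutable view, safe to share across threads.
    public static CachedValueSketch fromCache(String field, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(field, value);
        return new CachedValueSketch(Collections.unmodifiableMap(m), false);
    }

    // Explicit, deliberate clone for a caller that needs to modify the value.
    public CachedValueSketch mutableClone() {
        return new CachedValueSketch(new HashMap<>(fields), true);
    }

    public void set(String field, Object value) {
        if (!mutable) {
            throw new UnsupportedOperationException("clone this cached value before modifying it");
        }
        fields.put(field, value);
    }

    public Object get(String field) {
        return fields.get(field);
    }

    public boolean isMutable() {
        return mutable;
    }
}
```

The clone cost is paid only on the write path, which is Adrian's argument against eagerly cloning every element in an Iterator wrapper.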

Re: Entity Caching

Jacques Le Roux
Administrator
Sounds good to me; it's only a matter of convention.

Jacques

Le 22/03/2015 11:31, Adrian Crum a écrit :

> I was thinking a maxSize setting of -1 means the cache is disabled. That would be an easy change to make.
>
> Having a separate enable/disable property will take a lot of rewriting.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 3/22/2015 10:21 AM, Jacques Le Roux wrote:
>> Maybe using 1 in the meantime ? ;)
>>
>> Jacques
>>
>> Le 22/03/2015 11:17, Adrian Crum a écrit :
>>> Oops, my bad. A maxSize setting of zero means there is no limit.
>>>
>>> I spent some time looking through the UtilCache code, and I don't see
>>> any way to disable a cache. Generally speaking, any setting of zero
>>> means there is no limit for that setting.
>>>
>>> It would be nice to have an enabled/disabled setting.
>>>
>>> Adrian Crum
>>> Sandglass Software
>>> www.sandglass-software.com
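The maxSize convention being discussed can be captured in a few lines. This is an illustrative sketch, not the actual UtilCache code: 0 meaning "no limit" reflects the current behavior Adrian describes, while -1 meaning "disabled" is only his proposal in this thread.

```java
// Illustrative sketch of the maxSize convention from the exchange above.
// 0 = unlimited (current UtilCache behavior per Adrian); -1 = disabled is
// the proposed convention, not something OFBiz actually implements.
public class CacheSizePolicy {
    public static final int UNLIMITED = 0;
    public static final int DISABLED = -1; // proposed, hypothetical

    public static boolean isDisabled(int maxSize) {
        return maxSize == DISABLED;
    }

    // Would a cache with this maxSize accept one more entry?
    public static boolean accepts(int maxSize, int currentSize) {
        if (isDisabled(maxSize)) {
            return false; // a disabled cache stores nothing
        }
        if (maxSize == UNLIMITED) {
            return true; // zero means "no limit", not "no cache"
        }
        return currentSize < maxSize;
    }
}
```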
>>>
>>> On 3/22/2015 9:38 AM, Adrian Crum wrote:
>>>> Interesting. The current cache code ignores the maxSize setting.
>>>>
>>>> Adrian Crum
>>>> Sandglass Software
>>>> www.sandglass-software.com
>>>>
>>>> On 3/22/2015 7:38 AM, Adrian Crum wrote:
>>>>> I don't see an enable/disable setting but
>>>>>
>>>>> default.maxSize=0 in cache.properties
>>>>>
>>>>> should do it.
>>>>>
>>>>>
>>>>> Adrian Crum
>>>>> Sandglass Software
>>>>> www.sandglass-software.com
>>>>>
>>>>> On 3/22/2015 3:16 AM, Christian Carlow wrote:
>>>>>> Is there a convenient setting for disabling cache completely as David
>>>>>> mentioned he did?
>>>>>>
>>>>>> On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote:
>>>>>>> I agree with Adrian that caching should be a sysadmin choice.
>>>>>>>
>>>>>>> I would also caution that measuring cache performance during
>>>>>>> testing is
>>>>>>> not a very useful activity. Testing tends to test one use case
>>>>>>> once and
>>>>>>> move on to the next.
>>>>>>> In production, users tend to do the same thing over and over.
>>>>>>> Testing might fill a shopping cart a few times and run a lot of
>>>>>>> other administrative functions just as many times. In real life,
>>>>>>> shopping carts are filled much more frequently than catalog updates
>>>>>>> (one hopes). Using performance numbers from functional testing will
>>>>>>> be misleading.
>>>>>>>
>>>>>>> The other message that I get from David's discussion is that caching
>>>>>>> systems built by professional caching experts (database developers,
>>>>>>> as he mentioned) worked better than caching systems built by
>>>>>>> application developers.
>>>>>>> It is likely that ehcache and the database built-in caching functions
>>>>>>> will outperform caching systems built by OFBiz developers and will
>>>>>>> handle the main cases better and will handle edge cases properly.
>>>>>>> They
>>>>>>> will probably integrate better and be easier to configure at
>>>>>>> run-time or
>>>>>>> during deployment. They will also be easier to tune by the system
>>>>>>> administrator.
>>>>>>>
>>>>>>> I understand that Adrian needs to fix this quickly. I suppose that
>>>>>>> caching could be eliminated to solve the problem while a better
>>>>>>> solution
>>>>>>> is implemented.
>>>>>>>
>>>>>>> Do we know what it will take to add enough ehcache to make the system
>>>>>>> perform adequately to meet current requirements?
>>>>>>>
>>>>>>> Ron
>>>>>>>
>>>>>>>
>>>>>>> On 21/03/2015 6:22 AM, Adrian Crum wrote:
>>>>>>>> I will try to say it again, but differently.
>>>>>>>>
>>>>>>>> If I am a developer, I am not aware of the subtleties of caching
>>>>>>>> various entities. Entity cache settings will be determined during
>>>>>>>> staging. So, I write my code as if everything will be cached -
>>>>>>>> leaving
>>>>>>>> the door open for a sysadmin to configure caching during staging.
>>>>>>>>
>>>>>>>> During staging, a sysadmin can start off with caching disabled, and
>>>>>>>> then switch on caching for various entities while performance tests
>>>>>>>> are being run. After some time, the sysadmin will have cache
>>>>>>>> settings
>>>>>>>> that provide optimal throughput. Does that mean ALL entities are
>>>>>>>> cached? No, only the ones that need to be.
>>>>>>>>
>>>>>>>> The point I'm trying to make is this: The decision to cache or not
>>>>>>>> should be made by a sysadmin, not by a developer.
>>>>>>>>
>>>>>>>> Adrian Crum
>>>>>>>> Sandglass Software
>>>>>>>> www.sandglass-software.com
>>>>>>>>
>>>>>>>> On 3/21/2015 10:08 AM, Scott Gray wrote:
>>>>>>>>>> My preference is to make ALL Delegator calls use the cache.
>>>>>>>>>
>>>>>>>>> Perhaps I misunderstood the above sentence? I responded because I
>>>>>>>>> don't
>>>>>>>>> think caching everything is a good idea
>>>>>>>>>
>>>>>>>>> On 21 Mar 2015 20:41, "Adrian Crum"
>>>>>>>>> <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for the info David! I agree 100% with everything you said.
>>>>>>>>>>
>>>>>>>>>> There may be some misunderstanding about my advice. I suggested
>>>>>>>>>> that
>>>>>>>>> caching should be configured in the settings file, I did not
>>>>>>>>> suggest
>>>>>>>>> that
>>>>>>>>> everything should be cached all the time.
>>>>>>>>>>
>>>>>>>>>> Like you said, JMeter tests can reveal what needs to be cached,
>>>>>>>>>> and a
>>>>>>>>> sysadmin can fine-tune performance by tweaking the cache settings.
>>>>>>>>> The
>>>>>>>>> problem I mentioned is this: A sysadmin can't improve
>>>>>>>>> performance by
>>>>>>>>> caching a particular entity if a developer has hard-coded it not
>>>>>>>>> to be
>>>>>>>>> cached.
>>>>>>>>>>
>>>>>>>>>> Btw, I removed the complicated condition checking in the condition
>>>>>>>>>> cache
>>>>>>>>> because it didn't work. Not only was the system spending a lot of
>>>>>>>>> time
>>>>>>>>> evaluating long lists of values (each value having a potentially
>>>>>>>>> long
>>>>>>>>> list
>>>>>>>>> of conditions), at the end of the evaluation the result was
>>>>>>>>> always a
>>>>>>>>> cache
>>>>>>>>> miss.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Adrian Crum
>>>>>>>>>> Sandglass Software
>>>>>>>>>> www.sandglass-software.com
>>>>>>>>>>
>>>>>>>>>> On 3/20/2015 9:22 PM, David E. Jones wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Stepping back a little, some history and theory of the entity
>>>>>>>>>>> cache
>>>>>>>>> might be helpful.
>>>>>>>>>>>
>>>>>>>>>>> The original intent of the entity cache was a simple way to keep
>>>>>>>>> frequently used values/records closer to the code that uses
>>>>>>>>> them, ie
>>>>>>>>> in the
>>>>>>>>> application server. One real world example of this is the goal
>>>>>>>>> to be
>>>>>>>>> able
>>>>>>>>> to render ecommerce catalog and product pages without hitting the
>>>>>>>>> database.
>>>>>>>>>>>
>>>>>>>>>>> Over time the entity caching was made more complex to handle more
>>>>>>>>> caching scenarios, but still left to the developer to determine if
>>>>>>>>> caching
>>>>>>>>> is appropriate for the code they are writing.
>>>>>>>>>>>
>>>>>>>>>>> In theory is it possible to write an entity cache that can be
>>>>>>>>>>> used
>>>>>>>>>>> 100%
>>>>>>>>> of the time? IMO the answer is NO. This is almost possible for
>>>>>>>>> single
>>>>>>>>> record caching, with the cache ultimately becoming an in-memory
>>>>>>>>> relational
>>>>>>>>> database running on the app server (with full transaction support,
>>>>>>>>> etc)...
>>>>>>>>> but for List caching it totally kills the whole concept. The
>>>>>>>>> current
>>>>>>>>> entity
>>>>>>>>> cache keeps lists of results by the query condition used to get
>>>>>>>>> those
>>>>>>>>> results and this is very different from what a database does, and
>>>>>>>>> makes
>>>>>>>>> things rather messy and inefficient outside simple use cases.
>>>>>>>>>>>
>>>>>>>>>>> On top of these big functional issues (which are deal killers
>>>>>>>>>>> IMO),
>>>>>>>>> there is also the performance issue. The point, or intent at least,
>>>>>>>>> of the
>>>>>>>>> entity cache is to improve performance. As the cache gets more
>>>>>>>>> complex the
>>>>>>>>> performance will suffer, and because of the whole concept of
>>>>>>>>> caching
>>>>>>>>> results by queries the performance will be WORSE than the DB
>>>>>>>>> performance
>>>>>>>>> for the same queries in most cases. Databases are quite fast and
>>>>>>>>> efficient,
>>>>>>>>> and we'll never be able to reproduce their ability to scale and
>>>>>>>>> search in
>>>>>>>>> something like an in-memory entity cache, especially not
>>>>>>>>> considering the
>>>>>>>>> massive redundancy and overhead of caching lists of values by
>>>>>>>>> condition.
>>>>>>>>>>>
>>>>>>>>>>> As an example of this in the real world: on a large OFBiz
>>>>>>>>>>> project I
>>>>>>>>> worked on that finished last year we went into production with the
>>>>>>>>> entity
>>>>>>>>> cache turned OFF, completely DISABLED. Why? When doing load testing
>>>>>>>>> on a
>>>>>>>>> whim one of the guys decided to try it without the entity cache
>>>>>>>>> enabled,
>>>>>>>>> and the body of JMeter tests that exercised a few dozen of the most
>>>>>>>>> common
>>>>>>>>> user paths through the system actually ran FASTER. The database
>>>>>>>>> (MySQL in
>>>>>>>>> this case) was hit over the network, but responded quickly
>>>>>>>>> enough to
>>>>>>>>> make
>>>>>>>>> things work quite well for the various find queries, and FAR faster
>>>>>>>>> for
>>>>>>>>> updates, especially creates. This project was one of the higher
>>>>>>>>> volume
>>>>>>>>> projects I'm aware of for OFBiz, at peaks handling sustained
>>>>>>>>> processing of
>>>>>>>>> around 10 orders per second (36,000 per hour), with some short term
>>>>>>>>> peaks
>>>>>>>>> much higher, closer to 20-30 orders per second... and longer term
>>>>>>>>> peaks
>>>>>>>>> hitting over 200k orders in one day (north America only day time,
>>>>>>>>> around a
>>>>>>>>> 12 hour window).
>>>>>>>>>>>
>>>>>>>>>>> I found this to be curious so looked into it a bit more and the
>>>>>>>>>>> main
>>>>>>>>> performance culprit was updates, ESPECIALLY creates on any entity
>>>>>>>>> that has
>>>>>>>>> an active list cache. Auto-clearing that cache requires running the
>>>>>>>>> condition for each cache entry on the record to see if it matches,
>>>>>>>>> and if
>>>>>>>>> it does then it is cleared. This could be made more efficient by
>>>>>>>>> expanding
>>>>>>>>> the reverse index concept to index all values of fields in
>>>>>>>>> conditions...
>>>>>>>>> though that would be fairly complex to implement because of the
>>>>>>>>> wide
>>>>>>>>> variety of conditions that CAN be performed on fields, and even
>>>>>>>>> moreso when
>>>>>>>>> they are combined with other logic... especially NOTs and ORs. This
>>>>>>>>> could
>>>>>>>>> potentially increase performance, but would again add yet more
>>>>>>>>> complexity
>>>>>>>>> and overhead.
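David's auto-clearing cost can be made concrete with a toy sketch. This is illustrative only (a Predicate stands in for the OFBiz EntityCondition API, which it is not): results are cached per condition, so every create has to evaluate each cached condition against the new record.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Toy list cache keyed by query condition, as David describes. Clearing on
// create is O(number of cached entries): every condition must be tested
// against the new record. Predicate stands in for a real condition object.
public class ListCacheSketch {
    private final Map<Predicate<Map<String, Object>>, List<Map<String, Object>>> cache =
            new ConcurrentHashMap<>();

    public void putResult(Predicate<Map<String, Object>> condition,
                          List<Map<String, Object>> results) {
        cache.put(condition, results);
    }

    public int size() {
        return cache.size();
    }

    // Called on create: any cached list whose condition matches the new
    // record is now stale and must be dropped.
    public int clearMatching(Map<String, Object> newRecord) {
        int cleared = 0;
        Iterator<Predicate<Map<String, Object>>> it = cache.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().test(newRecord)) { // the full scan is the expensive part
                it.remove();
                cleared++;
            }
        }
        return cleared;
    }
}
```

A reverse index on the fields used in conditions, as David suggests, would avoid the full scan, but only for condition shapes the index can represent (NOTs and ORs being the hard cases he names).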
>>>>>>>>>>>
>>>>>>>>>>> To turn this dilemma into a nightmare, consider caching
>>>>>>>>>>> view-entities.
>>>>>>>> In general, as systems scale, if you ever have to iterate over
>>>>>>>> stuff your performance is going to get hit REALLY hard compared to
>>>>>>>> indexed and other sub-linear (less than O(n)) operations.
>>>>>>>>>>>
>>>>>>>>>>> The main lesson from the story: caching, especially list caching,
>>>>>>>>>>> should
>>>>>>>>> ONLY be done in limited cases when the ratio of reads to write is
>>>>>>>>> VERY
>>>>>>>>> high, and more particularly the ratio of reads to creates. When
>>>>>>>>> considering
>>>>>>>>> whether to use a cache this should be considered carefully, because
>>>>>>>>> records
>>>>>>>>> are sometimes updated from places that developers are unaware,
>>>>>>>>> sometimes at
>>>>>>>>> surprising volumes. For example, it might seem great (and help a
>>>>>>>>> lot
>>>>>>>>> in dev
>>>>>>>>> and lower scale testing) to cache inventory information for viewing
>>>>>>>>> on a
>>>>>>>>> category screen, but always go to the DB to avoid stale data on a
>>>>>>>>> product
>>>>>>>>> detail screen and when adding to cart. The problem is that with
>>>>>>>>> high
>>>>>>>>> order
>>>>>>>>> volumes the inventory data is pretty much constantly being updated,
>>>>>>>>> so the
>>>>>>>>> caches are constantly... SLOWLY... being cleared as InventoryDetail
>>>>>>>>> records
>>>>>>>>> are created for reservations and issuances.
>>>>>>>>>>>
>>>>>>>>>>> To turn this nightmare into a deal killer, consider multiple
>>>>>>>>>>> application
>>>>>>>>> servers and the need for either a (SLOW) distributed cache or
>>>>>>>>> (SLOW)
>>>>>>>>> distributed cache clearing. These have to go over the network
>>>>>>>>> anyway, so
>>>>>>>>> might as well go to the database!
>>>>>>>>>>>
>>>>>>>>>>> In the case above where we decided to NOT use the entity cache at
>>>>>>>>>>> all
>>>>>>>>> the tests were run on one really beefy server showing that
>>>>>>>>> disabling the
>>>>>>>>> cache was faster. When we ran it in a cluster of just 2 servers
>>>>>>>>> with
>>>>>>>>> direct
>>>>>>>>> DCC (the best case scenario for a distributed cache) we not only
>>>>>>>>> saw
>>>>>>>>> a big
>>>>>>>>> performance hit, but also got various run-time errors from stale
>>>>>>>>> data.
>>>>>>>>>>>
>>>>>>>>>>> I really don't see how anyone could back the concept of caching
>>>>>>>>>>> all finds by
>>>>>>>>> default... you don't even have to imagine edge cases, just consider
>>>>>>>>> the
>>>>>>>>> problems ALREADY being faced with more limited caching and how
>>>>>>>>> often the
>>>>>>>>> entity cache simply isn't a good solution.
>>>>>>>>>>>
>>>>>>>>>>> As for improving the entity caching in OFBiz, there are some
>>>>>>>>>>> concepts in
>>>>>>>>> Moqui that might be useful:
>>>>>>>>>>>
>>>>>>>>>>> 1. add a cache attribute to the entity definition with true,
>>>>>>>>>>> false,
>>>>>>>>>>> and
>>>>>>>>> never options; true and false being defaults that can be
>>>>>>>>> overridden by
>>>>>>>>> code, and never being an absolute (OFBiz does have this option
>>>>>>>>> IIRC);
>>>>>>>>> this
>>>>>>>>> would default to false, true being a useful setting for common
>>>>>>>>> things
>>>>>>>>> like
>>>>>>>>> Enumeration, StatusItem, etc, etc
>>>>>>>>>>>
>>>>>>>>>>> 2. add general support in the entity engine find methods for a
>>>>>>>>>>> "for
>>>>>>>>> update" parameter, and if true don't cache (and pass this on to the
>>>>>>>>> DB to
>>>>>>>>> lock the record(s) being queried), also making the value mutable
>>>>>>>>>>>
>>>>>>>>>>> 3. a write-through per-transaction cache; you can do some really
>>>>>>>>>>> cool
>>>>>>>>> stuff with this, avoiding most database hits during a transaction
>>>>>>>>> until the
>>>>>>>>> end when the changes are dumped to the DB; the Moqui implementation
>>>>>>>>> of this
>>>>>>>>> concept even looks for cached records that any find condition would
>>>>>>>>> require
>>>>>>>>> to get results and does the query in-memory, not having to go to
>>>>>>>>> the
>>>>>>>>> database at all... and for other queries augments the results with
>>>>>>>>> values
>>>>>>>>> in the cache
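The per-transaction write-through idea can be sketched minimally as follows. This is a greatly simplified illustration of the concept, not Moqui's TransactionCache: buffer writes in memory for the duration of the transaction, let reads prefer buffered values, and dump everything to the database at commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Minimal illustration of a transaction-scoped write-through cache: creates
// are buffered instead of hitting the database, reads check the buffer
// first, and flush() pushes all buffered changes to the store at commit.
public class TransactionCacheSketch {
    // insertion-ordered so flush replays writes in order
    private final Map<String, Map<String, Object>> buffered = new LinkedHashMap<>();

    public void create(String pk, Map<String, Object> value) {
        buffered.put(pk, value); // no database hit during the transaction
    }

    // Read-through: a buffered value wins over the backing store.
    public Map<String, Object> find(String pk, Map<String, Map<String, Object>> backingStore) {
        Map<String, Object> v = buffered.get(pk);
        return v != null ? v : backingStore.get(pk);
    }

    // Commit: dump buffered changes to the store in one pass.
    public int flush(BiConsumer<String, Map<String, Object>> store) {
        int n = buffered.size();
        buffered.forEach(store);
        buffered.clear();
        return n;
    }
}
```

The hard part David points out is not this happy path but making arbitrary find conditions see the buffered values, which is where the real implementation gets complicated.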
>>>>>>>>>>>
>>>>>>>>>>> The whole concept of a write-through cache that is limited to the
>>>>>>>>>>> scope
>>>>>>>>> of a single transaction shows some of the issues you would run into
>>>>>>>>> even if
>>>>>>>>> trying to make the entity cache transactional. Especially with more
>>>>>>>>> complex
>>>>>>>>> finds it just falls apart. The current Moqui implementation handles
>>>>>>>>> quite a
>>>>>>>>> bit, but there are various things that I've run into testing it
>>>>>>>>> with
>>>>>>>>> real-world business services that are either a REAL pain to handle
>>>>>>>>> (so I
>>>>>>>>> haven't yet, but it is conceptually possible) or that I simply
>>>>>>>>> can't
>>>>>>>>> think
>>>>>>>>> of any good way to handle... and for those you simply can't use the
>>>>>>>>> write-through cache.
>>>>>>>>>>>
>>>>>>>>>>> There are some notes in the code for this, and some
>>>>>>>>>>> code/comments to
>>>>>>>>> more thoroughly communicate this concept, in this class in Moqui:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I should also say that my motivation to handle every edge case
>>>>>>>>>>> even
>>>>>>>>>>> for
>>>>>>>>> this write-through cache is limited... yes there is room for
>>>>>>>>> improvement
>>>>>>>>> handling more scenarios, but how big will the performance increase
>>>>>>>>> ACTUALLY
>>>>>>>>> be for them? The efforts on this so far have been based on
>>>>>>>>> profiling
>>>>>>>>> results and making sure there is a significant difference (which
>>>>>>>>> there is
>>>>>>>>> for many services in Mantle Business Artifacts, though I haven't
>>>>>>>>> even
>>>>>>>>> come
>>>>>>>>> close to testing all of them this way).
>>>>>>>>>>>
>>>>>>>>>>> The same concept would apply to a read-only entity cache... some
>>>>>>>>>>> things
>>>>>>>>> might be possible to support, but would NOT improve performance
>>>>>>>>> making them
>>>>>>>>> a moot point.
>>>>>>>>>>>
>>>>>>>>>>> I don't know if I've written enough to convince everyone
>>>>>>>>>>> listening
>>>>>>>>>>> that
>>>>>>>>> even attempting a universal read-only entity cache is a useless
>>>>>>>>> idea... I'm
>>>>>>>>> sure some will still like the idea. If anyone gets into it and
>>>>>>>>> wants
>>>>>>>>> to try
>>>>>>>>> it out in their own branch of OFBiz, great... knock yourself out
>>>>>>>>> (probably
>>>>>>>>> literally...). But PLEASE no one ever commit something like this to
>>>>>>>>> the
>>>>>>>>> primary branch in the repo... not EVER.
>>>>>>>>>>>
>>>>>>>>>>> The whole idea that the OFBiz entity cache has had more limited
>>>>>>>>>>> ability
>>>>>>>>> to handle different scenarios in the past than it does now is
>>>>>>>>> not an
>>>>>>>>> argument of any sort supporting the idea of taking the entity cache
>>>>>>>>> to the
>>>>>>>>> ultimate possible end... which theoretically isn't even that far
>>>>>>>>> from
>>>>>>>>> where
>>>>>>>>> it is now.
>>>>>>>>>>>
>>>>>>>>>>> To apply a more useful standard the arguments should be for a
>>>>>>>>>>> _useful_
>>>>>>>>> objective, which means increasing performance. I guarantee an
>>>>>>>>> always
>>>>>>>>> used
>>>>>>>>> find cache will NOT increase performance, it will kill it dead and
>>>>>>>>> cause
>>>>>>>>> infinite concurrency headaches in the process.
>>>>>>>>>>>
>>>>>>>>>>> -David
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 19 Mar 2015, at 10:46, Adrian Crum <
>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> The translation to English is not good, but I think I understand
>>>>>>>>>>>> what
>>>>>>>>> you are saying.
>>>>>>>>>>>>
>>>>>>>>>>>> The entity values in the cache MUST be immutable - because
>>>>>>>>>>>> multiple
>>>>>>>>> threads share the values. To do otherwise would require complicated
>>>>>>>>> synchronization code in GenericValue (which would cause blocking
>>>>>>>>> and
>>>>>>>>> hurt
>>>>>>>>> performance).
>>>>>>>>>>>>
>>>>>>>>>>>> When I first started working on the entity cache issues, it
>>>>>>>>>>>> appeared
>>>>>>>>> to me that mutable entity values may have been in the original
>>>>>>>>> design
>>>>>>>>> (to
>>>>>>>>> enable a write-through cache). That is my guess - I am not sure. At
>>>>>>>>> some
>>>>>>>>> time, the entity values in the cache were made immutable, but the
>>>>>>>>> change
>>>>>>>>> was incomplete - some cached entity values were immutable and
>>>>>>>>> others
>>>>>>>>> were
>>>>>>>>> not. That is one of the things I fixed - I made sure ALL entity
>>>>>>>>> values
>>>>>>>>> coming from the cache are immutable.
>>>>>>>>>>>>
>>>>>>>>>>>> One way we can eliminate the additional complication of cloning
>>>>>>>>> immutable entity values is to wrap the List in a custom Iterator
>>>>>>>>> implementation that automatically clones elements as they are
>>>>>>>>> retrieved
>>>>>>>>> from the List. The drawback is the performance hit - because you
>>>>>>>>> would be
>>>>>>>>> cloning values that might not get modified. I think it is more
>>>>>>>>> efficient to
>>>>>>>>> clone an entity value only when you intend to modify it.
>>>>>>>>>>>>
>>>>>>>>>>>> Adrian Crum
>>>>>>>>>>>> Sandglass Software
>>>>>>>>>>>> www.sandglass-software.com
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Le 18/03/2015 13:16, Adrian Crum a écrit :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you code Delegator calls to avoid the cache, then there
>>>>>>>>>>>>>> is no
>>>>>>>>>>>>>> way
>>>>>>>>>>>>>> for a sysadmin to configure the caching behavior - that bit of
>>>>>>>>>>>>>> code
>>>>>>>>>>>>>> will ALWAYS make a database call.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you make all Delegator calls use the cache, then there
>>>>>>>>>>>>>> is an
>>>>>>>>>>>>>> additional complication that will add a bit more code: the
>>>>>>>>>>>>>> GenericValue instances retrieved from the cache are
>>>>>>>>>>>>>> immutable -
>>>>>>>>>>>>>> if you
>>>>>>>>>>>>>> want to modify them, then you will have to clone them. So,
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>> approach can produce an additional line of code.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't see any logical reason why we need to keep a
>>>>>>>>>>>>> GenericValue that came from the cache immutable. In the larger
>>>>>>>>>>>>> picture, a developer should only give caching information when
>>>>>>>>>>>>> he wants to force use of the cache during his process, just as
>>>>>>>>>>>>> OFBiz manages transactions, timezone, locale, auto-matching and
>>>>>>>>>>>>> so on by default.
>>>>>>>>>>>>> The entity engine would then work with sysadmin cache tuning.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, delegator.find("Party", "partyId", partyId) would
>>>>>>>>>>>>> use the default parameters from cache.properties, and afterwards
>>>>>>>>>>>>> a store on a cached GenericValue is the delegator's problem. I
>>>>>>>>>>>>> picture a simple test like this:
>>>>>>>>>>>>> if (genericValue came from cache) {
>>>>>>>>>>>>>       if (value is already done) {
>>>>>>>>>>>>>          getFromDataBase
>>>>>>>>>>>>>          update Value
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>       else refuse (or not, I have a doubt :) )
>>>>>>>>>>>>> }
>>>>>>>>>>>>> store
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nicolas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>
>