Adrian,

I've updated to the latest OFBiz code (revision 1374598) and am trying to set up our code to use the new changes to the service engine and job poller. Here are a few questions:

1. Instantiating a new dispatcher to run a service.

We used to instantiate a LocalDispatcher to run a service with the following code:

LocalDispatcher olapDispatcher = GenericDispatcher.getLocalDispatcher("some dispatcher name", olapDelegator);

Now it looks like we have a factory object that creates the dispatcher if one is not already created with that name. The method is createLocalDispatcher, but it is not a static method, so a GenericDispatcherFactory needs to be instantiated first:

LocalDispatcher olapDispatcher = new GenericDispatcherFactory().createLocalDispatcher(dbConfig, olapDelegator);

How should I be instantiating the GenericDispatcherFactory, or is there a preferred way to run a service from code?

2. Is the "wait-millis" attribute still required? The service-config.xsd still lists it as a required attribute for thread-pool, but I don't see it referenced anywhere in the code. If it is needed, how does it work?

3. If I understand the service configuration file correctly, I can configure the service engine to work against multiple pools (see example config below). If I want to run some services in specific pools, can I use the LocalDispatcher.schedule() method with an immediate run time but specify the pool I want them to use?

We need this functionality for our data warehouse processing. We try to provide real-time reports, but our database cannot handle a high number of data warehouse updates during heavy loads. By configuring only one server to service a particular pool, we can limit the number of concurrent processes running those services.

<thread-pool send-to-pool="pool"
             purge-job-days="4"
             failed-retry-min="3"
             ttl="120000"
             jobs="100"
             min-threads="2"
             max-threads="5"
             wait-millis="1000"
             poll-enabled="true"
             poll-db-millis="30000">
    <run-from-pool name="pool"/>
    <run-from-pool name="dwPool"/>
</thread-pool>

Thanks in advance for your help. I'll continue to test the new configuration as soon as I can get these answers.

Brett
On 8/22/2012 7:04 PM, Brett Palmer wrote:
> 1. Instantiating a new dispatcher to run a service.
> [...]
> How should I be instantiating the GenericDispatcherFactory, or is there
> a preferred way to run a service from code?

ServiceContainer.getLocalDispatcher(...)

> 2. Is the "wait-millis" attribute still required? The service-config.xsd
> still lists it as a required attribute for thread-pool, but I don't see
> it referenced anywhere in the code. If it is needed, how does it work?

It is not used. The schema has been updated to reflect that - make sure you are looking at the service-config.xsd file in your local copy.

> 3. If I understand the service configuration file correctly, I can
> configure the service engine to work against multiple pools (see example
> config below). If I want to run some services in specific pools, can I
> use the LocalDispatcher.schedule() method with an immediate run time but
> specify the pool I want them to use?

Correct. Just remember the multiple pools share a delegator, so they are all in the same data source.

> We need this functionality for our data warehouse processing. [...] By
> configuring only one server to service a particular pool, we can limit
> the number of concurrent processes running those services.
>
> <thread-pool send-to-pool="pool" purge-job-days="4" failed-retry-min="3"
>     ttl="120000" jobs="100" min-threads="2" max-threads="5"
>     wait-millis="1000" poll-enabled="true" poll-db-millis="30000">
>   <run-from-pool name="pool"/>
>   <run-from-pool name="dwPool"/>
> </thread-pool>

That configuration will work. That server will service the two pools.

> Thanks in advance for your help. I'll continue to test the new
> configuration as soon as I can get these answers.

Thank you for taking the time to test this. I have a client requirement similar to yours, but on a smaller scale - so I am very interested in how it all works out.

-Adrian
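ServiceContainer.getLocalDispatcher caches one dispatcher per name and only builds a new one on a cache miss, which is why callers no longer instantiate a factory themselves. A minimal self-contained sketch of that get-or-create pattern - the Delegator and LocalDispatcher classes below are tiny stand-ins I wrote for illustration, not the real OFBiz types:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for OFBiz's Delegator/LocalDispatcher types.
class Delegator {
    final String name;
    Delegator(String name) { this.name = name; }
}

class LocalDispatcher {
    final String name;
    final Delegator delegator;
    LocalDispatcher(String name, Delegator delegator) {
        this.name = name;
        this.delegator = delegator;
    }
}

// Sketch of the get-or-create behavior behind ServiceContainer.getLocalDispatcher:
// callers ask the container by name and never construct a factory directly.
public class DispatcherCache {
    private static final ConcurrentHashMap<String, LocalDispatcher> CACHE =
            new ConcurrentHashMap<>();

    public static LocalDispatcher getLocalDispatcher(String name, Delegator delegator) {
        // Build a dispatcher only if one with this name does not exist yet;
        // repeated calls with the same name return the cached instance.
        return CACHE.computeIfAbsent(name, n -> new LocalDispatcher(n, delegator));
    }
}
```

With the real API the call becomes a one-liner along the lines of `LocalDispatcher olapDispatcher = ServiceContainer.getLocalDispatcher("olap-dispatcher", olapDelegator);` - the dispatcher name here is illustrative, not from the thread.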
On 8/23/2012 8:46 AM, Adrian Crum wrote:
> On 8/22/2012 7:04 PM, Brett Palmer wrote:
>> <thread-pool send-to-pool="pool" purge-job-days="4" failed-retry-min="3"
>>     ttl="120000" jobs="100" min-threads="2" max-threads="5"
>>     wait-millis="1000" poll-enabled="true" poll-db-millis="30000">
>>   <run-from-pool name="pool"/>
>>   <run-from-pool name="dwPool"/>
>> </thread-pool>
>
> That configuration will work. That server will service the two pools.

I forgot to mention: if you're running lots of jobs, then you will want to increase the jobs (queue size) value. You mentioned in another thread that your application will run up to 10,000 jobs - in that case you should increase the jobs value to 1000 or more. The queue size affects memory, so there is an interaction between responsiveness and memory use.

The potential problem with the Job Poller (before and after the overhaul) is with asynchronous service calls (not scheduled jobs). When you run an async service, the service engine converts the service call to a job and places it in the queue. It is not persisted like scheduled jobs. If the Job Poller has just filled the queue with scheduled jobs, then there is no room for async services, and any attempt to queue an async service will fail (throws an exception "Unable to queue job").

I designed the new code so the service engine can check for that possibility, but I didn't change the service engine behavior. Instead, users should configure their <thread-pool> element(s) and applications carefully. For example, if your application schedules lots of jobs, then design it in a way that it schedules no more than (queue size - n) jobs at a time - to leave room for async services. Another option would be to have a server dedicated to servicing scheduled jobs - that way the potential clash with async services is not an issue.

-Adrian
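The "Unable to queue job" failure mode described above can be reproduced outside OFBiz with a plain java.util.concurrent executor over a bounded queue. This is a sketch of the general mechanism only - the class and method names are mine, the queue size of 2 is deliberately tiny, and this is not the actual JobPoller implementation:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class JobQueueSketch {
    // Try to queue a job; a full (or shut-down) pool rejects it, which is
    // the analogue of the service engine's "Unable to queue job" exception.
    static boolean offerJob(ThreadPoolExecutor pool, Runnable job) {
        try {
            pool.execute(job);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // no room left for this job
        }
    }

    public static void main(String[] args) throws Exception {
        // One worker thread and room for only two queued jobs,
        // loosely analogous to min-threads/max-threads=1 and jobs="2".
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(2));

        CountDownLatch release = new CountDownLatch(1);
        // Occupy the single worker with a long-running job.
        offerJob(pool, () -> {
            try { release.await(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        boolean scheduled1 = offerJob(pool, () -> {}); // queue slot 1
        boolean scheduled2 = offerJob(pool, () -> {}); // queue slot 2 - queue now full
        boolean asyncCall  = offerJob(pool, () -> {}); // arrives with no room left

        System.out.println(scheduled1 + " " + scheduled2 + " " + asyncCall);
        // prints: true true false

        release.countDown();
        pool.shutdown();
    }
}
```

The `(queue size - n)` headroom advice above amounts to never filling the last n slots with scheduled work, so an `offerJob` for an async call still succeeds.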
Adrian,

Thanks for the information. Please see my questions inline.

On Thu, Aug 23, 2012 at 6:24 AM, Adrian Crum <[hidden email]> wrote:
> The potential problem with the Job Poller (before and after the overhaul)
> is with asynchronous service calls (not scheduled jobs). When you run an
> async service, the service engine converts the service call to a job and
> places it in the queue. It is not persisted like scheduled jobs. If the
> Job Poller has just filled the queue with scheduled jobs, then there is
> no room for async services, and any attempt to queue an async service
> will fail (throws an exception "Unable to queue job").

I assume the "queue" is a memory queue and not the same as the JobSandbox pool that is stored in the database, which is why there is a limit to the queue. Let me know if that assumption is not correct.

If you run an async service and set the "persist" option to true, will you still hit the Job Poller limit, or will the job be persisted and run when the Job Poller has sufficient resources?

> I designed the new code so the service engine can check for that
> possibility, but I didn't change the service engine behavior. Instead,
> users should configure their <thread-pool> element(s) and applications
> carefully. For example, if your application schedules lots of jobs, then
> design it in a way that it schedules no more than (queue size - n) jobs
> at a time - to leave room for async services. Another option would be to
> have a server dedicated to servicing scheduled jobs - that way the
> potential clash with async services is not an issue.

I wasn't aware that the same queue was shared between async jobs and scheduled jobs - thanks again for the update. We like the idea of dedicating an app server to service specific scheduled jobs, as it controls the number of concurrent processes we run in production.

I'm still curious why the service engine dispatcher does not have an API to run an async service in a specified "pool". This seems like a simple addition, since there is an API to schedule a job to run in a specific pool. I understand there is potential for this to fail if the queue is full (unless my question above about the persisted job is a possible workaround).

From the information you provided, here is how we would likely use the new changes to the service engine and job poller:

Background: Our application is an online testing application with multiple OFBiz servers and a single OFBiz data warehouse. Tests are taken on the dedicated app servers, and when a test is done a data warehouse process picks up the test and processes it for the data warehouse reports. The reports are near real time, but during heavy testing periods we want to limit how many concurrent warehouse processes are running. Here are the steps in the process:

1. Configure a limited number of OFBiz servers to process scheduled data warehouse jobs that are submitted to a specific job pool (e.g. dwPool).

2. When a person has completed a test, the application creates a scheduled job with a current timestamp for when the service should be run. The scheduled job is assigned to the "dwPool". The servers configured in step 1 then process these jobs.

The above steps allow us to scale our solution horizontally by adding more OFBiz servers to handle online testing as needed. We are still able to handle near-real-time reporting, as we have dedicated servers assigned to process data warehouse requests. During light testing days the warehouse scheduled jobs process almost immediately; during heavy testing days they lag slightly, depending on the service request rate.

Question:

If a scheduled job is set with a current timestamp for the "startTime", but the JobPoller is behind because of a large number of scheduled service requests, will the JobPoller still pick up the scheduled job according to the order of startTime? Here is a specific example:

Current time: Aug. 23, 10:00 AM

- A scheduled job is created with a start time of Aug. 23, 10:00 AM.
- The JobPoller finishes processing its current queue of jobs at Aug. 23, 10:05 AM.
- The JobPoller queries the database for the next list of jobs to process.

Question: Will it pick up the jobs scheduled for Aug. 23, 10:00 AM even though the current time is past that time?

Thanks in advance for your response.

Brett
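The split in steps 1 and 2 can be sketched as two variants of the <thread-pool> config from the first message in this thread. This is a guess at one workable layout, not a tested configuration - the attribute values are illustrative, and wait-millis is dropped since Adrian noted earlier that it is unused:

```xml
<!-- Online-testing app servers: run jobs only from the general pool;
     warehouse jobs are scheduled into dwPool but never run here. -->
<thread-pool send-to-pool="pool"
             purge-job-days="4" failed-retry-min="3" ttl="120000"
             jobs="100" min-threads="2" max-threads="5"
             poll-enabled="true" poll-db-millis="30000">
    <run-from-pool name="pool"/>
</thread-pool>

<!-- Dedicated warehouse servers: also service dwPool, so concurrent
     warehouse processing is capped at max-threads per job server. -->
<thread-pool send-to-pool="pool"
             purge-job-days="4" failed-retry-min="3" ttl="120000"
             jobs="100" min-threads="2" max-threads="5"
             poll-enabled="true" poll-db-millis="30000">
    <run-from-pool name="pool"/>
    <run-from-pool name="dwPool"/>
</thread-pool>
```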
On 8/23/2012 4:42 PM, Brett Palmer wrote:
> Adrian,
>
> Thanks for the information. Please see my questions inline.
>
> I assume the "queue" is a memory queue and not the same as the JobSandbox
> pool that is stored in the database, which is why there is a limit to the
> queue. Let me know if that assumption is not correct.

That is correct. The queue size limit was put there to prevent the Job Scheduler from saturating or crashing the server. During a polling interval, the Job Manager will fill the queue with jobs scheduled to run. Any jobs that don't fit in the queue will be queued during the next polling interval. Queue service threads will run the queued jobs. Creating too many queue service threads will slow down queue throughput because of thread maintenance overhead. So, there are some parameters for users to tweak, and they interact with each other, but the overall objective is to configure the Job Scheduler so that it has good throughput but doesn't run out of control and swamp the server.

> If you run an async service and set the "persist" option to true, will
> you still hit the Job Poller limit, or will the job be persisted and run
> when the Job Poller has sufficient resources?

The async service will be persisted as a job scheduled to run now. The job will be in the pool specified in the <thread-pool> send-to-pool attribute.

> I'm still curious why the service engine dispatcher does not have an API
> to run an async service in a specified "pool". This seems like a simple
> addition, since there is an API to schedule a job to run in a specific
> pool. I understand there is potential for this to fail if the queue is
> full (unless my question above about the persisted job is a possible
> workaround).

If persist is true, then the async service will be assigned to the pool specified in the <thread-pool> send-to-pool attribute. If persist is false, then specifying a job pool would have no effect. We could create an "async-service-only" queue that would be unaffected by persisted jobs, but it can still be overrun. That's why I changed the code to allow the service engine to check for that possibility. I don't know what OFBiz should do by default in those scenarios, so I thought it best to leave the async service behavior the same (an exception is thrown). In other words, we could create the extra queue to give users a warm fuzzy feeling, but the same basic problem would still exist. I believe it is best to make it clear that, because of their nature, non-persisted async services are not guaranteed to run.

> 1. Configure a limited number of OFBiz servers to process scheduled data
> warehouse jobs that are submitted to a specific job pool (e.g. dwPool).
>
> 2. When a person has completed a test, the application creates a
> scheduled job with a current timestamp for when the service should be
> run. The scheduled job is assigned to the "dwPool". The servers
> configured in step 1 then process these jobs.

That sounds like a good strategy. An improvement would be to have the job servers service all pools. In that configuration the online testing application servers would have the <thread-pool> poll-enabled attribute set to "false" - so they will not run any jobs themselves. The only bottleneck would be the data source - and that bottleneck can be fixed by putting the JobSandbox entity on a separate data source and using a "jobs only" delegator.

> Question: Will it pick up the jobs scheduled for Aug. 23, 10:00 AM even
> though the current time is past that time?

Yes, the Job Manager will retrieve all jobs scheduled to start prior to now.

-Adrian
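The answer above amounts to a range condition on the start time: the poller selects every pending job with runTime <= now, oldest first, so a past-due job is never skipped just because its scheduled time has gone by. A self-contained sketch of that selection logic - this is my illustration, not the actual JobManager code, which builds an equivalent query against the JobSandbox entity:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PollSketch {
    static class Job {
        final String name;
        final long runTime; // scheduled start time, e.g. epoch millis
        Job(String name, long runTime) { this.name = name; this.runTime = runTime; }
    }

    // Select every job whose start time is at or before 'now', oldest first -
    // a job scheduled for 10:00 is still picked up by a poll that runs at 10:05.
    static List<Job> poll(List<Job> pending, long now) {
        List<Job> due = new ArrayList<>();
        for (Job j : pending) {
            if (j.runTime <= now) {
                due.add(j);
            }
        }
        due.sort(Comparator.comparingLong(j -> j.runTime));
        return due;
    }

    public static void main(String[] args) {
        List<Job> pending = List.of(
                new Job("dwJob", 1000),      // think "Aug. 23, 10:00 AM"
                new Job("laterJob", 1010));  // think "Aug. 23, 10:10 AM"
        // Poll at "10:05": dwJob is past due but still selected; laterJob is not.
        System.out.println(poll(pending, 1005).get(0).name); // prints: dwJob
    }
}
```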
