I just committed a bunch of changes to the Job Manager group of classes.
The changes help simplify the code and hopefully make the Job Manager more robust. On the other hand, I might have broken something. ;) I will monitor the mailing list for problems.

I believe the JobPoller settings in serviceengine.xml (the <thread-pool> element) should be changed. I think min-threads should be set to "2" and max-threads should be set to "5". Creating lots of threads can hurt throughput because the JVM spends more time managing them. I would be interested in hearing what others think.

-Adrian
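For context, here is a minimal sketch of how min-threads and max-threads would map onto a standard JDK thread pool; the class and the queue size below are illustrative only, not the actual JobPoller code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThreadPoolSketch {
    public static void main(String[] args) {
        int minThreads = 2;  // suggested min-threads value
        int maxThreads = 5;  // suggested max-threads value

        // A JDK pool keeps minThreads alive, grows toward maxThreads only
        // when the work queue is full, and retires idle extra threads -
        // which is why a small maximum rarely hurts throughput.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                minThreads, maxThreads,
                60L, TimeUnit.SECONDS,                   // keep-alive for extra threads
                new ArrayBlockingQueue<Runnable>(100));  // bounded work queue (size assumed)

        executor.execute(() -> System.out.println("job running"));
        executor.shutdown();
    }
}
```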
On 8/5/2012 11:02 AM, Adrian Crum wrote:
> [earlier message quoted in full, trimmed]

Thinking about this more, there are some other things that need to be fixed:

1. The JobPoller uses an unbounded queue. In a busy server, there is the potential the queue will grow in size until it causes an out-of-memory condition.

2. There is no accommodation for when a job cannot be added to the queue - it is just lost. We could add a dequeue method to the Job interface that will allow implementations to recover or reschedule the job when it can't be added to the queue.

3. There is a JobPoller instance per delegator, and each instance contains the number of threads configured in serviceengine.xml. With the current max-threads setting of 15, a multi-tenant installation with 100 tenants will create up to 1500 threads. (!!!) A smarter strategy might be to have a single JobPoller instance that services multiple JobManagers.

-Adrian
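Items 1 and 2 together point toward a bounded queue with a rejection hook. Below is a minimal sketch of that combination, assuming a hypothetical Job interface with a deQueue() recovery method; none of these names are the actual OFBiz API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical job abstraction; the real Job interface would differ.
interface Job extends Runnable {
    void deQueue(); // recover/reschedule when the queue rejects the job
}

public class BoundedPollerSketch {
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            2, 5, 60L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<Runnable>(100),   // bounded queue: addresses item 1
            (runnable, exec) -> {
                // Rejection hook: addresses item 2 - the job is not silently lost.
                if (runnable instanceof Job) {
                    ((Job) runnable).deQueue();
                }
            });

    public void queue(Job job) {
        executor.execute(job); // rejected jobs go through the handler above
    }
}
```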
Adrian,
Thanks for the update. Here are some feedback points on your listed items:

1. JobPoller out-of-memory errors: We've seen this a lot in production servers when the JobSandbox table is not constantly pruned of old records. It would be nice if the poller restricted its search to only active records it could process.

2. A queue for capturing missing records would be good. From item 1 above, we have had locks on the table when the poller is busy doing a scan and new jobs cannot be added or time out.

Other wish items:

- Ability to assign different service engines to process specific job types. We often run multiple application servers but want to limit how many concurrent jobs are run. For example, if I had 4 app servers connected to the same DB, I may only want one app server to service particular jobs. I thought this feature was possible, but when I tried to implement it by changing some of the configuration files it never worked correctly.

- JMS support for the service engine. It would be nice if there was a JMS interface for those that want to use JMS as their queuing mechanism for jobs.

Brett

On Sun, Aug 5, 2012 at 6:21 AM, Adrian Crum <[hidden email]> wrote:
> [quoted text trimmed]
Thanks Brett! I will be working on this again next weekend, and I will
try to include your suggestions.

-Adrian

On 8/5/2012 4:53 PM, Brett Palmer wrote:
> [quoted text trimmed]
In reply to this post by Brett
Hi Brett,
From: "Brett Palmer" <[hidden email]> > Adrian, > > Thanks for the update. Here are some feedback points on your listed items: > > 1. JobPoller get out-of-memor error. We've seen this a lot in production > servers when the JobSandbox table is not constantly pruned of old records. > It would be nice if the poller restricted its search for only active > records it could process. Did you use the purge-job-days setting in serviceengine.xml and the related purgeOldJobs? If not was there a reason? > 2. Queue for capturing missing records would be good. From item 1 above we > have had locks on table when the poller is busy doing a scan and new jobs > cannot be added or time out. +1 > Other wish items: > > - Ability to assign different service engines to process specific job > types. We often multiple application servers but want to limit how many > concurrent jobs are run. For example, if I had 4 app servers connected to > the same DB I may only want one app server to service particular jobs. I > thought this feature was possible but when I tried to implement it by > changing some of the configuration files it never worked correctly. Las time I used this it was with R4.0 and it worked, which problems did you cross exactly (if you remember) ? Thanks > - JMS support for the service engine. It would be nice if there was a JMS > interface for those that want to use JMS as their queuing mechanism for > jobs. +1 Jacques > > Brett > > On Sun, Aug 5, 2012 at 6:21 AM, Adrian Crum < > [hidden email]> wrote: > >> On 8/5/2012 11:02 AM, Adrian Crum wrote: >> >>> I just committed a bunch of changes to the Job Manager group of classes. >>> The changes help simplify the code and hopefully make the Job Manager more >>> robust. On the other hand, I might have broken something. ;) I will monitor >>> the mailing list for problems. >>> >>> I believe the JobPoller settings in serviceengine.xml (the <thread-pool> >>> element) should be changed. I think min-threads should be set to "2" and >>> max-threads should be set to "5". Creating lots of threads can hurt >>> throughput because the JVM spends more time managing them. I would be >>> interested in hearing what others think. >>> >> >> Thinking about this more, there are some other things that need to be >> fixed: >> >> 1. The JobPoller uses an unbounded queue. In a busy server, there is the >> potential the queue will grow in size until it causes an out-of-memory >> condition. >> 2. There is no accommodation for when a job cannot be added to the queue - >> it is just lost. We could add a dequeue method to the Job interface that >> will allow implementations to recover or reschedule the job when it can't >> be added to the queue. >> 3. There is a JobPoller instance per delegator, and each instance contains >> the number of threads configured in serviceengine.xml. With the current >> max-threads setting of 15, a multi-tenant installation with 100 tenants >> will create up to 1500 threads. (!!!) A smarter strategy might be to have a >> single JobPoller instance that services multiple JobManagers. >> >> -Adrian >> >> >> > |
|
We can use JMS as an option to replace the job queue and associated
threads, but we can't eliminate the entity-engine-based Job Manager code because there are too many places where the JobSandbox entity is manipulated (a bad practice, but it is done a lot).

-Adrian

Quoting Jacques Le Roux <[hidden email]>:
> [quoted text trimmed]
In reply to this post by Jacques Le Roux
Jacques,
I had to review some of my notes to remember what we were trying to do with the JobSandbox. Here are my replies to your questions:

1. Did you use the purge-job-days setting in serviceengine.xml and the related purgeOldJobs? If not, was there a reason?

We were not using the purgeOldJobs service. This was probably because we didn't understand how the service worked. We may have thought the service was specific to order-related jobs only, which would not have worked for us. Our jobs are custom service jobs for the particular application we are developing.

One problem that we had with most jobs that hit the JobSandbox (including the poller) was that it appeared they were doing full table scans instead of an indexed scan. These would cause problems for us when the JobSandbox grew larger, especially during heavy production days. We would often see transaction locks on the JobSandbox and I/O bottlenecks on the server in general due to the scans. The purgeOldJobs service may be a good solution for that if we could keep the JobSandbox to a reasonable number of records.

I created issue OFBIZ-3855 on this a couple of years ago when we tried to use the JobSandbox as a batch process service for multiple application servers. We were filling up the JobSandbox with 100k records over a short period of time. The poller was getting transaction timeouts before it could change the status of the next available job to process. I created a patch to allow a user to customize the transaction timeout for the poller. I thought I had submitted this patch, but looking at the Jira issue it doesn't look like it was ever submitted.

In the end we changed how we did our data warehouse processing. Increasing the transaction timeout didn't really solve the problem either; it just made it possible to extend the timeout length, which can have other consequences in the system.

If the community is still interested in the patch, I can submit it to Jira for a recent version from the trunk.

2. Configuring the service engine to run with multiple job pools.

As I'm looking at my notes, I believe the problem with configuring the service engine with multiple job pools was that there wasn't an API to run a service (async or synchronous) against a specific job service pool. You could only schedule a job to run against a particular pool.

For example, in the serviceengine.xml file you can configure a job to run in a particular job pool like the following:

<startup-service name="testScv" runtime-data-id="9900" runtime-delay="0" run-in-pool="pool"/>

You can also use the LocalDispatcher.schedule() method to schedule a job to run in a particular pool (see the sketch below).

What we needed was a way to configure our app servers to service different service pools but allow all app servers to request the service dynamically. This would allow us to limit the number of concurrent services that were run in our system. The default service engine lets all the app servers service the JobSandbox, which doesn't scale well for us during heavy production days.

This is one of the reasons we liked the idea of a JMS integration with the service engine. Then we could start up processes to listen to specific queues, and our application could write to the different queues. This would allow us to control the number of concurrent services processed at a time.

Let me know if you need any more information.

Thanks,

Brett

On Mon, Aug 6, 2012 at 1:10 AM, Jacques Le Roux <[hidden email]> wrote:
> [quoted text trimmed]
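As a rough illustration of the scheduling call Brett describes, here is a hedged sketch. The exact schedule() overload varies between OFBiz versions, so treat the signature and the pool name below as assumptions to verify against the LocalDispatcher javadocs.

```java
import java.util.Map;

public class PoolScheduleSketch {
    // org.ofbiz.service.LocalDispatcher is the real interface; the overload
    // used below is assumed for illustration - check the javadocs for the
    // exact parameter list in your OFBiz version.
    public static void scheduleInPool(org.ofbiz.service.LocalDispatcher dispatcher,
            Map<String, Object> context) throws Exception {
        long startTime = System.currentTimeMillis() + 5000; // run in 5 seconds
        // "warehousePool" is a hypothetical pool name matching a run-in-pool
        // configuration in serviceengine.xml; "testScv" comes from the example above.
        dispatcher.schedule("warehousePool", "testScv", context, startTime);
    }
}
```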
Thanks Brett! Your feedback and Jira issue will help a lot.
-Adrian

On 8/6/2012 10:52 PM, Brett Palmer wrote:
> [quoted text trimmed]
In reply to this post by Brett
Brett,
I think I solved your problems with my recent commits, ending with rev 1370566. Let me know if it helps.

-Adrian

On 8/5/2012 4:53 PM, Brett Palmer wrote:
> [quoted text trimmed]
Adrian,
Thanks, I'll take an update and try it out.

Brett

On Aug 7, 2012 4:23 PM, "Adrian Crum" <[hidden email]> wrote:
> [quoted text trimmed]
In reply to this post by Brett
Quoting Brett Palmer <[hidden email]>:
> [quoted text trimmed]

I just looked at that service and I don't like it. It is simplistic and takes a brute-force approach. Basically, the service is scheduled to run at midnight each day and it purges all jobs older than a configurable time duration. There is no guarantee the Job Poller will be idle when that service fires. The Java code tries to limit the purge to 1000-record batches. That setting is hard-coded and appears arbitrary.

I would prefer to put job purging back under the control of the Job Poller - where purges can be scheduled during queue idle periods and they can follow the same constraints as any other job.

-Adrian
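For concreteness, here is a minimal sketch of batched purging with a configurable (rather than hard-coded) batch size; the JobStore interface and the property name are hypothetical, not OFBiz code.

```java
import java.sql.Timestamp;
import java.util.List;

public class PurgeSketch {
    // Configurable batch size instead of a hard-coded 1000 (property name assumed).
    static final int BATCH_SIZE = Integer.getInteger("purge.batch.size", 1000);

    // Hypothetical storage abstraction standing in for the entity engine.
    interface JobStore {
        List<String> findJobsOlderThan(Timestamp cutoff, int limit);
        void delete(List<String> jobIds);
    }

    static void purgeOldJobs(JobStore store, Timestamp cutoff) {
        List<String> batch;
        // Delete in small batches so the purge never holds long table locks;
        // a poller-controlled version would run this only during idle periods.
        while (!(batch = store.findJobsOlderThan(cutoff, BATCH_SIZE)).isEmpty()) {
            store.delete(batch);
        }
    }
}
```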
From: <[hidden email]>
> [quoted text trimmed]
>
> I would prefer to put job purging back under the control of the Job
> Poller - where purges can be scheduled during queue idle periods and
> they can follow the same constraints as any other job.

That sounds like a good replacement indeed, +1

Jacques
In reply to this post by Brett
Hi Brett,
Interesting...

Brett Palmer wrote:
> We were not using the purgeOldJobs service. This was probably because we
> didn't understand how the service worked. [...]

I agree with Adrian, this can be perfected, using a smart dynamic way of purging old jobs during Job Poller idle periods.

> I created issue OFBIZ-3855 on this a couple of years ago when we tried to
> use the JobSandbox as a batch process service for multiple application
> servers. [...]

I put a comment there. I browsed (I can't really say reviewed) Adrian's recent work, after Jacopo's, and it seems to me that it should address your problem. Or at least is a sound foundation for that...

> What we needed was a way to configure our app servers to service different
> service pools but allow all app servers to request the service dynamically.

I see, you want to have this dynamically done with an API, to better handle where the jobs are running, not statically as done by the thread-pool attribute.

> This would allow us to limit the number of concurrent services that were
> run in our system.

If I understand correctly what you mean by "concurrent services" (I guess you mean jobs): when I want to avoid running concurrent services, I set the semaphore service attribute to "fail". Since it uses the ServiceSemaphore, it should span all service managers and thread pools which use the same DB. So far I have not run into issues with that, but maybe it can also be improved, notably to prevent collisions in the DB using SELECT FOR UPDATE.

> The default service engine lets all the app servers service the JobSandbox,
> which doesn't scale well for us during heavy production days.

Not sure I understand - you mean that assigning services to thread pools has no effect? I rather guess it was not sufficient, from your explanation above.

> This is one of the reasons we liked the idea of a JMS integration with the
> service engine. [...]

There is already a JMS integration with the service engine. I use it for the DCC: https://cwiki.apache.org/confluence/display/OFBIZ/Distributed+Entity+Cache+Clear+Mechanism
Do you want something more flexible, like the "dynamic thread-pool API" you suggested, more integrated?

Jacques
Jacques,
It sounds like what Adrian implemented would solve a lot of our problems with the service engine. Please see my comments inline.

On Wed, Aug 8, 2012 at 3:54 PM, Jacques Le Roux <[hidden email]> wrote:

> I see, you want to have this dynamically done with an API, to better
> handle where the jobs are running, not statically as done by the
> thread-pool attribute.

Correct - we wanted an API to run an async service and assign it to a particular pool. For example, localDispatcher.Async("PoolName", other params...). This was for our data warehouse process, which we wanted to be as close to real time as possible. For our application we would run multiple application servers talking to the same database. During heavy usage periods we could not have all app servers servicing these asynchronous requests, as they would be competing for limited resources on our database.

> If I understand correctly what you mean by "concurrent services" [...],
> when I want to avoid running concurrent services, I set the semaphore
> service attribute to "fail".

We were not aware of the semaphore service attribute, which would have helped. We ended up implementing a custom "SELECT FOR UPDATE" method on our servers, with a semaphore table to prevent more than one data warehouse process running on a single application server. We scheduled this service to run once every 5 minutes using the normal OFBiz scheduler. The problem was that during high loads the process would often not complete, and the service engine would start another service. We used the semaphore to set a flag in a semaphore table to limit the data warehouse to a single process per server (see the sketch below). Perhaps the semaphore service attribute could have done the same thing.

> You want something more flexible, like the "dynamic thread-pool API" you
> suggested, more integrated?

Yes - something more flexible and more integrated would help, since we do a lot of multi-server implementations with OFBiz.

Thanks for your help. I'll take a look at the recent commits and post any questions I have to the list.

Brett
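A minimal sketch of the SELECT FOR UPDATE semaphore pattern Brett describes, using plain JDBC; the table and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SemaphoreSketch {
    /**
     * Tries to claim the named semaphore row. Returns true if this process
     * won the lock. Assumes a hypothetical table:
     * SEMAPHORE(NAME VARCHAR PRIMARY KEY, IN_USE CHAR(1)).
     */
    public static boolean tryClaim(Connection conn, String name) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT IN_USE FROM SEMAPHORE WHERE NAME = ? FOR UPDATE")) {
            select.setString(1, name);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next() || "Y".equals(rs.getString("IN_USE"))) {
                    conn.rollback(); // missing row, or another server holds it
                    return false;
                }
            }
        }
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE SEMAPHORE SET IN_USE = 'Y' WHERE NAME = ?")) {
            update.setString(1, name);
            update.executeUpdate();
        }
        conn.commit(); // releases the row lock; the IN_USE flag keeps others out
        return true;
    }
}
```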
In reply to this post by Jacques Le Roux
On 8/8/2012 10:10 PM, Jacques Le Roux wrote:
> [quoted text trimmed]
>
>> I would prefer to put job purging back under the control of the Job
>> Poller - where purges can be scheduled during queue idle periods and
>> they can follow the same constraints as any other job.
>
> That sounds like a good replacement indeed
> +1

Implemented in rev 1371140.

-Adrian
In reply to this post by Adrian Crum-3
On 8/5/2012 1:21 PM, Adrian Crum wrote:
> [earlier messages quoted in full, trimmed]
>
> 3. There is a JobPoller instance per delegator, and each instance
> contains the number of threads configured in serviceengine.xml. With
> the current max-threads setting of 15, a multi-tenant installation
> with 100 tenants will create up to 1500 threads. (!!!) A smarter
> strategy might be to have a single JobPoller instance that services
> multiple JobManagers.

I fixed #1 and #2. I am considering working on #3, but I want some feedback first.

A JobPoller instance is created for each delegator. So, in a multi-tenant or multi-delegator scenario, multiple JobPollers will be created - which means one job queue per delegator and (threads per queue) threads per delegator. In a multi-server installation, things are multiplied: (# of servers * # of delegators) job queues. Fortunately, in that scenario we can disable the JobPoller on all but one server.

So, we are left with the potential problem of too many queues/threads being created on a multi-delegator or multi-tenant server. So, I think there should be one JobPoller instance that services all delegators. At each polling interval, the JobPoller gets a list of jobs from each delegator (JobManager) - creating a list of lists. Then the JobPoller creates a queue candidate list from the list of lists - using a round-robin approach so each delegator gets an equal opportunity to queue jobs. The JobPoller queues the candidate list, and any candidates that don't fit in the queue are rescheduled. With this approach the JobPoller can service any number of delegators without saturating the server.

What do you think?

-Adrian
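The round-robin merge Adrian describes is straightforward to sketch; the generic type below stands in for the real Job class, and the method is illustrative, not the committed code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class RoundRobinSketch {
    /**
     * Interleaves per-delegator job lists so each delegator gets an equal
     * opportunity to queue jobs: one job from each list in turn, until
     * every list is exhausted.
     */
    static <T> List<T> roundRobin(List<List<T>> listsOfJobs) {
        List<T> candidates = new ArrayList<>();
        List<Iterator<T>> iterators = new ArrayList<>();
        for (List<T> jobs : listsOfJobs) {
            iterators.add(jobs.iterator());
        }
        boolean tookOne = true;
        while (tookOne) {
            tookOne = false;
            for (Iterator<T> it : iterators) {
                if (it.hasNext()) {
                    candidates.add(it.next()); // one job per delegator per pass
                    tookOne = true;
                }
            }
        }
        return candidates; // anything that doesn't fit the queue gets rescheduled
    }
}
```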
Hi Adrian,
thanks for the excellent work you did on mini language and also here on the background jobs.

Your proposal sounds like the way to go, so +1.

Regards,
Hans

On 08/12/2012 06:36 PM, Adrian Crum wrote:
> [quoted text trimmed]
|
This indeed sounds like a good, simple, and easy-to-maintain solution to this problem. +1

Jacques |
|
Adrian,

I think the single JobPoller is a good idea and reduces the chance of too many JobPollers running on a machine.

We often set up multiple delegators to communicate with different databases. For example, our data warehouse is hosted on a separate server. These databases usually have a full OFBiz schema on them (JobSandbox table, etc.).

Here is how our data warehouse process works:

The application has several OFBiz servers talking to a primary database. These servers contain all the user information for our application. When a person logs into the application, they are redirected to a secondary OFBiz server that is used for running the application under heavy loads. The data is captured on the secondary server.

A data warehouse process is scheduled to run every 5 minutes on these secondary servers. The secondary servers have a delegator that talks to their local database and a delegator that talks to the data warehouse.

With the new job poller changes, would the poller pick up jobs from the data warehouse database, since it has a delegator that points to that instance?

For this example, we would need to make sure the job poller on the secondary server only serviced jobs from its local database (default delegator) and not our configured OLAP delegator.

Let me know if this will be possible with the new job poller or if you have any questions about my scenario. I realize this scenario is not the typical OFBiz e-commerce setup everyone is used to, but we use OFBiz to create lots of different types of applications and have found it very flexible for creating just about any type of ERP application.

Thanks for your work on the job poller.

Brett |
|
I just updated the schema with documentation that should help everyone
understand how to set up the Job Manager/Job Poller. It seems to me it can accommodate the scenario you described.

-Adrian |
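As a rough illustration of how Brett's restriction could be accommodated - with hypothetical names, not the actual OFBiz configuration or API - each JobManager could opt in or out of polling, so a shared JobPoller would skip the OLAP/data-warehouse delegator entirely and leave its jobs for the server that owns them.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public final class SharedJobPoller {

    // Illustrative stand-in for a per-delegator JobManager.
    public interface JobManager {
        boolean isPollEnabled();          // e.g. driven by per-delegator config
        List<Runnable> poll(int limit);   // fetch due jobs for this delegator
    }

    private final List<JobManager> managers = new CopyOnWriteArrayList<>();

    public void register(JobManager jm) {
        managers.add(jm);
    }

    // Called at each polling interval.
    public void pollAll(int limitPerDelegator) {
        for (JobManager jm : managers) {
            if (!jm.isPollEnabled()) {
                continue; // e.g. the OLAP delegator: never polled here
            }
            for (Runnable job : jm.poll(limitPerDelegator)) {
                job.run(); // in practice: offer to the bounded queue sketched earlier
            }
        }
    }
}
```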