Job Manager


Job Manager

Adrian Crum-3
I just committed a bunch of changes to the Job Manager group of classes.
The changes help simplify the code and hopefully make the Job Manager
more robust. On the other hand, I might have broken something. ;) I will
monitor the mailing list for problems.

I believe the JobPoller settings in serviceengine.xml (the <thread-pool>
element) should be changed. I think min-threads should be set to "2" and
max-threads should be set to "5". Creating lots of threads can hurt
throughput because the JVM spends more time managing them. I would be
interested in hearing what others think.
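For illustration, the proposal maps onto the <thread-pool> element like this (a minimal sketch; the element's other attributes are omitted here and would keep their existing values):

```xml
<!-- serviceengine.xml (sketch): only the two attributes discussed are shown -->
<thread-pool min-threads="2" max-threads="5"/>
```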

-Adrian




Re: Job Manager

Adrian Crum-3

Thinking about this more, there are some other things that need to be fixed:

1. The JobPoller uses an unbounded queue. In a busy server, there is the
potential the queue will grow in size until it causes an out-of-memory
condition.
2. There is no accommodation for when a job cannot be added to the queue
- it is just lost. We could add a dequeue method to the Job interface
that will allow implementations to recover or reschedule the job when it
can't be added to the queue.
3. There is a JobPoller instance per delegator, and each instance
contains the number of threads configured in serviceengine.xml. With the
current max-threads setting of 15, a multi-tenant installation with 100
tenants will create up to 1500 threads. (!!!) A smarter strategy might
be to have a single JobPoller instance that services multiple JobManagers.

-Adrian
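Points 1 and 2 could be sketched with a bounded executor whose rejection path hands the job back to the caller. This is plain java.util.concurrent, not OFBiz code; the class and method names are hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a bounded job pool that reports rejection to the
// caller instead of growing without limit or silently dropping the job.
public class BoundedJobPool {

    // Thread counts and queue capacity bound memory use (point 1).
    public static ThreadPoolExecutor create(int minThreads, int maxThreads, int queueSize) {
        return new ThreadPoolExecutor(
                minThreads, maxThreads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueSize),    // bounded, unlike an unbounded queue
                new ThreadPoolExecutor.AbortPolicy());  // rejection raises, it is not silent
    }

    // Returns false when the job cannot be queued, so the caller can
    // recover or reschedule it (the proposed dequeue hook, point 2).
    public static boolean submit(ThreadPoolExecutor pool, Runnable job) {
        try {
            pool.execute(job);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }
}
```

With one thread and a one-slot queue, a third concurrent job is rejected up front, giving the caller the chance to reschedule it rather than losing it.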



Re: Job Manager

Brett
Adrian,

Thanks for the update.  Here are some feedback points on your listed items:

1. JobPoller out-of-memory errors. We've seen this a lot on production
servers when the JobSandbox table is not regularly pruned of old records.
It would be nice if the poller restricted its search to only the active
records it could process.

2. A queue for capturing missed records would be good.  Related to item 1
above, we have had locks on the table while the poller is busy doing a scan,
so new jobs cannot be added or they time out.

Other wish items:

- Ability to assign different service engines to process specific job
types.  We often run multiple application servers but want to limit how
many concurrent jobs are run.  For example, if I had 4 app servers
connected to the same DB, I may only want one app server to service
particular jobs.  I thought this feature was possible, but when I tried to
implement it by changing some of the configuration files it never worked
correctly.

- JMS support for the service engine.  It would be nice if there was a JMS
interface for those that want to use JMS as their queuing mechanism for
jobs.


Brett


Re: Job Manager

Adrian Crum-3
Thanks Brett! I will be working on this again next weekend, and I will
try to include your suggestions.

-Adrian



Re: Job Manager

Jacques Le Roux
In reply to this post by Brett
Hi Brett,

From: "Brett Palmer" <[hidden email]>
> Adrian,
>
> Thanks for the update.  Here are some feedback points on your listed items:
>
> 1. JobPoller get out-of-memor error.  We've seen this a lot in production
> servers when the JobSandbox table is not constantly pruned of old records.
> It would be nice if the poller restricted its search for only active
> records it could process.

Did you use the purge-job-days setting in serviceengine.xml and the related purgeOldJobs service? If not, was there a reason?
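For reference, a minimal sketch of that setting (the value shown is only an example; other <thread-pool> attributes are omitted):

```xml
<!-- serviceengine.xml (sketch): jobs older than this many days become
     eligible for removal by the purgeOldJobs service -->
<thread-pool purge-job-days="4"/>
```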
 
> 2. Queue for capturing missing records would be good.  From item 1 above we
> have had locks on table when the poller is busy doing a scan and new jobs
> cannot be added or time out.

+1
 
> Other wish items:
>
> - Ability to assign different service engines to process specific job
> types.  We often multiple application servers but want to limit how many
> concurrent jobs are run.  For example, if I had 4 app servers connected to
> the same DB I may only want one app server to service particular jobs.  I
> thought this feature was possible but when I tried to implement it by
> changing some of the configuration files it never worked correctly.

The last time I used this was with R4.0 and it worked. Which problems did you run into exactly (if you remember)?

Thanks
 
> - JMS support for the service engine.  It would be nice if there was a JMS
> interface for those that want to use JMS as their queuing mechanism for
> jobs.

+1

Jacques


Re: Job Manager

Adrian Crum-3
We can use JMS as an option to replace the job queue and associated  
threads, but we can't eliminate the entity-engine-based Job Manager  
code because there are too many places where the JobSandbox entity is  
manipulated (a bad practice, but it is done a lot).

-Adrian





Re: Job Manager

Brett
In reply to this post by Jacques Le Roux
Jacques,

I had to review some of my notes to remember what we were trying to do with
the JobSandbox.  Here are my replies to your questions:

1. Did you use the purge-job-days setting in serviceengine.xml and the
related purgeOldJobs? If not, was there a reason?

We were not using the purgeOldJobs service.  This was probably because we
didn't understand how the service worked.  We may have thought the service
was specific to order-only jobs, which would not have worked for us.  Our
jobs are custom service jobs for the particular application we are
developing.

One problem that we had with most jobs that hit the JobSandbox (including
the poller) was that they appeared to be doing full table scans instead
of indexed scans.  These would cause problems for us when the JobSandbox
grew larger, especially during heavy production days.  We would often
see transaction locks on the JobSandbox and I/O bottlenecks on the server
in general due to the scans.  The purgeOldJobs service may be a good
solution for that if we could keep the JobSandbox to a reasonable number of
records.

I created issue OFBIZ-3855 on this a couple of years ago when we tried to
use the JobSandbox as a batch process service for multiple application
servers.  We were filling up the JobSandbox with 100k records over a
short period of time.  The poller was getting transaction timeouts before
it could change the status of the next available job to process.  I created
a patch to allow a user to customize the transaction timeout for the
poller.  I thought I had submitted this patch, but looking at the Jira issue
it doesn't look like it was ever submitted.

In the end we changed how we did our data warehouse processing.  Increasing
the transaction timeout didn't really solve the problem either; it just
extended the timeout length, which can have other consequences in the
system.

If the community is still interested in the patch I can submit it to Jira
for a recent version from the trunk.


2. Configuring service engine to run with multiple job pools.

As I'm looking at my notes, I believe the problem with configuring the
service engine with multiple job pools was that there wasn't an API to run
a service (async or sync) against a specific job service pool.  You could
only schedule a job to run against a particular pool.

For example, in serviceengine.xml you can configure a job to run in a
particular job pool like the following:

       <startup-service name="testScv" runtime-data-id="9900"
           runtime-delay="0" run-in-pool="pool"/>

You can also use the LocalDispatcher.schedule() method to schedule a job to
run in a particular pool.

What we needed was a way to configure our app servers to service different
service pools but allow all app servers to request the service dynamically.
This would allow us to limit the number of concurrent services that were
run in our system.  The default service engine lets all the app servers
service the JobSandbox, which doesn't scale well for us during heavy
production days.

This is one of the reasons we liked the idea of a JMS integration with the
service engine.  Then we could start up processes to listen to specific
queues, and our application could write to the different queues.  This would
allow us to control the number of concurrent services processed at a time.
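As a rough sketch of that idea (plain java.util.concurrent standing in for a JMS broker; all names here are hypothetical, not a proposed OFBiz API), each app server would start consumers only on the queues it is meant to service, which caps its concurrency for that pool:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a per-pool consumer. Constructing it with
// maxConcurrent = N caps this server at N jobs from that queue at a time.
public class PoolConsumer {
    private final ExecutorService workers;

    public PoolConsumer(BlockingQueue<Runnable> poolQueue, int maxConcurrent) {
        workers = Executors.newFixedThreadPool(maxConcurrent);
        for (int i = 0; i < maxConcurrent; i++) {
            workers.submit(() -> {
                while (true) {
                    try {
                        poolQueue.take().run();  // one job per worker at a time
                    } catch (InterruptedException e) {
                        return;                  // shutdown() interrupts take()
                    }
                }
            });
        }
    }

    public void shutdown() {
        workers.shutdownNow();
    }
}
```

An app server meant to service only one pool would construct a PoolConsumer for that queue and ignore the others; in a real JMS setup the queue would be a broker destination and the consumers would be MessageListeners.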

Let me know if you need any more information.


Thanks,


Brett


Re: Job Manager

Adrian Crum-3
Thanks Brett! Your feedback and Jira issue will help a lot.

-Adrian



Re: Job Manager

Adrian Crum-3
In reply to this post by Brett
Brett,

I think I solved your problems with my recent commits, ending with rev
1370566. Let me know if it helps.

-Adrian



Re: Job Manager

Brett
Adrian,

Thanks I'll take an update and try it out.

Brett
On Aug 7, 2012 4:23 PM, "Adrian Crum" <[hidden email]>
wrote:

> Brett,
>
> I think I solved your problems with my recent commits, ending with rev
> 1370566. Let me know if it helps.
>
> -Adrian
>
> On 8/5/2012 4:53 PM, Brett Palmer wrote:
>
>> Adrian,
>>
>> Thanks for the update.  Here are some feedback points on your listed
>> items:
>>
>> 1. JobPoller gets out-of-memory errors.  We've seen this a lot in production
>> servers when the JobSandbox table is not constantly pruned of old records.
>> It would be nice if the poller restricted its search to only active
>> records it could process.
>>
>> 2. A queue for capturing missed jobs would be good.  From item 1 above we
>> have had locks on the table when the poller is busy doing a scan and new jobs
>> cannot be added or they time out.
>>
>> Other wish items:
>>
>> - Ability to assign different service engines to process specific job
>> types.  We often run multiple application servers but want to limit how many
>> concurrent jobs are run.  For example, if I had 4 app servers connected to
>> the same DB I may only want one app server to service particular jobs.  I
>> thought this feature was possible but when I tried to implement it by
>> changing some of the configuration files it never worked correctly.
>>
>> - JMS support for the service engine.  It would be nice if there was a JMS
>> interface for those that want to use JMS as their queuing mechanism for
>> jobs.
>>
>>
>> Brett
>>
>> On Sun, Aug 5, 2012 at 6:21 AM, Adrian Crum <
>> adrian.crum@sandglass-software.com <[hidden email]>>
>> wrote:
>>
>>  On 8/5/2012 11:02 AM, Adrian Crum wrote:
>>>
>>>  I just committed a bunch of changes to the Job Manager group of classes.
>>>> The changes help simplify the code and hopefully make the Job Manager
>>>> more
>>>> robust. On the other hand, I might have broken something. ;) I will
>>>> monitor
>>>> the mailing list for problems.
>>>>
>>>> I believe the JobPoller settings in serviceengine.xml (the <thread-pool>
>>>> element) should be changed. I think min-threads should be set to "2" and
>>>> max-threads should be set to "5". Creating lots of threads can hurt
>>>> throughput because the JVM spends more time managing them. I would be
>>>> interested in hearing what others think.
>>>>
>>>>  Thinking about this more, there are some other things that need to be
>>> fixed:
>>>
>>> 1. The JobPoller uses an unbounded queue. In a busy server, there is the
>>> potential the queue will grow in size until it causes an out-of-memory
>>> condition.
>>> 2. There is no accommodation for when a job cannot be added to the queue
>>> -
>>> it is just lost. We could add a dequeue method to the Job interface that
>>> will allow implementations to recover or reschedule the job when it can't
>>> be added to the queue.
>>> 3. There is a JobPoller instance per delegator, and each instance
>>> contains
>>> the number of threads configured in serviceengine.xml. With the current
>>> max-threads setting of 15, a multi-tenant installation with 100 tenants
>>> will create up to 1500 threads. (!!!) A smarter strategy might be to
>>> have a
>>> single JobPoller instance that services multiple JobManagers.
>>>
>>> -Adrian
>>>
>>>
>>>
>>>
>

Re: Job Manager

Adrian Crum-3
In reply to this post by Brett
Quoting Brett Palmer <[hidden email]>:

> Jacques,
> I had to review some of my notes to remember what we were trying to do with
> the JobSandbox.  Here are my replies to your questions:
>
> 1. Did you use the purge-job-days setting in serviceengine.xml and the
> related  purgeOldJobs? If not was there a reason?
>
> We were not using the purgeOldJobs service.  This was probably because we
> didn’t understand how the service worked.  We may have thought the service
> was specific to order-only jobs, which would not have worked for us.  Our
> jobs are custom service jobs for the particular application we are
> developing.

I just looked at that service and I don't like it. It is simplistic  
and takes a brute-force approach. Basically, the service is scheduled  
to run at midnight each day and it purges all jobs older than a  
configurable time duration. There is no guarantee the Job Poller will  
be idle when that service fires. The Java code tries to limit the  
purge to 1000 record batches. That setting is hard-coded and appears  
arbitrary.

I would prefer to put job purging back under the control of the Job  
Poller - where purges can be scheduled during queue idle periods and  
they can follow the same constraints as any other job.
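A minimal sketch of what that batched, poller-controlled purging could look like (all names are illustrative assumptions, not existing OFBiz code): the batch size becomes a parameter instead of a hard-coded 1000, and each batch is a separate short delete, so the purge job can yield between batches and run under the poller's normal constraints.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch only: 'deleteBatch' stands in for the real delegator removal call,
// and the batch size is a parameter rather than a hard-coded 1000.
public class PurgeSketch {
    public static int purgeOldJobs(List<Long> expiredJobIds, int batchSize,
                                   Consumer<List<Long>> deleteBatch) {
        int purged = 0;
        for (int i = 0; i < expiredJobIds.size(); i += batchSize) {
            // One short delete per batch, so no long-running transaction.
            List<Long> batch = expiredJobIds.subList(i,
                    Math.min(i + batchSize, expiredJobIds.size()));
            deleteBatch.accept(batch);
            purged += batch.size();
        }
        return purged;
    }
}
```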

-Adrian


Re: Job Manager

Jacques Le Roux
Administrator
From: <[hidden email]>

> Quoting Brett Palmer <[hidden email]>:
>
>> Jacques,
>> I had to review some of my notes to remember what we were trying to do with
>> the JobSandbox.  Here are my replies to your questions:
>>
>> 1. Did you use the purge-job-days setting in serviceengine.xml and the
>> related  purgeOldJobs? If not was there a reason?
>>
>> We were not using the purgeOldJobs service.  This was probably because we
>> didn’t understand how the service worked.  We may have thought the service
>> was specific to order-only jobs, which would not have worked for us.  Our
>> jobs are custom service jobs for the particular application we are
>> developing.
>
> I just looked at that service and I don't like it. It is simplistic  and takes a brute-force approach. Basically, the service is
> scheduled  to run at midnight each day and it purges all jobs older than a  configurable time duration. There is no guarantee the
> Job Poller will  be idle when that service fires. The Java code tries to limit the  purge to 1000 record batches. That setting is
> hard-coded and appears  arbitrary.
>
> I would prefer to put job purging back under the control of the Job  Poller - where purges can be scheduled during queue idle
> periods and  they can follow the same constraints as any other job.

That sounds like a good replacement indeed
+1

Jacques

> -Adrian
>
>

Re: Job Manager

Jacques Le Roux
Administrator
In reply to this post by Brett
Hi Brett,

Interesting...

Brett Palmer wrote:

> Jacques,
> I had to review some of my notes to remember what we were trying to do with
> the JobSandbox.  Here are my replies to your questions:
>
> 1. Did you use the purge-job-days setting in serviceengine.xml and the
> related  purgeOldJobs? If not was there a reason?
>
> We were not using the purgeOldJobs service.  This was probably because we
> didn’t understand how the service worked.  We may have thought the service
>> was specific to order-only jobs, which would not have worked for us.  Our
> jobs are custom service jobs for the particular application we are
> developing.

I agree with Adrian, this can be perfected, using a smart dynamic way of purging old jobs during Job Poller idle periods.

> One problem that we had with most jobs that hit the JobSandbox (including
> the poller) was that it appeared they were doing full table scans instead
> of an indexed scan.  These would cause problems for us when the JobSandbox
> grew larger and especially during heavy production days.  We would often
> see transaction locks on the JobSandbox and I/O bottlenecks on the server
> in general due to the scans.  The purgeOldJobs service may be a good
> solution for that if we could keep the JobSandbox to a reasonable number of
> records.
>
> I created issue OFBIZ-3855 on this a couple of years ago when we tried to
> use the JobSandbox as a batch process service for multiple application
> servers.  We were filling up the JobSandbox with 100k records over a
> short period of time.  The poller was getting transaction timeouts before
> it could change the status of the next available job to process.  I created
> a patch to allow a user to customize the transaction timeout for the
> poller.  I thought I had submitted this patch but looking at the Jira issue
> it doesn’t look like it was ever submitted.

I put a comment there. I browsed (I can't really say reviewed) Adrian's recent work, after Jacopo's, and it seems to me that it
should address your problem. Or at least it is a sound foundation for that...

> In the end we changed how we did our data warehouse processing.  Increasing
> the transaction timeout didn’t really solve the problem either it just made
> it possible to extend the timeout length which can have other consequences
> in the system.
>
> If the community is still interested in the patch I can submit it to Jira
> for a recent version from the trunk.
>
>
> 2. Configuring service engine to run with multiple job pools.
>
> As I’m looking at my notes I believe the problem with configuring the
> service engine with multiple job pools was that there wasn’t an API to run
> a service (async or synchronous) to a specific job service pool.  You could
> schedule a job to run against a particular pool.
>
> For example in the serviceengine.xml file you can configure a job to run in
> a particular job pool like the following:
>
>        <startup-service name="testScv" runtime-data-id="9900"
> runtime-delay="0"
> run-in-pool="pool"/>
>
> You can also use the LocalDispatcher.schedule() method to schedule a job to
> run in a particular pool.
>
> What we needed was a way to configure our app servers to service different
> service pools but allow all app servers to request the service dynamically.

I see: you want this done dynamically with an API, to better control where the jobs run, not statically as done by
the thread-pool attribute.

>  This would allow us to limit the number of concurrent services that were
> run in our system.

If I understand correctly what you mean by "concurrent services" (I guess you mean jobs): when I want to avoid running concurrent
services, I set the semaphore service attribute to "fail". Since it uses the ServiceSemaphore, it should span all service
managers and thread-pools which use the same DB. So far I have not run into issues with that, but maybe it can also be improved,
notably to guard against collisions in the DB using SELECT FOR UPDATE.

> The default service engine lets all the app servers
> service the jobSandbox which doesn’t scale well for us during heavy
> production days.

I am not sure I understand: you mean that assigning services to thread-pools has no effect? I rather guess it was not sufficient,
from your explanation above.

> This is one of the reasons we liked the idea of a JMS integration with the
> service engine.  Then we could start up processes to listen to a specific
> queues and our application could write to the different queues.  This would
> allow us to control the amount of concurrent services processed at a time.

There is already a JMS integration with the service engine. I use it for the DCC
https://cwiki.apache.org/confluence/display/OFBIZ/Distributed+Entity+Cache+Clear+Mechanism
Do you want something more flexible, like the "dynamic thread-pool API" you suggested, more integrated?

Jacques

> Let me know if you need any more information.
>
>
> Thanks,
>
>
> Brett
>
> On Mon, Aug 6, 2012 at 1:10 AM, Jacques Le Roux <
> [hidden email]> wrote:
>
>> Hi Brett,
>>
>> From: "Brett Palmer" <[hidden email]>
>>
>>  Adrian,
>>>
>>> Thanks for the update.  Here are some feedback points on your listed
>>> items:
>>>
>>> 1. JobPoller gets out-of-memory errors.  We've seen this a lot in production
>>> servers when the JobSandbox table is not constantly pruned of old records.
>>> It would be nice if the poller restricted its search to only active
>>> records it could process.
>>>
>>
>> Did you use the purge-job-days setting in serviceengine.xml and the
>> related  purgeOldJobs? If not was there a reason?
>>
>>
>>> 2. A queue for capturing missed jobs would be good.  From item 1 above we
>>> have had locks on the table when the poller is busy doing a scan and new jobs
>>> cannot be added or they time out.
>>>
>>
>> +1
>>
>>
>>  Other wish items:
>>>
>>> - Ability to assign different service engines to process specific job
>>> types.  We often run multiple application servers but want to limit how many
>>> concurrent jobs are run.  For example, if I had 4 app servers connected to
>>> the same DB I may only want one app server to service particular jobs.  I
>>> thought this feature was possible but when I tried to implement it by
>>> changing some of the configuration files it never worked correctly.
>>>
>>
>> Last time I used this it was with R4.0 and it worked. Which problems did
>> you run into exactly (if you remember)?
>>
>> Thanks
>>
>>
>>  - JMS support for the service engine.  It would be nice if there was a JMS
>>> interface for those that want to use JMS as their queuing mechanism for
>>> jobs.
>>>
>>
>> +1
>>
>> Jacques
>>
>>
>>
>>> Brett
>>>
>>> On Sun, Aug 5, 2012 at 6:21 AM, Adrian Crum <
>>> adrian.crum@sandglass-software.com <[hidden email]>>
>>> wrote:
>>>
>>>  On 8/5/2012 11:02 AM, Adrian Crum wrote:
>>>>
>>>>  I just committed a bunch of changes to the Job Manager group of classes.
>>>>> The changes help simplify the code and hopefully make the Job Manager
>>>>> more
>>>>> robust. On the other hand, I might have broken something. ;) I will
>>>>> monitor
>>>>> the mailing list for problems.
>>>>>
>>>>> I believe the JobPoller settings in serviceengine.xml (the <thread-pool>
>>>>> element) should be changed. I think min-threads should be set to "2" and
>>>>> max-threads should be set to "5". Creating lots of threads can hurt
>>>>> throughput because the JVM spends more time managing them. I would be
>>>>> interested in hearing what others think.
>>>>>
>>>>>
>>>> Thinking about this more, there are some other things that need to be
>>>> fixed:
>>>>
>>>> 1. The JobPoller uses an unbounded queue. In a busy server, there is the
>>>> potential the queue will grow in size until it causes an out-of-memory
>>>> condition.
>>>> 2. There is no accommodation for when a job cannot be added to the queue
>>>> -
>>>> it is just lost. We could add a dequeue method to the Job interface that
>>>> will allow implementations to recover or reschedule the job when it can't
>>>> be added to the queue.
>>>> 3. There is a JobPoller instance per delegator, and each instance
>>>> contains
>>>> the number of threads configured in serviceengine.xml. With the current
>>>> max-threads setting of 15, a multi-tenant installation with 100 tenants
>>>> will create up to 1500 threads. (!!!) A smarter strategy might be to
>>>> have a
>>>> single JobPoller instance that services multiple JobManagers.
>>>>
>>>> -Adrian

Re: Job Manager

Brett
Jacques,

It sounds like what Adrian implemented would solve a lot of our problems
with the service engine.  Please see my comments inline..

On Wed, Aug 8, 2012 at 3:54 PM, Jacques Le Roux <
[hidden email]> wrote:

> Hi Brett,
>
> Interesting...
>
> Brett Palmer wrote:
>
>> Jacques,
>>
>> I had to review some of my notes to remember what we were trying to do
>> with
>> the JobSandbox.  Here are my replies to your questions:
>>
>> 1. Did you use the purge-job-days setting in serviceengine.xml and the
>> related  purgeOldJobs? If not was there a reason?
>>
>> We were not using the purgeOldJobs service.  This was probably because we
>> didn’t understand how the service worked.  We may have thought the service
>> was specific to order-only jobs, which would not have worked for us.  Our
>> jobs are custom service jobs for the particular application we are
>> developing.
>>
>
> I agree with Adrian, this can be perfected, using a smart dynamic way of
> purging old jobs during Job Poller idle periods
>
>
Yes, a smart poller would work and avoid conflicts during heavy transaction
times.


>
>  One problem that we had with most jobs that hit the JobSandbox (including
>> the poller) was that it appeared they were doing full table scans instead
>> of an indexed scan.  These would cause problems for us when the JobSandbox
>> grew larger and especially during heavy production days.  We would often
>> see transaction locks on the JobSandbox and I/O bottlenecks on the server
>> in general due to the scans.  The purgeOldJobs service may be a good
>> solution for that if we could keep the JobSandbox to a reasonable number
>> of
>> records.
>>
>> I created issue OFBIZ-3855 on this a couple of years ago when we tried to
>> use the JobSandbox as a batch process service for multiple application
>> servers.  We were filling up the JobSandbox with 100k records over a
>> short period of time.  The poller was getting transaction timeouts before
>> it could change the status of the next available job to process.  I
>> created
>> a patch to allow a user to customize the transaction timeout for the
>> poller.  I thought I had submitted this patch but looking at the Jira
>> issue
>> it doesn’t look like it was ever submitted.
>>
>
> I put a comment there. I browsed (I can't really say reviewed) Adrian's
> recent work, after Jacopo's, and it seems to me that it should address your
> problem. Or at least it is a sound foundation for that...
>
>
>  In the end we changed how we did our data warehouse processing.
>>  Increasing
>> the transaction timeout didn’t really solve the problem either it just
>> made
>> it possible to extend the timeout length which can have other consequences
>> in the system.
>>
>> If the community is still interested in the patch I can submit it to Jira
>> for a recent version from the trunk.
>>
>>
>> 2. Configuring service engine to run with multiple job pools.
>>
>> As I’m looking at my notes I believe the problem with configuring the
>> service engine with multiple job pools was that there wasn’t an API to run
>> a service (async or synchronous) to a specific job service pool.  You
>> could
>> schedule a job to run against a particular pool.
>>
>> For example in the serviceengine.xml file you can configure a job to run
>> in
>> a particular job pool like the following:
>>
>>        <startup-service name="testScv" runtime-data-id="9900"
>> runtime-delay="0"
>> run-in-pool="pool"/>
>>
>> You can also use the LocalDispatcher.schedule() method to schedule a job
>> to
>> run in a particular pool.
>>
>> What we needed was a way to configure our app servers to service different
>> service pools but allow all app servers to request the service
>> dynamically.
>>
>
> I see: you want this done dynamically with an API, to better
> control where the jobs run, not statically as done by the
> thread-pool attribute.
>
>
>
At the time we were looking for a method with the service dispatcher to run
an async service and assign it to a particular pool.  For example,
 localDispatcher.Async("PoolName", other params...).  This was for our data
warehouse process that we wanted to be as close to real time as possible.
 For our application we would run multiple application servers talking to
the same database.  During heavy usage periods we could not have all app
servers servicing these asynchronous requests, as they would be competing for
limited resources on our database.
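Such a pool-targeted async call might look like the following sketch (a hypothetical API, not existing OFBiz code): the dispatcher holds one executor per named pool, so only servers configured with that pool ever run the job.

```java
import java.util.Map;
import java.util.concurrent.Executor;

// Hypothetical sketch of a pool-targeted async call such as
// localDispatcher.runAsync("PoolName", ...). None of these names are real
// OFBiz API: the dispatcher keeps one executor per named pool, so only
// servers configured with that pool ever execute the job.
public class PoolDispatcherSketch {
    private final Map<String, Executor> pools;

    public PoolDispatcherSketch(Map<String, Executor> pools) {
        this.pools = pools;
    }

    /** Routes the job to the named pool; an unknown pool fails loudly
     *  instead of silently falling back to the default pool. */
    public void runAsync(String poolName, Runnable job) {
        Executor pool = pools.get(poolName);
        if (pool == null) {
            throw new IllegalArgumentException("No such job pool: " + poolName);
        }
        pool.execute(job);
    }
}
```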



>   This would allow us to limit the number of concurrent services that were
>> run in our system.
>>
>
> If I understand correctly what you mean by "concurrent services" (I guess you
> mean jobs): when I want to avoid running concurrent services, I set the
> semaphore service attribute to "fail". Since it uses the ServiceSemaphore,
> it should span all service managers and thread-pools which use the
> same DB. So far I have not run into issues with that, but maybe it can also be
> improved, notably to guard against collisions in the DB using SELECT FOR UPDATE.
>
>
Yes, I mean jobs in the JobSandbox.  I was not aware of the semaphore
service attribute, which would have helped.  We ended up implementing a
custom "SELECT FOR UPDATE" method on our servers, using a semaphore table
to prevent more than one data warehouse process from running on a single
application server.

We scheduled this service to run once every 5 mins using the normal ofbiz
scheduler.  The problem was that during high loads the process would often not
complete and the service engine would start another service.  We used the
semaphore service to set a flag in a semaphore table to limit a single data
warehouse process per server.  Perhaps the semaphore service attribute
could have done the same thing.
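An in-process analogue of that semaphore-table claim/release protocol can be sketched as follows. In the real setup the claim is a SELECT ... FOR UPDATE plus a flag column on a shared DB table; here ConcurrentMap.putIfAbsent plays the role of the row lock, and all names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-process analogue of the semaphore-table approach. In the real setup
// the claim is a SELECT ... FOR UPDATE plus a flag column on a shared DB
// table; here ConcurrentMap.putIfAbsent plays the role of the row lock.
// All names are illustrative.
public class SemaphoreSketch {
    private final ConcurrentMap<String, String> claims = new ConcurrentHashMap<>();

    /** Atomically claims the named process; false if another server holds it. */
    public boolean tryClaim(String processName, String serverId) {
        return claims.putIfAbsent(processName, serverId) == null;
    }

    /** Releases the claim so the next scheduled run on any server can take it. */
    public void release(String processName) {
        claims.remove(processName);
    }
}
```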


>
>> The default service engine lets all the app servers
>> service the jobSandbox which doesn’t scale well for us during heavy
>> production days.
>>
>
> I am not sure I understand: you mean that assigning services to thread-pools
> has no effect? I rather guess it was not sufficient, from your explanation
> above.
>
>
>  This is one of the reasons we liked the idea of a JMS integration with the
>> service engine.  Then we could start up processes to listen to a specific
>> queues and our application could write to the different queues.  This
>> would
>> allow us to control the amount of concurrent services processed at a time.
>>
>
> There is already a JMS integration with the service engine. I use it for
> the DCC https://cwiki.apache.org/confluence/display/OFBIZ/Distributed+Entity+Cache+Clear+Mechanism
> Do you want something more flexible, like the "dynamic thread-pool API" you
> suggested, more integrated?
>
>
Great article on using JMS with ofbiz.  This is something we can use as we
do a lot of multi-server implementations with ofbiz.


Thanks for your help.  I'll take a look at the recent commits and post any
questions I have to the list.



Brett

Re: Job Manager

Adrian Crum-3
In reply to this post by Jacques Le Roux
On 8/8/2012 10:10 PM, Jacques Le Roux wrote:

> From: <[hidden email]>
>> Quoting Brett Palmer <[hidden email]>:
>>
>>> Jacques,
>>> I had to review some of my notes to remember what we were trying to
>>> do with
>>> the JobSandbox.  Here are my replies to your questions:
>>>
>>> 1. Did you use the purge-job-days setting in serviceengine.xml and the
>>> related  purgeOldJobs? If not was there a reason?
>>>
>>> We were not using the purgeOldJobs service.  This was probably
>>> because we
>>> didn’t understand how the service worked.  We may have thought the
>>> service
>>> was specific to order-only jobs, which would not have worked for us.
>>> Our
>>> jobs are custom service jobs for the particular application we are
>>> developing.
>>
>> I just looked at that service and I don't like it. It is simplistic  
>> and takes a brute-force approach. Basically, the service is
>> scheduled  to run at midnight each day and it purges all jobs older
>> than a  configurable time duration. There is no guarantee the Job
>> Poller will  be idle when that service fires. The Java code tries to
>> limit the  purge to 1000 record batches. That setting is hard-coded
>> and appears  arbitrary.
>>
>> I would prefer to put job purging back under the control of the Job  
>> Poller - where purges can be scheduled during queue idle periods and  
>> they can follow the same constraints as any other job.
>
> That sounds like a good replacement indeed
> +1

Implemented in rev 1371140.

-Adrian


Job Manager Part 2

Adrian Crum-3
In reply to this post by Adrian Crum-3
On 8/5/2012 1:21 PM, Adrian Crum wrote:

> On 8/5/2012 11:02 AM, Adrian Crum wrote:
>> I just committed a bunch of changes to the Job Manager group of
>> classes. The changes help simplify the code and hopefully make the
>> Job Manager more robust. On the other hand, I might have broken
>> something. ;) I will monitor the mailing list for problems.
>>
>> I believe the JobPoller settings in serviceengine.xml (the
>> <thread-pool> element) should be changed. I think min-threads should
>> be set to "2" and max-threads should be set to "5". Creating lots of
>> threads can hurt throughput because the JVM spends more time managing
>> them. I would be interested in hearing what others think.
>
> Thinking about this more, there are some other things that need to be
> fixed:
>
> 1. The JobPoller uses an unbounded queue. In a busy server, there is
> the potential the queue will grow in size until it causes an
> out-of-memory condition.
> 2. There is no accommodation for when a job cannot be added to the
> queue - it is just lost. We could add a dequeue method to the Job
> interface that will allow implementations to recover or reschedule the
> job when it can't be added to the queue.
> 3. There is a JobPoller instance per delegator, and each instance
> contains the number of threads configured in serviceengine.xml. With
> the current max-threads setting of 15, a multi-tenant installation
> with 100 tenants will create up to 1500 threads. (!!!) A smarter
> strategy might be to have a single JobPoller instance that services
> multiple JobManagers.

I fixed #1 and #2. I am considering working on #3, but I want some
feedback first.
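The two fixes can be illustrated with java.util.concurrent (illustrative names, not the actual JobPoller code): a bounded queue caps memory use (fix #1), and a RejectedExecutionHandler plays the role of the proposed Job dequeue callback, so a job that does not fit is captured for rescheduling instead of being lost (fix #2).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of fixes #1 and #2 using java.util.concurrent: the bounded queue
// keeps memory flat, and the rejection handler stands in for the proposed
// Job dequeue callback so an overflowing job is captured for rescheduling.
public class BoundedPollerSketch {
    public static List<Runnable> demo() {
        List<Runnable> rescheduled = new ArrayList<>();
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),          // fix #1: bounded queue
                (job, exec) -> rescheduled.add(job)); // fix #2: recover, don't lose
        Runnable job = () -> {
            try { release.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };
        pool.execute(job); // taken by the single worker, which blocks
        pool.execute(job); // sits in the queue
        pool.execute(job); // queue full -> handed to the rejection handler
        release.countDown();
        pool.shutdown();
        try { pool.awaitTermination(5, TimeUnit.SECONDS); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return rescheduled;
    }
}
```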

A JobPoller instance is created for each delegator. So, in a
multi-tenant or multi-delegator scenario, multiple JobPollers will be
created - which means one job queue per delegator and (threads per
queue)  threads per delegator. In a multi-server installation, things
are multiplied: (# of servers * # of delegators) job queues.
Fortunately, in that scenario we can disable the JobPoller on all but
one server.

So, we are left with the potential problem of too many queues/threads
being created on a multi-delegator or multi-tenant server. So, I think
there should be one JobPoller instance that services all delegators. At
each polling interval, the JobPoller gets a list of jobs from each
delegator (JobManager) - creating a list of lists. Then the JobPoller
creates a queue candidate list from the list of lists - using a
round-robin approach so each delegator gets an equal opportunity to
queue jobs. The JobPoller queues the candidate list, and any candidates
that don't fit in the queue are rescheduled. With this approach the
JobPoller can service any number of delegators without saturating the
server.
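The round-robin merge described above can be sketched as a pure function (illustrative only; jobs reduced to strings): interleave the per-delegator lists so each delegator gets an equal chance, queue the first 'capacity' candidates, and return the overflow for rescheduling.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the round-robin candidate merge: interleave the per-delegator
// job lists, queue the first 'capacity' candidates, and return the rest
// for rescheduling. Jobs are reduced to strings for illustration.
public class RoundRobinSketch {
    /** Returns two lists: index 0 = jobs to queue, index 1 = jobs to reschedule. */
    public static List<List<String>> merge(List<List<String>> perDelegator, int capacity) {
        List<String> candidates = new ArrayList<>();
        int longest = 0;
        for (List<String> jobs : perDelegator) {
            longest = Math.max(longest, jobs.size());
        }
        for (int i = 0; i < longest; i++) {
            for (List<String> jobs : perDelegator) {
                if (i < jobs.size()) {
                    candidates.add(jobs.get(i)); // round-robin pick
                }
            }
        }
        int cut = Math.min(capacity, candidates.size());
        List<String> queued = new ArrayList<>(candidates.subList(0, cut));
        List<String> rescheduled = new ArrayList<>(candidates.subList(cut, candidates.size()));
        return List.of(queued, rescheduled);
    }
}
```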

What do you think?

-Adrian


Re: Job Manager Part 2

hans_bakker
Hi Adrian,

thanks for the excellent work you did on mini language and also here for
the background jobs.

Your proposal sounds like the way to go so +1.

Regards,
Hans

On 08/12/2012 06:36 PM, Adrian Crum wrote:

> On 8/5/2012 1:21 PM, Adrian Crum wrote:
>> On 8/5/2012 11:02 AM, Adrian Crum wrote:
>>> I just committed a bunch of changes to the Job Manager group of
>>> classes. The changes help simplify the code and hopefully make the
>>> Job Manager more robust. On the other hand, I might have broken
>>> something. ;) I will monitor the mailing list for problems.
>>>
>>> I believe the JobPoller settings in serviceengine.xml (the
>>> <thread-pool> element) should be changed. I think min-threads should
>>> be set to "2" and max-threads should be set to "5". Creating lots of
>>> threads can hurt throughput because the JVM spends more time
>>> managing them. I would be interested in hearing what others think.
>>
>> Thinking about this more, there are some other things that need to be
>> fixed:
>>
>> 1. The JobPoller uses an unbounded queue. In a busy server, there is
>> the potential the queue will grow in size until it causes an
>> out-of-memory condition.
>> 2. There is no accommodation for when a job cannot be added to the
>> queue - it is just lost. We could add a dequeue method to the Job
>> interface that will allow implementations to recover or reschedule
>> the job when it can't be added to the queue.
>> 3. There is a JobPoller instance per delegator, and each instance
>> contains the number of threads configured in serviceengine.xml. With
>> the current max-threads setting of 15, a multi-tenant installation
>> with 100 tenants will create up to 1500 threads. (!!!) A smarter
>> strategy might be to have a single JobPoller instance that services
>> multiple JobManagers.
>
> I fixed #1 and #2. I am considering working on #3, but I want some
> feedback first.
>
> A JobPoller instance is created for each delegator. So, in a
> multi-tenant or multi-delegator scenario, multiple JobPollers will be
> created - which means one job queue per delegator and (threads per
> queue)  threads per delegator. In a multi-server installation, things
> are multiplied: (# of servers * # of delegators) job queues.
> Fortunately, in that scenario we can disable the JobPoller on all but
> one server.
>
> So, we are left with the potential problem of too many queues/threads
> being created on a multi-delegator or multi-tenant server. So, I think
> there should be one JobPoller instance that services all delegators.
> At each polling interval, the JobPoller gets a list of jobs from each
> delegator (JobManager) - creating a list of lists. Then the JobPoller
> creates a queue candidate list from the list of lists - using a
> round-robin approach so each delegator gets an equal opportunity to
> queue jobs. The JobPoller queues the candidate list, and any
> candidates that don't fit in the queue are rescheduled. With this
> approach the JobPoller can service any number of delegators without
> saturating the server.
>
> What do you think?
>
> -Adrian
>


Re: Job Manager Part 2

Jacques Le Roux
Administrator
This sounds indeed like a good, simple, and easy-to-maintain solution to this problem.
+1

Jacques

From: "Hans Bakker" <[hidden email]>

> Hi Adrian,
>
> thanks for the excellent work you did on mini language and also here for
> the background jobs.
>
> Your proposal sounds like the way to go so +1.
>
> Regards,
> Hans
>
> On 08/12/2012 06:36 PM, Adrian Crum wrote:
>> On 8/5/2012 1:21 PM, Adrian Crum wrote:
>>> On 8/5/2012 11:02 AM, Adrian Crum wrote:
>>>> I just committed a bunch of changes to the Job Manager group of
>>>> classes. The changes help simplify the code and hopefully make the
>>>> Job Manager more robust. On the other hand, I might have broken
>>>> something. ;) I will monitor the mailing list for problems.
>>>>
>>>> I believe the JobPoller settings in serviceengine.xml (the
>>>> <thread-pool> element) should be changed. I think min-threads should
>>>> be set to "2" and max-threads should be set to "5". Creating lots of
>>>> threads can hurt throughput because the JVM spends more time
>>>> managing them. I would be interested in hearing what others think.
>>>
>>> Thinking about this more, there are some other things that need to be
>>> fixed:
>>>
>>> 1. The JobPoller uses an unbounded queue. In a busy server, there is
>>> the potential the queue will grow in size until it causes an
>>> out-of-memory condition.
>>> 2. There is no accommodation for when a job cannot be added to the
>>> queue - it is just lost. We could add a dequeue method to the Job
>>> interface that will allow implementations to recover or reschedule
>>> the job when it can't be added to the queue.
>>> 3. There is a JobPoller instance per delegator, and each instance
>>> contains the number of threads configured in serviceengine.xml. With
>>> the current max-threads setting of 15, a multi-tenant installation
>>> with 100 tenants will create up to 1500 threads. (!!!) A smarter
>>> strategy might be to have a single JobPoller instance that services
>>> multiple JobManagers.
>>
>> I fixed #1 and #2. I am considering working on #3, but I want some
>> feedback first.
>>
>> A JobPoller instance is created for each delegator. So, in a
>> multi-tenant or multi-delegator scenario, multiple JobPollers will be
>> created - which means one job queue per delegator and (threads per
>> queue) threads per delegator. In a multi-server installation, things
>> are multiplied: (# of servers * # of delegators) job queues.
>> Fortunately, in that scenario we can disable the JobPoller on all but
>> one server.
>>
>> So, we are left with the potential problem of too many queues/threads
>> being created on a multi-delegator or multi-tenant server. So, I think
>> there should be one JobPoller instance that services all delegators.
>> At each polling interval, the JobPoller gets a list of jobs from each
>> delegator (JobManager) - creating a list of lists. Then the JobPoller
>> creates a queue candidate list from the list of lists - using a
>> round-robin approach so each delegator gets an equal opportunity to
>> queue jobs. The JobPoller queues the candidate list, and any
>> candidates that don't fit in the queue are rescheduled. With this
>> approach the JobPoller can service any number of delegators without
>> saturating the server.
>>
>> What do you think?
>>
>> -Adrian
>>
>
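The bounded queue (#1) and the dequeue hook (#2) described above can be sketched roughly as follows. This is a hypothetical illustration, not OFBiz's actual JobPoller code: the class and method names (`BoundedJobPool`, `Job.deQueue`) are assumptions for the sketch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a bounded work queue plus a RejectedExecutionHandler, so a job
// that cannot be queued is handed back for rescheduling instead of being
// silently lost. Names here are illustrative, not OFBiz's actual API.
public class BoundedJobPool {
    public interface Job extends Runnable {
        void deQueue(); // implementations recover or reschedule the job
    }

    public static ThreadPoolExecutor create(int minThreads, int maxThreads, int queueSize) {
        return new ThreadPoolExecutor(
                minThreads, maxThreads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueSize),  // bounded: no unbounded growth (#1)
                (runnable, executor) -> {             // rejection handler (#2)
                    if (runnable instanceof Job) {
                        ((Job) runnable).deQueue();   // let the job reschedule itself
                    }
                });
    }
}
```

With a bounded queue the pool can never grow until it exhausts memory, and the rejection handler gives each job a chance to reschedule itself rather than being dropped.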

Re: Job Manager Part 2

Brett
In reply to this post by Adrian Crum-3
Adrian,

I think the single JobPoller is a good idea and reduces the chance of too
many JobPollers running on a machine.

We often set up multiple delegators to communicate with different databases.
 For example, our data warehouse is hosted on a separate server.  These
databases usually have a full OFBiz schema on them (JobSandbox table, etc.).

Here is how our data warehouse process works:

The application has several ofbiz servers talking to a primary database.
 These servers contain all the user information for our application.  When
a person logs into the application they are redirected to a secondary ofbiz
server that is used for running the application under heavy loads.  The
data is captured on the secondary server.

A data warehouse process is scheduled to run every 5 mins on these
secondary servers.  The secondary servers have a delegator that talks to
its local database and a delegator to talk to the data warehouse.

With the new job poller changes would the poller pick up jobs from the data
warehouse database since it has a delegator that points to that instance?

For this example, we would need to make sure the job poller on the
secondary server only serviced jobs from its local database (default
delegator) and not our configured OLAP delegator.

Let me know if this will be possible with the new job poller, or if you have
any questions about my scenario. I realize this scenario is not the typical
OFBiz e-commerce setup everyone is used to, but we use OFBiz to
create lots of different types of applications and have found it very flexible
for creating just about any type of ERP application.


Thanks for your work on the job poller.


Brett

On Sun, Aug 12, 2012 at 5:36 AM, Adrian Crum <
[hidden email]> wrote:

> On 8/5/2012 1:21 PM, Adrian Crum wrote:
>
>> On 8/5/2012 11:02 AM, Adrian Crum wrote:
>>
>>> I just committed a bunch of changes to the Job Manager group of classes.
>>> The changes help simplify the code and hopefully make the Job Manager more
>>> robust. On the other hand, I might have broken something. ;) I will monitor
>>> the mailing list for problems.
>>>
>>> I believe the JobPoller settings in serviceengine.xml (the <thread-pool>
>>> element) should be changed. I think min-threads should be set to "2" and
>>> max-threads should be set to "5". Creating lots of threads can hurt
>>> throughput because the JVM spends more time managing them. I would be
>>> interested in hearing what others think.
>>>
>>
>> Thinking about this more, there are some other things that need to be
>> fixed:
>>
>> 1. The JobPoller uses an unbounded queue. In a busy server, there is the
>> potential the queue will grow in size until it causes an out-of-memory
>> condition.
>> 2. There is no accommodation for when a job cannot be added to the queue
>> - it is just lost. We could add a dequeue method to the Job interface that
>> will allow implementations to recover or reschedule the job when it can't
>> be added to the queue.
>> 3. There is a JobPoller instance per delegator, and each instance
>> contains the number of threads configured in serviceengine.xml. With the
>> current max-threads setting of 15, a multi-tenant installation with 100
>> tenants will create up to 1500 threads. (!!!) A smarter strategy might be
>> to have a single JobPoller instance that services multiple JobManagers.
>>
>
> I fixed #1 and #2. I am considering working on #3, but I want some
> feedback first.
>
> A JobPoller instance is created for each delegator. So, in a multi-tenant
> or multi-delegator scenario, multiple JobPollers will be created - which
> means one job queue per delegator and (threads per queue) threads per
> delegator. In a multi-server installation, things are multiplied: (# of
> servers * # of delegators) job queues. Fortunately, in that scenario we can
> disable the JobPoller on all but one server.
>
> So, we are left with the potential problem of too many queues/threads
> being created on a multi-delegator or multi-tenant server. So, I think
> there should be one JobPoller instance that services all delegators. At
> each polling interval, the JobPoller gets a list of jobs from each
> delegator (JobManager) - creating a list of lists. Then the JobPoller
> creates a queue candidate list from the list of lists - using a round-robin
> approach so each delegator gets an equal opportunity to queue jobs. The
> JobPoller queues the candidate list, and any candidates that don't fit in
> the queue are rescheduled. With this approach the JobPoller can service any
> number of delegators without saturating the server.
>
> What do you think?
>
> -Adrian
>
>
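The round-robin merge Adrian describes (one list of jobs per delegator, interleaved so each delegator gets an equal opportunity to queue work) can be sketched as follows. This is a hypothetical illustration; the class and method names are assumptions, not OFBiz's actual API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: interleave one job list per delegator, round-robin, into a single
// queue-candidate list. Each pass takes at most one job from each delegator.
public class RoundRobin {
    public static <T> List<T> interleave(List<List<T>> listsOfJobs) {
        List<T> candidates = new ArrayList<>();
        List<Iterator<T>> iterators = new ArrayList<>();
        for (List<T> jobs : listsOfJobs) {
            iterators.add(jobs.iterator());
        }
        boolean tookOne = true;
        while (tookOne) {  // cycle until every delegator's list is drained
            tookOne = false;
            for (Iterator<T> it : iterators) {
                if (it.hasNext()) {
                    candidates.add(it.next());
                    tookOne = true;
                }
            }
        }
        return candidates;
    }
}
```

For example, interleaving `[a1, a2, a3]` and `[b1]` yields `[a1, b1, a2, a3]`: the second delegator is not starved by the first, and any tail candidates that don't fit in the bounded queue would then be rescheduled as described.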

Re: Job Manager Part 2

Adrian Crum-3
I just updated the schema with documentation that should help everyone
understand how to set up the Job Manager/Job Poller. It seems to me it
can accommodate the scenario you described.

-Adrian
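
For reference, applying the suggested values to the <thread-pool> element in serviceengine.xml would look roughly like this (a sketch: the surrounding attribute values shown are typical of the file but may differ between OFBiz versions):

```xml
<!-- serviceengine.xml (sketch): min-threads lowered to 2, max-threads to 5 -->
<thread-pool send-to-pool="pool"
             purge-job-days="4"
             failed-retry-min="3"
             ttl="120000"
             jobs="100"
             min-threads="2"
             max-threads="5"
             poll-enabled="true"
             poll-db-millis="30000">
    <run-from-pool name="pool"/>
</thread-pool>
```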

On 8/13/2012 2:42 AM, Brett Palmer wrote:

> Adrian,
>
> I think the single JobPoller is a good idea and reduces the chance of too
> many JobPollers running on a machine.
>
> We often set up multiple delegators to communicate with different databases.
>   For example, our data warehouse is hosted on a separate server.  These
> databases usually have a full OFBiz schema on them (JobSandbox table, etc.).
>
> Here is how our data warehouse process works:
>
> The application has several ofbiz servers talking to a primary database.
>   These servers contain all the user information for our application.  When
> a person logs into the application they are redirected to a secondary ofbiz
> server that is used for running the application under heavy loads.  The
> data is captured on the secondary server.
>
> A data warehouse process is scheduled to run every 5 mins on these
> secondary servers.  The secondary servers have a delegator that talks to
> its local database and a delegator to talk to the data warehouse.
>
> With the new job poller changes would the poller pick up jobs from the data
> warehouse database since it has a delegator that points to that instance?
>
> For this example, we would need to make sure the job poller on the
> secondary server only serviced jobs from its local database (default
> delegator) and not our configured OLAP delegator.
>
> Let me know if this will be possible with the new job poller, or if you have
> any questions about my scenario. I realize this scenario is not the typical
> OFBiz e-commerce setup everyone is used to, but we use OFBiz to
> create lots of different types of applications and have found it very flexible
> for creating just about any type of ERP application.
>
>
> Thanks for your work on the job poller.
>
>
> Brett
>
> On Sun, Aug 12, 2012 at 5:36 AM, Adrian Crum <
> [hidden email]> wrote:
>
>> On 8/5/2012 1:21 PM, Adrian Crum wrote:
>>
>>> On 8/5/2012 11:02 AM, Adrian Crum wrote:
>>>
>>>> I just committed a bunch of changes to the Job Manager group of classes.
>>>> The changes help simplify the code and hopefully make the Job Manager more
>>>> robust. On the other hand, I might have broken something. ;) I will monitor
>>>> the mailing list for problems.
>>>>
>>>> I believe the JobPoller settings in serviceengine.xml (the <thread-pool>
>>>> element) should be changed. I think min-threads should be set to "2" and
>>>> max-threads should be set to "5". Creating lots of threads can hurt
>>>> throughput because the JVM spends more time managing them. I would be
>>>> interested in hearing what others think.
>>>>
>>> Thinking about this more, there are some other things that need to be
>>> fixed:
>>>
>>> 1. The JobPoller uses an unbounded queue. In a busy server, there is the
>>> potential the queue will grow in size until it causes an out-of-memory
>>> condition.
>>> 2. There is no accommodation for when a job cannot be added to the queue
>>> - it is just lost. We could add a dequeue method to the Job interface that
>>> will allow implementations to recover or reschedule the job when it can't
>>> be added to the queue.
>>> 3. There is a JobPoller instance per delegator, and each instance
>>> contains the number of threads configured in serviceengine.xml. With the
>>> current max-threads setting of 15, a multi-tenant installation with 100
>>> tenants will create up to 1500 threads. (!!!) A smarter strategy might be
>>> to have a single JobPoller instance that services multiple JobManagers.
>>>
>> I fixed #1 and #2. I am considering working on #3, but I want some
>> feedback first.
>>
>> A JobPoller instance is created for each delegator. So, in a multi-tenant
>> or multi-delegator scenario, multiple JobPollers will be created - which
>> means one job queue per delegator and (threads per queue) threads per
>> delegator. In a multi-server installation, things are multiplied: (# of
>> servers * # of delegators) job queues. Fortunately, in that scenario we can
>> disable the JobPoller on all but one server.
>>
>> So, we are left with the potential problem of too many queues/threads
>> being created on a multi-delegator or multi-tenant server. So, I think
>> there should be one JobPoller instance that services all delegators. At
>> each polling interval, the JobPoller gets a list of jobs from each
>> delegator (JobManager) - creating a list of lists. Then the JobPoller
>> creates a queue candidate list from the list of lists - using a round-robin
>> approach so each delegator gets an equal opportunity to queue jobs. The
>> JobPoller queues the candidate list, and any candidates that don't fit in
>> the queue are rescheduled. With this approach the JobPoller can service any
>> number of delegators without saturating the server.
>>
>> What do you think?
>>
>> -Adrian
>>
>>
