Stalled jobs

Stalled jobs

Mike Baschky
Hey All,
    I've got an odd issue I'm trying to figure out. My service engine does
not appear to be running jobs anymore. It appears to be just sitting
there doing nothing (even for the standard OFBiz jobs like
purgeOldJobs). When I look into the JobSandbox table I see several jobs
in running status, but nothing is happening. These jobs are several
days old, so I'm guessing they are not really running. I've shut the
system down and restarted it, but still nothing seems to happen. I'm not
really seeing any error messages that help me out here. Thinking that
maybe I'd hit some sort of job limit, I blanked out the statusId on
several jobs and then cancelled them in WebTools; again, no luck on
kicking off the remaining jobs. One other item I noted in the thread
list: there are 5 sleeping default-invoker-thread-xxx threads at the
top of the page, but not all of these threads appear in the Java
threads listed below (not sure if this means anything).

    I'm not sure where to look next. Can anyone point me in the right
direction on how to track this issue down? Thanks.
 
-Mike

Re: Stalled jobs

Vince Clark
Mike

This won't give you an answer, but maybe some helpful insight. I have not had a problem with standard jobs but have had a problem with jobs we added to do synchronization with POS terminals. I've seen two situations where the system seems to leave a job in a "running" status and not clear it out and reschedule on a restart.
1) Server encounters a heap space (out of memory) error. - Solved by increasing max memory in startup.sh
2) Server (or POS client in our case) is stopped while a job is running. - Haven't implemented a solution but we are planning on a "graceful shutdown" on our POS terminals to make sure all entity sync jobs are finished before shutting down.
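
The "graceful shutdown" idea in point 2 can be illustrated with a minimal, generic sketch. This is not OFBiz or POS code: the `sync_job` function and the job IDs are hypothetical stand-ins, and the only point shown is that the process waits for every in-flight job to finish before it exits, so no job is left marked as running.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sync_job(job_id, finished):
    """Hypothetical stand-in for an entity sync job."""
    time.sleep(0.05)         # pretend to do some work
    finished.append(job_id)  # record that the job ran to completion


def run_and_shutdown_gracefully(job_ids):
    """Submit all jobs, then block until every one has completed."""
    finished = []
    pool = ThreadPoolExecutor(max_workers=2)
    for job_id in job_ids:
        pool.submit(sync_job, job_id, finished)
    # wait=True blocks until all submitted jobs are done, so nothing
    # is interrupted halfway when the process goes on to exit.
    pool.shutdown(wait=True)
    return sorted(finished)
```

Calling `run_and_shutdown_gracefully(["a", "b", "c"])` returns only after all three jobs have completed. A real shutdown hook would probably also need a timeout so that a hung job cannot block the shutdown forever.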


----- Original Message -----
From: "Mike Baschky" <[hidden email]>
To: [hidden email]
Sent: Friday, January 18, 2008 10:09:51 AM (GMT-0700) America/Denver
Subject: Stalled jobs


RE: Stalled jobs

Mike Baschky
Thanks Vince. My guess is my problem is in line with your second point -
server stopped while the jobs are running. In this case did you delete
the jobs from the JobSandbox entity and manually re-schedule?

-----Original Message-----
From: Vince M. Clark [mailto:[hidden email]]
Sent: Friday, January 18, 2008 11:17 AM
To: [hidden email]
Subject: Re: Stalled jobs


Re: Stalled jobs

David E Jones

Right now the service engine will look at running jobs on startup and
do some resetting to put them back into the persisted job queue.

For long-term scheduled jobs it is important to watch them around
these events, and restart them manually if needed (which can be done
from WebTools unless you have an older version). When manually killing
a job, it's good to look at the thread pool (in WebTools too) of all
servers running against the database to make sure the job isn't still
running. For some services it won't do any harm to have multiple copies
running, but it will affect performance and server load.

-David
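
The startup reset described above can be pictured with a small in-memory sketch. It is an illustration under stated assumptions, not the service engine's actual code: the `Job` class, the `run_by_instance_id` field, and the status names `SERVICE_RUNNING` and `SERVICE_PENDING` are assumed for the example. Filtering by instance is also an assumption, added because several servers may run against the same database and one server should not reset a job that another is really running.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed status names, used here for illustration only.
RUNNING = "SERVICE_RUNNING"
PENDING = "SERVICE_PENDING"


@dataclass
class Job:
    job_id: str
    status_id: str
    run_by_instance_id: Optional[str] = None  # hypothetical field


def reset_orphaned_jobs(jobs: List[Job], instance_id: str) -> List[str]:
    """Put jobs that this instance left in RUNNING back into the queue."""
    reset_ids = []
    for job in jobs:
        if job.status_id == RUNNING and job.run_by_instance_id == instance_id:
            job.status_id = PENDING        # eligible to be picked up again
            job.run_by_instance_id = None  # no instance owns it now
            reset_ids.append(job.job_id)
    return reset_ids
```

Jobs owned by another instance are left alone, which matches the advice to check the thread pool of every server before killing a job by hand.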



Re: Stalled jobs

Vince Clark
In reply to this post by Mike Baschky
Yes, I had to delete the record showing "running" from the sandbox and reschedule. It has been a while since this happened so my memory is a bit vague. I may have also deleted some records from the entity sync tables that tracked when the last sync occurred. But that would be specific to entity sync jobs only.
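
The delete-and-reschedule step can be sketched against plain dictionaries rather than a real database. Everything below is an assumption for illustration: the field names other than `statusId`, the status values, the `-retry` suffix, and the one-day age threshold are hypothetical, and a real cleanup would go through WebTools or the entity engine instead.

```python
from datetime import datetime, timedelta


def replace_stale_jobs(jobs, now, max_age):
    """Drop 'running' records older than max_age and queue replacements."""
    kept, rescheduled = [], []
    for job in jobs:
        stale = (
            job["statusId"] == "SERVICE_RUNNING"
            and now - job["startDateTime"] > max_age
        )
        if stale:
            # The stale record is dropped and a fresh pending one is queued.
            rescheduled.append({
                "jobId": job["jobId"] + "-retry",  # hypothetical naming
                "serviceName": job["serviceName"],
                "statusId": "SERVICE_PENDING",
                "startDateTime": None,
            })
        else:
            kept.append(job)
    return kept + rescheduled
```

Choosing the threshold matters: it should be longer than the longest legitimate run of any job, otherwise a job that is still working gets a second copy scheduled alongside it.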

----- Original Message -----
From: "Mike Baschky" <[hidden email]>
To: [hidden email]
Sent: Friday, January 18, 2008 10:45:19 AM (GMT-0700) America/Denver
Subject: RE: Stalled jobs


RE: Stalled jobs

Mike Baschky
In reply to this post by David E Jones
Thanks David.

-----Original Message-----
From: David E Jones [mailto:[hidden email]]
Sent: Friday, January 18, 2008 11:51 AM
To: [hidden email]
Subject: Re: Stalled jobs



Re: Stalled jobs

BJ Freeman
In reply to this post by Mike Baschky
Please state your version of OFBiz and the SVN revision number.


RE: Stalled jobs

Mike Baschky
The version is approximately version 4 (or pre 4). I don't remember the
actual svn number because we pulled it into our own svn system. It was
pulled around the middle of March of last year (2007).

-----Original Message-----
From: BJ Freeman [mailto:[hidden email]]
Sent: Friday, January 18, 2008 12:40 PM
To: [hidden email]
Subject: Re: Stalled jobs


Re: Stalled jobs

BJ Freeman
In September of last year, Andy did some work on the service engine:
http://svn.apache.org/viewvc?view=rev&revision=575074
http://svn.apache.org/viewvc?view=rev&revision=577097
Not sure if this is related.


Mike Baschky sent the following on 1/18/2008 12:04 PM:

> The version is approximately version 4 (or pre 4). I don't remember the
> actual svn number because we pulled it into our own svn system. It was
> pulled around the middle of March of last year (2007).