[jira] Created: (OFBIZ-3583) Resolve two issues with scheduled jobs related to clean-up

Nicolas Malin (Jira)
Resolve two issues with scheduled jobs related to clean-up
----------------------------------------------------------

                 Key: OFBIZ-3583
                 URL: https://issues.apache.org/jira/browse/OFBIZ-3583
             Project: OFBiz
          Issue Type: Bug
          Components: framework
            Reporter: Bob Morley
         Attachments: OFBIZ-3583_FixsToScheduledJobCleanup.patch

Encountered two problems --

1) If a semaphore service is executing when the application server goes down (see purgeOldJobs), reloadCrashedJobs takes over and marks the job as CRASHED.  However, it does not clean up the ServiceSemaphore record, which causes all future jobs for that service to either fail immediately or wait (until the semaphore timeout) and then fail.

2) When ServiceUtil.purgeOldJobs is invoked it blindly attempts to delete runtimeData and then rolls back if the delete fails (which it always does when other jobs reference the same runtimeData).  This produces a service error log message for what is really typical behavior.

Solutions --

1) When reloading crashed jobs, we look for a rogue ServiceSemaphore for the service name and purge it (on start-up).  This works with multiple application servers because any crashed job would leave its semaphore behind, and no other application server running the JobManager could have created it (they would have been blocked from executing).
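The start-up clean-up can be sketched as follows. This is a minimal illustration using plain Java collections as stand-ins for the JobSandbox and ServiceSemaphore entities; the class and field names are hypothetical, not the actual OFBiz entity-engine API.

```java
import java.util.*;

// Sketch of reloadCrashedJobs at start-up: mark each in-flight job as
// CRASHED and purge any ServiceSemaphore record the crash left behind.
public class CrashedJobReload {
    // service names of jobs that were RUNNING when the server went down
    static List<String> crashedJobs = new ArrayList<>();
    // serviceName -> lock holder (stand-in for the ServiceSemaphore table)
    static Map<String, String> serviceSemaphores = new HashMap<>();
    // serviceName -> job status (stand-in for the JobSandbox status column)
    static Map<String, String> jobStatus = new HashMap<>();

    static void reloadCrashedJobs() {
        for (String serviceName : crashedJobs) {
            jobStatus.put(serviceName, "SERVICE_CRASHED");
            // A crashed semaphore service always leaves its semaphore behind,
            // and no other server could have created one for this service
            // (it would have been blocked), so purging here is safe.
            serviceSemaphores.remove(serviceName);
        }
        crashedJobs.clear();
    }
}
```

Without the `serviceSemaphores.remove` call, the stale record would survive restart and block every subsequent run of the service, which is exactly the failure described above.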

2) In purgeOldJobs I changed the collection of runtimeDataIds from a List to a Set (this removes the redundant delete requests).  When attempting the delete I do a "quick" count on the JobSandbox table to see whether any jobs still reference the particular RuntimeData instance, and only attempt the delete when no jobs remain.  There is an existing index on JobSandbox for runtimeDataId, so this count should perform relatively quickly.
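The count-before-delete logic can be sketched like this, again with plain maps standing in for the JobSandbox and RuntimeData tables (the names are illustrative, not the real OFBiz API):

```java
import java.util.*;

// Sketch of the purgeOldJobs fix: dedupe runtimeDataIds via a Set, and
// delete a RuntimeData row only when no remaining job still references it.
public class RuntimeDataPurge {
    // jobId -> runtimeDataId (stand-in for the JobSandbox table)
    static Map<String, String> jobSandbox = new HashMap<>();
    // runtimeDataId -> payload (stand-in for the RuntimeData table)
    static Map<String, String> runtimeData = new HashMap<>();

    // Purge the given jobs; returns how many RuntimeData rows were deleted.
    static int purgeJobs(Collection<String> jobIds) {
        // Set instead of List: drops redundant delete requests up front.
        Set<String> runtimeDataIds = new HashSet<>();
        for (String jobId : jobIds) {
            String rdId = jobSandbox.remove(jobId);
            if (rdId != null) runtimeDataIds.add(rdId);
        }
        int deleted = 0;
        for (String rdId : runtimeDataIds) {
            // "Quick" count: do any remaining jobs still reference this
            // RuntimeData?  (The real query uses the runtimeDataId index.)
            long remaining = jobSandbox.values().stream()
                    .filter(rdId::equals).count();
            if (remaining == 0 && runtimeData.remove(rdId) != null) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

Because the delete is only attempted when the count comes back zero, the routine never hits the constraint violation that previously forced a rollback and an error log entry.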

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (OFBIZ-3583) Resolve two issues with scheduled jobs related to clean-up

Nicolas Malin (Jira)

     [ https://issues.apache.org/jira/browse/OFBIZ-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bob Morley updated OFBIZ-3583:
------------------------------

    Attachment: OFBIZ-3583_FixsToScheduledJobCleanup.patch



[jira] Resolved: (OFBIZ-3583) Resolve two issues with scheduled jobs related to clean-up

Nicolas Malin (Jira)

     [ https://issues.apache.org/jira/browse/OFBIZ-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bob Morley resolved OFBIZ-3583.
-------------------------------

       Resolution: Fixed
    Fix Version/s: SVN trunk



[jira] Reopened: (OFBIZ-3583) Resolve two issues with scheduled jobs related to clean-up

Nicolas Malin (Jira)

     [ https://issues.apache.org/jira/browse/OFBIZ-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bob Morley reopened OFBIZ-3583:
-------------------------------

