Josh,
I'm attaching the patch I used to work around this issue. This is based on an older version of ofbiz so I would compare your current files carefully. The following files were patched:

service-config.xsd
serviceengine.xml
JobManager.java
JobPoller.java

The patch allowed for a new configuration option:

poll-transaction-timeout="300"

I'm pretty sure that I was using 300 seconds for the poll-transaction-timeout. I believe the default is 60 or 120 seconds.
I originally created JIRA issue 3855 for this problem. If you set the transaction timeout too high, then when the next poller wakes up to process new requests it will time out, because the first poller still has a lock on the table (or the ofbiz semaphore method).
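A minimal sketch of the sanity check described above, assuming the poller settings live as attributes on a `thread-pool` element (an assumption about the file layout) and that `poll-transaction-timeout` is only present when the attached patch has been applied. The class name, helper names, and the 60-second fallback are hypothetical and are not part of ofbiz.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

final class PollerConfigCheck {
    // Assumed fallback; the thread only says the default is "60 or 120 seconds".
    static final int DEFAULT_TX_TIMEOUT_SECONDS = 60;

    /** Parses a small XML fragment and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration fragment", e);
        }
    }

    /** Reads an integer attribute, using the fallback when it is missing or blank. */
    static int intAttr(Element el, String name, int fallback) {
        String raw = el.getAttribute(name).trim(); // getAttribute returns "" when absent
        return raw.isEmpty() ? fallback : Integer.parseInt(raw);
    }

    /** True when the transaction timeout (seconds) does not exceed the poll interval (ms). */
    static boolean timeoutFitsInterval(int timeoutSeconds, int pollDbMillis) {
        return timeoutSeconds * 1000L <= pollDbMillis;
    }

    public static void main(String[] args) {
        // Element name is an assumption; attribute values are the ones quoted in this thread.
        Element pool = parse("<thread-pool poll-db-millis=\"20000\" poll-transaction-timeout=\"300\"/>");
        int timeout = intAttr(pool, "poll-transaction-timeout", DEFAULT_TX_TIMEOUT_SECONDS);
        int interval = intAttr(pool, "poll-db-millis", 30000);
        if (!timeoutFitsInterval(timeout, interval)) {
            System.out.println("Warning: timeout of " + timeout
                    + " s is longer than the poll interval of " + interval + " ms");
        }
    }
}
```

The check only compares the two numbers; whether a longer timeout actually causes contention depends on how the poller schedules itself in your version.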
Here are a couple of other options you could try since the number of pending jobs is so high.

1. Create a temporary status for the JobSandbox statusId and assign a large set of the pending jobs to this status. Then only process a few thousand at a time: incrementally change the parked jobs back to pending so the service engine can process them in reasonable batches. I haven't tried this option, but it would allow you to work with the service engine without modifying any code.

2. Start up several more instances of ofbiz, all pointing to the same database. Each will start a service process so that more requests are processed in parallel. This probably won't work without the patch I've attached, as each service process would still time out and not allow the other processes to start.
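Option 1 can be sketched as follows. This is an in-memory illustration only: in practice the two steps would be UPDATE statements against the JobSandbox table. `SERVICE_PENDING` is assumed to be the pending status id, and `SERVICE_HOLD` is a hypothetical temporary status that you would have to create yourself.

```java
import java.util.ArrayList;
import java.util.List;

final class JobBatchRelease {
    static final String PENDING = "SERVICE_PENDING"; // assumed pending status id
    static final String HOLD = "SERVICE_HOLD";       // hypothetical temporary status

    /** Minimal stand-in for a JobSandbox row. */
    static final class Job {
        final String jobId;
        String statusId;

        Job(String jobId, String statusId) {
            this.jobId = jobId;
            this.statusId = statusId;
        }
    }

    /** Moves every pending job to the hold status; returns how many were parked. */
    static int parkAll(List<Job> jobs) {
        int parked = 0;
        for (Job job : jobs) {
            if (PENDING.equals(job.statusId)) {
                job.statusId = HOLD;
                parked++;
            }
        }
        return parked;
    }

    /** Moves at most batchSize held jobs back to pending; returns how many were released. */
    static int releaseBatch(List<Job> jobs, int batchSize) {
        int released = 0;
        for (Job job : jobs) {
            if (released >= batchSize) {
                break;
            }
            if (HOLD.equals(job.statusId)) {
                job.statusId = PENDING;
                released++;
            }
        }
        return released;
    }

    /** Counts the jobs currently in the given status. */
    static int count(List<Job> jobs, String statusId) {
        int n = 0;
        for (Job job : jobs) {
            if (statusId.equals(job.statusId)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            jobs.add(new Job("job-" + i, PENDING));
        }
        System.out.println("parked: " + parkAll(jobs));
        // Release a small batch, let the service engine drain it, then repeat.
        System.out.println("released: " + releaseBatch(jobs, 3));
        System.out.println("still held: " + count(jobs, HOLD));
    }
}
```

Each release should wait until the previous batch has been processed, otherwise the pending count simply builds up again.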
Good luck,
Brett

On Wed, Jul 13, 2011 at 8:10 PM, Josh Jacobson <[hidden email]> wrote:
> Thanks again. I actually meant a suggestion for the transaction

Attachment: ofbiz_jobpoll_tx_deploy_patch.jar (18K)
I tried 60 seconds for the timeout but that didn't work. I guess I'll double it now and keep trying.

I have about 260,000 pending jobs, and nothing is getting done.

I know what you mean about purgeOldJobs. That service is crashed now and I deleted old jobs from the database by hand. I was up to 2.6 million rows. Ofbiz was pretty much unusable.

If you have any other suggestions I'd love to hear them.

On Wednesday, July 13, 2011, Scott Gray <[hidden email]> wrote:
> Ah okay, that is entirely dependent on the number of jobs and the speed the server can process them. As a side note I would keep a close eye on the purgeOldJobs service, when it starts falling over (transaction timeout again) then the number of rows in the table will increase quickly which in turn will slow down polling.
>
> In general the whole persisted jobs implementation is a bit fragile, especially when dealing with a large number of jobs. I've wanted to replace it with something like quartz for a while but haven't had the time.
>
> Regards
> Scott
>
> On 14/07/2011, at 2:10 PM, Josh Jacobson wrote:
>
>> Thanks again. I actually meant a suggestion for the transaction
>> timeout. In any case I am grateful for your explanation.
>>
>> On Wednesday, July 13, 2011, Scott Gray <[hidden email]> wrote:
>>> As best I can tell there shouldn't be any need to increase the interval between polls since the interval timer doesn't actually start until the previous poll has completed (see JobPoller.run()) so I can't see how a small interval would cause any backlog problems.
>>>
>>> I'm guessing if there is any lock contention then it's probably caused by the executing jobs trying to update their respective rows while the poller is holding a table lock. So from that point of view I guess increasing the interval could reduce the amount of contention between the executing jobs and the next poll.
>>>
>>> Regards
>>> Scott
>>>
>>> On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:
>>>
>>>> Scott,
>>>>
>>>> Thanks! That is very precise advice. Do you have a suggestion on
>>>> interval time? 60 seconds? 120?
>>>>
>>>> Thanks,
>>>>
>>>> On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray <[hidden email]> wrote:
>>>>> That configuration is for the frequency of job polls. There isn't any ability to specify the transaction timeout via configuration so you'll need to modify the code directly:
>>>>> JobManager.java (line 148):
>>>>> beganTransaction = TransactionUtil.begin();
>>>>> needs to be changed to use TransactionUtil.begin(int)
>>>>>
>>>>> Regards
>>>>> Scott
>>>>>
>>>>> HotWax Media
>>>>> http://www.hotwaxmedia.com
>>>>>
>>>>> On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
>>>>>
>>>>>> Brett,
>>>>>>
>>>>>> Before I start trying to run the jobs manually, I want to give your
>>>>>> suggestion a try. I think I know where to configure the job polling
>>>>>> transaction time (I believe it's the poll-db-millis="20000" value on
>>>>>> the framework/service/config/serviceengine.xml).
>>>>>>
>>>>>> However, I still don't know what to increase it to. I understand that
>>>>>> we wouldn't want to make it bigger than the default polling interval.
>>>>>> Do you know what the default interval between polling is?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer <[hidden email]> wrote:
>>>>>>> I meant removing finished jobs. If you have thousands of pending jobs then
>>>>>>> you will have the same problem I mentioned in my first email. One
>>>>>>> resolution will be to increase the job poller transaction time. In the
>>>>>>> ofbiz version I was using there was not a way to configure the poller
>>>>>>> transaction time. It just used the default time. I had to create a patch
>>>>>>> to allow this to happen.
>>>>>>>
>>>>>>> In the patch you had to be careful to not increase the transaction time
>>>>>>> greater than the frequency of the job poller. Otherwise you get into a lock
>>>>>>> situation where one job poller is still running within a transaction and
>>>>>>> another poller starts. This didn't create a huge problem but the second job
>>>>>>> poller would usually lock and then time out.
>>>>>>>
>>>>>>> Brett
>>>>>>>
>>>>>>> On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson <[hidden email]> wrote:
>>>>>>>
>>>>>>>> Brett,
Not sure what db you're using but it probably wouldn't hurt to run a vacuum on the table to speed up processing.

By the way, I'm pretty sure the default timeout is 60 seconds so you might want to try something a little larger :-)

Regards
Scott
Vacuum has been run (took quite a while). Yeah, I see now that the JobManager actually tries to update all the JobSandbox rows in the transaction, so 60 seconds was pretty low.

I am trying 10 minutes now and will see how that goes.

I am using PostgreSQL, by the way.

Thanks for the help, I really appreciate it.

--
Josh.
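A rough back-of-the-envelope check of why 60 seconds was too low, assuming the poller updates every pending row inside a single transaction as described above. The per-row cost of 1 ms is purely an assumption for illustration; measure it on your own database. The class and method names are hypothetical.

```java
final class PollTimeoutEstimate {
    /** Smallest whole number of seconds that covers rows * perRowMillis. */
    static long requiredTimeoutSeconds(long rows, double perRowMillis) {
        double totalMillis = rows * perRowMillis;
        return (long) Math.ceil(totalMillis / 1000.0);
    }

    public static void main(String[] args) {
        long pendingJobs = 260_000;       // figure reported in this thread
        double assumedPerRowMillis = 1.0; // assumption, not a measured value
        long needed = requiredTimeoutSeconds(pendingJobs, assumedPerRowMillis);
        System.out.println("Estimated minimum timeout: " + needed + " s");
        System.out.println("Fits in 60 s?  " + (needed <= 60));
        System.out.println("Fits in 600 s? " + (needed <= 600));
    }
}
```

Under that assumption the update alone needs about 260 seconds, so a 60-second timeout cannot succeed while a 10-minute one leaves some headroom. A slower per-row cost shrinks that headroom proportionally.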
You are going to run into this from time to time for one reason or another. The approach I took was to spread the jobs out so they are not lumped together. Take a look at how the jobs are marshalled to be run.
One feature that would help to prevent this problem in the future is a configuration parameter in the service engine that would set the maximum number of jobs the poller would process at a time. Right now the poller reads the JobSandbox and gets every job that has a status of Pending. Then it tries to change the status for each of these to running (or something like that). If the number of pending jobs is too large the poller will time out before it can change the state of all the pending jobs. Changing the transaction timeout can help this problem but having another configuration like "max-poll-jobs" could limit the number of pending jobs that are processed in one transaction. There is a configuration called "jobs" but I don't think that is used by the polling process.

I've tried to use the service engine as an asynchronous batch server but run into problems when the number of pending jobs gets around 10,000.

Brett
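The proposed limit could look roughly like the sketch below. It is an in-memory illustration, not the actual poller code: `max-poll-jobs` does not exist in ofbiz as far as this thread says, and `claimBatch` and the queue of job ids are hypothetical stand-ins for the JobSandbox query and status update.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BoundedPoller {
    /** Removes and returns at most maxPollJobs job ids from the head of the pending queue. */
    static List<String> claimBatch(Deque<String> pending, int maxPollJobs) {
        if (maxPollJobs <= 0) {
            throw new IllegalArgumentException("maxPollJobs must be positive");
        }
        List<String> claimed = new ArrayList<>();
        while (claimed.size() < maxPollJobs && !pending.isEmpty()) {
            claimed.add(pending.pollFirst());
        }
        return claimed;
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>();
        for (int i = 1; i <= 25; i++) {
            pending.add("job-" + i);
        }
        int poll = 0;
        // Each iteration stands for one poll, that is, one bounded transaction.
        while (!pending.isEmpty()) {
            List<String> batch = claimBatch(pending, 10);
            poll++;
            System.out.println("poll " + poll + " claimed " + batch.size() + " jobs");
        }
    }
}
```

Because each poll touches a bounded number of rows, the time spent inside one transaction no longer grows with the size of the backlog.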
I find that anything not time based does not work when, like you said
the numbers get large.
I added the createtime to the conditions currently set in the milliseconds.

Brett Palmer sent the following on 7/14/2011 5:35 AM:
> One feature that would help to prevent this problem in the future is a
> configuration parameter in the service engine that would set the maximum
> number of jobs the poller would process at a time. Right now the poller
> reads the JobSandbox and gets every job that has a status of Pending. Then
> it tries to change the status for each of these to running (or something
> like that). If the number of pending jobs is too large the poller will time
> out before it can change the state of all the pending jobs. Changing the
> transaction timeout can help this problem but having another configuration
> like "max-poll-jobs" could limit the number of pending jobs that are
> processed in one transaction. There is a configuration called "jobs" but I
> don't think that is used by the polling process.
>
> I've tried to use the service engine as an asynchronous batch server but run
> into problems when the number of pending jobs gets around 10,000.
>
> Brett
>
> On Wed, Jul 13, 2011 at 10:34 PM, BJ Freeman <[hidden email]> wrote:
>
>> You're going to run into this from time to time for one reason or another.
>> The approach I took was to spread the jobs out so they are not lumped
>> together.
>> Take a look at how the jobs are marshalled to be run.
>>
>> Josh Jacobson sent the following on 7/13/2011 8:35 PM:
>>> Vacuum has been run (took quite a while). Yeah, I see now that the
>>> JobManager actually tries to update all the JobSandbox rows in the
>>> transaction, so 60 seconds was pretty low.
>>>
>>> I am trying 10 minutes now and will see how that goes.
>>>
>>> I am using Postgres, by the way.
>>>
>>> Thanks for the help, I really appreciate it.
>>>
>>> --
>>> Josh.
>>>
>>> On Wed, Jul 13, 2011 at 8:29 PM, Scott Gray <[hidden email]> wrote:
>>>> Not sure what db you're using but it probably wouldn't hurt to run a
>>>> vacuum on the table to speed up processing.
>>>>
>>>> By the way, I'm pretty sure the default timeout is 60 seconds so you
>>>> might want to try something a little larger :-)
>>>>
>>>> Regards
>>>> Scott
>>>>
>>>> On 14/07/2011, at 2:58 PM, Josh Jacobson wrote:
>>>>
>>>>> I tried 60 seconds for timeout but that didn't work. I guess I'll
>>>>> double it now and keep trying.
>>>>>
>>>>> I have about 260,000 pending jobs, and nothing is getting done.
>>>>>
>>>>> I know what you mean about purgeOldJobs. That service is crashed now
>>>>> and I deleted old jobs from the database by hand. I was up to 2.6
>>>>> million rows. Ofbiz was pretty much unusable.
>>>>>
>>>>> If you have any other suggestions I'd love to hear them.
>>>>>
>>>>> On Wednesday, July 13, 2011, Scott Gray <[hidden email]> wrote:
>>>>>> Ah okay, that is entirely dependent on the number of jobs and the
>>>>>> speed the server can process them. As a side note I would keep a close eye
>>>>>> on the purgeOldJobs service, when it starts falling over (transaction
>>>>>> timeout again) then the number of rows in the table will increase quickly
>>>>>> which in turn will slow down polling.
>>>>>>
>>>>>> In general the whole persisted jobs implementation is a bit fragile,
>>>>>> especially when dealing with a large number of jobs. I've wanted to replace
>>>>>> it with something like quartz for a while but haven't had the time.
>>>>>>
>>>>>> Regards
>>>>>> Scott
>>>>>>
>>>>>> On 14/07/2011, at 2:10 PM, Josh Jacobson wrote:
>>>>>>
>>>>>>> Thanks again. I actually meant a suggestion for the transaction
>>>>>>> timeout. In any case I am grateful for your explanation.
>>>>>>>
>>>>>>> On Wednesday, July 13, 2011, Scott Gray <[hidden email]> wrote:
>>>>>>>> As best I can tell there shouldn't be any need to increase the
>>>>>>>> interval between polls since the interval timer doesn't actually start until
>>>>>>>> the previous poll has completed (see JobPoller.run()) so I can't see how a
>>>>>>>> small interval would cause any backlog problems.
>>>>>>>>
>>>>>>>> I'm guessing if there is any lock contention then it's probably
>>>>>>>> caused by the executing jobs trying to update their respective rows while
>>>>>>>> the poller is holding a table lock. So from that point of view I guess
>>>>>>>> increasing the interval could reduce the amount of contention between the
>>>>>>>> executing jobs and the next poll.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Scott
>>>>>>>>
>>>>>>>> On 14/07/2011, at 1:02 PM, Josh Jacobson wrote:
>>>>>>>>
>>>>>>>>> Scott,
>>>>>>>>>
>>>>>>>>> Thanks! That is very precise advice. Do you have a suggestion on
>>>>>>>>> interval time? 60 seconds? 120?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> On Wed, Jul 13, 2011 at 5:34 PM, Scott Gray <[hidden email]> wrote:
>>>>>>>>>> That configuration is for the frequency of job polls. There isn't
>>>>>>>>>> any ability to specify the transaction timeout via configuration so you'll
>>>>>>>>>> need to modify the code directly:
>>>>>>>>>> JobManager.java (line 148):
>>>>>>>>>> beganTransaction = TransactionUtil.begin();
>>>>>>>>>> needs to be changed to use TransactionUtil.begin(int)
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Scott
>>>>>>>>>>
>>>>>>>>>> HotWax Media
>>>>>>>>>> http://www.hotwaxmedia.com
>>>>>>>>>>
>>>>>>>>>> On 14/07/2011, at 12:23 PM, Josh Jacobson wrote:
>>>>>>>>>>
>>>>>>>>>>> Brett,
>>>>>>>>>>>
>>>>>>>>>>> Before I start trying to run the jobs manually, I want to give your
>>>>>>>>>>> suggestion a try. I think I know where to configure the job polling
>>>>>>>>>>> transaction time (I believe it's the poll-db-millis="20000" value in
>>>>>>>>>>> framework/service/config/serviceengine.xml).
>>>>>>>>>>>
>>>>>>>>>>> However, I still don't know what to increase it to. I understand that
>>>>>>>>>>> we wouldn't want to make it bigger than the default polling interval.
>>>>>>>>>>> Do you know what the default interval between polling is?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 13, 2011 at 12:31 PM, Brett Palmer <[hidden email]> wrote:
>>>>>>>>>>>> I meant removing finished jobs. If you have thousands of pending jobs then
>>>>>>>>>>>> you will have the same problem I mentioned in my first email. One
>>>>>>>>>>>> resolution will be to increase the job poller transaction time. In the
>>>>>>>>>>>> ofbiz version I was using there was not a way to configure the poller
>>>>>>>>>>>> transaction time. It just used the default time. I had to create a patch
>>>>>>>>>>>> to allow this to happen.
>>>>>>>>>>>>
>>>>>>>>>>>> In the patch you had to be careful to not increase the transaction time
>>>>>>>>>>>> greater than the frequency of the job poller. Otherwise you get into a lock
>>>>>>>>>>>> situation where one job poller is still running within a transaction and
>>>>>>>>>>>> another poller starts. This didn't create a huge problem but the second job
>>>>>>>>>>>> poller would usually lock and then time out.
>>>>>>>>>>>>
>>>>>>>>>>>> Brett
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 13, 2011 at 1:15 PM, Josh Jacobson <[hidden email]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Brett,
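Brett's "max-poll-jobs" idea quoted above can be illustrated with a small, self-contained sketch. This is not OFBiz code: the class, the Job type, the QUEUED status name and the claimBatch method are all made up for illustration, and an in-memory list stands in for the JobSandbox table. The point is only that each poll claims a bounded number of pending jobs, so the work done inside one transaction stays roughly constant however large the backlog grows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a "max-poll-jobs" limit; none of these names exist in OFBiz. */
class BatchedPollSketch {
    enum Status { PENDING, QUEUED }

    /** Stand-in for one JobSandbox row. */
    static final class Job {
        final String id;
        Status status = Status.PENDING;
        Job(String id) { this.id = id; }
    }

    /**
     * Marks at most maxPollJobs pending jobs as QUEUED and returns them.
     * In a real poller this loop would run inside a single transaction,
     * so the limit bounds how much work that transaction has to do.
     */
    static List<Job> claimBatch(List<Job> sandbox, int maxPollJobs) {
        List<Job> claimed = new ArrayList<>();
        for (Job job : sandbox) {
            if (claimed.size() >= maxPollJobs) {
                break; // leave the rest for the next poll
            }
            if (job.status == Status.PENDING) {
                job.status = Status.QUEUED;
                claimed.add(job);
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        List<Job> sandbox = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            sandbox.add(new Job("job-" + i));
        }
        int polls = 0;
        // Keep polling until a poll finds nothing left to claim.
        while (!claimBatch(sandbox, 4).isEmpty()) {
            polls++;
        }
        System.out.println("polls needed: " + polls);
    }
}
```

With 10 pending jobs and a limit of 4, the backlog drains over three small polls rather than one large one. A real implementation would also need an ordering (for example by run time) and a query-level row limit, which this sketch leaves out.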
I should add that the environment also has a lot to do with this.
In this area I have changed to Solid State Drives for Storage and 32gb SDHC for Swap files.
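Brett's caution quoted earlier, about not raising the poller's transaction timeout above the poll frequency, mixes two units: poll-db-millis is in milliseconds, while the timeout in his patch is described in seconds. A minimal sketch of that check follows. The helper clampTimeoutSeconds is hypothetical and not part of OFBiz. Scott also points out in the thread that a single poller does not start its interval timer until the previous poll has finished, so this limit may matter less for a single instance than for several instances sharing one database.

```java
/** Hypothetical helper for comparing a transaction timeout with the poll interval. */
class PollTimeoutSketch {
    /**
     * Returns the requested timeout in seconds, reduced if necessary so that it
     * does not exceed the poll interval (given in milliseconds, like poll-db-millis).
     */
    static int clampTimeoutSeconds(int requestedSeconds, long pollDbMillis) {
        if (requestedSeconds <= 0 || pollDbMillis <= 0) {
            throw new IllegalArgumentException("timeout and interval must be positive");
        }
        // Convert milliseconds to whole seconds, never going below one second.
        long intervalSeconds = Math.max(1L, pollDbMillis / 1000L);
        return (int) Math.min((long) requestedSeconds, intervalSeconds);
    }

    public static void main(String[] args) {
        // poll-db-millis="20000" is a 20 second interval, so 300 is clamped to 20.
        System.out.println(clampTimeoutSeconds(300, 20000L));
        // A 10 minute interval leaves a 300 second timeout untouched.
        System.out.println(clampTimeoutSeconds(300, 600000L));
    }
}
```

Under this rule the poll-db-millis="20000" value mentioned in the thread would cap the timeout at 20 seconds, well below the 300 seconds Brett used, so the interval would have to be raised first if the rule is followed strictly.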