As Adrian and I previously discussed, he said he had discovered some possible problems with SequenceUtil in multi-threaded situations. He discovered this when he made EntityDataLoadContainer load each xml file in a thread.

I've recently done the same on my local copy, but I don't see any problems. What I did see, however, was that just throwing every xml data file into a thread (actually, a 4-thread pool) produced errors loading some files, because each file has an implicit dependency on some other set of files, and those files hadn't been loaded yet.

So, before doing a threaded load, the files would have to have an explicit dependency listed, so that correct ordering could be done. This is not something that would make ofbiz easier to use.

Trying to figure out the implicit dependencies automatically by comparing each entity line isn't worthwhile, as that would be reimplementing a database, and what would be the point?

So, Adrian, if you have any more pointers as to what your original change did, I'd appreciate any insight you might have. Otherwise, I will say that we can't load data in parallel.

Additionally, I suspected that SequenceUtil actually *didn't* have any problems. I wrote a test case quite a while back that did multi-threaded testing of SequenceUtil, and it never had any problems. It used 100 threads, with each thread trying to allocate 1000 sequence values.
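For readers who want to picture the kind of test described above (100 threads, 1000 values each), here is a minimal sketch. It is not the original test case, and `nextSeqId()` is a hypothetical stand-in backed by an `AtomicLong`, not the real SequenceUtil API. The check is simply that no value is handed out twice.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

class SequenceStressSketch {
    // Hypothetical stand-in for a sequence allocator; this is NOT the real SequenceUtil.
    private static final AtomicLong NEXT_ID = new AtomicLong(10000);

    static long nextSeqId() {
        return NEXT_ID.getAndIncrement();
    }

    // Starts `threads` threads that each allocate `perThread` values and
    // returns how many distinct values were seen overall.
    static int countDistinct(int threads, int perThread) {
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all threads together to maximise contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int j = 0; j < perThread; j++) {
                    seen.add(nextSeqId());
                }
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Any duplicate allocation would make this smaller than threads * perThread.
        System.out.println("distinct values: " + countDistinct(100, 1000));
    }
}
```

Swapping the stand-in for a call into the class under test would turn this into a real regression check; a result below `threads * perThread` means two threads received the same value.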
Adam Heath wrote:
> So, Adrian, if you have any more pointers as to what your original change did, I'd appreciate any insight you might have. Otherwise, I will say that we can't load data in parallel.
>
> Additionally, I suspected that SequenceUtil actually *didn't* have any problems. I wrote a test case quite a while back that did multi-threaded testing of SequenceUtil, and it never had any problems. It used 100 threads, with each thread trying to allocate 1000 sequence values.

I ran my patch against your recent changes and the errors went away. I guess we can consider that issue resolved.

As far as the approach I took to multi-threading the data load - here is an overview:

I was able to run certain tasks in parallel - creating entities and creating primary keys, for example. I have the number of threads allocated configured in a properties file. By tweaking that number I was able to increase CPU utilization and reduce the creation time. Of course there was a threshold where CPU utilization was raised and creation time decreased - due to thread thrash.

Creating foreign keys must be run on a single thread to prevent database deadlocks.

I multi-threaded the data load by having one thread parse the XML files and put the results in a queue. Another thread services the queue and loads the data. I also multi-threaded the EECAs - but that has an issue I need to solve.

My original goal was to reduce the ant clean-all + ant run-install cycle time. I recently purchased a much faster development machine that completes the cycle in about 2 minutes - slightly longer than the multi-threaded code, so I don't have much of an incentive to develop the patch further.

The whole experience was an educational one. There is a possibility the techniques I developed could be used to speed up import/export of large datasets. If anyone is interested in that, I am available for hire.

-Adrian
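The phase ordering described in this message (tables and primary keys in parallel with a configurable thread count, foreign keys afterwards on one thread) can be sketched roughly as below. This is an illustration only, not the patch itself: the property key `create.threads`, the class name and the log strings that stand in for real DDL calls are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PhasedSchemaCreationSketch {
    // Returns the steps in the order they completed.
    static List<String> create(List<String> entities, Properties config) {
        // Hypothetical property key; the thread count is meant to be tuned by hand.
        int threads = Integer.parseInt(config.getProperty("create.threads", "4"));
        List<String> log = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Phase 1: one work unit per entity, run in parallel.
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String entity : entities) {
                tasks.add(() -> {
                    log.add("table:" + entity); // placeholder for creating the table
                    log.add("pk:" + entity);    // placeholder for creating the primary key
                    return null;
                });
            }
            pool.invokeAll(tasks); // returns only after every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        // Phase 2: foreign keys, one at a time on the calling thread.
        for (String entity : entities) {
            log.add("fk:" + entity); // placeholder for creating the foreign keys
        }
        return log;
    }
}
```

Because `invokeAll` waits for all phase 1 work, every foreign key step sees fully created tables and primary keys on both sides.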
Adrian Crum wrote:
> I ran my patch against your recent changes and the errors went away. I guess we can consider that issue resolved.

Yeah, I did make some changes to SequenceUtil a while back. The biggest functional change was to move some variables from the inner class to the outer, and not try to access them all the time.

> As far as the approach I took to multi-threading the data load - here is an overview:
>
> I was able to run certain tasks in parallel - creating entities and creating primary keys, for example. I have the number of threads allocated configured in a properties file. By tweaking that number I was able to increase CPU utilization and reduce the creation time. Of course there was a threshold where CPU utilization was raised and creation time decreased - due to thread thrash.

So each entity creation itself was a separate work unit. Once an entity was created, you could submit the primary key creation as well. That's simple enough to implement (in theory, anyway). This design is starting to go towards the Sandstorm (1) approach.

There are ways to find out how many cpus are available. Look at org.ofbiz.base.concurrent.ExecutionPool.getNewOptimalExecutor(); it calls into ManagementFactory.

> Creating foreign keys must be run on a single thread to prevent database deadlocks.

Maybe. If the entity and primary keys are all created for both sides of the foreign key, then shouldn't it be possible to submit the work unit to the pool?

> I multi-threaded the data load by having one thread parse the XML files and put the results in a queue. Another thread services the queue and loads the data. I also multi-threaded the EECAs - but that has an issue I need to solve.

Hmm. You dug deeper, splitting up the points into separate calls. I hadn't done that yet, and just dumped each xml file to a separate thread. My approach is obviously wrong.

> My original goal was to reduce the ant clean-all + ant run-install cycle time. I recently purchased a much faster development machine that completes the cycle in about 2 minutes - slightly longer than the multi-threaded code, so I don't have much of an incentive to develop the patch further.

I've reduced the time it takes to do a run-tests loop. The changes I've made to log4j.xml reduce the *extreme* debug logging produced by several classes. log4j would create a new exception, so that it could get the correct class and line number to print to the log. This is a heavy-weight operation. This mostly showed up as slowness when catalina would start up, so this set of changes doesn't directly affect the run-install cycle.

> The whole experience was an educational one. There is a possibility the techniques I developed could be used to speed up import/export of large datasets. If anyone is interested in that, I am available for hire.

We have a site where users could upload original images (6), then fill out a bunch of form data, then some pdfs would be generated. I would submit a bunch of image resize operations (had to make 2 reduced-size images for each of the originals). All of those are able to run in parallel. Then, once all the images were done, the 2 pdfs would be submitted. This entire pipeline itself might be run in parallel too, as the user could have multiple such records that needed to be updated.

1: http://www.eecs.harvard.edu/~mdw/proj/seda/
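The resize-then-pdf pipeline above is a fan-out/fan-in shape, and a rough sketch of it follows. It sizes the pool from `ManagementFactory`, which is only an assumption about what `getNewOptimalExecutor()` does internally. The class name and the strings that stand in for real image and pdf work are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ResizeThenPdfSketch {
    static List<String> process(List<String> originals) {
        // One worker per available processor (an assumed sizing policy).
        int cpus = ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus);
        try {
            // Fan out: two reduced-size images per original, all independent.
            List<Callable<String>> resizes = new ArrayList<>();
            for (String image : originals) {
                resizes.add(() -> image + "@medium"); // placeholder for a real resize
                resizes.add(() -> image + "@thumb");
            }
            List<String> resized = new ArrayList<>();
            for (Future<String> done : pool.invokeAll(resizes)) {
                resized.add(done.get());
            }
            // Fan in: the two pdfs are submitted only once every image is done.
            List<Callable<String>> pdfs = new ArrayList<>();
            pdfs.add(() -> "summary.pdf[" + resized.size() + " images]");
            pdfs.add(() -> "detail.pdf[" + resized.size() + " images]");
            List<String> documents = new ArrayList<>();
            for (Future<String> done : pool.invokeAll(pdfs)) {
                documents.add(done.get());
            }
            return documents;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that the calling thread waits inside `invokeAll` here, which is fine for a sketch but is exactly the kind of blocking that a shared, SEDA-style pool would want to avoid.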
Adam Heath wrote:
> There are ways to find out how many cpus are available. Look at org.ofbiz.base.concurrent.ExecutionPool.getNewOptimalExecutor(); it calls into ManagementFactory.

I don't think the number of CPUs is useful information. Even a single CPU system might benefit. From my perspective, the best approach is to have a human tweak the settings to get the result they want. I might be wrong, but I don't think you can do that automatically.

> Maybe. If the entity and primary keys are all created for both sides of the foreign key, then shouldn't it be possible to submit the work unit to the pool?

I don't know - I didn't spend a lot of time thinking about it. I just separated out the create foreign keys loop and executed it in a single thread. It would be fun to go back and analyze the code more and come up with a multi-threaded solution.

> I've reduced the time it takes to do a run-tests loop. The changes I've done to log4j.xml reduces the *extreme* debug logging produced by several classes. log4j would create a new exception, so that it could get the correct class and line number to print to the log. This is a heavy-weight operation. This mostly showed up as slowness when catalina would start up, so this set of changes doesn't directly affect the run-install cycle.

I had to disable logging entirely in the patch. The logger would get swamped and throw an exception - bringing everything to a stop.

> We have a site, where users could upload original images(6), then fill out a bunch of form data, then some pdfs would be generated. I would submit a bunch of image resize operations(had to make 2 reduced-size images for each of the originals). All of those are able to run in parallel. Then, once all the images were done, the 2 pdfs would be submitted. This entire pipeline itself might be run in parallel too, as the user could have multiple such records that needed to be updated.
>
> 1: http://www.eecs.harvard.edu/~mdw/proj/seda/

Cool site Bro.
In reply to this post by Adam Heath-2
Adam Heath wrote:
> So each entity creation itself was a separate work unit. Once an entity was created, you could submit the primary key creation as well. That's simple enough to implement(in theory, anyways). This design is starting to go towards the Sandstorm(1) approach.

I just looked at that site briefly. You're right - my thinking was a lot like that. Split up the work with queues - in other words, use the provider/consumer pattern.

If I were designing a product like OFBiz, I would have JMS at the front end. Each request gets packaged up into a JMS message and submitted to a queue. Different tasks respond to the queued messages. The last task is writing the response. The app server's request thread returns almost immediately. Each queue/task could be optimized.
From: "Adrian Crum" <[hidden email]>
> I just looked at that site briefly. You're right - my thinking was a lot like that. Split up the work with queues - in other words, use the provider/consumer pattern.
>
> If I was designing a product like OFBiz, I would have JMS at the front end. Each request gets packaged up into a JMS message and submitted to a queue. Different tasks respond to the queued messages. The last task is writing the response. The app server's request thread returns almost immediately. Each queue/task could be optimized.

This reminds me that it's mostly what is used underneath in something like ServiceMix or Mule (ESBs). ServiceMix is based on the JBI concept http://servicemix.apache.org/what-is-jbi.html and uses http://activemq.apache.org/ underneath.

Jacques
--- On Thu, 4/1/10, Jacques Le Roux <[hidden email]> wrote:
> This reminds me that it's mostly what is used underneath in something like ServiceMix or Mule (ESBs). ServiceMix is based on the JBI concept http://servicemix.apache.org/what-is-jbi.html and uses http://activemq.apache.org/ underneath.

Actually, the goals and designs are quite different. The goal of ESB is to have a standards-based message bus so that applications from different vendors can inter-operate. The goal of SEDA (Adam's link) is to use queues to provide uniform response time in servers and allow their services to degrade gracefully under load.

My idea of using JMS is for overload control. Each queue can be serviced by any number of servers (since JMS uses JNDI). In effect, the application itself becomes a crude load balancer.

-Adrian
In reply to this post by Adrian Crum
Adrian Crum wrote:
> I multi-threaded the data load by having one thread parse the XML files and put the results in a queue. Another thread services the queue and loads the data. I also multi-threaded the EECAs - but that has an issue I need to solve.

We need to be careful with that. EntitySaxReader supports reading extremely large data files; it doesn't read the entire thing into memory. So, any such event dispatch system needs to keep the parsing from getting too far ahead.
--- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
> We need to be careful with that. EntitySaxReader supports reading extremely large data files; it doesn't read the entire thing into memory. So, any such event dispatch system needs to keep the parsing from getting too far ahead.

http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
Adrian Crum wrote:
> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html

Not really. That will block the calling thread when no data is available.
--- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
> Not really. That will block the calling thread when no data is available.

Yeah, really.

1. Construct a FIFO queue, fire up n consumers to service the queue.
2. Consumers block, waiting for queue elements.
3. Producer adds elements to queue. Consumers unblock.
4. Queue reaches capacity, producer blocks, waiting for room.
5. Consumers empty the queue.
6. Goto step 2.
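The six steps above map directly onto a bounded `java.util.concurrent.BlockingQueue`. Here is a minimal sketch under stated assumptions: the records are placeholder strings rather than parsed entity values, the class name is made up, and an `END` marker object (not part of the steps) is added so the consumers know when to stop.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedQueueLoadSketch {
    // Marker telling a consumer that no more records will arrive.
    private static final Object END = new Object();

    // Returns how many records the consumers "stored".
    static int load(int records, int capacity, int consumers) {
        // Step 1: a bounded FIFO queue and n consumers.
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger stored = new AtomicInteger();
        Thread[] workers = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            workers[i] = new Thread(() -> {
                try {
                    // Step 2: take() blocks while the queue is empty.
                    while (queue.take() != END) {
                        stored.incrementAndGet(); // placeholder for writing to the database
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        try {
            for (int r = 0; r < records; r++) {
                // Steps 3 and 4: put() blocks while the queue is full, so the
                // parser can never get more than `capacity` records ahead.
                queue.put("record-" + r);
            }
            for (int i = 0; i < consumers; i++) {
                queue.put(END); // one marker per consumer
            }
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stored.get();
    }
}
```

The queue capacity is what bounds memory use when reading very large files; the price is that the producer thread sits idle whenever the queue is full.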
Adrian Crum wrote:
> Yeah, really.
>
> 1. Construct a FIFO queue, fire up n consumers to service the queue.
> 2. Consumers block, waiting for queue elements.
> 3. Producer adds elements to queue. Consumers unblock.
> 4. Queue reaches capacity, producer blocks, waiting for room.
> 5. Consumers empty the queue.
> 6. Goto step 2.

And that's a blocking algo, which is bad. If you only have a limited number of threads, then any time one of them blocks, the thread becomes unavailable to do real work. What needs to happen in these cases is that the thread removes itself from the thread pool, and the consumer thread then has to resubmit the producer.

The whole point of SEDA is to not have unbounded resource usage. If a thread gets blocked, then that implies that another new thread will be needed to keep the work queue proceeding.
--- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
> And that's a blocking algo, which is bad.

Huh? You just asked for a blocking algorithm: "So, any such event dispatch system needs to keep the parsing from getting too far ahead."

> The whole point of SEDA is to not have unbounded resource usage. If a thread gets blocked, then that implies that another new thread will be needed to keep the work queue proceeding.

You lost me again. I thought we were talking about entity import/export - not SEDA.
Adrian Crum wrote:
> Huh? You just asked for a blocking algorithm: "So, any such event dispatch system needs to keep the parsing from getting too far ahead."

No, I didn't ask for a blocking algorithm. When the outgoing queue is full, the producer needs to pause itself, so that its thread can be used for other things.

Consider a single, shared thread pool, used system wide. There are only 8 threads available, as there are only 6 real cpus available. This thread pool is used to keep the system from getting overloaded, running too many things at once, and thrashing.

If any of the work items being processed by one of these threads blocks, then the system will lose a thread for doing other work. And if A blocks on B, which blocks on C, then D, you've lost 4 threads.
--- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
> No, I didn't ask for a blocking algorithm. When the outgoing queue is full, the producer needs to pause itself, so that its thread can be used for other things.

I guess you could make the producer consume a queue element, then try adding the new one again. So:

1. Construct a FIFO queue, fire up n consumers to service the queue.
2. Consumers block, waiting for queue elements.
3. Producer adds elements to queue. Consumers unblock.
4. Queue reaches capacity, producer becomes a consumer until there is room for new elements.
5. Consumers empty the queue.
6. Goto step 2.

Btw, from my understanding of SEDA, entity import/export would be tasks that are submitted to a task queue. The queue's response time controller would determine if there are enough resources available to run the task. If the server is really busy, the task is rejected.
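Step 4 of this variant (the producer becomes a consumer while the queue is full) can be sketched with the non-blocking `offer` and `poll` methods of `BlockingQueue`. As before, this is only an illustration: the class name, the placeholder records and the `END` marker used for shutdown are assumptions, not anything from the OFBiz code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class HelpingProducerSketch {
    // Marker telling a consumer that no more records will arrive.
    private static final Object END = new Object();

    // Returns how many records were "stored" by consumers and producer together.
    static int load(int records, int capacity, int consumers) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger stored = new AtomicInteger();
        Thread[] workers = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (queue.take() != END) {
                        stored.incrementAndGet(); // placeholder for writing to the database
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        try {
            for (int r = 0; r < records; r++) {
                Object record = "record-" + r;
                // offer() returns false instead of blocking when the queue is full.
                while (!queue.offer(record)) {
                    // Queue is full: do a consumer's job for one element, then retry.
                    Object head = queue.poll();
                    if (head != null) {
                        stored.incrementAndGet();
                    }
                }
            }
            // Markers are added only after all records, so the producer never polls one.
            for (int i = 0; i < consumers; i++) {
                queue.put(END);
            }
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stored.get();
    }
}
```

The producer thread is never idle here, but it also never returns to its pool until the whole file is done.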
Adrian Crum wrote:
> I guess you could make the producer consume a queue element, then try adding the new one again. So:

Nope, not good enough. It would be possible for the producer thread to be stuck for a long time, producing/consuming. If there are several such workflows like this in the thread pool, then the threads become unavailable for doing other work.

CPU is a limited resource.

In the SEDA model, a worker must be short in execution time, and return back into the pool when it is done. It's perfectly acceptable, however, to add another item to the pool's queue to continue processing.

1: producer runs, creates a work unit
2: if the end has been reached, submit the work unit directly
3: otherwise, wrap the unit, so that when the unit gets run, the producer will be resubmitted.
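One possible reading of the three steps above, as a sketch rather than a definitive implementation: the producer is itself a short task that emits a single work unit and returns to the pool, and the wrapped unit resubmits the producer once it has run. The class name, the `Iterator` standing in for the XML parser and the `CountDownLatch` used to signal completion are all assumptions added for the example.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ResubmittingProducerSketch implements Runnable {
    private final Iterator<String> source; // stands in for an incremental parser
    private final ExecutorService pool;
    private final List<String> stored;
    private final CountDownLatch done;

    ResubmittingProducerSketch(Iterator<String> source, ExecutorService pool,
                               List<String> stored, CountDownLatch done) {
        this.source = source;
        this.pool = pool;
        this.stored = stored;
        this.done = done;
    }

    @Override
    public void run() {
        if (!source.hasNext()) { // empty input: nothing to submit
            done.countDown();
            return;
        }
        // Step 1: produce exactly one work unit, then give the thread back.
        String record = source.next();
        Runnable unit = () -> stored.add(record); // placeholder for storing the value
        if (!source.hasNext()) {
            // Step 2: last unit, submit it directly (plus a completion signal).
            pool.execute(() -> { unit.run(); done.countDown(); });
        } else {
            // Step 3: wrap the unit so the producer is resubmitted after it runs.
            pool.execute(() -> { unit.run(); pool.execute(this); });
        }
    }

    // Runs the whole chain on a small shared pool and returns what was stored.
    static List<String> load(List<String> records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> stored = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        pool.execute(new ResubmittingProducerSketch(records.iterator(), pool, stored, done));
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return stored;
    }
}
```

No pool thread ever blocks in this shape, and parsing cannot run ahead because the next record is read only after the previous unit has executed. The trade-off in this simple form is that only one record is in flight at a time; letting the producer emit a small batch per run would restore parallelism while keeping the bound.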
--- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
> Nope, not good enough. It would be possible for the producer thread to be stuck for a long time, producing/consuming. If there are several such workflows like this in the thread pool, then the threads become unavailable for doing other work.

Are we talking about theoretical software or OFBiz? What thread pool? The application server's? I have been referring to the existing OFBiz entity import/export code. If an entity import takes n ms in the current single-threaded code, and the same import takes n/x ms using multi-threaded code, then hasn't the performance improved?

> CPU is a limited resource.

CPUs are cheap. Just buy more. ;-)
Adrian Crum wrote:
> --- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
>> Adrian Crum wrote:
>>> --- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
>>>> Adrian Crum wrote:
>>>>> --- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
>>>>>> Adrian Crum wrote:
>>>>>>> --- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
>>>>>>>> Adrian Crum wrote:
>>>>>>>>> --- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
>>>>>>>>>> Adrian Crum wrote:
>>>>>>>>>>> I multi-threaded the data load by having one thread parse the XML files
>>>>>>>>>>> and put the results in a queue. Another thread services the queue and
>>>>>>>>>>> loads the data. I also multi-threaded the EECAs - but that has an issue
>>>>>>>>>>> I need to solve.
>>>>>>>>>> We need to be careful with that. EntitySaxReader supports reading
>>>>>>>>>> extremely large data files; it doesn't read the entire thing into
>>>>>>>>>> memory. So, any such event dispatch system needs to keep the parsing
>>>>>>>>>> from getting too far ahead.
>>>>>>>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
>>>>>>>> Not really. That will block the calling thread when no data is available.
>>>>>>> Yeah, really.
>>>>>>>
>>>>>>> 1. Construct a FIFO queue, fire up n consumers to service the queue.
>>>>>>> 2. Consumers block, waiting for queue elements.
>>>>>>> 3. Producer adds elements to queue. Consumers unblock.
>>>>>>> 4. Queue reaches capacity, producer blocks, waiting for room.
>>>>>>> 5. Consumers empty the queue.
>>>>>>> 6. Goto step 2.
>>>>>> And that's a blocking algo, which is bad.
>>>>> Huh? You just asked for a blocking algorithm: "So, any such event dispatch system needs to keep the parsing from getting too far ahead."
>>>>
>>>> No, I didn't ask for a blocking algorithm. When the outgoing queue is
>>>> full, the producer needs to pause itself, so that its thread can be
>>>> used for other things.
>>>
>>> I guess you could make the producer consume a queue element, then try adding the new one again. So:
>>
>> Nope, not good enough. It would be possible for the producer thread
>> to get stuck for a long time, producing/consuming. If there are several
>> such workflows like this in the thread pool, then the threads become
>> unavailable for doing other work.
>
> Are we talking about theoretical software or OFBiz? What thread pool? The application server's? I have been referring to the existing OFBiz entity import/export code. If an entity import takes n ms in the current single-threaded code, and the same import takes n/x ms using multi-threaded code, then hasn't the performance improved?

Data loading can take place from webtools. And several requests could
be submitted at once. There's no reason to try and process them all
at the same time, if the cpu is loaded. Just queue up the requests.

Plus (this part is theoretical), when ofbiz is more segmented, other
things would go thru the same pool. And thrashing would be reduced.

I'm not suggesting we go thru and change ofbiz to some kind of
segmented event dispatcher. But the basic infrastructure is simple
enough to write, it doesn't hurt to do it right in the first place.

>> CPU is a limited resource.
>
> CPUs are cheap. Just buy more. ;-)

Go survive a slashdotting.
|
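The "just queue up the requests" idea above can be illustrated with a fixed-size thread pool: requests beyond the pool size wait in the executor's internal queue instead of all running at once. This is only a rough sketch under stated assumptions, not OFBiz code. The class name `ImportQueueSketch`, the method `maxConcurrent`, and the 5 ms sleep standing in for an import are all hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: many import requests, a small fixed pool to run them.
class ImportQueueSketch {
    // Submits 'requests' fake imports and returns the peak number that ran concurrently.
    static int maxConcurrent(int requests, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < requests; i++) {
            // execute() returns immediately; excess requests wait in the pool's queue.
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // stand-in for the real import work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }

        pool.shutdown(); // accept no new requests, finish the queued ones
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + maxConcurrent(20, 4));
    }
}
```

The peak never exceeds the pool size, which is the point being argued: the submitter is not blocked, and the CPU is not asked to run every request at the same time.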
--- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
>>> Nope, not good enough. It would be possible for the producer thread
>>> to get stuck for a long time, producing/consuming. If there are several
>>> such workflows like this in the thread pool, then the threads become
>>> unavailable for doing other work.
>>
>> Are we talking about theoretical software or OFBiz? What thread pool? The application server's? I have been referring to the existing OFBiz entity import/export code. If an entity import takes n ms in the current single-threaded code, and the same import takes n/x ms using multi-threaded code, then hasn't the performance improved?
>
> Data loading can take place from webtools. And several requests could
> be submitted at once. There's no reason to try and process them all
> at the same time, if the cpu is loaded. Just queue up the requests.

Like SEDA or my JMS idea. In other words, theoretical.

> I'm not suggesting we go thru and change ofbiz to some kind of
> segmented event dispatcher. But the basic infrastructure is simple
> enough to write, it doesn't hurt to do it right in the first place.

Simpler yet is to use a BlockingQueue for this one task. I'm not disagreeing with you - it would be cool to have a SEDA-style application. Instead, I'm advocating baby steps. From my perspective, it is easier to try a simple multi-threaded approach and see if it causes any problems. If that works okay, then you can make it more sophisticated.

Multiple simultaneous huge entity import requests under heavy load sound like an unlikely scenario. Is there a real need to design for that?
|
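For reference, the BlockingQueue approach argued for above (the six numbered steps quoted earlier in the thread) might look roughly like the following. It is a minimal sketch, not the actual patch: the class name `BoundedLoadSketch`, the string records, and the sentinel used to stop the consumers are all assumptions made for illustration. Only `ArrayBlockingQueue.put()` and `take()` are real JDK calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one producer ("parser") feeds a bounded queue
// that n consumers ("loaders") drain.
class BoundedLoadSketch {
    // Sentinel object, compared by identity, telling a consumer to stop.
    private static final String POISON = new String("END");

    // Returns the number of records the consumers processed.
    static int run(int records, int consumers, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger loaded = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < consumers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String record = queue.take(); // blocks while the queue is empty
                        if (record == POISON) {
                            return;
                        }
                        loaded.incrementAndGet(); // stand-in for storing the entity
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workers.add(t);
        }

        try {
            for (int i = 0; i < records; i++) {
                queue.put("record-" + i); // blocks while the queue is full
            }
            for (int i = 0; i < consumers; i++) {
                queue.put(POISON); // one sentinel per consumer
            }
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        }
        return loaded.get();
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 4, 16));
    }
}
```

The bounded capacity is what keeps the parser from getting too far ahead of the loaders. The cost, as objected to in this thread, is that the producer thread sits idle inside `put()` whenever the queue is full.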
--- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
> Adrian Crum wrote:
>> --- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
>>> Adrian Crum wrote:
>>>> --- On Thu, 4/1/10, Adam Heath <[hidden email]> wrote:
>>>>> Adrian Crum wrote:
>>>>>> I multi-threaded the data load by having one thread parse the XML files
>>>>>> and put the results in a queue. Another thread services the queue and
>>>>>> loads the data. I also multi-threaded the EECAs - but that has an issue
>>>>>> I need to solve.
>>>>> We need to be careful with that. EntitySaxReader supports reading
>>>>> extremely large data files; it doesn't read the entire thing into
>>>>> memory. So, any such event dispatch system needs to keep the parsing
>>>>> from getting too far ahead.
>>>> http://java.sun.com/javase/6/docs/api/java/util/concurrent/BlockingQueue.html
>>> Not really. That will block the calling thread when no data is available.
>>
>> Yeah, really.
>>
>> 1. Construct a FIFO queue, fire up n consumers to service the queue.
>> 2. Consumers block, waiting for queue elements.
>> 3. Producer adds elements to queue. Consumers unblock.
>> 4. Queue reaches capacity, producer blocks, waiting for room.
>> 5. Consumers empty the queue.
>> 6. Goto step 2.
>
> And that's a blocking algo, which is bad.
>
> If you only have a limited number of threads, then anytime one of them
> blocks, the thread becomes unavailable to do real work.
>
> What needs to happen in these cases is that the thread removes itself
> from the thread pool, and the consumer thread then has to resubmit the
> producer.
>
> The whole point of SEDA is to not have unbounded resource usage. If a
> thread gets blocked, then that implies that another new thread will be
> needed to keep the work queue proceeding.

Why Events Are A Bad Idea (for high-concurrency servers) - http://capriccio.cs.berkeley.edu/pubs/threads-hotos-2003.pdf

An interesting refutation of SEDA.
|
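For comparison with the blocking queue, the hand-off described in the quoted text (the producer leaves the pool when the queue is full, and the consumer resubmits it) could be sketched as below. This is one possible reading of that description, not code from either poster. `ResubmitSketch` and its methods are hypothetical, and the sketch is deliberately simplified so that producer and consumer tasks strictly alternate, which is why the plain `ArrayDeque` needs no locking here.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: no task ever blocks on the queue. Each task does what
// it can, schedules the other side on the shared pool, and returns.
class ResubmitSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Queue<Integer> queue = new ArrayDeque<>();
    private final CountDownLatch done = new CountDownLatch(1);
    private final int total;
    private final int capacity;
    private int produced;
    private int consumed;

    private ResubmitSketch(int total, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.total = total;
        this.capacity = capacity;
    }

    // Producer task: fill the queue up to capacity, then give the thread back.
    private void produce() {
        while (produced < total && queue.size() < capacity) {
            queue.add(produced++);
        }
        pool.execute(this::consume);
    }

    // Consumer task: drain the queue, then resubmit the producer if work remains.
    private void consume() {
        while (!queue.isEmpty()) {
            queue.remove();
            consumed++; // stand-in for storing the entity
        }
        if (produced < total) {
            pool.execute(this::produce);
        } else {
            done.countDown();
        }
    }

    // Returns the number of items consumed.
    static int run(int total, int capacity) {
        ResubmitSketch sketch = new ResubmitSketch(total, capacity);
        sketch.pool.execute(sketch::produce);
        try {
            sketch.done.await(); // only the caller waits, not the pool threads
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            sketch.pool.shutdown();
        }
        return sketch.consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 16));
    }
}
```

Between tasks, both pool threads are free for unrelated work, which is the property being asked for. A real version would let producer and consumer overlap and would need a thread-safe queue plus care around the resubmission race. That extra machinery is the complexity the "baby steps" argument is about.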