Hi,
I was curious whether it would be possible to replace the proprietary Lucene engine with Solr (www.apache.org/solr). I am asking because that could increase the overall performance and also allow fancy faceting. So in case it is possible (it should be: Solr is based on Lucene): how are products/catalogs added to Lucene right now? Is Lucene running on a certain port? If so, would it be possible to simply switch ports? Cheers, Paul |
While this is an interesting discussion, I think you need to do more research and be more specific before you can start a discussion.

It sounds like you are talking specifically about the product searching, but please be explicit about the type of data you'd like to search.

If it is about product searching then please do more research because that does not use Lucene.

Also, in what way is Lucene "proprietary" or in other words, how is Solr less proprietary than Lucene? Sorry, that comment just sounded really funny.

-David

(In reply to madppiper's message of Sep 10, 2008, 6:46 AM.)
|
Ah, yes indeed.
Sorry about my mumbling today. To be a tiny bit more specific: I am of course only referring to the indexed product search. I know that Solr is based on Lucene and that both use a similar way of indexing newly listed products (that is, by adding, updating, and removing entities to/from the index through XML files).

From as much as I understand, Lucene is the main search engine used within OFBiz (I hope we are NOT relying on cached MySQL-native fulltext queries here). New products are added to or removed from the IndexTree through an XML script (just as they would be in Solr). Lucene then provides the rest of the OFBiz environment with a regular search mechanism, as well as ranked indexing, sorting and yadayadayada.

Lucene, however, does not allow advanced faceting, nor is it a standalone application. Both of these are advantages that Solr provides. Therefore it would be great if one could easily switch between the two search engines...
|
NO! Sorry, not correct. I'll state it again and this time with nothing else so it's clear:

Product searching in OFBiz does NOT use Lucene.

Please keep researching, and then feel free to continue this discussion.

-David

(In reply to madppiper's message of Sep 10, 2008, 7:56 AM.)
|
In reply to this post by madppiper-2
madppiper wrote:
> Ah, yes indeed.

You're still mumbling.

> Sorry, about my mumbling today. To be a tiny bit more specific. I am of course only referring to the indexed product search. I know that Solr is based on Lucene and that both use a similar way of indexing new listed products (that is by adding, updating, removing new entities from/to the index through XML files).

No, it doesn't. Lucene is an API; it doesn't use XML files to populate its internal database. Lucene doesn't index products. It provides an inverted index to records of text fields (each of which can be tokenized or not, compressed or not, and stored or not). It's up to the application to decide which fields to add to a record, and what the fields and records actually mean.

> From as much as I understand, Lucene is the main search engine used within OFBiz (I hope we are NOT relying on cached Mysql-native Fulltext Queries here). New Products are added to or removed from the IndexTree through a xml script (just as they would in Solr). Lucene then provides the rest of the OFBiz environment with a regular search mechanism, as well as ranked indexing, sorting and yadayadayada.

Um, no, Lucene is not the *main* engine, not at all. Have you seen ProductKeyword?

> Lucene however, does not allow advanced Facetting, nor is it a standalone application. Both of these advantages are something that Solr provides. Therefore it would be great if one could easily switch both search engines...

You say "program A has feature Z, so program A is obviously better", but you never define feature Z.

I suggest you go have your morning coffee first, and maybe wait 'til after lunch. You're not helping your argument at all.

ps: for our own internal websites, I've utilized both Lucene and Nutch (which is based on Hadoop and Lucene), and in both of those cases, you write *java* code to populate the Lucene record with fields. No XML is ever used.
|
Well,
I am sorry to say this, but I don't think that insulting me helps a healthy discussion at all. Since I am new to the OFBiz environment and the documents are really scattered all over the place, I don't think I have to feel ashamed of not knowing OFBiz in every detail, nor should it hinder me from asking this type of question.

To continue the discussion, however (...and hopefully put this all behind us), I still believe that moving the OFBiz framework to a new search engine could be a very valuable idea. In the case of moving to Solr, the indexed search (of products) would greatly benefit from such features as:

# Support for Dynamic Faceted Browsing and Filtering
# Advanced, Configurable Text Analysis
# Multiple search indices
# etc.

(Feel welcome to have a read through http://lucene.apache.org/solr/features.html)

So from what you said, no matter how you said it, I have learnt that Lucene is indeed the native search engine used by OFBiz and it is NOT fed through XML files. (That makes sense, because I couldn't find the respective XML files mentioned - it also explains what I found in org.ofbiz.content.search.SearchServices.) So in order to use a different search engine, I would have to write my own XML file generator and feed the search engine with that, correct? Doesn't the entity engine already do that with such data?

EDIT: In case Lucene is NOT used by OFBiz for product search, are we really using fulltext queries????
|
madppiper wrote:
> Well,
>
> I am sorry to say this, but I don't think that insulting me would help a healthy discussion at all - since I am new to the OFBiz environment and the documents are really scattered all over the place, I don't think I have to feel ashamed over not knowing OFBiz in each detail, nor should it hinder me from asking this type of questions.

If I came off that harsh, then I apologize. But you haven't been making much sense yourself.

First, you call Lucene proprietary. Huh? Come again?

Second, you say replacing Lucene usage with Solr would increase performance. That's not possible. It would slow things down. Let's just say that product searching was done with Lucene, and it was switched to Solr. Instead of the Lucene searching being done in process, we'd now have to go to a separate process, serializing through XML, then this other system would do an internal Lucene query, then send results back through XML. There's no way that could be faster. If you start talking about using Solr's replication, then maybe you could get a speed increase. However, product searching is not where OFBiz spends most of its time.

Third, you mention faceting. What is that? It's not defined on the main website for Solr. If it's such a whiz-bang fancy feature, then it should be prominently defined; it's not.

Fourth, you ask what port Lucene is running on? Huh? Lucene doesn't run on a port. It's not server software. It's an API for indexing documents.

Fifth, you then say that Solr and Lucene index files in a similar way. This is not entirely accurate. The way you stated it, it implies that Solr and Lucene are completely different tools that just happen to have a similar set of features. However, Solr is built *on top of* Lucene.

Sixth, you go on to say that Lucene is fed with XML files. It's not. It's an API, and it's up to the application to fetch the data, split it into fields, then hand it off to Lucene for tokenizing/indexing.

I'm getting bored now, but to summarize: You start talking about other programs, but the things you say are *not* what those other programs do. We can forgive your lack of knowledge about OFBiz. But when you are trying to sell the use of another software project, and the statements you make show that you don't know what you are talking about, don't be upset if we don't respond favourably.

ps: again, sorry if you feel insulted by this, but it's obvious from reading your mails that you don't even know how the other software systems you are talking about are supposed to work.

> To Continue the discussion however (...and hopefully put this all behind), I still continue to believe that moving the OFBiz framework to a new Searchengine could be a very valuable idea. In case of moving to Solr, the indexed search (of products) would greatly benefit from such features as:
>
> # Support for Dynamic Faceted Browsing and Filtering
> # Advanced, Configurable Text Analysis
> # Multiple search indices
> # etc.

What are these? Definitions. Be a salesman. Seal the deal.

> (Feel welcome to have a read through http://lucene.apache.org/solr/features.html )

As far as documentation goes, that page ranks right up there with the best of the OFBiz pages. Short, terse, and completely lacking in anything useful to digest. Filled with internal industry buzz-phrases that mean nothing to those who don't already have an intimate knowledge of the product. :|

If you want us to look at this software, then don't make us do your work for you. While we are completely capable of cross-referencing undefined terms on that page, doing so would entail extra work on our part. We already have enough on our plates. And we have a system that is currently working. A new system would have to be much better in order for us to consider switching to it and dealing with the integration issues that occur.

Plus, if we switch to some new tool, then we have to increase our working set to include the use of that tool, and continue to remember how it works going forward.

> So from what you said, no matter how you said it, I have learnt that Lucene is indeed the native search engine used by ofbiz and it is NOT fed through XML files. (that makes sense, cause I couldn't find the respective XML files mentioned - also explains what I found in org.ofbiz.content.search.SearchServices).
> So in order to use a different Searchengine, I would have to write my own XML-File generator and feed the search engine with that, correct? Doesn't the entityengine already do that with such data?

Again, no, it does not use Lucene. You're having a hard time understanding.

The entity engine is not a free-form generic XML producer. It produces a raw dump, in XML form, of the database (in effect). While you might be able to massage that into something that Solr can use (by way of XSL), it's not the way I would do this.
|
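The point made in this thread, that an inverted index is an API which application code feeds with records made of fields, can be illustrated with a toy sketch. This is plain Java and deliberately does not use Lucene's real classes; the class name `TinyInvertedIndex`, its methods, and the sample product data are all hypothetical, and real engines add analysis, scoring, and on-disk storage that are left out here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, not Lucene): token -> ids of records containing it. */
final class TinyInvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** The application decides which fields make up a record and passes them in. */
    void addRecord(String recordId, Map<String, String> fields) {
        for (String text : fields.values()) {
            // Crude tokenizer: lowercase, then split on anything that is not a letter or digit.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, k -> new TreeSet<>()).add(recordId);
                }
            }
        }
    }

    /** Returns the ids of all records containing the term (case-insensitive). */
    Set<String> search(String term) {
        Set<String> ids = postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
        return Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        // Sample data is made up for illustration.
        index.addRecord("P1", Map.of("name", "Canon PowerShot", "description", "Compact digital camera"));
        index.addRecord("P2", Map.of("name", "Pixma printer", "description", "Canon inkjet printer"));
        System.out.println(index.search("canon"));
        System.out.println(index.search("printer"));
    }
}
```

Nothing in the sketch listens on a port or reads XML: the caller builds each record in code and hands it to the index, which is the distinction drawn above between a library and a standalone server.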
In reply to this post by madppiper-2
Not to sound rude, but frankly I am already getting sick of all of these false accusations. All I wanted to know was whether or not anybody had ever tried to switch search engines. I was under the impression that for product searches, OFBiz was using Lucene (and yes, I was fully aware that Solr is based upon Lucene, didn't I state that clearly?) - since that was obviously not the case I then asked which search engine (if any) was in use. I did not intend to sell a product here, nor am I interested in persuading the rest of you into using Solr.

The reason why I, personally, want to use Solr, however, is that I have come to know and like this engine on different projects of mine. It is incredibly fast, and can do a lot more than most other search engines can. "Faceted browsing", for instance, allows the user to dig deeper into the search results and narrow the results by keywords (or in other words, facets that apply to any given object: it lists criteria, so to speak, that help users narrow down results by manufacturer, price, author or whatever). It is also extremely powerful when it comes to correcting user input. It analyzes the input text and corrects any misspellings:

"Example queries demonstrating relevancy improving transformations:

* A search for power-shot matches PowerShot, and adata matches A-DATA due to the use of WordDelimiterFilter and LowerCaseFilter.
* A search for name:printers matches Printer, and features:recharging matches Rechargeable due to stemming with the EnglishPorterFilter.
* A search for "1 gigabyte" matches things with GB, and pixima matches Pixma due to use of a SynonymFilter.
" (from: http://lucene.apache.org/solr/tutorial.html#Text+Analysis)

I haven't seen that implemented in OFBiz, but if I am mistaken, I will gladly be proven wrong.

Final question: what engine do we use for product searches then? Cached fulltext queries?

Don't get me wrong - I'm still very grateful for an honest, non-offensive answer...

Cheers,
Paul

P.S.: Instead of proprietary I meant native, btw - not all of us are native speakers, you know... sorry
P.P.S.: Solr is a stand-alone application and communicates through a port - hence my misunderstanding that Lucene would do the same - I have worked with Solr, not Lucene
|
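The idea of faceted browsing described above (list the criteria present in a result set, then let the user narrow by one of them) can be sketched in a few lines of plain Java. This is only an illustration of the concept, not how Solr implements it; the `FacetSketch` and `Hit` names and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FacetSketch {
    /** Hypothetical search hit; not an OFBiz entity or a Solr document. */
    static final class Hit {
        final String productId;
        final String manufacturer;

        Hit(String productId, String manufacturer) {
            this.productId = productId;
            this.manufacturer = manufacturer;
        }
    }

    /** Counts how many hits carry each manufacturer value (the "facet counts"). */
    static Map<String, Integer> facetCounts(List<Hit> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Hit hit : hits) {
            counts.merge(hit.manufacturer, 1, Integer::sum);
        }
        return counts;
    }

    /** Narrows the result set to one facet value, as a user would by clicking it. */
    static List<Hit> narrow(List<Hit> hits, String manufacturer) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.manufacturer.equals(manufacturer)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        List<Hit> hits = List.of(new Hit("P1", "Canon"), new Hit("P2", "Canon"), new Hit("P3", "Nikon"));
        System.out.println(facetCounts(hits));
        System.out.println(narrow(hits, "Canon").size());
    }
}
```

A storefront would typically render the counts next to each value and re-run the narrowed search when one is selected; a search server computes such counts over the whole matching set rather than one page of hits.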
You have stated what caused the responses when you made assumptions:
[I have worked with Solr, not lucene.]

You have not investigated how OFBiz works. You can research this on your own by searching for files that include the word "search" and following the code or widgets you find.

(In reply to madppiper's message of 9/10/2008 2:07 PM.)
|
In reply to this post by madppiper-2
So the question is: would Solr be better for OFBiz than what exists right now, right?

Jacques
|
In reply to this post by BJ Freeman
I think that comments like that are not only unnecessary, but unhealthy for any open discussion. (Please read my original message again, replace the term "proprietary" with "native", keep in mind that OFBiz does NOT use Lucene for searching - so I have been told several times now - and then skim through the original question at hand.)

@Jacques: Thanks for the response - not quite. There are actually two questions at hand:

1) What search engine, if any, is used by OFBiz to generate keyword search results for products?

2) If 1) can be answered with "NO search engine per se" - which would imply that we are doing real database queries right now (perhaps ones that use fulltext query algorithms) - would it not be a good idea to move to a standalone search engine such as Solr?
|
Hello,
Just to shed some light on the product search. Main class involved:
applications/product/src/org/ofbiz/product/product/ProductSearch.java

It's 100% database (DBMS) based, not Lucene or whatever.

As a reminder, there is an entity in OFBiz called ProductKeyword whose primary key is ProductId and Keyword (varchar(60)), and which is filled at each creation or update of the product characteristics, name, fields, ...

So is it today the best and most efficient way to do search? Uh-oh, not sure you are right.. But for products only, it's usually enough (boolean search speaking). Now if you also need to index files that are associated with products, and maybe (but I don't know if this already exists, as I never looked) to index the CMS and files uploaded through the CMS, a solution based on a real search engine should be far superior.

Regards

(In reply to madppiper's message of 2008/9/11.)
|
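The keyword-table approach described above (one row per product and keyword, refreshed whenever the product changes, queried with boolean logic) can be modelled in memory as follows. This is a hedged sketch of the idea only: it does not reproduce ProductSearch.java or the real ProductKeyword entity, and the class name, method names, and tokenizing rule are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in (hypothetical) for rows keyed by (productId, keyword). */
final class ProductKeywordSketch {
    private final Map<String, Set<String>> productIdsByKeyword = new HashMap<>();

    /** Called on product create/update: drop the product's old rows, then add fresh ones. */
    void reindex(String productId, String... texts) {
        for (Set<String> ids : productIdsByKeyword.values()) {
            ids.remove(productId);
        }
        for (String text : texts) {
            for (String keyword : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!keyword.isEmpty()) {
                    productIdsByKeyword.computeIfAbsent(keyword, k -> new HashSet<>()).add(productId);
                }
            }
        }
    }

    /** Boolean AND search: ids of products that have every given keyword. */
    Set<String> findAll(String... keywords) {
        Set<String> matches = null;
        for (String keyword : keywords) {
            Set<String> ids = productIdsByKeyword.getOrDefault(
                    keyword.toLowerCase(Locale.ROOT), Collections.emptySet());
            if (matches == null) {
                matches = new TreeSet<>(ids);
            } else {
                matches.retainAll(ids);
            }
        }
        return matches == null ? new TreeSet<>() : matches;
    }

    public static void main(String[] args) {
        ProductKeywordSketch table = new ProductKeywordSketch();
        // Sample data is made up for illustration.
        table.reindex("P1", "Canon PowerShot", "compact camera");
        table.reindex("P2", "Canon Pixma", "inkjet printer");
        System.out.println(table.findAll("canon", "printer"));
    }
}
```

In a database the same lookup would be an indexed query on the keyword column, joined or intersected per keyword.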
Paul,
I think Patrick explained the situation pretty well. So now, as with most things in OFBiz, it's only a matter of manpower :o) Of course, I don't mean coding right away, but looking closer at the existing code and gathering requirements, use cases, etc. Then discuss and hopefully provide a patch in Jira. Simple, isn't it? For patches in Jira please read http://docs.ofbiz.org/display/OFBADMIN/OFBiz+Contributors+Best+Practices

Jacques
|
In reply to this post by Patrick Antivackis
While it's possible that Lucene (or Solr) is faster for the keyword searches, I wouldn't be convinced until I saw a comparison done on a reasonably large data set between Lucene and the ProductKeyword table using a few different keyword combinations. With ProductKeyword we're using a database index on the keywords to look up productIds, which is basically what Lucene does with its own inverted index.

Lucene does do some cool search expression stuff that our current product searching doesn't support. However, the current product search does support various features like stem removal and thesaurus expansion (which have been mentioned in this thread).

One of the really big problems with moving to Lucene is how to handle the parametric searching and flexible sorting that we currently do by taking advantage of a dozen or so tables in the database to search on features associated with products and categories (optionally including all sub-categories) and prices and catalogs and stores, and on top of that it's easy to add constraints for just about anything else you might associate with a product.

The option of doing a Lucene search first to get a set of productIds that match and then passing that to the database with a possibly massive IN expression would work, but might perform horribly because of all of the data that needs to be moved around and such.

If Solr supports this sort of parametric search it might be interesting, but it would be a LOT of redundant data to keep track of, and I don't really like that a whole lot...

So, back to the beginning: unless someone can show that Lucene beats the keyword indexing that a good database (properly configured to make sure the keyword index is working and so on) does with the ProductKeyword table, I wouldn't even want to start going in this direction.

-David

(In reply to Patrick Antivackis's message of Sep 13, 2008, 6:43 AM.)
|
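To make the "possibly massive IN expression" concern concrete, here is a minimal sketch of what passing search-engine results back to the database involves. It only builds the SQL text and splits ids into batches; it opens no connection. The table and column names (`PRODUCT`, `PRODUCT_ID`) and the class name are assumptions for illustration, not taken from the OFBiz schema or code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InClauseSketch {
    /** Builds a parameterized query with one "?" placeholder per product id. */
    static String buildInQuery(int idCount) {
        if (idCount <= 0) {
            throw new IllegalArgumentException("need at least one id");
        }
        // Table and column names are hypothetical.
        return "SELECT PRODUCT_ID FROM PRODUCT WHERE PRODUCT_ID IN ("
                + String.join(", ", Collections.nCopies(idCount, "?")) + ")";
    }

    /** Splits a long id list into batches so that each statement stays a manageable size. */
    static List<List<String>> batches(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            out.add(new ArrayList<>(ids.subList(i, Math.min(i + batchSize, ids.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(buildInQuery(3));
        System.out.println(batches(List.of("P1", "P2", "P3"), 2));
    }
}
```

Every matching id has to travel from the search engine to the application and then to the database, and some databases cap the number of bind parameters or IN-list entries per statement, which is why batching appears in the sketch and why the round trip can get expensive for large result sets.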
Perhaps something easy and simple can be done first: creating and updating the product info of a catalog in a Lucene index.

Whether or not the Lucene index is used in the OFBiz backoffice, it would still be very useful in other scenarios, such as a catalog CD. Someone may want to distribute a catalog and other info on a CD-ROM or similar, and a Java-based client could use the Lucene index to search without an OFBiz installation.

Actually, we can offer the latter part in a component with a configurable search pipeline function.

Shi Yusen/Beijing Langhua Ltd.

On Sat, 2008-09-13 at 11:18 -0600, David E Jones wrote:
> While it's possible that Lucene (or Solr) is faster for the keyword
> searches I wouldn't be convinced until I saw a comparison done on a
> reasonably large data set between Lucene and the ProductKeyword table
> using a few different keyword combinations. With ProductKeyword we're
> using a database index on the keywords to lookup productIds, which is
> basically what Lucene does with its own reverse index.
>
> Lucene does do some cool search expression stuff that our current
> product searching doesn't support. However, the current product search
> does support various features like stem removal and thesaurus
> expansion (which has been mentioned in this thread).
>
> One of the really big problems with moving to Lucene is how to handle
> the parametric searching and flexible sorting that we currently do by
> taking advantage of a dozen or so tables in the database to search on
> features associated with products and categories (optionally including
> all sub-categories) and prices and catalogs and stores, and on top of
> that it's easy to add constraints for just about anything else you
> might associate with a product.
>
> The option of doing a Lucene search first to get a set of productIds
> that match and then passing that to the database with a possibly
> massive IN expression would work, but might perform horribly because
> of all of the data that needs to be moved around and such.
>
> If Solr supports this sort of parametric search it might be
> interesting, but it would be a LOT of redundant data to keep track of,
> and I don't really like that a whole lot...
>
> So, back to the beginning, unless someone can show that Lucene beats
> out the keyword indexing that a good database (and properly configured
> to make sure the keyword index is working and so on) does with the
> ProductKeyword table then I wouldn't even want to start going in this
> direction.
>
> -David
>
>
> On Sep 13, 2008, at 6:43 AM, Patrick Antivackis wrote:
>
> > Hello,
> > Just to put some light on the product search.
> > Main class involved :
> > applications/product/src/org/ofbiz/product/product/ProductSearch.java
> >
> > It's 100% SGDB based, not lucene or whatever.
> >
> > For a reminder, there is an entity in Ofbiz called ProductKeyword which
> > primary key is ProductId and Keyword (varchar(60)) and that is filled at
> > each creation update of the product characteristics, name, fields,....
> >
> > So is it today the best and most efficient way to do search? huho, not sure
> > you are right.. But for product only, it's usually enough (boolean search
> > speaking). Now if need also to index files that are associated with product
> > and may be (but i don't know if exist already as i never looked) if need to
> > index CMS and files uploaded through CMS, a solution based on a real search
> > engine should be far more superior.
> >
> > Regards
> >
> > 2008/9/11 madppiper <[hidden email]>
> >
> >> BJ Freeman wrote:
> >>>
> >>> You have stated what caused the responses, when you made assumptions.
> >>> [I have worked with Solr, not lucene.]
> >>>
> >>> You have not investigated how ofbiz works.
> >>
> >> I think that comments like that are not only unnecessary, but unhealthy for
> >> any open discussion. (Please read my original message again, replace the
> >> term "proprietary" with "native", keep in mind that OFBiz does NOT use
> >> Lucene for searching - so I was told several times now, and then skip
> >> through the original question at hand)
> >>
> >> @Jacques: Thanks for the response - not quite. There are actually two
> >> questions at hand:
> >>
> >> 1)
> >> What search engine, if any, is used by OFBiz to generate keyword search
> >> results for Products?
> >>
> >> 2)
> >> If 1) can be answered with "NO Searchengine per se" - which would imply
> >> that we are doing real database queries right now (perhaps one that use
> >> Fulltext-query algorithms), would it not be a good idea to move to a
> >> standalone searchengine as Solr?
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Replacing-Lucene-with-Solr-tp19412826p19429281.html
> >> Sent from the OFBiz - Dev mailing list archive at Nabble.com.
|
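David's remark that a database index on ProductKeyword does basically what Lucene does with its reverse (inverted) index can be illustrated with a small sketch. This is not OFBiz code: the table name `product_keyword`, the sample rows, and every function name below are assumptions made for illustration, and SQLite merely stands in for whatever database is actually deployed. Both lookups answer the same question (which productIds carry all of the given keywords), which is why a benchmark on a large data set, rather than the data structure alone, would be needed to pick a winner.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (product_id, keyword) pairs, loosely modelled on
# the ProductKeyword entity described in this thread. Not real OFBiz data.
ROWS = [
    ("P1", "red"), ("P1", "widget"),
    ("P2", "blue"), ("P2", "widget"),
    ("P3", "red"), ("P3", "gadget"),
]


def make_db(rows):
    """Create an in-memory keyword table with a database index on keyword."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_keyword ("
        "product_id TEXT, keyword TEXT, PRIMARY KEY (product_id, keyword))"
    )
    conn.execute("CREATE INDEX pk_keyword ON product_keyword (keyword)")
    conn.executemany("INSERT INTO product_keyword VALUES (?, ?)", rows)
    return conn


def search_db(conn, keywords):
    """Return sorted ids of products that carry ALL keywords (database side)."""
    keywords = sorted(set(keywords))
    if not keywords:
        return []
    marks = ",".join("?" for _ in keywords)
    sql = (
        "SELECT product_id FROM product_keyword "
        f"WHERE keyword IN ({marks}) "
        "GROUP BY product_id HAVING COUNT(DISTINCT keyword) = ? "
        "ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql, [*keywords, len(keywords)])]


def build_inverted_index(rows):
    """Map each keyword to the set of product ids containing it."""
    index = defaultdict(set)
    for product_id, keyword in rows:
        index[keyword].add(product_id)
    return index


def search_index(index, keywords):
    """Return sorted ids of products that carry ALL keywords (index side)."""
    postings = [index.get(keyword, set()) for keyword in keywords]
    return sorted(set.intersection(*postings)) if postings else []
```

Calling `search_db(make_db(ROWS), ["red", "widget"])` and `search_index(build_inverted_index(ROWS), ["red", "widget"])` yields the same product ids; what differs in a real system is where the index lives and how it is kept up to date.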
In reply to this post by David E Jones
David is right to point out the parametric search; it is a good reason to keep a 100% db search for products.
About the speed, I think nobody knows who would win between Lucene and a database (and which database ;) ), but databases are usually less efficient with a variable-length index than with a constant-length index (and keyword is a varchar(60)). Patrick |
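The two-step approach David describes, a search-engine lookup followed by a database query with a possibly massive IN expression, might look roughly like the sketch below. Everything here is assumed for illustration: the `product` table, its `price` column, the sample prices, and the `chunk_size` parameter are hypothetical, and SQLite stands in for the real database. The list of candidate ids plays the role of whatever Lucene or Solr would return.

```python
import sqlite3

# Hypothetical product rows: (product_id, price). Not real OFBiz data.
PRODUCTS = [("P1", 9.5), ("P2", 20.0), ("P3", 5.0), ("P4", 7.0)]


def make_product_db(products):
    """Create an in-memory product table holding one parametric field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (product_id TEXT PRIMARY KEY, price REAL)")
    conn.executemany("INSERT INTO product VALUES (?, ?)", products)
    return conn


def filter_by_constraints(conn, candidate_ids, max_price, chunk_size=500):
    """Keep candidates priced at or below max_price, cheapest first.

    candidate_ids stands in for the ids returned by the search engine.
    The IN list is sent in chunks because databases typically cap the
    number of bound parameters allowed in a single statement.
    """
    matches = []
    for start in range(0, len(candidate_ids), chunk_size):
        chunk = candidate_ids[start:start + chunk_size]
        marks = ",".join("?" for _ in chunk)
        sql = (
            "SELECT product_id, price FROM product "
            f"WHERE product_id IN ({marks}) AND price <= ?"
        )
        matches.extend(conn.execute(sql, [*chunk, max_price]))
    # Sorting happens client-side because the rows arrive chunk by chunk.
    matches.sort(key=lambda row: (row[1], row[0]))
    return [product_id for product_id, _ in matches]
```

The sketch makes the cost David worries about visible: every candidate id travels to the database, and once the list is chunked the sorting has to be redone outside the database.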
In reply to this post by Shi Yusen
On Saturday 13 September 2008, Shi Yusen wrote:
> Perhaps something easy and simple can be done first, creating and
> updating product info in a catalog to a Lucene index.
>
> No matter whether the Lucene index would be used in OFBiz backoffice,
> it's still very useful for some other scenarios such as in a catalog
> CD. Someone may want to distribute a catalog and other info in a CD-ROM
> or similar, a Java-based client can use the Lucene index to do search
> without an OFBiz installation.
>
> Actually, we can offer the latter part in a component with configurable
> search pipeline function.
>
> Shi Yusen/Beijing Langhua Ltd.

Might some answers be found by looking at the performance of H2, which has Lucene built into it?

David |
In reply to this post by Patrick Antivackis
Why even stick to Varchars? You can keep it at Char(60) - that reserves the space within the database table, but doesn't mean that you can't enter a 10-character string into it... (it's a matter of disk space really, nothing else)
P.S.: Thanks for all the great feedback - reading up on it.
|
According to http://archives.postgresql.org/pgsql-performance/2005-03/msg00491.php, this is not true for Postgres. See the tip in http://www.postgresql.org/docs/8.3/interactive/datatype-character.html:

Tip: There are no performance differences between these three types, apart from increased storage size when using the blank-padded type, and a few extra cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, it has no such advantages in PostgreSQL. In most situations text or character varying should be used instead.

This is *one of* the reasons I prefer Postgres (though I did not look into the MySQL details).

BTW, I will commit https://issues.apache.org/jira/browse/OFBIZ-1920 if nobody sees a problem with that.

Jacques

From: "madppiper" <[hidden email]>
> Why even stick to Varchars? You can keep it at Char(60) - that reserves the
> space within the database table, but doesn't mean that you can't enter a
> 10-character string into it... (it's a matter of disk space really, nothing else)
>
> Patrick Antivackis wrote:
>>
>> David is right to point out the parametric search, this is a good reason to
>> keep a 100% db search for products.
>> About the speed, i think nobody knows who will win between lucene and a
>> database (and which database ;) ), but databases usually are less efficient
>> with variable length index than with constant length index (and keyword is a
>> varchar(60)).
>>
>> Patrick
>
> --
> View this message in context: http://www.nabble.com/Replacing-Lucene-with-Solr-tp19412826p19506954.html
> Sent from the OFBiz - Dev mailing list archive at Nabble.com.
|