Replacing Lucene with Solr

Replacing Lucene with Solr

madppiper-2
Hi,

I was curious whether it would be possible to replace the proprietary Lucene engine with Solr (www.apache.org/solr). I am asking because that could increase overall performance and also allow the fancy faceting.

So, in case it is possible (it should be, since Solr is based on Lucene): how are products/catalogs added to Lucene right now? Is Lucene running on a certain port? If so, would it be possible to simply switch ports?

Cheers,
Paul

Re: Replacing Lucene with Solr

David E Jones

While this is an interesting discussion, I think you need to do more  
research and be more specific before you can start a discussion.

It sounds like you are talking specifically about the product  
searching, but please be explicit about the type of data you'd like to  
search.

If it is about product searching then please do more research because  
that does not use Lucene.

Also, in what way is Lucene "proprietary" or, in other words, how is
Solr less proprietary than Lucene? Sorry, that comment just sounded
really funny.

-David


On Sep 10, 2008, at 6:46 AM, madppiper wrote:

>
> Hi,
>
> I was curious whether it would be possible to replace the  
> proprietary Lucene
> engine with solr (www.apache.org/solr). The
> question I am asking is because that could increase the overall  
> performance
> and also allow the fancy facetting.
>
> So in case it is possible (it should be: solr is based on lucene):  
> how are
> products/catalogs added to lucene right now? is lucene running on a  
> certain
> port? if so, would it be possible to simply switch ports?
>
> Cheers,
> Paul
> --
> View this message in context: http://www.nabble.com/Replacing-Lucene-with-Solr-tp19412826p19412826.html
> Sent from the OFBiz - Dev mailing list archive at Nabble.com.
>


Re: Replacing Lucene with Solr

madppiper-2
Ah, yes indeed.

Sorry about my mumbling today. To be a tiny bit more specific: I am of course only referring to the indexed product search. I know that Solr is based on Lucene and that both use a similar way of indexing newly listed products (that is, by adding, updating, or removing entities to/from the index through XML files).

As far as I understand, Lucene is the main search engine used within OFBiz (I hope we are NOT relying on cached MySQL-native full-text queries here). New products are added to or removed from the IndexTree through an XML script (just as they would be in Solr). Lucene then provides the rest of the OFBiz environment with a regular search mechanism, as well as ranked indexing, sorting and yadayadayada.

Lucene, however, does not allow advanced faceting, nor is it a standalone application. Solr provides both of these advantages. Therefore it would be great if one could easily switch between the two search engines...


Re: Replacing Lucene with Solr

David E Jones

NO!

Sorry, not correct. I'll state it again and this time with nothing  
else so it's clear:

Product searching in OFBiz does NOT use Lucene.

Please keep researching, and then feel free to continue this discussion.

-David

On Sep 10, 2008, at 7:56 AM, madppiper wrote:

>
> Ah, yes indeed.
>
> Sorry, about my mumbling today. To be a tiny bit more specific. I am  
> of
> course only referring to the indexed product search. I know that  
> Solr is
> based on Lucene and that both use a similar way of indexing new listed
> products (that is by adding, updating,removing new entities from/to  
> the
> index through XML files).
>
> From as much as I understand, Lucene is the main search engine used  
> within
> OFBiz (I hope we are NOT relying on cached Mysql-native Fulltext  
> Queries
> here). New Products are added to or removed from the IndexTree  
> through a xml
> script (just as they would in Solr). Lucene then provides the rest  
> of the
> OFBiz environment with a regular search mechanism, as well as ranked
> indexing, sorting and yadayadayada.
>
> Lucene however, does not allow advanced Facetting, nor is it a  
> standalone
> application. Both of these advantages are something that Solr  
> provides.
> Therefore it would be great if one could easily switch both search
> engines...
>

Re: Replacing Lucene with Solr

Adam Heath-2
In reply to this post by madppiper-2
madppiper wrote:
> Ah, yes indeed.

You're still mumbling.

> Sorry, about my mumbling today. To be a tiny bit more specific. I am of
> course only referring to the indexed product search. I know that Solr is
> based on Lucene and that both use a similar way of indexing new listed
> products (that is by adding, updating,removing new entities from/to the
> index through XML files).

No, it doesn't.  Lucene is an API; it doesn't use XML files to populate
its internal database.

Lucene doesn't index products.  It provides an inverted index over records
of text fields (which can be tokenized or not, compressed or not, and
stored or not).  It's up to the application to decide which fields to
add to a record, and what the fields and records actually mean.
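
To make the point above concrete, here is a minimal sketch in Python of an inverted index over records of named fields. It is not Lucene's API; the class `TinyIndex`, the `tokenize` helper and the sample field names are all hypothetical and exist only to show that the caller, not the index, decides which fields a record has and whether each one is tokenized.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


class TinyIndex:
    """Toy inverted index: (field, term) -> set of record ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.stored = {}

    def add(self, record_id, fields, tokenized=("name", "description")):
        # The application chooses the fields; the index attaches no meaning to them.
        self.stored[record_id] = fields
        for field, value in fields.items():
            # Untokenized fields are indexed as one exact term.
            terms = tokenize(value) if field in tokenized else [value]
            for term in terms:
                self.postings[(field, term)].add(record_id)

    def search(self, field, term):
        return sorted(self.postings.get((field, term), set()))


index = TinyIndex()
index.add("P1", {"name": "Blue Widget", "brand": "Acme Corp"})
index.add("P2", {"name": "Red Widget", "brand": "Globex"})
```

With this toy index, `index.search("name", "widget")` finds both records because `name` is tokenized, while `brand` only matches on its exact stored value.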

> From as much as I understand, Lucene is the main search engine used within
> OFBiz (I hope we are NOT relying on cached Mysql-native Fulltext Queries
> here). New Products are added to or removed from the IndexTree through a xml
> script (just as they would in Solr). Lucene then provides the rest of the
> OFBiz environment with a regular search mechanism, as well as ranked
> indexing, sorting and yadayadayada.

Um, no, lucene is not the *main* engine, not at all.  Have you seen
ProductKeyword?

> Lucene however, does not allow advanced Facetting, nor is it a standalone
> application. Both of these advantages are something that Solr provides.
> Therefore it would be great if one could easily switch both search
> engines...

You say "program A has feature Z, so program A is obviously better", but
you never define feature Z.

I suggest you go have your morning coffee first, and maybe wait 'til
after lunch.  You're not helping your argument at all.

ps: for our own internal websites, I've utilized both lucene and
nutch(which is based on hadoop and lucene), and in both of those cases,
you write *java* code to populate the lucene record with fields.  No xml
is ever used.

Re: Replacing Lucene with Solr

madppiper-2
Well,

I am sorry to say this, but I don't think that insulting me helps a healthy discussion at all. Since I am new to the OFBiz environment and the documentation is really scattered all over the place, I don't think I have to feel ashamed of not knowing OFBiz in every detail, nor should that keep me from asking these kinds of questions.

To continue the discussion, however (...and hopefully put this all behind us), I still believe that moving the OFBiz framework to a new search engine could be a very valuable idea. If we moved to Solr, the indexed search (of products) would greatly benefit from features such as:

#  Support for Dynamic Faceted Browsing and Filtering
#  Advanced, Configurable Text Analysis
#  Multiple search indices
#etc.

(Feel welcome to have a read through http://lucene.apache.org/solr/features.html)

So from what you said, no matter how you said it, I have learnt that Lucene is indeed the native search engine used by OFBiz and that it is NOT fed through XML files. (That makes sense, because I couldn't find the respective XML files mentioned; it also explains what I found in org.ofbiz.content.search.SearchServices.)
So in order to use a different search engine, I would have to write my own XML file generator and feed the search engine with that, correct? Doesn't the entity engine already do that with such data?

EDIT: In case Lucene is NOT used by OFBiz for product search, are we really using Fulltext queries????


Re: Replacing Lucene with Solr

Adam Heath-2
madppiper wrote:
> Well,
>
> I am sorry to say this, but I don't think that insulting me would help a
> healthy discussion at all - since I am new to the OFBiz environment and the
> documents are really scattered all over the place, I don't think I have to
> feel ashamed over not knowing OFBiz in each detail, nor should it hinder me
> from asking this type of questions.

If I came off that harsh, then I apologize.  But you haven't been making
much of any sense yourself.

First, you call lucene proprietary.  Huh?  Come again?

Second, you say replacing lucene usage with solr would increase
performance.  That's not possible.  It would slow things down.  Let's
just say that product searching was done with lucene, and it was
switched to solr.  Instead of the lucene searching being done in
process, we'd now have to go to a separate process, serializing thru
xml, then this other system would do an internal lucene query, then send
results back thru xml.  There's no way that could be faster.  If you
start talking about using solr's replication, then maybe you could get a
speed increase.  However, product searching is not where ofbiz spends
most of its time.
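
The extra work described in this point can be sketched roughly as follows. This is an illustration only, not a benchmark and not Solr's actual wire format: the `INDEX` dictionary, the `<query>`/`<response>` element names and all function names are hypothetical, and the network hop between processes is left out entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword -> product id mapping standing in for the real index.
INDEX = {"widget": ["P1", "P2"], "gadget": ["P3"]}


def search_in_process(term):
    """Direct call: one lookup, no serialization."""
    return INDEX.get(term, [])


def encode_request(term):
    root = ET.Element("query")
    root.text = term
    return ET.tostring(root)


def handle_request(payload):
    """What a separate search process would do: parse, look up, serialize."""
    term = ET.fromstring(payload).text
    response = ET.Element("response")
    for product_id in INDEX.get(term, []):
        ET.SubElement(response, "id").text = product_id
    return ET.tostring(response)


def search_out_of_process(term):
    """Same lookup plus extra steps; the network transfer is omitted here."""
    payload = encode_request(term)   # serialize the request
    reply = handle_request(payload)  # parse, look up, serialize the reply
    return [node.text for node in ET.fromstring(reply).findall("id")]  # parse the reply
```

Both functions return the same ids; the second one simply does strictly more work per query, which is the argument being made here. Whether that overhead matters in practice would have to be measured.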

Third, you mention facetting.  What is that?  It's not defined on the
main website for solr.  If it's such a whiz-bang fancy feature, then it
should be prominently defined; it's not.

Fourth, you ask what port lucene is running on?  Huh?  Lucene doesn't
run on a port.  It's not server software.  It's an api for indexing
documents.

Fifth, you then say that solr and lucene index files in a similar way.
This is not entirely accurate.  The way you stated it, it implies that
solr and lucene are completely different tools, that just happen to have
a similar set of features.  However, solr is built *on top of* lucene.

Sixth, you go on to say that lucene is fed with xml files.  It's not.
It's an api, and it's up to the application to fetch the data, split it
into fields, then hand it off to lucene for tokenizing/indexing.

I'm getting bored now, but to summarize:

You start talking about other programs, but the things you say are *not*
what those other programs do.  We can forgive your lack of knowledge
about ofbiz.  But when you are trying to sell use of another software
project, and the statements you make show that you don't know what you
are talking about, don't be upset if we don't respond favourably.

ps: again, sorry if you feel insulted by this, but it's obvious from reading
your mails that you don't even know how the other software systems you
are talking about are supposed to work.

> To Continue the discussion however (...and hopefully put this all behind), I
> still continue to believe that moving the OFBiz framework to a new
> Searchengine could be a very valuable idea. In case of moving to Solr, the
> indexed search (of products) would greatly benefit from such features as:
>
> #  Support for Dynamic Faceted Browsing and Filtering
> #  Advanced, Configurable Text Analysis
> #  Multiple search indices
> #etc.

What are these?  Definitions.  Be a salesman.  Seal the deal.

> (Feel welcome to have a read through
> http://lucene.apache.org/solr/features.html)

As far as documentation goes, that page ranks right up there with the
best of the ofbiz pages.  Short, terse, and completely lacking in
anything useful to digest.  Filled with internal industry buzz-phrases
that mean nothing to those who don't already have an intimate knowledge
of the product. :|

If you want us to look at this software, then don't make us do your work
for you.  While we are completely capable of cross-referencing undefined
terms on that page, doing so would entail extra work on our part.  We
already have enough on our plates.

And we have a system that is currently working.  A new system would have
to be much better, in order for us to consider switching to it, and
dealing with the integration issues that occur.

Plus, if we switch to some new tool, then we have to increase our
working set to include the use of that tool, and continue to remember
how it works going forward.

> So from what you said, no matter how you said it, I have learnt that Lucene
> is indeed the native search engine used by ofbiz and it is NOT fed through
> XML files. (that makes sense, cause I couldn't find the respective XML files
> mentioned - also explains what I found in
> org.ofbiz.content.search.SearchServices).
> So in order to use a different Searchengine, I would have to write my own
> XML-File generator and feed the search engine with that, correct? Doesn't
> the entityengine already do that with such data?

Again, no, it does not use lucene.  You're having a hard time understanding.

The entityengine is not a free-form generic xml producer.  It produces a
raw dump, in xml form, of the database(in effect).  While you might be
able to massage that into something that solr can use(by way of xsl),
it's not the way I would do this.

Re: Replacing Lucene with Solr

madppiper-2
In reply to this post by madppiper-2

Not to sound rude, but frankly I am already getting sick of all of these false accusations. All I wanted to know was whether or not anybody had ever tried to switch search engines. I was under the impression that OFBiz was using Lucene for product searches (and yes, I was fully aware that Solr is based upon Lucene; didn't I state that clearly?). Since that was obviously not the case, I then asked which search engine (if any) was in use. I did not intend to sell a product here, nor am I interested in persuading the rest of you to use Solr.

The reason why I, personally, want to use Solr, however, is that I have come to know and like this engine on various projects of mine. It is incredibly fast, and can do a lot more than most other search engines can. "Faceted browsing", for instance, allows the user to dig deeper into the search results and narrow them by keywords (or, in other words, facets that apply to any given object: it lists criteria, so to speak, that help users narrow down results by manufacturer, price, author or whatever). It is also extremely powerful when it comes to correcting user input. It analyzes the input text and corrects any misspellings:

"Example queries demonstrating relevancy improving transformations:

    * A search for power-shot matches PowerShot, and adata matches A-DATA due to the use of WordDelimiterFilter and LowerCaseFilter.
    * A search for name:printers matches Printer, and features:recharging matches Rechargeable due to stemming with the EnglishPorterFilter.
    * A search for "1 gigabyte" matches things with GB, and pixima matches Pixma due to use of a SynonymFilter.
"
 (from: http://lucene.apache.org/solr/tutorial.html#Text+Analysis)
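
The kind of normalization the quoted examples describe can be sketched in a few lines of Python. This is not Solr's filter chain: the `SYNONYMS` table, the splitting regex and the `analyze`/`matches` functions are hypothetical stand-ins for the word-delimiter, lowercase and synonym steps, and stemming is left out.

```python
import re

# Hypothetical synonym table mapping variants to one canonical term.
SYNONYMS = {"gigabyte": "gb", "pixima": "pixma"}


def analyze(text):
    """Return a normalized set of index terms for a piece of text."""
    terms = set()
    for word in text.split():
        # Word-delimiter step: split on case changes and non-alphanumerics.
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", word)
        # Lowercase step.
        parts = [p.lower() for p in parts]
        if parts:
            terms.update(parts)
            # Also keep the joined form, so "power-shot" and "PowerShot" meet.
            terms.add("".join(parts))
    # Synonym step: replace a term by its canonical form when one is listed.
    return {SYNONYMS.get(t, t) for t in terms}


def matches(query, document):
    """A document matches when it contains every analyzed query term."""
    return analyze(query) <= analyze(document)
```

The important design point is that the same `analyze` function is applied to both the query and the indexed text, so variants collapse to the same terms on both sides.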


I haven't seen that implemented in OFBiz, but if I am mistaken, I will gladly be proven wrong.
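
As for the faceted browsing described earlier in this message, a minimal sketch of the idea in Python might look like this. The sample hits and the field names `manufacturer` and `price_band` are made up for illustration; a real engine computes these counts inside the index rather than over a list in memory.

```python
from collections import Counter

# Hypothetical search hits; the field names are illustrative only.
hits = [
    {"id": "P1", "manufacturer": "Acme", "price_band": "0-50"},
    {"id": "P2", "manufacturer": "Acme", "price_band": "50-100"},
    {"id": "P3", "manufacturer": "Globex", "price_band": "0-50"},
]


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(r[field] for r in results)


def narrow(results, field, value):
    """Apply one facet as a filter, as a user would by clicking it."""
    return [r for r in results if r[field] == value]
```

A result page would show the counts from `facet_counts` next to the hits, and each click calls `narrow` and recomputes the counts on the remaining results.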

Final question: what engine do we use for product searches then? Cached full-text queries?

Don't get me wrong, I would still be very grateful for an honest, non-offensive answer...
Cheers,
Paul



P.S.: Instead of "proprietary" I meant "native", by the way. Not all of us are native speakers, you know... sorry
P.P.S.: Solr is a stand-alone application and communicates through a port, hence my misunderstanding that Lucene would do the same. I have worked with Solr, not Lucene.

Re: Replacing Lucene with Solr

BJ Freeman
You have stated what caused the responses, when you made assumptions.
[I have worked with Solr, not lucene.]

You have not investigated how ofbiz works.

You can research this on your own by searching for files that include
the word "search" and following the code or widgets you find.


madppiper sent the following on 9/10/2008 2:07 PM:

>
> Not to sound rude, but frankly I am already getting sick of all of these
> false accusations. All I wanted to know was whether or not anybody ever
> tried to switch search engines. I was in the understanding that for the
> product searches, OFBiz was using Lucene (and yes, I was fully aware that
> Solr is based upon Lucene, didn't I stay that clearly?) - since that was
> obviously not the case I then asked which Search engine (if any) was in the
> use. I did not intend to sell a product here, nor am I interested in
> pursuading the rest of you into using Solr.
>
> The reason to why I, personally, want to use Solr however, is because I have
> come to know and like this engine on different projects of mine. It is
> incredibly fast, and can do alot more than most other search engines can.
> The "faceted browsing", for instance, allows the user to dig deeper into the
> search results and narrow the results by keywords (or in other words facets
> that apply to any given object - it lists criteria so to speak that help
> users narrow down results by manufacturer, price, or author or whatever). It
> is also extremely powerful when it comes to correcting the userinput. It
> analyzes the input text and corrects any misspellings:
>
> "Example queries demonstrating relevancy improving transformations:
>
>     * A search for power-shot matches PowerShot, and adata matches A-DATA
> due to the use of WordDelimiterFilter and LowerCaseFilter.
>     * A search for name:printers matches Printer, and features:recharging
> matches Rechargeable due to stemming with the EnglishPorterFilter.
>     * A search for "1 gigabyte" matches things with GB, and pixima matches
> Pixma due to use of a SynonymFilter.
> " (from: http://lucene.apache.org/solr/tutorial.html#Text+Analysis)
>
>
> I haven't seen that implemented in OFBiz, but if I am mistaken, I all be
> gladly proven otherwise.
>
> Final question? What engine do we use for product searches then? Cached
> Fulltext-queries?
>
> Don't get me wrong - still very grateful for an honest, not offending
> answer...
> Cheers,
> Paul
>
>
>
> P.S.:Instead of proprietary i meant native, btw - not all of us are native
> speakers you know... sorry
> P.P.S.: Solr is a stand-alone application and communicates through a port -
> hence my misunderstanding that Lucene would do the same - I have worked with
> Solr, not lucene
>


Re: Replacing Lucene with Solr

Jacques Le Roux
Administrator
In reply to this post by madppiper-2
So the question is: would Solr be better for OFBiz than what exists right now, right?

Jacques


Re: Replacing Lucene with Solr

madppiper-2
In reply to this post by BJ Freeman
BJ Freeman wrote:
> You have stated what caused the responses, when you made assumptions.
> [I have worked with Solr, not lucene.]
>
> You have not investigated how ofbiz works.

I think that comments like that are not only unnecessary, but unhealthy for any open discussion. (Please read my original message again, replace the term "proprietary" with "native", keep in mind that OFBiz does NOT use Lucene for searching, as I have been told several times now, and then skip to the original question at hand.)



@Jacques: Thanks for the response - not quite. There are actually two questions at hand:

1)
What search engine, if any, is used by OFBiz to generate keyword search results for Products?

2)
If 1) can be answered with "NO search engine per se", which would imply that we are doing real database queries right now (perhaps ones that use full-text query algorithms), would it not be a good idea to move to a standalone search engine such as Solr?


Re: Replacing Lucene with Solr

Patrick Antivackis
Hello,
Just to shed some light on the product search.
Main class involved:
applications/product/src/org/ofbiz/product/product/ProductSearch.java

It's 100% database (DBMS) based, not Lucene or whatever.

As a reminder, there is an entity in OFBiz called ProductKeyword whose
primary key is ProductId plus Keyword (varchar(60)), and which is filled at
each creation or update of the product's characteristics, name, fields, ...
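
A rough sketch of how such a keyword table supports boolean search, using Python's built-in sqlite3 module. This is not the OFBiz code in ProductSearch.java: the table name `product_keyword`, the index name and both helper functions are hypothetical, and only the two key columns mentioned above are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the ProductKeyword entity: one row per (product, keyword).
conn.execute(
    "CREATE TABLE product_keyword ("
    " product_id TEXT NOT NULL,"
    " keyword VARCHAR(60) NOT NULL,"
    " PRIMARY KEY (product_id, keyword))"
)
conn.execute("CREATE INDEX idx_keyword ON product_keyword (keyword)")


def index_product(product_id, text):
    """Rebuild the keyword rows for one product, as on create or update."""
    conn.execute("DELETE FROM product_keyword WHERE product_id = ?", (product_id,))
    rows = {(product_id, word.lower()) for word in text.split()}
    conn.executemany("INSERT INTO product_keyword VALUES (?, ?)", sorted(rows))


def search_all(keywords):
    """Boolean AND: products that have a row for every given keyword."""
    keywords = sorted({k.lower() for k in keywords})
    marks = ", ".join("?" for _ in keywords)
    sql = (
        "SELECT product_id FROM product_keyword"
        f" WHERE keyword IN ({marks})"
        " GROUP BY product_id HAVING COUNT(*) = ?"
        " ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql, (*keywords, len(keywords)))]


index_product("P1", "Blue Widget deluxe")
index_product("P2", "Red Widget")
```

Because the pair (product_id, keyword) is unique, counting the matching rows per product is enough to check that every keyword was found.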

So is it, today, the best and most efficient way to do search? Uh-oh, not
sure; you may be right. But for products only, it's usually enough (boolean
search, that is). Now, if you also need to index files that are associated
with products, and maybe (I don't know whether it already exists, as I never
looked) CMS content and files uploaded through the CMS, a solution based on a
real search engine should be far superior.

Regards

2008/9/11 madppiper <[hidden email]>

>
>
> BJ Freeman wrote:
> >
> > You have stated what caused the responses, when you made assumptions.
> > [I have worked with Solr, not lucene.]
> >
> > You have not investigated how ofbiz works.
>
> I think that comments like that are not only unneccesary, but unhealthy for
> any open discussion. (Please read my original message again, replace the
> term "proprietary" with "native", keep in mind that OFBIz does NOT use
> Lucene for searching - so I was told several times now, and then skip
> through the original question at hand)
>
>
>
> @Jacques: Thanks for the response - not quite. There are actually two
> questions at hand:
>
> 1)
> What search engine, if any, is used by OFBiz to generate keyword search
> results for Products?
>
> 2)
> If 1) can be answered with "NO Searchengine per se" - which would implie
> that we are doing real database queries right now (perhaps one that use
> Fulltext-query algorithms), would it not be a good idea to move to a
> standalone searchengine as Solr?
>

Re: Replacing Lucene with Solr

Jacques Le Roux
Administrator
Paul,

I think Patrick explained the situation pretty well. So now, as with most things in OFBiz, it's only a matter of manpower :o)
Of course, I don't mean coding right away, but rather looking more closely at the existing code and gathering requirements, use cases, etc.
Then discuss and hopefully provide a patch in Jira. Simple, isn't it?
For patches in Jira, please read http://docs.ofbiz.org/display/OFBADMIN/OFBiz+Contributors+Best+Practices

Jacques

From: "Patrick Antivackis" <[hidden email]>

> Hello,
> Just to put some light on the product search.
> Main class involved :
> applications/product/src/org/ofbiz/product/product/ProductSearch.java
>
> It's 100% SGDB based, not lucene or whatever.
>
> For a reminder, there is an entity in Ofbiz called ProductKeyword which
> primary key is ProductId and Keyword (varchar(60)) and that is filled at
> each creation update of the product carateristics, name, fields,....
>
> So is it today the best and most efficient way to do search? huho, not sure
> you are right.. But for product only, it's usually enough (boolean search
> speaking). Now if need also to index files that are associated with product
> and may be (but i don't know if exist already as i never looked) if need to
> index CMS and files uploaded through CMS, a solution based on a real search
> engine should be far more superior.
>
> Regards
>
> 2008/9/11 madppiper <[hidden email]>
>
>>
>>
>> BJ Freeman wrote:
>> >
>> > You have stated what caused the responses, when you made assumptions.
>> > [I have worked with Solr, not lucene.]
>> >
>> > You have not investigated how ofbiz works.
>>
>> I think that comments like that are not only unnecessary, but unhealthy for
>> any open discussion. (Please read my original message again, replace the
>> term "proprietary" with "native", keep in mind that OFBiz does NOT use
>> Lucene for searching - as I have now been told several times - and then skip
>> to the original question at hand.)
>>
>>
>>
>> @Jacques: Thanks for the response - not quite. There are actually two
>> questions at hand:
>>
>> 1)
>> What search engine, if any, is used by OFBiz to generate keyword search
>> results for Products?
>>
>> 2)
>> If 1) can be answered with "NO search engine per se" - which would imply
>> that we are doing real database queries right now (perhaps ones that use
>> full-text query algorithms) - would it not be a good idea to move to a
>> standalone search engine such as Solr?
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Replacing-Lucene-with-Solr-tp19412826p19429281.html
>> Sent from the OFBiz - Dev mailing list archive at Nabble.com.
>>
>>
>

Re: Replacing Lucene with Solr

David E Jones
In reply to this post by Patrick Antivackis

While it's possible that Lucene (or Solr) is faster for the keyword  
searches I wouldn't be convinced until I saw a comparison done on a  
reasonably large data set between Lucene and the ProductKeyword table  
using a few different keyword combinations. With ProductKeyword we're
using a database index on the keywords to look up productIds, which is
basically what Lucene does with its own inverted index.
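
To make the comparison concrete, here is a minimal sketch of a keyword
lookup against a ProductKeyword-style table. It is an illustration only,
not the code in ProductSearch.java: the table and column names
(product_keyword, product_id, keyword), the index name and the use of
SQLite are all assumptions made so the example is self-contained.

```python
import sqlite3


def build_keyword_table(rows):
    """Create an in-memory table shaped like ProductKeyword (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_keyword ("
        " product_id TEXT NOT NULL,"
        " keyword VARCHAR(60) NOT NULL,"
        " PRIMARY KEY (product_id, keyword))"
    )
    # Index on the keyword column: this is what makes keyword -> productId fast.
    conn.execute("CREATE INDEX idx_product_keyword ON product_keyword (keyword)")
    conn.executemany("INSERT INTO product_keyword VALUES (?, ?)", rows)
    return conn


def find_products(conn, keywords):
    """Return the ids of products that carry ALL of the given keywords."""
    keywords = sorted({k.lower() for k in keywords})
    if not keywords:
        return []
    placeholders = ",".join("?" * len(keywords))
    sql = (
        "SELECT product_id FROM product_keyword"
        f" WHERE keyword IN ({placeholders})"
        " GROUP BY product_id"
        " HAVING COUNT(DISTINCT keyword) = ?"
        " ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql, (*keywords, len(keywords)))]
```

Any benchmark against Lucene would need to run queries of this shape on a
realistically large table, with the database's own index doing the work.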

Lucene does do some cool search expression stuff that our current  
product searching doesn't support. However, the current product search  
does support various features like stem removal and thesaurus  
expansion (which has been mentioned in this thread).

One of the really big problems with moving to Lucene is how to handle  
the parametric searching and flexible sorting that we currently do by  
taking advantage of a dozen or so tables in the database to search on  
features associated with products and categories (optionally including  
all sub-categories) and prices and catalogs and stores, and on top of  
that it's easy to add constraints for just about anything else you  
might associate with a product.

The option of doing a Lucene search first to get a set of productIds  
that match and then passing that to the database with a possibly  
massive IN expression would work, but might perform horribly because  
of all of the data that needs to be moved around and such.
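
A rough sketch of that two-step approach, to show where the data movement
comes from. Everything here is hypothetical: a plain Python dictionary
stands in for the Lucene index, the product_price table and its columns are
invented for the example, and the chunk size is arbitrary. The point is only
that every id matched by the text search has to be shipped to the database
before the parametric constraints can be applied.

```python
import sqlite3
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of product ids whose text contains it."""
    index = defaultdict(set)
    for product_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(product_id)
    return index


def search_index(index, terms):
    """Step 1: text search, returning ids that match all terms."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def filter_in_database(conn, product_ids, max_price, chunk_size=500):
    """Step 2: apply a parametric constraint with chunked IN expressions."""
    ids = sorted(product_ids)
    matches = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = (
            "SELECT product_id FROM product_price"
            f" WHERE product_id IN ({placeholders}) AND price <= ?"
            " ORDER BY product_id"
        )
        matches.extend(row[0] for row in conn.execute(sql, (*chunk, max_price)))
    return matches
```

With a broad keyword the first step can return a large share of the
catalog, and the number of round trips in the second step grows with it.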

If Solr supports this sort of parametric search it might be  
interesting, but it would be a LOT of redundant data to keep track of,  
and I don't really like that a whole lot...

So, back to the beginning: unless someone can show that Lucene beats
the keyword indexing that a good database does with the ProductKeyword
table (a database properly configured so that the keyword index is
actually used, and so on), I wouldn't even want to start going in this
direction.

-David




Re: Replacing Lucene with Solr

Shi Yusen
Perhaps something easy and simple can be done first: creating and
updating the product info of a catalog in a Lucene index.

Whether or not the Lucene index is used in the OFBiz back office,
it is still very useful for other scenarios, such as a catalog
CD. Someone may want to distribute a catalog and other info on a CD-ROM
or similar, and a Java-based client can use the Lucene index to search
without an OFBiz installation.

Actually, we can offer the latter part in a component with a configurable
search pipeline function.

Shi Yusen/Beijing Langhua Ltd.





Re: Replacing Lucene with Solr

Patrick Antivackis
In reply to this post by David E Jones
David is right to point out the parametric search; this is a good reason to
keep a 100% db search for products.
About the speed, I think nobody knows which will win between Lucene and a
database (and which database ;) ), but databases are usually less efficient
with variable-length index keys than with fixed-length ones (and keyword is a
varchar(60)).

Patrick

Re: Replacing Lucene with Solr

David Goodenough
In reply to this post by Shi Yusen
Might some answers be found by looking at the performance of H2, which has
Lucene built into it?

David

Re: Replacing Lucene with Solr

madppiper-2
In reply to this post by Patrick Antivackis
Why even stick to varchars? You can keep it at char(60) - that reserves the space within the database table, but doesn't mean that you can't enter a 10-character string into it... (it's a matter of disk space really, nothing else)


P.S.: Thanks for all the great feedback - reading up on it.




Re: Replacing Lucene with Solr

Jacques Le Roux
Administrator
According to http://archives.postgresql.org/pgsql-performance/2005-03/msg00491.php, this is not true for Postgres. Read the tip in
http://www.postgresql.org/docs/8.3/interactive/datatype-character.html:
Tip: There are no performance differences between these three types, apart from increased storage size when using the blank-padded
type, and a few extra cycles to check the length when storing into a length-constrained column. While character(n) has performance
advantages in some other database systems, it has no such advantages in PostgreSQL. In most situations text or character varying
should be used instead.
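
To put a rough number on the "increased storage size" part, here is a small
back-of-the-envelope sketch. It only counts characters: it ignores length
headers, row overhead, encoding and compression, which vary by database, and
the sample keywords are made up.

```python
def storage_chars(keywords, width=60):
    """Compare characters stored as blank-padded char(width) vs. varchar(width)."""
    for kw in keywords:
        if len(kw) > width:
            raise ValueError(f"keyword longer than {width} characters: {kw!r}")
    fixed = width * len(keywords)                # every value padded to full width
    variable = sum(len(kw) for kw in keywords)   # only the characters actually used
    return fixed, variable


fixed, variable = storage_chars(["red", "widget", "blue"])
print(fixed, variable)  # 180 13
```

With short keywords the padded column stores mostly blanks, which is the
disk-space cost mentioned above.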

This is *one of* the reasons I prefer Postgres (though I did not look into the MySQL details).

BTW, I will commit https://issues.apache.org/jira/browse/OFBIZ-1920 if nobody sees a problem with that.

Jacques
