Modified: ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_sv.txt
URL: http://svn.apache.org/viewvc/ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_sv.txt?rev=1781731&r1=1781730&r2=1781731&view=diff
==============================================================================
--- ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_sv.txt (original)
+++ ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_sv.txt Sun Feb 5 11:09:59 2017
@@ -1,133 +1,133 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/swedish/stop.txt
- | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
- | - Encoding was converted to UTF-8.
- | - This notice was added.
- |
- | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
-
- | A Swedish stop word list. Comments begin with vertical bar. Each stop
- | word is at the start of a line.
-
- | This is a ranked list (commonest to rarest) of stopwords derived from
- | a large text sample.
-
- | Swedish stop words occasionally exhibit homonym clashes. For example
- | så = so, but also seed. These are indicated clearly below.
-
-och | and
-det | it, this/that
-att | to (with infinitive)
-i | in, at
-en | a
-jag | I
-hon | she
-som | who, that
-han | he
-på | on
-den | it, this/that
-med | with
-var | where, each
-sig | him(self) etc
-för | for
-så | so (also: seed)
-till | to
-är | is
-men | but
-ett | a
-om | if; around, about
-hade | had
-de | they, these/those
-av | of
-icke | not, no
-mig | me
-du | you
-henne | her
-då | then, when
-sin | his
-nu | now
-har | have
-inte | inte någon = no one
-hans | his
-honom | him
-skulle | 'sake'
-hennes | her
-där | there
-min | my
-man | one (pronoun)
-ej | nor
-vid | at, by, on (also: vast)
-kunde | could
-något | some etc
-från | from, off
-ut | out
-när | when
-efter | after, behind
-upp | up
-vi | we
-dem | them
-vara | be
-vad | what
-över | over
-än | than
-dig | you
-kan | can
-sina | his
-här | here
-ha | have
-mot | towards
-alla | all
-under | under (also: wonder)
-någon | some etc
-eller | or (else)
-allt | all
-mycket | much
-sedan | since
-ju | why
-denna | this/that
-själv | myself, yourself etc
-detta | this/that
-åt | to
-utan | without
-varit | was
-hur | how
-ingen | no
-mitt | my
-ni | you
-bli | to be, become
-blev | from bli
-oss | us
-din | thy
-dessa | these/those
-några | some etc
-deras | their
-blir | from bli
-mina | my
-samma | (the) same
-vilken | who, that
-er | you, your
-sådan | such a
-vår | our
-blivit | from bli
-dess | its
-inom | within
-mellan | between
-sådant | such a
-varför | why
-varje | each
-vilka | who, that
-ditt | thy
-vem | who
-vilket | who, that
-sitta | his
-sådana | such a
-vart | each
-dina | thy
-vars | whose
-vårt | our
-våra | our
-ert | your
-era | your
-vilkas | whose
-
+ | From svn.tartarus.org/snowball/trunk/website/algorithms/swedish/stop.txt
+ | This file is distributed under the BSD License.
+ | See http://snowball.tartarus.org/license.php
+ | Also see http://www.opensource.org/licenses/bsd-license.html
+ | - Encoding was converted to UTF-8.
+ | - This notice was added.
+ |
+ | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+
+ | A Swedish stop word list. Comments begin with vertical bar. Each stop
+ | word is at the start of a line.
+
+ | This is a ranked list (commonest to rarest) of stopwords derived from
+ | a large text sample.
+
+ | Swedish stop words occasionally exhibit homonym clashes. For example
+ | så = so, but also seed. These are indicated clearly below.
+
+och | and
+det | it, this/that
+att | to (with infinitive)
+i | in, at
+en | a
+jag | I
+hon | she
+som | who, that
+han | he
+på | on
+den | it, this/that
+med | with
+var | where, each
+sig | him(self) etc
+för | for
+så | so (also: seed)
+till | to
+är | is
+men | but
+ett | a
+om | if; around, about
+hade | had
+de | they, these/those
+av | of
+icke | not, no
+mig | me
+du | you
+henne | her
+då | then, when
+sin | his
+nu | now
+har | have
+inte | inte någon = no one
+hans | his
+honom | him
+skulle | 'sake'
+hennes | her
+där | there
+min | my
+man | one (pronoun)
+ej | nor
+vid | at, by, on (also: vast)
+kunde | could
+något | some etc
+från | from, off
+ut | out
+när | when
+efter | after, behind
+upp | up
+vi | we
+dem | them
+vara | be
+vad | what
+över | over
+än | than
+dig | you
+kan | can
+sina | his
+här | here
+ha | have
+mot | towards
+alla | all
+under | under (also: wonder)
+någon | some etc
+eller | or (else)
+allt | all
+mycket | much
+sedan | since
+ju | why
+denna | this/that
+själv | myself, yourself etc
+detta | this/that
+åt | to
+utan | without
+varit | was
+hur | how
+ingen | no
+mitt | my
+ni | you
+bli | to be, become
+blev | from bli
+oss | us
+din | thy
+dessa | these/those
+några | some etc
+deras | their
+blir | from bli
+mina | my
+samma | (the) same
+vilken | who, that
+er | you, your
+sådan | such a
+vår | our
+blivit | from bli
+dess | its
+inom | within
+mellan | between
+sådant | such a
+varför | why
+varje | each
+vilka | who, that
+ditt | thy
+vem | who
+vilket | who, that
+sitta | his
+sådana | such a
+vart | each
+dina | thy
+vars | whose
+vårt | our
+våra | our
+ert | your
+era | your
+vilkas | whose
+

Propchange: ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_sv.txt
------------------------------------------------------------------------------
    svn:eol-style = native

Modified: ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_th.txt
URL: http://svn.apache.org/viewvc/ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_th.txt?rev=1781731&r1=1781730&r2=1781731&view=diff
==============================================================================
--- ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_th.txt (original)
+++ ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_th.txt Sun Feb 5 11:09:59 2017
@@ -1,119 +1,119 @@
-# Thai stopwords from:
-# "Opinion Detection in Thai Political News Columns
-# Based on Subjectivity Analysis"
-# Khampol Sukhum, Supot Nitsuwat, and Choochart Haruechaiyasak
-ไว้
-ไม่
-ไป
-ได้
-ให้
-ใน
-โดย
-แห่ง
-แล้ว
-และ
-แรก
-แบบ
-แต่
-เอง
-เห็น
-เลย
-เริ่ม
-เรา
-เมื่อ
-เพื่อ
-เพราะ
-เป็นการ
-เป็น
-เปิดเผย
-เปิด
-เนื่องจาก
-เดียวกัน
-เดียว
-เช่น
-เฉพาะ
-เคย
-เข้า
-เขา
-อีก
-อาจ
-อะไร
-ออก
-อย่าง
-อยู่
-อยาก
-หาก
-หลาย
-หลังจาก
-หลัง
-หรือ
-หนึ่ง
-ส่วน
-ส่ง
-สุด
-สําหรับ
-ว่า
-วัน
-ลง
-ร่วม
-ราย
-รับ
-ระหว่าง
-รวม
-ยัง
-มี
-มาก
-มา
-พร้อม
-พบ
-ผ่าน
-ผล
-บาง
-น่า
-นี้
-นํา
-นั้น
-นัก
-นอกจาก
-ทุก
-ที่สุด
-ที่
-ทําให้
-ทํา
-ทาง
-ทั้งนี้
-ทั้ง
-ถ้า
-ถูก
-ถึง
-ต้อง
-ต่างๆ
-ต่าง
-ต่อ
-ตาม
-ตั้งแต่
-ตั้ง
-ด้าน
-ด้วย
-ดัง
-ซึ่ง
-ช่วง
-จึง
-จาก
-จัด
-จะ
-คือ
-ความ
-ครั้ง
-คง
-ขึ้น
-ของ
-ขอ
-ขณะ
-ก่อน
-ก็
-การ
-กับ
-กัน
-กว่า
-กล่าว
+# Thai stopwords from:
+# "Opinion Detection in Thai Political News Columns
+# Based on Subjectivity Analysis"
+# Khampol Sukhum, Supot Nitsuwat, and Choochart Haruechaiyasak
+ไว้
+ไม่
+ไป
+ได้
+ให้
+ใน
+โดย
+แห่ง
+แล้ว
+และ
+แรก
+แบบ
+แต่
+เอง
+เห็น
+เลย
+เริ่ม
+เรา
+เมื่อ
+เพื่อ
+เพราะ
+เป็นการ
+เป็น
+เปิดเผย
+เปิด
+เนื่องจาก
+เดียวกัน
+เดียว
+เช่น
+เฉพาะ
+เคย
+เข้า
+เขา
+อีก
+อาจ
+อะไร
+ออก
+อย่าง
+อยู่
+อยาก
+หาก
+หลาย
+หลังจาก
+หลัง
+หรือ
+หนึ่ง
+ส่วน
+ส่ง
+สุด
+สําหรับ
+ว่า
+วัน
+ลง
+ร่วม
+ราย
+รับ
+ระหว่าง
+รวม
+ยัง
+มี
+มาก
+มา
+พร้อม
+พบ
+ผ่าน
+ผล
+บาง
+น่า
+นี้
+นํา
+นั้น
+นัก
+นอกจาก
+ทุก
+ที่สุด
+ที่
+ทําให้
+ทํา
+ทาง
+ทั้งนี้
+ทั้ง
+ถ้า
+ถูก
+ถึง
+ต้อง
+ต่างๆ
+ต่าง
+ต่อ
+ตาม
+ตั้งแต่
+ตั้ง
+ด้าน
+ด้วย
+ดัง
+ซึ่ง
+ช่วง
+จึง
+จาก
+จัด
+จะ
+คือ
+ความ
+ครั้ง
+คง
+ขึ้น
+ของ
+ขอ
+ขณะ
+ก่อน
+ก็
+การ
+กับ
+กัน
+กว่า
+กล่าว

Propchange: ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_th.txt
------------------------------------------------------------------------------
    svn:eol-style = native

Modified: ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_tr.txt
URL: http://svn.apache.org/viewvc/ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_tr.txt?rev=1781731&r1=1781730&r2=1781731&view=diff
==============================================================================
--- ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_tr.txt (original)
+++ ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_tr.txt Sun Feb 5 11:09:59 2017
@@ -1,212 +1,212 @@
-# Turkish stopwords from LUCENE-559
-# merged with the list from "Information Retrieval on Turkish Texts"
-# (http://www.users.muohio.edu/canf/papers/JASIST2008offPrint.pdf)
-acaba
-altmış
-altı
-ama
-ancak
-arada
-aslında
-ayrıca
-bana
-bazı
-belki
-ben
-benden
-beni
-benim
-beri
-beş
-bile
-bin
-bir
-birçok
-biri
-birkaç
-birkez
-birşey
-birşeyi
-biz
-bize
-bizden
-bizi
-bizim
-böyle
-böylece
-bu
-buna
-bunda
-bundan
-bunlar
-bunları
-bunların
-bunu
-bunun
-burada
-çok
-çünkü
-da
-daha
-dahi
-de
-defa
-değil
-diğer
-diye
-doksan
-dokuz
-dolayı
-dolayısıyla
-dört
-edecek
-eden
-ederek
-edilecek
-ediliyor
-edilmesi
-ediyor
-eğer
-elli
-en
-etmesi
-etti
-ettiği
-ettiğini
-gibi
-göre
-halen
-hangi
-hatta
-hem
-henüz
-hep
-hepsi
-her
-herhangi
-herkesin
-hiç
-hiçbir
-için
-iki
-ile
-ilgili
-ise
-işte
-itibaren
-itibariyle
-kadar
-karşın
-katrilyon
-kendi
-kendilerine
-kendini
-kendisi
-kendisine
-kendisini
-kez
-ki
-kim
-kimden
-kime
-kimi
-kimse
-kırk
-milyar
-milyon
-mu
-mü
-mı
-nasıl
-ne
-neden
-nedenle
-nerde
-nerede
-nereye
-niye
-niçin
-o
-olan
-olarak
-oldu
-olduğu
-olduğunu
-olduklarını
-olmadı
-olmadığı
-olmak
-olması
-olmayan
-olmaz
-olsa
-olsun
-olup
-olur
-olursa
-oluyor
-on
-ona
-ondan
-onlar
-onlardan
-onları
-onların
-onu
-onun
-otuz
-oysa
-öyle
-pek
-rağmen
-sadece
-sanki
-sekiz
-seksen
-sen
-senden
-seni
-senin
-siz
-sizden
-sizi
-sizin
-şey
-şeyden
-şeyi
-şeyler
-şöyle
-şu
-şuna
-şunda
-şundan
-şunları
-şunu
-tarafından
-trilyon
-tüm
-üç
-üzere
-var
-vardı
-ve
-veya
-ya
-yani
-yapacak
-yapılan
-yapılması
-yapıyor
-yapmak
-yaptı
-yaptığı
-yaptığını
-yaptıkları
-yedi
-yerine
-yetmiş
-yine
-yirmi
-yoksa
-yüz
-zaten
+# Turkish stopwords from LUCENE-559
+# merged with the list from "Information Retrieval on Turkish Texts"
+# (http://www.users.muohio.edu/canf/papers/JASIST2008offPrint.pdf)
+acaba
+altmış
+altı
+ama
+ancak
+arada
+aslında
+ayrıca
+bana
+bazı
+belki
+ben
+benden
+beni
+benim
+beri
+beş
+bile
+bin
+bir
+birçok
+biri
+birkaç
+birkez
+birşey
+birşeyi
+biz
+bize
+bizden
+bizi
+bizim
+böyle
+böylece
+bu
+buna
+bunda
+bundan
+bunlar
+bunları
+bunların
+bunu
+bunun
+burada
+çok
+çünkü
+da
+daha
+dahi
+de
+defa
+değil
+diğer
+diye
+doksan
+dokuz
+dolayı
+dolayısıyla
+dört
+edecek
+eden
+ederek
+edilecek
+ediliyor
+edilmesi
+ediyor
+eğer
+elli
+en
+etmesi
+etti
+ettiği
+ettiğini
+gibi
+göre
+halen
+hangi
+hatta
+hem
+henüz
+hep
+hepsi
+her
+herhangi
+herkesin
+hiç
+hiçbir
+için
+iki
+ile
+ilgili
+ise
+işte
+itibaren
+itibariyle
+kadar
+karşın
+katrilyon
+kendi
+kendilerine
+kendini
+kendisi
+kendisine
+kendisini
+kez
+ki
+kim
+kimden
+kime
+kimi
+kimse
+kırk
+milyar
+milyon
+mu
+mü
+mı
+nasıl
+ne
+neden
+nedenle
+nerde
+nerede
+nereye
+niye
+niçin
+o
+olan
+olarak
+oldu
+olduğu
+olduğunu
+olduklarını
+olmadı
+olmadığı
+olmak
+olması
+olmayan
+olmaz
+olsa
+olsun
+olup
+olur
+olursa
+oluyor
+on
+ona
+ondan
+onlar
+onlardan
+onları
+onların
+onu
+onun
+otuz
+oysa
+öyle
+pek
+rağmen
+sadece
+sanki
+sekiz
+seksen
+sen
+senden
+seni
+senin
+siz
+sizden
+sizi
+sizin
+şey
+şeyden
+şeyi
+şeyler
+şöyle
+şu
+şuna
+şunda
+şundan
+şunları
+şunu
+tarafından
+trilyon
+tüm
+üç
+üzere
+var
+vardı
+ve
+veya
+ya
+yani
+yapacak
+yapılan
+yapılması
+yapıyor
+yapmak
+yaptı
+yaptığı
+yaptığını
+yaptıkları
+yedi
+yerine
+yetmiş
+yine
+yirmi
+yoksa
+yüz
+zaten

Propchange: ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/stopwords_tr.txt
------------------------------------------------------------------------------
    svn:eol-style = native

Modified: ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/userdict_ja.txt
URL: http://svn.apache.org/viewvc/ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/userdict_ja.txt?rev=1781731&r1=1781730&r2=1781731&view=diff
==============================================================================
--- ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/userdict_ja.txt (original)
+++ ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/userdict_ja.txt Sun Feb 5 11:09:59 2017
@@ -1,29 +1,29 @@
-#
-# This is a sample user dictionary for Kuromoji (JapaneseTokenizer)
-#
-# Add entries to this file in order to override the statistical model in terms
-# of segmentation, readings and part-of-speech tags. Notice that entries do
-# not have weights since they are always used when found. This is by-design
-# in order to maximize ease-of-use.
-#
-# Entries are defined using the following CSV format:
-# <text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
-#
-# Notice that a single half-width space separates tokens and readings, and
-# that the number tokens and readings must match exactly.
-#
-# Also notice that multiple entries with the same <text> is undefined.
-#
-# Whitespace only lines are ignored. Comments are not allowed on entry lines.
-#
-
-# Custom segmentation for kanji compounds
-日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
-関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞
-
-# Custom segmentation for compound katakana
-トートバッグ,トート バッグ,トート バッグ,かずカナ名詞
-ショルダーバッグ,ショルダー バッグ,ショルダー バッグ,かずカナ名詞
-
-# Custom reading for former sumo wrestler
-朝青龍,朝青龍,アサショウリュウ,カスタム人名
+#
+# This is a sample user dictionary for Kuromoji (JapaneseTokenizer)
+#
+# Add entries to this file in order to override the statistical model in terms
+# of segmentation, readings and part-of-speech tags. Notice that entries do
+# not have weights since they are always used when found. This is by-design
+# in order to maximize ease-of-use.
+#
+# Entries are defined using the following CSV format:
+# <text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
+#
+# Notice that a single half-width space separates tokens and readings, and
+# that the number tokens and readings must match exactly.
+#
+# Also notice that multiple entries with the same <text> is undefined.
+#
+# Whitespace only lines are ignored. Comments are not allowed on entry lines.
+#
+
+# Custom segmentation for kanji compounds
+日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞
+関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞
+
+# Custom segmentation for compound katakana
+トートバッグ,トート バッグ,トート バッグ,かずカナ名詞
+ショルダーバッグ,ショルダー バッグ,ショルダー バッグ,かずカナ名詞
+
+# Custom reading for former sumo wrestler
+朝青龍,朝青龍,アサショウリュウ,カスタム人名

Propchange: ofbiz/trunk/plugins/solr/home/solrdefault/conf/lang/userdict_ja.txt
------------------------------------------------------------------------------
    svn:eol-style = native
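
The entry format described in the userdict_ja.txt header comments (four comma-separated fields, with tokens and readings each separated by a single half-width space and required to match in count) can be illustrated with a minimal Python sketch. This is not how Kuromoji itself loads the file; the function name parse_userdict_entry and the returned dictionary layout are assumptions made for illustration only.

```python
def parse_userdict_entry(line):
    """Split one user-dictionary line into its four CSV fields.

    Hypothetical helper, not part of Lucene/Solr. Returns None for
    whitespace-only lines and for comment lines starting with '#'.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None

    fields = stripped.split(",")
    if len(fields) != 4:
        raise ValueError("expected 4 comma-separated fields, got %d" % len(fields))

    text, tokens, readings, pos_tag = fields
    # A single half-width space separates tokens and readings.
    token_list = tokens.split(" ")
    reading_list = readings.split(" ")
    # The header states that the two counts must match exactly.
    if len(token_list) != len(reading_list):
        raise ValueError("number of tokens and readings differ")

    return {
        "text": text,
        "tokens": token_list,
        "readings": reading_list,
        "pos": pos_tag,
    }


if __name__ == "__main__":
    sample = "日本経済新聞,日本 経済 新聞,ニホン ケイザイ シンブン,カスタム名詞"
    print(parse_userdict_entry(sample))
```

A check like this could be run over a custom dictionary before deploying it, since a mismatch between token and reading counts is the easiest mistake to make when editing entries by hand.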