Installing Solr - カブトボーグを写経するblog

Download Tomcat
Extract it to C:\Lucene\tomcat
Download Solr

http://lucene.apache.org/solr/

Extract apache-solr-1.3.0\example\webapps\solr.war
to C:\Lucene\tomcat\webapps\solr
Copy apache-solr-1.3.0\example\solr\bin and apache-solr-1.3.0\example\solr\conf into C:\Lucene\tomcat\webapps\solr

Add an environment variable
CATALINA_HOME
C:\Lucene\tomcat
Add the following two lines at the top of startup.bat

set CATALINA_OPTS=-Dsen.home=D:\Software\Java\sen -Xmx512M
set JAVA_OPTS="-Dsolr.solr.home=C:\Lucene\tomcat\webapps\solr"

Run Tomcat's startup.bat
Open
http://localhost:8080/solr/admin/analysis.jsp
Set Field to type and enter text
Enter some alphanumeric text in Field value and click Analyze
If Index Analyzer and the split-up tokens are displayed, it's OK

That completes the English setup

Next up, the Japanese setup.
For this, the field type is pointed at the Japanese analyzer, JapaneseAnalyzer.
Copy lucene-ja.jar and sen.jar into C:\Lucene\tomcat\webapps\solr\WEB-INF\lib
C:\Lucene\tomcat\webapps\solr\conf\solrconfig.xml
Change it as follows

<dataDir>C:\Lucene\solr</dataDir>

In C:\Lucene\tomcat\webapps\solr\conf\schema.xml, find the following

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Change it to the following

    <fieldType name="text" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.ja.JapaneseAnalyzer" />
    </fieldType>
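
The same edit could also be scripted instead of done by hand. A minimal sketch, assuming Python with only the standard library; the SCHEMA string is a trimmed-down stand-in rather than the real example schema.xml, and use_single_analyzer is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for schema.xml (not the full example file).
SCHEMA = """<schema name="example" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>"""

ANALYZER_CLASS = "org.apache.lucene.analysis.ja.JapaneseAnalyzer"


def use_single_analyzer(schema_xml, type_name, analyzer_class):
    """Replace the analyzer chains of one fieldType with a single analyzer class."""
    root = ET.fromstring(schema_xml)
    for field_type in root.iter("fieldType"):
        if field_type.get("name") != type_name:
            continue
        # Drop the existing index/query analyzer elements.
        for child in list(field_type):
            field_type.remove(child)
        # The rewritten fieldType in the post has no positionIncrementGap.
        field_type.attrib.pop("positionIncrementGap", None)
        ET.SubElement(field_type, "analyzer", {"class": analyzer_class})
    return ET.tostring(root, encoding="unicode")


updated = use_single_analyzer(SCHEMA, "text", ANALYZER_CLASS)
print(updated)
```

One caveat: ElementTree drops XML comments when parsing by default, so on the real schema.xml, which is full of comments, editing by hand is probably the safer route.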

Create C:\Lucene\tomcat\conf\Catalina\localhost\solr.xml as follows

<Context docBase="solr_first" debug="0" crossContext="true" >
<Environment name="solr/home" type="java.lang.String" value="C:\Lucene\tomcat\webapps\solr" override="true" />
</Context>

Starting Tomcat at this point

gives an error saying solrconfig.xml can't be read

Even though the home is set...
Try setting the home one more way
Add the following at the top of startup.bat

set JAVA_OPTS="-Dsolr.solr.home=C:\Lucene\tomcat\webapps\solr"

Starting Tomcat now gives

SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.lucene.analysis.ja.JapaneseAnalyzer'

this error...
lucene-ja.jar is definitely copied into the lib folder, though
Tried setting CATALINA_HOME to C:\Lucene\tomcat
Unpacked the war, put sen.jar and lucene-ja.jar into the lib inside it, zipped it back up and redeployed it, and the error went away!
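
The unpack / add jars / re-zip step can be sketched as a script. This is only a sketch, assuming Python's standard zipfile module; add_jars_to_war is a made-up helper name, and it writes a new war rather than modifying the original in place.

```python
import os
import zipfile


def add_jars_to_war(war_path, jar_paths, out_path):
    """Copy a war to out_path, adding the given jars under WEB-INF/lib/."""
    with zipfile.ZipFile(war_path) as src, \
            zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        existing = set(src.namelist())
        # Copy every entry of the original war unchanged.
        for info in src.infolist():
            dst.writestr(info, src.read(info.filename))
        # Add each jar unless an entry with that name is already present.
        for jar in jar_paths:
            arcname = "WEB-INF/lib/" + os.path.basename(jar)
            if arcname not in existing:
                dst.write(jar, arcname)


# Example call (paths are placeholders, adjust to the local layout):
# add_jars_to_war("solr.war", ["lucene-ja.jar", "sen.jar"], "solr-ja.war")
```

Entry names inside a war always use forward slashes, which is why the jar path is built with "WEB-INF/lib/" rather than a Windows-style path.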

Will it work?

Run Tomcat's startup.bat
Open
http://localhost:8080/solr/admin/analysis.jsp
Set Field to type and enter text
Enter some Japanese text in Field value and click Analyze
If Index Analyzer and the split-up tokens are displayed, it's OK
It works!!!!!!!!
Short break
Back at it

Try a search.
http://localhost:8080/solr/admin/
The search results come back garbled

C:\Lucene\tomcat\conf\server.xml
Add URIEncoding="UTF-8" useBodyEncodingForURI="true" as shown below

    <Connector port="8080" protocol="HTTP/1.1" 
               connectionTimeout="20000" 
               redirectPort="8443" URIEncoding="UTF-8" useBodyEncodingForURI="true"/>
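
Why URIEncoding matters can be shown with a small experiment. This is a sketch in Python, not anything Tomcat runs. It assumes the browser sends the query as percent-encoded UTF-8, and that without URIEncoding the connector decodes it as ISO-8859-1, which was the default in Tomcat versions of that era.

```python
from urllib.parse import quote, unquote

query = "日本語"

# What a browser puts in the URL: the UTF-8 bytes, percent-encoded.
encoded = quote(query)
print(encoded)

# Decoding with the right charset gives the original text back.
decoded_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 turns each byte into its own
# character, which is the kind of garbling seen in the search results.
decoded_latin1 = unquote(encoded, encoding="iso-8859-1")

print(decoded_utf8 == query)
print(decoded_latin1 == query)
print(len(query), len(decoded_latin1))
```

Three Japanese characters become nine unrelated ones under the wrong charset, so the query never matches anything in the index.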

In C:\Lucene\tomcat\webapps\solr\WEB-INF\web.xml,
edit the filter section as below (paste this above the filter that is already there)

    <filter>
        <filter-name>encodingfilter</filter-name>
        <filter-class>filters.SetCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>encodingfilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

Put
http://www.javaroad.jp/opensource/SetCharacterEncodingFilter.java
http://www.javaroad.jp/opensource/SetCharacterEncodingFilter.class
in C:\Lucene\tomcat\webapps\solr\WEB-INF\classes\filters
Create the classes folder and the filters folder yourself
Finally, zip up C:\Lucene\tomcat\webapps\solr, change the extension to war, and place it as C:\Lucene\tomcat\webapps\solr.war
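
The zip-and-rename step can also be scripted. A rough sketch, assuming Python's standard zipfile module; make_war is a made-up helper name. One thing to watch either way: a war needs WEB-INF at the root of the archive, so the contents of the folder get zipped, not the folder itself.

```python
import os
import zipfile


def make_war(webapp_dir, war_path):
    """Zip the contents of webapp_dir into war_path, with WEB-INF at the root.

    war_path should live outside webapp_dir, otherwise the half-written
    archive could be picked up while walking the folder.
    """
    with zipfile.ZipFile(war_path, "w", zipfile.ZIP_DEFLATED) as war:
        for folder, _subdirs, files in os.walk(webapp_dir):
            for name in files:
                full = os.path.join(folder, name)
                # Entry names are relative to the webapp root, with forward slashes.
                rel = os.path.relpath(full, webapp_dir)
                war.write(full, rel.replace(os.sep, "/"))


# Example call (paths are placeholders, adjust to the local layout):
# make_war(r"C:\Lucene\tomcat\webapps\solr", r"C:\Lucene\solr.war")
```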

The garbling is fixed!!!
Same as before, it looks like things have to go inside the war

http://localhost:8080/solr/admin/
Enter some Japanese text and click Search
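
The same search can be sent without the admin page. A minimal sketch, assuming Python and the standard /solr/select handler that the admin form posts to; build_select_url and search are made-up helper names, and search only works while Tomcat is running.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed location of the select handler for the setup in this post.
SOLR_SELECT = "http://localhost:8080/solr/select"


def build_select_url(query, rows=10):
    """Build a select URL; urlencode percent-encodes the query as UTF-8."""
    params = {"q": query, "rows": rows}
    return SOLR_SELECT + "?" + urlencode(params)


def search(query, rows=10):
    """Send the query to Solr and return the raw response body as text."""
    with urlopen(build_select_url(query, rows)) as resp:
        return resp.read().decode("utf-8")


print(build_select_url("日本語"))
```

Pasting the printed URL into a browser is a quick way to check whether the URIEncoding setting is actually in effect.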

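Confirmed that it runs, for now
Next, set it up to search an index that was built with Lucene
The folder is set correctly and searching from Luke returns results, but searching from Solr doesn't work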

numDocs: 27864
maxDoc: 27864
With numbers like these, the index is being read, I guess?

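Stuck
No idea at all what the cause is...
Guess: schema.xml is the problem
Come to think of it, schema.xml and the rest were taken straight from the example folder
So of course they need changes

Wrote it like this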

 <fields>
   <field name="path" type="text" indexed="true" stored="true" />
   <field name="contents" type="text" indexed="true" stored="true" />
   ... (other fields)
 </fields>
... (other settings)
 <uniqueKey>path</uniqueKey>
 <defaultSearchField>contents</defaultSearchField>
... (other settings)

The field entries were added
uniqueKey and defaultSearchField were changed
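
A quick way to catch this kind of mismatch is to compare the field names in the Lucene index (as Luke shows them) against schema.xml. A sketch under assumptions: Python standard library only, check_schema is a made-up helper, and both schema strings are trimmed-down stand-ins, not the real files.

```python
import xml.etree.ElementTree as ET


def check_schema(schema_xml, index_fields):
    """List mismatches between the fields in an index and a schema.xml."""
    root = ET.fromstring(schema_xml)
    declared = {f.get("name") for f in root.iter("field")}
    problems = []
    for name in index_fields:
        if name not in declared:
            problems.append("field not declared: " + name)
    for tag in ("uniqueKey", "defaultSearchField"):
        value = (root.findtext(tag) or "").strip()
        if value not in declared:
            problems.append(tag + " points at undeclared field: " + repr(value))
    return problems


# Stand-in for a schema copied from the example folder (assumed field names).
EXAMPLE_STYLE = """<schema name="example" version="1.1">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="text" type="text" indexed="true" stored="false"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>text</defaultSearchField>
</schema>"""

# Stand-in for the schema after the changes described above.
FIXED = """<schema name="example" version="1.1">
  <fields>
    <field name="path" type="text" indexed="true" stored="true"/>
    <field name="contents" type="text" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>path</uniqueKey>
  <defaultSearchField>contents</defaultSearchField>
</schema>"""

index_fields = ["path", "contents"]  # field names used by the Lucene index
print(check_schema(EXAMPLE_STYLE, index_fields))
print(check_schema(FIXED, index_fields))
```

If the default search field does not exist in the index, a query without an explicit field name has nothing to match, which would fit the symptom of numDocs being correct while searches return nothing.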

Got it all the way to working on intuition alone, without reading the official docs lol

Lastly, a note for reference

hadoopとかsolrとかの実験 - Solrのインストール - myfinder -redMine-
- http://repos.myfinder.jp/wiki/hadoop-and-lucene/Solr%E3%81%AE%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB