Solr6.3 配置整理

根据网上的资料整理如下，对solr的入门配置很有帮助。（6.3版本）

数据源配置

支持MySQL

Solr里关于MySQL的配置主要是两个文件：data-config.xml和managed-schema
data-config.xml

<dataConfig>
 <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/test" user="root" password="******" />
   <document>
     <entity name="goods" query="select * from goods" >
       <field column="id" name="id"/>
       <field column="name" name="name"/>
       <field column="number" name="number"/>
       <field column="updateTime" name="updateTime"/>
     </entity>
   </document>
</dataConfig>

其中source1为数据源自定义名称，没什么约束，type这是固定值，表示JDBC数据源，后面的driver表示JDBC驱动类，这与具体使用的数据库有关，url即JDBC链接URL,后面的user，password分别表示链接数据库的账号密码，下面的entity映射有点类似hiberante或mybatis中的mapping映射，column即数据库表的列名称，name即managed-schema中定义的域名称
managed-schema

<field name="id" type="int" indexed="true" stored="true" required="true" omitNorms="true" />
<field name="name" type="string" indexed="true" stored="true" omitNorms="true"/>
<field name="number" type="int" indexed="true" stored="true" omitNorms="true"/>
<field name="updateTime" type="date" indexed="true" stored="true" omitNorms="true"/>

支持Oracle

Oracle的配置与MySQL类似，唯一的区别是在配置dataSource的时候要修改driver和url的属性如下：

<dataSource driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@//ip:port/database" user="****" password="*****" />

支持word、excel、ppt、pdf、txt等文件

注意，txt文件编码请保证是UTF-8编码，否则无法索引文件内容
data-config.xml配置

<dataConfig>
 <dataSource type="FileDataSource" name="fileDataSource"/>
 <dataSource name="binary" type="BinFileDataSource" />
 <document>
   <entity name="files" dataSource="null" rootEntity="false" processor="FileListEntityProcessor"      baseDir="E:/Docs/搜索引擎/source/"
     fileName=".*.*" onError="skip" recursive="true">
     <field column="fileAbsolutePath" name="filePath"/>
     <field column="file" name="fileName"/>
     <field column="fileSize" name="size"/>
     <field column="fileLastModified" name="lastModified"/>
     <entity name="tika" processor="TikaEntityProcessor"
       dataSource="binary"
       url="${files.fileAbsolutePath}" format="text">
       <field column="Author" name="author" meta="true"/>
       <field column="text" name="text"/>
     </entity>
   </entity>
 </document>
</dataConfig>

baseDir表示目标文件夹的路径，fileName支持使用正则表达式来过滤一些baseDir文件夹下不想被索引的文件，processor是用来生成Entity的处理器，而不同Entity默认会生成不同的Field域。FileListEntityProcessor处理器会根据指定的文件夹生成多个Entity,且生成的Entity会包含fileAbsolutePath, fileSize, fileLastModified, fileName这几个域，recursive表示是否递归查找子目录下的文件，onError表示当出现异常时是否跳过这个条件不处理。
managed-schema配置

<field name="author" type="text_general" indexed="true" stored="true" omitNorms="true"/>
<field name="text" type="textComplex" indexed="true" stored="true" omitNorms="false"/>
<field name="filePath" type="textComplex" indexed="true" stored="true" omitNorms="true"/>
<field name="fileName" type="textComplex" indexed="true" stored="true" omitNorms="true"/>
<field name="size" type="textComplex" indexed="true" stored="true" omitNorms="true"/>
<field name="lastModified" type="date" indexed="true" stored="true" omitNorms="true"/>

混合支持数据库和文档

配置方法与上面类似，只需把所有的entity的配置放在一个document下，并设置好相应的dataSource即可。
data-config.xml配置

<dataConfig>
 <dataSource type="FileDataSource" name="fileDataSource"/>
 <dataSource name="binary" type="BinFileDataSource" />
 <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/test" user="root" password="mysql" />
 <document>
   <entity name="files" dataSource="null" rootEntity="false" processor="FileListEntityProcessor"     baseDir="E:/Docs/搜索引擎/source/"
    fileName=".*.*" onError="skip" recursive="true">
   <field column="fileAbsolutePath" name="filePath"/>
   <field column="file" name="fileName"/>
   <field column="fileSize" name="size"/>
   <field column="fileLastModified" name="lastModified"/>
   <entity name="tika" processor="TikaEntityProcessor"
    dataSource="binary"
    url="${files.fileAbsolutePath}" format="text">
     <field column="Author" name="author" meta="true"/>
     <field column="title" name="title" meta="true"/>
     <field column="text" name="text"/>
   </entity>
   </entity>
   <entity name="goods" dataSource="source1" query="select * from goods" >
     <field column="id" name="fileName"/>
     <field column="name" name="author"/>
     <field column="number" name="size"/>
     <field column="updateTime" name="lastModified"/>
   </entity>
 </document>
</dataConfig>

索引

索引的增量更新

如果数据库中有新的数据插入，可以通过索引的增量更新查询到最新的数据。data-config.xml的配置如下：

<dataConfig>
 <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/test" user="root" password="******" />
 <document>
   <entity name="goods" pk="id" dataSource="source1"
     query="select * from goods where isDelete=0"
     deltaImportQuery="select * from goods where id='${dih.delta.id}'"
           deletedPkQuery="select id from goods where isDelete=1"

      deltaQuery="select id from goods where updateTime> '${dataimporter.last_index_time}'">
     <field column="id" name="id"/>
     <field column="name" name="name"/>
     <field column="number" name="number"/>
     <field column="updateTime" name="updateTime"/>
   </entity>
 </document>
</dataConfig>

参数说明：

pk="ID" ，增量索引查询主键ID时需要
query：用于全量导入而非增量导入
query="select * from goods WHERE isdelete=0
query查询是指查询出表里所有的符合条件的数据，因为有删除业务，所以
where后面有一个限定条件isdelete=0，意思为查询未被删除的数据
deltaQuery：用于增量导入且只返回ID
deltaQuery="select ID from goods where updateTime > '${dih.last_index_time}'"
deltaQuery的意思是，查询出所有经过修改的记录的ID，可能是修改操作，添加操作，删除操作产生的
deletedPkQuery : 用于增量导入且只返回ID
deletedPkQuery="select ID from goods where isdelete=1"
此操作只查询那些数据库里伪删除的数据的ID（即isdelete标识为1的数据）
solr通过它来删除索引里面对应的数据
deltaImportQuery：增量导入起作用，可以返回多个字段的值,一般情况下，都是返回所有字段的列
deltaImportQuery="select * from goods where ID='${dih.delta.ID}'"
deltaImportQuery查询是获取以上两步的ID，然后把其全部数据获取，根据获取的数据
对索引库进行更新操作，可能是删除，添加，修改

其中内置变量${dataimporter.last_index_time}用来记录最后一次索引的时间(包括全量与增量)，${dih.delta.id}记录本次要索引的id，并保存在core/conf/dataimport.properties文件中，该文件为自动创建。
注：该增量更新机制要求数据库表中存在updateTime字段，即记录每条数据插入的时间。如果需要增量删除机制，还需要在数据表中添加isDelete字段，用于标识记录是否需要删除。

索引定时更新机制

索引更新的配置文件dataimport.properties位于solr_home/conf目录下，具体的参数说明如下：

################################################
# dataimport scheduler properties #
#################################################

# to sync or not to sync
# 1 - active; anything else - inactive
syncEnabled=1

# which cores to schedule
# in a multi-core environment you can decide which cores you want syncronized
# leave empty or comment it out if using single-core deployment
# 多个core之间用逗号隔开
syncCores=file

# solr server name or IP address
# [defaults to localhost if empty]
server=localhost

# solr server port
# [defaults to 80 if empty]
port=8080

# application name/context
# [defaults to current ServletContextListener's context (app) name]
webapp=solr

# URL params [mandatory]
# remainder of URL
#增量
params=/dataimport?command=delta-import&clean=false&commit=true&optimize=false&wt=json&indent=true&verbose=false&debug=false

# schedule interval
# number of minutes between two runs
# [defaults to 30 if empty]
interval=2

# 重做索引的时间间隔，单位分钟，默认7200，即1天;
# 为空,为0,或者注释掉:表示永不重做索引
reBuildIndexInterval=7200

# 重做索引的参数
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true&optimize=true&wt=json&indent=true&verbose=false&debug=false

# 重做索引时间间隔的计时开始时间，第一次真正执行的时间=reBuildIndexBeginTime+reBuildIndexInterval*60*1000；
# 两种格式：2012-04-11 03:10:00 或者 03:10:00，后一种会自动补全日期部分为服务启动时的日期
reBuildIndexBeginTime=15:10:00

搜索

关键词高亮、返回关键词的上下文

设置参数hl=on开启高亮，并可通过调整hl.fragsize的大小来设置返回上下文的字符数
例：http://localhost:8080/solr/file/select?hl=on&indent=on&q=solr&wt=json&hl.fragsize=20
hl其他参数说明：

hl.fl: 用空格或逗号隔开的字段列表。要启用某个字段的highlight功能，就得保证该字段在schema中是stored。如果该参数未被给出，那么就会高亮默认字段。可以使用星号高亮所有字段。如果使用了通配符，那么要考虑启用hl.requiredFieldMatch选项。
hl.requireFieldMatch: 如果置为true，除非该字段的查询结果不为空才会被高亮。它的默认值是false，意味着它可能匹配某个字段却高亮一个不同的字段。如果hl.fl使用了通配符，那么就要启用该参数。尽管如此，如果查询是all字段（可能是使用copy-field 指令），那么还是把它设为false，这样搜索结果能表明哪个字段的查询文本未被找到。
hl.simple.pre和hl.simple.post: 高亮内容与关键匹配的地方，默认将会被"<em>"和"</em>"包围。可以使用hl.simple.pre和hl.simple.post参数设置前后标签。

排序

使用参数sort，后面加上要排序的字段和顺序(desc或asc)
例：http://localhost:8080/solr/file/select?indent=on&q=:&wt=json&sort=lastModified desc或asc

分页

配合使用参数start和rows实现分页。

start: 指定第一条记录在所有记录的位置。
rows: 一次性取出多少条数据。
start=20&rows=10表示取20到30之间的数据

最后编辑于：2017.12.05 02:30:43

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 215,923评论 6赞 498
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 92,154评论 3赞 392
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 161,775评论 0赞 351
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 57,960评论 1赞 290
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 66,976评论 6赞 388
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 50,972评论 1赞 295
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 39,893评论 3赞 416
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 38,709评论 0赞 271
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 45,159评论 1赞 308
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 37,400评论 2赞 331
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 39,552评论 1赞 346
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 35,265评论 5赞 341
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 40,876评论 3赞 325
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 31,528评论 0赞 21
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 32,701评论 1赞 268
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 47,552评论 2赞 368
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 44,451评论 2赞 352