Setting Up Multiple Custom Analyzers in ES, Each Using Its Own Dictionary
Date: April 22, 2021

How do you set up custom analyzers in ES so that each analyzer uses a dictionary of its own?
1. First, configure the plugin in ansj.cfg.yml.

Then add the paths of the dictionary files to ansj-library.properties. Keep ansj-library.properties and the library files in the same directory.
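A minimal sketch of the files from step 1. It assumes that each key in ansj-library.properties is a dictionary name that a dic_ansj tokenizer refers to through its "dic" setting (dicxm and dicbm, as used in the requests in this article), and that ansj user dictionaries hold one tab-separated "word, part of speech, frequency" entry per line. The file names library/xm.dic and library/bm.dic, the sample words, their tags, and their frequencies are hypothetical; check them against the plugin version you run.

```shell
#!/usr/bin/env bash
# Sketch only: file names, paths, and dictionary entries are hypothetical.
# The keys dicxm/dicbm match the "dic" values in this article's tokenizer settings.
set -eu

# The library directory sits next to ansj-library.properties.
mkdir -p library

# Assumed ansj user-dictionary format: word<TAB>part-of-speech<TAB>frequency.
# Each analyzer gets a different dictionary file with different words.
printf '网五河\tnr\t1000\n' > library/xm.dic
printf '示例词\tnz\t1000\n' > library/bm.dic

# Map each dictionary key to its file (assumed key=path layout).
cat > ansj-library.properties <<'EOF'
dicxm=library/xm.dic
dicbm=library/bm.dic
EOF
```

With this layout, a tokenizer whose "dic" is dicxm would load only library/xm.dic, and one whose "dic" is dicbm would load only library/bm.dic.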

2. Create an index with a custom analyzer whose dic_ansj tokenizer loads the dicxm dictionary:

curl -XPUT 'http://localhost:9200/fencitest3?pretty' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_xm_analyzer": {
          "type": "custom",
          "tokenizer": "xm_dic"
        }
      },
      "tokenizer": {
        "xm_dic": {
          "type": "dic_ansj",
          "dic": "dicxm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": true,
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        }
      }
    }
  },
  "mappings": {
    "fencitest3": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_xm_analyzer"
        }
      }
    }
  }
}'

Check the result with the _analyze API:

curl -XGET 'http://localhost:9200/fencitest3/_analyze?pretty&analyzer=my_xm_analyzer' -d '网五河是一个名字'

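To define multiple custom analyzers in one index, give each dictionary its own dic_ansj tokenizer and its own custom analyzer, then assign each analyzer to a field (here, title uses dicxm and name uses dicbm):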


curl -XPUT 'http://localhost:9200/fencitest4?pretty' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_xm_analyzer": {
          "type": "custom",
          "tokenizer": "xm_dic"
        },
        "my_bm_analyzer": {
          "type": "custom",
          "tokenizer": "bm_dic"
        }
      },
      "tokenizer": {
        "xm_dic": {
          "type": "dic_ansj",
          "dic": "dicxm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": true,
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        },
        "bm_dic": {
          "type": "dic_ansj",
          "dic": "dicbm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": true,
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        }
      }
    }
  },
  "mappings": {
    "fencitest4": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_xm_analyzer"
        },
        "name": {
          "type": "string",
          "analyzer": "my_bm_analyzer"
        }
      }
    }
  }
}'
