Integrating the 은전한닢 (Eunjeon) morphological analyzer with the PostgreSQL full-text-search module

POSTGRESQL 2018. 12. 10. 17:42
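While looking into full-text search because of a caching problem in the search engine we currently use,
it occurred to me that running a Korean morphological analyzer and its dictionary inside the RDBMS would make operations more efficient,
so I tried hooking the 은전한닢 (Eunjeon) morphological analyzer into the PostgreSQL full-text search module.
For the external C source and the build scripts, I referred to the textsearch_ko project.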

1. Install MECAB-KO
$ yum install gcc-c++ libstdc++ -y
$ git clone https://bitbucket.org/eunjeon/mecab-ko.git
$ cd mecab-ko
$ ./configure 
$ make all && make install
2. Install MECAB-KO-DIC
$ wget https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
$ tar xzf mecab-ko-dic-2.1.1-20180720.tar.gz
$ cd mecab-ko-dic-2.1.1-20180720
$ ./configure
$ make all && make install

3. Verify the morphological analyzer installation

$ echo '아버지가방에들어가신다'|mecab
아버지  NNG,*,F,아버지,*,*,*,*
가      JKS,*,F,가,*,*,*,*
방      NNG,장소,T,방,*,*,*,*
에      JKB,*,F,에,*,*,*,*
들어가  VV,*,F,들어가,*,*,*,*
신다    EP+EC,*,F,신다,Inflect,EP,EC,시/EP/*+ㄴ다/EC/*
EOS
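Each line of this output is a surface form, a tab, and then a comma-separated feature list whose first field is the part-of-speech tag. As a minimal sketch of how that format can be consumed, the helper below keeps only tokens whose tag starts with NN (nouns). The function name `mecab_nouns` is my own and is not part of mecab-ko.

```shell
# Keep only tokens whose POS tag (first comma-separated feature) starts with "NN".
# Reads mecab output on stdin; assumes the default format, with a tab between
# the surface form and the feature list.
mecab_nouns() {
    awk -F'\t' '$1 != "EOS" && $2 ~ /^NN/ { print $1 }'
}

# Usage (requires mecab and mecab-ko-dic to be installed):
#   echo '아버지가방에들어가신다' | mecab | mecab_nouns
```

For the analysis shown above, this filter would keep 아버지 and 방 and drop the particles, the verb stem, and the endings.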
4. Download the extension source from GitHub and compile it
$ git clone https://github.com/i0seph/textsearch_ko.git
$ cd textsearch_ko
$ make USE_PGXS=1
$ make USE_PGXS=1 install
5. Run the function-creation script
postgres=# \i ts_mecab_ko.sql
SET
BEGIN
psql:ts_mecab_ko.sql:12: ERROR:  could not load library "/usr/local/pgsql/lib/ts_mecab_ko.so": libmecab.so.2: cannot open shared object file: No such file or directory
psql:ts_mecab_ko.sql:17: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:22: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:30: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:32: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:41: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:45: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:49: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:55: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:57: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:63: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:69: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:73: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:93: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:98: ERROR:  current transaction is aborted, commands ignored until end of transaction block
psql:ts_mecab_ko.sql:103: ERROR:  current transaction is aborted, commands ignored until end of transaction block
ROLLBACK
# If you hit the libmecab.so.2 "No such file or directory" error, add the library's directory to ld.so.conf and reload the linker cache.
$ ldconfig -p|grep mecab
$ sudo find / -name libmecab.so.2
/usr/local/lib/libmecab.so.2
/home/postgres/soft/mecab-ko/src/.libs/libmecab.so.2
$ sudo vi /etc/ld.so.conf
include ld.so.conf.d/*.conf
/usr/local/lib # added
$ sudo ldconfig # reload /etc/ld.so.conf
$ ldconfig -p|grep mecab
        libmecab.so.2 (libc6,x86-64) => /usr/local/lib/libmecab.so.2
        libmecab.so (libc6,x86-64) => /usr/local/lib/libmecab.so
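Since /etc/ld.so.conf already includes ld.so.conf.d/*.conf, another option is to leave the main file untouched and add a drop-in file instead. The sketch below is only an illustration of that idea: the function name `write_ld_dropin` and the file name mecab.conf are my own assumptions, not something the mecab-ko installer creates.

```shell
# Write a one-line ld.so drop-in file that lists an extra library directory.
# $1: library directory to register
# $2: drop-in file path (optional; the default file name is an assumption)
write_ld_dropin() {
    lib_dir=$1
    conf_file=${2:-/etc/ld.so.conf.d/mecab.conf}
    printf '%s\n' "$lib_dir" > "$conf_file"
}

# Usage, as root:
#   write_ld_dropin /usr/local/lib && ldconfig
#   ldconfig -p | grep mecab
```

Keeping the path in its own file makes it easy to remove later without editing /etc/ld.so.conf again.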
# Run the function-creation script again.
postgres=# \i ts_mecab_ko.sql 
SET
BEGIN
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE TEXT SEARCH PARSER
COMMENT
CREATE FUNCTION
CREATE TEXT SEARCH TEMPLATE
CREATE TEXT SEARCH DICTIONARY
CREATE TEXT SEARCH CONFIGURATION
COMMENT
ALTER TEXT SEARCH CONFIGURATION
ALTER TEXT SEARCH CONFIGURATION
ALTER TEXT SEARCH CONFIGURATION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
COMMIT
6. Verify the created functions
postgres=# select * from mecabko_analyze('아버지가방에들어가신다.');
  word  | type | part1st | partlast | pronounce | conjtype | conjugation | basic | detail | lucene 
--------+------+---------+----------+-----------+----------+-------------+-------+--------+--------
 아버지 | NNG  |         | F        | 아버지    |          |             |       |        | 아버지
 가     | JKS  |         | F        | 가        |          |             |       |        | 가
 방     | NNG  | 장소    | T        | 방        |          |             |       |        | 방
 에     | JKB  |         | F        | 에        |          |             |       |        | 에
 들어가 | VV   |         | F        | 들어가    |          |             |       |        | 들어가
 시     | EP   |         | F        | 시        |          |             |       |        | 
 ㄴ다   | EF   |         | F        | ㄴ다      |          |             |       |        | 
 .      | SF   |         |          | .         |          |             |       |        | .
(8 rows)
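# To make full use of Korean full-text search, change the default_text_search_config parameter to korean (the SET below applies to the current session only). With the default english configuration, the whole sentence is indexed as a single token.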
postgres=# select * from to_tsvector('아버지가방에들어가신다');
        to_tsvector         
----------------------------
 '아버지가방에들어가신다':1
(1 row)

postgres=# show default_text_search_config;
 default_text_search_config 
----------------------------
 pg_catalog.english
(1 row)

postgres=# set default_text_search_config = 'korean';
SET
postgres=# select * from to_tsvector('아버지가방에들어가신다');
         to_tsvector          
------------------------------
 '들어가':3 '방':2 '아버지':1
(1 row)
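To apply this to an actual table, the configuration name can also be passed explicitly to to_tsvector and to_tsquery, so the result does not depend on the session's default_text_search_config. Below is a rough sketch that only prints the SQL. The helper name `korean_fts_sql` and the table, column, and index names are hypothetical, the arguments are pasted into the SQL without any escaping, and I have not benchmarked this against the korean configuration.

```shell
# Print SQL that builds a GIN expression index over to_tsvector('korean', <column>)
# and runs one search against it. All object names are hypothetical placeholders,
# and the arguments are interpolated as-is (no quoting or escaping).
korean_fts_sql() {
    table=$1
    column=$2
    query=$3
    cat <<SQL
CREATE INDEX IF NOT EXISTS ${table}_${column}_fts_idx
    ON ${table} USING gin (to_tsvector('korean', ${column}));
SELECT * FROM ${table}
 WHERE to_tsvector('korean', ${column}) @@ to_tsquery('korean', '${query}');
SQL
}

# Usage (assumes a table "docs" with a text column "body" already exists):
#   korean_fts_sql docs body '아버지' | psql -d postgres
```

The WHERE clause uses the same expression as the index, which is what lets the planner consider the GIN index for the search.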


I plan to write up more detailed test results in a later post.