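1. mnoGoSearch is a web search engine from the same family as DataparkSearch (dpsearch): the two share a code base, DataparkSearch having started as a fork of mnoGoSearch. mnoGoSearch integrates well with PHP.
Below are the mnoGoSearch installation steps as posted on the mnoGoSearch board. They are fairly complete; if a dependency is still missing, install it with apt-get install XXX.
Original post: http://www.mnogosearch.org/board/message.php?id=19573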
Posted by Kostas Paganelis 2007-09-07 13:37:17
install mnogosearch-3.3.4 with PHP extension module
if they aren't already installed
sudo apt-get install apache2-mpm-prefork
and
sudo apt-get install mysql-server
1. sudo apt-get install zlib1g-dev
2. sudo apt-get install libmysqlclient15-dev
inside mnogosearch-3.x.x directory
3. ./install.pl (build shared libraries - default settings)
or
./configure --prefix=/usr/local/mnogosearch --bindir=/usr/local/mnogosearch/bin --sbindir=/usr/local/mnogosearch/sbin --sysconfdir=/usr/local/mnogosearch/etc --localstatedir=/usr/local/mnogosearch/var --libdir=/usr/local/mnogosearch/lib --includedir=/usr/local/mnogosearch/include --mandir=/usr/local/mnogosearch/man --enable-shared --enable-static --enable-syslog --without-docs --enable-pthreads --disable-dmalloc --enable-parser --enable-mp3 --enable-file --enable-http --enable-ftp --enable-htdb --enable-news --with-mysql
4. make
5. sudo make install
Install PHP with mnogosearch support
the packages below may already be installed. if not, install them
x. sudo apt-get install build-essential flex
x. sudo apt-get install libxml2-dev
x. sudo apt-get install g++
x. sudo apt-get install apache2-prefork-dev
6. ./configure \
--disable-debug \
--disable-rpath \
--enable-bcmath \
--enable-calendar \
--enable-maintainer-zts \
--enable-embed=shared \
--enable-force-cgi-redirect \
--enable-ftp \
--enable-inline-optimization \
--enable-magic-quotes \
--enable-memory-limit \
--enable-pic \
--enable-safe-mode \
--enable-sockets \
--enable-track-vars \
--enable-trans-sid \
--enable-wddx \
--with-db \
--with-regex=system \
--with-pear \
--with-xml \
--with-xmlrpc \
--with-zlib \
--with-mysql=/usr \
--with-gd \
--enable-mbstring \
--with-apxs2=/usr/bin/apxs2 \
--with-mnogosearch
7. make
the step below is necessary just so that httpd.conf has at least one line (httpd.conf must not be empty)
8. sudo gedit /etc/apache2/httpd.conf and write inside "LoadModule mod_xmlent /usr/lib/apache2/modules/mod_xmlent.so". after the installation you can remove the line
9. sudo make install
10. sudo cp php.ini-dist /usr/local/lib/php.ini
11. sudo gedit /etc/apache2/mods-enabled/php5.load and write inside "LoadModule php5_module modules/libphp5.so"
12. sudo gedit /etc/apache2/mods-enabled/php5.conf and write inside "<IfModule php5_module>
AddType application/x-httpd-php .php
AddType application/x-httpd-php-source .phps
</IfModule>"
13. sudo gedit /usr/local/lib/php.ini and set the parameters you want
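Steps 11 and 12 can also be done without an editor. The following is only a sketch: the function name write_php5_conf is made up for this example, and the file contents are copied from the two steps above (with lowercase php5 names).

```shell
#!/bin/sh
# Write the two Apache snippets from steps 11 and 12 into the directory
# given as $1 (in the setup described above: /etc/apache2/mods-enabled).
# write_php5_conf is a hypothetical helper name, not part of Apache or PHP.
write_php5_conf() {
    dir=$1
    printf '%s\n' 'LoadModule php5_module modules/libphp5.so' > "$dir/php5.load"
    cat > "$dir/php5.conf" <<'EOF'
<IfModule php5_module>
AddType application/x-httpd-php .php
AddType application/x-httpd-php-source .phps
</IfModule>
EOF
}

# Usage (as root, because of the target directory):
#   write_php5_conf /etc/apache2/mods-enabled
```

Passing the directory as an argument makes it possible to try the function on a scratch directory first and inspect the two files before touching /etc/apache2.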
create database and user for mnogosearch
edit mnogosearch/etc/indexer.conf (define at least Server and DBAddr)
copy mnogosearch/etc/stopwords.conf-dist mnogosearch/etc/stopwords.conf
sudo cp mnogosearch/etc/langmap.conf-dist mnogosearch/etc/langmap.conf
run mnogosearch/sbin/indexer -Ecreate in order to create the database structure
run the indexer
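The "create database and user" step is not spelled out in the post. A minimal sketch, assuming MySQL and using placeholder names (database mnogosearch, user mnogo, password change-me) that do not come from the original:

```shell
#!/bin/sh
# Print SQL that creates a database and a dedicated user for mnoGoSearch.
# The database name, user name and password are placeholders (assumptions).
mnogo_setup_sql() {
    cat <<'EOF'
CREATE DATABASE IF NOT EXISTS mnogosearch CHARACTER SET utf8;
CREATE USER 'mnogo'@'localhost' IDENTIFIED BY 'change-me';
GRANT ALL PRIVILEGES ON mnogosearch.* TO 'mnogo'@'localhost';
EOF
}

# Usage, with the install prefix from step 3:
#   mnogo_setup_sql | mysql -u root -p
#   /usr/local/mnogosearch/sbin/indexer -Ecreate   # create the tables
#   /usr/local/mnogosearch/sbin/indexer            # crawl and index
```

With these placeholders the matching line in indexer.conf would be DBAddr mysql://mnogo:change-me@localhost/mnogosearch/.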
make the PHP extension module
the step below may not be needed
x. sudo apt-get install autoconf
14. run phpize in extension module directory (1.96)
the step below may not be needed
x. sudo apt-get install re2c
15. ./configure --with-mnogosearch
16. make
17. then you have mnogosearch.so and mnogosearch.la in the modules directory. move them to your PHP extension directory (look at the extension_dir value in your php.ini)
sudo cp modules/mnogosearch.so /usr/local/lib/php/extensions/
18. edit php.ini
19. add the line: extension = mnogosearch.so in the extension section
- restart apache
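To check that the extension is actually loaded, the module list printed by "php -m" can be searched. This is a sketch with a made-up helper name (has_module); it assumes the command-line php binary reads the same php.ini as the Apache module, which may not hold on every setup.

```shell
#!/bin/sh
# Succeed if the module name in $1 appears, as a whole line and ignoring
# case, in a list of module names read from standard input (the format
# printed by "php -m"). has_module is a hypothetical helper name.
has_module() {
    grep -qixF -- "$1"
}

# Usage:
#   php -m | has_module mnogosearch && echo "mnogosearch extension loaded"
```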
after that you configure the file search.htm from the mnogosearch-php-3.2.11 in any way you want.
At first it gave me no results, but when I commented out most of the search options I got results normally (e.g. categories etc.; I hadn't configured the indexer to index the pages by categories or tags).
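mnoGoSearch configuration notes:
Note: mnoGoSearch is somewhat simpler to configure than dpsearch; a single config file (indexer.conf) is all that needs editing.
(1) the DBAddr part;
(2) the Document sections part;
(3) the Server part.
Also make sure the charset of the database and the charset in the config file match.
This part is the section that configures the database (DB):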
###########################################################################
# DBAddr <URL-style database description>
# Options (type, host, database name, port, user and password)
# to connect to SQL database.
# Should be used before any other commands.
# Has global effect for whole config file.
# Format:
#DBAddr <DBType>:[//[DBUser[:DBPass]@]DBHost[:DBPort]]/DBName/[?dbmode=mode]
#
# ODBC notes:
# Use DBName to specify ODBC data source name (DSN)
# DBHost does not matter, use "localhost".
#
# Currently supported DBType values are
# mysql, pgsql, mssql, oracle, ibase, db2, mimer, sqlite.
#
# MySQL users can specify path to Unix socket when connecting to localhost:
# mysql://foo:bar@localhost/mnogosearch/?socket=/tmp/mysql.sock
#
# If you are using PostgreSQL and do not specify hostname,
# e.g. pgsql://user:password@/dbname/
# then PostgreSQL will not work via TCP, but will use Unix socket.
#
# You may also select database mode of word storage.
# When "single" is specified, all words are stored in the same table.
# If "multi" is selected, words will be located in different tables.
# "multi" mode is usually faster but requires more tables.
# Default mode is "single".
# DBAddr mysql://root:123456@localhost/mnogosearch/?dbmode=blob
RemoteCharset utf-8
DBAddr mysql://root:123456@localhost/test1/?dbmode=single&setnames=utf8
// The database whose content you want to search; this line is only needed when doing a DB (htdb) search
HTDBAddr mysql://root:123456@localhost/test2/?dbmode=single&setnames=utf8
// The settings below are needed when using DB search
HTDBList "SELECT ID FROM tablename WHERE status = 'y' AND (tag <> '' OR name <> '' OR description <> '')"
HTDBDoc "SELECT name,tag,description FROM tablename WHERE status = 'y' AND ID = $2 AND (tag <> '' OR name <> '' OR description <> '')"
Server htdb:/dbName/
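The DBAddr and HTDBAddr values above follow the URL format given in the comment block. As an illustration only, a small shell helper (the name make_dbaddr is made up here) can assemble such a URL from its parts; it does no escaping, so a password containing characters such as "@" or "/" would need extra care.

```shell
#!/bin/sh
# Build a DBAddr URL of the documented form
#   <DBType>://DBUser:DBPass@DBHost[:DBPort]/DBName/?dbmode=mode
# make_dbaddr is a hypothetical helper; pass "" as the port to omit it.
make_dbaddr() {
    dbtype=$1 user=$2 pass=$3 host=$4 port=$5 dbname=$6 mode=$7
    hostport=$host
    # Append the port only when one was given.
    if [ -n "$port" ]; then
        hostport="$host:$port"
    fi
    printf '%s://%s:%s@%s/%s/?dbmode=%s\n' \
        "$dbtype" "$user" "$pass" "$hostport" "$dbname" "$mode"
}

# Same values as the DBAddr line above, without the setnames option:
make_dbaddr mysql root 123456 localhost "" test1 single
```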
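// When crawling pages you need to configure sections, i.e. which parts of a page get indexed; you can also define your own sections (see the notes at the bottom)
Note: htdb search needs this part configured as well.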
#######################################################################
# Document sections.
#
# Format is:
#
# Section <string> <number> <maxlen> [clone] [sep] [{expr} {repl}]
#
# where <string> is a section name and <number> is section ID
# between 0 and 255. Use 0 if you don't want to index some of
# these sections. It is better to use different sections IDs
# for different documents parts. In this case during search
# time you'll be able to give different weight to each part
# or even disallow some sections at a search time.
# <maxlen> argument contains a maximum length of section
# which will be stored in database.
# "clone" is an optional parameter describing whether this
# section should affect clone detection. It can
# be "DetectClone" or "cdon", or "NoDetectClone" or "cdoff".
# By default, url.* section values are not taken in account
# for clone detection,while any other sections take part
# in clone detection.
# "sep" is an optional argument to specify a separator between
# parts of the same section. It is a space character by default.
# "expr" and "repl" can be used to extract user defined sections,
# for example pieces of text between the given tags. "expr" is
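# a regular expression, "repl" is a replacement with $1, $2, etc
# meta-characters designating matches "expr" matches.
# Standard HTML sections: body, title
// body and title exist on every page; they are the standard page sections.
Section body 1 256
Section title 2 128
// For htdb search, add the fields returned by the query instead (and comment out the other sections such as body and title), for example as written below:
tag, name and description are the fields that will be searchable.
NoSupported has its <number> set to 0, which means it is not a searchable field: the search engine will not look in NoSupported at search time.
Section tag 1 128
Section name 2 128
Section description 3 1024
Section NoSupported 0 1024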
# META tags
# For example <META NAME="KEYWORDS" CONTENT="xxxx">
#
Section meta.keywords 3 128
Section meta.description 4 128
# HTTP headers example, let's store "Server" HTTP header
#
#
#Section header.server 5 64
# Document's URL parts
Section url.file 6 0
Section url.path 7 0
Section url.host 8 0
Section url.proto 9 0
# CrossWords
Section crosswords 10 0
#
# If you use CachedCopy for smart excerpts (see below),
# please keep Charset section active.
#
Section Charset 11 32
Section Content-Type 12 64
Section Content-Language 13 16
# Uncomment the following lines if you want tag attributes
# to be indexed
#Section attribute.alt 14 128
#Section attribute.label 15 128
#Section attribute.summary 16 128
#Section attribute.title 17 128
#Section attribute.face 27 0
# Uncomment the following lines if you want use NewsExtensions
# You may add any Newsgroups header to be indexed and stored in urlinfo table
#Section References 18 0
#Section Message-ID 19 0
#Section Parent-ID 20 0
# Uncomment the following lines if you want index MP3 tags.
#Section MP3.Song 21 128
#Section MP3.Album 22 128
#Section MP3.Artist 23 128
#Section MP3.Year 24 128
# Comment this line out if you don't want to store "cached copies"
# to generate smart excerpts at search time.
# Don't forget to keep "Charset" section active if you use cached copies.
# NOTE: 3.2.18 has limits for CachedCopy size, 32000 for Ibase and
# 15000 for Mimer. Other databases do not have limits.
# If indexer fails with 'string too long' error message then reduce
# this number. This will be fixed in the future versions.
#
Section CachedCopy 25 64000
# A user defined section example.
# Extract text between <h1> and </h1> tags:
#Section h1 26 128 "<h1>(.*)</h1>" $1
// This is the user-defined part: index only the content part of each crawled page; any regular expression can be used here
Section content 1 512 "<!--search start-->(.*)<!--search end-->" $1
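To see what the "expr" and "repl" pair of such a user-defined section selects, the same pattern can be tried with sed. This is only an approximation: sed works line by line and is not the regex engine used by the indexer, and the helper name extract_content and the sample HTML are made up for the example.

```shell
#!/bin/sh
# Approximate the expr/repl pair of the user-defined section with sed:
# print only the text between the two marker comments (the $1 group).
# extract_content is a hypothetical helper; it only handles markers that
# sit on the same input line.
extract_content() {
    sed -n 's/.*<!--search start-->\(.*\)<!--search end-->.*/\1/p'
}

printf '%s\n' \
    '<div>nav</div><!--search start-->Hello world<!--search end--><p>footer</p>' \
    | extract_content
# prints: Hello world
```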
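// Nothing special here: the Server section lists the site addresses to crawl, which can be a single page or a whole site
Note: htdb search does not need this part; just comment it out.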
#########################################################################
#Server [Method] [SubSection] <URL> [alias]
# This is the main command of the indexer.conf file. It's used
# to describe web-space you want to index. It also inserts
# given URL into database to use it as a start point.
# You may use "Server" command as many times as a number of different
# servers or their parts you want to index.
#
# "Method" is an optional parameter which can take one of the following values:
# Allow, Disallow, CheckOnly, HrefOnly, CheckMP3, CheckMP3Only, Skip.
#
# "SubSection" is an optional parameter to specify server's subsection,
# i.e. a part of Server command argument.
# It can take the following values:
# "page" describes web space which consists of one page with address <URL>.
# "path" describes all documents which are under the same path with <URL>.
# "site" describes all documents from the same host with <URL>.
# "world" means "any document".
# Default value is "path".
#
# To index whole server "localhost":
#Server http://localhost/
#
# You can also specify some path to index subdirectory only:
#Server http://localhost/subdir/
#
# To specify the only one page:
#Server page http://localhost/path/main.html
#
# To index whole server but giving non-root page as a start point:
#Server site http://localhost/path/main.html
#
#
# You can also specify optional parameter "alias". This example will
# index server "http://www.mnogosearch.org/" directly from disk instead of
# fetching from HTTP server:
#Server http://www.mnogosearch.org/ file:///home/httpd/www.mnogosearch.org/
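Putting the three parts together, a crawl configuration can be very short. The sketch below writes one to a file of your choice; the helper name write_min_conf is made up, the values are the sample ones used earlier in this article, and whether this is enough for a given site is an assumption to verify against the manual.

```shell
#!/bin/sh
# Write a minimal crawl configuration (DBAddr, two sections, one Server)
# to the file named in $1. write_min_conf is a hypothetical helper.
write_min_conf() {
    cat > "$1" <<'EOF'
DBAddr mysql://root:123456@localhost/test1/?dbmode=single
Section body 1 256
Section title 2 128
Server site http://localhost/
EOF
}

# Usage: write to a scratch file, review it, then copy it over
# /usr/local/mnogosearch/etc/indexer.conf
#   write_min_conf /tmp/indexer.conf
```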
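With this much configured, basic searching already works; for detailed configuration, consult the mnoGoSearch manual. Compared with dpsearch, mnoGoSearch is better suited to building an entry-level search engine.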