PEPPLER.ORG
Michael Peppler
Sybase Consulting
Menu
Home
Sybase on Linux
Install Guide for Sybase on Linux
General Sybase Resources
General Perl Resources
Freeware
Sybperl
Sybase::Simple
DBD::Sybase
BCP Tool
Bug Tracker
Mailing List Archive
Downloads Directory
FAQs
Sybase on Linux FAQ
Sybperl FAQ
Personal
Michael Peppler's resume

sybperl-l Archive

Up    Prev    Next    

From: Michael Peppler <mpeppler at MBAY dot NET>
Subject: Re: text searching
Date: Mar 3 1998 10:55PM

philip mikal wrote:
> 
> Hello:
> 
> This might be slightly off topic...
> 
> I have a web site where I allow searching through my Sybase database
> via a HTML form and Sybperl.  Currently I have it working by
> using "like '%$search_string%'",

Using that sort of search will make it pretty slow - you're
guaranteed a table scan for each search, because of the leading
'%' in the like clause.


> but I wish to increase the functionality of it
> by allowing boolean and/or of the search words and the ability to
> ignore case. I believe that I could accomplish the and/or function
> by splitting apart the words from the search string and then having
> an sql statement for each search word joined with a union. 

I would build the SQL statement on the fly - let's say I want to
search for 'Sybase' & 'perl'. The user enters 'sybase+perl'.

@words = split(/+/, $input);
$w1 = shift(@words)
$sql = "select * from table where field like '%$w1%'";
foreach (@words) {
	$sql .= "and field like '%$_%'";
}

$rows = $dbh->sql($sql);

You can add logic to figure out what type of boolean the user
entered (ie if it's an and or an or, and whether parens are
needed).

This will still be terribly slow for even a medium sized database.

To make things faster you would need to use an ad-hoc indexing
scheme, for example extract all the words from each row and
insert a row to an index table that has the word and the foreign
key to the table:

eg if the table you wish to search has 
	text_table(row_id int, msg varchar(255))

you'd have a text_table_index table like
	(word varchar(50), row_id int)
which is indexed on 'word' and allows duplicates.

Now you search the text_table_index instead, and return any hits
there - which will be a lot faster.


> How do I accomplish ignoring case?

You can use the lower() or upper() t-sql functions, or you can 
turn your SQL server to case-insensitive.

Michael
--
Michael Peppler         -||-  Data Migrations Inc.
mpeppler@datamig.com    -||-  http://www.mbay.net/~mpeppler
Int. Sybase User Group  -||-  http://www.isug.com