faqts : Computers : Databases : MySQL : Common Problems : Full Text Indexing

+ Search
Add Entry AlertManage Folder Edit Entry Add page to http://del.icio.us/
Did You Find This Entry Useful?

48 of 79 people (61%) answered Yes
Recently 7 of 10 people (70%) answered Yes

Entry

How can I do simple text indexing on a blob column?
How can I search for words in a blob field?
How can I simulate a search engine and look for words in blob fields?

Jan 23rd, 2000 21:42
Nathan Wallace, Jan Dvorak


Even if you could index the whole of your TEXT field, it wouldn't help
in the LIKE '%keyword%' searches.

The solution is to keep the words in a separate table.  Say, you keep
information about Things. Then you probably have done something like

    create table Thing (
        id integer unsigned not null auto_increment primary key,
        description text
    );

If you want to have a fulltext index of description, you should add two
other tables:

    create table Word (
        id integer unsigned not null auto_increment primary key,
        word varchar(100) not null unique
    );

    create table Word_Occurrence_in_Thing_Description (
        word_id integer unsigned not null,
        thing_id integer unsigned not null,
        primary key ( word_id, thing_id )
    );

Now, if you want to search the Thing descriptions for a word, you'll say

    select
        WO.thing_id
      from
        Word W,
        Word_Occurrence_in_Thing_Description WO
      where
        W.word = 'searchword%'
      and
        WO.word_id = W.id;

This query will return you the list of Thing id's that contain the
'searchword' in their description. This query is as optimized as it can
be.

Admittedly, you'll have to do more work when inserting a new Thing.  You
start with the standard

    insert into Thing ( description )
      values ( 'new_thing_description' );

You store the last_insert_id() in a client variable, $new_thing_id, say.

Now you take the same description you fed to the previous query and
split it into a list of words.  Then, for each word (let's call it
$new_word) at a time, you do

    insert ignore into Word ( word ) values ( lower( $new_word ) );

    insert ignore into 
       Word_Occurrence_in_Thing_Description ( word_id, thing_id )
    select id, $new_thing_id 
       from Word where word = lower( $new_word );

When the whole list is processed, you're done.

When you delete a Thing, you should also delete all the
Word_Occurrence_in_Thing_Description records that have thing_id equal to
the id of the Thing being deleted.

When updating the Thing description, you'll be best served by deleting
the old word occurrences and inserting new ones.

You don't have to worry about Words that are not referenced by any
Word_Occurrence_in_Thing_Description.  They just sit there.  However, if
you feel a very strong urge to get rid of them (and have nothing else to
do), you can safely delete them.  There was a thread nearby on this list
that says how to do it.

That's the way to handle fulltext in a relational database.