Phonetic Searching and Full Text Indexing For Business and Personal Names and Addresses

Post by stabber » Sun, 11 Mar 2007 22:36:08


I am undertaking a project that requires phonetic searching across
business and personal names and addresses. To date, this data has had
very little cleansing, and based on our existing data analysis, the
amount of work needed to cleanse it is tremendous. My thinking is that
if we can find the right tools, we should be able to perform matching
and searching across the data without heavy-duty cleansing before the
search. When I talk about uncleansed data in terms of names, I mean
things like the following:

Acme, Inc
Acme, Incorp.
Acme Incorporated

**Note the variations in the abbreviation of "Inc."; the same thing
occurs with other words such as Company, Association, Limited, DBA, and
others we have not yet identified.
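One inexpensive way to handle suffix variations like these is to normalize each name before indexing or comparing it. The following is a minimal Python sketch, assuming a hand-built lookup table; the table contents and the `normalize_name` function are hypothetical, and a real table would have to be grown from the actual data (including the variants not yet identified).

```python
import re

# Hypothetical lookup table: each observed variant (lower-cased) maps to one
# canonical token. These entries are illustrative only.
SUFFIX_MAP = {
    "inc": "INC", "incorp": "INC", "incorporated": "INC",
    "co": "CO", "company": "CO",
    "assn": "ASSN", "assoc": "ASSN", "association": "ASSN",
    "ltd": "LTD", "limited": "LTD",
    "dba": "DBA",
}


def normalize_name(name: str) -> str:
    """Upper-case a name, drop punctuation, and canonicalize known suffixes."""
    # Split on any run of characters that is not a letter or digit,
    # discarding the empty pieces left by leading/trailing punctuation.
    tokens = [t for t in re.split(r"[^0-9A-Za-z]+", name) if t]
    return " ".join(SUFFIX_MAP.get(t.lower(), t.upper()) for t in tokens)


if __name__ == "__main__":
    for raw in ["Acme, Inc", "Acme, Incorp.", "Acme Incorporated"]:
        print(f"{raw!r:22} -> {normalize_name(raw)!r}")
```

All three Acme variants normalize to `ACME INC`, so they can be indexed or compared as the same string.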

Smith Plumbing
Smyth Plumbing
Kool Runnings
Cool Runnings

**Note the variations in the spelling of words that sound alike.
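A phonetic key reduces each word to a code so that words which sound alike share a code. The following sketch implements classic American Soundex in Python purely as an illustration (T-SQL also has a built-in SOUNDEX() function based on the same idea). It is not the algorithm used by any of the tools discussed in this thread, and the `soundex` function name is hypothetical.

```python
import string

# Soundex digit for each consonant group; vowels, y, h and w have no digit.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code for a word."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept (upper-cased)
    prev = _CODES.get(letters[0], "")    # ...but its digit still counts for collapsing
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:             # adjacent identical digits are coded once
                result += code
                if len(result) == 4:
                    break
            prev = code
        elif ch not in "hw":
            prev = ""                    # a vowel (or y) separates repeated digits
        # h and w are skipped without separating repeated digits
    return result.ljust(4, "0")          # pad short codes with zeros


if __name__ == "__main__":
    for name in ["Smith", "Smyth", "Kool", "Cool"]:
        print(name, soundex(name))
```

Soundex maps Smith and Smyth to the same code, S530. However, because it keeps the first letter unchanged, Kool (K400) and Cool (C400) do not match. Algorithms such as Double Metaphone encode the initial sound rather than the initial letter, which is one reason they handle cases like this better.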

Ultimately, what I am wondering is what success people have had with
fuzzy searches like this using only Microsoft's Full Text Indexing. I
would also be interested to know whether SQL Server 2005 improves on
its fuzzy matching capabilities. In addition, I found some information
regarding SQL Turbo (http://www.yqcomputer.com/), which seems to be an
extender or alternative implementation of FTS that does support
phonetic matching, and I would be interested to know if anyone has
used this tool as well. Any thoughts on how to approach this would be
appreciated.

Post by Hilary Cot » Mon, 12 Mar 2007 02:01:40

You need to build a thesaurus with all the alternative spellings to
implement this in SQL FTS. Alternatively, you could try to cleanse your
data before indexing it, using SSIS Fuzzy Grouping or by rolling your
own Levenshtein edit distance function. SQL FTS fuzzy search (a.k.a.
freetext search) will stem your search phrase for verb forms, but it
will not do the kind of fuzzy searching you are looking for.
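Rolling your own Levenshtein edit distance could look something like the following: a minimal Python version of the standard dynamic-programming algorithm, plus a hypothetical `is_probable_match` helper. Its two-edit threshold is an arbitrary assumption that would need tuning against real data. The same logic could in principle be ported to a SQL Server 2005 CLR function, but that is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def is_probable_match(a: str, b: str, max_edits: int = 2) -> bool:
    """Hypothetical matching rule: case-insensitive, at most max_edits edits."""
    return levenshtein(a.upper(), b.upper()) <= max_edits


if __name__ == "__main__":
    print(levenshtein("Smith Plumbing", "Smyth Plumbing"))
    print(levenshtein("Kool Runnings", "Cool Runnings"))
    print(is_probable_match("Acme Plumbing", "acme plumbng"))
```

Note that edit distance measures spelling rather than sound: Kool Runnings and Cool Runnings are one edit apart, but so are Cool and Pool. For that reason it may work best combined with normalization or a phonetic key.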

There was a time when SQL Turbo was actually faster than SQL 2000 FTS.
SQL 2005 FTS is now faster, but I have not yet compared it against the
new version of SQL Turbo.

--
Hilary Cotter

Looking for a SQL Server replication book?
http://www.yqcomputer.com/

Looking for a FAQ on Indexing Services/SQL FTS
http://www.yqcomputer.com/

Post by Russell Fi » Wed, 14 Mar 2007 04:42:54

Have you looked at: http://www.yqcomputer.com/

The Double Metaphone algorithm is available and has been implemented as an
extended stored procedure. It works pretty well for personal names, but I
don't know whether it would work well for business names.

FWIW - RLF