Full Text Search - a few questions from a semi-newbie

Full Text Search - a few questions from a semi-newbie

Post by bigjoep » Sat, 17 Sep 2005 07:21:19


I've been learning a lot about full text searching over the last few
days while working on a large scale project (well, large for me atleast
- currently almost 4 million records with near-future expansion to
about 10 million).

I still have a few unanswered questions though. First, let me give my
environment info: Single CPU Server (1Ghz), 1 GB RAM (expanding to 2GB
tomorrow), Windows 2000 server, SQL 2000 Standard Edition. Database is
on mirrored SCSI drive, Full Text Data is on new 80GB IDE drive
(couldn't fit another SCSI). The site is a merchant affiliate, which
pulls in data from Linkshare merchants and then allows for searching
globally, by category, and by merchant.

During population (full or incremental) searching on the index with
CONTAINS queries takes forever. I would expect the performance to be
affected during population, but is there anything I can do to avoid
this headache?

Here's my current situation with this: I had about 2.2 million records
in my table yesterday, and my index finally finished populating - the
index is on 3 varchar fields, one is 255 length, one is 500 and one is
2000. My queries were coming back super fast at this point. I then
turned on "Change Tracking". Not sure if this was a "bad" time to do
this. Possibly should have been done up front, but nothing I've seen
online indicates this can't be done at any point. Thinking that the
index would then update smoothly as I imported data, I proceeded to
import another 1.5 million records. (FYI, new data imports would
optimally be nightly and size would range from 2,000 to 2,000,000
records)

It is now 18 hours later and my catalog says "incremental population in
progress" and is only at about 2.23 million records. It is moving
incredibly slow. I know that it moved faster yesterday when load was
reduced later at night, so I am hoping that it finishes before the
morning.

So basically here's my questions:
1. Will my future updates index quickly with change tracking?

2. How does change tracking work? Does it re-index ANY row called
with ANY UPDATE/INSERT/DELETE statement? My import updates all
existing records. It doesn't check for changes, just assumes it needs
the latest data and updates the record. I can adjust this if
necessary. (any suggestions would be great too)

3. Is there any way to do an alternate search, or does anyone else do
something to avoid searching the indexed data while it's being
populated so it doesn't take 3 minutes to search on my website?

4. I've heard that change tracking is better than scheduled
incremental population. Would I be better off in my scenario with the
scheduled population? I would need a timestamp column right? And
would have to populate that field for all existing rows, right?

5. Any additional insight would be great. I've gotten some great
info from this forum and others, but sometimes you just need to "hear
it" yourself to get it, ya know?


Thanks in advance,
Joe Potenza
XXXX@XXXXX.COM
 
 
 

Full Text Search - a few questions from a semi-newbie

Post by Hilary Cot » Sat, 17 Sep 2005 09:28:26

ou queries may be slow due to locking which occurs during the population
process. Try to use the nolock hint on your queries. Use change tracking to
offer faster indexing times.

Starting change tracking can kick off another incremental or full population
depending on whether a timestamp column exists on this table..

To answer your questions

1) if you are doing singleton updates, change tracking should offer best
performance. So your future DML should progress much faster. If a large
portion of your table changes, i.e. 90% you will probably find that a full
population works best.

2) Change tracking will only resubmit for indexing rows which have the
column full text indexed modified.

3) After a full population is done, the old index should be used. I am
wondering if perhaps your performance problem isn't related to the number of
rows you are returning. You might want to limit this to 100 rows and order
them by rand desc using the ContainsTable or FreeTextTable predicates.

4) To reduce locking you might be better of scheduling your change tracking
updates as opposed to using change tracking with update index in background.
This is almost always more efficient than incremental populations. A
timestamp column should be on your table to perform an incremental
population, otherwise a full population will be done. If you have an full
population completed and you enable change tracking and you have a timestamp
column, IIRC no incremental population will be required. If no timestamp
column exists or if a full population has not completed, a full population
will be run.

5) have you checked out the following kb articles?

http://support.microsoft.com/?scid=http%3a%2f%2fsupport.microsoft.com%2fsupport%2fsql%2fcontent%2f2000papers%2ffts_white+paper.asp

or
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro04/html/sp04f9.asp




--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

"bigjoepo" < XXXX@XXXXX.COM > wrote in message
news: XXXX@XXXXX.COM ...