CONTAINSTABLE with individual RANK?

CONTAINSTABLE with individual RANK?

Post by DC » Sun, 05 Jul 2009 01:30:52


Hi,

we need the ability to retrieve the top 5000 hits from a fulltext
query, sorted by our own score column. This was not possible in SQL
Server 2005, it is only possilby to dig up based on the RANK. (It is
of course possible to retrieve everything and then used an ORDER BY
clause on the results, but that will delay matters substantially).

My question is, if SQL Server 2008 finally offers an added
functionally for this. We currently use Lucene to get this done, but
this requires quiet some maintenance.

Thanks for any hint,
Regards
DC
 
 
 

CONTAINSTABLE with individual RANK?

Post by hilar » Sun, 05 Jul 2009 20:35:44


No, SQL Sever 2008 will allow you to order and filter on rank alone.

You do not have the ability to order on a custom ranking column/algorithm.

I am familiar with lucene, but unfamiliar with the feature you are talking
about. What is it?

 
 
 

CONTAINSTABLE with individual RANK?

Post by DC » Mon, 06 Jul 2009 05:40:59


Hi,

we are using a CLR stored procedure to get results from a webservice
which returns Lucene results. The feature - and I am having a hard
time believing that SQL Server FT still does not support it - is to
quickly return the top N documents ordered by a custom rank (e.g. a
"score" column). SQL Turbo did this years ago, Lucene can do it - why
can't SQL Server do it? It is such an important feature in my opinion.

However, maybe the overall performance has changed since SQL Server
2000. I will check the difference between CONTAINSTABLE / TOP N BY
RANK versus CONTAINS / ORDER BY in 2008 and post results.

Regards
DC
 
 
 

CONTAINSTABLE with individual RANK?

Post by hilar » Mon, 06 Jul 2009 19:34:46


don't bother with the returning the results of top n by rank. Its much
faster for large results sets.

Thanks for the info on Lucene. I was aware of this feature but I thought you
were talking about something which would show the contribution of rank from
each search term in Lucene.
 
 
 

CONTAINSTABLE with individual RANK?

Post by hilar » Mon, 06 Jul 2009 20:23:17

Sorry, I meant to say don't bother with posting the timings for top n by
rank, vs without - as its much faster for large results sets.
 
 
 

CONTAINSTABLE with individual RANK?

Post by Simon Sabi » Tue, 07 Jul 2009 07:58:01


Hello dc,

The point with iFTS in SQL 2008, is that you can join to another table to
get an different ranking and order by that.

From what I remember of SQL Turbo it was very efficient at getting all the
result keys back into a temporary table and then joined from there.

What you should find that, in SQL 2008 you can use the ORDER BY and order
by your custom rank.

The issue is if someone puts in a search criteria that returns 1,000,000
records, however SQL should still be able to do a good job. Just make sure
you look at your query plans well. A search that returns 1 record will produce
a very different plan to one that returns 1,000s.

Simon Sabin
SQL Server MVP
http://www.yqcomputer.com/
 
 
 

CONTAINSTABLE with individual RANK?

Post by DC » Sat, 11 Jul 2009 22:57:46

n 6 Jul., 00:58, Simon Sabin < XXXX@XXXXX.COM > wrote:

Hoorey, Mr. Sabin! I did some testing because this sounded to good to
be true.

I created the [expressions] (with FT Index on "expression") and
[items] tables (scripts below) and filled both with 1,5 million
records. Our indexed text length average length is only about 100
characters (we are searching in names).

To my delight the following two searches (the fulltext returns about
290.000 items for "bank") are both very fast and the RANK query does
not outperform the ORDER BY query by more then 10-20%.

declare
@searchfor varchar(200),
@top int

set @top = 5000
set @searchfor = 'bank*'

-- rank - I knew this was fast
select i.id, i.name, i.score
from items i
inner join CONTAINSTABLE (expressions, expression, @searchfor, @top) c
ON c.[key] = i.id
order by i.score desc

-- find 290.000 records and order by custom score - this used to be
slow with sql server 2000
select top(@top) e.id, i.name, e.score
from items i
inner join expressions e on e.id = i.id
inner join CONTAINSTABLE (expressions, expression, @searchfor) c on c.
[key] = e.id
order by e.score desc


I am not sure when Microsoft optimized this, but this is good news
(even if it may be old news) to me.

Thank you Simon, I wouldn't have bothered to try this out without your
post.

In the past, there were some other problems with sql server fulltext.
I remember, that we had to rebuild the index sometimes because it got
corrupted, and in the meantime no ft search was possible. Also,
rebuilding the index took quiet a while. Does this look better in
2008, too?

Regards
DC



USE [TestFT]
GO

/****** Object: Table [dbo].[expressions] Script Date: 07/10/2009
15:43:29 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[expressions](
[id] [int] NOT NULL,
[expression] [varchar](4000) NOT NULL,
[score] [int] NOT NULL,
CONSTRAINT [PK_expressions] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY
= OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

USE [TestFT]
GO

/****** Object: Table [dbo].[items] Script Date: 07/10/2009
15:43:29 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[items](
[id] [int] NOT NULL,
[name] [varchar](254) NOT NULL,
[wp1symbol] [varchar](31) NOT NULL,
[score] [int] NOT NULL,
CONSTRAINT [PK_items] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY
= OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[expressions] WITH CHECK ADD CONSTRAINT
[FK_expressions_items] FOREIGN KEY([id])
REFERENCES [dbo].[items] ([id])
ON UPDATE CASCADE
ON DELETE CASCADE
GO

ALTER TABLE [dbo].[expressions] CHECK CONSTRAINT
[FK_expressions_items]
GO

 
 
 

CONTAINSTABLE with individual RANK?

Post by Baba » Sun, 02 Aug 2009 08:27:49

have the same question and problem here, I think it is very important to
have a feature in SQL FTS as DC pointed out below. The problem is that it is
not practical at least in my case to retrieve everything and then sort by
our own custom rank column. I am working on an application that requires top
100 or 200 items off of millions of records and should take only a second or
two and based on my own custom ranking. Any idea? is there any way to do
such a thing in SQL FTS?

Thank you.


Bo


"Simon Sabin" < XXXX@XXXXX.COM > wrote in message
news: XXXX@XXXXX.COM ...


 
 
 

CONTAINSTABLE with individual RANK?

Post by Russell Fi » Sun, 02 Aug 2009 08:48:41

o,

Did you read DC's post of 7/10? It said in part: "the RANK query does not
outperform the ORDER BY query by more then 10-20%."

In any case, if you have full-text queried the data you will not get back
millions of records, but some set of them that matches your FT query. Those
are the rows from which you can select the TOP 100 based on your custom rank
column. (Of course, if the FT query returns millions of rows, then you will
have to deal with that.)

Can you try it and report on how well it works?

RLF


"Babak" < XXXX@XXXXX.COM > wrote in message
news: XXXX@XXXXX.COM ...

 
 
 

CONTAINSTABLE with individual RANK?

Post by Baba » Sun, 02 Aug 2009 09:23:46

hanks Russell,

I have had this problem for a long time and it has always been an stopper
for me and my app. If you are dealing with terabytes of data it is very
typical to see hundreds of thousands or millions of matching records for
majority of queries. In this case FT RANK has no use and I thought
Microsoft should have had a solution for this in SQL 2005 or 2008,
apparently not. Again, without this feature the FT search (as far as I
learned) won't work for large databases and custom built apps unless we
tweak the system.

You are right, I knew I have to somehow deal with this problem. It needs
some work around, innovative system design to get the performance I need.

If I am wrong please let me know and drop a line. I am glad I found this
news tread, at least I can share my experience here :)


Bo



"Russell Fields" < XXXX@XXXXX.COM > wrote in message
news: XXXX@XXXXX.COM ...


 
 
 

CONTAINSTABLE with individual RANK?

Post by DC » Mon, 24 Aug 2009 17:27:22

n 1 Aug., 02:23, "Babak" < XXXX@XXXXX.COM > wrote:

Hi Bo,

just stumbeld over your reply; just in case you are still on this: I
HAD this problem at least in Sql Server 2000, maybe it was still
present in Sql Server 2005 (but we were using Lucene to get around the
problem by then) but in Sql Server 2008 (or maybe in 2005 already) the
performance of custom rankings has been seriously enhanced.

I know that the following statement performed poorly in Sql Server
2000 since the FT engine would dig up 100 thousands of results and
then a rather slow sort operation would follow. This took typically
10s of seconds on our servers (back then, of course the hardware is
much better today, too):

select top 5000 e.id, i.name, e.score
from items i
inner join expressions e on e.id = i.id
inner join CONTAINSTABLE (expressions, expression, @searchfor) c on c.
[key] = e.id
order by e.score desc

BUT with Sql Server 2008, that same query returns in 10s of
milliseconds. Give it a try. If you can join in your custom sort
criteria somehow, then you should have much better luck now than in
older Sql Server versions. I was always hoping for an additional
CONTAINSTABLE parameter to get this solved, but apparently that is not
even necessary. I must admit though, that I have only tested this, we
have nothing in production yet (we are still on Lucene but plan to get
rid of it since the index management is not without its burdens).

Regards
DC