Looking for semantic patterns in computer marketplace advertisements

Post by Giuseppe.G » Thu, 16 Feb 2006 20:41:28


Hello, I really need help with the following. I'm doing research for my
thesis, which deals with capturing an NLP (natural-language)
advertisement, written in English [which is not my main language - I
apologise for my errors], and sending it to a parser. The (bottom-up)
parser outputs a computer-understandable description of the
advertisement (the language chosen for now is Neoclassic Lisp),
deconstructing it with the help of a lexicon, a two-level grammar and a
domain-dependent ontology. My main reference for this part of my study
has been the book "Natural Language Understanding" by Allen.

The scenario I'd like to depict is the following: a
"computer-uneducated" person (I mean a person who neither has nor
*wants* to have specific knowledge or competence about the domain of
computers and computer components) gives a short NLP description
(typing into an imaginary internet computer-shop dialog) of the
machine he would like to buy, specifying what he would use it for (e.g.
3d rendering, word processing, etc.) and broadly adding non-technical
requirements such as a maximum price, a flat monitor and so on.

Now, I've been told to build a statistically significant corpus of NLP
sentences to (at least try to) capture a certain "structure" in the
sentences: I want to see if I can spot some kind of recurrent pattern
coming through them, in order to fine-tune my grammar.

Browsing eBay pages doesn't help, because people buying/selling
computers simply enumerate a number of technical features without any
structure! My question is, all of you being English language speakers,
would you give me some examples (or direct me towards a list of ads, or
whatever) of the way you (or your mother, father, grandmother, etc.)
would, say, talk to a salesclerk or shop assistant when you're in the
market for a computer? Obviously without introducing technical specs.

Thank you for your time, anyway!

Giuseppe
 
 
 

Post by Ian Parke » Sat, 18 Feb 2006 06:23:52

In essence, what you do is a Google-type search. You search for a set
of keywords. If the text is parsed, then you can confine keywords to a
particular part of speech.

If you say word processing, 3d rendering, this is where knowledge comes
in. You will in fact need nuggets of knowledge.

3d rendering is CPU intensive - Keyword = GHz
Word processing - Requires hard drive

You can fairly easily create expert nuggets yourself (there aren't in
fact that many). The $64K question (realistically the $64G question) is:
can we create nuggets by searching the Web? Can we reason?

Basically you need to go NLP description -> Search Nugget file ->
search ads
The nugget file will rate characteristics in order of merit.
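
The chain above (description -> nugget file -> ads) could be sketched roughly as follows. This is only an illustration: the `NUGGETS` table, the function names and the extra "cpu" keyword are assumptions made up for the example, and plain substring matching stands in for whatever parsing is really used.

```python
# Hypothetical nugget file: each task maps to ad keywords, best first.
# "ghz" and "hard drive" come from the examples above; "cpu" is an
# assumed extra entry, added only to show ordering by merit.
NUGGETS = {
    "3d rendering": ["ghz", "cpu"],
    "word processing": ["hard drive"],
}


def keywords_for(description):
    """Collect nugget keywords for every known task named in the text."""
    text = description.lower()
    found = []
    for task, keywords in NUGGETS.items():
        if task in text:
            for keyword in keywords:
                if keyword not in found:
                    found.append(keyword)
    return found


def search_ads(ads, keywords):
    """Return the ads that mention at least one keyword, most hits first."""
    scored = []
    for ad in ads:
        hits = sum(1 for keyword in keywords if keyword in ad.lower())
        if hits:
            scored.append((hits, ad))
    # sort() is stable, so ads with equal hits keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ad for _, ad in scored]
```

Calling `search_ads(ads, keywords_for(description))` then goes from the buyer's sentence to a ranked list of ads.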

Am I helpful, or is this right off the point?

 
 
 

Post by Giuseppe.G » Sat, 18 Feb 2006 18:50:47

First of all thank you for your reply, Ian!


That's exactly the point! That's where my grammar comes in.


What do you mean by "create expert nuggets yourself"? I'm not sure I
understand the preceding sentences. My current situation is that
I'm able to associate a particular sequence of one to three or four
words with a specific "concept" (a nugget?), for example:

[NLP] "A computer for playing 3d games" ->
[LOGIC] ->(COMPUTER) AND (GAME_MACHINE) AND (has_VIDEOCARD.GPU)

but my concern is that I'm not sure I can map a good set of English
expressions right now, because I basically don't know ALL the different
ways (e.g. a good number of word sequences) to express my
computer-related concepts in English!

For example: for me, "Computer Graphics", "3D Rendering" and "3D Modeling"
all map to the same concept, but what if a user comes in with another
group of words to express that same concept? Obviously I'm gonna ignore
some (domain)-meaningless words, but I need to include a good set of
real-world expressions to create my rules.
So I thought I might ask your advice about examples of sentences you
would produce when going to a clerk and asking for a computer that
suited your needs.
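
To make the mapping problem concrete, here is a minimal sketch of a phrase table in which several expressions share one concept. The table, the `GRAPHICS_WORK` label and both function names are hypothetical; only `GAME_MACHINE` appears in the example above. The second function lists the sentences that matched nothing, which is where new real-world expressions would have to be collected.

```python
# Hypothetical phrase table: several surface phrases share one concept.
PHRASE_TO_CONCEPT = {
    "computer graphics": "GRAPHICS_WORK",  # assumed concept label
    "3d rendering": "GRAPHICS_WORK",
    "3d modeling": "GRAPHICS_WORK",
    "3d games": "GAME_MACHINE",            # label taken from the example above
}


def concepts_in(sentence):
    """Return the sorted set of concepts whose phrases occur in the sentence."""
    text = sentence.lower()
    return sorted({concept
                   for phrase, concept in PHRASE_TO_CONCEPT.items()
                   if phrase in text})


def uncovered(sentences):
    """Return the sentences in which no known phrase was found."""
    return [s for s in sentences if not concepts_in(s)]
```

Running `uncovered` over a collected corpus would show which ways of asking are still missing from the rules.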

Thank you, Giuseppe
 
 
 

Post by Ian Parke » Sat, 18 Feb 2006 23:51:10

Basically a nugget is something which provides a logical connection. A
good example is the knowledge that rendering is CPU intensive. Hence
the nugget will tell you

Rendering = GHz

and give you figures of merit. In the case of merit your nuggets may
well be a matrix where you relate suitability for a task to what is
advertised by computer manufacturers.
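
As a rough illustration of such a matrix, the sketch below stores one weight per (task, advertised spec) pair and scores a machine as a weighted sum. All weights, spec names and function names are invented for the example; real figures of merit would have to come from domain knowledge.

```python
# Hypothetical merit matrix: task -> weight for each advertised spec.
# The numbers are made up purely to show the mechanics.
MERIT = {
    "rendering": {"cpu_ghz": 10.0, "disk_gb": 0.0},
    "word processing": {"cpu_ghz": 1.0, "disk_gb": 0.5},
}


def suitability(task, specs):
    """Weighted sum of a machine's advertised specs for the given task."""
    weights = MERIT[task]
    return sum(weight * specs.get(name, 0.0)
               for name, weight in weights.items())


def best_for(task, machines):
    """Name of the machine with the highest suitability for the task."""
    return max(machines, key=lambda name: suitability(task, machines[name]))
```

With machines given as a dictionary of spec dictionaries, `best_for("rendering", machines)` picks the one the matrix rates highest for that task.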
 
 
 

Post by Ted Dunnin » Mon, 20 Feb 2006 07:15:12


Nugget = concept.

You probably don't even want to use much grammar for this. Simple
extraction patterns suffice for most systems like this.
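
A minimal sketch of what such extraction patterns might look like, using Python's standard `re` module. The three patterns and the field names are assumptions chosen to match the requirements mentioned earlier in the thread (maximum price, flat monitor, intended use); a real system would need many more.

```python
import re

# Hypothetical extraction patterns, one per slot of interest.
PATTERNS = {
    "max_price": re.compile(
        r"(?:under|less than|no more than|up to)\s*[$€£]?\s*(\d+)", re.I),
    "flat_monitor": re.compile(r"\bflat\s+(?:monitor|screen)\b", re.I),
    "usage": re.compile(r"\b(3d rendering|word processing|3d games)\b", re.I),
}


def extract(text):
    """Fill in whichever slots the patterns can find in the text."""
    result = {}
    price = PATTERNS["max_price"].search(text)
    if price:
        result["max_price"] = int(price.group(1))
    if PATTERNS["flat_monitor"].search(text):
        result["flat_monitor"] = True
    uses = [use.lower() for use in PATTERNS["usage"].findall(text)]
    if uses:
        result["usage"] = uses
    return result
```

No grammar is involved here: each pattern looks for one thing and ignores the rest of the sentence.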

The real problem is how do you build your collection of nuggets. You
don't want to have experts doing this (trust me). The trick is to find
some knowledge acquisition technique that scales faster than the usage
of your system. That way your system seems to get smarter and smarter.
Obviously, the only resource that scales even as fast as your usage
rate is your user base itself. Thus, you have to learn from your
users. My own term for systems that have lots of users and which use
the intelligent behavior of some users in order to appear intelligent
to others is a "reflective intelligence". This is in contrast to an
"artificial intelligence" where you build in the intelligence. The
former is often vastly easier than the latter.

Realizing that you have to build a reflective intelligence is the easy
part. Doing it can be very, very hard except in a few clever
cases. The reason it is hard is that you will find it difficult to
get people to do things that they aren't rewarded for doing; what they
do willingly is typically what they wanted to do in the first place.
This means that asking them to do obscure tasks like creating regex
patterns, or even getting them to answer more than very occasional
questions, is pretty hard to do. You are generally left with observing
their normal behavior and drawing inferences from that.
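
One very simple way to draw inferences from observed behavior is to count which product features end up being bought after which query words. The sketch below assumes such (query, purchase) pairs can be logged at all, which is exactly the hard part; the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class BehaviorLog:
    """Counts how often a query word co-occurs with a purchased feature."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def observe(self, query_words, purchased_features):
        """Record one user session: what was typed and what was bought."""
        for word in query_words:
            self._counts[word.lower()].update(purchased_features)

    def likely_features(self, word, n=1):
        """Return the n features most often bought after this word."""
        counter = self._counts.get(word.lower(), Counter())
        return [feature for feature, _ in counter.most_common(n)]
```

After enough sessions, `likely_features("games")` would return whatever buyers who typed "games" most often ended up choosing, without anyone having written that nugget by hand.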

I myself make a speciality out of finding kinds of behavior that are
easily understandable and commercially valuable. Music listening is a
good example. Most people are willing to let you observe their
listening history if you give them something in return (like nice radio
stations). Thus, at Musicmatch, we were able to use this to learn
about music.

Your problem is one of the ones which are commercially important, but
which don't exhibit the "easily understandable" characteristic. If you
figure out how to bypass that difficulty, then you will really have
something.

Post your results if you find out something interesting!