Annotated corpora with ontologies

Post by Richard Ca » Sun, 05 Feb 2006 16:02:37

Does anyone know of sizable document collections that are manually
annotated with respect to a rich domain ontology and that are publicly
available at no or nominal cost for research purposes? The markup
should be semantically rich, as opposed to just covering linguistic
features such as PoS (e.g., Penn Treebank). Any domain will do and the
markup may have been derived semi-automatically, as long as the final
result is of high quality (usable as groundtruth/gold standard). Both
English and other languages are of interest. Large instance data sets
tied to an ontology would be useful, too.

I found two examples:


but not much else, even after extensive searching.

Any pointers are appreciated.

Thanks in advance,

Post by Milin » Tue, 07 Feb 2006 02:54:52

Hi Richard,

Do you think it makes sense to create such an annotated ontology
corpus? In the past, it was very expensive to create such a corpus, but
now, thanks to the well-proven model of outsourcing, it might be
possible to create such a corpus and make it freely available for
research. It would cost some money of course, but the benefits of such
a freely available corpus far outweigh the relatively minimal costs...

We have the capability to automatically tag articles and paragraphs with
a set of ontologies, and this could be further refined/corrected

How difficult do you think it would be to find a budget to do something
like this?

We have the capability to lead such an exercise.

Milind Joshi


Post by Richard Ca » Tue, 07 Feb 2006 09:36:20

Thanks for the heads-up. There is no budget at the moment, but I will
keep this option in mind.