Tuesday, September 26, 2006

U of T DCS colloquium series starts today

The U of T DCS colloquium series starts today with a seminar by Lillian Lee of Cornell: "Sense and Sensibility: Automatically Analyzing Subject and Sentiment".

Abstract:
This talk addresses issues in document classification, which
we construe broadly to mean the grouping together of texts that have
similar content. While this task is surely easier than explicitly
determining document content, it has great utility in practice and is
still plenty hard.

One problem currently attracting a great deal of attention is that of
classifying documents by their overall "sentiment": for example, one
might want to determine from its text alone whether a movie review is
"thumbs up" or "thumbs down". Sentiment analysis has empirically been
shown to be resistant to traditional text-categorization approaches,
and involves more subtlety than one might at first imagine. We
demonstrate that learning techniques can yield state-of-the-art
results even when no explicit linguistic information is used.
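As a rough illustration of classifying reviews with a learning technique and no explicit linguistic information, here is a minimal Naive Bayes sketch over unigram presence features. Naive Bayes is just one standard learner for this kind of task, chosen here as an assumption; the toy reviews and the class name `NaiveBayesSentiment` are hypothetical and not taken from the talk.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace: no parsing, no lexicons, no linguistics.
    return text.lower().split()


class NaiveBayesSentiment:
    """Hypothetical Naive Bayes classifier over unigram *presence* features."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = set(tokenize(doc))  # each word counted once per document
            self.word_counts[label].update(words)
            self.vocab |= words
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus add-one-smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in set(tokenize(doc)):
                if w in self.vocab:
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Made-up training reviews labelled "up" (thumbs up) or "down" (thumbs down).
train_docs = [
    "a gripping and wonderful film",
    "wonderful acting and a moving story",
    "a dull and tedious mess",
    "tedious plot and wooden acting",
]
train_labels = ["up", "up", "down", "down"]
model = NaiveBayesSentiment().fit(train_docs, train_labels)
```

Here `model.predict("a moving and wonderful story")` returns `"up"`. Using word presence rather than word frequency, and add-one smoothing, are design choices of this sketch.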

We also discuss the long-standing problem of representing topical
content. In particular, we present an analysis of the widely-used
SVD-based Latent Semantic Indexing (LSI) algorithm; this analysis
motivates an intuitive generalization providing striking empirical
improvements over LSI.
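In LSI, the term-document matrix A is factored with the singular value decomposition and only the k largest singular values are kept, so documents can be compared in a k-dimensional latent space. The minimal numpy sketch below, on a made-up matrix, shows plain LSI only; the generalization mentioned in the abstract is not described in the post, so it is not attempted here. The function name `lsi_doc_vectors` is hypothetical.

```python
import numpy as np

# Made-up term-document matrix: rows are terms, columns are documents.
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)


def lsi_doc_vectors(A, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of (S_k V_k^T)^T are the documents' coordinates in the latent space.
    return (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


docs_k = lsi_doc_vectors(A, k=2)
```

Documents 0 and 1 share only the term "engine", so the cosine of their raw columns is 0.5; in the 2-dimensional latent space they point in the same direction (cosine about 1.0), because both co-occur with the other car-related terms, while the flower document stays orthogonal to them.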

Finding similarity between documents is a popular task: it lets you cluster documents and extract knowledge from their content. This requires computing a term-document matrix. Categorizing text by topic is quite useful for blogs, since it can tell us what the posts are about. However, state-of-the-art methods that use bag-of-words feature vectors have proven less effective for sentiment classification than for topic classification. Content analysis of blogs will require text classification, but because blog posts often deal with sentiment and emotion, topic classification alone runs into problems.
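A minimal sketch of the term-document matrix and the document similarity it supports, assuming raw term counts (no tf-idf weighting) and cosine similarity; the blog posts and helper names here are made up for illustration. Similarities like these are what a clustering step would group documents by.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] is the count of vocab[i] in docs[j]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[w] for c in counts] for w in vocab]
    return vocab, matrix


def column(matrix, j):
    # Column j is document j's bag-of-words vector.
    return [row[j] for row in matrix]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical blog posts.
posts = [
    "new phone camera review",
    "phone camera battery review",
    "my garden tomatoes this summer",
]
vocab, M = term_document_matrix(posts)
```

The two phone posts share three of their four words, so their cosine similarity is 0.75, while the gardening post shares nothing with either and scores 0.0. Note that this is exactly the bag-of-words representation that, per the talk, works less well for sentiment than for topic.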

She proposes summarizing the sentiment of reviews by splitting them into sentences, determining which sentences are subjective and which are objective, and incorporating the relationships between sentences into that decision.
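The post does not say how the relationships between sentences are used. One way to incorporate them, assumed here, is a minimum-cut formulation: each sentence has an individual subjectivity score, pairs of related sentences pay a penalty if they are split into different classes, and a minimum s-t cut finds the cheapest labelling. The scores (integers 0 to 100, to avoid floating-point residue in the flow), the association weights, and the function names are all hypothetical.

```python
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; returns the nodes on the source side of a minimum cut."""
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges start at 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # nodes still reachable from the source
        # Push the bottleneck amount of flow along the path found.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


def subjective_sentences(subj_score, assoc):
    """subj_score[i] in 0..100; assoc[(i, k)] = penalty for labelling i and k differently."""
    cap = {"s": {}, "t": {}}
    for i, p in enumerate(subj_score):
        cap["s"][i] = p                       # cut (paid) if i is labelled objective
        cap.setdefault(i, {})["t"] = 100 - p  # cut (paid) if i is labelled subjective
    for (i, k), w in assoc.items():
        cap[i][k] = cap[i].get(k, 0) + w
        cap[k][i] = cap[k].get(i, 0) + w
    side = min_cut_source_side(cap, "s", "t")
    return sorted(n for n in side if n not in ("s", "t"))
```

With scores `[90, 45, 10]` and no associations, only sentence 0 is labelled subjective; adding a strong association `{(0, 1): 20}` pulls the borderline sentence 1 into the subjective set as well, since keeping the pair together is cheaper than splitting it.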
