[Apologies for multiple postings]
**************************************************
*
* GRAND Seminar
* http://cs.gmu.edu/~jmlien/seminar/
*
* Tuesday, March 31, 2009, 12:00 pm
* Room 430A ST2
*
**************************************************
*Title*
Towards a Universal Text Classifier: Transfer Learning from Encyclopedic
Knowledge
*Speaker*
Pu Wang
Ph.D. Student
Department of Computer Science
GMU
*Abstract*
Document classification is a key task for many text mining applications.
However, traditional text classification requires labeled data to
construct reliable and accurate classifiers. Unfortunately, labeled data
are seldom available, and often too expensive to obtain. In this work,
we propose a universal text classifier, which does not require any
labeled training documents. Our approach simulates the capability of
people to classify documents based on background knowledge. As such, we
build a classifier that can effectively group documents based on their
content, under the guidance of a few words describing the classes of
interest. Background knowledge is modeled using encyclopedic knowledge,
namely Wikipedia. Wikipedia's articles related to the specific problem
domain at hand are selected, and used during the learning process for
predicting labels of test documents. The universal text classifier can
also be used to perform document retrieval, in which the pool of test
documents may or may not be relevant to the topics of interest for the
user. In our experiments with real data, we test the feasibility of our
approach for both the classification and retrieval tasks. The results
demonstrate the advantage of incorporating background knowledge through
Wikipedia, and the effectiveness of modeling such knowledge via
probabilistic topic modeling. The accuracy achieved by the universal
text classifier is comparable to that of a supervised learning technique
for transfer learning.
*Short Bio*
Pu Wang is a Ph.D. student in the Department of Computer Science at George
Mason University. He received a Master's degree in Computer Science from
Beijing University in 2007, and was an intern at Microsoft Research Asia
from May 2006 to July 2007. His research interest is in machine
learning, currently focusing on graph learning.
--
---------------------------------------------------------------
Jyh-Ming Lien
Assistant Professor
Department of Computer Science
George Mason University, MSN 4A5 http://cs.gmu.edu/~jmlien
Fairfax, VA, 22030, USA tel: +1-703-993-9546
---------------------------------------------------------------