PhD in Information Technology Doctoral Dissertation Defense Announcement
Candidate: Juan (Judy) Luo
Bachelor of Science, Wuhan University, 1997
Master of Science, University of Nebraska-Lincoln, 2003
REGRESSION LEARNING IN DECISION GUIDANCE SYSTEMS: MODELS, LANGUAGES AND
ALGORITHMS
Monday, April 30, 2012
3:00 PM - 5:00 PM
Nguyen Engineering Building, Room 2901
All are invited to attend.
Committee
Alexander Brodsky, Chair
Larry Kerschberg
Carlotta Domeniconi
Ruixin Yang
Abstract
State-of-the-art research in decision guidance applications aims to
build complex systems with predictive capability. This dissertation
presents a framework, models, languages, and algorithms that integrate
machine learning functionality (regression learning) into DGMS
applications as a first-class citizen.
A framework, CoReJava (Constraint Optimization Regression in Java), is
proposed and developed. It extends the Java programming language with
regression learning, that is, the ability to estimate the parameters of
a function. CoReJava is unique in that functional forms for regression
analysis are expressed as first-class citizens, i.e., as Java programs
in which some parameters are not given in advance but are learned from
learning data sets provided as input. The if-then-else decision
structures of the Java language are naturally adopted to represent
piecewise functional forms of regression. As a result, minimizing the
sum of squared errors becomes an optimization problem whose search space
is exponential in the size of the learning set. A combinatorial
restructuring algorithm is proposed that guarantees learning optimality
and reduces the search space to be polynomial in the size of the
learning set, though still exponential in the number of piecewise
bounds. A Heaviside restructuring algorithm, which expresses the
piecewise linear regression function in a single unified functional form
instead of multiple pieces, is proposed to reduce the search complexity
further, to polynomial in both the size of the learning set and the
number of piecewise bounds, at the cost that the learning outcome is
only an approximation of the optimum.
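To make the piecewise setting concrete, the following minimal sketch
fits a two-piece linear function with a single bound by trying every
split of the sorted learning set and keeping the one with the least
total squared error. It is an illustration only, not the CoReJava
implementation or the combinatorial restructuring algorithm itself; the
names fit_line, fit_two_pieces, and predict and the synthetic data are
assumptions made for this example.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in points)
    return a, b, sse


def fit_two_pieces(points, min_size=2):
    """Try every split of the sorted data; keep the least total SSE."""
    pts = sorted(points)
    best = None
    for i in range(min_size, len(pts) - min_size + 1):
        a1, b1, e1 = fit_line(pts[:i])
        a2, b2, e2 = fit_line(pts[i:])
        if best is None or e1 + e2 < best["sse"]:
            # Place the bound halfway between the two neighboring points.
            bound = (pts[i - 1][0] + pts[i][0]) / 2
            best = {"bound": bound, "left": (a1, b1),
                    "right": (a2, b2), "sse": e1 + e2}
    return best


def predict(model, x):
    # The if-then-else mirrors how a piecewise form reads as ordinary code.
    a, b = model["left"] if x <= model["bound"] else model["right"]
    return a + b * x


# Synthetic learning set (hypothetical): y = x up to x = 4, then y = 20 - 2x.
data = [(x, float(x)) for x in range(0, 5)]
data += [(x, 20.0 - 2 * x) for x in range(5, 10)]
model = fit_two_pieces(data)
```

With one bound there are at most n candidate splits for n data points;
with k bounds an exhaustive search of this kind tries on the order of
n^k combinations, which is polynomial in the size of the learning set
but exponential in the number of bounds.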
A multi-step, Expectation Maximization based (EM-based) algorithm
(EMMPSR) is proposed to solve the piecewise surface regression problem.
The steps are: local regression on each data point of the training set
together with a small set of its nearest neighbors; clustering in the
feature vector space formed from the local regressions; regression
learning for each individual surface; and classification to determine
the boundaries of each surface. An EM-based iteration process is
introduced in the regression learning phase to improve the learning
outcome: the cluster identifier of every data point in the training set
is reassigned according to the predictive performance of each submodel.
Clustering validity indices are applied when the number of piecewise
surfaces is not given in advance.
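The iteration in the regression learning phase can be sketched as an
alternation between refitting one submodel per cluster and reassigning
each point to the submodel that predicts it best. This is a simplified
one-dimensional illustration, not the EMMPSR algorithm: the initial
labels are simply given here rather than derived from local regression
and clustering, the final classification step is omitted, and the names
fit_line and em_style_piecewise are assumptions made for this example.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0
    return mean_y - b * mean_x, b


def em_style_piecewise(points, labels, k, max_iter=50):
    """Alternate between fitting one line per cluster and reassigning."""
    labels = list(labels)
    models = [None] * k
    for _ in range(max_iter):
        # Fit step: refit each submodel on the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if len(members) >= 2:
                models[c] = fit_line(members)
            elif models[c] is None:
                raise ValueError("cluster %d has too few points" % c)
        # Reassignment step: move each point to the best-predicting submodel.
        new_labels = [
            min(range(k),
                key=lambda c: (y - (models[c][0] + models[c][1] * x)) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return models, labels


# Synthetic training set (hypothetical): y = x, then y = 20 - 2x.
data = [(x, float(x)) for x in range(0, 5)]
data += [(x, 20.0 - 2 * x) for x in range(5, 10)]
# Deliberately imperfect starting labels: two points start in the wrong cluster.
start = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
models, labels = em_style_piecewise(data, start, k=2)
```

On this data the loop stops once a reassignment pass leaves every label
unchanged, at which point each submodel has been fitted to exactly the
points it predicts best.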
A relational database management system (RDBMS) is also extended with
the piecewise regression learning capability. The functional forms are
represented as database tables, and the EMMPSR algorithm is implemented
as stored procedures. A case study describes the decision optimization
process based on the learning outcome of the EMMPSR algorithm. The
research is evaluated through experiments and empirical analysis, in
comparison with related regression learning packages.
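As a rough illustration of representing a functional form as a database
table, the sketch below stores a learned piecewise linear function one
row per piece and evaluates it with a query. The schema, the table name
piece, and the predict helper are hypothetical and are not taken from
the dissertation. SQLite is used only so that the example is
self-contained; because SQLite has no stored procedures, a Python
function stands in for one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per piece: the interval [x_min, x_max) and the line's parameters.
conn.execute(
    "CREATE TABLE piece ("
    " piece_id INTEGER PRIMARY KEY,"
    " x_min REAL, x_max REAL, intercept REAL, slope REAL)"
)
conn.executemany(
    "INSERT INTO piece (x_min, x_max, intercept, slope) VALUES (?, ?, ?, ?)",
    [(0.0, 4.5, 0.0, 1.0), (4.5, 10.0, 20.0, -2.0)],
)


def predict(conn, x):
    """Evaluate the stored piecewise function at x; None if no piece covers x."""
    row = conn.execute(
        "SELECT intercept + slope * ? FROM piece"
        " WHERE ? >= x_min AND ? < x_max",
        (x, x, x),
    ).fetchone()
    return None if row is None else row[0]
```

Keeping the parameters in a table means that a learning procedure can
update a model by rewriting rows, and that predictions can be joined
against other relational data.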
A copy of this doctoral dissertation is on reserve at the Johnson Center
Library.