PhD in Information Technology
Doctoral Dissertation Defense Announcement

Candidate: Juan (Judy) Luo

Bachelor of Science, Wuhan University, 1997

Master of Science, University of Nebraska-Lincoln, 2003

REGRESSION LEARNING IN DECISION GUIDANCE SYSTEMS: MODELS, LANGUAGES AND ALGORITHMS

Monday, April 30, 2012

3:00 PM - 5:00 PM

Nguyen Engineering Building, Room 2901

All are invited to attend.

Committee

Alexander Brodsky, Chair

Larry Kerschberg

Carlotta Domeniconi

Ruixin Yang

Abstract

State-of-the-art research in decision guidance applications aims to build complex systems with predictive capability. This dissertation focuses on a framework, models, languages, and algorithms that integrate machine learning functionality (regression learning) into Decision Guidance Management System (DGMS) applications as a first-class citizen.

CoReJava (Constraint Optimization Regression in Java), a framework that extends the Java programming language with regression learning, i.e., the ability to estimate the parameters of a function, is proposed and developed. CoReJava is unique in that functional forms for regression analysis are expressed as first-class citizens, i.e., as Java programs in which some parameters are not given in advance but are learned from learning data sets provided as input. The if-then-else decision structures of the Java language are naturally adopted to represent piecewise functional forms of regression. As a result, minimizing the sum of squared errors involves an optimization problem with a search space that is exponential in the size of the learning set. A combinatorial restructuring algorithm is proposed that guarantees learning optimality and reduces the search space to be polynomial in the size of the learning set, but exponential in the number of piecewise bounds. A Heaviside restructuring algorithm, which expresses the piecewise linear regression function in a single unified functional form instead of multiple pieces, is proposed to reduce the search complexity further, to polynomial in both the size of the learning set and the number of piecewise bounds, at the cost of a learning outcome that only approximates the optimum.
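
To make the idea of a piecewise functional form with learned parameters concrete, here is a minimal sketch in Python rather than Java. It is not CoReJava's API and not the combinatorial or Heaviside restructuring algorithm from the dissertation. It assumes a single bound and one input variable, and it finds the bound by plain exhaustive search over splits of the sorted learning set, which keeps the number of candidates linear in the size of the learning set. All function names (`fit_line`, `piecewise`, `learn_piecewise`) and the sample data are illustrative assumptions.

```python
def fit_line(points):
    """Ordinary least squares for y = a*x + b; returns (a, b, sse)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0
    b = mean_y - a * mean_x
    sse = sum((y - (a * x + b)) ** 2 for x, y in points)
    return a, b, sse


def piecewise(x, bound, left, right):
    """Piecewise form written as if-then-else; bound, left, right are learned."""
    if x <= bound:
        return left[0] * x + left[1]
    else:
        return right[0] * x + right[1]


def learn_piecewise(data, min_points=2):
    """Try every split of the sorted data into two contiguous groups and
    keep the one with the lowest total sum of squared errors."""
    data = sorted(data)
    best = None
    for i in range(min_points, len(data) - min_points + 1):
        if data[i - 1][0] == data[i][0]:
            continue  # no bound can separate two points with equal x
        a1, b1, sse1 = fit_line(data[:i])
        a2, b2, sse2 = fit_line(data[i:])
        total = sse1 + sse2
        if best is None or total < best[0]:
            bound = (data[i - 1][0] + data[i][0]) / 2  # midpoint of the gap
            best = (total, bound, (a1, b1), (a2, b2))
    if best is None:
        raise ValueError("not enough distinct data points for two pieces")
    return best


# Illustrative learning set: y = 2x + 1 for x < 5, y = -x + 20 for x >= 5.
data = [(x, 2.0 * x + 1.0) for x in range(0, 5)]
data += [(x, -1.0 * x + 20.0) for x in range(5, 10)]
sse, bound, left, right = learn_piecewise(data)
```

With more than one bound, the same exhaustive strategy has to consider combinations of splits, so the number of candidates grows with the number of bounds. This is the kind of cost the abstract describes as exponential in the number of piecewise bounds.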

A multi-step Expectation Maximization based (EM-based) algorithm (EMMPSR) is proposed to solve the piecewise surface regression problem. The steps involved are local regression on each data point of the training data set together with a small set of its closest neighbors, clustering in the feature vector space formed from the local regressions, regression learning for each individual surface, and classification to determine the boundaries of each individual surface. An EM-based iteration process is introduced in the regression learning phase to improve the learning outcome. In each iteration, the cluster identifier of every data point in the training set is reassigned according to the predictive performance of each submodel. Cluster validity indices are applied when the number of piecewise surfaces is not given in advance.
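
The steps listed above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the EMMPSR implementation. It uses one input variable, fixes the number of surfaces at two, runs plain k-means with a fixed initialization, uses a hard-assignment EM-style loop, and replaces the boundary classifier with a 1-nearest-neighbour rule. All function names and the sample data are hypothetical.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b to (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def local_features(data, k=3):
    """Step 1: fit a line to each point plus its k nearest neighbours (by x)."""
    feats = []
    for x0, _ in data:
        nearest = sorted(data, key=lambda p: abs(p[0] - x0))[:k + 1]
        feats.append(fit_line(nearest))
    return feats


def kmeans(feats, centers, rounds=20):
    """Step 2: plain k-means on the (slope, intercept) feature vectors."""
    centers = list(centers)
    labels = [0] * len(feats)
    for _ in range(rounds):
        labels = [min(range(len(centers)),
                      key=lambda c: sum((f - g) ** 2
                                        for f, g in zip(feat, centers[c])))
                  for feat in feats]
        for c in range(len(centers)):
            members = [f for f, l in zip(feats, labels) if l == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def em_refine(data, labels, n_models, max_iter=50):
    """Steps 3-4: fit one line per cluster, reassign every point to the
    submodel that predicts it best, and repeat until nothing changes."""
    models = []
    for _ in range(max_iter):
        models = []
        for c in range(n_models):
            members = [p for p, l in zip(data, labels) if l == c]
            # Assumption: an empty cluster falls back to the zero line.
            models.append(fit_line(members) if members else (0.0, 0.0))
        new_labels = [min(range(n_models),
                          key=lambda c: (y - (models[c][0] * x + models[c][1])) ** 2)
                      for x, y in data]
        if new_labels == labels:
            break
        labels = new_labels
    return models, labels


def predict(x, data, labels, models):
    """Step 5 (simplified): the nearest training point decides the surface."""
    nearest = min(range(len(data)), key=lambda i: abs(data[i][0] - x))
    a, b = models[labels[nearest]]
    return a * x + b


# Illustrative training set: y = 2x + 1 for x < 5, y = -x + 20 for x >= 5.
data = [(float(x), 2.0 * x + 1.0) for x in range(0, 5)]
data += [(float(x), -1.0 * x + 20.0) for x in range(5, 10)]

feats = local_features(data)
labels = kmeans(feats, centers=[feats[0], feats[-1]])
models, labels = em_refine(data, labels, n_models=2)
```

In this sketch the local regressions near the break mix points from both surfaces, so the initial clustering mislabels two points. The reassignment loop corrects them because the other submodel predicts those points with a smaller squared error.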

The Relational Database Management System (RDBMS) is extended with the piecewise regression learning capability as well. Functional forms are represented as database tables, and the EMMPSR algorithm is implemented as stored procedures. A case study describes the decision optimization process based on the learning outcome of the EMMPSR algorithm.
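
As a rough illustration of what storing a functional form in a database table could look like, the sketch below keeps one row per piece of a learned piecewise linear function and evaluates it with a query. The schema, the table name `piece`, the coefficient values, and the use of SQLite from Python are all assumptions made for this example. The dissertation's actual schema and stored procedures are not described in this announcement, and SQLite itself has no stored procedures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: each row is one linear piece, valid on
# lower_x <= x < upper_x.
conn.execute("""
    CREATE TABLE piece (
        piece_id  INTEGER PRIMARY KEY,
        lower_x   REAL NOT NULL,
        upper_x   REAL NOT NULL,
        slope     REAL NOT NULL,
        intercept REAL NOT NULL
    )
""")

# Illustrative learned parameters, not results from the dissertation.
conn.executemany(
    "INSERT INTO piece (lower_x, upper_x, slope, intercept) VALUES (?, ?, ?, ?)",
    [(0.0, 4.5, 2.0, 1.0), (4.5, 10.0, -1.0, 20.0)],
)


def predict(conn, x):
    """Evaluate the stored piecewise function at x; None if no piece covers x."""
    row = conn.execute(
        "SELECT slope * ? + intercept FROM piece "
        "WHERE lower_x <= ? AND ? < upper_x",
        (x, x, x),
    ).fetchone()
    return None if row is None else row[0]
```

Keeping the pieces in a table means prediction is an ordinary query, so it can be joined with other relational data in a decision optimization step.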

The research is evaluated through experiments and empirical analysis, in comparison with related regression learning packages.

A copy of this doctoral dissertation is on reserve at the Johnson Center Library.