PhD in Information Technology Doctoral Dissertation Defense Announcement
Candidate: Juan (Judy) Luo

Bachelor of Science, Wuhan University, 1997
Master of Science, University of Nebraska-Lincoln, 2003


Monday, April 30, 2012

3:00 PM - 5:00 PM
Nguyen Engineering Building, Room 2901
All are invited to attend.

Alexander Brodsky, Chair
Larry Kerschberg

Carlotta Domeniconi
Ruixin Yang


State-of-the-art research in decision guidance applications aims to 
build complex systems with predictive capability. This dissertation 
focuses on a framework, models, languages, and algorithms for 
integrating machine learning functionality (regression learning) into 
decision guidance management system (DGMS) applications as a 
first-class citizen.

CoReJava (Constraint Optimization Regression in Java), a framework that 
extends the Java programming language with regression learning, i.e., 
the ability to estimate the parameters of a function, is proposed and 
developed. CoReJava is unique in that functional forms for regression 
analysis are expressed as first-class citizens, i.e., as Java programs 
in which some parameters are not given in advance but are learned 
from learning data sets provided as input. The if-then-else decision 
structures of the Java language are naturally adopted to represent 
piecewise functional forms of regression. As a result, minimizing the 
sum of squared errors becomes an optimization problem whose search 
space is exponential in the size of the learning set. A combinatorial 
restructuring algorithm is proposed that guarantees learning optimality 
and reduces the search space to be polynomial in the size of the 
learning set, but exponential in the number of piecewise bounds. A 
Heaviside restructuring algorithm, which expresses the piecewise linear 
regression function in a single unified functional form instead of 
multiple pieces, is proposed to reduce the search complexity further to 
be polynomial in both the size of the learning set and the number of 
piecewise bounds, while the learning outcome will be an approximation of the 

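To illustrate the idea of replacing if-then-else pieces with one unified 
expression, here is a minimal sketch. It is not CoReJava syntax and not 
the dissertation's algorithm; Python is used only for brevity, and the 
function names, the two-piece continuous form, and the sample data are 
all assumptions made for this example.

```python
def heaviside(z):
    """Step function: 1.0 when z >= 0, otherwise 0.0."""
    return 1.0 if z >= 0 else 0.0


def piecewise_if_else(x, a, b, c, t):
    # Two-piece continuous linear function written with an explicit branch.
    # t is the bound between pieces; c is the change of slope at t.
    if x < t:
        return a + b * x
    else:
        return a + b * x + c * (x - t)


def piecewise_heaviside(x, a, b, c, t):
    # The same function as a single expression: the step term switches
    # the slope change c on once x reaches the bound t.
    return a + b * x + c * (x - t) * heaviside(x - t)


def sum_squared_errors(f, params, data):
    # Objective that regression learning would minimize over params.
    return sum((y - f(x, *params)) ** 2 for x, y in data)


# Hypothetical learning set generated from a=1, b=1, c=2, t=2.
data = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 6.0), (4.0, 9.0)]
params = (1.0, 1.0, 2.0, 2.0)
sse = sum_squared_errors(piecewise_heaviside, params, data)
```

Both forms compute the same values; the unified form simply removes the 
branch, so the objective is one expression in the parameters rather than 
one expression per combination of pieces.
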
A multi-step Expectation Maximization based (EM-based) algorithm 
(EMMPSR) is proposed to solve the piecewise surface regression problem. The 
multiple steps involved are local regression on each data point of the 
training data set and a small set of its closest neighbors, clustering 
on the feature vector space formed from the local regression, regression 
learning for each individual surface, and classification to determine 
the boundaries for each individual surface. An EM-based iteration 
process is introduced in the regression learning phase to improve the 
learning outcome. Each data point in the training set is reassigned a 
cluster identifier based on the predictive performance of each 
submodel. Clustering validity indices are applied when the number of 
piecewise surfaces is not given in advance.
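
The steps above can be sketched roughly as follows. This is only an 
illustrative approximation under stated assumptions, not the EMMPSR 
implementation: it uses NumPy, linear submodels, plain k-means with a 
farthest-first initialization, and a hard reassignment by smallest 
squared residual. The classification step for surface boundaries and the 
validity indices are omitted, and all function names are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y ~ w[0] + X @ w[1:]; returns w."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict(w, X):
    return np.column_stack([np.ones(len(X)), X]) @ w


def local_features(X, y, k):
    """Local regression on each point and its k nearest neighbors."""
    feats = []
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        idx = np.argsort(dist)[:k + 1]  # the point itself plus k neighbors
        feats.append(fit_linear(X[idx], y[idx]))
    return np.array(feats)


def kmeans(F, n_clusters, n_iter=50):
    """Plain k-means on the local coefficient vectors (assumed clusterer)."""
    centers = [F[0]]
    while len(centers) < n_clusters:  # farthest-first initialization
        d = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((F[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(d2, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = F[labels == c].mean(axis=0)
    return labels


def em_refine(X, y, labels, n_clusters, n_iter=20):
    """Alternate per-cluster regression and reassignment by residual."""
    n_params = X.shape[1] + 1
    models = [np.zeros(n_params) for _ in range(n_clusters)]
    for _ in range(n_iter):
        for c in range(n_clusters):
            if np.sum(labels == c) >= n_params:  # enough points to refit
                models[c] = fit_linear(X[labels == c], y[labels == c])
        resid = np.stack([(y - predict(w, X)) ** 2 for w in models], axis=1)
        new_labels = np.argmin(resid, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return models, labels


# Hypothetical noise-free data: two linear pieces meeting at x = 1.
X = np.linspace(0.0, 2.0, 41).reshape(-1, 1)
y = np.where(X[:, 0] < 1.0, 2.0 * X[:, 0] + 1.0, -3.0 * X[:, 0] + 6.0)

features = local_features(X, y, k=4)
labels = kmeans(features, n_clusters=2)
models, labels = em_refine(X, y, labels, n_clusters=2)
```

On this toy data the two fitted submodels recover the two generating 
lines; real data with noise would need the boundary classification and 
model-selection steps described above.
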

A relational database management system (RDBMS) is also extended with 
the piecewise regression learning capability. The functional forms 
are represented as database tables, and the EMMPSR algorithm is 
implemented as stored procedures. A case study describes the 
decision optimization process based on the learning outcome of the 
EMMPSR algorithm.

The research is evaluated through experiments and empirical analysis, 
with results compared against those of related regression learning 
packages.

A copy of this doctoral dissertation is on reserve at the Johnson Center