Please join us for our first KRYPTON seminar of the Fall 2021 semester. We will meet at the usual time, 1600h, on Friday September 10, 2021. We will continue to have a virtual option, but will also meet in person for the first time in 18 months.
You can check Krypton events through our calendar at:
https:[log in to unmask]&ctz=America/New_York&pli=1

Krypton Seminar Series - Fall 2021

Date: September 10, 2021
Time: 4:00PM - 5:30PM

Venue: Zoom and Room 2241 ENGR

Link for remote participation:
Redouane Bertrouni
Ph.D. candidate in Computational Sciences and Informatics, GMU

Title — 
PCA Systematic Sampling: A New Sampling Method to Control Overfitting

Abstract — 
In this presentation, I will present my Ph.D. dissertation's first contribution. I developed a novel sampling method called PCA Systematic sampling, an improved form of stratified systematic sampling, to optimally split data into training and testing subsets. This procedure helps machine learning algorithms avoid the classical mistake of overfitting. While it might be slightly more computationally expensive, it makes up for this apparent weakness by giving a better estimate of test error and improving prediction accuracy. The dissertation provides computational and theoretical evidence to support the benefit of the proposed sampling design over the traditional approach. Examples and mathematical evidence are presented to show how traditional splitting methods, such as using simple random sampling to partition data, can distort the relationship between important covariates and the variable of interest in the test dataset and, as a consequence, lead to either poor model construction or poor assessment of model fit.
In this presentation, I will also present my Ph.D. dissertation's second contribution. I create a sampling utility score index to assess data splits or sampling designs as a data quality control tool. The dissertation demonstrates the benefits of this index by deriving and studying its mathematical properties. Sensitivity analysis is conducted to investigate how it behaves under different sampling design scenarios. Monte Carlo simulation is used to assess the significance of this index.
Finally, this dissertation contributes to the field of survey sampling and predictive modeling by implementing the newly developed methodology on three distinct publicly available datasets. I show how the new sampling design, named PCA-Systematic, can be applied to real survey data such as the Annual Survey of Public Employment and Payroll (ASPEP) and the American Housing Survey (AHS). I provide evidence of improved estimates in comparison with the traditional methods of systematic sampling. My novel PCA-Stratified-Systematic sampling method developed in this dissertation outperforms current state-of-the-art sampling methods on the classification problem for the Fisher Iris data.

Bio — Redouane holds a Master's in Statistical Science from George Mason University and held multiple teaching positions earlier in his career, including as a GTA for the GMU Statistics Department. He worked as a Statistician in the private sector before joining the U.S. Census Bureau to assist with research and production related to record linkage of federal administrative records. In 2012, he served as a Mathematical Statistician to support the sampling and design of educational and demographic surveys. In 2015, he became a Mathematical Statistician technical lead for the sampling design and estimation of Public Sector Government economic surveys, where he published three JSM papers:
• Impact of Certified Mail on Nonresponse Rates
• The Performance of the Empirical Best Linear Unbiased Predictor in Annual Survey of Local Government Finances
• Outliers Research in the Annual Survey of Local Government Finances
Since 2013, he has consulted for the International Programs area of the Census Bureau, providing technical assistance in survey sampling during multiple overseas workshops with USAID. From late 2019 to the present, he has worked on detail from the Census Bureau at the Office of the U.S. Global AIDS Coordinator (S/GAC), supporting the U.S. President's Emergency Plan for AIDS Relief (PEPFAR) with technical guidance in HIV modeling. He has had the opportunity to work on multiple ad hoc production and research issues related to HIV and COVID-19. He facilitated a virtual UNAIDS workshop for Francophone countries to assist with HIV 2020 estimation. Recently, he presented at the 23rd International AIDS virtual conference and provided technical assistance to the National Statistical Office in Malawi to develop their master sampling frame. He is currently a Ph.D. candidate in Computational Sciences and Informatics at GMU.
Zoom Link
Meeting ID: 923 5693 8266
One tap mobile
+12678310333,,92356938266# US (Philadelphia)
+13017158592,,92356938266# US (Washington DC)

Dial by your location
        +1 267 831 0333 US (Philadelphia)
        +1 301 715 8592 US (Washington DC)
Find your local number:

Dr. Kathryn Laskey/ Dr. Paulo Costa
[log in to unmask] / [log in to unmask]
703-993-1644 / 703-993-9989