Please join us for the next KRYPTON seminar of the Fall 2021 semester. We will meet at the usual time, 1600h, on Friday, October 15, 2021. We will continue to offer a virtual option, and I will be in Room 2241 ENGR for anyone who wants to attend in person.
You can check Krypton events through our calendar at:
https:[log in to unmask]&ctz=America/New_York&pli=1

Krypton Seminar Series - Fall 2021

Date: October 15, 2021
Time: 4:00PM - 5:30PM

Venue: Zoom and Room 2241 ENGR

Link for remote participation: see the Zoom details below.

James Lee
Ph.D. candidate in Systems Engineering and Operations Research, GMU

Title — 
A Systems Engineering Approach to Data Science Project Knowledge Management

Abstract — 
Data scientists currently rely on manual processes: either they create a new solution from scratch, or they search for previous solutions and modify them to fit the new problem. Searching, selecting candidates, and evaluating whether each candidate is an appropriate solution is often time-consuming. Although data science is a relatively new term, it is an interdisciplinary field that spans many mature areas such as database management, statistics, and computer/software engineering. Data scientists increasingly work mainly in Python or R, but many high-quality scripts exist in other languages such as SQL, Java, and JavaScript, each of which has its own strengths in its respective field. Ideally, these existing scripts and knowledge could be stitched together to solve new problems, but interoperability is a challenge. Finally, software development projects face traceability challenges: the actual implementation happens in an environment external to the design documentation repository (i.e., a systems engineering modeling tool) and is seldom linked back to it, leaving a broken link between the design and the implementation of the product.

The objective of this work is to improve the initial solution search and selection process for data science projects, enable interoperability and reuse of existing solutions from different disciplines in a single integrated workflow, and advance traceability between the design and implementation of data science projects. I propose a method for documenting software solutions in a central repository. The method supports efficient and effective prototyping of solutions for problems whose characteristics resemble those of previously solved problems, and it allows modules written in different programming languages to be executed in a single workflow.

A knowledge management framework that uses systems engineering tools, process modeling, and ontologies is proposed for documenting process-oriented data science solutions and generating potential solutions for new problem instances. Using this framework, I document the solutions from the presented case study systematically: I use formal process modeling to document the solution workflows; generalize the solution methods by creating a process template and categorizing each part of a solution as a part of that template; and define traceability links from the problem requirements to each solution method, so that new solutions of the generalized workflow can be instantiated for a new problem and executed in simulation tools. I also provide evidence of the approach's utility by demonstrating efficient and effective prototyping of solutions for new problems and by showing that the proposed method enables software modules written in different programming languages to be executed in a single workflow.

Bio — James Lee is a data scientist at DCI Solutions and a Ph.D. candidate at George Mason University. James received a B.S. in statistics from the University of Michigan and a Master's in Systems Engineering from George Mason University. His academic research focuses on systems architecture, process management, and ontology development. He has researched insider threat detection methods and contributed to the advancement of insider threat detection research. James' recent work focuses on the application of deep learning and natural language processing to anomaly detection in the cybersecurity domain.
Zoom Link
Meeting ID: 923 5693 8266
One tap mobile
+12678310333,,92356938266# US (Philadelphia)
+13017158592,,92356938266# US (Washington DC)

Dial by your location
        +1 267 831 0333 US (Philadelphia)
        +1 301 715 8592 US (Washington DC)
Find your local number:

Dr. Kathryn Laskey / Dr. Paulo Costa
[log in to unmask] / [log in to unmask]
703-993-1644 / 703-993-9989