Data Science as a Means to Expedite Software Behavior Analysis

Presentation September 3, 2021
By Joely Nelson

In the summer of 2021, I had the privilege of working as an R&D intern focused on data science at Sandia’s Center for Cyber Defenders. I joined a team engaged in software vulnerability assessment. For my summer project, I was asked to research the efficacy of data analytics as a means of expediting software behavior analysis. Early results showed that the NLP-based techniques developed during the project, such as n-gram divergence comparisons and UMAP dimensionality reduction, successfully differentiate event logs collected under varying conditions.

Abstract: When programs run on a computer system, they generate event logs that can be analyzed manually to determine the behavior of the program. However, this analysis requires an experienced analyst and can be tedious and time-consuming. This presentation details research done as part of a summer project at Sandia National Laboratories, which examined how effectively data science and analytical techniques can differentiate program behavior from log data. Early results showed that the NLP-based techniques developed during the project, such as n-gram divergence comparisons and UMAP dimensionality reduction, successfully differentiate event logs collected under varying conditions and show promise for use with more complex applications and environments.
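
To give a rough idea of what an n-gram divergence comparison can look like, here is a minimal Python sketch. It is an illustration only, not the code from the project: the abstract does not say which divergence measure or log format was used, so the choice of Jensen-Shannon divergence, the function names `ngram_counts` and `js_divergence`, and the event names in the sample logs are all assumptions.

```python
from collections import Counter
from math import log2


def ngram_counts(events, n=2):
    """Count the n-grams (as tuples) in a sequence of event names."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def js_divergence(counts_a, counts_b):
    """Jensen-Shannon divergence (base 2) between two n-gram Counters.

    Assumed measure for illustration: 0.0 means the two n-gram
    distributions are identical, 1.0 means they share no n-grams.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each log must contain at least one n-gram")
    divergence = 0.0
    for gram in set(counts_a) | set(counts_b):
        p = counts_a[gram] / total_a  # Counter returns 0 for unseen n-grams
        q = counts_b[gram] / total_b
        m = (p + q) / 2               # mixture of the two distributions
        if p > 0:
            divergence += 0.5 * p * log2(p / m)
        if q > 0:
            divergence += 0.5 * q * log2(q / m)
    return divergence


# Hypothetical event logs: two runs under the same condition, one under another.
baseline = ["open", "read", "read", "close"] * 3
repeat = ["open", "read", "read", "close"] * 3
altered = ["open", "write", "connect", "close"] * 3

same = js_divergence(ngram_counts(baseline), ngram_counts(repeat))
different = js_divergence(ngram_counts(baseline), ngram_counts(altered))
print(f"same condition:      {same:.3f}")
print(f"different condition: {different:.3f}")
```

In this sketch, logs collected under the same condition score at or near zero, while logs with different event patterns score higher. That gap is the kind of signal a comparison like this relies on to tell runs apart.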

See the presentation slides here

Written on September 3, 2021