Titles and Abstracts
Short Course
Robert Gramacy, University of Chicago
Bayesian Inference
This course will cover the basics of the Bayesian approach to practical and coherent statistical
inference. Particular attention will be paid to computational aspects, including MCMC.
Examples and practical hands-on exercises will run the gamut from toy illustrations to real-world
data analysis from all areas of science, with R implementations and coaching provided.
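The MCMC methods the course covers can be illustrated with a minimal random-walk Metropolis sampler. This is a generic sketch, not course material; the target density (a standard normal, specified up to a constant) and the proposal step size are arbitrary illustrative choices:

```python
import math
import random

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' = x + Normal(0, step^2),
    accept with probability min(1, target(x') / target(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)
        # Compare on the log scale for numerical stability
        if math.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x)
    return samples

# Target: standard normal, log density up to an additive constant
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
mean = sum(draws) / len(draws)  # should settle near 0
```

The same accept/reject logic underlies far more elaborate samplers; only the proposal and the (log) target change.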
Keynote Presentation
Catherine Warner, Institute for Defense Analyses
Rigorous, Defensible, Efficient Testing
DOT&E employs rigorous statistical methodologies in its assessments of test adequacy
and in its operational evaluations to provide sound analysis of test results to
acquisition decision makers. DOT&E has taken the initiative to promote and require
these scientific best practices in operational testing and in the analysis of the
data gathered from these well-designed tests. The methodologies not only provide
a rigorous and defensible coverage of the operational space; they also allow us to
quantify the trade-space between testing more and answering complex questions about
system performance. The results have been better test plans, more confidence in the
information provided to acquisition and Service officials, and improved test efficiency.
These methods encapsulate the need to "do more without more," especially in light of a
highly constrained fiscal environment. Indeed, the final result of this policy is to
ensure that we optimize scarce resources by neither over-testing nor under-testing
weapon systems; provide sound rationale for the level of testing prescribed; and
gather the data needed to provide warfighters with a basis for trust in evaluations
of the performance of those weapon systems.
While there has been a lot of progress, much work remains. The implementation
of these techniques is still far from widespread across all DoD T&E communities.
Furthermore, we are currently not leveraging these methods in a sequential fashion
to improve knowledge as we move from developmental testing to operational testing.
In this talk I will discuss DOT&E's efforts to institutionalize the use of statistical
methods in DoD T&E. I will share case studies highlighting successful examples and
areas for improvement.
Invited Presentations
Sallie Keller, Professor of Statistics and Director, Social and Decision Analytics
Laboratory, Virginia Bioinformatics Institute at Virginia Tech
Data, Models, and Analytics
Data, models, and analytics have become ubiquitous across all areas of science and
engineering. They are at the heart of scientific discovery and responsible policy and
strategy development. The language used to describe data, models, and analytics is also
pervasive, opening opportunities for miscommunication of major results and lack of rigor
in drawing conclusions. Today's focus on "big data" and "data science" is exacerbating
this situation. This talk will discuss data, models, and analytics with a focus on placing
these concepts in a crisp problem-solving framework. Special emphasis will be given to
the collection of skills needed for the analysis of complex scenarios faced by defense
and security analysts.
Edward Wegman, George Mason University
Malware Static Analysis Techniques
Using a Multidisciplinary Approach
Most research on malware detection dismisses signatures as a thing of the past,
accusing them of a weak ability to detect new zero-day malware. This may indeed
be the case under the classic definition of signatures, which renders each
signature specific to a single malicious executable binary. But what if a
signature grouped many malicious executables under a single pattern? That would
make signatures a valuable defense in the fight against malware. To create such
signatures we need to develop new methods and techniques that constantly advance
the state of the art, as malware becomes more and more elusive under old methods
and approaches.
The methods I will discuss not only give a good chance of creating effective
signatures for malware; they also provide something just as important: an
automated approach to understanding key characteristics of the analyzed malware.
This work has resulted in Patent US 9,092,229.
This is joint work with Muhammed Aljammaz.
Bradley Jones, SAS/JMP
Design Oriented Modeling: A Powerful Approach for
Model Selection for Definitive Screening Designs
Optimal designs implement the idea of model oriented design. That is, an
optimal design maximizes the information about a specified model. Designed
experiments often have strong symmetry (such as orthogonal columns). This
suggests that analytical methods for designed experiments could profitably
take advantage of what is already known about their structure. I call this
idea design oriented modeling.
Definitive Screening Designs (DSDs) have a special structure with many desirable
properties. They have orthogonal main effects, and main effects are also orthogonal
to all second order effects. DSDs with more than five factors project onto any three
factors to enable efficient fitting of a full quadratic model. However, analytical
methods for DSDs employ generic tools invented for regression analysis of observational
data. These approaches do not take advantage of all the useful structure that DSDs provide.
This talk introduces an analytical approach for DSDs that does take explicit advantage
of the special structure of DSDs. To make the methodology clear, a step-by-step procedure
for analysis will be presented using specific examples.
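The structural properties described above can be checked numerically. The sketch below is an illustration, not the talk's methodology: it builds the 13-run, six-factor DSD by the fold-over construction of Jones and Nachtsheim (a conference matrix, its mirror image, and a center run) and verifies that main effects are orthogonal to one another and to all second-order effects:

```python
import numpy as np

# A 6x6 conference matrix: zero diagonal, +/-1 elsewhere, C.T @ C = 5*I
C = np.array([
    [ 0,  1,  1,  1,  1,  1],
    [ 1,  0,  1, -1, -1,  1],
    [ 1,  1,  0,  1, -1, -1],
    [ 1, -1,  1,  0,  1, -1],
    [ 1, -1, -1,  1,  0,  1],
    [ 1,  1, -1, -1,  1,  0],
])

# Fold-over construction: C, its mirror image -C, and an overall center run
D = np.vstack([C, -C, np.zeros((1, 6))])  # 13 runs, 6 factors

# Main effects are mutually orthogonal: D.T @ D is a multiple of the identity
assert np.allclose(D.T @ D, 10 * np.eye(6))

# Main effects are orthogonal to every quadratic and two-factor interaction
# effect -- a direct consequence of the mirror-image (fold-over) structure
for i in range(6):
    for j in range(6):
        assert abs(D[:, i] @ (D[:, j] ** 2)) < 1e-12          # x_i vs x_j^2
        for k in range(j + 1, 6):
            assert abs(D[:, i] @ (D[:, j] * D[:, k])) < 1e-12  # x_i vs x_j*x_k
```

The fold-over is what does the work: pairing each run c with its mirror -c cancels every term that is odd in the main effects against the even second-order terms.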
Special Session 1: Text Mining
Organizer: Ed Wegman, George Mason University
David Banks, Duke University
Mining Text Networks
The last decade has seen substantial progress in topic modeling, and considerable progress in the study of dynamic
networks. This research combines these threads, so that the network structure informs topic discovery and the
identified topics predict network behavior. The data consist of text and links from all U.S. political blogs
curated by Technorati during the calendar year 2012. We find cascading patterns of discussion driven by events,
and interpretable topic distributions. A particular advantage of the model used in this research is that it
naturally enforces cluster structure in the topics, through a block model for the bloggers.
Michael James Garrity
Modeling the Evolution and Influence of Bloggers' Topic Sentiment and Opinion
Profile
Online blogging has become a popular way for people to share their opinions with a general audience. It also provides
a forum in which others can leave feedback and recommend blogs to other users. Over time, certain users become popular
with high numbers of recommendations and comments, and this may lead to increased influence on other bloggers’ sentiment
within the blogging community. Given the large number of users and topics being discussed in any blogosphere, it can be
difficult to easily gauge patterns and trends in public sentiment, as well as any level of influence certain bloggers may
have. In the following work, we introduce new techniques to model the evolving sentiment and blogger influence within a
blogosphere.
Yasmin H. Said
Preserving Semantic Content in Text Mining Using Multigrams
Text mining can be thought of as a synthesis of information retrieval, natural language processing,
and statistical data mining. The set of documents being considered can scale to hundreds of thousands and the
associated lexicon can be a million or more words. Information retrieval often focuses on the so-called vector
space model. Clearly, the vector space model can involve very high-dimensional vector spaces. Analysis using the
vector space model is done by consideration of a term-document matrix. However, the term-document matrix basically
is a bag-of-words approach capturing little semantic content. We have been exploring bigrams and trigrams as a
method for capturing some semantic content and generalizing the term-document matrix to a bigram-document matrix.
The cardinality of the set of bigrams is in general not as large as C(n,2); it is nonetheless usually considerably
larger than n, where n is the number of words in the lexicon.
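The generalization from a term-document matrix to a bigram-document matrix can be sketched in a few lines. This is a toy illustration on a made-up two-document corpus, not the authors' implementation:

```python
from collections import Counter

def bigram_document_matrix(docs):
    """Build a bigram-document count matrix from tokenized documents.
    Rows are bigrams (adjacent word pairs); columns are documents."""
    counts = [Counter(zip(toks, toks[1:])) for toks in docs]
    bigrams = sorted(set(bg for c in counts for bg in c))
    matrix = [[c[bg] for c in counts] for bg in bigrams]
    return bigrams, matrix

# Toy corpus: two pre-tokenized documents
docs = [
    "text mining is data mining over text".split(),
    "data mining uses statistical data mining tools".split(),
]
bigrams, M = bigram_document_matrix(docs)
# The row for ("data", "mining") counts that bigram once in the first
# document and twice in the second.
```

Trigrams work the same way with `zip(toks, toks[1:], toks[2:])`; the resulting matrix retains word-order information that the plain bag-of-words term-document matrix discards.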
show the results over time. These maps can be displayed
on such devices as iPads or smartphones.