Statistical Data Mining

Dr. Edward J. Wegman
George Mason University
October 28-29

Abstract:

Overview: Data mining extracts knowledge hidden in large volumes of raw data and has been applied in recent years to a wide variety of research areas, such as medical imaging, high-energy physics, consumer behavior, atmospheric sciences, and security and surveillance. Data mining is an extension of exploratory data analysis and has essentially the same goal: the discovery of unknown and unanticipated structure in the data. The chief distinction between the two lies in the size and dimensionality of the data sets involved. Data mining generally deals with much more massive data sets, for which highly interactive analysis is not fully feasible. In this course we shall discuss the scales of data set sizes and the limits of feasibility at each scale. We will introduce some visualization tools and indicate how they may be used to accomplish data mining tasks. We shall review some structure-finding algorithms, including density estimation and bump hunting; clustering and classification; visual clustering strategies; CART and related methods; time-domain time series methods; and nonparametric regression, including convolution, LOESS, and ridges and skeletons methods. These methods will be illustrated with application to several data sets. Particular emphasis will be placed on visualization techniques. The course will cover basic techniques used in visual data mining, including parallel coordinates, the grand tour, and saturation brushing. These techniques will be illustrated in further discussions on rapid data editing, density estimation, inverse regression, tree-structured decision rules, classification and clustering, structural inference, and outlier investigation.

The short course is offered free of charge to conference registrants. No fees are required beyond conference registration.
