CIS 4523/5523: Knowledge Discovery and Data Mining
Spring 2024
Goals
The objective of the knowledge discovery and data mining process is to extract
nontrivial, implicit, previously unknown, and potentially useful information
from massive datasets. The course is intended to serve as an introduction to the
fundamental techniques required to support this process. It is structured to
give participants ample opportunity to learn about this research area and to
scout for promising research topics through hands-on experience.
Prerequisites
Basic knowledge of database systems; programming skills; basic
statistics, graph theory, and linear algebra.
Texts
Tan, P.N., Steinbach, M., Kumar, V., Karpatne, A.: Introduction to Data
Mining, 2nd Edition, Pearson Education, 2018, ISBN-10: 0133128903 (required)
Aggarwal, C.: Data Mining: The Textbook, Springer, 2015,
ISBN-13: 978-3319141411 (recommended)
Topics
An overview of data mining tasks and techniques.
Data:
data types
data quality
data preprocessing: aggregation, sampling, dimensionality reduction, feature selection
Similarities and distances:
multidimensional data
text similarity measures
temporal similarity measures
graph similarity measures
supervised similarity functions
Descriptive and Predictive Modeling:
model functions (cluster analysis, summarization, classification, regression, anomaly detection)
model representation (instance-based and rule-based classifiers, decision trees, probabilistic classifiers, density models, partitioning, hierarchical, density-based, grid-based and model-based clustering algorithms, frequent pattern mining).
Advanced topics:
mining data streams
mining time series
mining spatial data
mining discrete sequences
Reading and research project presentations.
Grading
Homework (30%), midterm exam on March 17 (20%), reading/presenting
assignments (20%), and a research project report due May 2 by 5:30 pm (30%).
Late Policy and Academic Honesty
Late homework submissions are accepted automatically, with a 20% penalty per
day late. Discussing course material with fellow students is acceptable, but
programs, experiments, and reports must be done individually.