Professor Harry Wechsler
Department of Computer Science
e-mail : wechsler@cs.gmu.edu
web : http://cs.gmu.edu/~wechsler/
(703) 993-1533 (office)
(703) 993-1530 (sec)
(703) 993-1710 (fax)
SUMMER 2006
CS 750 Theory and Applications of Data Mining
Class Information
A01 5/22 50855 MWF
3:50 p.m. – 6:50 p.m.
IN 136
Prerequisites
CS 450 (“databases”), CS 580 (“AI”), or permission of instructor
Office Hours
M-W-F 3:15 - 3:45 PM or by appointment (SITE II, Rm. 461)
Textbook
Introduction to Data Mining, Tan, Steinbach and Kumar, Pearson Addison Wesley, 2006
Web site for textbook slides: http://www-users.cs.umn.edu/~kumar/dmbook/
Reference
Data Mining: Concepts and Techniques, Han and Kamber, Morgan Kaufmann, 2001
Web site for textbook slides: http://www-faculty.cs.uiuc.edu/~hanj/bk2/
WEKA web site for data mining software
http://www.togaware.com/datamining/survivor/Weka.html
Background for Pattern Recognition and Classification
http://research.cs.tamu.edu/prism/lectures.htm
UCI Machine Learning Repository Content Summary
http://www.ics.uci.edu/~mlearn/MLSummary.html
References
1. V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods, John Wiley, 1999.
2. D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann, 1999.
3. R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 1999.
4. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
Course Description
Concepts and techniques in data mining and their multidisciplinary implementation and applications. Topics include data warehousing and databases, data cleaning and transformation, concept description, association and correlation rules, data classification and predictive modeling, clustering, performance analysis and scalability, mining stream and sequence data, social network analysis, multimedia data mining, biometrics, and emerging themes and trends. A team term project and a topical review are required.
Motivation
The explosive growth in generating, collecting, and storing data has created an urgent need for new techniques and automated tools that can intelligently assist us in transforming vast amounts of data into useful information and knowledge. Data mining is a multidisciplinary field, drawing from areas including AI, database technology, data visualization, information retrieval, high-performance computing, machine learning, mathematical programming, neural networks, pattern recognition, statistical learning theory, and statistics. The course gives graduate students the opportunity to learn about the management and use of large data repositories based upon a multidisciplinary approach.
Goals
The objective of this course is to introduce graduate students to
current research, technological advances and trends in data mining. Data mining, which supports knowledge discovery
in databases (KDD), helps with the automated extraction of patterns
representing knowledge implicitly stored in large databases, data warehouses,
and other massive information repositories.
The course focuses on issues related to the feasibility, usefulness,
efficiency, and scalability of automated techniques for the discovery of
patterns hidden in large databases.
Students will be exposed to the above topics via lectures and reading assignments, including recent journal and conference papers. Students are expected to complete a term project and to make an in-depth presentation on a topic related to data mining. As data mining has matured, the field is now advancing on three new fronts: (i) mining data in real time; (ii) predictive analysis rather than merely explaining past trends; and (iii) analyzing messy “unstructured” data.
Follow-Up Studies with Professor Wechsler
1. CS 667, Biometrics, Spring 2008
2. CS 775 / IT 844, Pattern Recognition, Spring 2007
3. Certificate in Biometrics
4. PhD dissertation
Grading
(Team) Term Project: 50%
Midterm (June 9): 30%
Science and Technology REVIEW and Class Participation: 20%
Term Project
Students work in teams on the term project.
The scope of the project must be agreed upon with the instructor.
The task must involve meaningful functionality and significant amounts of data.
The project includes the following STEPS:
1. Problem definition, requirements analysis, and conceptual design
2. Data selection / sampling // visualization //
3. Cleaning and integration / preprocessing // visualization //
4. Data transformation / data reduction // visualization //
5. Data mining // visualization //
6. Modeling, test & evaluation, and performance assessment // visualization //
7. Knowledge discovery // visualization //
Use domain knowledge and visualization for all the steps.
Iteratively refine the quality and scope of your project.
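As a rough illustration of how STEPS 2 through 6 can chain together, here is a minimal Python sketch. It is not part of the course materials: the synthetic records, the nearest-centroid classifier, the 70/30 holdout split, and every function name are assumptions chosen for brevity. A real project would substitute its own data, tools (for example WEKA), and visualization at each step.

```python
import random

def make_records(n, seed=0):
    """Hypothetical stand-in for a real data set: (x, y, class label)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        label = rng.choice([0, 1])
        # Class 1 is shifted away from class 0 so the classes are separable.
        x = rng.gauss(3.0 * label, 1.0)
        y = rng.gauss(3.0 * label, 1.0)
        # Inject a few missing values so the cleaning step has work to do.
        if rng.random() < 0.05:
            x = None
        records.append((x, y, label))
    return records

def select_sample(records, k, seed=1):
    """STEP 2: draw a random sample of k records without replacement."""
    return random.Random(seed).sample(records, k)

def clean(records):
    """STEP 3: drop records that have a missing feature."""
    return [r for r in records if r[0] is not None and r[1] is not None]

def fit_scaler(train):
    """STEP 4: learn min-max scaling parameters from the training set only."""
    params = []
    for col in (0, 1):
        vals = [r[col] for r in train]
        lo, hi = min(vals), max(vals)
        params.append((lo, (hi - lo) or 1.0))  # guard against a zero range
    return params

def apply_scaler(params, records):
    """STEP 4: rescale both features with the learned parameters."""
    (lo_x, span_x), (lo_y, span_y) = params
    return [((x - lo_x) / span_x, (y - lo_y) / span_y, label)
            for x, y, label in records]

def fit_centroids(train):
    """STEP 5: 'mine' a very simple model, the mean point of each class."""
    sums = {}
    for x, y, label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(centroids, x, y):
    """Assign a point to the class whose centroid is nearest."""
    return min(centroids,
               key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)

def accuracy(centroids, test):
    """STEP 6: fraction of held-out records classified correctly."""
    hits = sum(predict(centroids, x, y) == label for x, y, label in test)
    return hits / len(test)

def run_pipeline():
    data = clean(select_sample(make_records(1000), 400))
    split = int(0.7 * len(data))          # 70/30 holdout split (an assumption)
    train, test = data[:split], data[split:]
    scaler = fit_scaler(train)
    train, test = apply_scaler(scaler, train), apply_scaler(scaler, test)
    centroids = fit_centroids(train)
    return centroids, accuracy(centroids, test)

if __name__ == "__main__":
    centroids, acc = run_pipeline()
    print(f"centroids: {centroids}")
    print(f"holdout accuracy: {acc:.2f}")
```

The scaling parameters are fitted on the training portion only, so the held-out records do not influence preprocessing; this keeps the STEP 6 assessment honest when the project is iteratively refined.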
Reviews and class presentations are conducted stepwise throughout the course (see tentative schedule). A first draft of each step is expected at the lecture for which that STEP is listed in the tentative schedule below. Based upon feedback received in class, the same step is completed and presented again at the following lecture.
Final (In Class) Project Presentation (SLIDES) (about 45 minutes)
1. Survey / literature review of (a) application and (b) task / functionality, data mining (STEP 5), and model selection (“training strategy”).
2. Brief description of STEPS 1-7.
3. Performance evaluation and assessment of your project.
Final Project Report (HARD COPY) (at most 15 pages)
Submit a Technical Report (TR) that covers your Final Project Presentation.
Tentative Schedule
Date | Topic
May 22 | Appendix C: Probability and Statistics
May 24 | Appendix A: Linear Algebra
May 26 | Appendix E: Optimization
May 29 | Memorial Day, no class
May 31, June 2, June 5 | Data reduction & transformation; Steps 2 & 3 [5/31]; Appendix B: Dimensionality Reduction
June 7 | REVIEW for Midterm; Appendix D: Regression
June 9 | Midterm: closed books and notes; bring blue book and calculator
June 12, 14, 16 | Chaps. 4/5, 6/7, 8/9; Advanced Topics: Classification, Association, Clustering; Biometrics; STEP 5 (June 12); STEPS 6-7 (June 16)
June 19 | FINAL PROJECT PRESENTATION
June 21 | FINAL PROJECT PRESENTATION