CMSC 173 - Data Mining
Course Description
Fundamental concepts in data mining. Big data and basic statistics; databases and data
warehouses; preprocessing: data preparation for data mining; patterns, associations, and
correlations; classification and prediction; clustering; applications in data mining.
Course Learning Outcomes
After completion of the course, the student should be able to:
- Analyze large data sets to uncover patterns using mathematical algorithms;
- Make predictions from data based on discovered patterns;
- Use data mining tools and statistical analysis to analyze data; and
- Apply data mining techniques to data mining problems.
Course Outline
UNIT 1. Big Data and Basic Statistics with MS Excel
- Introduction to Data Mining
- Databases, Database Systems, and Data Warehouses
- Introduction to Statistics
- Statistical Analysis Using Excel
- Machine Learning
- Describing Structural Patterns
- Major Issues in Data Mining
UNIT 2. Data, Databases, and Data Warehouses; Basic Statistics with R
- Data Exploration
- R Statistics Essential for Data Analysis and Graphics
- Data Warehousing and Online Analytical Processing
- Data Cube Technology
- Machine Learning Tools and Techniques
UNIT 3. Preprocessing: Data Preparation for Data Mining
- Data Preprocessing
- Transformations: Engineering the Input and Output
- An Introduction to Data Cleaning with R
UNIT 4. Basic Statistics with Advanced R
- Basic Statistical Description of Data
- Data Visualization
- Measuring Data Similarity and Dissimilarity
- Correlation
- Regression
- ANOVA
UNIT 5. Patterns, Associations, Correlations
- Python Programming Language for Data Analysis
- Frequent Itemsets
- Apriori
- Support and Confidence Measures
- Recommender Systems
- Frequent Pattern Growth
- Measures of Interestingness
- Implicit Rating and Item Based Filtering
- Association Rules Mining
- Correlation
- Knowledge Representation
- Extending Linear Models
- Bayesian Networks
- Market Basket Analysis
- Defining and Visualizing Sentiment Data
UNIT 6. Classification and Recommendation Systems
- Decision Trees
- Naïve Bayes
- k-Nearest Neighbors
- Generalized Linear Models
- Ensemble Learning
- Neural Networks/Deep Learning
- Random Forest
- Recommendation Systems
UNIT 7. Clustering
- k-Means
- Hierarchical Clustering
- DBSCAN
UNIT 8. Applications in Data Mining
- Search Engines and Text Retrieval
- Social Network Mining
- Big Data Processing and MapReduce
- Security: Networking and Banking
- Ethics and Electronic Profiling