STATISTICS


Course Credits: 3 Units

Prerequisites: CMSC 123, Stat 106 or COI (for non-majors)

CMSC 173 - Data Mining

Course Description

Fundamental concepts in data mining: big data and basic statistics; databases and data warehouses; preprocessing (data preparation for data mining); patterns, associations, and correlations; classification and prediction; clustering; and applications of data mining.

Course Learning Outcomes

After completion of the course, the student should be able to:

  1. Analyze large data sets using mathematical algorithms to uncover patterns within the data;
  2. Make predictions based on discovered patterns;
  3. Use data mining tools and statistical analysis to analyze data; and
  4. Apply data mining techniques to data mining problems.

Course Outline

UNIT 1. Big Data and Basic Statistics with MS Excel

  1. Introduction to Data Mining
  2. Databases, Database Systems, and Data Warehouses
  3. Introduction to Statistics
  4. Statistical Analysis Using Excel
  5. Machine Learning
  6. Describing Structural Patterns
  7. Major Issues in Data Mining

UNIT 2. Data, Databases, Data Warehouses, and Basic Statistics with R

  1. Data Exploration
  2. R Statistics Essential for Data Analysis and Graphics
  3. Data Warehousing and Online Analytical Processing
  4. Data Cube Technology
  5. Machine Learning Tools and Techniques

UNIT 3. Preprocessing: Data Preparation for Data Mining

  1. Data Preprocessing
  2. Transformations: Engineering the input and output
  3. An Introduction to Data Cleaning with R
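
Illustration (Unit 3). As a minimal sketch of two common preprocessing steps, the code below fills missing numeric values with the mean of the observed values and then rescales the result to [0, 1] with min-max normalization. The outline does not specify which preprocessing techniques the course covers, and it pairs this unit with R; the use of Python, the choice of these two techniques, and the function names `impute_mean` and `min_max` are assumptions made for illustration.

```python
# Illustrative preprocessing sketch; function names are hypothetical.

def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]
```

For example, `min_max(impute_mean([20, None, 40, 60]))` first fills the gap with 40.0 and then yields `[0.0, 0.5, 0.5, 1.0]`.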

UNIT 4. Basic Statistics with Advanced R

  1. Basic Statistical Description of Data
  2. Data Visualization
  3. Measuring Data Similarity and Dissimilarity
  4. Correlation
  5. Regression
  6. ANOVA
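
Illustration (Unit 4). A minimal sketch of items 4 and 5: with deviations from the means, Sxy = Σ(x − x̄)(y − ȳ), Sxx = Σ(x − x̄)² and Syy = Σ(y − ȳ)², the Pearson correlation is r = Sxy / sqrt(Sxx · Syy) and the ordinary least-squares line y = a + bx has b = Sxy / Sxx and a = ȳ − b·x̄. The unit itself uses R; writing this in plain Python and the names `pearson_r` and `fit_line` are assumptions for illustration.

```python
from math import sqrt

# Illustrative sketch; function names are hypothetical.

def _centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy, mean_x, mean_y) for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy, mx, my


def pearson_r(xs, ys):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    sxy, sxx, _, mx, my = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    b = sxy / sxx
    return my - b * mx, b
```

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], `pearson_r` returns about 0.7746 and `fit_line` returns approximately (2.2, 0.6).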

UNIT 5. Patterns, Associations, and Correlations

  1. Python Programming Language for Data Analysis
  2. Frequent Itemsets
  3. Apriori
  4. Support and Confidence Measures
  5. Recommender Systems
  6. Frequent Pattern Growth
  7. Measures of Interestingness
  8. Implicit Ratings and Item-Based Filtering
  9. Association Rules Mining
  10. Correlation
  11. Knowledge Representation
  12. Extending Linear Models
  13. Bayesian Networks
  14. Market Basket Analysis
  15. Defining and Visualizing Sentiment Data
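
Illustration (Unit 5). The support of an itemset is the fraction of transactions that contain it, and the confidence of a rule A → C is support(A ∪ C) / support(A). The sketch below computes both and runs a level-wise Apriori search: it joins frequent (k−1)-itemsets into k-item candidates and prunes any candidate that has an infrequent (k−1)-subset. It is a teaching sketch rather than a tuned implementation; the function names and the small basket data in the usage note are made up for illustration.

```python
from itertools import combinations

# Illustrative sketch; function names are hypothetical.

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items.issubset(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(A -> C) = support(A | C) / support(A)."""
    a = frozenset(antecedent)
    return support(a | frozenset(consequent), transactions) / support(a, transactions)


def apriori(transactions, min_support):
    """Level-wise frequent itemset search; returns {frozenset: support}."""
    transactions = [frozenset(t) for t in transactions]
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while current:
        # Count candidates of size k and keep those meeting min_support.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: union two frequent (k-1)-itemsets that differ in one item.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent
```

With the five baskets {bread, milk}, {bread, diapers, beer, eggs}, {milk, diapers, beer, cola}, {bread, milk, diapers, beer} and {bread, milk, diapers, cola}, the rule {milk, diapers} → {beer} has support 0.4 and confidence 2/3, and `apriori(baskets, 0.6)` finds eight frequent itemsets (four single items and four pairs).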

UNIT 6. Classification and Recommendation Systems

  1. Decision Trees
  2. Naïve Bayes
  3. k-Nearest Neighbor
  4. Generalized Linear Models
  5. Ensemble Learning
  6. Neural Networks/Deep Learning
  7. Random Forest
  8. Recommendation Systems
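
Illustration (Unit 6). A minimal k-nearest-neighbor classifier sorts the training points by distance to the query and takes a majority vote among the labels of the k closest. Euclidean distance, a default of k = 3, and breaking vote ties in favor of the label first seen among the nearest neighbors are assumptions of this sketch; the name `knn_predict` is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from collections import Counter
from math import dist

# Illustrative sketch; the function name is hypothetical.

def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for query by majority vote of its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort (point, label) pairs by Euclidean distance to the query, nearest first.
    neighbors = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common keeps first-encountered order among equal counts,
    # so ties go to the label that appears among the nearer neighbors first.
    return votes.most_common(1)[0][0]
```

Sorting the whole training set keeps the sketch short; for larger data a heap (`heapq.nsmallest`) or a spatial index would avoid the full sort.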

UNIT 7. Clustering

  1. k-Means
  2. Hierarchical Clustering
  3. DBSCAN
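
Illustration (Unit 7). A minimal sketch of k-means using Lloyd's algorithm, which alternates an assignment step (each point joins its nearest centroid) with an update step (each centroid moves to the mean of its points) until the centroids stop changing. Seeded random initialization from the input points, the iteration cap, and leaving an empty cluster's centroid in place are assumptions of this sketch, and the name `kmeans` is hypothetical; practical implementations often use k-means++ initialization and several restarts instead.

```python
import random
from math import dist

# Illustrative sketch of Lloyd's algorithm; the function name is hypothetical.

def kmeans(points, k, iters=100, seed=0):
    """Cluster points (tuples of numbers) into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k input points chosen at random
    clusters = []
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the coordinate-wise mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters
```

On two well-separated groups such as (1, 1), (1, 2), (2, 1) and (8, 8), (8, 9), (9, 8) with k = 2, the centroids settle at about (1.33, 1.33) and (8.33, 8.33).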

UNIT 8. Applications in Data Mining

  1. Search Engines and Text Retrieval
  2. Social Network Mining
  3. Big Data Processing and MapReduce
  4. Security: Networking and Banking
  5. Ethics and Electronic Profiling