CMSC 197 - Special Topics (Introduction to Data Science)
Course Description
This 3-unit course provides an overview of the foundations of data science, covering a broad selection of key challenges in, and methodologies for, working with big data. Topics include data collection, cleaning, integration, management, modeling, analysis, visualization, prediction, and informed decision making. The course also covers the basics needed for collecting, cleaning, and sharing data. Additionally, it covers the essential exploratory techniques for summarizing data, including some of the common multivariate statistical techniques used to visualize high-dimensional data.
Course Learning Outcomes
After completion of the course, the student should be able to:
- Write programs in R;
- Understand which problems are solvable with data science and be able to approach those problems from a statistical perspective;
- Collect, manipulate, and blend data from different data sources; and
- Visualize data and perform exploratory data analysis.
Course Outline
UNIT 1. Data Science Overview
- Overview of R
- Getting and Cleaning Data Overview
- Practical Machine Learning Overview
- Regression Models Overview
- Reproducible Research
- Statistical Inference Overview
- Big Data
- Experimental Design
- Types of Questions
UNIT 2. R Programming
- Introduction and History of R
- R Data Types and Objects
- Reading and Writing Data
- Control Structures
- Functions
- Scoping Rules
- Dates and Times
- Loop Functions
- Debugging Tools
- Simulation
- Code Profiling
UNIT 3. Getting and Cleaning Data
- Data Collection
- Data Formats
- Making Data Tidy
- Distributing Data
- Scripting for Data Cleaning
UNIT 4. Exploratory Data Analysis
- Making Exploratory Graphs
- Principles of Analytic Graphs
- Plotting Systems and Graphics Devices in R
- The Base, Lattice, and ggplot2 Systems in R
- Clustering Methods
- Dimension Reduction Techniques