Data Science

Undergraduate Courses

CS 4433. Big Data Management and Analytics

Cat I (offered at least 1x per Year).
This course introduces the emerging techniques and infrastructures for big data management and analytics including parallel and distributed database systems, map-reduce, Spark, and NoSQL infrastructures, data stream processing systems, scalable analytics and mining, and cloud-based computing. Query processing and optimization, access methods, and storage layouts developed on these infrastructures will be covered. Students are expected to engage in hands-on projects using one or more of these technologies.

CS 4804. Data Visualization

This course trains students in data visualization, the graphical communication of data and information for presentation, confirmation, and exploration. Students learn the stages of the visualization pipeline, including data characterization, mapping data attributes to graphical attributes, user task abstraction, visual display techniques, tools, paradigms, and perceptual issues. Students evaluate the effectiveness of visualizations for specific data, task, and user types. Students implement visualization algorithms and undertake projects involving the use of commercial and public-domain visualization tools.

DS 1010. Data Science I: Introduction to Data Science

Cat I (offered at least 1x per Year).
This course provides an introduction to the core concepts in Data Science. It covers a broad range of methodologies for working with and making informed decisions based on real-world data. Core topics introduced in this course include basic statistics, data exploration, data cleaning, data visualization, business intelligence, and data analysis. Students will utilize various techniques and tools to explore, understand and visualize real-world data sets from various domains and learn how to communicate data results to decision makers.

DS 2010. Data Science II: Modeling and Data Analysis

Cat I (offered at least 1x per Year).
This course focuses on model- and data-driven approaches in Data Science. It covers methods from applied statistics (regression), optimization, and machine learning to analyze and make predictions and inferences from real-world data sets. Topics introduced in this course include basic statistics (regression), analytics (explanatory and predictive), basics of machine learning (classification and clustering), eigenvalues and singular matrices, data exploration, data cleaning, data visualization, and business intelligence. Students will utilize various techniques and tools to explore and understand real-world data sets from various domains.

DS 3010. Data Science III: Computational Data Intelligence

Cat I (offered at least 1x per Year).
This course introduces core methods in Data Science. It covers a broad range of methodologies for working with large and/or high-dimensional data sets and making informed decisions based on real-world data. Core topics introduced in this course include data collection through use cycle, data management of large-scale data, cloud computing, machine learning and deep learning. Students will acquire experience with big data problems through hands-on projects using real-world data sets.

DS 4099. Special Topics in Data Science

Cat III (offered at discretion of dept/prgm).
Instances of this course will explore advanced and emerging topics in Data Science that are not covered by the current regular Data Science offerings. Content and format will vary to suit the interests and needs of the faculty and students. This course may be repeated by students for credit as topics change.

DS 4433. Big Data Management and Analytics

Cat I (offered at least 1x per Year).
This course introduces the emerging techniques and infrastructures for big data management and analytics including parallel and distributed database systems, map-reduce, Spark, and NoSQL infrastructures, data stream processing systems, scalable analytics and mining, and cloud-based computing. Query processing and optimization, access methods, and storage layouts developed on these infrastructures will be covered. Students are expected to engage in hands-on projects using one or more of these technologies.

DS 4635. Data Analytics and Statistical Learning

Cat I (offered at least 1x per Year).
The focus of this class will be on statistical learning - the intersection of applied statistics and modeling techniques used to analyze and to make predictions and inferences from complex real-world data. Topics covered include: regression; classification/clustering; sampling methods (bootstrap and cross validation); and decision tree learning. Students may not receive credit for both MA 463X and MA 4635.

MA 4635. Data Analytics and Statistical Learning

Cat I (offered at least 1x per Year).
The focus of this class will be on statistical learning - the intersection of applied statistics and modeling techniques used to analyze and to make predictions and inferences from complex real-world data. Topics covered include: regression; classification/clustering; sampling methods (bootstrap and cross validation); and decision tree learning. Students may not receive credit for both MA 463X and MA 4635.

Graduate Courses

CS 541. Deep Learning

This course will offer a mathematical and practical perspective on artificial neural networks for machine learning. Students will learn about the most prominent network architectures including multilayer feedforward neural networks, convolutional neural networks (CNNs), auto-encoders, recurrent neural networks (RNNs), and generative adversarial networks (GANs). This course will also teach students optimization and regularization techniques used to train them such as backpropagation, stochastic gradient descent, dropout, pooling, and batch normalization. Connections to related machine learning techniques and algorithms, such as probabilistic graphical models, will be explored. In addition to understanding the mathematics behind deep learning, students will also engage in hands-on course projects. Students will have the opportunity to train neural networks for a wide range of applications, such as object detection, facial expression recognition, handwriting analysis, and natural language processing.

CS 547. Information Retrieval

This course introduces the theory, design, and implementation of text-based and Web-based information retrieval systems. Students learn the key concepts and models relevant to information retrieval and natural language processing on large-scale corpora such as the Web and social systems. Topics include the vector space model, crawling, indexing, web search, ranking, recommender systems, embeddings, and language models.

CS 552. Generative Artificial Intelligence

Generative Artificial Intelligence (Gen-AI) is a class of machine learning models that generate new data (text, images, faces, voice, artwork) that is nearly indistinguishable from the equivalent real data typically generated by humans. These models are trained based on realistic example data sets from the real world. This course covers the underlying fundamentals of generative models. It also introduces the design and modeling of some of the modern generative models: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion models, ChatGPT, Large Language Models, to name a few. Several applications will be discussed, ranging from image generation for engineering or science applications to the utilization of generated data for data augmentation in AI systems. Ethical concerns related to the danger of these generative technologies, ranging from misinformation and bias to data ownership, are reviewed.

CS 553. Machine Learning Development and Operations

This course teaches students the computational skills required in the fields of Artificial Intelligence (AI) and Data Science. As data-driven decision-making and AI applications continue to transform industries, proficiency in programming and machine learning tools is important. In this course, you will develop a strong foundation in programming languages commonly used in AI and Data Science (such as Python). This course will cover the development, debugging, deployment, and subsequent monitoring phases of models in end-to-end pipelines core to machine learning systems. You will also familiarize yourself with popular libraries, frameworks and debugging on IDEs, such as PyCharm, PyTorch, scikit-learn, and/or pandas. Possible topics may include practice code development with a copilot as well as deployment of models on a cloud computing environment. The student will engage in hands-on projects to practice their programming skills to solve real-world AI and Data Science problems.

CS 554. Natural Language Processing

Natural Language Processing (NLP) is an interdisciplinary field at the intersection of artificial intelligence, linguistics, and computer science, dedicated to enabling computers to understand, interpret, and generate human language. NLP underpins advancements in human-computer interaction, information retrieval, sentiment analysis, chatbots, and a multitude of other applications. The course may cover a wide range of topics, including language modeling, sequence-to-sequence architectures, sentiment analysis, machine translation, and advanced techniques for natural language understanding and generation, providing a comprehensive foundation for NLP expertise.

CS 555. Responsible Artificial Intelligence

Artificial Intelligence (AI) algorithms have a significant impact on people's lives. In this course, we discuss social responsibility around data privacy, bias in data and decision-making, policies as guardrails, fairness and transparency in the context of applying AI algorithms. Case studies considering societal challenges caused by AI technologies may include AI-based hiring recommendations stemming from societal biases present in training datasets, AI-empowered self-driving cars behaving in a dangerous manner when encountering atypical road conditions, digital health applications inadvertently revealing private patient information, or large language models like ChatGPT generating incorrect or harmful responses. This course also studies AI-based algorithmic solutions to some of these challenges. These include the design of robust machine learning algorithms with constraints to ensure fairness, privacy, and safety. Strategies for how to apply these methods to design safe and fair AI are introduced. Topics may include min-max optimization with applications to training machine learning models robust to adversarial attacks, stochastic methods for preserving privacy of sensitive data, and multi-agent machine learning models for reducing algorithmic bias and polarization in recommender systems.

CS 556/DS. On-Device Deep Learning

Deep Learning, a core of modern Artificial Intelligence, is rapidly expanding to resource-constrained devices, including smartphones, wearables, and intelligent embedded systems for improving response time, privacy, and reliability. This course focuses on bringing these powerful deep-learning applications from central data centers and large GPUs to distributed ubiquitous systems. On-Device Deep Learning is an interdisciplinary topic at the intersection of artificial intelligence and ubiquitous systems, dedicated to enabling computing on edge devices. This course includes a wide range of topics related to deep learning in resource-constrained settings including pruning and sparsity, quantization, neural architecture search, knowledge distillation, on-device training and transfer learning, distributed training, gradient compression, federated learning, efficient data movement and accelerator design, dynamic network inference, and advanced compression and approximation techniques for enabling on-device deep neural network inference and training. This course provides a comprehensive foundation for cutting-edge tinyML expertise.

CS 594. Graduate Qualifying Project in Artificial Intelligence

This 3-credit graduate qualifying project, typically done in teams, provides a capstone experience in applying Artificial Intelligence skills to a real-world problem. It will be carried out in cooperation with an industrial sponsor, and is approved and overseen by a core or collaborative faculty member in the Artificial Intelligence Program. This offering integrates theory and practice of Artificial Intelligence, and includes the application of tools and techniques acquired in the Artificial Intelligence Program to a real-world problem. In addition to a written report, this project must be presented in a formal presentation to faculty of the AI program and sponsors. Professional development skills, such as communication, teamwork, leadership, and collaboration, will be practiced. This course is a degree requirement for the Master of Science in Artificial Intelligence (MS-AI) and may not be taken before completion of 21 credits in the program. Students outside the MS-AI program must get the instructor's approval beforehand.

DS 5006. Machine Learning for Engineering and Science Applications

Cat I.
This course surveys the application of data science (DS) and machine learning (ML) to problems arising in engineering and the sciences. While DS and ML have profoundly affected domains such as image understanding and natural language processing, ML has seen comparatively less impact in chemistry, physics, chemical engineering, electrical engineering, and many other important application domains. Topics covered will include predictive modeling, feature engineering, and model assessment, with a particular focus on the small-data limit. We will analyze and apply algorithms with wide applicability in engineering and sciences including classic techniques such as multiple linear regression and random forests, and state-of-the-art techniques such as deep neural networks.

DS 501. Introduction to Data Science

Introduction to Data Science provides an overview of Data Science, covering a broad selection of key challenges in and methodologies for working with big data. Topics to be covered include data collection, integration, management, modeling, analysis, visualization, prediction and informed decision making, as well as data security and data privacy. This introductory course is integrative across the core disciplines of Data Science, including databases, data warehousing, statistics, data mining, data visualization, high performance computing, cloud computing, and business intelligence. Professional skills, such as communication, presentation, and storytelling with data, will be fostered. Students will acquire a working knowledge of data science through hands-on projects and case studies in a variety of business, engineering, social sciences, or life sciences domains. Issues of ethics, leadership, and teamwork are highlighted.

DS 502. Statistical Methods for Data Science

Statistical Methods for Data Science surveys the statistical methods most useful in data science applications. Topics covered include predictive modeling methods, including multiple linear regression, and time series, data dimension reduction, discrimination and classification methods, clustering methods, and committee methods. Students will implement these methods using statistical software.

DS 551. Reinforcement Learning

Reinforcement Learning is an area of machine learning concerned with how agents take actions in an environment with a goal of maximizing some notion of cumulative reward. The problem, due to its generality, is studied in many disciplines, and applied in many domains, including robotics and industrial automation, marketing, education and training, health and medicine, text, speech, dialog systems, finance, among many others. In this course, we will cover topics including: Markov decision processes, reinforcement learning algorithms, value function approximation, actor-critics, policy gradient methods, representations for reinforcement learning (including deep learning), and inverse reinforcement learning. The course project(s) will require the implementation and application of many of the algorithms discussed in class.

DS 552. Generative Artificial Intelligence

Generative Artificial Intelligence (Gen-AI) is a class of machine learning models that generate new data (text, images, faces, voice, artwork) that is nearly indistinguishable from the equivalent real data typically generated by humans. These models are trained based on realistic example data sets from the real world. This course covers the underlying fundamentals of generative models. It also introduces the design and modeling of some of the modern generative models: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion models, ChatGPT, Large Language Models, to name a few. Several applications will be discussed, ranging from image generation for engineering or science applications to the utilization of generated data for data augmentation in AI systems. Ethical concerns related to the danger of these generative technologies, ranging from misinformation and bias to data ownership, are reviewed.

DS 553. Machine Learning Development and Operations

This course teaches students the computational skills required in the fields of Artificial Intelligence (AI) and Data Science. As data-driven decision-making and AI applications continue to transform industries, proficiency in programming and machine learning tools is important. In this course, you will develop a strong foundation in programming languages commonly used in AI and Data Science (such as Python). This course will cover the development, debugging, deployment, and subsequent monitoring phases of models in end-to-end pipelines core to machine learning systems. You will also familiarize yourself with popular libraries, frameworks and debugging on IDEs, such as PyCharm, PyTorch, scikit-learn, and/or pandas. Possible topics may include practice code development with a copilot as well as deployment of models on a cloud computing environment. The student will engage in hands-on projects to practice their programming skills to solve real-world AI and Data Science problems.

DS 554. Natural Language Processing

Natural Language Processing (NLP) is an interdisciplinary field at the intersection of artificial intelligence, linguistics, and computer science, dedicated to enabling computers to understand, interpret, and generate human language. NLP underpins advancements in human-computer interaction, information retrieval, sentiment analysis, chatbots, and a multitude of other applications. The course may cover a wide range of topics, including language modeling, sequence-to-sequence architectures, sentiment analysis, machine translation, and advanced techniques for natural language understanding and generation, providing a comprehensive foundation for NLP expertise.

DS 555. Responsible Artificial Intelligence

Artificial Intelligence (AI) algorithms have a significant impact on people's lives. In this course, we discuss social responsibility around data privacy, bias in data and decision-making, policies as guardrails, fairness and transparency in the context of applying AI algorithms. Case studies considering societal challenges caused by AI technologies may include AI-based hiring recommendations stemming from societal biases present in training datasets, AI-empowered self-driving cars behaving in a dangerous manner when encountering atypical road conditions, digital health applications inadvertently revealing private patient information, or large language models like ChatGPT generating incorrect or harmful responses. This course also studies AI-based algorithmic solutions to some of these challenges. These include the design of robust machine learning algorithms with constraints to ensure fairness, privacy, and safety. Strategies for how to apply these methods to design safe and fair AI are introduced. Topics may include min-max optimization with applications to training machine learning models robust to adversarial attacks, stochastic methods for preserving privacy of sensitive data, and multi-agent machine learning models for reducing algorithmic bias and polarization in recommender systems.

DS 5900. Data Science Internship

The internship is an elective-credit option designed to provide an opportunity to put into practice the principles studied in previous Data Science courses. Internships will be tailored to the specific interests of the student. Each internship must be carried out in cooperation with a sponsoring organization, generally from off campus, and must be approved and advised by a core faculty member in the Data Science program. The internship must include proposal, design and documentation phases. Following the internship, the student will report on his or her internship activities in a mode outlined by the supervising faculty member. Students are limited to counting a maximum of 3 internship credits towards their degree requirements for the M.S. degree in Data Science. We expect a full-time graduate student to take on only part-time (20 hours or less of) internship work during the regular academic semester, while a full-time internship of 40 hours per week is appropriate during the summer semester as long as the student does not take a full class load at the same time. Internship credit cannot be used towards a certificate degree in Data Science. The internship may not be completed at the student's current place of employment.

DS 594. Graduate Qualifying Project in Artificial Intelligence

This 3-credit graduate qualifying project, typically done in teams, provides a capstone experience in applying Artificial Intelligence skills to a real-world problem. It will be carried out in cooperation with an industrial sponsor, and is approved and overseen by a core or collaborative faculty member in the Artificial Intelligence Program. This offering integrates theory and practice of Artificial Intelligence, and includes the application of tools and techniques acquired in the Artificial Intelligence Program to a real-world problem. In addition to a written report, this project must be presented in a formal presentation to faculty of the AI program and sponsors. Professional development skills, such as communication, teamwork, leadership, and collaboration, will be practiced. This course is a degree requirement for the Master of Science in Artificial Intelligence (MS-AI) and may not be taken before completion of 21 credits in the program. Students outside the MS-AI program must get the instructor's approval beforehand.

DS 595. Special Topics in Data Science

Special Topics in Data Science is a course offering that will cover a topic of current interest in detail. This serves as a flexible vehicle to provide a one-time offering of topics of current interest as well as to offer new topics before they are made into a permanent course.

DS 596. Independent Study

Independent Study, as the name suggests, is a course that allows a student to study a chosen topic in Data Science under the guidance of a faculty member affiliated with the Data Science program. The student must produce a written report to satisfy the course requirement.

DS 597. Directed Research

Directed Research study, conducted under the guidance of a faculty member affiliated with the Data Science Program, investigates the challenges and techniques central to data science, and aims to develop novel approaches and techniques towards solving these challenges. The student who chooses this course must produce a written report to fulfil the course requirement.

DS 598. Graduate Qualifying Project

This 3-credit graduate qualifying project, done in teams, can be taken a second time for credit with permission by the instructor, up to a total of 6 credits. The project is to be carried out in cooperation with a sponsor or industrial partner. It must be overseen by a faculty member affiliated with the Data Science Program. This offering integrates theory and practice of Data Science, and includes the utilization of tools and techniques acquired in the Data Science Program. In addition to a written report, this project must be presented in a formal presentation to faculty of the Data Science program and sponsors. Professional development skills, such as communication, teamwork, leadership, and collaboration, along with storytelling, will be practiced.

DS 599. Master's Thesis in Data Science

The Master's Thesis in Data Science consists of a research and development project worth a minimum of 9 graduate credit hours and is advised by a faculty member affiliated with the Data Science Program. A thesis proposal must be approved by the DS Program Review Board and the student's advisor before the student can register for more than three thesis credits. The student must satisfactorily complete a written thesis document and present the results to the DS faculty in a public presentation.

DS 699. Dissertation Research

Intended for doctoral students admitted to candidacy wishing to obtain research credit toward their dissertations.

ECE 556. On-Device Deep Learning

Deep Learning, a core of modern Artificial Intelligence, is rapidly expanding to resource-constrained devices, including smartphones, wearables, and intelligent embedded systems for improving response time, privacy, and reliability. This course focuses on bringing these powerful deep-learning applications from central data centers and large GPUs to distributed ubiquitous systems. On-Device Deep Learning is an interdisciplinary topic at the intersection of artificial intelligence and ubiquitous systems, dedicated to enabling computing on edge devices. This course includes a wide range of topics related to deep learning in resource-constrained settings including pruning and sparsity, quantization, neural architecture search, knowledge distillation, on-device training and transfer learning, distributed training, gradient compression, federated learning, efficient data movement and accelerator design, dynamic network inference, and advanced compression and approximation techniques for enabling on-device deep neural network inference and training. This course provides a comprehensive foundation for cutting-edge tinyML expertise.

ECE 577. Machine Learning in Cybersecurity

Machine Learning has proven immensely effective in a diverse set of applications. This trend has reached a new high with the application of Deep Learning in virtually any application domain. This course studies the applications of Machine Learning in the subdomain of Cybersecurity by introducing a range of case studies including anomaly detection in networks and computing, side-channel analysis, and user authentication and biometrics. These case studies are discussed in detail in class, and further examples of potential applications of Machine Learning techniques including Deep Learning are outlined. The course has a strong hands-on component: students are given datasets of specific security applications and are required to perform simulations.

MA 517. Mathematical Foundations for Data Science

The foci of this class are the essential statistics and linear algebra skills required for Data Science students. The class builds the foundation for theoretical and computational abilities of the students to analyze high-dimensional data sets. Topics covered include Bayes' theorem, the central limit theorem, hypothesis testing, linear equations, linear transformations, matrix algebra, eigenvalues and eigenvectors, and sampling techniques, including Bootstrap and Markov chain Monte Carlo. Students will use these techniques while engaging in hands-on projects with real data.

MA 543. Statistical Methods for Data Science

Statistical Methods for Data Science surveys the statistical methods most useful in data science applications. Topics covered include predictive modeling methods, including multiple linear regression, and time series, data dimension reduction, discrimination and classification methods, clustering methods, and committee methods. Students will implement these methods using statistical software.