Micro-Credential in Data Science

This page contains information and guidelines for students interested in pursuing the Micro-credential in Data Science from the Hewlett Packard Enterprise Data Science Institute (HPE DSI).

The purpose of the Micro-credential in Data Science is to recognize the expertise gained by students in the course of their studies in the areas of Data Management, Python programming, Data Visualization, and Machine Learning.

Core Courses

212 Scientific Programming with Python
251 Data Visualization using Paraview & Tableau
261 Principles of Data Management
311 Introduction to Machine Learning

Cost

Free for active University of Houston System (UHS) students, staff or faculty
$250/course badge for non-UH individuals

4 badges = 1 micro-credential

How it works: Register and complete each of the four courses below, and the HPE DSI will award you the Micro-credential in Data Science. Badges will be given at the end of each semester.

Course Descriptions

To receive the Micro-credential badge, complete the courses listed below in any semester. These courses will neither affect your GPA nor appear on your transcript. A description of each course follows.

Principles of Data Management Badge (15hrs)

Today, many sources, such as social media, healthcare applications, and banking applications, generate large volumes of data.

But how do we make that data useful to society?

This course introduces some of the essential first steps in analyzing data and discovering patterns within a dataset.

You will work with real-world datasets and practice skills such as data scraping, data pre-processing (importing and cleaning), data wrangling (exploratory data analysis and data structuring), and applying statistical techniques to prepare data for machine learning algorithms.
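
As a rough illustration of the cleaning and statistical preparation steps mentioned above, the sketch below uses only the Python standard library. It is not taken from the course materials: the records, the field names, and the `clean_records` helper are all invented for this example.

```python
import statistics

# Hypothetical raw records, as they might arrive from a scraped table.
# All names and values are made up for illustration.
raw_records = [
    {"city": " Houston ", "temp_f": "91"},
    {"city": "Austin", "temp_f": ""},        # missing value
    {"city": "dallas", "temp_f": "88"},
    {"city": "Houston", "temp_f": "n/a"},    # unparseable value
    {"city": "Austin", "temp_f": "95"},
]


def clean_records(records):
    """Normalize city names and drop rows whose temperature is not numeric."""
    cleaned = []
    for row in records:
        city = row["city"].strip().title()
        try:
            temp = float(row["temp_f"])
        except ValueError:
            continue  # skip rows with missing or malformed temperatures
        cleaned.append({"city": city, "temp_f": temp})
    return cleaned


cleaned = clean_records(raw_records)

# Simple exploratory statistics on the cleaned column.
temps = [row["temp_f"] for row in cleaned]
mean_temp = statistics.mean(temps)
stdev_temp = statistics.stdev(temps)

# Standardize (z-score) the column, a common preparation step before
# passing numeric features to a machine learning algorithm.
z_scores = [(t - mean_temp) / stdev_temp for t in temps]

print(cleaned)
print(round(mean_temp, 2), [round(z, 2) for z in z_scores])
```

In practice a library such as pandas would usually handle these steps; the standard-library version is used here only to keep the sketch self-contained.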

Prerequisites: Familiarity with the Python programming language

Recommended: Intro to Linux/Cluster computing course provided by HPE Data Science Institute

Scientific Programming with Python Badge (15hrs)

Python is an easy-to-learn, powerful programming language. It has efficient high-level data structures that make it suitable for rapid application development.

Topics covered in this session will include data types, conditional and loop statements, functions, input/output, modules, classes and exceptions.
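
To give a sense of how several of these topics fit together, here is a minimal sketch that touches a module import, a class, a function, a conditional, a loop, formatted output, and exception handling. The `Sample` class and `describe` function are hypothetical names invented for this example, not part of the course materials.

```python
import math  # a standard-library module


class Sample:
    """A hypothetical measurement container, invented for this example."""

    def __init__(self, name, values):
        self.name = name            # str
        self.values = list(values)  # list of floats

    def rms(self):
        """Return the root-mean-square of the stored values."""
        if not self.values:  # conditional statement
            raise ValueError(f"sample {self.name!r} has no values")
        total = 0.0
        for v in self.values:  # loop statement
            total += v * v
        return math.sqrt(total / len(self.values))


def describe(sample):
    """Format a one-line summary, handling the empty-sample error."""
    try:
        return f"{sample.name}: rms={sample.rms():.3f}"
    except ValueError as err:  # exception handling
        return f"{sample.name}: error ({err})"


print(describe(Sample("a", [3.0, 4.0])))
print(describe(Sample("b", [])))
```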

Upon completion of this tutorial series, participants should be able to read and understand existing scientific Python code and write their own simple Python applications.

This training session also introduces participants to scientific computing extensions of Python like NumPy for use in high-performance computing.

Python tools for everyday scientific computing, such as regular expressions and the SciPy, pandas, seaborn, and scikit-learn libraries, will also be covered in the course.

Prerequisites: Participants are expected to have a working knowledge of the UNIX/Linux environment or to have taken the Cluster Computing course from the HPE Data Science Institute.

Data Visualization using Paraview & Tableau Badge (15hrs)

This tutorial will provide hands-on skills in modern data visualization and analysis platforms, specifically ParaView, an open-source parallel visualization application, and Tableau. ParaView is powerful and popular in the high-performance computing scientific and engineering research communities.

In the ParaView section, we will explore representations, color-scales and their controls, data filters, how to build pipelines, multi-view & camera links using synthetic seismic data, streamline plots, plot-over-line analysis and histograms.

The course will also explore the calculator tool, datasets and time, animations and their controls, time interpolation, camera animations, static vector field animation, and Python scripting. Finally, we will cover how to use these tools and skills to do remote, parallel visualization on HPE Data Science Institute computer clusters.

In the Tableau workshop, we will use Tableau Public to create interactive data visualizations. We will also use more advanced features in Tableau to manage data and use calculations and parameters to make views more interactive. At the end, students will publish their visualizations to the Tableau Public web server.

Introduction to Machine Learning Badge (15hrs)

Machine learning is the science of developing statistical methods that quantify relationships within data. This branch of mathematics and computer science has seen explosive growth over the past decade as our ability to store and process digital data has dramatically increased. Prediction, classification, regression, and identification are the kinds of problems that can be solved by learning from data.

The objectives of this course are:

To obtain an overview of the literature on learning-based methods and applications.

To obtain an understanding of a variety of machine learning techniques for classification, regression and prediction.

To obtain the ability to implement and experiment with a wide range of machine learning algorithms in Python, with examples.

To apply supervised and unsupervised learning concepts, including clustering, dimensionality reduction, kernels and kernel-based classifiers such as support vector machines (SVMs), and deep learning algorithms.

To understand and implement learning-based methods for classification of images, signals and features.
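
As a flavor of what implementing a supervised classifier in Python can look like, here is a minimal sketch of a nearest-centroid classifier in plain Python. It is an illustrative stand-in chosen for brevity, not an algorithm named in the course description; the toy data and the `fit_centroids` and `predict` helpers are invented for this example.

```python
import math


def fit_centroids(points, labels):
    """Compute the mean point (centroid) of each class."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, x in enumerate(point):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [s / counts[label] for s in acc]
        for label, acc in sums.items()
    }


def predict(centroids, point):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))


# Toy, made-up training data: two well-separated 2-D clusters.
train_x = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
train_y = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(train_x, train_y)
print(centroids)
print(predict(centroids, (2.0, 2.0)), predict(centroids, (7.0, 9.0)))
```

Libraries such as scikit-learn provide tested implementations of this and far more capable methods; writing a small classifier by hand is mainly useful for seeing the fit and predict steps explicitly.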

Recommended: Intro to Linux/Cluster computing course provided by HPE DSI