Exploratory Data Analysis (EDA) in Python for Machine Learning in Bioinformatics
About Course
Massive amounts of biological data are difficult to understand properly in written or tabular form. To gain a better understanding of biological data, it is essential to represent it pictorially so that trends, correlations, outliers, and patterns in the data can be exposed. Biological data visualization is the graphical representation of biological data and information.
Biological data visualization is an important aspect of bioinformatics that involves the graphical representation of structured or unstructured biological data. It helps you make well-informed decisions during your research and produce publishable figures for your research papers.
Exploratory Data Analysis (EDA) is an approach to analyzing biological datasets in order to summarize their main characteristics. It is used to understand the data and its context, examine the variables and the relationships between them, and formulate hypotheses that can be useful when building predictive models. EDA is performed with the help of biological data visualization.
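Summarizing a dataset's main characteristics, the first step of EDA described above, can look like the following minimal pandas sketch. It assumes scikit-learn's bundled Wisconsin breast cancer dataset (`load_breast_cancer`) as a stand-in for the course's own data; column names and class labels will differ for other datasets.

```python
# First-look EDA summary with pandas: a minimal sketch.
# Assumption: scikit-learn's bundled breast cancer dataset stands in for the
# course dataset; its "target" column uses 0 = malignant, 1 = benign.
import pandas as pd
from sklearn.datasets import load_breast_cancer

bunch = load_breast_cancer(as_frame=True)
df: pd.DataFrame = bunch.frame  # 30 feature columns plus "target"

# Size and column types: how many samples and features, and of what kind.
print(df.shape)
print(df.dtypes.value_counts())

# Missing values per column; any non-zero count needs handling before modelling.
missing = df.isna().sum()
print(missing[missing > 0])

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = df.describe().T
print(summary[["mean", "std", "min", "max"]].head())

# Class balance, with numeric codes mapped to readable names.
counts = df["target"].map(dict(enumerate(bunch.target_names))).value_counts()
print(counts)
```

Checking class balance this early matters because a strongly imbalanced label changes which evaluation metrics are meaningful later.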
BioCode offers a detailed hands-on course on Exploratory Data Analysis for machine learning and data preprocessing in Python. Python provides various libraries with different features for visualizing biological data and information. This course will help students understand the concept and purpose of exploratory data analysis and its importance in machine learning. Students will learn how Pandas, NumPy, Seaborn, Matplotlib, Jupyter Notebook, and Anaconda are used in EDA.
Students will also learn how to retrieve bioinformatics, genomics, and health informatics datasets and how to develop machine learning models after performing EDA.
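To illustrate the step from EDA to modelling, the sketch below trains a simple baseline classifier. The dataset (scikit-learn's `load_breast_cancer`), the choice of logistic regression, and the 75/25 stratified split are assumptions made for illustration, not the course's prescribed workflow.

```python
# Baseline classifier trained after EDA: a minimal sketch.
# Assumptions: scikit-learn's breast cancer data stands in for the course
# dataset, and logistic regression is used only as a simple baseline.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Hold out a stratified test set so class proportions match the full data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scaling inside the pipeline means the test data is transformed with
# statistics learned from the training split only (no leakage).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Keeping the scaler inside the pipeline, rather than scaling the whole dataset up front, is the design choice that keeps the test score honest.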
In this course, students will identify useful features in the dataset that can be used for machine learning. Students will learn how to perform end-to-end exploratory data analysis of their biological datasets and create beautiful charts such as joint plots, bar plots, line plots, swarm plots, scatter plots, correlation plots, and histograms. Students will learn how to analyze trends, distributions, and relationships between biological features. This course is designed for absolute beginners in bioinformatics scripting; no prior knowledge of scripting or bioinformatics is required to get started.
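One simple way to shortlist useful features is to rank them by their absolute correlation with the class label and flag near-duplicate features. This is shown here as an assumption rather than the course's method. The sketch again uses scikit-learn's breast cancer dataset as a stand-in, and the 0.95 redundancy threshold is an arbitrary illustrative choice.

```python
# Feature screening by correlation: a minimal sketch.
# Assumptions: scikit-learn's breast cancer data stands in for the course
# dataset; the univariate Pearson filter and the 0.95 cut-off are illustrative.
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame

# Pearson correlation of every feature with the binary target (0/1).
corr_with_target = df.corr()["target"].drop("target")

# Rank features by strength of association, ignoring the sign.
ranked = corr_with_target.abs().sort_values(ascending=False)
top_features = ranked.head(5).index.tolist()
print(ranked.head(5))

# Flag highly inter-correlated feature pairs, which carry redundant information.
feature_corr = df.drop(columns="target").corr().abs()
redundant = [
    (a, b)
    for i, a in enumerate(feature_corr.columns)
    for b in feature_corr.columns[i + 1:]
    if feature_corr.loc[a, b] > 0.95
]
print(f"{len(redundant)} feature pairs with |r| > 0.95")
```

A correlation filter only captures linear, one-feature-at-a-time relationships, so it is a starting point for feature selection rather than a final answer.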
This course will include the following sections:
Section 1: Introduction to Exploratory Data Analysis and Visualization in Python
Description: This section introduces exploratory data analysis and explains why it is important for identifying trends, patterns, distributions, and correlations in biological data. Students will learn about the Python libraries used to perform exploratory data analysis and how to retrieve raw datasets for machine learning.
Learning Outcomes: Upon completion of this section, students will be able to:
 Discuss Exploratory Data Analysis.
 Understand the Importance of Exploratory Data Analysis in Machine Learning for Bioinformatics.
 Explain Pandas Structures.
 Explain NumPy Structures.
 Describe Matplotlib.
 Describe Seaborn.
 Retrieve Datasets for Machine Learning.
 Explain the Raw Breast Cancer Dataset.
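The core pandas and NumPy structures mentioned in these outcomes can be sketched as follows. The numbers are placeholder values generated with `np.arange`, and the row and column names (`s1`, `feature_a`, `label`, and so on) are hypothetical.

```python
# NumPy arrays vs pandas Series/DataFrames: a minimal sketch.
# Assumption: placeholder values and hypothetical names, not real measurements.
import numpy as np
import pandas as pd

# NumPy ndarray: a homogeneous n-dimensional array with a single dtype.
values = np.arange(12, dtype=float).reshape(4, 3)  # 4 samples x 3 features
print(values.shape, values.dtype)
print(values.mean(axis=0))  # vectorised per-feature (column-wise) means

# pandas Series: a one-dimensional labelled array.
labels = pd.Series([0, 1, 1, 0], index=["s1", "s2", "s3", "s4"], name="label")

# pandas DataFrame: a 2-D table of labelled columns that may have different dtypes.
df = pd.DataFrame(values, index=labels.index,
                  columns=["feature_a", "feature_b", "feature_c"])
df["label"] = labels  # aligned on the shared row index
print(df.dtypes)

# Label-based (.loc) versus position-based (.iloc) selection.
print(df.loc["s2", "feature_b"])   # row "s2", column "feature_b"
print(df.iloc[0, :3].to_numpy())   # first row, first three columns as an ndarray
```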
Section 2: Hands-On Exploratory Data Analysis of a Cancer Dataset
Description: This section teaches students how to perform exploratory data analysis of a cancer dataset. Students will learn how to make several types of graphs and plots, including line plots, joint plots, density plots, swarm plots, scatter plots, histograms, correlation plots, linear model plots, and bar charts. They will use these plots to identify biological factors and the relationships between them, and learn how each type of graph supports their biological data analysis.
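A minimal Matplotlib sketch of two of the plot types listed above (overlaid histograms and a scatter plot) follows. It assumes scikit-learn's breast cancer dataset as a stand-in for the course data and uses `mean radius` and `mean texture` only as example features; the output file name is hypothetical.

```python
# Histograms and a scatter plot with Matplotlib: a minimal sketch.
# Assumptions: scikit-learn's breast cancer data stands in for the course
# dataset; feature choices and the output file name are examples only.
import matplotlib

matplotlib.use("Agg")  # render to files; no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

bunch = load_breast_cancer(as_frame=True)
df = bunch.frame
labels = dict(enumerate(bunch.target_names))  # {0: "malignant", 1: "benign"}

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(11, 4))

# Overlaid histograms: compare one feature's distribution between the classes.
for code, name in labels.items():
    ax_hist.hist(df.loc[df["target"] == code, "mean radius"],
                 bins=30, alpha=0.5, label=name)
ax_hist.set_xlabel("mean radius")
ax_hist.set_ylabel("count")
ax_hist.legend()

# Scatter plot: look for a relationship between two features, coloured by class.
for code, name in labels.items():
    subset = df[df["target"] == code]
    ax_scatter.scatter(subset["mean radius"], subset["mean texture"],
                       s=10, alpha=0.6, label=name)
ax_scatter.set_xlabel("mean radius")
ax_scatter.set_ylabel("mean texture")
ax_scatter.legend()

fig.tight_layout()
fig.savefig("matplotlib_eda.png", dpi=150)
plt.close(fig)
```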
Learning Outcomes: Upon completion of this section, students will be able to:
 Create a Line Plot to Understand the Trends in Cancer Datasets.
 Create a Joint Plot to Visualize Features from Multiple Angles.
 Use a Density Plot to Evaluate Enzyme Levels in Cancer Patients.
 Compare Serum Levels in Healthy Individuals and Patients Using a Swarm Plot.
 Evaluate the Distribution of Features Using a Histogram.
 Elucidate the Relationship Between Two Features Using a Scatter Plot.
 Understand the Correlation Between Features Using Correlation Plots and Heatmaps.
 Fit a Linear Model to Two or More Features to Understand Their Relationship Using a Linear Model Plot.
 Draw a Regression Line Between Two Features for Regression Analysis.
 Identify the Frequency of Patients Using Bar Charts.
Course Content
Introduction to Exploratory Data Analysis and Visualization in Python

Introduction to EDA in Machine Learning for Bioinformatics (18:13)
Introduction to Pandas Structures (08:39)
Introduction to NumPy Structures (07:35)
Introduction to Matplotlib (06:25)
Introduction to Seaborn (05:00)
How to Retrieve Datasets for Machine Learning (05:41)
Raw Breast Cancer Dataset Explanation (06:46)
Hands-On EDA of Cancer Dataset
Exercise
Earn a certificate
Add this certificate to your resume to demonstrate your skills & increase your chances of getting noticed.