
Data Analysis and Visualization using Python
Overview
Python is one of the premier open-source languages: flexible, easy to learn and use, and backed by powerful libraries for data manipulation and analysis. This training is a step-by-step guide to Python and statistical data analysis with extensive hands-on practice. The course is delivered with activity problems, assignments and scenarios that help participants gain practical experience in data handling, analysis, interpretation and reporting. It starts by exploring basic statistics such as mean, median and mode, and proceeds to more advanced techniques such as group comparisons, regression, tests of relationships, classification and clustering, to mention a few.
Target Audience
The course is useful for professionals who use data as part of their work and who need to make decisions from data analysis. Those with a prior understanding of programming and statistics will find the course easier to follow.
Learning Outcomes / Objectives
By the end of this course, participants will be able to:
- Easily read and write files of various types in a Python program.
- Identify and fix errors in datasets.
- Work with Python ‘modules’ and use them for data analysis tasks.
- Use libraries such as pandas, NumPy, Matplotlib and scikit-learn, and master concepts such as machine learning in Python, scripts and sequences.
- Gain high-level skills in interpreting statistical results and writing reports.
Duration
5 days
Modules / Course Content
Module 1: Introduction
Introduction to Statistical Data Analysis
- Introduction to statistical concepts
- Descriptive and inferential statistics
- Research design
- The research/survey process
- Introduction to data science
- Different sectors using data science
- Purpose and components of Python
- Data analytics process
- Knowledge check
- Exploratory Data Analysis (EDA)
- EDA – Quantitative technique
- EDA – Graphical technique
- Data analytics conclusion or predictions
- Data analytics communication
- Data types and plotting considerations
- Statistical analysis considerations
- Population and sample
- Statistical analysis process
- Descriptive statistics – Measures of centre, distribution and dispersion
- Inferential statistics (correlation, regression, t-tests, chi-square, etc.)
- Anaconda
- Installation of Anaconda Python distribution
- Data types with Python
- Basic operators and functions
- What is NumPy?
- NumPy vs list
- Installation
- NumPy arrays
- Built-in methods of NumPy (arange; zeros and ones; linspace; eye; random)
- Array attributes and methods (reshape; max, min, argmax, argmin; shape; dtype)
- NumPy indexing and selection
- Broadcasting
- Indexing a 2D array (matrices)
- Selection
- NumPy operations (arithmetic; universal array functions)
- Vectorization
- Introduction to SciPy
- SciPy subpackages – integration and optimisation
- Calculating eigenvalues and eigenvectors
- Using SciPy to solve a linear algebra problem
- Use SciPy to define random variables for random values
- Introduction to Pandas
- DataFrame in Pandas
- Viewing and opening data
- Dealing with missing values
- Data operations
- Reading and writing files
- Pandas SQL operation
- Introduction to machine learning
- Understanding data sets and extracting features
- Problem types and learning models
- How to train, test and optimise models
- Considerations for supervised learning models
- Scikit-Learn
- Supervised learning models – Linear regression, logistic regression
- Unsupervised learning models
- Pipeline
- Model persistence and evaluation
- Overview of Natural Language Processing
- Applications of Natural Language Processing
- Libraries – Scikit-Learn
- Extraction considerations
- Scikit-Learn – model training and grid search
- Introduction to data visualisation
- Line properties
- (x, y) plot and subplots
- Types of plots
- Web scraping and parsing
- Knowledge check
- Understanding and searching the tree
- Navigating options and modification options of a tree
- Parsing and printing documents
- Big data solutions in Python
- Big Data and Hadoop
- Hadoop core components
- Python integration with HDFS using Hadoop streaming
- Using Hadoop streaming for calculating word count
- Python Integration with Spark using PySpark
- Using PySpark to determine word count
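As an illustration of the NumPy topics listed above (array creation, attributes, broadcasting and vectorization), a short exercise might look like the sketch below. It is not taken from the course materials; the variable names are chosen here for illustration only.

```python
import numpy as np

# Array creation routines named in the outline
a = np.arange(6)                  # integers 0 through 5
z = np.zeros((2, 3))              # 2x3 array filled with 0.0
grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points from 0 to 1
identity = np.eye(3)              # 3x3 identity matrix

# Array attributes and methods
m = a.reshape(2, 3)               # same data viewed as 2 rows, 3 columns
print(m.shape, m.dtype)           # shape is (2, 3); dtype is platform dependent
largest_index = m.argmax()        # position of the maximum in the flattened array

# Broadcasting: the 1D array of length 3 is applied to each row of m
shifted = m + np.array([10, 20, 30])

# Vectorization: a universal function acts on every element without a loop
roots = np.sqrt(m)
```

Here `shifted` holds the rows `[10, 21, 32]` and `[13, 24, 35]`, which shows how the shorter array is stretched across both rows instead of requiring an explicit loop.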
Training Methodology
The course will employ a hands-on, practical approach to ensure participants develop both conceptual understanding and technical proficiency. Each module will integrate interactive lectures, guided software demonstrations, and individual or group exercises based on real-world illustrations. Participants will receive continuous feedback and personalized coaching to reinforce learning. By the end of the training, they will have completed a mini project that demonstrates their ability to apply the acquired skills in a practical context.
More Details
Upon successful completion of this course, participants will be issued a certificate.
Registration
Registration as an individual (Onsite course delivery)
Click on the Register button aligned with your course dates and venue from the table provided.