Description
Introduction to Pandas
Pandas is a Python library used for data manipulation and analysis. Pandas provides a convenient way to analyze and clean data. The Pandas library introduces two new data structures to Python, Series and DataFrame, both of which are built on top of NumPy.
What is Pandas Used for?
Pandas is a powerful library generally used for:
- Data Cleaning
- Data Transformation
- Data Analysis
- Machine Learning
- Data Visualization
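Here is a minimal sketch of the two data structures from the introduction. It builds a Series and a DataFrame from small, made-up values (the names, ages, and cities are hypothetical) and shows that a Series can return its values as a NumPy array.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array; here the labels are names.
ages = pd.Series([25, 32, 47], index=["Alice", "Bob", "Carol"], name="age")

# A DataFrame is a two-dimensional table; each column is itself a Series.
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 47],
    "city": ["Paris", "Oslo", "Lima"],
})

# Both structures sit on top of NumPy arrays.
ages_array = ages.to_numpy()

print(ages["Bob"])       # look up a value by its label: 32
print(people.shape)      # (rows, columns): (3, 3)
print(type(ages_array))  # a numpy.ndarray
```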
Why Use Pandas?
Handle Large Data Efficiently
Pandas is designed to work efficiently with large datasets that fit in memory, using vectorized operations built on NumPy. It provides tools that simplify tasks like filtering, transforming, and merging data, along with built-in functions to read and write formats such as CSV, JSON, plain text, Excel, and SQL databases.
Tabular Data Representation
The DataFrame, the primary data structure of Pandas, stores data in tabular form with labeled rows and columns. This makes indexing, selecting, replacing, and slicing data straightforward.
Data Cleaning and Preprocessing
Data cleaning and preprocessing are essential steps in the data analysis pipeline, and Pandas provides powerful tools to facilitate these tasks. It has methods for handling missing values and removing duplicates, along with tools that support handling outliers, normalizing data, and more.
Time Series Functionality
Pandas contains an extensive set of tools for working with dates, times, and time-indexed data, as it was initially developed for financial modeling.
Free and Open-Source
Pandas is open-source software released under the BSD 3-Clause license, so you can use and distribute it for free, including for commercial use.
Target Audience
The primary target audience for the Pandas library in Python consists of individuals and professionals involved in data analysis, data science, and data manipulation. This includes:
- Data Analysts and Data Scientists
- Researchers
- Business Intelligence Professionals
- Software Developers
- Students and Educators
Prerequisite
Pandas must be installed before you can use it, and it needs a supported version of Python 3. Recent Pandas releases no longer support older interpreters such as Python 3.5 through 3.8, so check the official installation guide for the current minimum version. Pandas depends on other libraries (like NumPy) and has optional dependencies (like Matplotlib for plotting).
Pandas Introduction
Pandas Installation
Installation Verification
Running First Pandas Program
What is Pandas Used for?
Why Use Pandas?
Import Pandas in Python
Creating a Pandas Series
Labels
Creating Series From a Python Dictionary
Creating a Pandas DataFrame:-
Pandas DataFrame Using Python Dictionary
Pandas DataFrame Using Python List
Pandas DataFrame From a File
Creating an Empty DataFrame
Creating Indexes in Pandas:-
Default Index
Setting Index
Creating a Range Index
Modifying Indexes in Pandas:-
Renaming Index
Resetting Index
Accessing Rows by Index:-
Getting DataFrame Index:-
Creating Array Using Python List
Explicitly Specify Array Elements Data Type
Creating Series From Pandas Array
View Data in a Pandas DataFrame
Pandas head()
Pandas tail()
Get DataFrame Information
Add a New Column to a Pandas DataFrame:-
Add a New Row to a Pandas DataFrame:-
Remove Rows/Columns from a Pandas DataFrame:-
Delete Rows
Delete Columns
Rename Labels in a DataFrame:-
Rename Columns
Rename Row Labels
Access Columns of a DataFrame:-
Pandas .loc:-
Indexing Using .loc
Slicing Using .loc
Boolean Indexing With .loc
Pandas .iloc:-
Indexing Using .iloc
Slicing Using .iloc
.loc vs .iloc:-
Select Data Using Indexing and Slicing
Using loc and iloc to Select Data
Select Rows Based on Specific Criteria
query() to Select Data
Select Rows Based on a List of Values
Creating MultiIndex in Pandas
Access Rows With MultiIndex
MultiIndex from Arrays
Reshape Data Using pivot()
Reshape Data Using pivot_table()
Reshape Data Using stack() and unstack()
Use of melt() to Reshape DataFrame
Find Duplicate Entries
Find Duplicates Based on Columns
Remove Duplicate Entries
pivot() syntax
pivot() for Multiple Values
pivot() vs pivot_table()
pivot_table() Syntax
pivot_table() with Multiple Values
pivot_table() With Aggregate Functions
Pivot Table With MultiIndex
Handle Missing Values With pivot_table()
Read CSV Files
read_csv() Syntax
read_csv() With Arguments
Write to CSV Files
to_csv() Syntax
to_csv() With Arguments
Read JSON in Pandas
read_json() Syntax
Write JSON in Pandas
to_json() Syntax
Read Text Using read_fwf()
Read Text Using read_table()
Read Text Using read_csv()
merge() Syntax in Pandas:-
Merge DataFrames Based on Keys:-
Types of Join Operations In merge():-
Left Join
Right Join
Inner Join
Outer Join
Cross Join
Join vs Merge vs Concat:-
join() Syntax:-
Join DataFrames:-
Types of Join:-
Left Join (Default)
Right Join
Inner Join
Outer Join
Cross Join
concat() Syntax
concat() With Arguments
Concatenation Along Axis 1
Inner Join vs Outer Join
Concatenation With Keys
Drop Rows With Missing Values
Fill Missing Values
Use Aggregate Functions to Fill Missing Values
Handle Duplicate Values
Rename Column Names to Meaningful Names
Remove Rows Containing Missing Values
Replace Missing Values
Replace Missing Values With Mean, Median and Mode
Replace Values Using Another DataFrame
Convert Data to Correct Format
Handling Mixed Date Formats
Replace Individual Values
Replace Values Based on a Condition
Remove Wrong Values
Using get_dummies() on Pandas Series
Use get_dummies() on a DataFrame Column
Use of drop_first Inside get_dummies()
Use of prefix Inside get_dummies()
Create Categorical Data Type in Pandas:-
Convert Pandas Series to Categorical Series:-
Using the astype() Function
Using the dtype Parameter Inside Series()
Access Categories and Codes in Pandas:-
Rename Categories in Pandas:-
Add New Categories in Pandas:-
Remove Categories in Pandas:-
Check if Categorical Variable is Ordered or Not:-
Convert String to DateTime:-
to_datetime() With Default Arguments
to_datetime() With Day First Format
to_datetime() With Custom Format
Get DateTime From Multiple Columns:-
Get Year, Month and Day From DateTime:-
Get Day of Week, Week of Year and Leap Year:-
DateTime Index in Pandas:-
Apply Single Aggregate Function
Apply Multiple Aggregate Functions in Pandas
Apply Different Aggregation Functions
Group by a Single Column in Pandas
Group by Multiple Columns in Pandas
Group With Categorical Data
Filter Data By Labels:-
Filter Data By Values:-
Logical Operators
isin() Method
str Accessor
query() Method
Sort DataFrame in Pandas
Sort Pandas DataFrame by Multiple Columns
Sort Pandas Series
Sort Pandas DataFrame Using sort_index()
Line Plot For Data Visualization
Scatter Plots For Data Visualization
Bar Graphs For Data Visualization
Histograms For Data Visualization
Pandas Customized Histogram
Multiple Histograms in Pandas
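The file-format support described in the introduction and the read_csv()/to_csv()/to_json() topics in the outline can be previewed with a short sketch. It assumes hypothetical sales data held in a string, with io.StringIO standing in for a CSV file on disk so the example runs without any files.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real program would pass a file path instead.
csv_text = """product,units,price
apple,10,0.5
banana,4,0.25
cherry,7,2.0
"""

# read_csv() accepts a path or any file-like object such as StringIO.
sales = pd.read_csv(io.StringIO(csv_text))

# Derive a new column, then keep only rows whose revenue is at least 5.
sales["revenue"] = sales["units"] * sales["price"]
big_sellers = sales[sales["revenue"] >= 5]

# Write the filtered rows back out as CSV text (without the index) and as JSON records.
csv_out = big_sellers.to_csv(index=False)
json_out = big_sellers.to_json(orient="records")
print(json_out)
```

Passing a path string such as "sales.csv" to read_csv() or to_csv() would read from or write to disk in the same way.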
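The .loc and .iloc topics in the outline come down to one distinction: labels versus integer positions. The sketch below uses a small made-up DataFrame (the column names and values are hypothetical) to select the same rows both ways and to apply a boolean filter with .loc.

```python
import pandas as pd

# Hypothetical scores indexed by string labels.
df = pd.DataFrame(
    {"score": [88, 92, 79, 95], "team": ["red", "blue", "red", "blue"]},
    index=["a", "b", "c", "d"],
)

# .loc selects by label; a label slice includes its end label ("c").
by_label = df.loc["b":"c", "score"]

# .iloc selects by integer position; a position slice excludes its end (3).
by_position = df.iloc[1:3, 0]

# Boolean indexing with .loc keeps rows where the condition is True.
blue_team = df.loc[df["team"] == "blue", ["score"]]
```

Both selections return rows "b" and "c". Note that the .loc slice names its end label while the .iloc slice stops one position past the last row it returns.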
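Several outline topics (removing duplicates, dropping rows with missing values, filling missing values with the mean) fit together in a typical cleaning pass. This is a minimal sketch on hypothetical city temperature data. The order of the steps is one reasonable choice, not a fixed recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a duplicate row and missing values.
raw = pd.DataFrame({
    "city": ["Oslo", "Lima", "Lima", "Pune", None],
    "temp": [4.0, np.nan, np.nan, 31.0, 12.0],
})

# Remove exact duplicate rows (missing values compare as equal here),
# then drop rows whose "city" is missing.
clean = raw.drop_duplicates().dropna(subset=["city"])

# Fill the remaining missing temperatures with the column mean.
clean = clean.assign(temp=clean["temp"].fillna(clean["temp"].mean()))
print(clean)
```

Using assign() returns a new DataFrame instead of modifying a filtered result in place, which avoids pandas' chained-assignment warnings.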
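To illustrate the time series tools mentioned in the introduction, along with the to_datetime() topics in the outline, the sketch below parses day-first date strings with an explicit format, extracts date components through the .dt accessor, and resamples a small time-indexed Series. The dates and values are made up for illustration.

```python
import pandas as pd

# Day-first date strings: "03/01/2024" means 3 January 2024.
dates = pd.Series(["03/01/2024", "15/02/2024", "29/02/2024"])

# An explicit format removes any day-first vs month-first ambiguity.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

# The .dt accessor exposes components of each timestamp.
parts = pd.DataFrame({
    "year": parsed.dt.year,
    "month": parsed.dt.month,
    "day": parsed.dt.day,
    "weekday": parsed.dt.day_name(),
    "leap_year": parsed.dt.is_leap_year,
})

# A Series indexed by dates can be resampled, here into 2-day totals.
daily = pd.Series([1, 2, 3, 4], index=pd.date_range("2024-01-01", periods=4, freq="D"))
two_day_totals = daily.resample("2D").sum()
```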
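The merging and grouping topics in the outline often appear together: join two tables on a key, then aggregate. This sketch uses hypothetical orders and customers tables. It performs a left join with merge() and summarizes order amounts per region with groupby() and agg().

```python
import pandas as pd

# Hypothetical tables sharing a "customer_id" key.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 20, 10, 30],
    "amount": [50.0, 20.0, 30.0, 70.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20],
    "region": ["north", "south"],
})

# A left join keeps every order; customer 30 has no match, so its region is NaN.
merged = orders.merge(customers, on="customer_id", how="left")

# Total and average amount per region (rows with a missing key are dropped by default).
summary = merged.groupby("region")["amount"].agg(["sum", "mean"])
print(summary)
```

With how="inner" the unmatched order would be dropped during the merge itself, not later during grouping.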