Course Content
Day-2: How to use VS Code (an IDE) for Python?
Day-3: Basics of Python Programming
This section will train you in the Python programming language.
Day-4: Data Visualization and Jupyter Notebooks
You will learn the basics of Data Visualization and Jupyter Notebooks in this section.
Day-5: Markdown Language
You will learn the whole Markdown language in this section.
Day-10: Data Wrangling and Data Visualization
Data Wrangling and Data Visualization are important parts of Exploratory Data Analysis, and we are going to learn them in this section.
Day-11: Data Visualization in Python
We will learn about Data Visualization in Python in detail.
Day-12,13: Exploratory Data Analysis (EDA)
EDA stands for Exploratory Data Analysis. It refers to the initial investigation and analysis of data to understand the key properties and patterns within the dataset.
Day-15: Data Wrangling Techniques (Beginner to Pro)
Data Wrangling in Python.
Day-26: How to use Conda Environments?
We are going to learn about conda environments and their use in this section.
Day-37: Time Series Analysis
In this section we will learn how to do Time Series Analysis in Python.
Day-38: NLP (Natural Language Processing)
In this section we will learn the basics of NLP.
Day-39: Git and GitHub
We will learn about Git and GitHub.
Day-40: Prompt Engineering (ChatGPT for Social Media Handling)
Staying active on social media is everything, and that is exactly the training you will get in this section.
Python ka Chilla for Data Science (40 Days of Python for Data Science)
About Lesson

EDA stands for Exploratory Data Analysis. It refers to the initial investigation and analysis of data to understand the key properties and patterns within the dataset.

Some key aspects of EDA include:

🔍 Uncovering Insights: Exploring the data to gain insights, form hypotheses and drive the overall analysis process. This helps reveal important relationships and patterns.

📊 Summarizing Data: Calculating summaries such as the mean, median, mode, standard deviation, minimum, and maximum to get an idea of the central tendencies and variability in the data.
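For example, a minimal pandas sketch (the DataFrame and the "age" column below are made-up placeholders, not the lecture's dataset):

import pandas as pd

# illustrative data; replace with your own DataFrame
df = pd.DataFrame({"age": [22, 38, 26, None, 35]})

# individual summaries
print(df["age"].mean(), df["age"].median(), df["age"].std())
print(df["age"].min(), df["age"].max())

# or most of the common summaries in one call
print(df["age"].describe())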

💾 Data Quality Checks: Checking for missing values, duplicates, outliers, inconsistencies, and errors to verify data quality before modeling or drawing conclusions.
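One possible way to run such checks with pandas (again with invented data and column names):

import pandas as pd

# illustrative data; replace with your own DataFrame
df = pd.DataFrame({"age": [22, 38, 26, None, 26],
                   "fare": [7.3, 71.3, 7.9, 8.1, 7.9]})

print(df.isnull().sum())        # missing values per column
print(df.duplicated().sum())    # count of fully duplicated rows

# a simple outlier check using the 1.5 * IQR rule
q1, q3 = df["fare"].quantile([0.25, 0.75])
iqr = q3 - q1
print(df[(df["fare"] < q1 - 1.5 * iqr) | (df["fare"] > q3 + 1.5 * iqr)])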

📈 Visualization: Using charts, plots, and other visualization techniques to better understand distributions and relationships and to spot anomalies in the data. This helps identify what to focus on.
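A small sketch of what this can look like, assuming matplotlib and seaborn are installed (data and column names are placeholders):

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# illustrative data; replace with your own DataFrame
df = pd.DataFrame({"age": [22, 38, 26, 35, 54, 2, 27],
                   "fare": [7.3, 71.3, 7.9, 53.1, 51.9, 21.1, 11.1]})

sns.histplot(df["age"], kde=True)            # distribution of one column
plt.show()

sns.scatterplot(data=df, x="age", y="fare")  # relationship between two columns
plt.show()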

🔄 Transforming Data: Applying techniques like binning, normalizing, aggregating, and filtering to make raw data amenable to modeling tasks.
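A minimal pandas sketch of these ideas (the bins, labels, and data are invented for illustration):

import pandas as pd

# illustrative data; replace with your own DataFrame
df = pd.DataFrame({"age": [22, 38, 26, 35, 54, 2],
                   "sex": ["male", "female", "female", "male", "male", "female"]})

# binning a numeric column into categories
df["age_group"] = pd.cut(df["age"], bins=[0, 18, 60, 100],
                         labels=["child", "adult", "senior"])

# min-max normalization of a numeric column
df["age_scaled"] = (df["age"] - df["age"].min()) / (df["age"].max() - df["age"].min())

# aggregating and filtering
print(df.groupby("sex")["age"].mean())
print(df[df["age"] > 30])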

📝 Documentation: Recording the observations, insights, and conclusions gained from EDA, as well as the methods used, so the process can be understood and replayed later.

The goal of EDA is to uncover the story within data, check assumptions and gain familiarity with the dataset before building machine learning models. It forms a critical early step in any data science or analytics process.

Join the conversation
Riyan 5 months ago
Please help me resolve my issue in this video: df3 = df2
Muhammad Abdullah Khalil 2 weeks ago
Use this code: df2.fillna({"age": df["age"].mean()}, inplace=True) instead of the one given, which only works in older versions.
Ghayas uddin 9 months ago
Q. How to deal with missing values? (1. categorical, 2. number, 3. object, 4. boolean)
ANSWER:
1. Categorical Data: Fill missing categorical values with the mode (most frequent category). Treat missing values as a separate category if meaningful.
2. Numeric Data: Replace missing numeric values with the mean, median, or a specific value (zero, for instance). Consider methods like interpolation for time-series data.
3. Object Data: Handle object data based on context; it might involve text, dates, or mixed types. For text, missing values might be replaced with a placeholder or a special category.
4. Boolean Data: For boolean values, missing entries might be replaced with the most frequent value (mode). Alternatively, consider treating missing boolean values as a distinct category.
Ghayas uddin 9 months ago
To calculate the percentage of null values: (df.isnull().sum() / len(df)) * 100
Kamran Qayyum 10 months ago
a very very good lecture. very good learning in one lecture.
Shahid Umar 10 months ago
The start of EDA is good and covers most of the topic in this lecture
IMTIAZ 11 months ago
Find null value percentage:
null_percentage = df.isnull().mean() * 100
print(null_percentage)
shahid khan 11 months ago
gooooooood lecture
shafiq ahmed 12 months ago
central tendency
shafiq ahmed 12 months ago
df.dropna(axis=1, inplace=True)
shafiq ahmed 12 months ago
Drop rows with any null values: df.dropna(inplace=True)
shafiq ahmed 12 months ago
Drop columns with any null values: df.dropna(axis=1, inplace=True)