Course Content
Day-2: How to use VS Code (an IDE) for Python?
Day-3: Basics of Python Programming
This section trains you in the Python programming language.
Day-4: Data Visualization and Jupyter Notebooks
You will learn the basics of data visualization and Jupyter notebooks in this section.
Day-5: Markdown language
You will learn the complete Markdown language in this section.
Day-10: Data Wrangling and Data Visualization
Data wrangling and visualization are important parts of Exploratory Data Analysis, and this section covers both.
Day-11: Data Visualization in Python
We will learn about data visualization in Python in detail.
Day-12,13: Exploratory Data Analysis (EDA)
EDA stands for Exploratory Data Analysis. It refers to the initial investigation and analysis of data to understand the key properties and patterns within the dataset.
Day-15: Data Wrangling Techniques (Beginner to Pro)
Data wrangling in Python.
Day-26: How to use Conda Environments?
We are going to learn about Conda environments and how to use them in this section.
Day-37: Time Series Analysis
In this section we will learn how to do time series analysis in Python.
Day-38: NLP (Natural Language Processing)
In this section we learn the basics of NLP.
Day-39: Git and GitHub
We will learn about Git and GitHub.
Day-40: Prompt Engineering (ChatGPT for Social Media Handling)
Staying active on social media is everything, and this section gives you exactly that training.
Python ka Chilla for Data Science (40 Days of Python for Data Science)
About Lesson

Data wrangling, also known as data munging, is the process of cleaning, transforming, and organizing data in a way that makes it more suitable for analysis. It is a crucial step in the data science process as real-world data is often messy and inconsistent.

 

The general steps of data wrangling in Python, shown here for the Titanic dataset using the pandas library, typically include:

  1. Importing necessary libraries such as Pandas, NumPy, and Matplotlib
  2. Loading the data into a Pandas DataFrame
  3. Assessing the data for missing values, outliers, and inconsistencies
  4. Cleaning the data by filling in missing values, removing outliers, and correcting errors
  5. Organizing the data by creating new columns, renaming columns, sorting, and filtering the data
  6. Storing the cleaned data in a format that can be used for future analysis, such as a CSV or Excel file
  7. Exploring the data by creating visualizations and using descriptive statistics
  8. Creating a pivot table to summarize the data
  9. Checking for and handling duplicate rows
  10. Encoding categorical variables
  11. Removing unnecessary columns or rows
  12. Merging or joining multiple datasets
  13. Handling missing or null values
  14. Reshaping the data
  15. Formatting the data
  16. Normalizing or scaling the data
  17. Creating new features from existing data
  18. Validating data integrity
  19. Saving the final data for future use
  20. Documenting the data wrangling process for reproducibility
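
The first few steps in the list (loading, assessing, cleaning, organizing, and storing) might look roughly like the sketch below. It is only an illustration: the rows are invented sample values shaped like the Titanic data, not real passenger records, and the column names (`survived`, `pclass`, `sex`, `age`, `fare`) are assumed to follow the copy of the dataset that ships with seaborn. Filling missing ages with the median and using the 1.5 × IQR rule for outliers are common choices, not the only ones.

```python
import io

import numpy as np
import pandas as pd

# Step 2: load data into a DataFrame.
# Hypothetical sample rows (invented for illustration, not real Titanic records).
df = pd.DataFrame(
    {
        "survived": [0, 1, 1, 0, 1, 0, 0, 1],
        "pclass": [3, 1, 3, 1, 2, 3, 3, 1],
        "sex": ["male", "female", "female", "male",
                "female", "male", "male", "female"],
        "age": [22.0, 38.0, np.nan, 54.0, 27.0, np.nan, 2.0, 35.0],
        "fare": [7.25, 71.28, 7.92, 51.86, 11.13, 8.05, 21.08, 512.33],
    }
)

# Step 3: assess the data by counting missing values per column.
missing_counts = df.isnull().sum()

# Step 4a: fill missing ages with the median age (one possible strategy).
clean = df.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())

# Step 4b: remove fare outliers using the 1.5 * IQR rule.
q1 = clean["fare"].quantile(0.25)
q3 = clean["fare"].quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
clean = clean[(clean["fare"] >= lower_bound) & (clean["fare"] <= upper_bound)]

# Step 5: organize the data by renaming a column and sorting by age.
clean = (
    clean.rename(columns={"pclass": "passenger_class"})
    .sort_values("age")
    .reset_index(drop=True)
)

# Step 6: store the cleaned data as CSV. An in-memory buffer is used here;
# passing a file path to to_csv() would write a file instead.
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

With real data you would typically replace the hand-built DataFrame with a call such as `pd.read_csv(...)` on your own file.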

Please note that these are general steps. The specific steps you take will depend on the dataset you are working with, the requirements, and the goals of the analysis.
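
As a rough illustration of some of the later steps in the list (duplicates, pivot tables, encoding, merging, scaling, and new features), here is a minimal sketch. The rows are invented for illustration, the `class_names` lookup table and the `family_size` feature are hypothetical examples, and the column names `sibsp` and `parch` are assumed to match the seaborn copy of the Titanic data.

```python
import pandas as pd

# Hypothetical sample rows (invented for illustration); passenger 3 appears twice.
passengers = pd.DataFrame(
    {
        "passenger_id": [1, 2, 3, 3, 4, 5],
        "survived": [0, 1, 1, 1, 0, 1],
        "pclass": [3, 1, 3, 3, 1, 2],
        "sex": ["male", "female", "female", "female", "male", "female"],
        "fare": [10.0, 70.0, 8.0, 8.0, 50.0, 12.0],
        "sibsp": [1, 1, 0, 0, 0, 0],
        "parch": [0, 0, 0, 0, 1, 2],
    }
)

# Step 9: count exact duplicate rows, then drop them.
n_duplicates = int(passengers.duplicated().sum())
passengers = passengers.drop_duplicates().reset_index(drop=True)

# Step 8: pivot table of mean survival by passenger class and sex.
survival_rate = passengers.pivot_table(
    values="survived", index="pclass", columns="sex", aggfunc="mean"
)

# Step 10: one-hot encode the categorical 'sex' column.
encoded = pd.get_dummies(passengers, columns=["sex"])

# Step 12: merge with a (hypothetical) lookup table of class names.
class_names = pd.DataFrame(
    {"pclass": [1, 2, 3], "class_name": ["First", "Second", "Third"]}
)
merged = passengers.merge(class_names, on="pclass", how="left")

# Step 16: min-max scale the fare into the range [0, 1].
fare_min = merged["fare"].min()
fare_max = merged["fare"].max()
merged["fare_scaled"] = (merged["fare"] - fare_min) / (fare_max - fare_min)

# Step 17: create a new feature from existing columns
# (siblings/spouses + parents/children + the passenger).
merged["family_size"] = merged["sibsp"] + merged["parch"] + 1
```

Min-max scaling is used here only because it is easy to read; standardizing to zero mean and unit variance is an equally common choice.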

 

All the code can be found here

 

 

Join the conversation
Ghayas uddin 6 months ago
Equation to remove outlier data:
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
filtered_data = df[(df['age'] >= lower_bound) & (df['age'] <= upper_bound)]
sns.boxplot(data=filtered_data, y='age', x='sex')
Muhammad Haroon 7 months ago
Respected Sir, I learned a lot from this session; it really made me realise that I can do this with hard work and dedication. Jazak Allah Sir.
shafiq ahmed 8 months ago
There is a change in the + and - signs, sir.
shafiq ahmed 8 months ago
No, it is correct.
shafiq ahmed 8 months ago
# Z-score method
from scipy import stats
import numpy as np
zscore = np.abs(stats.zscore(df['age']))
threshold = 3
df = df[zscore < threshold]
shafiq ahmed 8 months ago
duplicate_rows = df[df.duplicated()]
komal Baloch 8 months ago
done
Syed Abdul Qadir Gilani 9 months ago
MA SHA ALLAH, Sir