EDA stands for Exploratory Data Analysis. It refers to the initial investigation and analysis of data to understand the key properties and patterns within the dataset.
Some key aspects of EDA include:
🔍 Uncovering Insights: Exploring the data to gain insights, form hypotheses and drive the overall analysis process. This helps reveal important relationships and patterns.
📊 Summarizing Data: Calculating summary statistics such as the mean, median, mode, standard deviation, minimum and maximum to get a sense of the central tendency and variability of the data.
💾 Data Quality Checks: Checking for missing values, duplicates, outliers, inconsistencies, errors etc. to verify data quality before modeling or drawing conclusions.
📈 Visualization: Using charts and plots to better understand distributions and relationships and to spot anomalies in the data. This helps identify what to focus on.
🔄 Transforming Data: Applying techniques such as binning, normalizing, aggregating and filtering to make raw data suitable for modeling tasks.
📝 Documentation: Recording the observations, insights and conclusions gained from EDA, along with the methods used, so that the process can be understood and reproduced later.
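As a rough illustration of the summarizing, quality-check and transformation steps above, here is a minimal sketch using only Python's standard library. The toy `rows` data, the `age` column and the helper names (`summarize`, `quality_report`, `min_max_scale`) are all made up for this example, and the 1.5 × IQR rule is just one common convention for flagging outliers, not the only one.

```python
import statistics
from collections import Counter

# Hypothetical toy dataset: one missing age, one duplicated row, one extreme value.
rows = [
    {"id": 1, "age": 23},
    {"id": 2, "age": 25},
    {"id": 3, "age": None},   # missing value
    {"id": 4, "age": 31},
    {"id": 4, "age": 31},     # exact duplicate of the previous row
    {"id": 5, "age": 27},
    {"id": 6, "age": 95},     # unusually large value
]


def summarize(values):
    """Basic measures of central tendency and spread for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def quality_report(rows, column):
    """Count missing values and duplicate rows, and flag outliers in one column."""
    values = [r[column] for r in rows if r[column] is not None]
    missing = len(rows) - len(values)

    # Each extra copy of an identical row counts as one duplicate.
    row_counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(n - 1 for n in row_counts.values())

    # Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]

    return {"missing": missing, "duplicates": duplicates, "outliers": outliers}


def min_max_scale(values):
    """Rescale values linearly to [0, 1], one simple form of normalizing."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [r["age"] for r in rows if r["age"] is not None]
summary = summarize(ages)
report = quality_report(rows, "age")
scaled = min_max_scale(ages)

print(summary)
print(report)
print(scaled)
```

On this toy data the report shows one missing value, one duplicate row and flags 95 as an outlier. Whether such a value is an error or a genuine observation is a judgment call that the visualization and documentation steps are meant to support.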
The goal of EDA is to uncover the story within data, check assumptions and gain familiarity with the dataset before building machine learning models. It forms a critical early step in any data science or analytics process.