ABC of Statistics-Day-1

Introduction

Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, presentation, and organization of data.

In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.

Here are some key aspects of statistics:

Data Collection: Statistics involves gathering data from various sources, which could be experiments, surveys, or observational studies. The way data is collected is critical to ensuring its validity and relevance.
Data Analysis: Once data is collected, statisticians use a variety of methods to analyze it. This includes descriptive statistics, which summarize data through numbers like the mean and standard deviation, and inferential statistics, which allow for conclusions and predictions to be drawn from data.
Interpretation: One of the most crucial steps in statistics is interpreting the results of the data analysis. This involves understanding what the data is saying and, just as importantly, understanding the limitations of the data and the analysis methods used.
Presentation: Effective communication of statistical findings, often through charts, graphs, and tables, is essential. This makes complex data understandable and accessible to those who need to make decisions based on it.
Decision Making: In many cases, the ultimate goal of statistics is to inform decision-making. This can range from business decisions, like how to improve a product, to healthcare decisions, like evaluating the effectiveness of a new treatment.
Prediction and Forecasting: Statistics is often used to make predictions about the future based on existing data. Techniques such as regression analysis, time series analysis, and machine learning are used for forecasting.
Variability and Uncertainty Handling: Statistics provides methods to deal with variability and uncertainty in data. It helps in understanding the randomness in data and making informed decisions despite the inherent uncertainties.
Quantitative Research: In various fields like economics, medicine, psychology, and environmental science, statistics is used for quantitative research and hypothesis testing.

Statistics, thus, plays a crucial role in numerous fields, enabling us to make sense of the vast amounts of data generated in our world today. It is a fundamental tool in research and analysis, aiding in the understanding and solving of complex problems.
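
As a small illustration of the descriptive statistics mentioned above, the sketch below summarizes a sample with Python's standard `statistics` module. The `scores` list is made-up data used only for illustration.

```python
import statistics

# Hypothetical sample: exam scores for ten students (made-up values).
scores = [62, 75, 75, 81, 68, 90, 75, 59, 84, 71]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value
stdev_score = statistics.stdev(scores)    # sample standard deviation (divides by n - 1)

print(f"mean={mean_score}, median={median_score}, mode={mode_score}")
print(f"sample standard deviation={stdev_score:.2f}")
```

The mean, median, and mode describe the centre of the data, while the standard deviation describes how widely the scores are spread around that centre.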

Statistics in Data Science and Machine Learning

Statistics provides methods for designing experiments and surveys, and for making inferences about the characteristics of populations based on sample data. In the context of data science and machine learning, statistics plays several crucial roles:

Data Understanding and Preparation: Statistics helps in understanding data through descriptive statistics such as mean, median, mode, variance, and standard deviation. These metrics provide insights into the data’s central tendency, dispersion, and overall distribution. Understanding these aspects is vital for data cleaning and preparation.
Modeling and Algorithm Selection: Many machine learning algorithms are grounded in statistical theories. For instance, linear regression, logistic regression, and various types of clustering methods are directly based on statistical concepts. Selecting the right algorithm often requires understanding these statistical underpinnings.
Inference and Prediction: Statistics is key to making inferences and predictions from data. It helps in estimating the relationships between variables and in making predictions about future observations. For example, statistical hypothesis testing is used to assess whether an observed pattern reflects a real effect or could plausibly have arisen by random chance alone.
Performance Evaluation: After training a machine learning model, statistics is used to evaluate its performance. Tools such as the confusion matrix, precision, recall, the F1 score, and ROC curves are based on statistical concepts. They help in understanding the strengths and weaknesses of a model.
Experimentation and Validation: In machine learning, experimentation is essential. Statistical methods such as A/B testing and cross-validation are used to validate models and ensure their effectiveness and reliability before deploying them in real-world applications.
Dealing with Uncertainty: Machine learning models often have to deal with uncertainty in data. Statistics provides tools to quantify, manage, and make decisions under uncertainty, for instance, through probabilistic models and Bayesian methods.
Feature Engineering and Selection: Statistical methods help in identifying significant variables (features) that have more predictive power. Techniques like correlation analysis and principal component analysis (PCA) are used for feature selection and dimensionality reduction.
Ethical and Responsible AI: Statistics plays a role in ensuring that machine learning models are fair, ethical, and unbiased. Statistical analysis can help identify and mitigate biases in data and models.
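
To make the performance evaluation point concrete, here is a minimal sketch that computes the confusion-matrix counts, precision, recall, and F1 score by hand. The `actual` and `predicted` labels are invented for illustration and do not come from any real model.

```python
# Hypothetical labels for a binary classifier (1 = positive, 0 = negative).
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

pairs = list(zip(actual, predicted))

# The four cells of the confusion matrix.
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives

precision = tp / (tp + fp)  # of the predicted positives, how many were correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In practice a library such as scikit-learn would usually compute these metrics, but the arithmetic is the same as in this sketch.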

In essence, statistics forms the backbone of data science and machine learning, providing the necessary tools and methodologies for extracting insights and knowledge from data. Its importance cannot be overstated, as it enables practitioners to make data-driven decisions and build intelligent systems that are effective, reliable, and ethical.

Why is data important for statistics?

Before you can use statistics to analyze a problem, you must convert information about the problem into data. That is, you must establish or adopt a system of assigning values, most often numbers, to the objects or concepts that are central to the problem in question.

For instance, when you buy something at the store, the price you pay is a measurement: it assigns a number signifying the amount of money that you must pay to buy the item. Similarly, when you step on the bathroom scale in the morning, the number you see is a measurement of your body weight. Depending on where you live, this number may be expressed in either pounds or kilograms, but the principle of assigning a number to a physical quantity (weight) holds true in either case.

Data need not be inherently numeric to be useful in an analysis. For instance, the categories male and female are commonly used in both science and everyday life to classify people, and there is nothing inherently numeric about these two categories. Similarly, we often speak of the colors of objects in broad classes such as red and blue, and there is nothing inherently numeric about these categories either. (Although you could make an argument about different wavelengths of light, it’s not necessary to have this knowledge to classify objects by color.)

This kind of thinking in categories is a completely ordinary, everyday experience, and we are seldom bothered by the fact that different categories may be applied in different situations. For instance, an artist might differentiate among colors such as carmine, crimson, and garnet, whereas a layperson would be satisfied to refer to all of them as red. Similarly, a social scientist might be interested in collecting information about a person’s marital status in terms such as single-never married, single-divorced, and single-widowed, whereas to someone else, a person in any of those three categories could simply be considered single. The point is that the level of detail used in a system of classification should be appropriate, based on the reasons for making the classification and the uses to which the information will be put.
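
The idea of choosing a level of detail can be sketched in code. In the example below, the survey `responses` and the `COARSE` mapping are hypothetical. They show how the same answers can be counted at a detailed level or collapsed into broader categories.

```python
from collections import Counter

# Hypothetical survey responses recorded at the detailed level.
responses = [
    "single-never married",
    "married",
    "single-divorced",
    "single-widowed",
    "married",
    "single-never married",
]

# Hypothetical mapping from detailed categories to a coarser classification.
COARSE = {
    "single-never married": "single",
    "single-divorced": "single",
    "single-widowed": "single",
    "married": "married",
}

detailed_counts = Counter(responses)
coarse_counts = Counter(COARSE[r] for r in responses)

print(detailed_counts)
print(coarse_counts)
```

Recording the detailed categories keeps both views available, because detailed data can always be collapsed later, whereas coarse data cannot be split back into its finer categories.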

Measurements

Measurement is the process of systematically assigning numbers to objects and their properties to facilitate the use of mathematics in studying and describing objects and their relationships. Some types of measurement are fairly concrete: for instance, measuring a person’s weight in pounds or kilograms or his height in feet and inches or in meters. Note that the particular system of measurement used is not as important as the fact that we apply a consistent set of rules: we can easily convert a weight expressed in kilograms to the equivalent weight in pounds, for instance. Although any system of units may seem arbitrary (try defending feet and inches to someone who grew up with the metric system!), as long as the system has a consistent relationship with the property being measured, we can use the results in calculations.

Measurement is not limited to physical qualities such as height and weight. Tests to measure abstract constructs such as intelligence or scholastic aptitude are commonly used in education and psychology, and the field of psychometrics is largely concerned with the development and refinement of methods to study these types of constructs. Establishing that a particular measurement is accurate and meaningful is more difficult when it can’t be observed directly. Although you can test the accuracy of one scale by comparing results with those obtained from another scale known to be accurate, and you can see the obvious use of knowing the weight of an object, the situation is more complex if you are interested in measuring a construct such as intelligence. In this case, not only are there no universally accepted measures of intelligence against which you can compare a new measure, there is not even common agreement about what “intelligence” means.

To put it another way, it’s difficult to say with confidence what someone’s actual intelligence is because there is no certain way to measure it, and in fact, there might not even be common agreement on what it is. These issues are particularly relevant to the social sciences and education, where a great deal of research focuses on just such abstract concepts.
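
The point about consistent rules of measurement can be sketched with a unit conversion. The function names and the sample `weights_kg` values below are illustrative assumptions. The conversion factor is the standard definition of the international pound.

```python
KG_PER_POUND = 0.45359237  # exact, by definition of the international pound


def kg_to_pounds(kg: float) -> float:
    """Convert a weight in kilograms to pounds."""
    return kg / KG_PER_POUND


def pounds_to_kg(lb: float) -> float:
    """Convert a weight in pounds to kilograms."""
    return lb * KG_PER_POUND


# Hypothetical body weights in kilograms.
weights_kg = [50.0, 72.5, 90.0]
weights_lb = [kg_to_pounds(w) for w in weights_kg]

# The ratio between two weights is the same in either unit system.
ratio_kg = weights_kg[2] / weights_kg[0]
ratio_lb = weights_lb[2] / weights_lb[0]

print([round(w, 1) for w in weights_lb])
print(round(ratio_kg, 3), round(ratio_lb, 3))
```

Because both systems relate consistently to the same physical property, comparisons such as "this person weighs 1.8 times as much as that one" hold whichever unit is used.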

These concepts of measurement have been adapted from Statistics in a Nutshell, 2nd Edition.

Scales or levels of measurements

Below is a tabulated summary of the scales or levels of measurement in statistics:

Scale	Definition	Examples	Important Information
Nominal	Categorizes data without a natural order or ranking.	Gender, Nationality	Only used for labeling; mathematical operations are not meaningful.
Ordinal	Categorizes data with a natural order, but intervals are not consistent.	Movie ratings (e.g., 1-5 stars), Economic class (e.g., low, middle, high)	Indicates order, but differences between values are not standardized.
Interval	Numeric scale where intervals between values are consistent, but there is no true zero point.	Temperature in Celsius or Fahrenheit, Calendar years	Allows for meaningful addition and subtraction; multiplication and division are not meaningful.
Ratio	Similar to interval, but with a meaningful zero point, allowing for all mathematical operations.	Height, Weight, Age, Income	Allows for all mathematical operations, including meaningful ratios.

This table encapsulates the essential aspects of each measurement level, including their definitions, typical examples, and important considerations for their use in statistical analysis and data science.
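
The difference between the interval and ratio scales can be illustrated with temperature. This is a minimal sketch: the two temperature readings are made up, and `celsius_to_kelvin` is a helper written for this example.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


# Two hypothetical temperature readings in Celsius.
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
diff_c = t2_c - t1_c

# Dividing Celsius values gives 2.0, but 20 °C is not "twice as hot" as 10 °C,
# because 0 °C is an arbitrary reference point rather than a true zero.
naive_ratio = t2_c / t1_c

# On the Kelvin scale the zero point is meaningful, so the ratio is too.
kelvin_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(f"difference: {diff_c} degrees")
print(f"Celsius ratio (not meaningful): {naive_ratio}")
print(f"Kelvin ratio (meaningful): {kelvin_ratio:.3f}")
```

The Kelvin ratio is only slightly above 1, which shows how misleading a ratio computed on an interval scale can be.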

Data Types

Qualitative vs. Quantitative Data Types

Aspect	Qualitative Data	Quantitative Data
Definition	Data that describes qualities or characteristics.	Data that can be measured or counted.
Nature	Non-numeric, descriptive.	Numeric, measurable.
Examples	Colors, textures, smells, opinions, genres.	Height, weight, temperature, scores, quantities.
Analysis	Categorization, thematic analysis, content analysis.	Statistical analysis, mathematical calculations.
Measurement	Nominal or ordinal scales.	Interval or ratio scales.
Purpose	Understanding complex concepts, opinions, or experiences.	Quantifying characteristics, making predictions, testing hypotheses.

Categorical vs. Numerical Data Types

Aspect	Categorical Data	Numerical Data
Definition	Data that represents groups or categories.	Data that represents quantities and can be measured.
Types	Nominal (no inherent order) and Ordinal (ordered categories).	Discrete (countable numbers) and Continuous (measurable quantities).
Examples	Gender, nationality, blood type.	Age, salary, temperature, distance.
Analysis	Used for classification, sorting, grouping.	Used for statistical calculations, comparisons.
Characteristic	Often non-numeric, but can be coded numerically.	Inherently numeric.
Use in Research	Identifying subgroups, exploring data distribution.	Performing calculations, establishing correlations.

These tables should help differentiate these data types and provide a clear understanding of their uses and characteristics in data analysis, statistics, and research methodologies.
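
The sketch below shows how the two kinds of data are typically summarized. The `records` are hypothetical. Categorical values are counted, while numerical values support arithmetic such as the mean. It also shows that coding categories as numbers only relabels them.

```python
from collections import Counter
import statistics

# Hypothetical records: (blood type, age in years).
records = [("A", 34), ("O", 29), ("B", 41), ("O", 52), ("AB", 38), ("O", 46)]

blood_types = [blood for blood, _ in records]  # categorical (nominal)
ages = [age for _, age in records]             # numerical

# Categorical data: count the members of each group and find the mode.
blood_counts = Counter(blood_types)
most_common_type = blood_counts.most_common(1)[0][0]

# Categories can be coded numerically, but the codes are only labels:
# averaging them would not produce a meaningful "average blood type".
codes = {label: i for i, label in enumerate(sorted(set(blood_types)))}
coded = [codes[blood] for blood in blood_types]

# Numerical data: arithmetic summaries are meaningful.
mean_age = statistics.mean(ages)

print(blood_counts, most_common_type)
print(codes, coded)
print(mean_age)
```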

In summary:

Qualitative Data Types (Categorical): Nominal and Ordinal scales are used for categorizing and ranking data without implying a numeric nature.
Quantitative Data Types (Numerical): Interval and Ratio scales are used for numeric data, allowing for a wide range of arithmetic and statistical operations. Ratio data includes a true zero point, which differentiates it from interval data.

A few other data types that are important for statistical analysis are listed in this table:

Data Type	Description	Examples
Discrete Data	Data that can only take specific values (typically integers). These values are countable and have gaps between them.	Number of children in a family, number of cars in a parking lot.
Continuous Data	Data that can take any value within a range. These values are measurable and can be infinitely subdivided.	Height, weight, temperature, time.
Binary Data	A special case of nominal data with only two categories or states (0 or 1, True or False, Yes or No).	Outcome of a coin flip (heads or tails), a light switch (on or off).
Categorical Data	Data that can be divided into groups, which may or may not have a logical order.	Blood type (A, B, AB, O), types of cuisine (Italian, Chinese, Indian).
Ordinal Categorical Data	A type of categorical data with a clear ordering or ranking.	Star ratings for a hotel (1-star, 2-star, 3-star), education level (high school, undergraduate, graduate).
Time Series Data	Data points collected or recorded at regular time intervals.	Daily stock market prices, monthly rainfall amounts.
Spatial Data	Data that has a geographical or spatial component.	Locations on a map, regions in a geographic information system (GIS).
Multivariate Data	Data involving multiple variables or attributes.	Data sets containing demographics, economic indicators.
Structured Data	Data that adheres to a predefined model or schema, like in databases.	Relational database tables, Excel spreadsheets.
Unstructured Data	Data that doesn’t fit into a conventional database schema.	Text files, multimedia content, web pages.
Semi-Structured Data	A mix of structured and unstructured data formats.	Emails (structured headers, unstructured body), XML and JSON documents.
Boolean Data	Data with only two possible values.	True/False questions, On/Off switches.
Nominal Data	Categorizes data without any order or rank.	Types of animals, varieties of fruits.
Textual Data	Data consisting of words, sentences, or paragraphs.	Books, articles, social media posts.
Audio Data	Data in the form of sound.	Recorded speeches, music files.
Video Data	Sequences of images (frames).	Movies, surveillance footage.
Image Data	Visual data in the form of pixels.	Photographs, paintings.
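
Ordinal categorical data needs its order stated explicitly, because alphabetical sorting does not reflect the ranking. The sketch below uses the education levels from the table. The `responses` list is made up, and `ORDER` and `rank` are names chosen for this example.

```python
# The ranking of the categories, from lowest to highest.
ORDER = ["high school", "undergraduate", "graduate"]
rank = {level: position for position, level in enumerate(ORDER)}

# Hypothetical survey responses.
responses = ["graduate", "high school", "undergraduate", "undergraduate", "high school"]

# Sort by rank rather than alphabetically.
sorted_responses = sorted(responses, key=lambda level: rank[level])

# With an odd number of responses, the median is the middle element.
median_level = sorted_responses[len(sorted_responses) // 2]

print(sorted_responses)
print("median level:", median_level)
print("alphabetical order (misleading):", sorted(responses))
```

The median is a sensible summary for ordinal data because it relies only on order. The mean is not, since the distances between levels are not defined.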

Learning Resources

Learn from our YouTube channel in Urdu/Hindi. Here is the link to the playlist:

ABC of Statistics for Data Science and Machine Learning.


18 Comments.

Nisar Ahmad says:
February 11, 2024 at 12:20 pm
Thanks worthy sir, JazakAllah o Khair!
Ashok sharma says:
February 9, 2024 at 8:58 am
this blog is very useful for those who is beginner in data scientists
Umar Lodhi says:
January 24, 2024 at 2:37 pm
Assalamu alaikum! Sir, this is excellent. I had been searching the internet for many days for a way to study statistics. MashaAllah, your lectures have impressed me greatly and I am learning from them. JazakAllah. May Allah, the Lord of Honour, grant you a good reward and always keep you happy. Ameen.
Aftab Ahmad says:
January 7, 2024 at 9:59 pm
GREAT SIR G
saima Shahzadi says:
December 18, 2023 at 10:08 pm
done
Muhammad Naeem says:
December 9, 2023 at 9:07 pm
Very easy way to teach. Understand very easily
saima saeed says:
December 2, 2023 at 9:25 pm
This is very simple way to teach students, really sir I get much knowledge from this blog.
Javed Ali says:
November 28, 2023 at 3:36 pm
AOA, This blog post provides an introduction to statistics and covers topics such as the definition of statistics, types of data, and the importance of statistics in data science. You have done a good job of explaining the concepts in a simple and easy-to-understand manner. The use of examples and illustrations makes the blog post engaging and informative. Overall, I found the blog post to be a great resource for me. May Allah grant you the blessings of both worlds. Ameen.
Areej Panhwer says:
November 26, 2023 at 7:24 pm
very well explaination of Statistics in this blog , this blog is very helpful to understand statistics for Datascience and Machine Learning … Thank u Sir
Danish Ammar says:
November 25, 2023 at 3:42 am
Gr8 lecture
Muhammad Haroon says:
November 24, 2023 at 11:08 pm
This is very simple way to teach students, really sir I get much knowledge from this blog, Jazak Allah
Nimra Ishaq says:
November 24, 2023 at 1:38 pm
Well explained.
Asif Bokhari says:
November 23, 2023 at 11:25 pm
Asslam o allakam , I was a student of Physics ,these nots can help me to analyse data batter
Ansa Anjum says:
November 23, 2023 at 7:03 pm
This blog is superb and easily understandable for all those who have no any background to statics ……Its first time in my life i take interest in statistic and this because of yours easy way of explaining of Statistic
Najeeb Ullah says:
November 23, 2023 at 1:36 am
done
RASHAD MASUD says:
November 22, 2023 at 8:48 am
DOING GOOD WORK
Shahid Umar says:
November 22, 2023 at 6:55 am
This blog post contains the starter topics of statistics, if anyone follows these topics then they can easily improve their statistical expertise.
Bushra says:
November 22, 2023 at 5:45 am
Very well explained
