The Troubles of Missing Values

Handling missing values in Data

“How Should Missing Values Be Handled, and Why Does Handling Them Matter?” Their Role in the World of Data Science 🤔🛠️

Assalamualaikum, dear students of data science! 🌟 Every data scientist or researcher who has dealt with missing values, that is, data that has gone absent, has a sense of how much they matter and how much trouble they bring. In the world of data science we run into missing values often. If a lucky few of you have never faced this problem, you really are fortunate! 😄 But those who have faced it need no convincing about how many problems missing values can cause.

A Job, Missing Values, and One Big Mistake 😢

At a well-known Lahore company, “Codanics Solutions”, Ahmed was a talented data scientist. He always gave his projects top priority, and he was highly respected at the company for it. 🌟

One day, Ahmed was having lunch with his friends. 🍛

Ali (another data scientist): “Ahmed bhai! I hear you’ve got a new project?”

Ahmed: “Yes, Ali. I have to analyze customers’ buying habits. But the data has some missing values. I don’t think it will be a problem if I just ignore them.” 😕

Ali: “Bhai, never ignore missing values. This small thing can ruin a model’s performance.”

But Ahmed brushed off Ali’s advice and started working his own way.

When the model was ready and was tested on real-world data, its predictions were not accurate at all. 😲 The company took a heavy loss because of it.

The CEO, Mr. Usman, called Ahmed into his office. 🏢

Mr. Usman: “Ahmed, we have taken a very big loss on this project. What went wrong?”

Ahmed: “Sir, I thought a few missing values wouldn’t matter. But now I understand that I was wrong.” 😔

Mr. Usman: “Ahmed, you know that in data science even the smallest mistake can create a big problem. I’m sorry, but we will have to let you go.”

Ahmed was deeply sorry. He realized that data should never be taken lightly. He went home and called Ali. 📞

Ahmed: “Ali, you were right. I’ve been fired from the company.”

Ali: “I’m sorry to hear that. But Ahmed, every mistake teaches us something. You’ll work in a better way from now on.”

Ahmed learned from his mistake and began paying special attention to missing values and data preprocessing. A few months later he started a job at another company, and there he proved that he is an expert data scientist. But he never forgot the lesson of that one mistake.

Now, if you want to take the same risk Ahmed did, go ahead and ignore this blog before learning about missing values. But if you’re interested, believe me, this blog is going to do wonders for your Data Science and AI journey. I know you must be wondering what is so special in here. So, Pola Payen, shall we start? Yes, Bholay, are you ready?

I know these are nicknames, but missing values have plenty of nicknames of their own. By the way, will you write your own nickname in the comments?

The Nicknames of Missing Values

If you too are a product of desi culture, you must have plenty of nicknames as well, right? Like Achoo, Billa, Bhola, Pola, Saji, Kala, Chitta, Mota, Chota, Kaddu, etc. I’m not the one saying it; look around anywhere and you’ll find names like these, and some are even used with great respect, such as Paye Kalay. In the same way, missing values have quite a few names, and if you don’t know them you’ll get confused. So let’s take a look!

Missing values go by different names depending on the context and on the domain or field under discussion. The names commonly used in data science and statistics are:

  1. NA (Not Available)
  2. NaN (Not a Number): Used especially in programming, for example in Python’s pandas library.
  3. Null: The term used in SQL and database management systems.
  4. Undefined
  5. Blank or Empty
  6. Placeholder Values: Sometimes default values are set that we can recognize as not being actual data, for example putting -1 or 999 in an age field.
  7. Sentinel Values: These are also a kind of placeholder value, used to represent specific conditions.
  8. Dummy Data: Used as a placeholder or for testing purposes.
  9. Missing Data: The term commonly used in research papers.

Some of these terms belong to specific situations or tools, while others are in general use. Whenever you analyze or preprocess data, it is essential to recognize these different forms of missing values and handle them the right way.
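To make the nicknames concrete, here is a minimal sketch, assuming pandas and NumPy are installed. The tiny CSV, its column names, and the choice of -1 and 999 as age placeholders are made up for illustration; your own data will have its own spellings of "nothing here".

```python
import io

import numpy as np
import pandas as pd

# Hypothetical raw export: the same "no value" idea is spelled several ways.
raw_csv = io.StringIO(
    "customer,age,city\n"
    "Ahmed,34,Lahore\n"
    "Ali,-1,NA\n"
    "Sara,999,\n"
    "Usman,41,null\n"
)

# keep_default_na=False switches off pandas' built-in list, so the explicit
# na_values list below is the only thing deciding what counts as missing.
df = pd.read_csv(
    raw_csv,
    keep_default_na=False,
    na_values=["", "NA", "null"],
)

# Placeholder/sentinel codes are valid numbers, so they must be mapped by hand.
df["age"] = df["age"].replace([-1, 999], np.nan)

# Count the missing cells per column once every nickname has become NaN.
print(df.isna().sum())
```

Once everything is a real NaN, the usual tools (`isna`, `fillna`, `dropna`) see all the gaps at once instead of only the ones that happened to be spelled "NA".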

Why Is Handling Missing Values So Important? 🤷‍♂️

  1. A Deep Impact on Model Accuracy: 💔 Missing values can lower the accuracy of machine learning models and hurt their performance.
  2. Questions About Data Quality: 📉 Missing values weaken the quality of the data, which can lead to misunderstandings in our analysis and decisions.
  3. Model Training Time Goes Up: ⏱️ Sometimes missing values increase model training time, which wastes both resources and time.

Wait, Have a Little Patience

Missing values are common in any dataset, but deciding whether to remove a column depends on a few factors:

  1. Quantity of Data: If you have a lot of data and a specific column has a very high share of missing values (say, 70% or 80%), removing that column may be the better choice, because it is hard to get any use out of it.
  2. Importance of the Column: If the column with missing values is very important to your analysis or model, removing it is not a good idea. In that case you can use imputation methods to fill in the missing values.
  3. Nature of the Data: Sometimes the missingness itself indicates something. For example, in a survey, an unanswered question may indicate that the participant was not comfortable with it. In such cases, simply deleting or replacing the missing value would not be right.
  4. Model Sensitivity: Some machine learning models can handle missing values, while others are sensitive to them. If your model is sensitive to missing values, you will have to handle them first.
  5. Type of Data: In numeric data, missing values can be replaced with the mean, median, or mode. In categorical data, they can be replaced with the mode or with a specific category.
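One way to respect point 3 is to record that a value was missing before filling it in. The sketch below is only an illustration under assumptions: the survey table, the `income` column, and the `income_missing` flag name are all hypothetical, and the median is just one reasonable fill for a numeric column.

```python
import numpy as np
import pandas as pd

# Hypothetical survey: some respondents skipped the income question.
survey = pd.DataFrame(
    {
        "respondent": ["r1", "r2", "r3", "r4", "r5"],
        "income": [52000.0, np.nan, 61000.0, np.nan, 48000.0],
    }
)

# Record the fact that the answer was missing BEFORE filling anything in,
# so "declined to answer" survives as its own signal.
survey["income_missing"] = survey["income"].isna().astype(int)

# Then impute the numeric value; the median is less affected by outliers
# than the mean.
survey["income"] = survey["income"].fillna(survey["income"].median())

print(survey)
```

The flag column keeps the information that the respondent skipped the question, even though the `income` column itself no longer has gaps.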

As a general rule, if more than 50% of a column’s data is missing, you should consider whether removing it would be better. But this is not a hard and fast rule. Every dataset is unique and has its own requirements, so you have to decide how to handle missing values in the context of each dataset.
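A quick way to see where each column stands is to compute its share of missing cells. This is a rough sketch, assuming pandas; the table is invented, and the 0.5 cutoff simply mirrors the 50% rule of thumb, so treat it as a starting point rather than a law.

```python
import numpy as np
import pandas as pd

# Hypothetical customer table; "fax" is mostly empty, "age" has one gap.
df = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "age": [34, np.nan, 29, 41, 38],
        "fax": [np.nan, np.nan, np.nan, "042-111", np.nan],
    }
)

# isna().mean() gives the fraction of missing cells in each column.
missing_share = df.isna().mean()
print(missing_share)

# Assumed cutoff based on the 50% rule of thumb; tune it per dataset.
THRESHOLD = 0.5
to_drop = missing_share[missing_share > THRESHOLD].index.tolist()
reduced = df.drop(columns=to_drop)

print("Dropped:", to_drop)
```

Before actually dropping anything, it is worth checking the list against the column's importance: an important column above the cutoff may still deserve imputation instead.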

Detailed Ways to Handle Missing Values 🧐

  1. Getting the Data Again From the Original Source: 🔄 If you still have access to the source the data came from, you can retrieve the missing values from there.
  2. Imputing With the Mean, Median, or Mode: 📊 If you have numerical data, missing values are replaced with the mean or median. For categorical data, the mode is used.
  3. Using Forward or Backward Fill: 🚶‍♂️🏃‍♂️ Some datasets follow a time or date sequence. In such datasets, a row’s missing value is replaced with the value from the previous or next row.
  4. Using KNN Imputation: 🧑‍🤝‍🧑 This is an advanced technique in which a missing value is replaced with the average of the values from its nearest neighboring data points. Libraries such as scikit-learn provide this method.
  5. Using Deep Learning Techniques: 🧠 Deep learning techniques such as autoencoders can also prove helpful in handling missing values.
  6. Simply Deleting: ❌ If the number of missing values in your dataset is very small, you can also delete that specific row or column.
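Methods 2, 3, and 6 can be sketched in a few lines of pandas. This is a minimal illustration, not the one right recipe: the daily sales table and its column names are hypothetical, and which fill suits which column is a judgment call for your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales log with gaps in several columns.
df = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=6, freq="D"),
        "units": [10.0, np.nan, 14.0, 100.0, np.nan, 12.0],
        "city": ["Lahore", "Karachi", None, "Lahore", "Lahore", None],
        "stock": [50.0, np.nan, np.nan, 44.0, 41.0, np.nan],
    }
)

filled = df.copy()

# Method 2, numeric column: the median resists the outlier (100) better
# than the mean would.
filled["units"] = filled["units"].fillna(filled["units"].median())

# Method 2, categorical column: use the most frequent value (the mode).
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

# Method 3, time-ordered column: carry the last known value forward;
# bfill() then covers any gap at the very start (there is none here).
filled["stock"] = filled["stock"].ffill().bfill()

# Method 6: drop only the rows where "units" is missing.
dropped = df.dropna(subset=["units"])

print(filled)
print(len(df), "rows before dropping,", len(dropped), "after")
```

Note how the median fill for `units` lands near the typical values, whereas the mean of the observed values would have been pulled up by the single 100.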

And What If I Don’t Handle Them?

Well then, kiddo, the model will definitely not work well. And that’s not all, listen to this!

If we ignore missing values, we may face several problems. Here are some that can arise:

  1. 📉 Drop in Model Accuracy: The accuracy of machine learning models can fall, because the model does not get complete information.
  2. 📊 Wrong Analysis: Data analysis can produce wrong results, which can negatively affect decisions.
  3. 😕 Model Confusion: Some models cannot handle missing values, so the model either fails to train or makes wrong predictions.
  4. 🤖 Bias in the Model: Missing values raise the risk of bias creeping into the model.
  5. 📚 Misinterpreting the Data: Missing values can leave us with incomplete or wrong information, which can lead us to interpret the data incorrectly.
  6. 💾 Storage Issues: If missing values are not replaced, storage problems can also arise, because some systems cannot store missing values.
  7. 🔀 Data Integration Problems: If data coming from different sources contains missing values, integration can become a problem.
  8. 🚫 Wrong Feature Selection: In the presence of missing values, some important features that should influence the model may get ignored.
  9. 🧪 Wrong Experimental Results: In science or research projects, missing values can lead to wrong experimental results.
  10. 😰 Stress and Extra Work: Data scientists may have to do extra work during analysis, because missing values have to be identified and handled.

That is why handling missing values is so important, so that we can avoid the problems listed above. 🛠️🔧🔍
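A tiny illustration of the "wrong analysis" problem, assuming NumPy: a single unhandled NaN is enough to wipe out a summary statistic. The order amounts below are made up.

```python
import numpy as np

# Hypothetical order amounts; one value was never recorded.
amounts = np.array([1200.0, 950.0, np.nan, 1100.0])

# A plain mean lets the single NaN swallow the whole result.
naive_mean = np.mean(amounts)

# nanmean skips the gap explicitly, so the decision to ignore it is
# visible in the code rather than accidental.
safe_mean = np.nanmean(amounts)

print(naive_mean)
print(safe_mean)
```

Skipping the gap is itself a handling choice; the point is only that it should be made on purpose.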

Conclusion: Missing Values, a Big Challenge but Also an Excellent Opportunity 🌟

Dealing with missing values is a challenge for every data scientist, but it also teaches us how to improve the quality of our data. In the end, better and smarter insights and models come from better-quality data.

Friends, remember: the next time you get the chance to deal with missing values, there is no need to be afraid! 🚀 It is an opportunity to learn and to move your data science journey forward with strength.

In this blog I have explained Data Wrangling in detail, desi style

Also explore the following one: How to handle Missing Values?

A desi guide to Exploratory Data Analysis

63 Comments.

  1. I appreciate the effort you put into your blogs. Your engaging writing style and well-organized content have made it easy for me to follow and comprehend the material. Thank you for making learning enjoyable.

  2. If we don’t deal with the missing values then it will create a gap in data and may skew the distribution to the wrong way. So, it is better to deal the missing values at the initial stage.

  3. “This blog provides valuable insights and practical techniques for effectively dealing with missing values. It emphasizes the importance of not solely relying on removing missing values but instead encourages utilizing the methods mentioned above to address this issue appropriately.”.. Thank you, Sir.

  4. Very well explained. I’m sure whenever we are going to handle missing values we will remember this blog 😊
    Thanks a lot, sir, for your desi way of teaching!

  5. Respected Doctor Sahib,

    Like all of your blogs, these two are excellent as well, and on this topic they are on par with, or rather better than, any English-language blog, because you explain everything so well in a desi style.

  6. You can not leave important or valuable values on chance or empty in any data set. In some cases empty values that are not of any importance, in a data set may be dropped or can be compromised. but important values , in context to ones requirement must be well maintained to evaluate authentic conclusions. jazakallah

  7. If you want to put “scientist” next to your name, then you should go toward hard work rather than looking for shortcuts and easy ways out. Big respect, Dr. Ammar. It’s a lovely, unique style of teaching.

  8. Excellent. Very nice. Nobody but Baba Ji can blend it up and pour it into your brain this easily. Try it and see.

  9. Sir, in “Missing Values Ko Handle Karne Ke Mufassal Tariqay 🧐” (the detailed ways to handle missing values), point 6, “Simply Delete Kar Dena: ❌”, says that if the number of missing values in your dataset is very small, you can also delete that specific row or column. I think this got reversed; it should say “large” instead of “small” here. Everything else is excellent.

    1. If missing values are less than 20% then we have to replace them or find another way like imputing the values of mean, median or mode or use other technique like K-mean cluster. Simple cannot delete them. If the missing values are more than 70% then simply remove the column. If data have missing values between 20% and 70% we can use more techniques like grouping to analyze data, K-NN, and feature engineering and so on and so forth

  10. This blog is Very informative and give us many ways to handle with missing values …always the removal of missing values is not good we should to handle with this problem with above mentioned methods …Thanks alot sir for giving such best knowledge
