Join the conversation
done once again
Reply
Baba ji is wearing a suit today
Reply
I learned about data inconsistencies:
1: Inconsistent formats (e.g. different date formats in the date column)
2: Inconsistent naming conventions (e.g. Usa, U.S.A, usa, United States, United States of America)
3: Typographical errors (mistakes in data entry, e.g. Pakistan vs. Paaakistannn)
4: Duplicates (multiple occurrences of the same row)
5: Contradictions (logical inconsistencies, e.g. father_age < son_age, which is not logically possible)
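A rough sketch of how points 2, 4 and 5 might be handled in pandas. The sample rows, the country_map dictionary and the age_conflict column are all made up for illustration; a real dataset would need its own mapping and rules.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "country": ["Usa", "U.S.A", "usa", "United States", "Pakistan", "Pakistan"],
    "father_age": [50, 48, 30, 55, 60, 60],
    "son_age": [20, 18, 35, 25, 30, 30],
})

# 2: map naming variants to one canonical label (assumed mapping)
country_map = {
    "usa": "USA",
    "u.s.a": "USA",
    "united states": "USA",
    "united states of america": "USA",
}
key = df["country"].str.strip().str.lower()
# Values missing from the mapping keep their original spelling
df["country"] = key.map(country_map).fillna(df["country"])

# 4: drop rows that are exact duplicates of an earlier row
df = df.drop_duplicates().reset_index(drop=True)

# 5: flag rows where the father is not older than the son
df["age_conflict"] = df["father_age"] <= df["son_age"]

print(df)
```

Flagging the contradictory rows instead of deleting them keeps them available for manual review, since it is usually unclear which of the two ages is the wrong one.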
Reply
very nice
done
Reply
Done
Reply
import pandas as pd

# Sample DataFrame
data = {
    'date': ['2021-12-01', '01-12-2022', 'dec-22-2022', '2021/12/12'],
    'country': ['USA', 'UK', 'United States of America', 'UK'],
    'name': ['nazra', 'nazra', 'Nazar', 'naz'],
    'age': [21, 22, 23, 24],
    'city': ['lahore', 'karach', 'lahore', 'lahore'],
    'sale': [100, None, 300, 400]
}
df = pd.DataFrame(data)

# Define a function to standardize the date format
def standardize_date(date_str):
    # Parse each value on its own so that mixed formats are handled;
    # errors='coerce' turns unparseable values into NaT instead of raising
    date_obj = pd.to_datetime(date_str, errors='coerce')
    if pd.isna(date_obj):
        # If parsing fails, return NaT
        return pd.NaT
    return date_obj.strftime("%m/%d/%Y")

# Apply the function to the 'date' column
df['date'] = df['date'].apply(standardize_date)
print(df)

Output:

         date                   country   name  age    city   sale
0  12/01/2021                       USA  nazra   21  lahore  100.0
1  01/12/2022                        UK  nazra   22  karach    NaN
2  12/22/2022  United States of America  Nazar   23  lahore  300.0
3  12/12/2021                        UK    naz   24  lahore  400.0
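The comment in the code says "try parsing the date with different formats", but pd.to_datetime guesses the format by itself, so an ambiguous value such as 01-12-2022 is read month-first without any warning. One possible alternative, sketched here with the standard library only, is to try an explicit list of formats in order. KNOWN_FORMATS and parse_with_formats are hypothetical names, and the order of the list is an assumption that decides how ambiguous values are read.

```python
from datetime import datetime

# Assumed list of accepted formats, tried in order; the order decides
# how ambiguous strings such as '01-12-2022' are interpreted
KNOWN_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%m-%d-%Y", "%b-%d-%Y"]

def parse_with_formats(date_str, formats=KNOWN_FORMATS):
    """Return the date as MM/DD/YYYY, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(date_str, fmt).strftime("%m/%d/%Y")
        except ValueError:
            # This format did not match; try the next one
            continue
    return None

dates = ['2021-12-01', '01-12-2022', 'dec-22-2022', '2021/12/12', 'not a date']
cleaned = [parse_with_formats(d) for d in dates]
print(cleaned)
```

The same function could be passed to df['date'].apply(...) in place of standardize_date. Values that match none of the listed formats come back as None, so they can be counted and inspected.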
Reply
How can inconsistencies be removed in big data?
Reply
Excellent sir
Reply