I have a code that converts all the dates in the specified date column to the specified date format. In case the date is invalid or empty, it gets replaced by the replacement value specified.
My code works as expected for all values except the following two:
1. 12-2012-18 (Gets considered invalid & is being replaced by the replacement value).
2. 2008,12,02 (Converted to "02,01,2008" instead of "02,12,2008" or "12,02,2008").
What I have tried:
CODE:
def date_fun(df, col_name, date_format, replace_value):
# Convert column values to string
df[col_name] = df[col_name].astype(str)
# Convert serial date values to datetime values
def handle_serial_date(val):
try:
return pd.to_datetime(float(val), unit='D')
except:
return val
df[col_name] = df[col_name].apply(handle_serial_date)
# Convert regular date values to datetime values
df[col_name] = pd.to_datetime(df[col_name], errors='coerce')
# Replace invalid or null dates with the specified value (if any)
if replace_value is not None:
replace_value = pd.to_datetime(str(replace_value), errors='coerce') # convert replace_value to datetime format
df[col_name] = df[col_name].fillna(replace_value) # replace invalid or null dates with replace_value
# Convert the datetime values to the specified format
df[col_name] = df[col_name].dt.strftime(date_format)
new_data = df[col_name].to_dict()
return new_data
Input & Output:
Input:
"0": 25.09.2019
"1": 9/16/2015
"2": 10.12.2017
"3": 02.12.2014
"4": 08-Mar-18
"5": 08-12-2016
"6": 26.04.2016
"7": 05-03-2016
"8": 24.12.2016
"9": 10-Aug-19
"10": abc
"11": 05-06-2015
"12": 12-2012-18
"13": 24-02-2010
"14": 2008,13,02
"15": 16-09-2015
"16": 23-01-1992, 7:45
"17":
"18": 2nd December 2018
"19": 45678
"20": 798
Output: (format= m-d-Y, replacement value=05-06-1987 (for null & invalid dates))
"0": "09-25-2019",
"1": "09-16-2015",
"2": "10-12-2017",
"3": "02-12-2014",
"4": "03-08-2018",
"5": "08-12-2016",
"6": "04-26-2016",
"7": "05-03-2016",
"8": "12-24-2016",
"9": "08-10-2019",
"10": "05-06-1987",
"11": "05-06-2015",
"12": "05-06-1987",
"13": "02-24-2010",
"14": "02-01-2008",
"15": "09-16-2015",
"16": "01-23-1992",
"18": "12-02-2018",
"19": "01-23-2095",
"20": "03-09-1972"
The problematic rows are: 12 & 14. Why is my code not working on these two values?