Click here to Skip to main content
15,885,869 members
Please Sign up or sign in to vote.
1.00/5 (1 vote)
I have a code that converts all the dates in the specified date column to the specified date format. In case the date is invalid or empty, it gets replaced by the replacement value specified.

My code works as expected for all values except the following two:

1. 12-2012-18 (Gets considered invalid & is being replaced by the replacement value).
2. 2008,12,02 (Converted to "02,01,2008" instead of "02,12,2008" or "12,02,2008").

What I have tried:

CODE:
def date_fun(df, col_name, date_format, replace_value):
                
            # Convert column values to string
            df[col_name] = df[col_name].astype(str)
            
            # Convert serial date values to datetime values
            def handle_serial_date(val):
                try:
                    return pd.to_datetime(float(val), unit='D')
                except:
                    return val
            df[col_name] = df[col_name].apply(handle_serial_date)
            
            # Convert regular date values to datetime values
            df[col_name] = pd.to_datetime(df[col_name], errors='coerce')
            
            # Replace invalid or null dates with the specified value (if any)
            if replace_value is not None:
                replace_value = pd.to_datetime(str(replace_value), errors='coerce') # convert replace_value to datetime format
                df[col_name] = df[col_name].fillna(replace_value) # replace invalid or null dates with replace_value
                               
            # Convert the datetime values to the specified format
            df[col_name] = df[col_name].dt.strftime(date_format)
            
            new_data = df[col_name].to_dict()
                        
            return new_data


Input & Output:

Input:
"0": 25.09.2019
"1": 9/16/2015
"2": 10.12.2017
"3": 02.12.2014
"4": 08-Mar-18
"5": 08-12-2016
"6": 26.04.2016
"7": 05-03-2016
"8": 24.12.2016
"9": 10-Aug-19
"10": abc
"11": 05-06-2015
"12": 12-2012-18
"13": 24-02-2010
"14": 2008,13,02
"15": 16-09-2015
"16": 23-01-1992, 7:45
"17": 
"18": 2nd December 2018
"19": 45678
"20": 798

Output: (format= m-d-Y, replacement value=05-06-1987 (for null & invalid dates))

            "0": "09-25-2019",
            "1": "09-16-2015",
            "2": "10-12-2017",
            "3": "02-12-2014",
            "4": "03-08-2018",
            "5": "08-12-2016",
            "6": "04-26-2016",
            "7": "05-03-2016",
            "8": "12-24-2016",
            "9": "08-10-2019",
            "10": "05-06-1987",
            "11": "05-06-2015",
            "12": "05-06-1987",
            "13": "02-24-2010",
            "14": "02-01-2008",
            "15": "09-16-2015",
            "16": "01-23-1992",
            "18": "12-02-2018",
            "19": "01-23-2095",
            "20": "03-09-1972"


The problematic rows are: 12 & 14. Why is my code not working on these two values?
Posted
Updated 30-Apr-23 23:32pm
v2

Because neither of them are known, standard date formats.

If you want to process those, you will have to detect the failure, and write your own code to do the conversion.
 
Share this answer
 
Comments
Apoorva 2022 1-May-23 4:51am    
12-2012-18 doesn't work, but "08-12-2016", a similar format is supported & it converts as expected.
And Pandas supports '-' separator. Isn't it a popular & known date format?
I gave you the answer in your previous post of this question at Transformation of "serial date" to the specified date format[^]. And as you can clearly see, items 12 and 14 are not valid dates so will always convert to NaT Once you have done the initial conversion you need to check each row individually to correct any NaT values.
 
Share this answer
 
Comments
Apoorva 2022 1-May-23 5:30am    
Yes, that's clear. ',' as a separator is not supported by Pandas. I replaced comma with other separators & it worked. But what about '12-2018-18'? That's in a valid date format which works for other values, but not the above combination. I want to know why '12-2018-18' is invalid.
Richard MacCutchan 1-May-23 5:34am    
It is not valid because the converter only works when the year is the first or last field. As far as I am aware there is no country in the world that uses such a format. So you need to convert it first into a valid date string.
Apoorva 2022 1-May-23 8:14am    
Ok. Got it. Thanks.
OriginalGriff 1-May-23 5:51am    
No, there is no standard date format where the year is in the middle, regardless of the separator characters.

Think about it: if the date you handed it was 05-2023-01, is that the fifth of January, or today? It's bad enough with the US being weird and putting the month first, let alone someone inventing even odder ones ... If your data provides dates in that format, it's a) nasty, and b) defined by the data provider - so only you know how to parse it.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900