Click here to Skip to main content
15,885,931 members
Please Sign up or sign in to vote.
1.00/5 (2 votes)
I wrote a function that converts all the dates of the specified date column to the specified format. Any missing or invalid date is replaced with a value specified by the user.

The code also takes "serial dates" like "5679" into consideration. But my code isn't working on the serial numbers. Let me know where the issue is.



Example:

Input: 45678
Expected Output: 2024-06-27  
Current Output: NaN


What I have tried:

Code:

import pandas as pd
import math

def date_fun(df, col_name, date_format, replace_value):
    
    # Convert column values to string
    df[col_name] = df[col_name].astype(str)
    
    # Check if the column contains serial dates
    if df[col_name].str.isnumeric().all():
        # Convert the column to integer
        df[col_name] = pd.to_numeric(df[col_name], errors='coerce')
        
        # Check if the values are within the valid range of serial dates in Excel
        if df[col_name].between(1, 2958465).all():
            df[col_name] = pd.to_datetime(df[col_name], unit='D', errors='coerce')
        else:
            if replace_value is not None:
                df[col_name] = replace_value
            else:
                df[col_name] = "Invalid Date"
    else:
        df[col_name] = pd.to_datetime(df[col_name], errors='coerce')
    
    # Convert the datetime values to the specified format
    df[col_name] = df[col_name].dt.strftime(date_format)
    
    # Replace invalid or null dates with the specified value (if any)
    if replace_value is not None:
        replace_value = str(replace_value) # convert to string
        df[col_name] = df[col_name].fillna(replace_value)
    
    new_data = df[col_name].to_dict()
    
    # Handle NaN and infinity values
    def handle_nan_inf(val):
        if isinstance(val, float) and (math.isnan(val) or math.isinf(val)):
            return str(val)
        else:
            return val
    
    new_data = {k: handle_nan_inf(v) for k, v in new_data.items()}
    
    return new_data



Input
25.09.2019
9/16/2015
10.12.2017
02.12.2014
08-Mar-18
08-12-2016
26.04.2016
05-03-2016
24.12.2016
10-Aug-19
abc
05-06-2015
12-2012-18
24-02-2010
2008,13,02
16-09-2015
23-01-1992, 7:45

2nd December 2018
45678


My output
            "2019/09/25",
            "2015/09/16",
            "2017/10/12",
            "2014/02/12",
            "2018/03/08",
            "2016/08/12",
            "2016/04/26",
            "2016/05/03",
            "2016/12/24",
            "2019/08/10",
            "nan",
            "2015/05/06",
            "nan",
            "2010/02/24",
            "2008/02/01",
            "2015/09/16",
            "1992/01/23",
            "nan",
            "2018/12/02",
            "nan"

Date Format specified: "%Y/%m/%d"
Posted
Updated 26-Apr-23 1:20am
v4
Comments
OriginalGriff 26-Apr-23 6:11am    
This is not a good question - we cannot work out from that little what you are trying to do.
Remember that we can't see your screen, access your HDD, or read your mind - we only get exactly what you type to work with - we get no other context for your project.
Imagine this: you go for a drive in the country, but you have a problem with the car. You call the garage, say "it broke" and turn off your phone. How long will you be waiting before the garage arrives with the right bits and tools to fix the car given they don't know what make or model it is, who you are, what happened when it all went wrong, or even where you are?

That's what you've done here. So stop typing as little as possible and try explaining things to people who have no way to access your project!

Use the "Improve question" widget to edit your question and provide better information.

1 solution

When I use the value 45678 I get the following result:
Python test result:
2095-01-23 00:00:00


[edit]
I think I have found the problem. The following line tries to check if every column of the frame contains only numeric values:
Python
if df[col_name].str.isnumeric().all():

which will never be true. You actually need to test the contents for each row in the frame one at a time.

[/edit]
 
Share this answer
 
v2
Comments
Apoorva 2022 26-Apr-23 7:03am    
Ok. The start date is taken differently by different systems, but I don't understand why it is returning null for me instead of a date. Did you make any changes in my code when executing?
Richard MacCutchan 26-Apr-23 7:08am    
Nor do I, because I do not have access to the data that you are using. And I did not use your code I just wrote a simple test of the pd.to_datetime function.

You need to show the actual values that cause the problem and explain where in the code it goes wrong.
Richard MacCutchan 26-Apr-23 8:23am    
See my update above.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900