I'm new to Python and could use some help. I have one function that takes a pandas DataFrame and returns an ordered dictionary, and another function that takes that ordered dictionary and performs a YAML dump. Both work for a single dictionary, but I want to add another function that can write multiple input dictionaries to one YAML file.

Take these two DataFrames for example, and convert them into two ordered dictionaries:
df1:
      Name        Occupation  Years  JobSatisfaction
0     Lucy Black  Teacher     10     Yes
1     John Doe    Gardener    3      Yes

df2:
      Name        Age  Salary  Company Name
0     Lucy Black  31   $38000  LAUSD
1     John Doe    23   $17000  Beautiful Lawn


Now, how would you read both ordered dictionaries at once and get the YAML output to look like this:
Python
# In this case, key_index='Name' for both df1 and df2:

- df1:
  - Lucy Black:
      - Occupation: Teacher
      - Years: 10
      - JobSatisfaction: Yes
  - John Doe:
      - Occupation: Gardener
      - Years: 3
      - JobSatisfaction: Yes
- df2:
  - Lucy Black:
      - Age: 31
      - Salary: $38000
      - Company Name: LAUSD
  - John Doe:
      - Age: 23
      - Salary: $17000
      - Company Name: Beautiful Lawn


Here is some skeleton code with my preferred code format:
Python
def yaml_output(arguments):
    """
    considerations/notes/checks:
        - we want the output yaml to have one outermost dictionary per input
          excel sheet (so in our case, this will be two total: df1, df2)
        - confirm the first two entries for each dictionary match the example
          yaml file contained in this directory
    """
    # function body goes here
    return


What I have tried:

And this is what my code looks like so far:
Python
from collections import OrderedDict

import yaml


def create_dict(df, key_index='uniqueID'):
    """
    Function create_dict to take input pandas df and return dictionary which will 
    then be used to create final YAML file

    args:
        df: pandas 2-dim labeled data structure
        key_index: the column that we want set as the index
    
    input: pandas df
    returns: dictionary

    """
    # Get the unordered dictionary
    unordered_dict = df.set_index(key_index).T.to_dict()
    
    # Order the dictionary
    ordered_dict = OrderedDict((k, unordered_dict.get(k)) for k in df[key_index])
    
    return ordered_dict
    

def dump_ordered(dictionary):
    """
    Serialize the ordered dictionary into a YAML stream 
    
    args:
        dictionary: ordered collection of data values that were 
        converted from pandas dataframes
    
    input: Ordered dictionary
    return: ordered yaml 
    """
    # Represent OrderedDict as a plain YAML mapping, keeping insertion order
    yaml.add_representer(
        OrderedDict,
        lambda dumper, data: dumper.represent_mapping(
            'tag:yaml.org,2002:map', data.items()))

    return yaml.dump(dictionary)

Ultimately, I want to pass the new function a variable number of dictionaries, each with its name (perhaps as a list of tuples?), so that all dictionaries end up in one output YAML file. Let me know if you need additional clarification. Thank you!
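
One possible shape for such a function, offered as a minimal sketch rather than a definitive answer: it takes a list of (name, dictionary) tuples, nests each dictionary under its name, and dumps everything as a single YAML document. The `OrderedDumper` class, the `path` parameter, and the hand-built example dictionaries are assumptions made for illustration; the dictionaries stand in for what `create_dict(df1, key_index='Name')` and `create_dict(df2, key_index='Name')` might return. `sort_keys=False` needs PyYAML 5.1 or later and keeps the inner keys in their original column order. Because the inputs are dictionaries, the output is nested YAML mappings, without the leading dashes shown in the example layout above.

```python
from collections import OrderedDict

import yaml  # PyYAML; the sort_keys argument assumes PyYAML >= 5.1


class OrderedDumper(yaml.Dumper):
    """Hypothetical Dumper subclass, so the representer is not registered globally."""


def _represent_ordered(dumper, data):
    # Emit an OrderedDict as a plain YAML mapping in insertion order.
    return dumper.represent_mapping('tag:yaml.org,2002:map', data.items())


OrderedDumper.add_representer(OrderedDict, _represent_ordered)


def yaml_output(named_dicts, path=None):
    """Combine (name, dictionary) pairs into one YAML document.

    named_dicts: iterable of (name, dictionary) tuples,
                 e.g. [('df1', dict1), ('df2', dict2)]
    path: optional output file (an assumption of this sketch);
          the YAML text is returned either way
    """
    combined = OrderedDict()
    for name, dictionary in named_dicts:
        if name in combined:
            raise ValueError(f"duplicate name: {name!r}")
        combined[name] = dictionary

    text = yaml.dump(combined, Dumper=OrderedDumper,
                     default_flow_style=False, sort_keys=False)
    if path is not None:
        with open(path, 'w', encoding='utf-8') as handle:
            handle.write(text)
    return text


# Hand-built stand-ins for the two ordered dictionaries from the question.
df1_dict = OrderedDict([
    ('Lucy Black', {'Occupation': 'Teacher', 'Years': 10, 'JobSatisfaction': 'Yes'}),
    ('John Doe', {'Occupation': 'Gardener', 'Years': 3, 'JobSatisfaction': 'Yes'}),
])
df2_dict = OrderedDict([
    ('Lucy Black', {'Age': 31, 'Salary': '$38000', 'Company Name': 'LAUSD'}),
    ('John Doe', {'Age': 23, 'Salary': '$17000', 'Company Name': 'Beautiful Lawn'}),
])

combined_yaml = yaml_output([('df1', df1_dict), ('df2', df2_dict)])
print(combined_yaml)
```

Taking a list of tuples keeps the caller in control of both the names and their order; accepting `**kwargs` instead would also work, but names would then be limited to valid Python identifiers.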

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


