I want to create two DataFrames from the list below:

results = [
         {'type': 'check_datatype',
          'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
          'datasource_path': '/cars_dataset_ok/',
          'Result': False},
        {'type': 'check_string_consistency',
          'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
          'datasource_path': '/cars_dataset_ok/',
          'Result': False}
        ]
    
    
    
    
The first DataFrame should contain `type` and `id` (an incremental id, unique for each type).

The second DataFrame should contain the argument details (the `kwargs`) of each type.

The first DataFrame should look like this:

type | id
check_datatype | 1
check_string_consistency | 2
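For reference, the (`type`, `id`) pairs above can be produced from the original list with plain Python's `enumerate` before handing anything to Spark (a minimal sketch; `type_rows` is my own name, not from the original post):

```python
results = [
    {'type': 'check_datatype',
     'kwargs': {'table': 'cars', 'columns': ['car_id', 'index'], 'd_type': 'str'},
     'datasource_path': '/cars_dataset_ok/',
     'Result': False},
    {'type': 'check_string_consistency',
     'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
     'datasource_path': '/cars_dataset_ok/',
     'Result': False},
]

# Incremental id, starting at 1, in list order
type_rows = [(elt['type'], i) for i, elt in enumerate(results, start=1)]
print(type_rows)
# → [('check_datatype', 1), ('check_string_consistency', 2)]
```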
    
The second DataFrame should look like this:

id | key | value | index
1 | table | cars | 1
1 | columns | car_id | 1
1 | columns | index | 2
1 | d_type | str | 1
2 | table | cars | 1
2 | columns | car_id | 1
2 | string_length | 6 | 1

What I have tried:

Somehow, I am able to create the first DataFrame using the approach below, but not the second one:

from pyspark.sql import functions as F
from pyspark.sql import Window

# Keep only the 'type' value of each check (stored in a new list so the
# original `results` is still available for building the second DataFrame)
types = [[elt['type']] for elt in results]
checkColumns = ['type']
checkDF = spark.createDataFrame(data=types, schema=checkColumns)
# Incremental id in the original list order
checkDF = checkDF.withColumn("id", F.row_number().over(Window.orderBy(F.monotonically_increasing_id())))
checkDF.printSchema()
checkDF.show(truncate=False)
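By contrast, the second DataFrame needs one row per `kwargs` entry, with list values expanded to one row per element. One way is to flatten the `kwargs` in plain Python first and only then create the DataFrame (a sketch; `argRows`/`argDF` are my own names, not from the original code):

```python
results = [
    {'type': 'check_datatype',
     'kwargs': {'table': 'cars', 'columns': ['car_id', 'index'], 'd_type': 'str'},
     'datasource_path': '/cars_dataset_ok/',
     'Result': False},
    {'type': 'check_string_consistency',
     'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
     'datasource_path': '/cars_dataset_ok/',
     'Result': False},
]

argRows = []
for check_id, elt in enumerate(results, start=1):
    for key, value in elt['kwargs'].items():
        # List values expand to one row per element; scalars get index 1
        values = value if isinstance(value, list) else [value]
        for idx, v in enumerate(values, start=1):
            argRows.append((check_id, key, str(v), idx))

# With an active SparkSession (as in the snippet above), the rows become
# the second DataFrame:
# argDF = spark.createDataFrame(data=argRows, schema=['id', 'key', 'value', 'index'])
# argDF.show(truncate=False)
```

Values are stringified so the `value` column has a single type; drop the `str(v)` call if a mixed-type column is acceptable.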

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


