I want to create two DataFrames from the list below:
results = [
{'type': 'check_datatype',
'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
'datasource_path': '/cars_dataset_ok/',
'Result': False},
{'type': 'check_string_consistency',
'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
'datasource_path': '/cars_dataset_ok/',
'Result': False}
]
The first DataFrame should contain the type and an id (an incremental id, unique for each type).
The second DataFrame should contain the argument (kwargs) details for each type, e.g. for 'check_datatype'.
The first DataFrame should look like this:
type                     | id
check_datatype           | 1
check_string_consistency | 2
The second DataFrame should look like this (the index restarts at 1 for each key, and increments across list values such as 'columns'):
id | key           | value  | index
1  | table         | cars   | 1
1  | columns       | car_id | 1
1  | columns       | index  | 2
1  | d_type        | str    | 1
2  | table         | cars   | 1
2  | columns       | car_id | 1
2  | string_length | 6      | 1
What I have tried:
I was able to create the first DataFrame with the approach below, but not the second one:
from pyspark.sql import functions as F
from pyspark.sql import Window

# Keep only the 'type' of each result; a new name avoids overwriting `results`,
# which is still needed to build the second DataFrame
typeRows = [[elt['type']] for elt in results]
checkColumns = ['type']
checkDF = spark.createDataFrame(data=typeRows, schema=checkColumns)
# Add an incremental id, one per row
checkDF = checkDF.withColumn("id", F.row_number().over(Window.orderBy(F.monotonically_increasing_id())))
checkDF.printSchema()
checkDF.show(truncate=False)
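For the second DataFrame, one possible approach (a sketch, not necessarily the only way) is to flatten each result's kwargs into (id, key, value, index) tuples in plain Python first, and only then hand the list to Spark. List-valued kwargs such as 'columns' produce one row per element with an incrementing index; scalar kwargs get index 1. Values are cast to str so the column has a single type (so 6 becomes '6'). The `spark.createDataFrame` call at the end assumes the same SparkSession `spark` used above.

```python
# `results` as defined in the question
results = [
    {'type': 'check_datatype',
     'kwargs': {'table': 'cars', 'columns': ['car_id', 'index'], 'd_type': 'str'},
     'datasource_path': '/cars_dataset_ok/',
     'Result': False},
    {'type': 'check_string_consistency',
     'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
     'datasource_path': '/cars_dataset_ok/',
     'Result': False},
]

rows = []
for i, elt in enumerate(results, start=1):            # i: incremental id, matches checkDF's id
    for key, value in elt['kwargs'].items():
        # Treat scalars as one-element lists so both cases share one loop
        items = value if isinstance(value, list) else [value]
        for idx, item in enumerate(items, start=1):   # idx restarts at 1 for each key
            rows.append((i, key, str(item), idx))     # str() keeps the value column one type

# With the SparkSession from the question, the second DataFrame is then:
# argsDF = spark.createDataFrame(rows, schema=['id', 'key', 'value', 'index'])
# argsDF.show(truncate=False)
```

Because both DataFrames derive the id from the position in `results`, the `id` column here joins back to `checkDF.id` from the first DataFrame.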