I have a Spark DataFrame with the following schema:
```
root
|-- id: string (nullable = true)
|-- elements: struct (nullable = true)
| |-- created: string (nullable = true)
| |-- id: string (nullable = true)
| |-- items: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- field: string (nullable = true)
| | | |-- fieldId: string (nullable = true)
| | | |-- fieldtype: string (nullable = true)
| | | |-- from: string (nullable = true)
| | | |-- fromString: string (nullable = true)
| | | |-- tmpFromAccountId: string (nullable = true)
| | | |-- tmpToAccountId: string (nullable = true)
| | | |-- to: string (nullable = true)
| | | |-- toString: string (nullable = true)
```
For this case, I want to overwrite every field inside the `items` elements (`field`, `fieldId`, etc.) with a fixed value (`"Issue"`), regardless of whether the field is empty or already filled. So it should go from:
```
+--------+--------------------------------------------------------------------------------+
| id     | elements                                                                       |
+--------+--------------------------------------------------------------------------------+
|ABCD-123|[2023-01-16T20:25:30.875+0700, 5388402, [[field, , status,,,,, 23456, Yes]]]    |
+--------+--------------------------------------------------------------------------------+
```
To:
```
+--------+----------------------------------------------------------------------------------------------------------+
| id     | elements                                                                                                 |
+--------+----------------------------------------------------------------------------------------------------------+
|ABCD-123|[2023-01-16T20:25:30.875+0700, 5388402, [[Issue, Issue, Issue, Issue, Issue, Issue, Issue, Issue, Issue]]]|
+--------+----------------------------------------------------------------------------------------------------------+
```
What I have tried:
I have already tried the following in a Python script, but neither attempt worked:
```python
from pyspark.sql.functions import col, lit, struct

replace_list = ['field', 'fieldtype', 'fieldId', 'from', 'fromString',
                'to', 'toString', 'tmpFromAccountId', 'tmpToAccountId']

# Attempt 1: this only adds new top-level columns whose names literally
# contain dots; it does not touch the nested fields
for col_name in replace_list:
    df = df.withColumn(f"items.element.{col_name}", lit("Issue"))

# Attempt 2: this fails as well; `element` is only how printSchema labels
# array entries, so `elements.items.element.*` does not resolve
for col_name in replace_list:
    df = df.withColumn(
        "elements.items.element",
        struct(col("elements.items.element.*"), lit("Issue").alias(col_name)),
    )
```
In this case, I'm using Spark version 2.4.8. I don't want to use the `explode` method, since I want to avoid joining DataFrames back together afterwards. Is it possible to perform this kind of operation directly in Spark? Thank you.