Writing a PySpark DataFrame to a TFRecords file

Time:01-13

I have a DataFrame with the following schema and want to convert it to TFRecords:

root
 |-- col1: string (nullable = true)
 |-- col2: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- col3: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- col4: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- col5: array (nullable = true)
 |    |-- element: float (containsNull = true)
 |-- col6: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- col7: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- col8: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- col9: array (nullable = true)
 |    |-- element: string (containsNull = true) 

I'm using the spark-tensorflow-connector:

df.write.mode("overwrite").format("tfrecords").option("recordType", "Example").save("targetpath.tf")

This is the error I get while saving the data as TFRecords:

java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps

I tried the same approach on Databricks Community Edition and got a similar error.

Can anyone help?

CodePudding user response:

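The most probable cause (judging from the information on Maven Central) is that you're using a connector compiled for Scala 2.11 on a Databricks runtime that uses Scala 2.12. A NoSuchMethodError on a scala.Predef method such as refArrayOps is typical of this kind of binary mismatch, because Scala libraries are not binary compatible across 2.11 and 2.12.

Either use DBR 6.4 (which runs on Scala 2.11) for the conversion, or compile the connector for Scala 2.12 and use that build.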
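
To confirm the mismatch before rebuilding anything, it can help to compare the Scala suffix of the connector artifact with the Scala version of the running cluster. The following is only a minimal sketch: the helper names (scala_binary_version, artifact_scala_version, is_compatible) are hypothetical, and the artifact coordinate is used purely as an illustration. On a live cluster the runtime string can usually be obtained with spark.sparkContext._jvm.scala.util.Properties.versionString(), which is assumed here to return something like "version 2.12.10".

```python
import re


def scala_binary_version(version_string):
    """Return the Scala binary version (e.g. '2.12') found in a string
    such as 'version 2.12.10'."""
    match = re.search(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"no Scala version found in {version_string!r}")
    return f"{match.group(1)}.{match.group(2)}"


def artifact_scala_version(artifact):
    """Return the Scala suffix of an artifact name or Maven coordinate,
    e.g. '2.11' for 'spark-tensorflow-connector_2.11'."""
    match = re.search(r"_(\d+\.\d+)(?::|$)", artifact)
    if match is None:
        raise ValueError(f"no Scala suffix found in {artifact!r}")
    return match.group(1)


def is_compatible(artifact, runtime_version_string):
    """True when the artifact was built for the runtime's Scala binary version."""
    return artifact_scala_version(artifact) == scala_binary_version(runtime_version_string)


if __name__ == "__main__":
    # Hypothetical inputs: on a cluster, replace the second argument with
    # spark.sparkContext._jvm.scala.util.Properties.versionString()
    connector = "org.tensorflow:spark-tensorflow-connector_2.11:1.15.0"
    print(is_compatible(connector, "version 2.12.10"))
```

If the check reports an incompatibility, the two options above apply: switch to a runtime whose Scala version matches the connector, or use a connector build whose suffix matches the runtime.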
