I get 'invalid syntax' error using 'var df...' on databricks (trying to change c

Time:01-26

I am trying to change the type of a column from string to date using the code below (in a Databricks notebook):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DateType
val df2 = df.withColumn("end", col("end").cast(DateType))
df2.printSchema()

Or like this:

df.createOrReplaceTempView("CastExample")
val df4 = spark.sql("DATE(end) from CastExample")
df4.printSchema()
df4.show(false)

But I get this error:

SyntaxError: invalid syntax
  File "<command-1642181972810133>", line 2
    val df4 = spark.sql("DATE(end) from CastExample")
        ^
SyntaxError: invalid syntax

What does "val" mean?

It seems to mean "immutable reference" or something similar, but I cannot find any information about it online. Many code examples use it, but none of them explain why it is there, or maybe I am searching the wrong way. It looks like it comes from Scala, but I am not sure. Maybe I did not import something.

I would appreciate any advice on it.

Answer:

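Don't use `val`: that is Scala syntax, and your notebook cell is running Python, which is why the error is a Python SyntaxError. Your SQL query is also missing `SELECT`. If you want all columns of df in df4, use `*`: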

df.createOrReplaceTempView("CastExample")
df4 = spark.sql("SELECT *, DATE(end) as new_name from CastExample")
df4.printSchema()
df4.show(10,False)
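The SyntaxError comes from Python's parser before Spark is ever involved: the cell is compiled as Python, and `val df4 = ...` is two names in a row, which Python cannot parse. A minimal stdlib-only sketch of that check (the helper `parses_as_python` is hypothetical, for illustration only):

```python
# Compile (but never execute) a line of code to see whether Python can parse it.
scala_line = 'val df4 = spark.sql("SELECT *, DATE(end) AS new_name FROM CastExample")'
python_line = 'df4 = spark.sql("SELECT *, DATE(end) AS new_name FROM CastExample")'


def parses_as_python(source: str) -> bool:
    """Return True if `source` is syntactically valid Python; names are not resolved."""
    try:
        compile(source, "<notebook-cell>", "exec")
        return True
    except SyntaxError:
        return False


print(parses_as_python(scala_line))   # False: `val df4` is two identifiers in a row
print(parses_as_python(python_line))  # True: `spark` does not need to exist to compile
```

Dropping `val` is enough to get past the parser; whether the query then runs depends on Spark.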

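You can also do the same with the PySpark DataFrame API instead of SQL (note that this `select` keeps only the new column):

from pyspark.sql.functions import to_date

df4 = df.select(to_date(df.end).alias('new_name'))
df4.show(10, False)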