Home > Software design >  Write a window function Spark
Write a window function Spark

Time:01-14

I have a table:

 ------------------- ------------- -------------- ------- ------------- 
|         session_id|  insert_dttm|           key|  value| process_name|
 ------------------- ------------- -------------- ------- ------------- 
|local-1641922005078|1641922023703|test_file1.csv|Success|ProcessResult|
|local-1641922005078|1641922023704|test_file1.csv|Success|ProcessResult|
|local-1641922005078|1641922023705|test_file2.csv|Success|ProcessResult|
|local-1641922005080|1641922023706|test_file2.csv|Success|ProcessResult|
|local-1641922005080|1641922023707|test_file3.csv|Success|ProcessResult|
|local-1641922005080|1641922023708|test_file3.csv|Success|ProcessResult|
 ------------------- ------------- -------------- ------- ------------- 

I want to get last session from this table:

local-1641922005080 :String

Can I do this using a window function?

I have solution:

val lastSessionId = ds.select(max(struct(col("insert_dttm"), col("session_id")))("session_id"))
                      .first.getString(0)

Only I also want to implement this with a window function.

CodePudding user response:

Actually you don't need here a window function since you can just sort the data in desc order and return the first record using limit(1).

But for the sake of practice, you can use window function like this:

import org.apache.spark.sql.functions.{col, row_number}
import org.apache.spark.sql.expressions.Window


val windowSpec  = Window.orderBy(col("insert_dttm").desc)
val lastSessionId = df.withColumn("row_number", row_number.over(windowSpec)).filter("row_number=1").first.getString(0)
  •  Tags:  
  • Related