I have a table:
------------------- ------------- -------------- ------- -------------
| session_id| insert_dttm| key| value| process_name|
------------------- ------------- -------------- ------- -------------
|local-1641922005078|1641922023703|test_file1.csv|Success|ProcessResult|
|local-1641922005078|1641922023704|test_file1.csv|Success|ProcessResult|
|local-1641922005078|1641922023705|test_file2.csv|Success|ProcessResult|
|local-1641922005080|1641922023706|test_file2.csv|Success|ProcessResult|
|local-1641922005080|1641922023707|test_file3.csv|Success|ProcessResult|
|local-1641922005080|1641922023708|test_file3.csv|Success|ProcessResult|
------------------- ------------- -------------- ------- -------------
I want to get last session from this table:
local-1641922005080 :String
Can I do this using a window function?
I have solution:
val lastSessionId = ds.select(max(struct(col("insert_dttm"), col("session_id")))("session_id"))
.first.getString(0)
Only I also want to implement this with a window function.
CodePudding user response:
Actually you don't need here a window function since you can just sort the data in desc order and return the first record using limit(1).
But for the sake of practice, you can use window function like this:
import org.apache.spark.sql.functions.{col, row_number}
import org.apache.spark.sql.expressions.Window
val windowSpec = Window.orderBy(col("insert_dttm").desc)
val lastSessionId = df.withColumn("row_number", row_number.over(windowSpec)).filter("row_number=1").first.getString(0)
