Spark: How to add values to a HashMap from an RDD?

Time:02-09

I have the DataFrame below:

val df = phDF.groupBy("name").agg(collect_list("message").as("Messages"))

It gives the following output:

+-----------+--------------------+
|name       |Messages            |
+-----------+--------------------+
|     Test1 |['A','B','C']       |
|     Test2 |['A','B','C','D']   |
|     Test3 |['A','B']           |
+-----------+--------------------+

Now I want to add each name (as the key) and its Messages (as the value) to a HashMap.

I tried the approach below, going through an RDD, but could not get it to work:

var m = scala.collection.mutable.Map[String, String]()
val rdd = df.rdd.map(_.mkString("##"))
val rdd1 = rdd.map(s=>s.split("##"))
val rdd2 = rdd1.map(ele=>m.put(ele(0),ele(1)))
print(m)   // Output:- HashMap()

As shown above, when I print the HashMap it is empty.

Can anyone help me store these values in a HashMap like the one below?

Map("Test1" -> "['A','B','C']" ,"Test2" -> "['A','B','C','D']","Test3" -> "['A','B']")

CodePudding user response:

Given your initial data:

val df = Seq(
  ("test1", Seq("A", "B", "C")),
  ("test2", Seq("A", "B", "C", "D")),
).toDF("name", "Messages")

You can convert it into a map column with the map_from_entries function:

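import org.apache.spark.sql.functions.{array, map_from_entries, struct}

val asMapDf = df.select(
  map_from_entries(
    array(
      struct("name", "Messages")   // one (key, value) struct per row
    )
  ).as("map")                      // name the column as in the output below
)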

Note that you build an array of structs, each with two fields. Every struct in the array becomes one entry in the map, with the first field as the key and the second as the value. This gives you:

+-----------------------+
|map                    |
+-----------------------+
|{test1 -> [A, B, C]}   |
|{test2 -> [A, B, C, D]}|
+-----------------------+
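
The result above is still a DataFrame with one single-entry map per row. If what you need is one Scala Map on the driver, as in the question, you could collect the rows instead. The map in the question stays empty because rdd.map is lazy and no action ever runs it. Even with an action, the closure would only update copies of m on the executors, never the driver's m. Below is a minimal sketch, assuming the result is small enough to fit in driver memory. The object name CollectToMap, the helper toDriverMap and the local[*] master are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CollectToMap {
  // Hypothetical helper: brings a (name, Messages) DataFrame back to the
  // driver as an immutable Map. Only sensible for small results.
  def toDriverMap(spark: SparkSession, df: DataFrame): Map[String, Seq[String]] = {
    import spark.implicits._
    df.select("name", "Messages")
      .as[(String, Seq[String])] // tuple fields are matched by column position
      .collect()                 // action: runs the job and returns the rows to the driver
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-to-map")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("Test1", Seq("A", "B", "C")),
      ("Test2", Seq("A", "B", "C", "D")),
      ("Test3", Seq("A", "B"))
    ).toDF("name", "Messages")

    val m = toDriverMap(spark, df)
    println(m)

    spark.stop()
  }
}
```

If you really want a mutable map, you can copy the result into one on the driver, for example with scala.collection.mutable.Map(m.toSeq: _*).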