I have some individual values and need to convert them into a DataFrame. I tried the code below; the expected output is a single row.
val matchingcount= 3
val notmatchingcount=5
val filename=h:/filename1
import spark.implicits._
val data=Seq(" filename "," matchingcount "," notmatchingcount ").toDF("ezfilename","match_count","non_matchcount")
data.show()
It throws this error:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match.
Old column names (1): value
New column names (3): ezfilename, match_count, non_matchcount
Any help please
Answer:
You were almost there! The code that does what you want is the following:
val matchingcount = 3
val notmatchingcount = 5
val filename = "h:/filename1"
import spark.implicits._
val data = Seq((filename, matchingcount, notmatchingcount)).toDF("ezfilename", "match_count", "non_matchcount")
data.show()
+------------+-----------+--------------+
|  ezfilename|match_count|non_matchcount|
+------------+-----------+--------------+
|h:/filename1|          3|             5|
+------------+-----------+--------------+
There are three key differences between your code and the code above:
- In Scala, a string literal has to be surrounded by double quotes ("). So I've added them to val filename = "h:/filename1".
- You were right that, after importing spark.implicits._, you can call toDF on a Seq, but each element of the Seq represents one row of the DataFrame. So instead of a DataFrame with 3 columns, you were creating one with a single column (named value) and 3 rows, which is why toDF rejected the 3 column names. To get 3 columns, put a tuple inside your Seq. Notice the difference between Seq(bla, bla, bla) and Seq((bla, bla, bla)): the latter is the correct one. You can also create multiple rows this way: Seq((bla, bli, blu), (blo, ble, bly)).
- In Scala, you access a variable's value by simply writing the variable's name. So writing filename instead of " filename " is the correct way of doing that.
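To make the row-versus-column point concrete, here is a minimal sketch. It assumes a local SparkSession (in spark-shell, spark already exists and you can skip building one). The object name SeqToDataFrameSketch, the helper buildCounts, and the second file h:/filename2 with its counts are hypothetical, added only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SeqToDataFrameSketch {

  // Hypothetical helper: each tuple becomes one row,
  // and each tuple field becomes one column.
  def buildCounts(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("h:/filename1", 3, 5),
      ("h:/filename2", 7, 1) // made-up second row, for illustration only
    ).toDF("ezfilename", "match_count", "non_matchcount")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .appName("seq-to-dataframe-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A Seq of plain strings yields ONE column (named "value") with one row
    // per element, so passing three column names to toDF would fail here.
    val oneColumn = Seq("a", "b", "c").toDF()
    oneColumn.printSchema()

    // A Seq of tuples yields three columns and one row per tuple.
    buildCounts(spark).show()

    spark.stop()
  }
}
```

If the set of columns grows, a case class with named fields in place of the tuple is a common alternative, since toDF() can then pick up the column names from the field names.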
Hope this helps!
