Home > database >  Spark doesn't recognize the column name in SQL query while can output it to a dataset
Spark doesn't recognize the column name in SQL query while can output it to a dataset

Time:01-24

I'm applying the SQL query like that:

s"SELECT *  FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null and acceptTimestamp is not null)"

and I'm getting the error message like that.

warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
org.postgresql.util.PSQLException: ERROR: column "accepttimestamp" does not exist
  Hint: Perhaps you meant to reference the column "mf_joined.acceptTimestamp".
  Position: 103
  at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2497)
  at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2233)
  at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)
  at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446)
  at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370)
  at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149)
  at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
  at $$$e76229fa87b6865de321c5274e52c2f9$$$$w$getDFFromJdbcSource(<console>:1133)
  ... 326 elided

If I omit acceptTimestamp like that:

s"SELECT *  FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null)"

I'm getting the data as below:

 ------------------- ---------- ---- ------------------ ----------------- --- ----- ------ ---------- --------------- ------- ----------------------- ---------- --------- ------------- ------------ --------------- --------- ----- ------------------- ----------------------- --------------- -------------- ------------- ------------------- ------------------- --- --- ------------------ ----- ---- ---- ------------------ --- 
|timestamp          |flags     |type|lon               |lat              |alt|speed|course|satellites|digital_twin_id|unit_id|unit_ts                |name      |unit_type|measure_units|access_level|uid            |placement|stale|start              |writetime              |acceptTimestamp|delayWindowEnd|DiffInSeconds|time               |hour               |max|min|mean              |count|max2|min2|mean2             |rnb|
 ------------------- ---------- ---- ------------------ ----------------- --- ----- ------ ---------- --------------- ------- ----------------------- ---------- --------- ------------- ------------ --------------- --------- ----- ------------------- ----------------------- --------------- -------------- ------------- ------------------- ------------------- --- --- ------------------ ----- ---- ---- ------------------ --- 

please note acceptTimestamp is here!

So how I should handle this column in my query to make it taken into account?

CodePudding user response:

From the exception, it seems this is related to Postgres not Spark. If you look at the error message you got, the column name is folded to lowercase accepttimestamp whereas in your query the T is in uppercase acceptTimestamp.

To make the column name case-sensitive for Postgres, you need to use double-quotes. Try this:

val query = s"""SELECT * FROM my_table_joined 
    WHERE   timestamp > '2022-01-23' 
    and     writetime is not null 
    and     "acceptTimestamp" is not null"""  
  •  Tags:  
  • Related