I'm applying the SQL query like that:
s"SELECT * FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null and acceptTimestamp is not null)"
and I'm getting the error message like that.
warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
org.postgresql.util.PSQLException: ERROR: column "accepttimestamp" does not exist
Hint: Perhaps you meant to reference the column "mf_joined.acceptTimestamp".
Position: 103
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2497)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2233)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
at $$$e76229fa87b6865de321c5274e52c2f9$$$$w$getDFFromJdbcSource(<console>:1133)
... 326 elided
If I omit acceptTimestamp like that:
s"SELECT * FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null)"
I'm getting the data as below:
------------------- ---------- ---- ------------------ ----------------- --- ----- ------ ---------- --------------- ------- ----------------------- ---------- --------- ------------- ------------ --------------- --------- ----- ------------------- ----------------------- --------------- -------------- ------------- ------------------- ------------------- --- --- ------------------ ----- ---- ---- ------------------ ---
|timestamp |flags |type|lon |lat |alt|speed|course|satellites|digital_twin_id|unit_id|unit_ts |name |unit_type|measure_units|access_level|uid |placement|stale|start |writetime |acceptTimestamp|delayWindowEnd|DiffInSeconds|time |hour |max|min|mean |count|max2|min2|mean2 |rnb|
------------------- ---------- ---- ------------------ ----------------- --- ----- ------ ---------- --------------- ------- ----------------------- ---------- --------- ------------- ------------ --------------- --------- ----- ------------------- ----------------------- --------------- -------------- ------------- ------------------- ------------------- --- --- ------------------ ----- ---- ---- ------------------ ---
please note acceptTimestamp is here!
So how I should handle this column in my query to make it taken into account?
CodePudding user response:
From the exception, it seems this is related to Postgres not Spark. If you look at the error message you got, the column name is folded to lowercase accepttimestamp whereas in your query the T is in uppercase acceptTimestamp.
To make the column name case-sensitive for Postgres, you need to use double-quotes. Try this:
val query = s"""SELECT * FROM my_table_joined
WHERE timestamp > '2022-01-23'
and writetime is not null
and "acceptTimestamp" is not null"""
