Home > Software engineering >  How to check if two table include the same information in Big Query?
How to check if two table include the same information in Big Query?

Time:02-08

Assume we have a table table_a with two string and one int fields:

CREATE TABLE `YOUR_DATABASE.table_a` (
  `A1` STRING,
  `A2` INT64,
  `A3` STRING,
);

table_a includes some data:

INSERT `YOUR_DATABASE.table_a` (A1, A2, A3)
VALUES("X111", 10, "2022-02-07 08:19:00"),
      ("Y222", 15, "2017-05-08 10:19:00"),
      ("X111", 0,"2011-08-02 02:30:00"),
      ("X111", 12, "2019-12-20 11:25:00"),
      ("Y222", 258, "2001-10-14 07:00:00"),
      ("Z333", 4, "2015-03-08 03:25:12")

We have another table, table_b, which is kind of identical to table_a. In fact, table_b has the exact same fields and values as in table_a and it also has an extra DATETIME field. This extra DATETIME field, B4, is essentially same as A3 but converted to a standard DATETIME format.

CREATE TABLE `YOUR_DATABASE.table_b` (
  `B1` STRING,
  `B2` INT64,
  `B3` STRING,
  `B4` DATETIME,
);

And the values in table_b are:

INSERT `YOUR_DATABASE.table_b` (B1, B2, B3, B4)
VALUES("X111", 10, "2022-02-07 08:19:00", DATETIME(PARSE_TIMESTAMP("%Y-%m-%y %H:%M:%S","2022-02-07 08:19:00"),"America/Los_Angeles")),
      ("Y222", 15, "2017-05-08 10:19:00", DATETIME(PARSE_TIMESTAMP("%Y-%m-%y %H:%M:%S","2017-05-08 10:19:00"),"America/Los_Angeles")),
      ("X111", 0, "2011-08-02 02:30:00", DATETIME(PARSE_TIMESTAMP("%Y-%m-%y %H:%M:%S","2011-08-02 02:30:00"),"America/Los_Angeles")),
      ("X111", 12, "2019-12-20 11:25:00", DATETIME(PARSE_TIMESTAMP("%Y-%m-%y %H:%M:%S","2019-12-20 11:25:00"),"America/Los_Angeles")),
      ("Y222", 258, "2001-10-14 07:00:00", DATETIME(PARSE_TIMESTAMP("%Y-%m-%y %H:%M:%S","2001-10-14 07:00:00"),"America/Los_Angeles")),
      ("Z333", 4, "2015-03-08 03:25:12", DATETIME(PARSE_TIMESTAMP("%Y-%m-%y %H:%M:%S","2015-03-08 03:25:12"),"America/Los_Angeles"))

Note, table_a and table_b may or may not have a primary key. How can I check if table_a and table_b include the same information? I have a couple of table_a in my database and I have created equivalent table_b for each of them; now I want to double check and make sure the tables include same information.

CodePudding user response:

If you only want to know whether the tables contain the same data, then you can use CHECKSUM(). This basically calculates the cumulative hash for all values in the column and generates a value. If both values are the same, your tables have the same data in those columns.

SELECT 
    CHECKSUM(A1), CHECKSUM(A2), CHECKSUM(A3)
FROM 
    YOUR_DATABASE.table_a

SELECT 
    CHECKSUM(A1), CHECKSUM(A2), CHECKSUM(A3)
FROM 
    YOUR_DATABASE.table_b

More on CHECKSUM() enter image description here

  •  Tags:  
  • Related