Home > Blockchain >  Merge or Bind_rows just only new data - R
Merge or Bind_rows just only new data - R

Time:02-03

Im working with a transaction data set, new data is update every in a Daily-Monthly file. Actually i'm load Historical and Daily files (10 files), then processed and finally created a Full data (Last year sales current year sales), this procces takes 20 min every day.

Now i want to change the process, load Full data (1 file) and then add new data. This works much faster, like 2 min. The problem is the next one:

  • FULL data contains 2021 Sales January 2022 Sales.
  • New data contains January 2022 Sales February Sales.

when I merge the data with the following code, i lost data, because in FULL data exists duplicate data (this is correct).

DATA <- bind_rows(FULL, Daily)%>%distinct()

Here is an example of what I need:

FULL DATA:

Sales ID Sales Order SKU Price Quantity Description Date
ABC ABC1 123452 100 1 Tshirt 21-12-31
ABCD ABCD 12345 200 2 Shoes 21-12-31
ABCDE ABCDE1 ERROR -100 1 Devolution 22-01-01
ABCDE ABCDE1 ERROR -100 1 Devolution 22-01-01
ABCDE ABCDE1 ERROR -100 1 Devolution 22-01-01
FGH FGH1 12412 10 1 Socks 22-01-02
FGH FGH1 12112 10 1 Socks 22-01-02

DAILY DATA:

Sales ID Sales Order SKU Price Quantity Description Date
FGH FGH1 12412 10 1 Socks 22-01-02
FGH FGH1 12112 10 1 Socks 22-01-02
FGHRT FGHRT1 12112 100 1 Shoe 22-01-03
FGHRT FGHRT1 12342 10 2 Backpack 22-01-03
FGHRTY FGHRTY1 126342 1000 1 Bag 22-01-04
FGHRUT FGHRUT3 121534 107 2 Tshit 22-01-04

Final Output (Just add bold data)

Sales ID Sales Order SKU Price Quantity Description Date
ABC ABC1 123452 100 1 Tshirt 21-12-31
ABCD ABCD 12345 200 2 Shoes 21-12-31
ABCDE ABCDE1 ERROR -100 1 Devolution 22-01-01
ABCDE ABCDE1 ERROR -100 1 Devolution 22-01-01
ABCDE ABCDE1 ERROR -100 1 Devolution 22-01-01
FGH FGH1 12412 10 1 Socks 22-01-02
FGH FGH1 12112 10 1 Socks 22-01-02
FGHRT FGHRT1 12112 100 1 Shoe 22-01-03
FGHRT FGHRT1 12342 10 2 Backpack 22-01-03
FGHRTY FGHRTY1 126342 1000 1 Bag 22-01-04
FGHRUT FGHRUT3 121534 107 2 Tshit 22-01-04

CodePudding user response:

You can use merge() for that.

merge(full_data, daily_data, all = T)

   Sales.ID Sales.Order    SKU Price Quantity Description     Date
1       ABC        ABC1 123452   100        1      Tshirt 21-12-31
2      ABCD        ABCD  12345   200        2       Shoes 21-12-31
3     ABCDE      ABCDE1  ERROR  -100        1  Devolution 22-01-01
4     ABCDE      ABCDE1  ERROR  -100        1  Devolution 22-01-01
5     ABCDE      ABCDE1  ERROR  -100        1  Devolution 22-01-01
6       FGH        FGH1  12112    10        1       Socks 22-01-02
7       FGH        FGH1  12412    10        1       Socks 22-01-02
8     FGHRT      FGHRT1  12112   100        1        Shoe 22-01-03
9     FGHRT      FGHRT1  12342    10        2    Backpack 22-01-03
10   FGHRTY     FGHRTY1 126342  1000        1         Bag 22-01-04
11   FGHRUT     FGHRUT3 121534   107        2       Tshit 22-01-04
  •  Tags:  
  • Related