Im working with a transaction data set, new data is update every in a Daily-Monthly file. Actually i'm load Historical and Daily files (10 files), then processed and finally created a Full data (Last year sales current year sales), this procces takes 20 min every day.
Now i want to change the process, load Full data (1 file) and then add new data. This works much faster, like 2 min. The problem is the next one:
- FULL data contains 2021 Sales January 2022 Sales.
- New data contains January 2022 Sales February Sales.
when I merge the data with the following code, i lost data, because in FULL data exists duplicate data (this is correct).
DATA <- bind_rows(FULL, Daily)%>%distinct()
Here is an example of what I need:
FULL DATA:
| Sales ID | Sales Order | SKU | Price | Quantity | Description | Date |
|---|---|---|---|---|---|---|
| ABC | ABC1 | 123452 | 100 | 1 | Tshirt | 21-12-31 |
| ABCD | ABCD | 12345 | 200 | 2 | Shoes | 21-12-31 |
| ABCDE | ABCDE1 | ERROR | -100 | 1 | Devolution | 22-01-01 |
| ABCDE | ABCDE1 | ERROR | -100 | 1 | Devolution | 22-01-01 |
| ABCDE | ABCDE1 | ERROR | -100 | 1 | Devolution | 22-01-01 |
| FGH | FGH1 | 12412 | 10 | 1 | Socks | 22-01-02 |
| FGH | FGH1 | 12112 | 10 | 1 | Socks | 22-01-02 |
DAILY DATA:
| Sales ID | Sales Order | SKU | Price | Quantity | Description | Date |
|---|---|---|---|---|---|---|
| FGH | FGH1 | 12412 | 10 | 1 | Socks | 22-01-02 |
| FGH | FGH1 | 12112 | 10 | 1 | Socks | 22-01-02 |
| FGHRT | FGHRT1 | 12112 | 100 | 1 | Shoe | 22-01-03 |
| FGHRT | FGHRT1 | 12342 | 10 | 2 | Backpack | 22-01-03 |
| FGHRTY | FGHRTY1 | 126342 | 1000 | 1 | Bag | 22-01-04 |
| FGHRUT | FGHRUT3 | 121534 | 107 | 2 | Tshit | 22-01-04 |
Final Output (Just add bold data)
| Sales ID | Sales Order | SKU | Price | Quantity | Description | Date |
|---|---|---|---|---|---|---|
| ABC | ABC1 | 123452 | 100 | 1 | Tshirt | 21-12-31 |
| ABCD | ABCD | 12345 | 200 | 2 | Shoes | 21-12-31 |
| ABCDE | ABCDE1 | ERROR | -100 | 1 | Devolution | 22-01-01 |
| ABCDE | ABCDE1 | ERROR | -100 | 1 | Devolution | 22-01-01 |
| ABCDE | ABCDE1 | ERROR | -100 | 1 | Devolution | 22-01-01 |
| FGH | FGH1 | 12412 | 10 | 1 | Socks | 22-01-02 |
| FGH | FGH1 | 12112 | 10 | 1 | Socks | 22-01-02 |
| FGHRT | FGHRT1 | 12112 | 100 | 1 | Shoe | 22-01-03 |
| FGHRT | FGHRT1 | 12342 | 10 | 2 | Backpack | 22-01-03 |
| FGHRTY | FGHRTY1 | 126342 | 1000 | 1 | Bag | 22-01-04 |
| FGHRUT | FGHRUT3 | 121534 | 107 | 2 | Tshit | 22-01-04 |
CodePudding user response:
You can use merge() for that.
merge(full_data, daily_data, all = T)
Sales.ID Sales.Order SKU Price Quantity Description Date
1 ABC ABC1 123452 100 1 Tshirt 21-12-31
2 ABCD ABCD 12345 200 2 Shoes 21-12-31
3 ABCDE ABCDE1 ERROR -100 1 Devolution 22-01-01
4 ABCDE ABCDE1 ERROR -100 1 Devolution 22-01-01
5 ABCDE ABCDE1 ERROR -100 1 Devolution 22-01-01
6 FGH FGH1 12112 10 1 Socks 22-01-02
7 FGH FGH1 12412 10 1 Socks 22-01-02
8 FGHRT FGHRT1 12112 100 1 Shoe 22-01-03
9 FGHRT FGHRT1 12342 10 2 Backpack 22-01-03
10 FGHRTY FGHRTY1 126342 1000 1 Bag 22-01-04
11 FGHRUT FGHRUT3 121534 107 2 Tshit 22-01-04
