How to merge data vertically so there are multiple lines instead of multiple columns-CodePudding

I have a dataframe that shows lineage (in the sample data provided it is a person's route). I have another dataframe that gives me preceding information about the routes, which should slot in the same format.

First sample looks like this:

                 'Lineage Step': {0: 1, 1: 2, 2: 3, 3: 4, 4:1, 5:2},
                 'From Country': {0: 'Spain', 1: 'Scotland', 2: 'England', 3: 'England', 4: 'Scotland', 5:'England'},
                 'From Town': {0: 'Madrid', 1: 'Edinburgh', 2: 'London', 3: 'London', 4: 'Edinburgh', 5: 'Manchester'},
                 'FromStreet': {0: 'Spanish St', 1: 'Main St', 2: 'Lower St', 3: 'Middle St', 4: 'London St', 5: 'Scotland St'},
                 'ToCountry': {0: 'Scotland', 1: 'England', 2: 'England', 3: 'England', 4: 'England', 5: 'England'},
                 'ToTown': {0: 'Edinburgh', 1: 'London', 2: 'London', 3: 'London', 4: 'Liverpool', 5: 'London'}, 
                 'ToStreet': {0: 'Lower St', 1: 'Middle St', 2: 'Upper St', 3: 'Upper St', 4: 'new St', 5: 'Old St'}}
sample_data = pd.DataFrame.from_dict(current_route)

The data I'd like to incorporate into this format looks like this:

upstream_route = {'From Country': {0: 'France', 1: 'Germany'},
                  'From Town': {0: 'Paris', 1: 'Berlin'},
                  'FromStreet': {0: 'French St', 1: 'Gerrman St'},
                  'ToCountry': {0: 'Spain', 1: 'Scotland'}, 
                  'ToTown': {0: 'Madrid', 1: 'Edinburgh'}, 
                  'ToStreet': {0: 'Spanish St', 1: 'London St'}}

upstream_data = pd.DataFrame.from_dict(upstream_route)

The data here is non-person specific. I.e. anyone who starts in Spain should now have the prior France leg of the journey added in.

The final dataset would look like this,

joined_route = {'AttributeName': {0: 'John', 1: 'John', 2: 'John', 3: 'John', 4: 'John', 5: 'Sally', 6: 'Sally', 7:'Sally'},
                 'Lineage Step': {0: 1, 1: 2, 2: 3, 3: 4, 4:5, 5:1, 6:2, 7:3},
                 'From Country': {0: 'France', 1:'Spain', 2: 'Scotland', 3: 'England', 4: 'England', 5: 'Germany', 6:'Scotland', 7: 'England'},
                 'From Town': {0: 'Paris', 1: 'Madrid', 2: 'Edinburgh', 3: 'London', 4: 'London', 5: 'Berlin', 6: 'Edinburgh', 7: 'London'},
                 'FromStreet': {0: 'French St', 1: 'Spanish St', 2: 'Main St', 3: 'Lower St', 4: 'Middle St', 5: 'German St', 6: 'London St', 7: 'Scotland St'},
                 'ToCountry': {0: 'Spain', 1: 'Scotland', 2: 'England', 3: 'England', 4: 'England', 5: 'Scotland', 6: 'England', 7: 'England'},
                 'ToTown': {0: 'Madrid', 1: 'Edinburgh', 2: 'London', 3: 'London', 4: 'Edinburgh', 5: 'London', 6: 'Liverpool', 7: 'London'}, 
                 'ToStreet': {0: 'Spanish St', 1: 'Lower St', 2: 'Middle St', 3: 'Upper St', 4: 'Upper St', 5: 'London St', 6: 'new St', 7: 'Old St'}}
joined_data = pd.DataFrame.from_dict(joined_route)

Where we see John and Sally have respectively had France and Germany added into their journeys, even though the added data didn't use the name variable.

One way I've thought about doing this is to merge the data like so:

sample_data.merge(upstream_data, left_on = ['From Country', 'From Town','FromStreet'], right_on = ['ToCountry', 'ToTown',
       'ToStreet'], how = 'left', suffixes= ('_current', '_new'))

I'd then duplicate the lines and then delete alternating lines on the current / new columns, and merge the two (that's theoretical though,I'm not sure how the code would work in practice.

Alternatively, a way of joining the data vertically, so the new lines slot in automatically above based on a match, would be the quickest and easiest way of joining the data. Is this possible?

CodePudding user response：

Your first idea is doable (and seems to be the best approach).

First we merge and select the 'new' columns that we want. We also set 'Lineage Step to 0 as we will sort later:

df_up = pd.merge(sample_data, upstream_data, left_on = ['From Country', 'From Town','FromStreet'], right_on = ['ToCountry', 'ToTown',
       'ToStreet'], how = 'left', suffixes= ('_current', ''))[['AttributeName','Lineage Step','From Country', 'From Town','FromStreet','ToCountry', 'ToTown',
       'ToStreet']].dropna()
df_up['Lineage Step'] = 0
df_up

df_up looks like this:

    AttributeName      Lineage Step  From Country    From Town    FromStreet    ToCountry    ToTown     ToStreet
--  ---------------  --------------  --------------  -----------  ------------  -----------  ---------  ----------
 0  John                          0  France          Paris        French St     Spain        Madrid     Spanish St
 4  Sally                         0  Germany         Berlin       Gerrman St    Scotland     Edinburgh  London St

now we append this dataframe to sample_data and sort

df_jn = sample_data.append(df_up, ignore_index = True).sort_values(['AttributeName','Lineage Step'])
df_jn['Lineage Step']  =1

df_jn looks like this:

    AttributeName      Lineage Step  From Country    From Town    FromStreet    ToCountry    ToTown     ToStreet
--  ---------------  --------------  --------------  -----------  ------------  -----------  ---------  ----------
 6  John                          1  France          Paris        French St     Spain        Madrid     Spanish St
 0  John                          2  Spain           Madrid       Spanish St    Scotland     Edinburgh  Lower St
 1  John                          3  Scotland        Edinburgh    Main St       England      London     Middle St
 2  John                          4  England         London       Lower St      England      London     Upper St
 3  John                          5  England         London       Middle St     England      London     Upper St
 7  Sally                         1  Germany         Berlin       Gerrman St    Scotland     Edinburgh  London St
 4  Sally                         2  Scotland        Edinburgh    London St     England      Liverpool  new St
 5  Sally                         3  England         Manchester   Scotland St   England      London     Old St