Have got two dataframe like below with index values: tab_df
row table_no table_bit
83 45 1 1
84 45 1 2
85 45 1 3
86 45 2 1
87 45 2 2
88 45 2 3
91 46 1 1
92 46 1 2
93 46 1 3
item_df
count Item
15 2 Apple
16 2 Apple
17 1 Lemon
18 2 Apricot
19 2 Apricot
35 2 Melon
36 2 Melon
37 1 Orange
38 1 Potato
Condition to be considered:
tab_dfrows along with index value should not be jumbled. It's fixed.item_dfrows to be concat/joined withtab_dfto attainoutput_dfconsidering below rules: (item_dfdata will be in row-table_no-table_bit order. Each row can have any number of table_no. Each table_no will have only 3 table_bit.) 2.1.item_dfrows to be arranged/joined in the same order withtab_dfthe number of times it repeats. [Column count gives number of times that particular Item repeats]. E.g., Item Apple repeats twice so it takes two bits from first row, then comes Lemon which takes single bit from same row and continues. 2.2. No Item should sharetable_bitacross rows. E.g., On arranging up items with rule 2.1, at the end of row 45, after Apricot, Melon need to be arranged but that row got only single table_bit left out. So, need to check for next Item which needs only single bit and that to be arranged there and Melon should be moved to other row and carry on with the same flow as of rule 2.1 2.3. Each Item initem_dfgot it's respective index which should not be changed. If any item is rearranged like Melon same index need to be carried out for every item.
output_df
tab_index row table_no table_bit item_index count Item
83 45 1 1 15 2 Apple
84 45 1 2 16 2 Apple
85 45 1 3 17 1 Lemon
86 45 2 1 18 2 Apricot
87 45 2 2 19 2 Apricot
88 45 2 3 37 1 Orange
91 46 1 1 35 2 Melon
92 46 1 2 36 2 Melon
93 46 1 3 38 1 Potato
Looking forward. Any help will be appreciated!
CodePudding user response:
I feel like there ought to be an easier way to do this, but if I understand you correctly, the following should work.
Setup
import pandas as pd
data = {'row': {83: 45, 84: 45, 85: 45, 86: 45, 87: 45, 88: 45, 91: 46, 92: 46, 93: 46},
'table_no': {83: 1, 84: 1, 85: 1, 86: 2, 87: 2, 88: 2, 91: 1, 92: 1, 93: 1},
'table_bit': {83: 1, 84: 2, 85: 3, 86: 1, 87: 2, 88: 3, 91: 1, 92: 2, 93: 3}}
tab_df = pd.DataFrame(data)
data2 = {'count': {15: 2, 16: 2, 17: 1, 18: 2, 19: 2, 35: 2, 36: 2, 37: 1, 38: 1},
'Item': {15: 'Apple', 16: 'Apple', 17: 'Lemon', 18: 'Apricot', 19: 'Apricot', 35: 'Melon', 36: 'Melon', 37: 'Orange', 38: 'Potato'}}
item_df = pd.DataFrame(data2)
Code
tmp = item_df.reset_index().groupby('Item', as_index=False).agg(
{'index': list, 'count': 'first'})
tmp = tmp.loc[tmp.groupby('count').cumcount().sort_values().index].explode('index')
tmp = tmp.loc[:, ['index','count','Item']].rename(columns={'index':'item_index'})
tab_df[tmp.columns] = tmp.to_numpy()
print(tab_df)
row table_no table_bit item_index count Item
83 45 1 1 15 2 Apple
84 45 1 2 16 2 Apple
85 45 1 3 17 1 Lemon
86 45 2 1 18 2 Apricot
87 45 2 2 19 2 Apricot
88 45 2 3 37 1 Orange
91 46 1 1 35 2 Melon
92 46 1 2 36 2 Melon
93 46 1 3 38 1 Potato
Explanation
- First, use
df.reset_indexand applydf.groupbyonItem. Use.aggto turnindexvalues (now a column) per group into lists (e.g. forApple:[15, 16]), and for columncountsimply get.first(e.g. forApple:2). - In the next step, apply
df.groupbyagain, this time oncount(now:[2, 2, 1, 2, 1, 1]associated with['Apple', 'Apricot', 'Lemon', 'Melon', 'Orange', 'Potato']), and get.cumcount. Sorting these values usingSeries.sort_valuesgets us a rearranged index[0, 2, 1, 4, 3, 5]. We want the original index order adjusted in this way, which we can achieve by nesting the above insidetmp.loc. Chain.explodeon columnindexto get the lists back as rows again. - The row order for
tmp(derived fromitem_df) is now correct; last step is just cosmetic: i.e. getting the columns in the desired order as well:tmp = tmp.loc[:, ['index','count','Item']], and chaining.renameto change col nameindexintoitem_index. - Finally, add these columns to
tab_df. Useto_numpy, since we want to ignore the index oftmp.
