Split a tuple within a dict and convert into dataframe

Time:02-04

I have a dict like the one shown below:

td = {966: [('Feat1', -0.04),
  ('Feat2=True ', -0.02),
  ('Feat3 <= 20000.00', 0.01),
  ('Feat4=Power Supply', -0.01),
  ('Feat5=dada', -0.0)],
 879: [('Feat8=Rare', 0.02),
  ('Feat11=HV', -0.01),
  ('Feat21=Power Supply', -0.01),
  ('20000.00 < Feat3 <= 50000.00', 0.01),
  ('Feat5=dada', -0.01)]}

I would like to do the following:

a) Split each tuple within the dict on the comma separator

b) Store the numeric part in the value column of a dataframe and the text part in the feature column

c) Repeat the key for every tuple in its list (and store it in the key column)

I tried the below, but it is not efficient or elegant and doesn't scale to big data of a million rows:

import pandas as pd

feature = []
value = []
key = []
for k, v in td.items():
    for x in v:
        key.append(k)
        f, val = x  # renamed from v so the loop variable is not shadowed
        feature.append(f)
        value.append(val)
data_tuples = list(zip(key, feature, value))
pd.DataFrame(data_tuples, columns=['key', 'feature', 'value'])

I expect the output to be a dataframe with key, feature and value columns, with one row per tuple.

CodePudding user response:

You can pass a generator expression as the data, so you don't have to build an intermediate list in your own code:

pd.DataFrame(([k, elt[0], elt[1]] for k, v in td.items() for elt in v),
             columns=['key', 'Feature', 'Value'])

   key                       Feature  Value
0  966                         Feat1  -0.04
1  966                   Feat2=True   -0.02
2  966             Feat3 <= 20000.00   0.01
3  966            Feat4=Power Supply  -0.01
4  966                    Feat5=dada  -0.00
5  879                    Feat8=Rare   0.02
6  879                     Feat11=HV  -0.01
7  879           Feat21=Power Supply  -0.01
8  879  20000.00 < Feat3 <= 50000.00   0.01
9  879                    Feat5=dada  -0.01
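
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The function name dict_to_frame and the shortened sample dict are only illustrative; they are not part of the original post.

```python
import pandas as pd

# Shortened sample in the same shape as the question's dict (illustrative).
td = {966: [('Feat1', -0.04), ('Feat3 <= 20000.00', 0.01)],
      879: [('Feat8=Rare', 0.02)]}

def dict_to_frame(d):
    # One row per (feature, value) tuple, with the dict key repeated on each row.
    rows = ((k, feat, val) for k, pairs in d.items() for feat, val in pairs)
    return pd.DataFrame(rows, columns=['key', 'Feature', 'Value'])

df = dict_to_frame(td)
print(df)
```

Unpacking each tuple in the for clause (for feat, val in pairs) does the same job as elt[0] and elt[1] above; which one reads better is a matter of taste.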

CodePudding user response:

Use a generator expression that flattens the values into (key, feature, value) tuples and pass it to the DataFrame constructor:

df = pd.DataFrame(((k, b, c) for k, v in td.items() for b, c in v),
                  columns=['key','feature','value'])
print(df)
   key                       feature  value
0  966                         Feat1  -0.04
1  966                   Feat2=True   -0.02
2  966             Feat3 <= 20000.00   0.01
3  966            Feat4=Power Supply  -0.01
4  966                    Feat5=dada  -0.00
5  879                    Feat8=Rare   0.02
6  879                     Feat11=HV  -0.01
7  879           Feat21=Power Supply  -0.01
8  879  20000.00 < Feat3 <= 50000.00   0.01
9  879                    Feat5=dada  -0.01
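
Since the question mentions scale, one possible variation is to build the three columns separately instead of row by row. This is only a sketch under assumptions: the function name dict_to_frame_columns and the shortened sample dict are made up for illustration, and whether it is actually faster than the generator version depends on your data and pandas version, so it is worth timing on real input.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Shortened sample in the same shape as the question's dict (illustrative).
td = {966: [('Feat1', -0.04), ('Feat3 <= 20000.00', 0.01)],
      879: [('Feat8=Rare', 0.02)]}

def dict_to_frame_columns(d):
    # Repeat each key once per tuple in its list.
    counts = [len(pairs) for pairs in d.values()]
    keys = np.repeat(list(d.keys()), counts)
    # Flatten all tuples into one list, then split into two parallel columns.
    flat = list(chain.from_iterable(d.values()))
    features = [feat for feat, _ in flat]
    values = [val for _, val in flat]
    return pd.DataFrame({'key': keys, 'feature': features, 'value': values})

df = dict_to_frame_columns(td)
print(df)
```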