Hi all, I'm currently working with word vectors in Python and would like to run Bayesian Hierarchical Clustering in R, which seems to work only when each vector component has its own column. I have code that retrieves the vectors, but each one comes back as a NumPy array stored in a single column:
label vector \
0 1 Crónicas [ 5.26622403e-03, 2.76202578e-02, -2.03670934e-...
1 1 Juan [-4.13045213e-02, -3.40997241e-04, 6.59986138e-...
2 1 Pedro [ 1.93648413e-03, 7.61903543e-03, 5.45683019e-...
3 1 Reyes [-0.01713392, 0.01234968, -0.00780387, 0.013362...
Ideally I would want it to be something like this:
label x1 x2 x3 \
0 1 Crónicas 5.26622403e-03 2.76202578e-02 -2.03670934e-...
1 1 Juan -4.13045213e-02 -3.40997241e-04 6.59986138e-...
2 1 Pedro 1.93648413e-03 7.61903543e-03 5.45683019e-...
3 1 Reyes -0.01713392 0.01234968 -0.00780387...
Here's some reproducible code I came up with:
import pandas as pd
import random
import numpy as np
row_names = ["train", "car", "tractor", "truck", "boat", "plane"]
random_vectors = []
for _ in row_names:
    # One random 10-dimensional vector per label
    vector = [random.uniform(0, 1) for _ in range(10)]
    random_vectors.append(np.array(vector))
label_DF = pd.DataFrame({'label':row_names, 'vector':random_vectors})
Any and all tips are welcome. Have a good day :)
CodePudding user response:
You can stack the arrays in the vector column into a single 2D NumPy array and construct the final DataFrame from it:
import pandas as pd
import random
import numpy as np
row_names = ["train", "car", "tractor", "truck", "boat", "plane"]
random_vectors = []
for _ in row_names:
    # One random 10-dimensional vector per label
    vector = [random.uniform(0, 1) for _ in range(10)]
    random_vectors.append(np.array(vector))
label_DF = pd.DataFrame({'label':row_names, 'vector':random_vectors})
# Create a 2D NumPy array of shape (n_rows, n_dims) from the column of 1D arrays
temp = label_DF.vector.values
temp = np.array(list(temp))
# Create final DataFrame using the numpy array
output = pd.DataFrame(temp, index=label_DF.index)
output['label'] = label_DF.label
print(output)
which gives me (the vectors are random, so the values differ between runs):
0 1 2 3 4 5 6 7 8 9 label
0 0.971427 0.608333 0.415566 0.139951 0.870935 0.219539 0.972286 0.345405 0.567477 0.087404 train
1 0.568816 0.178477 0.497407 0.415878 0.356035 0.915570 0.119754 0.064307 0.327284 0.899719 car
2 0.947162 0.622367 0.930498 0.362429 0.177074 0.828043 0.434496 0.334775 0.586800 0.685099 tractor
3 0.790544 0.630087 0.323274 0.656123 0.462856 0.437417 0.908296 0.883913 0.028340 0.901321 truck
4 0.110653 0.647129 0.902092 0.597604 0.312707 0.688970 0.889833 0.874016 0.292510 0.256918 boat
5 0.364499 0.149350 0.275034 0.959932 0.890455 0.548498 0.476552 0.146530 0.273142 0.008246 plane
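If you want the layout from the question (label first, then columns named x1, x2, ...), here is a minimal sketch of one way to get there. It assumes every vector has the same length; the names `matrix`, `columns`, and `wide` and the fixed seed are only for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: one NumPy array per row in the 'vector' column.
# The seed is an assumption, used only to make the sketch repeatable.
rng = np.random.default_rng(0)
row_names = ["train", "car", "tractor", "truck", "boat", "plane"]
label_DF = pd.DataFrame({
    "label": row_names,
    "vector": [rng.uniform(0, 1, size=10) for _ in row_names],
})

# Stack the per-row arrays into one (n_rows, n_dims) matrix.
# This only works if all vectors have the same length.
matrix = np.vstack(label_DF["vector"].tolist())

# Name the columns x1..xN, following the layout shown in the question.
columns = [f"x{i}" for i in range(1, matrix.shape[1] + 1)]
wide = pd.DataFrame(matrix, columns=columns, index=label_DF.index)

# Put the label in the first position.
wide.insert(0, "label", label_DF["label"])

print(wide.head())
```

From there, something like `wide.to_csv("vectors_wide.csv", index=False)` (the file name is hypothetical) should give you a file that R can read with one column per vector component.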
