Tagging character matrices in NumPy

Time:01-25

I'm trying to come up with a way of encoding strings with character-level tags in NumPy. For example, given the following accepted characters:

chars = ['a', 'b', 'c', 'd', '1', '2', '3', '4']

The string:

s = ['1','b','a','c','3','4','1','1']

gets encoded like so:

char_mat = np.array([[c]*len(s) for c in chars])  # one row per accepted character, one column per position in s

s_mat = 1*(char_mat==s)

and the resulting s_mat looks like this:

array([[0, 0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0]])

Each row corresponds to a character in chars, and each column corresponds to a position in the string. So the a in s shows up in the third column of the first row, the b in the second column of the second row, and so on.
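
For reference, the same encoding can be sketched without building char_mat first, by comparing a column of accepted characters against the characters of the string and letting broadcasting fill in the grid. This is only a minimal sketch: one_hot_encode is a hypothetical helper name, not a NumPy function, and it works for strings whose length differs from len(chars).

```python
import numpy as np

def one_hot_encode(s, chars):
    """Hypothetical helper: return a (len(chars), len(s)) matrix of 0s and 1s."""
    chars_col = np.array(chars)[:, None]  # shape (len(chars), 1)
    s_row = np.array(s)[None, :]          # shape (1, len(s))
    # Broadcasting compares every accepted character with every position.
    return (chars_col == s_row).astype(int)

chars = ['a', 'b', 'c', 'd', '1', '2', '3', '4']
s = ['1', 'b', 'a', 'c', '3', '4', '1', '1']
s_mat = one_hot_encode(s, chars)
```

Since every character of s is in chars, each column of s_mat contains exactly one 1.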

Let's say I also have a class tag for each character in the string, and these are, for instance:

'1' 'b' 'a' 'c' '3' '4' '1' '1'
 |   |   |   |   |   |   |   |
 v   v   v   v   v   v   v   v
 1   2   2   0   0   3   3   1

I'd like to come up with a way of outputting a tag_matrix that has the same shape as s_mat but contains the tag for each element, like this:

array([[0, 0, 2, 0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 3, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 3, 0, 0]])

But I really can't figure out how to do this. Many thanks in advance for any help or suggestions. Note also that this is a small version of the actual problem I'm working on, in which strings are much longer and the accepted characters include ASCII lowercase letters, digits, and punctuation.

CodePudding user response:

It should be as simple as multiplying s_mat by the class tags array. Broadcasting scales each column of s_mat by the tag of the character at that position:

class_tags = np.array([1, 2, 2, 0, 0, 3, 3, 1])
tag_matrix = s_mat * class_tags
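
Putting the pieces together, here is a minimal end-to-end sketch using the data from the question. It assumes class_tags has exactly one entry per character of s. The second half is an alternative that is not part of the answer above: it writes each tag directly at its (character row, position) index, and it additionally assumes every character of s appears in chars (otherwise the dictionary lookup raises KeyError). The names row_of, rows, and tag_matrix_indexed are illustrative.

```python
import numpy as np

chars = ['a', 'b', 'c', 'd', '1', '2', '3', '4']
s = ['1', 'b', 'a', 'c', '3', '4', '1', '1']
class_tags = np.array([1, 2, 2, 0, 0, 3, 3, 1])

# One row per accepted character, one column per position in s.
s_mat = (np.array(chars)[:, None] == np.array(s)).astype(int)

# Broadcasting multiplies column j of s_mat by class_tags[j].
tag_matrix = s_mat * class_tags

# Alternative sketch: place each tag directly with integer-array indexing.
row_of = {c: i for i, c in enumerate(chars)}   # character -> row index
rows = np.array([row_of[c] for c in s])        # row index for each position
tag_matrix_indexed = np.zeros((len(chars), len(s)), dtype=int)
tag_matrix_indexed[rows, np.arange(len(s))] = class_tags
```

Both approaches give the same matrix for this input. Note that a tag of 0 cannot be told apart from an empty cell in the result; if that distinction matters, one option is to shift the tags by 1 before multiplying.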