Appending first element of row in iterrows: convert alpha to numeric equivalent?

Please see my code below. I'm iterating through strings like '1A', '4D', etc., and I want the output to be 1.1, 4.4, and so on.

Instead of 1A I want 1.1, 1B = 1.2, 4A = 4.1, 5D = 5.4, etc.

Related question: Convert alphabet letters to number in Python

import pandas as pd

data = ['1A', '1B', '4A', '5D', '']
df = pd.DataFrame(data, columns=['Score'])

newcol = []

# .items() replaces .iteritems(), which was removed in pandas 2.0
for idx, value in df['Score'].items():
    if pd.isnull(value):
        newcol.append(value)
    elif pd.notnull(value):
        # Wanted here: FIRST ELEMENT OF ROW (1-5), then '.', then the
        # NUMERIC EQUIVALENT OF ALPHA, i.e. A=1, B=2, C=3, D=4, etc.
        pass

CodePudding user response：

You can use str.replace:

# r'\D' matches any non-digit; ord('A') is 65, so subtracting 64 gives A=1, B=2, ...
df['Score'] = df['Score'].str.replace(r'\D',
              lambda x: f'.{ord(x.group(0).upper())-64}', regex=True)

output:

  Score
0   1.1
1   1.2
2   4.1
3   5.4
4
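To make the lambda easier to follow, here is a minimal plain-Python sketch of the same substitution using re.sub on single strings. The helper name letter_to_decimal is hypothetical, and the sketch assumes the letters are ASCII A-Z (or a-z).

```python
import re

def letter_to_decimal(score: str) -> str:
    """Replace each non-digit character with '.' plus its alphabet position."""
    # ord('A') is 65, so subtracting 64 maps A -> 1, B -> 2, and so on.
    return re.sub(r'\D', lambda m: f'.{ord(m.group(0).upper()) - 64}', score)

print([letter_to_decimal(s) for s in ['1A', '1B', '4A', '5D', '']])
# ['1.1', '1.2', '4.1', '5.4', '']
```

Series.str.replace with regex=True applies this kind of substitution to every element of the column.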

CodePudding user response：

Use the following (incorporating @Ch3steR's comment):

from string import ascii_uppercase
dic = {j:str(i) for i,j in enumerate(ascii_uppercase, 1)}
# first character + '.' + mapped second character
df['Score'].str[0] + '.' + df['Score'].str[1].map(dic)

Output

0    1.1
1    1.2
2    4.1
3    5.4
4    NaN
Name: Score, dtype: object
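The indexing approach above reads exactly the first and second characters, so it assumes two-character strings. As a hedged sketch of how the same dictionary lookup could handle a multi-digit prefix, the following uses a regular expression on plain strings; convert_score and letter_to_number are hypothetical names, and returning None for non-matching input is an assumption, not part of the answer.

```python
import re
from string import ascii_uppercase

# Same lookup as above: 'A' -> '1', 'B' -> '2', ..., 'Z' -> '26'
letter_to_number = {letter: str(i) for i, letter in enumerate(ascii_uppercase, 1)}

def convert_score(score):
    """Turn '12C' into '12.3'; return None for anything that does not match."""
    match = re.fullmatch(r'(\d+)([A-Z])', score)
    if match is None:
        return None
    digits, letter = match.groups()
    return f'{digits}.{letter_to_number[letter]}'

print([convert_score(s) for s in ['1A', '5D', '12C', '']])
# ['1.1', '5.4', '12.3', None]
```

On a DataFrame this could be applied with df['Score'].map(convert_score).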

CodePudding user response：

You could build a mapping using str.maketrans and str.translate, a common recipe for mapping each character to its output.

str.maketrans

This static method returns a translation table usable for str.translate().

str.translate

Return a copy of the string s where all characters have been mapped through the map, which must be a dictionary of Unicode ordinals (integers) to Unicode ordinals, strings or None. Unmapped characters are left untouched.
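As a small illustration of the two descriptions above, this sketch builds a deliberately tiny two-letter table (an assumption for brevity, not the full alphabet) and shows that unmapped characters pass through unchanged.

```python
# maketrans turns single-character keys into their Unicode ordinals.
table = str.maketrans({'A': '.1', 'B': '.2'})
print(table)                  # {65: '.1', 66: '.2'}

# translate looks up each character's ordinal in the table.
print('1A'.translate(table))  # 1.1
print('4C'.translate(table))  # 4C ('C' is not in the table, so it is left untouched)
```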

Use pd.Series.apply and pass str.translate to it.

from string import ascii_uppercase

table = str.maketrans({c: f'.{i}' for i, c in enumerate(ascii_uppercase, 1)})
df['Score'].apply(str.translate, args=(table, ))

# 0    1.1
# 1    1.2
# 2    4.1
# 3    5.4
# 4       
# Name: Score, dtype: object

Timeit results:

Benchmarking setup

import random
import numpy as np

# One million rows
chars = np.arange(1_000_000).astype(str) + pd.Series([random.choice(ascii_uppercase) for _ in range(1_000_000)])
df = pd.DataFrame({"Score": chars})

Results

@Ch3steR (maketrans + translate)
582 ms ± 4.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@Mozway (str.replace)
1.03 s ± 46.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@Vivek (str indexing + map)
Different output (as of this writing, the posted answer only works with strings of length two)

When df is large:

If execution time matters, you could use the maketrans + translate solution.

When df is small (fewer than 50K rows):

mozway's solution and the maketrans solution take almost the same time, with maketrans being slightly faster.
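To check this on your own machine, a rough timing harness along these lines may help. It is only a sketch: the row count, the fixed seed, the number of repetitions, and the function names with_translate and with_replace are all assumptions, and the timings it prints will vary with hardware and pandas version.

```python
import random
import timeit
from string import ascii_uppercase

import pandas as pd

random.seed(0)   # fixed seed so the sample data is reproducible
n_rows = 10_000  # assumption: far smaller than the one million rows used above
scores = pd.Series(
    [f'{i}{random.choice(ascii_uppercase)}' for i in range(n_rows)], name='Score'
)

table = str.maketrans({c: f'.{i}' for i, c in enumerate(ascii_uppercase, 1)})

def with_translate(s):
    # maketrans + translate, applied element by element
    return s.apply(str.translate, args=(table,))

def with_replace(s):
    # regex replacement with a callable
    return s.str.replace(
        r'\D', lambda m: f'.{ord(m.group(0).upper()) - 64}', regex=True
    )

# Both approaches should agree before their timings are compared.
assert with_translate(scores).tolist() == with_replace(scores).tolist()

for func in (with_translate, with_replace):
    seconds = timeit.timeit(lambda: func(scores), number=5) / 5
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per call')
```

Changing n_rows lets you see where the two approaches start to diverge for your data.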