Python DataFrame div function will not execute

I'm a Python newbie and have the following problem. Source table:

                  Timestamp Value
0      2022-01-31T23:00:37Z    79
1      2022-01-31T23:00:38Z    80
2      2022-01-31T23:00:39Z    79
3      2022-01-31T23:00:46Z    79
4      2022-01-31T23:00:47Z    80
...                     ...   ...
17181  2022-02-01T22:56:54Z    79
17182  2022-02-01T22:59:16Z    79
17183  2022-02-01T22:59:17Z    80
17184  2022-02-01T22:59:18Z    80
17185  2022-02-01T22:59:19Z    79

[17186 rows x 2 columns]

I want to divide the values using the following code:

MAX = 79
SCALLING = 100
try: 
    cond = result['Value']>MAX
    result.loc[cond,'Value'] = result['Value'].div(SCALLING).round(2)
except:
    result = result
print(result)

The code doesn't divide the values. I then tried iterating over the DataFrame with this code:

MAX = 79
SCALLING = 100
for i in result.index: 
    cond = result['Value']>MAX
    result.loc[cond,'Value'] = result['Value'].div(SCALLING).round(2)
print(result)

But then I got TypeError: '>' not supported between instances of 'dict' and 'int'. Probably because result.loc[cond,'Value'] is of dictionary type. How can I specify the specific value?

Answer:

Your initial code, though not extremely "pythonic", actually works just fine with the data you've provided. E.g.:

import pandas as pd

data = {'Timestamp': {0: '2022-01-31T23:00:37Z',
  1: '2022-01-31T23:00:38Z',
  2: '2022-01-31T23:00:39Z',
  3: '2022-01-31T23:00:46Z',
  4: '2022-01-31T23:00:47Z'},
 'Value': {0: 79,
  1: 80,
  2: 79,
  3: 79,
  4: 80 }}

result = pd.DataFrame(data)

print(result)

              Timestamp  Value
0  2022-01-31T23:00:37Z     79
1  2022-01-31T23:00:38Z     80
2  2022-01-31T23:00:39Z     79
3  2022-01-31T23:00:46Z     79
4  2022-01-31T23:00:47Z     80

MAX = 79
SCALLING = 100
try: 
    cond = result['Value']>MAX
    result.loc[cond,'Value'] = result['Value'].div(SCALLING).round(2)
except:
    result = result
print(result)

              Timestamp  Value
0  2022-01-31T23:00:37Z   79.0
1  2022-01-31T23:00:38Z    0.8
2  2022-01-31T23:00:39Z   79.0
3  2022-01-31T23:00:46Z   79.0
4  2022-01-31T23:00:47Z    0.8

However, this error message:

TypeError: '>' not supported between instances of 'dict' and 'int'

means that you have one or more dict values somewhere in result.Value.

E.g.:

data = {'Timestamp': {0: '2022-01-31T23:00:37Z',
  1: '2022-01-31T23:00:38Z',
  2: '2022-01-31T23:00:39Z',
  3: '2022-01-31T23:00:46Z',
  4: '2022-01-31T23:00:47Z'},
 'Value': {0: {'Value': 79},
  1: 80,
  2: 79,
  3: 79,
  4: 80 }}

result = pd.DataFrame(data)

print(result)

              Timestamp          Value
0  2022-01-31T23:00:37Z  {'Value': 79}
1  2022-01-31T23:00:38Z             80
2  2022-01-31T23:00:39Z             79
3  2022-01-31T23:00:46Z             79
4  2022-01-31T23:00:47Z             80

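result['Value']>MAX would have raised the aforementioned TypeError, if not for the try ... except construction: the bare except silently swallowed the error and left result unchanged, which is why the values appeared not to be divided.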
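To see that the error is really there, here is a minimal sketch that catches only TypeError and records the message instead of silently keeping result as it was. The two-row DataFrame is a hypothetical stand-in for the real data, with one dict cell mimicking the example above.

```python
import pandas as pd

MAX = 79
SCALLING = 100

# Hypothetical data: the first cell holds a dict instead of an int,
# so the 'Value' column ends up with dtype 'object'
result = pd.DataFrame({
    'Timestamp': ['2022-01-31T23:00:37Z', '2022-01-31T23:00:38Z'],
    'Value': [{'Value': 79}, 80],
})

error_message = None
try:
    cond = result['Value'] > MAX
    result.loc[cond, 'Value'] = result['Value'].div(SCALLING).round(2)
except TypeError as exc:
    # Catch only the expected error and report it, instead of a bare
    # `except:` that silently leaves `result` unchanged
    error_message = str(exc)
    print(f"Comparison failed: {error_message}")
```

Catching a specific exception type (and printing or logging it) keeps unrelated bugs from being hidden the way a bare except hides them.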

So the remedy is to find the dict values in your column and deal with them. To locate them, you could use:

dict_values = result[result.Value.map(type)==dict]

print(dict_values)

              Timestamp          Value
0  2022-01-31T23:00:37Z  {'Value': 79}
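How to "deal with them" depends on what the dicts contain. One possible approach, assuming every dict looks like {'Value': 79} (an assumption based only on the example above), is to unwrap the number and then force the column to a numeric dtype. The helper name unwrap is hypothetical.

```python
import pandas as pd

# Hypothetical data: one cell holds {'Value': 79} instead of the bare number
result = pd.DataFrame({
    'Timestamp': ['2022-01-31T23:00:37Z', '2022-01-31T23:00:38Z',
                  '2022-01-31T23:00:39Z'],
    'Value': [{'Value': 79}, 80, 79],
})

def unwrap(cell):
    """Return the number stored under the 'Value' key if cell is a dict
    (assumed layout), otherwise return the cell unchanged."""
    if isinstance(cell, dict):
        return cell.get('Value')
    return cell

# Unwrap dicts, then convert to a numeric dtype; anything still
# non-numeric becomes NaN instead of breaking later comparisons
result['Value'] = pd.to_numeric(result['Value'].map(unwrap), errors='coerce')

print(result.dtypes)
```

Once the column is numeric, result['Value'] > MAX no longer raises the TypeError.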

Answer:

Here's an answer if your goal is to divide all values by SCALLING if they are larger than MAX. I don't know how you got a dict error. Maybe you have a dict called MAX somewhere?

import pandas as pd
import io

# Read in your example table
result = pd.read_csv(
    io.StringIO("""
                  Timestamp Value
0      2022-01-31T23:00:37Z    79
1      2022-01-31T23:00:38Z    80
2      2022-01-31T23:00:39Z    79
3      2022-01-31T23:00:46Z    79
4      2022-01-31T23:00:47Z    80
17181  2022-02-01T22:56:54Z    79
17182  2022-02-01T22:59:16Z    79
17183  2022-02-01T22:59:17Z    80
17184  2022-02-01T22:59:18Z    80
17185  2022-02-01T22:59:19Z    79
"""),
    sep=r'\s+',  # whitespace-delimited; replaces the deprecated delim_whitespace=True
    index_col=0,
    parse_dates=['Timestamp'],
)


# Divide all values by SCALLING if they are larger than MAX
MAX = 79
SCALLING = 100

cond = result['Value']>MAX
result.loc[cond,'Value'] = result.loc[cond,'Value'].div(SCALLING).round(2)
print(result)

Output
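Because the division produces floats, writing them back into an integer column with .loc upcasts that column (note the 79.0 values in the first answer's output), and recent pandas versions may emit a FutureWarning about setting an item of incompatible dtype. A minimal sketch of an alternative, assuming the same MAX/SCALLING semantics, converts the column to float once and then uses Series.mask. The three-row DataFrame is a stand-in for the real data.

```python
import pandas as pd

MAX = 79
SCALLING = 100

result = pd.DataFrame({
    'Timestamp': ['2022-01-31T23:00:37Z', '2022-01-31T23:00:38Z',
                  '2022-01-31T23:00:39Z'],
    'Value': [79, 80, 79],
})

# Convert to float up front so no incompatible-dtype assignment happens
values = result['Value'].astype(float)

# Series.mask replaces entries where the condition is True with the
# matching entries of the second argument; others are kept as-is
result['Value'] = values.mask(values > MAX, (values / SCALLING).round(2))

print(result)
```

Assigning the whole column at once also avoids the per-row loop from the question, which recomputed the same vectorized operation on every iteration.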