CleanTextEmptyString: No text is provided to clean. Apply on each row in a dataframe

Time:01-28

I am trying to apply the clean function from cleantext to each row of a dataframe column. It works perfectly without apply, and I get the result I want. Here is the problem:

import cleantext
from cleantext import clean
master_df_m['col'] = master_df_m.Presentation.apply(lambda row: clean(row))

Error:

CleanTextEmptyString: No text is provided to clean

Calling it directly on a single value works without a problem:


print(clean(master_df_m.Presentation[0], clean_all=True))

Output:

oper good morn name janeka confer oper time would like welcom everyon comerica second quarter earn call line place mute prevent background nois speaker remark questionandansw session oper instruct thank would like turn call ms darlen person director investor relat may begin darlen person comerica incorpor director ir thank janeka good morn welcom comerica

What is wrong? I also tried passing axis=1 to apply.

CodePudding user response:

You could try something like this, assuming your dataframe does not contain any empty strings:

from cleantext import clean
import pandas as pd

df = pd.DataFrame(data={'Presentation': [' This is some kind of sentence', ' This is anoTher! kind of sentence']})
df['cleaned_text'] = df.Presentation.apply(clean)

Output:

                         Presentation        cleaned_text
0       This is some kind of sentence        kind sentenc
1   This is anoTher! kind of sentence  anoth kind sentenc

To overwrite the Presentation column instead, assign the result to df['Presentation']. Alternatively, use map:

df['Presentation'] = df['Presentation'].map(clean)

Update 1: If you have empty strings in your dataframe, try something like this:

df = pd.DataFrame(data={'Presentation': [' This is some kind of sentence', ' This is anoTher! kind of sentence', ""]})
# Replace empty strings with the placeholder string 'NaN' (plain text, not a real missing value)
df = df.replace('', 'NaN')
# or: df.loc[df.Presentation == '', 'Presentation'] = 'NaN'

df['Presentation'] = df['Presentation'].map(clean)

Or clean only the non-empty rows; the rows that are skipped become NaN:

df['Presentation'] = df.loc[df.Presentation != '', 'Presentation'].map(clean)

Output:

         Presentation
0        kind sentenc
1  anoth kind sentenc
2                 NaN
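
Another option is to guard the call itself, so that blank or missing values never reach clean(). The following is only a minimal sketch, assuming the error is raised for empty input as the message suggests. The names safe_clean and stand_in_clean are hypothetical; stand_in_clean merely mimics the library so the snippet runs without cleantext installed:

```python
import math


def stand_in_clean(text):
    """Hypothetical stand-in for cleantext.clean: rejects empty input, lowercases the rest."""
    if not text:
        raise ValueError("No text is provided to clean")
    return text.strip().lower()


def safe_clean(value, cleaner=stand_in_clean):
    """Apply `cleaner` only to non-blank strings; return anything else unchanged."""
    if isinstance(value, str) and value.strip():
        return cleaner(value)
    return value


# Mix of normal text, empty and whitespace-only strings, and missing values.
rows = [" This is some kind of sentence", "", "   ", None, math.nan]
cleaned = [safe_clean(r) for r in rows]
print(cleaned)
```

With the real library, the same wrapper could be used as df['Presentation'].map(lambda v: safe_clean(v, cleaner=clean)).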

CodePudding user response:

Here is a simple way:

from cleantext import clean
for col in master_df_m.columns:
    master_df_m[col] = master_df_m[col].apply(lambda word: clean(word))

You can pass additional arguments to clean() as needed; see https://pypi.org/project/cleantext/. Note that this loop applies clean() to every column, so it assumes each column contains only non-empty text.
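
To pass an option such as clean_all=True (the one used in the question) through apply, one approach is to bind it first with functools.partial. This is a rough sketch only: stand_in_clean is a hypothetical placeholder for cleantext.clean that copies its call shape, not its behaviour, so the snippet runs on its own:

```python
from functools import partial


def stand_in_clean(text, clean_all=False):
    """Hypothetical placeholder for cleantext.clean; only mimics the call shape."""
    text = text.strip()
    return text.lower() if clean_all else text


# Bind the keyword argument once, then reuse the resulting one-argument callable.
clean_everything = partial(stand_in_clean, clean_all=True)

texts = [" This is some kind of sentence", " This is anoTher! kind of sentence"]
results = [clean_everything(t) for t in texts]
print(results)
```

With pandas, the bound callable can be passed straight to apply, for example master_df_m[col].apply(clean_everything). Series.apply also forwards extra keyword arguments to the function, so master_df_m[col].apply(clean, clean_all=True) should behave the same way.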
