My dataframe lists some of the name I interested and may make some changes, what I want is keep the original name if option is blank, delete the name if it says delete and replace the name if it has something.
| name | option |
|---|---|
| A | |
| B | delete |
| C | |
| D | DDDD |
| E | delete |
| F | FF |
The final dataframe I want is to change the names in a much more big dataframe and with other infos. may look like this
| name | info1 | info2 |
|---|---|---|
| A | xxxx | sth |
| A | xxxx | sth |
| C | x | blala |
| DDDD | xx | lalalaa |
| DDDD | xx | laa |
| DDDD | xx | laaaa |
| FF | xxx | aaaa |
CodePudding user response:
For good performance first remove unnecessary rows and then set values by condition:
df1 = df[~df['option'].str.contains('delete')]
df1.loc[df['option'].ne('') | df['option'].notnull(), 'name'] = ''
Or:
df1['name'] = np.where(df['option'].ne('') | df['option'].notnull(), '', df1['name'])
CodePudding user response:
You can use case condition to replace based on option value.
val df = Seq(("A", ""), ("B", "delete"), ("C", ""), ("D", "DDDD"), ("E", "delete"), ("F", "FF")).toDF("name", "option")
df.createOrReplaceTempView("temp")
val query = """
select case when option = 'delete' then ''
when option == '' then name
else option
end as name,
option
from temp
"""
spark.sql(query).show()
---- ------
|name|option|
---- ------
| A| |
| |delete|
| C| |
|DDDD| DDDD|
| |delete|
| FF| FF|
---- ------
