filter a column using spark databricks dataframe

Time:01-11

I have a dataframe with a column called url, and I want to select all the rows whose url does not contain "www.ebay.com". I have tried this:

%python
display(flutten_df.printSchema())
display(flutten_df[flutten_df['url'].str.contains("www.ebay.com")])

it gives me this error:

AnalysisException: Can't extract value from url#75009: need struct type but got string;

the schema is :

root
|-- web: string (nullable = true)
|-- url: string (nullable = true)

How can I fix this problem, please?

CodePudding user response:

You're trying to use pandas syntax on a Spark DataFrame.

In PySpark, flutten_df['url'].str means "get the struct field named str from the column url". Because url is a string column rather than a struct, you get that error saying it can't extract a value from it.

Use filter with rlike instead:

display(flutten_df.filter(~flutten_df['url'].rlike("www.ebay.com")))
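
One caveat with rlike is that its argument is a regular expression, so each "." in "www.ebay.com" matches any character, not only a literal dot. The sketch below illustrates this with Python's re module as a stand-in for Spark's regex engine (the sample URLs are made up for illustration), and builds an escaped pattern that matches the dots literally:

```python
import re

needle = "www.ebay.com"

# Unescaped: each "." matches any single character,
# so a look-alike host such as "wwwXebay.com" also matches.
loose_hit = re.search(needle, "http://wwwXebay.com/item") is not None

# Escaped: the dots must now match literally.
pattern = re.escape(needle)
strict_hit = re.search(pattern, "http://wwwXebay.com/item") is not None
real_hit = re.search(pattern, "https://www.ebay.com/itm/1") is not None

print(loose_hit, strict_hit, real_hit)  # True False True
```

The escaped pattern should be usable with rlike in the same way, e.g. flutten_df.filter(~flutten_df['url'].rlike(pattern)). If only a plain substring test is needed, Column.contains avoids regex altogether: flutten_df.filter(~flutten_df['url'].contains("www.ebay.com")). Note that in either form, rows where url is null are dropped by the filter, since negating a null condition still yields null.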