I get the following error when I run my script in Databricks:
TypeError: _() takes 2 positional arguments but 4 were given
It is raised by this line:
sessionevents = eventsDF.filter(eventsDF.eventcategory.contains("size guide","native size guide","product interactions")).groupby('eventcategory','uniquesessionid').count()
I'm not sure whether I need to define self here and, if so, how. Can anyone help?
Answer:
The Column method contains takes a single value, but you're passing three. Its signature is effectively:
def contains(self, item: Any) -> Column
The error message says "2 positional arguments" because it counts self (the Column itself) plus the one item. "4 were given" is self plus your three strings.
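To see where the numbers in the message come from, here is a plain-Python sketch that reproduces the same TypeError without Spark. It assumes (based on how PySpark 3.x's column.py is written, not on anything in the question) that contains is generated by a helper whose inner function is named _, which would explain why the message says _() rather than contains(). FakeColumn and bin_op are hypothetical stand-ins, not PySpark classes.

```python
def bin_op(name):
    """Hypothetical stand-in for a helper that builds a binary Column method."""
    def _(self, other):
        # Called as a bound method, self is passed implicitly,
        # so exactly one extra value is accepted.
        return f"{self.name}.{name}({other!r})"
    return _


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column."""
    def __init__(self, name):
        self.name = name

    # The method is created by the helper, so its __name__ is "_".
    contains = bin_op("contains")


col = FakeColumn("eventcategory")
print(col.contains("size guide"))  # one value: works

error_message = None
try:
    # Three values plus the implicit self = 4 positional arguments.
    col.contains("size guide", "native size guide", "product interactions")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

On newer Python versions the message may show the qualified name (bin_op.<locals>._()) instead of _(), but the argument counts are the same.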
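If I understand correctly what you're trying to achieve, you can use the rlike function instead. It matches a regular expression, so the three values can be combined into one pattern separated by |: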
sessionevents = eventsDF.filter(
    eventsDF.eventcategory.rlike("size guide|native size guide|product interactions")
).groupby('eventcategory', 'uniquesessionid').count()
# show() prints the result and returns None, so call it separately
sessionevents.show()
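For intuition about which rows the rlike pattern keeps, here is a plain-Python analogue of the filter/groupby/count using re.search, which, like Spark's rlike, matches when the pattern is found anywhere in the string. It is not a Spark program, and the sample rows are made up for illustration. It also checks that, for plain-text values, the | pattern selects the same rows as testing contains for each value and OR-ing the results.

```python
import re
from collections import Counter

# Made-up sample rows: (eventcategory, uniquesessionid)
rows = [
    ("size guide", "s1"),
    ("native size guide", "s1"),
    ("product interactions", "s2"),
    ("checkout", "s2"),
    ("size guide", "s1"),
]

values = ["size guide", "native size guide", "product interactions"]
pattern = "|".join(values)  # "size guide|native size guide|product interactions"

# Like rlike: keep a row if the regex is found anywhere in the category.
regex_rows = [r for r in rows if re.search(pattern, r[0])]

# Equivalent substring test: contains(v1) OR contains(v2) OR contains(v3).
substring_rows = [r for r in rows if any(v in r[0] for v in values)]

# Analogue of .groupby('eventcategory', 'uniquesessionid').count()
counts = Counter(regex_rows)
for (category, session), n in sorted(counts.items()):
    print(category, session, n)
```

If any value contained regex metacharacters such as . or (, it would need escaping before being joined into the pattern. Since "native size guide" already contains "size guide", that alternative is redundant for matching, though harmless. If you want exact matches rather than substring matches, Column.isin accepts several values, e.g. eventsDF.eventcategory.isin("size guide", "native size guide", "product interactions").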
