pyspark replace lowercase characters in column with 'x'

I'm trying to do the following for a column in a PySpark DataFrame, but with no luck. Any idea how to isolate just the lowercase characters in a column of a Spark DataFrame?

''.join('x' if x.islower() else 'X' if x.isupper() else x for x in text)

CodePudding user response：

Using the following DataFrame as an example:

+----------+
|     value|
+----------+
|lRQWg2IZtB|
|hVzsJhPVH0|
|YXzc4fZDwu|
|qRyOUhT5Hn|
|b85O0H41RE|
|vOxPLFPWPy|
|fE6o5iMJ6I|
|918JI00EC7|
|x3yEYOCwek|
|m1eWY8rZwO|
+----------+

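You can use the pyspark.sql function regexp_replace to isolate the lowercase letters in the column. The pattern below matches uppercase letters, digits, and some punctuation, and replaces each match with an empty string, so only the lowercase letters remain: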

from pyspark.sql import functions

df = (df.withColumn("value", 
         functions.regexp_replace("value", r'[A-Z]|[0-9]|[,.;@#?!&$]', "")))

df.show()
+-----+
|value|
+-----+
|  lgt|
| hzsh|
|zcfwu|
| qyhn|
|    b|
|  vxy|
|  foi|
|     |
|xywek|
| merw|
+-----+
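
If it helps, the pattern above can be sanity-checked in plain Python before running it on a real DataFrame. This is only a sketch: it uses Python's re module rather than Spark (Spark uses Java regular expressions, but these simple character classes behave the same for ASCII input), and keep_lowercase is a made-up helper name.

```python
import re

# Same pattern as in the regexp_replace call above: uppercase letters,
# digits, and a handful of punctuation characters.
PATTERN = r'[A-Z]|[0-9]|[,.;@#?!&$]'


def keep_lowercase(text):
    # Hypothetical helper: delete every character the pattern matches.
    return re.sub(PATTERN, "", text)


samples = ["lRQWg2IZtB", "918JI00EC7", "x3yEYOCwek"]
for s in samples:
    print(repr(s), "->", repr(keep_lowercase(s)))
```

Keep in mind that any character not listed in the pattern (a space or an underscore, for example) is left in place, so the result is "lowercase only" just for data like the sample above.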

CodePudding user response：

You can use regexp_replace directly to substitute the lowercase characters with any desired value.

In your case, you will have to chain two regexp_replace calls to get the final output:

Data Preparation


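import pandas as pd
from pyspark.sql import SparkSession

# Reuse the active Spark session if there is one, otherwise start a new one
spark = SparkSession.builder.getOrCreate()

inp_string = """
lRQWg2IZtB
hVzsJhPVH0
YXzc4fZDwu
qRyOUhT5Hn
b85O0H41RE
vOxPLFPWPy
fE6o5iMJ6I
918JI00EC7
x3yEYOCwek
m1eWY8rZwO
""".strip().split()

# Build a single-column pandas DataFrame and convert it to a Spark DataFrame
df = pd.DataFrame({
    'value': inp_string
})

sparkDF = spark.createDataFrame(df)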


sparkDF.show()

+----------+
|     value|
+----------+
|lRQWg2IZtB|
|hVzsJhPVH0|
|YXzc4fZDwu|
|qRyOUhT5Hn|
|b85O0H41RE|
|vOxPLFPWPy|
|fE6o5iMJ6I|
|918JI00EC7|
|x3yEYOCwek|
|m1eWY8rZwO|
+----------+

Regex Replace

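from pyspark.sql import functions as F

# Lowercase letters become 'x' first, then uppercase letters become 'X'
sparkDF = sparkDF.withColumn('value_modified', F.regexp_replace("value", r'[a-z]', "x"))
sparkDF = sparkDF.withColumn('value_modified', F.regexp_replace("value_modified", r'[A-Z]', "X"))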

sparkDF.show()

+----------+--------------+
|     value|value_modified|
+----------+--------------+
|lRQWg2IZtB|    xXXXx2XXxX|
|hVzsJhPVH0|    xXxxXxXXX0|
|YXzc4fZDwu|    XXxx4xXXxx|
|qRyOUhT5Hn|    xXxXXxX5Xx|
|b85O0H41RE|    x85X0X41XX|
|vOxPLFPWPy|    xXxXXXXXXx|
|fE6o5iMJ6I|    xX6x5xXX6X|
|918JI00EC7|    918XX00XX7|
|x3yEYOCwek|    x3xXXXXxxx|
|m1eWY8rZwO|    x1xXX8xXxX|
+----------+--------------+
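
To double-check that the two chained replacements match the expression from the question, here is a small plain-Python sketch. No Spark is involved, and mask_with_regex and mask_with_str_methods are illustrative names, not part of any library. One caveat it makes visible: [a-z] and [A-Z] only cover ASCII letters, whereas islower() and isupper() also match accented and other non-ASCII letters.

```python
import re


def mask_with_regex(text):
    # Mirrors the two chained regexp_replace calls: lowercase first, then uppercase.
    return re.sub(r'[A-Z]', 'X', re.sub(r'[a-z]', 'x', text))


def mask_with_str_methods(text):
    # The expression from the question.
    return ''.join('x' if c.islower() else 'X' if c.isupper() else c for c in text)


# On ASCII-only input the two approaches agree.
for s in ["lRQWg2IZtB", "918JI00EC7"]:
    assert mask_with_regex(s) == mask_with_str_methods(s)
    print(s, "->", mask_with_regex(s))

# Non-ASCII letters are where they differ: the regex leaves them untouched.
accented = "\u00e9"  # lowercase e with an acute accent
print(mask_with_regex(accented), mask_with_str_methods(accented))
```

If your column only contains ASCII letters and digits, as in the sample data, the difference does not matter.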