I have two columns, in charges column I have different charges like post, neg, transfer, drawer charges
| ref_no | charges |
|---|---|
| 001 | POSTEX USD 100 B^NEGO USD 200 B^TRAN USD 150 |
| 002 | POSTEX USD 400 B^TRAN USD 350 |
| 003 | NEGO USD 700 B^TRAN USD 350 |
| 004 | DRAW USD 700 |
Now what I want is like this:
| ref_no | POSTEX | NEGO | TRAN | DRAW |
|---|---|---|---|---|
| 001 | POSTEX USD 100 | NEGO USD 200 | TRAN USD 150 | |
| 002 | POSTEX USD 400 | TRAN USD 350 | ||
| 003 | NEGO USD 700 | TRAN USD 350 | ||
| 004 | DRAW USD 700 |
I have tried use lateral view split but its not working, is there some other way I can do it using SparkSQL or MySql?
CodePudding user response:
One way using regexp_extract function in Spark SQL:
spark.sql(r"""
SELECT ref_no,
regexp_extract(charges, '(POSTEX USD \\d )', 1) AS POSTEX,
regexp_extract(charges, '(NEGO USD \\d )', 1) AS NEGO,
regexp_extract(charges, '(TRAN USD \\d )', 1) AS TRAN,
regexp_extract(charges, '(DRAW USD \\d )', 1) AS DRAW
FROM my_table
""").show()
# ------ -------------- ------------ ------------ ------------
#|ref_no| POSTEX| NEGO| TRAN| DRAW|
# ------ -------------- ------------ ------------ ------------
#| 001|POSTEX USD 100|NEGO USD 200|TRAN USD 150| |
#| 002|POSTEX USD 400| |TRAN USD 350| |
#| 003| |NEGO USD 700|TRAN USD 350| |
#| 004| | | |DRAW USD 700|
# ------ -------------- ------------ ------------ ------------
