I want to keep only plain letters and apostrophes using re.sub in Python. Right now my code also removes apostrophes, so "don't" becomes "dont", and so on. Can I make my re.sub call preserve apostrophes, or do I have to use some other solution?
My code right now:
import re

text = open("songs/" + artist + "/" + album + "/" + song, "r", encoding="latin-1")
lines = text.readlines()
for line in lines:
    line = line.lower()
    line = re.sub('[^a-z ]', '', line)
    words = line.split(" ")
CodePudding user response:
The code
re.sub('[^a-z ]', '', line)
takes every character that is not (^) a lowercase letter a-z or a space, and removes it (by replacing it with the empty string '').
You want to add apostrophes to the list of characters that are preserved. In order to do so, you can either escape the single-quote/apostrophe character in your regex:
re.sub('[^a-z \']', '', line)
or use double-quotes in the string for your regex:
re.sub("[^a-z ']", '', line)
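As a quick check of the second form, here is a minimal sketch. The helper name clean_line and the sample sentence are made up for illustration and are not part of the question's code.

```python
import re

def clean_line(line):
    """Lowercase a line, then drop everything except a-z, spaces, and apostrophes."""
    line = line.lower()
    # The character class now lists the apostrophe, so it is no longer removed.
    return re.sub(r"[^a-z ']", "", line)

sample = "Don't stop, it's 9 o'clock!\n"
cleaned = clean_line(sample)
print(repr(cleaned))

# split() with no argument collapses runs of whitespace,
# so the gap left by the removed "9" does not produce an empty word.
print(cleaned.split())
```

Removing characters can leave two spaces in a row (here, where the "9" was), so line.split() with no argument may be more convenient than line.split(" "). Note also that this pattern only keeps the plain ASCII apostrophe; a typographic apostrophe such as ’ would still be stripped.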
Separate comment:
By the way, a modern way of building a string from variables is an f-string. Instead of
"songs/" + artist + "/" + album + "/" + song
you can use
f"songs/{artist}/{album}/{song}"
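To show that the two spellings produce the same path, here is a small sketch. The values assigned to artist, album, and song are placeholders invented for the example.

```python
# Placeholder values, assumed only for this illustration.
artist, album, song = "some_artist", "some_album", "some_song.txt"

concatenated = "songs/" + artist + "/" + album + "/" + song
formatted = f"songs/{artist}/{album}/{song}"

print(formatted)
print(concatenated == formatted)
```

Both expressions build the same string; the f-string just keeps the path readable in one piece.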
