I've started studying RegexOne, and there's an exercise in which we must capture the part of a string up to the ".", as long as the string ends with .pdf
match file_a_record_file.pdf
match file_yesterday.pdf
skip testfile_fake.pdf.tmp
But I wanted to get a bit deeper and capture this piece of a string, unless the string contains more than 3 characters after a dot character "."
I tried using
^(\w+(?!([.](.{4,}))$))
but of course it didn't work. How could I correct this pattern, considering the JavaScript RegEx library? (I don't want a function, only a pattern, if that's possible). I guess it would be more flexible if I could avoid using $, but I'd accept any answer matching the question. Thank you all in advance.
CodePudding user response:
You could use a negative lookahead to check that no dot is followed by 4 or more non-dot characters.
^(?!.*\.[^\.\n]{4})\w+(?:\.\w+)*$
const regex = /^(?!.*\.[^\.\n]{4})\w+(?:\.\w+)*$/;
[
"file_a_record_file.pdf",
"file_yesterday.pdf",
"testfile_fake.pdf.tmp",
"testfile_fake.docx.tmp"
].forEach(s => {
console.log((regex.test(s) ? "Match: " : "No match: ") + s);
});
If you want the part before .pdf, and the string must end with .pdf, you can use a capture group and match .pdf at the end.
^(?!.*\.[^\.\n]{4})(\w+(?:\.\w+)*)\.pdf$
const regex = /^(?!.*\.[^.\n]{4})(\w+(?:\.\w+)*)\.pdf$/;
[
"file_a_record_file.pdf",
"file_yesterday.pdf",
"testfile_fake.pdf.tmp",
"testfile_fake.docx.tmp"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(m[1]);
}
});
CodePudding user response:
I believe this is what you are trying to do:
myString.match(/(.*)\.pdf$/)
Match file_a_record_file.pdf
Match file_yesterday.pdf
null testfile_fake.pdf.tmp
Edit: the string without the extension is stored in myString.match(/(.*)\.pdf$/)[1]
To allow at most 3 characters after the first dot:
myString.match(/^([^\.]+)\.[\w]{1,3}$/);
Explanation: match everything that is not a dot, then match the first dot, then match a word of 1 to 3 characters at the end of the string.
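As a quick check of that last pattern, here is a minimal sketch that runs it against the sample names from the question. The helper name `baseName` and the extra input `notes.docx` are made up for illustration. Note that, because of `[^.]+`, this pattern only accepts strings containing exactly one dot.

```javascript
// Hypothetical helper (not part of the answer): returns the part before the
// extension, or null when the string does not match the pattern.
function baseName(s) {
  const m = s.match(/^([^.]+)\.\w{1,3}$/);
  return m ? m[1] : null;
}

[
  "file_a_record_file.pdf",
  "file_yesterday.pdf",
  "testfile_fake.pdf.tmp", // second dot, so no match
  "notes.docx"             // 4 characters after the dot, so no match
].forEach(s => console.log(s, "->", baseName(s)));
```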
CodePudding user response:
I'm confused by your question and by the pattern you mentioned, but here are my two cents:
^([^\W_]+(?:_[^\W_]+)*(?:\.[^\W_]+)*)\.[^\W_]{1,3}$
See an online demo. This will allow more than 3 word-characters other than underscore after a dot, as long as that dot isn't the last one:
^( - Start-line anchor and open 1st capture group;
[^\W_]+(?:_[^\W_]+)* - Match any word-character other than underscore 1+ (greedy) times, then use a non-capture group 0+ times to allow for strings concatenated through underscores;
(?:\.[^\W_]+)*) - Match 0+ times a literal dot followed by word-characters other than underscore, and close the 1st capture group;
\.[^\W_]{1,3} - A literal dot followed by 1-3 word-characters other than underscore;
$ - End-line anchor.
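To see what the breakdown above means in practice, here is a small sketch that applies the pattern to the question's samples. The names `withDots` and `captureBase`, and the input `report.docx`, are assumptions for illustration only. Because intermediate dots are allowed, `testfile_fake.pdf.tmp` matches here, with `testfile_fake.pdf` as the captured part.

```javascript
// Pattern with intermediate dots allowed; only the last extension is limited
// to 1-3 characters.
const withDots = /^([^\W_]+(?:_[^\W_]+)*(?:\.[^\W_]+)*)\.[^\W_]{1,3}$/;

// Hypothetical helper: returns capture group 1, or null when there is no match.
function captureBase(s) {
  const m = s.match(withDots);
  return m ? m[1] : null;
}

[
  "file_a_record_file.pdf",
  "testfile_fake.pdf.tmp",  // matches: the last extension "tmp" has 3 characters
  "testfile_fake.docx.tmp", // matches: "docx" is not the last extension
  "report.docx"             // no match: the last extension has 4 characters
].forEach(s => console.log(s, "->", captureBase(s)));
```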
If you don't want to allow for intermediate dots, just remove that part from the expression:
^([^\W_]+(?:_[^\W_]+)*)\.[^\W_]{1,3}$
See the online demo
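For comparison, a minimal sketch of this stricter variant on the same kind of input. The names `noDots` and `captureStrict` are hypothetical. With the intermediate-dot group removed, any name containing more than one dot is rejected.

```javascript
// Stricter variant: no intermediate dots, extension of 1-3 characters.
const noDots = /^([^\W_]+(?:_[^\W_]+)*)\.[^\W_]{1,3}$/;

// Hypothetical helper: returns capture group 1, or null when there is no match.
function captureStrict(s) {
  const m = s.match(noDots);
  return m ? m[1] : null;
}

[
  "file_yesterday.pdf",    // matches, captures "file_yesterday"
  "testfile_fake.pdf.tmp", // no match: two dots
  "report.docx"            // no match: 4 characters after the dot
].forEach(s => console.log(s, "->", captureStrict(s)));
```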
