How to write a RegEx that would capture a piece of a string, unless the string ends with a certain p

Time:01-28

I've started studying RegexOne, and there's an exercise in which we must capture the part of a string up to the ".", as long as the string ends with .pdf

match   file_a_record_file.pdf
match   file_yesterday.pdf
skip    testfile_fake.pdf.tmp   

But I wanted to get a bit deeper and capture this piece of a string, unless the string contains more than 3 characters after a dot character "."

I tried using

^(\w+(?!([.](.{4,}))$))

but of course it didn't work. How could I correct this pattern, considering the JavaScript RegEx library? (I don't want a function, only a pattern, if that's possible). I guess it would be more flexible if I could avoid using $, but I'd accept any answer matching the question. Thank you all in advance.

CodePudding user response:

You could use a negative lookahead to assert that no dot is followed by 4 or more non-dot characters.

^(?!.*\.[^.\n]{4})\w+(?:\.\w+)*$

const regex = /^(?!.*\.[^.\n]{4})\w+(?:\.\w+)*$/;
[
  "file_a_record_file.pdf",
  "file_yesterday.pdf",
  "testfile_fake.pdf.tmp",
  "testfile_fake.docx.tmp"
].forEach(s => {
  console.log((regex.test(s) ? "Match: " : "No match: ") + s);
});

If you want to capture the part before .pdf, and only when the string ends with .pdf, you can use a capture group and match .pdf at the end.

^(?!.*\.[^.\n]{4})(\w+(?:\.\w+)*)\.pdf$

const regex = /^(?!.*\.[^.\n]{4})(\w+(?:\.\w+)*)\.pdf$/;
[
  "file_a_record_file.pdf",
  "file_yesterday.pdf",
  "testfile_fake.pdf.tmp",
  "testfile_fake.docx.tmp"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]);
  }
});
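
As for why the attempt in the question did not work, here is a minimal sketch, assuming the pattern there was meant to read \w+ (with the + quantifier). When the lookahead fails, \w+ simply gives back one character and the lookahead is tried again at a position where no dot follows, so the match still succeeds with a truncated capture. The helper name firstGroup is only illustrative.

```javascript
// Assumed reading of the pattern from the question (with the + after \w).
const attempt = /^(\w+(?!([.](.{4,}))$))/;
// Pattern from this answer: the lookahead is evaluated once, at the start of the string.
const anchored = /^(?!.*\.[^.\n]{4})\w+(?:\.\w+)*$/;

// Hypothetical helper: return capture group 1, or null when there is no match.
function firstGroup(regex, s) {
  const m = s.match(regex);
  return m ? m[1] : null;
}

const sample = "testfile_fake.docx.tmp";
console.log(firstGroup(attempt, sample)); // "testfile_fak" (\w+ backtracked by one character)
console.log(anchored.test(sample));       // false
```

Putting the lookahead right after ^ avoids this, because there is no earlier quantifier that can backtrack to a position where the assertion passes.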

CodePudding user response:

I believe this is what you are trying to do:

myString.match(/(.*)\.pdf$/)

Match   file_a_record_file.pdf
Match   file_yesterday.pdf
null    testfile_fake.pdf.tmp

Edit: the string without the extension is stored in myString.match(/(.*)\.pdf$/)[1]

To match at most 3 characters after the first dot:

myString.match(/^([^.]+)\.\w{1,3}$/);

Explanation: match everything that is not a dot, then the first dot, then a word of 1 to 3 characters at the end of the string.
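
A small sketch to try both patterns from this answer against the sample strings. The helper name baseName and the extra input "report.docx" are assumptions added for illustration.

```javascript
const endsWithPdf = /(.*)\.pdf$/;
const shortExtension = /^([^.]+)\.\w{1,3}$/;

// Hypothetical helper: return capture group 1, or null when there is no match.
function baseName(regex, s) {
  const m = s.match(regex);
  return m ? m[1] : null;
}

[
  "file_a_record_file.pdf",
  "file_yesterday.pdf",
  "testfile_fake.pdf.tmp",
  "report.docx" // assumed extra sample with a 4-character extension
].forEach(s => {
  console.log(s, "->", baseName(endsWithPdf, s), "|", baseName(shortExtension, s));
});
```

Both patterns return null for testfile_fake.pdf.tmp. The second one also returns null for report.docx, because the extension is longer than 3 characters.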

CodePudding user response:

I'm a bit confused by your question and the pattern you mentioned, but here are my two cents:

^([^\W_]+(?:_[^\W_]+)*(?:\.[^\W_]+)*)\.[^\W_]{1,3}$

This allows more than 3 word-characters other than underscore after a dot, as long as that dot isn't the last one:

  • ^( - Start-line anchor and open 1st capture group;
    • [^\W_]+(?:_[^\W_]+)* - Match any word-character other than underscore 1+ (greedy) times and use a non-capture group 0+ times to allow for concatenated strings through underscore;
    • (?:\.[^\W_]+)*) - Optionally match 0+ times a literal dot followed by word-characters other than underscore and close 1st capture group;
  • \.[^\W_]{1,3} - A literal dot followed by 1-3 word-characters other than underscore;
  • $ - End-line anchor.

If you don't want to allow for intermediate dots, just remove that part from the expression:

^([^\W_]+(?:_[^\W_]+)*)\.[^\W_]{1,3}$
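
To compare the two variants, here is a minimal sketch. The names withDots, withoutDots and captured are assumptions for illustration, and the + quantifiers are written out explicitly.

```javascript
// Variant that tolerates intermediate dots before the final 1-3 character extension.
const withDots = /^([^\W_]+(?:_[^\W_]+)*(?:\.[^\W_]+)*)\.[^\W_]{1,3}$/;
// Variant that allows exactly one dot.
const withoutDots = /^([^\W_]+(?:_[^\W_]+)*)\.[^\W_]{1,3}$/;

// Hypothetical helper: return capture group 1, or null when there is no match.
function captured(regex, s) {
  const m = s.match(regex);
  return m ? m[1] : null;
}

console.log(captured(withDots, "file_a_record_file.pdf"));    // "file_a_record_file"
console.log(captured(withDots, "testfile_fake.docx.tmp"));    // "testfile_fake.docx"
console.log(captured(withoutDots, "testfile_fake.docx.tmp")); // null
console.log(captured(withDots, "testfile_fake.tmp.docx"));    // null
```

The last call returns null because only the final extension is limited to 1-3 characters. Earlier dot-separated parts can be any length.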
