Remove all but the first four characters on each line

Time:02-07

So I have a text file in VS Code that contains several lines of text like so:

1801: Joseph Marie Jacquard, a French merchant and inventor invent a loom that uses punched wooden cards to automatically weave fabric designs. Early computers would use similar punch cards.

So now I'm trying to isolate the year number/the first 4 characters of each line. I'm new to regex, and I know how to get the first 4 characters (I used ^.{4}) but how would I be able to find all EXCEPT for the first 4 characters so that I can replace them with nothing and be left with just the year numbers?

CodePudding user response:

Find:       (?<=^\d{4}).*
Replace: (leave empty)


(?<=^\d{4}) is a positive lookbehind: it only allows a match at a position that is immediately preceded by four digits at the start of a line (^).

.* matches everything else up to the line terminator, so the : is included in the match.

A lookbehind only checks what comes before the match; it does not consume anything. The four digits are therefore never part of the match, so they stay in place when the match is replaced with nothing, and you don't need any capture groups or backreferences in the replacement.
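To try the lookbehind approach outside the editor, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for VS Code's regex engine, and the sample text is hypothetical apart from the year taken from the question.

```python
import re

# Hypothetical sample text; only the year 1801 comes from the question.
text = "1801: Joseph Marie Jacquard invents a loom.\n1843: second sample line"

# (?<=^\d{4}) requires four digits at the start of a line right before the
# match; .* then matches the rest of that line. re.MULTILINE makes ^ match at
# every line start, which is how ^ behaves in VS Code.
years = re.sub(r"(?<=^\d{4}).*", "", text, flags=re.MULTILINE)

print(years)
```

Lines that do not start with four digits are left untouched, because the lookbehind never succeeds on them.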

CodePudding user response:

You can

Find:       ^(.{4}).+
Replace: $1

Details:

  • ^ - start of a line (in Visual Studio Code, ^ matches any line start)
  • (.{4}) - capturing group #1 that captures any four chars other than line break chars
  • .+ - one or more chars other than line break chars, as many as possible.

The $1 backreference in the replacement pattern replaces the match with Group 1 value.
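The capture-group approach can be sketched the same way in Python. This assumes the pattern ends in .+, which is what the "one or more chars" description implies, and it uses Python's \1 where VS Code uses $1. The sample text is hypothetical apart from the year taken from the question.

```python
import re

# Hypothetical sample text; only the year 1801 comes from the question.
text = "1801: Joseph Marie Jacquard invents a loom.\n1843: second sample line"

# ^(.{4}) captures the first four characters of each line as group 1;
# .+ matches the rest of the line. Replacing the whole match with group 1
# keeps only those four characters.
first_four = re.sub(r"^(.{4}).+", r"\1", text, flags=re.MULTILINE)

print(first_four)
```

Unlike the \d{4} lookbehind, .{4} keeps the first four characters of every line that is longer than four characters, whether or not they are digits.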
