Extract text starting from negated set up until (but not including) first occurrence of @

Time:01-05

Good day, community.

Say I have the following line:

    [ ] This is a sentence about apples. @fruit @tag

I wish to create a regex that can generically extract the portion: "This is a sentence about apples." only.

That is, ignore the [ ] before the sentence, and ignore @fruit @tag after.

What I have so far is: ([^\s*\[\s\]\s])(.*@)

Which is creating the following match: This is a sentence about apples. @fruit @

How would I match up to, but not including, the first occurrence of the @ symbol, while still negating the [ ] pattern with the ([^\s*\[\s\]\s]) group?
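
A quick way to see why the attempt overshoots is to run it outside AutoHotkey. The sketch below uses Python's re module rather than AHK (an assumption made only so it can be run anywhere; both engines treat this pattern the same way on this input). The first group is a character class, so it consumes exactly one character that is not whitespace, *, [ or ], and the greedy .*@ then runs to the last @ on the line.

```python
import re

line = "[ ] This is a sentence about apples. @fruit @tag"

# The pattern from the question: group 1 is a single character ("T"),
# and the greedy .*@ extends the match to the LAST @ on the line.
attempt = re.search(r"([^\s*\[\s\]\s])(.*@)", line)
print(repr(attempt.group(1)))  # 'T'
print(repr(attempt.group(0)))  # 'This is a sentence about apples. @fruit @'

# Swapping .*@ for [^@]* stops before the FIRST @,
# but still leaves the trailing space in the match.
negated = re.search(r"([^\s*\[\s\]\s])([^@]*)", line)
print(repr(negated.group(0)))  # 'This is a sentence about apples. '
```

The leftover trailing space is why a final "non-@, non-whitespace" character is useful at the end of the pattern.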

EDIT: Thanks to Wiktor Stribiżew for the critical piece to help:

RegExMatch(str, "O)\[\s*]\s*([^@]*[^@\s])", output)

Final code:

; Zim Inbox txt file
FileEncoding, UTF-8
File := "C:\Users\dragoon\Desktop\anki_cards.txt"

; sleep is necessary

;;Highlight line and copy
#IfWinActive ahk_exe zim.exe
{
clipboard=
sleep, 500
Send ^+c
ClipWait
Send ^{Down}
clipboardQuestion := clipboard
FoundQuestion := RegExMatch(clipboardQuestion,"O)\[\s*]\s*([^@]*[^@\s])",outputquestion)

clipboard=
sleep, 500
Send ^+c
ClipWait
clipboardAnswer := clipboard
FoundAnswer := RegExMatch(clipboardAnswer,"O)\[\s*]\s*([^@]*[^@\s])",outputanswer)

quotedQuestionAnswer := outputquestion[1] """" outputanswer[1] """"

Fileappend, %quotedQuestionAnswer%, %File%
}

What it does: in a Zim Wiki notebook on Windows, press the Win+V hotkey with the cursor on the Question? line of the following structure:

[ ] Question Header
    [ ] Question?
        [ ] Answer about dogs @cat @dog

This will result in the text being formatted as such in an external file:

Question?"Answer about dogs"
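
For anyone who wants to check the formatting step without Zim or AutoHotkey, here is a minimal sketch in Python. It assumes the two copied lines are already available as strings; make_card is a hypothetical helper name, and returning None on a failed match is an addition of this sketch (the script above does not check for that).

```python
import re

# Same pattern as in the script: skip the checkbox, capture up to the first @.
PATTERN = re.compile(r"\[\s*]\s*([^@]*[^@\s])")

def make_card(question_line, answer_line):
    """Build the Question?"Answer" text from two checkbox lines (hypothetical helper)."""
    q = PATTERN.search(question_line)
    a = PATTERN.search(answer_line)
    if not (q and a):
        return None  # assumption: signal a failed match instead of writing a partial card
    # Question text followed by the answer wrapped in double quotes
    return f'{q.group(1)}"{a.group(1)}"'

card = make_card("    [ ] Question?", "        [ ] Answer about dogs @cat @dog")
print(card)  # Question?"Answer about dogs"
```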

This is an acceptable format for Anki card importing, and can be used to quickly make cards from a review structure. Thanks again for all the help on my first SO question.

CodePudding user response:

You can use

\[\s*]\s*\K[^@]*[^@\s]

See the regex demo. Details:

  • \[\s*]\s* - [, zero or more whitespaces, ], zero or more whitespaces
  • \K - "forget" what has just been matched
  • [^@]* - zero or more chars other than @
  • [^@\s] - a char other than @ and whitespace.
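
To try the pieces above outside AutoHotkey, here is a small sketch in Python. Python's built-in re module has no \K, so a capture group stands in for it; extract_text is a hypothetical helper name, not part of any library.

```python
import re

# \[\s*]\s*      the checkbox and any whitespace after it
# ([^@]*[^@\s])  everything up to the first @, ending on a non-space character
PATTERN = re.compile(r"\[\s*]\s*([^@]*[^@\s])")

def extract_text(line):
    """Return the text after the checkbox and before the first @, or None."""
    m = PATTERN.search(line)
    return m.group(1) if m else None

print(extract_text("[ ] This is a sentence about apples. @fruit @tag"))
# This is a sentence about apples.
print(extract_text("[]No tags here   "))  # No tags here
print(extract_text("[ ] @only-tags"))     # None
```

The last call returns None because [^@\s] requires at least one non-@, non-whitespace character between the checkbox and the first @.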

Note that in AutoHotkey, you can also capture part of a match if you use Object mode:

RegExMatch(str, "O)\[\s*]\s*([^@]*[^@\s])", output)

The string you want to use is captured with Group 1 pattern (defined with a pair of unescaped parentheses) and you can access it via output[1]. See documentation:

Object mode. [v1.1.05+]: This causes RegExMatch() to yield all information of the match and its subpatterns to a match object in OutputVar. For details, see OutputVar.
