Finding tokens with a length that falls within a certain range

Time:01-18

I am using spaCy to find PII entities in a body of text. Right now I am using the following pattern:

{
    "label": "CREDIT_CARD",
    "id": "CREDIT_CARD_PATTERN",
    "pattern": [{"IS_DIGIT": True, "LENGTH": 14}, {"IS_DIGIT": True, "OP": "?"}]
}

I want this to match a digit token of length 14 or 15. However, it does not work as expected: I only get matches for digit tokens of length 14. Is there a better way to find digit tokens whose length falls within a range? For example, a pattern that matches all digit tokens with a length from 14 to 16?
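The behavior described above can be reproduced with a minimal sketch, assuming spaCy v3 (where nlp.add_pipe("entity_ruler") is available) and a blank English pipeline. The sample text is made up. A lone 15-digit token produces no match, while a 14-digit token followed by a separate digit token is matched as a two-token entity.

```python
import spacy

# Blank English pipeline: only the tokenizer, no trained components required.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {
        "label": "CREDIT_CARD",
        "id": "CREDIT_CARD_PATTERN",
        # Token 1 must be exactly 14 digits; token 2 is an optional,
        # separate digit token (the "?" applies to token 2, not to token 1's length).
        "pattern": [{"IS_DIGIT": True, "LENGTH": 14}, {"IS_DIGIT": True, "OP": "?"}],
    }
])

# Made-up sample: a 14-digit token, a 15-digit token,
# and a 14-digit token followed by a separate 1-digit token.
doc = nlp("a 12345678901234 b 123456789012345 c 12345678901234 5")
found = [ent.text for ent in doc.ents]
print(found)
# ['12345678901234', '12345678901234 5']
```

The 15-digit token "123456789012345" is never matched, because the first token dict requires a length of exactly 14.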

Answer:

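In your pattern, "OP": "?" belongs to a second token dict, so it makes an additional, separate digit token optional; it never changes the length allowed for the first token, which is why a single 15-digit token is not matched. Instead, you can specify a range for a value like LENGTH using extended comparison operators: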

pattern = [{"LENGTH": {">=": 10, "<=": 12}}]

See: https://spacy.io/usage/rule-based-matching#adding-patterns-attributes-extended
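Applied to the question, the range can go straight into the same EntityRuler pattern format. This is a minimal sketch, assuming spaCy v3 and a blank English pipeline; the sample digit strings are made up. Note that it only matches a number written as one unbroken token: a number written with spaces between digit groups is split into several tokens by the tokenizer and would need a different pattern.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {
        "label": "CREDIT_CARD",
        "id": "CREDIT_CARD_PATTERN",
        # A single token that is all digits and whose length is 14 to 16 inclusive.
        "pattern": [{"IS_DIGIT": True, "LENGTH": {">=": 14, "<=": 16}}],
    }
])

# Made-up sample: digit strings of length 13, 14, 15, 16 and 17.
doc = nlp("1234567890123 12345678901234 123456789012345 1234567890123456 12345678901234567")
found = [(ent.text, ent.label_) for ent in doc.ents]
for text, label in found:
    print(label, len(text), text)
```

Only the 14-, 15- and 16-digit tokens come back as CREDIT_CARD entities; the 13- and 17-digit tokens fall outside the range.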
