I am using spaCy to find PII entities within a body of text. Right now I am using the following pattern:
{
"label": "CREDIT_CARD",
"id": "CREDIT_CARD_PATTERN",
"pattern": [{"IS_DIGIT": True, "LENGTH": 14}, {"IS_DIGIT": True, "OP": "?"}]
}
I want to match a single DIGIT token of length 14 or 15. However, this does not work as expected: I only get results for DIGIT tokens of length 14. Is there a better way to match DIGIT tokens whose length falls within a range, for example a pattern that matches all DIGIT tokens of length 14 to 16?
CodePudding user response:
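Each dictionary in a pattern describes one token, so your pattern matches a 14-digit token optionally followed by a second digit token, not a single token of 14 or 15 digits. To match one token whose length falls within a range, use extended comparison operators on LENGTH (the bounds 10 and 12 here are just an example):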
pattern = [{"LENGTH": {">=": 10, "<=": 12}}]
See: https://spacy.io/usage/rule-based-matching#adding-patterns-attributes-extended
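Applied to the question, a minimal sketch might look like the following. It assumes spaCy v3 (where the ruler is added with `nlp.add_pipe("entity_ruler")`) and uses a blank English pipeline so no model download is needed; the sample sentence and the variable names are made up for illustration.

```python
import spacy

# Assumption: spaCy v3.x. A blank pipeline only tokenizes, which is all
# the token-attribute pattern below needs.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# A single token that consists only of digits and is 14 to 16 characters long.
ruler.add_patterns([
    {
        "label": "CREDIT_CARD",
        "id": "CREDIT_CARD_PATTERN",
        "pattern": [{"IS_DIGIT": True, "LENGTH": {">=": 14, "<=": 16}}],
    }
])

# Illustrative text: a 14-digit, a 16-digit, and a 13-digit number.
doc = nlp("Cards 12345678901234 and 1234567890123456 but not 1234567890123 here")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

Both constraints sit in the same dictionary, so they apply to the same token. The 13-digit number should be skipped because it falls below the lower bound.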
