I would like to find the cells that contain exactly the string foo (nothing before, nothing after) in a cell array whose other entries may contain it as a substring, such as foobar.
I am currently using regexp in MATLAB and would like to adjust the pattern so that it does not match cells where my search string is only part of a longer string.
I know it kind of goes against the very idea of regexp, but I am fairly certain there is a way to do what I want.
As an MWE, here is a snippet of the data I have (a cell array), called potentialfields:
'horaracha'
'sol'
'presmax'
'horapresmax'
'presmin'
'horapresmin'
and the regexp code that I am currently using:
selected_fields={'sol','presmin'};
diffset=setdiff(potentialfields,selected_fields);
pattern=strjoin(diffset,'|');
idx_to_delete=~cellfun(@isempty,regexp(potentialfields,pattern));
The expected output of idx_to_delete is the following:
1 0 1 1 0 1
At the moment, the output is 1 0 1 1 1 1 because horapresmin contains presmin.
Thank you very much in advance.
CodePudding user response:
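regexp is overkill here. ismember is a built-in function designed for finding exact strings in a cell array. Negate its result to flag the fields that are not in selected_fields, which gives the expected 1 0 1 1 0 1:
idx_to_delete = ~ismember( potentialfields, selected_fields );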
If you're really set on regexp, you can use the start anchor (^) and end anchor ($) so that only whole strings match. A cell with no match is then one to delete:
pattern = ['^(', strjoin( selected_fields, '|' ), ')$'];
idx_to_delete2 = cellfun( @isempty, regexp( potentialfields, pattern ) );
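As a quick check, here is a minimal sketch that runs both approaches on the sample data from the question. It assumes potentialfields is stored as a row cell array of character vectors, and it takes "to delete" to mean "not in selected_fields", as in the question's expected output. The names idx_exact and idx_regex are only illustrative.

```matlab
% Sample data from the question (assumed to be a row cell array)
potentialfields = {'horaracha','sol','presmax','horapresmax','presmin','horapresmin'};
selected_fields = {'sol','presmin'};

% Exact matching: flag every field that is NOT in selected_fields
idx_exact = ~ismember(potentialfields, selected_fields);

% Anchored regex: ^ and $ force the whole string to equal one alternative,
% so 'horapresmin' does not match 'presmin'
pattern = ['^(', strjoin(selected_fields, '|'), ')$'];
idx_regex = cellfun(@isempty, regexp(potentialfields, pattern));

disp(idx_exact)                      % 1 0 1 1 0 1
disp(isequal(idx_exact, idx_regex))  % 1
```

Both vectors agree on this data. The ismember version also avoids any surprises if a field name ever contains regex metacharacters.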
CodePudding user response:
You can build the word boundary based regex dynamically:
pattern = strcat('\<(', strjoin(diffset,'|'), ')\>')
idx_to_delete=~cellfun(@isempty,regexp(potentialfields,pattern))
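strjoin(diffset,'|') creates the alternation pattern. Wrapping it as \<(...)\> puts the alternatives in a group between a start-of-word and an end-of-word boundary, so the boundaries apply to every alternative and only whole words match. Note that MATLAB character vectors in single quotes do not treat the backslash as an escape character, so the boundaries are written with a single backslash.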
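The word boundary approach can be sketched on the question's data as follows. This assumes potentialfields is a row cell array, so that setdiff also returns a row that strjoin accepts. The variable hit_in_phrase is hypothetical and only there to show where word boundaries and anchors differ.

```matlab
% Sample data from the question (assumed to be a row cell array)
potentialfields = {'horaracha','sol','presmax','horapresmax','presmin','horapresmin'};
selected_fields = {'sol','presmin'};

% Fields to delete are the ones not selected
diffset = setdiff(potentialfields, selected_fields);

% \< and \> are MATLAB's start-of-word and end-of-word assertions
pattern = ['\<(', strjoin(diffset, '|'), ')\>'];
idx_to_delete = ~cellfun(@isempty, regexp(potentialfields, pattern));
disp(idx_to_delete)   % 1 0 1 1 0 1

% Caveat: a word boundary is weaker than ^...$ when a cell holds several words
hit_in_phrase = ~isempty(regexp('hora presmin', '\<presmin\>', 'once'));
disp(hit_in_phrase)   % 1, because 'presmin' is a whole word inside the phrase
```

For single-word field names like these, word boundaries and anchors give the same result. If cells may contain spaces or punctuation, the ^ and $ anchors are the stricter choice.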
