I have a URL string like "home/products/product_name_1/details/some_options"
and I want to parse it with a RegExp into the array ["home", "products", "product", "details", "some"].
So the rule is: split into words at each slash, but if a word has underscores, take only the part that comes before the first underscore.
The JavaScript equivalent (without a regex) is:
str.split("/").map(item => item.indexOf("_") > -1 ? item.split("_")[0] : item)
Please help!
Answer:
You can use this pattern:
(?<!\w)[^/_]+
Results:
['home', 'products', 'product', 'details', 'some']
Python code:
import re
s = "home/products/product_name_1/details/some_options"
print(re.findall(r'(?<!\w)[^/_]+', s))
['home', 'products', 'product', 'details', 'some']
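The same pattern also works in JavaScript, because engines that support ES2018 handle lookbehind assertions. A minimal sketch, assuming Node.js or a modern browser; the helper name `firstWords` is hypothetical:

```javascript
// Hypothetical helper: returns the part of each path segment before its first underscore.
// (?<!\w) rejects start positions right after a word character (e.g. right after "_"),
// and [^/_]+ then consumes everything up to the next "/" or "_".
function firstWords(url) {
  // match() with the g flag returns every match, or null when nothing matches
  return url.match(/(?<!\w)[^/_]+/g) ?? [];
}

console.log(JSON.stringify(firstWords("home/products/product_name_1/details/some_options")));
// ["home","products","product","details","some"]
```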
Answer:
Try this:
const input = ["home/products/product_name_1/details/some_options",
  "company/products/cars_all/details/black_color",
  "public/places/1_cities/districts/1234_something"];
const pattern = /([a-zA-Z\d]*)(?:\/|_.*?(?:\/|$))/gmi;
input.forEach(el => {
  // matchAll returns an iterator of match objects; index 1 holds the captured word
  for (const match of el.matchAll(pattern)) {
    console.log(match[1]);
  }
});
Remove \d from the regex pattern if you don't want digits in the URL.
I have used matchAll here. matchAll returns an iterator (and requires the g flag); use it to get each match object, in which the first element (index 0) is the full match and the second element (index 1) is the required capture group.
/([a-zA-Z\d]*)(?:\/|_.*?(?:\/|$))/gmi
- ([a-zA-Z\d]*) capture group that matches letters and digits
- (?:\/|_.*?(?:\/|$)) non-capturing group that matches '/', or '_' and everything up to the next '/' or the end of the line
- gmi: the global, multiline and case-insensitive flags
You can test this regex here: https://regex101.com/r/B5Bo74/1
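Since the question asks for an array rather than logged values, the captured words can be collected with `Array.from` and a mapping function. A sketch using the same pattern with only the g flag (matchAll requires it; `i` and `m` make no difference for these single-line, explicitly two-case inputs); `segmentHeads` is a hypothetical name:

```javascript
// Same pattern as above, keeping only the g flag that matchAll requires.
const pattern = /([a-zA-Z\d]*)(?:\/|_.*?(?:\/|$))/g;

// Hypothetical helper: collects capture group 1 of every match into an array.
function segmentHeads(url) {
  return Array.from(url.matchAll(pattern), match => match[1]);
}

console.log(JSON.stringify(segmentHeads("company/products/cars_all/details/black_color")));
// ["company","products","cars","details","black"]
```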
Answer:
You can use:
\b[^\W_]+
- \b A word boundary to prevent a partial match
- [^\W_]+ Match 1+ word characters except for _
See a regex demo.
const s = "home/products/product_name_1/details/some_options";
const regex = /\b[^\W_]+/g;
console.log(s.match(regex));
If there has to be a leading / or the start of the string before the match, you can use an alternation (?:^|\/) and use a capture group for the values that you want to keep:
const s = "home/products/product_name_1/details/some_options";
const regex = /(?:^|\/)([^\W_]+)/g;
console.log(Array.from(s.matchAll(regex), m => m[1]));
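These patterns agree with the question's split/map one-liner on the sample URL, but not once a segment contains characters outside [A-Za-z0-9_], because \b and [^\W_] only look at word characters. A small comparison sketch; the hyphenated URL is made up for illustration:

```javascript
// Reference behaviour from the question: split on "/" and keep the text before the first "_".
const viaSplit = url => url.split("/").map(item => item.split("_")[0]);

// Regex from this answer: runs of word characters (minus "_") that start at a word boundary.
const viaRegex = url => url.match(/\b[^\W_]+/g) ?? [];

// Hypothetical URL with a hyphenated segment.
const url = "home/my-page/product_name_1";
console.log(JSON.stringify(viaSplit(url))); // ["home","my-page","product"]
console.log(JSON.stringify(viaRegex(url))); // ["home","my","page","product"]
```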
Answer:
Given input:
- string
"home/products/product_name_1/details/some_options"
Expected output:
- array
["home", "products", "product", "details", "some"] - Note: ignore/exclude name, 1, options (because these words occur after the first underscore).
Task:
- split the URI by slash into a set of path segments (words)
- (if a path segment contains underscores) remove the part after the first underscore
Regex to match
With the regex \/|_\w+ you can match both the URL-path separator (slash) and the excluded part of a word (the first underscore and everything after it).
Then use this regex
- either as a separator to split the string into its parts (excluding the regex matches), e.g. in JS split(/\/|_\w+/)
- or as a search pattern in replace to prepare a string that can be easily split, e.g. in JS replaceAll(/\/|_\w+/g, ',') to obtain a CSV row, which can then be split by comma with split(',')
Beware: the regex syntax itself (flavor) and the functions to apply it depend on your environment/regex engine and scripting/programming language.
Regex applied in JavaScript
split by regex
For example, in JavaScript use url.split(/\/|_\w*/) where:
- /pattern/: everything between the slashes is the regex pattern
- \/: an escaped slash (the URL-path separator)
- |: alternation, interpreted as boolean OR
- _\w*: zero or more (*) word characters (\w, i.e. a letter, digit or underscore) following an underscore
However, this also returns empty strings (the empty pieces left between two adjacent separators, e.g. between "_name_1" and the following "/", and at the end after "_options"). We can remove the empty strings with a filter whose predicate s => s returns true only if the string is non-empty.
Demo to solve your task:
const url = "home/products/product_name_1/details/some_options";
let firstWordsInSegments = url.split(/\/|_\w*/).filter(s => s);
console.log(firstWordsInSegments);
const urlDuplicate = "home/products/product_name_1/details/some_options/_/home";
console.log(urlDuplicate.split(/\/|_\w*/).filter(s => s)); // contains duplicates in output array
replace into CSV, then split and exclude (map, replace, filter)
The CSV containing path segments can be split by comma, and the resulting parts (path segments) can be filtered or replaced to exclude unwanted sub-parts,
using:
- replaceAll to transform to CSV or to remove the part after an underscore. Note: the global flag is required when calling replaceAll with a regex
- map to remove the unwanted part after the underscore
- filter(s => s) to filter out empty strings
const url = "home/products/product_name_1/details/some_options";
// step by step
let pathSegments = url.split('/');
console.log('pathSegments:', pathSegments);
let firstWordsInSegments = pathSegments.map(s => s.replaceAll(/_\w*/g,''));
console.log(firstWordsInSegments);
// replace to obtain CSV and then split
let csv = "home/products/product_name_1/details/some_options/_/home".replaceAll(/\/|_\w*/g, ','); // _\w* (not _\w+) so that the lone "_" segment is replaced too
console.log('csv:', csv);
let parts = csv.split(',');
console.log('parts:', parts); // contains empty parts
let nonEmptyParts = parts.filter(s => s);
console.log('nonEmptyParts:', nonEmptyParts); // filtered out empty parts
Bonus Tip
Try your regex online (e.g. regex101 or regexplanet). See the demo on regex101.
Answer:
You could split the URL with this regex:
(_\w*)+|(\/)
This matches the /, _name_1 and _options.
BUT depending on what you are trying to do, or which language you use, there are much better options for this.
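In JavaScript specifically, String.prototype.split copies the contents of capturing groups into the result array (and undefined for groups that did not take part in a match), so (_\w*)+|(\/) is easier to use there with non-capturing groups. A sketch of that variant, plus a filter for the empty strings left between adjacent separators:

```javascript
const url = "home/products/product_name_1/details/some_options";

// Non-capturing version of (_\w*)+|(\/): split() then returns only the text between separators.
const parts = url.split(/(?:_\w*)+|\//);
console.log(JSON.stringify(parts)); // ["home","products","product","","details","some",""]

// Drop the empty strings produced where "_name_1" is directly followed by "/" and at the end.
console.log(JSON.stringify(parts.filter(s => s !== ""))); // ["home","products","product","details","some"]
```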
Answer:
You can try a pattern like \/([^\/_]+){1,} (assuming that the path starts with '/' and the components are separated by '/'); depending on the language, you might get an array or an iterator that gives you the components (capture group 1 of each match).
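In JavaScript the components of \/([^\/_]+){1,} can be read from capture group 1 via matchAll. Because the pattern needs a '/' in front of every component, this sketch (with a hypothetical `components` helper) prepends one when the path does not already start with it, which covers the question's input:

```javascript
// Hypothetical helper built on the pattern above; {1,} is left out because
// [^\/_]+ already consumes the whole run, so repeating the group changes nothing.
function components(path) {
  // Ensure every component, including the first, is preceded by "/".
  const normalized = path.startsWith("/") ? path : "/" + path;
  return Array.from(normalized.matchAll(/\/([^\/_]+)/g), m => m[1]);
}

console.log(JSON.stringify(components("home/products/product_name_1/details/some_options")));
// ["home","products","product","details","some"]
```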
Answer:
Try ^[[:alpha:]]+|(?<=\/)[[:alpha:]]+, or ^[a-zA-Z]+|(?<=\/)[a-zA-Z]+ if [[:alpha:]] is not supported. It matches one or more letters at the beginning of the string or after a slash, up to the first non-letter character.
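JavaScript regexes do not support POSIX classes such as [[:alpha:]], so there the [a-zA-Z] variant applies (the lookbehind needs an ES2018+ engine). A sketch; note that this pattern keeps letters only, so a segment that starts with a digit, such as 1_cities, yields no match at all:

```javascript
// Letters at the start of the string, or letters directly after a "/".
const lettersAfterSlash = /^[a-zA-Z]+|(?<=\/)[a-zA-Z]+/g;

console.log(JSON.stringify("home/products/product_name_1/details/some_options".match(lettersAfterSlash)));
// ["home","products","product","details","some"]

// Segments starting with a digit are skipped entirely by this letters-only pattern.
console.log(JSON.stringify("public/places/1_cities/districts/1234_something".match(lettersAfterSlash)));
// ["public","places","districts"]
```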
