The goal is to use jq to assign portions of hrefFull to hrefSimple and hrefSubsite within a JSON file. There may be better ways to achieve this, but my approach has been to look for a solution that removes everything up to the string articles in a key's value while preserving that string. Multiple objects like the example objects below are contained in a single JSON file, which starts with [ and ends with ].
Desired results:
- hrefFull does not change. Strings extracted from hrefFull are applied to hrefSimple and hrefSubsite.
- hrefSimple is everything after and including articles. If articles is not in the string, hrefSimple is the string after the final /. See example object 7.
- hrefSubsite is the string between https://docs.mysite.com/ and /articles....
Example results - object 1:
{
"hrefFull": "https://docs.mysite.com/product-a/articles/page-a.html",
"hrefSimple": "articles/page-a.html",
"hrefSubsite": "product-a"
}
Example results - object 2:
{
"hrefFull": "https://docs.mysite.com/product-b/articles/guide-b/page-b.html",
"hrefSimple": "articles/guide-b/page-b.html",
"hrefSubsite": "product-b"
}
Example results - object 3:
{
"hrefFull": "https://docs.mysite.com/product-c/articles/guide-c/section-c/page-c.html",
"hrefSimple": "articles/guide-c/section-c/page-c.html",
"hrefSubsite": "product-c"
}
Example results - object 4:
{
"hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html",
"hrefSimple": "articles/page-d.html",
"hrefSubsite": "product-d/sub-product-d"
}
Example results - object 5:
{
"hrefFull": "https://docs.mysite.com/product-e/sub-product-e/articles/guide-e/page-e.html",
"hrefSimple": "articles/guide-e/page-e.html",
"hrefSubsite": "product-e/sub-product-e"
}
Example results - object 6:
{
"hrefFull": "https://docs.mysite.com/product-f/sub-product-f/articles/guide-f/section-f/page-f.html",
"hrefSimple": "articles/guide-f/section-f/page-f.html",
"hrefSubsite": "product-f/sub-product-f"
}
Example results - object 7:
{
"hrefFull": "https://docs.mysite.com/product-g/index.html",
"hrefSimple": "index.html",
"hrefSubsite": "product-g"
}
Failed attempt (in a Bash script):
siteUrl="docs.mysite.com"
jq --arg siteUrl "$siteUrl" '
(.hrefSimple = .hrefFull)
| .hrefSimple |= (gsub("https://\($siteUrl)/.*?/"; ""))
| (.hrefSubsite = .hrefFull)
| .hrefSubsite |= (gsub("https://\($siteUrl)/"; ""))
' file-1.json > file-2.json
The script produces both accurate and inaccurate results.
Accurate results:
- Object 1
- Object 2
- Object 3
- Object 7
Inaccurate results:
- Object 4:
  - hrefSimple is incorrectly sub-product-d/articles/page-d.html instead of articles/page-d.html
  - hrefSubsite is incorrectly sub-product-d instead of product-d/sub-product-d
- Object 5:
  - hrefSimple is incorrectly sub-product-e/articles/guide-e/page-e.html instead of articles/guide-e/page-e.html
  - hrefSubsite is incorrectly sub-product-e instead of product-e/sub-product-e
- Object 6:
  - hrefSimple is incorrectly sub-product-f/articles/guide-f/section-f/page-f.html instead of articles/guide-f/section-f/page-f.html
  - hrefSubsite is incorrectly sub-product-f instead of product-f/sub-product-f
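The hrefSimple result for object 4 can be reproduced in isolation, which may help show where the attempt goes wrong. The sketch below runs only the first gsub from the attempt against a single URL passed in with --arg (the variable name url is just for illustration). The lazy .*?/ stops at the first slash after the host, so only the first path segment is removed:

```shell
url="https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html"

# -n: no input file; the URL and host are passed in as jq variables.
# The lazy ".*?/" consumes only "product-d/", leaving the second segment.
result=$(jq -rn --arg url "$url" --arg siteUrl "docs.mysite.com" '
  $url | gsub("https://\($siteUrl)/.*?/"; "")
')

printf '%s\n' "$result"
# prints: sub-product-d/articles/page-d.html
```

Because the pattern has no notion of where articles begins, any subsite with more than one path segment leaves the extra segments in hrefSimple.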
Other unsuccessful attempts (I can provide exact results if that's helpful):
- Various iterations of articles in forms of .hrefSimple |= (gsub("https://\($siteUrl)/.*?/"; "")) and .hrefSubsite |= (gsub("https://\($siteUrl)/"; ""))
- Various iterations of .hrefSimple |= split("articles")[0] (also within .hrefSubsite)
For context, if it matters, hrefFull comes from an Azure App Insights export of page views for a documentation website. The exported data is used in an analytics report. I am creating hrefSimple to join two tables and would like to filter on hrefSubsite. The paths in hrefFull are produced when generating a website using the DocFx static site generator and deploying to an Azure Blob.
CodePudding user response:
I'd use capture with a regex:
. + (.hrefFull | capture(
"^https://docs.mysite.com/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
))
{
"hrefFull": "https://docs.mysite.com/product-a/articles/page-a.html",
"hrefSubsite": "product-a",
"hrefSimple": "articles/page-a.html"
}
{
"hrefFull": "https://docs.mysite.com/product-b/articles/guide-b/page-b.html",
"hrefSubsite": "product-b",
"hrefSimple": "articles/guide-b/page-b.html"
}
{
"hrefFull": "https://docs.mysite.com/product-c/articles/guide-c/section-c/page-c.html",
"hrefSubsite": "product-c",
"hrefSimple": "articles/guide-c/section-c/page-c.html"
}
{
"hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html",
"hrefSubsite": "product-d/sub-product-d",
"hrefSimple": "articles/page-d.html"
}
{
"hrefFull": "https://docs.mysite.com/product-e/sub-product-e/articles/guide-e/page-e.html",
"hrefSubsite": "product-e/sub-product-e",
"hrefSimple": "articles/guide-e/page-e.html"
}
{
"hrefFull": "https://docs.mysite.com/product-f/sub-product-f/articles/guide-f/section-f/page-f.html",
"hrefSubsite": "product-f/sub-product-f",
"hrefSimple": "articles/guide-f/section-f/page-f.html"
}
{
"hrefFull": "https://docs.mysite.com/product-g/index.html",
"hrefSubsite": "product-g",
"hrefSimple": "index.html"
}
If your input objects live in an array, wrap this filter into a map(…).
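As a rough sketch of how that could look inside the Bash script from the question, assuming the host is passed to jq with --arg rather than hardcoded in the regex. The input variable and its two sample objects are stand-ins for file-1.json:

```shell
siteUrl="docs.mysite.com"

# Hypothetical sample input: two of the question's objects inside an array.
input='[
  {"hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html"},
  {"hrefFull": "https://docs.mysite.com/product-g/index.html"}
]'

# map(...) applies the filter to each array element; --arg exposes the
# shell variable to jq as $siteUrl for string interpolation.
result=$(printf '%s' "$input" | jq --arg siteUrl "$siteUrl" '
  map(. + (.hrefFull | capture(
    "^https://\($siteUrl)/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
  )))
')

printf '%s\n' "$result"
```

With real files, the same jq command would read file-1.json and redirect to file-2.json as in the question. One thing to watch for: capture emits nothing when the regex does not match, so an object whose hrefFull has a different shape would be dropped from the array. Writing the merged part as ((.hrefFull | capture(…)) // {}) is one way to keep such objects unchanged.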
