I have a MongoDB database named baike. It contains a collection named baike_items whose documents have the following fields:
id
title
baike_id
page_url
text
All fields are fine except page_url. Some of the URLs are normal, like:
'https://baike.baidu.hk/item/契丹族/2390374'
But some URLs end with the string #viewPageContent, like:
https://baike.baidu.hk/item/ 