I have a collection in MongoDB that looks something like:
{
"foo": "something",
"tag": 0,
},
{
"foo": "bar",
"tag": 1,
},
{
"foo": "hello",
"tag": 0,
},
{
"foo": "world",
"tag": 3,
}
If we consider this example, there are entries in the collection with tag of value 0, 1 or 3 and these aren't unique values, tag value can be repeated. My goal is to find that 2 is missing. Is there a way to do this with a query?
CodePudding user response:
Query1
- in the upcoming mongodb 5.2 we will have sort on arrays that could do this query easier without set operation but this will be ok also
- group and find the min,max and all the values
- take the
range(max-min) - the missing are
(setDifference range_above tags) - and from them you take only the smallest => 2
aggregate(
[{"$group":
{"_id":null,
"min":{"$min":"$tag"},
"max":{"$max":"$tag"},
"tags":{"$addToSet":"$tag"}}},
{"$project":
{"_id":0,
"missing":
{"$min":
{"$setDifference":
[{"$range":[0, {"$subtract":["$max", "$min"]}]}, "$tags"]}}}}])
Query2
- in Mongodb 5 (the current version) we can use also $setWindowFields
- sort by tag, add the dense-rank(same values=same rank), and the min
- then find the
difference of tag-min - and then filter those that this
difference < rank - and find the max of them (max of the tag that are ok)
- increase 1 to find the one missing
*test it before using it to be sure, i tested it 3-4 times seemed ok, for big collection if you have many different tags, this is better i think. (the above addtoset can cause memory problems)
aggregate(
[{"$setWindowFields":
{"output":{"rank":{"$denseRank":{}}, "min":{"$first":"$tag"}},
"sortBy":{"tag":1}}},
{"$set":{"difference":{"$subtract":["$tag", "$min"]}}},
{"$match":{"$expr":{"$lt":["$difference", "$rank"]}}},
{"$group":{"_id":null, "last":{"$max":"$tag"}}},
{"$project":{"_id":0, "missing":{"$add":["$last", 1]}}}])
