How can I scramble an array in linear time without having duplicate elements next to one another?-CodePudding

For example:

T = [b, c, b, b, a, c] #each element represents questions related to topics a, b, c

What I want is make sure that no 2 questions from same topic are next to one another. (see the T where b, b are next to eachother)

So I want to rearrange the T in such a that no two questions belonging to same topic are next to each other, i.e. Tnew = [b, c, b, a, b, c]

But condition is that we have to do it linear time i.e. O(n) or Big-O of (n)

The algorithm that I thought of:

1)Create a dict or map to hold the occurrence of each topics:
a --> 1
b --> 3
c --> 2

2) Now based on the counts we can create new array such that:
A = [a, b, b, b, c, c]

3) Now perform unsorting of Array which I believe runs in O(n).
(unsorting is basically find midpoint and then merge the elements alternately from each half.

Can someone please help me design a pseudocode or algorithm that can do this better on any inputs with k number of topics?

This is a random question that I am practing for exam.

CodePudding user response：

There is a linearithmic approach of time complexity O(log c * n) where c is the number of unique items and n is the total number of items.

It works as follows:

Create a frequency table of the items. This is just a list of tuples that tells how many of each item are available. Let's store the tuples in the list as (quantity, item). This Step is O(n). You could use collections.Counter in python, or collections.defaultdict(int) or a vanilla dict.
Heapify the list of tuples in a max heap. This can be done in O(n). This heap has the items with the largest number of quantity at the front. You could use heapq module in python. Let's call this heap hp.
Have a list for the results called res.
Now run a loop while len(hp) > 0: and do as follows:

pop the 2 largest elements from the heap. O(log c) operation.
add one from each to res. Make sure you handle edge cases properly, if any.
decrement the quantity of both items. If their quantity > 0 push them back on the heap. O(log c) operation.

At the end, you could end with one item that has no peers for inter weaving them. This can happen if the quantity of one item is larger than the sum of the quantities of all the other items. But there's no way around this. Your input data must respect this condition.

One final note about time complexity: If the number of unique items is constant, we could drop the log c factor from the time complexity and consider it linear. This is mainly a case of how we define things.

CodePudding user response：

Here's the O(n) solution (inspired in part by @user1984's answer):

Imagine you know how many of each element to insert, and have ordered these counts. Say we then decided to build up a solution by interleaving groups of elements incrementally. We start off with just our group of elements G0 with lowest frequency. Then, we take the next most popular group G1, and interleave these values into our existing list.

If we were to continue in this fashion, we could observe a few rules:

if the current group G has more elements than all over smaller groups combined plus one, then:
- the result will have elements of G neighboring each other
- regardless of its prior state, no elements from the smaller groups will neighbor each other (inter-group nor intra-group)
otherwise
- the result will have no elements of G neighboring each other
regardless, G contains enough elements to separate individual elements of the next smallest group G-1, if positioned wisely.

With this in mind, we can see (recursively) that as long as we shift our interleaving to overlap any outstanding neighbor violations, we can guarantee that as long as G has fewer elements than the smaller groups combined plus two, the overall result absolutely will not have any neighbor violations.

Of course, the logic I outlined above for computing this poses some performance issues, so we're going to compute the same result in a slightly different, but equivalent, way. We'll instead insert the largest group first, and work our way smaller.

First, create a frequency table. You need to form a item: quantity mapping, so something like collections.Counter() in python. This takes O(N) time.

Next, order this mapping by frequency. This can be done in O(N) time using counting sort, since all elements are integers. Note there's c elements, but c <= N, so O(c) is O(N).

After that, build a linked list of length N, with node values from [0, N) (ascending). This will help us track which indices to write into next.

For each item in our ordered mapping, iterate from 0 to the associated count (exclusive). Each iteration, remove the current element from linked list ((re)starting at the head), and traverse two nodes forward in the linked list. Insert the item/number into the destination array at the index of each removed node. This will take O(N) time since we traverse ~2k nodes per item (where k in the group size), and the combined size of all groups is N, so 2N traversals. Each removal can be performed in O(1) time, so O(N) for traversing and removing.

So all in all, this will take O(N) time, utilizing linked lists, hash tables (or O(1) access mappings of some sort), and counting sort.

CodePudding user response：

The collections module can help. Using Counter gets you a number of occurrence of each question in O(n) time. Converting those into iterators in a deque will allow you to interleave the questions sequentially but you need to process them in decreasing order of occurrences. Getting the counters in order of frequency would normally require a sort, which is O(nLogn), but you can use a pigeon hole approach to group the iterators by common frequency in O(n) time then go through the groups in reverse order of frequency. The maximum number of distinct frequencies is knownable and will be less or equal to √(0.25 2n)-0.5 which is less than O(n).

There will be at most n iterators so building the deque will be <= O(n). going through the iterators until exhaustion will take at most 2n iterations:

T= ['b', 'c', 'b', 'b', 'a', 'c']

from collections import Counter,deque
from itertools import repeat
    
result = []
counts = Counter(T)                             # count occurrences O(n)
common = [[] for _ in range(max(counts.values()))]    # frequency groups
for t,n in counts.items():                            
    common[n-1].append(repeat(t,n))                   # iterators by freq.
Q = deque(iq for cq in reversed(common) for iq in cq) # queue iterators
while Q:                                        # 2n iterations or less
    iq = Q.popleft()          # O(1) - extract first iterator
    q = next(iq,None)         # O(1) - next question
    if not q: continue        #      - exhausted, removed from deque
    result.append(q)          # O(1) - add question to result
    Q.insert(1,iq)            # O(1) - put back iterator as 2nd

print(result)
['b', 'c', 'b', 'c', 'b', 'a']