Let's suppose that array B is made from array A by concatenating it with itself n times (example: A=[1,2,3], n=3, B=[1,2,3,1,2,3,1,2,3]) What is an efficient algorithm to find A by given B (we don't know n)? UPD we search for smallest A (when B=[1,2,1,2,1,2,1,2], A = [1,2], not [1,2,1,2])
CodePudding user response:
Assuming that [1,2,1,2,1,2,1,2] means n is 4 and not 2, for example. In other words, you're assuming the smallest such sublist, A. Otherwise, there could be multiple solutions.
Enumerate the unique integer divisors of the length of B (including 1). These would be the only valid candidates for
n.For each divisor, starting with the smallest, consider it as a candidate value for
n:a. Take the first
len(B)/nelements of B and start checking whether it is a sublist that repeats through B (I'll assume you can figure out an efficient way of doing this. I can add a suggestion if you need it.)b. If
nworks (you get to the end of B and everything matched), then you're done, otherwise, try the next divisor
CodePudding user response:
You could basically find the longest prefix in B that is also a suffix. You can derive the table from the steps involved in KMP pattern matching algorithm.
Note that there could be multiple correct solutions.(say 1,2,1,2,1,2,1,2 could have A as 1,2,1,2 or 1,2.
Once found, you will need to rerun the match against the slices of B just to make sure the whole array B matches the repeating pattern. This is necessary since there could be edge cases such as 1,2,1,2,3,4,1,2,1,2 which has 1,2,1,2 as the longest prefix that is also a suffix but it isn't a continuous repetition of A.
If the obtained length doesn't divide the length of B evenly, you will need to decrease the length evenly(as in factor wise) each time to see which one matches. (Example case: 1,2,1,2,1,2).
