Let $n$ be a positive integer, and let $W = \ldots x_{-1}x_0x_1x_2 \ldots$ be an infinite periodic word, consisting of just letters $a$ and/or $b$. Suppose that the minimal period $N$ of $W$ is greater than $2^n$. A finite nonempty word $U$ is said to appear in $W$ if there exist indices $k \leq \ell$ such that $U=x_k x_{k+1} \ldots x_{\ell}$. A finite word $U$ is called ubiquitous if the four words $Ua$, $Ub$, $aU$, and $bU$ all appear in $W$. Prove that there are at least $n$ ubiquitous finite nonempty words. Proposed by Grigory Chelnokov, Russia
Problem
Source: IMO Shortlist 2011, Combinatorics 6
Tags: combinatorics, Combinatorics of words, IMO Shortlist
12.07.2012 10:00
Solution:
22.07.2014 19:31
Incorrect solution, please ignore.
02.07.2015 05:19
Here is my solution! (good=ubiquitous). First notice that no two words of length $\ge N$ at a distance not multiple of $N$ can be equal. This is easy. For two equal blocks of letters that aren't at a distance that is a multiple of $N$, consider the following operation. Look to the left of these blocks. If the letters are equal, incorporate them into the blocks and continue. The process will eventually stop by the first observation. Now repeat the process to the right. We are left with two equal blocks $U$ of length at least the original length. To the left of one block is an $a$ and to the left of another block is a $b$. To the right, same thing. So $U$ is good. To this process we will call "expanding the blocks" (the blocks I talked about at the beginning of this paragraph). Let $ U_1, U_2, ... U_{\ell} $ denote the good words (with lengths $ M_1 \ge M_2 \ge ... M_{\ell} $). LEMMA: $U_k$ occurs at most $2^k$ times in a period. PROOF: By induction. $U_1$ occurs at most $2$ times, otherwise, by Pigeonhole, we could expand two of the three $U_1$'s and create a longer good word. Now, assume it works for $1,...,k$ and I prove it works for $k+1$. Otherwise, we have $2^{k+1}+1$ occurrences of $U_{k+1}$. At least half have the same letter to the left, by Pigeonhole. So we have $2^k+1$ words of $U_{k+1}$ like this, and we can expand all these block until we obtain a new good word, and by the induction it occurs at most $2^k$ times, contradiction since the new word measures more then $U_{k+1}$, so it is $U_k$ or $U_i$ with $i<k$. Now consider a period and consider the $N$ blocks of $n$ letters formed. Since $N>2^n$, two of those are the same, by Pigeonhole. If we expand them we get a good word of length $ \ge n$. Thus $M_1 \ge n$. I prove by induction that $M_i \ge n+1-i$. This was the base case. Now assume it works for $1,..,k-1$, I prove it works for $k$. Now consider the $N$ blocks of $n+1-k$ letters formed. By Pigeonhole, $2^{k-1}+1$ are the same. Assume this block $B$ is not a good word. Then there do not exist two occurrences of this block that are of the form $aBa,bBb$ nor $aBb, bBa$. This easily implies that they all have the same letter to the right or the same letter to the right. Expand $B$ like this and we get a new $B'$ of lenght at least $n+1-k+1$. If this $B'$ is not good, we repeat the process, until eventually it is good (this moment will come, by the first observation). Since it occurs $2^{k-1}+1$ different times, we have that it cannot be $U_1,...,U_{k-1}$. And so this implies $U_k$ exists and that $M_k \ge n+1-k$. So $U_1,...,U_n$ exist, so we are done! Notice this also proof that there's at least $m$ good words of length at least $n+1-m$, for $m \le n$.
22.11.2016 21:02
My proof : Consider minimal period $N$ and period word $a_0a_1\ldots a_{N-1}$, so we have that $N>2^n$. Let $I_1=\{0\leq i\leq N-1| a_i = a\}$. Without loss of generality we can assume that $|I_1| = |\{0\leq i\leq N-1| a_i = a\}|\geq |\{0\leq i\leq N-1| a_i = b\}|$. So we have that $|I_1|> 2^{n-1}$. Next we will construct iteration process. First consider maximal continuation of letters $\{a_i\}_{i\in I_1}$ in left and right sides, such that resulting words $U_i^{(1)}, i\in I_1$ are same ($|U_i^{(1)}|< \infty$ because $|a_i-a_j|<N, \forall i\not= j\in I_1$ and $N$ is period). Easy to see that word $U^{(1)}:= U_i^{(1)}$ is ubiquitous. From pigeonhole principle we get that there exists at least $|I_1|/2$ words $U_i^{(1)}$ where $i\in I_1$, such that one neighbor right letter for any such word are same. Name these words as $U_i^{(1)'}$ and set of their indexes as $I_2\subseteq I_1$. So we know that $|I_2|>|I_1|/2>2^{n-2}$. Next like the same consider maximal continuations of words $U_i^{(1)'}$ in left and right sides, such that resulting words $U_i^{(2)}, i\in I_2$ are same. So we get that word $U^{(2)}:= U_i^{(2)}$ is ubiquitous. Iterating this process we will get sequences $\ldots I_3\subseteq I_2\subseteq I_1$ and words $\{U_i^{(k)}\}_{i\in I_k}$, where $2^{n-k}<|I_k|$ and words $U^{(k)}:=U_i^{(k)}$ are all ubiquitous. So last ubiquitous word which we can get by such method is $U^{(n)}$. And so we have algorithm to construct $n$ ubiquitous words $U^{(1)}, U^{(2)}, \ldots, U^{(n)}$. $\Box$
23.11.2016 00:50
For example consider infinite word $\ldots ababbaabaaabbbbaa .ababbaabaaabbbbaa.ababbaabaaabbbbaa\ldots$ With period $ababbaabaaabbbbaa$ of size $2^4+1$. Then algorithm acts as follows : $\ldots (a)b(a)bb(a)(a)b(a)(a)(a)bbbb(a)(a).ababbaabaaabbbbaa.ababbaabaaabbbbaa\ldots$ Word $a$ is ubiquitous. $\ldots ababb(aa)b[ a(a]a)bbbb[a(a].a)babbaabaaabbbbaa.ababbaabaaabbbbaa\ldots$ Word $aa$ is ubiquitous. $\ldots ababb(aab)a(aab)bbba(a.ab)abbaabaaabbbbaa.ababbaabaaabbbbaa\ldots$ Word $aab$ is ubiquitous. $\ldots ababb(aaba)aabbbba(a.aba)bbaabaaabbbbaa.ababbaabaaabbbbaa\ldots$ Word $aaba$ is ubiquitous. Finish.
21.10.2018 15:58
Can someone explain solver6's algorithm? Thanks
17.02.2020 00:05
WLOG $W$ has more $a$s than $b$s. Consider the following sequence of strings: $s_0=a$, and $s_i$ is the string which appears most often every $N$ characters out of the four strings formed by $s_{i-1}$ with one letter annexed at the beginning or end (if two strings satisfy this criterion, pick one arbitrarily.) We terminate this sequence once $s_i$ only appears only once every $N$ characters in $W$, and suppose this last term is $s_k$. $s_k$ has to exist as if we have two identical sufficiently large strings separated by less than $N$ characters, then we can find a period less than $N$. I claim that there exist $n$ ubiquitous words among the $\{s_i\}$. For convenience, denote the number of times $s_i$ appears every $N$ characters be $n_i$. Note that $n_0>2^{n-1}$ and $n_k=1$. Claim 1: If $s_i$ is not ubiquitous, then $n_i=n_{i+1}$. Proof: Consider one appearance of $s_i$, and suppose the characters directly to the left and right of it are $l$ and $r$. Then, I claim that all appearances of $s_i$ have $l$ on the left, or all appearances have $r$ on the right. If not, then we can find an appearance which does not have $l$ on the left, and another which does not have $r$ on the right. However, this means that $as_i$, $bs_i$, $s_ia$, $s_ib$ all appear, and $s_i$ is ubiquitous, which is a contradiction. Now, WLOG $s_i$s have $r$ on the right, then we can set $s_{i+1}=s_ir$, and we get $n_i=n_{i+1}$ as desired. Claim 2: If $s_i$ is ubiquitous, then $n_{i+1}\ge n_i/2$ Proof: Consider the characters to the right of every appearence of $s_i$. If there are more $a$s than $b$s, we set $s_{i+1}=s_ia$. Otherwise, we set $s_{i+1}=s_ib$. As $s_{i+1}$ can be constructed from the majority of the $s_i$s, we have $n_{i+1}\ge n_i/2$ as desired. Combining these two claims, note that in $\{s_i\}$, we must have at least $\left\lceil\log_2\left(\frac{n_0}{n_k}\right)\right\rceil\ge n$ ubiquitous words, as desired.
10.04.2021 08:38
Here is one important observation. Let $T$ be the period of $W$. Suppose $X_iX_{i+1}\cdots X_{i+d} = X_jX_{j+1}\cdots X_{j+d}$, and this word is NOT ubitquitous. Say it always has a prefix a, then we add $a$ to it, and consider that string instead. Eventually, we can form a ubiquitous string. Now, the $2^n$ suggests a divide and conquer algorithm, so here is a solution. WLOG, there are more than $2^{n-1}$ a's in the period. Call this number $A$. Obviously $a$ is an ubiquitous string. We consider this algorithm to construct $c_i$ from $c_{i-1}$ where $c_i$ are strings with $c_1=1$. Append a prefix to our current string. Case 1: Preceding all occurences of $c_i$, there is all a (or all b). Then we simply add this letter to the left of our current string $c_i$. Case 2: Preceding occurences of $c_i$, some start with $a$, and some start with $b$. Take the one with more letters. By pigeonhole, there are more than $2^{n-i}$ of them. If suffixes turn out to be a problem, i.e. they all end with the same letter, I can append that letter until that is no longer a problem. If this goes on long enough, we can also conclude our string is periodic. This indeed produces $\lceil \log_2(A) \rceil$ ubiquitous substrings. Example: N=aabaabab Our string is aabaabaabab|aabaabab... First of all, a work. Let $c_1=a$. There are 3 occurences of $ba$ and 2 occurrences of $aa$ (we can see $X_1=X_2=X_4=X_5=X_7=a, X_3=X_6=X_8=b$ here, $X_{8+i}=X_i$, and this counts $1\le i\le 8$ as our leftmost character of the string), so we take ba. There are 3 occurrences of $aba$. Do nothing There are 2 occurences of $aaba$ and one occurence of $baba$, so $aba$ is ubiquitous. If there happens to all stop in case 1 before there is one occurence of something, then $X_i=X_j \rightarrow X_{i-1}=X_{j-1} \rightarrow \cdots$, we can easily see that there is a smaller period.
30.01.2022 18:56
08.04.2022 18:09
22.06.2022 15:42
If we have a word $w$ that works and occurs more than $k$ times in any minimal period, we can have a word $w'$ that works and occurs more than $\frac{k}{2}$ times in any minimal period, for any $k > 2.$ By Pigeonhole more than $\frac{k}{2}$ of the $w$-words are preceded by a common letter, say an $a.$ Append this to $w.$ Extend this new word in both directions until we find differing characters in the preceding and following positions. This exists since there occur multiple instances of our word within each period, but there is not a smaller period. One of $a$ or $b$ occurs more than $2^{n-1}$ times in each minimal period and works, so we're done.
27.06.2022 14:07
Let $x_1x_2 \cdots x_p$ be the repeating part of the word $W$, with $p > 2^n$. Define the value of a word $X = w_1w_2 \cdots w_k$ (denoted by $f(X)$) as the number of $i$ with $1 \le i \le p$ such that $w_i w_{i+1} \cdots w_{i+k-1}$ coincides with $X$. Note that since this is succeeded by an $a$ or a $b$ in all occurences, we have $f(Xa) + f(Xb) = f(X)$. Suppose $U$ is a ubiquitous word with $f(U) = m \geqslant 2$. Construct two binary trees from $U$, one going forward, where $X$ points to $Xa$ and $Xb$ and one going backward, with $X$ pointing backward to $aX$ and $bX$. Move forward on the tree until we have a word $X$ such that $f(Xa), f(Xb) > 0$. Similarly move backward until we have a word $Y$ such that $f(aY), f(bY) > 0$. Such a point must exist since otherwise we get a word of length $> p$ which has value at least $2$, which would contradict the fact that $p$ was its minimal period. Now, the new word $YUX$ is ubiquitous too. Note that if each time, we go along the branch with higher value, we can ensure that he new word satisfies that $f(YUX) \geqslant \frac{m}{2}$. To finish, see that one of $a,b$ whichever appears more times, must be ubiquitous. Since such a word must appear $> 2^{n-1}$ times and each time, $f(U)$ at most halves, we can ensure that by this procedure, we obtain at least $n$ ubiquitous finite nonempty words, as desired. $\blacksquare$