Sturmian words: structure, combinatorics, and their arithmetics

1997, Theoretical Computer Science

We prove some new results concerning the structure, the combinatorics and the arithmetics of the set PER of all the words $w$ having two periods $p$ and $q$, $p < q$, which are coprime and such that $|w| = p + q - 2$.

Theoretical Computer Science 183 (1997) 45-82

Aldo de Luca
Dipartimento di Matematica, Università di Roma "La Sapienza", Piazzale A. Moro 2, 00185 Roma, and Istituto di Cibernetica del C.N.R., Arco Felice, Italy

Abstract

We prove some new results concerning the structure, the combinatorics and the arithmetics of the set PER of all the words $w$ having two periods $p$ and $q$, $p < q$, which are coprime and such that $|w| = p + q - 2$. A basic theorem relating PER with the set of finite standard Sturmian words was proved in de Luca and Mignosi (1994). The main result of this paper is the following simple inductive definition of PER: the empty word belongs to PER. If $w$ is an already constructed word of PER, then also $(aw)^{(-)}$ and $(bw)^{(-)}$ belong to PER, where $(-)$ denotes the operator of palindrome left-closure, i.e. it associates to each word $u$ the smallest palindrome word $u^{(-)}$ having $u$ as a suffix. We show that, by this result, one can construct in a simple way all finite and infinite standard Sturmian words. We prove also that, up to the automorphism which interchanges the letter $a$ with the letter $b$, any element of PER can be codified by the irreducible fraction $p/q$. This allows us to construct for any $n \ge 0$ a natural bijection, that we call Farey correspondence, of the set of the Farey series of order $n+1$ and the set of special elements of length $n$ of the set St of all finite Sturmian words. Finally, we introduce the concepts of Farey tree and Farey monoid. This latter is obtained by defining a suitable product operation on the developments in continued fractions of the set of all irreducible fractions $p/q$.
Sturmian words have several applications in various fields such as Physics, Algebra and Computer Science. For this reason there exists a large literature on this subject (cf. [3]). The most famous Sturmian word is the Fibonacci word $f$, which is the limit of the sequence of words $\{f_n\}_{n \ge 0}$ inductively defined as:

$$f_0 = b, \quad f_1 = a, \quad f_{n+1} = f_n f_{n-1} \ \text{ for all } n > 0.$$

The words $f_n$ of this sequence are called the finite Fibonacci words. The name Fibonacci is due to the fact that for each $n$, $|f_n|$ is equal to the $(n+1)$th term of the Fibonacci numerical sequence: $1, 1, 2, 3, 5, 8, \ldots$.

There exist several different, but equivalent, definitions of Sturmian words. Some are of a 'combinatorial' nature and others of a 'geometrical' nature. For instance, a Sturmian word can be defined by considering the sequence of the intersections with a squared lattice of a semi-line having a slope which is an irrational number. A horizontal intersection is denoted by the letter $b$, a vertical intersection by $a$ and an intersection with a corner by $ab$ or $ba$. From this point of view the Fibonacci word is obtained by considering a semi-line starting from the origin and having a slope equal to $g - 1$, where $g = \frac{1}{2}(1 + \sqrt{5})$ is the golden number. Sturmian words represented by a semi-line starting from the origin are usually called standard. They are of great interest from the language point of view since one can prove that the set of all finite subwords of a Sturmian word depends only on the slope of the corresponding semi-line. A finite subword of any Sturmian word is called a finite Sturmian word.
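The recursive definition of the finite Fibonacci words given above can be sketched in a few lines of Python. This is only an illustration: the function name `fibonacci_words` is a hypothetical helper, not notation from the paper.

```python
def fibonacci_words(count):
    """Return [f_0, ..., f_{count-1}] with f_0 = 'b', f_1 = 'a', f_{n+1} = f_n f_{n-1}."""
    words = ["b", "a"]
    while len(words) < count:
        # Each new word is the previous one followed by the one before it.
        words.append(words[-1] + words[-2])
    return words[:count]


words = fibonacci_words(7)
lengths = [len(w) for w in words]  # lengths follow the Fibonacci numerical sequence
```

Each word in the list is a prefix of the next one from $f_1$ onwards, which is why the sequence has a limit.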
We shall denote by St the set of all finite Sturmian words.

Standard Sturmian words can be defined in the following way, which is a natural generalization of the definition of the Fibonacci words. Let $q_0, q_1, \ldots, q_n, \ldots$ be any sequence of natural integers such that $q_0 \ge 0$ and $q_i > 0$ for $i > 0$. We define, inductively, the sequence of words $\{s_n\}_{n \ge 0}$, where

$$s_0 = b, \quad s_1 = a, \quad s_{n+1} = s_n^{q_{n-1}} s_{n-1}, \quad n \ge 1.$$

The sequence $\{s_n\}_{n \ge 0}$ has a limit $s$ which is a standard Sturmian word. Any standard Sturmian word is obtained in this way. The set of all the words $s_n$, $n \ge 0$, of any standard sequence $\{s_n\}_{n \ge 0}$ constitutes a language Stand which has remarkable and surprising properties.

In a previous paper [7] we proved a basic theorem (cf. Theorem 2) which gives three different characterizations of Stand. The first, which generalizes a property of Fibonacci words, is based on palindrome words. More precisely, we proved that $s \in Stand$ if and only if $s \in \{a, b\}$ or $s = AB = Cxy$, where $A, B, C$ are palindromes and $x, y \in \{a, b\}$, $x \ne y$. The second is based on the periodicities of words. Let PER be the set of all words $w$ having two periods $p$ and $q$ which are coprime and such that $|w| = p + q - 2$. We proved that $Stand = \{a, b\} \cup PER\{ab, ba\}$. Finally, the third characterization is of a 'syntactical' nature. A word $s$ belongs to PER if and only if $asa, asb, bsa, bsb \in St$. A word $w \in St$ with this property is also called a strictly bispecial element. This theorem has several applications. In particular, one can determine the subword complexity of Stand and derive, in a simple and purely combinatorial way, the subword complexity formula for St (cf. [7]).

The above results show that the 'kernel' of the standard Sturmian words is the set PER. In this paper we present some new results concerning the structure, the combinatorics and the arithmetics of PER. These results can be extended to Stand.
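As an illustration of the standard sequence and of the relation $Stand = \{a, b\} \cup PER\{ab, ba\}$ recalled above, the following sketch builds the words $s_n$ from a finite directive sequence and tests membership in PER by brute force. The helper names (`standard_sequence`, `is_period`, `in_per`) are assumptions made for this example; the convention that every integer $p \ge |w|$ is a period is the one used in the paper.

```python
from math import gcd


def standard_sequence(directive):
    """s_0 = 'b', s_1 = 'a', s_{n+1} = s_n^{q_{n-1}} s_{n-1} for a finite list (q_0, q_1, ...)."""
    words = ["b", "a"]
    for q in directive:
        words.append(words[-1] * q + words[-2])
    return words


def is_period(w, p):
    """p is a period of w if w[i] == w[i+p] whenever both positions exist (any p >= |w| qualifies)."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def in_per(w):
    """True if w has two coprime periods p, q with |w| = p + q - 2 (brute-force search)."""
    n = len(w)
    return any(
        gcd(p, n + 2 - p) == 1 and is_period(w, p) and is_period(w, n + 2 - p)
        for p in range(1, n + 2)
    )


def is_letter_or_per_extension(s):
    """Check the shape {a, b} or PER{ab, ba} for a single word s."""
    return s in ("a", "b") or (s[-2:] in ("ab", "ba") and in_per(s[:-2]))


fib = standard_sequence([1, 1, 1, 1])
other = standard_sequence([2, 1, 3])
```

With the directive sequence $(1, 1, 1, \ldots)$ one recovers the finite Fibonacci words, and every word produced by either directive sequence has the expected shape.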
Moreover, they are relevant for all finite Sturmian words since St is equal to the set of all subwords of PER.

In Section 5 we prove that a word $w$ belongs to PER if and only if either it is a power of a single letter or it can be, uniquely, represented as $w = PxyQ = QyxP$, with $P, Q$ palindromes, $|P| < |Q|$ and $x, y \in \{a, b\}$, $x \ne y$. Moreover, $p = |P| + 2$ and $q = |Q| + 2$ are periods of $w$ such that $\gcd(p, q) = 1$, $|w| = p + q - 2$, $p$ is the minimal period of $w$ and $Q$ the maximal proper palindrome suffix of $w$. From this we are able to obtain a new characterization of the set PER which allows us to construct it in a very simple way. The construction makes use of the operator $(-)$ of palindrome left-closure which associates to any word $w$ the word $w^{(-)}$ defined as the smallest palindrome word having $w$ as a suffix. The set PER has the following closure property: if $w \in PER$, then the words $(aw)^{(-)}$ and $(bw)^{(-)}$ belong to PER. Moreover, PER is the smallest subset of $\mathcal{A}^*$, $\mathcal{A} = \{a, b\}$, containing the empty word $\varepsilon$ and having the above closure property. If we define, recursively, the sequence of sets $\{X_n\}_{n \ge 0}$, where $X_0 = \{\varepsilon\}$ and $X_{n+1} = (\mathcal{A} X_n)^{(-)}$, $n \ge 0$, then $PER = \bigcup_{n \ge 0} X_n$. For each $n > 0$, $X_n$ is a biprefix code having $2^n$ elements.

Let $w = a^{h_1} b^{h_2} a^{h_3} \cdots$ be a finite or infinite word such that the exponents $h_i$, $i > 0$, are natural integers and $h_i > 0$ for $i > 1$. One can associate with $w$ a finite or infinite sequence $\{s_n\}_{n \ge 0}$ of elements of PER having $s_0 = \varepsilon$ and, for each $n \ge 0$, $s_{n+1} = (w_n s_n)^{(-)}$, where $w_n$ is the $n$th letter of $w$. We prove that if the sequence $\{s_n\}_{n \ge 0}$ is infinite, then it converges to a standard Sturmian word. Moreover, the above correspondence is a bijection.

In Section 6 we are concerned with some results of a more arithmetical nature. The starting point is the existence of a natural bijection, up to the automorphism of $\mathcal{A}^*$ which interchanges the letter $a$ with the letter $b$, of PER and the set $\mathcal{F}$ of all irreducible fractions $p/q$, with $p \le q$.
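The palindrome left-closure and the sets $X_n$ described above lend themselves to a direct computation. The sketch below is a minimal illustration under the assumption that the empty word counts as a palindrome; the names `left_closure`, `per_level` and `closure_sequence` are hypothetical helpers introduced for this example.

```python
def left_closure(w):
    """Smallest palindrome having w as a suffix: write w = Q u with Q the longest
    palindromic prefix of w, then the closure is reversed(u) + Q + u."""
    for k in range(len(w), -1, -1):
        prefix = w[:k]
        if prefix == prefix[::-1]:
            return w[k:][::-1] + w


def per_level(n):
    """X_0 = {empty word}, X_{n+1} = left closures of a*w and b*w for w in X_n."""
    level = {""}
    for _ in range(n):
        level = {left_closure(x + w) for w in level for x in "ab"}
    return level


def closure_sequence(letters):
    """s_0 = empty word, s_{k+1} = left_closure(letter_k + s_k) for the successive letters given."""
    s = ""
    sequence = [s]
    for x in letters:
        s = left_closure(x + s)
        sequence.append(s)
    return sequence


x2 = per_level(2)
fib_prefixes = closure_sequence("abab")
```

Feeding the alternating letters $a, b, a, b$ to `closure_sequence` yields palindromes that are prefixes of the Fibonacci word, in line with the convergence statement above.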
The correspondence is obtained by associating with each word $w \in PER$ the fraction $\|w\| = p/q$, where $p$ is the minimal period of $w$ and $q$ the period such that $|w| = p + q - 2$ (one sets also $\|\varepsilon\| = 1/1$). For any $n > 0$, let $\mathcal{A}_n$ be the set of all the elements $w \in PER$ such that $\|w\| = p/q$ with $q \le n + 1$ and $p + q - 2 \ge n$. We prove that $\mathcal{A}_n$ is a biprefix code. Moreover, the set of the suffixes of $\mathcal{A}_n$ of length $n$ coincides with the set $S_R(n)$ of right special elements of St of length $n$. An element $w \in St$ is right-special if $wa, wb \in St$. Moreover, $\mathcal{A}_n$ coincides with the left-palindrome closure of $S_R(n)$.

Let $\mathcal{F}_n$ be the set $\mathcal{F}_n = \{p/q \in \mathcal{F} \mid q \le n\}$. If the elements of $\mathcal{F}_n$ are ordered in an increasing way, one obtains the so-called Farey series of order $n$. By a cardinality argument one knows that for any $n \ge 0$ there exists a bijection of $S_R(n)$ and $\mathcal{F}_{n+1}$. By using the previous and further results we are able to construct, for any $n$, a very natural bijection of $S_R(n)$ and $\mathcal{F}_{n+1}$, which we call the Farey correspondence.

In the last section we introduce the concepts of Farey tree and Farey monoid. The first is the usual binary tree representing all binary words beginning with the letter $a$. To each vertex representing a word $w$ one can associate the corresponding Farey number $\|\psi(w)\| = p/q$. The 'sons' of $p/q$ are the fractions $p/(p+q)$ and $q/(p+q)$. Some interesting properties of this tree are shown. The Farey monoid is obtained by considering a natural product operation on the developments in continued fractions of the Farey numbers. We prove that there exists an isomorphism of $a\mathcal{A}^* \cup \{\varepsilon\}$ and $\mathcal{F}$.

The main results of this paper, without complete proofs, have been communicated at the Conference "Semigroups, Automata and Languages" held in Porto in June 1994.

2. Preliminaries

Let $\mathcal{A}$ be a finite non-empty set, or alphabet, and $\mathcal{A}^*$ the free monoid generated by $\mathcal{A}$.
The elements of $\mathcal{A}$ are called letters, those of $\mathcal{A}^*$ words. The identity of $\mathcal{A}^*$ is named the empty word and denoted by $\varepsilon$. For any word $w \in \mathcal{A}^*$, $|w|$ denotes its length, i.e. the number of letters occurring in $w$. The length of $\varepsilon$ is taken to be equal to 0. For any letter $a \in \mathcal{A}$, $|w|_a$ will denote the number of occurrences of the letter $a$ in $w$. One has, of course, that $|w| = \sum_{a \in \mathcal{A}} |w|_a$. For any $w \in \mathcal{A}^*$, $\mathrm{alph}(w)$ is the subset of $\mathcal{A}$ which is minimal, with respect to inclusion, and such that $w \in (\mathrm{alph}(w))^*$.

The mirror image $(\sim)$ is the unary operation in $\mathcal{A}^*$ recursively defined as $\tilde{\varepsilon} = \varepsilon$ and $\widetilde{ua} = a\tilde{u}$, for all $u \in \mathcal{A}^*$ and $a \in \mathcal{A}$. The mirror image is involutory and such that for all $u, v \in \mathcal{A}^*$, $\widetilde{uv} = \tilde{v}\tilde{u}$, i.e. it is an involutory antiautomorphism of $\mathcal{A}^*$. For any subset $L$ of $\mathcal{A}^*$ we set $\tilde{L} = \{\tilde{w} \mid w \in L\}$. A word $w$ which coincides with its mirror image is called a palindrome. The set of all palindromes over $\mathcal{A}$ is denoted by $PAL(\mathcal{A})$ or, simply, by PAL.

When $\mathcal{A} = \{a, b\}$ we denote by $(\hat{\ })$ the involutory automorphism of $\mathcal{A}^*$ defined as: $\hat{a} = b$, $\hat{b} = a$. Thus $\hat{\varepsilon} = \varepsilon$ and for any $w \in \mathcal{A}^*$, $w \ne \varepsilon$, $\hat{w}$ is obtained from $w$ by interchanging the letter $a$ with $b$. For a subset $L$ of $\mathcal{A}^*$ we set $\hat{L} = \{\hat{w} \mid w \in L\}$.

A word $w = w_1 \cdots w_n$, $w_i \in \mathcal{A}$, $1 \le i \le n$, has a period $p$ if the following condition is satisfied: if $i \in [1, n - p]$, then $w_i = w_{i+p}$. We denote by $\Pi(w)$ the set of all periods of $w$. From the definition one has that any integer $p \ge |w|$ is a period of $w$. We recall the following important theorem due to Fine and Wilf [9]: if $p, q \in \Pi(w)$ and $|w| \ge p + q - \gcd(p, q)$, then $\gcd(p, q) \in \Pi(w)$. Moreover, one can prove (cf. [11]) that the lower bound $p + q - \gcd(p, q)$ to the length of $w$ in order that $w$ admits the period $\gcd(p, q)$ is optimal.

A word $u$ is a factor, or subword, of $w$ if $w \in \mathcal{A}^* u \mathcal{A}^*$, i.e. there exist $x, y \in \mathcal{A}^*$ such that $w = xuy$. The factor $u$ is called proper if $u \ne w$. If $x = \varepsilon$ ($y = \varepsilon$), then $u$ is called a prefix (suffix) of $w$.
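The definition of period and the theorem of Fine and Wilf recalled above can be checked experimentally on short binary words. The following is a brute-force sketch; `periods` and `fine_wilf_holds` are hypothetical helper names, and only periods $p \le |w|$ are listed (larger integers are trivially periods).

```python
from itertools import product
from math import gcd


def periods(w):
    """Periods p with 1 <= p <= |w|: w[i] == w[i+p] whenever both positions exist."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[i] == w[i + p] for i in range(n - p))]


def fine_wilf_holds(w):
    """For all periods p, q of w with |w| >= p + q - gcd(p, q), gcd(p, q) is a period."""
    ps = periods(w)
    return all(gcd(p, q) in ps
               for p in ps for q in ps
               if len(w) >= p + q - gcd(p, q))


# Exhaustive check on all binary words of length 1..8.
all_ok = all(fine_wilf_holds("".join(t))
             for n in range(1, 9) for t in product("ab", repeat=n))

# Optimality witness: length 6 = 3 + 5 - gcd(3, 5) - 1, periods 3 and 5, but not period 1.
witness = periods("abaaba")
```

The word `abaaba` shows that the bound cannot be lowered: it is one letter too short for the theorem to apply to the periods 3 and 5, and indeed it does not have period 1.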
By $F(w)$ we denote the set of all factors of $w$. A subset $L \subseteq \mathcal{A}^*$ is called a language. For any language $L$ the set $F(L)$ of its factors is defined as $F(L) = \bigcup_{w \in L} F(w)$. A language $L$ is called factorial if it is closed by factors, i.e. $L = F(L)$. For any language $L$ the enumeration function, or subword complexity, $g_L$ of $L$ is the map $g_L : \mathbb{N} \to \mathbb{N}$ defined as: for all $n \ge 0$,

$$g_L(n) = \mathrm{Card}(L \cap \mathcal{A}^n).$$

If $X, Y$ are languages we denote by $X^{-1}Y$ and $YX^{-1}$ the subsets of $\mathcal{A}^*$

$$X^{-1}Y = \{w \in \mathcal{A}^* \mid Xw \cap Y \ne \emptyset\}, \qquad YX^{-1} = \{w \in \mathcal{A}^* \mid wX \cap Y \ne \emptyset\}.$$

When $X$ is a singleton, i.e. $X = \{v\}$, the sets $\{v\}^{-1}Y$ and $Y\{v\}^{-1}$ will be simply denoted by $v^{-1}Y$ and $Yv^{-1}$.

An infinite word (from left to right) $x$ over $\mathcal{A}$ is any map $x : \mathbb{N} \to \mathcal{A}$. For any $i \ge 0$, we set $x_i = x(i)$ and write:

$$x = x_0 x_1 \cdots x_n \cdots.$$

The set of all infinite words over $\mathcal{A}$ is denoted by $\mathcal{A}^\omega$. A word $u \in \mathcal{A}^*$ is a (finite) factor of $x \in \mathcal{A}^\omega$ if $u = \varepsilon$ or there exist integers $i, j$ such that $i \le j$ and $u = x_i \cdots x_j$. Any pair $(i, j)$ such that the preceding equality is satisfied is called an occurrence of $u$ in $x$. If a factor $u$ of $x$ is empty or has the occurrence $(0, |u| - 1)$, then $u$ is called a prefix of $x$. We denote by $F(x)$ the set of all (finite) factors of $x$ and by $\mathrm{Pref}(x)$ the set of prefixes of $x$. An infinite word $x \in \mathcal{A}^\omega$ is called recurrent if for any $u \in F(x)$ there is an infinite number of occurrences of $u$ in $x$. The enumeration function of the language $F(x)$ is also called the enumeration function of $x$ and simply denoted by $g_x$.

As we said in the introduction, infinite Sturmian words are infinite words over the alphabet $\{a, b\}$ which can be defined in several different and equivalent ways. We shall give the following definition:

Definition 1. An infinite word $x \in \mathcal{A}^\omega$ is Sturmian if and only if the enumeration function $g_x$ satisfies the following condition: for all $n \ge 0$,

$$g_x(n) = n + 1.$$

Let us now define the set St of finite Sturmian words:

Definition 2.
A word $w \in St$ if and only if there exists an infinite Sturmian word $x$ such that $w \in F(x)$.

By definition St is a factorial language on the alphabet $\{a, b\}$. The following interesting and useful combinatorial characterization of the language St was given by Dulucq and Gouyou-Beauchamps [8]:

Theorem 1. The language St is the set of all the words $w \in \{a, b\}^*$ such that for any pair $(u, v)$ of factors of $w$ having the same length one has:

$$\big|\,|u|_a - |v|_a\,\big| \le 1.$$

3. Standard Sturmian words

There exist several methods to construct infinite Sturmian words. We shall refer here to the following procedure that we call the standard method. Let $(q_0, q_1, q_2, \ldots)$ be an infinite sequence of integers such that $q_0 \ge 0$ and $q_i > 0$ for all $i > 0$. We define the sequence $\{s_n\}_{n \ge 0}$, where $s_0 = b$, $s_1 = a$ and, for all $n \ge 1$,

$$s_{n+1} = s_n^{q_{n-1}} s_{n-1}.$$

One easily verifies that the sequence $\{s_n\}_{n \ge 0}$ converges to an infinite sequence $x$. We call $\{s_n\}_{n \ge 0}$ the approximating sequence of $x$ and $(q_0, q_1, q_2, \ldots)$ the directive sequence of $x$. It has been proved that $x$ is an infinite Sturmian word whose representative semi-line starts from the origin. Conversely, any infinite Sturmian word whose representative semi-line starts from the origin can be obtained by the preceding standard method. Moreover, one can also prove that if $q_0 > 0$ then $[0, q_0, q_1, q_2, \ldots]$ represents the development in continued fractions of the slope associated with the infinite Sturmian word $x$. Let us remark that if $(q_0, q_1, q_2, \ldots)$ is the directive sequence of $x$ and $q_0 > 0$ then, as one easily verifies, $(0, q_0, q_1, q_2, \ldots)$ is the directive sequence of the Sturmian word $\hat{x}$ which is obtained from $x$ by interchanging the letter $a$ with the letter $b$. An infinite Sturmian word constructed by the standard method will also be called an infinite standard Sturmian word.
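The characterization of St in Theorem 1 gives a simple, if inefficient, membership test. The sketch below enumerates the binary words of each length that satisfy the condition of the theorem; `is_balanced` and `finite_sturmian` are names chosen for this illustration, and the test is a plain brute force rather than an optimized algorithm.

```python
from itertools import product


def is_balanced(w):
    """Condition of Theorem 1: any two factors of w of equal length differ
    by at most 1 in their number of occurrences of the letter 'a'."""
    n = len(w)
    for length in range(1, n + 1):
        counts = [w[i:i + length].count("a") for i in range(n - length + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True


def finite_sturmian(n):
    """All words of length n over {a, b} satisfying the condition of Theorem 1."""
    candidates = ("".join(t) for t in product("ab", repeat=n))
    return [w for w in candidates if is_balanced(w)]


counts = [len(finite_sturmian(n)) for n in range(7)]
```

For length 4, for instance, the only binary words excluded are `aabb` and `bbaa`, since each contains both `aa` and `bb`.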
We denote by $\mathbf{Stand}$ the set of all infinite standard Sturmian words.

Let $x$ be an infinite standard Sturmian word whose approximating sequence is $\{s_n\}_{n \ge 0}$. For each $n \ge 0$ let us set $\chi(n) = |s_n|$. One has then for $n > 0$:

$$\chi(n+1) = q_{n-1}\,\chi(n) + \chi(n-1).$$

One easily verifies, by induction, that for all $n \ge 0$, $\gcd(\chi(n), \chi(n+1)) = 1$. If the directive sequence of $x$ is $(1, 1, \ldots, 1, \ldots)$, then $x$ is the infinite Fibonacci word $f$, $\{f_n\}_{n \ge 0}$ is the sequence of finite Fibonacci words and $\chi(n)$ is the $(n+1)$th term of the Fibonacci numerical sequence.

Definition 3. A word $s \in St$ is called standard if there exists an infinite standard Sturmian word $x$ and an integer $n \ge 0$ such that $s = s_n$, where $\{s_n\}_{n \ge 0}$ is the approximating sequence of $x$.

We shall denote by Stand the set of all finite standard Sturmian words. We say that a standard word $s \in Stand$ has the directive sequence $(q_0, q_1, \ldots, q_n)$, with $q_0 \ge 0$, $q_i > 0$, $1 \le i \le n$, if there exists a sequence of standard words $s_0, s_1, \ldots, s_n, s_{n+1}, s_{n+2}$ such that

$$s_0 = b, \quad s_1 = a, \quad s_{i+1} = s_i^{q_{i-1}} s_{i-1}, \quad 1 \le i \le n + 1,$$

and $s = s_{n+2}$. One can prove that any standard word has a unique directive sequence. This will be proved in Section 5 (cf. Corollary 4) as a consequence of Propositions 8 and 10.

Since the set of factors of an infinite Sturmian word $y$ depends only on the slope associated to $y$ and does not depend on its starting point (cf. [12]), a word $s \in St$ if and only if there exists an infinite standard Sturmian word $x$ such that $s \in F(x)$. Thus it follows that

$$St = F(\mathbf{Stand}) = F(Stand).$$

Let us remark that infinite, as well as finite, standard Sturmian words can be defined and constructed by other different methods (cf. [18, 16]). In [7] we gave three different characterizations of the set Stand.
The first is based on the following property, which is expressed by palindrome words, that we called Robinson's property (cf. [17, 7]). This property was considered by the author in [5] in the case of Fibonacci words.

Definition 4. A word $W \in \{a, b\}^*$ has the Robinson property if $|W| = 1$ or, when $|W| \ge 2$,

$$W = AB = Cxy,$$

with $x, y \in \{a, b\}$, $x \ne y$ and $A, B, C$ palindrome words.

We remark that Pedersen et al. [14] proved that there exists, and is unique, a word $W$ such that $W = AB = Cxy$ if and only if $\gcd(|A| + 2, |B| - 2) = 1$.

Let us denote by $\Sigma$ the set of all the words on the alphabet $\mathcal{A}$ having Robinson's property. One has that

$$\Sigma = \mathcal{A} \cup (PAL^2 \cap PAL\{ab, ba\}).$$

In [7] we proved the following noteworthy result:

Proposition 1. The set of finite standard Sturmian words coincides with the set of all words having the Robinson property, i.e. $\Sigma = Stand$.

A remarkable application of Proposition 1 to the study of Sturmian words generated by iterated morphisms was recently given by Berstel and Séébold in [2].

A second characterization of finite standard Sturmian words is based on periodicities of words. Let $w \in \mathcal{A}^*$ and $\Pi(w)$ be the set of its periods. We define the set PER of all words $w$ having two periods $p$ and $q$ which are coprime and such that $|w| = p + q - 2$. Thus a word $w$ belongs to PER if it is a power of a single letter or is a word of maximal length for which the theorem of Fine and Wilf does not apply. In the sequel we assume that $\varepsilon \in PER$. This is, formally, coherent with the above definition if one takes $p = q = 1$. In [7] we proved the following remarkable result:

Proposition 2. $Stand = \mathcal{A} \cup PER\{ab, ba\}$.

The third and last characterization is based on an analysis of some combinatorial properties concerning special, bispecial and strictly bispecial elements of St.
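Robinson's property (Definition 4) can be tested directly from its statement. The following sketch assumes that the empty word is allowed as one of the palindromes $A$, $B$, $C$, in accordance with the formula $\Sigma = \mathcal{A} \cup (PAL^2 \cap PAL\{ab, ba\})$; the function names are hypothetical and inputs are assumed to be words over $\{a, b\}$.

```python
from itertools import product


def is_palindrome(w):
    return w == w[::-1]


def has_robinson_property(w):
    """|W| = 1, or W = AB = Cxy with A, B, C palindromes (possibly empty) and x != y."""
    if len(w) == 1:
        return True
    if len(w) < 2:
        return False
    x, y = w[-2], w[-1]
    # W = Cxy with C a palindrome and x, y distinct letters.
    if x == y or not is_palindrome(w[:-2]):
        return False
    # W = AB: try every cut point, including the empty prefix and suffix.
    return any(is_palindrome(w[:k]) and is_palindrome(w[k:])
               for k in range(len(w) + 1))


def robinson_words(n):
    """All binary words of length n having Robinson's property."""
    candidates = ("".join(t) for t in product("ab", repeat=n))
    return [w for w in candidates if has_robinson_property(w)]


length3 = robinson_words(3)
length5 = robinson_words(5)
```

By Proposition 1 these are exactly the finite standard words of the given length; for length 5 the list contains the Fibonacci word `abaab`.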
We recall that any infinite Sturmian word $x$ is recurrent, so that for any $s \in F(x)$ there exists always at least one letter $z \in \{a, b\}$ such that $zs \in F(x)$. Let us give the following definitions:

Definition 5. A word $s \in St$ is right (left) special if $sa, sb \in St$ ($as, bs \in St$).

Definition 6. A word $s \in St$ is bispecial if it is right and left special.

Definition 7. A word $s \in St$ is strictly bispecial if $asa, asb, bsa, bsb$ are in St.

Definition 8. Let $x$ be an infinite Sturmian word. A factor $s$ of $x$ is right special (left special) in $x$ if $sa, sb \in F(x)$ ($as, bs \in F(x)$).

We recall the following proposition (cf. [12]).

Proposition 3. If $s \in F(x)$, where $x$ is an infinite Sturmian word, then $\tilde{s} \in F(x)$. Moreover, if $x$ is an infinite standard Sturmian word, then $s$ is right special in $x$ if and only if $s = \tilde{p}$, where $p$ is a prefix of $x$.

We shall denote by $S_R$, $S_L$, BS and SBS the sets of right special, left special, bispecial and strictly bispecial elements of St, respectively. The following proposition (cf. [7]) holds:

Proposition 4. $PER = SBS$.

We can summarize the previous results in the following basic theorem:

Theorem 2.
(1) $Stand = \Sigma = \mathcal{A} \cup PER\{ab, ba\}$,
(2) $PER = SBS$,
(3) $St = F(\Sigma) = F(PER)$.

In [7] we proved, as consequences of Theorem 2, the following results concerning the enumeration functions of the previous sets. Let us denote by $s_R$, $s_L$ and $sbs$ the enumeration functions of the sets $S_R$, $S_L$ and SBS. If $g_{St}$ and $g_{Stand}$ are the enumeration functions of finite Sturmian and finite standard Sturmian words, one has that the following relations hold for each $n > 0$:

$$g_{St}(n+1) = g_{St}(n) + s_R(n),$$
$$s_R(n+1) = s_R(n) + sbs(n),$$
$$sbs(n) = \tfrac{1}{2}\, g_{Stand}(n+2),$$
$$g_{Stand}(n) = 2\phi(n),$$

where $\phi$ is Euler's totient function. From the above relations one easily derives (cf. [13, 7]) that

$$s_R(n) = \sum_{i=1}^{n+1} \phi(i),$$

and, moreover,

$$g_{St}(n) = 1 + \sum_{i=1}^{n} (n - i + 1)\,\phi(i).$$

4. Combinatorial properties of special elements

We shall give now some lemmas concerning the structure of the sets $S_R$, $S_L$, BS and SBS.

Lemma 1. $S_R = \tilde{S}_L$, $S_R = \hat{S}_R$ and $S_L = \hat{S}_L$.

Proof. Let us first prove that $S_R = \tilde{S}_L$. Since St is closed under mirror image (cf. Proposition 3), one has:

$$s \in S_R \iff sa, sb \in St \iff a\tilde{s}, b\tilde{s} \in St \iff \tilde{s} \in S_L.$$

Let us now prove that $S_R = \hat{S}_R$. Since, by Theorem 1, St is invariant under the automorphism $(\hat{\ })$, one has:

$$s \in S_R \iff sa, sb \in St \iff \hat{s}b, \hat{s}a \in St \iff \hat{s} \in S_R.$$

In a similar way one proves that $S_L = \hat{S}_L$. □

From this lemma it follows that $s_R = s_L$; moreover, the set BS is invariant under the operators $(\sim)$ and $(\hat{\ })$.

Lemma 2. $S_R \cap PAL = S_L \cap PAL = SBS$.

Proof. Let us prove that $S_R \cap PAL = SBS$. The inclusion $SBS \subseteq S_R \cap PAL$ is trivial since $SBS \subseteq S_R$ and, from Theorem 2, $SBS \subseteq PAL$. In order to prove the inverse inclusion we have to show that a palindrome right-special element of St is strictly bispecial. Let $s \in S_R \cap PAL$. One has then $sa, sb \in St$ and $s = \tilde{s}$. From Proposition 3 it follows that $as, bs \in St$. We shall prove now, by Theorem 1, that for $x, y \in \{a, b\}$ the word $xsy$ belongs to St. Let $f, f'$ be two non-overlapping factors of $xsy$ having the same length. We want to prove that $\big|\,|f|_x - |f'|_x\,\big| \le 1$. If $f, f' \in F(xs)$ or $f, f' \in F(sy)$, then the result is obvious since $xs, sy \in St$. Let us then suppose that $f = xu$ and $f' = vy$ with $|u| = |v|$. Since $s$ is a palindrome one has $v = \tilde{u}$. Hence

$$\big|\,|f|_x - |f'|_x\,\big| = \big|\,|xu|_x - |\tilde{u}y|_x\,\big| = \big|\,1 - |y|_x\,\big|.$$

Thus the previous difference is equal to 0 if $x = y$ and equal to 1 otherwise. This shows that $xsy \in St$. The proof of the equality $S_L \cap PAL = SBS$ is perfectly symmetric. □

Let us explicitly observe that a palindrome element of St, in general, is not an element of SBS. For instance, the palindrome word $baab \in St$ is neither a right-special element nor a left-special element of St since $baabb$ and $bbaab$ do not belong to St.
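The enumeration formulas recalled before Section 4 and the statement of Lemma 2 can be compared with a brute-force enumeration for small lengths. The sketch below uses the condition of Theorem 1 as the membership test for St; all function names are hypothetical helpers introduced for this check.

```python
from itertools import product
from math import gcd


def is_balanced(w):
    """Membership test for St via Theorem 1 (brute force over all factor lengths)."""
    n = len(w)
    for length in range(1, n + 1):
        counts = [w[i:i + length].count("a") for i in range(n - length + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True


def sturmian(n):
    return [w for w in ("".join(t) for t in product("ab", repeat=n)) if is_balanced(w)]


def right_special(n):
    """Words s of length n with sa and sb both in St."""
    return [s for s in sturmian(n) if is_balanced(s + "a") and is_balanced(s + "b")]


def strictly_bispecial(n):
    """Words s of length n with asa, asb, bsa, bsb all in St."""
    return [s for s in sturmian(n)
            if all(is_balanced(x + s + y) for x in "ab" for y in "ab")]


def phi(n):
    """Euler's totient function, computed naively."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


sr4 = right_special(4)
sbs4 = strictly_bispecial(4)
```

For length 4 this reproduces the ten right special words and the two strictly bispecial words `aaaa` and `bbbb`.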
From Lemma 2 it follows that any palindrome suffix (prefix) of a right (left) special element is strictly bispecial. Moreover, $BS \cap PAL = SBS$. Thus a bispecial element of St is not strictly bispecial unless it is a palindrome. For instance, for the length $n = 4$ there are 10 right special elements:

$$aaaa,\ baaa,\ abaa,\ aaba,\ baba,\ bbbb,\ abbb,\ babb,\ bbab,\ abab.$$

The elements $baaa$ and $abbb$ are not left-special. All the others are bispecial. However, the only strictly bispecial elements are $aaaa$ and $bbbb$.

Let us now introduce the set $SBS \cap \mathcal{A}^* S_R$. An element $\sigma$ belongs to $SBS \cap \mathcal{A}^* S_R$ if and only if there exist $s \in S_R$ and $\lambda \in \mathcal{A}^*$ such that $\sigma = \lambda s \in SBS$; the word $\sigma$ is called a left-extension of $s$ in the set SBS. A left-extension $\sigma$ of $s$ in SBS is called proper if the word $\sigma$ has no palindrome suffixes $\tau$ such that $|\sigma| > |\tau| > |s|$. In a symmetric way one can consider the set $SBS \cap S_L \mathcal{A}^*$. If $s \in S_L$ and $\sigma = s\lambda \in SBS$, $\lambda \in \mathcal{A}^*$, then $\sigma$ is called a right-extension of $s$ in SBS; $\sigma$ is called proper if $\sigma$ has no palindrome prefixes $\tau$ such that $|\sigma| > |\tau| > |s|$.

Proposition 5. Any right-special element of St has a unique proper left-extension in SBS.

Proof. Let $s$ be a right special element of St. If $s \in SBS$ then the result is trivial. Let us suppose that $s \in S_R \setminus SBS$. Let us first prove the 'unicity' and later the 'existence' of a proper left-extension of $s$ in SBS. Suppose that $\sigma$ and $\sigma'$ are two proper and distinct left-extensions of $s$ in SBS. We can write:

$$\sigma = \lambda s, \qquad \sigma' = \lambda' s,$$

with $\lambda, \lambda' \in \{a, b\}^*$. Since $\sigma$ and $\sigma'$ are 'proper', $\lambda$ cannot be a suffix of $\lambda'$ and, conversely, $\lambda'$ cannot be a suffix of $\lambda$. Hence we can write:

$$\lambda = \alpha x u, \qquad \lambda' = \beta y u,$$

with $\alpha, \beta, u \in \{a, b\}^*$ and $x, y \in \{a, b\}$, $x \ne y$. One has then:

$$\sigma = \alpha x u s, \qquad \sigma' = \beta y u s.$$

Since $xus, yus \in S_R$ it follows:

$$xusx,\ xusy,\ yusx,\ yusy \in St,$$

i.e. $us \in SBS$, so that $us \in PAL$.
If $u = \varepsilon$ this contradicts the fact that $s$ is not a palindrome. If $u \ne \varepsilon$, then one contradicts the fact that $\sigma$, as well as $\sigma'$, are 'proper' left-extensions of $s$ in SBS.

Let us now prove the 'existence' of a proper left-extension of $s$ in SBS. We suppose first that $s = x\alpha$, where $x \in \{a, b\}$ and $\alpha \in PAL$. By Lemma 2 one has $\alpha \in SBS$, so that from Theorem 2, if $y \in \{a, b\}$ and $y \ne x$, then $\alpha xy \in Stand$, so that there exists an infinite standard Sturmian word $x$ and an integer $n \ge 0$ such that $\alpha xy = s_n$, where $\{s_n\}_{n \ge 0}$ is the approximating sequence of $x$. One has that $n \ge 2$. If $n = 2$, then $\alpha = x^{|\alpha|}$ and $s \in PAL$. Hence suppose $n > 2$. Since $s_{n-1} = \beta yx$, with $\beta \in SBS$, and by the fact that $s_n s_{n-1}$ is still standard, one has

$$s_n s_{n-1} = \alpha x y \beta y x \in Stand,$$

with $\alpha x y \beta = \beta y x \alpha = \beta y s \in PER = SBS$. Thus $s$ admits a left-extension in SBS. If we consider a left-extension of $s$ in SBS of minimal length, this has to be proper.

Let us now suppose that $s = \lambda x \alpha$, where $\lambda \in \{a, b\}^*$, $x \in \{a, b\}$ and $\alpha$ is the maximal palindrome suffix of $s$. As we have seen above, $x\alpha$ has a unique proper left-extension in SBS. Let us denote by $\sigma$ this extension:

$$\sigma = \mu x \alpha.$$

We want to prove that $s$ is a suffix of $\sigma$. Let us observe that $\mu$ cannot be a suffix of $\lambda$. Indeed, otherwise, one will contradict the fact that $\alpha$ is the palindrome suffix of $s$ of maximal length. Let us suppose, by contradiction, that $\lambda$ is not a suffix of $\mu$. This implies, since $\sigma \ne s$, that there exist $u \in \{a, b\}^*$ and $x_1, x_2 \in \{a, b\}$, $x_1 \ne x_2$, such that

$$s = \lambda' x_1 u x \alpha, \qquad \sigma = \mu' x_2 u x \alpha,$$

with $\lambda', \mu' \in \{a, b\}^*$. Since $s, \sigma \in S_R$ one derives that $ux\alpha \in SBS$. If $u = \varepsilon$, then $x\alpha \in SBS$, so that $x\alpha$ is a palindrome and this contradicts the fact that $\alpha$ is the maximal palindrome suffix of $s$. Let us then suppose $u \ne \varepsilon$. Since $|ux\alpha| > |x\alpha|$ one contradicts the fact that $\sigma$ is the unique proper left-extension of $x\alpha$ in SBS. □

Let $s \in S_R$; we denote by $s^{\dagger}$ the unique proper left-extension of $s$ in SBS. We observe that $s^{\dagger}$ coincides with the left-extension of $s$ in SBS of minimal length. Indeed, this is an obvious consequence of the previous proposition and of the fact that a left-extension of $s$ in SBS of minimal length has to be proper.

Corollary 1. A right special element of St is a right special factor in an infinite standard Sturmian word.

Proof. Let $s \in S_R$. We consider the proper left-extension $s^{\dagger}$ of $s$ in SBS. One has that

$$s^{\dagger} = \lambda s,$$

with $\lambda \in \{a, b\}^*$. From Theorem 2, $s^{\dagger}ab \in Stand$, so that there exists an infinite standard Sturmian word $x$ and an integer $n \ge 2$ such that $s^{\dagger}ab = s_n$, where $\{s_n\}_{n \ge 0}$ is the approximating sequence of $x$. Since $s^{\dagger} \in \mathrm{Pref}(x)$ and $s^{\dagger}$ is a palindrome, then $\tilde{s} \in \mathrm{Pref}(x)$, so that from Proposition 3 it follows that $s$ is right-special in $x$. □

Let us remark that in a perfectly symmetric way one can prove that any left-special element $s$ of St has a unique proper right-extension in SBS. Moreover, one derives as a corollary that any left-special element of St is a left-special factor in an infinite standard Sturmian word.

Proposition 6. Let $x$ and $y$ be two infinite Sturmian words having the same right special factor of length $n$. Then

$$F(x) \cap \mathcal{A}^i = F(y) \cap \mathcal{A}^i \quad (i = 1, \ldots, n + 1).$$

Proof. Since for any infinite Sturmian word there exists always an infinite standard Sturmian word having the same set of finite factors, we can assume that $x$ and $y$ are standard. From the hypothesis and Proposition 3 one has that $x$ and $y$ have the same prefix of length $n$ and then the same right special factors of lengths $1, 2, \ldots, n$. The proof of the proposition is obtained by induction on the integer $i$.

Base of the induction. One has $F(x) \cap \mathcal{A} = F(y) \cap \mathcal{A} = \{a, b\}$. By hypothesis $x$ and $y$ have the same prefix of length 1, say $a$; hence $a$ is a right special factor of $x$ and $y$. This implies that $aa, ab \in F(x) \cap F(y)$. Moreover, from Proposition 3, one has $ba = \widetilde{ab} \in F(x) \cap F(y)$. Note that $bb \notin F(x) \cup F(y)$. This completes the base of the induction.

Induction step.
Suppose that we have proved the property up to $i - 1$, $1 < i \le n$. Thus by hypothesis $F(x) \cap \mathcal{A}^r = F(y) \cap \mathcal{A}^r$, $1 \le r \le i$ and, moreover, $x$ and $y$ have the same right special factor of length $i$. Let $\{f_0, f_1, \ldots, f_i\}$ be the set of the $i + 1$ factors of length $i$ belonging to $F(x) \cap F(y)$. Let $f_0$ be the right special factor of length $i$. We have then $f_0 a, f_0 b \in F(x) \cap F(y)$. Let us now suppose that there exists $f \in \{f_1, \ldots, f_i\}$ such that $fa \in F(x)$ and $fb \in F(y)$. Since $|f_0| = |f| = i$ and $f_0 \ne f$ there will exist a word $u \in \{a, b\}^*$ and $x, y \in \{a, b\}$, $x \ne y$, such that

$$f_0 = f_0' x u, \qquad f = f' y u,$$

with $f_0', f' \in \{a, b\}^*$. Hence $f_0' x u a, f_0' x u b \in F(x) \cap F(y)$, and $f' y u a \in F(x)$, $f' y u b \in F(y)$. If $x = a$, then $y = b$ and $\big|\,|aua|_a - |bub|_a\,\big| = 2$, which is a contradiction. If $x = b$ one reaches a similar contradiction. Hence for every $f \in \{f_1, \ldots, f_i\}$ there exists a unique letter $x$ such that $fx \in F(x) \cap F(y)$. This implies that $F(x) \cap \mathcal{A}^{i+1} = F(y) \cap \mathcal{A}^{i+1}$. □

5. A new characterization of Standard words

Lemma 3. A palindrome word $w$ has the period $p < |w|$ if and only if it has a palindrome prefix (suffix) of length $|w| - p$.

Proof. Let $w = w_1 \cdots w_n$, $w_i \in \mathcal{A}$ ($i = 1, \ldots, n$), be a palindrome word of length $n$. One has for $i \in [1, n]$:

$$w_i = w_{n-i+1}.$$

Since $w \in PAL$, $w$ has always the period $|w| - 1$. Let $p$ be any other period such that $p < |w|$. If we set $q = n - p > 0$, then we can write the above relation as

$$w_i = w_{(q-i+1)+p}.$$

Now for $i \in [1, q]$, one has $q - i + 1 \ge 1$, so that from the $p$-periodicity of $w$ it follows:

$$w_i = w_{q-i+1}, \quad \text{for } i \in [1, q],$$

i.e. $w$ has a palindrome prefix $Q$ of length $q$. Since $w$ is a palindrome, $Q$ is also a suffix of $w$.

Conversely, suppose that $w$ is a palindrome word of length $n$ having the palindrome prefix $Q$ of length $q < n$. We can write:

$$w = Q\lambda = \tilde{\lambda} Q.$$

From the lemma of Lyndon and Schützenberger (cf. [11]) one derives:

$$\tilde{\lambda} = \alpha\beta, \quad \lambda = \beta\alpha, \quad Q = (\alpha\beta)^r \alpha, \quad w = (\alpha\beta)^{r+1}\alpha, \quad r \ge 0,$$

with $\alpha = \tilde{\alpha}$, $\beta = \tilde{\beta}$, so that $w$ has the period $|\alpha\beta| = |\lambda| = |w| - |Q| = |w| - q = p$. □

Proposition 7. $PER = a^* \cup b^* \cup (PAL \cap (PAL\,ab\,PAL))$.

Proof. Let $w \in PER$. Thus $w$ has two periods $p$ and $q$ such that $\gcd(p, q) = 1$ and $|w| = p + q - 2$. This implies (cf. [7, Theorem 4]) that $w$ is a palindrome word which is either a power of a single letter ($a$ or $b$) or has the palindrome prefixes (and suffixes) $P$ and $Q$ of lengths $|P| = p - 2$ and $|Q| = q - 2$. Hence $w$ can be written as

$$w = PxyQ = QyxP,$$

with $x, y \in \{a, b\}$ and $x \ne y$ (cf. [7]).

Conversely, if $w \in a^* \cup b^*$ then $w$ has the periods $p = 1$ and $q = |w| + 1$ having $\gcd(p, q) = 1$ and $|w| = p + q - 2$. If $w = PxyQ = QyxP$, with $P, Q \in PAL$, $x, y \in \{a, b\}$ and $x \ne y$, then from the previous lemma, $w$ has the periods

$$p = |w| - |Q|, \qquad q = |w| - |P|,$$

so that $|w| = p + q - 2$. Since $w \in PAL$ and $wyx = PxyQyx \in (PAL)^2$ it follows that $wyx \in \Sigma$, so that from the theorem of Pedersen et al. [14] one has $\gcd(p, q) = 1$. Hence, $w \in PER$. □

Lemma 4. Let $w \in PER$ be such that $\mathrm{Card}(\mathrm{alph}(w)) > 1$. Then $w$ satisfies the following properties:
1. $w$ can be uniquely represented as $w = PxyQ = QyxP$, with $x, y$ fixed letters in $\{a, b\}$, $x \ne y$ and $P, Q \in PAL$. Moreover, $\gcd(p, q) = 1$, where $p = |P| + 2$ and $q = |Q| + 2$.
2. If $|P| < |Q|$, then $Q$ is the maximal proper palindrome suffix (and prefix) of $w$.
3. $p = |P| + 2$ is the minimal period of $w$. Moreover, the standard word $s = wyx$ will have still the minimal period $p$.
4. If $|P| + 1 < |Q|$, then there exist and are unique the integers $k$ and $r$ such that $k > 0$, $0 \le r < p$ and $Q = (Pxy)^k U$ with $|U| = r$. Hence $w = (Pxy)^{k+1} U$.

Proof. 1. Let $w = PxyQ = QyxP$, with $x, y \in \{a, b\}$, $x \ne y$ and $P, Q \in PAL$.
Suppose now that there exist $P', Q' \in \mathrm{PAL}$ such that $w = PxyQ = QyxP = P'xyQ' = Q'yxP'$. This implies that $wyx = P(xyQyx) = P'(xyQ'yx)$. If $P \ne P'$ then $wyx$ can be factorized in two distinct ways as a product of two palindromes. Since $wyx$ is primitive (cf. [7]) one reaches a contradiction. Thus $P = P'$ and $Q = Q'$. Since $wyx \in \mathcal{C}$, from [14] it follows that $\gcd(p,q) = 1$.

2. Let $w = PxyQ = QyxP$ with $P, Q \in \mathrm{PAL}$ and suppose that $|P| < |Q|$. Let us prove that $Q$ is the maximal proper palindrome suffix (and prefix) of $w$. Indeed, suppose by contradiction that $Q'$ is a proper palindrome suffix of $w$ such that $|Q'| > |Q|$. From Lemma 3, $w$ has the period $p' = |w| - |Q'| < p$. Since $p \ge p' + 1$ it follows that $|w| = p + q - 2 \ge p' + q - 1 \ge p' + q - d$, where $d = \gcd(p', q)$, so that $w$ has the period $d$ in view of the theorem of Fine and Wilf. Moreover, since $q > p$, one has $|w| = p + q - 2 \ge p + p'$. This implies that $w$ has also the period $d' = \gcd(p, p')$. Since $p \ge d'$ and $p' \ge d$ one derives $|w| \ge d + d'$, so that $w$ has the period $\delta = \gcd(d, d')$. Since $\gcd(p,q) = 1$ it follows that $\delta = 1$. This implies $\mathrm{Card}(\mathrm{alph}(w)) = 1$, which is a contradiction.

3. Let us prove that $p = |P| + 2$ is the minimal period of $w$. Indeed, if $w$ has a period $p' < p$, then by Lemma 3 $w$ has a palindrome suffix $Q'$ of length $|Q'| = |w| - p' > |Q|$, which contradicts point 2. Let us consider now the standard word $s = wyx = PxyQyx = QyxPyx$. As one easily verifies, the word $s$ retains the period $p$. This is also the minimal period of $s$. In fact, if $s$ has a period $p' < p$, then also $w$ has the period $p' < p$, which is a contradiction.

4. By hypothesis $|Q| \ge |P| + 2 = p$. This implies that there exist unique integers $k > 0$ and $r$ such that $0 \le r < p$ and $|Q| = kp + r$. Since $w = PxyQ = QyxP$ has the period $p$, also $Q$ has the period $p$. Moreover, since $Pxy$ is a prefix of $Q$ we can write $Q = (Pxy)^kU$ with $|U| = r$.
Since $w = PxyQ$ one derives $w = (Pxy)^{k+1}U$. □

From the above lemma one has that if $w \in \mathrm{PER}$ and $\mathrm{Card}(\mathrm{alph}(w)) > 1$, then $w$ can be uniquely represented as $w = PxyQ$, with $x, y \in \{a,b\}$, $x \ne y$, $P, Q \in \mathrm{PAL}$, $PxyQ = QyxP$ and $|P| < |Q|$. We call this representation the canonical representation of $w$. The word $xy = (P^{-1}w)Q^{-1}$ is uniquely determined and will be called the intermediate word of $w$.

For any $w \in \mathcal{A}^*$ we introduce the set $L_w = \mathcal{A}^*w \cap \mathrm{PAL}$. Any element of $L_w$ will be called a palindrome left-extension of $w$.

Lemma 5. Let $w \in \mathcal{A}^*$. There exists in $L_w$ a unique element $w^{(-)}$ of minimal length. Moreover, if $w = Q\delta$, $\delta \in \mathcal{A}^*$, where $Q$ is the maximal palindrome prefix of $w$, then $w^{(-)} = \tilde\delta Q\delta$.

Proof. Let $k$ be the minimal length of the elements of $L_w$. Suppose now that there exist $\lambda_1, \lambda_2 \in \mathcal{A}^*$ such that $\lambda_1w, \lambda_2w \in \mathrm{PAL}$ and $|\lambda_1w| = |\lambda_2w| = k$. This implies $|\lambda_1| = |\lambda_2| = s$, with $0 \le s \le |w| - 1$. Moreover, $\lambda_1 = \lambda_2 = \tilde u$, where $u$ is the suffix of $w$ of length $s$. Hence there exists in $L_w$ a unique element $w^{(-)}$ of minimal length. Let us now write $w$ as $w = Q\delta$, where $Q$ is the maximal palindrome prefix of $w$. One has then $w^{(-)} = \lambda Q\delta$ with $|\lambda| \le |\delta|$. Since $w^{(-)} \in \mathrm{PAL}$, one has $\delta = \delta'\tilde\lambda$ with $\delta' \in \mathcal{A}^*$. This implies $w^{(-)} = \lambda Q\delta'\tilde\lambda$. Since $w^{(-)} \in \mathrm{PAL}$, then $Q\delta' \in \mathrm{PAL}$, so that $Q\delta'$ is a palindrome prefix of $w$. If $\delta' \ne \varepsilon$ we reach a contradiction since $|Q\delta'| > |Q|$. Hence $\delta' = \varepsilon$ and $\lambda = \tilde\delta$. □

It follows from the above lemma that one can introduce the map $(-) : \mathcal{A}^* \to \mathrm{PAL}$ which associates to any word $w \in \mathcal{A}^*$ the palindrome word $w^{(-)}$. We call $w^{(-)}$ the palindrome left-closure of $w$ and $(-)$ the operator of palindrome left-closure. Let us remark that one can introduce in a perfectly symmetric way an operator $(+) : \mathcal{A}^* \to \mathrm{PAL}$ of palindrome right-closure which associates to any word $w \in \mathcal{A}^*$ the word $w^{(+)}$ defined as the (unique) word of minimal length in the set $R_w = w\mathcal{A}^* \cap \mathrm{PAL}$. One easily verifies that if $w = \delta Q$, where $Q$ is the palindrome suffix of $w$ of maximal length, then $w^{(+)} = \delta Q\tilde\delta$.
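The two closure operators can be sketched in a few lines of Python. This is only an illustration of Lemma 5 and of its symmetric counterpart, under the assumption that words are represented as ordinary strings; the function names `left_closure` and `right_closure` are hypothetical and do not come from the paper.

```python
def is_palindrome(w: str) -> bool:
    """Return True when w equals its mirror image."""
    return w == w[::-1]


def left_closure(w: str) -> str:
    """Shortest palindrome having w as a suffix.

    Follows Lemma 5: write w = Q + delta, with Q the maximal
    palindrome prefix of w, and return reversed(delta) + Q + delta.
    """
    # The loop always returns: the empty prefix (q == 0) is a palindrome.
    for q in range(len(w), -1, -1):
        if is_palindrome(w[:q]):
            delta = w[q:]
            return delta[::-1] + w
    return w  # unreachable, kept for readability


def right_closure(w: str) -> str:
    """Shortest palindrome having w as a prefix (symmetric construction)."""
    n = len(w)
    for q in range(n, -1, -1):
        if is_palindrome(w[n - q:]):
            delta = w[:n - q]
            return w + delta[::-1]
    return w  # unreachable, kept for readability
```

For instance, `left_closure("abb")` prepends the mirror image of `"bb"` to the word, because the maximal palindrome prefix of `"abb"` is the single letter `"a"`. The scan is quadratic in the length of the word, which is enough for experimenting with short words.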
Moreover, one has that for any $w \in \mathcal{A}^*$, $w^{(-)} = (\tilde w)^{(+)}$. Let us now prove the following remarkable theorem.

Theorem 3. Let $s$ be a right special element of St. Then $s^{+} = s^{(-)}$, i.e. the palindrome left-closure of $s$ coincides with the proper left-extension of $s$ in SBS.

Proof. We prove first the theorem in the case $s = xQ$ with $x \in \{a,b\}$ and $Q \in \mathrm{PAL}$. By Theorem 2 and Lemma 2 one has $Q \in \mathrm{PER} = \mathrm{SBS}$. If $s \in \mathrm{SBS}$, then there is nothing to prove. Let us then suppose that $s$ is not a palindrome. If $\mathrm{Card}(\mathrm{alph}(Q)) = 1$, then $Q = y^{|Q|}$ with $y \in \mathcal{A}$ and $x \ne y$. Indeed, $Q \ne x^{|Q|}$, otherwise $s$ would be a palindrome. Hence $s = xy^{|Q|}$ and $s^{(-)} = y^{|Q|}xy^{|Q|}$. This implies that $s^{(-)} \in \mathrm{SBS}$ since $s^{(-)}yx = y^{|Q|}xy^{|Q|}yx \in \mathcal{C}$. Hence in this case $s^{+} = s^{(-)}$.

Let us then suppose that $\mathrm{Card}(\mathrm{alph}(Q)) = 2$. By Lemma 4 we can write $Q$ as $Q = PxyR = RyxP$, with $P, R \in \mathrm{PAL}$, $x, y \in \{a,b\}$, $x \ne y$. Hence $s = xPxyR$. Let us first suppose that $|P| > |R|$. Let us prove that in this case the maximal palindrome prefix $V$ of $s$ is $xPx$. In fact, otherwise, there would exist words $R_1, R_2 \in \{a,b\}^*$ such that $R = R_1xR_2$ and $V = xPxyR_1x$. This would imply that $PxyR_1$ is a palindrome prefix of $Q$ whose length is greater than $|P|$, which is absurd in view of Lemma 4. It follows then that $s^{(-)} = RyxPxyR$. Since by Propositions 7 and 4, $RyxPxyR \in \mathrm{SBS}$, one has $s^{+} = s^{(-)}$.

Let us now suppose that $|P| < |R|$. We want to prove that also in this case the maximal palindrome prefix of $s$ is $xPx$. If $|R| = |P| + 1$, then since $PxyR = RyxP$ one has $Px = R = xP$, so that $R = x^{|P|+1}$, $P = x^{|P|}$, $Q = x^{|P|+1}yx^{|P|+1}$ and $s = x^{|P|+2}yx^{|P|+1}$. In this case the maximal palindrome prefix of $s$ is $x^{|P|+2} = xPx$. Let us then suppose that $|R| \ge |P| + 2 = p$.
From Lemma 4 there exist unique integers $k > 0$, $0 \le r < p$, such that $R = (Pxy)^kU$, $Q = (Pxy)^{k+1}U$, with $|U| = r$. Hence $s = x(Pxy)^{k+1}U$. Suppose now that $s$ has a palindrome prefix $V$ whose length is greater than $|xPx|$. We have to consider two cases.

Case 1: $|V| \le |x(Pxy)^{k+1}|$. Since $|V| > |xPx|$ there exist $h \ge 1$ and a prefix $P'$ of $P$ such that $V = x(Pxy)^hP'x$. This implies that the word $(Pxy)^hP'$ is a palindrome, i.e. $(Pxy)^hP' = P'(yxP)^h$. If $P = P'$ we reach a contradiction since one derives $Pxy = Pyx$. Let us then suppose $|P'| < |P|$. In this case $P'x$ is a prefix of $P$, so that we can write $P = P'xP''$ with $P'' \in \mathcal{A}^*$. Thus from the above equation one has $(P'xP''xy)^hP' = P'(yxP'xP'')^h$. Comparing the prefixes of length $|P'| + 1$ of the two sides, it follows that $P'x = P'y$ and then $x = y$, which is a contradiction.

Case 2: $|V| > |x(Pxy)^{k+1}|$. One has $|V| > 1 + (k+1)p = |R| + (p - r) + 1$. Since $p - r \ge 1$ one has $|V| - 2 \ge |R| + 1$. Now $V$ is a palindrome prefix of $s = xQ$, so that $V = xV'x$. This implies that $Q$ has the palindrome prefix $V'$ whose length is $|V'| = |V| - 2 \ge |R| + 1$. Moreover, since $|V| \le |Q|$ one has $|V'| < |Q|$, which is absurd since $R$ is the maximal proper palindrome prefix of $Q$.

Turning back to our problem, we have that also in this case the maximal palindrome prefix of $s$ is $xPx$, so that by Lemma 5, $s^{(-)} = RyxPxyR$. Since $RyxPxyR \in \mathrm{SBS}$ one has $s^{+} = s^{(-)}$.

Let us now suppose that $s \in S_R$ is such that $s = \lambda xQ$, where $\lambda \in \mathcal{A}^*$, $\lambda \ne \varepsilon$, and $Q$ is the maximal proper palindrome suffix of $s$. If $s \in \mathrm{PAL}$ there is nothing to prove. Let us then suppose that $s$ is not a palindrome. One has then $s = \lambda f$, where $\lambda \in \mathcal{A}^*$ and $f = xQ$. Now, as proved before, $f^{+} = f^{(-)}$. Moreover, since $s^{(-)}$ is a palindrome left-extension of $f$ one has $|s^{(-)}| \ge |f^{(-)}| = |f^{+}|$. Since the left-extension $f^{+}$ of $f \in S_R$ in SBS is unique one has $s^{+} = f^{+}$, so that $|s^{(-)}| \ge |s^{+}|$.
Moreover, one has obviously that $|s^{(-)}| \le |s^{+}|$. Hence $|s^{(-)}| = |s^{+}|$. Since by Lemma 5, $s^{(-)} = \tilde\delta P\delta$, where $P$ is the maximal palindrome prefix of $s = P\delta$, and $s^{+} = \mu P\delta$, it follows that $|\mu| = |\tilde\delta|$, so that $\mu = \tilde\delta$ and $s^{+} = s^{(-)}$. □

Remark 1. Let us observe that Proposition 5 can be derived as a corollary of the previous theorem, since this latter, according to the given proof, can be restated as follows: if $s \in S_R$, then $s^{(-)} \in \mathrm{SBS}$. Moreover, from the proof of Theorem 3 one has also that if $w = PxyQ = QyxP \in \mathrm{SBS}$ with $P, Q \in \mathrm{PAL}$ and $x, y \in \{a,b\}$, $x \ne y$, then $(xw)^{(-)}, (yw)^{(-)} \in \mathrm{SBS}$ and $(xw)^{(-)} = QyxPxyQ$, $(yw)^{(-)} = PxyQyxP$.

If $X$ is a subset of $\mathcal{A}^*$ we denote by $X^{(-)}$ the set $X^{(-)} = \{w^{(-)} \in \mathcal{A}^* \mid w \in X\}$. Let us define inductively the sequence $\{X_n\}_{n \ge 0}$ of finite subsets of $\mathcal{A}^*$ as
$$X_0 = \{\varepsilon\}, \qquad X_{n+1} = (\mathcal{A}X_n)^{(-)}, \quad n \ge 0.$$
Thus $s \in X_{n+1}$ if and only if there exist $x \in \mathcal{A}$ and $t \in X_n$ such that $s = (xt)^{(-)}$. We set $\mathcal{X} = \bigcup_{n \ge 0} X_n$.

Theorem 4. Let $\mathcal{A} = \{a,b\}$. One has $\mathcal{X} = \mathrm{SBS}$.

Proof. Let us first prove the inclusion $\mathcal{X} \subseteq \mathrm{SBS}$. We show by induction on the integer $n$ that for any $n \ge 0$, $X_n \subseteq \mathrm{SBS}$. The base of the induction is trivial since $X_0$ and $X_1$ are obviously included in SBS. Suppose now that $X_n \subseteq \mathrm{SBS}$ for $n > 0$; we want to prove that $X_{n+1} \subseteq \mathrm{SBS}$. Let $s \in X_{n+1}$. This implies that there exist $x \in \{a,b\}$ and $t \in X_n$ such that $s = (xt)^{(-)}$. Since $t \in \mathrm{SBS}$, $xt$ is a right special element of St, so that by Theorem 3, $s = (xt)^{(-)} = (xt)^{+} \in \mathrm{SBS}$. Thus $X_{n+1} \subseteq \mathrm{SBS}$. Hence $\mathcal{X} \subseteq \mathrm{SBS}$.

Let us now prove the inverse inclusion $\mathrm{SBS} \subseteq \mathcal{X}$. The proof is by induction on the length of the elements of SBS. Let $s \in \mathrm{SBS}$. If $|s| \le 1$ the result is trivial. Let us then suppose $|s| > 1$. By Theorem 2, $\mathrm{SBS} = \mathrm{PER}$ and by Proposition 7, $s$ is either equal to $x^{|s|}$ with $x \in \{a,b\}$ or $s \in \mathrm{PAL}\,ab\,\mathrm{PAL} \cap \mathrm{PAL}$. In the first case, trivially, $x^{|s|} \in X_{|s|} \subseteq \mathcal{X}$.
Let us then suppose that $s = PxyQ = QyxP$, with $P, Q \in \mathrm{PAL}$, $x, y \in \{a,b\}$, $x \ne y$. We can always suppose that $|P| < |Q|$. If $|Q| = |P| + 1$, then $Q = Px = xP$, so that $Q = x^{|Q|}$ and $s = x^{|Q|}yx^{|Q|}$. Hence $s = (yx^{|Q|})^{(-)}$, where $x^{|Q|} \in X_{|Q|} \subseteq \mathcal{X}$. This implies $s \in X_{|Q|+1}$. Let us now suppose $|Q| \ge |P| + 2$. One has $Q = PxyR$, where $R \in \mathrm{PAL}$ by Lemma 3. Hence $s = PxyPxyR = PxyRyxP$. Now by Proposition 7 and Theorem 2, $RyxP \in \mathrm{PER} = \mathrm{SBS}$. Moreover, $s = (yRyxP)^{(-)}$ since, as we have seen in the proof of Theorem 3 (cf. Remark 1), the maximal palindrome prefix of $yRyxP$ is $yRy$. Since $|RyxP| < |s|$, by the inductive hypothesis $RyxP \in X_n$ for a suitable $n > 0$. Hence $(yRyxP)^{(-)} \in X_{n+1}$. □

Let $\mathcal{A}$ be any alphabet. A subset $X$ of $\mathcal{A}^*$ is a prefix code if $X \cap X\mathcal{A}^{+} = \emptyset$. In a symmetric way $X$ is a suffix code if $X \cap \mathcal{A}^{+}X = \emptyset$. The set $X$ is called a biprefix code if it is both a prefix and a suffix code (cf. [1]).

Lemma 6. If $X \subseteq \mathcal{A}^*$ is a suffix code, then $Y = X^{(-)}$ is a biprefix code. Moreover, $Y = (\mathcal{A}X)^{(-)}$ is a biprefix code such that $\mathrm{Card}(Y) = \mathrm{Card}(\mathcal{A})\,\mathrm{Card}(X)$.

Proof. Let us prove that $Y$ is a suffix code. Suppose, by contradiction, that there exist $y_1, y_2 \in Y$ such that $y_1 = \lambda y_2$, $\lambda \in \mathcal{A}^*$. We can write $y_1 = x_1^{(-)}$ and $y_2 = x_2^{(-)}$ with $x_1, x_2 \in X$. This implies that $y_1 = \alpha x_1$, $y_2 = \beta x_2$, with $\alpha, \beta \in \mathcal{A}^*$. By the hypothesis one has $\alpha x_1 = \lambda\beta x_2$. Since $X$ is a suffix code it follows that $x_1 = x_2$ and then $y_1 = y_2$. This shows that $Y$ is a suffix code. Since the words of $Y$ are palindromes it follows that $Y$ is also a prefix code and then a biprefix code.

If $X$ is a suffix code, then so is $\mathcal{A}X$. Thus from the previous result $(\mathcal{A}X)^{(-)}$ is a biprefix code. Let $x_1, x_2 \in X$ be such that $x_1 \ne x_2$ and suppose that $(xx_1)^{(-)} = (yx_2)^{(-)}$ for $x, y \in \mathcal{A}$. This implies that $\lambda xx_1 = \mu yx_2$, for suitable $\lambda, \mu \in \mathcal{A}^*$. Since $X$ is a suffix code then $x_1 = x_2$, which is absurd. From this it trivially follows that $\mathrm{Card}(Y) = \mathrm{Card}(\mathcal{A})\,\mathrm{Card}(X)$. □

Corollary 2.
For each $n > 0$ the set $X_n$ is a biprefix code having $2^n$ elements.

Proof. Since $X_1 = \{a,b\}$ is a biprefix code, by the above lemma it follows that also $X_2$ is biprefix, so that by induction one has that for all $n > 0$, $X_n$ is a biprefix code. Moreover, $\mathrm{Card}(X_n) = 2\,\mathrm{Card}(X_{n-1})$, which implies by iteration $\mathrm{Card}(X_n) = 2^n$. □

Let $\mathcal{A} = \{a,b\}$. We define the map $\psi : \mathcal{A}^* \to \mathrm{SBS}$ as $\psi(\varepsilon) = \varepsilon$, $\psi(a) = a$, $\psi(b) = b$, and for all $w \in \mathcal{A}^*$, $x \in \mathcal{A}$,
$$\psi(wx) = (x\psi(w))^{(-)}.$$

Lemma 7. For all $w, u \in \mathcal{A}^*$, $\psi(wu) \in \mathcal{A}^*\psi(w)$.

Proof. The proof is by induction on the length of $u$. If $u = \varepsilon$ the result is trivial. Let us then suppose $|u| > 0$. We can write $u = vx$ with $v \in \mathcal{A}^*$ and $x \in \mathcal{A}$. One has then $\psi(wvx) = (x\psi(wv))^{(-)} = \lambda x\psi(wv)$, with $\lambda \in \mathcal{A}^*$. The last equality is due to the fact that any word is a suffix of its palindrome left-closure. By the induction hypothesis $\psi(wv) = \lambda'\psi(w)$, $\lambda' \in \mathcal{A}^*$. Thus $\psi(wu) = \lambda x\lambda'\psi(w)$. □

Proposition 8. The map $\psi : \mathcal{A}^* \to \mathrm{SBS}$ is a bijection.

Proof. Since $\psi(\mathcal{A}^*) = \mathcal{X}$, from Theorem 4 one derives that $\psi$ is a surjection. Let us then prove that $\psi$ is an injection. Let $w_1, w_2 \in \mathcal{A}^*$ be such that $w_1 \ne w_2$ and $\psi(w_1) = \psi(w_2)$. We may always suppose $|w_1| \le |w_2|$. We have to consider the following two cases.

Case 1: $w_1$ is a proper prefix of $w_2$, i.e. $w_2 = w_1x\xi$, with $x \in \mathcal{A}$ and $\xi \in \mathcal{A}^*$. One has then by the previous lemma $\psi(w_2) = \lambda\psi(w_1x) = \lambda(x\psi(w_1))^{(-)} = \lambda\lambda'x\psi(w_1)$, for suitable $\lambda, \lambda' \in \mathcal{A}^*$. This contradicts the hypothesis that $\psi(w_1) = \psi(w_2)$.

Case 2: $w_1 = \rho x\xi$, $w_2 = \rho y\xi'$ with $\rho, \xi, \xi' \in \mathcal{A}^*$ and $x, y \in \mathcal{A}$, $x \ne y$. One has from the previous lemma $\psi(w_1) = \lambda\psi(\rho x)$, $\psi(w_2) = \lambda'\psi(\rho y)$, with $\lambda, \lambda' \in \mathcal{A}^*$. Since $\psi(\rho x) = (x\psi(\rho))^{(-)} = \mu x\psi(\rho)$ and $\psi(\rho y) = (y\psi(\rho))^{(-)} = \mu'y\psi(\rho)$, with $\mu, \mu' \in \mathcal{A}^*$, it follows that $\psi(w_1) = \lambda\mu x\psi(\rho)$ and $\psi(w_2) = \lambda'\mu'y\psi(\rho)$. Since $x \ne y$ it follows that $\psi(w_1) \ne \psi(w_2)$, which is a contradiction. □

Lemma 8. Let $w_1, w_2 \in \mathcal{A}^*$
be such that $\psi(w_2) = \lambda\psi(w_1)$, $\lambda \in \mathcal{A}^*$; then there exists $w' \in \mathcal{A}^*$ such that $w_2 = w_1w'$.

Proof. If $\lambda = \varepsilon$ one has $\psi(w_2) = \psi(w_1)$. Since $\psi$ is a bijection, $w_2 = w_1$, so that $w' = \varepsilon$. Let us then suppose $|\lambda| > 0$. We can write $\lambda = \mu x_1$, with $x_1 \in \mathcal{A}$. Thus $\psi(w_2) = \mu x_1\psi(w_1)$, so that $\psi(w_2)$ is a palindrome left-extension of $x_1\psi(w_1)$. Let $\sigma$ be the palindrome suffix of $\psi(w_2)$ of minimal length such that $|\sigma| \ge |x_1\psi(w_1)|$. One has then that $\sigma = (x_1\psi(w_1))^{(-)} = \psi(w_1x_1)$. Hence we can write $\psi(w_2) = \lambda_1\psi(w_1x_1)$, with $\lambda_1 \in \mathcal{A}^*$ and $|\lambda_1| < |\lambda|$. If $\lambda_1 = \varepsilon$, then $\psi(w_2) = \psi(w_1x_1)$ so that, since $\psi$ is a bijection, $w_2 = w_1x_1$. If $\lambda_1 \ne \varepsilon$, then, by using the above argument, one derives that there exist an integer $n$ and letters $x_2, \ldots, x_n$ such that $\psi(w_2) = \psi(w_1x_1 \cdots x_n) = \psi(w_1w')$, where $w' = x_1 \cdots x_n$. Since $\psi$ is a bijection it follows that $w_2 = w_1w'$. □

Corollary 3. If $X \subseteq \mathcal{A}^*$ is a prefix code, then $\psi(X)$ is a biprefix code.

Proof. Let $Y = \psi(X)$ and suppose that there exist $y_1, y_2 \in Y$ such that $y_2 = \lambda y_1$, $\lambda \in \mathcal{A}^*$. Let us set $y_1 = \psi(x_1)$ and $y_2 = \psi(x_2)$, $x_1, x_2 \in X$. One has then $\psi(x_2) = \lambda\psi(x_1)$. From the above lemma one has $x_2 = x_1x'$, $x' \in \mathcal{A}^*$. Since $X$ is a prefix code then $x' = \varepsilon$, so that $x_1 = x_2$. This implies $\psi(x_1) = \psi(x_2)$. Thus $Y$ is a suffix code. Since the words of $Y$ are palindromes one has that $Y$ is biprefix. □

Proposition 9. Let $w \in \mathcal{A}^*$. If $\psi(w)$ has the canonical representation $\psi(w) = PxyQ$, with $P, Q, PxyQ \in \mathrm{PAL}$, $x, y \in \mathcal{A}$, $|P| < |Q|$, then for any $k \ge 0$, $\psi(wx^k)$ and $\psi(wy^k)$ have the canonical representations
$$\psi(wx^k) = QyxP(xyQ)^k, \qquad \psi(wy^k) = PxyQ(yxP)^k.$$

Proof. The proof is by induction on the integer $k$. For $k = 0$ the result is trivial. Suppose that we have proved the assertion up to $k - 1$. Since $P(xyQ)^{k-1}$ and $(Pxy)^{k-1}Q$ are palindromes, one has by the inductive hypothesis and by Theorem 3 (cf.
Remark 1):
$$\psi(wx^k) = (x\psi(wx^{k-1}))^{(-)} = (x(Qyx)^{k-1}PxyQ)^{(-)} = (Qyx)^kPxyQ = QyxP(xyQ)^k,$$
and
$$\psi(wy^k) = (y\psi(wy^{k-1}))^{(-)} = (y(Pxy)^{k-1}QyxP)^{(-)} = (Pxy)^kQyxP = PxyQ(yxP)^k. \quad □$$

We can represent any word $w \in \mathcal{A}^*$ uniquely by a finite sequence $(h_1, h_2, \ldots, h_n)$ of integers, where $h_1 \ge 0$, $h_i > 0$ for $1 < i \le n$ and $w = a^{h_1}b^{h_2}a^{h_3} \cdots$. One has $|w| = \sum_{i=1}^{n} h_i$. We call such a representation of the words of $\mathcal{A}^*$ the integral representation.

Proposition 10. Let $w \in \mathcal{A}^*$ and let $(h_1, h_2, \ldots, h_n)$ be its integral representation. The standard words $\psi(w)ab$, $\psi(w)ba$ have, respectively, the directive sequences $(h_1, \ldots, h_n, 1)$, $(h_1, \ldots, h_{n-1}, h_n + 1)$ if $n$ is even, and, respectively, $(h_1, \ldots, h_{n-1}, h_n + 1)$, $(h_1, \ldots, h_n, 1)$ if $n$ is odd.

Proof. The proof is by induction on the length $n$ of the integral representation $(h_1, h_2, \ldots, h_n)$ of $w$. We shall first suppose that $h_1 > 0$.

Base of the induction. For $n = 1$ we have that $w = a^{h_1}$, so that $\psi(a^{h_1}) = a^{h_1}$. The standard words $a^{h_1}ab$ and $a^{h_1}ba$ have, as one easily verifies, the directive sequences $(h_1 + 1)$ and $(h_1, 1)$, respectively. We check the basis of the induction also for $n = 2$, for reasons which will be clear in the proof of the induction step. For $n = 2$, $\psi(a^{h_1}b^{h_2}) = (a^{h_1}b)^{h_2}a^{h_1}$, so that the two standard words $(a^{h_1}b)^{h_2}a^{h_1+1}b$ and $(a^{h_1}b)^{h_2+1}a$ have the directive sequences $(h_1, h_2, 1)$ and $(h_1, h_2 + 1)$, respectively.

The induction step. Suppose now that we have proved the assertion up to $n - 1 \ge 2$. Thus consider the sequence $(h_1, h_2, \ldots, h_{n-1})$. This is the integral representation of a word $w_1 \in \mathcal{A}^*$ such that $w = w_1x^{h_n}$, where $x = b$ (resp. $x = a$) if $n$ is even (resp. odd). By the inductive hypothesis the sequence $(h_1, h_2, \ldots, h_{n-1}, 1)$ is the directive sequence of a standard word $f$ having $\psi(w_1)$ as a prefix of length $|f| - 2$.
We can consider the sequence of standard words $f_0, f_1, \ldots, f_{n+1}$, where $f_0 = b$, $f_1 = a$, $f_s = f_{s-1}^{h_{s-1}}f_{s-2}$ for $2 \le s \le n$, $f_{n+1} = f_nf_{n-1}$, and $f = f_{n+1}$. Moreover, as one easily derives, one has $f = \psi(w_1)xy$, with $y \in \mathcal{A}$ and $y \ne x$. Since $n \ge 3$ we can write $f_n = Pyx$, $f_{n-1} = Qxy$, $f_{n+1} = PyxQxy$, with $P, Q, PyxQ \in \mathrm{PAL}$. Hence $\psi(w_1) = PyxQ$. Now by Theorem 3,
$$\psi(w_1x) = (xPyxQ)^{(-)} = (xQxyP)^{(-)} = PyxQxyP = PyxPyxQ,$$
so that $\psi(w_1x)xy = f_n^2f_{n-1}$. By iterating the same argument (cf. Proposition 9) one derives
$$\psi(w)xy = \psi(w_1x^{h_n})xy = f_n^{h_n+1}f_{n-1},$$
so that the standard word $\psi(w)xy = f_n^{h_n+1}f_{n-1}$ has the directive sequence $(h_1, \ldots, h_{n-1}, h_n + 1)$. Since the words $f_n$, $f_n^{h_n}f_{n-1}$, $f_n^{h_n+1}f_{n-1}$ are standard, one has also (cf. [7, Proposition 2])
$$(f_n^{h_n+1}f_{n-1})(xy)^{-1} = (f_n^{h_n}f_{n-1}f_n)(yx)^{-1}.$$
Therefore $\psi(w)yx = f_n^{h_n}f_{n-1}f_n$, which is a standard word having the directive sequence $(h_1, \ldots, h_{n-1}, h_n, 1)$. If $n$ is even, then $x = b$, so that $\psi(w)ba$ has the directive sequence $(h_1, \ldots, h_{n-1}, h_n + 1)$ and $\psi(w)ab$ has the directive sequence $(h_1, \ldots, h_{n-1}, h_n, 1)$. If $n$ is odd, then $x = a$ and the result follows in a similar way.

Let us now suppose that $h_1 = 0$, i.e. the integral representation of $w$ is $(0, h_2, \ldots, h_n)$. We first observe that $\hat w$ has the integral representation $(h_2, \ldots, h_n)$. By the above result the standard words $\psi(\hat w)ab = \widehat{\psi(w)ba}$ and $\psi(\hat w)ba = \widehat{\psi(w)ab}$ have, respectively, the directive sequences $(h_2, \ldots, h_n, 1)$ and $(h_2, \ldots, h_{n-1}, h_n + 1)$ if $n$ is odd and, respectively, $(h_2, \ldots, h_{n-1}, h_n + 1)$ and $(h_2, \ldots, h_n, 1)$ if $n$ is even. Hence if $n$ is odd (resp. even) $\psi(w)ba$ and $\psi(w)ab$ have the directive sequences $(0, h_2, \ldots, h_n, 1)$ (resp. $(0, h_2, \ldots, h_{n-1}, h_n + 1)$) and $(0, h_2, \ldots, h_{n-1}, h_n + 1)$ (resp. $(0, h_2, \ldots, h_n, 1)$). □

Corollary 4.
If $s \in \mathrm{Stand}$, then $s$ has a unique directive sequence.

Proof. Let $s \in \mathrm{Stand}$. Then $s = \psi(w)xy$ with $w \in \mathcal{A}^*$ and $x, y \in \mathcal{A}$, $x \ne y$. Let $(h_1, h_2, \ldots, h_n)$ be the integral representation of $w$. We suppose that $x = a$; the case $x = b$ is dealt with in a symmetric way. From the previous proposition $s$ has the directive sequence $(h_1, h_2, \ldots, h_n, 1)$ if $n$ is even and $(h_1, h_2, \ldots, h_{n-1}, h_n + 1)$ if $n$ is odd. Suppose now that $s$ has also the directive sequence $(k_1, k_2, \ldots, k_m)$. Since $s \in \mathcal{A}^*ab$, $m$ has to be an odd integer. If $k_m = 1$, let $w'$ be the word whose integral representation is $(k_1, k_2, \ldots, k_{m-1})$. From the preceding proposition $\psi(w')ab$ has the directive sequence $(k_1, k_2, \ldots, k_m)$. Hence $\psi(w)ab = \psi(w')ab$. This implies $\psi(w) = \psi(w')$. Since $\psi$ is injective one has $w = w'$. It follows that $(k_1, k_2, \ldots, k_{m-1}) = (h_1, h_2, \ldots, h_n)$. Thus $n = m - 1$ and $(k_1, k_2, \ldots, k_m) = (h_1, h_2, \ldots, h_n, 1)$. Suppose now that $k_m > 1$ and consider the word $w'$ whose integral representation is $(k_1, k_2, \ldots, k_{m-1}, k_m - 1)$. Since $m$ is odd, from the previous proposition $\psi(w')ab$ has the directive sequence $(k_1, k_2, \ldots, k_m)$. Thus $\psi(w)ab = \psi(w')ab$ and $w = w'$. This implies $(h_1, h_2, \ldots, h_n) = (k_1, k_2, \ldots, k_{m-1}, k_m - 1)$. Hence $n = m$ and $(k_1, k_2, \ldots, k_m) = (h_1, h_2, \ldots, h_{n-1}, h_n + 1)$. □

Let $\mathcal{A} = \{a,b\}$ and let $\mathcal{A}^{\omega}$ be the set of all infinite words on $\mathcal{A}$. We consider the subset $\mathcal{A}^{\omega}_{*}$ of $\mathcal{A}^{\omega}$ defined as $\mathcal{A}^{\omega}_{*} = \mathcal{A}^{\omega} \setminus \mathcal{A}^*\{a^{\omega}, b^{\omega}\}$. In other words $y \notin \mathcal{A}^{\omega}_{*}$ if and only if there exist a word $u \in \mathcal{A}^*$ and a letter $x \in \mathcal{A}$ such that $y = ux^{\omega} = uxxx \cdots x \cdots$. Hence any infinite word $x \in \mathcal{A}^{\omega}_{*}$ can be uniquely expressed as $x = a^{h_1}b^{h_2}a^{h_3} \cdots$ with $h_1 \ge 0$ and $h_i > 0$ for $i > 1$. We call the infinite sequence $(h_1, h_2, \ldots, h_n, \ldots)$ the integral representation of $x$. Let us now associate to each $x = x_1x_2 \cdots x_n \cdots \in \mathcal{A}^{\omega}_{*}$ the sequence of words $\{s_n\}_{n \ge 0}$ defined as
$$s_0 = \varepsilon, \qquad s_{n+1} = (x_{n+1}s_n)^{(-)}, \quad n \ge 0.$$
Since $s_n$ is a proper suffix and prefix of $s_{n+1}$ for any $n \ge 0$, the sequence $\{s_n\}_{n \ge 0}$ converges to an infinite word $s = \lim s_n$.
Thus one can introduce a map $\psi : \mathcal{A}^{\omega}_{*} \to \mathcal{A}^{\omega}$ which associates to $x \in \mathcal{A}^{\omega}_{*}$ the infinite word $\psi(x) = s$.

Theorem 5. Let $x \in \mathcal{A}^{\omega}_{*}$ and let $(h_1, h_2, \ldots, h_n, \ldots)$ be its integral representation. Then $\psi(x)$ is the infinite standard Sturmian word having the directive sequence $(h_1, h_2, \ldots, h_n, \ldots)$.

Proof. Let $x \in \mathcal{A}^{\omega}_{*}$ and let $(h_1, h_2, \ldots, h_n, \ldots)$ be its integral representation. Let $\psi(x) = s = \lim s_n$, where $s_0 = \varepsilon$ and $s_{n+1} = (x_{n+1}s_n)^{(-)}$, $n \ge 0$. We consider the subsequence $\{\sigma_n\}_{n > 0}$ of $\{s_n\}_{n \ge 0}$ defined for all $n > 0$ as $\sigma_n = s_{h_1 + h_2 + \cdots + h_n}$. Since for all $i > 0$, $\sigma_i$ is a proper prefix of $\sigma_{i+1}$ and $\sigma_i \in \mathrm{Pref}(s)$, one has $\lim \sigma_n = \lim s_n = s$. Let us now consider the sequence of standard words $\{t_n\}_{n > 0}$ defined as
$$t_{2n+1} = \sigma_{2n+1}ba, \qquad t_{2n} = \sigma_{2n}ab.$$
One derives from Proposition 10 that for each $n > 0$, $t_n$ has the directive sequence $(h_1, h_2, \ldots, h_n, 1)$. Let us consider now the infinite standard Sturmian word $y$ whose directive sequence is $(h_1, h_2, \ldots, h_n, \ldots)$. Thus there exists an infinite sequence of standard words $f_0, f_1, \ldots, f_n, \ldots$ such that $f_0 = b$, $f_1 = a$, $f_{n+1} = f_n^{h_n}f_{n-1}$, $n > 0$. It follows that for any $n > 0$, $t_n = f_{n+1}f_n$. Since $t_{n+1} = f_{n+2}f_{n+1} = f_{n+1}^{h_{n+1}}f_nf_{n+1}$ and $f_{n+1} = f_n^{h_n}f_{n-1}$, one derives that for any $n > 0$, $t_n$ is a proper prefix of $t_{n+1}$. This implies that $\lim t_n$ exists and, moreover, $t = \lim t_n = \lim \sigma_n = \lim s_n = s$. Since for any $n > 0$, $f_n \in \mathrm{Pref}(t)$, one has also $\lim f_n = y = t$. Hence $s = y$, which concludes our proof. □

Proposition 11. The map $\psi$ is a bijection $\psi : \mathcal{A}^{\omega}_{*} \to \mathrm{Stand}$.

Proof. From the preceding theorem one has $\psi(\mathcal{A}^{\omega}_{*}) = \mathrm{Stand}$, so that $\psi$ is a surjection. Let us now prove that $\psi$ is injective. Let $x, x' \in \mathcal{A}^{\omega}_{*}$ be such that $x \ne x'$. We denote by $n$ the minimal integer such that $x_n \ne x_n'$. Let $\psi(x) = s$ and $\psi(x') = s'$. One has then $s_n = (x_ns_{n-1})^{(-)}$, $s_n' = (x_n's_{n-1})^{(-)}$. Thus $s_n$ has the suffix $x_ns_{n-1}$ and, since $s_n$ and $s_{n-1}$ are palindromes, the prefix $s_{n-1}x_n$. Similarly, $s_n'$ has the prefix $s_{n-1}x_n'$. Hence $s$ has the prefix $s_{n-1}x_n$ and $s'$ the prefix $s_{n-1}x_n'$.
This shows that $s \ne s'$. □

In conclusion of this section we remark that a formalism in some respects similar to ours has been considered by Raney [15] for the study of some problems on continued fractions in relation with automata theory.

6. The Farey correspondence

We denote by $\mathrm{SBS}_{(a)}$ the set of all elements of SBS whose first letter is $a$, i.e. $\mathrm{SBS}_{(a)} = \mathrm{SBS} \cap a\mathcal{A}^*$. Similarly, $\mathrm{SBS}_{(b)}$ will be the set $\mathrm{SBS}_{(b)} = \mathrm{SBS} \cap b\mathcal{A}^*$. Hence $\mathrm{SBS} = \{\varepsilon\} \cup \mathrm{SBS}_{(a)} \cup \mathrm{SBS}_{(b)}$. One easily verifies that $s \in \mathrm{SBS}_{(a)}$ if and only if $\hat s \in \mathrm{SBS}_{(b)}$, so that the operation $(\hat{\ })$ determines a bijection of $\mathrm{SBS}_{(a)}$ in $\mathrm{SBS}_{(b)}$.

In the following we denote by $\mathcal{F}$ the set of all fractions $p/q$ such that $0 < p \le q$ and $\gcd(p,q) = 1$. We call $\mathcal{F}$ the set of Farey numbers.

Lemma 9. For any $s \in \mathrm{SBS}$ there exists a unique fraction $p/q \in \mathcal{F}$ such that $p, q \in \Pi(s)$, $p$ is the minimal period of $s$ and $|s| = p + q - 2$. The map $\eta : \mathrm{SBS} \to \mathcal{F}$ defined as $\eta(s) = p/q$ is a surjection. Moreover, for $s \ne \varepsilon$,
$$\eta(s) = (|s| - |Q|)/(|Q| + 2),$$
where $Q$ is the maximal proper palindrome suffix of $s$. The restrictions $\eta_a$ and $\eta_b$ of $\eta$, respectively, to $\mathrm{SBS}_{(a)} \cup \{\varepsilon\}$ and to $\mathrm{SBS}_{(b)} \cup \{\varepsilon\}$, are bijections.

Proof. Let $s \in \mathrm{SBS}$. If $s = \varepsilon$, then the unique fraction in $\mathcal{F}$ such that $|s| = 0 = p + q - 2$ is $p/q = 1/1$. Let us now suppose that $s = x^{|s|}$ with $x \in \{a,b\}$. The word $s$ has the minimal period $p = 1$. The unique period $q$ of $s$ for which $|s| = q - 1$ is then $q = |s| + 1$. In this case the maximal proper palindrome suffix of $s$ is $Q = x^{|s|-1}$, so that
$$p/q = 1/(|s| + 1) = (|s| - |Q|)/(|Q| + 2).$$
Let us now suppose that $\mathrm{Card}(\mathrm{alph}(s)) = 2$. By Lemma 4, $s$ can be uniquely represented as $s = PxyQ = QyxP$, with $P, Q \in \mathrm{PAL}$, $|P| < |Q|$ and $x, y \in \{a,b\}$, $x \ne y$. Moreover, $s$ has the periods $p = |P| + 2$, $q = |Q| + 2$ such that $\gcd(p,q) = 1$ and $|s| = p + q - 2$.
Since $|P| < |Q|$, $p$ is the minimal period of $s$ and $Q$ is the maximal proper palindrome suffix of $s$. We can write the ratio $p/q$, uniquely determined by $s$, as
$$p/q = (|P| + 2)/(|Q| + 2) = (|s| - |Q|)/(|Q| + 2).$$
Let now $p/q \in \mathcal{F}$. We want to show that there exist exactly two words $s, \hat s \in \mathrm{SBS}$ such that $\eta(s) = \eta(\hat s) = p/q$. If $p = 1$ then the words $s = a^{q-1}$ and $\hat s = b^{q-1}$ are, trivially, the only two words of SBS such that $\eta(s) = \eta(\hat s) = 1/q$. Let us then suppose $p > 1$. Since $\gcd(p,q) = 1$, from the theorem of Pedersen et al. [14] there exists a uniquely determined word $W \in \mathcal{C}$ such that $W = AB = Cab$, with $A, B, C \in \mathrm{PAL}$ and $|A| = p - 2$, $|B| = q + 2$. Thus $C$ is uniquely determined, $C \in \mathrm{PER}$ and has the periods $p$ and $q$ such that $|C| = p + q - 2$ (cf. [7]). By Theorem 2 and Lemma 4, $C$ can be uniquely expressed as $C = PxyQ = QyxP$, with $P, Q \in \mathrm{PAL}$, $x, y \in \{a,b\}$, $x \ne y$, $q = |Q| + 2$, $p = |P| + 2$. Moreover, from the above equation one derives $A = P$, $xy = ba$ and $B = baQab$. One has also, by Lemma 4, that $p$ is the minimal period of $C$. From the equation $W = AB = Cab$, and from the corresponding equation for the words obtained by interchanging $a$ and $b$, one derives that $C$ and $\hat C$ are the only two words in SBS such that $\eta(C) = \eta(\hat C) = p/q$. From this it follows, trivially, that the restrictions $\eta_a$ and $\eta_b$ of $\eta$ to $\mathrm{SBS}_{(a)} \cup \{\varepsilon\}$ and to $\mathrm{SBS}_{(b)} \cup \{\varepsilon\}$, respectively, are bijections. □

In the following we denote, for $s \in \mathrm{SBS}$, $\eta(s) = \|s\|$.

Lemma 10. Let $s \in \mathrm{SBS}$ be such that $\|s\| = p/q$. There exists a letter $x \in \{a,b\}$ such that
$$\|(xs)^{(-)}\| = p/(p+q) \quad\text{and}\quad \|(ys)^{(-)}\| = q/(p+q),$$
with $y \in \{a,b\}$ and $x \ne y$.

Proof. Let $s \in \mathrm{SBS}$ be such that $\|s\| = p/q$. If $s = \varepsilon$, then $\|\varepsilon\| = 1/1$, so that $(a\varepsilon)^{(-)} = a$ and $(b\varepsilon)^{(-)} = b$ and $\|(a\varepsilon)^{(-)}\| = \|(b\varepsilon)^{(-)}\| = 1/2$. If $\mathrm{Card}(\mathrm{alph}(s)) = 1$, then $s = x^{|s|}$ with $x \in \{a,b\}$ and $\|s\| = p/q = 1/(|s| + 1)$, i.e. $p = 1$ and $q = |s| + 1$. Thus $xs = x^{|s|+1}$ and $\|(xs)^{(-)}\| = 1/(|s| + 2) = p/(p+q)$. Moreover, in this case $(ys)^{(-)} = x^{|s|}yx^{|s|}$. Hence $\|(ys)^{(-)}\| = (|s| + 1)/(|s| + 2) = q/(p+q)$.

Let us now suppose that $\mathrm{Card}(\mathrm{alph}(s)) = 2$. By Lemma 4, $s$ can be canonically expressed as $s = PxyQ = QyxP$, with $P, Q \in \mathrm{PAL}$, $|P| < |Q|$, $x, y \in \{a,b\}$, $x \ne y$. Moreover, $p = |P| + 2$ and $q = |Q| + 2$. One has then $xs = xPxyQ$, $ys = yQyxP$, so that (cf. Proposition 9)
$$(xs)^{(-)} = QyxPxyQ = sxyQ, \qquad (ys)^{(-)} = PxyQyxP = syxP.$$
Hence, recalling that $|s| = p + q - 2$, it follows that $\|(xs)^{(-)}\| = q/(p+q)$ and $\|(ys)^{(-)}\| = p/(p+q)$. □

Corollary 5. If $s \in S_R$, then $\|s^{(-)}\| = p/q$ with $q \le |s| + 1$.

Proof. If $s \in \mathrm{SBS} = \mathrm{PER}$ then the result is trivial since $s^{(-)} = s$, so that $q - 2$ is the length of the maximal proper palindrome suffix of $s$. This length is $\le |s| - 1$. Thus $q \le |s| + 1$. Let us then suppose that $s \notin \mathrm{SBS}$. We can write $s$ as $s = \lambda xt$, $\lambda \in \mathcal{A}^*$, $x \in \mathcal{A}$, where $t \in \mathrm{SBS}$ is the maximal proper palindrome suffix of $s$. Hence from Theorem 3, $s^{(-)} = (xt)^{(-)}$. Let $\|s^{(-)}\| = p/q$ and $\|t\| = p'/q'$. By the preceding lemma $q = p' + q'$. Since $p' + q' - 2 = |t|$ it follows that $q = p' + q' = |t| + 2 = |xt| + 1 \le |s| + 1$. □

Let us now introduce for $x \in \{a,b\}$ the following maps $\pi_x, \rho_x : \mathrm{SBS}_{(x)} \to \mathrm{SBS}_{(x)}$, defined as follows: for $s \in \mathrm{SBS}_{(x)}$ such that $\|s\| = p/q$,
$$\pi_x(s) = \eta_x^{-1}(p/(p+q)), \qquad \rho_x(s) = \eta_x^{-1}(q/(p+q)).$$
Hence $\|\pi_x(s)\| = p/(p+q)$ and $\|\rho_x(s)\| = q/(p+q)$. By the above lemma, if $\|(xs)^{(-)}\| = q/(p+q)$, then $\pi_x(s) = (ys)^{(-)}$ and $\rho_x(s) = (xs)^{(-)}$ with $y \in \{a,b\}$, $x \ne y$. Conversely, if $\|(xs)^{(-)}\| = p/(p+q)$, then $\pi_x(s) = (xs)^{(-)}$ and $\rho_x(s) = (ys)^{(-)}$. Hence $\pi_x(s)$ is a palindrome left-extension of $s$ which 'saves' the minimal period $p$, whereas $\rho_x(s)$ 'saves' the period $q$.

For $k \ge 0$ let us define the map $\pi_x^{(k)}$ inductively: $\pi_x^{(0)}$ is the identity map and for $k > 0$, $\pi_x^{(k)} = \pi_x \circ \pi_x^{(k-1)}$, where $\circ$ denotes the composition of maps. For $s \in \mathrm{SBS}_{(x)}$ such that $\|s\| = p/q$ one has $\|\pi_x^{(k)}(s)\| = p/(kp + q)$, $k \ge 0$. Let us define for $k > 0$, $\delta_x^{(k)} = \rho_x \circ \pi_x^{(k-1)}$. One easily verifies, applying $\rho_x$ first and then $\pi_x^{(k-1)}$, that $\|\delta_x^{(k)}(s)\| = q/(kq + p)$.

Lemma 11. Let $s, t \in \mathrm{SBS}_{(x)}$, $x \in \{a,b\}$. If there exist $h, k > 0$ such that $\delta_x^{(k)}(s) = \delta_x^{(h)}(t)$, then $s = t$.
Proof. Suppose that $\|s\| = p/q$ and $\|t\| = p'/q'$. If there exist $h, k > 0$ such that $\delta_x^{(k)}(s) = \delta_x^{(h)}(t)$, then one has
$$\|\delta_x^{(k)}(s)\| = q/(kq + p) = \|\delta_x^{(h)}(t)\| = q'/(hq' + p').$$
Hence $q = q'$ and $p + kq = p' + hq'$. This implies $p = p' + (h - k)q$. If $h > k$ then $p > q$, which is absurd. If $h < k$, then since $p' \le q$ it follows that $p \le 0$, which is absurd. Thus the only possibility is $h = k$, i.e. $p = p'$. This implies $\|s\| = \|t\|$ and, from Lemma 9, $s = t$. □

Let $n$ be a positive integer. We define the set $\mathcal{F}_n$ as $\mathcal{F}_n = \{p/q \in \mathcal{F} \mid q \le n\}$. One has $\mathrm{Card}(\mathcal{F}_n) = F(n)$, where
$$F(n) = \sum_{i=1}^{n} \varphi(i).$$
If we order the elements of $\mathcal{F}_n$ in an increasing way we obtain a sequence of irreducible fractions called the Farey series of order $n$ and length $F(n)$ (cf. [18]). In the following table we report the Farey series for $n \le 5$:

n = 1:  1/1
n = 2:  1/2, 1/1
n = 3:  1/3, 1/2, 2/3, 1/1
n = 4:  1/4, 1/3, 1/2, 2/3, 3/4, 1/1
n = 5:  1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1

We have seen in Section 1 that $s_R(n) = F(n+1)$. Hence, for any $n > 0$ there exists a bijection of the set $\mathcal{F}_{n+1}$ in $S_R(n) = S_R \cap \mathcal{A}^n$. We shall determine a natural bijection $\Phi_n : \mathcal{F}_{n+1} \to S_R(n)$, that we call the Farey correspondence.

Lemma 12. For each $n > 0$, let $\mathcal{G}_n$ be the set $\mathcal{G}_n = \{p/q \in \mathcal{F}_{n+1} \mid p + q - 2 \ge n\}$. One has that $\mathrm{Card}(\mathcal{G}_n) = \tfrac12 F(n+1) = \tfrac12 s_R(n)$.

Proof. An element $p/q \in \mathcal{F}_{n+1}$ does not belong to $\mathcal{G}_n$ if and only if $p + q < n + 2$. Let us recall that the number of all pairs $(p,q)$ of positive integers such that $\gcd(p,q) = 1$ and $p + q = i$, for a fixed $i > 0$, is given by $\varphi(i)$. Let us now add the further condition that $p \le q$. If $2 < i \le n + 1$, the number of solutions of the previous equation is then $\tfrac12\varphi(i)$. If $i = 2$, then the only solution is $p = q = 1$. Since $1 = \tfrac12(\varphi(1) + \varphi(2))$ we can state that in any case the number of solutions of the inequality $p + q < n + 2$ under the above conditions is given by $\tfrac12\sum_{i=1}^{n+1}\varphi(i)$.
Hence $\mathrm{Card}(\mathcal{F}_{n+1} \setminus \mathcal{G}_n) = \tfrac12\sum_{i=1}^{n+1}\varphi(i)$ and $\mathrm{Card}(\mathcal{G}_n) = \mathrm{Card}(\mathcal{F}_{n+1}) - \mathrm{Card}(\mathcal{F}_{n+1} \setminus \mathcal{G}_n) = \tfrac12 s_R(n)$. □

For each $n > 0$ let us define the set $\Sigma_n = \eta_a^{-1}(\mathcal{G}_n)$. Since $\eta_a$ is a bijection one has from Lemma 12 that $\mathrm{Card}(\Sigma_n) = \mathrm{Card}(\mathcal{G}_n) = \tfrac12 s_R(n)$. We introduce also the sets $\eta_b^{-1}(\mathcal{G}_n) = \hat\Sigma_n$ and
$$\Delta_n = \Sigma_n \cup \hat\Sigma_n = \eta^{-1}(\mathcal{G}_n) = \{s \in \mathrm{SBS} \mid \|s\| \in \mathcal{G}_n\}.$$
Thus $s \in \Delta_n$ if and only if $s$ is a strictly bispecial element of St whose Farey number $p/q$ satisfies the condition $|s| = p + q - 2 \ge n$ with $q \le n + 1$.

Theorem 6. For each $n > 0$ let $S_R(n) = S_R \cap \mathcal{A}^n$. The map $f_n$ which associates to each $s \in \Delta_n$ its suffix of length $n$ is a bijection $f_n : \Delta_n \to S_R(n)$. Thus one has $(\mathcal{A}^*)^{-1}\Delta_n \cap \mathcal{A}^n = S_R(n)$ and, moreover, $\Delta_n = (S_R(n))^{(-)}$.

Proof. For each $s \in \Delta_n$, $f_n(s)$ gives the suffix of $s$ of length $n$. Thus $f_n(s) \in S_R(n)$. We want to prove that $f_n$ is a bijection. Since $\mathrm{Card}(\Delta_n) = \mathrm{Card}(\Sigma_n) + \mathrm{Card}(\hat\Sigma_n) = s_R(n)$, it is sufficient to prove that $f_n$ is a surjection. Indeed, for any $s \in S_R(n)$ one has from Corollary 5, $\|s^{(-)}\| = p/q$ with $q \le n + 1$. Since $|s^{(-)}| = p + q - 2 \ge n$ one has $s^{(-)} \in \Delta_n$ and $f_n(s^{(-)}) = s$. This proves that $f_n$ is a bijection and then $(\mathcal{A}^*)^{-1}\Delta_n \cap \mathcal{A}^n = S_R(n)$. As we have seen above, $(S_R(n))^{(-)} \subseteq \Delta_n$. Let now $t \in \Delta_n$. One has $f_n(t) = s \in S_R(n)$. Since $f_n(s^{(-)}) = s$ it follows, in view of the fact that $f_n$ is a bijection, that $t = s^{(-)}$. This shows $\Delta_n \subseteq (S_R(n))^{(-)}$ and then $\Delta_n = (S_R(n))^{(-)}$. □

Corollary 6. For each $n > 0$ one has $\Delta_n(\mathcal{A}^*)^{-1} \cap \mathcal{A}^n = S_L(n)$ and, moreover, $\Delta_n = (S_L(n))^{(+)}$.

Proof. From the previous theorem $(\mathcal{A}^*)^{-1}\Delta_n \cap \mathcal{A}^n = S_R(n)$. From Lemma 1, $\tilde S_R = S_L$. Moreover, $\tilde\Delta_n = \Delta_n$, since the words in $\Delta_n$ are palindromes. Hence by taking the mirror images of both sides of the previous relation it follows that $\Delta_n(\mathcal{A}^*)^{-1} \cap \mathcal{A}^n = S_L(n)$. Since for any $w \in \mathcal{A}^*$ one has $w^{(-)} = (\tilde w)^{(+)}$ it follows that
$$\Delta_n = (S_R(n))^{(-)} = (\tilde S_R(n))^{(+)} = (S_L(n))^{(+)}. \quad □$$

Let $i, j$ be two positive integers such that $i \le j$.
We introduce the following number-theoretic function $\phi_{[i,j]}$, that we call generalized Euler's function, defined as:

$$\phi_{[i,j]}(n) = \mathrm{Card}\{x \in [i, j] \mid \gcd(x, n) = 1\}.$$

In other words $\phi_{[i,j]}(n)$ gives the number of integers in the interval $[i, j]$ which are coprime with the integer $n$. One has, of course, that $\phi_{[1,n]}(n) = \phi_{[1,n-1]}(n) = \phi(n)$.

Lemma 13. Let $n > 1$ and $k$ be such that $k < n < 2k$. The number of pairs $(p, q)$ of positive integers such that

$$p + q = n, \quad 1 \le p \le q \le k, \quad \gcd(p, q) = 1,$$

is given by $\tfrac{1}{2}\phi_{[n-k,k]}(n)$.

Proof. Let us first count the number of pairs $(p, q)$ of positive integers such that $p + q = n$, $\gcd(p, q) = 1$ and $1 \le p, q \le k$. Since $q = n - p \le k$ one has $n - k \le p \le k$, and $\gcd(p, q) = \gcd(p, n)$, so that this number is $\phi_{[n-k,k]}(n)$. If we add the condition $p \le q$, then the number of the pairs $(p, q)$ becomes $\tfrac{1}{2}\phi_{[n-k,k]}(n)$. □

Proposition 12. For each $n > 0$, $A_n$ is a biprefix code. The minimal length $l_{\min}$ and the maximal length $l_{\max}$ of the words of $A_n$ are

$$l_{\min} = n, \quad l_{\max} = 2n - 1.$$

Moreover, for each $h \in [0, n-1]$

$$\mathrm{Card}(A_n \cap \mathcal{A}^{n+h}) = \phi_{[h+1,n+1]}(n + h + 2).$$

Proof. From Theorem 6, for each $n > 0$, $A_n = (S_R(n))^{(-)}$. Since $S_R(n)$ is, trivially, a biprefix code then from Lemma 6 it follows that for each $n > 0$, $A_n$ is a biprefix code. Let $s \in A_n$ and $\|s\| = p/q$ its Farey number. One has $|s| = p + q - 2 \ge n$ and $q \le n + 1$. Since $p \le q - 1$ it follows $|s| = p + q - 2 \le 2q - 3 \le 2n - 1$. Hence $n \le |s| \le 2n - 1$. To achieve the result it is sufficient to observe that $a^n, a^{n-1}ba^{n-1} \in A_n$; indeed, $a^n, a^{n-1}ba^{n-1} \in SBS$, $\|a^n\| = 1/(n+1)$, $\|a^{n-1}ba^{n-1}\| = n/(n+1)$ and $|a^n| = n$, $|a^{n-1}ba^{n-1}| = 2n - 1$.

Let $s \in A_n \cap \mathcal{A}^{n+h}$ with $h \in [0, n-1]$. One has $|s| = p + q - 2 = n + h$. Hence

$$p + q = n + h + 2, \quad 1 \le p \le q \le n + 1, \quad \gcd(p, q) = 1.$$

By the previous lemma the number of pairs $(p, q)$ of positive integers for which the above condition is satisfied is given by $\tfrac{1}{2}\phi_{[h+1,n+1]}(n + h + 2)$. Since to each Farey number $p/q \in \mathcal{G}_n$ correspond two words, namely $s$ and $\hat{s}$, having the Farey number $p/q$, one derives:

$$\mathrm{Card}(A_n \cap \mathcal{A}^{n+h}) = \phi_{[h+1,n+1]}(n + h + 2).$$ □

Let us observe that $\mathrm{Card}(A_n) = F(n+1)$, so that the following identity holds for any $n > 0$:

$$\sum_{r=1}^{n} \phi_{[r,n+1]}(n + r + 1) = \sum_{i=1}^{n+1} \phi(i).$$
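The counting statements above (Lemma 13, Proposition 12 and the final identity) lend themselves to a brute-force check. The following is a minimal sketch rather than part of the paper; the function names `phi_interval`, `farey_length`, `count_pairs` and `farey_code_sizes` are hypothetical helpers chosen for this illustration.

```python
from math import gcd


def phi_interval(i, j, n):
    """Generalized Euler function: number of x in [i, j] coprime with n."""
    return sum(1 for x in range(i, j + 1) if gcd(x, n) == 1)


def farey_length(n):
    """F(n) = phi(1) + ... + phi(n), using phi(m) = phi_interval(1, m, m)."""
    return sum(phi_interval(1, m, m) for m in range(1, n + 1))


def count_pairs(n, k):
    """Brute-force count of the pairs (p, q) considered in Lemma 13."""
    return sum(
        1
        for p in range(1, k + 1)
        for q in range(p, k + 1)
        if p + q == n and gcd(p, q) == 1
    )


def farey_code_sizes(n):
    """Number of words of length n + h in the n-Farey code, for h = 0..n-1,
    as given by the formula of Proposition 12."""
    return [phi_interval(h + 1, n + 1, n + h + 2) for h in range(n)]
```

For instance, `farey_code_sizes(6)` evaluates to `[4, 4, 2, 4, 2, 2]`, whose sum is 18, the length of the Farey series of order 7.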
For each $n > 0$ we shall call $A_n$ the $n$-Farey code.

Let us consider now the subset $S'_R(n)$ of $S_R(n)$ defined as

$$S'_R(n) = (\mathcal{A}^*)^{-1}\Sigma_n \cap \mathcal{A}^n.$$

One has that $\mathrm{Card}(S'_R(n)) = \tfrac{1}{2}s_R(n)$. Moreover, if $S''_R(n) = (\mathcal{A}^*)^{-1}\hat{\Sigma}_n \cap \mathcal{A}^n$ one has $S''_R(n) = \hat{S}'_R(n)$. Let us introduce now the map $\Phi'_n : \mathcal{G}_n \to S'_R(n)$ which sends each $p/q \in \mathcal{G}_n$ to the suffix of length $n$ of $\eta_a^{-1}(p/q)$, i.e. $\Phi'_n = f_n \circ \eta_a^{-1}$. The map $\Phi'_n$ is a bijection that we call the first restricted Farey correspondence.

Example. Let us consider the Farey series of order $n = 7$:

1/7, 1/6, 1/5, 1/4, 2/7, 1/3, 2/5, 3/7, 1/2, 4/7, 3/5, 2/3, 5/7, 3/4, 4/5, 5/6, 6/7, 1/1

The length of the Farey series is 18. We construct the first restricted Farey correspondence $\Phi'_6 : \mathcal{G}_6 \to S'_R(6)$. The set $\mathcal{G}_6$ is formed by the following 9 elements ordered in an increasing way:

1/7, 2/7, 3/7, 4/7, 3/5, 5/7, 4/5, 5/6, 6/7

In Table 1 we report in the first column the elements of $\mathcal{G}_6$, in the second column the values of $\eta_a^{-1}$, in the third column their lengths and in the fourth column the values of the first restricted Farey correspondence.

Table 1

Farey number   s. bisp. element   Length   Special element
1/7            aaaaaa             6        aaaaaa
2/7            abababa            7        bababa
3/7            aabaabaa           8        baabaa
4/7            aabaaabaa          9        aaabaa
3/5            abaaba             6        abaaba
5/7            ababaababa         10       aababa
4/5            aaabaaa            7        aabaaa
5/6            aaaabaaaa          9        abaaaa
6/7            aaaaabaaaaa        11       baaaaa

Let us set for each $n > 0$, $\mathcal{G}'_n = \mathcal{F}_{n+1} \setminus \mathcal{G}_n$. Let $Y_n$ be the set:

$$Y_n = \eta_b^{-1}(\mathcal{G}'_n).$$

We introduce the map $g_n$ defined in $Y_n$ as follows: for any $s \in Y_n$

$$g_n(s) = \delta^{(k(s))}(s),$$

where

$$k(s) = \min\{k \in \mathbb{N}_+ \mid |\delta^{(k)}(s)| \ge n\}.$$

Theorem 7. For each $n > 0$, $g_n$ is a bijection $g_n : Y_n \to \hat{\Sigma}_n$.

Proof. From Lemma 12 one has that for each $n > 0$, $\mathrm{Card}(\mathcal{G}'_n) = \tfrac{1}{2}s_R(n)$, so that $\mathrm{Card}(Y_n) = \mathrm{Card}(\mathcal{G}'_n) = \tfrac{1}{2}s_R(n)$. Moreover, for any $k > 0$, the operator $\delta^{(k)}$ is expressed through the operators introduced in the previous sections (the explicit formula is illegible in the source). Let $s \in Y_n$ be such that $\|s\| = p/q$. One has then $\|\delta^{(k)}(s)\| = q/(p + kq)$. Let $k = k(s)$; from the definition of $k(s)$ it follows that $p + (k+1)q - 2 \ge n$ and $p + kq - 2 < n$. Hence $p + kq \le n + 1$. This implies that $g_n(s) \in A_n$.
Since $g_n(s)$ is a palindrome left-extension of $s$ and $s$ terminates with the letter $b$, then $g_n(s)$ will begin with the letter $b$. Thus $g_n(s) \in \hat{\Sigma}_n$. From Lemma 11 it follows that $g_n$ is injective. Since $\mathrm{Card}(\hat{\Sigma}_n) = \mathrm{Card}(Y_n)$ one has that $g_n$ is a bijection. □

Let us introduce the map $\Phi''_n : \mathcal{G}'_n \to S''_R(n)$ which sends each $p/q \in \mathcal{G}'_n$ to the suffix of length $n$ of $g_n(\eta_b^{-1}(p/q))$, i.e. $\Phi''_n = f_n \circ g_n \circ \eta_b^{-1}$. The map $\Phi''_n$ is a bijection that we call the second restricted Farey correspondence. Hence the Farey correspondence $\Phi_n : \mathcal{F}_{n+1} \to S_R(n)$ is completely determined by the restrictions $\Phi'_n$ and $\Phi''_n$.

Example. We report in Table 2 the second restricted Farey correspondence $\Phi''_6 : \mathcal{G}'_6 \to S''_R(6)$ in the case $n = 6$. The elements of $\mathcal{G}'_6$ ordered in an increasing way are reported in the first column. In the second column there are the values of the function $\eta_b^{-1}$, in the third column the values of $g_6$, in the fourth column their Farey numbers. In the fifth column the values of $\Phi''_6$.

Table 2

Farey number   s. bisp. element   g_6           Farey number   Special element
1/6            bbbbb              bbbbbabbbbb   6/7            abbbbb
1/5            bbbb               bbbbabbbb     5/6            babbbb
1/4            bbb                bbbabbb       4/5            bbabbb
1/3            bb                 bbabbabb      3/7            abbabb
2/5            babab              bababbabab    5/7            bbabab
1/2            b                  bababab       2/7            ababab
2/3            bab                babbab        3/5            babbab
3/4            bbabb              bbabbbabb     4/7            bbbabb
1/1            ε                  bbbbbb        1/7            bbbbbb

7. Farey numbers and standard words

We have seen in the previous sections that there exists a bijection $\psi : \mathcal{A}^* \to SBS$ and a surjection $\eta : SBS \to \mathcal{F}$. This latter becomes a bijection if $\eta$ is restricted to $\{\varepsilon\} \cup SBS_{(a)}$ (or to $\{\varepsilon\} \cup SBS_{(b)}$). Hence any strictly bispecial element of St can be 'codified', up to the automorphism $(\hat{\ })$, by a Farey number. Moreover, any binary word faithfully represents a strictly bispecial element. If we consider the restriction $\psi_a$ of $\psi$ to $a\mathcal{A}^* \cup \{\varepsilon\}$, then we obtain a bijection $\psi_a : a\mathcal{A}^* \cup \{\varepsilon\} \to SBS_{(a)} \cup \{\varepsilon\}$.
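The two maps just recalled can be made concrete with a short sketch. It assumes, as used later in this section, that $\psi(\varepsilon) = \varepsilon$ and $\psi(wx) = (x\psi(w))^{(-)}$, and it assumes that the Farey number $\|s\| = p/q$ of a strictly bispecial word can be read off as $p$ = minimal period of $s$ and $q = |s| + 2 - p$, which agrees with the values $\|a^n\| = 1/(n+1)$ and $\|a^{n-1}ba^{n-1}\| = n/(n+1)$ quoted earlier. The names `left_closure`, `psi` and `farey_number` are hypothetical.

```python
def left_closure(u):
    """Palindrome left-closure u^(-): shortest palindrome having u as a suffix."""
    # Take the longest palindromic prefix of u and mirror the remainder in front.
    for k in range(len(u), -1, -1):
        prefix = u[:k]
        if prefix == prefix[::-1]:
            return u[k:][::-1] + u


def psi(w):
    """psi(empty) = empty and psi(w x) = (x psi(w))^(-) for each letter x."""
    s = ""
    for x in w:
        s = left_closure(x + s)
    return s


def farey_number(s):
    """Return (p, q): p is the minimal period of s and q = |s| + 2 - p.

    Assumption of this sketch: s is strictly bispecial (or empty), so that
    p/q is its Farey number."""
    n = len(s)
    p = next(
        d for d in range(1, n + 2)
        if all(s[i] == s[i + d] for i in range(n - d))
    )
    return p, n + 2 - p
```

With these helpers, `psi("aba")` gives `"abaaba"`, whose Farey number is `(3, 5)`, matching the row 3/5 of Table 1.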
Hence the composition $\zeta = \psi_a \eta_a$ (first $\psi_a$, then $\eta_a$) is a bijection $\zeta : a\mathcal{A}^* \cup \{\varepsilon\} \to \mathcal{F}$. We shall see in this section some remarkable consequences of the previous correspondences.

Proposition 13. The set $\mathcal{F}$ of Farey numbers is the smallest subset $Y$ of the set $\mathbb{Q}$ of rational numbers which contains $\frac{1}{1}$ and such that

$$\frac{p}{q} \in Y \implies \frac{p}{p+q},\ \frac{q}{p+q} \in Y.$$

Proof. It is trivial by Lemma 10 that $\mathcal{F} \supseteq Y$. The proof that $\mathcal{F} \subseteq Y$ is then obtained by induction on the length $n = |\zeta^{-1}(f)|$ with $f \in \mathcal{F}$. If $n = 0$ the result is trivial since $\psi(\varepsilon) = \varepsilon$ and $\eta(\varepsilon) = \frac{1}{1}$. Let us then suppose that the assertion is true up to $n - 1$ and consider a word $w \in a\mathcal{A}^*$ of length $n$ and such that $\|\psi(w)\| = f \in \mathcal{F}$. We can write $w = ux$, $x \in \mathcal{A}$. Thus $|u| = n - 1$, so that by the inductive hypothesis $\|\psi(u)\| = p/q \in Y$. Now $\psi(ux) = (x\psi(u))^{(-)}$. By Lemma 10 it follows that $f = \|\psi(w)\|$ is either $p/(p+q)$ or $q/(p+q)$. But these fractions belong to $Y$ because of the induction hypothesis. □

The words of the set $\{\varepsilon\} \cup a\mathcal{A}^*$ can be represented by the vertices of a binary tree, where words are ordered lexicographically. Words of smaller length are to the left of words of greater length. The root represents the empty word $\varepsilon$. The subtree having the root $a$ is a complete binary tree. The edges represent the 'covering' relation relative to the prefixial ordering, i.e. there is an edge from $u$ to $v$ if and only if there is a letter $x \in \mathcal{A}$ such that $v = ux$. In view of the previous bijections one can associate with each vertex a strictly bispecial element of St and a Farey number (see Fig. 1). If a vertex denotes a word $w$, then its corresponding Farey number is $p/q = \|\psi(w)\|$. If $w_1$ and $w_2$ are 'son' vertices of $w$, then their Farey numbers will be $p/(p+q)$ and $q/(p+q)$. If $s = \psi(w)$, $s_1 = \psi(w_1)$, $s_2 = \psi(w_2)$, then $s_1$ and $s_2$ are obtained from $s$ by the left-palindrome closures of $as$ and $bs$.

Fig. 1.
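The arithmetic side of this tree (Proposition 13 and the rule for the 'son' vertices) can be explored level by level. The sketch below is only an illustration under the stated rule $p/q \mapsto p/(p+q),\ q/(p+q)$ starting from $1/1$; the function names are hypothetical, and fractions are kept as pairs `(p, q)` so that $1/1$ stays visible as such.

```python
from math import gcd


def farey_tree_levels(depth):
    """Levels 0..depth of the fractions generated from 1/1 by the rule
    p/q -> p/(p+q), q/(p+q)."""
    level = [(1, 1)]
    levels = [level]
    for _ in range(depth):
        nxt = []
        for p, q in level:
            for child in ((p, p + q), (q, p + q)):
                if child not in nxt:  # the root 1/1 has the single son 1/2
                    nxt.append(child)
        level = nxt
        levels.append(level)
    return levels


def generated_up_to(n):
    """Generated fractions with denominator <= n. Depth n - 1 suffices,
    because the denominator grows by at least 1 at every level."""
    return {(p, q) for lvl in farey_tree_levels(n - 1) for (p, q) in lvl if q <= n}


def irreducible_fractions(n):
    """All irreducible p/q with 1 <= p <= q <= n (Farey series of order n)."""
    return {(p, q) for q in range(1, n + 1) for p in range(1, q + 1) if gcd(p, q) == 1}
```

Comparing `generated_up_to(7)` with `irreducible_fractions(7)` gives the same 18 fractions, in line with the length of the Farey series of order 7 used in the examples.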
We call the above tree the Farey tree. Let us consider a set of vertices on the Farey tree representing a prefix code $X \subseteq \mathcal{A}^*$. As we have seen in Section 5 the set $\psi(X)$ of the corresponding strictly bispecial elements is a biprefix code. This result is easily interpreted by the Farey tree. In fact, any element $y_1 \in \psi(X)$ cannot be derived from any other element $y_2 \in \psi(X)$, $y_2 \ne y_1$, by left palindrome closures, so that $\psi(X)$ is a prefix code and then a biprefix code.

For a finite sequence $(a_1, \ldots, a_n)$ of integers such that $a_i > 0$, $1 \le i < n$, and $a_n \ge 0$, we set $\langle a_1, \ldots, a_n \rangle$ equal to the continued fraction $[0, a_1, \ldots, a_{n-1}, a_n + 1]$, i.e.:

$$\langle a_1, \ldots, a_n \rangle = \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_{n-1} + \cfrac{1}{a_n + 1}}}}$$

Theorem 8. Let $w$ be a word on the alphabet $\mathcal{A} = \{a, b\}$ and $\psi(w)$ be the corresponding strictly bispecial element. If $(h_1, \ldots, h_n)$ is the integral representation of $w$, then the Farey number $\|\psi(w)\|$ has a development in continued fractions given by $\langle h_n, \ldots, h_1 \rangle$.

Proof. The proof is by induction on the length $n$ of the integral representation $(h_1, \ldots, h_n)$ of $w$. Let us first suppose that $h_1 > 0$.

Base of the induction. In the case $n = 1$ the word $w = a^{h_1}$ and $\psi(a^{h_1}) = a^{h_1}$, whose Farey number is $1/(h_1 + 1)$. In this case the result is trivial. Suppose $n = 2$, so that $w = a^{h_1}b^{h_2}$. One has $\psi(a^{h_1}b^{h_2}) = (a^{h_1}b)^{h_2}a^{h_1}$. Now

$$\|\psi(w)\| = \frac{h_1 + 1}{h_2(h_1 + 1) + 1} = \cfrac{1}{h_2 + \cfrac{1}{h_1 + 1}} = \langle h_2, h_1 \rangle.$$

Inductive step. Let $n > 2$ and write $w = w'x^{h_n}$, $x \in \mathcal{A}$, where $(h_1, \ldots, h_{n-1})$ is the integral representation of $w'$. Thus by the inductive hypothesis $\|\psi(w')\| = \langle h_{n-1}, \ldots, h_1 \rangle$. Let $P'xyQ'$ with $P', Q' \in PAL$, $|P'| < |Q'|$, $x, y \in \mathcal{A}$, $x \ne y$, be the canonical representation of $\psi(w)$. We first prove that if $n$ is even, then the intermediate word $xy = ab$, and if $n$ is an odd integer $> 1$, then $xy = ba$. Indeed, for $n = 2$, $\psi(w) = (a^{h_1}b)^{h_2}a^{h_1}$, so that $P' = a^{h_1 - 1}$, $Q' = (a^{h_1}b)^{h_2 - 1}a^{h_1}$ and $xy = ab$.
Suppose the assertion is true up to $n - 1$. We want to prove it for $n$. Since $w = w'x^{h_n}$, if $n$ is odd then $x = a$ and by induction $\psi(w') = PabQ$ with $P, Q \in PAL$, $|P| < |Q|$. On the contrary, if $n$ is even then $x = b$ and by induction $\psi(w') = PbaQ$. By Proposition 9, one has that in the first case the intermediate word of $\psi(w)$ is $ba$ and in the second case is $ab$.

Let $n$ be any integer greater than 2. By the above result one has that the intermediate word of $\psi(w')$ is $xy$, so that $\psi(w') = PxyQ$. By Proposition 9 one has that

$$\psi(w) = \psi(w'x^{h_n}) = (Qyx)^{h_n}PxyQ,$$

so that setting $p = |P| + 2$, $q = |Q| + 2$ one derives:

$$\|\psi(w)\| = \frac{q}{p + h_n q} = \cfrac{1}{h_n + \cfrac{p}{q}}.$$

By the inductive hypothesis $p/q = \|\psi(w')\| = \langle h_{n-1}, \ldots, h_1 \rangle$, so that $\|\psi(w)\| = \langle h_n, \ldots, h_1 \rangle$.

Let us now suppose $h_1 = 0$. In this case the word $\hat{w}$ has the integral representation $(h_2, \ldots, h_n)$, so that by the above result $\|\psi(\hat{w})\| = \langle h_n, \ldots, h_2 \rangle$. Since $\langle h_n, \ldots, h_2 \rangle = \langle h_n, \ldots, h_2, 0 \rangle$ and $\|\psi(w)\| = \|\psi(\hat{w})\|$, the result follows. □

Let FC be the set of all continued fractions $\langle h_1, \ldots, h_n \rangle$ with $n > 0$ and $h_i > 0$, $i \in [1, n]$. We can introduce in FC a product operation $\circ$ defined as follows. If $x = \langle h_1, \ldots, h_n \rangle$ and $y = \langle k_1, \ldots, k_m \rangle$ are in FC, then:

$$x \circ y = \begin{cases} \langle k_1, \ldots, k_m, h_1, \ldots, h_n \rangle & \text{if } n \text{ is even}, \\ \langle k_1, \ldots, k_{m-1}, k_m + h_1, h_2, \ldots, h_n \rangle & \text{if } n \text{ is odd}. \end{cases}$$

One can easily verify that the above product is associative, so that FC is a semigroup. Let FC' be the monoid obtained from FC by adding to FC an identity element 1. From the theory of continued fractions one has that any Farey number can be faithfully represented by one element of FC'. We shall denote by $\sigma : \mathcal{F} \to FC'$ this bijection (to the fraction $\frac{1}{1}$ corresponds the identity of FC'). Thus the monoid operation in FC' can be naturally transmitted to $\mathcal{F}$ by defining for $x, y \in \mathcal{F}$ the product $x \circ y$ as

$$x \circ y = \sigma^{-1}(\sigma(x) \circ \sigma(y)).$$

Hence $\mathcal{F}$ is a monoid that we call the Farey monoid.

Proposition 14.
The map $\zeta : a\mathcal{A}^* \cup \{\varepsilon\} \to \mathcal{F}$ is a monoid isomorphism.

Proof. Let $w_1, w_2 \in a\mathcal{A}^* \cup \{\varepsilon\}$. We want to prove that $\zeta(w_1 w_2) = \zeta(w_1) \circ \zeta(w_2)$. The result is trivial if $w_1$ or $w_2$ is the empty word. Let us then suppose $w_1, w_2 \in a\mathcal{A}^*$. Let us denote, respectively, by $(h_1, \ldots, h_n)$ and $(k_1, \ldots, k_m)$ the integral representations of $w_1$ and $w_2$. By Theorem 8,

$$\|\psi(w_1)\| = \langle h_n, \ldots, h_1 \rangle, \quad \|\psi(w_2)\| = \langle k_m, \ldots, k_1 \rangle,$$

where we identify a continued fraction with its value. The word $w_1 w_2$ has the integral representation $(h_1, \ldots, h_n, k_1, \ldots, k_m)$ if $n$ is even and $(h_1, \ldots, h_{n-1}, h_n + k_1, k_2, \ldots, k_m)$ if $n$ is odd. Thus if $n$ is even

$$\zeta(w_1 w_2) = \|\psi(w_1 w_2)\| = \langle k_m, \ldots, k_1, h_n, \ldots, h_1 \rangle = \langle h_n, \ldots, h_1 \rangle \circ \langle k_m, \ldots, k_1 \rangle = \|\psi(w_1)\| \circ \|\psi(w_2)\| = \zeta(w_1) \circ \zeta(w_2).$$

If $n$ is odd, then

$$\zeta(w_1 w_2) = \|\psi(w_1 w_2)\| = \langle k_m, \ldots, k_2, k_1 + h_n, h_{n-1}, \ldots, h_1 \rangle = \langle h_n, \ldots, h_1 \rangle \circ \langle k_m, \ldots, k_1 \rangle = \|\psi(w_1)\| \circ \|\psi(w_2)\| = \zeta(w_1) \circ \zeta(w_2).$$

Since $\zeta$ is a bijection, the result follows. □

We have seen in Section 5 that any infinite standard Sturmian word $s$ is the limit of a sequence $\{s_n\}_{n \ge 0}$ of words defined as

$$s_0 = \varepsilon, \quad s_{n+1} = (x_{n+1}s_n)^{(-)}, \quad n \ge 0,$$

with $x_{n+1} \in \mathcal{A}$. Let us remark that if $s_1 = x_1 = x \in \{a, b\}$, then for all $n > 0$, $s_n \in SBS_{(x)}$. By Lemma 9 the above infinite sequence is completely determined, up to the automorphism $(\hat{\ })$, by the sequence of Farey numbers:

$$f_0, f_1, \ldots, f_n, \ldots,$$

where for any $n \ge 0$, $f_n = \|s_n\|$. Let us set for any $n \ge 0$, $f_n = p_n/q_n$, with $p_n \le q_n$ and $\gcd(p_n, q_n) = 1$. One has then $f_0 = \frac{1}{1}$ and by Lemma 10, for any $n$,

$$f_{n+1} = \frac{p_n}{p_n + q_n} \quad \text{or} \quad f_{n+1} = \frac{q_n}{p_n + q_n}.$$

In this way we obtain a complete arithmetical description of standard words.

References

[1] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[2] J. Berstel, P. Séébold, A remark on morphic Sturmian words, R.A.I.R.O., I.T. 28 (1994) 255-263.
[3] T.C.
Brown, Descriptions of the characteristic sequence of an irrational, Canad. Math. Bull. 36 (1993) 15-21.
[4] E.B. Christoffel, Observatio arithmetica, Ann. Mat. Pura Appl. 6 (1875) 145-152.
[5] A. de Luca, A combinatorial property of the Fibonacci word, Inform. Process. Lett. 12 (1981) 193-195.
[6] A. de Luca, Sturmian words: new combinatorial results, in: J. Almeida, G.M.S. Gomes, P.V. Silva (Eds.), Proc. Conf. on Semigroups, Automata and Languages, Porto, 1994, World Scientific, Singapore, 1996, pp. 67-83.
[7] A. de Luca, F. Mignosi, On some combinatorial properties of Sturmian words, Theoret. Comput. Sci. 136 (1994) 361-385.
[8] S. Dulucq, D. Gouyou-Beauchamps, Sur les facteurs des suites de Sturm, Theoret. Comput. Sci. 71 (1990) 381-400.
[9] N.J. Fine, H.S. Wilf, Uniqueness theorems for periodic functions, Proc. Amer. Math. Soc. 16 (1965) 109-114.
[10] G.A. Hedlund, M. Morse, Sturmian sequences, Amer. J. Math. 61 (1940) 605-620.
[11] M. Lothaire, Combinatorics on Words, Addison-Wesley, Reading, MA, 1983.
[12] F. Mignosi, Infinite words with linear subword complexity, Theoret. Comput. Sci. 65 (1989) 221-242.
[13] F. Mignosi, On the number of factors of Sturmian words, Theoret. Comput. Sci. 82 (1991) 71-84.
[14] A. Pedersen, Solution of Problem E 3156, Amer. Math. Monthly 95 (1988) 954-955.
[15] G.N. Raney, On continued fractions and finite automata, Math. Ann. 206 (1973) 265-283.
[16] G. Rauzy, Mots infinis en arithmétique, in: M. Nivat, D. Perrin (Eds.), Automata on Infinite Words, Lecture Notes in Computer Science, vol. 192, Springer, Berlin, 1984, pp. 165-171.
[17] R.M. Robinson, Problem E3156, Amer. Math. Monthly 93 (1986) 482.
[18] B.A. Venkov, Elementary Number Theory, Wolters-Noordhoff, Pays-Bas, 1970.