2015

Abelian Repetitions in Sturmian Words

Gabriele Fici (1), Alessio Langiu (2), Thierry Lecroq (3), Arnaud Lefebvre (3), Filippo Mignosi (4), Élise Prieur-Gaston (3)

(1) Università di Palermo, Italy
(2) King's College London, UK
(3) Normandie Université, Université de Rouen, LITIS EA 4108, France
(4) Università dell'Aquila, Italy

SeqBio 2013, November 25th-26th 2013, Montpellier, France

TL (LITIS) Abelian Periods SeqBio 2013 1 / 27

2nd International Conference on Algorithms for Big Data
Sala Gialla (Yellow Room) of Palazzo dei Normanni, Piazza del Parlamento, Palermo, Italy, 07-09 April 2014

Outline

1 Introduction
2 Sturmian words and abelian repetitions

1 Introduction

Notation and definitions

Given a word $w = w[1 .. n]$ of length $n$ over an alphabet $\Sigma = \{a_1, \ldots, a_\sigma\}$ of cardinality $\sigma$, we denote by:
  $w[i]$ its $i$-th symbol
  $w[i .. j]$ the factor from the $i$-th to the $j$-th symbol
  $|w|_a$ the number of occurrences of symbol $a$ in $w$
  $P_w = (|w|_{a_1}, \ldots, |w|_{a_\sigma})$ its Parikh vector

Remarks on Parikh vectors

Consider $P_w = (|w|_{a_1}, \ldots, |w|_{a_\sigma})$. Then:
  $P_w[i] = |w|_{a_i}$
  $|P_w| = \sum_{i=1}^{\sigma} P_w[i] = |w|$
  $P_w \subset Q$ iff $P_w[i] \le Q[i]$ for every $1 \le i \le \sigma$ and $|P_w| < |Q|$

Example: $P_{\mathrm{term}} \subset P_{\mathrm{remote}} \subset P_{\mathrm{montpellier}}$

Abelian periods

[Constantinescu and Ilie, 2006] introduced the notion of Abelian period.

Definition. A word $w$ has Abelian period $(h, p)$ iff $w = u_0 u_1 \cdots u_{k-1} u_k$ such that:
  $P_{u_0} \subset P_{u_1} = \cdots = P_{u_{k-1}} \supset P_{u_k}$
  $|P_{u_0}| = h$, $|P_{u_1}| = p$

$u_0$ is called the head and $u_k$ is called the tail. $\mathcal{P}_w$ will denote the set of Abelian periods of $w$.
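The definition of an Abelian period $(h, p)$ can be sketched in a few lines of Python. This is only an illustrative brute-force check, not the authors' algorithm; the function names `parikh`, `contained` and `is_abelian_period` are hypothetical, and the sketch assumes that at least one full block $u_1$ of length $p$ must fit in the word.

```python
from collections import Counter


def parikh(word: str) -> Counter:
    """Parikh vector of `word`, stored as a letter -> count mapping."""
    return Counter(word)


def contained(p: Counter, q: Counter) -> bool:
    """P ⊂ Q as on the slides: componentwise <= and strictly smaller total."""
    return all(p[c] <= q[c] for c in p) and sum(p.values()) < sum(q.values())


def is_abelian_period(w: str, h: int, p: int) -> bool:
    """Check whether (h, p) is an Abelian period of w (head length h, block length p)."""
    n = len(w)
    # Assumption of this sketch: h < p and at least one full block u_1 exists.
    if not (0 <= h < p and h + p <= n):
        return False
    block = parikh(w[h:h + p])
    if not contained(parikh(w[:h]), block):      # head u_0
        return False
    pos = h + p
    while pos + p <= n:                          # full blocks u_2 .. u_{k-1}
        if parikh(w[pos:pos + p]) != block:
            return False
        pos += p
    return contained(parikh(w[pos:]), block)     # tail u_k (possibly empty)


if __name__ == "__main__":
    w = "abaabbbbaaababababbabbaa"  # a 24-letter binary sample word
    print(is_abelian_period(w, 0, 6), is_abelian_period(w, 0, 5))
```

The check runs in time proportional to the word length for each pair $(h, p)$, so enumerating all pairs this way is quadratic in the number of candidates; it is meant for small experiments only.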
Abelian periods: example

i =   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
w =   a  b  a  a  b  b  b  b  a  a  a  b  a  b  a  b  a  b  b  a  b  b  a  a

$\mathcal{P}_w = \{(0, 6), (0, 10), (0, 12), (0, 24), (1, 9), (1, 11), (2, 8), (3, 9), (4, 7), (5, 7), (5, 9)\}$

Abelian powers (weak Ap)

Remark: $a^n$ has on the order of $n^2$ Abelian periods.

Motivations

Bioinformatics:
  finding CpG islands
  finding clusters of genes
  proteomics: mass spectrometry

Other fields:
  approximate pattern matching
  games (letters)

Sturmian words

Definition 1. Sturmian words are the infinite words over a binary alphabet that have exactly $n + 1$ distinct factors of length $n$ for each $n \ge 0$.

Fibonacci words

Fibonacci numbers: $F_0 = 0$, $F_1 = 1$, $F_j = F_{j-1} + F_{j-2}$ for $j \ge 2$ (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...)

Fibonacci words: $f_1 = b$, $f_2 = a$, $f_j = f_{j-1} \cdot f_{j-2}$ for $j \ge 3$ (b, a, ab, aba, abaab, abaababa, abaababaabaab, ...)
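As a small illustration of the two definitions above, the following Python sketch builds the finite Fibonacci words with the recurrence $f_j = f_{j-1} f_{j-2}$ and counts the distinct factors of a given length in a long one. The helper names `fibonacci_word` and `factor_complexity` are hypothetical, and the count is taken on a finite word, so it only reflects the infinite word for factor lengths that are small compared with the word.

```python
def fibonacci_word(j: int) -> str:
    """Finite Fibonacci word f_j with f_1 = 'b', f_2 = 'a', f_j = f_{j-1} f_{j-2}."""
    if j < 1:
        raise ValueError("j must be at least 1")
    prev, cur = "b", "a"          # f_1, f_2
    if j == 1:
        return prev
    for _ in range(j - 2):
        prev, cur = cur, cur + prev
    return cur


def factor_complexity(word: str, n: int) -> int:
    """Number of distinct factors of length n occurring in the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


if __name__ == "__main__":
    w = fibonacci_word(16)        # a word of length F_16 = 987
    for n in range(6):
        print(n, factor_complexity(w, n))
```

For small $n$ the printed counts can be compared with the value $n + 1$ required by Definition 1.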
Fibonacci words are Sturmian words.

2 Sturmian words and abelian repetitions

Our starting point

G. Fici, T. L., A. Lefebvre and É. Prieur-Gaston. Computing Abelian periods in words. In J. Holub and J. Žďárek, editors, Proceedings of the Prague Stringology Conference 2011 (PSC 2011), Prague, Czech Republic, pages 184–196, 2011.

G. Fici, T. L., A. Lefebvre, É. Prieur-Gaston and W. F. Smyth. Quasi-Linear Time Computation of the Abelian Periods of a Word. In J. Holub and J. Žďárek, editors, Proceedings of the Prague Stringology Conference 2012 (PSC 2012), Prague, Czech Republic, pages 103–110, 2012.

G. Fici, T. L., A. Lefebvre and É. Prieur-Gaston. Algorithms for Computing Abelian Periods of Words. Discrete Applied Mathematics, 2013, to appear.

Our starting point 2

Columns: index $i$, Fibonacci number $F_i$, and the abelian period (ap) computed for the Fibonacci word of length $F_i$.

 i  F_i  ap |  i      F_i   ap |  i         F_i     ap
 0    0   0 | 16      987   34 | 32     2178309   1597
 1    1   1 | 17     1597   34 | 33     3524578   1597
 2    1   1 | 18     2584   34 | 34     5702887   1597
 3    2   1 | 19     4181   55 | 35     9227465   2584
 4    3   2 | 20     6765   89 | 36    14930352   4181
 5    5   2 | 21    10946   89 | 37    24157817   4181
 6    8   2 | 22    17711   89 | 38    39088169   4181
 7   13   3 | 23    28657  144 | 39    63245986   6765
 8   21   5 | 24    46368  233 | 40   102334155  10946
 9   34   5 | 25    75025  233 | 41   165580141  10946
10   55   5 | 26   121393  233 | 42   267914296  10946
11   89   8 | 27   196418  377 | 43   433494437  17711
12  144  13 | 28   317811  610 | 44   701408733  28657
13  233  13 | 29   514229  610 | 45  1134903170  28657
14  377  13 | 30   832040  610 | 46  1836311903  28657
15  610  21 | 31  1346269  987 | 47  2971215073  46368

Sturmian words

Definition 2. Let $\alpha$ and $\rho$ be two real numbers, with $\alpha \in (0, 1)$ irrational. The fractional part of $r$ is defined by $\{r\} = r - \lfloor r \rfloor$. Therefore, for $\alpha \in (0, 1)$, one has $\{-\alpha\} = 1 - \alpha$. The sequence $\{n\alpha + \rho\}$, $n > 0$, defines an infinite word $s_{\alpha,\rho} = a_1(\alpha, \rho)\, a_2(\alpha, \rho) \cdots$ by the rule

$$a_n(\alpha, \rho) = \begin{cases} b & \text{if } \{n\alpha + \rho\} \in [0, \{-\alpha\}), \\ a & \text{if } \{n\alpha + \rho\} \in [\{-\alpha\}, 1). \end{cases}$$
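The rule of Definition 2 translates directly into code. The sketch below is a minimal floating-point illustration, assuming a hypothetical helper name `sturmian_prefix`; it uses ordinary double-precision arithmetic, so it is only reliable for short prefixes where $\{n\alpha + \rho\}$ does not fall extremely close to the threshold $\{-\alpha\}$.

```python
import math


def sturmian_prefix(alpha: float, rho: float, length: int) -> str:
    """First `length` letters a_1 a_2 ... of s_{alpha,rho} (floating-point sketch)."""
    threshold = 1.0 - alpha                  # {-alpha} for alpha in (0, 1)
    letters = []
    for n in range(1, length + 1):
        x = (n * alpha + rho) % 1.0          # fractional part {n*alpha + rho}
        letters.append("b" if x < threshold else "a")
    return "".join(letters)


PHI = (1 + math.sqrt(5)) / 2                 # golden ratio

if __name__ == "__main__":
    print(sturmian_prefix(PHI - 1, 0.0, 15))
```

Choosing $\alpha = \varphi - 1$ and $\rho = 0$ gives a prefix of the Fibonacci word; other irrational slopes give other Sturmian words.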
[Figure: the unit interval $[0, 1)$ split at $\{-\alpha\}$; $a_n = b$ on $[0, \{-\alpha\})$ and $a_n = a$ on $[\{-\alpha\}, 1)$.]

For $\alpha = \varphi - 1$ and $\rho = 0$, where $\varphi = (1 + \sqrt{5})/2$, we obtain the Fibonacci word $f = abaababaabaabab \cdots$

The Sturmian bijection 1

Proposition. For any $n$, $i$, with $n > 0$: if $\{-(i+1)\alpha\} < \{-i\alpha\}$ then

$$a_{n+i} = a \iff \{n\alpha + \rho\} \in [\{-(i+1)\alpha\}, \{-i\alpha\}),$$

whereas if $\{-i\alpha\} < \{-(i+1)\alpha\}$ then

$$a_{n+i} = a \iff \{n\alpha + \rho\} \in [0, \{-i\alpha\}) \cup [\{-(i+1)\alpha\}, 1).$$

[Figure: the unit interval for $a_{n+1}$, split at $\{-\alpha\}$ and $\{-2\alpha\}$; $a$ on $[0, \{-\alpha\})$, $b$ on $[\{-\alpha\}, \{-2\alpha\})$, $a$ on $[\{-2\alpha\}, 1)$.]

Example: $\alpha = \varphi - 1 \approx 0.618$ (thus $\{-\alpha\} \approx 0.382$) and $i = 1$. If $\{n\alpha + \rho\} \in [0, \{-\alpha\}) \cup [\{-2\alpha\}, 1)$, then $a_{n+1} = a$; otherwise $a_{n+1} = b$.

The Sturmian bijection 2

Endpoints for $\alpha = \varphi - 1$ and $m = 6$:
$c_0(\alpha, 6) = 0$, $c_1(\alpha, 6) = \{-3\alpha\} = 0.145...$, $c_2(\alpha, 6) = \{-6\alpha\} = 0.291...$, $c_3(\alpha, 6) = \{-\alpha\} = 0.381...$, $c_4(\alpha, 6) = \{-4\alpha\} = 0.527...$, $c_5(\alpha, 6) = \{-2\alpha\} = 0.763...$, $c_6(\alpha, 6) = \{-5\alpha\} = 0.909...$, $c_7(\alpha, 6) = 1$.

interval         factor a_n a_{n+1} ... a_{n+5}
[c_0, c_1)       babaab
[c_1, c_2)       baabab
[c_2, c_3)       baabaa
[c_3, c_4)       ababaa
[c_4, c_5)       abaaba
[c_5, c_6)       aababa
[c_6, c_7)       aabaab

The subintervals of the Sturmian bijection obtained for $\alpha = \varphi - 1$ and $m = 6$. Each interval is listed with the factor of $s_\alpha$ of length 6 associated with it. For $\rho = 0$ and $n = 1$, the prefix of length 6 of the Fibonacci word is associated with $[c_4(\alpha, 6), c_5(\alpha, 6))$, which is the interval containing $\alpha$.

The Sturmian bijection and abelian repetitions

[Figure: the same subdivision of the unit interval for $\alpha = \varphi - 1$ and $m = 6$, with the point $\{-6\alpha\} = 0.291...$ separating the factors.]

All factors of length $m$ to the right of $\{-m\alpha\}$ have the same Parikh vector. All factors of length $m$ to the left of $\{-m\alpha\}$ have the same Parikh vector. The two Parikh vectors are different.

The Sturmian bijection and abelian repetitions 2

Main idea. The points of the sequence $\{n\alpha\}, \{(n+m)\alpha\}, \{(n+2m)\alpha\}, \ldots, \{(n+km)\alpha\}$ follow one another on the unit torus with step $|\{-m\alpha\}|$, i.e. the distance on the unit torus between $\{(n+im)\alpha\}$ and $\{(n+(i+1)m)\alpha\}$ is $|\{-m\alpha\}|$.

Hence, if $|\{-m\alpha\}|$ is small and $\{n\alpha\}$ is close to zero, there is a large $k$ such that all these points lie to the left of $\{-m\alpha\}$ in the unit interval.

In turn, by the Sturmian bijection, the factors of length $m$ starting at letters $a_n, a_{n+m}, a_{n+2m}, \ldots, a_{n+km}$ have the same Parikh vector. We have an abelian power of exponent $k$ (and conversely).

The Sturmian bijection and abelian repetitions 3

Main result

Theorem. Let $m$ be a positive integer such that $\{m\alpha\} < 0.5$ (resp. $\{m\alpha\} > 0.5$). Then:
1. In $s_\alpha$ there is an abelian power of period $m$ and exponent $k \ge 2$ if and only if $\{m\alpha\} < \frac{1}{k}$ (resp. $\{-m\alpha\} < \frac{1}{k}$).
2. If in $s_\alpha$ there is an abelian power of period $m$ and exponent $k \ge 2$ starting in position $i$ with $\{i\alpha\} \ge \{m\alpha\}$ (resp. $\{i\alpha\} \le \{m\alpha\}$), then $\{m\alpha\} < \frac{1}{k+1}$ (resp. $\{-m\alpha\} < \frac{1}{k+1}$). Conversely, if $\{m\alpha\} < \frac{1}{k+1}$ (resp. $\{-m\alpha\} < \frac{1}{k+1}$), then there is an abelian power of period $m$ and exponent $k \ge 2$ starting in position $m$.

Consequences using Number Theory

Theorem. Let $s_\alpha$ be a Sturmian word. For any integer $q > 1$, let $k_q$ be the maximal exponent of an abelian repetition of period $q$ in $s_\alpha$. Then

$$\limsup_{q \to \infty} \frac{k_q}{q} \ge \sqrt{5},$$

and equality holds if $\alpha = \varphi - 1$.

Other results on Fibonacci words

Theorem. Let $j > 1$. The longest prefix of the Fibonacci infinite word that is an abelian repetition of period $F_j$ has length $F_j (F_{j+1} + F_{j-1} + 1) - 2$ if $j$ is even, or $F_j (F_{j+1} + F_{j-1}) - 2$ if $j$ is odd.

Corollary. Let $j > 1$ and let $k_j$ be the maximal exponent of a prefix of the Fibonacci word that is an abelian repetition of period $F_j$. Then

$$\lim_{j \to \infty} \frac{k_j}{F_j} = \sqrt{5}.$$

Other results on Fibonacci words 2

Theorem. For $j \ge 3$, the (smallest) abelian period of the word $f_j$ is the $n$-th Fibonacci number $F_n$, where $n = \lfloor j/2 \rfloor$ if $j \equiv 0, 1, 2 \pmod 4$, or $n = 1 + \lfloor j/2 \rfloor$ if $j \equiv 3 \pmod 4$.

2, 2, 2, 3, 5, 5, 5, 8, 13, 13, 13, 21, 34, 34, 34, 55, 89, 89, 89

2 is the abelian period of aba, of a·ba·ab and of a·ba·ab·ab·a. It is not the abelian period of aba·aba·baa·baa·b, which has abelian period 3. In turn, 5 is the abelian period of a·baaba·baaba·ababa·ababa and of a·baaba·baaba·ababa·ababa·abaab·abaab·aab.

Open problems

1. Is it possible to find the exact value of $\limsup k_q / q$ for other Sturmian words $s_\alpha$ with slope $\alpha$ different from $\varphi - 1$?
2. Is it possible to give the exact value of this superior limit when $\alpha$ is an algebraic number of degree 2?

References

G. Fici, A. Langiu, T. L., A. Lefebvre, F. Mignosi, and É. Prieur-Gaston. Abelian repetitions in Sturmian words. In M.-P. Béal and O. Carton, editors, Proceedings of the 17th International Conference on Developments in Language Theory (DLT 2013), volume 7907 of Lecture Notes in Computer Science, pages 227–238, Marne-la-Vallée, France, 2013. Springer-Verlag, Berlin.

G. Fici, A. Langiu, T. L., A. Lefebvre, F. Mignosi, and É. Prieur-Gaston. Abelian repetitions in Sturmian words. Report arXiv:1209.6013v3.

THANK YOU FOR YOUR ATTENTION!
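As a sanity check of the sequence 2, 2, 2, 3, 5, 5, ... quoted in "Other results on Fibonacci words 2", the following Python sketch computes the smallest abelian period of short finite Fibonacci words by brute force. It is not the authors' algorithm. The names `has_abelian_period`, `smallest_abelian_period` and `fibonacci_word` are hypothetical, the indexing $f_1 = b$, $f_2 = a$ is the one from the "Fibonacci words" slide (the theorem's indices may be shifted relative to it, so the words are identified here by their lengths), and the sketch assumes that a period $(h, p)$ needs $h < p$ and at least one full block.

```python
from collections import Counter


def contained(p: Counter, q: Counter) -> bool:
    """P ⊂ Q: componentwise <= and strictly smaller total."""
    return all(p[c] <= q[c] for c in p) and sum(p.values()) < sum(q.values())


def has_abelian_period(w: str, h: int, p: int) -> bool:
    """Brute-force test of the definition (assumes h < p and h + p <= len(w))."""
    block = Counter(w[h:h + p])
    if not contained(Counter(w[:h]), block):
        return False
    pos = h + p
    while pos + p <= len(w):
        if Counter(w[pos:pos + p]) != block:
            return False
        pos += p
    return contained(Counter(w[pos:]), block)


def smallest_abelian_period(w: str) -> int:
    """Smallest p such that (h, p) is an Abelian period of w for some head length h."""
    n = len(w)
    for p in range(1, n + 1):
        # h ranges over 0 .. min(p - 1, n - p), so one full block always fits.
        if any(has_abelian_period(w, h, p) for h in range(min(p, n - p + 1))):
            return p
    return n


def fibonacci_word(j: int) -> str:
    """f_1 = 'b', f_2 = 'a', f_j = f_{j-1} f_{j-2} (indexing assumed from the slides)."""
    prev, cur = "b", "a"
    if j == 1:
        return prev
    for _ in range(j - 2):
        prev, cur = cur, cur + prev
    return cur


if __name__ == "__main__":
    for j in range(4, 10):
        w = fibonacci_word(j)
        print(len(w), smallest_abelian_period(w))
```

The printed pairs (word length, smallest period) can be compared with the ap column of the table in "Our starting point 2" for lengths 3 to 34; the brute force is far too slow for the larger entries of that table.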