Abelian Repetitions in Sturmian Words
Gabriele Fici (1), Alessio Langiu (2), Thierry Lecroq (3), Arnaud Lefebvre (3), Filippo Mignosi (4), Élise Prieur-Gaston (3)
(1) Università di Palermo, Italy
(2) King’s College London, UK
(3) Normandie Université, Université de Rouen, LITIS EA 4108, France
(4) Università dell’Aquila, Italy
SeqBio 2013
November 25th-26th 2013 – Montpellier, France
TL (LITIS)
Abelian Periods
SeqBio 2013
1 / 27
2nd International Conference on Algorithms for Big Data
Sala Gialla (Yellow Room) of Palazzo dei Normanni, Piazza del Parlamento
Palermo, Italy, 07-09 April 2014
Outline
1. Introduction
2. Sturmian words and abelian repetitions
Notation and definitions
Given a word $w = w[1..n]$ of length $n$ over the alphabet $\Sigma = \{a_1, \ldots, a_\sigma\}$ of cardinality $\sigma$, we denote by:
$w[i]$ its $i$-th symbol
$w[i..j]$ the factor from the $i$-th to the $j$-th symbol
$|w|_a$ the number of occurrences of the symbol $a$ in $w$
$P_w = (|w|_{a_1}, \ldots, |w|_{a_\sigma})$ its Parikh vector
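A minimal Python sketch of this notation, assuming 1-based inclusive indices as on the slide and an alphabet given as an ordered string; the helper names `symbol`, `factor`, `occurrences` and `parikh_vector` are hypothetical.

```python
def symbol(w: str, i: int) -> str:
    """w[i]: the i-th symbol of w (1-based)."""
    return w[i - 1]

def factor(w: str, i: int, j: int) -> str:
    """w[i..j]: the factor from the i-th to the j-th symbol (1-based, inclusive)."""
    return w[i - 1:j]

def occurrences(w: str, a: str) -> int:
    """|w|_a: number of occurrences of the symbol a in w."""
    return w.count(a)

def parikh_vector(w: str, alphabet: str) -> tuple:
    """P_w = (|w|_{a_1}, ..., |w|_{a_sigma}) for the ordered alphabet a_1 ... a_sigma."""
    return tuple(occurrences(w, a) for a in alphabet)

w = "abaababa"
print(symbol(w, 2))            # b
print(factor(w, 3, 5))         # aab
print(parikh_vector(w, "ab"))  # (5, 3)
```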
Remarks on Parikh vectors
Consider $P_w = (|w|_{a_1}, \ldots, |w|_{a_\sigma})$. Then:
$P_w[i] = |w|_{a_i}$
$|P_w| = \sum_{i=1}^{\sigma} P_w[i] = |w|$
$P_w \subset Q$ iff $P_w[i] \le Q[i]$ for every $1 \le i \le \sigma$ and $|P_w| < |Q|$
Example
$P_{term} \subset P_{remote} \subset P_{montpellier}$
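The strict inclusion can be checked mechanically. The sketch below (hypothetical helper names, Parikh vectors taken over the sorted union of the letters involved) reproduces the example.

```python
from collections import Counter

def parikh(word, alphabet):
    """Parikh vector of word over the ordered alphabet (missing letters count 0)."""
    counts = Counter(word)
    return tuple(counts[c] for c in alphabet)

def strictly_included(P, Q):
    """P ⊂ Q: P[i] <= Q[i] for every i, and |P| < |Q|."""
    return all(p <= q for p, q in zip(P, Q)) and sum(P) < sum(Q)

alphabet = sorted(set("term" + "remote" + "montpellier"))
P_term, P_remote, P_mont = (parikh(x, alphabet) for x in ("term", "remote", "montpellier"))
print(strictly_included(P_term, P_remote), strictly_included(P_remote, P_mont))  # True True
print(strictly_included(P_remote, P_term))  # False
```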
Abelian periods
[Constantinescu and Ilie, 2006] introduced the notion of Abelian period.
Definition
A word $w$ has Abelian period $(h, p)$ iff $w = u_0 u_1 \cdots u_{k-1} u_k$ such that:
$P_{u_0} \subset P_{u_1} = \cdots = P_{u_{k-1}} \supset P_{u_k}$
$|P_{u_0}| = h$, $|P_{u_1}| = p$
$u_0$ is called the head and $u_k$ is called the tail.
$\mathcal{P}_w$ denotes the set of Abelian periods of $w$.
Abelian periods
w = a b a a b b b b a a a b a b a b a b b a b b a a (positions 1 to 24)
$\mathcal{P}_w = \{(0, 6), (0, 10), (0, 12), (0, 24),$
$(1, 9), (1, 11),$
$(2, 8),$
$(3, 9),$
$(4, 7),$
$(5, 7), (5, 9)\}$
First row ($h = 0$): Abelian powers (weak Abelian periods).
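To make the definition concrete, here is a brute-force sketch. It assumes that once $(h, p)$ is fixed the decomposition is forced (blocks of length $p$ after a head of length $h$, the leftover being the tail) and that at least one full block is required; the function names are illustrative, not taken from the authors' algorithms, and it is only meant for small words. On the word above it should find every pair listed on the slide; under this reading it may also report pairs the slide does not show, such as $(0, 23)$.

```python
from collections import Counter

def parikh(u, alphabet):
    """Parikh vector of u over the ordered alphabet."""
    counts = Counter(u)
    return tuple(counts[c] for c in alphabet)

def strictly_included(P, Q):
    """P ⊂ Q: P[i] <= Q[i] for every i, and |P| < |Q|."""
    return all(x <= y for x, y in zip(P, Q)) and sum(P) < sum(Q)

def has_abelian_period(w, h, p):
    """Does w = u_0 u_1 ... u_{k-1} u_k with |u_0| = h and |u_1| = ... = |u_{k-1}| = p?"""
    n, alphabet = len(w), sorted(set(w))
    # h < p is forced by P_{u_0} ⊂ P_{u_1}; assume at least one full block u_1.
    if not (0 <= h < p and h + p <= n):
        return False
    q = parikh(w[h:h + p], alphabet)          # P_{u_1}
    end = h + p
    while end + p <= n:                       # every further full block must match
        if parikh(w[end:end + p], alphabet) != q:
            return False
        end += p
    head, tail = w[:h], w[end:]               # the leftover tail is shorter than p
    return (strictly_included(parikh(head, alphabet), q)
            and strictly_included(parikh(tail, alphabet), q))

def abelian_periods(w):
    """Brute-force set of all Abelian periods (h, p) of w."""
    return {(h, p) for p in range(1, len(w) + 1) for h in range(p)
            if has_abelian_period(w, h, p)}

w = "a b a a b b b b a a a b a b a b a b b a b b a a".replace(" ", "")
print(sorted(abelian_periods(w)))
```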
Abelian periods
Remark
$a^n$ has $\Theta(n^2)$ Abelian periods.
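For a unary word the count is easy to brute-force. The sketch assumes, as a reading of the definition, that an Abelian period needs at least one full block ($h + p \le n$); under that assumption the count is $\lfloor (n+1)^2/4 \rfloor$, which is indeed quadratic. The function name is hypothetical.

```python
def unary_abelian_period_count(n: int) -> int:
    """Number of Abelian periods (h, p) of a^n, by brute force over all pairs.

    In a^n every factor of length p has Parikh vector (p), so the only
    constraints are h < p (strict inclusion of the head) and h + p <= n
    (assumption: at least one full block). A tail shorter than p is always
    strictly included.
    """
    return sum(1 for p in range(1, n + 1) for h in range(p) if h + p <= n)

for n in (4, 10, 100):
    print(n, unary_abelian_period_count(n), (n + 1) ** 2 // 4)
```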
Motivations
Bioinformatics
finding CpG islands
finding clusters of genes
proteomics: mass spectrometry
Other fields
approximate pattern matching
games (letters)
Sturmian words
Definition 1
Sturmian words are the infinite words over a binary alphabet that have exactly $n + 1$ distinct factors of length $n$ for each $n \ge 0$.
Fibonacci words
Fibonacci numbers
$F_0 = 0$, $F_1 = 1$, $F_j = F_{j-1} + F_{j-2}$ for $j \ge 2$
(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, . . .)
Fibonacci words
$f_1 = b$, $f_2 = a$, $f_j = f_{j-1} \cdot f_{j-2}$ for $j \ge 3$
(b, a, ab, aba, abaab, abaababa, abaababaabaab, . . .)
The infinite Fibonacci word $f$, the limit of the words $f_j$, is a Sturmian word.
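A small sketch, with hypothetical helper names, that builds the words $f_j$ from the recurrence and counts the distinct factors of each length in a long prefix. A finite prefix only approximates the infinite word, but for small $n$ every factor already occurs, so the count should be exactly $n + 1$.

```python
def fibonacci_word(j: int) -> str:
    """f_1 = b, f_2 = a, f_j = f_{j-1} . f_{j-2} for j >= 3."""
    if j == 1:
        return "b"
    prev, cur = "b", "a"              # f_1, f_2
    for _ in range(j - 2):
        prev, cur = cur, cur + prev   # next word is the current one followed by the previous one
    return cur

def factor_complexity(w: str, n: int) -> int:
    """Number of distinct factors of length n occurring in w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})

print([fibonacci_word(j) for j in range(1, 8)])
# ['b', 'a', 'ab', 'aba', 'abaab', 'abaababa', 'abaababaabaab']
prefix = fibonacci_word(20)           # |f_20| = F_20 = 6765
print([factor_complexity(prefix, n) for n in range(10)])
```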
Our starting point
G. Fici, T. L., A. Lefebvre and É. Prieur-Gaston. Computing Abelian periods in words. In J. Holub and J. Žďárek, editors, Proceedings of the Prague Stringology Conference 2011 (PSC 2011), Prague, Czech Republic, pages 184–196, 2011.
G. Fici, T. L., A. Lefebvre, É. Prieur-Gaston and W. F. Smyth. Quasi-Linear Time Computation of the Abelian Periods of a Word. In J. Holub and J. Žďárek, editors, Proceedings of the Prague Stringology Conference 2012 (PSC 2012), Prague, Czech Republic, pages 103–110, 2012.
G. Fici, T. L., A. Lefebvre and É. Prieur-Gaston. Algorithms for Computing Abelian Periods of Words. Discrete Applied Mathematics, 2013, to appear.
Our starting point 2
i | F_i | ap
0 | 0 | 0
1 | 1 | 1
2 | 1 | 1
3 | 2 | 1
4 | 3 | 2
5 | 5 | 2
6 | 8 | 2
7 | 13 | 3
8 | 21 | 5
9 | 34 | 5
10 | 55 | 5
11 | 89 | 8
12 | 144 | 13
13 | 233 | 13
14 | 377 | 13
15 | 610 | 21
16 | 987 | 34
17 | 1597 | 34
18 | 2584 | 34
19 | 4181 | 55
20 | 6765 | 89
21 | 10946 | 89
22 | 17711 | 89
23 | 28657 | 144
24 | 46368 | 233
25 | 75025 | 233
26 | 121393 | 233
27 | 196418 | 377
28 | 317811 | 610
29 | 514229 | 610
30 | 832040 | 610
31 | 1346269 | 987
32 | 2178309 | 1597
33 | 3524578 | 1597
34 | 5702887 | 1597
35 | 9227465 | 2584
36 | 14930352 | 4181
37 | 24157817 | 4181
38 | 39088169 | 4181
39 | 63245986 | 6765
40 | 102334155 | 10946
41 | 165580141 | 10946
42 | 267914296 | 10946
43 | 433494437 | 17711
44 | 701408733 | 28657
45 | 1134903170 | 28657
46 | 1836311903 | 28657
47 | 2971215073 | 46368
Sturmian words
Definition 2
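Let $\alpha$ and $\rho$ be real numbers, with $\alpha \in (0, 1)$ irrational.
The fractional part of $r$ is defined by $\{r\} = r - \lfloor r \rfloor$. Therefore, for $\alpha \in (0, 1)$, one has $\{-\alpha\} = 1 - \alpha$.
The sequence $\{n\alpha + \rho\}$, $n > 0$, defines an infinite word $s_{\alpha,\rho} = a_1(\alpha, \rho) a_2(\alpha, \rho) \cdots$ by the rule
$$a_n(\alpha, \rho) = \begin{cases} b & \text{if } \{n\alpha + \rho\} \in [0, \{-\alpha\}),\\ a & \text{if } \{n\alpha + \rho\} \in [\{-\alpha\}, 1). \end{cases}$$
[Figure: the unit interval split at $\{-\alpha\}$; $a_n = b$ on $[0, \{-\alpha\})$ and $a_n = a$ on $[\{-\alpha\}, 1)$.]
For $\alpha = \varphi - 1$ and $\rho = 0$, where $\varphi = (1 + \sqrt{5})/2$, one obtains the Fibonacci word $f = abaababaabaabab\cdots$.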
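Definition 2 can be evaluated numerically. The sketch below uses floating-point fractional parts, which is adequate for a few thousand letters of $s_{\varphi-1,0}$ since $\{n\alpha\}$ never gets extremely close to the cut point $\{-\alpha\}$ at that scale; the helper names are hypothetical. It compares the result with the recursive construction of the Fibonacci words.

```python
import math

def sturmian_prefix(alpha: float, rho: float, length: int) -> str:
    """First `length` letters a_1 ... a_length of s_{alpha,rho} (Definition 2)."""
    cut = (-alpha) % 1.0                   # {-alpha} = 1 - alpha
    letters = []
    for n in range(1, length + 1):
        x = (n * alpha + rho) % 1.0        # fractional part {n alpha + rho}
        letters.append("b" if x < cut else "a")
    return "".join(letters)

def fibonacci_word(j: int) -> str:
    """f_1 = b, f_2 = a, f_j = f_{j-1} f_{j-2}."""
    prev, cur = "b", "a"
    for _ in range(j - 2):
        prev, cur = cur, cur + prev
    return cur if j >= 2 else prev

phi = (1 + math.sqrt(5)) / 2
print(sturmian_prefix(phi - 1, 0.0, 15))   # abaababaabaabab
```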
The Sturmian bijection 1
Proposition
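For any $n, i$ with $n > 0$, if $\{-(i+1)\alpha\} < \{-i\alpha\}$ then
$a_{n+i} = a \iff \{n\alpha + \rho\} \in [\{-(i+1)\alpha\}, \{-i\alpha\})$,
whereas if $\{-i\alpha\} < \{-(i+1)\alpha\}$ then
$a_{n+i} = a \iff \{n\alpha + \rho\} \in [0, \{-i\alpha\}) \cup [\{-(i+1)\alpha\}, 1)$.
[Figure: the unit interval cut at $\{-\alpha\}$ and $\{-2\alpha\}$; $a_{n+1} = a$ on $[0, \{-\alpha\})$, $b$ on $[\{-\alpha\}, \{-2\alpha\})$ and $a$ on $[\{-2\alpha\}, 1)$.]
For example, let $\alpha = \varphi - 1 \approx 0.618$ (thus $\{-\alpha\} \approx 0.382$ and $\{-2\alpha\} \approx 0.764$) and $i = 1$. If $\{n\alpha + \rho\} \in [0, \{-\alpha\}) \cup [\{-2\alpha\}, 1)$, then $a_{n+1} = a$; otherwise $a_{n+1} = b$.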
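The Proposition can be tested letter by letter: compute $a_{n+i}$ directly from Definition 2 and compare it with the letter predicted from the position of $\{n\alpha + \rho\}$ alone. A sketch, assuming $\alpha = \varphi - 1$, $\rho = 0$ and floating-point arithmetic; the function names are illustrative.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # phi - 1 (example slope assumed by this sketch)
RHO = 0.0

def frac(x: float) -> float:
    """Fractional part {x} = x - floor(x)."""
    return x % 1.0

def letter(n: int) -> str:
    """a_n(alpha, rho) computed directly from Definition 2."""
    return "b" if frac(n * ALPHA + RHO) < frac(-ALPHA) else "a"

def predicted_letter(n: int, i: int) -> str:
    """a_{n+i} read off from the position of {n alpha + rho}, as in the Proposition."""
    x = frac(n * ALPHA + RHO)
    lo, hi = frac(-(i + 1) * ALPHA), frac(-i * ALPHA)
    if lo < hi:                       # ordinary interval [lo, hi)
        is_a = lo <= x < hi
    else:                             # wrapped interval [0, hi) U [lo, 1)
        is_a = x < hi or x >= lo
    return "a" if is_a else "b"

mismatches = [(n, i) for n in range(1, 500) for i in range(12)
              if letter(n + i) != predicted_letter(n, i)]
print(len(mismatches))  # 0
```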
The Sturmian bijection 2
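Cut points for $\alpha = \varphi - 1$ and $m = 6$, in increasing order:
$c_0(\alpha, 6) = 0$
$c_1(\alpha, 6) = \{-3\alpha\} = 0.145\ldots$
$c_2(\alpha, 6) = \{-6\alpha\} = 0.291\ldots$
$c_3(\alpha, 6) = \{-\alpha\} = 0.381\ldots$
$c_4(\alpha, 6) = \{-4\alpha\} = 0.527\ldots$
$c_5(\alpha, 6) = \{-2\alpha\} = 0.763\ldots$
$c_6(\alpha, 6) = \{-5\alpha\} = 0.909\ldots$
$c_7(\alpha, 6) = 1$
Subinterval and associated factor $a_n a_{n+1} \cdots a_{n+5}$:
$[c_0, c_1)$: babaab
$[c_1, c_2)$: baabab
$[c_2, c_3)$: baabaa
$[c_3, c_4)$: ababaa
$[c_4, c_5)$: abaaba (contains $\alpha$)
$[c_5, c_6)$: aababa
$[c_6, c_7)$: aabaab
The subintervals of the Sturmian bijection obtained for $\alpha = \varphi - 1$ and $m = 6$, each listed with the factor of $s_\alpha$ of length 6 associated with it. For $\rho = 0$ and $n = 1$, the prefix of length 6 of the Fibonacci word is associated with $[c_4(\alpha, 6), c_5(\alpha, 6))$, which is the interval containing $\alpha$.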
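A sketch of the construction behind the figure: cut $[0, 1)$ at the points $\{-i\alpha\}$, $0 \le i \le m$, and, for a sample point $x$ of each subinterval, read off the letters $a_{n+j}$ ($0 \le j < m$) by applying Definition 2 to $x + j\alpha$. It assumes $\rho = 0$ and floating-point arithmetic; `sturmian_bijection` is a hypothetical name.

```python
import math

def frac(x: float) -> float:
    """Fractional part {x}."""
    return x % 1.0

def sturmian_bijection(alpha: float, m: int):
    """Return [(lo, hi, factor)] for the m + 1 subintervals cut by {-i alpha}, 0 <= i <= m.

    `factor` is a_n ... a_{n+m-1} for any n with {n alpha} in [lo, hi) (rho = 0).
    """
    cuts = sorted(frac(-i * alpha) for i in range(m + 1)) + [1.0]
    result = []
    for lo, hi in zip(cuts, cuts[1:]):
        x = (lo + hi) / 2                      # the factor is constant on [lo, hi)
        factor = "".join("b" if frac(x + j * alpha) < frac(-alpha) else "a"
                         for j in range(m))   # letter a_{n+j}, by Definition 2
        result.append((lo, hi, factor))
    return result

alpha = (math.sqrt(5) - 1) / 2
for lo, hi, factor in sturmian_bijection(alpha, 6):
    print(f"[{lo:.3f}, {hi:.3f}) -> {factor}")
```

Because each crossing of a cut point flips at least one letter, the $m + 1$ factors are pairwise distinct, in line with the $n + 1$ factor count of Definition 1.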
The Sturmian bijection and abelian repetitions
[Figure: the same subintervals and factors of length 6 as on the previous slide ($\alpha = \varphi - 1$, $m = 6$). To the left of $\{-6\alpha\} = 0.291\ldots$ lie babaab and baabab, with Parikh vector $(3, 3)$; to the right lie baabaa, ababaa, abaaba, aababa and aabaab, with Parikh vector $(4, 2)$.]
All factors of length $m$ associated with intervals to the right of $\{-m\alpha\}$ have the same Parikh vector.
All factors of length $m$ associated with intervals to the left of $\{-m\alpha\}$ have the same Parikh vector.
The two Parikh vectors are different.
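These three claims can be checked directly on a prefix of the Fibonacci word. The sketch assumes $\alpha = \varphi - 1$ and $\rho = 0$, and its names are hypothetical: it groups the length-$m$ factors $a_n \cdots a_{n+m-1}$ by the side of $\{-m\alpha\}$ on which $\{n\alpha\}$ lies.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # phi - 1, rho = 0 (assumed example)

def frac(x: float) -> float:
    """Fractional part {x}."""
    return x % 1.0

def parikh_by_side(m: int, count: int = 2000):
    """Parikh vectors (|u|_a, |u|_b) of u = a_n ... a_{n+m-1}, 1 <= n <= count,
    split by whether {n alpha} lies left or right of {-m alpha}."""
    letter_cut = frac(-ALPHA)
    w = "".join("b" if frac(n * ALPHA) < letter_cut else "a"
                for n in range(1, count + m + 1))   # w[n-1] = a_n
    cut = frac(-m * ALPHA)
    left, right = set(), set()
    for n in range(1, count + 1):
        u = w[n - 1:n - 1 + m]
        vector = (u.count("a"), u.count("b"))
        (left if frac(n * ALPHA) < cut else right).add(vector)
    return left, right

print(parikh_by_side(6))  # ({(3, 3)}, {(4, 2)})
```

For $\rho = 0$ the number of $a$'s in $a_n \cdots a_{n+m-1}$ telescopes to $\lfloor (n+m)\alpha \rfloor - \lfloor n\alpha \rfloor$, which equals $\lfloor m\alpha \rfloor + 1$ if $\{n\alpha\} \ge \{-m\alpha\}$ and $\lfloor m\alpha \rfloor$ otherwise; this is what the check observes.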
The Sturmian bijection and abelian repetitions 2
Main Idea
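All the points in the sequence $\{n\alpha\}, \{(n+m)\alpha\}, \{(n+2m)\alpha\}, \ldots, \{(n+(k-1)m)\alpha\}$ follow one another on the unit torus with a constant step: $\{(n+(i+1)m)\alpha\}$ is obtained from $\{(n+im)\alpha\}$ by adding $\{m\alpha\}$ modulo 1, so their distance on the torus is $\min(\{m\alpha\}, \{-m\alpha\})$.
HENCE, if $\{m\alpha\}$ is small (so that $\{-m\alpha\} = 1 - \{m\alpha\}$ is close to 1) and $\{n\alpha\}$ is close to zero, there is a large number $k$ such that all these points lie to the left of $\{-m\alpha\}$ in the unit interval. Symmetrically, if $\{-m\alpha\}$ is small, the points move to the left by $\{-m\alpha\}$ at each step and can all stay to the right of $\{-m\alpha\}$.
In turn, by the Sturmian bijection, the factors of length $m$ starting at the letters $a_n, a_{n+m}, \ldots, a_{n+(k-1)m}$ have the same Parikh vector. We have an abelian power of period $m$ and exponent $k$ (and conversely).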
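The equivalence can be checked by brute force: for every start $n$ and period $m$, the number of consecutive points $\{(n + jm)\alpha\}$ on the same side of $\{-m\alpha\}$ should equal the number of consecutive length-$m$ blocks with the same Parikh vector. A sketch assuming $\alpha = \varphi - 1$, $\rho = 0$ and floating-point arithmetic, with illustrative names.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # phi - 1, rho = 0 (assumed example)

def frac(x: float) -> float:
    """Fractional part {x}."""
    return x % 1.0

def letter(n: int) -> str:
    """a_n of s_alpha, n >= 1 (Definition 2 with rho = 0)."""
    return "b" if frac(n * ALPHA) < frac(-ALPHA) else "a"

def run_of_points(n: int, m: int, limit: int = 40) -> int:
    """How many of {n alpha}, {(n+m) alpha}, ... in a row lie on the same side of {-m alpha}."""
    cut = frac(-m * ALPHA)
    left = frac(n * ALPHA) < cut
    k = 1
    while k < limit and (frac((n + k * m) * ALPHA) < cut) == left:
        k += 1
    return k

def run_of_blocks(n: int, m: int, limit: int = 40) -> int:
    """How many length-m factors starting at a_n, a_{n+m}, ... in a row share a Parikh vector."""
    def count_a(start: int) -> int:   # equal-length binary factors differ only in their number of a's
        return sum(letter(start + j) == "a" for j in range(m))
    base, k = count_a(n), 1
    while k < limit and count_a(n + k * m) == base:
        k += 1
    return k

print(all(run_of_points(n, m) == run_of_blocks(n, m)
          for n in range(1, 300) for m in range(1, 16)))  # True
```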
The Sturmian bijection and abelian repetitions 3
Main result
Theorem
Let $m$ be a positive integer such that $\{m\alpha\} < 0.5$ (resp. $\{m\alpha\} > 0.5$). Then:
1. In $s_\alpha$ there is an abelian power of period $m$ and exponent $k \ge 2$ if and only if $\{m\alpha\} < \frac{1}{k}$ (resp. $\{-m\alpha\} < \frac{1}{k}$).
2. If in $s_\alpha$ there is an abelian power of period $m$ and exponent $k \ge 2$ starting in position $i$ with $\{i\alpha\} \ge \{m\alpha\}$ (resp. $\{i\alpha\} \le \{m\alpha\}$), then $\{m\alpha\} < \frac{1}{k+1}$ (resp. $\{-m\alpha\} < \frac{1}{k+1}$). Conversely, if $\{m\alpha\} < \frac{1}{k+1}$ (resp. $\{-m\alpha\} < \frac{1}{k+1}$), then there is an abelian power of period $m$ and exponent $k \ge 2$ starting in position $m$.
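A numerical sanity check of the theorem on a prefix of length 10000 of the Fibonacci word. The sketch assumes $\alpha = \varphi - 1$ and $\rho = 0$, and its names are illustrative. For each period $m$ it compares the largest exponent of an abelian power actually observed with the largest $k$ allowed by item 1, and checks the abelian power starting in position $m$ against item 2. The prefix is assumed long enough for the maximal exponents to occur, which holds comfortably for these small periods.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # phi - 1, rho = 0 (assumed example)

def frac(x: float) -> float:
    """Fractional part {x}."""
    return x % 1.0

def sturmian_prefix(length: int) -> str:
    cut = frac(-ALPHA)
    return "".join("b" if frac(n * ALPHA) < cut else "a" for n in range(1, length + 1))

W = sturmian_prefix(10000)
PREF = [0]                            # PREF[i] = number of a's in a_1 ... a_i
for c in W:
    PREF.append(PREF[-1] + (c == "a"))

def exponent_at(pos: int, m: int) -> int:
    """Exponent of the longest abelian power of period m starting at (1-based) position pos."""
    def count_a(p: int) -> int:       # number of a's in a_p ... a_{p+m-1}
        return PREF[p - 1 + m] - PREF[p - 1]
    base, k = count_a(pos), 1
    while pos - 1 + (k + 1) * m <= len(W) and count_a(pos + k * m) == base:
        k += 1
    return k

def theorem_bound(m: int) -> int:
    """Largest k with {m alpha} < 1/k, or with {-m alpha} < 1/k when {m alpha} > 0.5."""
    d = frac(m * ALPHA)
    if d > 0.5:
        d = frac(-m * ALPHA)
    return math.floor(1 / d)

for m in (1, 2, 3, 4, 5, 6, 7, 8, 13):
    observed = max(exponent_at(p, m) for p in range(1, len(W) - m + 2))
    print(m, observed, theorem_bound(m), exponent_at(m, m))
```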
Consequences using Number Theory
Theorem
Let $s_\alpha$ be a Sturmian word. For any integer $q > 1$, let $k_q$ be the maximal exponent of an abelian repetition of period $q$ in $s_\alpha$. Then
$$\limsup_{q \to \infty} \frac{k_q}{q} \ge \sqrt{5},$$
and equality holds if $\alpha = \varphi - 1$.
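For $\alpha = \varphi - 1$ the constant $\sqrt{5}$ can be seen numerically. By item 1 of the main theorem, the largest exponent of an abelian power of period $q$ is $\lfloor 1/\|q\alpha\| \rfloor$, where $\|x\| = \min(\{x\}, \{-x\})$. The sketch below uses $1/(q\|q\alpha\|)$ as a proxy for $k_q/q$ (abelian repetitions may be slightly longer, thanks to head and tail) and evaluates it along the Fibonacci periods $q = F_n$; the names are illustrative.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
ALPHA = PHI - 1

def dist_to_nearest_int(x: float) -> float:
    """||x||: distance from x to the nearest integer, i.e. min({x}, {-x})."""
    return abs(x - round(x))

def fibonacci(n: int) -> int:
    """F_n with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def power_ratio(q: int) -> float:
    """1 / (q ||q alpha||): the abelian-power exponent bound for period q, divided by q."""
    return 1 / (q * dist_to_nearest_int(q * ALPHA))

for n in (5, 10, 15, 20, 25):
    print(fibonacci(n), round(power_ratio(fibonacci(n)), 6))
print(round(math.sqrt(5), 6))
```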
Other results on Fibonacci words
Theorem
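Let $j > 1$. The longest prefix of the infinite Fibonacci word that is an abelian repetition of period $F_j$ has length $F_j(F_{j+1} + F_{j-1} + 1) - 2$ if $j$ is even, or $F_j(F_{j+1} + F_{j-1}) - 2$ if $j$ is odd. Here the Fibonacci numbers are indexed from $F_0 = F_1 = 1$, i.e. shifted by one with respect to the definition $F_0 = 0$, $F_1 = 1$ given earlier.
Corollary
Let $j > 1$ and let $k_j$ be the maximal exponent of a prefix of the Fibonacci word that is an abelian repetition of period $F_j$. Then
$$\lim_{j \to \infty} \frac{k_j}{F_j} = \sqrt{5}.$$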
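The length formula can be checked by brute force on a long prefix of the Fibonacci word. The sketch assumes that the Fibonacci numbers in the theorem are indexed from $F_0 = F_1 = 1$ (so the periods are 2, 3, 5, 8, ...), the reading under which small cases agree with the formula, and that the decomposition is a head of length $h < p$, full blocks of length $p$, and a tail shorter than $p$. All names are hypothetical.

```python
def fibonacci_word_prefix(min_length: int) -> str:
    """A prefix of the infinite Fibonacci word abaababaabaab... of length >= min_length."""
    prev, cur = "b", "a"
    while len(cur) < min_length:
        prev, cur = cur, cur + prev
    return cur

def fib_shifted(j: int) -> int:
    """F_j with F_0 = F_1 = 1 (assumed indexing of the theorem)."""
    a, b = 1, 1
    for _ in range(j):
        a, b = b, a + b
    return a

def parikh(u: str) -> tuple:
    return (u.count("a"), u.count("b"))

def leq(P: tuple, Q: tuple) -> bool:
    return all(x <= y for x, y in zip(P, Q))

def longest_prefix_repetition(w: str, p: int) -> int:
    """Length of the longest prefix of w with an Abelian period (h, p) for some h < p."""
    best = 0
    for h in range(p):
        if h + p > len(w):
            break
        q = parikh(w[h:h + p])                    # Parikh vector of the first block u_1
        if not leq(parikh(w[:h]), q):             # head: P_{u_0} ⊂ P_{u_1} (h < p already)
            continue
        end = h + p
        while end + p <= len(w) and parikh(w[end:end + p]) == q:
            end += p                              # extend with full blocks
        t = 0                                     # longest tail: shorter than p, included in q
        while t < p - 1 and end + t < len(w) and leq(parikh(w[end:end + t + 1]), q):
            t += 1
        best = max(best, end + t)
    return best

def theorem_length(j: int) -> int:
    F = fib_shifted
    extra = 1 if j % 2 == 0 else 0
    return F(j) * (F(j + 1) + F(j - 1) + extra) - 2

W = fibonacci_word_prefix(5000)
for j in range(2, 8):
    print(j, fib_shifted(j), longest_prefix_repetition(W, fib_shifted(j)), theorem_length(j))
```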
Other results on Fibonacci words 2
Theorem
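For $j \ge 3$, the (smallest) abelian period of the word $f_j$ is the $n$-th Fibonacci number $F_n$, where $n = \lfloor j/2 \rfloor$ if $j \equiv 0, 1, 2 \pmod 4$, or $n = 1 + \lfloor j/2 \rfloor$ if $j \equiv 3 \pmod 4$. Here $f_0 = b$, $f_1 = a$ and $F_0 = F_1 = 1$, so that $|f_j| = F_j$; this indexing is shifted by one with respect to the earlier definitions.
Values for $j = 3, 4, 5, \ldots$: 2, 2, 2, 3, 5, 5, 5, 8, 13, 13, 13, 21, 34, 34, 34, 55, 89, 89, 89
2 is the abelian period of aba, a ba ab and a ba ab ab a, but not of aba aba baa baa b, which has abelian period 3.
Instead, 5 is the abelian period of a baaba baaba ababa ababa and of a baaba baaba ababa ababa abaab abaab aab.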
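A brute-force sketch that computes the smallest abelian period of $f_j$ and compares it with the statement. It assumes the indexing $f_0 = b$, $f_1 = a$, $F_0 = F_1 = 1$ (the reading under which the small cases above agree) and a decomposition into a head of length $h < p$, full blocks of length $p$ and a tail shorter than $p$. For $j = 3, \ldots, 11$ it should reproduce 2, 2, 2, 3, 5, 5, 5, 8, 13. All names are hypothetical.

```python
def fib_word(j: int) -> str:
    """f_j with f_0 = b, f_1 = a, f_j = f_{j-1} f_{j-2} (assumed indexing)."""
    prev, cur = "b", "a"
    if j == 0:
        return prev
    for _ in range(j - 1):
        prev, cur = cur, cur + prev
    return cur

def fib_number(n: int) -> int:
    """F_n with F_0 = F_1 = 1 (assumed indexing)."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def parikh(u: str) -> tuple:
    return (u.count("a"), u.count("b"))

def included(P: tuple, Q: tuple) -> bool:
    """Strict inclusion P ⊂ Q."""
    return all(x <= y for x, y in zip(P, Q)) and sum(P) < sum(Q)

def has_abelian_period(w: str, h: int, p: int) -> bool:
    if not (h < p and h + p <= len(w)):
        return False
    q, end = parikh(w[h:h + p]), h + p
    while end + p <= len(w):                  # all full blocks share the vector q
        if parikh(w[end:end + p]) != q:
            return False
        end += p
    return included(parikh(w[:h]), q) and included(parikh(w[end:]), q)

def smallest_abelian_period(w: str) -> int:
    # p = len(w) with h = 0 always qualifies, so next() never runs out.
    return next(p for p in range(1, len(w) + 1)
                if any(has_abelian_period(w, h, p) for h in range(p)))

def theorem_period(j: int) -> int:
    n = j // 2 if j % 4 in (0, 1, 2) else 1 + j // 2
    return fib_number(n)

for j in range(3, 12):
    print(j, len(fib_word(j)), smallest_abelian_period(fib_word(j)), theorem_period(j))
```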
Open problems
1. Is it possible to find the exact value of $\limsup_{q \to \infty} k_q/q$ for other Sturmian words $s_\alpha$ with slope $\alpha$ different from $\varphi - 1$?
2. Is it possible to give the exact value of this limit superior when $\alpha$ is an algebraic number of degree 2?
References
G. Fici, A. Langiu, T. L., A. Lefebvre, F. Mignosi and É. Prieur-Gaston. Abelian repetitions in Sturmian words. In M.-P. Béal and O. Carton, editors, Proceedings of the 17th International Conference on Developments in Language Theory (DLT 2013), volume 7907 of Lecture Notes in Computer Science, pages 227–238, Marne-la-Vallée, France, 2013. Springer-Verlag, Berlin.
G. Fici, A. Langiu, T. L., A. Lefebvre, F. Mignosi and É. Prieur-Gaston. Abelian repetitions in Sturmian words. Report arXiv:1209.6013v3.
THANK YOU FOR YOUR ATTENTION!