Permutation Group Algorithms



Permutation group algorithms comprise one of the workhorses of symbolic
algebra systems computing with groups and play an indispensable role in the
proof of many deep results, including the construction and study of sporadic
finite simple groups. This book describes the theory behind permutation group
algorithms, up to the most recent developments based on the classification of
finite simple groups. Rigorous complexity estimates, implementation hints, and
advanced exercises are included throughout.
The central theme is the description of nearly linear-time algorithms, which
are extremely fast in terms of both asymptotic analysis and practical running
time. A significant part of the permutation group library of the computational
group algebra system GAP is based on nearly linear-time algorithms.
The book fills a significant gap in the symbolic computation literature. It is
recommended for everyone interested in using computers in group theory and
is suitable for advanced graduate courses.

Ákos Seress is a Professor of Mathematics at The Ohio State University.


CAMBRIDGE TRACTS IN MATHEMATICS
General Editors

B. BOLLOBÁS, W. FULTON, A. KATOK, F. KIRWAN, P. SARNAK

152 Permutation Group Algorithms


Ákos Seress
The Ohio State University

Permutation Group
Algorithms
  
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press


The Edinburgh Building, Cambridge, United Kingdom
Published in the United States by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521661034

© Ákos Seress 2002

This book is in copyright. Subject to statutory exception and to the provisions of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.

First published in print format 2003

ISBN-13 978-0-511-06647-4 eBook (NetLibrary)


ISBN-10 0-511-06647-3 eBook (NetLibrary)

ISBN-13 978-0-521-66103-4 hardback


ISBN-10 0-521-66103-X hardback

Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this book, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

1 Introduction page 1
1.1 A List of Algorithms 4
1.2 Notation and Terminology 6
1.2.1 Groups 7
1.2.2 Permutation Groups 9
1.2.3 Algorithmic Concepts 10
1.2.4 Graphs 11
1.3 Classification of Randomized Algorithms 12

2 Black-Box Groups 16
2.1 Closure Algorithms 18
2.1.1 Orbit Computations 18
2.1.2 Closure of Algebraic Structures 23
2.2 Random Elements of Black-Box Groups 24
2.3 Random Subproducts 30
2.3.1 Definition and Basic Properties 30
2.3.2 Reducing the Number of Generators 33
2.3.3 Closure Algorithms without Membership Testing 37
2.3.4 Derived and Lower Central Series 38
2.4 Random Prefixes 40
2.4.1 Definition and Basic Properties 40
2.4.2 Applications 44

3 Permutation Groups: A Complexity Overview 48


3.1 Polynomial-Time Algorithms 48
3.2 Nearly Linear-Time Algorithms 51
3.3 Non-Polynomial-Time Methods 52


4 Bases and Strong Generating Sets 55


4.1 Basic Definitions 55
4.2 The Schreier–Sims Algorithm 57
4.3 The Power of Randomization 62
4.4 Shallow Schreier Trees 64
4.5 Strong Generators in Nearly Linear Time 70
4.5.1 Implementation 75

5 Further Low-Level Algorithms 79


5.1 Consequences of the Schreier–Sims Method 79
5.1.1 Pointwise Stabilizers 79
5.1.2 Homomorphisms 80
5.1.3 Transitive Constituent and Block
Homomorphisms 81
5.1.4 Closures and Normal Closures 83
5.2 Working with Base Images 84
5.3 Permutation Groups as Black-Box Groups 93
5.4 Base Change 97
5.5 Blocks of Imprimitivity 100
5.5.1 Blocks in Nearly Linear Time 101
5.5.2 The Smallest Block Containing a Given Subset 107
5.5.3 Structure Forests 111

6 A Library of Nearly Linear-Time Algorithms 114


6.1 A Special Case of Group Intersection and Applications 115
6.1.1 Intersection with a Normal Closure 115
6.1.2 Centralizer in the Symmetric Group 117
6.1.3 The Center 120
6.1.4 Centralizer of a Normal Subgroup 120
6.1.5 Core of a Subnormal Subgroup 124
6.2 Composition Series 125
6.2.1 Reduction to the Primitive Case 126
6.2.2 The O’Nan–Scott Theorem 129
6.2.3 Normal Subgroups with Nontrivial Centralizer 133
6.2.4 Groups with a Unique Nonabelian Minimal
Normal Subgroup 139
6.2.5 Implementation 146
6.2.6 An Elementary Version 149
6.2.7 Chief Series 155
6.3 Quotients with Small Permutation Degree 156
6.3.1 Solvable Radical and p-Core 157

7 Solvable Permutation Groups 162


7.1 Strong Generators in Solvable Groups 162
7.2 Power-Conjugate Presentations 165
7.3 Working with Elementary Abelian Layers 166
7.3.1 Sylow Subgroups 167
7.3.2 Conjugacy Classes in Solvable Groups 172
7.4 Two Algorithms for Nilpotent Groups 175
7.4.1 A Fast Nilpotency Test 176
7.4.2 The Upper Central Series in Nilpotent Groups 179

8 Strong Generating Tests 183


8.1 The Schreier–Todd–Coxeter–Sims Procedure 184
8.1.1 Coset Enumeration 184
8.1.2 Leon’s Algorithm 186
8.2 Sims’s Verify Routine 188
8.3 Toward Strong Generators by a Las Vegas Algorithm 191
8.4 A Short Presentation 197

9 Backtrack Methods 201


9.1 Traditional Backtrack 202
9.1.1 Pruning the Search Tree: Problem-Independent
Methods 203
9.1.2 Pruning the Search Tree: Problem-Dependent
Methods 205
9.2 The Partition Method 207
9.3 Normalizers 211
9.4 Conjugacy Classes 214

10 Large-Base Groups 218


10.1 Labeled Branchings 218
10.1.1 Construction 222
10.2 Alternating and Symmetric Groups 225
10.2.1 Number Theoretic and Probabilistic Estimates 228
10.2.2 Constructive Recognition: Finding the
New Generators 235
10.2.3 Constructive Recognition: The Homomorphism λ 239
10.2.4 Constructive Recognition: The Case of Giants 244
10.3 A Randomized Strong Generator Construction 246

Bibliography 254
Index 262
1
Introduction

Computational group theory (CGT) is a subfield of symbolic algebra; it deals
with the design, analysis, and implementation of algorithms for manipulating
groups. It is an interdisciplinary area between mathematics and computer sci-
ence. The major areas of CGT are the algorithms for finitely presented groups,
polycyclic and finite solvable groups, permutation groups, matrix groups, and
representation theory.
The topic of this book is the third of these areas. Permutation groups are
the oldest type of representations of groups; in fact, the work of Galois on
permutation groups, which is generally considered as the start of group theory
as a separate branch of mathematics, preceded the abstract definition of groups
by about half a century. Algorithmic questions permeated permutation group
theory from its inception. Galois group computations, and the related problem
of determining all transitive permutation groups of a given degree, are still
active areas of research (see [Hulpke, 1996]). Mathieu’s constructions of his
simple groups also involved serious computations.
Nowadays, permutation group algorithms are among the best developed parts
of CGT, and we can handle groups of degree in the hundreds of thousands. The
basic ideas for handling permutation groups appeared in [Sims, 1970, 1971a];
even today, Sims’s methods are at the heart of most of the algorithms.
At first glance, the efficiency of permutation group algorithms may be surpris-
ing. The input consists of a list of generators. On one hand, this representation
is very efficient, since a few permutations in Sn can describe an object of size up
to n!. On the other hand, the succinctness of such a representation G = ⟨S⟩ ne-
cessitates nontrivial algorithms to answer even such basic questions as finding
the order of G or testing membership of a given permutation in G.
Initially, it is not even clear how to prove in polynomial time in the input length
that a certain permutation g is in G, because writing g as a product of the given
generators S for G may require an exponentially long word. Sims's seminal
idea was to introduce the notions of base and strong generating set. This data
structure enables us to decide membership in G constructively, by writing any
given element of G as a short product of the strong generators. The technique
for constructing a strong generating set can also be applied to other tasks such
as computing normal closures of subgroups and handling homomorphisms of
groups. Therefore, a significant part of this book is devoted to the description
of variants and applications of Sims’s method.
A second generation of algorithms uses divide-and-conquer techniques by
utilizing the orbit structure and imprimitivity block structure of the input group,
thereby reducing the problems to primitive groups. Although every abstract
group has a faithful transitive permutation representation, the structure of prim-
itive groups is quite restricted. This extra information, partly obtained as a
consequence of the classification of finite simple groups, can be exploited in
the design of algorithms.
We shall also describe some of the latest algorithms, which use an even finer
divide-and-conquer technique. A tower of normal subgroups is constructed
such that the factor groups between two consecutive normal subgroups are the
products of isomorphic simple groups. Abelian factors are handled by linear al-
gebra, whereas the simple groups occurring in nonabelian factors are identified
with standard copies of these groups, and the problems are solved in the stan-
dard copies. This identification process works in the more general black-box
group setting, when we do not use the fact that the input group is represented
by permutations: The algorithms only exploit the facts that we can multiply
and invert group elements and decide whether two group elements are equal.
This generality enables us to use the same algorithms for matrix group inputs.
Computation with matrix groups is currently the most active area of CGT.
Dealing with permutation groups is the area of CGT where the complexity
analysis of algorithms is the most developed. The initial reason for interest
in complexity analysis was the connection of permutation group algorithms
with the celebrated graph isomorphism problem. The decisive result in estab-
lishing the connection is the polynomial-time algorithm in [Luks, 1982] for
testing isomorphism of graphs with bounded valence, where the isomorphism
problem is reduced to finding setwise stabilizers of subsets in the permutation
domain of groups with composition factors of bounded size. This paper not
only established a link between complexity theory and CGT but provided new
methodology for permutation group algorithms.
Up until the end of the 1980s, permutation group algorithms were devel-
oped in two different contexts. In one of these, the primary goal was efficient
implementation, to handle the groups occurring in applications. In the other
context, the main goal was the rigorous asymptotic analysis of algorithms.
Algorithms for numerous tasks were developed separately in the two con-
texts, and the two previous books on permutation group algorithms reflect
this division: [Butler, 1991] deals mostly with the practical approach, whereas
[Hoffmann, 1982] concentrates on the asymptotic analysis. In the past decade,
a remarkable convergence of the approaches occurred, and algorithms with fast
asymptotic running times that are suitable for implementation were developed.
The main purpose of this book is to describe this new development. We con-
sider the interaction of theory and implementation to be of great importance
to each side: Symbolic algebra can benefit considerably by the influx of ideas
of algorithmic complexity theory and rigorous asymptotic analysis; conversely,
the implementations help demonstrate the power of the asymptotic paradigm,
which is at the foundation of the theory of computing.
The major theme of this book is the description of nearly linear-time algo-
rithms. These are the algorithms representing the convergence of theoretical
and practical considerations. Their running time is O(n|S| log^c |G|) for input
groups G = ⟨S⟩ ≤ Sn; in particular, in the important subcase of small-base
groups, when log |G| is bounded from above by a polylogarithmic function of
n, the running time is a nearly linear, O(N log^c N), function of the input length
N = n|S|. The category of small-base groups includes all permutation repre-
sentations of finite simple groups except the alternating ones and all primitive
groups that do not have alternating composition factors in their socle. Most
practical computations are performed with small-base input groups.
Quite different methods give the asymptotically fastest solutions for com-
putational problems in large-base groups, where log |G| is bounded only by
log n!. Most of these algorithms have not yet been implemented. We shall also
describe backtrack methods, which are the practical algorithms for problems
with no known polynomial-time solutions. For small-base input groups, back-
track methods may be practical in groups of degree in the tens of thousands.
Our main goal is to present the mathematics behind permutation group algo-
rithms, and implementation details will be mostly omitted. We shall give details
only in the cases where the implemented version differs significantly from the
one described by the theoretical result or when the reason for the fast asymptotic
running time is a nontrivial data structure. Most of the algorithms described in
this book have been implemented in the GAP system [GAP, 2000], which, along
with its source code, is freely available. GAP code is written in a high-level,
Pascal-like language, and it can be read as easily as the customary pseudocode
in other books and articles on computational group theory. The addresses of ftp
servers for GAP can be obtained from the World Wide Web page

http://www-gap.dcs.st-and.ac.uk/~gap.
The other large computer algebra system particularly suitable for computations
with groups is Magma (see [Bosma et al., 1997]). The World Wide Web page

http://www.maths.usyd.edu.au:8000/u/magma

describes how to access Magma on a subscription basis.

Acknowledgments
The writing of this book began in 1993, on the suggestion of Joachim Neubüser,
who envisioned a series of books covering the major areas of computational
group theory. The fact that the writing is finished in less than a decade is in no
small part the consequence of my wife Sherry’s continuous encouragement. I
am thankful to both of them and to the editors of Cambridge University Press
for their patience. During this period, I was partially supported by the National
Science Foundation.
Alexander Hulpke, William Kantor, Joachim Neubüser, Cheryl Praeger,
Charles Sims, and Leonard Soicher read parts of the manuscript, and their
comments improved the presentation significantly. I am especially indebted to
William Kantor and Joachim Neubüser for their help.

1.1. A List of Algorithms


In this book, most algorithms are described in the proofs of theorems or just
in the narrative, without any display or pseudocode. Whenever it is possible,
algorithms given in the narrative are preceded by a centered paragraph header.
The following list serves as a reference guide; it is organized roughly along
the lines of the lists in Sections 3.1 and 3.3. The input is a permutation group
G ≤ Sym(Ω).
r Orbit of some α ∈ Ω: Section 2.1.1; in particular, Theorem 2.1.1
r Blocks of imprimitivity
(i) A minimal nontrivial block: Section 5.5.1 (algorithm MinimalBlock)
(ii) The minimal block containing a given subset of Ω: Section 5.5.2
r Shallow Schreier tree construction
(i) Deterministic: Lemma 4.4.2, Remark 4.4.3, Lemma 4.4.8
(ii) Las Vegas: Theorem 4.4.6, Remark 4.4.7
r Strong generating set construction
(i) Deterministic: Section 4.2 (Schreier–Sims algorithm), Theorem 5.2.3
(with known base), Section 7.1 (for solvable groups), Theorem 10.1.3
(stored in a labeled branching)
(ii) Monte Carlo: Section 4.5, Theorems 5.2.5 and 5.2.6, Lemma 5.4.1,
Section 10.3
(iii) Heuristic: Section 4.3 (random Schreier–Sims algorithm)
(iv) GAP implementation: Section 4.5.1, Remark 5.2.7
r Strong generating set verification
(i) Deterministic: Section 8.1 (Schreier–Todd–Coxeter–Sims algorithm),
Section 8.2 (Verify routine)
(ii) Monte Carlo: Lemma 4.5.6
(iii) Las Vegas: Theorem 8.3.1
r Membership test (sifting): Section 4.1
r Reduction of the size of generating sets
(i) Strong generators, deterministic: Lemma 4.4.8, Exercise 4.7
(ii) Arbitrary generators, Monte Carlo: Lemma 2.3.4, Theorem 2.3.6
r Random element generation
(i) With an SGS: Section 2.2, first paragraph
(ii) Without an SGS: Section 2.2 (random walk on a Cayley graph, product
replacement algorithm)
(iii) In alternating and symmetric groups: Exercises 2.1 and 2.2
r Isomorphism with other representations
(i) With a black-box group: Section 5.3
(ii) Solvable groups, with a power-commutator presentation: Section 7.2
(iii) An and Sn , with natural action: Theorem 10.2.4
(iv) PSLd (q), with the action on projective points: Section 5.3
r Operations with base images and with words in generators: Lemmas 5.2.1,
5.2.2, and 5.3.1
r Base change (transposing and conjugating base points, deterministic and Las
Vegas algorithms): Section 5.4, Exercise 5.5
r Presentations: Section 7.2 (for solvable groups), Section 8.1, Exercise 5.2,
Theorem 8.4.1
r Pointwise stabilizer of a subset of Ω: Section 5.1.1
r Handling of homomorphisms (kernel, image, preimage)
(i) Transitive constituent and block homomorphisms: Section 5.1.3
(ii) General case: Section 5.1.2
r Closure for G-action, normal closure
(i) With membership test in substructures, deterministic: Sections 2.1.2 and
5.1.4, Lemma 6.1.1
(ii) Without membership test, Monte Carlo: Theorems 2.3.9 and 2.4.5
r Commutator subgroup computation, derived series, lower central series
(i) With membership test in substructures, deterministic: Sections 2.1.2
and 5.1.4
(ii) Without membership test, Monte Carlo: Theorems 2.3.12 and 2.4.8
r Upper central series in nilpotent groups: Section 7.4.2
r Solvability test: Sections 2.1.2, 5.1.4, and 7.1
r Nilpotency test: Sections 2.1.2, 5.1.4, and 7.4.1
r Subnormality test: Section 2.1.2
r Commutativity test (Monte Carlo): Lemma 2.3.14
r Regularity test: Exercises 5.12–5.15
r Center: Section 6.1.3
r Permutation representation with a normal subgroup N in the kernel: Lem-
mas 6.2.2 and 6.2.4, Theorem 6.3.1 (if N is abelian)
r Composition series
(i) Reduction to the primitive group case: Section 6.2.1
(ii) Finding normal subgroups in various types of primitive groups (Monte
Carlo): Sections 6.2.3 and 6.2.4
(iii) Verification of composition series, if SGS is known: Section 6.2.6
(iv) GAP implementation: Section 6.2.5
(v) Composition series without the classification of finite simple groups:
Section 6.2.6
r Chief series: Section 6.2.7
r Sylow subgroups and Hall subgroups in solvable groups: Section 7.3.1, Ex-
ercise 7.5 (Theorem 7.3.3 for conjugating Sylow subgroups)
r Core of a subnormal subgroup: Section 6.1.5
r p-core and solvable radical: Section 6.3.1
r Backtrack, general description: Section 9.1 (traditional), Section 9.2 (partition
backtrack)
r Setwise stabilizer of a subset of Ω: Section 9.1.2, Example 2
r Centralizer
(i) In the full symmetric group: Section 6.1.2
(ii) Of a normal subgroup: Section 6.1.4
(iii) General case: Section 9.1.2, Example 1
r Intersection of groups: Corollary 6.1.3 (if one of the groups normalizes the
other), Section 9.1.2, Example 3 (general case)
r Conjugating element: Section 9.1.2, Example 4
r Conjugacy classes: Section 7.3.2 (in solvable groups), Section 9.4 (general
case)
r Normalizer: Section 9.3

1.2. Notation and Terminology


We assume that the reader is familiar with basic notions concerning groups
covered in introductory graduate courses and with elementary probability the-
ory. A background area with which we do not suppose reader familiarity is
the detailed properties of finite simple groups, and the occasional references to
these properties can be ignored without impeding understanding of the subse-
quent material. However, readers interested in further research in permutation
group algorithms are strongly advised to acquire knowledge of groups of Lie
type. One of the current largest obstacles, both in the permutation group and
matrix group setting, is our inability to exploit algorithmically properties of
exceptional groups of Lie type.
The required (minimal) background material about permutation groups
can be found for example in the first chapter of the recent books [Dixon and
Mortimer, 1996] and [Cameron, 1999]. Here we only summarize our notation
and terminology. In this book, all groups are finite.
All statements (i.e., theorems, lemmas, propositions, corollaries, and re-
marks) are numbered in a common system. For example, Theorem X.Y.Z de-
notes the Zth statement in Chapter X, Section Y, if this statement happens to
be a theorem. Definitions are just part of the text and are not displayed with a
number. Any unknown items (hopefully) can be found in the index. In the index,
boldface type is used for the page number where an item or notation is defined.
There are exercises at the end of some chapters, numbered as Exercise X.Y in
Chapter X. A third numbering system is used for the displayed formulas, in the
form (X.Y) in Chapter X.

1.2.1. Groups
If G is a group and S ⊆ G then we denote by ⟨S⟩ the subgroup generated by S.
We write H ≤ G to indicate that H is a subgroup of G and H < G if H ≤ G and
H ≠ G. If H is isomorphic to a subgroup of G then we write H ≲ G. The symbol
|G : H| denotes the number |G|/|H|, and H ⊴ G denotes that H is normal in
G. A subgroup H ≤ G is subnormal in G, in notation H ⊴⊴ G, if there exists a
chain of subgroups H = H_0 ⊴ H_1 ⊴ · · · ⊴ H_k = G. If N ⊴ G and H ≤ G such
that N ∩ H = 1 and G = NH then we call H a complement of N in G.
The group of automorphisms, outer automorphisms, and inner automor-
phisms of G are denoted by Aut(G), Out(G), and Inn(G), respectively. We say
that G acts on a group H if a homomorphism ϕ : G → Aut(H ) is given. If ϕ is
clear from the context, for g ∈ G and h ∈ H we sometimes denote ϕ(g)(h), the
image of h under the automorphism ϕ(g), by h^g. If G acts on H and U ⊆ H
then U^G := {U^g | g ∈ G} is the orbit of U under the G-action, and ⟨U^G⟩ is
the G-closure of U. In the special case H = G, the group ⟨U^G⟩ is called the
normal closure of U. For U ≤ H, C_G(U) := {g ∈ G | (∀u ∈ U)(u^g = u)} is
the centralizer of U in G and N_G(U) := {g ∈ G | U^g = U} is the normalizer
of U in G. In particular, Z(G) := C_G(G) is the center of G.

The commutator of a, b ∈ G is [a, b] := a^{-1}b^{-1}ab and the conjugate of a
by b is a^b := b^{-1}ab. For H, K ≤ G, the commutator of H and K is defined
as [H, K] := ⟨[h, k] | h ∈ H, k ∈ K⟩. In particular, [G, G], also called the
derived subgroup of G, is denoted by G′. A group G is perfect if G = G′. The
derived series of G is the sequence D_0 ≥ D_1 ≥ · · · of subgroups of G, defined
recursively by the rules D_0 := G and D_{i+1} := (D_i)′ for i ≥ 0. The lower central
series L_0 ≥ L_1 ≥ · · · of G is defined as L_0 := G and L_{i+1} := [L_i, G] for
i ≥ 0. The upper central series Z_0 ≤ Z_1 ≤ · · · of G is defined as Z_0 := 1 and
Z_{i+1} is the preimage of Z(G/Z_i) in G for i ≥ 0. A group G is called solvable
if D_m = 1 for some m, and it is called nilpotent if L_m = 1 or Z_m = G for
some m.
The direct product of groups A_1, . . . , A_m is denoted by A_1 × · · · × A_m or by
∏_{i=1}^{m} A_i. For i = 1, 2, . . . , m, the projection function π_i : A_1 × · · · × A_m → A_i
is defined by the rule π_i : (a_1, . . . , a_m) ↦ a_i. A group H ≤ A_1 × · · · × A_m is
a subdirect product of the A_i if all functions π_i restricted to H are surjective,
i.e., {π_i(h) | h ∈ H} = A_i.
For H ≤ G, a transversal G mod H is a set of representatives from the right
cosets of H in G. For a fixed transversal T and g ∈ G, we denote by ḡ the coset
representative in T such that g ∈ H ḡ. Unless stated explicitly otherwise, cosets
always mean right cosets.
If Γ is any collection of simple groups, O_Γ(G) denotes the largest normal
subgroup of G such that each composition factor of O_Γ(G) is isomorphic to
a member of Γ, and O^Γ(G) denotes the smallest normal subgroup of G such
that each composition factor of G/O^Γ(G) is isomorphic to a member of Γ. In
particular, if Γ consists of a single group of prime order p then O_Γ(G) is denoted
by O_p(G); this is the largest normal p-subgroup, the p-core, of G. When Γ
consists of all cyclic simple groups, O_Γ(G) is denoted by O_∞(G); this is the
largest solvable normal subgroup, the solvable radical of G. Similarly, O^∞(G)
denotes the smallest normal subgroup of G with solvable factor group and it is
called the solvable residual of G. For H ≤ G, Core_G(H) := ⋂{H^g | g ∈ G}
is the largest normal subgroup of G contained in H ; it is the kernel of the
permutation representation of G on the (right) cosets of H . The socle of G
is the subgroup of G generated by all minimal normal subgroups of G and is
denoted by Soc(G).
The cyclic group of order n is denoted by Cn . The group of invertible d × d
matrices over the q-element field GF(q) is denoted by GLd (q). Similar notation
is used for the other classical matrix groups of Lie type and for their projective
factor groups: SLd (q), PSLd (q), and so on. The unitary groups GUd (q), SUd (q),
and PSUd (q) are defined over GF(q 2 ). For exceptional groups of Lie type, we
use the Lie-theoretic notation ²B₂(q), ²G₂(q), and so on. As mentioned earlier,
no detailed knowledge of the groups of Lie type is required in this book.

1.2.2. Permutation Groups


We shall use the cycle notation for permutations, and the identity permuta-
tion is denoted by ( ). The group of all permutations of an n-element set Ω is
denoted Sym(Ω), or Sn if the specific set is inessential. Subgroups of Sn are
the permutation groups of degree n. We use lowercase Greek letters to denote
elements of Ω; lower- and uppercase italics denote elements and subgroups
of Sn, respectively. For α ∈ Ω and g ∈ Sym(Ω), we write α^g for the image of
α under the permutation g. The alternating group on Ω is denoted by Alt(Ω)
(or An). The support of g ∈ Sym(Ω), denoted by supp(g), consists of those el-
ements of Ω that are actually displaced by g: supp(g) = {ω ∈ Ω | ω^g ≠ ω}.
The set of fixed points of g is defined as fix(g) := Ω\supp(g). The degree of g
is deg(g) = |supp(g)|.
We say that a group G acts on Δ if a homomorphism ϕ : G → Sym(Δ) is
given (by specifying the image of a generator set of G). This action is faithful if
its kernel ker(ϕ) is the identity. The image ϕ(G) ≤ Sym(Δ) is also denoted by
G^Δ. In the special case when G ≤ Sym(Ω), Δ ⊆ Ω is fixed by G, and ϕ is the
restriction of permutations to Δ, we also denote G^Δ by G|_Δ. The orbit of ω ∈ Ω
under G ≤ Sym(Ω) is the set of images ω^G := {ω^g | g ∈ G}. For Δ ⊆ Ω and
g ∈ Sym(Ω), Δ^g := {δ^g | δ ∈ Δ}. A group G ≤ Sym(Ω) is transitive on Ω if it
has only one orbit, and G is t-transitive if the action of G induced on the set of
ordered t-tuples of distinct elements of Ω is transitive (t ≤ n). The maximum
such t is the degree of transitivity of G.
If G ≤ Sym(Ω) is transitive and Δ ⊆ Ω, then Δ is called a block of imprimi-
tivity for G if for all g ∈ G either Δ^g = Δ or Δ^g ∩ Δ = ∅. The group G is called
primitive if all blocks have 0, 1, or |Ω| elements. If Δ is a block then the set of
images of Δ is a partition of Ω, which is called a block system, and an action of G
is induced on the block system. A block is called minimal if it has more than one
element and its proper subsets of size at least two are not blocks. A block is called
maximal if the only block properly containing it is Ω. A block system is maximal
if it consists of minimal blocks, whereas a block system is minimal if it consists
of maximal blocks. The action of G on a minimal block system is primitive.
For Δ ⊆ Ω and G ≤ Sym(Ω), G_(Δ) denotes the pointwise stabilizer of Δ,
namely, G_(Δ) = {g ∈ G | (∀δ ∈ Δ)(δ^g = δ)}. If Δ has only one or two elements,
we often drop the set braces and parentheses from the notation; in particular,
G_δ denotes the stabilizer of δ ∈ Ω. The setwise stabilizer of Δ is denoted
by G_Δ (i.e., G_Δ = {g ∈ G | Δ^g = Δ}). If Δ = (δ_1, . . . , δ_m) is a sequence of
elements of Ω then G_Δ denotes the pointwise stabilizer of that sequence (i.e.,
G_Δ = G_({δ_1,...,δ_m})).
A group G ≤ Sym(Ω) is semiregular if G_δ = 1 for all δ ∈ Ω, whereas G
is regular if it is transitive and semiregular. A Frobenius group is a transitive
group G ≤ Sym(Ω) that is not regular but for which G_{αβ} = 1 for all distinct
α, β ∈ Ω.
If g ∈ Sym(Ω) then a bijection ϕ : Ω → Δ naturally defines a permutation
ϕ̄(g) ∈ Sym(Δ) by the rule ϕ(ω)^{ϕ̄(g)} := ϕ(ω^g) for all ω ∈ Ω. We say that
G ≤ Sym(Ω) and H ≤ Sym(Δ) are permutation isomorphic, H ∼ G, if there
is a bijection ϕ : Ω → Δ such that ϕ̄(G) := {ϕ̄(g) | g ∈ G} = H.
Let G be an arbitrary group and let H ≤ S_k be a transitive permutation
group. The wreath product G ≀ H consists of the sequences (g_1, . . . , g_k; h)
where g_i ∈ G for i = 1, . . . , k and h ∈ H. The product of (g_1, . . . , g_k; h) and
(ḡ_1, . . . , ḡ_k; h̄) is defined as (g_1 ḡ_{1^h}, . . . , g_k ḡ_{k^h}; h h̄).

1.2.3. Algorithmic Concepts


Groups in algorithms will always be input and output by specifying a list of
generators.
Given G = ⟨S⟩, a straight-line program of length m reaching some g ∈ G
is a sequence of expressions (w1 , . . . , wm ) such that, for each i, wi is a symbol
for some element of S, or wi = (w j , −1) for some j < i, or wi = (w j , wk )
for some j, k < i, such that if the expressions are evaluated in G the obvious
way then the value of wm is g. Namely, the evaluated value of a symbol for a
generator is the generator itself; the evaluated value of wi = (w j , −1) is the
inverse of the evaluated value of w j ; and the evaluated value of wi = (w j , wk )
is the product of the evaluated values of w j and wk . Hence a straight-line
program is an encoding of a sequence of group elements (g_1, . . . , g_m) such that
g_m = g and for each i one of the following holds: g_i ∈ S, or g_i = g_j^{-1} for some
j < i, or g_i = g_j g_k for some j, k < i. However, the more abstract definition as
a sequence of expressions not only requires less memory but also enables us to
construct a straight-line program in one representation of G and evaluate it in
another, which is an important feature of some algorithms.
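The following Python sketch illustrates the definition; the encoding of the expressions (plain strings for generator symbols, tuples for inversions and products) and the stand-in operations mul and inv are conventions of this sketch, not the book's notation.

def evaluate_slp(program, gens, mul, inv):
    """Evaluate a straight-line program.

    `program` is a list whose entries are either a generator symbol,
    ('inv', j), or ('mul', j, k), where j, k index earlier entries.
    Returns the evaluated value of the last entry.
    """
    values = []
    for w in program:
        if isinstance(w, tuple) and w[0] == 'inv':
            values.append(inv(values[w[1]]))
        elif isinstance(w, tuple) and w[0] == 'mul':
            values.append(mul(values[w[1]], values[w[2]]))
        else:                                # a symbol for a generator
            values.append(gens[w])
    return values[-1]

# Example: reaching a*b^{-1}*a in the additive group of integers, where
# "multiplication" is + and "inversion" is negation (a toy black box).
g = evaluate_slp(['a', 'b', ('inv', 1), ('mul', 0, 2), ('mul', 3, 0)],
                 {'a': 5, 'b': 3}, lambda x, y: x + y, lambda x: -x)
print(g)   # 5 - 3 + 5 = 7

The same program can be evaluated in any other representation of the group by supplying different gens, mul, and inv, which is exactly the feature mentioned above.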
The symbols Z, N, and R denote the set of integers, nonnegative integers,
and real numbers, respectively. Let

F := { f : N → R | (∃n 0 ∈ N)(∀n > n 0 )( f (n) > 0)}

(i.e., functions that take positive values with finitely many exceptions). For
f ∈ F, we define

O(f) := {t ∈ F | (∃c > 0)(∃n_0 ∈ N)(∀n > n_0)(t(n) < c f(n))},

Ω(f) := {t ∈ F | (∃c > 0)(∃n_0 ∈ N)(∀n > n_0)(t(n) > c f(n))},
o(f) := {t ∈ F | (∀c > 0)(∃n_0 ∈ N)(∀n > n_0)(t(n) < c f(n))},

and

Θ(f) := O(f) ∩ Ω(f).

Stated less formally, t ∈ O(f) means that for large enough n, t(n)/f(n) is
bounded from above by an absolute constant c; t ∈ Ω(f) means that for large
enough n, t(n)/f(n) is bounded from below by an absolute positive constant c;
and t ∈ o(f) means that the limit of t(n)/f(n) is 0.
For t ∈ O(f), we also say that t is O(f), and we use similar statements for Ω
and Θ as well. We believe that this notation is more correct than the traditional
t = O( f ) (see [Brassard and Bratley, 1988, Chap. 2] for the different variants
of these notations).
We also introduce a “soft version” of the big-O notation. We write t ∈ O~(f)
if t(n) ≤ C f(n) log^c n for large enough n (where c, C are positive constants).
Logarithms are always of base 2.

1.2.4. Graphs
Let V be a set and E a subset of the two-element subsets of V . The pair (V, E)
is called a graph X (V, E). The elements of V and E are the vertices and the
edges of X , respectively. For v ∈ V , the number of edges containing v is the
degree or valency of v, which we shall denote by deg(v). A graph X is called
regular if all vertices have the same valency. We also say that an edge {u, v} ∈ E
connects u and v. The set N (v) := {u ∈ V | {u, v} ∈ E} is the neighborhood of
v in X . A graph X is called bipartite if V can be partitioned into two sets A, B
so that all edges of X connect some vertex in A with some vertex in B. The
automorphism group Aut(X ) consists of those permutations of V that leave E
invariant.
These notions can be generalized by requiring only that the elements of E are
subsets of V , but of arbitrary size. Then X is called a hypergraph. A hypergraph
X is uniform if all elements of E are of the same size.
Another generalization is when E consists of ordered pairs of V . Then X is
called a directed graph. If we want to emphasize that a graph is directed then
we shall use arrows above E and above the ordered pairs in E. The out-degree
−−→ 
of a vertex u is the number of edges (u, v) in E. For a directed graph X (V, E),
12 Introduction
the underlying graph of X is the graph U(V, E) with edge set E = {{u, v} |
−−→ 
(u, v) ∈ E}.
Let X (V, E) be a graph. A walk in X is a sequence of vertices (v0 , v1 , . . . , vk )
such that {v_i, v_{i+1}} ∈ E for all i ∈ [0, k − 1]. A path is a walk with v_i ≠ v_j for
all i, j with 0 ≤ i < j ≤ k; a cycle is a walk with v_0 = v_k and v_i ≠ v_j for all
i, j with 0 ≤ i < j ≤ k − 1. We can define a binary relation R on V by letting
(u, v) ∈ R if and only if there is a walk (u = v0 , v1 , . . . , vk = v). Then R is an
equivalence relation; the equivalence classes are called the components of X .
The graph X is connected if it has only one component.
If X is a directed graph then walks, cycles, and paths are defined similarly,
but the binary relation R is not necessarily an equivalence relation. We may
define another binary relation R̂ on V: (u, v) ∈ R̂ if and only if (u, v) ∈ R
and (v, u) ∈ R. Then R̂ is an equivalence relation, and its equivalence classes
are called the strongly connected components of X . The directed graph X is
strongly connected if it has only one strongly connected component.
Cycle-free graphs are called forests and connected, cycle-free graphs are
called trees. A rooted tree is a tree with a distinguished vertex. If X (V, E) is a
rooted tree with root r then the parent of v ∈ V \{r } is the first vertex after v
on the unique path from v to r . The children of v ∈ V are those vertices whose
parent is v. A vertex without children is called a leaf.
Let G be a group and S ⊆ G. The Cayley graph Γ(G, S) is defined to have
vertex set G; for g, h ∈ G, {g, h} is an edge if and only if gs = h for some
s ∈ S ∪ S^{-1}. The Cayley graph Γ(G, S) is connected if and only if G = ⟨S⟩.
Cayley graphs are regular and vertex-transitive (i.e., Aut(Γ(G, S)) is a transitive
subgroup of Sym(G)).
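As a concrete illustration, the following Python sketch (all helper names are inventions of this sketch) builds the Cayley graph of a small permutation group as an adjacency dictionary; permutations are stored as tuples acting on {0, . . . , n−1}.

def p_mul(p, q):                      # apply p first, then q: i -> q[p[i]]
    return tuple(q[p[i]] for i in range(len(p)))

def p_inv(p):
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def group_elements(gens):
    """List all elements generated by gens by closing under right
    multiplication (an orbit computation of the group on itself)."""
    identity = tuple(range(len(gens[0])))
    elems, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = p_mul(g, s)
            if h not in elems:
                elems.add(h)
                frontier.append(h)
    return elems

def cayley_graph(gens):
    S = set(gens) | {p_inv(s) for s in gens}        # S united with S^{-1}
    return {g: {p_mul(g, s) for s in S} for g in group_elements(gens)}

# S_3 generated by two adjacent transpositions (0-based: (0,1) and (1,2)):
graph = cayley_graph([(1, 0, 2), (0, 2, 1)])
print(len(graph), all(len(nbrs) == 2 for nbrs in graph.values()))   # 6 True
# Both generators are involutions, so every vertex has valency 2 and this
# Cayley graph is a 6-cycle: connected, regular, and vertex-transitive.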
Sequences will be denoted by enclosing their elements in parentheses. For a
sequence L , L[i] denotes the ith element of L.
The most abused notation is (a, b) for integers a and b. Depending on the
context, it may mean a sequence with elements a and b, the set of real numbers
between a and b, the set of integers between a and b (although, in this case,
we shall prefer to use the closed interval [a + 1, b − 1]), or the permutation
exchanging a and b.

1.3. Classification of Randomized Algorithms


As we shall see in numerous examples in this book, randomization can speed up
many algorithms handling permutation groups. In other parts of computational
group theory, randomization is even more important: For example, when deal-
ing with matrix groups, the scope of efficient deterministic algorithms is very
limited, both in the practical and theoretical sense of efficiency. Randomization
also seems to be an indispensable tool in algorithms for black-box groups.
In this section, we describe the different types of randomized algorithms.
Our discussion follows [Babai, 1997].
Computational tasks can be described by a relation R(x, y) between an input
string x and output string y. The relation R(x, y) holds if y is a correct output
for the task described by x. In group algorithms, the input usually contains
a set of generators for a group. The output may consist of group elements,
generating a desired subgroup of the input, but other types of output are also
conceivable: The output could be a number (for example, the order of the input
group) or a statement that group elements with the desired property do not exist
(for example, when asking for a group element conjugating one given element
to another one).
Note that there may be different correct outputs for the same input. We call a
computational task functional if it has exactly one correct output for all inputs.
For example, finding generators for a Sylow 2-subgroup is not a functional
computational task, whereas finding the order of a group is. A special category
of (functional) computational tasks is the class of decision problems. Here, the
answer is a single bit, representing “yes” or “no.” For example, determining
solvability of the input group is a decision problem.
A (correct) deterministic algorithm computes an output f (x) for all inputs
x, so that R(x, f (x)) holds. A randomized algorithm uses a string r of random
bits (“coin flippings”) and returns the output f (x, r ). The output may not be
correct for every sequence r .
We call a randomized algorithm Monte Carlo if for all inputs x,

Prob(R(x, f (x, r )) holds) ≥ 1 − ε,

where the value for the error term ε < 1/2 is specified in the description of the
Monte Carlo algorithm. In most practical situations, the reliability of the algo-
rithms can be improved by repeated applications, or by running the algorithms
longer. Although we cannot formulate a theorem in the general setting consid-
ered in this section, at least the following holds for all Monte Carlo algorithms
for permutation groups described in this book: The probability of an incorrect
answer can be bounded from above by an arbitrary ε > 0, prescribed by the
user. The associated cost is the running time of the algorithm multiplied by a
factor O(log(1/ε)).
A situation we can analyze here is the case of decision problems. Suppose that
we have a Monte Carlo algorithm for a decision problem, with error probability
ε < 1/2. Running the algorithm t times, and taking a majority vote of the results,
we can increase the reliability to at least 1 − δ^t, with δ := 2√(ε(1 − ε)) < 1:
The probability of error is at most

    Σ_{k=⌈t/2⌉}^{t} (t choose k) ε^k (1 − ε)^{t−k} < (1 − ε)^t (ε/(1 − ε))^{t/2} Σ_{k=⌈t/2⌉}^{t} (t choose k) < δ^t.
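The following Python sketch illustrates the amplification numerically; the base algorithm is simulated by a biased coin flip, which merely stands in for an actual Monte Carlo decision procedure.

import random
from collections import Counter

def one_run(true_answer, eps):
    # simulated Monte Carlo decision algorithm: errs with probability eps
    return true_answer if random.random() > eps else (not true_answer)

def majority_vote(true_answer, eps, t):
    votes = Counter(one_run(true_answer, eps) for _ in range(t))
    return votes[True] > votes[False]

eps, t, trials = 0.4, 51, 10_000
errors = sum(majority_vote(True, eps, t) is not True for _ in range(trials))
# Here delta = 2*sqrt(eps*(1-eps)) is roughly 0.98, so the bound delta**t is
# roughly 0.35; the observed error rate is usually far below this bound.
print(errors / trials)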

Babai also discusses one-sided Monte Carlo algorithms (1MC algorithms)
for decision problems. These are algorithms where at least one of the outputs
is guaranteed to be correct: If the correct answer is “yes,” a 1MC algorithm
may err; if the correct answer is “no,” the algorithm must always output “no.”
Hence, an output “yes” is always correct, whereas an output “no” may not be.
The co-1MC algorithms are defined by exchanging the words “yes” and “no”
in the definition of 1MC algorithms.
In the context of group theoretical algorithms, the notion of 1MC algorithms
can be extended to most computational tasks, since the error is usually one-
sided: For example, when computing generators U for the normal closure of
some H ≤ G (cf. Section 2.3.3), we never place an element of G on the gener-
ator list U that is not in ⟨H^G⟩. The error we may commit is that ⟨U⟩ is a proper
subgroup of ⟨H^G⟩. Another example is the Monte Carlo order computation in
permutation groups (cf. Section 4.5): The result we obtain is always a lower
bound for |G|, because we do not place permutations not belonging to G into
the strong generating set of G.
An important subclass of Monte Carlo algorithms is the class of Las Vegas
algorithms. This term was introduced in [Babai, 1979] to denote Monte Carlo
algorithms that never give an incorrect answer. The output is either correct (with
the prescribed probability at least 1 − ε) or the algorithm reports failure. Here,
ε may be any given constant less than 1, since the probability of an (always cor-
rect) output can be increased to at least 1 − ε^t by running the algorithm t times.
Las Vegas algorithms are preferable over general Monte Carlo algorithms
for many reasons. One of them is the certainty of the answer. Another one is
that recognizing the correct answer often allows early termination. Finally, if
we can guarantee that the output is always correct then we may use heuristics to
speed up the algorithm, even in cases when we cannot estimate the probability
of error for our heuristics.
In practice, if a Las Vegas algorithm reports failure then we rerun it until
the correct output is produced. The definition of Las Vegas algorithms can be
reformulated so that they always return a correct answer, with the running time
estimate replaced by an estimate of the expected running time.
A Monte Carlo algorithm, combined with a deterministic checking of the
result, becomes a Las Vegas algorithm, because we can recognize that the Monte
Carlo algorithm returned an incorrect output, and report failure. Another way to
obtain Las Vegas algorithms is by using both 1MC and co-1MC algorithms for
a decision problem. Running both of these algorithms, with probability at least
1 − ε one of them will return a guaranteed correct output. If both algorithms
return an answer that may not be correct, then we report failure. An important
direction of current research is the upgrading of Monte Carlo algorithms for
permutation groups to Las Vegas type. We shall describe a result in this direction
in Section 8.3.
2
Black-Box Groups

Groups can be given in different ways: The group elements can be, for exam-
ple, permutations, invertible matrices, or words in generators satisfying some
relations. Algorithms usually try to exploit the specific features of the given
representation. As an example, suppose that we have to compute the mth power
of some g ∈ G. If G is a permutation group then for large values of m the fastest
method to compute g m is the following. We compute the mth power of each
cycle of g separately, first reducing m modulo the cycle lengths. If C is a cycle
of length k and j is the remainder of m divided by k, then for all α ∈ C we have
α^{g^m} = α^{g^j}. The arithmetic operations required in the computation of j are much
cheaper than group operations. Unfortunately, no analogue of this method is
available in other representations of groups.
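For illustration, here is a minimal Python sketch of this cycle-by-cycle powering; the tuple encoding of permutations (0-based points, with g[i] the image of i) is an assumption of the sketch, not the book's convention.

def permutation_power(g, m):
    """Compute g^m cycle by cycle, reducing m modulo each cycle length."""
    n = len(g)
    result = [None] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        cycle, point = [], start          # collect the cycle through `start`
        while not seen[point]:
            seen[point] = True
            cycle.append(point)
            point = g[point]
        k = len(cycle)
        j = m % k                          # only j, not m, matters here
        for idx, point in enumerate(cycle):
            result[point] = cycle[(idx + j) % k]
    return tuple(result)

# s1 = (1,2)(4,5) from the example in Section 2.1.1, written 0-based:
g = (1, 0, 2, 4, 3)
print(permutation_power(g, 10**12))        # an even exponent gives the identity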
Another way to compute g^m is by repeated squaring. First, we compute g^{2^i}
for 1 ≤ i ≤ log m. Then, if the binary expansion of m is m = Σ_{i∈I} 2^i then
g^m = Π_{i∈I} g^{2^i}. This method requires O(log m) group operations and it is inde-
pendent of the actual representation of G: All we use is that – somehow – we
can compute the product of two given group elements.
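A sketch of repeated squaring in this spirit follows; the only operation used is a caller-supplied multiplication, so it applies to any representation. The function and parameter names are ad hoc choices of this sketch.

def power(g, m, identity, mul):
    """Compute g^m using O(log m) multiplications (black-box style)."""
    result, square = identity, g
    while m > 0:
        if m & 1:              # this bit occurs in the binary expansion of m
            result = mul(result, square)
        square = mul(square, square)
        m >>= 1
    return result

# Example with 3x3 integer matrices, a stand-in for the multiplication oracle:
mul = lambda a, b: [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                    for i in range(3)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
g = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
print(power(g, 5, identity, mul))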
In this chapter we deal with algorithms of the latter type. A black-box group
G is a group whose elements are encoded as strings of length at most N over
an alphabet Q, for some positive integer N and finite set Q. We do not require
that group elements have a unique representation as a string, and not all strings
need to correspond to group elements.
Group operations are performed by an oracle (the black box). Given strings
representing g, h ∈ G, we can

(i) compute a string representing gh;
(ii) compute a string representing g^{-1}; and
(iii) decide whether g = 1.
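A minimal sketch of such an oracle as a record of the three operations, instantiated for permutations, might look as follows; the class and field names are assumptions of this sketch.

from dataclasses import dataclass
from typing import Callable

@dataclass
class BlackBoxOracle:
    mul: Callable            # strings for g, h  ->  a string for gh
    inv: Callable            # a string for g    ->  a string for g^{-1}
    is_identity: Callable    # a string for g    ->  True iff g = 1

    def equal(self, g, h):
        # (i)-(iii) combined: g = h iff g h^{-1} = 1
        return self.is_identity(self.mul(g, self.inv(h)))

# Permutations on {0,...,n-1} stored as tuples give one instance of the oracle:
perm_oracle = BlackBoxOracle(
    mul=lambda p, q: tuple(q[p[i]] for i in range(len(p))),
    inv=lambda p: tuple(sorted(range(len(p)), key=lambda i: p[i])),
    is_identity=lambda p: all(p[i] == i for i in range(len(p))),
)
print(perm_oracle.equal((1, 0, 2), (1, 0, 2)))     # True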

It is possible that the oracle accepts strings representing elements of an over-
group Ḡ of G, and it performs the group operations in Ḡ. If this is the case then
we assume that we can decide whether a string represents an element of Ḡ, but
we do not assume that we can decide whether a string represents an element of
G or Ḡ\G.
Combining (i)–(iii), we can compare group elements. In practical situations,
usually it is possible to decide whether g = h directly, without computing gh −1
and then comparing it with 1. A black-box group algorithm is an algorithm that
does not use specific features of the group representation or particulars of how
the group operations are performed; it can use only the operations described in
the list above. Note that we have an upper bound for the order of G, namely,
|G| ≤ Σ_{i=1}^{N} |Q|^i ≤ N|Q|^N.
The definition we gave here is a slight generalization of the original one in
[Babai and Szemerédi, 1984]. There, only the alphabet Q = {0, 1} was allowed,
and the group elements had to be represented by strings of uniform length. It
is easy to reduce our more general definition to the original one: The elements
of Q can be encoded by 1 + log |Q| -long 0–1 sequences corresponding to
the binary representations of the numbers 1, 2, . . . , |Q|, and short strings can
be padded by zeros to achieve uniform length. Although we did not require
in the definition that group elements have a unique representation as a string
over the alphabet Q, in a number of practical situations this additional property
is satisfied. Unique representation allows ordering of group elements (for ex-
ample, by the lexicographic ordering of strings) that may speed up algorithms
dealing with lists of group elements (see, for example, Section 2.1.1).
Two examples of black-box groups are matrix groups (with a finite field as
Q) and permutation groups (with Q = {1, 2, . . . , n}). In these examples, the set
of all invertible matrices and the set of all permutations, respectively, are natural
overgroups. The oracles for group operations are the usual matrix and permuta-
tion multiplications and inverses. The strings representing group elements are of
uniform length, and each group element has a unique representation as a string.
Another example is the polycyclic representation of solvable groups (cf.
Section 7.2). In this case, the alphabet Q is a set of generators, and multiplication
is performed by a collection method. Every string up to length N represents a
group element, and group elements may be represented by more than one string.
However, each group element has a string representation in a normal form, the
so-called collected word. In fact, the oracle performs a group multiplication by
concatenating the input strings and computing the collected word corresponding
to this concatenation. By always using the collected word to represent group
elements, we can consider the black-box group as having unique representation.
For details, we again refer to Section 7.2.
In Section 5.3, we shall introduce another way to consider permutation groups
as black-box groups. Similarly to the case of polycyclic representations, the al-
phabet is a set of generators, and every string represents a group element. Again,
we shall have a normal form for representing group elements, the so-called stan-
dard word, and we can restrict ourselves to working with standard words. For
small-base permutation groups, a class of groups important for practical com-
putations, this new representation allows much faster group operations than
are possible with permutation multiplication. The ability to perform group op-
erations quickly offsets the disadvantage that we lose the information stored
implicitly in the cycle structure of permutations; all we can do is to perform the
black-box operations listed above.
An important example of black-box groups with no unique representation
of elements is when G is a factor group H/M. Here H can be any black-box
group, and the oracle for group operations in G is the oracle of H . However, to
satisfy condition (iii), we must be able to decide whether a string representing
an element of H is in M. For example, if M = Z (H ) then we can test g ∈ M
by checking whether g commutes with the generators of H . Another example
where we can test membership in M is when M = O∞ (H ) is the solvable
radical of H or M is the Fitting subgroup of H . In Sections 2.3 and 2.4,
we shall describe Monte Carlo algorithms for computing normal closures and
commutator subgroups. Using these methods, we can test whether g ∈ M by
computing ⟨g^H⟩ and the derived or lower central series of ⟨g^H⟩, respectively.
These last examples are oracles with the possibility of incorrect output, and
the membership computation in M is quite time consuming and so cannot be
applied frequently in practice. However, these examples play an important role
in the theoretical complexity investigations of matrix group algorithms.
Finitely presented groups, however, are not black-box groups. Although we
can perform group multiplications by simply concatenating the words repre-
senting g and h, there is no uniform bound for the word length. Even more
importantly, the innocent looking condition (iii) corresponds to the word prob-
lem, which is, by the celebrated results of Novikov and Boone [Rotman, 1995,
Chap. 12], in general undecidable.

2.1. Closure Algorithms


2.1.1. Orbit Computations
A frequently occurring situation is that a black-box group G acts on a permu-
tation domain Ω and we have to compute the orbit α^G := {α^g | g ∈ G} of some
α ∈ Ω. Naturally, the primary example is when the black-box group G is actually
a permutation group G ≤ Sym(Ω). Two other examples are when G = Ω and
we have to compute the conjugacy class of some element of G, or when Ω
consists of all subgroups of G (specified by sets of generators) and we need the
conjugacy class of a subgroup. These latter examples explain why we consider
orbit computations here rather than among the permutation group algorithms.
When working with permutation groups, we always assume that the generators
of G are input as permutations of Ω, while here we may have permutation
domains Ω that are too large to be listed explicitly and yet the orbit α^G is small
enough to be handled. Therefore, all we assume here is that given β ∈ Ω and
g ∈ G, we can compute the image β^g and that we can compare elements of Ω.
Note that the latter task may be nontrivial: In the case when we compute the con-
jugacy class of a subgroup, comparing elements of Ω amounts to the problem
of deciding whether two lists of generators define the same subgroup of G.
Suppose that G = ⟨S⟩. The orbit α^G is the smallest subset of Ω contain-
ing α and closed under the action of the generators of G. Hence, α^G can
be obtained by a standard algorithm for computing connected components in
graphs. We may define a directed graph D(Ω, E⃗) with vertex set Ω and edge
set E⃗ = {(β, γ) | β, γ ∈ Ω ∧ (∃g ∈ S)(β^g = γ)} (i.e., the ordered pair (β, γ) is an
edge if and only if one of the generators of G carries β to γ). Note that if β^g = γ
then γ^{g^k} = β for a suitable power g^k of g (because in this book we deal only
with finite groups), so the connected components of D are strongly connected.

The Basic Orbit Algorithm


The orbit α^G is the vertex set of a breadth-first-search tree rooted at α in the
directed graph D. Let L_0 := {α} and, recursively, define

    L_i := {γ ∈ Ω | (∃β ∈ L_{i−1})((β, γ) ∈ E⃗)} \ ⋃_{j<i} L_j.

In human language, L_i consists of those vertices that are endpoints of directed
edges starting in L_{i−1} but do not already occur in the previously defined sets L_j.
The L_i are called the levels of the breadth-first-search tree. The algorithm stops
when L_m = ∅ for some m. Then O := ⋃_{j<m} L_j is the connected component
of D(Ω, E⃗) containing α. Note that since O^s = O for all generators s ∈ S, we
also have O^g = O for all g ∈ G since any g ∈ G can be written as a product of
elements of S. Hence indeed O = α^G.
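A compact Python sketch of this orbit computation, run on the example that follows, might look as below; the dictionary encoding of permutations and the image callback are conventions of the sketch, and the set seen stands in for the look-up methods discussed in Theorem 2.1.1.

def orbit(alpha, generators, image):
    orb = [alpha]
    seen = {alpha}
    for beta in orb:                 # orb grows while we scan it
        for g in generators:
            gamma = image(beta, g)
            if gamma not in seen:
                seen.add(gamma)
                orb.append(gamma)
    return orb

# Permutations as dictionaries point -> image; unmoved points are implicit.
s1 = {1: 2, 2: 1, 4: 5, 5: 4}            # (1,2)(4,5)
s2 = {1: 4, 4: 5, 5: 1}                  # (1,4,5)
image = lambda beta, g: g.get(beta, beta)
print(sorted(orbit(4, [s1, s2], image)))  # [1, 2, 4, 5]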
As an example, let Ω = [1, 5] = {1, 2, 3, 4, 5}, G = ⟨S⟩ = ⟨s_1, s_2⟩ ≤ Sym(Ω)
with s_1 = (1, 2)(4, 5) and s_2 = (1, 4, 5), and let α = 4. The orbit algorithm
starts with setting L_0 := {4}. Then we compute the set L_0^S = {4^{s_1}, 4^{s_2}} = {5}
and L_1 := {5}\L_0 = {5}. Next, we compute L_1^S = {5^{s_1}, 5^{s_2}} = {1, 4} and L_2 :=
{1, 4}\(L_0 ∪ L_1) = {1}. In the next step, L_2^S = {1^{s_1}, 1^{s_2}} = {2, 4} and L_3 :=
{2, 4}\(L_0 ∪ L_1 ∪ L_2) = {2} are computed. Finally, L_3^S = {2^{s_1}, 2^{s_2}} = {1, 2} and
L_4 := {1, 2}\(L_0 ∪ L_1 ∪ L_2 ∪ L_3) = ∅ are computed. Since L_4 = ∅, the algo-
rithm terminates and outputs the orbit 4^G = L_0 ∪ L_1 ∪ L_2 ∪ L_3 = {1, 2, 4, 5}.
In implementations, we collect the elements of the sets L i in a list U . We
define U [1] := α and, successively for each element β ∈ U , compute β g for all
g ∈ S. If β g is not already in U then we add β g to U . The algorithm terminates
when all elements of U are processed.
The timing of the algorithm depends on |α G |. We have to compute the image
of each β ∈ α G under each g ∈ S, and so the total number of image computations
is |α G ||S|. The cost of an image computation depends heavily on the actual
representation of G. If the black-box group G is a permutation group then
computing β g for some β ∈  and g ∈ G is the basic operation whose time
requirement is the unit in running time estimates; in the other two examples
mentioned earlier, an image computation amounts to performing some group
operations to compute a conjugate of a group element or the conjugates of a list
of generators.
Also, we have to decide whether an image β g just computed already occurs
in the list U . As with image computations, the cost of this operation varies ac-
cording to the actual representation of the black-box group and Ω. We consider
three possible situations. In the first one, a superset Λ of manageable size is
known for α^G and the position of elements of α^G in Λ can be determined easily.
This occurs, for example, when G is a permutation group acting on Ω: We may
simply choose Λ := Ω. In the second situation, no such superset is available.
The third situation is a special case of the second: There is no superset at
our disposal, but a linear ordering of Ω is defined and we can compare any two
elements of Ω in this ordering. This latter situation occurs, for example, when
Ω = G and we have to compute the conjugacy class of some α ∈ G, and the
elements of the black-box group G have unique representations as strings. In
this case, we may simply consider the lexicographic ordering of strings as a
linear ordering of Ω.

Theorem 2.1.1. Suppose that a group G = ⟨S⟩ acts on a set Ω. Then the orbit
of some α ∈ Ω can be computed using O(|α^G||S|) image computations. More-
over,

(i) if a superset Λ ⊇ α^G is known in advance and it is possible to allocate
|Λ| units of memory then the image computations dominate the running
time;
(ii) if no such set is available, the additional cost is the comparison of
O(|α^G|^2|S|) pairs in Ω;
(iii) if comparison of elements also recognizes which comes first in a lin-
ear ordering on Ω, then the additional cost drops to the comparison of
O(|α^G| log^2|α^G| |S|) pairs.

Proof. (i) We define an auxiliary list L of size |Λ|. Each time an element is
added to U, we also mark the corresponding position in L. Thus, when we have
to decide whether some newly computed image γ = β^g is already in U, it is
enough to look up whether the position of γ in L is already marked.
(ii) If no such superset is available, we may compare a newly computed
image β^g to the already defined elements of U. This results in an algorithm
with O(|α^G|^2|S|) comparisons of elements of Ω.
(iii) Suppose that there is a linear ordering on Ω. If, besides the list U, we
maintain a list V containing a linear ordering of U then it is possible to decide
whether some γ ∈ Ω already occurs in U by a binary search of V, which is
fast; however, if γ ∉ U then the insertion of γ into the list V is potentially
very expensive, because we have to shift by one position all elements of the
list that are greater than γ. Therefore, instead of a long list V containing a
linear ordering of U, we maintain shorter lists V_1, V_2, . . . that contain linear
orderings of certain segments of U. Namely, if the binary expansion of |U| is
|U| = 2^{i_1} + 2^{i_2} + · · · + 2^{i_k} with i_1 > i_2 > · · · > i_k, then V_j contains the linear
ordering of the 2^{i_j} elements of U in positions U[l+1], U[l+2], . . . , U[l+2^{i_j}],
for l = 2^{i_1} + · · · + 2^{i_{j−1}}. Deciding whether some γ ∈ Ω already occurs in U can
be done by binary searches of all V_j, requiring O(i_1 + · · · + i_k) = O(log^2|U|)
comparisons.
If some γ ∈ Ω is added to U, the linear orderings V_j can be updated in the
following way. We create a new set V_{k+1} := {γ} and, starting with j := k and
proceeding recursively through decreasing values of j, we merge the linear
orders V_j and V_{j+1} if |V_j| = |V_{j+1}|. We stop when we encounter the first j
such that |V_j| > |V_{j+1}|. For example, if |U| = 55 then adding an element to
U triggers three merges, since 55 = 32 + 16 + 4 + 2 + 1 and 55 + 1 = 32 +
16 + 4 + 2 + (1 + 1) = 32 + 16 + 4 + (2 + 2) = 32 + 16 + (4 + 4) =
32 + 16 + 8.
We made a total of O(|α^G| log^2 |α^G| · |S|) comparisons while deciding whether
we should add some elements of Ω to U. In addition, we must make comparisons
when we merge some sets V_j and V_{j+1}. We claim that during the entire
orbit computation, the total number of comparisons at updating the V_j is
O(|α^G| log |α^G|). This claim follows from the observation that for each fixed
β ∈ U, if the set V_j containing β is merged then the new set containing β is
twice the size of the original. Hence, β is processed log |α^G| times. Because
|α^G| log |α^G| ∈ O(|α^G| log^2 |α^G| · |S|), we are done.
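The bookkeeping in part (iii) can be sketched as follows (a minimal Python illustration, assuming the elements of Ω are comparable Python objects; the class and its method names are ours). The sorted sublists play the role of the V_j, and an insertion merges sublists of equal length exactly as in the binary-addition example above.

```python
from bisect import bisect_left

class OrbitLookup:
    """Sorted sublists whose lengths are the powers of 2 in the binary expansion of |U|."""
    def __init__(self):
        self.sublists = []                     # the lists V_1, V_2, ...

    def contains(self, gamma):
        """Binary search in every sublist: O(log^2 |U|) comparisons."""
        for v in self.sublists:
            i = bisect_left(v, gamma)
            if i < len(v) and v[i] == gamma:
                return True
        return False

    def insert(self, gamma):
        """Add gamma as a new one-element list, then merge lists of equal length."""
        new, kept = [gamma], []
        for v in sorted(self.sublists, key=len):
            if len(v) == len(new):
                new = sorted(new + v)          # a linear-time merge in a real implementation
            else:
                kept.append(v)
        self.sublists = kept + [new]
```

A newly computed image β^g is appended to U only if contains returns False, followed by a call to insert.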
Remark 2.1.2. Another possibility for searching the list U is to use hash functions.
A hash function is a function h : Ω → K for some set K; in particular, h
computes a value associated to each element of U, and accordingly the elements
of U can be stored in |K| sets. When we have to decide whether a new ω ∈ Ω
is already in U, it is enough to compare ω with previous elements that have the
same hash value h(ω).
The efficiency of the method depends on finding a hash function that is easily
computable and such that relatively few elements of U get the same hash value.
Among elements with the same hash value, one of the methods described in
Theorem 2.1.1 may be applied, but hopefully on an input of much smaller size.
Finding good hash functions is the subject of extensive research (see [Knuth,
1973, Chap. 6]). Here we only mention that if the orbit U is expected to be a
“random” subset of Ω then choosing a few random bits works well in practice.
For example, if G is a matrix group of dimension d over GF(2), Ω = GF(2)^d,
and we need the orbit of some vector α ∈ Ω then we may try the hash function
that retrieves, for example, the middle ten coordinates of vectors.
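For the matrix group example just mentioned, such a hash is a one-liner; the sketch below (our own illustration, with vectors over GF(2) stored as Python tuples of 0/1 entries) keeps the orbit in buckets indexed by ten middle coordinates.

```python
def middle_bits_hash(v, width=10):
    """Read `width` middle coordinates of a GF(2)-vector and pack them into an integer."""
    start = max(0, (len(v) - width) // 2)
    return sum(bit << i for i, bit in enumerate(v[start:start + width]))

buckets = {}                       # hash value -> orbit elements with that value

def seen_before(v):
    """Return True if v is already recorded in the orbit; otherwise record it."""
    bucket = buckets.setdefault(middle_bits_hash(v), [])
    if v in bucket:                # comparisons happen only inside one bucket
        return True
    bucket.append(v)
    return False
```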

Remark 2.1.3. Using more sophisticated data structures (e.g., 2–3 trees; cf.
[Sedgewick, 1988, Chap. 15]), the number of comparisons in Theorem 2.1.1(iii)
can be decreased to O(|α^G| log |α^G| · |S|).

Often the information that some γ ∈  occurs in α G is not enough; we also


need an element h ∈ G such that α h = γ . Such elements can be obtained during
the orbit computation. For this purpose, besides U , we maintain a list W of
group elements such that for all i ≤ |U |, α W [i] = U [i]. When a new image β g
is added to U for some β = U [i] and g ∈ S, we define W [|U | + 1] := W [i]g.
The additional cost is that of |α G | group multiplications and, often more
restrictively, the storage of |α G | group elements. Hence, in practice, if β g is
added to U then we store in W [|U | + 1] only (a pointer to) g ∈ S. This method
saves storage, and we do not even have to perform group multiplications during
the orbit computation. We pay the price when an element h ∈ G with α h = γ
is actually needed. Using W , we have to trace back the preimages of γ until
we reach α and compute the product of generators encountered at the tracing.
Note that in the version mentioned in Theorem 2.1.1(i), the array of length | |
can be used to store pointers to generators, instead of our having to define an
additional array W .
The orbit algorithms can also be used to find the permutation action of the
generators of G on α G . For all γ ∈ α G and s ∈ S, we have computed γ s and
identified it with an element of α G . Keeping track of these identifications, we
obtain the desired permutation actions.
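The pointer-based variant is summarized in the following sketch (an illustration only, assuming a user-supplied action image(point, gen), a black-box multiply, and hashable points; the function names are ours):

```python
def orbit_with_pointers(alpha, gens, image):
    """Compute alpha^G for G = <gens>; for every new point record (previous point, generator index)."""
    U = [alpha]
    W = {alpha: None}                         # alpha is the root of the tree of pointers
    for beta in U:                            # U grows while we scan it
        for k, g in enumerate(gens):
            gamma = image(beta, g)
            if gamma not in W:
                W[gamma] = (beta, k)          # gamma was first reached from beta by gens[k]
                U.append(gamma)
    return U, W

def trace_back(gamma, W, gens, multiply, identity):
    """Recover h with alpha^h = gamma by tracing the pointers back to alpha."""
    h = identity
    while W[gamma] is not None:
        beta, k = W[gamma]
        h = multiply(gens[k], h)              # generators nearer to alpha act first
        gamma = beta
    return h
```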
Finally, we remark that for any A ⊆ Ω, the orbit algorithm can be used to compute
A^G := {α^g | α ∈ A, g ∈ G}. The only modification needed is to initialize
U := A.

2.1.2. Closure of Algebraic Structures


An important variant of orbit computations is the situation when the set Ω
on which the group G acts has an algebraic structure. Given A ⊆ Ω, we have
to determine ⟨A^G⟩, the smallest subset of Ω containing A that is closed for
the G-action and for the operation(s) of the algebraic structure on Ω. The
most important example occurring in this book is when Ω = G and we have to
compute the normal closure of some A ≤ G or, slightly more generally, when
G acts on a group Ω and we need the G-closure of some subgroup A ≤ Ω.
As customary when the output of an algorithm is closed for some algebraic
operations, we seek only generators for ⟨A^G⟩. This enables us to handle objects
that are too large to have all their elements listed.
In this section, we present a version that supposes that we are able to test
membership in the already constructed substructures of ⟨A^G⟩. Versions without
this assumption (using randomized algorithms) will be given in Sections 2.3
and 2.4.

The Closure Algorithm


Given G = ⟨S⟩ and A ⊆ Ω, generators for the G-closure ⟨A^G⟩ can be obtained
by a slight modification of the orbit algorithm. We collect the generators for
⟨A^G⟩ in a list U. We define U := A and, successively for each element h ∈ U,
compute h^g for all g ∈ S. If h^g is not in the algebraic structure ⟨U⟩ then we add
h^g to U. The algorithm terminates when all elements of U are processed. At
that moment, ⟨U⟩ contains A and is closed for the action of G on Ω. Also, each
element of U is of the form x^y for some x ∈ A and y ∈ G; therefore ⟨U⟩ = ⟨A^G⟩.
Note that each time an element was added to U, the algebraic structure ⟨U⟩
increased. Hence, if Ω has at least a group structure (which will be the case
in all applications) then Lagrange's theorem implies that |⟨U⟩| increased by an
integer factor and so the number of generators added to the generator list U for
⟨A^G⟩ is at most log |⟨A^G⟩|. The time-critical part of the algorithm is usually the
computation that allows us to test membership in ⟨U⟩.
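A direct transcription of this procedure (a sketch; conjugate(h, g) stands for the action h ↦ h^g, and is_member for whatever membership test is available in the substructure generated by the current list):

```python
def g_closure(A, S, conjugate, is_member):
    """Generators for <A^G>, where G = <S> acts via `conjugate`, using a membership test."""
    U = list(A)
    for h in U:                               # U grows while we scan it
        for g in S:
            hg = conjugate(h, g)
            if not is_member(hg, U):          # membership in the structure generated by U
                U.append(hg)
    return U
```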

Algorithms Based on Normal Closure


As a consequence, we obtain algorithms for tasks that are based on normal
closure computations. In particular, we can compute the derived series and lower
central series in a group and test solvability and nilpotence. Also, it can be determined
whether a given H ≤ G is subnormal in G. Recall that H is subnormal
in G if and only if the series of subgroups defined by G_0 := G, G_{i+1} := ⟨H^{G_i}⟩
for i ≥ 0 reaches H. Note, however, the difference in difficulty for these tasks
when no membership test is available. Because normal closure can be computed
by randomized algorithms, in each case we can construct generators for the
required series of subgroups. However, when testing solvability and nilpotence we
have to decide only whether a group in the subgroup chain is trivial, whereas when
testing subnormality we have to check whether some subgroup we constructed
is contained in H. The latter task does not seem possible without membership
testing.

2.2. Random Elements of Black-Box Groups


Randomized algorithms for groups frequently call for random elements. In per-
mutation groups, the situation is quite satisfactory after the basic data structure
bases and strong generators (cf. Chapter 4) are constructed: We can easily con-
struct uniformly distributed, independent random elements of the group, as a
product of randomly chosen coset representatives from the transversals along
the point stabilizer chain defined by the base and strong generating set. Choos-
ing a random element from a transversal amounts to the choice of a uniformly
distributed random integer in an interval, because we have to specify a position
in the list of transversal elements. This is possible if independent, uniformly dis-
tributed random bits are available: To choose an element from [1, k], we evaluate
a sequence of log k random bits as a number in base 2. In this book, we take the
availability of random bits for granted, although generating random numbers is
a large and important area of computer science (see [Knuth, 1969, Chap. 3]).
The situation is more complicated if we need random elements in a per-
mutation group without a strong generating set. The primary example for this
phenomenon is the construction of a strong generating set itself, but this need
may arise in other situations as well: We may need random elements of certain
subgroups and we do not want to spend time to construct strong generators for
these subgroups. In very special situations (cf. Exercises 2.1 and 2.2) we may
be able to construct uniformly distributed random elements, but in general, we
settle for less: We use a heuristic or try to obtain nearly uniformly distributed
random elements.
We say that an algorithm outputs an ε-uniformly distributed element x in
a group G if (1 − ε)/|G| < Prob(x = g) < (1 + ε)/|G| for all g ∈ G. Nearly
uniformly distributed means ε-uniformly distributed for some ε < 1/2.
The construction of nearly uniformly distributed elements is even more im-
portant in matrix groups, where no practical alternative of strong generating
sets is known. It turns out that all methods proposed so far for obtaining nearly
uniformly distributed elements in matrix groups work in the black-box group
setting; they only multiply and invert already available group elements.
We shall describe three methods for constructing random elements in black-
box groups. For this description, we need to define the notion of Markov chains.
A Markov chain M is a sequence (x_0, x_1, . . .) of random elements from a set
V = {v_1, . . . , v_n}. The elements of V are called the states of the Markov chain.
The Markov chain is specified by a vector a = (a_1, . . . , a_n) with nonnegative
entries and Σ_j a_j = 1 and by an n × n matrix P = (p_{ij}) with nonnegative
entries and unit row sums. The vector a describes the distribution of the initial
element x_0, namely, Prob(x_0 = v_i) = a_i. The matrix P contains the transition
probabilities; the entry p_{ij} is the conditional probability that x_{m+1} = v_j,
provided that x_m = v_i.
The matrix P defines a directed graph X on the vertex set V: the directed edge
(v_i, v_j) is present if and only if p_{ij} > 0. The Markov chain M can be interpreted as a
random walk on X, with starting point x_0. If the random walk arrived at v_i after
m steps then the probability that it continues with the edge (v_i, v_j) is p_{ij}.
We defined only Markov chains with a finite number of states and stationary
transition probabilities (i.e., the matrix P did not depend on m). The
higher transition probabilities p^{(k)}_{ij} are defined as the conditional probabilities
p^{(k)}_{ij} := Prob(x_{m+k} = v_j | x_m = v_i). It is easy to see that these are independent of
m (Exercise 2.3). We say that the state v_j can be reached from v_i if p^{(k)}_{ij} > 0 for
some k ≥ 0. The Markov chain is called irreducible if any state can be reached
from any state.
We say that the state v_i has period t > 1 if p^{(k)}_{ii} = 0 for all k that are not
multiples of t, and t is the greatest number with this property. If no period t > 1
exists then v_i is called aperiodic. In an irreducible Markov chain, all states are
aperiodic or all of them have the same period (Exercise 2.4).
Let u = (u_1, . . . , u_n) be a vector with positive coordinates and satisfying
Σ_j u_j = 1. We say that u is the stationary distribution for M if uP = u. The
major result we need concerning Markov chains is the following well-known
theorem (see, e.g., [Feller, 1968]).

Theorem 2.2.1. Let M be a finite state, irreducible, and aperiodic Markov


chain with n states. Then there exists a stationary distribution vector
(u 1 , . . . , u n ) for M and a constant δ ∈ (0, 1) such that for all states vi and
m ≥ 0, we have |Prob(xm = vi ) − u i | ≤ δ m .
Random Elements as Products of Generators
One method for constructing random elements in a black-box group G = ⟨S⟩ is
to take products x = s_1 s_2 · · · s_k, where each s_i is a randomly chosen element of
S ∪ S^{−1}, and where k is "sufficiently large." Such an x corresponds to a random
walk of length k on the Cayley graph Γ(G, S), starting from the identity element
of G. At each vertex v_i of the Cayley graph, the random walk chooses the next
vertex uniformly among the neighbors of v_i. The transition probability matrix
P of the Markov chain M associated with this random walk has entries

  p_{g,h} := 1/d  if {g, h} ∈ E(Γ(G, S)),  and  p_{g,h} := 0  otherwise,     (2.1)
where d is the common valency of the vertices of Γ(G, S). Because Γ(G, S)
is connected, this Markov chain is irreducible. We have p^{(2)}_{g,g} > 0 for all g ∈ G,
because we can walk to any neighbor of g and then back to g in two steps; in
fact, p^{(2)}_{g,g} = 1/d. Hence M is aperiodic, or each vertex has period 2. In fact,
period 2 is possible if Γ(G, S) is bipartite; so, to obtain an aperiodic Markov
chain M*, we consider lazy random walks on the Cayley graph. At each step, the
walk stays at its end vertex v with probability 1/2, or it continues to a neighbor
u of v with probability 1/(2d). The transition probability matrix P* has entries

  p*_{g,h} := 1/(2d)  if {g, h} ∈ E(Γ(G, S)),
  p*_{g,h} := 1/2     if g = h,
  p*_{g,h} := 0       otherwise.

Since p*_{g,g} > 0 for all g ∈ G, the Markov chain M* is aperiodic. Staying at a
vertex during the random walk means inserting the identity element of G in the
product g = s_1 s_2 · · · s_k, so the computation of random elements using M and
M* is actually the same. The difference is the length of products we consider:
It is the predetermined number k in M, but it is a random variable in M*.
Since M ∗ is irreducible and aperiodic, Theorem 2.2.1 applies. The stationary
distribution vector of M ∗ has entries 1/|G| in each coordinate (cf. Exercise 2.5),
so x converges to the uniform distribution on G. Since |Prob(x = g) − 1/|G|| <
δ^k for some 0 < δ < 1, stopping after k steps for k > (1 + log |G|)/log(1/δ)
means that we obtain a nearly uniformly distributed element of G. The problem
with this estimate is that although we have an upper bound for |G| (we can
always use |G| < N|Q|^N if the black-box group is encoded by strings of length
at most N over the alphabet Q), usually we do not have any idea about δ.
The Perron–Frobenius theorem implies that 1 is an eigenvalue of the transition
probability matrix P ∗ , and all other eigenvalues have absolute value less than 1.
The value of δ is roughly the second largest among the absolute values of
eigenvalues, but obtaining good estimates for this absolute value in an arbitrary
Cayley graph is quite difficult. Also, it is possible that the difference between 1
and the second largest absolute value of an eigenvalue, the eigenvalue gap of
P ∗ , is O(1/|G|), which means slow convergence to the uniform distribution.
Examples for slow convergence are abelian groups with large cyclic subgroups
or, more generally, groups with large abelian factor groups. Ideally, we would
like an eigenvalue gap polynomial in 1/ log |G|, as this means that we can
construct a nearly uniformly distributed group element as a product of length
bounded by a polynomial of the input length.
If an algorithm needs a sequence of random elements, a practical solution is
to construct a random walk of length k0 , for a large enough k0 with which the
user feels comfortable; after that, we continue the random walk and output the
sequence of vertices that are added to the walk.
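A sketch of this heuristic (our illustration; multiply, inverse, and identity are the assumed black-box operations, and k0 is the user-chosen burn-in length):

```python
import random

def lazy_walk(S, multiply, inverse, identity, k0=1000):
    """Yield elements along a lazy random walk on the Cayley graph of <S union S^{-1}>."""
    steps = list(S) + [inverse(s) for s in S]
    x = identity
    for _ in range(k0):                       # burn-in of length k0
        if random.random() < 0.5:             # move with probability 1/2, stay otherwise
            x = multiply(x, random.choice(steps))
    while True:                               # afterwards, output the successive vertices
        if random.random() < 0.5:
            x = multiply(x, random.choice(steps))
        yield x
```

After the burn-in, every further output costs at most one group multiplication.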

The Product Replacement Algorithm


Another method for generating random elements is the product replacement
algorithm from [Celler et al., 1995]. Let (x_1, . . . , x_m) be a sequence of elements
of G that generate G. We pick an ordered pair (i, j), i ≠ j, from the uniform
distribution of pairs with 1 ≤ i, j ≤ m, and we replace x_i either by the product
x_i x_j or by x_i x_j^{−1}. The resulting new sequence still generates G, because we can
recover x_i as the product or ratio of two elements of the new sequence. Product
replacement corresponds to a Markov chain M where the set V of states is the
set of m-long generating sequences of G. From each state, we can proceed in
2m(m − 1) ways to a new state, but it is possible that choosing different pairs
(i, j) or a different exponent in the product x_i x_j^{±1} carries us to the same new
state. Hence the positive transition probabilities may not have a common value,
as in (2.1).
Our goal is to show that, under some mild assumptions, the Markov chain M
associated with the product replacement algorithm is irreducible and aperiodic
(cf. Theorem 2.2.3). In the proof, we need the following simple lemma.

Lemma 2.2.2. Let X = (x1 , . . . , xm ) and Y = (y1 , . . . , ym ) be two states of


M that differ only in the kth coordinate (i.e., xi = yi for all i = k). Suppose
further that X \{xk } also generates G. Then Y can be reached from X .

Proof. Let z := xk−1 yk , and write z as a product z = xi1 · · · xil of elements from
X \{xk }. Starting with X and replacing the element in position k by the product
of elements in position k and i j for j = 1, . . . , l takes X to Y . 

Theorem 2.2.3. Let m_1 be the minimal size of a generating set of G and let
m_2 be the maximal size of a nonredundant generating set. If m ≥ m_1 + m_2
then the Markov chain M associated with the product replacement algorithm
is irreducible and aperiodic.

Proof. Let X = (x_1, . . . , x_{m_1}, 1, . . . , 1) ∈ V be fixed, where {x_1, . . . , x_{m_1}} is a
generating set of minimal size of G. First, we prove that the period of the
state X is 1. If G is not cyclic of prime order then m_2 ≥ 2 and so the transition
probability p_{X,X} is positive, since we can replace the last coordinate of X by the
product of the last two coordinates. If G is cyclic of prime order then X = (x, 1)
for some x ∈ G\{1}. The sequence ((x, 1), (x, x), (x, 1)) shows that p^{(2)}_{X,X} > 0
and the sequence ((x, 1), (x, x), (x^2, x), (x^2, x^{−1}), (x, x^{−1}), (x, 1)) shows that
p^{(5)}_{X,X} > 0; therefore, the period of X is 1 in this case as well. Because the period
of X is 1, if we prove that any state Y ∈ V can be reached from X and X can
be reached from Y then irreducibility and aperiodicity follow immediately.
First we show that if X* is a permutation of the sequence X then X and X* can
be reached from each other. It is enough to prove this if X* is obtained by a single
transposition from X, since any permutation is a product of transpositions. If
we want to exchange some x_i, x_j with i ≤ m_1 < j then we apply Lemma 2.2.2
to change the jth coordinate of X to x_i, and then, by the same lemma, change
the ith coordinate to 1. If we want to exchange some x_i, x_j with i, j ≤ m_1 then
first we exchange x_i with x_k for some k > m_1, then we exchange positions i
and j (we can do this because now we have the identity element in position i),
and finally we exchange positions j and k.
Let now Y = (y_1, . . . , y_m) be an arbitrary state, and let (y_{i_1}, . . . , y_{i_l}) be a
subsequence of Y containing a nonredundant generating set. Then l ≤ m_2. We
can reach Y from X by first reaching a permutation X* of X where the nontrivial
coordinates of X* comprise a subset of the positions [1, m]\{i_1, . . . , i_l}, then
applying Lemma 2.2.2 to change positions i_1, . . . , i_l of X* to y_{i_1}, . . . , y_{i_l}, and
then applying Lemma 2.2.2 again to change the entries in the positions
[1, m]\{i_1, . . . , i_l} to the values y_j. The reversal of these steps reaches X from Y.

Each element of V can be obtained in 2m(m −1) ways as a result of a product


replacement. Therefore, the transition probability matrix has constant column
sums and so the stationary distribution of M is uniform on the m-element
generating sets of G. As in the case of random walks on Cayley graphs, the
problem is that we cannot estimate the parameter δ, which shows the speed of
the convergence to the stationary distribution. Note that because the number
of states in this Markov chain is much larger than |G|, an eigenvalue gap
proportional to |V | would be disastrous. An additional difficulty is that we
need random elements of G, not random generating sets. The practical solution
2.2 Random Elements of Black-Box Groups 29
is to output a random element of the generating set created by the product
replacement algorithm. If a sequence of random elements is needed then we
perform k0 steps of product replacement to initialize a random generating set,
and at each further product replacement we output the element that is inserted
into the new generating set. Hence, after initialization, random elements are
created with the cost of one group operation. Note that the elements of G
are not necessarily distributed uniformly in generating sets (the case of cyclic
groups of nonprime order is simple enough to analyze).
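The whole procedure fits in a few lines (a sketch of the method as described above, not of the GAP implementation; multiply and inverse are the black-box operations, and at least two generators are assumed):

```python
import random

def product_replacement(gens, multiply, inverse, k0=100):
    """After k0 initializing replacements, yield the element inserted at each further step."""
    x = list(gens)
    m = len(x)

    def replace():
        i, j = random.sample(range(m), 2)           # an ordered pair (i, j) with i != j
        y = x[j] if random.random() < 0.5 else inverse(x[j])
        x[i] = multiply(x[i], y)                    # x_i <- x_i * x_j^{±1}
        return x[i]

    for _ in range(k0):
        replace()
    while True:
        yield replace()
```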
Despite the difficulties mentioned in the previous paragraph, the product
replacement algorithm performs amazingly well in practice. In matrix groups
of dimension in the low hundreds, the GAP implementation uses about k0 = 100
product replacements to initialize. After that, the distribution of the sequence of
random elements output by the algorithm seems to be close to the distribution in
the entire group with respect to the properties most frequently needed in matrix
group algorithms. [Celler et al., 1995] contains statistical analysis of the output
in some linear groups and sporadic groups. Serious efforts have been made to
estimate the rate of convergence in product replacement (cf. the survey [Pak,
2001] and its references).
The third method for generating random elements is from [Babai, 1991]. We
cite the following theorem without proof.

Theorem 2.2.4. Let c, C > 0 be given constants, and let ε = K −c where K is a


given upper bound for the order of a group G. There is a Monte Carlo algorithm,
that, given any set S of generators of G, sets up a data structure for the construc-
tion of ε-uniformly distributed elements at a cost of O(log5 K + |S| log log K )
group operations. The probability that the algorithm fails is at most K −C .
If the algorithm succeeds, it permits the construction of ε-uniformly dis-
tributed random elements of G at a cost of O(log K ) group operations per
random element.

The algorithm mentioned in Theorem 2.2.4 has not been implemented,
because O(log^5 K) group operations are prohibitively expensive in practice.
This is especially true in the case of matrix groups, which is the potentially
most important application of black-box group techniques. The significance of
Theorem 2.2.4 is in theoretical investigations: It allows, in polynomial time in
the input length, the construction of random elements that are provably nearly
uniformly distributed.
In certain situations, both in theoretical investigations and in practice, we can
avoid the use of random elements by applying random subproducts, which can
emulate some properties of truly random elements. This is the topic of our next
section.

2.3. Random Subproducts


2.3.1. Definition and Basic Properties
A random subproduct of a sequence of group elements (g_1, g_2, . . . , g_k) is a
product of the form g_1^{ε_1} g_2^{ε_2} · · · g_k^{ε_k}, where the ε_i are chosen independently from
the uniform distribution over {0, 1}. Slightly more generally, a random subproduct
of length l is defined by selecting a random subset of size l of the g_i
and forming a random subproduct of them. In some applications, we use the
ordering inherited from the sequence (g_1, g_2, . . . , g_k); in other ones, we also
construct a random ordering of the chosen l-element subset.
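In code, a random subproduct is immediate (a sketch; multiply and identity are the assumed black-box operations, and the optional argument l implements the length-l variant with a freshly shuffled subset):

```python
import random
from functools import reduce

def random_subproduct(gens, multiply, identity, l=None):
    """g_1^{e_1} ... g_k^{e_k} with independent e_i uniform in {0, 1}."""
    chosen = list(gens)
    if l is not None:
        chosen = random.sample(chosen, l)     # random l-element subset in a random order
    factors = [g for g in chosen if random.random() < 0.5]
    return reduce(multiply, factors, identity)
```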
Random subproducts were introduced in [Babai et al., 1988] to speed up
certain orbit computations. (The journal version of this paper is [Babai et al.,
1997b].)

Lemma 2.3.1. Suppose that G = ⟨S⟩ acts on a set Ω and let Δ ⊆ Ω. Let g be
a random subproduct of the elements of S (in any fixed ordering of S). If Δ is
not closed for the action of G then Prob(Δ^g ≠ Δ) ≥ 1/2.

Proof. Let S = {g_1, . . . , g_k}, g = g_1^{ε_1} g_2^{ε_2} · · · g_k^{ε_k}, and p_i = g_1^{ε_1} g_2^{ε_2} · · · g_i^{ε_i} for
0 ≤ i ≤ k. If Δ is not closed for the G-action then Δ^{g_i} ≠ Δ for some g_i ∈ S.
Let us consider the largest index i with this property. If p_{i−1} fixes Δ then p_i does
not fix Δ with probability 1/2, since if ε_i = 1 then Δ^{p_i} = Δ^{p_{i−1} g_i} = Δ^{g_i} ≠ Δ.
The other case is that p_{i−1} does not fix Δ. Then with probability at least 1/2
neither does p_i, since if ε_i = 0 then Δ^{p_i} = Δ^{p_{i−1}} ≠ Δ. Furthermore, if Δ^{p_i} ≠ Δ
then Δ^g ≠ Δ, since all g_j, j > i, fix Δ. Formally,

  Prob(Δ^g ≠ Δ) = Prob(Δ^{g_1^{ε_1} g_2^{ε_2} ··· g_i^{ε_i}} ≠ Δ)
    ≥ Prob(ε_i = 1 | Δ^{g_1^{ε_1} ··· g_{i−1}^{ε_{i−1}}} = Δ) Prob(Δ^{g_1^{ε_1} ··· g_{i−1}^{ε_{i−1}}} = Δ)
    + Prob(ε_i = 0 | Δ^{g_1^{ε_1} ··· g_{i−1}^{ε_{i−1}}} ≠ Δ) Prob(Δ^{g_1^{ε_1} ··· g_{i−1}^{ε_{i−1}}} ≠ Δ) = 1/2.

(As usual, Prob(A|B) denotes conditional probability.)

The systematic development of the random subproduct method started in
1991, in the conference proceedings version of [Babai et al., 1995]. Most
applications are based on the following corollary of Lemma 2.3.1.
Lemma 2.3.2. Suppose that G = ⟨S⟩ and K < G. Let g be a random subproduct
of the elements of S. Then Prob(g ∉ K) ≥ 1/2.

Proof. G acts transitively (by right multiplication) in its regular representation
on Ω := G. The subset Δ := K of Ω is not closed for this G-action;
hence, by Lemma 2.3.1, Prob(Kg ≠ K) ≥ 1/2. (Note that, in this context,
Kg = {kg | k ∈ K}, not the conjugate of K by g.) To finish the proof, observe
that Kg ≠ K implies that g ∉ K.

Numerous algorithms that use uniformly distributed random elements exploit
the fact that, if K is a proper subgroup of G and g is a random element
of G, then Prob(g ∉ K) = 1 − 1/|G : K| ≥ 1/2. One of the applications of random
subproducts is to provide an efficient alternative in these algorithms via
Lemma 2.3.2.
The applications of Lemma 2.3.2 use the following technical lemma from
[Babai et al., 1995]. The proof we give is by A. Zempléni.

Lemma 2.3.3. Let X_1, X_2, . . . be a sequence of 0–1 valued random variables
such that Prob(X_i = 1) ≥ p for any values of the previous X_j (but X_i may
depend on these X_j). Then, for all integers t and 0 < ε < 1,

  Prob( Σ_{i=1}^{t} X_i ≤ (1 − ε)pt ) ≤ e^{−ε^2 pt/2}.

Proof. The proof is based on Chernoff's bound (cf. [Chernoff, 1952]). This
states that if Y_1, Y_2, . . . are independent Bernoulli trials (coin flippings) with
Prob(Y_i = 1) = p then

  Prob( Σ_{i=1}^{t} Y_i ≤ (1 − ε)pt ) ≤ e^{−ε^2 pt/2}.     (2.2)

Hence, to prove the lemma, it is enough to show that for all integers k, t,

  Prob( Σ_{i=1}^{t} X_i ≥ k ) ≥ Prob( Σ_{i=1}^{t} Y_i ≥ k ).     (2.3)
We prove (2.3) by induction on t. The initial case is obvious. Supposing (2.3)
for t, we have

  Prob( Σ_{i=1}^{t+1} X_i ≥ k )
    = Prob( Σ_{i=1}^{t} X_i ≥ k ) + Prob( X_{t+1} = 1 | Σ_{i=1}^{t} X_i = k − 1 ) Prob( Σ_{i=1}^{t} X_i = k − 1 )
    ≥ Prob( Σ_{i=1}^{t} X_i ≥ k ) + p Prob( Σ_{i=1}^{t} X_i = k − 1 )
    = p Prob( Σ_{i=1}^{t} X_i ≥ k − 1 ) + (1 − p) Prob( Σ_{i=1}^{t} X_i ≥ k )
    ≥ p Prob( Σ_{i=1}^{t} Y_i ≥ k − 1 ) + (1 − p) Prob( Σ_{i=1}^{t} Y_i ≥ k ).     (2.4)

We used the inductive hypothesis in the last inequality. Doing the steps of (2.4)
backward using the Y_i, and noting that the independence of the Y_i means that
we have equality at each step, we obtain

  p Prob( Σ_{i=1}^{t} Y_i ≥ k − 1 ) + (1 − p) Prob( Σ_{i=1}^{t} Y_i ≥ k ) = Prob( Σ_{i=1}^{t+1} Y_i ≥ k ),

which finishes the proof of (2.3).

Primarily, we shall apply Lemma 2.3.3 in the following situation. Suppose
that we know an upper bound l_H for the length of subgroup chains in
a group H and we have a sequence of elements (x_1, . . . , x_s) from H with
the following property: For all i ≤ s, if ⟨x_1, . . . , x_{i−1}⟩ ≠ H then Prob(x_i ∉
⟨x_1, . . . , x_{i−1}⟩) ≥ p. Then, if we define X_i = 0 if and only if ⟨x_1, . . . , x_{i−1}⟩ ≠ H
and x_i ∈ ⟨x_1, . . . , x_{i−1}⟩, and X_i = 1 otherwise, then the condition Σ X_i ≥ l_H
means that the x_i's generate H. Indeed, if Σ X_i ≥ l_H and ⟨x_1, . . . , x_s⟩ ≠ H then
the subgroup chain 1 ≤ ⟨x_1⟩ ≤ ⟨x_1, x_2⟩ ≤ · · · ≤ ⟨x_1, . . . , x_s⟩ ≤ H increases
strictly more than l_H times, which is a contradiction.
We shall refer to such applications of Lemma 2.3.3 as applications of basic
type. In some cases, we need more complicated variants of the method; here we
wanted to present only the basic idea, without including too many hypotheses.
2.3.2. Reducing the Number of Generators
As a first application of random subproducts, we describe algorithms to handle
the following situation. Given a black-box group G = ⟨S⟩, suppose that an upper
bound l_G is known for the length of subgroup chains in G. (If the elements of
G are encoded as strings of length at most N over an alphabet Q then we have
|G| ≤ N|Q|^N, and so l_G ≤ log(N|Q|^N).) If |S| > l_G then the generating set
S is redundant; if |S| is much larger than l_G then we want to find a smaller
generating set.
If membership testing is possible in subgroups of G then the reduction
of the generating set can be done by a deterministic algorithm. Namely,
if S = {g_1, . . . , g_k} then, recursively for i = 1, 2, . . ., determine whether
g_{i+1} ∈ ⟨g_1, . . . , g_i⟩. If so, then g_{i+1} is discarded. The remaining elements of
S form a generating set of size at most l_G.
In the case when no membership testing is available, there are Monte Carlo
algorithms that, with high probability, construct a small generating set.

Lemma 2.3.4. Suppose that a generating set S and an upper bound l_G for
the length of subgroup chains for a group G are given, and let δ > 0 be arbitrary.
Then there is a Monte Carlo algorithm that, with probability at least
1 − δ, constructs a generating set T of size O(l_G), using O(|S| l_G) group
operations.

Proof. Let the set T consist of c l_G random subproducts made from S, where
the constant c depends on the desired error probability. We claim that

  Prob(⟨T⟩ = G) ≥ 1 − e^{−(1 − 2/c)^2 c l_G/4}.     (2.5)

Let r_1, r_2, . . . , r_{c l_G} denote the elements of T. If ⟨r_1, . . . , r_{i−1}⟩ ≠ G then, by
Lemma 2.3.2, r_i ∉ ⟨r_1, . . . , r_{i−1}⟩ with probability at least 1/2. Hence an application
of Lemma 2.3.3 of basic type with the parameters ε = 1 − 2/c, p = 1/2,
and t = c l_G proves (2.5).

Remark 2.3.5. As is clear from (2.5), to achieve error probability less than δ we
may choose the constant c := max{4, 16 ln(δ^{−1})/l_G}. Another interpretation is
that we construct a new generating set of size 4l_G, with the reliability of the
algorithm improving exponentially as a function of l_G. In a similar way, we
can compute explicit values for the constants in all algorithms presented in
Sections 2.3.3 and 2.3.4 as well. In Remark 2.3.13, we shall comment further
on the error probability in our algorithms.
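With the random_subproduct sketch from Section 2.3.1, the algorithm of Lemma 2.3.4 is a two-liner (the choice of c follows Remark 2.3.5; delta is the desired error probability):

```python
import math

def reduce_generators(S, l_G, multiply, identity, delta=0.01):
    """Monte Carlo: about c*l_G random subproducts of S, generating <S> with probability >= 1 - delta."""
    c = max(4, math.ceil(16 * math.log(1 / delta) / l_G))
    return [random_subproduct(S, multiply, identity) for _ in range(c * l_G)]
```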
The following theorem from [Babai et al., 1995] also constructs a generating
set of size O(l_G), while reducing the number of necessary group multiplications
significantly. The idea is that at the beginning of the construction of a new
generating set, even short random subproducts have a good chance of augmenting
the already constructed subgroup.

Theorem 2.3.6. Suppose that a generating set S and an upper bound l_G for
the length of subgroup chains for a group G are given, and let δ > 0 be arbitrary.
Then there is a Monte Carlo algorithm that, with probability at least
1 − δ, constructs a generating set T of size O(l_G), using O(|S| log l_G) group
operations.

Proof. We can suppose that l_G is a power of 2 and also that l_G divides |S|
because we may put the first (|S| mod l_G) elements of S into T and work with
the subgroup of G generated by the remaining elements of S. Also, we can
suppose |S| ≥ 10 l_G; otherwise we can define T := S.
The elements of T are constructed in two phases. In the first phase, we place
c l_G random subproducts of length |S|/l_G made from S into T, where c ≥ 10 is
a constant depending on the desired error probability. We claim that, after this
phase,

  Prob(|{g ∈ S | g ∉ ⟨T⟩}| < l_G) ≥ 1 − e^{−c l_G/15}.     (2.6)

To see (2.6), we use an extension of the idea behind the basic-type applications
of Lemma 2.3.3. Let r_1, r_2, . . . be the sequence of random subproducts of length
|S|/l_G placed in T and associate a 0–1 valued random variable X_i to each r_i. Set
X_i = 1 if and only if either r_i ∉ ⟨r_1, . . . , r_{i−1}⟩ or |{g ∈ S | g ∉ ⟨r_1, . . . , r_{i−1}⟩}| <
l_G. Clearly, if Σ_{i ≤ c l_G} X_i > l_G then fewer than l_G elements of S are not in
⟨T⟩, since otherwise the initial segments of the r_i define a subgroup chain of
length greater than l_G in G. Hence, it is enough to give an upper estimate for
Prob(Σ_{i ≤ c l_G} X_i ≤ l_G).
To this end, note that Prob(X_i = 1) > 3/10. This is true because if |{g ∈ S |
g ∉ ⟨r_1, . . . , r_{i−1}⟩}| < l_G then Prob(X_i = 1) = 1. Otherwise, with probability
at least

  1 − \binom{|S| − l_G}{|S|/l_G} / \binom{|S|}{|S|/l_G} ≥ 1 − ((|S| − |S|/l_G)/|S|)^{l_G} > 1 − 1/e > 3/5,

the set of size |S|/l_G used at the definition of r_i contains an element of
S\⟨r_1, . . . , r_{i−1}⟩. Hence, by Lemma 2.3.2, Prob(r_i ∉ ⟨r_1, . . . , r_{i−1}⟩) > 3/10. Finally,
applying Lemma 2.3.3 with the parameters ε = 1 − 10/(3c), p = 3/10,
and t = c l_G, we obtain

  Prob( Σ_{i=1}^{c l_G} X_i ≤ l_G ) ≤ e^{−(1/2)(1 − 10/(3c))^2 (3/10) c l_G} ≤ e^{−c l_G/15},

which implies (2.6).


In the second phase, there are log l_G rounds. In the ith round, our goal is to
reduce the number |{g ∈ S | g ∉ ⟨T⟩}| to less than l_G/2^i. To this end, we fix a
constant c' ≥ 100 and in the ith round we add c'l_G/2^i random subproducts of
length 2^{i−1}|S|/l_G, made from S, to the set T.
Suppose that, after the (i − 1)st round, |{g ∈ S | g ∉ ⟨T⟩}| < l_G/2^{i−1}. We
claim that, after the ith round,

  Prob(|{g ∈ S | g ∉ ⟨T⟩}| < l_G/2^i) > 1 − e^{−c'l_G/(2^i · 64)}.     (2.7)

To prove (2.7), denote by T_{i−1} the set T after round i − 1 and by q_1, q_2, . . .
the random subproducts of length 2^{i−1}|S|/l_G added to T in round i. Let
S_{ij} := {g ∈ S | g ∉ ⟨T_{i−1}, q_1, . . . , q_{j−1}⟩}. Applying our usual trick, associate a
0–1 valued random variable Z_j to each q_j. Now Z_j = 1 if and only if either
q_j (considered as a product of elements of S) contains exactly one element
of S_{ij} or |S_{ij}| < l_G/2^i. If q_j contains exactly one element, say h, of S_{ij} then
h ∈ ⟨T_{i−1}, q_1, . . . , q_{j−1}, q_j⟩. Therefore, if Σ_{j ≤ c'l_G/2^i} Z_j > l_G/2^i then fewer than
l_G/2^i elements of S are not in ⟨T_i⟩, because the number of such elements cannot
decrease more than l_G/2^i times and still remain at least l_G/2^i. Hence, it is enough
to give an upper estimate for Prob(Σ_{j ≤ c'l_G/2^i} Z_j ≤ l_G/2^i).
We claim that Prob(Z_j = 1) ≥ 1/20. To see this, let x = |S_{ij}|. If x < l_G/2^i
then Z_j = 1. Otherwise, the subset of S of size 2^{i−1}|S|/l_G used at the definition
of q_j contains exactly one element h from S_{ij} with probability

  x \binom{|S| − x}{2^{i−1}|S|/l_G − 1} / \binom{|S|}{2^{i−1}|S|/l_G}
    = x · (2^{i−1}|S|/l_G) · 1/(|S| − x + 1) · ∏_{k=0}^{x−2} (|S| − 2^{i−1}|S|/l_G − k)/(|S| − k)
    > (x 2^{i−1}/l_G)(|S|/(|S| − x + 1)) ((|S| − 2^{i−1}|S|/l_G − x + 2)/(|S| − x + 2))^{x−1}
    > (1/2)(1 − (2^{i−1}|S|/l_G)/(|S| − x + 2))^{x−1}
    > (1/2)(1 − (2^{i−1}|S|/l_G)/(9|S|/10))^{l_G/2^{i−1} − 1} > 1/10,

and then h has 1/2 chance to get exponent 1 in q_j. If h gets exponent 1
then h ∈ ⟨T_{i−1}, q_1, . . . , q_j⟩. Finally, applying Lemma 2.3.3 with the parameters
ε = 1 − 20/c', p = 1/20, and t = c'l_G/2^i, we obtain

  Prob( Σ_{j ≤ c'l_G/2^i} Z_j ≤ l_G/2^i ) ≤ e^{−(1/2)(1 − 20/c')^2 c'l_G/(2^i · 20)} < e^{−c'l_G/(2^i · 64)},

which finishes the proof of (2.7).
which finishes the proof of (2.7).


Combining the two phases, the algorithm constructs a generating set of size
(c + c')l_G with error probability at most e^{−c l_G/15} + Σ_i e^{−c'l_G/(2^i · 64)} <
e^{−c l_G/15} + e^{−c'/100}. The number of group operations is O(l_G(|S|/l_G)) = O(|S|)
in the first phase and

  O( Σ_{i=1}^{log l_G} (l_G/2^i)(2^{i−1}|S|/l_G) ) = O(|S| log l_G)

in the second.

Lemma 2.3.7. Suppose that a black-box group G = ⟨S⟩ acts transitively on a
set Ω, and let δ > 0 be arbitrary. With probability at least 1 − δ, O(log |Ω|)
random subproducts made from S generate a subgroup of G that acts transitively
on Ω.

Proof. Let g_1, . . . , g_t be random subproducts made from S (using an arbitrary
but fixed ordering of S), where t = c ln |Ω| for some sufficiently large constant
c. We can suppose that c ≥ 45. For i ∈ [0, t], let G_i := ⟨g_1, . . . , g_i⟩, and let N_i
denote the number of orbits of G_i on Ω. We claim that, for all i ∈ [1, t],

  if N_{i−1} > 1 then Prob(N_i ≤ (7/8)N_{i−1}) ≥ 1/3.     (2.8)

Indeed, let Δ_1, . . . , Δ_k, k = N_{i−1} > 1, be the orbits of G_{i−1} on Ω. For each
j ∈ [1, k], let X_j be a 0–1 valued random variable with X_j = 1 if Δ_j^{g_i} ≠ Δ_j,
and X_j = 0 otherwise. By Lemma 2.3.1, Prob(X_j = 1) ≥ 1/2, and so for the
random variable X = Σ_{j=1}^{k} X_j we have E(X) ≥ k/2. (As usual, E(A) denotes
the expected value of a random variable A.) Let p := Prob(X ≤ k/4). Then
pk/4 + (1 − p)k ≥ E(X) ≥ k/2, implying p ≤ 2/3. Hence, with probability
at least 1/3, at least k/4 orbits of G_{i−1} are proper subsets of an orbit of G_i.
However, this means that at least k/8 orbits of G_i are the union of more than
one orbit of G_{i−1}, implying (2.8).
Let Y_1, . . . , Y_t be 0–1 valued random variables, with Y_i = 0 if N_{i−1} > 1 and
N_i > 7N_{i−1}/8, and Y_i = 1 otherwise. By (2.8), we have Prob(Y_i = 1) ≥ 1/3, and
Σ_{i=1}^{t} Y_i ≥ ln |Ω|/ln(8/7) implies that G_t is transitive. Applying Lemma 2.3.3
with the parameters ε = 1 − 3/(c ln(8/7)), p = 1/3, and t = c ln |Ω|, we obtain

  Prob( Σ_{i=1}^{t} Y_i ≤ ln |Ω|/ln(8/7) ) ≤ e^{−(1/2)(1 − 3/(c ln(8/7)))^2 (1/3) c ln |Ω|} ≤ e^{−c ln |Ω|/24}.     (2.9)

In the last inequality, we have used our assumption that c ≥ 45.

2.3.3. Closure Algorithms without Membership Testing


As promised in Section 2.1.2, we give versions of the G-closure algorithm
without using membership testing. The situation we consider is the following.
A black-box group G = ⟨S⟩ acts on a (not necessarily different) black-box
group H. We suppose that given any h ∈ H and g ∈ G, we can compute the
image h^g ∈ H. Also, we suppose that an upper bound l_H is known for the
length of subgroup chains in H. Given A ⊆ H, our goal is to find generators
for ⟨A^G⟩ ≤ H.
The breadth-first-search technique of Section 2.1.2 found generators for a
subgroup chain ⟨A⟩ = ⟨U_0⟩ ≤ ⟨U_1⟩ ≤ · · · ≤ ⟨U_{l_H}⟩ = ⟨A^G⟩, where U_i generated
the group ⟨U_{i−1}, {u^g | u ∈ U_{i−1}, g ∈ S}⟩. Membership testing was used to discard
redundant generators and ensure that |U_i| ≤ l_H for all i.
Without the benefit of membership testing, a possibility is to start with
U_0 := A and define sets T_i, U_i recursively. Given U_{i−1}, let T_i := U_{i−1} ∪
{u^g | u ∈ U_{i−1}, g ∈ S} and let U_i be a set of size O(l_H) obtained via the algorithm
in the proof of Lemma 2.3.4 or Theorem 2.3.6 such that, with high
probability, ⟨U_i⟩ = ⟨T_i⟩. Clearly, ⟨A⟩ = ⟨U_0⟩ ≤ ⟨U_1⟩ ≤ ⟨U_2⟩ ≤ · · · ≤ ⟨A^G⟩
and ⟨U_i⟩ < ⟨U_{i+1}⟩ if ⟨U_i⟩ ≠ ⟨A^G⟩. Since a subgroup chain cannot grow more
than l_H times, ⟨U_{l_H}⟩ must be closed for the G-action and so ⟨U_{l_H}⟩ = ⟨A^G⟩.
A faster algorithm, which avoids the construction of small generating sets for
the intermediate groups in the procedure described in the previous paragraph,
is given in [Cooperman and Finkelstein, 1993]. It is based on the following
lemma.

Lemma 2.3.8. Suppose that G = ⟨S⟩ acts on H and let U ⊆ H. Let g be a
random subproduct of S and u be a random subproduct of U. If ⟨U⟩ is not
closed for the G-action then Prob(u^g ∉ ⟨U⟩) ≥ 1/4.

Proof. By hypothesis, K := {k ∈ G | ⟨U⟩^k = ⟨U⟩} < G. Hence, by Lemma 2.3.2,
Prob(g ∉ K) ≥ 1/2. If g ∉ K then X := ⟨U⟩^{g^{−1}} ∩ ⟨U⟩ ≠ ⟨U⟩. Thus, by Lemma
2.3.2, Prob(u ∉ X) ≥ 1/2. Combining the two probabilities, we obtain that
Prob(u^g ∉ ⟨U⟩) = Prob(u ∉ X | g ∉ K) Prob(g ∉ K) ≥ 1/4.
Theorem 2.3.9. Suppose that G = ⟨S⟩ acts on a group H, an upper bound
l_H for the length of subgroup chains in H is given, and δ > 0 is an arbitrary
constant. Then there is a Monte Carlo algorithm that, with probability at least
1 − δ, constructs O(l_H + |A|) generators for the G-closure of some A ⊆ H,
using O(l_H(|A| + |S| + l_H)) group operations.

Proof. The algorithm is a straightforward consequence of Lemma 2.3.8. Initialize
U := A. Given U, let u be a random subproduct of U and let g be a
random subproduct of S. Add u^g to U. An application of Lemma 2.3.3 of basic
type shows that increasing U O(l_H) times creates a generating set for ⟨A^G⟩ with
arbitrarily small constant error.
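In code (a sketch reusing the random_subproduct helper; conjugate(u, g) computes u^g, and the constant c hidden in the O(l_H) bound is left as a parameter):

```python
def g_closure_no_membership(A, S, l_H, conjugate, multiply, identity, c=8):
    """Monte Carlo G-closure: repeatedly add u^g for random subproducts u of U and g of S."""
    U = list(A)
    for _ in range(c * l_H):
        u = random_subproduct(U, multiply, identity)
        g = random_subproduct(S, multiply, identity)
        U.append(conjugate(u, g))
    return U
```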

2.3.4. Derived and Lower Central Series


As pointed out in [Babai et al., 1995], given a black-box group G = ⟨S⟩ and an
upper bound l_G for the length of subgroup chains in G, there is a Monte Carlo
algorithm that, with arbitrarily small constant error probability, constructs a
generating set of size O(l_G) for the derived subgroup G' of G. Namely, we
compute the set A := {[a, b] | a, b ∈ S}, then we construct a generating set of
size O(|S|^2 + l_G) for G' = ⟨A^G⟩ by the algorithm in the proof of Theorem 2.3.9,
and finally we reduce the number of generators for G' to O(l_G) by the algorithm
in the proof of Lemma 2.3.4 or Theorem 2.3.6. This process can be iterated to
obtain generators for the groups in the derived series and lower central series of G.
Note that it is possible to test solvability and nilpotence of G without membership
testing: all we have to do is to check whether the l_G th member of the
appropriate series is the identity subgroup of G. Also, often bounds sharper
than l_G for the length of the derived series are available. For example, if G is a
permutation group of degree n then the length of the derived series is O(log n)
(cf. [Dixon, 1968]) whereas G may have subgroup chains of length proportional to n.
However, we cannot test subnormality of some H ≤ G without membership
testing. Repeated applications of the algorithm in the proof of Theorem 2.3.9
yield a candidate subnormal series between H and G, but eventually we have
to decide whether two sets of group elements generate the same subgroup.
In the rest of this section, we describe a more practical algorithm for comput-
ing commutator subgroups that avoids the blowup in the number of generators.
We start with an identity about the commutator of products.

Lemma 2.3.10.

n 
1
[a1 a2 · · · an , b1 b2 · · · bm ] = [ai , b j ]b j+1 ···bm ai+1 ···an .
i=1 j=m
Proof. By induction, it is easy to check that

  [a, b_1 b_2 · · · b_m] = ∏_{j=m}^{1} [a, b_j]^{b_{j+1} ··· b_m}

and

  [a_1 a_2 · · · a_n, b] = ∏_{i=1}^{n} [a_i, b]^{a_{i+1} ··· a_n}.

The combination of these two identities proves the lemma.
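The identity can be checked numerically; the following self-contained Python sketch verifies it for random permutations, with the usual right-action conventions x^y = y^{−1}xy and [x, y] = x^{−1}y^{−1}xy.

```python
import random

def compose(x, y):
    """Apply x first, then y (permutations of {0, ..., deg-1} as tuples, acting on the right)."""
    return tuple(y[i] for i in x)

def inverse(x):
    inv = [0] * len(x)
    for i, xi in enumerate(x):
        inv[xi] = i
    return tuple(inv)

def prod(factors, deg):
    p = tuple(range(deg))                     # identity permutation
    for x in factors:
        p = compose(p, x)
    return p

def comm(x, y):                               # [x, y] = x^{-1} y^{-1} x y
    return prod([inverse(x), inverse(y), x, y], len(x))

def conj(x, g):                               # x^g = g^{-1} x g
    return prod([inverse(g), x, g], len(x))

def random_perm(deg):
    p = list(range(deg))
    random.shuffle(p)
    return tuple(p)

deg, n, m = 7, 3, 4
a = [random_perm(deg) for _ in range(n)]
b = [random_perm(deg) for _ in range(m)]

lhs = comm(prod(a, deg), prod(b, deg))
factors = []
for i in range(n):                            # i = 1, ..., n in the lemma
    for j in reversed(range(m)):              # j = m, ..., 1 in the lemma
        tail = prod(b[j + 1:] + a[i + 1:], deg)   # b_{j+1} ... b_m a_{i+1} ... a_n
        factors.append(conj(comm(a[i], b[j]), tail))
assert lhs == prod(factors, deg)
```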

Lemma 2.3.11. Let H = ⟨U⟩ and K = ⟨V⟩ be two subgroups of a common
parent group and let u and v be random subproducts from the sets U and
V, respectively. Moreover, let N ≤ [H, K], N ⊴ ⟨H, K⟩. If N ≠ [H, K] then
Prob([u, v] ∉ N) ≥ 1/4.

Proof. Let U = {u_1, . . . , u_n}, V = {v_1, . . . , v_m}, u = u_1^{ε_1} · · · u_n^{ε_n}, and v =
v_1^{ν_1} · · · v_m^{ν_m}. If N ≠ [H, K] then there exist u_i ∈ U, v_j ∈ V such that [u_i, v_j] ∉
N. Fix the last such pair u_i, v_j with respect to the ordering implied by
Lemma 2.3.10: (i_1, j_1) ≺ (i_2, j_2) if either i_1 < i_2, or i_1 = i_2 and j_1 > j_2.
Then, by Lemma 2.3.10, [u, v] can be written in the form [u, v] = x[u_i^{ε_i}, v_j^{ν_j}]^y z
for some x ∈ [H, K], y ∈ ⟨H, K⟩, and z ∈ N. So

  Prob([u, v] ∉ N) ≥ Prob(ε_i ν_j = 1 | x ∈ N) Prob(x ∈ N)
    + Prob(ε_i ν_j = 0 | x ∉ N) Prob(x ∉ N)
  ≥ (1/4) Prob(x ∈ N) + (3/4) Prob(x ∉ N) ≥ 1/4.

Theorem 2.3.12. Suppose that H = ⟨U⟩ and K = ⟨V⟩ are subgroups of G
and an upper bound l_G for the length of subgroup chains in G is known. Let
δ > 0 be an arbitrary constant. Then there is a Monte Carlo algorithm that,
with probability at least 1 − δ, constructs O(l_G) generators for [H, K], using
O(l_G(|U| + |V| + l_G)) group operations.

Proof. We construct a generating set for [H, K] in two phases. In the first
phase, we place c l_G commutators [u, v] into a set T, where u and v are random
subproducts from the sets U and V, respectively. Based on Lemma 2.3.11, an
application of Lemma 2.3.3 of basic type shows that, with error probability at
most e^{−c'l_G}, the normal closure of ⟨T⟩ generates [H, K]. In the second phase,
using the algorithm in the proof of Theorem 2.3.9, we compute generators for
⟨T^{⟨H,K⟩}⟩.
Remark 2.3.13. During the computation of the derived or lower central series
of G, we apply the algorithm in the proof of Theorem 2.3.12 l_G times, and
the error probabilities of all rounds have to be added. This causes no problem,
because all error estimates are of the form e^{−c'l_G} for some constant c' and
l_G e^{−c'l_G} is still exponentially small as a function of l_G.

Lemma 2.3.11 also can be applied to test whether a given group G = ⟨S⟩ is
abelian. Note that the straightforward deterministic method, checking whether
each pair of generators commutes, requires Θ(|S|^2) group operations.

Lemma 2.3.14. Given a black-box group G = ⟨S⟩ and an arbitrary constant
δ > 0, there is a Monte Carlo algorithm that, with probability at least 1 − δ,
decides whether G is abelian, using O(|S|) group operations.

Proof. We apply Lemma 2.3.11 with H = K = G, U = V = S, and N = 1.
If G is not abelian then two random subproducts on S do not commute with
probability at least 1/4. So, if c pairs of random subproducts commute then
Prob(G is abelian) ≥ 1 − (3/4)^c.
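A sketch of the resulting test (again reusing random_subproduct; the number of trials controls the error bound (3/4)^c):

```python
def probably_abelian(S, multiply, identity, trials=20):
    """Monte Carlo abelianness test: a False answer is always correct."""
    for _ in range(trials):
        u = random_subproduct(S, multiply, identity)
        v = random_subproduct(S, multiply, identity)
        if multiply(u, v) != multiply(v, u):
            return False
    return True
```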

2.4. Random Prefixes


The material in this section is reproduced from the paper [Babai et al., 1995],
copyright © 1995 by Academic Press; reprinted by permission of the publisher.

2.4.1. Definition and Basic Properties


The paper [Babai et al., 1995] also introduces the method of random prefixes,
which is an alternative to the applications of random subproducts. Random
prefixes can be used to reduce generating sets with roughly the same asymptotic
efficiency as the algorithm in the proof of Theorem 2.3.6 and to speed
up asymptotically the computation of algebraic closures and commutator subgroups
(cf. Theorems 2.3.9 and 2.3.12).
Given a sequence of group elements g1 , g2 , . . . , gk , a random prefix of this
sequence is the product of a randomly chosen initial segment of a random order-
ing of the gi . To apply random prefixes, we need an analogue of Lemma 2.3.2
to ensure that a random prefix of generators has a fair chance to avoid any given
proper subgroup. Our first goal is the deduction of such an analogue.
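In the spirit of the earlier random_subproduct sketch, a random prefix can be written as follows (the uniform choice of the segment length is one natural reading of "randomly chosen initial segment"):

```python
import random
from functools import reduce

def random_prefix(gens, multiply, identity):
    """Product of a random initial segment of a random ordering of gens."""
    order = list(gens)
    random.shuffle(order)
    k = random.randint(0, len(order))         # length of the initial segment (assumed uniform)
    return reduce(multiply, order[:k], identity)
```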
We start with some technical results concerning spreader permutations. Let
M = {1, 2, . . . , m}, M̄ = M ∪ {0, m + 1}, H ⊆ M, and H̄ = H ∪ {0, m + 1}.
For x ∈ H, let

  δ_H(x) = min_{y ∈ H̄\{x}} |x − y|

(i.e., δ_H(x) is the distance of x to its nearest neighbor in H̄). Define δ(H) :=
Σ_{x ∈ H} δ_H(x) and spread(H) := δ(H)/(m + 1). It is easy to see that 0 <
spread(H) < 1 (in fact, |H|/(m + 1) ≤ spread(H) ≤ 1 − 1/(|H| + 1)).
This quantity, the spreading factor of H, measures how evenly the set H is
distributed within M. We say that H is ε-spread if spread(H) ≥ ε.
Let P = {p_1, . . . , p_r} be a set of permutations of M. We call P an (ε, m)-spreader
if for every nonempty H ⊆ M, at least one of the sets H^{p_j} (1 ≤ j ≤ r)
is ε-spread.

Lemma 2.4.1. Let H be a random subset of size h ≠ 0 of M = {1, . . . , m}
and 1/30 ≤ ε < 1/15. Then Prob(spread(H) ≥ ε) > 1 − (15ε)^{h/4}.

Proof. If |H| ≥ ε(m + 1) then spread(H) ≥ |H|/(m + 1) ≥ ε with probability 1.
If 1 ≤ h ≤ 5 and h < ε(m + 1) then we compute the ratio q of h-element
subsets a_1 < a_2 < · · · < a_h of M such that a_i − a_{i−1} ≥ ⌈ε(m + 1)/h⌉ for 1 ≤
i ≤ h + 1, with a_0 := 0 and a_{h+1} := m + 1. For such sets H, spread(H) ≥
h⌈ε(m + 1)/h⌉/(m + 1) ≥ ε. Introducing the notation d := ⌈ε(m + 1)/h⌉ − 1,
we have

  Prob(spread(H) ≥ ε) ≥ q = \binom{m − (h+1)d}{h} / \binom{m}{h}
    > ((m − h + 1 − (h+1)d)/(m − h + 1))^h = (1 − (h+1)d/(m − h + 1))^h
    > 1 − h(h+1)d/(m − h + 1) > 1 − (h+1)ε(m+1)/(m − h + 1)
    > 1 − (h+1)ε(m+1)/((1 − ε)(m + 1)) > 1 − (h + 1)ε(15/14) > 1 − (15ε)^{h/4}.

In the case 6 ≤ h < ε(m + 1), we consider H as the range of an injective
function f from H_0 := {1, 2, . . . , h} to M. We also define H̄_0 := H_0 ∪ {0, h + 1}
and extend f by setting f(0) := 0 and f(h + 1) := m + 1.
Let k := ⌊h/2⌋. For 0 < c < 1, we estimate the probability of the event A(c)
that there exist k distinct elements H_0^* ⊂ H̄_0, H_0^* = {a_{i,j} | 1 ≤ i ≤ l, 1 ≤ j ≤
k_i}, where 2 ≤ k_i ≤ 3 for 1 ≤ i ≤ l and Σ k_i = k, satisfying the following
property: For all 1 ≤ i ≤ l and 1 ≤ j ≤ k_i − 1, f(a_{i,j}) < f(a_{i,j+1}) < f(a_{i,j}) +
c(m + 1).
c(m + 1). Such k elements can be chosen

  
h+2 l (h + 2 − l)!
l k − 2l (h + 2 − k)!

ways, since the elements a1,1 , a2,1 , . . . , al,1 are from H0 ; from this set, we have
to choose a k − 2l-element subset to determine those ai,1 which have ki = 3;
finally, fixing the two subsets of the ai,1 , there are (h + 2 − l)!/(h + 2 − k)!
choices for the remaining ai, j .
Fixing the set H0∗ , we can define f by first choosing the (random) values
f (ai,1 ) for 1 ≤ i ≤ l and then continuing with the choice of f (ai, j ) for j ≥ 2.
The probability that f (ai, j+1 ) falls between f (ai, j ) and f (ai, j ) + c(m + 1) is at
most c(m + 1)/(m + 1 − (k − 1)). (Note that this estimate is valid even in the
case ai, j+1 = h + 1, although the value of f (h + 1) is not chosen randomly:
In this case, consider the probability that f (ai, j ) falls between (1 − c)(m + 1)
and m + 1.) Therefore,


   
h+2 l (h + 2 − l)! c(m + 1) k−l
Prob(A(c)) < .
l k − 2l (h + 2 − k)! m + 1 − k

Since k ≤ h/2 ≤ ε(m + 1)/2 ≤ (m + 1)/30, we have (m + 1)/(m + 1 − k) ≤


30/29. We claim that

    
h+2 l (h + 2 − l)! 29h k−l
< , (2.10)
l k − 2l (h + 2 − k)! 4

and so Prob(A(c)) < (15ch/2)k−l .


For 6 ≤ h ≤ 200, (2.10) can be checked by computer. (GAP needs about
10 seconds on a SparcStation2 to do that.) For larger values, we use Stirling's
formula. Note that the sequence √(2πn)(n/e)^n/n!, n = 0, 1, . . . is monotone
increasing with limit 1, so √(2πn)(n/e)^n < n! < (10/9)√(2πn)(n/e)^n for all
n ≥ 1. Substituting the appropriate √(2πn)(n/e)^n for the factorials in the
denominator and (10/9)√(2πn)(n/e)^n in the numerator gives an upper estimate
for the left-hand side of (2.10).
Using that for 1/3 < x < 1/2

  (3x − 1)^{3x−1}(1 − 2x)^{1−2x} > 4^x (29e/8)^{x−1},     (2.11)

in the case k/3 < l < k/2 the term (3l − k)^{3l−k}(k − 2l)^{k−2l} can be estimated
from below by k^l 4^l (29e/8)^{l−k}. Also, (h + 2 − k)^{h+2−k} > e^2(h/2)^{h+2−k}. Using
these estimates, it is straightforward to check that (2.10) holds. If l ∈ {k/3, k/2}
then \binom{l}{k−2l} = 1, and Stirling's formula easily yields (2.10).
What does it mean that A(c) fails? Let A = (a_0, a_1, . . . , a_{h+1}) be the sequence
of elements of H ∪ {0, m + 1}, listed in increasing order. Then let A_1, . . . , A_n
be the maximal subsequences of consecutive elements with difference less than
c(m + 1); that is, a_{i+1} − a_i < c(m + 1) if a_i and a_{i+1} are in the same A_j, but
min(A_{j+1}) − max(A_j) ≥ c(m + 1). Each A_j of size greater than 1 can be partitioned
into subsequences of length two and three, which can be considered as
the f-images of the set H_0^*. Therefore, if A(c) fails then Σ_{j ∈ J} |A_j| < k, where
the summation runs over the A_j with |A_j| > 1. Hence more than h − k elements
x ∈ H are in one-element sets A_j, which means that δ_H(x) ≥ c(m + 1). In
particular, if h is even then we choose c := 2ε/h and obtain that spread(H) ≥
(h − k)c = ε with probability greater than 1 − (15ε)^{k−l} ≥ 1 − (15ε)^{h/4}. If h is
odd then we choose c := 2ε/(h + 1) and obtain that spread(H) ≥ (h − k)c = ε
with probability greater than 1 − (15εh/(h+1))^{k−l} ≥ 1 − (15εh/(h+1))^{(h−1)/4} >
1 − (15ε)^{h/4}. (The last inequality is the only point in the proof of case h ≥ 6
where we used the lower bound ε ≥ 1/30.)

Theorem 2.4.2. Let 1/30 ≤ ε < 1/15, C > 0, and r > (4 + 4(C + 1) log m)/
log(1/(15ε)). Furthermore, let P = {p_1, . . . , p_r} be a set of r random permutations
of {1, 2, . . . , m}. Then P is an (ε, m)-spreader with probability at least
1 − m^{−C}.

Proof. Let H be a nonempty subset of {1, 2, . . . , m} and h := |H|. By
Lemma 2.4.1, the probability that spread(H^{p_j}) < ε for all permutations p_j in P
is at most (15ε)^{hr/4}. Hence the probability that not all nonempty subsets are
ε-spread is at most

  Σ_{h=1}^{m} \binom{m}{h} (15ε)^{hr/4} = (1 + (15ε)^{r/4})^m − 1 < 2(15ε)^{r/4} m < m^{−C}.

Here, the first inequality is true because the condition imposed on r implies that
(15ε)^{r/4} < 1/(me), and (1 + x)^m < 1 + 2xm for 0 < x < 1/(me). The second
inequality is a straightforward consequence of the condition imposed on r.

Now we are in position to connect spreader permutations, random prefixes,
and expansion of subgroups.
Lemma 2.4.3. Let G = ⟨S⟩ be a black-box group and P = {p_1, . . . , p_r} be
an (ε, |S|)-spreader, and let K < G. Moreover, let g be a random prefix of S,
obtained from a randomly chosen element of P. Then Prob(g ∉ K) ≥ ε/(2r).

Proof. Let H := {s ∈ S | s ∉ K} and let a_1 < a_2 < · · · < a_h denote the positions
corresponding to the elements of H in the randomly chosen permutation
p ∈ P. We also define a_0 := 0 and a_{h+1} := |S| + 1 and let g_j denote the
random prefix of S corresponding to the first j elements of p. Then, for a
fixed i ∈ [1, h], either g_j ∉ K for all j with a_{i−1} ≤ j < a_i or g_j ∉ K for all
j with a_i ≤ j < a_{i+1} (it is also possible that both of these events occur).
So Prob(g ∉ K) ≥ spread({a_1, . . . , a_h})/2 and, since P is an (ε, |S|)-spreader,
spread({a_1, . . . , a_h}) ≥ ε with probability at least 1/r.

2.4.2. Applications
We shall apply random prefixes to G-closure computations. We consider the
same situation as in Section 2.3.3, namely, G = ⟨S⟩ acts on H, we can compute
h^g ∈ H for any h ∈ H, g ∈ G, and an upper bound l_H is known for the length of
subgroup chains in H.

Lemma 2.4.4. Suppose that G = ⟨S⟩ acts on H and ⟨U⟩ ≤ H is not
closed for the G-action. Let P = {p_1, . . . , p_{r_1}} be an (ε, |S|)-spreader and
Q = {q_1, . . . , q_{r_2}} be an (ε, |U|)-spreader. Moreover, let g and u be random
prefixes on the sets S and U, respectively, obtained from randomly chosen
elements of P and Q. Then Prob(u^g ∉ ⟨U⟩) ≥ ε^2/(4r_1r_2).

Proof. By hypothesis, K := {k ∈ G | ⟨U⟩^k = ⟨U⟩} < G. Hence, by Lemma 2.4.3,
Prob(g ∉ K) ≥ ε/(2r_1). If g ∉ K then X := ⟨U⟩^{g^{−1}} ∩ ⟨U⟩ ≠ ⟨U⟩. Thus, by
Lemma 2.4.3, Prob(u ∉ X) ≥ ε/(2r_2). Combining the two probabilities, we obtain
Prob(u^g ∉ ⟨U⟩) ≥ Prob(u ∉ X | g ∉ K) Prob(g ∉ K) ≥ ε^2/(4r_1r_2).

Theorem 2.4.5. Suppose that G = ⟨S⟩ acting on a group H, an upper bound
l_H for the length of subgroup chains in H, a subset A ⊆ H, and a constant δ > 0
are given. Then there is a Monte Carlo algorithm that, with probability at least
1 − δ, constructs O(l_H log |S|(log l_H + log log |S|)) generators for ⟨A^G⟩, using

  O(l_H log |S|(log l_H + log log |S|)^3 + |A| log l_H + |S| log |S|)

group operations.
Proof. We can suppose that |A| ∈ O(l_H) because if |A| is too big then we
construct O(l_H) generators for ⟨A⟩. By Theorem 2.3.6, this can be done by
O(|A| log l_H) group operations.
We would like to apply Lemma 2.4.4. We can fix a value ε in the interval
[1/30, 1/15), and there is no problem concerning the set S: We can construct an
(ε, |S|)-spreader P = {p_1, . . . , p_{r_1}} of size O(log |S|) and collect in a set T_G all
products of elements of S corresponding to initial segments of the permutations
in P. For the construction of random permutations in P, see Exercise 2.1. The
construction of T_G requires O(|S| log |S|) group operations.
However, the set U containing the already constructed generators for ⟨A^G⟩
changes during the algorithm, so we cannot precompute the product of initial
segments in random permutations of U. Computing spreader permutations of
increasing degree as |U| changes would take too much time. Therefore, we
proceed in the following way.
Let m = c l_H log |S|(log l_H + log log |S|) for a sufficiently large constant c
and let Q = {q_1, . . . , q_{r_2}} be an (ε, m)-spreader of size O(log m) = O(log l_H +
log log |S|). We can suppose that m is a power of 2. We use an m-long array
U to store generators for ⟨A^G⟩. Initially, U contains the elements of A and the
rest of U is padded with the identity element. As new generators x for ⟨A^G⟩
are constructed, we replace one of the identity elements in U by x. Now comes
the idea that speeds up the computation. Instead of products of initial segments
of the U^{q_i}, we store the product of elements in positions l2^j + 1, l2^j + 2, . . . ,
(l + 1)2^j in U^{q_i} for all i ∈ [1, r_2], j ∈ [0, log m], and l ∈ [0, m/2^j − 1]. This
requires the storage of less than 2mr_2 group elements.
From this data structure T_H, the product of the first k elements of U^{q_i}
can be computed with O(log m) = O(log l_H + log log |S|) group operations,
by taking segments of lengths corresponding to the terms in the binary expansion
of k. Also, after replacing an element of U, the data structure can
be updated by at most r_2 log m ∈ O((log l_H + log log |S|)^2) group operations,
since after updating segments of length 2^j, the r_2 segments of length 2^{j+1}
containing the new element can be updated by one multiplication per segment.
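The structure T_H can be sketched for a single permutation q as follows (a hypothetical Python fragment; the actual structure keeps one such table for every q_i ∈ Q, and m is assumed to be a power of 2). Level j of seg stores the products over blocks of length 2^j of the permuted array, so a prefix product costs O(log m) multiplications and an update refreshes one block per level.

```python
class SegmentProducts:
    """Block products of the permuted array (U[q[0]], U[q[1]], ...), blocks of length 2^j."""
    def __init__(self, U, q, multiply, identity):
        self.q, self.mul, self.e = list(q), multiply, identity
        self.seg = [[U[i] for i in q]]                    # level 0: the permuted entries
        while len(self.seg[-1]) > 1:
            prev = self.seg[-1]
            self.seg.append([multiply(prev[2 * l], prev[2 * l + 1])
                             for l in range(len(prev) // 2)])

    def prefix(self, k):
        """Product of the first k permuted entries, using the binary expansion of k."""
        out, pos = self.e, 0
        for j in range(len(self.seg) - 1, -1, -1):
            if (1 << j) <= k:
                out = self.mul(out, self.seg[j][pos >> j])
                pos += 1 << j
                k -= 1 << j
        return out

    def replace(self, i, x):
        """Record that position i of U now holds x, refreshing the blocks containing it."""
        p = self.q.index(i)                               # position of i in the permuted order
        self.seg[0][p] = x
        for j in range(1, len(self.seg)):
            p //= 2
            self.seg[j][p] = self.mul(self.seg[j - 1][2 * p], self.seg[j - 1][2 * p + 1])
```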
This gives us the following algorithm: Replace the identity elements in U by
group elements of the form u^g as described in Lemma 2.4.4, always updating the
data structure T_H. If the construction of the spreaders P, Q was successful then
Lemma 2.4.4 and an application of the basic type of Lemma 2.3.3 imply that
by the time all m − |A| ∈ O(l_H r_1 r_2) identity elements are replaced, U generates
⟨A^G⟩ with probability greater than 1 − e^{−c'l_H} for some constant c' > 0. The
number of group operations required is O(m r_2 log m) = O(l_H log |S|(log l_H +
log log |S|)^3).
Corollary 2.4.6. Suppose that an upper bound l_G is known for the length of
subgroup chains in a group G and that O(l_G) generators are given for G and
for some H ≤ G. Let δ > 0. Then there is a Monte Carlo algorithm that, with
probability at least 1 − δ, constructs O(l_G) generators for the normal closure
⟨H^G⟩, using O(l_G log⁴ l_G) group operations.

Proof. By Theorem 2.4.5, O(l_G log² l_G) generators can be obtained by the indi-
cated number of group operations. By Theorem 2.3.6, this generating set can be
reduced to one of size O(l_G) with O(l_G log³ l_G) further group operations. □

Our last application is the asymptotic speedup of commutator subgroup com-


putations.

Lemma 2.4.7. Let H = ⟨U⟩ and K = ⟨V⟩ be two subgroups of a common par-
ent group, and let N ≱ [H, K], N ⊴ ⟨H, K⟩. Moreover, let P = {p_1, . . . , p_{r_1}}
be an (ε, |U|)-spreader and Q = {q_1, . . . , q_{r_2}} be an (ε, |V|)-spreader, and let
u and v be random prefixes on the sets U and V, respectively, obtained from
randomly chosen elements of P and Q. Then Prob([u, v] ∉ N) ≥ ε²/(4r_1 r_2).

Proof. By Lemma 2.3.10, X := {h ∈ H | [h, K] ≤ N} is a subgroup of H and,
by hypothesis, X ≠ H. So Lemma 2.4.3 implies that Prob(u ∉ X) ≥ ε/(2r_1). If
u ∉ X then Y := {k ∈ K | [u, k] ∈ N} ≠ K, so, by Lemma 2.4.3, Prob([u, v] ∉
N) ≥ ε/(2r_2). Combining the two probabilities, we obtain Prob([u, v] ∉ N) ≥
Prob([u, v] ∉ N | u ∉ X) Prob(u ∉ X) ≥ ε²/(4r_1 r_2). □

Theorem 2.4.8. Suppose that H = ⟨U⟩ and K = ⟨V⟩ are subgroups of G and
an upper bound l_G for the length of subgroup chains in G is known. Let δ > 0.
Then there is a Monte Carlo algorithm that, with probability at least 1 − δ,
constructs O(l_G) generators for [H, K], using O(l_G log⁴ l_G + (|U| + |V|) log l_G)
group operations.

Proof. With O((|U| + |V|) log l_G) group operations, we can construct gener-
ating sets of size O(l_G) for H and K. So we suppose that |U| ∈ O(l_G) and
|V| ∈ O(l_G).
A generating set for [H, K] is obtained in two phases. In the first phase,
we construct an (ε, |U|)-spreader P = {p_1, . . . , p_{r_1}} and an (ε, |V|)-spreader
Q = {q_1, . . . , q_{r_2}}, and we place the products of initial segments of U^{p_i}, V^{q_i} into
the sets T_U and T_V, respectively. This requires O(|U| log|U| + |V| log|V|) =
O(l_G log l_G) group operations. Then, for a sufficiently large constant c, we
place c l_G r_1 r_2 ∈ O(l_G log² l_G) commutators [u, v] into a set T, where u and v
are randomly chosen elements of T_U and T_V, respectively. After that, using
O(l_G log³ l_G) group operations, we construct O(l_G) generators for ⟨T⟩.
By Lemma 2.4.7, the normal closure of ⟨T⟩ in ⟨H, K⟩ is [H, K] with high
probability. In the second phase, we compute generators for this normal closure.
By Corollary 2.4.6, this requires O(l_G log⁴ l_G) group operations. □

Exercises
2.1. Design an algorithm that constructs a uniformly distributed random ele-
ment of Sn in O(n) time. (You can suppose that a random element of a
list can be chosen in O(1) time.) Hint: We have to construct an injective
function f : [1, n] → [1, n]. When f (1), . . . , f (k) are already defined,
store the remaining n − k possible function values in an array of length
n − k. How should this array be modified when f (k + 1) is defined?
2.2. Design an algorithm that constructs a uniformly distributed random ele-
ment of An .
2.3. Let M be a finite state Markov chain, with transition probability matrix
P. Prove that p_{ij}^{(k)} is the (i, j)-entry in P^k.
2.4. Let M be a Markov chain and suppose that states u, v can be reached
from each other. Prove that u and v are both aperiodic or they have the
same period.
2.5. Prove that the stationary distribution of a finite, irreducible, aperiodic
Markov chain is the uniform one if and only if the column sums of the
transition probability matrix are all 1.
2.6. Prove the inequality (2.11).
2.7. [Beals and Babai, 1993] Suppose that G is a nonabelian black-box
group and N ⊴ G, N ≠ G. Suppose also that a subset A := {g_1, . . . ,
g_k} ⊆ G\{1} is given, and we know that A ∩ N ≠ ∅. Design a Monte
Carlo algorithm that, with high probability, computes a nontrivial ele-
ment of a proper normal subgroup of G. Hint: If {a, b} ∩ N ≠ ∅ then
[a, b] ∈ N . What can we do if a and b commute?
3
Permutation Groups
A Complexity Overview

In this chapter, we start the main topic of this book with an overview of permu-
tation group algorithms.

3.1. Polynomial-Time Algorithms


In theoretical computer science, a universally accepted measure of efficiency
is polynomial-time computation. In the case of permutation group algorithms,
groups are input by a list of generators. Given G = ⟨S⟩ ≤ S_n, the input is of
length |S|n and a polynomial-time algorithm should run in O((|S|n)^c) time for some
fixed constant c. In practice, |S| is usually small: Many interesting groups,
including all finite simple groups, can be generated by two elements, and it is
rare that in a practical computation a permutation group is given by more than
ten generators. On the theoretical side, any G ≤ S_n can be generated by at most
n/2 permutations (cf. [McIver and Neumann, 1987]). Moreover, any generating
set S can be easily reduced to fewer than n² generators in O(|S|n²) time by
a deterministic algorithm (cf. Exercise 4.1), and in Theorem 10.1.3 we shall
describe how to construct at most n − 1 generators for any G ≤ S_n. Hence, we
require that the running time of a polynomial-time algorithm is O(n^c + |S|n²)
for some constant c.
In this book, we promote a slightly different measure of complexity involv-
ing n, |S|, and log |G| (cf. Section 3.2), which better reflects the practical per-
formance of permutation group algorithms. However, “traditional” polynomial
time is still very important for us: Experience shows that a lot of ideas developed
in the polynomial-time context are later incorporated in practical algorithms;
conversely, procedures performing well in practice often have versions with
polynomial running time.
Nice surveys on polynomial-time computations can be found in [Luks, 1993]
and [Kantor and Luks, 1990]. The latter paper also contains a comprehensive

polynomial-time toolkit. The following list is a part of that toolkit; we overview
those tasks that can be done by a polynomial-time algorithm for all inputs
G = S ≤ Sn . For special classes of groups (mostly for solvable groups or,
slightly more generally, for groups with nonabelian composition factors of
bounded order), there is a significant extension of this list. We mention some
of the additional problems, which can be handled in polynomial time if the
input group has bounded nonabelian composition factors, in Section 3.3. Also,
as a general rule, it seems to be easier to compute a targeted subgroup if it is
normal in G: For example, C_G(H) is computable in polynomial time if H ⊴ G
(since H ⊴ G implies C_G(H) ⊴ G) but no polynomial-time algorithm is known
for arbitrary inputs H ≤ G.
Let G = ⟨S⟩ ≤ Sym(Ω). Then the following tasks can be performed by deter-
ministic polynomial-time algorithms:

(a) Find orbits and blocks of imprimitivity of G.


(b) Given h ∈ Sym(Ω), test whether h ∈ G.
(c) Find the order of G.
(d) Find a generator–relator presentation for G.
(e) Given Δ ⊆ Ω, find the pointwise stabilizer of Δ in G.
(f) Compute the kernel of the homomorphism π: G → Sym(Γ). (π is defined
by specifying the images of generators.)
(g) (i) Given T ⊆ G, find the normal closure ⟨T^G⟩.
(ii) Given H ≤ G, test whether H is subnormal in G; and, if so, find a
sequence H = L_0 ⊴ L_1 ⊴ · · · ⊴ L_m = G.
(iii) Find the derived series and lower central series of G (and hence test
solvability and nilpotence).
(h) Find the center of G.
(i) Find a composition series 1 = H_0 ⊴ H_1 ⊴ · · · ⊴ H_m = G for G, and find
a faithful permutation representation for each of the composition factors
H_i/H_{i−1}, specifically, a homomorphism π_i: H_i → Sym(Γ_i) with ker(π_i) =
H_{i−1} and |Γ_i| ≤ |Ω|.
(j) If G is simple, identify the isomorphism type of G.
(k) (i) Given H ⊴ G, test whether H is minimal normal in G. If it is not, then
find N ⊴ G such that 1 < N < H.
(ii) Find a chief series for G.
(l) (i) If p is a prime, find a Sylow p-subgroup of G containing a given
p-subgroup P of G.
(ii) Given Sylow p-subgroups P_1, P_2 of G, find g ∈ G such that P_1^g = P_2.
(iii) Given a Sylow p-subgroup P of L where L ⊴ G, find N_G(P).
(m) Given H ≤ G, find Core_G(H). (For the definitions used in (m)–(p), see
Section 1.2.1.)
(n) Find the socle of G.
(o) Find the intersection of all maximal normal subgroups of G.
(p) For any collection 𝒳 of simple groups, find O_𝒳(G) and O^𝒳(G).
(q) Find all of the above in quotient groups of G.

We give historical remarks and indicate some of the group theory involved
in the different algorithms.
Item (a) is essentially all we can do using only the given generators of G
(although there are some exceptions, e.g., testing whether a permutation group
is regular). The first algorithm for computing blocks of imprimitivity appeared
in [Atkinson, 1975]. For all other tasks, we need to compute bases and strong
generating sets (cf. Section 4.1), the basic data structures introduced in the
seminal works [Sims, 1970, 1971a]. It was first analyzed in [Furst et al., 1980]
that Sims’s algorithm for computing strong generating sets runs in polynomial
time. Items (b)–(g) are straightforward consequences of the strong generator
construction. Concerning (d), we note that the presentation obtained from such
a construction usually contains redundant defining relations; the first method
to delete some of the redundant relators is in [Cannon, 1973]. Sims’s strong
generating set construction uses only basic group theory: Lagrange’s theorem
and Schreier’s lemma (cf. Lemma 4.2.1).
Computing the center is also elementary (cf. [Luks, 1987]); in fact, it can be
reduced to a point stabilizer construction (cf. [Cooperman et al., 1989]). The
preliminary version of [Luks, 1987] for finding a composition series, circulated
since 1981, was the first algorithm based on consequences of the classification
of finite simple groups. [Neumann, 1986] also contains an algorithm for com-
puting a composition series, which runs in polynomial time for groups with
some restrictions on the primitive groups involved in them. Problem (j), the
identification of simple groups, is solved in [Kantor, 1985b].
The nontrivial case of (k)(i) is when H is abelian. This was resolved in
[Rónyai, 1990], as an application of methods handling associative algebras.
(k)(ii) is a direct consequence of the first part.
Item (l), the algorithmic Sylow machinery developed in [Kantor, 1985b, 1990],
is the most complicated item on our list: In addition to consequences
of the classification of finite simple groups, it requires a case-by-case study of the
classical finite simple groups. Considering the elementary nature of the Sylow
theorems and their important role in group theory, it would be worthwhile to
find simpler algorithms to deal with Sylow subgroups.
Problem (m) is solved in [Kantor and Luks, 1990]. The algorithm uses (l)
and so the classification of finite simple groups. The nonabelian part of the
socle is first computed in [Babai et al., 1983] whereas the abelian part, based
on [Rónyai, 1990], is obtained in [Kantor and Luks, 1990].
The solution of (o) and computing O^𝒳(G) is implicit in [Babai et al., 1987].
The subgroups O_p(G) are obtained in [Kantor, 1985b] and [Neumann, 1986],
and the general case of O_𝒳(G) is handled in [Kantor and Luks, 1990].
The paper [Kantor and Luks, 1990] generalized the entire polynomial-time
toolkit to work with quotient groups G/K of G ≤ Sn . The trivial approach,
namely finding a permutation representation of the factor group, does not work:
In Exercise 6.6, we shall give examples K ⊴ G ≤ S_n from [Neumann, 1986] for
an infinite sequence of n values so that no faithful permutation representation
of G/K acts on fewer than 2^{n/4} points. The quotient group machinery required
a radically new approach and it is based on Sylow subgroup computations.
Therefore, it is much more difficult to compute, for example, the upper central
series of G than to compute Z (G).
We shall present algorithms for most of the tasks listed in our toolkit. How-
ever, in most cases, these will not be the deterministic polynomial-time versions.
Rather, we shall describe faster, randomized algorithms, which we define in the
next section.

3.2. Nearly Linear-Time Algorithms


A lot of interesting groups do not have small-degree permutation representa-
tions; however, in their permutation representations as G ≤ Sn , log |G| is small
compared to n. This phenomenon can be formalized in the following way. We
call an (infinite) family 𝒢 of permutation groups small-base groups if each
G ∈ 𝒢 of degree n satisfies log |G| < log^c n for some fixed constant c. Impor-
tant families of groups, including all permutation representations of all finite
simple groups except the alternating ones, belong to this category (with c = 2).
Primitive groups not containing alternating composition factors in their socle
are also small-base groups.
Permutation group algorithms belong to the class of procedures where the
running time may differ significantly on inputs of the same size, because it
usually depends both on the degree and on the order of the group the given
permutations generate. The running time of most of the polynomial-time algo-
rithms surveyed in Section 3.1 is at least (n 2 ). However, on tens of thousands
of points, even a (n 2 ) algorithm may be prohibitively slow (and using (n 2 )
memory is currently out of the question). Therefore, to handle small-base groups
of large degree, we have to find alternatives to the polynomial-time methods
that are more efficient on small-base inputs. We are interested in algorithms
with running time estimates of the form O(n|S| log^c |G|); we call such proce-
dures nearly linear-time algorithms because, for small-base groups, they run in
nearly linear, O(n|S| log^c(n|S|)), time as a function of the input length. Using the soft ver-
sion of the big-O notation (cf. Section 1.2), the time bound of nearly linear-time
algorithms on small-base input groups is Õ(n|S|).
In Chapters 4–6, we shall see that there are nearly linear-time versions for a
significant part of the polynomial-time library: [Beals, 1993a], [Schönert and
Seress, 1994] for (a), [Babai et al., 1991] for (b)–(g), [Beals and Seress, 1992]
for (h) and (i), [Morje, 1995] for ( j) and for (l) in groups with no composition
factors of exceptional Lie type, [Holt and Rees, 1994] and [Ivanyos and Lux,
2000] for (k), and [Luks and Seress, 1997] for O_𝒳(G) in the cases when 𝒳
consists of one cyclic group or is the family of all cyclic groups. The price we
pay for the speedup is that, with a few exceptions, the algorithms are random-
ized Monte Carlo (cf. Section 1.3 for the definition), and so they may return
incorrect answers. Also, these algorithms may not have the lowest time com-
plexity on general inputs. For example, the version of basic permutation group
manipulation from [Babai et al., 1991], which we present in Section 4.5, runs
in O(n log⁴|G| log n + |S|n log|G|) time, which is Õ(n⁵ + |S|n²) in the worst
case. However, there are Õ(n⁴ + |S|n²) deterministic (cf. [Babai et al., 1997b])
and Õ(n³ + |S|n) Monte Carlo (cf. [Babai et al., 1995]) algorithms for the
same task. We mention, however, that usually some of the log |G| factors in the
running time estimates can be replaced by a bound on the length of subgroup
chains, which may be significantly smaller than log |G|.
A recent direction of research is to upgrade the Monte Carlo nearly linear-
time algorithms to Las Vegas–type algorithms. The papers [Kantor and Seress,
1999, 2001] establish a general framework for the upgrade, which we shall
discuss in Section 8.3.
Most of the nearly linear-time algorithms listed above are implemented in
GAP and are available in the library of the standard GAP distribution. Practical
computations almost exclusively deal with small-base input groups. The nearly
linear-time algorithms usually run faster than quadratic ones even in the indi-
vidual cases when the factor log^c |G| occurring in their running time estimate
is larger than n. Because of their practical importance, we devote a significant
portion of this book to nearly linear-time algorithms.

3.3. Non-Polynomial-Time Methods


Some important problems do not have (at present) polynomial-time solutions.
They are interesting both in theory and practice: On one hand, the graph
isomorphism problem is reducible to them in polynomial time; on the other
hand, they are important tools in the study of groups.
Examples of such problems include instances of the following, all of which
are reducible to each other in polynomial time (cf. [Luks, 1993]):
(a) Given Δ ⊆ Ω, compute the setwise stabilizer G_Δ = {g ∈ G | Δ^g = Δ}.
(b) Given H, G ≤ Sym(Ω), compute C_G(H).
(c) Given H, G ≤ Sym(Ω), compute H ∩ G.
(d) Given H, G ≤ Sym(Ω) and x_1, x_2 ∈ Sym(Ω), are the double cosets H x_1 G
and H x_2 G equal?
(e) Given x1 , x2 ∈ G, decide whether they are conjugate.
Although all known algorithms for these problems have exponential worst-
case complexity, they are not considered difficult in practice. Further research is
needed to find the reason for this phenomenon. It is improbable that the decision
problems (d) and (e) are NP-complete; otherwise the polynomial-time hierarchy
of computational problems would collapse (see [Babai and Moran, 1988]; we
refer the reader to [Garey and Johnson, 1979] for the definitions regarding
complexity classes). Moreover, it is conceivable that there may be polynomial-
time algorithms (at least for the classes of groups occurring in practice) to solve
them; or the average case running time of the implemented methods may be
polynomial. However, it is not even clear from which probability distribution the
“average” input should be chosen. In the uniform distribution on the subgroups
of S_n, a typical permutation group G ≤ S_n has large order: log |G| is Θ(n) (see
[Pyber, 1993]). However, some heuristics indicate that almost all G ≤ Sn are
nilpotent, and for nilpotent groups, all instances of the above list are solvable in
polynomial time. More generally, there are polynomial-time solutions if the
non-abelian composition factors of G have bounded order. [Luks, 1993, Section
6] contains a unified treatment of all cases (a)–(e).
An important problem, which is considered more challenging than those
mentioned in (a)–(e) both in theory and practice, is the computation of normal-
izers. It is not even known whether N_{Sym(Ω)}(G) is computable in polynomial
time; for comparison, we mention that C_{Sym(Ω)}(G) is easily obtainable (cf.
Section 6.1.2). Also, it is an open issue whether the normalizer problem is
polynomially equivalent to (a)–(e).
The practical way to attack problems (a)–(e) and the normalizer problem
is by backtrack methods (cf. Chapter 9). These are systematic considerations
of all elements of G, in which only those that satisfy the desired property
are chosen. Various tricks are built-in to eliminate large parts (entire cosets of
certain subgroups) of G at once, by establishing that the coset cannot contain
any of the desired elements or that we already constructed all such elements.
Backtrack methods for permutation groups were first described in [Sims,
1971b] where the construction of centralizers was presented. Since then, a

large library of algorithms has been developed; we mention [Butler, 1982] for
a general description of backtrack searches, [Butler, 1983] and [Holt, 1991] for
computing normalizers, and [Butler and Cannon, 1989] for computing Sylow
subgroups. A second generation of backtrack algorithms is developed in [Leon,
1991], where ideas from the graph isomorphism program nauty from [McKay,
1981] are utilized. The thesis [Theißen, 1997] extends Leon’s method to obtain
a very efficient normalizer algorithm.
4
Bases and Strong Generating Sets

4.1. Basic Definitions


In this section, we define the fundamental data structures in [Sims, 1970, 1971a]
for efficient permutation group manipulation. Suppose that G ≤ Sym(Ω) and
|Ω| = n; in fact, in this chapter we can identify Ω with {1, 2, . . . , n}.
A sequence of elements B = (β_1, . . . , β_m) from Ω is called a base for G if
the only element of G to fix B pointwise is the identity. The sequence B defines
a subgroup chain

    G = G^{[1]} ≥ G^{[2]} ≥ · · · ≥ G^{[m]} ≥ G^{[m+1]} = 1,                (4.1)

where G^{[i]} := G_{(β_1,...,β_{i−1})} is the pointwise stabilizer of {β_1, . . . , β_{i−1}}. The base
is called nonredundant if G^{[i+1]} is a proper subgroup of G^{[i]} for all i ∈ [1, m].
Different nonredundant bases can have different size (cf. Exercise 4.2).
By repeated applications of Lagrange's theorem, we obtain

    |G| = ∏_{i=1}^{m} |G^{[i]} : G^{[i+1]}|.                                 (4.2)

Because the cosets of G^{[i]} mod G^{[i+1]} correspond to the elements of the orbit
β_i^{G^{[i]}}, we obtain |G^{[i]} : G^{[i+1]}| = |β_i^{G^{[i]}}| ≤ n for all i ∈ [1, m]. Moreover, if B is
nonredundant then |G^{[i]} : G^{[i+1]}| ≥ 2. These inequalities, combined with (4.2),
immediately yield 2^{|B|} ≤ |G| ≤ n^{|B|}, or log |G|/ log n ≤ |B| ≤ log |G|. The
last inequality justifies the name "small-base group" introduced in Section 3.2:
When |G| is small, so is |B|.
A strong generating set (SGS) for G relative to B is a generating set S for
G with the property that

    ⟨S ∩ G^{[i]}⟩ = G^{[i]},   for 1 ≤ i ≤ m + 1.                            (4.3)

As a simple example, consider the group G = S4 , in its natural action on

the set [1, 4] = {1, 2, 3, 4}. The sequence B = (1, 2, 3) is a nonredundant base
for G, since G^{[1]} = Sym([1, 4]) > G^{[2]} = Sym([2, 4]) > G^{[3]} = Sym([3, 4]) >
G^{[4]} = 1. The sets S = {(1, 2, 3, 4), (3, 4)} and T = {(1, 2, 3, 4), (2, 3, 4), (3, 4)}
generate G, but S is not a strong generating set relative to B since ⟨S ∩ G^{[2]}⟩ =
Sym([3, 4]) ≠ G^{[2]} = Sym([2, 4]). In contrast, T is an SGS relative to B.
Given an SGS, the orbits β_i^{G^{[i]}} can be easily computed (cf. Theorem 2.1.1(i)).
These orbits are called the fundamental orbits of G. By (4.2), from the orbit
sizes we immediately get |G|. Also, keeping track of elements of G^{[i]} in the orbit
algorithm that carry β_i to points in β_i^{G^{[i]}}, we obtain transversals R_i for G^{[i]} mod
G^{[i+1]}. We always require that the representative of the coset G^{[i+1]} · 1 = G^{[i+1]}
is the identity.

The Sifting Procedure


By Lagrange’s theorem, every g ∈ G can be written uniquely in the form
g = rm rm−1 · · · r1 with ri ∈ Ri . This decomposition can be done algorithmi-
g
cally: Given g ∈ G, find the coset representative r1 ∈ R1 such that β1 = β1r1 .
−1 g2 r2
Then compute g2 := gr1 ∈ G ; find r2 ∈ R2 such that β2 = β2 ; compute
[2]

g3 := g2r2−1 ∈ G [3] ; etc. This factorization procedure is called sifting and may
be considered as a permutation group version of Gaussian elimination.
Sifting can also be used for testing membership in G. Given h ∈ Sym(),
we attempt to factor h as a product of coset representatives. If the factor-
ization is successful then h ∈ G. Two things may go awry: It is possible
that, for some i ≤ m, the ratio h i := hr1−1r2−1 · · · ri−1
−1
computed by the sift-
[i]
ing procedure carries βi out of the orbit βi ; the other possibility is that
G

h m+1 := hr1−1r2−1 · · · rm−1


−1 −1
rm = 1. In both cases, obviously h ∈ G. The ratio
h i with the largest index i (i ≤ m + 1) computed by the sifting procedure is
called the siftee of h.
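A minimal sketch of sifting, assuming transversals are stored explicitly as dictionaries indexed by the points of the fundamental orbits (the function names and the tuple representation of permutations are illustrative choices, not the book's code):

    def mul(p, q):
        """Product pq acting left to right: x^(pq) = (x^p)^q, with points 0, ..., n-1."""
        return tuple(q[x] for x in p)

    def inv(p):
        r = [0] * len(p)
        for i, x in enumerate(p):
            r[x] = i
        return tuple(r)

    def sift(h, base, transversals):
        """Attempt to factor h as r_m ... r_1 with r_i in the i-th transversal.

        transversals[i] maps each point gamma of the i-th fundamental orbit to the
        coset representative carrying base[i] to gamma.  Returns the siftee of h.
        """
        residue = h
        for i, beta in enumerate(base):
            gamma = residue[beta]
            if gamma not in transversals[i]:
                return residue            # beta is carried out of its fundamental orbit
            residue = mul(residue, inv(transversals[i][gamma]))
        return residue                    # a nontrivial residue here means h is not in G

With a correct base and strong generating set, h belongs to G exactly when the returned siftee is the identity permutation.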
Computing and storing a transversal explicitly may require Θ(n²) time and
memory. To avoid that, transversals can be stored in Schreier trees. A Schreier
tree data structure for G is a sequence of pairs (S_i, T_i) called Schreier trees,
one for each base point β_i, 1 ≤ i ≤ m. Here T_i is a directed labeled tree, with
all edges directed toward the root β_i and edge labels from a set S_i ⊆ G^{[i]}. The
vertices of T_i are the points of the fundamental orbit β_i^{G^{[i]}}. The labels satisfy
the condition that for each directed edge from γ to δ with label h, γ^h = δ. If
γ is a vertex of T_i then the sequence of the edge labels along the unique path
from γ to β_i in T_i is a word in the elements of S_i such that the product of these
permutations moves γ to β_i. Thus each Schreier tree (S_i, T_i) defines inverses
of a set of coset representatives for G^{[i+1]} in G^{[i]}.
We store inverses of coset representatives in the Schreier trees because sifting
requires the inverses of these transversal elements.
Besides the O(|S_i|n) memory requirement to store the permutations in S_i,
storing T_i requires only O(n) additional memory, because T_i can be stored in
an array V_i of length n. The γth entry of V_i is defined if and only if γ ∈ β_i^{G^{[i]}},
and in this case V_i[γ] is a pointer to the element of S_i that is the label of the
unique directed edge of T_i starting at γ. Because of this representation as an
array, Sims originally called Schreier trees Schreier vectors.
Continuing our example G = S_4 with base B = (1, 2, 3) and SGS T = {(1,
2, 3, 4), (2, 3, 4), (3, 4)}, let us construct Schreier trees for G using the label
sets S_i := T ∩ G^{[i]}. The trees T_i can be constructed as the breadth-first-search
trees which compute the orbits β_i^{G^{[i]}} (cf. Section 2.1.1) but, since the edges of
the trees must be directed toward the roots, we have to use the inverses of the
elements of S_i in the construction of the T_i. The label set S_i determines uniquely
only the levels of the tree T_i, because the vertices on level j may be the images of
more than one vertex on level j − 1, under more than one permutation. For example, in T_1, level
0 contains the point 1, level 1 contains only the point 4 (since 4 is the only point
that is the image of 1 under the inverse of some element of S_1), and (1, 2, 3, 4)
is the only possible label for the directed edge (4, 1). Level 2 consists only of the point 3,
but we have three possibilities for defining the label of the directed edge (3, 4) because the
inverses of (1, 2, 3, 4), (2, 3, 4), and (3, 4) all map 4 to 3. Therefore, the label of
the edge (3, 4) depends on the order in which the inverses of the elements of S_1
are applied to take the image of 4. One possibility for the three Schreier trees is
coded in the arrays ((), (2, 3, 4), (2, 3, 4), (1, 2, 3, 4)), (∗, (), (2, 3, 4), (2, 3, 4)),
and (∗, ∗, (), (3, 4)). Here () denotes the identity permutation and ∗ denotes
that the appropriate entry of the array is not defined because the corresponding
point is not in the fundamental orbit of β_i. If, for example, we need a transversal
element carrying the first base point 1 to 3 then from the first array we obtain
that (2, 3, 4) · (1, 2, 3, 4) = (1, 2, 4, 3) maps 3 to 1, and its inverse is the desired
transversal element.
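A sketch (again only an illustration, with hypothetical names) of how such a Schreier vector can be built by breadth-first search with the inverses of the labels, and of how a group element carrying a given orbit point back to the base point is read off from it; mul and inv are as in the sifting sketch above and are repeated here so the fragment is self-contained:

    from collections import deque

    def mul(p, q): return tuple(q[x] for x in p)                 # x^(pq) = (x^p)^q
    def inv(p): return tuple(sorted(range(len(p)), key=p.__getitem__))

    def schreier_vector(beta, labels):
        """vector[gamma] is the label of the tree edge starting at gamma (so
        gamma^label lies one step closer to the root beta); vector[beta] is None."""
        vector, queue = {beta: None}, deque([beta])
        while queue:
            gamma = queue.popleft()
            for s in labels:
                delta = inv(s)[gamma]        # orbit is expanded with the inverses of the labels
                if delta not in vector:
                    vector[delta] = s
                    queue.append(delta)
        return vector

    def trace_to_base(gamma, beta, vector, n):
        """Element mapping gamma to beta (the inverse of the usual transversal
        element carrying beta to gamma), or None if gamma is not in the orbit."""
        if gamma not in vector:
            return None
        g = tuple(range(n))
        while gamma != beta:
            s = vector[gamma]
            g = mul(g, s)
            gamma = s[gamma]
        return g

Tracing the point 3 in the first array above multiplies the labels (2, 3, 4) and (1, 2, 3, 4) in path order, reproducing the product (1, 2, 4, 3) computed in the text.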

4.2. The Schreier–Sims Algorithm


Strong generating set constructions are based on the following lemma by O.
Schreier (hence giving rise to the name Schreier–Sims algorithm). Recall that
if H ≤ G and R is a right transversal for G mod H , then for g ∈ G we denote
the unique element of H g ∩ R by ḡ.

Lemma 4.2.1. Let H ≤ G = ⟨S⟩ and let R be a right transversal for G mod
H, with 1 ∈ R. Then the set

    T = { rs(\overline{rs})^{−1} | r ∈ R, s ∈ S }

generates H.
The elements of T are called Schreier generators for H .

Proof. By definition, the elements of T are in H, so it is enough to show that
T ∪ T^{−1} generates H. Note that T^{−1} = { rs(\overline{rs})^{−1} | r ∈ R, s ∈ S^{−1} }.
Let h ∈ H be arbitrary. Since H ≤ G, h can be written in the form h =
s_1 s_2 · · · s_k for some nonnegative integer k and s_i ∈ S ∪ S^{−1} for i ≤ k. We define
a sequence h_0, h_1, . . . , h_k of group elements such that

    h_j = t_1 t_2 · · · t_j r_{j+1} s_{j+1} s_{j+2} · · · s_k                  (4.4)

with t_i ∈ T ∪ T^{−1} for i ≤ j, r_{j+1} ∈ R, and h_j = h. Let h_0 := 1 · s_1 s_2 · · · s_k.
Recursively, if h_j is already defined then let t_{j+1} := r_{j+1} s_{j+1} (\overline{r_{j+1} s_{j+1}})^{−1} and
r_{j+2} := \overline{r_{j+1} s_{j+1}}. Clearly, h_{j+1} = h_j = h, and it has the form required in (4.4).
We have h = h_k = t_1 t_2 · · · t_k r_{k+1}. Since h ∈ H and t_1 t_2 · · · t_k ∈ ⟨T⟩ ≤ H, we
must have r_{k+1} ∈ H ∩ R = {1}. Hence h ∈ ⟨T⟩. □
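The Schreier generators of Lemma 4.2.1 for a point stabilizer H = G_β can be enumerated directly once a transversal is known. The following sketch is illustrative only (it reuses the tuple representation of permutations from the earlier sketches and hypothetical helper names):

    def mul(p, q): return tuple(q[x] for x in p)                 # x^(pq) = (x^p)^q
    def inv(p): return tuple(sorted(range(len(p)), key=p.__getitem__))

    def transversal(beta, gens):
        """Coset representatives for G mod G_beta: reps[gamma] carries beta to gamma."""
        reps, frontier = {beta: tuple(range(len(gens[0])))}, [beta]
        while frontier:
            gamma = frontier.pop()
            for s in gens:
                delta = s[gamma]
                if delta not in reps:
                    reps[delta] = mul(reps[gamma], s)
                    frontier.append(delta)
        return reps

    def schreier_generators(beta, gens):
        """The set { rs (rs-bar)^{-1} : r in the transversal, s in gens }."""
        reps = transversal(beta, gens)
        schreier = set()
        for r in reps.values():
            for s in gens:
                rs = mul(r, s)
                rs_bar = reps[rs[beta]]      # representative of the coset G_beta * rs
                schreier.add(mul(rs, inv(rs_bar)))
        return schreier

    # Example: S_4 = <(1,2,3,4), (3,4)> written as 0-based image tuples; the
    # Schreier generators then generate the stabilizer of the point 0.
    gens = [(1, 2, 3, 0), (0, 1, 3, 2)]
    stab_gens = schreier_generators(0, gens)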

Remark 4.2.2. In this book we deal only with finite groups, and so every
element h of a given group G = S can be written as a product h = s1 s2 · · · sk
of generators and we do not have to deal with the possibility that some si is the
inverse of a generator. In the proof of Lemma 4.2.1, we included the possibility
si ∈ S −1 since this lemma is valid for infinite groups as well, and in an infinite
group we may need the inverses of generators to write every group element as
a finite product.

In the analysis of the Schreier–Sims algorithm, we shall also use the following
observation from [Sims, 1971a].

Lemma 4.2.3. Let {β_1, . . . , β_k} ⊆ Ω and G ≤ Sym(Ω). For 1 ≤ j ≤ k + 1, let
S_j ⊆ G_{(β_1,...,β_{j−1})} such that ⟨S_j⟩ ≥ ⟨S_{j+1}⟩ holds for j ≤ k. If G = ⟨S_1⟩, S_{k+1} =
∅, and

    ⟨S_j⟩_{β_j} = ⟨S_{j+1}⟩                                                  (4.5)

holds for all 1 ≤ j ≤ k then B = (β_1, . . . , β_k) is a base for G and S = ∪_{1≤j≤k} S_j
is an SGS for G relative to B.

Proof. We use induction on k. Our inductive hypothesis is that S* :=
∪_{2≤j≤k} S_j is an SGS for ⟨S_2⟩, relative to the base B′ := (β_2, . . . , β_k). Let
G^{[i]} := G_{(β_1,...,β_{i−1})}; we have to check that (4.3) holds for 2 ≤ i ≤ k + 1. For
i = 2, (4.3) holds since, applying (4.5) with j = 1, we obtain G_{β_1} = ⟨S_2⟩ ≤
⟨S ∩ G_{β_1}⟩. The reverse containment is obvious. For i > 2, (4.3) follows from
the fact that S* ∩ G_{(β_1,...,β_{i−1})} generates ⟨S_2⟩_{(β_2,...,β_{i−1})} by the inductive hypoth-
esis, and so G^{[i]} ≥ ⟨S ∩ G_{(β_1,...,β_{i−1})}⟩ ≥ ⟨S* ∩ G_{(β_1,...,β_{i−1})}⟩ = ⟨S_2⟩_{(β_2,...,β_{i−1})} =
(G_{β_1})_{(β_2,...,β_{i−1})} = G^{[i]}. □

The Schreier–Sims Algorithm


Given G = ⟨T⟩, we can construct an SGS in the following way. We maintain
a list B = (β_1, . . . , β_m) of already known elements of a nonredundant base
and an approximation S_i for a generating set of the stabilizer G_{(β_1,...,β_{i−1})}, for
1 ≤ i ≤ m. During the construction, we always maintain the property that, for
all i, ⟨S_i⟩ ≥ ⟨S_{i+1}⟩. We say that the data structure is up to date below level j if
(4.5) holds for all i satisfying j < i ≤ m.
In the case when the data structure is up to date below level j, we compute
a transversal R_j for ⟨S_j⟩ mod ⟨S_j⟩_{β_j}. Then we test whether (4.5) holds for
i = j. By Lemma 4.2.1, this can be done by sifting the Schreier generators
obtained from R_j and S_j in the group ⟨S_{j+1}⟩. (In this group, membership testing
is possible, because Lemma 4.2.3 implies that we have a strong generating set
for ⟨S_{j+1}⟩.) If all Schreier generators are in ⟨S_{j+1}⟩ then the data structure is up
to date below level j − 1; otherwise, we add a nontrivial siftee h of a Schreier
generator to S_{j+1} and obtain a data structure that is up to date below level j + 1.
In the case j = m, we also choose a new base point β_{m+1} from supp(h).
We start the algorithm by choosing β_1 ∈ Ω that is moved by at least one
generator in T and setting S_1 := T. At that moment, the data structure is up to
date below level 1; the algorithm terminates when the data structure becomes
up to date below level 0. Lemma 4.2.3 immediately implies correctness.
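The following compact Python sketch follows this level-by-level control flow. It is an illustration under assumptions (explicit transversals instead of Schreier trees, no attention to efficiency, hypothetical names), not the book's implementation; permutations are tuples acting on {0, ..., n−1}, with products acting left to right.

    def mul(p, q): return tuple(q[x] for x in p)                 # x^(pq) = (x^p)^q
    def inv(p): return tuple(sorted(range(len(p)), key=p.__getitem__))

    def orbit_transversal(beta, gens, identity):
        """reps[gamma] is an element of <gens> carrying beta to gamma."""
        reps, frontier = {beta: identity}, [beta]
        while frontier:
            gamma = frontier.pop()
            for s in gens:
                delta = s[gamma]
                if delta not in reps:
                    reps[delta] = mul(reps[gamma], s)
                    frontier.append(delta)
        return reps

    def schreier_sims(T, n):
        identity = tuple(range(n))
        gens = [g for g in T if g != identity]
        if not gens:
            return [], [], []
        base = [next(p for p in range(n) if gens[0][p] != p)]
        S = [list(gens)]                   # S[i]: generators of the level-i stabilizer approximation
        trans = [{base[0]: identity}]      # trans[i]: transversal for <S[i]> mod <S[i]>_{base[i]}

        def sift(g, start):
            """Sift g through levels start, start+1, ...; return the siftee."""
            for i in range(start, len(base)):
                gamma = g[base[i]]
                if gamma not in trans[i]:
                    return g
                g = mul(g, inv(trans[i][gamma]))
            return g

        j = 0                              # the data structure is up to date below level j
        while j >= 0:
            trans[j] = orbit_transversal(base[j], S[j], identity)
            siftee = identity
            for r in list(trans[j].values()):          # test condition (4.5) at level j
                for s in S[j]:
                    rs = mul(r, s)
                    h = mul(rs, inv(trans[j][rs[base[j]]]))   # Schreier generator rs (rs-bar)^{-1}
                    siftee = sift(h, j + 1)
                    if siftee != identity:
                        break
                if siftee != identity:
                    break
            if siftee == identity:
                j -= 1                     # all Schreier generators sift: up to date below level j-1
            else:
                if j + 1 == len(base):     # new base point chosen from the support of the siftee
                    base.append(next(p for p in range(n) if siftee[p] != p))
                    S.append([])
                    trans.append({base[-1]: identity})
                S[j + 1].append(siftee)
                j += 1                     # up to date below level j+1
        return base, S, trans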
As an example, let us follow the construction of an SGS for the input
G = ⟨(2, 3), (1, 2, 4)⟩ ≤ Sym([1, 4]). We start by choosing 2 as the first base
point, as the first element of Ω = [1, 4] moved by the first generator of G.
(Note that although generating sets and orbits are sets, in the computer they
are stored as sequences and so we may talk about their first element.) We de-
fine S1 := {(2, 3), (1, 2, 4)} and compute the Schreier tree coded by the array
((1, 2, 4), (), (2, 3), (1, 2, 4)). At that moment, the data structure is up to date
below level 1. Next, we have to check whether the Schreier generators for the
point stabilizer G_2 are in the trivial group, which is the current data struc-
ture below level 1. The first Schreier generator is composed from the coset
representative r = (1, 4, 2), which carries the base point 2 to 1, and from the
generator s = (2, 3). So rs = (1, 4, 3, 2), \overline{rs} = (1, 4, 2), and rs(\overline{rs})^{−1} = (3, 4).
The siftee of (3, 4) in the trivial group is obviously itself and not the identity, so
we define the new base point 3 (as the first element moved by the siftee). We also
define S2 := {(3, 4)} and compute the Schreier tree coded by (∗, ∗, (), (3, 4)).
At that moment, the data structure is up to date below level 2. Checking the
Schreier generators reveals that we have a correct SGS for the group ⟨S_2⟩, so we
can step back a level and the data structure is up to date below level 1. The next
Schreier generator on level 1 with nontrivial siftee is composed from r = (2, 3),
which carries the base point 2 to 3, and from the generator s = (1, 2, 4).
Now rs = (1, 2, 3, 4), \overline{rs} = (2, 3), and rs(\overline{rs})^{−1} = (1, 3, 4). The siftee of
(1, 3, 4) in ⟨S_2⟩ is (1, 4), so we have to redefine S_2 as S_2 := {(3, 4), (1, 4)}
and recompute the second Schreier tree. The new tree is coded by the array
((1, 4), ∗, (), (3, 4)). At that moment, the data structure is up to date below level
2. For the coset representative r = ((1, 4) · (3, 4))^{−1} = (3, 1, 4) carrying the
base point 3 to 1 and for s = (3, 4) ∈ S_2, we obtain rs = (1, 3), \overline{rs} = (3, 1, 4),
and rs(\overline{rs})^{−1} = (1, 4). The siftee of (1, 4) is itself, so we add 1 to the base,
define S_3 := {(1, 4)}, and compute the third Schreier tree coded in the array
((), ∗, ∗, (1, 4)). At that moment, the data structure is up to date below level
3. After that, we check that all Schreier generators composed on levels 3, 2, 1
have trivial siftees, and so the data structure is up to date below level 0 and
we can terminate the algorithm. The output is the base (2, 3, 1) and the SGS
{(2, 3), (1, 2, 4), (3, 4), (1, 4)}.
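As a quick check, the same generators can be fed (with points shifted to 0, ..., 3) to the Schreier–Sims sketch given after the algorithm description. The base it returns need not coincide with (2, 3, 1), since it depends on the order in which Schreier generators are examined, but by (4.2) the product of the fundamental orbit lengths is |G| = 24 in either case.

    # 0-based image tuples of (2,3) and (1,2,4), fed to the sketch above
    gens = [(0, 2, 1, 3), (1, 3, 2, 0)]
    base, S, trans = schreier_sims(gens, 4)
    order = 1
    for t in trans:
        order *= len(t)        # (4.2): |G| is the product of the fundamental orbit lengths
    print(order)               # 24, so G is all of S_4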
Because of the fundamental importance of SGS constructions for permutation
group computations, we do a detailed complexity analysis. We analyze two
versions. In the first one, we store transversals explicitly; in the second one,
transversals are stored in Schreier trees.
The number of base points is at most log |G|. For a fixed base point β_k, the
set S_k changes at most log |G| times during the algorithm, since the group ⟨S_k⟩
must increase each time we add an element to S_k. We have to recompute the
transversal R_k after each change of S_k. This can be done by an orbit computation
as in Theorem 2.1.1(i) (using the inverses of the elements of S_k, since the edges
in the Schreier tree are directed toward the root, and now we proceed in the
opposite direction). However, we have to find the image of a point in the orbit of
β_k under an element of S_k only once during the entire algorithm; hence the time
spent in image computations (for all base points, during the entire algorithm)
is O(|B| log |G| · n + |T|n), which is O(n log² |G| + |T|n).
In the first version, we also have to compute transversals explicitly. For a
fixed base point, this may require O(n) permutation multiplications at cost
O(n) each, so the total time spent in transversal computations is O(n² log |G|).
In the second version, the bookkeeping necessary to set up the Schreier trees
can be done in O(n log |G|) total time.
Next, we estimate the time needed to sift Schreier generators. Again, we
observe that although the sets Rk and Sk change, they are always only aug-
mented and therefore any pair of elements of R_k and S_k needs to be combined into a
Schreier generator only once. The total number of Schreier generators is
∑_k |R_k||S_k| ∈ O(n log² |G| + |T|n). In the first version, sifting a Schreier gen-
erator can be done by O(log |G|) permutation multiplications, so the total
cost is O(n² log³ |G| + |T|n² log |G|). In the second version, recovering
a coset representative from a Schreier tree may require O(n) permutation
multiplications, so the cost of one sift is O(n² log |G|); the total cost is
O(n³ log³ |G| + |T|n³ log |G|).
The first version stores ∑_k |S_k| ∈ O(log² |G|) strong generators and ∑_k |R_k|
∈ O(n log |G|) coset representatives; therefore it requires O(n² log |G| + |T|n)
memory, whereas the second one needs only O(n log² |G| + |T|n). Note that
we combined the terms log² |G| and n log |G| to n log |G|, which is incorrect
if log |G| ∉ O(n). However, the stated memory requirements are valid even
in this case, since |S_k| can also be estimated from above by the length of the
longest subgroup chain in G. This length is always less than 3n/2 (cf. [Cameron
et al., 1989]), so ∑_k |S_k| ∈ O(n log |G|).
Hence we have proved the following theorem.

Theorem 4.2.4. Given G = ⟨T⟩, a strong generating set for G can be com-
puted by deterministic algorithms in O(n² log³ |G| + |T|n² log |G|) time using
O(n² log |G| + |T|n) memory or in O(n³ log³ |G| + |T|n³ log |G|) time using
O(n log² |G| + |T|n) memory.

Some remarks are in order. First, we note that in both versions a factor n can
be shaved off from the term including |T | in the time analysis (Exercise 4.4).
The time–space tradeoff in complexity demonstrated in Theorem 4.2.4 is
quite common in computational group theory. Because of hardware restrictions,
we usually choose versions with smaller memory usage.
In the analysis of the second version, we estimated the depth of Schreier trees
by n. As a cyclic group with one generator shows, this bound can be achieved;
however, practical experience indicates that, for most groups and generating
sets, a breadth-first-search Schreier tree will have significantly smaller depth.
Nevertheless, even in such nicely behaving groups, Schreier trees with large
depth may occur! During the construction, when we define a new base point,
the last group in our stabilizer chain is temporarily a cyclic group with one
generator. If the group contains elements of large order, this cyclic group may
define a Schreier tree of large depth. When a second generator is added to this
level, the breadth-first Schreier tree defined by the two generators may have
small depth; however, if we always augment previous Schreier trees, a long
branch remains in the tree forever. (This situation occurs for example in the
group GL_d(q), acting as a permutation group on the q^d points of the underlying
vector space.) Hence, it may be useful to recompute the ith Schreier tree from
scratch each time a new generator is added to Si . Depending on the types of
groups with which we work, this recomputation may mean savings in the sifting
part of the algorithm or it may be just a waste of time because the augmented tree
works just as well. We brought up this issue to indicate the nature of problems
we may face at implementation, in addition to the complexity considerations.
For arbitrary inputs, the worst-case complexity of the first version is O(n⁵)
(as in the memory estimates, the extra log n factors can be avoided by using
stronger bounds than the trivial log(n!) on the length of subgroup chains).
[Knuth, 1991] gives a family of generators for S_n for which the running time
of the algorithm is indeed Θ(n⁵). Based on experiments, we believe that, using
random generators for S_n, the expected running time is Θ(n⁴). In Section 10.2,
we shall describe faster algorithms for the recognition and effective handling
of symmetric and alternating groups.
SGS constructions are essential parts of almost all permutation group al-
gorithms since strong generating sets provide the only known means to test
membership in groups. Therefore, it is of utmost importance to have efficient
ways to construct strong generating sets. This importance is the reason why we
deal at length with the various attempts to speed up the Schreier–Sims algorithm.

4.3. The Power of Randomization


The time-critical part of the Schreier–Sims procedure is the sifting of Schreier
generators. Therefore, a possible way to speed up the algorithm is to find a fast
solution of the following problem:

    Given G = ⟨S⟩, a transversal R for G mod G_β, and H ≤ G_β,
    decide whether H = G_β.                                                  (4.6)

Of course, we want to apply the solution in the case when G = ⟨S_i⟩ and H =
⟨S_{i+1}⟩ for two consecutive levels during the SGS construction.
Practical experience shows that if H ≠ G_β then usually at least half of the
Schreier generators constructed from R and S are in G_β\H. Therefore, testing
only a random sample of Schreier generators, we obtain a heuristic solution for
(4.6), and thus a heuristic randomized algorithm for SGS construction.
We caution that, although testing a small number of randomly chosen Schreier
generators usually suffices, it is easy to construct examples when there is only
one generator g ∈ S such that Schreier generators obtained from g and R may
show H ≠ G_β, because for all k ∈ S\{g} and r ∈ R, rk(\overline{rk})^{−1} ∈ H (cf. Exercise
4.5). Nevertheless, it is possible to extend the heuristics above to an algorithm
with rigorous probability analysis, leading to a fast Monte Carlo SGS construc-
tion. We shall describe this extension in Section 4.5.
Another possibility for a randomized SGS construction is based on the fact
that if an alleged SGS is not correct then, with probability at least 1/2, a uni-
formly distributed random element of G does not sift through the correspond-
ing transversal system. More precisely, let G = ⟨S⟩, B = (β_1, . . . , β_m) ⊆ Ω
such that no element of S fixes B pointwise, and let S_i := S ∩ G_{(β_1,...,β_{i−1})} for
1 ≤ i ≤ m + 1. For 1 ≤ i ≤ m, we can compute the orbits β_i^{⟨S_i⟩} and sets of
group elements R_i ⊆ ⟨S_i⟩ such that for all γ ∈ β_i^{⟨S_i⟩} there is a unique r_γ ∈ R_i
with β_i^{r_γ} = γ. (Note that R_i may be only a proper subset of a transversal for
⟨S_i⟩ mod ⟨S_{i+1}⟩ if S is not an SGS for G. The R_i may be computed explicitly
or stored in a Schreier tree data structure.) Given h ∈ Sym(Ω), we can apply the
sifting procedure to attempt to factor h in the form h = r_m · · · r_1 with r_i ∈ R_i.
We say that h sifts through if this factorization procedure succeeds.

Lemma 4.3.1. Let G = ⟨S⟩, B = (β_1, . . . , β_m), and S_i, for 1 ≤ i ≤ m + 1, be
as in the previous paragraph. If B is not a base or S is not a strong generating
set relative to B then the probability that a uniformly distributed random g ∈ G
does not sift through the transversal system built from S is at least 1/2.

Proof. Let i be the largest integer such that ⟨S_i⟩_{β_i} > ⟨S_{i+1}⟩. By Lemma 4.2.3,
such an i exists and we have a correct SGS for ⟨S_{i+1}⟩. In particular, we can test
membership in the group ⟨S_{i+1}⟩.
Let p be the probability that a given uniformly distributed random g ∈ G
sifts through the first i levels of our data structure and, in the case when
g sifts through, let ḡ denote the unique element of R_i · · · R_2 R_1, which is
the product of transversal elements computed by the sifting procedure. Then
g(ḡ)^{−1} ∈ G_{(β_1,...,β_i)}, and every element of G_{(β_1,...,β_i)} has the same chance to
occur as g(ḡ)^{−1} for some g ∈ G. Hence Prob(g(ḡ)^{−1} ∈ ⟨S_{i+1}⟩) = 1/|G_{(β_1,...,β_i)} :
⟨S_{i+1}⟩|. We know that |G_{(β_1,...,β_i)} : ⟨S_{i+1}⟩| ≥ 2, since G_{(β_1,...,β_i)} ≥ ⟨S_i⟩_{β_i} > ⟨S_{i+1}⟩;
we also know that g(ḡ)^{−1} ∈ ⟨S_{i+1}⟩ if and only if it sifts through the transver-
sal system of ⟨S_{i+1}⟩. In summary, g does not sift through with probability
(1 − p) + (1 − 1/|G_{(β_1,...,β_i)} : ⟨S_{i+1}⟩|) p ≥ 1 − p + p/2 ≥ 1/2. □

To get a better approximation of an SGS, a nontrivial siftee can be used to


augment S. Hence, if uniformly distributed random elements of G are available,
we have a Monte Carlo algorithm that, with arbitrarily small error probability
prescribed by the user, constructs an SGS. A version of this procedure is de-
scribed and analyzed in more detail in Lemma 5.4.1.
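The test behind Lemma 4.3.1 is easy to sketch, assuming a source of uniformly distributed random elements of G and the sifting routine from the earlier sketch (both assumptions, with hypothetical names):

    def probably_complete(sift_fn, random_element, identity, t=20):
        """sift_fn(h) should return the siftee of h with respect to the current
        transversal system; random_element() is an assumed source of uniformly
        distributed random elements of G.  If the alleged base and SGS are not
        correct then, by Lemma 4.3.1, the chance that all t elements sift through
        is at most 2**(-t)."""
        for _ in range(t):
            siftee = sift_fn(random_element())
            if siftee != identity:
                return False, siftee       # a nontrivial siftee: augment the SGS with it
        return True, None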
The Random Schreier–Sims Algorithm
If uniformly distributed random elements are not available then we have to resort
to one of the methods for random element generation described in Section 2.2.
The first practical implementation is described in [Leon, 1980b, Section 7].
Random elements are constructed via the first method described in Section 2.2,
as the vertices traversed during a random walk on the Cayley graph Γ(G, T)
defined by the input G = ⟨T⟩. Leon uses the stopping criterion that twenty con-
secutively constructed random elements of G sift through. The term “random
Schreier–Sims algorithm” in the literature usually refers to Leon’s algorithm.
Similar SGS computations based on the other two methods for random ele-
ment selection described in Section 2.2 are not implemented. In recent releases
of GAP, a heuristic speedup of the nearly linear-time SGS construction of
Section 4.5 is used to replace the random Schreier–Sims algorithm.
In practice, fast heuristic SGS constructions are used in conjunction with
independent strong generating tests to check the correctness of the result. The
simplest such test, which can be applied quite frequently in practice, is when the
order of G is known in advance. Because the methods never put permutations
in the SGS that are not elements of the input group, the possible error is always
one-sided: Namely, the group order computed by (4.2) from the alleged SGS is
never greater than the true value. Therefore, if the computed and the true values
are the same then the construction is correct.
We postpone the description of further strong generating tests to Chapter 8,
when more algorithmic machinery will be available to us.

4.4. Shallow Schreier Trees


For large values of n, storing transversals for a point stabilizer chain of some
G ≤ Sn explicitly requires too much memory. However, if transversals are
stored in Schreier trees then the sifting process slows down proportionally to
the depth of trees. Hence we are interested in methods for constructing Schreier
trees that have both small depth and use only a small number of edge labels.
In this section, we present two constructions: A deterministic one from
[Babai et al., 1991] and a randomized one from [Cooperman et al., 1990]. Both
methods manipulate subsets of groups, which themselves are not subgroups.
Given a sequence of elements (g_1, . . . , g_k) ⊆ G ≤ Sym(Ω), |Ω| = n, the
cube C_k of these elements is defined as

    C_k = { g_1^{ε_1} g_2^{ε_2} · · · g_k^{ε_k} | ε_i ∈ {0, 1} }.              (4.7)

Also, let C_k^{−1} := {g ∈ G | g^{−1} ∈ C_k}. Clearly, |C_k| ≤ 2^k; we say that the cube
is nondegenerate if equality is attained. Cubes were introduced in [Babai and
Szemerédi, 1984] in a highly nonconstructive setting: They were used to es-
tablish that testing membership in black-box groups is in the complexity class
NP.
Nondegenerate cubes are usually obtained by repeated applications of the
following simple observation.

Lemma 4.4.1. Let g_1, . . . , g_k, g_{k+1} ∈ G. Then |C_{k+1}| = 2|C_k| if and only if
g_{k+1} ∉ C_k^{−1}C_k.

Proof. |C_{k+1}| = 2|C_k| ⟺ C_k g_{k+1} ∩ C_k = ∅ ⟺ g_{k+1} ∉ C_k^{−1}C_k. □

Testing membership in C_k^{−1}C_k is difficult and no method other than essen-
tially listing all elements is known. However, as observed in [Babai et al., 1991],
it is easy to check the following sufficient condition for nonmembership: If, for
some α ∈ Ω, α^g ∉ α^{C_k^{−1}C_k} then g ∉ C_k^{−1}C_k. The set α^{C_k^{−1}C_k} is computable in
O(kn) time: Define Δ_0 := {α} and recursively Δ_i := Δ_{i−1} ∪ Δ_{i−1}^{h_i}, where h_i is
the ith member in the sequence g_k^{−1}, . . . , g_1^{−1}, g_1, . . . , g_k. Then Δ_{2k} = α^{C_k^{−1}C_k}.
Also, we can keep track of how points in α^{C_k^{−1}C_k} were first reached, thereby defin-
ing a breadth-first-search tree structure on α^{C_k^{−1}C_k} (cf. Section 2.1.1 for the
definition of breadth-first-search trees). Based on this observation, a shallow
Schreier tree for the transversal of a point stabilizer can be built.

Lemma 4.4.2. Given G = ⟨S⟩ ≤ Sym(Ω) and α ∈ Ω, a Schreier tree of depth
at most 2 log |G| for the transversal of G mod G_α can be built in O(n log² |G| +
|S|n) time, by a deterministic algorithm.

Proof. We define elements g_1, . . . , g_k of G such that their cube is non-
degenerate and α^{C_k^{−1}C_k} = α^G. Since C_k is nondegenerate, k ≤ log |G| and
the Schreier tree with labels {g_i, g_i^{−1} | i ≤ k} has depth at most 2 log |G|.
Let g_1 be an arbitrary (nonidentity) element of S. If g_1, . . . , g_i are already
defined and α^{C_i^{−1}C_i} ≠ α^G then there exist γ_i ∈ α^{C_i^{−1}C_i} and u_i ∈ S such that
γ_i^{u_i} ∉ α^{C_i^{−1}C_i}. Let g be the product of labels along the path from γ_i to α in the
tree on α^{C_i^{−1}C_i} and let g_{i+1} := g^{−1}u_i.
The orbit α^G is computed in O(|S|n) time. For fixed i ≤ k, the set α^{C_i^{−1}C_i} can
be obtained in O(n log |G|) time and g_{i+1} is computed using O(log |G|) group
multiplications, requiring also O(n log |G|) time. Finally, we observe that, for
each β ∈ α^G and s ∈ S, we have to compute β^s at most once during the entire
algorithm when searching for appropriate γ_i and u_i values, since if β^s ∈ α^{C_i^{−1}C_i}
then β^s ∈ α^{C_j^{−1}C_j} for all j > i as well. Therefore, the total cost of all image
computations β^s during the algorithm is O(|S|n). □
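The construction in this proof can be sketched as follows (an illustration only; permutations are tuples on {0, ..., n−1}, and the helper names are hypothetical). The returned elements g_1, ..., g_k form a nondegenerate cube with α^{C_k^{−1}C_k} = α^G, so the breadth-first Schreier tree with labels {g_i, g_i^{−1}} has depth at most 2k ≤ 2 log |G|.

    def mul(p, q): return tuple(q[x] for x in p)                 # x^(pq) = (x^p)^q
    def inv(p): return tuple(sorted(range(len(p)), key=p.__getitem__))

    def full_orbit(alpha, gens):
        orbit, frontier = {alpha}, [alpha]
        while frontier:
            gamma = frontier.pop()
            for s in gens:
                if s[gamma] not in orbit:
                    orbit.add(s[gamma])
                    frontier.append(s[gamma])
        return orbit

    def cube_image(alpha, cube, identity):
        """alpha^{C^{-1}C}, together with an element of C^{-1}C carrying alpha to
        each reached point (built by applying g_k^{-1},...,g_1^{-1},g_1,...,g_k in turn)."""
        from_alpha = {alpha: identity}
        for h in [inv(g) for g in reversed(cube)] + list(cube):
            for gamma in list(from_alpha):
                delta = h[gamma]
                if delta not in from_alpha:
                    from_alpha[delta] = mul(from_alpha[gamma], h)
        return from_alpha

    def cube_generators(alpha, S, n):
        identity = tuple(range(n))
        orbit = full_orbit(alpha, S)
        if orbit == {alpha}:
            return []
        cube = [next(s for s in S if s != identity)]
        while True:
            from_alpha = cube_image(alpha, cube, identity)
            if set(from_alpha) == orbit:
                return cube
            # some point of alpha^{C^{-1}C} is moved outside it by a generator
            pt, u = next((pt, u) for pt in from_alpha for u in S
                         if u[pt] not in from_alpha)
            # the new element maps alpha outside alpha^{C^{-1}C}, so the cube
            # stays nondegenerate by the nonmembership test and Lemma 4.4.1
            cube.append(mul(from_alpha[pt], u))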

Remark 4.4.3. In implementations, we use the following slight modification


of the algorithm described in Lemma 4.4.2. Using the notation introduced there,
suppose that g_1, . . . , g_i are already defined. Let Δ be the union of the first 2i
levels of the breadth-first-search tree rooted at α, in the graph defined by the
edge labels {g_1, . . . , g_i, g_1^{−1}, . . . , g_i^{−1}}. If Δ ≠ α^G then we construct g_{i+1} such
that α^{g_{i+1}} ∉ Δ. Clearly, α^{C_i^{−1}C_i} ⊆ Δ, so we have the same theoretical guarantee
for the depth of the output Schreier tree as in Lemma 4.4.2. In practice, the
depth is usually less than log |α^G|.

The randomized method of [Cooperman et al., 1990], from which the follow-
ing two lemmas are taken, constructs a Schreier tree of depth O(log |α G |) for G
mod G α . It assumes the availability of uniformly distributed random elements
of G. A variant of Lemma 4.4.4 seems to have first appeared in [Erdős and
Rényi, 1965].
For a random variable A, the expected value of A is denoted by E(A).

Lemma 4.4.4. Let G ≤ Sym(Ω), let Δ be an orbit of G, with |Δ| = m, and let
Σ be an arbitrary subset of Δ. If g is a uniformly distributed random element
of G then E(|Σ^g\Σ|) = |Σ|(m − |Σ|)/m.

Proof. Since G acts transitively on Δ, for any γ ∈ Σ and δ ∈ Δ\Σ there are
|G|/m elements h ∈ G such that γ^h = δ. Thus Prob(δ ∈ Σ^g) = |Σ|/m for any
δ ∈ Δ\Σ and E(|Σ^g\Σ|) = |Σ|(m − |Σ|)/m. □

Lemma 4.4.5. Let G ≤ Sym(Ω), let Δ be an orbit of G, with |Δ| = m, and let
∅ ≠ Σ ⊂ Δ. Moreover, let 0 < p < 1/2, and let g be a uniformly distributed
random element of G. Then
(i) if |Σ| ≤ m/2 then

    Prob( |Σ^g ∪ Σ| ≥ |Σ|(2 − |Σ|/((1 − p)m)) ) ≥ p;

(ii) if |Σ| ≥ m/2 then

    Prob( |Δ\(Σ^g ∪ Σ)| ≤ (m − |Σ|)²/((1 − p)m) ) ≥ p.

Proof. (i) The condition |Σ^g ∪ Σ| ≥ |Σ|(2 − |Σ|/((1 − p)m)) is equivalent to
|Σ^g\Σ| ≥ c|Σ|, for c = 1 − |Σ|/((1 − p)m). Let x = Prob(|Σ^g\Σ| ≥ c|Σ|).
Then x|Σ| + (1 − x)c|Σ| ≥ E(|Σ^g\Σ|). By Lemma 4.4.4, this inequality is
equivalent to x ≥ p.
(ii) In this case, the condition |Δ\(Σ^g ∪ Σ)| ≤ (m − |Σ|)²/((1 − p)m) is
equivalent to |Σ^g\Σ| ≥ (1 − c)(m − |Σ|), for c = (m − |Σ|)/((1 − p)m). Let x =
Prob(|Σ^g\Σ| ≥ (1 − c)(m − |Σ|)). Then x(m − |Σ|) + (1 − x)(1 − c)(m − |Σ|) ≥
E(|Σ^g\Σ|). By Lemma 4.4.4, this inequality is equivalent to x ≥ p. □

The following theorem, with weaker constants, is the main result of


[Cooperman et al., 1990].

Theorem 4.4.6. Let G ≤ Sym(Ω), let Δ be an orbit of G, with |Δ| = m, and
let δ ∈ Δ. Moreover, let (g_1, g_2, . . . , g_t), t ≥ 8 log m + 16, be a sequence of
uniformly distributed random elements of G. Then, with probability at least
1 − m^{−0.29}, δ^{⟨g_1,g_2,...,g_t⟩} = Δ and the breadth-first-search Schreier tree for G
mod G_δ using the labels g_1^{−1}, g_2^{−1}, . . . , g_t^{−1} has depth at most 2 log m + 4. For
c > 1, using c(8 log m + 16) random elements ensures a Schreier tree of depth
at most 2 log m + 4 with probability at least 1 − m^{−0.29c}.

Proof. The basic idea of the proof is the following. Let D_j denote the cube of
the first j random elements, and let Γ_j := δ^{D_j}. We also fix some 0 < p < 1/2.
Then Γ_{j+1} = Γ_j ∪ Γ_j^{g_{j+1}}, and Lemma 4.4.5 implies that, with probability at
least p, the size of Γ_j increases by at least a constant factor when |Γ_j| ≤
m/2, and the size of Δ\Γ_j decreases by at least a constant factor when
|Γ_j| > m/2. Hence, after O(log m) rounds, δ^{D_j} = Δ. Also, the vertices in
δ^{D_j} are on the first j levels of the Schreier tree (note that the Schreier tree uses
the inverses of the g_i as labels, since edges are directed toward the root δ).
We can obtain the estimate for the depth of the Schreier tree described in the
statement of the theorem by a refinement of the argument in the previous para-
graph. Let us fix p := 0.46. We define a subsequence H = (h_1, h_2, . . . , h_s)
of (g_1, g_2, . . . , g_t), consisting of those g_i for which the already constructed
part of Δ increases by the amount indicated in Lemma 4.4.5. Formally, sup-
pose that (h_1, h_2, . . . , h_j) is already constructed, where h_i = g_{k_i} for some k_1 <
k_2 < · · · < k_j. Let C_j denote the cube of (h_1, h_2, . . . , h_j), and let Σ_j := δ^{C_j}.
If |Σ_j| ≤ m/2 then we define h_{j+1} as the g_i with the smallest index i > k_j
satisfying

    |Σ_j^{g_i} ∪ Σ_j| ≥ |Σ_j| (2 − |Σ_j|/(0.54m)),

whereas if |Σ_j| > m/2 then we define h_{j+1} as the g_i with the smallest index
i > k_j satisfying

    m − |Σ_j^{g_i} ∪ Σ_j| ≤ (m − |Σ_j|)²/(0.54m).

We have to show that

(a) s ≥ ⌈2 log m⌉ + 4, with probability at least 1 − m^{−0.29} if t ≥ 8 log m + 16,
and with probability at least 1 − m^{−0.29c} if t ≥ c(8 log m + 16); and
(b) δ^{C_{⌈2 log m⌉+4}} = Δ.

(a) and (b) imply that the breadth-first-search tree using the first ⌈2 log m⌉ + 4
of the h_i as generators already has depth at most ⌈2 log m⌉ + 4 and, of course,
the depth of the breadth-first-search tree using all g_i cannot be greater.
(a) is an easy consequence of Lemma 2.3.3, applied with the parameters
ε = 0.4, t = 8 log m + 16, p = 0.46 and ε = 1 − 0.6/c, t = c(8 log m +
16), p = 0.46, respectively. To prove (b), consider the function

    f_m(x) = x(2 − x/(0.54m))            if 0 < x ≤ 0.5m,
    f_m(x) = m − ⌈(m − x)²/(0.54m)⌉      if 0.5m < x < m,

defined for the integers 1 ≤ x ≤ m. We use the notation f_m^{{k}}(x) := f_m(f_m(f_m
. . . f_m(x) . . .)) (k iterations). Since f_m(x) is monotone increasing, it is enough
to show that

    f_m^{{⌈2 log m⌉+4}}(1) = m.                                              (4.8)

For m < 2^8/0.54, (4.8) can be checked by a small computer calculation. For
m ≥ 2^8/0.54, we can argue as follows. If k > 1.36 log m − 3.35 > log_{5/3} 0.18m
then f_m^{{k}}(1) ≥ 0.18m, since f_m(x) ≥ 5x/3 for x ≤ 0.18m. If x ≥ 0.18m then
f_m^{{5}}(x) > 0.625m. Finally, if x > 0.625m and k > 0.93 + log log 0.54m then
f_m^{{k}}(x) = m. To see this, observe that for such an x and k,

    m − f_m^{{k}}(x) ≤ (m − x)^{2^k}/(0.54m)^{2^k − 1} < 1.

Since f_m^{{k}}(x) is an integer, it must be equal to m. To finish the proof, we check
that 1.36 log m − 3.35 + 5 + 0.93 + log log 0.54m < 2 log m + 4 if m ≥
2^8/0.54. □

Remark 4.4.7. A possible variant is to generate the random elements one by


one and to discard immediately those that do not extend the already constructed
part of Δ by the amounts indicated in the statement of Lemma 4.4.5. In this
way, we never store more than ⌈2 log m⌉ + 4 group elements and have the same
theoretical guarantee for success (number of random elements to construct,
depth of tree, and error estimate) as in Theorem 4.4.6.
depth of tree, and error estimate) as in Theorem 4.4.6.

The cutoff point 0.18m and the probability p = 0.46 in the previous analysis
are quite arbitrary. Using a cutoff point closer to 0 and probability closer to 0.5,
the same argument as in the proof of Theorem 4.4.6 shows that, for each fixed
ε > 0, the breadth-first-search tree of (2 + ε) log m + c random group elements
has depth at most (1 + ε) log m + c with probability at least 1 − m −C . Here c and
C are positive constants, depending on ε. In practice, the breadth-first-search
tree of log m random elements usually has depth less than log m.
Given an arbitrary strong generating set for a group G, a slight modifica-
tion of the idea of Lemma 4.4.2 produces an SGS of size at most log |G|,
which also defines shallow Schreier trees. The following lemma is from
[Cooperman and Finkelstein, 1992].

Lemma 4.4.8. Let G = ⟨S⟩ and suppose that S is a strong generating set relative
to the base B = (β_1, . . . , β_m) with the corresponding point stabilizer chain G =
G^{[1]} ≥ G^{[2]} ≥ · · · ≥ G^{[m+1]} = 1. Then an SGS R relative to B can be computed
in nearly linear time by a deterministic algorithm, such that |R| ≤ log |G| and
the breadth-first-search Schreier tree defined by (R ∪ R^{−1}) ∩ G^{[i]} for G^{[i]} mod
G^{[i+1]} has depth at most 2 log |G^{[i]}|, for 1 ≤ i ≤ m.

Proof. We construct an SGS as a sequence of group elements R := (r1 ,
r2 , . . . , rk ) such that their cube is nondegenerate. This will ensure that k ≤
log |G|.
We work in a “bottom-up” manner, constructing an SGS for G [m] , G [m−1] ,
etc. Suppose that an initial segment R j := (r1 , . . . , r j ) of R is already defined
such that R_j ⊆ G^{[i]} and R_j ∩ G^{[i+1]} is an SGS for G^{[i+1]} with Schreier trees of
depth as stated. Let Δ ⊆ β_i^{G^{[i]}} be the set of points reachable from β_i via a path of
length at most 2j using the edges defined by R_j ∪ R_j^{−1}. If Δ ≠ β_i^{G^{[i]}} then, as in
the proof of Lemma 4.4.2, we construct r_{j+1} ∈ G^{[i]} such that β_i^{r_{j+1}} ∉ Δ. If Δ =
β_i^{G^{[i]}} then we decrease i by 1. The procedure stops when i reaches 0. □

We remark that [Sims, 1971a] contains an even faster method, which
constructs an SGS of size at most log |G| as a subset of the given SGS
(cf. Exercise 4.7). However, there is no theoretical guarantee better than the
orbit sizes on the depths of the new Schreier trees.
4.5. Strong Generators in Nearly Linear Time
In this section, we describe a nearly linear-time Monte Carlo SGS construction
from [Babai et al., 1991]. In Section 5.2, we shall give a version of this algorithm
that, with high probability, constructs an SGS in small-base groups even faster.
We follow the outline of the second version of the Schreier–Sims procedure
analyzed in Section 4.2 (where we store coset representatives implicitly in
Schreier trees). In this second version, there are two nonlinear bottlenecks: We
have to guarantee that the Schreier trees are of small depth, and we have to find
a fast solution for the problem presented in (4.6) in Section 4.3 (i.e., we have
to decide whether a subgroup H ≤ G β is proper, or H = G β ).
The input is G = ⟨T⟩ ≤ Sym(Ω), with |Ω| = n. Suppose that an initial seg-
ment B = (β_1, . . . , β_k) ⊆ Ω of a nonredundant base and an approximation S_i
for a generator set of the stabilizer G [i] := G (β1 ,...,βi−1 ) for 1 ≤ i ≤ k are already
computed. As in the original Schreier–Sims algorithm, we always maintain the
property that ⟨S_i⟩ ≥ ⟨S_{i+1}⟩ for all i. In this section, we use a slight modifica-
tion of the definition of an up-to-date SGS data structure, which was originally
given in Section 4.2. We say that the data structure is up to date below level j
if ⟨S_i⟩_{β_i} = ⟨S_{i+1}⟩ holds for all i with j < i ≤ k and the sum of depths of the
Schreier trees for levels j + 1, . . . , k is at most 6 log |⟨S_{j+1}⟩|.

The SGS Construction


In the case when the data structure is up to date below level j, we use the
algorithm described in the proof of Lemma 4.4.2 to compute a Schreier tree of
depth at most 2 log |⟨S_j⟩| for β_j^{⟨S_j⟩}. After that, we apply a Monte Carlo solution
of (4.6), to be described later, that either returns some g ∈ ⟨S_j⟩_{β_j} \ ⟨S_{j+1}⟩ or
reports that ⟨S_j⟩_{β_j} = ⟨S_{j+1}⟩. In the former case, we add g to S_{j+1} and declare
that the data structure is up to date below level j + 1. If ⟨S_j⟩_{β_j} = ⟨S_{j+1}⟩ then we
have an SGS for ⟨S_j⟩, and so we can construct uniformly distributed random
elements of ⟨S_j⟩. We compute a Schreier tree for ⟨S_j⟩ mod ⟨S_{j+1}⟩ as described
in Remark 4.4.7, using a sequence (g_1, . . . , g_s) of random elements as labels,
with s := ⌈2 log |β_j^{⟨S_j⟩}| + 4⌉ ≤ 6 log |β_j^{⟨S_j⟩}|. If this Schreier tree construction
succeeds then we set S j := S j+1 ∪ {g1 , . . . , gs } and obtain a data structure that
is up to date below level j − 1.
We want to handle the base point β1 in the same manner as the later base
points. To this end, in the spirit of Exercise 4.4, we start the algorithm by
defining S_0 := T, corresponding to an imaginary base point β_0 ∉ Ω (we can
think of β0 as a point added to the permutation domain, which is fixed by every
element of G, and so G = G β0 and the Schreier tree for G mod G β0 is trivial).
Then we say that the data structure is up to date below level 0; the algorithm
terminates when the data structure becomes up to date below level −1.

Lemma 4.5.1. The Schreier tree computations of the SGS construction require
O(n log n log4 |G|) total time and O(n log |G|) memory. This part of the SGS
construction algorithm is of Las Vegas type.

Proof. We use the notation introduced at the beginning of this section. There
are at most log |G| base points, and each base point βi is processed at most
log |G| times, since the subgroup ⟨S_i⟩ increases at each processing. Hence the
algorithms in Lemma 4.4.2 and Remark 4.4.7 are invoked at most log2 |G|
times. Lemma 4.4.2 is deterministic. Therefore, to obtain a fixed, but arbitrarily
small, constant error probability for the entire algorithm, we have to ensure
that each invocation of the algorithm in Remark 4.4.7 fails with probability
at most c′/log^2 |G|, for some constant c′ prescribed by the overall error re-
quirement. This can be achieved by choosing the value of c in the statement of
Theorem 4.4.6 as c = c′′ log n for an appropriate constant c′′. In fact, using a
large enough but constant c′′, we can reduce the error probability to less than
1/n^d, since log^2 |G| < (n log n)^2, and so the algorithm in Remark 4.4.7 fails
with probability less than 2^{−0.29 c′′ log n} < 1/(log^2 |G| · n^d). Note that, when invoking
this algorithm on level j, during the execution of the algorithm we actually
check whether the vertex set of the Schreier tree we construct is β_j^{⟨S_j⟩}; therefore
the Schreier tree computation part of the SGS algorithm is of Las Vegas type.
Concerning the time requirement, note that |Si | is O(log |G|) during the
entire algorithm, since S_i is obtained by adding O(log |β_i^{⟨S_i⟩}|) elements to S_{i+1}.
Hence one call of the algorithm in Lemma 4.4.2 runs in O(n log2 |G|) time.
On level i, one call of the algorithm in Remark 4.4.7 runs in O(n log |G| ·
log n log |G [i] : G [i+1] |) time, since our requirement on the depths of Schreier
trees ensures that a random element can be generated by O(log |G|) multipli-
cations and, as discussed at the error estimate part above, we have to generate
O(log n log |G [i] : G [i+1] |) random elements.
Since both kinds of Schreier tree computations are called at most log2 |G|
times, the total time requirement is as stated. 

Remark 4.5.2. With a time–space tradeoff, it is possible to achieve Schreier
tree computations that run in O(n log n log3 |G|) time but use O(n log2 |G|)
memory. The applications of the algorithm in Remark 4.4.7 already run within
this tighter time bound and we claim that, for fixed i, the at most log |G|
invocations of Lemma 4.4.2 on level i run in O(n log2 |G|) total time. This can
be achieved by saving the sequence Ri := (r1 , r2 , . . . , r ji ) from elements of
G [i] with nondegenerate cube computed at the first invocation of Lemma 4.4.2
on level i, and letting the algorithm augment Ri at later calls of Lemma 4.4.2.
The generating set S_i changes at each call at level i, so the orbit β_i^{⟨S_i⟩} has to be
computed at each call, but the other steps of the algorithm are executed only
once, as if the O(log2 |G|) elements of all the different Si were given together,
as one generating set.

What is left is to describe the solution for (4.6). This solution is based on the
following local expansion lemma from [Babai, 1992].

Lemma 4.5.3. Let S denote a set of generators of the group G and set T =
S ∪ S −1 ∪ {1}. Let D be any finite subset of T t , the set of t-term products of
members of T (in any order). Suppose that there exists q with 0 < q ≤ 1/(2t +1)
such that |D| ≤ (1 − 2qt)|G|. Then, for at least one generator g ∈ S,

|D\Dg| ≥ q|D|.

Proof. Suppose, on the contrary, that |D\Dg| < q|D| for all g ∈ S. Using this
assumption, we first prove that G = T 2t .
Observe that, for each g ∈ S, |D\Dg −1 | = |Dg\D| = |D\Dg| < q|D| and
for any x, y ∈ G,

D\Dx y ⊆ (D\Dy) ∪ (D\Dx)y. (4.9)

From these, an easy induction on k shows that for any u ∈ T k , we have


|D\Du| < kq|D|. As long as kq ≤ 1, this implies that u ∈ D −1 D. Since q ≤
1/(2t + 1), we obtain T 2t+1 ⊆ D −1 D ⊆ T 2t and so T 2t+1 = T 2t . Moreover,
since S generates G, we have G = ∪_{k≥0} T^k; from this we obtain G = T^{2t}.
To finish the proof, we count the number of pairs (x, u) with x ∈ D, u ∈ G,
and xu ∈ D two different ways. On one hand, for each fixed x ∈ D, the number
of u ∈ G with xu ∈ D is exactly |D|, so the number of pairs (x, u) is |D|2 . On
the other hand, fixing u ∈ G = T 2t we have |D\Du| < 2qt|D|, so the number
of x ∈ D with xu ∈ D is greater than (1 − 2qt)|D| and the number of pairs
(x, u) is greater than (1 − 2qt)|D||G|. However, this contradicts the condition
|D| ≤ (1 − 2qt)|G|. 

The Algorithm Solving (4.6)


We apply the local expansion lemma with the following parameters. Using the
notation of (4.6), let S be the union of labels used in the Schreier tree of R and in
the Schreier tree data structure of H . Moreover, let t be the sum of depths of all
of the trees mentioned above and let q = 1/4t. First, we sift the input generators
of G through H R. If one of them has a nontrivial siftee then H = G β . Hence
it is enough to consider the case when all input generators sift through, and so
S generates G.
Let D = H R; then D ⊆ (S ∪ S^{−1} ∪ {1})^t. If H ≠ G_β then |G_β : H| ≥ 2,
|D| ≤ |G|/2, and all conditions of Lemma 4.5.3 are satisfied.
Note that although D is not necessarily a group, just a subset of a group,
it is defined as a product of transversals. Consequently, it is possible to test
membership in D and to generate uniformly distributed random elements of D.
By Lemma 4.5.3, if H ≠ G_β then, for a randomly chosen d ∈ D,

    Prob(∃g ∈ S (dg ∉ D)) ≥ 1/4t.                                  (4.10)

Hence, if 4ct random elements d ∈ D are chosen and dg ∈ D for all such d and
all g ∈ S then Prob(H = G β ) > 1 − e−c .
How can we test whether dg ∈ D? Observe that each d ∈ D is factored uniquely
in the form d = hr for some h ∈ H and r ∈ R. Moreover, dg = hrg ∈ D if
and only if hrg(\overline{hrg})^{−1} ∈ H. (As usual, k̄ denotes the element of R such that
k ∈ G_β k̄.) Now h fixes β, so \overline{hrg} = \overline{rg}. Also, hrg(\overline{rg})^{−1} ∈ H if and only if
rg(\overline{rg})^{−1} ∈ H. Summarizing, we obtain that dg ∈ D if and only if the Schreier
generator rg(\overline{rg})^{−1} ∈ H. Consequently, instead of generating random elements
of D, it is enough to check whether Schreier generators constructed by com-
bining randomly chosen elements of R with all elements of S are in H. Hence
the solution of (4.6) we obtained can be considered as a theoretical justification
(with error estimate) of a version of the first heuristic described in Section 4.3.
The algorithm, as just described, requires that, after choosing some random
r ∈ R, we check the Schreier generators rg(rg)−1 for all g ∈ S. Our next goal is to
present an improvement that avoids checking all elements of S. Picking random
elements of S may not work well; this is the essence of Exercise 4.5. Instead, we
use an enhancement of the random subproduct method (cf. Section 2.3), sketched
below. Let k′ be an integer chosen randomly from the uniform distribution on
[0, |S| − 1] and let k = ⌈k′/2⌉. Moreover, let w_1 be a random subproduct of a
random ordering of S and let w_2 be a random subproduct of a random ordering
of a random subset S′ of size k of S (cf. Exercise 2.1 for the construction of
random orderings).
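
A minimal sketch of how the pair w_1, w_2 can be produced, with permutations stored as Python tuples (the helper names and this representation are ours, not the book's):

    import random

    def mul(p, q):
        # Composition of permutations given as tuples: (p*q)[i] = q[p[i]].
        return tuple(q[p[i]] for i in range(len(p)))

    def random_subproduct(gens, n):
        # Random subproduct of a random ordering of gens: include each generator
        # independently with probability 1/2, then multiply in a random order.
        word = [g for g in gens if random.random() < 0.5]
        random.shuffle(word)
        result = tuple(range(n))
        for g in word:
            result = mul(result, g)
        return result

    def subproduct_pair(S, n):
        # The pair w1, w2 of the text: w1 over all of S, w2 over a random subset
        # S' of size k = ceil(k'/2), where k' is uniform on [0, |S| - 1].
        k_prime = random.randrange(len(S))
        k = (k_prime + 1) // 2
        S_prime = random.sample(S, k)
        return random_subproduct(S, n), random_subproduct(S_prime, n)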
Finally, for g ∈ G, D ⊆ G, and integer a, let P(g, a) denote the proposition
that |{d ∈ D | dg ∉ D}| ≥ a. Note that

P(g, a) ∧ ¬P(h, b) =⇒ P(gh, a − b) ∧ P(hg, a − b), (4.11)

where ¬P denotes the negation of the proposition P (cf. Exercise 4.8).


Lemma 4.5.4. Suppose that P(g, a) holds for some g ∈ S. Then, with proba-
bility at least 1/4, P(wi , a/4) holds for at least one i ∈ {1, 2}.

Proof. Let us fix g ∈ S such that P(g, a) holds. First, we observe that
Prob(g ∈ S′) ≤ 1/2, since \binom{|S|−1}{k} / \binom{|S|}{k} = (|S| − k)/|S| ≥ 1/2.
We have w_1 = u g^ε v, where ε ∈ {0, 1} and u, v are random subproducts of
two subsets of S that partition S\{g}. Assume that v is the random subproduct
on the set that is not larger than the other one; the other case can be handled
similarly.
Let p denote the probability of the event that P(h, a/4) holds when h is a
random subproduct of a random ordering of a random subset of size k of S\{g}.
(The number k = ⌈k′/2⌉ is also a random variable.) Because the distribution of v
is the same as the distribution of h in this definition, we have Prob(P(v, a/4)) =
p.
Now we prove that the conditional probability Prob(P(ug^ε, a/2) | ¬P(v, a/4))
is at least 1/2. This is true because ε is independent of u and v.
Therefore, by (4.11),

    Prob(P(ug^ε, a/2) | ¬P(v, a/4))
      = Prob(P(ug^ε, a/2) | P(u, a/2) ∧ ¬P(v, a/4)) Prob(P(u, a/2))
      + Prob(P(ug^ε, a/2) | ¬P(u, a/2) ∧ ¬P(v, a/4)) Prob(¬P(u, a/2))
      ≥ Prob(ε = 0 | P(u, a/2) ∧ ¬P(v, a/4)) Prob(P(u, a/2))
      + Prob(ε = 1 | ¬P(u, a/2) ∧ ¬P(v, a/4)) Prob(¬P(u, a/2)) = 1/2.

Finally, using (4.11) again,

    Prob(P(w_1, a/4)) ≥ Prob(P(ug^ε, a/2) | ¬P(v, a/4)) Prob(¬P(v, a/4)) ≥ (1 − p)/2

and

    Prob(P(w_2, a/4)) ≥ Prob(P(w_2, a/4) | g ∉ S′) Prob(g ∉ S′) ≥ p/2.

Since max{(1 − p)/2, p/2} ≥ 1/4, P(wi , a/4) holds with probability at least
1/4 for i = 1 or 2. 

In the application to solve (4.6), we have D = H R. Using the notation intro-
duced after Lemma 4.5.3, (4.10) gives that if H ≠ G_β then P(g, |D|/4t) holds
for some g ∈ S. Hence Lemma 4.5.4 implies that, for a randomly chosen r ∈ R
and for the random subproducts w_1, w_2 as just described,

    Prob(r w_1 (\overline{r w_1})^{−1} ∉ H or r w_2 (\overline{r w_2})^{−1} ∉ H) ≥ 1/64t.        (4.12)
During the SGS construction, we call the subroutine solving (4.6) O(log2 |G|)
times, so each call can fail with probability at most c′/log^2 |G|, for some con-
stant c′. In these applications, we have t ≤ 6 log |G|, so (4.12) and log |G| <
n log n imply that testing c t log n ∈ O(log n log |G|) pairs r w_1 (\overline{r w_1})^{−1},
r w_2 (\overline{r w_2})^{−1} achieves the desired error probability. In fact, as discussed in the
proof of Lemma 4.5.1, at the error estimate of the Schreier tree construction
part of the SGS construction algorithm, choosing a large enough constant c
ensures that the overall probability of failure is less than 1/n^d.
Constructing r w_i (\overline{r w_i})^{−1} and sifting it in H requires O(n log |G|) time.
Hence the overall time requirement of calls to solve (4.6) is O(n log n log^4 |G|).
Combining this result with Lemma 4.5.1, we obtain the following:

Theorem 4.5.5. Suppose that G = ⟨T⟩ ≤ Sym(Ω) and |Ω| = n. There is a
Monte Carlo algorithm that, with error probability less than 1/n d for a constant
d prescribed by the user, constructs an SGS for G in O(n log n log4 |G| +
|T |n log |G|) time. The memory requirement of the algorithm is O(n log |G| +
|T |n).

The term |T |n log |G| in the running time stems from the handling of level 0.
Recall that we stored the original generators in a set S0 . When the data structure
is up to date below level 0, we have to sift the elements of S_0 in ⟨S_1⟩ to ensure
that G = ⟨S_1⟩.
By Remark 4.5.2, the Schreier tree computations run in O(n log n log3 |G|)
time. In Section 5.2 we shall give a version of the algorithm that, with high
probability, runs in O(n log n log3 |G|) time for small-base group inputs.

4.5.1. Implementation
A version of the nearly linear-time Monte Carlo SGS construction was first im-
plemented in the C language (see [Seress and Weisz, 1993]). The main purpose
was to experiment with the different Schreier tree constructions and to get a
feeling for the true probabilities in (4.12), which turn out to be much higher
(usually, at least 1/3) than the estimate 1/64t obtained in (4.12).
It turns out that almost always the Schreier tree construction described in
Remark 4.4.3 already produces at most log |α G | generators, such that these
generators and their inverses define a Schreier tree of depth at most log |α G | for
G mod G α , and applying the algorithm in Remark 4.4.7 does not improve the
depth significantly. Hence, in the GAP implementation, we use the algorithm
in Remark 4.4.3, and no additional work for Schreier tree computations is
necessary.
As indicated already, the practical performance of the algorithm is far better
than our probability estimate in (4.12). One reason may be that the Schreier tree
construction of Lemma 4.4.2 “randomizes” the coset representatives in the sense
that by writing out a coset representative as a product of original generators, the
length of the word may be exponential compared to the distance from the root of
the Schreier tree. Also, the usage of the random subproducts w1 and w2 provides
further random mixing of the generators. So, although we could guarantee only
with probability Ω(1/ log |G|) that a Schreier generator obtained from a random
coset and the w_i's witnesses that H ≠ G_β, the practical performance is probably
closer to the case when the Schreier generator is constructed using a random
element as a generator. Such a Schreier generator has a chance of at least
1 − 1/|G_β : H| ≥ 1/2 to witness that H ≠ G_β.
Based on that experience, the following very fast heuristic version is imple-
mented in GAP. While working on level i of the data structure, we test only
one Schreier generator obtained from a random coset and (the long) random
subproduct w_1. If this Schreier generator sifts through ⟨S_{i+1}⟩ then we declare
that ⟨S_i⟩_{β_i} = ⟨S_{i+1}⟩ and that the data structure is up to date below level i − 1.
Although such a declaration may be wrong with a significant probability, indi-
vidual errors seem to be corrected as the Schreier–Sims procedure moves along
the base points. We mention that the implementation uses a further heuristic, re-
placing the sifting of Schreier generators; we shall describe this further heuristic
at the end of Section 5.2.
To ensure the correctness of the output, the user has two options in GAP. The
default value is a deterministic test, which decides with certainty whether the
SGS construction is correct. The user also has the option to choose a randomized
test, which declares that, with probability at least 1 − δ, the SGS construction
is correct; the error bound δ is specified by the user.
In this section we describe only the randomized SGS construction test, which
is based on the following lemma. The deterministic strong generating test used
in GAP is the so-called Verify routine by Sims, which we shall discuss in
Section 8.2. In Chapter 8, we shall present other strong generating tests as well.

Lemma 4.5.6. Let S be an alleged strong generating set of G = ⟨S⟩ ≤ Sym(Ω)
relative to the base (β_1, . . . , β_m). For 1 ≤ i ≤ m, let S_i = S ∩ Sym(Ω)_{(β_1,...,β_{i−1})}
and let R_i be the transversal built from S_i for G_{(β_1,...,β_{i−1})} mod G_{(β_1,...,β_i)}. Finally,
let D = R_m R_{m−1} · · · R_1, let g be a uniformly distributed random element of
D, t be the sum of the depths of Schreier trees coding the Ri , and w1 , w2 be a
pair of random subproducts built from S, as described in the algorithm solving
(4.6). If S is not a strong generating set then Prob(gw_j ∉ D) ≥ 1/64t for at
least one j ∈ {1, 2}.

Proof. By Lemma 4.2.3, if S is not a strong generating set then ⟨S_i⟩_{β_i} ⊋ ⟨S_{i+1}⟩
for at least one i ∈ [1, m]. This implies that |D| ≤ |G|/2 and Lemma 4.5.3 can
be applied with q = 1/4t. We obtain that Prob(dh ∉ D) ≥ 1/4t for some h ∈ S
and so, by Lemma 4.5.4, Prob(gw_j ∉ D) ≥ 1/64t. □

It is clear that if 64t ln(1/δ) pairs gw1 , gw2 sift through D then the probability
that S is indeed an SGS is at least 1 − δ.
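
The verification loop implied by this bound can be organized as in the following sketch (Python, ours). The three callbacks stand for the Schreier tree machinery described earlier (a uniform random element of D, the pair of random subproducts, and the sifting test); they and the packaging are assumptions of the sketch, only the trial count 64t ln(1/δ) comes from the text:

    import math

    def randomized_sgs_test(random_element_of_D, random_subproduct_pair,
                            sifts_through, t, delta):
        # Randomized strong generating test based on Lemma 4.5.6.
        # random_element_of_D() returns a uniform element of D = R_m ... R_1,
        # random_subproduct_pair() returns the pair w1, w2, and
        # sifts_through(g, w) checks whether g*w sifts through D.
        # t is the sum of the depths of the Schreier trees coding the R_i.
        trials = math.ceil(64 * t * math.log(1 / delta))
        for _ in range(trials):
            g = random_element_of_D()
            w1, w2 = random_subproduct_pair()
            if not (sifts_through(g, w1) and sifts_through(g, w2)):
                return False        # a witness: S is certainly not an SGS
        return True                 # S is an SGS with probability at least 1 - delta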
The advantages of separating the construction of an SGS and the checking of
the correctness of the construction are twofold. On one hand, in the oft occurring
case when |G| is known in advance, it is very easy to check correctness by
comparing |D| and |G| and, in the case |D| = |G|, we can terminate already
after the fast heuristic construction. On the other hand, even if |G| is not known
in advance, we have to apply the expensive sifting of Schreier generators about
a factor log2 |G| less times than in the original algorithm, which checks whether
Si βi = Si+1  each time a level i is processed.
If the input SGS is not correct then the probability that gw_j ∉ D seems to
be much higher in practice than the estimate 1/64t in Lemma 4.5.6, and errors
are detected very quickly. In the case when gw_j ∉ D is detected, the siftee of
gw j can be added to S and we can return to the construction phase of the
algorithm. Hence, the vast majority of time in an SGS construction is spent
checking the already correct final result. However, since the algorithm is Monte
Carlo, we cannot terminate and guarantee the correctness of the answer with
the prescribed probability earlier than the number of checks implied by the
estimate 1/64t in Lemma 4.5.6.

Exercises
4.1. Given G = S, construct less than n 2 generators for G in O(|S|n 2 ) time.
Hint: Sift the elements of S.
4.2. Give an example of a permutation group G ≤ Sn with two nonredundant
bases: One of size ⌊log |G|⌋ and one of size ⌈log |G|/ log n⌉. Hint: G can
be chosen cyclic.
4.3. Prove Lemma 4.2.1 without the assumption that 1 ∈ R.
4.4. Design a version of the Schreier–Sims algorithm that handles all base
points uniformly (i.e., do not place all input generators immediately into
the generating set for G [1] ) and that achieves the time savings indicated
following Theorem 4.2.4.
4.5. (Luks) Let H = {h_1, . . . , h_{n−2}} be a regular group acting on {1, . . . , n −
     2}, and let G = ⟨h_1, . . . , h_{n−2}, (n − 1, n)⟩. Show that B = (1, n − 1) is
     a nonredundant base for G but the probability that a randomly chosen
     Schreier generator detects that G_1 ≠ 1 is only 1/(n − 1).
4.6. Determine the time requirement of the algorithm presented in the proof of
Lemma 4.4.8.
4.7. [Sims, 1971a] Design and analyze an algorithm that computes an SGS of
size at most log |G| as a subset of a given SGS S. Hint: Let S be an SGS
relative to the base (β1 , . . . , βm ), let G = G [1] ≥ · · · ≥ G [m+1] = 1 be
the corresponding point stabilizer chain, and let Si = S ∩ G [i] . List the
elements of S such that Si+1 comes before Si \Si+1 , for all i ∈ [1, m]. Do
an orbit computation to decide whether the jth element s j of the list is in
the subgroup generated by the first j − 1 elements; if it is then discard s j .
4.8. Prove the implication (4.11) in Section 4.5. Hint: Use that
(Dg\D)h ⊆ (Dgh\D) ∪ (D\Dh)

and

D\Dg ⊆ (D\Dhg) ∪ (Dh\D)g.


5
Further Low-Level Algorithms

In this chapter, we collect some basic algorithms that serve as building blocks
to higher level problems. Frequently, the efficient implementation of these low-
level algorithms is the key to the practicality of the more glamorous, high-level
problems that use them as subroutines.

5.1. Consequences of the Schreier–Sims Method


The major applications of the Schreier–Sims SGS construction are membership
testing in groups and finding the order of groups, but an additional benefit of
the algorithm is that its methodology can be applied to solve a host of other
problems in basic permutation group manipulation. We list the most important
applications in this section. The running times depend on which version of the
Schreier–Sims algorithm we use; in particular, all tasks listed in this section
can be performed by nearly linear-time Monte Carlo algorithms. For use in
Chapter 6, we also point out that if we already have a nonredundant base
and SGS for the input group then the algorithms presented in Sections 5.1.1–
5.1.3 are all Las Vegas. Concerning the closure algorithms in Section 5.1.4, see
Exercise 5.1.

5.1.1. Pointwise Stabilizers


Any version of the Schreier–Sims method presented in Chapter 4 can be easily
modified to yield the pointwise stabilizer of some subset of the permutation
domain. Suppose that G = ⟨S⟩ ≤ Sym(Ω) and Δ ⊆ Ω are given, and we need
generators for G_{(Δ)}. To this end, we fix an ordering of Ω in which the elements
of Δ come first and construct a base and strong generating set such that the base
elements are ordered according to that chosen ordering. Then there is an initial
segment of the base, say the first k elements, that contains the base points in Δ
and so the (k + 1)st group in the stabilizer chain is G_{(Δ)}.
The only modification in the Schreier–Sims algorithm is that we temporarily
lift the restriction that the constructed base must be nonredundant and we declare
every element of Δ to be a base point (in the chosen ordering). After the strong
generating construction is finished, we can simply delete the redundant base
points. Note that on levels with |G [i] : G [i+1] | > 1, the created strong generating
set and Schreier trees do not have to be changed. Also, the time complexity of
the algorithm remains the same as it was originally, since sifting through a level
with fundamental orbit of size 1 can be done in O(1) time.

5.1.2. Homomorphisms
The same idea as in pointwise stabilizer computations can be used to compute
kernels of actions. Given G = ⟨S⟩ ≤ Sym(Ω) and a homomorphism ϕ : G →
Sym(Γ) (defined by the images of the elements of S), we can consider G as a
permutation group acting on Ω ∪ Γ. Namely, for g ∈ S, denote by (g, ϕ(g))
the element of Sym(Ω) × Sym(Γ) whose restrictions to Ω and Γ are g and
ϕ(g), respectively. Then Ḡ := {(g, ϕ(g)) | g ∈ G} is isomorphic to G, and
ker(ϕ) is the restriction of the pointwise stabilizer Ḡ_{(Γ)} to Ω. Similarly, we can
determine whether a map ϕ : S → Sym(Γ) defines a homomorphism. The map
ϕ defines a homomorphism if and only if the pointwise stabilizer of Ω in the
group ⟨(g, ϕ(g)) | g ∈ S⟩ is the identity.
Specifying the images of the generators at a homomorphism ϕ : G = ⟨S⟩ →
Sym(Γ) is not sufficient for the effective handling of ϕ. We also need to be able
to compute ϕ(g) for any g ∈ G and a representative from the coset ϕ^{−1}(g) for
any g ∈ ϕ(G). To this end, we store two strong generating sets S_1, S_2 with the
corresponding Schreier tree data structures for the group Ḡ = ⟨(g, ϕ(g)) | g ∈
S⟩ ≤ Sym(Ω ∪ Γ). The first SGS S_1 is relative to a base B_1 = (β_1, . . . , β_m) such
that an initial segment (β_1, . . . , β_k) of B_1 consists of points of Ω and Ḡ_{(β_1,...,β_k)}
fixes Ω pointwise; S_2 is relative to a base B_2 such that an initial segment of
B_2 consists of points in Γ and the pointwise stabilizer of this initial segment is
Ḡ_{(Γ)}. Given g ∈ G, ϕ(g) can be computed by sifting (g, 1) in the Schreier tree
data structure corresponding to S_1 and then restricting the inverse of the siftee of
(g, 1) to Γ. Similarly, given g ∈ ϕ(G), a representative of the preimage ϕ^{−1}(g)
can be obtained by sifting (1, g) in the Schreier tree data structure corresponding
to S_2 and restricting the inverse of the siftee to Ω.
In implementations, permutation groups of degree n are represented acting on
the set [1, n]. In particular, ϕ : G → Sym(Γ) is specified by pairs of permutations
(g, ϕ(g)), where g acts on [1, n] for n := |Ω| and ϕ(g) acts on [1, k] for k := |Γ|.
Therefore, when computing in Sym(Ω) × Sym(Γ), we have to fix a permuta-
tion c acting on [1, n + k] such that [1, k]^c = [n + 1, n + k] and use c and
c^{−1} to conjugate the elements of Sym(Γ) to Sym([n + 1, n + k]) and vice
versa.
In practice, we often know an SGS S for G ≤ Sym(Ω) before the homomor-
phism ϕ is defined, and ϕ is specified by the images of the elements of the SGS.
In this case, we have at hand the SGS S_1 defined in this section without any
additional work. Note also that restricting the elements of S_2 to Γ, we obtain
an SGS for the image group ϕ(G).
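
As an illustration of the diagonal embedding just described, the following sketch (Python, 0-based lists; ours, not GAP's internal representation) glues g and ϕ(g) into a single permutation of [0, n + k):

    def combine(g, phi_g):
        # The element (g, phi(g)) of Sym(Omega) x Sym(Gamma) realized as one
        # permutation of [0, n + k): the first n points follow g, the remaining
        # k points follow phi(g) shifted by n.
        n = len(g)
        return list(g) + [image + n for image in phi_g]

    g = [1, 2, 0]          # the 3-cycle (0 1 2) on Omega = {0, 1, 2}
    phi_g = [1, 0]         # its image, the transposition (0 1) on Gamma = {0, 1}
    print(combine(g, phi_g))   # [1, 2, 0, 4, 3]

The kernel of ϕ is then obtained, as described above, as the pointwise stabilizer of the shifted copy of Γ.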

5.1.3. Transitive Constituent and Block Homomorphisms


Two oft occurring special cases of group homomorphisms, which can be handled
faster than the general case of homomorphisms specified by the images of gen-
erators, are transitive constituent homomorphisms and block homomorphisms.
A transitive constituent homomorphism is the restriction of G ≤ Sym(Ω) to an
orbit Δ ⊆ Ω. In this case, we do not have to extend the G-action to a domain
of size |Ω| + |Δ|; we only have to compute a strong generating set according
to an ordering of Ω starting with Δ. The kernel of the homomorphism is the
pointwise stabilizer of Δ, and preimages can be computed as in the general
case. Moreover, the image of a group element g ∈ G can be computed without
sifting, just restricting g to Δ. Therefore, we do not have to compute a second
SGS for G as in the case of general homomorphisms.
In implementations, similarly to the general case of homomorphisms, we
need to construct a permutation c ∈ Sym([1, |Ω|]) that maps Δ to [1, |Δ|]. Then
c and c^{−1} can be used to define the image group of the transitive constituent
homomorphism as a permutation group acting on [1, |Δ|] and to convert the
elements of this image group to permutations acting on Δ.
The other special case of homomorphisms is block homomorphism. Sup-
pose that a not necessarily transitive group G ≤ Sym(Ω) is given, and let Σ :=
{Δ_1, . . . , Δ_k} be a partition of Ω with the property that G permutes the pairwise
disjoint subsets Δ_1, . . . , Δ_k of Ω. The elements of Σ are called blocks for G.
A block homomorphism ϕ : G → S_k is defined by the permutation action of G
on Σ.
An efficient method for computing with block homomorphisms is described
in [Butler, 1985]. The goal is to avoid working with permutations on Ω ∪ Σ and
use only the permutation representation of G on Ω. The algorithm assumes that
given a base and SGS for some H ≤ Sym(Ω), we can quickly compute another
base B̄ and SGS for H, where B̄ starts with some specified ω ∈ Ω. This can be
achieved by a base change algorithm (cf. Section 5.4). This application of base
change is much faster than the computation of an SGS from scratch.

The Block Homomorphism Algorithm


Computing the image of some g ∈ G at the block homomorphism ϕ is easy:
We store an |Ω|-long list L, where the ith entry of L records which block the
ith element of Ω belongs to. Since g permutes the blocks, it is enough to pick a
representative δ_j ∈ Δ_j from each block and look up in L the block containing
δ_j^g. This defines the image Δ_j^{ϕ(g)} for all j ∈ [1, k].

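A sketch of this image computation (Python, 0-based; the data layout, with the list block_of playing the role of L and a representative list reps, is ours):

    def block_image(g, block_of, reps):
        # Image of g under the block homomorphism: block_of is the list L of the
        # text (block index of every point), reps[j] is a representative of block j.
        # Block j is sent to the block containing the image of its representative.
        return [block_of[g[r]] for r in reps]

    g = [1, 2, 3, 4, 5, 0]            # the 6-cycle (0 1 2 3 4 5)
    block_of = [0, 1, 2, 0, 1, 2]     # blocks {0,3}, {1,4}, {2,5}
    reps = [0, 1, 2]
    print(block_image(g, block_of, reps))   # [1, 2, 0]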
Next, we describe the computation of ker(ϕ). Since ker(ϕ) consists of those
elements of G that fix the blocks setwise, we compute recursively generators for
the subgroups G^{[j]} := G_{(Δ_1,...,Δ_{j−1})}, j = 1, 2, . . . , k + 1 (i.e., G^{[j]} stabilizes the
first j − 1 blocks but can permute elements of a block among themselves). Then
we can set ker(ϕ) := G^{[k+1]}. Suppose that an SGS for G^{[j]} is already known,
and let us fix some δ_j ∈ Δ_j. We apply a base change to get a base B_j for G^{[j]}
starting with δ_j and the corresponding SGS. In particular, we have generators
X_{j+1} for the subgroup (G^{[j]})_{δ_j} and a transversal R_j (coded in a Schreier tree)
for G^{[j]} mod (G^{[j]})_{δ_j}. Our goal is to add some elements r_{j1}, r_{j2}, . . . of R_j to
X_{j+1} so that X_{j+1} := X_{j+1} ∪ {r_{j1}, r_{j2}, . . .} generates G^{[j+1]}.
For any g ∈ G^{[j]}, g ∈ G^{[j+1]} if and only if δ_j^g ∈ Δ_j. Based on this observa-
tion, we choose r_{j1}, r_{j2}, . . . ∈ R_j recursively so that δ_j^{r_{ji}} ∈ Δ_j and the orbits
δ_j^{⟨X_{j+1} ∪ {r_{j1},...,r_{ji}}⟩} increase as i increases. At the addition of each r_{ji}, the size of the or-
bit of δ_j at least doubles, since the groups ⟨X_{j+1} ∪ {r_{j1}, . . . , r_{ji}}⟩ define a strictly
increasing sequence of groups with the property that the stabilizer of δ_j does
not increase. Hence, with the addition of at most log |Δ_j| elements from R_j, we
can achieve that the set X_{j+1} = X_{j+1} ∪ {r_{j1}, r_{j2}, . . .} satisfies δ_j^{⟨X_{j+1}⟩} = δ_j^{G^{[j]}} ∩ Δ_j.
This means that ⟨X_{j+1}⟩ = G^{[j+1]}. Note that X_{j+1} is an SGS for G^{[j+1]}, relative
to the base B_j.
The kernel computation just described has an additional advantage: In the
process, we get strong generating sets for the subgroups in the chain G = G [1] ≥
G [2] ≥ · · · ≥ G [k+1] = ker(ϕ), which is the point stabilizer chain of the image
group ϕ(G) on the permutation domain Σ. More precisely, for each j ∈ [1, k],
we get two SGS for G [ j] : The first SGS comes from the recursion, and it is
discarded during the base change, when the second SGS relative to the base
B j is constructed. After the SGS X j+1 for G [ j+1] is constructed, we discard
the second SGS for G [ j] as well, but we store the Schreier tree (S j , T j ), which
codes the transversal R j for G [ j] mod (G [ j] )δ j . Hence, the output of the kernel
computation is an SGS X k+1 for ker(ϕ) and a sequence of Schreier trees (S j , T j )
for 1 ≤ j ≤ k. The set

    ∪_{j=1}^{k} ϕ(S_j)

is an SGS for ϕ(G). Note that we do not compute and store ϕ(S_j) explicitly;
we can work with the elements of S_j, which are permutations of Ω.
The last basic task we have to perform is to find a preimage for some given
g ∈ ϕ(G). This can be accomplished by a slight modification of the usual sifting
procedure, using the transversals R_j. The only concern is that g ∈ Sym(Σ) ≅
S_k and R_j ⊆ Sym(Ω), so we cannot take the product of g with inverses of
transversal elements, as required in sifting. Instead, we construct recursively
r_j ∈ R_j for j = 1, 2, . . . so that gϕ(r_1)^{−1} · · · ϕ(r_{j−1})^{−1} ∈ ϕ(G^{[j]}). Suppose
that r_1, . . . , r_{j−1} and the product p_j := r_1^{−1} · · · r_{j−1}^{−1} of their inverses are already
constructed. Let l := j^g, and suppose that δ_l^{p_j} is in the block Δ_{l′}. Then we pick
r_j ∈ R_j such that δ_j^{r_j} ∈ Δ_{l′}. Following this procedure, p_{k+1} := r_1^{−1} · · · r_k^{−1} is
the inverse of a preimage of g. Note that, as customary with sifting, the same
procedure can be used to test membership in ϕ(G). Given g ∈ S_k, either the
procedure constructs a preimage of g, or we notice for some j that no element
of R_j carries δ_j to the block Δ_{l′}, as required in the sifting procedure above.

5.1.4. Closures and Normal Closures


Suppose that we have already computed a base B = (β1 , . . . , βm ) and a strong
generating set S relative to B for some group G ≤ Sym(Ω), and now we
need an SGS for the group H = ⟨S ∪ T⟩, for some T ⊆ Sym(Ω). Following
the terminology in GAP, we call H the closure of G by T . The incremental
nature of the Schreier–Sims procedure enables us to compute an SGS for H as an
augmentation of the SGS for G, instead of starting the construction from scratch.
Namely, using the terminology of Section 4.2, we add T to the generating set
of G, recompute the first fundamental orbit β1H and the transversal for H mod
Hβ1 , and declare that our data structure is up to date below level 1. When the
data structure becomes up to date below level 0, we have an SGS for H relative
to a base that contains B as a (not necessarily proper) initial segment.
We can also compute normal closures and G-closures. Suppose that H =
⟨T⟩ ≤ Sym(Ω) and G = ⟨S⟩ ≤ Sym(Ω) are given, T is an SGS for H, and we
need to compute the G-closure H^G. Since a strong generating set enables us
to test membership in a permutation group, the method of Section 2.1.2 can
be applied. According to that algorithm, when we detect that h^g ∉ H for some
h ∈ T and g ∈ S, we replace H by the closure ⟨H, h^g⟩. However, it was observed
in practice that frequent calls of the closure algorithm are quite slow. The reason
is that the algorithm spends a significant amount of time for SGS computations
in subgroups of H^G that will be replaced later by larger subgroups of H^G.
Therefore, to reduce the number of calls of the closure algorithm, it is beneficial
to collect all h^g with h ∈ T, g ∈ S, h^g ∉ H in a list L and call the closure
algorithm for ⟨H, L⟩. Of course, it is possible that ⟨H, L⟩ is not yet the G-
closure of H and the procedure has to be iterated; however, the number of
calls of the closure algorithm usually decreases significantly, and the G-closure
algorithm becomes faster.
In GAP, this observation is combined with the random subproduct method.
Given H, G ≤ Sym(Ω) as in the previous paragraph, in the list L we collect
permutations h^g, where h is a random subproduct of the generators of H and
g is a random subproduct of the generators of G. In this way, we can avoid
working with long lists of permutations in the case when |S||T | is too large. In
the GAP implementation, L is chosen to be of length 10. Also, when computing
the random subproduct h, we also use the previously constructed elements of
L as generators. With this strategy, the G-closure is almost always obtained by
computing at most two lists L of length 10, and so calling the closure algorithm
at most twice. After the alleged G-closure H̄ of H is computed, we may check
the correctness of the result deterministically, by confirming that the conjugates
of the generators of H̄ by S are in H̄. Alternatively, by Lemma 2.3.8, if k random
subproduct conjugates h^g fail to increase H̄ then H̄ = ⟨H̄^G⟩ with probability
at least 1 − (3/4)^k.
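
The strategy of the last two paragraphs can be summarized as follows (Python sketch, ours). The callbacks conj, contains, closure, and rand_sub stand for the SGS-based routines described above; the batch length 10 mirrors the GAP heuristic, and the exact interleaving of the certification test with the batching is our arrangement:

    def g_closure(H, H_gens, G_gens, conj, contains, closure, rand_sub,
                  batch=10, checks=20):
        # Randomized G-closure of H (the normal closure when <G_gens> = G and
        # H <= G).  conj(h, g) = h^g; contains(H, x) is SGS-based membership
        # testing; closure(H, xs) returns a data structure for <H, xs>;
        # rand_sub(gens) is a random subproduct of gens.
        H_gens = list(H_gens)
        while True:
            # Certification (cf. Lemma 2.3.8): if `checks` random subproduct
            # conjugates all lie in H, then H is closed under G-conjugation
            # with probability at least 1 - (3/4)**checks.
            new = [x for x in (conj(rand_sub(H_gens), rand_sub(G_gens))
                               for _ in range(checks)) if not contains(H, x)]
            if not new:
                return H
            # Otherwise enlarge H by a whole batch of conjugates at once,
            # reusing already collected conjugates when forming h.
            L = new[:batch]
            while len(L) < batch:
                L.append(conj(rand_sub(H_gens + L), rand_sub(G_gens)))
            H = closure(H, L)
            H_gens.extend(L)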
Normal closure computations enable us to get generators for commutator
subgroups and hence for the elements of the derived and lower central series.
In particular, we can test solvability and nilpotence. We note, however, that the
methods of Chapter 7 test solvability and nilpotence more quickly.

5.2. Working with Base Images


Given a base B for some G ≤ Sym(Ω), the images of base points uniquely
determine the elements of G. Namely, if B g = B h for some g, h ∈ G then gh −1
fixes B pointwise and so g = h. Therefore, a set A ⊆ G may be described by
storing only the images of B under the elements of A. If, as usually happens in
practical situations, |B| is significantly smaller than |Ω| then this method saves
memory compared to the storage of permutations.
We need, however, a method to recover the elements of G from the base
images. Given an SGS for G relative to B, this task can be readily accomplished.
Lemma 5.2.1. Let S be an SGS for G ≤ Sym(Ω) relative to B and let t denote
the sum of depths of Schreier trees coding the coset representative sets along the
point stabilizer chain of G. Then, given an injection f : B → Ω, it is possible to
find a permutation g ∈ G with B^g = f(B) or determine that no such element
of G exists in O(t|Ω|) time, by a deterministic algorithm.

Proof. Let B = (β_1, . . . , β_m) be the base of G and let G = G^{[1]} ≥ · · · ≥
G^{[m+1]} = 1 be the corresponding point stabilizer chain. The element g ∈ G
with the required property can be obtained by a modification of the standard
sifting procedure for permutations. If f(β_1) is in the orbit β_1^{G^{[1]}} then, taking the
product of edge labels along the path from f(β_1) to β_1 in the first Schreier tree,
we obtain a permutation r_1 ∈ G such that f(β_1)^{r_1} = β_1. Hence, the function
f_2 : B → Ω defined by f_2(β_i) := f(β_i)^{r_1} fixes β_1. If f_2(β_2) ∈ β_2^{G^{[2]}} then, as a
product of edge labels in the second Schreier tree, we obtain r_2 ∈ G such that
the function f_3 : B → Ω defined by f_3(β_i) := f(β_i)^{r_1 r_2} fixes the first two base
points. Continuing this process, we obtain some g = r_1 r_2 · · · r_m ∈ G such that
f(B) = B^{(g^{−1})}. The number of permutation multiplications in the procedure is
at most t. If, for some i ∈ [1, m], we have f_i(β_i) ∉ β_i^{G^{[i]}} then we conclude that
there is no g ∈ G with f(B) = B^g. □
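
A sketch of this procedure with an explicit, simplified data layout (ours, not the book's): each Schreier tree is a dictionary mapping a point of the fundamental orbit to a pair (label, parent), where the label is a strong generator or an inverse carrying the point to its parent and the base point is mapped to None; permutations are Python tuples on 0, . . . , n − 1.

    def mul(p, q):
        # Composition of permutations given as tuples: (p*q)[i] = q[p[i]].
        return tuple(q[p[i]] for i in range(len(p)))

    def element_from_base_images(base, trees, images, n):
        # Given images[i] = f(base[i]), return g in G with base[i]^g = images[i]
        # for all i, or None if no such element exists (cf. Lemma 5.2.1).
        acc = tuple(range(n))              # accumulated product r_1 r_2 ... r_i
        current = list(images)             # base images adjusted by the r's so far
        for i in range(len(base)):
            if current[i] not in trees[i]:
                return None                # f_i(base[i]) is outside the orbit
            r = tuple(range(n))            # product of labels from current[i] to base[i]
            pt = current[i]
            while trees[i][pt] is not None:
                label, parent = trees[i][pt]
                r = mul(r, label)
                pt = parent
            acc = mul(acc, r)
            current = [r[c] for c in current]
        # acc maps f(base[i]) to base[i] for every i, so the answer is acc^{-1}
        return tuple(acc.index(j) for j in range(n))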

A possible application of this method is when we have to store numerous
elements of G or generators for numerous subgroups of G. For example, this
is the case when we need to store conjugacy class representatives of G or the
subgroup lattice of G. It is enough to store the base images of the conjugacy
class representatives or the base images of the generators of the subgroups and,
when the group elements are actually needed, Lemma 5.2.1 can be invoked. We
shall describe other applications of Lemma 5.2.1 in Sections 6.1.3 and 6.1.4.
Lemma 5.2.1 can be sped up significantly in the case when we do not need the
permutation g with B g = f (B) explicitly but only need to establish the existence
of such g or it is enough to have g written as a word in the strong generators.
For simplicity, let us assume that the SGS S of G is closed for inverses (i.e.,
S = S −1 ). This assumption is automatically satisfied if, as in GAP, the Schreier
trees are constructed by the algorithm in the proof of Lemma 4.4.2.

Lemma 5.2.2. Given an injection f : B → Ω, it is possible to either find a
word w of length at most t in the strong generators whose product is the unique
g ∈ G with B g = f (B) or to determine that no such element of G exists, in
O(|B|t) time.
Proof. We modify the algorithm described in the proof of Lemma 5.2.1 by
not computing the inverses r_1, . . . , r_m of transversal elements explicitly, but
by writing them only as words in the strong generators. Namely, if f(β_1) is in
the orbit β_1^{G^{[1]}} then we take the sequence of edge labels along the path from
f(β_1) to β_1 in the first Schreier tree. In this way, we obtain a word w_1 in
the strong generators such that f(β_1)^{w_1} = β_1. Then we define the function
f_2 : B → Ω by f_2(β_i) := f(β_i)^{w_1}. Iterating this process, eventually we obtain
a word w_1 w_2 · · · w_m such that w := (w_1 w_2 · · · w_m)^{−1} satisfies f(B) = B^w.
Since S = S^{−1}, we can write w as a word in the strong generators as well. As
before, if f_i(β_i) ∉ β_i^{G^{[i]}} for some i ∈ [1, m] then we conclude that there is no
g ∈ G with f(B) = B^g. □

We shall refer to the procedure described in Lemma 5.2.2 as sifting as a word.
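
The same computation performed as a word only traces the |B| base-point images and returns the list of labels instead of a multiplied-out permutation; a sketch with the same (assumed) data layout as in the previous sketch:

    def word_from_base_images(base, trees, images):
        # Returns a list of labels whose left-to-right product maps f(base[i])
        # to base[i] for every i (cf. Lemma 5.2.2); the element g with
        # base^g = images is the inverse of that product.  Only the |B|
        # base-point images are ever updated, never a full permutation.
        word = []
        current = list(images)
        for i in range(len(base)):
            if current[i] not in trees[i]:
                return None                              # no such group element
            while trees[i][current[i]] is not None:
                label = trees[i][current[i]][0]
                word.append(label)
                current = [label[c] for c in current]    # O(|B|) work per label
        return word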


The first important application of sifting as words is the following theorem.

Theorem 5.2.3. Given a base B for some G = ⟨S⟩ ≤ Sym(Ω), with |Ω| = n,
an SGS for G can be computed in O(n|B|2 |S| log3 |G|) time, by a deterministic
algorithm. In particular, if a nonredundant base is known then an SGS can be
obtained by a nearly linear-time deterministic algorithm.

Proof. We modify the basic Schreier–Sims algorithm, described in Section 4.2,
as follows: Coset representatives for the stabilizer chain of G are stored in
Schreier trees, computed by the algorithm in the proof of Lemma 4.4.2. This
ensures that the depth of each tree is at most 2 log |G|. The time-critical part
of the procedure is the sifting of Schreier generators. We construct and sift
Schreier generators as words. If the sifting produces a nontrivial residue then
we compute this residue as a permutation of  and add it to the strong generating
set. Nontrivial residues occur at most |B| log |G| times.
We leave for the reader to check the details that the modified procedure
satisfies the claimed time bound. 

In practice, it occurs frequently that we know a base in advance. In a number
of cases, we compute an SGS for some group G, and then the bulk of the
computation deals with various subgroups of G. In these cases, the base of G
can be used as a base for all of its subgroups. This will play an important role
in Chapter 6. Of course, a known base can be used in a combination with the
randomized methods of Chapter 4 as well.
Base image computations can be utilized at the construction of an SGS even
in the case when a base for G is not known in advance. We finish this section
with such an application from [Babai et al., 1991], which, with high probability,
constructs an SGS of a small-base group faster than the version described in
Theorem 4.5.5.
The following lemma is folklore; it seems to appear first in print in [Cameron
et al., 1984].

Lemma 5.2.4. Let G ≤ Sym(Ω), with |Ω| = n, be transitive. Suppose that the
minimal base size of G is b. Then, for all g ∈ G\{1}, we have |supp(g)| ≥ n/b.

Proof. Let B be a minimal base for G (considered as a subset of Ω, not a
sequence) and let Δ be the support of some g ≠ 1, |Δ| = d. We define two
hypergraphs on Ω with edge sets ℬ = {B^h | h ∈ G} and 𝒟 = {Δ^h | h ∈ G},
respectively. Both hypergraphs are uniform and, since G is transitive, each
element of Ω is contained in the same number of edges of ℬ and 𝒟. Let deg(ℬ)
and deg(𝒟) denote the valencies of vertices in these hypergraphs. Counting the
number of pairs (B′, β) with B′ ∈ ℬ and β ∈ B′ two ways, we obtain deg(ℬ)n =
b|ℬ|. Similarly, deg(𝒟)n = d|𝒟|. Moreover, because the elements of ℬ are
bases for G and the elements of 𝒟 are supports of group elements, B′ ∩ Δ′ ≠ ∅
for all B′ ∈ ℬ, Δ′ ∈ 𝒟. A simple double counting argument gives

    deg(ℬ) deg(𝒟) n = |{(B′, Δ′, β) | B′ ∈ ℬ, Δ′ ∈ 𝒟, β ∈ B′ ∩ Δ′}| ≥ |ℬ||𝒟|.

Substituting deg(ℬ) = b|ℬ|/n and deg(𝒟) = d|𝒟|/n yields bd ≥ n. □
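
For a very small example the lemma can be checked by brute force; the following sketch (Python, ours) does this for the dihedral group of order 12 in its natural action on 6 points:

    from itertools import combinations

    def mul(p, q):
        return tuple(q[p[i]] for i in range(len(p)))

    def generate(gens, n):
        # All elements of the group generated by gens (fine for tiny groups).
        identity = tuple(range(n))
        elems, frontier = {identity}, [identity]
        while frontier:
            nxt = []
            for x in frontier:
                for g in gens:
                    y = mul(x, g)
                    if y not in elems:
                        elems.add(y)
                        nxt.append(y)
            frontier = nxt
        return elems

    def min_base_size(elems, n):
        # Smallest b such that some b points are fixed only by the identity.
        nontrivial = [g for g in elems if g != tuple(range(n))]
        for b in range(1, n + 1):
            for pts in combinations(range(n), b):
                if all(any(g[p] != p for p in pts) for g in nontrivial):
                    return b
        return n

    n = 6
    rot = tuple((i + 1) % n for i in range(n))     # the 6-cycle
    refl = tuple((-i) % n for i in range(n))       # a reflection
    G = generate([rot, refl], n)                   # dihedral group of order 12
    b = min_base_size(G, n)
    min_support = min(sum(1 for i in range(n) if g[i] != i)
                      for g in G if g != tuple(range(n)))
    print("|G| =", len(G), " minimal base size =", b,
          " minimal support =", min_support, " n/b =", n / b)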

The speedup of the nearly linear-time SGS construction for small-base groups
may proceed as follows. We warn the reader in advance that this is a theoretical
exercise: The GAP implementation also uses base image computations, but the
implemented version is much simpler than the algorithm presented here. We
shall describe how base images are used in the implementation after the proof
of Theorem 5.2.6.
We first handle the case of transitive groups. Please note the distinction in the
statement of the next theorem between the maximum running time, which we
never exceed, and the typical running time, which happens with high probability.
Such distinction occurs frequently in Las Vegas algorithms, where we can check
the correctness of the output with certainty, but it is quite unusual in the case
of Monte Carlo algorithms.

Theorem 5.2.5. Suppose that G = ⟨T⟩ ≤ Sym(Ω), with |Ω| = n, is transitive,
and let d > 0 be a constant. There is a Monte Carlo algorithm that constructs an
SGS for G in O(n log n log4 |G| + |T |n log |G|) time. With probability greater
than 1−1/n d , the algorithm terminates in O(log5 |G| log n +n log n log3 |G|+
|T |n log |G|) time. In any case, the output is correct with probability greater
than 1 − 1/n d . The memory requirement is O(n log2 |G| + |T |n).

Proof. We assume familiarity with the nearly linear-time SGS construction in
Section 4.5, which we briefly summarize here. Recall that we work with an
initial segment B = (β1 , . . . , βk ) of a nonredundant base and an approximation
Si for a generating set of G [i] := G (β1 ,...,βi−1 ) , for 1 ≤ i ≤ k. We maintain the
property that, for all i ∈ [1, k], ⟨S_i⟩ ≥ ⟨S_{i+1}⟩, and we say that the data structure
is up to date below level j if ⟨S_i⟩_{β_i} = ⟨S_{i+1}⟩ holds for all i with j < i ≤ k
and the sum of depths of the Schreier trees for levels j + 1, . . . , k is at most
6 log |⟨S_{j+1}⟩|.
In the case when the data structure is up to date below level j, we compute a
Schreier tree for ⟨S_j⟩ mod ⟨S_j⟩_{β_j} and check whether ⟨S_j⟩_{β_j} = ⟨S_{j+1}⟩ by sifting
Schreier generators composed from random cosets for ⟨S_j⟩ mod ⟨S_j⟩_{β_j} and short
and long random subproducts from S_j. If ⟨S_j⟩_{β_j} = ⟨S_{j+1}⟩ then we compute a
shallower Schreier tree for ⟨S_j⟩ mod ⟨S_j⟩_{β_j}; if not then we add a new generator
to S_{j+1}.
As asserted in Remark 4.5.2, the total time spent in Schreier tree computations
is O(n log n log3 |G|). Hence, to achieve the running time stated in this theorem,
we have to speed up the sifting of Schreier generators.
The idea is that we form the short and long random subproducts only as words,
without multiplying out the permutations, and sift the Schreier generators as
words in the Schreier tree data structure of S j+1 . If a Schreier generator
does not sift through as a word then we multiply out the siftee and obtain
a permutation in S j β j \S j+1 . If a Schreier generator sifts through then we
have to establish that the siftee w, which is a word fixing B pointwise, is really
the identity. We would like to check this property by establishing that some
randomly chosen points in  are fixed by w.
By Lemma 5.2.4, if b is the minimal base size of G and b randomly cho-
sen points of  are fixed by w then w = 1 with probability greater than
1 − (1 − 1/b)b > 1/2. The problem is that we do not know the value of b
in advance. Therefore, we guess an upper bound M for log |G| and run the
entire algorithm under the assumption that log |G| ≤ M. We shall check each
siftee by computing the image of M randomly chosen points. If our guess for
M is correct then b ≤ M, and we detect a nontrivial siftee with probability
greater than 1/2.
We know that G is transitive on  and so |G| ≥ n. Therefore, our initial
guess for M is the smallest power of 2 greater than log n, M = 2l ≥ log n. We
abort the algorithm immediately if we obtain some evidence that our guess
for M is incorrect: Either ∏_j |β_j^{⟨S_j⟩}| > 2^M or the deterministic Schreier tree
computation of Lemma 4.4.2 produces a Schreier tree of depth greater than 2M,
or a randomized Schreier tree computation does not return a shallow Schreier
tree as required in Remark 4.4.7.
We claim that the running time of the SGS construction with a guess M as
an upper bound for log |G| is

O(n M 3 log n + M 5 log n + |T |M 2 ). (5.1)

The Schreier tree computations run in O(n M 3 log n) time, since we do not
encounter more than M base points (otherwise the algorithm aborts), each
S j increases at most M times, and by Remark 4.5.2, the invocations of the
algorithm in the proof of Lemma 4.4.2 on each fixed level run in O(n M 2 ) total
time. Hence all log |G| factors in the original timing estimate in Remark 4.5.2
can be replaced by M.
The other part of the algorithm, the sifting of Schreier generators, can be
done in O(n M 3 log n + M 5 log n) time: We have at most M base points, each
S j increases at most M times, and after each increase of S j , we sift O(M log n)
Schreier generators as words. Note that we have to sift twice as many Schreier
generators as in the version in Section 4.5, since we have to compensate for the
fact that we detect a nontrivial siftee only with probability between 1/2 and 1.
The handling of each Schreier generator takes O(M 2 ) time: This is the time
requirement of sifting as a word by Lemma 5.2.2, as well as the required time
to compute the images of M randomly chosen points in the siftee. We detect a
nontrivial siftee at most M 2 times (at most M times at each of at most M base
points) and multiplying out a nontrivial siftee takes O(Mn) time. Finally, the
term O(|T |M 2 ) in (5.1) comes from the sifting of the input generators as words.
If our guess for M is correct then the error probability analysis in Section 4.5
is valid (note again that we have compensated for the fact that not all non-
trivial siftees are detected by sifting twice as many Schreier generators as in
Section 4.5) and we obtain a correct SGS with probability greater than 1−1/n d .
However, it is possible that, although the algorithm did not abort, the output
is incorrect because our estimate for M was too low and we missed too many
nontrivial siftees. Hence, after termination, we use the strong generating test
described in Lemma 4.5.6 to check whether the result is correct. If R1 , . . . , Rk
denote the transversals constructed by the algorithm then, by Lemma 4.5.6,
sifting O(M log n) elements of T and of D := Rk · · · R1 in D detects if the
output is incorrect with probability greater than 1 − 1/n d . We sift the elements
of D and T as permutations, not only as words, and the time requirement for
each sift is O(Mn).
If something bad happens (the algorithm aborts or the final strong generating
test reveals that the output SGS is incorrect) then we double our guess M and
repeat the entire algorithm.
When our guess exceeds the actual value of log |G| for the first time, the SGS
construction succeeds with probability greater than 1 − 1/n d and the algorithm
terminates. Since this last guess satisfies log |G| ≤ M < 2 log |G|, the running
time estimate of the last run is O(log5 |G| log n + n log n log3 |G| + |T | log2
|G|). Also, because we double our guesses for M, the running time estimate of
the last run dominates the sum of running time estimates for all previous runs.
What happens if, owing to a sequence of unlucky random bits, the algorithm
does not terminate when M exceeds log |G|? When the algorithm is run with a
guess M > log |G|, the running time is

O(n log3 |G| log n + M log4 |G| log n + |T |M log |G|), (5.2)

since we do not encounter more than log |G| base points, each S j increases
at most log |G| times, the sum of depths of Schreier trees is O(log |G|), and
on each level we construct O(log |G| log n) Schreier generators (note that this
number depended on the sum of depths of trees). Hence all factors M in the
estimate (5.1) can be replaced by log |G|, except one: The number of points
whose images are traced in the siftees of Schreier generators. Note also that if
the guess M exceeds n then the factors M can be replaced by n in (5.2), because
we never trace the images of more than n points.
We continue the doubling of our guess M at most until M exceeds log(n!).
Hence the run with the last estimate for M terminates in O(n log4 |G| log n +
|T |n log |G|) time, and this time estimate dominates the sum of running time
estimates for all previous runs. 

Theorem 5.2.6. Given G = ⟨T⟩ ≤ Sym(Ω), with |Ω| = n, and a constant
d > 0, there is a Monte Carlo algorithm that computes an SGS for G
in O(log6 |G| log n + n log4 |G| log n + |T |n log |G|) time. With probability
greater than 1 − 1/n d , the algorithm terminates in O(log6 |G| log n +
n log3 |G| log n + |T |n log |G|) time. In any case, the output is correct with
probability greater than 1 − 1/n d . The memory requirement is O(n log2 |G| +
|T |n).

Proof. We start by computing the orbits of G on Ω and choosing an orbit Σ_1
of size greater than 1. We compute a base and SGS for the G-action on Σ_1, by
the algorithm described in Theorem 5.2.5. The Schreier tree computations and
multiplying out nontrivial siftees as permutations are carried out with permu-
tations of , not only with their restriction to 1 . This computation is the base
case of the following recursive procedure.
Suppose that a base B = (β_1, . . . , β_k) and generating sets S_i and transversals
R_i for 1 ≤ i ≤ k are already computed for the G-action G|_{Σ_1∪···∪Σ_{j−1}} on
Σ_1 ∪ · · · ∪ Σ_{j−1}, for some orbits Σ_1, . . . , Σ_{j−1} of G. As in the base case, the
strong generators and the labels in Schreier trees are computed as permutations
of Ω. Then we perform the following three steps:

(a) Decide whether G_{(Σ_1∪···∪Σ_{j−1})} = 1; if not, then find an orbit Σ_j where
    G_{(Σ_1∪···∪Σ_{j−1})} acts nontrivially.
(b) Compute a nonredundant base for the G-action on Σ_j.
(c) Compute a base and SGS for the G-action on Σ_1 ∪ · · · ∪ Σ_j.

The recursion terminates when Step (a) reports that G_{(Σ_1∪···∪Σ_{j−1})} = 1, or
Σ_1 ∪ · · · ∪ Σ_j = Ω.
Step (a) is simply calling the strong generating test of Lemma 4.5.6, with the
set D := Rk · · · R1 .
If an orbit Σ_j is output by Step (a) then we call the algorithm of Theorem 5.2.5
to compute a nonredundant base Γ := (γ_1, . . . , γ_l) (and an SGS, which we shall
discard) for G|_{Σ_j}. We carry out this computation using only the restrictions of
permutations to Σ_j. Since G|_{Σ_1∪···∪Σ_j} ≤ G|_{Σ_1∪···∪Σ_{j−1}} × G|_{Σ_j}, the concatenation
of B and Γ is a (possibly redundant) base for G|_{Σ_1∪···∪Σ_j}.
In Step (c), we augment the already known SGS for G|_{Σ_1∪···∪Σ_{j−1}}, modifying
the nearly linear-time algorithm of Section 4.5 by executing all sifts only as
words and checking whether a siftee is trivial by computing the images of all
points in B and Γ. Initially, we declare that the already known data structure
is up to date below level k. Note that, although the Schreier trees at the base
points β1 , . . . , βk do not change, we may have to add new generators to some
Si for some i ≤ k.
Now we address the time requirement of the algorithm. The base case,
the computation of an SGS for G|_{Σ_1}, runs in O(|Σ_1| log^4 |G| log n +
n log^3 |G| log n + |T||Σ_1| log |G|) maximum time and, with probability greater
than 1 − 1/n^{d+2}, the algorithm terminates in

    O(min{log |G|, |Σ_1|} log^4 |G| log n + n log^3 |G| log n
      + |T| min{log |G|, |Σ_1|} log |G|)

time. Recursion is called at most log |G| times, since the order of G|_{Σ_1∪···∪Σ_j} is
at least twice the order of G|_{Σ_1∪···∪Σ_{j−1}}.

One call of Step (a) takes O(nt 2 log n) time, where t is the sum of depths of
Schreier trees at the time of the call. This sum is always at most 6 log |G|, so the
total cost of calls of Step (a) during the entire recursion is O(n log3 |G| log n).
Calling Step (b) on the orbit Σ_j runs in time O(|Σ_j| log^4 |G| log n +
|T||Σ_j| log |G|) and, with probability greater than 1 − 1/n^{d+2}, this call ter-
minates in

    O(min{log |G|, |Σ_j|} log^4 |G| log n + |Σ_j| log^3 |G| log n
      + |T| min{log |G|, |Σ_j|} log |G|)

time.
In Step (c), the cost of Schreier tree computations and multiplying out non-
trivial siftees is O(n log3 |G| log n) during the entire algorithm, since the usual
estimates (at most log |G| base points, each S j increases at most log |G| times,
etc.) are valid and we do not duplicate work at the different calls of Step (c).
Sifting one Schreier generator and deciding whether the siftee is trivial
can be done in O(log2 |G|) time, since we evaluate siftees on |B| + || ≤
2 log |G| points. During one call of Step (c), Schreier generators are sifted
O(log3 |G| log n) times because there are at most log |G| base points, each S j
increases at most log |G| times, and O(log |G| log n) generators are sifted at
the processing of a level. Consequently, during all calls of Step (c), we spend
O(log6 |G| log n) time handling Schreier generators.
If an element of the generating set T sifts through as a word in G| 1 ∪···∪ j−1
with trivial siftee w (i.e., w fixes all base points in 1 ∪ · · · ∪ j−1 ), then we
store w and sift it along the base points in j at the next call of Step (c). In this
way, we can achieve that the total time requirement of sifting T during calls of
Step (c) is O(|T | log2 |G|).
Summarizing, we obtain that the maximum time requirement of the entire
algorithm is O(log6 |G| log n + n log4 |G| log n + |T |n log |G|). Note that we
ensured that each call of the base case and of Steps (a), (b), and (c) succeeds
with probability greater than 1 − 1/n d+2 , so the probability of a correct output
is greater than 1 − 3 log |G|/n d+2 > 1 − 1/n d and the algorithm terminates in
O(log6 |G| log n + n log3 |G| log n+|T |n log |G|) time with probability greater
than 1 − log |G|/n d+2 > 1 − 1/n d . 

Remark 5.2.7. As mentioned in Section 4.5.1, GAP employs a heuristic version


of the nearly linear-time Monte Carlo SGS construction. At the processing of a
level of the Schreier tree data structure, we form only one Schreier generator g
from a random coset and a random subproduct of generators on this level. Then
g is sifted as a word, and the siftee is evaluated at each point in orbits of size
at most 50, and at five randomly chosen points in orbits of size greater than 50.
Although Lemma 5.2.4 cannot be sharpened (cf. Exercise 5.3), five randomly
chosen points have a decent chance to detect that a word does not represent the
identity on an orbit. By a lemma of Cauchy (widely attributed to Burnside), the
average number of fixed points of elements in a transitive permutation group is
one (cf. Exercise 5.4).
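For concreteness, here is a minimal sketch (in Python, with hypothetical helper names; this is not the GAP code) of the sampling heuristic just described: a siftee kept as a word of permutations is evaluated on every point of small orbits and on a few random points of large orbits.

```python
# A sketch of the sampling heuristic described above; permutations are stored
# as tuples p with p[x] the image of point x, and orbits as lists of points.
import random

def image_under_word(word, point):
    """Image of a point under a product of permutations, applied left to right."""
    for p in word:
        point = p[point]
    return point

def probably_trivial(word, orbits, small=50, samples=5):
    """False means the word is certainly not the identity on some tested point;
    True means it is trivial on all tested points (possibly a false positive)."""
    for orbit in orbits:
        test_points = orbit if len(orbit) <= small else random.sample(orbit, samples)
        if any(image_under_word(word, x) != x for x in test_points):
            return False
    return True
```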
Groups with numerous orbits do not occur frequently as inputs in practical
computations. Usually, we encounter a group H with a lot of orbits as a subgroup
of some group G so that when we need an SGS for H , we already have a base
B for G. Then H_B = 1, so all sifts in H can be done as words and it is enough
to evaluate the siftees on B. GAP also contains a version of the algorithm of
Lemma 4.5.6, which sifts the elements described in Lemma 4.5.6 as words.
This version is utilized if, for example, the user asks for a randomized testing
of the correctness of an SGS construction for an input group with known base.
Monte Carlo SGS constructions are applied in GAP for groups acting on more
than 100 points. For smaller permutation domains, Sims’s original deterministic
SGS construction (cf. Section 4.2) is called.

5.3. Permutation Groups as Black-Box Groups


Representing elements of a group G ≤ Sym() by storing only the images
of a base B = (β1 , . . . , βm ) has a serious drawback: From this representation,
it is hard to compute the representation of the product of two elements of
G. In applications where products have to be computed frequently, invoking
Lemma 5.2.1 at each occasion becomes too cumbersome.

Representation as a Black-Box Group


A compromise between storing base images and full permutations is to represent
the elements of G by words over an alphabet consisting of the labels S of a
Schreier tree data structure of G. Suppose that S is closed for taking inverses.
Each g ∈ G can be written uniquely in the form g = rm rm−1 · · · r1 , where ri
is a coset representative for G [i] mod G [i+1] . Also, taking the inverses of edge
labels along the path from βi to βiri in the ith Schreier tree in the Schreier tree
data structure of G, we can decompose ri as a product of labels. Eventually,
g is written as a word in S. We call this word the standard word representing
g. The length of any standard word is at most the sum t of the depths of the
Schreier trees.
Representing the elements of G by standard words defines an isomorphism
between G and a black-box group H , where the elements of H are strings of
length at most t over the alphabet S. Let ψ : G → H denote this isomorphism.
Moreover, let µ denote the time requirement in H to compute the product of
two given elements of H or the inverse of an element of H , and let ξ be the
time requirement to compute a nearly uniformly distributed random element in
a given subgroup of H .

Lemma 5.3.1. There is an absolute constant c (for example, we can choose


c = 8) such that:

(a) The quantities ξ, µ, t for H are bounded from above by log^c |G|.
(b) For any g ∈ G, ψ(g) can be computed in O(log^c |G|) time. Conversely,
given h ∈ H, ψ^{-1}(h) can be computed in O(|Ω| log^c |G|) time.

Proof. (a) By Lemma 4.4.2, given any strong generating set of G relative to
some nonredundant base B, we can compute Schreier trees of depth at most
2 log |G| for a Schreier tree data structure of G in nearly linear time by a deter-
ministic algorithm. Hence, we can suppose that t ≤ 2|B| log |G| ≤ 2 log^2 |G|.
(In fact, if the SGS of G was computed by the nearly linear-time Monte Carlo
algorithm described in Section 4.5 then t ∈ O(log |G|).)
Given h 1 , h 2 ∈ H , we compute their products by concatenating them, then
follow the images of base points in this concatenated word to obtain a function
f : B → , and use Lemma 5.2.2 to compute the standard word represent-
ing h 1 h 2 . This requires O(|B|t) time. Similarly, the inverse of any h ∈ H can
be computed by taking the inverse of the word h formally, as a word, follow-
ing the images of base points to obtain a function f : B → , and again using
Lemma 5.2.2. Hence, products and inverses in H can be computed in O(|B|t)
time and so µ ∈ O(log^3 |G|). Since the elements of H are represented by standard
words in a unique way, comparison of group elements can be done in O(t) time.
Random elements of H can be constructed in O(t) time, by concatenating
words representing randomly chosen coset representatives along the point sta-
bilizer subgroup chain of G. In subgroups of H , nearly uniformly distributed
random elements can be constructed by the algorithm of Theorem 2.2.4 in
O(log^8 |H|) time.
(b) Given g ∈ G, ψ(g) can be computed in O(|B|t) time by Lemma 5.2.2.
Conversely, given h ∈ H, ψ −1 (h) is computed by at most t permutation multi-
plications. 
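The following minimal sketch (not the book's GAP implementation) illustrates the multiplication method from the proof of part (a) on a toy example: elements of G = Sym({0, 1, 2, 3}) are stored as standard words relative to the base (0, 1, 2), and the standard word of a product is obtained by following base images through the two factors and re-sifting, without multiplying out the factors as full permutations. The brute-force construction of the transversals is feasible only because the example group is tiny.

```python
from itertools import permutations
import random

N = 4                      # toy domain {0,...,3}; the group is Sym(N)
BASE = (0, 1, 2)           # its pointwise stabilizer in Sym(N) is trivial

def inv(p):
    q = [0] * N
    for x in range(N):
        q[p[x]] = x
    return tuple(q)

# Transversals R[i]: for each point delta in the orbit of BASE[i] under the
# stabilizer of BASE[0..i-1], some element of that stabilizer mapping BASE[i] to delta.
ALL = [tuple(p) for p in permutations(range(N))]
R = []
for i, b in enumerate(BASE):
    stab = [p for p in ALL if all(p[BASE[j]] == BASE[j] for j in range(i))]
    trans = {}
    for p in stab:
        trans.setdefault(p[b], p)
    R.append(trans)

def word_from_base_images(images):
    """Sift base images (g(b) for b in BASE) into the standard word of g,
    returned in application order (apply the first factor first)."""
    images, word = list(images), []
    for i in range(len(BASE)):
        u = R[i][images[i]]                   # coset representative at level i
        word.append(u)
        u_inv = inv(u)
        images = [u_inv[x] for x in images]   # base images of g * u^{-1}
    return list(reversed(word))

def apply_word(word, x):
    for u in word:
        x = u[x]
    return x

def multiply_words(w1, w2):
    """Standard word of the product: follow base images through w1, then w2."""
    return word_from_base_images([apply_word(w2, apply_word(w1, b)) for b in BASE])

# Check against ordinary permutation multiplication on a random pair.
g, h = random.choice(ALL), random.choice(ALL)
wg = word_from_base_images([g[b] for b in BASE])
wh = word_from_base_images([h[b] for b in BASE])
w = multiply_words(wg, wh)
assert all(apply_word(w, x) == h[g[x]] for x in range(N))
```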

The advantage of considering G as a black-box group is that we can perform


O(n) group operations while still remaining within the nearly linear time frame.
The disadvantage is that we lose the information stored implicitly in the cycle
structure of permutations; for example, we cannot compute orders of group ele-
ments in polynomial time. Hence, we have to rely on the techniques of Chapter 2.
The idea of considering permutation groups as black-box groups was in-
troduced in [Beals and Seress, 1992]. We shall describe several applications
(cf. Sections 6.2, 8.3, and 8.4), but here we only give an illustration of the
technique. We describe a nearly linear-time solution from [Morje, 1995] for
the following problem, which arose as part of the nearly linear-time computa-
tion of Sylow subgroups in groups with no composition factors of exceptional
Lie type: Given an arbitrary permutation representation G ≤ Sym(Ω) for some
classical simple group G of Lie type, we want to construct the action of G on
one of its natural domains. For example, in the case G ≅ PSL_d(q), we need the
action of G on the (q^d − 1)/(q − 1) points or on the (q^d − 1)/(q − 1) hyperplanes
of the (d − 1)-dimensional projective space. The permutation representations
on the projective points and on the hyperplanes have degree no greater than the
original one. Also, in later steps of the Sylow algorithm, the elements of the new
permutation domain are identified with the points of the projective space, which
enables us to write matrices representing some cleverly chosen generators of
G. Then the matrix representation is used for finding Sylow subgroups of G
and for conjugating two given Sylow subgroups into each other.
In the following description, our main goal is to demonstrate the use of black-
box group techniques, and we use properties of the special linear groups without
definitions and proofs. (The basic properties of classical groups of Lie type can
be found, for example, in [Taylor, 1992].) The reader unfamiliar with these
groups can skip the rest of the section without impeding the understanding of
further material. Let q = p^e, with p prime. The idea is to find the conjugacy
class C of transvections and the conjugation action of G on C. The set C
is partitioned into (q^d − 1)/(q − 1) equal parts in two different ways; in the
first partition, each part contains the transvections that fix pointwise the same
hyperplane, and in the second partition, each part contains the transvections with
the same center. The actions of G on these parts are permutation isomorphic to
the two desired actions on the projective space.
We may assume that G acts primitively on the original permutation domain
Ω, since otherwise we can apply a transitive constituent homomorphism or
block homomorphism (cf. Section 5.1.3) to obtain a faithful action of G on a
smaller permutation domain. We shall describe in Section 5.5 how one can find
a nontrivial block of imprimitivity if G does not act primitively. We shall also
assume that |Ω| ≠ (q^d − 1)/(q − 1) and |Ω| ≠ (q^d − 1)(q^{d−1} − 1)/((q^2 − 1)(q − 1)),
which means that the given primitive action of G is not permutation isomorphic
to the action on subspaces of dimension one, two, d − 2, or d − 1. If the given
action is on the one- or (d − 1)-dimensional subspaces then there is nothing
to do; the other two special cases we just excluded are handled by another
argument which we do not describe here.
How do we find a transvection? The number of transvections is relatively
small (the restrictions in the previous paragraph ensure that |C| < |Ω|). On one
hand, this is good, since we have at least enough time to write down the action
of the generators of G as permutations of C. On the other hand, we have only
a very slim chance of finding a transvection in the most natural way, by taking
random elements of G. Hence, we apply an enhancement of this idea: We are
searching for some g ∈ G such that a suitable power of g is a transvection.
We consider G as a black-box group H, as described earlier in this section. We
choose random elements from H until we find g ∈ H with order px, where
x | q^{d−2} − 1 but x does not divide ∏_{i=1}^{d−3} (q^i − 1). It can be shown that for such
a g, the (q^{d−2} − 1)st power of g is a transvection.
The difficulty with this procedure is that we cannot compute the orders of
elements in black-box groups. However, we do not need the exact order of
g; rather, we have to check that it satisfies the given divisibility constraints.
These can be checked by taking appropriate powers of g, and comparing the
result with the identity of H . Powers of g are taken by repeated squaring, as
described on the first page of Chapter 2. Once a transvection is found, conjugates
are determined by an orbit computation, as in Theorem 2.1.1(iii). The required
ordering for this orbit computation may be chosen as the lexicographic ordering
of the standard words, with respect to some ordering of the alphabet (i.e., the
SGS of G). The action of the generators on
is obtained automatically during
the orbit computation.
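As an illustration, here is a minimal sketch (with hypothetical helper names, not the book's code) of the repeated-squaring powering used for these divisibility checks; mul and identity stand for the black-box multiplication and identity element, which are assumed to be supplied.

```python
def power(g, e, mul, identity):
    """Compute g^e with O(log e) black-box multiplications (repeated squaring)."""
    result, square = identity, g
    while e > 0:
        if e & 1:
            result = mul(result, square)
        square = mul(square, square)
        e >>= 1
    return result

def order_divides(g, e, mul, identity):
    """g^e = 1 holds exactly when the order of g divides e."""
    return power(g, e, mul, identity) == identity
```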
Note that the algorithm we just described is a Las Vegas one. It can be shown
that a random element of H satisfies our order restrictions with probability
Ω(1/(qd)); hence the running time of the algorithm is better than nearly linear:
it is o(|Ω|). In particular, working in the black-box group, we do not need to
write down or read any permutation of Ω.
The basic idea of the algorithm, namely defining a large set X such that an
appropriate power of the elements of X falls into a small conjugacy class, is very
important. Variations of this idea are used, for example, in [Beals and Babai,
1993], [Hulpke, 1993], and [Beals et al., 2002]. In this book, we shall use the
idea in Sections 9.4 and 10.2. There are also applications in nonalgorithmic
settings (see, e.g., [Guralnick and Kantor, 2000]).
It turns out that it is possible to construct the matrix representation of black-
box classical simple groups without listing any conjugacy classes of group
elements (cf. [Kantor and Seress, 2001]). Although we shall state this result
formally in Section 8.3, its proof far exceeds the scope of this book both in
length and in the required group theoretical background.

5.4. Base Change


An oft occurring problem is that we already have an SGS for a group G relative
to some base but we need another base and corresponding SGS that are more
suitable for the problem at hand. In Section 5.1.1, we saw that we can construct a
base relative to any ordering of the permutation domain; however, more efficient
methods exist that exploit the already existing SGS.
The first such base change algorithm was described in [Sims, 1971a, 1971b].
Given an SGS relative to some base (β1 , . . . , βm ), Sims’s method constructs an
SGS relative to the base

(β1 , . . . , βi−1 , βi+1 , βi , βi+2 , . . . , βm ).

Repeated applications of this procedure enable us to compute an SGS relative


to any base (α1 , α2 , . . . , αk ). Namely, if α1 does not already occur on the list
(β1 , . . . , βm ) then we insert it at the earliest possible position (i.e., after the
first βi such that G_(β1,...,βi) fixes α1) as a redundant base point; after that, with
repeated transpositions, we bring α1 to the first position. Then, working in G α1 ,
we construct an SGS relative to a base starting with (α1 , α2 ), and so on. This
incremental procedure is especially efficient when only an initial segment of a
new base is prescribed. For example, this is the case when we need generators
for the stabilizer of some point α1 ∈ .

The Exchange of Two Base Points (Deterministic Version)


Let S be an SGS for G relative to the base (β1 , . . . , βm ), let G = G [1] ≥
· · · ≥ G [m+1] = 1 be the corresponding point stabilizer subgroup chain, and
let R1 , . . . , Rm be the transversals along this subgroup chain. Our goal is to
compute an SGS and the transversals R̄ 1 , . . . , R̄ m corresponding to the point
stabilizer chain G = Ḡ [1] ≥ · · · ≥ Ḡ [m+1] = 1 defined by the new ordering
(β1 , . . . , βi−1 , βi+1 , βi , βi+2 , . . . , βm ).
For 1 ≤ j ≤ i − 1 and i + 1 < j ≤ m, R̄j = Rj. Also, since Ḡ^[i] = G^[i],
the new ith fundamental orbit β_{i+1}^{⟨S∩G^[i]⟩} and a Schreier tree coding R̄i are readily
obtained. The bulk of the work is to compute R̄_{i+1}.
We initialize the new (i + 1)st fundamental orbit as Δ̄_{i+1} := {βi}. Then we
form Schreier generators (only as words, without multiplying out permutations)
from R̄i and S ∩ G^[i]. If, for some word w representing a Schreier generator,
the image βi^w is not in Δ̄_{i+1} then we multiply out w to obtain a permutation
p ∈ G_(β1,...,β_{i−1},β_{i+1}), add p to the set S, and recompute Δ̄_{i+1} := βi^{⟨S∩G_(β1,...,β_{i−1},β_{i+1})⟩}.
Note that the correct value of |Δ̄_{i+1}| = |R̄_{i+1}| = |Ri||R_{i+1}|/|R̄i| is known in
advance, so the algorithm may terminate (and usually does) without considering
all Schreier generators. We have to add at most log |R̄_{i+1}| permutations to S,
since the already constructed part of Δ̄_{i+1} at least doubles at each addition.
The Schreier tree representing the transversal R̄ i+1 can be computed from the
augmented strong generating set S. If we use the deterministic algorithm in
the proof of Lemma 4.4.2 to code transversals in Schreier trees of depth at
most 2 log |G|, then transposing two base points can be done in nearly linear
time.

The Exchange of Two Base Points (Las Vegas Version)


The set R̄ i+1 can also be computed by a randomized nearly linear-time algo-
rithm. Since we have an SGS for G [i] , we can construct uniformly distributed
random elements g of G^[i]. Let ḡ ∈ R̄i be such that g ḡ^{-1} ∈ G^[i]_{β_{i+1}}. As we saw
in the proof of Lemma 4.3.1, g ḡ^{-1} is a uniformly distributed random ele-
ment of G^[i]_{β_{i+1}}, so it has a chance of at least 1/2 to increase the already known
part of Δ̄_{i+1} and R̄_{i+1}. Since we know the size of R̄_{i+1} in advance, this con-
struction is Las Vegas type. This randomized construction is usually faster than
the deterministic nearly linear one described earlier; an additional advantage
is that, by Theorem 4.4.6, the Schreier tree coding R̄ i+1 will be shallow with
high probability. The GAP implementation of base change uses this Las Vegas
version.
We note that Sims originally used a different method for computing R̄ i+1
(cf. Exercise 5.5). This result was generalized in [Brown et al., 1989] and
used in a deterministic O(n^2)-time algorithm to perform a cyclic base change,
that is, changing the base from (β1 , . . . , βm ) to (β1 , . . . , βi−1 , β j , βi , βi+1 , . . . ,
β j−1 , β j+1 , . . . , βm ). In [Cooperman and Finkelstein, 1992], this generalization
is combined with Lemma 4.4.8 to obtain a nearly linear-time deterministic cyclic
base change algorithm.

Base Change by Conjugation


[Sims, 1971a] also contains the following observation. Suppose that an SGS S
relative to the base (β1, . . . , βm) is known and we need an SGS S̄ relative to a
base starting with (α1, . . . , αk). If there exists g ∈ G with βi^g = αi for 1 ≤ i ≤ k
then S̄ := S^g can be chosen, which is usually faster than a sequence of base
point transpositions. Therefore, in GAP, the method of transposing base points
is combined with the construction of a conjugating permutation c. Initially, we
set c := 1 and build up c recursively, obtaining permutations that conjugate
larger and larger initial segments of the base to the desired points. Suppose
that currently, after some base point transpositions, we have an SGS relative
to the base (γ1, . . . , γl), and γi^c = αi for 1 ≤ i ≤ j − 1. If αj^{c^{-1}} is in the
jth fundamental orbit then we compute the transversal element r from the jth
transversal satisfying γj^r = αj^{c^{-1}}, and we redefine c := rc. If αj^{c^{-1}} is not in the
jth fundamental orbit then we bring αj^{c^{-1}} into the jth position of the base via
repeated transpositions and leave c unchanged. In both cases, we get a base
(γ1, . . . , γl) and c such that γi^c = αi holds for i ≤ j. After k steps, we have
an SGS S relative to some base such that S^c is an SGS relative to a base starting
with (α1, . . . , αk).
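A minimal sketch of the conjugation shortcut (an illustration, not GAP's implementation): permutations are assumed to be stored as image tuples, and perm_mul, perm_inv are hypothetical helpers for multiplication and inversion. If g carries the old base points to the desired ones, both the base and the strong generators are simply conjugated.

```python
def conjugate_sgs(base, sgs, g, perm_mul, perm_inv):
    """If S is an SGS relative to (beta_1,...,beta_m), then S^g is an SGS
    relative to (beta_1^g,...,beta_m^g)."""
    g_inv = perm_inv(g)
    new_base = [g[b] for b in base]                           # beta_i^g
    new_sgs = [perm_mul(perm_mul(g_inv, s), g) for s in sgs]  # s^g = g^{-1} s g
    return new_base, new_sgs
```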
If a very long initial segment of a new base is prescribed and it differs
significantly from the current base, it may be beneficial to construct the new
SGS from scratch, instead of applying a long sequence of transpositions of base
points. Since we already have an SGS, uniformly distributed random elements
of G are available. Thus the heuristic SGS construction described in Section 4.3,
the so-called random Schreier–Sims algorithm, is upgraded to Las Vegas type
by Lemma 4.3.1.

Lemma 5.4.1. Let G ≤ Sym(Ω), let (β1, β2, . . . , βn) be an ordering
of Ω, and suppose that uniformly distributed random elements of G can be
constructed in time ξ. Let d > 0 be also given. Then there is a Monte Carlo
algorithm that constructs an SGS of size O(d log n log |G|) for G, relative to
the (redundant) base (β1, β2, . . . , βn). The time requirement of the algorithm
is O(d log n log |G|(ξ + nd log n log |G|)) and the probability of error is less
than 1/n^d. If |G| is known in advance then this algorithm is a Las Vegas one.

Proof. Suppose that we already constructed some subsets S1, S2, . . . of gener-
ator sets for the chain of subgroups G^[1] := G, G^[2] := G_{β1}, . . . and Schreier
trees (Si, Ti) reaching subsets Δ1, Δ2, . . . of the fundamental orbits. During
the construction, we maintain the property that, for all i, if the current Si has
k elements then the depth of the Schreier tree Ti is at most k. Because of this
restriction, it is possible that Δi does not contain all points of βi^{⟨Si⟩}.
Let g be a uniformly distributed random element of G. We sift g through
the partial coset representative system we have already constructed and obtain
elements g1, g2, . . . , gj, gi ∈ G^[i], up to an index j; the sifting process stops at
a subgroup G^[j] where the siftee falls into a coset not yet represented in (Sj, Tj).
As observed in the proof of Lemma 4.3.1, gi is a uniformly distributed random
element of G^[i] for 1 ≤ i ≤ j. For 1 ≤ i ≤ j we add gi to the generator set Si
for G^[i], and we add one more level to the tree Ti, by computing γ^{gi^{-1}} for all
γ ∈ Δi. Hence the new tree reaches the subset Δi ∪ Δi^{gi^{-1}} of the fundamental
orbit. (If Δi ∪ Δi^{gi^{-1}} = Δi then we discard gi.) The procedure stops when d log n
random elements sift through (or, in the case when |G| is known, d log n random
elements sift through or ∏_{i=1}^{n} |Δi| reaches |G|). By Lemma 4.3.1, at termination
we have an SGS for G with probability at least 1 − 1/n^d.
By Theorem 4.4.6, for each i with G^[i] ≠ G^[i+1], the size of |Si|, and hence the
depth of the ith Schreier tree, will be O(d log n log |G^[i] : G^[i+1]|) with probabil-
ity at least 1 − 1/n^{d+1}. Thus the new SGS will consist of O(d log |G| log n) group
elements and the sum of depths of the Schreier trees is O(d log n log |G|). The
time requirement of the algorithm is as stated, since we sift O(d log n log |G|)
random elements at an O(nd log n log |G|) cost each. □

Although it is easy to give examples for which constructing an SGS from


scratch is faster than the construction via a sequence of base point transpositions,
in practice it is hard to recognize in advance the instances when the construction
from scratch will be faster. The default method in GAP is via a sequence of
base point transpositions combined with a conjugation, as described in this
section.

5.5. Blocks of Imprimitivity


Let G ≤ Sym(Ω) be transitive, and let n = |Ω|. Recall that Δ ⊆ Ω is a block
of imprimitivity for G if for all g ∈ G, Δ^g = Δ or Δ^g ∩ Δ = ∅. It is easy to
see that G-images of a block of imprimitivity partition Ω. A transitive group
G ≤ Sym(Ω) is called primitive if the only blocks of imprimitivity are ∅, Ω,
and the one-element subsets of Ω. A characterization of primitive groups is that
G is primitive if and only if G_ω is a maximal subgroup of G for some (and
then, for all) ω ∈ Ω. More generally, the subgroups of G containing G_ω are
in one-to-one correspondence with the blocks of imprimitivity containing ω.
Namely, if G_ω ≤ H ≤ G then ω^H is a block of imprimitivity for G; conversely,
if Δ is a block of imprimitivity for G with ω ∈ Δ then H := {h ∈ G | ω^h ∈ Δ}
is a subgroup satisfying G_ω ≤ H ≤ G.
Primitivity is a very strong structural property and, from the very beginning
of group theory, primitive groups have played a central role in the study of
permutation groups. Also, algorithms for finding special subgroups often use
the action on blocks of imprimitivity of the input group to reduce the problem
to smaller groups. Hence finding blocks of imprimitivity efficiently is crucial
for many investigations.
Finding blocks of imprimitivity is one of the few tasks that can be accom-
plished without computing a strong generating set. The reason for this may
be that computing blocks can be reduced to an orbit computation, albeit on a
set of size (n 2 ) (cf. Exercise 5.6). The first algorithm for computing blocks of
imprimitivity is given in [Atkinson, 1975], where a variant of this orbit compu-
tation is performed without the quadratic increase of the permutation domain.
Atkinson’s algorithm runs in quadratic time for any transitive input group.

5.5.1. Blocks in Nearly Linear Time


The first nearly linear-time algorithm for computing blocks is given in [Beals,
1993a]. Here we reproduce a method from [Schönert and Seress, 1994], copyright
© 1994 Association for Computing Machinery, Inc. Reprinted by permission.
This method has the same worst-case estimate as Beals's algorithm
but is easier to implement and runs faster in practice. The algorithm uses some
ideas and a subroutine of Beals.
Suppose that G ≤ Sym(Ω) is transitive, and let α ∈ Ω be fixed. For β ∈ Ω,
let β̄ denote the smallest block of imprimitivity containing {α, β}. Two elements
β, γ ∈ Ω are called equivalent, in notation β ∼ γ, if β̄ = γ̄. Clearly, ∼ is an
equivalence relation on Ω. We also introduce a binary relation ≼ on Ω: β ≼ γ
if and only if β̄ ⊆ γ̄. Then ≼ is a partial order (we allow that β ≼ γ and γ ≼ β
for distinct β, γ; this happens exactly when β ∼ γ). Each block containing α
is a union of equivalence classes. A block is minimal if it is the union of {α}
and exactly one other equivalence class. Our goal is to find a minimal block.
Let R be a transversal for G mod G_α. For β ∈ Ω, let rβ ∈ R be the unique
element of R satisfying α^{rβ} = β. Because the subgroups of G containing G_α
are in one-to-one correspondence with the blocks of imprimitivity containing
α, it is easy to see that β̄ is the orbit of α in the group ⟨G_α, rβ⟩, points in the
orbits of G_α are equivalent, and γ ∈ β̄ if and only if rγ ∈ ⟨G_α, rβ⟩.
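For instance, computing the candidate block β̄ as such an orbit amounts to a plain orbit computation; a minimal sketch follows (permutations as 0-based image tuples; gens_alpha and r_beta are assumed, hypothetical inputs).

```python
def orbit(point, gens):
    """Orbit of a point under the group generated by gens (a plain BFS/DFS)."""
    seen, stack = {point}, [point]
    while stack:
        x = stack.pop()
        for g in gens:
            y = g[x]
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

# beta_bar = orbit(alpha, gens_alpha + [r_beta])
```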

Theorem 5.5.1. Suppose that a set S of generators for some transitive G ≤
Sym(Ω) is given and |Ω| = n. Then a minimal block of imprimitivity can be
computed in O(n log^3 |G| + n|S| log |G|) time by a deterministic algorithm.

The algorithm runs much faster in the case when the orbits of G α and the
coset representatives rβ are already known. For example, this situation occurs
when a strong generating set for G was already computed.

Theorem 5.5.2. Suppose that the orbits of G α are known and R is coded in a
Schreier tree of depth l. Then a minimal block can be computed in O(nl log n)
time by a deterministic algorithm.

We use Beals's idea to compute β̄ as the orbit of α in ⟨G_α, rβ⟩ rather than
the component containing α in the graph with edge set {α, β}^G, as in [Atkinson, 1975]
(cf. Exercise 5.6). Also, as in [Beals, 1993a], the algorithm consists of two
(cf. Exercise 5.6). Also, as in [Beals, 1993a], the algorithm consists of two
phases: Construction and checking. In the case when the orbits of G α are known,
the construction phase always returns a minimal block. In the case when only
a refinement of the partition of  into orbits of G α is known, the construction
phase may return an incorrect answer. In this case, the checking phase returns
an element of G α that collapses some sets of the known partition, and we re-
turn to the construction phase. In fact, we use a small modification of Beals’s
Verify Block routine for the checking phase (cf. Lemma 5.5.5).
This is an algorithm for which the claimed running times critically depend
on the data structures used; in fact, the data structures we introduce are largely
responsible for the nearly linear running time. Therefore, contrary to our general
philosophy, we shall give a detailed description of the data structures and a
pseudocode for the algorithm.
We start with our usual high-level description. The basic idea is that, while
computing β̄ for some β ∈ Ω, we immediately abandon this computation as soon
as we encounter some γ ≼ β for which we cannot exclude that γ̄ is a proper
subset of β̄, and then we start to compute γ̄. Let H be a subgroup of G_α for
which generators are already known. In the construction phase, we maintain a
list L = (Λ1, Λ2, . . . , Λk) of disjoint subsets of Ω\{α} with the property that
elements in Λi are already known to be equivalent, and Λ1 ≽ Λ2 ≽ · · · ≽ Λk.
Also, each set Λi is a union of H-orbits. Moreover, for each Λi, we store a
representative λi ∈ Λi. The list L is initialized as L = (Λ1), where Λ1 is an
orbit of H different from {α}.
We always work in the last set Λk in L and try to compute λ̄k. To this end, we
compute the images γ^{rλk} for all γ ∈ Λk. If µ := γ^{rλk} is not in {α} ∪ Λ1 ∪ · · · ∪ Λk
then we define Λk+1 as the H-orbit of µ, abandon the computation of λ̄k, and
start to compute λ̄k+1 for a representative λk+1 ∈ Λk+1. If µ = γ^{rλk} ∈ Λi for
some i < k then we see that Λi ⊆ λ̄k. Hence we redefine Λi as the union of
the previous Λi, Λi+1, . . . , Λk, and we start the procedure in Λi. Finally, if all
images γ^{rλk} are in {α} ∪ Λk then we stop the construction phase.
If the orbit structure of H is the same as the orbit structure of G_α then
{α} ∪ Λk is a minimal block (cf. Lemma 5.5.3). In the general case, we call
Beals's Verify Block routine, which checks whether {α} ∪ Λk is a minimal block;
if not, it returns a permutation g ∈ G_α\H with the property that the
number of orbits of ⟨H, g⟩ is strictly less than the number of orbits of H. We update
L to achieve that each set Λi is the union of orbits of ⟨H, g⟩, and then we return
to the construction phase.
The pseudocode for the algorithm is as follows:
MinimalBlock[G]
Input: Generators for G ≤ Sym(Ω) and for H ≤ G_α; H may be trivial.
Output: A minimal block of imprimitivity properly containing {α}.
(Note that G is primitive if and only if the output is Ω.)

Step 1: Initialize: Compute coset representatives rβ for G mod G_α as words of
    length at most 2 log |G| over a set T ∪ T^{-1} ⊆ G, |T| ≤ log |G|; Λ1 := ω^H
    for some arbitrary ω ∈ Ω\{α}; λ1 := ω; L := (Λ1).
Step 2: Λ := last element of the list L = (Λ1, . . . , Λk); λ := representative
    of Λ.
Step 3: for all γ ∈ Λ do
    (i) Compute µ := γ^{rλ}.
    (ii) if µ ∉ {α} ∪ Λ1 ∪ · · · ∪ Λk
        then add a new set Λk+1 to L: let Λk+1 := H-orbit of µ; λk+1 := first
            element of Λk+1; goto Step 2.
        elseif µ ∈ Λi, i < k
        then Λi := Λi ∪ · · · ∪ Λk; λi := representative of largest (by car-
            dinality) of previous Λi, Λi+1, . . . , Λk; remove Λi+1, . . . , Λk;
            goto Step 2.
Step 4: Verify that {α} ∪ Λ is a block. If not, compute g ∈ G_α\H. If yes,
    output {α} ∪ Λ.
Step 5: H := ⟨H, g⟩; update L by possibly merging some of the Λi and adding
    points from Ω\({α} ∪ Λ1 ∪ · · · ∪ Λk) to achieve that Λ1 ≽ · · · ≽ Λk and
    each Λi is the union of H-orbits; goto Step 2.

We postpone details of Steps 4 and 5 until later.


It is clear from the description already given that, at any moment during the
execution of the algorithm, each Λi is the union of H-orbits. Also, using the
remarks preceding Theorem 5.5.1, it is easy to see the following facts:

Lemma 5.5.3. (a) At any moment during the execution of the algorithm, each
Λi is a subset of a ∼-equivalence class and Λ1 ≽ · · · ≽ Λk.
(b) Suppose that Step 4 of the algorithm is entered and, at that moment,
Λ is the last element of L and λ is the representative of Λ. Then α^{⟨H,rλ⟩} ⊆
{α} ∪ Λ ⊆ α^{⟨G_α,rλ⟩}; in particular, if H and G_α have the same orbits then
{α} ∪ Λ is a minimal block of imprimitivity.

Proof. (a) We prove the two statements in (a) simultaneously, by induction on
the number of executions of the loop in Steps 2 and 3. After 0 executions of
this loop (i.e., after the initialization in Step 1), both statements are obviously
true. Suppose that the statements hold for the list L = (Λ1, . . . , Λk); in partic-
ular, Λk ⊆ λ̄k = α^{⟨G_α,rλk⟩} for the representative λk of Λk. If a new set Λk+1 is
added to L then Λk+1 ⊆ α^{⟨G_α,rλk⟩}, so Λk+1 ≼ Λk. Also, since Λk+1 is an orbit
of H ≤ G_α, all elements of Λk+1 are equivalent. Similarly, if a final segment
(Λi, Λi+1, . . . , Λk) of L is merged then Λi ⊆ α^{⟨G_α,rλk⟩}, so λi ≼ λk for the rep-
resentative λi of Λi. But λi ≽ ν ≽ λk for all ν ∈ Λi ∪ · · · ∪ Λk; therefore all
such ν are equivalent.
(b) {α} ∪ Λ is the union of H-orbits and closed for the action of rλ; hence it
contains α^{⟨H,rλ⟩}. Also, since the elements of Λ are equivalent, Λ ⊆ λ̄ = α^{⟨G_α,rλ⟩}.
For the last assertion, we note that α^{⟨H,rλ⟩} depends only on the partition
of Ω into orbits of H, and not on H itself, since it is the set containing α in the
join of the two partitions defined by the orbits of H and by the orbits of ⟨rλ⟩ in
the partition lattice of Ω. □

Next we describe the data structures used in the algorithm. We also give some
further details of Steps 4 and 5 and prove the claimed bounds on the running
time.
The coset representatives rβ are stored in the usual Schreier tree data structure.
The sets 1 , . . . , k are stored as linked lists. We maintain four lists of
length k, named firstL, lastL, lengthL, and testL, and two lists of length n,
named nextL and reprL. firstL[i] and lastL[i] contain the first and last ele-
ments of i , respectively; lengthL[i] is the cardinality of i . reprL[ν], if de-
fined, contains the index i such that ν ∈ i . The list nextL contains the links:
nextL[ν], if defined, is the successor of ν in the set i , for ν ∈ L i (note that the
set i is stored as a list in the computer, so it makes sense to talk about the suc-
cessor of ν). The representative λi of i will always be the first element of i .
testL[i] gives the latest element ν of i for which ν rλi was already computed.
The orbits of H are also stored in linked lists. We define firstH, lastH, nextH,
and reprH analogously to the appropriate items for L.
The implementation of Step 3 is quite straightforward; therefore we re-
mark only on one subcase. Suppose µ = γ rλ was computed for some γ ∈ 
and, using the list reprL, we found that µ ∈ i for some i < k. The smallest
index j such that | j | ≥ | j  | for all j  ∈ [i, k] can be determined from the list
lengthL. Then we link the sets i , . . . , k such that the elements of  j come
first; in this way, we can maintain the property that the coset representative rλ
belonging to the first element λ of the new block was already applied to an initial
segment of this new block, and the last element of this initial segment is given as
testL[ j].
We use the same “biggest comes first” linking strategy in Step 5. Suppose
that Step 4 returned some g ∈ G α \H such that g collapses some orbits of H ;
our goal is to make the necessary merges of the sets i and orbits of H to
maintain the properties described in Lemma 5.5.3(a).

Step 5:
(i) Order the entries of lengthL in decreasing order:
    |Λj1| ≥ |Λj2| ≥ · · · ≥ |Λjk|.
(ii) for i = 1 to k do
    if firstL[ji] is bound (i.e., defined)
    then for all ν ∈ Λji do
        µ := ν^g;
        if µ ∉ {α} ∪ Λ1 ∪ · · · ∪ Λk
        then link the H-orbit of µ to Λji
        elseif µ ∈ Λl, l ≠ ji
        then link all Λs with firstL[s] bound and s between l and ji to Λji;
            unbind firstL[s].
(iii) Left-justify bound entries of firstL (i.e., write the entries of firstL that are
    still defined in a sequence); update lastL, reprL, testL, lengthL, nextL.
(iv) Compute orbits of (new) H; update firstH, lastH, reprH, nextH.
At the end of Step 5, the sets in L are closed for the action of g.
The following lemma plays a crucial role in estimating the running time.

Lemma 5.5.4. For each ν ∈ Ω, the representative λi of the set Λi ∈ L con-
taining ν changes at most log n times during the algorithm.

Proof. The representative may change only when Λi is merged into a larger set
in Step 3 or Step 5. At these occasions, since the representative of the new set Λ
is defined as the representative of the largest original set Λj that was merged,
|Λ| ≥ 2|Λi| for all sets Λi whose representative was changed. Clearly, such
doubling of size can occur at most log n times. □

Step 4 is handled by the following lemma. Recall that the input is a set
Λ ⊆ Ω\{α} such that all elements of Λ are known to be equivalent and a
representative λ ∈ Λ. The set {α} ∪ Λ is closed for the action of rλ.

Lemma 5.5.5. (a) In O(nl + n|S|) time, it can be decided whether {α} ∪ Λ is
a block of imprimitivity.
(b) If {α} ∪ Λ is not a block of imprimitivity then some g ∈ G_α\H can be
constructed in O(n log^2 |G| + n|S1|) time, where S1 is the known generating
set of H.
Proof. (a) Let Δ := {α} ∪ Λ. We try to partition Ω into disjoint G-images of Δ.
Suppose that the disjoint sets Δ = Δ1, Δ2, . . . , Δj−1 are already constructed;
we pick βj ∈ Ω\(Δ1 ∪ · · · ∪ Δj−1) and compute rβj. If Δ^{rβj} ⊆ Ω\(Δ1 ∪ · · · ∪
Δj−1) then we define Δj := Δ^{rβj}; otherwise, we have ∅ ≠ Δ^{rβi} ∩ Δ^{rβj} ⊊ Δ^{rβj}
for some i < j, which proves that Δ is not a block. Since the rβj are stored as
words of length at most l, this part of the algorithm runs in O(nl) time.
If the partition of Ω into G-images of Δ succeeds then we check that each
generator of G respects this partition. This can be done in O(n|S|) time.
(b) If Δ is not a block then we claim that Σ := α^{⟨H,rλ⟩} is not a block either.
By Lemma 5.5.3, Σ ⊆ Δ. If Σ = Δ then obviously Σ is not a block. If Σ ≠ Δ
then λ ∈ Σ\{α} ⊊ Δ\{α} ⊆ λ̄, so Σ is not a block.
We repeat the algorithm described in part (a) with Σ playing the role of Δ. This
means that we try to partition Ω into G-images of Σ and, if the partition succeeds,
compute the action of the generators of G on the sets in the partition. Since Σ
is not a block, we shall encounter h1, h2 ∈ G such that ∅ ≠ Σ^{h1} ∩ Σ^{h2} ⊊ Σ^{h1}.
(If h1, h2 are found during the partition phase then h1 = rβi, h2 = rβj for
some coset representatives; if they are found at the checking of the action of
generators then h1 = rβi s, h2 = rβj for some s ∈ S.) Defining g1 := h1 h2^{-1}, we
have ∅ ≠ Σ ∩ Σ^{g1} ⊊ Σ. We multiply out g1 as a permutation. As in part (a), the
time requirement so far is O(nl + n|S|).


After that, we use the algorithm in the proof of Lemma 4.4.2 to build a Schreier
tree (S*, T) of depth at most 2 log |⟨H, rλ⟩|, coding a transversal for ⟨H, rλ⟩ mod
⟨H, rλ⟩_α. This Schreier tree computation takes O(n log^2 |G| + |S1|n) time.
Let β, γ ∈ Σ be such that β^{g1} = γ, and let wβ, wγ be the corresponding coset rep-
resentatives from (S*, T). In particular, Σ^{wβ} = Σ^{wγ} = Σ. Then g := wβ g1 wγ^{-1} ∈
G_α and Σ^g ≠ Σ. Thus g ∉ H and the orbits of ⟨H, g⟩ differ from the orbits of H;
otherwise we would have α^{⟨H,g,rλ⟩} = α^{⟨H,rλ⟩} = Σ, a contradiction. □

Lemma 5.5.6.

(a) In O(n log2 |G| + n|S|) time, coset representatives rβ as words of length l
for some l ≤ 2 log |G| can be computed.
(b) The total amount of time spent in Step 3 is O(nl log n).
(c) The time spent in Step 4 is O(n log3 |G| + n|S| log |G|).
(d) The time spent in Step 5 is O(n log n log |G|).

Proof. (a) This is proven by Lemma 4.4.2. Recall that, although a Schreier tree
codes inverses of coset representatives, the set S ∗ of labels in the Schreier tree
(S ∗ , T ) constructed in Lemma 4.4.2 is closed for inverses. Hence, after (S ∗ , T )
is constructed, the coset representatives themselves can also be written as words
in S ∗ .
(b) By (a), computing an image γ rλ takes O(l) time. By Lemma 5.5.4, for
each fixed γ ∈ , γ rλ for some λ is computed at most log n times; hence the
total cost of these computations is O(nl log n). Updating reprL and the linking
of lists cost O(n log n).
(c) Step 4 is entered at most log |G α | times, since H increases between
consecutive calls.
(d) Step 5 is entered always after a call of Step 4, that is, at most log |G α |
times. At one call, ordering lengthL costs O(n log n) whereas the linking and
updating of lists runs in O(n) time. 

Lemma 5.5.6 immediately implies Theorem 5.5.1. Theorem 5.5.2 is obtained


by combining Lemma 5.5.6(b) and Lemma 5.5.3(b). We also remark that within
the time bounds claimed in Theorems 5.5.1 and 5.5.2, all minimal blocks con-
taining α can be constructed.
Our running time estimate is overly pessimistic. It is dominated by the appli-
cations of Lemma 4.4.2, which, in practice, are much faster than our worst-case
estimate. Also, if the orbit structure of G α is not known, a few (five in the GAP
implementation) random Schreier generators may be computed to approximate
G α , and this seems to be enough: The construction phase may return a correct
block of imprimitivity even in the case when the orbit structure of H is not the
same as the orbit structure of G α , and in practice it is extremely rare that the
output of the construction phase is incorrect.

5.5.2. The Smallest Block Containing a Given Subset


In a number of applications, we need to construct the block system ℬ with
the smallest possible block sizes for some transitive G = ⟨S⟩ ≤ Sym(Ω), Ω =
[1, n], satisfying the property that the points of a given subset Δ ⊆ Ω belong to the
same block. The algorithm in Section 5.5.1 can be modified to do that, but we
leave the details as an exercise (cf. Exercise 5.8). In this section we describe a
faster method from [Atkinson et al., 1984].

Computation of the Smallest Block Containing Δ


During the algorithm, we maintain a partition P of Ω. Initially, P consists of Δ
and |Ω| − |Δ| sets of size 1, the latter ones containing the elements of Ω\Δ.
Later we merge sets of P, but P always satisfies the property that if two points
of Ω belong to the same set in P then they must belong to the same block in ℬ.
If α, β ∈ Ω belong to the same set A in P then, for all s ∈ S, the points
α^s, β^s must belong to the same set. Hence if α^s and β^s belong to different sets
A1, A2 ∈ P for some α, β ∈ Ω and s ∈ S then we delete A1 and A2 from P,
and add the set A1 ∪ A2 to P. The algorithm terminates when for all s ∈ S and
A ∈ P there exists B ∈ P such that A^s = B. At termination, P consists of the
blocks of the desired block system ℬ.
The efficiency of the algorithm depends on how quickly we can decide which
set of P a point α^s belongs to, and then how quickly we can perform the
necessary merges of sets in P.
We store the sets in P in a Union-Find data structure. Each set A is stored
in a rooted tree T A . Each vertex of T A represents an element in A, and the root
represents the entire set as well. A vertex v of T A consists of three cells. One
of them contains the appropriate element of A, and the second one contains a
pointer to the parent of v. The pointer of the root points to itself. The content
of the third cell is considered only if v is the root of T A , and in this case the
third cell contains |A|. In an implementation, the entire data structure for P can
be stored in two lists of length n. For α ∈ Ω, let v(α) denote the tree vertex
representing α. In one of the lists, at position α we store the element of Ω
represented by the second cell of v(α), and in the other one at position α we
store |A| if v(α) happens to be the root of T A for some set A.
We perform two operations on the data structure. One of them is Find(α),
which finds the set A to which α ∈ Ω belongs. This amounts to following the
path (determined by the second cells of vertices) from the vertex v(α) repre-
senting α until we reach the root of T A . In addition, after each Find operation,
we also perform a collapse of the appropriate tree: We go along the path from
α to the root again and change the parent of each vertex to the (already known)
root. The cost of Find(α), with or without applying the collapse rule, is Θ(l),
where l is the length of the path from v(α) to the root in T A .
The second operation is Union(A, B), which replaces the sets A, B ∈ P by
their union. We make one of the roots of T A , TB a child of the other, and we
update its second cell as the pointer to the new root. We also update the third
cell of the new root. It is beneficial to use the weighted union rule: If |A| ≥ |B|
then the new root is the root of T A , and otherwise the new root is the root of
TB . We consider the cost of Union(A, B) as constant, regardless of whether
we apply the weighted union rule (i.e., we ignore the cost of the arithmetic
operations necessary for updating the third cell).
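A minimal sketch of such a Union-Find structure (a generic implementation, not GAP's or the book's code): the parent and size arrays play the role of the two length-n lists described above, find applies the collapse rule, and union applies the weighted union rule.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))   # pointer cell: the root points to itself
        self.size = [1] * n            # cardinality cell: meaningful at roots only

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:          # collapse rule: reattach the path
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Merge the classes with roots a and b; return the new root."""
        if a == b:
            return a
        if self.size[a] < self.size[b]:        # weighted union rule
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        return a
```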

Theorem 5.5.7. Given a transitive G = ⟨S⟩ ≤ Sym(Ω), Ω = [1, n], and
Δ ⊆ Ω, the smallest block of imprimitivity of G containing Δ can be computed
using at most 2|S|n Find operations and less than n Union operations.

Proof. As indicated in the preceding paragraphs, we initialize a partition P of
Ω to consist of Δ and |Ω| − |Δ| sets of size 1. We pick an arbitrary element δ
of Δ as root, and we store Δ in a tree T_Δ of depth 1. The other sets A = {α}
are represented by a tree T_A with one vertex. We also maintain a list L of pairs
of points from Ω. This list contains the pairs {α, β} ⊆ Ω for which we want to
check whether α^s and β^s belong to the same set of P, for s ∈ S. We initialize
L to contain all pairs {δ, β}, for β ∈ Δ\{δ}.
We process the list L as follows: If {α, β} is the next element of L to process
then for each s ∈ S we compute α^s and β^s, and we use Find(α^s) and Find(β^s)
to determine which sets A and B in P these points belong to. If A ≠ B then
we perform Union(A, B) and add the pair {ρ_A, ρ_B} to L, where ρ_A and ρ_B
are the elements of Ω represented by the roots of T_A and T_B, respectively. The
algorithm terminates when all pairs in L are processed.
We claim that at termination P consists of the sets in a block system. Since it
is clear from the definition of blocks that, at any moment during the execution of
the algorithm, points belonging to the same set A in P must belong to the same
block in any block system where Δ is a subset of a block, our claim implies
that, at termination, P consists of blocks of minimal possible size.
For α ∈ Ω, let A1 ⊊ A2 ⊊ · · · ⊊ Am be the sets in P containing α during
the algorithm. At termination, the set Am is in P. Moreover, let (ρ1, . . . , ρm) be
the sequence of elements of Ω that are represented by the roots of T_{A1}, . . . , T_{Am}.
For i ∈ [1, m − 1], if ρi ≠ ρi+1 then the pair {ρi, ρi+1} occurs in L, since Ai+1
is obtained as the union of Ai with some set in P. Moreover, if ρ1 ≠ α (i.e.,
α ∈ Δ\{δ}) then the pair {α, ρ1} is in L. Therefore, at termination α^s belongs
to the same set of P as ρm^s, for all s ∈ S. Since any generator of G maps each
element of Am to the same set as it maps ρm, applying this argument for all sets
of P at termination gives that G permutes the sets in P.
Because initially P consists of |Ω| − |Δ| + 1 sets, we obviously perform at
most |Ω| − |Δ| ≤ n − 1 Union operations. Initially |L| = |Δ| − 1, and we add
at most |Ω| − |Δ| pairs to L. Hence |L| ≤ n − 1 at termination. For each pair
in L, we perform 2|S| Find operations. □
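Building on the Union-Find sketch given earlier, a minimal version of this procedure might look as follows (a sketch under the same conventions: permutations as 0-based image tuples, the seed set playing the role of Δ; the function and variable names are mine). The trailing example computes the smallest block of the 6-cycle containing {0, 2}.

```python
def smallest_block(n, gens, seed):
    """Class of the seed in the finest G-invariant partition in which the seed
    points lie in a common class, i.e. the smallest block containing the seed."""
    uf = UnionFind(n)
    seed = list(seed)
    delta = seed[0]
    pairs = [(delta, beta) for beta in seed[1:]]     # the list L of pairs to process
    for _, beta in pairs:
        uf.union(uf.find(delta), uf.find(beta))
    while pairs:
        alpha, beta = pairs.pop()
        for s in gens:
            a, b = uf.find(s[alpha]), uf.find(s[beta])
            if a != b:
                uf.union(a, b)
                pairs.append((a, b))                 # the old roots rho_A, rho_B
    rep = uf.find(delta)
    return sorted(x for x in range(n) if uf.find(x) == rep)

# Example: the cyclic group generated by the 6-cycle (0 1 2 3 4 5).
print(smallest_block(6, [(1, 2, 3, 4, 5, 0)], {0, 2}))   # -> [0, 2, 4]
```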

Let t(m, n) denote the worst-case time requirement of at most m Find opera-
tions and at most n − 1 Union operations. If we do not use the collapse rule then
it is fairly easy to estimate t(m, n), regardless of whether the weighted union
rule is used or not (cf. Exercise 5.9). However, if both the collapse and weighted
union rules are used then the analysis becomes much harder. The asymptotic
behavior of t(m, n) was determined in [Tarjan, 1975]. To state the result, we
have to introduce the Ackermann function.
Let A : N × N → N be defined recursively as follows: For all i, x ∈ N, let
A(0, x) := 2x, A(i, 0) := 0, and A(i, 1) := 2. Moreover, for i ≥ 1 and x ≥ 2,
let

A(i, x) := A(i − 1, A(i, x − 1)).      (5.3)

Using (5.3), it is easy to see by induction that A(1, x) = 2^x and A(i, 2) = 4 for
all x ≥ 1 and i ≥ 0. Also, A(2, x + 1) = A(1, A(2, x)) = 2^{A(2,x)}, and so

A(2, x) = 2^{2^{·^{·^{·^2}}}} (a tower of x twos).

The function A(2, x) already grows very fast, but A(3, x) is mind-boggling. We
have A(3, 0) = 0, A(3, 1) = 2, A(3, 2) = 4, A(3, 3) = A(2, A(3, 2)) = 2^{2^{2^2}} =
65536, and

A(3, 4) = A(2, A(3, 3)) = 2^{2^{·^{·^{·^2}}}} (a tower of 65536 twos).

For i ≥ 4, the functions A(i, x) are growing even faster. We just compute one
more value: A(4, 3) = A(3, A(4, 2)) = A(3, 4).
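A small sketch computing the values quoted above directly from the recursion (anything beyond A(3, 3) is, of course, out of reach):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def A(i, x):
    if i == 0:
        return 2 * x
    if x == 0:
        return 0
    if x == 1:
        return 2
    return A(i - 1, A(i, x - 1))

assert [A(1, x) for x in range(1, 6)] == [2, 4, 8, 16, 32]   # A(1, x) = 2^x
assert A(2, 4) == 2 ** 2 ** 2 ** 2 == 65536                   # a tower of four twos
assert A(3, 3) == A(2, 4) == 65536
```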
For fixed m ≥ 1, we define the inverse of A(m, n) as A−1 (m, n) := min{x |
A(m, x) > log n}. If m ≥ 3 then, for all practical purposes and beyond,
A−1 (m, n) ≤ 4. The main result of [Tarjan, 1975] is the following theorem.

Theorem 5.5.8. Let t(m, n) denote the worst-case time requirement of at most
m Find operations and at most n − 1 Union operations, and suppose that
m ≥ n. If both the collapse rule and the weighted union rule are used then
t(m, n) ∈ Θ(m A^{-1}(m, n)).

Corollary 5.5.9. Given a transitive G = ⟨S⟩ ≤ Sym(Ω), with |Ω| = n, and
Δ ⊆ Ω, the smallest block of imprimitivity of G containing Δ can be computed
in O(|S|n A^{-1}(2|S|n, n)) time by a deterministic algorithm.

The method described in this section can be used to test the primitivity of
G, by computing the smallest block containing {α, β} for some fixed α ∈ Ω
and all β ∈ Ω\{α}. However, this multiplies the running time indicated in
Corollary 5.5.9 by a factor n. There is a possible shortcut. If we already know
that the smallest block containing {α, β1} is Ω and at a subsequent computation
we deduce that the smallest block containing {α, β2 } must contain β1 , then we
can terminate that computation. This observation usually helps in practice, but
the worst-case running time is quadratic in n.

5.5.3. Structure Forests


In many problems dealing with permutation groups, both in theoretical and
computational settings, we can use the orbit structure to reduce the original
problem to smaller groups. A further reduction is possible by using the imprim-
itivity structure of transitive groups. This way, the original problem is reduced
to primitive groups, and the strong structural properties of primitive groups can
be utilized. In computations, sometimes it is convenient to organize this reduc-
tion process by using a structure forest. A structure forest F for a permutation
group G ≤ Sym() is a forest of rooted trees where each vertex is a subset of
, and the leaves are the one-element subsets of . The elements of G act as
automorphisms of F, fixing the roots, such that for each nonleaf vertex v of
the forest, G v acts primitively on the children of v. Thus, in particular, there is
exactly one tree TO per orbit O in . In TO , the vertices are blocks of imprim-
itivity in the transitive G-action on O such that each level of the tree defines a
partition of O. It is not possible to insert intermediate levels in that tree, with
nontrivial branching, and remain consistent with the G-action on the tree.
As a simple example, consider the cyclic group G = ⟨(1, 2, 3, 4)(5, 6)⟩ ≤
Sym([1, 6]). The structure forest consists of two trees. In the first one, the root
is the orbit {1, 2, 3, 4}, it has two children {1, 3} and {2, 4}, and the tree has
the leaves {1}, {2}, {3}, and {4}. In the second tree, the root is the orbit {5, 6}
and it has two children {5} and {6}. The edges of the forest are defined by set
containment.
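A few lines of code (a sketch, not from the book) confirm this example: the images of {1, 3} under the group generated by (1 2 3 4)(5 6) form the block system {{1, 3}, {2, 4}} on the four-point orbit.

```python
# Points 1..6; g[i] is the image of i (index 0 is unused padding).
g = (0, 2, 3, 4, 1, 6, 5)                   # the permutation (1 2 3 4)(5 6)
group, p = [], tuple(range(7))
while True:
    group.append(p)
    p = tuple(g[p[i]] for i in range(7))    # p followed by g
    if p == tuple(range(7)):
        break
print(len(group))                                          # 4, the order of the group
print({frozenset(q[x] for x in (1, 3)) for q in group})    # {{1, 3}, {2, 4}}
```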
In general, a group G ≤ Sym() may have many nonisomorphic structure
forests. We may construct one by taking a maximal block system on each orbit
of G, then taking maximal block systems of the action on these blocks, etc.
Therefore, a structure forest for G can be computed in nearly linear time by a
deterministic algorithm.
Structure forests play an important role in the current fastest deterministic
SGS construction for arbitrary input groups (cf. [Babai et al., 1997b]) and in
the parallel (NC) handling of permutation groups (cf. [Babai et al., 1987]).

Exercises
5.1. Suppose that a nonredundant base B is given for some G ≤ Sym(Ω).
Prove that for H ≤ G and T ⊆ G, the subgroup ⟨H ∪ T⟩ and the normal
closure ⟨H^G⟩ can be computed in nearly linear time by a deterministic
algorithm.
5.2. Let S be an SGS for G relative to the base (β1 , . . . , βm ). Let Si = S ∩
G (β1 ,...,βi−1 ) and let Ri be the transversal computed from Si for G (β1 ,...,βi−1 )
mod G (β1 ,...,βi ) . Consider the elements of Ri as words in Si . For all j ∈
[1, m], r ∈ R j , and g ∈ S j , let

rg = rm rm−1 · · · r1 , ri ∈ Ri (5.4)

be the factorization of rg as a product of coset representatives.


Introduce a symbol x g for each g ∈ S. Prove that the set of words
obtained by rewriting all equations of the form (5.4) in terms of the x g
gives a presentation for G. Hint: Rewrite any word in the x g representing
the identity in the spirit of the proof of Lemma 4.2.1.
5.3. Give an example of some transitive G ≤ Sn with minimal base size b
such that there exists g ∈ G with |supp(g)| = n/b. Hint: Take a wreath
product of cyclic groups.
5.4. (Cauchy) Suppose that some group G ≤ Sym(Ω) has t orbits. For g ∈ G,
define fix(g) := Ω\supp(g). Prove that

    ∑_{g∈G} |fix(g)| = t|G|.

Hint: Count the number of pairs (α, g) with α ∈ Ω, g ∈ G, α ∈ fix(g) in
two different ways.
For the history of this lemma, see [Neumann, 1979].
5.5. [Sims, 1971a] Let (β1, . . . , βm) be a base for G with the correspond-
ing point stabilizer chain G = G^[1] ≥ · · · ≥ G^[m+1] = 1, fundamental
orbits Δ1, . . . , Δm, and transversals R1, . . . , Rm. Let Δ̄1, . . . , Δ̄m
be the fundamental orbits of G relative to the base B̄ = (β1, . . . ,
βi−1, βi+1, βi, βi+2, . . . , βm). Prove that Δ̄_{i+1} ⊆ Δi. Moreover, for δ ∈
Δi, let g ∈ Ri be such that βi^g = δ. Prove that δ ∈ Δ̄_{i+1} if and only if
β_{i+1}^{g^{-1}} ∈ Δ_{i+1}. Based on this observation, design an algorithm for comput-
ing an SGS relative to B̄.
5.6. Let G ≤ Sym() be transitive, and let α, β ∈ . Prove that the smallest
block of imprimitivity containing α and β can be obtained as the com-
ponent containing α in the graph with vertex set  and edge set {α, β}G
(i.e., the edges are the G-images of the unordered pair {α, β}) or as the
strongly connected component containing α in the directed graph with
edge set (α, β)G (i.e., the edges are the G-images of the ordered pair
(α, β)).
5.7. Modify the algorithm described in Section 5.5.1 to find all minimal blocks
containing α within the time bounds given in Theorems 5.5.1 and 5.5.2.
5.8. Let G ≤ Sym() be transitive. Modify the algorithm in Section 5.5.1 to
compute a minimal block containing a given subset ⊆  by a nearly
linear-time deterministic algorithm.

5.9. Let t(m, n) denote the worst-case time requirement of at most m Find
operations and at most n − 1 Union operations, and suppose that m ≥ n.
Prove that
(i) if neither the collapse rule nor the weighted union rule is used then
t(m, n) ∈ (mn); and
(ii) if only the weighted union rule is used then t(m, n) ∈ (m log n).
5.10. Let G ≤ Sym(). Prove that any structure forest of G has less than 2||
vertices.
5.11. Give examples of groups G ≤ Sym() with exponentially many (as a
function of ||) structure forests.
The following four exercises lead to the regularity test from [Acciaro
and Atkinson, 1992].
5.12. Let G = S ≤ Sym() be transitive, and let α ∈ . Prove that G acts
regularly on  if and only if G α = G α g for all g ∈ S.
5.13. Let G = S ≤ Sym() be transitive, and let α, ω ∈ . Construct a
list (T [β] | β ∈ ) in the following way: Initialize T [α] := ω. Build a
breadth-first-search tree L that computes the orbit α G by the algorithm
−−−→
described in Section 2.1.1. Each time an edge (β, γ ) is added to L, because
γ = β s for some already constructed vertex β of L and some s ∈ S, we
also define T [γ ] := T [β]s .
Prove that G α = G ω if and only if T [β g ] = T [β]g for all β ∈  and
for all g ∈ S.
5.14. Based on Exercises 5.12 and 5.13, design a deterministic algorithm that
tests whether a group G = S ≤ Sym(), with || = n, acts regularly
on , in O(|S|2 n) time.
5.15. Let G = S ≤ Sym(), with || = n, be transitive. Suppose that for
some α, β ∈  we have G α = G β , and let be the smallest block of
imprimitivity for G that contains α and β. Prove that G α = G δ for each
δ ∈ .
Based on this observation, modify the algorithm in Exercise 5.14 to
run in O(|S|n log n) time.
5.16. Let G = S ≤ Sym(), with || = n, be transitive. Based on Lemma
4.4.2, design a deterministic algorithm that in O((|S| + log n)n log n)
time either decides that G is not regular, or outputs at most log n generators
for G.
6
A Library of Nearly Linear-Time Algorithms

In this chapter, we develop a nearly linear-time library for constructing certain


important subgroups of a given group. All algorithms are of the Monte Carlo
type, since they are based on results of Section 4.5. However, if a base and SGS
are known for the input group, then all algorithms in this chapter are of Las
Vegas type (and in most cases there are even deterministic versions).
A current research project of great theoretical and practical interest is the up-
grading of Monte Carlo permutation group algorithms to Las Vegas type. The
claim we made in the previous paragraph implies that it is enough to upgrade the
SGS constructions; we shall present a result in this direction in Section 8.3. To
prove that all algorithms in this chapter are of Las Vegas type or deterministic,
we always suppose that the input is an SGS S for some G ≤ Sym() relative to
some base B, S satisfies S = S −1 , and transversals corresponding to the point
stabilizer chain defined by B are coded in shallow Schreier trees. Through-
out this chapter, shallow Schreier tree means a Schreier tree of depth at most
2 log |G|. We remind the reader that, by Lemma 4.4.2, given an arbitrary SGS
for G, a new SGS S satisfying S = S −1 and defining a shallow Schreier tree data
structure can be computed in nearly linear time by a deterministic algorithm.
Therefore, without further mention, we also suppose that the algorithms output
shallow Schreier tree data structures for the constructed subgroups.
In the descriptions, we do not compute the exponent of log |G| in the running
time estimates; rather, for clarity, we concentrate only on making clear the nearly
linear running time. For example, several log |G| factors can be eliminated from
the running times by applying Theorem 4.4.6 at the construction of Schreier
trees. The algorithms we present are practical enough for implementation and,
in fact, most of them are already available in the GAP system.
For most algorithms, we have deterministic and Las Vegas nearly linear-
time versions, depending on which variants of the low-level algorithms from
Chapter 5 are used as subroutines. Whenever possible, we shall describe

deterministic nearly linear-time algorithms. In implementations, usually the
faster Las Vegas versions are used.
In the entire chapter, there is only one algorithm where the worst-case running
time is not nearly linear: The computation of CSym() (G) in Section 6.1.2. We
included that algorithm here because its method is similar to the accompanying
nearly linear-time algorithms.

6.1. A Special Case of Group Intersection and Applications


The common theme in this section is the application of a subroutine for a
special case of the group intersection problem. As mentioned in Section 3.3,
no polynomial-time algorithm is known for computing the intersection of two
arbitrary permutation groups. However, if one of the groups normalizes the
other then their intersection is computable in polynomial time. The method has
been used for a long time in linear algebra to obtain a basis for the intersection
of subspaces. It was generalized for both the theoretical (cf. [Rose, 1965])
and computational (cf. [Laue et al., 1984]) study of solvable groups. In the
permutation group setting, it was introduced in [Cooperman et al., 1989] and
our discussion in Sections 6.1.1 and 6.1.2 follows this paper.

6.1.1. Intersection with a Normal Closure


Given G, H ≤ Sym(Ω), we introduce subgroups of Sym(Ω_1) × Sym(Ω_2), where Ω_1, Ω_2 are disjoint copies of the set Ω. We write elements of Sym(Ω_1) × Sym(Ω_2) as ordered pairs (a, b), where a ∈ Sym(Ω_1) and b ∈ Sym(Ω_2). The subgroups we consider are D = Diag(G × G) := {(g, g) | g ∈ G}, 1 × H := {(1, h) | h ∈ H}, and K := ⟨D, 1 × H⟩.
An arbitrary element (x_1, x_2) ∈ K can be written in the form

(x_1, x_2) = (g_1, g_1)(1, h_1)(g_2, g_2)(1, h_2) ··· (g_k, g_k)(1, h_k)(g_{k+1}, g_{k+1})
          = (g_1g_2 ··· g_{k+1}, g_1h_1g_2h_2 ··· g_k h_k g_{k+1})
          = ( g_1g_2 ··· g_{k+1}, h_1^{g_1^{-1}} h_2^{(g_1g_2)^{-1}} ··· h_k^{(g_1g_2···g_k)^{-1}} g_1g_2 ··· g_k g_{k+1} ),    (6.1)

for some g_1, ..., g_{k+1} ∈ G and h_1, ..., h_k ∈ H.

Lemma 6.1.1. K_{(Ω_1)} = {(x_1, x_2) | x_1 = 1} ≅ ⟨H^G⟩.

Proof. If (x_1, x_2) ∈ K_{(Ω_1)} then from (6.1) we obtain x_1 = g_1g_2 ··· g_{k+1} = 1 and x_2 ∈ ⟨H^G⟩ = ⟨h^g | h ∈ H, g ∈ G⟩. Conversely, if x_2 = h_1^{g̃_1} h_2^{g̃_2} ··· h_k^{g̃_k} ∈ ⟨H^G⟩ then successively define g_1, g_2, ..., g_{k+1} ∈ G such that (g_1g_2 ··· g_i)^{-1} = g̃_i for 1 ≤ i ≤ k and g_{k+1} = (g_1g_2 ··· g_k)^{-1}. Then

(1, x_2) = (g_1, g_1)(1, h_1)(g_2, g_2)(1, h_2) ··· (g_k, g_k)(1, h_k)(g_{k+1}, g_{k+1}),

so (1, x_2) ∈ K_{(Ω_1)}. □

Lemma 6.1.2. K_{(Ω_2)} = {(x_1, x_2) | x_2 = 1} ≅ ⟨H^G⟩ ∩ G.

Proof. If (x_1, x_2) ∈ K_{(Ω_2)} then from (6.1) we obtain g_1g_2 ··· g_{k+1} ∈ ⟨H^G⟩ and, obviously, g_1g_2 ··· g_{k+1} ∈ G. Therefore, x_1 ∈ ⟨H^G⟩ ∩ G and x_2 = 1. Conversely, if x_1 ∈ ⟨H^G⟩ ∩ G then, by Lemma 6.1.1, (1, x_1^{-1}) ∈ K. Also, (x_1, x_1) ∈ D ≤ K, so (1, x_1^{-1})(x_1, x_1) = (x_1, 1) ∈ K. Since (x_1, 1) ∈ K and it stabilizes Ω_2 pointwise, we have (x_1, 1) ∈ K_{(Ω_2)}. □

Corollary 6.1.3. Suppose that strong generating sets defining shallow Schreier tree data structures are given for G, H ≤ Sym(Ω). If G normalizes H then the intersection of G and H can be computed by a deterministic nearly linear-time algorithm.

Proof. We use the notation just introduced. Observe that the action of K on Ω_1 is permutation isomorphic to the action of G on Ω, so Lemma 6.1.1 implies that |K| = |G| · |⟨H^G⟩|. In particular, if G normalizes H then |K| = |G||H| and it follows from Lemma 6.1.1 that B_1 ∪ B_2 is a base for K, where B_1 is a copy of the base of G in Ω_1 and B_2 is a copy of the base of H in Ω_2. Therefore, by Lemma 5.2.3, an SGS for K is computable by a deterministic algorithm with running time estimate of the form O(n(log |G| + log |H|)^c). If we order B_1 ∪ B_2 such that the elements of B_2 precede the elements of B_1 then (G ∩ H) × 1 occurs as a member of the point stabilizer subgroup chain of K. □

We remark that Lemma 6.1.1 also implies that the computation of normal closures can be reduced to point stabilizer computations. However, implementations of the method presented in Section 5.1.4 are faster in practice.
It is noted in [Cooperman et al., 1989] that if G normalizes H and strong
generating sets for H and G are already known then an SGS for K (relative
to a base that first fixes the points of 1 ) can be constructed directly, without
invoking the Schreier–Sims algorithm. After that, generators for G ∩ H may
be obtained by a base change.
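To make the construction concrete, here is a small GAP sketch of the diagonal embedding (the function and variable names are ours, and GAP's generic stabilizer machinery stands in for the tailored base and strong generating set computations of this section). It realizes K ≤ Sym(Ω_1 ∪ Ω_2) on the points 1, ..., 2n and reads off ⟨H^G⟩ and ⟨H^G⟩ ∩ G from pointwise stabilizers, as in Lemmas 6.1.1 and 6.1.2.

    # G and H are permutation groups on the points 1..n.
    DiagonalIntersection := function(G, H, n)
      local shift, diag, K, stab1, stab2;
      # (1, h): identity on the first copy, h shifted to the points n+1..2n
      shift := h -> PermList(Concatenation([1 .. n],
                      List([1 .. n], i -> n + i^h)));
      # (g, g): g on the first copy and the shifted copy of g on the second
      diag := g -> PermList(Concatenation(List([1 .. n], i -> i^g),
                      List([1 .. n], i -> n + i^g)));
      K := Group(Concatenation(List(GeneratorsOfGroup(G), diag),
                               List(GeneratorsOfGroup(H), shift)), ());
      stab1 := Stabilizer(K, [1 .. n], OnTuples);       # acts on Omega_2 as <H^G>
      stab2 := Stabilizer(K, [n+1 .. 2*n], OnTuples);   # acts on Omega_1 as <H^G> meet G
      return [stab1, stab2];
    end;

If G normalizes H then the second group in the output already moves only the points 1, ..., n and is a copy of G ∩ H.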
6.1.2. Centralizer in the Symmetric Group
First we handle the transitive case. Let G ≤ Sym(Ω) be transitive and, for some fixed α ∈ Ω, let A be the set of fixed points of G_α:

A := fix(G_α) = {β ∈ Ω | (∀g ∈ G_α)(β^g = β)}.

Observe that G_β = G_α for all β ∈ A, since G_α ≤ G_β by the definition of A and the transitivity of G implies that |G_α| = |G_β| (because any two point stabilizer subgroups are conjugate).
We denote C_{Sym(Ω)}(G) by C. Then the following lemma holds.

Lemma 6.1.4. C is semiregular and A is an orbit of C.

Proof. Suppose that c ∈ C and β^c = β for some β ∈ Ω. Let δ ∈ Ω be arbitrary. Since G is transitive, there exists g ∈ G such that β^g = δ. Then

δ^c = β^{gc} = β^{cg} = β^g = δ.

This proves that C is semiregular. Next, suppose that c ∈ C and α^c = β. Let h ∈ G_α be arbitrary. Then

β^h = α^{ch} = α^{hc} = α^c = β,

which shows that α^C ⊆ A.


Conversely, suppose that β ∈ A; we have to construct c ∈ C such that α^c = β. Given an arbitrary γ ∈ Ω, γ^c can be determined as follows: Take g ∈ G such that α^g = γ and define

γ^c := β^g.    (6.2)

This definition is independent of which representative of the coset of G_α carrying α to γ is chosen, since if α^{g_1} = α^{g_2} then g_1g_2^{-1} ∈ G_α and so β^{g_1} = β^{g_2}.
We have to show that the function c : Ω → Ω defined by (6.2) is a permutation and that it centralizes G. For all g ∈ G, α^{gc} = β^g = α^{cg}, since any g ∈ G can be used in (6.2) when we define γ^c for γ := α^g. Next we show that δ^{gc} = δ^{cg} holds for all g ∈ G and δ ∈ Ω. Fix g, δ and take h ∈ G such that α^h = δ. Then

δ^{gc} = α^{(hg)c} = α^{c(hg)} = α^{(ch)g} = α^{(hc)g} = δ^{cg}.

Finally, we show that c is a permutation. Suppose that γ^c = δ^c for some γ, δ ∈ Ω and choose g, h ∈ G such that α^g = γ and α^h = δ. Then

β^g = α^{cg} = α^{gc} = γ^c = δ^c = α^{hc} = α^{ch} = β^h,

so gh^{-1} ∈ G_β. However, G_β = G_α and so γ = α^g = α^h = δ. □
Corollary 6.1.5. If G ≤ Sym(Ω) is transitive and abelian then G is regular and C_{Sym(Ω)}(G) = G.

Theorem 6.1.6. Suppose that G ≤ Sym(Ω) is transitive. If a nonredundant base B and an SGS S relative to B are given for G then C_{Sym(Ω)}(G) can be computed by a nearly linear-time deterministic algorithm.

Proof. Let n := || and B = (β1 , . . . , βm ). The construction of C = CSym() (G)


is straightforward from the proof of Lemma 6.1.4. We use the notation of that
lemma. We can choose α = β1 , and A is obtained as the set of points fixed by
all elements of S ∩ G α . Let R be the transversal, coded in a Schreier tree, for
G mod G α . For β ∈ A, an element c ∈ C carrying α to β can be constructed
using (6.2). Note that the elements of R may be used as the permutations
g required in (6.2). Moreover, we do not need to multiply out g explicitly,
since we are interested only in the image of one point of . It is enough to
consider g as a word and follow the image of β. Note that since S = S −1 by
our convention in this chapter, we can switch easily to the word representing
g from the word representing g −1 , which can actually be obtained from the
Schreier tree (cf. Section 4.1). Hence, because the tree coding R is shallow,
an element of C can be computed in nearly linear time by a deterministic
algorithm. Finally, observe that at most log |C| ≤ log n elements of C have to be constructed via (6.2) since we need only generators for C. If c_1, ..., c_k ∈ C are already known, we choose β_{k+1} ∈ A\α^{⟨c_1,...,c_k⟩} and construct c_{k+1} ∈ C such that α^{c_{k+1}} = β_{k+1}. □
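The single-element construction (6.2) is short enough to spell out as a GAP sketch (the function name is ours; RepresentativeAction plays the role of the transversal element g with α^g = γ, so this naive version does not exploit the shallow Schreier tree and is not meant to meet the nearly linear-time bound):

    # G is transitive on the points 1..n and beta is a fixed point of
    # Stabilizer(G, alpha); returns the element c of the centralizer
    # with alpha^c = beta, following (6.2).
    CentralizingElement := function(G, n, alpha, beta)
      return PermList(List([1 .. n],
               gamma -> beta ^ RepresentativeAction(G, alpha, gamma)));
    end;

Generators for C are then accumulated exactly as in the proof: while some point of A = fix(G_α) lies outside the orbit of α under the elements already found, construct the corresponding c and add it.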

Next, we treat the case of intransitive G ≤ Sym(Ω). We start with a simple but important lemma about groups normalized by certain elements of Sym(Ω). This lemma is useful in other situations as well and it is one of the most frequently quoted results in this book (cf. Exercise 6.1, Theorem 6.3.1, Lemmas 6.1.10, 6.1.11, 7.1.1, and Sections 7.3.2 and 7.4.1).

Lemma 6.1.7. Suppose that H ≤ Sym(Ω), y ∈ Sym(Ω), and y normalizes H. Then y permutes the orbits of H.

Proof. Let β, γ ∈ Ω be in the same H-orbit. Then there exists h ∈ H such that β^h = γ, and so γ^y = β^{hy} = β^{y(y^{-1}hy)} = β^{yh^*} for h^* = y^{-1}hy ∈ H. This means that if two points β, γ ∈ Ω are in the same H-orbit then their images β^y, γ^y are in the same H-orbit again. □
Since C = C_{Sym(Ω)}(G) normalizes G, Lemma 6.1.7 implies that C permutes the orbits of G. We say that two orbits Δ_1, Δ_2 of G are equivalent, denoted Δ_1 ≡ Δ_2, if there is a bijection ϕ : Δ_1 → Δ_2 such that for all g ∈ G and δ ∈ Δ_1,

ϕ(δ^g) = ϕ(δ)^g.    (6.3)

It is clear that ≡ is an equivalence relation. Also, it is clear that if c ∈ C maps Δ_1 to Δ_2 then the restriction c|_{Δ_1} is a bijection satisfying (6.3). Conversely, if ϕ is a bijection satisfying (6.3) then ϕ ∪ ϕ^{-1} is an involution in C that fixes the set Ω\(Δ_1 ∪ Δ_2) pointwise. Therefore, for any two orbits Δ_1, Δ_2 of G, Δ_1 ≡ Δ_2 if and only if some c ∈ C maps Δ_1 to Δ_2. Moreover, if Δ_1 ≡ Δ_2 then C_{Sym(Δ_1)}(G|_{Δ_1}) ≅ C_{Sym(Δ_2)}(G|_{Δ_2}). From these, we obtain the following lemma.

Lemma 6.1.8. Let C_1, ..., C_t be the ≡-equivalence classes of orbits of G, |C_i| = k_i, and for each i choose a representative Δ_i ∈ C_i. Then

C_{Sym(Ω)}(G) ≅ ∏_{i=1}^{t} ( C_{Sym(Δ_i)}(G|_{Δ_i}) ≀ S_{k_i} ).

Hence, to obtain C_{Sym(Ω)}(G), we have to determine which orbits of G are equivalent and we have to construct bijections satisfying (6.3) between a representative of an equivalence class and the other members of the class.

Lemma 6.1.9. Let Δ_1, Δ_2 be orbits of G of the same size, let α ∈ Δ_1, and let A be the set of fixed points of G_α. Then Δ_1 ≡ Δ_2 if and only if A ∩ Δ_2 ≠ ∅.

Proof. If ϕ : Δ_1 → Δ_2 witnesses Δ_1 ≡ Δ_2 then ϕ(α) ∈ A ∩ Δ_2. To see this, it is enough to observe that (6.3) implies that, for all g ∈ G_α, ϕ(α)^g = ϕ(α^g) = ϕ(α).
Conversely, let β ∈ A ∩ Δ_2. Then we can define ϕ : Δ_1 → Δ_2 as follows: For γ ∈ Δ_1, take any g ∈ G such that α^g = γ and let ϕ(γ) := β^g. Similarly to the proof of Lemma 6.1.4, it can be shown that ϕ is a bijection satisfying (6.3). Details are left to the reader. □
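In GAP, the test of Lemma 6.1.9 and the bijection of its proof can be sketched as follows (the function name is ours; Delta1, Delta2 are two G-orbits of equal size and alpha is a point of Delta1):

    OrbitEquivalence := function(G, Delta1, Delta2, alpha)
      local Galpha, A, beta;
      Galpha := Stabilizer(G, alpha);
      # A meet Delta2: fixed points of G_alpha lying in Delta2
      A := Filtered(Delta2,
                    p -> ForAll(GeneratorsOfGroup(Galpha), g -> p^g = p));
      if A = [] then return fail; fi;     # the orbits are not equivalent
      beta := A[1];
      # the bijection phi of (6.3): gamma -> beta^g for any g with alpha^g = gamma
      return List(Delta1,
               gamma -> [gamma, beta ^ RepresentativeAction(G, alpha, gamma)]);
    end;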

If α is the first base point of G then, using the set of fixed points of G α
and a shallow Schreier tree coding the transversal for G mod G α , we can
compute the orbits equivalent to the orbit of α and bijections between them by
a nearly linear-time deterministic algorithm. Equivalence of the orbit containing
some γ ∈  with the other orbits can be determined by first applying a base
change algorithm to obtain a base starting with γ . However, if G has numerous
inequivalent orbits (but of the same size, so equivalence

must be tested) then
this computation of CSym() (G) may require (n /22 log n
) time even for small-
base inputs (cf. Exercise 6.3). It is an open problem whether there is a nearly
linear-time algorithm that computes CSym() (G) for all G ≤ Sym().

6.1.3. The Center


Since CSym() (G) is normalized by G, it is possible to compute Z (G) =
CSym() (G) ∩ G by combining the methods of Sections 6.1.1 and 6.1.2. How-
ever, computing CSym() (G) may not be possible by a nearly linear-time al-
gorithm. Moreover, even if CSym() (G) can be computed, it is possible that
|CSym() (G)| is much larger than |G| and so the intersection algorithm does not
run in time O(n logc |G|). Hence we need a small additional trick.
Our treatment follows that of [Beals and Seress, 1992]. Let B be a nonredundant base for G, and let {Δ_1, Δ_2, ..., Δ_k} be the set of orbits of G that have nonempty intersections with B. Note that k ≤ |B| ≤ log |G|. Let Ψ := ∪_{i=1}^{k} Δ_i, and let G* be the restriction of G to Ψ. Since Ψ contains a base for G, |G*| = |G| and, by Lemma 5.2.1, each element of G* is extendible uniquely to an element of G by a deterministic nearly linear-time algorithm. Hence it is enough to construct generators for Z(G*).
Note that G* has at most log |G| = log |G*| orbits, so C_{Sym(Ψ)}(G*) can be obtained by a deterministic nearly linear-time algorithm. A further speedup is possible by observing that it is enough to construct generators for some K ≤ Sym(Ψ) satisfying Z(G*) ≤ K ≤ C_{Sym(Ψ)}(G*), since K ∩ G* = Z(G*) for any such K. Clearly,

K = ∏_{i=1}^{k} C_{Sym(Δ_i)}(G*|_{Δ_i})

is appropriate, since elements of Z(G*) cannot move the orbits of G*.
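As an illustration, the whole reduction can be assembled in a few lines of GAP (the function name is ours; the generic Centralizer call stands in for the per-orbit nearly linear-time construction of Section 6.1.2, and GAP's built-in Centre is, of course, the production tool):

    CentreByBaseOrbits := function(G)
      local base, orbs, Psi, hom, Gstar, K, Zstar;
      base := BaseStabChain(StabChain(G));
      orbs := Filtered(Orbits(G, MovedPoints(G)),
                       o -> ForAny(base, b -> b in o));
      Psi  := Union(orbs);                  # contains a base, so G acts faithfully on it
      hom  := ActionHomomorphism(G, Psi);   # G -> G*
      Gstar := Image(hom);
      K := Centralizer(SymmetricGroup(Length(Psi)), Gstar);
      Zstar := Intersection(K, Gstar);      # = Z(G*)
      return Group(List(GeneratorsOfGroup(Zstar),
                        x -> PreImagesRepresentative(hom, x)), ());
    end;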

6.1.4. Centralizer of a Normal Subgroup


If N ⊴ G ≤ Sym(Ω) then C_{Sym(Ω)}(N) is normalized by G (cf. Exercise 6.4) and so C_G(N) = C_{Sym(Ω)}(N) ∩ G can be constructed by the methods of Sections 6.1.1 and 6.1.2. As in the case of Z(G), the difficulty is that C_{Sym(Ω)}(N) may not be computable in nearly linear time and/or it may have much larger order than G.
A deterministic nearly linear-time algorithm is described in [Luks and Seress, 1997]. We start with a trick similar to the one applied for computing the center. As in Section 6.1.3, let B be a nonredundant base for G, let {Δ_1, Δ_2, ..., Δ_k} be the set of orbits of G that have nonempty intersections with B, let Ψ := ∪_{i=1}^{k} Δ_i, and let G*, N* be the restrictions of G, N, respectively, to Ψ. It is enough to compute C_{G*}(N*). Moreover,

K = ∏_{i=1}^{k} C_{G*|_{Δ_i}}(N*|_{Δ_i})

contains C_{G*}(N*), K is normalized by G*, and |K| ≤ |G|^{log |G|}, so log |K| ≤ log² |G|. Hence C_{G*}(N*) can be obtained as C_{G*}(N*) = K ∩ G*, provided that the groups C_{G*|_{Δ_i}}(N*|_{Δ_i}) are available, and so we have reduced the computation of C_G(N) to the computation of the groups C_{G*|_{Δ_i}}(N*|_{Δ_i}). Therefore it is enough
to find a nearly linear-time solution to the following problem:

Given a transitive G ≤ Sym(Ω) and N ⊴ G, compute C_G(N)
by a deterministic nearly linear-time algorithm.    (6.4)

Note that despite the fact that G is transitive in (6.4), N may have numerous orbits and so the difficulties with the nearly linear-time construction of C_{Sym(Ω)}(N) seem to persist. Although in this special case it is possible to construct generators for C_{Sym(Ω)}(N) in nearly linear time, C_{Sym(Ω)}(N) may be too large. Therefore we still cannot apply the intersection method of Section 6.1.1. Instead, we proceed by constructing a block homomorphism whose kernel G̃ contains C_G(N) and a second homomorphism from G̃ whose kernel is exactly C_G(N).

Lemma 6.1.10. Let G ≤ Sym(Ω) be transitive and let N ⊴ G.
(i) For some α ∈ Ω, let A = fix(N_α) be the set of fixed points of N_α. Then A is a block of imprimitivity for G.
(ii) Let B be the block system consisting of the G-images of A and let G̃ be the kernel of the G-action on B. Then C_G(N) ≤ G̃.

Proof. (i) By Lemmas 6.1.4, 6.1.8, and 6.1.9, A is an orbit of C_{Sym(Ω)}(N). So, by Exercise 6.4 and Lemma 6.1.7, A is a block of imprimitivity for G.
(ii) The blocks in B are the orbits of C_{Sym(Ω)}(N); therefore, in particular, the elements of C_G(N) cannot move these blocks. Thus C_G(N) ≤ G̃. □
The tricky part of the computation of C_G(N) is the definition of a homomorphism ϕ : G̃ → Sym(Ω) with ker(ϕ) = C_G(N). We shall define ϕ in Lemma 6.1.12, but first we need some preparatory steps.

Lemma 6.1.11. Let Γ_1, Γ_2, ..., Γ_k be the orbits of N that have nonempty intersection with A and let Γ := ∪_{i=1}^{k} Γ_i. Then Γ is a block of imprimitivity for G.

Proof. Observe that Γ is the union of the N-images of A and so {Γ_1, Γ_2, ..., Γ_k} is an orbit of N in the action on B. By Lemma 6.1.7, {Γ_1, Γ_2, ..., Γ_k} is a block for G in the action on B and so ∪_{i=1}^{k} Γ_i is a block for the G-action on Ω. □

Next, choose α_i ∈ A ∩ Γ_i for 1 ≤ i ≤ k. Since Γ is a block of imprimitivity for G, we can take G-images Γ^{g_1}, ..., Γ^{g_m} of Γ that partition Ω and so the images Σ := {α_i^{g_j} | 1 ≤ i ≤ k, 1 ≤ j ≤ m} define a system of representatives of all N-orbits. Recall again that we have an equivalence relation on the orbits of N: Two orbits are equivalent if and only if there is an element of C_{Sym(Ω)}(N) mapping one to the other. The orbits Γ_1, Γ_2, ..., Γ_k form one of the equivalence classes. If two N-orbits are equivalent then their representatives in Σ are in the same orbit of C_{Sym(Ω)}(N).

Lemma 6.1.12. Let G ≤ Sym(Ω) be transitive, let N ⊴ G, and define B, G̃, Σ as in Lemma 6.1.10 and in the previous paragraph.
(i) For each g ∈ G̃, there exists a unique c_g ∈ C_{Sym(Ω)}(N) such that gc_g^{-1} fixes Σ pointwise.
(ii) The map ϕ : G̃ → Sym(Ω) defined by ϕ : g ↦ gc_g^{-1} is a group homomorphism.
(iii) C_G(N) is the kernel of the homomorphism ϕ.

Proof. (i) Let g ∈ G̃. Recall that, by Lemma 6.1.8,

C_{Sym(Ω)}(N) ≅ ∏_{j=1}^{m} ( C_{Sym(Γ_1^{g_j})}(N|_{Γ_1^{g_j}}) ≀ S_k ).    (6.5)

Since g ∈ G̃, it cannot move the blocks in B. In particular, g fixes Γ^{g_j} for 1 ≤ j ≤ m. Let j ∈ [1, m] be arbitrary. There are k N-orbits in Γ^{g_j} and g permutes them; this permutation π_j, regarded as an element of S_k in the wreath product in the jth term of (6.5), must also be induced by any c_g ∈ C_{Sym(Ω)}(N) satisfying that gc_g^{-1} fixes α_i^{g_j} for i = 1, 2, ..., k.
Now, for each i ∈ [1, k], consider (α_i^{g_j})^g. Since g fixes the G-images of A and C_{Sym(Γ_i^{g_j})}(N|_{Γ_i^{g_j}}) acts transitively on A^{g_j} ∩ Γ_i^{g_j}, there exists c_{ij} ∈ C_{Sym(Γ_i^{g_j})}(N|_{Γ_i^{g_j}}) such that (α_i^{g_j})^g = (α_i^{g_j})^{c_{ij}}. Moreover, since C_{Sym(Γ_i^{g_j})}(N|_{Γ_i^{g_j}})
is semiregular, there is a unique c_{ij} with this property. Then for

c_g := ∏_{j=1}^{m} (c_{1j}, ..., c_{kj}; π_j) ∈ C_{Sym(Ω)}(N),

gc_g^{-1} fixes Σ pointwise, and it is clear from the discussion above that there is only one element in C_{Sym(Ω)}(N) with this property.
(ii) Observe that for g, h ∈ G̃, we have gc_g^{-1}hc_h^{-1} = gh(c_g^{-1})^h c_h^{-1}. Here (c_g^{-1})^h c_h^{-1} ∈ C_{Sym(Ω)}(N) and gh(c_g^{-1})^h c_h^{-1} fixes Σ pointwise, so the uniqueness of c_{gh} implies that c_{gh}^{-1} = (c_g^{-1})^h c_h^{-1} and ϕ(g)ϕ(h) = ϕ(gh).
(iii) C_G(N) is the kernel of the homomorphism ϕ since ϕ(g) = 1 if and only if g = c_g, that is, g ∈ G̃ ∩ C_{Sym(Ω)}(N) = C_G(N). □

In summary, the algorithm for solving problem (6.4) is the following:

Centralizer of Normal Subgroup in Transitive Group(G, N)

Input: An SGS for a transitive G ≤ Sym(Ω) relative to a nonredundant base B; generators for N ⊴ G.
Output: An SGS for C_G(N).
(Note that N is not necessarily transitive.)

Step 1: Compute an SGS relative to B for N.
Step 2: Let α be the first point of B. Compute A := fix(N_α).
Step 3: Compute the block system B consisting of the G-images of A and G̃, the kernel of the G-action on B.
Step 4: Compute those N-orbits Γ_1, ..., Γ_k that have nonempty intersection with A, and compute Γ := ∪_{i=1}^{k} Γ_i. For 1 ≤ i ≤ k, pick α_i ∈ A ∩ Γ_i.
Step 5: Compute G-images Γ^{g_1}, ..., Γ^{g_m} of Γ that partition Ω. Compute Σ := {α_i^{g_j} | 1 ≤ i ≤ k, 1 ≤ j ≤ m}.
Step 6: For each generator g of G̃, compute c_g ∈ C_{Sym(Ω)}(N) such that gc_g^{-1} fixes Σ pointwise.
Step 7: Compute and output the kernel of the homomorphism ϕ : G̃ → Sym(Ω) defined by ϕ : g ↦ gc_g^{-1}.

Theorem 6.1.13. The algorithm Centralizer of Normal Subgroup in Tran-


sitive Group(G, N ) is deterministic and it runs in nearly linear time.
Proof. Appealing to the results of Chapter 5 about computing an SGS relative to a given base (Theorem 5.2.3) and computing kernels of homomorphisms (Section 5.1.2), we clearly see that Steps 1–5 and 7 can be performed by nearly linear-time deterministic algorithms. For Step 6, observe that for fixed i, j, generators and a shallow Schreier tree data structure for C_{Sym(Γ_i^{g_j})}(N|_{Γ_i^{g_j}}) can be computed in O(|Γ_i| log^c |N|) time with an absolute constant c, as described in Section 6.1.2; in particular, c_{ij} can be obtained in O(|Γ_i| log^c |N|) time. Hence, for each generator g of G̃, we can compute c_g in nearly linear time. □


6.1.5. Core of a Subnormal Subgroup


By repeated applications of Corollary 6.1.3, we can compute intersections with
subnormal subgroups.

Lemma 6.1.14. Suppose that H ⊴⊴ M and G ≤ M. Given an SGS for M and generators for H and G, an SGS for H ∩ G can be computed in nearly linear time by a deterministic algorithm.

Proof. Define H_0 := M and H_i := ⟨H^{H_{i−1}}⟩ for i > 0. The series of subgroups H_0 ⊵ H_1 ⊵ ··· is computable in nearly linear time and, since H ⊴⊴ M, H = H_k for some integer k (cf. Exercise 6.5). Note that the first such k is at most log |M|, since the previous H_i define a strictly decreasing subgroup chain in M.
For all i ≥ 0, G ∩ H_i normalizes H_{i+1}. Hence Corollary 6.1.3 implies that (G ∩ H_i) ∩ H_{i+1} = G ∩ H_{i+1} is successively computable for i = 0, 1, ..., k − 1 by a nearly linear-time deterministic algorithm. □

Computation of the Core



We can apply Lemma 6.1.14 for computing Core_G(H) = ⋂{H^g | g ∈ G} in the case when H ⊴⊴ G. Let G = ⟨S⟩. We compute a series of subgroups H = C_0 ≥ C_1 ≥ ··· and permutations g_0 = 1, g_1, g_2, ... ∈ G such that, for all i ≥ 0,

C_i = H ∩ ⋂_{j=1}^{i} H^{g_j}.    (6.6)

The process stops when C_m = Core_G(H). Since the C_i define a strictly decreasing subgroup chain and, by definition, C_i ≥ Core_G(H) for all i, we reach Core_G(H) in m ≤ log |G| steps.
Suppose that C_i and g_0, ..., g_i are already defined for some i ≥ 0. Then we compute the conjugates of C_i by the generators of G. If C_i^g = C_i for all g ∈ S then C_i^h = C_i for all h ∈ G. This follows from induction on the length of the word as h is written as a product of generators. Also, (6.6) implies that C_i ≤ H^h. Hence C_i = Core_G(H) and we are done.
Otherwise, there exists g ∈ S such that C_i^g ≠ C_i. Since |C_i^g| = |C_i|, this means that C_i^g is not a subgroup of H^{g_j} for some j ∈ [0, i]. Such j can be found by testing for each generator c of C_i whether c^g is in the group H^{g_j} (or, for easier implementation, testing whether c^{gg_j^{-1}} is in H). After the appropriate index j is found, we define g_{i+1} := g_j g^{-1}, use the algorithm in the proof of Lemma 6.1.14 to compute C_{i+1} := C_i ∩ H^{g_j g^{-1}} = (C_i^{gg_j^{-1}} ∩ H)^{g_j g^{-1}}, and proceed with the recursion.
We remark that [Kantor and Luks, 1990] contains a polynomial-time algo-
rithm for computing the core of arbitrary subgroups of G. However, that algo-
rithm uses Sylow subgroup computations, which, at the moment, have nearly
linear-time versions only for groups with no exceptional Lie-type composition
factors (cf. [Morje, 1995]), and the algorithm is much more complicated than
the elementary procedure described here.
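The whole loop fits in a few lines of GAP (the function name is ours). This simplified variant intersects with conjugates of the current subgroup rather than bookkeeping the conjugates of H as in (6.6), and GAP's generic Intersection stands in for the normalizing-subgroup intersection of Lemma 6.1.14; the built-in Core(G, H) computes the same subgroup.

    CoreByConjugates := function(G, H)
      local C, changed, g;
      C := H;
      repeat
        changed := false;
        for g in GeneratorsOfGroup(G) do
          if not IsSubgroup(C, C^g) then
            C := Intersection(C, C^g);
            changed := true;
          fi;
        od;
      until not changed;
      return C;    # the largest normal subgroup of G contained in H
    end;

Each change at least halves |C|, so the loop performs at most log |H| changes in total.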

6.2. Composition Series


The composition factors reveal substantial information about the structure of
a group, and computing a composition series is the starting point in many
algorithms (cf. Sections 7.3.1, 6.3.1, 8.3, 8.4, and 9.4). Therefore, significant
effort has been devoted to the construction of a composition series [Neumann,
1986; Luks, 1987; Kantor, 1991; Babai et al., 1993; Beals and Seress, 1992;
Beals, 1993b; Cannon and Holt, 1997].
Given G = ⟨S⟩ ≤ Sym(Ω), with |Ω| = n, the goal is to find generators for the subgroups N_i in a composition series G = N_1 ▷ N_2 ▷ ··· ▷ N_l = 1 and homomorphisms ϕ_i : N_i → S_n with ker(ϕ_i) = N_{i+1} for i = 1, 2, ..., l − 1. We remark that it is not true in general that if N ⊴ G then G/N has a faithful permutation representation of degree at most n (cf. Exercise 6.6), but we shall prove that if N is a maximal normal subgroup of G then such a small-degree faithful permutation representation exists. Hence the homomorphisms ϕ_i exist as well.
The preliminary version of [Luks, 1987], circulated since 1981, was the
first permutation group algorithm that utilized the classification of finite simple
groups (CFSG). Although it turned out later that a composition series can be computed without appealing to this classification (cf. [Beals, 1993b]), the CFSG is a powerful and indispensable tool in many other algorithms. Also,
implementations of the composition series algorithm in both GAP and Magma
use consequences of the CFSG.
Our goal is to prove the following theorem.
Theorem 6.2.1. Given G = ⟨S⟩ ≤ Sym(Ω), with |Ω| = n, and an SGS relative to a nonredundant base for G, a composition series for G can be computed by a nearly linear-time Las Vegas algorithm.

The proof of Theorem 6.2.1 is spread out in Sections 6.2.1–6.2.4 and 6.2.6.
In Sections 6.2.1–6.2.4 we describe a nearly linear-time Monte Carlo compo-
sition series construction. This construction is Monte Carlo, even if we have
a nonredundant base and a strong generating set for the input group. The up-
grade to a Las Vegas algorithm (supposing that a base and SGS are known) is
given in Section 6.2.6. Moreover, in Section 8.3, we describe how to compute a
nonredundant base, SGS, and composition series by a Las Vegas nearly linear-
time algorithm, in groups satisfying some restrictions on their composition
factors.

6.2.1. Reduction to the Primitive Case


Given G = ⟨S⟩ ≤ Sym(Ω), with |Ω| = n, and an SGS relative to a nonredundant base for G, the basic strategy for constructing a composition series of G is to find a normal subgroup M ⊴ G and a faithful permutation representation of degree at most n for G/M. Moreover, either 1 ≠ M ≠ G, or M = 1 but the faithful permutation representation of G/M is of degree at most n/2. Applying this procedure recursively for G/M and M, we find a composition series of G in at most log |G| log n iterations.
If G is not transitive then let Δ be the smallest nontrivial orbit of G and construct a transitive constituent homomorphism ϕ : G → Sym(Δ) and its kernel M := ker(ϕ). By Section 5.1.3, this construction can be done in nearly linear time.
If G is transitive then we use the nearly linear-time algorithm of Section 5.5 to check whether G is primitive. If G is imprimitive then this algorithm outputs a nontrivial block, and so we can construct a nontrivial block system, the action ϕ of G on the blocks, and the kernel M := ker(ϕ) of this action. Using the results of Section 5.1.3 again, this construction can be done in nearly linear time.
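For orientation, the two reductions just described look as follows in GAP (the function name is ours, Omega is taken to be the set of moved points, and GAP's ActionHomomorphism, Blocks, and Kernel stand in for the nearly linear-time routines of Sections 5.1.3 and 5.5):

    ReductionStep := function(G, Omega)
      local orbs, Delta, hom;
      orbs := Orbits(G, Omega);
      if Length(orbs) > 1 then                     # intransitive: smallest orbit
        Delta := First(orbs, o -> Length(o) = Minimum(List(orbs, Length)));
        hom := ActionHomomorphism(G, Delta);
      elif not IsPrimitive(G, Omega) then          # transitive but imprimitive
        hom := ActionHomomorphism(G, Blocks(G, Omega), OnSets);
      else
        return fail;                               # primitive: handled by (6.7)
      fi;
      return [Image(hom), Kernel(hom)];            # faithful image of G/M, and M
    end;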
If G is primitive then, by the following lemma from [Luks, 1987], it is enough
to find any proper normal subgroup of G.

Lemma 6.2.2. Let G ≤ Sym(Ω), with |Ω| = n, let B = (β_1, ..., β_m) be a base for G, and let G = G^{[1]} ≥ G^{[2]} ≥ ··· ≥ G^{[m+1]} = 1 be the corresponding point stabilizer chain. Suppose that N ⊴ G, and let i be the index such that G = G^{[i]}N > G^{[i+1]}N. Then |G : G^{[i+1]}N| ≤ n and the action of G on the cosets of G^{[i+1]}N has N in its kernel. Moreover, given B and generators for N, this action can be constructed by a deterministic algorithm in nearly linear time.

Proof. We have |G : G^{[i+1]}N| = |G^{[i]}N : G^{[i+1]}N| ≤ |G^{[i]} : G^{[i+1]}|. Since N ⊴ G^{[i+1]}N, N acts trivially on the G-cosets of G^{[i+1]}N.
To show the algorithmic feasibility of the construction of the action on the G-cosets of G^{[i+1]}N, first observe that, by Theorem 5.2.3, the index i can be found in nearly linear time. Let H := G^{[i]} ∩ G^{[i+1]}N. Note that H can be constructed as the pointwise stabilizer of {β_1, ..., β_{i−1}} in G^{[i+1]}N. Since G^{[i]} ≥ H ≥ G^{[i+1]} = (G^{[i]})_{β_i}, the set Δ := β_i^H is a block of imprimitivity for the action of G^{[i]} on the cosets of G^{[i+1]}. The homomorphism ψ defining the action of G^{[i]} on the block system consisting of the G^{[i]}-images of Δ can be obtained in nearly linear time.
We claim that the image of ψ is permutation isomorphic to the action of G^{[i]} on the G-cosets of G^{[i+1]}N. Indeed, the map ϕ : Δ^g ↦ G^{[i+1]}Ng is a bijection between the permutation domains. This map ϕ is well defined, since for g, h ∈ G^{[i]},

Δ^g = Δ^h ⟺ gh^{-1} ∈ H = G^{[i]} ∩ G^{[i+1]}N ⟺ gh^{-1} ∈ G^{[i+1]}N ⟺ G^{[i+1]}Ng = G^{[i+1]}Nh.

Moreover, (ϕ(Δ^g))^h = ϕ((Δ^g)^h), so ϕ induces a permutation isomorphism. Since we can construct the permutation action of G^{[i]} on the cosets of G^{[i+1]}N and N acts trivially on these cosets, the permutation action of a generating set of G = ⟨G^{[i]}, N⟩ is constructed in nearly linear time. □

Corollary 6.2.3. Let G ≤ Sym(Ω), with |Ω| = n, and let N be a maximal normal subgroup of G. Then G/N has a faithful permutation representation of degree at most n.

By the preceding discussion, finding a composition series is reduced to the following problem:

Given a primitive group G ≤ Sym(Ω), with |Ω| = n, find generators
for a proper normal subgroup, or find a faithful representation    (6.7)
on at most n/2 points, or prove that G is simple.

The fact that the construction of the action of G on the cosets of G [i+1] N
in Lemma 6.2.2 can be done in nearly linear time was observed in [Beals and
Seress, 1992]. That paper also contains a strengthening of the method, when
we do not have generators for N . We shall need this extension in Section 6.2.3.

Lemma 6.2.4. Let G ≤ Sym(Ω), with |Ω| = n, let B = (β_1, ..., β_m) be a base for G, and let G = G^{[1]} ≥ G^{[2]} ≥ ··· ≥ G^{[m+1]} = 1 be the corresponding point stabilizer chain. Suppose that N ⊴ G, and suppose that the index i such that G = G^{[i]}N > G^{[i+1]}N is known. Then, given an SGS for G relative to B and generators for a subgroup K ≥ G^{[i+1]}N, the action of G on the cosets of K can be constructed in nearly linear time by a deterministic algorithm, even if we have no generators for N.

Proof. The first part of the algorithm is similar to the one described in
Lemma 6.2.2. Namely, we construct the permutation action of G [i] on the cosets
of K . The only difference from Lemma 6.2.2 is that we compute H := G [i] ∩ K
(instead of the unknown G^{[i]} ∩ G^{[i+1]}N). Then, as in the proof of Lemma 6.2.2, we compute Δ := β_i^H and the block system consisting of the G^{[i]}-images of Δ in the fundamental orbit β_i^{G^{[i]}}. The same proof as in Lemma 6.2.2, just replacing each reference to G^{[i+1]}N by K, shows that the G^{[i]}-action on the cosets of K is permutation isomorphic to the G^{[i]}-action on this block system. Let Δ_1 := Δ, Δ_2, ..., Δ_m denote its blocks (with m = |G : K| ≤ n).
The new part of the algorithm is that we also have to be able to construct the
action of arbitrary elements of G on the cosets of K . To this end, we compute
a shallow Schreier tree data structure for K relative to the base B, and for each
j ∈ [1, m] we choose a point σ j from j . Using the ith tree in the shallow
Schreier tree data structure of G, we write the coset representatives h j carrying
βi to σ j as words in the strong generators. The set of all h j is a transversal
system for G mod K . Note that because of the nearly linear time constraint, we
cannot compute the permutations h j explicitly.
Given some g ∈ G, for each j ∈ [1, m] we want to determine to which coset
of K the permutation h j g belongs. First, we compute g −1 . We claim that after
that, for a fixed j, the coset of h j g can be found in O(log3 |G|) time. Applying
this procedure for the generators g of G, the action of G on the cosets of K is
found in nearly linear time.
Recall our convention that strong generating sets are closed under taking inverses. Hence we can write a word w of length at most 2 log |G| + 1 whose product is g^{-1}h_j^{-1}. We sift w as a word (cf. Lemma 5.2.2) through the first i − 1 levels of the Schreier tree data structure of K. Since any element of G = G^{[i]}K can be written in the form g_i k for some k ∈ K and g_i ∈ G^{[i]}, this sifting is feasible. By Lemma 5.2.2, the time requirement is O(log³ |G|) and the siftee s := g^{-1}h_j^{-1}k ∈ G^{[i]}, for some k ∈ K, is represented as a word of length O(log² |G|). So we can write a word (in terms of g and the SGS for G, K) representing s^{-1} in O(log² |G|) time. Since ks^{-1} = h_j g, the coset of h_j g can be found by looking up to which block Δ_l the point β_i^{s^{-1}} belongs. □

6.2.2. The O’Nan–Scott Theorem


Every abstract group has a faithful transitive permutation representation, but the
structure of primitive permutation groups is quite restricted. Primitive groups
can be divided into classes according to their socle and the stabilizer of a point
of the permutation domain in the socle. This characterization is known as the
O’Nan–Scott theorem (cf. [Scott, 1980]), but there are numerous versions. In
fact, we shall also state two versions: The first one is more of a group theoretic
nature, whereas the second one is a rearrangement of the cases and is suited
better for our algorithmic purposes.
We need some notation. If T is a simple nonabelian group and H = T1 ×· · ·×
Tk is a group such that for i = 1, 2, . . . , k we have isomorphisms ϕi : T → Ti then
we denote the subgroup {(ϕ1 (t), . . . , ϕk (t)) ∈ H | t ∈ T } by Diag(H ) and call it
a diagonal subgroup of H . Although Diag(H ) depends on the isomorphisms
ϕi , there will be no confusion from omitting these isomorphisms from the
notation.

Theorem 6.2.5. Let G ≤ Sym(Ω) be primitive and let |Ω| = n. Then one of the following holds:

(I) G has a minimal normal subgroup N with C_G(N) ≠ 1. Moreover,
(i) if N is abelian then n = p^d for some prime p, N is regular, and it is the only minimal normal subgroup of G;
(ii) if N is nonabelian then there are exactly two minimal normal subgroups of G and both of them are regular.
(II) G has a unique minimal normal subgroup N = Soc(G) = T_1 × ··· × T_r, where each T_i is isomorphic to the same simple nonabelian group T and G ≤ Aut(T) ≀ S_r. The group G permutes the set {T_1, ..., T_r} by conjugation, and this permutation action is transitive. Moreover, let α ∈ Ω. Then one of the following three cases occurs:
(i) N_α = 1 and n = |T|^r;
(ii) N_α = (T_1)_α × ··· × (T_r)_α with isomorphic subgroups (T_i)_α satisfying 1 ≠ (T_i)_α ≠ T_i, and n = |T_1 : (T_1)_α|^r; or
(iii) r = kl with k ≥ 2 and there exists a permutation of the indices 1, 2, ..., r such that N_α = Diag(T_1 × ··· × T_k) × Diag(T_{k+1} × ··· × T_{2k}) × ··· × Diag(T_{r−k+1} × ··· × T_r). Moreover, n = |T_1|^{(k−1)l}.

Proof. Suppose first that G has a minimal normal subgroup N with C_G(N) ≠ 1.


By Exercise 6.4, C G (N ) is also a normal subgroup of G and its centralizer is
nontrivial, since it contains N . Moreover, by Exercise 6.1, both N and C G (N )
are transitive. However, by Lemma 6.1.4, N and C G (N ) are semiregular as sub-
groups of the semiregular groups CSym() (C G (N )) and CSym() (N ), respectively.
Hence both N and C G (N ) are regular.
If N is abelian then N ≤ C G (N ) and so N = C G (N ). Moreover, since N is a
minimal normal subgroup, it is the direct product of isomorphic simple groups,
that is, an elementary abelian p-group for some prime p and n = p d for some
d ≥ 1. If N is nonabelian then N and C G (N ) are two different subgroups. Both
of them are minimal normal in G, since a proper subgroup of a regular group
cannot be transitive. In both of the cases of abelian and nonabelian N , there
are no other minimal normal subgroups of G because any two minimal normal
subgroups must centralize each other. Hence we are in case I(i) or I(ii) of the
theorem.
Suppose now that G has no minimal normal subgroup with nontrivial cen-
tralizer. Then, using again that any two minimal normal subgroups centralize
each other, we obtain that G has only one minimal normal subgroup, which we
may denote by N . Since N is minimal normal, there exists a nonabelian simple
group T and an integer r ≥ 1 such that N = T1 × · · · × Tr with Ti ∼ = T for
all i ∈ [1, r ]. The only minimal normal subgroups of N are the groups Ti for
i ∈ [1, r ] (cf. Exercise 6.10), so G must permute these by conjugation. Hence
Aut(N) ≅ Aut(T) ≀ S_r and G can be identified with a subgroup of Aut(T) ≀ S_r.
The minimality of N also implies that this conjugation action is transitive since
any orbit corresponds to a normal subgroup of G.
Let α ∈ . We first observe that G α also acts by conjugation transitively
on {T1 , . . . , Tr }, since G = N G α and the conjugation action of N is trivial. We
consider the projections πi : Nα → Ti .
If π_1 is surjective then all π_i are surjective since π_i(N_α) = π_1(N_α)^g for some g ∈ G_α conjugating T_1 to T_i. In this case, N_α is a subdirect product of the T_i and, by Exercise 6.9, there is a partition of the index set {1, 2, ..., r} into l parts such that N_α is the product of l diagonal subgroups, corresponding to this partition. Since N_α ⊴ G_α, the parts are permuted by the conjugation action of G_α on {1, 2, ..., r}, and since the conjugation action is transitive, all parts must be of the same size. Hence we are in case II(iii).
If 1 ≠ π_1(N_α) ≠ T_1 then 1 ≠ π_i(N_α) ≠ T_i for all i ∈ [1, r] and we are in case II(ii). Finally, if 1 = π_1(N_α) then 1 = π_i(N_α) for all i ∈ [1, r] and we are in case II(i). □

It is possible to describe the structure and action of primitive groups in much


more detail than was done here, but we confine ourselves to the properties
needed in the design and analysis of the composition series algorithm. A more
thorough description can be found, for example, in [Dixon and Mortimer, 1996,
Chap. 4]. Some further material is also listed in Exercise 6.7 concerning case
I(ii) and in Exercise 6.8 concerning cases II(ii) and II(iii) with l > 1. We need
more details only in the case II(iii) with l = 1. The proof of the following lemma
can be found in [Dixon and Mortimer, 1996, Theorem 4.5A].

Lemma 6.2.6. Let G ≤ Sym(Ω) be primitive with a unique minimal normal subgroup N = T_1 × ··· × T_r, such that all T_i are isomorphic to a nonabelian simple group T. Let α ∈ Ω and suppose that N_α = Diag(T_1 × ··· × T_r). Then G_α = H × P, where Inn(T) ≤ H ≤ Aut(T) and P is isomorphic to a primitive subgroup of S_r.
The subgroup Nα defines an isomorphism among the Ti . So N can be iden-
tified with the set of sequences (t1 , . . . , tr ), ti ∈ T . For g ∈ P, (t1 , . . . , tr )g is a
sequence consisting of a permutation of the entries of (t1 , . . . , tr ), according to
the permutation action of P in Sr .

Our second version of the O’Nan–Scott theorem follows.

Theorem 6.2.7. Let G ≤ Sym(Ω) be primitive and let |Ω| = n. Then G satisfies at least one of the following properties:

(A) G has a minimal normal subgroup N with C_G(N) ≠ 1.
(B) G has a proper normal subgroup of index less than n.
(C) G has a unique minimal normal subgroup N = Soc(G) = N_1 × ··· × N_m, where the N_i are isomorphic groups, m!/2 ≥ n, and G acts by conjugation on {N_1, ..., N_m} as the full alternating group A_m. Moreover, let α ∈ Ω. Then one of the following three cases occurs:
(i) N_α = 1;
(ii) N_α = (N_1)_α × ··· × (N_m)_α with 1 ≠ (N_i)_α ≠ N_i; or
(iii) each N_i is isomorphic to the same simple nonabelian group T and N_α = Diag(N_1 × ··· × N_m).
(D) G is simple.

Proof. We have to show that the groups in case II of Theorem 6.2.5 are covered by cases B, C, and D of this theorem. Using the notation of Theorem 6.2.5, suppose that G is in case II, with minimal normal subgroup N = T_1 × ··· × T_r and all T_i isomorphic to the same simple nonabelian group T. Since |T| ≥ 60 and |T : H| ≥ 5 for any proper subgroup H of T, we have r ≤ log_5 n in each subcase of case II.
If r = 1 then either G is simple (and it is in case D) or N ≅ Inn(T) < G ≤ Aut(T). In the latter case, G/N is solvable by Schreier's Hypothesis (which is now a theorem, as a consequence of the classification of finite simple groups); hence G has a cyclic factor group of order p for some prime p. Clearly p ≤ n, and so G is in case B.
Suppose now that r ≥ 2. Consider a minimal block system in the conjugation action of G on {T_1, ..., T_r} and let m denote the number of blocks. If G is in case II(iii) and m < r then we also suppose that the groups T_i belonging to the same diagonal collection are in the same block. Then G acts primitively on this block system; let P ≤ S_m denote the image of this action. By [Praeger and Saxl, 1980], if P ≤ S_m is primitive and P ≠ A_m, S_m then |P| < 4^m; since m ≤ r ≤ log_5 n, the kernel of the action on the block system has index less than n and G is in case B. If P = S_m then it has a normal subgroup of index 2 and G is in case B. This leaves us with the case that P = A_m. If m!/2 < n then we are in case B, so we may suppose that m!/2 ≥ n.
Now, if G is in case II(i) then G is in case C(i). If G is in case II(ii) then it is in case C(ii). If G is in case II(iii) with l > 1 then it is in case C(ii). Finally, if G is in case II(iii) with l = 1 then G is in case C(iii). □

The composition series algorithm that we shall describe in the next two
sections uses different methods for solving the problem (6.7) from Section 6.2.1
for inputs belonging to the different cases of Theorem 6.2.7. If it is not known
to which case of Theorem 6.2.7 the input primitive group G belongs, then the
algorithm tries all methods. Note that if the output of (6.7) is a proper normal
subgroup or a faithful permutation action on at most n/2 points then the cor-
rectness of the output can be checked in nearly linear time. Hence if all methods
report failure or that G is simple then we can conclude that G is indeed simple.
The only regular primitive groups are cyclic of prime order, and these can
be easily recognized. Hence, in the next two sections, we may assume that the
input primitive group is not regular.
6.2.3. Normal Subgroups with Nontrivial Centralizer
In practice, solving the problem (6.7) is most time consuming in the case when
the input primitive group G has a regular normal subgroup with a nontrivial
centralizer (i.e., G has a regular abelian normal subgroup or G has two regular
nonabelian normal subgroups). We shall present two approaches. The first
method originated in [Luks, 1987] and subsequently was refined in [Babai
et al., 1993] and [Beals and Seress, 1992] to a nearly linear-time algorithm. The
second method was introduced in [Neumann, 1986], and it handles the case
when G has a regular abelian normal subgroup. Here we give a nearly linear-
time version. As we shall discuss in Remarks 6.2.11 and 6.2.14, both approaches
can be extended to handle all primitive groups with regular normal subgroups.
However, we cannot guarantee nearly linear running time of these extensions
in the cases when G has a unique regular nonabelian normal subgroup or G has
a nonabelian regular normal subgroup, respectively.
Let G ≤ Sym(Ω) be primitive, and suppose that G has a regular normal subgroup N. For any α, β ∈ Ω, there is a unique element x_{αβ} ∈ N such that α^{x_{αβ}} = β. The first method finds generators for larger and larger subgroups
of G that contain xαβ , eventually constructing some maximal subgroup K ≤ G
such that the action of G on the cosets of K is not faithful and |G : K | < n.
The action on the cosets of K is constructed, and its kernel is a solution for
(6.7). The second method constructs smaller and smaller subsets containing
xαβ , eventually arriving to some subset K ⊆ G that is small enough so that we
can afford to take the normal closure of each element.

Computing the Socle of a Frobenius Group


As a preliminary step, both methods handle the case of Frobenius groups. We
present Neumann’s algorithm for this task; another algorithm is described in
Section 6.2.5. Some background material on Frobenius groups can be found in
Exercises 6.11 and 6.12.
Let G ≤ Sym(Ω) be a primitive Frobenius group, and let N denote its unique minimal normal subgroup. Moreover, let α ∈ Ω and 1 ≠ z ∈ Z(G_α). It is known (cf. Exercise 6.12(iii), (iv)) that such an element z exists. For any g ∈ G\G_α, we have α^{zz^g} = α^{z^g} ≠ α^{z^g z}, since the only fixed point of z is α and so, in particular, z moves the point α^{z^g}. Hence the commutator [z, z^g] ≠ 1, but [z, z^g] ∈ N since the coset Nz is in the center of G/N. Hence ⟨[z, z^g]^G⟩ = N.
Given a base and SGS for G, the center of G_α, and thus a nontrivial element z in it, can be constructed by a nearly linear-time deterministic algorithm, as described in Section 6.1.3. The normal closure ⟨[z, z^g]^G⟩ can also be computed in nearly linear time by a deterministic algorithm.
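In GAP, this step amounts to a few lines (the function name is ours; Centre, Comm, and NormalClosure are the built-in operations, and the stabilizer and centre computations here do not, of course, match the nearly linear-time bookkeeping described above):

    # G is assumed to be a primitive Frobenius group on its moved points.
    FrobeniusSocle := function(G, alpha)
      local z, g;
      z := First(GeneratorsOfGroup(Centre(Stabilizer(G, alpha))), x -> x <> ());
      g := First(GeneratorsOfGroup(G), x -> alpha^x <> alpha);   # any g outside G_alpha
      return NormalClosure(G, Group(Comm(z, z^g)));              # = the socle N
    end;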
From now on, we may suppose that the input primitive group has a normal
subgroup with nontrivial centralizer, but it is not a Frobenius group. First we
describe a solution for (6.7) based on Luks’s approach.

Lemma 6.2.8. Suppose that G ≤ Sym(Ω) is primitive, |Ω| = n, G has a regular normal subgroup with nontrivial centralizer, and there are two points α, β ∈ Ω such that G_{αβ} ≠ 1. If K is any maximal subgroup of G containing N_G(G_{αβ}) then K contains a regular normal subgroup of G and |G : K| < n.

Proof. Let N be an arbitrary regular normal subgroup of G. It contains a unique element x_{αβ} that maps α to β. For any g ∈ G_{αβ}, x_{αβ}^g ∈ N and it maps α to β; hence x_{αβ}^g = x_{αβ}, which implies that x_{αβ} ∈ C_G(G_{αβ}) and x_{αβ} ∈ N_G(G_{αβ}).
Since K is a maximal subgroup of G, the action of G on the cosets of K is primitive. This action cannot be faithful, since G either has a regular abelian normal subgroup or two regular normal nonabelian subgroups, and in any primitive faithful action these minimal normal subgroups have to act regularly; however, in the G-action on the cosets of K, the element x_{αβ} has a fixed point (the coset K · 1).
Now, since the action of G on the cosets of K is not faithful, K contains a normal subgroup of G and so it contains a minimal normal subgroup M of G. Since K also contains G_{αβ} and M ∩ G_{αβ} = 1, we have |G : K| ≤ |G : MG_{αβ}| ≤ n − 1. □

Luks’s Algorithm
The algorithm for solving (6.7) is to compute NG (G αβ ), embed it into a maximal
subgroup K , construct the action of G on the cosets of K , and compute the
kernel of the action. We have to show that these computations can be done in
nearly linear time.

Lemma 6.2.9. Given G ≤ Sym(Ω) and α, β ∈ Ω as in Lemma 6.2.8 and an SGS for G relative to some nonredundant base, N_G(G_{αβ}) can be computed in nearly linear time by a deterministic algorithm.

Proof. Applying a base change (cf. Section 5.4), we may assume that α and β
are the first two base points. Let R1 and R2 be the transversals for G mod G α
and G α mod G αβ , respectively, encoded in shallow Schreier trees (S1 , T1 ) and
(S2 , T2 ). Let N denote a regular normal subgroup of G.
Let be the fixed point set of G αβ . For any g ∈ G, it is clear that g ∈ NG (G αβ )
if and only if g fixes setwise and g fixes setwise if and only if α g ∈ and
β g ∈ . Moreover, for any γ , δ ∈ , the unique element xγ δ ∈ N moving γ to
δ centralizes G αβ and so xγ δ ∈ NG (G αβ ). Therefore NG (G αβ ) acts transitively
on .
First, we compute . This can be done in nearly linear time, using the genera-
tors for G αβ in the SGS for G. After that, generators for NG (G αβ ) are computed
in two steps. We initialize H := G αβ and the orbits α H := {α} and β H := {β}.
In the first step, we compute NG (G αβ )α . This is done by running through the
elements of . If for some γ ∈ we have γ ∈ β H but γ is in the fundamental
orbit β G α then we multiply out the product rγ of labels along the path from γ to
β in the Schreier tree T2 , we replace H by H, rγ , and we recompute the orbit
β H . Since R2 is encoded in a shallow Schreier tree, the computation of rγ can
be done in nearly linear time. Moreover, H is increased at most log(n − 1)
times, since |NG (G αβ )α : G αβ | ≤ n − 1.
In the second step, we embed NG (G αβ )α into a subgroup that acts transitively
on . By the discussion in the second paragraph of the proof, this subgroup
will be NG (G αβ ). Again, we run through the elements of . If we encounter
some δ ∈ that is not in α H then we multiply out the product of labels rδ in
the Schreier tree T1 along the path from δ to α and compute β G α ∩ rδ . This
intersection is nonempty, since there exist elements in G that map δ to α and fix
setwise (and so, in particular, map β to an element of ). Any such element
x can be written in the form x̄rδ−1 for some x̄ ∈ G α and then β x̄ ∈ β G α ∩ rδ .
Therefore we can pick γ ∈ β G α ∩ rδ , multiply out the product of labels rγ in
the Schreier tree T2 along the path from γ to β, replace H by H, rγ−1rδ−1 , and
recompute the orbit α H . Since H increases in the second step at most log | |
times, the second step runs in nearly linear time as well. 

The next step in the solution of (6.7) is the embedding of NG (G αβ ) into a


maximal subgroup K of G. This part of the algorithm is of Monte Carlo type
and requires (n) group operations, but so many permutation multiplications
are not allowed in the nearly linear-time context. Therefore, we consider the
isomorphism ψ : G → H with the black-box group H consisting of the stan-
dard words, as described in Section 5.3. We embed the subgroup ψ(NG (G αβ ))
into a maximal subgroup K ∗ of H , and we finally construct K := ψ −1 (K ∗ ). By
Lemma 5.3.1, group operations in H (including the construction of nearly uni-
formly distributed random elements in subgroups of H ) require only O(logc |G|)
time for an absolute constant c.

Lemma 6.2.10. Let H be a black-box group, L ≤ H, and suppose that we can test membership in L and that there exists a maximal subgroup K* ≤ H containing L that has index less than n. Then generators for a subgroup L* satisfying L < L* < H can be computed by a Monte Carlo algorithm, using O(log²(ε^{-1}) n log n) group operations and membership tests in L, where ε is the desired error probability.

Proof. Let l := ⌈n log((ε/2)^{-1})⌉. We construct a sequence (h_1, ..., h_l) of elements of H recursively. Let h_1 be a random element of H\L and, if h_{i−1} is already defined, let h_i be a random element of ⟨L, h_{i−1}⟩\L. (We construct h_i as a random element of ⟨L, h_{i−1}⟩; if h_i happens to be in L, then we repeat the choice of h_i, up to ⌈log(2l/ε)⌉ times.)
We claim that with probability greater than 1 − ε, the sequence (h_1, ..., h_l) is constructed and L < ⟨L, h_l⟩ < H. By the definition of the sequence, for each i we have L < ⟨L, h_i⟩. For a fixed i, if h_{i−1} is defined then the probability that a random element of ⟨L, h_{i−1}⟩ is not in L is at least 1/2, so the probability that the algorithm fails to construct h_i with ⌈log(2l/ε)⌉ random selections in ⟨L, h_{i−1}⟩ is at most ε/2l. Therefore, the sequence (h_1, ..., h_l) is constructed with probability at least 1 − ε/2.
Suppose now that (h_1, ..., h_l) is constructed. If ⟨L, h_{i−1}⟩ = H for some i then the probability that h_i ∈ K* is greater than 1/n, and if ⟨L, h_i⟩ is a proper subgroup of H then ⟨L, h_j⟩ is a proper subgroup of H for all j ≥ i. Hence the probability that ⟨L, h_l⟩ = H is less than (1 − 1/n)^l < ε/2. □
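Stripped of the black-box and standard-word machinery, the subgroup chain of the proof can be imitated directly on permutation groups; the sketch below (the name, the unbounded retry loop, and the explicit round count are ours) returns, with high probability, a proper subgroup strictly between L and H once enough rounds are performed. It assumes L is a proper subgroup of H.

    GrowTowardsMaximal := function(H, L, rounds)
      local cur, h, i;
      repeat h := Random(H); until not h in L;      # h_1 in H \ L
      cur := ClosureGroup(L, h);
      for i in [2 .. rounds] do
        repeat h := Random(cur); until not h in L;  # h_i in <L, h_{i-1}> \ L
        cur := ClosureGroup(L, h);
      od;
      return cur;
    end;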

We compute a shallow Schreier tree data structure for N_G(G_{αβ}), compute L := ψ(N_G(G_{αβ})), and apply Lemma 6.2.10 to obtain L* ≤ H and the subgroup ψ^{-1}(L*) ≤ G properly containing N_G(G_{αβ}). (Note that because we have a shallow Schreier tree data structure for N_G(G_{αβ}), we can test membership in L.) Then we repeat the same steps, starting with ψ^{-1}(L*) ≤ G. We stop when the subgroup returned by the algorithm of Lemma 6.2.10 is H (this fact is noticed when the next shallow Schreier tree data structure is computed); with high probability, the input subgroup of the last iteration is maximal in G. Since |G : N_G(G_{αβ})| < n², the number of iterations is O(log n).
The final stage of the solution of (6.7) is the construction of the action of G
on the cosets of K and the computation of the kernel of this action. This can be
done as described in Lemma 6.2.4.

Remark 6.2.11. A similar algorithm can be used to solve (6.7) in the case
when G has a unique nonabelian regular normal subgroup. The construction
of NG (G αβ ) and its embedding into a maximal subgroup K can be done in
nearly linear time, as described in Lemmas 6.2.9 and 6.2.10. However, we
cannot guarantee that K contains N , and so the action of G on the cosets of K
may be faithful and Lemma 6.2.4 may not be applied. We obtain a solution for
(6.7) even in the case when K does not contain N , since then G = K N and
K ∩ N = 1, so |G : K | ≤ n/2. However, it is not clear how we can construct the
action on the cosets of K in nearly linear time. We shall see in Section 6.2.6 that
this is possible if β is chosen from the smallest orbit of G α in \{α}, but the
proof of that uses consequences of the classification of simple groups. There
is also an elementary nearly linear-time algorithm for the handling of groups
with regular non-abelian socle, which we shall present in Section 6.2.4.

Neumann’s Algorithm (the EARNS Subroutine)


Now we turn to the description of the EARNS (Elementary Abelian Regular Normal Subgroup) method. Suppose that G ≤ Sym(Ω) is primitive with a regular abelian normal subgroup N, n := |Ω| = p^d for some prime p, and there are two points α, β ∈ Ω such that G_{αβ} ≠ 1. We may suppose that β is from an orbit of G_α in Ω\{α} of the smallest size, since replacing any β with one from such an orbit does not decrease the size of G_{αβ} and so the property G_{αβ} ≠ 1 still holds.
As in the proof of Lemma 6.2.9, let be the set of fixed points of G αβ . For
any two points γ , δ ∈ , we have G αβ ≤ G γ δ and so the maximality of |G αβ |
implies G αβ = G γ δ . Therefore the restriction L := NG (G αβ )| is either regular
or a Frobenius group. In either case, L has a regular normal subgroup R that
consists of the restrictions of the elements x_{γδ} ∈ N mapping γ to δ, for all (not necessarily different) γ, δ in the fixed point set of G_{αβ}. Moreover, all such x_{γδ} ∈ N centralize G_{αβ} (see
the second paragraph of the proof of Lemma 6.2.9).
We start the algorithm by computing M := NG (G αβ ), and then C G (G αβ ) =
C M (G αβ ). As we have observed in the previous paragraph, C G (G αβ )| contains
R as a normal subgroup. We continue the algorithm by computing R, and the
next step is the computation of the preimage D of R in C G (G αβ ). The group
D is abelian, because D( ) = Z (G αβ ) and so D is generated by Z (G αβ ) and
by elements of N that commute with each other and with Z (G αβ ). Next, we
construct the subgroup P of Z (G αβ ) consisting of the elements of Z (G αβ ) of
order dividing p, and x ∈ D of order p whose restriction to the fixed point set of G_{αβ} is nontrivial. Then the coset P x
contains a nontrivial element of N . Therefore, we can construct N by taking
the normal closure of each element of P x in G and pick the only abelian group
among these normal closures.
By Lemma 6.2.9 and by Section 6.1.4, the computation of C G (G αβ ) can be
done in nearly linear time. We have seen earlier in this section that the regular
normal subgroup in Frobenius groups can be computed in nearly linear time, so
R can be obtained by a nearly linear-time algorithm. The group D is a preimage
at a transitive constituent homomorphism, and Z (G αβ ) is the pointwise stabilizer
D( ) . Generators for P are obtained by taking the appropriate powers of the
generators of Z (G αβ ), and x can be chosen as a power of an element in the
transversal for D mod Dα . Hence the coset P x can be constructed in nearly
linear time.
Next, we need an upper estimate for |P|. It is based on the following result
[Neumann, 1986, Lemma 3.4], the proof of which we leave as Exercise 6.13.

Lemma 6.2.12. Let M be any group, let A be an abelian p-subgroup of M of


order q, let C := C M (A), and let m := |M : C|. If O p (M) = 1 then q ≤ m.

Corollary 6.2.13. |P| < n.

Proof. We claim that O p (G α ) = 1. Indeed, N can be identified with a d-


dimensional vector space over GF( p), with α playing the role of the 0 vector;
with this identification, G α acts on N by conjugation as an irreducible subgroup
of GLd ( p), and the fixed vectors of O p (G α ) comprise a block of imprimitivity
of G (cf. Exercise 6.14). Hence the only fixed vector of O p (G α ) is the 0 vector
and so O p (G α ) = 1.
We can apply Lemma 6.2.12 with M := G α and A := P. Since P ≤ Z (G αβ ),
we have C M (P) ≥ G αβ and so |M : C M (P)| ≤ |G α : G αβ )| ≤ n − 1. Hence
|P| ≤ |M : C M (P)| < n. 

By the corollary, |P x| < n. In a permutation group, we can compute the nor-


mal closure of an element in nearly linear time; however, we cannot do (n) such
computations within the nearly linear time constraint. Therefore, as in Luks’s
approach, we consider the isomorphism ψ : G → H with the black-box group
H = S consisting of the standard words. By Theorem 2.3.9 and Lemma 2.3.14,
for any h ∈ H the normal closure h H  can be computed, and then the commu-
tativeness of h H  can be tested, using O(log |H |(|S| + log |H |)) group oper-
ations. By Lemma 5.3.1, group operations in H can be performed in logc |G|
time, for an absolute constant c. Hence we can compute the normal closure of
all standard words representing elements of P x in nearly linear total time by a
Monte Carlo algorithm. When an abelian normal subgroup of H is found, the
image of its generators under ψ −1 can be computed in nearly linear time by a
deterministic algorithm.

Remark 6.2.14. This method can also be extended to solve (6.7) for primitive
groups with one or two nonabelian regular normal subgroups. Let N denote
a regular normal subgroup of G. Using the notation of Neumann’s algorithm,
our observation that R consists of the restrictions of the elements of N to
is still valid, and R can be constructed in nearly linear time. For an element
6.2 Composition Series 139
x ∈ D of prime order r satisfying x| = 1 and for the subgroup P of Z (G αβ )
consisting of elements of order dividing r , it is still true that P x contains a
nontrivial element of N . Corollary 6.2.13 is also true, since G α has no non-
trivial solvable normal subgroups (cf. Exercise 6.15 for the case of two regular
normal subgroups and [Dixon and Mortimer, 1996, Theorem 4.7B(i)] for the
case of a unique nonabelian regular normal subgroup). Hence we can com-
pute the normal closures in H of all standard words representing elements
of P x in nearly linear total time. However, we cannot decide which normal
closure corresponds to a regular subgroup of G within the nearly linear time
constraint.

6.2.4. Groups with a Unique Nonabelian Minimal Normal Subgroup


Given a primitive group G ≤ Sym(), with || = n, if the algorithm of Sec-
tion 6.2.3 fails to solve the problem (6.7) then, with high probability, G has
a unique nonabelian minimal normal subgroup. In this section we solve (6.7)
for such groups. The solution again originated in [Luks, 1987] and was sped
up to nearly linear time in [Babai et al., 1993] and [Beals and Seress, 1992],
but in this case the bulk of the work was done in [Babai et al., 1993]. Groups
in cases B, C(i), C(ii), and C(iii) of Theorem 6.2.7 are handled by different
methods, but in each subcase of case C, a common feature is that we embed
a two-point stabilizer G αβ into a maximal subgroup K < G and compute the
primitive permutation representation of G on the cosets of K .
The first step is to check whether G belongs to case B. If G has a proper
normal subgroup of index less than n then, for a uniformly distributed random
g ∈ G\{1}, the normal closure g G  is a solution of (6.7) with probability greater
than 1/n. Given an SGS for G, a random element and its normal closure can
be computed in nearly linear time; however, we have to repeat this process
(n) times, which is not possible in the nearly linear time frame. Hence, once
again, we consider the isomorphism ψ : G → H with the black-box group H
consisting of the standard words representing the elements of G.

Lemma 6.2.15. Let H = S be a black-box group, and suppose that there
exists a normal subgroup of H of index less than n. Then generators for a proper
normal subgroup of H can be computed by a Monte Carlo algorithm, using
O(log(ε−1 )n(|S| + log |H | + log n + log(ε −1 ))2 ) group operations, where ε is
the desired error probability. (As in Lemma 6.2.10, we consider constructing a
nearly uniformly distributed random element in a subgroup of H as one group
operation.)
140 A Library of Nearly Linear-Time Algorithms

Proof. Let l := n log((ε/3)−1 ) . We construct a sequence (h 1 , . . . , h l ) of ele-


ments of H recursively. Let h 1 be a random element of H \{1} and, if h i−1 is
already defined, let h i be a random element of h i−1 H
\{1}. (We construct h i as
a random element of h i−1 ; if h i happens to be 1, then we repeat the choice of
H

h i , up to log(3l/ε) times.)
We claim that with probability greater than 1 − ε, the sequence (h 1 , . . . , h l )
is constructed and 1  h lH   H . By the definition of the sequence, for each
i we have 1  h iH . For a fixed i, if h i−1 is defined then the probability that
a random element of h i−1 H
 is not 1 is at least 1/2, so the probability that the
algorithm fails to construct h i with log(3l/ε) random selections in h i−1 H
 is at
most ε/3l. Therefore, the sequence (h 1 , . . . , h l ) is constructed with probability
at least 1 − ε/3.
Suppose now that (h 1 , . . . , h l ) is constructed. If h i−1
H
 = H for some i then
the probability that h i is in a proper normal subgroup is greater than 1/n, and
if h iH  is a proper subgroup of H then h Hj  is a proper subgroup of H for all
j ≥ i. Hence the probability that h lH  = H is less than (1 − 1/n)l < ε/3.
The analysis in the previous paragraph assumed that the normal closure
computations give the correct result. If we ensure that each of the l normal
closure computations are correct with probability at least 1 − ε/3l then the
entire algorithm succeeds with probability greater than 1 − ε.
The normal closures are computed by the algorithm of Theorem 2.3.9. Since
for each i we want the computation of h iH  to succeed with probability at
least 1 − ε/3l, we need to construct O(log |H | + log n + log 1/ε) generators
for h iH , at a cost of O((log |H | + log n + log 1/ε))(|S| + log |H | + log n +
log 1/ε) group operations (see also Remark 2.3.5 for the justification of these
quantities and Exercise 6.16). 

The Algorithm for Case B of Theorem 6.2.7


Therefore, to check whether G belongs to case B, we compute generators for a
normal subgroup L of H := ψ(G) by the algorithm of Lemma 6.2.15. After that,
we compute an SGS for ψ −1 (L) ≤ G in nearly linear time by a deterministic
algorithm. If ψ −1 (L) = G then we have a solution for (6.7); otherwise, with
high probability, G has no proper normal subgroup of index less than n.
The treatment of primitive groups belonging to case C is deterministic. In
each subcase of C, we need information about the orbits of G α . The elements
of  correspond to the cosets of G α , and the action of G α on  is permuta-
tion isomorphic to its action on these cosets by multiplication. Furthermore,
an elementary but very important observation is that the action of G α on the
cosets by multiplication is permutation isomorphic to its action on the cosets by
6.2 Composition Series 141
conjugation, since for g ∈ G α and h ∈ G, we have (G α h)g = (G α h)g = G α h g .
Moreover, since N = Soc(G) is transitive on , the elements of  can be also
identified with the cosets of Nα in N . Hence, from each G α -coset we can choose
a representative h ∈ N ; the conjugation action of G α carries G α h to a coset with
representative h g ∈ N .
First, we consider the subcases C(i) and C(ii). In these subcases, Nα =
(N1 )α × · · · × (Nm )α . Any β ∈  corresponds to a sequence of cosets
((N1 )α h 1 , . . . , (Nm )α h m ) with h i ∈ Ni for i ∈ [1, m]. We call the sequence
(h 1 , . . . , h m ) the coordinates of β; a coordinate is called nontrivial if h i ∈ (Ni )α .
Note that the number of nontrivial coordinates is independent of the choice of
the coordinate sequence for β and, since G α permutes the Ni and the (Ni )α by
conjugation, for any g ∈ G α the number of nontrivial coordinates in β and in
β g is the same.

Lemma 6.2.16. Let G ≤ Sym() be primitive, with || = n, and suppose that
G belongs to case C(i) or C(ii) of Theorem 6.2.7. Then for any α ∈ , the
smallest orbits of G α in \{α} have size less than log2 n. Moreover, if G be-
longs to case C(ii) then points in these smallest orbits have only one nontrivial
coordinate and even the sum of the sizes of the smallest orbits is less than log2 n.

Proof. Since m!/2 ≥ n and (log n/ log log n)log n/ log log n < n, we have m >
log n/ log log n. Therefore, since n = |N1 : (N1 )α |m , we must have |N1 :
(N1 )α | < log n. Hence the number of points with one nontrivial coordinate
is at most m|N1 : (N1 )α | < log2 n. So there are orbits of G α in \{α}, namely
the orbits containing all points with exactly one nontrivial coordinate, whose
union has size less than log2 n.
Hence, to finish the proof, it is enough to show that, in case C(ii), the smallest
orbits in \{α} consist of points with one nontrivial coordinate. Let K  G α be
the kernel of the conjugation action of G α on the set {N1 , . . . , Nm }. Note that
Nα ≤ K , G α /K ∼ = Am , and K permutes the cosets of (N1 )α in N1 . Let q denote
the size of the smallest K -orbit L among the cosets of (N1 )α , excluding (N1 )α
itself. Then, taking β ∈  with the first coordinate in L and the other m − 1
coordinates trivial, we have |β G α | = mq.
We claim that for any γ ∈  with k nontrivial coordinates, |γ G α | ≥ ( mk )q2k−1 ,
which is greater than mq if k ≥ 2. Indeed, if (h 1 , . . . , h m ) is a coordinate
sequence for γ and the jth coordinate is nontrivial, then (N j )α h j is not fixed
by the conjugation action of (N j )α ≤ K (otherwise (1, . . . , 1, h j , 1, . . . , 1) is
a fixed point of Nα , contradicting the primitivity of G by Lemma 6.1.10(i)).
Thus |γ K | ≥ q2k−1 , since γ has at least q images differing pairwise in the first
nontrivial coordinate of γ , and for each of these, the action of (N j )α in the other
142 A Library of Nearly Linear-Time Algorithms

nontrivial coordinates j provides at least 2k−1 images. Also, since G α /K ∼


= Am ,
the k nontrivial coordinates can be conjugated into any k positions, accounting
for the ( mk ) term. 

Lemma 6.2.17. Let G = S ≤ Sym() be transitive, with  = [1, n], and
suppose that for some α, β ∈ , |G α : G αβ | = k. Then the action of any g ∈ G
on the cosets of G αβ can be computed in O(|S|nk 2 ) time by a deterministic
algorithm, using O(nk) memory.

Proof. The action of G on the cosets of G αβ is permutation isomorphic to


the action of G on the orbit L := (α, β)G of the sequence (α, β). As prepro-
cessing, we compute this orbit, combining the approach of Remark 2.1.2 and
Theorem 2.1.1(ii). The elements of L are stored in an n × k matrix M, where
the rows are numbered by the elements of  = [1, n]. In row γ , we store the
elements of L with first coordinate equal to γ . (Formally, the hash function
h : L → [1, n] is defined by the rule h : (γ , δ) → γ .) When the orbit algorithm
has to decide whether a newly computed image (γ , δ) already occurs in L, it
is enough to compare (γ , δ) with the pairs in row γ of M. If (γ , δ) is new, we
store it at the first empty slot in row γ . This computation requires nk|S| image
computations of pairs and O(nk 2 ) comparison of pairs.
The elements of L can be numbered 1, 2, . . . , nk, according to their positions
in M. For any g ∈ G and i ∈ [1, nk], the image i g is computed by looking up
the pair (γ , δ) in position i in M and searching for the position of (γ g , δ g ) in
row γ g of M. This requires O(k) time. 

The Algorithm for Case C(i)


Now we are ready to describe the solution of (6.7) in the case C(i). Given
G ≤ Sym() belonging to this case, together with an SGS for G, we pick the
first base point as α and pick β from a smallest orbit of G α in \{α}. We compute
NG (G αβ ) as described in Lemma 6.2.9. By Lemmas 6.2.16 and 6.2.17, the action
of G on the cosets of G αβ (or, more precisely, the permutation isomorphic
action on (α, β)G ) can be computed in nearly linear time by a deterministic
algorithm. The set (α, β) NG (G αβ ) is a block of imprimitivity in this action and,
by the algorithm of Section 5.5, it can be embedded in a maximal block . The
action on the block system
consisting of the images of can be computed
easily and, by Remark 6.2.11, either |
| ≤ n/2 or the kernel of this action is a
nontrivial normal subgroup of G.
The case C(ii) requires one more preparatory step.
6.2 Composition Series 143
Lemma 6.2.18. Let G ≤ Sym(), with || = n, belong to case C(ii), let α ∈ ,
and let β be in a smallest orbit of G α in \{α}. Then there exist γ , δ ∈  such
that

(i) γ is in the same orbit of G α as β.


(ii) δ is in a smallest orbit of G γ in \{γ }.
(iii) The subgroup H := G αβ , G γ δ  is not equal to G, and the action of G
on the cosets of any maximal subgroup K containing H is not faithful or
|G : K | ≤ n/2.

Proof. By Lemma 6.2.16, β has exactly one nontrivial coordinate; we may


suppose that it is the first, and so β = α t for some t ∈ N1 \(N1 )α and β has
the coordinate sequence (t, 1, . . . , 1). Let γ be chosen such that the nontrivial
coordinate of γ is the second one. Then γ = α u for some u ∈ N2 \(N2 )α and γ
has the coordinate sequence (1, u, 1, . . . , 1). Finally, let δ := γ t = β u . Then δ
has the coordinate sequence (t, u, 1, . . . , 1) and, since δ is defined as an image
of γ under only one of the N j , it is in a smallest orbit of G γ in \{γ }.
We claim that α, β, γ , and δ satisfy (iii). We first show that H = G αβ , G γ δ 
normalizes N1 , and so it must be a proper subgroup of G. Let g ∈ G αβ , and
suppose that N1 = N j for some j ∈ [1, m]. Then t g t −1 ∈ N j N1 . If j = 1 then
g
g −1
α t t = α would have a nontrivial first coordinate since t ∈ (N1 )α , which is a
contradiction. The same argument can be repeated for g ∈ G γ δ , in the coordi-
nate system according to the cosets of Nγ , since Nγ = Nαu for u ∈ N2 , which
centralizes N1 , and so (N1 )γ = (N1 )uα = (N1 )α and t ∈ (N1 )γ .
Let K be a maximal subgroup containing H , and suppose that the action of
G on the cosets of K is faithful. We claim that |G : K | ≤ n/2m . Note first that,
similarly to the argument in the previous paragraph, (N2 )β = (N2 )tα = (N2 )α
and (N2 )δ = (N2 )tγ = (N2 )γ since t centralizes N2 . Therefore (N2 )α ≤ G αβ and
(N2 )γ ≤ G γ δ , and so (N2 )α , (N2 )γ  ≤ K . This means that K ∩ N2 = 1
and the action of G on the cosets of K falls in case C(ii), implying |G :
K | = |N2 : (N2 ∩ K )|m (since N2 ∩ K is the stabilizer of the “point” K in
N2 , in the action on the cosets of K ). We have (N2 )γ = (N2 )uα = (N2 )α , because
otherwise (N j )γ = (N j )α for all j ∈ [1, m], contradicting Lemma 6.1.10(i).
Thus |N2 : (N2 ∩ K )| < |N2 : (N2 )α | and |G : K | = |N2 : (N2 ∩ K )|m ≤ (|N2 :
(N2 )α |/2)m = n/2m , as claimed. 

The Algorithm for Case C(ii)


If the input group G belongs to case C(ii) then we can solve (6.7) by fixing
α ∈ , fixing β in a smallest orbit of G α in \{α}, and testing for all γ ∈
and all δ in the smallest orbits of G γ in \{γ } whether or not H := G αβ , G γ δ 
144 A Library of Nearly Linear-Time Algorithms
is equal to G. When a pair (γ , δ) is found with H = G, we construct the action
of G on the cosets of a maximal subgroup of G containing H .
By Lemma 6.2.16, the number of pairs (γ , δ) to be tested is less than log4 n.
Note that we can test whether H = G without computing an SGS for H by
constructing the action of the generators of H on the set L := (α, β)G . By
Lemma 6.2.17, this can be done by a nearly linear-time deterministic algorithm,
and H = G if and only if H acts transitively on L. Similarly, we do not need to
embed the group H = G we have found into a maximal subgroup; instead, it is
enough to compute a maximal block in L containing (α, β) H and the action
of G on the block system consisting of the images of .
The last subcase we have to handle is C(iii). In this subcase, using the no-
tation of Theorem 6.2.7, Nα = Diag(N1 × · · · × Nm ) defines an identification
among the terms Ni of the direct product N , and the points of  correspond to
equivalence classes of the sequences (t1 , . . . , tm ), with ti ∈ T . Two sequences
(t1 , . . . , tm ) and (s1 , . . . , sm ) are equivalent if and only if there exists some t ∈ T
such that ti = tsi holds for all i ∈ [1, m]. Since m!/2 ≥ n and n = |T |m−1 , we
have |T | < log n and so each equivalence class has less than log n elements.

Lemma 6.2.19. Let G ≤ Sym() be primitive, with || = n, and suppose that
G belongs to case C(iii) of Theorem 6.2.7. Then, for any α ∈ , the cardinality
of the union of orbits of G α of size less than log2 n is O(log6 n).

Proof. Since the conclusion of the lemma is an asymptotic statement, we can


suppose that n is large enough so that ( m4 ) ≥ log3 n. We shall prove that, for
such n, there are less than log6 n points in  in G α -orbits of size less than log2 n.
Again, we use the notation of Theorem 6.2.7. For g ∈ N , let E(g) denote
the equivalence class of g and let βg ∈  denote the element of the permutation
domain corresponding to E(g). If some g ∈ N has at least log3 n conjugates
under G α then |βgG α | ≥ log2 n since the conjugates of g belong to at least log2 n
equivalence classes. Hence it is enough to estimate the number of equivalence
classes that contain elements of N with less than log3 n conjugates under G α .
We assign a nondecreasing sequence S(g) = (m 1 , . . . , m k ) of positive in-
tegers to each g = (t1 , . . . , tm ) ∈ N . The length k of S(g) is the number of
elements of T occurring in the sequence g, and the numbers m i are the frequen-
cies of how often these elements occur. It is clear from the definition of S(g)
that if h ∈ E(g) then S(h) = S(g). Moreover, by Lemma 6.2.6, G α = H × P,
where Inn(T )  H  Aut(T ) and P ∼ = Am . Hence if g, h ∈ N are conjugates
under G α then S(g) = S(h).
Suppose that for some g = (t1 , . . . , tm ) ∈ N there are indices i 1 , . . . , il such
that 4 ≤ m i1 + · · · + m il ≤ m − 4 for the numbers m i in S(g). Then g has at
6.2 Composition Series 145

least ( m4 ) ≥ log3 n conjugates under G α , since the subgroup P ≤ G α can move


the set of coordinates of g where the elements with frequency m i1 , . . . , m il
occur to any set of m i1 + · · · + m il positions, and ( m i +···+m m
il
) ≥ ( m4 ). The
1
only way that no such indices i 1 , . . . , il exist is that some element of T oc-
curs in the sequence g = (t1 , . . . , tm ) at least m − 3 times. Therefore, the ele-
ments g ∈ N with less than log3 n conjugates must have S(g) = (1, 1, 1, m −
3), (1, 2, m − 3), (3, m − 3), (1, 1, m − 2), (2, m − 2), (1, m − 1), or (m). The
number of elements with these S(g)-values is ( m3 )|T |(|T |−1)(|T |−2)(|T |−3) +
( m3 )( 32 )|T |(|T | − 1)(|T | − 2) + ( m3 )|T |(|T | − 1) + ( m2 )|T |(|T | − 1)(|T | − 2) +
( m2 )|T |(|T | − 1) + ( m1 )|T |(|T | − 1) + ( m0 )|T | < |T | log6 n and the set of these
elements is the union of equivalence classes, so there are less than log6 n equiv-
alence classes with corresponding orbits in  of size less than log2 n. 

Lemma 6.2.20. Let G ≤ Sym(), with || = n, belong to case C(iii), and let
α ∈ . Then there exists β ∈  such that

(i) β is in an orbit of G α in \{α} of size less than log2 n.


(ii) The action of G on the cosets of any maximal subgroup K containing the
setwise stabilizer G {α,β} is not faithful or |G : K | ≤ n/2.

Proof. As usual, we use the notation of Theorem 6.2.7. Let t ∈ T be an in-


volution (note that t exists since, by the Feit–Thompson theorem, every finite
nonabelian simple group has even order). Let g := (t, 1, . . . , 1) ∈ N , and let
β := α t . Then β satisfies (i) since g has at most m|T | < log2 n conjugates
under G α .
Let K be a maximal subgroup K containing the setwise stabilizer G {α,β} , and
suppose that the action of G on the cosets of K is faithful. We have β g = α since
g is an involution, and so g ∈ K . This means that K ∩ N1 = 1 and therefore
the primitive action on the cosets of K belongs to case C(ii) of Theorem 6.2.7.
Hence |G : K | = |N1 : (N1 ∩ K )|m ≤ |T |m /2m ≤ n|T |/2m . Since m!/2 ≥ n,
we have |T | = n 1/(m−1) ≤ (m!/2)1/(m−1) ≤ 2m−1 and |G : K | ≤ n/2. 

The Algorithm for Case C(iii)


In case C(iii) the solution of (6.7) is to fix α ∈ , and for all β ∈  in orbits of
G α of size less than log2 n, construct the action of G on the cosets of a maximal
subgroup containing G {α,β} , and compute the kernel of the action.
By Lemma 6.2.19, the number of points β to be processed is O(log6 n). For a
fixed β, we construct the action of G on the set L := (α, β)G . By Lemma 6.2.17,
this can be done in nearly linear time by a deterministic algorithm. If (β, α) ∈ L
then we discard this β since it cannot be the one described in the proof of
146 A Library of Nearly Linear-Time Algorithms
Lemma 6.2.20. If (β, α) ∈ L then we construct a maximal block ⊆ L con-
taining (α, β) and (β, α), and the action of G on the block system consisting of
the images of . This action is permutation isomorphic to the action of G on
the cosets of a maximal subgroup containing G {α,β} .
If none of the algorithms described in Sections 6.2.3 and 6.2.4 find a solution
for (6.7) then, with high probability, the input primitive group is simple.

6.2.5. Implementation
Composition series computations are implemented in both GAP and Magma.
Reduction to the primitive case proceeds almost as described in Section 6.2.1;
however, primitive groups are handled somewhat differently, based on ideas
from [Kantor, 1991]. We give some details about the GAP implementation; the
version in Magma, as described in [Cannon and Holt, 1997], is similar. The
current GAP implementation works for inputs of degree up to 106 , whereas the
Magma implementation works up to degree 107 .
The GAP implementation starts with checking whether the input group is
solvable. This can be done very efficiently by an SGS construction for solvable
groups from [Sims, 1990]. We shall describe this algorithm in Section 7.1. If the
input group G is solvable then the output is not only an SGS but a composition
series as well; if G is not solvable then the SGS construction reports failure. For
nonsolvable inputs, we apply one more reduction step, the computation of de-
rived subgroups, besides the transitive constituent and block homomorphisms
indicated in Section 6.2.1. This eventually reduces the composition series prob-
lem to one of finding a composition series in primitive perfect groups.
Kantor’s idea is that there are categories of the O’Nan–Scott theorem such
that no primitive perfect group G ≤ Sym(), with || = n, can belong to these
categories if n is in the practical range n ≤ 107 . Moreover, just by looking at
n and |G|, for most inputs G we can determine which category of the O’Nan–
Scott theorem G belongs to. Therefore, we do not have to try all algorithms
designed for the different categories. In particular, simple groups can be often
recognized without any further work.

Lemma 6.2.21. Let G ≤ Sym(), with || = n, be a primitive perfect group,


and let n ≤ 107 . Then one of the following holds:
(1) n = p d for some prime p, G has a regular normal abelian subgroup, and
|G| divides p d |GLd ( p)|;
(2) n is the order of a finite simple group, |G| = n 2 , and G is the direct product
of two isomorphic single groups each acting regularly;
6.2 Composition Series 147
(3) n = n r1 for some values n 1 , r occurring in the first and second table in
(6.8), respectively; morever, |G| = ts r u for some value s occurring in line
n 1 of the first table and for some value u occurring in line r of the second
table, t|12r , and G falls in case II(ii) of Theorem 6.2.5; or
(4) G is simple.

n1 s
5 |A5 |
6 |A5 |; |A6 |
7 |PSL2 (7)|; |A7 |
8 |PSL2 (7)|; |A8 |
9 |PSL2 (8)|; |A9 |
10 |A5 |; |A6 |; |A10 |
11 |PSL2 (11)|; |M11 |; |A11 | r u
12 |PSL2 (11)|; |M11 |; |M12 |; |A12 |
5 |A5 |
13 |PSL3 (3)|; |A13 |
6 |A5 |; |A6 |
14 |PSL2 (13)|; |A14 |
7 |PSL2 (7)|; |A7 |
15 |A6 |; |A7 |; |A8 |; |A15 |
8 |PSL2 (7)|; |AGL3 (2)|; |A8 |
16 |A16 |
9 |PSL2 (8)|; |A9 |
17 |PSL2 (16)|; |A17 |
10 |A5 |; 24 · |A5 |; |A6 |; |A10 |
18 |PSL2 (17)|; |A18 |
19 |A19 |
20 |PSL2 (19)|; |A20 |
21 |A7 |; |PSL2 (7)|; |PSL3 (4)|; |A21 |
22 |M22 |; |A22 |
23 |M23 |; |A23 |
24 |PSL2 (23)|; |M24 |; |A24 |
25 |A25 |
(6.8)

Proof. We go through the cases of Theorem 6.2.5 and use the notation of that
theorem. If G belongs to case I(i) then it belongs to case (1) of this lemma.
If G belongs to case I(ii) then the minimal normal subgroups N1 , N2 of G are
isomorphic to T r for some simple group T (cf. Exercise 6.7) and n = |T |r . The
conjugation action of G permutes the r simple groups in N1 and N2 , and so G
has a factor group isomorphic to a subgroup F ≤ Sr × Sr . We must have r ≤ 3
since |T | ≥ 60 and 604 > 107 . Since G is perfect, F = 1, and so G/Soc(G)
is isomorphic to a subgroup of Out(T )2r . By Schreier’s hypothesis, Out(T )2r is
solvable, so we must have G = Soc(G) = N1 × N2 . Our last observation is that
r = 1 because otherwise, identifying  with N1 = T1 × · · · × Tr , we have that
148 A Library of Nearly Linear-Time Algorithms

the set 1T1 is a nontrivial block of imprimitivity for G (cf. Exercise 6.7 again).
Hence we are in case (2).
Case II(i) cannot occur since the perfectness of G implies r ≥ 5 and n ≥ 605
(in fact, the smallest degree of a primitive group belonging to case II(i) is 606 ,
cf. [Dixon and Mortimer, 1996, Theorem 4.7B(iv)]).
If G belongs to case II(ii) and r = 1 then, by Schreier’s hypothesis, G belongs
to case (4) of this lemma. If r > 1 then the transitive permutation action of G
on {T1 , . . . , Tr } is isomorphic to a perfect subgroup of Sr of order u, and n = n r1
for some integer n 1 . Moreover, by Exercise 6.8, there is a primitive permutation
representation of degree n 1 of a subgroup of Aut(T ) containing Inn(T ). Note
the subtle point that the permutation representation of degree n 1 of T ∼ = Inn(T )
may not be primitive; the smallest example, in fact the only one in our range
n ≤ 107 , occurs when n 1 = 21 and r = 5, and T = PSL2 (7). We are in case
(3), and the possibilities for n 1 , r, u, and s := |T | are listed in Table (6.8). The
group orders s, u are given in a form that indicates the structure of the groups.
The Mathieu groups are denoted by Mi (i ∈ {11, 12, 22, 23, 24}). The factor t
in the order of G in this case is the contribution of the outer automorphisms
of T .
Finally, we show that G cannot belong to case II(iii). Suppose the contrary.
Then we must have l = 1 or l ≥ 5 since G is perfect. Here l ≥ 5 cannot occur
because n ≤ 107 . If l = 1 then k ≥ 5, using again that G is perfect; hence
n ≥ 604 > 107 , a contradiction. 

Both algorithms described in Section 6.2.3 for the handling of case (1) are
implemented in GAP. Neumann’s method runs somewhat faster in practice, so
it is the current default technique. The algorithm for Frobenius groups differs
from the one given in Section 6.2.3. By Exercise 6.12(v), if G ≤ Sym(),
with || = p d , is a perfect Frobenius group then G/Soc(G) ∼ = SL2 (5). Hence,
taking random elements of G, we have a 1/120 chance of finding an element of
the socle. Note that it is very easy to test whether some g ∈ G is in Soc(G): This
happens if and only if g p = 1. Alternatively, g ∈ Soc(G) if and only if g = 1 or
g fixes no points of .
Groups belonging to case (2) are handled by the extension of Neumann’s
method, described in Remark 6.2.14.
For groups belonging to case (3), the algorithm is different from the one given
in Section 6.2.4. Let α ∈  and let K denote the kernel of the permutation action
of G on {T1 , . . . , Tr }. We compute the smallest orbit of G α in \{α}. It can be
checked that in each group in case (3), using the terminology of Lemma 6.2.16,
consists of points with exactly one nontrivial coordinate. Next, we compute
6.2 Composition Series 149
the (unique) minimal block system
for G α | and the kernel M of the action
of G α on
. With one exception, points in belong to the same block of
if
and only if their nontrivial coordinate is in the same position, and so M = K α .
The exception is when n 1 = 5, r = 10, and u = 24 · |A5 | in Table (6.8), in
which case the nontrivial coordinates can be in two positions for points in the
same block, and M is the extension of K α by an elementary abelian group of
order 16. Therefore, we can obtain Soc(G) as M G  or M G  (we may need
the second derived subgroup in the case n 1 = 21, s = |PSL3 (4)| as well).

6.2.6. An Elementary Version


The purpose of this section is twofold. First, we describe the nearly linear-time
Monte Carlo composition series algorithm from [Beals, 1993b], which does
not use consequences of the classification of finite simple groups. Then we
describe how one can upgrade Monte Carlo composition series constructions
to Las Vegas type, provided that a base and SGS, or the order, is given for the
input group.
Beals’s method solves the following basic problem:

Given a group G, find generators for


a simple subnormal subgroup T of G. (6.9)

If 1  T  G and T  G then of course T G  is a proper normal subgroup of


G. Hence, if we can solve (6.9) in nearly linear time then, by Lemma 6.2.2, we
can also compute a composition series in nearly linear time.
The reduction to the primitive case proceeds as described in Section 6.2.1. If
G ≤ Sym() is not transitive or transitive but not primitive then either we obtain
a faithful permutation representation on at most ||/2 points or we can replace
G by the kernel N of a transitive constituent or block homomorphism, since a
simple subnormal subgroup of N is subnormal in G as well. The handling of
primitive groups with no regular normal subgroups is based on the following
result, which is an extension of the well-known lemma of Iwasawa used in the
textbook proofs of the simplicity of the groups PSLd (q) (cf. [Huppert, 1967,
Hilfssatz 6.12]).

Lemma 6.2.22. Let G ≤ Sym() be primitive and α ∈ . Suppose further that


T  G α and T is simple. Let M := T G α  and N := M G . Then

(i) N = M Soc(G).
(ii) If T is solvable then N /Soc(G) is solvable.
(iii) If T is nonabelian then either N = Soc(G) or C Nα (M) = Soc(G)α .
150 A Library of Nearly Linear-Time Algorithms
Proof. (i) Since Soc(G) is transitive, any g ∈ G can be written in the form
g = g1 h for some g1 ∈ G α and h ∈ Soc(G). Therefore, since M  G α , we have
M g = M g1 h = M h ≤ MSoc(G) and so N ≤ MSoc(G). Conversely, M ≤ N
by definition and we claim that Soc(G) ≤ N . This is clear if Soc(G) is the
unique minimal normal subgroup of G. If G has two minimal regular normal
subgroups then Soc(G)α ≤ M ≤ N by Exercise 6.15, and N also contains a
minimal normal subgroup N1 of G. Hence Soc(G) = Soc(G)α , N1  ≤ N .
(ii) Suppose that T is a cyclic group of order p for some prime p. Then M is
a p-group by Exercise 6.17(ii) and so N /Soc(G) ∼ = M/(Soc(G) ∩ M) is also a
p-group.
(iii) If T is nonabelian then M is a minimal normal subgroup of G α by
Exercise 6.17(i). Since Soc(G)α  G α , we must have M ≤ Soc(G)α or M ∩
Soc(G)α = 1. If M ≤ Soc(G)α then N = Soc(G). If M ∩ Soc(G)α = 1 then
Soc(G)α centralizes M and so Soc(G)α ≤ C Nα (M). Conversely, if c ∈ C Nα (M)
then Soc(G)c centralizes MSoc(G)/Soc(G). However, MSoc(G)/Soc(G) ∼ =
M, so it has trivial center, implying c ∈ Soc(G), and so c ∈ Soc(G) ∩ Nα =
Soc(G)α . 

Beals’s Algorithm for Groups with Regular Normal Subgroups


If a primitive group G has a regular normal subgroup then we can solve (6.9)
in nearly linear time by constructing a minimal normal subgroup N of G by
the methods of Sections 6.2.3 and 6.2.4. Note that these algorithms do not
depend on the classification of finite simple groups. If N is elementary abelian
then for any nonidentity g ∈ N , g is a subnormal simple subgroup of G. If
N = T1 × · · · × Tr with isomorphic, nonabelian simple Ti then we construct a
minimal normal subgroup of N by applying recursively the composition series
algorithm for N . Note that the only subroutines that may be called during this
recursion are the transitive constituent and block homomorphism computations
of Section 6.2.1 and the algorithm for handling primitive groups with two regular
normal subgroups, since a direct product of simple groups cannot fall into other
categories of the O’Nan–Scott theorem.

Beals’s Algorithm for Groups with No Regular Normal Subgroups


If a primitive group G ≤ Sym() has no regular normal subgroups then we
construct Soc(G) as follows: First, we recursively construct a simple subnor-
mal subgroup T  G α for some α ∈  and compute M := T G α  and N :=
M G . If T is abelian then we compute the solvable residual N ∞ of N . By
Lemma 6.2.22(ii), Soc(G) = N ∞ . If T is nonabelian then we compute C Nα (M)
and (C Nα (M))G . Since M = Mα  Nα , this can be done in nearly linear time
6.2 Composition Series 151
by the algorithm of Section 6.1.4. By Lemma 6.2.22(iii), Soc(G) is the smaller
of N and (C Nα (M))G . After Soc(G) is constructed, we solve (6.9) for G by
recursively computing a composition series for Soc(G). We note again that only
the homomorphism computations of Section 6.2.1 and the algorithm handling
primitive groups with two regular normal subgroups may occur during this
recursion.
We claim that the running time of the algorithm for solving (6.9) is nearly
linear for any input group G. Let f (n, k) be the maximum time requirement of
the algorithm for groups G ≤ Sn with |G| ≤ k. For an input G ≤ Sn , the algorithm
calls itself recursively only at most once, when the reduction subroutines of
Section 6.2.1 constructed some H  G such that H acts primitively, with no
regular normal subgroups on some permutation domain with | | ≤ n. In
this case, the recursive call is made for the subgroup Hδ for some δ ∈ . The
other steps of the algorithm are nearly linear, so f (n, |G|) ≤ Cn logc |G| +
f (| |, |Hδ |) for some absolute constants c and C. Clearly, |Hδ | ≤ |G|/2; thus


&log |G|'
f (n, G) ≤ Cn(log |G| − i)c ≤ Cn logc+1 |G|.
i=0

Verification of a Composition Series


Our next goal is to show how Lemma 6.2.22 can be used to verify in nearly
linear time the correctness of a composition series construction, provided that
at least the order of the input group G is known. In this case, we can compute a
nonredundant base and an SGS for G and subgroups of G by nearly linear-time
Las Vegas algorithms, and all algorithms in Chapter 5 and Section 6.1 have Las
Vegas versions. In fact, if the initial SGS computation for G succeeds (and we
can check that since |G| is given) then all further computations we require from
Chapter 5 and Section 6.1 can be done in nearly linear time by deterministic
algorithms. The verification of composition series will use the classification of
finite simple groups.
Given G = S ≤ Sym(), with || = n, the output of the composition series
algorithm is a sequence of subgroups G = N1  N2  · · ·  Nl = 1 and homo-
morphisms ϕi : Ni → Sn with ker(ϕi ) = Ni+1 for i ∈ [1, l − 1]. We can verify
that Ni+1  Ni since we can test membership in the subgroups Ni ; therefore,
the only task left is to verify that ϕi (Ni ) is simple, for i ∈ [1, l − 1]. Further-
more, we can assume that ϕi (Ni ) acts primitively on some set i since we
can apply transitive constituent and block homomorphism algorithms to ϕi (Ni )
with guaranteed correct output and check with certainty that the kernel of these
homomorphisms is trivial. Hence the problem of upgrading the composition
152 A Library of Nearly Linear-Time Algorithms
series algorithm to Las Vegas type is reduced to the following task:

Given a nonredundant base and SGS for a primitive


group G ≤ Sym( ), decide by a nearly linear-time
Las Vegas algorithm whether G is simple. (6.10)

Solving (6.10) is an important theoretical exercise. However, in the practical


range | | ≤ 107 , only a small part of the following discussion is needed. We
leave the details to the reader (cf. Exercise 6.18).
First, we check that |G| is the order of some simple group. We claim that
the arithmetic operations required for this computation can be done in nearly
linear time. Since all prime divisors of |G| are at most | |, we can factor |G|
in nearly linear time. After that, we can identify O(logc |G|) simple groups
S such that if |G| is the order of a simple group then |G| must be the order
of one of these candidates. A very rough estimate provides c = 3. There are
O(log |G|) sporadic simple or alternating groups S with |S| ≤ |G| and there is
at most one cyclic simple group whose order is divisible by |G|. For each of the
O(log |G|) prime divisors p of |G| and each infinite series of Lie-type groups
in characteristic p, there are O(log |G|) possibilities for the size of the defining
field of S and O(log |G|) possibilities for the rank of S.
If |G| is the order of a nonabelian simple group then we solve (6.10) by a
recursive procedure. For some δ ∈ , we compute a simple subnormal subgroup
T  G δ by the nearly linear-time Monte Carlo algorithm for (6.9). Recursively,
we solve (6.10) for T and verify that T is really simple. Then we compute
M := T G δ  and N := M G .
If T is abelian then we compute the solvable residual N ∞ of N . By
Lemma 6.2.22(ii), if G has a regular abelian normal subgroup then N ∞ = 1 and
otherwise N ∞ = Soc(G). Hence, if N ∞ = G then we declare with certainty
that G = Soc(G) and so G is not simple. If G = Soc(G) then we declare that
G is simple. This decision is correct, since G = Soc(G) is the direct product
of isomorphic simple groups and so |G| = s r , where s is the order of some
simple group S and r is a positive integer. We also know that |G| is the order
of a simple group. By the Ph.D. thesis results of Teague (cf. [Kimmerle et al.,
1990, Section 6]), if s, t are orders of finite simple groups and r is a positive
integer satisfying s r = t then s = t and r = 1.
If T is nonabelian then we first check that G = N . If G = N then of course
G is not simple, so we can suppose G = N . Next, we compute C Nδ (M). By
Lemma 6.2.22(iii), C Nδ (M) ≤ Soc(G)δ , with equality if N = Soc(G). Hence, if
C Nδ (M) = 1 then we compute K := (C Nδ (M))G . We have 1 = K ≤ Soc(G).
If K = G then G is not simple since K is a proper normal subgroup; however,
if K = G then G = Soc(G) and, by Teague’s result, G is simple.
6.2 Composition Series 153
We are left with the case that G = N and C Nδ (M) = 1. We have to decide
whether N = Soc(G) (which implies Soc(G)δ = 1 by Lemma 6.2.22(iii)) or
N = Soc(G) = G (which implies that G is simple by Teague’s result). If | |
is not a power of the order of a simple group then the first of these cases cannot
occur and so G is simple. Hence we can suppose that | | = s r , where s is the
order of some (not necessarily nonabelian) simple group.

Lemma 6.2.23. Let G ≤ Sym( ) be primitive, where | | = pr for some prime


p and integer r . Suppose that |G| is the order of a simple group, T is a simple
group such that T  G δ for some δ ∈ , and for M = T G δ , we have G =
M G  and C G δ (M) = 1. Then G is not simple if and only if M = T = G δ and
|G δ | divides |GLr ( p)|.

Proof. We have seen that if a primitive group G satisfies the conditions of the
lemma then either Soc(G) is regular abelian or G is simple.
If Soc(G) is regular abelian then G δ ∩ Soc(G) = 1, so G = N = MSoc(G)
implies that M = G δ . This also implies that T = M, since if T = G δ then T G δ 
is a proper normal subgroup of G δ , contradicting M = G δ . The other property
in the conclusion of the lemma, that |G δ | divides |GLr ( p)|, obviously holds in
this case.
If G is simple then by [Guralnick, 1983] or [Kantor, 1985a], the pair (G, G δ )
is one of the following:

(i) G ∼= A pr and G δ ∼
= A pr −1 ;

(ii) G = PSLd (q) with (q d − 1)/(q − 1) = pr , and G δ is a maximal parabolic
subgroup of G;
(iii) G ∼= PSL2 (11), pr = 11, and G δ ∼
= A5 ;
∼ ∼
(iv) G = M23 , p = 23, and G δ = M22 ;
r

(v) G ∼ = M11 , pr = 11, and G δ ∼ = M10 ;


(vi) G ∼ = PSU4 (2), pr = 27, and G δ is a maximal parabolic subgroup of G of
order 960.

Here M23 , M22 , and M11 denote Mathieu groups, and M10 is an index two
extension of A6 . In cases (ii), (v), and (vi) G δ is not simple, whereas in cases
(i), (iii), and (iv) |G δ | does not divide |GLr ( p)|. 

Based on Lemma 6.2.23, we can easily finish the solution of (6.10) in the
case when | | is a prime power. The last remaining case is to decide whether
Soc(G) is regular nonabelian. We shall do this by showing that, for primitive
groups with regular nonabelian socle, (6.7) from Section 6.2.1 can be solved in
nearly linear time by a deterministic algorithm.
154 A Library of Nearly Linear-Time Algorithms
Lemma 6.2.24. Let G ≤ Sym() be primitive with regular nonabelian socle
and let α ∈  and || = n. Then the smallest orbit of G α in \{α} has size
less than log6 n.

Proof. Let Soc(G) = T1 × · · · × Tr with all Ti isomorphic to a simple non-


abelian group T , and let ν(T ) denote the degree of the smallest faithful permu-
tation representation of T . By [Dixon and Mortimer, 1996, Theorem 4.7B(iv)],
T is isomorphic to a section (i.e., a factor group of a subgroup) of Sr −1 ; therefore,
by Corollary 6.2.3, r − 1 ≥ ν(T ).
Let s denote the size of the smallest conjugacy class of Aut(T ) in Inn(T )\{1}.
We claim that s < ν 5 (T ). For sporadic T , this can be checked from the Atlas
[Conway et al., 1985]; for alternating groups, the number of 3-cycles satisfies
this inequality; for classical groups of Lie type, ν(T ) can be obtained from
[Cooperstein, 1978], and the number of transvections (or number of root ele-
ments in the orthogonal case) satisfies the inequality; and for exceptional groups
of Lie type, |T | < ν 5 (T ) (cf. [Kantor and Penttila, 1999]).
As we discussed before Lemma 6.2.16 in Section 6.2.4, the points in 
correspond to the cosets of Soc(G)α = 1 in Soc(G), and the permutation action
of G α on  is the same as the conjugation action of G α on these cosets. Hence
the smallest orbit of G α in \{α} has size at most sr < r 6 < log6 n. 

By Lemmas 6.2.24 and 6.2.17, if G has regular nonabelian socle then the
action of G on the cosets of G αβ can be constructed by a nearly linear-time
deterministic algorithm, provided that β is chosen from the smallest orbit of
G α in \{α}. After that, the action of G on the cosets of a maximal subgroup
containing NG (G αβ ) can be constructed as described in Section 6.2.4. This latter
action either is not faithful or falls into a different case of Theorem 6.2.5 (in
fact, into case II(ii)). If the first of these alternatives holds then G is not simple,
and in the second case we have seen already how (6.10) can be solved.
The last observation we have to make is that the algorithm solving (6.10) can
be organized such that it calls itself recursively only once, for a group of order
at most half of the order of the input group. If this property holds then the same
argument that we used earlier in this section in the case of (6.9) shows that
the running time is nearly linear. As we described the algorithm, one recursive
call is made for the simple group T  G δ . The only way a second recursive
call may occur is that the subroutine for the handling of groups with regular
nonabelian socle returns a faithful primitive permutation representation on a
smaller domain, and we check the simplicity of a subnormal subgroup of a
point stabilizer in this new representation. To avoid two recursive calls, we
postpone the verification of the simplicity of T as the last step of the algorithm,
6.2 Composition Series 155
and if a new permutation representation is constructed then we do not perform
the verification of T at all.

6.2.7. Chief Series


Given a nonredundant base and SGS, and a composition series G = N1  N2
 · · ·  Nl = 1 for some group G ≤ Sym(), we can construct a chief series as
follows: First, we compute the subgroup chain G = H1 ≥ H2 ≥ · · · ≥ Hl = 1,
where Hi := NiG . The subgroups Hi are normal in G and, by repeated appli-
cations of Exercise 6.17, we see that if Hi = Hi+1 then Hi /Hi+1 is the direct
product of simple groups isomorphic to Ni /Ni+1 if Ni /Ni+1 is nonabelian, and
Hi /Hi+1 is a p-group if Ni /Ni+1 ∼ = C p . Moreover, by the same exercise, the
nonabelian groups Hi /Hi+1 are chief factors, and so the only task left is the
refinement of the solvable factors Hi /Hi+1 .
If K is a p-group then K p K  is a characteristic subgroup of K , where K p
denotes the subgroup of K generated by the pth power of elements of K , and
the factor group K /K p K  is elementary abelian. (The subgroup K p K  is called
the Frattini subgroup of K , and it is the intersection of all maximal subgroups of
p
K .) Therefore, if Hi = Si  and Hi /Hi+1 is a p-group then we compute Si :=
{g p | g ∈ Si } and Hi,1 := Si , Hi , Hi+1 . We have Hi ≥ Hi,1 ≥ Hi+1 and
p

Hi,1  G, and the factor Hi /Hi,1 is an elementary abelian p-group. Repeating


this procedure recursively in Hi,1 , we can construct a chain of normal subgroups
Hi = Hi,0 ≥ Hi,1 ≥ · · · ≥ Hi,m i = Hi+1 of G, with elementary abelian factors
Hi, j /Hi, j+1 . So far, obviously everything could be done by nearly linear-time
deterministic algorithms.
The only nontrivial step of the chief series construction is to decide whether
in the resulting series of normal subgroups G = M1  M2  · · ·  Mm = 1 the
elementary abelian factors are chief factors. Suppose that Mi /Mi+1 is an el-
ementary abelian p-group, and |Mi /Mi+1 | = p d . The factor group Mi /Mi+1
can be identified with the vector space GF( p)d , and the conjugation action of
G on Mi /Mi+1 is isomorphic to a subgroup G i ≤ GLd ( p). The factor Mi /Mi+1
is a chief factor if and only if G i acts irreducibly on GF( p)d . We can construct
generators for G i (as d × d matrices over GF( p)) by the following procedure.
p
Suppose that Mi = Mi+1 , y1 , . . . , yd , with y j ∈ Mi+1 for j ∈ [1, d]. We con-
struct an SGS S0 with shallow Schreier trees for Mi+1 and then, recursively for
j = 1, 2, . . . , d, we augment S j−1 to an SGS S j for Mi+1 , y1 , . . . , y j  by the
algorithm described in Lemma 7.1.2. By Exercise 7.2, there are z j ∈ Mi+1 y j
for j ∈ [1, d] such that any label in the Schreier tree data structure defined by
Sd is from S0 or is a power of some z j . During the SGS constructions, we can
easily keep track of the exponents of the labels as powers of the z j . To construct
156 A Library of Nearly Linear-Time Algorithms
g
the conjugation action of any g ∈ G on Mi /Mi+1 we write z j as a word w( j, g)
g
in Sd by sifting z j as a word in the Schreier tree data structure of Mi and then
deleting the elements of S0 from w( j, g). The remaining word defines the im-
age of the basis vector z j at the linear transformation ĝ ∈ G i , corresponding
to g.
After the matrix generators for G i are constructed, the irreducibility of the
action of G i on GF( p)d can be decided by Norton and Parker’s Meat-Axe
procedure (cf. [Parker, 1984]). The version described in [Holt and Rees, 1994]
and augmented in [Ivanyos and Lux, 2000] is very fast in practice, is of Las Vegas
type, and runs in polynomial time in d, log p, and the number of generators (and
hence in nearly linear time in terms of the original permutation group input
G ≤ Sym()). The output is either a certificate that the action is irreducible,
and so Mi /Mi+1 is a chief factor, or generators for a proper invariant subspace,
which can be used to insert a normal subgroup between Mi and Mi+1 .

6.3. Quotients with Small Permutation Degree


For iterative processes like computing the upper central series, it would be useful
to construct permutation representations of quotient groups. However, as shown
in [Neumann, 1986] (cf. Exercise 6.6), there are examples of G ≤ Sn , N  G
such that G/N has no faithful permutation representation on less than 2n/4
points. On the positive side, it is proven in [Easdown and Praeger, 1988] and
[Luks, 1990] that if N is abelian then there exists a homomorphism ϕ : G → Sn
such that N ≤ ker(ϕ) and, in a sense, ker(ϕ) is not too big. Also, as observed
by Luks, ϕ can be constructed efficiently. The method was also rediscovered in
[Holt, 1997].

Theorem 6.3.1. Let G ≤ Sym(), with || = n, and let N  G be abelian. Then
there exists a homomorphism ϕ : G → Sn such that N ≤ ker(ϕ) and the prime
divisors of N and | ker(ϕ)| are the same. In particular, if N is a p-group then
so is ker(ϕ); moreover, if N is elementary abelian then ker(ϕ) is elementary
abelian. Given generators for G and N , ϕ can be constructed in linear time by
a deterministic algorithm.
k
Proof. Let 1 , . . . , k be the orbits of N and so  = i=1 i . By Corol-
lary 6.1.5, N acts regularly on each set i , so N | i has | i | elements. Let
i
k
denote the set of permutations in N | i , and let
:= i=1
i . Clearly, |
| = n.
We shall define an action of G on
, satisfying the conclusion of the theorem.
g
Given g ∈ G and σi ∈
i , we define σi ∈
in the following way: By
g
Lemma 6.1.7, G permutes the orbits of N ; suppose that i = j . Take any
6.3 Quotients with Small Permutation Degree 157
g
x ∈ N with x| i = σi , and let σi := x g | j . This map is well defined since if
x| i = y| i for some x, y ∈ N then x g | j = y g | j . It is also clear that this
construction defines a homomorphism ϕ : G → Sym(
).
Now N ≤ ker(ϕ), since N acts trivially on itself by conjugation. Moreover, if
g ∈ ker(ϕ) then g fixes the orbits of N and g| i centralizes N | i . By the second
part of Corollary 6.1.5, N | i is self-centralizing in Sym( i ), so g| i ∈ N | i .
Hence the transitive constituents of N and ker(ϕ) are the same, implying the
required relations between the prime factors and exponents.
To see the algorithmic feasibility of the construction, let us fix a representa-
tive δi ∈ i in each orbit of N and build a (breadth-first-search) Schreier tree
Ti with root δi for N mod Nδi . In this tree, contrary to the usual Schreier tree
constructions, we direct the edges away from the root. Identifying the trivial
permutation in
i with δi , the elements of
i naturally correspond to the ele-
g
ments of i . If σi ∈
i corresponds to γ ∈ i then, to compute σi , it is enough
g
to determine the image of δ j under the permutation xγ , where xγ is the prod-
uct of edge labels in Ti along the path Pγ from δi to γ . (Of course, we do
not compute the permutation xγ ; instead, we work with the word consisting of
the sequence of edge labels g
along Pγ .) This path Pγ may be (| i |) long, so
computing one image δ jγ may grequire (| i |) time; however, noticing that
x

g
if
x
δ is the parent of γ in Ti and δ jδ is already computed then we can obtain δ jγ in
x

g
constant time, we can construct the entire image
i in O(| i |) time. 

6.3.1. Solvable Radical and p-Core


For a group G, the solvable radical (in short, radical) O∞ (G) is defined as the
largest solvable normal subgroup of G and, for prime numbers p, the p-core
O p (G) is the largest normal p-group in G. Note that, since two solvable nor-
mal subgroups generate a solvable normal subgroup, there is a unique maximal
solvable normal subgroup in G. Similarly, G has a unique maximal normal
p-subgroup. By the same argument, we see that O∞ (G) and O p (G) are char-
acteristic in G.
Based on Theorem 6.3.1, [Luks, 1990] gave polynomial-time algorithms for
computing O∞ (G) and O p (G). For p-cores, another polynomial-time algorithm
is described in [Neumann, 1986], and p-cores can be also obtained as the
intersection of Sylow p-subgroups (cf. [Butler and Cannon, 1989]), although
the computation of a Sylow subgroup is much more difficult than obtaining the
p-core. Here we follow the nearly linear-time version from [Luks and Seress,
1997]. The basic idea is the same as in [Luks, 1990]. The radical and p-core are
computed similarly, so we handle both of them simultaneously. Our primary
158 A Library of Nearly Linear-Time Algorithms
description is the computation of the radical; the necessary changes for the
computation of the p-core are indicated in double brackets.
First, let us consider the special case when G has a maximal normal subgroup
N with O∞ (N ) = 1[[O p (N ) = 1]]. It turns out that this extra condition restricts
severely the radical and p-core of G.

Lemma 6.3.2. Suppose that N ≤ G is a maximal normal subgroup and


O∞ (N ) = 1[[O p (N ) = 1]]. Then, if G/N is not cyclic [[not a p-cycle]] then
O∞ (G) = 1[[O p (G) = 1]].

Proof. Since the radical and p-core of G intersect N trivially, they must central-
ize N . In particular, if the radical [[ p-core]] of G is nontrivial then C G (N ) has
nontrivial radical [[ p-core]]. If C G (N ) ≤ N then its radical [[ p-core]] is triv-
ial. If C G (N ) ≤ N then G = N C G (N ) and C G (N )/Z (N ) ∼ = G/N , so C G (N )
has a chance to have nontrivial radical [[ p-core]] only if G/N is cyclic [[is a
p-cycle]]. 

Lemma 6.3.3.

(i) Suppose that N ≤ G is a maximal normal subgroup, O∞ (N ) = 1, and G/N


is cyclic. Then O∞ (G) = 1 if and only if C G (N ) = 1.
(ii) Suppose that N ≤ G is a maximal normal subgroup, O p (N ) = 1, and G/N
is a p-cycle. Then O p (G) = 1 if and only if C G (N ) ≤ N .

Proof. (i) Since Z (N ) = 1, we must have C G (N ) = 1 or C G (N ) ∼ = G/N . In the


latter case, O∞ (G) = C G (N ) = 1.
(ii) First, we observe that O p (Z (N )) = 1, since O p (Z (N ))  N . If C G (N ) ≤
N then C G (N ) = Z (N ) and so O p (C G (N )) = 1. If C G (N ) ≤ N then C G (N )/
Z (N ) ∼= C p and C G (N ) = Z (N ), g for any g ∈ C G (N )\Z (N ). Since g com-
mutes with N , it commutes with Z (N ). Therefore C G (N ) is abelian, which
implies O p (C G (N )) ∼
= C p and O p (G) = O p (C G (N )) = 1. 

The Algorithms
Now we are ready to describe the algorithms. First, we compute a composition
series 1 = N1  N2  · · ·  Nm = G. By the results of Section 6.2, this can
be done in nearly linear time. Then we find the smallest index i such that
Ni has a nontrivial radical [[ p-core]] by the following method: Suppose that
O∞ (Ni ) = 1[[O p (Ni ) = 1]]. If Ni+1 /Ni is not cyclic [[not a p-cycle]] then, by
Lemma 6.3.2, we conclude that O∞ (Ni+1 ) = 1[[O p (Ni+1 ) = 1]]. Otherwise,
we compute C Ni+1 (Ni ), using the algorithm of Section 6.1.4. If C Ni+1 (Ni ) ≤ Ni
Exercises 159

then, by Lemma 6.3.3, we get a nontrivial radical [[ p-core]]. We take its normal
closure H in G; by Exercise 6.17(ii), it is a solvable normal subgroup [[normal
p-group]] in G.
Next, we compute the derived series of H . The last nontrivial term in the
derived series is an abelian normal subgroup [[abelian normal p-group]] N of
G. We compute the homomorphism ϕ : G → Sn described in Theorem 6.3.1
and repeat the procedure in the image of ϕ. Note that the composition series
computation need not be repeated, since 1 = ϕ(N1 )  ϕ(N2 )  · · ·  ϕ(Nm ) =
ϕ(G), with ϕ(Ni+1 )/ϕ(Ni ) ∼ = Ni+1 /Ni or ϕ(Ni+1 )/ϕ(Ni ) = 1.
Theorem 6.3.1 and computing O∞ (G) are becoming more and more
important. In any group G, we can define a series of characteristic subgroups
1 ≤ N1 ≤ N2 ≤ N3 ≤ G, where N1 = O∞ (G), N2 /N1 = Soc(G/N1 ) ∼ = T1 ×
· · · × Tm is the direct product of nonabelian simple groups, N3 /N2 
Out(T1 ) × · · · × Out(Tm ), and G/N3 is a permutation group of degree m,
corresponding to the conjugation action of G on {T1 , . . . , Tm }. Constructing
these subgroups is one of the main approaches for matrix group computations
(cf. [Babai and Beals, 1999; Kantor and Seress, 2002]), but a number of
recent permutation group algorithms utilize them as well, for example, for
computing conjugacy classes (cf. [Cannon and Souvignier, 1997; Hulpke,
2000]), maximal subgroups (cf. [Eick and Hulpke, 2001; Cannon and Holt,
2002]), or automorphism groups (cf. [Holt, 2001]).
In the permutation group setting, during the construction of O∞ (G) we obtain
faithful permutation representations for the factor groups G/K i for a sequence
of normal subgroups 1 = K r ≤ K r −1 ≤ · · · ≤ K 1 = N1 = O∞ (G), with elemen-
tary abelian factors K i /K i+1 . First we solve the problem at hand (for example,
the computation of conjugacy classes) for G/O∞ (G). The bulk of the work in
this step is the handling of the simple nonabelian groups T j in Soc(G/O∞ (G)),
which can be done by identifying these groups with a standard copy of the T j
(cf. Section 8.3 for details and a precise formulation of the identification). In the
standard copy, the problems indicated here can be solved more easily than in
arbitrary permutation representations. After that, we lift the result recursively
from G/K i to G/K i+1 , for i ∈ [1, r − 1]. For that task, techniques for solvable
permutation groups are used. We shall give examples of such lifting procedures
in Section 7.3.

Exercises
6.1. Let G ≤ Sym() be primitive and 1 = N  G. Prove that N is transitive.
6.2. Finish the proof of Lemma 6.1.9.
6.3. Construct a permutation group G ≤ Sym(Ω), with |Ω| = n, and with
n/(2 log n) orbits that are of the same size but pairwise not equivalent
in the equivalence relation defined in Section 6.1.2. Hint: We may choose
G to be an elementary abelian 2-group.
6.4. Let H, G ≤ Sym() such that G normalizes H . Prove that G normalizes
CSym() (H ).
6.5. Prove that if H ≤ G then H ⊴⊴ G if and only if the subgroup chain defined
recursively as H0 := G, Hi := ⟨H^{H_{i−1}}⟩ for i > 0 reaches H.
6.6. [Neumann, 1986] Let  = {α1 , . . . , α4m }. For i ∈ [1, m], let G i be a
dihedral group of order 8, acting on {α4i−3 , α4i−2 , α4i−1 , α4i } and fixing
the other 4m − 4 points of . Let z i be the generator of Z (G i ). Prove
that the subgroup N := ⟨z1 z2, z1 z3, . . . , z1 zm⟩ is normal in G := G1 ×
· · · × G m and G/N has no faithful permutation representation of degree
less than 2^{m+1}.
6.7. Let G ≤ Sym() be primitive, with two regular normal subgroups N1
and N2 . Prove that N1 ∼ = N2 ∼= N for some characteristically simple non-
abelian group N and that  can be identified with N such that N1 and N2
are the right-regular and left-regular representations of N on itself.
6.8. Let 1 ≠ K ≤ Sym(Γ) and 1 ≠ H ≤ Sm. We define an action of K ≀ H
on the set Γ^m by the rule that if g := (k1, . . . , km; h) ∈ K ≀ H and
δ := (δ1, . . . , δm) ∈ Γ^m then δ^g := (δ_{1^{h̄}}^{k_{1^{h̄}}}, . . . , δ_{m^{h̄}}^{k_{m^{h̄}}}) for h̄ := h^{−1}. (This
is called the product action of the wreath product.)
Prove that this product action is primitive if and only if K acts primi-
tively but not regularly on Γ and H is a transitive subgroup of Sm.
6.9. Let T1, . . . , Tr be nonabelian simple groups and let H be a subdirect prod-
uct of the Ti, i = 1, 2, . . . , r. Prove that there is a partition (Δ1, . . . , Δl)
of {1, 2, . . . , r} such that, for each fixed j ∈ [1, l], all groups Tk with
k ∈ Δj are isomorphic, and there is a diagonal subgroup Dj ≤ ∏_{k∈Δj} Tk
such that H = ∏_{j=1}^{l} Dj.
6.10. Let T1 , . . . , Tr be nonabelian simple groups. Prove that if M is a minimal
normal subgroup of T1 × · · · × Tr then M = Ti for some i ∈ [1, r ].
6.11. For any group G, prove that the following are equivalent:
(i) G has a faithful transitive permutation representation that is not regu-
lar, but the stabilizer of any two points is trivial (i.e., G is a Frobenius
group).
(ii) G has a nontrivial subgroup H such that NG (H ) = H and any two
conjugates of H are identical or their intersection is 1 (such a sub-
group H is called a Frobenius complement).
6.12. This is not an exercise but rather a compilation of results about Frobe-
nius groups. Proofs can be found, for example, in Sections 17 and 18 of
[Passman, 1968]. Let G ≤ Sym() be a Frobenius group and let α ∈ .
Then H := G α is a Frobenius complement for G.
(i) (Frobenius) The set {1} ∪ (G \ ⋃_{g∈G} H^g) is a regular normal sub-
group of G, which is called the Frobenius kernel.
(ii) (Thompson) The Frobenius kernel is nilpotent.
(iii) If |H | is even then H has a unique element of order 2, which is
therefore central.
(iv) (Zassenhaus) If |H| is odd then for every prime divisor p of |H : H′|,
elements of order p in H are central.
(v) (Zassenhaus) If H is nonsolvable then H has a normal subgroup H0
of index 1 or 2 such that H0 ≅ SL2(5) × S for some solvable group
S of order relatively prime to 30.
6.13. Prove Lemma 6.2.12. Hint: Use induction on m.
6.14. Let p be a prime number.
(i) Let P be the set of d × d lower triangular matrices over GF( p), with
all diagonal entries equal to 1. Prove that P is a Sylow p-subgroup
of GLd ( p).
(ii) For H ≤ GLd(p), let fix(H) denote the set of vectors in GF(p)^d
fixed by each element of H. Prove that if H is a p-group then fix(H)
is a nonzero subspace of GF(p)^d.
(iii) Prove that if H ≤ GLd ( p) then fix(O p (H )) is an H -invariant sub-
space.
6.15. Let G ≤ Sym() be primitive with two regular normal subgroups and let
α ∈ . Prove that the only minimal normal subgroup of G α is Soc(G)α .
6.16. Design a faster version of the algorithm described in Lemma 6.2.15,
where the intermediate groups ⟨hi^H⟩ for 1 ≤ i < l are not computed; in-
stead, guarantee that a subgroup of ⟨hi^H⟩ of order greater than n is con-
structed with sufficiently high probability and that h_{i+1} is chosen from
this subgroup.
6.17. Suppose that T ⊴⊴ G and T is simple. Prove that
(i) if T is nonabelian then ⟨T^G⟩ is a direct product of simple groups
isomorphic to T and it is a minimal normal subgroup of G;
(ii) if T is cyclic of order p then ⟨T^G⟩ is a p-group;
(iii) it is possible that ⟨T^G⟩ is not elementary abelian in part (ii).
6.18. Combining ideas from Sections 6.2.5 and 6.2.6, design a fast Las Vegas
algorithm to decide whether a primitive group of degree at most 10^7 and
of known order is simple.
7
Solvable Permutation Groups

7.1. Strong Generators in Solvable Groups


Exploiting special properties of solvable groups, [Sims, 1990] describes a
method for constructing a strong generating set. Recall that a finite group G is
solvable if and only if it is polycyclic, that is, there exists a sequence of elements
(y1, . . . , yr) such that G = ⟨y1, . . . , yr⟩ and for all i ∈ [1, r − 1], yi normalizes
H_{i+1} := ⟨y_{i+1}, . . . , yr⟩.
The main idea is that given an SGS for a group H ≤ Sym(Ω) and y ∈ Sym(Ω)
such that y normalizes H, an SGS for ⟨H, y⟩ can be constructed without sifting
of Schreier generators. The method is based on the following observation.

Lemma 7.1.1. Suppose that G = ⟨H, y⟩ ≤ Sym(Ω) and y normalizes H. For
a fixed α ∈ Ω, let Δ = α^H, Γ = α^G, and m = |Γ|/|Δ|. Then

(i) m is an integer and there exists h ∈ H such that z := y^m h fixes α;
(ii) z normalizes Hα; and
(iii) Gα = ⟨Hα, z⟩.

Proof. (i) By Lemma 6.1.7, Γ is the disjoint union of G-images of Δ. Moreover,
the G-images of Δ are cyclically permuted by y and m is the smallest integer
such that Δ^{y^m} = Δ. In particular, α^{y^m} ∈ Δ, so there exists h ∈ H with the desired
property.
(ii) Since y^m and h normalize H, so does their product z. Moreover, since
z ∈ Gα, it normalizes Hα = Gα ∩ H.
(iii) Any x ∈ G can be written in the form x = h_x y^i for some h_x ∈ H and
some integer i ≥ 0. If, in addition, x ∈ Gα then Δ^x = Δ and so m divides i.
Therefore x = h_x (zh^{−1})^{i/m} = u z^{i/m} for some u ∈ H. Since both x and z fix
α, so does u. This implies that x ∈ ⟨Hα, z⟩. □

Lemma 7.1.2. Suppose that G = ⟨H, y⟩ ≤ Sym(Ω), with |Ω| = n, and y
normalizes H. Moreover, suppose that a nonredundant base B and an SGS S
relative to B is given for H, and let t denote the sum of depths of the Schreier
trees defined by S. Then there is a deterministic algorithm that computes an SGS
S* ⊇ S for G in O(n(t + log |G : H|)) time, satisfying |S*| ≤ |S| + 2 log |G :
H| and that the sum of depths of the Schreier trees defined by S* is at most
t + 2 log |G : H|.

Proof. Let B = (β1, . . . , βM) be the nonredundant base of H, and let m1 :=
|β1^G|/|β1^H|. By Lemma 7.1.1(i), m1 is an integer. First, we determine the value
of m1. This is done by computing the images β1^{y^i} for positive integers i. The
smallest i such that β1^{y^i} is in the fundamental orbit Δ := β1^H gives the value of
m1. After that, we compute the permutations y^{2^i} for all i satisfying 2^i ≤ m1.
The Schreier tree (S1*, T1*) for G mod G_{β1} is constructed by augmenting the
Schreier tree (S1, T1) for H mod H_{β1}, using the labels y^{2^i}. Let t1 denote the depth
of (S1, T1). Then points in Δ^{y^j} are of distance at most t1 + b(j) from the root
in (S1*, T1*), where b(j) denotes the number of 1s in the binary expansion of
j. Hence the depth of (S1*, T1*) is at most t1 + log m1 ≤ t1 + 2 log m1. We
finish processing the first level by computing y^{m1} and the coset representative
h defined by (S1, T1) such that y^{m1} h fixes β1. By Lemma 7.1.1(iii) we have
G_{β1} = ⟨H_{β1}, y^{m1} h⟩.
The time requirement is O(n) for computing m1, O(n log m1) for computing
the y^{2^i} for 0 < i ≤ log m1, O(n log m1) for computing (S1*, T1*), and O(n(t1 +
log m1)) for computing y^{m1} h.
After processing the first level, we apply recursively the same procedure for
G_{(β1,...,βj)}, for j = 1, 2, . . . , M. Denoting the depth of the jth Schreier tree for
H by tj and |βj^{G_{(β1,...,β_{j−1})}}|/|βj^{H_{(β1,...,β_{j−1})}}| by mj, the total time requirement of the
algorithm is O(n Σ_{j=1}^{M} (tj + log mj)) = O(n(t + log |G : H|)). □

Following Sims, we call the algorithm described in the proof of Lemma 7.1.2
Normalizing Generator.
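To make the first level of Normalizing Generator concrete, here is a small Python sketch of the two computations that start it: finding m1 and forming the labels y, y², y⁴, . . . . It is only an illustration, under the assumption that permutations are stored as tuples (with p[i] the image of the point i); attaching the labels to the Schreier tree and the processing of the later base points are omitted.

```python
def pmul(p, q):
    """Product of permutations acting on the right: (p*q)[i] = q[p[i]]."""
    return tuple(q[p[i]] for i in range(len(p)))

def first_level_of_normalizing_generator(beta, fundamental_orbit, y):
    """Return (m_1, labels): m_1 is the smallest i with beta^(y^i) in the
    fundamental orbit beta^H, and labels = [y, y^2, y^4, ...] with 2^i <= m_1."""
    orbit = set(fundamental_orbit)           # Delta = beta^H
    image, m1 = y[beta], 1
    while image not in orbit:                # Lemma 7.1.1(i) guarantees termination
        image, m1 = y[image], m1 + 1
    labels, power, exponent = [], y, 1
    while exponent <= m1:                    # the labels y^(2^i) for the new tree
        labels.append(power)
        power = pmul(power, power)
        exponent *= 2
    return m1, labels
```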

Corollary 7.1.3. Given a polycyclic generating sequence (y1 , . . . , yr ) for a


solvable group G ≤ Sn , an SGS supporting membership testing in O(n log |G|)
time per test can be computed in O(r n log |G|) time by a deterministic algo-
rithm. The memory requirement is O(n log |G| + r n).

In particular, if G = ⟨T⟩ is abelian then the given generators form a polycyclic


generating sequence and an SGS can be computed in O(|T |n log |G|) time by
a deterministic algorithm.
Unfortunately, the input generating set for a solvable group is frequently not
a polycyclic one and often we do not even know whether the input group is
solvable. Hence we use Normalizing Generator as a subroutine of the follow-
ing generalized normal closure (GNC) algorithm. The input of GNC is an SGS
for some N ⊴ G and y ∈ G such that y ∉ N. The output is either an SGS for
the normal closure M = ⟨N, y^G⟩ or two conjugates u, v of y such that the
commutator w = [u, v] ∉ N.
Starting with N = 1, repeated applications of GNC construct an SGS for a
solvable group. If we get an output of the first type, the next call is made with M
and a generator of G not in M. When an output of the second type is returned,
the next call is made with the same N as before and with w. In this case, the
progress we made is that w is in the derived subgroup G′. All conjugates of w
remain in G′, so a new output of the second type will be in G″. Clearly, in a
solvable group sooner or later an output of the first type occurs.
If the input group is not solvable then recursive calls of GNC continue in-
definitely. To handle this possibility, a result from [Dixon, 1968] is used. Dixon
showed that the derived length of a solvable permutation group of degree n
is at most (5 log3 n)/2. Hence, if more than (5 log3 n)/2 consecutive outputs
of GNC are of the second type then we stop and conclude that G is not
solvable.
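As an illustration of how these pieces fit together, the following Python sketch shows one possible driver loop around GNC, with Dixon's bound as the stopping criterion. The callables gnc, trivial_sgs, and in_group are hypothetical interfaces (the GNC procedure itself is described in the next subsection), and the sketch is not the book's pseudocode.

```python
import math

def solvable_sgs(gens, n, gnc, trivial_sgs, in_group):
    """Build an SGS for <gens> if the group is solvable, using repeated calls
    of GNC; report failure once Dixon's derived length bound is exceeded."""
    limit = int(5 * math.log(n, 3) / 2) + 1    # derived length <= (5 log_3 n)/2
    sgs = trivial_sgs()                        # start with N = 1
    for g in gens:
        consecutive, y = 0, g                  # consecutive outputs of second type
        while not in_group(sgs, g):
            kind, result = gnc(sgs, y, gens)
            if kind == 'sgs':                  # first type: N grows to <N, y^G>
                sgs, y, consecutive = result, g, 0
            else:                              # second type: retry with w = [u, v]
                y, consecutive = result, consecutive + 1
                if consecutive > limit:
                    raise ValueError('input group is not solvable')
    return sgs
```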

The Generalized Normal Closure Algorithm


To finish our description of the solvable group SGS construction, we indicate
how the procedure GNC works. Given N and y, we start a standard normal
closure algorithm (cf. Section 2.1.2), collecting the conjugates of generators
in a set U. We add only those conjugates that increase ⟨N, U⟩. Each time a
permutation u is added to U, we check whether the commutators of u with all
previously defined elements of U are in N. If some commutator is not in N
then we have found an output of the second type. If all commutators are in N
then in particular u normalizes ⟨N, U \ {u}⟩ and Normalizing Generator can
be used for constructing an SGS for ⟨N, U⟩.
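A Python sketch of one call of GNC follows, again only to illustrate the bookkeeping. The helpers in_group (membership test in the group given by an SGS) and normalizing_generator (the procedure of Lemma 7.1.2) are hypothetical interfaces, permutations are tuples as in the earlier sketches, and pmul is as defined there; in practice one would close over the helpers (e.g., with functools.partial) to obtain the three-argument callable assumed in the driver sketch above.

```python
def pinv(p):
    """Inverse of a permutation stored as a tuple."""
    q = [0] * len(p)
    for i, image in enumerate(p):
        q[image] = i
    return tuple(q)

def conj(u, g):
    """Conjugate u^g = g^-1 u g."""
    return pmul(pmul(pinv(g), u), g)

def comm(u, v):
    """Commutator [u, v] = u^-1 v^-1 u v."""
    return pmul(pmul(pinv(u), pinv(v)), pmul(u, v))

def gnc(sgs_N, y, gens_G, in_group, normalizing_generator):
    """Return ('sgs', SGS for <N, y^G>) or ('commutator', w), where w is a
    commutator of two conjugates of y lying outside N."""
    U, sgs, queue = [], sgs_N, [y]
    while queue:
        u = queue.pop()
        if in_group(sgs, u):                  # u does not increase <N, U>
            continue
        for v in U:                           # commutators with earlier elements
            w = comm(u, v)
            if not in_group(sgs_N, w):
                return ('commutator', w)      # output of the second type
        U.append(u)                           # u normalizes <N, U \ {u}>, so
        sgs = normalizing_generator(sgs, u)   # Lemma 7.1.2 applies
        queue.extend(conj(u, g) for g in gens_G)
    return ('sgs', sgs)
```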
We analyze the version of the algorithm when the inverses of transversals
are stored in Schreier trees using the method of Lemma 7.1.2. Let T denote the
set of given generators for G. The procedure GNC is called O(log |G| log n)
times, because outputs of the first type define an increasing chain of subgroups
that must have length O(log |G|), and between two outputs of the first type only
O(log n) outputs of the second type occur.
Within one call of GNC, the set U increases O(log |G|) times since the groups
⟨N, U⟩ define an increasing subgroup chain. Hence O(log² |G|) commutators
and O(|T| log |G|) conjugates of the elements of U must be computed and
their membership must be tested in the groups N and ⟨N, U⟩, respectively.
By Corollary 7.1.3, one membership test costs O(n log |G|). By Lemma 7.1.2,
the total cost of O(log |G|) calls of Normalizing Generator is O(n log² |G|).
Combining these estimates, we obtain the following theorem.

Theorem 7.1.4. An SGS for a solvable permutation group G = ⟨T⟩ ≤ Sn
can be computed in O(n log n log³ |G| (|T| + log |G|)) time by a deterministic
algorithm.

The solvable SGS construction belongs to the rare species of deterministic


nearly linear-time algorithms. An additional bonus is that we also obtain a
polycyclic generating sequence (and so a composition series). Moreover, in
practice, the algorithm runs faster than other speedups of the Schreier–Sims
procedure and the time lost on nonsolvable inputs is often insignificant. Hence,
it may be beneficial to execute this algorithm for all input groups that are not
yet known to be nonsolvable.

7.2. Power-Conjugate Presentations


Computing with solvable groups is one of the success stories of computational
group theory. The representation most convenient to work with is a generator–
relator presentation with special properties. Let (y1, . . . , yr) be a polycyclic
generating sequence for a solvable group G, with yi normalizing G_{i+1} :=
⟨y_{i+1}, . . . , yr⟩ for i ∈ [1, r − 1]. Any g ∈ G can be written uniquely in the
form g = y1^{e1} y2^{e2} · · · yr^{er} for integers ei satisfying 0 ≤ ei < |Gi : G_{i+1}|. This
product is called the collected word for g. We also have

y_i^{|G_i : G_{i+1}|} = y_{i+1}^{ε_{i;i+1}} y_{i+2}^{ε_{i;i+2}} · · · y_r^{ε_{i;r}},   for 1 ≤ i ≤ r,      (7.1)
y_j y_i = y_i y_{i+1}^{ε_{i,j;i+1}} y_{i+2}^{ε_{i,j;i+2}} · · · y_r^{ε_{i,j;r}},   for 1 ≤ i < j ≤ r.

These are the relators for the so-called power-conjugate presentation (pcp)
for G. By repeated applications of the relators in (7.1), any word in the yi
representing an element g ∈ G can be rewritten to the collected word for g.
This process is called a collection. In particular, given collected words for g, h ∈
G, we can compute the collected word of their product by concatenating the
collected words for g and h and performing a collection. We note that no matter
in which order the rules (7.1) are applied, the collection process terminates;
however, the number of applications can vary significantly. A discussion of
the different collection methods can be found in [Leedham-Green and Soicher,
1990] and in [Vaughan-Lee, 1990].
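To illustrate what a collection does, here is a toy Python sketch, not taken from the book and far from the efficient collectors referenced above. It assumes the relators (7.1) are given with their right-hand sides stored as lists of generator indices (each index standing for the corresponding y_i with exponent 1) and that the input word uses only positive letters.

```python
def collect(word, relative_orders, power_rel, conj_rel):
    """Rewrite `word` (a list of generator indices, each standing for one y_i)
    into its collected form, returned as a dictionary {i: e_i}.
    relative_orders[i] = |G_i : G_{i+1}|; power_rel[i] and conj_rel[(i, j)]
    hold the right-hand sides of (7.1) as lists of generator indices."""
    w = list(word)
    changed = True
    while changed:
        changed = False
        for k in range(len(w) - 1):
            if w[k] > w[k + 1]:                            # a subword y_j y_i, j > i
                i, j = w[k + 1], w[k]
                w[k:k + 2] = [i] + list(conj_rel[(i, j)])  # second relation of (7.1)
                changed = True
                break
        if changed:
            continue
        for k in range(len(w)):                            # word now nondecreasing;
            i = w[k]                                       # enforce 0 <= e_i < p_i
            run = 1
            while k + run < len(w) and w[k + run] == i:
                run += 1
            if run >= relative_orders[i]:
                w[k:k + relative_orders[i]] = list(power_rel[i])  # first relation
                changed = True
                break
    exps = {}
    for i in w:
        exps[i] = exps.get(i, 0) + 1
    return exps
```

Termination is guaranteed by the general fact quoted above, but the number of rewriting steps of this naive version can be very large.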
The pcps are very compact, but still relatively easily manageable represen-
tations of polycyclic groups. In some instances, groups G with log |G| in the
thousands can be handled, which is currently not possible in a representation
as a permutation or matrix group. There is a large library of algorithms to deal
with polycyclic groups, but this is the topic for another book. Here we just give
the references [Laue et al., 1984], [Celler et al., 1990], [Sims, 1994], and [Eick,
1997] as a starting point for the interested reader. Most algorithms work well
in practice, but the complexity analysis is not as well developed as in the case
of permutation groups.
It is often beneficial to work with a pcp of a special form. By inserting
appropriate powers of the generators into the polycyclic generating sequence,
we can assume that the indices pi := |G i : G i+1 | are prime numbers. Moreover,
it is useful if the tower of subgroups G = G1 ▷ G2 ▷ · · · ▷ Gr ▷ G_{r+1} = 1 is a
refinement of a chain of normal subgroups G = N1 ▷ N2 ▷ · · · ▷ Nm = 1, with
Ni/N_{i+1} elementary abelian.
from G/Ni to G/Ni+1 . Since G/Ni acts as a group of linear transformations
on Ni /Ni+1 , often linear algebra methods can be utilized.

Conversion to a pcp
Because of the usefulness of pcps, we may want to compute one for a
given solvable permutation group. Fortunately, the solvable SGS construc-
tion in Section 7.1 accomplishes this task, with very little extra work. By
Exercises 7.1 and 7.2, we can construct a polycyclic generating sequence
(y1, . . . , yr) such that the indices in the corresponding subgroup chain G =
G1 ▷ G2 ▷ · · · ▷ Gr ▷ G_{r+1} = 1 are prime numbers and the SGS S constructed
by the algorithm of Section 7.1 consists of powers of the yi. Moreover, by
Lemma 7.1.2, S ∩ Gi is an SGS for Gi relative to the base of G (but this base
may be redundant for indices i > 1). Therefore, sifting y_i^{|G_i:G_{i+1}|} as a word in
G_{i+1} for 1 ≤ i ≤ r − 1, and sifting y_j^{y_i} as a word in G_{i+1} for 1 ≤ i < j ≤ r, we
can express the left-hand sides of the relators in (7.1) as words in the y_k. Induc-
tively for i = r − 1, r − 2, . . . , 1, we can rewrite these words for y_i^{|G_i:G_{i+1}|}, y_j^{y_i}
in the form required in (7.1), utilizing the already constructed part of (7.1).
Although we cannot give a complexity estimate of this procedure because the
last step involves a collection process, the practical performance is very good.

7.3. Working with Elementary Abelian Layers


The main tool when working with groups given by a power-conjugate presen-
tation is to construct a chain G = G 1 ≥ G 2 ≥ · · · ≥ G m = 1 of normal
subgroups with elementary abelian quotients G i /G i+1 , identify G i /G i+1 with
a vector space, and recursively solve the problem at hand for the factors G/G i ,
for i = 1, 2, . . . , m. In almost all instances, the extension of the solution from
G/G i to G/G i+1 does not use the fact that G/G i and G i+1 are solvable;
the only property exploited by the algorithms is the conjugation action of G,
considered as a group of linear transformations on the vector space G i /G i+1 .
Therefore, the same methods can be used in not necessarily solvable permuta-
tion groups to extend a solution from a factor group G/N to G/M, provided that
N /M is elementary abelian and we can handle the conjugation action of G on
N /M.
Given M ≤ N ≤ G ≤ Sym(Ω) with M, N ⊴ G and N/M elementary abelian,
we have already described in Section 6.2.7 how to identify N /M with a vector
space VN ,M in nearly linear time. Recall that we construct an SGS S M with
shallow Schreier tree data structure for M and then repeatedly use the algorithm
Normalizing Generator, described in the proof of Lemma 7.1.2, to augment
S M to an SGS S N for N . The elements in S N \S M can be considered as a basis for
the vector space VN ,M , and any coset Mg can be written in this basis by sifting
a representative of the coset as a word in N , and deleting the elements of S M
from the resulting word. We shall refer to this process as the coordinatization
of g (or Mg). The linear transformation of VN ,M corresponding to some h ∈ G
is constructed by coordinatizing the permutations z h for z ∈ S N \S M .
The coordinatization process is more efficient if we have a faithful permuta-
tion representation for G/M, since working in this permutation representation
we have S M = 1. In particular, extending a solution from G/O∞ (G) to G can
be done using the normal subgroup chain between 1 and O∞ (G), which was
constructed in Section 6.3.1, at the computation of O∞ (G).
In this section, we give two examples of lifting solutions from G/N to G/M.
To simplify notation, we assume that M = 1 (or, equivalently, we have a faith-
ful permutation representation for G/M). We emphasize again that the same
algorithms can be used in G/M without constructing a faithful permutation rep-
resentation for G/M; the coordinatization process just becomes less efficient.

7.3.1. Sylow Subgroups


Sylow subgroups play a central role in group theory, so constructing them is a
natural and important task. However, although there are numerous proofs for
the existence of Sylow subgroups, none of them translates easily to an efficient
algorithm. Consequently, some of the most intricate algorithmic machinery
for permutation groups was first developed in the context of Sylow subgroup
constructions.
The theoretical approach originates in [Kantor, 1985b; 1990] where a
polynomial-time algorithm is described for the construction of Sylow sub-
groups. Given G ≤ Sym(), first we construct a chain G = G 1 ≥ G 2 ≥
· · · ≥ G m = 1 of normal subgroups with characteristically simple quotients
G i /G i+1 (i.e., G i /G i+1 is the product of isomorphic simple groups). Such nor-
mal subgroups can be obtained by the methods of Section 6.2 (see especially
Section 6.2.7).
For a prime p dividing |G|, we construct a Sylow p-subgroup of G by
constructing Sylow p-subgroups recursively in G/G i , for i = 1, 2, . . . , m.
The extension of the solution from G/G i to G/G i+1 requires algorithms for
the following two problems:
(1) Find a Sylow p-subgroup in G i /G i+1 .
(2) Given two Sylow p-subgroups of G i /G i+1 , find an element of G i /G i+1
conjugating one to the other.
These problems are trivial if G i /G i+1 is elementary abelian, but if G i /G i+1
is nonsolvable then their solution requires some form of constructive recog-
nition of the simple groups in the direct product G i /G i+1 . For the definition
of constructive recognition, we refer to Section 8.3. Constructive recognition
identifies the simple groups under consideration with a standard copy. For clas-
sical groups of Lie type, the standard copy is a group of matrices with “nice”
generating sets, and problems (1) and (2) can be solved by linear algebra. How-
ever, for exceptional groups of Lie type, the current solutions of (1) and (2) are
essentially by brute force. The running time is polynomial in the input length,
but the algorithm is impractical. An important direction of future research is
the efficient algorithmic handling of exceptional groups of Lie type, including
the solutions of (1) and (2).
Later theoretical developments, namely the nearly linear-time Sylow sub-
group computations of [Morje, 1995; 1997] in groups with no exceptional
composition factors and the parallel machinery of [Mark, 1993] and [Kantor
et al., 1999], all follow this pattern. In this section we shall only describe how
to construct a Sylow subgroup of G/G i+1 in the case when G i /G i+1 is ele-
mentary abelian. In particular, solving this special problem is enough for the
construction of Sylow subgroups in solvable groups.
The first practical algorithm in [Cannon, 1971; Butler and Cannon, 1989]
constructs a Sylow p-subgroup by finding larger and larger p-subgroups, based
on Exercise 7.3. If a p-subgroup H ≤ G is already constructed but H is not
a Sylow subgroup then the algorithm constructs the centralizers C G (h) for
elements of Z (H ) until some h is found with the p-part of C G (h) exceeding
|H |. Then, recursively, a Sylow subgroup of C G (h) is constructed. In the group
C G (h), we can factor out a normal p-group using Theorem 6.3.1 and work in a
smaller quotient group. This algorithm uses centralizer computations and thus
backtrack searches, which we shall discuss in Chapter 9.
Later practical versions gradually converged toward the theoretical approach
just described. In [Atkinson and Neumann, 1990], [Butler and Cannon, 1991],
and [Cannon et al., 1997], the method of [Butler and Cannon, 1989] is combined
with a divide-and-conquer approach that utilizes block homomorphisms and
transitive constituent homomorphisms. In the most recent implementations, a
Sylow p-subgroup of G/O∞ (G) is lifted to a Sylow p-subgroup of G through
the elementary abelian layers of O∞ (G), as in the theoretical solution.
After this lengthy introduction, we can now get down to work. We would
like to solve the following problem:

Given Q ⊴ G ≤ Sym(Ω), with |Ω| = n, such that Q is elementary abelian


and G/Q is a p-group, construct a Sylow p-subgroup of G
in nearly linear time by a deterministic algorithm.
(7.2)

The approach we present for solving (7.2) is a straightforward sequential modi-


fication of the parallel procedure in [Kantor et al., 1999]. Although we describe
a deterministic version, randomized methods are very well suited to speed up
Sylow subgroup computations. If we make sure that the orders of the input
group and of the output (the alleged Sylow subgroup) are computed correctly,
then we can check whether the output is correct.
Let Q ≅ C_q^m for some prime q. If q = p then G is a p-group and we are
done. Hence we assume now that q ≠ p. We compute an SGS {z1, . . . , zm} for
Q that supports the coordinatization of the elements of Q, as described in the
introduction of Section 7.3.
First, we solve (7.2) in the special case when G/Q ≅ C_p^r is an elementary
abelian p-group. Note that in this case any Sylow p-subgroup P of G is ele-
mentary abelian. For x ∈ G, we denote by x̄ the coset Qx in G/Q.

Lemma 7.3.1. Let Q ⊴ G ≤ Sym(Ω) be given, with Q ≅ C_q^m and G/Q ≅ C_p^r
for two different primes p, q. Then a Sylow p-subgroup of G can be found in
nearly linear time by a deterministic algorithm.

Proof. Let {z 1 , . . . , z m } be the SGS of Q supporting coordinatization. First, we


construct x1 , . . . , xr ∈ G such that x̄ 1 , . . . , x̄ r generate G/Q. This can be done
in the faithful permutation representation of G/Q provided by Theorem 6.3.1,
or just by picking a subsequence of the given generators of G that define a
subgroup chain with groups having orders with increasing p-parts. After that,
we compute the matrices T1 , . . . , Tr of the linear transformations corresponding
to the conjugation action of x1 , . . . , xr on Q.
It is enough to find q1 , . . . , qr ∈ Q such that xi qi and x j q j commute (in G) for
all i, j ∈ [1, r ]. Having accomplished that, the qth powers of the permutations
xi qi generate a Sylow p-subgroup of G.
Since z1, . . . , zm is a basis of the vector space Q, the qk can be written in
the form q_k = ∏_{l=1}^{m} z_l^{α_{kl}}, where the α_{kl} are (a priori unknown) elements of
GF(q). For fixed i, j ∈ [1, r], the condition [xi qi, xj qj] = 1 provides m linear
equations for the α_{kl}, since

1 = [x_i q_i, x_j q_j] = q_i^{−1} x_i^{−1} q_j^{−1} x_j^{−1} x_i q_i x_j q_j
  = q_i^{−1} · (q_j^{−1})^{x_i} · [x_i, x_j] · q_i^{x_j} · q_j.        (7.3)

All five terms on the second line of (7.3) are elements of Q. We can compute the
coordinate vectors of these elements (as functions of the α_{kl}) since [xi, xj] is a
known element of Q, and conjugation by xi and xj are linear transformations
of Q with known matrices Ti and Tj, respectively. Hence, for example, the
coordinate vector of q_i^{x_j} is the product of the coordinate vector of qi and the
matrix Tj. The sum of the five coordinate vectors corresponding to the terms of
the second line of (7.3) is 0 and each coordinate provides a linear equation for
the α_{kl}. Altogether, we get \binom{r}{2} m ∈ O(log³ |G|) equations in mr ∈ O(log² |G|)
unknowns. This system of equations has a solution (since Sylow subgroups
exist), and we can find a solution by Gaussian elimination, using O(log⁷ |G|)
field operations in GF(q). □
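All the linear algebra in this proof (and in the systems (7.4) and (7.5) below) is ordinary Gaussian elimination over GF(q). A self-contained Python sketch of such a solver for dense systems with q prime might look as follows; it is given only to fix ideas and makes no attempt at the asymptotic efficiency discussed in the text.

```python
def solve_mod_q(A, b, q):
    """Return one solution x of A x = b over GF(q), q prime, or None if the
    system is inconsistent.  A is a list of coefficient rows, b the right side."""
    rows = len(A)
    cols = len(A[0]) if rows else 0
    M = [[A[r][c] % q for c in range(cols)] + [b[r] % q] for r in range(rows)]
    pivot_cols, r = [], 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c]), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        inv = pow(M[r][c], q - 2, q)               # inverse mod q (Fermat)
        M[r] = [(x * inv) % q for x in M[r]]
        for i in range(rows):                      # eliminate column c everywhere
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(x - f * y) % q for x, y in zip(M[i], M[r])]
        pivot_cols.append(c)
        r += 1
    if any(row[-1] for row in M[r:]):              # an equation 0 = nonzero
        return None
    x = [0] * cols                                 # free variables set to 0
    for i, c in enumerate(pivot_cols):
        x[c] = M[i][-1]
    return x
```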

If G/Q is not elementary abelian then we use an algorithmic form of the


“Frattini argument” (cf. Exercise 7.4). First, we compute a homomorphism
ϕ : G → Sn with kernel Q using Theorem 6.3.1, and then we compute
an elementary abelian normal subgroup Ā ≅ C_p^r of ϕ(G) and its preim-
age A := ϕ^{−1}(Ā). By Lemma 7.3.1, we can compute a Sylow p-subgroup
P = ⟨x1, . . . , xr⟩ of A. As a final preparatory step, we set up the data struc-
ture (cf. Section 5.1.2) for computing with the isomorphism ψ : P = ⟨x1, . . . ,
xr⟩ → Ā = ⟨ϕ(x1), . . . , ϕ(xr)⟩, which is the restriction of ϕ to P. The point
is that, given x̄ ∈ Ā, we can compute x ∈ P with ϕ(x) = x̄. As before, let
{z1, . . . , zm} be the SGS for Q ≅ C_q^m supporting coordinatization.
Now comes the Frattini argument. Since NG(P)A/A = G/A, the normalizer
NG(P) contains a Sylow p-subgroup of G. Our next goal is to construct
generators for NG(P), by computing some yg ∈ A for each generator g of G
such that g yg normalizes P. In fact, since A = PQ, we shall construct yg in Q.
Let g be a fixed generator of G. Since A ⊴ G, we have x_j^g ∈ A for 1 ≤ j ≤ r
and so there exist x_j′ ∈ P and q_j ∈ Q such that x_j^g = x_j′ q_j. In fact, x_j′ and
q_j can be constructed, since x_j′ = ψ^{−1}(ϕ(x_j)^{ϕ(g)}) and q_j = (x_j′)^{−1} x_j^g. Hence we
need yg ∈ Q such that (x_j′ q_j)^{yg} = x_j′ for 1 ≤ j ≤ r. This last equation is
equivalent to

(y_g^{−1})^{x_j′} · y_g · q_j = 1,        (7.4)

which, as in the proof of Lemma 7.3.1, leads to a system of linear equations


for the coordinates of yg in the basis {z 1 , . . . , z m }. The system of equations we
have to solve consists of mr ∈ O(log2 |G|) equations.
Given G = ⟨T⟩, we compute yg for each g ∈ T. Then the set {g yg | g ∈ T}
generates NG (P). Next, using Theorem 6.3.1, we construct an action of NG (P)
with P in the kernel; repeating the foregoing procedure in the image, eventually
we can discard the p  -part of NG (P) and so obtain a Sylow p-subgroup of G.
Note that the number of recursive calls in the procedure is bounded by the
maximal length of subgroup chains in G.
We can also solve the conjugation problem for Sylow subgroups in solvable
groups.

Lemma 7.3.2. Suppose that Q ⊴ G ≤ Sym(Ω), |Ω| = n, Q is an elementary
abelian q-group, and G/Q is a p-group for distinct primes p and q. Given Sylow
p-subgroups P1, P2 in G, we can construct an element y ∈ G conjugating P1
to P2 in nearly linear time by a deterministic algorithm.

Proof. We start by constructing an SGS {z 1 , . . . , z m } for Q supporting coordina-


tization and constructing the homomorphism ϕ : G → Sn with kernel Q, as de-
scribed in Theorem 6.3.1. We also construct the isomorphisms ψ1 : P1 → ϕ(G)
and ψ2 : P2 → ϕ(G), which are the restrictions of ϕ to P1 and P2 , respectively.
The maps ψi define an isomorphism between Pi and G/Q, for i = 1, 2.
Since P1 Q = G, there exists y ∈ Q with P1^y = P2, and our aim is to construct
a conjugating element in Q. For any g ∈ P1 and y ∈ Q, the conjugate g^y is in
the coset Qg; therefore, if y conjugates P1 to P2 then g^y = ψ2^{−1}(ψ1(g)) for all
g ∈ P1.
Based on the observations in the previous paragraph, it is straightforward to
write a system of linear equations for the coefficients of a conjugating element
y in the basis {z1, . . . , zm}. Let P1 = ⟨T⟩. The condition g^y = ψ2^{−1}(ψ1(g)) is
equivalent to

(y^{−1})^g · y = g^{−1} ψ2^{−1}(ψ1(g)).        (7.5)



Writing (7.5) for all g ∈ T and noticing that g^{−1} ψ2^{−1}(ψ1(g)) ∈ Q, we obtain
|T|m equations for the coefficients of y. □

Theorem 7.3.3. Given a solvable group H ≤ Sym(), with || = n, and two
Sylow p-subgroups P1 , P2 of H , an element of H conjugating P1 to P2 can be
constructed by a deterministic nearly linear-time algorithm.

Proof. We start with the computation of a series of normal subgroups 1 =
Kr ⊴ K_{r−1} ⊴ · · · ⊴ K1 = H with elementary abelian quotients Ki/K_{i+1} and
homomorphisms ϕi : H → Sn with kernel Ki as in Section 6.3.1, at the construc-
tion of O∞(H). Then, recursively for i = 1, . . . , r, we construct gi ∈ ϕi(H)
such that ϕi(P1)^{gi} = ϕi(P2). We can start with g1 := 1. Suppose that gi is al-
ready constructed for some i, and let hi ∈ H be an arbitrary preimage of gi
under ϕi. If Ki/K_{i+1} is a p-group then g_{i+1} := ϕ_{i+1}(hi) conjugates ϕ_{i+1}(P1)
to ϕ_{i+1}(P2). If Ki/K_{i+1} is a q-group for some prime q ≠ p then in the
group G := ⟨ϕ_{i+1}(Ki), ϕ_{i+1}(P2)⟩, the subgroup Q := ϕ_{i+1}(Ki) is an elemen-
tary abelian normal q-subgroup with G/Q a p-group, and ϕ_{i+1}(P1^{hi}), ϕ_{i+1}(P2)
are two Sylow p-subgroups of G. Hence we are in the situation of Lemma 7.3.2
and we can construct y ∈ G conjugating ϕ_{i+1}(P1^{hi}) to ϕ_{i+1}(P2). Then we define
g_{i+1} := ϕ_{i+1}(hi) y. At the end of the recursion, gr ∈ ϕr(H) ≅ H is the element
of H conjugating P1 to P2. □

7.3.2. Conjugacy Classes in Solvable Groups


We defer the general discussion of conjugacy class computations to Section 9.4.
Here we only describe an algorithm from [Mecky and Neubüser, 1989] to solve
the following problem:

Let Q  G ≤ Sym(), with || = n, and suppose that Q is elementary


abelian. Given representatives of the conjugacy classes of G/Q and
the centralizers of the representatives in G/Q, construct
representatives for the conjugacy classes of G and
the centralizers of the representatives.
(7.6)

Let {Qh 1 , . . . , Qh k } be the set of representatives for the conjugacy classes


of G/Q, and let C̄i := C_{G/Q}(Qhi) for i ∈ [1, k]. We denote the complete
preimage of C̄i in G by Ci.
Computation of the Class Representatives
For any g ∈ G, the conjugacy class of g intersects nontrivially exactly one of
the cosets Qh i (namely, the representative of the conjugacy class of Qg in
G/Q). Hence we can find representatives for the conjugacy classes of G in
the cosets Qh1, . . . , Qhk. For fixed i ∈ [1, k], two elements g1, g2 ∈ Qhi are
conjugate in G if and only if g2 ∈ g1^{Ci}, since any element of G\Ci conjugates
g1 outside of the coset Qh i . Therefore, we can obtain representatives for the
conjugacy classes of G that intersect Qh i nontrivially by computing the orbits
of the conjugation action of Ci on Qh i and taking a representative from each
orbit.
The argument in the previous paragraph is valid for any normal subgroup Q
of G. However, we can exploit the fact that Q is elementary abelian and so can
be considered as a vector space, allowing us to make two possible shortcuts
in the foregoing computation. To simplify notation, we drop the index i and
consider an arbitrary coset Qh ∈ {Qh 1 , . . . , Qh k }, with C̄ := C G/Q (Qh). We
denote the complete preimage of C̄ in G by C.
For qh ∈ Qh and c ∈ C, we have (qh)^c = q^c [c, h^{−1}] h. For fixed c (and h),
the map Ac : Q → Q defined by the rule Ac : q ↦ q^c [c, h^{−1}] is an affine map
of the vector space Q, since q ↦ q^c is a linear transformation and [c, h^{−1}] is a
fixed element of Q, independent of q. Therefore the set of permutations of Qh,
defined by the conjugation actions of the elements of C, corresponds to a set of
permutations defined by the affine transformations {Ac | c ∈ C} of Q. We claim
that AC := {Ac | c ∈ C} is in fact a group, and with the natural identification
ϕ : Qh → Q defined as ϕ : qh ↦ q, the conjugation action of C on Qh is
permutation isomorphic to the action of AC on Q. Indeed,

q^{c1c2}[c1c2, h^{−1}] = (q^{c1}[c1, h^{−1}])^{c2}[c2, h^{−1}],

since [c1c2, h^{−1}] = [c1, h^{−1}]^{c2}[c2, h^{−1}]. This last identity is a special case of
The first shortcut is to compute an SGS {z 1 , . . . , z m } of Q supporting co-
ordinatization and matrices in this basis for the affine transformations A T :=
{Ac | c ∈ T } for a generating set T of C. Then, instead of the orbits of the
conjugation of C on Qh, we compute the orbits of AC = ⟨AT⟩ on Q and take
the preimages of representatives of these orbits under ϕ −1 . The affine images
of elements of the vector space Q can be computed much more quickly than
the conjugates of the corresponding elements of Qh.
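The orbit computation of the first shortcut is an ordinary orbit algorithm, run on vectors instead of points. The Python sketch below is only an illustration: vectors are tuples over GF(q), and an affine transformation Ac is represented by a pair (T, t) acting as v ↦ vT + t, where T would be the matrix of q ↦ q^c and t the coordinate vector of [c, h^{−1}].

```python
from collections import deque

def apply_affine(v, T, t, q):
    """Image of the row vector v under the affine map x -> xT + t over GF(q)."""
    n = len(v)
    return tuple((sum(v[i] * T[i][j] for i in range(n)) + t[j]) % q
                 for j in range(n))

def affine_orbits(domain, affine_gens, q):
    """Partition `domain` (an iterable of vectors) into orbits under the group
    generated by the affine maps in `affine_gens` (pairs (T, t))."""
    remaining = set(domain)
    orbits = []
    while remaining:
        start = next(iter(remaining))
        orbit, queue = {start}, deque([start])
        while queue:                               # breadth-first orbit search
            v = queue.popleft()
            for T, t in affine_gens:
                w = apply_affine(v, T, t, q)
                if w not in orbit:
                    orbit.add(w)
                    queue.append(w)
        orbits.append(orbit)
        remaining -= orbit
    return orbits
```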
The second shortcut goes even further and computes the orbits of AC =
⟨AT⟩ on a potentially smaller factor space of Q instead of Q. Since Q ⊴ C,
Lemma 6.1.7 implies that the orbits of the subgroup AQ := {Ac | c ∈ Q} of AC
are permuted by AC. For c ∈ Q, we have q^c = q for all q ∈ Q and so Ac
is just a translation of the vector space Q by the vector [c, h^{−1}]. Hence, for
the 0 vector of Q, we have 0^{AQ} = [Q, h^{−1}]. This subspace [Q, h^{−1}] can be
computed easily, and the orbits of AC on Q are the complete preimages of the
orbits of AC on the factor space Q/[Q, h^{−1}]. Let ĀC denote the permutation
action of AC on Q/[Q, h^{−1}]. Note that ĀC acts on Q/[Q, h^{−1}] as a group of
affine transformations as well, since for any q, r ∈ Q and c ∈ C we have

[q, h^{−1}]^c = (q^{−1})^c h^c q^c (h^{−1})^c = (q^{−1})^c h [h, c] q^c (h^{−1})^c
            = (q^{−1})^c h q^c [h, c] (h^{−1})^c = (q^{−1})^c h q^c h^{−1} = [q^c, h^{−1}]

and so ([Q, h^{−1}]r)^c [c, h^{−1}] = [Q, h^{−1}] r^c [c, h^{−1}].

Computation of the Centralizers


To apply (7.6) recursively, we also have to compute the centralizers of the repre-
sentatives of the conjugacy classes in G. Let [Q, h −1 ]q1 , . . . , [Q, h −1 ]ql be the
representatives of the orbits of ĀC on Q/[Q, h −1 ] (and so q1 h, . . . , ql h are the
representatives of those conjugacy classes of G that intersect Qh nontrivially).
During the orbit computations in Q/[Q, h −1 ], we can build Schreier trees S j
that code elements of ĀC carrying the representatives [Q, h −1 ]q j , j ∈ [1, l], to
other elements of their orbits (cf. the end of Section 2.1.1). The elements coded
by S j comprise a transversal for ĀC mod ( ĀC )[Q,h −1 ]q j , and so some form of
point stabilizer subgroup constructions can be used to obtain generators for
D̄ j := ( ĀC )[Q,h −1 ]q j . Note that we know | D̄ j | in advance, so the randomized
methods from Chapter 4 give guaranteed correct output as well.
Let D j denote the complete preimage of D̄ j in AC . We have A Q ≤ D j , since
A Q acts trivially on Q/[Q, h −1 ]. In terms of the action of AC on Q, the group
D j is the setwise stabilizer of [Q, h −1 ]q j . We still have to construct C G (q j h),
which is permutation isomorphic to the point stabilizer subgroup of D j fixing
q j . For that, we do not need another orbit computation on [Q, h −1 ]q j ; instead,
we can use linear algebra.
As a first step, we construct (AQ)_{qj}. For q ∈ Q, we have Aq(qj) = qj^q [q, h^{−1}] =
qj [q, h^{−1}], which is equal to qj if and only if [q, h^{−1}] = 1, that is, q ∈ CQ(h) =
CQ(qj h). Hence (AQ)_{qj} can be obtained as the fixed point space of Ah, which
is a linear transformation of Q.
Next, we claim that, for any c ∈ Dj,

(i) there is an element r(c) ∈ Q such that A_{c r(c)} fixes qj; and
(ii) if A_{c r1}, A_{c r2} fix qj for some r1, r2 ∈ Q then r1 r2^{−1} ∈ CQ(h).

To show (i), observe that

Ac(qj) = qj^c [c, h^{−1}] = [q, h^{−1}] qj        (7.7)

for some q ∈ Q, since Ac fixes [Q, h^{−1}]qj. For that q, using (7.7) and using
repeatedly that Q is abelian and that [c, h^{−1}] ∈ Q, we have

A_{c q^{−1}}(qj) = qj^{c q^{−1}} [c q^{−1}, h^{−1}] = qj^c [c, h^{−1}]^{q^{−1}} [q^{−1}, h^{−1}]
            = qj^c [c, h^{−1}] [q^{−1}, h^{−1}] = [q, h^{−1}] qj [q^{−1}, h^{−1}] = qj.

Hence r(c) := q^{−1} satisfies (i). Moreover, if A_{c r1}(qj) = A_{c r2}(qj) = qj then
[c r1, h^{−1}] = [c r2, h^{−1}], implying [r1, h^{−1}] = [r2, h^{−1}] and h^{r1 r2^{−1}} = h. There-
fore, (ii) holds.
We can construct CG(qj h) as follows: For each generator c of Dj = ⟨T⟩, we
construct r(c) satisfying (i), by solving the system of linear equations

qj^c [c, h^{−1}] qj^{−1} = q^{−1} · q^{h^{−1}}

for the coordinates of q in the basis {z1, . . . , zm} of Q supporting coordinati-
zation. This equation was obtained by rewriting (7.7). The left-hand side is a
known vector in Q, and the right-hand side can be expressed as a linear function
of the coordinates of q. As we have seen in the previous paragraph, r(c) := q^{−1}
satisfies (i). Let CQ(h) = ⟨S⟩. By (ii), the set S ∪ {c r(c) | c ∈ T} generates
CG(qj h).
satisfies (i). Let C Q (h) = S. By (ii), the set S ∪ {cr (c) | c ∈ T } generates
C G (q j h).
The running time and memory requirement of the algorithm is proportional
to |Q/[Q, h −1 ]|, which may be exponentially large in terms of the input length.
In general, an exponential blowup is unavoidable, since the length of the output
may be this large. This happens, for example, if Q ≤ Z (G). Hence, the method
should be used judiciously. For example, when computing the conjugacy classes
of a solvable group, we may try to arrange the series of normal subgroups with
elementary abelian factors guiding the recursion so that central extensions come
at the end of the recursion. Moreover, we may have to restrict ourselves only to
the counting of the conjugacy classes instead of listing representatives.

7.4. Two Algorithms for Nilpotent Groups


In this section, we describe two algorithms that use quite different methods
than the top-down extension of information to larger and larger factor groups
in Section 7.3.
7.4.1. A Fast Nilpotency Test
There are numerous characterizations of finite nilpotent groups (cf. Exer-
cise 7.9), and quite a few of them can be utilized for testing whether a given per-
mutation group G is nilpotent. Here we present a nilpotency test from [Rákóczi,
1995; 1997], which is extremely fast even in the case when log |G| is propor-
tional to the permutation degree.
First, we describe an algorithm that decides whether a group G = ⟨S⟩ ≤
Sym(Ω), with |Ω| = n, is a p-group for a given prime p. A possibility is to
compute |G| by the algorithm of Section 7.1, but we can do better. A permutation
group is a p-group if and only if all of its transitive constituents are p-groups;
hence we start with constructing the transitive constituents of G. This amounts
to computing the orbits of G and can be done in O(|S|n) time. If G has an orbit
Δ such that |Δ| is not a power of p then we can conclude immediately that G
is not a p-group.
For the timing of the next theorem, recall that the inverse Ackermann function
A^{−1}(m, n) was defined in Section 5.5.2.

Theorem 7.4.1. Let G = ⟨S⟩ ≤ Sym(Ω) be transitive, with |Ω| = n = p^k
for some prime p. It can be decided whether G is a p-group by a deterministic
algorithm in O(|S| n log n A^{−1}(2|S|n, n)) time.

Proof. The algorithm is based on the following simple characterization of tran-


sitive p-groups. A transitive group G ≤ Sym(Ω), with |Ω| = p^k for some k ≥ 1,
is a p-group if and only if

(i) G has a block system Σ = {Σ1, . . . , Σp};
(ii) G acts on Σ as the cyclic group Cp; and
(iii) if G1 is the setwise stabilizer of Σ1 in G then the restriction G1|Σ1 is a
p-group.

If G satisfies (i) and (ii) then G1 is the kernel of the action on Σ, since G acts
regularly on Σ. Hence |G| = p|G1| and (iii) implies that G is a p-group. Con-
versely, Theorem 6.2.5 and Exercise 6.14 imply that the only primitive p-group
is Cp in its natural action on p points, so if G is a transitive p-group then it
has a block system satisfying (i) and (ii), and (iii) holds obviously. Therefore,
we can test whether G = ⟨S⟩ is a p-group by checking whether it has a block
system satisfying (i)–(iii).
The first step is a subroutine that constructs a block system of size p if G is
a p-group and returns a block system of size p or {Ω} if G is not a p-group.
We use the version of block computations described in Section 5.5.2. To this
end, we need to construct some subset Γ ⊆ Ω, with |Γ| ≥ 2, such that if G
is a p-group then there is a nontrivial block system with a block containing all
points of Γ.
One way of finding such a set is based on the observation that if G is
a p-group and g, h ∈ G do not commute then [g, h] fixes all blocks in any
block system of size p for G, and so any nontrivial orbit of [g, h] can be used
as Γ. Another way is to take a nontrivial element g of Z(G). By Lemma 6.1.7,
the orbits of g are permuted by G. Therefore, if g has more than one orbit on
Ω then any nontrivial orbit of g is suitable as Γ, whereas if g has only one orbit
then any orbit of g^p suffices.
Let g be the first generator of G given in the generating set S. We compute the
commutator of g with all other elements h ∈ S. If one of these commutators is
nontrivial then the first method described in the previous paragraph for finding
Γ can be applied, whereas if all commutators are equal to 1 then g ∈ Z(G) and
the second method can be used. Applying the algorithm of Section 5.5.2, we
construct a block system Σ* such that some block of Σ* contains all points of
Γ. If Σ* = {Ω} then we conclude that G is not a p-group, and if |Σ*| > p then
we construct the action of G on Σ* and repeat the procedure recursively for
this action. Eventually, we obtain a block system Σ for G satisfying (i), or we
terminate with the conclusion that G is not a p-group.
If Σ is constructed then the second step of the algorithm checks whether the
action of G on Σ satisfies (ii). Any element of S acting nontrivially on Σ must
define a cyclic ordering of the blocks in Σ and it is straightforward to check
whether the other generators respect this cyclic ordering.
The third step of the algorithm is the construction of generators for
G1|Σ1. We find some g ∈ S acting nontrivially on Σ and construct the bijec-
tions ϕj : Σ1 → Σj and their inverses for j ∈ [2, p], defined by the powers
g, g², . . . , g^{p−1} of g. These bijections can be used to construct the restrictions
of the permutations in the set

T := {g^i h g^{−j} | i ∈ [0, p − 1], j ∈ [0, p − 1], h ∈ S, g^i h g^{−j} ∈ G1}

to Σ1. By Lemma 4.2.1, these restrictions generate G1|Σ1.
After that, the same procedure as we just described can be applied recursively
to determine whether G1|Σ1 is a p-group. Note that G1|Σ1 acts transitively on
Σ1, since for any α, β ∈ Σ1, there exists x ∈ G with α^x = β and obviously
x|Σ1 ∈ G1|Σ1.
Now we analyze the time requirement of the algorithm. In the first step of
the algorithm, the set Γ is computed in O(|S|n) time. By Corollary 5.5.9,
the block system Σ* is obtained in O(|S|n A^{−1}(2|S|n, n)) time. If |Σ*| > p
then further block computations are required; however, these recursive calls of
block computations are on sets of size decreasing by at least a factor p, so they
contribute only a constant multiplier to the time complexity. The second step,
checking the action on Σ, is in O(|S|p) time. In the third step, the bijections
ϕj and their inverses are constructed in O(n) total time, because for each of the
n/p elements α ∈ Σ1 and for each j ∈ [2, p − 1], ϕ_{j+1}(α) = ϕ_j(α)^g can be
obtained in constant time from ϕ_j(α). The |S|p Schreier generators for G1|Σ1
are computed in O(|S|n) total time, since we construct permutations only on a
domain of degree n/p. Hence the total time requirement of the three steps is
O(|S|n A^{−1}(2|S|n, n)).
The recursive call is for the permutation group G1|Σ1 of degree n/p
with |S|p generators, so the three steps of the algorithm for G1|Σ1 require
O((|S|p)(n/p) A^{−1}(2|S|n, n/p)) = O(|S|n A^{−1}(2|S|n, n)) time. Similarly, fur-
ther recursive calls are on domains of degree decreasing by a factor p and with
the number of generators increasing by a factor p, so the time requirement
remains the same. Since the number of recursive calls is at most log n, the total
time requirement of the algorithm is O(|S|n log n A^{−1}(2|S|n, n)), as claimed. □

The nilpotency test for permutation groups is based on the p-group test
described in the proof of Theorem 7.4.1. A permutation group G = ⟨S⟩ ≤
Sym(Ω), with |Ω| = n, is nilpotent if and only if all of its transitive con-
stituents are nilpotent, so again we start with the construction of the transitive
constituents, in O(|S|n) time. From now on, we suppose that G is transitive.
Let p be a prime divisor of n. If G is nilpotent then G = P × Q, where P is
a nontrivial p-group and Q is a nilpotent group of order not divisible by p. By
Lemma 6.1.7, the orbits of P constitute a block system Σ* for G. Since P is a
p-group, the size of blocks in Σ* is p^k for some k ≥ 1. Moreover, as P is in the
kernel of the action of G on Σ*, the action of G on Σ* has order relatively prime
to p. So, in particular, the divisor |Σ*| of this group order is relatively prime to p
and p^k is the largest power of p that divides n. The orbits of Q constitute another
block system Σ** for G. By the same argument we used for Σ*, we obtain that
|Σ**| is relatively prime to each prime divisor of |Q|. Hence |Σ**| = p^k.
Finally, we describe how we can get generators for P and Q. For a generator
s ∈ S, let |s| = p^l r with (p, r) = 1. Then s_p := s^r ∈ P and s_q := s^{p^l} ∈ Q, and
s ∈ ⟨s_p, s_q⟩. Hence P = ⟨s_p | s ∈ S⟩ and Q = ⟨s_q | s ∈ S⟩. Note that |s| divides
n since obviously p^l ≤ p^k, and by induction on the number of different prime
divisors of |Q| we can prove that r divides n/p^k.
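The splitting of a generator into its p-part and p′-part is easy to carry out on permutations. The following Python sketch (an illustration only, with permutations stored as tuples) computes s_p and s_q as above.

```python
from math import gcd

def perm_order(s):
    """Order of a permutation: the lcm of its cycle lengths."""
    seen, order = [False] * len(s), 1
    for i in range(len(s)):
        if not seen[i]:
            length, j = 0, i
            while not seen[j]:
                seen[j], j, length = True, s[j], length + 1
            order = order * length // gcd(order, length)
    return order

def perm_power(s, e):
    """s^e by repeated squaring (powers of s commute, so the order of the
    multiplications does not matter)."""
    result, base = tuple(range(len(s))), s
    while e:
        if e & 1:
            result = tuple(base[result[i]] for i in range(len(s)))
        base = tuple(base[base[i]] for i in range(len(s)))
        e >>= 1
    return result

def p_part_split(s, p):
    """Return (s_p, s_q) with s_p = s^r and s_q = s^(p^l), where |s| = p^l r
    and gcd(p, r) = 1."""
    order = perm_order(s)
    pl = 1
    while order % p == 0:
        order //= p
        pl *= p
    r = order                                  # now |s| = pl * r with gcd(p, r) = 1
    return perm_power(s, r), perm_power(s, pl)
```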

Lemma 7.4.2. Let G = ⟨S⟩ ≤ Sym(Ω), with |Ω| = n, be transitive, let p be
a prime divisor of n, and let P := ⟨s_p | s ∈ S⟩, Q := ⟨s_q | s ∈ S⟩ with s_p, s_q as
defined in the previous paragraph. Let p^k be the largest power of p dividing n.
Then G is nilpotent if and only if it satisfies all of the following properties:

(i) For each s ∈ S, |s| divides n.
(ii) The orbits of P constitute a block system Σ* for G.
(iii) The orbits of Q constitute a block system Σ** for G.
(iv) |Σ*| = n/p^k and |Σ**| = p^k.
(v) P acts as a p-group on its orbits.
(vi) Q acts as a nilpotent group on its orbits.

Proof. We have seen that a transitive nilpotent group satisfies (i)–(vi). Con-
versely, suppose that a transitive group G satisfies (i)–(vi), and let N1 and N2 be
the kernels of the actions of G on Σ* and Σ**, respectively. We have s ∈ ⟨s_p, s_q⟩
for each s ∈ S, so G = ⟨P, Q⟩. Since P ≤ N1 and Q ≤ N2, we also have
G = ⟨N1, N2⟩. Moreover, N1 ∩ N2 = 1, since the orbits of N1 ∩ N2 constitute
a block system that is a common refinement of Σ* and Σ**, and so by (iv) the
orbits must have size 1. Hence G = N1 × N2. Using again that G = ⟨P, Q⟩ and
P ≤ N1, Q ≤ N2, we obtain G = P × Q. By (v) and (vi), G is nilpotent. □

Corollary 7.4.3. Let G = ⟨S⟩ ≤ Sym(Ω), with |Ω| = n, be transitive. Then
we can test whether G is nilpotent in O(|S|n log n A^{−1}(2|S|n, n)) time by a
deterministic algorithm.

Proof. We have to show that properties (i)–(vi) in Lemma 7.4.2 can be checked
in the given time bound. The prime factorization of n can be obtained in
O(√n log² n) time by brute force. For each s ∈ S, we can check whether
s^n = 1 and, if the answer is yes, we can compute s_p and s_q in O(n log n)
time. The orbits of P and Q can be determined, and properties (ii), (iii), and
(iv) can be checked, in O(|S|n) time. By Theorem 7.4.1, (v) can be checked
in O(|S|n log n A−1 (2|S|n, n)) time. Finally, (vi) is checked by a recursive call.
Since the orbits of Q have size n/ p k ≤ n/2, and further recursive calls are for
groups of degree decreasing at least by a factor 2, the recursive calls contribute
only a constant multiplier to the overall running time. 

7.4.2. The Upper Central Series in Nilpotent Groups


As we have seen in Section 6.1.3, the center of a group is easily computable.
To obtain the upper central series, this computation should be iterated in factor
groups. However, as we pointed out repeatedly, it is possible that even factor
groups of p-groups G ≤ Sn can have no faithful permutation representations
on less than 2^{n/4} points (cf. Exercise 6.6). Hence a completely new approach is
necessary. In this section we present the nearly linear-time upper central series
algorithm for nilpotent groups from [Seress, 1998]. The method is based on
ideas from [Kantor and Luks, 1990].
As in Section 6.1.1, given G, H ≤ Sym(Ω), let Ω1, Ω2 be disjoint copies
of the set Ω and introduce the subgroups D = Diag(G × G) := {(g, g) | g ∈
G}, 1 × H := {(1, h) | h ∈ H}, and K := ⟨D, 1 × H⟩. The following lemma
is a special case of the discussion in [Kantor and Luks, 1990, Section 6].

Lemma 7.4.4. If H ⊴ G then CoreG×G(K) acts as H · Z(G/H) on Ω1.

Proof. Denote the restriction of CoreG×G(K) to Ω1 by C. Then C contains H
because H × H ⊴ G × G and H × H ≤ K. Also, K = {(g, gh) | g ∈ G, h ∈ H}
since G normalizes H. Now, for an arbitrary (g, gh) ∈ K,

(g, gh) ∈ CoreG×G(K)
  ⇔ (g, gh) ∈ K^{(y1,y2)} for all y1, y2 ∈ G
  ⇔ (g, gh)^{(y1,y2)} ∈ K for all y1, y2 ∈ G
  ⇔ (g, gh)^{(1, y2 y1^{−1})} ∈ K for all y1, y2 ∈ G   (since K^{(y1,y1)} = K)
  ⇔ (g, g^y h^y) = (g, g(g^{−1}g^y)h^y) ∈ K for all y ∈ G
  ⇔ g^{−1}g^y ∈ H for all y ∈ G.

The last condition means that g and g^y are in the same coset of H for all y ∈ G;
in other words, Hg ∈ Z(G/H). □
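The subgroups D and 1 × H live on the doubled permutation domain Ω1 ∪ Ω2, and their generators are easy to write down. A small Python sketch (illustration only, with permutations on {0, . . . , n−1} stored as tuples):

```python
def diag_pair(g):
    """The permutation (g, g) on two disjoint copies {0,...,n-1} and {n,...,2n-1}."""
    n = len(g)
    return tuple(g[i] for i in range(n)) + tuple(g[i] + n for i in range(n))

def one_cross(h):
    """The permutation (1, h): identity on the first copy, h on the second."""
    n = len(h)
    return tuple(range(n)) + tuple(h[i] + n for i in range(n))

def generators_of_K(gens_G, gens_H):
    """Generators of K = <D, 1 x H> inside Sym(2n); computing Core_{G x G}(K)
    and restricting it to the first copy then realizes Lemma 7.4.4."""
    return [diag_pair(g) for g in gens_G] + [one_cross(h) for h in gens_H]
```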

Lemma 7.4.4 reduces the construction of Z (G/H ) to a core computation.


By the algorithm of Section 6.1.5, we can compute the core of a subnormal
subgroup in nearly linear time; hence, we need to determine in which cases
K is subnormal in G × G. The necessary and sufficient condition for that is
provided by the following lemma.

Lemma 7.4.5. K ⊴⊴ G × G if and only if G/H is nilpotent.

Proof. Define K0 := G × G and Ki := ⟨K^{K_{i−1}}⟩ for i > 0. Also, let L0 :=
G, Li := [G, L_{i−1}] for i > 0 denote the lower central series of G. We claim
that, for all i ≥ 0,

Ki = ⟨D, 1 × HLi⟩.        (7.8)

We prove (7.8) by induction on i. The case i = 0 is trivial. The inductive step


is based on the identity that for g ∈ G, h1, h2 ∈ H, and l ∈ Li,

(g, gh1)^{(1, h2 l)} = (g, g^{h2 l} h1^{h2 l})
  = (g, g(g^{−1}l^{−1}h2^{−1}gh2l) h1^{h2 l}) = (g, g[g, l](l^{−1}g^{−1}h2^{−1}gh2l) h1^{h2 l})        (7.9)
  = (g, g[g, l][g, h2]^l h1^{h2 l}) = (g, g)(1, [g, l][g, h2]^l h1^{h2 l}).

Suppose that (7.8) holds for some i ≥ 0. Since K^d = K for all d ∈ D, the
group K_{i+1} = ⟨K^{Ki}⟩ is generated by elements of the form (g, gh1)^{(1,h2l)} for g ∈
G, h1, h2 ∈ H, and l ∈ Li. By (7.9), all generators are in ⟨D, 1 × HL_{i+1}⟩ since
[g, l] ∈ L_{i+1} and [g, h2]^l, h1^{h2l} ∈ H. Conversely, ⟨D, 1 × HL_{i+1}⟩ is generated
by D and elements of the form (1, [g, l]h) for g ∈ G, h ∈ H, and l ∈ Li.
Applying (7.9) backward with the notation h1 := h^{l^{−1}} and h2 := 1, we obtain
that

(1, [g, l]h) = (1, [g, l][g, 1]^l (h^{l^{−1}})^l) = (g^{−1}, g^{−1})(g, g h^{l^{−1}})^{(1,l)} ∈ K_{i+1}.

Hence (7.8) holds for i + 1.


To finish the proof of the lemma, observe that H L i is the ith term in the lower
central series of G/H . Hence, if G/H is nilpotent then H L i = H for some i
and (7.8) implies that K = K i for the same i. By Exercise 6.5, this means that
K  G × G. Conversely, if K i = K for some i then (7.8) implies that for all
x ∈ H L i we have (1, x) = (g, gh) for some g ∈ G and h ∈ H . This means
g = 1 and h = x, so H L i = H and G/H is nilpotent. 

In particular, if G is nilpotent then we can compute its upper central series


in nearly linear time by a deterministic algorithm. A practical concern is that
some of the computations have to be carried out on a permutation domain that is
four times larger than the original one, since the core construction doubles the
permutation domain of G × G. Fortunately, numerous shortcuts are possible.
For example, (7.8) implies that generators for the subnormal series between K
and G × G can be computed on the original permutation domain via the lower
central series of G.

Exercises
7.1. Modify the solvable SGS construction of Section 7.1 such that the out-
put polycyclic generating sequence defines a subgroup chain with prime
indices.
7.2. Prove that if G = ⟨H, y⟩ ≤ Sym(Ω), y normalizes H, and S is an SGS
for H then the algorithm of Lemma 7.1.2 constructs an SGS S* for G
relative to some base (β1, . . . , βM) such that for all i ∈ [1, M] the ratio
|βi^{G_{(β1,...,β_{i−1})}}|/|βi^{H_{(β1,...,β_{i−1})}}| of the fundamental orbit sizes divides |G : H|. In
particular, if |G : H| is a prime number then
(i) the fundamental orbits of H and G differ for exactly one base point;
(ii) all elements in S*\S are powers of some z ∈ G\H belonging to the
coset Hy.
7.3. Let H ≤ G and suppose that |H | = p m for a prime p. Prove that if H is
not a Sylow p-subgroup of G then there exists h ∈ Z (H )\{1} such that
|C G (h)| is divisible by p m+1 .
7.4. Let N ⊴ G and let P be a Sylow p-subgroup of N. Prove that G =
NG (P)N .
7.5. [Kantor and Taylor, 1988] Modify the methods of Section 7.3.1 to construct
and conjugate Hall subgroups of solvable groups.
7.6. Use the techniques of Section 7.3 to design a nearly linear-time deter-
ministic algorithm to construct a Sylow p-subgroup of a solvable group
G ≤ Sym() that contains a given p-subgroup of G.
7.7. Use the techniques of Section 7.3 to design a nearly linear-time construc-
tive version of one half of the Schur–Zassenhaus theorem. Namely, given
N ⊴ G ≤ Sym(Ω) with (|N|, |G/N|) = 1 and N solvable, construct a com-
plement H for N in G. Furthermore, given two complements H1 , H2 of
N , construct some g ∈ G that conjugates H1 to H2 .
7.8. How can random subproducts speed up the algorithms for computing
strong generating sets in solvable groups (Section 7.1), cores of subnor-
mal subgroups (Section 6.1.5), and Sylow subgroups of solvable groups
(Section 7.3.1)?
7.9. Show that the following are equivalent for a finite group G.
(i) The lower central series of G terminates at 1.
(ii) Every subgroup of G is subnormal.
(iii) Every proper subgroup of G is properly contained in its normalizer.
(iv) Every maximal subgroup of G is normal.
(v) The Sylow subgroups of G are normal in G.
(vi) G is the direct product of p-groups.
(vii) The upper central series of G terminates at G.
8
Strong Generating Tests

The fast constructions of strong generating sets are randomized and in prac-
tice we often use heuristics; therefore, it is necessary to test the correctness of
the output. Given an input group G = ⟨S⟩ ≤ Sym(Ω), suppose that we have
constructed an alleged base B = (β1, . . . , βm), generating sets Si for each
G^[i] := G_{(β1,...,β_{i−1})}, and transversals Ri for ⟨Si⟩ mod ⟨Si⟩_{βi}. The correctness of
the computation can be checked by forming all Schreier generators from Si , Ri
for i = 1, 2, . . . , m and sifting them through our data structure. However, this
method is too slow, and trying to avoid the formation of all Schreier generators
was the primary reason why we resorted to randomized constructions.
So far, we have seen two methods to check the result of an SGS computation. The most simple-minded way, which is very useful in practice, is to compare ∏_i |R_i| and |G| if the order of G is known in advance. Errors in SGS computations are one-sided in the sense that we never put permutations into S_i that are not in G^[i], and the possible error is that S_i generates only a proper subgroup of G^[i]. Therefore, if ∏_i |R_i| = |G| then the construction is correct.
The second method we have encountered is the Monte Carlo strong gen-
erating test described in Lemma 4.5.6. Although this procedure is reasonably
fast in practice if the SGS construction used one of the shallow Schreier tree
construction methods of Section 4.4, the drawback is that we cannot guarantee
correctness with absolute certainty.
In this chapter, we describe three further methods to check the correctness of
an SGS computation. By Lemma 4.2.3, it is enough to check that ⟨S_i⟩_{β_i} = ⟨S_{i+1}⟩ holds for each i. To this end, the first two methods we present solve the problem (4.6) in Section 4.3. Recall that (4.6) asks for deciding whether H = G_β, given the input G = ⟨S⟩, a transversal for G mod G_β, and H ≤ G_β. The third method
departs radically from the traditional point stabilizer approach and is based on
structural properties of G. By the time it confirms the correctness of the SGS,
it also computes a composition series and a short presentation for G.


8.1. The Schreier–Todd–Coxeter–Sims Procedure


One possibility for solving the problem (4.6) in Section 4.3 is a combination of
coset enumeration and the Schreier generator approach. The idea was originally
proposed in some unpublished notes of Sims in 1974 and in [Sims, 1978], and
the algorithm was developed in [Leon, 1980b]. A brief summary of the method
is in [Leon, 1980a].
We start with a crash course on coset enumeration from [Seress, 1997],
copyright © 1997 American Mathematical Society. Reprinted by Permission.
A very accessible and more thorough introduction to the subject can be found in
[Neubüser, 1982], and the monograph [Sims, 1994] contains a comprehensive
treatment of finitely presented groups.

8.1.1. Coset Enumeration


Let G = ⟨E | R⟩ be a presentation for a group G, where E = {g_1, ..., g_n} is a finite set of generators, and R = {r_1 = 1, ..., r_m = 1} is a set of defining
relations. Each ri is a word, using the generators in E and their inverses. The
basic questions are to decide whether G is finite and to determine whether a
given word represents the identity of G.
By the celebrated result of Novikov and Boone (cf. [Rotman, 1995, Chap.
12]), these questions are undecidable: They cannot be answered by a recursive
algorithm. Nevertheless, because of the practical importance of the problem,
considerable effort has been devoted to the development of methods for inves-
tigating finitely presented groups.
One basic method is the Todd–Coxeter coset enumeration procedure. Given G = ⟨E | R⟩ and H = ⟨h_1, ..., h_k⟩, where H ≤ G and each h_j is a word in
the generators of G and their inverses, our goal is to compute the permutation
representation of G on the right cosets of H .
We set up a coset table. This is a matrix M, where the rows are labeled
by positive integers, representing cosets of H , and the columns are labeled by
the elements of Ē := {g_1, ..., g_n, g_1^{-1}, ..., g_n^{-1}}. The entries (if defined) are
positive integers: M(k, g) = l if we know that kg = l for the cosets k, l and
for g ∈ Ē. Originally, we have a 1 × | Ē| table with no entries, where 1 denotes
the coset H · 1. As new cosets are defined, we add rows to the coset table.
Of course, we have to detect when two words, defining different rows of the
table, actually belong to the same coset of H . To this end, for each relation
ri = gi1 gi2 · · · git , we also maintain a relation table. This is a matrix Mi , with
rows labeled by the cosets 1, 2, . . . as defined in M and columns labeled by the
elements of the sequence (gi1 , gi2 , . . . , git ). The entry Mi (k, gi j ), if defined, is
the number of the coset kgi1 · · · gi j . Initially, we have Mi (k, git ) = k for each
row number k, since ri = 1 in G. Whenever a new coset is defined, we fill all
entries of the relation tables that we can.
Finally, for each generator h j = g j1 · · · g jt of H , we maintain a subgroup
table. This is a matrix S j with only one row, corresponding to the coset H · 1,
and columns labeled by the factors of h j . The rule for filling entries is the same
as for the Mi ; originally, S j (1, g jt ) = 1, since H h j = H .
When the last entry is filled in a row of a relation table or a subgroup table,
we also get an extra piece of information kg = l for some cosets k, l and g ∈ Ē.
This is called a deduction. If the entry M(k, g) is not yet defined then we fill the entries M(k, g), M(l, g^{-1}) and all possible entries in the relation and subgroup tables; in this way, we may get further deductions. If M(k, g) is already defined but l′ := M(k, g) ≠ l, then we realize that l and l′ denote the same coset of H. This is called a coincidence. We replace all occurrences of l and l′ by the smaller
of these two numbers and fill the entries of the tables that we can. This may
lead to further deductions and coincidences. The process stops when all entries
of the coset table, the relation tables, and subgroup tables are filled.
We illustrate these ideas by enumerating G = ⟨g_1, g_2 | g_1^2 = 1, g_2^2 = 1, (g_1g_2)^3 = 1⟩ ≅ S_3 on the cosets of the subgroup H = ⟨g_1g_2g_1g_2⟩ of order 3. Since both generators are involutions, we have Ē = E. Also, we maintain only one relation table, corresponding to (g_1g_2)^3 = 1; the other two relators tell us
that, at the definition of new cosets, we should multiply previous cosets by
g1 , g2 alternatingly. The display

      CT   g1   g2         RT   g1   g2   g1   g2   g1   g2
       1    2    3          1    2    4              3    1
       2    1    4          2    1    3              4    2
       3         1          3              4    2    1    3        (8.1)
       4         2          4              3    1    2    4

      ST   g1   g2   g1   g2
       1    2    4    3    1
shows the coset table CT, relation table RT, and subgroup table ST after the
definition of the cosets 1 := H, 2 := 1g1 , 3 := 1g2 , 4 := 2g2 . At that moment,
the last entry (in the second column) of ST is filled and we get the deduction
4g1 = 3, which also implies 3g1 = 4. Then all entries in CT are known, and
we can complete RT; this leads to the coincidences 1 = 4 and 2 = 3.
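To make the table mechanics concrete, the following short Python sketch (our own simplified data structures, not code from this book or from any coset enumeration package) scans a relator or a subgroup generator through a partial coset table and reports a deduction when exactly one entry of the scanned row is missing, or a coincidence when the row closes inconsistently. Applied to the subgroup table of the example above, it recovers the deduction 4g1 = 3; since the generators of the example are involutions, each generator is treated as its own inverse.

def scan(table, coset, word):
    """Scan `word` at `coset` in a partial coset table.

    table maps (coset, generator) -> coset.  Returns ('complete', None),
    ('coincidence', (k, l)) meaning cosets k and l coincide,
    ('deduction', (k, g, l)) meaning the new entry kg = l, or
    ('incomplete', None) if more than one entry of the row is missing.
    Generators are assumed to be involutions, so each g serves as its own inverse.
    """
    i, front = 0, coset                       # forward scan
    while i < len(word) and (front, word[i]) in table:
        front = table[(front, word[i])]
        i += 1
    j, back = len(word), coset                # backward scan
    while j > i and (back, word[j - 1]) in table:
        back = table[(back, word[j - 1])]
        j -= 1
    if i == j:                                # the row closed
        return ('complete', None) if front == back else ('coincidence', (front, back))
    if i + 1 == j:                            # exactly one gap: a deduction
        return ('deduction', (front, word[i], back))
    return ('incomplete', None)

# Partial coset table after defining 1 := H, 2 := 1g1, 3 := 1g2, 4 := 2g2.
T = {(1, 'g1'): 2, (2, 'g1'): 1, (1, 'g2'): 3,
     (3, 'g2'): 1, (2, 'g2'): 4, (4, 'g2'): 2}
print(scan(T, 1, ['g1', 'g2', 'g1', 'g2']))   # ('deduction', (4, 'g1', 3))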
There are different strategies for coset enumeration. The version we pre-
sented here, where we fill all consequences of new definitions, deductions, and
coincidences in all relation tables as soon as possible, is called the Felsch strat-
egy. However, it is possible to postpone the filling of some relation tables: If
we ensure that if a coset k is defined then sooner or later we define kg for all
g ∈ Ē, and sooner or later we deduce all consequences of these definitions,
then it is guaranteed that the algorithm terminates if |G : H | < ∞. This remark
plays an important role in the theoretical justification of the algorithm described
in Section 8.1.2, where we do not have a full presentation at the start of coset
enumeration and, as the coset enumeration progresses, we add relators to R.
From a theoretical point of view, we can suppose that all relators are already
present at the beginning; we just postpone the filling of entries in some relation
tables.
There is no recursive function of |G : H | and the input length that would
bound the number of cosets defined during the Todd–Coxeter procedure. It
is easy to give presentations of the trivial group such that no commonly used
variant of the Todd–Coxeter algorithm can handle them. This, and different coset
enumeration strategies, are discussed, for example, in [Sims, 1994, Chap. 5].
Success mostly depends on the number of entries defined in the coset table rather
than |G : H|. There are instances of successful enumeration with |G : H| > 10^6.

8.1.2. Leon’s Algorithm


Now we are ready to present the strong generating test using coset enumeration.
Leon calls his procedure the Schreier–Todd–Coxeter–Sims (STCS) method.
The factorization of Schreier generators as products of coset representatives
gives a set of defining relations (cf. Exercise 5.2). Hence, sifting the Schreier generators can be interpreted as checking that each relation holds. However,
this relator set is usually highly redundant. The idea behind STCS is to find a
smaller set R of relations and check only the elements of R.
In addition to the notation of (4.6) in Section 4.3, suppose that |G : G_β| = n and we already have a presentation ⟨E_1 | R_1⟩ for H. The latter assumption is
satisfied automatically as we process groups along a stabilizer chain.

The STCS Algorithm


We have to verify that |G : H| = n or find an element of G_β\H. To this end,
we perform a coset enumeration of G on the cosets of H . If H has n cosets then
H = G_β. We work with a presentation ⟨E | R⟩, where E is a set of symbols containing E_1 and a symbol for each generator of G. However, at the beginning we do not have relators for G and initialize R := R_1. During the procedure, we
add relators to R. The algorithm verifies that for each relator added to R, the
product of the corresponding elements of G is 1. Let E* denote the set of finite words over the alphabet E, and let ϕ: E* → G be the map that evaluates words in E* in G.
We fix a number m > n and we interrupt the coset enumeration when m
cosets are defined. There are three possibilities:

Case 1: The coset table closes with n cosets. In this case, |G : H| = n or, equivalently, H = G_β.
Case 2: The coset table closes with more than n cosets. Then H ≠ G_β and an element of G_β\H can be obtained by choosing two words w_1, w_2 from the coset table that represent different cosets of H but β^{ϕ(w_1)} = β^{ϕ(w_2)}. Then ϕ(w_1 w_2^{-1}) ∈ G_β\H.
Case 3: The coset enumeration was interrupted after defining m cosets. Then, as in Case 2, there are words w_1, w_2 in the coset table such that β^{ϕ(w_1)} = β^{ϕ(w_2)}. This means that g := ϕ(w_1 w_2^{-1}) ∈ G_β; we test whether g ∈ H. If the answer is "no" then we have found a witness for G_β ≠ H. If the answer is "yes" then we sift g as a word in H and obtain a word t in the strong generators of H such that the product of permutations in t is g. Let w denote the word corresponding to t in E*. We add the relator w_1 w_2^{-1} w^{-1} = 1 to R and perform the collapse w_1 = w_2 in the coset table. Then we resume the coset enumeration.

Owing to the elusive nature of coset enumeration, it is hard to analyze what happens. The finiteness of the procedure can be ensured by a careful choice of the words w_1, w_2 in Case 3. Namely, at each interruption of the coset enumeration,
we build a breadth-first-search tree T that contains the already defined cosets as
vertices. The tree is built the same way as at orbit calculations (cf. Section 2.1.1).
The root is coset 1, corresponding to H . After the ith level L i is defined, let L i+1
consist of those cosets k that do not occur on some already constructed level
and were defined as k := lg for some l ∈ L i , g ∈ E ∪ E −1 . This construction
can be easily done using the (partial) coset table we have at the interruption.
Since there are m cosets, the depth of T is at most m and each coset k is represented by a word w_k of length at most m in E ∪ E^{-1}. Also, for each vertex k of T, we compute β^{ϕ(w_k)}. Recall that we choose cosets k_1, k_2 such that β^{ϕ(w_{k_1})} = β^{ϕ(w_{k_2})} and add a relator of the form w_{k_1} w_{k_2}^{-1} w to R, where w is a word in E_1 ∪ E_1^{-1} and w is determined uniquely by w_{k_1} and w_{k_2}. Hence there are only finitely many possibilities to add a relator to R and so the procedure must terminate. In the implementation, whenever a coset k is added to T, β^{ϕ(w_k)} is computed immediately and the construction of T can be abandoned as soon as we encounter two words with β^{ϕ(w_{k_1})} = β^{ϕ(w_{k_2})}.
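The breadth-first construction of T and the collision test on the images of β can be sketched as follows (a Python illustration with a hypothetical data layout, not Leon's implementation): `table` is the partial coset table as before, and `perms` maps each generator symbol to its permutation image, written as a tuple indexed by the points of the permutation domain.

from collections import deque

def find_collision(table, perms, beta, root=1):
    words = {root: []}                  # coset k -> word w_k in the generators
    seen = {beta: root}                 # image beta^phi(w_k) -> coset k
    queue = deque([(root, beta)])
    while queue:
        k, point = queue.popleft()
        for g, perm in perms.items():
            l = table.get((k, g))
            if l is None or l in words:
                continue
            words[l] = words[k] + [g]
            image = perm[point]         # beta^phi(w_l)
            if image in seen:           # beta^phi(w_{k1}) = beta^phi(w_{k2})
                return words[seen[image]], words[l]
            seen[image] = l
            queue.append((l, image))
    return None                         # no collision among the defined cosets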

Of course, we need much more than finiteness of the procedure. Leon experi-
mented extensively with the value of the parameter m and the coset enumeration
technique to be used. He chose the value m = 1.2n and developed a general-
ization of one of the basic strategies for coset enumeration, the Haselgrove–
Leech–Trotter method. Note that if m < 2n then Case 2 cannot occur, since |G : H| > n means |G_β : H| ≥ 2 and so |G : H| = |G : G_β| · |G_β : H| ≥ 2n.
When the entire procedure terminates in Case 1, we have a presentation ⟨E | R⟩ for a group Ḡ that has a subgroup H̄ of index n, and G satisfies this presentation. Hence there is a homomorphism ψ: Ḡ → G such that ψ(H̄) = H. However, since relators for H are a subset of R, we must have H̄ ≅ H, ker(ψ) = 1, and Ḡ ≅ G. Hence we obtain a presentation for G, which is usually much shorter than the one described in Exercise 5.2.
The STCS algorithm is used in Magma.

8.2. Sims’s Verify Routine


We present another solution for (4.6) in Section 4.3. This method was developed
by Sims in the early 1970s when he constructed a permutation representation
for Lyon’s simple group, but it has not been published. Lecture notes from Sims
have circulated at least since 1972, and the procedure is implemented both in
GAP and Magma. Again, we use the notation of (4.6).

Lemma 8.2.1. Suppose that for all δ ∈ β^G there exist sets U(δ) ⊆ G satisfying the following properties:

(i) (∀u ∈ U(δ)) (β^u = δ) and U(β) ⊆ H;
(ii) (∀u, v ∈ U(δ)) (uv^{-1} ∈ H); and
(iii) (∀x ∈ S) (∃u ∈ U(δ)) (∃v ∈ U(δ^x)) (uxv^{-1} ∈ H).

Then H = G_β.

Proof. We use the idea of the proof of Lemma 4.2.1. Let g ∈ G_β be arbitrary. Since G_β ≤ G, g can be written in the form g = x_1 x_2 ··· x_k for some nonnegative integer k and x_i ∈ S for i ≤ k. Let δ_i := β^{x_1···x_i}, for 0 ≤ i ≤ k. By (iii), for each i ∈ [1, k] there exist u_{i−1} ∈ U(δ_{i−1}) and v_i ∈ U(δ_i) such that u_{i−1} x_i v_i^{-1} ∈ H. Also, v_i u_i^{-1} ∈ H by (ii) and u_0^{-1}, v_k ∈ H by (i). Hence

    g = x_1 x_2 ··· x_k = u_0^{-1} (u_0 x_1 v_1^{-1}) (v_1 u_1^{-1}) (u_1 x_2 v_2^{-1}) ··· (v_{k−1} u_{k−1}^{-1}) (u_{k−1} x_k v_k^{-1}) v_k ∈ H.

Since g ∈ G_β was arbitrary, we have H = G_β. □
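As a sanity check, the conditions of Lemma 8.2.1 can be verified by brute force on a tiny example. The following Python fragment is an illustration only (it takes U(δ) to be all elements mapping β to δ, nothing like the economical choice made below): it enumerates G = S_4 acting on {0, 1, 2, 3}, sets β = 0 and H = G_β, and checks (i)–(iii) directly.

def mul(p, q):                      # apply p first, then q
    return tuple(q[p[i]] for i in range(len(p)))

def inv(p):
    q = [0] * len(p)
    for i, j in enumerate(p):
        q[j] = i
    return tuple(q)

S = [(1, 2, 3, 0), (1, 0, 2, 3)]    # a 4-cycle and a transposition generate S_4
G = {tuple(range(4))}
frontier = list(G)
while frontier:                     # enumerate G by right multiplication
    g = frontier.pop()
    for s in S:
        h = mul(g, s)
        if h not in G:
            G.add(h)
            frontier.append(h)

beta = 0
H = {g for g in G if g[beta] == beta}                       # the point stabilizer
U = {d: [g for g in G if g[beta] == d] for d in range(4)}   # beta^G = {0, 1, 2, 3}

assert all(u[beta] == d for d, us in U.items() for u in us) and set(U[beta]) <= H  # (i)
assert all(mul(u, inv(v)) in H for us in U.values() for u in us for v in us)       # (ii)
assert all(any(mul(mul(u, x), inv(v)) in H for u in U[d] for v in U[x[d]])
           for d in U for x in S)                                                  # (iii)
print("(i)-(iii) hold, so H = G_beta by Lemma 8.2.1")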


Let H = ⟨S_1⟩. We consider the special case when S = S_1 ∪ {z}; that is, G is generated by H and one additional element. We would like to define sets U(δ) that can be used in (i)–(iii) of Lemma 8.2.1.
Let Δ_1 = {β} and let β^G = Δ_1 ∪ ··· ∪ Δ_l be the decomposition of the G-orbit of β into orbits of H. For each i ∈ [1, l], we fix a representative δ_i ∈ Δ_i and u(δ_i) ∈ G such that β^{u(δ_i)} = δ_i. We may choose u(δ_1) := 1. After that, for δ ∈ Δ_i, we define

    U(δ) := { u(δ_i)h | h ∈ H, δ_i^h = δ }.        (8.2)
It is clear that the sets U (δ) satisfy (i).
Recall that we consider the special case when G is generated by H and one additional element z. Let γ := β^{z^{-1}} and, for 1 ≤ i ≤ l, let Δ_i = Δ_{i1} ∪ ··· ∪ Δ_{il_i} be the decomposition of Δ_i into orbits of H_γ. We fix a representative γ_{ij} of each H_γ-orbit Δ_{ij}. Since δ_i and γ_{ij} are in the same H-orbit, we can also fix y(γ_{ij}) ∈ H such that δ_i^{y(γ_{ij})} = γ_{ij}. We also define u(γ_{ij}) := u(δ_i) y(γ_{ij}). Finally, if the representative of the H_γ-orbit of γ_{ij}^z is γ_{k j_k} then we fix v(γ_{ij}^z) := u(γ_{k j_k}) y for some y ∈ H_γ such that β^{v(γ_{ij}^z)} = γ_{ij}^z.

Lemma 8.2.2. If the following conditions (a), (b), and (c) hold then the sets U(δ) defined in (8.2) satisfy (ii) and (iii) of Lemma 8.2.1.

(a) H_{δ_i}^{u(δ_i)^{-1}} ≤ H for all i ∈ [1, l];
(b) u(γ_{ij}) z v(γ_{ij}^z)^{-1} ∈ H for all i ∈ [1, l] and for all j ∈ [1, l_i]; and
(c) H_γ^z ≤ H.

Proof. Let us fix δ ∈ Δ_i. First we check (ii). If u(δ_i)h_1, u(δ_i)h_2 ∈ U(δ) then δ_i^{h_1} = δ_i^{h_2} = δ and so h_1 h_2^{-1} fixes δ_i. Hence the ratio

    u(δ_i)h_1 (u(δ_i)h_2)^{-1} = u(δ_i) (h_1 h_2^{-1}) u(δ_i)^{-1} ∈ u(δ_i) H_{δ_i} u(δ_i)^{-1},

and by (a) it is in H.
For (iii), we have to consider two cases: x ∈ S_1 or x = z. If x ∈ S_1 then let u = u(δ_i)h ∈ U(δ) be arbitrary, and v := ux. Note that δ^x is in the same H-orbit Δ_i as δ, so v = u(δ_i)(hx) ∈ U(δ^x). Moreover, uxv^{-1} = 1 ∈ H.
Finally, let x = z. Let γ_{ij} be the representative of the H_γ-orbit of δ, and fix w ∈ H_γ such that γ_{ij}^w = δ. Define u := u(γ_{ij}) w. Then u = u(δ_i)(y(γ_{ij})w) ∈ U(δ). We also define v := v(γ_{ij}^z) w^z. By (c), w^z ∈ H, so, if the representative of the H_γ-orbit of γ_{ij}^z is γ_{k j_k}, then v = u(δ_k) h for some h ∈ H. We have β^v = β^{v(γ_{ij}^z) w^z} = (γ_{ij}^z)^{z^{-1}wz} = δ^z, and so v ∈ U(δ^z). By (b),

    uzv^{-1} = u(γ_{ij}) wz (z^{-1}w^{-1}z) v(γ_{ij}^z)^{-1} = u(γ_{ij}) z v(γ_{ij}^z)^{-1} ∈ H,

so (iii) holds. □

The Verify Routine


To solve (4.6), we have to check whether (a), (b), and (c) of Lemma 8.2.2 are satisfied. On one hand, it is clear that H_{δ_i}^{u(δ_i)^{-1}} ≤ G_β, u(γ_{ij}) z v(γ_{ij}^z)^{-1} ∈ G_β, and H_γ^z ≤ G_β, so if (a), (b), and (c) do not hold then we have constructed a witness for H ≠ G_β. On the other hand, if (a), (b), and (c) are satisfied then, by Lemmas 8.2.2 and 8.2.1, we have H = G_β.
The required orbit computations of this algorithm are deterministic and can
be done in nearly linear time. If the transversals for the point stabilizer chain of
H are stored in shallow Schreier trees then each sifting required in (b), and each
base change required in (a) and (c), can also be executed by a nearly linear-
time deterministic or by a nearly linear-time Las Vegas algorithm. Hence the
time requirement of the algorithm is O(nM log^c |G|), where n is the size of the permutation domain and M is the number of H_γ-orbits in β^G.
Now we handle the general case. Let S = {s_1, ..., s_k} and G(i) := ⟨H, s_1, ..., s_i⟩. Recursively for i = 1, 2, ..., k, we shall check whether G(i)_β = H.
The case i = 1 is discussed in the previous paragraph. Suppose that we have
verified that G(i − 1)β = H for some i ≥ 2 and have constructed a Schreier
tree data structure for G(i − 1). Our next aim is to decide whether G(i)β = H .
We sift s_i in G(i−1). If s_i ∈ G(i−1) then G(i) = G(i−1) and we are done. If s_i has a nontrivial siftee g then there are two cases. The first one is that β^g = β, which means that g ∈ G(i)_β\G(i−1) and so it is a witness for G(i)_β ≠ H.
The second case is that β^g ≠ β. This can happen only if the sifting procedure failed on the first level because β^g ∈ β^{G(i)}\β^{G(i−1)}. In this case, let Σ := β^{G(i)} and Δ := β^{G(i−1)}. If indeed G(i)_β = H then G(i) > G(i−1) ≥ G(i)_β and Δ is a block of imprimitivity for the G(i)-action on Σ, so our first task is to check whether Δ is a block. This is exactly the situation we have encountered in Section 5.5.1, in Step 4 of the construction of a block system. By Lemma 5.5.5, in nearly linear time it is possible to check whether Δ is a block and, if it is not, to construct some h ∈ G(i)_β\H. (We leave to the reader to check the details that the algorithm described in Lemma 5.5.5(b) works in the situation considered here, with G(i−1) playing the role of ⟨H, r_λ⟩ in Lemma 5.5.5.)
If Δ is a block of imprimitivity then the G(i)-images of Δ form a block system. In the G(i)-action on this block system, the "point" Δ is stabilized by G(i−1); therefore, since G(i) is generated by G(i−1) and one additional element, we can use the special case of Sims's Verify routine discussed earlier in this section to check whether G(i)_Δ = G(i−1). If indeed G(i)_Δ = G(i−1) then |G(i)| = (|Σ|/|Δ|)|G(i−1)| = (|Σ|/|Δ|)|Δ||H| = |Σ||H| and |G(i) : G(i)_β| = |Σ|, so G(i)_β = H. However, if G(i)_Δ ≠ G(i−1) then the algorithm constructs some h_1 ∈ G(i)_Δ\G(i−1) to witness this fact. Taking h_2 ∈ G(i−1) such that β^{h_1} = β^{h_2}, we obtain h_1 h_2^{-1} ∈ G(i)_β\H.
Versions of Sims’s Verify routine are implemented both in Magma and GAP.
The bottleneck to nearly linear running time and to good practical performance
is that in the special case G = ⟨H, z⟩, the running time is proportional to the number of orbits of H_γ, for γ = β^{z^{-1}}. To alleviate this problem, the Verify
routine is combined with the Schreier–Todd–Coxeter–Sims method (cf. Sec-
tion 8.1) in Magma. This combination is referred to as the Brownie–Cannon–
Sims algorithm. Since this algorithm is unpublished, we do not know what
criterion on the number of orbits of Hγ is used to choose between Verify and
STCS.
The GAP implementation constructs a maximal block system for the G-action on β^G, and z ∈ G such that z fixes the block B containing β. Then it verifies that ⟨H, z⟩_β = H and, recursively by the same method for computing block systems, that G_B = ⟨H, z⟩ for the point stabilizer G_B in the action on the block system. Block computations are so cheap that their cost seems to be more than offset by the decrease of the number of orbits of H_γ, which we achieve with
that choice of z. In essence, this modification inserts a chain of subgroups
between G and G β , which is in agreement with the general philosophy behind
permutation group algorithms, namely, that we try to define a subgroup chain
G = G 1 ≥ G 2 ≥ · · · ≥ G m = 1 with small indices |G i : G i+1 |.
As mentioned in Section 4.5.1 and discussed further at the end of Section 5.2,
GAP employs a heuristic version of the nearly linear-time Monte Carlo SGS
construction discussed in the above mentioned sections. The user can choose
whether the output should be checked with a Monte Carlo algorithm by apply-
ing Lemma 4.5.6 or its correctness should be decided with certainty, applying
Sims’s Verify routine. Not only do both strong generating tests give a yes/no
answer but, in the case of incorrect SGS, they construct an element of the input
group that can be the starting point of the continuation of the SGS construction.
However, the Verify routine is a quite expensive way to discover that an SGS
is not correct, and GAP tries to call Verify only for correct inputs. Therefore,
another heuristic is built in: If the user asks for Verify then, as a first step, ten
random elements are tested, as described in Lemma 4.5.6; Verify is called only
after these elements pass the test.

8.3. Toward Strong Generators by a Las Vegas Algorithm


In this section, we present a nearly linear-time Las Vegas algorithm that con-
structs a strong generating set in groups that satisfy some mild restrictions on
their composition factors. The restriction on the composition factors is necessary
because at present we do not have nearly linear-time constructive recognition
algorithms, which we shall define shortly, for all finite simple groups. We shall
follow the description of the algorithm from [Kantor and Seress, 1999], which
makes clear how the class of groups covered by the algorithm extends when the
nearly linear-time constructive recognition of further simple groups becomes
available. In Theorem 8.3.2, we shall give the current list of simple groups that
can be recognized constructively in nearly linear time.
The algorithm can be used as a strong generating test by comparing the order
of the input group it computed with the order computed from the SGS to be
tested. As we emphasized earlier, if the initial SGS computation is correct then
the nearly linear-time algorithms described in Chapters 5 and 6 are deterministic
or of Las Vegas type; therefore, for groups with restricted composition factors,
the entire nearly linear-time library of algorithms is upgraded to Las Vegas
type.
As promised, we give the definition of constructive recognition of simple
groups. We formulate constructive recognition in the black-box group setting,
since its applications extend beyond permutation group algorithms: Construc-
tive recognition of simple groups is a central concept in matrix group algorithms
as well. Let F be a family of simple groups and let f : F → R be a function
taking positive values. We say that F is black-box f -recognizable if, whenever
a group H = ⟨S⟩ isomorphic to a member of F is given as a black-box group
encoded by strings of length at most N and, in the case of Lie-type H , the
characteristic of H is given, there are Las Vegas algorithms for (i) and (ii) and
a deterministic algorithm for (iii) of the following:
(i) Find the isomorphism type of H .
(ii) Find a new set S* of size O(log |H|) generating H and a presentation of length O(log^2 |H|) such that S* satisfies the presentation. (Note that
log |H | ≤ N , and this presentation proves that H has the isomorphism
type determined in (i).)
(iii) Given h ∈ H, find a straight-line program of length O(N) from S* to h. (Straight-line programs were defined in Section 1.2.3; a small evaluation sketch in code is given after this list.)
Moreover,
(iv) The algorithms for (i)–(iii) run in time O((ξ + µ) f (H )N c ), where ξ is an
upper bound on the time requirement per element for the construction of
independent, (nearly) uniformly distributed random elements in subgroups
of H, µ is an upper bound on the time required for each group operation
in H , and c is an absolute constant.
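Straight-line programs appear in (ii)–(iv) above. As a concrete illustration (a toy convention of our own, not the book's data structure from Section 1.2.3), the Python sketch below stores a program as a list whose entries are either a generator symbol, a pair (j, −1) meaning "invert line j," or a pair (j, k) meaning "multiply line j by line k," and evaluates it in a permutation group.

def evaluate_slp(program, generators):
    """generators: dict mapping symbol -> permutation (0-based tuple)."""
    def mul(p, q):               # apply p first, then q
        return tuple(q[p[i]] for i in range(len(p)))
    def inv(p):
        q = [0] * len(p)
        for i, j in enumerate(p):
            q[j] = i
        return tuple(q)
    lines = []
    for entry in program:
        if entry in generators:                 # a generator line
            lines.append(generators[entry])
        else:                                   # (j, -1) or (j, k)
            j, k = entry
            lines.append(inv(lines[j]) if k == -1 else mul(lines[j], lines[k]))
    return lines[-1]                            # the value of the program

# Example: with a = (0 1 2) and b = (0 1), the program computes a b a^{-1}
# (composition read left to right).
gens = {'a': (1, 2, 0), 'b': (1, 0, 2)}
print(evaluate_slp(['a', 'b', (0, -1), (0, 1), (3, 2)], gens))   # (2, 1, 0)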
As we have seen in Section 5.3, a permutation group G ≤ Sym(Ω) is iso-
morphic to a black-box group H over an alphabet consisting of the labels in
a Schreier tree data structure of G. Let ψ : G → H denote this isomorphism.
By Lemma 5.3.1, the quantities ξ, µ, N for H in part (iv) of the definition of constructive recognition are bounded from above by log^{c′} |G| for some absolute constant c′. Moreover, for any g ∈ G, ψ(g) can be computed in O(log^{c′} |G|) time, and for any h ∈ H, ψ^{-1}(h) can be computed in O(|Ω| log^{c′} |G|) time.
Therefore, a constructive recognition algorithm runs in nearly linear time
for permutation group inputs if f is bounded from above by a nearly linear
function of the degree of the input group. This fact motivates the consideration
of the function m : G → R, where G is the family of all finite simple groups and
m(G) is the degree of the smallest faithful permutation representation of G.

Theorem 8.3.1. Given a permutation group G = ⟨T⟩ ≤ Sym(Ω), with |Ω| = n, such that all composition factors of G are m-recognizable, a base and strong generating set for G can be computed by a nearly linear-time Las Vegas algorithm.

Proof. We compute an alleged base and strong generating set, and a composi-
tion series G = N_1 ▷ N_2 ▷ ··· ▷ N_l = 1, by the nearly linear-time Monte Carlo
algorithms in Sections 4.5 and 6.2. The composition series algorithm also pro-
vides homomorphisms ϕi : Ni → Sn with ker(ϕi ) = Ni+1 , for 1 ≤ i ≤ l − 1.
We also compute strong generating sets for all Ni relative to the base of G.
We will verify the correctness of the base and strong generating sets for the
subgroups Ni by induction on i = l, l − 1, . . . ,1.
Suppose that we already have verified an SGS for Ni+1 . We compute a base,
SGS, shallow Schreier-tree data structure, and an isomorphism ψi : ϕi (Ni ) →
Hi with a black-box group Hi for the image ϕi (Ni ) ≤ Sn , which is allegedly
isomorphic to a simple group. Our first goal is to recognize Hi , and so ϕi (Ni ),
constructively.
As a consequence of the classification of finite simple groups, we know that
there are no three pairwise nonisomorphic simple groups of the same order.
Therefore, since we know |ϕi (Ni )|, we have at most two candidate simple
groups C for the isomorphism type of ϕi (Ni ), and in the ambiguous cases we
try both possibilities. Also, if |ϕi (Ni )| > 8!/2 then |ϕi (Ni )| determines whether
ϕi (Ni ) is of Lie type, and if it is, its characteristic. Hence, using that ϕi (Ni )
is from an m-recognizable family, we can obtain by a nearly linear-time Las
Vegas algorithm a generating set Si∗ of Hi of size O(log |Hi |) and a presentation
⟨E_i | R_i⟩ of length O(log^2 |H_i|) satisfied by S_i^*. Moreover, for any given h ∈
Hi , we can write a straight-line program of length O(log |Hi |) from Si∗ to
h in nearly linear time by a deterministic algorithm. By Lemma 5.3.1, the
preimage Si∗∗ = ψi−1 (Si∗ ) ⊆ ϕi (Ni ) can also be obtained in nearly linear time
by a deterministic algorithm.

Now the correctness of the SGS for Ni can be proved in the following way: Let
Ti be the set of generators of Ni computed by the composition series algorithm.
We check that
(a) N_{i+1} ⊴ N_i and N_i ≠ N_{i+1};
(b) ϕ_i^{-1}(S_i^{**})N_{i+1}/N_{i+1} satisfies the presentation ⟨E_i | R_i⟩ computed for ϕ_i(N_i) ≅ H_i; and
(c) T_i ⊂ ⟨ϕ_i^{-1}(S_i^{**})⟩N_{i+1}, where ϕ_i^{-1}(g) denotes a lift (i.e., an arbitrary preimage) of g ∈ S_i^{**} ⊆ ϕ_i(N_i).
If (a), (b), and (c) hold then |Ni | = |Ni+1 ||ϕi (Ni )|. If |Ni | is equal to the
value for |Ni | computed from the alleged SGS of Ni then the SGS construction
is correct, since the alleged order of a group computed by a Monte Carlo SGS
construction is not greater than the true order, with equality if and only if the
SGS construction is correct.
We indicate how (a), (b), and (c) can be checked algorithmically. For (a),
we conjugate the generators of Ni+1 by the elements of Ti and check that the
resulting permutations are in Ni+1 . Because the correctness of the SGS for
Ni+1 is already known, membership testing giving guaranteed correct results
is available for that group. Also, we check that not all elements of Ti are in
N_{i+1}. For (b), we multiply out the relators that were written in terms of S_i^*, using the permutations in ϕ_i^{-1}(S_i^{**}) = ϕ_i^{-1}ψ_i^{-1}(S_i^*); then we check that the resulting permutations are in N_{i+1}. Finally, for (c), for each t_i ∈ T_i we write a straight-line program P from S_i^* to ψ_iϕ_i(t_i) ∈ H_i and evaluate P starting from ϕ_i^{-1}ψ_i^{-1}(S_i^*) (recall that straight-line programs can be evaluated in any group, giving values to the symbols representing generators in the program). The result of the evaluation is some t* ∈ ⟨ϕ_i^{-1}ψ_i^{-1}(S_i^*)⟩ and we check that t* t_i^{-1} ∈ N_{i+1}.
By (b) and (c), we have checked that the factor group N_i/N_{i+1} ≅ C.
At the end of the induction, we have obtained a correct SGS for the group
N_1 = ⟨T_1⟩ that was output by the composition series algorithm. After that, we
verify that G = N1 by sifting the elements of the original generating set T
in N1 .
To justify the nearly linear running time of the entire algorithm, note that
l ∈ O(log |G|), so it is enough to show that the ith step of the induction for
a fixed value i ≤ l runs in nearly linear time. We have already seen that the
constructions of both S_i^* and of the presentation for ϕ_i(N_i) ≅ H_i are within this time bound. Since both |T_i| and |S_i^*| are O(log |G|), whereas the length of the presentation ⟨E_i | R_i⟩ is O(log^2 |G|) and the Schreier-tree data structure
of Ni+1 is shallow, the number of permutation multiplications that occur while
checking (a), (b), and (c) is bounded from above by a polylogarithmic function
of |G|.
We note that, to achieve an algorithm failure with probability less than ε, we
have to require that calls to the SGS construction algorithm and to constructive
recognition fail with probability less than ε/(c log |G|) for some constant c,
because, during the induction, O(log |G|) such calls may be made; however,
this multiplies the running time only by an O(log log |G|) factor. □

To apply the theorem to a wide class of groups, we need m-recognizable simple groups. The current state of the art is summarized in the following theorem.

Theorem 8.3.2. The set of all finite simple groups, with the possible exceptions
of the groups 2F4 (q) and 2G 2 (q), comprises a black-box m-recognizable family.

We do not prove Theorem 8.3.2; instead we just make some comments and
give the appropriate references. Cyclic simple groups are m-recognizable by
brute force, and the twenty-six sporadic simple groups can be handled with a
constant number of black-box group operations.
The alternating groups comprise a 1-recognizable family (i.e., one can take
f (G) = 1 in the definition of constructive recognition for all alternating groups
G). This result was first proven in [Beals and Babai, 1993], and more effi-
cient versions are described in [Bratus and Pak, 2000] and [Beals et al., 2002].
We shall return to the recognition of alternating and symmetric groups in Sec-
tion 10.2.
Classical simple groups are treated in [Kantor and Seress, 2001]. We note
that in that paper there is no algorithm for the constructive recognition of three-
dimensional unitary groups, since at the time of writing it was not known
whether the groups G ∼ = PSU3 (q) have presentations of length O(log2 |G|), as
required in part (ii) of the definition. Such presentations are described in [Hulpke
and Seress, 2001]. A predecessor of [Kantor and Seress, 2001] is [Cooperman
et al., 1997], where a constructive recognition algorithm for the groups GLn (2)
is given. In [Bratus, 1999], that method is extended to all special linear groups
PSLn (q).
Constructive recognition algorithms for the exceptional groups of Lie type
are in [Kantor and Magaard, 2002]. Both [Kantor and Seress, 2001] and
[Kantor and Magaard, 2002] far exceed the scope of this book.
Actually, in all these cited references, much more is done than parts (i)–
(iii) of constructive recognition: Given a black-box group H isomorphic to
some simple group S, an isomorphism λ : H → C, defined by the images of
the elements of S ∗ , is constructed between H and a standard copy C of S,
together with procedures to compute the image of any element of H under
λ or of any element of C under λ−1 . These procedures are very useful for
further computations with groups containing S as a composition factor, such
as the construction of Sylow subgroups (cf. [Kantor, 1985b; Morje, 1995]) or
the computation of maximal subgroups or conjugacy classes (cf. [Cannon and
Souvignier, 1997; Hulpke, 2000; Eick and Hulpke, 2001]). Another possible
application of these procedures is in computation with matrix groups.
The “standard copy” of a simple group S referred to in the previous paragraph
is the natural permutation representation of degree n if S is the alternating group
An , matrices modulo scalars in the correct dimension, over the field of definition
if S is a classical group, and a presentation utilizing the Bruhat decomposition
if S is an exceptional group of Lie type.
In [Kantor and Seress, 2001] and [Kantor and Magaard, 2002], there are also
more precise timings of algorithms than required in Theorem 8.3.1. It is shown
that the family of Lie-type simple groups, with the possible exception of the
groups 2F4(q) and 2G2(q), is black-box f-recognizable for the function

            q^{3/2}   if G ≅ PSU_d(q^{1/2}) for some d
            q         for all other classical G defined on a vector space over GF(q)
   f(G) =   q^2       if G ≅ 2B2(q)
            q^3       for other exceptional groups defined over GF(q) except
                      G ≅ 2F4(q) and 2G2(q).

It seems very likely that the set of all groups of Lie type is a black-box f -
recognizable family with some f (G) ≤ m(G).
The algorithm described in Theorem 8.3.1 has not been implemented. Al-
though the running time is nearly linear, computing the order of the input
group by this algorithm involves a lot of superfluous work. However, it may
be worthwhile to verify the correctness of an SGS computation in this way
if the constructive recognition of the composition factors is needed in further
computations. Besides the examples of computing Sylow subgroups, maximal
subgroups, or conjugacy classes that we have mentioned earlier, this is the case
at the construction of a short presentation, as described in the next section.
Constructive recognition of simple groups is even more important in the
matrix group setting, where every attempt so far for finding the structure of
a matrix group has led to some version of the constructive recognition prob-
lem (cf. [Babai and Beals, 1999; Leedham-Green, 2001; Kantor and Seress,
2002]).
Currently, constructive recognition of black-box groups isomorphic to alter-
nating, special linear, symplectic, and unitary groups is implemented in GAP,
but these algorithms are not in the standard distribution.
We finish this section with a lemma that will be needed in Section 8.4.

Lemma 8.3.3. Let G ≤ Sym(Ω), with |Ω| = n, be a permutation group, such that all composition factors of G are m-recognizable. Suppose that the following have already been computed, as in the proof of Theorem 8.3.1: a composition series G = N_1 ▷ N_2 ▷ ··· ▷ N_l = 1, homomorphisms ϕ_i : N_i → S_n with ker(ϕ_i) = N_{i+1}, and presentations satisfied by generating sets S_i^{**} ⊂ ϕ_i(N_i). Then any g ∈ G can be reached from ∪_i ϕ_i^{-1}(S_i^{**}) by a straight-line program of length O(log |G|), and such a straight-line program can be computed in nearly linear time by a deterministic algorithm.

Proof. By induction on i = 1, 2, ..., l, we construct a straight-line program P_i of length O(log(|G|/|N_i|)) to some g_i ∈ G such that g g_i^{-1} ∈ N_i. Let g_1 := 1. If g_i has already been obtained for some i, then we write a straight-line program L_i of length O(log |N_i/N_{i+1}|) from S_i^* to ϕ_i(g g_i^{-1}). In the case when N_i/N_{i+1} is cyclic or sporadic, this can be done by brute force. In the other cases, we use an isomorphism ψ_i between ϕ_i(N_i) and a black-box group H_i, as in the proof of Theorem 8.3.1, and the fact that ψ_iϕ_i(N_i) is black-box m-recognizable. We evaluate this straight-line program starting from ϕ_i^{-1}(S_i^{**}), producing an element h_i ∈ N_i. Here, g g_i^{-1} h_i^{-1} ∈ N_{i+1}, and we can define g_{i+1} := h_i g_i. The
straight-line program Pi+1 reaching gi+1 is defined as the concatenation of Pi
and L i , with the added term (w j , wk ) for the entries w j ∈ L i and wk ∈ Pi ,
which evaluate to h i and gi , respectively.
Finally, we notice that the entire procedure runs in nearly linear time, since
m(Ni /Ni+1 ) ≤ n for all i. 
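In the toy straight-line program convention of the sketch given in Section 8.3, the concatenation step in this proof amounts to index shifting plus one extra multiplication line; the following Python fragment is a hedged illustration under that assumed convention, not the book's implementation.

def concatenate_slps(P, L):
    """Return a program for h*g, where P evaluates to g and L evaluates to h."""
    offset = len(P)
    shifted = []
    for entry in L:
        if isinstance(entry, tuple):            # (j, -1) or (j, k): shift indices
            j, k = entry
            shifted.append((j + offset, k if k == -1 else k + offset))
        else:                                   # a generator symbol: copy as is
            shifted.append(entry)
    # The last line of the shifted L evaluates to h_i and the last line of P
    # to g_i; the final term multiplies them, giving g_{i+1} = h_i g_i.
    return P + shifted + [(offset + len(L) - 1, offset - 1)]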

8.4. A Short Presentation


In strong generating set constructions for a permutation group G ≤ Sym(Ω),
we defined a subgroup chain G = G 1 > G 2 > · · · > G l = 1, constructed
generators Si for the subgroups G i , and tested that Si indeed generates G i . In
all but one of the SGS constructions, the subgroups G i comprised a point
stabilizer chain relative to some nonredundant base, whereas in the nearly
linear-time Las Vegas algorithm presented in Section 8.3, the subgroups G i
defined a composition series of G. In this section, we describe how the test-
ing phase of these SGS constructions can be used to construct a presentation
for G.
In the variants using a point stabilizer chain, the testing phase checks for
i ∈ [1, l − 1] whether or not a suitable subset of Schreier generators obtained
from Si and a transversal Ri for G i mod G i+1 are in G i+1 . This subset of Schreier
generators defines a presentation for G in the following way. We introduce a set of symbols E_i corresponding to S_i, and we write the presentation using the generating set E := ∪_i E_i. Each element of the transversal R_i corresponds to a word in E_i, and hence each Schreier generator s_i constructed from S_i and R_i corresponds to a word w(s_i) using the symbols in E_i. Since s_i ∈ G_{i+1}, it has a standard word decomposition (cf. Section 5.3), which corresponds to a word w̄(s_i) using the symbols in ∪_{j>i} E_j. We claim that ⟨E | R⟩, where R is the set of relators w(s)w̄(s)^{-1} = 1 for the Schreier generators s used in the checking phase in the SGS construction, is a presentation for G.
In the case of Sims’s original SGS construction, when all Schreier generators
composed from Si and Ri are tested for all i ∈ [1, l], our claim is just Exer-
cise 5.2. In the case of the Schreier–Todd–Coxeter–Sims method described in
Section 8.1.2, the SGS test itself constructs the presentation described in our
claim, and we have seen in Section 8.1.2 that ⟨E | R⟩ is indeed a presentation
for G. Finally, in the case of Sims’s Verify routine in Section 8.2, the Schreier
generators used in the presentation can be obtained from Lemma 8.2.2, and the
proof that all relations corresponding to other Schreier generators are conse-
quences of the relators in R can be deduced from Lemma 8.2.1. We leave the
details of this argument to the reader.
Another method for reducing the set of relations corresponding to Schreier
generators, while of course maintaining that the remaining relators define a
presentation for G, is in [Cannon, 1973]. A recent improvement of Cannon’s
method is described in [Gebhardt, 2000].
The disadvantage of presentations obtained from SGS constructions using
a point stabilizer chain is that the number of relators, and so the length of the
presentation, can be proportional to the degree n of the permutation domain.
(The length of a presentation is simply the number of symbols needed to write
down the presentation.) The main purpose of this section is to construct in
nearly linear time a presentation of length polylogarithmic in |G|, in groups
G where all composition factors are m-recognizable. The method is based
on the strong generating test described in Theorem 8.3.1. The existence of a
presentation of length O(log^C |G|), provided that all composition factors of G have presentations of length O(log^{C−1} |G|) for some absolute constant C ≥ 3,
was proven in [Babai et al., 1997a]. The following algorithmic version of this
result for groups with m-recognizable composition factors is from [Kantor and
Seress, 1999].

Theorem 8.4.1. Let G ≤ Sym(Ω), with |Ω| = n, be a permutation group, such that all composition factors of G are m-recognizable. Then it is possible to construct a presentation of length O(log^3 |G|) for G by a nearly linear-time Las Vegas algorithm.
Proof. We use the Las Vegas algorithm of Theorem 8.3.1 to compute a com-
position series G = N_1 ▷ N_2 ▷ ··· ▷ N_l = 1 and, for each i ∈ [1, l − 1], a homomorphism ϕ_i : N_i → S_n with ker(ϕ_i) = N_{i+1}, together with a presentation ⟨E_i | R_i⟩ of length O(log^2 |N_i/N_{i+1}|) for N_i/N_{i+1}, satisfied by generating sets S_i^{**} ⊂ ϕ_i(N_i) of cardinality O(log |N_i/N_{i+1}|).
Next, for each i < l, we compute a set Ti = {ϕi−1 (g) | g ∈ Si∗∗ } ⊆ Ni ,
where ϕi−1 (g) denotes an arbitrary preimage of g under the homomorphism ϕi .
Then, corresponding to parts (a) and (b) of the SGS test in Theorem 8.3.1, we
compute a set of permutations by the following two methods. The first method
computes all permutations p(ti , t j ) = ti−1 t j ti , where 1 ≤ i < j ≤ l − 1 and
ti ∈ Ti , t j ∈ T j . Since Ni+1  Ni , we have p(ti , t j ) ∈ Ni+1 . The second method
computes all permutations p(i, r ), where 1 ≤ i ≤ l − 1 and r = 1 is a relator
in Ri . The permutation p(i, r ) ∈ G is computed by multiplying out r , using
the appropriate elements of Ti . Since r = 1 is satisfied in Ni /Ni+1 , we have
p(i, r ) ∈ Ni+1 .
Since the number of generators is O(log |N_i/N_{i+1}|) and the total length of relators is O(log^2 |N_i/N_{i+1}|) in ⟨E_i | R_i⟩, all permutations p(t_i, t_j), p(i, r) can be computed using O(log^2 |G|) permutation multiplications. Moreover, the total number of the permutations p(t_i, t_j), p(i, r) is O(log^2 |G|).
As a final step of preparation, for each p = p(ti , t j ) and p = p(i, r ) we
compute a straight-line program W ( p) of length O(log |G|) reaching p from

k≥i+1 Tk . By Lemma 8.3.3, this can also be done in nearly linear time. We
may assume that the straight-line programs use the symbols from the sets E i to
denote the generators and that each element of each E i occurs in at least one of
the straight-line programs W ( p) (if this is not the case then we add the missing
generators to one of the straight-line programs).
Now we are ready to write a presentation for G. More exactly, we define a group Ĝ = ⟨Ê | R̂⟩ and prove that G ≅ Ĝ. The generating set Ê is the union of symbols occurring in all straight-line programs W(p), for p = p(t_i, t_j) and p = p(i, r) (i.e., if W(p) = (w_1, w_2, ..., w_{l(p)}) then w_k ∈ Ê for all k ∈ [1, l(p)]). In particular, ∪_i E_i ⊆ Ê. The relator set R̂ consists of the following defining relations (a small code sketch of types (i) and (ii) follows the list):

(i) If w_j = (w_k, −1) for some w_j in a straight-line program W(p) then we add the relator w_j w_k = 1 to R̂.
(ii) If w_j = (w_k, w_m) then we add the relator w_j (w_k w_m)^{-1} = 1 to R̂.
(iii) For 1 ≤ i < j ≤ l − 1 and for t_i ∈ T_i, t_j ∈ T_j, let e_i ∈ E_i and e_j ∈ E_j be the symbols corresponding to t_i and t_j, respectively, and let W(p(t_i, t_j)) = (w_1, w_2, ..., w_{l(p)}). We add the relator e_i^{-1} e_j e_i w_{l(p)}^{-1} = 1 to R̂.
(iv) For 1 ≤ i ≤ l − 1 and {r = 1} ∈ R_i, let W(p(i, r)) = (w_1, w_2, ..., w_{l(p)}). We add the relator r w_{l(p)}^{-1} = 1 to R̂.
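In the same toy straight-line program convention as in the earlier sketches, relators of types (i) and (ii) can be produced mechanically from a program W(p); the fragment below is illustrative only, with symbols x0, x1, ... standing for the lines of the program.

def slp_relators(program):
    relators = []
    for idx, entry in enumerate(program):
        if isinstance(entry, tuple):
            j, k = entry
            if k == -1:
                relators.append(f"x{idx} * x{j}")              # type (i):  w_j w_k = 1
            else:
                relators.append(f"x{idx} * (x{j} * x{k})^-1")  # type (ii): w_j (w_k w_m)^{-1} = 1
    return relators

print(slp_relators(['a', 'b', (0, -1), (0, 1), (3, 2)]))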

Substituting the elements of ∪_i T_i for the appropriate elements of ∪_i E_i ⊆ Ê and substituting the evaluations starting from ∪_i T_i for the elements of Ê representing nongenerator symbols in straight-line programs, it is clear that G satisfies the presentation R̂. Therefore there exists a homomorphism ψ: Ĝ → G, and it is enough to prove that |Ĝ| ≤ |G|.
For i = 1, 2, ..., l − 1, let Ĝ_i be the subgroup of Ĝ generated by ∪_{j≥i} E_j. Then Ĝ_1 = Ĝ, since (i) and (ii) ensure that all generators in Ê can be expressed as products of the elements of ∪_{j≥1} E_j.
We claim that Ĝ_{i+1} ⊴ Ĝ_i for i = 1, 2, ..., l − 1. Indeed, by (iii), if j ≥ i + 1 then for any e_i ∈ E_i and e_j ∈ E_j we have e_i^{-1} e_j e_i = w_{l(p)} for the last term w_{l(p)} of the straight-line program W(p(t_i, t_j)). Since t_i^{-1} t_j t_i ∈ N_{i+1} for the corresponding generators t_i, t_j of G, by (i) and (ii) again we have that all symbols in W(p(t_i, t_j)), in particular w_{l(p)}, are in Ĝ_{i+1}.
Finally, we claim that Ĝ_i/Ĝ_{i+1} satisfies the presentation ⟨E_i | R_i⟩ for N_i/N_{i+1}. Indeed, for any relator {r = 1} ∈ R_i, by (iv) we have r ∈ Ĝ_{i+1}. Our last claim proves that |Ĝ_i/Ĝ_{i+1}| ≤ |N_i/N_{i+1}|, and so |Ĝ| ≤ |G| and Ĝ ≅ G.
Since the total number of permutations p = p(t_i, t_j) and p = p(i, r) is O(log^2 |G|) and each straight-line program W(p) is of length O(log |G|), we have O(log^3 |G|) relators described in (i) and (ii) and each of these relators has length bounded by an absolute constant. Similarly, since there are O(log^2 |G|) pairs (t_i, t_j), the total length of relators described in (iii) is O(log^2 |G|). Finally, since the sum of lengths of the presentations ⟨E_i | R_i⟩ is O(log^2 |G|), the total length of relators described in (iv) is also O(log^2 |G|). □

As we mentioned in Section 8.3, if H is a simple group not isomorphic to 2G2(q) then H has a presentation of length O(log^2 |H|) (cf. [Suzuki, 1962, p. 128; Babai et al., 1997a; Hulpke and Seress, 2001]). Hence if a group G has no composition factors isomorphic to 2G2(q) then the result of [Babai et al., 1997a] mentioned just before Theorem 8.4.1 implies that G has a presentation of length O(log^3 |G|). Moreover, if a permutation group G has no composition factors isomorphic to 2G2(q) or 2F4(q) then Theorems 8.3.2 and 8.4.1 imply
that such a short presentation can be constructed in nearly linear time by a Las
Vegas algorithm. Currently it is not known whether the groups 2 G 2 (q) have
presentations of length polylogarithmic in their order.
9
Backtrack Methods

In this chapter we consider computational problems that, at present, have no


polynomial-time solutions. They can be formulated by the following general
description: Given G = ⟨S⟩ ≤ Sym(Ω) and a property P, find the set G(P)
of elements of G satisfying P. (We assume that for each g ∈ G, it is possible
to decide whether g satisfies P.) For certain properties, there seems to be no
better general method than to run through a listing of G and examine each
element until we find the ones satisfying P. This happens, for example, when
we search for the solutions of an equation in G. If a method checks each g ∈ G
separately then, of course, there is a severe limit on the orders of the groups
it can handle in practice. However, in many cases of practical interest, the set
G(P), if not empty, is a subgroup of G or a coset of a subgroup. Examples
when G(P) is a subgroup are centralizers of elements or subgroups of Sym(Ω), normalizers of subgroups, and setwise stabilizers of subsets of Ω. In these
cases, we are usually interested in finding all g ∈ G satisfying P. The most
important examples for properties satisfied by a coset of a subgroup (or no elements of G at all) are when P is the property that g ∈ G conjugates some fixed a ∈ Sym(Ω) to b ∈ Sym(Ω) or A ≤ Sym(Ω) to B ≤ Sym(Ω), or Σ_1^g = Σ_2 for some Σ_1, Σ_2 ⊆ Ω. In these cases, we are usually interested in finding only
one element of G(P), or in deciding that G(P) = ∅. As these examples indicate,
subgroup-type and coset-type problems come in pairs. To find all elements of
G(P) in a coset-type problem, we may search for one element of G(P) and
then solve the corresponding subgroup-type problem.
Throughout this chapter, we assume that G(P) is a subgroup, or a coset
of a subgroup in G, or empty. This assumption will enable us to search G
without processing every element of G separately, thereby leading to practical
algorithms.

9.1. Traditional Backtrack
The content of this section essentially was formulated in [Sims, 1971a, b]. Let
B = (β_1, ..., β_m) be a base of G ≤ Sym(Ω), and let G = G^[1] ≥ G^[2] ≥ ··· ≥
G [m] ≥ G [m+1] = 1 be the corresponding point stabilizer subgroup chain. The
images of base points uniquely determine the elements of G, and we may
identify group elements with their sequence of base images. The image of an
initial segment of B of length l defines a coset of G [l+1] . The images of all initial
segments define a partial order by inclusion, which is called the search tree T
for G. The lth level Tl of T corresponds to the cosets of G [l+1] ; in particular,
the root of T corresponds to G and the leaves of the search tree correspond to
the elements of G. We shall denote the subtree of T rooted at t ∈ T by T (t).
Formally, we can define a function ϕ from T to the power set of G by the rule

    ϕ((γ_1, ..., γ_l)) := { g ∈ G | β_i^g = γ_i for 1 ≤ i ≤ l }.

Traditional backtrack methods systematically examine the elements of the


search tree. It is especially valuable when a vertex t close to the root can
be eliminated, by establishing either that ϕ(t) does not contain any elements
of G(P) or that ϕ(t) ∩ G(P) is in the subset of G(P) that is already found,
because then all elements of G corresponding to leaves less than this vertex
(i.e., elements of the coset ϕ(t)) can be excluded from the search at once. Of
course, we do not construct and store T explicitly; rather, we traverse T and
construct and discard vertices as required by the traversal.
Let (Ω, ≺) be an ordering of Ω with the property that β_1 ≺ β_2 ≺ ··· ≺ β_m and β_m ≺ β for all β ∈ Ω\B. This ordering (Ω, ≺) induces a lexicographic ordering on G. Namely, for g, h ∈ G, we have h ≺ g if and only if there exists l ∈ [1, m] such that β_i^h = β_i^g for all i < l and β_l^h ≺ β_l^g. Similarly, (Ω, ≺) induces a lexicographic ordering of every level T_l.
We require that our algorithm traverses T by a depth-first-search, or backtrack, traversal. This is defined by the following rule (a generic code sketch follows the rule):

(i) The starting point of the traversal is the root of T .


(ii) Suppose that the traversal arrived at a vertex t ∈ T.
• If t is a leaf then examine whether the corresponding group element satisfies P, and then go to the parent of t.
• If t is not a leaf then traverse the subtrees rooted at the children of t, considering these subtrees in the lexicographic ordering of their roots, and then go to the parent of t.
(iii) The algorithm ends when the previous step tries to go to the parent of the
root.
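The traversal rule can be captured by the following Python skeleton (a hypothetical interface, not code from GAP or Magma): a vertex of the search tree is the list of base images chosen so far, `candidates(prefix)` supplies the possible images of the next base point (this is where the pruning criteria discussed below are applied), and `test` decides property P at the leaves.

def backtrack(prefix, base_length, candidates, test, found):
    if len(prefix) == base_length:            # a leaf: a full sequence of base images
        if test(prefix):
            found.append(tuple(prefix))
        return
    for gamma in sorted(candidates(prefix)):  # children in increasing order
        prefix.append(gamma)
        backtrack(prefix, base_length, candidates, test, found)
        prefix.pop()

# Toy usage: enumerate Sym({0,1,2}) by the images of the base (0, 1).
found = []
backtrack([], 2, lambda p: [x for x in range(3) if x not in p], lambda p: True, found)
print(found)   # six pairs of base images, one for each group element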
In particular, the backtrack traversal considers the elements of G in increasing
order and determines G [i] ∩ G(P) before processing any element of G\G [i] .
There are many methods for eliminating a vertex t of the search tree, that is, for
determining that no element of G(P) is in the subtree rooted at t, or that we have
already found all such elements. In this case, the traversal of this subtree can be
skipped. These methods fall into two categories: problem-independent methods,
which rely only on the fact that G(P) is a subgroup or a coset of a subgroup,
and problem-dependent ones, which use the specific features of the property P.

9.1.1. Pruning the Search Tree: Problem-Independent Methods


Lemma 9.1.1. Suppose that G(P) is a subgroup of G, the subgroup K :=
G(P) ∩ G [l+1] is already computed for some l ∈ [1, m], and currently the
backtrack algorithm traverses the subtree T (t) with root t = (γ1 , γ2 , . . . , γl ) ∈
T_l. Then, if any g ∈ ϕ(t) ∩ G(P) is found then ⟨g, K⟩ ⊇ ϕ(t) ∩ G(P), so we
can skip traversing the rest of T (t).

Proof. Any h ∈ ϕ(t) ∩ G(P) is in the coset K g. 

Note that in coset-type problems we terminate the entire search procedure as


soon as one element of G(P) is found.
In the following criteria, G(P) may be a subgroup or a coset of a subgroup.
Suppose that we already know subgroups K , L ≤ G such that any g ∈ G
belongs to G(P) if and only if the double coset K gL ⊆ G(P). If G(P) is a
subgroup then K and L may be chosen as the already constructed subgroup of
G(P). We may have nontrivial K and L at our disposal even when G(P) is a
coset of a subgroup, and we do not yet know any element of G(P): For example,
when G(P) consists of the elements of G conjugating a ∈ G to b ∈ G, we may
choose K = ⟨a⟩ and L = ⟨b⟩.
Because it is enough to consider only one element from each double coset
K gL and the backtrack traversing encounters the elements of G in increasing
order, we can skip processing a subtree T (t) if we know that no g ∈ ϕ(t) is the
lexicographically first element of its double coset K gL. However, computing
the first element of a double coset is, in general, an NP-hard problem (cf. [Luks,
1993]), so we have to settle for weaker, but more easily computable, criteria.

Lemma 9.1.2. Let t = (γ_1, γ_2, ..., γ_l) ∈ T. If g ∈ ϕ(t) is the first element of K gL then γ_l is the ≺-minimal element of the orbit γ_l^{L_{(γ_1,γ_2,...,γ_{l−1})}}.
Proof. Suppose, on the contrary, that γ ∈ γ_l^{L_{(γ_1,γ_2,...,γ_{l−1})}} and γ ≺ γ_l. Let h ∈ L_{(γ_1,γ_2,...,γ_{l−1})} with γ_l^h = γ. Then gh ∈ K gL and gh ≺ g, contradicting the minimality of g. □

Lemma 9.1.2 implies that when processing the successors of the vertex t′ := (γ_1, γ_2, ..., γ_{l−1}), it is enough to consider extensions of t′ by the ≺-minimal elements of the orbits of L_{(γ_1,γ_2,...,γ_{l−1})}. The application of this criterion involves
a base change in L, in which a base for L starting with (γ1 , γ2 , . . . , γl−1 )
is computed. Because of the order in which the search tree is traversed, we
already have L (γ1 ,γ2 ,...,γl−2 ) at hand when L (γ1 ,γ2 ,...,γl−1 ) is to be computed, so only
γl−1 has to be included in the new base. We have to find a balance between the
cost of the base change and the gain in pruning the search tree. Hence, in some
implementations, this criterion is applied only for small values of l.
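A minimal Python sketch of this criterion (illustrative names; the stabilizer L_(γ_1,...,γ_{l−1}) is supplied as a list of generating permutations and ≺ as a comparison function) keeps only the candidate extensions that are ≺-minimal in their orbit.

def minimal_orbit_reps(candidates, stab_gens, precedes):
    keep = []
    for gamma in candidates:
        orbit, frontier = {gamma}, [gamma]
        while frontier:                      # standard orbit computation
            pt = frontier.pop()
            for g in stab_gens:
                img = g[pt]
                if img not in orbit:
                    orbit.add(img)
                    frontier.append(img)
        if not any(precedes(delta, gamma) for delta in orbit):
            keep.append(gamma)               # gamma is the minimal point of its orbit
    return keep

# Example: stabilizer generated by the transposition (0 1) on {0,...,3};
# among the candidates [0, 1, 2] only 0 and 2 are minimal in their orbits.
print(minimal_orbit_reps([0, 1, 2], [(1, 0, 2, 3)], lambda a, b: a < b))   # [0, 2]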
[Leon, 1991] contains the following observation which, when an additional
condition is satisfied, leads to a strengthening of Lemma 9.1.2.

Lemma 9.1.3. Suppose that β_l ∈ β_k^{K_{(β_1,...,β_{k−1})}} for some k ≤ l, and let t = (γ_1, γ_2, ..., γ_l) ∈ T. If g ∈ ϕ(t) is the first element of K gL then γ_k ≼ min(γ_l^{L_{(γ_1,γ_2,...,γ_{k−1})}}).

Proof. Since β_l ∈ β_k^{K_{(β_1,...,β_{k−1})}}, there exists h_1 ∈ K_{(β_1,...,β_{k−1})} such that β_l = β_k^{h_1}. Suppose, on the contrary, that there exists γ ∈ γ_l^{L_{(γ_1,γ_2,...,γ_{k−1})}} with γ ≺ γ_k. Then γ = γ_l^{h_2} for some h_2 ∈ L_{(γ_1,γ_2,...,γ_{k−1})}, and h_1gh_2 ∈ K gL. Moreover, h_1gh_2 ≺ g, since β_i^{h_1gh_2} = β_i^g for i < k, and β_k^{h_1gh_2} = γ ≺ γ_k = β_k^g. This contradicts the minimality of g in K gL. □

In particular, if β_l ∈ β_k^{K_{(β_1,...,β_{k−1})}} for some k < l then we have to consider only vertices t = (γ_1, γ_2, ..., γ_l) ∈ T_l satisfying γ_k ≺ γ_l. Sims's original criterion in Lemma 9.1.2 is the special case k = l of Lemma 9.1.3.

Lemma 9.1.4. Let s be the length of the lth fundamental orbit β_l^{K_{(β_1,...,β_{l−1})}} of K, and let t = (γ_1, γ_2, ..., γ_l) ∈ T. If g ∈ ϕ(t) is the first element of K gL then γ_l cannot be among the last s − 1 elements of its orbit in G_{(γ_1,...,γ_{l−1})}.
cannot be among the last s − 1 elements of its orbit in G (γ1 ,...,γl−1 ) .

hg g
Proof. The set  := {βl | h ∈ K (β1 ,...,βl−1 ) } has s elements and γl = βl ∈ .
hg
All elements of  are in the same orbit of G (γ1 ,...,γl−1 ) , since for γ = βl ∈ ,
−1 −1
we have γ g h g = γl with g −1 h −1 g ∈ G (γ1 ,...,γl−1 ) . Also, hg ∈ K gL, so the
G (γ ,...,γ )
minimality of g implies γl = min . Hence γl 1 l−1 contains at least s − 1
elements after γl . 

Lemma 9.1.4 implies that when processing the successors of the vertex t′ := (γ_1, γ_2, ..., γ_{l−1}), extensions by the last s − 1 elements of each G_{(γ_1,...,γ_{l−1})}-orbit can be skipped.

9.1.2. Pruning the Search Tree: Problem-Dependent Methods


The property P may imply restrictions on the base images of elements of G(P).
While processing the vertex t = (γ1 , . . . , γl−1 ) of T , we search for elements
of G(P) for which the images of the first l − 1 base points are known. It is
possible that all elements of ϕ(t) ∩ G(P) must satisfy additional properties,
which enables us to eliminate some (or all) children of t from the backtrack
traversal. A priori, we know that, for all children (γ1 , . . . , γl ) of t, we must have
G (β ,...,β )
γl ∈ (βl 1 l−1 )g , where g is an arbitrary element of ϕ(t). We are looking for
G (β ,...,β )
restrictions that define a subset P (γ1 , . . . , γl−1 ) of (βl 1 l−1 )g , containing
the possible extensions of t. These restrictions vary, depending on P. As in the
case of the problem-independent reductions, we try to satisfy two competing
requirements: We are looking for restrictions that are easy to compute but
preferably still eliminate a significant portion of the children of t.

Example 1: Centralizer Given G ≤ Sym() and c ∈ G, we search for a gen-


erating set of C G (c). The structure of CSym() (c) = CSym() (c) was described
in Lemma 6.1.8. The equivalent orbits of c are the supports of cycles of c of
equal length, so we know that the elements of C G (c) must respect the partition
{1 , 2 , . . .} of , where i is the union of supports of cycles of c of length i.
k−1
A further, quite severe, restriction is that for any cycle C = (ω, ωc , . . . , ωc )
of c, the image ω g uniquely determines C g for any g ∈ C G (c). Namely, for all
l ∈ [1, k − 1],
 cl  g l
ω = (ω g )c .

Hence, if ω g is decided for some ω ∈ B then we know the image of some


additional elements of . However, the elements of C are usually not among
the base points. Therefore, to take full advantage of this criterion, we start
the backtrack search with a base change. We compute a base according to an
ordering of , where elements from each cycle of c are consecutive. Then
P (γ1 , . . . , γl−1 ) consists of at most one point if βl = βl−1 c
; otherwise, we can
use
 G (β ,...,β ) g
P (γ1 , . . . , γl−1 ) := βl 1 l−1 ∩ i ,

where g is an arbitrary element of ϕ((γ1 , . . . , γl−1 )) and i is the length of the


c-cycle containing βl . Because of this strong restriction, computing centralizers
206 Backtrack Methods
of elements is not considered to be difficult in practice, despite the exponential
worst-case complexity of the procedure.

Example 2: Setwise Stabilizer Given G ≤ Sym() and ⊆ , we are look-


ing for generators of G . First, we compute a base for G according to an
ordering of  where the elements of precede all elements of \ . If the first
base point in \ is βk then we know that G [k] = G ( ) ≤ G(P). Therefore,
by Lemma 9.1.1, ϕ(t) ⊆ G(P) or ϕ(t) ∩ G(P) = ∅ for any t = (γ1 , . . . , γk−1 ),
and it is enough to work with initial segments of base images of length at most
k − 1. We also have the restriction
 G (β ,...,β ) g
P (γ1 , . . . , γl−1 ) := βl 1 l−1 ∩

for all l < k and all g ∈ ϕ((γ1 , . . . , γl−1 )).

Example 3: Intersection of Groups Given G, H ≤ Sym(), we are looking


for generators of G ∩ H . We can suppose that |G| ≤ |H |. First, we compute a
base B = (β1 , . . . , βm ) for G and define the search tree T according to B. We
also compute a base for H starting with B. The restriction on base images is
 G (β ,...,β ) g  H(β ,...,β ) h
P (γ1 , . . . , γl−1 ) := βl 1 l−1 ∩ βl 1 l−1

for all g ∈ ϕ((γ1 , . . . , γl−1 )) and all h ∈ ϕ  ((γ1 , . . . , γl−1 )), where ϕ  is the
map from T to the power set of H , defined analogously to ϕ, namely,
ϕ  ((γ1 , . . . , γl−1 )) := {h ∈ H | βih = γi , 1 ≤ i ≤ l − 1}. When the traversing of
T reaches Tm , it means that we have constructed a sequence t = (γ1 , . . . , γm )
with |ϕ(t)| = 1 and ϕ  (t) = ∅. We have to check that the unique element of
ϕ(t) occurs in H as well.

Example 4: Conjugating Element Given G ≤ Sym() and a, b ∈ G, we


would like to find some g ∈ G with a g = b or determine that no such element
g exists. In this problem, the set G(P) is empty or a right coset of C G (a). The
restrictions on elements g ∈ G(P) are similar to the ones we encountered at
centralizer computations: g must map the cycles of a to cycles of b of the same
length, and the image α g of any α ∈  determines uniquely the image of the
entire cycle of a containing α. Hence, we compute a base for G according to
an ordering of  where elements from each cycle of a are consecutive. Then
P (γ1 , . . . , γl−1 ) consists of at most one point if βl = βl−1
a
; otherwise,
 G (β ,...,β ) g
P (γ1 , . . . , γl−1 ) := βl 1 l−1 ∩ i ,
9.2 The Partition Method 207
where g is an arbitrary element of ϕ((γ1 , . . . , γl−1 )), i is the length of the a-
cycle containing βl , and {1 , 2 , . . .} is the partition of  where i is the union
of supports of cycles of b of length i.
Similarly to computing centralizers, finding conjugating elements is not con-
sidered a difficult problem in practice.

9.2. The Partition Method


An ordered partition of a set  is a sequence  = ([1], . . . ,  ]) of pairwise

disjoint, nonempty subsets of , satisfying ] = . We call the sets ]
the cells of the ordered partition; the cells are considered only as sets (i.e., we do
not define orderings within the cells). We denote the set of all ordered partitions
of  by OP().
Ordered partitions in backtrack searches were first used in the highly suc-
cessful program nauty (cf. [McKay, 1981]) for testing isomorphism of graphs
and for computing automorphism groups of graphs. The basic idea is the fol-
lowing: If X (, E) is a graph with vertex set  and edge set E then any
x ∈ Aut(X ) must fix certain ordered partitions of . Namely, vertices of va-
lency k must be mapped by x to vertices of valency k, so x must respect the
partition  = ([1], . . . ,  ]), where the ] consist of the vertices of equal
valency. This partition can also be refined: If α, β ∈ ] have a different num-
ber of neighbors in some [ j] then there is no automorphism of X mapping
α to β. Using this criterion, we can split the cells of  such that Aut(X ) must
respect the finer partition as well. This refinement process can be iterated, by
examining the number of neighbors in the cells of the finer partition. Let us call
the final result of these refinements
.
Backtrack search comes into the picture when the refinement process in the
previous paragraph bottoms out. We pick α1 ∈  and look for automorphisms
of X fixing α1 . The distance from α1 in X defines a partition of , which must
be respected by Aut(X )α1 . This partition can be intersected with
and refined
by considering the number of neighbors in different cells, as described in the
previous paragraph. We call the final result of the refinement process
α1 . A
backtrack search is started by constructing the analogous ordered partitions
β
for those β ∈  that are in the same cell of
as α1 and considering the cases
that an automorphism x maps α1 to β for all such β. If the lists of cell sizes in

α1 and
β are not identical then there is no x ∈ Aut(X ) with α1x = β. We build
the next level of the search tree by picking some α2 ∈ \{α1 } and computing a
refinement
(α1 ,α2 ) of
α1 , which must be fixed by Aut(X )(α1 ,α2 ) . Further vertices
on this level are the partitions
(β,γ ) for sequences (β, γ ) such that the list of cell
sizes does not exclude the existence of x ∈ Aut(X ) with α1x = β and α2x = γ .
208 Backtrack Methods
Continuing, we arrive at a sequence (α1 , α2 , . . . , αm ) of points in , such that the
base partition fixing the αi is discrete (i.e., contains only one-element cells),
and at other discrete partitions that at the appropriate positions contain the
possible images of the αi by elements of Aut(X ). Obviously, Sym() contains a
unique permutation that maps a given discrete ordered partition to another given
discrete ordered partition. Therefore, we constructed a base (α1 , α2 , . . . , αm )
for Aut(X ) and the permutations in Aut(X ).
These ideas were adopted for searches in permutation groups G in [Leon,
1991]. A formal description and some examples can be found in [Leon, 1997]
as well. For simplicity, we describe only the case when G(P) is a subgroup
and indicate the necessary modification for coset-type problems at the end of
the section. Let B = (β1 , . . . , βm ) be a base for G. We consider the search
tree T defined in Section 9.1 and use the sequences of base images as names
for the vertices of T as in that section. However, the vertices themselves are
ordered partitions of . The vertex with name t = (γ1 , . . . , γl ) is a partition
t , satisfying the following properties:

(p1) γ1 , . . . , γl occur in one-element cells of t .


(p2) Any g ∈ G(P) ∩ ϕ(t) must satisfy ((β1 ,...,βl ) )g = t .
(p3) If t  is the parent of t then t is a refinement of t  .

Here, the image of an ordered partition  = ([1], . . . ,  ]) under g ∈


Sym() is defined as g := ([1]g , . . . ,  ]g ). We say that  = ([1],
. . . ,  ]) is a refinement of
= (
[1], . . . ,
[s]) (in notation  ≤
) if
each
[i] is the union of consecutive cells of . Formally,


ki

[i] = [ j],
j=ki−1 +1

for some sequence of indices 0 = k0 < k1 < · · · < ks = r .


We shall also need the intersection of ordered partitions. Given  = ([1],
. . . ,  ]) ∈ OP() and
= (
[1], . . . ,
[s]) ∈ OP(), the cells of their
intersection  ∧
are the nonempty subsets of  of the form ] ∩
[ j]
for 1 ≤ i ≤ r and 1 ≤ j ≤ s, ordered by the following rule:  1 ] ∩
[ j1 ]
precedes  2 ] ∩
[ j2 ] if and only if i 1 < i 2 , or i 1 = i 2 and j1 < j2 . Hence
 ∧
≤ , but  ∧

is not necessarily true. We listed some properties
of intersections in Exercise 9.3.
Traversing a search tree of ordered partitions is a generalization of the original
backtrack method. If the partitions are defined by the recursive rule (γ1 ,...,γl ) :=
(γ1 ,...,γl−1 ) ∧({γl }, \{γl }), that is, the images of base points are in one-element
cells and all further elements of  are in one cell, then we get back the traditional
9.2 The Partition Method 209
backtrack search tree. Ordered partitions have the advantage that they provide
a convenient language to describe restrictions on the children of a vertex of the
search tree. For example, defining a set of possible children P (γ1 , . . . , γl ) of
the vertex (γ1 , . . . , γl ) as in Section 9.1.2 simply corresponds to intersecting
(γ1 ,...,γl ) with (P (γ1 , . . . , γl ), \P (γ1 , . . . , γl )). Even more importantly, as
we shall see in some examples, it is possible to describe restrictions that cannot
be expressed in terms of the base point images and so cannot be incorporated
easily into a traditional backtrack search.
Given a partition (γ1 ,...,γl−1 ) at vertex (γ1 , . . . , γl−1 ) of T , how do we define
t for t := (γ1 , . . . , γl−1 , γl )? To satisfy the properties (p1)–(p3), t must be a
refinement of ∗t := (γ1 ,...,γl−1 ) ∧ ({γl }, \{γl }). Moreover, P (γ1 , . . . , γl−1 )
provides the further restriction that t can be chosen as a refinement of
∗t ∧ (P (γ1 , . . . , γl−1 ), \P (γ1 , . . . , γl−1 )). From property (p2), we may
get further restrictions on t . In general, we define a P-refinement process
R : OP() → OP(), incorporating the restrictions we can (or are willing to)
compute, satisfying the following two rules:
(p4) R() ≤  for all  ∈ OP().
(p5) R()g = R(g ) for all vertices  of the search tree and for all g ∈
G(P).
The partition t is constructed as t := R(∗t ), for the partition ∗t =
(γ1 ,...,γl−1 ) ∧ ({γl }, \{γl }) just defined. In practice, the refinement process
usually consists of a sequence (R1 , . . . , Rl ) of refinements. Given  ∈ OP(),
we initialize  :=  and always replace  by Ri ( ) for the least index i
such that Ri ( ) =  , continuing this process until  becomes invariant for
all Ri . Then we define R() as the current value of  .
The leftmost branch (∅, (β1 ), (β1 , β2 ), . . . , (β1 , β2 , . . . , βm )) of the tradi-
tional backtrack search tree corresponds to the point stabilizer subgroup chain
G = G [1] ≥ · · · ≥ G [m+1] = 1, where G [i] = G (β1 ,...,βi−1 ) , in the sense that for
t = (β1 , . . . , βi−1 ) we have ϕ(t) = G [i] . At partition backtrack, this leftmost
branch consists of the ordered partitions 1 ≥ · · · ≥ m+1 , where 1 =
R(()) and i = R(i−1 ∧ ({βi−1 }, \{βi−1 })) for 2 ≤ i ≤ m + 1. Some of
these partitions may coincide: For example, in the case when G(P) = C G (c)
for some c ∈ G, we choose the base (β1 , . . . , βm ) such that an initial segment of
base points, say (β1 , . . . , βk ), is contained in the same cycle of c (cf. Example 1
g
in Section 9.1.2). Then, as we saw in this example, the image β1 under an ele-
g
ment g ∈ C G (c) determines uniquely (β1 , . . . , βk )g . In particular, if β1 = β1 then
(β1 , . . . , βk ) = (β1 , . . . , βk ). This restriction can be built into the refinement
g

process R, resulting in 2 = 3 = · · · = k+1 . Deleting the repetitions from


the chain of partitions 1 ≥ · · · ≥ m+1 , we get a sequence of partitions that
210 Backtrack Methods
Leon calls an R-base for G(P). The corresponding elements of (β1 , . . . , βm )
(i.e., those βi for which i+1 was not deleted) form a base of G(P). This base
may be substantially shorter than the base for G, which speeds up the partition
backtrack search. The first step of partition backtrack is the construction of an
R-base.
It would be enough to require that property (p5) holds when  is an element
of the R-base, since the purpose of (p5) is to ensure that the refinement process
does not violate (p2).
The problem-independent prunings of the search tree described in Sec-
tion 9.1.1 can be applied for partition backtrack as well. There are further
problem-independent restrictions, which show one of the advantages of par-
tition backtrack over the traditional method: Instead of considering only the
G (β ,...,β )
fundamental orbit βl 1 l−1 , we can work with all orbits of G [l] = G (β1 ,...,βl−1 ) .
One such restriction is that if βl is in the ith cell of (β1 ,...,βl−1 ) then (p2) implies
that for any extension (γ1 , . . . , γl ) of a vertex (γ1 , . . . , γl−1 ), the extending el-
ement γl must come from the ith cell of (γ1 ,...,γl−1 ) . Also, a vertex (γ1 , . . . , γl )
can be eliminated if the list of cell sizes of (β1 ,...,βl ) and (γ1 ,...,γl ) differ.
Now we describe another restriction that uses all orbits of G [l] . For H ≤
Sym(), we define (H ) ∈ OP() to have as cells the orbits of H ; orbit O1
precedes orbit O2 in (H ) if and only if minO1 ≺ minO2 .
Suppose that at the construction of an R-base the partition l := (β1 ,...,βl−1 )
is already defined and now we would like to construct l+1 := (β1 ,...,βl ) .
The elements of G [l+1] must leave (G [l+1] ) invariant, so it makes sense to
∗ ∗
define the first refinement of l+1 := l ∧ ({βl }, \{βl }) as R1 (l+1 ) :=
∗ ∗
l+1 ∧ (G [l+1]
). Then we have to define R1 for partitions t := (γ1 ,...,γl−1 ) ∧
({γl }, \{γl }), which will be placed on level Tl with the name t = (γ1 , . . . , γl ),
so that the compatibility conditions (p2) and (p5) are satisfied.
Let g be an arbitrary element of ϕ(t) and define R1 (∗t ) := ∗t ∧ (G [l+1] )g .
Note that R1 (∗t ) is well defined, because if g1 , g2 ∈ ϕ(t) then g1 = sg2 for
some s ∈ G [l+1] and, since (G [l+1] )s = (G [l+1] ), we have (G [l+1] )g1 =
(G [l+1] )sg2 = (G [l+1] )g2 . Since any g ∈ ϕ(t) can be used at the definition of
R1 (∗t ), we can compute R1 (∗t ) independently of whether we have at hand
elements of G(P) that map l to (γ1 ,...,γl−1 ) . Note further that the cells of
(G [l+1] )g are the orbits of G (γ1 ,...,γl ) , but not necessarily in the same order as
in (G (γ1 ,...,γl ) ).
We claim that the refinement R1 as just defined satisfies (p2) and (p5). To
check (p2), let g ∈ G(P) ∩ ϕ(t) be arbitrary. Then, in particular, g ∈ G(P) ∩
g ∗
ϕ((γ1 , . . . , γl−1 )), so l = (γ1 ,...,γl−1 ) . By Exercise 9.3, R1 (l+1 )g = (l ∧
) = R1 (∗t ).
g
({βl }, \{βl }) ∧ (G [l+1] g
)) = l ∧ ({γl }, \{γl }) ∧ (G [l+1] g

To prove (p5), suppose that  = (γ1 ,...,γl−1 ) ∧ ({γl }, \{γl }) and g ∈ G(P).
9.3 Normalizers 211

Let g1 ∈ ϕ((γ1 , . . . , γl )), and let (γ1 , . . . , γl ) = (γ1 , . . . , γl )g . Then g =


   
(γ1 ,...,γl−1

) ∧ ({γl }, \{γl }) and g1 g ∈ ϕ((γ1 , . . . , γl )), so

 g
R1 ()g = R1 (γ1 ,...,γl−1 ) ∧ ({γl }, \{γl })
  g g
= (γ1 ,...,γl−1 ) ∧ ({γl }, \{γl }) ∧  G [l+1] 1
 
 [l+1] g1 g
= (γ1 ,...,γl−1

) ∧ ({γl }, \{γl }) ∧  G = R1 (g ).

We finish this section by briefly indicating the necessary change in the par-
tition backtrack method when G(P) is a coset of a subgroup of G. We have
to define two refinement processes R and R , connected by a modification of
property (p5):

p(5) R()g = R (g ) for all vertices  of the search tree and for all
g ∈ G(P).

For example, if G(P) consists of the elements of G conjugating a ∈ G to


b ∈ G then R can be chosen as the refinement process used at the computation
of C G (a), which is based on the cycle structure of a, and R can be chosen as
the refinement process used at the computation of C G (b).

9.3. Normalizers
Given G, H ≤ Sym(), computing the normalizer NG (H ) is considered to be
a harder problem than the ones discussed in the examples in Section 9.1.2. All
of those problems can be reduced to each other in polynomial time (cf. [Luks,
1993]), but no polynomial-time reduction of the normalizer computation to
centralizer computations is known. In fact, there is no known “easy” way to
compute even NSym() (H ); in contrast, we saw in Section 6.1.2 that CSym() (H )
is computable in polynomial time. Note that since NG (H ) = NSym() (H ) ∩ G,
a polynomial-time algorithm for computing NSym() (H ) would imply the
polynomial-time equivalence of normalizer computations with centralizer com-
putations.
If some g ∈ Sym() normalizes H then it must leave invariant certain struc-
tures associated with H . For example, Lemma 6.1.7 says that g must permute
the orbits of H . The first algorithm for computing normalizers, which appeared
in [Butler, 1983], was based on this observation. In this section we describe the
method of [Theißen, 1997], which uses that g must permute the orbital graphs
of H . Considering orbital graphs gives more efficient criteria for pruning the
search tree, yet these criteria are still easy enough to compute.
212 Backtrack Methods
Let H ≤ Sym(). Then H also acts on the ordered pairs (α, β) ∈  × 
by the natural componentwise action: (α, β)h := (α h , β h ), for all h ∈ H . The
orbital graphs of H are directed graphs with vertex set . The edge set of an
orbital graph of H is one of the orbits in the action ϕ : H → Sym( × ).
We denote the orbital graph with edge set (α, β) H by X (α, β). Note that H
is a subgroup of the automorphism group of each orbital graph X (α, β), since
obviously H leaves invariant the set (α, β) H of edges. Applying Lemma 6.1.7 to
ϕ(H ), we obtain that any g ∈ NSym( (H ) must permute the orbital graphs of H .
Let also G ≤ Sym() be given; our goal is to compute G(P) := NG (H ).
It is not necessary that H ≤ G, although the condition H ≤ G makes the
application of the criteria in Section 9.1.1 more efficient since we know a priori
a nontrivial subgroup of NG (H ).
The main tool of [Theißen, 1997] is a refinement process that can be applied to
ordered partitions  satisfying the condition that there are α, β ∈  belonging
to the same H -orbit on  and that α and β occur in one-element cells in .
The orbital graph X (α, β) defines an ordered partition := ( [0], [1], . . . ,
[k], [∞]), where [i] consists of vertices of X (α, β) of (directed) distance
i from α for 0 ≤ i ≤ k, and [∞] contains the elements of  that cannot be
reached by directed paths from α. If g ∈ G(P) maps  to  with α  := α g
and β  := β g then g must also map the refinement  ∧ to  ∧  , where
 is the distance partition from α  in the orbital graph X (α  , β  ). If  ∧ and
 ∧  have different sequences of cell sizes then we can conclude that there
is no g ∈ G(P) mapping  to  .
Instead of , we could have used a refinement of as well, which can be
obtained by partitioning the cells [i] according to the number of neighbors
in different cells (we described such refinements in detail at the beginning of
Section 9.2, while sketching McKay’s graph isomorphism program nauty). Of
course, we have to consider whether the cost of computing this refinement is
justified by the additional pruning of the search tree. As a compromise, in the
GAP implementation the sets [i] are partitioned according to the valencies of
vertices, but further refinements are not computed.
We need an efficient method for computing . For 0 ≤ i ≤ k, let [i] ¯ be
the set of vertices of X (α, β) that can be reached by a walk of length i from
α. If [0], . . . , [i − 1] are already known then it is enough to compute [i], ¯

since [i] = [i]\¯ j<i [ j]. Also, since α and β are in the same H -orbit,
we can fix r ∈ H with αr = β.

Lemma 9.3.1. [i]


¯ = αr Hα ···r Hα (i iterations of r Hα ).

Proof. We denote r Hα · · · r Hα , with i iterations of r Hα , by (r Hα )i .


9.3 Normalizers 213
We prove the statement of the lemma by induction on i. For any γ ∈ , by
definition γ ∈ [1]
¯ if and only if (α, γ ) is an edge in (α, β) H . This happens if
and only if there exists h ∈ H with α h = α and β h = γ . Since β = αr , the last
statement is equivalent with the existence of h ∈ Hα such that αr h = γ , which
is the definition of γ ∈ αr Hα . Hence the lemma holds for i = 1.
i
Suppose that [i]
¯ = α (r Hα ) for some i ≥ 1. For any γ ∈ , we have
γ ∈ [i¯ + 1] if and only if there exists δ ∈ [i] ¯ such that (δ, γ ) is an edge
in X (α, β). By the inductive hypothesis, δ ∈ [i] ¯ if and only if there exists
h ∈ (r Hα )i such that α h = δ. Moreover, since h is an automorphism of X (α, β),
the condition (δ, γ ) is an edge in X (α, β) means that γ = β1h for some neighbor
β1 of α. By the already proven case i = 1 of the lemma, β1 is a neighbor of
α if and only if β1 = αr h 1 for some h 1 ∈ Hα . In summary, γ ∈ [i ¯ + 1] if
and only if there exists h ∈ (r Hα ) and h 1 ∈ r Hα such that γ = α (i.e., γ ∈
i h1 h

(r Hα )i+1 ). 

Lemma 9.3.1 implies that the cells [i] are unions of Hα -orbits. This pro-
vides the opportunity to compute working in a quotient graph  of X (α, β),
which may have significantly fewer vertices than X (α, β). The vertices of 
are the Hα -orbits on ; for γ ∈ , we denote by [γ ] the vertex γ Hα of . For
[γ ], [δ] ∈ V (), the ordered pair ([γ ], [δ]) is an edge of  if and only if there
exists γ  ∈ [γ ] and δ  ∈ [δ] such that (γ  , δ  ) ∈ (α, β) H .

Lemma 9.3.2. Let γ ∈  be arbitrary. Then there exists a walk of length i from
α to γ in X (α, β) if and only if there is a walk of length i in  from [α] to [γ ].

Proof. One direction is clear. Namely, if (α = γ0 , γ1 , . . . , γi = γ ) is a walk in


X (α, β) from α to γ then by definition ([γ0 ], . . . , [γi ]) is a walk from [α] to
[γ ] in .
We prove the converse, that if there is a walk of length i from [α] to [γ ] in
 then there is a walk of length i from α to γ in X (α, β), by induction on i.
The case i = 0 is trivial. Suppose that we know the statement for some i, and
suppose that there is a walk of length i + 1 from [α] to [γ ] for some γ ∈ . By
definition, this means that there exist γ j for j ∈ [0, i] and γ j for j ∈ [1, i + 1]

such that (γ j , γ j+1 ) ∈ (α, β) H for j ∈ [0, i] and γ j ∈ γ jHα for j ∈ [1, i], with

γ0 = α and γi+1 ∈ γ Hα . Since there is a walk of length i from [α] to [γi ] in ,
the inductive hypothesis implies that there is a walk of length i from α to γi in

X (α, β), and so there is a walk of length i + 1 from α to γi+1 in X (α, β). By

Lemma 9.3.1, this means that γi+1 = α for some h ∈ (r Hα )i+1 . Also, since γ
h
 
and γi+1 are in the same Hα -orbit, γ = (γi+1 )h 1 for some h 1 ∈ Hα . However,
214 Backtrack Methods

hh 1 ∈ (r Hα )i+1 and α hh 1 = γ , so Lemma 9.3.1 implies that there is a walk of


length i + 1 from α to γ in X (α, β). 

It follows immediately from Lemma 9.3.2 that the sets [i]


¯ can be computed
in .
Another refinement method of [Theißen, 1997] is based on the fact that if α, β
are contained in one-element cells of  and g =  , and α g = α  , β g = β 
for some g ∈ G(P), then the smallest block of imprimitivity B of H containing
α and β must be mapped by g to the smallest block of imprimitivity of H
containing α  and β  . The block B can be computed as B := \ [∞] (see also
Exercise 5.6).

9.4. Conjugacy Classes


Finding the conjugacy classes of a group is an important task, as it is an essen-
tial ingredient in the computation of the character table of the group. Listing
conjugacy class representatives is also a less space-consuming alternative to
listing all group elements. There are numerous methods for computing conju-
gacy classes, and the newest approaches use quite a bit of the machinery we
have described in the previous chapters.
The output of the conjugacy class algorithms is a list of representatives of the
classes. This list may be exponentially long compared to the input length, so the
user is advised to do some structural exploration of the input group G before
calling a conjugacy class algorithm in the major computer algebra systems.
Obvious obstacles are when Z (G) or G/G  are large and, in general, solvable
groups may have numerous conjugacy classes. On the other end of the spectrum,
nonabelian simple groups tend to have relatively few conjugacy classes.

Class Representatives by Random Sampling


The simplest algorithm for computing conjugacy classes, that of Butler and
Cannon (cf. [Butler, 1979]), is to take random elements of the input group G
until we find representatives for all classes. After a partial list (g1 , . . . , gm ) of
class representatives is already constructed, we can test whether a new random
element g ∈ G is in the same class as one of the gi by the algorithm of Example 4
in Section 9.1.2 or by the partition backtrack version of that algorithm. For each
class representative gi , we also compute C G (gi ) and |giG | = |G|/|C G (gi )|; the

algorithm terminates when i≤m |giG | reaches |G|.
The main drawback of the method described in the previous paragraph is
that we have only a slim chance to find representatives for the small classes.
9.4 Conjugacy Classes 215
An obvious but very useful improvement is to test not only whether the new
random element g is in a new conjugacy class but to test the powers of g as
well (cf. a similar application of power maps in Section 5.3).
The ultimate word on finding class representatives by random sampling is
in [Jerrum, 1995]. Instead of taking random elements of the input group G,
Jerrum’s method constructs a Markov chain M (cf. Section 2.2) with states
V := G. For g, h ∈ G, the entry pg,h of the transition probability matrix P is
defined as

1/|C G (g)|, h ∈ C G (g)


pg,h :=
0, otherwise.

Lemma 9.4.1. The Markov chain M defined in the previous paragraph is irre-
ducible and aperiodic.

Proof. Given any two states g, h ∈ G, we can reach h from g in at most two
steps since pg,1 = 1/|C G (g)| > 0 and p1,h = 1/|G| > 0. Hence M is irreducible.
Also, p1,1 = 1/|G| > 0 and so M is aperiodic. 

The random walk on G corresponding to M is constructed by starting at a


random element x0 of G. After the sequence (x0 , . . . , xn ) is defined, we compute
C G (xn ) and choose xn+1 as a uniformly distributed random element of C G (xn ).
Lemma 9.4.1 implies that M satisfies the conditions of Theorem 2.2.1, and
so there is a stationary distribution vector u = (u[h] | h ∈ G) and a constant δ ∈
(0, 1) such that for all m ≥ 0 and h ∈ G we have |Prob(xm = h) − u[h]| ≤ δ m .
Let k denote the number of conjugacy classes of G and define u[h] := (k|h G |)−1
for h ∈ G. Using that |G| = |h G ||C G (h)| for all h ∈ G, it is easy to see that

u = uP and h∈G u[h] = 1. Hence u is the stationary distribution vector of

M. For each conjugacy class C of G, we have h∈C u[h] = 1/k. This means
that for large enough m, xm is close to being uniformly distributed among the
conjugacy classes. Unfortunately, the usual caveat applies: For most groups, we
do not have a good estimate for the parameter δ, so we do not know the speed
of the convergence to the stationary distribution.
We note that [Jerrum, 1995] discusses a more general situation, when G
permutes the elements of some domain  that is too big to list, and we would
like to choose an orbit of G from the uniform distribution. Our description above
corresponds to the case  = G, with G acting by conjugation on itself. We
leave the details of the more general argument as an exercise (cf. Exercise 9.4).
216 Backtrack Methods
Representatives in Centralizers of p-Elements
A completely different approach for conjugacy class computations is described
in [Butler, 1994]. The method is based on the following observation: Since
every element g ∈ G\{1} has a power g m of prime order and, of course, g m
commutes with g, we can find representatives for each conjugacy class of G in
the centralizers of elements of prime order. Moreover, for a fixed prime p, all
Sylow p-subgroups of G are conjugate; hence, for each prime p dividing the
order of G, we can fix one Sylow p-subgroup S p of G and find representatives
of all classes of G in the centralizers of elements of these fixed S p . Besides the
difficulties with Sylow subgroup computations, a drawback of this approach is
that there may be many more conjugacy classes of elements of order p in S p
then there are in G, since conjugation by the elements of G may fuse classes
of S p . Therefore, although S p is solvable and so we may apply the method of
[Mecky and Neubüser, 1989] from Section 7.3.2, we may have to compute a
list of class representatives in S p that is much longer than would be necessary
to handle G.

Separation of the Solvable Radical


The latest approaches (cf. [Cannon and Souvignier, 1997; Hulpke, 2000])
combine the methods of separating computations in G/O∞ (G) and O∞ (G)
(cf. Section 6.3) and extending the result through elementary abelian lay-
ers (Section 7.3.2). As noted at the end of Section 6.3, every group G has
a series of characteristic subgroups 1 ≤ N1 ≤ N2 ≤ N3 ≤ G, where N1 =
O∞ (G), N2 /N1 = Soc(G/N1 ) ∼ = T1 × · · · × Tm is the direct product of non-
abelian simple groups, N3 /N2  Out(T1 ) × · · · × Out(Tm ), and G/N3 is a
permutation group of degree m, corresponding to the conjugation action of
G on {T1 , . . . , Tm }. Given G = S ≤ Sym(), both aforementioned references
compute a faithful permutation representation of G/N1 of degree at most ||,
compute the conjugacy classes of G/N1 in this new representation, and then
obtain the conjugacy classes of G by the method of [Mecky and Neubüser,
1989], which we described in Section 7.3.2. The two papers also handle N2 /N1
similarly, by computing the conjugacy classes of N2 /N1 using the random sam-
pling method described at the beginning of this section and then identifying the
classes that are fused by the conjugation action of G/N1 on N2 /N1 . The major
difference is in the handling of classes that are subsets of G\N2 . In [Cannon and
Souvignier, 1997], random elements of G are taken until all conjugacy classes
are hit. In [Hulpke, 2000], random sampling is used only in N3 ; the classes in
G\N3 are constructed by extending the ideas of [Mecky and Neubüser, 1989]
into a nonabelian setting.
Exercises 217

Exercises
9.1. Draw the search tree T for S4 , according to the base B = (1, 2, 3). What
vertices of T are visited during the computation of C S4 ((1, 2)), if we
use the restrictions in Section 9.1.1? What happens if we use instead the
restrictions in Example 1 of Section 9.1.2? And if we combine all restric-
tions?
9.2. Repeat the computation of C S4 ((1, 2)), using the partition method. Define
a refinement process using the restrictions in Example 1 of Section 9.1.2.
9.3. Prove the following properties of ordered partitions:
(i) ( ∧
) ∧ T =  ∧ (
∧ T ).
(ii) ( ∧
)g = g ∧
g .
(iii)  ≤
⇐⇒ g ≤
g .
9.4. [Jerrum, 1995] Let G ≤ Sym(). We define a Markov chain M with
states , and with the following transition rule: Given a state α ∈ ,
first compute a uniformly distributed random element g of G α , and then
compute a uniformly distributed random element β of the fixed point set
{γ ∈  | γ g = γ } of g. Then we define β as the next state in M. Let k
denote the number of orbits of G on .
Prove that M is irreducible and aperiodic and that the stationary distri-
bution vector u = (u[α] | α ∈ ) satisfies u[α] = (k|α G |)−1 for all α ∈ .
10
Large-Base Groups

The central theme of this book is the design and analysis of nearly linear-time
algorithms. Given a permutation group G = S ≤ Sym(), with || = n, such
algorithms run in O(|S|n logc |G|) time for some constant c. If, as happens
most of the time in practice, log |G| is bounded from above by a polyloga-
rithmic function of n then these algorithms run in time that is nearly linear as
a function of the input length |S|n. In most cases, we did not make an effort
to minimize the exponent c of log |G| in the running time since achieving the
smallest possible exponent c was not the most pressing problem from either
the theoretical or practical point of view. However, in families of groups where
log |G| or, equivalently, the minimal base size of G is large, the nearly linear-
time algorithms do not run as fast as their name indicates; in fact, for certain
tasks, they may not be the asymptotically fastest algorithms at all. Another
issue is the memory requirement of the algorithms, which again may be unnec-
essarily high. The purpose of this chapter is to describe algorithms that may be
used in the basic handling of large-base groups. The practical limitation is their
memory requirement, which is in most cases a quadratic function of n. The use
of (n 2 ) memory in storing an SGS for an arbitrary G ≤ Sym() seems to be
unavoidable.

10.1. Labeled Branchings


A labeled branching is a data structure that uses a total of n permutations to
code all transversals along a point stabilizer chain of some G ≤ Sym(), with
|| = n. This data structure supports membership testing in G with the same
asymptotic efficiency as the storage of all transversal elements as permutations
but is more space efficient: Storing all transversal elements may require (n 3 )
memory. In this section we describe the definition and construction of labeled
branchings based on [Jerrum, 1986].

218
10.1 Labeled Branchings 219
Let G ≤ Sym(), with || = n, be an arbitrary permutation group, and let
us fix an ordering ω1 ≺ · · · ≺ ωn of . Let G = G [1] ≥ G [2] ≥ · · · ≥ G [n−1] ≥
G [n] = 1 be the point stabilizer chain corresponding to this ordering; that is,
G [i] is the subgroup comprised of the elements of G that fix {ω1 , . . . , ωi−1 }
pointwise. A labeled branching B(, E)  for G is a directed graph with vertex
−−−−→
set  and edge set E, and with a function f : E → G. For (ωi , ω j ) ∈ E,
  we
−−−−→
call the group element f ((ωi , ω j )) the label of this edge. A labeled branching
must satisfy the following properties:

(i) The underlying graph of B is a forest (i.e., disregarding the orientation of


edges, each connected component of B is a tree).
(ii) Each connected component is a rooted tree, and all edges are directed away
from the root.
−−−−→ 
(iii) For each (ωi , ω j ) ∈ E, we have i < j.
−−−−→  g
(iv) Each (ωi , ω j ) ∈ E is labeled with some g ∈ G [i] satisfying ωi = ω j .

We denote by f (B) the set of elements of G that are labels of some edge.
We shall use repeatedly the following simple characterization of branchings.

Lemma 10.1.1. Let B(, E)  be a directed graph such that for each −−−−→
(ωi , ω j ) ∈
 we have i < j. Then the underlying graph of B is a forest if and only if for
E,
each ωk ∈  there exists at most one edge with endpoint ωk .

Proof. If the underlying graph of B is a forest then it is clear that each vertex
is the endpoint of at most one edge. To prove the converse, suppose, on the
contrary, that the underlying graph contains a cycle and let ωk be the vertex
with the largest index k in this cycle. Then both edges of the cycle incident with
ωk are directed toward ωk , contradicting the assumption of the lemma. 

Let T be the transitive closure of B. This means that T is a directed graph


−−−−→
on , with the property that (ωi , ω j ) is an edge of T if and only if there
is a directed path (ωi = ωi1 , . . . , ωik = ω j ) in B connecting ωi with ω j .
Note that since the connected components of B are trees, there is at most one
such path. The labeling of B extends naturally to the edges of T . Namely, if
(ωi = ωi1 , . . . , ωik = ω j ) is the unique path in B connecting ωi with ω j and
−−−−−−→
gl = f ((ωil , ωil+1 )) for l ∈ [1, k − 1] are the labels of edges along this path, then
−−−−→
we can define the label of (ωi , ω j ) in T as the product g1 · · · gk−1 . Although T
is not a forest, its edges and their labels satisfy properties (iii) and (iv) of the
definition before the lemma. With a slight abuse of notation, we shall denote
the extension of the function f that assigns labels to the edges of T by f as
well, and we shall denote by f (T ) the set of elements of G that are labels of
some edge in T .
220 Large-Base Groups

The reason for introducing labeled branchings is that we want to encode


transversals G [i] mod G [i+1] , for 1 ≤ i ≤ n − 1. We say that the labeled branch-
ing B represents a transversal G [i] mod G [i+1] if for all ω j in the fundamental
[i] −−−−→
orbit ωiG , either i = j or (ωi , ω j ) is an edge of T . The transversal represented
−−−−→ [i]
by B is {()} ∪ { f ((ωi , ω j )) | ω j ∈ ωiG \{ωi }}. We say that B represents G if it
represents a transversal G [i] mod G [i+1] for all i ∈ [1, n − 1].
As an example, let us consider G = Alt([1, 4]), with the ordering
1 ≺ 2 ≺ 3 ≺ 4 of the permutation domain. The directed graph with edge set
−−→ −−→ −−→ −−→ −−→
{(1, 2), (2, 3), (2, 4)} and labels f ((1, 2)) = (1, 2)(3, 4), f ((2, 3)) = (2, 3, 4),
−−→
f ((2, 4)) = (2, 4, 3) is a labeled branching B representing G, and its transitive
−−→
closure T has the additional edge labels f ((1, 3)) = (1, 2)(3, 4) · (2, 3, 4) =
−−→
(1, 3, 2) and f ((1, 4)) = (1, 2)(3, 4) · (2, 4, 3) = (1, 4, 2). The transversals
represented by B are {(), (1, 2)(3, 4), (1, 3, 2), (1, 4, 2)} for G [1] mod G [2] ,
{(), (2, 3, 4), (2, 4, 3)} for G [2] mod G [3] , and {()} for G [3] mod G [4] .

Theorem 10.1.2. Let G ≤ Sym(), and let ω1 ≺ · · · ≺ ωn be an arbitrary


ordering of . Then there exists a labeled branching representing G, with
respect to the point stabilizer chain defined by this ordering.

Proof. Let G = G [1] ≥ G [2] ≥ · · · ≥ G [n−1] ≥ G [n] = 1 be the point stabilizer


chain corresponding to the ordering ω1 ≺ · · · ≺ ωn . We define a directed graph
−−−−→
T with vertex set  by the following rule: For 1 ≤ i, j ≤ n, (ωi , ω j ) is an edge
[i] −−−−→
of T if and only if i < j and ω j ∈ ωiG . Then T is transitive, since if (ωi , ω j )
−−−−→
and (ω j , ωk ) are edges of T then i < j < k and there exist g1 ∈ G [i] with
g1 g g g
ωi = ω j and g2 ∈ G [ j] with ω j 2 = ωk . Hence ωi 1 2 = ωk and g1 g2 ∈ G [i] ,
−−−−→
implying that (ωi , ωk ) is an edge of T .
Let E be a subset of the edge set of T such that the transitive closure of
the graph B(, E)  is T and E is a minimal set (subject to inclusion) with this
property.
We claim that for each ωk ∈ , at most one edge in E has endpoint ωk
and so, by Lemma 10.1.1, the underlying graph of B is a forest. Suppose,
−−−−→ −−−−→
on the contrary, that (ωi , ωk ) ∈ E and (ω j , ωk ) ∈ E for some i < j. Then
g g
there exist g1 ∈ G [i] with −1ωi 1 = ωk and g2 ∈ G [ j] with ω j 2 = ωk . We
−1 g1 g2 −−−−→
have g1 g2 ∈ G and ωi[i]
= ω j ; hence, by definition, (ωi , ω j ) is an edge
of T . Since the transitive closure of B is T , there is a directed path (ωi =
ωi1 , . . . , ωil = ω j ) connecting ωi with ω j in B, and (ωi = ωi1 , . . . , ωil , ωk ) is a
−−−−→  its
directed path connecting ωi with ωk . Therefore, if we delete (ωi , ωk ) from E,
transitive closure remains the same. However, this contradicts the minimality
of E.
10.1 Labeled Branchings 221
So far, we have shown that B satisfies properties (i)–(iii) of being a labeled
branching (for (ii), note that in each connected component the vertex with
smallest index can be chosen as the root). By the definition of edges of T , it is
also clear that there are elements of G that can be chosen as the labels of the
edges of B such that (iv) is satisfied. Moreover, since the transitive closure of
B is T , it is clear that B represents G. 

Now we describe a data structure for storing a labeled branching B that en-
ables us to recover the labels of edges of the transitive closure T of B efficiently.
First, we store an array B of length n := ||, containing integers from [1, n].
We define B[k] := k if ωk is the root of a tree in B; otherwise, B[k] := j for
−−−−→
the integer j such that (ω j , ωk ) is the unique edge in B with endpoint ωk .
The most natural solution for storing the labels of edges of B is to define an
array P of permutations of length n such that the kth entry P[k] is the identity
permutation if ωk is the root of a tree in B, and otherwise P[k] is the label of
−−−−→
the edge (ω j , ωk ) for the parent ω j of ωk . However, if we need the label of an
−−−−→
edge (ωi , ωk ) of T then we have to construct the path from ωi to ωk in B, and
then we have to take the product of edge labels along this path. Although the
construction of the path can be done in O(n) time, starting at ωk and using the
array B to construct larger and larger finishing segments of the path, taking
the product of edge labels may require (n 2 ) time.
We can do better if we also define an array Q of n permutations as follows:
Let Q[k] := () if ωk is the root of a tree in B, and otherwise let Q[k] be the
product of edge labels along the path from ωt to ωk , where ωt is the root of the
−−−−→
tree containing ωk . Then the label of an edge (ωi , ωk ) of T can be computed in
O(n) time, as the product Q[i]−1 Q[k].
−−−−→
Given i, j ∈ [1, n], we have two methods to decide whether (ωi , ω j ) is an
edge of T . The first one is to use the array B and to check whether the unique
path from ω j to the root of the tree containing ω j goes through ωi . The second
−−−−→
method computes Q[i]−1 Q[ j]. It is clear that (ωi , ω j ) is an edge of T if and
only if Q[i]−1 Q[ j] fixes {ω1 , . . . , ωi−1 } pointwise. The second method also
−−−−→
computes the label of the edge (ωi , ω j ), in the case when this ordered pair is
indeed an edge of T .
If the labeled branching B represents G then the array Q can be used to
test membership in G, in O(n 2 ) time. Given g ∈ Sym(), we try to factor g
as a product g = rn−1 · · · r1 of coset representatives, as in the original sifting
procedure of Sims (cf. Section 4.1). If r1 , . . . , ri−1 and gi := gr1−1 · · · ri−1
−1
gi
are already computed for some i ∈ [1, n − 1] then we compute ω j := ωi and
Q[i]−1 Q[ j]. If Q[i]−1 Q[ j] fixes {ω1 , . . . , ωi−1 } pointwise then we define ri :=
Q[i]−1 Q[ j] and gi+1 := gi ri−1 ; otherwise we conclude that g ∈ G.
222 Large-Base Groups

10.1.1. Construction
In Theorem 10.1.2, we have shown that each permutation group can be rep-
resented by a labeled branching. However, the proof of that theorem did not
provide an efficient algorithm for the construction of the labeled branching.

Theorem 10.1.3. Given G = S ≤ Sym(), with || = n, and an ordering


ω1 ≺ · · · ≺ ωn of , a labeled branching representing G with respect to the
point stabilizer chain defined by this ordering can be constructed in O(|S|n 2 +
n 5 ) time by a deterministic algorithm. The memory requirement is O(|S|n +n 2 ).

Proof. Let G = G [1] ≥ G [2] ≥ · · · ≥ G [n−1] ≥ G [n] = 1, G [i] = G (ω1 ,...,ωi−1 ) ,


be the point stabilizer chain corresponding to the ordering ω1 ≺ · · · ≺ ωn . The
original Schreier–Sims algorithm, described in Section 4.2, constructs an SGS
for G in a bottom-up manner. This means that, in intermediate steps, we have
a correct strong generating set for a subgroup Hi ≤ G [i] for some index i, and
then we try to construct an SGS for a subgroup Hi−1 ≤ G [i−1] containing Hi .
In contrast, here we proceed in a top-down manner. Recursively for i =
1, 2, . . . , n, we construct labeled branchings B1 , . . . , Bn such that Bi satisfies
the following two properties:

(c1) Bi represents transversals G [ j] mod G [ j+1] , for all j ∈ [1, i − 1].


(c2) f (Bi ) ∩ G [i] generates G [i] .

At the end of the recursion, Bn represents G. During the construction, the


branchings are stored using the arrays B, P, and Q, as described at the end of
Section 10.1. When the construction is complete, we discard P.
Suppose that Bi is already constructed for some i ∈ [1, n − 1]. Then we
compute Bi+1 by the following algorithm:

Step 1: Compute a generating set Si for G [i] and a transversal Ti for G [i]
mod G [i+1] .
Step 2: Modify Bi into a labeled branching Bi such that Bi represents Ti ,
maintaining the property that Bi represents transversals G [ j] mod G [ j+1] ,
for all j ∈ [1, i − 1].
Step 3: For each Schreier generator g constructed from Si and Ti
(cf. Lemma 4.2.1), modify Bi into a labeled branching Bi such that
 f (Bi ) ∩ G [i+1]  =  f (Bi ) ∩ G [i+1] , g, maintaining the property (c1)
for all j ∈ [1, i]. Rename Bi as Bi .

At the end of Step 3, Bi+1 can be defined as the current version of Bi .
10.1 Labeled Branchings 223

To start the recursion, we define the trivial branching B0 by the arrays B, P,
and Q with B[ j] := j, P[ j] := (), and Q[ j] := () for all j ∈ [1, n], and we
perform Step 3 using the elements g of the input generating set S.
Now we describe the three steps of the construction of Bi+1 in more detail.
In Steps 1 and 2, we work only with the arrays B and P.
In Step 1, we obtain Si as Si := {P[k] | i ≤ B[k] < k}. Then we compute
S 
the orbit ωi i . The elements of the transversal Ti can be constructed during
the orbit computation, as described at the end of Section 2.1.1. The time and
memory requirements of Step 1 are O(n 2 ).
In Step 2, we modify the arrays P and B. We process each g ∈ Ti . Let
g
ωk := ωi . If k = i then we do not do anything; otherwise, we define P[k] := g
and B[k] := i.
We have to show that this definition of P[k] and B[k] does not destroy
the property that the labeled branching Bi , defined by B and P, represents
transversals G [ j] mod G [ j+1] , for all j ∈ [1, i − 1]. Let j ≤ i − 1 be fixed,
[ j]
and let ωl be an arbitrary element of the fundamental orbit ω Gj . If the unique
path from ω j to ωl in Bi (before the modification of Bi ) does not go through
ωk then this path remains intact after the change of P[k] and B[k]; therefore, it
is enough to consider the case that the path from ω j to ωl contains ωk . In this
g
case, in particular, ωk ∈ ωGj and so there exists g j ∈ G [ j] with ω j j = ωk . Then
[ j]

−1
g g
ωj j = ωi and g j g −1 ∈ G [ j] , so there is a path ω j = ωi1 , . . . , ωim = ωi from
ω j to ωi in Bi (this path exists before and after the change of P[k] and B[k]).
Hence ω j = ωi1 , . . . , ωim , ωk is a path in Bi after the change of P[k] and B[k],
and so there is a path from ω j to ωl .
The time requirement of Step 2 is O(n 2 ), and it is clear that after Step 2 the
labeled branching represents the transversal Ti . After Step 2, we reconstruct the
array Q. For k = 1, 2, . . . , n, we define Q[k] := () if B[k] = k and Q[k] :=
Q[B[k]] · P[k] if B[k] < k. The construction of Q requires O(n 2 ) time. In
Step 3, we shall use all three arrays B, P, and Q.
Before we describe Step 3, we introduce a quantity associated with labeled
branchings. Let B be a labeled branching, stored in the arrays B, P, and Q as
in the foregoing. The length l(B) of B is defined as
 
l(B) = (k − B[k]) + k.
{k|B[k]<k} {k|B[k]=k}

In Step 3, we apply a modification of the sifting procedure described at the


end of Section 10.1. Given a labeled branching Bi that satisfies property (c1)
for i + 1 (i.e., it represents transversals G [ j] mod G [ j+1] for all j ∈ [1, i]),
and given some g ∈ G [i+1] , the sifting of g returns one of the following three
outputs:
224 Large-Base Groups

(i) a report that g ∈  f (Bi ) ∩ G [i+1] ;


(ii) a labeled branching Bi that satisfies property (c1) for i + 1 and l(B  ) <
l(B  ) and  f (Bi ) ∩ G [i+1]  =  f (Bi ) ∩ G [i+1] , g; or
(iii) a labeled branching Bi and h ∈ G [i+1] with the properties that Bi satisfies
(c1) for i + 1 and l(B  ) < l(B  ), and  f (Bi ) ∩ G [i+1] , h =  f (Bi ) ∩
G [i+1] , g.

Some explanations are in order. In Section 4.2, in the original Schreier–Sims


procedure, sifting a Schreier generator g ∈ G [i+1] in the already constructed
Schreier tree data structure has two possible outcomes. One possibility is that
g sifts through, which means that g can be discarded. This case corresponds
to (i) above. The other possibility is that g has a nontrivial siftee h, which can
be added to the strong generating set we construct. We try to apply the same
strategy here and add the siftee h to Bi . However, a labeled branching is a
more complicated data structure than a sequence of Schreier trees. Sometimes,
we succeed in adding the siftee to Bi and we are in case (ii) above; at other
times, we do not succeed, and we are in case (iii). (In case (iii), the output h
is not the siftee of g, just some other group element satisfying the property
described there.) The progress we make in case (iii) is that the length of the
labeled branching decreases, so, after attempting to sift h, sooner or later an
output of type (i) or (ii) must occur.
Now we describe the sifting procedure. Given g ∈ G [i+1] , we attempt the
factorization of g as a product g = rn−1 · · · ri+1 of coset representatives, as
described at the end of Section 10.1. If the factorization succeeds then we are
in case (i) and we can terminate sifting. The other possibility is that sifting
−1 −1
constructs a group element gl := gri+1 · · · rl−1 with the following properties:
g
gl fixes {ω1 , . . . , ωl−1 } pointwise, ωl l = ωk for some k > l,
and there is no directed path from ωl to ωk in Bi . (10.1)

(Recall that we notice that there is no such path by the fact that Q[l]−1 Q[k] does
not fix {ω1 , . . . , ωl−1 } pointwise.) In some gl satisfying (10.1) is constructed
then sifting enters a second phase that tries to decrease the value of k. Namely,
−−−−−→
if there is an edge (ωm , ωk ) in Bi with m > l then gl is replaced by gl · P[k]−1 ,
and we redefine k := m. This new gl also satisfies (10.1). The second phase
terminates when B[k] = k or B[k] < l.
−−−−→
If B[k] = k then we add the edge (ωl , ωk ) to Bi with label gl , and terminate
sifting with an output in case (ii). Adding the edge to Bi is done by defining
B[k] := l and P[k] := gl .
−−−−−→ −−−−→
If m := B[k] < l then we delete the edge (ωm , ωk ) and add the edge (ωl , ωk )
with label gl . Again, this modification is done by defining B[k] := l and
10.2 Alternating and Symmetric Groups 225
P[k] := gl . The same argument as in Step 2 shows that the new branching
Bi represents transversals G [ j] mod G [ j+1] for all j ∈ [1, i]. Moreover, if m <
i + 1 then  f (Bi ) ∩ G [i+1]  =  f (Bi ) ∩ G [i+1] , g and we are in case (ii). If
−−−−−→
i + 1 < m < l then we also output the label of the deleted edge (ωm , ωk ) as h,
and we are in case (iii).
If sifting terminates with output in case (ii) or (iii) then after termination we
recompute the array Q in the same way as was done after Step 2.
Finally, we estimate the time requirement of Step 3. One call of sifting and
the possible recomputation of Q takes O(n 2 ) time. For a fixed i ≥ 1, the number
of calls is less than 2n 2 , since we sift less than n 2 Schreier generators and less
than n 2 permutations h obtained as an output in case (iii) of some previous call
of sifting. The number of outputs in case (iii) is less than n 2 since at each output
in case (ii) or (iii) the length of the labeled branching decreases, and the original
length after Step 2 is at most n(n − 1)/2. Similarly, for i = 0, the number of
calls of sifting is less than |S| + n 2 . Hence the total time requirement of Step 3,
added up for all i, is O(n 5 + |S|n 2 ). The memory requirement for the storing
of Si , Ti , B, P, and Q is O(n 2 ). 

10.2. Alternating and Symmetric Groups


Before we invoke any form of the Schreier–Sims algorithm for an input group
G = S ≤ Sym(), it is good to know whether G = Alt() or G = Sym().
We shall call Alt() or Sym() the giant permutation groups on . For giant
inputs, the running times of the original Schreier–Sims algorithm and of the
nearly linear-time version described in Section 4.5 are unnecessarily long, since
there are faster, specialized algorithms that can handle these groups.
An algorithm for the fast recognition of giants is one of the first results in com-
putational permutation group theory (cf. [Parker and Nikolai, 1958]). A variant
of this algorithm is also described in [Cannon, 1984]. The method is based on the
following result from [Jordan, 1873] (see also [Wielandt, 1964, Theorem 13.9]).

Theorem 10.2.1. Let G ≤ Sym(), with || = n, be a primitive group, and


let p be a prime satisfying p < n − 2. If G contains a p-cycle then G is a giant.

Corollary 10.2.2. Let G ≤ Sym(), with || = n, be transitive, and let p be


a prime satisfying n/2 < p < n − 2. If G contains an element that has a cycle
of length p then G is a giant.

Proof. Let g ∈ G have a cycle of length p. Since p  (n − p)!, we have that
g (n− p)! is a p-cycle. Moreover, G is primitive, since an imprimitive, transitive
226 Large-Base Groups

subgroup H ≤ Sym() must satisfy H  Sm  Sk for some k, m ∈ [2, n/2],


km = n. Hence H cannot contain an element of order divisible by p. 

To apply Corollary 10.2.2, we need an estimate for the proportion of elements


of the giants having a cycle of length p for some prime p satisfying n/2 < p <
n − 2.

Lemma 10.2.3. (a) Let q be an integer in the range n/2 < q ≤ n. The
proportion of elements of Sn containing a cycle of length q is 1/q. In An , this
proportion is 1/q if q ≤ n − 2, and it is 0 or 2/q if q ∈ {n − 1, n}.
(b) The proportion of elements of the giants that contain a cycle of length p
for some prime p in the range n/2 < p < n − 2 is asymptotically ln 2/ ln n.

Proof. (a) There are ( qn )(q − 1)!(n −q)! elements of Sym([1, n]) with a cycle of
length q, since there are ( qn ) possibilities for the support of this cycle, and on
a fixed support, there are (q − 1)! cycles. The term (n − q)! counts the number
of possibilities for the restrictions of the permutations on [1, n]\ . We counted
every permutation containing a cycle of length q exactly once, since q > n/2
implies that a permutation cannot contain two cycles of length q. Dividing by n!,
we obtain the required proportion 1/q. If q ≤ n − 2 then a similar computation
shows that there are ( qn )(q − 1)!(n − q)!/2 elements of Alt([1, n]) with a cycle
of length q, and dividing by n!/2 gives the proportion 1/q. If q ∈ {n − 1, n}
then either none or all of the ( qn )(q − 1)!(n − q)! permutations of Sym([1, n])
containing a cycle of length q are in Alt([1, n]).
(b) No element of Sym([1, n]) can have cycles of lengths p1 and p2 for
two different primes greater than n/2. Therefore, the proportion of elements
of the giants that contain a cycle of length p for some prime p in the range
n/2 < p < n − 2 is the sum of the proportions for each p. The sum of the
reciprocals of primes less than x is asymptotically ln ln x (see [Hardy and
Wright, 1979, Theorem 427]); hence
  
1 n ln(n − 2) ln 2 ln 2
∼ ln ln(n−2)−ln ln = ln ∼ ln 1 + ∼ .
n/2< p<n−2
p 2 ln n − ln 2 ln n ln n


To decide whether the input G = S ≤ Sym() is a giant, first we compute


whether G is transitive on . If G is transitive then we take random elements,
and we compute their cycle structure. If we find an element with a cycle of
length p in the range n/2 < p < n − 2 then we know with certainty that G is a
giant. The symmetric and alternating groups are distinguished from each other
10.2 Alternating and Symmetric Groups 227
by examining whether there is a generator of G that is an odd permutation. If
none of c ln n/ ln 2 random elements have a cycle of length p for any prime in
the range n/2 < p < n − 2 then, with probability about 1 − e−c , the input is
not a giant. Since we have no SGS for G available, we have to use one of the
methods of Section 2.2 for random element generation.
The algorithm described in the previous paragraph is a nonconstructive recog-
nition algorithm. It gives an answer for the decision problem whether G is a giant
but, in the case of an affirmative answer, it does not provide a way of expressing
group elements in terms of the input generators. In a number of applications
(for example, in Theorems 8.3.1 or 8.4.1, or when working with matrix groups)
this is not enough: We need the constructive recognition of symmetric and
alternating groups. We recall the definition of constructive recognition from
Section 8.3, specialized for alternating and symmetric groups of known degree.
Let G = ⟨S⟩ be a black-box group, isomorphic to An or Sn for a given integer
n. We say that G is constructively recognizable if there is a Las Vegas algorithm
that decides whether G ≅ An or G ≅ Sn and finds a new set S∗ = {s, t} generating
G and a homomorphism λ : G → Sym([1, n]), specified by the image of
S∗, such that ⟨S∗⟩ satisfies the presentation

⟨s, t | s^n = t^2 = (st)^{n−1} = [t, s^j]^2 = 1 for 2 ≤ j ≤ n/2⟩        (10.2)

in the case G ≅ Sn,

⟨s, t | s^{n−2} = t^3 = (st)^n = (ts^{−k}ts^k)^2 = 1 for 1 ≤ k ≤ (n − 3)/2⟩        (10.3)

in the case G ≅ An with n odd, and

⟨s, t | s^{n−2} = t^3 = (st)^{n−1} = [t, s]^2 = 1⟩        (10.4)

in the case G ≅ An with n even. Moreover, there are deterministic algorithms
for the following:

(i) Given g ∈ G, find λ(g) and a straight-line program of length O(n log n)
from S∗ to g.
(ii) Given h ∈ Sym([1, n]), decide whether or not h ∈ λ(G); and, if it is, find
λ^{−1}(h) and a straight-line program of length O(n log n) from λ(S∗) to h.

We note that the presentation (10.2) is from the book [Coxeter and Moser,
1957], and the presentations (10.3) and (10.4) are from [Carmichael, 1923].
The permutations s = (1, 2, . . . , n) and t = (1, 2) satisfy (10.2), whereas s =
(3, 4, . . . , n) and t = (1, 2, 3) satisfy (10.3), and s = (1, 2)(3, 4, . . . , n) and
t = (1, 2, 3) satisfy (10.4).
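As a small sanity check (an illustration only, not part of the text), one can verify in GAP that the permutations just listed satisfy, say, presentation (10.2) for one small value of n:

  n := 9;;
  s := PermList(Concatenation([2 .. n], [1]));;        # s = (1, 2, ..., n)
  t := (1,2);;
  s^n = () and t^2 = () and (s*t)^(n-1) = ()
    and ForAll([2 .. Int(n/2)], j -> Comm(t, s^j)^2 = ());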

The rest of this section is devoted to the description of the constructive recog-
nition of alternating and symmetric groups from [Beals et al., 2002]. The mate-
rial in Sections 10.2.1–10.2.3 is reproduced from this paper, © 2002 American
Mathematical Society. Reprinted by Permission. Although the algorithm works
in the black-box group setting, so it can be used to construct an isomorphism
between G and a standard copy of An or Sn for any permutation group or matrix
group representation G of these groups, there is a much simpler version if G is
a giant (i.e., G is a permutation group acting on a domain of size n). We shall
comment on this simpler version in Section 10.2.4.
Let ξ be an upper bound on the time required per element to construct inde-
pendent, nearly uniformly distributed random elements of G and let µ be an
upper bound on the time required for each group operation in G.

Theorem 10.2.4. (a) Given an integer n ≥ 7, a black-box group G = ⟨S⟩ isomorphic
to An or Sn, and an upper bound ε > 0 for the probability of failure
of the algorithm, G can be constructively recognized in O(log(ε^{−1})(ξn +
µ|S|n log n)) time, with probability at least 1 − ε.
The time requirement of (i) is O(µn log n), whereas constructing the straight-line
program in (ii) is in O(n log n) time and the computation of the inverse
image is in O(µn log n) time. The data structures underlying (i) and (ii) require
the storage of O(log n) elements of G.
(b) Given an arbitrary black-box group G = ⟨S⟩, the algorithm described
in part (a) can be used as a Monte Carlo algorithm to decide whether G is
alternating or symmetric of a given degree n. If the algorithm returns the answer
“yes” then the answer is correct. The time requirement is O(log(ε^{−1})(ξn +
µ|S|n log n)).

We shall prove Theorem 10.2.4 in Sections 10.2.2 and 10.2.3 (cf. the summary
at the end of Section 10.2.3).

10.2.1. Number Theoretic and Probabilistic Estimates


In this section, we collect some estimates that are needed in the proof of Theorem
10.2.4. We shall use the following notation: If the cycle decomposition
of an element h ∈ Sn consists of m_i cycles of length i, where 0 ≤ m_i ≤ n for
1 ≤ i ≤ n, then we say that h has cycle type 1^{m_1} · · · n^{m_n}. If m_i = 0 then we
omit i^{m_i} from this expression. Note that if n ≠ 6 then Aut(An) ≅ Sn, so the
cycle type of λ(g) is the same under any isomorphism λ : G → Sn or An. Hence
we can talk about the cycle type of elements of G.
The first four lemmas are of a number theoretic nature. In most cases, there
are better asymptotic results but, for the algorithmic applications, we need
estimates that are valid for all values of n.

Lemma 10.2.5. For all n, the number of divisors of n is at most 48n^{1/3}/2520^{1/3},
and the sum of the divisors of n is at most n(3 + ln n)/2.

Proof. For each δ > 0, the number of divisors of n is at most C_δ n^δ for a constant
C_δ. An algorithm for computing the value of C_δ is described in [Niven et al.,
1991, pp. 395–396]; C_{1/3} = 48/2520^{1/3} ≈ 3.527.
We use the trivial estimate $\sum_{i=1}^{\lceil\sqrt{n}\rceil-1} i < n/2$ for the sum of the divisors less
than $\sqrt{n}$. The sum of divisors greater than or equal to $\sqrt{n}$ is at most
$$n + \frac{n}{2} + \frac{n}{3} + \cdots + \frac{n}{\lceil\sqrt{n}\rceil} \;<\; n\left(1 + \int_1^{\sqrt{n}} \frac{dt}{t}\right) \;=\; n\left(1 + \frac{\ln n}{2}\right).$$
Adding these two estimates, we obtain the second assertion of the lemma. □

Lemma 10.2.6. Let x = np, where p is prime, with n > p^2 and all prime
divisors of n greater than p. Let D denote the set of divisors of x that are not
greater than n. Then |D| ≤ 96n^{1/3}/2520^{1/3} < 8n^{1/3} and
$$\sum_{d \in D} d \;\le\; n\,\frac{p + 3 - 2(p+1)\ln p + (p+1)\ln n}{2}.$$

Proof. Because p ∤ n, the number of divisors of x is twice the number of divisors
of n. Thus Lemma 10.2.5 yields |D| ≤ 96n^{1/3}/2520^{1/3}.
Also, because of the restriction on the prime factorization of n, all divisors
of x except np are at most n, and $\sum_{d\in D} d = (p+1)\sum_{d\mid n} d - pn$. Hence it
is enough to estimate the sum of the divisors of n. We use a refinement of the
argument in Lemma 10.2.5. For the sum of divisors less than $\sqrt{n}$, we still use
the trivial estimate n/2. However, for the larger divisors, we note that since the
largest proper divisor of n is at most n/(p + 1), the sum of divisors greater
than or equal to $\sqrt{n}$ is at most
$$n + \frac{n}{p+1} + \frac{n}{p+2} + \cdots + \frac{n}{\lceil\sqrt{n}\rceil} \;<\; n\left(1 + \int_p^{\sqrt{n}} \frac{dt}{t}\right) \;=\; n\left(1 + \frac{\ln n}{2} - \ln p\right).$$
Combining these estimates, we obtain the assertion of the lemma. □



Lemma 10.2.7. Let x = ny, and let D denote the set of divisors of x that are
not greater than n. Let k > 1. Then
$$\sum_{d\in D} d^k \;\le\; n^k\left(1 + \frac{y}{k-1}\right).$$

Proof. All divisors d ∈ D are of the form d = x/s for some s ≥ y. Therefore,
$$\sum_{d\in D} d^k \;\le\; \sum_{s=y}^{x} \left(\frac{x}{s}\right)^k \;<\; \left(\frac{x}{y}\right)^k + x^k \int_y^{\infty} \frac{dt}{t^k} \;=\; \left(\frac{x}{y}\right)^k + x^k \cdot \frac{1}{(k-1)y^{k-1}} \;=\; \left(\frac{x}{y}\right)^k\left(1 + \frac{y}{k-1}\right). \qquad \Box$$
Lemma 10.2.8. Let x = m! (n − m). If n > m + m! m then the largest divisor
of x not exceeding n is n − m.

Proof. Suppose that there is a divisor k of x with n − m < k ≤ n. Then


(k, n − m) ≤ m, and so the least common multiple of k and n − m is at least
k(n − m)/m. This implies x ≥ k(n − m)/m > k(m! m)/m > m! (n − m),
which is a contradiction. 
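Lemma 10.2.8 is also easy to check numerically; the following GAP one-liner (an illustration only, with arbitrarily chosen pairs (m, n) satisfying n > m + m!·m) confirms it for a few parameter values.

  ForAll([[2, 20], [3, 40], [4, 200]], pair ->
    Maximum(Filtered(DivisorsInt(Factorial(pair[1]) * (pair[2] - pair[1])),
                     d -> d <= pair[2]))
    = pair[2] - pair[1]);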

The proof of the next lemma is quite simple and, since it is similar to the
proof of Lemma 10.2.3(a), it is omitted.

Lemma 10.2.9. For n ≥ 10, each column of the following table lists a cycle
type in the first row and the proportion of elements of that cycle type in Sn in
the second row.

  cycle type    n^1              1^1 (n−1)^1        3^1 (n−3)^1
  proportion    n^{−1}           (n−1)^{−1}         (1/3)(n−3)^{−1}

  cycle type    1^1 3^1 (n−4)^1  1^2 3^1 (n−5)^1    1^3 3^1 (n−6)^1
  proportion    (1/3)(n−4)^{−1}  (1/6)(n−5)^{−1}    (1/18)(n−6)^{−1}

In An, the proportions are 0 or twice the proportions in Sn, depending on
whether the permutations with the desired cycle type are odd or even.

Let x be the product of the cycle lengths in a cycle type occurring in
Lemma 10.2.9. The rest of this section is devoted to estimating the conditional
probability that a randomly chosen element h ∈ An or Sn satisfying h^x = 1 has
the cycle type that defined x. This is the key step in the analysis of the algorithm
proving Theorem 10.2.4.
For integers n and x, we define Tn(x) := {h ∈ Sn | h^x = 1} and Nn(x) :=
|Tn(x)|.

Theorem 10.2.10. Let m be a fixed nonnegative integer. For any ε > 0 there
exists a bound n(ε) such that if h is a uniformly distributed random permutation
from Sn for some n > n(ε) then the probability that h^{m!(n−m)} = 1 is less than
(1 + ε)/n.

Proof. If h is a uniformly distributed random permutation from Sn then
Prob(h^{m!(n−m)} = 1) = Nn(m!(n − m))/n!. Therefore, we have to prove the upper
bound (1 + ε)n!/n for Nn(m!(n − m)). We can suppose that n > 8 + 8m! m.
We denote the set of divisors of m!(n − m) by D.
The basic strategy of the proof is as follows. For h ∈ Sn, h^{m!(n−m)} = 1 if and
only if the length of each cycle of h is a divisor of m!(n − m). This gives us
too many conditions to handle, so, instead, we fix a number k and consider the
set of those h that satisfy the property that the lengths of the cycles intersecting
the first k points of the permutation domain are divisors of m!(n − m). We shall
compute an upper estimate for the number of such permutations in terms of
n, m, and k (cf. (10.7)), and we are lucky enough that k can be chosen to be a
constant (depending on m and ε, but independent of n) so that for large enough
n, this upper estimate is less than (1 + ε)n!/n.
Let k ≥ 3 be fixed, let Π = {P_1, . . . , P_l} be a fixed partition of {1, . . . , k} into
l nonempty parts for some l ≤ k, and let d_1, d_2, . . . , d_l be a sequence of elements
from D such that ∑_{i=1}^{l} d_i ≤ n. First, we estimate from above the number
N(k, Π, d_1, . . . , d_l) of permutations h ∈ Tn(m!(n − m)) for which h has cycles
C_1, . . . , C_l of lengths d_1, . . . , d_l, respectively, such that C_i ∩ {1, 2, . . . , k} = P_i
for 1 ≤ i ≤ l.
We can choose the support set of C_1 in $\binom{n-k}{d_1 - |P_1|}$ ways, and then the cycle C_1
itself in (d_1 − 1)! ways. Recursively, if C_1, . . . , C_{i−1} are already defined, the
support set of C_i can be chosen in
$$\binom{n - k - \sum_{j=1}^{i-1}(d_j - |P_j|)}{d_i - |P_i|}$$
ways, and then the cycle C_i can be chosen in (d_i − 1)! ways. Finally, after
C_1, . . . , C_l are defined, the rest of the permutation can be chosen in at most
(n − ∑_{i=1}^{l} d_i)! ways. Multiplying these numbers, we obtain
$$N(k, \Pi, d_1, \ldots, d_l) \;\le\; \frac{(n-k)!\,\prod_{i=1}^{l}(d_i - 1)!}{\prod_{i=1}^{l}(d_i - |P_i|)!} \;\le\; (n-k)!\,\prod_{i=1}^{l} d_i^{|P_i| - 1}. \tag{10.5}$$

For a fixed partition Π, let N(k, Π) denote the sum of all N(k, Π, d_1, . . . , d_l)
obtained above for all choices d_1, . . . , d_l of divisors of m!(n − m). If l = 1
then we obtain N(k, Π) ≤ (n − k)! ∑_{d∈D} d^{k−1}, which, by Lemmas 10.2.8 and
10.2.7, is at most
$$(n-k)!\,(n-m)^{k-1}\left(1 + \frac{m!}{k-2}\right). \tag{10.6}$$
For l ≥ 2, we use the estimates that, by Lemma 10.2.5, a sequence d_1, . . . , d_l
from D can be chosen in at most (8(m!(n − m))^{1/3})^l ways and for each sequence
trivially ∏_{i=1}^{l} d_i^{|P_i|−1} ≤ n^{∑_i(|P_i|−1)} = n^{k−l}. Hence, since n ≥ 8 + 8m! m,
$$N(k, \Pi) \;\le\; (n-k)!\left(8(m!(n-m))^{1/3}\right)^l n^{k-l} \;\le\; (n-k)!\left(8(m!(n-m))^{1/3}\right)^2 n^{k-2} \;<\; 64\,m!\,(n-k)!\,n^{k-4/3}.$$
We estimate (n − k)! by
$$(n-k)! \;=\; \frac{n!}{n^k}\cdot\frac{n}{n-1}\cdot\frac{n}{n-2}\cdots\frac{n}{n-k+1} \;<\; \frac{n!}{n^k}\left(1 + \frac{k}{n-k}\right)^k \;<\; \frac{n!}{n^k}\,e^{\frac{k^2}{n-k}}$$
and the number of partitions of {1, 2, . . . , k} by k^k (note that each partition can
be obtained as the sets where some function f : {1, 2, . . . , k} → {1, 2, . . . , k}
takes constant values). Combining these estimates with the observation that
each element of Tn(m!(n − m)) is counted exactly once in ∑_Π N(k, Π), we
obtain that
$$N_n(m!(n-m)) \;\le\; n!\left(\left(1 + \frac{m!}{k-2}\right)e^{\frac{k^2}{n-k}}\,\frac{1}{n} \;+\; 64\,m!\,k^k\,e^{\frac{k^2}{n-k}}\,\frac{1}{n^{4/3}}\right). \tag{10.7}$$
Given ε > 0, we choose k such that
$$\left(1 + \frac{m!}{k-2}\right)e^{\frac{k^2}{k^2-k}} \;<\; 1 + \frac{\varepsilon}{2}.$$
After that, we choose n_0 > k^2 such that
$$64\,m!\,k^k\,e^{\frac{k^2}{n_0-k}} \;<\; \frac{\varepsilon}{2}\,n_0^{1/3}.$$
Then, for n > max{8 + 8m! m, n_0}, we have Nn(m!(n − m)) < (1 + ε)n!/n. □


Corollary 10.2.11. Let ε > 0 and n > n(ε).

(a) Let h be a uniformly distributed random element of Tn(m!(n − m)). Then,
with probability greater than 1 − ε, h contains a cycle of length n − m.
(b) Let 2 ≤ s ≤ m, and let h be a uniformly distributed random element of
Tn((n − m)s). Then, with probability greater than (1 − ε)/m!, the cycle
structure of h is 1^{m−s} s^1 (n − m)^1.

Proof. (a) This is immediate from Lemma 10.2.3(a) and Theorem 10.2.10.
(b) Tn((n − m)s) ⊆ Tn(m!(n − m)), so it has at most (1 + ε)n!/n elements.
Out of these, there are n!/(s(n − m)(m − s)!) > n!/(m! n) with cycle structure
1^{m−s} s^1 (n − m)^1, so the proportion of elements in Tn((n − m)s) with the required
cycle structure is greater than (1 − ε)/m!. □

Corollary 10.2.11 covers all cases occurring in Lemma 10.2.9, since if m ∈
{0, 1} then part (a) of the corollary can also be interpreted as a conditional
probability of permutations with the required cycle structure. Note that if x is
odd then Tn(x) ⊆ An, so if (n − m)s is odd then the conditional probability that
an element h ∈ An has cycle structure 1^{m−s} s^1 (n − m)^1 given that h^{(n−m)s} = 1 is
the same as the corresponding conditional probability in Sn. However, for our
algorithmic application, we need a lower bound for the conditional probability
that is valid for all values of n.

Theorem 10.2.12. Let n ≥ 5 and let h be a randomly selected permutation
from Sn. Let x = n or x = (n − m)s for one of the cycle types 1^{m−s} s^1 (n − m)^1
described in Lemma 10.2.9. Then, given that h^x = 1, the conditional probability
that h is an n-cycle, or h has cycle structure 1^{m−s} s^1 (n − m)^1, is at least 1/180.

Proof. Using the notation of the proof of Theorem 10.2.10, we derive a tighter
upper bound for Nn(x) by evaluating (10.5) more carefully in the case k = 3.
First, we suppose that n > 50.
There is one partition Π_3 of {1, 2, 3} with three parts, and (10.5) gives
N(3, Π_3, d_1, d_2, d_3) ≤ (n − 3)! d_1^0 d_2^0 d_3^0 = (n − 3)!. By Lemmas 10.2.5 and 10.2.6,
$$N(3, \Pi_3) \;\le\; \frac{96^3}{2520}\,(n-m)\,(n-3)! \;\le\; \frac{96^3}{2520}\,n\,(n-3)!.$$
There are three partitions Π_2 of {1, 2, 3} with two parts, and (10.5) gives
N(3, Π_2, d_1, d_2) ≤ (n − 3)! d_1 d_2^0. Hence, using both statements of Lemmas
10.2.5 and 10.2.6,
$$N(3, \Pi_2) \;\le\; (n-3)!\,\frac{96}{2520^{1/3}}\,n^{4/3}\,\frac{6 - 8\ln 3 + 4\ln n}{2}.$$
In this estimate, we used Lemma 10.2.6 with p = 3, since for n > 50 this value
gives the largest upper bound.
Finally, there is one partition Π_1 of {1, 2, 3} with one part, N(3, Π_1, d_1) ≤
(n − 3)! d_1^2 and, by Lemmas 10.2.7 and 10.2.8, N(3, Π_1) ≤ 4n^2 (n − 3)!. (Again,
the case y = 3 yields the largest upper estimate.) Adding these estimates, we
obtain
$$N_n(x) \;\le\; (n-3)!\left(4n^2 + \frac{3\cdot 96}{2520^{1/3}}\,n^{4/3}\,(3 - 4\ln 3 + 2\ln n) + \frac{96^3}{2520}\,n\right) \;=\; f(n)\,(n-1)!.$$

The function f(n) is monotone decreasing for n ≥ 50.
We claim that Nn(x) ≤ 10(n − 1)! for all n ≥ 5. We have f(301) < 10,
so it is enough to check that Nn(x) ≤ 10(n − 1)! for 5 ≤ n ≤ 300. This can
be done by a GAP program, using the following recursion to compute Nn(x):
For 1 ≤ k ≤ n, let r_x(k) denote the number of permutations h ∈ Sk such that
h^x = 1, and initialize r_x(0) = 1. Then
$$r_x(k) \;=\; \sum_{d \mid x,\; d \le k} \binom{k-1}{d-1}\,(d-1)!\;r_x(k-d),$$
which can be seen by partitioning the permutations according to the length
of the cycle containing the first point of the permutation domain. We have
Nn(x) = r_x(n).
By Lemma 10.2.9 and by checking the cases n ≤ 9, which are not covered
by the lemma, the number of permutations with the required cycle structure is
at least (n − 1)!/18. Hence the proportion of these elements in Tn (x) is at least
1/180. 

Although Theorem 10.2.12 gives a positive constant lower estimate for the
conditional probability that can be used in the design of an algorithm with
the asymptotic running time described in Theorem 10.2.4, the constant 1/180 is
too small for an efficient implementation. In fact, we cannot expect a good
constant from an argument covering all cases together, since Nn(n) ≥ (n − 1)!
and the number of permutations with cycle type 1^3 3^1 (n − 6)^1 is only about
(n − 1)!/18. Therefore, in the range n ≤ 300, we have computed the value
of Nn(x) explicitly, and this can be used for obtaining better estimates for the
number of iterations in an implementation. We summarize these computations
in the next lemma. Without doubt, the constant 1/7 bounds the conditional
probabilities for larger values of n as well, for all required cycle types. We also
note that [Warlimont, 1978] gives an asymptotic estimate with a very precise
error term for Nn(n).
Lemma 10.2.13. Let 8 ≤ n ≤ 300.

(a) If n ∉ {8, 12, 24} then the conditional probability is greater than 1/2 for
the event that an element of Sn of order dividing n is an n-cycle. For all n,
the conditional probability is greater than 1/4.
(b) If n is even then the conditional probability is greater than 1/2 for the event
that an element of An of order dividing n − 1 has cycle type 1^1 (n − 1)^1.
(c) If n ∉ {31, 61, 91, 121, 151, 181, 211, 241, 271} then the conditional probability
is greater than 1/3 for the event that an element of An of order dividing
3(n − m) for the m values described in Lemma 10.2.9 has cycle type
1^{m−3} 3^1 (n − m)^1. For all n, the conditional probability is greater than 1/7.

10.2.2. Constructive Recognition: Finding the New Generators

Given a black-box group G = ⟨S⟩ isomorphic to An or Sn, in this section we
describe an algorithm that constructs s, t ∈ G such that the subgroup ⟨s, t⟩
satisfies the presentation (10.3) or (10.4), if n is odd or even, respectively.
We construct random elements of G (the number of which are needed will be
computed in the proof of Theorem 10.2.17) to find a ∈ G satisfying a^{n−k} = 1,
where k = 0 if n is odd, and k = 1 if n is even. Also, we construct random
elements c ∈ G to find one satisfying c^{3(n−m)} = 1, where m is given in the
following table:

   n mod 6   0   1   2   3   4   5
   m         5   6   3   4   3   4          (10.8)

Then, by Theorem 10.2.12, the cycle type of a is 1^k (n − k)^1 with probability at
least 1/180, and the cycle type of b := c^{n−m} is 1^{n−3} 3^1 with probability at
least 1/180.
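A sketch of this random search in GAP (an illustration under the assumptions of this section, with names chosen here; only powering and comparison with the identity are used, so the same code makes sense for a black-box group):

  FindAandC := function(G, n, maxTries)
    local mTable, m, k, id, a, c, g, i;
    mTable := [5, 6, 3, 4, 3, 4];              # m as a function of n mod 6, cf. (10.8)
    m := mTable[(n mod 6) + 1];
    if IsEvenInt(n) then k := 1; else k := 0; fi;
    id := One(G);  a := fail;  c := fail;
    for i in [1 .. maxTries] do
      g := PseudoRandom(G);
      if a = fail and g <> id and g^(n-k) = id then
        a := g;                                # hopefully an (n-k)-cycle
      fi;
      if c = fail and g^(3*(n-m)) = id and g^(n-m) <> id then
        c := g;                                # hopefully b := c^(n-m) is a 3-cycle
      fi;
      if a <> fail and c <> fail then
        return [a, c^(n-m)];                   # the pair (a, b)
      fi;
    od;
    return fail;
  end;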
The following two lemmas describe algorithms that, given a, b ∈ G as in the
previous paragraph, construct s, t ∈ G as required in (10.3) or (10.4). These
algorithms are of Las Vegas type, which means that if the input group elements
a, b have cycle type 1^k (n − k)^1 and 1^{n−3} 3^1, respectively, then the output group
elements s, t behave as required, or the algorithm reports failure. However, if
the input elements do not have the prescribed cycle types then the algorithms
may return an incorrect answer. If the output group elements s, t are incorrect
then this fact is noticed when we check whether ⟨s, t⟩ satisfies the presentation
(10.3) or (10.4).

Lemma 10.2.14. Let n be odd and n ≥ 7. Suppose that a ∈ G has cycle type
n^1 and b ∈ G has cycle type 1^{n−3} 3^1. Then in O(ξn + µn) time, it is possible to
construct s, t ∈ G such that ⟨s, t⟩ satisfies the presentation (10.3) for An. This
algorithm is of Las Vegas type and succeeds with probability at least 3/4.

Proof. Let us fix a homomorphism λ : G → Sym([1, n]) such that λ(a) =
(1, 2, . . . , n). Our first goal is to show that, with probability at least 3/4, among
1 + n/3 random conjugates of b we find one, c, satisfying λ(c) = (i, i + 1, k)
or λ(c) = (i + 1, i, k) with 1 ≤ i, k ≤ n and k ∉ {i, i + 1}, where the numbers
are taken modulo n.
Suppose c is a conjugate of b satisfying [c, c^a] ≠ 1. We claim that λ(c) has
support set {i, i + 1, k} as desired. Indeed, as c is a conjugate of b, it satisfies
λ(c) = (i, j, k) for some triple {i, j, k} and λ(c^a) = (i + 1, j + 1, k + 1). Now
c and c^a do not commute if and only if the sets {i, j, k} and {i + 1, j + 1, k + 1}
intersect. Hence it follows that two of i, j, k are consecutive numbers modulo n.
Next we show that the probability that we cannot find such an element c
among 1 + n/3 random conjugates of b is less than 1/4. There are $\binom{n}{3}$ possible
support sets for a 3-cycle, and out of these, n(n − 3) contain two consecutive
numbers modulo n. Hence one random conjugate succeeds with probability
n(n − 3)/$\binom{n}{3}$ = 6(n − 3)/((n − 1)(n − 2)), and the probability that 1 + n/3 >
2(n − 1)(n − 2)/(6(n − 3)) random conjugates succeed is greater than
1 − (1 − 6(n − 3)/((n − 1)(n − 2)))^{2(n−1)(n−2)/(6(n−3))} > 1 − 1/e^2 > 3/4.
The rest of the algorithm is deterministic and runs in O(µ) time. Without
loss of generality, we can suppose that supp(λ(c)) = {1, 2, k} for some k with
3 ≤ k ≤ n − 1. The next goal is to construct t ∈ G such that λ(t) = (1, 2, 3).
If k = 3 then cc^a is an involution whereas if 4 ≤ k ≤ n − 1 then cc^a has
order 5. Hence these two cases can be distinguished in O(µ) time.
Suppose first that k = 3. We can distinguish the cases λ(c) = (1, 2, 3) and
λ(c) = (1, 3, 2) by computing x := c^{a^2} and y := c^x. In the first case λ(y) =
(1, 2, 4) and in the second case λ(y) = (1, 5, 2), which can be distinguished by
checking whether [y, y^{a^2}] = 1. After that, t is defined as t := c or t := c^2,
respectively.
Suppose next that 4 ≤ k ≤ n − 1. The case k ∈ {4, n − 1} can be distinguished
from the case 5 ≤ k ≤ n − 2 by checking whether [c, c^{a^2}] = 1. If 5 ≤ k ≤ n − 2
then λ(c) = (1, 2, k) can be distinguished from λ(c) = (1, k, 2) by computing
x := c^a and y := c^x. In the first case λ(y) = (1, 3, k) and in the second
case λ(y) = (1, k, k + 1), which can be distinguished by checking whether
[y, y^a] = 1. If it turns out that λ(c) = (1, 2, k) then define t := [c^2, x]. If
λ(c) = (1, k, 2) then define t := [c, x^2].
If k ∈ {4, n − 1} then the cases λ(c) = (1, 2, 4), λ(c) = (1, 4, 2), λ(c) =
(1, 2, n − 1), and λ(c) = (2, 1, n − 1) are also distinguished by computing x :=
c^a and y := c^x. In the four cases, λ(y) = (1, 3, 4), λ(y) = (1, 4, 5), λ(y) = (n −
1, 1, 3), and λ(y) = (n − 1, n, 1), respectively. The third of these is distinguished
from the others as the only one with [y, y^a] = 1. The second one is distinguished
among the remaining three as the only one with [y, y^{a^2}] = 1. Finally, the first
and fourth are distinguished by the order of yy^a. If λ(c) = (1, 2, 4) or λ(c) =
(1, 2, n − 1) then define t := [c^2, x]. If λ(c) = (1, 4, 2) or λ(c) = (2, 1, n − 1)
then define t := [c, x^2].
Finally, output s := at^2, which satisfies λ(s) = (3, 4, . . . , n), and t. □

Lemma 10.2.15. Let n be even and n ≥ 10. Suppose that a ∈ G has cycle type
1^1 (n − 1)^1 and b ∈ G has cycle type 1^{n−3} 3^1. Then, in O(ξn + µn) time, it is
possible to construct s, t ∈ G such that ⟨s, t⟩ satisfies the presentation (10.4)
for An. This algorithm is of Las Vegas type and succeeds with probability at
least 3/4.

Proof. Let us fix a homomorphism λ : G → Sym([1, n]) such that λ(a) =
(2, 3, . . . , n). Our first goal is to show that, with probability at least 3/4, among
2n/3 random conjugates of b we find one, c, satisfying λ(c) = (1, i, j) with
2 ≤ i, j ≤ n.
Suppose c is a conjugate of b satisfying [c, c^a] ≠ 1, [c, c^{a^2}] ≠ 1, and
[c, c^{a^4}] ≠ 1. We claim that λ(c) has support set {1, i, j} as desired. Indeed,
as c is a conjugate of b, if 1 ∈ supp(λ(c)) then c does not commute with c^{a^m}
for any m. However, if supp(λ(c)) = {i, j, k} ⊆ {2, 3, . . . , n} then c commutes
with c^a if no two of i, j, k are consecutive in the (n − 1)-cycle λ(a), and in the
other cases, it is easy to check that c commutes with c^{a^2} or c^{a^4}.
Next we show that the probability that we cannot find such an element c
among 2n/3 random conjugates of b is less than 1/4. There are $\binom{n}{3}$ possible
support sets for a 3-cycle, and out of these, $\binom{n-1}{2}$ contain 1. One random
conjugate succeeds with probability $\binom{n-1}{2}/\binom{n}{3}$ = 3/n. Hence the probability
that 2n/3 random conjugates succeed is greater than 1 − (1 − 3/n)^{2n/3} > 1 −
1/e^2 > 3/4.
Define t := [c^a, c]. Then λ(t) = (1, i, i + 1) and, without loss of generality, we
may suppose λ(t) = (1, 2, 3). Then s := at satisfies λ(s) = (1, 2)(3, 4, . . . , n).
Output s and t. □

Lemma 10.2.16. Given s, t ∈ G, it can be checked in O(µn) time whether
⟨s, t⟩ satisfies the presentation for An given in (10.3) and (10.4).

Proof. The case when n is even, as well as the evaluation of the relators s^{n−2}, t^3,
and (st)^n in the odd case, is clear. In the case when n is odd, we evaluate the
relators (ts^{−k}ts^k)^2 for k = 1, . . . , (n − 3)/2 in (n − 3)/2 rounds: In the
kth round, we use the input s^{k−1} and we output s^k and (ts^{−k}ts^k)^2. One round
requires only a constant number of group operations. □
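A direct GAP rendering of this check for odd n might look as follows (a sketch, not the library code; the incremental computation of s^k keeps each round to a constant number of multiplications, as in the proof):

  SatisfiesOddPresentation := function(s, t, n)   # relators of (10.3), n odd
    local id, sk, k;
    id := s^0;
    if s^(n-2) <> id or t^3 <> id or (s*t)^n <> id then
      return false;
    fi;
    sk := s;                                      # holds s^k in round k
    for k in [1 .. (n-3)/2] do
      if (t * sk^-1 * t * sk)^2 <> id then
        return false;
      fi;
      sk := sk * s;
    od;
    return true;
  end;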

Theorem 10.2.17. Given a black-box group G = ⟨S⟩ isomorphic to An or Sn
and an error probability ε > 0, group elements s, t ∈ G such that ⟨s, t⟩ satisfies
the presentation for An given in (10.3) or (10.4) can be constructed by a Las
Vegas algorithm, with probability at least 1 − ε, in O(log(ε^{−1})(ξn + µn log n))
time.

Proof. By Lemma 10.2.9, among 2n uniformly distributed random elements
a ∈ G we can find one satisfying a^{n−k} = 1, with the appropriate k ∈ {0, 1},
with probability at least 1 − (1 − 1/n)^{2n} > 1 − 1/e^2 > 3/4. Similarly, among
36n uniformly distributed random elements c ∈ G we can find one satisfying
c^{3(n−m)} = 1 for the value m described in (10.8), with probability greater than
3/4. Constructing a, c and taking the appropriate powers can be done in O(ξn +
µn log n) time. By Theorem 10.2.12, a has cycle type 1^k (n − k)^1 and c^{n−m} has
cycle type 1^{n−3} 3^1 with probability at least 1/180^2. By applying the appropriate
one of Lemmas 10.2.14 and 10.2.15, and then Lemma 10.2.16, if a and c have
the correct cycle types then s and t are constructed in O(ξn + µn) time, with
probability at least 3/4. Hence the entire procedure takes O(ξn + µn log n)
time and succeeds with probability greater than (3/4)^3/180^2.
Repeating this procedure up to (4/3)^3 180^2 ln(ε^{−1}) times, we construct s
and t with probability at least 1 − ε, in O(log(ε^{−1})(ξn + µn log n)) time. □

Remark 10.2.18. We note that, in view of Lemma 10.2.13, for “practical” values
of n the number (4/3)^3 180^2 ln(ε^{−1}) can be replaced by (4/3)^3 7^2 ln(ε^{−1})
in the expression for the number of iterations in the proof of Theorem 10.2.17.
This reduction does not really matter if the input group is indeed isomorphic
to An or Sn, since the expected number of iterations depends on the true conditional
probabilities that an element of order dividing (n − m)s has cycle type
1^{m−s} s^1 (n − m)^1 for the appropriate values of s and m, and it does not depend
on how badly or well we estimate these conditional probabilities. However, if
the algorithm is used as a Monte Carlo algorithm to test whether an unknown
input group G is isomorphic to An or Sn then the better constants ensure earlier
termination in the case of a negative answer.
10.2.3. Constructive Recognition: The Homomorphism λ
Given a black-box group G = ⟨S⟩ isomorphic to An or Sn, the algorithm
described in the proof of Theorem 10.2.17 constructs s, t ∈ G such that ⟨s, t⟩ ≅
An and it satisfies the presentation in (10.3) or (10.4). In this section we construct
a homomorphism λ : G → Sym([1, n]) by specifying the images of s and t and
by giving a procedure that constructs the image of any z ∈ G. The algorithm
will detect if G ≅ Sn, and in this case it replaces s and t by two new elements
s1, t1 such that ⟨s1, t1⟩ satisfies (10.2). We shall also describe a procedure that,
given an arbitrary element z ∈ G, computes a straight-line program reaching z
from s and t (or from s1 and t1 in the case G ≅ Sn).
Recall that for n > 6 we have Aut(An) ≅ Sn, and so, for any g ∈ G, the cycle
structure of λ(g) is the same for any faithful homomorphism λ : G →
Sym([1, n]). Therefore, without loss of generality, we can assume that λ(s) =
(3, 4, . . . , n) or λ(s) = (1, 2)(3, 4, . . . , n), depending on the parity of n, and
λ(t) = (1, 2, 3).
We start with the easier inverse problem: Finding straight-line programs
reaching any p ∈ An from g := λ(s) and h := λ(t).

Lemma 10.2.19. Given p ∈ An , a straight-line program of length O(n log n)


reaching p from g and h can be constructed in O(n log n) time by a deterministic
algorithm.

Proof. Let g_1 := g and h_1 := h. Recursively for i = 1, . . . , n − 3, define

  h_{i+1} = h_i^{-1} g_i^{-1} h_i g_i h_i        if n − i is even,
            h_i^{-1} g_i^{-1} h_i^2 g_i h_i      if n − i is odd;
                                                                        (10.9)
  g_{i+1} = g_i h_{i+1}                          if n − i is even,
            g_i h_i^2 h_{i+1}^{-1}               if n − i is odd.

Then h_i = (i, i + 1, i + 2) for all i ∈ [1, n − 2], and g_i = (i, i + 1)(i + 2, i +
3, . . . , n) or g_i = (i + 2, i + 3, . . . , n), depending on the parity of n − i. It is
clear from the recursion that all g_i, h_i can be obtained by a single straight-line
program of length O(n) from g and h. Hence it is enough to write a straight-line
program of length O(n log n) from T := {g_i, h_i | 1 ≤ i ≤ n − 2} to p.
Write p as the product of transpositions by decomposing each cycle
(c_1, . . . , c_l) of p as (c_1, . . . , c_l) = (c_1, c_2)(c_1, c_3) · · · (c_1, c_l). Since p ∈ An,
we have an even number of transpositions in this product. By inserting
(n − 1, n)(n − 1, n) between the (2k − 1)st and (2k)th transposition for all
k, the permutation p is written as the product of less than n permutations of the
form (i, j)(n − 1, n) or (n − 1, n)(i, j) and it is enough to show that any such
permutation can be obtained by a straight-line program of length O(log n) from T.
For any k ∈ [i + 2, n], we have g_i^{-(k-i-2)} h_i g_i^{k-i-2} = (i, i + 1, k) or
g_i^{-(k-i-2)} h_i g_i^{k-i-2} = (i + 1, i, k), and these 3-cycles can be reached by straight-line
programs of length O(log n) from T. Hence it is enough to observe that

  (i, i + 1)(n − 1, n) = (i, n, i + 1)(i, i + 1, n − 1)(i, n, i + 1)

and, for j ∈ [i + 2, n − 2], (i, j)(n − 1, n) = (i, i + 1, j) · (i, i + 1)(n − 1, n) ·
(i, j, i + 1). Finally, if i < j and j ∈ {n − 1, n} then (i, j)(n − 1, n), (n − 1,
n)(i, j) ∈ ⟨(i, n − 1, n)⟩ and (i, n − 1, n) = (i, n − 1, i + 1)(i, i + 1, n). □
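The recursion (10.9) is easy to experiment with; the following GAP fragment (an illustration for one odd value of n, using the standard generators of An from this section) computes the g_i and h_i and checks the claimed cycle shapes of the h_i.

  n := 11;;
  g := PermList(Concatenation([1, 2], [4 .. n], [3]));;   # g = lambda(s) = (3,4,...,n)
  h := (1,2,3);;                                          # h = lambda(t)
  gs := [g];;  hs := [h];;
  for i in [1 .. n-3] do
    if IsEvenInt(n - i) then
      hs[i+1] := hs[i]^-1 * gs[i]^-1 * hs[i]   * gs[i] * hs[i];
      gs[i+1] := gs[i] * hs[i+1];
    else
      hs[i+1] := hs[i]^-1 * gs[i]^-1 * hs[i]^2 * gs[i] * hs[i];
      gs[i+1] := gs[i] * hs[i]^2 * hs[i+1]^-1;
    fi;
  od;
  ForAll([1 .. n-2], i ->
    hs[i] = MappingPermListList([i, i+1, i+2], [i+1, i+2, i]));   # h_i = (i, i+1, i+2)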

Corollary 10.2.20. Given p ∈ An, the inverse image λ^{−1}(p) ∈ G can be
computed in O(µn log n) time by a deterministic algorithm. At any moment
during the execution of this algorithm, we have to store only a constant number
of elements of G.

Proof. We simply evaluate in G the straight-line program reaching p from
g and h, starting with s and t. The storage requirement can be satisfied by
ordering the cycles of p according to the smallest element contained in them and
decomposing each cycle as (c_1, . . . , c_l) = (c_1, c_2)(c_1, c_3) · · · (c_1, c_l) using its
smallest element c_1. Then transpositions (i, j) with i < j for some fixed i occur
consecutively, and all later transpositions (i_1, j_1) have i_1, j_1 > i. Therefore,
evaluating the straight-line programs reaching the preimages of the elements
(i, j)(n − 1, n) and (n − 1, n)(i, j) successively as required in the proof of
Lemma 10.2.19, we have to store λ^{−1}(g_i) and λ^{−1}(h_i) only for a fixed i at
any given time. Note that an inverse image λ^{−1}(g_i^{k−i−2}) can be computed by
repeated squaring, storing only a constant number of elements of G at any time.
After processing all (i, j) with this fixed i, we compute λ^{−1}(g_{i+1}) and λ^{−1}(h_{i+1})
by the formulas in (10.9), and we discard λ^{−1}(g_i) and λ^{−1}(h_i). □

The main idea of the construction of λ(z) for an arbitrary z ∈ G is the following.
We need to define an n-element set Γ on which G acts, and then we
need to identify Γ with {1, 2, . . . , n}. We define Γ as a set of unordered pairs
{a, b} ⊆ G such that both a and b have cycle type 1^{n−3} 3^1, satisfying the requirement
that the supports of λ(a) and λ(b) intersect in exactly one point i ∈ [1, n].
In this way, {a, b} can be identified with i, and Γ can be identified with [1, n].
Representing Γ in this way creates two problems. First, we do not want to
store 2n elements of G, so the set Γ is not computed explicitly. Elements of Γ,
when needed, will be reached by straight-line programs from s and t. Second,
for an arbitrary z ∈ G the image {a^z, b^z} of a point {a, b} ∈ Γ is not necessarily
an element of Γ. Thus we need to be able to identify the intersection of the
supports of λ(a^z) and λ(b^z) with supp(λ(c)) ∩ supp(λ(d)) for some {c, d} ∈ Γ in a
way different from simply checking whether {a^z, b^z} = {c, d}. We shall narrow
down the possibilities for supp(λ(a^z)) ∩ supp(λ(b^z)) to a constant number by
taking commutators of a^z and b^z with certain elements x_1, . . . , x_m of G of
order five, utilizing the fact that an element of order five and a three-cycle in
Sn commute if and only if their supports are disjoint.
Now we describe the construction of the elements x_1, . . . , x_m. Let n = 5k + r,
where 0 ≤ r ≤ 4, let m′ := ⌈log_2(k + 1)⌉, and put m := 2m′. For 1 ≤ j ≤ m′,
we define partitions p_j = {P_{j,0}, P_{j,1}} of the set {1, . . . , k} into two parts such
that the common refinement of these partitions is the trivial one. Namely, for
each i ∈ [1, k], we compute the binary expansion b_1^{(i)} b_2^{(i)} . . . b_{m′}^{(i)} of i. For i ∈
[1, k] and j ∈ [1, m], let

  ε(j, i) = b_j^{(i)}            if j ≤ m′,
            1 − b_{j−m′}^{(i)}   if j ≥ m′ + 1.              (10.10)

Then, for 1 ≤ j ≤ m′, we define P_{j,0} := {i | ε(j, i) = 0} and P_{j,1} := {i |
ε(j, i) = 1}.
We define elements a_1, . . . , a_k of G with cycle type 1^{n−5} 5^1 as follows: Working
in the group A := ⟨{s^{−e} t s^e | 0 ≤ e ≤ 7}⟩ ≅ A10, in O(µ) time we construct
a_1, a_2 ∈ ⟨s, t⟩ with λ(a_1) = (1, 2, 3, 4, 5) and λ(a_2) = (6, 7, 8, 9, 10). We also
compute c := s^5 and define, but do not compute, a_i := c^{−(i−2)} a_2 c^{i−2} for
3 ≤ i ≤ k. Note that λ(a_i) = (5i − 4, 5i − 3, 5i − 2, 5i − 1, 5i).
Finally we define x_j := ∏_{i=1}^{k} a_i^{ε(j,i)} for 1 ≤ j ≤ m. Thus each x_j has order
five. For j ≤ m′, we have {i | supp(λ(a_i)) ⊆ supp(λ(x_j))} = P_{j,1}. Similarly, for
j > m′, we have {i | supp(λ(a_i)) ⊆ supp(λ(x_j))} = P_{j−m′,0} and so the supports
of λ(x_j) and λ(x_{j+m′}) are disjoint for all j ∈ [1, m′].
Note that, for any J ⊆ {1, 2, . . . , m′},

  | ⋂_{j∈J} supp(λ(x_j))  ∩  ⋂_{j∈{1,2,...,m′}\J} supp(λ(x_{j+m′})) |  ∈  {0, 5}.        (10.11)

If this intersection is nonempty, then it is the support of λ(a_i) for the unique i
whose binary expansion contains 1s exactly in the positions given in J.

Lemma 10.2.21. Given a_1, a_2, and c as defined in the preceding paragraphs,
x_1, . . . , x_m can be computed in O(µn log n) time, storing only O(log n) elements
of G at any moment of the computation.

Proof. Initialize x_j := 1 for 1 ≤ j ≤ m. Iteratively for i = 1, . . . , k, compute
x_j := x_j a_i^{ε(j,i)} for all j, compute a_{i+1} := a_i^c, and discard a_i. □

Lemma 10.2.22. Given z ∈ G and {x_1, . . . , x_m} as constructed in the foregoing,
for any l ∈ {1, . . . , n} the image l^{λ(z)} can be determined in O(µ log n) time
by a deterministic algorithm.

Proof. We choose some i with 1 ≤ i ≤ n − 4 such that l ∈ {i, i + 1, i +
2, i + 3, i + 4}, and we determine i^{λ(z)}, (i + 1)^{λ(z)}, (i + 2)^{λ(z)}, (i + 3)^{λ(z)}, and
(i + 4)^{λ(z)} simultaneously. First, we construct the ten elements t_{{i_1,i_2,i_3}} of G with
the property that supp(λ(t_{{i_1,i_2,i_3}})) = {i_1, i_2, i_3}, for all three-element subsets
{i_1, i_2, i_3} ⊆ {i, i + 1, i + 2, i + 3, i + 4}. This can be done by constructing these
ten elements t_{{i_1,i_2,i_3}} for 6 ≤ i_1, i_2, i_3 ≤ 10 in the group A := ⟨{s^{−e} t s^e | 0 ≤
e ≤ 7}⟩ ≅ A10 in O(µ) time and conjugating the result by s^{i−6}.
Let j ∈ [1, m′] be fixed. We construct the twenty commutators [x_j,
(t_{{i_1,i_2,i_3}})^z], [x_{j+m′}, (t_{{i_1,i_2,i_3}})^z]. At least one of these twenty commutators is
trivial, since at least three elements of {i, i + 1, i + 2, i + 3, i + 4}^{λ(z)} are outside
supp(λ(x_j)) or supp(λ(x_{j+m′})), and two permutations with disjoint support
commute.
Suppose, for example, that [x_j, (t_{{i,i+1,i+2}})^z] = 1; in the other nineteen cases,
we can use an analogous argument. That [x_j, (t_{{i,i+1,i+2}})^z] = 1 means that

  i^{λ(z)}, (i + 1)^{λ(z)}, (i + 2)^{λ(z)} ∈ supp(λ(x_{j+m′})) ∪ {5k + 1, . . . , 5k + r}.

We also want to compute which of the sets supp(λ(x_j)) ∪ {5k + 1, . . . , 5k + r}
and supp(λ(x_{j+m′})) ∪ {5k + 1, . . . , 5k + r} contain the other two images (i + 3)^{λ(z)}
and (i + 4)^{λ(z)}. If [x_j, (t_{{i,i+1,i+3}})^z] ≠ 1 then (i + 3)^{λ(z)} ∈ supp(λ(x_j)) ∪ {5k +
1, . . . , 5k + r}, since {i, i + 1, i + 3}^{λ(z)} can intersect supp(λ(x_j)) only in the
point (i + 3)^{λ(z)}. If [x_j, (t_{{i,i+1,i+3}})^z] = 1 then (i + 3)^{λ(z)} ∈ supp(λ(x_{j+m′})) ∪
{5k + 1, . . . , 5k + r}. We can decide similarly which of these two sets contains
(i + 4)^{λ(z)}.
Having performed the commutator calculations of the previous two paragraphs
for all j ∈ [1, m′], by (10.11) we have at most 5 + r ≤ 9 possibilities for
l^{λ(z)}. To finish the algorithm, we need a procedure that decides whether l^{λ(z)} = ī
for a fixed ī ∈ [1, n], in O(µ log n) time.
We construct s_1, s_2, s_3, s_4, s_5 ∈ ⟨s, t⟩ of cycle type 1^{n−3} 3^1, such that
supp(λ(s_{d_1})) ∩ supp(λ(s_{d_2})) = {ī} for any two distinct d_1, d_2 ∈ [1, 5]. Again,
these s_d can be obtained as conjugates of appropriate elements of A. We
also pick t_{{i_1,i_2,i_3}} and t_{{i′_1,i′_2,i′_3}} from our collection of ten group elements
such that {i_1, i_2, i_3} ∩ {i′_1, i′_2, i′_3} = {l}. We claim that l^{λ(z)} = ī if and only if
[s_d, (t_{{i_1,i_2,i_3}})^z] ≠ 1 for at least four d ∈ [1, 5] and [s_d, (t_{{i′_1,i′_2,i′_3}})^z] ≠ 1 for at
least four d ∈ [1, 5]. Indeed, if l^{λ(z)} = ī then supp(λ((t_{{i_1,i_2,i_3}})^z)) intersects but
is not equal to at least four of the sets supp(λ(s_d)) and so (t_{{i_1,i_2,i_3}})^z and s_d do
not commute. The same argument works for (t_{{i′_1,i′_2,i′_3}})^z as well. Conversely, if
[s_d, (t_{{i_1,i_2,i_3}})^z] ≠ 1 for at least four d ∈ [1, 5] then supp(λ((t_{{i_1,i_2,i_3}})^z)) intersects
four three-element sets with pairwise intersection ī. This can happen only
if ī ∈ {i_1, i_2, i_3}^{λ(z)}, and similarly ī ∈ {i′_1, i′_2, i′_3}^{λ(z)}. □

Lemma 10.2.23. (a) Given s, t and x_1, . . . , x_m, it can be decided in
O(µ|S|n log n) time whether G ≅ An or G ≅ Sn.
(b) If it turns out that G ≅ Sn then generators s1, t1 for G satisfying (10.2)
can be computed in O(µn log n) time.
(c) If G ≅ Sn then, for any permutation p ∈ Sn, a straight-line program of
length O(n log n) reaching p from λ(s1) and λ(t1) can be written by a deterministic
algorithm, in O(n log n) time. The inverse image λ^{−1}(p) can be computed
in O(µn log n) time, storing only a constant number of elements of G at any
moment during the computation.

Proof. (a) By Lemma 10.2.22, the images λ(z) of all input generators z ∈ S
can be computed in O(µ|S|n log n) time. All λ(z) are even permutations if and
only if G ≅ An.
(b) Suppose that we have found z_0 ∈ S such that q := λ(z_0) is an odd
permutation. By Corollary 10.2.20, z_1 := λ^{−1}(q · (1, 2)) can be computed in
O(µn log n) time, and then t1 := z_0^{−1} z_1 satisfies λ(t1) = (1, 2). Depending on
the parity of n, we compute z_2 := s or z_2 := t1 s such that λ(z_2) = (3, 4, . . . , n).
Finally, we compute s1 := z_2 t, which satisfies λ(s1) = (1, 2, . . . , n).
(c) Depending on the parity of p, we compute a straight-line program reaching
p or p · (1, 2) from λ(s) and λ(t), as described in Lemma 10.2.19. Then, it is
enough to observe that λ(s) and λ(t) can be reached from λ(s1) and λ(t1) by a
straight-line program of constant length, since t = [s1, t1] and s = s1 t^2 if n is
odd, and s = t1 s1 t^2 if n is even.
The evaluation of this straight-line program in G is done as described in
Corollary 10.2.20. □

Lemma 10.2.24. Given any z ∈ G, a straight-line program of length O(n log n)
reaching z from s1, t1 in the case G ≅ Sn and from s, t in the case G ≅ An
can be computed in O(µn log n) time.

Proof. Using the algorithm described in the proof of Lemma 10.2.22, we compute
λ(z). By Lemmas 10.2.19 and 10.2.23(c), we can write a straight-line
program reaching λ(z) from λ(s), λ(t) or λ(s1), λ(t1), respectively. The same
straight-line program reaches z from s, t or s1, t1. □

Summary of the Proof of Theorem 10.2.4(a)

The decision procedure for whether G ≅ An or G ≅ Sn is described in
Lemma 10.2.23(a), and the new generators for G are constructed in the proof
of Theorem 10.2.17 for G ≅ An and in the proof of Lemma 10.2.23(b) for
G ≅ Sn. Given g ∈ G, the construction of λ(g) is described in Lemma 10.2.22,
and the straight-line program reaching g is constructed in Lemma 10.2.24. Finally,
the inverse image of a permutation is constructed in Lemma 10.2.19 and
Corollary 10.2.20 in the case G ≅ An and in Lemma 10.2.23(c) in the case
G ≅ Sn.

Proof of Theorem 10.2.4(b)

Given G = ⟨S⟩ of unknown isomorphism type, we attempt to construct s, t,
x_1, . . . , x_m (and s1, t1, if necessary). If the construction fails then, with high
probability, G is not isomorphic to An or Sn. If the construction succeeds and
s1, t1 have been computed then we check that ⟨s1, t1⟩ satisfies (10.2) (recall that
it has been checked during the construction that ⟨s, t⟩ satisfies either (10.3) or
(10.4) as appropriate).
Finally, we write straight-line programs from s, t or s1, t1 to each generator
z ∈ S, as described in the proof of Lemma 10.2.24, and evaluate the straight-line
programs. If the construction of the straight-line program succeeds and the
evaluated value is equal to z for all z ∈ S then we know with certainty that G
is An or Sn. If not, then G is not isomorphic to An or Sn.

10.2.4. Constructive Recognition: The Case of Giants

If the input black-box group in Theorem 10.2.4 acts as a giant on a set Δ
(i.e., G ≅ An or G ≅ Sn, where n := |Δ|) then the constructive recognition
algorithm for G becomes much simpler than the general procedure described
in the previous sections.
From Section 10.2.1, we need only the trivial Lemma 10.2.9, since most
of the work was needed to recognize elements that may have certain desired
cycle types. Here, since the input action is the natural one on Δ, it is trivial
to decide whether some randomly selected z ∈ G acts as a permutation with
cycle type n^1, 1^1 (n − 1)^1, or 1^{m−3} 3^1 (n − m)^1 on Δ, as required in the proof
of Theorem 10.2.17. The rest of the algorithm in the proof of Theorem 10.2.17
is also simplified, although to a lesser extent: In Lemmas 10.2.14 and 10.2.15,
we can recognize more easily whether elements of cycle type 1^{n−3} 3^1 move
consecutive points in the ordering of Δ defined by the n-cycle or (n − 1)-cycle
we have constructed. Checking the presentations (10.3) and (10.4) can
be omitted. The time requirement of computing elements s, t ∈ G such that
s|_Δ, t|_Δ satisfy (10.3) or (10.4) is O(ξn + µn).
Most of the algorithms in Section 10.2.3, including the computation and
storage of x_1, . . . , x_m, are also unnecessary, since it is trivial to determine the
image λ(z) = z|_Δ of any z ∈ G. All we have to keep are the routines for
computing and evaluating straight-line programs.
In the next section, we shall apply the constructive recognition of giants as a
subroutine in a strong generating set construction algorithm, in a situation where
a subgroup H acts as a giant on a subset of the permutation domain. In this
setting, we would like to express the parameter ξ as a function of the size of the
input permutation domain. However, nearly uniformly distributed random ele-
ments of H are not available, and appealing to Theorem 2.2.4 would increase the
running time too much. With this application in mind, we describe a constructive
recognition algorithm from [Babai et al., 1988]. This algorithm uses random-
ization only by applying the random subproduct method (cf. Section 2.3).

Theorem 10.2.25. Suppose that H = ⟨S⟩ ≤ Sym(Ω) and a subset Δ ⊆ Ω are
given, such that Δ is an orbit of H and H|_Δ = Alt(Δ). Let n := |Ω| and k :=
|Δ|, and let Δ = {δ_1, . . . , δ_k}. Moreover, let s = (δ_1, δ_2)(δ_3, δ_4, . . . , δ_k) or s =
(δ_3, δ_4, . . . , δ_k) if k is even or odd, respectively, and let t = (δ_1, δ_2, δ_3). Then,
given an arbitrary d > 0, there is a Las Vegas algorithm that, with probability
greater than 1 − k^{−d}, computes g, h ∈ H such that g|_Δ = s and h|_Δ = t.
The time requirement of the algorithm is O(d(nk^2 log^2 k + |S|n log k)) and the
storage requirement is O(kn + |S|n).

Proof. Let c ≥ 45 be a sufficiently large but fixed constant, depending on d.
For i = 1, 2, . . . , k, we construct recursively a subset S_i ⊆ H and permutations
g_i, h_i ∈ H such that

(i) |S_i| = c log k, S_i|_Δ stabilizes {δ_1, . . . , δ_{i−1}} pointwise, and with high
probability ⟨S_i⟩|_{{δ_i,...,δ_k}} = Alt({δ_i, . . . , δ_k}).
(ii) For all j ∈ [1, i − 1], we have δ_j^{g_i} = δ_j^s and δ_j^{h_i} = δ_j^t.

If this recursion succeeds then we output g := g_k and h := h_k.
We initialize g_1 := (), h_1 := (), and S_1 as the set of c log k random subproducts
made from S, where the constant c is determined such that, with probability
at least 1 − k^{−(d+1)}, the group ⟨S_1⟩ acts transitively on the set of 6-tuples made from
distinct elements of Δ. By Lemma 2.3.7, such a c exists, and by the inequality
(2.9) in the proof of that lemma, c can be chosen as a linear function of d. One
of the most celebrated consequences of the classification of finite simple groups
is that the only 6-transitive permutation groups are the giants. Therefore, if ⟨S_1⟩
acts transitively on the 6-tuples then ⟨S_1⟩|_Δ = Alt(Δ). The time requirement of
this initialization step is O(d|S|n log k).
Suppose that S_i, g_i, and h_i are already constructed for some i ∈ [1, k − 6].
We compute and store a transversal T_i for ⟨S_i⟩ mod ⟨S_i⟩_{δ_i}, and we choose
transversal elements a, b such that g_{i+1} := a g_i and h_{i+1} := b h_i satisfy δ_i^{g_{i+1}} = δ_i^s
and δ_i^{h_{i+1}} = δ_i^t. We construct the elements of the set S_{i+1} as random subproducts,
made from the Schreier generators defined by S_i and T_i (cf. Lemma 4.2.1).
Again, we construct c log k random subproducts. If ⟨S_i⟩|_{{δ_i,...,δ_k}} = Alt({δ_i, . . . ,
δ_k}) then the same 6-transitivity argument that we used in the case of S_1 shows
that, with probability at least 1 − k^{−(d+1)}, ⟨S_{i+1}⟩|_{{δ_{i+1},...,δ_k}} = Alt({δ_{i+1}, . . . , δ_k}).
If i ∈ [k − 5, k − 1] then we compute T_i, g_{i+1}, and h_{i+1} as in the previous
paragraph and let S_{i+1} consist of all Schreier generators defined by S_i and T_i.
For a fixed i, the construction of T_i can be done by an orbit algorithm (cf.
Section 2.1.1) in O(dnk log k) time. To stay within the memory usage asserted
in the statement of the theorem, we do not compute and store the Schreier generators
explicitly. Whenever a particular Schreier generator is needed in a random
subproduct, we compute it from S_i and T_i. Even with this restriction, the construction
of one random subproduct can be performed in O(dnk log k) time, and
the construction of S_{i+1} is accomplished in O(dnk log^2 k) time. Therefore, performing
all k steps of the recursion can be done in O(d(nk^2 log^2 k + |S|n log k))
time and the probability of failure is at most k/k^{d+1} = 1/k^d. □
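The random subproducts used in this proof can be formed without ever storing the Schreier generators; the following GAP sketch (names chosen here; the transversal is assumed to be given as a list T indexed by the points of the orbit, with T[pt] mapping the base point to pt) shows the construction of one such subproduct.

  RandomSubproductOfSchreierGens := function(gens, T, orbit)
    local w, pt, g, schreier;
    w := ();                                   # identity permutation
    for pt in orbit do
      for g in gens do
        if Random([0, 1]) = 1 then             # each Schreier generator enters with probability 1/2
          schreier := T[pt] * g * T[pt^g]^-1;  # u_pt * g * u_{pt^g}^{-1}
          w := w * schreier;
        fi;
      od;
    od;
    return w;
  end;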

10.3. A Randomized Strong Generator Construction

For arbitrary input groups G = ⟨S⟩ ≤ Sym(Ω), with |Ω| = n, the worst-case
time complexity of any deterministic version of the Schreier–Sims SGS construction
presented in this book is at least O(n^5 + |S|n^2) (cf. Theorems 4.2.4,
5.2.3, and 10.1.3). The term n^5 is necessary, as [Knuth, 1991] contains examples
where the running time is indeed Ω(n^5).
Although the n^5 bound achieved some notoriety, it can be broken: [Babai
et al., 1997b] contains a deterministic SGS construction with running time
O~(n^4) (recall that the notation O~ hides a factor log^c n for some constant c).
In this section, we present a Monte Carlo SGS construction from [Babai
et al., 1995], which can be considered as an elementary version of the algorithm
of [Babai et al., 1997b]. Some paragraphs in this section are reproduced
from [Babai et al., 1995], copyright © 1995 by Academic Press; reprinted by
permission of the publisher. Randomization makes the algorithm simpler and
it also lowers the running time one more order of magnitude, to O~(n^3).
The algorithm constructs an SGS in two phases. Given G = ⟨S⟩ ≤ Sym(Ω),
with |Ω| = n, in the first phase it constructs a subgroup chain G = G_1 ≥
G_2 ≥ · · · ≥ G_m = 1, which, in general, is not the point stabilizer subgroup
chain relative to any ordering of Ω. Each G_i has its own permutation domain
Ω_i. We have Ω_1 := Ω. For i ∈ [2, m], either Ω_i = Ω_{i−1} or Ω_i = Ω_{i−1} ∪ B_{i−1},
where B_{i−1} is a minimal block system on some orbit Δ_{i−1} ⊆ Ω_{i−1} of G_{i−1}. The
group G_i is either the stabilizer of a point α_{i−1} ∈ Ω_i or the pointwise stabilizer
of an orbit Δ_{i−1} ⊆ Ω_{i−1} of G_{i−1}, on which G_{i−1} acts as a giant.
Roughly speaking, the algorithm always tries to compute generators for the
stabilizer of a point in the smallest orbit of the group at hand. Moreover, it
tries to work in primitive groups; hence, if the action on the smallest orbit is
imprimitive then it adds blocks of imprimitivity to the permutation domain and
first computes the pointwise stabilizer of these blocks. The last consideration
is that if the primitive group at hand is a giant then, counting transitivity, the
algorithm recognizes it and switches to a routine designed especially to handle
the giants. The version we present here depends on the classification of finite
simple groups, since it utilizes the fact that the only 6-transitive permutation
groups are the giants. If we used instead of that the weaker, but elementary
result from [Wielandt, 1934], that the only 3 log n-transitive subgroups of Sn
are An and Sn, then the running time would increase by a factor log n.
The algorithm constructs sets U_i ⊆ G_i. If G_i is the stabilizer of a point
α_{i−1} ∈ Ω_i then U_{i−1} is a transversal G_{i−1} mod G_i, whereas if G_i is a pointwise
stabilizer of an orbit Δ_{i−1} on which G_{i−1} acts as a giant then U_{i−1} = {s, t}
for two permutations s, t ∈ G_{i−1}, such that s|_{Δ_{i−1}}, t|_{Δ_{i−1}} satisfy the appropriate
one of the presentations (10.2)–(10.4). For each i ∈ [1, m], the set ⋃_{j≥i} U_j|_{Ω_i}
generates G_i. This finishes the outline of the first phase of the algorithm.
In the second phase, the algorithm constructs a point stabilizer chain relative
to some ordering of Ω. Using the sets U_i, we can choose uniformly distributed
random cosets of G_i in G_{i−1}. This is clear if U_{i−1} is a transversal G_{i−1} mod G_i.
If G_i is the pointwise stabilizer of an orbit Δ_{i−1} then we construct a random
element p of Alt(Δ_{i−1}) or Sym(Δ_{i−1}), depending on the action of G_{i−1} on
Δ_{i−1}, by applying the algorithm of Exercise 2.1 or 2.2. Then, by the algorithms
described in the proofs of Corollary 10.2.20 and Lemma 10.2.23(c), we can
construct g ∈ ⟨U_{i−1}⟩ with the property g|_{Δ_{i−1}} = p.
Since uniformly distributed random cosets of G_i in G_{i−1} are available for all
i, we can construct uniformly distributed random elements of G, as the product
of the restrictions of random coset representatives to Ω. Therefore, we can apply
the algorithm described in the proof of Lemma 5.4.1 to construct an SGS.
We describe the first phase of the algorithm in more detail. Throughout
the algorithm, the points of the permutation domain are marked by integers,
denoting the priority of which ones we want to stabilize first. Initially, each
point has mark 0; if G is not transitive then the points of the smallest orbit
get mark 1; if the action on this orbit is not primitive, a system of blocks of
imprimitivity is added with mark 2; etc. Higher mark means higher priority.
The main routine is a recursive procedure SGS[H, Σ], where H is a group
acting on the set Σ. In the previous notation, H is G_{i−1}, and Σ is the corresponding
domain Ω_{i−1} with the fixed points deleted. The sets U_i are collected
in an array U. The purpose of the following pseudocode is to describe how to
choose the next point to be fixed and how to detect that the group acts as a giant
on an orbit. The subroutines that compute generators for the next group in the
subgroup chain will be described later.
Initially, we set the global variable n to |Ω|, U to the empty list, and
transitivity to 0. We set the mark m(α) = 0 for all α ∈ Ω. Then
SGS[G, Ω] is called. All internal calculations on permutations are carried out
on all n points and possibly additional points (such as points corresponding to
the action on blocks).

SGS[H, Σ]
Input: generators for H ≤ Sym(Σ); global variables n, transitivity, U
Output: U

Σ := Σ \ {α ∈ Σ | α^H = {α}}
if Σ = ∅ then stop
mark := max{m(α) | α ∈ Σ}
O := {α ∈ Σ | m(α) = mark}
Case:
  H^O is intransitive:
    transitivity := 0
    let O_1 := smallest orbit of H^O
    for α ∈ O_1, m(α) := mark + 1
    call SGS[H, Σ]
  H^O is imprimitive:
    compute a minimal block system B on O
    Σ := B ∪ Σ
    for α ∈ B, m(α) := mark + 1
    transitivity := 0
    call SGS[H, Σ]
  H^O is primitive:
    transitivity := transitivity + 1
    if transitivity ≥ 6 then
      (* H^O is Sym(O) or Alt(O) *)
      compute s, t ∈ H with s|_O, t|_O satisfying one of (10.2)–(10.4)
      add {s, t} to U
      H := pointwise stabilizer of O in H
      call SGS[H, Σ]
    else
      α := first element of O
      add a transversal H mod H_α to U
      compute generators for H_α
      call SGS[H_α, Σ]

Next, we describe the auxiliary routines that compute generators for the
groups G_i and estimate the time requirement of these routines. First, we establish
that the permutation domains Ω_i cannot grow too large.

Lemma 10.3.1. For each i, |Ω_i| ≤ 2n.

Proof. The procedure SGS[G, Ω] defines naturally a forest F, whose vertices
are the elements of ⋃_i Ω_i. The leaves of F are the elements of Ω. Any other
vertex v of F is a block of imprimitivity for a subgroup of G, acting on some
previously defined vertices of F. The children of v are the elements of the
block v. Since every nonleaf vertex has at least two children, the total number
of vertices is less than 2n. □

Lemma 10.3.1 implies that the cost of permutation multiplications is O(n)


in all groups G i . As a second preliminary step, we observe that the length of
any subgroup chain in Sn is less than 3n/2 (cf. [Cameron et al., 1989]), and so
m < 3n/2.

Lemma 10.3.2. Suppose that G_i = ⟨S_i⟩ is already computed, and suppose that
G_{i+1} = (G_i)_{α_i} for some point α_i ∈ Ω_{i+1}. Let k := |α_i^{G_i}|. Then there is a Monte
Carlo algorithm that computes a transversal G_i mod G_{i+1} and a generating set
S_{i+1} of size O(n) for G_{i+1} in O(nk|S_i| log n) time. The memory requirement is
O(n min{n, k|S_i|}).

Proof. A transversal Ui for G i mod G i+1 can be computed by an orbit algorithm


in O(nk + k|Si |) time. If k|Si | ≤ n then we construct Si+1 as the set of Schreier
generators defined by Si and Ui . If k|Si | > n then we construct the elements of
the set Si+1 as random subproducts, made from these Schreier generators (cf.
the proof of Theorem 10.2.25 for a similar argument). In the latter case, we do
not compute and store the Schreier generators explicitly. Whenever a particular
Schreier generator is needed in a random subproduct, we compute it from Si

and Ui . Since the length of any subgroup chain of G i is less than 3n/2, the
number of Schreier generators is k|Si |, and one permutation multiplication can
be performed in O(n) time, Theorem 2.3.6 implies that the construction of Si+1
takes O(kn|Si | log n) time. 

Lemma 10.3.3. Suppose that G_i = ⟨S_i⟩ is already computed, and suppose that
G_{i+1} is the pointwise stabilizer of an orbit Δ_i ⊆ Ω_i on which G_i acts as a giant.
Let k := |Δ_i|. Then there is a Las Vegas algorithm that computes s, t ∈ G_i such
that s|_{Δ_i} and t|_{Δ_i} satisfy the appropriate one of the presentations (10.2)–(10.4).
The time requirement is O(n^2 log^4 n + nk^2 log^2 n + |S_i|n log n) and the memory
requirement is O(n^2 log^2 n + |S_i|n).

Proof. First, we determine whether G_i acts on Δ_i as Alt(Δ_i) or Sym(Δ_i).
This can be done in O(k|S_i|) time, by examining whether there is some
g ∈ S_i such that g|_{Δ_i} is an odd permutation. If G_i|_{Δ_i} = Sym(Δ_i) then we compute
H := G_i′, the derived subgroup of G_i, by the Monte Carlo algorithm described in the proof of Theorem
2.4.8 in O(n^2 log^4 n + |S_i|n log n) time. Note that the memory requirement
is O(n^2 log^2 n), since the normal closure part of the algorithm first constructs
O(n log^2 n) generators. If G_i|_{Δ_i} = Alt(Δ_i) then we define H := G_i.
In the next step, we compute g, h ∈ H such that g|_{Δ_i} and h|_{Δ_i} satisfy (10.3)
or (10.4), as described in the proof of Theorem 10.2.25. The time requirement
for this step is O(n^2 log n + nk^2 log^2 n) if G_i|_{Δ_i} = Sym(Δ_i) (and so we constructed
O(n) generators for H = G_i′) and O(|S_i|n log n + nk^2 log^2 n) in the
case G_i|_{Δ_i} = Alt(Δ_i).
Finally, if G_i|_{Δ_i} = Alt(Δ_i) then we output s := g and t := h. If G_i|_{Δ_i} =
Sym(Δ_i) then we compute s, t ∈ G_i satisfying (10.2), as described in the proof
of Lemma 10.2.23(b). This computation takes O(kn log n) time. □

Suppose that we are in the situation described in Lemma 10.3.3: G_i = ⟨S_i⟩ is
already computed, G_{i+1} is the pointwise stabilizer of an orbit Δ_i ⊆ Ω_i on which
G_i acts as a giant, and s, t ∈ G_i satisfy the appropriate one of the presentations
(10.2)–(10.4). For g ∈ S_i, let slp(g) be a straight-line program reaching g|_{Δ_i}
from s|_{Δ_i} and t|_{Δ_i}, as described in the proof of Lemmas 10.2.19 and 10.2.23(c).
Let ḡ denote the evaluation of slp(g), starting from s and t, and let S̃_i :=
{g ḡ^{−1} | g ∈ S_i}. Finally, let E(s, t) be the set of permutations obtained by evaluating
the relators of the appropriate one of (10.2)–(10.4) (for example, if G_i|_{Δ_i} =
Alt(Δ_i) and k = |Δ_i| is even then E(s, t) = {s^{k−2}, t^3, (st)^{k−1}, [t, s]^2}).

Lemma 10.3.4. With the notation introduced in the previous paragraph, we
have G_{i+1} = ⟨S̃_i ∪ E(s, t)⟩^{G_i}.

Proof. Let K := ⟨S̃_i ∪ E(s, t)⟩^{G_i}. By definition, K ⊴ G_i, and clearly K ≤
G_{i+1}. We also have G_i = ⟨K, s, t⟩, since any g ∈ S_i can be written as g = (g ḡ^{−1})ḡ,
with g ḡ^{−1} ∈ K and ḡ ∈ ⟨s, t⟩.
Let Ḡ_i := G_i/K. On one hand, |Ḡ_i| ≥ |G_i/G_{i+1}|, since K ≤ G_{i+1}; on the
other hand, |Ḡ_i| ≤ |G_i/G_{i+1}|, since E(s, t) ⊆ K, and so Ḡ_i is a homomorphic
image of G_i/G_{i+1}. Therefore, K = G_{i+1}. □
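The set E(s, t) is just a short list of explicit permutations obtained by plugging s and t into the relators; for the sample case quoted before Lemma 10.3.4 (the alternating action with k even), a direct evaluation might look as follows. This is only an illustrative sketch: the commutator convention [a, b] = a^{-1} b^{-1} a b is assumed, repeated multiplication is used rather than any faster exponentiation, and the helper names are ours.

```python
def mul(g, h):
    return [h[g[x]] for x in range(len(g))]       # apply g, then h

def inv(g):
    result = [0] * len(g)
    for x, y in enumerate(g):
        result[y] = x
    return result

def power(g, m):
    result = list(range(len(g)))
    for _ in range(m):                            # m multiplications suffice here
        result = mul(result, g)
    return result

def commutator(a, b):
    # assumed convention: [a, b] = a^(-1) b^(-1) a b
    return mul(mul(mul(inv(a), inv(b)), a), b)

def relator_values_alt_even(s, t, k):
    """E(s, t) for the quoted sample case (Alt with k even):
    the permutations s^(k-2), t^3, (st)^(k-1), [t, s]^2."""
    return [power(s, k - 2), power(t, 3),
            power(mul(s, t), k - 1), power(commutator(t, s), 2)]
```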

Lemma 10.3.5. Suppose that G_i = ⟨S_i⟩ is already computed, and suppose
that G_{i+1} is the pointwise stabilizer of an orbit Σ_i ⊆ Ω_i on which G_i acts as
a giant. Let k := |Σ_i|. Then there is a Monte Carlo algorithm that computes
O(n) generators for G_{i+1} in O(n^2 log^4 n + kn^2 log^2 n + |S_i| n log n) time. The
memory requirement is O(n^2 log^2 n + |S_i| n).

Proof. If |S_i| > 15n then, using the algorithm described in the proof of Theorem
2.3.6, we construct a generating set of size O(n) for G_i in O(|S_i| n log n)
time. Hence, from now on, we suppose that |S_i| ∈ O(n).
First, we compute s, t ∈ G_i such that s|_{Σ_i} and t|_{Σ_i} satisfy the appropriate
one of the presentations (10.2)–(10.4). By Lemma 10.3.3, this can be done in
O(n^2 log^4 n + k^2 n log^2 n) time.
Our next goal is to construct S̃_i and E(s, t). For each g ∈ S_i, we compute
slp(g) and ḡ. By Corollary 10.2.20 and Lemma 10.2.23(c), the time requirement
for that is O(kn log n). Hence, S̃_i can be obtained in O(kn^2 log n)
time. The set E(s, t) can be constructed in O(kn) time (cf. the proof of
Lemma 10.2.16).
Finally, we compute the normal closure of ⟨S̃_i ∪ E(s, t)⟩ in G_i. By
Corollary 2.4.6, the time requirement of this step is O(n^2 log^4 n). □
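The final normal closure step is performed by the random-prefix algorithm of Corollary 2.4.6, which we do not reproduce; the sketch below only illustrates the simpler Monte Carlo normal closure idea of Chapter 2, namely conjugating random subproducts of the current generating list by random group elements and adjoining them. Here random_element_of_G stands in for whatever uniform random element source is available for G_i, and the number of rounds is left as a parameter; the cited results are what quantify how many rounds achieve a prescribed error probability.

```python
import random

def mul(g, h):
    return [h[g[x]] for x in range(len(g))]

def inv(g):
    result = [0] * len(g)
    for x, y in enumerate(g):
        result[y] = x
    return result

def random_subproduct(gens):
    # product of a random subset of gens, taken in the given order (Section 2.3)
    result = list(range(len(gens[0])))
    for g in gens:
        if random.random() < 0.5:
            result = mul(result, g)
    return result

def normal_closure_monte_carlo(seed_gens, random_element_of_G, rounds):
    """Monte Carlo normal closure: with high probability (for enough rounds)
    the returned list generates the normal closure of <seed_gens> in G."""
    closure = list(seed_gens)
    for _ in range(rounds):
        r = random_subproduct(closure)
        x = random_element_of_G()                 # assumed: uniform element of G
        closure.append(mul(mul(inv(x), r), x))    # adjoin the conjugate x^(-1) r x
    return closure
```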

For the overall running time estimate of the algorithm SGS[G, Ω], we need
a lemma that may be interesting on its own.

Lemma 10.3.6. Let G ≤ Sym(Ω) be a primitive permutation group. Let α ∈ Ω,
and suppose that G_α has an orbit of length d ≥ 2. Then every nontrivial
subgroup of G_α has a nontrivial orbit of length at most d.

Proof. Let Δ ⊆ Ω denote an orbit of G_α of length d. Consider the orbital graph
X corresponding to Δ. Recall that X is a directed graph with vertex set Ω and
edge set (α, β)^G for some (and so for all) β ∈ Δ. Note that, in X, every vertex
has out-degree d. Clearly G acts as automorphisms of X. Since G is primitive,
X must be strongly connected (cf. Exercise 5.6).
If H ≤ G_α is nontrivial then there exists a point γ not fixed by H. Let us
consider a directed path α = α_0, α_1, . . . , α_l = γ in X, connecting α to such
a point γ. Let α_i be the first point on this path not fixed by H (necessarily
i ≥ 1). Then all points in the orbit α_i^H are out-neighbors of α_{i−1} in X (since
α_{i−1}^H = {α_{i−1}}), and so |α_i^H| ≤ d. □
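The two ingredients of this proof, the orbital graph and the walk from α to a point moved by H, are easy to make explicit. A sketch with permutations as image lists and with our own function names: the first routine computes the edge set (α, β)^G as the orbit of the ordered pair under the generators, and the second performs a breadth-first search from α that stops at the first vertex moved by a generator of H, so that every earlier vertex on the returned path is fixed by all listed generators, exactly as in the argument above.

```python
from collections import deque

def orbital_graph_edges(alpha, beta, gens):
    """Edge set (alpha, beta)^G of the orbital graph: the orbit of the ordered
    pair under the generators, acting coordinatewise."""
    edges = {(alpha, beta)}
    queue = deque([(alpha, beta)])
    while queue:
        a, b = queue.popleft()
        for g in gens:
            e = (g[a], g[b])
            if e not in edges:
                edges.add(e)
                queue.append(e)
    return edges

def path_to_moved_point(alpha, edges, h_gens):
    """Breadth-first search from alpha along the directed edges, stopping at the
    first vertex moved by some generator of H; every earlier vertex on the
    returned path is fixed by all listed generators."""
    out = {}
    for a, b in edges:
        out.setdefault(a, []).append(b)
    parent = {alpha: None}
    queue = deque([alpha])
    while queue:
        v = queue.popleft()
        if any(h[v] != v for h in h_gens):
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return list(reversed(path))
        for w in out.get(v, []):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None      # H fixes every vertex reachable from alpha
```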

Theorem 10.3.7. Given G = ⟨S⟩ ≤ Sym(Ω), with |Ω| = n, the running time
of the algorithm SGS[G, Ω] is O(n^3 log^4 n + |S| n log n).

Proof. By Theorem 2.3.6, in O(|S| n log n) time we can construct O(n) generators
for G. Hence, because all subroutines construct O(n) generators for
subgroups, we may suppose that, at each recursive call, the input of SGS[H, Ω]
contains O(n) generators.
During the run of SGS[G, Ω], the number of recursive calls of SGS[H, Ω]
is O(n). At each call, we have to find the orbits of the input group and test
primitivity on a domain of size O(n). By Theorem 2.1.1 and Corollary 5.5.9,
one computation of the orbit structure can be done in O(n^2) time and one test
of primitivity in O(n^2 log n) time. Therefore, the total time spent in orbit and
primitivity computations is O(n^3 log n).
Next, we estimate the time spent in computing G_{i+1} for the values of i when
G_i acts as a giant on some orbit Σ_i ⊆ Ω_i, and G_{i+1} is the pointwise stabilizer
of Σ_i. For different i, these orbits Σ_i are disjoint, so Lemma 10.3.1 implies
∑_i |Σ_i| ≤ 2n and Lemma 10.3.5 implies that the total time spent for these i
values is O(n^3 log^4 n).
Finally, we estimate the time requirement of computing G_{i+1} for the values
of i when G_{i+1} = (G_i)_{α_i} for some α_i ∈ Ω_{i+1}. Let I ⊆ [1, m] denote the set of
indices i in this category, and let k_i be the length of the orbit α_i^{G_i} in Ω_{i+1}. We
claim that

    ∑_{i∈I} k_i ≤ 10 n log n.                                   (10.12)

Let α ∈ Ω_m be arbitrary, and let us consider the subsequence i_1 < · · · < i_l
of indices, i_j ∈ I for j ∈ [1, l], such that α is in the orbit Δ_{i_j} := α_{i_j}^{G_{i_j}}. We have
Δ_{i_1} ⊋ Δ_{i_2} ⊋ · · · ⊋ Δ_{i_l}. We claim that l ≤ 5 log n; that is, α occurs in at most
5 log n orbits of points α_i with i ∈ I. Since there are at most 2n possibilities
for α, this new claim implies (10.12) by counting the incidences of orbits
α_i^{G_i}, i ∈ [1, m], and the points in them in two ways.
We claim that, for each j ∈ [1, l − 1], either k_{i_{j+1}} = k_{i_j} − 1 or k_{i_{j+1}} ≤ k_{i_j}/2
and that the former cannot occur more than five times in a row. Clearly, this
implies the claimed bound 5 log n.
For each j ∈ [1, l], G_{i_j}|_{Δ_{i_j}} is primitive. There are three possibilities.
Case 1: G_{i_j}|_{Δ_{i_j}} is doubly transitive and the point stabilizer (G_{i_j})_{α_{i_j}} acts
primitively on Δ_{i_j}\{α_{i_j}}. In this case, k_{i_{j+1}} = k_{i_j} − 1. If this continues more
than five times in a row then the algorithm recognizes that (G_{i_j})_{α_{i_j}} acts as
a giant on Δ_{i_j}\{α_{i_j}} and in the next recursive call of SGS[H, Ω] all points
of Δ_{i_j}\{α_{i_j}} are stabilized.
Case 2: G_{i_j}|_{Δ_{i_j}} is doubly transitive and the point stabilizer (G_{i_j})_{α_{i_j}} acts
imprimitively on Δ_{i_j}\{α_{i_j}}. Then in the next recursive call of SGS[H, Ω]
a minimal block system B_{i_j} is added to Δ_{i_j}, and G_{i_{j+1}} is a subgroup of
the pointwise stabilizer of the action of (G_{i_j})_{α_{i_j}} on B_{i_j}. Hence Δ_{i_{j+1}} is a
subset of a block in B_{i_j} and so k_{i_{j+1}} ≤ (k_{i_j} − 1)/2.
Case 3: G_{i_j}|_{Δ_{i_j}} is primitive but not doubly transitive. Then (G_{i_j})_{α_{i_j}} acts
intransitively on Δ_{i_j}\{α_{i_j}}, and its smallest orbit is of size at most (k_{i_j} − 1)/2.
The point α may not be in this smallest orbit but, by Lemma 10.3.6,
k_{i_{j+1}} ≤ (k_{i_j} − 1)/2 still holds. This finishes the proof of (10.12).
Lemma 10.3.2 and (10.12) imply that the total time requirement of computing
G_{i+1} for i ∈ I is O(n^3 log^2 n). We note that (10.12) also implies that the output
of SGS[G, Ω] consists of O(n log n) permutations, which require O(n^2 log n)
storage (but the memory requirement of the entire algorithm is slightly bigger,
O(n^2 log^2 n)). □

Our last task is to give the time requirement of the second phase of the
algorithm, the construction of a strong generating set relative to some ordering
of the original permutation domain Ω. Recall that the output U of SGS[G, Ω]
is used to generate uniformly distributed random elements of G.
If G_{i+1} = (G_i)_{α_i} for some α_i ∈ Ω_{i+1} then we have a transversal for G_i mod
G_{i+1} stored in U, so a uniformly distributed random coset representative can
be written down in O(n) time. If G_{i+1} is the pointwise stabilizer of an orbit
Σ_i ⊆ Ω_i, with |Σ_i| = k_i, on which G_i acts as a giant then Corollary 10.2.20
and Lemma 10.2.23(c) imply that a coset representative for G_i mod G_{i+1} can be
constructed in O(k_i n log n) time, as described at the beginning of this section.
Therefore, a uniformly distributed random element of G is constructed
in O(n^2 log n) time, and Lemma 5.4.1 implies that an SGS for G can be
constructed in O(n^3 log^4 n) time.
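For the point-stabilizer levels, this random element generation is just a product of independently chosen coset representatives, one per level. A minimal sketch under the assumption that each transversal U_i is stored as a dictionary of right coset representatives for G_i mod G_{i+1} (as produced by the orbit sketch after Lemma 10.3.2); at a giant-type level the representative would instead be produced by the straight-line-program evaluation described above, which is not reproduced here.

```python
import random

def mul(g, h):
    return [h[g[x]] for x in range(len(g))]       # apply g, then h

def random_group_element(transversals, n):
    """One uniformly chosen coset representative per level, multiplied together.
    Assumes transversals = [U_1, ..., U_m], each U_i a dict {orbit point:
    representative} forming a complete set of right coset representatives for
    G_i mod G_{i+1}; then the product u_m ... u_2 u_1 is uniform in G."""
    result = list(range(n))
    for U in transversals:
        rep = random.choice(list(U.values()))
        result = mul(rep, result)                 # prepend this level's factor
    return result
```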
Bibliography

[Acciaro and Atkinson, 1992] Acciaro, V., and Atkinson, M. D. (1992). A new
algorithm for testing the regularity of a permutation group. Congr. Numer.,
90:151–160.
[Atkinson, 1975] Atkinson, M. (1975). An algorithm for finding the blocks of a
permutation group. Math. Comp., 29:911–913.
[Atkinson and Neumann, 1990] Atkinson, M. D., and Neumann, P. M. (1990).
Computing Sylow subgroups of permutation groups. Congr. Numer., 79:55–60.
[Atkinson et al., 1984] Atkinson, M., Hassan, R., and Thorne, M. (1984). Group
theory on a micro-computer. In Atkinson, M., editor, Computational Group
Theory, pages 275–280, Academic Press, London.
[Babai, 1979] Babai, L. (1979). Monte Carlo algorithms for graph isomorphism
testing. Technical report DMS 79–10, Univ. de Montréal.
[Babai, 1991] Babai, L. (1991). Local expansion of vertex-transitive graphs and
random generation in finite groups. In Proc. 23rd ACM Symposium on Theory of
Computing, pages 164–174, ACM Press, New York.
[Babai, 1992] Babai, L. (1992). Bounded round interactive proofs in finite groups.
SIAM J. Discrete Math., 5:88–111.
[Babai, 1997] Babai, L. (1997). Randomization in group algorithms: Conceptual
questions. In Groups and Computation II, volume 28 of Amer. Math. Soc.
DIMACS Series, pages 1–17.
[Babai and Beals, 1999] Babai, L., and Beals, R. (1999). A polynomial-time theory
of black box groups I. In Campbell, C. M., Robertson, E. F., Ruskuc, N., and
Smith, G. C., editors, Groups St. Andrews 1997 in Bath, I, volume 260 of London
Math. Soc. Lecture Note Ser., pages 30–64, Cambridge University Press,
Cambridge.
[Babai and Moran, 1988] Babai, L., and Moran, S. (1988). Arthur–Merlin games: A
randomized proof system and a hierarchy of complexity classes. J. Comp. Syst.
Sci., 36:254–276.
[Babai and Szemerédi, 1984] Babai, L., and Szemerédi, E. (1984). On the complexity
of matrix group problems, I. In Proc. 25th IEEE Symposium on Foundations of
Computer Science, pages 229–240, IEEE Press, Washington.
[Babai et al., 1983] Babai, L., Kantor, W. M., and Luks, E. M. (1983). Computational
complexity and the classification of finite simple groups. In Proc. 24th IEEE
Symposium on Foundations of Computer Science, pages 162–171, IEEE Press,
Washington.
[Babai et al., 1987] Babai, L., Luks, E. M., and Seress, Á. (1987). Permutation groups

in NC. In Proc. 19th ACM Symposium on Theory of Computing, pages 409–420,
ACM Press, New York.
[Babai et al., 1988] Babai, L., Luks, E. M., and Seress, Á. (1988). Fast management of
permutation groups. In Proc. 29th IEEE Symposium on Foundations of Computer
Science, pages 272–282, IEEE Press, Washington.
[Babai et al., 1991] Babai, L., Cooperman, G., Finkelstein, L., and Seress, Á. (1991).
Nearly linear time algorithms for permutation groups with a small base. In Proc.
of International Symposium on Symbolic and Algebraic Computation ISSAC ’91,
pages 200–209, ACM Press, New York.
[Babai et al., 1993] Babai, L., Luks, E. M., and Seress, Á. (1993). Computing
composition series in primitive groups. In Groups and Computation, volume 11
of Amer. Math. Soc. DIMACS Series, pages 1–16.
[Babai et al., 1995] Babai, L., Cooperman, G., Finkelstein, L., Luks, E. M., and Seress,
Á. (1995). Fast Monte Carlo algorithms for permutation groups. J. Comp. Syst.
Sci., 50:296–308.
[Babai et al., 1997a] Babai, L., Goodman, A. J., Kantor, W. M., Luks, E. M., and Pálfy,
P. P. (1997a). Short presentations for finite groups. J. Algebra, 194:97–112.
[Babai et al., 1997b] Babai, L., Luks, E. M., and Seress, Á. (1997b). Fast management
of permutation groups I. SIAM J. Computing, 26:1310–1342.
[Beals, 1993a] Beals, R. (1993a). Computing blocks of imprimitivity for small-base
groups in nearly linear time. In Groups and Computation, volume 11 of Amer.
Math. Soc. DIMACS Series, pages 17–26.
[Beals, 1993b] Beals, R. M. (1993b). An elementary algorithm for computing the
composition factors of a permutation group. In Proc. of International Symposium
on Symbolic and Algebraic Computation ISSAC ’93, pages 127–134, ACM Press,
New York.
[Beals and Babai, 1993] Beals, R., and Babai, L. (1993). Las Vegas algorithms for
matrix groups. In Proc. 34th IEEE Symposium on Foundations of Computer
Science, pages 427–436, IEEE Press, Washington.
[Beals and Seress, 1992] Beals, R. M., and Seress, Á. (1992). Structure forest and
composition factors for small base groups in nearly linear time. In Proc. 24th ACM
Symposium on Theory of Computing, pages 116–125, ACM Press, New York.
[Beals et al., 2002] Beals, R., Leedham-Green, C., Niemeyer, A., Praeger, C., and
Seress, Á. (2002). A black-box group algorithm for recognizing finite symmetric
and alternating groups, I. Transactions Amer. Math. Soc. To appear.
[Bosma et al., 1997] Bosma, W., Cannon, J., and Playoust, C. (1997). The Magma
algebra system I: The user language. J. Symbolic Comput., 24:235–265.
[Brassard and Bratley, 1988] Brassard, G., and Bratley, P. (1988). Algorithmics Theory
and Practice, Prentice–Hall, Englewood Cliffs, NJ.
[Bratus, 1999] Bratus, S. (1999). Recognition of Finite Black Box Groups. PhD thesis,
Northeastern University.
[Bratus and Pak, 2000] Bratus, S., and Pak, I. (2000). Fast constructive recognition of
a black box group isomorphic to Sn or An using Goldbach’s Conjecture.
J. Symbolic Comput., 29:33–57.
[Brown et al., 1989] Brown, C. A., Finkelstein, L., and Purdom, P. W. (1989). A new
base change algorithm for permutation groups. SIAM J. Computing,
18:1037–1047.
[Butler, 1979] Butler, G. (1979). Computational Approaches to Certain Problems in
the Theory of Finite Groups. PhD thesis, University of Sydney.
[Butler, 1982] Butler, G. (1982). Computing in permutation and matrix groups II:
Backtrack algorithm. Math. Comp., 39:671–680.
[Butler, 1983] Butler, G. (1983). Computing normalizers in permutation groups.
J. Algorithms, 4:163–175.
[Butler, 1985] Butler, G. (1985). Effective computation with group homomorphisms.
J. Symbolic Comput., 1:143–157.
[Butler, 1991] Butler, G. (1991). Fundamental Algorithms for Permutation Groups,
volume 559 of Lecture Notes in Computer Science, Springer-Verlag, Berlin.
[Butler, 1994] Butler, G. (1994). An inductive schema for computing conjugacy
classes in permutation groups. Math. Comp., 62:363–383.
[Butler and Cannon, 1989] Butler, G., and Cannon, J. J. (1989). Computing in
permutation and matrix groups III: Sylow subgroups. J. Symbolic Comput.,
8:241–252.
[Butler and Cannon, 1991] Butler, G., and Cannon, J. J. (1991). Computing Sylow
subgroups using homomorphic images of centralizers. J. Symbolic Comput.,
12:443–458.
[Cameron, 1999] Cameron, P. J. (1999). Permutation Groups, Cambridge University
Press, Cambridge.
[Cameron et al., 1984] Cameron, P. J., Neumann, P. M., and Saxl, J. (1984). On groups
with no regular orbits on the set of subsets. Archive Math., 43:295–296.
[Cameron et al., 1989] Cameron, P. J., Solomon, R., and Turull, A. (1989). Chains of
subgroups in symmetric groups. J. Algebra, 127:340–352.
[Cannon, 1984] Cannon, J. (1984). A computational toolkit for finite permutation
groups. In Proc. Rutgers Group Theory Year, 1983–1984, pages 1–18, Cambridge
University Press, Cambridge.
[Cannon and Souvignier, 1997] Cannon, J., and Souvignier, B. (1997). On the
computation of conjugacy classes in permutation groups. In Proc. of International
Symposium on Symbolic and Algebraic Computation ISSAC ’97, pages 392–399,
ACM Press, New York.
[Cannon, 1971] Cannon, J. J. (1971). Computing local structure of large finite groups.
In Birkhoff, G., and Marshall Hall, J., editors, Computers in Algebra and Number
Theory, volume 4 of Proc. Amer. Math. Soc., pages 161–176, Amer. Math. Soc.,
Providence, RI.
[Cannon, 1973] Cannon, J. J. (1973). Construction of defining relators for finite
groups. Discrete Math., 5:105–129.
[Cannon and Holt, 1997] Cannon, J. J., and Holt, D. F. (1997). Computing chief series,
composition series and socles in large permutation groups. J. Symbolic Comput.,
24:285–301.
[Cannon and Holt, 2002] Cannon, J. J., and Holt, D. F. (2002). Computing maximal
subgroups of finite groups. Submitted.
[Cannon et al., 1997] Cannon, J. J., Cox, B. C., and Holt, D. F. (1997). Computing
Sylow subgroups in permutation groups. J. Symbolic Comput., 24:303–316.
[Carmichael, 1923] Carmichael, R. (1923). Abstract definitions of the symmetric and
alternating groups and certain other permutation groups. Quart. J. Math.,
49:226–270.
[Celler et al., 1995] Celler, F., Leedham-Green, C. R., Murray, S. H., Niemeyer, A. C.,
and O’Brien, E. (1995). Generating random elements of a finite group. Comm.
Algebra, 23:4931–4948.
[Celler et al., 1990] Celler, F., Neubüser, J., and Wright, C. R. B. (1990). Some
remarks on the computation of complements and normalizers in soluble groups.
Acta Appl. Math., 21:57–76.
[Chernoff, 1952] Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a
hypothesis based on the sum of observations. Ann. Math. Statistics, 23:493–507.
[Conway et al., 1985] Conway, J., Curtis, R., Norton, S., Parker, R., and Wilson, R.
(1985). Atlas of Finite Groups, Clarendon Press, Oxford.
[Cooperman and Finkelstein, 1992] Cooperman, G., and Finkelstein, L. (1992). A fast
cyclic base change for permutation groups. In Proc. of International Symposium
on Symbolic and Algebraic Computation ISSAC ’92, pages 224–232, ACM Press,
New York.
[Cooperman and Finkelstein, 1993] Cooperman, G., and Finkelstein, L. (1993).
Combinatorial tools for computational group theory. In Groups and Computation,
volume 11 of Amer. Math. Soc. DIMACS Series, pages 53–86.
[Cooperman et al., 1989] Cooperman, G., Finkelstein, L., and Luks, E. M. (1989).
Reduction of group constructions to point stabilizers. In Proc. of International
Symposium on Symbolic and Algebraic Computation ISSAC ’89, pages 351–356,
ACM Press, New York.
[Cooperman et al., 1990] Cooperman, G., Finkelstein, L., and Sarawagi, N. (1990). A
random base change algorithm for permutation groups. In Proc. of International
Symposium on Symbolic and Algebraic Computation ISSAC ’90, pages 161–168,
ACM Press, New York.
[Cooperman et al., 1997] Cooperman, G., Finkelstein, L., and Linton, S. A. (1997).
Recognizing GLn (2) in non-standard representation. In Groups and Computation
II, volume 28 of Amer. Math. Soc. DIMACS Series, pages 85–100.
[Cooperstein, 1978] Cooperstein, B. N. (1978). Minimal degree for a permutation
representation of a classical group. Israel J. Math., 30:213–235.
[Coxeter and Moser, 1957] Coxeter, H., and Moser, W. (1957). Generators and
Relations for Discrete Groups, 4th edition, volume 14 of Ergeb. Math. Grenzgeb.
Springer-Verlag, Berlin.
[Dixon, 1968] Dixon, J. D. (1968). The solvable length of a solvable linear group.
Math. Z., 107:151–158.
[Dixon and Mortimer, 1996] Dixon, J. D., and Mortimer, B. (1996). Permutation
Groups. Graduate Texts in Math., Springer-Verlag, Berlin.
[Easdown and Praeger, 1988] Easdown, D., and Praeger, C. E. (1988). On minimal
faithful permutation representations of finite groups. Bull. Aust. Math. Soc.,
38:207–220.
[Eick, 1997] Eick, B. (1997). Special presentations for finite soluble groups and
computing (pre-)Frattini subgroups. In Groups and Computation II, volume 28 of
Amer. Math. Soc. DIMACS Series, pages 101–112.
[Eick and Hulpke, 2001] Eick, B., and Hulpke, A. (2001). Computing the maximal
subgroups of a permutation group I. In Kantor, W. M., and Seress, Á., editors,
Groups and Computation III, volume 8 of OSU Mathematical Research Institute
Publications, pages 155–168, de Gruyter, Berlin.
[Erdős and Rényi, 1965] Erdős, P., and Rényi, A. (1965). Probabilistic methods in
group theory. J. d’Analyse Math., 14:127–138.
[Feller, 1968] Feller, W. (1968). An Introduction to Probability Theory and Its
Applications, volume 1, 3rd edition, Wiley, New York.
[Furst et al., 1980] Furst, M., Hopcroft, J., and Luks, E. M. (1980). Polynomial-time
algorithms for permutation groups. In Proc. 21st IEEE Symposium on
Foundations of Computer Science, pages 36–41, IEEE Press, Washington.
[GAP, 2000] GAP (2000). GAP – Groups, Algorithms, and Programming, Version 4.2.
The GAP Group, Aachen, St Andrews
(http://www-gap.dcs.st-and.ac.uk/~gap).
[Garey and Johnson, 1979] Garey, M. R., and Johnson, D. S. (1979). Computers and
Intractability, A Guide to the Theory of NP-completeness, Freeman, New York.
[Gebhardt, 2000] Gebhardt, V. (2000). Constructing a short defining set of relations for
a finite group. J. Algebra, 233:526–542.
[Guralnick, 1983] Guralnick, R. M. (1983). Subgroups of prime power index in a
simple group. J. Algebra, 81:304–311.
[Guralnick and Kantor, 2000] Guralnick, R. M., and Kantor, W. M. (2000). The
probability of generating a simple group. J. Algebra, 234:743–792.
[Hardy and Wright, 1979] Hardy, G. H., and Wright, E. M. (1979). An Introduction to the
Theory of Numbers, 5th edition, Clarendon Press, Oxford.
[Hoffmann, 1982] Hoffmann, C. M. (1982). Group-theoretic Algorithms and Graph
Isomorphism, volume 136 of Lecture Notes in Computer Science,
Springer-Verlag, Berlin.
[Holt, 1991] Holt, D. F. (1991). The computation of normalizers in permutation
groups. J. Symbolic Comput., 12:499–516.
[Holt, 1997] Holt, D. F. (1997). Representing quotients of permutation groups. Quart.
J. Math. Oxford Ser. (2), 48:347–350.
[Holt, 2001] Holt, D. F. (2001). Computing automorphism groups of finite groups. In
Kantor, W. M., and Seress, Á., editors, Groups and Computation III, volume 8 of
OSU Mathematical Research Institute Publications, pages 201–208, de Gruyter,
Berlin.
[Holt and Rees, 1994] Holt, D. F., and Rees, S. (1994). Testing modules for
irreducibility. J. Austral. Math. Soc. Ser. A, 57:1–16.
[Hulpke, 1993] Hulpke, A. (1993). Zur Berechnung von Charaktertafeln.
Diplomarbeit, RWTH Aachen.
[Hulpke, 1996] Hulpke, A. (1996). Konstruktion transitiver Permutationsgruppen.
PhD thesis, RWTH Aachen.
[Hulpke, 2000] Hulpke, A. (2000). Conjugacy classes in finite permutation groups via
homomorphic images. Math. Comp., 69:1633–1651.
[Hulpke and Seress, 2001] Hulpke, A., and Seress, Á. (2001). Short presentations for
three-dimensional unitary groups. J. Algebra, 245:719–729.
[Huppert, 1967] Huppert, B. (1967). Endliche Gruppen I, volume 134 of Grundlehren
Math. Wiss., Springer-Verlag, Berlin.
[Ivanyos and Lux, 2000] Ivanyos, G., and Lux, K. (2000). Treating the exceptional
cases of the Meataxe. Exp. Math., 9:373–381.
[Jerrum, 1986] Jerrum, M. (1986). A compact representation for permutation groups.
J. Algorithms, 7:60–78.
[Jerrum, 1995] Jerrum, M. (1995). Computational Pólya theory. In Surveys in
Combinatorics, 1995, pages 103–118, Cambridge University Press,
Cambridge.
[Jordan, 1873] Jordan, C. (1873). Sur la limite de transitivité des groupes non alternés.
Bull. Soc. Math. France, 1:40–71.
[Kantor, 1985a] Kantor, W. M. (1985a). Some consequences of the classification of
finite simple groups. In McKay, J., editor, Finite Groups – Coming of Age,
volume 45 of Contemporary Mathematics, pages 159–173, Amer. Math. Soc.,
Providence, RI.
[Kantor, 1985b] Kantor, W. M. (1985b). Sylow’s theorem in polynomial time.
J. Comp. Syst. Sci., 30:359–394.
[Kantor, 1990] Kantor, W. M. (1990). Finding Sylow normalizers in polynomial time.
J. Algorithms, 11:523–563.
[Kantor, 1991] Kantor, W. M. (1991). Finding composition factors of permutation
groups of degree n ≤ 10^6. J. Symbolic Comput., 12:517–526.
[Kantor and Luks, 1990] Kantor, W. M., and Luks, E. M. (1990). Computing in
quotient groups. In Proc. 22nd ACM Symposium on Theory of Computing, pages
524–534, ACM Press, New York.
[Kantor and Magaard, 2002] Kantor, W. M., and Magaard, K. (2002). Black-box
exceptional groups of Lie type. In preparation.
[Kantor and Penttila, 1999] Kantor, W. M., and Penttila, T. (1999). Reconstructing
simple group actions. In Cossey, J., Miller, C. F., Neumann, W. D., and Shapiro,
M., editors, Geometric Group Theory Down Under, pages 147–180, de Gruyter,
Berlin.
[Kantor and Seress, 1999] Kantor, W. M., and Seress, Á. (1999). Permutation group
algorithms via black box recognition algorithms. In Campbell, C. M., Robertson,
E. F., Ruskuc, N., and Smith, G. C., editors, Groups St. Andrews 1997 in Bath, II,
volume 261 of London Math. Soc. Lecture Note Ser., pages 436–446, Cambridge
University Press, Cambridge.
[Kantor and Seress, 2002] Kantor, W. M., and Seress, Á. (2002). Computing with
matrix groups. Submitted.
[Kantor and Seress, 2001] Kantor, W. M., and Seress, Á. (2001). Black box classical
groups. Memoirs Amer. Math. Soc., 149(708).
[Kantor and Taylor, 1988] Kantor, W. M., and Taylor, D. E. (1988). Polynomial-time
versions of Sylow’s theorem. J. Algorithms, 9:1–17.
[Kantor et al., 1999] Kantor, W. M., Luks, E. M., and Mark, P. D. (1999). Sylow
subgroups in parallel. J. Algorithms, 31:132–195.
[Kimmerle et al., 1990] Kimmerle, W., Lyons, R., Sandling, R., and Teague, D. N.
(1990). Composition factors from the group ring and Artin’s theorem on orders of
simple groups. Proc. London Math. Soc. (3), 60:89–122.
[Knuth, 1969] Knuth, D. E. (1969). The Art of Computer Programming. Volume 2:
Seminumerical Algorithms, Addison-Wesley, Reading, MA.
[Knuth, 1973] Knuth, D. E. (1973). The Art of Computer Programming. Volume 3:
Sorting and Searching, Addison-Wesley, Reading, MA.
[Knuth, 1991] Knuth, D. E. (1991). Notes on efficient representation of perm groups.
Combinatorica, 11:57–68.
[Laue et al., 1984] Laue, R., Neubüser, J., and Schoenwaelder, U. (1984). Algorithms
for finite soluble groups and the SOGOS system. In Atkinson, M., editor,
Computational Group Theory, pages 105–135, Academic Press, London.
[Leedham-Green and Soicher, 1990] Leedham-Green, C., and Soicher, L. (1990).
Collection from the left and other strategies. J. Symbolic Comput., 9:665–675.
[Leedham-Green, 2001] Leedham-Green, C. R. (2001). The computational matrix
group project. In Kantor, W. M., and Seress, Á., editors, Groups and Computation
III, volume 8 of OSU Mathematical Research Institute Publications, pages
229–247, de Gruyter, Berlin.
[Leon, 1980a] Leon, J. S. (1980a). Finding the order of a permutation group. In
Cooperstein, B., and Mason, G., editors, Finite Groups, volume 37 of Proc.
Sympos. Pure Math., pages 511–517, Amer. Math. Soc., Providence, RI.
[Leon, 1980b] Leon, J. S. (1980b). On an algorithm for finding a base and strong
generating set for a group given by generating permutations. Math. Comp.,
35:941–974.
[Leon, 1991] Leon, J. S. (1991). Permutation group algorithms based on partitions, I:
Theory and algorithms. J. Symbolic Comput., 12:533–583.
[Leon, 1997] Leon, J. S. (1997). Partitions, refinements, and permutation group
computation. In Groups and Computation II, volume 28 of Amer. Math. Soc.
DIMACS Series, pages 123–158.
[Luks, 1982] Luks, E. M. (1982). Isomorphism of graphs of bounded valence can be
tested in polynomial time. J. Comp. Syst. Sci., 25:42–65.
[Luks, 1987] Luks, E. M. (1987). Computing the composition factors of a permutation
group in polynomial time. Combinatorica, 7:87–99.
[Luks, 1990] Luks, E. M. (1990). Lectures on Polynomial-Time Computation in
Groups. Lecture notes, Northeastern University.
[Luks, 1993] Luks, E. M. (1993). Permutation groups and polynomial-time
computation. In Groups and Computation, volume 11 of Amer. Math. Soc.
DIMACS Series, pages 139–175.
[Luks and Seress, 1997] Luks, E. M., and Seress, Á. (1997). Computing the Fitting
subgroup and solvable radical of small-base permutation groups in nearly linear
time. In Groups and Computation II, volume 28 of Amer. Math. Soc. DIMACS
Series, pages 169–181.
[Mark, 1993] Mark, P. D. (1993). Parallel computation of Sylow subgroups in solvable
groups. In Groups and Computation, volume 11 of Amer. Math. Soc. DIMACS
Series, pages 177–187.
[McIver and Neumann, 1987] McIver, A., and Neumann, P. M. (1987). Enumerating
finite groups. Quart. J. Math. Oxford, 38:473–488.
[McKay, 1981] McKay, B. D. (1981). Practical graph isomorphism. Congr. Numer.,
30:45–87.
[Mecky and Neubüser, 1989] Mecky, M., and Neubüser, J. (1989). Some remarks on
the computation of conjugacy classes of soluble groups. Bull. Aust. Math. Soc.,
40:281–292.
[Morje, 1995] Morje, P. (1995). A Nearly Linear Algorithm for Sylow Subgroups of
Small-Base Permutation Groups. PhD thesis, The Ohio State University.
[Morje, 1997] Morje, P. (1997). On nearly linear time algorithms for Sylow subgroups
of small-base permutation groups. In Groups and Computation II, volume 28 of
Amer. Math. Soc. DIMACS Series, pages 257–272.
[Neubüser, 1982] Neubüser, J. (1982). An elementary introduction to coset table
methods in computational group theory. In Campbell, C. M., and Robertson, E. F.,
editors, Groups – St Andrews 1981, volume 71 of London Math. Soc. Lecture
Note Ser., pages 1–45, Cambridge University Press, Cambridge.
[Neumann, 1979] Neumann, P. M. (1979). A lemma that is not Burnside’s. Math.
Scientist, 4:133–141.
[Neumann, 1986] Neumann, P. M. (1986). Some algorithms for computing with finite
permutation groups. In Robertson, E., and Campbell, C., editors, Proc. of
Groups – St Andrews 1985, volume 121 of London Math. Soc. Lecture Note Ser.,
pages 59–92, Cambridge University Press, Cambridge.
[Niven et al., 1991] Niven, I., Zuckerman, H. S., and Montgomery, H. L. (1991). An
Introduction to the Theory of Numbers, 5th edition, Wiley, New York.
[Pak, 2001] Pak, I. (2001). What do we know about the product replacement
algorithm? In Kantor, W. M., and Seress, Á., editors, Groups and Computation
III, volume 8 of OSU Mathematical Research Institute Publications, pages
301–347, de Gruyter, Berlin.
[Parker and Nikolai, 1958] Parker, E. T., and Nikolai, P. J. (1958). A search for
analogues of the Mathieu groups. Math. Tables Aids Comput., 12:38–43.
[Parker, 1984] Parker, R. (1984). The computer calculation of modular characters (the
Meat-Axe). In Atkinson, M., editor, Computational Group Theory, pages
267–274, Academic Press, London.
[Passman, 1968] Passman, D. (1968). Permutation Groups. Benjamin, New York.
[Praeger and Saxl, 1980] Praeger, C. E., and Saxl, J. (1980). On the orders of primitive
permutation groups. Bull. London Math. Soc., 12:303–307.
[Pyber, 1993] Pyber, L. (1993). Asymptotic results for permutation groups. In Groups
and Computation, volume 11 of Amer. Math. Soc. DIMACS Series, pages
197–219.
[Rákóczi, 1995] Rákóczi, F. (1995). Fast recognition of the nilpotency of permutation
groups. In Proc. of International Symposium on Symbolic and Algebraic
Computation ISSAC ’95, pages 265–269, ACM Press, New York.
[Rákóczi, 1997] Rákóczi, F. (1997). Data Structures and Algorithms for Computing in
Nilpotent and Solvable Permutation Groups. PhD thesis, University of Oregon.
[Rónyai, 1990] Rónyai, L. (1990). Computing the structure of finite algebras.
J. Symbolic Comput., 9:355–373.
[Rose, 1965] Rose, J. (1965). Abnormal depth and hypereccentric length in finite
soluble groups. Math. Z., 90:29–40.
[Rotman, 1995] Rotman, J. J. (1995). An Introduction to the Theory of Groups, 4th
edition, Springer-Verlag, Berlin.
[Schönert and Seress, 1994] Schönert, M., and Seress, Á. (1994). Finding blocks of
imprimitivity in small-base groups in nearly linear time. In Proc. of International
Symposium on Symbolic and Algebraic Computation ISSAC ’94, pages 154–157,
ACM Press, New York.
[Scott, 1980] Scott, L. L. (1980). Representations in characteristic p. In The Santa
Cruz Conference on Finite Groups, volume 37 of Proc. Symposium Pure Math.,
pages 319–331, Amer. Math. Soc., Providence, RI.
[Sedgewick, 1988] Sedgewick, R. (1988). Algorithms, 2nd edition, Addison–Wesley,
Reading, MA.
[Seress, 1997] Seress, Á. (1997). An introduction to Computational Group Theory.
Notices Amer. Math. Soc., 44:671–679.
[Seress, 1998] Seress, Á. (1998). Nearly linear time algorithms for permutation
groups: an interplay between theory and practice. Acta Appl. Math., 52:183–207.
[Seress and Weisz, 1993] Seress, Á., and Weisz, I. (1993). PERM: A program
computing strong generating sets. In Groups and Computation I, volume 11 of
Amer. Math. Soc. DIMACS Series, pages 269–276.
[Sims, 1970] Sims, C. C. (1970). Computational methods in the study of permutation
groups. In Computational Problems in Abstract Algebra, pages 169–183,
Pergamon Press, Oxford.
[Sims, 1971a] Sims, C. C. (1971a). Computation with permutation groups. In Proc.
Second Symposium on Symbolic and Algebraic Manipulation, pages 23–28, ACM
Press, New York.
[Sims, 1971b] Sims, C. C. (1971b). Determining the conjugacy classes of permutation
groups. In Birkhoff, G., and Marshall Hall, J., editors, Computers in Algebra and
Number Theory, volume 4 of Proc. Amer. Math. Soc., pages 191–195, Amer.
Math. Soc., Providence, RI.
[Sims, 1978] Sims, C. C. (1978). Some group theoretic algorithms. In Dold, A., and
Eckmann, B., editors, Topics in Algebra, volume 697 of Lecture Notes in Math.,
pages 108–124, Springer-Verlag, Berlin.
[Sims, 1990] Sims, C. C. (1990). Computing the order of a solvable permutation
group. J. Symbolic Comput., 9:699–705.
[Sims, 1994] Sims, C. C. (1994). Computation with Finitely Presented Groups,
Cambridge University Press, Cambridge.
[Suzuki, 1962] Suzuki, M. (1962). On a class of doubly transitive groups. Ann. Math.,
75:105–145.
[Tarjan, 1975] Tarjan, R. E. (1975). Efficiency of a good but not linear set union
algorithm. J. Assoc. Comput. Mach., 22:215–225.
[Taylor, 1992] Taylor, D. E. (1992). The Geometry of the Classical Groups, volume 9
of Sigma Series in Pure Mathematics, Heldermann-Verlag, Berlin.
[Theißen, 1997] Theißen, H. (1997). Eine Methode zur Normalisatorberechnung in
Permutationsgruppen mit Anwendungen in der Konstruktion primitiver Gruppen.
PhD thesis, RWTH Aachen.
[Vaughan-Lee, 1990] Vaughan-Lee, M. (1990). Collection from the left. J. Symbolic
Comput., 9:725–733.
[Warlimont, 1978] Warlimont, R. (1978). Über die Anzahl der Lösungen von x^n = 1
in der symmetrischen Gruppe S_n. Archive Math., 30:591–594.
[Wielandt, 1934] Wielandt, H. (1934). Abschätzungen für den Grad einer
Permutationsgruppe von vorgeschriebenem Transitivitätsgrad. Schrift. Math.
Sem. Inst. Angew. Math. Univ. Berlin, 2:151–174.
[Wielandt, 1964] Wielandt, H. (1964). Finite Permutation Groups, Academic Press,
New York.
Index

Note: Numerals in boldface type indicate where a notion or notation is defined.

( ), 9 ( f ), 11
1m 1 · · · n m n , 228 P (γ1 , . . . , γl−1 ), 205
A × B, 8 Out(G), 7
A(m, x), 109  ∧
, 208
An , 9 R(), 209
C G (U ), 7 Soc(G), 8
Cn , 8 Sym(), 9
E(A), 36 T (t), 202
G| , 9 (H ), 210
G  H , 10 ( f ), 11
G, 8 X (V, E), 11
G(P), 201 X (α, β), 212
G , 9 αg , 9
G [i] , 55 N, 10
G ( ) , 9 R, 10
G , 10 Z, 10
H  G, 7 fix(G), 117
H << G, 7 S, 7
L[i], 12 U G , 7
N (k, , d1 , . . . , dl ), 231 µ, 94, 228
N G (U ), 7 ωG , 9
Nn (x), 231 O ∼ ( f ), 11

O( f ), 11 Ai , 8
O ∞ (G), 8 supp(g), 9
O p (G), 8 ϕ((γ1 , . . . , γl )), 202
O∞ (G), 8 ξ , 94, 228
Sn , 9 o( f ), 11
Tn (x), 231 OP(), 207
[H, K ], 8 Prob(A|B), 30
[a, b], 8
Alt(), 9 Ackerman function, 109, 176
Aut(G), 7 automorphism group, 7, 129, 131, 144
Aut(X ), 11 of a graph, 11, 207
g , 9
Diag(H ), 129 backtrack, 53, 169, 201–217
GF(q), 8 base, 50, 55
GLd (q), 8 R-base, 210
Inn(G), 7 nonredundant, 55

base change, 82, 97, 112, 116, 134, 143, 190, forest, 12, 219, 249
204, 205 Frattini argument, 170, 182
black-box f -recognizable, 192, 193, 195, 196, Frobenius group, 10, 133, 137, 148, 160
198
black-box group, 16, 16–47, 135, 139, 192, graph, 11
193, 195, 228, 235–244 component, 12, 112
block, 9, 50, 100, 107–110, 112, 113, 121, connected, 12
142, 190, 214
maximal, 9, 144, 146 Hall subgroup, 182
minimal, 9, 101–107, 112 hash function, 22, 142
block homomorphism, 81 hypergraph, 11, 87
block system, 9, 121, 126, 128, 142, 176, 178 uniform, 11, 87
maximal, 9, 191
minimal, 9, 132, 149, 247, 253 labeled branching, 219, 218–225
represents a group, 220
Cayley graph, 12, 26, 64 represents a transversal, 220
center, 7, 50, 120, 133 Las Vegas algorithm, 14
centralizer, 7, 53, 117–124, 130, 134, 149, local expansion, 72
150, 152, 158, 169, 172, 205, 216 lower central series, 8, 24, 38, 49, 84, 180
Chernoff’s bound, 31, 33–35, 37–39
basic-type application, 32 Markov chain, 25, 27, 215, 217
chief series, 49, 155 aperiodic, 25, 28, 47, 215
closure, 83, 111 irreducible, 25, 28, 215
G-closure, 7, 23, 38, 44, 83 period of, 25, 47
normal closure, 7, 23, 83, 111, 116, 138, stationary distribution of, 25, 28, 215
140, 155, 250, 251 transition probability, 25, 28, 47, 215
collection, 17, 165 Monte Carlo algorithm, 13
commutator, 8
complement, 7, 182 nearly linear-time algorithm, 51
composition series, 50, 125–155, 158, 165, nearly uniform distribution, 24, 29
193, 197, 199 nilpotent group, 175–182
conjugacy class, 172, 214–216 nonconstructive recognition, 227
constructive recognition, 168, 192, 193, 195, normalizer, 7, 134, 137, 142, 154, 170, 211
196, 227, 235–246
coordinatization, 167, 169–171, 173 orbit, 9, 18, 36, 49, 60, 65, 102, 112, 120–122,
core, 8, 50, 124, 180 141, 143, 144, 154, 173, 179, 189, 190,
p-core, 8, 51, 138, 157–159 252
coset enumeration, 184–186 fundamental, 56, 83, 97, 99, 182, 204, 223
cube, 64, 67, 69 orbital graph, 212, 251
nondegenerate, 65 ordered partition, 207
cycle type, 228 cell of, 207
refinement, 208
degree out-degree, 11
of a graph or hypergraph, 11
of a permutation, 9 path, 12, 219, 252
of a permutation group, 9 perfect group, 8, 146
derived series, 8, 24, 38, 49, 84, 159 permutation group as black-box group, 93,
diagonal subgroup, 129, 131, 144, 160 135, 138, 139, 192, 193, 197
direct product, 8, 119–122, 129, 141, 147, permutation isomorphism, 10, 95, 119, 127,
216 128, 140, 142, 146, 173
projection, 8, 130 polycyclic generating sequence, 162, 163, 165,
directed graph, 11 166, 181
out-degree, 11 power-conjugate presentation, 165–166
strongly connected, 12, 112, 251 presentation, 49, 112, 165, 184, 192, 194,
underlying graph, 12, 219 197–200, 227, 236, 237, 244, 247,
double coset, 53, 203 250
primitive group, 9, 95, 100, 126, 129–149, stabilizer
160, 225, 251 pointwise, 9, 49, 79, 115, 127, 247
product replacement algorithm, 27 setwise, 10, 53, 145, 176, 206
standard word, 93
random prefix, 40 straight-line program, 10, 192–194, 197, 199,
random subproduct, 30, 30–40, 73, 77, 84, 88, 200, 227, 239, 240, 243, 244, 250
92, 182, 245, 246, 249 strong generating set, see SGS
recognition, see constructive recognition subdirect product, 8, 130, 160
regular graph, 11 subnormal, 7, 49, 124, 149, 152, 160, 161,
regular group, 10, 78, 113, 129, 133, 137, 138, 180
146, 150, 154, 160, 161 support, 9
Sylow subgroup, 50, 95, 125, 157, 161,
Schreier generator, 58, 59, 62, 73, 76–78, 86, 167–172, 182, 216
88, 92, 97, 177, 198, 222, 246, 249
Schreier tree, 56, 65, 67, 70, 72, 75, 77, 82, 85, transitive closure, 219, 220
88, 91, 97, 99, 101, 106, 118, 128, 135, transitive constituent homomorphism, 81
136, 155, 163, 190 transitive group, 9, 36, 87, 117
shallow, 114 transversal, 8, 56, 57, 59, 65, 82, 101, 118,
Schreier vector, see Schreier tree 218, 220, 246, 247, 249, 253
Schreier–Sims algorithm, 59 tree, 12
Schur–Zassenhaus theorem, 182 breadth-first-search, 19, 65, 68, 113, 157,
search tree, 202 187
semiregular group, 10, 117, 123, 130 rooted, 12, 108, 111, 202, 219
SGS, 50, 55 children of a vertex, 12
construction, 59, 63, 70, 72, 75, 86, 87, 90, leaf, 12
99, 162, 193, 222, 246 parent of a vertex, 12
testing, 64, 77, 186, 190, 193
siftee, 56 Union-Find data structure, 108, 113
sifting, 56 up-to-date SGS data structure, 59, 70, 83, 88,
as a word, 86, 88, 91, 128, 156, 166, 167, 91
187 upper central series, 8, 179–181
in a labeled branching, 221, 223, 224
small-base group, 51 valency, 11
socle, 8, 51, 129, 147–149, 152–154, 161
solvable radical, 8, 157–159, 216 walk, 12, 25, 212, 213, 215
solvable residual, 8, 150, 152 lazy random, 26
spreader, 41 wreath product, 10, 119, 122, 129, 160, 226
