Lecture Notes in Computer Science 2751
Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
Andrzej Lingas, Bengt J. Nilsson (Eds.)

Fundamentals of Computation Theory

14th International Symposium, FCT 2003
Malmö, Sweden, August 12-15, 2003
Proceedings
Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands
Volume Editors
Andrzej Lingas
Lund University
Department of Computer Science
Box 118, 221 00 Lund, Sweden
E-mail: Andrzej.Lingas@cs.lth.se
Bengt J. Nilsson
Malmö University College
School of Technology and Society
205 06 Malmö, Sweden
E-mail: Bengt.Nilsson@ts.mah.se
Cataloging-in-Publication Data applied for
A catalog record for this book is available from the Library of Congress.
Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.
CR Subject Classification (1998): F.1, F.2, F.4, I.3.5, G.2
ISSN 0302-9743
ISBN 3-540-40543-7 Springer-Verlag Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law.
Springer-Verlag Berlin Heidelberg New York
a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de
© Springer-Verlag Berlin Heidelberg 2003
Printed in Germany
Typesetting: Camera-ready by author, data conversion by PTP-Berlin GmbH
Printed on acid-free paper
SPIN: 10930632
Preface
The papers in this volume were presented at the 14th Symposium on Fundamentals of Computation Theory.
The symposium was established in 1977 as a biennial event for researchers
interested in all aspects of theoretical computer science, in particular in algorithms, complexity, and formal and logical methods. The previous FCT conferences were held in the following cities: Poznań (Poland, 1977), Wendisch-Rietz
(Germany, 1979), Szeged (Hungary, 1981), Borgholm (Sweden, 1983), Cottbus
(Germany, 1985), Kazan (Russia, 1987), Szeged (Hungary, 1989), Gosen-Berlin
(Germany, 1991), Szeged (Hungary, 1993), Dresden (Germany, 1995), Kraków
(Poland, 1997), Iasi (Romania, 1999), and Riga (Latvia, 2001).
The FCT conferences are coordinated by the FCT steering committee, which
consists of B. Chlebus (Denver/Warsaw), Z. Esik (Szeged), M. Karpinski (Bonn),
A. Lingas (Lund), M. Santha (Paris), E. Upfal (Providence), and I. Wegener
(Dortmund).
The call for papers sought contributions on original research in all aspects
of theoretical computer science including design and analysis of algorithms,
abstract data types, approximation algorithms, automata and formal languages, categorical and topological approaches, circuits, computational and structural complexity, circuit and proof theory, computational biology, computational
geometry, computer systems theory, concurrency theory, cryptography, domain
theory, distributed algorithms and computation, molecular computation, quantum computation and information, granular computation, probabilistic computation, learning theory, rewriting, semantics, logic in computer science, specification, transformation and verification, and algebraic aspects of computer science.
There were 73 papers submitted, of which the majority were very good.
Because of the FCT format, the program committee could select only 36 papers
for presentation. In addition, invited lectures were presented by Sanjeev Arora
(Princeton), George Păun (Romanian Academy), and Christos Papadimitriou
(Berkeley).
FCT 2003 was held on August 13–15, 2003, in Malmö. Andrzej Lingas
(Lund University) and Bengt Nilsson (Malmö University College) served as the program committee chair and the conference chair, respectively.
We wish to thank all referees who helped to evaluate the papers. We are grateful to Lund University, Malmö University College, and the Swedish Research
Council for their support.
Lund, May 2003
Andrzej Lingas
Bengt J. Nilsson
Organizing Committee
Bengt Nilsson, Malmö (Chair)
Oscar Garrido, Malmö
Thore Husfeldt, Lund
Miroslaw Kowaluk, Warsaw
Program Committee
Arne Andersson, Uppsala
Stefan Arnborg, KTH Stockholm
Stephen Alstrup, ITU Copenhagen
Zoltan Esik, Szeged
Rusins Freivalds, UL Riga
Alan Frieze, CMU Pittsburgh
Leszek Ga̧sieniec, Liverpool
Magnus Halldórsson, UI Reykjavik
Klaus Jansen, Kiel
Juhani Karhumäki, Turku
Marek Karpinski, Bonn
Christos Levcopoulos, Lund
Ming Li, Santa Barbara
Andrzej Lingas, Lund (Chair)
Jan Maluszyński, Linköping
Fernando Orejas, Barcelona
Jürgen Prömel, Berlin
Rüdiger Reischuk, Lübeck
Wojciech Rytter, Warsaw/NJIT
Miklos Santha, Paris-Sud
Andrzej Skowron, Warsaw
Paul Spirakis, Patras
Esko Ukkonen, Helsinki
Ingo Wegener, Dortmund
Pawel Winter, Copenhagen
Vladimiro Sassone, Sussex
Referees
M. Albert
A. Aldini
J. Arpe
A. Barvinok
C. Bazgan
S.L. Bloom
M. Bläser
M. Bodirsky
B. Bollig
C. Braghin
R. Bruni
A. Bucalo
G. Buntrock
M. Buscemi
B. Chandra
J. Chlebikova
A. Coja-Oghlan
L.A. Cortes
W.F. de la Vega
M. de Rougemont
W. Drabent
S. Droste
C. Durr
M. Dyer
L. Engebretsen
H. Eriksson
L.M. Favrholdt
H. Fernau
A. Ferreira
A. Fishkin
A. Flaxman
D. Fotakis
O. Gerber
G. Ghelli
O. Giel
M. Grantson
J. Gudmundsson
V. Halava
B.V. Halldórsson
L. Hemaspaandra
M. Hermo
M. Hirvensalo
F. Hoffmann
T. Hofmeister
J. Holmerin
J. Hromkovic
L. Ilie
A. Jakoby
T. Jansen
J. Jansson
A. Jarry
M. Jerrum
P. Kanarek
J. Kari
R. Karlsson
J. Katajainen
A. Kiehn
H. Klaudel
B. Klin
B. Konev
S. Kontogiannis
J. Kortelainen
G. Kortsarz
M. Koutny
D. Kowalski
M. Krivelevich
K.N. Kumar
M. Kääriäinen
G. Lancia
R. Lassaigne
M. Latteux
M. Libura
M. Liśkiewicz
K. Loryś
E.M. Lundell
F. Magniez
B. Manthey
M. Margraf
N. Marti-Oliet
M. Mavronicolas
E. Mayordomo
C. McDiarmid
T. Mielikäinen
M. Mitzenmacher
S. Nikoletseas
B.J. Nilsson
U. Nilsson
J. Nordström
H. Ohsaki
D. Osthus
A. Palbom
G. Persiano
T. Petkovic
I. Potapov
C. Priami
E. Prouff
K. Reinert
J. Rousu
M. Sauerhoff
H. Shachnai
J. Shallit
D. Slezak
J. Srba
F. Stephan
O. Sykora
P. Tadepalli
M. Takeyama
A. Taraz
P. Thiemann
M. Thimm
P. Valtr
S.P.M. van Hoesel
J. van Leeuwen
S. Vempala
Y. Verhoeven
E. Venigoda
H. Vogler
B. Vöcking
H. Völzer
A.P.M. Wagelmans
R. Wanka
M. Westermann
A. Wojna
J. Wroblewski
Q. Xin
M. Zachariasen
G. Zhang
G.Q. Zhang
H. Zhang
Table of Contents

Approximability 1

Proving Integrality Gaps without Knowing the Linear Program . . . . . . 1
Sanjeev Arora

An Improved Analysis of Goemans and Williamson’s LP-Relaxation
for MAX SAT . . . . . . 2
Takao Asano

Certifying Unsatisfiability of Random 2k-SAT Formulas Using
Approximation Techniques . . . . . . 15
Amin Coja-Oghlan, Andreas Goerdt, André Lanka, Frank Schädlich

Approximability 2

Inapproximability Results for Bounded Variants of
Optimization Problems . . . . . . 27
Miroslav Chlebík, Janka Chlebíková

Approximating the Pareto Curve with Local Search for the
Bicriteria TSP(1,2) Problem . . . . . . 39
Eric Angel, Evripidis Bampis, Laurent Gourvès

Scheduling to Minimize Max Flow Time: Offline and
Online Algorithms . . . . . . 49
Monaldo Mastrolilli

Algorithms 1

Linear Time Algorithms for Some NP-Complete Problems on
(P5,Gem)-Free Graphs . . . . . . 61
Hans Bodlaender, Andreas Brandstädt, Dieter Kratsch, Michaël Rao,
Jeremy Spinrad

Graph Searching, Elimination Trees, and a Generalization
of Bandwidth . . . . . . 73
Fedor V. Fomin, Pinar Heggernes, Jan Arne Telle

Constructing Sparse t-Spanners with Small Separators . . . . . . 86
Joachim Gudmundsson

Composing Equipotent Teams . . . . . . 98
Mark Cieliebak, Stephan Eidenbenz, Aris Pagourtzis

Algorithms 2

Efficient Algorithms for GCD and Cubic Residuosity in the Ring
of Eisenstein Integers . . . . . . 109
Ivan Bjerre Damgård, Gudmund Skovbjerg Frandsen

An Extended Quadratic Frobenius Primality Test with Average and
Worst Case Error Estimates . . . . . . 118
Ivan Bjerre Damgård, Gudmund Skovbjerg Frandsen

Periodic Multisorting Comparator Networks . . . . . . 132
Marcin Kik

Fast Periodic Correction Networks . . . . . . 144
Grzegorz Stachowiak

Networks and Complexity

Games and Networks . . . . . . 157
Christos Papadimitriou

One-Way Communication Complexity of Symmetric Boolean Functions . . . . . . 158
Jan Arpe, Andreas Jakoby, Maciej Liśkiewicz

Circuits on Cylinders . . . . . . 171
Kristoffer Arnsfelt Hansen, Peter Bro Miltersen, V. Vinay

Computational Biology

Fast Perfect Phylogeny Haplotype Inference . . . . . . 183
Peter Damaschke

On Exact and Approximation Algorithms for Distinguishing
Substring Selection . . . . . . 195
Jens Gramm, Jiong Guo, Rolf Niedermeier

Complexity of Approximating Closest Substring Problems . . . . . . 210
Patricia A. Evans, Andrew D. Smith

Computational Geometry

On Lawson’s Oriented Walk in Random Delaunay Triangulations . . . . . . 222
Binhai Zhu

Competitive Exploration of Rectilinear Polygons . . . . . . 234
Mikael Hammar, Bengt J. Nilsson, Mia Persson

An Improved Approximation Algorithm for Computing Geometric
Shortest Paths . . . . . . 246
Lyudmil Aleksandrov, Anil Maheshwari, Jörg-Rüdiger Sack

Adaptive and Compact Discretization for Weighted Region Optimal
Path Finding . . . . . . 258
Zheng Sun, John H. Reif

On Boundaries of Highly Visible Spaces and Applications . . . . . . 271
John H. Reif, Zheng Sun

Computational Models and Complexity

Membrane Computing . . . . . . 284
Gheorghe Păun

Classical Simulation Complexity of Quantum Machines . . . . . . 296
Farid Ablayev, Aida Gainutdinova

Using Depth to Capture Average-Case Complexity . . . . . . 303
Luís Antunes, Lance Fortnow, N.V. Vinodchandran

Structural Complexity

Non-uniform Depth of Polynomial Time and Space Simulations . . . . . . 311
Richard J. Lipton, Anastasios Viglas

Dimension- and Time-Hierarchies for Small Time Bounds . . . . . . 321
Martin Kutrib

Baire’s Categories on Small Complexity Classes . . . . . . 333
Philippe Moser

Formal Languages

Operations Preserving Recognizable Languages . . . . . . 343
Jean Berstel, Luc Boasson, Olivier Carton, Bruno Petazzoni, Jean-Éric Pin

Languages Defined by Generalized Equality Sets . . . . . . 355
Vesa Halava, Tero Harju, Hendrik Jan Hoogeboom, Michel Latteux

Context-Sensitive Equivalences for Non-interference Based
Protocol Analysis . . . . . . 364
Michele Bugliesi, Ambra Ceccato, Sabina Rossi

On the Exponentiation of Languages . . . . . . 376
Werner Kuich, Klaus W. Wagner

Kleene’s Theorem for Weighted Tree-Automata . . . . . . 387
Christian Pech

Logic

Weak Cardinality Theorems for First-Order Logic . . . . . . 400
Till Tantau

Compositionality of Hennessy-Milner Logic through Structural
Operational Semantics . . . . . . 412
Wan Fokkink, Rob van Glabbeek, Paulien de Wind

On a Logical Approach to Estimating Computational Complexity of
Potentially Intractable Problems . . . . . . 423
Andrzej Szalas

Author Index . . . . . . 433
Proving Integrality Gaps without Knowing the
Linear Program
Sanjeev Arora
Princeton University
During the past decade we have had much success in proving (using probabilistically checkable proofs, or PCPs) that computing approximate solutions to NP-hard optimization problems such as CLIQUE, COLORING, and SET-COVER is no easier than computing optimal solutions.

After these notable successes, the effort is now stuck for many other problems, such as METRIC TSP, VERTEX COVER, and GRAPH EXPANSION.
In a recent paper with Béla Bollobás and László Lovász we argue that NP-hardness of approximation may be too ambitious a goal in these cases, since NP-hardness implies a lower bound (assuming P ≠ NP) on all polynomial-time algorithms. A less ambitious goal might be to prove a lower bound for restricted families of algorithms. Linear and semidefinite programs constitute a natural family, since they are used to design most approximation algorithms in practice. A lower bound result for a large subfamily of linear programs may then be viewed as a lower bound for a restricted computational model, analogous, say, to lower bounds for monotone circuits.
The above paper showed that three fairly general families of linear relaxations of VERTEX COVER cannot be used to design an approximation algorithm with a factor better than 2. Our methods seem relevant to other problems as well.
This talk surveys this work, as well as other open problems in the field. The most interesting families of relaxations are those obtained by the so-called lift-and-project methods of Lovász-Schrijver and Sherali-Adams. Proving lower bounds for such linear relaxations involves elements of combinatorics (strong forms of classical Erdős theorems), proof complexity, and the theory of convex sets.
References
1. S. Arora, B. Bollobás, and L. Lovász. Proving integrality gaps without knowing the linear program. In Proc. IEEE FOCS, 2002.
2. S. Arora and C. Lund. Hardness of approximations. In [3].
3. D. Hochbaum, ed. Approximation Algorithms for NP-hard Problems. PWS Publishing, Boston, 1996.
4. L. Lovász and A. Schrijver. Cones of matrices and set-functions and 0-1 optimization. SIAM Journal on Optimization, 1:166–190, 1990.
5. H. D. Sherali and W. P. Adams. A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM J. Optimization, 3:411–430, 1990.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, p. 1, 2003.
© Springer-Verlag Berlin Heidelberg 2003
An Improved Analysis of Goemans and
Williamson’s LP-Relaxation for MAX SAT
Takao Asano
Department of Information and System Engineering
Chuo University, Bunkyo-ku, Tokyo 112-8551, Japan
asano@ise.chuo-u.ac.jp
Abstract. For MAX SAT, which is a well-known NP-hard problem,
many approximation algorithms have been proposed. Two types of best
approximation algorithms for MAX SAT were proposed by Asano and
Williamson: one with best proven performance guarantee 0.7846 and the
other with performance guarantee 0.8331 if a conjectured performance
guarantee of 0.7977 is true for Zwick’s algorithm. Both algorithms
are based on their sharpened analysis of Goemans and Williamson’s
LP-relaxation for MAX SAT. In this paper, we present an improved
analysis which is simpler than the previous analysis. Furthermore,
algorithms based on this analysis will play a role as a better building
block in designing an improved approximation algorithm for MAX SAT.
Indeed, we show that algorithms based on this analysis
lead to approximation algorithms with performance guarantee 0.7877
and conjectured performance guarantee 0.8353, which are slightly better
than the best known corresponding performance guarantees of 0.7846 and
0.8331, respectively.
Keywords: Approximation algorithm, MAX SAT, LP-relaxation.
1 Introduction
MAX SAT, one of the most well-studied NP-hard problems, is stated as follows:
given a set of clauses with weights, find a truth assignment that maximizes
the sum of the weights of the satisfied clauses. More precisely, an instance of
MAX SAT is defined by (C, w), where C is a set of boolean clauses, each clause
C ∈ C being a disjunction of literals and having a positive weight w(C). Let
X = {x1 , . . . , xn } be the set of boolean variables in the clauses of C. A literal is a
variable $x \in X$ or its negation $\bar{x}$. For simplicity we assume $x_{n+i} = \bar{x}_i$ ($x_i = \bar{x}_{n+i}$). Thus, $\bar{X} = \{\bar{x} \mid x \in X\} = \{x_{n+1}, x_{n+2}, \ldots, x_{2n}\}$ and $X \cup \bar{X} = \{x_1, \ldots, x_{2n}\}$. We assume that literals with the same variable do not appear more than once in a clause of $\mathcal{C}$. For each $x_i \in X$, let $x_i = 1$ ($x_i = 0$, resp.) if $x_i$ is true (false, resp.). Then $x_{n+i} = \bar{x}_i = 1 - x_i$, and a clause $C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_{k_j}} \in \mathcal{C}$ can be considered to be a function
$$C_j = C_j(x) = 1 - \prod_{i=1}^{k_j}(1 - x_{j_i})$$
on $x = (x_1, \ldots, x_{2n}) \in \{0,1\}^{2n}$. Thus, $C_j(x) \in \{0, 1\}$ for any truth assignment $x \in \{0,1\}^{2n}$ with $x_i + x_{n+i} = 1$ ($i = 1, 2, \ldots, n$), and $C_j$ is satisfied if $C_j(x) = 1$.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 2–14, 2003.
© Springer-Verlag Berlin Heidelberg 2003
The value of a truth assignment $x$ is defined to be $F_{\mathcal{C}}(x) = \sum_{C_j \in \mathcal{C}} w(C_j)\,C_j(x)$. That is, the value of $x$ is the sum of the weights of the clauses in $\mathcal{C}$ satisfied by $x$. Thus, the goal of MAX SAT is to find an optimal truth assignment (i.e., a truth assignment of maximum value). We will also use MAX kSAT, the restricted version of the problem in which each clause has at most $k$ literals.
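As a small illustration of the clause-as-function view above, the following Python sketch evaluates $C_j(x)$ and the value $F_{\mathcal{C}}(x)$ on a hypothetical toy instance (the instance, weights, and helper names are illustrative, not from the paper):

```python
# A clause C_j as the 0/1 function C_j(x) = 1 - prod_{i=1}^{k_j} (1 - x_{j_i}),
# and the value F_C(x) of a truth assignment as the weighted sum of
# satisfied clauses. (Hypothetical toy instance.)

def clause_value(literals, x):
    """literals: indices into x; the negation of x_i is stored at index n+i."""
    prod = 1
    for j in literals:
        prod *= 1 - x[j]
    return 1 - prod  # 1 iff at least one literal is true

def F(clauses, weights, x):
    return sum(w * clause_value(C, x) for C, w in zip(clauses, weights))

# n = 2 variables x1, x2; negations are x[2] = 1 - x[0] and x[3] = 1 - x[1].
clauses = [(0, 1), (2,), (1, 2)]   # (x1 or x2), (not x1), (x2 or not x1)
weights = [2.0, 1.0, 3.0]
x = [0, 1, 1, 0]                   # x1 = false, x2 = true
print(F(clauses, weights, x))      # all three clauses satisfied -> 6.0
```

Storing the negation of $x_i$ at index $n+i$ mirrors the convention $x_{n+i} = \bar{x}_i$ used throughout the paper.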
MAX SAT is known to be NP-hard and many approximation algorithms for
it have been proposed. Håstad [5] has shown that no approximation algorithm
for MAX SAT can achieve a performance guarantee better than 7/8 unless P = NP. On the other hand, Asano and Williamson [1] have presented a 0.7846-approximation algorithm and an approximation algorithm whose performance guarantee is 0.8331 if a conjectured performance guarantee of 0.7977 is true for Zwick’s algorithm [9]. Both algorithms are based on their sharpened analysis of Goemans and Williamson’s LP-relaxation for MAX SAT [3].
In this paper, we present an improved analysis which is simpler than the previous analysis of Asano and Williamson [1]. Furthermore, this analysis leads to approximation algorithms with better performance guarantees when combined with other existing (or future) approximation algorithms; algorithms based on it can thus serve as a building block in designing improved approximation algorithms for MAX SAT. Indeed, combined with the MAX 2SAT and MAX 3SAT algorithms of Halperin and Zwick [6] and with Zwick’s algorithm [9], they yield approximation algorithms with performance guarantee 0.7877 and conjectured performance guarantee 0.8353, slightly better than the best known corresponding guarantees of 0.7846 and 0.8331, respectively.
To explain our result in more detail, we briefly review the 0.75-approximation algorithm of Goemans and Williamson based on the probabilistic method [3]. Let $x^p = (x^p_1, \ldots, x^p_{2n})$ be a random truth assignment with $0 \le x^p_i = p_i \le 1$ ($x^p_{n+i} = 1 - x^p_i = 1 - p_i = p_{n+i}$). That is, $x^p$ is obtained by setting each variable $x_i \in X$ independently to be true with probability $p_i$ (and $x_{n+i} = \bar{x}_i$ to be true with probability $p_{n+i} = 1 - p_i$). Then the probability of a clause $C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_{k_j}} \in \mathcal{C}$ being satisfied by the random truth assignment $x^p$ is $C_j(x^p) = 1 - \prod_{i=1}^{k_j}(1 - x^p_{j_i})$. Thus, the expected value of the random truth assignment $x^p$ is $F_{\mathcal{C}}(x^p) = \sum_{C_j \in \mathcal{C}} w(C_j)\,C_j(x^p)$. The probabilistic method assures that there is a truth assignment $x^q \in \{0,1\}^{2n}$ of value at least $F_{\mathcal{C}}(x^p)$. Such a truth assignment $x^q$ can be obtained by the method of conditional probabilities [3].
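The method of conditional probabilities mentioned here can be sketched as follows: since $F_{\mathcal{C}}$ is multilinear in each $p_i$, fixing $x_i$ to the truth value with the larger conditional expectation never decreases the expected value, so after $n$ steps one obtains a 0/1 assignment of value at least $F_{\mathcal{C}}(x^p)$. A minimal Python sketch (the toy instance and all names are hypothetical):

```python
def expected_value(clauses, weights, p, n):
    # p[i] = Pr[x_i = true]; a literal index j >= n denotes the negation of x_{j-n}.
    def lit_prob(j):
        return p[j] if j < n else 1 - p[j - n]
    total = 0.0
    for C, w in zip(clauses, weights):
        unsat = 1.0
        for j in C:
            unsat *= 1 - lit_prob(j)
        total += w * (1 - unsat)
    return total

def derandomize(clauses, weights, p, n):
    """Method of conditional probabilities: fix variables one by one,
    each time keeping the branch with the larger conditional expectation."""
    p = list(p)
    for i in range(n):
        p[i] = 1.0
        e1 = expected_value(clauses, weights, p, n)
        p[i] = 0.0
        e0 = expected_value(clauses, weights, p, n)
        p[i] = 1.0 if e1 >= e0 else 0.0
    return p

clauses = [(0, 1), (2,), (1, 2)]   # hypothetical instance: literal 2 = not x1
weights = [2.0, 1.0, 3.0]
p0 = [0.5, 0.5]
x = derandomize(clauses, weights, p0, n=2)
assert expected_value(clauses, weights, x, 2) >= expected_value(clauses, weights, p0, 2)
```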
Using an IP (integer programming) formulation of MAX SAT and its LP (linear programming) relaxation, Goemans and Williamson [3] obtained an algorithm for finding a random truth assignment $x^p$ of value $F_{\mathcal{C}}(x^p)$ at least
$$\sum_{k \ge 1}\Big(1 - \big(1 - \tfrac{1}{k}\big)^k\Big)\hat{W}_k \ \ge\ \Big(1 - \tfrac{1}{e}\Big)\hat{W} \approx 0.632\,\hat{W},$$
where $e$ is the base of the natural logarithm, $\hat{W}_k = \sum_{C \in \mathcal{C}_k} w(C)\,C(\hat{x})$, and $F_{\mathcal{C}}(\hat{x}) = \sum_{k \ge 1} \hat{W}_k$ for an optimal truth assignment $\hat{x}$ of $(\mathcal{C}, w)$ ($\mathcal{C}_k$ denotes the set of clauses in $\mathcal{C}$ with $k$ literals). Goemans and Williamson also obtained a 0.75-approximation algorithm by a hybrid approach combining the above algorithm with Johnson’s algorithm [7]. It finds a random truth assignment of value at least
$$0.750\hat{W}_1 + 0.750\hat{W}_2 + 0.789\hat{W}_3 + 0.810\hat{W}_4 + 0.820\hat{W}_5 + 0.824\hat{W}_6 + \sum_{k \ge 7}\beta_k \hat{W}_k,$$
where $\beta_k = \frac{1}{2}\big(2 - \frac{1}{2^k} - (1 - \frac{1}{k})^k\big)$. Asano and Williamson [1] showed that one of the non-hybrid algorithms of Goemans and Williamson finds a random truth assignment $x^p$ with value $F_{\mathcal{C}}(x^p)$ at least
$$0.750\hat{W}_1 + 0.750\hat{W}_2 + 0.804\hat{W}_3 + 0.851\hat{W}_4 + 0.888\hat{W}_5 + 0.915\hat{W}_6 + \sum_{k \ge 7}\gamma_k \hat{W}_k,$$
where $\gamma_k = 1 - \frac{1}{2}\big(\frac{3}{4}\big)^{k-1}\big(1 - \frac{1}{3(k-1)}\big)^{k-1}$ for $k \ge 3$ ($\gamma_k > \beta_k$ for $k \ge 3$). Actually,
they obtained a 0.7846-approximation algorithm by combining this algorithm with known MAX kSAT algorithms. They also proposed a generalization of this algorithm which finds a random truth assignment $x^p$ with value $F_{\mathcal{C}}(x^p)$ at least
$$0.914\hat{W}_1 + 0.750\hat{W}_2 + 0.750\hat{W}_3 + 0.766\hat{W}_4 + 0.784\hat{W}_5 + 0.801\hat{W}_6 + 0.817\hat{W}_7 + \sum_{k \ge 8}\gamma'_k \hat{W}_k,$$
where $\gamma'_k = 1 - 0.914^k\big(1 - \frac{1}{k}\big)^k$ for $k \ge 8$. They showed that if this is combined with Zwick’s MAX SAT algorithm with conjectured performance guarantee 0.7977, then it leads to an approximation algorithm with performance guarantee 0.8331.

In this paper, we show that another generalization of the non-hybrid algorithms of Goemans and Williamson finds a random truth assignment $x^p$ with value $F_{\mathcal{C}}(x^p)$ at least
$$0.750\hat{W}_1 + 0.750\hat{W}_2 + 0.815\hat{W}_3 + 0.859\hat{W}_4 + 0.894\hat{W}_5 + 0.920\hat{W}_6 + \sum_{k \ge 7}\zeta_k \hat{W}_k,$$
where $\zeta_k = 1 - \frac{1}{4}\big(\frac{3}{4}\big)^{k-2}$ for $k \ge 3$ and $\zeta_k > \gamma_k$. We also present another algorithm which finds a random truth assignment $x^p$ with value $F_{\mathcal{C}}(x^p)$ at least
$$0.914\hat{W}_1 + 0.750\hat{W}_2 + 0.757\hat{W}_3 + 0.774\hat{W}_4 + 0.790\hat{W}_5 + 0.804\hat{W}_6 + 0.818\hat{W}_7 + \sum_{k \ge 8}\gamma'_k \hat{W}_k.$$
This will be used to obtain a 0.8353-approximation algorithm.
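The per-clause-length coefficients quoted above are easy to tabulate. The following sketch (coefficient names mirror the text; values are for the parameter choice $a = 3/4$ implicit in $\gamma_k$ and $\zeta_k$) checks numerically that $\zeta_k > \gamma_k > \beta_k$ for small $k \ge 3$:

```python
# Coefficients: beta_k (hybrid GW/Johnson), gamma_k (Asano-Williamson),
# zeta_k (this paper), each valid for k >= 3.

def beta(k):
    return 0.5 * (2 - 0.5 ** k - (1 - 1 / k) ** k)

def gamma(k):
    return 1 - 0.5 * (3 / 4) ** (k - 1) * (1 - 1 / (3 * (k - 1))) ** (k - 1)

def zeta(k):
    return 1 - 0.25 * (3 / 4) ** (k - 2)

for k in range(3, 8):
    b, g, z = beta(k), gamma(k), zeta(k)
    assert z > g > b            # zeta_k > gamma_k > beta_k for k >= 3
    print(k, round(b, 3), round(g, 3), round(z, 3))
```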
The remainder of the paper is structured as follows. In Section 2 we review
the algorithms of Goemans and Williamson [3] and Asano and Williamson [1].
In Section 3 we give our main results and their proofs. In Section 4 we briefly
outline improved approximation algorithms for MAX SAT obtained by our main
results.
2 MAX SAT Algorithms of Goemans and Williamson
Goemans and Williamson considered the following LP relaxation (GW) of MAX SAT [3]:
$$(GW)\quad \max \sum_{C_j \in \mathcal{C}} w(C_j)\, z_j$$
$$\text{s.t.}\quad \sum_{i=1}^{k_j} y_{j_i} \ge z_j \qquad \forall C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_{k_j}} \in \mathcal{C}$$
$$y_i + y_{n+i} = 1 \qquad \forall i \in \{1, 2, \ldots, n\}$$
$$0 \le y_i \le 1 \qquad \forall i \in \{1, 2, \ldots, 2n\}$$
$$0 \le z_j \le 1 \qquad \forall C_j \in \mathcal{C}.$$
In this formulation, the variables $y = (y_i)$ correspond to the literals $\{x_1, \ldots, x_{2n}\}$ and the variables $z = (z_j)$ correspond to the clauses $\mathcal{C}$: $y_i = 1$ if and only if $x_i = 1$, and $z_j = 1$ if and only if $C_j$ is satisfied. The first set of constraints forces at least one literal of a clause to be true whenever the clause is counted as satisfied; thus the IP formulation obtained from (GW) by requiring $y_i \in \{0,1\}$ ($\forall i \in \{1, 2, \ldots, 2n\}$) and $z_j \in \{0,1\}$ ($\forall C_j \in \mathcal{C}$) corresponds exactly to MAX SAT.
Throughout this paper, let $(y^*, z^*)$ be an optimal solution to this LP relaxation of MAX SAT. Goemans and Williamson set each variable $x_i$ to be true with probability $y^*_i$. Then a clause $C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_{k_j}}$ with $k$ literals is satisfied by this random truth assignment $x^p = y^*$ with probability $C_j(y^*) \ge \big(1 - (1 - \frac{1}{k})^k\big) z^*_j$. Thus, the expected value $F(y^*)$ of $y^*$ obtained in this way satisfies
$$F(y^*) = \sum_{C_j \in \mathcal{C}} w(C_j)\,C_j(y^*) \ \ge\ \sum_{k \ge 1}\Big(1 - \big(1 - \tfrac{1}{k}\big)^k\Big) W^*_k \ \ge\ \Big(1 - \tfrac{1}{e}\Big) W^*,$$
where $W^* = \sum_{C_j \in \mathcal{C}} w(C_j) z^*_j$ and $W^*_k = \sum_{C_j \in \mathcal{C}_k} w(C_j) z^*_j$ (note that $W^* \ge \hat{W} = \sum_{C_j \in \mathcal{C}} w(C_j)\hat{z}_j$ for an optimal solution $(\hat{y}, \hat{z})$ to the IP formulation of MAX SAT). Since $1 - \frac{1}{e} \approx 0.632$, this is a 0.632-approximation algorithm for MAX SAT.
Goemans and Williamson [3] also considered three other, non-linear, randomized rounding algorithms. In these algorithms, each variable $x_i$ is set to be true with probability $f_\ell(y^*_i)$ defined as follows ($\ell = 1, 2, 3$):
$$f_1(y) = \begin{cases} \frac{3}{4}y + \frac{1}{4} & \text{if } 0 \le y \le \frac{1}{3},\\ \frac{1}{2} & \text{if } \frac{1}{3} \le y \le \frac{2}{3},\\ \frac{3}{4}y & \text{if } \frac{2}{3} \le y \le 1, \end{cases}$$
$$f_2(y) = (2a-1)y + 1 - a \qquad \Big(\tfrac{3}{4} \le a \le \tfrac{3}{\sqrt[3]{4}} - 1\Big),$$
$$1 - 4^{-y} \ \le\ f_3(y) \ \le\ 4^{y-1}.$$
Note that $f_\ell(y^*_i) + f_\ell(y^*_{n+i}) = 1$ holds for $\ell = 1, 2$, and that $f_3(y^*_i)$ has to be chosen so as to satisfy $f_3(y^*_i) + f_3(y^*_{n+i}) = 1$. They then proved that all the random truth assignments $x^p = f_\ell(y^*) = (f_\ell(y^*_1), \ldots, f_\ell(y^*_{2n}))$ obtained in this way have expected value at least $\frac{3}{4}W^*$ and lead to $\frac{3}{4}$-approximation algorithms.
Asano and Williamson [1] sharpened the analysis of Goemans and Williamson to provide more precise bounds on the probability of a clause $C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_k}$ with $k$ literals being satisfied (and thus on the expected weight of satisfied clauses in $\mathcal{C}_k$) by the random truth assignment $x^p = f_\ell(y^*)$, for each $k$ (and $\ell = 1, 2$). From now on we assume, by symmetry, that $x_{j_i} = x_i$ for each $i = 1, 2, \ldots, k$, since $f_\ell(x) = 1 - f_\ell(\bar{x})$ and we can set $x := \bar{x}$ if necessary. They considered a clause $C_j = x_1 \vee x_2 \vee \cdots \vee x_k$ corresponding to the constraint $y_1 + y_2 + \cdots + y_k \ge z_j$ in the LP relaxation (GW) of MAX SAT, and gave a bound on the ratio of $C_j(f_\ell(y^*))$ to $z^*_j$, where $C_j(f_\ell(y^*))$ is the probability of clause $C_j$ being satisfied by the random truth assignment $x^p = f_\ell(y^*)$ ($\ell = 1, 2$). Specifically, they analyzed the parametrized functions $f_1^a$ and $f_2^a$ with $\frac{1}{2} \le a \le 1$ defined as follows:
$$f_1^a(y) = \begin{cases} ay + 1 - a & \text{if } 0 \le y \le 1 - \frac{1}{2a},\\ \frac{1}{2} & \text{if } 1 - \frac{1}{2a} \le y \le \frac{1}{2a},\\ ay & \text{if } \frac{1}{2a} \le y \le 1, \end{cases} \qquad (1)$$
$$f_2^a(y) = (2a-1)y + 1 - a. \qquad (2)$$
Note that $f_1 = f_1^{3/4}$ and $f_2 = f_2^a$. Let
$$\gamma_{k,1}^a = 1 - \frac{a^{k-1}}{2}\left(1 - \frac{1 - \frac{1}{2a}}{k-1}\right)^{k-1}, \qquad \gamma_{k,2}^a = 1 - a^k\left(1 - \frac{1}{k}\right)^k, \qquad (3)$$
$$\gamma_k^a = \begin{cases} a & \text{if } k = 1,\\ \min\{\gamma_{k,1}^a, \gamma_{k,2}^a\} & \text{if } k \ge 2, \end{cases} \qquad (4)$$
and
$$\delta_k^a = 1 - a^k\left(1 - \frac{2 - \frac{1}{a}}{k}\right)^k. \qquad (5)$$
Then their results are summarized as follows.
Proposition 1. [1] For $\frac{1}{2} \le a \le 1$, let $C_j(f_\ell^a(y^*)) = 1 - \prod_{i=1}^k(1 - f_\ell^a(y^*_i))$ be the probability of clause $C_j = x_1 \vee x_2 \vee \cdots \vee x_k \in \mathcal{C}$ being satisfied by the random truth assignment $x^p = f_\ell^a(y^*) = (f_\ell^a(y^*_1), \ldots, f_\ell^a(y^*_{2n}))$ ($\ell = 1, 2$). Then the following statements hold.
1. $C_j(f_1^a(y^*)) = 1 - \prod_{i=1}^k(1 - f_1^a(y^*_i)) \ge \gamma_k^a z^*_j$, and the expected value $F(f_1^a(y^*))$ of $x^p = f_1^a(y^*)$ satisfies $F(f_1^a(y^*)) \ge \sum_{k \ge 1}\gamma_k^a W^*_k$.
2. $C_j(f_2^a(y^*)) = 1 - \prod_{i=1}^k(1 - f_2^a(y^*_i)) \ge \delta_k^a z^*_j$, and the expected value $F(f_2^a(y^*))$ of $x^p = f_2^a(y^*)$ satisfies $F(f_2^a(y^*)) \ge \sum_{k \ge 1}\delta_k^a W^*_k$.
3. $\gamma_k^a > \delta_k^a$ holds for all $k \ge 3$ and for all $a$ with $\frac{1}{2} < a < 1$. For $k = 1, 2$, $\gamma_k^a = \delta_k^a$ holds ($\gamma_1^a = \delta_1^a = a$, $\gamma_2^a = \delta_2^a = \frac{3}{4}$).
3 Main Results and Their Proofs
Asano and Williamson did not consider a parametrized version of $f_3$. In this section we introduce a parametrized function $f_3^a$ of $f_3$ and show that it has better performance than $f_1^a$ and $f_2^a$; furthermore, its analysis (proof) is simpler. We also consider a generalization $f_4^a$ of both $f_1^a$ and $f_2^a$.
For $\frac{1}{2} \le a \le 1$, let $f_3^a$ be defined as follows:
$$f_3^a(y) = \begin{cases} 1 - \dfrac{a}{(4a^2)^y} & \text{if } 0 \le y \le \frac{1}{2},\\[4pt] \dfrac{(4a^2)^y}{4a} & \text{if } \frac{1}{2} \le y \le 1. \end{cases} \qquad (6)$$
For $\frac{3}{4} \le a \le 1$, let
$$y_a = \frac{1}{a} - \frac{1}{2}. \qquad (7)$$
Then the other parametrized function $f_4^a$ is defined as follows:
$$f_4^a(y) = \begin{cases} ay + 1 - a & \text{if } 0 \le y \le 1 - y_a,\\ \frac{a}{2}y + \frac{1}{2} - \frac{a}{4} & \text{if } 1 - y_a \le y \le y_a,\\ ay & \text{if } y_a \le y \le 1. \end{cases} \qquad (8)$$
Thus, $f_3^a(y) + f_3^a(1-y) = 1$ and $f_4^a(y) + f_4^a(1-y) = 1$ hold for $0 \le y \le 1$. Furthermore, $f_3^a$ and $f_4^a$ are both continuous and increasing in $y$, with $f_3^a(\frac{1}{2}) = f_4^a(\frac{1}{2}) = \frac{1}{2}$. Let $\zeta_k^a$ and $\eta_k^a$ be the numbers defined as follows.
$$\zeta_k^a = \begin{cases} a & \text{if } k = 1,\\ 1 - \frac{1}{4}a^{k-2} & \text{if } k \ge 2, \end{cases} \qquad (9)$$
$$\eta_{k,1}^a = \gamma_{k,2}^a = 1 - a^k\Big(1 - \frac{1}{k}\Big)^k, \qquad \eta_{k,2}^a = \zeta_k^a = 1 - \frac{a^{k-2}}{4}, \qquad (10)$$
$$\eta_{k,3}^a = 1 - \frac{a^k}{2}\Big(1 - \frac{1 - y_a}{k-1}\Big)^{k-1}, \qquad \eta_{k,4}^a = 1 - \frac{a^k}{2}\Big(1 - \frac{1}{k}\Big)^{k-1}\Big(1 + \frac{1}{k}\Big), \qquad (11)$$
$$\eta_k^a = \begin{cases} a & \text{if } k = 1,\\ \min\{\eta_{k,1}^a, \eta_{k,2}^a, \eta_{k,3}^a, \eta_{k,4}^a\} & \text{if } k \ge 2. \end{cases} \qquad (12)$$
Then we have the following theorems for the two parametrized functions $f_3^a$ and $f_4^a$.

Theorem 1. For $\frac{1}{2} \le a \le \frac{\sqrt{e}}{2} \approx 0.82436$, the probability of $C_j = x_1 \vee x_2 \vee \cdots \vee x_k \in \mathcal{C}$ being satisfied by the random truth assignment $x^p = f_3^a(y^*) = (f_3^a(y^*_1), \ldots, f_3^a(y^*_{2n}))$ is $C_j(f_3^a(y^*)) = 1 - \prod_{i=1}^k(1 - f_3^a(y^*_i)) \ge \zeta_k^a z^*_j$. Thus, the expected value $F(f_3^a(y^*))$ of $x^p = f_3^a(y^*)$ satisfies $F(f_3^a(y^*)) \ge \sum_{k \ge 1}\zeta_k^a W^*_k$.

Theorem 2. For $\frac{\sqrt{e}}{2} \approx 0.82436 \le a \le 1$, the probability of $C_j = x_1 \vee x_2 \vee \cdots \vee x_k \in \mathcal{C}$ being satisfied by the random truth assignment $x^p = f_4^a(y^*) = (f_4^a(y^*_1), \ldots, f_4^a(y^*_{2n}))$ is $C_j(f_4^a(y^*)) = 1 - \prod_{i=1}^k(1 - f_4^a(y^*_i)) \ge \eta_k^a z^*_j$. Thus, the expected value $F(f_4^a(y^*))$ of $x^p = f_4^a(y^*)$ satisfies $F(f_4^a(y^*)) \ge \sum_{k \ge 1}\eta_k^a W^*_k$.
Theorem 3. The following statements hold for $\gamma_k^a$, $\delta_k^a$, $\zeta_k^a$, and $\eta_k^a$.
1. If $\frac{1}{2} \le a \le \frac{\sqrt{e}}{2} \approx 0.82436$, then $\zeta_k^a > \gamma_k^a > \delta_k^a$ holds for all $k \ge 3$.
2. If $\frac{\sqrt{e}}{2} \approx 0.82436 \le a < 1$, then $\eta_k^a \ge \gamma_k^a > \delta_k^a$ holds for all $k \ge 3$. In particular, if $\frac{\sqrt{e}}{2} \approx 0.82436 \le a \le 0.881611$, then $\eta_k^a > \gamma_k^a > \delta_k^a$ holds for all $k \ge 3$.
3. For $k = 1, 2$, $\gamma_k^a = \delta_k^a = \zeta_k^a$ holds if $\frac{1}{2} \le a \le \frac{\sqrt{e}}{2} \approx 0.82436$, and $\gamma_k^a = \delta_k^a = \eta_k^a$ holds if $\frac{\sqrt{e}}{2} \approx 0.82436 \le a \le 1$.
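Part 1 of Theorem 3 can be spot-checked numerically under the definitions (3)-(5) and (9) as written above (a sketch; the grid deliberately avoids the boundary $a = \frac{1}{2}$, where $\zeta_k^a = \gamma_{k,1}^a$ and the inequalities become equalities):

```python
import math

def gamma(a, k):
    if k == 1:
        return a
    g1 = 1 - (a ** (k - 1) / 2) * (1 - (1 - 1 / (2 * a)) / (k - 1)) ** (k - 1)
    g2 = 1 - a ** k * (1 - 1 / k) ** k
    return min(g1, g2)

def delta(a, k):
    return 1 - a ** k * (1 - (2 - 1 / a) / k) ** k

def zeta(a, k):
    return a if k == 1 else 1 - a ** (k - 2) / 4

amax = math.sqrt(math.e) / 2               # 0.82436...
for a in (0.55, 0.65, 0.75, amax):
    for k in range(3, 10):
        assert zeta(a, k) > gamma(a, k) > delta(a, k)   # Theorem 3, part 1
    # for k = 1, 2 the three coefficients coincide (Theorem 3, part 3)
    assert abs(gamma(a, 2) - 0.75) < 1e-9
    assert abs(delta(a, 2) - 0.75) < 1e-9
    assert zeta(a, 2) == 0.75
    assert abs(delta(a, 1) - a) < 1e-9
```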
In this paper, we first give a proof of Theorem 1. It is very simple and uses only the following lemma.

Lemma 1. If $\frac{1}{2} \le a \le \frac{\sqrt{e}}{2} \approx 0.82436$, then $f_3^a(y) \ge ay$.
Proof. Let $g(y) \equiv \frac{(4a^2)^y}{4a} - ay$. Its derivative is $g'(y) = \ln(4a^2)\,\frac{(4a^2)^y}{4a} - a$. Thus $g'(y)$ is increasing in $y$, and $g'(1) = a(\ln(4a^2) - 1) \le 0$, since $\ln(4a^2) \le \ln\big(4(\frac{\sqrt{e}}{2})^2\big) = 1$. This implies that $g'(y) \le 0$ for all $0 \le y \le 1$ and that $g(y)$ is decreasing on $0 \le y \le 1$. Thus $g(y)$ attains its minimum at $y = 1$, i.e.,
$$g(y) = \frac{(4a^2)^y}{4a} - ay \ \ge\ g(1) = \frac{4a^2}{4a} - a = 0.$$
Now we are ready to prove the lemma. For $\frac{1}{2} \le y \le 1$, we have $f_3^a(y) - ay = g(y) = \frac{(4a^2)^y}{4a} - ay \ge 0$. For $0 \le y \le \frac{1}{2}$, we have
$$f_3^a(y) - ay = 1 - \frac{a}{(4a^2)^y} - ay = -\frac{(4a^2)^{1-y}}{4a} + a(1-y) + 1 - a = -g(1-y) + 1 - a \ \ge\ -g(\tfrac{1}{2}) + 1 - a = \frac{1-a}{2} \ \ge\ 0,$$
since $g(y)$ is decreasing and $g(1-y) \le g(\frac{1}{2}) = \frac{1-a}{2}$ for $\frac{1}{2} \le 1 - y \le 1$.
Proof of Theorem 1. Noting that clause $C_j = x_1 \vee x_2 \vee \cdots \vee x_k$ corresponds to the constraint
$$y_1^* + y_2^* + \cdots + y_k^* \ge z_j^* \qquad (13)$$
in the LP relaxation (GW) of MAX SAT, we will show that
$$C_j(f_3^a(y^*)) = 1 - \prod_{i=1}^{k}\bigl(1 - f_3^a(y_i^*)\bigr) \ge \zeta_k^a z_j^*$$
for $\frac{1}{2} \le a \le \frac{\sqrt{e}}{2} = 0.82436$. By symmetry, we assume $y_1^* \le y_2^* \le \cdots \le y_k^*$. Note that $y_k^* \le z_j^*$, since otherwise $(y^*, z^*)$ would not be an optimal solution to the LP relaxation (GW) of MAX SAT: if $y_k^* > z_j^*$, then $(y^*, z')$ with $z'_j = y_k^*$ and $z'_{j'} = z^*_{j'}$ ($j' \ne j$) would also be a feasible solution to (GW) with $\sum_{C_{j'} \in \mathcal{C}} w(C_{j'})\, z'_{j'} > \sum_{C_{j'} \in \mathcal{C}} w(C_{j'})\, z^*_{j'}$, a contradiction.
If $k = 1$, then we have $C_j(f_3^a(y^*)) = f_3^a(y_1^*) \ge a y_1^* \ge a z_j^* = \zeta_1^a z_j^*$ by Lemma 1 and inequality (13).
Next suppose $k \ge 2$. We consider two cases as follows: Case 1: $0 \le y_k^* \le \frac{1}{2}$; and Case 2: $\frac{1}{2} < y_k^* \le 1$.
An Improved Analysis of Goemans and Williamson’s LP-Relaxation
Case 1: $0 \le y_k^* \le \frac{1}{2}$. Since all $y_i^* \le \frac{1}{2}$ ($i = 1, 2, \ldots, k$), we have $f_3^a(y_i^*) = 1 - \frac{a}{(4a^2)^{y_i^*}}$ and $1 - f_3^a(y_i^*) = \frac{a}{(4a^2)^{y_i^*}}$. Thus, we have
$$C_j(f_3^a(y^*)) = 1 - \prod_{i=1}^{k}\bigl(1 - f_3^a(y_i^*)\bigr) = 1 - \frac{a^k}{(4a^2)^{\sum_{i=1}^{k} y_i^*}} \ge 1 - \frac{a^k}{(4a^2)^{z_j^*}} \ge \left(1 - \frac{a^k}{4a^2}\right) z_j^* = \left(1 - \frac{a^{k-2}}{4}\right) z_j^* = \zeta_k^a z_j^*,$$
where the first inequality follows by inequality (13), and the second inequality follows from the fact that $1 - \frac{a^k}{(4a^2)^{z_j^*}}$ is a concave function in $0 \le z_j^* \le 1$.
Case 2: $\frac{1}{2} < y_k^* \le 1$. Suppose first that $y_{k-1}^* > \frac{1}{2}$. Then, since $f_3^a(y_i^*) \ge 1 - a$ ($i = 1, 2, \ldots, k$), we have $1 - f_3^a(y_i^*) \le a$ ($i = 1, 2, \ldots, k-2$), $1 - f_3^a(y_i^*) = 1 - \frac{(4a^2)^{y_i^*}}{4a} \le \frac{1}{2}$ ($i = k-1, k$), and $z_j^* \le 1$, and $C_j(f_3^a(y^*)) = 1 - \prod_{i=1}^{k}(1 - f_3^a(y_i^*))$ satisfies
$$C_j(f_3^a(y^*)) \ge 1 - a^{k-2}\left(\frac{1}{2}\right)^2 = 1 - \frac{a^{k-2}}{4} \ge \left(1 - \frac{a^{k-2}}{4}\right) z_j^* = \zeta_k^a z_j^*.$$
Thus, we can assume $y_{k-1}^* \le \frac{1}{2}$. Since $1 - f_3^a(y_i^*) = \frac{a}{(4a^2)^{y_i^*}}$ ($i = 1, 2, \ldots, k-1$), we have
$$C_j(f_3^a(y^*)) = 1 - \prod_{i=1}^{k}\bigl(1 - f_3^a(y_i^*)\bigr) = 1 - \frac{a^{k-1}}{(4a^2)^{\sum_{i=1}^{k-1} y_i^*}}\left(1 - \frac{(4a^2)^{y_k^*}}{4a}\right) \ge 1 - \frac{a^{k-1}}{(4a^2)^{z_j^* - y_k^*}}\left(1 - \frac{(4a^2)^{y_k^*}}{4a}\right) = 1 - \frac{a^{k-1}}{(4a^2)^{z_j^*}}\,(4a^2)^{y_k^*}\left(1 - \frac{(4a^2)^{y_k^*}}{4a}\right) \ge 1 - \frac{a^k}{(4a^2)^{z_j^*}} \ge \left(1 - \frac{a^{k-2}}{4}\right) z_j^* = \zeta_k^a z_j^*$$
by inequality (13), $y_k^* \le z_j^*$, $(4a^2)^{y_k^*}\bigl(1 - \frac{(4a^2)^{y_k^*}}{4a}\bigr) = u\bigl(1 - \frac{u}{4a}\bigr) \le a$ with $u = (4a^2)^{y_k^*}$, and the fact that $1 - \frac{a^k}{(4a^2)^{z_j^*}}$ is a concave function in $0 \le z_j^* \le 1$.
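The bound of Theorem 1 for $k \ge 3$ can also be sanity-checked by random sampling. This sketch (ours, not part of the paper) draws random $y^*$ and sets $z_j^* = \min(1, \sum_i y_i^*)$ — an assumption consistent with optimality of (GW) — and tests $C_j(f_3^a(y^*)) \ge (1 - a^{k-2}/4)\, z_j^*$:

```python
import math
import random

def f3(a, y):
    # Branches of f_3^a as in the proof of Lemma 1 (re-defined to stay self-contained).
    if y <= 0.5:
        return 1.0 - (4 * a * a) ** (1.0 - y) / (4 * a)
    return (4 * a * a) ** y / (4 * a)

rng = random.Random(0)
a = 0.8  # any a in [1/2, sqrt(e)/2 = 0.82436]
for _ in range(10_000):
    k = rng.randint(3, 6)
    ys = [rng.random() for _ in range(k)]
    z = min(1.0, sum(ys))  # optimal z_j* given y*, so y_k* <= z_j* holds
    cj = 1.0 - math.prod(1.0 - f3(a, y) for y in ys)
    zeta = 1.0 - a ** (k - 2) / 4.0
    assert cj >= zeta * z - 1e-9
print("C_j(f_3^a(y*)) >= zeta_k^a z_j* on all samples")
```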
Proofs of Theorems 2 and 3. The proofs of Theorems 2 and 3 closely follow those in Asano and Williamson [1]. In this sense they are somewhat complicated; however, they can be carried out in a systematic way. Here, we give only an outline of the proof of Theorem 2; the proof of Theorem 3 is analogous.
Outline of Proof of Theorem 2. For a clause $C_j = x_1 \vee x_2 \vee \cdots \vee x_k$ corresponding to the constraint $y_1^* + y_2^* + \cdots + y_k^* \ge z_j^*$ as described in the proof of Theorem 1, we will show that $C_j(f_4^a(y^*)) = 1 - \prod_{i=1}^{k}(1 - f_4^a(y_i^*)) \ge \eta_k^a z_j^*$ for $\frac{3}{4} \le a \le 1$. We assume that $y_1^* \le y_2^* \le \cdots \le y_k^*$ and $y_k^* \le z_j^*$ hold, as described before.
Suppose $k = 1$. Since $f_4^a(y) - ay = 1 - a \ge 0$ for $0 \le y \le 1 - y_a$ and $f_4^a(y) - ay = 0$ for $y_a \le y \le 1$, we consider the case when $1 - y_a \le y \le y_a$.
In this case, $f_4^a(y) - ay = \frac{2 - a - 2ay}{4}$ is decreasing with $1 - y_a \le y \le y_a$, and we have $f_4^a(y) - ay = \frac{2 - a - 2ay}{4} \ge f_4^a(y_a) - a y_a = \frac{2 - a - 2a y_a}{4} = 0$ by Eq. (7). Thus, $C_j(f_4^a(y^*)) = f_4^a(y_1^*) \ge a y_1^* \ge a z_j^* = \eta_1^a z_j^*$ by inequality (13).
Next suppose $k \ge 2$. We consider three cases as follows. Case 1: $y_k^* \le 1 - y_a$; Case 2: $1 - y_a < y_k^* \le y_a$; and Case 3: $y_a \le y_k^* \le 1$.
Case 1: $y_k^* \le 1 - y_a$. Since all $y_i^* \le 1 - y_a$ ($i = 1, 2, \ldots, k$), $f_4^a(y_i^*) = 1 - a + a y_i^*$ and $1 - f_4^a(y_i^*) = a(1 - y_i^*)$. Thus, $C_j(f_4^a(y^*)) = 1 - \prod_{i=1}^{k}(1 - f_4^a(y_i^*))$ satisfies
$$C_j(f_4^a(y^*)) = 1 - a^k \prod_{i=1}^{k}(1 - y_i^*) \ge 1 - a^k\left(1 - \frac{\sum_{i=1}^{k} y_i^*}{k}\right)^k \ge 1 - a^k\left(1 - \frac{z_j^*}{k}\right)^k \ge \left(1 - a^k\left(1 - \frac{1}{k}\right)^k\right) z_j^* = \eta_{k,1}^a z_j^*,$$
where the first inequality follows by the arithmetic/geometric mean inequality, the second by inequality (13), and the third by the fact that $1 - a^k(1 - \frac{z_j^*}{k})^k$ is a concave function in $0 \le z_j^* \le 1$.
Case 2: $1 - y_a < y_k^* \le y_a$. Let $\ell$ be the number such that $y_\ell^* < 1 - y_a \le y_{\ell+1}^*$, and let $y_A = \sum_{i=1}^{\ell} y_i^*$ and $y_B = \sum_{i=\ell+1}^{k} y_i^*$. Then $k - \ell \ge 1$ and $\ell \ge 0$. If $\ell = 0$ then $f_4^a(y_i^*) = \frac{1}{2}\bigl(a y_i^* + 1 - \frac{a}{2}\bigr)$ ($i = 1, 2, \ldots, k$) and, for the same reason as in Case 1 above, we have
$$C_j(f_4^a(y^*)) = 1 - \prod_{i=1}^{k}\bigl(1 - f_4^a(y_i^*)\bigr) = 1 - \left(\frac{1}{2}\right)^k \prod_{i=1}^{k}\left(1 + \frac{a}{2} - a y_i^*\right) \ge 1 - \left(\frac{1}{2}\right)^k \left(1 + \frac{a}{2} - \frac{a y_B}{k}\right)^k \ge 1 - \left(\frac{1}{2}\right)^k \left(1 + \frac{a}{2} - \frac{a z_j^*}{k}\right)^k \ge \left(1 - \left(\frac{1}{2}\right)^k \left(1 + \frac{a}{2} - \frac{a}{k}\right)^k\right) z_j^* = \eta_{k,4}^a z_j^*.$$
Now suppose $\ell > 0$ and that $y_B \le z_j^*$ (we omit the case when $y_B > z_j^*$, since it can be argued similarly). Then $C_j(f_4^a(y^*)) = 1 - \prod_{i=1}^{k}(1 - f_4^a(y_i^*))$ satisfies
$$C_j(f_4^a(y^*)) = 1 - a^\ell \prod_{i=1}^{\ell}(1 - y_i^*) \cdot \left(\frac{1}{2}\right)^{k-\ell} \prod_{i=\ell+1}^{k}\left(1 + \frac{a}{2} - a y_i^*\right) \ge 1 - a^\ell \left(1 - \frac{y_A}{\ell}\right)^{\ell} \left(\frac{1}{2}\right)^{k-\ell} \left(1 + \frac{a}{2} - \frac{a y_B}{k-\ell}\right)^{k-\ell} \ge 1 - a^\ell \left(1 - \frac{z_j^* - y_B}{\ell}\right)^{\ell} \left(\frac{1}{2}\right)^{k-\ell} \left(1 + \frac{a}{2} - \frac{a y_B}{k-\ell}\right)^{k-\ell} = 1 - a^\ell\, g(y_B),$$
where $g(y_B) \equiv \left(1 - \frac{z_j^* - y_B}{\ell}\right)^{\ell} \left(\frac{1}{2}\right)^{k-\ell} \left(1 + \frac{a}{2} - \frac{a y_B}{k-\ell}\right)^{k-\ell}$. Note that $g(y_B)$ is increasing with $y_B$. Thus, if $k - \ell \ge 2$ then, by $y_B \le z_j^*$ and $g(y_B) \le g(z_j^*)$, we have
$$C_j(f_4^a(y^*)) \ge 1 - a^\ell\, g(z_j^*) = 1 - a^\ell \left(\frac{1}{2}\right)^{k-\ell} \left(1 + \frac{a}{2} - \frac{a z_j^*}{k-\ell}\right)^{k-\ell} \ge \left(1 - a^\ell \left(\frac{1}{2}\right)^{k-\ell} \left(1 + \frac{a}{2} - \frac{a}{k-\ell}\right)^{k-\ell}\right) z_j^* \ge \left(1 - a^{k-2} \left(\frac{1}{2}\right)^{2}\left(1 + \frac{a}{2} - \frac{a}{2}\right)^{2}\right) z_j^* = \left(1 - \frac{a^{k-2}}{4}\right) z_j^* = \eta_{k,2}^a z_j^*,$$
since $1 - a^\ell \left(\frac{1}{2}\right)^{k-\ell}\left(1 + \frac{a}{2} - \frac{a z_j^*}{k-\ell}\right)^{k-\ell}$ is a concave function in $0 \le z_j^* \le 1$ and $1 - a^\ell \left(\frac{1}{2}\right)^{k-\ell}\left(1 + \frac{a}{2} - \frac{a}{k-\ell}\right)^{k-\ell}$ is increasing with $k - \ell$ for $\frac{3}{4} \le a \le 1$ (which can be shown by Lemma 2.5 in [1]). Similarly, if $k - \ell = 1$, then $y_B = y_k^* \le y_a$ and
$$C_j(f_4^a(y^*)) \ge 1 - a^{k-1}\, g(y_a) = 1 - \frac{a^k}{2}\left(1 - \frac{z_j^* - y_a}{k-1}\right)^{k-1} \ge \left(1 - \frac{a^k}{2}\left(1 - \frac{1 - y_a}{k-1}\right)^{k-1}\right) z_j^* = \eta_{k,3}^a z_j^*,$$
since $g(y_B) \le g(y_a) = \frac{a}{2}\left(1 - \frac{z_j^* - y_a}{k-1}\right)^{k-1}$ by Eq. (7), and $1 - \frac{a^k}{2}\left(1 - \frac{z_j^* - y_a}{k-1}\right)^{k-1}$ is a concave function in $y_a \le z_j^* \le 1$ (see Lemma 2.4 in [1]).
Case 3: $y_a \le y_k^* \le 1$. If $y_{k-1}^* + y_k^* > 1$, then $(1 - f_4^a(y_{k-1}^*))(1 - f_4^a(y_k^*)) \le \frac{1}{4}$ and $1 - f_4^a(y_i^*) \le a$ ($i = 1, 2, \ldots, k$), and $C_j(f_4^a(y^*)) = 1 - \prod_{i=1}^{k}(1 - f_4^a(y_i^*))$ satisfies
$$C_j(f_4^a(y^*)) \ge 1 - a^{k-2}\bigl(1 - f_4^a(y_{k-1}^*)\bigr)\bigl(1 - f_4^a(y_k^*)\bigr) \ge 1 - \frac{a^{k-2}}{4} = \eta_{k,2}^a \ge \eta_{k,2}^a z_j^*.$$
Thus, we can assume $y_{k-1}^* \le 1 - y_a$. Let $y_A = \sum_{i=1}^{k-1} y_i^*$. Then we have
$$C_j(f_4^a(y^*)) = 1 - a^{k-1}\bigl(1 - a y_k^*\bigr)\prod_{i=1}^{k-1}\bigl(1 - y_i^*\bigr) \ge 1 - a^{k-1}\bigl(1 - a y_k^*\bigr)\left(1 - \frac{y_A}{k-1}\right)^{k-1} \ge 1 - a^{k-1}\bigl(1 - a y_k^*\bigr)\left(1 - \frac{z_j^* - y_k^*}{k-1}\right)^{k-1} \ge 1 - a^{k-1}\bigl(1 - a y_a\bigr)\left(1 - \frac{z_j^* - y_a}{k-1}\right)^{k-1} = 1 - \frac{a^k}{2}\left(1 - \frac{z_j^* - y_a}{k-1}\right)^{k-1} \ge \left(1 - \frac{a^k}{2}\left(1 - \frac{1 - y_a}{k-1}\right)^{k-1}\right) z_j^* = \eta_{k,3}^a z_j^*,$$
since $(1 - a y_k^*)\left(1 - \frac{z_j^* - y_k^*}{k-1}\right)^{k-1}$ is decreasing with $y_k^*$ ($y_a \le y_k^* \le 1$) and $1 - \frac{a^k}{2}\left(1 - \frac{z_j^* - y_a}{k-1}\right)^{k-1}$ is a concave function in $y_a \le z_j^* \le 1$.
4 Improved Approximation Algorithms
In this section, we briefly outline our improved approximation algorithms for
MAX SAT based on a hybrid approach which is described in detail in Asano
and Williamson [1]. We use a semidefinite programming relaxation of MAX SAT
which is a combination of ones given by Goemans and Williamson [4], Feige and
Goemans [2], Karloff and Zwick [8], Halperin and Zwick [6], and Zwick [9]. Our
algorithms pick the best solution returned by the four algorithms corresponding
to (1) f3a in Goemans and Williamson [3], (2) MAX 2SAT algorithm of Feige
and Goemans [2] or of Halperin and Zwick [6], (3) MAX 3SAT algorithm of
Karloff and Zwick [8] or of Halperin and Zwick [6], and (4) Zwick’s MAX SAT
algorithm with a conjectured performance guarantee 0.7977 [9]. The expected
value of the solution is at least as good as the expected value of an algorithm
that uses Algorithm (i) with probability pi , where p1 + p2 + p3 + p4 = 1.
Our first algorithm picks the best solution returned by the three algorithms
corresponding to (1) f3a in Goemans and Williamson [3], (2) Feige and Goemans’s
MAX 2SAT algorithm [2], and (3) Karloff and Zwick’s MAX 3SAT algorithm
[8] (this implies that p4 = 0). From the arguments in Section 3, the probability
that a clause Cj ∈ Ck is satisfied by Algorithm (1) is at least ζka zj∗ , where ζka is
defined in Eq.(9). Similarly, from the arguments in [4,2], the probability that a
clause $C_j \in \mathcal{C}_k$ is satisfied by Algorithm (2) is at least $0.93109 \cdot \frac{2}{k}\, z_j^*$ for $k \ge 2$, and at least $0.97653\, z_j^*$ for $k = 1$. By an analysis obtained by Karloff and Zwick [8] and an argument similar to one in [4], the probability that a clause $C_j \in \mathcal{C}_k$ is satisfied by Algorithm (3) is at least $\frac{3}{k} \cdot \frac{7}{8}\, z_j^*$ for $k \ge 3$, and at least $0.87856\, z_j^*$ for $k = 1, 2$.
Suppose that we set a = 0.74054, p1 = 0.7861, p2 = 0.1637, and p3 = 0.0502
(p4 = 0). Then
$$\begin{aligned} a\, p_1 + 0.97653\, p_2 + 0.87856\, p_3 &\ge 0.7860 &&\text{for } k = 1,\\ \tfrac{3}{4}\, p_1 + 0.93109\, p_2 + 0.87856\, p_3 &\ge 0.7860 &&\text{for } k = 2,\\ \zeta_k^a\, p_1 + \frac{2 \times 0.93109}{k}\, p_2 + \frac{3 \times 7}{8k}\, p_3 &\ge 0.7860 &&\text{for } k \ge 3. \end{aligned}$$
Thus this is a 0.7860-approximation algorithm. Note that the algorithm in Asano
and Williamson [1] picking the best solution returned by the three algorithms
corresponding to (1) $f_1^a$ with $a = \frac{3}{4}$ in Goemans and Williamson [3], (2) Feige
and Goemans [2], and (3) Karloff and Zwick [8] only achieves the performance
guarantee 0.7846.
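The constants above can be checked mechanically. The small script below (our own, not from the paper) evaluates the three combined bounds and minimizes over clause lengths k; for large k the bound tends to $p_1$, so a finite range suffices:

```python
a, p1, p2, p3 = 0.74054, 0.7861, 0.1637, 0.0502

def zeta(k):
    # zeta_1^a = a, zeta_2^a = 3/4, and zeta_k^a = 1 - a^(k-2)/4 for k >= 3.
    if k == 1:
        return a
    return 1.0 - a ** (k - 2) / 4.0

def combined(k):
    """Expected-satisfaction coefficient for length-k clauses when Algorithm (i)
    is chosen with probability p_i (here p4 = 0)."""
    if k == 1:
        return a * p1 + 0.97653 * p2 + 0.87856 * p3
    if k == 2:
        return zeta(2) * p1 + 0.93109 * p2 + 0.87856 * p3
    return zeta(k) * p1 + (2 * 0.93109 / k) * p2 + (3.0 / k) * (7.0 / 8.0) * p3

guarantee = min(combined(k) for k in range(1, 200))
print(f"performance guarantee: {guarantee:.4f}")
assert guarantee >= 0.7860
```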
Suppose next that we use three algorithms (1) f3a in Goemans and Williamson
[3], (2) Halperin and Zwick’s MAX 2SAT algorithm [6], and (3) Halperin and
Zwick’s MAX 3SAT algorithm [6] instead of Feige and Goemans [2] and Karloff
and Zwick [8]. If we set a = 0.739634, p1 = 0.787777, p2 = 0.157346, and
p3 = 0.054877, then we have
$$\begin{aligned} a\, p_1 + 0.9828\, p_2 + 0.9197\, p_3 &\ge 0.7877 &&\text{for } k = 1,\\ \tfrac{3}{4}\, p_1 + 0.9309\, p_2 + 0.9197\, p_3 &\ge 0.7877 &&\text{for } k = 2,\\ \zeta_k^a\, p_1 + \frac{2 \times 0.9309}{k}\, p_2 + \frac{3 \times 7}{8k}\, p_3 &\ge 0.7877 &&\text{for } k \ge 3. \end{aligned}$$
Thus we have a 0.7877-approximation algorithm for MAX SAT (note that the
performance guarantees of Halperin and Zwick’s MAX 2SAT and MAX 3SAT
algorithms are based on numerical evidence [6]).
Suppose finally that we use two algorithms (1) f4a in Goemans and
Williamson [3] and (4) Zwick’s MAX SAT algorithm with a conjectured performance guarantee 0.7977 [9]. If we set a = 0.907180, p1 = 0.343137 and
p4 = 0.656863 (p2 = p3 = 0), then the probability of clause Cj with k literals being satisfied can be shown to be at least 0.8353zj∗ for each k ≥ 1. Thus,
we can obtain a 0.8353-approximation algorithm for MAX SAT, provided that the conjectured performance guarantee 0.7977 of Zwick's MAX SAT algorithm [9,1] is true.
Remarks. As described above, algorithms based on f3a and f4a can be used as
a building block for designing an improved approximation algorithm for MAX
SAT. We have examined several other parameterized functions, including the ones in Asano and Williamson [1], and we believe that algorithms based on $f_3^a$ and $f_4^a$ are nearly the best such building blocks among functions that use an optimal solution $(y^*, z^*)$ to Goemans and Williamson's LP relaxation for MAX SAT.
Acknowledgments. I would like to thank Prof. B. Korte of Bonn University for inviting me to stay at his institute, where this work was done. I also thank
Dr. D.P. Williamson for useful comments. This work was supported in part by
21st Century COE Program: Research on Security and Reliability in Electronic
Society, Grant in Aid for Scientific Research of the Ministry of Education, Science, Sports and Culture of Japan, The Institute of Science and Engineering of
Chuo University, and The Telecommunications Advancement Foundation.
References
1. T. Asano and D.P. Williamson, Improved approximation algorithms for MAX SAT,
Journal of Algorithms 42, pp.173–202, 2002.
2. U. Feige and M.X. Goemans, Approximating the value of two prover proof systems,
with applications to MAX 2SAT and MAX DICUT, In Proc. 3rd Israel Symposium
on Theory of Computing and Systems, pp. 182–189, 1995.
3. M.X. Goemans and D.P. Williamson, New 3/4-approximation algorithms for the
maximum satisfiability problem, SIAM Journal on Discrete Mathematics 7, pp.
656–666, 1994.
4. M.X. Goemans and D.P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of
the ACM 42, pp. 1115–1145, 1995.
5. J. Håstad, Some optimal inapproximability results, In Proc. 28th ACM Symposium
on the Theory of Computing, pp. 1–10, 1997.
6. E. Halperin and U. Zwick, Approximation algorithms for MAX 4-SAT and rounding
procedures for semidefinite programs, Journal of Algorithms 40, pp. 184–211, 2001.
7. D.S. Johnson, Approximation algorithms for combinatorial problems, Journal of
Computer and Systems Science 9, pp. 256–278, 1974.
8. H. Karloff and U. Zwick, A 7/8-approximation algorithm for MAX 3SAT?, In Proc.
38th IEEE Symposium on the Foundations of Computer Science, pp. 406–415, 1997.
9. U. Zwick, Outward rotations: a tool for rounding solutions of semidefinite programming relaxations, with applications to MAX CUT and other problems, In Proc.
31st ACM Symposium on the Theory of Computing, pp. 679–687, 1999.
Certifying Unsatisfiability of Random 2k-SAT
Formulas Using Approximation Techniques
Amin Coja-Oghlan1 , Andreas Goerdt2 , André Lanka2 , and Frank Schädlich2
1
2
Humboldt-Universität zu Berlin, Institut für Informatik
Unter den Linden 6, 10099 Berlin, Germany
coja@informatik.hu-berlin.de
Technische Universität Chemnitz, Fakultät für Informatik
Straße der Nationen 62, 09107 Chemnitz, Germany
{goerdt,lanka,frs}@informatik.tu-chemnitz.de
Abstract. Let k be an even integer. We investigate the applicability of
approximation techniques to the problem of deciding whether a random
k-SAT formula is satisfiable. Let n be the number of propositional variables under consideration. First we show that if the number m of clauses
satisfies m ≥ Cnk/2 for a certain constant C, then unsatisfiability can
be certified efficiently using (known) approximation algorithms for MAX
CUT or MIN BISECTION. In addition, we present an algorithm based
on the Lovász ϑ function that within polynomial expected time decides
whether the input formula is satisfiable, provided m ≥ Cnk/2 . These
results improve previous work by Goerdt and Krivelevich [14]. Finally,
we present an algorithm that approximates random MAX 2-SAT within
expected polynomial time.
1 Introduction
The k-SAT problem is to decide whether a given k-SAT formula is satisfiable or
not. Since it is well known that the k-SAT problem is NP-complete for k ≥ 3,
it is natural to ask for algorithms that can handle random formulas efficiently.
Given a set of n propositional variables and a function c = c(n), a random k-SAT
instance is obtained by picking c k-clauses over the set of n variables uniformly at
random and independently of each other. Part of the recent interest in random
k-SAT is due to the interesting threshold behavior, in that there exist values
ck = ck (n) such that random k-SAT instances with at most (1 − ε)·ck ·n random
clauses are satisfiable with high probability, whereas for at least (1 + ε) · ck · n
random clauses we have unsatisfiability with high probability. (Here, “with high
probability” or “whp.” means “with probability tending to 1 as n, the number
of variables, tends to infinity”). In particular, according to current knowledge
ck = ck (n) lies in a bounded interval depending on k only. However, it is not
known whether the threshold really is a constant independent of n, cf. [10]. In
this paper, we are concerned with values of c(n) well above the threshold, and
the problem is to certify efficiently that a random formula is unsatisfiable.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 15–26, 2003.
© Springer-Verlag Berlin Heidelberg 2003
There are two different types of algorithms for deciding whether a random
k-SAT formula is satisfiable or not. First, there are algorithms that on any input formula have a polynomial running time, and that whp. give the correct
answer, “satisfiable” or “unsatisfiable”. However, with probability o(1), the algorithm may give an inconclusive answer. Hence, the algorithm never makes an
incorrect decision. We shall refer to algorithms of this type as efficient certification algorithms. Note that the trivial constant time algorithm always returning
“unsatisfiable” is not an efficient certification algorithm in our sense because it
gives an incorrect answer in some (rare) cases. Secondly, there are algorithms
that always answer correctly (either “satisfiable” or “unsatisfiable”), and that
applied to a random formula have a polynomial expected running time.
Let us emphasize that although an efficient certification algorithm may give
an inconclusive answer in some (rare) cases, such an algorithm is still complete
in the following sense. Given a random k-SAT instance such that the number
of clauses is above the satisfiability threshold, whp. the algorithm will indeed
give the correct answer (“unsatisfiable” in the present case). Note that no polynomial time algorithm can answer “unsatisfiable” on all unsatisfiable inputs;
completeness only refers to a subset whose probability tends to 1.
Any certification algorithm can be turned into a satisfiability algorithm that
answers correctly on any input, simply by invoking an enumeration procedure in
case that the efficient certification procedure gives an inconclusive answer. However, an algorithm obtained in this manner will not run in polynomial expected
time in general. For the probability of an inconclusive answer may be too large
(even though it is o(1)). Thus, asking for polynomial expected running time is
a rather strong requirement.
From [11] and [14] it is essentially known that for random k-SAT instances
with Poly(log n) · nk/2 clauses we can efficiently certify unsatisfiability, in case
of even k. For odd k we need n^{(k/2)+ε} random clauses. Hence, it is an obvious
problem to design algorithms that can certify unsatisfiability of random formulas efficiently for smaller numbers of clauses than given in [11,14]. To make
further progress on this question, new techniques seem to be necessary. Therefore, in this paper we investigate what various algorithmic techniques contribute
to the random k-SAT problem. We achieve some improvements for the case of
even k, removing the polylogarithmic factor and achieving an algorithm with a
polynomial expected running time.
Based on reductions from 4-SAT instances to instances of graph theoretic optimization problems we obtain efficient certification algorithms applying known
approximation algorithms for the case of at least C · n^2 4-clauses. Similar constructions involving approximation algorithms can be found in [6] or [13]. We
present two different certification algorithms. One applies the MAX CUT approximation algorithm of Goemans and Williamson [12]. The other one employs
the MIN BISECTION approximation algorithm of Feige and Krauthgamer [8].
Since the MAX CUT approximation algorithm is based on semidefinite programming, our first algorithm is not purely combinatorial. In contrast, the application
of the MIN BISECTION algorithm yields a combinatorial algorithm. We state
our result only for k = 4, but it seems to be only a technical matter to extend
it to arbitrary even numbers k and C · n^{k/2} clauses.
Moreover, we obtain the first algorithm for deciding satisfiability of random
k-SAT formulas with at least C · n^{k/2} random clauses in expected polynomial
time (k even). Indeed, the algorithm can even handle semirandom formulas, cf.
Sec. 4 for details. Since the algorithm is based on computing the Lovász number
ϑ, it is not purely combinatorial. The analysis is based on a recent estimate on
the probable value of the ϑ-function of sparse random graphs [4].
The paper [2] is also motivated by improving the n^{k/2} barrier. Further, in
[9] another algorithm is given that certifies unsatisfiability of random 2k-SAT
formulas consisting of at least Cn^{k/2} clauses with probability tending to 1 as
C → ∞.
Though the decision version of the 2-SAT problem (“given a 2-SAT formula,
is there a satisfying assignment?”) can be solved in polynomial time, the optimization version MAX 2-SAT (“given a 2-SAT formula, find an assignment that
satisfies the maximum number of clauses”) is NP-hard. Therefore, we present
an algorithm that approximates MAX 2-SAT in expected polynomial time. The
algorithm is based on a probabilistic analysis of Goemans’ and Williamson’s
semidefinite relaxation of MAX 2-SAT [12]. Concerning algorithms for worst
case instances cf. [7].
In Section 2 we give our certification algorithms and in Section 3 we state the
theorem crucial for their correctness. Section 4, which is independent of Sections
2 and 3, deals with the expected polynomial time algorithm. Finally, in Section
5 we consider the MAX 2-SAT problem.
2 Efficient Certification of Unsatisfiability
Given a set of n propositional variables, Var = Varn = {v1 , . . . , vn }, a literal over
Var is a variable vi or a negated variable ¬vi . A k-clause is an ordered k-tuple
l1 ∨ l2 ∨ . . . ∨ lk of literals such that the variables underlying the literals are
distinct. A k-SAT instance is a set of k-clauses. We think of a k-SAT instance as
C1 ∧ C2 ∧ . . . ∧ Cm where each Ci is a k-clause. Given a truth value assignment
a of Var, we can assign true or false to a k-SAT instance as usual. We let Ta
be the set of variables x with a(x) = true and Fa the set of variables x with
a(x) = false. The probability space Formn,k,p is the probability space of k-SAT
instances obtained by picking each k-clause with probability p independently.
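For intuition, Form_{n,k,p} can be sampled directly for small n by flipping a p-coin for each of the $(n)_k \cdot 2^k$ ordered k-clauses. This is an illustrative sketch of ours, not from the paper; the encoding of literals as signed integers is our assumption:

```python
import itertools
import random

def sample_form(n, k, p, rng):
    """Draw F from Form_{n,k,p}: each k-clause (an ordered k-tuple of literals
    over distinct variables) is included independently with probability p."""
    clauses = []
    for variables in itertools.permutations(range(1, n + 1), k):
        for signs in itertools.product((1, -1), repeat=k):
            if rng.random() < p:
                # literal i stands for v_i, literal -i for the negation of v_i
                clauses.append(tuple(s * v for s, v in zip(signs, variables)))
    return clauses

F = sample_form(n=8, k=4, p=3.0 / 8 ** 2, rng=random.Random(0))
print(len(F))  # the expected size is 2^4 * (8)_4 * p = 16 * 1680 * 3/64 = 1260
```

This direct enumeration is only feasible for tiny n; for the parameter range of the paper (p = C/n^2, large n) one would sample the binomial number of clauses first instead.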
A k-uniform hyperedge or simply k-tuple over the vertex set V is a vector
(x1 , x2 , . . . , xk ) where the xi ∈ V are all distinct. H = (V, E) is a k-uniform
hypergraph if E is a set of k-tuples over the vertex set V . In the context of
k-uniform hypergraphs we use the notion of type in the following sense: Let
X1 , X2 , . . . , Xk ⊆ V , a k-tuple (x1 , x2 , . . . , xk ) is of type (X1 , X2 , . . . , Xk ) if
we have for all i that xi ∈ Xi . A random hypergraph H ∈ HGn,k,p is obtained
by picking each of the possible (n)_k k-tuples with probability p, independently.
Let S be a set of k-clauses over the set of variables Var, as defined above.
The hypergraph H = (V, E) associated to S is defined by V = Var and
(x1 , x2 , x3 , . . . , xk ) ∈ E if and only if there is a k-clause l1 ∨ l2 ∨ . . .∨lk ∈ S such
that for all i, l_i = x_i or l_i = ¬x_i. In case of even k, the graph G = (V, E) associated to S is defined by V = {(x_1, . . . , x_{k/2}) | x_i ∈ Var and x_i ≠ x_j for i ≠ j} and {(x_1, x_2, . . . , x_{k/2}), (x_{(k/2)+1}, . . . , x_k)} ∈ E if and only if there is a k-clause l_1 ∨ l_2 ∨ . . . ∨ l_k ∈ S such that the variable underlying l_i is x_i.
The following asymptotic abbreviations are used: f (n) ∼s g(n) iff there
is an ε > 0 such that f (n) = g(n) · (1 + O(1/nε )). Here ∼s stands for strong
asymptotic equality. Similarly we use f (n) = so(g(n)) iff f (n) = O(1/nε )·g(n).
We say f (n) is negligible iff f (n) = so(1).
Parity properties analogous to the next theorem have been proved in [6] for
3-SAT instances with a linear number of clauses and in [13] for 4-SAT instances.
But in the proof of [13] it is important that the probability of each clause is p ≤
1/n^{2+ε} where ε > 0 is a constant. This implies that the number of occurrences
of two given literals in several clauses of a random formula is small. This is not
any more the case for p = C/n2 and some complications arise.
Theorem 1 (Parity Theorem). For a random F ∈ Formn,4,p where p =
C/n^2 and C is a sufficiently large constant, we can efficiently certify the following
properties.
(a) Let S ⊆ F be the subset of all clauses of F corresponding to one of
the 16 possibilities of placing negated and non-negated variables into the four
slots of clauses available. Let G = (V, E) be the graph associated to S. Then
|S| = C · n^2 · (1 + so(1)) and |E| = C · n^2 · (1 + so(1)).
(b) For all satisfying assignments a of F we have that |Ta | ∼s (1/2) · n and
|Fa | ∼s (1/2) · n.
(c) Let S be the set of clauses of F consisting only of non-negated variables.
Let H be the hypergraph associated to S. For all satisfying assignments a of F the
number of 4-tuples of H of each of the 8 types (Ta , Ta , Ta , Fa ), (Ta , Ta , Fa , Ta ),
(Ta , Fa , Ta , Ta ), (Fa , Ta , Ta , Ta ), (Fa , Fa , Fa , Ta ), (Fa , Fa , Ta , Fa ), (Fa , Ta ,
Fa, Fa), (Ta, Fa, Fa, Fa) is (1/8) · C · n^2 · (1 + so(1)). The same statement
applies when S is one of the remaining seven subsets of clauses of F which
have a given even number of negated variables in a given subset of the four slots
available.
(d) Let H be the hypergraph associated to those clauses of F whose first slot
contains a negated variable and whose remaining three slots contain non-negated
variables. The number of 4-tuples of H of each of the 8 types (Ta , Ta , Ta , Ta ),
(Ta , Ta , Fa , Fa ), (Ta , Fa , Ta , Fa ), (Ta , Fa , Fa , Ta ), (Fa , Fa , Fa , Fa ), (Fa , Fa ,
Ta, Ta), (Fa, Ta, Fa, Ta), (Fa, Ta, Ta, Fa) is (1/8) · C · n^2 · (1 + so(1)). A statement analogous to (c) applies.
The technical notion of the type of a 4-tuple of a hypergraph is defined above. Statement (b) means that we have an ε > 0 such that we can certify that all assignments a with |T_a| ≥ (1/2) · n · (1 + 1/n^ε) or |F_a| ≥ (1/2) · n · (1 + 1/n^ε) do not satisfy a random F. Similarly for the remaining statements. Of course
probabilistically there should be no satisfying assignment.
Given a graph G = (V, E), a cut is a partition of V into two subsets V1
and V2 . The MAX CUT problem is the problem to maximize the number of
crossing edges, that is the number of edges with one endpoint in V1 and the
other endpoint in V2 . There is a polynomial time approximation algorithm which,
given G, finds a cut such that the number of crossing edges is guaranteed to be
at least 0.87 · Opt(G), see [12]. Note that the algorithm is deterministic.
Algorithm 2. Certifies unsatisfiability. The input is a 4-SAT instance F .
1. Certify the properties as stated in Theorem 1.
2. Let S be the subset of all clauses of F containing only non-negated variables. We construct the graph G = (V, E) as defined above, associated to S.
3. Apply the MAX CUT approximation algorithm to G.
4. If the cut found in 3. contains at most 0.86 · |E| edges the output is
“unsatisfiable”, otherwise the algorithm gives an inconclusive answer.
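The steps above can be sketched in code. This is our own illustration: step 1 (certifying the Theorem 1 properties) is stubbed out, and the semidefinite-programming MAX CUT algorithm of [12] is replaced by a plain local-search heuristic — an assumption of ours, since the theorem's guarantee relies on the 0.87-approximation, which local search does not provide in general. The names `projection_graph`, `local_search_max_cut`, and `certify_unsat` are ours:

```python
import random

def projection_graph(positive_clauses):
    """Graph associated to the all-positive clauses: vertices are variable
    pairs; a clause (x1, x2, x3, x4) yields the edge {(x1, x2), (x3, x4)}."""
    return [((c[0], c[1]), (c[2], c[3])) for c in positive_clauses]

def local_search_max_cut(edges, rng):
    """Flip vertices while doing so increases the cut; returns the cut size.
    (Stand-in for the Goemans-Williamson MAX CUT approximation.)"""
    incident = {}
    for e in edges:
        for v in e:
            incident.setdefault(v, []).append(e)
    side = {v: rng.random() < 0.5 for v in incident}
    improved = True
    while improved:
        improved = False
        for v in incident:
            cross = sum(1 for x, y in incident[v] if side[x] != side[y])
            if 2 * cross < len(incident[v]):  # flipping v gains crossing edges
                side[v] = not side[v]
                improved = True
    return sum(1 for x, y in edges if side[x] != side[y])

def certify_unsat(positive_clauses, rng):
    edges = projection_graph(positive_clauses)
    cut = local_search_max_cut(edges, rng)
    # Step 4: a satisfiable formula would force a cut containing almost all edges.
    return "unsatisfiable" if cut <= 0.86 * len(edges) else "inconclusive"
```

On a clique-like projection graph no large cut exists, so the sketch answers "unsatisfiable"; on a near-bipartite projection graph the heuristic can cut all edges and the answer is inconclusive.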
Theorem 3. When applying Algorithm 2 to an F ∈ Form_{n,4,p}, where p = C/n^2 and C is sufficiently large, the algorithm efficiently certifies the unsatisfiability of
F.
Proof. To show that the algorithm is correct, let F be any satisfiable 4-SAT
instance. Let a be a satisfying truth value assignment of F . Then Theorem 1
(c) implies that G has a cut comprising almost all edges and the approximation
algorithm finds sufficiently many edges, so that we do not answer “unsatisfiable”. Completeness follows from Theorem 1 (c) and the fact that when C is a
sufficiently large constant any cut of G has at most slightly more than a fraction
of 1/2 of all edges with high probability.
⊓⊔
At this point we know that an algorithm efficiently certifying unsatisfiability exists, because our theorems and the considerations above show that suitable so(1)-terms exist.
Given a graph G = (V, E) where |V| is even, a bisection of G is a partition of V into two subsets V1 and V2 with |V1| = |V2| = |V|/2. The MIN
BISECTION problem is the problem to minimize the number of crossing edges.
There is a polynomial time approximation algorithm which, given G, finds a
bisection such that the number of crossing edges is guaranteed to be at most
O((log n)^2) · Opt(G), |V| = n, see [8].
Algorithm 4. Certifies unsatisfiability. The input is a 4-SAT instance F .
1. Certify the properties as stated in Theorem 1.
2. Let S be the subset of all clauses of F whose first literal is a negated
variable and whose remaining literals are non-negated variables. We construct
the graph G = (V, E) associated to this set S. Check if the maximal degree of
G is at most 3 · ln n.
3. Apply the MIN BISECTION approximation algorithm to G.
4. If the bisection found contains at least (1/3) · |E| edges, then the output
is “unsatisfiable”, otherwise inconclusive.
Theorem 3 applies analogously to Algorithm 4. Now, the proof relies on
Theorem 1 (d).
3 Proof of the Parity Theorem
We present the algorithms to prove Theorem 1. To deal with the problem of
multiple occurrences of pairs of variables in several clauses we need to work
with labelled (multi-)graphs and labelled (multi-)hypergraphs. Here the edges
between vertices are distinguished by labels.
Let H = (V, E) be a standard 4-uniform hypergraph. When speaking of the
projection of H onto coordinates 1 and 2 we think of H as a labelled multigraph in which the labelled edge {x1 , x2 }(x1 ,x2 ,x3 ,x4 ) is present if and only if
(x1 , x2 , x3 , x4 ) ∈ E. We denote this projection by G = (V, E).
Let e = |E| , V = {1, . . . , n}, X ⊆ V , and Y = V \ X. We denote the
number of labelled edges of G with one endpoint in X and the other endpoint in
Y by e(X, Y ). Similarly e(X) is the number of labelled edges with both endpoints
from X. In an asymptotic setting we use our terminology from Section 2 and
say that G has negligible discrepancy iff for all X ⊆ V with |X| = α · n, where β ≤ α ≤ 1 − β, and Y = V \ X we have e(X) ∼s e · α^2 and e(X, Y) ∼s 2 · e · α · (1 − α). Here
β > 0 is a constant. This extends the discrepancy notion from page 71ff. of
[3] to multigraphs. The n × n matrix A = A_G is the adjacency matrix of G, where A(x, y) is the number of labelled edges between x and y. As A is real valued and symmetric, A has n eigenvectors and corresponding real eigenvalues, which we consider ordered as $\lambda_{1,A} \ge \lambda_{2,A} \ge \cdots \ge \lambda_{n,A}$. We let $\lambda = \lambda_A = \max_{2 \le i \le n} |\lambda_{i,A}|$. In an asymptotic context we speak of strong eigenvalue separation with respect to a constant k. By this we mean that $\sum_{i=2}^{n} \lambda_i^k = so(\lambda_1^k)$. When k is even and constant, strong eigenvalue separation implies in particular that $\lambda^k = so(\lambda_1^k)$. It is known that for any $k \ge 0$, $\mathrm{Trace}(A^k) = \sum_{x=1}^{n} A^k(x, x) = \sum_{i=1}^{n} \lambda_{i,A}^k$. Moreover, $\mathrm{Trace}(A^k)$ is equal to the number of closed walks of length k, that is, k steps, in G.
The degree $d_x$ of the vertex x in G is the number of labelled edges in which x occurs. The n × n matrix L = L_G is a normalized adjacency matrix; it is related to the Laplacian matrix. We have $L(x, y) = A(x, y)/\sqrt{d_x d_y}$. As L = L_G is real valued and symmetric, too, we use all the eigenvalue notation introduced for A analogously for L. Here $\lambda_{1,L} = 1$ is known. Let d = d(n) be given. In an asymptotic context we say that G is almost d-regular if for any vertex x of G, $d_{x,G} = d(n) \cdot (1 + so(1))$. Theorem 5.1 and its corollaries on pages 72–73 of [3] imply the following fact.
Fact 5. Let G = (V, E), where V = {1, . . . , n}, be a projection onto two coordinates of the 4-uniform hypergraph H = (V, E) with e = |E|. Let G be almost d-regular, let β ≤ α ≤ 1 − β where β > 0 is a constant, and let X ⊆ V with |X| = αn. Then we have
(a) $|e(X) - e\alpha^2| \le \lambda_L \cdot e \cdot \alpha \cdot (1 + so(1))$,
(b) $|e(X, Y) - 2e\alpha(1 - \alpha)| \le \lambda_L \cdot 2 \cdot e \cdot \alpha \cdot (1 - \alpha) \cdot (1 + so(1))$ for Y = V \ X.
We need methods to estimate $\lambda_L$; they are provided by the next lemma.
Lemma 6. Let G be the projection onto two given coordinates of the 4-uniform
hypergraph H = (V, E) where V = {1, . . . , n}. If G is almost d-regular and
AG has strong eigenvalue separation with respect to a given constant k, then LG
has strong eigenvalue separation with respect to k.
Proof. Let W be the number of closed walks of length k in G. Then $W = \mathrm{Trace}(A^k)$ and $\mathrm{Trace}(L_G^k) = \sum_{x=1}^{n} L_G^k(x, x)$. An inductive argument shows that $\mathrm{Trace}(L_G^k) \le W \cdot (1/d)^k \cdot (1 + so(1))$. Then we get
$$\sum_{i=1}^{n} \lambda_{i,L_G}^k \le \left(\sum_{i=1}^{n} \lambda_{i,A_G}^k\right) \cdot \frac{1}{d^k} \cdot (1 + so(1)).$$
As $\lambda_{1,L_G} = 1$, whereas $\lambda_{1,A_G}^k = d^k \cdot (1 + so(1))$, we get that $\sum_{i=2}^{n} \lambda_{i,L_G}^k = so(1)$. Note that $\lambda_{1,A}$ is always at most the maximal degree of G and at least the minimal degree. ⊓⊔
We collect some probabilistic properties of labelled projections when H is a
random hypergraph. The proof follows known principles.
Lemma 7. Let p = c/n^2, where c is a sufficiently large constant, and let H = (V, E) be a hypergraph from HG_{n,4,p}. Let G = (V, E) be a labelled projection of H onto two coordinates. (a) Let d = d(n) = 2 · c · n. Then G is almost d-regular with probability at least $1 - e^{-\Omega(n^\varepsilon)}$ for a constant ε > 0. (b) The adjacency matrix A = A_G has strong eigenvalue separation with respect to k = 4 with high probability.
Algorithm 8. Efficiently certifies negligible discrepancy, with respect to a given constant β, of projection graphs. Input is a 4-uniform hypergraph H = (V, E). Let G = (V, E) be the projection onto two given coordinates of H. Check almost d-regularity of G and check for the adjacency matrix A of G whether $\mathrm{Trace}(A^4) = d^4 \cdot (1 + so(1))$.
The correctness of the algorithm follows from Fact 5; the completeness, when considering HG_{n,4,p} with p = C/n² and C sufficiently large, follows from Lemma 7 and Fact 5.
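As a concrete illustration of Algorithm 8's two checks (almost d-regularity and Trace(A⁴) = d⁴·(1 + so(1))), here is a hedged NumPy sketch. The function name, the single tolerance `slack` standing in for the implicit so(1) terms, and the acceptance threshold are illustrative choices, not taken from the paper:

```python
import numpy as np

def certify_projection(A, slack=0.05):
    """Sketch of Algorithm 8: accept the projection graph iff its degrees
    are within slack*d of their mean d and Trace(A^4) = d^4 * (1 + o(1)),
    with `slack` as a stand-in tolerance for the o(1) terms."""
    degrees = A.sum(axis=1)
    d = degrees.mean()
    if d == 0 or np.max(np.abs(degrees - d)) > slack * d:
        return False  # not almost d-regular
    # Trace(A^4) = ||A^2||_F^2, cheaper than forming A^4 explicitly.
    trace_A4 = np.linalg.norm(A @ A, 'fro') ** 2
    return bool(abs(trace_A4 / d**4 - 1.0) <= slack)
```

For the complete graph K_50 the trace equals 49⁴ + 49, so the ratio to d⁴ deviates from 1 by well under the tolerance and the check passes, while a star graph already fails the regularity test.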
We need to certify discrepancy properties of projections onto 3 given coordinates of a random 4-uniform hypergraph from HG_{n,4,p} where p = c/n². Let H = (V, E) be a standard 4-uniform hypergraph. When speaking of the projection of H onto coordinates 1, 2, and 3, we think of H as a labelled 3-uniform hypergraph G = (V, E) in which the labelled 3-tuple (x1, x2, x3)_{(x1,x2,x3,x4)} is present if (x1, x2, x3, x4) ∈ E. We restrict attention to the projection onto coordinates 1, 2, and 3 in the following. For X, Y, Z ⊆ V we define e_G(X, Y, Z) = |{(x, y, z, −) ∈ E | (x, y, z) is of type (X, Y, Z)}|. For the notion of type we refer to the beginning of Section 2. With n = |V| and e = |E|
we say that the projection G has negligible discrepancy with respect to β if
for all X with |X| = αn, β ≤ α ≤ 1 − β, and Y = V \X we have that
eG (X, X, X) ∼s α3 · e, eG (X, Y, X) ∼s α2 (1 − α) · e and analogously for the
remaining 6 possibilities of placing X and Y . For 1 ≤ i ≤ 3 and x ∈ V we let
dx,i be the number of 4-tuples in E which have x in the i’th slot. Given d = d(n),
we say that G is almost d-regular if and only if dx,i = d · (1 + so(1)) for all x ∈ V
and all i = 1, 2, 3. We assign labelled product graphs to G.
Definition 9 (Labelled product). Let G = (V, E) be the projection onto coordinates 1, 2, and 3 of the 4-uniform hypergraph H = (V, E). The labelled product of G with respect to the first coordinate is the labelled graph P = (W, F), where W = V × V and F is defined as follows: for x1, x2, y1, y2 ∈ V with (x1, y1) ≠ (x2, y2) we have {(x1, y1), (x2, y2)}_{(h,k)} ∈ F iff h = (z, x1, x2, −) ∈ E and k = (z, y1, y2, −) ∈ E and (!) h ≠ k.
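A small sketch of how the labelled edge set F of this product can be enumerated from the 4-tuples of E; the helper name and the tiny example edge set are hypothetical, chosen only to make the definition concrete:

```python
from itertools import combinations

def labelled_product_first_coord(E):
    """Labelled edges of the product w.r.t. the first coordinate:
    {(x1, y1), (x2, y2)} with label (h, k) whenever h = (z, x1, x2, -) and
    k = (z, y1, y2, -) are distinct 4-tuples of E sharing their first
    entry z. Returns a list of (endpoint-pair, label) pairs."""
    F = []
    for h, k in combinations(E, 2):      # h != k is guaranteed here
        for a, b in ((h, k), (k, h)):    # the label (h, k) is an ordered pair
            if a[0] == b[0]:             # shared first coordinate z
                F.append((((a[1], b[1]), (a[2], b[2])), (a, b)))
    return F
```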
If the projection G is almost d-regular, the number of labelled edges of the product is n · d² · (1 + so(1)), provided d ≥ n^ε for a constant ε > 0. Discrepancy notions for labelled products are totally analogous to those for labelled projection graphs defined above. Theorem 10 is an adaptation of Theorem 3.2 in [13].
Theorem 10. Let ε > 0 and d = d(n) ≥ n^ε. Let G = (V, E) with |V| = n be the labelled projection hypergraph onto coordinates 1, 2, and 3 of the 4-uniform hypergraph H = (V, E). Assume that G and H have the following properties.
1. G is almost d-regular.
2. The labelled projection graphs of H onto any two of the coordinates 1, 2, and 3 have negligible discrepancy with respect to β > 0.
3. The labelled products of G have negligible discrepancy with respect to β².
Then the labelled projection G has negligible discrepancy with respect to β.
Lemma 11. Let H = (V, E) be a random hypergraph from HG_{n,4,p} where p = c/n² and c is sufficiently large. Let G be the labelled projection of H onto the coordinates 1, 2, and 3. Let P = (W, F) be the labelled product with respect to the first coordinate of G. Then we have
(a) P is almost d-regular, where d = 2·c²·n, with probability 1 − n^{−Ω(log log n)}.
(b) The adjacency matrix A_P has strong eigenvalue separation with respect to k = 6.
Proof. (a) We consider the vertex (x1, y1) ∈ W. First, assume that x1 ≠ y1. We introduce the random variables
X_z = |{(z, x1, −, −) ∈ E}|, Y_z = |{(z, y1, −, −) ∈ E}|,
X′_z = |{(z, −, x1, −) ∈ E}|, Y′_z = |{(z, −, y1, −) ∈ E}|,
and finally D = Σ_z X_z · Y_z + Σ_z X′_z · Y′_z. Then D is the degree of the vertex (x1, y1) in the labelled product. The claim follows with Hoeffding's bound [16], page 104, Theorem 7. For x1 = y1 we can argue similarly.
(b) Applying standard techniques we get E[Trace(A_P^6)] = (2c²n)^6 + so(n^6), which with (a) implies strong eigenvalue separation with respect to k = 6 with high probability. □
Algorithm 12. Certifies negligible discrepancy of labelled projections onto 3 coordinates of 4-uniform hypergraphs. The input is a 4-uniform hypergraph H = (V, E). Let G = (V, E) be the projection of H onto the coordinates 1, 2, and 3.
1. Check if there is a suitable d such that G is almost d-regular. That is, check if d_{x,i} = d · (1 + so(1)) for all vertices x and all i = 1, 2, 3.
2. Check if the labelled projections onto any two of the coordinates 1, 2, 3 of H have negligible discrepancy. Apply Algorithm 8.
3. Check if the products of G are almost d-regular with d = 2c²n.
4. For each of the 3 labelled products P of G check if Trace(A_P^6) = (2c²n)^6 · (1 + so(1)), where A_P is the adjacency matrix of P.
5. Successful certification for G iff all checks are positive.
Correctness of the algorithm follows with Theorem 10. Completeness for HG_{n,4,p} with p = C/n² and C sufficiently large follows with Theorem 10 and Lemma 11, whose proof shows that the property concerning the trace holds with high probability and implies strong eigenvalue separation.
Now we can prove Theorem 1. Theorem 1 (a) is trivial. Concerning Theorem
1 (b) we consider the following algorithm.
Algorithm 13. Certifies Theorem 1 (b). The input is a 4-SAT instance F . Let
H = (V, E) be the hypergraph associated to the subset of clauses which consist
of unnegated variables only.
1. Check that the labelled projection of H onto coordinates 1, 2, 3 has negligible discrepancy.
2. Check that the labelled projection of H onto coordinates 2, 3, 4 has negligible discrepancy.
3. Do the same as 1. and 2. for the hypergraph associated to the clauses
consisting only of negated variables.
4. If all checks have been successful, certify Theorem 1 (b).
Let F be any 4-SAT instance such that the algorithm is successful. Let a
be an assignment with |Fa | ≥ (1/2) · n · (1 + δ) where δ = δ(n) > 0 is not
negligible in the sense of Section 2 (for example δ = 1/ log n). From Step 1 we
know that the fraction of 4-tuples of H of type (Fa , Fa , Fa , −) is ((1/2) · (1 +
δ))3 · (1 + so(1)). Under the assumption that a satisfies F , the empty slot is
filled with a variable from Ta . From Step 2 we know that the fraction of 4-tuples
of H of type (−, Fa , Fa , Ta ) is ((1/2) · (1 + δ))2 · (1/2) · (1 − δ). As δ is not
negligible this contradicts negligible discrepancy of the labelled projection onto
coordinates 2, 3, and 4 of H. In the same way we can exclude assignments with
more variables set to true than false because Step 3 is successful. Therefore the
algorithm is correct. For random F the hypergraphs constructed are random
hypergraphs and the completeness of Algorithm 12 implies the completeness of
the algorithm.
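The counting in this correctness argument can be mimicked directly: classify each 4-tuple of H by the truth values the assignment a gives its slots (F_a = variables set to false, T_a = true, "−" = unconstrained). The helper below and its pattern encoding are hypothetical, introduced only to illustrate the type fractions being compared:

```python
def type_fraction(tuples, false_set, pattern):
    """Fraction of 4-tuples matching `pattern`, a string over {'F','T','-'}:
    'F' = slot holds a variable assigned false, 'T' = assigned true,
    '-' = unconstrained slot."""
    def matches(t):
        return all(p == '-' or (p == 'F') == (v in false_set)
                   for p, v in zip(pattern, t))
    return sum(matches(t) for t in tuples) / len(tuples)
```

In the proof, the fraction of type (F_a, F_a, F_a, −) tuples forced into type (F_a, F_a, F_a, T_a) by satisfiability is what contradicts the certified discrepancy of the projection onto coordinates 2, 3, and 4.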
Concerning Theorem 1 (c) we consider the following algorithm.
Algorithm 14. Certifies parity properties. The input is a 4-SAT instance F.
1. Invoke Algorithm 13.
2. Let H be the hypergraph associated to the clauses of F consisting only of
non-negated variables.
3. Certify that all 4 labelled projections onto any 3 different coordinates of
H have negligible discrepancy (wrt. a suitable β > 0).
4. Certify that all 6 labelled projections onto any two coordinates of H have
negligible discrepancy.
5. Certify Theorem 1 (c) if all preceding checks are successful.
Correctness and completeness follow similarly as for the preceding algorithm. Those cases of Theorem 1 which remain open can be treated analogously, and the Parity Theorem is proved.
4 Deciding Satisfiability in Expected Polynomial Time
Let Var = Var_n = {x1, . . . , xn} be a set of variables, and let Form_{n,k,m} denote a k-SAT formula chosen uniformly at random among all (2n)^{k·m} possibilities. Further, we consider semirandom formulas Form⁺_{n,k,m}, which are made up of a random share and a worst-case part added by an adversary:
1. Choose F0 = C1 ∧ · · · ∧ Cm = Form_{n,k,m} at random.
2. An adversary picks any formula F = Form⁺_{n,k,m} over Var in which at least one copy of each Ci, i = 1, . . . , m, occurs.
Note that in general we cannot reconstruct F0 from F. We say that an algorithm A has a polynomial expected running time applied to Form⁺_{n,k,m} if the expected running time remains bounded by a polynomial in the input length regardless of the decisions of the adversary.
Theorem 15. Let k ≥ 4 be an even integer. Suppose that m ≥ C · 2^k · n^{k/2} for some sufficiently large constant C > 0. There exists an algorithm DecideSAT that satisfies the following conditions.
1. Let F be any k-SAT instance over Var. If F is satisfiable, then DecideSAT(F) will find a satisfying assignment. Otherwise DecideSAT(F) will output “unsatisfiable”.
2. Applied to Form⁺_{n,k,m}, DecideSAT runs in polynomial expected time.
DecideSAT exploits the following connection between the k-SAT problem and the maximum independent set problem. Let V = {1, . . . , n}^{k/2}, and ν = n^{k/2}. Given any k-SAT instance F over Var_n we define two graphs G_F = (V, E_F), G′_F = (V, E′_F) as follows. We let {(v1, . . . , v_{k/2}), (w1, . . . , w_{k/2})} ∈ E_F iff the k-clause x_{v1} ∨ · · · ∨ x_{v_{k/2}} ∨ x_{w1} ∨ · · · ∨ x_{w_{k/2}} occurs in F. Similarly, {(v1, . . . , v_{k/2}), (w1, . . . , w_{k/2})} ∈ E′_F iff the k-clause ¬x_{v1} ∨ · · · ∨ ¬x_{v_{k/2}} ∨ ¬x_{w1} ∨ · · · ∨ ¬x_{w_{k/2}} occurs in F. Let α(G) denote the independence number of a graph G.
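Following the definition, the edge set E_F can be built directly from the clause list; the signed-integer (DIMACS-style) clause encoding and the helper name are assumptions of this sketch, and E′_F is built the same way from the all-negative clauses:

```python
def build_GF(clauses, k):
    """Edge set of G_F: vertices are k/2-tuples over the variable indices;
    {u, w} is an edge iff the all-positive k-clause x_u1 ... x_w(k/2)
    occurs in F. Clauses are tuples of signed ints (positive = unnegated
    literal), so only clauses with every literal positive contribute."""
    half = k // 2
    EF = set()
    for c in clauses:
        if len(c) == k and all(lit > 0 for lit in c):
            u, w = tuple(c[:half]), tuple(c[half:])
            if u != w:  # no self-loops
                EF.add(frozenset((u, w)))
    return EF
```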
Lemma 16 ([14]). If F is satisfiable, then max{α(G_F), α(G′_F)} ≥ 2^{−k/2} · n^{k/2}.
Let G_{ν,μ} denote a graph with ν vertices and μ edges, chosen uniformly at random. We need the following slight extension of a lemma from [14].
Lemma 17. Let F ∈ Form_{n,k,m} be a random formula.
1. Conditioned on |E(G_F)| = μ, the graph G_F is uniformly distributed; i.e., G_F = G_{ν,μ}. A similar statement holds for G′_F.
2. Let ε > 0. Suppose that 2^k · n^{k/2} ≤ m ≤ n^{k−1}. Then with probability at least 1 − exp(−Ω(m)) we have min{|E(G_F)|, |E(G′_F)|} ≥ (1 − ε) · 2^{−k} · m.
Thus, our next aim is to bound the independence number of a semirandom graph efficiently. Let 0 ≤ μ ≤ ν(ν−1)/2. The semirandom graph G⁺_{ν,μ} is produced in two steps: First, choose a random graph G0 = G_{ν,μ}. Then, an adversary adds to G0 arbitrary edges, thereby completing G = G⁺_{ν,μ}. We employ the Lovász number ϑ, which can be seen as a semidefinite programming relaxation of the independence number. Indeed, ϑ(G) ≥ α(G) for any graph G, and ϑ(G) can be computed in polynomial time [15]. Our algorithm DecideMIS, which will output “typical” if the independence number of the input graph is “small”, and “not typical” otherwise, is based on ideas invented in [4,5].
Algorithm 18. DecideMIS(G, μ)
Input: A graph G of order ν, and a number μ. Output: “typical” or “not typical”.
1. If ϑ(G) ≤ C′ν(2μ)^{−1/2}, then terminate with output “typical”. Here C′ denotes some sufficiently large constant.
2. If there is no subset S of V, |S| = 25 ln(μ/ν)·ν/μ, such that |V \ (S ∪ N(S))| > 12ν(2μ)^{−1/2}, then output “typical” and terminate.
3. Check whether in G there is an independent set of size 12ν(2μ)^{−1/2}. If this is not the case, then output “typical”. Otherwise, output “not typical”.
Proposition 19. For any G, if DecideMIS(G, μ) outputs “typical”, then we have α(G) ≤ C′ν(2μ)^{−1/2}. Moreover, the probability that DecideMIS(G⁺_{ν,μ}, μ) outputs “not typical” is < exp(−ν). Applied to G⁺_{ν,μ}, DecideMIS has a polynomial expected running time, provided μ ≥ C″ν, for some constant C″ > 0.
Proof. The proof goes along the lines of [5] and is based on the following facts (cf. [4]): Whp. we have ϑ(G_{ν,μ}) ≤ c1·ν(2μ)^{−1/2}. Moreover, if M is a median of ϑ(G_{ν,μ}), and if ξ > 0, then Prob[ϑ(G_{ν,μ}) ≥ M + ξ] ≤ 30ν·exp(−ξ²/(5M + 10ξ)). To handle G⁺_{ν,μ}, we make use of the monotonicity of ϑ (cf. [15]). □
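Computing ϑ itself takes a semidefinite programming solver. As a hedged stand-in that shows the same "certify that α is small" pattern in a few lines, the Hoffman (ratio) eigenvalue bound upper-bounds α(G) for a d-regular graph by n·(−λ_min)/(d − λ_min); it is a weaker relaxation than ϑ, used here purely for illustration, and the function name is an invention of this sketch:

```python
import numpy as np

def hoffman_alpha_bound(A):
    """Ratio bound: for a d-regular graph with adjacency matrix A,
    alpha(G) <= n * (-lmin) / (d - lmin), where lmin is the smallest
    eigenvalue. A cheap spectral certificate in the spirit of step 1
    of DecideMIS (the paper uses the Lovasz theta function instead)."""
    n = A.shape[0]
    eig = np.linalg.eigvalsh(A)
    lmin, d = eig[0], eig[-1]
    return n * (-lmin) / (d - lmin)
```

On K_5 the bound is exactly 1 = α(K_5); on the 5-cycle it gives about 2.236 against the true α = 2.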
Algorithm 20. DecideSAT(F)
Input: A k-SAT formula F over Var_n.
Output: Either a satisfying assignment of F or “unsatisfiable”.
1. Let μ = 2^{−k−1}·m. If both DecideMIS(G_F, μ) and DecideMIS(G′_F, μ) answer “typical”, then terminate with output “unsatisfiable”.
2. Enumerate all 2^n assignments and look for a satisfying one.
Thus, Thm. 15 follows from Lemmas 16, 17 and Prop. 19.
5 Approximating Random MAX 2-SAT
Theorem 21. Suppose that m = C·x²·n for some large constant C > 0 and some constant x > 0. There is an algorithm ApxM2S that approximates MAX 2-SAT within a factor of 1 − 1/x for any formula C ∈ Form_{n,2,m} such that the expected running time of ApxM2S(Form_{n,2,m}) is polynomial.
The analysis of ApxM2S is based on the probabilistic analysis of the SDP relaxation SMS of MAX 2-SAT of Goemans and Williamson [12] (details omitted).
Algorithm 22. ApxM2S(C)
Input: An instance C ∈ Form_{n,2,m} of MAX 2-SAT.
Output: An assignment of x1, . . . , xn.
1. Check whether the assignment xi = true for all i satisfies at least 3m/4 − c1·√(mn) clauses of C. If this is not the case, then go to 3. Here c1 denotes some suitable constant.
2. Compute SMS(C). If SMS(C) ≤ 3m/4 + c2·√(mn), then output the assignment xi = true for all i and terminate. Here c2 denotes some suitable constant.
3. Enumerate all 2^n assignments of x1, . . . , xn and output an optimal solution.
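Step 1 is a pure counting check: in a random 2-SAT formula each clause is satisfied by the all-true assignment with probability 3/4, so the count concentrates around 3m/4. A hedged sketch with a signed-integer clause encoding; the value c1 = 1.0 is an arbitrary stand-in for the paper's unspecified constant:

```python
import math

def all_true_count(clauses):
    """Number of 2-clauses satisfied by the all-true assignment; a clause,
    given as a pair of signed ints (positive = unnegated literal), is
    satisfied iff it contains at least one positive literal."""
    return sum(any(lit > 0 for lit in c) for c in clauses)

def step1_passes(clauses, n, c1=1.0):
    """Step 1 of ApxM2S: accept iff the all-true assignment satisfies at
    least 3m/4 - c1*sqrt(m*n) clauses; c1 is an illustrative choice."""
    m = len(clauses)
    return all_true_count(clauses) >= 0.75 * m - c1 * math.sqrt(m * n)
```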
References
1. Alon, N., Spencer, J.: The Probabilistic Method. John Wiley and Sons 1992.
2. Ben-Sasson, E., Bilu, Y.: A Gap in Average Proof Complexity. ECCC 003 (2002).
3. Chung, F.R.K.: Spectral Graph Theory. American Mathematical Society 1997.
4. Coja-Oghlan, A.: The Lovász number of random graphs. Hamburger Beiträge zur Mathematik 169.
5. Coja-Oghlan, A., Taraz, A.: Colouring random graphs in expected polynomial time. Proc. STACS 2003, Springer LNCS 2607, 487–498.
6. Feige, U.: Relations between average case complexity and approximation complexity. Proc. 34th STOC (2002) 310–332.
7. Feige, U., Goemans, M.X.: Approximating the value of two prover proof systems, with applications to MAX 2SAT and MAX DICUT. Proc. 3rd Israel Symp. on Theory of Computing and Systems (1995) 182–189.
8. Feige, U., Krauthgamer, R.: A polylogarithmic approximation of the minimum bisection. Proc. 41st FOCS (2000) 105–115.
9. Feige, U., Ofek, E.: Spectral techniques applied to sparse random graphs. Report MCS03-01, Weizmann Institute of Science (2003).
10. Friedgut, E.: Necessary and Sufficient Conditions for Sharp Thresholds of Graph Properties and the k-SAT problem. J. Amer. Math. Soc. 12 (1999) 1017–1054.
11. Friedman, J., Goerdt, A.: Recognizing more Unsatisfiable Random 3-SAT Instances efficiently. Proc. ICALP 2001, Springer LNCS 2076, 310–321.
12. Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42 (1995) 1115–1145.
13. Goerdt, A., Jurdzinski, T.: Some Results on Random Unsatisfiable k-SAT Instances and Approximation Algorithms Applied to Random Structures. Proc. MFCS 2002, Springer LNCS 2420, 280–291.
14. Goerdt, A., Krivelevich, M.: Efficient recognition of random unsatisfiable k-SAT instances by spectral methods. Proc. STACS 2001, Springer LNCS 2010, 294–304.
15. Grötschel, M., Lovász, L., Schrijver, A.: Geometric algorithms and combinatorial optimization. Springer 1988.
16. Hofri, M.: Probabilistic Analysis of Algorithms. Springer 1987.
Inapproximability Results for Bounded Variants of Optimization Problems

Miroslav Chlebík¹ and Janka Chlebíková²⋆

¹ Max Planck Institute for Mathematics in the Sciences, Inselstraße 22-26, D-04103 Leipzig, Germany
² Christian-Albrechts-Universität zu Kiel, Institut für Informatik und Praktische Mathematik, Olshausenstraße 40, D-24098 Kiel, Germany
jch@informatik.uni-kiel.de
Abstract. We study small degree graph problems such as Maximum Independent Set and Minimum Node Cover and improve approximation lower bounds for them and for a number of related problems, like Max-B-Set Packing, Min-B-Set Cover, and Max-Matching in B-uniform 2-regular hypergraphs. For example, we prove NP-hardness factors of 95/94 for Max-3DM and 48/47 for Max-4DM; in both cases the hardness result applies even to instances with exactly two occurrences of each element.
1 Introduction
This paper deals with combinatorial optimization problems related to bounded variants of Maximum Independent Set (Max-IS) and Minimum Node Cover (Min-NC) in graphs. We improve approximation lower bounds for small degree variants of them and apply our results to even highly restricted versions of set covering, packing and matching problems, including Maximum 3-Dimensional Matching (Max-3DM).
It has been well known that Max-3DM is MAX SNP-complete (or APX-complete) even when restricted to instances with the number of occurrences of any element bounded by 3. To the best of our knowledge, the first inapproximability result for bounded Max-3DM with the bound 2 on the number of occurrences of any element in triples appeared in our paper [5], where the first explicit approximation lower bound for the Max-3DM problem is given. (For the less restricted matching problem Max 3-Set Packing, a similar inapproximability result for instances with 2 occurrences follows directly from hardness results for the Max-IS problem on 3-regular graphs [2], [3].) For the B-Dimensional Matching problem with B ≥ 4 lower bounds on approximability were recently proven by Hazan, Safra and Schwartz [12]. A limitation of their method, as they explicitly state, is that it does not provide an inapproximability factor for
⋆ The author has been supported by EU-Project ARACNE, Approximation and Randomized Algorithms in Communication Networks, HPRN-CT-1999-00112.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 27–38, 2003.
© Springer-Verlag Berlin Heidelberg 2003
3-Dimensional Matching. But it is precisely the inapproximability factor for the 3-dimensional case that is of major interest, as it allows improving the hardness of approximation factors for several problems of practical interest, e.g. scheduling problems, some (even highly restricted) cases of the Generalized Assignment problem, and other packing problems.
This fact, and the important role of small degree variants of the Max-IS (Min-NC) problem as intermediate steps in reductions to many other problems of interest, are good reasons for trying to push our technique to its limits. We build our reductions on a restricted version of Maximum Linear Equations over Z2 with 3 variables per equation and with a (large) constant number of occurrences of each variable. Recall that this method, based on Håstad's deep version of the PCP theorem, was also used to prove the (117/116 − ε)-approximability lower bound for the Traveling Salesman problem by Papadimitriou and Vempala [14], and for our lower bound of 96/95 for the Steiner Tree problem in graphs [6].
In this paper we optimize our equation gadgets and their coupling via a
consistency amplifier. The notion of consistency amplifier varies slightly from
problem to problem. Generally, they are graphs with suitable expanding (or
mixing) properties. Interesting quantities, in which our lower bounds can be
expressed, are parameters of consistency amplifiers that provably exist.
Let us explain how our inapproximability results for bounded variants of
Max-IS and Min-NC, namely B-Max-IS and B-Min-NC, imply the same
bounds for some set packing, set covering and hypergraph matching problems.
Max Set Packing (resp. Min Set Cover) is the following: Given a collection
C of subsets of a finite set S, find a maximum (resp., minimum) cardinality
collection C ′ ⊆ C such that each element in S is contained in at most one (resp.,
in at least one) set in C ′ . If each set in C is of size at most B, we speak about
B-Set Packing (resp. B-Set Cover).
It may be phrased also in hypergraph notation; the set of nodes is S and
elements of C are hyperedges. In this notation a set packing is just a matching
in the corresponding hypergraph. For a graph G = (V, E) we define its dual hypergraph Ĝ = (E, V̂) whose node set is just E, V̂ = {v̂ : v ∈ V}, and for each v ∈ V the hyperedge v̂ consists of all e ∈ E such that v ∈ e in G. The hypergraph Ĝ defined by this duality is clearly 2-regular: each node of Ĝ is contained in exactly two hyperedges. G is of maximum degree B iff Ĝ is of dimension B; in particular, G is B-regular iff Ĝ is B-uniform. Independent sets in G are in one-to-one correspondence with matchings in Ĝ (hence with set packings, in set-system notation), and node covers in G with set covers for Ĝ. Hence any approximation hardness result for B-Max-IS translates via this duality to one for Max-B-Set Packing (with exactly 2 occurrences), or to Max Matching in 2-regular B-dimensional hypergraphs. Similar is the relation of results on B-Min-NC to the Min-B-Set Cover problem.
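The duality above can be sketched in a few lines; the triangle below is a toy example and the helper name is hypothetical:

```python
def dual_hypergraph(V, E):
    """Dual of G=(V,E): node set is E; the hyperedge for v is the set of
    edges incident to v. Since every edge {u,v} of G lies in exactly the
    two hyperedges for u and v, the dual is always 2-regular."""
    return {v: frozenset(e for e in E if v in e) for v in V}
```

For the triangle, every dual node (an edge of G) lies in exactly two hyperedges, and each hyperedge has size deg_G(v) = 2, matching "G is B-regular iff the dual is B-uniform".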
If G is a B-regular edge-B-colored graph, then Ĝ is, moreover, B-partite with balanced B-partition determined by the corresponding color classes. Hence independent sets in such graphs correspond to B-dimensional matchings in a natural way. Hence any inapproximability result for the B-Max-IS problem restricted to B-regular edge-B-colored graphs translates directly to an inapproximability result for Max-B-Dimensional Matching (Max-B-DM), even on instances with exactly two occurrences of each element.
Our results for Max-3DM and Max-4DM nicely complement the recent results of [12] on Max-B-DM given for B ≥ 4. Comparing our results with theirs for B = 4, we have a better lower bound (48/47 vs. 54/53 − ε), and our result applies even to the highly restricted version with two occurrences. On the other hand, their hard gap result has almost perfect completeness.
The main new explicit NP-hardness factors of this contribution are summarized in the following theorem. In a more precise parametric way they are expressed in Theorems 3, 5, and 6. Better upper estimates on the parameters from these theorems immediately improve the lower bounds given below.
Theorem. It is NP-hard to approximate:
• Max-3DM and Max-4DM to within 95/94 and 48/47 respectively; both results apply to instances with exactly two occurrences of each element;
• 3-Max-IS (even on 3-regular graphs) and Max Triangle Packing (even on 4-regular line graphs) to within 95/94;
• 3-Min-NC (even on 3-regular graphs) and Min-3-Set Cover (with exactly two occurrences of each element) to within 100/99;
• 4-Max-IS (even on 4-regular graphs) to within 48/47;
• 4-Min-NC (even on 4-regular graphs) and Min-4-Set Cover (with exactly two occurrences) to within 53/52;
• B-Min-NC (B ≥ 3) to within 7/6 − 12 log B / B.
Preliminaries
Definition 1. Max-E3-Lin-2 is the following optimization problem: Given a system I of linear equations over Z2, with exactly 3 (distinct) variables in each equation, the goal is to maximize, over all assignments ϕ to the variables, the ratio sat(ϕ)/|I|, where sat(ϕ) is the number of equations of I satisfied by ϕ.
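The objective is straightforward to state in code; the equation encoding ((x, y, z), j) for "x + y + z = j over Z2" and the function name are conventions of this sketch, not of the paper:

```python
def sat_ratio(instance, phi):
    """sat(phi)/|I| for a Max-E3-Lin-2 instance. Each equation is encoded
    as ((x, y, z), j), meaning x + y + z = j over Z2; phi maps variable
    names to {0, 1}."""
    sat = sum((phi[x] + phi[y] + phi[z]) % 2 == j
              for (x, y, z), j in instance)
    return sat / len(instance)
```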
We use the notation Ek-Max-E3-Lin-2 for the same maximization problem, where each variable occurs exactly k times. The following theorem follows from Håstad's results [11]; see [5] for more details.
Theorem 1. For every ε ∈ (0, 1/4) there is a constant k(ε) such that for every k ≥ k(ε) the following problem is NP-hard: given an instance of Ek-Max-E3-Lin-2, decide whether the fraction of all equations satisfied by the optimal (i.e. maximizing) assignment is more than (1 − ε) or less than (1/2 + ε).
To use all properties of our equation gadgets, the order of variables in equations will play a role. We denote by E[k, k, k]-Max-E3-Lin-2 those instances
of E3k-Max-E3-Lin-2 for which each variable occurs exactly k times as the
first variable, k times as the second variable and k times as the third variable in
equations. Given an instance I0 of Ek-Max-E3-Lin-2 we can easily transform
it into an instance I of E[k, k, k]-Max-E3-Lin-2 with the same optimum, as
follows: for any equation x + y + z = j of I0 we put in I the triple of equations
x + y + z = j, y + z + x = j, and z + x + y = j. Hence the same NP-hard gap as
in Theorem 1 applies for E[k, k, k]-Max-E3-Lin-2 as well. We describe several
reductions from E[k, k, k]-Max-E3-Lin-2 to bounded occurrence instances of
NP-hard problems that preserve the hard gap of E[k, k, k]-Max-E3-Lin-2.
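The transformation from Ek- to E[k, k, k]-instances described above is just the emission of all three cyclic shifts of every equation; a minimal sketch using the hypothetical ((x, y, z), j) encoding:

```python
def to_Ekkk(equations):
    """Transform an Ek-Max-E3-Lin-2 instance into an E[k,k,k] one by
    emitting the three cyclic shifts of every equation; each variable then
    occurs equally often in every position and the optimum ratio is
    unchanged. Equations are encoded as ((x, y, z), j)."""
    out = []
    for (x, y, z), j in equations:
        out.extend([((x, y, z), j), ((y, z, x), j), ((z, x, y), j)])
    return out
```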
2 Consistency Amplifiers
As a parameter of our reduction for B-Max-IS (or B-Min-NC) (B ≥ 3) and Max-3DM, we will use a graph H, a so-called consistency 3k-amplifier, with the following structure:
(i) The degree of each node is at most B.
(ii) There are 3k pairs of contact nodes {(c^i_0, c^i_1) : i = 1, 2, . . . , 3k}.
(iii) The degree of any contact node is at most B − 1.
(iv) The first 2k pairs of contact nodes {(c^i_0, c^i_1) : i = 1, 2, . . . , 2k} are implicitly linked in the following sense: whenever J is an independent set in H, there is an independent set J′ in H such that |J′| ≥ |J|, a contact node c can belong to J′ only if c ∈ J, and for any i = 1, 2, . . . , 2k at most one node of the pair (c^i_0, c^i_1) belongs to J′.
(v) The consistency property: Let us denote C_j := {c^1_j, c^2_j, . . . , c^{3k}_j} for j ∈ {0, 1}, and M_j := max{|J| : J is an independent set in H such that J ∩ C_{1−j} = ∅}. Then M_0 = M_1 (=: M(H)), and for every ψ : {1, 2, . . . , 3k} → {0, 1} and for every independent set J in H \ {c^i_{1−ψ(i)} : i = 1, 2, . . . , 3k} we have |J| ≤ M(H) − min(|{i : ψ(i) = 0}|, |{i : ψ(i) = 1}|).
Remark 1. Let j ∈ {0, 1} and let J be any independent set in H \ C_{1−j} such that |J| = M(H); then J ⊇ C_j. To show this, assume that c^l_j ∉ J for some l ∈ {1, 2, . . . , 3k}. Define ψ : {1, 2, . . . , 3k} → {0, 1} by ψ(l) = 1 − j, and ψ(i) = j for i ≠ l. Now (v) above says |J| < M(H), a contradiction. Hence, in particular, C_j is an independent set in H.
To obtain better inapproximability results we use equation gadgets that require some further restrictions on the degrees of contact nodes of a consistency 3k-amplifier: (iii-1) For B-Max-IS, B ≥ 6, the degree of any contact node is at most B − 2. (iii-2) For B-Max-IS, B ∈ {4, 5}, the degree of any contact node c^i_j with i ∈ {1, . . . , k} is at most B − 1, and the degree of c^i_j with i ∈ {k + 1, . . . , 3k} is at most B − 2, where j = 0, 1.
For integers B ≥ 3 and k ≥ 1 let G_{B,k} stand for the set of corresponding consistency 3k-amplifiers. Let μ_{B,k} := min{M(H)/k : H ∈ G_{B,k}} and λ_{B,k} := min{(|V(H)| − M(H))/k : H ∈ G_{B,k}} (if G_{B,k} = ∅, let λ_{B,k} = μ_{B,k} = ∞), and set μ_B = lim_{k→∞} μ_{B,k} and λ_B = lim_{k→∞} λ_{B,k}. The parameters μ_B and λ_B play the role of quantities in which our inapproximability results for B-Max-IS and B-Min-NC can be expressed. Obtaining explicit lower bounds on approximability requires finding upper bounds on those parameters.
In what follows we describe some methods by which consistency 3k-amplifiers can be constructed. We will confine ourselves to highly regular amplifiers. This ensures that our inapproximability results apply to B-regular graphs for small values of B. We will look for a consistency 3k-amplifier H as a bipartite graph with bipartition (D0, D1), where C0 ⊆ D0, C1 ⊆ D1 and |D0| = |D1|. The idea is that if |D_j| (j = 0, 1) is significantly larger than 3k (= |C_j|), then a suitable probabilistic model of constructing bipartite graphs with bipartition (D0, D1) and prescribed degrees will produce with high probability a graph H with good “mixing properties” that ensure the consistency property with M(H) = |D_j|.
We will not develop the probabilistic model here; rather we will rely on what has already been proved (using similar methods) for amplifiers. The starting point of our construction of consistency 3k-amplifiers are amplifiers, which were studied by Berman & Karpinski [3], [4] and Chlebík & Chlebíková [5].
Definition 2. A graph G = (V, E) is a (2, 3)-graph if G contains only nodes of degree 2 (contacts) and degree 3 (checkers). We denote Contacts = {v ∈ V : deg_G(v) = 2} and Checkers = {v ∈ V : deg_G(v) = 3}. Furthermore, a (2, 3)-graph G is an amplifier if for every A ⊆ V: |Cut A| ≥ |Contacts ∩ A| or |Cut A| ≥ |Contacts \ A|, where Cut A = {{u, v} ∈ E : exactly one of the nodes u and v is in A}. An amplifier G is called a (k, τ)-amplifier if |Contacts| = k and |V| = τk.
To simplify proofs, we will use in our constructions only (k, τ)-amplifiers which contain no edge between contact nodes. Recall that the infinite families of amplifiers with τ = 7 [3], and even with τ ≤ 6.9 constructed in [5], are of this kind.
The consistency 3k-amplifier for B = 3. Let a (3k, τ)-amplifier G = (V(G), E(G)) from Definition 2 be fixed, and let x1, . . . , x3k be its contact nodes. We assume, moreover, that there is a matching in G on the nodes V(G) \ {x_{2k+1}, . . . , x_{3k}}. Let us point out that both the wheel-amplifiers with τ = 7 [3] and their generalization given in [5] with τ ≤ 6.9 clearly contain such matchings.
Let one such matching M ⊆ E(G) be fixed from now on. Each node x ∈ V (G)
is replaced with a small gadget Ax . The gadget of x ∈ V (G) \ {x2k+1 , . . . , x3k }
is a path of 4 nodes x0 , X1 , X0 , x1 (in this order). For x ∈ {x2k+1 , . . . , x3k } we
take as Ax a pair of nodes x0 , x1 without an edge. Denote Ex := {x0 , x1 } for
each x ∈ V (G), and Fx := {X0 , X1 } for x ∈ V (G) \ {x2k+1 , . . . , x3k }. The union
of gadgets Ax (over all x ∈ V (G)) contains already all nodes of our consistency
3k-amplifier H, and some of its edges. Now we identify the remaining edges of H.
For each edge {x, y} of G we connect corresponding gadgets Ax , Ay with a pair
of edges in H, as follows: if {x, y} ∈ M, we connect X0 with Y1 and X1 with Y0 ;
if {x, y} ∈ E(G) \ M, we connect x0 with y1 , and x1 with y0 .
Having done this, one after another for each edge {x, y} ∈ E(G), we obtain the consistency 3k-amplifier H = (V(H), E(H)) with contact nodes x^i_j determined by the contact nodes x^i of G, for j ∈ {0, 1}, i ∈ {1, 2, . . . , 3k}. The proof of all conditions from the definition of a consistency 3k-amplifier can be found in [7]. Hence, μ_3 ≤ 40.4 and λ_3 ≤ 40.4 follow from this construction.
The construction of the consistency amplifier for B = 4 is similar and can also be found in [7]. In this case μ_4 ≤ 21.7 and λ_4 ≤ 21.7 follow from the construction.
We do not try to optimize our estimates for B ≥ 5 in this paper; we are mainly focused on the cases B = 3 and B = 4. For larger B we provide our inapproximability
results based on small degree amplifiers constructed above. Of course, one can
expect that amplifiers with much better parameters can be found for these cases
by suitable constructions. We only slightly change the consistency 3k-amplifier
H constructed for case B = 4 to get some (very small) improvement for B ≥ 5
case. Namely, also for x ∈ {xk+1 , xk+2 , . . . , x2k } we take as Ax a pair of nodes
connected by an edge. The corresponding ci0 , ci1 nodes of H will have degree 3 in
H, but we will have now M (H) = 3τ k. The same proof of consistency for H will
work. This consistency amplifier H will be clearly simultaneously a consistency
3k-amplifier for any B ≥ 5. In this way we get the upper bound µB ≤ 20.7,
λB ≤ 20.7 for any B ≥ 5.
3 The Equation Gadgets
In the reduction to our problems we use the equation gadgets G_j for equations x + y + z = j, j = 0, 1. To obtain better inapproximability results, we use slightly modified equation gadgets for distinct values of B in the B-Max-IS problem (or B-Min-NC problem). For j ∈ {0, 1} we define equation gadgets G_j[3] for the 3-Max-IS problem (Fig. 1), G_j[4] for 4(5)-Max-IS (Fig. 2(i)), and G_j[6] for B-Max-IS, B ≥ 6 (Fig. 2(ii)). In each case the gadget G_1[∗] can be obtained from G_0[∗] by replacing each i ∈ {0, 1} in indices and labels by 1 − i.
For each u ∈ {x, y, z} we denote by F_u the set of all accented u-nodes from G_j (hence F_u is a subset of {u′_0, u′_1, u″_0, u″_1}), and F_u := ∅ if G_j does not contain any accented u-node; T_u := F_u ∪ {u_0, u_1}. For a subset A of nodes of G_j and any independent set J in G_j we will say that J is pure in A if all nodes of A ∩ J have the same lower index (0 or 1). If, moreover, A ∩ J consists of exactly all nodes of A of one index, we say that J is full in A.
The following theorem describes basic properties of the equation gadgets; the proof can be found in [7].
Theorem 2. Let G_j (j ∈ {0, 1}) be one of the following gadgets: G_j[3], G_j[4], or G_j[6], corresponding to an equation x + y + z = j. Let J be an independent set in G_j such that for each u ∈ {x, y} at most one of the two nodes u_0 and u_1 belongs to J. Then there is an independent set J′ in G_j with the following properties:
(I) |J′| ≥ |J|,
(II) for each u ∈ {x, y} it holds that J′ ∩ {u_0, u_1} = J ∩ {u_0, u_1},
(III) J′ ∩ {z_0, z_1} ⊆ J ∩ {z_0, z_1} and |J′ ∩ {z_0, z_1}| ≤ 1,
(IV) J′ contains (exactly) one special node, say ψ(x)ψ(y)ψ(z). Furthermore, J′ is pure in T_u and full in F_u.
Inapproximability Results for Bounded Variants of Optimization Problems
Fig. 1. The equation gadget G0 := G0 [3] for 3-Max-IS and Max-3DM.
Fig. 2. The equation gadget (i) G0 := G0 [4] for B-Max-IS, B ∈ {4, 5}, (ii) G0 := G0 [6]
for B-Max-IS (B ≥ 6).
4 Reduction for B-Max-IS and B-Min-NC
For arbitrarily small fixed ε > 0 consider k large enough such that the conclusion
of Theorem 1 for E[k, k, k]-Max-E3-Lin-2 is satisfied. Further, let a consistency
3k-amplifier H have M (H)/k (resp. (|V (H)| − M (H))/k) as close to µB (resp.
λB ) as we need. Keeping one consistency 3k-amplifier H fixed, our reduction
f (= fH ) from
E[k, k, k]-Max-E3-Lin-2 to B-Max-IS (resp., B-Min-NC) is as follows: Let I
be an instance of E[k, k, k]-Max-E3-Lin-2, V(I) be the set of variables of I,
m := |V(I)|. Hence I has mk equations, and each variable u ∈ V(I) occurs in
exactly 3k of them: k times as the first variable, k times as the second one, and
k times as the third variable. Assume, for convenience, that the equations
are numbered by 1, 2, . . . , mk. Given a variable u ∈ V(I) and s ∈ {1, 2, 3}, let
r_s^1(u) < r_s^2(u) < · · · < r_s^k(u) be the numbers of the equations in which the
variable u occurs as the s-th variable. On the other hand, if for fixed
r ∈ {1, 2, . . . , mk} the r-th equation is x + y + z = j (j ∈ {0, 1}), there are
uniquely determined numbers i(x, r), i(y, r), i(z, r) ∈ {1, 2, . . . , k} such that
r_1^{i(x,r)}(x) = r_2^{i(y,r)}(y) = r_3^{i(z,r)}(z) = r.
Take m disjoint copies of H, one for each variable. Let Hu denote the copy of
H that corresponds to a variable u ∈ V(I). The corresponding contacts in Hu are
denoted by Cj (u) = {u_j^i : i = 1, 2, . . . , 3k}, j = 0, 1. Now we take mk
disjoint copies of equation gadgets Gr , r ∈ {1, 2, . . . , mk}. More precisely, if the
r-th equation reads as x + y + z = j (j ∈ {0, 1}) we take as Gr a copy of Gj [3]
for 3-Max-IS (or Gj [4] for 4(5)-Max-IS or Gj [6] for B-Max-IS, B ≥ 6). Then
the nodes x0 , x1 , y0 , y1 , z0 , z1 of Gr are identified with the nodes
x_0^{i(x,r)}, x_1^{i(x,r)} (of Hx ), y_0^{k+i(y,r)}, y_1^{k+i(y,r)} (of Hy ), and
z_0^{2k+i(z,r)}, z_1^{2k+i(z,r)} (of Hz ), respectively.
It means that in each Hu the first k-tuple of pairs of contacts corresponds to
the occurrences of u as the first variable, the second k-tuple to the occurrences
as the second variable, and the third one to the occurrences as the last
variable. Making the above identification for all equations, one after another,
we get a graph of degree at most B, denoted by f (I). Clearly, the above
reduction f (using the fixed H as a parameter) to special instances of B-Max-IS
is polynomial. It can be proved that the NP-hard gap of E[k, k, k]-Max-E3-Lin-2
is preserved ([7]).
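The index bookkeeping of this reduction can be sketched in a few lines of Python. This is a hypothetical illustration of ours (the names `occurrence_tables` and `gadget_indices` are not from the paper): for an E[k, k, k] instance given as a list of equations (x, y, z, j), it recovers the occurrence numbers r_s^i(u) and, for each equation r, the uniquely determined ranks i(x, r), i(y, r), i(z, r).

```python
from collections import defaultdict

def occurrence_tables(equations, variables, k):
    """For each variable u and position s in {1,2,3}, collect the (1-based)
    numbers of the equations in which u occurs as the s-th variable.
    In an E[k,k,k] instance each list has length exactly k and is increasing,
    so r[(u, s)][i-1] plays the role of r_s^i(u)."""
    r = defaultdict(list)
    for num, (x, y, z, j) in enumerate(equations, start=1):
        for s, u in enumerate((x, y, z), start=1):
            r[(u, s)].append(num)
    for u in variables:
        for s in (1, 2, 3):
            assert len(r[(u, s)]) == k, "not an E[k,k,k] instance"
    return r

def gadget_indices(equations, r):
    """For the r-th equation x + y + z = j, recover the uniquely determined
    (i(x,r), i(y,r), i(z,r)): the rank of r within the occurrence lists."""
    out = {}
    for num, (x, y, z, j) in enumerate(equations, start=1):
        out[num] = tuple(r[(u, s)].index(num) + 1
                         for s, u in enumerate((x, y, z), start=1))
    return out
```

For instance, with k = 1 and three variables each occurring once in every position, every equation receives the rank triple (1, 1, 1).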
The following main theorem summarizes the results.
Theorem 3. It is NP-hard to approximate: the solution of 3-Max-IS to within
any constant smaller than 1 + 1/(2µ3 + 13); for B ∈ {4, 5} the solution of
B-Max-IS to within any constant smaller than 1 + 1/(2µB + 3); and the solution
of B-Max-IS, B ≥ 6, to within any constant smaller than 1 + 1/(2µB + 1).
Similarly, it is NP-hard to approximate the solution of 3-Min-NC to within any
constant smaller than 1 + 1/(2λ3 + 18); for B ∈ {4, 5} the solution of B-Min-NC
to within any constant smaller than 1 + 1/(2λB + 8); and the solution of
B-Min-NC, B ≥ 6, to within any constant smaller than 1 + 1/(2λB + 6).
Using our upper bounds given for µB , λB for distinct values of B we obtain:
Corollary 1. It is NP-hard to approximate the solution of 3-Max-IS to within
1.010661 (> 95/94); the solution of 4-Max-IS to within 1.0215517 (> 48/47); the
solution of 5-Max-IS to within 1.0225225 (> 46/45); and the solution of
B-Max-IS, B ≥ 6, to within 1.0235849 (> 44/43). Similarly, it is NP-hard to
approximate the solution of 3-Min-NC to within 1.0101215 (> 100/99); the
solution of 4-Min-NC to within 1.0194553 (> 53/52); the solution of 5-Min-NC
to within 1.0202429 (> 51/50); and B-Min-NC, B ≥ 6, to within 1.021097
(> 49/48). For each B ≥ 3, the corresponding result applies to B-regular graphs
as well.
5 Asymptotic Approximability Bounds
This paper focuses mainly on graphs of very small degree. In this section we
also discuss the asymptotic relation between hardness of approximation and
degree for the Independent Set and Node Cover problems in bounded degree
graphs.
For the Independent Set problem in the class of graphs of maximum degree B,
the problem is known to be approximable with performance ratio arbitrarily
close to (B + 3)/5 (Berman & Fujito [2]). But asymptotically better ratios can
be achieved by polynomial algorithms; currently the best one approximates to
within a factor of O(B log log B/ log B), as follows from [1], [13]. On the
other hand, Trevisan [15] has proved that it is NP-hard to approximate the
solution to within B/2^{O(√(log B))}.
For the Node Cover problem the situation is more challenging, even in
general graphs. A recent result of Dinur and Safra [10] shows that for any
δ > 0 the Minimum Node Cover problem is NP-hard to approximate to within
10√5 − 21 − δ. One can observe that their proof also yields a hardness result
for graphs with (very large) bounded degree B(δ). This follows from the fact
that after their use of Raz's parallel repetition (where each variable appears
in only a constant number of tests), the degree of the produced instances is
bounded by a function of δ. But the dependence of B(δ) on δ in their proof is
really very complicated. The earlier 7/6 − δ lower bound proved by Håstad [11]
was extended by Clementi & Trevisan [9] to graphs with bounded degree B(δ).
Our next result improves on theirs: it has a better trade-off between
non-approximability and the degree bound. There are no hidden constants in our
asymptotic formula, and it provides good explicit inapproximability results
for degree bounds B starting from a few hundred. First we need to introduce
some notation.
Notation. Denote F (x) := −x log x − (1 − x) log(1 − x), x ∈ (0, 1), where
log means the natural logarithm. Further, G(c, t) := (F (t) + F (ct))/(F (t) −
ctF (1/c)) for 0 < t < 1/c < 1, and g(t) := G((1 − t)/t, t) for t ∈ (0, 1/2).
More explicitly,
g(t) = 2[−t log t − (1 − t) log(1 − t)]/[−2(1 − t) log(1 − t) + (1 − 2t) log(1 − 2t)].
Using the Taylor series of the logarithm near 1 we see that the denominator
here is t² · Σ_{k=0}^{∞} ((2^{k+2} − 2)/((k + 1)(k + 2))) t^k > t², and
−(1 − t) log(1 − t) = t − t² Σ_{k=0}^{∞} (1/((k + 1)(k + 2))) t^k < t;
consequently g(t) < (2/t)(1 + log(1/t)).
For large enough B we look for δ ∈ (0, 1/6) such that 3⌊g(δ/2)⌋ + 3 ≤ B. As
g(1/12) ≈ 75.62 and g is decreasing on (0, 1/2), we can see that for B ≥ 228
any δ > δB := 2g^{−1}(⌊B/3⌋) will do. Trivial estimates on δB (using
g(t) < (2/t)(1 + log(1/t))) are δB < (12/(B − 3))(log(B − 3) + 1 − log 6) <
12 log B / B.
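The quantities above are easy to check numerically. The small Python sketch below (ours, for verification only; the function names are not from the paper) evaluates g, confirms g(1/12) ≈ 75.62 and hence the degree threshold 3⌊g(1/12)⌋ + 3 = 228, and checks the crude bound g(t) < (2/t)(1 + log(1/t)).

```python
import math

def F(x):
    """Natural-log entropy F(x) = -x log x - (1-x) log(1-x), x in (0, 1)."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def g(t):
    """g(t) = 2[-t log t - (1-t) log(1-t)] /
              [-2(1-t) log(1-t) + (1-2t) log(1-2t)], for t in (0, 1/2)."""
    assert 0 < t < 0.5
    den = -2 * (1 - t) * math.log(1 - t) + (1 - 2 * t) * math.log(1 - 2 * t)
    return 2 * F(t) / den

# g(1/12) ≈ 75.62, hence the threshold 3*floor(g(1/12)) + 3 = 228
assert abs(g(1 / 12) - 75.62) < 0.01
assert 3 * math.floor(g(1 / 12)) + 3 == 228

# the crude upper bound g(t) < (2/t)(1 + log(1/t)) used to estimate δ_B
for t in (0.4, 0.2, 0.1, 1 / 12, 0.01):
    assert g(t) < (2 / t) * (1 + math.log(1 / t))
```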
We will need the following lemma about regular bipartite expanders to prove
Theorem 4 (see [7] for proofs).
Lemma 1. Let t ∈ (0, 1/2) and let d be an integer for which d > g(t). For every
sufficiently large positive integer n there is a d-regular n by n bipartite
graph H with bipartition (V0 , V1 ), such that for each independent set J in H
either |J ∩ V0 | ≤ tn or |J ∩ V1 | ≤ tn.
Theorem 4. For every δ ∈ (0, 1/6) it is NP-hard to approximate Minimum Node
Cover to within 7/6 − δ even in graphs of maximum degree ≤ 3⌊g(δ/2)⌋ + 3 ≤
3⌈(4/δ)(1 + log(2/δ))⌉. Consequently, for any B ≥ 228 it is NP-hard to
approximate B-Min-NC to within any constant smaller than 7/6 − δB , where
δB := 2g^{−1}(⌊B/3⌋) < (12/(B − 3))(log(B − 3) + 1 − log 6) < 12 log B / B.
Typically, the methods used for asymptotic results cannot be used for small
values of B to achieve interesting lower bounds. Therefore we work on new
techniques that improve the results of Berman & Karpinski [3] and Chlebı́k &
Chlebı́ková [5].
6 Max-3DM and Other Problems
Clearly, the restriction of the B-Max-IS problem to edge-B-colored B-regular
graphs is a subproblem of Maximum B-Dimensional Matching (see [5] for
more details). Hence we want to prove that our reduction to the B-Max-IS
problem can produce edge-B-colored B-regular graphs as instances. In this
contribution we present results for B = 3, 4. For the equation x + y + z = j
(j ∈ {0, 1}) of E[k, k, k]-Max-E3-Lin-2 we will use an equation gadget Gj [B],
see Fig. 1 and Fig. 2(i). The basic properties of these gadgets are described
in Theorem 2.
Maximum 3-Dimensional Matching
As follows from Fig. 1, a gadget G0 [3] can be edge-3-colored by colors a, b, c
in such a way that all edges adjacent to nodes of degree one (contacts) are
colored by one fixed color, say a (for G1 [3] we take the corresponding
analogy). As an amplifier of our reduction f = fH from E[k, k, k]-Max-E3-Lin-2
to Max-3DM we use a consistency 3k-amplifier H ∈ G3,k with some additional
properties: the degree of any contact node is exactly 2, the degree of any
other node is 3, and moreover, the graph H is edge-3-colorable by colors
a, b, c in such a way that all edges adjacent to contact nodes are colored by
the two colors b and c. Let G3DM,k ⊆ G3,k be the class of all such amplifiers.
Denote µ3DM,k := min{M (H)/k : H ∈ G3DM,k } and µ3DM := lim_{k→∞} µ3DM,k .
We use the same construction for consistency 3k-amplifiers as was presented
for 3-Max-IS, but now we have to show that the produced graph H fulfills the
conditions on the coloring of edges. For a fixed (3k, τ)-amplifier G and the
matching M ⊆ E(G) of nodes V (G) \ {x2k+1 , . . . , x3k } we define an edge
coloring in two steps:
(i) Take, preliminarily, the following edge coloring: for each {x, y} ∈ M we
color the corresponding edges in H as depicted in Fig. 3(i). The remaining
edges of H are easily 2-colored by colors b and c, as the rest of the graph is
bipartite and of degree at most 2. So, we have a proper edge-3-coloring, but
some edges adjacent to contacts are colored by color a. This happens exactly
when x ∈ {x1 , x2 , . . . , x2k }, {x, y} ∈ M. (We assume that no two contacts of
G are adjacent, hence y is a checker node of G.) Clearly, one can ensure that
in the above extension of the coloring by colors c and b both other edges
adjacent to x0 and x1 have the same color. (ii) Now we modify our edge
coloring in all these violating cases as follows. Fix x ∈ {x1 , . . . , x2k },
{x, y} ∈ M, and let both other edges adjacent to x0 and x1 have assigned color
b. Then change the coloring according to Fig. 3(ii). The case when both edges
have assigned color c can be solved analogously (see Fig. 3(iii)). From the
construction it follows that µ3DM ≤ 40.4.
Fig. 3. a color: dashed line, b color: dotted line, c color: solid line

Keeping one such consistency 3k-gadget H fixed, our reduction f (= fH )
from E[k, k, k]-Max-E3-Lin-2 is exactly the same as for B-Max-IS described
in Section 4. Let us fix an instance I of E[k, k, k]-Max-E3-Lin-2 and consider
an instance f (I) of 3-Max-IS. As f (I) is an edge-3-colored 3-regular graph,
it is at the same time an instance of 3DM with the same objective function. We
can show that the NP-hard gap of E[k, k, k]-Max-E3-Lin-2 is preserved in
exactly the same way as for 3-Max-IS. Consequently it is NP-hard to
approximate the solution of Max-3DM to within 1 + (1 − 4ε)/(2M (H)/k + 13 + 2ε),
even on instances with each element occurring in exactly two triples.
Maximum 4-Dimensional Matching
We will use the following edge-4-coloring of our gadget G0 [4] in Fig. 2(i)
(analogously for G1 [4]): a-colored edges {x′0 , 101}, {x′1 , 011}, {y1 , 000},
{y0 , 110}; b-colored edges {x′0 , 110}, {x′1 , 000}, {y1 , 101}, {y0 , 011};
c-colored edges {x1 , x′0 }, {x0 , x′1 }, {101, 110}, {z0 , 011}, {z1 , 000};
d-colored edges {x′0 , x′1 }, {000, 011}, {z0 , 101}, {z1 , 110}. Now we will
show that there exists an edge-4-coloring of a consistency 3k-amplifier H
that fits well with the above coloring of the equation gadgets. We suppose
that the (3k, τ)-amplifier G from which H was constructed has a matching M of
all checkers. (This is true for the amplifiers of [3] and [5].) The color d
will be used for the edges {x0 , x1 }, x ∈ V (G) \ {x2k+1 , . . . , x3k }. Also,
for any x ∈ {xk+1 , . . . , x2k }, the corresponding edge {X0 , X1 } will have
color d too. The color c will be reserved for coloring the edges of H "along
the matching M", i.e. if {x, y} ∈ M, the edges {x0 , y1 } and {x1 , y0 } have
color c. Furthermore, for x ∈ {xk+1 , . . . , x2k } the corresponding edges
{x0 , X1 } and {x1 , X0 } will be of color c too. The edges that are not
colored by c and d form a 2-regular bipartite graph, hence they can be
edge-2-colored by colors a and b. The above edge-4-coloring of H and Gj [4]
(j ∈ {0, 1}) ensures that the instances produced in our reduction to 4-Max-IS
are edge-4-colored 4-regular graphs.
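The last step, edge-2-coloring the remaining 2-regular bipartite graph, amounts to alternating two colors along each of its (necessarily even) cycles. A minimal sketch of ours, with function names of our own choosing, assuming the input really is 2-regular and bipartite:

```python
def two_edge_color(adj):
    """Properly 2-color the edges of a 2-regular bipartite graph with colors
    'a' and 'b'. Such a graph is a disjoint union of even cycles, so it
    suffices to walk each cycle and alternate the two colors.
    adj[v] is the list of the exactly two neighbours of v."""
    color = {}  # frozenset({u, v}) -> 'a' or 'b'
    for start in adj:
        if any(frozenset((start, v)) in color for v in adj[start]):
            continue  # the cycle through `start` is already colored
        prev, cur, c = start, adj[start][0], 0
        color[frozenset((start, cur))] = "ab"[c]
        while cur != start:
            nxt = next(v for v in adj[cur] if v != prev)
            c = 1 - c
            color[frozenset((cur, nxt))] = "ab"[c]
            prev, cur = cur, nxt
    return color
```

Because every cycle of a bipartite graph has even length, the alternation closes up consistently and each node sees one 'a' edge and one 'b' edge.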
The following theorem summarizes both achieved results:
Theorem 5. It is NP-hard to approximate the solution of Max-3DM to within
any constant smaller than 1 + 1/(2µ3DM + 13) > 1.010661 > 95/94, and the
solution of Max-4DM to within 1.0215517 (> 48/47). Both inapproximability
results hold also on instances with each element occurring in exactly two
triples, resp. quadruples.
The lower bound for Min-B-Set Cover follows from that of B-Min-NC, as
was explained in the Introduction. It is also easy to see that the instances
obtained by our reduction for 3-Max-IS are 3-regular triangle-free graphs.
Hence, we get the same lower bound for Maximum Triangle Packing by a simple
reduction (see [5] for more details).
Theorem 6. It is NP-hard to approximate the solution of the following
problems: Maximum Triangle Packing (even on 4-regular line graphs) to within
any constant smaller than 1 + 1/(2µ3 + 13) > 1.010661 > 95/94; Min-3-Set
Cover with exactly two occurrences of each element to within any constant
smaller than 1 + 1/(2λ3 + 13) > 1.0101215 > 100/99; and Min-4-Set Cover with
exactly two occurrences of each element to within any constant smaller than
1 + 1/(2λ4 + 8) > 1.0194553 > 53/52.
Concluding Remarks. A plausible direction for further improving our
inapproximability results is to give better upper bounds on the parameters
λB , µB . We think that there is still potential for improvement here, using a
suitable probabilistic model for the construction of amplifiers.
References
1. N. Alon and N. Kahale: Approximating the independence number via the
ϑ-function, Mathematical Programming 80 (1998), 253–264.
2. P. Berman and T. Fujito: Approximating independent sets in degree 3 graphs, Proc.
of the 4th WADS, LNCS 955, 1995, Springer, 449–460.
3. P. Berman and M. Karpinski: On Some Tighter Inapproximability Results, Further
Improvements, ECCC Report TR98-065, 1998.
4. P. Berman and M. Karpinski: Efficient Amplifiers and Bounded Degree Optimization, ECCC Report TR01-053, 2001.
5. M. Chlebı́k and J. Chlebı́ková: Approximation Hardness for Small Occurrence Instances of NP-Hard Problems, Proc. of the 5th CIAC, LNCS 2653, 2003, Springer
(also ECCC Report TR02-73, 2002).
6. M. Chlebı́k and J. Chlebı́ková: Approximation Hardness of the Steiner Tree Problem on Graphs, Proc. of the 8th SWAT, LNCS 2368, 2002, Springer, 170–179.
7. M. Chlebı́k and J. Chlebı́ková: Inapproximability results for bounded variants of
optimization problems, ECCC Report TR03-26, 2003.
8. F. R. K. Chung: Spectral Graph Theory, CBMS Regional Conference Series in
Mathematics, AMS, 1997, ISSN 0160-7642, ISBN 0-8218-0315-8.
9. A. Clementi and L. Trevisan: Improved non-approximability results for vertex cover
with density constraints, Theoretical Computer Science 225(1999), 113–128.
10. I. Dinur and S. Safra: The importance of being biased, ECCC Report TR01-104,
2001.
11. J. Håstad: Some optimal inapproximability results, Journal of ACM 48(2001),
798–859.
12. E. Hazan, S. Safra and O. Schwartz: On the Hardness of Approximating
k-Dimensional Matching, ECCC Report TR03-20, 2003.
13. D. Karger, R. Motwani and M. Sudan: Approximate graph coloring by semi-definite
programming, Journal of the ACM 45(2)(1998), 246–265.
14. C. H. Papadimitriou and S. Vempala: On the Approximability of the Traveling
Salesman Problem, In Proc. 32nd ACM Symposium on Theory of Computing,
Portland, 2000.
15. L. Trevisan: Non-approximability results for optimization problems on bounded degree instances, In Proc. 33rd ACM Symposium on Theory of Computing, 2001.
Approximating the Pareto Curve with Local
Search for the Bicriteria TSP(1,2) Problem⋆
(Extended Abstract)
Eric Angel, Evripidis Bampis, and Laurent Gourvès
LaMI, CNRS UMR 8042, Université d’Évry Val d’Essonne, France
Abstract. Local search has been widely used in combinatorial optimization [3];
however, in the case of multicriteria optimization almost no results are known
concerning the ability of local search algorithms to generate "good" solutions
with performance guarantee. In this paper, we introduce such an approach for
the classical traveling salesman problem (TSP) [13]. We show that it is
possible to get in linear time a 3/2-approximate Pareto curve using an
original local search procedure based on the 2-opt neighborhood, for the
bicriteria TSP(1,2) problem where every edge is associated with a couple of
distances which are either 1 or 2 [12].
1 Introduction
The traveling salesman problem (TSP) is one of the most popular problems in
combinatorial optimization. Given a complete graph where the edges are associated with a positive distance, we search for a cycle visiting each vertex of the
graph exactly once and minimizing the total distance. It is well known that the
TSP problem is NP-hard and it cannot be approximated within a bounded approximation ratio, unless P=NP. However, for the metric TSP (i.e. when the distances satisfy the triangle inequality), Christofides proposed an algorithm with
performance ratio 3/2 [1]. For more than 25 years, many researchers attempted
to improve this bound but with no success. Papadimitriou and Yannakakis [12]
studied a more restrictive version of the metric TSP, the case where all distances
are either one or two, and they achieved a 7/6-approximation algorithm. This
problem, known as the TSP(1,2) problem, remains NP-hard; it is in fact this
version of TSP that was shown NP-complete in the original reduction of Karp
[2]. The TSP(1,2) problem is a generalization of the hamiltonian cycle
problem, since we are asking for the tour of the graph that contains the
fewest possible non-edges (edges of weight 2). More recently, Monnot et al.
obtained results for the TSP(1,2) with respect to the differential
approximation ratio [8,9].
In this paper, we consider the bicriteria TSP(1,2) problem, which is a special
case of the multicriteria TSP problem [14] in which every edge is associated with a
⋆ Research partially supported by the thematic network APPOL II (IST 2001-32007)
of the European Union, and the France-Berkeley Fund project MULT-APPROX.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 39–48, 2003.
c Springer-Verlag Berlin Heidelberg 2003
couple of distances which are either 1 or 2, i.e. each edge can take a value
from the set {(1, 1), (1, 2), (2, 1), (2, 2)}. As an application, consider two
undirected graphs G1 and G2 on the same set V of n vertices. Does there exist
a hamiltonian cycle which is common to both graphs? This problem can be
formulated as a special case of the bicriteria traveling salesman problem we
consider. Indeed, for G = G1 or G2 let δG ([i, j]) = 1 if there is an edge
between vertices i and j in graph G and let δG ([i, j]) = 0 otherwise. We form
a bicriteria TSP instance in a complete graph in the following way: for any
couple of vertices {i, j} ⊆ V , we set the cost of the edge [i, j] to be
c([i, j]) = (2 − δG1 ([i, j]), 2 − δG2 ([i, j])). Then there exists a
hamiltonian cycle common to both graphs if and only if there exists a solution
for the bicriteria TSP achieving a cost (n, n). Here, we study
the optimization version of this bicriteria TSP in which we look for a common
“hamiltonian cycle” using the fewest possible non-edges in each graph, i.e. we
are seeking a hamiltonian cycle in the complete graph of the TSP(1,2) instance
minimizing the cost of both coordinates. A solution of our problem is evaluated
with respect to two different optimality criteria (see [5] for a recent book on
multicriteria optimization). Here, we are interested in the trade-off between the
different objective functions which is captured by the set of all possible solutions
which are not dominated by other solutions (the so-called Pareto curve). Since
the monocriterion TSP(1,2) problem is NP-hard, determining whether a point
belongs to the Pareto curve is NP-hard. Papadimitriou and Yannakakis [11]
considered an approximate version of the Pareto curve, the so-called
(1 + ε)-approximate Pareto curve. Informally, a (1 + ε)-approximate Pareto
curve is a set of solutions that dominates all other solutions approximately
(within a factor 1 + ε) in all the objectives. In other words, for every other
solution, the considered set contains a solution that is approximately as good
(within a factor 1 + ε) in all objectives.
We propose a bicriteria local search procedure using the 2-opt neighborhood
which finds a 3/2-approximate Pareto curve (notice that a 2-approximate Pareto
curve can be trivially constructed: just consider any tour). Interestingly,
Khanna et al. [7] have shown that a local search algorithm using the 2-opt
neighborhood achieves a 3/2 performance ratio for the monocriterion TSP(1,2)
problem. We furthermore show that the gap between the cost of a local optimum
produced by our local search procedure and a solution of the exact Pareto
curve can be 3/2, and thus our result is tight. To the best of our knowledge,
no results were known about the ability of local search algorithms to provide
good solutions, with performance guarantee, in the area of multicriteria
optimization.
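The two-graph application described above is easy to make concrete. The following sketch is ours (function names are not from the paper; the brute-force search is only feasible for tiny n): it builds the cost function c([i, j]) = (2 − δG1, 2 − δG2) and tests whether some tour achieves cost exactly (n, n).

```python
from itertools import permutations

def bicriteria_costs(n, E1, E2):
    """Cost c([i, j]) = (2 - δ_G1([i, j]), 2 - δ_G2([i, j])) on K_n;
    E1, E2 are the edge sets of G1, G2 given as sets of frozensets.
    (Python bools act as 0/1 in arithmetic.)"""
    return {frozenset((i, j)): (2 - (frozenset((i, j)) in E1),
                                2 - (frozenset((i, j)) in E2))
            for i in range(n) for j in range(i + 1, n)}

def has_common_hamiltonian_cycle(n, cost):
    """Brute force over all tours (tiny n only): is some tour of cost (n, n)?"""
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        edges = [frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)]
        vec = (sum(cost[e][0] for e in edges), sum(cost[e][1] for e in edges))
        if vec == (n, n):
            return True
    return False
```

A tour of cost (n, n) uses only (1, 1) edges, i.e. edges present in both graphs, which is exactly the common hamiltonian cycle condition.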
1.1 Definitions
Given an instance of a multicriteria minimization problem, with γ ≥ 1 objective
functions Gi , i = 1, . . . , γ, its Pareto curve P is the set of all γ-vectors (cost
vectors) such that for each v = (v1 , . . . , vγ ) ∈ P,
1. there exists a feasible solution s such that Gi (s) = vi for all i, and
2. there is no other feasible solution s′ such that Gi (s′ ) ≤ vi for all i, with a
strict inequality for some i.
Approximating the Pareto Curve with Local Search
41
For ease of presentation, we will sometimes use P to denote a set of solutions
which achieve these values. (If there is more than one solution with the same
vi values, P contains one of them.) Since for the problem we consider
computing the (exact) Pareto curve is infeasible in polynomial time (unless
P=NP), we consider an approximation. Given ε > 0, a (1 + ε)-approximate Pareto
curve, denoted P(1+ε) , is a set of cost vectors of feasible solutions such
that for every feasible solution s of the problem there is a solution s′ with
cost vector in P(1+ε) such that Gi (s′ ) ≤ (1 + ε)Gi (s) for all i = 1, ..., γ.
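These definitions translate directly into code. The small sketch below is ours (the function names are not from the paper): it computes an exact Pareto curve of cost vectors and checks the (1 + ε)-coverage condition on cost vectors.

```python
def pareto_curve(cost_vectors):
    """Exact Pareto curve: the vectors not dominated by any other vector."""
    def dominates(a, b):
        # a dominates b: a <= b in every coordinate, strictly somewhere
        return all(x <= y for x, y in zip(a, b)) and a != b
    vs = set(cost_vectors)
    return [v for v in vs if not any(dominates(w, v) for w in vs)]

def is_eps_approx(candidate, all_vectors, eps):
    """candidate is a (1+eps)-approximate Pareto curve iff every feasible
    cost vector v is matched by some cv in candidate with
    cv_i <= (1+eps) * v_i in every objective i."""
    return all(any(all(c <= (1 + eps) * g for c, g in zip(cv, v))
                   for cv in candidate)
               for v in all_vectors)
```

For example, with cost vectors (10, 10), (10, 16), (16, 10), (20, 20) (numbers of the kind that appear in the instance of Section 2), the pair {(10, 16), (16, 10)} fails the 3/2-coverage test because neither vector is within a factor 3/2 of (10, 10) in both coordinates.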
2 Bicriteria Local Search
We consider the bicriteria TSP(1,2) with n cities. For an edge e, we shall
denote by c(e) = (c1 (e), c2 (e)) ∈ {(1, 1), (1, 2), (2, 1), (2, 2)} its cost.
The objective is to find a tour T (set of edges) minimizing
G1 (T ) = Σ_{e∈T} c1 (e) and G2 (T ) = Σ_{e∈T} c2 (e). In the following we
develop a local search based procedure in order to find a 3/2-approximate
Pareto curve for this bicriteria problem.
We shall use the well-known 2-opt neighborhood for the traveling salesman
problem [4]. Given a tour T , its neighborhood N (T ) is the set of all the
tours which can be obtained from T by removing two non-adjacent edges from T
(a = [x, y] and b = [u, v] in Figure 1) and inserting two new edges (c = [y, v]
and d = [x, u] in Figure 1) in order to obtain a new tour.
Fig. 1. The 2-opt move.
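On a tour stored as a list of cities, a 2-opt move amounts to a segment reversal. A minimal sketch with our own indexing conventions (not from the paper):

```python
def two_opt_move(tour, i, j):
    """Remove edges (tour[i], tour[i+1]) and (tour[j], tour[j+1 mod n]) and
    reconnect by reversing the segment tour[i+1..j]. Requires the two removed
    edges to be non-adjacent: j >= i + 2 and (i, j) != (0, n - 1)."""
    n = len(tour)
    assert i + 2 <= j and not (i == 0 and j == n - 1)
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

def neighborhood(tour):
    """N(T): all tours reachable by a single 2-opt move."""
    n = len(tour)
    return [two_opt_move(tour, i, j)
            for i in range(n - 1) for j in range(i + 2, n)
            if not (i == 0 and j == n - 1)]
```

Reversing the segment deletes the two chosen edges and inserts the two "crossing" edges, exactly the exchange depicted in Fig. 1; the neighborhood has n(n − 3)/2 members.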
In the bicriteria setting there is a difficulty in defining properly what a
local optimum is. The natural preference relation over the set of tours,
denoted ≺n , is defined as follows.
Definition 1. Let T and T ′ be two tours. One has T ′ ≺n T iff
– G1 (T ′ ) ≤ G1 (T ) and G2 (T ′ ) < G2 (T ), or
– G1 (T ′ ) < G1 (T ) and G2 (T ′ ) ≤ G2 (T ).
If we consider this natural preference relation in order to define the notion
of a local optimum, i.e. if we say that a tour T is a local optimum tour with
respect to the 2-opt neighborhood whenever there does not exist a tour
T ′ ∈ N (T ) such that T ′ ≺n T , then there exist instances for which a local
optimum tour gives a performance guarantee strictly worse than 3/2 for one
criterion.
Indeed, in Figure 2, the exact Pareto curve of the depicted instance contains
only the tour abcdefghij of weight (10, 10). Thus, a 3/2-approximate Pareto
curve of the instance should contain a single tour of weight strictly less
than 16 for both criteria. The tours aebicdfghj and adjigfecbh are both local
optima with respect to ≺n and their weights are respectively (16, 10) and
(10, 16) (see Figure 2). Thus, using local optima with respect to ≺n is not
appropriate to compute a 3/2-approximate Pareto curve of the considered
problem (more details are given in the full paper).
Fig. 2. Edges not represented have weight (2, 2).
Hence, we introduce the following partial preference relations on pairs of
edges. These preference relations, denoted by ≺1 and ≺2 , are defined in
Figure 3. The set of the ten possible couples of cost vectors of the edges has
been partitioned into three sets S1 , S2 and S3 , and for any s1 ∈ S1 ,
s2 ∈ S2 , s3 ∈ S3 , we have s1 ≺1 s2 , s1 ≺1 s3 and s2 ≺1 s3 . Intuitively,
preference relation ≺1 (resp. ≺2 ) means: pairs with at least one
(1,1)-weighted edge in front of all others, and among the rest, pairs with at
least one (1,2)-weighted edge (resp. (2,1)-weighted edge) in front.
Definition 2. We say that the tour T is a local optimum tour with respect to
the 2-opt neighborhood and the preference relation ≺1 if there does not exist a
tour T ′ ∈ N (T ), obtained from T by removing edges a, b and inserting edges c, d,
such that {c, d} ≺1 {a, b}.
A similar definition holds for the preference relation ≺2 .
We consider the following local search procedure:
Bicriteria Local Search (BLS):
1. Let s1 be a 2-opt local optimum tour with the preference relation ≺1 .
2. Let s2 be a 2-opt local optimum tour with the preference relation ≺2 .
3. If s1 ≺n s2 output {s1 }, if s2 ≺n s1 output {s2 }, otherwise output
{s1 , s2 }.
In order to find a local optimum tour, we start from an arbitrary solution
(say s). We look for a solution s′ in the 2-opt neighborhood of s such that s′ ≺1 s
(resp. s′ ≺2 s) and replace s by s′ . The procedure stops when such a solution s′
does not exist, meaning that the solution s is a local optimum with respect to
the preference relation ≺1 (resp. ≺2 ).
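The whole procedure can be sketched in Python. The code below is our own illustrative implementation, not the authors' (in particular it simply loops until no class-improving move exists, without reproducing the O(n) move bound of Theorem 2); a removed pair of edges is compared to the inserted pair through the class index S1/S2/S3 of Figure 3.

```python
def rank(pair, pref):
    """Class of a pair of edge-cost vectors under ≺1 (pref=1) or ≺2 (pref=2):
    0 = S1 (contains a (1,1) edge), 1 = S2 (otherwise contains the favoured
    weight, (1,2) for ≺1 and (2,1) for ≺2), 2 = S3 (the rest)."""
    fav = (1, 2) if pref == 1 else (2, 1)
    if (1, 1) in pair:
        return 0
    return 1 if fav in pair else 2

def local_optimum(tour, cost, pref):
    """Apply 2-opt moves while the inserted pair of edges lies in a strictly
    better class than the removed pair; stop at a local optimum."""
    n, improved = len(tour), True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two removed edges must be non-adjacent
                a, b, u, v = tour[i], tour[i + 1], tour[j], tour[(j + 1) % n]
                removed = (cost[frozenset((a, b))], cost[frozenset((u, v))])
                inserted = (cost[frozenset((a, u))], cost[frozenset((b, v))])
                if rank(inserted, pref) < rank(removed, pref):
                    # reverse the segment between the two removed edges
                    tour = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                    improved = True
    return tour

def bls(tour, cost):
    """Bicriteria Local Search: one local optimum per preference relation;
    return the non-dominated one(s)."""
    def G(t):
        es = [frozenset((t[i], t[(i + 1) % len(t)])) for i in range(len(t))]
        return tuple(sum(cost[e][c] for e in es) for c in (0, 1))
    s1, s2 = local_optimum(tour, cost, 1), local_optimum(tour, cost, 2)
    if all(p <= q for p, q in zip(G(s1), G(s2))):
        return [s1]
    if all(q <= p for p, q in zip(G(s1), G(s2))):
        return [s2]
    return [s1, s2]
```

On a toy K4 whose (1, 1) edges form a hamiltonian cycle, both runs converge to that cycle and a single tour of cost (4, 4) is output.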
Notice that the proposed 2-opt neighborhood local search algorithm does not
collapse to the traditional 2-opt neighborhood local search when applied to
the monocriterion special case of TSP with c1 (e) = c2 (e) for all edges e. In
this case our BLS algorithm does not replace a pair of edges with weights
(1,1) and (2,2) by a pair of edges with weights (1,1) and (1,1), even if this
move improves the quality of the tour. However, allowing such moves does not
improve the performance guarantee, as the example in Figure 7 shows.
In the next section, we prove the following two theorems.
Theorem 1. The set of solution(s) returned by the Bicriteria Local Search
(BLS) procedure is a 3/2-approximate Pareto curve for the multicriteria TSP
problem with distances one and two. Moreover, this bound is asymptotically
sharp.
Theorem 2. The number of 2-opt moves performed by BLS is O(n).
3 Analysis of BLS
The idea of the proof of Theorem 1 is based (as in [7]) on the comparison of
the numbers of the different types of cost vectors in the obtained local
optimum solution(s) with the corresponding numbers in any other feasible
solution (including the optimal one). In the following we assume that T is any
2-opt local optimum tour with respect to the preference relation ≺1 . The tour
O is any fixed tour (in particular, one of the exact Pareto curve). Let us
denote by x (resp. y, z and t) the number of (1,1) (resp. (1,2), (2,1) and
(2,2)) edges in tour T . We denote with a prime the same quantities for the
tour O.
Lemma 1. With the preference relation ≺1 one has x ≥ x′ /2.
Proof. Let UO (resp. UT ) be the set of (1, 1) edges in the tour O (resp. local
optimum tour T ). We define a function f : UO → UT in the following way. Let
e be an edge in UO . If e ∈ UT then f (e) = e. Otherwise let e′ and e′′ be the
two edges adjacent to e in the tour T as depicted in Figure 4 (we assume an
arbitrary orientation of T and consider that the only edges adjacent to e are e′
and e′′ and not e4 and e5 ). Let e′′′ be the edge forming a cycle of length 4 with
e, e′ and e′′ (see Figure 4). We claim that there is at least one edge among e′
and e′′ with a weight (1, 1) and define f (e) to be one of those edges (possibly
chosen arbitrarily). Otherwise, we have {e, e′′′ } ∈ S1 and {e′ , e′′ } ∈ S2 ∪ S3 (see
Figure 3), contradicting the fact that T is a local optimum with respect to the
preference relation ≺1 . Now observe that for a given edge e′ ∈ UT , there can be
at most two edges e ∈ UO such that f (e) = e′ . Such a case occurs in Figures 5
and 6. Therefore we have |UT | ≥ |UO |/2.
⊓⊔
Fig. 3. The two preference relations ≺1 and ≺2 . (a) Under ≺1 the ten possible
couples of edge cost vectors are partitioned into S1 (the couples containing a
(1, 1) edge), S2 (the remaining couples containing a (1, 2) edge) and S3 (the
rest). (b) Under ≺2 the roles of (1, 2) and (2, 1) are exchanged.
Fig. 4. The local optimal tour T (arbitrarily oriented).
Fig. 5. f (e1 ) = f (e2 ) = e′ with e1 , e2 ∈ O and e′ ∈ T
Fig. 6. f (e1 ) = f (e2 ) = e′ with e1 , e2 ∈ O and e′ ∈ T
Lemma 2. With the preference relation ≺2 one has x ≥ x′ /2.
Proof. The proof of Lemma 2 is symmetric to the one of Lemma 1; just assume
that T is any 2-opt local optimum tour with respect to the preference
relation ≺2 . ⊓⊔
Lemma 3. With the preference relation ≺1 one has x + y ≥ (x′ + y ′ )/2.
Proof. Let UO (resp. UT ) be the set of (1, 1) and (1, 2) edges in the tour O (resp.
local optimum tour T ). We define a function f : UO → UT in the following way.
Let e be an edge in UO . If e ∈ UT then f (e) = e. Otherwise let e′ and e′′ be
the two edges adjacent to e in the tour T as depicted in Figure 4 (we assume
an arbitrary orientation of T as in the proof of Lemma 1). Let e′′′ be the edge
forming a cycle of length 4 with e, e′ and e′′ (see Figure 4). We claim that there
is at least one edge among e′ and e′′ with a weight (1, 1) or (1, 2) and define
f (e) to be one of those edges (possibly chosen arbitrarily). Otherwise, we have
{e, e′′′ } ∈ S1 ∪ S2 and {e′ , e′′ } ∈ S3 (see Figure 3), contradicting the fact that
T is a local optimum with respect to the preference relation ≺1 . Now observe
that for a given edge e′ ∈ UT , there can be at most two edges e ∈ UO such that
f (e) = e′ . Therefore we have |UT | ≥ |UO |/2.
⊓⊔
Proposition 1. If the tour O has a cost (X, X + α) with X a positive integer
(n ≤ X ≤ 2n) and n ≥ α ≥ 0, then the solution T achieves a performance
guarantee of 3/2 relative to the solution O for both criteria.
Proof. Let (C_O^1 , C_O^2 ) be the cost of the tour O and (C_T^1 , C_T^2 ) be the cost of the
tour T . We have C_T^1 = 2n − x − y, C_O^1 = 2n − x′ − y′ and C_T^2 = 2n − x − z,
C_O^2 = 2n − x′ − z′ . Let us consider the first coordinate. We want to show that

C_T^1 / C_O^1 = (2n − x − y) / (2n − x′ − y′ ) ≤ 3/2.

Using Lemma 3 we get

(2n − x − y) / (2n − x′ − y′ ) ≤ (2n − x′/2 − y′/2) / (2n − x′ − y′ ).

Now we have to show

(2n − x′/2 − y′/2) / (2n − x′ − y′ ) ≤ 3/2
⇐⇒ 4n − x′ − y′ ≤ 6n − 3x′ − 3y′
⇐⇒ 2x′ + 2y′ ≤ 2n
⇐⇒ x′ + y′ ≤ n,
which is true since x′ + y′ + z′ + t′ = n and z′ , t′ ≥ 0. We consider now the
second coordinate. Since the tour O has a cost (X, X + α), it means that
C_O^2 = C_O^1 + α and therefore z′ = y′ − α. We have to show

(2n − x − z) / (2n − x′ − z′ ) ≤ 3/2
⇐⇒ 4n − 2x − 2z ≤ 6n − 3x′ − 3z′
⇐⇒ 3x′ − 2x + 3z′ − 2z ≤ 2n
⇐⇒ 3x′ − 2x + 3y′ − 3α − 2z ≤ 2(x′ + y′ + z′ + t′ )
⇐⇒ x′ − 2x − y′ − α − 2z ≤ 2t′ ,

which is true since x′ − 2x ≤ 0 by Lemma 1 and y′ , α, z, t′ ≥ 0. ⊓⊔
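The first-coordinate bound can be spot-checked numerically: sampling nonnegative (x′ , y′ , z′ , t′ ) summing to n, together with the Lemma 3 guarantee x + y ≥ (x′ + y′ )/2, the ratio never exceeds 3/2. A small sampling sketch (the value n = 20 is illustrative):

```python
import random

def first_coordinate_ratio(n, xp, yp, s):
    # s plays the role of x + y; the costs are C_T^1 = 2n - s, C_O^1 = 2n - x' - y'
    return (2 * n - s) / (2 * n - xp - yp)

random.seed(0)
n = 20
for _ in range(10_000):
    # draw x', y', z', t' >= 0 with x' + y' + z' + t' = n
    cuts = sorted(random.uniform(0, n) for _ in range(3))
    xp, yp = cuts[0], cuts[1] - cuts[0]
    # Lemma 3 guarantees x + y >= (x' + y')/2 for the local optimum T
    s = random.uniform((xp + yp) / 2, n)
    assert first_coordinate_ratio(n, xp, yp, s) <= 1.5 + 1e-9
```

The tight case is x′ + y′ = n with x + y exactly (x′ + y′ )/2, where the ratio equals 3/2.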
We assume now that T is any 2-opt local optimal tour with respect to the
preference relation ≺2 . The tour O is any fixed tour. In a similar way as in the
case of Lemma 3 we can prove:
Lemma 4. With the preference relation ≺2 one has x + z ≥ (x′ + z ′ )/2.
Proof. The proof of Lemma 4 is symmetric to the one of Lemma 3.
⊓⊔
Proposition 2. If the tour O has a cost (X + α, X) with X a positive integer
(n ≤ X ≤ 2n) and α > 0, then the solution T achieves a performance guarantee
of 3/2 relative to the solution O for both criteria.
Proof. The proof of Proposition 2 is symmetric to the one of Proposition 1, using
Lemma 4 and Lemma 2 instead of Lemma 3 and Lemma 1.
⊓⊔
Now, we are ready to prove Theorems 1 and 2.
Proof of Theorem 1.
Proof. Let s be an arbitrary tour. If s has a cost (X, X + α), α ≥ 0, then
using Proposition 1 the solution s1 3/2-approximately dominates the solution s.
Otherwise, s has a cost (X + α, X), α > 0, and using Proposition 2 the solution
s2 3/2-approximately dominates the solution s.
To see that this bound is asymptotically sharp consider the instance depicted
in Figure 7. The tour s1 s2 . . . s2n s1 is a local optimum with respect to ≺1 and
≺2 , and it has a weight n × (1, 1) + n × (2, 2) = (3n, 3n), whereas the optimal
tour
s1 s3 s2n s4 s2n−1 . . . sn−1 sn+4 sn sn+3 sn+1 sn+2 s2 s1
has a weight (2n − 1) × (1, 1) + (2, 2) = (2n + 1, 2n + 1).
⊓⊔
[Figure: the cycle s1 s2 . . . s2n s1 .]
Fig. 7. The edges represented have a weight (1, 1), whereas the edges not represented
have a weight (2, 2).
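The claimed asymptotic sharpness can be checked by direct arithmetic on the instance of Figure 7 (a small sketch):

```python
def tight_instance_ratio(n):
    # local optimum: n edges of weight (1,1) plus n edges of weight (2,2)
    local_opt = n * 1 + n * 2          # per coordinate: 3n
    # optimal tour: (2n - 1) edges of weight (1,1) plus one edge of weight (2,2)
    optimum = (2 * n - 1) * 1 + 2      # per coordinate: 2n + 1
    return local_opt / optimum

print(tight_instance_ratio(10))       # 30/21 ≈ 1.4286
print(tight_instance_ratio(10_000))   # approaches 3/2 as n grows
```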
Proof of Theorem 2.
Proof. Let T be a tour. Let F1 (T ) = 3x + y with x (resp. y) the number of (1, 1)
edges (resp. (1, 2) edges) of T . We assume that one 2-opt move, with respect to
≺1 , transforms T into T ′ . Then it is easy to see that F1 (T ′ ) ≥ F1 (T ) + 1
for any such 2-opt move. Indeed, each 2-opt move with respect to ≺1 either
increases the number of (1, 2) edges without decreasing the number of (1, 1)
edges, or increases the number of (1, 1) edges while decreasing the number of
(1, 2) edges by at most two. Since 0 ≤ F1 (T ) ≤ 3(x + y) ≤ 3n and F1 (T ) ∈ N,
a local search which uses ≺1 converges to a locally optimal solution in less than 3n steps.
One can use the same proof with ≺2 , just assume that F2 (T ) = 3x + z with x
(resp. z) the number of (1, 1) edges (resp. (2, 1) edges) of a tour T .
⊓⊔
4 Concluding Remarks
In this paper we proposed a bicriteria local search procedure based on the standard
2-opt neighborhood which allowed us to obtain a 3/2-approximate Pareto curve
for the bicriteria T SP (1, 2). Our results can be extended to the T SP (a, a + δ)
with a ∈ R+∗ and 0 ≤ δ ≤ a. In that case we obtain a (1 + δ/(2a))-approximate
Pareto curve. Since Chandra et al. [6] have shown that for the TSP satisfying
the triangle inequality, the worst-case performance ratio of 2-opt (resp. k-opt)
local search is at most 4√n and at least (1/4)√n (resp. (1/4)n^{1/(2k)} ), our constant
approximation result cannot be extended to the metric case. It would however be
interesting to establish lower and upper bounds for this more general case.
Our results can also be applied to the bicriteria version of the M AX T SP
(1, 2) problem. In this problem, the objective is the maximization of the length
of the tour. For the monocriterion case the best known approximation algorithm
has a performance ratio of 7/8 [8,9] (the previously best known algorithm had a
performance ratio of 3/4 [10]). We can obtain a 2/3-approximate Pareto curve for
the bicriteria case in the following way. The idea is to modify the instance by
replacing each edge (1,1) by an edge (2,2) and vice versa, and each edge (1,2) by
an edge (2,1) and vice versa. It can be shown that obtaining a 3/2-approximate
Pareto curve for the bicriteria M IN T SP (1, 2) on this modified instance yields a
2/3-approximate Pareto curve for the bicriteria M AX T SP (1, 2) on the original
instance. Equivalently, one can work on the original instance but with modified
preference relations ≺′1 and ≺′2 , obtained from ≺1 and ≺2 by exchanging the roles
of the (1,1) and (2,2) edges and of the (1,2) and (2,1) edges.
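The weight modification can be sketched as the componentwise complement w ↦ 3 − w, which exchanges (1,1) with (2,2) edges and (1,2) with (2,1) edges; on any tour with n edges, the original (maximization) cost and the modified (minimization) cost then sum to 3n in each coordinate. A sketch with illustrative weights:

```python
def flip(w):
    # exchange (1,1) <-> (2,2) and (1,2) <-> (2,1) by complementing each coordinate
    return (3 - w[0], 3 - w[1])

assert flip((1, 1)) == (2, 2) and flip((2, 2)) == (1, 1)
assert flip((1, 2)) == (2, 1) and flip((2, 1)) == (1, 2)

# on any tour with n edges, original (max) cost + modified (min) cost = (3n, 3n)
tour = [(1, 2), (2, 2), (1, 1), (2, 1)]            # illustrative edge weights, n = 4
orig = tuple(sum(w[k] for w in tour) for k in (0, 1))
mod = tuple(sum(flip(w)[k] for w in tour) for k in (0, 1))
assert orig[0] + mod[0] == 3 * len(tour) and orig[1] + mod[1] == 3 * len(tour)
```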
An interesting question is whether it is possible to obtain constant approximation ratios for the more general k-criteria T SP (1, 2) problem (for k > 2). It
seems that our approach cannot be directly applied to this case.
References
1. N. Christofides. Worst-Case analysis of a new heuristic for the traveling salesman
problem. Technical Report, GSIA, Carnegie Mellon University, 1976.
2. R.M. Karp, Reducibility among combinatorial problems, in Complexity of Computer
Computations, R.E. Miller and J.W. Thatcher (Eds.), Plenum Press, NY, 1972.
3. E. Aarts and J.K. Lenstra, Local search in combinatorial optimization, John Wiley
and Sons, 1997.
4. D.S. Johnson and L.A. McGeoch, The traveling salesman problem: a case study
in Local Optimization, chapter in Local search in combinatorial optimization, E.
Aarts and J.K. Lenstra (eds.), John Wiley and Sons, 1997.
5. M. Ehrgott, Multicriteria Optimization, Lecture Notes in Economics and Mathematical Systems, vol. 491, Springer, 2000.
6. B. Chandra, H. Karloff and C. Tovey, New results on the old k-opt algorithm for
the TSP, SIAM Journal on Computing, 28(6), 1998–2029, 1999.
7. S. Khanna, R. Motwani, M. Sudan and V. Vazirani, On syntactic versus computational views of approximability, SIAM Journal on Computing, 28(1), 164–191,
1998.
8. J. Monnot, Differential approximation results for the traveling salesman and related problems, Information Processing Letters, 82(5), 229–235, 2002.
9. J. Monnot, V. Th. Paschos and S. Toulouse, Differential approximation results
for the traveling salesman problem with distances 1 and 2, European Journal of
Operational Research, 145, 557–568, 2003.
10. A.I. Serdyukov, An algorithm with an estimate for the traveling salesman problem
of the maximum, Upravlyaemye Sistemy, 25, 80–86, 1984.
11. C.H. Papadimitriou and M. Yannakakis, On the approximability of trade-offs and
optimal access of web sources, Proceedings 41st Annual IEEE Symposium on
Foundations of Computer Science, 86–92, 2000.
12. C.H. Papadimitriou and M. Yannakakis. The traveling salesman problem with
distances one and two. In Mathematics of Operations Research, 18(1), 1–11, 1993.
13. C.H. Papadimitriou, S. Vempala. On the approximability of the traveling salesman
problem. Proc. STOC’00, 126–133, 2000.
14. A. Gupta, A. Warburton. Approximation methods for multiple criteria traveling
salesman problems, Towards Interactive and Intelligent Decision Support Systems,
Proc. of the 7th International Conference on Multiple Criteria Decision Making,
(Y. Sawaragi Ed.), Springer Verlag, 211–217, 1986.
Scheduling to Minimize Max Flow Time: Offline
and Online Algorithms⋆
Monaldo Mastrolilli
IDSIA, Galleria 2, 6928 Manno, Switzerland
monaldo@idsia.ch
Abstract. We investigate the max flow time scheduling problem in the off-line and on-line settings. We prove positive and negative theoretical results. In the off-line setting, we address the unrelated parallel machines
model and present the first known fully polynomial time approximation
scheme, when the number of machines is fixed. In the on-line setting
and when the machines are identical, we analyze the First In First Out
(FIFO) heuristic when preemption is allowed. We show that FIFO is an
on-line algorithm with a (3 − 2/m)-competitive ratio. Finally, we present
two lower bounds on the competitive ratio of deterministic on-line algorithms.
1 Introduction
The m-machine scheduling problem is one of the most widely-studied problems
in computer science, with an almost limitless number of variants ( [3,6,12,18]
are surveys). The most common objective function is the makespan, which is the
length of the schedule, or equivalently the time when the last job is completed.
This objective function formalizes the viewpoint of the owner of the machines.
If the makespan is small, the utilization of his machines is high; this captures
the situation when the benefits of the owner are proportional to the work done.
If we turn our attention to the viewpoint of a user, the time it takes to finish
individual jobs may be more important; this is especially true in interactive
environments. Thus, if many jobs that are released early are postponed at the
end of the schedule, it is unacceptable to the user of the system even if the
makespan is optimal.
For that reason other objective functions are studied. With this aim, a well-studied objective function is the total flow time [1,13,17]. The flow time of a
job is the time the job is in the system, i.e., the completion time minus the
time when it becomes first available. The above mentioned objective function is
the sum of these values over all jobs. The Shortest Remaining Processing Times
(SRPT) heuristic produces a schedule with optimum total flow time (see [12])
when there is a single processor. Unfortunately, this heuristic has the well-known
⋆ Supported by the “Metaheuristics Network”, grant HPRN-CT-1999-00106, and by Swiss National Science Foundation project 20-63733.00/1, “Resource Allocation and Scheduling in Flexible Manufacturing Systems”.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 49–60, 2003.
© Springer-Verlag Berlin Heidelberg 2003
drawback that it leads to starvation. That is, some jobs may be delayed to an
unbounded extent. Inducing starvation is an inherent property of the total flow
time metric. In particular, there exist inputs where any optimal schedule for
total flow time forces the starvation of some job (see Lemma 2.1 in [2]). This
property is undesirable.
From the discussion above, it is natural to conclude that in order to avoid
starvation, one should bound the flow time of each job. This motivates the study
of the minimization of the maximum flow time.
Problems: We address three basic types of parallel machine models. In each
there are n jobs J1 , ..., Jn to be scheduled on m machines M1 , ..., Mm . Each
machine can process at most one job at a time, and each job must be processed
in an uninterrupted fashion on one of the machines. We will also consider the
preemptive case, in which a job may be interrupted on one machine and continued
later (possibly on another machine) without penalty. Job Jj (j = 1, ..., n) is
released at time rj ≥ 0 and cannot start processing before that time. In the
most general setting, the machines are unrelated: job Jj takes pij = pj /sij time
units when processed by machine Mi , where pj is the processing requirement
of job Jj and sij is the speed of machine Mi for job Jj . If the machines are
uniformly related, then each machine Mi runs at a given speed si for all jobs.
Finally, for identical machines, we assume that si = 1 for each machine Mi .
We denote the completion time of job Jj in a schedule S by C_j^S , or by Cj if no
confusion is possible. The flow time of job Jj is defined as Fj = Cj − rj , and the
maximum flow time Fmax is maxj=1,...,n Fj . We seek to minimize the maximum
flow time.
In the off-line version of the problem, it is assumed that the scheduler has full
information of the problem instance. By contrast, in the on-line version of the
problem, jobs are introduced to the algorithm at their release times. Thus, the
algorithm bases its decision only upon information related to already released
jobs. In the on-line paradigm, we distinguish between the clairvoyant and the non-clairvoyant model. In the clairvoyant model we assume that once a job is known
to the scheduler, its processing time is also known. In the non-clairvoyant model
the processing time of a job is unknown until its processing is completed.
Previous Work: To the best of our knowledge, the only known result about
the non-preemptive max flow time scheduling problem is due to Bender et al.
[2]. They address the on-line non-preemptive problem with identical parallel
machines (in the notation of Graham et al. [6], this problem is denoted P |on-line; rj |Fmax ). In [2] they claim that the First In First Out (FIFO) heuristic
(that is, scheduling each job, in order of arrival, on the machine on which it
will finish first) is a (3 − 2/m)-competitive algorithm1 .
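The FIFO rule just described can be sketched as follows for identical machines (a non-preemptive sketch; the encoding of jobs as (release, processing) pairs is illustrative):

```python
def fifo_max_flow_time(jobs, m):
    """Schedule jobs (pairs (release, processing)) in FIFO order, each on the
    machine on which it would finish first; return the maximum flow time."""
    free = [0] * m                         # time at which each machine becomes idle
    fmax = 0
    for r, p in sorted(jobs):              # jobs in order of release times
        i = min(range(m), key=lambda k: max(free[k], r) + p)
        completion = max(free[i], r) + p
        free[i] = completion
        fmax = max(fmax, completion - r)   # flow time F_j = C_j - r_j
    return fmax

# two machines, three jobs: (0, 3), (2, 1), (3, 3)
print(fifo_max_flow_time([(0, 3), (2, 1), (3, 3)], 2))   # prints 3
```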
When preemption is allowed, in each of the three types of parallel models,
we observe that there are polynomial-time off-line algorithms for finding optimal
1 A ρ-competitive algorithm is an on-line algorithm that finds a solution within a ρ factor of the optimum.
preemptive solutions: these are obtained by adapting the approaches proposed
in [14,15] for the preemptive parallel machines problems with release times and
deadlines. In [14,15] the objective function is the minimization of the maximum
lateness Lmax = max Lj , where Lj is the lateness of job Jj , that is, the completion
time of Jj minus its deadline (the time by which job Jj must be completed).
We can use the algorithms in [14,15] for the preemptive maximum flow time
minimization by setting the deadline of each job equal to its release time.
When the jobs’ release times are identical, the problem reduces to the classical
makespan minimization problem. In this case the three types of parallel machine
models have been studied extensively (see [3,6,12,18] for surveys). Here, we only
mention that these related scheduling problems are all strongly NP-hard [5],
and polynomial time approximation schemes2 (PTAS) are known when the machines are either identical or uniformly related [7,8]. For unrelated machines,
Lenstra, Shmoys and Tardos [16] gave a polynomial-time 2-approximation algorithm for this problem; and this is the currently known best approximation ratio
achieved in polynomial time. They also proved that for any positive ε < 1/2, no
polynomial-time (1 + ε)-approximation algorithm exists, unless P=NP. Since the
problem is NP-hard even for m = 2, it is natural to ask how well the optimum
can be approximated when there is only a constant number of machines. In contrast to the previously mentioned inapproximability result for the general case,
there exists a fully polynomial-time approximation scheme for the problem when
m is fixed. Horowitz and Sahni [10] proved that for any ε > 0, an ε-approximate
solution can be computed in O(nm(nm/ε)m−1 ) time, which is polynomial in
both n and 1/ε if m is constant. Recently, Jansen and Porkolab [11] presented a
fully polynomial time approximation scheme for the problem whose running time
is linear in the number of jobs; this result was later improved by Fishkin, Jansen
and Mastrolilli [4].
Note that, as the makespan problem is a special case of the max flow time
problem, all the mentioned negative results hold also for the problems addressed
in this paper.
Our Results: In this paper, we investigate the max flow time problem in the
off-line and on-line setting. We prove positive and negative theoretical results.
In the off-line setting, we address the unrelated parallel machines model
(Section 2.1) and present, when the number m of machines is fixed, the first
known fully polynomial time approximation scheme (FPTAS). Observe that no
polynomial time approximation scheme is possible when the number of machines
is part of the input [16], unless P=NP. Therefore, for fixed m obtaining a FPTAS
is to some extent the strongest possible result.
In the on-line setting and when the machines are identical, we analyze the
(non-preemptive) FIFO heuristic when preemption is allowed (denoted P |on-line; pmtn; rj |Fmax according to Graham et al. [6]). Bender et al. [2] claimed that
2 Algorithms that, for any fixed ε > 0, find a solution within a (1 + ε) factor of the optimum in polynomial time. If the running time is bounded by a polynomial in the input size and 1/ε, then these algorithms are called fully polynomial time approximation schemes (FPTAS).
this strategy is a (3 − 2/m)-competitive algorithm for the non-preemptive problem. We show (Section 3.1) that FIFO comes within the same bound of the optimal preemptive schedule length. Since FIFO does not depend on the sizes of the
jobs, it is also an on-line non-clairvoyant algorithm with a (3−2/m)-competitive
ratio. In Section 3.2 we show that no 1-competitive (optimal) on-line algorithm
is possible for the preemptive problem (P |on-line; pmtn; rj |Fmax ). This result
should be contrasted with the related problem P |on-line; pmtn; rj |Cmax (i.e.,
the same problem with makespan as objective function) that admits an optimal
on-line algorithm [9]. In Section 3.3, we show that in the non-clairvoyant model
the competitive ratio cannot be better than 2. This proves that the competitive
ratio of FIFO matches the lower bound when m = 2. Finally, in Section 3.4
we address the problem with uniformly related parallel machines and identical
processing times (noted as Q|on-line; pj = p; rj |Fmax according to [6]). We show
that in this case FIFO is 1-competitive (optimal).
Due to the page limit, several proofs had to be omitted from this version of the paper. A complete version of the paper is available (http://www.idsia.ch/˜monaldo/research papers.html).
2 Offline Max Flow Time

2.1 A FPTAS for Unrelated Parallel Machines
In this section we consider the off-line problem of scheduling a set J =
{J1 , ..., Jn } of n independent jobs on a set M = {M1 , ..., Mm } of m unrelated
parallel machines. We present a FPTAS when the number m of machines is
a constant. Our approach consists of partitioning the set of jobs into blocks
B(1), B(2), ..., such that jobs belonging to any block can be scheduled regardless
of jobs belonging to other blocks (Separation Property). The FPTAS follows by
presenting a (1 + ε)-approximation algorithm for each block of jobs.
Separation Property. Let pj = mini=1,...,m pij denote the smallest processing
time of job Jj . Let R = {r(1), r(2), ..., r(ρ)} be the set of all release dates (ρ ≤ n
is the number of different release values). Assume, without loss of generality,
that r(1) < r(2) < ... < r(ρ). Set r(ρ + 1) = ∞. Partition jobs according to their
release times and let N (i) = {Jj : rj = r(i)}, i = 1, ..., ρ, denote the set of jobs
released at time r(i). Finally, let PN (i) be the sum of the smallest processing
times of jobs from N (i), i.e., PN (i) = Σ_{Jj ∈N (i)} pj .
Block Definition. The first block B(1) is defined as follows. If r(1)+PN (1) ≤ r(2)
then B(1) = N (1). Otherwise, if r(1) + PN (1) + PN (2) ≤ r(3) then B(1) =
N (1) ∪ N (2), else continue similarly. More formally,

B(1) = ∪_{i=1,..,b1} N (i)

where b1 is the smallest positive integer such that

r(1) + Σ_{i=1,..,b1} PN (i) ≤ r(b1 + 1).
Therefore if a job belongs to B(1) then it could be completed not later than
time r(b1 + 1) (by assigning jobs to the machines with the smallest processing
requirements).
Other possible blocks are obtained in a similar way: if r(b1 + 1) ≤ r(ρ) then
discard all jobs from B(1) and apply a similar procedure to obtain the next block
B(2). More formally, for w = 2, 3, ..., the w-th block is defined as

B(w) = ∪_{i=b_{w−1}+1,..,b_w} N (i)

where bw is the smallest positive integer such that

r(b_{w−1} + 1) + Σ_{i=b_{w−1}+1,..,b_w} PN (i) ≤ r(bw + 1).
In the following, let us use β to denote the number of blocks. By definition,
observe that bβ = ρ.
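The block construction above can be sketched directly from the definition (a sketch; jobs are encoded as pairs (rj , pj ) with pj the smallest processing time of Jj , as defined earlier):

```python
def partition_into_blocks(jobs):
    """Return the blocks as lists of release times r(i): a block closes at the
    first b such that r(start) + sum of the P_N(i) so far is at most r(b + 1)."""
    releases = sorted({r for r, _ in jobs})
    P = {r: sum(p for rj, p in jobs if rj == r) for r in releases}   # P_N(i)
    bounds = releases + [float("inf")]                               # r(rho + 1) = inf
    blocks, i = [], 0
    while i < len(releases):
        start, total, j = bounds[i], 0, i
        while True:
            total += P[bounds[j]]
            if start + total <= bounds[j + 1]:    # the defining inequality
                break
            j += 1
        blocks.append(releases[i:j + 1])
        i = j + 1
    return blocks

# Example 1 below: releases 0, 2, 3 with smallest processing times 3, 1, 1
print(partition_into_blocks([(0, 3), (2, 1), (3, 1)]))   # [[0, 2, 3]]: one block
```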
Block Property. Let rB(i) be the earliest release time of jobs from block B(i),
i.e., rB(i) = min_{Jj ∈B(i)} rj , and PB(i) = Σ_{Jj ∈B(i)} pj . Formally, we claim that
jobs belonging to any block can be scheduled regardless of jobs belonging to
jobs belonging to any block can be scheduled regardless of jobs belonging to
other blocks. A sufficient condition to have this separation property would be
that in any ‘good’ (optimal or approximate) solution all jobs from block B(i)
(i = 1, ..., β) could be scheduled between time rB(i) and rB(i) + PB(i) . However,
this is not always true for this problem, as Example 1 shows.
Example 1. Consider an instance with 3 jobs and 2 machines. The data are
reported in the table of Figure 1.
In this example we have only one block B(1) and rB(1) + PB(1) = 5. Figure 1
shows an optimal solution (F*max = 3) in which the last job completes at
time 6 (> rB(1) + PB(1) ).
j | rj | p1j | p2j
1 | 0 | 3 | 10
2 | 2 | 1 | 3
3 | 3 | 3 | 1

[Figure: an optimal schedule of J1 , J2 , J3 on machines M1 and M2 .]
Fig. 1. Block example
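Example 1 can be verified by brute force over the 2³ machine assignments, processing the jobs on each machine in release order (which, as noted later in the paper, is optimal for max flow time on a single machine); a small sketch:

```python
from itertools import product

# (r_j, (p_1j, p_2j)) for the three jobs of Figure 1
jobs = [(0, (3, 10)), (2, (1, 3)), (3, (3, 1))]

best = float("inf")
for assignment in product(range(2), repeat=len(jobs)):
    free, fmax = [0, 0], 0
    for (r, p), i in zip(jobs, assignment):    # jobs in release order
        completion = max(free[i], r) + p[i]
        free[i] = completion
        fmax = max(fmax, completion - r)
    best = min(best, fmax)

print(best)   # 3, matching the optimal F*max of the example
```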
54
M. Mastrolilli
We overcome the previous problem by showing that there always exists at
least one ‘good’ (optimal or approximate) solution in which all jobs from block
B(i) (i = 1, ..., β) are scheduled between time rB(i) and rB(i) + PB(i) . We prove
this by exhibiting an algorithm which transforms any solution into another solution with the desired separation property. Moreover, the objective function value
of the new solution is not worse than the previous one.
Separation Algorithm. Assume that we have a solution SOL of value Fmax in
which jobs from different blocks are not scheduled separately. Then there exists
at least one block, say B(w), in which the last job of B(w) completes after time
rB(w) + PB(w) . For those blocks B(w), and starting with the block with the
lowest index w, we show how to reschedule jobs from B(w) such that they are
completed within time rB(w) + PB(w) , and without worsening the solution value.
Let C(i) denote the time all jobs from N (i) are completed according to solution
SOL, i.e., the time the last job from N (i) completes. Observe that Fmax =
max_i (C(i) − r(i)). Recall the block definition B(w) = ∪_{i=b_{w−1}+1,..,b_w} N (i), and
let N (l) ⊆ B(w) be the last released group of jobs such that

C(l) ≤ rB(w) + Σ_{i=b_{w−1}+1,...,l} PN (i) .

By construction we have

C(x) > rB(w) + Σ_{i=b_{w−1}+1,...,x} PN (i) , for x = l + 1, ..., bw .
Now remove from SOL all jobs belonging to N (l + 1) ∪ ... ∪ N (bw ) and reschedule
them in order of non-decreasing release times and on the machine requiring the
lowest processing time. We claim that according to the new solution SOL′ the
completion time C ′ (i) of every class N (i) is not increased, i.e. C ′ (i) ≤ C(i) for
i = l + 1, ..., bw , and all jobs from B(w) are completed by time rB(w) + PB(w) .
Indeed, the new completion time C ′ (l + 1) of jobs from N (l + 1) is bounded by
C(l) + PN (l+1) , which is at most rB(w) + Σ_{i=b_{w−1}+1,...,l+1} PN (i) and, by construction,
less than C(l + 1). More generally, this property holds for every set N (x + 1)
with x = l + 1, ..., bw , i.e.

C ′ (x + 1) ≤ C ′ (x) + PN (x+1) ≤ rB(w) + Σ_{i=b_{w−1}+1,...,x+1} PN (i) < C(x + 1).

It follows that in solution SOL′ every job from N (x) (⊆ B(w)) is completed
within time rB(w) + Σ_{i=b_{w−1}+1,...,x} PN (i) and therefore every job from B(w) is
completed by time rB(w) + PB(w) . Moreover the maximum flow time F′max of the
new solution is not increased since F′max = max_i (C ′ (i) − r(i)) ≤ max_i (C(i) −
r(i)) = Fmax .
Scheduling to Minimize Max Flow Time: Offline and Online Algorithms
55
Lemma 1. Without increasing the maximum flow time, any given solution can
be transformed into a new feasible solution having all jobs from block B(w) (w =
1, ..., β) scheduled between time rB(w) and rB(w) + PB(w) .
Block Approximation. By Lemma 1 a (1 + ε)-approximate solution can be
obtained as follows: starting from the first block, compute a (1 + ε)-approximate
schedule for each block B(w) that starts at time rB(w) and completes by time
rB(w) + PB(w) , i.e., not later than the earliest starting time of the next block
B(w + 1) of jobs. A (1 + ε)-approximate solution can be computed in polynomial
time if there exists a polynomial time (1 + ε)-approximation algorithm for each
block of jobs.
By previous arguments, we focus our attention on a single block of jobs and
assume, without loss of generality, that the input instance is given by this set of
jobs. For simplicity of notation we again use n to denote the number of jobs in
the block instance and {J1 , ..., Jn } the set of jobs. Moreover, we assume, without
loss of generality, that the earliest release date is zero, i.e., minj rj = 0.
Observe that pmax = maxj pj is a lower bound for the minimum objective
value F*max , i.e., F*max ≥ pmax . By the block definition, Lemma 1, and since
minj rj = 0, all jobs can be completed by time Σ_{j=1}^n pj ≤ npmax . Moreover,
any solution that completes within time npmax has a maximum flow time that
cannot be larger than npmax . Therefore, the optimal objective value F*max can
be bounded as follows: pmax ≤ F*max ≤ npmax . Without loss of generality, we
restrict our attention to finding those solutions with maximum flow time at most
npmax . Therefore we can discard all solutions whose last job completes later than
2npmax , since all solutions with greater length have a maximum flow time larger
than npmax . Similarly, we will implicitly assume that job Jj cannot be scheduled
on those machines Mi with pij > npmax , since otherwise the resulting schedule
would have a maximum flow time larger than npmax .
In the following we show how to compute a (1 + ε)-approximate solution in
which the last job completes not later than 2npmax . By Lemma 1, this solution
can always be transformed into a (1 + ε)-approximate solution with the last job
completing not later than Σ_{j=1}^n pj .
The (1 + ε)-approximation algorithm is structured in the following three
steps.
1. Round input values.
2. Find an optimal solution of the rounded instance.
3. Unround values.
We will first describe step 2, then step 1 with its “inverse” step 3.
An Optimal Algorithm. We start by making some observations regarding the maximum flow time of a schedule. First renumber the jobs such that r1 ≤ r2 ≤ ... ≤ rn
holds. A simple job interchange argument shows that for a single machine, the
maximum flow time is minimized if the jobs are processed in a non-decreasing
order of release times. This property was first observed by Bender et al. [2].
56
M. Mastrolilli
We may view any m-machine schedule as an assignment of the set of jobs
to machines with jobs assigned to machine Mi being processed in increasing
order of index. Consequently given an assignment the max flow time is easily
computed. We are interested in obtaining an assignment which minimizes Fmax .
Thus we may regard assignment and schedule as synonymous.
A completion configuration c is a m-dimensional vector c =(c1 , ..., cm ): ci
denotes the completion time of machine Mi , for i = 1, ..., m. A partial schedule
σk is an assignment of the jobs J1 , ..., Jk to machines. A completion schedule ωk
is an assignment of the remaining jobs Jk+1 , ..., Jn to machines. Consider two
partial schedules σk1 and σk2 such that according to σk1 the last job on machine
Mi (for i = 1, ..., m) completes not later than the last job scheduled on the same
machine Mi according to σk2 ; moreover the maximum flow time of σk1 is not larger
than that of σk2 . If this happens we say that σk1 dominates σk2 . It is easy to check
that, whatever the completion schedule ωk , the schedule obtained by combining
the assignment of jobs in σk1 with ωk cannot be worse than that attainable with σk2
and ωk . Therefore, with no loss, we can discard all dominated partial schedules.
The reason is that by adding the remaining jobs Jk+1 , ..., Jn in order of increasing
index, the completion time of the current job Jj (j = k + 1, ..., n) is a monotone
non-decreasing function of the completion times of machines before scheduling
Jj (and does not depend on how J1 , ..., Jj−1 are really scheduled). Therefore
if jobs J1 , ..., Jk are scheduled according to σk1 then the maximum flow time of
jobs Jk+1 , ..., Jn , when scheduled according to any ωk , cannot be larger than
the maximum flow time of the same set of jobs when J1 , ..., Jk are scheduled
according to σk2 .
We encode a feasible schedule s by a (m + 1)-dimensional vector
s = (c1 , ..., cm , F ), where (c1 , ..., cm ) is a completion configuration and F is the
maximum flow time in s. We say that schedule s1 = (c′1 , ..., c′m , F ′ ) dominates
s2 = (c′′1 , ..., c′′m , F ′′ ) if c′i ≤ c′′i , for i = 1, ..., m, and F ′ ≤ F ′′ . Moreover, since
F*max ≤ npmax we classify as dominated all those schedules s = (c1 , ..., cm , F ) with
F > npmax . The latter implies ci ≤ 2npmax (i = 1, ..., m) in any non-dominated
schedule.
For every s = (c1 , ..., cm , F ), let us define the operator ⊕ as follows:

s ⊕ pij = (c1 , ..., c′i , ..., cm , F ′ )

where

c′i = ci + pij if rj ≤ ci , and c′i = rj + pij otherwise,

and F ′ = max {F ; c′i − rj }.
The following dynamic programming algorithm computes the optimal solution:
Scheduling to Minimize Max Flow Time: Offline and Online Algorithms
57
Algorithm OPT-Fmax
1. Initialization: L0 ← {(c1 = 0, ..., cm = 0, 0)}
2. For j = 1 to n
3.     For i = 1 to m
4.         For every vector s ∈ Lj−1 put vector s ⊕ pij in Lj
5.     Discard from Lj all dominated schedules
6. Output: return the vector (c1 , ..., cm , F ) ∈ Ln with minimum F
At line 4, the algorithm schedules job Jj at the end of machine Mi . At line
5, all dominated partial schedules are discarded.
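A direct sketch of this dominance-based dynamic program, using the ⊕ operator defined above (the pairwise dominance filter is quadratic, which suffices for small instances; the test instance is the one of Example 1):

```python
def opt_fmax(r, p):
    """r: release times sorted non-decreasingly; p[i][j]: processing time of
    job Jj on machine Mi. Returns the optimal maximum flow time."""
    m, n = len(p), len(r)
    states = {((0,) * m, 0)}                      # L0: the empty schedule
    for j in range(n):
        candidates = set()
        for c, F in states:                       # try Jj on each machine Mi
            for i in range(m):
                ci = (c[i] if r[j] <= c[i] else r[j]) + p[i][j]   # the oplus operator
                candidates.add((c[:i] + (ci,) + c[i + 1:], max(F, ci - r[j])))
        # discard dominated schedules: s is dominated if some other state is
        # componentwise no larger in every completion time and in F
        states = {s for s in candidates
                  if not any(t != s and all(a <= b for a, b in zip(t[0] + (t[1],),
                                                                  s[0] + (s[1],)))
                             for t in candidates)}
    return min(F for _, F in states)

print(opt_fmax([0, 2, 3], [[3, 1, 3], [10, 3, 1]]))   # 3 for the Example 1 instance
```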
The total running time of the dynamic program is O(nmD), where D is the
maximum number of non-dominated schedules at steps 4 and 5. Let δ be the
maximum number of different values that each machine completion time ci can
take in any non-dominated schedule. The reader should have no difficulty in
bounding D by O(δ^m ). Therefore, the described algorithm is, for every fixed m, a
polynomial time algorithm iff δ is polynomial in n and 1/ε. The next subsection
shows how to transform any given instance such that the latter happens.
Rounding and Unrounding Jobs. Let ε > 0 be an arbitrarily small rational number
and assume, for simplicity, that 1/ε is an integral value. The first step is to round
down every processing and release time to the nearest lower value of (εpmax /2n)·i,
for i = 0, 1, . . . , 2n²/ε; clearly this does not increase the objective function value.
Note that the largest release time rn is not greater than npmax since all jobs can
be completed by that time. Then, find the optimal solution SOL of the resulting
instance by using the dynamic programming approach described in the previous
subsection. Observe that, since in every non-dominated schedule the completion
time ci of any machine Mi cannot be larger than 2npmax , the maximum
number δ of different values of ci is now bounded by 1 + (2npmax )/(εpmax /2n) =
1 + 4n²/ε, i.e., polynomial in n and 1/ε.
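The rounding step can be sketched as follows (a sketch; recall pmax = maxj pj with pj = mini pij , as defined earlier):

```python
import math

def round_instance(r, p, eps):
    """Round every release and processing time down to the nearest multiple of
    the grid eps * pmax / (2n)."""
    n = len(r)
    pmax = max(min(p[i][j] for i in range(len(p))) for j in range(n))
    grid = eps * pmax / (2 * n)
    down = lambda v: math.floor(v / grid) * grid
    return [down(rj) for rj in r], [[down(pij) for pij in row] for row in p], grid

# Example 1 instance: pmax = 3, so for eps = 0.5 the grid is 0.5 * 3 / 6 = 0.25;
# all input values are already multiples of 0.25 and are left unchanged
r2, p2, grid = round_instance([0, 2, 3], [[3, 1, 3], [10, 3, 1]], eps=0.5)
```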
Solution SOL can be easily modified to be a feasible solution also for the original instance. First, delay the starting time of each job by εpmax/(2n) (this is sufficient to guarantee that no job starts before its original release date); the completion time of each job may increase by at most εpmax/(2n). Second, replace the rounded processing values with the originals; now the completion time of each job may increase by at most εpmax/2 (here we are using the fact that each machine processes at most n jobs, and that each processing time may increase by at most εpmax/(2n)). Therefore, we may potentially increase the maximum flow time of SOL by at most εpmax/2 + εpmax/(2n) ≤ εF*max. This results in a (1 + ε)-approximate solution for the block instance.
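The rounding step can be illustrated concretely. A minimal Python sketch (function name and representation are ours): every time is rounded down to a multiple of the grid g = εpmax/(2n), so each completion time, being at most 2npmax, takes at most 1 + 4n²/ε distinct values.

```python
import math

def round_instance(p, r, eps):
    """Round every processing and release time down to the nearest multiple
    of g = eps * pmax / (2n), as in the rounding step above (a sketch)."""
    n = len(r)
    pmax = max(max(row) for row in p)
    g = eps * pmax / (2 * n)                    # grid width eps*pmax/(2n)
    rp = [[math.floor(x / g) * g for x in row] for row in p]
    rr = [math.floor(x / g) * g for x in r]
    return rp, rr, g
```

With eps = 1/2 the rounded completion times can take at most 1 + 4n²/eps values, which is the bound used for δ above.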
The total running time of the described FPTAS is determined by the dynamic programming algorithm, that is, O(nm(n²/ε)^m).

Theorem 1. For the problem of minimizing the maximum flow time in scheduling n jobs on m unrelated machines (m fixed), there exists a fully polynomial time approximation scheme that runs in O(nm(n²/ε)^m) time.
M. Mastrolilli

3 Online Max Flow Time

3.1 Analysis of FIFO for P|on-line; pmtn; rj|Fmax
In this section we analyze the FIFO heuristic in the identical machines model when preemption is allowed. Bender et al. [2] claimed that this strategy is a (3 − 2/m)-competitive algorithm for nonpreemptive scheduling. We show that FIFO (which is non-preemptive) comes within the same bound of the optimal preemptive schedule. Since FIFO does not depend on the sizes of the jobs, it is also an on-line non-clairvoyant algorithm with a (3 − 2/m)-competitive ratio.
In Section 3.2 we will show that no 1-competitive (optimal) on-line algorithm is
possible.
Lower Bounds. First observe that pmax = maxj pj is a lower bound for the minimum objective value F*max, i.e., F*max ≥ pmax. In the following we provide a second lower bound.
Consider a relaxed version of the problem in which a job Jj can be processed by more than one machine simultaneously, without changing the total processing time pj that Jj spends on the machines. Let us call this relaxed version the fractional problem. Clearly the optimal value F°max of the fractional problem cannot be larger than F*max, the optimal preemptive max flow time.
Now, recall the definitions given in Subsection 2.1 and, without loss of generality, renumber the jobs J1, J2, . . . , Jn such that r1 ≤ r2 ≤ . . . ≤ rn. Consider the following rule, which we call fractional FIFO: schedule jobs in order of increasing index, assigning pj/m time units of job Jj (j = 1, . . . , n) to each machine.
Lemma 2. The optimal solution of the fractional problem can be obtained by
using the fractional FIFO.
Now according to the fractional FIFO, let the fractional load ℓ(i) at time r(i) be defined as the total sum of processing times of jobs that at time r(i) have been released but not yet finished. More formally, we have

ℓ(1) = PN(1),
ℓ(i + 1) = PN(i+1) + max{ℓ(i) − m(r(i + 1) − r(i)); 0}.

By Lemma 2, the maximum flow time FN(i) of jobs from N(i) is the time required to process all jobs that at time r(i) have been released but not yet finished, i.e. FN(i) = ℓ(i)/m. The optimal solution value F°max of the fractional solution is therefore equal to ℓmax/m = (1/m) max_{i=1,...,ρ} ℓ(i). We will refer to this value ℓmax as the maximal fractional load over time. Since the optimal solution value F*max of our original preemptive problem cannot be smaller than F°max, we have the following lower bounds

    F*max ≥ max{ℓmax/m ; pmax}.    (1)
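The recurrence for ℓ(i) and the lower bound (1) can be evaluated directly. A Python sketch under our own naming (p and r are lists of processing times and release dates on m identical machines):

```python
def fractional_lower_bound(p, r, m):
    """Lower bound (1): max(l_max / m, pmax). The fractional load l(i) follows
    the recurrence l(i+1) = P_{N(i+1)} + max{l(i) - m*(r(i+1) - r(i)), 0}."""
    times = sorted(set(r))                       # distinct release dates r(i)
    P = {t: sum(pj for pj, rj in zip(p, r) if rj == t) for t in times}
    load, lmax, prev = 0.0, 0.0, None
    for t in times:
        if prev is not None:
            # the m machines worked off m*(t - prev) units since r(i)
            load = max(load - m * (t - prev), 0)
        load += P[t]                             # newly released work P_{N(i)}
        lmax = max(lmax, load)
        prev = t
    return max(lmax / m, max(p))                 # inequality (1)
```

For three jobs of length 4 released together on two machines, ℓmax = 12, so the bound is max(12/2, 4) = 6.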
Scheduling to Minimize Max Flow Time: Offline and Online Algorithms
Analysis of FIFO. We start by showing that FIFO delivers a schedule whose maximum flow time is within ℓmax/m + 2(1 − 1/m)pmax.

Lemma 3. FIFO returns a solution with maximum flow bounded by

    ℓmax/m + 2(1 − 1/m)pmax.
By Lemma 3 and inequality (1) it follows that FIFO is a (3 − 2/m)-competitive algorithm. Moreover, this bound is tight.

Theorem 2. FIFO is a (3 − 2/m)-competitive algorithm for P|on-line; pmtn; rj|Fmax and this bound is tight.
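FIFO itself is easy to simulate, which lets one check Lemma 3 on concrete instances. The sketch below is ours; it reads FIFO as "each job, in release order, goes to the machine that falls idle first", which is one common interpretation, not a definition taken from the paper.

```python
import heapq

def fifo_max_flow_time(p, r, m):
    """Non-preemptive FIFO on m identical machines: jobs in release order go
    to the first machine that becomes available (a sketch)."""
    jobs = sorted(zip(r, p))                 # FIFO = order of release dates
    free = [0.0] * m                         # heap of machine-available times
    heapq.heapify(free)
    fmax = 0.0
    for rj, pj in jobs:
        t = heapq.heappop(free)
        start = max(t, rj)                   # wait for release and for machine
        heapq.heappush(free, start + pj)
        fmax = max(fmax, start + pj - rj)    # flow time of this job
    return fmax
```

Comparing this value against the lower bound (1) on random instances is a quick empirical check of the (3 − 2/m) ratio.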
3.2 A Lower Bound for P|on-line; pmtn; rj|Fmax

We show that no on-line preemptive algorithm can be 1-competitive.

Theorem 3. The competitive ratio of any deterministic algorithm for P|on-line; pmtn; rj|Fmax is at least 1 + 1/14.
This result should be contrasted with the related problem P |on-line; pmtn;
rj |Cmax (i.e., the same problem with makespan as objective function) that admits
an optimal on-line algorithm [9]. Moreover, we already observed that in the offline setting the problem can be solved optimally in polynomial time by adapting
the algorithm described in [14,15].
3.3 A Lower Bound for P|on-line-nclv; rj|Fmax

When job processing times are known at their arrival dates (clairvoyant model), Bender et al. [2] observed a simple lower bound of 3/2 on the competitive ratio of any on-line deterministic algorithm. In the following we show that in the non-clairvoyant model the competitive ratio cannot be better than 2. This shows that the competitive ratio of FIFO matches the lower bound when m = 2.

Theorem 4. The competitive ratio of any deterministic algorithm for P|on-line-nclv; rj|Fmax is at least 2.
3.4 Analysis of FIFO for Q|on-line; pj = p; rj|Fmax

We address the problem with identical and uniformly related parallel machines. We assume that the processing times of jobs are identical. A simple analysis shows that FIFO is optimal.

Theorem 5. FIFO is 1-competitive for Q|on-line; pj = p; rj|Fmax.
References
1. B. Awerbuch, Y. Azar, S. Leonardi, and O. Regev. Minimizing the flow time without migration. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC'99), pages 198–205, 1999.
2. M. A. Bender, S. Chakrabarti, and S. Muthukrishnan. Flow and stretch metrics for
scheduling continuous job streams. In Proceedings of the 9th Annual ACM-SIAM
Symposium on Discrete Algorithms (SODA’98), pages 270–279, 1998.
3. B. Chen, C. Potts, and G. Woeginger. A review of machine scheduling: Complexity,
algorithms and approximability. Handbook of Combinatorial Optimization, 3:21–
169, 1998.
4. A. Fishkin, K. Jansen, and M. Mastrolilli. Grouping techniques for scheduling
problems: simpler and faster. In 9th Annual European Symposium on Algorithms
(ESA’01), volume LNCS 2161, pages 206–217, 2001.
5. M. R. Garey and D. S. Johnson. Computers and intractability; a guide to the theory
of NP-completeness. W.H. Freeman, 1979.
6. R. Graham, E. Lawler, J. Lenstra, and A. H. G. Rinnooy Kan. Optimization and approximation in deterministic sequencing and scheduling: A survey. Annals of Discrete Mathematics, 5:287–326. North-Holland, 1979.
7. D. Hochbaum and D. Shmoys. Using dual approximation algorithms for scheduling
problems: theoretical and practical results. Journal of the ACM, 34:144–162, 1987.
8. D. Hochbaum and D. Shmoys. A polynomial approximation scheme for machine
scheduling on uniform processors: Using the dual approximation approach. SIAM
J. on Computing, 17:539–551, 1988.
9. K. S. Hong and J. Y.-T. Leung. On-line scheduling of real-time tasks. IEEE Transactions on Computers, 41:1326–1331, 1992.
10. E. Horowitz and S. Sahni. Exact and approximate algorithms for scheduling nonidentical processors. Journal of the ACM, 23(2):317–327, 1976.
11. K. Jansen and L. Porkolab. Improved approximation schemes for scheduling unrelated parallel machines. In Proceedings of the 31st Annual ACM Symposium on
the Theory of Computing, pages 408–417, 1999.
12. D. Karger, C. Stein, and J. Wein. Scheduling algorithms. In M. J. Atallah, editor,
Handbook of Algorithms and Theory of Computation. CRC Press, 1997.
13. H. Kellerer, T. Tautenhahn, and G. J. Woeginger. Approximability and nonapproximability results for minimizing total flow time on a single machine. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC'96), pages 418–426, 1996.
14. J. Labetoulle, E. L. Lawler, J. K. Lenstra, and A. H. G. R. Kan. Preemptive
scheduling of uniform machines subject to release dates. In W. R. Pulleyblank,
editor, Progress in Combinatorial Optimization, pages 245–261. Academic Press,
1984.
15. E. Lawler and J. Labetoulle. On preemptive scheduling of unrelated parallel processors by linear programming. Journal of the ACM, 25:612–619, 1978.
16. J. K. Lenstra, D. B. Shmoys, and E. Tardos. Approximation algorithms for scheduling unrelated parallel machines. Mathematical Programming, 46:259–271, 1990.
17. S. Leonardi and D. Raz. Approximating total flow time on parallel machines. In Proceedings of the 29th Annual ACM Symposium on the Theory of Computing (STOC'97), pages 110–119, 1997.
18. J. Sgall. On-line scheduling – a survey. In A. Fiat and G. Woeginger, editors, On-Line Algorithms, Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1997.
Linear Time Algorithms for Some NP-Complete
Problems on (P5 ,Gem)-Free Graphs
(Extended Abstract)
Hans Bodlaender¹, Andreas Brandstädt², Dieter Kratsch³,
Michaël Rao³, and Jeremy Spinrad⁴

¹ Institute of Information and Computing Sciences, Utrecht University
P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
hansb@cs.uu.nl
² Fachbereich Informatik, Universität Rostock
A.-Einstein-Str. 21, 18051 Rostock, Germany
ab@informatik.uni-rostock.de
³ Université de Metz, Laboratoire d'Informatique Théorique et Appliquée
57045 Metz Cedex 01, France
fax: ++ 00 33 387315309
{kratsch,rao}@sciences.univ-metz.fr
⁴ Department of Electrical Engineering and Computer Science
Vanderbilt University, Nashville TN 37235, U.S.A.
spin@vuse.vanderbilt.edu
Abstract. A graph is (P5,gem)-free when it contains neither P5 (an induced path on five vertices) nor a gem (a graph formed by making a universal vertex adjacent to each of the four vertices of the induced path P4) as an induced subgraph.
Using a characterization of (P5 ,gem)-free graphs by their prime graphs with
respect to modular decomposition and their modular decomposition trees [6],
we obtain linear time algorithms for the following NP-complete problems
on (P5 ,gem)-free graphs: Minimum Coloring, Maximum Weight Stable Set,
Maximum Weight Clique, and Minimum Clique Cover.
Keywords: algorithms, graph algorithms, NP-complete problems, modular decomposition, (P5 ,gem)-free graphs.
1 Introduction
Graph decompositions play an important role in graph theory. The central role of decompositions in the recent proof of one of the major open conjectures in Graph Theory,
the so-called Strong Perfect Graph Conjecture of C. Berge, is an exciting example [9].
Furthermore various decompositions of graphs such as decomposition by clique cutsets,
tree-decomposition and clique-width are often used to design efficient graph algorithms.
There are even beautiful general results stating that a variety of NP-complete graph
problems can be solved in linear time for graphs of bounded treewidth and bounded
clique-width, respectively [1,12].
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 61–72, 2003.
c Springer-Verlag Berlin Heidelberg 2003
H. Bodlaender et al.
Despite the fact that modular decomposition is a well-known decomposition in graph theory whose algorithmic uses seem simple and obvious, there is relatively little research concerning non-trivial uses of modular decomposition, such as designing polynomial time algorithms for NP-complete problems on special graph classes. An important exception is the large number of linear and polynomial time algorithms for cographs [10,11], i.e. P4-free graphs, which are known to have a cotree representation that allows one to solve various NP-complete problems in linear time when restricted to cographs, among them the problems Maximum (Weight) Stable Set, Maximum (Weight) Clique, Minimum Coloring and Minimum Clique Cover [10,11].
The original motivation of the authors of [6] to study (P5,gem)-free graphs, as a natural generalization of cographs, was to construct a faster, possibly linear time algorithm for the Maximum Stable Set problem on (P5,gem)-free graphs. They established a characterization of the (P5,gem)-free graphs by their prime induced subgraphs, called the Structure Theorem for (P5,gem)-free graphs. We show in this paper that the Structure Theorem is a powerful tool for designing efficient algorithms for NP-complete problems on (P5,gem)-free graphs. All our algorithms use the modular decomposition tree of the input graph and the structure of the prime (P5,gem)-free graphs. We are convinced that efficient algorithms for other NP-complete graph problems (e.g. domination problems) on (P5,gem)-free graphs can also be obtained by this approach.
It is remarkable that there are only a few papers establishing efficient algorithms for NP-complete graph problems using modular decomposition, and that most of them consider a single problem, namely Maximum (Weight) Stable Set. For work dealing with other problems we refer to [4,5,18]. Concerning the limits of modular decomposition, it is known, for example, that Achromatic Number, List Coloring, and λ2,1-Coloring with pre-assigned colors remain NP-complete on cographs [2,3,19]. This implies that these three problems are NP-complete on (P5,gem)-free graphs.¹
There is also a strong relation between modular decomposition and the clique-width
of graphs. For example, if all prime graphs of a graph class have bounded size then
this class has bounded clique-width. Problems definable in a certain logic, so-called
LinEMSOL(τ1,L )-definable problems, such as Maximum (Weight) Stable Set, Maximum (Weight) Clique and Minimum (Weight) Dominating Set, can be solved in linear
time on any graph class of bounded clique-width, assuming a k-expression describing the graph is part of the input [12]. Many other NP-complete problems which are
not LinEMSOL(τ1,L )-definable can be solved in polynomial time on graph classes of
bounded clique-width [15,20].
Brandstädt et al. have shown that (P5 ,gem)-free graphs have clique-width at most
five [7]. However this does not yet imply linear time algorithms for LinEMSOL(τ1,L )definable problems on (P5 ,gem)-free graphs, since their approach does not provide a
linear time algorithm to compute a suitable k-expression.
We present a linear time algorithm to solve the NP-complete Minimum Coloring problem on (P5,gem)-free graphs using modular decomposition in Section 5. The NP-complete problems Maximum Weight Stable Set, Maximum Weight Clique and Minimum Clique Cover can also be solved by linear time algorithms using modular decomposition for (P5,gem)-free graphs. Due to space constraints, these algorithms are not shown in this extended abstract.

¹ A proof similar to the one in [3] shows that λ2,1-Coloring is NP-complete for graphs with at most one prime induced subgraph, the P4, and hence for (P5,gem)-free graphs.
2 Preliminaries
We assume the reader to be familiar with standard graph theoretic notation. In this paper, G = (V, E) is a finite undirected graph, with |V| = n and |E| = m. N(v) := {u : u ∈ V, u ≠ v, uv ∈ E} denotes the open neighborhood of v and N[v] := N(v) ∪ {v} the closed neighborhood of v. The complement graph of G is denoted Ḡ = (V, Ē). For U ⊆ V let G[U] denote the subgraph of G induced by U. A graph is co-connected if its complement Ḡ is connected. If, for U ⊂ V, a vertex not in U is adjacent to exactly k vertices in U then it is called a k-vertex for U.

A function f : V → N is a (proper) coloring of the graph G = (V, E) if {u, v} ∈ E implies f(u) ≠ f(v). The chromatic number of G, denoted χ(G), is the smallest k such that the graph G has a k-coloring f : V → {1, 2, . . . , k}.
Let G = (V, E) be a graph with vertex weight function w : V → N. The weight of a vertex set U ⊆ V is defined to be w(U) := Σ_{u∈U} w(u). We let αw(G) denote the maximum weight of a stable set of G and ωw(G) denote the maximum weight of a clique of G. A weighted k-coloring of (G, w) assigns to each vertex v of G w(v) different colors, i.e. integers of {1, 2, . . . , k}, such that {x, y} ∈ E implies that no color assigned to x is equal to a color assigned to y. χw(G) denotes the smallest k such that the graph G with weight function w has a weighted k-coloring. Note that each weighted k-coloring of (G, w) corresponds to a multiset S1, S2, . . . , Sk of stable sets of G where Si, i ∈ {1, 2, . . . , k}, is the set of all vertices of G to which color i is assigned.
3 Modular Decomposition
Modular decomposition is a fundamental decomposition technique that can be applied to
graphs, partially ordered sets, hypergraphs and other structures. It has been described and
used under different names and it has been rediscovered various times. Gallai introduced
and studied modular decomposition in his seminal 1967 paper [17] where it is used to
decompose comparability graphs.
A vertex set M ⊆ V is a module in G if for all vertices x ∈ V \ M , x is either
adjacent to all vertices in M , or non-adjacent to all vertices in M . The trivial modules
of G are ∅, V and the singletons. A homogeneous set in G is a nontrivial module in G.
A graph containing no homogeneous set is called prime. Note that the smallest prime
graph is the P4 . A homogeneous set M is maximal if no other homogeneous set properly
contains M .
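The module condition can be tested directly from the definition. A small Python sketch (the adjacency-set representation and function name are our choices, not from the paper):

```python
def is_module(adj, M):
    """Check whether vertex set M is a module of the graph given by adjacency
    sets adj: every vertex outside M sees either all of M or none of it."""
    M = set(M)
    for x in set(adj) - M:
        seen = adj[x] & M              # neighbours of x inside M
        if seen and seen != M:         # x sees some but not all of M
            return False
    return True
```

On the P4 (the smallest prime graph) only the trivial modules pass this test.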
Modular decomposition of graphs is based on the following decomposition theorem.
Theorem 1 ([17]). Let G = (V, E) be a graph with at least two vertices. Then exactly
one of the following conditions holds:
64
H. Bodlaender et al.
(i) G is not connected: it can be decomposed into its connected components;
(ii) Ḡ is not connected: G can be decomposed into the connected components of Ḡ;
(iii) G is connected and co-connected. There is some U ⊆ V and a unique partition P
of V such that
(a) |U | > 3,
(b) G[U ] is a maximal prime induced subgraph of G, and
(c) for every class S of the partition P , S is a module of G and |S ∩ U | = 1.
Consequently there are three decomposition operations.
0-Operation: If G is disconnected then decompose it into its connected components
G1 , G2 , . . . , Gr .
1-Operation: If Ḡ is disconnected then decompose G into G1, G2, . . . , Gs, where G1, G2, . . . , Gs are the connected components of Ḡ.
2-Operation: If G = (V, E) is connected and co-connected then its maximal homogeneous sets are pairwise disjoint and they form the partition P of V . The graph G[U ] is
obtained from G by contracting every maximal homogeneous set of G to a single vertex;
it is called the characteristic graph of G and denoted by G∗ . (Note that the characteristic
graph of a connected and co-connected graph G is prime.)
The decomposition theorem and the above mentioned operations lead to the uniquely
determined modular decomposition tree T of G. The leaves of the modular decomposition tree are the vertices of G. The interior nodes of T are labeled 0, 1 or 2 according
to the operation corresponding to the node. Thus we call them 0-node (parallel node),
1-node (series node) and 2-node (prime node). Any interior node x of T corresponds to
the subgraph of G induced by the set of all leaves in the subtree of T rooted at x, denoted
by G(x).
0-node. The children of a 0-node x correspond to the components obtained by a 0-operation applied to the disconnected graph G(x).
1-node. The children of a 1-node x correspond to the components obtained by a 1-operation applied to the not co-connected graph G(x).
2-node. The children of a 2-node x correspond to the subgraphs induced by the maximal homogeneous sets or single vertices of the connected and co-connected graph G(x). Additionally, the characteristic graph of G(x) is assigned to the 2-node x.
The modular decomposition tree is of basic importance for many algorithmic applications, and in [22,13,14], linear time algorithms are given for determining the modular
decomposition tree of an input graph.
Often, algorithms exploiting the modular decomposition have the following structure. Let Π be a graph problem to be solved on some graph class G, e.g., Maximum Stable Set on (P5,gem)-free graphs. First the algorithm computes the modular decomposition tree T of the input graph G using one of the linear time algorithms. Then, in a bottom-up fashion, the algorithm computes for each node x of T the optimal value for the subgraph G(x) of G induced by the set of all leaves of the subtree of T rooted at x. Thus the computation starts by assigning the optimal value to the leaves. Then the algorithm computes the optimal value of an interior node x by using the optimal values of all children of x, depending on the type of the node. Finally the optimal value of the root is the optimal value of Π for the input graph G. (Note that various more complicated variants of this method can be useful.)
Thus to specify such a modular decomposition based algorithm we only have to describe how to obtain the value for the leaves, and which formula to evaluate or which subproblem to solve on 0-nodes, 1-nodes and 2-nodes, using the values of all children as input. It is well-known how to do this on 0-nodes and 1-nodes for the NP-complete graph problems Maximum Weight Stable Set, Maximum Weight Clique, Minimum Coloring and Minimum Clique Cover from the corresponding cograph algorithms [10,11]. On the other hand, identifying the algorithmic problem to solve on 2-nodes, called the 2-node subproblem, when solving problem Π using modular decomposition can be quite challenging.
4 The Structure Theorem for (P5 ,Gem)-Free Graphs
To state the Structure Theorem of (P5 ,gem)-free graphs we need to define three classes
of (P5 ,gem)-free graphs which together contain all prime (P5 ,gem)-free graphs.
Definition 1. A graph G = (V, E) is called matched cobipartite if its vertex set V is partitionable into two cliques C1, C2 with |C1| = |C2| or |C1| = |C2| − 1 such that the edges between C1 and C2 form a matching and at most one vertex in C1 and at most one vertex in C2 are not covered by the matching.
Definition 2. A graph G is called specific if it is the complement of a prime induced
subgraph of one of the three graphs in Figure 1.
Fig. 1.
To establish a definition of the third class of prime graphs, we need some more notions. A graph is chordal if it contains no induced cycle Ck, k ≥ 4. See e.g. [8] for properties of chordal graphs. A graph is cochordal if its complement graph is chordal. A vertex v is simplicial in G if its neighborhood N(v) in G is a clique. A vertex v is cosimplicial in G if it is simplicial in Ḡ. It is well-known that every chordal graph has a simplicial vertex and that such a vertex can be found in linear time.
We also need the following operation, which substitutes a C5 for a vertex: for a graph G and a vertex v in G, let the result of the extension operation ext(G, v) denote the graph G′ resulting from G by replacing v with a C5 (v1, v2, v3, v4, v5) of new vertices such that v2, v4 and v5 have the same neighborhood in G as v, and v1, v3 have only their C5 neighbors, i.e. have degree 2 in G′. For a vertex set U ⊆ V of G, let ext(G, U) denote the result of repeatedly applying the extension operation to all vertices of U. Note that the resulting graph does not depend on the order in which the vertices of U are replaced.
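The extension operation ext(G, v) can be sketched in a few lines of Python. The dict-of-adjacency-sets representation and the naming of the new vertices as (v, 1), . . . , (v, 5) are our choices for illustration:

```python
def ext(adj, v):
    """ext(G, v): replace vertex v by a C5 (v1,...,v5) in which v2, v4, v5
    inherit v's neighbourhood and v1, v3 keep only their two C5 neighbours."""
    nbrs = adj[v]
    g = {u: (adj[u] - {v}) for u in adj if u != v}   # drop v everywhere
    cycle = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 1}}
    for k in range(1, 6):                            # the new C5
        g[(v, k)] = {(v, j) for j in cycle[k]}
    for k in (2, 4, 5):                              # these inherit N(v)
        g[(v, k)] |= nbrs
        for u in nbrs:
            g[u].add((v, k))
    return g
```

Applied to a single edge v–u, vertex u ends up adjacent to exactly v2, v4, v5, while v1 and v3 have degree 2, as the definition requires.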
Definition 3. For k ≥ 0, let Ck be the class of prime graphs G′ = ext(G, Q) resulting
from a (not necessarily prime) cochordal gem-free graph G by extending a clique Q of
exactly k cosimplicial vertices of G. Thus, C0 is the class of prime cochordal gem-free
graphs.
Clearly each graph in Ck contains k C5 ’s which are vertex-disjoint. It is also known
that each graph in Ck has neither C4 nor C6 as an induced subgraph [6].
Lemma 1. Let G = (V, E) be a graph of Ck , k ≥ 1. Then for every C5 C = (v1 , v2 , v3 ,
v4 , v5 ) of G, the vertex set V has a partition into {v1 , v2 , v3 , v4 , v5 }, the stable set A of
0-vertices for C and the set B of 3-vertices for C such that all vertices of B have the
same non consecutive neighbors in C, say v2 , v4 , v5 , and G[B] is a cograph.
Theorem 2 (Structure Theorem [6]). A connected and co-connected graph G is
(P5 ,gem)-free if and only if the following conditions hold:
(1) The homogeneous sets of G are P4 -free (i.e., induce a cograph);
(2) For the characteristic graph G∗ of G, one of the following conditions holds:
(2.1) G∗ is a matched co-bipartite graph;
(2.2) G∗ is a specific graph;
(2.3) there is a k ≥ 0 such that G∗ is in Ck .
Consequently, the modular decomposition tree T of any connected (P5,gem)-free graph G contains at most one 2-node. If G is a cograph then T has no 2-node. If G is not a cograph then the only 2-node of T is its root.
5 An Algorithm for Minimum Coloring on (P5 ,Gem)-Free Graphs
In this section we present a linear time algorithm for the Minimum Coloring problem on (P5,gem)-free graphs. That is, we are given a (P5,gem)-free graph G and want to determine χ(G).
Minimum Coloring is not LinEMSOL(τ1,L)-definable. Nevertheless there is a polynomial time algorithm for graphs of bounded clique-width [20]. However, this algorithm is only of theoretical interest: for graphs of clique-width at most five (which is the best known upper bound for the clique-width of (P5,gem)-free graphs [7]), the exponent r of the running time O(n^r) of this algorithm is larger than 2000.
5.1 The Subproblems
We use the approach discussed in Section 3. Thus, we start by computing (in linear time)
the modular decomposition tree T of G. For each node x of T , we compute χ(G(x)).
Suppose x1, x2, . . . , xr are the children of x. For leaves, 0-nodes, and 1-nodes x, the steps of the linear time algorithm for Minimum Coloring on cographs can be used: If x is a leaf of T then χ(G(x)) := 1. If x is a 0-node, then χ(G(x)) := max_{i=1,...,r} χ(G(xi)). If x is a 1-node, then χ(G(x)) := Σ_{i=1}^{r} χ(G(xi)).
Suppose x is a 2-node of T . Let G∗ = (V ∗ , E ∗ ) be the characteristic graph assigned
to x. We assign to the vertex set V ∗ of G∗ the weight function w∗ : V ∗ → N such that
w∗ (vi ) := χ(G(xi )). We have that χ(G(x)) := χw∗ (G∗ ).
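The leaf, 0-node and 1-node rules above can be evaluated bottom-up on the tree. A minimal Python sketch over a toy tree encoding of our own (nested tuples; 2-nodes, which would call the weighted-coloring subproblem, are omitted, so this covers cographs only):

```python
def chi(node):
    """Evaluate chi(G(x)) bottom-up. A tree is 'leaf', ('0', children) for a
    parallel (union) node, or ('1', children) for a series (join) node."""
    if node == 'leaf':
        return 1                                     # a single vertex
    kind, children = node
    vals = [chi(c) for c in children]
    return max(vals) if kind == '0' else sum(vals)   # 0-node: max, 1-node: sum
```

A join of two leaves (K2) needs 2 colors; a disjoint union of a K2 and a single vertex still needs 2.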
Thus, the Minimum Coloring problem on (P5 ,gem)-free graphs becomes the problem
of computing the minimum number of colors for a weighted coloring of (G∗ , w∗ ), where
G∗ is a prime (P5 ,gem)-free graph. The remainder of this section is devoted to this
problem. The Structure Theorem tells us that G∗ either is a matched co-bipartite graph,
a specific graph, or there is a k ≥ 0 with G∗ in Ck . In three subsections, each of these
cases will be dealt with. We also use the following notation and lemma.
Let N = Σ_{v∈V∗} w∗(v) be the total weight. Observe that N is at most the number of vertices of the original (P5,gem)-free graph.
Lemma 2. Let G be a perfect graph and w a vertex weight function of G. Then χw(G) = ωw(G) and κw(G) = αw(G).

Proof. Let G′ be the graph obtained from G by substituting each vertex v of G by a clique of cardinality w(v). As any weighted coloring of (G, w) corresponds to a coloring of G′ and vice versa, we have χw(G) = χ(G′). Similarly, ωw(G) = ω(G′).

Let G be perfect. Then Ḡ is perfect by Lovász's Perfect Graph Theorem [21]. Ḡ′ is obtained from the perfect graph Ḡ by vertex multiplication, and thus it is perfect [21]. As G′ is the complement of the perfect graph Ḡ′, it is perfect. Since G′ is perfect we have χ(G′) = ω(G′) and thus χw(G) = ωw(G). Similarly, since Ḡ′ is perfect we obtain χw(Ḡ) = ωw(Ḡ). Hence κw(G) = αw(G). □
We now discuss how to solve the weighted coloring problem for each of the three
classes of prime (P5 ,gem)-free graphs.
5.2 Matched Cobipartite Graphs

The graph G∗ is cobipartite and thus perfect. By Lemma 2 we obtain χw∗(G∗) = ωw∗(G∗). One easily finds in linear time a partition of the vertex set of G∗ into two cliques C1 and C2. Now, as each maximal clique of G∗ is either C1, C2, or an edge of G∗, ωw∗(G∗) = χw∗(G∗) can be computed by a linear time algorithm.
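Since the maximal cliques are exactly C1, C2 and the matching edges, a single scan suffices. A Python sketch of ours, assuming the partition and the matching have already been found:

```python
def max_weight_clique_matched_cobipartite(C1, C2, matching, w):
    """omega_w of a matched cobipartite graph: every maximal clique is C1,
    C2, or a matching edge, so take the best of these."""
    best = max(sum(w[v] for v in C1), sum(w[v] for v in C2))
    for u, v in matching:                  # u in C1, v in C2
        best = max(best, w[u] + w[v])
    return best
```

A matching edge can beat both cliques when its two endpoints are heavy, which is why the edge scan is needed.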
5.3 Specific Graphs
Each specific graph G∗ is a prime induced subgraph of the complement of one of the
three graphs in Figure 1. To solve the weighted coloring problem on specific graphs, we
formulate this problem as an integer linear programming problem, and then argue that
this ILP can be solved in constant time.
Consider the specific graph G∗ with weights w∗. Let S be the collection of all maximal stable sets of G∗. We build an integer linear program with a variable xS for each S ∈ S, as follows:

minimize Σ_{S∈S} xS such that    (1)
Σ_{S∈S : v∈S} xS ≥ w(v) for all v ∈ V    (2)
xS ≥ 0 for all S ∈ S    (3)
xS integer for all S ∈ S    (4)
With x we denote a vector containing a value xS for each S ∈ S. Let z be the optimal value of this ILP; z equals the minimum number of colors needed for (G∗, w∗). If we have a coloring of (G∗, w∗) with a minimum number of colors, then assign to each color one maximal stable set S ∈ S such that this color is given to (a subset of) all vertices in S. Let xS be the number of colors assigned to S. Clearly, xS is a non-negative integer. For each v ∈ V, as v has w(v) colors, we have Σ_{S∈S : v∈S} xS ≥ w(v), and Σ_{S∈S} xS equals the number of colors. Conversely, suppose we have an optimal solution xS of the ILP. For each S ∈ S, we can take a set of xS unique colors, and use these colors to color the vertices in S. As S is stable, this gives a proper coloring, and as Σ_{S∈S : v∈S} xS ≥ w(v), each vertex has sufficiently many colors available. So, this gives a coloring of (G∗, w∗) with z colors.
The relaxation of the ILP is the linear program obtained by dropping the integrality condition (4):

minimize Σ_{S∈S} xS such that    (5)
Σ_{S∈S : v∈S} xS ≥ w(v) for all v ∈ V    (6)
xS ≥ 0 for all S ∈ S    (7)
Let x′ be an optimal solution of this relaxation, with value z′ = Σ_{S∈S} x′S. As G∗ is a specific graph, the linear program has a constant number of variables (namely, the number of maximal stable sets of G∗) and a constant number of constraints (at most nine, one per vertex of G∗), and hence can be solved in constant time. (E.g., enumerate all corners of the polyhedron defined by the program, and take the optimal one.) Note that we can write the linear program in the form max{cx | Ax ≤ b}, such that each element of A is either 0 or 1. Let ∆ be the maximum value of a subdeterminant of this matrix A. It follows that ∆ is bounded by a constant. Write s = |S|.
Now we can use a result of Cook, Gerards, Schrijver, and Tardos, see Theorem 17.2 from [23]. This theorem tells us that the ILP has an optimal solution x′′ such that for each S ∈ S, |x′S − x′′S| ≤ s∆.

Thus, the following is an algorithm that finds the optimal solution to the ILP (and hence the number of colors needed for (G∗, w∗)) in constant time. First, find an optimal solution x′ of the relaxation. Then, enumerate all integer vectors x′′ with |x′S − x′′S| ≤ s∆ for all S ∈ S. For each such x′′, check whether it fulfils condition (2), and select the solution vector that fulfils the conditions with the minimum value. By Theorem 17.2 from [23], this is an optimal solution of the ILP. This method takes constant time, as s and ∆ are bounded by constants, and thus 'only' a constant number of vectors have to be checked, and each is of constant size.²
A straightforward implementation of this procedure would not be practical, as more than (s∆)^s vectors are checked, with s the number of maximal stable sets in one of the specific graphs. In a practical setting, one could first solve the linear program, and use its value as a starting point in a branch and bound procedure.
Remark 1. The method not only works for the specific graphs, but for any constant
size graph. This implies that Minimum Coloring can be solved in linear time for graphs
whose modular decomposition has a constant upper bound on the size of the characteristic
graphs.
Remark 2. In the full version we present an O(N 3 ) time algorithm to solve the weighted
coloring of the specific graphs, that has no large hidden constant in the running time.
5.4 The Classes Ck, k ≥ 0

Let G∗ ∈ Ck for some k ≥ 0, and let w∗ be the weight function on G∗. All C5's of G∗ can be computed by a linear time algorithm that first computes all vertices of degree two.

If G∗ = C5 then χw∗(G∗) can be computed in constant time with the technique applied to specific graphs. If G∗ ∈ C0 then it is cochordal and thus perfect. Hence χw∗(G∗) = ωw∗(G∗) by Lemma 2.
Lemma 3. The Maximum Weight Clique problem and the weighted coloring problem
can be solved by a linear time algorithm for cochordal graphs.
Proof. Frank [16] gave a linear time algorithm to compute the maximum weight of a
stable set of a chordal graph G. This implies that there is an O(n2 ) algorithm to compute
the maximum weight of a clique in a cochordal graph G since ωw (G) = αw (G). To get
a linear time algorithm, we must avoid the complementation; thus, we simulate Frank’s
algorithm applied to G. This is Frank’s algorithm: First it computes a perfect elimination
ordering v1 , . . . , vn of the input chordal graph G = (V, E). Then a maximum weight
stable set is constructed as follows. Initially, let cw(vi ) = w(vi ), for all 1 ≤ i ≤ n. For
each i from 1 to n, if cw(vi ) > 0 then colour vi red, and subtract cw(vi ) from cw(vj )
for all vj ∈ {vi } ∪ (N (vi ) ∩ {vi+1 , . . . , vn }). After all vertices have been processed,
set I = ∅ and, for each i from n down to 1, if vi is red and not adjacent to any vertex
of I then I = I ∪ {vi }. When all vertices have been processed again, the algorithm
terminates and outputs the maximum weight stable set I of (G, w).
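For illustration only (this code is not part of the paper; the adjacency-set encoding and all names are our assumptions), Frank's procedure reads as follows:

```python
def frank_mwss(peo, adj, w):
    """Frank's greedy algorithm for a maximum weight stable set of a chordal
    graph.  peo: list of vertices in perfect elimination order;
    adj[v]: set of neighbours of v; w[v]: vertex weight."""
    pos = {v: i for i, v in enumerate(peo)}
    cw = dict(w)                      # residual weights c_w
    red = []
    for v in peo:
        if cw[v] > 0:
            red.append(v)
            c = cw[v]
            for u in adj[v]:          # subtract from later neighbours...
                if pos[u] > pos[v]:
                    cw[u] -= c
            cw[v] = 0                 # ...and from v itself
    I = []                            # greedy selection in reverse order
    for v in reversed(red):
        if all(u not in adj[v] for u in I):
            I.append(v)
    return I
```

On the path a–b–c with weights 1, 3, 1 (a valid perfect elimination order is a, c, b) the procedure returns {b}, the heavier of the two maximal stable sets.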
We now describe our simulation of this algorithm. First a perfect elimination ordering
v1 , v2 , . . . , vn of the complement of G is computed in linear time (see e.g. [22]).
The maximum weight of a clique of G is constructed as follows. Initially, let W ′ = 0
and s(vi ) = 0 for all i (1 ≤ i ≤ n). For each i from 1 to n, if w(vi ) − W ′ + s(vi ) > 0
² Computer computation shows that ∆ ≤ 3 for the specific graphs.
then colour vi red, add w(vi ) − W ′ + s(vi ) to s(vj ) for all
vj ∈ N (vi ) ∩ {vi+1 , . . . , vn }, and then set W ′ = w(vi ) + s(vi ).
After all vertices have been processed, set K = ∅ and, for each i from n down to 1,
if vi is red and adjacent to all vertices of K then K = K ∪ {vi }. Finally the algorithm
outputs the maximum weight clique K of (G, w).
Clearly our algorithm runs in linear time. Its correctness follows from the fact that
when treating the vertex vi , the difference W ′ − s(vi ) is precisely the value the original
Frank algorithm applied to the complement of G would have subtracted from cw(vi )
up to the point when it treats vi . Thus our algorithm simulates Frank’s algorithm on G,
and thus it is correct.
⊓⊔
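The simulation in the proof above can also be sketched in Python (ours, not the authors'; the encoding is an assumption). Here peo is a perfect elimination order of the complement of G, while adj describes G itself, so the complement is never materialized:

```python
def max_weight_clique_cochordal(peo, adj, w):
    """Simulate Frank's algorithm on the (never built) complement of G.
    Invariant: W - s[v] equals the total amount the original run on the
    complement would have subtracted from c_w(v) so far."""
    pos = {v: i for i, v in enumerate(peo)}
    W = 0
    s = {v: 0 for v in peo}
    red = []
    for v in peo:
        cw = w[v] - W + s[v]          # residual weight of v in the simulation
        if cw > 0:
            red.append(v)
            for u in adj[v]:
                if pos[u] > pos[v]:   # G-neighbours are complement non-neighbours
                    s[u] += cw
            W = w[v] + s[v]           # = old W + cw
    K = []                            # greedy clique selection in reverse order
    for v in reversed(red):
        if all(u in adj[v] for u in K):
            K.append(v)
    return K
```

Note that cw is computed before W is updated, matching the order of updates in the proof.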
In the remaining case, we consider a prime graph G∗ ∈ Ck , k ≥ 1, such that G∗ ≠ C5 .
Lemma 4. Let k ≥ 1, G∗ ∈ Ck and G∗ ≠ C5 . Let C = (v1 , v2 , v3 , v4 , v5 ) be a C5 in G∗
and v1 and v3 its vertices of degree two. Let w∗ be the vertex weight function of G∗ . Then
there is a minimum weight coloring S ∗ of (G∗ , w∗ ) with precisely max(w∗ (v2 ), w∗ (v4 )+
w∗ (v5 )) stable sets containing at least one of the vertices of {v2 , v4 , v5 }.
Proof. By Lemma 1, the set A of 0-vertices for C = (v1 , v2 , v3 , v4 , v5 ) is a stable set,
B = V ∗ \ (C ∪ A) = N (v2 ) \ C = N (v4 ) \ C = N (v5 ) \ C, and G∗ [B] is a cograph.
Let S be any minimum weight coloring of (G∗ , w∗ ). Since N (v1 )\C = N (v3 )\C =
∅ and N (v2 ) \ C = N (v4 ) \ C = N (v5 ) \ C = B we may assume that every stable set
of S contains either none or two vertices of C. Therefore we study weighted colorings
of a C5 C = (v1 , v2 , v3 , v4 , v5 ) of G∗ with vertex weights w∗ , where all stable sets are
non edges of C and call them partial weight colorings (abbr. pwc) of C. Clearly any
pwc of C = (v1 , v2 , v3 , v4 , v5 ) contains at least w∗ (v2 ) stable sets containing v2 , and it
contains at least w∗ (v4 ) + w∗ (v5 ) stable sets containing v4 or v5 .
Let S ′ be a weighted coloring of G∗ containing the smallest possible number of
stable sets S with S ∩ {v2 , v4 , v5 } ≠ ∅. Let t be the number of stable sets S of S ′
satisfying S ∩ {v2 , v4 , v5 } ≠ ∅ and suppose that, contrary to the statement of the
lemma, t > max(w∗ (v2 ), w∗ (v4 ) + w∗ (v5 )). Let s(v) be the number of stable sets
of S ′ containing the vertex v. Then t > w∗ (v4 ) + w∗ (v5 ) implies s(v4 ) > w∗ (v4 ) or
s(v5 ) > w∗ (v5 ). W.l.o.g. we may assume s(v4 ) > w∗ (v4 ). Hence there is a stable set
S ′ ∈ S ′ containing v4 . Consequently either S ′ ⊆ {v2 , v4 } ∪ A or S ′ ⊆ {v1 , v4 } ∪ A.
In both cases we replace the stable set S ′ of S ′ by {v1 , v3 } ∪ A. Thus the replacement
decrements the number of stable sets containing v4 and possibly the number of stable
sets containing v2 . Thus we obtain a new weighted coloring S ′′ of G∗ with t − 1
stable sets S with S ∩ {v2 , v4 , v5 } ≠ ∅. This contradicts the choice of t. Consequently
t = max(w∗ (v2 ), w∗ (v4 ) + w∗ (v5 )).
⊓⊔
To extend any pwc of a C5 C to G∗ only two parameters are important: the number
a of stable sets {v1 , v3 } in the pwc of C, and the number b of non edges in the pwc of
C different from {v1 , v3 }. Each of the a stable sets {v1 , v3 } in the pwc of C, can be
extended to a maximal stable set {v1 , v3 } ∪ A′ of G∗ , where A′ is some maximal stable
set of G∗ − C. Each of the b non edges S, S = {v1 , v3 }, in the pwc of C has the unique
extension to the maximal stable set S ∪ A of G∗ .
By Lemma 4, for each C5 of G∗ there is a minimum weight coloring of G∗ extending a pwc of the C5 C with b = max(w∗ (v2 ), w∗ (v4 ) + w∗ (v5 )). Taking such a
minimum weight coloring we can clearly remove vertices v1 and v3 from stable sets
containing both until we obtain the smallest possible value of a in a pwc of C with
b = max(w∗ (v2 ), w∗ (v4 ) + w∗ (v5 )).
Finally given a C5 C, the smallest possible value of a in a pwc of C with b =
max(w∗ (v2 ), w∗ (v4 ) + w∗ (v5 )) can be computed in constant time. (Details omitted.)
Now we are ready to present our coloring algorithm that computes a minimum
weight coloring of (G∗ , w∗ ) for a graph G∗ of Ck , k ≥ 1. It removes at most k times
the precomputed C5 from the current graph until the remaining graph has no C5 and
is therefore a cochordal graph. Then by Lemma 3 there is an algorithm to solve the
weighted coloring problem for the cochordal graph in linear time.
In each round, i.e. when removing one C5 C = (v1 , v2 , v3 , v4 , v5 ) from the current
graph G′ with current weight function w′ , the algorithm proceeds as follows: First it
computes in constant time a pwc of C such that b = max(w′ (v2 ), w′ (v4 ) + w′ (v5 ))
and a as small as possible. Then the algorithm removes all vertices of C and obtains
the graph G′′ = G′ − C. Furthermore it removes all vertices of the stable set A of
0-vertices for C in G′ with weight at most a and decrements the weight of all other
vertices in A by a. Recursively the algorithm solves the minimum weight coloring
problem on the obtained graph G′′ with weight function w′′ . Finally the minimum
number of stable sets in a weighted coloring of (G′ , w′ ) is obtained using the formula
χw′ (G′ ) = a + max(b, χw′′ (G′′ )).
Thus the algorithm removes at most k ≤ n times a C5 . Each pwc of a C5 can be
computed in constant time. For the final cochordal graph the minimum weight coloring
can be solved in linear time. Hence the overall running time of the algorithm is linear.
We have given a linear time algorithm for the weighted coloring problem for ⋃_{k≥0} Ck .
We can finally conclude:
Theorem 3. There is a linear time algorithm to solve the Minimum Coloring problem
on (P5 ,gem)-free graphs.
6 Conclusion
We have shown how modular decomposition and the Structure Theorem for (P5 ,gem)-free graphs can be used to obtain a linear time algorithm to solve the Minimum Coloring
problem. In a quite similar way one can construct a linear time algorithm to solve the
Minimum Clique Cover problem on (P5 ,gem)-free graphs. Modular decomposition can
also be used to obtain linear time algorithms for the LinEMSOL(τ1,L ) definable NP-complete graph problems Maximum Weight Stable Set and Maximum Weight Clique
on (P5 ,gem)-free graphs. These algorithms are given in the full version of this paper.
Acknowledgement. Thanks are due to Alexander Schrijver for pointing towards Theorem 17.2 from his book [23].
References
1. S. Arnborg, J. Lagergren, D. Seese, Easy problems for tree-decomposable graphs, J. Algorithms 12 (1991) 308–340.
2. H. Bodlaender, Achromatic number is NP-complete for cographs and interval graphs, Inform. Process. Lett. 31 (1989) 135–138.
3. H.L. Bodlaender, H.J. Broersma, F.V. Fomin, A.V. Pyatkin, G.J. Woeginger, Radio labeling with pre-assigned frequencies, Proceedings of the 10th European Symposium on Algorithms (ESA 2002), LNCS 2461 (2002) 211–222.
4. H.L. Bodlaender, K. Jansen, On the complexity of the maximum cut problem, Nord. J. Comput. 7 (2000) 14–31.
5. H.L. Bodlaender, U. Rotics, Computing the treewidth and the minimum fill-in with the modular decomposition, Proceedings of the 8th Scandinavian Workshop on Algorithm Theory (SWAT 2002), LNCS 1851 (2002) 388–397.
6. A. Brandstädt, D. Kratsch, On the structure of (P5 ,gem)-free graphs, Manuscript, 2002.
7. A. Brandstädt, H.-O. Le, R. Mosca, Chordal co-gem-free graphs have bounded clique-width, Manuscript, 2002.
8. A. Brandstädt, V.B. Le, J. Spinrad, Graph Classes: A Survey, SIAM Monographs on Discrete Math. Appl., Vol. 3, SIAM, Philadelphia (1999).
9. M. Chudnovsky, N. Robertson, P.D. Seymour, R. Thomas, The Strong Perfect Graph Theorem, Manuscript, 2002.
10. D.G. Corneil, H. Lerchs, L. Stewart-Burlingham, Complement reducible graphs, Discrete Applied Math. 3 (1981) 163–174.
11. D.G. Corneil, Y. Perl, L.K. Stewart, Cographs: recognition, applications, and algorithms, Congressus Numer. 43 (1984) 249–258.
12. B. Courcelle, J.A. Makowsky, U. Rotics, Linear time solvable optimization problems on graphs of bounded clique-width, Theory of Computing Systems 33 (2000) 125–150.
13. A. Cournier, M. Habib, A new linear algorithm for modular decomposition, Trees in Algebra and Programming - CAAP ’94, LNCS 787 (1994) 68–84.
14. E. Dahlhaus, J. Gustedt, R.M. McConnell, Efficient and practical algorithms for sequential modular decomposition, J. Algorithms 41 (2001) 360–387.
15. W. Espelage, F. Gurski, E. Wanke, How to solve NP-hard graph problems on clique-width bounded graphs in polynomial time, Proceedings of the 27th Workshop on Graph-Theoretic Concepts in Computer Science (WG 2001), LNCS 2204 (2001) 117–128.
16. A. Frank, Some polynomial algorithms for certain graphs and hypergraphs, Proceedings of the Fifth British Combinatorial Conference (Univ. Aberdeen, Aberdeen, 1975) 211–226, Congressus Numerantium No. XV, Utilitas Math., Winnipeg, Man. (1976).
17. T. Gallai, Transitiv orientierbare Graphen, Acta Mathematica Academiae Scientiarum Hungaricae 18 (1967) 25–66.
18. V. Giakoumakis, I. Rusu, Weighted parameters in (P5 , P5 )-free graphs, Discrete Appl. Math. 80 (1997) 255–261.
19. K. Jansen, P. Scheffler, Generalized coloring for tree-like graphs, Discrete Appl. Math. 75 (1997) 135–155.
20. D. Kobler, U. Rotics, Edge dominating set and colorings on graphs with fixed clique-width, Discrete Appl. Math. 126 (2003) 197–221.
21. L. Lovász, Normal hypergraphs and the perfect graph conjecture, Discrete Math. 2 (1972) 253–267.
22. R.M. McConnell, J. Spinrad, Modular decomposition and transitive orientation, Discrete Math. 201 (1999) 189–241.
23. A. Schrijver, Theory of Linear and Integer Programming, John Wiley & Sons, Chichester, 1986.
Graph Searching, Elimination Trees, and a
Generalization of Bandwidth
Fedor V. Fomin, Pinar Heggernes, and Jan Arne Telle
Department of Informatics, University of Bergen, N-5020 Bergen, Norway
{fomin,pinar,telle}@ii.uib.no
Abstract. The bandwidth minimization problem has a long history and
a number of practical applications. In this paper we introduce a generalization of bandwidth to partially ordered layouts. We consider this generalization from two main viewpoints: graph searching and tree decompositions. The three graph parameters pathwidth, profile and bandwidth
related to linear layouts can be defined by variants of graph searching
using a standard fugitive. Switching to an inert fugitive, the two former
parameters are generalized to treewidth and fill-in, and our first viewpoint considers the analogous tree-like generalization that arises from
the bandwidth variant. Bandwidth also has a definition in terms of ordered path decompositions, and our second viewpoint generalizes this
in a natural way to ordered tree decompositions. In showing that both
generalizations are equivalent we employ the third viewpoint of elimination trees, as used in the field of sparse matrix computations. We call
the resulting parameter the treespan of a graph and prove some of its
combinatorial and algorithmic properties.
1 Motivation through Graph Searching Games
Different versions of graph searching have been attracting the attention of researchers from Discrete Mathematics and Computer Science for a variety of
elegant and unexpected applications in different and seemingly unrelated fields.
There is a strong resemblance of graph searching to certain pebble games [15]
that model sequential computation. Other applications of graph searching can
be found in VLSI theory since this game-theoretic approach to some important
parameters of graph layouts such as the cutwidth [19], the topological bandwidth
[18], the bandwidth [9], the profile [10], and the vertex separation number [8]
is very useful for the design of efficient algorithms. There is also a connection
between graph searching, pathwidth and treewidth, parameters that play an important role in the theory of graph minors developed by Robertson & Seymour
[3,7,22]. Furthermore, some search problems have applications in problems of
privacy in distributed environments with mobile eavesdroppers (‘bugs’) [11].
In the standard node-search version of searching, a single searcher is placed
at a vertex of a graph G at every move, while from other vertices searchers
are removed (see e.g. [15]). The purpose of searching is to capture an invisible
fugitive moving fast along paths in G. The fugitive is not allowed to run through
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 73–85, 2003.
© Springer-Verlag Berlin Heidelberg 2003
the vertices currently occupied by searchers. So the fugitive is caught when a
searcher is placed on the vertex it occupies, and it has no possibility to leave the
vertex because all the neighbors are occupied (guarded) by searchers. The goal
of search games is to find a search strategy to guarantee the fugitive’s capture
while minimizing some resource usage.
Because the fugitive is invisible, the only information the searchers possess
is the previous search moves, which may give knowledge about subgraphs where
the fugitive cannot possibly be present. This brings us to the interesting interpretation of the search problem [3] as the problem of fighting against damage
spread in complex systems, e.g. the spread of a mobile computer virus in networks. Initially all vertices are viewed as contaminated (infected by a virus or
damaged) and a contaminated vertex is cleared once it is occupied by a searcher
(checked by an anti-virus program). A clear vertex v is recontaminated if there
is a path without searchers leading from v to a contaminated vertex. In some
applications it is required that recontamination should never occur and in this
case we are interested in the so-called ‘monotone’ searching. For most of the
search game variants considered in the literature it can be shown, sometimes by
very clever techniques, that the resource usage does not increase in spite of this
constraint [15,16,4,7]. The ‘classical’ goal of the search problem is to find the
search program such that the maximum number of searchers in use at any move
is minimized. The minimum number of searchers needed to clear the graph is
related to the parameter called pathwidth. Dendris et al. [7] studied a variation
of the node-search problem with inert, or lazy, fugitive. In this version of the
game the fugitive is allowed to move only just before a searcher is placed on the
vertex it occupies. The smallest number of searchers needed to find the fugitive
in this version of searching is related to the parameter called treewidth [7].
Another criterion of optimality in node-searching, namely search cost, was
studied in [10]. Here the goal is to minimize the sum of the number of searchers
in use over all moves of the search program. The search cost of a graph is equal to
the interval completion number, or profile, which is the smallest number of edges
in any interval supergraph of the given graph. Looking at the monotone search
cost version but now with an inert fugitive, it is easy to see that this parameter is
equal to the smallest number of edges in a chordal supergraph of the given graph,
the so-called fill-in. (It is not clear if in this version of searching recontamination
can help and this is an interesting open question.) We thus have the following
elegant relation: the parameters related to standard node searching (pathwidth,
profile) expressible in terms of interval completion problems, correspond in inert
fugitive searching to chordal completion problems (treewidth, fill-in).
In this paper we want to minimize the maximum length of time (number
of intermediate moves) during which a searcher occupies a vertex. A similar
problem for pebbling games (that can be transferred into search terms) was
studied by Rosenberg & Sudborough [23]. In terms of monotone pebbling (i.e.,
no recontamination allowed) this becomes the maximum lifetime of any pebble in
the game. It turned out that this parameter is related to the bandwidth of a graph
G, which is the minimum over all linear layouts of vertices in G of the maximum
distance between images of adjacent vertices. The following table summarizes
the knowledge about known relations between graph monotone searching and
graph parameters.
                 Number of Searchers  Cost of Searching  Occupation Time
Standard Search  pathwidth [15]       profile [10]       bandwidth [23]
Inert Search     treewidth [7]        fill-in            ???
One of the main questions answered in this paper concerns the entry labeled
??? above: What kind of graph parameter corresponds to the minimum occupation time (mot) for monotone inert fugitive search? In section 2 we introduce a
generalization of bandwidth to tree-like layouts, called treespan, based on what
we call ordered tree decompositions. In section 3 we give the formal definition
of the parameter mot(G), and then in section 4 we show that it is equivalent to
a parameter arising from elimination trees, as used in the sparse matrix computation community. In section 5 we show the equivalence also between this
elimination tree parameter and treespan, thereby providing evidence that the
entry labeled ??? above indeed corresponds to a natural generalization of bandwidth to partially ordered (tree) layouts. Finally in section 6 we obtain some
algorithmic and complexity results on the treespan parameter.
2 Motivation through Tree Decompositions
We assume simple, undirected, connected graphs G = (V, E), where |V | = n.
We let N (v) denote the neighbors of vertex v, and d(v) = |N (v)| is the degree
of v. The maximum degree of any vertex in G is denoted by ∆(G). For a set of
vertices U ⊆ V , N (U ) = {v ∈ V \ U | uv ∈ E and u ∈ U }. H ⊆ G means that H
is a subgraph of G. For a rooted tree T and a vertex v in T , we let T [v] denote
the subtree of T with root in v.
A chord of a cycle C in a graph is an edge that connects two non-consecutive
vertices of C. A graph G is chordal if there are no induced chordless cycles of
length ≥ 4 in G. Given any graph G = (V, E), a triangulation G+ = (V, E + ) of
G is a chordal graph such that E ⊆ E + .
A tree decomposition of a graph G = (V, E) is a pair (X, T ), where T = (I, M )
is a tree and X = {Xi | i ∈ I} is a collection of subsets of V called bags, such
that:
1. ⋃_{i∈I} Xi = V
2. uv ∈ E ⇒ ∃i ∈ I with u, v ∈ Xi
3. For all vertices v ∈ V , the set {i ∈ I | v ∈ Xi } induces a connected subtree
of T .
The width of a tree decomposition (X, T ) is tw(X, T ) = max_{i∈I} |Xi | − 1. The
treewidth of a graph G is the minimum width over all tree decompositions of G.
A path decomposition is a tree decomposition (X, T ) such that T is a path. The
pathwidth of a graph G is the minimum width over all path decompositions of
G. We refer to Bodlaender’s survey [5] for further information on treewidth.
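The three conditions above are easy to verify mechanically. The following Python sketch (ours, not from the paper; the dictionary encoding is an assumption) checks them and returns the width of a valid decomposition:

```python
def width_if_tree_decomposition(V, E, bags, tree_adj):
    """Check conditions 1-3 of a tree decomposition and return its width,
    or None if some condition fails.  bags: {tree node: set of graph
    vertices}; tree_adj: {tree node: set of neighbouring tree nodes}."""
    # 1. every vertex of G appears in some bag
    if set().union(*bags.values()) != set(V):
        return None
    # 2. every edge of G lies inside some bag
    if not all(any({u, v} <= X for X in bags.values()) for u, v in E):
        return None
    # 3. the bags containing any fixed vertex induce a connected subtree
    for v in V:
        nodes = {i for i, X in bags.items() if v in X}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for y in tree_adj[stack.pop()]:
                if y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
        if seen != nodes:
            return None
    return max(len(X) for X in bags.values()) - 1
```

For a triangle {1, 2, 3} with a pendant vertex 4, the bags {1, 2, 3} and {3, 4} joined by one tree edge form a valid decomposition of width 2.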
For a chordal graph G, the treewidth is one less than the size of the largest
clique in G. For a non-chordal graph G, the treewidth is the minimum treewidth
over all triangulations of G. This is due to the fact that a tree decomposition
(X, T ) of G actually corresponds to a triangulation of the given graph G: simply
add edges to G such that each bag of X becomes a clique. The resulting graph,
which we will call tri(X, T ), is a chordal graph of which G is a subgraph. In addition, any triangulation G+ of G is equal to tri(X, T ) for some tree decomposition
(X, T ) of G.
Another reason why tree decompositions and chordal graphs are closely related is that chordal graphs are exactly the intersection graphs of subtrees of a
tree [14]. Analogously, interval graphs are related to path decompositions, and
they are the intersection graphs of subpaths of a path. A graph is interval if there
is a mapping f of its vertices into sets of consecutive integers such that for each
pair of vertices v, w the following is true: vw is an edge ⇔ f (v) ∩ f (w) ≠ ∅. Interval graphs form a subclass of chordal graphs. Similar to treewidth, the pathwidth
of a graph G is one less than the smallest clique number over all triangulations
of G into interval graphs.
The bandwidth of G, bw(G), is defined as the minimum, over all linear orders
of the vertices of G, of the maximum difference between labels of two adjacent vertices.
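The linear-layout definition can be checked directly by exhaustive search; the following Python sketch (ours, for illustration; exponential, so only usable on tiny graphs) makes it concrete:

```python
from itertools import permutations

def bandwidth(V, E):
    """bw(G) by brute force: the minimum over all linear layouts of the
    maximum position difference across an edge."""
    best = None
    for layout in permutations(V):
        pos = {v: i for i, v in enumerate(layout)}
        width = max(abs(pos[u] - pos[v]) for u, v in E)
        if best is None or width < best:
            best = width
    return best
```

A path has bandwidth 1 (lay it out in order), while any cycle needs bandwidth 2.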
Similar to pathwidth and treewidth, bandwidth can be defined in terms of triangulations as follows. A graph isomorphic to K1,3 is referred to as a claw, and a
graph that does not contain an induced claw is said to be claw-free. An interval
graph G is a proper interval graph if it is claw-free [21]. As was observed by
Parra & Scheffler [20], the bandwidth of a graph G is one less than the smallest clique number over all triangulations of G into proper interval graphs. One
can define bandwidth in terms of ordered path decompositions. In an ordered
path decomposition, the bags are numbered 1, 2, ..., n from left to right. The
first bag X1 contains only one vertex of G, and for 1 ≤ i ≤ n − 1 we have
|Xi+1 \ Xi | = 1, meaning that exactly one new graph vertex is introduced in
each new bag. The number of bags a vertex v belongs to is denoted by l(v). It is
easy to show that bw(G) is the minimum, over all ordered path decompositions,
of max{l(v) − 1 | v ∈ V }.
The natural question here is, what kind of parameter corresponds to bandwidth when, instead of path decompositions, we switch to tree decompositions?
This brings us to the definition of ordered tree decomposition and treespan.
Definition 1. An ordered tree decomposition (X, T, r) of a graph G = (V, E)
is a tree decomposition (X, T ) of G where T = (I, M ) is a rooted tree with root
r ∈ I, such that:
|Xr | = 1, and if i is the parent of j in T , then |Xj \ Xi | = 1.
Definition 2. Given a graph G = (V, E) and an ordered tree decomposition
(X, T, r) of G, we define:
l(v) = |{i ∈ I | v ∈ Xi }| (number of bags that contain v), for each v ∈ V .
ts(X, T, r) = max{l(v) | v ∈ V } − 1.
The treespan of a graph G is ts(G) = min{ts(X, T, r) | (X, T, r) is an ordered
tree decomposition of G}.
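Definitions 1 and 2 translate directly into a small evaluator. The sketch below (ours, not from the paper; the encoding via a parent map is an assumption) checks the "ordered" conditions and computes ts(X, T, r):

```python
from collections import Counter

def treespan_of(bags, parent, root):
    """Evaluate ts(X, T, r) for an ordered tree decomposition.
    bags: {tree node: set of graph vertices}; parent: {node: its parent};
    root: the root r.  Returns None if the ordered conditions fail."""
    if len(bags[root]) != 1:                 # |X_r| = 1
        return None
    for j, i in parent.items():
        if len(bags[j] - bags[i]) != 1:      # exactly one new vertex per bag
            return None
    l = Counter(v for X in bags.values() for v in X)
    return max(l.values()) - 1               # max l(v) - 1
```

For the path a–b–c with bags {a}, {a, b}, {b, c} along a path-shaped tree, every vertex lies in at most two bags, so the treespan of this decomposition is 1.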
Since every ordered path decomposition is an ordered tree decomposition, it
is clear that for every graph G, ts(G) ≤ bw(G).
3 Search Minimizing Occupation Time with Inert Fugitive
In this section we give a formal definition of minimum occupation time for inert
fugitive searching. A search program Π on a graph G = (V, E) is a sequence
of pairs
(A0 , Z0 ), (A1 , Z1 ), . . . , (Am , Zm )
such that
I. For i ∈ {0, . . . , m}, Ai ⊆ V and Zi ⊆ V . We say that vertices Ai are cleared,
vertices V − Ai are contaminated and vertices Zi are occupied by searchers
at the ith step.
II. (Initial state.) A0 = ∅ and Z0 = ∅. All vertices are contaminated.
III. (Final state.) Am = V and Zm = ∅. All vertices are cleared.
IV. (Placing-removing searchers and clearing vertices.) For i ∈ {1, . . . , m} there
exists v ∈ V and Yi ⊆ Ai−1 such that Ai − Ai−1 = {v} and Zi = Yi ∪ {v}.
Thus at every step one of the searchers is placed on a contaminated vertex v
while the others are placed on cleared vertices Yi . The searchers are removed
from vertices Zi−1 − Yi . Note that Yi is not necessarily a subset of Zi−1 .
V. (Possible recontamination.) For i ∈ {1, . . . , m}, Ai − {v} is the set of vertices
u ∈ Ai−1 such that every uv-path has an internal vertex in Zi . This means
that the fugitive awakening in v can run to a cleared vertex u if there is a
uv-path unguarded by searchers.
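As an illustrative aside (ours, not from the paper), conditions I–V can be checked mechanically for the monotone case. The sketch below relaxes the final-state requirement Zm = ∅ and assumes a dictionary-of-sets graph encoding:

```python
from collections import deque

def is_valid_monotone_search(program, adj):
    """Sketch check of a monotone search program with an inert fugitive.
    program: list of (A_i, Z_i) pairs of sets; adj: {v: set of neighbours}.
    Each step must clear exactly one new vertex v, place searchers only on v
    and cleared vertices, and cause no recontamination (condition V)."""
    A_prev, Z_prev = program[0]
    if A_prev or Z_prev:                  # initial state: everything contaminated
        return False
    for A, Z in program[1:]:
        new = A - A_prev
        if len(new) != 1:                 # exactly one vertex cleared per step
            return False
        (v,) = new
        if v not in Z or not (Z - {v}) <= A_prev:
            return False
        # recontamination check: no cleared, unguarded vertex may be reachable
        # from v along a path whose internal vertices avoid Z
        seen, queue = {v}, deque([v])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen and y not in Z:
                    seen.add(y)
                    queue.append(y)
        if any(u in seen for u in A_prev - Z):
            return False
        A_prev = A
    return A_prev == set(adj)             # all vertices cleared at the end
```

On the path a–b–c, clearing a, then b (keeping a guarded), then c is valid, while dropping the searcher from a when b is cleared recontaminates a.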
Dendris, Thilikos & Kirousis [7] initiated the study of the inert search problem,
where the problem is to find a search program Π with the smallest max_{i∈{0,...,m}} |Zi |
(this maximum can be treated as the maximum number of searchers used in one
step). It turns out that this number is equal to the treewidth of a graph. We find
an alternative measure of search to be interesting as well. For a search program
Π = (A0 , Z0 ), (A1 , Z1 ), . . . , (Am , Zm ) on a graph G = (V, E) and vertex v ∈ V
we define
δi (v) = 1 if v ∈ Zi , and δi (v) = 0 if v ∉ Zi .
Then the number Σ_{i=0}^{m} δi (v) is the number of steps at which vertex v was occupied by searchers. For a program Π we define the maximum vertex occupation
time to be ot(Π, G) = max_{v∈V} Σ_{i=0}^{m} δi (v). The vertex occupation time of a graph
G, denoted by ot(G), is the minimum maximum vertex occupation time over all
search programs on G.
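The quantity ot(Π, G) is a simple tally over the program's occupied sets; a minimal sketch (ours, for illustration):

```python
from collections import Counter

def occupation_time(program):
    """ot(Pi, G): the maximum, over vertices v, of the number of steps i with
    v in Z_i, i.e. max_v of the sum of delta_i(v) over all steps."""
    occ = Counter()
    for _, Z in program:
        occ.update(Z)
    return max(occ.values(), default=0)
```

For the three-step program on the path a–b–c where a is guarded for two steps, b for two, and c for one, the occupation time is 2.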
A search program (A0 , Z0 ), (A1 , Z1 ), . . . , (Am , Zm ) is monotone if Ai−1 ⊆
Ai for each i ∈ {1, . . . , m}. Note that recontamination does not occur when a
searcher is placed on a contaminated vertex thus awaking the fugitive.
Finally, for a graph G we define mot(G) to be the minimum maximum vertex
occupation time over all monotone search programs on G. We do not know
whether mot(G) = ot(G) for every graph G, and leave it as an interesting open
question.
4 Searching and Elimination Trees
In this section we discuss a relation between mot(G) and elimination trees of G.
This relation is not only interesting in its own right but also serves as a tool in further
proofs.
For a graph G = (V, E), an elimination order α : {1, 2, ..., n} → V is a linear
order of the vertices of G. For each given order α, a unique triangulation G+_α of
G can be computed from the following procedure: starting with vertex α(1), at
each step i, make the higher numbered neighbors of vertex α(i) in the transitory
graph into a clique by adding edges. The resulting graph, which is denoted G+_α ,
is chordal [12], and the given elimination ordering decides the quality of this
resulting triangulation. The following lemma follows from the definition of G+_α .
Lemma 1. uv is an edge of G+_α ⇔ uv is an edge of G or there is a path
u, x1 , x2 , ..., xk , v in G with k ≥ 1 such that all xi are ordered before u and
v by α (in other words, max{α−1 (xi ) | 1 ≤ i ≤ k} < min{α−1 (u), α−1 (v)}).
Definition 3. For a vertex v ∈ V we define madj+ (v) to be the set of vertices
u ∈ V such that α−1 (u) > α−1 (v) and uv is an edge of G+_α . (The higher numbered
neighbors of v in G+_α .)
Given a graph G, and an elimination order α on G, the corresponding elimination tree is a rooted tree ET = (V, P ), where the edges in P are defined by
the following parent function: parent(α(i)) = α(j) where j = min{k | α(k) ∈
madj + (α(i))}, for i = 1, 2, ..., n. Hence the elimination tree is a tree on the vertices of G, and vertex α(n) is always the root. The height of the elimination tree
is the longest path from a leaf to the root. The minimum elimination tree height of
a graph G, mh(G), is the minimum height of an elimination tree corresponding
to any triangulation of G. For a vertex u ∈ V we denote by ET [u] the subtree
of ET rooted in u and containing all descendants (in ET ) of u. It is important
to note that, for two vertices u and v such that ET [u] and ET [v] are disjoint
subtrees of ET , no vertex belonging to ET [u] is adjacent to any vertex belonging
to ET [v] in G or G+_α . In addition, N (ET [v]) is a clique in G+_α , and a minimal
vertex separator in both G+_α and G when v is not the only child of its parent in
ET .
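The elimination procedure and the parent function translate into a short sketch (ours, not from the paper; the encoding is an assumption):

```python
def elimination_game(order, adj):
    """Compute G+_alpha and the elimination tree's parent function for an
    elimination order (list of vertices) of a graph adj: {v: set of nbrs}."""
    pos = {v: i for i, v in enumerate(order)}
    fill_adj = {v: set(adj[v]) for v in adj}   # grows into G+_alpha
    parent = {}
    for v in order:
        higher = [u for u in fill_adj[v] if pos[u] > pos[v]]
        if higher:
            # parent(v) is the lowest-numbered vertex of madj+(v)
            parent[v] = min(higher, key=lambda u: pos[u])
        # make the higher-numbered neighbours of v a clique
        for a in higher:
            for b in higher:
                if a != b:
                    fill_adj[a].add(b)
    return fill_adj, parent
```

On the cycle 1–2–3–4–1 eliminated in numeric order, the game adds the fill edge {2, 4} and yields the elimination tree 1 → 2 → 3 → 4.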
Let α be an elimination order of the vertices of a graph G = (V, E) and let
ET be the corresponding elimination tree of G. Observe that the elimination
tree ET gives enough information about the chordal completion G+ of G that
ET corresponds to. It is important to understand that any post order α of the
vertices of ET is an elimination order on G that results in the same chordal
completion G+_α = G+ . Thus given G and ET , we have all the information we
need on the corresponding triangulation.
Definition 4. Given an elimination tree ET of G, the pruned subtree with root
in x, ETp [x], is the subtree obtained from ET [x] by deleting all descendants of
every vertex y ∈ ET [x] such that xy ∈ E(G) but no descendant of y is a neighbor
of x in G.
Thus, the leaves of ETp [x] are neighbors of x in G, and all lower numbered
neighbors in G+ of x are also included in ETp [x]. In addition, there may clearly
be vertices in ETp [x] that are not neighbors of x in G. However, every
neighbor of x in G+ appears in ETp [x], as we prove in the following lemma.
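Definition 4 amounts to keeping exactly those vertices of ET[x] whose subtree contains a G-neighbor of x; a sketch (ours, for illustration, with an assumed children-map encoding):

```python
def pruned_subtree(x, children, adj):
    """Vertex set of ETp[x]: keep y in ET[x] iff y == x or the subtree rooted
    at y contains a G-neighbour of x, so every leaf of ETp[x] is a neighbour
    of x in G.  children: {v: list of children in ET}; adj: neighbours in G."""
    kept = set()

    def walk(y):
        hit = y in adj[x]
        for c in children.get(y, ()):
            hit = walk(c) or hit      # recurse into every child
        if hit:
            kept.add(y)
        return hit

    walk(x)
    kept.add(x)                       # the root x always belongs to ETp[x]
    return kept
```

On the elimination tree 1 → 2 → 3 → 4 of the cycle 1–2–3–4–1, ETp[4] is the whole tree (via the fill edge path), while ETp[2] = {1, 2}.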
Lemma 2. Let α be an elimination order of a graph G = (V, E) and let ET be a
corresponding elimination tree. Then for any u, v ∈ V , u ∈ ETp [v] if and only if
v ∈ madj + (u).
Proof. Let u ∈ ETp [v] and let w be a neighbor of v in G such that u is on a
vw-path in ET . By the definition of pruned tree such a vertex w always exists.
Because ET is an elimination tree, there is a uw-path P + in G+_α such that for
any vertex x of P + , α−1 (x) ≤ α−1 (u). By Lemma 1, this implies that there is
also a uw-path P in G such that for any vertex x of P , α−1 (x) ≤ α−1 (u). Since
w is adjacent to v in G, we conclude that v ∈ madj+ (u).
Let v ∈ madj+ (u). Then there is a uv-path P in G (and hence in G+_α ) such
that all inner vertices of the path are ordered before u in α. Let w be the vertex
of P adjacent to v. Because ET is an elimination tree, we have that u is on the
vw-path in ET . Thus u ∈ ETp [v].
We define a parameter called elimination span, es, as follows:
Definition 5. Given an elimination tree ET of a graph G = (V, E), for each
vertex v ∈ V we define s(v) = |ETp [v]| and es(ET ) = max{s(v) | v ∈ V }−1. The
elimination span of a graph G is es(G) = min{es(ET ) | ET is an elimination
tree of G}.
Theorem 1. For any graph G = (V, E), es(G) = mot(G) − 1.
Proof. Let us prove es(G) ≤ mot(G) − 1 first. Let Π = (A0 , Z0 ), (A1 , Z1 ), . . . ,
(Am , Zm ) be a monotone search program. At every step of the program exactly
one new vertex Ai − Ai−1 is cleared. Thus we can define the vertex ordering α by
putting α−1 (Ai − Ai−1 ) = n − i + 1 for 1 ≤ i ≤ n. At the ith step, when a searcher
is placed at a vertex u = Ai − Ai−1 every vertex v ∈ Ai such that there is a
uv-path with no inner vertices in Ai should be occupied by a searcher (otherwise
v would be recontaminated). Therefore, v ∈ madj + (u) and the number of steps
when a vertex v is occupied by searchers, is |{u | v ∈ madj + (u)}|. By Lemma 2,
|{u | v ∈ madj + (u)}| = s(v) and we arrive at es(ET ) ≤ mot(Π, G) − 1.
We now show that es(G) ≥ mot(G)−1. Let ET be an elimination tree and let
α be a corresponding elimination vertex ordering. We consider a search program
Π where at the ith step of the program, 1 ≤ i ≤ n, the searchers occupy the
set of vertices madj + (v), where v is a vertex with α(v) = n − i + 1. Let us first
80
F.V. Fomin, P. Heggernes, and J.A. Telle
prove that Π is recontamination free. Suppose, on the contrary, that a vertex u is
recontaminated at the ith step after placing a searcher on a vertex v. Then there
is a uv-path P such that no vertex of P except v contains a searcher at the ith
step. On the other hand, vertex u is after v in ordering α. Thus P should contain
a vertex w ∈ madj + (u), w ≠ u, occupied by a searcher. This is a contradiction.
Since every vertex was occupied at least once and no recontamination occurs,
we conclude that at the end of Π all vertices are cleared. Every vertex v was
occupied by searchers during |{u | v ∈ madj + (u)}| steps and using Lemma 2 we
conclude that es(ET ) ≥ mot(Π, G) − 1.
5 Ordered Tree Decompositions and Elimination Trees
In this section we discuss a relation between the treespan ts(G) and elimination
trees of G, establishing that ts(G) = mot(G). We first give a simplified view of
ordered tree decompositions and then proceed to prove some of their properties.
There are exactly n bags in X of an ordered tree decomposition (X, T, r)
of G. Thus, the index set I for Xi , i ∈ I can be chosen so that I = V , with
r ∈ V . Then T is a tree on the vertices of G. To identify the bags and to define
the correspondence between I and V uniquely, name the bags so that Xr is the
bag corresponding to the root r of T . Regarding the bags in a top down fashion
according to T , name the bag in which vertex v appears for the first time Xv and
the corresponding tree node v. Thus if y is the parent of v in T then Xv \ Xy =
{v}. This explains how to rename the bags and the vertices of T with elements
from V given a tree decomposition based on I. However, if we replace i with v
and I with V in Conditions 1 - 3 of the definition of a tree decomposition, and
change condition in the definition of ordered tree decompositions to “Xr = {r},
and if y is the parent of v in T then Xv \ Xy = {v}”, then this will automatically
give a tree T on the vertices of G as we have explained above. For the remainder
of this paper, when we mention an ordered tree decomposition (X, T, r), we will
assume that T is a tree on the vertices of G as explained here. The following
lemma will make the role of T even clearer.
Lemma 3. Given a graph G = (V, E) and a rooted tree T = (V, P ), there exists
an ordered tree decomposition (X, T, r) of G ⇔ for every edge uv ∈ E, u and v
have an ancestor-descendant relationship in T .
Proof. Assume that T corresponds to a valid ordered tree decomposition of G,
but there is an edge uv in G such that T [u] and T [v] are disjoint subtrees of
T . Xu is the first bag in which u appears and Xv is the first bag in which v
appears, thus u and v do not appear in any bag Xw where w is on the path from
u to the root or from v to the root in T . Thus if u and v appear together in any
other bag Xy where y belongs to T [u] or T [v] or any other disjoint subtree in
T , this would violate Condition 3 of a tree decomposition. Therefore, u and v
cannot appear together in any bag, and there cannot exist a valid decomposition
(X, T, r) of G.
Graph Searching, Elimination Trees, and a Generalization of Bandwidth
81
For the reverse direction, assume that for every edge uv in G, u and v have
an ancestor-descendant relationship in T . Assume without loss of generality that
v is an ancestor of u. Then the bags can be defined so that 1) Xv contains v, 2)
no bag Xy contains v where y is an ancestor of v, 3) for every vertex w on the
path from v to u in T , Xw contains v (and w of course), and 4) Xu contains both
u and v. We can see that all the conditions of an ordered tree decomposition are
satisfied.
Lemma 4. Let (X, T, r) be an ordered tree decomposition of a given graph. For
every edge uv in tri(X, T ), u and v have an ancestor-descendant relationship in
T.
Proof. As we have seen in the proof of Lemma 3, if u and v belong to disjoint
subtrees of T , then they cannot appear together in the same bag. Since only
the bags are made into cliques, u and v cannot belong to the same clique in
tri(X, T ), which means that the edge uv does not exist in tri(X, T ).
Lemma 5. Let (X, T, r) be an ordered tree decomposition of a given graph. Let
uv be an edge of tri(X, T ) such that v is an ancestor of u in T . Then v belongs
to bag Xw for every w on the path from v to u including Xv and Xu .
Proof. Vertex v appears for the first time in Xv on the path from the root, and
u appears for the first time in Xu . For every vertex w on the path from v to u,
exactly vertex w is introduced in Xw . Thus Xu is the first bag in which u and
v both can belong to. In order for this to be possible, v must belong to bag Xw
for every vertex w on the path from v to u in T .
Lemma 6. For each graph G, there exists an ordered tree decomposition
(X, T, r) of G of minimum treespan such that if u is a child of v in T then
v ∈ Xu .
Proof. Assume that u is a child of v in T and v ∉ Xu . Clearly, uv is not an edge
of G. Since v does not belong to any bag Xy for a descendant y of u, we can
move u up to be a child of a node w in T where uw is an edge of G and where
w is the first node on the path from v to the root that is a neighbor of u.
Lemma 7. Let (X, T, r) be an ordered tree decomposition of G, and let α :
{1, ..., n} → V be a post order of T . Then G+α ⊆ tri(X, T ).
Proof. Let uv be an edge of G+α , and assume without loss of generality that u
has a lower number than v according to α. If uv is an edge of G, then we are
done. Otherwise, due to Lemma 1, there must exist a path u, x1 , x2 , ..., xk , v in
G with k ≥ 1 such that all xi are ordered before u. Since α is a post order of
T , none of the vertices xi , i = 1, ..., k, can lie on the path from u to the root in
T . Consequently and due to Lemma 3, since ux1 is an edge of G, x1 belongs to
T [u]. With the same argument, since x1 , x2 , ..., xk is a path in G, all the vertices
x1 , x2 , ..., xk must belong to T [u]. Now, since vxk is an edge in G, v must be an
ancestor of xk and thus of u in T , where u lies on the path from v to xk . By
Lemma 5, vertex v must be present in all bags Xw where w lies on the path from
v to xk , and consequently also in bag Xu . Therefore, u and v are both present
in bag Xu and are neighbors in tri(X, T ).
Lemma 8. Let (X, T, r) be an ordered tree decomposition of G, and let α be a
post order of T . Let ET be the elimination tree of G+α . Then for any vertex u,
if v is the parent of u in ET , then v lies on the path from u to the root in T .
Proof. Since v is the parent of u in ET , uv is an edge of G+α . By Lemma 7,
uv is also an edge of tri(X, T ). By Lemma 4, u and v must have an ancestor-descendant relationship in T . Since α is a post order of T , and α−1 (u) < α−1 (v),
v must be an ancestor of u in T .
Theorem 2. For any graph G, ts(G) = es(G).
Proof. First we prove that ts(G) ≤ es(G). Let ET = (V, P ) be an elimination
tree of G such that es(G) = es(ET ), and let r be the root vertex of ET . We
define an ordered tree decomposition (X = {Xv | v ∈ V }, T = ET, r) of G in the
following way. For each vertex v in ET , put v in exactly the bags Xu such that
u ∈ ETp [v]. Regarding ET top down, each vertex u will appear for the first time
in bag Xu , and clearly |Xu \ Xv | = 1 whenever v is the parent of u. It remains
to show that (X, ET ) is a tree decomposition of G. Conditions 1 and 3 of a tree
decomposition are trivially satisfied since ETp [v] is connected and includes v for
every vertex v. For Condition 2, if uv is an edge of G, then the lower numbered
of v and u is a descendant of the other in ET . Let us say u is a descendant of v,
then u ∈ ETp [v], and v and u will both appear in bag Xu . Thus (X, ET ) is an
ordered tree decomposition of G, and clearly, ts(X, ET ) = es(G). Consequently,
ts(G) ≤ es(G).
Now we show that es(G) ≤ ts(G). Let (X, T, r) be an ordered tree decomposition of G with ts(X, T, r) = ts(G). Let α be a post order on T , and let ET be
the elimination tree of G+α . For any two adjacent vertices u and v in G, u and v
must have an ancestor-descendant relationship both in T and in ET . Moreover,
due to Lemma 8, all vertices that are on the path between u and v in ET must
also be present on the path between u and v in T . Assume, without loss of generality, that u is numbered lower than v. By Lemma 5, v must belong to all the
bags corresponding to the vertices on the path from v to u in T . Thus for each
vertex v, s(v) in ET is at most l(v) in (X, T, r). Consequently, es(G) ≤ ts(G),
and the proof is complete.
Theorems 1 and 2 imply the main combinatorial result of this paper.
Corollary 1. For any graph G, ts(G) = es(G) = mot(G).
6 Treespan of Some Special Graph Classes
The diameter of a graph G, diam(G), is the maximum length of a shortest
path between any two vertices of G. The density of a graph G is defined as
dens(G) = (n − 1)/diam(G). The following result is well known.
Lemma 9. [6] For any graph G, bw(G) ≥ max{dens(H) | H ⊆ G}.
A caterpillar is a tree consisting of a main path of vertices of degree at least
two with some leaves attached to this main path.
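The quantities diam(G) and dens(G) are easy to evaluate on small examples. The sketch below (function names and the example graph are editorial assumptions, not from the paper) computes them by breadth-first search, so that the lower bounds of this section can be checked on concrete caterpillar subgraphs.

```python
from collections import deque

def diameter(adj):
    """Length of a longest shortest path, by BFS from every vertex."""
    def ecc(s):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(ecc(v) for v in adj)

def density(adj):
    """dens(G) = (n - 1) / diam(G)."""
    return (len(adj) - 1) / diameter(adj)

# A 5-vertex caterpillar: centre b with leaves a, c, x, y attached.
cat = {'b': {'a', 'c', 'x', 'y'}, 'a': {'b'}, 'c': {'b'},
       'x': {'b'}, 'y': {'b'}}
print(density(cat))  # (5 - 1) / 2 = 2.0
```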
Theorem 3. For any graph G, ts(G) ≥ max{dens(H) | H ⊆ G and H is a
caterpillar}.
Proof. Let the caterpillar H be a subgraph of G consisting of the following main
path: c1 , c2 , ..., cdiam(H)−1 . We view the bags of an ordered tree decomposition
as labeled by vertices of G in the natural manner (as described before Lemma
3). Let (X, T, r) be an ordered tree decomposition of G with (X ′ , T ′ , r′ ) being
the topologically induced ordered tree decomposition on H, i.e. containing only
bags labeled by a vertex from H, where we contract edges of T going to vertices
labeled by vertices not in H to get T ′ . Let Xci be the ’highest’ bag in (X ′ , T ′ , r′ )
labeled by a vertex from the main path, so that only the subtree of (X ′ , T ′ , r′ )
rooted at Xci contains any vertices from the main path. Let there be h + 1
bags on the path from Xci to the root Xr′ of (X ′ , T ′ , r′ ). Since vertex r′ of H
(a leaf unless r′ = ci ) is adjacent to a vertex on the main path it appears in
at least h + 1 bags, giving ts(G) ≥ h. Moreover, by applying Lemma 3 we get
that T ′ between its root Xr′ and Xci consists simply of a path without further
children, so that the subtree rooted at Xci has |V (H)| − h bags. Each of these
bags contain a vertex from the main path since every leaf of H is adjacent in H
only to a vertex on the main path, and by the pigeonhole principle we thus have
that some main path vertex lives in at least ⌈(|V (H)| − h)/(diam(H) − 1)⌉ bags.
If (|V (H)| − h)/(diam(H) − 1) is not an integer, then immediately we have the
bound ts(G) ≥ ⌊(|V (H)| − h)/(diam(H) − 1)⌋. If (diam(H) − 1) on the other
hand does divide (|V (H)| − h) then we apply the fact that at least diam(H) − 2
bags must contain at least two vertices from the main path, to account for edges
between them, and for diam(H) ≥ 3 (which holds except for the trivial case H
a star) this increases the span of at least one main path vertex and we again get
ts(G) ≥ ⌊(|V (H)| − h)/(diam(H) − 1)⌋.
Thus ts(G) ≥ max{h, ⌊(|V (H)| − h)/(diam(H) − 1)⌋}. If h ≤ dens(H) we
have that ⌊(|V (H)| − h)/(diam(H) − 1)⌋ ≥ (|V (H)| − 1)/diam(H) and therefore
⌊(|V (H)| − h)/(diam(H) − 1)⌋ ≥ dens(H). We conclude that ts(G) ≥ dens(H)
and the theorem follows.
With this theorem, in connection with the following result from [2], we can
conclude that bw(G) = ts(G) for a caterpillar graph G.
Lemma 10. [2] For a caterpillar graph G, bw(G) ≤ max{dens(H) | H ⊆ G}.
Lemma 11. For a caterpillar graph G, bw(G) = ts(G) = max{dens(H) | H ⊆
G}.
Proof. Let G be a caterpillar graph. Then, bw(G) ≥ ts(G) ≥ max{dens(H) |
H ⊆ G} ≥ bw(G). The first inequality was mentioned in Section 5, the second
inequality is due to Theorem 3, and the last inequality is due to Lemma 10 since
G is a caterpillar. Thus all of the mentioned parameters on G are equal.
A set of three vertices x, y, z of a graph G is called an asteroidal triple (AT)
if for any two of these vertices there exists a path joining them that avoids the
(closed) neighborhood of the third. A graph G is called an asteroidal triple-free
(AT-free) graph if G does not contain an asteroidal triple. This notion was
introduced by Lekkerkerker and Boland [17] for the following characterization of
interval graphs: G is an interval graph if and only if it is chordal and AT-free.
A graph G is said to be cobipartite if it is the complement of a bipartite
graph. Notice that cobipartite graphs form a subclass of AT-free claw-free graphs.
Another subclass of AT-free claw-free graphs are the proper interval graphs,
which were mentioned earlier. Thus G is a proper interval graph if and only if it
is chordal and AT-free claw-free. A minimal triangulation of G is a triangulation
H such that no proper subgraph of H is a triangulation of G. The following result
is due to Parra and Scheffler.
Theorem 4. [20] Let G be an AT-free claw-free graph. Then every minimal
triangulation of G is a proper interval graph, and hence, bw(G) = pw(G) =
tw(G).
Theorem 5. For an AT-free claw-free graph G, ts(G) = bw(G) = pw(G) =
tw(G).
Proof. Let G be AT-free claw-free and let H be its minimal triangulation such
that ts(G) = ts(H). Such a graph H must exist, since for an optimal ordered
tree decomposition (X, T, r), the graph tri(X, T ) is chordal and ts(tri(X, T )) =
ts(G). Thus any minimal graph from the set of chordal graphs ’sandwiched’
between tri(X, T ) and G can be chosen as H. By Theorem 4, H is a proper
interval graph. Thus ω(H) − 1 = bw(H) ≥ bw(G). Since ts(H) ≥ ω(H) − 1, we
have that ts(G) = ts(H) ≥ ω(H) − 1 ≥ bw(G) ≥ ts(G).
By the celebrated result of Arnborg, Corneil & Proskurowski [1], computing tree-width
(and hence path-width and bandwidth) is NP-hard even for cobipartite graphs.
Thus Theorem 5 yields the following corollary.
Corollary 2. Computing treespan is NP-hard for cobipartite graphs.
We conclude with an open question. For any graph G, ts(G) ≥ ⌈∆(G)/2⌉. For
trees of maximum degree at most 3 it is easy to prove that ts(G) ≤ ⌈∆(G)/2⌉. It
is an interesting question whether treespan can be computed in polynomial time
for trees of larger max degree. Notice that bandwidth remains NP-complete on
trees of max degree 3 [13].
References
1. S. Arnborg, D.G. Corneil, and A. Proskurowski, Complexity of finding embeddings in a k-tree, SIAM J. Alg. Disc. Meth., 8 (1987), pp. 277–284.
2. S.F. Assman, G.W. Peck, M.M. Syslo, and J. Zak, The bandwidth of caterpillars with hairs of length 1 and 2, SIAM J. Alg. Disc. Meth., 2 (1981), pp. 387–392.
3. D. Bienstock, Graph searching, path-width, tree-width and related problems (a
survey), DIMACS Ser. in Discrete Mathematics and Theoretical Computer Science,
5 (1991), pp. 33–49.
4. D. Bienstock and P. Seymour, Monotonicity in graph searching, J. Algorithms,
12 (1991), pp. 239–245.
5. H.L. Bodlaender, A partial k-arboretum of graphs with bounded treewidth, Theor.
Comp. Sc., 209 (1998), pp. 1–45.
6. P.Z. Chinn, J. Chvátalová, A.K. Dewdney, and N.E. Gibbs, The bandwidth
problem for graphs and matrices – a survey, J. Graph Theory, 6 (1982), pp. 223–
254.
7. N.D. Dendris, L.M. Kirousis, and D.M. Thilikos, Fugitive-search games on
graphs and related parameters, Theor. Comp. Sc., 172 (1997), pp. 233–254.
8. J.A. Ellis, I.H. Sudborough, and J. Turner, The vertex separation and search
number of a graph, Information and Computation, 113 (1994), pp. 50–79.
9. F. Fomin, Helicopter search problems, bandwidth and pathwidth, Discrete Appl.
Math., 85 (1998), pp. 59–71.
10. F.V. Fomin and P.A. Golovach, Graph searching and interval completion, SIAM
J. Discrete Math., 13 (2000), pp. 454–464 (electronic).
11. M. Franklin, Z. Galil, and M. Yung, Eavesdropping games: A graph-theoretic
approach to privacy in distributed systems, J. ACM, 47 (2000), pp. 225–243.
12. D. Fulkerson and O. Gross, Incidence matrices and interval graphs, Pacific
Journal of Math., 15 (1965), pp. 835–855.
13. M.R. Garey, R.L. Graham, D.S. Johnson, and D.E. Knuth, Complexity results for bandwidth minimization, SIAM J. Appl. Math., 34 (1978), pp. 477–495.
14. F. Gavril, The intersection graphs of subtrees in trees are exactly the chordal
graphs, J. Combin. Theory Ser. B, 16 (1974), pp. 47–56.
15. L.M. Kirousis and C.H. Papadimitriou, Searching and pebbling, Theor. Comp.
Sc., 47 (1986), pp. 205–218.
16. A.S. LaPaugh, Recontamination does not help to search a graph, J. ACM, 40
(1993), pp. 224–245.
17. C.G. Lekkerkerker and J.C. Boland, Representation of a finite graph by a set
of intervals on the real line, Fund. Math, 51 (1962), pp. 45–64.
18. F.S. Makedon, C.H. Papadimitriou, and I.H. Sudborough, Topological bandwidth, SIAM J. Alg. Disc. Meth., 6 (1985), pp. 418–444.
19. F.S. Makedon and I.H. Sudborough, On minimizing width in linear layouts,
Disc. Appl. Math., 23 (1989), pp. 201–298.
20. A. Parra and P. Scheffler, Treewidth equals bandwidth for AT-free claw-free
graphs, Technical Report 436/1995, Technische Universität Berlin, Fachbereich
Mathematik, Berlin, Germany, 1995.
21. F.S. Roberts, Indifference graphs, in Proof Techniques in Graph Theory, F.
Harary, ed., Academic Press, 1969, pp. 139–146.
22. N. Robertson and P.D. Seymour, Graph minors – a survey, in Surveys in
Combinatorics, I. Anderson, ed., Cambridge Univ. Press, 1985, pp. 153–171.
23. A.L. Rosenberg and I.H. Sudborough, Bandwidth and pebbling, Computing,
31 (1983), pp. 115–139.
Constructing Sparse t-Spanners with Small Separators
Joachim Gudmundsson⋆
Department of Mathematics and Computing Science, TU Eindhoven
5600 MB Eindhoven, The Netherlands.
Abstract. Given a set of n points S in the plane and a real value t > 1
we show how to construct in time O(n log n) a t-spanner G of S such
that there exists a set of vertices S ′ of size O(√n log n) whose removal
leaves two disconnected sets A and B where neither is of size greater than
2/3 · n. The spanner also has some additional properties: low weight and
constant degree.
1 Introduction
Complete graphs represent ideal communication networks but they are expensive to build; sparse spanners represent low cost alternatives. The weight of the
spanner network is a measure of its sparseness; other sparseness measures include
the number of edges, maximum degree and the number of Steiner points. Spanners for complete Euclidean graphs as well as for arbitrary weighted graphs find
applications in robotics, network topology design, distributed systems, design of
parallel machines, and many other areas, and have been subject to considerable
research [1,2,5,8,14]. Consider a set S of n points in the plane. A network on
S can be modeled as an undirected graph G with vertex set S and with edges
e = (u, v) of weight wt(e). In this paper we will study Euclidean networks: a
Euclidean network is a geometric network where the weight of the edge e = (u, v)
is equal to the Euclidean distance d(u, v) between its two endpoints u and v. Let
t > 1 be a real number. We say that G is a t-spanner for S, if for every pair of
points u, v ∈ S, there exists a path in G of weight at most t times the Euclidean
distance between u and v. A sparse t-spanner is defined to be a t-spanner with a
linear number of edges and total weight (sum of edge weights) O(wt(M ST (S))),
where wt(M ST (S)) is the total weight of a minimal spanning tree of S.
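The t-spanner condition can be verified directly on small instances. The following sketch (an editorial illustration with assumed names, not part of the paper) computes the dilation of a geometric graph, i.e. the smallest t for which it is a t-spanner, via Floyd-Warshall:

```python
import math

def dilation(points, edges):
    """Max over all pairs of (graph distance / Euclidean distance):
    the smallest t for which (points, edges) is a t-spanner."""
    n = len(points)
    d = [[0.0 if i == j else math.inf for j in range(n)]
         for i in range(n)]
    for u, v in edges:
        d[u][v] = d[v][u] = math.dist(points[u], points[v])
    for k in range(n):                      # Floyd-Warshall
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return max(d[i][j] / math.dist(points[i], points[j])
               for i in range(n) for j in range(n) if i != j)

# Unit square with its four sides: the worst pair is a diagonal,
# giving dilation 2 / sqrt(2) = sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(dilation(square, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```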
Many algorithms are known that compute t-spanners with O(n) edges that
have additional properties such as bounded degree, small spanner diameter (i.e.,
any two points are connected by a t-spanner path consisting of only a small
number of edges), low weight (i.e., the total length of all edges is proportional
to the weight of a minimum spanning tree of S), and fault-tolerance; see, e.g.,
[1,2,3,5,7,8,9,11,12,14,19], and the surveys [10,20]. All these algorithms compute
t-spanners for any given constant t > 1.
⋆ Supported by The Netherlands Organisation for Scientific Research (NWO).
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 86–97, 2003.
c Springer-Verlag Berlin Heidelberg 2003
In this paper, we consider the construction of a sparse t-spanner with constant degree and with a provable balanced separator. Finding small separators in
a graph is a problem that has been studied extensively within theoretical computer science for the last three decades, and a survey of the area can be found
in the book by Rosenberg and Heath [17]. Spanners with good separators have,
for example, applications in the construction of external memory data structures [16]. It is well-known that planar graphs have small separators and, hence
any planar spanner has a small separator. Bose et al. [4] showed how to construct
a planar t-spanner for t ≈ 10 with constant degree and low weight. Also, it is
known that the Delaunay triangulation is a t-spanner for t = 2π/(3 cos(π/6))
[13]. For arbitrary values of t > 1 this article is, to the best of the author’s
knowledge, the first time that separators have been considered.
Definition 1. Given a graph G = (V, E), a separator is a set of vertices C ⊂ V
whose removal leaves two disconnected sets A and B. A separator C is said to be
balanced if the size of both A and B is at most 2/3 · |V |.
The main result of this paper is the following theorem.
Theorem 1. Given a set S of n points in the plane and a constant t > 1, there
is an O(n log n)-time algorithm that constructs a graph G = (S, E)
1. that is a t-spanner of S,
2. that has a linear number of edges,
3. that has weight O(wt(M ST (S))),
4. that has a balanced separator of size O(√n log n),
5. and in which each node has constant degree.
The paper is organised as follows. First we present an algorithm that produces
a t-spanner G. Then, in Section 3, we prove that G has all the properties stated
in Theorem 1.
2 Constructing a t-Spanner
In this section we first show an algorithm that, given a set S of n points in the
plane together with a real value t > 1, produces a t-spanner G. The algorithm
works in two steps: first it produces a modified approximate θ-graph [6,12,18],
denoted Gθ , which is then pruned using a greedy approach [1,5,8,11]. We show
that the resulting graph, denoted G, has two basic properties that will be used
to prove that it is a sparse spanner with a balanced separator.
2.1 The Algorithm
It has long been known that for any constant t > 1, every point set S in the
plane has a t-spanner with O(n) edges. One such construction is the θ-graph
of S. Let θ < π/4 be a value such that kθ = 2π/θ is a positive integer. The
θ-graph of S is obtained by drawing kθ non-overlapping cones around each point
p ∈ S, each spanning an angle of θ, and connecting p to the point in each cone
closest to p. For each of these edges, p is said to be the source while the other
endpoint is said to be the sink. The result is a tθ -spanner with at most nkθ edges.
Here tθ = (cos(θ) − sin(θ))−1 . The time needed to construct the θ-graph for any
constant θ is O(n log n) [12].
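A naive quadratic-time sketch of the θ-graph construction follows (an editorial illustration: the O(n log n) construction of [12] uses plane sweeps, and for simplicity "closest" is taken as Euclidean distance, whereas the classical definition projects onto the cone bisector):

```python
import math

def theta_graph(points, k):
    """Naive O(k * n^2) theta-graph: around each point p, split the
    plane into k cones of angle 2*pi/k and connect p (the source) to
    the closest other point inside each cone (the sink)."""
    theta = 2 * math.pi / k
    edges = set()
    for i, p in enumerate(points):
        best = {}                      # cone index -> (dist, sink)
        for j, q in enumerate(points):
            if i == j:
                continue
            ang = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
            cone = int(ang / theta)
            dist = math.dist(p, q)
            if cone not in best or dist < best[cone][0]:
                best[cone] = (dist, j)
        for _, j in best.values():
            edges.add((min(i, j), max(i, j)))   # store undirected
    return edges

# Three collinear points: each point connects only to its neighbour(s).
print(theta_graph([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], 8))
```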
Approximate the θ-Graph. Here we will build an approximate version of the
θ-graph, which we denote a φ-graph Gφ = (S, Eφ ). First build a θ′ -graph (S, Eθ′ )
with θ′ = εθ, for some small constant ε, as shown in Fig. 1a. A point v ∈ S
belongs to Sp if and only if (p, v) ∈ Eθ′ and p is the source of (p, v). Process
each point p ∈ S iteratively as follows until Sp is empty. Let v be the point in
Sp closest to p. Add the edge (p, v) to Eφ′ and remove every point u from Sp for
which it holds that ∠vpu < (θ/2), as illustrated in Fig. 1b. Continue until Sp is
empty.
Gφ′ is a tφ′ -spanner with tφ′ = (cos(φ′ ) − sin(φ′ ))−1 and, since two adjacent
cones may overlap, the number of outgoing edges is bounded by 4π/θ. Arya et
al. [2] showed that a θ-graph can be pruned such that each point has constant
degree. Applying this result to Gφ′ gives a tφ -spanner Gφ where each point has
degree bounded by O(tφ′ /(θ(tφ′ − tφ ))). Note that the value of φ′ is θ(1 + 2ε).
Remove “long” Edges Intersecting “short” Edges. The remaining two
steps of the construction algorithm both prune the graph. Prune Gφ =
(S, Eφ ) to obtain a graph Gθ = (S, Eθ ) as follows. Build the minimum spanning
tree Tmst = (S, Emst ) of S. Sort the edges in Eφ and in Emst with respect to
their lengths. We obtain the two ordered sets Eφ = {e1 , . . . , eO(n) } and Emst =
{e′1 , . . . , e′n−1 } respectively. The idea is to process the edges in Eφ in order, while
maintaining a graph T that will cluster vertices that lie within distance l from
each other, where l = |ei |/n² and ei is the edge just about to be processed. The
graph will also contain information about the convex hull of each cluster and we
will show that this can be done in linear time if the minimum spanning tree is
given.
Initially T contains n clusters where every cluster is a single point. Assume
that we are about to process an edge ei = (u, v) ∈ Eφ . The first step is to merge
all clusters in T that are connected by an edge of length at most l = |ei |/n² .
This is done by extracting the shortest edge, e′j = (u′j , vj′ ), in Emst and merging
the two clusters C1 and C2 containing u′j and vj′ , respectively. This is repeated until there
are no more edges in Emst of length less than l = |ei |/n² . At the same time we
also compute the convex hull, denoted C, of C1 and C2 ; note that this can be
done in linear time with respect to the decrease in complexity from C1 and C2
to C. Hence, in total, it will require linear time to update the convex hulls of
the clusters. Now we are ready to process ei = (u, v). Let m(u, l) and m(v, l)
denote the clusters in T containing u and v respectively. If ei intersects the
convex hull of either m(u, l) or m(v, l) then ei is discarded, otherwise it is added
to Eθ , as shown in Fig. 1c. Since the original graph is a φ-graph it is not hard
to see that between every pair of clusters, C1 and C2 , there is at least one edge
(u, v) ∈ Eφ such that u and v lie on the convex hull of C1 and C2 , respectively.
This finishes the second part of the algorithm and we sum it up by stating the
following straight-forward observation.
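The cluster maintenance just described can be sketched with a union-find structure. Everything below is an editorial assumption; in particular the convex-hull test of the real algorithm is abstracted into a caller-supplied callback rather than implemented:

```python
class DSU:
    """Union-find over vertices 0..n-1, maintaining the clusters."""
    def __init__(self, n):
        self.p = list(range(n))
    def find(self, x):
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]   # path halving
            x = self.p[x]
        return x
    def union(self, a, b):
        self.p[self.find(a)] = self.find(b)

def prune_long_edges(n, mst_edges, cand_edges, intersects_hull=None):
    """Skeleton of the pruning step: process candidate edges
    (length, u, v) by increasing length; before testing an edge of
    length L, merge every cluster pair joined by an MST edge of
    length <= L/n^2.  The convex-hull bookkeeping is abstracted
    into the `intersects_hull(dsu, u, v)` callback."""
    mst = sorted(mst_edges)
    cand = sorted(cand_edges)
    dsu, kept, j = DSU(n), [], 0
    for length, u, v in cand:
        threshold = length / n ** 2
        while j < len(mst) and mst[j][0] <= threshold:
            dsu.union(mst[j][1], mst[j][2])
            j += 1
        if intersects_hull and intersects_hull(dsu, u, v):
            continue                         # edge is discarded
        kept.append((length, u, v))
    return kept
```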
Fig. 1. (a) Constructing a θ′ -graph, which is then (b) pruned to obtain a φ-graph. (c)
Every edge is tested to see if it intersects the convex hulls of the clusters containing u
and v.
Observation 1 The above algorithm produces a graph Gθ in time O(n log n)
which is a tθ -spanner, where tθ ≤ 1/(cos(φ) − sin(φ)) + 1/n.
Greedily Pruning the Graph. We are given a modified approximate θ-graph
Gθ for tθ = t/(1 + ε). The final step is to run the greedy tg -spanner algorithm
with Gθ and tg = (1 + ε) as input. The basic idea of the standard greedy algorithm is sorting the edges (by increasing weight) and then processing them in
order. Greedy processing of an edge e = (u, v) entails a shortest path query, i.e.,
checking whether the shortest path in the graph built so far has length at most
t · d(u, v). If the answer to the query is no, then edge e is added to the spanner
G, else it is discarded, see Fig. 2. The greedy algorithm was first considered by
Althöfer et al. [1] and later variants of the greedy algorithm using clustering
techniques improved the analysis [5,8,11]. In [8] it was observed that shortest
path queries need not be answered precisely. Instead, approximate shortest path
queries suffice; of course, this meant that the greedy algorithm, too, was only
approximately simulated by the algorithm. The most efficient algorithm was
recently presented by Gudmundsson et al. [11], where they show an O(n log n)-time variant of the greedy algorithm. In the approximate greedy algorithm an
approximate shortest path query checks if the path is longer than τ · d(u, v),
where 1 < τ < t.
2.2 Two Basic Properties
The final result is a t-spanner G = (S, E) with several nice properties, among
them the following two simple and fundamental properties that will be used in
Algorithm Standard-Greedy(G = (S, E), t)
1. sort the edges in E by increasing weight
2. E ′ := ∅
3. G′ := (S, E ′ )
4. for each edge (u, v) ∈ E do
5.     if ShortestPath(G′ , u, v) > t · d(u, v) then
6.         E ′ := E ′ ∪ {(u, v)}
7.         G′ := (S, E ′ )
8. output G′
Fig. 2. The naive O(|E|2 · |S| log |S|)-time greedy spanner algorithm
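The pseudocode of Fig. 2 translates directly into Python. The version below is an editorial sketch: it uses exact Dijkstra shortest-path queries and runs on the complete Euclidean graph, whereas the paper feeds the algorithm the sparse graph Gθ and uses approximate queries to reach O(n log n) time.

```python
import heapq, math

def shortest_path(adj, n, s, t):
    """Dijkstra distance from s to t in the partial spanner."""
    dist = [math.inf] * n
    dist[s] = 0.0
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            return d
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist[t]

def greedy_spanner(points, t):
    """Standard-Greedy on the complete Euclidean graph: process edges
    by increasing length, keep (u, v) only if the spanner built so
    far has no u-v path of length <= t * d(u, v)."""
    n = len(points)
    edges = sorted((math.dist(points[u], points[v]), u, v)
                   for u in range(n) for v in range(u + 1, n))
    adj = [[] for _ in range(n)]
    kept = []
    for w, u, v in edges:
        if shortest_path(adj, n, u, v) > t * w:   # no short path yet
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v))
    return kept
```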
the analysis: the obtuse Empty-cone property, and the Leap-frog property. Let
C(u, v, θ) denote the (unbounded) cone with apex at u, spanning an angle of θ
such that (u, v) splits the angle at u into two equal angles. An edge set E is said
to have the Empty-cone property if for every edge e = (u, v) ∈ E it holds that
v is the point closest to u within C(u, v, θ).
From the definition of θ-graphs it is obvious that Gθ satisfies the Empty-cone
property; in fact, the property can be somewhat strengthened to what we call
an obtuse Empty-cone property. Assume w.l.o.g. that (u, v) is
vertical, u lies below v and u is the source of e. Since u and v lie on the convex
hull of m(u, l) and m(v, l), respectively (otherwise e would have been discarded in the pruning
step), there are two half disks intersecting (u, v) with radii l = |e|/n²
and centers at u and v, see Fig. 3a. Thus, the union of the half disks and the part
of the cone C(u, v, θ) within distance |uv| from u is said to be an obtuse cone,
and is denoted Co (u, v, θ). The following observation is straight-forward.
Observation 2 The shortest edge that intersects an edge e = (u, v) ∈ E satisfying the obtuse Empty-cone property must be longer than 2|e| sin(θ/2)/n² .
Next we consider the Leap-frog property. Let t ≥ τ > 1. An edge set E
satisfies the (t, τ )-leapfrog property if the following is true for every possible
E ′ = {(u1 , v1 ), . . . , (um , vm )}, which is a subset of E:
τ · wt(u1 , v1 ) < Σ_{i=2}^{m} wt(ui , vi ) + t · ( Σ_{i=1}^{m−1} wt(vi , ui+1 ) + wt(vm , u1 )).
Informally, this definition says that if there exists an edge between u1 and v1
then any path, not including (u1 , v1 ) must have length greater than τ ·wt(u1 , v1 ),
as illustrated in Fig. 3b.
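On tiny edge sets the leap-frog inequality can be checked by brute force. The checker below is an editorial assumption: it simply enumerates every ordered, oriented subset of at least two edges, so it is exponential-time and only meant to build intuition.

```python
import math
from itertools import permutations, product

def leapfrog_holds(edges, t, tau):
    """Brute-force check of the (t, tau)-leap-frog inequality over
    every ordered, oriented subset of >= 2 edges; `edges` is a list
    of ((x, y), (x, y)) point pairs."""
    wt = math.dist
    for m in range(2, len(edges) + 1):
        for sub in permutations(edges, m):
            for flips in product((False, True), repeat=m):
                seq = [(b, a) if f else (a, b)
                       for (a, b), f in zip(sub, flips)]
                lhs = tau * wt(*seq[0])
                rhs = (sum(wt(u, v) for u, v in seq[1:])
                       + t * (sum(wt(seq[i][1], seq[i + 1][0])
                                  for i in range(m - 1))
                              + wt(seq[-1][1], seq[0][0])))
                if lhs >= rhs:
                    return False
    return True
```

Two well-separated unit edges pass the check, while two nearly coincident parallel edges fail it, matching the intuition that a leap-frog edge set cannot contain a much cheaper detour for any of its edges.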
Lemma 1. Given a set of points in the plane and a real value t > 1 the above
algorithm produces a t-spanner G = (S, E) that satisfies the obtuse Empty-cone
property, and the Leap-frog property.
Fig. 3. (a) The shaded area, denoted Co (u, v, θ), is empty if e satisfies the obtuse
Empty-cone property. (b) Illustrating the Leap-frog property.
Proof. Since E is a subset of the edges in the approximate θ-graph Gθ it immediately follows that E has the obtuse Empty-cone property.
Now, let C be the shortest simple cycle in G containing an arbitrary edge
e = (u, v). To prove that G satisfies the leapfrog property we have to estimate
wt(C) − wt(u, v). Let e′ = (u′ , v ′ ) be the longest edge of C. Among the cycle
edges e′ is examined last by the algorithm. What happens while the algorithm
is examining e′ ? In [11] it was shown that if the algorithm adds an edge e′ to the
graph the shortest path between u′ and v ′ must be longer than τ · d(u′ , v ′ ) in the
partial graph constructed so far. Hence, wt(C) − d(u, v) ≥ wt(C) − d(u′ , v ′ ) >
τ · d(u′ , v ′ ) ≥ τ · d(u, v). The lemma follows.
The obtuse Empty-cone property will be used to prove that G has a balanced
separator, and the Leap-frog property will mainly be used to prove that the total
weight of G is small, as will be shown in Section 3.2.
3 The Analysis
In this section we will perform a close analysis of the graph constructed by the
algorithm presented in the previous section. First we study the separator property and then, in Section 3.2, we take a closer look at the remaining properties
claimed in Theorem 1.
3.1 A Balanced Separator
In this subsection we prove that the graph G = (S, E) has a balanced separator of size O(√n log n), by using the famous Planar Separator Theorem of Lipton and Tarjan [15].
Fact 1 (Planar Separator Theorem [15]) Every planar graph G with n vertices
can be partitioned into three parts A, B and C such that C is a separator of
G and |A| ≤ 2n/3, |B| ≤ 2n/3 and |C| ≤ 2√2·√n. Furthermore, there is an
algorithm to compute this partition in time O(n).
The following corollary is a straight-forward consequence of Fact 1.
Corollary 1. Let G be a graph in the plane with n vertices such that every edge of G intersects at most N other edges of G. It can be partitioned into three parts A, B and C such that C is a separator of G and |A| ≤ 2n/3, |B| ≤ 2n/3 and |C| ≤ 2√2·√n·N.
This corollary immediately suggests a way to prove that G has a balanced separator of size O(N√n), namely to prove that every edge in E intersects at most N other edges in E. It should be noted that it is not enough to prove that the intersection graph I of G has low complexity, since finding a balanced separator in I does not imply a balanced separator of G.
The first step is to partition the edge set E into a constant number of groups,
each having the three nice properties listed below. The idea of partitioning the
edge set into groups is borrowed from [7].
The edge set E can be partitioned into a constant number of groups such
that the following three properties are satisfied for each subset:
1. Near-parallel property: Associate to each edge e = (u, v) a slope as follows. Let h be a horizontal segment with left endpoint at the source of e. The slope of e is now the counter-clockwise angle between h and e. An edge e in E belongs to the subgroup Ei if the slope of e is between (i − 1)β and iβ, for some small angle β ≪ θ.
2. Length-grouping property: Let γ > 0 be a small constant. The lengths of any two edges in Ei,j differ by at most a factor δ = (1 − γ) or by at least a factor δ^(c−1).
Consider a group Ei of near-parallel edges. Let the length of the longest edge in Ei be ℓ. Partition the interval [0, ℓ] into an infinite number of intervals {[ℓδ, ℓ], [ℓδ², ℓδ], [ℓδ³, ℓδ²], . . . }. Define the subgroup Ei,j as containing the edges whose lengths lie in intervals {[ℓδ^(j+1), ℓδ^j], [ℓδ^(j+c+1), ℓδ^(j+c)], . . . }. There is obviously only a constant number of such groups.
3. Empty-region property: Any two edges e1 and e2 in Ei,j,k that are near-parallel and almost of equal length are separated by a distance which is a large multiple of |e1|. Hence, two “near-equal” edges cannot be close to each other.
To achieve this grouping [7], construct a graph H where the nodes are edges
of Ei,j , and two “near-equal” nodes in H, say e1 and e2 , are connected by
an edge if e1 intersects a large cylinder of radius α|e2 | and height α|e2 |
centered at the center of the edge e2 in Ei,j , for some large constant α. This
graph has constant degree, because by the Leap-frog property, there can
be only a constant number of similar “near-equal” edges whose endpoints
can be packed into the cylinder. Thus this graph has a constant chromatic
number, and consequently a constant number of independent sets. Hence,
Ei,j is subdivided into a constant number of groups, denoted Ei,j,k .
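The first two grouping criteria can be made concrete as follows. This is a sketch of my own (the names `group_key`, `beta`, `delta`, `c`, `ell` are illustrative); it computes the slope bucket i and the length-interval class j for one edge, while the empty-region index k (the coloring of the conflict graph H) is omitted:

```python
import math

def group_key(edge, beta, delta, c, ell):
    """Return (i, j): i is the slope bucket of width beta (near-parallel
    property); j is the length-interval index modulo c, so that edges in the
    same class have lengths in the same interval or in intervals at least
    c apart (length-grouping property).  ell is the longest edge length."""
    (ux, uy), (vx, vy) = edge
    angle = math.atan2(vy - uy, vx - ux) % (2 * math.pi)
    i = int(angle // beta)                       # slope group E_i
    length = math.hypot(vx - ux, vy - uy)
    # interval index m with ell*delta**(m+1) < length <= ell*delta**m (delta < 1)
    m = int(math.log(length / ell, delta))
    j = m % c                                    # intervals c apart share a class
    return (i, j)

print(group_key(((0, 0), (0.3, 0)), beta=math.pi / 12, delta=0.5, c=3, ell=1.0))  # (0, 1)
```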
Fig. 4. (a) Illustrating the split of Co (u, v, θ) into Cou (u, v, θ) and Cov (u, v, θ). (b) R′1 lies
inside R1 .
Let e = (u, v) be an arbitrary edge in E. Next we will prove that the number
of edges in D = Ei,j,k , for any i, j and k, that may intersect e is bounded by
O(log n) and since there is only a constant number of groups this implies that e
is intersected by at most a logarithmic number of edges of E. For simplicity we will assume that e is horizontal.
To simplify the analysis we partition Co (u, v, θ) into two regions, Cou (u, v, θ)
and Cov (u, v, θ), where every point in Cou (u, v, θ) lies closer to u than to v, see
Fig. 4a. We will prove that the number of edges intersecting (u, v) within the
region Cou (u, v, θ) is bounded by O(log n). By symmetry the proof also holds for
the region Cov (u, v, θ) since a cone of size and shape as described by the region
Cou (u, v, θ) can be placed within Cov (u, v, θ), see Fig. 4a. Hence, for the rest of this
section we will only consider the region Cou (u, v, θ).
Let D′ = {e1, e2, . . . , er} be the edges in D intersecting the part of e within Cou (u, v, θ), ordered from left to right with respect to their intersection with e.
Let qi denote the intersection point between ei and e and let yi denote the length
of the intersection between a vertical line through qi and Cou (u, v, θ).
Fig. 5. Illustrating the proof of Lemma 2.
J. Gudmundsson
Lemma 2. The distance between any pair of consecutive points qi and qi+1 along e is greater than (yi/2) sin(θ/2).
Proof. We will assume that ui and ui+1 lie above vi and vi+1. Note that in the calculations below we assume that the edges in D are parallel, but since the final bound is far from the exact solution, the bound stated in the lemma is still valid. There are three cases to consider.
1. |ei+1| < δ^c · |ei|. We have two subcases:
a) ei+1 does not intersect C(ui, vi, θ), see Fig. 5a.
The distance between ei and ei+1 is minimised when vi+1 is the intersection between the lower side of C(u, v, θ) and the right side of C(ui, vi, θ), and ui lies on the top side of C(u, v, θ). Now, straight-forward trigonometry shows that the horizontal distance between qi and qi+1 is greater than yi sin(θ/2) > (yi/2) sin(θ/2).
b) ei+1 intersects C(ui, vi, θ), see Fig. 5b.
The distance between qi and qi+1 is minimised when ui+1 lies on the right side of C(ui, vi, θ) in a leftmost position. Again, using straight-forward trigonometry we obtain that the distance between qi and qi+1 is greater than |ei|(1 − δ^(c−1)) sin(θ/2) > yi(1 − δ^(c−1)) sin(θ/2) > (yi/2) sin(θ/2).
2. |ei| ≤ δ^c · |ei+1|. We have two subcases:
a) ei does not intersect C(ui+1, vi+1, θ), see Fig. 6a.
The proof is almost identical to case 1a. The distance between qi and qi+1 is minimised when vi+1 is the intersection between the lower side of C(u, v, θ) and the right side of C(ui, vi, θ), and ui lies on the top side of C(u, v, θ). Simple calculations show that the distance between qi and qi+1 is greater than (yi/2) sin(θ/2).
b) ei intersects C(ui+1, vi+1, θ), see Fig. 6b.
The proof is similar to case 1b. The distance between qi and qi+1 is minimised when ui lies on the left side of C(ui+1, vi+1, θ) in a rightmost position. Again, using straight-forward trigonometry we obtain that the distance between qi and qi+1 is at least |ei|(1 − δ^(c−1)) sin(θ/2) > yi(1 − δ^(c−1)) sin(θ/2) > (yi/2) sin(θ/2).
3. δ^c · |ei| ≤ |ei+1| ≤ (1/δ^c) · |ei|.
It follows from the Empty-region property of D that the distance between ei and ei+1 is at least α · max(|ei|, |ei+1|).
⊓⊔
We need one more lemma before we can state the main theorem of this
section.
Lemma 3. e intersects O(log n) edges in G.
Proof. As above we assume w.l.o.g. that e is horizontal. Partition Cou (u, v, θ) into two regions, the region R1 containing all points in Cou (u, v, θ) with horizontal distance at most |e|/n² from u, and the region R2 containing the remaining
Fig. 6. Illustrating the proof of Lemma 2.
region. Consider the disk Du of radius |e|/n² with center at u. From the construction of G it holds that there is a half-disk centered at u and with radius |e|/n² that is empty. We may assume w.l.o.g. that the half-disk covers the upper right quadrant of Du (otherwise it must cover the lower right quadrant of Du), see Fig. 4b.
Let us first consider the region R1. Let R′1 be the rectilinear box inside R1 with bottom left corner at u, width |e|/n² and height (|e|/n²)·sin(θ/2), as illustrated in Fig. 4b. Every edge intersecting e within R1 must also intersect R′1, hence we may consider R′1 instead of R1. According to Lemma 2, the distance between qi and qi+1 is at least (|e|·sin²(θ/2))/n², which implies that the total number of edges that may intersect e within R′1 is at most (|e|/n²) / ((|e|·sin²(θ/2))/n²) = 1/sin²(θ/2), which is constant since θ is a constant.
Next we consider the part of e within R2. The width of R2 is less than |e|/2, its left side has height at least (|e|/n²)·sin(θ/2) and its right side has height at most (|e|/2)·sin θ. From Lemma 2 it holds that yi+1 ≥ yi (1 + sin²(θ/2)/(2 cos(θ/2))) since the distance between qi and qi+1 is at least (yi/2)·sin(θ/2). Set λ = sin²(θ/2)/(2 cos(θ/2)). The length of the shortest edge ℓmin is Ω(|e|/n²) according to Observation 2, and the value of yi is at least (1 + λ)^(i−1) · ℓmin. The largest y-value is obtained for the rightmost intersection point qb. Obviously yb is bounded by (|e|/2)·sin θ, hence it holds that (1 + λ)^b · ℓmin = O(|e|), which is true only for b = O(log n).
⊓⊔
Now we are ready to state the main theorem of this section, which is obtained by putting together Corollary 1 and Lemma 3.

Theorem 2. G has a balanced separator of size O(√n log n).
3.2 Other Properties
Theorem 1 claims that G has five properties which we will discuss below, one by
one:
1. G is a t-spanner of the complete Euclidean graph.
Since Gθ is a (t/(1 + ε))-spanner of the complete Euclidean graph and since
G is a (1 + ε)-spanner of Gθ it follows that G is a t-spanner of the complete
Euclidean graph.
2. G has a linear number of edges.
This property is straight-forward since G is a subgraph of Gθ and we already know from Section 2.1 that the number of edges in Gθ is less than n · 4π/θ.
3. G has weight O(wt(MST)).
Das and Narasimhan showed the following fact about the weight of graphs
that satisfy the Leap-frog property.
Fact 2 (Theorem 3 in [8]). There exists a constant 0 < φ < 1 such that the following holds: if a set of line segments E in d-dimensional space satisfies the (t, τ)-leapfrog property, where t ≥ τ ≥ φt + 1 − φ > 1, then wt(E) = O(wt(MST)), where MST is a minimum spanning tree connecting the endpoints of E. The constant implicit in the O-notation depends on t and d.
The low weight property now follows from the above fact together with
Lemma 1 and the fact that Gθ is a (t/(1 + ε))-spanner of the complete
Euclidean graph of S, hence it also includes a spanning tree of weight
O(wt(M ST (S))).
4. G has a balanced separator.
This follows from Theorem 2.
5. G has constant degree.
This property is straight-forward since G is a subgraph of Gθ, constructed in Section 2.1, which has constant degree.
This concludes the proof of Theorem 1.
4 Conclusions and Further Research
We have shown the first algorithm that, given a set of points in the plane and a real value t > 1, constructs in time O(n log n) a sparse t-spanner with constant degree and with a provably balanced separator. There are two obvious questions: (1) Is there a separator of size O(√n)? (2) Will the algorithm produce a t-spanner with similar properties in higher dimensions? Another interesting question to answer is whether the greedy algorithm by itself produces a t-spanner with a balanced separator.
Acknowledgements. I am grateful to Anil Maheswari for introducing me to
the problem, and to Mark de Berg, Otfried Cheong and Andrzej Lingas for
stimulating and helpful discussions during the preparation of this article.
References
1. I. Althöfer, G. Das, D. P. Dobkin, D. Joseph, and J. Soares. On sparse spanners of weighted graphs. Discrete & Computational Geometry, 9:81–100, 1993.
2. S. Arya, G. Das, D. M. Mount, J. S. Salowe, and M. Smid. Euclidean spanners:
short, thin, and lanky. In Proc. 27th Annual ACM Symposium on Theory of Computing, pages 489–498, 1995.
3. J. Bose, J. Gudmundsson, and P. Morin. Ordered theta graphs. In Proc. 14th
Canadian Conference on Computational Geometry, 2002.
4. J. Bose, J. Gudmundsson, and M. Smid. Constructing plane spanners of bounded
degree and low weight. In Proc. 10th European Symposium on Algorithms, 2002.
5. B. Chandra, G. Das, G. Narasimhan, and J. Soares. New sparseness results on
graph spanners. International Journal of Computational Geometry and Applications, 5:124–144, 1995.
6. K. L. Clarkson. Approximation algorithms for shortest path motion planning. In
Proc. 19th ACM Symposium on Computational Geometry, pages 56–65, 1987.
7. G. Das, P. Heffernan, and G. Narasimhan. Optimally sparse spanners in 3-dimensional Euclidean space. In Proc. 9th Annual ACM Symposium on Computational Geometry, pages 53–62, 1993.
8. G. Das and G. Narasimhan. A fast algorithm for constructing sparse Euclidean
spanners. International Journal of Computational Geometry and Applications,
7:297–315, 1997.
9. G. Das, G. Narasimhan, and J. Salowe. A new way to weigh malnourished Euclidean graphs. In Proc. 6th ACM-SIAM Sympos. Discrete Algorithms, pages 215–
222, 1995.
10. D. Eppstein. Spanning trees and spanners. In J.-R. Sack and J. Urrutia, editors,
Handbook of Computational Geometry, pages 425–461. Elsevier Science Publishers,
Amsterdam, 2000.
11. J. Gudmundsson, C. Levcopoulos, and G. Narasimhan. Improved greedy algorithms for constructing sparse geometric spanners. SIAM Journal on Computing, 31(5):1479–1500, 2002.
12. J. M. Keil. Approximating the complete Euclidean graph. In Proc. 1st Scandinavian Workshop on Algorithmic Theory, pages 208–213, 1988.
13. J. M. Keil and C. A. Gutwin. Classes of graphs which approximate the complete
Euclidean graph. Discrete and Computational Geometry, 7:13–28, 1992.
14. C. Levcopoulos, G. Narasimhan, and M. Smid. Improved algorithms for constructing fault-tolerant spanners. Algorithmica, 32:144–156, 2002.
15. R. J. Lipton and R. E. Tarjan. A separator theorem for planar graphs. SIAM Journal on Applied Mathematics, 36:177–189, 1979.
16. A. Maheswari. Personal communication, 2002.
17. A. L. Rosenberg and L. S. Heath. Graph separators, with applications. Kluwer
Academic/Plenum Publishers, Dordrecht, the Netherlands, 2001.
18. J. Ruppert and R. Seidel. Approximating the d-dimensional complete Euclidean
graph. In Proc. 3rd Canadian Conference on Computational Geometry, pages 207–
210, 1991.
19. J. S. Salowe. Construction of multidimensional spanner graphs with applications to
minimum spanning trees. In Proc. 7th Annual ACM Symposium on Computational
Geometry, pages 256–261, 1991.
20. M. Smid. Closest point problems in computational geometry. In J.-R. Sack and
J. Urrutia, editors, Handbook of Computational Geometry, pages 877–935. Elsevier
Science Publishers, Amsterdam, 2000.
Composing Equipotent Teams
Mark Cieliebak¹, Stephan Eidenbenz², and Aris Pagourtzis³

¹ Institute of Theoretical Computer Science, ETH Zürich. cieliebak@inf.ethz.ch
² Basic and Applied Simulation Science (CCS-5), Los Alamos National Laboratory†. eidenben@lanl.gov
³ Department of Computer Science, School of ECE, National Technical University of Athens, Greece‡. pagour@cs.ntua.gr
Abstract. We study the computational complexity of k Equal Sum
Subsets, in which we need to find k disjoint subsets of a given set of
numbers such that the elements in each subset add up to the same sum.
This problem is known to be NP-complete. We obtain several variations
by considering different requirements as to how to compose teams of
equal strength to play a tournament. We present:
– A pseudo-polynomial time algorithm for k Equal Sum Subsets
with k = O(1) and a proof of strong NP-completeness for k = Ω(n).
– A polynomial-time algorithm under the additional requirement that the subsets should be of equal cardinality c = O(1), and a pseudo-polynomial time algorithm for the variation where the common cardinality is part of the input or not specified at all, which we prove NP-complete.
– A pseudo-polynomial time algorithm for the variation where we look
for two equal sum subsets such that certain pairs of numbers are not
allowed to appear in the same subset.
Our results are a first step towards determining the dividing lines between polynomial time solvability, pseudo-polynomial time solvability,
and strong NP-completeness of subset-sum related problems; we leave
an interesting set of questions that need to be answered in order to obtain the complete picture.
1 Introduction
The problem of identifying subsets of equal value among the elements of a given
set is constantly attracting the interest of various research communities due to
its numerous applications, such as production planning and scheduling, parallel processing, load balancing, cryptography, and multi-way partitioning in VLSI
design, to name only a few. Most research has so far focused on the version where
† LA–UR–03:1158; work done while at ETH Zürich.
‡ Work partially done while at ETH Zürich, supported by the Human Potential Programme of EU, contract no HPRN-CT-1999-00104 (AMORE).
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 98–108, 2003.
© Springer-Verlag Berlin Heidelberg 2003
the subsets must form a partition of the given set; however, the variant where
we skip this restriction is interesting as well. For example, the Two Equal
Sum Subsets problem can be used to show NP-hardness for a minimization
version of Partial Digest (one of the central problems in computational biology whose exact complexity is unknown) [4]. Further applications may include:
forming similar groups of people for medical experiments or market analysis,
web clustering (finding groups of pages of similar content), or fair allocation of
resources.
Here, we look at the problem from the point of view of a tournament organizer: Suppose that you and your friends would like to organize a soccer tournament (you may replace soccer with the game of your choice) with a certain
number of teams that will play against each other. Each team should be composed of some of your friends and – in order to make the tournament more
interesting – you would like all teams to be of equal strength. Since you know
your friends quite well, you also know how well each of them plays. More formally, you are given a set of n numbers A = {a1 , . . . , an }, where the value ai
represents the excellence of your i-th friend in the chosen game, and you need to
find k teams (disjoint subsets1 of A) such that the values of the players of each
team add up to the same number.
This problem can be seen as a variation of Bin Packing with a fixed number of bins. In this new variation we require that all bins should be filled to the same level, while it is not necessary to use all the elements. For any set A of numbers, let sum(A) := Σ_{a∈A} a denote the sum of its elements. We call our problem k Equal Sum Subsets, where k is a fixed constant:
Definition 1 (k Equal Sum Subsets). Given is a set of n numbers A =
{a1 , . . . , an }. Are there k disjoint subsets S1 , . . . , Sk ⊆ A such that sum(S1 ) =
. . . = sum(Sk )?
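For intuition, the definition can be checked by brute force: label each element with one of the k subsets or mark it unused, then compare the subset sums. The sketch below is my own illustration (exponential time; positive integers are assumed, and a positive common sum is required to rule out k empty subsets):

```python
from itertools import product

def k_equal_sum_subsets_naive(a, k):
    """Return k disjoint subsets of a with equal positive sum, or None.
    Tries all (k+1)^n assignments: label 0 means 'unused'."""
    for labels in product(range(k + 1), repeat=len(a)):
        sums = [0] * (k + 1)
        for x, lab in zip(a, labels):
            sums[lab] += x
        target = sums[1]
        if target > 0 and all(s == target for s in sums[1:]):
            return [[x for x, lab in zip(a, labels) if lab == j]
                    for j in range(1, k + 1)]
    return None

print(k_equal_sum_subsets_naive([1, 2, 3, 4], 2))   # -> [[1, 3], [4]] (both sum to 4)
print(k_equal_sum_subsets_naive([1, 2, 4, 8], 2))   # -> None: all subset sums are distinct
```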
The problem k Equal Sum Subsets has been recently shown to be NP-complete for any constant k ≥ 3 [3]. The NP-completeness of the particular case where k = 2 has been shown earlier by Woeginger and Yu [8]. To the best of
our knowledge, the variations of k Equal Sum Subsets that we study in this
paper have not been investigated before in the literature.
We have introduced the parameter k for the number of equal sum subsets as a fixed constant that is part of the problem definition. An interesting variation is to allow k to be a fixed function of the number of elements n, e.g. k = n/q for some constant q. In the sequel, we will always consider k as a function of n; whenever k is a constant we simply write k = O(1).
The definition of k Equal Sum Subsets corresponds to the situation in
which it is allowed to form subsets that do not have the same number of elements. In some cases this makes sense; however, we may want to have the same
¹ Under a strict formalism we should define A as a set of elements which have values {a1, . . . , an}. For convenience, we prefer to identify elements with their values. Moreover, the term “disjoint subsets” refers to subsets that contain elements of A with different indices.
number of elements in each subset (this would be especially useful in composing
teams for a tournament). We thus define k Equal Sum Subsets of Specified
Cardinality as follows:
Definition 2 (k Equal Sum Subsets of Specified Cardinality). Given
are a set of n numbers A = {a1 , . . . , an } and a cardinality c. Are there k disjoint
subsets S1 , . . . , Sk ⊆ A with sum(S1 ) = . . . = sum(Sk ) such that each Si has
cardinality c?
There are two nice variations of this problem, depending on the parameter c.
The first is to require c to be a fixed constant; this corresponds to always playing a specific game (e.g. if you always play soccer then c = 11). We call this problem k Equal Sum Subsets of Cardinality c. The second variation is
to require only that all teams should have an equal number of players, without
specifying this number; this indeed happens in several “unofficial” tournaments,
e.g. when composing groups of people for medical experiments, or in online
computer games. We call the second problem k Equal Sum Subsets of Equal
Cardinality.
Let us now consider another aspect of the problem. Your teams would be
more efficient and happy if they consisted of players that like each other or,
at least, that do not hate each other. Each of your friends has a list of people
that she/he prefers as team-mates or, equivalently, a list of people that she/he
would not like to have as team-mates. In order to compose k equipotent teams
respecting such preferences/exclusions, you should be able to solve the following
problem:
Definition 3 (k Equal Sum Subsets with Exclusions). Given are a set of
n numbers A = {a1 , . . . , an }, and an exclusion graph Gex = (A, Eex ) with vertex
set A and edge set Eex ⊆ A × A. Are there k disjoint subsets S1 , . . . , Sk ⊆ A
with sum(S1 ) = . . . = sum(Sk ) such that each set Si is an independent set in
Gex , i.e., there is no edge between any two vertices in Si ?
An overview of the results presented in this paper is given below. In Section 2, we propose a dynamic programming algorithm for k Equal Sum Subsets with running time O(n·S^k/k^(k−1)), where n is the cardinality of the input set and S is the sum of all numbers in the input set; the algorithm runs in pseudo-polynomial time² for k = O(1). For k Equal Sum Subsets with k = Ω(n), we show strong NP-completeness³ in Section 3 by proposing a reduction from 3-Partition.
² That is, the running time of the algorithm is polynomial in (n, m), where n denotes the cardinality of the input set and m denotes the largest number of the input, but it is not necessarily polynomial in the length of the representation of the input (which is O(n log m)).
³ This means that the problem remains NP-hard even when restricted to instances where all input numbers are polynomially bounded in the cardinality of the input set. In this case, no pseudo-polynomial time algorithm can exist for the problem (unless P = NP). For formal definitions and a detailed introduction to the theory of NP-completeness the reader is referred to [5].
In Section 4, we propose a polynomial-time algorithm for k Equal Sum Subsets of Cardinality c. The algorithm uses exhaustive search and runs in time O(n^(kc)), which is polynomial in n as the two parameters k and c are fixed constants. For k Equal Sum Subsets of Specified Cardinality, we show NP-completeness; the result holds also for k Equal Sum Subsets of Equal Cardinality. However, we show that none of these problems is strongly NP-complete, by presenting an algorithm that can solve them in pseudo-polynomial time.
In Section 5, we study k Equal Sum Subsets with Exclusions, which is
NP-complete since it is a generalization of k Equal Sum Subsets. We present
a pseudo-polynomial time algorithm for the case where k = 2. We also give a
modification of this algorithm that additionally guarantees that the two sets will
have an equal (specified or not) cardinality.
We conclude in Section 6 presenting a set of open questions and problems.
1.1 Number Representation
In many of our proofs, we use numbers that are expressed in the number system of some base B. We denote by ⟨a1, . . . , an⟩ the number Σ_{1≤i≤n} ai·B^(n−i); we say that ai is the i-th digit of this number. In our proofs, we will choose base B large enough such that even adding up all numbers occurring in the reduction will not lead to carry-digits from one digit to the next. Therefore, we can add numbers digit by digit. The same holds for scalar products. For example, having base B = 27 and numbers α = ⟨3, 5, 1⟩, β = ⟨2, 1, 0⟩, then α + β = ⟨5, 6, 1⟩ and 3·α = ⟨9, 15, 3⟩.
We will generally make liberal use of the notation and allow different bases for each digit. We define the concatenation of two numbers by ⟨a1, . . . , an⟩ ◦ ⟨b1, . . . , bm⟩ := ⟨a1, . . . , an, b1, . . . , bm⟩, i.e., α ◦ β = α·B^m + β, where m is the number of digits in β. We will use ∆n(i) := ⟨0, . . . , 0, 1, 0, . . . , 0⟩ for the number that has n digits, all 0's except for the i-th position where the digit is 1. Furthermore, 1n := ⟨1, . . . , 1⟩ is the number that has n digits, all 1's, and 0n := ⟨0, . . . , 0⟩ has n zeros. Notice that 1n = (B^n − 1)/(B − 1).
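As a sanity check of this digit notation, the following sketch (the helper names `to_int`, `concat`, `delta` are mine) interprets digit vectors in a fixed base and confirms the example values from the text:

```python
def to_int(digits, B):
    """Interpret <a1, ..., an> in base B, most significant digit first."""
    value = 0
    for d in digits:
        value = value * B + d
    return value

def concat(alpha, beta):
    """Concatenation <...> o <...>: digit vectors simply join, which equals
    alpha * B**len(beta) + beta on the integer side."""
    return alpha + beta

def delta(n, i):
    """Delta_n(i): n digits, all 0 except a 1 in position i (1-indexed)."""
    return [1 if j == i else 0 for j in range(1, n + 1)]

B = 27
alpha, beta = [3, 5, 1], [2, 1, 0]
# With B large enough there are no carries, so addition and scalar
# multiplication work digit by digit, exactly as in the text:
assert to_int(alpha, B) + to_int(beta, B) == to_int([5, 6, 1], B)
assert 3 * to_int(alpha, B) == to_int([9, 15, 3], B)
assert to_int(concat(alpha, beta), B) == to_int(alpha, B) * B**3 + to_int(beta, B)
assert to_int(delta(4, 2), B) == B**2
```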
2 A Pseudo-Polynomial Time Algorithm for k Equal Sum Subsets with k = O(1)
We present a dynamic programming algorithm for k Equal Sum Subsets that uses basic ideas of well-known dynamic programming algorithms for Bin Packing with a fixed number of bins [5]. For constant k, this algorithm runs in pseudo-polynomial time.
For an instance A = {a1, . . . , an} of k Equal Sum Subsets, let S = sum(A). We define boolean variables F(i, s1, . . . , sk), where i ∈ {1, . . . , n} and sj ∈ {0, . . . , ⌊S/k⌋} for 1 ≤ j ≤ k. Variable F(i, s1, . . . , sk) will be TRUE if there are k disjoint subsets X1, . . . , Xk ⊆ {a1, . . . , ai} with sum(Xj) = sj, for 1 ≤ j ≤ k.
There are k sets of equal sum if and only if there exists a value s ∈ {1, . . . , ⌊S/k⌋} such that F(n, s, . . . , s) = TRUE.
Clearly, F(1, s1, . . . , sk) is TRUE if and only if either si = 0 for 1 ≤ i ≤ k, or there exists an index j such that sj = a1 and si = 0 for all 1 ≤ i ≤ k, i ≠ j.
For i ∈ {2, . . . , n} and sj ∈ {0, . . . , ⌊S/k⌋}, variable F(i, s1, . . . , sk) can be expressed recursively as follows:
F(i, s1, . . . , sk) = F(i − 1, s1, . . . , sk) ∨ ⋁_{1≤j≤k, sj−ai≥0} F(i − 1, s1, . . . , sj−1, sj − ai, sj+1, . . . , sk).
The value of all variables can be determined in time O(n·S^k/k^(k−1)), since there are n·⌊S/k⌋^k variables, and computing each variable takes at most time O(k). This yields the following theorem.
Theorem 1. There is a dynamic programming algorithm that solves k Equal Sum Subsets for input A = {a1, . . . , an} in time O(n·S^k/k^(k−1)), where S = sum(A). For k = O(1) this algorithm runs in pseudo-polynomial time.
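A compact way to implement the dynamic program is to store only the set of reachable sum-tuples (s1, . . . , sk) instead of the full boolean table; the sketch below (function name mine, not from the paper) is equivalent to computing F and keeps the pseudo-polynomial behaviour for fixed k:

```python
def k_equal_sum_subsets_dp(a, k):
    """Decide k Equal Sum Subsets: is there a positive s with k disjoint
    subsets each summing to s?  reachable holds the tuples (s1,...,sk) for
    which F(i, s1,...,sk) is TRUE after processing a[:i]."""
    reachable = {(0,) * k}                       # F(0, 0, ..., 0) = TRUE
    bound = sum(a) // k                          # sums above floor(S/k) are useless
    for x in a:
        new = set(reachable)
        for sums in reachable:
            for j in range(k):                   # put x into subset j, or skip x
                if sums[j] + x <= bound:
                    new.add(sums[:j] + (sums[j] + x,) + sums[j + 1:])
        reachable = new
    return any(s[0] > 0 and all(v == s[0] for v in s) for s in reachable)

print(k_equal_sum_subsets_dp([1, 2, 3, 4], 2))   # True: e.g. {1, 3} and {4}
print(k_equal_sum_subsets_dp([1, 2, 4, 8], 2))   # False
```

Since there are at most n·⌊S/k⌋^k distinct tuples, this matches the stated O(n·S^k/k^(k−1)) bound up to constant factors.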
3 Strong NP-Completeness of k Equal Sum Subsets with k = Ω(n)
In Section 2 we gave a pseudo-polynomial time algorithm for k Equal Sum Subsets assuming that k is a fixed constant. We will now show that such an algorithm is unlikely to exist if k is a fixed function of the cardinality n of the input set. In particular, we will prove that k Equal Sum Subsets is strongly NP-complete if k = Ω(n).
Let k = n/q for some fixed integer q ≥ 2. We provide a polynomial reduction from 3-Partition, which is defined as follows: Given a multiset of n = 3m numbers P = {p1, . . . , pn} and a number h with h/4 < pi < h/2, for 1 ≤ i ≤ n, are there m pairwise disjoint sets T1, . . . , Tm ⊆ P such that sum(Tj) = h, for 1 ≤ j ≤ m? Observe that in a solution for 3-Partition, there are exactly three elements in each set Tj.
Lemma 1. If k = n/q for some fixed integer q ≥ 2, then 3-Partition can be reduced to k Equal Sum Subsets.
Proof. Let P = {p1, . . . , pn} and h be an instance of 3-Partition. If all elements in P are equal, then there is a trivial solution. Otherwise, let r = 3·(q − 2) + 1 and
ai = ⟨pi⟩ ◦ 0r, for 1 ≤ i ≤ n,
bj = ⟨h⟩ ◦ 0r, for 1 ≤ j ≤ 2n/3,
dk,ℓ = ⟨0⟩ ◦ ∆r(k), for 1 ≤ k ≤ r, 1 ≤ ℓ ≤ n/3.
Here, we use base B = 2nh for all numbers. Let A be the set containing all numbers ai, bj and dk,ℓ. We will use A as an instance of k Equal Sum Subsets. The size of A is n′ = n + 2n/3 + r·(n/3) = n + 2n/3 + (3·(q − 2) + 1)·(n/3) = q·n. We prove that there is a solution for the 3-Partition instance P and h if and only if there are n′/q disjoint subsets of A with equal sum.
“only if”: Let T1, . . . , Tm be a solution for the 3-Partition instance. This induces m subsets of A with sum ⟨h⟩ ◦ 0r, namely Si = {ai | pi ∈ Ti}. Together with the 2n/3 subsets that contain exactly one of the bj's each, we have n = n′/q subsets of equal sum ⟨h⟩ ◦ 0r.
“if”: Assume there is a solution S1, . . . , Sn for the k Equal Sum Subsets instance. Let Sj be any set in this solution. Then sum(Sj) will have a zero in the r rightmost digits, since for each of these digits there are only n/3 numbers in A for which this digit is non-zero (which are not enough to have one of them in each of the n sets Sj). Thus, only numbers ai and bj can occur in the solution; moreover, we only need to consider the first digit of these numbers (as the others are zeros).
′
Since not all numbers ai are equal, and the solution consists of nq = n disjoint
sets, there must be at least one bj in one of the subsets in the solution. Thus,
for all j we have sum(Sj ) ≥ h. On the other hand, the sum of all ai ’s and of all
bj ’s is exactly n · h, therefore sum(Sj ) = h, which means that all ai ’s and all bj ’s
appear in the solution. More specifically, there are 2n
3 sets in the solution such
that each of them contains exactly one of the bj ’s, and each of the remaining n3
sets in the solution consists only of ai ’s, such that the corresponding ai ’s add up
to h. Therefore, the latter sets immediately yield a solution for the 3-Partition
instance.
⊓⊔
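For concreteness, the instance built in this proof can be generated mechanically. The following sketch is illustrative only (the function name and the encoding helper are mine); it encodes each number as its digit vector ⟨first⟩ ◦ ⟨r trailing digits⟩ in base B = 2nh:

```python
def reduction_instance(P, h, q):
    """Map a 3-Partition instance (P, h) to a k Equal Sum Subsets instance
    with k = n/q, following the proof: r = 3(q-2)+1 trailing digits."""
    n = len(P)
    r = 3 * (q - 2) + 1
    B = 2 * n * h

    def num(first, trailing):                    # <first> o <trailing digits>
        value = first
        for d in trailing:
            value = value * B + d
        return value

    A = [num(p, [0] * r) for p in P]                       # a_i = <p_i> o 0_r
    A += [num(h, [0] * r) for _ in range(2 * n // 3)]      # b_j = <h> o 0_r
    A += [num(0, [1 if pos == t else 0 for pos in range(r)])
          for t in range(r) for _ in range(n // 3)]        # d_{t,l} = <0> o Delta_r(t)
    return A

# n = 6, q = 2 gives r = 1 and |A| = n' = q*n = 12, as computed in the proof.
print(len(reduction_instance([2, 2, 2, 2, 2, 2], 6, 2)))   # -> 12
```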
In the previous proof, r is a constant, therefore the numbers ai and bj are polynomial in h and the numbers dk,ℓ are bounded by a constant. Since 3-Partition is strongly NP-complete [5], k Equal Sum Subsets is strongly NP-hard for k = n/q as well. Obviously, k Equal Sum Subsets is in NP even if k = n/q for some fixed integer q ≥ 2, thus we have the following theorem.
Theorem 2. k Equal Sum Subsets is NP-complete in the strong sense for k = n/q, for any fixed integer q ≥ 2.
4 Restriction to Equal Cardinalities
In this section we study the setting where we do not only require the teams to
be of equal strength, but to be of equal cardinality as well. If we are interested
in a specific type of game, e.g. soccer, then the size of the teams is also fixed, say
c = 11, and we have k Equal Sum Subsets of Cardinality c. This problem
is solvable in time polynomial in n by exhaustive search as follows: compute all N = (n choose c) subsets of the input set A that have cardinality c; consider all (N choose k) possible sets of k subsets, and for each one check if it consists of disjoint subsets
of equal sum. This algorithm needs time O(n^(ck)), which is polynomial in n, since c and k are constants.
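A direct transcription of this exhaustive search (with illustrative names of my own; indices are used so that equal values stay distinguishable):

```python
from itertools import combinations

def k_equal_sum_subsets_card(a, k, c):
    """Exhaustive search for k pairwise disjoint c-element subsets of equal
    sum, as in the text: all C(n, c) subsets, then all k-tuples of them."""
    subsets = list(combinations(range(len(a)), c))     # index tuples, |s| = c

    def weight(s):
        return sum(a[i] for i in s)

    for choice in combinations(subsets, k):
        indices = [i for s in choice for i in s]
        # disjoint (no index repeated) and all k sums equal
        if len(set(indices)) == len(indices) and len({weight(s) for s in choice}) == 1:
            return [[a[i] for i in s] for s in choice]
    return None

print(k_equal_sum_subsets_card([1, 2, 3, 4], 2, 2))   # -> [[1, 4], [2, 3]] (both sum to 5)
print(k_equal_sum_subsets_card([1, 2, 4, 8], 2, 2))   # -> None
```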
On the other hand, if the size of the teams is not fixed, but given as part of
the input, then we have k Equal Sum Subsets of Specified Cardinality.
We show that this problem is NP-hard by modifying a reduction used in [3]
to show NP-completeness of k Equal Sum Subsets. The reduction is from
Alternating Partition, which is the following NP-complete [5] variation of
Partition: Given n pairs of numbers (u1, v1), . . . , (un, vn), are there two disjoint
sets of indices I and J with I ∪ J = {1, . . . , n} such that
Σ_{i∈I} ui + Σ_{j∈J} vj = Σ_{i∈I} vi + Σ_{j∈J} uj ?
Lemma 2. Alternating Partition can be reduced to k Equal Sum Subsets of Specified Cardinality for any k ≥ 2.
Proof. We transform a given Alternating Partition instance with pairs
(u1, v1), . . . , (un, vn) into a k Equal Sum Subsets of Specified Cardinality instance as follows: Let S = Σ_{i=1}^{n} (ui + vi). For each pair (ui, vi) we create
two numbers u′i = ui ∆n(i) and v′i = vi ∆n(i). In addition, we create
k − 2 (equal) numbers b1, . . . , bk−2 with bi = (S/2) ∆n(n). Finally, for each bi
we create n − 1 numbers di,j = 0 ∆n(j), for 1 ≤ j ≤ n − 1. While we set the
base of the first digit to k · S, for all other digits it suffices to use base n + 1, in
order to ensure that no carry-digits can occur in any addition in the following
proof. The set A that contains all u′i ’s, v′i ’s, bi ’s, and di,j ’s, together with chosen
cardinality c = n, is our instance of k Equal Sum Subsets of Specified
Cardinality.
Assume first that we are given a solution for the Alternating Partition
instance, i.e., two index sets I and J. We create k equal sum subsets S1, . . . , Sk
as follows: for i = 1, . . . , k − 2, we have Si = {bi, di,1, . . . , di,n−1}; for the
remaining two subsets, we let u′i ∈ Sk−1, if i ∈ I, and v′j ∈ Sk−1, if j ∈ J, and
we let u′j ∈ Sk, if j ∈ J, and v′i ∈ Sk, if i ∈ I. Clearly, all these sets have n
elements, and their sum is the number whose first digit is S/2 and whose n remaining digits all equal 1.
Now assume we are given a solution for the k Equal Sum Subsets of
Specified Cardinality instance, i.e., k equal sum subsets S1, . . . , Sk of cardinality n; in this case, all numbers participate in the sets Si, and the elements
in each Si sum up to the number whose first digit is S/2 and whose n remaining digits all equal 1. Since the first digit of each bi equals S/2, we may
assume w.l.o.g. that for each 1 ≤ i ≤ k − 2, set Si contains bi and does not
contain any number with non-zero first digit (i.e., it does not contain any u′j or
any v′j). Therefore, all u′i ’s and v′i ’s (and only these numbers) are in the remaining two subsets; this yields an alternating partition for the original instance, as
u′i and v′i can never be in the same subset since both have the (i + 1)-th digit
non-zero. ⊓⊔
Since the problem k Equal Sum Subsets of Specified Cardinality is
obviously in NP, we get the following:
Theorem 3. For any k ≥ 2, k Equal Sum Subsets of Specified Cardinality is NP-complete.
Composing Equipotent Teams
105
Remark: Note that the above reduction, hence also the theorem, holds also for
the variation k Equal Sum Subsets of Equal Cardinality. This requires
employing a method where additional extra digits are used in order to force the
equal sum subsets to include all augmented numbers that correspond to numbers
in the Alternating Partition instance; a similar method has been used in [8]
to establish the NP-completeness of Two Equal Sum Subsets (called Equal-Subset-Sum there).
However, these problems are not strongly NP-complete for fixed constant k.
We will now describe how to convert the dynamic programming algorithm of
Section 2 to a dynamic programming algorithm for k Equal Sum Subsets of
Specified Cardinality and for k Equal Sum Subsets of Equal Cardinality.
It suffices to add to our variables k more dimensions corresponding to cardinalities of the subsets. We define boolean variables F (i, s1, . . . , sk, c1, . . . , ck),
where i ∈ {1, . . . , n}, sj ∈ {0, . . . , ⌊S/k⌋} for 1 ≤ j ≤ k, and cj ∈ {0, . . . , ⌊n/k⌋} for
1 ≤ j ≤ k. Variable F (i, s1, . . . , sk, c1, . . . , ck) will be TRUE if there are k disjoint subsets X1, . . . , Xk ⊆ {a1, . . . , ai} with sum(Xj) = sj and the cardinality
of Xj is cj, for 1 ≤ j ≤ k.
There are k subsets of equal sum and equal cardinality c if and only if there
exists a value s ∈ {1, . . . , ⌊S/k⌋} such that F (n, s, . . . , s, c, . . . , c) = TRUE. Also,
there are k subsets of equal sum and equal (non-specified) cardinality if and only
if there exists a value s ∈ {1, . . . , ⌊S/k⌋} and a value d ∈ {1, . . . , ⌊n/k⌋} such that
F (n, s, . . . , s, d, . . . , d) = TRUE.
Clearly, F (1, s1, . . . , sk, c1, . . . , ck) = TRUE if and only if either si = 0, ci =
0 for 1 ≤ i ≤ k, or there exists an index j such that sj = a1, cj = 1, and si = 0 and
ci = 0 for all 1 ≤ i ≤ k, i ≠ j.
Each variable F (i, s1, . . . , sk, c1, . . . , ck), for i ∈ {2, . . . , n}, sj ∈ {0, . . . , ⌊S/k⌋},
and cj ∈ {0, . . . , ⌊n/k⌋}, can be expressed recursively as follows:
F (i, s1, . . . , sk, c1, . . . , ck) = F (i − 1, s1, . . . , sk, c1, . . . , ck) ∨
⋁_{1 ≤ j ≤ k, sj − ai ≥ 0, cj > 0} F (i − 1, s1, . . . , sj − ai, . . . , sk, c1, . . . , cj − 1, . . . , ck).
The boolean value of all variables can be determined in time O(S^k · n^{k+1} / k^{2k−1}),
since there are n · ⌊S/k⌋^k · ⌊n/k⌋^k variables, and computing each variable takes at most
time O(k). This yields the following:
Theorem 4. There is a dynamic programming algorithm that solves k Equal
Sum Subsets of Specified Cardinality and k Equal Sum Subsets of
Equal Cardinality for input A = {a1, . . . , an} in running time O(S^k · n^{k+1} / k^{2k−1}),
where S = sum(A). For k = O(1) this algorithm runs in pseudo-polynomial time.
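For k = 2, the recursion above can be sketched as a reachable-state dynamic program; the state tuple (s1, s2, c1, c2) and all names below are our illustrative choices, and a set of reachable states stands in for the boolean table F:

```python
def two_equal_sum_subsets_of_cardinality(A, c):
    """Pseudo-polynomial DP: states (s1, s2, c1, c2) are sums and cardinalities
    of two disjoint subsets built from a prefix of A."""
    states = {(0, 0, 0, 0)}
    for a in A:  # corresponds to stepping i -> i + 1 in F
        new_states = set(states)  # a joins neither subset
        for s1, s2, c1, c2 in states:
            if c1 < c:
                new_states.add((s1 + a, s2, c1 + 1, c2))  # a joins X1
            if c2 < c:
                new_states.add((s1, s2 + a, c1, c2 + 1))  # a joins X2
        states = new_states
    # success iff some state has equal sums and both cardinalities equal to c
    return any(c1 == c2 == c and s1 == s2 for s1, s2, c1, c2 in states)
```

With positive inputs the number of states is polynomial in n and S, matching the pseudo-polynomial bound of Theorem 4 for constant k.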
5 Adding Exclusion Constraints
In this section we study the problem k Equal Sum Subsets with Exclusions where we are additionally given an exclusion graph (or its complement: a
preference graph) and ask for teams that take this graph into account.
Obviously, k Equal Sum Subsets with Exclusions is NP-complete, since
k Equal Sum Subsets (shown NP-complete in [3]) is the special case where
the exclusion graph is empty (Eex = ∅). Here, we present a pseudo-polynomial
algorithm for the case k = 2, using a dynamic programming approach similar
in spirit to the one used for finding two equal sum subsets (without exclusions) [1].
Let A = {a1 , . . . , an } and Gex = (A, Eex ) be an instance of k Equal Sum
Subsets with Exclusions. We assume w.l.o.g. that the input values are ordered, i.e., a1 ≤ . . . ≤ an. Let S = Σ_{i=1}^{n} ai.
We define boolean variables F (k, t) for k ∈ {1, . . . , n} and t ∈ {1, . . . , S}.
Variable F (k, t) will be TRUE if there exists a set X ⊆ A such that X ⊆
{a1 , . . . , ak }, ak ∈ X, sum(X) = t, and X is independent in Gex . For a TRUE
entry F (k, t) we store the corresponding set in a second variable X(k, t).
We compute the value of all variables F (k, t) by iterating over t and k. The
algorithm runs until it finds the smallest t ∈ {1, . . . , S} for which there are
indices k, ℓ ∈ {1, . . . , n} such that F (k, t) = F (ℓ, t) = TRUE; in this case, sets
X(k, t) and X(ℓ, t) constitute a solution: sum(X(k, t)) = sum(X(ℓ, t)) = t, both
sets are disjoint due to minimality of t, and both sets are independent in Gex .
We initialize the variables as follows. For all 1 ≤ k ≤ n, we set F (k, t) =
FALSE for 1 ≤ t < ak and for Σ_{i=1}^{k} ai < t ≤ S; moreover, we set F (k, ak) =
TRUE and X(k, ak) = {ak}. Observe that these equations already define F (1, t)
for 1 ≤ t ≤ S, and F (k, 1) for 1 ≤ k ≤ n.
After initialization, the table entries for k > 1 and ak ≤ t ≤ Σ_{i=1}^{k} ai can be
computed recursively: F (k, t) is TRUE if there exists an index ℓ ∈ {1, . . . , k − 1}
such that F (ℓ, t − ak ) is TRUE and the subset X(ℓ, t − ak ) remains independent
in Gex when adding ak . The recursive computation is as follows.
F (k, t) = ⋁_{ℓ=1}^{k−1} [ F (ℓ, t − ak) ∧ ∀a ∈ X(ℓ, t − ak) : (a, ak) ∉ Eex ].
If F (k, t) is set to TRUE due to F (ℓ, t − ak ), then we set X(k, t) = X(ℓ, t −
ak ) ∪ {ak }. The key observation for showing correctness is that for each F (k, t)
considered by the algorithm there is at most one F (ℓ, t − ak ) that is TRUE, for
1 ≤ ℓ ≤ k − 1; if there were two, say ℓ1 , ℓ2 , then X(ℓ1 , t − ak ) and X(ℓ2 , t − ak )
would be a solution to the problem and the algorithm would have stopped earlier
– a contradiction. This means that all subsets considered are constructed in a
unique way, and therefore no information can be lost.
In order to determine the value F (k, t), the algorithm considers k − 1 table
entries. As shown above, only one of them may be TRUE; for such an entry, say
F (ℓ, t − ak ), the (at most ℓ) elements of X(ℓ, t − ak ) are checked to see if they
exclude ak. Hence, computation of F (k, t) takes time O(n) and the total time
complexity of the algorithm is O(n² · S). Therefore, we have the following:
Theorem 5. Two Equal Sum Subsets with Exclusions can be solved for
input A = {a1, . . . , an} and Gex = (A, Eex) in pseudo-polynomial time O(n² · S),
where S = sum(A).
Remarks: Observe that the problem k Equal Sum Subsets of Cardinality
c with Exclusions, where the cardinality c is constant and an exclusion graph is
given, can be solved by exhaustive search in time O(n^{kc}) in the same way as the
problem k Equal Sum Subsets of Cardinality c is solved (see Section 4).
Moreover, we can have a pseudo-polynomial time algorithm for k Equal Sum
Subsets of Equal Cardinality with Exclusions, where the cardinality is
part of the input, if k = 2, by modifying the dynamic programming algorithm
for Two Equal Sum Subsets with Exclusions as follows. We introduce a
further dimension in our table F , the cardinality, and set F (k, t, c) to TRUE
if there is a set X with sum(X) = t (and all other conditions as before), and
such that the cardinality of X equals c. Again, we can fill the table recursively,
and we stop as soon as we find values k, ℓ ∈ {1, . . . , n}, t ∈ {1, . . . , S} and
c ∈ {1, . . . , n} such that F (k, t, c) = F (ℓ, t, c) = TRUE, which yields a solution.
Notice that the corresponding two sets must be disjoint, since otherwise removing
their intersection would yield two subsets of smaller equal cardinality that are
independent in Gex; thus, the algorithm, which constructs two sets of minimal
cardinality, would have stopped earlier. Table F now has n² · S entries, thus we
can solve Two Equal Sum Subsets of Equal Cardinality with Exclusions in time O(n³ · S).
Note that the above sketched algorithm does not work for specified cardinalities, because there may be exponentially many ways to construct a subset of
the correct cardinality.
6 Conclusion – Open Problems
In this work we studied the problem k Equal Sum Subsets and some of its
variations. We presented a pseudo-polynomial time algorithm for constant k, and
proved strong NP-completeness for non-constant k, namely for the case in which
we want to find n/q subsets of equal sum, where n is the cardinality of the input
set and q a constant. We also gave pseudo-polynomial time algorithms for the k
Equal Sum Subsets of Specified Cardinality problem and for the Two
Equal Sum Subsets with Exclusions problem, as well as for variations of
them.
Several questions remain open. Some of them are: determine the exact borderline between pseudo-polynomial time solvability and strong NP-completeness
for k Equal Sum Subsets, for k being a function different from n/q, for example
k = log^q n; find faster dynamic programming algorithms for k Equal Sum Subsets of Specified Cardinality; and, finally, determine the complexity of k
Equal Sum Subsets with Exclusions, i.e., is it solvable in pseudo-polynomial
time or is it strongly NP-complete?
Another promising direction is to investigate approximation versions related
to the above problems, for example “given a set of numbers A, find k subsets of
A with sums that are as similar as possible”. For k = 2, the problem has been
studied by Bazgan et al. [1] and Woeginger [8]; an FPTAS was presented in [1].
We would like to find out whether there is an FPTAS for any constant k. Finally,
it would be interesting to study phase transitions of these problems with respect
to their parameters, in a spirit similar to the work of Borgs, Chayes and Pittel
[2], where they analyzed the phase transition of Two Equal Sum Subsets.
Acknowledgments. We would like to thank Peter Widmayer for several fruitful
discussions and ideas in the context of this work.
References
1. C. Bazgan, M. Santha, and Zs. Tuza; Efficient approximation algorithms for the
Subset-Sum Equality problem; Proc. ICALP’98, pp. 387–396.
2. C. Borgs, J.T. Chayes, and B. Pittel; Sharp Threshold and Scaling Window for the
Integer Partitioning Problem; Proc. STOC’01, pp. 330–336.
3. M. Cieliebak, S. Eidenbenz, A. Pagourtzis, and K. Schlude; Equal Sum Subsets:
Complexity of Variations; Technical Report 370, ETH Zürich, Department of Computer Science, 2003.
4. M. Cieliebak, S. Eidenbenz, and P. Penna; Noisy Data Make the Partial Digest
Problem N P -hard; Technical Report 381, ETH Zürich, Department of Computer
Science, 2002.
5. M.R. Garey and D.S. Johnson; Computers and Intractability: A Guide to the Theory
of NP-completeness; Freeman, San Francisco, 1979.
6. R.M. Karp; Reducibility among combinatorial problems; in R.E. Miller and J.W.
Thatcher (eds.), Complexity of Computer Computations, Plenum Press, New York,
pp. 85 – 103, 1972.
7. S. Martello and P. Toth; Knapsack Problems; John Wiley & Sons, Chichester, 1990.
8. G.J. Woeginger and Z.L. Yu; On the equal-subset-sum problem; Information Processing Letters, 42(6), pp. 299–302, 1992.
Efficient Algorithms for GCD and Cubic
Residuosity in the Ring of Eisenstein Integers⋆
Ivan Bjerre Damgård and Gudmund Skovbjerg Frandsen
BRICS⋆⋆
Department of Computer Science
University of Aarhus
Ny Munkegade
DK-8000 Aarhus C, Denmark
{ivan,gudmund}@daimi.au.dk
Abstract. We present simple and efficient algorithms for computing
gcd and cubic residuosity in the ring of Eisenstein integers, Z[ζ], i.e. the
integers extended with ζ, a complex primitive third root of unity. The
algorithms are similar and may be seen as generalisations of the binary
integer gcd and derived Jacobi symbol algorithms. Our algorithms take
time O(n²) for n bit input. This is an improvement over the known
results based on the Euclidean algorithm, which take time O(n · M (n)),
where M (n) denotes the complexity of multiplying n bit integers. The
new algorithms have applications in practical primality tests and the
implementation of cryptographic protocols.
1 Introduction
The Eisenstein integers, Z[ζ] = {a + bζ | a, b ∈ Z}, form the ring of integers extended
with a complex primitive third root of unity, i.e. ζ is a root of x² + x + 1. Since the
ring Z[ζ] is a unique factorisation domain, a greatest common divisor (gcd) of two
numbers is well-defined (up to multiplication by a unit). The gcd of two numbers
may be found using the classic Euclidean algorithm, since Z[ζ] is a Euclidean
domain, i.e. there is a norm N (·) : Z[ζ] \ {0} → N such that for a, b ∈ Z[ζ] \ {0}
there are q, r ∈ Z[ζ] such that a = qb + r with r = 0 or N (r) < N (b).
When a gcd algorithm is directly based on the Euclidean property, it requires
a subroutine for division with remainder. For integers there is a very efficient
alternative in the form of the binary gcd, that only requires addition/subtraction
and division by two [12]. A corresponding Jacobi symbol algorithm has been
analysed as well [11].
It turns out that there are natural generalisations of these binary algorithms
over the integers to algorithms over the Eisenstein integers for computing the
⋆ Partially supported by the IST Programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
⋆⋆ Basic Research in Computer Science, Centre of the Danish National Research Foundation.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 109–117, 2003.
© Springer-Verlag Berlin Heidelberg 2003
gcd and the cubic residuosity symbol. The role of 2 is taken by the number 1 − ζ,
which is a prime of norm 3 in Z[ζ].
We present and analyse these new algorithms. It turns out that they both
have bit complexity O(n²), which is an improvement over the best previously known
algorithms by Scheidler and Williams [8], Williams [16], and Williams and Holte [17].
Their algorithms have complexity O(nM (n)), where M (n) is the complexity of
integer multiplication and the best upper bound on M (n) is O(n log n log log n)
[10].
1.1 Related Work
The asymptotically fastest algorithm for integer gcd takes time O(n log n log log n)
and is due to Schönhage [9]. There is a derived algorithm for the Jacobi symbol
of complexity O(n(log n)² log log n). For practical input sizes the most efficient
algorithms seem to be variants of the binary gcd and derived Jacobi symbol
algorithms [11,7].
If ωn is a complex primitive nth root of unity, say ωn = e^{2πi/n}, then the ring
Z[ωn ] is known to be norm-Euclidean for only finitely many n and the smallest
unresolved case is n = 17 [6,4].
Weilert has generalised both the “binary” and the asymptotically fast gcd
algorithms to Z[ω4] = Z[i], the ring of Gaussian integers [13,14]. In the latter
case Weilert has also described a derived algorithm for computing the quartic
residue symbol [15], and in all cases the complexity is identical to the complexity
of the corresponding algorithm over Z.
Williams [16] and Williams and Holte [17] describe algorithms for computing gcd and cubic residue symbols in Z[ω3], the Eisenstein integers. Scheidler and
Williams describe algorithms for computing gcd and nth power residue symbol
in Z[ωn ] for n = 3, 5, 7 [8]. Their algorithms all have complexity O(nM (n)) for
M (n) being the complexity of integer multiplication.
Weilert suggests that his binary (i.e. (1 + i)-ary) gcd algorithm for the Gaussian integers may generalise to other norm-Euclidean rings of algebraic integers
[13]. Our gcd algorithm for the Eisenstein integers was obtained independently,
but it may nevertheless be seen as a confirmation of this suggestion in a specific
case. It is an open problem whether the “binary” approach to gcd computation
may be further generalised to Z[ω5 ].
Weilert gives an algorithm for the quartic residue symbol that is derived
from the asymptotically fast gcd algorithm over Z[i]. For practical purposes,
however, it would be more interesting to have a version derived from the “binary”
approach. In the last section of this paper, we sketch how one can obtain such
an algorithm.
1.2 Applications
Our algorithms may be used for the efficient computation of cubic residuosity in other rings than Z[ζ] when using an appropriate homomorphism. As an
example, consider the finite field GF (p) for a prime p ≡ 1 mod 3. A number
z ∈ {1, . . . , p − 1} is a cubic residue precisely when z^{(p−1)/3} ≡ 1 mod p, implying that (non)residuosity may be decided by a (slow) modular exponentiation.
However, it is possible to decide cubic residuosity much faster provided we make
some preprocessing depending only on p. The preprocessing consists in factoring p over Z[ζ], i.e. finding a prime π ∈ Z[ζ] such that p = ππ̄. A suitable π
may be found as π = gcd(p, r − ζ), where r ∈ Z is constructed as a solution
to the quadratic equation x² + x + 1 ≡ 0 mod p. Following this preprocessing,
cubic residuosity of any z is decided using that z^{(p−1)/3} ≡ 1 mod p if and only
if [z/π] = 1, where [·/·] denotes the cubic residuosity symbol.
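The slow exponentiation criterion is one line in Python; this sketch (our naming) covers only the baseline test, not the Z[ζ] preprocessing:

```python
def is_cubic_residue(z, p):
    """Decide cubic residuosity in GF(p), for a prime p ≡ 1 (mod 3),
    via the criterion z^((p-1)/3) ≡ 1 (mod p)."""
    assert p % 3 == 1
    return pow(z, (p - 1) // 3, p) == 1
```

For p = 7, for example, this reports exactly {1, 6} as the cubic residues among {1, . . . , 6}.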
When the order of the multiplicative group in question is unknown, modular
exponentiation cannot be used, but it may still be possible to identify some
nonresidues by computing residue symbols. In particular, the primality test of
Damgård and Frandsen [2] uses our algorithms for finding cubic nonresidues in
a more general ring.
Computation of gcd and cubic residuosity is also used for the implementation
of cryptosystems by Scheidler and Williams [8], and by Williams [16].
2 Preliminary Facts about Z[ζ]
Z[ζ] is the ring of integers extended with a primitive third root of unity ζ (a complex root of z² + z + 1). We will be using the following definitions and facts (see e.g. [3]).
Define the two conjugate mappings σi : Z[ζ] → Z[ζ] by σi(ζ) = ζ^i for i = 1, 2.
The rational integer N (α) = σ1(α)σ2(α) ≥ 0 is called the norm of α ∈ Z[ζ], and
N (a + bζ) = a² + b² − ab. (Note that σ2(·) and N (·) coincide with complex
conjugation and the complex norm, respectively.)
A unit in Z[ζ] is an element of norm 1. There are 6 units in Z[ζ]: ±1, ±ζ, ±ζ².
Two elements α, β ∈ Z[ζ] are said to be associates if there exists a unit ǫ such
that α = ǫβ.
A prime π in Z[ζ] is a non-unit such that for any α, β ∈ Z[ζ], if π|αβ, then
π|α or π|β.
1 − ζ is a prime in Z[ζ] and N (1 − ζ) = 3. A primary number has the
form 1 + 3β for some β ∈ Z[ζ]. If α ∈ Z[ζ] is not divisible by 1 − ζ then α is
associated to a primary number. (The definition of primary seems to vary in
that some authors require the alternate forms ±1 + 3β [5] and −1 + 3β [3], but
our definition is more convenient in the present context).
A simple computation reveals that the norm of a primary number has residue
1 modulo 3, and since the norm is a multiplicative homomorphism it follows that
every α ∈ Z[ζ] that is not divisible by 1 − ζ has N (α) ≡ 1 (mod 3).
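These facts are easy to check numerically; the sketch below (our own) identifies a + bζ with the complex number a + b·e^{2πi/3} and verifies the norm formula and the residue of primary norms on a small range:

```python
import cmath

zeta = cmath.exp(2j * cmath.pi / 3)  # complex primitive third root of unity

def norm(a, b):
    """N(a + b*zeta) = a^2 + b^2 - a*b."""
    return a * a + b * b - a * b

for a in range(-6, 7):
    for b in range(-6, 7):
        # the algebraic norm equals the squared complex absolute value
        assert abs(norm(a, b) - abs(a + b * zeta) ** 2) < 1e-9
        # primary numbers 1 + 3(m + n*zeta) have norm ≡ 1 (mod 3)
        assert norm(1 + 3 * a, 3 * b) % 3 == 1
```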
3 Computing GCD in Z[ζ]
It turns out that the well-known binary integer gcd algorithm has a natural
generalisation to a gcd algorithm for the Eisenstein integers. The generalised
algorithm is best understood by relating it to the binary algorithm in a nonstandard version. The authors are not aware of any description of the latter in the
literature (for the standard version see e.g. [1]).
A slightly nonstandard version of the binary gcd is the following. Every
integer can be represented as (−1)^i · 2^j · (4m + 1), where i ∈ {0, 1}, j ≥ 0 and
m ∈ Z. Without loss of generality, we may therefore assume that the numbers
in question are of the form 4m + 1. One iteration consists in replacing the
numerically larger of the two numbers by their difference. If it is nonzero then
the dividing 2-power (at least 2²) may be removed without changing the gcd. If
necessary the resulting odd number is multiplied with −1 to get a number of the
form 4m + 1 and we are ready for the next iteration. It is fairly obvious that the
product of the numeric values of the two numbers decreases by a factor at least
2 in each step until the gcd is found, and hence the gcd of two numbers a, b can
be computed in time O(log² |ab|).
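This nonstandard binary gcd can be sketched directly (our naming; sign and 2-powers are handled exactly as in the text):

```python
def binary_gcd_4m1(a, b):
    """gcd via the (-1)^i * 2^j * (4m + 1) representation of nonzero integers."""
    def strip2(x):  # remove the dividing power of two
        j = 0
        while x % 2 == 0:
            x //= 2
            j += 1
        return x, j

    def to_4m1(x):  # flip the sign (a unit) so that x ≡ 1 (mod 4)
        return x if x % 4 == 1 else -x

    a, ja = strip2(abs(a))
    b, jb = strip2(abs(b))
    a, b = to_4m1(a), to_4m1(b)
    common2 = min(ja, jb)  # the common power of two belongs to the gcd
    while a != b:
        d, _ = strip2(a - b)  # at least 2^2 divides the difference
        d = to_4m1(d)
        if abs(a) > abs(b):   # replace the numerically larger number
            a = d
        else:
            b = d
    return abs(a) << common2
```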
To make the analogue, we recall that any element of Z[ζ] that is not divisible
by 1 − ζ is associated to a (unique) primary number, i.e. a number of the form
1 + 3α. This implies that any element in Z[ζ] \ {0} has a (unique) representation
of the form (−ζ)^i · (1 − ζ)^j · (1 + 3α), where 0 ≤ i < 6, 0 ≤ j and α ∈ Z[ζ]. In
addition, the difference of two primary numbers is divisible by (1 − ζ)², since
3 = −ζ²(1 − ζ)². Now a gcd algorithm for the Eisenstein integers may be formulated
as an analogue to the binary integer gcd algorithm. We may assume without loss
of generality that the two input numbers are primary. Replace the (normwise)
larger of the two numbers with their difference. If it is nonzero, we may divide
out any powers of (1 − ζ) that divide the difference (at least (1 − ζ)²) and
convert the remaining factor to primary form by multiplying with a unit. We
have again two primary numbers and the process may be continued. In each step
we are required to identify the (normwise) larger of two numbers. Unfortunately
it would be too costly to compute the relevant norm, but it suffices to choose the
larger number based on an approximation that we can afford to compute. By a
slightly nontrivial argument one may prove that the product of the norms of the
two numbers decreases by a factor at least 2 in each step until the gcd is found,
and hence the gcd of two numbers α, β can be computed in time O(log² N (αβ)).
Algorithm 1 describes the details, including a start-up to bring the two numbers into primary form.
Theorem 1. Algorithm 1 takes time O(log² N (αβ)) to compute the gcd of α, β,
or, formulated alternatively, the algorithm has bit complexity O(n²).
Proof. Let us assume that a number α = a + bζ ∈ Z[ζ] is represented by the
integer pair (a, b). Observe that since N (α) = a² + b² − ab, we have that log |a| +
log |b| ≤ log N (α) ≤ 2(log |a| + log |b|) for a, b ≠ 0, i.e. the logarithm of the norm
is proportional to the number of bits in the representation of a number.
We may do addition and subtraction on general numbers and multiplication by
units in linear time. Since (1 − ζ)^{−1} = (2 + ζ)/3, division by (and check for
divisibility by) (1 − ζ) may also be done in linear time.
Algorithm 1 Compute gcd in Z[ζ]
Require: α, β ∈ Z[ζ] \ {0}
Ensure: g = gcd(α, β)
1: Let primary γ, δ ∈ Z[ζ] be defined by α = (−ζ)^{i1} · (1 − ζ)^{j1} · γ and β = (−ζ)^{i2} · (1 − ζ)^{j2} · δ.
2: g ← (1 − ζ)^{min{j1, j2}}
3: Replace α, β with γ, δ.
4: while α ≠ β do
5:   LOOP INVARIANT: α, β are primary
6:   Let primary γ be defined by α − β = (−ζ)^i · (1 − ζ)^j · γ
7:   Replace the “approximately” larger of α, β with γ.
8: end while
9: g ← g · α
Clearly, the start-up part of the algorithm that brings the two numbers into
primary form can be done in time O(log² N (αβ)). Hence, we need only worry
about the while loop.
We want to prove that the norm of the numbers decreases in each iteration.
The challenge is to see that forming the number α − β does not increase the norm
too much. In fact N (α − β) ≤ 4 · max{N (α), N (β)}. This follows trivially from the
fact that the norm is non-negative combined with the equation N (α + β) + N (α −
β) = 2(N (α) + N (β)), which may be proven by an elementary computation. Hence,
for the γ computed in the loop of the algorithm, we get N (γ) = 3^{−j} N (α − β) ≤
3^{−2} · 4 · max{N (α), N (β)}. In each iteration, γ ideally replaces the one of α and β
with the larger norm. However, we cannot afford to actually compute the norms
to find out which one is the larger. Fortunately, by Lemma 1, it is possible in
linear time to compute an approximate norm that may be slightly smaller than
the exact norm, namely up to a factor 9/8. When γ replaces the one of α and β
with the larger approximate norm, we know that N (αβ) decreases by a factor
at least 9/4 · 8/9 = 2 in each iteration, i.e. the total number of iterations is
O(log N (αβ)).
Each loop iteration takes time O(log N (αβ)) except possibly for finding the
exponent of (1 − ζ) that divides α − β. Assume that (1 − ζ)^{t_i} is the maximal
power of (1 − ζ) that divides α − β in the i-th iteration. Then the combined
time complexity of all loop iterations is O((Σ_i t_i) · log N (αβ)). We also know
that the norm decreases by a factor at least 3^{t_i − 2} · 2 in the i-th iteration, i.e.
Π_i (3^{t_i − 2} · 2) ≤ N (αβ). Since there are only O(log N (αβ)) iterations it follows
that Π_i 3^{t_i} ≤ (9/2)^{O(log N (αβ))} N (αβ) and hence Σ_i t_i = O(log N (αβ)).
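A direct Python sketch of Algorithm 1 (ours, not the paper's code): pairs (a, b) encode a + bζ, and for brevity exact norms replace the approximate norm Ñ of Lemma 1, which affects only constant factors, not correctness:

```python
def mul(x, y):
    """Multiply in Z[zeta], using zeta^2 = -1 - zeta."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c - b * d)

def norm(x):
    """N(a + b*zeta) = a^2 - a*b + b^2."""
    a, b = x
    return a * a - a * b + b * b

UNITS = [(1, 0), (-1, 0), (0, 1), (0, -1), (-1, -1), (1, 1)]  # ±1, ±ζ, ±ζ²

def divisible_by_lam(x):  # λ = 1 - ζ divides a + bζ iff 3 | a + b
    return (x[0] + x[1]) % 3 == 0

def div_lam(x):  # x / (1 - ζ) = x * (2 + ζ) / 3
    a, b = x
    return ((2 * a - b) // 3, (a + b) // 3)

def strip_lam(x):
    """Remove all factors 1 - ζ from nonzero x; return (reduced x, exponent)."""
    j = 0
    while divisible_by_lam(x):
        x = div_lam(x)
        j += 1
    return x, j

def make_primary(x):
    """Return the unit multiple of x of the form 1 + 3β (requires λ ∤ x)."""
    for u in UNITS:
        a, b = mul(u, x)
        if a % 3 == 1 and b % 3 == 0:
            return (a, b)
    raise ValueError("x is divisible by 1 - zeta")

def eisenstein_gcd(alpha, beta):
    """gcd in Z[zeta], determined up to a unit factor."""
    alpha, j1 = strip_lam(alpha)
    beta, j2 = strip_lam(beta)
    a, b = make_primary(alpha), make_primary(beta)
    while a != b:
        d, _ = strip_lam((a[0] - b[0], a[1] - b[1]))
        d = make_primary(d)
        if norm(a) >= norm(b):  # replace the larger (here: by exact norm)
            a = d
        else:
            b = d
    g = a
    for _ in range(min(j1, j2)):  # restore the common power of 1 - ζ
        g = mul(g, (1, -1))
    return g
```

Since the gcd is only defined up to a unit, results are best compared by norm.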
Lemma 1. Given α = a + bζ ∈ Z[ζ] it is possible to compute an approximate
norm Ñ (α) such that (8/9) · N (α) ≤ Ñ (α) ≤ N (α)
in linear time, i.e. in time O(log N (α)).
Proof. Note that
N (a + bζ) = ((a − b)² + a² + b²) / 2.
Given ǫ > 0, we let d̃ denote some approximation to an integer d satisfying
(1 − ǫ)|d| ≤ d̃ ≤ |d|. Writing c = a − b, note that
(1 − ǫ)² N (a + bζ) ≤ (c̃² + ã² + b̃²) / 2 ≤ N (a + bζ).
Since we may compute a − b in linear time, it suffices to compute ˜-approximations
and square them in linear time for some ǫ < 1/18. Given d in the usual binary
representation, we take d̃ to be |d| with all but the 6 most significant bits replaced
by zeroes, in which case
(1 − 1/32)|d| ≤ d̃ ≤ |d|,
and we can compute d̃² from d in linear time.
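A sketch of the approximation (our naming); keeping the 6 most significant bits is done with shifts:

```python
def approx_abs(d):
    """|d| with all but its 6 most significant bits replaced by zeroes."""
    d = abs(d)
    if d < 64:
        return d
    shift = d.bit_length() - 6
    return (d >> shift) << shift

def approx_norm(a, b):
    """Approximate norm with (8/9)*N(a + b*zeta) <= result <= N(a + b*zeta),
    where N(a + b*zeta) = ((a - b)^2 + a^2 + b^2) / 2."""
    return (approx_abs(a - b) ** 2 + approx_abs(a) ** 2 + approx_abs(b) ** 2) // 2
```

Since each approximated term loses at most a factor (1 − 1/32)², the overall loss stays within the 8/9 bound of the lemma.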
4 Computing Cubic Residuosity in Z[ζ]
Just as the usual integer gcd algorithms may be used for constructing algorithms
for the Jacobi symbol, so can our earlier strategy for computing the gcd in Z[ζ]
be used as the basis for an algorithm for computing the cubic residuosity symbol.
We start by recalling the definition of the cubic residuosity symbol.
[·/·] : Z[ζ] × (Z[ζ] − (1 − ζ)Z[ζ]) → {0, 1, ζ, ζ⁻¹}
is defined as follows:
– For prime π ∈ Z[ζ], where π is not associated to 1 − ζ:
[α/π] = (α^{(N(π)−1)/3}) mod π
– For a number β = Π_{i=1}^{t} π_i^{m_i} ∈ Z[ζ], where β is not divisible by 1 − ζ:
[α/β] = Π_{i=1}^{t} [α/π_i]^{m_i}
Note that these rules imply [α/ǫ] = 1 for a unit ǫ, and [α/β] = 0 when gcd(α, β) ≠
1. In addition, we will need the following laws satisfied by the cubic residuosity
symbol (recall that β is primary when it has the form β = 1 + 3γ for γ ∈ Z[ζ])
[5]:
– Modularity: [α/β] = [α′/β] when α ≡ α′ (mod β).
– Multiplicativity: [αα′/β] = [α/β] · [α′/β].
– The cubic reciprocity law: [α/β] = [β/α] when α and β are both primary.
– The complementary laws (for primary β = 1 + 3(m + nζ), where m, n ∈ Z):
[(1 − ζ)/β] = ζ^m,  [ζ/β] = ζ^{−(m+n)},  [−1/β] = 1.
The cubic residuosity algorithm will follow the gcd algorithm closely. In each
iteration we will assume the two numbers α, β to be primary with Ñ (α) ≥ Ñ (β),
and we write β = 1 + 3(m + nζ) with m, n ∈ Z. We write their difference in the
form α − β = (−ζ)^i (1 − ζ)^j γ, for primary γ. By the above laws, [α/β] =
ζ^{mj−(m+n)i} · [γ/β]. If Ñ (γ) < Ñ (β), we use the reciprocity law to swap γ
and β before being ready for a new iteration.
The algorithm stops when the two primary numbers are identical. If the identical
value (the gcd) is not 1, then the residuosity symbol evaluates to 0.
Algorithm 2 describes the entire procedure including a start-up to ensure
that the numbers are primary.
Algorithm 2 Compute cubic residuosity in Z[ζ]
Require: α, β ∈ Z[ζ] \ {0}, and β is not divisible by (1 − ζ)
Ensure: c = [α/β]
1: Let primary γ, δ ∈ Z[ζ] be defined by α = (−ζ)^{i1} · (1 − ζ)^{j1} · γ and β = (−ζ)^{i2} · δ.
2: Let m, n ∈ Z be defined by δ = 1 + 3m + 3nζ.
3: t ← m · j1 − (m + n) · i1 mod 3
4: Replace α, β by γ, δ.
5: If Ñ (α) < Ñ (β) then interchange α, β.
6: while α ≠ β do
7:   LOOP INVARIANT: α, β are primary and Ñ (α) ≥ Ñ (β)
8:   Let primary γ be defined by α − β = (−ζ)^i · (1 − ζ)^j · γ
9:   Let m, n ∈ Z be defined by β = 1 + 3m + 3nζ.
10:  t ← t + m · j − (m + n) · i mod 3
11:  Replace α with γ.
12:  If Ñ (α) < Ñ (β) then interchange α, β.
13: end while
14: If α ≠ 1 then c ← 0 else c ← ζ^t
Theorem 2. Algorithm 2 takes time O(log² N (αβ)) to compute [α/β], or, formulated alternatively, the algorithm has bit complexity O(n²).
Proof. The complexity analysis from the gcd algorithm carries over without
essential changes.
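Algorithm 2 admits the same kind of Python sketch (all names ours): pairs (a, b) encode a + bζ, exact norms stand in for Ñ, and β is assumed not divisible by 1 − ζ. The function returns the exponent t with [α/β] = ζ^t, or None when the symbol is 0:

```python
def mul(x, y):
    """Multiply in Z[zeta], using zeta^2 = -1 - zeta."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c - b * d)

def norm(x):
    a, b = x
    return a * a - a * b + b * b

MINUS_ZETA = [(1, 0), (0, -1), (-1, -1), (-1, 0), (0, 1), (1, 1)]  # (−ζ)^0..5

def decompose(x):
    """Write nonzero x = (−ζ)^i · (1 − ζ)^j · γ with γ primary (γ ≡ 1 mod 3)."""
    j = 0
    while (x[0] + x[1]) % 3 == 0:  # 1 − ζ divides x iff 3 | a + b
        x = ((2 * x[0] - x[1]) // 3, (x[0] + x[1]) // 3)  # x / (1 − ζ)
        j += 1
    for i in range(6):
        g = mul(MINUS_ZETA[(6 - i) % 6], x)  # (−ζ)^(−i) · x
        if g[0] % 3 == 1 and g[1] % 3 == 0:
            return i, j, g
    raise AssertionError("unreachable for x coprime to 1 - zeta")

def cubic_residuosity(alpha, beta):
    """Exponent t with [alpha/beta] = zeta^t, or None when the symbol is 0."""
    i1, j1, a = decompose(alpha)
    _, _, b = decompose(beta)  # the unit factor of beta does not affect [α/β]
    m, n = (b[0] - 1) // 3, b[1] // 3
    t = (m * j1 - (m + n) * i1) % 3
    if norm(a) < norm(b):
        a, b = b, a  # reciprocity: [a/b] = [b/a] for primary a, b
    while a != b:
        i, j, g = decompose((a[0] - b[0], a[1] - b[1]))
        m, n = (b[0] - 1) // 3, b[1] // 3
        t = (t + m * j - (m + n) * i) % 3
        a = g
        if norm(a) < norm(b):
            a, b = b, a
    return t if a == (1, 0) else None
```

For instance, for the primary prime β = −2 the complementary laws give [ζ/β] = ζ, and the sketch agrees.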
5 Computing GCD and Quartic Residuosity in the Ring of Gaussian Integers
We may construct fast algorithms for gcd and quartic residuosity in the ring of
Gaussian integers, Z[i] = {a+bi | a, b ∈ Z}, in a completely analogous way to the
algorithms over the Eisenstein integers. In the case of the gcd, this was essentially
done by Weilert [13]. However, the case of the quartic residue symbol may be
of independent interest since such an algorithm is likely to be more efficient for
practical input values than the asymptotically ultrafast algorithm [15].
Here is a sketch of the necessary facts (see [5]). There are 4 units in Z[i]:
±1, ±i. 1 + i is a prime in Z[i] and N (1 + i) = 2. A primary number has the
form 1 + (2 + 2i)β for some β ∈ Z[i]. If α ∈ Z[i] is not divisible by 1 + i then α
is associated to a primary number.
In particular, any element in Z[i] \ {0} has a (unique) representation of the
form i^j · (1 + i)^k · (1 + (2 + 2i)α), where 0 ≤ j < 4, 0 ≤ k and α ∈ Z[i]. In
addition, the difference of two primary numbers is divisible by (1 + i)³, since
(2 + 2i) = −i(1 + i)³. This is the basis for obtaining an algorithm for computing
gcd over the Gaussian integers analogous to Algorithm 1. This new algorithm
also has bit complexity O(n²), as one may prove using that N ((1 + i)³) = 8
and N (α − β) ≤ 4 · max{N (α), N (β)}.
For computing quartic residuosity, we need more facts [5]. If π is a prime in
Z[i] and π is not associated to 1 + i, then N (π) ≡ 1 (mod 4), and the quartic
residue symbol [·/·] : Z[i] × (Z[i] − (1 + i)Z[i]) → {0, 1, −1, i, −i} is defined as
follows:
– For prime π ∈ Z[i], where π is not associated to 1 + i:
[α/π] = (α^{(N(π)−1)/4}) mod π
– For a number β = Π_{j=1}^{t} π_j^{m_j} ∈ Z[i], where β is not divisible by 1 + i:
[α/β] = Π_{j=1}^{t} [α/π_j]^{m_j}
The quartic residue symbol satisfies in addition
– Modularity: [α/β] = [α′/β], when α ≡ α′ (mod β).
– Multiplicativity: [αα′/β] = [α/β] · [α′/β].
– The quartic reciprocity law:

  [α/β] = [β/α] · (−1)^(((N(α)−1)/4) · ((N(β)−1)/4)),

when α and β are both primary.
– The complementary laws (for primary β = 1 + (2 + 2i)(m + ni), where m, n ∈ Z):

  [(1 + i)/β] = i^(−n−(n+m)²),   [i/β] = i^(n−m).
This is the basis for obtaining an algorithm for computing quartic residuosity analogous to Algorithm 2. This new algorithm also has bit complexity O(n²).
References
1. Eric Bach and Jeffrey Shallit. Algorithmic number theory. Vol. 1. Foundations of
Computing Series. MIT Press, Cambridge, MA, 1996. Efficient algorithms.
2. Ivan B. Damgård and Gudmund Skovbjerg Frandsen. An extended quadratic
Frobenius primality test with average and worst case error estimates. Research
Series RS-03-9, BRICS, Department of Computer Science, University of Aarhus,
February 2003. Extended abstract in these proceedings.
3. Kenneth Ireland and Michael Rosen. A classical introduction to modern number
theory, Vol. 84 of Graduate Texts in Mathematics. Springer-Verlag, New York,
second edition, 1990.
4. Franz Lemmermeyer. The Euclidean algorithm in algebraic number fields. Exposition. Math. 13(5) (1995), 385–416.
5. Franz Lemmermeyer. Reciprocity laws. Springer Monographs in Mathematics.
Springer-Verlag, Berlin, 2000. From Euler to Eisenstein.
6. Hendrik W. Lenstra, Jr. Euclidean number fields. I. Math. Intelligencer 2(1)
(1979/80), 6–15.
7. Shawna Meyer Eikenberry and Jonathan P. Sorenson. Efficient algorithms for
computing the Jacobi symbol. J. Symbolic Comput. 26(4) (1998), 509–523.
8. Renate Scheidler and Hugh C. Williams. A public-key cryptosystem utilizing cyclotomic fields. Des. Codes Cryptogr. 6(2) (1995), 117–131.
9. A. Schönhage. Schnelle Berechnung von Kettenbruchentwicklungen. Acta Informat. 1 (1971), 139–144.
10. A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing
(Arch. Elektron. Rechnen) 7 (1971), 281–292.
11. Jeffrey Shallit and Jonathan Sorenson. A binary algorithm for the Jacobi symbol.
ACM SIGSAM Bull. 27(1) (1993), 4–11.
12. J. Stein. Computational problems associated with Racah algebra. J. Comput. Phys. 1 (1967), 397–405.
13. André Weilert. (1 + i)-ary GCD computation in Z[i] is an analogue to the binary
GCD algorithm. J. Symbolic Comput. 30(5) (2000), 605–617.
14. André Weilert. Asymptotically fast GCD computation in Z[i]. In Algorithmic
number theory (Leiden, 2000), Vol. 1838 of Lecture Notes in Comput. Sci., pp.
595–613. Springer, Berlin, 2000.
15. André Weilert. Fast computation of the biquadratic residue symbol. J. Number
Theory 96(1) (2002), 133–151.
16. H. C. Williams. An M³ public-key encryption scheme. In Advances in cryptology—CRYPTO ’85 (Santa Barbara, Calif., 1985), Vol. 218 of Lecture Notes in Comput. Sci., pp. 358–368. Springer, Berlin, 1986.
17. H. C. Williams and R. Holte. Computation of the solution of x³ + Dy³ = 1. Math. Comp. 31(139) (1977), 778–785.
An Extended Quadratic Frobenius Primality Test with Average and Worst Case Error Estimates⋆,⋆⋆

Ivan Bjerre Damgård and Gudmund Skovbjerg Frandsen

BRICS⋆⋆⋆, Department of Computer Science, University of Aarhus
{ivan,gudmund}@daimi.au.dk
Abstract. We present an Extended Quadratic Frobenius Primality Test (EQFT), which is related to and extends the Miller-Rabin test and the Quadratic Frobenius test (QFT) by Grantham. EQFT takes time about equivalent to 2 Miller-Rabin tests, but has much smaller error probability, namely 256/331776^t for t iterations of the test in the worst case. We give bounds on the average-case behaviour of the test: consider the algorithm that repeatedly chooses random odd k-bit numbers, subjects them to t iterations of our test and outputs the first one found that passes all tests. We obtain numeric upper bounds for the error probability of this algorithm as well as a general closed expression bounding the error. For instance, it is at most 2^−143 for k = 500, t = 2. Compared to earlier similar results for the Miller-Rabin test, the results indicate that our test in the average case has the effect of 9 Miller-Rabin tests, while only taking time equivalent to about 2 such tests. We also give bounds for the error in case a prime is sought by incremental search from a random starting point.
1 Introduction
Efficient methods for primality testing are important, in theory as well as in practice. Tests that always return correct results exist, see for instance [1], but all known tests of this type are only of theoretical interest because they are much too inefficient to be useful in practice. In contrast, tests that accept composite numbers with bounded probability are typically much more efficient. This paper presents and analyses one such test. Primality tests are used, for instance, in public-key cryptography, where efficient methods for generating large, random primes are indispensable tools. Here, it is important to know how the test behaves in the average case. But there are also scenarios (e.g., in connection with Diffie-Hellman key exchange) where one needs to test if a number n is prime and where
⋆ Partially supported by the IST Programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
⋆⋆ Full paper is available at http://www.brics.dk/RS/03/9/index.html
⋆⋆⋆ Basic Research in Computer Science, Centre of the Danish National Research Foundation.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 118–131, 2003.
© Springer-Verlag Berlin Heidelberg 2003
n may have been chosen by an adversary. Here the worst case performance of
the test is important.
Virtually all known probabilistic tests are built on the same basic principle: from the input number n, one defines an Abelian group and then tests if the group structure we expect to see if n is prime is actually present. The well-known Miller-Rabin test uses the group Z∗n in exactly this way. A natural alternative is to try a quadratic extension of Zn, that is, we look at the ring Zn[x]/(f(x)) where f(x) is a degree 2 polynomial chosen such that it is guaranteed to be irreducible if n is prime. In that case the ring is isomorphic to the finite field with n² elements, GF(n²). This approach was used successfully by Grantham [6], who proposed the Quadratic Frobenius Test (QFT), and showed that it accepts a composite with probability at most 1/7710, i.e. a better bound than may be achieved using 6 independent Miller-Rabin tests, while asymptotically taking time approximately equivalent to only 3 such tests. Müller proposes a different approach based on computation of square roots, the MQFT [7,8], which takes the same time as QFT and has error probability essentially¹ 1/131040. Just as for the Miller-Rabin test, however, it seems that most composites would be accepted with probability much smaller than the worst-case numbers. A precise result quantifying this intuition would allow us to give better results on the average case behaviour of the test, i.e., when it is used to test numbers chosen at random, say, from some interval. Such an analysis has been done by Damgård, Landrock and Pomerance for the Miller-Rabin test, but no corresponding result for QFT or MQFT is known.
In this paper, we propose a new test that can be seen as an extension of QFT. We call this the Extended Quadratic Frobenius test (EQFT). EQFT comes in two variants: EQFTac, which works well in an average case analysis, and EQFTwc, which is better for applications where the worst case behavior is important.
For the average case analysis: consider an algorithm that repeatedly chooses random odd k-bit numbers, subjects each number to t iterations of EQFTac, and outputs the first number found that passes all t tests. Under the ERH, each iteration takes expected time equivalent to about 2 Miller-Rabin tests, or 2/3 of the time for QFT/MQFT (the ERH is only used to bound the run time and does not affect the error probability). Let qk,t be the probability that a
composite is output. We derive numeric upper bounds for qk,t, e.g., we show q500,2 ≤ 2^−143, and also show a general upper bound, namely for 2 ≤ t ≤ k − 1, qk,t is O(k^{3/2} 2^{(σt+1)t} t^{−1/2} 4^{−√(2σt tk)}) with an easily computable big-O constant, where σt = log₂ 24 − 2/t. Comparison to the similar analysis by Damgård et al.
for the MR test indicates that for t ≥ 2, our test in the average case roughly
speaking has the effect of 9 Miller-Rabin tests, while only taking time equivalent
to 2 such tests. We also analyze the error probability when a random k-bit prime
is instead generated using incremental search from a random starting point, still
using (up to) t iterations of our test to distinguish primes from composites.
¹ The test and analysis results are a bit different, depending on whether the input is 3 or 1 modulo 4; see [7,8] for details.
Concerning worst case analysis, we show that t iterations of EQFTwc err with probability at most 256/331776^t except for an explicit finite set of numbers². The same worst case error probability can be shown for EQFTac, but this variant is up to 4 times slower on worst case inputs than in the average case, namely on numbers n where very large powers of 2 and 3 divide n² − 1. For EQFTwc, on the other hand, t iterations take time equivalent to about 2t + 2 MR tests on all inputs (still assuming ERH). For comparison with QFT/MQFT, assume that we are willing to spend the same fixed amount of time testing an input number. Then EQFTwc gives asymptotically a better bound on the error probability: using time approximately corresponding to 6t Miller-Rabin tests, we get error probability 1/7710^{2t} ≈ 1/19.8^{6t} using QFT, 1/131040^{2t} ≈ 1/50.8^{6t} using MQFT, and 256/331776^{3t−1} ≈ 1/576^{6t} using EQFTwc.
2 The Intuition behind EQFT

2.1 A Simple Initial Idea
Given the number n to be tested, we start by constructing a quadratic extension Zn[X]/(f(X)), which is kept fixed during the entire test (across all iterations). We let H be the multiplicative group in this extension ring. If n is prime, the quadratic extension is a field, and so H is cyclic of order n² − 1. We may of course assume that n is not divisible by 2 or 3, which implies that n² − 1 is always divisible by 24. Let H24 be the subgroup of elements of order dividing 24. If H is cyclic, then clearly |H24| = 24. On the other hand, if n is not prime, H is the direct product of a number of subgroups, one for each distinct prime factor in n, and we may have |H24| ≫ 24.
Now, suppose we are already given an element r ∈ H of order 24. Then a very simple approach to a primality test could be the following: Choose a random element z in H, and verify that z^n = z̄, where z̄ refers to the standard conjugate (explained later). This implies z^{n²−1} = 1 for any invertible z and so is similar to the classical Fermat test. It is, however, in general a much stronger test than just checking the order of z. Then, from z construct an element z′ chosen from H24 with some "suitable" distribution. For this intuitive explanation, just think of z′ as being uniform in H24. Now check that z′ ∈ ⟨r⟩, i.e. is a power of r. This must be the case if n is prime, but may fail if n is composite. This is similar to the part of the MR test that checks for existence of elements of order 2 different from −1.
To estimate the error probability, let ω be the number of distinct prime factors in n. Since H is the direct product of ω subgroups, H24 is typically of order 24^ω.³ As one might then expect, it can be shown that the error probability of the test is at most 24/24^ω times the probability that z^n = z̄. The factor 24^{1−ω} corresponds to the factor of 2^{1−ω} one obtains for the MR test.
² namely if n has no prime factors less than 118, or if n ≥ 2^42
³ it may be smaller, but then the Fermat-like part of the test is stronger than otherwise, so we only consider the maximal case in this section
2.2 Some Problems and Two Ways to Solve Them
It is not clear how to construct an element of order 24 (if it exists at all), and we have not specified how to construct z′ from z. We present two different approaches to these problems.
EQFTwc. In this approach, we run a start-up procedure that may discover that n is composite. But if not, it constructs an element of order 24 and also guarantees that H contains ω distinct subgroups, each of order divisible by 2^u 3^v, where 2^u, 3^v are the maximal 2- and 3-powers dividing n² − 1. This procedure runs in expected time O(1) Miller-Rabin tests. Details on the idea behind it are given in Section 5. Having run the start-up procedure, we construct z′ as z′ = z^{(n²−1)/24}. Note that without the condition on the subgroups of H, we could have z′ = 1 always, which would clearly be bad. Each z can be tested in time approximately 2 MR tests, for any n. This leads to the test we call EQFTwc (since it works well in a worst case analysis).
EQFTac. The other approach avoids spending time on the start-up. This comes at the cost that the test becomes slower on n's where u, v are very large. But this only affects a small fraction of the potential inputs and is not important when testing randomly chosen n, since then the expected values of u, v are constant.
The basic idea is the following: we start choosing random z's immediately, and instead of trying to produce an element in H24 from z, we look separately for an element of order dividing 3 and one of order dividing 8. For order 3, we compute z^{(n²−1)/3^v} and repeatedly cube this value at most v times. This is guaranteed to produce an element of order 3, if 3 divides the order of z. If we already know an element ξ3 of order 3, we can check that the new element we produce is in the group generated by ξ3, and if not, n is composite. Of course, we do not know an element of order 3 from the start, but note that the computations we do on each z may produce such an element. So if we do several iterations of the test, as soon as an iteration produces an element of order 3, this can be used as ξ3 by subsequent iterations. A similar idea can be applied to elements of order 8.
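The repeated-cubing step can be illustrated in a simple cyclic group. The following Python sketch (ours, purely illustrative) uses Z∗p for a prime p as a stand-in for the group H and extracts an element of order 3 when one exists:

```python
def element_of_order_3(z, p):
    """Stand-in illustration in the cyclic group Z_p* (p prime): write
    p - 1 = 3^v * w with (w, 3) = 1, raise z to the w'th power, and cube
    repeatedly (at most v times).  Returns an element of order 3, or None
    if 3 does not divide the order of z."""
    v, w = 0, p - 1
    while w % 3 == 0:
        v, w = v + 1, w // 3
    y = pow(z, w, p)                 # order of y divides 3^v
    prev = None
    for _ in range(v + 1):
        if y == 1:
            return prev              # last value before 1 has order exactly 3
        prev = y
        y = pow(y, 3, p)
    return None
```

For p = 7 (so p − 1 = 2 · 3) and z = 3, the sketch returns 2, which indeed satisfies 2³ ≡ 1 (mod 7). In the test itself the same idea runs in H with exponent (n² − 1)/3^v, and the element found serves as ξ3 in later iterations.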
This leads to a test of strength comparable to EQFTwc, except for one problem: the iterations we do before finding elements of the right order may have larger error probability than the others. This can be compensated for by a number of further tricks: rather than choosing z uniformly, we require that N(z) has Jacobi symbol 1, where N() is a fixed homomorphism from H to Z∗n defined below. This means we can expect z to have order a factor 2 smaller than otherwise⁴, and this turns out to improve the error probability of the Fermat-like part of the test by a factor of 2^{1−ω}. Moreover, some partial testing of the elements we produce is always possible: for instance, we know n is composite if we see an element of order 2 different from −1. These tricks imply that the test, up to
⁴ This also means that we should look for an element ξ4 of order 4 (and not 8) in the part of the test that produces elements of order a 2-power.
a small constant factor on the error probability, is as good as if we had known
ξ3 , ξ4 from the start. This version of the test is called EQFTac (since it works
well in an average case analysis). We show that it satisfies the same upper bound
on the error probability as we have for EQFTwc.
2.3 Comparison to Other Tests
We give some comments on the similarities and differences between EQFT and Grantham's QFT. In QFT the quadratic extension, that is, the polynomial f(x), is randomly chosen, whereas the element corresponding to our z is chosen deterministically, given f(x). This seems to simplify the error analysis for EQFT. Other than that, the Fermat part of QFT is transplanted almost directly to EQFT. For the test for roots of 1, QFT does something directly corresponding to the square root of 1 test from Miller-Rabin, but does nothing relating to elements of higher order. In fact, several of our ideas cannot be directly applied to QFT since there, f(x) changes between iterations. As for the running time, since our error analysis works for any (i.e. a worst case) quadratic extension, we can pick one that has a particularly fast implementation of arithmetic, and this is the basis for the earlier mentioned difference in running time between EQFT and QFT.
A final comment relates to the comparison in running times between Miller-Rabin, Grantham's and our test. Using the standard way to state running times in the literature, the Miller-Rabin, resp. Grantham's, resp. our test run in time log n + o(log n), resp. 3 log n + o(log n), resp. 2 log n + o(log n) multiplications in Zn.
However, the running time of Miller-Rabin is actually log n squarings + o(log n) multiplications in Zn, while the 3 log n (2 log n) multiplications mentioned for the other tests are a mix of squarings and multiplications. So we should also compare the times for modular multiplications and squarings. On a standard, say, 32 bit architecture, a modular multiplication takes time about 1.25 times that of a modular squaring if the numbers involved are very large. However, if we use the fastest known modular multiplication method (which is Montgomery's in this case, where n stays constant over many multiplications), the factor is smaller for numbers in the range of practical interest. Concrete measurements using highly optimized C code show that it is between 1 and 1.08 for numbers of length 500-1000 bits. Finally, when using dedicated hardware the factor is exactly 1 in most cases. So we conclude that the comparisons we stated are quite accurate also for practical purposes.
2.4 The Ring R(n, c) and EQFTac
Definition 1. Let n be an odd integer and let c be a unit modulo n. Let R(n, c) denote the ring Z[x]/(n, x² − c).
More concretely, an element z ∈ R(n, c) can be thought of as a degree 1 polynomial z = ax + b, where a, b ∈ Zn, and arithmetic on polynomials is modulo x² − c, where coefficients are computed modulo n.
Let p be an odd prime. If c is not a square modulo p, i.e. (c/p) = −1, then the polynomial x² − c is irreducible modulo p and R(p, c) is isomorphic to GF(p²).
Definition 2. Define the following multiplicative homomorphisms on R(n, c) (assume z = ax + b):

 ¯· : R(n, c) → R(n, c),   z̄ = −ax + b   (1)
 N(·) : R(n, c) → Zn,   N(z) = z · z̄ = b² − ca²   (2)

and define the map (·/·) : Z × Z → {−1, 0, 1} to be the Jacobi symbol.
The maps ¯· and N(·) are both multiplicative homomorphisms whether n is composite or n is a prime. The primality test will be based on some additional properties that are satisfied when p is a prime and (c/p) = −1, in which case R(p, c) ≃ GF(p²):
Frobenius property / generalised Fermat property: Conjugation, z → z̄, is a field automorphism on GF(p²). In characteristic p, the Frobenius map that raises to the p'th power is also an automorphism; using this it follows easily that

 z̄ = z^p   (3)
Quadratic residue property / generalised Solovay-Strassen property: The norm, z → N(z), is a surjective multiplicative homomorphism from GF(p²) to the subfield GF(p). As such the norm maps squares to squares and non-squares to non-squares; it follows from the definition of the norm and (3) that

 z^{(p²−1)/2} = N(z)^{(p−1)/2} = (N(z)/p)   (4)
4'th-root-of-1-test / generalised Miller-Rabin property: Since GF(p²) is a field, there are only four possible 4th roots of 1, namely 1, −1 and ξ4, −ξ4, the two roots of the cyclotomic polynomial Φ4(x) = x² + 1. In particular, this implies for p² − 1 = 2^u 3^v q where (q, 6) = 1 that if z ∈ GF(p²) \ {0} is a square then

 z^{3^v q} = ±1, or z^{2^i 3^v q} = ±ξ4 for some i = 0, …, u − 3   (5)
3'rd-root-of-1-test: Since GF(p²) is a field, there are only three possible 3rd roots of 1, namely 1 and ξ3, ξ3^{−1}, the two roots of the cyclotomic polynomial Φ3(x) = x² + x + 1. In particular, this implies for p² − 1 = 2^u 3^v q where (q, 6) = 1 that if z ∈ GF(p²) \ {0} then

 z^{2^u q} = 1, or z^{2^u 3^i q} = ξ3^{±1} for some i = 0, …, v − 1   (6)
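Properties (3) and (4) can be checked numerically for a small prime. The following self-contained Python sketch (illustrative only; it uses naive ring arithmetic rather than the 3-multiplication formulas of Lemma 1, and the function names are ours) verifies them in R(13, 2) ≅ GF(169):

```python
def mul_gf(z, w, p, c):
    """Product in R(p, c) = Z_p[x]/(x^2 - c); elements are (a, b) for ax + b."""
    (a, b), (d, e) = z, w
    return ((a * e + b * d) % p, (a * d * c + b * e) % p)

def pow_gf(z, k, p, c):
    """Square-and-multiply exponentiation in R(p, c)."""
    r = (0, 1)                       # the unit element 1
    while k:
        if k & 1:
            r = mul_gf(r, z, p, c)
        z = mul_gf(z, z, p, c)
        k >>= 1
    return r

def conj_gf(z, p):
    """Conjugation ax + b -> -ax + b, definition (1)."""
    return ((-z[0]) % p, z[1])

def norm_gf(z, p, c):
    """Norm N(z) = b^2 - c*a^2, definition (2)."""
    return (z[1] * z[1] - c * z[0] * z[0]) % p

# p = 13 is prime and (2/13) = -1, so R(13, 2) is the field GF(169);
# properties (3) and (4) must then hold for every nonzero z.
p, c = 13, 2
for z in [(1, 0), (3, 4), (7, 11)]:
    assert pow_gf(z, p, p, c) == conj_gf(z, p)                       # (3)
    lhs = pow_gf(z, (p * p - 1) // 2, p, c)                          # (4)
    assert lhs == (0, pow(norm_gf(z, p, c), (p - 1) // 2, p))
```

For a composite n these identities typically fail for most z, which is exactly what Algorithm 1 below exploits.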
The actual test will have two parts (see Algorithm 1). In the first part, a specific quadratic extension is chosen, i.e. R(n, c) for an explicit c. In the second part, the above properties of R(n, c) are tested for a random choice of z. When EQFTac is run several times on the same n, only the second part is executed multiple times. The second part receives two extra inputs, a 3rd and a 4th root of 1. On the first execution of the second part these are both 1. During later
Algorithm 1 Extended Quadratic Frobenius Test (EQFTac).
First part (construct quadratic extension):
Require: input is odd number n ≥ 13
Ensure: output is "composite" or c satisfying (c/n) = −1
1: if n is divisible by a prime less than 13 return "composite"
2: if n is a perfect square return "composite"
3: choose a small c with (c/n) = −1; return c
Second part (make actual test):
Require: input is n, c, r3, r4, where n ≥ 5 not divisible by 2 or 3, (c/n) = −1, r3 ∈ {1} ∪ {ξ ∈ R(n, c) | Φ3(ξ) = 0} and r4 ∈ {1, −1} ∪ {ξ ∈ R(n, c) | Φ4(ξ) = 0}. Let u, v be defined by n² − 1 = 2^u 3^v q for (q, 6) = 1.
Ensure: output is "composite", or "probable prime", s3, s4, where s3 ∈ {1} ∪ {ξ ∈ R(n, c) | Φ3(ξ) = 0} and s4 ∈ {1, −1} ∪ {ξ ∈ R(n, c) | Φ4(ξ) = 0}
4: select random z ∈ R(n, c)∗ with (N(z)/n) = 1
5: if z̄ ≠ z^n or z^{(n²−1)/2} ≠ 1 return "composite"
6: if z^{3^v q} ≠ 1 and z^{2^i 3^v q} ≠ −1 for all i = 0, …, u − 2 return "composite"
7: if we found i0 ≥ 1 with z^{2^{i0} 3^v q} = −1 (there can be at most one such value) then let R4(z) = z^{2^{i0−1} 3^v q}. Else let R4(z) = z^{3^v q} (= ±1);
   if (r4 ≠ ±1 and R4(z) ∉ {±1, ±r4}) return "composite"
8: if z^{2^u q} ≠ 1 and Φ3(z^{2^u 3^i q}) ≠ 0 for all i = 0, …, v − 1 return "composite"
9: if we found i0 ≥ 0 with Φ3(z^{2^u 3^{i0} q}) = 0 (there can be at most one such value) then let R3(z) = z^{2^u 3^{i0} q} else let R3(z) = 1;
   if (r3 ≠ 1 and R3(z) ∉ {1, r3^{±1}}) return "composite"
10: if r3 = 1 and R3(z) ≠ 1 then let s3 = R3(z) else let s3 = r3;
    if r4 = ±1 and R4(z) ≠ ±1 then let s4 = R4(z) else let s4 = r4;
    return "probable prime", s3, s4
executions of the second part some nontrivial roots are possibly constructed. If so, they are transferred to all subsequent executions of the second part.
Here follow some more detailed comments on Algorithm 1:
Line 1 ensures that 24 | n² − 1. In addition, we will use that n has no small prime factors in the later error analysis.
Line 2 of the algorithm is necessary, since no c with (c/n) = −1 exists when n is a perfect square.
Line 3 of the algorithm ensures that R(n, c) ≃ GF(n²) when n is a prime. Lemma 2 defines more precisely what "small" means.
Line 4 makes sure that z is a square, when n is a prime.
Line 5 checks equations (3) and (4), the latter in accordance with the condition enforced in line 4.
Line 6 checks equation (5) to the extent possible without having knowledge of ξ4, a primitive 4th root of 1.
Line 7f continues the check of equation (5) by using any ξ4 given on the input.
Line 8 checks equation (6) to the extent possible without having knowledge of ξ3, a primitive 3rd root of 1.
Line 9f continues the check of equation (6) by using any ξ3 given on the input.
2.5 Implementation of the Test
High powers of elements in R(n, c) may be computed efficiently when c is (numerically) small. Represent z ∈ R(n, c) in the natural way by (Az, Bz) ∈ Zn × Zn, i.e. z = Az x + Bz.
Lemma 1. Let z, w ∈ R(n, c):
1. z · w may be computed from z and w using 3 multiplications and O(log c) additions in Zn.
2. z² may be computed from z using 2 multiplications and O(log c) additions in Zn.
Proof. For 1, we use the equations Azw = m1 + m2 and Bzw = (cAz + Bz)(Aw + Bw) − (cm1 + m2) with m1 = Az Bw and m2 = Bz Aw. For 2, we need only observe that in the proof of 1, z = w implies that m1 = m2.
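The formulas in the proof translate directly into code. The following Python sketch (function names are ours) realizes the 3-multiplication product and the 2-multiplication square, where multiplications by the small constant c are the ones counted as cheap:

```python
def mul_R(z, w, n, c):
    """z * w in R(n, c) via Lemma 1: three multiplications in Z_n
    (products with the small constant c are counted as additions)."""
    (Az, Bz), (Aw, Bw) = z, w
    m1 = Az * Bw % n
    m2 = Bz * Aw % n
    A = (m1 + m2) % n
    B = ((c * Az + Bz) * (Aw + Bw) - (c * m1 + m2)) % n
    return (A, B)

def sqr_R(z, n, c):
    """z^2 via Lemma 1: two multiplications, since m1 = m2 when z = w."""
    Az, Bz = z
    m1 = Az * Bz % n
    A = 2 * m1 % n
    B = ((c * Az + Bz) * (Az + Bz) - (c + 1) * m1) % n
    return (A, B)
```

For instance, with n = 13 and c = 2 one gets mul_R((3, 4), (5, 6), 13, 2) = (12, 2), matching the naive product (Az Bw + Bz Aw, cAz Aw + Bz Bw) mod n.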
We also need to argue that it is easy to find a small c with (c/n) = −1. One may note that if n ≡ 3 (mod 4), then c = −1 can always be used, and if n ≡ 5 (mod 8), then c = 2 will work. In general, we have the following:
Lemma 2. Let n be an odd composite number that is not a perfect square. Let π−(x, n) denote the number of primes p ≤ x such that (p/n) = −1, and, as usual, let π(x) denote the total number of primes p ≤ x. Assuming the Extended Riemann Hypothesis (ERH), there exists a constant C (independent of n) such that

 π−(x, n)/π(x) > 1/3   for all x ≥ C(log n log log n)²

Proof. We refer to the full paper for the proof, which is based on [2, th.8.4.6].
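Line 3 of Algorithm 1 can thus be realized by scanning small candidates. A Python sketch follows (our illustration; the Jacobi-symbol routine is the textbook binary method, e.g. as in [1], and the odd-candidate scan order and cut-off are our choices):

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, by the textbook binary method."""
    a %= n
    t = 1
    while a:
        while a % 2 == 0:            # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                t = -t
        a, n = n, a                  # quadratic reciprocity for odd a, n
        if a % 4 == 3 and n % 4 == 3:
            t = -t
        a %= n
    return t if n == 1 else 0

def find_c(n, bound=10_000):
    """Line 3 sketch: a small c with (c/n) = -1.  Assumes n odd and not a
    perfect square (line 2 rules squares out); the shortcuts for
    n ≡ 3 (mod 4) and n ≡ 5 (mod 8) are the ones noted in the text."""
    if n % 4 == 3:
        return -1
    if n % 8 == 5:
        return 2
    for c in range(3, bound, 2):
        if jacobi(c, n) == -1:
            return c
    raise ValueError("no small non-residue found below the bound")
```

Lemma 2 guarantees (under the ERH) that a candidate of size O((log n log log n)²) exists, so the scan terminates quickly; the bound parameter is only our safety valve.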
Theorem 1. Let n be a number that is not divisible by 2 or 3, and let u ≥ 3 and v ≥ 1 be maximal such that n² − 1 = 2^u 3^v q. There is an implementation of Algorithm 1 that on input n takes expected time equivalent to 2 log n + O(u + v) + o(log n) multiplications in Zn, when assuming the ERH.
Remark 1. We can only prove a bound on the expected time, due to the random selection of an element z (in line 4) having a property that is only satisfied by half the elements, and to the selection of a suitable c (line 3), where at least a third of the candidates are usable. Although there is in principle no bound on the maximal time needed, the variance around the expectation is small because the probability of failing to find a useful z and c drops exponentially with the number of attempts. We emphasize that the ERH is only used to bound the running time (of line 3) and does not affect the error probability, as is the case with the original Miller test.
The detailed implementation of Algorithm 1 may be optimized in various ways. The implementation given in the proof that follows this remark has focused on simplicity more than saving a few multiplications. However, we are not aware of any implementation that avoids the O(u + v) term in the complexity analysis.
Proof. We will first argue that only lines 5-9 in the algorithm have any significance in the complexity analysis.
line 2. By Newton iteration the square root of n may be computed using O(log log n) multiplications.
line 3. By Lemma 2, we expect to find a c of size O((log n log log n)²) such that (c/n) = −1 after three attempts (or discover that n is composite).
line 4. z is selected randomly from R(n, c) \ {0}. We expect to find z with (N(z)/n) = 1 after two attempts (or discover that n is composite).
line 5-9. Here we need to explain how it is possible to simultaneously verify that z̄ = z^n, and do both a 4'th-root-of-1-test and a 3'rd-root-of-1-test without using too many multiplications. We refer to Lemma 1 for the implementation of arithmetic in R(n, c).
Define s, r by n = 2^u 3^v s + r for 0 < r < 2^u 3^v. A simple calculation confirms that

 q = ns + rs + (r² − 1)/(2^u 3^v),   (7)

where the last fraction is integral. Go through the following computational steps using the z selected in line 4 of the algorithm:
1. compute z^s. This uses 2 log n + o(log n) multiplications in Zn.
2. compute z^n. Starting from step 1 this requires O(v + u) multiplications in Zn.
3. verify z^n = z̄.
4. compute z^q. One may compute z^q from step 1 using O(v + u) multiplications in Zn, when using (7) and the shortcut z^{ns} = z̄^s, where the shortcut is implied by step 3 and exponentiation and conjugation being commuting maps.
5. compute z^{3^v q}, z^{2·3^v q}, z^{2²·3^v q}, …, z^{2^{u−2}·3^v q}. Starting from step 4 this requires O(v + u) multiplications in Zn.
6. verify that z^{3^v q} = 1 or z^{2^i 3^v q} = −1 for some 0 ≤ i ≤ u − 2. If there is i0 ≥ 1 with z^{2^{i0} 3^v q} = −1 and if ξ4 is present, verify that z^{2^{i0−1} 3^v q} = ±ξ4.
7. compute z^{2^u q}, z^{2^u·3 q}, z^{2^u·3² q}, …, z^{2^u·3^{v−1} q}. Starting from step 4 this requires O(v + u) multiplications in Zn.
8. By step 6 there must be an i (0 ≤ i ≤ v) such that z^{2^u 3^i q} = 1. Let i0 be the smallest such i. If i0 ≥ 1 verify that z^{2^u 3^{i0−1} q} is a root of x² + x + 1. If ξ3 is present, verify in addition that z^{2^u 3^{i0−1} q} = ξ3^{±1}.
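The decomposition n = 2^u 3^v s + r and identity (7) are easy to check numerically; a small Python sketch (our helper, for illustration only):

```python
def decompose(n):
    """For odd n not divisible by 3: split n^2 - 1 = 2^u 3^v q with (q, 6) = 1
    and n = 2^u 3^v s + r, then check identity (7)."""
    m = n * n - 1
    u = v = 0
    while m % 2 == 0:
        u, m = u + 1, m // 2
    while m % 3 == 0:
        v, m = v + 1, m // 3
    q = m
    d = 2 ** u * 3 ** v
    s, r = divmod(n, d)
    assert (r * r - 1) % d == 0                   # the last fraction is integral
    assert q == n * s + r * s + (r * r - 1) // d  # identity (7)
    return u, v, q, s, r
```

E.g. decompose(101) gives (u, v, q, s, r) = (3, 1, 425, 4, 5), and indeed 425 = 101·4 + 5·4 + (25 − 1)/24.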
3 An Expression Bounding the Error Probability
Theorem 2 assumes that the auxiliary inputs r3, r4 are "good", which should be taken to mean that they are non-trivial third and fourth roots of 1, and are roots of the third and fourth cyclotomic polynomials (provided such roots exist in R(n, c)). When EQFT is executed as described earlier, we cannot be sure that r3, r4 are good. However, the probability that they are indeed good is sufficiently large that the theorem can still be used to bound the actual error probability, as shown in Theorem 3 (for proofs, see the full paper):
Theorem 2. Let n be an odd composite number with prime power factorisation n = ∏_{i=1}^{ω} p_i^{m_i}, let Ω = ∑_{i=1}^{ω} m_i, and let c satisfy that (c/n) = −1. Given good values of the inputs r3, r4, the error probability of a single iteration of the second part of EQFTac (Algorithm 1) is bounded by

 β(n, c) ≤ 24^{1−ω} ∏_{i=1}^{ω} p_i^{2(1−m_i)} · sel[(c/p_i), (n/p_i − 1, (p_i² − 1)/24) / ((p_i² − 1)/24), 12/(p_i − 1)] ≤ 24^{1−Ω}

where we have adopted the notation sel[±1, E1, E2] for a conditional expression with the semantics sel[−1, E1, E2] = E1 and sel[1, E1, E2] = E2.
Theorem 3. Let n be an odd composite number with ω distinct prime factors. For any t ≥ 1, the error probability βt(n) of t iterations of EQFTac (Algorithm 1) is bounded by

 βt(n) ≤ max_{(c/n)=−1} 4^{ω−1} β(n, c)^t

4 EQFTac: Average Case Behaviour

4.1 Uniform Choice of Candidates
Let Mk be the set of odd k-bit integers (2^{k−1} < n < 2^k). Consider the algorithm that repeatedly chooses random numbers in Mk, until one is found that passes t iterations of EQFTac, and outputs this number.
The expected time to find a "probable prime" with this method is at most tTk/pk, where Tk is the expected time for running the test on a random number from Mk, and pk is the probability that such a number is prime. Suppose we choose n at random and let n² − 1 = 2^u 3^v q, where q is prime to 2 and 3. It is easy to see that the expected values of u and v are constant, and so it follows from Theorem 1 that Tk is 2k + o(k) multiplications modulo a k-bit number. This gives approximately the same time needed to generate a probable prime as if we had used 2t iterations of the Miller-Rabin test in place of t iterations of EQFTac. But, as we shall see, the error probability is much smaller than with 2t MR tests.
Let qk,t be the probability that the algorithm above outputs a composite number. When running t iterations of our test on input n, it follows from Theorem 3 and Theorem 2 that the probability βt(n) of accepting n satisfies

 βt(n) ≤ 4^{ω−1} 24^{t(1−Ω)} max{(n/p − 1, (p² − 1)/24) / ((p² − 1)/24), 12/(p − 1)}^t

where p is the largest prime factor in n and Ω is the number of prime factors in n, counted with multiplicity. This expression is extremely similar to the one for the Rabin test found in [5]. Therefore we can find bounds for qk,t in essentially the same way as there. Details can be found in the full paper. We obtain numerical estimates for qk,t; some sample results are shown in Table 1, which contains − log₂ of the estimates, so we assert that, e.g., q500,2 ≤ 2^−143.
Table 1. Lower bounds on − log₂ qk,t

 k\t     1    2    3    4
 300    42  105  139  165
 400    49  125  165  195
 500    57  143  187  221
 600    64  159  208  245
 1000   86  212  276  325
We also get a closed expression (with an easily computable big-O constant):

Theorem 4. For 2 ≤ t ≤ k − 1, we have that qk,t is O(k^{3/2} 2^{(σt+1)t} t^{−1/2} 4^{−√(2σt tk)}).
Comparing to corresponding results in [5] for the Miller-Rabin test, one finds that if several iterations of EQFTac are performed, then roughly speaking each iteration has the effect of 9 Miller-Rabin tests, while only taking time equivalent to about 2 M-R tests.
4.2 Incremental Search
The algorithm we have just analysed is in fact seldom used in practice. Most
real implementations will not want to choose candidates for primes uniformly at
random. Instead one will choose a random starting point n0 in Mk and then test
n0 , n0 + 2, n0 + 4, . . . for primality until one is found that passes t iterations of
the test. Many variations are possible, such as other step sizes, various types of
sieving, but the basic principle remains the same. The reason for applying such
an algorithm is that trial division by small primes can be implemented much
more efficiently (see for instance [4]). On the other hand, the analysis we did
above depends on the assumption that candidates are independent. In [3], a way
to get around this problem for the Miller-Rabin test was suggested. We apply
an improvement of that technique here.
An Extended Quadratic Frobenius Primality Test
129
We will analyse the following example algorithm which depends on parameters t and s: choose n0 uniformly in Mk and test n0, n0 + 2, . . . , n0 + 2(s − 1) using
t iterations of EQFTac. If no probable prime is found, start over with a new
independently chosen value of n0 . Output the first number found that passes all
t iterations of EQFTac.
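The incremental-search loop just described is easy to sketch. The following Python sketch uses the classical Miller-Rabin test as a stand-in for EQFTac (the paper's test works in a quadratic extension ring and is not reproduced here); the parameters `k`, `t`, `s` match the text, and all function names are our own.

```python
import random

def miller_rabin(n, t):
    """t rounds of the Miller-Rabin test; True means 'probable prime'."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    d, r = n - 1, 0
    while d % 2 == 0:          # write n - 1 = 2^r * d with d odd
        d //= 2
        r += 1
    for _ in range(t):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # a is a witness: n is composite
    return True

def incremental_search(k, t, s):
    """Pick n0 uniformly among odd k-bit numbers, test the window
    n0, n0+2, ..., n0+2(s-1), and restart with a fresh n0 on failure."""
    while True:
        n0 = random.randrange(2 ** (k - 1), 2 ** k) | 1
        for j in range(s):
            n = n0 + 2 * j
            if miller_rabin(n, t):
                return n
```

Per the text, taking s = Θ(k), e.g. s ≈ 10 ln(2^k), makes a single choice of n0 suffice almost all the time.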
We argue in the full paper that the expected time to find a probable prime
by the above algorithm is at most O(tk²) multiplications modulo k-bit numbers,
if s is Θ(k). Practice shows that for s = 10 ln(2^k), we almost all the time need only
one value of n0, and so st(2k + o(k)) multiplications is an upper bound⁵. Let
Qk,t,s be the probability that the above algorithm outputs a composite number.
Table 2 shows sample numeric results of our estimates of Qk,t,s .
Table 2. Estimates of the overall error probability with incremental search: lower
bounds on − log2 Qk,t,s using s = c · ln(2^k) and c = 10.

k\t      1    2    3    4
300     18   74  107  133
400     26   93  132  162
500     34  109  153  186
600     40  125  174  210
1000    62  176  239  288
5 EQFTwc: Worst Case Analysis
We present in this section the version of our test (EQFTwc) which is fast for all
n and has essentially the same error probability bound as EQFTac. The price for
this is an expected start-up cost of ≤ 2 log n + o(log n) multiplications in Zn for
the first iteration of the test. For comparison of our test with the earlier tests of
Grantham, Müller and Miller-Rabin, assume that we are willing to spend some
fixed amount of time testing an input number, say, approximately corresponding
to the time for t Miller-Rabin tests. Then, using our test, we get asymptotically a
better bound on the error probability: using Miller-Rabin, Grantham [6], Müller
[7,8], and EQFTwc, respectively, we get error bounds 4^(−t), 19.8^(−t), 50.8^(−t), and
approximately 576^(−t).
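To see what these bounds mean per unit of work, one can compare − log2 of each bound; the following quick computation is ours, not the paper's:

```python
import math

# Error bounds achievable in the time of t Miller-Rabin tests, as quoted
# above for Miller-Rabin, Grantham [6], Mueller [7,8] and EQFTwc.
bounds = [("Miller-Rabin", 4.0), ("Grantham", 19.8),
          ("Mueller", 50.8), ("EQFTwc", 576.0)]

for name, base in bounds:
    # an error bound of base**(-t) gives log2(base) bits of error
    # reduction per Miller-Rabin-equivalent of work
    print(f"{name:13s} {math.log2(base):5.2f} bits per unit time")
```

So EQFTwc buys roughly log2(576) ≈ 9.17 bits per Miller-Rabin-equivalent, versus 2 bits for Miller-Rabin itself.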
In Section 2, the general idea behind EQFTwc was explained. The only point
left open was the following: we need to design a start-up procedure that can either
discover that n is composite, or construct an element r24 of order 24, and also
guarantee that all Sylow-2 and -3 subgroups of R(n, c)* have order at least 2^u, 3^v
⁵ Of course, this refers to the run time when only the EQFTac is used. In practice,
one would use trial division and other tricks to eliminate some of the non-primes
faster than EQFTac can do it. This may reduce the run time significantly. Any such
method can be used without affecting the error estimates, as long as no primes are
rejected.
Algorithm 2 Extended Quadratic Frobenius Test (EQFTwc).
First iteration:
Require: input is an odd number n ≥ 5
Ensure: output is "composite", or "probable prime", c ∈ Zn, r24 ∈ R(n, c)*, where
(c/n) = −1 and Φ24(r24) = 0.
1: if n is divisible by 2 or 3 return "composite"
2: if n is a perfect square or a perfect cube return "composite"
3: choose a small c with (c/n) = −1
4: compute r ∈ R(n, c) satisfying r² + r + 1 = 0 (may return "composite")
5: a: if n ≡ 1 mod 3 then select a random z ∈ R(n, c)* with (N(z)/n) = −1 and
      res3(z) ≠ 1.
   b: if n ≡ 2 mod 3 then repeat
      Make a Miller-Rabin primality test on n (may return "composite")
      select a random z ∈ R(n, c)* with (N(z)/n) = −1 and compute res3(z)
   until either the Miller-Rabin test returns composite or the selected z satisfies
   res3(z) ≠ 1
6: if z̄ ≠ z^n return "composite".
7: Let r24 = z^((n²−1)/24). If r24^8 ≠ r^(±1) or r24^12 ≠ −1 return "composite".
8: return "probable prime", c, r24
Subsequent iterations:
Require: input is n, c, r24, where n ≥ 5 is not divisible by 2 or 3, (c/n) = −1, and
Φ24(r24) = 0
Ensure: output is "composite" or "probable prime"
9: select random z ∈ R(n, c)*
10: if z̄ ≠ z^n return "composite"
11: if z^((n²−1)/24) ∉ {r24^i | i = 0, . . . , 23} return "composite"
12: return "probable prime"
(where as usual, 2^u, 3^v are the maximal 2- and 3-powers dividing n² − 1). We do
this by choosing z ∈ R(n, c)* in such a way that if n is prime, then z is both a
non-square and a non-cube. This means that we can expect that z^((n²−1)/2) = −1
and that z^((n²−1)/3) = r^(±1), where r is a primitive 3rd root of 1. If this is not the
case, n is composite. If it is, n may still be composite, but we have the required
condition on the Sylow-2 and -3 subgroups, and we can set r24 = z^((n²−1)/24). The
subsequent iterations of the test are then very simple: take a random z ∈ R(n, c)*
and check whether z̄ = z^n and z^((n²−1)/24) ∈ {r24^i | i = 0, . . . , 23}.
Before presenting the algorithm, we need to define a homomorphism res3
from the group of units R(n, c)* into the complex third roots of unity {1, ζ, ζ²}. This
homomorphism will be used to recognize cubic nonresidues.
Definition 3. For arbitrary n ≥ 5 with (n, 6) = 1 and arbitrary c with (c/n) =
−1, assume there exists an r = gx + h ∈ R(n, c) with r² + r + 1 = 0, and if
n ≡ 1 mod 3 assume in addition that r ∈ Zn, i.e. g = 0.
Define res3 : R(n, c)* → {1, ζ, ζ²} ⊆ Z[ζ] by

res3(ax + b) = [(b² − ca²) / gcd(n, r − ζ)],  if n ≡ 1 mod 3,
res3(ax + b) = [(b + a(ζ − h)/g) / n],        if n ≡ 2 mod 3,

where [·/·] denotes the cubic residuosity symbol.
To find the element z mentioned above, we note that computing the Jacobi
symbol will let us recognize 1/2 of all elements as nonsquares. One might expect
that applying res3 would let us recognize 2/3 of all elements as noncubes. Unfortunately, all we can show is that res3 is nontrivial except possibly when n is
a perfect cube, or n is composite and n ≡ 2 mod 3. To handle this problem, we
take a pragmatic solution: Run a Miller-Rabin test and a search for noncubes
in parallel. If n is prime then the search for a noncube will succeed, and if n is
composite then the MR-test (or the noncube search) will succeed.
The following results are proved in the full paper:
Theorem 5. There is an implementation of Algorithm 2 that on input n takes
expected time equivalent to at most 2 log n + o(log n) multiplications in Zn per
iteration, assuming the ERH. The first iteration has an additional expected
start-up cost equivalent to at most 2 log n + o(log n) multiplications in Zn.
Theorem 6. Let n be an odd composite number with prime power factorisation
n = ∏_{i=1..ω} p_i^(m_i), and let Ω = ∑_{i=1..ω} m_i. If γt(n) denotes the probability that n passes t
iterations of the EQFTwc test (Algorithm 2) then

γt(n) ≤ 4^(ω−1) max_{(c/n)=−1} ( 24^(1−ω) ∏_{i=1..ω} p_i^(2(1−m_i)) sel[(c/p_i), (n/p_i − 1, p_i² − 1)/(p_i − 1)², (n²/p_i² − 1, p_i − 1)/(p_i² − 1)] )^t ≤ 4^(ω−1) 24^(t(1−Ω))

If n has no prime factor ≤ 118 or n ≥ 2^42 then γt(n) ≤ 4^4 · 24^(−4t) ≈ 2^(8−18.36t)
References
1. Manindra Agrawal, Neeraj Kayal, and Nitin Saxena. PRIMES is in P. Preprint,
Department of Computer Science & Engineering, Indian Institute of Technology
Kanpur, Kanpur 208016, India, 2002.
2. Eric Bach and Jeffrey Shallit. Algorithmic number theory. Vol. 1. Foundations of
Computing Series. MIT Press, Cambridge, MA, 1996. Efficient algorithms.
3. Jørgen Brandt and Ivan Damgård. On generation of probable primes by incremental
search. In Advances in cryptology—CRYPTO ’92 (Santa Barbara, CA, 1992), Vol.
740 of Lecture Notes in Comput. Sci., pp. 358–370. Springer, Berlin, 1993.
4. Jørgen Brandt, Ivan Damgård, and Peter Landrock. Speeding up prime number
generation. In Advances in cryptology—ASIACRYPT ’91 (Fujiyoshida, 1991), Vol.
739 of Lecture Notes in Comput. Sci., pp. 440–449. Springer, Berlin, 1993.
5. Ivan Damgård, Peter Landrock, and Carl Pomerance. Average case error estimates
for the strong probable prime test. Math. Comp. 61(203) (1993), 177–194.
6. Jon Grantham. A probable prime test with high confidence. J. Number Theory
72(1) (1998), 32–47.
7. Siguna Müller. A probable prime test with very high confidence for n ≡ 1 mod 4.
In Advances in cryptology—ASIACRYPT 2001 (Gold Coast), Vol. 2248 of Lecture
Notes in Comput. Sci., pp. 87–106. Springer, Berlin, 2001.
8. Siguna Müller. A probable prime test with very high confidence for n ≡ 3 mod 4.
J. Cryptology 16(2) (2003), 117–139.
Periodic Multisorting Comparator Networks⋆
Marcin Kik
Institute of Mathematics, Wroclaw University of Technology
ul. Wybrzeże Wyspiańskiego 27, 50-370 Wroclaw, Poland
kik@im.pwr.wroc.pl
Abstract. We present a family of periodic comparator networks that
transform the input so that it consists of a few sorted subsequences.
The depths of the networks range from 4 to 2 log n, while the number
of sorted subsequences ranges from 2 log n to 2. They work in time
c log² n + O(log n) with 4 ≤ c ≤ 12, and the remaining constants
are also suitable for practical applications. So far, the known periodic
sorting networks of constant depth that run in time O(log² n) (a
periodic version of the AKS network [7]) are impractical because of their complex
structure and the very large constant factor hidden by the big "Oh".
Keywords: sorting, comparator networks, parallel algorithms.
1 Introduction
A comparator is a simple device capable of sorting two elements. Many comparators can be connected together to form a comparator network. This way we get
the classical framework for sorting algorithms. Arranging the comparators optimally turned out to be a challenge. The main complexity measures of comparator
networks are time complexity (depth or number of steps) and the number of
comparators. The most famous sorting network is the AKS network, with asymptotically optimal depth O(log n) [1]; however, the big constant hidden by the big "Oh"
makes it impractical. The Batcher networks of depth ≈ ½ log² n [2] seem to be
very attractive for practical applications.
A periodic network is repeatedly used on the intermediate results until the
output becomes sorted, thus the same comparators are reused many times. In this
case, the time complexity is the depth of the network multiplied by the number
of iterations. The main advantage of periodicity is the reduction of the amount of
hardware (comparators) needed for the realization of the sorting algorithm, with
a very simple control mechanism providing the output of one iteration as the
input for the next iteration. Dowd et al. [3] reduced the number of comparators
from Ω(n log² n) to ½ n log n, while keeping the sorting time log² n, by the
use of a periodic network of depth log n. (The networks of depth d have at most
dn/2 comparators.) There are some periodic sorting networks of a constant depth
([10], [5], [7]). In [7], constant depth networks with time complexity O(log2 n) are
⋆ Research supported by KBN grant 7T11C 3220 in the years 2002, 2003.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 132–143, 2003.
© Springer-Verlag Berlin Heidelberg 2003
obtained by "periodification" of the AKS network, and more practical solutions
with time complexity O(log³ n) are obtained by "periodification" of the Batcher
network. On the other hand, no ω(log n) lower bound is known on the
time complexity of periodic sorting networks of constant depth. Closing the gap
between the known upper bound of O(log² n) and the trivial general lower bound
Ω(log n) seems to be a very hard problem.
Periodic networks of constant depth can also be used for simpler tasks, such
as merging sorted sequences [6], or resorting sequences with few values modified
[4].
1.1 New Results
We assume that the values are stored in the registers and the only allowed
operations are compare-exchange operations (applications of comparators) on
the pairs of registers. Such an operation takes the two values stored in the
pair of registers and stores the lower value in the first register and the greater
value in the second register. (This interpretation differs from the one presented
for instance in [8] but is more useful when periodic comparator networks are
concerned.)
We present a family of periodic comparator networks Nm,k. The input size
of Nm,k is n = 4m · 2^k. The depth of Nm,k is 2⌈k/m⌉ + 2. In Section 4 we prove
the following theorem.

Theorem. The periodic network Nm,k transforms the input into 2m sorted subsequences of length n/(2m) in time 4k² + 8km + O(k + m).
For example, the network N1,k is a network of depth ≈ 2 log n that produces
2 sorted sequences in time ≈ 4 log² n + O(log n). On the other hand, Nk,k is a
network of depth 4 that transforms the input into ≈ 2 log n sorted sequences in
time ≈ 12 log² n + O(log n). Due to the large constants in the known periodic
constant depth networks sorting in time O(log² n) [7], it could be an interesting
alternative to use Nk,k to produce a highly ordered (although not completely
sorted) output.
The output produced by Nm,k can be finally sorted by a network merging 2m
sequences. This can be performed by the very efficient multiway merge sorting
networks [9]. It is an interesting problem to find an efficient periodic network of
constant depth that merges multiple sorted sequences. The periodic networks
of constant depth that merge two sorted sequences in time O(log n) are already
known [6].
As Nm,k outputs multiple sorted sequences, we call it a multisorting network.
Much simpler multisorting networks of constant depth exist if some additional
operations are allowed (such as permutations of the elements in the registers
between the iterations). However, we consider only the case restricted to the
compare-exchange operations.
2 Preliminaries
By a comparator network we mean a set of registers R0, . . . , Rn−1 together with
a finite sequence of layers of comparators. At every moment a register Ri contains
a single value (denoted by v(Ri)) from some totally ordered set, say ℕ. We say
that the network stores a sequence v(R0), . . . , v(Rn−1). A subset S of registers
is sorted if for all Ri , Rj in S, i < j implies that v(Ri ) ≤ v(Rj ). A comparator
is denoted by an ordered pair of registers (Ri , Rj ). If v(Ri ) = x and v(Rj ) = y
before an application of the comparator (Ri , Rj ), then v(Ri ) = min{x, y} and
v(Rj ) = max{x, y} after the application of (Ri , Rj ). A set of comparators L
forms a layer if each register is contained in at most one of the comparators
of L. So all the comparators of a layer can be applied simultaneously. We call
such application a step. The depth of the network is the number of its layers. An
input is the initial value of the sequence v(R0 ), . . . , v(Rn−1 ). An output of the
network N is the sequence v(R0 ), . . . , v(Rn−1 ) obtained after application of all
its layers (an application of N) to some initial input sequence. We can iterate the
network's application by applying it to the output of its previous application.
We call such a network a periodic network. The time complexity of the periodic
network is the number of steps performed in all iterations.
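The model of this section is straightforward to simulate. The sketch below is our own minimal Python rendering, with hypothetical helper names; it applies layers of comparators to a register file and iterates the whole network periodically:

```python
def apply_layer(v, layer):
    """Apply one layer to register values v: each comparator (i, j)
    stores min in R_i and max in R_j.  A layer touches each register
    at most once, so the application order is irrelevant."""
    used = set()
    for i, j in layer:
        assert i not in used and j not in used, "not a valid layer"
        used.update((i, j))
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def run_periodic(v, layers, iterations):
    """Iterate the network: the output of one application of all
    layers becomes the input of the next application."""
    for _ in range(iterations):
        for layer in layers:
            apply_layer(v, layer)
    return v
```

For instance, the depth-2 odd-even transposition network, applied periodically, sorts n values within ⌈n/2⌉ iterations.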
3 Definition of the Network Nm,k
We define a periodic network Nm,k for positive integers m and k. For the sake of
simplicity we fix the values m and k and denote Nm,k by N. Network N contains
n registers R0, . . . , Rn−1, where n = 4m · 2^k. It will be useful to imagine that the
registers are arranged in a three-dimensional matrix M of size 2 × 2m × 2^k. For
0 ≤ x ≤ 1, 0 ≤ y ≤ 2m − 1 and 0 ≤ z ≤ 2^k − 1, the element Mx,y,z is the register
Ri such that i = x + 2y + 4mz. For intuition, we assume that the Z and Y
coordinates increase downwards and rightwards, respectively. By a column
Cx,y we mean the subset of registers Mx,y,z with 0 ≤ z < 2^k. Py = C0,y ∪ C1,y is a
pair of columns. A Z-slice is a subset of registers with the same Z coordinate.
Let d = ⌈k/m⌉. We define the sets of comparators X, Y0 , Y1 , and Zi , for
0 ≤ i < d, as follows. (Comparators of X, Yj and Zi are called X-comparators,
Y -comparators and Z-comparators, respectively.) The comparators of X, Y0 and
Y1 act in each Z-slice separately (see Figure 1). Set X contains comparators
(M0,y,z , M1,y,z ), for all y and z. Let Y be an auxiliary set of all comparators
(Mx,y,z , Mx,y′ ,z ) such that y ′ = (y + 1) mod 2m. Y0 contains all comparators
(Mx,y′,z) from Y such that y is even. Y1 consists of those comparators
from Y that are not in Y0. Note that the layer Y1 contains nonstandard comparators (Mx,2m−1,z, Mx,0,z) (i.e. comparators that place the greater value in
the register with lower index).
In order to describe Zi we define a matrix α of size d × 2m (with the rows
indexed by the first coordinate) such that, for 0 ≤ i < d and 0 ≤ j < 2m:
– if j is even then αi,j = d · j/2 + i,
– if j is odd then αi,j = αi,2m−1−j.
Periodic Multisorting Comparator Networks
135
Fig. 1. Comparator connections within a single Z-slice. Dotted (respectively, dashed
and solid) arrows represent comparators from X (respectively, Y0 and Y1).
For example, for m = 4 and 4 < k ≤ 8, α is the following matrix:

    0 6 2 4 4 2 6 0
    1 7 3 5 5 3 7 1
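The matrix α is easy to generate mechanically. This sketch (ours, not the paper's) fills the even columns first and then mirrors them into the odd columns, reproducing the example above for m = 4, k = 8:

```python
from math import ceil

def alpha(m, k):
    """The d x 2m matrix alpha controlling the heights of the
    Z-comparators of N_{m,k}, per the definition in the text."""
    d = ceil(k / m)
    a = [[0] * (2 * m) for _ in range(d)]
    for i in range(d):
        for j in range(0, 2 * m, 2):       # even columns: d*j/2 + i
            a[i][j] = d * j // 2 + i
        for j in range(1, 2 * m, 2):       # odd columns mirror even ones
            a[i][j] = a[i][2 * m - 1 - j]
    return a

print(alpha(4, 8))   # the 2 x 8 example matrix shown above
```

Note that the odd entries must be filled after all even entries, since αi,j for odd j refers to the even column 2m − 1 − j, which may lie to the right of j.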
For 0 ≤ i < d, Zi consists of comparators (M1,y,z, M0,y,z′) such that 0 ≤ y < 2m
and z′ = z + 2^(k−1−αi,y), provided that 0 ≤ z, z′ < 2^k and k − 1 − αi,y ≥ 0. By the
height of the comparator (Mx,y,z, Mx′,y′,z′) we mean z′ − z. Note that each single
Z-comparator is contained within a single pair of columns and all comparators
of Zi contained in the same pair of columns are of the same height, which
is a power of two. All Z-comparators of height 2^(k−1), 2^(k−2), . . . , 2^(k−d) (which are
from Z0, Z1, . . . , Zd−1, respectively) are placed in the pairs of columns P0 and
P2m−1. All Z-comparators of height 2^(k−1−d), . . . , 2^(k−2d) (from Z0, . . . , Zd−1) are
placed in P2 and P2m−3. And so on. Generally, for 0 ≤ i < d and 0 ≤ y < m, the
height of all comparators of Zi contained in P2y and in P2m−1−2y is 2^(k−1−dy−i).
Fig. 2. Z-comparators of different heights (4, 2 and 1) within the pairs of columns, for k = 3.
The sequence of layers of the network N is (L0 , . . . , L2d+1 ) where L2i = X,
L2i+1 = Zi , for 0 ≤ i < d, and L2d = Y0 , L2d+1 = Y1 .
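Putting the definitions of this section together, the following Python sketch (our own; the function names are not from the paper) builds the layer sequence (L0, . . . , L2d+1) of Nm,k and applies it periodically, so one can observe on 0-1 inputs the behaviour claimed in Section 4: every pair of columns Py eventually becomes sorted.

```python
from math import ceil

def build_network(m, k):
    """Layers (L0, ..., L_{2d+1}) of N_{m,k}; comparator (i, j) stores
    the minimum in R_i and the maximum in R_j, and M_{x,y,z} is the
    register x + 2y + 4mz, exactly as in the text."""
    d = ceil(k / m)
    reg = lambda x, y, z: x + 2 * y + 4 * m * z
    zs = range(2 ** k)
    X = [(reg(0, y, z), reg(1, y, z)) for y in range(2 * m) for z in zs]
    Y0 = [(reg(x, y, z), reg(x, y + 1, z))
          for x in (0, 1) for y in range(0, 2 * m, 2) for z in zs]
    Y1 = [(reg(x, y, z), reg(x, (y + 1) % (2 * m), z))
          for x in (0, 1) for y in range(1, 2 * m, 2) for z in zs]
    a = [[0] * (2 * m) for _ in range(d)]      # the matrix alpha
    for i in range(d):
        for j in range(0, 2 * m, 2):
            a[i][j] = d * j // 2 + i
        for j in range(1, 2 * m, 2):
            a[i][j] = a[i][2 * m - 1 - j]
    layers = []
    for i in range(d):
        Zi = []
        for y in range(2 * m):
            e = k - 1 - a[i][y]                # comparator height 2^e
            if e >= 0:
                Zi += [(reg(1, y, z), reg(0, y, z + 2 ** e))
                       for z in range(2 ** k - 2 ** e)]
        layers += [X, Zi]
    return layers + [Y0, Y1]

def iterate(layers, v, iterations):
    """Apply the periodic network for the given number of iterations."""
    for _ in range(iterations):
        for layer in layers:
            for i, j in layer:
                if v[i] > v[j]:
                    v[i], v[j] = v[j], v[i]
    return v
```

With m = k the depth is 4, and with m = 1 it is 2k + 2, matching the range quoted in the abstract.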
Fig. 3. Network N3,3. For clarity, the Y-comparators are drawn separately.
A set of comparators K is symmetric if (Ri, Rj) ∈ K implies (Rn−1−j, Rn−1−i) ∈ K. Note that all layers of N are symmetric.
Figure 3 shows the network Nm,k for k = m = 3. As m ≥ k, this network
contains only one layer of Z-comparators, Z0.
4 Analysis of the Computation of Nm,k

The following theorem is a more detailed version of the theorem stated in the
introduction.
Theorem 1. After T ≤ 4k² + 8mk + 7k + 14m + 6⌈k/m⌉ + 13 steps of the periodic
network Nm,k all its pairs of columns are sorted.
We denote Nm,k by N. By the zero-one principle [8], it is enough to show this
property for the case when only zeroes and ones are stored in the registers. We
replace zeroes by negative numbers and ones by positive numbers. These numbers
can increase their absolute values between the applications of subsequent layers
in the periodic computation of N, but cannot change their signs. We show that,
after T steps, negative values precede all positive values within each pair of
columns.
Initially, let v(R0), . . . , v(Rn−1) be an arbitrary sequence of values from
{−1, 1}. We apply N to this sequence as a periodic network. We call the application of the layer Yi (respectively, X, Zi) a Y-step (respectively, X-step,
Z-step).
To make the analysis more intuitive, we assume that each register stores
(besides the value) a unique element. The value of an element e stored in Ri,
(denoted v(e)) is equal to v(Ri ). If v(e) > 0 then e is positive. Otherwise e
is negative. If just before the application of comparator c = (Ri , Rj ) we have
v(Ri ) > v(Rj ) then during the application of c the elements are exchanged
between Ri and Rj . If c is from Y0 or Y1 then the elements are exchanged also
if v(Ri ) = v(Rj ). If e is a positive (respectively, negative) element contained in
Ri or Rj , before the application of c, then e wins in c if, after the application of
c, it ends up in Rj (respectively, Ri ). Otherwise e loses in c.
We call the elements that are stored during the X-steps and Z-steps in the
pairs of columns P2i , for 0 ≤ i < m, right-running elements. The remaining
elements are called left-running.
Let k ′ = md. (Recall that d = ⌈k/m⌉.) Let δ = 1/(4k ′ ). Note that k ′ δ < 1.
By critical comparators we mean the comparators between P2m−1 and P0 from
the layer Y1 . We modify the computation of N as follows:
– After each Z-step, we increase the values of the positive right-running elements and decrease the values of the negative left-running elements by δ.
(We call it δ-increase.)
– When a positive right-running (respectively, negative left-running) element
e wins in a critical comparator, we increase v(e) to ⌊v(e) + 1⌋ (respectively,
decrease v(e) to ⌈v(e) − 1⌉).
Note that once a positive (respectively, negative) element becomes right-running (respectively, left-running) it remains right-running (respectively, left-running) forever. All the positive left-running and negative right-running elements have absolute value 1.
Lemma 1. If, during the Z-step t, |v(e)| = l + y′δ, where l and y′ are nonnegative
integers such that l ≥ 2 and 0 ≤ y′ < k′, then, during t, e can be processed
only by comparators of height 2^(k−1−y′).
Let e be a positive element. (A negative element behaves symmetrically.)
Since v(e) > 1, e is a right-running element during step t. At the moment when
e started being right-running, its value was equal 1. A right-running element
can be δ-increased at most k ′ times between its subsequent wins in the critical
comparators, and k ′ δ < 1. Thus e reached the value 2 when it entered P0 for
the first time. Then its value was being increased by δ, after each Z-step (d
times in each P2j ), and rounded up to the next integer during its wins in critical
comparators. The lemma follows from the definition of α and Zi : The heights
of the Z-comparators from the subsequent Z-layers Zi , for 0 ≤ i < d, in the
subsequent pairs of columns P2j , for 0 ≤ j < m, are the decreasing powers of
two. ✷
We say that a register Mx,y,z is l-dense for v if
– in the case v > 0: v(Mx,y,z+i⌈2^l⌉) ≥ v, for all i ≥ 0 such that z + i⌈2^l⌉ < 2^k,
and
– in the case v < 0: v(Mx,y,z−i⌈2^l⌉) ≤ v, for all i ≥ 0 such that z − i⌈2^l⌉ ≥ 0.
Note that, for l < 0, "l-dense" means "0-dense". An element is l-dense for v if it
is stored in a register that is l-dense for v.
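The definition can be checked mechanically. In the small Python sketch below (ours; the name `l_dense` is our own), `column` holds the values v(Mx,y,z) of one column for z = 0, . . . , 2^k − 1:

```python
from math import ceil

def l_dense(column, z, l, v):
    """Is position z of the column l-dense for v, per the definition?"""
    step = ceil(2 ** l) if l >= 0 else 1   # for l < 0, "l-dense" means "0-dense"
    if v > 0:
        # every step-th position below z (inclusive) holds a value >= v
        return all(column[z2] >= v for z2 in range(z, len(column), step))
    # every step-th position above z (inclusive) holds a value <= v
    return all(column[z2] <= v for z2 in range(z, -1, -step))
```

This makes the first property of Lemma 2 below immediate: l-dense for v > 0 implies l-dense for every 0 < v′ ≤ v.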
Lemma 2. If Mx,y,z is l-dense for v > 0 (respectively, v < 0), then, for 0 <
v′ ≤ v (respectively, v ≤ v′ < 0), Mx,y,z is l-dense for v′.
If Mx,y,z is l-dense for v > 0 (respectively, v < 0), then, for all j ≥ 0
(respectively, j ≤ 0), Mx,y,z+j⌈2^l⌉ is l-dense for v.
If Mx,y,z is l-dense for v > 0 (respectively, v < 0) and Mx,y,z+⌊2^(l−1)⌋ (respectively, Mx,y,z−⌊2^(l−1)⌋) is l-dense for v, then Mx,y,z is (l − 1)-dense for v.
These properties can be easily derived from the definition. ✷
Lemma 3. Let L be any layer of N and (Mx,y,z , Mx′ ,y′ ,z′ ) ∈ L.
If Mx,y,z or Mx′ ,y′ ,z′ is l-dense for v > 0 (respectively, v < 0), just before an
application of L, then Mx′ ,y′ ,z′ (respectively, Mx,y,z ) is l-dense for v just after
the application of L.
If Mx,y,z and Mx′ ,y′ ,z′ are l-dense for v, just before the application of L, then
Mx,y,z and Mx′ ,y′ ,z′ are l-dense for v just after the application of L.
Proof. The lemma follows from the fact that, for each integer i such that
0 ≤ z + i⌈2^l⌉, z′ + i⌈2^l⌉ < 2^k, the comparator (Mx,y,z+i⌈2^l⌉, Mx′,y′,z′+i⌈2^l⌉) is
also in L. ✷
Periodic Multisorting Comparator Networks
139
Corollary 1. If an element l-dense for v wins during an application of a layer
L of N, then it remains l-dense for v. If it loses to another element l-dense for
v, then it also remains l-dense for v. If it wins in a critical comparator and v > 0
(respectively, v < 0), then it becomes l-dense for ⌊v + 1⌋ (respectively, ⌈v − 1⌉).
If just before Z-step t, e is a right-running positive (respectively, left-running
negative) element l-dense for v > 0 (respectively, v < 0), and, during t, e loses
to another element l-dense for v or wins, then it becomes l-dense for v + δ
(respectively, v − δ) after the δ-increase following t.
The following lemma states that each positive element e that was right-running for a long time is contained in a dense foot of elements with value v(e) or greater; an analogous property holds for left-running negative elements.
Lemma 4. Consider the configuration of N after a Z-step. For nonnegative
integers l, s and y′ such that y′ ≤ k′, for each element e:
If v(e) = l + 2 + s + y′δ, then e is (k − l)-dense for l + 2 + y′δ and, if y′ > l,
then e is (k − l − 1)-dense for l + 2 + y′δ.
If v(e) = −(l + 2 + s + y′δ), then e is (k − l)-dense for −(l + 2 + y′δ) and, if
y′ > l, then e is (k − l − 1)-dense for −(l + 2 + y′δ).
Proof. We prove only the first part. The second part is analogous since all layers
of N are symmetric. The proof is by induction on l. Let 0 ≤ l < k. Let e be
any element with v(e) = l + 2 + s + y′δ, for some nonnegative integers s, y′, where
y′ ≤ k′. The element e was right-running during each of the last y′ Z-steps.
These steps were preceded by a critical step t that increased v(e) to l + 2 + s.
Let ti (respectively, t′i) be the (i + 1)-st X-step (respectively, Z-step) after step
t. Let Mxi,yi,zi (respectively, Mx′i,yi,z′i) be the register that stored e just after
ti (respectively, t′i). Let vi denote the value l + 2 + iδ. During each step ti and
t′i, all elements e′ with v(e′) ≥ v(e) in the pair of columns containing e are
(k − l)-dense for vi. (For l = 0 it is obvious, since the "height" of N is 2^k, and,
for l > 0, it follows from the induction hypothesis and Corollary 1, since e′ was
(k − l)-dense for l + 1 already before t, and, hence, (k − l)-dense for v0 just after
t.)
Claim (Breaking Claim). For 0 ≤ i ≤ l, just after the X-step ti, the registers
M0,yi,zi+2^(k−i) and M1,yi,zi+2^(k−i) are (k − l)-dense for vi, if they exist.
We prove the claim by induction on i. For i = 0 it is obvious (M0,yi,zi+2^k and
M1,yi,zi+2^k do not exist).
Let 0 < i ≤ l. Consider the configuration just after step ti−1 (see Figure
4). Since ti−1 was an X-step, v(M1,yi−1,zi−1) ≥ v(e) and, hence, M1,yi−1,zi−1 is
(k − l)-dense for vi−1. Thus, M1,yi−1,zi−1+2^(k−i) is (k − l)-dense for vi−1, since 2^(k−i)
is a multiple of 2^(k−l). By the induction hypothesis of the claim, M0,yi−1,zi−1+2^(k−i+1)
and M1,yi−1,zi−1+2^(k−i+1) are (k − l)-dense for vi−1. Just after the step t′i−1,
M1,yi−1,zi−1+2^(k−i) and M1,yi−1,zi−1+2^(k−i+1) remain (k − l)-dense for vi−1, since
they were compared to the registers M0,yi−1,zi−1+2^(k−i+1) and M0,yi−1,zi−1+2^(k−i+2)
Fig. 4. The configuration after ti−1 in Pyi−1 in the registers with Z-coordinates zi−1 +
j·2^(k−i), for 0 ≤ j < 4. (Black registers are (k − l)-dense for vi−1. Arrows denote the
comparators from t′i−1.)
that were (k − l)-dense for vi−1. M0,yi−1,zi−1+2^(k−i+1) remains (k − l)-dense for
vi−1. M0,yi−1,zi−1+2^(k−i) also becomes (or remains) (k − l)-dense for vi−1, since it
was compared to M1,yi−1,zi−1. Thus, just after the Z-step t′i−1, for x ∈ {0, 1}, the
registers M′x = Mx,yi−1,z′i−1+2^(k−i) are (k − l)-dense for vi−1 (and for vi, after the
δ-increase). (Either z′i−1 = zi−1 and M′x = Mx,yi−1,zi−1+2^(k−i), or z′i−1 = zi−1 + 2^(k−i)
and M′x = Mx,yi−1,zi−1+2^(k−i+1).) If i mod d = 0 then, during the next two Y-steps,
the elements from both M′0 and M′1 together with the element e are moved "horizontally" to P2i/d (winning on the way). Thus, by Corollary 1, just before and
after the X-step ti, for x ∈ {0, 1}, the registers Mx,yi,zi+2^(k−i) are (k − l)-dense
for vi. This completes the proof of the claim.
The next claim shows how the values vl or greater form a twice more condensed
foot below e.
Claim (Condensing Claim). After the Z-step t′l, e is (k − l − 1)-dense for vl (and
for vl+1, after the δ-increase).
Consider the configuration just after X-step tl. The registers Mxl,yl,zl and, by
the Breaking Claim, M0,yl,zl+2^(k−l) and M1,yl,zl+2^(k−l) are (k − l)-dense for vl. Since
the last step was an X-step, M1,yl,zl is (k − l)-dense for vl.
Consider the following scenarios of the Z-step t′l (see Figure 5):
1. e remains in M0,yl,zl: Then the register M0,yl,zl+2^(k−l−1) becomes (k − l)-dense
for vl, by Lemma 3, since M1,yl,zl was (k − l)-dense for vl just before t′l. Thus
e becomes (k − l − 1)-dense for vl, by Lemma 2.
2. e is moved from M1,yl,zl to M0,yl,zl+2^(k−l−1): Then by Corollary 1, e remains
(k − l)-dense for vl, and the register M0,yl,zl+2^(k−l) remains (k − l)-dense for
vl. Thus e becomes (k − l − 1)-dense for vl, by Lemma 2.
3. e remains in M1,yl,zl: Then v(e) ≤ v(M0,yl,zl+2^(k−l−1)) ≤ v(M1,yl,zl+2^(k−l−1))
just before t′l. (The second inequality is forced by the X-step tl.) Hence, for
x ∈ {0, 1}, R′x = Mx,yl,zl+2^(k−l−1) was (k − l)-dense for vl just before t′l. During
Fig. 5. The scenarios (cases 1–4) of t′l.
t′l the register R′1 is compared to M0,yl,zl+2^(k−l). So R′1 remains (k − l)-dense
for vl. Since e was compared to R′0, it also remains (k − l)-dense for vl. By
Lemma 2, e is (k − l − 1)-dense for vl just after t′l.
4. e is moved from M0,yl,zl to R′ = M1,yl,zl−2^(k−l−1): During t′l, R′ was compared
to Mxl,yl,zl, and R′′ = M1,yl,zl was compared to M0,yl,zl+2^(k−l−1), which was
(k − l)-dense for vl just before t′l, by the Breaking Claim applied to the
element in R′. Thus, by Lemma 3, the registers R′ and R′′ remain (k − l)-dense for vl just after t′l. By Lemma 2, R′ is (k − l − 1)-dense for vl just after
t′l.
Since there are no other scenarios for e and the subsequent δ-increase is the same
for all positive elements in Pyl, the proof of the claim is completed.
By Corollary 1, the element e remains (k − l − 1)-dense for vi, for i > l, since
other elements in its pair of columns with values v(e) or greater are now also
(k − l − 1)-dense for vi, and during Y-steps e is winning (right-running).
For l ≥ k, "(k − l)-dense for v" means "0-dense for v". The element e with
v(e) = k + 1 + kδ is 0-dense for k + 1 + kδ. All the positive elements below it
increase their values at the same rate as e. Thus, when v(e) reaches k + 2, it
becomes 0-dense for k + 2. By repeating this reasoning for the values k + 2 and
greater we complete the proof of Lemma 4. ✷
By Lemma 4, whenever any element e reaches the value k + 2 (in the pair
of columns P0) it is 0-dense for k + 2. Then, by the Breaking Claim, after the
first X-step once e has reached the value k + 2 + kδ, e is stored in a register Mx,y,z such
that M0,y,z+1 is also 0-dense for k + 2 + kδ. Hence, all the elements following e
in its pair of columns are 0-dense for k + 2 + kδ. By Corollary 1, this property of
e remains valid forever. Since the network is symmetric, we have the following
corollary:
Corollary 2. Consider a configuration in a pair of columns Py just after an
X-step.
If, for some register Ri ∈ Py , v(Ri ) ≥ k + 2 + kδ, then, for all Rj ∈ Py such
that j ≥ i, we have v(Rj ) ≥ k + 2 + kδ.
If, for some register Ri ∈ Py , v(Ri ) ≤ −(k + 2 + kδ), then, for all Rj ∈ Py
such that j ≤ i, we have v(Rj ) ≤ −(k + 2 + kδ).
Now, it is enough to show that, after the last X-step of the first T steps, all
right-running positive and all left-running negative elements have the absolute
values k+2+kδ or greater. Then in each pair of columns containing right-running
elements, the −1s are above the positive values, and in each pair of columns
containing left-running elements, the 1s are below the negative elements.
Lemma 5. If, after m Y-steps, and the next k ′ (k + 1) + k Z-steps, and the next
X-step, e is a left-running positive (respectively, right-running negative) element,
then e remains left-running (respectively, right-running) forever.
Let e be positive. (The proof for negative e is analogous.) During each of
the first m Y-steps, e was compared with the positive right-running elements.
For t ≥ 0, let yt be such that e was in Pyt just after the (t + 1)st Y-step. For
0 ≤ i < m, let Si (respectively, Si′ ) denote the set of positive elements that were
in Pyi (respectively, P(yi +1) mod 2m ) just after (i + 1)st Y-step. Let S ′′ be the
set of negative elements in Pym−1 just after the mth Y-step. For 0 ≤ i < m,
|Sm−1 | = 2 · 2k − |S ′′ | ≤ |Si′ |, since Sm−1 ⊆ Si and |Si | ≤ |Si′ |. Note that, for all
t ≥ m, during the (t + 1)st Y-step, the pair of columns containing (left-running)
S ′′ is compared to the pair of columns containing (right-running) St′ mod m .
After the next k ′ (k + 1) + k Z-steps all the elements of S ′′ have values −(k +
2 + kδ) or less, and, for 0 ≤ i < k, the elements of Si′ have values k + 2 + kδ or
greater (they have walked at least k + 1 times through the critical comparators
and then increased their values by δ at least k times during Z-steps). Let t′ be
the next X-step. Let t be any Y-step after t′ such that e is still in the same
pair of columns as S ′′ . Before the step t, the elements in S ′′ and each Si′ were
processed by an X-step after their absolute values had reached k + 2 + kδ. Hence,
by Corollary 2, just before the Y-step t, all the final |Si′ | registers of the pair
of columns containing Si′ store the values k + 2 + kδ or greater and the pair
of columns containing S ′′ has all the initial |S ′′ | registers filled with the values
−(k + 2 + kδ) or less. Thus, e is stored in one of its remaining 2 · 2k − |S ′′ | final
registers and, during the Y-step t, e is compared with a value k + 2 + kδ or
greater and it must remain left-running. ✷
The depth of N is 2d + 2. Each iteration of N performs two Y-steps as its
last steps. Thus the first m Y-steps are performed during the first (2d + 2)⌈m/2⌉
steps. Each iteration of N performs d Z-steps. Thus, the next k ′ (k + 1) + k
Z-steps are performed during the next (2d + 2)⌈(k ′ (k + 1) + k)/d⌉ steps. After the
next X-step, t′ , by Lemma 5, the set S of positive right-running and negative
left-running elements remains fixed. After the next ⌈(k ′ (k + 1) + k)/d⌉ iterations
absolute values of elements in S are k + 2 + kδ or greater. (t′ was the first step
of these iterations.) After the first X-step of the next iteration, by Corollary 2,
in all pairs of columns the negative values precede the positive values. We can
now replace negative values with zeroes, positive values with ones, and, by the
zero-one principle, we have all the pairs of columns sorted. (Note that, by the
definition of N , once all the pairs of columns are sorted, they remain sorted forever.)
We can estimate the number of steps by T ≤ (2d + 2)(⌈m/2⌉ + 2⌈(k ′ (k + 1) +
k)/d⌉) + 1. Recall that d = ⌈k/m⌉. It can be verified that T ≤ 4k^2 + 8mk + 7k +
14m + 6(k/m) + 13. This completes the proof of Theorem 1.
Remarks: Note that the network N1,k can be simplified to a periodic sorting
network of depth 2 log n by removing the Y-steps and merging P0 with P1 .
However, better networks exist [3], with depth log n, that sort in log n iterations.
Note also that the arrangement of the registers in the matrix M can be arbitrary.
We can select the one that is most suitable for the subsequent merging.
Acknowledgments. I would like to thank Miroslaw Kutylowski for his useful
suggestions and comments on this paper.
References
1. M. Ajtai, J. Komlós, and E. Szemerédi. Sorting in c log n parallel steps. Combinatorica, Vol. 3, pages 1–19, 1983.
2. K. E. Batcher. Sorting networks and their applications. Proceedings of the 32nd AFIPS Conference, pages 307–314, 1968.
3. M. Dowd, Y. Perl, L. Rudolph, and M. Saks. The periodic balanced sorting network. Journal of the ACM, Vol. 36, pages 738–757, 1989.
4. M. Kik. Periodic correction networks. Proceedings of Euro-Par 2000, Springer-Verlag, LNCS 1900, pages 471–478, 2000.
5. M. Kik, M. Kutylowski, and G. Stachowiak. Periodic constant depth sorting network. Proceedings of the 11th STACS, Springer-Verlag, LNCS 775, pages 201–212, 1994.
6. M. Kutylowski, K. Loryś, and B. Oesterdiekhoff. Periodic merging networks. Proceedings of the 7th ISAAC, pages 336–345, 1996.
7. M. Kutylowski, K. Loryś, B. Oesterdiekhoff, and R. Wanka. Fast and feasible periodic sorting networks. Proceedings of the 35th IEEE-FOCS, 1994.
8. D. E. Knuth. The Art of Computer Programming. Volume 3: Sorting and Searching. Addison-Wesley, 1973.
9. D.-L. Lee and K. E. Batcher. A multiway merge sorting network. IEEE Transactions on Parallel and Distributed Systems, Vol. 6, pages 211–215, 1995.
10. U. Schwiegelshohn. A short-periodic two-dimensional systolic sorting algorithm. IEEE International Conference on Systolic Arrays, pages 257–264, 1988.
Fast Periodic Correction Networks
Grzegorz Stachowiak
Institute of Computer Science, University of Wrocław,
Przesmyckiego 20, 51-151 Wrocław, Poland
gst@ii.uni.wroc.pl
Abstract. We consider the problem of sorting N -element inputs differing
from already sorted sequences in t entries. To perform this task we construct
a comparator network that is applied periodically. The two constructions for
this problem by previous authors required O(log N + t) iterations of the
network. Our construction requires O(log N + (log log N )^2 (log t)^3 ) iterations,
which makes it faster for t ≫ log N .
Keywords: sorting network, comparator, periodic sorting network.
1 Introduction
Sorting is one of the most fundamental problems of computer science. A classical
approach to sorting a sequence of keys is to apply a comparator network. Apart from a
long tradition, comparator networks are particularly interesting due to hardware
implementations. They can also be implemented as sorting algorithms for parallel computers.
In our approach sorted elements are stored in registers r1 , r2 , . . . , rN . Registers are
indexed with integers or elements of other linearly ordered sets. A comparator [i : j] is
a simple device connecting registers ri and rj (i < j). It compares the keys they contain
and if the key in ri is bigger, it swaps the keys. The general problem is the following. At
the beginning of the computations the input sequence of keys is placed in the registers.
Our task is to sort the sequence of keys according to the linear order of register indices
by applying a sequence of comparators. The sequence of comparators is the same for all
possible inputs. We assume that comparators connecting disjoint pairs of registers can
work in parallel. Thus we arrange the sequence of comparators into a series of layers
which are sets of comparators connecting disjoint pairs of registers. The total time needed
by such a network to sort a sequence is proportional to the number of layers called the
network’s depth.
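As a minimal illustration (a Python sketch with 0-based register indices, not part of the paper), a comparator [i : j] and a network given as a list of layers can be modeled as follows; the depth is simply the number of layers. The 4-register network below is the classical depth-3 sorter, used here only as a small example.

```python
from itertools import permutations

def apply_comparator(keys, i, j):
    """Comparator [i : j] (i < j): swap if the key in register i is bigger."""
    if keys[i] > keys[j]:
        keys[i], keys[j] = keys[j], keys[i]

def apply_network(keys, layers):
    """Apply a comparator network given as a list of layers; each layer is a
    set of comparators on disjoint pairs of registers, so all comparators of
    a layer could work in parallel. The depth is len(layers)."""
    keys = list(keys)
    for layer in layers:
        for i, j in layer:
            apply_comparator(keys, i, j)
    return keys

# A classical depth-3 sorting network for 4 registers.
NET4 = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]

# Every permutation of 4 keys ends up sorted.
assert all(apply_network(p, NET4) == [1, 2, 3, 4]
           for p in permutations([1, 2, 3, 4]))
```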
Much research concerning sorting networks was done in the past. The most famous
results are the asymptotically optimal AKS [1] sorting network of depth O(log N ) and the more
‘practical’ Batcher [2] network of depth ∼ (1/2) log^2 N (from now on all the logarithms are
binary).
Some research was devoted to problems concerning periodic sorting networks. Such
a comparator network is applied not once but many times in a series of iterations. The
input of the first iteration is the sequence to be sorted. The input of the (i + 1)st iteration is
the output of the ith iteration. The output of the last iteration should always be sorted. The
total time needed to sort an input sequence is the product of the number of iterations
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 144–156, 2003.
c Springer-Verlag Berlin Heidelberg 2003
and the depth of the network. Constructing such networks especially of small constant
depth gives hope to reduce the amount of hardware needed to build sorting comparator
networks. It can be done by applying the same small chip many times to sort an input.
We can also view such a network as a building block of a sorting network in which
layers are repeated periodically. The main results concerning periodic sorting networks are
presented in the table:
                      depth   # iterations
DPS [3]               log N   log N
Schwiegelshohn [15]   8       O(√N log N )
KKS [5]               O(k)    O(N^(1/k) )
Loryś et al. [9]      3–5     O(log^2 N )
The last row of this table requires some words of explanation. The paper [9] describes
a network of depth 5, but a later paper [10] reduces this value to 3. The number of
iterations O(log^2 N ) is achieved by periodification of the AKS sorting network, for which
the constant hidden behind the big O is very big. Periodification of the Batcher network requires
fewer iterations for practical sizes of the input, though it requires time O(log^3 N )
asymptotically. It is not difficult to show that 3 is the minimal depth of a periodic sorting
network which requires o(N ) iterations to sort an arbitrary input.
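The iterate-and-feed-back regime can be sketched in Python (an illustration, not from the paper). The depth-2 'brick' network of alternating odd-even transposition layers serves here as the periodic network; it is known to sort any n-element input within ⌈n/2⌉ iterations.

```python
def brick_network(n):
    """Depth-2 periodic network on n registers: one layer of comparators on
    pairs (0,1),(2,3),... and one on pairs (1,2),(3,4),..."""
    odd = [(i, i + 1) for i in range(0, n - 1, 2)]
    even = [(i, i + 1) for i in range(1, n - 1, 2)]
    return [odd, even]

def sort_periodically(keys, network, max_iters):
    """Feed the output of each iteration back as the input of the next one;
    return the keys and the number of iterations used."""
    keys = list(keys)
    for it in range(1, max_iters + 1):
        for layer in network:
            for i, j in layer:
                if keys[i] > keys[j]:
                    keys[i], keys[j] = keys[j], keys[i]
        if keys == sorted(keys):
            return keys, it
    return keys, max_iters

keys, iters = sort_periodically([7, 6, 5, 4, 3, 2, 1, 0], brick_network(8), 8)
assert keys == list(range(8)) and iters <= 4
```

The total sorting time is the number of iterations times the depth of the network, which is exactly the trade-off the table above summarizes.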
A sequence obtained from a sorted one by t changes, each being either a swap between a pair
of elements or a change at a single position, we call t-disturbed. We define a t-correction
network to be a specialized network sorting t-disturbed inputs. Such networks were
designed to obtain a sorted sequence from an output produced by a sorting network
having t faulty comparators [14,11,16]. There are also other potential applications in
which we have to deal with sequences that differ not much from a sorted one. Let us
consider a large sorted database with N entries. In some period of time we make t
modifications of the database and want to have it sorted back. It can be more effective to
use a specialized correction unit in such a case, than to apply a sorting network. Results
concerning such correction networks are presented in [4,16].
There was some interest in constructing periodic comparator networks of a constant
depth, that sort t-disturbed inputs. The reason is that the fastest known constant depth
periodic sorting networks have running time O(log^2 N ). On the other hand, in some
applications faster correction networks can replace sorting networks. Two periodic
correction networks were already constructed by Kik and Piotrów [6,12]. The first of them
has depth 8 and the other has depth 6. Both of them require O(log N + t) iterations
for the considered inputs, where N is the input size and t is the number of modifications. The
running time is O(log N ) for t = O(log N ) and the constants hidden behind the big O
are small. Unfortunately, it is not known how fast these networks complete sorting if
t ≫ log N .
In this paper we construct a periodic t-correction network to deal with t satisfying
log N ≪ t ≪ N . The reason we assume that t is small in comparison to N is the following.
If t is about the same as N , then the periodification scheme gives a practical periodic
sorting network of depth 3 requiring O(log^3 N ) = O(log^3 t) iterations. Actually, we do
not hope to get better performance in such a case. Our network has depth 3 and running
time O(log N + (log log N )^2 (log t)^3 ). We should mention that in our construction
we do not use the AKS sorting network. If this network were used (also in the auxiliary
construction of a non periodic t-correction network), we would get the running time
O(log N + (log log N )(log t)^2 ). In that case the AKS constant would stand in front of
(log log N )(log t)^2 .
Now we recall a couple of useful properties of comparator networks. The first
of them is a general property of all comparator networks. Let us assume we have two
inputs for a fixed comparator network. We say that we have relation (x1 , x2 , . . . , xN ) ≤
(y1 , y2 , . . . , yN ) between these inputs if for all i we have xi ≤ yi .
Lemma 1.1. If we apply the same comparator network to inputs for which we have
(x1 , x2 , . . . , xN ) ≤ (y1 , y2 , . . . , yN ) then this relation is preserved for the outputs.
The analysis of sorting networks is most often based on the following lemma [7]:
Lemma 1.2 (zero–one principle). A comparator network is a sorting network if and
only if it can sort any input consisting only of 0s and 1s.
This lemma is the reason why, from now on, we consider inputs consisting only of 0s
and 1s. Thus we consider only t-disturbed sequences consisting of 0s and 1s. We note
that a 0-1 sequence x1 , . . . , xN is t-disturbed if for some index b, called the border, at most
t entries in x1 , . . . , xb are 1s and at most t entries in xb+1 , . . . , xN are 0s. These 1s (0s)
we call displaced.
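This characterization of t-disturbed 0-1 sequences is easy to check directly; a Python sketch (an illustration, not from the paper):

```python
def is_t_disturbed(seq, t):
    """A 0-1 sequence is t-disturbed iff for some border b at most t entries
    of seq[:b] are 1s (displaced 1s) and at most t entries of seq[b:] are 0s
    (displaced 0s)."""
    n = len(seq)
    return any(seq[:b].count(1) <= t and seq[b:].count(0) <= t
               for b in range(n + 1))

assert is_t_disturbed([0, 0, 1, 1], 0)      # a sorted sequence is 0-disturbed
assert is_t_disturbed([0, 1, 0, 1], 1)      # one displaced 1 and one displaced 0
assert not is_t_disturbed([1, 1, 0, 0], 1)  # this one needs t = 2
assert is_t_disturbed([1, 1, 0, 0], 2)
```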
Let us recall the proof of the zero-one principle. The input consists of arbitrary
elements. We prove that the comparator network sorts it. We consider an arbitrary element a from
this input and show that it gets to the register corresponding to its rank in the sequence.
We replace the elements bigger than a by 1 and the smaller ones by 0. Indeed, the only difference
between the outputs for the sequences where a is replaced by 0 or 1, respectively, is the register
with the index corresponding to rank(a).
Now we deal with an arbitrary t-disturbed input. We transform it to a t-disturbed 0-1
sequence as in the proof of zero–one principle. This gives us a useful analog of zero-one
principle for t-correction networks.
Lemma 1.3. A comparator network is a t-correction network if it can sort any
t-disturbed input consisting of 0s and 1s.
We define the dirty area of a 0-1 sequence stored in the registers during the computations
of a comparator network. The dirty area is the smallest set of subsequent registers such that
all registers with lower indices contain 0s and all registers with bigger indices contain
1s. A specialized comparator network that sorts any input having a dirty area of a given
size we call a cleaning network.
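For a 0-1 sequence the dirty area can be computed directly; a minimal Python sketch (an illustration, not from the paper):

```python
def dirty_area_size(seq):
    """Size of the smallest run of subsequent registers such that all
    registers below it contain 0s and all registers above it contain 1s;
    a sorted sequence has dirty area 0."""
    ones = [i for i, x in enumerate(seq) if x == 1]
    zeros = [i for i, x in enumerate(seq) if x == 0]
    if not ones or not zeros or ones[0] > zeros[-1]:
        return 0          # already sorted
    return zeros[-1] - ones[0] + 1

assert dirty_area_size([0, 0, 1, 1]) == 0
assert dirty_area_size([0, 0, 1, 0, 1, 1]) == 2
assert dirty_area_size([1, 0, 0, 0]) == 4
```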
2 Periodic Sorting Networks
In this section we recall the periodification scheme of [9]. Actually, what we present is
closer to the version of this scheme described by Oesterdiekhoff [10], which produces
a network of depth 3. In comparison to previous authors we change the construction of the
Schwiegelshohn edges and embed only single copies of the sorting and merging networks.
Fast Periodic Correction Networks
147
The analysis of the network is almost the same as in the abovementioned papers and we do
not show it.
The periodification scheme is a method to convert a non periodic sorting network
having T (p) layers for input size p into a periodic sorting network of depth 3. This
periodic network sorts any input containing Θ(pT (p)) items in O(T (p) log p) iterations.
We take advantage of the fact that for any sorting network T (p) = Ω(log p). The
periodification scheme applied to the Batcher sorting network gives a periodic sorting network
which needs O(log^3 N ) iterations to sort an arbitrary input of size N . If we put the AKS
sorting network into this scheme, we get a periodic sorting network requiring O(log^2 N )
iterations, which is (due to the very large constants in AKS) a worse solution for practical N .
In the periodification scheme registers are indexed with pairs (i, j), 1 ≤ i ≤ p, 1 ≤
j ≤ q ordered lexicographically. Thus we view these registers as arranged in rectangular
matrix p × q of p rows and q columns. We have the rows with the smallest indices i at the ‘top’
and those with the biggest indices at the ‘bottom’ of the array. We also view the columns with
smallest indices j to be on the left hand side and those with biggest indices to be on the
right hand side. The parameter q = 10(T (p) + log p) is an even number (for simplicity
from now on we write log p instead of ⌈log p⌉).
The periodic sorting network consists of three subsequent layers A, B and C. The
layers A and B which are layers of odd-even transposition sort network are called
horizontal steps. They are sets of comparators:
A = {[(i, 2j − 1) : (i, 2j)]|i, j = 1, 2, . . .}
B = {[(i, 2j) : (i, 2j + 1)]|i, j = 1, 2, . . .} ∪ {[(i, q) : (i + 1, 1)]|i = 1, 2, . . .}
The edges of A and B connecting registers of the same row we call horizontal. The
layers A, B alone sort any input but in general the time to do it is very long.
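The horizontal layers A and B can be generated mechanically. The Python sketch below (an illustration, not from the paper; the mapping of a register (i, j) to the 0-based linear index (i−1)q + (j−1) is ours) builds both layers. Under this linearization, A and B together are exactly the odd-even transposition layers on the register sequence, which is why they alone sort, albeit slowly:

```python
def layers_AB(p, q):
    """Layers A and B for a p x q register matrix; q must be even.
    Comparators are pairs of 0-based linear indices under the
    lexicographic order of (row, column)."""
    assert q % 2 == 0
    r = lambda i, j: (i - 1) * q + (j - 1)
    A = [(r(i, 2 * j - 1), r(i, 2 * j))
         for i in range(1, p + 1) for j in range(1, q // 2 + 1)]
    B = [(r(i, 2 * j), r(i, 2 * j + 1))
         for i in range(1, p + 1) for j in range(1, q // 2)]
    B += [(r(i, q), r(i + 1, 1)) for i in range(1, p)]   # wrap-around edges
    return A, B

def run(keys, p, q, rounds):
    """Apply A then B repeatedly; p*q rounds are always enough to sort."""
    A, B = layers_AB(p, q)
    keys = list(keys)
    for _ in range(rounds):
        for layer in (A, B):
            for i, j in layer:
                if keys[i] > keys[j]:
                    keys[i], keys[j] = keys[j], keys[i]
    return keys

keys = run([9, 1, 8, 2, 7, 3, 6, 4, 5, 0, 11, 10], 3, 4, 12)
assert keys == sorted(keys)
```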
Defining layer C, called the vertical step, is much more complicated. We first divide the
columns of registers into six subsequent areas: S, ML , XL , Y, XR , MR . Each of the areas
contains an even number of columns. The first two columns form the area S, where the so-called
‘Schwiegelshohn’ edges are located. The columns with numbers 3, 4, . . . , 2 log p + 2 are
in the area ML . The next 2T (p) columns form the area XL . The last 2 log p columns are contained
in the area MR . The area XR consists of the 2T (p) columns directly preceding MR , and the area
Y contains all the columns between XL and XR . We now say where the comparators of
layer C are in each area.
In area S the comparators form the set
{[(2i − 1, 1) : (2i, 2)]|i = 1, 2, . . .}
Note that this way of defining the “Schwiegelshohn” edges differs from the one described in
previous papers on this subject. Comparators of C in all other areas, unlike those in S,
always connect registers in the same column. There are no comparators of layer C in
area Y.
In each of the areas ML and MR we embed a single copy of a network which merges
two sorted sequences of length p/2. In this network’s input of length p, the even indexed
entries form one sequence and the odd indexed entries form the other. We also assume that
the sequence in the odd indexed entries does not have more 1s than the one contained in the even
indexed entries.

[Fig. 1. Areas defined to embed the C-layer: S, ML (2 log p columns, merging network in odd
columns), XL (2T (p) columns, sorting network in even columns), Y, XR (2T (p) columns,
sorting network in odd columns), MR (2 log p columns, merging network in even columns);
the array has p rows. Arrows indicate the order of layers of the embedded networks.]

A comparator network merging two such sequences is the series of
layers L1 , L2 , . . . , Llog p−1 where

Li = {[2j : 2j + 2^(log p−i) − 1] | j = 1, 2, . . .}.
Thus the set of comparators in ML is equal to
{[(k, 2j + 1) : (l, 2j + 1)]|[k : l] ∈ Lj , j = 1, 2, . . .}.
The set of comparators in MR is equal to
{[(k − 1, q − 2j + 2) : (l − 1, q − 2j + 2)] | [k : l] ∈ Lj , j = 1, 2, . . .}.
For technical reasons the network embedded in MR is moved one row up.
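The merging layers Li can be verified by brute force on a small p. The Python sketch below (an illustration, not from the paper; registers are 1-based as in the text) generates the layers and checks, for p = 8, every pair of interleaved sorted 0-1 sequences satisfying the assumption that the odd indexed entries carry no more 1s than the even indexed ones:

```python
import math

def merge_layers(p):
    """Layers L_1, ..., L_{log p - 1}; a comparator [k : l] with k < l
    orders the keys in registers k and l."""
    logp = round(math.log2(p))
    return [[(2 * j, 2 * j + 2 ** (logp - i) - 1)
             for j in range(1, p // 2 + 1)
             if 2 * j + 2 ** (logp - i) - 1 <= p]
            for i in range(1, logp)]

p = 8
layers = merge_layers(p)
for a in range(p // 2 + 1):            # number of 1s in the odd entries
    for b in range(a, p // 2 + 1):     # at least as many 1s in the even entries
        keys = [None] * (p + 1)        # 1-based registers, keys[0] unused
        keys[1::2] = [0] * (p // 2 - a) + [1] * a   # sorted sequence, odd entries
        keys[2::2] = [0] * (p // 2 - b) + [1] * b   # sorted sequence, even entries
        for layer in layers:
            for k, l in layer:
                if keys[k] > keys[l]:
                    keys[k], keys[l] = keys[l], keys[k]
        assert keys[1:] == sorted(keys[1:])        # merged: 0s before 1s
```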
Finally we define the comparators in XL and XR . These comparators are an embedding
of a single sorting network in each area. Let this sorting network be the series of layers
L′1 , L′2 , . . . , L′T (p) . Let jL = 2 + 2 log p + 2T (p) be the last column of XL . The set of
comparators in XL is equal to
{[(k, jL − 2(j − 1)) : (l, jL − 2(j − 1))]|[k : l] ∈ L′j , j = 1, 2, . . .}.
Analogously, if jR = q − 2 log p − 2T (p) + 1 is the first column of XR , then the set of
comparators in XR is equal to
{[(k, jR + 2(j − 1)) : (l, jR + 2(j − 1))]|[k : l] ∈ L′j , j = 1, 2, . . .}.
The edges connecting registers in the same column we call vertical. Almost all the
edges of step C are vertical. Only the slanted edges in S are not vertical.
Our aim in the analysis of the network obtained in periodification scheme is to prove
that it sorts any input in O(T (p) log p) steps. The proof easily follows from the key
lemma
Lemma 2.1 (key lemma). There exist constants c and d such that after d · q steps
– the bottom c · p rows contain only 1s if there are more 1s than 0s in the registers;
– the top c · p rows contain only 0s if there are more 0s than 1s in the registers.
Indeed if we consider only the rows containing dirty area in the key lemma, then this
area is guaranteed to be reduced by a constant factor within O(q) steps. Thus applying
the key lemma O(log p) times we reduce this area within O(q log p) steps to a constant
number of rows. Next O(q) steps sort such a reduced dirty area.
We do not describe the proof of the key lemma, but define some notions from it to be used
further in the paper. In this proof it is assumed that two 1s or two 0s compared by
a horizontal edge are exchanged. At a given moment of the computations we call an item
(i.e. a 0 or a 1) right-running (left-running) if it is placed in the right (left) register by a
horizontal edge of the most recently executed horizontal step. We can extend this definition
to the wrap-around edges of layer B in a natural way, saying that they put right-running
items in the first column and left-running items in the last. A column containing
right-running (left-running) items is called an R-column (L-column). Analyzing the network we
can observe ‘movement’ of R-columns of 1s to the right and L-columns of 0s to the left.
Thus any column is alternately an L-column and an R-column, and the change occurs during
every horizontal step. The only exceptions are the two columns of S. From the proof of the key
lemma it also follows that we have the following property:
Fact 2.2 Assume we add any vertical edges to the layer C in area Y . For such a new
network the key lemma still holds.
Now we modify the periodification scheme step by step to obtain at the end a periodic
t-correction network. First we introduce a construction of a periodic cleaning network
sorting any N -element input with a dirty area of size qt, q ≥ 10(T (2t) + 2 log t). In
this construction the registers are arranged into q columns and the dirty area is contained in t
subsequent rows. This network needs O(q log t) iterations to do its job. The construction
of the periodic correction network is based on this cleaning network. We first build a simple
non periodic cleaning network:
Lemma 2.3. Assume we have a sorting network of depth T (t) for input size t. We can
construct a comparator network of depth T ′ (t) = T (2t) + log t which sorts any input
with dirty area of size t.
Proof. We divide the set of all registers r1 , r2 , . . . , rN into N/t disjoint parts, each
consisting of t subsequent registers. Thus we obtain the part P1 containing registers r1 , . . . , rt ,
P2 containing registers rt+1 , . . . , r2t , P3 containing registers r2t+1 , . . . , r3t , and so on.
The cleaning network consists of two steps. First we have networks sorting the keys in
P2i ∪ P2i+1 for each i. This requires T (2t) layers. Then we have networks merging the
elements in P2i−1 with those in P2i for each i. This requires log t layers of the network.
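The block structure of this two-step network can be sketched in Python, with sorted() standing in for the embedded sorting and merging comparator subnetworks (so this models the structure of the construction, not its comparator-level depth; an illustration, not from the paper):

```python
from itertools import product

def clean(keys, t):
    """Two-step cleaning structure: sort each union of parts P_2i and
    P_2i+1, then merge each union of parts P_2i-1 and P_2i, where the
    parts P_1, P_2, ... are consecutive blocks of t registers."""
    N = len(keys)
    assert N % t == 0
    keys = list(keys)
    for s in range(t, N - t, 2 * t):      # step 1: blocks [t, 3t), [3t, 5t), ...
        keys[s:s + 2 * t] = sorted(keys[s:s + 2 * t])
    for s in range(0, N, 2 * t):          # step 2: blocks [0, 2t), [2t, 4t), ...
        keys[s:s + 2 * t] = sorted(keys[s:s + 2 * t])
    return keys

# Every 0-1 input whose dirty area fits in t consecutive registers gets sorted:
# the dirty window always lies inside some union treated in step 1 or step 2.
N, t = 12, 3
for s in range(N - t + 1):
    for bits in product([0, 1], repeat=t):
        seq = [0] * s + list(bits) + [1] * (N - s - t)
        assert clean(seq, t) == sorted(seq)
```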
Now we can build a periodic cleaning network. We do it by substituting the sorting network
in the periodification scheme with the cleaning network described above. This way we
can reduce XL and XR to 2T ′ (t) columns. We also reduce ML and MR to 2 log t
columns, by embedding only the log t last layers of the merging network instead of the whole
merging network applied in the periodification scheme. These layers are (after relabeling)
L1 , L2 , . . . , Llog t where

Li = {[2j + 1 : 2j + 2^(log t−i+1) ] | j = 1, 2, . . .}.
They merge any two sequences that do not differ by more than t/2 1s. So instead of a
sorting network we use a cleaning one, and we reduce the merging network. Such reduced
sorting and merging networks are not distinguishable from the original merging and sorting
networks if we deal only with inputs having dirty areas of size at most qt. The analysis
of such a periodification scheme for cleaning networks is the same as the original one
for sorting networks and gives us the following lemma:
Lemma 2.4. The periodic cleaning network described above has depth 3 and sorts any
input with dirty area having t rows in O(q log t) iterations.
One can notice that there are no edges of layer C in Y in this construction. If we
add to layer C any vertical edges in Y , or any other edges with the difference between the row
numbers of the end registers bigger than t, then the network remains a cleaning network.
Roughly speaking, by adding such edges we are going to transform the periodic cleaning
network into a periodic t-correction network.
3 Main Construction
In this section we define our periodic t-correction network. To do it we need another
non periodic comparator network. We call it a (t, ∆, δ)-semi-correction network. If a
t-disturbed input with a dirty area of size ∆ is processed by such a network, then the dirty
area size is guaranteed to be reduced to δ. Now we present a rather unoptimized construction
of a (t, ∆, δ)-semi-correction network.
We divide the set of all registers r1 , r2 , . . . , rN into N/∆ disjoint parts each consisting
of ∆ subsequent registers. Thus we obtain part P1 containing registers r1 , . . . , r∆ , P2
containing registers r∆+1 , . . . , r2∆ , P3 containing registers r2∆+1 , . . . , r3∆ , and so on.
The construction consists of two steps. In step 1 we give new indices to the registers of
each sum P2k ∪ P2k+1 , k = 1, 2, . . .. These indices are lexicographically ordered pairs
(i, j), 1 ≤ i ≤ 2t∆/δ, 1 ≤ j ≤ δ/(2t). The ordering of the new indices is the same as the
main ordering of indices. We apply a t-correction network to each column j of each sum
separately. This way we obtain a dirty area of size at most δ in each sum. In step 2 we
repeat the construction from step 1 for the sums P2k−1 ∪ P2k . Because any dirty area of size
∆ is contained in one of the sums Pl ∪ Pl+1 from step 1 or 2, this dirty area is reduced
to size δ. Thus we get the following lemma:
Lemma 3.1. Let t ≪ δ and t ≪ ∆/δ. There exists a (t, ∆, δ)-semi-correction network
of depth O(log x + (log t log log x)^2 ), where x = ∆/δ.
Proof. A description of t-correction networks of depth O(log N + (log t log log N )^2 )
(N is the input size) can be found in [4,16]. We apply such a network in the construction
presented above and obtain a semi-correction network with the desired depth. Simple
calculations are left to the reader.
Now at last we get to the main construction of this paper. We assume
that log N ≪ t ≪ N and want to construct an efficient periodic t-correction
network. Without loss of generality we assume that t is even. Let S(N, t) =
O(log N/ log t + (log log N )^2 (log t)^2 ) be the maximum depth of a (t, ∆, δ)-semi-correction
network for x = ∆/δ = N^(1/ log t) . As before, T (t) is the depth of a sorting
network. In our construction the registers are arranged into an array of q columns and
N/q rows, where
q = max {10(T (4t + 4) + 2 log t), 4(T (4t + 4) + 2 log t) + 2S(N, t)}
The rows of this array are divided into N/pq floors, which are sets of p = 4t + 4 subsequent
rows. So floor 1 consists of rows 1, 2, . . . , p, floor 2 of rows p + 1, p + 2, . . . , 2p, and
so on. We use the notions of ‘bottom’ and ‘top’ registers from the proof of the key lemma.
Thus we divide each floor into two halves: top and bottom. They consist of the p/2 = 2t + 2
top and bottom rows of each floor, respectively. We define a family of floors to be the
set of all floors whose indices differ by i · log t for some integer i. Altogether we have
log t families of floors. To each family of floors we assign the index of its first floor.
From now on we deal only with t-disturbed 0-1 input sequences. Any
such sequence has a border index b. The b-th register we call the border register. Its
row we call the border row. Its floor we call the border floor. In the analysis we take into
account only the behavior of the displaced 1s. Due to the symmetry of the network the analysis
for the displaced 0s is the same and can be omitted.
We begin with defining a particular kind of periodic cleaning network, which the
whole construction is based on. By adding comparators to this network we finally obtain
a periodic t-correction network. The periodic cleaning network is constructed in a
similar way to the one in the previous section.
Above all we want to have some relation between vertical edges in areas XL and XR
and the division of rows into floors. These comparators are embeddings of a cleaning
network for dirty area p/2 = 2t + 2 in each area. Note that such a network also sorts any
input with dirty area of size t, so can be used in the construction of periodic cleaning
network for t dirty rows. The cleaning network consists of three subsequent parts. The
first part consists of sorting networks – each sorting a group of p subsequent registers corresponding to a single floor. This part has depth T (p). The second part consists of merging
networks which merge neighboring upper and lower halves of each pair of subsequent
groups from the first part. It has depth log p. The third part is the last layer which we
can add arbitrarily, because any layer of comparators does nothing to a sorted sequence.
This layer is defined a bit later in the paper.
Parts S, ML , MR are defined exactly the same way as earlier for a periodic cleaning
network. So, as we previously proved, the periodic network we have now defined is a cleaning
network for dirty areas consisting of at most t rows, and the following key lemma
describing its running time holds:

Lemma 3.2 (key lemma). We consider t′ , t′ ≤ t, subsequent rows of the above defined
network, such that above (below) these rows there are only 0s (1s). Assume we have a majority
of 0s (1s) in these rows. There exist constants c and d such that after d · q steps the top
(bottom) c · t′ of these t′ rows contain only 0s (1s).
Note that if we add to C any edges in Y , or edges connecting rows whose difference is
bigger than t, then the key lemma still holds, and so all its consequences hold too. We
prove the following lemma:
Lemma 3.3. The periodic cleaning network described above sorts considered inputs
with dirty area having a · t rows in O(qa + q log t) iterations.
Proof. If the number of rows in the dirty area is smaller than t, then a standard reasoning
for periodic sorting networks works. We only need to consider what happens if the
number of rows in the dirty area is bigger than t. If at least the t/2 highest rows of the
dirty area are above the border row, then we can apply the key lemma to these rows. Since the
input is t-disturbed, we have a majority of 0s in these rows. So we obtain ct/2 top rows
of 0s in time dq. Thus the dirty area is reduced by ct/2 rows. In the opposite case an
analogous reasoning can be applied to the t/2 lowest rows, where we have a majority of 1s.
Now we add some comparators to layer C so that our network gains new properties.
First we add in area S comparators
{[(2i, 1) : (2i + t + 1, 2)] | i = 1, 2, . . .}.
To formulate the fact which follows from the presence of these comparators we must
specify what exactly we mean by right-running items. In the proof of the key lemma
right-running items were those 0s and 1s which were on the right side of a horizontal edge after
step A or B. We redefine it, saying that in area S right-running items go right in step C
instead of step A, that is, just after this step. Analogously we can redefine left-running
items. We assume that two displaced 1s or two 0s are swapped by an edge if this is an
edge of step B, or a slanted edge of step C, or an edge of step A not belonging to area
S. Displaced 1s are not swapped with non-displaced 1s. We can now formulate a simple
property of our network that is preserved when we add edges:
Fact 3.4 In the network defined above right-running displaced 1s remain right-running
as long as they are more than t + 1 rows above border row.
Now for a while we assume that we deal only with displaced 1s that are more than
one floor above the border. We recall that R-columns and L-columns after a given step
are the columns containing right-running and left-running items, respectively. We can note
that an R-column which gets to the column jR while moving through XR is first sorted
separately on each floor by the first part of the cleaning network. Next, the displaced 1s
from each floor go half a floor down by the second part. An analogous process is also
performed for the left-running 1s in XL as long as they remain left-running.
Thus after the second part of their way through XR right-running displaced 1s are
located at the bottom of the top half of each floor above the border floor. Analogously
left-running displaced 1s are also moved just before the last layer embedded in XL to
the bottom of the top half of each floor.
We should now specify what the additional layer in the third part of XR does. Formally speaking, this layer is the set of comparators
{[(kp + p/2 + 2i) : (kp + p/2 − 2i + 1)] | 0 < i ≤ t/2}.
Fast Periodic Correction Networks
153
It moves right-running displaced 1s that went through XR to odd indexed rows in the
middle of each floor. Analogously the last layer embedded in XL is
{[(kp + p/2 + 2i − 1) : (kp + p/2 − 2i)] | 0 < i ≤ ⌈t/2⌉}.
It moves left-running displaced 1s to even indexed rows in the middle of each floor.
Let us call these rows, for a while, the starting rows of these 1s. We can see that all these right-running displaced 1s then pass MR, S, ML and XL without being moved by vertical edges in MR, ML, XL. Note that they encounter vertical edges only in ML, and that they are at the bottom of these edges. The same happens to left-running 1s when they pass ML, S, MR and XR. After passing XL, each right-running 1 is t + 2 rows below its starting (odd) row. After passing XR, each left-running 1 is 2 rows above its starting (even) row. These 1s are still on the same floors as their starting rows. Similar facts can be proved for displaced 0s below the border, which are also moved by the last layers of XL and XR described above.
Now we define the vertical edges added in area Y of layer C. These comparators are embeddings of four semi-correction networks in each family of floors. We now describe the comparators embedded in the r-th family of floors. Let a1, a2, . . . , a2N/(q log t) be the indices of the odd rows in this family of floors. We can build a (t, N^(1−(r−1)/log t), N^(1−r/log t))-semi-correction network on the registers with these indices. The depth of this network is not bigger (by the assumption about q) than the number of odd indexed columns in Y. Let this network be the sequence of layers L1, L2, L3, . . .. The first set of comparators is
{[(jL + 2j − 1, k) : (jL + 2j − 1, l)] | [k, l] ∈ Lj}.
We assumed that after passing XL right-running 1s are in odd rows. Assume that they can be present only in the N^(1−(r−1)/log t) odd rows of the r-th family directly above the border. When they pass Y, they can be present only in the N^(1−r/log t) odd rows of the r-th family directly above the border. Passing Y in family r = log t finally causes these 1s to get to some of the t odd rows of this family directly above the border. We formulate this assertion as a fact later, because we need some additional assumptions. Analogously, we can embed the same network once again to deal with left-running 1s that are in even rows. Formally speaking, we add to C the following set of comparators:
{[(jR − 2j + 1, k + 1) : (jR − 2j + 1, l + 1)] | [k, l] ∈ Lj}.
This set of edges again causes left-running 1s which are in the N^(1−(r−1)/log t) even rows of the r-th family directly above the border to reduce the number of such rows between these 1s and the border to at most N^(1−r/log t). Analogously, we also embed two copies of a (t, N^(r/log t), N^((r−1)/log t))-semi-correction network to deal with displaced 0s below the border row.
We have described the whole network and, informally, the way it works. To make the analysis more formal, we assign colors to displaced 1s. We use five colors: blue, black, red, yellow and green. Let β be the index of the border floor. We assume the following rules of coloring displaced 1s:
– At the beginning the color of all displaced 1s is blue.
154
G. Stachowiak
[Figure: a single floor consists of sorting networks in odd and even columns, reduced merging networks in odd columns, and merging networks in odd and even columns, arranged in the areas ML, S, Y and MR.]
Fig. 2. Comparator networks embedded on a single floor.
– If a blue 1 is compared with a non-blue 1 by a vertical edge, then the blue 1 behaves
like a 0.
– When any 1 gets to the floor with the index not smaller than β − 1, it changes its
color to green.
– When a right-running non-blue 1 gets to the floor β − 2, it changes its color to green.
– When a non-green left-running 1 becomes right-running, it becomes blue.
– When a blue 1 gets from Y to outside of Y it changes its color to black.
– When a black 1 enters Y from outside of Y on the floor belonging to the family 1,
then it changes its color to red.
– When a red 1 leaves Y on the floor in the last family of floors (family log t), then it
becomes yellow.
First we prove that all green 1s stay close to the border. They remain at all times on floors with indices not smaller than β − 2, so they are not more than 13t rows above the border row. We notice that right-running 1s can go only to lower rows. Left-running 1s can go to higher rows only in area S of layer C and by the wrap-around edges of layer B. So only left-running 1s can go up from floor β − 2. In each q horizontal steps, a left-running 1 can go up by at most t + 2 rows. But on the other hand, in each q horizontal steps it passes XL once. Passing XL, it goes to the t-th row or lower, counting from the bottom of floor β − 2. Thus, going up by not more than t + 2 rows, it cannot leave floor β − 2. Moreover, because our network is a periodic correction network for a dirty area of t rows, we have the following fact:
Fact 3.5 If all displaced 1s are green, then the time to sort all the items above the border
is O(q log t).
Now we consider a right-running blue 1, or a left-running blue 1 assuming it stays left-running. From what we said before, a right-running 1 stops being right-running only when it is green. We want to see how quickly it becomes green. After O(q) steps this 1 stops being blue. The worst case is that it becomes black. The following fact can be
viewed as a summary of what we said when defining the comparators of the last column of XL and XR. We take advantage of the fact that right-running 1s that have just changed from being left-running above floor β − 1 are blue. We also take advantage of the fact that right-running 1s which are more than t rows above the border do not become left-running. Such 1s are the only factor that could prevent the 1s we are interested in from going one floor down.
Fact 3.6 Any black, red or yellow right-running 1 on a floor higher than β − 1, passing areas XR, MR, S, ML, XL, goes one floor down and ends up in an odd indexed row. Any black, red or yellow left-running 1 on a floor higher than β, passing areas XL, ML, S, MR and XR, goes one floor down and ends up in an even indexed row.
The comparators in Y connect only the rows belonging to the same family of floors.
So passing Y a displaced 1 does not change its family of floors. Thus we have the next
fact.
Fact 3.7 Every q horizontal steps, a black or red 1 gets from family r to family r + 1. The exception is family r = log t, from which it gets to family 1.
So after at most q log t horizontal steps a black 1 becomes red, unless it turns green first. We measure the distance of a red 1 that is in family r to the border as the number of rows that belong to family r and lie between this 1 and floor β − 2. Passing Y in family 1, a red 1 reduces this distance from at most N to at most 2N^(1−1/log t). Then it gets to families 2, 3, . . . , log t − 1. Passing Y in family r, a red 1 reduces this distance from 2N^(1−(r−1)/log t) to 2N^(1−r/log t). Then, after passing Y in family log t, a red 1 is at distance at most 2t. This way a red 1 becomes yellow after q log t horizontal steps. Now it is at most log t + 2 floors above the border. A yellow right-running 1 goes at least one floor down every q horizontal steps, until it becomes green after at most q log t horizontal steps.
This whole process of color change from blue to green takes altogether 3q log t horizontal steps. It always succeeds for right-running 1s. Left-running 1s can switch to being right-running before they become green. They have to do so within the 3q log t horizontal steps in which they would have become green had they remained left-running the whole time. In such a case they inevitably become green after the next 3q log t iterations as right-running 1s. Thus we have the following fact.
Fact 3.8 All displaced 1s become green after at most 6q log t horizontal steps.
Because inputs having only green 1s are quickly sorted, we get the main result of the paper.
Theorem 3.9. The periodic t-correction network we defined in this paper sorts any t-disturbed input in O(q log t) iterations, which is equal to
O(log N + (log log N)^2 (log t)^3).
Acknowledgments. The author wishes to thank Marek Piotrów and other coworkers from the algorithms and complexity group of his institute for helpful discussions.
References
1. M. Ajtai, J. Komlós, E. Szemerédi, Sorting in c log n parallel steps, Combinatorica 3 (1983), 1–19.
2. K.E. Batcher, Sorting networks and their applications, in AFIPS Conf. Proc. 32 (1968), 307–314.
3. M. Dowd, Y. Perl, M. Saks, L. Rudolph, The Periodic Balanced Sorting Network, Journal of the ACM 36 (1989), 738–757.
4. M. Kik, M. Kutyłowski, M. Piotrów, Correction Networks, in Proc. of 1999 ICPP, 40–47.
5. M. Kik, M. Kutyłowski, G. Stachowiak, Periodic constant depth sorting networks, in Proc. of the 11th STACS, 1994, 201–212.
6. M. Kik, Periodic Correction Networks, in EUROPAR 2000 Proceedings, LNCS 1900, 471–478.
7. D.E. Knuth, The Art of Computer Programming, Vol. 3, 2nd edition, Addison-Wesley, Reading, MA, 1975.
8. J. Krammer, Lösung von Datentransportproblemen in integrierten Schaltungen, Dissertation, TU München, 1991.
9. K. Loryś, M. Kutyłowski, B. Oesterdiekhoff, R. Wanka, Fast and Feasible Periodic Sorting Networks of Constant Depth, in Proc. of 35th IEEE-FOCS, 1994, 369–380.
10. B. Oesterdiekhoff, On the Minimal Period of Fast Periodic Sorting Networks, Technical Report TR-RI-95-167, University of Paderborn, 1995.
11. M. Piotrów, Depth Optimal Sorting Networks Resistant to k Passive Faults, in Proc. 7th SIAM Symposium on Discrete Algorithms (1996), 242–251 (also accepted for SIAM J. Comput.).
12. M. Piotrów, Periodic Random-Fault-Tolerant Correction Networks, in Proceedings of 13th SPAA, ACM 2001, 298–305.
13. L. Rudolph, A Robust Sorting Network, IEEE Transactions on Computers 34 (1985), 326–336.
14. M. Schimmler, C. Starke, A Correction Network for N-Sorters, SIAM J. Comput. 18 (1989), 1179–1197.
15. U. Schwiegelshohn, A short periodic two-dimensional systolic sorting algorithm, in International Conference on Systolic Arrays, Computer Society Press, Baltimore, 1988, 257–264.
16. G. Stachowiak, Fibonacci Correction Networks, in Algorithm Theory – SWAT 2000, M. Halldórsson (Ed.), LNCS 1851, Springer 2000, 535–548.
Games and Networks
Christos Papadimitriou
The Computer Science Division
University of California, Berkeley
Berkeley, CA 94720-1776
christos@cs.berkeley.edu
Abstract. Modern networks are the product of, and arena for, the complex interactions between selfish entities. This talk surveys recent work
(with Alex Fabrikant, Eli Maneva, Milena Mihail, Amin Saberi, and Scott
Shenker) on various instances in which the theory of games offers interesting insights into networks. We study the Nash equilibria of a simple and
novel network creation game in which nodes/players add edges, at a cost,
to improve communication delays. We point out that the heavy tails in
the degree distribution of the Internet topology can be the result of a
trade-off between connection costs and quality of service for each arriving
node. We study an interesting class of games called network congestion
games, and prove positive and negative complexity results on the problem of computing pure Nash equilibria in such games. And we show that
shortest path auctions, which are known to involve huge overpayments
in the worst case, are “frugal” in expectation in several random graph
models appropriate for the Internet.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, p. 157, 2003.
© Springer-Verlag Berlin Heidelberg 2003
One-Way Communication Complexity of
Symmetric Boolean Functions
Jan Arpe⋆, Andreas Jakoby⋆⋆, and Maciej Liśkiewicz⋆⋆⋆
Institut für Theoretische Informatik, Universität zu Lübeck
{arpe,jakoby,liskiewi}@tcs.uni-luebeck.de
Abstract. We study deterministic one-way communication complexity
of functions with Hankel communication matrices. In this paper some
structural properties of such matrices are established and applied to
the one-way two-party communication complexity of symmetric Boolean
functions. It is shown that the number of required communication bits
does not depend on the communication direction, provided that neither
direction needs maximum complexity. Moreover, in order to obtain an
optimal protocol, it is in any case sufficient to consider only the communication direction from the party with the shorter input to the other
party. These facts do not hold for arbitrary Boolean functions in general.
Next, gaps between one-way and two-way communication complexity for
symmetric Boolean functions are discussed. Finally, we give some generalizations to the case of multiple parties.
1 Introduction
The communication complexity of two-party protocols was introduced by Yao in
1979 [15]. The theory of communication complexity evolved into an important
branch of computational complexity (for a general survey of the theory see e.g.
Kushilevitz and Nisan [9]).
In this paper we consider one-way communication, i.e. we restrict the communication to a single round. This simple model has been investigated by several
authors for different types of communication such as fully deterministic, probabilistic, nondeterministic, and quantum (see e.g. [15,12,1,11,3,8,7]). We study
the deterministic setting. One-way communication complexity finds application
in a wide range of areas, e.g. it provides lower bounds on VLSI complexity and
on the size of finite automata (cf. [5]). Moreover, one-way communication complexity of symmetric Boolean functions is connected to binary decision diagrams
by the following observation due to Wegener [14]: The size of an optimal protocol
coincides with the number of nodes at a certain level in a minimal OBDD.
We consider the standard two-party communication model: Initially the parties, called Alice and Bob, hold disjoint parts of input data x and y, respectively.
⋆ Supported by DFG research grant Re 672/3.
⋆⋆ Part of this work was done while visiting International University Bremen, Germany.
⋆⋆⋆ On leave from Instytut Informatyki, Uniwersytet Wrocławski, Wrocław, Poland.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 158–170, 2003.
© Springer-Verlag Berlin Heidelberg 2003
One-Way Communication Complexity of Symmetric Boolean Functions
159
In order to compute a function f (x, y), they exchange messages between each
other according to a communication protocol.
In a (deterministic) one-way protocol P for f , one of the parties sends a
single message to the other party, and then the latter party computes the output
f (x, y). We call P a protocol of type A → B if Alice sends to Bob and of type
B → A if Bob sends to Alice. The size of P is the number of different messages
that can potentially be transmitted via the communication channel according to
P. The one-way communication size S A→B (f ) of f is the size of the best protocol
of type A → B. It is clear that the respective one-way communication complexity
is C A→B (f ) = ⌈log S A→B (f )⌉. For the case when Bob sends messages to Alice,
we analogously use the notation S B→A and C B→A . Note that throughout this
paper, log always denotes the binary logarithm.
The main results of this paper deal with one-way communication complexity
of symmetric Boolean functions – an important subclass of all Boolean functions.
A Boolean function F is called symmetric if permuting the input bits does not affect the function value. Some examples of symmetric functions are and, or,
parity, majority, and arbitrary threshold functions. We assume that the input
bits for a given F are partitioned into two parts, one part consisting of m bits
held by Alice and the other part consisting of n bits only known to Bob. As the
function value of a symmetric Boolean function only depends on the number of
1’s in the input (cf. [13]), it is completely determined by the sum of the number
of 1’s in Alice’s input part and the number of 1’s in Bob’s part. Hence for such
functions, we are faced with the problem of determining the one-way communication complexity of a function f : {0, . . . , m} × {0, . . . , n} → {0, 1} associated
to F , where f (x, y) only depends on the sum x + y. Note that S A→B (F ) ≤ m + 1
is a trivial upper bound on the one-way communication size of F .
Let us assume that Alice's input part is at most as large as Bob's (i.e. let m ≤ n). While for arbitrary functions this property does not imply which communication direction admits the better one-way protocols, we show that for symmetric Boolean functions F it does, namely in this case we have
C A→B (F ) ≤ C B→A (F ). Moreover, we prove that if some protocol of type A → B
does not require maximal size, i.e. if S A→B (F ) < m + 1, then both directions
yield the same complexities, i.e. C A→B (F ) = C B→A (F ).
We also present a class of families of symmetric Boolean functions for which
one-way communication is almost as powerful as two-way communication. More
precisely, for any family of symmetric Boolean functions F1 , F2 , F3 . . . with Fm :
{0, 1}2m → {0, 1}, let fm : {0, . . . , m} × {0, . . . , m} → {0, 1} denote the integer
function associated to Fm . We prove that if fm ⊆ fm+1 for all m ∈ N, then either
the one-way communication complexities of F1 , F2 , F3 . . . are almost all equal
to a constant c or the two-way communication complexities of F1 , F2 , F3 . . . are
infinitely often maximal. We show that one can easily test whether the first or the
second case occurs: The two-way communication complexities are infinitely often
maximal if and only if the unary language {0k+ℓ | fm (k, ℓ) = 1, m, k, ℓ ∈ N} is
nonregular.
160
J. Arpe, A. Jakoby, and M. Liśkiewicz
On the other hand, we construct an example of a symmetric Boolean function
having one-way communication complexity exponentially larger than its two-way communication complexity. Finally, we generalize the two-party model to
the case of multiple parties and extend our results to such a setting.
Our proofs are based on the fact that the communication matrix of the
integer function f associated with a symmetric Boolean function F is a Hankel
matrix. In general, the entries of the communication matrix Mf of f are defined
by mi,j = f (i, j). A Hankel matrix is a matrix in which the entries on each
anti-diagonal are constant (equivalently, mi,j only depends on i + j). Hankel
matrices are completely determined by the entries of their first rows and their
last columns. Thus with any (m + 1) × (n + 1)-Hankel matrix H we associate
a function fH such that fH (0), fH (1), . . . , fH (n) compose the first row of H
and fH (n), fH (n + 1), . . . , fH (m + n) make up its last column. One of the main
technical contributions of this paper is a theorem saying that if m ≤ n and
H has less than m + 1 different rows, then fH is periodic on a certain large
interval. We apply this property to the one-way communication size using a
known relationship between this measure and the number of different rows in
communication matrices.
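The correspondence between a Hankel matrix H and its associated function fH can be made concrete in a few lines. The following sketch (names are illustrative, not from the paper) builds the Hankel matrix from a value table of fH and checks the first-row/last-column property:

```python
def hankel_from_f(f, m, n):
    """Build the (m+1) x (n+1) Hankel matrix H with entry H[i][j] = f(i + j)."""
    return [[f(i + j) for j in range(n + 1)] for i in range(m + 1)]

# Values of f_H on [0 .. m+n] for m = 2, n = 3:
vals = [0, 1, 0, 0, 1, 0]
H = hankel_from_f(lambda z: vals[z], 2, 3)
print(H[0])                         # first row:   f_H(0), ..., f_H(n)   -> [0, 1, 0, 0]
print([H[i][3] for i in range(3)])  # last column: f_H(n), ..., f_H(m+n) -> [0, 1, 0]
```

Every anti-diagonal of H is constant because the entry at (i, j) depends only on i + j.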
As a byproduct, we obtain a word combinatorial property: Let w be an
arbitrary string over some alphabet Σ. Then, for m ≤ ⌈|w|/2⌉ and n = |w| −
m + 1, the number of different substrings of w of length n is at most as large
as the number of different substrings of w of length m. Moreover, if the former
number is strictly less than m (note that it can be at most m in general), then the
number of different substrings of length n and the number of different substrings
of length m coincide.
The paper is organized as follows: In Section 2, we introduce basic definitions
and notation. Section 3 deals with the examination of the number of different
rows and columns in Hankel matrices involving certain periodicity properties.
In Section 4, we state some applications of these properties. Then, in Section 5,
we present a class of symmetric Boolean functions with both maximal one-way
and two-way communication complexity, and then we construct a symmetric
Boolean function with an exponential gap between its one-way and its two-way
communication complexity. Finally, in Section 6, we discuss natural extensions
of our results to the case of multiple parties.
2 Preliminaries
For any integers 0 ≤ k < k′ , let [k..k′ ] denote the set {k, k + 1, . . . , k′ }, and
denote [0..k] by [k] for short. By N we denote the set of nonnegative integers.
We consider deterministic one-way communication protocols between Alice and
Bob for functions f : [m] × [n] → Σ, where Σ is an arbitrary (finite or infinite)
nonempty set. More specifically, we assume that Alice holds a value x ∈ [m],
and Bob holds a value y ∈ [n] for some fixed positive integers m and n. Their
aim is to compute the value f (x, y).
One-Way Communication Complexity of Symmetric Boolean Functions
161
Let M(m, n) denote the set of all (m + 1) × (n + 1) matrices M = (mi,j ),
with mi,j ∈ Σ. It will be convenient for us to enumerate the rows from 0 to m
and the columns from 0 to n. For a given function f : [m] × [n] → Σ, we denote
by Mf the corresponding communication matrix in M(m, n).
Definition 1. For a matrix M ∈ M(m, n), define #row(M ) to be the number
of different rows of M , and similarly let #col(M ) be the number of different
columns of M . Furthermore, for any i, j ∈ [m], let i ∼M j denote that the rows
i and j of M are equal.
It is easy to characterize the one-way communication size by #row and #col.
Fact 1. For all m, n ∈ N and for every function f : [m] × [n] → Σ, it holds that
S A→B (f ) = #row(Mf ) and S B→A (f ) = #col(Mf ).
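Fact 1 reduces the one-way communication size to counting distinct rows or columns of the communication matrix; a minimal check on a toy function (the example function and helper names are ours, not from the paper):

```python
def num_distinct_rows(M):
    """S^{A->B}(f) for the communication matrix M of f (Fact 1)."""
    return len({tuple(row) for row in M})

def num_distinct_cols(M):
    """S^{B->A}(f) for the communication matrix M of f (Fact 1)."""
    return len({tuple(col) for col in zip(*M)})

# Toy example: f(x, y) = (x + y) mod 2 on [2] x [2]; M_f is a Hankel matrix.
M = [[(x + y) % 2 for y in range(3)] for x in range(3)]
print(num_distinct_rows(M), num_distinct_cols(M))  # -> 2 2
```

Intuitively, Alice only needs to tell Bob which row-equivalence class her input lies in, so the number of distinct rows is exactly the number of messages needed.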
In this paper we will restrict ourselves to functions f that only depend on the
sum of the arguments. Note that for such functions f the communication matrix
Mf is a Hankel matrix. The problem of finding protocols for such restricted f
arises naturally when one considers symmetric Boolean functions.
Definition 2. Let f : [s] → N, λ ≥ 1 and s1 , s2 ∈ [s] with s1 ≤ s2 − λ. We call
f λ-periodic on [s1 ..s2 ], if for all x ∈ [s1 ..s2 − λ], f (x) = f (x + λ).
Obviously, f is λ-periodic on [s1 ..s2 ] if and only if for all x, x′ ∈ [s1 ..s2 ] with
λ | (x − x′ ), it holds that f (x) = f (x′ ).
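Definition 2 translates directly into an executable check; a small sketch (the helper name is ours):

```python
def is_periodic(f, lam, s1, s2):
    """Check whether f is lam-periodic on the discrete interval [s1..s2],
    i.e. f(x) == f(x + lam) for all x in [s1..s2 - lam]."""
    return all(f(x) == f(x + lam) for x in range(s1, s2 - lam + 1))

vals = [0, 1, 0, 1, 0, 1, 1]
f = lambda z: vals[z]
print(is_periodic(f, 2, 0, 5))  # True:  vals[0..5] alternates with period 2
print(is_periodic(f, 2, 0, 6))  # False: vals[4] = 0 but vals[6] = 1
```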
3 Periodicity of Rows and Columns in Hankel Matrices
This section is devoted to examining the relationship between the number of different rows and the number of different columns in a Hankel matrix. Lemmas
1 through 3 are technical preparations for Theorem 1 which gives an explicit
characterization of a certain periodic behaviour of the function associated with
a Hankel matrix and of the Hankel matrix itself. Theorems 2 and 3 reveal all
possible constellations of values for #row(H) and #col(H) for a Hankel matrix H. The results will be applied to the theory of one-way communication in
Section 4.
Fact 2. Let f : [s] → N be λ-periodic on [s1 ..s2 ] ⊆ [s] and on [t1 ..t2 ] ⊆ [s] such
that s1 ≤ t1 and t1 + λ ≤ s2 . Then f is λ-periodic on [s1 ..t2 ].
Lemma 1. Let H ∈ M(m, n) be a Hankel matrix, m0 , m1 ∈ [m] with m0 < m1 ,
and λ ∈ [1..m1 − m0 ]. Then the following two statements are equivalent:
(a) fH is λ-periodic on [m0 ..m1 + n].
(b) For all x ∈ [m0 ..m1 ] and all k ∈ N such that x + kλ ≤ m1 , x ∼H x + kλ.
Fig. 1. An illustration of Case 1.
Proof. “(a)⇒(b)”: Let x ∈ [m0 ..m1 ] and k ∈ N such that x + kλ ≤ m1 . For all
y ∈ [n], x + y ≥ m0 and x + y + kλ ≤ m1 + n . Since fH is λ-periodic on
[m0 ..m1 + n], we have fH (x + y) = fH (x + kλ + y).
“(b)⇒(a)”: Let x ∈ [m0 ..m1 + n − λ]. We consider two cases. If x ≤ m0 + n,
then fH (x) = fH (m0 + (x − m0 )) = fH (m0 + λ + (x − m0 )) = fH (x + λ) ,
because m0 ∼H m0 + λ by hypothesis. If on the other hand x > m0 + n, then
x − n > m0 and x − n + λ ≤ m1 . By hypothesis, x − n ∼H x − n + λ, and thus
fH (x) = fH (x − n + n) = fH (x − n + λ + n) = fH (x + λ). □
Corollary 1. Let H ∈ M(m, n) be a Hankel matrix and i, j ∈ [m] with i < j.
Then i ∼H j if and only if fH is (j − i)-periodic on [i..j + n].
Corollary 2. Let H ∈ M(m, n) be a Hankel matrix. If fH is λ-periodic on
[m0 ..m1 + n] for some m0 , m1 ∈ [m] with m0 < m1 and some λ ∈ [1..m1 − m0 ],
then #row(H) ≤ m0 + λ + m − m1 , where equality holds if and only if all rows
0, . . . , m0 + λ − 1 and m1 + 1, . . . , m are pairwise different.
Lemma 2. Let H ∈ M(m, n) be a Hankel matrix and m0 , m′0 , i, j ∈ [m] such
that m0 ≤ i < j, m′0 − m0 ≤ n + 1, j − m0 ≤ n + 1, i ∼H j, and m0 ∼H m′0 .
Then fH is (j − i)-periodic on [m0 ..j + n].
Proof. Choose λ = j − i and µ0 = m′0 − m0 . By Corollary 1, fH is
(i) µ0 -periodic on [m0 ..m′0 + n] and
(ii) λ-periodic on [i..j + n].
Let x ∈ [m0 ..j + n − λ]. In order to show that fH (x + λ) = fH (x), we consider:
Case 1: m0 ≤ x < i: Let k ∈ N such that i ≤ x + kµ0 ≤ i + µ0 − 1. We need to
show that
x, x + kµ0 , x + kµ0 + λ, x + λ ∈ [m0 ..m′0 + n]   (1)
and
x + kµ0 , x + kµ0 + λ ∈ [i..j + n]   (2)
in order to apply properties (i) and (ii) to the corresponding elements. Property
(1) follows from m0 ≤ x and x+kµ0 +λ ≤ i+µ0 +λ−1 = j+m′0 −m0 −1 ≤ m′0 +n.
Property (2) is due to i ≤ x + kµ0 and x + kµ0 + λ ≤ j − 1 + µ0 ≤ j + n. Now (cf.
Fig. 1) fH (x) = fH (x + kµ0 ) = fH (x + kµ0 + λ) = fH (x + λ) , where the first
and the last equality follow from properties (1) and (i), and the middle equality
is due to properties (2) and (ii).
Case 2: i ≤ x ≤ j + n − λ: In this case, fH (x) = fH (x + λ) by Corollary 1. □
The following lemma is symmetric to the previous one:
Lemma 3. Let H ∈ M(m, n) be a Hankel matrix and m1 , m′1 , i, j ∈ [m] such
that i < j ≤ m1 , m1 − m′1 ≤ n + 1, m1 − i ≤ n + 1, i ∼H j, and m1 ∼H m′1 .
Then fH is (j − i)-periodic on [i..m1 + n].
Proof. Let H = (hi,j ). We define λ = j − i and H ′ = (h′µ,ν ) ∈ M(m, n) by
h′µ,ν = hm−µ,n−ν for (µ, ν) ∈ [m] × [n], i.e. we rotate H by 180 degrees in
the plane. Clearly, H ′ is again a Hankel matrix. Moreover, we have fH (z) =
fH ′ (m + n − z) for all z ∈ [m + n]. We set m0 = m − m1 , m′0 = m − m′1 ,
i′ = m − j, and j ′ = m − i. Now it is easy to check that H ′ , i′ , j ′ , m0 , and m′0
fulfill the preconditions of Lemma 2 and m + n − x − λ ∈ [m0 ..j ′ + n − λ], thus yielding fH (x + λ) = fH ′ (m + n − x − λ) = fH ′ (m + n − x) = fH (x). □
Theorem 1. Let m ≤ n + 1 and H ∈ M(m, n) be a Hankel matrix with
#row(H) < m + 1. Then there exist λ ∈ [1..n] and m0 , m1 ∈ [m] with
m1 − m0 ≥ λ such that the following two properties hold:
(a) The function fH is λ-periodic on [m0 ..m1 + n].
(b) If i, j ∈ [m] with i < j and i ∼H j, then i, j ∈ [m0 ..m1 ] and λ | (j − i).
Moreover, m0 , m1 and λ can be explicitly determined as follows:
m0 = min{k ∈ [m] | ∃k ′ ∈ [m] with k ′ > k and k ∼H k ′ } ,
m1 = max{k ∈ [m] | ∃k ′ ∈ [m] with k ′ < k and k ∼H k ′ } , and
λ = min{j − i | i, j ∈ [m] with i ∼H j and i < j} .
Proof. Since #row(H) < m + 1, there exist i, j ∈ [m] with i < j such that
i ∼H j. Thus, m0 , m1 and λ are well-defined. Clearly, m1 − m0 ≥ λ. Choose
i0 , j0 ∈ [m] such that i0 ∼H j0 and j0 − i0 = λ. Since m ≤ n, all preconditions of
Lemma 2 and Lemma 3 are satisfied. Thus we conclude that fH is λ-periodic on
both discrete intervals [m0 ..j0 + n] and [i0 ..m1 + n]. Fact 2 now yields property
(a). Now let i, j ∈ [m] with i < j and i ∼H j. Let k ∈ N such that j − i = kλ + r
with 0 ≤ r < λ. By property (a), fH is λ-periodic on [m0 ..m1 + n], and so by
Lemma 1 (note that i + kλ = j − r ≤ j ≤ m1 ), we have i + kλ ∼H i ∼H j. As
r = j − i − kλ < λ and λ is the minimal difference between two equal rows of
different indices, we have r = 0, so λ | (j − i). □
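The explicit formulas for m0, m1 and λ in Theorem 1 can be evaluated mechanically. The sketch below (helper names are ours) does so for a small 2-periodic example and confirms the resulting row count m0 + λ + m − m1 stated in Corollary 3 below:

```python
def theorem1_params(H):
    """Compute m0, m1 and lambda of Theorem 1 for a Hankel matrix H,
    given as a list of rows (assumes H has at least two equal rows)."""
    m = len(H) - 1
    eq = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)
          if H[i] == H[j]]
    m0 = min(i for i, j in eq)   # smallest row equal to some later row
    m1 = max(j for i, j in eq)   # largest row equal to some earlier row
    lam = min(j - i for i, j in eq)
    return m0, m1, lam

# f_H has period 2 on all of [0..m+n]; here m = 3, n = 4:
vals = [1, 0, 1, 0, 1, 0, 1, 0]
H = [[vals[i + j] for j in range(5)] for i in range(4)]
m0, m1, lam = theorem1_params(H)
print(m0, m1, lam)                 # -> 0 3 2
print(len({tuple(r) for r in H}))  # -> 2  (= m0 + lam + m - m1 = 0 + 2 + 3 - 3)
```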
Using Corollary 2 we deduce two consequences of Theorem 1:
Corollary 3. For H, m0 , m1 and λ as in Theorem 1, #row(H) = m0 + λ + m −
m1 , i.e. H has exactly m0 + λ + m − m1 pairwise different rows.
Corollary 4. Let m ≤ n + 1 and H ∈ M(m, n) be a Hankel matrix with
#row(H) < m + 1. Then #col(H) ≤ #row(H).
The next lemma states an “expansion property” of Hankel matrices with at
least two equal rows.
Lemma 4. For arbitrary m, n ∈ N let H ∈ M(m, n) be a Hankel matrix with
#row(H) < m + 1. Then there exist m′ ≥ n and a Hankel matrix H̃ ∈ M(m′ , n)
such that #row(H̃) = #row(H) and #col(H̃) = #col(H).
Sketch of proof. We duplicate the area between two equal rows until the total number of rows exceeds the total number of columns n. This process affects neither the number of different rows nor the number of different columns. □
Theorem 2. Let m ≤ n + 1 and H ∈ M(m, n) be a Hankel matrix with
#row(H) < m + 1. Then #row(H) = #col(H).
Proof. From Corollary 4, we have #row(H) ≥ #col(H). By Lemma 4, there exist
m′ ≥ n and a Hankel matrix H̃ ∈ M(m′ , n) such that #row(H̃) = #row(H)
and #col(H̃) = #col(H). Thus, again by Corollary 4, we obtain #row(H) =
#row(H̃) = #col(H̃ T ) ≤ #row(H̃ T ) = #col(H̃) = #col(H). Consequently, we have #row(H) = #col(H). □
Theorem 3. Let m ≤ n and H ∈ M(m, n) be a Hankel matrix with #row(H) =
m + 1. Then #col(H) ≥ m + 1.
Proof. Induction on n: For n = m, we have H = H T and thus #col(H) =
#row(H T ) = #row(H) = m+1. Now suppose that n > m. Let H ′ ∈ M(m, n−1)
be the matrix H without its last column. We consider two cases:
Case 1: n ∼H T n′ for some n′ ∈ [n−1]. Then #col(H) = #col(H ′ ). In addition,
#row(H ′ ) = m + 1, because if #row(H ′ ) ≤ m were true, then we would have i ∼H ′ j for some 0 ≤ i < j ≤ m, and thus i ∼H j, since fH (i + n) = fH (i + n′ ) = fH (j + n′ ) = fH (j + n). Thus, we get #col(H) = #col(H ′ ) ≥ m + 1 by induction
hypothesis.
Case 2: n ̸∼H T n′ for all n′ ∈ [n − 1]. Then #col(H) = #col(H ′ ) + 1. Once
again, we have to consider two subcases:
Case 2a: #row(H ′ ) = m + 1: Then #col(H) = #col(H ′ ) + 1 = m + 2 > m + 1
by induction hypothesis.
Case 2b: #row(H ′ ) ≤ m: Assume that #row(H ′ ) < m, and let
m0 = min{k ∈ [m] | ∃k ′ ∈ [m] with k ′ > k and k ∼H k ′ } ,
m1 = max{k ∈ [m] | ∃k ′ ∈ [m] with k ′ < k and k ∼H k ′ } ,
λ = min{k ′ − k | k, k′ ∈ [m] with k < k′ and k ∼H k ′ } ,
where m′0 , m′1 and λ′ are the corresponding numbers for H ′ . By Corollary 3,
#row(H ′ ) = m′0 + m − m′1 + λ′ , and f is λ′ -periodic on [m′0 ..m′1 + n − 1] by
Theorem 1. Since #row(H ′ ) < m by assumption, λ′ < m′1 − m′0 . In particular,
m0 ∼H m0 + λ′ , and thus λ | λ′ by Theorem 1. Consequently, m0 ≤ m′0 ,
m1 ≥ m′1 − 1 and λ ≤ λ′ . Hence again by Corollary 3,
#row(H) = m0 + m − m1 + λ ≤ m′0 + m − (m′1 − 1) + λ′
≤ m′0 + m − m′1 + λ′ + 1 = #row(H ′ ) + 1 < m + 1 ,
contradicting the precondition #row(H) = m + 1. Thus, #row(H ′ ) = m. By
Theorem 2, #col(H ′ ) = #row(H ′ ) = m. Consequently, #col(H) = #col(H ′ ) +
1 = m + 1. □
Note that for Hankel matrices over Σ with |Σ| ≥ m + n + 1 we can say even
more. Namely, if m ≤ n, then for all r ∈ [m + 1..n + 1], there exists a Hankel
matrix H ∈ M(m, n) with #row(H) = m + 1 and #col(H) = r. To see this,
define f : [m] × [n] → Σ = {a0 , . . . , am+n } by f (x, y) = a(x+y) mod r . Then
H = Mf is a Hankel matrix fulfilling the requested properties.
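The construction just described is easy to verify by direct computation; a sketch (using the integers 0, . . . , r − 1 in place of the symbols a0, . . . , am+n):

```python
def build_H(m, n, r):
    """Hankel matrix M_f for f(x, y) = a_{(x+y) mod r}, with a_i encoded as i."""
    return [[(x + y) % r for y in range(n + 1)] for x in range(m + 1)]

m, n, r = 3, 7, 6                        # m <= n and m + 1 <= r <= n + 1
H = build_H(m, n, r)
print(len({tuple(row) for row in H}))    # -> 4  (= m + 1, all rows distinct)
print(len({tuple(c) for c in zip(*H)}))  # -> 6  (= r)
```

Rows x and x′ coincide only when r divides x − x′, which is impossible for distinct x, x′ ≤ m < r, while columns repeat with period exactly r.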
4 Applications
Theorems 2 and 3 can be summarized in terms of one-way communication as
follows.
Theorem 4. Let m ≤ n and f : [m] × [n] → Σ be a function for which the
corresponding communication matrix Mf is a Hankel matrix. Then the following properties hold: (a) S A→B (f ) ≤ S B→A (f ). (b) If S A→B (f ) < m + 1, then
S A→B (f ) = S B→A (f ).
This result can immediately be applied to symmetric Boolean functions:
Corollary 5. Let m ≤ n and F : {0, 1}m × {0, 1}n → {0, 1} be a symmetric Boolean function. Then the following properties hold: (a) S A→B (F ) ≤
S B→A (F ). (b) If S A→B (F ) < m + 1, then S A→B (F ) = S B→A (F ).
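For small parameters, properties (a) and (b) can be verified exhaustively. The sketch below is our own test harness: it identifies the size of an optimal A → B protocol with the number of distinct rows of the communication matrix (and B → A with the columns), and ranges over all predicates g on the sum of the inputs:

```python
# Exhaustive check of (a) and (b) for m = 2, n = 4 (our own choice of sizes).
from itertools import product

def one_way_sizes(g, m, n):
    M = [[g(x + y) for y in range(n + 1)] for x in range(m + 1)]
    rows = len({tuple(row) for row in M})
    cols = len({col for col in zip(*M)})
    return rows, cols

m, n = 2, 4
for bits in product([0, 1], repeat=m + n + 1):    # all g : {0,...,m+n} -> {0,1}
    rows, cols = one_way_sizes(lambda s: bits[s], m, n)
    assert rows <= cols                  # (a): S_{A->B} <= S_{B->A}
    if rows < m + 1:
        assert rows == cols              # (b)
```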
The results of the last paragraph can also be applied to word combinatorics
as follows:
Theorem 5. Let w be an arbitrary string over some alphabet Σ, and let Nw (i)
denote the number of different subwords of w of length i. Then, for m ≤ ⌈|w|/2⌉
and n = |w| − m + 1, we have Nw (n) ≤ Nw (m). Moreover, if Nw (n) < m (note
that Nw (n) ≤ m in general), then Nw (n) = Nw (m).
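Theorem 5 can be illustrated with a small test harness (our own; the sample words are arbitrary):

```python
# N_w(i) = number of distinct subwords (factors) of w of length i.
import math

def N(w, i):
    return len({w[j:j + i] for j in range(len(w) - i + 1)})

for w in ["abaab", "mississippi", "aaaaab"]:
    for m in range(1, math.ceil(len(w) / 2) + 1):
        n = len(w) - m + 1
        assert N(w, n) <= N(w, m)        # N_w(n) <= N_w(m)
        if N(w, n) < m:
            assert N(w, n) == N(w, m)    # the "moreover" part
```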
5   One-Way versus Two-Way Protocols
In this section we first present a class of families of functions for which one-way
communication complexities are almost the same as two-way communication
complexities. We denote the two-way complexity of F by C(F ). Let F1 , F2 , F3 . . .
with Fm : {0, 1}2m → {0, 1} be a family of symmetric Boolean functions and
let fm : [m] × [m] → {0, 1} denote the integer function associated to Fm , i.e.
F (x1 , . . . , x2m ) = 1 if and only if f (Σ_{i=1}^m xi , Σ_{i=m+1}^{2m} xi ) = 1.
166
J. Arpe, A. Jakoby, and M. Liśkiewicz
Theorem 6. Let F1 , F2 , F3 . . . be a family of symmetric Boolean functions such
that fm ⊆ fm+1 for all m ∈ N. Then either
(a) for almost all m ∈ N, C A→B (Fm ) = c for some constant c or
(b) for infinitely many m ∈ N, C(Fm ) = ⌈log(m + 1)⌉.
Moreover, (b) holds iff the language L = {0^{k+ℓ} | fm (k, ℓ) = 1, m, k, ℓ ∈ N} is
nonregular.
Proof. First, Theorem 11.3 in [6] gives a nice characterization of (non)regular
unary languages in terms of the rank of certain Hankel matrices. This characterization was first observed by Condon et al. in [2]. It says that the unary language
L is nonregular if and only if for infinitely many m ∈ N, rank(Mfm ) = m + 1
(i.e. the communication matrix Mfm has maximum rank). Second, Mehlhorn
and Schmidt [10] showed that C(f ) ≥ log(rank(Mf )) for every f . Combining
these facts we get that for nonregular L, C(fm ) = ⌈log(m + 1)⌉ for infinitely
many m ∈ N.
On the other hand, if L is regular, then by the Myhill-Nerode Theorem [4] the
infinite matrix M = (mi,j )i,j∈N defined by mi,j = 1 iff 0^{i+j} ∈ L has a constant
number of different rows. Hence the theorem follows.
⊓⊔
Example 1. Let Fm (x1 , x2 , . . . , x2m ) = 1 iff the number of 1’s in the sequence
x1 , x2 , . . . , x2m is the square of some integer. By Theorem 6 either for all m ∈ N,
C(Fm ), C A→B (Fm ) ≤ c for some constant c or for infinitely many m ∈ N,
C A→B (Fm ) = C(Fm ) = ⌈log(m + 1)⌉. Since the language {0^n | n is the square
of some integer} is nonregular, the (one-way) communication complexity of Fm
is maximal for infinitely many m ∈ N.
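Our own spot check of Example 1 for a few small m: the communication matrix of f_m(k, ℓ) = 1 iff k + ℓ is a perfect square has the maximum number m + 1 of distinct rows for these m. (The theorem only guarantees this for infinitely many m, so the fixed sample below is merely illustrative.)

```python
# Hankel matrix of f_m(k, l) = [k + l is a perfect square]; distinct-row count.
import math

def is_square(s):
    r = math.isqrt(s)
    return r * r == s

for m in [3, 5, 8]:
    M = [[is_square(k + l) for l in range(m + 1)] for k in range(m + 1)]
    assert len({tuple(row) for row in M}) == m + 1    # maximum possible
```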
Next, we construct a symmetric Boolean function with an exponential difference between its one-way and its two-way communication complexity. Let
p0 , p1 , . . . with pi < pi+1 for all i ∈ N be the sequence of all prime numbers.
According to the Prime Number Theorem, there are at least ℓ/ log ℓ prime numbers
in the interval [ℓ] for all ℓ ≥ 5. For k = ⌈log log m⌉ and n = 2^k · (1 + Σ_{i=0}^{2^k −1} pi ),
consider the function f : [m] × [n] → {0, 1} defined by f (x, y) = 1 iff
⌊z/2^k ⌋ mod p_{z mod 2^k} = 0, where z = x + y. Using the following two-way
protocol, one can see that the two-way communication complexity of f is at most
4 log log m: In the first round, Bob sends y0 = y mod 2^k to Alice. In the second
round, Alice sends z0 = (x + y0 ) mod 2^k and z ′ = ⌊(x + y0 )/2^k ⌋ mod p_{z0} to
Bob. Finally, Bob computes f (x, y) by checking whether (⌊y/2^k ⌋ + z ′ ) mod p_{z0} = 0.
Note that z0 = z mod 2^k . The correctness of the protocol can be seen by
investigating the addition of integers using a remainder representation.
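The protocol can be simulated directly. The sketch below fixes k = 2 (so 2^k = 4 and the primes p_0, …, p_3 are 2, 3, 5, 7); these concrete parameters are our own choice for testing and sidestep the relation k = ⌈log log m⌉:

```python
# Two-way protocol sketch with fixed small parameters (our own choices).
PRIMES = [2, 3, 5, 7]        # p_0, ..., p_3
K = 2                        # stands in for ceil(log log m)
W = 1 << K                   # 2^k

def f(x, y):
    z = x + y
    return (z // W) % PRIMES[z % W] == 0

def protocol(x, y):
    y0 = y % W                                   # round 1: Bob -> Alice
    z0 = (x + y0) % W                            # round 2: Alice -> Bob ...
    zp = ((x + y0) // W) % PRIMES[z0]            # ... together with z'
    return ((y // W) + zp) % PRIMES[z0] == 0     # Bob's final check

n = W * (1 + sum(PRIMES))    # n = 2^k (1 + p_0 + ... + p_{2^k - 1})
for x in range(50):
    for y in range(n + 1):
        assert protocol(x, y) == f(x, y)
```

The check works because z mod 2^k = (x + y0) mod 2^k and ⌊z/2^k⌋ = ⌊(x + y0)/2^k⌋ + ⌊y/2^k⌋, so Bob's expression and ⌊z/2^k⌋ agree modulo p_{z0}.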
Lemma 5. C(f ) ≤ 4 log log m.
For the one-way communication complexity of f we obtain:
Lemma 6. #row(Mf ) = m + 1, i.e. C A→B (f ) = ⌈log(m + 1)⌉.
Theorem 7. For the symmetric Boolean function F : {0, 1}m ×{0, 1}n → {0, 1}
associated with f , we have C(F ) ∈ O(log log m) and C A→B (F ) ∈ Θ(log m).
6   Multiparty Communication
So far we have analyzed the case that a fixed input partition for a function is
given. However, sometimes it is also of interest to examine the communication
complexity of a fixed function under varying input partitions. A typical question for this scenario is whether we can partition the input in such a way that the
communication complexities for protocols of type A → B and B → A coincide.
The main tool for these examinations is the diversity ∆(f ) of f which we introduce below. For a function f : [s] → Σ and m ∈ [s], define fm : [m]×[s−m] → Σ
by fm (x, y) = f (x+y) for x ∈ [m] and y ∈ [s−m], and let rf (m) = #row(Mfm ).
We define ∆(f ) = maxm∈[s] rf (m).
Lemma 7. For every function f : [s] → Σ, the following conditions hold:
(a) rf (m) = m + 1 for all m ∈ [∆(f ) − 1],
(b) if ∆(f ) ≤ s/2, then rf (m) = ∆(f ) for all m ∈ [∆(f ) − 1 .. s − ∆(f ) + 1],
(c) rf (m) ≥ rf (m + 1) for all m ∈ [∆(f ) − 1 .. s − 1].
It is an immediate consequence of Lemma 7 that ∆(f ) equals the minimum
m such that Mfm has less than m + 1 different rows, provided that such an m
exists.
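Lemma 7 and this consequence can be checked exhaustively for a small s. The sketch below is our own harness, with [m] = {0, …, m} and the bound in (b) read as ∆(f) ≤ s/2; r_f(m) is computed as the number of distinct rows of M_{f_m}:

```python
# r_f(m) = number of distinct rows of M_{f_m}; Delta(f) = max_m r_f(m).
from itertools import product

def r(f, s, m):
    return len({tuple(f[x + y] for y in range(s - m + 1)) for x in range(m + 1)})

s = 6
for f in product([0, 1], repeat=s + 1):          # all f : [s] -> {0, 1}
    rf = [r(f, s, m) for m in range(s + 1)]
    delta = max(rf)
    for m in range(delta):                       # (a): m in [Delta(f) - 1]
        assert rf[m] == m + 1
    if delta <= s / 2:                           # (b)
        for m in range(delta - 1, s - delta + 2):
            assert rf[m] == delta
    for m in range(delta - 1, s):                # (c)
        assert rf[m] >= rf[m + 1]
```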
The diversity is helpful to analyze the case that more than two parties are
involved. For such multiparty communication we assume that the input is distributed among d parties P1 , . . . , Pd . Every party Pi knows a value xi ∈ [mi ].
The goal is to compute a fixed function f : [m1 ]×. . .×[md ] → Σ. Analogously to
communication matrices in the two-party case, we use multidimensional arrays
to represent f .
Let M(m1 , . . . , md ) be the set of all d-dimensional (m1 + 1) × . . . × (md + 1)
arrays M with entries M (i1 , . . . , id ) ∈ Σ for ij ∈ [mj ], j ∈ [1..d]. M is called
the communication array of a function f iff M (i1 , . . . , id ) = f (i1 , . . . , id ). We
denote the communication array of f by Mf .
Recall that in the two-party model the sender has to specify the row/column
his input belongs to. In the multiparty case each party will have to specify the
type of subarray determined by his input value. Therefore, for each k ∈ [1..d] and
each x ∈ [mk ], we define the subarray Mx^(k) ∈ M(m1 , . . . , mk−1 , mk+1 , . . . , md )
of M by Mx^(k) (i1 , . . . , ik−1 , ik+1 , . . . , id ) = M (i1 , . . . , ik−1 , x, ik+1 , . . . , id ) for
all 0 ≤ ij ≤ mj , j ∈ [1..d] \ {k}. Finally, for k ∈ [1..d] we define #subk (M ) as
the number of different subarrays with fixed k-th dimension:
#subk (M ) = |{ Mx^(k) | x ∈ [mk ] }| .
We call M ∈ M(m1 , . . . , md ) a Hankel array if M (i1 , . . . , id ) = M (j1 , . . . , jd )
for every pair (i1 , . . . , id ), (j1 , . . . , jd ) ∈ [m1 ] × . . . × [md ] with i1 + . . . + id =
j1 + . . . + jd . For a Hankel array M ∈ M(m1 , . . . , md ), let fM : [Σ_{i=1}^d mi ] → Σ
be defined by fM (x) = M (x1 , . . . , xd ) if x = x1 + . . . + xd . Note that fM is
well-defined since M is a Hankel array.
Lemma 8. For a function f such that the corresponding communication array
M is a Hankel array, we have rfM (mk ) = #subk (M ) for every k ∈ [1..d].
As a natural extension of two-party communication complexity we consider
the case that the parties P1 , . . . , Pd are connected by a directed chain of the
parties specified by a permutation π : [1..d] → [1..d], i.e. Pπ(i) can only send
messages to Pπ(i+1) for i ∈ [d − 1]. Let S π be the size of an optimal protocol.
More precisely, S π is the number of possible communication sequences on the
network in an optimal protocol.
We will now present a protocol of minimal size for a fixed chain network
and functions f such that Mf is a Hankel array. During the computation the
parties use the arrays Mi ∈ M(Σ_{j=1}^i mπ(j) , mπ(i+1) , . . . , mπ(d) ), where Mi is
the Hankel array defined by
Mi (yi , . . . , yd ) = Mf (z1 , . . . , zd )
for all yi ∈ [Σ_{j=1}^i mπ(j) ], yi+1 ∈ [mπ(i+1) ], . . . , yd ∈ [mπ(d) ] and values z1 ∈
[m1 ], . . . , zd ∈ [md ] with yi = Σ_{j=1}^i zπ(j) and yj = zπ(j) for all j ∈ [i + 1..d].
Furthermore, let Γi (yi ) be the minimum value z such that (Mi )z^(1) = (Mi )yi^(1) .
The protocol works as follows: (1) Pπ(1) sends γ1 = Γ1 (xπ(1) ) to Pπ(2) . (2) For
i ∈ [2..d − 1], Pπ(i) receives γi−1 from Pπ(i−1) and sends γi = Γi (xπ(i) + γi−1 ) to
Pπ(i+1) . (3) Pπ(d) receives γd−1 from Pπ(d−1) . Then Md (γd−1 + xπ(d) ) gives the
result of the function.
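The protocol can be simulated on a small Hankel array. In the sketch below (our own instance, with π the identity permutation), the array is induced by a function g of the sum of the inputs, and Γ_i is computed by brute force from g:

```python
# Chain protocol on a Hankel array induced by g (our own small instance).
from itertools import product

m = [2, 3, 4]                          # party P_i holds x_i in {0, ..., m_i}
d = len(m)

def g(s):                              # f(x_1, ..., x_d) = g(x_1 + ... + x_d)
    return s % 3 == 0

def Gamma(i, y):                       # least z whose M_i-slice equals the slice at y
    prefix_max = sum(m[:i])
    suffix_max = sum(m[i:])
    return min(z for z in range(prefix_max + 1)
               if all(g(z + t) == g(y + t) for t in range(suffix_max + 1)))

def chain(x):
    gamma = Gamma(1, x[0])                   # P_1 sends gamma_1 to P_2
    for i in range(2, d):
        gamma = Gamma(i, x[i - 1] + gamma)   # P_i sends gamma_i onwards
    return g(gamma + x[d - 1])               # P_d computes the result

for x in product(*(range(mi + 1) for mi in m)):
    assert chain(x) == g(sum(x))
```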
Theorem 8. For a function f such that Mf ∈ M(m1 , . . . , md ) is a Hankel
array the size of the protocol presented above is minimal.
Note that the communication size S π may depend on the order π of the
parties on the chain. We now state that the ordering is optimal with respect to
the communication size when mπ(i) ≤ mπ(i+1) for all i ∈ [1..d − 1].
Theorem 9. Let f be a function such that Mf ∈ M(m1 , . . . , md ) is a Hankel
array and π : [1..d] → [1..d] be a permutation with mπ(i) ≤ mπ(i+1) for all
i ∈ [1..d − 1]. Then for every permutation π ′ : [1..d] → [1..d], S π (f ) ≤ S π′ (f ) .
A second generalization of the two-party model is the simultaneous communication complexity (C || ), where all parties simultaneously write in a single
round on a blackboard. This means that the messages sent by each party do not
depend on the messages sent by the other parties. After finishing the communication round, each party has to be able to compute the result of the function
(see e.g. [9]). For two-party communication it is well known that C || (f ) =
C A→B (f ) + C B→A (f ). Similarly, for the d-party case we have C || (f ) =
Σ_{i∈[1..d]} ⌈log #subi (Mf )⌉. Hence, if Mf is a Hankel array and if for some
dimension k ∈ [1..d] we have #subk (Mf ) ≤ min_{i∈[1..d]} mi , then by
Lemmas 7 and 8, C || (f ) = d · ⌈log ∆(fMf )⌉.
As a third generalization, we consider the case that in each round some party
can write a message on a blackboard. The message may depend on messages that
have been published on the board in previous rounds. We restrict the communication such that each party (except for the last one) publishes exactly one
message on the blackboard, and in each round exactly one message is published.
After finishing the communication rounds, at least one party has to be able to
compute the result of the function. Let S ✷ be the corresponding size of an optimal protocol. Note that this model generalizes both of the previous models.
Theorem 10. Let f be a function such that Mf ∈ M(m1 , . . . , md ) is a Hankel
array and let π : [1..d] → [1..d] be a permutation such that mπ(i) ≤ mπ(i+1) for
all i ∈ [1..d − 1]. Then S π (fM ) = S ✷ (fM ) .
7   Conclusions and Open Problems
In this paper we have investigated one-way communication complexity of functions for which the corresponding communication matrices are Hankel matrices.
We have established some structural properties of such matrices. As a direct application, we have obtained a complete solution to the problem of how the communication direction in deterministic one-way communication protocols affects
the communication complexity of symmetric Boolean functions. One possible
direction of future research is to study other kinds of one-way communication
such as nondeterministic and randomized for the class of symmetric functions.
Another interesting extension of the topic is to drop the restriction to one-way
protocols and consider the deterministic two-way communication complexity of
symmetric Boolean functions for both a bounded and an unbounded number of
communication rounds. This particularly involves results about the computation
of the rank of Hankel matrices. In addition, consequences for word combinatorics
and OBDD theory may be of interest.
Acknowledgment. We would like to thank Ingo Wegener for his useful comment on the connection between one-way communication and OBDD theory.
References
1. F. Ablayev, Lower bounds for one-way probabilistic communication complexity and
their application to space complexity. Theoretical Comp. Sc., 157 (1996), 139–159.
2. A. Condon, L. Hellerstein, S. Pottle, and A. Wigderson, On the power of finite
automata with both nondeterministic and probabilistic states. SIAM J. Comput.,
27 (1998), 739–762.
3. P. Ďuriš, J. Hromkovič, J.D.P. Rolim, and G. Schnitger, On the power of Las
Vegas for one-way communication complexity, finite automata, and polynomial-time computations. Proc. 14th STACS, Springer, 1997, 117–128.
4. J. E. Hopcroft and J. D. Ullman, Formal Languages and Their Relation to Automata. Addison-Wesley, Reading, Massachusetts, 1969.
5. J. Hromkovič, Communication Complexity and Parallel Computing. Springer, 1997.
6. I. S. Iohvidov, Hankel and Toeplitz Matrices and Forms. Birkhäuser, Boston, 1982.
7. H. Klauck, On quantum and probabilistic communication: Las Vegas and one-way
protocols. Proc. 32nd STOC, 2000, 644–651.
8. I. Kremer, N. Nisan, and D. Ron, On randomized one-round communication complexity. Computational Complexity, 8 (1999), 21–49.
9. E. Kushilevitz and N. Nisan, Communication Complexity. Camb. Univ. Press, 1997.
10. K. Mehlhorn and E. M. Schmidt, Las Vegas is better than determinism in VLSI
and distributed computing. Proc. 14th STOC, 1982, 330–337.
11. I. Newman and M. Szegedy, Public vs. private coin flips in one-round communication games. Proc. 28th STOC, 1996, 561–570.
12. C. Papadimitriou and M. Sipser, Communication complexity. J. Comput. System
Sci., 28 (1984), 260–269.
13. I. Wegener, The complexity of Boolean functions. Wiley-Teubner, 1987.
14. I. Wegener, personal communication, April 2003.
15. A. C. Yao, Some complexity questions related to distributive computing. Proc.
11th STOC, 1979, 209–213.
Circuits on Cylinders
Kristoffer Arnsfelt Hansen^1 , Peter Bro Miltersen^1 , and V. Vinay^2
^1 Department of Computer Science, University of Aarhus, Denmark
{arnsfelt,bromille}@daimi.au.dk
^2 Indian Institute of Science, Bangalore, India
vinay@csa.iisc.ernet.in
Abstract. We consider the computational power of constant width
polynomial size cylindrical circuits and nondeterministic branching programs. We show that every function computed by a Π2 ◦ MOD ◦ AC0
circuit can also be computed by a constant width polynomial size cylindrical nondeterministic branching program (or cylindrical circuit) and
that every function computed by a constant width polynomial size cylindrical circuit belongs to ACC0 .
1   Introduction
In this paper we consider the computational power of constant width, polynomial
size cylindrical branching programs and circuits.
It is well known that there is a rough similarity between the computational
power of width restricted circuits and depth restricted circuits, but that this
similarity is not a complete equivalence. For instance, the class of functions
computed by a family of circuits of quasi-polynomial size and polylogarithmic
depth is equal to the class of functions computed by a family of circuits of
quasi-polynomial size and polylogarithmic width. On the other hand, the class
of functions computed by a family of circuits of polynomial size and polylogarithmic width (non-uniform SC) is, in general, conjectured to be different from
the class of functions computed by a family of circuits of polynomial size and
polylogarithmic depth (non-uniform NC). For the case of constant depth and
width, there is a provable difference in computational power; the class of functions computable by constant depth circuits of polynomial size, i.e., AC0 , is a
proper subset of the functions computable by constant width circuits (or branching programs) of polynomial size, the latter being, by Barrington’s Theorem [1],
the bigger class NC1 . On the other hand, Vinay [9] and Barrington et al. [2,3]
showed that by putting a geometric restriction on the computation, the difference disappears: The class of functions computable by upwards planar, constant
width, polynomial size circuits (or nondeterministic branching programs) is exactly AC0 . Thus, both AC0 and NC1 can be captured by a constant width as
well as by a constant depth circuit model. It is then natural to ask if one can
similarly capture classes between AC0 and NC1 defined by various constant
depth circuit models, such as ACC0 and TC0 , by some natural constant width
circuit or branching program model.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 171–182, 2003.
© Springer-Verlag Berlin Heidelberg 2003
172
K.A. Hansen, P.B. Miltersen, and V. Vinay
Fig. 1. A cylindrical branching program of width 2 computing PARITY.
Building upon the results in this paper, such a characterisation has recently
been obtained for ACC0 [6]: The class of functions computed by planar constant
width, polynomial size circuits is exactly ACC0 .
In this paper we consider a slightly more relaxed geometric restriction than
upwards planarity, yet more restrictive than planarity: We consider the functions
computed by cylindrical polynomial size, constant width circuits (or nondeterministic branching programs). Informally (for formal definitions, see the next
section), a layered circuit (branching program) is cylindrical if it can be embedded on the surface of a cylinder in such a way that each layer is embedded on a
cross section of the cylinder (disjoint from the cross sections of the other layers),
no wires intersect and all wires between two layers are embedded on the part of
the cylinder between the two corresponding cross sections (see Fig. 1).
It is immediate that constant width polynomial size cylindrical branching
programs have more computational power than constant width polynomial size
upwards planar branching programs: The latter compute only functions in AC0
[2] while the former may compute PARITY (see Fig. 1). We ask what their exact
computational power is and show that their power does not extend much beyond
computing functions such as PARITY. Indeed, they can only compute functions
in ACC0 . To be precise, the first main result of this paper is the following lower
bound on the power of cylindrical computation.
Theorem 1. Every Boolean function computed by a polynomial size Π2 ◦
MOD ◦ AC0 circuit is also computed by a constant width, polynomial size cylindrical nondeterministic branching program.
By a Π2 ◦ MOD ◦ AC0 circuit we mean a polynomial sized circuit with an
AND gate at the output, a layer of OR gates feeding the AND gate, a layer of
MODm gates (perhaps for many different constant-bounded values of m) feeding
the OR gates and a (multi-output) AC0 circuit feeding the MOD gates. It is not
known if the inclusion is proper. We prove Theorem 1 by a direct construction,
generalising and extending the simple idea of Fig. 1.
Our second main result is the following upper bound on the power of cylindrical computation.
Theorem 2. Every Boolean function computed by a constant width, polynomial
size cylindrical circuit is in ACC0 .
Due to space constraints, the proof of Theorem 2 is omitted from this version
of the paper. Instead we provide a proof of the weaker statement that cylindrical
branching programs only compute functions in ACC0 . We do however give an
overview of a proof of Theorem 2. The full proof can be found in the technical
report version of this paper [7].
The simulation is done (as were many previous results about constant width
computation) by using the theory of finite monoids and the results of Barrington
and Thérien [4]. The notions of upwards planarity and of cylindricality share the
property that all arcs flow along a common direction. This allows these notions
to be captured by local constraints, which allows one to transfer the analysis
of the restricted branching programs and circuits into an appropriate algebraic
setting. Thus, we show the inclusion by relating the computation of cylindrical
circuits to solving the word problem of a certain finite monoid and then show
that this monoid is solvable.
A standard simulation shows that every Boolean function computed by a constant width, polynomial size cylindrical nondeterministic branching program is
also computed by a constant width, polynomial size cylindrical circuit. For completeness, we describe this simulation in Proposition 3. Thus, one can exchange
“cylindrical nondeterministic branching program” with “cylindrical circuit” and
vice versa in our two main results.
Organisation of Paper. In Sect. 2, we formally define the notions of cylindrical
branching program and circuits. We also give an overview of the algebraic tools
we use. In Sect. 3, we show Theorem 1. In Sect. 4 we show the weaker version of
Theorem 2 for cylindrical branching programs (instead of circuits), and in Sect.
5, we give an overview of the full proof of Theorem 2. We conclude with some
discussions and open problems in Sect. 6.
2   Preliminaries
Bounded Depth Circuits. Let A ⊂ {0, . . . , m − 1}. Using the notation of
Grolmusz and Tardos [5], a MOD_m^A gate takes n boolean inputs x1 , . . . , xn and
outputs 1 if Σ_{i=1}^n xi ∈ A (mod m) and 0 otherwise. We let MOD denote the
family of MOD_m^A gates for all constant bounded m and all A. Similarly, AND
and OR denote the families of unbounded fanin AND and OR gates.
If G is a family of boolean gates and C is a family of circuits we let G ◦ C
denote the class of polynomial size circuit families consisting of a G gate taking
circuits from C as inputs.
AC0 is the class of functions computed by polynomial size bounded depth
circuits consisting of NOT gates and unbounded fanin AND and OR gates.
ACC0 is the class of functions computed when we also allow unbounded fanin
MOD gates computing MODk for constants k. We will also use AC0 and ACC0
to denote the class of circuits computing the languages in the respective classes.
Cylindrical Branching Programs and Circuits. A digraph D = (V, A) is
called layered if there is a partition V = V0 ∪ V1 ∪ · · · ∪ Vh such that all arcs of
A go from layer Vi to the next layer Vi+1 for some i. We call h the depth of
D, |Vi | the width of layer i and k = max |Vi | the width of D.
Let [k] denote the integers {1, . . . , k}. For a, b ∈ [k] where a ≢ b + 1
(mod k) we define the (cyclic) interval [a, b] to be the set {a, . . . , b} if a ≤ b
and {a, . . . , k} ∪ {1, . . . , b} if a > b. Furthermore let (a, b) = [a, b] \ {a, b}, and
let (a, b) = [k] \ {a, b} if a ≡ b + 1 (mod k).
Let D be a layered digraph in which all layers have width k. We will assume
the nodes in each layer numbered 1, . . . , k, and refer to nodes by these numbers.
Then, D is called cylindrical if the following property is satisfied: For every
pair of arcs going from layer l to layer l + 1 connecting node a to node c and
node b to node d the following must hold: Nodes in the interval (a, b) of layer l
can only connect to nodes in the interval [c, d] of layer l + 1 and nodes in the
interval (b, a) of layer l can only connect to nodes in the interval [d, c] of layer
l + 1.
Notice that this is equivalent to saying that nodes in the interval (c, d) of layer
l + 1 can only connect to nodes in the interval [a, b] of layer l and nodes in the
interval (d, c) of layer l + 1 can only connect to nodes in the interval [b, a] of layer
l.
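The definition can be turned into a mechanical check for the arcs between one pair of adjacent layers. The sketch below is our own formalisation of the condition (1-indexed nodes, as in the text):

```python
# Cylindricality check for the arcs between layer l and layer l + 1, width k.
def interval(a, b, k):                 # cyclic interval [a, b] in [k] = {1,...,k}
    if a <= b:
        return set(range(a, b + 1))
    return set(range(a, k + 1)) | set(range(1, b + 1))

def open_interval(a, b, k):            # (a, b) = [a, b] \ {a, b}, with the special
    if a % k == (b + 1) % k:           # case (a, b) = [k] \ {a, b} if a = b + 1 (mod k)
        return set(range(1, k + 1)) - {a, b}
    return interval(a, b, k) - {a, b}

def is_cylindrical_layer(k, arcs):     # arcs: pairs (node in layer l, node in l + 1)
    for (a, c) in arcs:
        for (b, d) in arcs:
            for (u, v) in arcs:
                if u in open_interval(a, b, k) and v not in interval(c, d, k):
                    return False
                if u in open_interval(b, a, k) and v not in interval(d, c, k):
                    return False
    return True

assert is_cylindrical_layer(4, {(1, 2), (2, 3), (3, 4), (4, 1)})   # a rotation
assert not is_cylindrical_layer(4, {(1, 3), (2, 2), (3, 4)})       # improper crossing
```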
A nondeterministic branching program^1 is an acyclic digraph with a designated
initial node and a designated terminal node, in which every arc is labelled by
either a literal, i.e. a variable or a negated variable, or a boolean constant. An
input is accepted if and only
if there is a path from the initial node to the terminal node in the digraph that
results from substituting constants for the literals according to the input and
then deleting arcs labelled by 0.
We will only consider branching programs in layered form, that is, viewed as
a digraph it is layered. We can assume that the initial node is in the first layer
and the terminal node in the last layer, and furthermore that these are the only
nodes incident to arcs in these layers. We can also assume that all layers have
the same number of nodes, by the addition of dummy nodes.
By a cylindrical branching program we will then mean a bounded-width
nondeterministic branching program in layered form, which is cylindrical when
viewed as a digraph.
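Evaluating such a program is a simple layer-by-layer reachability computation. The sketch below is our own encoding of a width-2 PARITY program in the style of Fig. 1 (arcs keep the node on a negated literal and cross on a positive one; accepting odd parity is our choice of terminal node):

```python
# Width-2 cylindrical nondeterministic branching program for PARITY.
from itertools import product

def parity_bp(n):
    # layer i -> i + 1: arcs (node, node', literal); literal (i, 0) means "not x_i"
    layers = []
    for i in range(n):
        layers.append([(1, 1, (i, 0)), (2, 2, (i, 0)),
                       (1, 2, (i, 1)), (2, 1, (i, 1))])
    return layers

def accepts(layers, x, initial=1, terminal=2):
    reachable = {initial}
    for layer in layers:                # keep arcs whose literal evaluates to 1
        reachable = {v for (u, v, (i, b)) in layer if u in reachable and x[i] == b}
    return terminal in reachable

for n in [1, 2, 5]:
    for x in product([0, 1], repeat=n):
        assert accepts(parity_bp(n), x) == (sum(x) % 2 == 1)
```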
A cylindrical circuit is a circuit consisting of fanin 2 AND and OR gates and
fanin 1 COPY gates, which when viewed as a digraph is a cylindrical digraph.
Input nodes can be literals or boolean constants. The output gate is in the last
layer. We can assume that all layers have the same number of nodes by adding
dummy input nodes to the first layer and dummy COPY gates to the other
layers.
A standard simulation of nondeterministic branching programs by circuits
extends to cylindrical branching programs and cylindrical circuits. We give the
details for completeness.
^1 Our definition deviates slightly from the usual definition where nodes rather than
edges are labelled by literals and unlabelled nodes serve as special nondeterministic
“choice”-nodes, but it is easily seen to be polynomially equivalent (also in the
cylindrical case), and it is more convenient for us.
Proposition 3. Every function computed by a width k, depth d cylindrical
branching program is also computed by a width O(k), depth O(d log k) cylindrical
circuit.
Proof. Replace every node in the branching program by an OR-gate. Replace
each arc, going from, say, node u to node v and labelled with the literal x, with
a new AND-gate taking two inputs, gate u and the literal x and with the output
of the AND-gate feeding gate v.
This transformation clearly preserves the cylindricality of the graph. Also,
the width of the circuit is linear in the width of the branching program. The
resulting OR-gates may have fan-in bigger than two. We replace each such gate
with a tree of fan-in two OR-gates, preserving the width and blowing up the
depth by at most a factor of O(log k).
⊓⊔
Monoids and Groups. Let x and y be elements of a group G. The commutator
of x and y is the element x−1 y −1 xy. The subgroup G^(1) of G generated by all
of the commutators in G is called the commutator subgroup of G. In general,
let G^(i+1) denote the commutator subgroup of G^(i) . G is solvable if G^(n) is the
trivial group for some n. It follows that an Abelian group, and in particular a
cyclic group, is solvable.
A monoid is a set M with an associative binary operation and a two sided
identity. A subset G of M is a group in M if it is a group with respect to the
operation of M . Note that a group G in M is not necessarily a submonoid of M
as the identity element of G may not be equal to the identity element of M . M
is called solvable if every group in M is solvable. The word problem for a finite
monoid M is the computation of the product x1 x2 . . . xn given x1 , x2 , . . . , xn as
input. A theorem by Barrington and Thérien [4] states that the word problem
for a solvable finite monoid is in ACC0 .
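The derived-series test for solvability can be carried out mechanically for small permutation groups. The following sketch is our own illustration, not part of the paper: it confirms that S_3 is solvable while A_5, being perfect (its commutator subgroup is A_5 itself), is not:

```python
# Solvability via the derived series, for groups given as sets of permutations.
from itertools import permutations

def compose(p, q):                     # (p * q)(i) = p[q[i]]
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def closure(gens):                     # subgroup generated by gens
    G, frontier = set(gens), set(gens)
    while frontier:
        new = {compose(a, b) for a in G for b in frontier} - G
        G |= new
        frontier = new
    return G

def derived(G):                        # commutator subgroup
    return closure({compose(compose(inverse(a), inverse(b)), compose(a, b))
                    for a in G for b in G})

def solvable(G):
    while len(G) > 1:
        D = derived(G)
        if len(D) == len(G):
            return False
        G = D
    return True

def parity(p):
    return sum(1 for i in range(len(p))
               for j in range(i + 1, len(p)) if p[i] > p[j]) % 2

S3 = set(permutations(range(3)))
A5 = {p for p in permutations(range(5)) if parity(p) == 0}
assert solvable(S3) and not solvable(A5)
```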
3   Simulation of Bounded Depth Circuits by Cylindrical Branching Programs
In this section, we prove Theorem 1. As a starting point, we shall use the “only
if” part of the following correspondence established by Vinay [9] and Barrington
et al. [2]. We include here a proof of the “only if” part for completeness.
Theorem 4. A language is in AC0 if and only if it is accepted by a polynomial
size, constant width upwards planar branching program.
Here an upwards planar branching program is a layered branching program
satisfying, that for every pair of arcs going from layer l to layer l + 1 connecting
node a to node c and node b to node d, if a < b then c ≤ d.
We need some simple observations. First observe that if we can simulate a
class of circuits C with upwards planar (cylindrical) branching programs, then we
can also simulate AND ◦ C by upwards planar (cylindrical) branching programs
by simply concatenating the appropriate branching programs.
Another way to combine branching programs is by substitution where we simply substitute a branching program for the edges corresponding to a particular
literal. The effect of this is captured in the following lemma.
Lemma 5. If f (x1 , . . . , xn ) is computed by an upwards planar (cylindrical)
branching program of size s1 and width w1 , and g1 , . . . , gn and their negations
g̅1 , . . . , g̅n are computed by upwards planar branching programs, each of size s2
and width w2 , then f (g1 , . . . , gn ) is computed by an upwards planar (cylindrical)
branching program of size O(s1 w1 s2 ) and width O(w1^2 w2 ).
Fig. 2. An upwards planar branching program computing OR.
Combining the above observations with the construction in Fig. 2, simulating
an OR gate, we have established the “only if” part of Theorem 4.
Simulation of a MOD_m^A gate can be done as shown in Fig. 3 if one disregards
the top nodes in the first and last layers and modifies the connections between
the second-to-last and the last layer to take the set A into account. Thus, combining this
construction with Lemma 5, the “only if” part of Theorem 4 and the closure of
cylindrical branching programs under polynomial fan-in AND, we have established that we can simulate AND ◦ MOD ◦ AC0 circuits by bounded width
polynomial size cylindrical circuits.
Fig. 3. A cylindrical branching program fragment for MOD4 .
The construction shown in Fig. 3 has further use: it can be seen as computing
elements of M2 , where M2 is the monoid of binary relations on [2].
The general construction of a branching program fragment for MOD_m^A taking
n inputs is as follows: Without loss of generality we can assume that |A| = 1
and in fact A = {0}, since we aim to simulate OR ◦ MOD. The branching
program fragment has n + 3 layers: the first and last layers have width 2 and
the middle layers width m. The top node in the first layer has arcs to all nodes
but node 1 and the bottom node has an arc to node 1. The top node in the last
layer has arcs from all nodes but the one in A and the bottom node has an arc
from this node. The nodes in the middle layers represent the sum of a prefix
of the input modulo m in the obvious way. Consider now the elements of M2
shown in Fig. 4. The branching program fragment just described corresponds to
(a) and (b) for m = 2 and m > 2 respectively, when the simulated MOD gate
evaluates to 0. In both cases, the fragment corresponds to (c) when the simulated
MOD gate evaluates to 1.
Fig. 4. Some elements of M2 .
We can now describe our construction for simulating OR ◦ MOD circuits.
The construction interleaves branching program fragments for (d) between the
branching program fragments for the MOD gates. This can be seen as a way
of “short circuiting” the branching program in the case that one of the MOD
gates evaluates to 1. Finally, we add layers at both ends picking out the appropriate nodes for the simulation. The entire construction is shown in Fig. 5. The
correctness can easily be verified.
The simulation of OR ◦ MOD circuits, the “only if” part of Theorem 4,
Lemma 5, and the closure of cylindrical branching programs under polynomial
fan-in AND, together completes the proof of Theorem 1.
Fig. 5. A cylindrical branching program computing MOD ∨ · · · ∨ MOD.
4   Simulation of Cylindrical Branching Programs by Bounded Depth Circuits
In this section, we compensate for the omitted proof of Theorem 2 sketched in
the next section, by giving a simpler (but similar) proof of the weaker result that
constant width polynomial size cylindrical nondeterministic branching programs
compute only functions in ACC0 .
In fact, we shall prove that for fixed k the following “branching program value
problem” BPVk is in ACC0 : Given a width k cylindrical branching program and
a truth assignment to its variables, decide if the program accepts. As any function
computed by width k cylindrical polynomial size branching program clearly is a
Skyum-Valiant projection [8] of BPVk , we will be done.
We shall prove that BPVk is in ACC0 by showing that it reduces, by an
AC0 reduction, to the word problem of the monoid Mk we define next. Then,
we show that the monoid Mk is solvable, and since this implies, by the result of Barrington and Thérien [4], that the word problem for Mk is in ACC0 , our
proof will be complete.
We define Mk to be the monoid of binary relations on [k] which capture
the calculation of width k branching programs embedded on a cylinder in the
following sense: Mk is the monoid generated by all the relations which express
how arcs can travel between two adjacent layers in a width k cylindrical digraph.
The monoid operation is the usual composition operation of binary relations, i.e.,
if A, B ∈ Mk and x, y ∈ [k], xABy ⇔ ∃z : xAz ∧ zBy.
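As a concrete illustration of this monoid operation (a toy sketch, not part of the proof; the function name is ours), relations on [k] can be represented as sets of pairs and composed directly from the definition:

```python
def compose(A, B, k):
    """Composition of binary relations on [k] = {0, ..., k-1}:
    x(AB)y holds iff there is a z with xAz and zBy."""
    return {(x, y) for x in range(k) for y in range(k)
            if any((x, z) in A and (z, y) in B for z in range(k))}

# A relates 0->1 and 1->2; B relates 1->1 and 2->0.
A = {(0, 1), (1, 2)}
B = {(1, 1), (2, 0)}
print(sorted(compose(A, B, 3)))  # [(0, 1), (1, 0)]
```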
BPVk reduces to the word problem for Mk by the following AC0 reduction:
Substitute constants for the literals in the branching program according to the
truth assignment. Consider now the cylindrical digraph D consisting only of
arcs which have the constant 1 associated. Then, the branching program accepts
the input given if and only if there is a path from the initial node in the first
layer to the terminal node in the last layer of D. We can decide this by simply
decomposing D into a sequence A1 , A2 , . . . , Ah of elements from Mk , computing
the product A = A1 A2 · · · Ah and checking whether this is different from the
zero element of Mk .
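The final step of this reduction can be sketched as follows (an illustrative toy with hypothetical function names, not the authors' implementation; relations are stored as boolean adjacency matrices, and checking the (initial, terminal) entry of the product realizes the comparison with the zero element under the simplifying assumption of a single initial and terminal node):

```python
from functools import reduce

def compose(A, B):
    """Relation composition: x(AB)y iff there is a z with xAz and zBy."""
    k = len(A)
    return [[any(A[x][z] and B[z][y] for z in range(k)) for y in range(k)]
            for x in range(k)]

def bpv_accepts(layers, start, end):
    """layers: relations A1, ..., Ah of arcs surviving the truth assignment,
    one relation per pair of adjacent layers. The program accepts iff the
    product A1 A2 ... Ah connects the initial node to the terminal node."""
    prod = reduce(compose, layers)
    return prod[start][end]

# Width-3 example: two layers whose arcs chain 0 -> 1 -> 2.
A1 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
A2 = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
print(bpv_accepts([A1, A2], 0, 2))  # True
```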
Thus, we just need to show that Mk is solvable. Our proof is finished by the
following much stronger statement.
Proposition 6. All groups in Mk are cyclic.
Proof. Let G ⊆ Mk be a group with identity E. Let A ∈ G and let R be the
set of all x such that xEx. As will be shown next, it is enough to consider elements of R to capture the structure of A.
Let x ∈ R. Since AA−1 = E there exists z such that xAz and zA−1 x.
Since A−1 A = E it follows zEz, that is, z ∈ R. Hence there exists a function
πA : R → R such that
∀x : xAπA (x) ∧ πA (x)A−1 x
To see that A is completely described by πA , we define a relation Â on [k] such that xÂy ⇔ πA (x) = y. That is, Â is just πA viewed as a relation. Since Â ⊆ A it follows that E ÂE ⊆ EAE = A. Conversely, let xAy. Since EA = A there exists z ∈ R such that xEz and zAy. Since πA (z)A−1 z we get πA (z)Ey. That is, xEz, z ÂπA (z) and πA (z)Ey. Thus xE ÂEy. Hence we obtain that A = E ÂE.
We would like to have both that πA is a permutation and that {πA |A ∈ G}
is a group. This is in general not true, since E can be any transitive relation in
Mk .
To obtain this we will first simplify the structure of the elements of G using
the following equivalence relation on [k] defined by
x ∼ y ⇔ (xEy ∧ yEx) ∨ x = y.
Let A ∈ G. If x ∼ x′ and y ∼ y ′ then xAy ⇔ x′ Ay ′ , since EAE = A. Thus A gives rise to a relation à on [k]/∼, where xAy ⇔ [k]x Ã[k]y , and it will follow that {Ã|A ∈ G} is a group isomorphic to G.
For this we need to show that (AB)˜ = ÃB̃. This follows since [k]x (AB)˜[k]z ⇔ xABz ⇔ ∃y : xAy ∧ yBz ⇔ ∃y : [k]x Ã[k]y ∧ [k]y B̃[k]z ⇔ [k]x ÃB̃[k]z .
We can find an isomorphic copy of this group in Mk as follows. Choose for
each equivalence class [k]x a representative r([k]x ) in [k]x . Define a relation C on
[k] such that xCy ⇔ x = y = r([k]x ). Thus ∀x : r([k]x )Cr([k]x ). Let σ : G → Mk
be given by σ(A) = CAC. Then σ(G) is the desired isomorphic copy of G. We
can thus assume that the equivalence classes with respect to ∼ are of size 1.
We now return to the study of πA . It satisfies the following property: for x, y ∈ R it holds that xEy ⇔ πA (x)EπA (y).
If xEy then πA (x)A−1 y since A−1 E = A−1 . As A−1 A = E it follows that
πA (x)EπA (y).
Conversely if πA (x)EπA (y) then xAπA (y) since xAπA (x) and AE = A. As
πA (y)A−1 y and AA−1 = E it then follows that xEy.
We can now conclude that πA is a permutation on R: If πA (x) = πA (y) then
πA (x) ∼ πA (y) so x ∼ y, that is, x = y. Also πA is uniquely defined: Assume
π̂A : R → R satisfies
∀x : xAπ̂A (x) ∧ π̂A (x)A−1 x
Let x ∈ R. We then obtain πA (x) ∼ π̂A (x) so πA (x) = π̂A (x). Hence πA = π̂A .
Now we can conclude that {πA |A ∈ G} is a permutation group which is
isomorphic to G. For this we need to show that πAB = πB ◦ πA .
Let x ∈ R. Since xAπA (x) and πA (x)BπB ◦ πA (x) it follows that xABπB ◦ πA (x). Since πB ◦ πA (x)B −1 πA (x) and πA (x)A−1 x it follows that πB ◦ πA (x)B −1 A−1 x, i.e., πB ◦ πA (x)(AB)−1 x. Since πAB is uniquely defined, the result follows.
To show that {πA |A ∈ G} is cyclic we need the following fact, which follows easily from the definition of cylindricality.
Fact: Let A be a relation which can be directly embedded on a cylinder. Let p1 < p2 < · · · < pm and q1 < q2 < · · · < qm , and let π be a permutation on [m] such that ∀i : pi Aqπ(i) . Then π is in the cyclic group of permutations on [m] generated by the cycle (1 2 . . . m).
Now let r1 < r2 < · · · < rm be the elements of R. Write A ∈ G as A =
A1 A2 . . . Ah where the Ai ’s can be directly embedded on the cylinder. Since
ri AπA (ri ) we have, for each i, elements qi0 , qi1 , . . . , qih of [k] with ri = qi0 and qih = πA (ri ), such that qij Aj+1 qij+1 . For fixed j all the qij ’s are distinct. If not, we would have i1
and i2 such that ri1 AπA (ri2 ) and ri2 AπA (ri1 ). But then since πA (ri1 )A−1 ri1 and
πA (ri2 )A−1 ri2 we then get ri1 Eri2 and ri2 Eri1 . That is ri1 ∼ ri2 which implies
ri1 = ri2 . Now by the fact and induction on h we have a permutation π in the
cyclic group generated by the cycle (1 2 . . . m) such that rπ(i) = πA (ri ). Thus πA
is in the cyclic group generated by the cycle (r1 r2 . . . rm ) and we can conclude
that G is cyclic.
⊓⊔
5 Simulation of Cylindrical Circuits by Bounded Depth Circuits
In this section we provide an overview of the proof of Theorem 2 which can be
found in the technical report version of this paper [7].
The rough outline is similar to that of the last section. For fixed k we consider
the following “circuit value problem” CVk : Given a width k cylindrical circuit
and a truth assignment to its input variables, decide if the circuit evaluates to 1.
This is then reduced, by an AC0 reduction, to the word problem of the monoid N̂k defined next, which will be proved to be solvable. By the result of Barrington and Thérien [4] it then follows that CVk is in ACC0 .
Consider a width k cylindrical circuit C with k input nodes, all placed in the
first layer. We can view this as computing a function mapping {0, 1}k to {0, 1}k
by reading off the values of the nodes in the last layer. We let N̂k be the monoid
of such functions mapping {0, 1}k to {0, 1}k .
This provides the base for the desired AC0 reduction in the following way:
Given an instance of the circuit value problem we substitute constants for the
variables according to the truth assignment and then view each layer of the
circuit as an element of N̂k by preceding it with k input nodes. By computing
the product of these and evaluating it on the constants given to the first layer,
the desired result is obtained.
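This product-and-evaluate step can be sketched as follows (a toy illustration; the names and the example layer are ours, not from the paper):

```python
from itertools import product

K = 2  # cylinder width (toy value)

def tabulate(f):
    """Table an element of N̂k as a dict from {0,1}^k to {0,1}^k."""
    return {x: f(x) for x in product((0, 1), repeat=K)}

def monoid_mult(F, G):
    """Monoid operation: apply F (earlier layer), then G (later layer)."""
    return {x: G[F[x]] for x in F}

# Hypothetical toy layer: node 0 becomes the AND of both nodes, node 1 is copied.
layer = tabulate(lambda x: (x[0] & x[1], x[1]))
two_layers = monoid_mult(layer, layer)
print(two_layers[(1, 0)])  # (0, 0)
```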
The monoid N̂k is shown to be solvable as in the previous section, by proving that all its groups are cyclic. A first step to obtain this is to eliminate constants from the circuits corresponding to group elements. Let Nk be the monoid of functions mapping {0, 1}k to {0, 1}k which are computed by width k cylindrical
circuits with k variable input nodes, all placed in the first layer, with constant
input nodes disallowed. It is then proved that every group in N̂k is isomorphic
to a group in Nk .
The tool for studying Nk will be an identification of each input vector in {0, 1}k with its set of maximal 1-intervals, as considered in [3], only here we consider cyclic intervals. For example, the vector 1010011011 is identified with the set of intervals {[3, 3], [6, 7], [9, 1]}.
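This identification is easy to compute; a small sketch (function name hypothetical) that extracts the maximal cyclic 1-intervals, with 1-based positions:

```python
def cyclic_one_intervals(v):
    """Maximal cyclic intervals of 1s in a bit vector v,
    reported as (begin, end) pairs of 1-based positions."""
    s = len(v)
    if not any(v):
        return []
    if all(v):               # a single interval covering everything
        return [(1, s)]
    start = v.index(0)       # begin scanning at a 0, so no interval is split
    intervals = []
    i = 0
    while i < s:
        p = (start + i) % s
        if v[p] == 1:
            begin = p
            while i < s and v[(start + i) % s] == 1:
                end = (start + i) % s
                i += 1
            intervals.append((begin + 1, end + 1))
        else:
            i += 1
    return intervals

print(cyclic_one_intervals([int(c) for c in "1010011011"]))
# [(3, 3), (6, 7), (9, 1)]
```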
Now consider a group G in Nk with identity e, and let f ∈ G. Since e ◦ e = e
we get that e is the identity mapping on the image of e, Im e. Thus any f ∈ G
is a permutation of Im e, since f ◦ f −1 = f −1 ◦ f = e and e ◦ f = f . Also since
f ◦ e = f it follows that f is completely described by its restriction to Im e.
The fact that f has an inverse on Im e is shown to imply that f must preserve the number of intervals in any x ∈ Im e. The crucial property employed here is the monotonicity of the gate operations. This furthermore implies that f is
completely described by its restriction to the set I of vectors in Im e consisting
of only a single interval.
Next, using the natural partial order on I given by lifting the order 0 < 1 pointwise, one can decompose I into antichains, on which f ∈ G is easy to describe. In fact f is a cyclic shift on each of these antichains. Finally, by relating
these cyclic shifts one can conclude that G is a cyclic group.
6 Conclusion and Open Problems
We have located the class of functions computed by small constant width cylindrical circuits (or nondeterministic branching programs) between Π2 ◦ MOD ◦
AC0 and ACC0 . It would be very interesting to get an exact characterisation
of the power of cylindrical circuits and branching programs in terms of bounded
depth circuits. It is not known whether Π2 ◦ MOD ◦ AC0 is different from
ACC0 and this seems a difficult problem to resolve, so we cannot hope for an
unconditional separation of the power of cylindrical circuits from ACC0 . On the
other hand, it seems difficult to generalise the simulation of Π2 ◦ MOD ◦ AC0
by cylindrical branching programs to handle more than one layer of MOD gates
and we tend to believe that such a simulation is in general not possible. Thus,
one could hope that by better understanding the structure of the monoids we
have considered in this paper, it would be possible to prove an upper bound
seemingly better than ACC0 , such as for instance AC0 ◦ MOD ◦ AC0 .
It would also be interesting to separate the power of branching programs
from the power of circuits. As circuits can be trivially negated while preserving
cylindricality, we immediately have that not only Π2 ◦ MOD ◦ AC0 but also
Σ2 ◦ MOD ◦ AC0 can be simulated by small constant width cylindrical circuits.
On the other hand, we don’t know if Σ2 ◦ MOD ◦ AC0 can be simulated by
small constant width cylindrical branching programs. Note that in the upwards
planar case, both models capture AC0 and in the geometrically unrestricted case,
both models capture NC1 , so it is not clear if one should a priori conjecture the
cylindrical models to have different power. Note that if the models have identical
power then they can simulate AC0 ◦ MOD ◦ AC0 . This follows from the fact
that the branching program model is closed under polynomial fan-in AND while
the circuit model is closed under negation.
An interesting problem concerns the blowup of width to depth when going
from a cylindrical circuit or branching program to an ACC0 circuit. Our proof
does not yield anything better than a doubly exponential blowup. Again, by
better understanding the structure of the monoids we have considered, one could
hope for a better upper bound.
Acknowledgements. The first two authors are supported by BRICS, Basic
Research in Computer Science, a Centre of the Danish National Research Foundation.
References
1. D. A. Barrington. Bounded-width polynomial-size branching programs recognize
exactly those languages in NC1 . J. Comput. System Sci., 38(1):150–164, 1989.
2. D. A. M. Barrington, C.-J. Lu, P. B. Miltersen, and S. Skyum. Searching constant width mazes captures the AC0 hierarchy. In Proceedings of the 15th Annual
Symposium on Theoretical Aspects of Computer Science, pages 73–83, 1998.
3. D. A. M. Barrington, C.-J. Lu, P. B. Miltersen, and S. Skyum. On monotone planar
circuits. In 14th Annual IEEE Conference on Computational Complexity, pages
24–31. IEEE Computer Society Press, 1999.
4. D. A. M. Barrington and D. Thérien. Finite monoids and the fine structure of NC1 .
Journal of the ACM (JACM), 35(4):941–952, 1988.
5. V. Grolmusz and G. Tardos. Lower bounds for (MODp − MODm ) circuits. SIAM Journal on Computing, 29(4):1209–1222, Aug. 2000.
6. K. A. Hansen. Constant width planar computation characterizes ACC0 . Technical
Report 25, Electronic Colloquium on Computational Complexity, 2003.
7. K. A. Hansen, P. B. Miltersen, and V. Vinay. Circuits on cylinders. Technical
Report 66, Electronic Colloquium on Computational Complexity, 2002.
8. S. Skyum and L. G. Valiant. A complexity theory based on boolean algebra. Journal
of the ACM (JACM), 32(2):484–502, 1985.
9. V. Vinay. Hierarchies of circuit classes that are closed under complement. In 11th Annual IEEE Conference on Computational Complexity, pages 108–117. IEEE Computer Society, 1996.
Fast Perfect Phylogeny Haplotype Inference
Peter Damaschke
Chalmers University, Computing Sciences, 41296 Göteborg, Sweden
ptr@cs.chalmers.se
Abstract. We address the problem of reconstructing haplotypes in a
population, given a sample of genotypes and assumptions about the underlying population. The problem is of major interest in genetics because haplotypes are more informative than genotypes when it comes
to searching for trait genes, but it is difficult to get them directly by
sequencing. After showing that simple resolution-based inference can be
terribly wrong in some natural types of population, we propose a different combinatorial approach exploiting intersections of sampled genotypes
(considered as sets of candidate haplotypes). For populations with perfect phylogeny we obtain an inference algorithm which is both sound and
efficient. It yields with high probability the complete set of haplotypes
showing up in the sample, for a sample size close to the trivial lower
bound. The perfect phylogeny assumption is often justified, but we also
believe that the ideas can be further extended to populations obeying
relaxed structural assumptions. The ideas are quite different from other
existing practical algorithms for the problem.
1 Introduction
Somatic cells of diploid organisms such as higher animals and plants contain two
copies of genetic material, in pairs of homologous chromosomes. The material on
an arbitrary but fixed part of a single chromosome is called a haplotype. Formally
we may describe a haplotype as a vector (a1 , . . . , as ) where s is the number of
sites considered, and ai is the genetic data at site i. Here the term site can refer to
a gene, a short subsequence, or even a single nucleotide. The ai are called alleles.
The vector of unordered pairs ({a1 , b1 }, . . . , {as , bs }) resulting from haplotypes (a1 , . . . , as ) and (b1 , . . . , bs ) on homologous chromosomes is called a genotype. A site is homozygous if ai = bi , and heterozygous (or ambiguous) if ai ≠ bi . The terminology in the literature is not completely standardized; in the present paper we use it as introduced above.
Usual sequencing methods yield only genotypes but not the pairs of haplotypes they are built from, the so-called phase information. Haplotyping techniques exist, but they are much more expensive, and it is expected that this will remain so for many years. On the other hand, haplotype data is
often needed for analyzing the background of hereditary dispositions.
For example, a hereditary trait often originates from a single mutation on
a chromosome that has been transmitted over generations, and further silent
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 183–194, 2003.
c Springer-Verlag Berlin Heidelberg 2003
mutations (without effect) supervened. This way the trait is associated with a
certain subset of haplotypes. If one wants to find the relevant mutation amongst
the silent ones, it is useful to recognize haplotypes of affected individuals and to
search the corresponding chromosomes only. Genotype information alone is less
specific, also for the purpose of prediction of traits. Other applications include
questions from population dynamics. Therefore it is important to reconstruct
haplotypes from observed genotypes.
A genotype with k > 0 ambiguous sites can be explained by 2k−1 distinct haplotype pairs, and reconstruction is impossible if we consider isolated genotypes only. However, if we have a large enough genotype sample from a population and a proper assumption about the structure of this population, we may be able to infer the haplotypes with high confidence. One such assumption is:
Definition 1. A population fulfills the random mating assumption (is in Hardy–Weinberg equilibrium) if the haplotypes form pairs at random, according to their
frequencies in the population, i.e. the probability to have a specific ordered pair
of haplotypes in a randomly chosen genotype is simply the product of their frequencies.
Although this is not perfectly true in real populations, due to mating preferences and spatial structure, the behaviour of an inference algorithm in such a
setting says much about its appropriateness.
We focus attention on the biallelic case where each ai has two possible values which we may denote by Boolean constants 0 and 1. This is not a severe
restriction because there exist only two alleles per locus if mutations affect every
locus only once, which is typically the case. For notational convenience we write
haplotypes as binary strings and genotypes as ternary strings where 0,1, and 2
stand for {0, 0}, {1, 1}, and {0, 1}, respectively.
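With this encoding, forming the genotype of two haplotypes is a site-wise operation (a minimal sketch; the function name is ours):

```python
def genotype(a, b):
    """Combine two haplotypes (binary strings) into a genotype string:
    '0' for {0,0}, '1' for {1,1}, '2' for the ambiguous pair {0,1}."""
    return "".join(x if x == y else "2" for x, y in zip(a, b))

print(genotype("0011", "0101"))  # "0221"
```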
Definition 2. For β ⊂ {0, 1, 2}, the β-set of a genotype or haplotype is the set of all sites whose value is in β. We omit set parentheses in β.
Sometimes it is convenient to rename the alleles such that some specific
haplotype is w.l.o.g. the zero string 00 . . . 0. Note that the 2-sets of genotypes
are invariant under this renaming.
One may also think of haplotypes as vertices of the s-dimensional cube of
Boolean vectors of length s. Having this picture in mind, we identify a genotype
with the subcube c having the generating haplotype pair as one of its diagonals,
i.e. with the set of haplotypes a ∈ c. This relation holds true iff ai = ci for all i
in the 0,1-set of c. We will use the notations interchangeably.
Related literature and our contribution. We try to give an overview of
various attempts, and we apologize for any omission.
In [2], the following resolution (or subtraction) method has been proposed.
Assume that our sample contains a genotype with no or one ambigous site.
Then we immediately know one or two haplotypes, respectively, for sure. They
are called resolved haplotypes. For any resolved haplotype a = a1 . . . as and any
genotype c = c1 . . . cs such that ci ≠ 2 implies ci = ai , it is possible that c is composed of a and another haplotype b defined by bi = ai for ci ≠ 2, and bi = 1 − ai for ci = 2. We call b the complement of a in c. The classical resolution
algorithm simply assumes that c is indeed built from a and b, it considers b as a
new resolved haplotype, and removes c as a resolved genotype from the sample,
and so on, until no further resolution step can be executed.
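A single resolution step then amounts to the following complement computation (an illustrative sketch under the stated consistency condition; the function name is ours):

```python
def complement(a, c):
    """Complement of resolved haplotype a in genotype c (ternary string):
    agrees with a on homozygous sites, flips a on ambiguous ('2') sites.
    Assumes a is consistent with c on c's homozygous sites."""
    return "".join(ai if ci != "2" else str(1 - int(ai))
                   for ai, ci in zip(a, c))

print(complement("0110", "0212"))  # "0011"
```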
Objections against this heuristic have been noticed already in [2]: A minor problem is that we may not find a resolved haplotype to start with. A large
enough sample will contain some homozygous genotypes w.h.p. (Here and henceforth, w.h.p. means: with high probability.) More seriously, any resolution step
may be wrong, i.e. the subcube c containing vertex a may actually be formed
by a different haplotype pair. This is called an anomalous match. Even worse,
further resolution steps starting from a false haplotype b may cause a cascade
of such errors. The rash removal of resolved genotypes is yet another source of
errors, since the same genotype may well be formed by different haplotype pairs
in a population.
Resolution has been further studied in [7,8]. The output depends on the order in which the steps are performed, and the “true” ordering must resolve all genotypes. Unfortunately, the corresponding maximization problem of resolving as many genotypes as possible is Max-SNP hard [7], and moreover, a large number of resolved genotypes does not guarantee that the inferred haplotypes are correct. (There exist some conjectures, heuristic reasoning, and experimental results around this question, but apparently without rigorous theoretical foundation.) More advanced resolution algorithms solve an integer programming problem on a resolution graph constructed from the sample, and they can find good results in experiments [8], but still the reliability question remains.
A completely different approach to haplotype inference is Bayesian statistics
under the random mating assumption. We refer to [4,11,12]. Although accuracy
has certainly been noticed as an issue, it is not obvious how reliable every single
haplotype in output sets of the various algorithms actually is.
In the present paper we address the question of reliable combinatorial haplotype inference methods. For haplotype populations having a perfect phylogenetic
tree (definitions are given later) we show that a combinatorial algorithm which
is different from resolution is able to infer all haplotypes w.h.p. from a large
enough sample, whereas resolution is provably bad.
The perfect phylogeny assumption was first used for haplotype inference in [9], resulting in an almost linear but very complicated algorithm (via reduction to the graph realization problem). Slower but practical and elegant algorithms were discovered shortly thereafter, independently, in [1,3], and they proved
useful on real data. The work presented here (including the principal idea to
exploit perfect phylogeny structure) was mainly finished before we became aware
of [9,1,3]. We propose another elementary algorithm. It happens to be quite
different from the algorithms in [1,3] which work with pairs of sites. Our approach
is “orthogonal” so to speak, as it works with pairs of genotypes. This can be
advantageous for the running time, since only certain pairs of genotypes have
to be considered. It should be noticed that the algorithms in [1,3] output a
representation of all consistent haplotyping results, whereas our primary goal
is to output the haplotypes that can be definitely determined. We also study
the size of a random sample that leads to a unique result w.h.p. This does not
mean that the method gives a result only in the latter case: It still resolves
many haplotypes if fewer genotypes are available, and it is incremental in the
sense that new genotype data can be easily incorporated. Due to the different
approach and focus, our expected time complexity is not directly comparable to
the previous bounds, but under some circumstances it seems to be favourable.
(Details follow later.)
We believe that our approach complements the arsenal of haplotype inference methods. It seems that the ideas can be generalized to more complicated
populations.
2 Preliminaries
In addition to the notions already introduced, we clarify some more terminology
as we use it in the paper.
Definition 3. The genotype formed by haplotypes a and b (where a = b is allowed) is simply denoted ab. Haplotype b is called the complement of a in ab,
and vice versa.
Recall that we sometimes consider genotypes as sets (subcubes) of haplotypes, and note that each haplotype has a unique complement in a genotype.
Definition 4. A population is a set P of haplotypes, equipped with a frequency
of each haplotype in P . Clearly, the frequencies sum up to 1. A sample from P is
a multiset G of genotypes (not haplotypes!) ab with a, b ∈ P . (The same genotype
may appear several times in G.) An anomalous match, with respect to G, is a
triple of haplotypes a, b, c such that a, b ∈ P , ab ∈ G, c ∈ ab, but the complement
of c in ab is not in P .
An anomalous match can cause a wrong resolution step, if c is used to resolve
ab. (We do not demand c ∈ P since c may already be result of an earlier false
resolution step.)
Since very rare haplotypes are hard to find but, on the other hand, are also
of minor significance, we take a parameter n and aim at finding those haplotypes
with frequency at least 1/n. Suppose that n is chosen large enough such that
these haplotypes cover the whole P , up to some negligible fraction.
In the following we adopt the random mating assumption and make some
technical simplifications for the analysis later on. We emphasize that they are
not substantial and do not affect the algorithm itself. Let fi (i = 1, 2, . . .) denote
the haplotype frequencies. We fix some parameter n and aim at identifying all
haplotypes with fi ≥ 1/n, where n is chosen large enough such that the fi <
1/n sum up to a negligible fraction. In the worst case P contains n different
haplotypes, all with fi = 1/n. In general we will for simplicity pretend that all
fi are (roughly) integer multiples of 1/n. Then a haplotype of frequency fi is
considered as a set of fi n haplotypes which are equal as strings. Henceforth,
if we speak of “k haplotypes” or “k genotypes”, we do not require that they
are pairwise different. We say “identical” and “distinct” when we refer to these
copies of haplotypes and genotypes, and “equal” and “different” when we refer
to their string values. The probability that a randomly chosen genotype yields
a resolved haplotype is 1/n in the worst case.
Definition 5. The sample graph of G has vertex set P (consisting of n distinct
haplotypes) and edge set G, that is: An edge joins two haplotypes if they produced
the genotype corresponding to that edge.
A sample graph may contain loops (completely homozygous genotypes) and
multiple edges (if the same haplotype pair is sampled several times). Note that
the sample graph is of course not “visible”, otherwise we would already know P .
Our focus is on asymptotic results, so we consider sums of sufficiently many independent random variables, which are sharply concentrated around their expected values, so that we may simply treat these expectations as deterministic values.
A well-known result on the coupon collector’s problem says that, if we choose
one of k objects at random then, after O(k log k) such trials, we have w.h.p.
touched every object at least once (see e.g. [10]). Consequently, if we sample
O(n2 log n) genotypes then w.h.p. all haplotypes are trivially resolved, because
all vertices in the sample graph get loops. The interesting question is what can
be accomplished by a smaller sample. Thus, suppose that G has size n1+g , with
g < 1. Then the sample graph has loops at (expected) ng distinct vertices and
about n1+g further edges between distinct vertices.
3 Populations with Tree Structure
Now we approach the particular contribution of this paper. A natural special
type of population has a single founder haplotype and is exposed to random
mutations over time. As long as the population is relatively young and the total
number of mutations (and hence n) is bounded by some small fraction of √s,
w.h.p. each of the s sites is affected at most once. (Calculations are simple.)
Non-affected sites can be ignored, therefore s henceforth denotes the number of
sites where different alleles appear. From the uniqueness of mutations at every
site it follows that such a population P forms a phylogenetic tree T that enjoys
some strong properties discussed below. We call T a perfect phylogeny [6].
Definition 6. A population P of s-site haplotypes has a perfect phylogeny T if
the following holds:
(1) T is a tree. The vertices of T are labeled by haplotypes (bit strings) such that:
(1.1) P is a subset of the vertex set of T .
(1.2) Labels of any two vertices joined by an edge in T differ on exactly one site.
(2) Edges of T are labeled by sites, such that:
(2.1) The label of every edge is the site mentioned in (1.2).
(2.2) Each site is the label of at most one edge.
A branch vertex of T is a vertex with degree > 2.
The vertices of T can be seen as the haplotypes that appeared in the history
of P . However not every vertex is necessarily in P , since it can have disappeared
by extinction. Every edge in T is labeled by the site of the allele that has been
changed by the mutation corresponding to that edge. Sometimes we identify
vertices and edges of T with their labels, i.e. haplotypes and sites, respectively.
Note that T is an undirected tree. (Knowing the root is immaterial for our
purpose.) The distance of two vertices in T equals the Hamming distance of
their labels.
For every pair of haplotypes a, b let [a, b] = [b, a] denote the unique path (of
length 0 if a = b) in T connecting a and b. Obviously, edge labels on [a, b] are
exactly the members of the 2-set of ab. It follows easily:
Lemma 1. A haplotype c from T belongs to (the subcube) ab if and only if the
vertex labeled c is on [a, b].
Proof. We have c ∈ ab iff a, b, c agree at all sites in the 0,1-set of ab. These sites
are exactly the labels of edges out of [a, b].
⊓⊔
Lemma 1 implies that every such triple a, b, c is an anomalous match, unless
c = a or c = b: If the complement d of c in ab were in P then d is on [a, b],
and [c, d] = [a, b], an obvious contradiction. Therefore we have many anomalous
matches already in trivial cases: Θ(n3 ) if T is a path. Even in more natural
cases such as fat trees, the number of anomalous matches is still in the order of
n2 log n.
In general, suppose that we have n2+d anomalous matches and sampled n1+g random genotypes. Consider any of the ng haplotypes in P which are resolved right from the beginning. It plays the role of c in (expected) n1+d anomalous matches, but it has only 2ng true haplotypes as neighbors in the sample graph. That means that already for d > g − 1, almost all resolution results would be
false. (In contrast to perfect trees, resolution is a very good method if parts of
the genetic material under consideration have a high mutation rate: O(log n)
random sites are enough to destroy all anomalous matches.)
In the next section we address haplotype inference from a genotype sample G,
provided that the given population P has a perfect phylogeny. Since resolution is
highly misleading then, we follow another natural idea: We utilize intersections
of genotypes (considered as subcubes) from sample G.
4 Haplotype Inference in a Perfect Phylogeny
Problem statement: Given an unknown population P of haplotypes and a
known sample G of genotypes, as in Definition 4. We assume (or: it is promised)
that P has a perfect phylogeny T (unknown, of course). Identify as many haplotypes in P as possible.
We continue analyzing the problem. Note that the intersection of any two
paths in T , say [a, b] and [c, d], is either empty or a path, say [e, f ]. Genotype
intersection neatly corresponds to path intersection in T :
Lemma 2. With the above denotations, the intersection of genotypes ab and cd
is the genotype ef .
Proof. W.l.o.g. let a − e − f − b and c − e − f − d be the ordering of vertices
a, b, c, d, e, f (not necessarily distinct) on path [a, b] and [c, d], respectively. Let
the label of e be w.l.o.g. the zero string. Let A, B, C, D, F denote the set of edge
labels on [a, e], [b, f ], [c, e], [d, f ], [e, f ], respectively. Then the label of a, b, c, d, f
has the 1-set A, B ∪ F, C, D ∪ F, F , respectively. Hence ab has the 2-set A ∪ B ∪ F
and the 1-set ∅. Similarly, cd has the 2-set C ∪D∪F and the 1-set ∅. We conclude
that ab ∩ cd has the 2-set F and the 1-set ∅. On the other hand, ef has the 2-set
F and the 1-set ∅. Now equality follows.
⊓⊔
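In the ternary-string representation, this genotype intersection is computed site by site (a minimal sketch with a hypothetical function name; None marks the empty intersection):

```python
def intersect(g, h):
    """Intersect two genotypes viewed as subcubes of {0,1}^s.
    Sites: '0' = {0}, '1' = {1}, '2' = {0,1}. Returns None if empty."""
    out = []
    for x, y in zip(g, h):
        if x == y:
            out.append(x)
        elif x == "2":
            out.append(y)
        elif y == "2":
            out.append(x)
        else:           # {0} and {1} have empty intersection
            return None
    return "".join(out)

print(intersect("2210", "0220"))  # "0210"
```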
Due to this exact correspondence we sometimes use the notions genotype and
path interchangeably if we do not risk confusion.
Definition 7. For a subset S of vertices in T , the hull [S] of S is the unique
smallest subtree of T that includes S.
Algorithm, phase 1: We reconstruct [U] where U is the set of haplotypes
known in the beginning (i.e., genotypes of size 1 and 2), utilizing the algorithm
of [5], which runs in O(ns) time. Clearly, the output [U] is a (correct) subtree of T
since this reconstruction problem has a unique solution up to isomorphism.
While the labels of vertices in U are already determined, we have to compute
the labels of branch vertices in [U] as well. For any branch vertex d, there exist
three vertices a, b, c ∈ U such that the paths from d to them are pairwise
edge-disjoint. By Lemma 1, d belongs to each of ab, ac, bc. Given three binary strings
a, b, c of length s, their majority string, also of length s, is simply defined as
follows: at each position, the bit in the majority string is the bit appearing
there in a, b, c two or three times.
Lemma 3. With the above denotations, the label of d is the majority string of
labels of a, b, c.
Proof. Consider any bit position i, and w.l.o.g. let 1 be the bit which has majority
among ai, bi, ci. W.l.o.g. let ai = bi = 1. Since d ∈ ab, we must have di = 1. ⊓⊔
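The majority string of Lemma 3 is computable in O(s) time per branch vertex; a minimal sketch (the function name is ours):

```python
def majority_string(a, b, c):
    """Position-wise majority of three equal-length binary strings.

    At each position the result is the bit occurring two or three
    times among a, b, c (for binary strings a majority always exists).
    """
    return ''.join(x if x in (y, z) else y  # x differs from both => y == z
                   for x, y, z in zip(a, b, c))
```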
Algorithm, phase 2: Compute the labels of all branch vertices d in [U ] in
O(ns) time, using Lemma 3. Note that we can choose some fixed vertex from U
as a, and b, c as descendants of two distinct children of d in the tree rooted at a.
Let U′ be the union of U and the set of branch vertices in [U]. Note that
[U′] = [U], and that U′ partitions [U] into edge-disjoint paths. Since we have
the vertex labels in U′, we know the 2-set assigned to each of these paths, but
not the internal linear ordering of edge labels. This gives reason to define the
following data structure:
190
P. Damaschke
Definition 8. A path-labeled tree consists of:
- a tree,
- a subset of its vertices called pins,
- labels of the pins,
- labels of the pin paths,
where a pin path is a path that connects two pins, without a further pin as
internal vertex.
In our case, every pin path label is simply the set of edge labels on that pin
path, i.e., we forget the ordering of edge labels, and the set of pins is initially
U′. The path-labeled tree for [U′] can be finished in O(ns) time, as we know the
labels of pins, including all the branch vertices. Sometimes we abuse notation
and identify edges and their labels if the context is clear.
Algorithm, phase 3: For each genotype in G, compute the intersection of its
2-set with [U]. Recall that this intersection must be a path in [U] (since the
2-set of every genotype ab ∈ G with a, b ∈ P corresponds to the path [a, b] in T).
In particular, if the 2-set of a genotype is entirely contained in [U], we conclude
that the ends of this path are haplotypes in P. All intersections are obviously
computable in O(n^{1+g} s) time.
In our path-labeled tree we recover the labels of the end vertices of all (at most
n^{1+g}) intersection paths [a, b] (where not necessarily a, b ∈ P), as described
in the following. Path [a, b] intersects one or more pin paths in [U], and we
can recognize these pin paths by nonempty intersection of their labels with the
known 2-set of ab. If an end of [a, b], say a, happens to be a pin, then nothing
remains to be done with a. Otherwise a is an inner vertex of a pin path with
ends denoted by c and d. If [a, b] intersects parts of [U] outside [c, d], let c be
that end of [c, d] not included in [a, b]. By computing set differences we get the
path labels of [a, d] and [c, a]. Since we know the label of pin c, and now also the
2-set of ca, we can change exactly those sites of c lying in this 2-set and obtain
the label of a. (By symmetry we could also start from d.) Due to this refinement
of the path-labeled tree, a satisfies all requirements to become a new pin.
A slightly more complicated argument applies if [a, b] is contained in [c, d].
Again let a denote the end of [a, b] closer to c. Since we have the label of
c and the 0-, 1-, and 2-set of ab, we can split the set of sites into three subsets: the
2-set of ab, and the remaining sites whose values are equal and different, respectively,
in c and ab. (Note that their values are 0 or 1.) If we walk along the path [c, d] starting
in c, the sites in the 2-set and those being equal in c and ab cannot be changed
before a is reached, whereas the sites being different must be changed before a is
reached. These conditions uniquely determine the path label of [c, a]. Once this
path label is established, we recover the label of a as in the previous case.
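Stepping from a known pin label to the label of a new pin amounts to flipping exactly the sites in the connecting path's 2-set, since each site mutates exactly once in T. A sketch (ours; 2-sets are represented as index sets):

```python
def label_from_pin(pin_label, two_set):
    """Label of the vertex reached from a pin by a path whose 2-set is
    `two_set`: exactly the sites in the 2-set are flipped, because every
    edge label (site) occurs at most once along a path in T."""
    bits = list(pin_label)
    for i in two_set:
        bits[i] = '1' if bits[i] == '0' else '0'
    return ''.join(bits)
```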
This refinement of the path-labeled tree is successively done for all genotypes
from G. The operations, which are merely manipulations along paths in [U′], can
be implemented in O(n^{1+g} s) time for all genotypes.
We summarize the preliminary results in
Lemma 4. We can identify, in O(n^{1+g} s) time, all haplotypes a ∈ P for which
there exists another haplotype b ∈ P such that ab ∈ G, and a, b ∈ [U]. ⊓⊔
Next we try to identify also haplotypes that do not fulfill the condition in
Lemma 4. Let ab ∈ G be a genotype such that [a, b] intersects [U ], in at least
one vertex or in some path. The part of [a, b] outside [U ] may consist of two
paths. Obviously, it is not possible to determine the correct splitting of the 2-set
of ab if we solely look at ab. However, we shall see that pairwise intersections of
genotypes are useful.
Definition 9. At any moment, the known part K of T is the subtree represented
by our path-labeled tree as described in Definition 8, where each pin is a haplotype
from P or a branch vertex or both.
In particular, after the steps leading to Lemma 4 we have K = [U ].
Consider a, b, c, d ∈ P with ab, cd ∈ G, ab ≠ cd, and ab ∩ cd ≠ ∅. W.l.o.g.
suppose that ab ⊈ cd. Due to Lemma 2, these assumptions imply that
[e, f] := [a, b] ∩ [c, d] ≠ ∅, and that some edge of [a, b] is not in [e, f] but incident
to e or f. Let us call this edge an anchor. Remember that we can easily compute
the 0-, 1- and 2-set of ef from the sampled ab and cd. From the 2-set we also get
K ∩ [e, f] if this intersection contains at least one edge. By the same method as
described in phase 3, using the labels of pins and pin paths, we can also determine
the labels of the ends of K ∩ [e, f] and thus the precise location of K ∩ [e, f] in K,
and split the path labels of affected pin paths in K accordingly.
With the denotations from the previous paragraph, next suppose that the
anchor is also an edge of K. We can recognize if this is true, since we know that
K ∩ [a, b] is a path in K extending K ∩ [e, f ], and we know the corresponding
2-sets. In fact, an anchor belongs to K iff the 2-set of K ∩ [a, b] properly contains
the 2-set of K ∩ [e, f ].
Definition 10. With respect to K, we call ab, cd an anchored pair of genotypes
if they have a nonempty intersection which also intersects K in at least one edge,
[e, f ] = [a, b] ∩ [c, d] is not completely in K, and they have an anchor, i.e. an
edge from the set difference, incident to e or f , in K.
In that case we can conclude that one end of path [e, f ] in T is exactly the
vertex of K where the anchor is attached to [e, f ], since otherwise [e, f ] would
not be the intersection of [a, b] and [c, d]. (This picture of a fixed point where
some “rope” ends inspired the naming “anchor”.) Finally, if we start at the
anchor and trace the edges of K whose labels are in the known 2-set of ef ,
we can reconstruct the entire path [e, f ], thereby adding its last part to K. In
particular, e and f and the vertex where [e, f ] leaves K become pins in tree K
extended by [e, f ] \ K. Thus, if [e, f ] is not entirely in K, we have extended the
known part of T .
Algorithm, phase 4: Choose an anchored pair of genotypes and extend K.
Repeat this step as long as possible. Resolve the genotypes whose paths are
completely contained in K, as in Lemma 4.
Rephrasing Definition 10 we see that a pair of genotypes is anchored if their
intersection paths with K end in the same vertex, x say, in K, their other ends
in K are different, and the part of the intersection of their 2-sets not yet in K is
nonempty. Testing any two genotypes from G for nonempty intersection outside
K takes O(s) time, and each pair must be tested at most once: If the test fails,
the intersection outside K will always be empty, since K only grows. If the test
succeeds, the missing part of the intersection is attached to K at x. This gives
a naive overall time bound of O(n^{2(1+g)} s). However, the nice thing here is that
we need not check all pairs in G in order to find anchored pairs. (The following
is simpler than the implementation suggested in an earlier manuscript.)
In a random sample G we can expect that every set of genotypes in G whose
intersection paths in K end at the same vertex x is much smaller than n^{1+g}.
Since tests can be restricted to paths that end in the same x, this already gives
an improvement. Moreover, the remaining 2-sets of genotypes outside K can
be maintained in O(n^{1+g} s) time during the course of the algorithm. To find an
anchored pair with common end vertex x we may randomly pick such paths,
first with mutually distinct other ends, and mark their edges outside K in an
array of length < s. As long as no intersection is found, the time is within
O(s). If the degree of x is smaller than the number of distinct ends, we find a
nonempty intersection in O(s) time by the pigeonhole principle. Otherwise, since
the sample graph is random, a nonempty intersection involves w.h.p. two paths
with distinct ends in K, such that a few extra trials succeed. Thus we conjecture
O(s^2 + n^{1+g} s) expected time for all O(s) extension steps, under the probabilistic
assumptions made, but the complete analysis could be subtle.
The algorithms in [1,3] both run in guaranteed time O(n^{1+g} s^2) (in our
terminology); recall, however, that they also output a representation of not completely
identified haplotypes, and that improved time bounds might be established. It
is hard to compare the algorithms directly.
To summarize our haplotype inference algorithm for tree populations: First
determine the set U of resolved haplotypes (i.e., genotypes being homozygous in
all positions except at most one), set up the path-labeled tree description of
K = [U], and then successively refine and enlarge it by paths from G in K and
intersection paths of anchored pairs, as long as possible.
With all the notation from above we can now state the following, still rather
technical result:
Lemma 5. Given a sample G of genotypes from a population P of haplotypes
with perfect phylogeny, we can determine, in polynomial time, all haplotypes
v ∈ P that satisfy these two conditions: v belongs to the subtree K of T obtained
by successively adding, to the initially known subtree [U], intersection paths of
anchored pairs, and v is an endpoint of some path from G in the final K. ⊓⊔
Note that Lemma 5 is a combinatorial statement, saying which haplotypes
can at least be inferred from a given sample G. No probabilistic assumptions have
been made at this stage. However, if we plug in the random mating assumption,
we can expect that singleton intersections and anchors occur frequently enough
such that the final subtree K covers the entire population P :
Theorem 1. Given a population of n haplotypes with perfect phylogeny which
form genotypes by random mating, our algorithm reconstructs the population
w.h.p. from a random sample of n^{1+g} genotypes, where for any desired
confidence level, any g > 0 is sufficient for large enough n.
Proof. (Sketch) In T we may assign to every path from G a random orientation,
such that the bundles of roughly n^g paths starting in each vertex of P are
pairwise independent random sets. This can only double the sample size estimate,
but it simplifies the argument. Recall that initially K = [U] where U is the set
of haplotypes known from the beginning. The expected number of elements in
U is n^g. A component (maximal subtree) of T \ K of size larger than Õ(n^{1−g})
does not exist w.h.p., since it would contain w.h.p. an element from U, which is
impossible by definition of K.
Now let v ∈ P be any vertex in any component C of T \ K. Some pair of
paths from G starting in v has an anchor in K that allows us to extend K up to
v, unless all these paths end in the same component of T \ K or at the same
vertex in K. Since roughly n^g paths of G start in v and end in random vertices,
the probability of this bad event is of the order (1/n^g)^{n^g} for any single v, and
at most n times as large for all v. Thus we will eventually have K = [P] w.h.p.,
and all haplotypes inside K can be recovered. ⊓⊔
If the haplotype fractions fi < 1/n sum up to some considerable fraction r(n),
the analysis goes through, only at the cost of another factor 1/(1 − r(n))^2 = O(1)
in the sample size.
The tradeoff between error probability and sample size may be further
analyzed. Here it was our main concern to show that much fewer than O(n^2)
genotypes are sufficient. We may also recognize a larger part of T in the beginning,
since one can show that intersections of genotypes with cardinality at most
2 must be vertices of T; on the other hand, it costs extra time to find them.
5 Conclusions
Although perfect phylogeny is more than just a narrow special case, as discussed
in [9,1,3], some extensions are desirable. Can we still apply the ideas if P has arisen
from several founders by mutations, if mutations affected some sites more than
once, if several evolutionary paths led to the same haplotype, if mutations are
interspersed with a few crossover events, etc.?
If P consists of several perfect phylogenetic trees with pairwise Hamming
distance greater than the number of mutations in each tree, the method obviously
works with slight modification: Genotypes with 2-set larger than this distance
are ignored. Since the others are composed of two haplotypes from the same tree,
the trees can be recovered independently. The fraction of “useful” genotypes in
a random sample, and thus the blow-up in sample size, is constant, for any
constant number of trees. However, this trivial extension is no longer possible if
the trees are not so well separated.
Acknowledgments. This work was partially supported by SWEGENE and by
The Swedish Research Council (Vetenskapsrådet), project title “Algorithms for
searching and inference in genetics”, file no. 621-2002-4574. I also thank Olle
Nerman (Chalmers, Göteborg) and Andrzej Lingas (Lund) for some inspiring
discussions.
References
1. V. Bafna, D. Gusfield, G. Lancia, S. Yooseph: Haplotyping as perfect phylogeny:
A direct approach, UC Davis Computer Science Tech. Report CSE-2002-21
2. A. Clark: Inference of haplotypes from PCR-amplified samples of diploid populations, Mol. Biol. Evol. 7 (1990), 111–122
3. E. Eskin, E. Halperin, R.M. Karp: Large scale reconstruction of haplotypes from
genotype data, 7th Int. Conf. on Research in Computational Molecular Biology
RECOMB’2003, 104–113
4. L. Excoffier, M. Slatkin: Maximum-likelihood estimation of molecular haplotype
frequencies in a diploid population, Amer. Assoc. of Artif. Intell. 2000
5. D. Gusfield: Efficient algorithms for inferring evolutionary trees, Networks 21
(1991), 19–28
6. D. Gusfield: Algorithms on Strings, Trees and Sequences: Computer Science and
Computational Biology, Cambridge Univ. Press 1997
7. D. Gusfield: Inference of haplotypes from preamplified samples of diploid populations, UC Davis, technical report csse-99-6
8. D. Gusfield: A practical algorithm for optimal inference of haplotypes from diploid
populations, 8th Int. Conf. on Intell. Systems for Mol. Biology ISMB’2000 (AAAI
Press), 183–189
9. D. Gusfield: Haplotyping as perfect phylogeny: Conceptual framework and efficient solutions (extended abstract), 6th Int. Conf. on Research in Computational
Molecular Biology RECOMB’2002, 166–175
10. R. Motwani, P. Raghavan: Randomized Algorithms, Cambridge Univ. Press 1995
11. M. Stephens, N.J. Smith, P. Donnelly: A new statistical method for haplotype
reconstruction from population data, Amer. J. Human Genetics 68 (2001), 978–
989
12. J. Zhang, M. Vingron, M.R. Hoehe: On haplotype reconstruction for diploid populations, EURANDOM technical report, 2001
On Exact and Approximation Algorithms for
Distinguishing Substring Selection
Jens Gramm⋆ , Jiong Guo⋆⋆ , and Rolf Niedermeier⋆⋆
Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Sand 13,
D-72076 Tübingen, Fed. Rep. of Germany
{gramm,guo,niedermr}@informatik.uni-tuebingen.de
Abstract. The NP-complete Distinguishing Substring Selection
problem (DSSS for short) asks, given a set of “good” strings and a set
of “bad” strings, for a solution string which is, with respect to Hamming
metric, “away” from the good strings and “close” to the bad strings.
Studying the parameterized complexity of DSSS, we show that DSSS
is W[1]-hard with respect to its natural parameters. This, in particular, implies that a recently given polynomial-time approximation scheme
(PTAS) by Deng et al. cannot be replaced by a so-called efficient polynomial-time approximation scheme (EPTAS) unless an unlikely collapse
in parameterized complexity theory occurs.
By way of contrast, for a special case of DSSS, we present an exact
fixed-parameter algorithm solving the problem efficiently. In this way,
we exhibit a sharp border between fixed-parameter tractability and
intractability results.
Keywords: Algorithms and complexity, parameterized complexity, approximation algorithms, exact algorithms, computational biology.
1
Introduction
Recently, there has been strong interest in developing polynomial-time approximation schemes (PTAS’s) for several string problems motivated by computational molecular biology [6,15,16]. More precisely, all these problems adhere to a
scenario where we are looking for a string which is “close” to a given set of strings
and, in some cases, which shall also be “far” from another given set of strings
(see Lanctot et al. [14] for an overview on these kinds of problems and their applications in molecular biology). The underlying distance measure is Hamming
metric. The list of problems in this context includes Closest (Sub)String [15],
Consensus Patterns [16], and Distinguishing (Sub)String Selection [6].
All these problems are NP-complete, hence polynomial-time exact solutions are
out of reach and PTAS’s might be the best one can hope for. PTAS’s, however,
⋆ Supported by the Deutsche Forschungsgemeinschaft (DFG), project OPAL (optimal
solutions for hard problems in computational biology), NI 369/2.
⋆⋆ Partially supported by the Deutsche Forschungsgemeinschaft (DFG), junior research
group PIAF (fixed-parameter algorithms), NI 369/4.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 195–209, 2003.
© Springer-Verlag Berlin Heidelberg 2003
196
J. Gramm, J. Guo, and R. Niedermeier
often carry huge hidden constant factors that make them useless from a practical
point of view. This difficulty also occurs with the problems mentioned above.
Hence, two natural questions arise.
1. To what extent can the above approximation schemes be made really practical?^1
2. Are there, besides pure heuristics, theoretically satisfying approaches to solve
these problems exactly, perhaps based on a parameterized point of view [2,
10]?
In this paper, we address both these questions, focusing on the Distinguishing
Substring Selection problem (DSSS):
Input: Given an alphabet Σ of constant size, two sets of strings over Σ,
– Sg = {s1, . . . , s_{kg}}, each string of length at least L (the “good” strings),^2
– Sb = {s′1, . . . , s′_{kb}}, each string of length at least L (the “bad” strings),
and two non-negative integers dg and db.
Question: Is there a length-L string s over Σ such that
– in every si ∈ Sg, for every length-L substring ti, dH(s, ti) ≥ dg, and
– every s′i ∈ Sb has at least one length-L substring t′i with dH(s, t′i) ≤ db?
Here, dH(s, ti) denotes the Hamming distance between strings s and ti. Following
Deng et al. [6], we distinguish DSSS from Distinguishing String Selection
(DSS) in which all good and bad strings have the same length L; note that
Lanctot et al. [14] did not make this distinction and denoted both problems as
DSS.
The above-mentioned Closest Substring is the special case of DSSS where
the set of good strings is empty. Furthermore, Closest String is the special
case of Closest Substring where all given strings and the goal string have the
same length. Since Closest String is known to be NP-complete [12,14], the
NP-completeness of Closest Substring and DSSS immediately follows.
All the mentioned problems carry at least two natural input parameters (“distance” and “number of input strings”) which often are small in practice when
compared to the overall input size. This leads to the important question whether
the seemingly inevitable “combinatorial explosion” in exact algorithms for these
problems can be restricted to some of the parameters—this is the parameterized
1 As Fellows [10] put it in his recent survey, “it would be interesting to sort out which
problems with PTAS’s have any hope of practical approximation”. Also see the new
survey by Downey [7] for a good exposition on this issue.
2 Deng et al. [6] let all good strings be of the same length L; we come back to this
restriction in Sect. 4. The terminology “good” and “bad” has its motivation in the
application [14] of designing genetic markers to distinguish the sequences of harmful
germs (to which the markers should bind) from human sequences (to which the
markers should not bind).
complexity approach [2,7,8,10]. In [13], it was shown that for Closest String
this can successfully be done for the “distance” parameter as well as the parameter “number of input strings”. However, Closest String is the easiest
of these problems. As to Closest Substring, fixed-parameter intractability
(in the above sense of restricting combinatorial explosion to parameters) was
recently shown with respect to the parameter “number of input strings” [11].
More precisely, a proof of W[1]-hardness (see [8] for details on parameterized
complexity theory) was given. It was conjectured that Closest Substring is
also fixed-parameter intractable with respect to the distance parameter, but it
is an open question to prove (or disprove) this statement.^3
Now, in this work, we show that DSSS is fixed-parameter intractable (i.e., W[1]-hard)
with respect to all natural parameters as given in the problem definition
and, thus, in particular, with respect to the distance parameters. Besides its
interest in its own right concerning the impossibility^4 of efficient exact fixed-parameter
algorithms, this result also has important consequences concerning approximation
algorithms. More precisely, our result implies that no efficient polynomial-time
approximation scheme (EPTAS) in the sense of Cesati and Trevisan [5] is
available for DSSS. As a consequence, there is strong theoretical support for
the claim that the recent PTAS of Deng et al. [6] cannot be made practical. In
addition, we indicate an instructive border between fixed-parameter tractability
and fixed-parameter intractability for DSSS which lies between alphabets of
size two and alphabets of size greater than two. Two proofs in Sect. 4 had to be
omitted due to lack of space.
2
Preliminaries and Previous Work
Parameterized Complexity. Given a graph G = (V, E) with vertex set V , edge
set E, and a positive integer k, the NP-complete Vertex Cover problem is
to determine whether there is a subset of vertices C ⊆ V with k or fewer vertices such that each edge in E has at least one of its endpoints in C. Vertex
Cover is fixed-parameter tractable with respect to the parameter k. There now
are algorithms solving it in less than O(1.3^k + kn) time. The corresponding
complexity class is called FPT. By way of contrast, consider the NP-complete
Clique problem: Given a graph G = (V, E) and a positive integer k, Clique
asks whether there is a subset of vertices C ⊆ V with at least k vertices such
that C forms a clique by having all possible edges between the vertices in C.
Clique appears to be fixed-parameter intractable: It is not known whether it
can be solved in f(k) · n^{O(1)} time, where f might be an arbitrarily fast growing
function depending only on k.
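To illustrate the flavor of fixed-parameter tractability, here is a simple bounded search tree for Vertex Cover. This is only a sketch of the branching idea; the O(1.3^k + kn) algorithms mentioned above rely on much more refined branching and preprocessing:

```python
def vertex_cover(edges, k):
    """Decide whether the graph given by `edges` (a list of 2-tuples of
    vertices) has a vertex cover of size <= k, in O(2^k * m) time:
    every cover must contain an endpoint of every edge, so pick an
    uncovered edge and branch on which endpoint joins the cover."""
    if not edges:
        return True               # nothing left to cover
    if k == 0:
        return False              # edges remain, but the budget is spent
    u, v = edges[0]
    return (vertex_cover([e for e in edges if u not in e], k - 1)
            or vertex_cover([e for e in edges if v not in e], k - 1))
```

The combinatorial explosion is confined to the parameter k, which is exactly the FPT requirement of an f(k) · n^{O(1)} running time.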
Downey and Fellows developed a completeness program for showing fixed-parameter
intractability [8]. We very briefly sketch some integral parts of this theory.
3 In fact, more hardness results for unbounded alphabet size are known [11]. Here, we
refer to the practically most relevant case of constant alphabet size.
4 Unless an unlikely collapse in structural parameterized complexity theory occurs [10].
Let L, L′ ⊆ Σ∗ × N be two parameterized languages.^5 For example, in the case
of Clique, the first component is the input graph and the second component is
the positive integer k, that is, the parameter. We say that L reduces to L′ by a
standard parameterized m-reduction if there are functions k → k′ and k → k′′
from N to N and a function (x, k) → x′ from Σ∗ × N to Σ∗ such that
1. (x, k) → x′ is computable in time k′′|x|^c for some constant c and
2. (x, k) ∈ L iff (x′, k′) ∈ L′.
Observe that in the subsequent section we will present a reduction from
Clique to DSSS, mapping the Clique parameter k into all four parameters of
DSSS; i.e., k′ in fact is a four-tuple (kg, kb, dg, db) = (1, (k choose 2), k + 3, k − 2) (see
Sect. 3.1 for details). Notably, most reductions from classical complexity turn
out not to be parameterized ones. The basic reference degree for fixed-parameter
intractability, W[1], can be defined as the class of parameterized languages that
are equivalent to the Short Turing Machine Acceptance problem (also
known as the k-Step Halting problem). Here, we want to determine, for an
input consisting of a nondeterministic Turing machine M and a string x, whether
or not M has a computation path accepting x in at most k steps. This can
trivially be solved in O(n^{k+1}) time and we would be surprised if this can be
much improved. Therefore, this is the parameterized analogue of the Turing
Machine Acceptance problem that is the basic generic NP-complete problem
in classical complexity theory, and the conjecture that FPT ≠ W[1] is very
much analogous to the conjecture that P ≠ NP. Other problems that are W[1]-hard (and also W[1]-complete) include Clique and Independent Set, where
the parameter is the size of the relevant vertex set [8]. W[1]-hardness gives a
concrete indication that a parameterized problem with parameter k is unlikely
to allow for a solving algorithm with f(k) · n^{O(1)} running time, i.e., restricting
the combinatorial explosion to k.
Approximation. In the following, we explain some basic terms of approximation theory, thereby restricting ourselves to minimization problems. Given a minimization
problem, a solution of the problem is (1 + ǫ)-approximate if the cost of the
solution is d, the cost of an optimal solution is dopt , and d/dopt ≤ 1 + ǫ. A
polynomial-time approximation scheme (PTAS) is an algorithm that computes,
for any given real ǫ > 0, a (1+ǫ)-approximate solution in polynomial time where
ǫ is considered to be constant. For more details on approximation algorithms,
refer to [4]. Typically, PTAS’s have a running time nO(1/ǫ) , often with large
constant factors hidden in the exponent which make them infeasible already for
moderate approximation ratio. Therefore, Cesati and Trevisan [5] proposed the
concept of an efficient polynomial-time approximation scheme (EPTAS) where
the PTAS is required to have an f(ǫ) · n^{O(1)} running time where f is an arbitrary
function depending only on ǫ and not on n. Notably, most known PTAS’s are
not EPTAS’s [7,10].
5 Generally, the second component (representing the parameter) can also be drawn
from Σ∗; for most cases, assuming the parameter to be a positive integer (or a tuple
of positive integers) is sufficient.
Previous Work. Lanctot et al. [14] initiated the research on the algorithmic
complexity of distinguishing string selection problems. In particular, besides
showing NP-completeness (an independent NP-completeness result was also
proven by Frances and Litman [12]), they gave a polynomial-time factor-2 approximation for DSSS. Building on PTAS algorithms for Closest String
and Closest Substring [15], Deng et al. [6] recently gave a PTAS for DSSS.
There appear to be no nontrivial results on exact or fixed-parameter algorithms for DSSS. Since Closest Substring is a special case of DSSS, however, the fixed-parameter intractability results for Closest Substring [11]
also apply to DSSS, implying that DSSS is W[1]-hard with respect to the
parameter “number of input strings”. Finally, the special case DSS of DSSS
(where all given input strings have exactly the same length as the goal string)
is solvable in O((kg + kb) · L · (max{db + 1, (d′g + 1) · (|Σ| − 1)})^{db}) time with
d′g = L − dg [13], i.e., for constant alphabet size, it is fixed-parameter tractable
with respect to the aggregate parameter (d′g , db ). In a sense, DSS relates to DSSS
as Closest String relates to Closest Substring and, thus, DSS should be
regarded as considerably easier and of less practical importance than DSSS.
3 Fixed-Parameter Intractability of DSSS
We show that DSSS is, even for binary alphabet, W[1]-hard with respect to the
aggregate parameter (dg, db, kg, kb). This also means hardness for every single one
of these parameters. With [5], this implies that DSSS does not have an EPTAS.
To simplify presentation, in the rest of this section we use the following
technical terms. Regarding the good strings, we say that a length-L string s
matches an si ∈ Sg or, equivalently, s is a match for si , if dH (s, ti ) ≥ dg for every
length-L substring ti of si . Regarding the bad strings, we say that a length-L
string s matches an s′i ∈ Sb or, equivalently, s is a match for s′i , if there is a
length-L substring t′i of s′i with dH (s, t′i ) ≤ db . Both these notions of matching
for good as well as for bad strings generalize to sets of strings in the natural way.
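Both notions of matching translate directly into code; the following sketch (function names ours) is a direct transcription of the definitions:

```python
def hamming(s, t):
    """Hamming distance of two equal-length strings."""
    return sum(a != b for a, b in zip(s, t))

def matches_good(s, si, dg):
    """s matches a good string si: EVERY length-|s| substring is far."""
    L = len(s)
    return all(hamming(s, si[i:i + L]) >= dg for i in range(len(si) - L + 1))

def matches_bad(s, si, db):
    """s matches a bad string si: SOME length-|s| substring is close."""
    L = len(s)
    return any(hamming(s, si[i:i + L]) <= db for i in range(len(si) - L + 1))
```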
Our hardness proof follows a structure similar to that of the W[1]-hardness proof for
Closest Substring [11]. We give a parameterized reduction from Clique to
DSSS. Here, however, the reduction has novel features in two ways. Firstly, from
the technical point of view, the reduction becomes much more compact and, thus,
more elegant. Secondly, for Closest Substring with binary alphabet, we could
only show W[1]-hardness with respect to the number of input strings. Here, however, we can show W[1]-hardness with respect to, among others, parameters dg
and db . This has strong implications: Here, we can conclude that DSSS has no
EPTAS, which is an open question for Closest Substring [11].
3.1 Reduction from Clique to DSSS
A Clique instance is given by an undirected graph G = (V, E), with a set V =
{v1 , v2 , . . . , vn } of n vertices, a set E of m edges, and a positive integer k denoting
the desired clique size. We describe how to generate two sets of strings over
200
J. Gramm, J. Guo, and R. Niedermeier
alphabet {0, 1}: Sg (containing one string sg of length L := nk + 5) and Sb
(containing (k choose 2) strings, each of length m · (2nk + 5) + (m − 1)), such that G has
a clique of size k iff there is a length-L string s which is a match for Sg and also
for Sb; this means that dH(s, sg) ≥ dg with Sg := {sg} and dg := k + 3, and
every s′b ∈ Sb has a length-L substring t′b with dH(s, t′b) ≤ db with db := k − 2. In
the following we use “◦” to denote the concatenation of strings.
Good String. Sg := {sg} where sg = 0^L, the all-zero string of length L.
Bad Strings. Sb := {s′1,2 , . . . , s′1,k , s′2,3 , s′2,4 , . . . , s′k−1,k }, where every s′i,j has
length m · (2nk + 5) + (m − 1) and encodes the whole graph; in the following,
we describe how we generate a string s′i,j .
We encode a vertex vr ∈ V, 1 ≤ r ≤ n, in a length-n string by setting the
rth position of this string to “1” and all other positions to “0”, i.e.,
vertex(vr) := 0^{r−1} 1 0^{n−r}.
In s′i,j, we encode an edge {vr, vs} ∈ E, 1 ≤ r < s ≤ n, by a length-(nk)
string
edge(i, j, {vr, vs}) := 0^n ◦ · · · ◦ 0^n ◦ vertex(vr) ◦ 0^n ◦ · · · ◦ 0^n ◦ vertex(vs) ◦ 0^n ◦ · · · ◦ 0^n,
where the three runs of all-zero blocks consist of (i − 1), (j − i − 1), and (k − j)
copies of 0^n, respectively.
Furthermore, we define
edge block(i, j, {vr , vs }) := edge(i, j, {vr , vs }) ◦ 01110 ◦ edge(i, j, {vr , vs }) .
We choose this way of constructing the edge block(·, ·, ·) strings for the
following reason: Let edge(i, j, {vr , vs }) [h1 , h2 ] denote the substring of
edge(i, j, {vr , vs }) ranging from position h1 to position h2 . Then, every
length-L (= nk + 5) substring of edge block(·, ·, ·) which contains the “01110”
substring has the form

edge(i, j, {vr , vs }) [h, nk] ◦ 01110 ◦ edge(i, j, {vr , vs }) [1, h − 1]

for 1 ≤ h ≤ nk + 1. This will be important because our goal is that a match for
a solution in a bad string contains all the information of edge(i, j, {vr , vs }) . It is
difficult to enforce that a match starts at a particular position, but we will show
that we can enforce that it contains a “111” substring which, by our construction, implies that the match contains all the information of edge(i, j, {vr , vs }) .
Then, given E = {e1 , . . . , em }, we set
s′i,j := edge block(i, j, e1 ) ◦0◦ edge block(i, j, e2 ) ◦ . . .◦ edge block(i, j, em ) .
Parameter Values. We set L := nk + 5, generate kg := 1 good string and
kb := k(k − 1)/2 bad strings, and set the distance parameters dg := k + 3 and db := k − 2.
Example. Let G = (V, E) with V := {v1 , v2 , v3 , v4 } and E := {{v1 , v3 }, {v1 , v4 },
{v2 , v3 }, {v3 , v4 }} as shown in Fig. 1(a), and let k = 3. Fig. 1(b) displays the good
string sg and the k(k − 1)/2 = 3 bad strings s′1,2 , s′1,3 , and s′2,3 . Additionally, we show
the length-(nk + 5), i.e., length-17, string s which is a match for Sg = {sg } and
a match for Sb = {s′1,2 , s′1,3 , s′2,3 } and, thus, corresponds to the k-clique in G.
On Exact and Approximation Algorithms
201
Fig. 1. Example for the reduction from a Clique instance to a DSSS instance with
binary alphabet. (a) A Clique instance G = (V, E) with k = 3. (b) The produced DSSS
instance. We indicate the “1”s of the construction by grey boxes, the “0”s by white
boxes. We display the solution s that is found since G has a clique of size k = 3; matches
of s in s′1,2 , s′1,3 , and s′2,3 are indicated by dashed boxes. By bold lines we indicate the
substrings by which we constructed the bad strings: each edge block(·, ·, e) substring
is built from edge(·, ·, e) for some e ∈ E, consisting of k length-n substrings, followed
by “01110”, followed again by edge(·, ·, e). (c) Alignment of the matches t′1,2 , t′1,3 ,
and t′2,3 (marked by dashed boxes in (b)) with sg and s.
3.2
Correctness of the Reduction
We show the two directions of the correctness proof for the above construction
by two lemmas.
Lemma 1 For a graph with a k-clique, the construction in Sect. 3.1 produces an
instance of DSSS that has a solution, i.e., there is a length-L string s such that
dH (s, sg ) ≥ dg and every s′i,j ∈ Sb has a length-L substring t′i,j with dH (s, t′i,j ) ≤
db .
Proof. Let h1 , h2 , . . . , hk denote the indices of the clique’s vertices, 1 ≤ h1 <
h2 < · · · < hk ≤ n. Then, we can find a solution string
s := vertex(vh1 ) ◦ vertex(vh2 ) ◦ · · · ◦ vertex(vhk ) ◦ 01110.
For every s′i,j , 1 ≤ i < j ≤ k, the bad string s′i,j contains a substring t′i,j
with dH (s, t′i,j ) ≤ db = k − 2, namely
t′i,j := edge(i, j, {vhi , vhj }) ◦ 01110.
Moreover, we have dH (s, sg ) ≥ dg = k + 3.
□
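Lemma 1 can be checked mechanically on the example of Fig. 1. The sketch below is our own illustrative code, not from the paper; it builds s from the clique {v1, v3, v4} and verifies both distance conditions.

```python
from itertools import combinations

def vertex(r, n):
    # length-n string with a single "1" at position r (1-based)
    return "0" * (r - 1) + "1" + "0" * (n - r)

def edge(i, j, r, s, n, k):
    # k sections of length n; sections i and j encode v_r and v_s
    blocks = ["0" * n] * k
    blocks[i - 1], blocks[j - 1] = vertex(r, n), vertex(s, n)
    return "".join(blocks)

def dH(a, b):
    # Hamming distance of two equal-length strings
    return sum(x != y for x, y in zip(a, b))

n, k = 4, 3
edges = [(1, 3), (1, 4), (2, 3), (3, 4)]
L = n * k + 5
h = (1, 3, 4)  # v1, v3, v4 form a k-clique in the example graph
s = "".join(vertex(r, n) for r in h) + "01110"

# far enough from the good string 0^L ...
assert dH(s, "0" * L) == k + 3
# ... and within d_b = k - 2 of some length-L window of every bad string
for i, j in combinations(range(1, k + 1), 2):
    bad = "0".join(edge(i, j, r, t, n, k) + "01110" + edge(i, j, r, t, n, k)
                   for r, t in edges)
    assert min(dH(s, bad[p:p + L]) for p in range(len(bad) - L + 1)) <= k - 2
```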
Lemma 2 A solution for the DSSS instance produced from a graph G by the
construction in Sect. 3.1 corresponds to a k-clique in G.
Proof. We prove this statement in several steps:
(1) We observe that a solution for the DSSS instance has at least k + 3 “1”s
since dH (s, sg ) ≥ dg = k + 3 and sg consists only of “0”s.
(2) We observe that a solution for the DSSS instance has at most k + 3
“1”s: Following the construction, every length-L substring t′i,j of every bad
string s′i,j , 1 ≤ i < j ≤ k, contains at most five “1”s and dH (s, t′i,j ) ≤ k − 2.
(3) A match t′i,j for s in the bad string s′i,j contains exactly five “1”s: This follows
from the observation that any length-L substring of a bad string contains at most
five “1”s, together with (1) and (2): only if t′i,j contains five “1”s and all of them
coincide with “1”s in s can we have dH (s, t′i,j ) ≤ (k + 3) − 5 = k − 2.
(4) All t′i,j , 1 ≤ i < j ≤ k, and s must contain a “111” substring, located at
the same position: To show this, let t′i,j be a match of s in a bad string s′i,j
for some 1 ≤ i < j ≤ k. From (3), we know that the match t′i,j must contain
exactly five “1”s. Thus, since a substring of a bad string contains five “1”s only
if it contains a “111” substring, t′i,j must also contain a “111” substring (which
separates in s′i,j two substrings edge(i, j, e) for some e ∈ E). All “1”s in t′i,j
have to coincide with “1”s chosen from the k + 3 “1”s in s. In particular, the
position of the “111” substring must be the same in the solution and in t′i,j for
all 1 ≤ i < j ≤ k. This ensures a “synchronization” of the matches.
(5) W.l.o.g., all t′i,j , 1 ≤ i < j ≤ k, and s end with the “01110”
substring: From (4), we know that all t′i,j contain a “111” substring at the
same position. If they do not all end with “01110”, we can shift them such
that the contained “111” substring moves to the appropriate position, as
we describe more precisely in the following. Recall that every length-L substring which contains the “111” substring of edge block(i, j, e) has the form
edge(i, j, e) [h, nk] ◦ 01110 ◦ edge(i, j, e) [1, h − 1] for 1 ≤ h ≤ nk and e ∈ E.
Since all t′i,j , 1 ≤ i < j ≤ k, contain the “111” substring at the same position, they all have this form for the same h. Then, we can, instead, consider
edge(i, j, e) [1, nk] ◦ 01110 and, by a circular shift, move the “111” substring
in the solution to the appropriate position. Considering the solution s and the
matches t′i,j for all 1 ≤ i < j ≤ k as a character matrix, this is a reordering of
columns and, thus, the pairwise Hamming distances do not change.
(6) We divide the first nk positions of the matches and the solution into k
“sections”, each of length n. In s, each of these sections has the form vertex(v)
for a vertex v ∈ V by the following argument: By (5), all matches in bad strings
end with “01110” and, by the way we constructed the bad strings, each of their
sections either consists only of “0”s or has the form vertex(v) for a vertex
v ∈ V . If the section encodes a vertex, it contains one “1” which has to coincide
with a “1” in s. For the ith section, 1 ≤ i ≤ k, the matches in strings s′i,j
for i < j ≤ k and in strings s′j,i for 1 ≤ j < i encode a vertex in their ith
section. Therefore, each of the k sections of s contains a “1” and, since s (by
(1) and (2)) contains exactly k + 3 “1”s and (by (4)) ends with “01110”, each of
its sections contains exactly one “1”. Therefore, every section of s can be read
as the encoding vertex(v) of some v ∈ V .
Conclusion. Following (6), let vhi , 1 ≤ i ≤ k, be the vertex encoded in the ith
length-n section of s. Now, consider some 1 ≤ i < j ≤ k. Solution s has a match
in s′i,j iff there is an edge(i, j, {vhi , vhj }) ◦ 01110 substring in s′i,j and this holds
iff {vhi , vhj } ∈ E. Since this is true for all 1 ≤ i < j ≤ k, the vertices vh1 , vh2 , . . . , vhk
are pairwise connected by edges in G and, thus, form a k-clique.
□
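Step (6) says that a solution decodes section by section into clique vertices. A minimal sketch (a hypothetical helper of ours, not from the paper):

```python
def decode_clique(s, n, k):
    # Split the first n*k positions of a DSSS solution into k length-n
    # sections; each section vertex(v_h) carries its single "1" at position h
    assert s.endswith("01110")
    return [s[i * n:(i + 1) * n].index("1") + 1 for i in range(k)]
```

Applied to the Lemma 1 solution for the Fig. 1 example, it recovers the clique {v1, v3, v4}.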
Lemmas 1 and 2 yield the following theorem.
Theorem 1 DSSS with binary alphabet is W[1]-hard for every combination of
the parameters kg , kb , dg , and db .6
□
Theorem 1 means, in particular, that DSSS with binary alphabet is W[1]-hard with respect to every single parameter kg , kb , dg , and db . Moreover, it
allows us to exploit an important connection between parameterized complexity
and the theory of approximation algorithms as follows.
Corollary 1 There is no EPTAS for DSSS unless W[1] = FPT.
Proof. Cesati and Trevisan [5] have shown that a problem with an EPTAS is
fixed-parameter tractable with respect to the parameters that correspond to the
objective functions of the EPTAS. In Theorem 1, we have shown W[1]-hardness
for DSSS with respect to dg and db . Therefore, we conclude that DSSS cannot
have an EPTAS for the objective functions dg and db unless W[1] = FPT. □
4
Fixed-Parameter Tractability for a Special Case
In this section, we give a fixed-parameter algorithm for a modified version
of DSSS. First of all, we restrict the problem to a binary alphabet Σ = {0, 1}.
Then, the problem input consists, as in DSSS, of two sets Sg and Sb
of binary strings, here with all strings in Sg being of length L. By increasing the
number of good strings, we can easily transform an instance of DSSS into one in
which all good strings have the same length L: we replace each string si ∈ Sg
by the set containing all length-L substrings of si . Therefore, in the same way as
Deng et al. [6], we assume in the following that all strings in Sg have length L.
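This transformation to equal-length good strings is straightforward; an illustrative sketch (our code):

```python
def equalize_good_strings(Sg, L):
    # Replace each good string by all of its length-L substrings; a string
    # far from every element of the new set is far from every length-L
    # window of every original good string, and vice versa
    return [s[p:p + L] for s in Sg for p in range(len(s) - L + 1)]
```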
We now consider, instead of the parameter dg from the DSSS definition, the
“dual parameter” d′g := L − dg such that we require a solution string s with
dH (s, si ) ≥ L − d′g for all si ∈ Sg . The idea behind this is that, in some practical
cases, it may occur that, while dg is rather large, d′g is fairly small. Hence,
restricting the combinatorial explosion to d′g might sometimes be more natural
than restricting it to dg . Parameter d′g is said to be optimal if there is an s with
dH (s, si ) ≥ L − d′g for all si ∈ Sg and if there is no s′ with dH (s′ , si ) ≥ L − d′g + 1
for all si ∈ Sg . The question addressed in this section is to find the minimum
integer db such that, for the optimal parameter value d′g , there is a length-L
6 Note that this is the strongest statement possible for these parameters because it means that the combinatorial explosion cannot be restricted to a function f (kg , kb , dg , db ).
string s with dH (s, si ) ≥ L − d′g for every si ∈ Sg and such that every s′i ∈ Sb
has a length-L substring t′i with dH (s, t′i ) ≤ db . Naturally, we also want to compute the length-L solution string s corresponding to the found minimum db . We
refer to this modified version of DSSS as MDSSS. We can read the set Sg of kg
length-L strings as a kg × L character matrix. We call a column in this matrix
dirty if it contains “0”s as well as “1”s.
In the following, we present an algorithm solving MDSSS. We conclude this
section by pointing out the difficulties arising when giving up some of the restrictions concerning MDSSS.
4.1
Fixed-Parameter Algorithm
We present an algorithm that shows the fixed-parameter tractability of MDSSS
with respect to the parameter d′g . There are instances of MDSSS where d′g is
in fact smaller than the parameter dg . In these cases, solving MDSSS could be
a way to circumvent the combinatorial difficulty of computing exact solutions
for DSSS; notably, DSSS is not fixed-parameter tractable with respect to dg
(Sect. 3) and we conjecture that it is not fixed-parameter tractable with respect
to d′g . The structure of the algorithm is as follows.
Preprocessing: Process all non-dirty columns of the input set Sg . If there are
more than d′g · kg dirty columns then reject the input instance. Otherwise,
proceed on the thereby reduced set Sg consisting only of dirty columns.
Phase 1: Determine all solutions s such that dH (s, si ) ≥ L−d′g for every si ∈ Sg
for the optimal d′g .
Phase 2: For every s found in Phase 1, determine the minimal value of db such
that every s′i ∈ Sb has a length-L substring t′i with dH (s, t′i ) ≤ db . Finally,
find the minimum value of db over all examined choices of s.
Note that, in fact, Phase 1 and Phase 2 are interleaved. Phase 1 of our algorithm
extends the ideas behind a bounded search tree algorithm for Closest String
in [13]. There, however, the focus was on finding one solution whereas, here, we
need to find all solutions for the optimal parameter value. This extension was
only mentioned in [13]; it is described here.
Preprocessing. Reading the set Sg as a kg × L character matrix, we set, for an all-“0” (all-“1”) column in this matrix, the corresponding character in the solution
to “1” (“0”); otherwise, we would not find a solution for an optimal d′g . If the
number of remaining dirty columns is larger than d′g · kg , then we reject the input
instance since no solution is possible.
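The preprocessing can be sketched as follows (our illustrative code; names are ours):

```python
def preprocess(Sg, dg_prime):
    # Fix each non-dirty (all-equal) column to the complementary character
    # and collect the dirty column indices; reject when there are more
    # than d'_g * k_g of them (Lemma 3). Returns None on rejection.
    L, kg = len(Sg[0]), len(Sg)
    forced, dirty = {}, []
    for p in range(L):
        column = {s[p] for s in Sg}
        if len(column) == 1:
            forced[p] = "1" if column == {"0"} else "0"
        else:
            dirty.append(p)
    if len(dirty) > dg_prime * kg:
        return None  # no solution possible
    return forced, dirty
```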
Phase 1. The precondition of this phase is an optimal parameter d′g . Since, in
general, the optimal d′g is not known in advance, it can be found by looping
through d′g = 0, 1, 2, . . . , each time invoking the procedure described in the
following until we meet the optimal d′g . Notably, for each such d′g value, we
do not have to redo the preprocessing, but only compare the number of dirty
columns against d′g · kg .
Phase 1 is realized as a recursive procedure: We maintain a length-L candidate string sc which is initialized as sc := inv(s1 ) for s1 ∈ Sg , where inv(s1 ) denotes the bitwise complement of s1 . We call a recursive procedure Solve MDSSS,
given in Fig. 2, working as follows.
If sc is far away from all strings in Sg (i.e., dH (sc , si ) ≥ L − d′g for all
si ∈ Sg ) then sc already is a solution for Phase 1. We invoke the second phase
of the algorithm with the argument sc . Since it is possible that sc can be further
transformed into another solution, we continue the traversal of the search tree:
we select a string si ∈ Sg such that sc is not allowed to be closer to si (i.e.,
dH (sc , si ) = L − d′g ); such an si must exist since parameter d′g is optimal. We try
all possible ways to move sc away from si (such that dH (sc , si ) = L − (d′g − 1)),
calling the recursive procedure Solve MDSSS for each of the produced instances.
Otherwise, if sc is not a solution for Phase 1, we select a string si ∈ Sg such
that sc is too close to si (i.e., dH (sc , si ) < L − d′g ) and try all possible ways to
move sc away from si , calling the recursive procedure for each of the produced
instances.
The invocations of the recursive procedure can, thus, be described by a search
tree. In the above recursive calls, we omit those calls trying to change a position
in sc which has already been changed before. Therefore, we also omit further
invocations of the recursive procedure if the current node of the search tree is
already at depth d′g of the tree; otherwise, sc would move too close to s1 (i.e.,
dH (sc , s1 ) < L − d′g ).
Phase 1 is given more precisely in Fig. 2. It is invoked by
Solve MDSSS(inv(s1 ), d′g ).
Phase 2. The second phase deals with determining the minimal value of db such
that there is a string s in the set of the solution strings found in the first phase
with dH (s, t′i ) ≤ db for 1 ≤ i ≤ kb , where t′i is a length-L substring of s′i .
For a given solution string s from the first phase and a string s′i ∈ Sb , we
use Abrahamson’s algorithm [1] to find the minimum number of mismatches
between s and every length-L substring of s′i in O(|s′i | √(L log L)) time.
This minimum is equal to min_{t′i} dH (s, t′i ), where t′i ranges over the length-L substrings of s′i .
Applying this algorithm to all strings in Sb , we get the value of db for s,
namely max_{i=1,...,kb} min_{t′i} dH (s, t′i ). The minimum value of db is then the minimum
of this quantity over all solution strings from Phase 1, and the s achieving
this minimum is the corresponding solution string.
If we are given a fixed db and are asked whether there is a string s among the solution
strings from the first phase which is a match for all strings in Sb , there is a more
efficient algorithm by Amir et al. [3] for string matching with db mismatches,
which takes only O(|s′i | √(db log db )) time to find all length-L substrings of s′i whose
Hamming distance to s is at most db .
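As a naive stand-in for these string-matching subroutines, Phase 2 can be sketched with a quadratic scan; Abrahamson's algorithm would replace the inner window loop with an O(|s′i| √(L log L)) computation. This is our own illustrative code.

```python
def phase2(s, Sb):
    # db value for solution s: maximum, over the bad strings, of the
    # minimum Hamming distance between s and a length-L window
    L = len(s)

    def dH(a, b):
        return sum(x != y for x, y in zip(a, b))

    return max(min(dH(s, sb[p:p + L]) for p in range(len(sb) - L + 1))
               for sb in Sb)
```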
4.2
Correctness of the Algorithm
Preprocessing. The correctness of the preprocessing follows in a similar way as
the correctness of the “problem kernel” for Closest String observed by Evans
et al. [9] (proof omitted).
Recursive procedure Solve MDSSS(sc , ∆d):
Global variables: sets Sg and Sb of strings, all strings in Sg of length L, and integer d′g .
Input: candidate string sc and integer ∆d, 0 ≤ ∆d ≤ d′g .
Output: for optimal d′g , each length-L string ŝ with dH (ŝ, si ) ≥ L − d′g and dH (ŝ, sc ) ≤ ∆d.
Remark: the procedure calls, for each computed string ŝ, Phase 2 of the algorithm.
Method:
(0) if (∆d < 0) then return;
(1) if (dH (sc , si ) ≤ L − (d′g + ∆d)) for some i ∈ {1, . . . , kg } then return;
(2) if (dH (sc , si ) ≥ L − d′g ) for all i = 1, . . . , kg then
      /* sc already is a solution for Phase 1 */
      call Phase 2(sc , Sb );
      choose i ∈ {1, . . . , kg } such that dH (sc , si ) = L − d′g ;
      P := { p | sc [p] = si [p] };
      for all p ∈ P do
        s′c := sc ;
        s′c [p] := inv(sc [p]);
        call Solve MDSSS(s′c , ∆d − 1);
      end for
    else
      /* sc is not a solution for Phase 1 */
      choose i ∈ {1, . . . , kg } such that dH (sc , si ) < L − d′g ;
      Q := { p | sc [p] = si [p] };
      choose any Q′ ⊆ Q with |Q′ | = d′g + 1;
      for all q ∈ Q′ do
        s′c := sc ;
        s′c [q] := inv(sc [q]);
        call Solve MDSSS(s′c , ∆d − 1);
      end for
    end if
(3) return;
Fig. 2. Recursive procedure realizing Phase 1 of the algorithm for MDSSS.
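For concreteness, the procedure of Fig. 2 can be transcribed to Python roughly as follows. This is our own illustrative sketch, not the authors' code: in step (1) we prune only when the candidate provably cannot reach distance L − d′g (a strict inequality), we track already-flipped positions as described in the text, and `report` stands in for Phase 2.

```python
def solve_mdsss_phase1(Sg, dg_prime, report):
    # Enumerate, for optimal d'_g, every length-L string at Hamming
    # distance >= L - d'_g from all good strings in Sg
    L = len(Sg[0])
    inv = {"0": "1", "1": "0"}

    def dH(a, b):
        return sum(x != y for x, y in zip(a, b))

    def rec(sc, delta, flipped):
        if delta < 0:
            return
        # step (1): even flipping all remaining budget cannot move sc
        # far enough away from some s_i
        if any(dH(sc, si) < L - (dg_prime + delta) for si in Sg):
            return
        if all(dH(sc, si) >= L - dg_prime for si in Sg):
            report("".join(sc))  # step (2): sc is a solution; Phase 2 here
            i = next(i for i, si in enumerate(Sg)
                     if dH(sc, si) == L - dg_prime)
            positions = [p for p in range(L) if sc[p] == Sg[i][p]]
        else:
            i = next(i for i, si in enumerate(Sg)
                     if dH(sc, si) < L - dg_prime)
            agree = [p for p in range(L) if sc[p] == Sg[i][p]]
            positions = agree[:dg_prime + 1]  # any (d'_g + 1)-subset of Q
        for p in positions:
            if p in flipped:
                continue  # never re-flip a position changed before
            nxt = sc.copy()
            nxt[p] = inv[nxt[p]]
            rec(nxt, delta - 1, flipped | {p})

    # initial candidate: the bitwise complement of s1
    rec([inv[c] for c in Sg[0]], dg_prime, frozenset())
```

For Sg = {"01", "10"} the optimal d′g is 1, and the procedure reports exactly the two solutions "00" and "11".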
Lemma 3 Given an MDSSS instance with a set Sg of kg good length-L
strings and a positive integer d′g : if the resulting kg × L matrix has more than
kg · d′g dirty columns, then there is no string s with dH (s, si ) ≥ L − d′g for all
si ∈ Sg .
□
Phase 1. From Step (2) in Fig. 2 it is obvious that every string s, which is output
of Phase 1 and for which, then, Phase 2 is invoked, satisfies dH (s, si ) ≥ L − d′g
for all si ∈ Sg . The reverse direction, i.e., to show that Phase 1 finds every
length-L string s with dH (s, si ) ≥ L − d′g for all si ∈ Sg , is more involved; the proof
is omitted:
Lemma 4 Given an MDSSS instance, if s is an arbitrary length-L solution
string, i.e., dH (s, si ) ≥ L − d′g for all si ∈ Sg , then s can be found by calling
procedure Solve MDSSS.
□
Phase 2. The second phase is only an application of known algorithms.
4.3
Running Time of the Algorithm
Preprocessing. The preprocessing can easily be done in O(L · kg ) time. Even if
the optimal d′g is not known in advance, we can simply process the non-dirty
columns and count the number Ld of dirty ones; therefore, the preprocessing
has to be done only once. Then, while looping through d′g = 0, 1, 2, . . . in order
to find the optimal d′g , we only have to check, for every value of d′g in constant
time, whether Ld ≤ d′g · kg .
Phase 1. The dependencies of the recursive calls of procedure Solve MDSSS can
be described as a search tree in which an instance of the procedure is the parent
node of all its recursive calls. One call of procedure Solve MDSSS invokes at
most d′g + 1 new recursive calls. More precisely, if sc is a solution then it invokes
at most d′g calls and if sc is not a solution then it invokes at most d′g + 1 calls.
Therefore, every node in the search tree has at most d′g + 1 children. Moreover,
∆d is initialized to d′g and every recursive call decreases ∆d by 1. As soon as
∆d = 0, no new recursive calls are invoked. Therefore, the height of the search
tree is at most d′g . Hence, the search tree has size O((d′g + 1)^{d′g}) = O((d′g )^{d′g}).
Regarding the running time needed for one call of procedure Solve MDSSS,
note that, after the preprocessing, the instance consists of at most d′g ·kg columns.
Then, a central task in the procedure is to compute the Hamming distance of
two strings. To this end, we initially build, in O(d′g · kg² ) = O(L · kg ) time, a table
containing the distances of sc to all strings in Sg . Using this table, to determine
whether or not sc is a match for Sg or to find an si having at least d′g positions
coinciding with sc can both be done in O(kg ) time. To identify the positions in
which sc coincides with an si ∈ Sg can be done in O(d′g ·kg ) time. After we change
one position in sc , we only have to inspect one column of the kg × (d′g · kg ) matrix
induced by Sg and, therefore, can update the table in O(kg ) time. Summarizing,
one call of procedure Solve MDSSS can be done in O(d′g · kg ) time.
Together with the d′g = 0, 1, 2, . . . loop in order to find the optimal d′g , Phase 1
can be done in O((d′g )² · kg · (d′g )^{d′g}) time.
Phase 2. For every solution string found in Phase 1, the running time of the
second phase is O(N √(L log L)), where N denotes the total length of all
strings in Sb [1].
We obtain the following theorem:
Theorem 2 MDSSS can be solved in O(L · kg + ((d′g )² · kg + N √(L log L)) ·
(d′g )^{d′g}) time, where N = Σ_{s′i ∈ Sb} |s′i | is the total size of the bad strings.
□
4.4
Extensions of MDSSS
The special requirements imposed on the input of MDSSS seem inevitable in
order to obtain the above fixed-parameter tractability result. We discuss the
problems arising when relaxing the constraints on the alphabet size and the
value of d′g .
Non-binary alphabet. Already extending the alphabet size in the formulation of MDSSS from two to three makes our approach, described in Sect. 4.1,
combinatorially much more difficult, so that it no longer yields fixed-parameter
tractability. One reason lies in the preprocessing. For an all-equal column in the
character matrix induced by Sg , a three-letter alphabet leaves two instead of one
possible choices for the corresponding position in the solution string. Therefore,
enumerating all solutions s with dH (s, si ) ≥ L − d′g for all si ∈ Sg , which is
essential for our approach, is no longer fixed-parameter tractable; the number of
solutions is too large. Let L′ ≤ L be the number of non-dirty columns and let
the alphabet size be three. Then, aside from the dirty columns, we already have
2^{L′} assignments of characters to the positions corresponding to non-dirty columns.
Non-optimal d′g parameter. Also for a non-optimal d′g parameter, the number
of solutions s with dH (s, si ) ≥ L − d′g for all si ∈ Sg can become too large, and
it appears to be fixed-parameter intractable with respect to d′g to enumerate
them all. Consider the example where Sg = {0^L }. Then, there are more than
(L choose d′g ) strings s with dH (s, 0^L ) ≥ L − d′g . (If the value of d′g is only a fixed number
larger than the optimal one, it could, nevertheless, be possible to enumerate all
solution strings of Phase 1.)
5
Conclusion
We have shown that Distinguishing Substring Selection, which has a
PTAS, cannot have an EPTAS unless FPT = W[1]. It remains open whether
this also holds for the tightly related and similarly important computational biology problems Closest Substring and Consensus Patterns, each of which
has a PTAS [15,16] and for each of which it is unknown whether an EPTAS
exists. It has been shown that, even for a constant-size alphabet, Closest Substring and Consensus Patterns are W[1]-hard with respect to the number
of input strings [11]; the parameterized complexity with respect to the distance
parameter, however, is open for these problems, whereas it has been settled for
DSSS in this paper. It would be interesting to further explore the border between
fixed-parameter tractability and intractability as initiated in Sect. 4.
References
1. K. Abrahamson. Generalized string matching. SIAM Journal on Computing,
16(6):1039–1051, 1987.
2. J. Alber, J. Gramm, and R. Niedermeier. Faster exact solutions for hard problems:
a parameterized point of view. Discrete Mathematics, 229(1-3):3–27, 2001.
3. A. Amir, M. Lewenstein, and E. Porat. Faster algorithms for string matching with
k mismatches. In Proc. of 11th ACM-SIAM SODA, pages 794–803, 2000.
4. G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and
M. Protasi. Complexity and Approximation – Combinatorial Optimization Problems and their Approximability Properties. Springer, 1999.
5. M. Cesati and L. Trevisan. On the efficiency of polynomial time approximation
schemes. Information Processing Letters, 64(4):165–171, 1997.
6. X. Deng, G. Li, Z. Li, B. Ma, and L. Wang. A PTAS for Distinguishing (Sub)string
Selection. In Proc. of 29th ICALP, number 2380 in LNCS, pages 740–751, 2002.
Springer.
7. R. G. Downey. Parameterized complexity for the skeptic (invited paper). In Proc. of
18th IEEE Conference on Computational Complexity, July 2003.
8. R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer, 1999.
9. P. A. Evans, A. Smith, and H. T. Wareham. The parameterized complexity of
p-center approximate substring problems. Technical Report TR01-149, Faculty of
Computer Science, University of New Brunswick, Canada, 2001.
10. M. R. Fellows. Parameterized complexity: the main ideas and connections to practical computing. In Experimental Algorithmics, number 2547 in LNCS, pages 51–77,
2002. Springer.
11. M. R. Fellows, J. Gramm, and R. Niedermeier. On the parameterized intractability
of Closest Substring and related problems. In Proc. of 19th STACS, number 2285
in LNCS, pages 262–273, 2002. Springer.
12. M. Frances and A. Litman. On covering problems of codes. Theory of Computing
Systems, 30:113–119, 1997.
13. J. Gramm, R. Niedermeier, and P. Rossmanith. Exact solutions for Closest String
and related problems. In Proc. of 12th ISAAC, number 2223 in LNCS, pages
441–453, 2001. Springer. Full version to appear in Algorithmica.
14. J. K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing string selection
problems. In Proc. of 10th ACM-SIAM SODA, pages 633–642, 1999.
15. M. Li, B. Ma, and L. Wang. On the Closest String and Substring Problems. Journal
of the ACM, 49(2):157–171, 2002.
16. M. Li, B. Ma, and L. Wang. Finding similar regions in many sequences. Journal
of Computer and System Sciences, 65(1):73–96, 2002.
Complexity of Approximating Closest Substring
Problems
Patricia A. Evans¹ and Andrew D. Smith¹,²
¹ University of New Brunswick, P.O. Box 4400, Fredericton N.B., E3B 5A3, Canada
pevans@unb.ca
² Ontario Cancer Institute, University Health Network, Suite 703,
620 University Avenue, Toronto, Ontario, M5G 2M9, Canada
fax: +1-506-453-3566
asmith@uhnres.utoronto.ca
Abstract. The closest substring problem, where a short string
is sought that minimizes the number of mismatches between it and
each of a given set of strings, is a minimization problem with a
polynomial time approximation scheme [6]. In this paper, both this
problem and its maximization complement, where instead the number
of matches is maximized, are examined and bounds on their hardness
of approximation are proved. Related problems differing only in their
objective functions, seeking either to maximize the number of strings
covered by the substring or maximize the length of the substring, are
also examined and bounds on their approximability proved. For this
last problem of length maximization, the approximation bound of 2 is
proved to be tight by presenting a 2-approximation algorithm.
Keywords: Approximation algorithms; Hardness of approximation;
Closest Substring
1
Introduction
Given a set F of strings, the closest substring problem seeks to find a string
C of a desired length l that minimizes the maximum distance from C to a substring in each member of F. We call such a short string C a center for F. The
corresponding substrings from each string in F are the occurrences of C. If all
strings in F are the same length n, and the center is also to be of length n, then
this special case of the problem is known as closest string. We examine the
complexity of approximating three problems related to closest substring with
different objective functions. A center is considered to be optimal in the context
of the problem under discussion, in that it either maximizes or minimizes the
problem’s objective function. This examination of the problems’ approximability
with respect to their differing objective functions reveals interesting differences
between the optimization goals.
In [6], a polynomial time approximation scheme (PTAS) is given for closest
substring that has a performance ratio of 1 + 1/(2r − 1) + ǫ for any 1 ≤ r ≤ m,
where m = |F| and ǫ > 0.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 210–221, 2003.
c Springer-Verlag Berlin Heidelberg 2003
While closest substring minimizes the number of mismatches, max closest substring maximizes the number of matches. We show that the max closest substring problem cannot be approximated in polynomial time with ratio
better than (log m)/4, unless P=NP. As the maximization complement of the
closest substring problem, its reduction can also be applied to closest substring. This application produces a similarly complementary result indicating
the necessity of the O(1/m) term in the PTAS [6]. While the hard ratio for closest
substring disappears asymptotically as m approaches infinity (as is to be
expected given the PTAS [6]), it indicates a connection between the objective
function and the number of strings given as input. This result supports the position that the O(1/m) term in the PTAS performance ratio cannot be significantly
improved by a polynomial time algorithm.
In [8], Sagot presents an exponential exact algorithm for the decision problem
version of closest substring, also known as common approximate substring. Sagot also extends the problem to quorums, finding strings that are
approximately present in at least a specified number of the input strings. This
quorum size can be maximized as an alternate objective function, producing the
maximum coverage approximate substring problem. A restricted version
of this problem was examined in [7], and erroneously claimed to be as hard to
approximate as clique. We give a reduction from the maximum coverage version of set cover, showing that the problem is hard to approximate within
e/(e − 1) − ǫ (where e is the base of the natural logarithm) for any ǫ > 0.
The longest common approximate substring problem seeks to maximize the length of a center string that is within some specified distance d from
every occurrence. We give a 2-approximation algorithm for this problem and
show that 2 is optimal unless P=NP.
2
Preliminary Definitions
Definition 1. Let x be an instance of optimization problem Π with optimal
solution opt(x). Let A be an algorithm solving Π, and A(x) the solution value
produced by A for x. The performance ratio of A with respect to x is

max { A(x)/opt(x) , opt(x)/A(x) } .

A is a ρ-approximation algorithm if and only if A always returns a solution with
performance ratio less than or equal to ρ.
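In code, the symmetric ratio of Definition 1 is simply (an illustrative snippet of ours):

```python
def performance_ratio(value, opt):
    # max{A(x)/opt(x), opt(x)/A(x)}: always >= 1, and equal to 1
    # exactly when A(x) matches the optimum
    return max(value / opt, opt / value)
```

The symmetric form makes the definition apply uniformly to minimization and maximization problems.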
Definition 2. Let Π and Π ′ be two minimization problems. A gap-preserving
reduction (GP -reduction, ≤GP ) from Π to Π ′ with parameters (c, ρ),(c′ , ρ′ ) is a
polynomial-time algorithm f . For each instance I of Π, f produces an instance
I ′ = f (I) of Π ′ . The optima of I and I ′ , say opt(I) and opt(I ′ ) respectively,
satisfy the following properties:
opt(I) ≤ c ⇒ opt(I ′ ) ≤ c′ ,
opt(I) > cρ ⇒ opt(I ′ ) > c′ ρ′ ,
where (c, ρ) and (c′ , ρ′ ) are functions of |I| and |I ′ | respectively, and ρ, ρ′ > 1.
Observe that the above definition of a gap-preserving reduction specifically refers
to minimization problems, but it can easily be adapted for maximization problems.
Although it is implied by the name, GP -reductions do not require the size of
the gap to be preserved, only that some gap remains [1].
We now formally specify the problems treated in this paper. All of these can
be seen as variations on the closest substring problem. Note that dH (x, y)
represents the number of mismatches, or Hamming distance, between two strings
x and y of equal length |x| = |y|.
max closest substring
Instance: A set F = {S1 , . . . , Sm } of strings over alphabet Σ such that
max1≤i≤m |Si | = n, integer l, (1 ≤ l ≤ n).
Question: Maximize min_i (l − dH (C, si )), such that C ∈ Σ^l and si is a
substring of Si , (1 ≤ i ≤ m).
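For intuition, this objective can be evaluated exactly by brute force on tiny instances (exponential in l; our illustrative code, not an algorithm from the paper):

```python
from itertools import product

def max_closest_substring(F, l, alphabet="01"):
    # Maximize min_i (l - dH(C, s_i)) over all centers C in alphabet^l,
    # where s_i ranges over the length-l windows of S_i
    def dH(a, b):
        return sum(x != y for x, y in zip(a, b))

    return max(
        min(max(l - dH(C, S[p:p + l]) for p in range(len(S) - l + 1))
            for S in F)
        for C in map("".join, product(alphabet, repeat=l)))
```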
maximum coverage approximate substring
Instance: A set F = {S1 , . . . , Sm } of strings over alphabet Σ such that
max1≤i≤m |Si | = n, integers d and l, (1 ≤ d < l ≤ n).
Question: Maximize |F ′ |, F ′ ⊆ F, such that for some C ∈ Σ l and for all
Si ∈ F ′ , there exists a substring si of Si such that dH (C, si ) ≤ d.
longest common approximate substring
Instance: A set F = {S1 , . . . , Sm } of strings over alphabet Σ such that
max1≤i≤m |Si | = n, integer d, (1 ≤ d < n).
Question: Maximize l = |C|, C ∈ Σ ∗ , such that dH (C, si ) ≤ d and si is a
substring of Si , (1 ≤ i ≤ m).
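All three problems rest on the same notion of validity: a candidate center must be within Hamming distance d of some substring of every input string. As a concrete reference point, here is a small Python sketch (the helper names are ours, not the paper's) of d_H and of that check:

```python
def hamming(x, y):
    """d_H(x, y): number of mismatched positions between equal-length strings."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def is_valid_center(center, F, d):
    """True iff `center` is within Hamming distance d of at least one
    length-|center| substring of every string in F."""
    l = len(center)
    return all(
        any(hamming(center, S[i:i + l]) <= d for i in range(len(S) - l + 1))
        for S in F
    )
```

For example, "aa" is a valid center for {"aab", "baa"} with d = 0, while "zz" is not a valid center for {"aab"} even with d = 1.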
Throughout this paper, when discussing different problems the values of d, l
and m may refer to either the optimal values of objective functions or the values
specified as part of the input. These symbols are used in accordance with their
use in the formal statement of whatever problem is being discussed.
3
3.1
Max Closest Substring
Hardness of Approximating Max Closest Substring
In this section we use a gap preserving reduction from set cover to show
inapproximability for max closest substring. Lund and Yannakakis [2], with
a reduction from label cover to set cover, showed that set cover could
not be approximated in polynomial time with performance ratio better than
Complexity of Approximating Closest Substring Problems
213
(log |B|)/4 (where B is the base set) unless NP = DTIME(2^{poly(log n)}). A result of Raz and Safra [3] indirectly strengthened the conjecture; set cover is now known to be NP-hard to approximate with ratio better than (log |B|)/4.
set cover
Instance: A set B of elements to be covered and a collection of sets L such
that Li ⊆ B, (1 ≤ i ≤ |L|).
Question: Minimize |R|, R ⊆ L, such that $\bigcup_{j=1}^{|R|} R_j = B$.
Let I = ⟨B, L⟩ be an instance of set cover. The reduction constructs, in polynomial time, a corresponding instance I′ = ⟨F, l⟩ of max closest substring. For all ρ > 1, there exists a ρ′ > 1 such that a solution for I with a ratio of ρ can be obtained in polynomial time from a solution to I′ with ratio ρ′.
The Alphabet. The strings of F are composed of characters from the alphabet
Σ = Σ1 ∪ Σ2 . The characters of Σ1 are referred to as set characters, and identify
sets in L. The characters of Σ2 are referred to as element characters and are in
one-to-one correspondence with elements of the base set B.
Σ1 = {pi : 1 ≤ i ≤ |L|} ,
Σ2 = {ui : 1 ≤ i ≤ |B|} .
Substring Gadgets. The strings of F are made up of two types of substring
gadgets. We use the function f , defined below, to ensure that the substring
gadgets are sufficiently large. The gadgets are defined as follows:
Subset Selectors: $\text{set}(i) = p_i^{f(|B|)}$
Separators: $\text{separator}(j) = u_j^{f(|B|)}$
The Reduction. The string set F contains |B| strings, corresponding to the
elements of B. For each j ∈ B, let Lj ⊆ L be the subfamily of sets containing the
element j. With product notation referring to concatenation, define the string
$$S_j = \prod_{q \in L_j} \text{set}(q)\,\text{separator}(j)\,.$$
The function f : N → N must be defined. It is necessary for f to have the property that for all positive integers x < |B|,

$$\left\lfloor \frac{f(|B|)}{x} \right\rfloor > \left\lfloor \frac{f(|B|)}{x+1} \right\rfloor.$$
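The required property — that ⌊f(|B|)/x⌋ strictly exceeds ⌊f(|B|)/(x+1)⌋ for every positive integer x < |B| — is easy to confirm numerically for f(y) = y²; the following small sketch (helper names are ours) checks it across a range of base-set sizes:

```python
def f(y):
    """The repetition count used in the reduction."""
    return y * y

def floor_gap_holds(b):
    """Check: for all positive integers x < b, floor(f(b)/x) > floor(f(b)/(x+1))."""
    return all(f(b) // x > f(b) // (x + 1) for x in range(1, b))

print(all(floor_gap_holds(b) for b in range(2, 200)))
```

The gap is at least f(b)/(x(x+1)) ≥ b²/((b−1)b) > 1 for x < b, which is why the floors must strictly decrease.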
It is straightforward to check that f(y) = y² has this property. The maximum length of any member of F is n = 2|L||B|², the size of F is m = |B|, the length of the center is l = f(|B|) = |B|², and the alphabet size is |Σ| = |L| + |B|. We call any partition of F whose equivalence relation is the property of having an exact common substring a substring induced partition. For any two occurrences s, s′ of a center, we call s and s′ disjoint if for all 1 ≤ q ≤ |s|, s[q] ≠ s′[q]. Observe that the maximum distance to an optimal center, for any set of disjoint occurrences, increases with the size of the set.
Lemma 1. Let F be a set of occurrences of an optimal center C such that |F| = k. If for each pair s, s′ ∈ F, dH(s, s′) = l, then for every s ∈ F, l − dH(C, s) ≥ ⌊l/k⌋. Also, there is at least one s ∈ F such that l − dH(C, s) = ⌊l/k⌋.
Proof. There are l total positions, and for any position p there is a unique s ∈ F such that s[p] = C[p]. If some s ∈ F had l − dH(C, s) < ⌊l/k⌋, then the center C would not be optimal, as a better center can be constructed by taking position symbols evenly from the k occurrences. If all s ∈ F have l − dH(C, s) > ⌊l/k⌋, then the total number of matches exceeds l, some pair of matches would have the same position, and thus some pair s, s′ ∈ F have dH(s, s′) < l. ⊓⊔
The significance of our definition for f is apparent from the above proof. It is
essential that, under the premise of Lemma 1, values of k (the number of distinct
occurrences of a center) can be distinguished based on the maximum distance
from any occurrence to the optimal center.
Lemma 2. set cover ≤GP max closest substring.
Proof. Suppose the optimal cover R for ⟨B, L⟩ has size less than or equal to c. Construct string C of length |B|² as follows. To the positions in C, assign in equal amounts the set characters representing members of R. Then C is a center for F with maximum similarity ⌊|B|²/c⌋.
Suppose |R| > c. Let F′ be the largest subset of F having a substring induced c-partition. By the reduction, since |R| > c, F′ ≠ F. Let S be any string in F \ F′. By Lemma 1, any optimal center for F′ must have minimum similarity ⌊|B|²/c⌋, and therefore has at least ⌊|B|²/c⌋ characters from a substring of every string in F′. But the occurrence in S is disjoint from the occurrences in F′, forcing the optimal center to match an equal number of positions in more than c disjoint occurrences. Hence, also by Lemma 1, the optimal center matches no more than ⌊|B|²/(c + 1)⌋ < ⌊|B|²/c⌋ characters in some occurrence. The gap preserving property of the reduction follows since ⌊|B|²/c⌋ is a decreasing function of c. ⊓⊔
Theorem 1. max closest substring is not approximable within (log m)/4 in polynomial time unless P=NP.
Proof. The theorem follows from the fact that the NP-hard ratio for max closest substring remains identical to that of the source problem set cover. ⊓⊔
As max closest substring is the complementary maximization version
of closest substring, and there is a bijection between feasible solutions to
the complementary problems that preserves the order of solution quality, this
reduction also applies to closest substring. The form of the hard performance
ratio for closest substring provides evidence that the two separate sources
of error, 1/O(m) and ǫ, are necessary in the PTAS of [6].
Theorem 2. closest substring cannot be approximated with performance ratio $1 + \frac{1}{\omega(m)}$ in polynomial time unless P=NP.
Proof. Since the NP-hard ratio for set cover is ρ = (1/4) log |B|, the NP-hard ratio obtained for closest substring in the above reduction is

$$\rho' = \frac{c\rho - 1}{c\rho - \rho} = 1 + \frac{\rho - 1}{\rho} \cdot \frac{1}{c - 1} \ \ge\ 1 + \frac{1}{O(m)}. \qquad ⊓⊔$$
3.2 An Approximation Algorithm for Max Closest Substring
The preceding subsection showed that max closest substring cannot be approximated within (log m)/4. Here, we show that this bound is within a factor of 4·|Σ| of being tight, by presenting an approximation algorithm that achieves a bound of |Σ| log m for max closest substring.
Due to the complementary relationship between max closest substring and closest substring, we start by presenting a greedy algorithm for closest string. The greedy nature of the algorithm is due to the fact that it commits to a local improvement at each iteration. The algorithm also uses a lazy strategy that bases each decision on information obtained by examining a restricted portion of the input. This is the most naive form of local search; the algorithm is not expected to perform well. The idea of the algorithm is to read the input strings column by column, and for each column i, assign a character to C[i] before looking at any column j such that j > i. Algorithm 1 describes this procedure, named GreedyAndLazy, in pseudocode.
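As a concrete rendering of this column-by-column procedure, here is a hedged Python sketch (our reconstruction from the description; names and minor details are ours, not the paper's). It scans columns left to right and assigns to each position of C the majority character among the strings that do not yet match C anywhere:

```python
def greedy_and_lazy(F, sigma):
    """Sketch of GreedyAndLazy for closest string: F is a list of
    equal-length strings over alphabet sigma.  The center C is built
    column by column, never looking at columns to the right."""
    n = len(F[0])
    C = []
    unmatched = set(range(len(F)))   # indices of strings not yet matching C
    for col in range(n):
        # majority character of this column among the still-unmatched
        # strings (all strings once every string matches somewhere)
        pool = unmatched if unmatched else set(range(len(F)))
        column = [F[i][col] for i in pool]
        c = max(sigma, key=column.count)
        C.append(c)
        unmatched -= {i for i in unmatched if F[i][col] == c}
    return "".join(C)
```

Choosing the column majority among the unmatched strings is exactly what drives the $(|\Sigma|-1)/|\Sigma|$ shrinkage of $J_i$ in the proof of Lemma 3 below.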
Lemma 3. The greedy and lazy algorithm for closest string produces a center string with radius within a factor of $m\left(1 - \frac{1}{|\Sigma|}\right)$ of the optimal radius.
Proof. Consider the number of iterations required to guarantee that each S ∈ F matches C in at least one position. Let Ji be the set of strings that do not match any position of C after the ith iteration; then

$$|J_{i+1}| \le \frac{|\Sigma| - 1}{|\Sigma|}\,|J_i| \le \exp(-1/|\Sigma|)\,|J_i|.$$
This is because the algorithm always selects the column majority character of those strings in Ji. Let x be the number of iterations required before all members of F match C in at least one position. A bound on the value of x is given by the following inequality:

$$\frac{1}{m} > \exp\left(-\frac{x}{|\Sigma|}\right).$$
Hence, for any strictly positive ε, after x = |Σ| ln m + ε iterations, each member of F matches C in at least one position. After the final iteration, the total distance from C to any member of F is at most n − n/(|Σ| ln m). The optimal distance is at least n/m, otherwise some positions are identical in F (and thus should not be considered). Therefore the performance ratio of GreedyAndLazy is

$$\frac{n - n/(|\Sigma| \ln m)}{n/m} \ \le\ m\left(1 - \frac{1}{|\Sigma|}\right). \qquad ⊓⊔$$
The running time of GreedyAndLazy, for m sequences of length n, is O(|Σ|mn²).
Now consider applying GreedyAndLazy to the max closest substring problem by selecting an arbitrary set of substrings of length l, reducing the problem to a max closest string problem. The number of matches between any string in F and the constructed center will be at least Ω(l/(|Σ| log m)).
Corollary 1. GreedyAndLazy is an O(|Σ| log m)-approximation algorithm for max closest substring.
Since max closest substring is hard to approximate with ratio better than
(log m)/4, this approximation algorithm is within 4 · |Σ| of optimal.
4 Maximum Coverage Approximate Substring
The incorrect reduction given in [7] claimed an NP-hard ratio of O(n^ε), ε = 1/4, for maximum coverage approximate substring when l = n and |Σ| = 2. Its error resulted from applying Theorem 5 of [5], proven only for alphabet size at least three, to binary strings. Hardness of approximation for the general problem is shown here by a reduction from maximum coverage.
maximum coverage
Instance: A set B of elements to be covered, a collection of sets L such that Li ⊆ B (1 ≤ i ≤ |L|), and a positive integer k.
Question: Maximize |B′|, B′ ⊆ B, such that $B' = \bigcup_{j=1}^{k} L_{i_j}$, where each $L_{i_j} \in L$.
Given an instance ⟨B, L, k⟩ of maximum coverage, we construct an instance ⟨F, l, d⟩ of maximum coverage approximate substring where m = |B|, l = k, d = k − 1 and n ≤ k|L|. The construction of F is similar to the construction used when reducing from set cover to closest substring in Section 3; unnecessary parts are removed.
The Alphabet. The strings of F are composed of characters from the alphabet
Σ. The characters of Σ correspond to the sets Li ∈ L that can be part of a
cover, so Σ = {xi : 1 ≤ i ≤ |L|}.
The Reduction. The string set F = {S1, . . . , S|B|} will contain strings corresponding to the elements of B. To construct these strings, for each j ∈ B let Lj ⊆ L be the subfamily of sets containing the element j, and define

$$S_j = \prod_{L_i \in L_j} x_i^{\,k}\,.$$

Set d = k − 1 and l = k. We seek to maximize the number of strings in F containing occurrences of some center C.
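A minimal sketch of this construction (with our own encoding choices: set L_i is rendered as the i-th lowercase letter, so this toy assumes |L| ≤ 26; nothing here is prescribed by the paper):

```python
def build_instance(B, L, k):
    """Sketch of the reduction.  B: list of elements; L: list of sets
    (each a list of elements).  Set L[i] is rendered as the i-th
    lowercase letter.  Returns (F, l, d) with l = k and d = k - 1."""
    F = []
    for j in B:
        # S_j concatenates x_i^k for every set L[i] that contains j
        S_j = "".join(chr(ord("a") + i) * k
                      for i, L_i in enumerate(L) if j in L_i)
        F.append(S_j)
    return F, k, k - 1
```

For B = {1, 2}, L = {L1, L2} with L1 = {1}, L2 = {1, 2}, and k = 2, this yields F = {"aabb", "bb"} with l = 2, d = 1; the single-set cover {L2} corresponds to the center "bb", which occurs exactly in both strings.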
Lemma 4. maximum coverage ≤GP maximum coverage approximate
substring.
Proof. Suppose ⟨B, L, k⟩ is an instance of maximum coverage with a solution set R ⊂ L, such that |R| = k and R covers b ≤ |B| elements. Then there is a center C for F of length l = k that has distance at most d = k − 1 from a substring of b strings in F. Let the k positions in C be assigned characters representing the k sets in the cover, i.e. for each xi ∈ R, there is a position p such that C[p] = xi. All b members of F corresponding to those covered elements in B contain a substring matching at least one character in C, and mismatching at most k − 1 characters. Suppose one cannot obtain a k cover with ratio better than ρ. Then one cannot obtain a center for F that occurs in more than b/ρ strings of F, so the hard ratio is ρ′ = b/(b/ρ) = ρ. ⊓⊔
Theorem 3. maximum coverage approximate substring cannot be approximated with performance ratio e/(e − 1) − ε, for any ε > 0, unless P=NP.
Proof. It was shown in [4] that the NP-hard ratio for maximum coverage is e/(e − 1) − ε. This result combined with Lemma 4 proves the theorem. ⊓⊔
Note that this reduction shows hardness for the general version of the problem, and leaves open the restricted case of l = n with |Σ| = 2. No approximation
algorithms with nontrivial ratios are known.
5 Longest Common Approximate Substring
The longest common approximate substring problem seeks to maximize the length of a center that is within a given distance from each string in the problem instance. That a feasible solution always exists can be seen by considering the case of a single character, since the problem is defined with d > 0. This problem is useful in finding seeds of high similarity for sequence comparisons.
Here we show that a simple algorithm always produces a valid center that is at least half the optimal length. A valid center is any string that has distance at most d from at least one substring of each string in F. The algorithm simply evaluates each substring of members of F and tests them as centers. The following procedure Extend accomplishes this with a time complexity of Θ(m²n³).
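A hedged Python sketch consistent with this description (names are ours; the paper's own pseudocode may differ in detail): try every substring of every input string as a candidate center and keep the longest valid one.

```python
def extend(F, d):
    """Sketch of Extend for longest common approximate substring:
    every substring of every string in F is tried as a candidate
    center; the longest valid one is returned."""
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))

    def valid(c):
        l = len(c)
        return all(
            any(hamming(c, S[i:i + l]) <= d for i in range(len(S) - l + 1))
            for S in F
        )

    best = ""
    for S in F:
        for i in range(len(S)):
            # only candidates longer than the current best can improve it
            for j in range(i + len(best) + 1, len(S) + 1):
                if valid(S[i:j]):
                    best = S[i:j]
    return best
```

There are O(mn²) candidates and each validity test costs O(mn) length-l comparisons, which is where the Θ(m²n³) bound comes from.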
Theorem 4. Extend is a 2-approximation algorithm for longest common
approximate substring.
Proof. Let C be the optimal center for F. For each Si ∈ F, let si be the occurrence of C from Si; observe that |si| = |C|. Define si,1 as the substring of si consisting of the first |C|/2 positions of si, and si,2 as the substring consisting of the remaining positions. Similarly, define C1 and C2 as the first and last half of C. For x ∈ {1, 2}, let cx be equal to the string si,x that satisfies

$$d_H(s_{i,x}, C_x) \le \min_{j \ne i} d_H(s_{j,x}, C_x).$$

Define c such that

$$c = \begin{cases} c_1 & \text{if } d_H(c_1, C_1) \le d_H(c_2, C_2), \\ c_2 & \text{otherwise.} \end{cases}$$
Note that dH(c, Cx) ≤ d/2, for some x ∈ {1, 2}. Suppose, for contradiction, that c is not a valid center. Assume, without loss of generality, that c = si,1 for some i. Then there is some si,1 such that dH(c, si,1) > d. Since dH(c, C1) = d/2 − y for some 1 ≤ y ≤ d/2, by the triangle inequality dH(si,1, C1) ≥ d/2 + y + 1. This implies that dH(si,2, C2) ≤ d/2 − y − 1 < dH(c, C1), contradicting the definition of c. Hence c is a valid center. Since c is a substring of one of the input strings, it will be found by Extend. It is half the length of the optimal length center C, so a center will be found that is at least half the length of the longest center. ⊓⊔
The performance ratio of 2 is optimal unless P=NP. We use a transformation
from the vertex cover decision problem that introduces a gap in the objective
function.
vertex cover
Instance: A graph G = (V, E) and a positive integer k.
Question: Does G have a vertex cover of size at most k, i.e., a set of vertices
V ′ ⊆ V , |V ′ | ≤ k, such that for each edge (u, v) ∈ E, at least one
of u and v belongs to V ′ ?
Suppose for some graph G, we seek to determine if G contains a vertex cover
of size k. We construct an instance of longest common approximate substring with |E| strings corresponding to the edges of G. The intuition behind
the reduction is that an occurrence of the center in each string corresponds to
the occurrence of a cover vertex in the corresponding edge. Before giving values
of n and d, we describe the gadgets used in the reduction.
The Alphabet. The string alphabet is Σ = Σ1 ∪ Σ2 ∪ {A}. We refer to these as vertex characters (Σ1), unique characters (Σ2), and the alignment character (A), where Σ1 = {vi : 1 ≤ i ≤ |V|} and Σ2 = {uij : (i, j) ∈ E}.
Substring Gadgets. We next describe the two "high level" component substrings used in the construction. The function f is any arbitrarily large polynomial function of |G|.

Vertex Selectors: $\text{vertex}(x, i, j, z) = A^{f(k)}\, u_{ij}^{(z-1)}\, v_x\, u_{ij}^{(k-z)}\, A^{f(k)}$
Separators: $\text{separator}(i, j) = u_{ij}^{3f(k)}$
The Reduction. We construct F as follows. For any edge (i, j) ∈ E:

$$S_{ij} = \prod_{1 \le z \le k} \text{vertex}(i, i, j, z)\,\text{separator}(i, j)\,\text{vertex}(j, i, j, z)\,\text{separator}(i, j)\,.$$

The length of each string is then n = k(10f(k) + 2k). The threshold distance is d = k − 1.
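The gadget and string construction can be sketched directly; in the code below (our own rendering: each alphabet character is modeled as a Python token rather than a literal symbol, and `fk` stands for f(k)), the length formula n = k(10f(k) + 2k) becomes checkable:

```python
def vertex(x, i, j, z, k, fk):
    """vertex(x, i, j, z) = A^f(k) u_ij^(z-1) v_x u_ij^(k-z) A^f(k)."""
    return (["A"] * fk + [("u", i, j)] * (z - 1) + [("v", x)]
            + [("u", i, j)] * (k - z) + ["A"] * fk)

def separator(i, j, fk):
    """separator(i, j) = u_ij^(3 f(k))."""
    return [("u", i, j)] * (3 * fk)

def build_strings(E, k, fk):
    """Build S_ij for every edge (i, j) of the graph."""
    F = {}
    for (i, j) in E:
        S = []
        for z in range(1, k + 1):
            S += vertex(i, i, j, z, k, fk) + separator(i, j, fk)
            S += vertex(j, i, j, z, k, fk) + separator(i, j, fk)
        F[(i, j)] = S
    return F
```

Each z contributes two vertex selectors of length 2f(k) + k and two separators of length 3f(k), i.e. 10f(k) + 2k symbols, and there are k values of z.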
Theorem 5. longest common approximate substring cannot be approximated in polynomial time with performance ratio better than 2 − ε, for any ε > 0, unless P=NP.
Proof. For any set of strings F so constructed, there is an exact common substring of length f(k) corresponding to the f(k) repeats of the alignment character A. Suppose there is a size k cover for the source instance of vertex cover. Construct a center C for F as follows. Assign the alignment character A to the first f(k) positions in C. To positions f(k) + 1 through f(k) + k, assign the characters corresponding to the vertices in the vertex cover. These may be assigned in any order. Finally, assign the alignment character A to the remaining f(k) positions of C. Each string in F contains a substring that matches 2f(k) + 1 positions in C, so C is a valid center.
If there is no k cover for the source instance of vertex cover, then for any length f(k) + k string there will be some S ∈ F that mismatches k positions. As f can be any arbitrarily large polynomial function of k, the NP-hard performance ratio is

$$\frac{2f(k) + k}{f(k) + k} \ \ge\ 2 - \epsilon\,,$$

for any constant ε > 0.
To show hardness for 2 − ε, where ε is not a constant (it can be a function of l), consider that we can manipulate the hard ratio into the form

$$2 - \frac{k}{f(k) + k}\,.$$

Since l is the optimal length and l = 2f(k) + k, substitute f(k) = l/2 − k/2 in the performance ratio:

$$2 - \frac{k}{l/2 - k/2 + k} = 2 - \frac{2k}{l + k}\,.$$
Suppose we select l = k^c during the reduction, where c is any arbitrarily large constant. Then we have shown a hard performance ratio of

$$2 - \frac{2l^{1/c}}{l + l^{1/c}} \ \ge\ 2 - \frac{2l^{1/c}}{l} \ =\ 2\left(1 - \frac{1}{l^{(c-1)/c}}\right). \qquad ⊓⊔$$
6 Conclusion
These results show that, unless P=NP, the max closest substring, maximum coverage approximate substring, and longest common approximate substring problems all have limitations on their approximability.
The relationships between the different objective functions produce an interesting interplay between the approximability of minimizing d with l fixed, maximizing l with d fixed, and maximizing their difference l − d. While this last variant, the max closest substring problem, has a hard performance ratio directly related to the number of strings m, the two variants that fix one parameter and attempt to maximize the difference by optimizing the other parameter have lower ratios of approximability. It is NP-hard to approximate max closest substring with a performance ratio better than (log m)/4, and we have provided a (|Σ| log m)-approximation. For longest common approximate substring, with d fixed, the length can be approximately maximized with a ratio of 2, and it is NP-hard to approximate for any smaller ratio. The best ratio of approximation is for closest substring, where l is fixed and d is minimized; the PTAS of [6] achieves a ratio of (1 + 1/(2r − 1) + ε), for any 1 ≤ r ≤ m, and we have now shown that unless P=NP it cannot be approximated closer than 1 + 1/O(m).
For the quorum variant of closest substring, where the number of strings covered is instead the objective function to be maximized, it is NP-hard to obtain a performance ratio better than e/(e − 1). The restricted variant with l = n and |Σ| = 2, once thought to be proven hard by [7], is still open, with neither a hardness result nor a nontrivial approximation algorithm.
Our reductions use alphabets whose size grows with the input. The complexity of variants of these problems where the alphabet size is treated as a constant is open, except as they relate to known results for constant alphabets [6,7].
References
1. Sanjeev Arora. Probabilistic checking of proofs and the hardness of approximation problems. PhD thesis, UC Berkeley, 1994.
2. Carsten Lund and Mihalis Yannakakis. On the hardness of approximating minimization problems. Journal of the ACM, 41(5), 1994.
3. Ran Raz and Shmuel Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proceedings of the Annual ACM Symposium on Theory of Computing, 475–484, 1997.
4. Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652, 1998.
5. J. K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing string selection problems. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, 633–642. ACM Press, 1999.
6. Ming Li, Bin Ma, and Lusheng Wang. On the closest string and substring problems. Journal of the ACM, 49(2):157–171, 2002.
7. Bin Ma. A polynomial time approximation scheme for the closest substring problem. In Combinatorial Pattern Matching (CPM 2000), Lecture Notes in Computer Science 1848, 99–107. Springer, 2000.
8. Marie-France Sagot. Spelling approximate repeated or common motifs using a suffix tree. In LATIN'98, Lecture Notes in Computer Science 1380, 374–390. Springer, 1998.
On Lawson's Oriented Walk in Random Delaunay Triangulations⋆
Binhai Zhu
Department of Computer Science
Montana State University
Bozeman, MT 59717-3880 USA
bhz@cs.montana.edu
Abstract. In this paper we study the performance of Lawson's Oriented Walk, a 25-year-old randomized point location algorithm without any preprocessing and extra storage, in 2-dimensional Delaunay triangulations. Given n pseudo-random points drawn from a convex set C with unit area and their Delaunay triangulation D, we prove that the algorithm locates a query point q in D in expected $O(\sqrt{n \log n})$ time. We also present an improved version of this algorithm, Lawson's Oriented Walk with Sampling, which takes expected O(n^{1/3}) time. Our technique is elementary, and the proof in fact relates Lawson's Oriented Walk to Walkthrough, another well-known point location algorithm without preprocessing. Finally, we present empirical results to compare these two algorithms with their siblings, Walkthrough and Jump&Walk.
Keywords: Random Delaunay triangulation, point location, average-case analysis.
1 Introduction
Point location is one of the classical problems in computational geometry, GIS, graphics and solid modeling. In general, point location deals with the following problem: given a set of disjoint geometric objects, determine the object containing a query point. The theoretical problem is well studied in the computational geometry literature and several theoretically optimal algorithms have been proposed since the early 1980s; see, e.g., Snoeyink's recent survey [Sn97]. In the last couple of years, optimal or close to optimal solutions (sometimes even in the average case) have been proposed with simpler data structures [ACMR00,AMM00,AMM01a,AMM01b,GOR97]. All these (theoretically) faster algorithms require preprocessing to obtain fast query bounds.
However, it should be noted that in practice point location is mainly used as a subroutine for computing and updating large scale triangulations, as in mesh generation. Therefore, extra preprocessing and building additional data structures
⋆ The research is partially supported by NSF CARGO grant DMS-0138065 and a MONTS grant.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 222–233, 2003.
© Springer-Verlag Berlin Heidelberg 2003
is hard, if not impossible, to perform in practice. We need practical point location solutions that perform no or very little preprocessing; moreover, as the Delaunay triangulation is used predominantly in areas like mesh generation, finite-element analysis (FEA) and GIS, we in fact need efficient practical point location algorithms in Delaunay triangulations.
Practical point location in Delaunay triangulations has received substantial attention from computational geometers only recently [DMZ98,De98,MSZ99,DLM99]. All these works are somehow based on an idea due to Green and Sibson to use the "walkthrough" method to perform point location in a Delaunay triangulation, a common data structure in these areas. In particular, the Jump&Walk method of [DMZ98,MSZ99] uses random sampling to select a good starting point from which to walk toward the destination, while others mix the "walkthrough" idea with some extra simple tree-like data structure to make the algorithm more general [De98,DLM99] (e.g., to deal with arbitrarily distributed data [De98] or to handle extremely large input while bounding the query time [DLM99]). Some of these algorithms, e.g., Jump&Walk, have been used in important software packages [Sh96,TG+96,BDTY00]. Theoretically, for pseudo-uniformly distributed points in a convex set C, in 2D Jump&Walk is known to have a running time of O(n^{1/3}) when the query point is slightly away from the boundary of C [DMZ98]. A similar result holds in 3D [MSZ99]. (We remark that similar "walk" ideas have also been used in ray shooting [AF97,HS95].)
Lawson's Oriented Walk, another randomized point location algorithm without preprocessing, was proposed in 1977 [La77]. It is known that, unlike the Walkthrough method, it can run into loops in arbitrary triangulations. But in Delaunay triangulations it always terminates [Ed90,DFNP91]. Almost no theoretical analysis was ever done on its performance, and this question was raised again recently [DPT01]. In this paper, we focus on proving the expected performance of Lawson's Oriented Walk algorithm in a random Delaunay triangulation (i.e., the Delaunay triangulation of n random points). (We remark that given these random data, when enough preprocessing, i.e., Θ(n) expected time and space, is performed, we can answer point location queries in expected O(1) time [AEI+85].)
Delaunay Triangulations. For completeness, we briefly mention the following
definitions. Further details can be found in some standard textbooks like [PS85].
The convex hull of a finite point set X is the smallest convex set containing X,
denoted as CH(X). The convex hull of a set of k + 1 affinely independent points
in Rd , for 0 ≤ k ≤ d, is called a k-simplex; i.e., a vertex, an edge, a triangle,
or a tetrahedron, etc. If k = d, we also say the simplex is full dimensional.
A triangulation T of X is a subdivision of the convex hull of X consisting of
simplices with the following two properties: (1) for every simplex in T , all its
faces are also simplices in T ; (2) the intersection of any two simplices in T is
either empty or a face of both, in which case it is again a simplex in T . A
Delaunay triangulation D of X is a triangulation in which the circumsphere
of every full-dimensional simplex is empty, i.e., contains no points of X in its
interior.
Point Location by Walking. The basic idea is straightforward; it goes back to early work on constructing Delaunay triangulations in 2D and 3D [GS78,Bo81]. Given a Delaunay triangulation D of a set X of n points in R^d, and a query point q, in order to locate the (full-dimensional) simplex in D containing q, start at some arbitrary simplex in D and then "walk" from the center of that simplex to a neighboring simplex "in the general direction" of the target point q. Figure 1 shows an example of the straight Walkthrough method walking from an edge e to q. Other simple variations of this kind of "walk" are possible, e.g., the Orthogonal Walk [DPT01]. The underlying assumption for a "walk" is that D is given by an internal representation allowing constant-cost access between neighboring simplices (for example, in 2D, a linked list of triangles suffices as long as each triangle stores its corresponding local information, i.e., the coordinates of its three vertices and pointers to its three edges and three neighboring triangles). The list of other suitable data structures includes the 2D quad-edge data structure [GS85], the edge-facet structure in 3D [DL89], its specialization and compactification to the domain of 3D triangulations [Mu93], and its generalization to d dimensions [Br93].
Fig. 1. An example for the walkthrough method and Lawson's Oriented Walk.
Lawson's Oriented Walk. Given the Delaunay triangulation D of the n points {X1, X2, . . . , Xn} and a query point q, Lawson's Oriented Walk algorithm locates the simplex of D containing q, if such a simplex exists, as follows (Figure 1).
(1) Select an edge e = Y1Y2 at random from D.
(2) Determine the triangle t adjacent to e such that t and q are on the same side of the line containing e. Let the other two edges of t be e1, e2.
(3) Determine ei, i = 1, 2, such that the halfplane hi, passing through ei and not containing t, contains q. If both ei's have this property, randomly pick one. If neither ei has this property, return t as the triangle containing q.
(4) Update e ← ei and repeat steps (2)–(4).
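The steps above can be sketched in Python. The triangulation interface used here is our assumption, not the paper's: each triangle record carries `verts` (its three vertices, counter-clockwise) and `neighbors` (the neighbor across the edge opposite each vertex), and the query point is assumed to lie inside the triangulation.

```python
import random

def orient(a, b, p):
    """Sign of the cross product (b - a) x (p - a): positive iff p is
    strictly to the left of the directed line a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def lawson_walk(tri, q):
    """Sketch of Lawson's Oriented Walk starting from triangle `tri`."""
    while True:
        exits = []
        for i in range(3):
            a, b = tri.verts[(i + 1) % 3], tri.verts[(i + 2) % 3]
            # q lies strictly on the far side of the edge opposite vertex i;
            # the edge we entered through never qualifies, so testing all
            # three edges matches testing only e1, e2 in step (3)
            if orient(a, b, q) < 0:
                exits.append(i)
        if not exits:
            return tri                        # no separating edge: q is in tri
        tri = tri.neighbors[random.choice(exits)]   # random pick, as in step (3)
```

When two edges separate q from t, the random choice is exactly what prevents a deterministic adversarial cycle; as noted above, in Delaunay triangulations the walk always terminates.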
The advantage of Lawson's Oriented Walk is that it handles geometric degeneracy better in practice than the Walkthrough method (in which some edges of D might be collinear with the walking segment). In the following, we focus on proving the expected performance of Lawson's Oriented Walk algorithm under the assumption that D is the Delaunay triangulation of n points X1, . . . , Xn pseudo-uniformly distributed in a compact convex set C.
2 Theoretical Analysis
We start by recalling some fundamental definitions. Let C be a compact convex set of R² and let α and β be two reals such that 0 < α < β. We say that a probability measure P is an (α, β)-measure over C if P[C] = 1 and if we have α λ(S) ≤ P[S] ≤ β λ(S) for every measurable subset S of C, where λ is the usual Lebesgue measure. An R²-valued random variable X is called an (α, β)-random variable over C if its probability law L(X) is an (α, β)-measure over C. A particular and important example of an (α, β)-measure P is when P is a probability measure with density f(x) such that α ≤ f(x) ≤ β for all x ∈ C. This probabilistic model is slightly more general than the uniform distribution, and we will loosely call it pseudo-uniform or pseudo-random.
Throughout this section, the ci's are constants related to the local geometry (but not to n). The idea of our analysis of Lawson's Oriented Walk in random Delaunay triangulations is as follows. When e = Y1Y2 is selected, we consider two situations. In case 1, the segment pq, where p is any point on e, is $O(\sqrt{\log n / n})$ distance away from the boundary of C, ∂C. In case 2, the segment pq could be very close to ∂C (but this event has a very small probability). In both cases, we argue that the number of triangles visited by Lawson's Oriented Walk is proportional to the number of triangles crossed by the segment pq. To estimate the number of triangles of D crossed by a line segment pq when pq is $O(\sqrt{\log n / n})$ distance away from ∂C, we need the following lemma of [BD98], which is reorganized as follows.
Lemma 1. Let C be a compact convex set with unit area in R² and let X1, . . . , Xn be n points drawn independently in C from an (α, β)-measure. Let D be the Delaunay triangulation of X1, . . . , Xn. If L is a fixed line segment of length |L| in C, at distance $O(\sqrt{\log n / n})$ from the boundary of C, and L is independent of X, then the expected number of triangles or edges of the Delaunay triangulation D crossed by L is bounded by

$$c_3 + c_4 |L| \sqrt{n}\,.$$
We now prove the following lemma.
Lemma 2. Let E[T1(e, q)] be the expected number of triangles crossed by (or, visited by the walkthrough method along) a straight segment pq, where p ∈ e is any point of e, e = Y1Y2 is a random edge picked by Lawson's Oriented Walk, the query point q is independent of X1, . . . , Xn, and both e and q are $O(\sqrt{\log n / n})$ distance away from ∂C. We have $E[T_1(e, q)] \le c_5 + c_6\, E|pq| \sqrt{n}$.
Proof. Let De be the Delaunay triangulation for the data points {X1, . . . , Xn} −
{Y1, Y2}. Then L = pq, the line segment connecting p and q, is independent of
the data points {X1, . . . , Xn} − {Y1, Y2}. By Lemma 1, pq crosses an expected
number of c3 + c4 · E|pq| · √(n − 2) edges in De.

Let T1(e, q) denote the number of triangles in D crossed by pq, p ∈ e. Clearly
ET1(e, q) is bounded by the number of triangles in D crossed by pq, which is in
turn bounded by the number of triangles of De crossed by pq plus the sum S of
the degrees of Y1, Y2 in the Delaunay triangulation De. To see this, note that L
either crosses a triangle without one of Y1 and Y2 as a vertex (in which case the
triangle is identical in D and De) or with one of Y1 and Y2 as a vertex. The total
number of the latter kind of triangles does not exceed S. The expected value
of S is, by symmetry, 2 times the expected degree of Y1, which is at most 6 by
Euler's formula. Therefore, we have

    ET1(e, q) ≤ 6 × 2 + c3 + c4 · E|pq| · √(n − 2)
              ≤ 12 + c3 + c4 · E|pq| · √n
              = c5 + c6 · E|pq| · √n ,   c5 > 12 .

This concludes the proof of Lemma 2. ⊓⊔
Lemma 2 has a very interesting implication, which will be useful in the proof
of Theorem 1. We simply list it as a corollary.

Corollary 1. Let e, q, c5, c6 be as in Lemma 2. If c5 + c6 · E|p′q| · √n, for some
p′ ∈ e, is greater than a value, then so is c5 + c6 · E|pq| · √n, for every p ∈ e.
Now we are ready to prove the following theorem regarding the expected
performance of Lawson's Oriented Walk in a random Delaunay triangulation.

Theorem 1. Let C be a compact convex set with unit area in R², and let
X1, . . . , Xn be n points drawn independently in C from an (α, β)-measure. If
the query point q is independent of X1, . . . , Xn and is O(√(log n / n)) distance away
from ∂C, then the expected number of triangles visited by Lawson's Oriented Walk
is bounded by

    c1 + c2 · √(n log n) .
On Lawson’s Oriented Walk in Random Delaunay Triangulations
227

Proof of Theorem 1. Let B be the event that e is O(√(log n / n)) distance
away from the boundary of C, i.e., B = {e is O(√(log n / n)) distance away from
∂C}. Clearly, P[B] ≥ 1 − β · O(√(log n / n)) and P[B̄] ≤ β · O(√(log n / n)),
following the property of an (α, β)-measure.
Let E[T(e, q)], e = Y1 Y2, be the expected number of triangles of D visited
by Lawson's Oriented Walk. We first consider E[T(e, q)|B]. Let t be the triangle
incident to e such that t and q are on the same side of the line through e. Let
t = △Y1 Y2 Y3. We have two cases: (a) Y3 is inside △qY1 Y2; and (b) Y3 is outside
of △qY1 Y2. We prove by induction that E[T(e, q)|B] ≤ c7 + c8 · E|pq| · √n, for any
point p ∈ e; moreover, c7 = c5 and c8 = c6 suffice.
Notice that in case (a), the algorithm needs to pick up e1 or e2 randomly.
Without loss of generality, assume that the algorithm picks e1. We have

    E[T(e, q)|B] = 1 + E[T(e1, q)|B].

In this case the distance from any point on e1 to q is always smaller than the
distance from some point on e to q. We extend qY3, which intersects e at a point
Y, so that |qY| = |qY3| + |Y3 Y| (Figure 2(a)). We prove by induction that in this
case E[T(e, q)|B] ≤ c7 + c8 · E|pq| · √n for any p ∈ e. (The induction is on the
number of edges visited by the algorithm, in reverse order.) The basis is
straightforward: if q is inside a triangle incident to e and p is any point on e,
then E[T(e, q)|B] = 1 and, following Lemma 2, c7 + c8 · E|pq| · √n is less than or
equal to c7 + c8 · O(√(log n / n)) · √n. (This is due to the fact that |pq| is less
than the maximal edge length of the triangle containing q; following
[BEY91,MSZ99], the expected maximal edge length in D is O(√(log n / n)) when
the edge is O(√(log n / n)) distance away from the boundary of C.) Clearly,
1 ≤ c7 + c8 · O(√(log n / n)) · √n = c7 + c8 · O(√(log n)) (if we set c7 = c5 > 12).
Let the inductive hypothesis be E[T(e1, q)|B] ≤ c7 + c8 · E|qY′| · √n,
for any Y′ ∈ e1. Consequently, E[T(e1, q)|B] ≤ c7 + c8 · E|qY3| · √n, as Y3 ∈ e1.
We have

    E[T(e, q)|B] = 1 + E[T(e1, q)|B]
                 ≤ 1 + c7 + c8 · E|qY3| · √n
                 = c7 + c8 · E(|qY3| + |Y3 Y|) · √n + (1 − c8 · √n · E|Y Y3|),

which is bounded by c7 + c8 · E|qY| · √n, Y ∈ Y1 Y2, if we set 1 − c8 · E|Y Y3| · √n < 0,
i.e., c8 ≥ 1/(E|Y Y3| · √n). Following [BEY91,MSZ99], E|Y Y3| ≤ c9 · √(log n / n). So in this
case we just need to set c8 = max{c6, 1/(c9 · √(log n))}, which is c6 when n is sufficiently
large. To finish our inductive proof for case (a) using Corollary 1, we can simply
set c7 = c5. In other words, E[T(e, q)|B] ≤ c7 + c8 · E|pq| · √n, for any point p ∈ e;
moreover, c7 = c5 and c8 = c6.
Notice that in case (b), the algorithm can only pick up one of e1 and e2.
Without loss of generality, assume that the algorithm picks e1. Let the intersection
of qY1 and e1 be Y (Figure 2(b)). In this case we still have E[T(e, q)|B] =
1 + E[T(e1, q)|B].

In this case, we can again prove by induction that E[T(e, q)|B] is bounded
by c7 + c8 · E|pq| · √n, for any p ∈ e. We consider the line segment
Fig. 2. Illustration for the proof of Theorem 1.
qY1, with |qY1| = |qY| + |Y Y1|. From the inductive hypothesis we further have

    E[T(e1, q)|B] ≤ c7 + c8 · E|qY| · √n.

Therefore, in this case we also have

    E[T(e, q)|B] = 1 + E[T(e1, q)|B]
                 ≤ 1 + c7 + c8 · E|qY| · √n
                 = c7 + c8 · E|qY1| · √n + (1 − c8 · E|Y Y1| · √n).

To make E[T(e, q)|B] ≤ c7 + c8 · E|qY1| · √n, we just need to set c8 ≥ 1/(E|Y Y1| · √n).
Again, following [BEY91,MSZ99], E|Y Y1| ≤ c9 · √(log n / n); in this case we also need
to set c8 = max{c6, 1/(c9 · √(log n))} = c6. Similarly, we can set c7 = c5 and finish the
inductive proof for case (b).
By definition, we have

    E[T(e, q)] = E[T(e, q)|B] · P[B] + E[T(e, q)|B̄] · P[B̄].

To conclude the proof, we note that E|pq| is Θ(1) in both cases. To
see this, let p be any point on Y1 Y2, and note that |pq|² · π is the probability
content of the circle centered at q with radius |pq|, and is therefore distributed
as a uniform [0, c10] random variable, which we call Z. Clearly, E{Z} = c10/2.
Following the Cauchy–Schwarz inequality,

    E|pq| ≤ √(E|pq|²) = √(E(Z)/π) = √(c10/(2π)) .

Also, note that E[T(e, q)|B̄] is bounded by the size of D, i.e., E[T(e, q)|B̄] =
O(n). A final calculation shows that

    E[T(e, q)] ≤ c1 + c2 · √(n log n) .   ⊓⊔
3 Lawson’s Oriented Walk with Sampling
We notice that it is very easy to generalize Lawson’s Oriented Walk by starting
at a ‘closer’ edge e using random sampling, as done in [DMZ98]. The algorithm
is presented as follows.
(1) Select m edges at random and without replacement from D. Let e = Y1 Y2
    be the closest one to q.
(2) Determine the triangle t adjacent to e such that t and q are on the same side
    of the line containing e. Let the other two edges of t be e1, e2.
(3) Determine ei, i = 1, 2, such that the halfplane hi passing through ei and not
    containing t contains q. If both ei's have this property, randomly pick
    one. If neither ei has this property, return t as the triangle containing q.
(4) Update e ← ei and repeat steps (2)–(4).
In Step (1), the distance between a sample edge and q can be measured as the
distance between the midpoint of the sample edge and q. The following theorem
can be obtained in much the same way as in [DMZ98]. We hence omit the proof.
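The steps above can be sketched over a minimal triangulation data structure. The representation is ours, not the paper's: triangles are counter-clockwise vertex-index triples, the neighbor across a directed edge (a, b) is the triangle that contains the reversed edge (b, a), and q is assumed to lie inside the triangulated convex domain.

```python
import math
import random

def orient(a, b, c):
    """Twice the signed area of (a, b, c); positive iff c lies to the
    left of the directed line from a to b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def lawson_walk_with_sampling(points, triangles, q, m, rng=random.Random(1)):
    """Locate the triangle containing q.  `triangles` holds
    counter-clockwise vertex-index triples into `points`; q is assumed
    to lie inside the triangulated convex domain."""
    # Neighbor structure: directed edge -> index of the triangle using it.
    edge_tri = {}
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_tri[(u, v)] = t
    # Step (1): sample m edges; start from the one whose midpoint is closest to q.
    sample = rng.sample(list(edge_tri), min(m, len(edge_tri)))
    def midpoint_dist(e):
        (ax, ay), (bx, by) = points[e[0]], points[e[1]]
        return math.hypot((ax + bx) / 2 - q[0], (ay + by) / 2 - q[1])
    u, v = min(sample, key=midpoint_dist)
    # Step (2): take the triangle incident to e = (u, v) on the same side as q.
    if orient(points[u], points[v], q) < 0:
        u, v = v, u                  # flip the edge so that q is to its left
    t = edge_tri[(u, v)]
    visited = 1
    # Steps (3)-(4): repeatedly cross an edge whose outer halfplane
    # contains q, chosen at random when both candidate edges qualify.
    while True:
        a, b, c = triangles[t]
        exits = [(x, y) for (x, y) in ((a, b), (b, c), (c, a))
                 if orient(points[x], points[y], q) < 0]
        if not exits:
            return t, visited        # no exit edge: t contains q
        x, y = rng.choice(exits)
        t = edge_tri[(y, x)]         # neighbor across the crossed edge
        visited += 1
```

For example, on the four-triangle Delaunay triangulation of the unit square's corners plus its center, a query near the bottom edge is located after at most a handful of steps from any sampled start edge.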
Theorem 2. Let C be a compact convex set with unit area in R², and let
X1, . . . , Xn be n points drawn independently in C from an (α, β)-measure. If
the query point q is independent of X1, . . . , Xn and is O(√(log n / n)) distance away
from ∂C, then the expected time of Lawson’s Oriented Walk with Sampling is
bounded by

    c11 m + c12 · √(n/m) .

If m = Θ(n^{1/3}), then the running time is optimized to O(n^{1/3}), provided that q
is O(√(log n) / n^{1/3}) distance away from ∂C.
4 Empirical Results
In this section, we present some empirical results to compare the following algorithms: Green and Sibson’s Walkthrough method (Walk), Lawson’s Oriented
Walk (Lawson), Jump and Walk (J&W) and Lawson’s Oriented Walk with Sampling (L&S). All the data points and query points are within a unit square Q
bounded by (0,0) and (1,1). (Throughout this section, we define an axis-parallel
square by giving the coordinates of its lower-left and upper-right corner points.)
We mainly consider two classes of data: random (uniformly generated) points in
Q and three clusters of random points in Q. The latter case does not satisfy the
conditions of the theorems we have proved in this paper, but it covers a practical
situation in which data points may be clustered.
The 3-cluster contains three cluster squares defined by lower-left and upper-right
corner points: (0.40,0.10) and (0.63,0.33); (0.70,0.67) and (0.93,0.90); and
(0.10,0.67) and (0.33,0.90). Each cluster square has an area of 0.0529 (or 5.29%
of the area of Q). In Figure 3 we show two examples for random data and 3-cluster
data when there are 200 data points. In both situations, we include the
four corner points of Q as data points.

Fig. 3. 200 random data points in Q and 200 random data points within the three-cluster.
Our empirical results are summarized in Table 1 and Table 2. For each n,
we record the average cost (i.e., # of triangles visited) over 10000 queries. The
actual cost is also related to the actual implementation, especially the geometric
primitives used. For Jump&Walk and Lawson’s Oriented Walk with Sampling,
we use either s1 = ⌊n^{1/3}⌋ or s2 = ⌈n^{1/3}⌉ sample edges, depending on whether
|n − s1³| or |n − s2³| is smaller.
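This rounding rule translates into a small helper (a sketch; the function name is ours):

```python
def sample_size(n):
    """Choose floor(n**(1/3)) or ceil(n**(1/3)), whichever cube is
    closer to n -- the rule used for the experiments above."""
    s1 = int(n ** (1.0 / 3.0))
    # Guard against floating-point undershoot of the cube root.
    while (s1 + 1) ** 3 <= n:
        s1 += 1
    s2 = s1 + 1
    return s1 if abs(n - s1 ** 3) <= abs(n - s2 ** 3) else s2
```

For n = 10000, for instance, 21³ = 9261 and 22³ = 10648, so the rule picks 22 samples.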
Table 1. Comparison of Walk, Jump&Walk, Lawson’s Oriented Walk and Lawson’s
Oriented Walk with Sampling when the data points are random.
n       10000  15000  20000  25000  30000  35000  40000  45000  50000
Walk      110    130    155    182    197    211    227    235    257
Lawson    127    140    173    193    209    244    243    258    265
J&W        24     28     31     33     35     38     39     40     42
L&S        25     29     33     35     37     41     42     43     45
From Table 1, we can see that when the data points are randomly generated,
Lawson’s Oriented Walk usually visits an extra (small) constant number of triangles
compared with Green and Sibson’s walkthrough method. This conforms
with the proof of Theorem 1 (in which we set c8 = c6, i.e., the number of triangles
visited by the two algorithms is bounded by the same function). For Jump &
Walk and Lawson’s Oriented Walk with Sampling, the difference is even smaller.
Table 2. Comparison of Walk, Jump&Walk, Lawson’s Oriented Walk and Lawson’s
Oriented Walk with Sampling when the data points are clustered.
n       10000  15000  20000  25000  30000  35000  40000  45000  50000
Walk       87    114    137    148    156    170    184    184    187
Lawson    103    132    151    156    175    189    207    225    237
J&W        27     33     34     36     37     40     41     44     45
L&S        29     33     36     38     39     41     44     46     47
From Table 2, we can see that when the data points are clustered, a similar
fact can be observed: Lawson’s Oriented Walk usually visits an extra constant
number of triangles compared with Green and Sibson’s walkthrough method, and
the difference between Jump & Walk and Lawson’s Oriented Walk with Sampling
is very small. One interesting observation is that the costs for the walkthrough
and Lawson’s Oriented Walk algorithms when data are clustered are lower than
the corresponding costs when data are random. The reason is probably the
following: as the three clusters have a total area of 15.87% of Q, most parts of
the Delaunay triangulation in Q are ‘sparse’. Since the 10000 query points are
randomly generated, we can say that most of the time these algorithms traverse
those ‘sparse’ regions.
5 Closing Remarks
We remark that results similar to Theorem 1 and Theorem 2 hold for d = 3,
with a polylog factor and extra boundary conditions inherited from [MSZ99]. It
is an interesting question whether we can generalize these results to any fixed
dimension, possibly with no extra polylog factor.

The theoretical results in this paper imply that within random Delaunay
triangulations Lawson’s Oriented Walk performs in very much the same way
as the Walkthrough method. Empirical results show that the Walkthrough performs
slightly better. Still, if we know in advance that degeneracies could appear in
the data, then Lawson’s Oriented Walk might be a better choice. It seems that
when the input data points are random, such degeneracies do not occur.
Acknowledgement. The author would like to thank Sunil Arya for communicating his research results.
References
[AEI+ 85]
T. Asano, M. Edahiro, H. Imai, M. Iri, and K. Murota. Practical use
of bucketing techniques in computational geometry. In G. T. Toussaint,
editor, Computational Geometry, pages 153–195. North-Holland, Amsterdam, Netherlands, 1985.
[AF97]
B. Aronov and S. Fortune. Average-case ray shooting and minimum weight
triangulations. In Proceedings of the 13th Symposium on Computational
Geometry, pages 203–212, 1997.
[ACMR00] S. Arya, S.W. Cheng, D. Mount and H. Ramesh. Efficient expected-case
algorithms for planar point location. In Proceedings of the 7th Scand.
Workshop on Algorithm Theory, pages 353–366, 2000.
[AMM00] S. Arya, T. Malamatos and D. Mount. Nearly optimal expected-case planar point location. In Proceedings of the 41st IEEE Symp. on Foundations of Computer Science, pages 208–218, 2000.
[AMM01a] S. Arya, T. Malamatos and D. Mount. A simple entropy-based algorithm
for planar point location. In Proceedings of the 12th ACM/SIAM Symp
on Discrete Algorithms, pages 262–268, Jan, 2001.
[AMM01b] S. Arya, T. Malamatos and D. Mount. Entropy-preserving cuttings
and space-efficient planar point location. In Proceedings of the 12th
ACM/SIAM Symp on Discrete Algorithms, pages 256–261, Jan, 2001.
[BD98]
P. Bose and L. Devroye. Intersections with random geometric objects.
Comp. Geom. Theory and Appl., 10:139–154, 1998.
[BDTY00] J. Boissonnat, O. Devillers, M. Teillaud and M. Yvinec. Triangulations in CGAL. In Proceedings of the 16th Symp. on Computational Geometry, pages 11–18, 2000.
[BEY91]
M. Bern, D. Eppstein, and F. Yao. The expected extremes in a Delaunay triangulation. International Journal of Computational Geometry &
Applications, 1:79–91, 1991.
[Bo81]
A. Bowyer. Computing Dirichlet tessellations. The Computer Journal,
24:162–166, 1981.
[Br93]
E. Brisson. Representing geometric structures in d dimensions: Topology
and Order. Discrete & Computational Geometry, 9(4):387–426, 1993.
[De98]
O. Devillers. Improved incremental randomized Delaunay triangulation.
In Proceedings of the 14th Symposium on Computational Geometry, pages
106–115, 1998.
[DFNP91] L. De Floriani, B. Falcidieno, G. Nagy and C. Pienovi. On sorting triangles
in a Delaunay tessellation. Algorithmica, 6: 522–532, 1991.
[DLM99]
L. Devroye, C. Lemaire and J-M. Moreau. Fast Delaunay point location
with search structures. In Proceedings of the 11th Canadian Conf on
Computational Geometry, pages 136–141, 1999.
[DMZ98]
L. Devroye, E. P. Mücke, and B. Zhu. A note on point location in Delaunay
triangulations of random points. Algorithmica, Special Issue on Average
Case Analysis of Algorithms, 22(4):477–482, 1998.
[DL89] D. P. Dobkin and M. J. Laszlo. Primitives for the manipulation of three-dimensional subdivisions. Algorithmica, 4(1):3–32, 1989.
[DPT01]
O. Devillers, S. Pion, and M. Teillaud. Walking in a triangulation. In Proceedings of 17th ACM Symposium on Computational Geometry (SCG’01),
pages 106–114, 2001.
[Ed90]
H. Edelsbrunner. An acyclicity theorem for cell complexes in d dimensions.
Combinatorica, 10(3):251–280, 1990.
[GOR97]
M. T. Goodrich, M. Orletsky, and K. Ramaiyer. Methods for achieving fast query times in point location data structures. In Proceedings of
Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA
’97), pages 757–766, 1997.
[GS78]
P. J. Green and R. Sibson. Computing Dirichlet tessellations in the plane.
The Computer Journal, 21:168–173, 1978.
[GS85]
L. J. Guibas and J. Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi diagrams. ACM Transactions
on Graphics, 4(2):74–123, 1985.
[HS95] J. Hershberger and S. Suri. A pedestrian approach to ray shooting: shoot a ray, take a walk. J. Algorithms, 18:403–431, 1995.
[La77]
C. L. Lawson. Software for C 1 surface interpolation. In J.R. Rice, editor,
Mathematical Software III, pages 161–194. Academic Press, NY, 1977.
[Mu93] E. P. Mücke. Shapes and Implementations in Three-Dimensional Geometry. Ph.D. thesis, Technical Report UIUCDCS-R-93-1836, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1993.
[MSZ99] E. P. Mücke, I. Saias and B. Zhu. Fast randomized point location without preprocessing in two- and three-dimensional Delaunay triangulations. Comp. Geom. Theory and Appl., Special Issue for SoCG’96, 12(1/2):63–83, 1999.
[PS85] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985.
[Sh96] J. R. Shewchuk. Triangle: Engineering a 2D quality mesh generator and Delaunay triangulator. In Proceedings of the First ACM Workshop on Applied Computational Geometry, pages 124–133, 1996.
[Sn97] J. Snoeyink. Point location. In J. E. Goodman and J. O’Rourke, editors, Handbook of Discrete and Computational Geometry, pages 559–574. CRC Press, Boca Raton, 1997.
[TG+96] H. Trease, D. George, C. Gable, J. Fowler, E. Linnbur, A. Kuprat and A. Khamayseh. The X3D Grid Generation System. In Proceedings of the 5th International Conference on Numerical Grid Generation in Computational Field Simulations, pages 239–244, 1996.
Competitive Exploration of Rectilinear Polygons
Mikael Hammar1 , Bengt J. Nilsson2 , and Mia Persson2
1
Department of Computer Science, Salerno University, Baronissi (SA) - 84081, Italy.
hammar@dia.unisa.it
2
Technology and Society, Malmö University College, S-205 06 Malmö, Sweden.
{Bengt.Nilsson,Mia.Persson}@ts.mah.se
Abstract. Exploring a polygon with a robot, when the robot does not
have a map of its surroundings, can be viewed as an online problem. Typical for online problems is that you must make decisions based on past
events without complete information about the future. In our case the
robot does not have complete information about the environment. Competitive analysis can be used to measure the performance of methods
solving online problems. The competitive ratio of such a method is the
ratio between the method’s performance and the performance of the best
method having full knowledge of the future. We are interested in obtaining good upper bounds on the competitive ratio of exploring polygons
and prove a 3/2-competitive strategy for exploring a simple rectilinear
polygon in the L1 metric.
1 Introduction
Exploring an environment is an important and well studied problem in robotics.
In many realistic situations the robot does not possess complete knowledge about
its environment, e.g., it may not have a map of its surroundings [1,2,4,6,7,8,9].
The search of the robot can be viewed as an online problem since the robot’s
decisions about the search are based only on the part of its environment that
it has seen so far. We use the framework of competitive analysis to measure the
performance of an online search strategy S. The competitive ratio of S is defined
as the maximum of the ratio of the distance traveled by a robot using S to the
optimal distance of the search.
We are interested in obtaining good upper bounds for the competitive ratio
of exploring a rectilinear polygon. The search is modeled by a path or closed
tour followed by a point sized robot inside the polygon, given a starting point
for the search. The only information that the robot has about the surrounding
polygon is the part of the polygon that it has seen so far. Deng et al. [4] show a
deterministic strategy having competitive ratio two for this problem if distance
is measured according to the L1 -metric. Hammar et al. [5] prove a strategy with
competitive ratio 5/3 and Kleinberg [7] proves a lower bound of 5/4 for the
competitive ratio of any deterministic strategy. We will show a deterministic
strategy obtaining a competitive ratio of 3/2 for searching a rectilinear polygon
in the L1 -metric.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 234–245, 2003.
© Springer-Verlag Berlin Heidelberg 2003
The paper is organized as follows. In the next section we present some definitions and preliminary results. In Section 3 we give an overview of the strategy
by Deng et al. [4]. Section 4 contains an improved strategy giving a competitive
ratio of 3/2.
2 Preliminaries
We will henceforth always measure distance according to the L1 metric, i.e., the
distance between two points p and q is defined by

    ||p, q|| = |px − qx| + |py − qy|,

where px and qx are the x-coordinates of p and q, and py and qy are the
y-coordinates. We define the x-distance between p and q to be ||p, q||x = |px − qx|
and the y-distance to be ||p, q||y = |py − qy|.
If C is a polygonal curve, then the length of C, denoted length(C), is defined as
the sum of the distances between consecutive pairs of segment end points in C.
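These definitions translate directly into code; a small sketch (the function names are ours):

```python
def l1(p, q):
    """L1 distance ||p, q|| = |px - qx| + |py - qy|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def l1_x(p, q):
    """x-distance ||p, q||_x = |px - qx|."""
    return abs(p[0] - q[0])

def l1_y(p, q):
    """y-distance ||p, q||_y = |py - qy|."""
    return abs(p[1] - q[1])

def length(curve):
    """length(C): sum of L1 distances between consecutive end points
    of a polygonal curve C, given as a list of points."""
    return sum(l1(curve[i], curve[i + 1]) for i in range(len(curve) - 1))
```

Note that for a rectilinear (axis-parallel) curve, each segment contributes only an x-term or only a y-term to the total length.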
Let P be a simple rectilinear polygon. Two points in P are said to see each
other, or be visible to each other, if the line segment connecting the points lies
in P. Let p be a point somewhere inside P. A watchman route through p is
defined to be a closed curve C that passes through p such that every point in P
is seen by some point on C. The shortest watchman route through p is denoted
by SWR p . It can be shown that the shortest watchman route in a simple polygon
is a closed polygonal curve [3].
Since we are only interested in the L1 length of a polygonal curve we can
assume that the curve is rectilinear, that is, the segments of the curve are all
axis parallel. Note that the shortest rectilinear watchman route through a point
p is not necessarily unique.
For a point p in P we define four quadrants with respect to p. Those are the
regions obtained by cutting P along the two maximal axis parallel line segments
that pass through p. The four quadrants are denoted Q1 (p), Q2 (p), Q3 (p), and
Q4 (p) in anti-clockwise order from the top right quadrant to the bottom right
quadrant. We let Qi,j (p) denote the union of Qi (p) and Qj (p).
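The quadrant numbering can be sketched as follows. This simplification ignores the cutting of P itself (the true quadrants are pieces of P) and breaks ties on the axis-parallel cuts by an arbitrary convention of ours:

```python
def quadrant(p, q):
    """Index i in {1, 2, 3, 4} of the quadrant Q_i(p) containing q,
    numbered anti-clockwise from the top-right quadrant.  Points on
    the two axis-parallel cuts through p are assigned by the >= /<
    convention below; the polygon P is ignored in this sketch."""
    right = q[0] >= p[0]
    top = q[1] >= p[1]
    if top:
        return 1 if right else 2   # Q1 top-right, Q2 top-left
    return 3 if not right else 4   # Q3 bottom-left, Q4 bottom-right
```

With this convention, Qi,j(p) membership is simply `quadrant(p, q) in (i, j)`.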
Consider a reflex vertex of P. The two edges of P connecting at the reflex
vertex can each be extended inside P until the extensions reach a boundary point.
The segments thus constructed are called extensions and to each extension a
direction is associated. The direction is the same as that of the collinear polygon
edge as we follow the boundary of P in clockwise order. We use the four compass
directions north, west, south, and east to denote the direction of an extension.
Lemma 1. (Chin, Ntafos [3]) A closed curve is a watchman route for P if
and only if the curve has at least one point to the right of every extension of P.
Our objective is thus to present a competitive online strategy that enables
a robot to follow a closed curve from the start point s in P and back to s with
the curve being a watchman route for P.
236
M. Hammar, B.J. Nilsson, and M. Persson
An extension e splits P into two sets Pl and Pr with Pl to the left of e and
Pr to the right. We say a point p is to the left of e if p belongs to Pl . To the right
is defined analogously.
As a further definition we say that an extension e is a left extension with
respect to a point p, if p lies to the left of e, and an extension e dominates
another extension e′ , if all points of P to the right of e are also to the right of
e′ . By Lemma 1 we are only interested in the extensions that are left extensions
with respect to the starting point s since the other ones already have a point
(the point s) to the right of them. So without loss of clarity when we mention
extensions we will always mean extensions that are left extensions with respect
to s.
3 An Overview of GO
Consider a rectilinear polygon P that is not a priori known to the robot. Let s
be the robot’s initial position inside P. For the starting position s of the robot
we associate a point f 0 on the boundary of P that is visible from s and call f 0
the principal projection point of s. For instance, we can choose f 0 to be the first
point on the boundary that is hit by an upward ray starting at s. Let f be the
end point of the boundary that the robot sees as we scan the boundary of P in
clockwise order; see Figure 1(a). The point f is called the current frontier.
Fig. 1. Illustrating definitions.
Let C be a polygonal curve starting at s. Formally a frontier f of C is a
vertex of the visibility polygon, VP(C) of C adjacent to an edge e of VP(C)
that is not an edge of P. Extend e until it hits a point q on C and let v be the
vertex of P that is first encountered as we move along the line segment [q, f ]
from q to f . We denote the left extension with respect to s associated to the
vertex v by ext(f ); see Figures 1(b) and (c).
Deng et al. [4] introduce an online strategy called greedy-online, GO for short,
to explore a simple rectilinear polygon P in the L1 metric. If the starting point
s lies on the boundary of P, their strategy, which we call BGO, works as follows: from
the starting point, scan the boundary clockwise and establish the first frontier f.
Move to the closest point on ext(f ) and establish the next frontier. Continue in
this fashion until all of P has been seen and move back to the starting point.
Deng et al. show that a robot using strategy BGO to explore a rectilinear
polygon follows a tour with shortest length, i.e., BGO has competitive ratio one.
They also present a similar strategy, called IGO, for the case when the starting
point s lies in the interior of P. For IGO they show a competitive ratio of two,
i.e., IGO specifies a tour that is at most twice as long as the shortest watchman
route through s.
IGO shoots a ray upwards to establish a principal projection point f 0 and
then scans the boundary clockwise to obtain the frontier. Next, it proceeds exactly as BGO, moving to the closest point on the extension of the frontier,
updating the frontier, and repeating the process until all of the polygon has
been seen.
It is clear that BGO could just as well scan the boundary anti-clockwise
instead of clockwise when establishing the frontiers and still have the same competitive ratio. Hence, BGO can be seen as two strategies, one scanning clockwise
and the other anti-clockwise. We can therefore parameterize the two strategies
so that BGO(p, orient) is the strategy beginning at some point p on the boundary and scanning with orientation orient where orient is either clockwise cw or
anti-clockwise aw .
Similarly for IGO, we can not only choose to scan clockwise or anti-clockwise
for the frontier but also choose to shoot the ray giving the first principal projection point in any of the four compass directions north, west, south, or east.
Thus IGO in fact becomes eight different strategies that we can parameterize
as IGO(p, dir , orient) and the parameter dir can be one of north, south, west,
or east.
We further define partial versions of GO starting at boundary and interior
points. Strategies PBGO(p, orient, region) and PIGO(p, dir , orient, region) apply GO until either the robot has explored all of region or the robot leaves the
region region. The strategies return as result the position of the robot when it
leaves region or when region has been explored. Note that PBGO(p, orient, P)
and PIGO(p, dir , orient, P) are the same strategies as BGO(p, orient) and
IGO(p, dir , orient) respectively except that they do not move back to p when all
of P has been seen.
4 The Strategy CGO
We present a new strategy, competitive-greedy-online (CGO), that explores two
quadrants simultaneously without using up too much distance. We assume that
s lies in the interior of P since otherwise we can use BGO and achieve an
optimal route. The strategy uses two frontier points simultaneously to improve
the competitive ratio. However, to initiate the exploration, the strategy begins
by performing a scan of the polygon boundary to decide in which direction to
start the exploration. This is to minimize the loss inflicted upon us by our choice
of initial direction.
238
M. Hammar, B.J. Nilsson, and M. Persson
The initial scan works as follows: construct the visibility polygon VP(s) of
the initial point s. Consider the set of edges in VP(s) not coinciding with the
boundary of P. The end points of these edges define a set of frontier points each
having an associated left extension. Let e denote the left extension that is furthest
from s (distance being measured orthogonally to the extension), breaking ties
arbitrarily. Let l be the infinite line containing e. We rotate the view point of s so
that Q3 (s) and Q4 (s) intersect l whereas Q1 (s) and Q2 (s) do not. Hence, e is a
horizontal extension lying below s. The initial direction of exploration is upwards
through Q1 (s) and Q2 (s). The two frontier points used by the strategy are
obtained as follows: the left frontier fl is established by shooting a ray towards
the left for the left principal projection point fl0 and then scan the boundary in
clockwise direction for fl ; see Figure 1(d). The right frontier fr is established
by shooting a ray towards the right for the right principal projection point fr0
and then scan the boundary in anti-clockwise direction for fr ; see Figure 1(d).
To each frontier point we associate a left extension ext(fl ) and a right extension
ext(fr ) with respect to s.
The strategy CGO, presented in pseudo code below, makes use of three different
substrategies, CGO-0, CGO-1, and CGO-2, each of which takes care of specific
cases that can occur.
Our strategy ensures that whenever it performs one of the substrategies this
is the last time that the outermost while-loop is executed. Hence, the loop is
repeated only when the strategy does not enter any of the specified substrategies.
The loop will lead the strategy to follow a straight line and we will maintain the
invariant during the while-loop that all of the region Q3,4 (p) ∩ Q1,2 (s) has been
seen.
We distinguish four classes of extensions. A is the class of extensions e whose
defining edge is above e, B is the class of extensions e whose defining edge is
below e. Similarly, L is the class of extensions e whose defining edge is to the
left of e, and R is the class of extensions e whose defining edge is to the right of
e. For conciseness, we use C1 C2 as a shorthand for the Cartesian product C1 × C2
of the two classes C1 and C2 .
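The four classes and their pairs can be encoded directly. This is a sketch under an assumed representation of ours (an orientation flag plus the side on which the defining edge lies), not the paper's:

```python
def extension_class(horizontal, edge_side):
    """Return 'A', 'B', 'L' or 'R' for an extension.
    horizontal: True for a horizontal extension, False for a vertical one.
    edge_side: 'above'/'below' for horizontal extensions,
               'left'/'right' for vertical ones."""
    if horizontal:
        return 'A' if edge_side == 'above' else 'B'
    return 'L' if edge_side == 'left' else 'R'

def pair_class(ext_u, ext_v):
    """Class pair such as 'LR' or 'AB', i.e., an element of the
    Cartesian product used in the dispatch of strategy CGO."""
    return extension_class(*ext_u) + extension_class(*ext_v)
```

The dispatch in step 3.2.2 of CGO below then amounts to testing, e.g., `pair_class(u, v) == 'LR'` or `pair_class(u, v) in ('AR', 'LA')`.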
Fig. 2. Illustrating the key point u.
We define two key vertices u and v together with their extensions ext(u) and
ext(v) that are useful to establish the correct substrategy to enter. The vertex
u lies in Q2 (s) and v in Q1 (s). If ext(fl ) ∈ A ∪ B, then u is the vertex issuing
ext(fl ) and ext(u) = ext(fl ). If ext(fl ) ∈ L and ext(fl ) crosses the vertical line
through s, then u is the vertex issuing ext(fl ) and again ext(u) = ext(fl ). If
ext(fl ) ∈ L does not cross the vertical line through s, then u is the leftmost
vertex of the bottommost edge visible from the robot, on the boundary going
from fl clockwise until we leave Q2 (s). The extension ext(u) is the left extension
issued by u, and hence, ext(u) ∈ A; see Figures 2(a), (b), and (c). The vertex v
is defined symmetrically in Q1 (s) with respect to fr .
Each of the substrategies is presented in sequence and for each of them we
claim that if CGO executes the substrategy, then the competitive ratio of CGO is
bounded by 3/2. Let FR s be the closed route followed by strategy CGO starting
at an interior point s. Let FR s (p, q, orient) denote the subpath of FR s followed in
direction orient from point p to point q, where orient can either be cw (clockwise)
or aw (anti-clockwise). Similarly, we define the subpath SWR s (p, q, orient) of
SWR s . We denote by SP (p, q) a shortest rectilinear path from p to q inside P.
Strategy CGO
1 Establish the exploration direction by performing the initial scan of the polygon boundary
2 Establish the left and right principal projection points fl0 and fr0 for Q2 (s) and Q1 (s), respectively
3 while Q1 (s) ∪ Q2 (s) is not completely seen do
3.1     Obtain the left and right frontiers, fl and fr
3.2     if fl lies in Q2 (s) and fr lies in Q1 (s) then
3.2.1       Update the vertices u and v as described in the text
3.2.2       if (ext(u), ext(v)) ∈ LR, or (ext(u), ext(v)) ∈ AR ∪ LA and ext(u) crosses ext(v) then
3.2.2.1         Go to the closest horizontal extension
            elseif (ext(u), ext(v)) ∈ BR ∪ LB, or (ext(u), ext(v)) ∈ AR ∪ LA and ext(u) does not cross ext(v) then
3.2.2.2         Apply substrategy CGO-1
            elseif (ext(u), ext(v)) ∈ AA ∪ AB ∪ BA ∪ BB then
3.2.2.3         Apply substrategy CGO-2
            endif
        else
3.2.3       Apply substrategy CGO-0
        endif
    endwhile
4 if P is not completely visible then
4.1     Apply substrategy CGO-0
    endif
End CGO
M. Hammar, B.J. Nilsson, and M. Persson
We claim the following two simple lemmas without proof.
Lemma 2. If t is a point on some tour SWR s , then
length(SWR t ) ≤ length(SWR s ).
Lemma 3. Let S be a set of points that are enclosed by some tour SWR s , and
let S1 = S ∩ Q1,2 (s), S2 = S ∩ Q2,3 (s), S3 = S ∩ Q3,4 (s), and S4 = S ∩ Q1,4 (s).
Then
length(SWR s) ≥ 2·max_{p∈S1} ||s, p||y + 2·max_{p∈S2} ||s, p||x + 2·max_{p∈S3} ||s, p||y + 2·max_{p∈S4} ||s, p||x.
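The lower bound of Lemma 3 is straightforward to evaluate for a concrete point set. A minimal Python sketch (the coordinate conventions for the quadrant pairs Q1,2, Q2,3, Q3,4, Q1,4 are our assumption for illustration; ||·||x and ||·||y denote coordinate distances as in the text):

```python
def watchman_lower_bound(s, points):
    """Evaluate the right-hand side of Lemma 3: a closed tour enclosing all
    points must reach the extreme y-distance in the upper and lower
    half-planes and the extreme x-distance in the left and right
    half-planes, and each extreme is paid twice (out and back)."""
    sx, sy = s
    dy_up    = max((py - sy for px, py in points if py >= sy), default=0)  # S1 = S ∩ Q1,2(s)
    dx_left  = max((sx - px for px, py in points if px <= sx), default=0)  # S2 = S ∩ Q2,3(s)
    dy_down  = max((sy - py for px, py in points if py <= sy), default=0)  # S3 = S ∩ Q3,4(s)
    dx_right = max((px - sx for px, py in points if px >= sx), default=0)  # S4 = S ∩ Q1,4(s)
    return 2 * (dy_up + dx_left + dy_down + dx_right)
```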
The following proofs are very similar in structure. In each case we establish a point t that we can ensure is passed by SWR s and that either lies on the boundary of P or can be viewed as lying on the boundary of P.
We then consider the tour SWR t and compare its length to the length of FR s . By
Lemma 2 we know that length(SWR t ) ≤ length(SWR s ), hence the difference in
length between FR s and SWR t is an upper bound on the loss produced by CGO.
We start by presenting CGO-0, which does the following: let p be the current robot position. If Q1 (p) is completely seen from p, then we run PIGO(p, north, aw , P) and move back to the starting point s; otherwise Q2 (p) is completely seen from p, and we run PIGO(p, north, cw , P) and move back to the starting point s.
Lemma 4. If CGO-0 is applied, then length(FR s ) = length(SWR s ).
Proof. Assume that CGO-0 realizes that when FR s reaches the point p, then
Q1 (p) is completely seen from p. The other case, that Q2 (p) is completely seen from p, is symmetric.
Since the path FR s (s, p, orient) that the strategy has followed when it reaches
point p is a straight line, the point p is the currently topmost point of the path.
Hence, we can add a vertical spike issued by the boundary point immediately
above p, giving a new polygon P′ having p on the boundary and furthermore
with the same shortest watchman route through p as P. This means that performing strategy IGO(p, north, orient) in P yields the same result as performing
BGO(p, orient) in P′ , p being a boundary point in P′ , and orient being either
cw or aw . The tour followed is therefore a shortest watchman route through the
point p in both P′ and P.
Also, the point p lies on an extension with respect to s, by the way p is defined, and it is the closest point to s such that all of Q1 (s) has been seen by the path FR s (s, p, orient) = SP (s, p). Hence, there is a route SWR s that contains p, and by Lemma 2, length(SWR p ) ≤ length(SWR s ). The tour followed equals FR s = SP (s, p) ∪ SWR p (p, s, aw ), and we have that length(FR s ) = length(SWR p ) ≤ length(SWR s ); since FR s cannot be strictly shorter than SWR s , equality holds, which concludes the proof.
Fig. 3. Illustrating the cases in Lemma 5 when ||s, p||y + ||s, u||x ≤ ||s, v||x.
Next we present CGO-1. Let u and v be vertices as defined earlier. The
strategy does the following: if (ext(u), ext(v)) ∈ LA ∪ LB, we mirror the polygon P at the vertical line through s and swap the names of u and v. Hence,
(ext(u), ext(v)) ∈ AR ∪ BR. We continue moving upwards updating fr and v
until either all of Q1 (s) has been seen or ext(v) no longer crosses the vertical
line through s.
If all of Q1 (s) has been seen then we explore the remaining part of P using
PIGO(p, east, aw , P), where p is the current robot position.
If ext(v) no longer crosses the vertical line through s then we either need to
continue the exploration by moving to the right or return to u and explore the
remaining part of the polygon from there.
If ||s, p||y + ||s, u||x ≤ ||s, v||x we choose to return to u. If ext(u) ∈ A we
run PBGO(u, aw , P) and if ext(u) ∈ B we use PBGO(u, cw , P); see Figure 3.
Otherwise, ||s, p||y + ||s, u||x > ||s, v||x and in this case we move to the closest
point v ′ on ext(v). By definition, the extension of v is either in A or B in this
case.
If ext(v) ∈ B then v = v ′ and we choose to run PBGO(v, aw , P).
Otherwise, ext(v) ∈ A. If Q1 (v ′ ) is seen from v ′ then the entire quadrant
has been explored and we run PIGO(v ′ , east, aw , P) to explore the remainder of the polygon. If Q1 (v ′ ) is not seen from v ′ then there are still things
hidden from the robot in Q1 (v). We explore the rest of the quadrant using
PBGO(v ′ , north, aw , Q1 (v)) reaching a point q where a second decision needs to
be made.
If v is seen from the starting point and ||s, q||x ≤ ||s, v||, we go back to v and
run PBGO(v, aw , P), otherwise we run PIGO(q, east, cw , P) from the interior
point q; see Figure 5.
If v is not seen from the starting point s then we go back to v and run
PBGO(v, aw , P).
To finish the substrategy CGO-1 our last step is to return to the starting
point s.
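The branching in CGO-1 rests on simple L1 comparisons; the first one can be sketched as follows (a hypothetical helper, not part of the paper: p, u, v are the points named above, and the coordinate conventions are assumptions of this illustration):

```python
def cgo1_first_decision(s, p, u, v):
    """First decision of CGO-1: return to u when the detour back,
    ||s,p||_y + ||s,u||_x, is at most ||s,v||_x; otherwise continue
    to the right towards ext(v)."""
    dy_p = abs(p[1] - s[1])   # ||s, p||_y
    dx_u = abs(u[0] - s[0])   # ||s, u||_x
    dx_v = abs(v[0] - s[0])   # ||s, v||_x
    return 'return-to-u' if dy_p + dx_u <= dx_v else 'go-right'
```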
Lemma 5. If CGO-1 is applied, then length(FR s ) ≤ (3/2)·length(SWR s ).
Fig. 4. Illustrating the proof of Lemma 5 when ||s, p||y + ||s, u||x > ||s, v||x.
Proof. We handle each case separately. Assume for the first case that when FR s
reaches the point p, then Q1 (p) is completely visible. Hence, we have the same
situation as in the proof of Lemma 4 and using the same proof technique it
follows that length(FR s ) = length(SWR s ).
Assume for the second case that CGO-1 decides to go back to u, i.e., that
||s, p||y + ||s, u||x ≤ ||s, v||x ; see Figures 3(a) and (b). The tour followed equals
one of
FR s = SP (s, p) ∪ SP (p, u) ∪ SWR u ∪ SP (u, s)   or
FR s = SP (s, p) ∪ SP (p, u) ∪ SWR u (u, r, cw ) ∪ SP (r, s),
where r is the last intersection point of FR s with the horizontal line through s. Using that ||s, p||y + ||s, u||x ≤ ||s, v||x, it follows that the length of FR s in both cases is bounded by
length(FR s ) = length(SWR u ) + 2||s, p||y + 2||s, u||x ≤ length(SWR s ) + ||s, p||y + ||s, u||x + ||s, v||x ≤ (3/2)·length(SWR s ).
The inequalities follow from the assumption together with Lemmas 2 and 3.
Assume for the third case that CGO-1 goes to the right, i.e., that ||s, p||y +
||s, u||x > ||s, v||x . We begin by handling the different subcases that are independent of whether s sees v; see Figures 4(a) and (b). The tour followed equals
one of
FR s = SP (s, v) ∪ SWR v (v, r, aw ) ∪ SP (r, s)   or
FR s = SP (s, v ′ ) ∪ SWR v′ (v ′ , r, aw ) ∪ SP (r, s).
Since ||s, v||x = ||s, v ′ ||x, the length of FR s is in both subcases bounded by
length(FR s ) ≤ length(SWR s ) + 2||s, v||x < length(SWR s ) + ||s, p||y + ||s, u||x + ||s, v||x ≤ (3/2)·length(SWR s ).
The inequalities follow from Lemmas 2 and 3.
Fig. 5. Illustrating the proof of Lemma 5.
Assume now that CGO-1 goes to the right, i.e., that ||s, p||y + ||s, u||x >
||s, v||x and that v is indeed seen from s; see Figures 5(a) and (b). The tour
followed in this case is one of
FR s = SP (s, v) ∪ SWR v (v, q, cw ) ∪ SP (q, v) ∪ SWR v (v, r, aw ) ∪ SP (r, s)   (∗)   or
FR s = SP (s, v) ∪ SWR v ∪ SP (v, s),
where q is the resulting location after exploring Q1 (v). Here we use that v is seen from s, and hence that the initial scan guarantees that there is a point t of SWR s in Q3,4 (s) such that ||s, t||y ≥ ||s, v||x; thus the length of FR s is bounded by
length(FR s ) = length(SWR v ) + 2·min{||s, v||, ||s, q||x } ≤ length(SWR s ) + ||s, v||y + ||s, v||x + ||s, q||x < length(SWR s ) + ||s, v||y + ||s, t||y + ||s, q||x + ||s, u||x ≤ (3/2)·length(SWR s ).
On the other hand, when v is not seen from s, the tour follows the path
marked with (∗) above; see Figure 5(c). Thus, the polygon boundary obscures
the view from s to v, and hence, there is a point q ′ on the boundary such that the
shortest path from s to v ′ contains q ′ . The path our strategy follows between s
and v ′ is a shortest path, and we can therefore assume that it also passes through q ′ . We use that ||s, q ′ ||x ≤ ||s, v||x ≤ ||s, q||x to get the bound
length(FR s ) = length(SWR v ) + 2||s, q ′ ||x ≤ length(SWR s ) + ||s, v||x + ||s, q||x < length(SWR s ) + ||s, v||y + ||s, u||x + ||s, q||x ≤ (3/2)·length(SWR s ).
The inequalities above follow from Lemmas 2 and 3 and this concludes the proof.
Fig. 6. Illustrating the cases in the proof of Lemma 6.
We continue the analysis by first showing the substrategy CGO-2 and then
claiming its competitive ratio. The strategy does the following: if ||s, u||x ≤
||s, v||x then we mirror P at the vertical line through s also swapping the
names of u and v. This means that v is closer to the current point p with
respect to x-distance than u. Next, go to v ′ , the closest point on ext(v). If
ext(v) ∈ B, run PBGO(v, aw , P) since v = v ′ . If ext(v) ∈ A and Q1 (v) is seen
from v ′ then we run PIGO(v ′ , east, aw , P). If ext(v) ∈ A but Q1 (v) is not completely seen from v ′ then we explore Q1 (v) using PBGO(v ′ , north, cw , Q1 (v ′ )).
Once Q1 (v) is explored we have reached a point q and we make a second decision. If ||s, q||x ≤ ||s, v||, go back to v and run PBGO(v, aw , P), otherwise run
PIGO(q, east, cw , P). Finally go back to s.
We claim the following lemma without proof. The proof idea is the same as
that of Lemma 5.
Lemma 6. If CGO-2 is applied, then length(FR s ) ≤ (3/2)·length(SWR s ).
We have the following theorem.
Theorem 1. CGO is 3/2-competitive.
5 Conclusions
We have presented a 3/2-competitive strategy to explore a rectilinear simple
polygon in the L1 metric.
An obvious open problem is to reduce the gap between the lower bound of
5/4 and our upper bound of 3/2 for deterministic strategies. It would also be
interesting to look at variants of this problem, e.g., what if we are only interested
in finding a shortest path and not a closed tour that sees all of the polygon; see
Deng et al. [4].
References
1. M. Betke, R.L. Rivest, M. Singh. Piecemeal Learning of an Unknown Environment. Machine Learning, 18(2–3):231–254, 1995.
2. K.-F. Chan, T.W. Lam. An On-line Algorithm for Navigating in an Unknown Environment. International Journal of Computational Geometry & Applications, 3:227–244, 1993.
3. W. Chin, S. Ntafos. Optimum Watchman Routes. Information Processing Letters, 28:39–44, 1988.
4. X. Deng, T. Kameda, C.H. Papadimitriou. How to Learn an Unknown Environment I: The Rectilinear Case. Journal of the ACM, 45(2):215–245, 1998.
5. M. Hammar, B.J. Nilsson, S. Schuierer. Improved Exploration of Rectilinear Polygons. Nordic Journal of Computing, 9(1):32–53, 2002.
6. F. Hoffmann, C. Icking, R. Klein, K. Kriegel. The Polygon Exploration Problem. SIAM Journal on Computing, 31(2):577–600, 2001.
7. J.M. Kleinberg. On-line Search in a Simple Polygon. In Proc. 5th ACM-SIAM Symposium on Discrete Algorithms, pages 8–15, 1994.
8. A. Mei, Y. Igarashi. An Efficient Strategy for Robot Navigation in Unknown Environment. Information Processing Letters, 52:51–56, 1994.
9. C.H. Papadimitriou, M. Yannakakis. Shortest Paths Without a Map. Theoretical Computer Science, 84(1):127–150, 1991.
An Improved Approximation Algorithm for Computing Geometric Shortest Paths⋆

Lyudmil Aleksandrov¹, Anil Maheshwari², and Jörg-Rüdiger Sack²

¹ Bulgarian Academy of Sciences, CICT, Acad. G. Bonchev Str. Bl. 25-A, 1113 Sofia, Bulgaria
² School of Computer Science, Carleton University, Ottawa, Ontario K1S 5B6, Canada
Abstract. Consider a polyhedral surface consisting of n triangular faces
where each face has an associated positive weight. The cost of travel
through each face is the Euclidean distance traveled multiplied by the
weight of the face. We present an approximation algorithm for computing a path such that the ratio of the cost of the computed path with
respect to the cost of a shortest path is bounded by (1 + ε), for a given
0 < ε < 1. The algorithm is based on a novel way of discretizing the polyhedral surface. We employ a generic greedy approach for solving shortest
path problems in geometric graphs produced by such discretization. We
improve upon existing approximation algorithms for computing shortest
paths on polyhedral surfaces [1,4,5,10,12,15].
1 Introduction
Shortest path problems are among the fundamental problems studied in computational geometry and graph algorithms. These problems arise naturally in
application areas such as motion planning, navigation and geographical information systems. Aside from the importance of shortest paths problems in their
own right, they often appear in the solutions of other problems. Existing algorithms for many shortest path problems are quite complex in design and implementation or have very large time and space complexities. Hence they are unappealing to practitioners and pose a challenge to theoreticians. This, coupled with the fact that geographic and spatial models are approximations of reality and that high-quality paths are favored over optimal paths that are "hard" to compute, makes approximation algorithms suitable and necessary.
In this paper we present algorithms for computing approximate shortest
paths on (weighted) polyhedral surfaces. Our solutions employ the paradigm
of partitioning a continuous geometric search space into a discrete combinatorial search space. Discretization methods are natural, theoretically interesting,
and enable implementation. They transform geometric shortest path problems
into combinatorial shortest path problems in graphs. Shortest path problems in
graphs are well studied and general solutions with implementations are readily
⋆ Research supported in part by NSERC.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 246–257, 2003.
© Springer-Verlag Berlin Heidelberg 2003
available. We consider surfaces that are polyhedral 2-manifolds, whereas most of
the previous algorithms were designed to handle particular geometric instances,
such as convex polyhedra, or possibly non-convex hole-free polyhedra, etc. Also,
we allow arbitrary (positive) weights to be assigned to the faces of the domain, thus generalizing from uniform and obstacle-avoidance scenarios. While shortest path algorithms for graphs are available and applicable to the graphs generated
here, the geometric structure of shortest path problems can be exploited for the
design of more efficient algorithms.
Brief Literature Review: Shortest path problems can be categorized by
various factors which include the dimensionality of the space, the type and the
number of objects or obstacles, and the distance measure used. We discuss those
contributions which relate directly to this paper. The following table summarizes
the results for shortest path problems on polyhedral surfaces. We need a few
preliminaries in order to comprehend the table. Let P be a polyhedral surface in
3-dimensional Euclidean space consisting of n triangular faces. A path π ′ is a (1 + ǫ)-approximation of a shortest path π between two vertices of P if ||π ′ || ≤ (1 + ǫ)||π||, where ||π|| denotes the length of π and ǫ > 0. A natural generalization of
the Euclidean shortest path problems are shortest path problems set in weighted
surfaces. In this problem a triangulated polyhedral surface is given consisting of
n faces, where each face has a positive weight representing the cost of traveling
through that face. The cost of a path is defined to be the sum of Euclidean
lengths multiplied by the face weights of the sub-paths within each face traversed.
(Results on weighted shortest paths involve geometric parameters and they have
been omitted for the sake of clarity.)
Surface      Cost Metric  Approx. Ratio  Time Complexity                   Reference
Convex       Euclidean    Exact          O(n^3 log n)                      [14]
Non-convex   Euclidean    Exact          O(n^2 log n)                      [11]
Non-convex   Euclidean    Exact          O(n^2)                            [7]
Non-convex   Euclidean    Exact          O(n log^2 n)                      [9]
Convex       Euclidean    2              O(n)                              [8]
Convex       Euclidean    1 + ǫ          O(n log(1/ǫ) + 1/ǫ^3)             [3]
Convex       Euclidean    1 + ǫ          O(n/√ǫ + 1/ǫ^4)                   [2]
Non-convex   Euclidean    7(1 + ǫ)       O(n^{5/3} log^{5/3} n)            [1]
Non-convex   Euclidean    15(1 + ǫ)      O(n^{8/5} log^{8/5} n)            [1]
Non-convex   Weighted     1 + ǫ          O(n^8 log(n/ǫ))                   [12]
Non-convex   Weighted     Additive       O(n^3 log n)                      [10]
Non-convex   Weighted     1 + ǫ          O((n/ǫ^2) log n log(1/ǫ))         [4]
Non-convex   Weighted     1 + ǫ          O((n/ǫ) log(1/ǫ)(1/√ǫ + log n))   [5]
Non-convex   Weighted     1 + ǫ          O((n/ǫ) log(n/ǫ) log(1/ǫ))        [15]
Non-convex   Weighted     1 + ǫ          O((n/√ε) log(n/ε) log(1/ε))       This paper
From a practical point of view the "exact" algorithms are unappealing, since they are fairly complex, numerically unstable, and may require an exponential number of bits to perform the computation associated with the "unfolding" of faces. These drawbacks have motivated researchers to look into practical approximation algorithms. The approximation algorithms of [8,2,10,15,5,4] have been implemented.
New Results - Overview and Significance: The results of this paper are:
1. We provide a new discretization of polyhedral surfaces. For a given approximation parameter ε ∈ (0, 1), the size of the discretization for a polyhedral surface consisting of n triangular faces is O((n/√ε) log(1/ε)). We precisely evaluate the constants hidden in the big-O notation. (Section 2)
2. We define approximation graphs with nodes corresponding to the Steiner points of the discretization. We show that the distance between any pair of nodes in the approximation graph is within a factor of (1 + ε) of the cost of a shortest path in the corresponding surface. (Section 3)
3. We describe a greedy approach for solving the single source shortest path (SSSP) problem in the approximation graph and obtain an O((n/√ε) log(n/ε) log(1/ε)) time (1 + ε)-approximation algorithm for the SSSP problem on a polyhedral surface. (Section 4)
Our scheme places Steiner points, for the first time, in the interior of the faces
and not on the face boundaries. While this is somewhat counter-intuitive, we
can show that the desired approximation properties can still be proven, but now
using a much sparser mesh. (In addition this leads to algorithmic simplifications
by avoiding the construction of “cones” used in [5].) The size of the discretization
is smaller
than those previously established and the improvement is by a factor
√
of ε. A greedy approach for computing SSSP in the approximation graph has
been proposed in [15]. Edges in our approximation graphs do not correspond to
line segments as required in their algorithm, as well as their approach does not
seem to generalize to 3-dimensions. We propose an alternative greedy algorithm,
which is applicable here as well as generalizes to 3-dimensions.
Geographical information systems are an immediate application domain for shortest path problems on polyhedral surfaces and terrains. In such applications, the number of faces, n, may be huge (several million). Storage and time complexities are functions of n, and constants are critical. In terms of computational complexity our algorithm improves upon previous approximation algorithms for solving shortest path problems on polyhedral surfaces [1,4,5,10,12,15]. The running time of our algorithm improves upon the most recent algorithm of [15] by a factor of √ε. Ignoring the geometric parameters, the original algorithm of [12] has been improved by a factor of about n^7. The algorithm of [12] uses O(n^4) space; this was improved substantially in [5,15]. The discretization presented here further improves the storage requirement by reducing the number of Steiner points by a factor of √ε over [5,15].
The practicality of discretization for solving geodesic shortest path problems has been demonstrated in [10,15,16]. From a theoretical viewpoint the discretization scheme proposed here is more complex and requires very careful analysis; its implementation would, however, be similar to our previous ε-schemes [4,5]. These have been implemented and experimentally verified in [16]. More precisely, the
algorithm presented here does not require any complex data structures (just linked lists, binary search trees, and priority queues). Existing software libraries for computing shortest paths in graphs (Dijkstra's algorithm) can be used. We provide explicit calculation of key constants often hidden through the use of the big-O notation. The constant in the estimate on the total number of Steiner points (Lemma 1) is 12Γ log L, where Γ is the average of the reciprocals of the sines of the angles of the faces of P. For example, if no face of P has angles smaller than 10◦, then Γ ≤ 5. Moreover, the simplicity of our algorithm, coupled with the fact that we obtain theoretically guaranteed approximation factors, should make it a very promising candidate for the application domain. It is important to note that the edges and Steiner points of the discretization can be produced on the fly. When Dijkstra's algorithm requests edges incident to the current vertex, all incident edges (connecting Steiner points) are generated.
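The on-the-fly edge generation described above pairs naturally with Dijkstra's algorithm: the graph itself is never stored, only a neighbor callback. A sketch of this pattern (the callback interface is our illustration, not the paper's API):

```python
import heapq

def lazy_dijkstra(source, neighbors):
    """Dijkstra's algorithm with edges generated on the fly.

    `neighbors(u)` yields (v, cost) pairs and is invoked only when u is
    settled, so the discretization graph need not be stored explicitly."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in neighbors(u):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```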
2 Preliminaries and Discretization
Let P be a triangulated polyhedral surface in the 3-dimensional Euclidean space.
P can be any polyhedral 2-manifold. We do not assume any additional geometrical or topological properties such as convexity, being a terrain, or absence of
holes, etc. Assume that P consists of n triangular faces denoted by t1 , . . . , tn .
Positive weights w1 , . . . , wn are associated with triangles t1 , . . . , tn representing
the cost of traveling inside them. The cost of traveling along an edge is the minimum of the weights of the triangles incident to that edge. Edges are assumed
to be part of the triangle, from which they inherit their weight. Any continuous (rectifiable) curve lying in P is called a path. The cost of a path π is defined by ||π|| = Σ_{i=1}^{n} wi·|πi|, where |πi| denotes the Euclidean length of the intersection of π with triangle ti, i.e., πi = π ∩ ti. Given two distinct points u and v in P
a minimum cost path π(u, v) joining u and v is called a geodesic path. Without
loss of generality we may assume that u and v lie on a boundary of a face. In this
setting, it is well known that geodesic paths are simple (non self-intersecting)
and consist of a sequence of segments, whose endpoints are on the edges of P .
The intersection of a geodesic path with the interior of faces or edges is a set of
disjoint segments. More precisely, each segment on a geodesic path is of one of
the following two types: 1) face-crossing – a segment which crosses a face joining two points on its boundary; 2) edge-using – a sub-segment of an edge. We
define linear paths to be simple paths consisting of face-crossing and edge-using
segments exclusively. Thus, any geodesic path is a linear path. A linear path
π(u, v) is represented as a sequence of its segments {s1 , . . . , sl } or equivalently
as a sequence of points a0 , . . . , al+1 lying on the edges, that are endpoints of
these segments, i.e., si = (ai−1 , ai ), u = a0 , and v = al+1 . Points ai that are not
vertices of P are called bending points of the path. Geodesic paths satisfy Snell’s
law of refraction at each of their bending points (see [12] for details).
In the following we introduce a function d(x) defined as the minimum Euclidean distance from a point x ∈ P to the edges around x. The distance d(x) is
a lower bound on the length of a face-crossing segment incident to x and plays an essential role in our constructions.
Definition 1. Given a point x ∈ P let E(x) be the set of edges of triangles
incident to x minus the edges incident to x. The distance d(x) is defined as the
minimum Euclidean distance from x to the edges in E(x).
Throughout the paper, ε is a real number in (0, 1). Next, we define a set of points on P, called Steiner points, that together with the vertices of P constitute a (1 + ε)-approximation mesh for the set of linear paths on P. That is, we define a graph
Gε whose set of nodes consists of the vertices of P and the Steiner points. The
edges of Gε correspond to local shortest paths between their endpoints and have
cost equal to the cost of their corresponding path. Then we show how the graph
Gε can be used to approximate geodesic paths between vertices of P . Using
Definition 1 above, for each vertex v of P we define a weighted radius
r(v) =
wmin (v)
d(v),
7wmax (v)
(1)
where wmax (v) and wmin (v) are the maximum and the minimum weights of the
faces incident to v. By using the weighted radius r(v) for each face incident to
v we define a “small” isosceles triangle with two sides of length εr(v) incident
to v. These triangles around v form a star-shaped polygon S(v), which we call a vertex-vicinity of v.
In all previous approximation schemes Steiner points have been placed on the edges of P. Here we place Steiner points inside the faces of P. In this way we reduce the total number of Steiner points by a factor of √ε. We will need to
show that the (1 + ε)-approximation property of the resulting mesh is preserved.
Let triangle t be a face of P . Steiner points inside t are placed along the three
bisectors of t as follows. Let v be a vertex of t and ℓ be the bisector of the angle
α of t at v. We define a set of Steiner points p1 , . . . , pk on ℓ by
|pi−1 pi | = sin(α/2) · √(ε/2) · |vpi−1 |,   i = 1, . . . , k,    (2)
where p0 is the point on ℓ and on the boundary of the vertex vicinity S(v)
(Figure 1). The next lemma establishes estimates on the number of Steiner
points inserted on a particular bisector and on their total number.
Lemma 1. (a) The number of Steiner points inserted on a bisector ℓ of an angle α at a vertex v is bounded by C(ℓ)·(1/√ε)·log₂(2/ε), where the constant C(ℓ) < (4/sin α)·log₂(|ℓ|/(r(v) cos(α/2))).
(b) The total number of Steiner points on P is less than
C(P)·(n/√ε)·log₂(2/ε),    (3)
where C(P) < 12Γ log L, L is the maximum of the ratios |ℓ(v)|/(r(v) cos(α/2)), and Γ is the average of the reciprocals of the sines of the angles on P, i.e., Γ = (1/(3n))·Σ_{i=1}^{3n} (1/sin αi).
Proof: We estimate the number of Steiner points on a bisector ℓ of an angle α at a vertex v. From (2) it follows that |vpi | = λ^i · ε·r(v)·cos(α/2), where λ = 1 + √(ε/2)·sin(α/2). Therefore the number of Steiner points on ℓ is
k ≤ log_λ (|ℓ|/(ε·r(v)·cos(α/2))) = [ln(|ℓ|/(2r(v)·cos(α/2))) + ln(2/ε)] / ln(1 + √(ε/2)·sin(α/2)) ≤ (4/(sin α · √ε)) · log₂(|ℓ|/(r(v)·cos(α/2))) · log₂(2/ε).
This proves (a). Estimate (b) is obtained by summing up (a) over all bisectors on P. ⊓⊔
Fig. 1. (Left) Steiner points inserted on a bisector ℓ. (Right) Illustration of the proof of Lemma 2: the sines of the angles ∠pi x1 pi+1 and ∠pi x2 pi+1 are at most √(ε/2), implying |x1 pi | + |pi x2 | ≤ (1 + ε/2)|x1 x2 |.
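Recurrence (2) spaces the Steiner points in geometric progression along the bisector, |vpᵢ| = λ·|vpᵢ₋₁| with λ = 1 + sin(α/2)·√(ε/2). A sketch generating the distances |vpᵢ| (we take |vp₀| = ε·r(v)·cos(α/2) from the vertex-vicinity definition; treat the exact constants as assumptions of this illustration):

```python
import math

def steiner_distances(eps, alpha, r_v, ell_len):
    """Distances |v p_i| of the Steiner points along a bisector of
    length ell_len for an angle alpha at a vertex v with weighted
    radius r_v, following recurrence (2)."""
    lam = 1.0 + math.sin(alpha / 2) * math.sqrt(eps / 2)  # growth factor λ
    d = eps * r_v * math.cos(alpha / 2)                   # |v p_0|: vertex-vicinity boundary
    out = []
    while d <= ell_len:
        out.append(d)
        d *= lam  # |p_{i-1} p_i| = (λ - 1) · |v p_{i-1}|
    return out
```

The length of the returned list grows like (1/√ε)·log(1/ε), in line with Lemma 1(a).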
The set of Steiner points partitions bisectors into intervals, that we call
Steiner intervals. The following lemma establishes two important properties of
Steiner intervals (Figure 1).
Lemma 2. (a) Let ℓ be the bisector of the angle formed by edges e1 and e2 of
P . If (pi , pi+1 ) is a Steiner interval on ℓ and x is a point on e1 or e2 , then
sin(∠pi x pi+1 ) ≤ √(ε/2).    (4)
(b) Let x1 and x2 be points on e1 and e2 and outside the vertex vicinity of the
vertex incident to e1 and e2 . If p is the Steiner point closest to the intersection
between the segment (x1 , x2 ) and ℓ, then
|x1 p| + |px2 | ≤ (1 + ε/2)·|x1 x2 |.    (5)
Proof: Statement (a) follows easily from the definition of the Steiner points. Here we prove (b). Denote by θ, θ1 , and θ2 the angles of the triangle px1 x2 at p, x1 , and x2 , respectively. From (a) and ε ≤ 1 it follows that θ ≥ π/2, and we have
|x1 p| + |px2 | = (1 + 2 sin(θ1 /2) sin(θ2 /2)/sin(θ/2)) · |x1 x2 | = (1 + sin θ1 sin θ2 /(2 sin(θ/2) cos(θ1 /2) cos(θ2 /2))) · |x1 x2 | ≤ (1 + ε/(4 sin²(θ/2))) · |x1 x2 | ≤ (1 + ε/2) · |x1 x2 |. ⊓⊔

3 Discrete Paths
Next, we define a graph Gε = (V (Gε ), E(Gε )). The set of nodes V (Gε ) consists
of the set of vertices of P and the set of Steiner points. The set of edges E(Gε ) is
defined as follows. A node that is a vertex of P is connected to all Steiner points
on bisectors in the faces incident to this vertex. The cost of these edges equals
the cost of the shortest path between its endpoints restricted to lie inside the
triangle containing them. These shortest paths consist either of a single segment
joining the vertex and the corresponding Steiner point or of two segments the
first of which follows one of the edges incident to the vertex. The rest of the
edges of Gε join pairs of Steiner points lying on neighboring bisectors as follows.
Let e be an edge of P . In general, there are four bisectors incident to e. We define
graph edges between pairs of nodes (Steiner points) on these four bisectors. We
refer to all these edges as edges of Gε crossing the edge e of P . Let (p, q) be an
edge between Steiner points p and q crossing e. The cost of (p, q) is defined as the
cost of the shortest path between p and q restricted to lie inside the quadrilateral
formed by the two triangles around e, that is, ||pq|| = min_{x,y∈e} (||px|| + ||xy|| + ||yq||).
(Note that we do not need edges in Gε between pairs of Steiner points for which
the local shortest paths do not intersect e.) Paths in Gε are called discrete
paths. The cost of a discrete path π is the sum of the costs of its edges and is denoted by ||π||. Note that if we replace each of the edges in a discrete path with
the corresponding segments (at most three) forming the shortest path used to
compute its cost we obtain a path on P of the same cost.
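The cost of an edge (p, q) of Gε crossing an edge e of P is a two-point minimization over e. A brute-force sketch that approximates min_{x,y∈e}(||px|| + ||xy|| + ||yq||) by sampling points on e (sampling is purely an illustration device, not the method used in the paper; the face weights are passed in explicitly):

```python
import math

def crossing_edge_cost(p, q, e0, e1, w_p, w_e, w_q, samples=100):
    """Approximate the cost of the G_eps edge (p, q) crossing e = (e0, e1):
    travel p -> x in p's face (weight w_p), x -> y along e (weight w_e),
    then y -> q in q's face (weight w_q)."""
    pts = [tuple(e0[k] + (i / samples) * (e1[k] - e0[k]) for k in range(3))
           for i in range(samples + 1)]
    best = float('inf')
    for i, x in enumerate(pts):
        for y in pts[i:]:  # |xy| is symmetric, so unordered pairs suffice
            c = (w_p * math.dist(p, x) + w_e * math.dist(x, y)
                 + w_q * math.dist(y, q))
            best = min(best, c)
    return best
```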
Theorem 1. Let π̃(v0 , v) be a linear path joining two different vertices v0 and v on P. There exists a discrete path π(v0 , v) such that ||π|| ≤ (1 + ε)·||π̃||.
Proof: First, we discuss the structure of linear paths. Following from the definition, a linear path π̃(v0 , v) consists of face-crossing and edge-using segments
and is determined by the sequence of their endpoints, called bending points,
which are located on the edges of P . Following the path from v0 and on, let a0
be the last bending point on π̃ that is inside the vertex vicinity S(v0 ). Next,
let b1 be the first bending point after a0 that is in a vertex vicinity, say S(v1 ),
and let a1 be the last bending point in S(v1 ). Continuing in this way, we define
a sequence of vertices of v0 , v1 , . . . , vl = v and a sequence of bending points
a0 , b1 , a1 , . . . , al−1 , bl on π̃, such that for i = 0, . . . , l, points bi , ai are in S(vi )
(we assume b0 = v0 , al = v). Furthermore, portions of π̃ between ai and bi do
not intersect vertex vicinities. Thereby, the path π̃ is partitioned into portions
π̃(v0 , a0 ), π̃(a0 , b1 ), π̃(b1 , a1 ), . . . , π̃(bl , v).
(6)
Portions π̃(ai , bi+1 ) for i = 0, . . . , l−1 are called between-vertex-vicinities portions, and portions π̃(bi , ai ) for i = 0, . . . , l (b0 = v0 ) are called vertex-vicinities portions. Consider a between-vertex-vicinities portion π̃(ai , bi+1 ) for some 0 ≤
sequence of inner bending points of π̃(ai , bi+1 ). Using triangle inequality and the
definition of vertex vicinities (1) we obtain
||π̃ ′ (vi , vi+1 )|| ≤ ||π̃(ai , bi+1 )|| + ||vi ai || + ||bi+1 vi+1 || ≤ ||π̃(ai , bi+1 )|| + (ε/7)·(wmin (vi )·d(vi ) + wmin (vi+1 )·d(vi+1 )).    (7)
An Improved Approximation Algorithm
253
Changing all between vertex vicinities portions in this way we obtain a linear path π̃ ′ (v0 , v) = {π̃ ′ (v0 , v1 ), π̃ ′ (v1 , v2 ), . . . , π̃ ′ (vl−1 , v)}, consisting of between
vertex vicinities portions only.
Next, we approximate each of these portions by a discrete path. Consider a
portion π̃i′ = π̃ ′ (vi , vi+1 ) for some 0 ≤ i < l and let sj = (xj−1 , xj ), j = 1, . . . , ν
be the segments forming this portion (x0 = vi, xν = vi+1). Segments sj are face-crossing and edge-using segments. Indeed, there are no consecutive edge-using
segments. Let sj be a face-crossing segment. Then sj intersects the bisector
ℓj of the angle formed by the edges of P containing the end-points of sj . We
define pj to be the closest Steiner point to the intersection between sj and ℓj .
Now we replace each of the face-crossing segments sj of π̃i′ by the two-segment path xj−1, pj, xj and denote the obtained path by π̃i′′. From (5) it follows that ‖π̃i′′‖ ≤ (1 + ε/2)‖π̃i′‖. The sequence of bending points of π̃i′′ contains as a subsequence the Steiner points pj1, . . . , pjν1 (ν1 ≤ ν) corresponding to the face-crossing segments of π̃i′. Note that the pairs (vi, pj1) and (pjν1, vi+1) are adjacent in
Gε . Furthermore, between any two consecutive Steiner points pjµ , pjµ+1 there
is at most one edge-using segment and, according to our definition of the graph
Gε , they are connected in Gε . The cost of each edge (pjµ , pjµ+1 ) is at most the
cost of the portions of π̃i′′ from pjµ to pjµ+1 . Therefore, the sequence of nodes
{vi , pj1 , . . . , pjν1 , vi+1 } defines a discrete path π(vi , vi+1 ) such that
‖π(vi, vi+1)‖ ≤ ‖π̃i′′‖ ≤ (1 + ε/2)‖π̃′(vi, vi+1)‖.   (8)
We combine discrete paths π(v0 , v1 ), . . . , π(vl−1 , v) and obtain a discrete path
π(v0 , v) from v0 to v. We complete the proof by estimating the cost of this path.
We denote wmin(vi)d(vi) + wmin(vi+1)d(vi+1) by κi and use (8) and (7), obtaining
‖π(v0, v)‖ = Σ_{i=0}^{l−1} ‖π(vi, vi+1)‖ ≤ (1 + ε/2) Σ_{i=0}^{l−1} ‖π̃′(vi, vi+1)‖ ≤ (1 + ε/2) Σ_{i=0}^{l−1} (‖π̃(ai, bi+1)‖ + εκi/7) ≤ (1 + ε/2)‖π̃(v0, v)‖ + (3ε/14) Σ_{i=0}^{l−1} κi.   (9)
It remains to estimate the sum Σ_{i=0}^{l−1} κi appearing above. From the definitions of d(·), (6), and (1) it follows that κi ≤ 2‖π̃(ai, bi+1)‖ + ‖vi ai‖ + ‖bi+1 vi+1‖ ≤ 2‖π̃(ai, bi+1)‖ + κi/7. Thus κi ≤ (7/3)‖π̃(ai, bi+1)‖ and substituting this in (9) we obtain the desired estimate ‖π(v0, v)‖ ≤ (1 + ε)‖π̃(v0, v)‖.
⊓⊔
4 Algorithms
In this section we discuss algorithms for solving the Single Source Shortest Paths
(SSSP) problem in approximation graphs Gε. One can straightforwardly apply Dijkstra's algorithm; implemented with Fibonacci heaps, it solves the SSSP problem in O(|Eε| + |Vε| log |Vε|) time. By Lemma 1, |Vε| = O((n/√ε) log(1/ε)), and by the definition of the edges, |Eε| = O((n/ε) log²(1/ε)). Thus it follows that the SSSP
254
L. Aleksandrov, A. Maheshwari, and J.-R. Sack
problem can be solved by Dijkstra's algorithm in O((n/ε) log(n/ε) log(1/ε)) time. This already matches the best previously known bound [15]. In the remainder of
this section we show how geometric properties of our model can be used to
obtain a more efficient algorithm for SSSP in the corresponding approximation
graph. More precisely, we present an algorithm that runs in O(|Vε| log |Vε|) = O((n/√ε) log(n/ε) log(1/ε)) time.
First, we discuss the general structure of our algorithm. Let G(V, E) be a
directed graph with positive costs (lengths) assigned to its edges and s be a
fixed vertex of G. The SSSP problem is to find shortest paths from s to any
other vertex of G. The standard greedy approach for solving the SSSP problem
works as follows: a subset of vertices S to which the shortest paths have already been found, and a set of edges E(S) connecting S with Sᵃ ⊂ V \ S, are maintained. The set Sᵃ consists of the vertices not in S but adjacent to S. In each iteration
an optimal edge e(S) = (u, v) in E(S) is selected. Its target v is added to S
and E(S) is updated correspondingly. An edge e = e(S) is optimal for S if it
minimizes the value δ(u) + c(e), where δ(u) is the distance from s to u and c(e)
is the cost of e. The correctness of this approach follows from the fact that when
e = (u, v) is optimal the distance δ(v) is equal to δ(u) + c(e).
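As a concrete reference point, the standard greedy scheme described above can be sketched as follows. This is a minimal Dijkstra-style implementation; the adjacency-list representation and the names are our assumptions, not notation from the paper:

```python
import heapq

def sssp(graph, s):
    """Greedy SSSP: repeatedly settle the target v of an optimal edge
    e(S) = (u, v), i.e., one minimizing delta(u) + c(e).
    graph: dict mapping each vertex to a list of (neighbor, cost) pairs."""
    delta = {s: 0.0}              # distances of settled/queued vertices
    pq = [(0.0, s)]               # candidate edges keyed by delta(u) + c(e)
    while pq:
        d, v = heapq.heappop(pq)
        if d > delta[v]:
            continue              # stale entry: v was settled with a smaller key
        for w, c in graph[v]:
            if d + c < delta.get(w, float("inf")):
                delta[w] = d + c
                heapq.heappush(pq, (d + c, w))
    return delta
```

With Fibonacci heaps in place of the binary heap, this runs in the O(|E| + |V| log |V|) time stated above.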
Different strategies for maintaining information about E(S) and finding an
optimal edge e(S) in each iteration result in different algorithms for computing
SSSP. For example, Dijkstra’s algorithm maintains only a subset Q(S) of E(S),
which however always contains an optimal edge. Namely, for each vertex v in Sᵃ
Dijkstra’s algorithm keeps in Q(S) one edge only – the one that ends the shortest
path to v using vertices in S only. Alternatively, one may maintain a subset Q(S)
of E(S) containing one edge per vertex u ∈ S. The target vertex of this edge
is called the representative of u and is denoted by ρ(u). The vertex u itself is called the predecessor of its representative. The representative ρ(u) is defined to be the
target of the minimum cost edge in the propagation set I(u) of u, where I(u) ⊂
E(S) consists of all edges (u, v) such that δ(u) + c(u, v) ≤ δ(u′ ) + c(u′ , v) for any
other vertex u′ ∈ S (ties are broken arbitrarily). The union of propagation sets
forms a subset Q(S) of E(S), that always contains an optimal edge. Propagation
sets I(u) for u ∈ S form a partition of Q(S), which we call a propagation diagram and denote by I(S). A similar scheme has been used in [15].
A possible implementation of this alternative strategy is to maintain the set
of representatives R ⊂ Sᵃ organized in a priority queue, where the key of a vertex
ρ(u) in R is defined to be δ(u) + c(u, ρ(u)). Observe that the edge corresponding
to the minimum in R is an optimal edge for S. In each iteration the minimum
key node v in R is selected and the following three steps are implemented:
Step 1. The vertex v is moved from R into S. Then the propagation set I(v) is
computed and the propagation diagram I(S) is updated accordingly.
Step 2. The representative ρ(v) of v and a new representative ρ(u) for the
predecessor u of v are computed.
Step 3. The new representatives ρ(u) and ρ(v) are either inserted into R together
with their corresponding keys, or (if they are already in R) their keys are updated
and the decrease key operation is executed in R if necessary.
An Improved Approximation Algorithm
255
Clearly, this leads to a correct algorithm for solving the SSSP problem in G. The
total time for the priority queue operations if R is implemented with Fibonacci
heaps is O(|V | log |V |). Therefore the efficiency of this strategy depends on the
maintenance of the propagation diagrams, the complexity of the propagation sets, and efficient updates of the new representatives.
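Steps 1–3 can be pictured with the following simplified sketch, in which each settled vertex keeps a single representative recomputed by a brute-force scan of its unsettled neighbors; the paper's contribution lies in replacing these scans with logarithmic-time propagation-diagram updates. Names and the graph encoding are our assumptions:

```python
import heapq

def sssp_representatives(graph, s):
    """Alternative greedy strategy: each settled vertex u keeps one
    candidate edge to its representative rho(u), the cheapest target
    among u's unsettled neighbors (brute force here, for illustration)."""
    settled = {s: 0.0}
    pq = []

    def push_representative(u):
        # Pick the cheapest edge from u into V \ S, if any remains.
        best = None
        for v, c in graph[u]:
            if v not in settled and (best is None or settled[u] + c < best[0]):
                best = (settled[u] + c, u, v)
        if best:
            heapq.heappush(pq, best)

    push_representative(s)
    while pq:
        key, u, v = heapq.heappop(pq)
        if v in settled:
            push_representative(u)   # stale: recompute u's representative
            continue
        settled[v] = key             # Step 1: move v into S
        push_representative(v)       # Step 2: representative of v ...
        push_representative(u)       # ... and a new one for predecessor u
    return settled
```

A stale minimum is always at least as small as the refreshed key it triggers, so popping stale entries and refreshing preserves the greedy invariant.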
Our approach is as follows. We partition the set of edges E(S) into groups,
so that the propagation sets and the corresponding propagation diagrams when
restricted to a fixed group become simple and allow efficient updates. Then for
each vertex u in S we will keep multiple representatives in R, one for each group in which edges incident to u participate. As a result, a vertex in Sᵃ will
eventually have multiple predecessors. As we describe below, the number of
groups where u can participate will be O(1). We will be able to compute new
representatives in O(1) time and update propagation diagrams in logarithmic
time in our approximation graphs Gε . Next, we present some details and state
the complexity of the resulting algorithm.
The edges of the approximation graph Gε were defined to join pairs of nodes
(Steiner points) lying on neighboring bisectors, where two bisectors are neighbors
if the angles they split share an edge of P. Since the polyhedral surface P is triangulated, a fixed bisector may have at most six neighbors. We can partition
the set of edges of Gε into groups E(ℓ, ℓ1 ) corresponding to pairs of neighboring
bisectors ℓ and ℓ1. For a node u on a bisector ℓ we maintain one representative ρ(u, ℓ1) for each bisector ℓ1 neighboring ℓ. The representative ρ(u, ℓ1) is defined
to be the target of the minimum cost edge in the propagation set I(u; ℓ, ℓ1 ),
consisting of the edges (u, v) in E(ℓ, ℓ1 ), such that δ(u)+c(u, v) ≤ δ(u′ )+c(u′ , v)
for any node u′ ∈ ℓ ∩ S. A node on ℓ with a non-empty propagation set on ℓ1
will be called active for E(ℓ, ℓ1 ).
Consider now an iteration of our greedy algorithm. Let v be the node produced by the Extract-min operation in the priority queue R of representatives. Denote the set of predecessors of v by R−1(v). Our task is to compute
new representatives for v and for each of the predecessors u ∈ R−1 (v). Consider
first the case when v is a vertex of the polyhedral surface P . We assume that the
edges incident to a vertex v have been sorted with respect to their cost and when
a new representative for v is required, we simply report the target of the smallest-cost edge joining v with Sᵃ. Thereby the new representative for a node that is
a vertex of P can be computed in constant time. The total number of edges
incident to vertices of P is O((n/√ε) log(1/ε)) and their sorting in a preprocessing step takes O((n/√ε) log²(1/ε)) time. Consider now the case when v is a node on a bisector,
say ℓ. An efficient computation of representatives in this case is based on the
following two lemmas.
Lemma 3. The propagation set I(v; ℓ, ℓ1 ) for an active node v is characterized
by an interval (x1 , x2 ) on ℓ1 , i.e., it consists of all edges in E(ℓ, ℓ1 ) whose targets
belong to (x1 , x2 ). Furthermore, the function dist(v, x), measuring the cost of
the shortest path from v to x restricted to lie in the union of the two triangles
containing ℓ and ℓ1 , is convex in (x1 , x2 ).
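The convexity claim in Lemma 3 is what makes the per-interval minimization cheap: a convex function can be minimized by a logarithmic search. A generic sketch (continuous ternary search, our illustration; the discrete analogue over the sorted Steiner points of ℓ1 works the same way and yields the O(log 1/ε) bound used later):

```python
def argmin_convex(f, lo, hi, iters=100):
    """Ternary search: minimize a convex function f on [lo, hi].
    Each iteration shrinks the interval by a factor of 2/3."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)
```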
Lemma 4. Let v1 , . . . , vk be the active vertices for E(ℓ, ℓ1 ). The propagation
diagram I(ℓ, ℓ1 ) = I(v1 , . . . , vk ) is characterized by k intervals. Updating the
diagram I(v1, . . . , vk) to the propagation diagram I(v1, . . . , vk, v), where v is a new active node on ℓ, takes O(log k) time.
Thus to compute a new representative of v on a neighboring bisector ℓ1 we update
the propagation diagram I(ℓ, ℓ1 ). Then we consider the interval characterizing
the propagation set I(v; ℓ, ℓ1 ) and select the minimum cost edge whose target is in
that interval and in S a . Assume that nodes on ℓ1 currently in S a are maintained
in a doubly linked list with their positions on ℓ1 . Using the convexity of the
function dist(v, x), this selection can be done in time logarithmic in the number of these nodes, i.e., in O(log(1/ε)) time. There are at most six new representatives of
v corresponding to bisectors around ℓ to be computed. Thus the total time for
updates related to v is O(log(1/ε)). The update of the representative for a node
u ∈ R−1 (v) on ℓ takes constant time since no change in the propagation set
I(u; ·, ℓ) occurred and the new representative of u is a neighbor to the current
one in the list of nodes in Sᵃ on ℓ. The set of predecessors R−1(v) contains at
most six vertices and thus their representatives are updated in constant time.
So computing representatives in an iteration takes O(log(1/ε)) time, and O(|Vε| log(1/ε)) in total. The following theorem summarizes the result of this section.
Theorem 2. The SSSP problem in the approximation graph Gε for a polyhedral
surface P can be solved in O((n/√ε) log(n/ε) log(1/ε)) time.
In the following theorem we summarize the main result of this paper. Starting from a vertex v0, our algorithm solves the SSSP problem in the graph Gε and constructs a shortest paths tree rooted at v0. According to Theorem 1, the output distances from v0 to the other vertices of P are within a factor of 1 + ε of the cost of the
shortest paths. Using the definition of the edges of Gε an approximate shortest
path between a pair of vertices can be output in time proportional to the number
of segments in this path. The approximate shortest paths tree rooted at v0 and
containing all Steiner points and vertices of P can be output in O(|Vε |) time.
Thus we have established the following theorem.
Theorem 3. Let P be a weighted polyhedral surface with n triangular faces and
ε ∈ (0, 1). Shortest paths from a vertex v0 to all other vertices of P can be
approximated within a factor of (1 + ε) in O((n/√ε) log(n/ε) log(1/ε)) time.
Extensions: We briefly comment on how our approach can be applied to
approximate shortest paths in weighted polyhedral domains and formulate the
corresponding result. In 3-dimensional space most shortest path problems are
difficult. Given a set of pairwise disjoint polyhedra in 3D and two points s
and t, the Euclidean 3-D Shortest Path Problem is to compute a shortest path
between s and t that avoids the interiors of polyhedra seen as obstacles. Canny
and Reif have shown that this problem is NP-hard [6] (even for the case of
axis-parallel triangles in 3D). Papadimitriou [13] gave the first fully polynomial
(1 + ǫ)-approximation algorithm for the 3D problem. There are numerous other
results on this problem, but due to the space constraints we omit their discussion
and refer the reader to the most recent work [5] for a literature review.
Let P be a tetrahedralized polyhedral domain in the 3-dimensional Euclidean
space, consisting of n tetrahedra. Assume that positive weights are assigned to
the tetrahedra of P and that the cost of traveling inside a tetrahedron t is equal to
the Euclidean distance traveled multiplied by the weight of t. Using the approach
of this paper we are able to approximate shortest paths in P within (1+ε) factor
as follows: Discretization in this case is done by inserting Steiner points in the
bisectors of the dihedral angles of the tetrahedra of P . The total number of
Steiner points in this case is O((n/ε²) log(1/ε)). The construction of Steiner points and
the proof of the approximation properties of the resulting graph Gε in this case
involves a more elaborate analysis because of the presence of edge vicinities – small spindle-like regions around edges – in addition to vertex vicinities. Nevertheless,
an analogue to Theorem 1 holds. SSSP in the graph Gε can be computed by
following a greedy approach like that in Section 4.
References
1. K.R. Varadarajan and P.K. Agarwal, “Approximating Shortest Paths on a Nonconvex Polyhedron”, SIAM J. Comput. 30(4):1321–1340, 2000.
2. P.K. Agarwal, S. Har-Peled, and M.Karia, “Computing approximate shortest paths
on convex polytopes”, Algorithmica 33:227–242, 2002.
3. P.K. Agarwal et al., “Approximating Shortest Paths on a Convex Polytope in
Three Dimensions”, Jl. ACM 44:567–584, 1997.
4. L. Aleksandrov, M. Lanthier, A. Maheshwari, J.-R. Sack, “An ε-approximation
algorithm for weighted shortest paths”, SWAT, LNCS 1432:11–22, 1998.
5. L. Aleksandrov, A. Maheshwari, and J.-R. Sack, ”Approximation Algorithms for
Geometric Shortest Path Problems”, 32nd STOC, 2000, pp. 286–295.
6. J. Canny and J. H. Reif, “New Lower Bound Techniques for Robot Motion Planning
Problems”, 28th FOCS, 1987, pp. 49–60.
7. J. Chen and Y. Han, “Shortest Paths on a Polyhedron”, 6th ACM Symposium on Computational Geometry, 1990, pp. 360–369. Also in Internat. J. Comput. Geom. Appl., 6:127–144, 1996.
8. J. Hershberger and S. Suri, “Practical Methods for Approximating Shortest Paths
on a Convex Polytope in ℜ3”, 6th SODA, 1995, pp. 447–456.
9. S. Kapoor, “Efficient Computation of Geodesic Shortest Paths”, 31st STOC, 1999.
10. M. Lanthier, A. Maheshwari and J.-R. Sack, “Approximating Weighted Shortest
Paths on Polyhedral Surfaces”, Algorithmica 30(4): 527–562 (2001).
11. J.S.B. Mitchell, D.M. Mount and C.H. Papadimitriou, “The Discrete Geodesic
Problem”, SIAM Jl. Computing, 16:647–668, August 1987.
12. J.S.B. Mitchell and C.H. Papadimitriou, “The Weighted Region Problem: Finding
Shortest Paths Through a Weighted Planar Subdivision”, JACM, 38:18–73, 1991.
13. C.H. Papadimitriou, “An Algorithm for Shortest Path Motion in Three Dimensions”, IPL, 20, 1985, pp. 259–263.
14. M. Sharir and A. Schorr, “On Shortest Paths in Polyhedral Spaces”, SIAM J. of
Comp., 15, 1986, pp. 193–215.
15. Z. Sun and J. Reif, “BUSHWACK: An approximation algorithm for minimal paths
through pseudo-Euclidean spaces”, 12th ISAAC, LNCS 2223:160–171, 2001.
16. M. Ziegelmann, Constrained Shortest Paths and Related Problems, Ph.D. thesis, Universität des Saarlandes (Max-Planck-Institut für Informatik), 2001.
Adaptive and Compact Discretization for
Weighted Region Optimal Path Finding
Zheng Sun and John H. Reif
Department of Computer Science, Duke University, Durham, NC 27708, USA
{sunz,reif}@cs.duke.edu
Abstract. This paper presents several results on the weighted region
optimal path problem. An often-used approach to approximately solve
this problem is to apply a discrete search algorithm to a graph Gǫ generated by a discretization of the problem; this graph is guaranteed to contain
an ǫ-approximation of an optimal path between given source and destination points. We first provide a discretization scheme such that the
size of Gǫ does not depend on the ratio between the maximum and minimum unit weights. This leads to the first ǫ-approximation algorithm
whose complexity is not dependent on the unit weight ratio. We also
introduce an empirical method, called adaptive discretization method,
that improves the performance of the approximation algorithms by placing discretization points densely only in areas that may contain optimal
paths. BUSHWHACK is a discrete search algorithm used for finding optimal paths in Gǫ . We added two heuristics to BUSHWHACK to improve
its performance and scalability.
1 Introduction
In the past two decades, geometric optimal path problems have been extensively studied (see [1] for a review). These problems have a wide range of
applications in robotics and geographical information systems.
In this paper we study the path planning problem for a point robot in a
2D space consisting of n triangular regions, each of which is associated with a
distinct unit weight. Such a space can be used to model an area consisting of
different geographical features, such as deserts, forests, grasslands, and lakes, in
which the traveling costs for the robot differ. The goal is to find an optimal path (a path with the minimum weighted length) between given source and destination points s and t.
Unlike the un-weighted 2D optimal path problem, which can be solved in
O(n log n) time, this problem is believed to be very difficult. Much of the effort
has been focused on ǫ-approximation algorithms that are guaranteed to find ǫ-good approximate optimal paths (see [2,3,4,5,6]). For any two points s and t in the space, we say that a path p connecting s and t is an ǫ-good approximate optimal path if ‖p‖ < (1 + ǫ)‖popt(s, t)‖, where popt(s, t) represents an optimal path from s to t and ‖·‖ represents the weighted length, or cost, of a path. Equivalently, we say that p is an ǫ-approximation of popt(s, t).
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 258–270, 2003.
c Springer-Verlag Berlin Heidelberg 2003
Adaptive and Compact Discretization
259
Before we give a review of previous works, we first define some notation. We
let V be the set of vertices of all regions, and let E be the set of all boundary
edges. We use wr to denote the unit weight of any region r. For a boundary
edge e separating two regions r1 and r2, the unit weight we of e is defined to be min{wr1, wr2}. We define the unit weight ratio µ to be wmax/wmin, where wmax (wmin,
respectively) is the maximum (minimum, respectively) unit weight among all
regions. We use |p| to denote the Euclidean length of path p, and use p1 + p2 to
denote the concatenation of two paths p1 and p2 .
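For concreteness, the weighted length ‖p‖ of a piecewise-linear path can be computed as below. The per-segment weight lookup is an assumption of this sketch; in the model it is wr for a segment inside region r, or we for a segment along a boundary edge e:

```python
import math

def weighted_length(path, seg_weights):
    """||p||: sum over consecutive segments of unit weight x Euclidean
    length.  path: list of 2D points; seg_weights[i]: unit weight of the
    region (or boundary edge) containing segment i."""
    return sum(w * math.dist(a, b)
               for (a, b), w in zip(zip(path, path[1:]), seg_weights))
```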
The first ǫ-approximation algorithm on this problem was given by Mitchell
and Papadimitriou [2]. Their algorithm uses Snell's Law and the “continuous Dijkstra” method to compute an optimal-path map for any given source point s. The time complexity of their algorithm is O(n⁸ log(nµ/ǫ)). In practice, however, the time complexity is expected to be much lower. Later, Mata and Mitchell [3] presented
another ǫ-approximation algorithm based on constructing a “pathnet graph” of
size O(nk), where ǫ = O(µ/k). The time complexity, in terms of ǫ and n, is O(n³µ/ǫ).
Some of the existing algorithms construct from the original continuous space
a weighted graph Gǫ (V ′ , E ′ ) by placing discretization points, called Steiner
points, on boundary edges. The node set V ′ of Gǫ contains all Steiner points
as well as vertices of the regions. The edge set E ′ of Gǫ contains every edge v1 v2
such that v1 and v2 are on the border of the same region. The weight of edge v1 v2
is determined by the weighted length of segment v1 v2 in the original weighted
space. Gǫ is guaranteed to contain an ǫ-good approximate optimal path between s
and t, and therefore the task of finding an ǫ-good approximate optimal path is
reduced to computing a shortest path in Gǫ , which we call optimal discrete path,
using a discrete search algorithm such as Dijkstra’s algorithm or BUSHWHACK
[5,7].
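A minimal sketch of this construction is given below. The region representation and names are our assumptions; in particular, the rule that a segment on a shared boundary edge gets the edge weight we = min{wr1, wr2} is approximated here by simply keeping the cheaper of the two costs:

```python
import math

def build_discretization_graph(regions):
    """Build G_eps from a list of regions; each region is a pair
    (unit weight, [border points]) -- region vertices plus Steiner points.
    Every two points on the border of the same region are joined by an
    edge weighted by unit weight x Euclidean length; for a pair shared by
    two regions, the cheaper region's cost is kept."""
    graph = {}
    for w, pts in regions:
        for i, p in enumerate(pts):
            for q in pts[i + 1:]:
                cost = w * math.dist(p, q)
                old = graph.setdefault(p, {}).get(q, float("inf"))
                graph[p][q] = min(old, cost)
                graph.setdefault(q, {})[p] = graph[p][q]
    return graph
```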
In the remainder of this paper, we will mainly discuss techniques for approximation algorithms using this approach. Since an optimal discrete path from s
to t in Gǫ is used as an ǫ-approximation for the real optimal path, the phrases
“optimal discrete path” and “ǫ-good approximate optimal path” are used interchangeably, and are both denoted by p′opt (s, t).
Aleksandrov et al. [4,6] proposed two discretization schemes that place O((1/ǫ) log(1/ǫ) log µ) Steiner points on each boundary edge to construct Gǫ for a given ǫ. Combining the discretization scheme of [6] with a “pruned” Dijkstra's algorithm, they provided an ǫ-approximation algorithm that runs in roughly O((n/ǫ)(1/√ǫ + log n) log(1/ǫ) log µ) time.
It is important to note, however, that the discretization size (and therefore
the time complexity) for these approximation algorithms also depends on various
geometric parameters, such as the smallest angle between two adjacent boundary edges, maximum integer coordinate of vertices, etc. These parameters are
omitted here since they are irrelevant to our discussion.
In this paper we present the following results on finding ǫ-good approximate
optimal paths in weighted regions:
Compact Discretization Scheme. The complexity of each of the approximation algorithms we have mentioned above depends more or less on µ, either
260
Z. Sun and J.H. Reif
linearly ([3]) or logarithmically ([2,4,6]). This dependency is caused by the corresponding discretization scheme used. In particular, the discretization scheme
of Aleksandrov et al. [6] places O((1/ǫ) log(1/ǫ) log µ) Steiner points on each boundary
edge. Here again we omit the other geometric parameters.
The main obstacle for removing the dependency on µ from the size of Gǫ is
that otherwise it is difficult to prove that for each optimal path popt there exists
in Gǫ a discrete path that is an ǫ-approximation of popt . One traditional proof
technique used in proving the existence of such a discrete path is to decompose
popt into k subpaths p1, p2, · · · , pk and then construct a discrete path p′ = p′1 + p′2 + · · · + p′k such that ‖p′i‖ ≤ (1 + ǫ)‖pi‖ for each i. Ideally, we could choose p′i
such that pi and p′i lie in the same region, and therefore the discretization just
needs to make sure that |p′i | ≤ (1 + ǫ)|pi |. However, due to the discrete nature
of Gǫ , it is not always possible to find such p′i for each pi . For example, as shown
in Figure 1.a, popt could cross a series of boundary edges near a vertex v. The
point where it crosses each boundary edge e is between v and the Steiner point on e closest to v. In that case, p′i could travel in regions different from where
pi lies in, and therefore to bound ‖p′i‖ with respect to ‖pi‖, the discretization
scheme has to take into consideration variance of unit weights.
By modifying the above proof technique, we provide in Section 2 an improvement on the discretization scheme of Aleksandrov et al. [6]. The number
of Steiner points inserted by this new discretization scheme is O((1/ǫ) log(1/ǫ)), with
the dependency on other geometric parameters unchanged. Combining BUSHWHACK with this discretization scheme, we can have the first ǫ-approximation
algorithm whose time complexity is not dependent on µ.
Fig. 1. (a) “Bad” optimal path. (b) Searching for the cheapest flight.
Adaptive Discretization Method. The traditional approximation algorithms
construct from the original space a graph Gǫ and compute with a discrete search
algorithm an optimal discrete path in Gǫ in a one-step manner. We call this
method the fixed discretization method. For the single-query problem, this method is rather inefficient in that, although the goal is to find an ǫ-good approximate
optimal path p′opt (s, t) from s to t, it actually computes an ǫ-good approximate
optimal path from s to any point v in Gǫ , as long as the cost of such a path is
less than that of p′opt (s, t). Much of the effort is unnecessary as most of these
points would not help to find an ǫ-good approximate optimal path from s to t.
We use flight ticket booking as an example. When trying to find the cheapest
flight from Durham to Malmö with one stop (supposing no direct flight is available), a travel agent does not need to consider Mexico City as a candidate for
the connecting airport if she knows the following: a) there is always a route from
Durham to Malmö with one stop that costs less than $980; b) any direct flight
from Durham to Mexico City costs no less than $300; and c) any direct flight
from Mexico City to Malmö costs no less than $750. Therefore, she does not need
to find out the exact prices of the direct flights from Durham to Mexico City
and from Mexico City to Malmö, saving two queries to the ticketing database.
Analogously, we do not need to compute p′opt (s, v) and p′opt (v, t) for a point
v ∈ Gǫ if we know in advance that v does not lie on any optimal discrete
path between s and t. However, while the travel agent can rely on knowledge
she previously gained, the approximation algorithms using the fixed discretization method have no prior knowledge to draw upon. In Section 3 we discuss a
multiple-stage discretization method that we call adaptive discretization method.
It starts with a coarse discretization G ′ = Gǫ1 for some ǫ1 > ǫ and adaptively
refines G′ until it is guaranteed to contain an ǫ-good approximate optimal path
from s to t. Approximate optimal path information acquired in each stage is
used to identify the areas where no optimal path from s to t will pass through
and therefore no further Steiner point needs to be inserted in the next stage.
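The pruning idea behind the adaptive stages can be sketched as a one-line filter; the names and the lower-bound maps are our assumptions, not the paper's interface. A candidate point survives to the next stage only if routing through it might still beat the best path cost found so far:

```python
def prune(points, lb_from_s, lb_to_t, best_cost):
    """Adaptive-stage pruning sketch: keep a point v only if a lower
    bound on the cost of any s-t path through v does not exceed the
    best path cost from the coarser stage."""
    return [v for v in points if lb_from_s[v] + lb_to_t[v] <= best_cost]
```

In the flight analogy, Mexico City is pruned because 300 + 750 > 980, while New York survives.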
Heuristics for BUSHWHACK. The BUSHWHACK algorithm is an alternative to Dijkstra's algorithm for computing optimal discrete paths in Gǫ. It uses a number of
complex data structures to keep track of all potential optimal paths. When m, the
number of Steiner points placed on each boundary edge, is small, the efficiency
gained by accessing only a subgraph of Gǫ is outweighed by the cost of establishing and maintaining these data structures. Another weakness of BUSHWHACK
is that its performance improvement diminishes when the number of regions in
the space is large. These weaknesses affect the practicability of BUSHWHACK
as in most cases the desired quality of approximation does not require too many
Steiner points for each boundary edge, while in the given 2D space there can be an arbitrary number of regions. In Section 4 we introduce two cost-saving heuristics
for the original BUSHWHACK algorithm to overcome the weaknesses mentioned
above.
2 Compact Discretization Scheme
In this section we provide an improvement on the discretization scheme of Aleksandrov et al. [6] by removing the dependency of the size of Gǫ on the unit weight
ratio µ.
For any point v, we let E(v) be the set of boundary edges incident to v and
let d(v) be the minimum distance between v and boundary edges in E\E(v). For
each edge e ∈ E, we let d(e) = sup{d(v) | v ∈ e} and let ve be the point on e so
that d(ve) = d(e). For each vertex v of a region, the radius r′(v) of v is defined to be d(v)/5, and the weighted radius r(v) of v is defined to be (wmin(v)/wmax(v)) · r′(v), where wmin(v) and wmax(v) are the minimum and maximum unit weights among all regions incident to v, respectively.
According to the discretization scheme of Aleksandrov et al. [6], for each boundary edge e = v1v2, the Steiner points on e are chosen as follows. Each vertex vi has a “vertex-vicinity” S(vi) of radius rǫ(vi) = ǫr(vi), and the Steiner points vi,1, vi,2, · · · , vi,ki are placed on the segment of e outside the vertex-vicinities so that |vi vi,1| = rǫ(vi), |vi,j vi,j+1| = ǫd(vi,j) and |vi,ki vi| + ǫd(vi,ki) ≥ |vi ve|. The number of Steiner points placed on e can be bounded by C(e) · (1/ǫ) log(1/ǫ), where C(e) = O(|e|/d(e) · log(|e|/√(r(v1)r(v2)))) = O(|e|/d(e) · (log(|e|/√(r′(v1)r′(v2))) + log µ)). This discretization can guarantee a 3ǫ-good approximate optimal path.
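Under the simplifying assumption that d(x) ≈ |vx| sin θ for points x on e near a vertex v (θ being the angle at v; our assumption for illustration, not the paper's exact definition), the placement rule above yields a geometric progression of Steiner points, which is where the logarithmic count comes from:

```python
import math

def steiner_points_from_vertex(half_len, eps, r_eps, theta):
    """Walk away from vertex v along edge e: first point at distance
    r_eps (the vertex-vicinity radius), then spacing eps * d(x) with
    d(x) approximated by |vx| * sin(theta)."""
    pts, x = [], r_eps
    while x <= half_len:
        pts.append(x)
        x *= 1.0 + eps * math.sin(theta)   # geometric growth
    return pts
```

The count is about log(half_len/r_eps) / log(1 + ǫ sin θ), i.e., O((1/ǫ) log(1/ǫ)) points per half-edge once r_eps is an ǫ-fraction of the local radius.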
Observe that, for this discretization scheme, on each boundary edge e Steiner
points are placed more densely in the portion of e closer to the two endpoints,
with the exception that no Steiner point is placed inside the vertex-vicinities.
Therefore, the larger the vertex vicinities are, the fewer Steiner points the discretization needs to use. In the following we show that the radius rǫ(v) of the vertex-vicinity of v can be increased to ǫr′(v) while still guaranteeing the same error bound. Here we assume that ǫ ≤ 1/2.
A piecewise linear path p is said to be a normalized path if it does not cross
region boundaries inside vertex vicinities other than at the vertices. That is, for
each bending point u of p, if u is located on boundary edge e = v1 v2 , then either
u is one of the endpoints of e, or |vi u| ≥ rǫ (vi ) for i = 1, 2. For example, the
path shown in Figure 2 is not a normalized path, as it passes through u1 and
u2 , both of which are inside the vertex vicinity of v. We first state the following
lemma:
Lemma 1. For any path p from s to t, there is a normalized path p̂ from s to t
such that ‖p̂‖ ≤ (1 + ǫ/2) · ‖p‖.
Proof. In the following, for a path p and two points u1 , u2 ∈ p, we use p[u1 , u2 ]
to denote the subpath of p between u1 and u2 .
Refer to Figure 2, and suppose path p passes through the vertex vicinity S(v) of v. We use u1 (u2, respectively) to denote the first (last,
respectively) bending point of p inside S(v), and use u′′1 (u′′2 ) to denote the first
(last, respectively) bending point of p on the border of the union of all regions
incident to v. By the definition of d(v), we have |p[u′′1, u1]| + |u1 v| ≥ d(v) and |p[u2, u′′2]| + |vu2| ≥ d(v). Therefore, |u1 v|/|p[u′′1, u1]| ≤ (ǫ·d(v)/5)/(d(v) − ǫ·d(v)/5) = ǫ/(5 − ǫ) ≤ ǫ/4, as |u1 v| ≤ ǫd(v)/5. Similarly, we can prove that |vu2|/|p[u2, u′′2]| ≤ ǫ/4.
We let r1 be the region with the minimum unit weight among all regions
crossed by subpath p[u′′1 , u1 ], and u′1 be the point where p[u′′1 , u1 ] enters region
r1 for the first time. Similarly, we let r2 be the region with the minimum unit
weight among all regions crossed by subpath p[u2, u′′2], and let u′2 be the point where p[u2, u′′2] leaves region r2 for the last time.
Fig. 2. Path passing through the vicinity of a vertex
Consider replacing subpath p[u′′1, u′′2] by this normalized subpath: p̂[u′′1, u′′2] = p[u′′1, u′1] + u′1v + vu′2 + p[u′2, u′′2]. We have the following inequality:
‖p̂[u′′1, u′′2]‖ − ‖p[u′′1, u′′2]‖
= wr1 · |u′1v| + wr2 · |vu′2| − ‖p[u′1, u1]‖ − ‖p[u1, u2]‖ − ‖p[u2, u′2]‖
≤ (wr1 · |u′1v| − ‖p[u′1, u1]‖) + (wr2 · |vu′2| − ‖p[u2, u′2]‖)
≤ wr1 · (|u′1v| − |p[u′1, u1]|) + wr2 · (|vu′2| − |p[u2, u′2]|)
≤ wr1 · |u1v| + wr2 · |vu2| ≤ wr1 · (ǫ · |p[u′′1, u1]|)/4 + wr2 · (ǫ · |p[u2, u′′2]|)/4
≤ (ǫ/4) · (‖p[u′′1, u1]‖ + ‖p[u2, u′′2]‖) ≤ (ǫ/4) · ‖p[u′′1, u′′2]‖
Therefore, ‖p̂[u′′1, u′′2]‖ ≤ (1 + ǫ/4)·‖p[u′′1, u′′2]‖. Suppose p passes through k vertex
vicinities, S(v1), S(v2), · · · , S(vk). For each vi, we replace the subpath pi of p
that passes through S(vi) by a normalized subpath p̂i as we described above.
Let p̂ be the resulting normalized path. Since the sum of the weighted
lengths of p1, p2, · · · , pk is less than twice the weighted length of p, we have
‖p̂‖ ≤ ‖p‖ + (ǫ/4)·(‖p1‖ + · · · + ‖pk‖) ≤ (1 + ǫ/2)·‖p‖. ⊓⊔
We call a segment of a boundary edge bounded by two adjacent Steiner points
a Steiner segment. Each segment u1u2 of a normalized path p̂ is long compared
with the Steiner segment on which u1 or u2 lies. Therefore, it is easy
to find a discrete path in Gǫ that is an ǫ-approximation of p̂. With Lemma 1, we
can prove the claimed error bound for this modified discretization:
Theorem 1. The discretization constructed with rǫ(v) = ǫ·r′(v) contains a
3ǫ-good approximation for an optimal path popt from s to t, for any two vertices s
and t.
264
Z. Sun and J.H. Reif
Proof. We first construct a normalized path p̂ such that ‖p̂‖ ≤ (1 + ǫ/2)·‖popt‖.
Then we can use a proof similar to the one provided in [6] to show that, for
any normalized path p̂, there is a discrete path p′ such that ‖p′‖ ≤ (1 + 2ǫ)·‖p̂‖.
Therefore, ‖p′‖ ≤ (1 + 2ǫ)(1 + ǫ/2)·‖popt‖ = (1 + (5/2)ǫ + ǫ²)·‖popt‖ ≤ (1 + 3ǫ)·‖popt‖,
assuming ǫ ≤ 1/2. ⊓⊔
With the modification on the radius of each vertex vicinity, for each boundary
edge e the number of Steiner points placed on e is reduced to C′(e)·(1/ǫ)·log(1/ǫ), where
C′(e) = O((|e|/d(e))·log(|e|/√(r′(v1)·r′(v2)))). Note that C′(e) is independent of µ.
The significance of this compact discretization scheme is that, by combining it
with either Dijkstra’s algorithm or BUSHWHACK, we can get an approximation
algorithm whose time complexity does not depend on µ. To the best of our knowledge,
all previous ǫ-approximation algorithms have time complexities dependent on µ.
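A minimal sketch of one standard way to realize such a per-edge discretization, assuming Steiner points are placed in a geometric progression growing out of each vertex vicinity (the function name and parameters are illustrative, not taken from the paper):

```python
def steiner_offsets(edge_len, r1, r2, eps):
    """Offsets (distances from endpoint v1) of Steiner points on a
    boundary edge v1->v2: a geometric progression with ratio (1 + eps)
    growing out of each vertex vicinity (radii r1, r2), stopped at the
    edge midpoint from either side."""
    mid = edge_len / 2
    offsets = []
    d = r1
    while d < mid:              # progression out of v1's vicinity
        offsets.append(d)
        d *= 1 + eps
    d = r2
    while d < mid:              # progression out of v2's vicinity
        offsets.append(edge_len - d)
        d *= 1 + eps
    return sorted(offsets)
```

For small eps the number of offsets grows roughly like (2/eps)·log(edge_len/min(r1, r2)), which is the 1/ǫ · log(·) shape of the bound above.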
3 Adaptive Discretization Method
Even with the compact discretization scheme, the size of Gǫ can still be
large even for a modest ǫ, as the number of Steiner points placed on each boundary edge is also determined by a number of geometric parameters. Therefore,
computing an ǫ-good approximate optimal path by directly applying a discrete
search algorithm to Gǫ may be very costly. In particular, a discrete search algorithm such as Dijkstra’s algorithm will compute an optimal discrete path from s
to every point v ∈ Gǫ that is closer to s than t is, meaning that it has to search
through a large space with the same (small) error tolerance ǫ.
Here we further elaborate on the flight ticket booking example. With the knowledge accumulated from past experience, the travel agent may know, for any
intermediate airport A, a lower bound LD,A of the cost of a direct flight from
Durham to A as well as a lower bound LA,M of the cost of a direct flight from
A to Malmö. Further, she also knows an upper bound, say, $980, of the cost of
the cheapest flight (with one stop) from Durham to Malmö. In that case, the
travel agent would only consider airport A as a possible stop between Durham
and Malmö if LD,A + LA,M < 980. For example, it is at least worth the effort
to check the database to find out the exact cost of the flight from Durham to
Malmö via New York, as shown in Figure 1.b.
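The agent’s pruning rule can be sketched as follows (the airports and costs below are hypothetical, chosen only to illustrate the rule):

```python
def prune_stops(stops, lb_from_src, lb_to_dst, upper_bound):
    """Keep only intermediate stops A whose lower-bound total
    L[src,A] + L[A,dst] can still beat the known upper bound
    on the cheapest one-stop itinerary."""
    return [a for a in stops
            if lb_from_src[a] + lb_to_dst[a] < upper_bound]

# Hypothetical lower bounds, for illustration only:
lb_out = {"New York": 300, "Chicago": 500}   # Durham -> A
lb_in = {"New York": 400, "Chicago": 600}    # A -> Malmo
print(prune_stops(["New York", "Chicago"], lb_out, lb_in, 980))
```

Only New York survives: 300 + 400 = 700 can still beat 980, while 500 + 600 = 1100 cannot, so the Chicago itinerary need not be looked up at all.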
The A* algorithm partially addresses this issue as it would first explore points
that are estimated using a heuristic function to be closer to the destination point
t. However, if the unit weights of the regions vary significantly, it is difficult for a
heuristic function to provide a close estimate of the weighted distance between
any point and t. As a result, the A* algorithm may still have to search through
many points in Gǫ unnecessarily.
Here we introduce a multi-stage approximation algorithm that uses an adaptive discretization method. For each i, 1 ≤ i ≤ d, this method computes
an ǫi -good approximate path from s to t in a subgraph Gǫ′ i of Gǫi , where
ǫ1 > ǫ2 > · · · > ǫd−1 > ǫd = ǫ. In each stage, with the approximate optimal path
information acquired in the previous stage, the algorithm can identify for
each boundary edge the portion of the edge where more Steiner points need to
be placed to guarantee an approximate optimal path with a reduced error bound.
For the remaining portion of the boundary edge, no further Steiner points need
to be placed.
We say that a path p′ neighbors an optimal path popt if, for any Steiner
segment that popt crosses, p′ passes through one of the two Steiner points that
bound the Steiner segment. Our method requires that the discretization scheme
satisfy the following property (which is the case for the discretization schemes
of [4,6] and the one described in Section 2):
Property 1. For any two vertices v1 and v2 in the original (continuous) space and
any optimal path popt from v1 to v2, there is a discrete path from v1 to v2 in
the discretization with a cost no more than (1 + ǫ)·‖popt(v1, v2)‖ that neighbors
popt.
For any two points v1 , v2 ∈ Gǫ′ i , we denote the optimal discrete path found
from v1 to v2 in the i-th stage by p′ǫi (v1 , v2 ). We say that a point v ∈ Gǫ′ i
is a searched point if an optimal discrete path p′ǫi (s, v) from s to v in Gǫ′ i is
determined. For each searched point v, we also compute an optimal discrete
path p′ǫi(v, t) from v to t. We say that a point v is a useful point if either
‖p′ǫi(s, v)‖ + ‖p′ǫi(v, t)‖ ≤ (1 + ǫi)·‖p′ǫi(s, t)‖ or v is a vertex; we say that a
Steiner segment is a useful segment if at least one of its endpoints is useful. An
optimal path popt will not pass through a useless segment, and therefore in the
next stage the algorithm can avoid placing more Steiner points in this segment.
1.  i ← 1
2.  construct a discretization Gǫ′1 = Gǫ1
3.  repeat
4.      compute p′ǫi(s, t) in Gǫ′i
5.      if i = d then return p′ǫi(s, t)
6.      continue to compute p′ǫi(s, v) for each point v in Gǫ′i until ‖p′ǫi(s, v)‖
        grows beyond (1 + ǫi)·‖p′ǫi(s, t)‖
7.      apply Dijkstra’s algorithm in a reversed way, and compute p′ǫi(v, t) for
        every searched point v
8.      Gǫ′i+1 ← ∅
9.      for each useful point v ∈ Gǫ′i
10.         add v into Gǫ′i+1
11.     for each point v ∈ Gǫi+1
12.         if v is located inside a useful Steiner segment of Gǫ′i then
13.             add v into Gǫ′i+1
14.     i ← i + 1

Algorithm 1: Adaptive
Each stage contains a forward search and a backward search. These two
searches can be performed simultaneously using Dijkstra’s two-tree algorithm
[8].
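A rough sketch of one stage of the adaptive method, assuming the discretization graph is given as a plain adjacency map with nonnegative edge costs (a simplification of Algorithm 1 above; the two-tree optimization and the bounded forward search are omitted):

```python
import heapq

def dijkstra(graph, src):
    """graph: {node: [(neighbor, weight), ...]}; returns shortest
    distances from src to every reachable node."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def useful_points(graph, s, t, eps_i, vertices):
    """A point v is useful if dist(s,v) + dist(v,t) is within a
    (1 + eps_i) factor of dist(s,t), or if v is a vertex of the
    original subdivision (forward search plus backward search)."""
    ds, dt = dijkstra(graph, s), dijkstra(graph, t)  # undirected graph
    inf = float("inf")
    bound = (1 + eps_i) * ds[t]
    return {v for v in graph
            if v in vertices or ds.get(v, inf) + dt.get(v, inf) <= bound}
```

Only Steiner segments touching a useful point receive additional Steiner points in stage i + 1; the rest of each boundary edge is left at the coarser resolution.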
To prove the correctness of our multi-stage approximation algorithm, it
suffices to show the following theorem:
Theorem 2. For any optimal path popt(s, t), in each Gǫ′i there is a discrete path
p′(s, t) with a cost no more than (1 + ǫi)·‖popt(s, t)‖ that neighbors popt(s, t).
Proof. We prove the theorem by induction.
Base Step: When i = 1, Gǫ′1 = Gǫ1, and therefore the proposition is true, according to Property 1.
Inductive Step: We assume that, for any optimal path popt(s, t), Gǫ′i contains
a discrete path p′(s, t) neighboring popt(s, t) such that ‖p′(s, t)‖ ≤ (1 + ǫi)·
‖popt(s, t)‖. We first show that popt(s, t) will not pass through any useless Steiner
segment u1u2 in Gǫ′i. Suppose otherwise that popt(s, t) passes through a point
between u1 and u2. By the induction hypothesis, we can construct a
discrete path p′(s, t) from s to t with a cost no more than (1 + ǫi)·‖popt(s, t)‖
that neighbors popt(s, t). This implies that p′(s, t) passes through either u1 or
u2. W.l.o.g. we assume that p′(s, t) passes through u1. Because ‖popt(s, t)‖ ≤
‖p′ǫi(s, t)‖, the cost of p′(s, t) is no more than (1 + ǫi)·‖p′ǫi(s, t)‖. This
contradicts the fact that ‖p′ǫi(s, u1)‖ + ‖p′ǫi(u1, t)‖ > (1 + ǫi)·‖p′ǫi(s, t)‖, as
p′(s, t) cannot be better than the concatenation of p′ǫi(s, u1) and p′ǫi(u1, t).
Since no optimal path from s to t passes through a useless Steiner
segment, Gǫ′i+1, which includes all the Steiner points of Gǫi+1 except those inside
useless Steiner segments, contains every discrete path in Gǫi+1 that neighbors one
of the optimal paths from s to t. This finishes the proof.
⊓⊔
The adaptive discretization method has both pros and cons compared
with the fixed discretization method. It has to run a discrete search algorithm
on d different graphs, and each time it involves both forward and backward
searches. However, in the earlier stages it explores approximate optimal paths
with high error tolerance, while in later stages, as it gradually reduces the error
tolerance, it only searches approximate optimal paths in a small subspace (that
is, the useful segments of the boundary edges) instead of the entire original space
(all boundary edges). Our experimental results show that, when the desired error
tolerance ǫ is small, the adaptive discretization method performs more efficiently
than the fixed discretization method.
This discretization method can also be applied to other geometric optimal
path problems, such as the time-optimum movement planning problem in regions with flows [9], the anisotropic optimal path problem [10,11], and the 3D
Euclidean shortest path problem [12,13].
4 Heuristics for BUSHWHACK
The BUSHWHACK algorithm was originally designed for the weighted region
optimal path problem [5] and was later generalized to a class of piecewise pseudo-Euclidean optimal path problems [7]. BUSHWHACK, just like Dijkstra’s algorithm, is used to compute optimal discrete paths in a graph Gǫ generated by
a discretization scheme. Unlike Dijkstra’s algorithm, which applies to any arbitrary weighted graph, BUSHWHACK is adept at finding optimal discrete paths
in graphs derived from geometric spaces with certain properties, one of which
is the following:
Property 2. Two optimal discrete paths that originate from a same source point
cannot intersect in the interior of any region.
Fig. 3. Intersecting Edges Associated with Two Interval Lists: (a) edges associated
with ILISTe,e′ ; (b) edges associated with ILISTe′′,e′ ; (c) edges associated with either
interval list
One implication of Property 2 is that, if two edges v1v2 and u1u2 of Gǫ
intersect inside region r, they cannot both be useful. An edge is said to be useful
if it contributes to optimal discrete paths that originate from s. To exploit this
property, BUSHWHACK maintains a list ILISTe,e′ of intervals for each pair of
boundary edges e and e′ such that e and e′ are on the border of the same region
r. A point v is said to be discovered if an optimal discrete path p′opt(s, v) has been
determined. ILISTe,e′ contains for each discovered point v ∈ e an interval Iv,e,e′
defined as follows: Iv,e,e′ = {v∗ ∈ e′ : wr·|vv∗| + ‖p′opt(s, v)‖ ≤ wr·|v′v∗|
+ ‖p′opt(s, v′)‖ ∀ v′ ∈ PLISTe}. Here PLISTe is the list of all discovered points
on e. We say that edge vv∗ is associated with interval list ILISTe,e′ if v ∈ e and
v∗ ∈ Iv,e,e′.
It is easy to see that any edge vv∗ that crosses region r is useful only if
it is associated with an interval list inside r. If m is the number of Steiner
points placed on each boundary edge, the total number of edges associated with
interval lists inside a region r is Θ(m). Dijkstra’s algorithm, on the other hand,
has to consider all Θ(m²) edges inside r. By avoiding access to most of the useless
edges, BUSHWHACK takes only O(nm log nm) time to compute an optimal
discrete path from s to t, as compared to O(nm² + nm log nm) time for Dijkstra’s
algorithm.
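A minimal sketch of why interval lists cut the edge count from Θ(m²) to Θ(m): each Steiner point v∗ on e′ is “owned” by the single discovered point v on e that minimizes wr·|vv∗| plus the cost of reaching v, so only one candidate edge per v∗ survives. The brute-force minimization below is illustrative only; BUSHWHACK maintains the intervals incrementally rather than rescanning.

```python
import math

def interval_owner(points_e, costs_e, points_e2, wr):
    """For each Steiner point q on edge e', pick the index of the
    discovered point v on edge e minimizing wr*|vq| + cost(s, v).
    Only the edge from that owner into q can lie on an optimal path
    through this region, so at most len(points_e2) edges survive per
    ordered edge pair, instead of len(points_e) * len(points_e2)."""
    return [min(range(len(points_e)),
                key=lambda i: wr * math.dist(points_e[i], q) + costs_e[i])
            for q in points_e2]
```

With equal costs the owner is simply the nearest discovered point, so the owners partition e′ into contiguous intervals, one per discovered point, exactly the structure ILISTe,e′ stores.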
In this section we introduce BUSHWHACK+ , a variation of BUSHWHACK.
On the basis of the original BUSHWHACK algorithm, BUSHWHACK+ uses
several cost-saving heuristics. The necessity of the first heuristic is rather obvious. Let r be a triangular region with boundary edges e, e′ and e′′ . There are six
interval lists for each triangular region r, one for each ordered pair of boundary
edges of r. Although the edges associated with the same interval list do not
intersect with each other, two edges associated with different interval lists may
still intersect inside r. Therefore, BUSHWHACK may still use some intersecting edges to construct candidate optimal paths. Figures 3.a and 3.b show the
edges associated with ILISTe,e′ and ILISTe′′,e′ , respectively. Figure 3.c shows
that these two sets of edges intersect with each other, meaning that some of
them must be useless.
To address this issue, BUSHWHACK+ merges ILISTe,e′ and ILISTe′′ ,e′ into
a single list ILISTr,e′ . Any point v ∗ ∈ e′ is included in one and only one interval
268
Z. Sun and J.H. Reif
in this list. (In BUSHWHACK, every such point is included in two intervals,
one in ILISTe,e′ and one in ILISTe′′ ,e′ .) More specifically, for any discovered
point v ∈ e ∪ e′′, v∗ ∈ Iv,r,e′ if and only if wr·|vv∗| + ‖p′opt(s, v)‖ ≤ wr·
|v′v∗| + ‖p′opt(s, v′)‖ for any other discovered point v′ ∈ e ∪ e′′. Therefore, any
two edges associated with ILISTr,e′ will not intersect with each other inside r. As
BUSHWHACK+ constructs candidate optimal paths using only edges associated
with interval lists, it would avoid using both of two intersecting edges v1 v1∗ and
v2 v2∗ if v1 , v2 ∈ e ∪ e′′ and v1∗ , v2∗ ∈ e′ .
The second heuristic is rather subtle. It reduces the size of QLIST, the list
of candidate optimal paths. Possible operations on this list include inserting a
new candidate optimal path and deleting the minimum cost path in the list.
On average, each such operation costs O(log(nm)) time. As each iteration of
the algorithm will invoke one or more such operations, it is very important to
contain the size of QLIST.
In the original BUSHWHACK, for any point v ∈ e, QLIST may contain six
or more candidate optimal paths from s to v. Among these paths, four of them
are propagated through edges associated with interval lists, while the remaining
ones are extended to v from left and right along the edge e. This is a serious
disadvantage compared with a Dijkstra-based approximation algorithm, which keeps only
one path from s to v in the Fibonacci heap for each Steiner point v. When n
is relatively large, the performance gain BUSHWHACK achieves by accessing only a
small subgraph of Gǫ will be totally offset by the time wasted on a larger path
list.
If multiple candidate optimal paths for v are inserted into QLIST, BUSHWHACK keeps each of them until it is time to extract that path from QLIST,
even though it can be immediately decided that all of those paths except one
cannot be optimal (by comparing the costs of those paths). This is because
BUSHWHACK would generate new candidate optimal paths using these paths
in different ways. A (non-optimal) path may lead to the generation of a true
optimal discrete path and therefore it cannot be simply discarded. What BUSHWHACK does is to keep the path in QLIST until this path becomes the minimum
cost path. At that time, it will be extracted from QLIST and a new candidate
optimal path generated from the old path will be inserted into QLIST.
BUSHWHACK+ , however, uses a slightly different propagation scheme to
avoid keeping multiple paths with the same ending point. Let p(s, v′) be a candidate optimal path from s to v′ that has just been inserted into QLIST. If there
is already another candidate optimal path p′(s, v′) in QLIST, instead of keeping both of them in QLIST, BUSHWHACK+ will take the more costly one, say
p′(s, v′), and immediately extract it from QLIST. This extracted path will be processed as if it had been extracted in the normal situation (in which it would have
been the minimum cost path in the list). This is, in essence, a “propagation-in-advance” strategy that is somewhat contradictory to the “lazy” propagation scheme
of BUSHWHACK. It may cause edges to be accessed unnecessarily. It is a trade-off
between reducing the path list size and reducing the number of edges accessed.
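The propagation-in-advance heuristic can be sketched as follows, assuming a callback `propagate` that processes an evicted path as if it had just been extracted normally (a simplification of ours; in the real algorithm the callback generates new candidate paths, and extract-min must skip stale heap entries):

```python
import heapq

class QList:
    """Candidate-path queue keeping at most one stored path per
    endpoint. When a second path to the same endpoint arrives, the
    costlier of the two is handed to `propagate` at once
    ("propagation-in-advance") instead of being stored."""
    def __init__(self, propagate):
        self.best = {}       # endpoint -> cost of the stored path
        self.heap = []
        self.propagate = propagate

    def insert(self, v, cost):
        if v not in self.best:
            self.best[v] = cost
            heapq.heappush(self.heap, (cost, v))
        elif cost < self.best[v]:
            self.propagate(v, self.best[v])   # evict the old, costlier path
            self.best[v] = cost
            heapq.heappush(self.heap, (cost, v))
        else:
            self.propagate(v, cost)           # the new path is the costlier one
```

The heap size stays at one entry per endpoint, matching what a Dijkstra-based algorithm keeps, at the price of propagating some non-minimum paths early.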
5 Preliminary Experimental Results
In order to provide a performance comparison, we implemented the following
three algorithms in Java: 1) BUSHWHACK+; 2) pure Dijkstra’s algorithm,
which searches every incident edge of a Steiner point in Gǫ; 3) the two-stage adaptive
discretization method, which uses pure Dijkstra’s algorithm for each stage and
chooses ǫ1 = 2ǫ. All the timed results were acquired on a Sun Blade-1000
workstation with 4GB memory.
For our experiments we chose triangulations converted from terrain maps in
grid data format. More specifically, we used the DEM (Digital Elevation Model)
file of the Kaweah River basin. It is a 1424×1163 grid with 30m spacing between neighboring grid points. We randomly took twenty 60×45 patches and converted them
to TINs by connecting two grid points diagonally for each grid cell. Therefore,
in each example there are 5192 triangular faces. For each triangular face r, we
assign to r a unit weight wr that is equal to 1 + 10 tan αr , where αr is the angle
between r and the horizontal plane.
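The unit-weight assignment wr = 1 + 10·tan αr can be read off the face normal; a sketch, assuming each face is given by three 3D points and is not vertical:

```python
import math

def unit_weight(p0, p1, p2):
    """w_r = 1 + 10 * tan(alpha_r) for a triangular face with vertices
    p0, p1, p2 (3D points), where alpha_r is the angle between the
    face and the horizontal plane, computed from the face normal.
    Assumes the face is not vertical (normal has a z component)."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    n = (u[1] * v[2] - u[2] * v[1],       # face normal = u x v
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    tan_alpha = math.hypot(n[0], n[1]) / abs(n[2])
    return 1 + 10 * tan_alpha
```

A horizontal face gets weight 1, and a face tilted at 45° gets weight 11, so steep terrain is heavily penalized.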
Table 1. Statistics of running time (in seconds) / number of visited edges per region

  1/ǫ   BUSHWHACK+       pure Dijkstra       adaptive discretization
   3    156.9 /  2371     243.0 /  16558       281.3 / 10877
   5    290.7 /  4603     711.0 /  55797       570.2 / 24041
   7    440.6 /  7098    1506.0 / 124086      1054.7 / 40827
   9    631.9 /  9795    2672.5 / 224987      1528.9 / 60495
For each TIN, we ran the three algorithms five times, each time choosing
randomly generated source and destination points. For each algorithm, we took
the average of the running times of all experiments. We repeated the experiments
with 1/ǫ = 3, 5, 7 and 9. From Table 1, it is easy to see that, as 1/ǫ grows, the
running times of the BUSHWHACK+ algorithm and the adaptive discretization
method grow much more slowly than that of the pure Dijkstra’s algorithm. We
also list the average number of visited edges per region for each algorithm and
each ǫ value. We observe that the number of visited edges per region and the
running time are closely correlated.
6 Conclusion
In this paper we provided several improvements on the approximation algorithms
for the weighted region optimal path problem: 1) a compact discretization scheme
that removes the dependency on the unit weight ratio; 2) an adaptive discretization method that selectively puts Steiner points with high density on boundary edges;
and 3) a revised BUSHWHACK algorithm with two cost-saving heuristics.
Acknowledgement. This work is supported by NSF ITR Grant EIA-0086015,
DARPA/AFSOR Contract F30602-01-2-0561, NSF EIA-0218376, and NSF EIA-0218359.
270
Z. Sun and J.H. Reif
References
1. Mitchell, J.S.B.: Geometric shortest paths and network optimization. In Sack,
J.R., Urrutia, J., eds.: Handbook of Computational Geometry. Elsevier Science
Publishers B.V. North-Holland, Amsterdam (2000) 633–701
2. Mitchell, J.S.B., Papadimitriou, C.H.: The weighted region problem: Finding shortest paths through a weighted planar subdivision. Journal of the ACM 38 (1991)
18–73
3. Mata, C., Mitchell, J.: A new algorithm for computing shortest paths in weighted
planar subdivisions. In: Proceedings of the 13th Annual ACM Symposium on
Computational Geometry. (1997) 264–273
4. Aleksandrov, L., Lanthier, M., Maheshwari, A., Sack, J.R.: An ǫ-approximation
algorithm for weighted shortest paths on polyhedral surfaces. In: Proceedings of
the 6th Scandinavian Workshop on Algorithm Theory. Volume 1432 of Lecture
Notes in Computer Science. (1998) 11–22
5. Reif, J.H., Sun, Z.: An efficient approximation algorithm for weighted region shortest path problem. In: Proceedings of the 4th Workshop on Algorithmic Foundations
of Robotics. (2000) 191–203
6. Aleksandrov, L., Maheshwari, A., Sack, J.R.: Approximation algorithms for geometric shortest path problems. In: Proceedings of the 32nd Annual ACM Symposium on Theory of Computing. (2000) 286–295
7. Sun, Z., Reif, J.H.: BUSHWHACK: An approximation algorithm for minimal paths
through pseudo-Euclidean spaces. In: Proceedings of the 12th Annual International
Symposium on Algorithms and Computation. Volume 2223 of Lecture Notes in
Computer Science. (2001) 160–171
8. Helgason, R.V., Kennington, J., Stewart, B.: The one-to-one shortest-path problem: An empirical analysis with the two-tree Dijkstra algorithm. Computational
Optimization and Applications 1 (1993) 47–75
9. Reif, J.H., Sun, Z.: Movement planning in the presence of flows. In: Proceedings of
the 7th International Workshop on Algorithms and Data Structures. Volume 2125
of Lecture Notes in Computer Science. (2001) 450–461
10. Lanthier, M., Maheshwari, A., Sack, J.R.: Shortest anisotropic paths on terrains.
In: Proceedings of the 26th International Colloquium on Automata, Languages and
Programming. Volume 1644 of Lecture Notes in Computer Science. (1999) 524–533
11. Sun, Z., Reif, J.H.: On energy-minimizing paths on terrains for a mobile robot.
In: Proceedings of the 2003 IEEE International Conference on Robotics and Automation. (2003) To appear.
12. Papadimitriou, C.H.: An algorithm for shortest-path motion in three dimensions.
Information Processing Letters 20 (1985) 259–263
13. Choi, J., Sellen, J., Yap, C.K.: Approximate Euclidean shortest path in 3-space. In:
Proceedings of the 10th Annual ACM Symposium on Computational Geometry.
(1994) 41–48
On Boundaries of Highly Visible Spaces and
Applications
John H. Reif and Zheng Sun
Department of Computer Science, Duke University, Durham, NC 27708, USA
{reif,sunz}@cs.duke.edu
Abstract. The purpose of this paper is to investigate the properties
of a certain class of highly visible spaces. For a given geometric space S
containing obstacles specified by disjoint subsets of S, the free space F is
defined to be the portion of S not occupied by these obstacles. The space
is said to be highly visible if at each point in F a viewer can see at least
an ǫ fraction of the entire F . This assumption has been used for robotic
motion planning in the analysis of random sampling of points in the
robot’s configuration space, as well as the upper bound of the minimum
number of guards needed for art gallery problems. However, there is no
prior result on the implication of this assumption to the geometry of
the space under study. For the two-dimensional case, with the additional
assumptions that S is bounded within a rectangle of constant aspect
ratio and that the volume ratio between F and S is a constant, we show
by “charging” each obstacle boundary with a certain portion of S that the
total length of all obstacle boundaries in S is O(√(nµ(F)/ǫ)) if S contains
polygonal obstacles with a total of n boundary edges, or O(√(nµ(F)/ǫ)) if S
contains n convex obstacles that are piecewise smooth. In both cases, µ(F) is
the volume of F. For the polygonal case, this bound is tight, as we can construct
a space whose boundary size is Θ(√(nµ(F)/ǫ)). These
results can be partially extended to three dimensions. We show that these
results can be applied to the analysis of certain probabilistic roadmap
planners, as well as a variation of the art gallery problem.
1 Introduction
Computational geometry is now a mature field with a multiplicity of well-defined
foundational problems associated with, for many cases, efficient algorithms as
well as well-established applications over a broad range of areas including computer vision, robotic motion planning and rendering. However, as compared to
some other fields, the field of computational geometry has not yet explored as
much the methodology of looking at reasonable sub-cases of inputs that appear
in practice for practical problems. For example, in matrix computation, there is
a well-established set of specialized matrices, such as sparse matrices, structured
matrices, and banded matrices, for which there are especially efficient algorithms.
One assumption that has been used in a number of previous works in computational geometry is the assumption that, for a given geometric space S with
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 271–283, 2003.
© Springer-Verlag Berlin Heidelberg 2003
a specified set of obstacles, a viewer can see at every point of the free space F
an ǫ fraction of the entire volume of F. Here obstacles are defined to be compact
subsets of S, while the free space F is defined to be the portion of S not occupied
by the obstacles. In this paper we will call this assumption ǫ-visibility (though
note that some prior authors instead called it ǫ-goodness).
1.1 Probabilistic Roadmap Planners
The ǫ-visibility assumption, in particular, has been used in the analysis of randomized placements of points in the robot’s configuration space for probabilistic
roadmap (PRM) planners [1,2]. A classic PRM planner [3,4] randomly picks in
the free space of the robot’s configuration space a set of points, called milestones.
With these milestones, it constructs a roadmap by connecting each pair of milestones between which a collision-free path can be computed using a simple local
planner. For any given initial and goal configurations s and t, the planner first
finds two milestones s′ and t′ such that a simple collision-free path can be found
connecting s (t, respectively) with s′ (t′ , respectively) and then searches the
roadmap for a path connecting s′ and t′ . The PRM planners have proved to be
very effective in practice, capable of solving robotic motion planning problems
with many degrees of freedom. They also find applications in other areas such
as computer animation, computational biology, etc.
The performance of a PRM planner depends on two key features of the
roadmaps it constructs, visibility and connectivity. Firstly, for any given (initial
or goal) configuration v, there should exist in the roadmap a milestone v ′ such
that a local planner can find a path connecting v and v ′ . Since in practice most
PRM planners use local planners that connect configurations by straight line
segments, this implies that the milestones collectively need to see the entire
(or at least a significant portion of) free space. Secondly, the roadmap should
capture the connectivity of the free space it represents. Any two milestones in
the same connected component of the free space should also be connected via
the roadmap, or otherwise the planner would give “false negative” answers to
some queries.
The earlier PRM planners pick milestones with a uniform distribution in the
free space. The success of these planners motivated Kavraki et al.[1] to establish
a theoretical foundation for the effectiveness of this sampling method. They
showed that, for an ǫ-visible configuration space, O((1/ǫ) log(1/ǫ)) milestones uniformly
sampled in the free space are needed to adequately cover the free space with
high probability.
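A back-of-the-envelope version of this calculation (ours, not the full analysis of [1]): a fixed free-space point that sees an ǫ fraction of F is missed by k independent uniform milestones with probability (1 − ǫ)^k, so the number of samples needed for a given failure probability δ is:

```python
import math

def milestones_needed(eps, delta):
    """Smallest k with (1 - eps)^k <= delta: enough uniform milestones
    that a fixed point seeing an eps fraction of the free space is
    covered with probability at least 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1 - eps))
```

For eps = 0.1 and delta = 0.01 this gives 44 samples; the (1/ǫ) log(1/ǫ) shape of the bound appears once δ is itself chosen proportional to ǫ.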
1.2 Art Gallery Problems
The ǫ-visibility assumption has also been used in bounding the number of guards
needed for art gallery problems [5,6,7,8]. Potentially, this assumption might also
allow for much more efficient algorithms in this case. The assumption appears to
be reasonable in a large number of practical cases, as long as the considered area
is within a closed region (such as a room).
The original art gallery problem was first proposed by V. Klee, who described
the problem as the following: how many guards are necessary, and how many
guards are sufficient, to guard the paintings and works of art in an art gallery
with n walls? Later, Chvátal [9] showed that ⌊n/3⌋ guards are always sufficient
and occasionally necessary to guard a simple polygon with n edges. Since then,
there have been numerous variations of the art gallery problem, including, but
not limited to, vertex guard problem, edge guard problem, fortress and prison
yard problems, etc. (See [10] for a comprehensive review of various art gallery
problems.)
Although in the worst case the number of guards needed is Θ(n) for polygonal galleries with n edges, intuitively, one would expect that galleries that are
ǫ-visible should require far fewer guards. By translating the result of Kavraki
et al. [1] into the context of art gallery problems, a uniformly random placement
of O((1/ǫ) log(1/ǫ)) guards is very likely to guard an adequate portion of the gallery.
Kavraki et al. [1] also conjectured that in d-dimensional space any ǫ-visible polygonal gallery with h holes can be guarded by at most fd(h, 1/ǫ) guards, for some
polynomial function fd. Following some ideas of an earlier work by Kalai and
Matoušek [5], Valtr [6] confirmed the 2D version of the conjecture by showing
that f2(h, 1/ǫ) = (2 + o(1))·(1/ǫ)·log(1/ǫ)·log(h + 2). However, Valtr [7] disproved the
3D version of the conjecture by constructing, for any integer k, a (5/9)-visible art
gallery that cannot be guarded by k guards. Kirkpatrick [8] later showed that
64·(1/ǫ)·log log(1/ǫ) vertex guards suffice to guard all vertices of a simply connected
polygon P that has the property that each vertex of P can see at least an ǫ fraction
of the other vertices of P. He also gave a similar result for boundary guards.
It has been proved that, for various art gallery problems, finding the minimum number of guards is difficult. Lee and Lin [11] proved that the minimum
vertex guard problem for polygons is NP-hard. Schuchardt and Hecker [12] further showed that even for orthogonal polygons, whose edges are parallel to either
the x-axis or the y-axis, the minimum vertex and point guard problems are NP-hard. Ghosh [13] presented an O(n⁵ log n) algorithm that can compute a vertex
guard set whose size is at most O(log n) times the minimum number of guards
needed.
However, with the assumption of ǫ-visibility, one can use a simple and efficient
randomized approximation algorithm based on the result of Kavraki et al.[1] for
the original art gallery problem. Moreover, this approximation algorithm does
not require the assumption that the space is polygonal.
1.3 Our Result
Intuitively, for an ǫ-visible space, the total size of all obstacle boundaries cannot
be arbitrarily large; an excessive size of obstacle boundaries would inevitably
cause a point in F to lose ǫ-visibility by blocking a significant portion of its
view. The main result of this paper is an upper bound on the boundary size of
ǫ-visible spaces in two and (in some special cases) three dimensions. The upper
bound on the boundary size not only is a fundamental property of geometric
spaces of this type, but also may have implications for other applications that
use this assumption.
We show that, for an ǫ-visible 2D space, the total length of all obstacle
boundaries is O(√(nµ(F)/ǫ)) if the space contains polygonal obstacles with a
total of n boundary edges, or O(√(nµ(F)/ǫ)) if the space contains n convex
obstacles that are piecewise smooth. In both cases, µ(F) is the area of F. For the
case of polygonal obstacles, this bound is tight, as one can construct an ǫ-visible
space containing obstacle boundaries with a total length of Θ(√(nµ(F)/ǫ)).
Our result can be used to bound the number of guards needed for the following variation of the original art gallery problem: given a space with a specified
set of obstacles, place points on the boundaries of the obstacles so that these points
see the entire space (or a significant portion of it). We call this the boundary
art gallery problem. It arises in practical situations
where physical constraints only allow points to be placed on obstacle boundaries; for example, one might need to install lights on the walls to
illuminate a closed space consisting of rooms and corridors.
If this result can be extended to higher dimensions, we can also apply it
to bounding the number of randomly sampled boundary points needed to adequately cover the free space. Although it is difficult to uniformly sample points
on the boundary of a space without an explicit algebraic description, there exist
PRM planners [14,15] that place milestones “pseudo-uniformly” on the boundary of the free space using various techniques. These planners have proved to be
more effective in capturing the connectivity of the configuration space in the
presence of narrow passages.
2 Bounding Boundary Size for 2D and 3D ǫ-Visible Spaces
In this section we prove an upper bound of the boundary size of 2D ǫ-visible
spaces. We also show that this result can be partially extended to 3D ǫ-visible
spaces.
2.1 Preliminaries
Suppose S is the 2D space bounded inside a rectangle R. We let B denote the
union of all obstacles in S, and let ∂B denote the boundaries of all obstacles.
For each point v ∈ F, we let Vv = {v ′ | line segment vv ′ ⊂ F}. That is, Vv is the
set of all free space points that can be seen from v.
We assume that the aspect ratio of R, defined to be the ratio between the
lengths of the shorter and longer sides of R, is no less than λ, where 0 < λ < 1.
We also assume that µ(F) ≥ ρ · µ(S), for some constant ρ > 0. In the full version
of the paper, we will give examples where the boundary size cannot be bounded
if λ and ρ are not bounded by constants.
A segment of the boundary of an obstacle (which we call a sub-boundary) is
said to be smooth if the curvature is continuous along the curve defining the
On Boundaries of Highly Visible Spaces and Applications
boundary. The boundary of an obstacle is said to be piecewise smooth if it
consists of a finite number of smooth sub-boundaries. In this section we assume
that the boundaries of all obstacles inside R are piecewise smooth.
For a smooth sub-boundary c, the turning angle, denoted by A(c), is defined
to be the integral of the curvature along c. For a piecewise smooth sub-boundary c, the
turning angle is defined to be the sum of the turning angles of all smooth sub-boundaries of c, plus the sum of the instantaneous angular changes at the joint
points. Observe that the turning angle of the boundary of an obstacle is 2π if
the obstacle is convex, or greater than 2π if it is non-convex. In some sense, the
turning angle of the boundary of an obstacle reflects the geometric complexity
of the obstacle.
For each sub-boundary c, we use |c| to denote the length of c, and use c[u1, u2]
to denote the part of c between points u1 and u2 on c. For any point v ∈ F, we
let u1 and u2 be the two points on c such that c lies between the two rays
from v through u1 and from v through u2. We call u1 and u2 the bounding points of c by v, and we define the viewing
angle of c from v to be the angle ∠u1vu2.
Fig. 1. (a) Various ǫ-flat sub-boundaries bounded between two arcs; (b) blocked visibility near an ǫ-flat sub-boundary. Lines and curves are not drawn proportionally.
For each obstacle, we decompose its boundary into a minimum number of ǫ-flat sub-boundaries. A sub-boundary c is said to be ǫ-flat if A(c) ≤ θǫ,
where θǫ = (λρ/(16(1 + λ²))) · ǫ. Let u1 and u2 be the two endpoints of c. Observe that
c is bounded between two minor arcs, each with chord u1u2 and angle 2θǫ, as
shown in Figure 1.a. Therefore, the width of c, defined as |u1u2|, is no less than
|c| · cos(θǫ/2), while the height of c, defined as the maximum distance between any
point u ∈ c and the line segment u1u2, is no more than (|c|/2) · sin(θǫ/2).
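As a quick numerical sanity check (our own illustration, not part of the paper), one can verify the claimed width and height bounds in the extreme case where c is a circular arc whose turning angle equals the central angle; the helper name below is hypothetical:

```python
import math

def chord_and_height(arc_len, turn_angle):
    """Chord length (width) and height of a circular arc with the given
    arc length and turning angle (equal to its central angle)."""
    r = arc_len / turn_angle                   # arc length = radius * angle
    chord = 2 * r * math.sin(turn_angle / 2)
    height = r * (1 - math.cos(turn_angle / 2))
    return chord, height

# The claimed bounds: width >= |c| * cos(theta/2), height <= (|c|/2) * sin(theta/2).
for theta in [0.01, 0.05, 0.1, 0.5]:
    chord, height = chord_and_height(1.0, theta)
    assert chord >= math.cos(theta / 2)
    assert height <= 0.5 * math.sin(theta / 2)
```

The general ǫ-flat case follows because such a sub-boundary is sandwiched between two arcs of this kind.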
Since ǫ-flat sub-boundaries are “relatively” flat, any point v ∈ F “sandwiched” between two ǫ-flat sub-boundaries has limited visibility, as we
show in the following lemma:
Lemma 1. If v ∈ F is a point between two ǫ-flat sub-boundaries c1 and c2 and
the total viewing angle of c1 and c2 from v is more than 2π − 6θǫ , then v is not
ǫ-visible.
Proof (sketch). For each i = 1, 2, let ui,1 and ui,2 be the two endpoints of ci.
Vv is contained in the union of the following three regions: (I) the region bounded by sub-boundary c1, vu1,1 and vu1,2; (II) the region bounded by sub-boundary c2, vu2,1
and vu2,2; and (III) the region not inside either ∠u1,1vu1,2 or ∠u2,1vu2,2. Since
the total viewing angle of v blocked by c1 and c2 is more than 2π − 6θǫ, and
∠u1,1vu1,2 ≤ π + θǫ and ∠u2,1vu2,2 ≤ π + θǫ, we have ∠u1,1vu1,2 > π − 7θǫ and
∠u2,1vu2,2 > π − 7θǫ. Since c1 is ǫ-flat, Region I is contained in
the union of △u1,1vu1,2 and the region between the chord u1,1u1,2 and an arc of angle 2θǫ, as shown in
Figure 1.b. Since |c1| · cos(θǫ/2) ≤ |u1,1u1,2| ≤ LR ≤ √((λ² + 1)/(λρ)) · √(µ(F)), where LR is
the length of the diagonal of R, the volume of Region I is bounded by O(ǫµ(F)).
Region III is the union of two (possibly merged) cones with a total angle of
6θǫ, and therefore the volume of Region III is also O(ǫµ(F)). Hence, the region
visible from v has a total volume of O(ǫµ(F)). (In the full version of the paper
we will show that the volume is actually less than ǫµ(F).) Therefore, v is not
ǫ-visible. ⊓⊔
In the rest of this section we will prove the following theorem:

Theorem 1. If the boundaries of all obstacles can be divided into n ǫ-flat sub-boundaries, the total length of all obstacle boundaries is bounded by O(√(nµ(F)/ǫ)).

To prove Theorem 1 we need two lemmas, which we will prove in
the next subsection. In Subsection 2.3 we will give the proof of this theorem as
well as its corollaries.
2.2 Forbidden Neighborhoods of ǫ-Flat Sub-boundaries
For each ǫ-flat sub-boundary c with endpoints u1 and u2, we divide it into 15
equal-length segments, and let u′1 and u′2 be the two endpoints of the middle
segment. The ǫ-neighborhood of c, denoted by Nǫ(c), is defined to be the set
of points from each of which the viewing angle of c[u′1, u′2] is greater than π − θǫ,
as shown in Figure 2.a. It is easy to see that, for any v ∈ Nǫ(c), the distance
between v and line segment u′1u′2 is no more than (|c[u′1, u′2]|/2) · tan θǫ = (|c|/30) · tan θǫ.
The distance between v and line segment u1u2 is no more than the sum of the
distance between v and u′1u′2 and the maximum distance between u′1u′2 and u1u2,
which is (|c|/30) · tan θǫ + (|c|/2) · sin(θǫ/2).
These neighborhoods are “forbidden” in the sense that they do not overlap
with each other if the corresponding sub-boundaries are roughly the same length,
as we will show in Lemma 2. By “charging” a certain portion of S to each ǫ-flat
sub-boundary, we show that the total length of all ǫ-flat sub-boundaries, that is,
the length of ∂B, can be upper-bounded.
Lemma 2. The ǫ-neighborhoods of two sub-boundaries c1 and c2 do not overlap
if |c1|/2 ≤ |c2| ≤ 2|c1|.
Proof. Suppose for the sake of contradiction that v ∈ S is a point inside Nǫ(c1) ∩
Nǫ(c2), where the length ratio between c1 and c2 is between 1/2 and 2. For each
i = 1, 2, we let ui,1 and ui,2 be the two endpoints of ci, and let u′i,1 and u′i,2 be
the endpoints of the portion of ci incident to the ǫ-neighborhood of ci. Let vi be
Fig. 2. (a) The ǫ-neighborhood of an ǫ-flat sub-boundary c with endpoints u1, u2 and middle segment u′1u′2; (b) the ǫ-neighborhoods of sub-boundaries with similar lengths are non-overlapping. Lines and curves are not drawn proportionally.
the projection of v on line segment ui,1ui,2, and let vi′ be the intersection of ci
and the straight line that passes through both vi and v.
The intuition here is as follows: since c1 and c2 are “relatively” flat,
non-intersecting, and of about the same length, for Nǫ(c1) and Nǫ(c2) to overlap,
u1,1u1,2 and u2,1u2,2 have to be “almost” parallel and also close to each other.
That way, we can find in the free space between c1 and c2 a point that can
see less than ǫµ(F) of the free space, as its visibility is mostly “blocked” by c1
and c2, leading to a contradiction to the assumption that S is ǫ-visible.
There are a number of cases corresponding to different geometric arrangements of the points, line segments and curves (sub-boundaries). In the following
we assume that u1,1 u1,2 and u2,1 u2,2 do not intersect, v lies between u1,1 u1,2
and u2,1 u2,2 , and v1′ (v2′ , respectively) lies between v and v1 (v2 , respectively),
as shown in Figure 2.b. The other cases can be analyzed in an analogous manner.
Since line segments u1,1 u1,2 and u2,1 u2,2 do not intersect, either both v1 u2,1
and v1 u2,2 lie between u1,1 u1,2 and u2,1 u2,2 , or both v2 u1,1 and v2 u1,2 lie between
u1,1 u1,2 and u2,1 u2,2 . Without loss of generality we assume that it is the former
case. Let l1 = |vv1 | and l2 = |vv2 |. Let u′′2,1 (u′′2,2 , respectively) be the projection
of u′2,1 (u′2,2 , respectively) on u2,1 u2,2 . Observe that v1′ lies inside the small
rectangle of width |u′′2,1 u′′2,2 | + 2l1 and height l1 + l2 (the solid rectangle in Figure
2.b). Since |u2,2 u′′2,2 | = |u2,2 u2,1 | − |u′′2,2 u2,1 | > |u2,2 u2,1 | − |c[u′2,2 , u2,1 ]|, we have
tan ∠v1′u2,1u2,2 ≤ (l1 + l2) / (|u2,2u2,1| − |c[u′2,2, u2,1]| − l1)
  ≤ ((1/30) · tan θǫ + (1/2) · sin(θǫ/2)) · (|c1| + |c2|) / (|c2| · cos(θǫ/2) − (8/15) · |c2| − ((1/30) · tan θǫ + (1/2) · sin(θǫ/2)) · |c1|).

Applying |c1| ≤ 2|c2| and θǫ < 1/12, we now have

tan ∠v1′u2,1u2,2 ≤ θǫ · (1/(30 cos θǫ) + 1/4) · 3|c2| / ((cos(θǫ/2) − 8/15 − ((1/15) · tan θǫ + sin(θǫ/2))) · |c2|) ≤ 5θǫ/2 ≤ tan(5θǫ/2).

It follows that ∠v1′u2,1u2,2 ≤ 5θǫ/2. Similarly, we can show that ∠v1′u2,2u2,1
≤ 5θǫ/2, and therefore ∠u2,1v1′u2,2 ≥ π − 5θǫ. Since v1′ is on c1, ∠u1,1v1′u1,2 ≥ π − θǫ.
Therefore, the viewing angle from v1′ not blocked by c1 and c2 is no more than
2π − (π − θǫ) − (π − 5θǫ) = 6θǫ. According to Lemma 1, v1′ is not ǫ-visible.
Therefore, we can find a point v1* ∈ F close to v1′ that is also not ǫ-visible, a
contradiction to the assumption that S is ǫ-visible. ⊓⊔
Next we give a lower bound on the volume of the ǫ-neighborhood of any ǫ-flat
sub-boundary with the following lemma:
Lemma 3. For any ǫ-flat sub-boundary c, the volume of Nǫ(c) is Ω(θǫ · |c|²).
Proof. We will show that the ǫ-neighborhood of c has a volume no less than
µ0 = θǫ · |c[u′1, u′2]|² / (18κ1), for some constant κ1 > 1. (We will explain later how this
constant κ1 is chosen.)
Fig. 3. ǫ-flat sub-boundary: (a) case I; (b) case II. In the figures we only show the portion of sub-boundary c between u′1 and u′2.
We divide c[u′1 , u′2 ] into three equal-length segments, c1 , c2 , and c3 . For any
point u on c[u′1 , u′2 ], we say that v ∈ F is the lookout point of u if line segment
vu is normal to c[u′1 , u′2 ] and the viewing angle of c[u′1 , u′2 ] from v is π − θǫ . We
call the length of uv the lookout distance of c[u′1 , u′2 ] at u.
We first consider Case I, where for each point u ∈ c2 the lookout
distance of c at u is at least l = θǫ · |c[u′1, u′2]| / (3κ1), as shown in Figure 3.a. In this case,
the volume of the ǫ-neighborhood of c outside c2 is at least

|c2| · l − l² · θǫ/2 = (|c[u′1, u′2]|² · θǫ / (9κ1)) · (1 − θǫ²/(2κ1)) ≥ |c[u′1, u′2]|² · θǫ / (18κ1) = µ0,

and therefore the volume of the ǫ-neighborhood of c is no less than µ0.
Now we consider Case II, where there exists a point u0 ∈ c2 such that the
lookout distance at u0 is less than l, as shown in Figure 3.b. Let v0 be the lookout
point of u0. Since A(c[u′1, u′2]) ≤ A(c) ≤ θǫ, v0 sees at least one of the two
endpoints of c[u′1, u′2], for otherwise the viewing angle of c[u′1, u′2] from v0 would be less than π − θǫ.
Without loss of generality we let u′1 be an endpoint of c[u′1, u′2] that is visible
from v0. Then c[u0, u′1], the part of c between u0 and u′1, lies below the line segment v0u′1.
Since u0 ∈ c2, we have |c[u0, u′1]| ≥ |c1| = |c[u′1, u′2]|/3.
Since the curve c[u0, u′1] is also ǫ-flat, we have |u0u′1| ≥ |c[u0, u′1]| · cos(θǫ/2) >
|c[u′1, u′2]|/6. We use u0u′1 as the chord to draw a minor arc of angle 2θǫ outside
c[u0, u′1]. The radius of this arc is r0 = |u0u′1| / (2 sin θǫ) ≥ |c[u′1, u′2]| / (12θǫ). Let v1 be the point where
the arc u0u′1 intersects v0u′1. We claim that any point v′ inside the closed region
bounded by the arc u0u′1 and the chord u′1v1 belongs to the ǫ-neighborhood of c. First
of all, v′ is outside c[u0, u′1], as c[u0, u′1] lies below v0u′1. Secondly, the viewing
angle of c[u′1, u′2] from v′ is no less than the viewing angle of c[u0, u′1]
from v′, which is at least π − θǫ.

Now we consider the volume of the region bounded by the arc u0u′1 and u′1v1. This
is actually a circular segment u′1v1 with angle θ0 = 2θǫ − 2∠u0u′1v0 and radius r0. Since
tan ∠u0u′1v0 = |u0v0| / |u0u′1| < l / (|c[u′1, u′2]|/6) = 2θǫ/κ1, as long as we choose κ1 large enough,
∠u0u′1v0 < θǫ/2 and therefore θ0 > θǫ. The volume of the circular segment u′1v1,
therefore, is (r0²/2) · (θ0 − sin θ0) ≥ r0² · θ0³/14 ≥ |c[u′1, u′2]|² · θǫ / (14 · 12²). Once again, if we choose κ1
large enough, we can have µ(segment u′1v1) ≥ θǫ · |c[u′1, u′2]|² / (18κ1) = µ0, and therefore the volume
of the ǫ-neighborhood of c is greater than µ0.

Since |c[u′1, u′2]| = |c|/15, we have µ(Nǫ(c)) = Ω(θǫ · |c|²). ⊓⊔
2.3 Putting It Together
With the lemmas established in the last subsection, we are ready to prove Theorem 1.

Proof of Theorem 1. Let Lmax be the maximum length of all ǫ-flat sub-boundaries
inside R. We divide all ǫ-flat sub-boundaries into subsets S1, S2, · · · , Sk. For each
i, Si contains the sub-boundaries whose lengths are between Lmax/2^i and Lmax/2^(i−1).
We let ci,1, ci,2, · · · , ci,ni be the ni sub-boundaries in Si. By Lemma 2,
Nǫ(ci,j) ∩ Nǫ(ci,j′) = ∅ for any j ≠ j′ with 1 ≤ j, j′ ≤ ni. By Lemma 3, there
exists a constant K > 0 such that µ(Nǫ(ci,j)) ≥ K · θǫ · |ci,j|² for all i and j.
Therefore, we have

µ(F)/ρ ≥ µ(S) ≥ µ(∪_{j=1}^{ni} Nǫ(ci,j)) = Σ_{j=1}^{ni} µ(Nǫ(ci,j)) = Σ_{j=1}^{ni} K · θǫ · |ci,j|² ≥ ni · K · θǫ · Lmax²/4^i.

Hence we have ni ≤ 4^i · µ(F) / (K · θǫ · Lmax² · ρ). Let K′ = µ(F) / (K · θǫ · Lmax² · ρ). Now we are to give
an upper bound of |∂B|, which is defined to be Σ_{i=1}^{k} Σ_{j=1}^{ni} |ci,j|, the sum of the lengths of all
ǫ-flat sub-boundaries. Since |ci,j| ≤ Lmax/2^(i−1), we have |∂B| ≤ Lmax · Σ_{i=1}^{k} ni · 2^(−i+1).
Observe that Σ_{i=1}^{k} ni = n, and Σ_{i=1}^{k} ni · 2^(−i+1) is maximized when ni = K′ · 4^i for
i < log₄(3n/K′) and ni = 0 for i ≥ log₄(3n/K′). Therefore, we have

Σ_{i=1}^{k} ni · 2^(−i+1) ≤ Σ_{i=1}^{log₄(3n/K′)−1} K′ · 4^i · 2^(−i+1) = 2K′ · Σ_{i=1}^{log₄(3n/K′)−1} 2^i < 2K′ · 2^(log₄(3n/K′)) = √(12n · K′) = √(12n · µ(F) / (K · θǫ · Lmax² · ρ)).

Therefore, |∂B| is no more than √(12n · µ(F) / (K · θǫ · ρ)). Recalling that K and ρ are constants
and that θǫ = Θ(ǫ), we have |∂B| = O(√(nµ(F)/ǫ)). ⊓⊔
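The series manipulation at the heart of this proof can be checked numerically. The sketch below (our own illustration, with a hypothetical function name) evaluates the extremal assignment ni = K′ · 4^i greedily and confirms that the weighted sum stays below √(12nK′):

```python
import math

def extremal_sum(n, Kp):
    """Evaluate sum_i n_i * 2^(1-i) for the extremal assignment that packs
    n_i = Kp * 4^i sub-boundaries into the smallest-index buckets first
    (capped so that the n_i sum to n), which maximizes the weighted sum."""
    total, s, i = 0.0, 0.0, 1
    while total < n:
        ni = min(Kp * 4.0**i, n - total)
        total += ni
        s += ni * 2.0**(1 - i)
        i += 1
    return s

# The proof's bound: the maximized sum stays below sqrt(12 * n * K').
for n in [10, 100, 10000]:
    for Kp in [0.01, 0.1, 1.0]:
        assert extremal_sum(n, Kp) <= math.sqrt(12 * n * Kp)
```

The bound is close to tight: for small K′ the greedy sum comes within a few percent of √(12nK′).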
If all the obstacles inside S are polygons, each boundary edge is an ǫ-flat
sub-boundary, and therefore we have the following corollary:

Corollary 1. If S contains polygonal obstacles with a total of n edges, |∂B| is O(√(nµ(F)/ǫ)).
If all obstacles inside S are convex, the boundary of each obstacle can be
decomposed into 2π/θǫ ǫ-flat sub-boundaries, and therefore we have:

Corollary 2. If S contains n convex obstacles that are piecewise smooth, |∂B| is O((1/ǫ) · √(nµ(F))).
In some sense, the upper bound stated in Corollary 1 is tight, as one can
construct an ǫ-visible space inside a square consisting of n = 1/ǫ rectangular free-space “cells,” each with length √(µ(F)) and width ǫ · √(µ(F)). The total length
of obstacle boundaries is Θ((1/ǫ) · √(µ(F))) = Θ(√(nµ(F)/ǫ)).
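A hedged sketch of the arithmetic behind this construction (the exact wall accounting is our assumption): counting roughly two long walls of length √(µ(F)) per cell gives a total boundary length within a constant factor of √(nµ(F)/ǫ):

```python
import math

def cell_boundary_length(eps, mu_F):
    """Total obstacle boundary length in the cell construction: n = 1/eps
    rectangular cells of length sqrt(mu_F) and width eps * sqrt(mu_F),
    counting roughly two long walls of length sqrt(mu_F) per cell."""
    n = int(round(1 / eps))
    return 2 * n * math.sqrt(mu_F)

for eps in [0.1, 0.01, 0.001]:
    mu_F = 4.0
    n = int(round(1 / eps))
    ratio = cell_boundary_length(eps, mu_F) / math.sqrt(n * mu_F / eps)
    assert 1.0 <= ratio <= 4.0    # Theta(sqrt(n * mu_F / eps)); here the constant is 2
```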
Nonetheless, we still conjecture that the best bound should be the following:

Conjecture 1. |∂B| is O((1/ǫ) · √(µ(F))).

2.4 Extension to Three Dimensions
In this subsection we show how to generalize our proof of Theorem 1 to 3D
spaces. For simplicity, we assume that the boundary (surface) of each obstacle
is smooth, meaning that the curvature is continuous everywhere on the surface.
To replicate the proofs of Lemmas 1, 2, and 3 for the 3D case, we first need
to define the ǫ-flat surface patch, the 3D counterpart of ǫ-flat sub-boundary. A
surface patch s is said to be ǫ-flat if, for any point u ∈ s and any plane p that
contains the line ls,u , the curve c = p ∩ s is ǫ-flat. Here ls,u is the line that passes
through u and is normal to s. Moreover, we also need the surface patch to be
“relatively round.” More specifically, we require that for each ǫ-flat surface patch
s there exists a “center” vs such that max{|vsv| : v ∈ ∂s} / min{|vsv| : v ∈ ∂s} is
bounded by a constant. Here ∂s is the closed curve that defines the boundary of
s. We call Rs,vs = min{|vsv| : v ∈ ∂s} the minimum radius of s at center vs.
We define the ǫ-neighborhood Nǫ (s) for an ǫ-flat surface patch similarly to
the case of ǫ-flat sub-boundary. We choose a small “sub-patch” s′ of s at the
center of s so that the distance between vs and every point on the boundary of
s′ is k1 · Rs,vs , for some constant k1 < 1. For any point v outside the obstacle
that s bounds, v ∈ Nǫ(s) if and only if there exist two points u1, u2 ∈ s′
such that ∠u1vu2 > π − k2ǫ, for some constant k2 > 0.
We use a sequence of planes, each containing the line through v and vs, to “sweep” through the
volume of Nǫ(s). Each such plane p contains a “slice” of Nǫ(s) with an area of
no less than Θ(ǫ · (Rs,vs)²), following the same argument as in the proof of Lemma 3.
Therefore, the total volume of Nǫ(s) is Θ(ǫ · (Rs,vs)³) = Θ(ǫ · µ(s)^(3/2)). We leave the
details of this proof, as well as the proofs of the 3D versions of the other lemmas,
to the full version of the paper, and only state the result as the following:
Theorem 2. If S contains convex obstacles bounded by a total of n ǫ-flat surface
patches, |∂B| is O((nµ(F)²/ǫ²)^(1/3)).
3 Applications and Open Problems
It is easy to see that in a 2D ǫ-visible space ∂Bv = Ω(ǫ · √(µ(F))) for any v ∈ F.
Therefore, using Corollaries 1 and 2, we can arrive at a lower bound on the fraction of all obstacle boundaries that each free-space point can see, for various cases. In particular, if Conjecture 1 holds, each free-space point can see at least
an Ω(ǫ²) fraction of all obstacle boundaries. Then, using the same proof technique
as [1]¹, we can show that O((1/ǫ²) · log(1/ǫ)) randomly sampled boundary points can
view a significant portion of F with high probability. These results can be
applied to the boundary art gallery problem to provide an upper bound on the
number of boundary guards needed to adequately guard the space.
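The sampling bound can be made concrete with a short calculation. Assuming, as in the text, that every free-space point sees at least an ǫ² fraction of the boundary, the smallest k with (1 − ǫ²)^k ≤ δ behaves like (1/ǫ²) · log(1/δ); taking δ = ǫ recovers the O((1/ǫ²) · log(1/ǫ)) form (the function name is ours):

```python
import math

def samples_needed(eps, delta):
    """Smallest k with (1 - eps**2)**k <= delta: a point that sees an eps^2
    fraction of the obstacle boundaries is missed by all k uniform boundary
    samples with probability at most delta."""
    return math.ceil(math.log(delta) / math.log(1 - eps**2))

# k grows like (1/eps^2) * log(1/delta); with delta = eps this matches the
# O((1/eps^2) * log(1/eps)) bound quoted in the text.
for eps in [0.5, 0.1, 0.05]:
    k = samples_needed(eps, eps)
    assert (1 - eps**2) ** k <= eps < (1 - eps**2) ** (k - 1)
    assert k <= math.ceil(2 * math.log(1 / eps) / eps**2)
```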
It occurs to us that, although one can construct an example where there exists
a free-space point that can only see obstacle boundaries of size Θ(ǫ · √(µ(F))), the
total volume of such points could be upper-bounded. In particular, we have the
following conjecture:

Conjecture 2. Every point in F, except for a small subset of volume O(√ǫ · µ(F)),
can see obstacle boundaries of size Ω(√(ǫµ(F))).
If we can prove both Conjecture 1 and Conjecture 2, we can reduce the
number of boundary points needed to adequately cover the space with high
probability to O((1/ǫ^(3/2)) · log(1/ǫ)).
So far our results are limited to 2D ǫ-visible spaces and some special cases
of 3D ǫ-visible spaces. If we can extend these results to higher dimensions, we
will be able to provide a theoretical foundation for analyzing the effectiveness of
the PRM planners [14,15] that (randomly) pick milestones close to boundaries
of obstacles. These planners have been shown to be more efficient than the earlier
¹ The difference is that, in our proof, every point v in the free space sees at least an ǫ²
fraction of obstacle boundaries, and therefore the probability that k points uniformly
sampled on obstacle boundaries cannot see v is (1 − ǫ²)^k.
PRM planners based on uniform sampling in the free space by better capturing
narrow passages in the configuration space; that is, the roadmaps they construct
have better connectivity. However, there has been no prior theoretical result on
the visibility of the roadmaps constructed using the sampled boundary points.
With upper bound results analogous to the ones for the 2D and 3D cases, we will
be able to prove an upper bound on the number of milestones uniformly sampled
on obstacle boundaries needed to adequately cover the free space F with high
probability, a result similar to the one provided by Kavraki et al. [1] for the uniform
sampling method.
4 Conclusion
In this paper we provided some preliminary results, as well as several conjectures,
on the upper bound of the boundary size of ǫ-visible spaces in two and three
dimensions. These results can be used to bound the number of guards needed for
the boundary art gallery problem. Potentially, they can also be applied to the
analysis of a certain class of PRM planners that sample points close to obstacle
boundaries.
Acknowledgement. This work is supported by NSF ITR Grant EIA-0086015,
DARPA/AFOSR Contract F30602-01-2-0561, NSF EIA-0218376, and NSF EIA-0218359.
References
1. Kavraki, L.E., Latombe, J.C., Motwani, R., Raghavan, P.: Randomized query
processing in robot motion planning. In: Proceedings of the 27th Annual ACM
Symposium on Theory of Computing. (1995) 353–362
2. Hsu, D., Kavraki, L., Latombe, J.C., Motwani, R., Sorkin, S.: On finding narrow
passages with probabilistic roadmap planners. In: Proceedings of the 3rd Workshop
on Algorithmic Foundations of Robotics. (1998)
3. Kavraki, L., Latombe, J.C.: Randomized preprocessing of configuration space
for fast path planning. In: Proceedings of the 1994 International Conference on
Robotics and Automation. (1994) 2138–2145
4. Overmars, M.H., Švestka, P.: A probabilistic learning approach to motion planning.
In: Proceedings of the 1st Workshop on Algorithmic Foundations of Robotics.
(1994) 19–37
5. Kalai, G., Matoušek, J.: Guarding galleries where every point sees a large area.
Israel Journal of Mathematics 101 (1997) 125–139
6. Valtr, P.: Guarding galleries where no point sees a small area. Israel Journal of
Mathematics 104 (1998) 1–16
7. Valtr, P.: On galleries with no bad points. Discrete & Computational Geometry
21 (1999) 193–200
8. Kirkpatrick, D.: Guarding galleries with no nooks. In: Proceedings of the 12th
Canadian Conference on Computational Geometry. (2000) 43–46
9. Chvátal, V.: A combinatorial theorem in plane geometry. Journal of Combinatorial
Theory Series B 18 (1975) 39–41
10. Urrutia, J.: Art gallery and illumination problems. In Sack, J.R., Urrutia, J., eds.:
Handbook of Computational Geometry. Elsevier Science Publishers B.V. North-Holland, Amsterdam (2000) 973–1026
11. Lee, D.T., Lin, A.K.: Computational complexity of art gallery problems. IEEE
Transactions on Information Theory 32 (1986) 276–282
12. Schuchardt, D., Hecker, H.: Two NP-hard art-gallery problems for ortho-polygons.
Mathematical Logic Quarterly 41 (1995) 261–267
13. Ghosh, S.K.: Approximation algorithms for art gallery problems. In: Proceedings
of Canadian Information Processing Society Congress. (1987)
14. Amato, N.M., Bayazit, O.B., Dale, L.K., Jones, C., Vallejo, D.: OBPRM: An
obstacle-based PRM for 3d workspaces. In: Proceedings of the 3rd Workshop on
Algorithmic Foundations of Robotics. (1998) 155–168
15. Boor, V., Overmars, M.H., van der Stappen, A.F.: The Gaussian sampling strategy for
probabilistic roadmap planners. In: Proceedings of the 1999 IEEE International
Conference on Robotics and Automation. (1999) 1018–1023
Membrane Computing
Gheorghe Păun
Institute of Mathematics of the Romanian Academy
PO Box 1-764, 70700 Bucureşti, Romania, and
Research Group on Mathematical Linguistics
Rovira i Virgili University
Pl. Imperial Tárraco 1, 43005 Tarragona, Spain
gpaun@imar.ro, gp@astor.urv.es
Abstract. This is a brief overview of membrane computing, about
five years after this area of natural computing was initiated. We
informally introduce the basic ideas and the basic classes of membrane
systems (P systems), some directions of research that are already well developed
(mentioning only some central results or types of results along these
directions), as well as several research topics which seem to be of interest.
1 Foreword
Membrane computing is a branch of natural computing which abstracts distributed parallel computing models from the structure and functioning of the
living cell. The devices investigated in this framework, called membrane systems
or P systems, are capable of Turing-universal computations and, in certain
cases where an enhanced parallelism is provided, able to solve intractable problems in polynomial time (by trading space for time). The domain is well developed at the mathematical level, still waiting for implementations of practical
computational interest, but several applications in modelling various biological
phenomena (but also phenomena related to ecology, artificial life, abstract chemistry, even linguistics)
have been reported.
Less than five years after the paper [6] was circulated on the Internet, the
bibliography of the domain is already large and continuously growing, hence the
present survey will only mention the main directions of research and their central results, as well as some topics for further investigation. The goal is to give
the reader an idea of what membrane computing deals with,
rather than to provide a formal presentation of membrane systems of various
types or a list of precise results. Also, we do not give complete references.
The domain is evolving fast – in particular, several results are repeatedly improved – hence we suggest that the interested reader consult the web page
http://psystems.disco.unimib.it for updated details and references. Of
special interest are the collective volumes available at the web page, those
devoted to the series of Workshops on Membrane Computing (held in Curtea
de Argeş, Romania, in 2000, 2001, and 2002, and in Tarragona, Spain, in 2003),
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 284–295, 2003.
© Springer-Verlag Berlin Heidelberg 2003
as well as the proceedings of the Brainstorming Week on Membrane Computing, held in Tarragona, in February 2003. For a comprehensive introduction to
membrane computing one can also use the monograph [7].
2 The Basic Class of P Systems
The fundamental ingredients of a membrane system are the (1) membrane structure and the sets of (2) evolution rules, which process (3) multisets of (4) objects
placed in the compartments of the membrane structure.

A membrane structure is a hierarchically arranged set of membranes (understood as three-dimensional vesicles), as suggested in Figure 1. We distinguish the
external membrane (corresponding to the plasma membrane and usually called
the skin membrane) and several internal membranes (corresponding to the membranes present in a cell: around the nucleus, in the Golgi apparatus, vesicles, etc.); a
membrane without any other membrane inside it is said to be elementary. Each
membrane determines a compartment, also called a region: the space delimited
from above by it and from below by the membranes placed directly inside, if
any exist. The correspondence between membranes and regions is one-to-one, which is why we
sometimes use these terms interchangeably; also, we identify a membrane and its associated region by the same label. (Mathematically, a membrane structure
is represented by the unordered tree which describes it, or by a sequence of
matching labelled parentheses.)
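The parenthesis representation mentioned above can be parsed mechanically. The following sketch uses a hypothetical bracket notation of our own (e.g. '[1 [2 ]2 ]1'), not any standard P-systems syntax:

```python
import re

def parse_membranes(s):
    """Parse a membrane structure written as matching labelled parentheses,
    e.g. '[1 [2 ]2 ]1' (a hypothetical notation), into a nested dict
    {'label': ..., 'children': [...]}."""
    tokens = re.findall(r'\[(\w+)|\](\w+)', s)
    stack, root = [], None
    for open_label, close_label in tokens:
        if open_label:                       # opening a membrane
            node = {'label': open_label, 'children': []}
            if stack:
                stack[-1]['children'].append(node)   # nested membrane
            else:
                root = node                  # the skin membrane
            stack.append(node)
        else:                                # closing a membrane
            assert stack and stack[-1]['label'] == close_label, "mismatched"
            stack.pop()
    return root

tree = parse_membranes('[1 [2 ]2 [3 [4 ]4 ]3 ]1')
assert tree['label'] == '1'
assert [c['label'] for c in tree['children']] == ['2', '3']
# membrane 4 is elementary (no children) and sits inside membrane 3
assert tree['children'][1]['children'][0]['children'] == []
```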
Fig. 1. A membrane structure: the skin membrane, internal membranes labelled 1–9 (among them elementary membranes), the regions they determine, and the surrounding environment.
In the basic variant of P systems, each region contains a multiset of symbol objects, which correspond to the chemicals swimming in a solution in a cell
compartment; these chemicals are considered here as unstructured, which is why
we describe them by symbols from a given alphabet.
The objects evolve by means of evolution rules, which are also localized,
associated with the regions of the membrane structure. The rules correspond
to the chemical reactions possible in the compartments of a cell. The typical
form of such a rule is aad → (a, here)(b, out)(b, in), with the following meaning:
two copies of object a and one copy of object d react and the reaction produces
one copy of a and two copies of b; the new copy of a remains in the same
region (indication here), one of the copies of b exits the compartment (indication
out) and the other enters one of the directly inner membranes (indication in).
We say that the objects a, b, b are communicated as indicated by the commands
associated with them in the right hand member of the rule. When an object exits
a compartment, it will go to the surrounding compartment; in the case of the
skin membrane this is the environment, hence the object is “lost”, it never comes
back into the system. If no inner membrane exists (that is, the rule is associated
with an elementary membrane), then the indication in cannot be followed, and
the rule cannot be applied.
The communication of objects through membranes is reminiscent of the fact that
biological membranes contain various (protein) channels through which
molecules can pass (in a passive way, due to a concentration difference, or in an
active way, with a consumption of energy), in a rather selective manner.
A rule as above, with several objects in its left-hand member, is said to be
cooperative; a particular case is that of catalytic rules, of the form ca → cu,
where a is an object and c is a catalyst, always appearing only in such rules and
never changing. A rule of the form a → u, where a is an object, is called non-cooperative.
The rules associated with a compartment are applied to the objects from
that compartment, in a maximally parallel way: all objects which can evolve by
means of local rules should do it (we assign objects to rules, until no further
assignment is possible). The used objects are “consumed”, the newly produced
objects are placed in the compartments of the membrane structure according to
the communication commands assigned to them. The rules to be used and the
objects to evolve are chosen in a nondeterministic manner. In turn, all compartments of the system evolve at the same time, synchronously (a common clock is
assumed for all membranes). Thus, we have two levels of parallelism, one at the
level of compartments and one at the level of the whole “cell”.
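A minimal sketch of one maximally parallel transition step in a single region, under simplifying assumptions of our own (objects sent "out" or "in" are only recorded, not routed to neighbouring regions; all names are hypothetical):

```python
import random
from collections import Counter

def step(region, rules, rng=random):
    """One maximally parallel step in a single region: nondeterministically
    assign objects to rules until no rule's left-hand side can still be
    satisfied by the remaining objects."""
    produced = []                            # (object, target) pairs
    progress = True
    while progress:
        progress = False
        for lhs, rhs in rng.sample(rules, len(rules)):
            need = Counter(lhs)
            if all(region[obj] >= k for obj, k in need.items()):
                region.subtract(need)        # consume the reactants
                produced.extend(rhs)
                progress = True
    for obj, target in produced:
        if target == 'here':
            region[obj] += 1                 # stays in the same region
        # 'out' and 'in' objects would be routed to the surrounding or an
        # inner region; this sketch only records them.
    return region, produced

# The rule aad -> (a, here)(b, out)(b, in) from the text:
rules = [('aad', [('a', 'here'), ('b', 'out'), ('b', 'in')])]
region = Counter('aaaadd')                   # four a's and two d's
region, produced = step(region, rules)
assert region['d'] == 0 and region['a'] == 2  # the rule fires twice, maximally
assert len(produced) == 6
```

Note how maximality forces both copies of d to be consumed: no subset of the remaining objects can still trigger a rule when the step ends.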
A membrane structure and the multisets of objects from its compartments
identify a configuration of a P system. By a nondeterministic maximally parallel
use of rules as suggested above we pass to another configuration; such a step
is called a transition. A sequence of transitions constitutes a computation. A
computation is successful if it halts, it reaches a configuration where no rule can
be applied to the existing objects. With a halting computation we can associate
a result in various ways. The simplest possibility is to count the objects present
in the halting configuration in a specified elementary membrane; this is called
internal output. We can also count the objects which leave the system during
the computation, and this is called external output. In both cases the result is
a number. If we distinguish among different objects, then we can have as the
result a vector of natural numbers. The objects which leave the system can also
Membrane Computing
287
be arranged in a sequence according to the moments when they exit the skin
membrane, and in this case the result is a string.
This last possibility is worth emphasizing, because of the qualitative difference between the data structure used inside the system (multisets of objects, hence numbers) and the data structure of the result, which is a string: it contains positional information, a syntax. A string can also be obtained by following
the trace of a distinguished object (a “traveller”) through membranes.
Because of the nondeterminism of the application of rules, starting from an
initial configuration, we can get several successful computations, hence several
results. Thus, a P system computes (one also says that it generates) a set of
numbers, or a set of vectors of numbers, or a language.
We stress the fact that the data structure used in this basic type of P systems is the multiset (of symbols), hence membrane computing can be considered
as a biologically inspired algorithmic framework for processing multisets (in a
distributed, parallel, nondeterministic manner). Moreover, the main type of evolution rules are rewriting-like rules. Thus, membrane computing has natural connections with many areas of (theoretical) computer science: formal languages (L
systems, commutative languages, formal power series, grammar systems, regulated rewriting), automata theory, DNA (more general: molecular) computing,
the chemical abstract machine, the Gamma language, Petri nets, complexity
theory, etc.
3
Further Ingredients
With motivations coming from biology (trying to have systems as adequate as
possible to the cell structure and functioning), from computer science (looking for computationally powerful and/or efficient models), or from mathematics
(minimalistic models, even if they are not realistic, are more elegant, challenging, appealing), many types of P systems were introduced and investigated. The
number of features considered in this framework is very large.
For instance, we can add a partial order relation to each set of rules, interpreted as a priority relation among rules (this corresponds to the fact that
certain reactions are more likely to appear – are more active – than others), and
in this way the nondeterminism is decreased.
The rules can also have other effects than changing the multisets of objects,
namely, they can control the membrane permeability (this corresponds to the
fact that the protein channels from cell membranes can sometimes be closed,
e.g., when an undesirable substance should be kept isolated, and they are reopened when the “poison” vanishes). If a membrane is non-permeable, then no
rule which asks for passing an object through it can be used. In this way, the
processes taking place in a membrane system can be controlled (“programmed”).
In particular, membranes can be dissolved (all objects and membranes from a
dissolved membrane are left free in the surrounding compartment – the skin
membrane is never dissolved, as this would destroy the “computer”; the rules
of the dissolved membrane are removed, they are supposed to be specific to the
288
G. Păun
reaction conditions from the former compartment, hence they cannot be applied
in the upper compartment, which has its own rules), created, and divided (like
in biology, when a membrane is divided, its content is replicated in the newly
obtained membranes).
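The effect of dissolving a membrane can be sketched as follows. This is an illustrative sketch with data structures of our own choosing (a child-to-parent map for the membrane structure, per-membrane dicts for objects and rule sets), not an implementation from the literature:

```python
def dissolve(parent_of, contents, rules, m):
    """Dissolve membrane m: its objects and child membranes are left
    free in the surrounding compartment, and its rules are removed
    (they were specific to the reaction conditions of the former
    compartment). The skin membrane (parent None) is never dissolved.
    """
    parent = parent_of[m]
    if parent is None:
        raise ValueError("the skin membrane cannot be dissolved")
    # objects of m are released into the parent region
    for o, n in contents.pop(m).items():
        contents[parent][o] = contents[parent].get(o, 0) + n
    # child membranes of m are re-attached to the parent
    for child, p in parent_of.items():
        if p == m:
            parent_of[child] = parent
    del parent_of[m]
    del rules[m]   # the rules of the dissolved membrane disappear
```

Note that the rules of the parent compartment are untouched: after dissolution they apply, as before, to whatever objects are now present there.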
Furthermore, the rules can be used in a conditional manner, depending on
the contents of the region where they are applied. The conditions can be of
a permitting context type (a rule is applied only if certain associated objects
are present) or of a forbidding context type (a rule is applied only if certain
associated objects are not present). This, too, is reminiscent of biological facts: the promoters and inhibitors which regulate many biochemical reactions.
Several other ingredients can be considered but we do not enter here into
details.
4
Processing Structured Objects
The case of symbol-objects corresponds to a level of approaching (“zooming”) the
cell where we distinguish the internal compartmentalization and the chemicals
from compartments, but not the structure of these chemicals. However, most of
the molecules present in a cell have a complex structure, and this observation
makes it necessary to consider structured objects in P systems as well. A particular
case of interest is that where the chemicals can be described by strings (this is
the case with DNA, RNA, etc).
String-objects were considered in membrane systems from the very beginning.
There are two possibilities: to work with sets of strings (hence languages, in the
usual sense) or with multisets of strings, where we count the different copies of
the same string. In both cases we need evolution rules based on string processing
operations, while the second case requires operations which increase and decrease the number of (copies of) strings. Among the operations
used in this framework, the basic ones were rewriting and splicing (well-known
in DNA computing: two strings are cut at specific sites and the fragments are
recombined), but also less popular operations were used, such as rewriting with
replication, splitting, conditional concatenation, etc.
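As an illustration of string processing in this framework, here is a sketch of the splicing operation. The quadruple representation (u1, u2, u3, u4) of a splicing rule is standard in DNA computing; the function name and the choice of cutting at the first occurrence of each site are our own simplifications:

```python
def splice(x, y, rule):
    """Splice strings x and y: cut x at the site u1u2, cut y at the
    site u3u4, and recombine the fragments crosswise, yielding the
    pair (prefix of x up to u1) + (suffix of y from u4) and
    (prefix of y up to u3) + (suffix of x from u2)."""
    u1, u2, u3, u4 = rule
    i = x.find(u1 + u2)
    j = y.find(u3 + u4)
    if i < 0 or j < 0:
        return None          # a cutting site is missing
    cut_x = i + len(u1)      # position between u1 and u2 in x
    cut_y = j + len(u3)      # position between u3 and u4 in y
    return x[:cut_x] + y[cut_y:], y[:cut_y] + x[cut_x:]
```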
The next step is to consider trees or arbitrary graphs as objects, with corresponding operations, then two-dimensional arrays, or even more complex pictures. The bibliography from the mentioned web page contains titles which refer
to all these possibilities.
A common feature of the membrane systems which work with strings or with
more complex objects is the fact that the halting condition can be avoided when defining the successful computations and their result: a number is not “completely computed” until the computation is finished, as it can still grow at any further step, but a string sent out of the system at any time remains unchanged, irrespective of whether the computation continues. Thus, if we compute/generate languages, then the powerful “programming technique” of the halting condition can be ignored (this is also biologically motivated, as, in general, biological processes tend to last as long as possible, not to reach a “dead state”).
5
Universality
From a computability point of view, it is quite interesting that many types of
P systems (this means, many combinations of ingredients as those described in
the previous sections), of rather restricted forms, are computationally universal.
In the case when numbers are computed, this means that these systems can
compute all Turing computable sets of natural numbers. When the result of
a computation is a string or a set of strings, we get characterizations of the
family of recursively enumerable languages. This is true even for systems with
simple rules (catalytic), with a very reduced number of membranes (most of the
universality results recalled in [7] refer to systems with fewer than five membranes).
The proof techniques frequently used in such universality results are based
on the universality of matrix grammars with appearance checking (in certain
normal forms) or on the universality of register machines – and this is rather
interesting, as both these machineries are “old stuff” in computer science, being
well investigated already three to four decades ago (in both cases, improvements
of old results were necessary, motivated by the applications to membrane computing; for instance, new normal forms for matrix grammars, sharper than those
known from the literature, were recently proved).
The abundance of universality results obtained in membrane computing, on
the one hand, shows that “the cell is a powerful computer”, on the other hand,
asks for an “explanation” of this phenomenon. Roughly speaking, the explanation lies in the fact that Turing computability is based on the possibility of using an arbitrarily large workspace, and of really using it, that is, of controlling all this space, of sending messages at an arbitrary distance (in general, this can be reformulated as context-sensitivity); besides context-sensitivity, the possibility of erasing is essential. Membrane systems possess erasing by definition (sending
objects to the environment or to a “garbage collector” membrane can mean
erasing), while the synchronized use of rules (the maximal parallelism) together
with the compartmentalization and the halting condition provide “sufficient”
context-sensitivity. Thus, the universality is expected, the only challenge is to
get it by using systems with a small number of membranes, using as restricted
features as possible.
For instance, by using catalytic rules that also have an associated priority relation it is rather easy to obtain universality; it is not so easy to replace the priority with the possibility of controlling the membrane permeability, but this can be done. However, it is surprising to obtain universality by using catalytic rules only, with no other ingredient. An additional problem concerns the number of catalysts.
The initial proof (by P. Sosik) of the universality of catalytic P systems used
eight catalysts; then the number was decreased to six, then to five (R. Freund and P. Sosik), and it was shown that one catalyst does not suffice (O.H. Ibarra et al.), but the question of which number is optimal from this point of view
remains open. Similar “races” for the best result can be found in the case of the
number of membranes for various other types of P systems (just one example:
for a while, matrix grammars without appearance checking were simulated by
rewriting string-object P systems with four membranes, but recently the result
was improved to three – by M. Madhu – without it being known whether this is an optimal result).
6
Computing by Communication Only
Chemicals do not always pass through membranes alone; a coupled transport is often encountered, where two solutes pass together through a protein channel, either in the same direction or in opposite directions. In the first case the process is called symport, in the latter case antiport. For completeness,
uniport names the case when a single molecule passes through a membrane.
The idea of a coupled transport can be captured in membrane computing
terms in a rather easy way: for the symport case, consider rules of the form
(ab, in) or (ab, out), while for the antiport case write (a, out; b, in), with the
obvious meaning. Mathematically, we can generalize this idea and consider rules
which move arbitrarily many objects through a membrane.
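In code, such communication rules only move objects between the two sides of a membrane, never renaming them. A minimal sketch follows, with a representation of our own choosing (multisets as dicts, a symport rule as a multiset plus a direction, an antiport rule as a pair of multisets):

```python
def contains(region, ms):
    """Check that the multiset ms is available in region."""
    return all(region.get(o, 0) >= n for o, n in ms.items())

def move(ms, src, dst):
    """Move the multiset ms from src to dst; objects keep their names."""
    for o, n in ms.items():
        src[o] -= n
        dst[o] = dst.get(o, 0) + n

def symport(ms, direction, inside, outside):
    """(ms, in) or (ms, out): the objects of ms pass together."""
    src, dst = (outside, inside) if direction == 'in' else (inside, outside)
    if contains(src, ms):
        move(ms, src, dst)

def antiport(out_ms, in_ms, inside, outside):
    """(out_ms, out; in_ms, in): the two multisets are exchanged
    simultaneously through the membrane."""
    if contains(inside, out_ms) and contains(outside, in_ms):
        move(out_ms, inside, outside)
        move(in_ms, outside, inside)
```

Note that nothing is created or destroyed: the total multiset on the two sides of the membrane is invariant under both operations.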
The use of such rules suggests a very interesting question (research topic): can
we compute only by communication, only by transferring objects through membranes? This question leads to considering systems which contain only symport/antiport rules, which only change the places of objects, but not their
“names” (no object is created or destroyed). One starts with (finite) multisets of
objects placed in the regions of the system, and with certain objects available in
the environment in arbitrarily many copies (the environment is an inexhaustible
provider of “raw materials”, otherwise we can only deal with the finite number
of objects given at the beginning; note that by symport and/or antiport rules
associated with the skin membrane we can bring objects from the environment
into the system); the symport/antiport rules associated with the membranes are
used in the standard nondeterministic maximally parallel manner – and in this
way we get a computation.
Note that such systems have several interesting properties, besides the fact
that they compute by communication only: the rules are directly inspired from
biology, the environment takes part in the process, nothing is created, nothing
is destroyed, hence the conservation law is observed – and all these features are
rather close to reality.
Surprising at first sight, but expected in view of the context-sensitivity
and erasing possibilities available in symport/antiport P systems, these systems
are again universal, even when using a small number of membranes, symport
rules and/or antiport rules of small “weights” (the weight of a rule is the number
of objects it involves).
7
P Automata
Up to now we have discussed only P systems which behave like a grammar: one
starts from an initial configuration and one evolves according to the given evolution rules, collecting some results, numbers or strings, in a specified membrane
or in the environment. An automata-like behavior is also possible, especially in
the case of systems using only symport/antiport rules. For instance, we can say
that a string is accepted by a P system if it consists of symbols brought into the
system during a halting computation (we can imagine that a tape is present in
the environment, the symbols of which are taken by symport or by antiport rules
and introduced into the system; if the computation halts, then the content of the tape is accepted).
This is a simple and natural definition, considered by R. Freund and M. Oswald. More automata ingredients were considered by E. Csuhaj-Varju and G.
Vaszil (the contents of regions are considered states, which control the computation, while only symport rules of the form (x, in) are used, hence the communication is done in a one-way manner; further features are considered, but we
omit them here), and by K. Krithivasan, M. Mutyam, and S.V. Varma (special
objects are used, playing the role of states, which raises interesting questions
concerning the minimisation of P automata both from the point of view of the
number of membranes and of states).
The next step is to consider not only an input but also an output of a P
system, and this step was also done, by considering P transducers (G. Ciobanu,
Gh. Păun, and Gh. Ştefănescu).
As expected, in the case of P automata (and P transducers) we again obtain universality: the recursively enumerable languages (the Turing translations,
respectively) are characterized in all circumstances mentioned above, always with
systems of a reduced size.
8
Computational Efficiency
The computational power is only one criterion for assessing the quality of a new
computing machinery; from a practical point of view at least equally important
is the efficiency of the new device. P systems display a high degree of parallelism. Moreover, at the mathematical level, rules of the form a → aa are allowed, and by iterating such rules we can produce an exponential number of objects in linear time. Parallelism and the possibility of producing an exponential workspace are standard ways to speed up computations. In the
general framework of P systems with symbol-objects (and without membrane
division or membrane creation) these ingredients do not suffice in order to solve
computationally hard problems (e.g., NP-complete problems) in a polynomial
time: in [11] it is proved that any deterministic P system can be simulated by a
deterministic Turing machine with a linear slowdown.
However, pleasantly enough, if additional features are considered, either able
to provide an enhanced parallelism (for instance, by membrane division, which
may produce exponentially many membranes in linear time), or to better structure the multisets of objects (by membrane creation), then NP-complete
problems can be solved in a polynomial (often, linear) time. The procedure is as
follows (it has some specific features, slightly different from the standard computational complexity requirements). Given a decision problem, we construct in
polynomial time a family of P systems (each one of a polynomial size) which
will solve the instance of the problem in the following sense. In a well specified
time, bounded by a given function, the system corresponding to the instances of a given size of the problem will send to its environment a special object yes if and only if the instance of the problem introduced into the initial configuration of the system has a positive answer. During the computation, the system can grow exponentially (in the number of objects and/or the number of membranes) and can work in a nondeterministic manner; what is important is that it always halts. Standard problems for illustrating this approach are SAT (satisfiability of
propositional formulas in conjunctive normal form) and HPP (the existence of a Hamiltonian path in a directed graph), but many other problems were also
considered. Details can be found in [7] and [9].
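To make the membrane-division idea concrete, here is a sketch that simulates the generate-and-check strategy for SAT sequentially in software, thereby of course losing the parallel speed-up: n division steps double the membrane population each time, so that afterwards each of the 2^n membranes holds one truth assignment; every membrane then checks the formula in parallel, and the object yes is sent out iff some membrane succeeds. The clause encoding and all names are our own illustrative choices:

```python
def sat_by_membrane_division(n_vars, clauses):
    """Simulate the membrane-division approach to SAT.

    clauses: list of clauses, each a list of integer literals
    (i stands for the variable x_i, -i for its negation).
    """
    membranes = [{}]                      # one initial membrane
    for var in range(1, n_vars + 1):      # n division steps
        # each membrane divides into two copies, one per truth value
        membranes = [{**m, var: value}
                     for m in membranes
                     for value in (False, True)]
    assert len(membranes) == 2 ** n_vars  # exponential workspace

    def satisfies(assignment):
        return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
                   for clause in clauses)

    # "yes" leaves the skin membrane iff some membrane satisfies the formula
    return 'yes' if any(satisfies(m) for m in membranes) else 'no'
```

In an actual P system the 2^n membranes check the clauses simultaneously, which is where the polynomial (often linear) time bound comes from.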
There is an interesting point here: we have said that the family of P systems
solving a given problem is constructed in polynomial time, but this does not
necessarily mean that the construction is uniform: it may not start from n but
from the nth instance of the problem. Because the construction (done by a Turing
machine) takes a polynomial time, it is honest: it cannot hide the solution of the problem in the very system which solves the problem. This “semi-uniformity” (we may call it fairness/honesty) is usual in molecular computing. However,
if we insist on having uniform constructions in the classic sense of complexity
theory, then this can also be obtained in many cases. A series of results in this
direction were obtained by the Sevilla membrane computing group (M.J. PerezJimenez, A. Romero-Jimenez, F. Sancho-Caparrini, etc).
Recently, a surprising result was reported by P. Sosik: P systems with membrane division can also solve in polynomial time problems known to be PSPACE-complete. He has shown this for QBF (satisfiability of quantified propositional formulas). The family of P systems used in the proof is constructed in the
semi-uniform manner mentioned above and the systems use the division operation not only for elementary membranes but also for arbitrary membranes. It
is an open problem whether or not the result can be improved from these two
points of view.
All previous remarks refer to P systems with symbol-objects. Polynomial
(often linear) solutions to NP-complete problems can be obtained also in the
framework of string-objects, for instance, when string replication is used for
obtaining an exponential work space.
9
Recent Research Topics
The two types of attractive results mentioned in the previous sections – computational universality and computational efficiency – as well as the versatility of
the P systems explain the very rapid development of the membrane computing
area. Besides the topics discussed above, many others were investigated (normal
forms in what concerns the shape of the membrane structure, the number and
the type of used rules, decidability problems, links with Eilenberg X machines,
parallel rewriting of string-objects, ways to avoid the communication deadlock in
this case, associating energy to objects or to reactions, and so on and so forth),
but we do not enter here into details. Instead, we just briefly mention some topics which were considered recently, some of them promising to open new
research vistas in membrane computing.
A P system is a computing model, but at the same time it is a model of
a cell, however reductionistic it may be in a given form, hence one can consider its
evolution, its “life” as the main topic of investigation and not a number/string
produced at the end of a halting computation. This leads to interpreting P
systems as dynamic systems, possibly evolving forever, and this viewpoint raises
specific questions, different from the computer science ones. Such an approach (P
systems as dynamic systems) was started by V. Manca and F. Bernardini, and
promises to be of interest for biological applications (see also the next section).
At a theoretical level, a fruitful recent idea is to associate with a P system
(with string-objects) not only one language, as usual for grammar-like or automata-like devices, but a family of languages. This recalls the “old” idea of grammar
forms, but also the forbidding-enforcing systems [3]. Actually, M. Cavaliere and
N. Jonoska have started from such a possible bridge between forbidding-enforcing
systems and membrane systems, considering P systems with a way to define the
new populations of strings in terms of forbidding-enforcing conditions. A different
idea in defining a family of languages as “generated” by a P system was followed
by A. Alhazov.
Returning to the abundance of universality results, which somehow ends the research interest in the respective classes of P systems (the equivalence with
Turing machines directly implies conclusions regarding decidability, complexity,
closure properties, etc.), a related question of interest is to investigate the sub-universal classes of P systems. For instance, several universality results refer to
systems with arbitrary catalytic rules (of the form ca → cu), used together with
non-catalytic rules; also, a given number of membranes is necessary (although
in many cases one does not know the sharp borderline between universality
and sub-universality from this point of view). What about the power and the
properties of P systems which are not universal? Some problems are shown to
be decidable for them; what is the complexity of these problems? What are the closure properties of the associated families of numbers or of languages? Topics
of this type were addressed from time to time, but recently O.H. Ibarra and his
group started a systematic study, considering both new (restricted) classes of
P systems and new problems (e.g., the reachability of a configuration and the
complexity of deciding it).
Rather promising seems to be the use of P systems for handling two-dimensional objects. There are several papers in this area, dealing with graphs,
arrays, other types of pictures (R. Freund, M. Oswald, K. Krithivasan and her
group, R. Ceterchi, R. Gramatovici, N. Jonoska, K.G. Subramanian, etc). Especially interesting is the following idea (suggested several times in membrane
computing papers and now followed by R. Ceterchi and her colleagues in Tarragona): instead of using a membrane structure as a support of a computation
whose “main” subject are the objects present in the regions of the membrane
structure, let us take the tree which describes the membrane structure as the
subject of the computation, and use the contents of the regions as auxiliary tools
in the computation.
A very important direction of research – important especially from the point
of view of applications in biology and related areas – is to bring to membrane
computing some approximate reasoning tools, some non-crisp mathematics, in
the probability theory, fuzzy sets, or rough sets sense – or in a mixture of all
these. Randomized P algorithms, which solve hard problems in polynomial time,
using a polynomial space, with a controlled probability, were already proposed
by A. Obtulowicz, who has also started a systematic study of the possibility of modelling uncertainty in membrane computing.
It is highly probable that all these topics will be much investigated in the near
future, with a special emphasis on complexity matters and on issues related to
applications, to the adequacy of membrane computing to the biological reality.
10
Implementations and Applications
Some branches of natural computing, such as neural computing and evolutionary
computing, start from biology and try to improve the way we use existing electronic computers, while DNA computing has the ambition of finding a new support for computations, a new hardware. For membrane computing it is not yet clear in which direction we have to look for implementations. Anyway, it seems too early to try to implement computations at the level of a cell, however attractive this may seem.
However, there are several attempts to implement (actually, to simulate)
P systems on usual computers. Of course, the nice biochemically inspired features of P systems (in particular, the nondeterminism and the parallelism) are lost, as they can only be simulated on the usual deterministic computers, but the resulting simulators can still be useful for certain practical purposes (not to mention their didactical usefulness). At this moment, at least a dozen programs for implementing P systems of various types have been reported – see the references on the web page, where some programs are also available.
On the other hand, several applications of membrane computing were reported in the literature, in general, of the following type: one takes a piece of
reality, most frequently from cell biology, but also from artificial life, abstract
chemistry, biology of eco-systems, one constructs a P system modelling this piece
of reality, then one writes a program which simulates this P system and one runs
experiments, carefully arranging the system parameters (especially, the form of
rules and their probabilities to be applied); statistics about the populations of
objects in various compartments of the system are obtained, sometimes suggesting interesting conclusions. Typical examples can be found in [1] (including an
approach to the famous Brusselator model, with conclusions which fit with the
known ones, obtained by using continuous mathematics – by Y. Suzuki et al,
an investigation of photosynthesis – by T. Nishida, signaling pathways and T cell
activation – by G. Ciobanu and his collaborators). Several other (preliminary)
applications of P systems to cryptography, linguistics, distributed computing
can be found in the volumes [1,8], while [2] contains a promising application in
writing algorithms for sorting.
The turn of the domain towards applications in biology is rather natural: P systems are (discrete, algorithmic, well investigated) models of the cell, and cell biologists lack efficient global models of the cell, in spite of the fact that modelling and simulating the living cell is a very important task (as has been stated in several places, this is one of the main challenges of bioinformatics at this beginning of the millennium).
11
Final Remarks
At the end of this brief and informal excursion into membrane computing, we stress that our goal was only to give a general impression of this fast-growing research area; hence we strongly suggest that the interested reader access the web page mentioned in the first section of the paper for any additional information. The page contains the full current bibliography, many downloadable papers, the addresses of people who have contributed to membrane computing, lists of open problems, calls for participation in related meetings, some software for simulating P systems, etc.
References
1. C.S. Calude, Gh. Păun, G. Rozenberg, A. Salomaa, eds., Multiset Processing. Mathematical, Computer Science, and Molecular Computing Points of View, Lecture
Notes in Computer Science, 2235, Springer, Berlin, 2001.
2. M. Cavaliere, C. Martin-Vide, Gh. Păun, eds., Proceedings of the Brainstorming Week on Membrane Computing; Tarragona, February 2003, Technical Report
26/03, Rovira i Virgili University, Tarragona, 2003.
3. A. Ehrenfeucht, G. Rozenberg, Forbidding-enforcing systems, Theoretical Computer Science, 292 (2003), 611–638.
4. O.H. Ibarra, On the computational complexity of membrane computing systems,
submitted, 2003.
5. K. Krithivasan, S.V. Varma, On minimising finite state P automata, submitted,
2003.
6. Gh. Păun, Computing with membranes, Journal of Computer and System Sciences,
61, 1 (2000), 108–143.
7. Gh. Păun, Computing with Membranes: An Introduction, Springer, Berlin, 2002.
8. Gh. Păun, G. Rozenberg, A. Salomaa, C. Zandron, eds., Membrane Computing.
International Workshop, WMC-CdeA 2002, Curtea de Argeş, Romania, Revised
Papers, Lecture Notes in Computer Science, 2597, Springer, Berlin, 2003.
9. M. Perez-Jimenez, A. Romero-Jimenez, F. Sancho-Caparrini, Teoría de la Complejidad en Modelos de Computación Celular con Membranas, Editorial Kronos, Sevilla, 2002.
10. P. Sosik, The computational power of cell division in P systems: Beating down
parallel computers?, Natural Computing, 2003 (in press).
11. C. Zandron, A Model for Molecular Computing: Membrane Systems, PhD Thesis,
Università degli Studi di Milano, 2001.
Classical Simulation Complexity of Quantum
Machines⋆
Farid Ablayev and Aida Gainutdinova
Dept. of Theoretical Cybernetics,
Kazan State University
420008 Kazan, Russia
{ablayev,aida}@ksu.ru
Abstract. We present a classical probabilistic simulation technique for quantum Turing machines. As a corollary of this technique we obtain several results on the relationship between classical and quantum complexity classes, such as: PrQP = PP, BQP ⊆ PP, and PrQSPACE(S(n)) = PrPSPACE(S(n)).
1
Introduction
Investigations of different aspects of quantum computation have in the last decade become a very intensively growing area of mathematics, computer science, physics, and technology. A good source of information on quantum computation is the book by Nielsen and Chuang [8].
Notice that in quantum mechanics and quantum computation the “right-left” presentation of the computational process is traditionally used. That is, the current general state of a quantum system is presented as a column vector |ψ⟩ which is multiplied by a unitary transition matrix U to obtain the next general state |ψ′⟩ = U|ψ⟩.
In this paper we use the “left-right” presentation of the quantum computational process (as is customary for the presentation of classical deterministic and stochastic computational processes). That is, the current general state of a quantum system is presented as a row vector ⟨ψ| (the elements of ⟨ψ| are the complex conjugates of the elements of |ψ⟩) which is multiplied by a unitary transition matrix W = U† to obtain the next general state ⟨ψ′| = ⟨ψ|W.
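The equivalence of the two presentations is easy to check numerically. A small sketch with NumPy, where the Hadamard matrix is merely a convenient example of a unitary U:

```python
import numpy as np

U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # a unitary (Hadamard)
W = U.conj().T                    # W = U† (conjugate transpose)

ket = np.array([1, 0], dtype=complex)   # |psi> as a vector
bra = ket.conj()                        # <psi|: complex conjugates of |psi>

ket_next = U @ ket                # "right-left" step: |psi'> = U |psi>
bra_next = bra @ W                # "left-right" step: <psi'| = <psi| W
```

The row result is the element-wise complex conjugate of the column result, so the two conventions describe exactly the same computation step.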
In the paper we consider probabilistic and quantum complexity classes. Here
BQSpace(S(n)) and PrQSpace(S(n)) stand for the complexity classes determined by O(S(n)) space bounded quantum Turing machines that recognize languages with bounded and unbounded error, respectively. PrSpace(S(n)) stands for the complexity class determined by O(S(n)) space bounded classical probabilistic Turing machines that recognize languages with unbounded error. BQTime(T(n)) and PrQTime(T(n)) stand for the complexity classes determined by O(T(n)) time bounded quantum Turing machines that recognize languages with bounded and
⋆
Supported by the Russia Fund for Basic Research under the grant 03-01-00769
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 296–302, 2003.
c Springer-Verlag Berlin Heidelberg 2003
unbounded error, respectively. PrTime(T(n)) stands for the complexity class determined by O(T(n)) time bounded classical probabilistic Turing machines that recognize languages with unbounded error. We assume T(n) ≥ n and S(n) ≥ log n are fully time and space constructible, respectively. For most of the paper, we will refer to the polynomial-time case, where T(n) = S(n) = n^O(1).
Classical simulations of quantum computational models use different techniques; see for example [3,9,10,6,7]. In our paper we view the computation process of classical one-tape probabilistic Turing machines (PTMs) and quantum Turing machines (QTMs) as a linear process. That is, a computation of a PTM on a particular input u is a Markov process, in which the vector of the probability distribution over configurations at a given step is multiplied by a fixed stochastic transition matrix M to obtain the vector of the probability distribution over configurations at the next step. A computation of a QTM is a unitary-linear process similar to the Markov process: a quantum computation step corresponds to multiplying the general state (the vector of amplitudes of all possible configurations) at the current step by a fixed complex unitary transition matrix to obtain the general state at the next step. We refer to the paper [6] for more information.
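As a small illustration of this linear view (a toy sketch, not a machine studied in the paper: the 3-configuration chain and the use of NumPy are our own choices), one step of the Markov process is just a vector-matrix product:

```python
import numpy as np

# Toy 3-configuration PTM: <p(i+1)| = <p(i)| M, with M row-stochastic.
M = np.array([
    [0.5, 0.5, 0.0],   # transitions out of configuration C1
    [0.0, 0.5, 0.5],   # transitions out of configuration C2
    [0.0, 0.0, 1.0],   # C3 is absorbing (e.g. a halting configuration)
])

p = np.array([1.0, 0.0, 0.0])  # start deterministically in C1
for _ in range(4):             # four computation steps
    p = p @ M                  # one step of the Markov process

print(p)                       # distribution over configurations: [0.0625 0.25 0.6875]
assert abs(p.sum() - 1.0) < 1e-12  # stochasticity is preserved at every step
```

The quantum case is structurally identical, with a unitary matrix of amplitudes in place of M.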
In the paper we present the classical Simulation Theorem 2 (a simulation technique for the quantum computation process), which states that given a unitary-linear process we can construct an equivalent (in the sense of language presentation) Markov process. This simulation technique allows us to gather together different complexity results on the classical simulation of quantum computations. As a corollary of Theorem 2 we have the following relations among complexity classes.
Theorem 1.
1. PrQTime(T(n)) = PrTime(T(n)); in particular PrQP = PP.
2. BQTime(T(n)) ⊆ PrTime(T(n)) [1]; in particular BQP ⊆ PP.
3. BQSpace(S(n)) ⊆ PrSpace(S(n)) [10] and PrQSpace(S(n)) = PrSpace(S(n)) [10].
Proof (Sketch): The quantum simulation technique for classical probabilistic Turing machines is well known; see for example [5,8]. This technique establishes the inclusions PrSpace(S(n)) ⊆ PrQSpace(S(n)) and PrTime(T(n)) ⊆ PrQTime(T(n)). The Simulation Theorem 2 and the observation (Section 4) prove the inclusions:
BQTime(T(n)) ⊆ PrTime(T(n)),
PrQTime(T(n)) ⊆ PrTime(T(n)),
BQSpace(S(n)) ⊆ PrSpace(S(n)),
PrQSpace(S(n)) ⊆ PrSpace(S(n)).
2 Classical Simulation of Quantum Turing Machines
We consider a two-tape Turing machine (probabilistic and quantum) with a read-only input tape and a read-write tape. We call a Turing machine M a t(n)-time,
F. Ablayev and A. Gainutdinova
s(n)-space machine if every computation of M on an input of length n halts in at most t(n) steps and uses at most s(n) cells on the read-write tape during a computation. We assume t(n) ≥ n and s(n) ≥ log n are fully time and space constructible respectively. We will always have s(n) ≤ t(n) ≤ 2^{O(s(n))}. By a configuration C of a Turing machine we mean the content of its read-write tape, the tape pointers, and the current state of the machine.
Definition 1. A probabilistic Turing machine (PTM) P consists of a finite set Q of states, a finite input alphabet Σ, a finite tape alphabet Γ, and a transition function
δ : Q × Σ × Γ × Q × Γ × {L, R} × {L, R} → [0, 1]
where δ(q, σ, γ, q′, γ′, d1, d2) gives the probability with which the machine in state q reading σ and γ will enter state q′, write γ′, and move in directions d1 and d2 on the read-only and read-write tapes respectively.
Definition 2. A quantum Turing machine (QTM) Q consists of a finite set Q of states, a finite input alphabet Σ, a finite tape alphabet Γ, and a transition function
δ : Q × Σ × Γ × Q × Γ × {L, R} × {L, R} → C
where C is the set of complex numbers and δ(q, σ, γ, q′, γ′, d1, d2) gives the amplitude with which the machine in state q reading σ and γ will enter state q′, write γ′, and move in directions d1 and d2 on the read-only and read-write tapes respectively.
Vector-Matrix Machine. From now on we will view a Turing machine computation as a linear process as described in [6]. Below we present a formal description of probabilistic and quantum machines in matrix form. For fairness we should only allow efficiently computable matrix entries, where the i-th bit can be computed in time polynomial in i.
First we define a general d-dimensional, t-time "vector-matrix machine" (d, t)-VMM that serves our needs for a linear presentation of the computation procedure of probabilistic and quantum machines. Fix an input u.
VMM(u) = (⟨a(0)|, T, F)
where ⟨a(0)| = (a_1, . . . , a_d) is an initial row vector for the input u, T is a d × d transition matrix, and F ⊆ {1, . . . , d} is an accepting set of states.
VMM(u) proceeds in t steps as follows: at each step i the current vector ⟨a(i)| is multiplied by the d × d matrix T to obtain the next vector ⟨a(i + 1)|, that is, ⟨a(i + 1)| = ⟨a(i)|T. From the resulting vector ⟨a(t)| we determine the numbers Pr^1_accept(u) and Pr^2_accept(u) as follows:
1. Pr^1_accept(u) = Σ_{i∈F} |a_i(t)|;
2. Pr^2_accept(u) = Σ_{i∈F} |a_i(t)|^2.
These numbers express the probability of accepting u for probabilistic and quantum machines respectively. We call a VMM(u) that uses Pr^1(VMM(u)) (respectively Pr^2(VMM(u))) as its acceptance probability a Type I VMM(u) (respectively a Type II VMM(u)).
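The two acceptance modes can be sketched directly from the definition (a hedged toy example: the run_vmm helper and the Hadamard-style matrix below are our own illustrative choices, not constructions from the paper):

```python
import numpy as np

def run_vmm(a0, T, F, t, type_ii=False):
    """Run a (d, t)-VMM: t multiplications of the row vector by T, then
    sum |a_i| (Type I) or |a_i|^2 (Type II) over the accepting set F."""
    a = np.array(a0, dtype=complex)
    T = np.array(T, dtype=complex)
    for _ in range(t):
        a = a @ T          # one linear step of the process
    if type_ii:
        return sum(abs(a[i]) ** 2 for i in F)
    return sum(abs(a[i]) for i in F)

# A Hadamard-like unitary as a toy quantum (Type II) transition matrix
H = [[2 ** -0.5, 2 ** -0.5], [2 ** -0.5, -(2 ** -0.5)]]
p = run_vmm([1, 0], H, F=[0], t=1, type_ii=True)
print(p)  # amplitude 1/sqrt(2) on configuration 0, so probability 0.5
```

Type I with a stochastic T recovers the probabilistic case, since the entries stay nonnegative and the absolute values are the probabilities themselves.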
Linear Presentation of a Probabilistic Machine. Let P be a t(n)-time, s(n)-space PTM. A computation of P on an input u of length n can be presented by a finite Markov chain with d(n) = 2^{O(s(n))} states (the states of this Markov chain correspond to configurations of the PTM) and a d(n) × d(n) stochastic matrix M. Notice that for a polynomial-time computation, given configurations C_i, C_j and input u, one can compute in polynomial time the probability M(i, j) of the transition from C_i to C_j, even though the whole transition matrix M is too big to write down in polynomial time. Formally, the computation on input u, |u| = n, can be described by a stochastic machine SM(u)
SM(u) = (⟨p(0)|, M, F)
where SM is a Type I (d(n), t(n))-VMM with the following restrictions: ⟨p(0)| = (p_1, . . . , p_{d(n)}) is the stochastic row vector of the initial probability distribution over configurations; that is, p_i = 1 and p_j = 0 for j ≠ i, where C_i is the initial configuration of P for the input u. M is the stochastic matrix defined above. F ⊆ {1, . . . , d(n)} is the set of indices of accepting configurations of P.
Linear Presentation of a Quantum Machine. Consider a t(n)-time, s(n)-space QTM Q. A computation of Q on an input u of length n can be presented by the following restricted quantum system (a unitary-linear process) with d(n) = 2^{O(s(n))} basis states corresponding to configurations of the QTM and a d(n) × d(n) complex-valued unitary matrix W. Notice that for a polynomial-time computation, given configurations C_i, C_j and input u, one can compute in polynomial time the amplitude W(i, j) of the transition from C_i to C_j, as for the PTM. Formally, the computation on input u, |u| = n, can be described by a linear machine LM(u)
LM(u) = (⟨μ(0)|, W, F)
where LM(u) is a Type II (d(n), t(n))-VMM with the following restrictions: ⟨μ(0)| = (z_1, . . . , z_{d(n)}) is the initial general state (the complex row vector of the initial amplitude distribution over configurations); namely, z_j = 0 for j ≠ i and z_i = 1, where C_i is the initial configuration of Q for the input u. W is the unitary matrix defined above. F ⊆ {1, . . . , d(n)} is the set of indices of accepting configurations of Q.
Language Acceptance Criteria. We use the standard unbounded-error and bounded-error acceptance criteria. For a language L and an n ≥ 1, denote L_n = L ∩ Σ^n. We say that a language L_n is unbounded-error recognized by a Type I (Type II) (d(n), t(n))-VMM if for an arbitrary input u ∈ Σ^n there exists a Type I (Type II) (d(n), t(n))-VMM(u) such that Pr(VMM(u)) > 1/2 for u ∈ L_n and Pr(VMM(u)) < 1/2 for u ∉ L_n. Similarly, we say that a language L_n is bounded-error recognized by a Type I (Type II) (d(n), t(n))-VMM if for some ǫ ∈ (0, 1/2) and arbitrary u ∈ Σ^n there exists a Type I (Type II) (d(n), t(n))-VMM(u) such that Pr(VMM(u)) ≥ 1/2 + ǫ for u ∈ L_n and Pr(VMM(u)) ≤ 1/2 − ǫ for u ∉ L_n. We say that VMM(u) processes its input u with threshold 1/2.
Let M be a classical probabilistic machine P or a quantum machine Q. We say that M unbounded-error (bounded-error) recognizes a language L ⊆ Σ^* if for all n ≥ 1 the corresponding (d(n), t(n))-VMM unbounded-error (bounded-error) recognizes the language L_n.
Theorem 2 (Simulation Theorem). Let a language L_n be unbounded-error (bounded-error) recognized by a quantum machine (d(n), t(n))-LM. Then there exists a stochastic machine (d′(n), t′(n))-SM that unbounded-error recognizes L_n with d′(n) ≤ 4d^2(n) + 3 and t′(n) = t(n).
We present a sketch of the proof of Theorem 2 in the next section.
3 Proof of Simulation Theorem
For the proof let us fix an arbitrary input u, |u| = n, and let d = d(n) and t = t(n). We call a VMM(u) complex-valued (real-valued) if the VMM has complex-valued (real-valued) entries in its initial vector and transition matrix.
Lemma 1. Let LM(u) be a complex-valued (d, t)-LM(u). Then there exists a real-valued (2d, t)-LM′(u) such that Pr(LM(u)) = Pr(LM′(u)).
Proof: The proof uses the real-valued simulation of complex-valued matrix multiplication (which is now folklore) and is omitted.
The next lemma states the complexity relation between machines of Type I and Type II (between "linear" and "nonlinear" extraction of the result of a computation).
Lemma 2. Let LM(u) be a real-valued (d, t)-LM(u). Then there exists a real-valued Type I (d^2, t)-VMM(u) such that Pr(VMM(u)) = Pr(LM(u)).
Proof: Let LM(u) = (⟨μ(0)|, W, F). We construct VMM(u) = (⟨τ(0)|, T, F′) as follows. The initial general state ⟨τ(0)| = ⟨μ(0) ⊗ μ(0)| is a d^2-dimensional vector, and T = W ⊗ W is a d^2 × d^2 matrix. The accepting set F′ ⊆ {1, . . . , d^2} of states is defined according to F ⊆ {1, . . . , d} as follows: F′ = {j : j = (i − 1)d + i, i ∈ F}.
We denote by |i⟩ the d-dimensional unit column vector with value 1 at position i and 0 elsewhere. Using the fact that for real-valued vectors c, b it holds that ⟨c|b⟩^2 = ⟨c ⊗ c|b ⊗ b⟩, and that T^t = (W ⊗ W)^t = W^t ⊗ W^t, we have
Pr(VMM(u)) = Σ_{j∈F′} ⟨τ(0)|T^t|j⟩ = Σ_{i∈F} ⟨μ(0) ⊗ μ(0)|W^t ⊗ W^t|i ⊗ i⟩ = Σ_{i∈F} ⟨μ(0)|W^t|i⟩^2 = Pr(LM(u)).
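The two identities used in this proof, ⟨c|b⟩^2 = ⟨c ⊗ c|b ⊗ b⟩ for real vectors and (W ⊗ W)^t = W^t ⊗ W^t (the mixed-product property of the Kronecker product), are easy to check numerically. This is our own verification sketch with random test data, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
c, b = rng.standard_normal(3), rng.standard_normal(3)

# <c|b>^2 = <c (x) c | b (x) b> for real-valued vectors
lhs = np.dot(c, b) ** 2
rhs = np.dot(np.kron(c, c), np.kron(b, b))
assert abs(lhs - rhs) < 1e-9

# (W (x) W)^t = W^t (x) W^t  (mixed-product property)
W = rng.standard_normal((3, 3))
t = 4
left = np.linalg.matrix_power(np.kron(W, W), t)
right = np.kron(np.linalg.matrix_power(W, t), np.linalg.matrix_power(W, t))
assert np.allclose(left, right)
print("both identities hold")
```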
Lemma 3. Let (d, t)-VMM(u) be a real-valued Type I machine with k, k ≤ d, accepting states. Then there exists a real-valued Type I (d, t)-VMM′(u) with a unique accepting state such that Pr(VMM(u)) = Pr(VMM′(u)).
Proof: The proof uses a standard technique from linear automata theory (see for example the book [4]) and is omitted.
The next lemma presents the classical probabilistic simulation complexity of linear machines.
Lemma 4. Let VMM(u) be a real-valued Type I (d, t)-VMM(u). Then there exists a stochastic machine (d + 2, t)-SM(u) such that
Pr(SM(u)) = c^t Pr(VMM(u)) + 1/(d + 2)
where the constant c ∈ (0, 1] depends on VMM(u).
Proof: Let VMM(u) = (⟨τ(0)|, T, F). According to Lemma 3 we may assume VMM(u) has a unique accepting state. We construct SM(u) = (⟨p(0)|, M, F′) as follows. For the d × d matrix T we define the (d + 2) × (d + 2) matrix
    ( 0   0⋯0   0 )
A = ( b    T    0 )
    ( β    q    0 )
such that the sum of the elements of each row and each column of A is zero (we are free to select the elements of the column b, the row q, and the number β). The k-th power A^k of A preserves this property.
Now let R be the stochastic (d + 2) × (d + 2) matrix whose every (i, j)-entry is 1/(d + 2). Select a positive constant c ≤ 1 such that the matrix M, defined as
M = cA + R,
is a stochastic matrix. Further, by induction on k we have that the k-th power M^k of M is also a stochastic matrix with the same structure; that is,
M^k = c^k A^k + R.
(The induction uses AR = RA = 0, which follows from the zero row and column sums of A, together with R^2 = R.)
By selecting a suitable initial probability distribution ⟨p(0)| and accepting state we can pick out from M^t the entry we need (the entry that gives the probability of accepting u). From the construction of the stochastic machine (d + 2, t)-SM(u) we have that
Pr(SM(u)) = c^t Pr(VMM(u)) + 1/(d + 2).
Lemma 4 says that given a Type I (d, t)-VMM(u) that processes its input u with threshold 1/2, one can construct a stochastic machine (d + 2, t)-SM(u) that processes u with threshold λ = c^t · 1/2 + 1/(d + 2).
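The embedding of Lemma 4 can be prototyped concretely. This is our own hedged sketch: the particular choice of b, q and β below is one way to satisfy the zero-sum condition, and the constant c is deliberately conservative.

```python
import numpy as np

d = 2
rng = np.random.default_rng(1)
T = rng.standard_normal((d, d))

# Embed T into a (d+2)x(d+2) matrix A whose rows and columns all sum to zero,
# choosing the border entries (column b, row q, number beta) to cancel the sums.
A = np.zeros((d + 2, d + 2))
A[1:d+1, 1:d+1] = T
A[1:d+1, 0] = -T.sum(axis=1)      # column b cancels the row sums of T
A[d+1, 1:d+1] = -T.sum(axis=0)    # row q cancels the column sums of T
A[d+1, 0] = T.sum()               # beta restores the corner sums
assert np.allclose(A.sum(axis=0), 0) and np.allclose(A.sum(axis=1), 0)

R = np.full((d + 2, d + 2), 1.0 / (d + 2))   # uniform stochastic matrix
c = 1.0 / (d + 2) / (abs(A).max() + 1)       # small enough to keep M >= 0
M = c * A + R
assert np.allclose(M.sum(axis=1), 1) and (M >= 0).all()   # M is stochastic

# M^k = c^k A^k + R, since the zero row/column sums give AR = RA = 0, R^2 = R
k = 3
assert np.allclose(np.linalg.matrix_power(M, k),
                   c ** k * np.linalg.matrix_power(A, k) + R)
print("stochastic embedding verified")
```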
Lemma 5. Let (d, t)-SM(u) be a stochastic machine that processes its input u with threshold λ ∈ [0, 1). Then for an arbitrary λ′ ∈ (λ, 1) there exists a (d + 1, t)-SM′(u) that processes u with threshold λ′.
Proof: The proof uses a standard technique from probabilistic automata theory (see for example the book [4]) and is omitted.
4 Observation
For machines presented in vector-matrix form, Theorem 2 states the complexity characteristics of the classical simulation of quantum machines. The vector-matrix technique keeps the dimension of the classical machine close to the dimension of the quantum machine, and remarkably the simulation time does not increase. But from Lemma 4 we have that the stochastic simulation of a linear machine is not completely free of charge: we lose the ǫ-isolation of the threshold (the bounded-error acceptance property) of the machine.
Notice that we present our classical simulation technique for the quantum computation process (the Simulation Theorem) in the form of a vector-matrix machine VMM and omit a description of how to come back to the uniform Turing machine. Obviously, in the case of Turing machines such simulations incur a slowdown, but this slowdown keeps the simulations within the polynomial-time restriction. Recall that the threshold-changing technique for Turing machine models is well known (it was used for proving the NP ⊆ PP inclusion; see for example [2]).
Acknowledgments. We are grateful to the referees for helpful remarks and for pointing out that the technique of the paper [1] also works for proving the first statement, PrQTime(T(n)) = PrTime(T(n)), of Theorem 1.
References
1. L. Adleman, J. DeMarrais, M. Huang. Quantum computability. SIAM J. on Computing, 26(5), (1997), 1524–1540.
2. J. Balcázar, J. Díaz, J. Gabarró. Structural Complexity I. An EATCS Series, Springer-Verlag, 1995.
3. E. Bernstein, U. Vazirani. Quantum complexity theory. SIAM J. Comput., 26(5), (1997), 1411–1473.
4. R. Bukharaev. The Foundation of the Theory of Probabilistic Automata. Moscow, Nauka, 1985. (In Russian).
5. J. Gruska. Quantum Computing. McGraw-Hill, 1999.
6. L. Fortnow. One complexity theorist's view of quantum computing. Theoretical Computer Science, 292(3), (2003), 597–610.
7. C. Moore, J. Crutchfield. Quantum automata and quantum grammars. Theoretical Computer Science, 237, (2000), 275–306.
8. M. Nielsen, I. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.
9. D. Simon. On the power of quantum computation. SIAM J. Comput., 26(5), (1997), 1474–1483.
10. J. Watrous. Space-bounded quantum complexity. Journal of Computer and System Sciences, 59(2), (1999), 281–326.
Using Depth to Capture Average-Case Complexity
Luís Antunes¹⋆, Lance Fortnow², and N.V. Vinodchandran³⋆⋆
¹ DCC-FC & LIACC, University of Porto, R. Campo Alegre, 823, 4150-180 Porto, Portugal. lfa@ncc.up.pt
² NEC Laboratories America, 4 Independence Way, Princeton, NJ 08540. fortnow@nec-labs.com
³ Department of Computer Science and Engineering, University of Nebraska. vinod@cse.unl.edu
Abstract. We give the first characterization of Turing machines that run in polynomial time on average. We show that a Turing machine M runs in average polynomial time if for all inputs x the Turing machine uses time exponential in the computational depth of x, where the computational depth is a measure of the amount of "useful" information in x.
1 Introduction
In theoretical computer science we analyze most algorithms based on their worst-case performance. Many algorithms with bad worst-case performance nevertheless perform well in practice: the instances that require a large running time rarely occur. Levin [Lev86] developed a theory of average-case complexity to capture this issue. Levin gives a clean definition of Average Polynomial Time for a given language L and a distribution µ. Some languages may remain hard in the worst case but can be solved in Average Polynomial Time for all reasonable distributions. We give a crisp formulation of such languages using computational depth as developed by Antunes, Fortnow and van Melkebeek [AFvM01].
Define depth^t(x) as the difference of K^t(x) and K(x), where K(x) is the usual Kolmogorov complexity and K^t(x) is the version where the running times are bounded by time t. The depth^t function [AFvM01] measures in some sense the "useful information" of a string.
We have two main results that hold for every language L.
1. If (L, µ) is in Average Polynomial Time for all P-samplable distributions µ, then there exists a Turing machine M computing L and a polynomial p such that for all x, the running time of M(x) is bounded by 2^{O(depth^p(x)+log |x|)}.
⋆ Research done during an academic internship at NEC. This author is partially supported by funds granted to LIACC through the Programa de Financiamento Plurianual, Fundação para a Ciência e Tecnologia and Programa POSI.
⋆⋆ Research done while a post-doctoral scientist at NEC Research Institute, Princeton.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 303–310, 2003.
© Springer-Verlag Berlin Heidelberg 2003
2. If there exists a Turing machine M and a polynomial p such that M computes L and for all inputs x the running time of M(x) is bounded by 2^{O(depth^p(x)+log |x|)}, then (L, µ) is in Average Polynomial Time for all P-computable distributions.
We do not get an exact characterization from these results: the first result requires P-samplable distributions and the second holds only for the smaller class of P-computable distributions. However, we can get an exact characterization by considering the time-bounded universal distribution m^t. We show that the following are equivalent for every language L and every polynomial p:
– (L, m^p) is in Average Polynomial Time.
– There is some Turing machine M computing L such that for all inputs x the running time of M is bounded by 2^{O(depth^p(x)+log |x|)}.
Since the polynomial-time-bounded universal distribution is dominated by a P-samplable distribution and dominates all P-computable distributions (see [LV97]), our main results follow from this characterization.
We prove our results for arbitrary time bounds t, and as we take t towards infinity we recover Li and Vitányi's [LV92] result that under the (non-time-bounded) universal distribution, the average-case complexity and the worst-case complexity coincide. Our theorems can thus be viewed as a time-bounded version of Li and Vitányi's result. This directly addresses the issue raised by Miltersen [Mil93] of relating a time-bounded version of Li and Vitányi's result with Levin's average-case complexity.
2 Preliminaries
We use the binary alphabet Σ = {0, 1} for encoding strings. Our computation model is prefix-free Turing machines: Turing machines with a one-way input tape (the input head can only read from left to right), a one-way output tape and a two-way work tape. The function log denotes log₂. All explicit resource bounds we use in this paper are time-constructible.
2.1 Kolmogorov Complexity and Computational Depth
We give the essential definitions and basic results in Kolmogorov complexity for our needs and refer the reader to the textbook by Li and Vitányi [LV97] for more details. We are interested in self-delimiting Kolmogorov complexity (denoted by K(·)).
Definition 1. Let U be a fixed prefix-free universal Turing machine. Then for any string x ∈ {0, 1}^*, the Kolmogorov complexity of x is K(x) = min_p{|p| : U(p) = x}.
For any time-constructible t, the t-time-bounded Kolmogorov complexity of x is K^t(x) = min_p{|p| : U(p) = x in at most t(|x|) steps}.
The Kolmogorov complexity of a string is a rigorous measure of the amount of information contained in it. A string with high Kolmogorov complexity contains lots of information. A random string has high Kolmogorov complexity and hence is very informative. However, intuitively, the very fact that it is random restricts its utility in computational complexity theory. How can we measure the nonrandom information in a string?
Antunes, Fortnow and van Melkebeek [AFvM01] propose a notion of computational depth as a measure of the nonrandom information in a string. Intuitively, strings of high depth are strings of low Kolmogorov complexity (and hence nonrandom), but a resource-bounded machine cannot identify this fact. Indeed, Bennett's logical depth [Ben88] can be viewed as such a measure, but its definition is rather technical. Antunes, Fortnow and van Melkebeek suggest that the difference between two Kolmogorov complexity measures captures the intuitive notion of nonrandom information. Based on this intuition, and with simplicity in mind, in this work we use the following depth measure.
Definition 2 (Antunes-Fortnow-van Melkebeek). Let t be a constructible time bound. For any string x ∈ {0, 1}^*,
depth^t(x) = K^t(x) − K(x).
Average Case Complexity
We give the definitions from average-case complexity theory necessary for our purposes [Lev86]. For more details readers can refer to the survey by Jie Wang [Wan97]. In average-case complexity theory, a computational problem is a pair (L, µ) where L ⊆ Σ^* and µ is a probability distribution. The probability distribution is a function from Σ^* to the real interval [0, 1] such that Σ_{x∈Σ^*} µ(x) ≤ 1. For a probability distribution µ, the distribution function, denoted by µ^*, is given by µ^*(x) = Σ_{y≤x} µ(y). The notion of polynomial on average is central to the theory of average-case completeness.
Definition 3. Let µ be a probability distribution function on {0, 1}^*. A function f : Σ^+ → N is polynomial on µ-average if there exists an ǫ > 0 such that
Σ_x (f(x)^ǫ / |x|) µ(x) < ∞.
From the definition it follows that any polynomial is polynomial on µ-average for any µ. It is easy to show that if functions f and g are polynomial on µ-average, then the functions f · g, f + g, and f^k for a constant k are also polynomial on µ-average.
Definition 4. Let µ be a probability distribution and L ⊆ Σ^*. Then the pair (L, µ) is in Average Polynomial Time (denoted Avg-P) if there is a Turing machine accepting L whose running time is polynomial on µ-average.
We need the notion of domination for comparing distributions. The next
definition formalizes this notion.
Definition 5. Let µ and ν be two distributions on Σ^*. Then µ dominates ν if there is a constant c such that for all x ∈ Σ^*, µ(x) ≥ (1/|x|^c) ν(x). We also say ν is dominated by µ.
Proposition 1. If a function f is polynomial on µ-average, then for all distributions ν dominated by µ, f is also polynomial on ν-average.
Average-case analysis is, in general, sensitive to the choice of distribution: if we allow arbitrary distributions, then average-case complexity classes take the form of traditional worst-case complexity classes [LV92]. So it is important to restrict attention to distributions which are in some sense simple. Usually simple distributions are identified with the polynomial-time computable or polynomial-time samplable distributions.
Definition 6. Let t be a time-constructible function. A probability distribution function µ on {0, 1}^* is said to be t-time computable if there is a deterministic Turing machine that on every input x and positive integer k runs in time t(|x| + k) and outputs a fraction y such that |µ^*(x) − y| ≤ 2^{−k}.
The most controversial definition in average-case complexity theory is the association of the class of simple distributions with P-computable, which may seem too restrictive. Ben-David et al. [BCGL92] introduced a wider family of natural distributions, P-samplable, consisting of distributions that can be sampled by randomized algorithms working in time polynomial in the length of the sample generated.
Definition 7. A probability distribution µ on {0, 1}^* is said to be P-samplable if there is a probabilistic Turing machine M which on input 0^k produces a string x such that |Pr(M(0^k) = x) − µ(x)| ≤ 2^{−k} and M runs in time poly(|x| + k).
Every P-computable distribution is also P-samplable; however, the converse is unlikely.
Theorem 1 ([BCGL92]). If one-way functions exist, then there is a P-samplable probability distribution µ which is not dominated by any polynomial-time computable probability distribution ν.
Universal Distributions
The Kolmogorov complexity function K(·) naturally defines a probability distribution on Σ^*: to any string x assign the probability 2^{−K(x)}. Kraft's inequality implies that this indeed is a probability distribution. This distribution is called the universal distribution and is denoted by m. The universal distribution has many equivalent formulations and many nice properties; refer to the textbook by Li and Vitányi [LV97] for an in-depth study of m. The main drawback of m is that it is not computable. In this paper we consider the resource-bounded version of the universal distribution.
Definition 8. The t-time-bounded universal distribution m^t is given by m^t(x) = 2^{−K^t(x)}.
One important property of m^t is that it dominates certain computable distributions.
Theorem 2 ([LV97]). m^t dominates any t/n-time computable distribution.
Proof. (Sketch) Let µ be a t/n-time computable distribution and let µ^* denote the distribution function of µ. We will show that for any x ∈ Σ^n, K^t(x) ≤ − log(µ(x)) + C_µ for a constant C_µ which depends on µ. Let B_i = {x ∈ Σ^n | 2^{−(i+1)} ≤ µ(x) < 2^{−i}}. Since for any x in B_i, µ(x) ≥ 2^{−(i+1)}, we have that |B_i| ≤ 2^{i+1}. Consider the real interval [0, 1] and divide it into intervals of size 2^{−i}. Since µ(x) ≥ 2^{−(i+1)}, for any j, 0 ≤ j ≤ 2^i, the j-th interval [j2^{−i}, (j + 1)2^{−i}] will have at most one x ∈ B_i such that µ(x) ∈ [j2^{−i}, (j + 1)2^{−i}]. Since µ is t/n-computable, for any x ∈ B_i, given j, we can do a binary search to output the unique x satisfying µ(x) ∈ [j2^{−i}, (j + 1)2^{−i}]. This involves computing µ^* correct up to 2^{−(i+1)}. So the total running time of the process is bounded by O((t/n) · n). Hence we have the theorem.
Note that m^t approaches m as t → ∞. In the proof of Theorem 2, m^t dominates t/n-time computable distributions very strongly, in the sense that m^t(x) ≥ (1/2^{C_µ}) µ(x). The definition of domination that we follow only requires m^t to dominate µ within a polynomial.
It is then natural to ask if there exists a polynomial-time computable distribution dominating m^t. Schuler [Sch99] showed that if such a distribution exists, then no polynomially secure pseudo-random generator exists. Pseudo-random generators are efficiently computable functions which stretch a seed into a long string so that for a random input the output looks random to a resource-bounded machine.
Theorem 3 ([Sch99]). If there exists a polynomial-time computable distribution that dominates m^t, then pseudo-random generators do not exist.
While it is unlikely that there are polynomial-time computable distributions dominating universal distributions, we show that there are P-samplable distributions dominating the time-bounded universal distributions.
Lemma 1. For any polynomial t, there is a P-samplable distribution µ which dominates m^t.
Proof. (Sketch) We define a samplable distribution µ^t by prescribing a sampling algorithm for µ^t as follows. Let U be the universal machine.
Sample n ∈ N with probability 1/n².
Sample 1 ≤ j ≤ n with probability 1/n.
Sample y ∈ Σ^j uniformly.
Run U(y) for t steps. If U stops and outputs a string x ∈ Σ^n, output x.
For any string x of length n, K^t(x) ≤ n. Hence it is clear that the probability that x is output is at least (1/n³) 2^{−K^t(x)}.
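The sampler in this proof can be sketched as follows (a hedged toy: run_u stands in for the universal machine U, n is truncated at n_max so the 1/n² weights are only proportional to the true ones, and the identity "machine" is purely for demonstration):

```python
import random

def sample(t, n_max, run_u):
    """One draw from the P-samplable distribution of Lemma 1 (truncated).
    run_u(y, t) is a stand-in for running the universal machine U on
    program y for t steps; it returns the output string or None."""
    weights = [1.0 / k ** 2 for k in range(1, n_max + 1)]
    n = random.choices(range(1, n_max + 1), weights=weights)[0]  # ~ 1/n^2
    j = random.randint(1, n)                              # 1 <= j <= n, prob 1/n
    y = "".join(random.choice("01") for _ in range(j))    # uniform y in Sigma^j
    x = run_u(y, t)
    return x if x is not None and len(x) == n else None   # keep only x in Sigma^n

# Toy "universal machine": the identity program, always halting in time.
random.seed(0)
draws = [sample(t=100, n_max=8, run_u=lambda y, t: y) for _ in range(1000)]
hits = [x for x in draws if x is not None]
print(len(hits), hits[:3])
```

Any x of length n with a t-fast program of length at most n is produced with probability at least (1/n³) 2^{−K^t(x)}, which is exactly the domination the lemma needs.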
3 Computational Depth and Average Polynomial Time
We state our main theorem, which relates computational depth to average polynomial time.
Theorem 4. Let T be a constructible time bound. Then for any time-constructible t, the following statements are equivalent.
1. T(x) ∈ 2^{O(depth^t(x)+log |x|)}.
2. T is polynomial on m^t-average.
In [LV92], Li and Vitányi showed that when the inputs to any algorithm are distributed according to the universal distribution, the algorithm's average-case complexity is of the same order of magnitude as its worst-case complexity. Rephrasing this connection in the setting of average polynomial time, we can make the following statement.
Theorem 5 (Li-Vitányi). Let T be a constructible time bound. The following statements are equivalent.
1. T(x) is bounded by a polynomial in |x|.
2. T is polynomial on m-average.
As t → ∞, K^t approaches K. So depth^t approaches 0 and m^t approaches m. Hence our main theorem can be seen as a generalization of Li and Vitányi's theorem.
We can apply the implication (1 ⇒ 2) of the main theorem in the following way. Let M be a Turing machine and let L(M) denote the language accepted by M. Let T_M denote its running time. If T_M(x) ∈ 2^{O(depth^t(x)+log |x|)}, then (L(M), µ) is in Avg-P for any µ which is computable in time t/n. The following corollary follows from our main theorem and the universality of m^t (Theorem 2).
Corollary 1. Let M be a deterministic Turing machine whose running time is bounded by 2^{O(depth^t(x)+log |x|)} for some polynomial t. Then for any t/n-computable distribution µ, the pair (L(M), µ) is in Avg-P.
Hence a sufficient condition for a language L (accepted by M) to be in Avg-P with respect to all polynomial-time computable distributions is that the running time of M is bounded by an exponential in depth^t for all polynomials t. An obvious question that arises is whether this condition is necessary. We have already partially answered this question (Lemma 1) by exhibiting an efficiently samplable distribution µ^t that dominates m^t. Hence if (L(M), µ^t) is in Avg-P then (L(M), m^t) is also in Avg-P. From the implication (2 ⇒ 1) of the main theorem, we have that T_M(x) ∈ 2^{O(depth^t(x)+log |x|)}.
From Lemma 1 we get that if a machine runs in time polynomial on average for all P-samplable distributions, then it runs in time exponential in its depth.
Corollary 2. Let M be a machine which runs in time T_M. Suppose that for all P-samplable distributions µ, T_M is polynomial on µ-average. Then T_M(x) ∈ 2^{O(depth^t(x)+log |x|)} for some polynomial t.
We now prove our main theorem.
Proof. (Theorem 4) (1 ⇒ 2). We show that statement 1 implies that T(x) is polynomial on m^t-average. Let T(x) ∈ 2^{O(depth^t(x)+log |x|)}. Because of the closure properties of functions which are polynomial on average, it is enough to show that the function T′(x) = 2^{depth^t(x)} is polynomial on m^t-average. This essentially follows from the definitions and Kraft's inequality. The details are as follows. Consider the sum
Σ_{x∈Σ^*} (T′(x)/|x|) m^t(x) = Σ_{x∈Σ^*} (2^{depth^t(x)}/|x|) 2^{−K^t(x)}
= Σ_{x∈Σ^*} (2^{K^t(x)−K(x)}/|x|) 2^{−K^t(x)}
≤ Σ_{x∈Σ^*} 2^{−K(x)}/|x|
< Σ_{x∈Σ^*} 2^{−K(x)} < 1.
The last inequality is Kraft's inequality.
(2 ⇒ 1) Let T(x) be a time-constructible function which is polynomial on m^t-average. Then for some ǫ > 0 we have
Σ_{x∈Σ^*} (T(x)^ǫ/|x|) m^t(x) < 1.
Define S_{i,j,n} = {x ∈ Σ^n | 2^i ≤ T(x) < 2^{i+1} and K^t(x) = j}. Let 2^r be the approximate size of S_{i,j,n}. Then the Kolmogorov complexity of the elements of S_{i,j,n} is r up to an additive log n factor. The following claim (proof omitted) states this fact more formally.
Claim. For i, j ≤ n², let 2^r ≤ |S_{i,j,n}| < 2^{r+1}. Then for any x ∈ S_{i,j,n}, K(x) ≤ r + O(log n).
Consider the above sum restricted to elements of S_{i,j,n}. Then we have
Σ_{x∈S_{i,j,n}} (T(x)^ǫ/|x|) m^t(x) < 1.
Here T(x) ≥ 2^i, m^t(x) = 2^{−j}, and there are at least 2^r elements in the above sum. Hence the above sum is lower-bounded by the expression 2^r · 2^{iǫ} · 2^{−j}/|x|^c for some constant c. This gives us
1 > Σ_{x∈S_{i,j,n}} (T(x)^ǫ/|x|) m^t(x) ≥ 2^r · 2^{iǫ} · 2^{−j}/|x|^c = 2^{iǫ+r−j−c log n}.
That is, iǫ + r − j − c log n < 1. From the Claim it follows that there is a constant d such that for all x ∈ S_{i,j,n}, iǫ ≤ depth^t(x) + d log |x|. Hence T(x) ≤ 2^{i+1} ≤ 2^{(d/ǫ)(depth^t(x)+log |x|)}.
Acknowledgment. We thank Paul Vitányi for useful discussions.
References
[AFvM01] Luis Antunes, Lance Fortnow, and Dieter van Melkebeek. Computational depth. In Proceedings of the 16th IEEE Conference on Computational Complexity, pages 266–273, 2001.
[BCGL92] S. Ben-David, B. Chor, O. Goldreich, and M. Luby. On the theory of average case complexity. J. Computer System Sci., 44(2):193–219, 1992.
[Ben88] Charles H. Bennett. Logical depth and physical complexity. In R. Herken, editor, The Universal Turing Machine: A Half-Century Survey, pages 227–257. Oxford University Press, 1988.
[HILL99] Johan Håstad, Russell Impagliazzo, Leonid A. Levin, and Michael Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, August 1999.
[Lev86] Leonid A. Levin. Average case complete problems. SIAM Journal on Computing, 15(1):285–286, 1986.
[Lev84] Leonid A. Levin. Randomness conservation inequalities: information and independence in mathematical theories. Information and Control, 61:15–37, 1984.
[LV92] Ming Li and Paul M. B. Vitányi. Average case complexity under the universal distribution equals worst-case complexity. Information Processing Letters, 42(3):145–149, May 1992.
[LV97] Ming Li and Paul M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, 2nd edition, 1997.
[Mil93] Peter Bro Miltersen. The complexity of malign measures. SIAM Journal on Computing, 22(1):147–156, 1993.
[Sch99] Rainer Schuler. Universal distributions and time-bounded Kolmogorov complexity. In Proc. 16th Annual Symposium on Theoretical Aspects of Computer Science, pages 434–443, 1999.
[Wan97] Jie Wang. Average-case computational complexity theory. In Alan L. Selman, editor, Complexity Theory Retrospective, volume 2. 1997.
Non-uniform Depth of Polynomial Time and Space Simulations

Richard J. Lipton¹ and Anastasios Viglas²

¹ College of Computing, Georgia Institute of Technology and Telcordia Applied Research
rjl@cc.gatech.edu
² University of Toronto, Computer Science Department, 10 King's College Road, Toronto, ON M5S 3G4, Canada
aviglas@cs.toronto.edu
Abstract. We discuss some connections between polynomial time and non-uniform, small depth circuits. A connection is shown with simulating deterministic time in small space. The well known result of Hopcroft, Paul and Valiant [HPV77] showing that space is more powerful than time can be improved, by making an assumption about the connection of deterministic time computations and non-uniform, small depth circuits. More precisely, we prove the following: if every linear time deterministic computation can be done by non-uniform circuits of polynomial size and sub-linear depth, then DTIME(t) ⊆ DSPACE(t^{1−ε}) for some constant ε > 0. We can also apply the same techniques to prove an unconditional result, a trade-off type of theorem for the size and depth of a non-uniform circuit that simulates a uniform computation.

Keywords: Space simulations, non-uniform depth, block respecting computation.
1 Introduction
We present an interesting connection between non-uniform characterizations of polynomial time and time versus space results.
Hopcroft, Paul and Valiant [HPV77] proved that space is more powerful than time: DTIME(t) ⊆ DSPACE(t/log t). The proof of this trade-off result is based on pebbling techniques and the notion of block respecting computation. Improving the space simulation of deterministic time has been a long-standing open problem. Paul, Tarjan and Celoni [PTC77] proved an n/log n lower bound for pebbling a certain family of graphs. This lower bound implies that the trade-off result DTIME(t) ⊆ DSPACE(t/log t) of [HPV77] cannot be improved using similar pebbling arguments.
In this work we present a connection between space simulations of deterministic time and the depth of non-uniform circuits simulating polynomial time computations. This connection gives a way to improve the space simulation result from [HPV77] mentioned above, by making a non-uniform assumption. If
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 311–320, 2003.
c Springer-Verlag Berlin Heidelberg 2003
every problem in linear deterministic time can be solved by polynomial size non-uniform circuits of small (sub-linear) depth, then every deterministic computation of time t can be simulated in space t^{1−ε} for some constant ε > 0 (that depends only on our assumption about the non-uniform depth of linear time):

DTIME(n) ⊆ SIZE−DEPTH(poly(n), n^δ) =⇒ DTIME(t) ⊆ DSPACE(t^{1−ε})   (1)

where δ < 1 and ε > 0. Note that we allow the size of the non-uniform circuit to be any polynomial. Since DTIME(t) ⊆ SIZE(t · log t) (proved in [PF79]), our assumption basically asks to reduce the depth of the non-uniform circuit by a small amount, allowing the size to increase by any polynomial factor.
It is interesting to note that in this result, a non-uniform assumption is used (P has small non-uniform depth) to prove a purely uniform result (deterministic time can be simulated in small space). This can also be considered as an interesting result on the power of non-uniformity: if non-uniformity is powerful enough to allow small depth circuits for linear time deterministic computations, then we can improve the space-bounded simulation of deterministic time given by Hopcroft, Paul and Valiant.
A related result was shown by Sipser [Sip86,Sip88] from the point of view of reducing the randomness required by randomized algorithms. His result considers the problem of constructing expanders with certain properties. Assuming that those expanders can be constructed efficiently, the main theorem proved is that either P is equal to RP or the space simulation of Hopcroft, Paul and Valiant [HPV77] can be improved: under the hypothesis that certain expanders have explicit constructions, there exists an ε > 0 such that

(P = RP) or (DTIME(t) ∩ 1*) ⊆ DSPACE(t^{1−ε})   (2)
An explicit construction for the expanders mentioned above was given by
Saks, Srinivasan and Zhou [SSZ98]. The theorem mentioned above reveals a
deep connection between pseudo-randomness and efficient space simulations (for
unary languages): either space bounded simulations for deterministic time can
be improved, or we can construct (pseudorandom) sequences that can be used
to improve the derandomization of certain algorithms. On the other hand, the result we present in this work gives a connection between the power of non-uniformity and the power of space bounded computations.
Other related results include Dymond and Tompa [DT85], where it is shown that DTIME(t) ⊆ ATIME(t/log t), improving the Hopcroft, Paul and Valiant theorem, and Paterson and Valiant [PV76], proving SIZE(t) ⊆ DEPTH(t/log t).
We also show how to apply the same techniques to prove an unconditional trade-off type of result for the size and depth of a non-uniform circuit that simulates a uniform computation. Any deterministic time t computation can be simulated by a non-uniform circuit of size roughly 2^{√t} and depth √t, which has "semi-unbounded" fan-in: all AND gates have polynomially bounded fan-in and OR gates are unbounded, or vice versa. Similar results were given in [DT85], showing that time t is in PRAM time √t.
2 Notation – Definitions
We use the standard notation for time and space complexity classes DTIME(t) and DSPACE(t). SIZE−DEPTH(s, d) will denote the class of non-uniform circuits with size (number of gates) O(s) and depth O(d). We also use NC/poly (NC with polynomial advice) to denote the class of non-uniform circuits of polynomial size and poly-logarithmic depth, SIZE−DEPTH(poly, polylog). At some points in the paper, we will also avoid writing poly-logarithmic factors in detail and use the notation Õ(n) to denote O(n log^k n) for constant k. In this work we consider time complexity functions that are time constructible: a function t(n) is called fully time constructible if there exists a deterministic Turing machine that on input of length n halts after exactly t(n) steps. In general, a function f(n) is t-time constructible if there is a deterministic Turing machine that on input x outputs 1^{f(|x|)} and runs in time O(t). (t, s)-time-space constructible functions are defined similarly. We also use "TM" for "deterministic Turing machine".
For the proof of the main result we use the notion of block respecting Turing machines introduced by Hopcroft, Paul and Valiant in [HPV77].
Fig. 1. Block respecting computation (the computation of t steps is split into segments of b steps, and each tape into blocks of b bits).
Definition 1. Let M be a machine running in time t(n), where n is the length of its input x. Let the computation of M be partitioned into a(n) segments, where each segment consists of b(n) consecutive steps, a(n) · b(n) = t(n). Let also the tapes of M be partitioned into a(n) blocks, each consisting of b(n) bits (cells) on each tape. We call M block respecting if during each segment of its computation, each head visits only one block on each tape.
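As an illustration, Definition 1 can be phrased as a small predicate over a trace of head positions for one tape. This is only a sketch with an assumed encoding (a list giving the cell visited at each step), not part of the paper:

```python
def is_block_respecting(head_positions, b):
    """Check Definition 1 for one tape.

    head_positions: assumed encoding -- head_positions[s] is the tape cell
    visited at step s (0-indexed); b is the common segment and block size.
    The machine is block respecting if within every segment of b consecutive
    steps the head stays inside a single block of b cells.
    """
    t = len(head_positions)
    for seg_start in range(0, t, b):
        segment = head_positions[seg_start:seg_start + b]
        # the block index of cell p is p // b; all cells visited during
        # one segment must share a single block index
        blocks = {p // b for p in segment}
        if len(blocks) > 1:
            return False
    return True
```

For example, a head sweeping right one cell per step is block respecting (each segment of b steps stays in one block), while a head oscillating across a block boundary is not.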
Every Turing machine can be converted to a block respecting machine with only a constant factor slow-down in its running time. The construction is simple: let M be a deterministic Turing machine running in time t. Break the computation steps (1 . . . t) into segments of size B. Break the work tapes into blocks of the same size B. If at the start of a computation segment σ the work tape head is in block b_j, then during the computation steps (B steps) of that segment, the head could only visit the adjacent blocks, b_{j−1} or b_{j+1}. Keep a copy of those two blocks along with b_j and do all the computation of segment σ reading and updating from those copies (if needed). At the end of the computation of every segment, there is a clean-up step: update the blocks b_{j−1} and b_{j+1} and move the work tape head to the appropriate block to start the computation of the next segment. This construction can be done for different block sizes B. For our purposes B will be t^c for a small constant c < 1.
Block respecting Turing machines are also used in [PPST83] to prove that
non-deterministic linear time is more powerful than deterministic linear time
(see also [PR81] for a generalization of the results from [HPV77] for RAMs and
other machine models).
3 Main Results
We show that if linear time has small non-uniform circuit depth (for polynomial size circuits) then DTIME(t) ⊆ DSPACE(t^{1−ε}) for a constant ε > 0.
To be more precise, the strongest form of the main result is the following: if (deterministic) linear time has polynomial size, non-uniform circuits of sublinear depth (for example depth n^δ for 0 < δ < 1), then DTIME(t) ⊆ DSPACE(t^{1−ε}) for a small positive ε > 0:

DTIME(n) ⊆ SIZE−DEPTH(poly, n^δ) =⇒ DTIME(t) ⊆ DSPACE(t^{1−ε})   (3)
The main idea is the following: start with a deterministic Turing machine M running in time t and convert it to a block respecting machine MB with block size B. In each segment of the computation, MB reads and/or writes in exactly one block on each tape. We will argue that we can check the computation in each such segment with the same sub-circuit, and that we can actually construct this sub-circuit with polynomial size and small (poly-logarithmic or sub-linear) depth. Combining all these sub-circuits together we can build a larger circuit that will check the entire computation of MB in small depth. The final step is a technical lemma that shows how to evaluate this circuit in small space (equal to its depth).
We start by proving the main theorem using the assumption P ⊆ NC/poly. It is easy to see that an assumption of the form DTIME(n) ⊆ NC/poly implies P ⊆ NC/poly by padding arguments.
Theorem 1. Let t be a polynomial time complexity function. If P ⊆ NC/poly then DTIME(t) ⊆ DSPACE(t^{1−ε}) for some constant ε > 0.

Proof. (Any "reasonable" time complexity function could be used in the statement of this theorem.) Consider any Turing machine M running in deterministic time t. Here is how to simulate M in small space using the assumption that polynomial time has shallow (poly-logarithmic depth) polynomial size circuits:
1. Convert the given TM into a block respecting machine with block size B.
2. Construct the graph that describes the computation. Each vertex corresponds to a computation segment of B steps.
3. The computation on each vertex can be checked by the same TM U that runs in polynomial (in fact linear) time.
4. Since P ⊆ NC/poly, there is a circuit UC that can replace U. UC has polynomial size and polylogarithmic depth.
5. Construct UC by trying all possible circuits.
6. Plug the sub-circuit UC into the entire graph. This graph is the description of a circuit of small depth that corresponds to the computation of the given TM. Evaluate the circuit (in small space).
In more detail: convert M to a block respecting machine MB. Break the computation of MB (on input x) into segments of size B each; the number of segments is t/B. Consider the directed graph G corresponding to the computation of the block respecting machine as described in [HPV77]: G has one vertex for every time segment (that is, t/B vertices) and the edges are defined from the sequence of head positions. Let v(∆) denote the vertex corresponding to time segment ∆, and let ∆i be the last time segment before ∆ during which the i-th head was scanning the same block as during segment ∆. Then the edges of G are v(∆ − 1) → v(∆) and, for all 1 ≤ i ≤ l, v(∆i) → v(∆). The number of edges can be at most O(t/B) and therefore the number of bits required to describe the graph is O((t/B) log(t/B)).

Fig. 2. Graph description of a block respecting computation.

Figure 2 shows the idea behind the construction of the graph for the block respecting computation. The computation is partitioned into segments of size B. Every segment corresponds to a vertex (denoted by a circle in Figure 2). Each segment will access only one block on each tape. Figure 2 shows the tape blocks which are read during a computation segment (input blocks for that vertex) and those that will be written during the same segment (shown as output blocks). If a block is written during a segment and the
same block is read by another computation segment later in the computation, then the second segment depends directly on the previous one and there will be an edge connecting the corresponding vertices in our graph.
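The edge rule just described (v(∆ − 1) → v(∆), plus an edge from the last segment that touched the same block of each tape) can be sketched as follows. The trace encoding, one block index per tape per segment, is an assumption for illustration:

```python
def computation_graph(block_trace):
    """Build the [HPV77]-style dependency graph of a block respecting run.

    block_trace: assumed encoding -- block_trace[seg][i] is the index of the
    block that head i scans during segment seg.
    Returns the set of directed edges (u, v), meaning segment u's tape
    contents are needed by segment v.
    """
    edges = set()
    # last_seen[(i, blk)] = last segment during which head i scanned blk
    last_seen = {}
    for seg, blocks in enumerate(block_trace):
        if seg > 0:
            edges.add((seg - 1, seg))      # v(seg-1) -> v(seg)
        for i, blk in enumerate(blocks):
            if (i, blk) in last_seen:
                edges.add((last_seen[(i, blk)], seg))  # v(seg_i) -> v(seg)
            last_seen[(i, blk)] = seg
    return edges
```

Since each segment receives at most one edge per tape plus one from its predecessor, the number of edges is O(t/B), as stated in the text.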
Each vertex of this graph corresponds to B computation steps of MB . During
this computation, MB reads and writes only in one block from each tape. In order
to check the computation that corresponds to a vertex of this graph, we would
need to simulate MB for B steps and check O(B) bits from MB ’s tapes. For each
vertex we need to check/simulate a different segment of MB ’s computation: this
can be done by a Turing machine that will check the corresponding computation
of MB . We argue that the same Turing machine can be used on every vertex.
The computation we need to do on each vertex of the graph is essentially the
same: given the “input” and “output” contents of certain tape blocks, simulate
the machine MB for B steps and check if the output contents are correct. The
only thing that changes is the actual segment of the computation of MB that
we are going to simulate (which B steps of MB we should simulate). This means
that the exact same “universal” Turing machine checks the computation for each
segment/vertex, and this universal machine also takes as input the description
(for example the index of the part of the computation of the initial machine
MB it will need to simulate or any reasonable encoding) of the computation
that it needs to actually simulate on each vertex. Therefore we have the same
machine U on all vertices of the graph which runs in deterministic polynomial
time. If P ⊆ SIZE−DEPTH(n^k, log^l n) then U can be simulated by a circuit
Fig. 3. Insert the (same) sub-circuit on all vertices. (Each vertex's sub-circuit checks B computation steps on a tape block and has depth polylog B.)
UC of size O(B^k) and small depth O(log^l B), for some k, l. The same circuit is used on all vertices of the graph. In order to construct this circuit, we can try all possible circuits and simulate them on all possible inputs. This requires exponential time, but only a small amount of space: the size of the circuit is B^k and its depth polylogarithmic in B. We need Õ(B^k) bits to write down the circuit and only polylog space to evaluate it (using Lemma 1).
Once we have constructed UC, we can build the entire circuit that will simulate MB. This circuit derives directly from the (block respecting) computation graph, where each vertex is an instance of the sub-circuit UC. The size of the entire circuit is too big to write down: we have up to t/B sub-circuits (UC), which would require size Õ((t/B) · B^k) for some constant k. But since it is the same sub-circuit UC that appears throughout the graph, we can implicitly describe the entire circuit in much less space. For the evaluation of the circuit, we only need to be able to describe the exact position of a vertex in the graph, and determine the immediate neighbors of a given vertex (previous and next vertices). This can easily be done in space Õ(t/B + B^k).
In order to complete the simulation we need to show how to evaluate a small-depth circuit in small space (see Borodin [Bor77]).
Lemma 1. Consider a directed acyclic graph G with one source (root). Assume
that the leaves are labeled from {0, 1}, its inner nodes are either AND or OR
nodes and the depth is at most d. Then we can evaluate the graph in space at
most O(d).
Proof (of the lemma; see [Bor77] for more details).
Convert the graph to a tree (by making copies of the nodes). The tree will
have much bigger size but the depth will remain the same. We can prove (by
induction) that the value of the tree is the same as the value of the graph from
which we started. Evaluating the tree corresponds to computing the value of its
root. In order to find the value of any node v in the tree, proceed as follows: Let
u1 , . . . , uk denote the child-nodes of v.
If v is an AND node, then compute (recursively) the value of its first child
u1 . If value(u1 ) = 0 then the value of v is also 0. Otherwise continue with the
next child. If the last child has value 1 then the value of v is 1. Notice that we
do not need to remember the value of the child-nodes that we have evaluated.
If v is an OR node, the same idea can be applied. We can use a stack for the
evaluation of the tree. It is easy to see that the size of the stack will be at most
O(d), that is as big as the depth of the tree.
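The evaluation strategy in the proof of Lemma 1 can be sketched directly: child values are consumed immediately and never stored, so the recursion stack is the only memory, and its height is bounded by the depth d. The node encoding below is an assumption for illustration:

```python
def evaluate(node):
    """Evaluate an AND/OR dag node in space proportional to its depth.

    Assumed node encoding: a leaf is the integer 0 or 1; an inner node is a
    tuple (op, children) with op in {'AND', 'OR'}.  Children are evaluated
    one after another; a child's value is used immediately and forgotten,
    so the only space used is the recursion stack of height <= depth.
    """
    if node in (0, 1):                   # leaf: its label is its value
        return node
    op, children = node
    for child in children:
        v = evaluate(child)              # recurse into one child at a time
        if op == 'AND' and v == 0:
            return 0                     # short-circuit: AND is already 0
        if op == 'OR' and v == 1:
            return 1                     # short-circuit: OR is already 1
    return 1 if op == 'AND' else 0       # no short-circuit triggered
```

As in the lemma, a shared sub-dag may be re-evaluated many times (the implicit tree unfolding), trading time for space.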
The total amount of space used is

Õ(B^{2k} + (t/B) log^l B)   (4)

To get the desired result, we need to choose the size B of the blocks appropriately to balance the two terms in (4). B will be t^{1/c} for some constant c that is larger than k.
As mentioned above, the exact same proof would work even if we allow almost linear depth for the non-uniform circuits for just linear deterministic time, instead of all of P. The stronger theorem is the following:
Theorem 2. If DTIME(n) ⊆ SIZE−DEPTH(n^k, n^δ) for some k > 0 and δ < 1, then DTIME(t) ⊆ DSPACE(t^{1−ε}) where ε = (1 − δ)/(2k + 1).
Proof. From the proof of Theorem 1 we can calculate the space required for the simulation: in order to find the correct sub-circuit, which has size B^k and depth B^δ, we need O(B^{2k} log B) space to write it down and O(B^δ) to evaluate it. To evaluate the entire circuit, which has depth (t/B) · B^δ, we are only using space

O((t/B) · B^δ log B + (t/B) log t + B^{2k} log B)   (5)
The first term in equation (5) is the space required to evaluate the entire circuit, which has depth (t/B) · B^δ, and the second and third terms are the space required to write down an implicit description of the entire circuit (the description of the graph from the block respecting computation, and the description of the smaller sub-circuit).
The total space used (to find the correct sub-circuit and to evaluate the entire circuit) is

O((t/B) · B^δ log B + B^{2k} log B)   (6)
If we set B = t^{1/(2k+1)} then the space bound is

O(t^{1−(1−δ)/(2k+1)})   (7)

In these calculations, 2k + 1 just means something greater than 2k.
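Since one term of (6) decreases and the other increases in B, the choice of B can be checked by a short calculation (logarithmic factors suppressed, as the Õ notation permits); this derivation is added for clarity:

```latex
\frac{t}{B}\,B^{\delta}\Big|_{B=t^{1/(2k+1)}}
  = t\cdot t^{\frac{\delta-1}{2k+1}}
  = t^{\,1-\frac{1-\delta}{2k+1}},
\qquad
B^{2k}\Big|_{B=t^{1/(2k+1)}}
  = t^{\frac{2k}{2k+1}}
  \le t^{\frac{2k+\delta}{2k+1}}
  = t^{\,1-\frac{1-\delta}{2k+1}},
```

so both terms are O(t^{1−(1−δ)/(2k+1)}), matching (7) and giving the ε of Theorem 2.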
These proof ideas seem to fail if we try to simulate non-deterministic time in
small space. In that case, evaluating the circuit would be more complicated: we
would need to use more space in order to make sure that the non-deterministic
guesses are consistent throughout the evaluation of the circuit.
4 Semi-unbounded Circuits
These simulation ideas using block respecting computation can also be used to prove an unconditional result relating uniform polynomial time and non-uniform small depth circuits. The simulation of the previous section implies, unconditionally, a trade-off type of result for the size and depth of non-uniform circuits that simulate uniform computations. The next theorem proves that any deterministic time t computation can be simulated by a non-uniform circuit of size t · 2^{√t}, or 2^{O(√t)}, and depth √t, which has "semi-unbounded" fan-in. Previous work by Dymond and Tompa [DT85] also presents similar results, showing that deterministic time t is in PRAM time √t.

Theorem 3. Let t be a reasonable time complexity function. Then DTIME(t) ⊆ SIZE−DEPTH(2^{O(√t)}, √t), and the simulating circuits require exponential fan-in for AND gates and polynomial for OR gates (or vice versa).
Proof. Given a Turing machine running in DTIME(t), construct the block respecting version, and repeat the exact same construction as the one presented in the proof of Theorem 1: construct the graph describing the block respecting computation, which has t/B nodes, where every node corresponds to a segment of B computation steps (we will choose the size B later in the proof). Use this graph to construct the non-uniform circuit: for every node, build a circuit, say in DNF, that corresponds to the computation that takes place on that node. This circuit has size exponential in B in the worst case, 2^{O(B)}, and depth 2. The entire graph describes a circuit of size (t/B) · 2^{O(B)} and depth O(B). Also, note that for every sub-circuit that corresponds to each node, the input gates (AND gates as described in the proof) have a fan-in of at most O(B), while the second level might need exponential fan-in. This construction yields a circuit of "semi-unbounded" type fan-in.
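The size and depth bounds of Theorem 3 follow by fixing the block size; the text leaves the choice implicit, so the following check assumes B = √t:

```latex
\text{size} \;=\; \frac{t}{B}\cdot 2^{O(B)}
            \;=\; \sqrt{t}\cdot 2^{O(\sqrt{t})}
            \;=\; 2^{O(\sqrt{t})},
\qquad
\text{depth} \;=\; O\!\left(\frac{t}{B}\right)
             \;=\; O(\sqrt{t}),
```

where the depth bound comes from chaining the t/B constant-depth stages; for B = √t this agrees with the O(B) stated in the proof.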
5 Discussion – Open Problems
In this work we have shown a connection between the power of non-uniformity and the power of space bounded computation. The proof of the main theorem is based on the notion of block respecting computation and various techniques for simulating Turing machine computations. The main result states that if polynomial time has small non-uniform depth, then space can simulate deterministic time faster. An interesting open question is whether the same ideas can be used to prove a similar space simulation for non-deterministic time. It also seems possible that a result could be proved for probabilistic classes. A different approach would be to make a stronger assumption (about complexity classes) and reach a contradiction with some hierarchy theorem or other diagonalization result, thus proving a complexity class separation.
Acknowledgments. We would like to thank Nicola Galesi, Toni Pitassi and
Charlie Rackoff for many discussions on these ideas. Also many thanks to Dieter
van Melkebeek and Lance Fortnow.
References

[Bor77] A. Borodin. On relating time and space to size and depth. SIAM Journal on Computing, 6(4):733–744, December 1977.
[DT85] Patrick W. Dymond and Martin Tompa. Speedups of deterministic machines by synchronous parallel machines. Journal of Computer and System Sciences, 30(2):149–161, April 1985.
[HPV77] J. Hopcroft, W. Paul, and L. Valiant. On time versus space. Journal of the ACM, 24(2):332–337, April 1977.
[PF79] Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. Journal of the ACM, 26(2):361–381, April 1979.
[PPST83] Wolfgang J. Paul, Nicholas Pippenger, Endre Szemerédi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In 24th Annual Symposium on Foundations of Computer Science, pages 429–438, Tucson, Arizona, 7–9 November 1983. IEEE.
[PR81] W. Paul and R. Reischuk. On time versus space II. Journal of Computer and System Sciences, 22(3):312–327, June 1981.
[PTC77] Wolfgang J. Paul, Robert Endre Tarjan, and James R. Celoni. Space bounds for a game on graphs. Mathematical Systems Theory, 10:239–251, 1977.
[PV76] M. S. Paterson and L. G. Valiant. Circuit size is nonlinear in depth. Theoretical Computer Science, 2(3):397–400, September 1976.
[Sip86] M. Sipser. Expanders, randomness, or time versus space. In Alan L. Selman, editor, Proceedings of the Conference on Structure in Complexity Theory, volume 223 of LNCS, pages 325–329, Berkeley, CA, June 1986. Springer.
[Sip88] M. Sipser. Expanders, randomness, or time versus space. Journal of Computer and System Sciences, 36:379–383, 1988.
[SSZ98] Michael Saks, Aravind Srinivasan, and Shiyu Zhou. Explicit OR-dispersers with polylogarithmic degree. Journal of the ACM, 45(1):123–154, January 1998.
Dimension- and Time-Hierarchies for Small Time Bounds

Martin Kutrib

Institute of Informatics, University of Giessen
Arndtstr. 2, D-35392 Giessen, Germany
kutrib@informatik.uni-giessen.de
Abstract. Recently, infinite time hierarchies of separated complexity classes in the range between real time and linear time have been shown. This result is generalized to arbitrary dimensions. Furthermore, for fixed time complexities of the form id + r, where r ∈ o(id) is a sublinear function, proper dimension hierarchies are presented. The hierarchy results are established by counting arguments. For an equivalence relation and a family of witness languages the number of induced equivalence classes is compared to the number of equivalence classes distinguishable by the model in question. By contradiction the properness of the inclusions is proved.
1 Introduction
If one is particularly interested in computations with small time bounds, let
us say in the range between real time and linear time, most of the relevant
Turing machine results have been published in the early times of computational
complexity. In the sequel we are concerned with time bounds of the form id + r,
where id denotes the identity function on integers, and r ∈ o(id) is a sublinear
function. Most of the previous investigations in this area have been done in
terms of one-dimensional Turing machines. Recently, infinite time hierarchies of
separated complexity classes in the range in question have been shown [10].
In [2] it has been proved that the complexity class Q which is defined by
nondeterministic multitape real-time computations is equal to the corresponding
linear-time languages. Moreover, it has been shown that two working tapes and a
one-way input tape are sufficient to accept the languages from Q in real time. On
the other hand, in [13] an NP-complete language was exhibited which is accepted
by a nondeterministic single-tape Turing machine in time id + O(id^{1/2} · log) but
not in real time. This interesting result stresses the power of nondeterminism
impressively and motivates the exploration of the world below linear time once
more.
For deterministic machines the situation is different. Though in [7] for one
tape the identity DTIME1 (id) = DTIME1 (LIN) has been proved, for a total of at
least two tapes the real-time languages are strictly included in the linear-time
languages.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 321–332, 2003.
c Springer-Verlag Berlin Heidelberg 2003
Another aspect that, at first glance, might attack the time range of interest
is a possible speed-up. The well-known linear speed-up [6] from t(n) to id +
ε · t(n) for arbitrary ε > 0 yields complexity classes close to real time (i.e.
DTIME(LIN) = DTIME((1 + ε) · id)) for k-tape and multitape machines, but
does not allow assertions on the range between real time and linear time. An
application to the time bound id + r, r ∈ o(id), would result in a slow-down to
id + ε · (id + r) ≥ id + ε · id.
Let us recall known time hierarchy results. For a number of k ≥ 2 tapes
in [5,14] the hierarchy DTIMEk (t′ ) ⊂ DTIMEk (t), if t′ ∈ o(t) and t constructible,
has been shown. By the linear speed-up we obtain the necessity of the condition
t′ ∈ o(t). The necessity of the constructibility property of t follows from the
well-known Gap Theorem [9].
Since in case of multitape machines one needs to construct a Turing machine
with a fixed number of tapes that simulates machines even with more tapes, the
proof of a corresponding hierarchy involves a reduction of the number of tapes.
This costs a factor log for the time complexity. The hierarchy DTIME(t′ ) ⊂
DTIME(t), if t′ · log(t′ ) ∈ o(t) and t constructible, has been proved in [6]. Due to
the necessary condition t′ ∈ o(t) resp. t′ · log(t′ ) ∈ o(t), again, the range between
real time and linear time is not affected by the known time hierarchy results.
Moreover, it follows immediately from the condition t′ ∈ o(t) and the linear
speed-up that there are no infinite hierarchies for time bounds of the form t + r,
r ∈ o(id), if t ≥ c · id, c > 1.
Related work concerning higher dimensional Turing machines can be found
e.g. in [8], where for on-line computations the trade-off between time and dimensionality is investigated. Upper bounds for the reduction of the dimensions are
dealt with e.g. in [12,15,16,19].
Here, on one hand, we are going to present infinite time hierarchies below linear time for any dimension. Such hierarchies are also known for one-dimensional
iterative arrays [3]. On the other hand, dimension hierarchies are presented for
each time bound in question. Thus, we obtain a double time-dimension hierarchy.
The basic notions and a preliminary result of technical flavor are the objects
of the next section. Section 3 is devoted to the time hierarchies below linear
time. They are established by counting arguments. For an equivalence relation
and a family of witness languages the number of induced equivalence classes
is compared to the number of equivalence classes distinguishable by the model
in question. By contradiction the properness of the inclusions follows. In Section 4 for fixed time complexities of the form id + r, r ∈ o(id) proper dimension
hierarchies are proved.
2 Preliminaries
We denote the rational numbers by Q, the integers by ZZ, the positive integers
{1, 2, ...} by IN and the set IN ∪ {0} by IN0 . The reversal of a word w is denoted
by wR . For the length of w we write |w|. We use ⊆ for inclusions and ⊂ if the
inclusions are strict. Let ei = (0, . . . , 0, 1, 0, . . . , 0) (the 1 is at position i) denote
Dimension- and Time-Hierarchies for Small Time Bounds
323
the ith d-dimensional unit vector, then we define
Ed = {ei | 1 ≤ i ≤ d} ∪ {−ei | 1 ≤ i ≤ d} ∪ {(0, . . . , 0)}.
For a function f : IN0 → IN we denote its i-fold composition by f [i] , i ∈ IN. If f
is increasing and unbounded, then its inverse is defined according to
f −1 (n) = min{m ∈ IN | f (m) ≥ n}.
The identity function n → n is denoted by id. As usual we define the set of functions that grow strictly less than f by

o(f) = {g : IN0 → IN | lim_{n→∞} g(n)/f(n) = 0}.
In terms of orders of magnitude, f is an upper bound of the set
O(f ) = {g : IN0 → IN | ∃ n0 , c ∈ IN : ∀ n ≥ n0 : g(n) ≤ c · f (n)}.
Conversely, f is a lower bound of the set Ω(f ) = {g : IN0 → IN | f ∈ O(g)}.
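The inverse f^{-1}(n) = min{m ∈ IN | f(m) ≥ n} of an increasing, unbounded function can be computed by a simple search; a minimal sketch (linear search for clarity, not part of the paper):

```python
def inverse(f, n):
    """Return f^{-1}(n) = min{m >= 1 : f(m) >= n} for an increasing,
    unbounded f.  The loop terminates because f is unbounded."""
    m = 1
    while f(m) < n:
        m += 1
    return m
```

For instance, with f(m) = m², f^{-1}(10) = 4, since f(3) = 9 < 10 ≤ f(4) = 16.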
A d-dimensional Turing machine with k ∈ IN tapes consists of a finite-state control, a read-only one-dimensional one-way input tape and k infinite d-dimensional working tapes. On the input tape a read-only head, and on each working tape a read-write head is positioned. At the outset of a computation, the Turing machine is in the designated initial state and the input is the inscription of the input tape; all the other tapes are blank. The head of the input tape scans the leftmost input symbol, whereas all other heads are positioned on arbitrary tape cells. Dependent on the current state and the currently scanned symbols on the k + 1 tapes, the Turing machine changes its state, rewrites the symbols at the head positions of the working tapes, and possibly moves the heads independently to a neighboring cell. The head of the input tape may only be moved to the right. With an eye towards language recognition, the machines have no extra output tape, but the states are partitioned into accepting and rejecting states. More formally:
Definition 1. A deterministic d-dimensional Turing machine with k ∈ IN tapes
(DTMdk ) is a system ⟨S, T, A, δ, s0 , F ⟩, where
1. S is the finite set of internal states,
2. T is the finite set of tape symbols containing the blank symbol ␣,
3. A ⊆ T \ {␣} is the set of input symbols,
4. s0 ∈ S is the initial state,
5. F ⊆ S is the set of accepting states,
6. δ : S × (A ∪ {␣}) × T^k → S × T^k × {0, 1} × E_d^k is the partial transition function.
Since the input tape cannot be rewritten, we need no new symbols for its current tape cell. For the same reason, δ may only expect symbols from A ∪ {␣} on it. The input tape is one-dimensional and one-way and, thus, its head moves
M. Kutrib
according to {0, 1}. The set of rejecting states is implicitly given by the partitioning, i.e. S \ F . The unit vectors correspond to the possible moves of the
read-write heads.
Let M be a DTMdk . A configuration of M at some time t ≥ 0 is a description
of its global state which is a (2(k + 1) + 1)-tuple (s, f0 , f1 , . . . , fk , p0 , p1 , . . . , pk ),
where s ∈ S is the current state, f0 : ZZ → A ∪ {␣} and fi : ZZ^d → T are
functions that map the tape cells of the corresponding tape to their current
contents, and p0 ∈ ZZ and pi ∈ ZZ d are the current head positions, 1 ≤ i ≤ k.
The initial configuration (s0 , f0 , f1 , . . . , fk , 1, 0, . . . , 0) at time 0 is defined by the input word w = a1 · · · an ∈ A∗ , the initial state s0 , and blank working tapes:

f0 (m) = a_m if 1 ≤ m ≤ n, and f0 (m) = ␣ otherwise;
fi (m1 , . . . , md ) = ␣ for 1 ≤ i ≤ k.
Successor configurations are computed according to the global transition function ∆: Let (s, f0 , f1 , . . . , fk , p0 , p1 , . . . , pk ) be a configuration. Then
(s′ , f0 , f1′ , . . . , fk′ , p′0 , p′1 , . . . , p′k ) = ∆(s, f0 , f1 , . . . , fk , p0 , p1 , . . . , pk )
if and only if
δ(s, f0 (p0 ), f1 (p1 ), . . . , fk (pk )) = (s′ , x1 , . . . , xk , j0 , j1 , . . . , jk )
such that
fi′ (m1 , . . . , md ) = x_i if (m1 , . . . , md ) = pi , and fi′ (m1 , . . . , md ) = fi (m1 , . . . , md ) if (m1 , . . . , md ) ≠ pi ;
pi′ = pi + ji , p0′ = p0 + j0 ,
for 1 ≤ i ≤ k. Thus, the global transition function ∆ is induced by δ. Throughout
the paper we are dealing with so-called multitape machines (DTMd ), where every
machine has an arbitrary but fixed number of working tapes.
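The configurations and the global transition function ∆ can be mirrored directly in code. The following Python sketch (purely illustrative; the state names, symbols, and the one-entry transition table are invented for the example) represents each d-dimensional working tape as a dictionary from coordinate tuples to symbols, so only non-blank cells are stored:

```python
BLANK = ' '

def step(delta, conf):
    """Apply the global transition function Delta once, or return None if
    delta is undefined for the configuration (i.e. the machine halts).
    conf = (s, f0, tapes, p0, heads): current state, input tape (dict),
    list of working-tape dicts, input head position, list of head tuples."""
    s, f0, tapes, p0, heads = conf
    scanned = tuple(t.get(p, BLANK) for t, p in zip(tapes, heads))
    key = (s, f0.get(p0, BLANK)) + scanned
    if key not in delta:
        return None
    s2, writes, j0, moves = delta[key]
    for t, p, x in zip(tapes, heads, writes):
        t[p] = x                     # rewrite the symbols under the heads
    heads = [tuple(c + m for c, m in zip(p, mv)) for p, mv in zip(heads, moves)]
    return (s2, f0, tapes, p0 + j0, heads)

# One working tape, d = 2: in state 'q', scanning 'a' on the input tape and
# a blank on the working tape, write '1', move the input head right (j0 = 1)
# and the working head by e1 = (1, 0).
delta = {('q', 'a', BLANK): ('q', ('1',), 1, ((1, 0),))}
conf = ('q', {1: 'a'}, [{}], 1, [(0, 0)])
conf = step(delta, conf)
# conf == ('q', {1: 'a'}, [{(0, 0): '1'}], 2, [(1, 0)])
```

Once no transition matches the scanned symbols, `step` returns `None`, which corresponds to the machine halting in the sense defined above.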
A Turing machine halts iff the transition function is undefined for the current
configuration. An input word w ∈ A∗ is accepted by a Turing machine M if the
machine halts at some time in an accepting state, otherwise it is rejected.
L(M) = {w ∈ A∗ | w is accepted by M} is the language accepted by M. If
t : IN0 → IN, t(n) ≥ n, is a function, then M is said to be t-time-bounded or of
time complexity t iff it halts on all inputs w after at most t(|w|) time steps.
If t equals the function id, acceptance is said to be in real time. The linear-time languages are defined according to time complexities t = c · id, where c ∈ Q
with c ≥ 1. Since time complexities are mappings to positive integers and have
to be greater than or equal to id, actually, c · id means max{⌈c · id⌉, id}. But for
convenience we simplify the notation in the sequel.
The family of all languages which can be accepted by DTMd with time complexity t is denoted by DTIMEd (t).
In order to prove tight time hierarchies, in almost all cases well-behaved
time bounding functions are required. Usually, the notion “well-behaved” is concretized in terms of computability or constructibility of the functions with respect
to the device in question.
Definition 2. Let d ∈ IN be a constant. A function f : IN0 → IN is said to be
DTMd constructible iff there exists a DTMd which for every n ∈ IN on input 1n
halts after exactly f (n) time steps.
Another common definition of constructibility demands the existence of an
O(f )-time-bounded Turing machine that computes the binary representation of
the value f (n) on input 1n . Both definitions have been proven to be equivalent
for multitape machines [11].
The following definition summarizes the properties of well-behaved (in our
sense) functions and names them.
Definition 3. The set of all increasing, unbounded DTMd -constructible functions f with the property ∀ c ∈ IN : ∃ c′ ∈ IN : c · f (n) ≤ f (c′ · n) is denoted by
T (DTMd ). The set of their inverses is T −1 (DTMd ) = {f −1 | f ∈ T (DTMd )}.
Since we are interested in time bounds of the form id + r, we need small
functions r below the identity. The constructible functions are necessarily greater
than the identity. Therefore, the inverses of constructible functions are used.
The properties increasing and unbounded are straightforward. At first glance
the property ∀ c ∈ IN : ∃ c′ ∈ IN : c · f (n) ≤ f (c′ · n) seems to be restrictive,
but it is not. It is easily verified that almost all of the commonly considered
bounding functions above the identity have this property (e.g., the identity itself, polynomials, exponential functions, etc.). We remark that even the family T (DTM1 ) is very rich. More details can be found, for example, in [1,17,20].
In order to clarify later calculations, we observe the following: Let r ∈
T −1 (DTMd ) be some function. Then there must exist a constructible function
r̂ ∈ T (DTMd ) such that r = r̂−1 . Moreover, for all n we obtain r(r̂(n)) = n
by definition: r(r̂(n)) = min{m ∈ IN | r̂(m) ≥ r̂(n)} implies m = n and, thus,
r(r̂(n)) = n.
In general, we do not have equality for the converse r̂(r(n)), but in the sequel
we will need only the equality case.
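The identity r(r̂(n)) = n and the failure of the converse are easy to check numerically. A small Python sketch (with r̂(n) = n² standing in for an arbitrary increasing, unbounded constructible function):

```python
def inverse(f):
    """The inverse of an increasing, unbounded f, as defined above:
    f^{-1}(n) = min{m in IN | f(m) >= n}."""
    def r(n):
        m = 1
        while f(m) < n:
            m += 1
        return m
    return r

f = lambda n: n * n      # stands in for an arbitrary constructible function
r = inverse(f)           # here r(n) = ceil(sqrt(n))

# r(f(n)) = n for every n, as observed above ...
assert all(r(f(n)) == n for n in range(1, 50))
# ... but f(r(n)) = n fails in general:
assert f(r(10)) == 16    # r(10) = 4 and f(4) = 16
```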
The following equivalence relation is well known (cf. Myhill-Nerode Theorem
on regular languages).
Definition 4. Let L ⊆ A∗ be a language over an alphabet A and l ∈ IN0 be a
constant. Two words w and w′ are l-equivalent with respect to L if and only if
wwl ∈ L ⇐⇒ w′ wl ∈ L
for all wl ∈ Al . The number of l-equivalence classes of words of length n − l with
respect to L (i.e. |wwl | = n) is denoted by N (n, l, L).
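For toy languages, N(n, l, L) can be computed by brute force directly from the definition. A Python sketch (the palindrome language is chosen purely for illustration):

```python
from itertools import product

def num_classes(n, l, L):
    """N(n, l, L): the number of l-equivalence classes of words of
    length n - l with respect to the language L (given as a predicate)."""
    words = (''.join(w) for w in product('01', repeat=n - l))
    tails = [''.join(t) for t in product('01', repeat=l)]
    # Two words are l-equivalent iff they accept/reject exactly the same tails.
    return len({tuple(L(w + t) for t in tails) for w in words})

is_pal = lambda w: w == w[::-1]      # the palindromes over {0, 1}
assert num_classes(4, 2, is_pal) == 4   # each w forces a distinct reversal tail
assert num_classes(3, 1, is_pal) == 2
```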
The underlying idea is to bound the number of distinguishable equivalence
classes. The following lemma gives a necessary condition for a language to be
(id + r)-time acceptable by a DTMd .
Lemma 5. Let r : IN0 → IN be a function and d ∈ IN be a constant. If L ∈
DTIMEd (id + r), then there exists a constant p > 1 such that:
N (n, l, L) ≤ p^{(l+r(n))^d} .
Proof. Let M = ⟨S, T, A, δ, s0 , F ⟩ be an (id + r)-time DTMd that accepts a language L.
In order to determine an upper bound for the number of l-equivalence classes,
we consider the possible situations of M after reading all but l input symbols.
The remaining computation depends on the current internal state and the contents of the at most (2(l + r(n)) + 1)^d cells on each tape that are still reachable
during the last at most l + r(n) time steps.
Let p1 = max{|T |, |S|, 2}. For the (2(l + r(n)) + 1)^d cells per tape there are at most p1^{(2(l+r(n))+1)^d} different inscriptions. For some k ∈ IN tapes we obtain altogether at most p1^{k(2(l+r(n))+1)^d +1} different situations, which bounds the number of l-equivalence classes. The lemma follows for p = p1^{(k+1)·3^d} . ⊓⊔

3 Time Hierarchies
In this section we will present the time hierarchies between real time and linear
time for any dimension d ∈ IN.
Theorem 6. Let r : IN0 → IN and r′ : IN0 → IN be two increasing functions and d ∈ IN be a constant. If r ∈ T^{−1}(DTMd ), r ∈ O(id^{1/d} ), and either r′ ∈ o(r) if d = 1, or r′ ∈ o(r^{1−ε} ) for an arbitrarily small ε > 0 if d > 1, then

DTIMEd (id + r′ ) ⊂ DTIMEd (id + r).
Before proving the theorem we give the following example which is naturally
based on root functions. The dimension hierarchies to be proved in Theorem 8
are also depicted.
Example 7. Since T (DTMd ) contains the polynomials id^c , c ≥ 1, the functions id^{1/c} belong to T^{−1}(DTMd ). (Actually, the inverses of id^c are ⌈id^{1/c}⌉, but as mentioned before we simplify the notation for convenience.)
For d = 1, trivially, id^{1/(i+1)} ∈ o(id^{1/i} ).
For d > 1 we need to find an ε such that id^{1/(i+1)} ∈ o(id^{(1/i)(1−ε)} ). The condition is fulfilled if and only if 1/(i + 1) < (1/i)(1 − ε), thus if i/(i + 1) < 1 − ε and, therefore, if ε < 1 − i/(i + 1). We conclude that the condition is fulfilled for all ε < 1/(i + 1).
The hierarchy is depicted in Figure 1. ⊓⊔
Proof (of Theorem 6). First we fix a constant q depending on ε. Choose q such that

(d − 1)/(d^q + d) ≤ ε

for d > 1, and q = 1 for d = 1.
Fig. 1. Double hierarchy based on root functions.
Since r ∈ T −1 (DTMd ), i.e. r is the inverse of a constructible function, there
exists a constructible function r−1 ∈ T (DTMd ) such that r(r−1 (n)) = n.
Now we are prepared to define a witness language L1 for the assertion.
The words of L1 are of the form
a^l b^{r^{−1}(l^{1+d^{q−1}})} w1 $w1^R c| w2 $w2^R c| · · · c| ws $ws^R c| d1 · · · dm y,

where l ∈ IN is a positive integer, s = l^{d^q} , m = (d − 1) · l^{d^{q−1}} , y, wi ∈ {0, 1}^l , 1 ≤ i ≤ s, and di ∈ Ed−1 , 1 ≤ i ≤ m.
The acceptance of such a word is best described by the behavior of an accepting DTMd M.
During a first phase, M reads a^l and stores it on a tape. Since d and q are constants, f (l) = l^{1+d^{q−1}} is a polynomial and, thus, constructible. The function r^{−1} is constructible by assumption. The constructible functions are closed under composition. Therefore, during a second phase, M can simulate a constructor for r^{−1}(f ) on the stored input a^l and verify the number of b’s.
In parallel with what follows, M verifies the lengths of the subwords wi to be l (with the help of the stored a^l ) and the numbers s and m (s = l^{d^q} as well as m = (d − 1) · l^{d^{q−1}} are constructible functions).
When w1 appears in the input, M begins to store the subwords wi in a d-dimensional area of size l^{d^{q−1}} × · · · × l^{d^{q−1}} × l^{1+d^{q−1}} . Suppose the area consists of l hypercubes with edge length l^{d^{q−1}} that are stacked up. The subwords are stored along the last coordinate, such that l^{d^{q−1}} subwords are stacked up, respectively. If, for example, the head of the corresponding tape is located at coordinates (m1 , . . . , md ), then the following subword wi is stored into the cells

(m1 , . . . , md−1 , md ), (m1 , . . . , md−1 , md + 1), . . . , (m1 , . . . , md−1 , md + l − 1).
Temporarily, wi is also stored on another tape. Now M has to decide where to store the next subword wi+1 (for this purpose it simulates appropriate constructors for l^{d^{q−1}} ). In principle, there are two possibilities. The first one is that wi+1 is stored as a neighbor of wi . In this case the head has to move back to position (m1 , . . . , md ) and to change the dth coordinate appropriately. The second one is that the subword wi+1 is stored below wi . In this case the head has to keep its position (m1 , . . . , md + l). The head is possibly moved while reading wi^R . In both cases wi^R is verified against the temporarily stored wi .
The last phase leads to acceptance or rejection. After storing all subwords wi , we may assume that the last coordinate of the head position is l^{1+d^{q−1}} (i.e., the head is on the bottom face of the area). While reading the di , M moves its head simply by adding di to the current position. Since di ∈ Ed−1 , the dth coordinate is not affected. This phase leads to a head position (m1 , . . . , md−1 , l^{1+d^{q−1}} ). Now the subword y is read and stored on two other tapes. Finally, M verifies whether or not y matches one of the subwords which have been stacked up in the cells

(m1 , . . . , md−1 , 0), . . . , (m1 , . . . , md−1 , l^{1+d^{q−1}} − 1)

(if there are stored subwords in these cells at all). Continuous comparisons without delay are achieved by alternately moving one head back and forth over one of the stored copies of y, while the other head moves forth and back over the second copy. Machine M accepts if and only if it finds a matching subword.
Altogether, M needs n time steps for reading the whole input and at most another l^{1+d^{q−1}} time steps for comparing y with the stacked-up subwords. The first part of the input contains r^{−1}(l^{1+d^{q−1}} ) symbols b. Therefore, n > r^{−1}(l^{1+d^{q−1}} ) and, since r is increasing, r(n) ≥ r(r^{−1}(l^{1+d^{q−1}} )) = l^{1+d^{q−1}} . We conclude that M obeys the time complexity id + r and, hence, L1 ∈ DTIMEd (id + r).
Assume now L1 is acceptable by some DTMd M with time complexity id+r′ .
Two words

a^l b^{r^{−1}(l^{1+d^{q−1}})} w1 $w1^R c| w2 $w2^R c| · · · c| ws $ws^R c|

and

a^l b^{r^{−1}(l^{1+d^{q−1}})} w1′ $w1′^R c| w2′ $w2′^R c| · · · c| ws′ $ws′^R c|

are not (m + l)-equivalent with respect to L1 if the sets {w1 , . . . , ws } and {w1′ , . . . , ws′ } are different. There exist exactly (2^l choose l^{d^q} ) different subsets of {0, 1}^l with s = l^{d^q} elements. For l large enough such that log(l^{d^q} ) ≤ l/4, it follows:
N (n, l + m, L1 ) ≥ (2^l choose l^{d^q} ) > ((2^l − l^{d^q} )/l^{d^q} )^{l^{d^q}} ≥ (2^{l/2−log(l^{d^q} )} )^{l^{d^q}} ≥ (2^{l/4} )^{l^{d^q}} = 2^{l^{1+d^q}/4} ∈ 2^{Ω(l^{1+d^q} )}
On the other hand, by Lemma 5 the number of equivalence classes distinguishable by M is bounded for a constant p > 1:

N (n, l + m, L1 ) ≤ p^{(l+m+r′(n))^d}
For n we have

n = l + r^{−1}(l^{1+d^{q−1}} ) + (2l + 2) · l^{d^q} + (d − 1) · l^{d^{q−1}} + l = O(l^{1+d^q} ) + r^{−1}(l^{1+d^{q−1}} ).

Since r ∈ O(id^{1/d} ), it follows that r^{−1} ∈ Ω(id^d ). Therefore, r^{−1}(l^{1+d^{q−1}} ) ∈ Ω(l^{d+d^q} ). We conclude

n ≤ c1 · r^{−1}(l^{1+d^{q−1}} ) for some c1 ∈ IN.
Due to the property ∀ c ∈ IN : ∃ c′ ∈ IN : c · r^{−1}(n) ≤ r^{−1}(c′ · n), we obtain

n ≤ r^{−1}(c2 · l^{1+d^{q−1}} ) for some c2 ∈ IN.

From 1 − ε ≤ 1 − (d − 1)/(d^q + d) = (d^q + 1)/(d^q + d) = (d^{q−1} + 1/d)/(d^{q−1} + 1) and r′ ∈ o(r^{1−ε} ) it follows:

r′ (n) ≤ r′ (r^{−1}(c2 · l^{1+d^{q−1}} ))
∈ o(r(r^{−1}(c2 · l^{1+d^{q−1}} ))^{(d^{q−1} +1/d)/(d^{q−1} +1)} )
= o(l^{1/d+d^{q−1}} )

By l + m = l + (d − 1) · l^{d^{q−1}} ∈ O(l^{d^{q−1}} ) it holds:

(l + m + r′ (n))^d ∈ (O(l^{d^{q−1}} ) + o(l^{1/d+d^{q−1}} ))^d = o(l^{1/d+d^{q−1}} )^d = o(l^{1+d^q} )
So the number of distinguishable equivalence classes is

N (n, l + m, L1 ) ≤ p^{o(l^{1+d^q} )} = 2^{o(l^{1+d^q} )} .

Now we have a contradiction, since previously N (n, l + m, L1 ) has been calculated to be at least 2^{Ω(l^{1+d^q} )} , which proves L1 ∉ DTIMEd (id + r′ ). ⊓⊔
For one-dimensional machines we have hierarchies from real time to linear
time. Due to the possible speed-up from id + r to id + ε · r the condition r′ ∈ o(r)
cannot be relaxed.
4 Dimension Hierarchies
Now we are going to show that there exist infinite dimension hierarchies for all
time complexities in question. So we obtain double hierarchies. It turns out that
dimensions are more powerful than small time bounds.
Theorem 8. Let r : IN0 → IN be an increasing function and d ∈ IN be a constant. If r ∈ o(id^{1/d} ), then

DTIMEd+1 (id) \ DTIMEd (id + r) ≠ ∅.
Again, before proving the theorem, we present an example based on natural
functions. It shows another double hierarchy.
Example 9. Since T (DTMd ) is closed under composition and contains 2^id , the functions log^[i] , i ≥ 1, belong to T^{−1}(DTMd ).
For d = 1, trivially, log^[i+1] ∈ o(log^[i] ).
For d > 1 we need to find an ε such that log^[i+1] ∈ o((log^[i] )^{1−ε} ). Comparing log(log^[i] ) with (log^[i] )^{1−ε} , the condition is fulfilled for all ε < 1.
The hierarchy is depicted in Figure 2. ⊓⊔
Fig. 2. Double hierarchy based on iterated logarithms.
Proof (of Theorem 8). The words of the witness language L2 are of the form

w1 $w1^R c| w2 $w2^R c| · · · c| ws $ws^R c| d1 · · · dm y,

where l ∈ IN is a positive integer, s = l^d , m = d · l, y, wi ∈ {0, 1}^l , 1 ≤ i ≤ s, and di ∈ Ed , 1 ≤ i ≤ m.
An accepting (d + 1)-dimensional real-time machine M works as follows. The
subwords wi are stored into a (d + 1)-dimensional area of size l × l × · · · × l. The
first symbols of the subwords wi are stored at the l^d positions
(0, 0, . . . , 0) to (l − 1, l − 1, . . . , l − 1, 0).
The remaining symbols of each wi are stored along the (d + 1)st dimension,
respectively.
After storing the subwords, M moves its corresponding head as requested
by the di . Since the di belong to Ed , this movement is within the first d dimensions only. Finally, when y appears in the input, M tries to compare it
with the subword stored at the current position. M accepts if a subword has
been stored at the current position at all and if the subword matches y. Thus,
L2 ∈ DTIMEd+1 (id).
In order to apply Lemma 5, we observe that, again, two words

w1 $w1^R c| w2 $w2^R c| · · · c| ws $ws^R c|

and

w1′ $w1′^R c| w2′ $w2′^R c| · · · c| ws′ $ws′^R c|

are not (m + l)-equivalent with respect to L2 if the sets {w1 , . . . , ws } and {w1′ , . . . , ws′ } are different. Therefore, L2 induces at least

N (n, l + m, L2 ) ≥ (2^l choose l^d ) ≥ 2^{Ω(l^{d+1} )}

equivalence classes for all sufficiently large l.
On the other hand, we obtain an upper bound on the number of distinguishable equivalence classes for an (id + r)-time DTMd M as follows:

N (n, l + m, L2 ) ≤ p^{(l+m+r(n))^d}
= p^{(l+d·l+r((2l+2)·l^d +l+d·l))^d}
≤ p^{(c1 ·l+r(c2 ·l^{d+1} ))^d}   for some c1 , c2 ∈ IN
∈ p^{(O(l)+o((c2 ·l^{d+1} )^{1/d} ))^d}   since r ∈ o(id^{1/d} )
= p^{(O(l)+o(l^{(d+1)/d} ))^d}
= p^{o(l^{(d+1)/d} )^d}
= p^{o(l^{d+1} )} = 2^{o(l^{d+1} )}

From this contradiction, L2 ∉ DTIMEd (id + r) follows. ⊓⊔
The inclusions DTIMEd+1 (id) ⊆ DTIMEd+1 (id + r) and DTIMEd (id + r) ⊆
DTIMEd+1 (id+r) are trivial. An application of Theorem 8 yields the hierarchies:
Corollary 10. Let r : IN0 → IN be an increasing function and d ∈ IN be a constant. If r ∈ o(id^{1/d} ), then

DTIMEd (id + r) ⊂ DTIMEd+1 (id + r).
Note that despite the condition r ∈ o(id^{1/d} ), the dimension hierarchies can touch r = id^{1/d} :

id^{1/d} ∈ o(id^{1/(d−1)} ) and DTIMEd−1 (id + id^{1/d} ) ⊂ DTIMEd (id + id^{1/d} ).
References
1. Balcázar, J. L., Dı́az, J., and Gabarró, J. Structural Complexity I . Springer, Berlin,
1988.
2. Book, R. V. and Greibach, S. A. Quasi-realtime languages. Math. Systems Theory
4 (1970), 97–111.
3. Buchholz, T., Klein, A. and Kutrib, M. Iterative arrays with small time bounds,
Mathematical Foundations of Computer Science (MFCS 2000), LNCS 1893,
Springer 2000, pp. 243–252.
4. Cole, S. N. Real-time computation by n-dimensional iterative arrays of finite-state
machines. IEEE Trans. Comput. C-18 (1969), 349–365.
5. Fürer, M. The tight deterministic time hierarchy. Proceedings of the Fourteenth
Annual ACM Symposium on Theory of Computing (STOC ’82), 1982, pp. 8–16.
6. Hartmanis, J. and Stearns, R. E. On the computational complexity of algorithms.
Trans. Amer. Math. Soc. 117 (1965), 285–306.
7. Hennie, F. C. One-tape, off-line Turing machine computations. Inform. Control 8
(1965), 553–578.
8. Hennie, F. C. On-line Turing machine computations. IEEE Trans. Elect. Comput.
EC-15 (1966), 35–44.
9. Hopcroft, J. E. and Ullman, J. D. Introduction to Automata Theory, Language,
and Computation. Addison-Wesley, Reading, Massachusetts, 1979.
10. Klein, A. and Kutrib, M. Deterministic Turing machines in the range between
real-time and linear-time. Theoret. Comput. Sci. 289 (2002), 253–275.
11. Kobayashi, K. On proving time constructibility of functions. Theoret. Comput.
Sci. 35 (1985), 215–225.
12. Loui, M. C. Simulations among multidimensional Turing machines. Theoret. Comput. Sci. 21 (1982), 145–161.
13. Michel P. An NP-complete language accepted in linear time by a one-tape Turing
machine. Theoret. Comput. Sci. 85 (1991), 205–212.
14. Paul, W. J. On time hierarchies. J. Comput. System Sci. 19 (1979), 197–202.
15. Paul, W., Seiferas, J. I., and Simon, J. An information-theoretic approach to time
bounds for on-line computation. J. Comput. System Sci. 23 (1981), 108–126.
16. Pippenger, N. and Fischer, M. J. Relations among complexity measures. J. Assoc.
Comput. Mach. 26 (1979), 361–381.
17. Reischuk, R. Einführung in die Komplexitätstheorie. Teubner, Stuttgart, 1990.
18. Rosenberg, A. L. Real-time definable languages. J. Assoc. Comput. Mach. 14
(1967), 645–662.
19. Stoß, H.-J. Zwei-Band Simulation von Turingmaschinen. Computing 7 (1971),
222–235.
20. Wagner, K. and Wechsung, G. Computational Complexity. Reidel, Dordrecht,
1986.
Baire’s Categories on Small Complexity Classes
Philippe Moser
Computer Science Department, University of Geneva
moser@cui.unige.ch
Abstract. We generalize resource-bounded Baire’s categories to small
complexity classes such as P, QP and SUBEXP and to probabilistic classes
such as BPP. We give an alternative characterization of small sets via
resource-bounded Banach-Mazur games. As an application we show that
for almost every language A ∈ SUBEXP, in the sense of Baire’s category,
PA = BPPA .
1 Introduction
Resource-bounded measure and resource-bounded Baire's category were introduced by Lutz in [1] and [2] for the complexity classes E and EXP. They provide a means of investigating the sizes of various subsets of E and EXP.
In resource-bounded measure the small sets are those with measure zero, in
resource-bounded Baire’s Category the small sets are those of first category
(meager sets). Both smallness notions satisfy the following three axioms. First
every single language L ∈ E is small, second the whole class E is large, and finally
“easy infinite unions” of small sets are small. These axioms meet the essence of
Lebesgue's measure and Baire's category and ensure that it is impossible for a
subset of E to be both large and small.
The first goal of Lutz’s approach was to extend existence results, such as
“there is a language in C satisfying property P ”, to abundance results such as
“most languages in C satisfy property P ”, which is more informative since an
abundance result reflects the typical behavior of languages in a class, whereas
an existence result could as well correspond to an exception in the class. Both
resource-bounded measure and resource-bounded Baire’s Category have been
successfully used to understand the structure of the exponential time classes E
and EXP.
An important problem in resource-bounded measure theory was to generalize
Lutz’s measure theory to small complexity classes such as P, QP and SUBEXP
and to probabilistic classes such as BPP and BPE. These issues have been solved
in [3], [4], [5] and [6]. As noticed in [7], the same
question in the Baire’s category setting was still left unanswered.
In this paper we solve this problem by generalizing resource-bounded Baire’s
categories to small complexity classes such as P, QP and SUBEXP and to probabilistic classes such as BPP. We also give an alternative characterization of
meager sets through Banach-Mazur games. As an application we improve the
result of [3] where it was shown that for almost every language A ∈ SUBEXP, in
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 333–342, 2003.
c Springer-Verlag Berlin Heidelberg 2003
P. Moser
the sense of resource-bounded measure, PA = BPP. The question whether the
same result holds with PA = BPPA was raised in [3]. We answer this question
affirmatively in the resource-bounded Baire's category setting, by showing
that for almost every language A ∈ SUBEXP, in the sense of resource-bounded
Baire’s category, PA = BPPA .
The remainder of the paper is organized as follows. In section 3 we introduce resource-bounded Baire’s category on P. In section 3.1 we give another
characterization of small sets through resource-bounded Banach-Mazur games.
In section 4 we introduce resource-bounded Baire’s category on BPP with the
corresponding resource-bounded Banach-Mazur games formulation. Finally in
section 5 we prove the result on BPP mentioned above.
2 Preliminaries
We use standard notation for traditional complexity classes; see for instance [8] and [9], or [10]. For ǫ > 0, denote by Eǫ the class Eǫ = ∪_{δ<ǫ} DTIME(2^{n^δ} ). SUBEXP is the class ∩_{ǫ>0} Eǫ , and quasi-polynomial time refers to the class QP = ∪_{k≥1} DTIME(n^{log^k n} ). Let us fix some notations for strings and languages. Let
s0 , s1 , . . . be the standard enumeration of the strings in {0, 1}∗ in lexicographical
order, where s0 = λ denotes the empty string. A sequence is an element of
{0, 1}∞ . If w is a string or a sequence and 1 ≤ i ≤ |w| then w[i] and w[si ] denote
the ith bit of w. Similarly w[i . . . j] and w[si . . . sj ] denote the ith through jth
bits, and dom(w) the domain of w, where w is viewed as a partial function. We
identify language L with its characteristic function χL , where χL is the sequence
such that χL [i] = 1 iff si ∈ L. For a string si define its position by pos(si ) = i. If
w1 is a string and w2 is a string or a sequence extending w1 , we write w1 ⊑ w2 .
We write w1 ❁ w2 if w1 ⊑ w2 and w1 ≠ w2 . For two strings τ, σ ∈ {0, 1}∗ , we
denote by τ ∧ σ the concatenation of τ followed by σ. For a, b ∈ N let a−̇b denote
max(a − b, 0). We identify N with {0, 1}∗ , thus we denote by NN the set of all
functions mapping strings to strings.
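The standard enumeration and the position function pos can be realized as follows (a sketch; the closed-form index arithmetic assumes the lexicographical-by-length order described above):

```python
def s(i):
    """The i-th string of the standard enumeration of {0, 1}*:
    s(0) = '' (the empty string), s(1) = '0', s(2) = '1', s(3) = '00', ..."""
    # Strings of length k occupy positions 2^k - 1, ..., 2^(k+1) - 2;
    # the offset inside the block is the binary value of the string.
    k = (i + 1).bit_length() - 1
    return format(i + 1 - 2 ** k, 'b').zfill(k) if k else ''

def pos(w):
    """pos(s_i) = i, the position of w in the enumeration."""
    return 2 ** len(w) - 1 + (int(w, 2) if w else 0)

assert [s(i) for i in range(7)] == ['', '0', '1', '00', '01', '10', '11']
assert all(pos(s(i)) == i for i in range(200))
```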
2.1 Finite Extension Strategies
Whereas resource-bounded measure is defined via martingales, resource-bounded
Baire’s category is defined via finite extension strategies. Here is a definition.
Definition 1. A function h : {0, 1}∗ → {0, 1}∗ is a finite extension strategy, or
a constructor, if for every string τ ∈ {0, 1}∗ , τ ⊑ h(τ ).
For simplicity we will use the word “strategy” for finite extension strategy.
We will often consider indexed strategies. An indexed strategy is a function
h : N × {0, 1}∗ → {0, 1}∗ , such that hi := h(i, ·) is a strategy for every i ∈ N. If
h is a strategy and τ ∈ {0, 1}∗ , define ext h(τ ) to be the unique string u such
that h(τ ) = τ ∧ u. We say a strategy h avoids some language A (or language A
avoids strategy h) if for every string τ ∈ {0, 1}∗ we have h(τ ) ̸⊑ χA . We say a strategy h meets some language A if h does not avoid A.
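As an illustration of these notions, the following Python sketch (hypothetical; a language is represented here by a finite prefix of its characteristic sequence, and indexing is 0-based rather than via the strings si) builds a constructor that avoids a given language by always extending with a disagreeing bit:

```python
from itertools import product

def flip_next(tau, chi):
    """A constructor: extend tau by one bit that disagrees with the
    characteristic sequence chi at position |tau|, so that the extension
    is never a prefix of chi -- i.e. this strategy avoids the language."""
    return tau + str(1 - int(chi[len(tau)]))

chi = '0110100110010110'          # finite prefix of some language's sequence
h = lambda tau: flip_next(tau, chi)

# h avoids the language: h(tau) is not a prefix of chi for any tau.
taus = (''.join(t) for k in range(8) for t in product('01', repeat=k))
assert all(not chi.startswith(h(t)) for t in taus)
```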
For the results in Section 5 we will need the following definition of the relativized hardness of a pseudorandom generator.
Definition 2. Let A be any language. The hardness H A (Gm,n ) of a random
generator Gm,n : {0, 1}m −→ {0, 1}n , is defined as the minimal s such that there
exists an n-input circuit C with oracle gates to A, of size at most s, for which:
| Pr_{x∈{0,1}^m} [C(Gm,n (x)) = 1] − Pr_{y∈{0,1}^n} [C(y) = 1] | ≥ 1/s.   (1)
Klivans and Melkebeek [11] noticed that Impagliazzo and Wigderson's [12]
pseudorandom generator construction relativizes; i.e. for any language A, there
is a deterministic polynomial time procedure that converts the truth table of a
Boolean function that is hard to compute for circuits having oracle gates for A,
into a pseudorandom generator that is pseudorandom for circuits with A oracle
gates. More precisely,
Theorem 1 (Klivans-Melkebeek [11]).
Let A be any language. There is a polynomial-time computable function F :
{0, 1}∗ × {0, 1}∗ → {0, 1}∗ , with the following properties. For every ǫ > 0, there
exist a, b ∈ N such that

F : {0, 1}^{n^a} × {0, 1}^{b log n} → {0, 1}^n ,   (2)

and if r is the truth table of an (a log n)-variable Boolean function of A-oracle circuit complexity at least n^{ǫa} , then the function Gr (s) = F (r, s) is a generator, mapping {0, 1}^{b log n} into {0, 1}^n , which has hardness H^A (Gr ) > n.
3 Baire's Category on P
To define a resource bounded Baire’s category on P, we will consider strategies
computed by Turing machines which have random access to their inputs, i.e. on
input τ , the machine can query any bit of τ to its oracle. For such a random
Turing machine M running on input τ , we denote this convention by M τ (·).
Note that such random access Turing machines can compute the length of their input τ in O(log |τ |) steps, by using bisection. We will consider random access Turing machines
running in time polylog in the input’s length |τ | or equivalently polynomial in
|s|τ | |. Note that such machines cannot read their entire input, but only a sparse
subset of it.
Definition 3. An indexed strategy h : N × {0, 1}∗ → {0, 1}∗ is P-computable
if there is a random access Turing machine M as above, such that for every
τ ∈ {0, 1}∗ and every i ∈ N,
M^τ (0^i ) = ext hi (τ ),   (3)

where M runs in time polynomial in |s_{|τ|} | + i.
336
P. Moser
We say a class is small if there is a single indexed strategy that avoids every
language in the class. More precisely,
Definition 4. A class C of languages is P-meager if there exists a P-computable
indexed strategy h, such that for every L ∈ C there exists i ∈ N, such that hi
avoids L.
In order to formalize the third axiom we need to define “easy infinite unions”
precisely.
Definition 5. X = ∪_{i∈N} Xi is a P-union of P-meager sets, if there exists an
indexed P-computable strategy h : N × N × {0, 1}∗ → {0, 1}∗ , such that for every
i ∈ N, hi,· witnesses Xi ’s meagerness.
Let us prove the three basic axioms.
Theorem 2. For any language L in P, the singleton {L} is P-meager.
Proof. Let L ∈ P be any language. We describe a P-computable constructor h
which avoids {L}. Consider the following Turing machine M computing h. On
input string σ, M^σ simply outputs 1 − L(s_{|σ|+1} ). h is clearly P-computable, and h avoids {L}. ⊓⊔
The proof of the third axiom is straightforward.
Theorem 3.
1. All subsets of a P-meager set are P-meager.
2. A P-union of P-meager sets is P-meager.
Proof. Immediate by definition of P-meagerness. ⊓⊔
Let us prove the second axiom which says that the whole space P is not small.
Theorem 4. P is not P-meager.
Proof. Let h be an indexed P-computable constructor and let M be a Turing
machine computing h. We construct a language L ∈ P which meets hi for every i.
The idea is to construct a language L with the following characteristic function,
χL = B0 B1 B2 · · · Bi · · ·   with   B0 = 0   and   Bi = ext hi (B0 ∧ B1 ∧ · · · ∧ Bi−1 ) 0 · · · 0,   (4)

where block Bi corresponds to all strings of size i, and block Bi contains ext hi (B0 ∧ B1 ∧ · · · ∧ Bi−1 ) followed by a padding with 0's. Bi is large enough to contain ext hi (B0 ∧ B1 ∧ · · · ∧ Bi−1 ), because the length of M 's output is bounded by a polynomial in i.
Let us construct a polynomial time Turing machine N deciding L. On input
x, where |x| = n,
1. Compute p where x is the pth word of length n.
2. For i = 1 to n simulate M^{B0 ∧ B1 ∧ ··· ∧ Bi−1} (0^i ). Answer M's queries with the previously stored binary sequences B̄1 , B̄2 , . . . , B̄i−1 in the following way. Suppose that during its simulation M^{B0 ∧ B1 ∧ ··· ∧ Bi−1} (0^i ) queries the kth bit of B0 ∧ B1 ∧ · · · ∧ Bi−1 to its oracle. To answer this query, simply compute sk and compute its length lk and its position pk among words of size lk . Look up whether the stored binary sequence B̄lk contains a pk th bit bk . If this is the case answer M's query with bk , else answer M's query with 0. Finally store the output of M^{B0 ∧ B1 ∧ ··· ∧ Bi−1} (0^i ) under B̄i .
3. If the stored binary sequence B̄n contains a pth bit then output this bit, else output 0 (x is in the padded zone of Bn ).
Let us check that L is in P. The first and third step are clearly computable
in time polynomial in n. For the second step we have that for each of the n
recursive steps there are at most a polynomial number of queries (because h is
P-computable) and each simulation of M once the queries are answered takes
time polynomial in n because M is polynomial. Note that all B̄i 's have size polynomial in n, so storing them poses no problem. ⊓⊔
3.1 Resource-Bounded Banach-Mazur Games
We give an alternative characterization of small sets via resource-bounded
Banach-Mazur games. Informally speaking, a Banach-Mazur game, is a game
between two strategies f and g, where the game begins with the empty string
λ. Then g ◦ f is applied successively on λ. Such a game yields a unique infinite
string, or equivalently a language, called the result of the play between f and g.
For a class C, we say that g is a winning strategy if it can force the result of the
game with any strategy f to be a language not in C. We show that the existence
of a winning strategy is equivalent to the meagerness of C. This equivalence result is useful in practice, since it is often easier to find a winning strategy, rather
than a finite extension strategy.
Definition 6.
1. A play of a Banach-Mazur game is a pair (f, g) of strategies such that for
every string τ ∈ {0, 1}∗ , τ ❁ g(τ ).
2. The result R(f, g) of the play (f, g) is the unique element of {0, 1}∞ that
extends (g ◦ f )i (λ) for every i ∈ N.
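As a concrete illustration of Definition 6, the following Python sketch iterates g ◦ f starting from λ with two made-up strategies (illustrative assumptions, not strategies from the paper); every round strictly extends the current prefix, and R(f, g) is the limit of these prefixes.

```python
# Toy illustration of a Banach-Mazur play (hypothetical strategies, not
# taken from the paper): player II's strategy g strictly extends its input,
# as Definition 6 requires.

def f(tau):
    return tau + "0"     # player I appends a 0

def g(tau):
    return tau + "1"     # player II appends a 1, so tau is a strict prefix of g(tau)

def result_prefix(f, g, rounds):
    """Prefix (g o f)^rounds(lambda) of the result R(f, g) of the play."""
    tau = ""             # the empty string lambda
    for _ in range(rounds):
        tau = g(f(tau))
    return tau

print(result_prefix(f, g, 4))  # 01010101
```

Each prefix extends the previous one, so the sequence of prefixes determines a unique infinite string, here 010101· · ·.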
For a class of languages C and two function classes FI and FII , denote by
G[C, FI , FII ] the Banach-Mazur game with distinguished set C, where player I
must choose a strategy in FI , and player II a strategy in FII . We say player II
wins the play (f, g) if R(f, g) ∉ C; otherwise we say player I wins. We say player
II has a winning strategy for the game G[C, FI , FII ] if there exists a strategy
g ∈ FII such that for every strategy f ∈ FI , player II wins (f, g).
The following result states that a class is meager iff there is a winning strategy
for player II. This is very useful since in practice it is often easier to give a winning
strategy for player II, than to exhibit a constructor avoiding every language in
the class.
338
P. Moser
Theorem 5. Let X be any class of languages. The following are equivalent.
1. Player II has a winning strategy for G[X, NN , P].
2. X is P-meager.
Proof. Suppose the first statement holds and let g be a P-computable winning
strategy for player II. Let M be a Turing machine computing g. We define an
indexed P-computable constructor h. Let k ∈ N and σ ∈ {0, 1}∗,

hk (σ) := g(σ′)  where σ′ = σ ∧ 0^{k −̇ |σ|}.  (5)

h is P-computable because computing hk (σ) simply requires simulating M^{σ′},
answering M's queries in dom(σ′)\dom(σ) with 0. We show that if a language A
meets hk for every k ∈ N, then A ∉ X. This implies that X is P-meager as
witnessed by h. To do this we show that for every α ❁ χA there is a string β
such that

α ⊑ β ⊑ g(β) ❁ χA .  (6)

If this holds, then player I has a strategy f yielding R(f, g) = A: for a
given α, player I extends it to obtain the corresponding β, thus forcing player II
to extend to a prefix of χA. Since g is a winning strategy, A = R(f, g) ∉ X.
So let α be any prefix of χA, where |α| = k. Since A
meets hk, there is a string σ ❁ χA such that

σ′ ⊑ g(σ′) = hk (σ) ❁ χA  (7)

where σ′ = σ ∧ 0^{k −̇ |σ|}. Since |α| ≤ |σ′| and α, σ′ are prefixes of χA, we have
α ⊑ σ′. Define β to be σ′.
For the other direction, let X be P-meager as witnessed by h, i.e. for every A ∈
X there exists i ∈ N such that hi avoids A. Let N be a Turing machine computing
h. We define a P-computable constructor g inducing a winning strategy for player
II in the game G[X, NN , P]. We show that for any strategy f , R(f, g) meets hi
for every i ∈ N, which implies R(f, g) ∉ X. Here is a description of a Turing
machine M computing g. For a string σ, M^σ does the following.
1. Compute n0 = min_{t≤n} [(∀τ ⊑ σ such that |τ | ≤ n) ht (τ ) ̸⊑ σ], where n = |s_{|σ|}|.
2. If no such n0 exists, output 0.
3. If n0 exists (hn0 is the next strategy to be met), simulate N^{σ∧0}(0^{n0}), answering N's queries in dom(σ ∧ 0)\dom(σ) with 0, and denote N's answer by ω.
Output 0 ∧ ω.
g is clearly P-computable. We show that R(f, g) meets every hi for any
strategy f . Suppose for a contradiction that this is not the case, i.e. there is a
strategy f such that R(f, g) does not meet some hn0, where n0 is the smallest
such index. Since R(f, g) meets hn0−1, there is a string τ
such that hn0−1 (τ ) ❁ R(f, g). Since g strictly extends strings at every round,
after at most 2^{O(|τ|)} rounds f will output a string σ long enough for step 1
(of M's description) to find out that hn0−1 (τ ) ⊑ σ ❁ R(f, g), thus incrementing
n0 − 1 to n0. At this round we have g(σ) = σ ∧ 0 ∧ ext hn0 (σ ∧ 0), i.e.
hn0 (σ ∧ 0) ❁ R(f, g), which is a contradiction.
⊓⊔
Baire’s Categories on Small Complexity Classes
339
It is easy to check that throughout Section 3, P can be replaced by QP
or Eǫ , thus yielding a Baire’s category notion on both quasi-polynomial and
subexponential time classes.
4 Baire's Category on BPP
To construct a notion of Baire’s category on probabilistic classes, we will use the
following probabilistic indexed strategies.
Definition 7. An indexed strategy h : N × {0, 1}∗ → {0, 1}∗ is BPP-computable
if there is a probabilistic oracle Turing machine M such that for every τ ∈ {0, 1}∗
and every i, n ∈ N,
Pr[M τ (0i , 0n ) = ext hi (τ )] ≥ 1 − 2−n
(8)
where the probability is taken over the internal coin tosses of M , and M runs in
time polynomial in |s_{|τ|}| + i + n.
By using standard Chernoff bound arguments it is easy to show that Definition 7 is robust, i.e. the success probability can range from 1/2 + 1/p(n) to
1 − 2−q(n) for any polynomials p, q, without enlarging (resp. reducing) the class
of strategies defined in Definition 7.
As in Section 3, a class is meager if there is a single probabilistic strategy
that avoids every language in the class.
Definition 8. A class of languages C is BPP-meager if there exists a BPP-computable indexed strategy h, such that for every L ∈ C there exists i ∈ N, such
that hi avoids L.
As in Section 3, we need to define “easy infinite unions” precisely in order to
prove the third axiom.
Definition 9. X = ∪i∈N Xi is a BPP-union of BPP-meager sets if there exists
an indexed BPP-computable strategy h : N × N × {0, 1}∗ → {0, 1}∗ , such that for
every i ∈ N, hi,· witnesses Xi ’s meagerness.
Let us prove that all three axioms hold for our Baire’s category notion on
BPP.
Theorem 6. For any language L in BPP, {L} is BPP-meager.
Proof. The proof is similar to Theorem 2 except that the constructor h is computed with error probability smaller than 2−n .
⊓⊔
The third axiom holds by definition.
Theorem 7.
1. All subsets of a BPP-meager set are BPP-meager.
2. A BPP-union of BPP-meager sets is BPP-meager.
Proof. Immediate by definition of BPP-meagerness.
⊓⊔
Let us prove the second axiom.
Theorem 8. BPP is not BPP-meager.
Proof. The proof is similar to that of Theorem 4, except for the second step of N's
computation, where every simulation of M is performed with error probability
smaller than 2−n. Since there are n distinct simulations of M, the total error
probability is smaller than n2−n, which ensures that L is in BPP. ⊓⊔
4.1 Resource-Bounded Banach-Mazur Games
Similarly to Section 3.1, we give an alternative characterization of meager sets
through resource-bounded Banach-Mazur games.
Theorem 9. Let X be any class of languages. The following are equivalent.
1. Player II has a winning strategy for G[X, NN , BPP].
2. X is BPP-meager.
Proof. The direction from 1 to 2 is similar to that of Theorem 5, except that hk (σ) is
computed with error probability smaller than 2−n.
For the other direction, the only difference with Theorem 5 is that the first
and third steps of M's computation can be performed with small error probability. ⊓⊔
5 Application to the P = BPP Problem
It was shown in [3] that for every ǫ > 0, almost every language A ∈ Eǫ, in the
sense of resource-bounded measure, satisfies P^A = BPP^A. We improve their result
by showing that for every ǫ > 0, almost every language A ∈ Eǫ, in the sense of
resource-bounded Baire's category, satisfies P^A = BPP^A.
Theorem 10. For every ǫ > 0, the set of languages A such that P^A ≠ BPP^A is
Eǫ-meager.
Proof. Let ǫ > 0. Let 0 < δ < min(ǫ, 1/4) and b > 2kδ/ǫ, where k is some constant that will be determined later. Consider the following strategy h, computed
by the following Turing machine M. On input σ, where |s_{|σ|}| = n, M does the
following. At start Z = ∅ and i = 1. M computes zi in the following way. Determine whether pos(s_{|σ|+i}) = pos(0^{2^{b|u|}} u) for some string u of size log(n^{2/b}). If not
then zi = 0, output zi, and compute zi+1; else denote by ui the corresponding
string u. Construct the set Ti of all truth tables of |ui|-input Boolean circuits
C with oracle gates for σ of size less than 2^{δ|ui|}, such that C(uj) = zj for every
(uj, zj) ∈ Z. Compute Mi = Majority_{C∈Ti}[C(ui)], and let zi = 1 − Mi. Add
(ui, zi) to Z. Output zi, and compute zi+1, unless ui = 1^{log(n^{2/b})} (i.e. ui is the
last string of size log(n^{2/b})), in which case M stops.
Since there are 2^{n^{4δ/b}} circuits to simulate, and simulating such a circuit takes
time O(n^{4δ/b}) by answering its queries to σ with the input σ, M runs in time
2^{n^{ǫ′}}, where ǫ′ < ǫ. Finally, computing the majority Mi takes time 2^{O(n^{4δ/b})}. Thus
the total running time is less than 2^{n^{2cδ/b}} for some constant c, which is less than
2^{n^{ǫ′}} with ǫ′ < ǫ for an appropriate choice of k.
Let A be any language and consider F(A) := {u | 0^{2^{b|u|}} u ∈ A}. It is clear that
F(A) ∈ E^A. Consider H^A_δ, the set of languages with high circuit complexity, i.e.
H^A_δ = {L | every n-input circuit with oracle gates for A of size less than 2^{δn}
fails to compute L}. By Theorem 1, F(A) ∈ H^A_δ implies P^A = BPP^A.
We show that h avoids every language A such that F(A) ∉ H^A_δ. So let A
be any such language, i.e. there is an n-input circuit family {Cn}n>0, with oracle
gates for A, of size less than 2^{δn} computing F(A). We have
C(ui) = 1 iff 0^{2^{b|ui|}} ui ∈ A for every string ui such that (ui, zi) ∈ Z.  (9)
(for simplicity we omit C's index). Consider the set Dn of all circuits with
log(n^{2/b}) inputs of size at most n^{2δ/b} with oracle gates for A satisfying Equation
9. We have |Dn| ≤ 2^{n^{4δ/b}}. By construction, every zi such that (ui, zi) ∈ Z reduces
the cardinality of Dn by a factor 2. Since there are n^{2/b} zi's such that (ui, zi) ∈ Z,
we have |Dn| ≤ 2^{n^{4δ/b}} · 2^{−n^{2/b}} < 1, i.e. Dn = ∅. Therefore h(σ) ̸❁ χA. ⊓⊔
6 Conclusion
Theorem 4 shows that the class SPARSE of all languages with polynomial density is not P-meager. To remedy this situation we can increase the power of
P-computable strategies by considering locally computable strategies, which can
avoid SPARSE and even the class of languages of subexponential density. This
issue will be addressed in [13].
References
1. Lutz, J.: Category and measure in complexity classes. SIAM Journal on Computing
19(1990) 1100–1131
2. Lutz, J.: Almost everywhere high nonuniform complexity. Journal of Computer
and System Sciences 44 (1992) 220–258
3. Allender, E., Strauss, M.: Measure on small complexity classes, with application
for BPP. Proceedings of the 35th Annual IEEE Symposium on Foundations of
Computer Science (1994) 807–818
4. Strauss, M.: Measure on P: Strength of the notion. Inform. and Comp. 136:1 (1997)
1–23
5. Regan, K., Sivakumar, D.: Probabilistic martingales and BPTIME classes. In Proc.
13th Annual IEEE Conference on Computational Complexity (1998) 186–200
6. Moser, P.: A generalization of Lutz’s measure to probabilistic classes. submitted
(2002)
7. Ambos-Spies, K.: Resource-bounded genericity. Proceedings of the Tenth Annual
Structure in Complexity Theory Conference (1995) 162–181
8. Balcázar, J.L., Díaz, J., and Gabarró, J.: Structural Complexity I. EATCS Monographs on Theoretical Computer Science Volume 11, Springer-Verlag (1995)
9. Balcázar, J.L., Díaz, J., and Gabarró, J.: Structural Complexity II. EATCS Monographs on Theoretical Computer Science Volume 22, Springer-Verlag (1990)
10. Papadimitriou, C.: Computational Complexity. Addison-Wesley (1994)
11. Klivans, A., van Melkebeek, D.: Graph nonisomorphism has subexponential size proofs
unless the polynomial hierarchy collapses. Proceedings of the 31st Annual ACM
Symposium on Theory of Computing (1999) 659–667
12. Impagliazzo, R., Wigderson, A.: P = BPP if E requires exponential circuits: derandomizing the XOR lemma. Proceedings of the 29th Annual ACM Symposium
on Theory of Computing (1997) 220–229
13. Moser, P.: Locally computed Baire’s categories on small complexity classes. submitted (2002)
Operations Preserving Recognizable Languages
Jean Berstel¹, Luc Boasson², Olivier Carton²,
Bruno Petazzoni³, and Jean-Éric Pin²
¹ Institut Gaspard Monge, Université de Marne-la-Vallée,
5, boulevard Descartes, Champs-sur-Marne, F-77454 Marne-la-Vallée Cedex 2,
berstel@univ-mlv.fr
² LIAFA, Université Paris VII and CNRS, Case 7014,
2 Place Jussieu, F-75251 Paris Cedex 05, France†
{Olivier.Carton,Luc.Boasson,Jean-Eric.Pin}@liafa.jussieu.fr
³ Lycée Marcelin Berthelot, Saint-Maur
bpetazzoni@ac-creteil.fr
Abstract. Given a subset S of N, filtering a word a0 a1 · · · an by S consists in deleting the letters ai such that i is not in S. By a natural
generalization, denote by L[S], where L is a language, the set of all
words of L filtered by S. The filtering problem is to characterize the filters S such that, for every recognizable language L, L[S] is recognizable.
In this paper, the filtering problem is solved, and a unified approach is
provided to solve similar questions, including the removal problem considered by Seiferas and McNaughton. There are two main ingredients in
our approach: the first one is the notion of residually ultimately periodic
sequences, and the second one is the notion of representable transductions.
1 Introduction
The original motivation of this paper was to solve an automata-theoretic puzzle,
proposed by the fourth author (see also [8]), that we shall refer to as the filtering
problem. Given a subset S of N, filtering a word a0 a1 · · · an by S consists in
deleting the letters ai such that i is not in S. By a natural generalization, denote
by L[S], where L is a language, the set of all words of L filtered by S. The filtering
problem is to characterize the filters S such that, for every recognizable language
L, L[S] is recognizable. The problem is non-trivial since, for instance, it can be
shown that the filter {n! | n ∈ N} preserves recognizable languages.
The quest for this problem led us to search for analogous questions in the
literature. Similar puzzles were already investigated in the seminal paper of
Stearns and Hartmanis [14], but the most relevant reference is the paper [12] of
Seiferas and McNaughton, in which the so-called “removal problem” was solved:
characterize the subsets S of N2 such that, for each recognizable language L, the
language
P (S, L) = {u ∈ A∗ | there exists v ∈ A∗ such that (|u|, |v|) ∈ S and uv ∈ L}
is recognizable.
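For small alphabets and short words, the removal operation can be explored by brute force. The following Python sketch enumerates candidates uv directly from the definition of P(S, L); the particular choices of S and L at the bottom are hypothetical examples, not taken from the paper.

```python
# Brute-force exploration of P(S, L) over A = {a, b}, directly from the
# definition; the S and L below are made-up examples for illustration.
from itertools import product

A = "ab"

def words(max_len):
    """All words over A of length at most max_len."""
    for n in range(max_len + 1):
        for w in product(A, repeat=n):
            yield "".join(w)

def removal(S, L, max_u, max_v):
    """{u : exists v with (|u|, |v|) in S and uv in L}, length-bounded."""
    return {u for u in words(max_u)
            if any(S(len(u), len(v)) and L(u + v) for v in words(max_v))}

even_length = lambda w: len(w) % 2 == 0      # a recognizable language
same_length = lambda m, n: m == n            # S = {(m, m) : m in N}
print(sorted(removal(same_length, even_length, 2, 2)))
```

With these choices every word u belongs to P(S, L), since appending any v with |v| = |u| makes uv of even length.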
† Work supported by INTAS project 1224.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 343–354, 2003.
© Springer-Verlag Berlin Heidelberg 2003
344
J. Berstel et al.
The aim of this paper is to provide a unified approach to solve at the same
time the filtering problem, the removal problem and similar questions. There are
two main ingredients in our approach. The first one is the notion of residually
ultimately periodic sequences, introduced in [12] as a generalization of a similar
notion introduced by Siefkes [13]. The second one is the notion of representable
transductions introduced in [9,10]. Complete proofs will be given in the extended
version of this article.
Our paper is organized as follows. Section 2 introduces some basic definitions: rational and recognizable sets, etc. The precise formulation of the filtering
problem is given in Section 3. Section 4 is dedicated to transductions. Residually ultimately periodic sequences are studied in Section 5 and the properties of
differential sequences are analyzed in Section 6. Section 7 is devoted to residually representable transductions. Our main results are presented in Section 8.
Further properties of residually ultimately periodic sequences are discussed in
Section 9. The paper ends with a short conclusion.
2 Preliminaries and Background

2.1 Rational and Recognizable Sets
Given a multiplicative monoid M , the subsets of M form a semiring P(M )
under union as addition and subset multiplication defined by XY = {xy | x ∈
X and y ∈ Y }. Throughout this paper, we shall use the following convenient
notation. If X is a subset of M, and K is a subset of N, we set X^K = ∪n∈K X^n.
Recall that the rational subsets of a monoid M form the smallest subset
R of P(M ) containing the finite subsets of M and closed under finite union,
product, and star (where X ∗ is the submonoid generated by X). The set of
rational subsets of M is denoted by Rat(M ). It is a subsemiring of P(M ).
Recall that a subset P of a monoid M is recognizable if there exists a finite
monoid F and a monoid morphism ϕ : M → F such that P = ϕ−1 (ϕ(P )). By
Kleene’s theorem, a subset of a finitely generated free monoid is recognizable if
and only if it is rational. Various characterizations of the recognizable subsets
of N are given in Proposition 1 below, but we need first to introduce some
definitions.
A sequence (sn )n≥0 of elements of a set is ultimately periodic (u.p.) if there
exist two integers m ≥ 0 and r > 0 such that, for each n ≥ m, sn = sn+r .
The (first) differential sequence of an integer sequence (sn)n≥0 is the sequence
∂s defined by (∂s)n = sn+1 − sn. Note that the integration formula
sn = s0 + Σ0≤i≤n−1 (∂s)i allows one to recover the original sequence from its differential
sequence and s0. A sequence is syndetic if its differential sequence is bounded.
If S is an infinite subset of N, the enumerating sequence of S is the unique
strictly increasing sequence (sn )n≥0 such that S = {sn | n ≥ 0}. The differential
sequence of this sequence is simply called the differential sequence of S. A set is
syndetic if its enumerating sequence is syndetic.
The characteristic sequence of a subset S of N is the sequence cn equal to 1
if n ∈ S and to 0 otherwise. The following elementary result is folklore.
Proposition 1. Let S be a set of non-negative integers. The following conditions are equivalent:
(1) S is recognizable,
(2) S is a finite union of arithmetic progressions,
(3) the characteristic sequence of S is ultimately periodic.
If S is infinite, these conditions are also equivalent to the following condition:
(4) the differential sequence of S is ultimately periodic.
Example 1. Let S = {1, 3, 4, 9, 11} ∪ {7 + 5n | n ≥ 0} ∪ {8 + 5n | n ≥ 0} =
{1, 3, 4, 7, 8, 9, 11, 12, 13, 17, 18, 22, 23, 27, 28, . . . }. Its characteristic sequence
0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, . . .
and its differential sequence 2, 1, 3, 1, 1, 2, 1, 1, 4, 1, 4, 1, 4, . . . are ultimately periodic.
2.2 Relations
Given two sets E and F , a relation on E and F is a subset of E × F . The
inverse of a relation S on E and F is the relation S −1 on F × E defined by
(y, x) ∈ S −1 if and only if (x, y) ∈ S. A relation S on E and F can also be
considered as a function from E into P(F ), the set of subsets of F , by setting,
for each x ∈ E, S(x) = {y ∈ F | (x, y) ∈ S}. It can also be viewed as a function
from P(E) into P(F) by setting, for each subset X of E:

S(X) = ∪x∈X S(x) = {y ∈ F | there exists x ∈ X such that (x, y) ∈ S}
Dually, S −1 can be viewed as a function from P(F ) into P(E) defined, for each
subset Y of F , by S −1 (Y ) = {x ∈ E | S(x) ∩ Y = ∅}. When this “dynamical”
point of view is adopted, we say that S is a relation from E into F and we use
the notation S : E → F .
A relation S : N → N is recognizability preserving if, for each recognizable
subset R of N, the set S −1 (R) is recognizable.
3 Filtering Languages
A filter is a finite or infinite increasing sequence s of non-negative integers. If
u = a0 a1 a2 · · · is an infinite word (the ai are letters), we set u[s] = as0 as1 · · · .
Similarly, if u = a0 a1 a2 · · · an is a finite word, we set u[s] = as0 as1 · · · ask , where
k is the largest integer such that sk ≤ n < sk+1 . Thus, for instance, if s is the
sequence of squares, abracadabra[s] = abcr.
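A direct Python transcription of this definition reproduces the abracadabra example (for a strictly increasing filter, keeping the letters whose indices fall inside the word is equivalent to the definition with the largest k such that sk ≤ n):

```python
# Filter a finite word u by an increasing sequence s: keep the letters
# a_{s_0}, a_{s_1}, ... whose indices fall inside u.

def filter_word(u, s):
    return "".join(u[i] for i in s if i < len(u))

squares = [n * n for n in range(10)]          # 0, 1, 4, 9, 16, ...
print(filter_word("abracadabra", squares))    # abcr
```

Indeed, the letters of abracadabra at positions 0, 1, 4 and 9 are a, b, c and r.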
By extension, if L is a language (resp. a set of infinite words), we set
L[s] = {u[s] | u ∈ L}
If s is the enumerating sequence of a subset S of N, we also use the notation
L[S]. If, for every recognizable language L, the set L[s] is recognizable, we say
that the filter S preserves recognizability. The filtering problem is to characterize
the recognizability preserving filters.
4 Transductions
In this paper, we consider transductions that are relations from a free monoid
A∗ into a monoid M . Transductions were intensively studied in connection with
context-free languages [1].
Some transductions can be realized by a non-deterministic automaton with
output in P(M ), called transducer. More precisely, a transducer is a 6-tuple
T = (Q, A, M, I, F, E) where Q is a finite set of states, A is the input alphabet,
M is the output monoid, I = (Iq )q∈Q and F = (Fq )q∈Q are arrays of elements
of P(M ), called respectively the initial and final outputs. The set of transitions
E is a finite subset of Q × A × P(M ) × Q. Intuitively, a transition (p, a, R, q) is
interpreted as follows: if a is an input letter, the automaton moves from state p
to state q and produces the output R.
A path is a sequence of consecutive transitions:
q0 −a1|R1→ q1 −a2|R2→ q2 · · · qn−1 −an|Rn→ qn
The (input) label of the path is the word a1 a2 · · · an . Its output is the set
Iq0 R1 R2 · · · Rn Fqn . The transduction realized by T maps each word u of A∗
onto the union of the outputs of all paths of input label u.
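To make the path-output definition concrete, here is a minimal Python sketch of a transducer whose output sets are all finite (a simplifying assumption; in general outputs live in P(M) and may be infinite). The example transducer at the bottom is made up for illustration.

```python
# Minimal transducer simulation with finite output sets: a transition is a
# tuple (p, a, R, q), and the output of a path is Iq0 . R1 . ... . Rn . Fqn.

def realize(I, F, E, u):
    """Union of the outputs of all paths with input label u."""
    # current maps a state q to the partial outputs of paths ending in q
    current = {q: set(vals) for q, vals in I.items()}
    for a in u:
        nxt = {}
        for (p, letter, R, q) in E:
            if letter == a and p in current:
                nxt.setdefault(q, set()).update(
                    x + r for x in current[p] for r in R)
        current = nxt
    return {x + f for q in current for f in F.get(q, ()) for x in current[q]}

# A one-state example: each input a may output "a" or "aa"; b outputs "b".
I, F = {0: {""}}, {0: {""}}
E = [(0, "a", {"a", "aa"}, 0), (0, "b", {"b"}, 0)]
print(sorted(realize(I, F, E, "ab")))   # ['aab', 'ab']
```

Each input letter multiplies the current set of partial outputs by the output set of the matching transition, which is exactly the set product in P(M).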
A transduction τ : A∗ → M is said to be rational if τ is a rational subset of
the monoid A∗ × M . By the Kleene-Schützenberger theorem [1], a transduction
τ : A∗ → M is rational if and only if it can be realized by a rational transducer,
that is, a transducer with outputs in Rat(M ).
A transduction τ : A∗ → M is said to preserve recognizability, if, for each
recognizable subset P of M , τ −1 (P ) is a recognizable subset of A∗ . It is well
known that rational transductions preserve recognizability, but this property is
also shared by the larger class of representable transductions, introduced in [9,
10].
Two types of transduction will play an important role in this paper, the
removal transductions and the filtering transductions. Given a subset S of N²,
considered as a relation on N, the removal transduction of S is the transduction
σS : A∗ → A∗ defined by σS(u) = ∪(|u|,n)∈S uA^n. The filtering transduction
of a filter s is the transduction τs : A∗ → A∗ defined by τs(a0 a1 · · · an) =
A^{s0} a0 A^{s1} a1 · · · A^{sn} an A^{{0,1,...,sn+1}}.
The main idea of [9,10] is to write an n-ary operator Ω on languages as the
inverse of some transduction τ : A∗ → A∗ × · · · × A∗ , that is Ω(L1 , . . . , Ln ) =
τ −1 (L1 ×· · ·×Ln ). If the transduction τ turns out to be representable, the results
of [9,10] give an explicit construction of a monoid recognizing Ω(L1 , . . . , Ln ),
given monoids recognizing L1 , . . . , Ln , respectively.
In our case, we claim that P(S, L) = σS^{−1}(L) and L[s] = τ_{∂s−1}^{−1}(L). Indeed,
we have on the one hand

σS^{−1}(L) = {u ∈ A∗ | ∪(|u|,n)∈S uA^n ∩ L ≠ ∅}
= {u ∈ A∗ | there exists v ∈ A∗ such that (|u|, |v|) ∈ S and uv ∈ L}
= P(S, L)

and on the other hand

τ_{∂s−1}^{−1}(L) = {a0 a1 · · · an ∈ A∗ |
A^{s0−1} a0 A^{s1−s0−1} a1 · · · A^{sn−sn−1−1} an A^{{0,1,...,sn+1−sn−1}} ∩ L ≠ ∅}
= L[s]
Unfortunately, the removal transductions and the filtering transductions are not
in general representable. We shall see in Section 7 how to overcome this difficulty.
But we first need to introduce our second major tool, the residually ultimately
periodic sequences.
5 Residually Ultimately Periodic Sequences
Let M be a monoid. A sequence (sn )n≥0 of elements of M is residually ultimately
periodic (r.u.p.) if, for each monoid morphism ϕ from M into a finite monoid F ,
the sequence ϕ(sn ) is ultimately periodic.
We are mainly interested in the case where M is the additive monoid N of
non-negative integers. The following connection with recognizability preserving
sequences was established in [5,7,12,16].
Proposition 2. A sequence (sn )n≥0 of non-negative integers is residually ultimately periodic if and only if the function n → sn preserves recognizability.
For each non-negative integer t, define the congruence threshold t by setting:

x ≡ y (thr t) if and only if x = y < t, or x ≥ t and y ≥ t.
Thus threshold counting can be viewed as a formalisation of the way children
count: zero, one, two, three, . . . , many.
A function s : N → N is said to be ultimately periodic modulo p if, for each
monoid morphism ϕ : N → Z/pZ, the sequence un = ϕ(s(n)) is ultimately
periodic. It is equivalent to state that there exist two integers m ≥ 0 and r > 0
such that, for each n ≥ m, un ≡ un+r (mod p). A sequence is said to be cyclically
ultimately periodic (c.u.p.) if it is ultimately periodic modulo p for every p > 0.
These functions are called “ultimately periodic reducible” in [12,13].
Similarly, a function s : N → N is said to be ultimately periodic threshold t if,
for each monoid morphism ϕ : N → Nt,1 , the sequence un = ϕ(s(n)) is ultimately
periodic. It is equivalent to state that there exist two integers m ≥ 0 and r > 0
such that, for each n ≥ m, un ≡ un+r (thr t).
Proposition 3. A sequence of non-negative integers is residually ultimately periodic if and only if it is ultimately periodic modulo p for all p > 0 and ultimately
periodic threshold t for all t ≥ 0.
The next proposition gives a very simple criterion to generate sequences that
are ultimately periodic threshold t for all t.
Proposition 4. A sequence (un )n≥0 of integers such that limn→∞ un = +∞ is
ultimately periodic threshold t for all t ≥ 0.
Example 2. The sequence n! is residually ultimately periodic. Indeed, let p be
a positive integer. Then for each n ≥ p, n! ≡ 0 mod p and thus n! is ultimately
periodic modulo p. Furthermore, Proposition 4 shows that, for each t ≥ 0, n! is
ultimately periodic threshold t.
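Example 2 can be checked numerically. The sketch below verifies, for one choice of p and t, that n! is ultimately periodic modulo p (eventually constantly 0) and ultimately periodic threshold t (eventually constantly "many"):

```python
# Check Example 2 for p = 7 and t = 3: n! mod p is eventually 0, and the
# threshold-t value of n! is eventually t (i.e. "many").
from math import factorial

def thr(x, t):
    return x if x < t else t      # values 0, 1, ..., t-1, and t for "many"

p, t = 7, 3
mods = [factorial(n) % p for n in range(30)]
thrs = [thr(factorial(n), t) for n in range(30)]
print(mods[:10])                        # [1, 1, 2, 6, 3, 1, 6, 0, 0, 0]
assert all(m == 0 for m in mods[p:])    # ultimately periodic with period 1
assert all(v == t for v in thrs[4:])    # 4! = 24 >= 3, and so on
```

The assertions mirror the two halves of the example: n! ≡ 0 (mod p) for n ≥ p, and n! ≥ t from some point on since n! tends to infinity.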
The class of cyclically ultimately periodic functions has been thoroughly
studied by Siefkes [13], who gave in particular a recursion scheme for producing
such functions. Residually ultimately periodic sequences have been studied in [3,
5,7,12,15,16]. Their properties are summarized in the next proposition.
Theorem 1. [16,3] Let (un )n≥0 and (vn )n≥0 be r.u.p. sequences. Then the following sequences are also r.u.p.:
(1) (composition) uvn ,
(2) (sum) un + vn ,
(3) (product) un vn ,
(4) (difference) un − vn provided that un ≥ vn and lim (un − vn ) = +∞,
n→∞
(5) (exponentiation) un^{vn},
(6) (generalized sum) Σ0≤i≤vn ui,
(7) (generalized product) Π0≤i≤vn ui.
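A brute-force sanity check (not a proof) of some of these closure properties, restricted to cyclic counting: starting from the r.u.p. sequences n and 2^n, the sum, product and exponentiation should again be ultimately periodic modulo any fixed p. The choice p = 12 below is arbitrary.

```python
# Check, modulo p = 12, that n + 2^n, n * 2^n and (n+1)^(2^n) are
# ultimately periodic, as closure under sum, product and exponentiation
# predicts; three-argument pow keeps the last sequence computable.

def is_ultimately_periodic(s):
    """True if some (m, r) with s[n] == s[n+r] for all n >= m fits in s."""
    third = len(s) // 3
    return any(
        all(s[n] == s[n + r] for n in range(m, len(s) - r))
        for r in range(1, third) for m in range(third))

p, N = 12, 200
sums     = [(n + 2 ** n) % p for n in range(N)]
products = [(n * 2 ** n) % p for n in range(N)]
powers   = [pow(n + 1, 2 ** n, p) for n in range(N)]
assert all(map(is_ultimately_periodic, (sums, products, powers)))
print("ultimately periodic mod", p)
```

The detector only looks for periods that fit inside the sampled window, so a success is evidence, not proof; the theorem guarantees such a period exists for every p.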
In particular, the sequences n^k and k^n (for a fixed k) are residually ultimately
periodic. However, r.u.p. sequences are not closed under quotients. For instance,
let un be the sequence equal to 1 if n is prime and to n! + 1 otherwise. Then
n·un is r.u.p. but un is not. This answers a question left open in [15].
The sequence 2^{2^{···^2}} (an exponential stack of 2's of height n), considered in [12],
is also a r.u.p. sequence, according to the following result.
Proposition 5. Let k be a positive integer. Then the sequence un defined by
u0 = 1 and un+1 = k^{un} is r.u.p.
The existence of non-recursive, r.u.p. sequences was established in [12]: if
ϕ : N → N is a strictly increasing, non-recursive function, then the sequence
un = n!ϕ(n) is non-recursive but is residually ultimately periodic. The proof is
similar to that of Example 2.
6 Differential Sequences
An integer sequence is called differentially residually ultimately periodic (d.r.u.p.
in abbreviated form), if its differential sequence is residually ultimately periodic.
What are the connections between d.r.u.p. sequences and r.u.p. sequences?
First, the following result holds:
Proposition 6. [3, Corollary 28] Every d.r.u.p. sequence is r.u.p.
However, the two notions are not equivalent: for instance, it was shown in [3]
that if bn is a non-ultimately periodic sequence of 0's and 1's, the sequence
un = (Σ0≤i≤n bi)! is r.u.p. but is not d.r.u.p. It suffices to observe that (∂u)n ≡ bn
threshold 1.
Note that, if only cyclic counting were used, it would make no difference:
Proposition 7. Let p be a positive number. A sequence is ultimately periodic
modulo p if and only if its differential sequence is ultimately periodic modulo p.
There is a special case for which the notions of r.u.p. and d.r.u.p. sequences
are equivalent. Indeed, if the differential sequence is bounded, Proposition 1 can
be completed as follows.
Lemma 1. If a syndetic sequence is residually ultimately periodic, then its differential sequence is ultimately periodic.
Putting everything together, we obtain
Proposition 8. Let s be a syndetic sequence of non-negative integers. The following conditions are equivalent:
(1) s is residually ultimately periodic,
(2) ∂s is residually ultimately periodic,
(3) ∂s is ultimately periodic.
Proof. Proposition 6 shows that (2) implies (1). Furthermore (3) implies (2) is
trivial. Finally, Lemma 1 shows that (1) implies (3).
Proposition 9. Let S be an infinite syndetic subset of N. The following conditions are equivalent:
(1) S is recognizable,
(2) the enumerating sequence of S is residually ultimately periodic,
(3) the differential sequence of S is residually ultimately periodic,
(4) the differential sequence of S is ultimately periodic.
Proof. The last three conditions are equivalent by Proposition 8 and the equivalence of (1) and (4) follows from Proposition 1.
The class of d.r.u.p. sequences was thoroughly studied in [3].
Theorem 2. [3, Theorem 22] Differentially residually ultimately periodic sequences are closed under sum, product, exponentiation, generalized sum and generalized product. Furthermore, given two d.r.u.p. sequences (un)n≥0 and (vn)n≥0
such that un ≥ vn and lim n→∞ ((∂u)n − (∂v)n) = +∞, the sequence un − vn is d.r.u.p.
7 Residually Representable Transductions
Let M be a monoid. A transduction τ : A∗ → M is residually rational (resp.
residually representable) if, for every monoid morphism α from M into a finite
monoid N, the transduction α ◦ τ : A∗ → N is rational (resp. representable).
Since a rational transduction is (linearly) representable, every residually rational transduction is residually representable. Furthermore, every representable
transduction is residually representable. We now show that the removal transductions and the filtering transductions are residually rational. We first consider
the removal transductions.
Fig. 1. A transducer realizing β.
Proposition 10. Let S be a recognizability preserving relation on N. The removal transduction of S is residually rational.
Proof. Let α be a morphism from A∗ into a finite monoid N. Let β = α ◦ σS and
R = α(A). Since the monoid P(N ) is finite, the sequence (Rn )n≥0 is ultimately
periodic. Therefore, there exist two integers r ≥ 0 and q > 0 such that, for all
n ≥ r, Rn = Rn+q . Consider the following subsets of N: K0 = {0}, K1 = {1},
. . . , Kr−1 = {r − 1}, Kr = {r, r + q, r + 2q, . . . }, Kr+1 = {r + 1, r + q + 1, r +
2q + 1, . . . }, . . . , Kr+q−1 = {r + q − 1, r + 2q − 1, r + 3q − 1, . . . }. The sets
Ki , for i ∈ {0, 1, . . . , r + q − 1} are recognizable and since S is recognizability
preserving, each set S −1 (Ki ) is also recognizable. By Proposition 1, there exist
two integers ti ≥ 0 and pi > 0 such that, for all n ≥ ti , n ∈ S −1 (Ki ) if and only
if n + pi ∈ S −1 (Ki ). Setting t = max0≤i≤r+q−1 ti and p = lcm0≤i≤r+q−1 pi , we
conclude that, for all n ≥ t and for 0 ≤ i ≤ r + q − 1, n ∈ S −1 (Ki ) if and only
if n + p ∈ S −1 (Ki ), or equivalently
S(n) ∩ Ki = ∅ ⇐⇒ S(n + p) ∩ Ki = ∅
It follows that the sequence Rn of P(N) defined by Rn = R^{S(n)} is ultimately
periodic with threshold t and period p, that is, Rn = Rn+p for all n ≥ t. Consequently, the transduction β can be realized by the transducer represented in
Figure 1, in which a stands for a generic letter of A. Therefore β is rational and
σS is residually rational.
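The finiteness argument at the start of this proof is easy to see on a toy instance. In the sketch below (the monoid Z/6Z under multiplication and the set R = {2, 3} are made-up choices, not from the paper), the sequence of set powers R, R², R³, . . . becomes periodic after finitely many steps.

```python
# In a finite monoid the powers of a subset R are ultimately periodic:
# here the monoid is Z/6Z under multiplication and R = {2, 3}.

def set_powers(R, op, count):
    powers, cur = [], frozenset(R)
    for _ in range(count):
        powers.append(cur)                       # powers[k] is R^{k+1}
        cur = frozenset(op(x, y) for x in cur for y in R)
    return powers

ps = set_powers({2, 3}, lambda x, y: (x * y) % 6, 12)
# find r and q with R^{n+q} = R^n for every n >= r (indices shifted by 1)
r, q = next((m, d) for d in range(1, 6) for m in range(6)
            if all(ps[n] == ps[n + d] for n in range(m, len(ps) - d)))
print(r, q, sorted(ps[r]))   # 1 2 [0, 3, 4]
```

Since P(N) is finite, only finitely many distinct sets can appear, so such r and q always exist; here the powers settle into the 2-cycle {0, 3, 4}, {0, 2, 3}.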
Fig. 2. A transducer realizing γs .
Proposition 11. Let s be a residually ultimately periodic sequence. Then the
filtering transduction τs is residually rational.
Proof. Let α be a morphism from A∗ into a finite monoid N. Let γs = α ◦ τs
and R = α(A). Finally, let ϕ : N → P(N) be the morphism defined by ϕ(n) =
R^n. Since P(N) is finite and sn is residually ultimately periodic, the sequence
ϕ(sn) = R^{sn} is ultimately periodic. Therefore, there exist two integers t ≥ 0 and
p > 0 such that, for all n ≥ t, R^{s_{n+p}} = R^{s_n}. It follows that the transduction γs
can be realized by the transducer represented in Figure 2, in which a stands for
a generic letter of A. Therefore γs is rational and thus τs is residually rational.
The fact that the two previous transducers preserve recognizability is now a
direct consequence of the following general statement.
Theorem 3. Let M be a monoid. Any residually rational transduction τ : A∗ →
M preserves recognizability.
Proof. Let P be a recognizable subset of M and let α : M → N be a morphism
recognizing P , where N is a finite monoid. By definition, α−1 (α(P )) = P . Since
τ is residually rational, the transduction α ◦ τ : A∗ → N is rational. Since N
is finite, every subset of N is recognizable. In particular, α(P ) is recognizable
and since α ◦ τ is rational, (α ◦ τ )−1 α(P ) is recognizable. The theorem
follows, since (α ◦ τ )−1 α(P ) = τ −1 (α−1 (α(P ))) = τ −1 (P ).
352
J. Berstel et al.
8 Main Results
The aim of this section is to provide a unified solution for the filtering problem
and the removal problem.
8.1 The Filtering Problem
Theorem 4. A filter preserves recognizability if and only if it is differentially
residually ultimately periodic.
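The filtering operation L[s] is defined earlier in the paper; for a concrete feel of Theorem 4, here we assume (as the notation suggests) that w[s] keeps exactly the letters of w at positions s0 , s1 , . . . (0-indexed), ignoring positions beyond |w|. The following Python sketch uses a filter and a language chosen purely for illustration:

```python
def filter_word(w, s):
    """Assumed reading of w[s]: keep only the letters of w at the
    positions s[0], s[1], ...; positions beyond |w| are ignored."""
    return "".join(w[i] for i in s if i < len(w))

def filter_language(L, s):
    return {filter_word(w, s) for w in L}

# Filter s_n = 2n (even positions): its differences are constant, so it is d.r.u.p.
s = [2 * n for n in range(10)]
L = {"ab" * k for k in range(1, 6)}      # words (ab)^k, a regular language
print(filter_language(L, s))             # here, every filtered word is a power of a
```

For this d.r.u.p. filter the result {a, aa, . . .} is again (the trace of) a regular language, in line with the easy direction of the theorem.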
Proposition 11 and Theorem 3 show that if a filter is d.r.u.p., then it preserves
recognizability. We now establish the converse property.
Proposition 12. Every recognizability preserving filter is differentially residually ultimately periodic.
Proof. Let s be a recognizability preserving filter. By Proposition 3 and 7, it
suffices to prove the following properties:
(1) for each p > 0, s is ultimately periodic modulo p,
(2) for each t ≥ 0, ∂s is ultimately periodic threshold t.
(1) Let p be a positive integer and let A = {0, 1, . . . , p − 1}. Let u = a0 a1 · · · be
the infinite word whose i-th letter ai is equal to si modulo p. At this stage, we
shall need two elementary properties of ω-rational sets. The first one states that
an infinite word u is ultimately periodic if and only if the ω-language {u} is
ω-rational. The second one states that, if L is a recognizable language of A∗ , then
−→L (the set of infinite words having infinitely many prefixes in L) is ω-rational.
We claim that u is ultimately periodic. Define L as the set of prefixes of the
infinite word (0123 · · · (p − 1))ω . Then L[s] is the set of prefixes of u. Since L is
recognizable, L[s] is recognizable, and thus the set −→L[s] is ω-rational. But this
set reduces to {u}, which proves the claim. Therefore, the sequence (sn )n≥0 is
ultimately periodic modulo p.
(2) The proof is quite similar to that of (1), but is slightly more technical. Let t be
a non-negative integer and let B = {0, 1, . . . , t}∪{a}, where a is a special symbol.
Let d = d0 d1 · · · be the infinite word whose i-th letter di is equal to si+1 − si − 1
threshold t. Let us prove that d is ultimately periodic. Consider the recognizable
prefix code P = {0, 1a, 2a2 , 3a3 , . . . , tat , a}. Then P ∗ [s] is recognizable, and
so is the language R = P ∗ [s] ∩ {0, 1, . . . , t}∗ . We claim that, for each n >
0, the word pn = d0 d1 · · · dn−1 is the maximal word of R of length n in the
lexicographic order induced by the natural order 0 < 1 < . . . < t. First, pn =
u[s], where u = as0 d0 as1 −s0 −1 d1 · · · dn−1 asn −sn−1 −1 and thus pn ∈ R. Next, let
p′n = d′0 d′1 · · · d′n−1 be another word of R of length n. Then p′n = u′ [s] for some
word u′ ∈ P ∗ . Suppose that p′n comes after pn in the lexicographic order. We may
assume that, for some index i ≤ n − 1, d0 = d′0 , d1 = d′1 , . . . , di−1 = d′i−1 and
di < d′i . Since u′ ∈ P ∗ , the letter d′i , which occurs in position si in u′ , is followed
by at least d′i letters a. Now d′i > di , whence di < t and di = si+1 − si − 1. It
follows in particular that in u′ , the letter in position si+1 is an a, a contradiction,
since u′ [s] contains no occurrence of a. This proves the claim.
Let now A be a finite deterministic trim automaton recognizing R. It follows
from the claim that in order to read d in A, starting from the initial state, it
suffices to choose, in each state q, the unique transition with maximal label in the
lexicographic order. It follows at once that d is ultimately periodic. Therefore,
the sequence (∂s) − 1 is ultimately periodic threshold t, and so is (∂s).
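The notion "ultimately periodic threshold t" — the sequence is compared only after capping its values at t — can be checked mechanically. A Python sketch with the arbitrary example s_n = n², whose difference sequence (∂s)_n = 2n + 1 grows beyond any threshold (none of these choices come from the paper):

```python
def ultimately_periodic(seq, max_r, max_q):
    """Search for a threshold r and period q of a (long enough) prefix."""
    for q in range(1, max_q + 1):
        for r in range(max_r + 1):
            if all(seq[n] == seq[n + q] for n in range(r, len(seq) - q)):
                return r, q
    return None

t = 5
s = [n * n for n in range(30)]             # s_n = n^2 is d.r.u.p.
ds = [s[n + 1] - s[n] for n in range(29)]  # (∂s)_n = 2n + 1
capped = [min(d, t) for d in ds]           # "threshold t": values capped at t
print(ultimately_periodic(capped, 10, 5))
```

The capped sequence 1, 3, 5, 5, 5, . . . is ultimately constant, so ∂s is ultimately periodic threshold 5 even though ∂s itself is unbounded.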
8.2 The Removal Problem
The solution of the removal problem was given in [12].
Theorem 5. Let S be a subset of N2 . The following conditions are equivalent:
(1) for each recognizable language L, the language P (S, L) is recognizable,
(2) S is a recognizability preserving relation.
The most difficult part of the proof, (2) implies (1), follows immediately from
Proposition 10 and Theorem 3.
9 Further Properties of d.r.u.p. Sequences
Coming back to the filtering problem, the question arises of characterizing the
filters S such that, for every recognizable language L, both L[S] and L[N \ S]
are recognizable. By Theorem 4, the sequences defined by S and its complement
should be d.r.u.p. This implies that S is recognizable, according to the following
slightly more general result.
Proposition 13. Let S and S ′ be two infinite subsets of N such that S ∪ S ′ and
S ∩ S ′ are recognizable. If the enumerating sequence of S is d.r.u.p. and if the
enumerating sequence of S ′ is r.u.p., then S and S ′ are recognizable.
One can show that the conclusion of Proposition 13 no longer holds if S ′ is
only assumed to be residually ultimately periodic.
10 Conclusion
Our solution to the filtering problem was based on the fact that any residually
rational transduction preserves recognizability. There are several advantages to
our approach.
First, it gives a unified solution to apparently disconnected problems, like the
filtering problem and the removal problem. Actually, most of – if not all – the
automata-theoretic puzzles proposed in [4,5,6,7,9,10,11,12,14] and [15, Section
5.2], can be solved by using the stronger fact that any residually representable
transduction preserves recognizability.
Next, refining the approach of [9,10], if τ : A∗ → A∗ × · · · × A∗ is a residually
representable transduction, one could give an explicit construction of a monoid
recognizing τ −1 (L1 × · · · × Ln ), given monoids recognizing L1 , . . . , Ln , respectively (the details will be given in the full version of this paper). This information
can be used, in turn, to see whether a given operation on languages preserves
star-free languages, or other standard classes of rational languages.
Acknowledgements. Special thanks to Michèle Guerlain for her careful reading of a first version of this paper and to the anonymous referees for their suggestions.
References
1. J. Berstel, Transductions and context-free languages, Teubner, Stuttgart, (1979).
2. O. Carton and W. Thomas, The monadic theory of morphic infinite words and generalizations, in MFCS 2000, Lecture Notes in Computer Science 1893, M. Nielsen
and B. Rovan, eds, (2000), 275–284.
3. O. Carton and W. Thomas, The monadic theory of morphic infinite words and
generalizations, Inform. Comput. 176, (2002), 51–76.
4. S. R. Kosaraju, Finite state automata with markers, in Proc. Fourth Annual
Princeton Conference on Information Sciences and Systems, Princeton, N. J.
(1970), 380.
5. S. R. Kosaraju, Regularity preserving functions, SIGACT News 6 (2), (1974), 16–17. Correction to “Regularity preserving functions”, SIGACT News 6 (3), (1974),
22.
6. S. R. Kosaraju, Context-free preserving functions, Math. Systems Theory 9,
(1975), 193–197.
7. D. Kozen, On regularity-preserving functions, Bull. Europ. Assoc. Theor. Comput.
Sc. 58 (1996), 131–138. Erratum: On Regularity-Preserving Functions, Bull. Europ.
Assoc. Theor. Comput. Sc. 59 (1996), 455.
8. A. B. Matos, Regularity-preserving letter selections, DCC-FCUP Internal Report.
9. J.-É. Pin and J. Sakarovitch, Operations and transductions that preserve rationality, in 6th GI Conference, Lecture Notes in Computer Science 145, Springer
Verlag, Berlin, (1983), 617–628.
10. J.-É. Pin and J. Sakarovitch, Une application de la représentation matricielle des
transductions, Theoret. Comp. Sci. 35 (1985), 271–293.
11. J. I. Seiferas, A note on prefixes of regular languages, SIGACT News 6, (1974),
25–29.
12. J. I. Seiferas and R. McNaughton, Regularity-preserving relations, Theoret. Comp.
Sci. 2, (1976), 147–154.
13. D. Siefkes, Decidable extensions of monadic second order successor arithmetic, in:
Automatentheorie und formale Sprachen, (Mannheim, 1970), J. Dörr and G. Hotz,
Eds, B.I. Hochschultaschenbücher, 441–472.
14. R. E. Stearns and J. Hartmanis, Regularity preserving modifications of regular
expressions, Information and Control 6, (1963), 55–69.
15. Guo-Qiang Zhang, Automata, Boolean matrices, and ultimate periodicity, Information and Computation, 152, (1999), 138–154.
16. Guo-Qiang Zhang, Periodic functions for finite semigroups, preprint.
Languages Defined by Generalized Equality Sets
Vesa Halava¹, Tero Harju¹, Hendrik Jan Hoogeboom², and Michel Latteux³
¹ Department of Mathematics and TUCS – Turku Centre for Computer Science,
University of Turku, FIN-20014, Turku, Finland
{vehalava,harju}@utu.fi
² Dept. of Comp. Science, Leiden University,
P.O. Box 9512, 2300 RA Leiden, The Netherlands
hoogeboom@liacs.nl
³ Université des Sciences et Technologies de Lille, Bâtiment M3,
59655 Villeneuve d’Ascq Cédex, France
latteux@lifl.fr
Abstract. We consider the generalized equality sets which are of the
form EG (a, g1 , g2 ) = {w | g1 (w) = ag2 (w)}, determined by instances
of the generalized Post Correspondence Problem, where the morphisms
g1 and g2 are nonerasing and a is a letter. We are interested in the
family consisting of the languages h(EG (J)), where h is a coding and
J is a shifted equality set of the above form. We prove several closure
properties for this family.
1 Introduction
In formal language theory, languages are often determined by their generating
grammars or accepting machines. It is also customary to say that languages
generated by grammars of certain form or accepted by automata of specific type
form a language family. Here we shall study a language family defined by simple
generalized equality sets of the form EG (J), where J = (a, g1 , g2 ) is an instance
of the shifted Post Correspondence Problem consisting of a letter a and two
morphisms g1 and g2 . Then the set EG (J) consists of the words w that satisfy
g1 (w) = ag2 (w).
Our motivation for these generalized equality sets comes partly from a result
of [2], where it was proved that the family of regular valence languages is equal to
the family of languages of the form h(EG (J)), where h is a coding (i.e., a letter-to-letter morphism), and, moreover, in the instance J = (a, g1 , g2 ) the morphism
g2 is periodic. Here we shall consider general case where we do not assume g2
to be periodic, but both morphisms to be nonerasing. We study the properties
of this family CE of languages by studying its closure properties. In particular,
we show that CE is closed under union, product, Kleene plus, intersection with
regular sets. Also, more surprisingly, CE is closed under nonerasing morphisms
and inverse morphisms.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 355–363, 2003.
© Springer-Verlag Berlin Heidelberg 2003
356
V. Halava et al.
2 Preliminaries
Let A be an alphabet, and denote by A∗ the monoid of all finite words under
the operation of catenation. Note that the empty word, denoted by ε, is in the
monoid A∗ . The semigroup A∗ \ {ε} generated by A is denoted by A+ .
For two words u, v ∈ A∗ , u is a prefix of v if there exists a word z ∈ A∗ such
that v = uz. This is denoted by u ≤ v. If v = uz, then we also write u = vz −1
and z = u−1 v.
In the following, let A and B be alphabets and g : A∗ → B ∗ a mapping. For
a word x ∈ B ∗ , we denote by g −1 (x) = {w ∈ A∗ | g(w) = x} the inverse image
of x under g. Then g −1 (K) = ∪x∈K g −1 (x) is the inverse image of K ⊆ B ∗
under g, and g(L) = {g(w) | w ∈ L} is the image of L ⊆ A∗ under g. Also, g is
a morphism if g(uv) = g(u)g(v) for all u, v ∈ A∗ . A morphism g is a coding, if it
maps letters to letters, that is, if g(A) ⊆ B. A morphism g is said to be periodic,
if there exists a word w ∈ B ∗ such that g(A∗ ) ⊆ w∗ .
In the following section, for an alphabet A, the alphabet Ā = {ā | a ∈ A} is
a copy of A, where A ∩ Ā = ∅.
In the Post Correspondence Problem, PCP for short, we are given two morphisms g1 , g2 : A∗ → B ∗ and it is asked whether or not there exists a nonempty
word w ∈ A+ such that g1 (w) = g2 (w). Here the pair (g1 , g2 ) is an instance
of the PCP, and the word w is called a solution. As a general reference to the
problems and results concerning the Post Correspondence Problem, we give [3].
For an instance I = (g1 , g2 ) of the PCP, let
E(I) = {w ∈ A∗ | g1 (w) = g2 (w)}
be its equality set. It is easy to show that an equality set E = E(g1 , g2 ) is always
a monoid, that is, E = E ∗ . In fact, it is a free monoid, and thus the algebraic
structure of E is relatively simple, although the problem whether or not E is
trivial is undecidable.
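Equality sets are easy to explore by bounded brute force, which also makes the monoid property tangible. A Python sketch with a hypothetical instance (the morphisms are our own toy choices, not from the paper):

```python
from itertools import product

def equality_set(g1, g2, max_len):
    """All nonempty w over the domain alphabet, up to max_len letters,
    with g1(w) == g2(w), found by exhaustive search."""
    A = sorted(g1)
    apply = lambda g, w: "".join(g[x] for x in w)
    return {"".join(w) for n in range(1, max_len + 1)
            for w in product(A, repeat=n)
            if apply(g1, w) == apply(g2, w)}

# A hypothetical instance: g1(a)=ab, g1(b)=b ; g2(a)=a, g2(b)=bb
g1 = {"a": "ab", "b": "b"}
g2 = {"a": "a", "b": "bb"}
E = equality_set(g1, g2, 6)
print(E)
# E is closed under concatenation, as the text asserts (within the length bound):
assert all(u + v in E for u in E for v in E if len(u + v) <= 6)
```

For this instance the nonempty solutions up to length 6 are exactly ab, abab, ababab — the free monoid generated by ab, minus ε.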
We shall now consider special instances of the generalized Post Correspondence Problem in order to have slightly more structured equality sets. In the
shifted Post Correspondence Problem, or shifted PCP for short, we are given two
morphisms g1 , g2 : A∗ → B ∗ and a letter a ∈ B, and it is asked whether there
exists a word w ∈ A∗ such that
g1 (w) = ag2 (w).
(2.1)
The triple J = (a, g1 , g2 ) is called an instance of the shifted PCP and a word w
satisfying equation (2.1) is called a solution of J. It is clear that a solution w is
always nonempty. We let
EG (J) = {w ∈ A+ | g1 (w) = ag2 (w)}
be the generalized equality set of J.
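A bounded search also works for the shifted variant, and shows how the shift letter changes the structure. The instance below is a hypothetical illustration of ours, not one from the paper; both morphisms are nonerasing, as required for CE:

```python
from itertools import product

def shifted_equality_set(a, g1, g2, max_len):
    """E_G(a, g1, g2): nonempty words w with g1(w) == a + g2(w),
    found by exhaustive search up to max_len letters."""
    A = sorted(g1)
    apply = lambda g, w: "".join(g[x] for x in w)
    return {"".join(w) for n in range(1, max_len + 1)
            for w in product(A, repeat=n)
            if apply(g1, w) == a + apply(g2, w)}

# A hypothetical shifted-PCP instance with nonerasing morphisms:
g1 = {"x": "abb", "y": "b"}
g2 = {"x": "bb", "y": "b"}
sols = shifted_equality_set("a", g1, g2, 5)
print(sols)
```

Here the solutions are x, xy, xyy, . . .; note that xx is not a solution, so unlike E(I), the set EG (J) need not be closed under product.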
We shall denote by CE the set of all languages h(EG (J)), where h is a coding,
and the morphisms in the instances J of the shifted PCP are both nonerasing.
In [2] CE per was defined as the family of languages h(EG (J)), where h is
a coding, and one of the morphisms in the instance J of the shifted PCP was
assumed to be periodic. It was proved in [2] that CE per is equal to the family
of languages defined by the regular valence grammars (see [6]). It is easy to see
that the morphisms in the instances could have been assumed to be nonerasing
in order to get the same result. Therefore, the family CE studied in this paper
is a generalization of CE per or, actually, CE per is a subfamily of CE.
3 Closure Properties of CE
The closure properties of the family CE per follow from the known closure properties of regular valence languages. In this section, we study the closure properties
of the more general family CE under various operations.
Before we start our journey through the closure results, we first make some
assumptions about the instances of the shifted PCP defining the languages at hand.
First of all, we may always assume that in an instance J = (a, g1 , g2 ) of the
shifted PCP the shift letter a is a special symbol that satisfies:
The shift letter a can appear only as the first letter in the images of g1 and
it does not occur at all in the images of g2 .
To see this, consider any language L = h(EG (a, g1 , g2 )), where g1 , g2 : A∗ → B ∗
and h : A∗ → C ∗ . Let # be a new letter not in A ∪ B. Construct a new instance
(#, g1′ , g2′ ), where g1′ , g2′ : (A ∪ Ā)∗ → (B ∪ {#})∗ and Ā is a copy of A, by setting
for all x ∈ A: g2′ (x) = g2′ (x̄) = g2 (x), g1′ (x) = g1 (x), and

    g1′ (x̄) = g1 (x),   if a is not a prefix of g1 (x),
              #w,       if g1 (x) = aw.
Define a new coding h′ : (A ∪ Ā)∗ → C ∗ by h′ (x) = h′ (x̄) = h(x) for all x ∈ A.
It is now obvious that L = h′ (EG (#, g1′ , g2′ )).
We shall call such an instance (#, g1′ , g2′ ) shift-fixed, where the shift letter #
is used only as the first letter.
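The shift-fixing construction above is mechanical, so it can be prototyped directly. In this Python sketch the barred copy x̄ is encoded as the string x + "_" (our own encoding, not the paper's notation), and we rely on the instance satisfying the stated conventions:

```python
def shift_fix(a, g1, g2, shift="#"):
    """Build the shift-fixed instance (#, g1', g2') over A ∪ Ā,
    encoding the barred copy of x as x + '_'."""
    bar = lambda x: x + "_"
    g1p, g2p = {}, {}
    for x in g1:
        g1p[x] = g1[x]
        # barred letters: replace a leading shift letter a by the fresh #
        g1p[bar(x)] = shift + g1[x][1:] if g1[x].startswith(a) else g1[x]
        g2p[x] = g2p[bar(x)] = g2[x]
    return g1p, g2p

g1 = {"x": "abb", "y": "b"}     # toy instance with shift letter "a"
g2 = {"x": "bb", "y": "b"}
g1p, g2p = shift_fix("a", g1, g2)
# '#' appears only as a first letter in g1' images, and never in g2' images:
assert all("#" not in v for v in g2p.values())
assert g1p["x_"] == "#bb" and g1p["y_"] == "b"
```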
The next lemma shows that we may also assume that the instance (g1 , g2 )
does not have any nontrivial solutions, that is, E(g1 , g2 ) = {ε} for every instance
J = (a, g1 , g2 ) defining the language h(EG (J)).
For this result we introduce two mappings which are used for desynchronizing
a pair of morphisms. Let d be a new letter. For a word u = a1 a2 · · · an , where
each ai is a letter, define
ℓd (u) = da1 da2 d · · · dan
and
rd (u) = a1 da2 d · · · dan d.
In other words ℓd is a morphism that adds d in front of every letter and rd is a
morphism that adds d after every letter of a word.
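The two desynchronizing maps translate directly into code; the Python below mirrors the definitions, with d passed as a parameter:

```python
def l_d(u, d):
    """ℓ_d: insert d in front of every letter of u."""
    return "".join(d + a for a in u)

def r_d(u, d):
    """r_d: insert d after every letter of u."""
    return "".join(a + d for a in u)

print(l_d("abc", "."))   # ".a.b.c"
print(r_d("abc", "."))   # "a.b.c."
# The identity that makes desynchronization work: d · r_d(u) == ℓ_d(u) · d
assert "." + r_d("abc", ".") == l_d("abc", ".") + "."
```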
Lemma 1 For every instance J of the shifted PCP and coding h, there exists
an instance J ′ = (a, g1′ , g2′ ) and a coding h′ such that h(EG (J)) = h′ (EG (J ′ ))
and E(g1′ , g2′ ) = {ε}.
Proof. Let J = (a, g1 , g2 ) be a shift-fixed instance of the shifted PCP where
g1 , g2 : A∗ → B ∗ , and let h : A∗ → C ∗ be a coding. We define new morphisms
g1′ , g2′ : (A ∪ Ā)∗ → (B ∪ {d})∗ , where d ∉ B is a new letter and Ā is a copy of
A, as follows. For all x ∈ A,
    g2′ (x) = ℓd (g2 (x))   and   g2′ (x̄) = ℓd (g2 (x))d        (3.1)

    g1′ (x) = g1′ (x̄) = #d · rd (w),   if g1 (x) = #w,
                        rd (g1 (x)),   if # is not a prefix of g1 (x).        (3.2)
Note that the letters in Ā can be used only as the last letter of a solution of
(a, g1′ , g2′ ). Since every image by g2′ begins with letter d and it is not a prefix of
any image of g1′ , we obtain that E(g1′ , g2′ ) = {ε}. On the other hand, (a, g1′ , g2′ )
has a solution wx̄ if and only if wx is a solution of (a, g1 , g2 ). Therefore, we
define h′ : (A ∪ Ā)∗ → C ∗ by h′ (x) = h′ (x̄) = h(x) for all x ∈ A. The claim of
the lemma follows, since obviously h(EG (J)) = h′ (EG (J ′ )).
We shall call an instance (a, g1 , g2 ) reduced, if it is shift-fixed and
E(g1 , g2 ) = {ε}.
3.1 Union and Product
Theorem 2 The family CE is closed under union.
Proof. Let K, L ∈ CE with K = h1 (EG (J1 )) and L = h2 (EG (J2 )), where
J1 = (a1 , g11 , g12 ) and J2 = (a2 , g21 , g22 ) are reduced, and g11 , g12 : Σ ∗ → B1∗
and g21 , g22 : Ω ∗ → B2∗ . Without restriction we can suppose that Ω ∩ Σ = ∅.
(Otherwise we take a primed copy of the alphabet Ω that is disjoint from Σ, and
define a new instance J2′ by replacing the letters with primed copies.) Assume
also that B1 ∩ B2 = ∅.
Let B = B1 ∪ B2 , and let # be a new letter. First replace every appearance
of the shift letters a1 and a2 in J1 and J2 with #. Define morphisms g1 , g2 : (Σ ∪
Ω)∗ → B ∗ as follows: for all x ∈ Σ ∪ Ω,
    g1 (x) = g11 (x) if x ∈ Σ, and g1 (x) = g21 (x) if x ∈ Ω;
    g2 (x) = g12 (x) if x ∈ Σ, and g2 (x) = g22 (x) if x ∈ Ω.
Define a coding h : (Σ ∪ Ω)∗ → C ∗ similarly:

    h(x) = h1 (x), if x ∈ Σ,
           h2 (x), if x ∈ Ω.        (3.3)
Since Σ ∩ Ω = ∅, and J1 and J2 are reduced (i.e., E(g11 , g12 ) = {ε} =
E(g21 , g22 )), we see that the solutions in EG (J1 ) and EG (J2 ) cannot be combined
or mixed. Thus, it is straightforward to show that h(EG (#, g1 , g2 )) = K ∪ L.
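The combined instance of Theorem 2 is just the disjoint union of the two morphism tables, with both shift letters renamed to a fresh #. A hedged Python sketch (the helper `union_instance` and both toy instances are ours; since the instances are shift-fixed, the shift letter occurs only in first position, so a plain `replace` is safe here):

```python
def union_instance(J1, J2, shift="#"):
    """Combine two reduced instances over disjoint alphabets into one
    instance whose solutions are the disjoint union (Theorem 2 sketch)."""
    (a1, g11, g12), (a2, g21, g22) = J1, J2
    rename = lambda g, a: {x: v.replace(a, shift) for x, v in g.items()}
    g1 = {**rename(g11, a1), **rename(g21, a2)}   # piecewise g1, as in the proof
    g2 = {**g12, **g22}                           # g2 images contain no shift letter
    return shift, g1, g2

J1 = ("a", {"x": "abb"}, {"x": "bb"})   # toy instance over Σ = {x}; solution: x
J2 = ("c", {"u": "cd"}, {"u": "d"})     # toy instance over Ω = {u}; solution: u
a, g1, g2 = union_instance(J1, J2)
assert g1 == {"x": "#bb", "u": "#d"} and g2 == {"x": "bb", "u": "d"}
```

Both x and u solve the combined instance, and disjointness of the alphabets prevents mixed solutions, just as the proof argues.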
Next we consider the product KL of languages.
Theorem 3 The family CE is closed under product of languages.
Proof. Let K, L ∈ CE with K = h1 (EG (J1 )) and L = h2 (EG (J2 )), where J1 =
(a1 , g11 , g12 ) and J2 = (a2 , g21 , g22 ) are shift-fixed. Assume that g11 , g12 : Σ ∗ →
B1∗ , and g21 , g22 : Ω ∗ → B2∗ , where again we can assume that Σ ∩ Ω = ∅, and
similarly that B1 ∩ B2 = ∅. We also assume that the length of the images of
the morphisms are at least 2 (actually, this is needed only for g11 ). This can be
assumed, for example, by the construction in Lemma 1.
We shall prove that KL = {uv | u ∈ K, v ∈ L} is in CE. For this, we define
morphisms g1 , g2 : (Σ ∪ Ω)∗ → (B1 ∪ B2 )∗ in the following way: for each x ∈ Σ,
    g1 (x) = ℓa2 (g11 (x)),   if a1 is not a prefix of g11 (x),
             a1 y ℓa2 (w),    if g11 (x) = a1 yw (y ∈ B1 ),

and

    g2 (x) = ra2 (g12 (x)),
and for each x ∈ Ω, g1 (x) = g21 (x) and g2 (x) = g22 (x). If we now define h by
combining h1 and h2 as in (3.3), we obtain that h(EG (a1 , g1 , g2 )) = KL.
We shall now extend the above result by proving that CE is closed under
Kleene plus, i.e., if K ∈ CE, then
K + = ∪i≥1 K i ∈ CE.
Clearly CE is not closed under Kleene star, since the empty word does not belong
to any language in CE.
Theorem 4 The family CE is closed under Kleene plus.
Proof. Let K = h(EG (#, g1 , g2 )), where g1 , g2 : A∗ → B ∗ are nonerasing morphisms, h : A∗ → C ∗ is a coding and the instance (#, g1 , g2 ) is shift-fixed. Also,
let Ā be a copy of A, and define ḡ1 , ḡ2 : (A ∪ Ā)∗ → B ∗ in the following way: for
each x ∈ A,
ḡ1 (x) = g1 (x) and ḡ2 (x) = g2 (x),
    ḡ1 (x̄) = ℓ# (g1 (x)),   if # is not a prefix of g1 (x),
             ℓ# (w),        if g1 (x) = #w,
ḡ2 (x̄) = r# (g2 (x)).
Extend h also to Ā by setting h(x̄) = h(x) for all x ∈ A.
It is now clear that h(EG (#, ḡ1 , ḡ2 )) = K + , since ḡ1 (w) = #ḡ2 (w) if and
only if w = x1 · · · xn xn+1 , where xi ∈ Ā+ for 1 ≤ i ≤ n, xn+1 ∈ A+ , ḡ1 (xi )# =
#ḡ2 (xi ) for 1 ≤ i ≤ n and ḡ1 (xn+1 ) = #ḡ2 (xn+1 ). It is clear that after removing
the bars from the letters xi (by h), we obtain words in EG (#, g1 , g2 ).
3.2 Intersection with Regular Languages
We now show that CE is closed under intersections with regular languages. Note
that for CE per this closure already follows from the closure properties of Reg(Z) languages.
Theorem 5 The family CE is closed under intersections with regular languages.
Proof. Let J = (a, g1 , g2 ) be an instance of the shifted PCP, g1 , g2 : Σ ∗ → B ∗ .
Let L = h(EG (J)), where h : Σ ∗ → C ∗ is a coding.
We shall prove that h(EG (J)) ∩ R is in CE for all regular R ⊆ C ∗ . We note
first that h(EG (J)) ∩ R = h(EG (J) ∩ h−1 (R)), and therefore it is sufficient to
show that, for all regular languages R ⊆ Σ ∗ , h(EG (J)∩R) is in CE. Therefore, we
shall give a construction for instances J ′ of the shifted PCP such that EG (J ′ ) =
EG (J) ∩ R.
Assume R ⊆ Σ ∗ is a regular language, and let G = (N, Σ, P, S) be a right
linear grammar generating R (see [7]). Let N = {A0 , . . . , An−1 }, where S = A0 ,
and assume without restriction, that there are no productions having S = A0 on
the right hand side. We consider the set P of the productions as an alphabet.
Let # and d be new letters. We define new morphisms g1′ , g2′ : P ∗ → (B ∪
{d, #})∗ as follows. First assume that
g1 (a) = a1 a2 . . . ak
and g2 (a) = b1 b2 . . . bm
for the (generic) letter a. We define
    g1′ (π) = #dn a1 dn a2 dn . . . ak dj ,    if π = (A0 → aAj ),
              dn−i a1 dn a2 dn . . . ak dj ,   if π = (Ai → aAj ),
              #dn a1 dn a2 dn . . . ak ,       if π = (A0 → a),
              dn−i a1 dn a2 dn . . . ak ,      if π = (Ai → a),

and

    g2′ (π) = dn b1 dn b2 . . . dn bm ,    if π = (A → aX), where X ∈ N ∪ {ε}.
As in [4], EG (J ′ ) = EG (J) ∩ R for the new instance J ′ = (#, g1′ , g2′ ). The
claim follows from this.
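The first reduction in this proof rests on the identity h(E) ∩ R = h(E ∩ h−1 (R)) for a coding h. That identity is easy to sanity-check with finite stand-ins for the (generally infinite) sets involved; everything below is an illustrative toy of ours:

```python
# Brute-force check of h(E) ∩ R = h(E ∩ h^{-1}(R)) for a coding h.
h = {"x": "a", "y": "a", "z": "b"}          # a coding (letter-to-letter)
code = lambda w: "".join(h[c] for c in w)

E = {"x", "xy", "zz", "xyz"}                # finite stand-in for E_G(J)
R = {"aa", "ab", "bb"}                      # finite stand-in for a regular set
lhs = {code(w) for w in E} & R              # h(E) ∩ R
rhs = {code(w) for w in E if code(w) in R}  # h(E ∩ h^{-1}(R))
assert lhs == rhs
print(lhs)
```

The identity holds for any mapping h and any sets, by unfolding the definitions; the code merely makes the two sides concrete.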
3.3 Morphisms
Next we shall present a construction for the closure under nonerasing morphisms.
This construction is a bit more complicated than the previous ones.
Theorem 6 The family CE is closed under taking images of nonerasing morphisms.
Proof. Let J = (a, g1 , g2 ) be an instance of the shifted PCP, where g1 , g2 : A∗ →
B ∗ . Let L = h(EG (J)), where h : A∗ → C ∗ is a coding. Assume that f : C ∗ → Σ ∗
is a nonerasing morphism. We shall construct h′ , g1′ and g2′ such that f (L) =
h′ (EG (J ′ )) for the new instance J ′ = (a, g1′ , g2′ ).
First we show that we can restrict ourselves to cases where
min{|g1 (x)|, |g2 (x)|} ≥ |f (x)| for all x ∈ A.
(3.4)
Indeed, suppose the instance J does not satisfy (3.4). We construct a new instance J̄ = (#, ḡ1 , ḡ2 ) and a coding h̄ such that h̄(EG (J̄)) = h(EG (J)) and ḡ1
and ḡ2 do fulfill (3.4). Let c ∉ B be a new letter. Let k = maxx∈A {|f (x)|}. We
define ḡ1 (x) = ℓkc (g1 (x)) and ḡ2 (x) = ℓkc (g2 (x)) for all x ∈ A. We also need a
new copy x′ of each letter x for which a is a prefix of g1 (x). If g1 (x) = aw, where
w ∈ B ∗ , then define ḡ1 (x′ ) = #ℓkc (w). It now follows that if u ∈ EG (J̄), then
u = x′ v for some word v ∈ A∗ and xv ∈ EG (J). Therefore, by defining h̄ as
follows

    h̄(y) = h(y), if y ∈ A,
           h(x), if y = x′ ,

we have h̄(EG (J̄)) = h(EG (J)) as required.
Now assume that (3.4) holds in J = (a, g1 , g2 ) and for f . Let us consider
the nonerasing morphism f ◦ h : A∗ → Σ ∗ . Note that also the morphism f ◦ h
satisfies (3.4). In order to prove the claim, it is clearly sufficient to consider the
case where h is the identity mapping, that is, f = f ◦ h.
First we define for every image f (x), where x ∈ A, a new alphabet Ax =
{bx | b ∈ Σ}. We consider the words
(b1 b2 . . . bm )x = (b1 )x (b2 )x . . . (bm )x ,
for f (x) = b1 . . . bm .
Let c and d be new letters and let n = ∑x∈A |f (x)|. Assume that A =
{x1 , x2 , . . . , xq }.
Partition the integers 1, 2, . . . , n into q sets such that for the letter xi there
corresponds a set, say Si = {i1 , i2 , . . . , i|f (xi )| }, of |f (xi )| integers.
Assume that f (xi ) = b1 . . . bm , g1 (xi ) = a1 a2 . . . aℓ , and g2 (xi ) = a′1 a′2 . . . a′k .
We define new morphisms g1′ and g2′ as follows:
    g1′ ((b1 )xi ) = cn dn a1 ci1 ,
    g1′ ((bj )xi ) = cn−ij−1 dn aj cij        for j = 2, . . . , m − 1,
    g1′ ((bm )xi ) = cn−im−1 dn am cn dn . . . cn dn aℓ ,

and

    g2′ ((b1 )xi ) = cn dn a′1 cn di1 ,
    g2′ ((bj )xi ) = dn−ij−1 a′j cn dij       for j = 2, . . . , m − 1,
    g2′ ((bm )xi ) = cn dn−im−1 a′m cn dn . . . cn dn a′k .
Then
g1′ ((b1 . . . bm )xi ) = cn dn a1 cn dn a2 . . . cn dn aℓ ,
g2′ ((b1 . . . bm )xi ) = cn dn a′1 cn dn a′2 . . . cn dn a′k .
The beginning still has to be fixed. For the cases where a1 = a, we need new
letters (b1 )′xi , for which we define

    g1′ ((b1 )′xi ) = a ci1   and   g2′ ((b1 )′xi ) = cn dn a′1 cn di1 .
Now our constructions for the morphisms g1′ and g2′ are completed.
Next we define h′ by setting h′ ((bi )x ) = bi and h′ ((b1 )′x ) = b1 for all i and
x. We obtain that h′ (EG (J ′ )) = f (h(EG (J))), which proves the claim.
Next we shall prove that the family CE is closed under inverses of nonerasing
morphisms.
Theorem 7 The family CE is closed under nonerasing inverse morphisms.
Proof. Consider an instance h(EG (J)), where J = (#, g1 , g2 ) with gi : A∗ → B ∗
and h : A∗ → C ∗ is a coding. We may assume that h(A) = C.
Moreover, let g : Σ ∗ → C ∗ be a nonerasing morphism.
For each a ∈ Σ, let h−1 g(a) = {va,1 , va,2 , . . . , va,ka } and let
Σa = {a(1) , . . . , a(ka ) }
be a set of new letters for a. Denote Θ = ∪a∈Σ Σa , and define the morphisms
g1′ , g2′ : Θ∗ → B ∗ and the coding t : Θ∗ → Σ ∗ by
gj′ (a(i) ) = gj (va,i ) for j = 1, 2, and t(a(i) ) = a
for each a(i) ∈ Θ.
Consider the instance J ′ = (#, g1′ , g2′ ).
Now, assume that x = a1 a2 . . . an ∈ g −1 h(EG (J)) (with ai ∈ Σ). Then there
exists a word w = w1 w2 . . . wn such that g1 (w) = #g2 (w) and ai ∈ g −1 h(wi ),
that is, wi = vai ,ri ∈ h−1 g(ai ) for some ri , and so g1′ (w′ ) = #g2′ (w′ ) for the word
w′ = a1(r1 ) a2(r2 ) . . . an(rn ) , for which t(w′ ) = x. Therefore x ∈ t(EG (J ′ )).
The converse inclusion, t(EG (J ′ )) ⊆ g −1 h(EG (J)), is clear from the above constructions.
Let A and B be two alphabets. A mapping τ : A∗ → 2B∗ , where 2B∗ denotes
the set of all subsets of B ∗ , is a substitution if for all u, v ∈ A∗ ,

    τ (uv) = τ (u)τ (v).

Note that τ is actually a morphism from A∗ to 2B∗ .
A substitution τ is called finite if τ (a) is finite for all a ∈ A, and nonerasing
if ∅ ≠ τ (a) ≠ {ε} for all a ∈ A.
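The defining morphism property τ (uv) = τ (u)τ (v), with set product on the right-hand side, can be checked directly. A Python sketch with an arbitrary finite nonerasing substitution of our own choosing:

```python
def substitute(tau, w):
    """Apply a finite substitution tau (letter -> set of words) to a word,
    letter by letter, taking the set product of the images."""
    out = {""}
    for a in w:
        out = {u + v for u in out for v in tau[a]}
    return out

tau = {"a": {"0", "01"}, "b": {"1"}}     # finite and nonerasing
u, v = "ab", "a"
lhs = substitute(tau, u + v)             # tau(uv)
rhs = {x + y for x in substitute(tau, u) for y in substitute(tau, v)}
assert lhs == rhs                        # tau is a morphism into 2^{B*}
print(lhs)
```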
Corollary 8 The family CE is closed under nonerasing finite substitutions.
Proof. Since CE is closed under nonerasing morphisms and under inverses of nonerasing
morphisms, it is closed under nonerasing finite substitutions, which are
compositions of the inverse of a coding and a nonerasing morphism.
Note that CE is almost a trio, see [1], but it seems that it is not closed under
all inverse morphisms. It is also almost a bifaithful rational cone, see [5], but
since the languages do not contain ε, CE is not closed under all bifaithful finite
transducers.
References
1. S. Ginsburg, Algebraic and Automata-theoretic Properties of Formal Languages,
North-Holland, 1975.
2. V. Halava, T. Harju, H. J. Hoogeboom and M. Latteux, Valence Languages Generated by Generalized Equality Sets, Tech. Report 502, Turku Centre for Computer
Science, August 2002, submitted.
3. T. Harju and J. Karhumäki, Morphisms, Handbook of Formal Languages (G. Rozenberg and A. Salomaa, eds.), vol. 1, Springer-Verlag, 1997.
4. M. Latteux and J. Leguy, On the composition of morphisms and inverse morphisms,
Lecture Notes in Comput. Sci. 154 (1983), 420–432.
5. M. Latteux and J. Leguy, On Usefulness of Bifaithful Rational cones, Math. Systems
Theory 18 (1985), 19–32.
6. G. Păun, A new generative device: valence grammars, Revue Roumaine de Math.
Pures et Appliquées 6 (1980), 911–924.
7. A. Salomaa, Formal Languages, Academic Press, New York, 1973.
Context-Sensitive Equivalences for Non-interference
Based Protocol Analysis ⋆
Michele Bugliesi, Ambra Ceccato, and Sabina Rossi
Dipartimento di Informatica, Università Ca’ Foscari di Venezia
via Torino 155, 30172 Venezia, Italy
{bugliesi, ceccato, srossi}@dsi.unive.it
Abstract. We develop new proof techniques, based on non-interference, for the
analysis of safety and liveness properties of cryptographic protocols expressed
as terms of the process algebra CryptoSPA. Our approach draws on new notions
of behavioral equivalence, built on top of a context-sensitive labelled transition
system, that allow us to characterize the behavior of a process in the presence of
any attacker with a given initial knowledge. We demonstrate the effectiveness of
the approach with an example of a protocol of fair exchange.
1 Introduction
Non-Interference has been advocated by various authors [1, 9] as a powerful method for
the analysis of cryptographic protocols. In [9], Focardi et al. propose a general schema
for specifying security properties with a uniform and concise definition. The approach
draws on earlier work by the same authors on characterizing information-flow security
in terms of Non-Interference for the Security Process Algebra (SPA, for short). We
briefly review the main ideas below.
SPA is a variant of CCS in which the set of actions is partitioned into two sets: L,
for low, and H for high. A Non-Interference property P for a process E is expressed as
follows:
E ∈ P if ∀Π ∈ EH : (E||Π) \ H ≈P E \ H        (1)
where EH is the set of all high-level processes, ≈P is an observation equivalence (parametric in P ), || is parallel composition, and \ is restriction. The processes E \ H and
(E||Π) \ H represent the low-level views of E and of E||Π, respectively. The basic intuition is expressed by the slogan: “If no high-level process can change the low behavior,
then no flow of information from high to low is possible”.
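The low-view operator E \ H can be made concrete on a toy labelled transition system. The sketch below is a drastic simplification of ours (finite traces only, no parallel composition or weak bisimulation, process and action names invented for illustration); it only shows how restricting the high actions yields the low-level view of a process:

```python
def traces(lts, start, blocked, limit=6):
    """Finite traces of an LTS, with actions in `blocked` restricted away."""
    out, frontier = {()}, {((), start)}
    for _ in range(limit):
        nxt = set()
        for tr, s in frontier:
            for act, s2 in lts.get(s, []):
                if act not in blocked:
                    nxt.add((tr + (act,), s2))
        out |= {tr for tr, _ in nxt}
        frontier = nxt
    return out

H = {"h"}
# Toy process E: low action 'l', then a high action 'h', then low action 'k'.
E = {0: [("l", 1)], 1: [("h", 2)], 2: [("k", 3)]}
# Low view E \ H: the 'h' transition is blocked, so 'k' is never observed.
assert traces(E, 0, H) == {(), ("l",)}
print(traces(E, 0, H))
```

In the schema (1) one would compare such low views of E and of E||Π under the chosen equivalence ≈P ; here we only illustrate the restriction step.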
In [9] this idea is refined to provide a general definition of security properties for
cryptographic protocols described as terms of CryptoSPA, a process algebra that extends SPA with cryptographic primitives. Intuitively, the refinement amounts to viewing the participants in a protocol as low-level processes, while the high-level processes
represent the external attackers. Then, Non-Interference implies that the attackers have
no way to change the low (honest) behavior of the protocol.
⋆
This work has been partially supported by the MIUR project “Modelli formali per la sicurezza
(MEFISTO)” and the EU project IST-2001-32617 “Models and types for security in mobile
distributed systems (MyThS)”.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 364−375, 2003.
© Springer-Verlag Berlin Heidelberg 2003
There are two problems that need to be addressed to formalize this idea. First, the
intruder should be assumed to have complete control over the public components of
the network. Consequently, any step in a protocol involving a public channel should be
classified as a high-level action. However, since a protocol specification is usually entirely determined by the exchange of messages over public channels, a characterization
like (1) becomes trivial, as (E||Π) \ H and E \ H are simply the null processes. This is
easily rectified by extending the protocol specification with low-level actions that are
used to specify the desired security property.
A further problem arises from the formalization of the perfect cryptography assumption that is usually made in the analysis of the logical properties of cryptographic protocols. In [9] this assumption is expressed by making the definition of Non-Interference dependent on the initial knowledge of the attacker and on a deduction system by which the attacker may compute new information. The initial knowledge, noted φ, includes private data (e.g., the enemy's private keys) as well as any piece of publicly available information, such as names of entities and public keys. Property (1) is thus reformulated for a protocol P as follows:

P ∈ P^φ if ∀Π ∈ E_H^φ : (P||Π) \ H ≈_P P \ H,   (2)

where E_H^φ is the set of the high-level processes Π which can perform only actions using the public channel names and whose messages (those syntactically appearing in Π) can be deduced from φ.
This framework is very general, and lends itself to the characterization of various security properties, obtained by instantiating the equivalence ≈_P in the schema above. It is less effective as a proof method, however, due to the universal quantification over the possible intruders Π in the class E_H^φ. In [9], the problem is circumvented by analyzing the protocol in the presence of the "hardest attacker". However, in [9] this characterization is proved correct only for the class of relations ≈_P that are behavioral preorders on processes. In particular, the proof method is not applicable to equivalences based on bisimulation and, consequently, to the analysis of certain branching-time liveness properties, such as fairness.
We partially rectify the problem by developing a technique which does not require us to exhibit an explicit attacker (nor, in particular, does it require the existence of a hardest attacker). Our approach draws on ideas from [4] to represent the attacker indirectly, in terms of a context-sensitive labelled transition system. The labelled transitions take the form φ ⊲ P −a→ φ′ ⊲ P′, where φ represents the context's knowledge prior to the transition, and φ′ is the new knowledge resulting from P performing the action a. Building on this labelled transition system we provide quantification-free characterizations for different instantiations of (2), specifically when ≈_P is instantiated to trace equivalence and to weak bisimulation equivalence. This allows us to apply our technique to the analysis of safety as well as liveness security properties. We demonstrate the latter with an example of a protocol of fair exchange.
The rest of the presentation proceeds as follows: Section 2 briefly reviews the process algebra CryptoSPA, Section 3 introduces context-sensitive labelled transition systems, Section 4 gives characterizations for various security properties, Section 5 illustrates the example, and Section 6 draws some conclusions.
M. Bugliesi, A. Ceccato, and S. Rossi
All the results presented in this paper are described and proved in [7].
2 The CryptoSPA Language
The Cryptographic Security Process Algebra (CryptoSPA, for short) [9] is an extension of SPA [8] with cryptographic primitives and constructs for value passing. The syntax is based on the following elements: a set M of basic messages and a set K of encryption keys with a function ·⁻¹ : K → K such that (k⁻¹)⁻¹ = k; a set ℳ, ranged over by m, of all messages, defined as the least set containing M ∪ K and closed under the deduction rules in Table 1 (more on this below); a set 𝒞 of channels, partitioned into two sets H and L of high and low channels, respectively; a function Msg which maps every channel c into the set of messages that can be sent and received on c, and such that Msg(c) = Msg(c̄); a set ℒ = {c(m) | m ∈ Msg(c)} ∪ {c̄m | m ∈ Msg(c)} of visible actions and the set Act = ℒ ∪ {τ} of all actions, ranged over by a, where τ is the internal (invisible) action; a function chan(a) which returns c if a is either c(m) or c̄m, and the special channel void when a = τ; a set Const of constants. By an abuse of notation, we write c(m), c̄m ∈ H whenever c, c̄ ∈ H, and similarly for L.
The syntax of CryptoSPA terms (or processes) is defined as follows:

P ::= 0 | c(x).P | c̄m.P | τ.P | P + P | P||P | P \ C | P[f]
    | A(m1, ..., mn) | [m = m′]P; P | [⟨m1 ... mn⟩ ⊢rule x]P; P

Both c(x).P and [⟨m1 ... mn⟩ ⊢rule x]P; P′ bind the variable x in P. Constants are defined as A(x1, ..., xn) ≝ P, where P is a CryptoSPA process that may contain no free variables except x1, ..., xn, which must be pairwise distinct.
Table 1. Inference system for message manipulation, where m, m′ ∈ ℳ and k, k⁻¹ ∈ K

(⊢pair): from m and m′ derive the pair (m, m′)
(⊢fst):  from (m, m′) derive m
(⊢snd):  from (m, m′) derive m′
(⊢enc):  from m and k derive {m}k
(⊢dec):  from {m}k and k⁻¹ derive m
Intuitively, 0 is the empty process; c(x).P waits for input m on channel c and then behaves as P[m/x] (i.e., P with all occurrences of x substituted by m); c̄m.P outputs m on channel c and continues as P; P1 + P2 represents the nondeterministic choice between P1 and P2; P1||P2 is parallel composition, where executions are interleaved, possibly synchronized on complementary input/output actions, producing an internal action τ; P \ C is like P but prevented from sending and receiving messages
on channels in C ⊆ 𝒞; in P[f] every channel c is relabelled into f(c); A(m1, ..., mn) behaves like the respective definition where the variables x1, ..., xn are substituted with the messages m1, ..., mn; [m = m′]P1; P2 behaves as P1 if m = m′ and as P2 otherwise; finally, [⟨m1 ... mn⟩ ⊢rule x]P1; P2 tries to deduce a piece of information z from the tuple ⟨m1 ... mn⟩ through rule ⊢rule; if it succeeds then it behaves as P1[z/x], otherwise it behaves as P2.
In formalizing the security properties of interest, we will find it convenient to rely on (an equivalent of) the hiding operator of CSP, noted P/C with P a process and C ⊆ 𝒞, which turns all actions using channels in C into internal τ's. This operator can be defined in CryptoSPA as follows: given any set C ⊆ 𝒞, P/C ≝ P[f_C], where f_C(a) = a if chan(a) ∉ C and f_C(a) = τ if chan(a) ∈ C.
We denote by E the set of all CryptoSPA processes and by E_H the set of all high-level processes, i.e., those constructed using only actions in H ∪ {τ}.
The operational semantics of CryptoSPA is defined in terms of the labelled transition system (LTS) in Table 2. Most of the transitions are standard, and simply formalize the intuitive semantics of the process constructs discussed above. The two rules (⊢i) connect the deduction system in Table 1 with the transition system. The former system is used to model the ability of the attacker to deduce new information from its initial knowledge. Note, in particular, that secret keys not initially known to the attacker may not be deduced (hence we disregard cryptographic attacks based on guessing secret keys). We say that m is deducible from a set of messages φ (and write φ ⊢ m) if m can be obtained from φ by applying the inference rules in Table 1. As in [9], we assume that ⊢ is decidable.
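One standard way to decide φ ⊢ m is to saturate φ under the decomposition rules of Table 1 (projection and decryption) and then check composition (pairing, encryption) top-down. The following sketch is ours, not the paper's: messages are modelled as tagged tuples, and `inv` is an illustrative stand-in for the key-inverse function ·⁻¹.

```python
# Messages: atoms are strings; ("pair", m1, m2) and ("enc", m, k) are tagged tuples.
def inv(k):
    """Model k -> k^-1 with (k^-1)^-1 = k (naming convention is an assumption)."""
    return k[:-4] if k.endswith("-inv") else k + "-inv"

def analyse(phi):
    """Saturate phi under the projection (fst, snd) and decryption (dec) rules."""
    known = set(phi)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            parts = []
            if isinstance(m, tuple) and m[0] == "pair":
                parts = [m[1], m[2]]                      # (fst), (snd)
            elif isinstance(m, tuple) and m[0] == "enc" and inv(m[2]) in known:
                parts = [m[1]]                            # (dec), needs k^-1
            for p in parts:
                if p not in known:
                    known.add(p)
                    changed = True
    return known

def deduces(phi, m):
    """Decide phi |- m: analyse, then synthesise by (pair) and (enc) top-down."""
    known = analyse(phi)
    def synth(m):
        if m in known:
            return True
        if isinstance(m, tuple) and m[0] in ("pair", "enc"):
            return synth(m[1]) and synth(m[2])
        return False
    return synth(m)
```

For instance, `deduces({("enc", "m", "k"), "k-inv"}, "m")` holds, while without the inverse key the ciphertext cannot be opened, mirroring the remark above on secret keys.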
We complement the definition of the semantics with a corresponding notion of observation equivalence, which is used to establish equalities among processes and is
based on the idea that two systems have the same semantics if and only if they cannot be distinguished by an external observer. The equivalences that are relevant to the
present discussion are trace equivalence, noted ≈T , and weak bisimulation, noted ≈B
(see [13]).
In the next section, we introduce coarser versions of these equivalences, noted ≈_T^φ and ≈_B^φ, which distinguish processes in contexts with initial knowledge φ. These context-sensitive notions of equivalence are built on a refined version of the labelled transition system, which we introduce next.
3 Context-Sensitive Equivalences
Following [4], we characterize the behavior of processes in terms of "context-sensitive labelled transitions", where each process transition depends on the knowledge of the context. To motivate, consider a process P that produces and sends a message {m}_k, reaching the state P′, and assume that m and k are known to P but not to the context. Under these hypotheses, the context will never be able to send the message m to P′ (or any continuation thereof). Hence, if P′ waits for further input, we can safely leave any input transition involving m out of the LTS, as P′ will never receive m from the context.
The states of the new labelled transition system are configurations of the form φ ⊲ P,
where P is a process and φ is the current knowledge of the context, represented through
Table 2. The operational rules for CryptoSPA

(input):    if m ∈ Msg(c), then c(x).P −c(m)→ P[m/x]
(output):   if m ∈ Msg(c), then c̄m.P −c̄m→ P
(tau):      τ.P −τ→ P
(+1):       if P1 −a→ P1′, then P1 + P2 −a→ P1′
(||1):      if P1 −a→ P1′, then P1||P2 −a→ P1′||P2
(||2):      if P1 −c(m)→ P1′ and P2 −c̄m→ P2′, then P1||P2 −τ→ P1′||P2′
(=1):       if m = m′ and P1 −a→ P1′, then [m = m′]P1; P2 −a→ P1′
(=2):       if m ≠ m′ and P2 −a→ P2′, then [m = m′]P1; P2 −a→ P2′
([f]):      if P −a→ P′, then P[f] −f(a)→ P′[f]
(\C):       if P −a→ P′ and chan(a) ∉ C, then P \ C −a→ P′ \ C
(constant): if P[m1/x1, ..., mn/xn] −a→ P′ and A(x1, ..., xn) ≝ P, then A(m1, ..., mn) −a→ P′
(⊢1):       if ⟨m1, ..., mn⟩ ⊢rule m and P1[m/x] −a→ P1′, then [⟨m1, ..., mn⟩ ⊢rule x]P1; P2 −a→ P1′
(⊢2):       if ∄m : ⟨m1, ..., mn⟩ ⊢rule m and P2 −a→ P2′, then [⟨m1, ..., mn⟩ ⊢rule x]P1; P2 −a→ P2′
Table 3. Inference rules for the ELTS

(output): if P −c̄m→ P′ and c̄m ∈ H, then φ ⊲ P −c̄m→ φ ∪ {m} ⊲ P′
(input):  if P −c(m)→ P′, c(m) ∈ H and φ ⊢ m, then φ ⊲ P −c(m)→ φ ⊲ P′
(tau):    if P −τ→ P′, then φ ⊲ P −τ→ φ ⊲ P′
(low):    if P −a→ P′ and a ∈ L, then φ ⊲ P −a→ φ ⊲ P′
a set of messages. The transitions represent interactions between the process and the context, and now take the form

φ ⊲ P −a→ φ′ ⊲ P′,

where a is the action executed by the process P and φ′ is the new knowledge at the disposal of the context for further interactions with P′.
The transitions between configurations, given in Table 3, are defined rather directly from the corresponding transitions between processes. In rule (output), the context's knowledge is augmented with the information sent by the process. Dually, rule (input) assumes that the context performs an output action synchronizing with the input of the process. The message sent by the context must be completely deducible from the context's knowledge φ, otherwise the corresponding transition is impossible: this is how the new transitions provide an explicit account of the attacker's knowledge. The remaining rules, (tau) and (low), state that internal actions of the protocol and low actions do not contribute to the knowledge of the context in any way.
In the rest of the presentation, we refer to the transition rules in Table 3 collectively as the enriched LTS (ELTS, for short). Also, we assume that the initial knowledge of the context includes only public information and the context's private names. This is a reasonable condition, since it simply corresponds to assuming that each protocol run starts with fresh keys and nonces, a condition that is readily guaranteed by relying on time-dependent elements (e.g., time-stamps) and by assuming that session keys are distinct for every execution.
The notions of trace and weak bisimulation equivalences extend in the expected way
from processes to ELTS configurations, as we discuss below.
We write φ ⊲ P =a⇒ φ′ ⊲ P′ to denote the sequence of transitions φ ⊲ P (−τ→)* φ ⊲ P1 −a→ φ′ ⊲ P2 (−τ→)* φ′ ⊲ P′, where, as expected, φ = φ′ if −a→ is an input, low or silent action. Furthermore, let γ = a1 ... an ∈ ℒ* be a sequence of (non-silent) actions; then φ ⊲ P =γ⇒ φ′ ⊲ P′ if there are states P1, P2, ..., Pn−1 ∈ E and φ1, φ2, ..., φn−1 such that φ ⊲ P =a1⇒ φ1 ⊲ P1 =a2⇒ ... =a_{n−1}⇒ φ_{n−1} ⊲ P_{n−1} =an⇒ φ′ ⊲ P′. The notation φ ⊲ P =â⇒ φ′ ⊲ P′ stands for φ ⊲ P =a⇒ φ′ ⊲ P′ if a ∈ ℒ and for φ ⊲ P (−τ→)* φ′ ⊲ P′ if a = τ, as usual.
Definition 1 (Trace equivalence over configurations).
– T(φ ⊲ P) = {γ ∈ ℒ* | ∃φ′, P′ : φ ⊲ P =γ⇒ φ′ ⊲ P′} is the set of traces associated with the configuration φ ⊲ P.
– Two configurations φ_P ⊲ P and φ_Q ⊲ Q are trace equivalent, denoted by φ_P ⊲ P ≈_T^c φ_Q ⊲ Q, if T(φ_P ⊲ P) = T(φ_Q ⊲ Q).

Based on trace equivalence over configurations we can then define a corresponding notion of process equivalence, for processes executing in an environment with initial knowledge φ. Formally, P ≈_T^φ Q whenever φ ⊲ P ≈_T^c φ ⊲ Q.
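On a finite ELTS, trace equivalence of two configurations can be tested naively by enumerating visible traces up to a step bound. A sketch under our own representation (a `successors` function yielding `(action, next_config)` pairs, with `"tau"` as the invisible action); this is an illustration, not a decision procedure for the general case:

```python
def traces(config, successors, max_steps):
    """Collect the visible traces reachable within max_steps transitions;
    'tau' steps do not contribute to the trace."""
    result = set()
    def walk(cfg, trace, steps):
        result.add(trace)
        if steps == 0:
            return
        for action, nxt in successors(cfg):
            walk(nxt, trace if action == "tau" else trace + (action,), steps - 1)
    walk(config, (), max_steps)
    return result

def trace_equivalent(c1, c2, successors, max_steps):
    """Bounded check of the trace-set equality required by Definition 1."""
    return traces(c1, successors, max_steps) == traces(c2, successors, max_steps)
```

The step bound keeps the exploration finite even in the presence of τ-cycles; for an exact check one would bound by the (finite) state space instead.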
Definition 2 (Weak bisimulation over configurations).
– A binary relation R over configurations is a weak bisimulation if, assuming (φ_P ⊲ P, φ_Q ⊲ Q) ∈ R, one has, for all a ∈ Act:
  • if φ_P ⊲ P −a→ φ_P′ ⊲ P′, then there exists a configuration φ_Q′ ⊲ Q′ such that φ_Q ⊲ Q =â⇒ φ_Q′ ⊲ Q′ and (φ_P′ ⊲ P′, φ_Q′ ⊲ Q′) ∈ R;
  • if φ_Q ⊲ Q −a→ φ_Q′ ⊲ Q′, then there exists a configuration φ_P′ ⊲ P′ such that φ_P ⊲ P =â⇒ φ_P′ ⊲ P′ and (φ_P′ ⊲ P′, φ_Q′ ⊲ Q′) ∈ R.
– Two configurations φ_P ⊲ P and φ_Q ⊲ Q are weakly bisimilar, denoted by φ_P ⊲ P ≈_B^c φ_Q ⊲ Q, if there exists a weak bisimulation containing the pair (φ_P ⊲ P, φ_Q ⊲ Q).

It is not difficult to prove that the relation ≈_B^c is the largest weak bisimulation over configurations, and that it is an equivalence relation. As for trace equivalence, we can recover an equivalence relation on processes executing in a context with initial knowledge φ by defining P ≈_B^φ Q if and only if φ ⊲ P ≈_B^c φ ⊲ Q.
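Definition 2 suggests a direct check on a finite candidate relation R: every strong step on one side must be matched by a weak step on the other, ending in R-related configurations. A sketch under our own assumptions: configurations are hashable values, `successors(cfg)` yields strong steps `(action, next)`, and `weak_moves(cfg, a)` is a given function returning the set of configurations reachable by =â⇒ (so for a = "tau" it must include cfg itself, since zero steps are allowed):

```python
def is_weak_bisimulation(R, successors, weak_moves):
    """Check that R (a set of configuration pairs) is a weak bisimulation:
    each strong step on either side is matched by a weak step on the other."""
    def matched(src_moves, other, flip):
        for action, nxt in src_moves:
            # some weak `action`-move of `other` must land in an R-related state
            if not any(((nxt, cand) if not flip else (cand, nxt)) in R
                       for cand in weak_moves(other, action)):
                return False
        return True
    return all(matched(successors(p), q, False) and
               matched(successors(q), p, True)
               for p, q in R)
```

Exhibiting one relation R that passes this check and contains the pair of interest witnesses ≈_B^c, exactly as the coinductive definition requires.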
4 Non-interference Proof Techniques
We show that the new definitions of behavioral equivalence may be used to construct
effective proof methods for various security properties within the general schema proposed in [9]. In particular, we show that making our equivalences dependent on the
initial knowledge of the attacker provides us with security characterizations that are
stated independently of the attacker itself.
The first property we study, known as NDC, results from instantiating ≈_P in (2) (see the introduction) to the trace equivalence relation ≈_T. As discussed in [9], NDC is a generalization of the classical idea of Non-Interference to non-deterministic systems and can be used for analyzing different security properties of cryptographic protocols such as secrecy, authentication and integrity. NDC can readily be extended to account for the context's knowledge as follows:
Definition 3 (NDC^φ). P ∈ NDC^φ if P \ H ≈_T (P||Π) \ H, ∀Π ∈ E_H^φ.

A process P is NDC^φ if, for every high-level process Π with initial knowledge φ, a low-level user cannot distinguish P from (P||Π), i.e., if Π cannot interfere with the low-level execution of the process P.
Focardi et al. [9] show that when φ is finite it is possible to find a most general intruder Top^φ, so that verifying NDC^φ reduces to checking P \ H ≈_T (P||Top^φ) \ H. Here we provide an alternative¹, quantification-free characterization of NDC^φ. Let P/H denote the process resulting from P by replacing all high-level actions with the silent action τ (cf. Section 2).

Theorem 1 (NDC^φ). P ∈ NDC^φ if and only if P \ H ≈_T^φ P/H.
More interestingly, our approach allows us to find a sound proof method for the BNDC^φ property, which results from instantiating (2) in the introduction with the equivalence ≈_B as follows:

Definition 4 (BNDC^φ). P ∈ BNDC^φ if P \ H ≈_B (P||Π) \ H, ∀Π ∈ E_H^φ.

As for NDC^φ, the definition falls short of providing a proof method due to the universal quantification over Π. Here, however, the problem cannot be circumvented by resorting to a hardest attacker, as no such attacker exists: there is no (known) preorder on processes corresponding to weak bisimilarity.
What we propose here is a partial solution that relies on providing a coinductive (and quantification-free) characterization of a sound approximation of BNDC^φ, based on the following persistent version of BNDC^φ.

Definition 5 (P_BNDC^φ). P ∈ P_BNDC^φ if P′ ∈ BNDC^φ for all P′ reachable from P.

P_BNDC^φ is the context-sensitive version of the P_BNDC property studied in [10]. Following the technique in [10], one can show that P_BNDC^φ is a sound approximation of BNDC^φ which admits elegant quantification-free characterizations. Specifically, like P_BNDC, P_BNDC^φ can be characterized both in terms of a suitable weak bisimulation relation "up to high-level actions", noted ≈_\H^φ, and in terms of unwinding conditions, as discussed next. We first need the following definition:
Definition 6. Let a ∈ Act. The transition relation =â⇒_\H is defined as follows:

  =â⇒_\H  =  =â⇒              if a ∉ H
  =â⇒_\H  =  =a⇒ or =τ̂⇒      if a ∈ H

The transition relation =â⇒_\H is thus defined as =â⇒, except that it treats high-level actions as silent actions. Now, weak bisimulations up to H over configurations are defined as weak bisimulations over configurations, except that they allow a high action to be matched by zero or more high actions. Formally:
¹ An analogous result has recently been presented by Gorrieri et al. [11] for a timed extension of CryptoSPA. We discuss the relationship between our result and theirs in Section 6.

Definition 7 (Weak bisimulation up to H over configurations).
– A binary relation R over configurations is a weak bisimulation up to H if (φ_P ⊲ P, φ_Q ⊲ Q) ∈ R implies that, for all a ∈ Act:
  • if φ_P ⊲ P −a→ φ_P′ ⊲ P′, then there exists a configuration φ_Q′ ⊲ Q′ such that φ_Q ⊲ Q =â⇒_\H φ_Q′ ⊲ Q′ and (φ_P′ ⊲ P′, φ_Q′ ⊲ Q′) ∈ R;
  • if φ_Q ⊲ Q −a→ φ_Q′ ⊲ Q′, then there exists a configuration φ_P′ ⊲ P′ such that φ_P ⊲ P =â⇒_\H φ_P′ ⊲ P′ and (φ_P′ ⊲ P′, φ_Q′ ⊲ Q′) ∈ R.
– Two configurations φ_P ⊲ P and φ_Q ⊲ Q are weakly bisimilar up to H, denoted by φ_P ⊲ P ≈_\H^c φ_Q ⊲ Q, if there exists a weak bisimulation up to H containing the pair (φ_P ⊲ P, φ_Q ⊲ Q).
Again, we can prove that the relation ≈_\H^c is the largest weak bisimulation up to H over configurations and that it is an equivalence relation. Also, as for the previous relations over configurations, we can recover an associated relation over processes in a context with initial knowledge φ by defining P ≈_\H^φ Q if and only if φ ⊲ P ≈_\H^c φ ⊲ Q.
We can finally state the two characterizations of P_BNDC^φ. The former characterization is expressed in terms of ≈_\H^φ (with no quantification over the reachable states or over the high-level malicious processes).

Theorem 2 (P_BNDC^φ 1). P ∈ P_BNDC^φ if and only if P \ H ≈_\H^φ P.
The second characterization of P_BNDC^φ is given in terms of unwinding conditions, which demand properties of individual actions. Unwinding conditions aim at "distilling" the local effect of performing high-level actions and are useful for defining both proof systems (see, e.g., [6]) and refinement operators that preserve security properties, as done in [12].

Theorem 3 (P_BNDC^φ 2). P ∈ P_BNDC^φ if and only if, for all φ_i ⊲ P_i reachable from φ ⊲ P, if φ_i ⊲ P_i −h→ φ_i′ ⊲ P_i′ for h ∈ H, then φ_i ⊲ P_i =τ̂⇒ φ_i″ ⊲ P_i″ such that φ_i′ ⊲ P_i′ \ H ≈_B^c φ_i″ ⊲ P_i″ \ H.
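The unwinding condition of Theorem 3 is local, so on a finite ELTS it can be checked by a plain reachability sweep: every high step out of a reachable configuration must be matched, up to low-level observation, by internal moves alone. A schematic sketch under our own assumptions (`weak_tau_moves(cfg)` returns the configurations reachable by =τ̂⇒, including cfg itself, and `low_equiv` is a given test for weak bisimilarity of the restricted residuals, e.g. built with the earlier sketches):

```python
def satisfies_unwinding(initial, successors, weak_tau_moves, low_equiv, is_high):
    """Check Theorem 3's condition: every high transition from a reachable
    configuration is matched by some internal move to a low-equivalent state."""
    seen, stack = {initial}, [initial]
    while stack:
        cfg = stack.pop()
        for action, nxt in successors(cfg):
            if is_high(action):
                if not any(low_equiv(nxt, cand) for cand in weak_tau_moves(cfg)):
                    return False      # a high step with no internal match
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True
```

This is precisely the shape of the check applied to the ASW protocol in the next section, where a single unmatched high output already refutes the property.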
Both characterizations can be used for verifying cryptographic protocols. A concrete example, a fair exchange protocol, is illustrated in the next section.
5 An Example: The ASW Fair Exchange Protocol
The ASW contract signing protocol [2] is used in electronic commerce transactions
to enable two parties, named O (originator) and R (responder), to obtain each other’s
commitment on a previously agreed contractual text M. To deal with unfair situations,
each party may appeal to a trusted third party T which can decide, on the basis of the
data it has received, whether to issue a replacement contract or an abort token. If both
O and R are honest, and they receive the messages sent to them, then they both obtain a
valid contract upon the completion of the protocol.
We say that the protocol guarantees fairness to O (dually, to R) on message M if, whatever malicious R (O) is considered, whenever R (O) gets evidence that O (R) has originated M, then O (R) will eventually obtain evidence that R (O) has received M.
Notice that this is a branching-time liveness property: we require, for all execution traces of the protocol, that something should happen if O (resp. R) gets his evidence, namely that R (resp. O) should also get his evidence (cf. [9] for a thorough discussion of this point).
The protocol consists of three independent sub-protocols: exchange, abort and resolve. Here, we focus on the main exchange sub-protocol, which is specified by the following four messages, where M is the contractual text on which we assume the two parties have previously agreed, while SK_O and SK_R (PK_O and PK_R) are the private (public) keys of O and R, respectively.
O → R : me1 = {M, h(N_O)}_SK_O
R → O : me2 = {{M, h(N_O)}_SK_O, h(N_R)}_SK_R
O → R : me3 = N_O
R → O : me4 = N_R
In the first step, O commits to the contractual text by hashing a random number N_O and signing a message that contains both h(N_O) and M. While O does not actually reveal the value of its contract authenticator N_O to the recipient of message me1, O is committed to it. As in a standard commitment protocol, we assume that it is not computationally feasible for O to find a different number N_O′ such that h(N_O′) = h(N_O). In the second step, R replies with its own commitment. Finally, O and R exchange the actual contract authenticators.
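The commitment step can be illustrated concretely: revealing h(N_O) binds O to N_O without disclosing it, and a later opening is accepted only if it hashes to the committed value. A toy sketch of ours, with `hashlib` standing in for h and a plain tuple in place of the signed message me1 (signatures are omitted, so this shows only the commitment mechanics, not the full protocol):

```python
import hashlib
import secrets

def h(data: bytes) -> bytes:
    """Stand-in for the protocol's hash function h."""
    return hashlib.sha256(data).digest()

def commit(contract: bytes):
    """O's first step: draw a contract authenticator N_O and commit to h(N_O)."""
    n_o = secrets.token_bytes(16)
    me1 = (contract, h(n_o))          # stands in for {M, h(N_O)}_SK_O
    return n_o, me1

def check_opening(me1, n_o: bytes) -> bool:
    """R's final check: the revealed authenticator must match the commitment."""
    _, commitment = me1
    return h(n_o) == commitment
```

Any forged authenticator fails `check_opening`, mirroring the assumption above that finding N_O′ with h(N_O′) = h(N_O) is infeasible.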
We specify the sub-protocol in CryptoSPA (see the figure below), introducing some low-level actions to verify the correctness of protocol executions. We say that an execution is correct if we observe the sequence of low-level actions received me1, received me2, received N_O, received N_R, in this order.

O(M, N_O) ≝ [⟨N_O, k_h⟩ ⊢enc n][⟨(M, n), SK_O⟩ ⊢enc p] c̄p. c(v).
            [⟨v, PK_R⟩ ⊢dec i][i ⊢fst p′][i ⊢snd r′][p′ = p] received v.
            c̄N_O. c(j). [⟨j, k_h⟩ ⊢enc r″][r″ = r′] received j

R(M, N_R) ≝ c(q). [⟨q, PK_O⟩ ⊢dec s][s ⊢fst m][s ⊢snd n′][m = M] received q.
            [⟨N_R, k_h⟩ ⊢enc r][⟨(q, r), SK_R⟩ ⊢enc t] c̄t. c(u).
            [⟨u, k_h⟩ ⊢enc n″][n″ = n′] received u. c̄N_R

P ≝ O(M, N_O) || R(M, N_R)

Fig. 1. The CryptoSPA specification of the exchange sub-protocol of ASW
We can demonstrate that the protocol does not satisfy property P_BNDC^φ when φ consists of public information and the private data of possible attackers. This can be easily checked by applying Theorem 3. Indeed, just by observing the protocol's ELTS, one can immediately notice that there exists a configuration transition φ ⊲ P −a→ φ′ ⊲ P′ with a = c̄me1, but there are no φ″ and P″ such that φ ⊲ P =τ̂⇒ φ″ ⊲ P″ and φ′ ⊲ P′ \ H ≈_B^c φ″ ⊲ P″ \ H. In fact, it is easy to prove that φ′ ⊲ P′ \ H ≈_B^c 0 for all φ′, while φ″ ⊲ P″ \ H ≉_B^c 0 for all P″ and φ″ such that φ ⊲ P =τ̂⇒ φ″ ⊲ P″. However, the fact that the ASW protocol does not satisfy P_BNDC^φ in this case does not represent a real attack on the protocol, since such a situation is resolved by invoking the trusted party T.
More interestingly, we can analyze the protocol under the assumption that one of
the participants is corrupt. This can be done by augmenting the knowledge φ with the
corrupt party’s private information such as its private key and its contract authenticator.
We can show that the protocol does not satisfy P BNDCφ when O is corrupt, finding
the attack already described in [14].
6 Conclusions and Related Work
We have studied context-sensitive equivalence relations and the related proof techniques within the process algebra CryptoSPA to analyze protocols. Our approach builds on context-sensitive labelled transition systems, whose transitions are constrained by the knowledge of the environment. We showed that our technique can be used to analyze both safety and liveness properties of cryptographic protocols.
In a recent paper, Gorrieri et al. [11] prove results related to ours for a real-time extension of CryptoSPA. In particular, they prove an equivalent of Theorem 1; however, while the results are equivalent, the underlying proof techniques are not. More precisely, instead of using context-sensitive LTSs, [11] introduces a special hiding operator /φ and proves that P ∈ NDC^φ if and only if P \ H ≈_T P/φ H. The process P/φ H corresponds exactly to our configuration φ ⊲ P/H, in that the corresponding LTSs are isomorphic. However, the approach of [11] is still restricted to the class of observation equivalences that are behavioral preorders on processes, and thus it does not extend to bisimulations.
As we pointed out at the outset, our approach is inspired by Boreale, De Nicola and Pugliese's work [4] on characterizing may-testing and barbed congruence in the spi-calculus by means of trace and bisimulation equivalences built on top of context-sensitive LTSs. Based on the same technique, symbolic semantics and compositional proofs have recently been studied in [3, 5], providing effective tools for the verification of cryptographic protocols. Symbolic description methods could be exploited to deal with the state-explosion problems that are intrinsic to the construction of context-sensitive labelled transition systems. Future plans include work in that direction.
References
1. M. Abadi. Security Protocols and Specifications. In W. Thomas, editor, Proc. of the Second
International Conference on Foundations of Software Science and Computation Structure
(FoSSaCS’99), volume 1578 of LNCS, pages 1–13. Springer-Verlag, 1999.
2. N. Asokan, V. Shoup, and M. Waidner. Asynchronous Protocols for Optimistic Fair Exchange. In Proc. of the IEEE Symposium on Research in Security and Privacy, pages 86–99.
IEEE Computer Society Press, 1998.
3. M. Boreale and M. G. Buscemi. A Framework for the Analysis of Security Protocols. In
Proc. of the 13th International Conference on Concurrency Theory (CONCUR’02), volume
2421 of LNCS, pages 483–498. Springer-Verlag, 2002.
4. M. Boreale, R. De Nicola, and R. Pugliese. Proof Techniques for Cryptographic Processes. In
Proc. of the 14th IEEE Symposium on Logic in Computer Science (LICS’99), pages 157–166.
IEEE Computer Society Press, 1999.
5. M. Boreale and D. Gorla. On Compositional Reasoning in the spi-calculus. In Proc. of the 5th
International Conference on Foundations of Software Science and Computation Structures
(FoSSaCS'02), volume 2303 of LNCS, pages 67–81. Springer-Verlag, 2002.
6. A. Bossi, R. Focardi, C. Piazza, and S. Rossi. A Proof System for Information Flow Security.
In M. Leuschel, editor, Proc. of Int. Workshop on Logic Based Program Development and
Transformation, LNCS. Springer-Verlag, 2002. To appear.
7. A. Ceccato. Analisi di protocolli crittografici in contesti ostili. Laurea thesis, Università Ca’
Foscari di Venezia, 2001.
8. R. Focardi and R. Gorrieri. Classification of Security Properties (Part I: Information Flow).
In R. Focardi and R. Gorrieri, editors, Foundations of Security Analysis and Design, volume
2171 of LNCS. Springer-Verlag, 2001.
9. R. Focardi, R. Gorrieri, and F. Martinelli. Non Interference for the Analysis of Cryptographic
Protocols. In U. Montanari, J.D.P. Rolim, and E. Welzl, editors, Proc. of Int. Colloquium on
Automata, Languages and Programming (ICALP’00), volume 1853 of LNCS, pages 744–
755. Springer-Verlag, 2000.
10. R. Focardi and S. Rossi. Information Flow Security in Dynamic Contexts. In Proc. of
the 15th IEEE Computer Security Foundations Workshop, pages 307–319. IEEE Computer
Society Press, 2002.
11. R. Gorrieri, E. Locatelli, and F. Martinelli. A Simple Language for Real-time Cryptographic
Protocol Analysis. In Proc. of 12th European Symposium on Programming Languages and
Systems, LNCS. Springer-Verlag, 2003. To appear.
12. H. Mantel. Unwinding Possibilistic Security Properties. In Proc. of the European Symposium
on Research in Computer Security, volume 1895 of LNCS, pages 238–254. Springer-Verlag,
2000.
13. R. Milner. Communication and Concurrency. Prentice-Hall, 1989.
14. V. Shmatikov and J. C. Mitchell. Analysis of a Fair Exchange Protocol. In Proc. of 7th
Annual Symposium on Network and Distributed System Security (NDSS 2000), pages 119–
128. Internet Society, 2000.
On the Exponentiation of Languages
Werner Kuich¹ and Klaus W. Wagner²
¹ Institut für Algebra und Computermathematik, Technische Universität Wien,
Wiedner Hauptstraße 8, A-1040 Wien
kuich@tuwien.ac.at
² Institut für Informatik, Bayerische Julius-Maximilians-Universität Würzburg,
Am Hubland, D-97074 Würzburg, Germany
wagner@informatik.uni-wuerzburg.de
Abstract. We characterize the exponentiation of languages by other
language operations: In the presence of some “weak” operations, exponentiation is exactly as powerful as complement and ε-free morphism.
This characterization implies, among other things, that a semi-AFL is closed
under complement iff it is closed under exponentiation. As an application
we characterize the exponentiation closure of the context-free languages.
Furthermore, P is closed under exponentiation iff P = NP , and NP is
closed under exponentiation iff NP = co-NP.
1 Introduction
Kuich, Sauer, Urbanek [4] defined addition + and multiplication × (different from concatenation) in such a way that equivalence classes of formal languages, defined with the help of length-preserving morphisms, form a lattice. They defined lattice families of formal languages and showed that, if F is a lattice family of languages, then L_F is a lattice with a least and a largest element. Here L_F is a set of equivalence classes defined by a family F of languages.
Moreover, Kuich, Sauer, Urbanek [4] defined exponentiation of formal languages as a new operation. They then defined stable families of languages (essentially, these are lattice families of languages closed under exponentiation) and showed that, if F is a stable family of languages, then L_F is a Heyting algebra with a largest element. Moreover, they proved that stable families F of languages can be used to characterize the join and meet irreducibility of L_F. (See Theorems 4.2 and 4.3 of Kuich, Sauer, Urbanek [4].)
From the point of view of lattice theory it is, by the results quoted above,
very interesting to find families of languages that are lattice families or stable
families.
The paper consists of this and four more sections. In Section 2, we introduce
the language operations and language families (formal language classes as well
as complexity classes) which are considered in this paper, and we cite from the
literature the present knowledge on the closure properties of these classes.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 376–386, 2003.
© Springer-Verlag Berlin Heidelberg 2003
In Section 3 we examine which "classical" language operations are needed to generate the operations addition, multiplication and exponentiation. As corollaries we get lists of classes which are closed under these operations and which are lattice families or stable families, resp. It turns out that the regular languages, the context-sensitive languages, the rudimentary languages, the class PH of the polynomial-time hierarchy, and the complexity classes PSPACE, DSPACE(s) for s(n) ≥ n, and NSPACE(s) for space-constructible s(n) ≥ n are stable families and hence closed under exponentiation.
In Section 4 we prove that, for every family F of languages that contains all regular languages, the closure of F under union, inverse morphism and exponentiation coincides with the closure of F under union, inverse morphism, ε-free morphism and complement. Since union and inverse morphism are weak operations which only smooth given language classes, this result can informally be stated as follows: exponentiation is just as powerful as ε-free morphism and complement together. As one of the possible consequences we obtain: a semi-AFL is closed under exponentiation iff it is closed under complement.
In Section 5 we apply the results of Section 4 to various classes of languages
which are not closed or not known to be closed under exponentiation. Kuich,
Sauer, Urbanek [4] proved that the class CFL of context-free languages is not
closed under exponentiation. We show that the closure of CFL under exponentiation and the weak operations of union and inverse morphism coincides with
Smullyan’s class RUD of rudimentary languages. Furthermore, we prove that the
family of languages P (languages accepted by a deterministic Turing machine in
polynomial time) is closed under exponentiation iff P = NP, and that the family
of languages NP (languages accepted by a nondeterministic Turing machine in
polynomial time) is closed under exponentiation iff NP = co-NP.
It is assumed that the reader has a basic knowledge of lattice theory (see Balbes, Dwinger [2]), formal language and automata theory (see Ginsburg [3]), and
complexity theory (see Balcázar, Díaz, Gabarró [1] and Wagner, Wechsung [7]).
2
Families of Languages and Their Closure Properties
In this paper we consider several classical operations on languages. We use the
symbol εh (lh, h−1 , lh−1 , ∩REG, and − , resp.) for the operation of ε-free morphism (length preserving morphism, inverse morphism, inverse length preserving
morphism, intersection with regular languages, and complement, resp.).
Given operations O1 , O2 , . . . , Or on languages, we introduce the closure operator ΓO1 ,O2 ,...,Or on families of languages as follows: For a family F of languages,
ΓO1 ,O2 ,...,Or (F) is the closure of F under the operations O1 , O2 , . . . , Or , i.e., the
least family of languages containing F and being closed under the operations
O1 , O2 , . . . , Or .
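The closure operator can be imitated concretely on finite data. A minimal sketch (toy universe of finite languages and binary operations only; all sample data is illustrative, not from the paper): iterate the operations until no new languages appear.

```python
# Sketch of the closure operator Gamma_{O1,...,Or}: starting from a family F,
# keep applying the operations until a fixpoint is reached.

def closure(family, ops):
    fam = set(family)
    while True:
        new = {op(x, y) for op in ops for x in fam for y in fam} - fam
        if not new:
            return fam           # least family containing F, closed under ops
        fam |= new

union = lambda x, y: frozenset(x | y)
intersection = lambda x, y: frozenset(x & y)

F = {frozenset({"a"}), frozenset({"b"})}
G = closure(F, [union, intersection])
# G additionally contains {"a","b"} (union) and the empty set (intersection)
```

This terminates whenever the generated family is finite, which is all the sketch claims.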
Let REG, CFL, and CSL be the classes of regular, context-free, and context-sensitive languages, resp. The class LOGCFL consists of the languages which are logarithmic-space many-one reducible to context-free languages. The class RUD
W. Kuich and K.W. Wagner
of rudimentary languages is the smallest class of languages that contains CFL
and is closed under ε-free morphism and complement, i.e., RUD = Γεh,− (CFL).
The classes P and NP consist of all languages which can be accepted in
polynomial time by deterministic and nondeterministic, resp., Turing machines.
Let co-NP be the class of all languages whose complement is in NP. With Q we denote the class of languages which can be accepted in linear time by nondeterministic Turing machines.
The classes L and NL consist of all languages which can be accepted in
logarithmic space by deterministic and nondeterministic, resp., Turing machines.
The class PSPACE consists of all languages which can be accepted in polynomial
space by deterministic Turing machines.
Let Σ_k^p and Π_k^p, k ≥ 1, be the classes of the polynomial-time hierarchy, i.e., Σ_1^p = NP, Σ_{k+1}^p is the class of all languages which are nondeterministically polynomial-time Turing-reducible to languages from Σ_k^p, and Π_k^p is the class of all languages whose complement is in Σ_k^p (k ≥ 1). Finally, PH is the union of all these classes Σ_k^p and Π_k^p. Notice that PH ⊆ PSPACE.
For a function t : N → N, the classes DTIME(t) and NTIME(t) consist of all
languages which can be accepted in time t by deterministic and nondeterministic,
resp., Turing machines. For a function s : N → N, the classes DSPACE(s)
and NSPACE(s) consist of all languages which can be accepted in space s by
deterministic and nondeterministic, resp., Turing machines.
For exact definitions and more information about these classes see e.g. [1]
and [7]. The following table shows the known closure properties of these classes
(cf. [7]).
Theorem 21 An entry + (-, ?, resp.) in the following table means that the
class in this row is closed (not closed, not known to be closed, resp.) under the
operation in this column.
3
Lattice Families and Stable Families of Languages
In this section we introduce the operations of addition, multiplication and exponentiation of languages, and we see how they can be generated by “classical”
operations on languages.
Throughout this paper the symbol Σ (possibly provided with indices) denotes
a finite subalphabet of some infinite alphabet Σ∞ of symbols.
Let L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗ . Define L1 ≤ L2 if h(L1 ) ⊆ L2 for some length
preserving morphism h : Σ1∗ → Σ2∗ and L1 ∼ L2 if L1 ≤ L2 and L2 ≤ L1 . Then
∼ is an equivalence relation. If L1 ∼ L′1 and L2 ∼ L′2 then L1 ≤ L2 iff L′1 ≤ L′2 .
It follows that ≤ is a partial order relation on the ∼-equivalence classes. Let [L]
be the ∼-equivalence class including the language L.
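On finite sample languages the preorder ≤ can be decided by brute force over all length preserving morphisms, since such a morphism is just a letter-to-letter map. A sketch (the sample languages are illustrative assumptions):

```python
from itertools import product

# L1 <= L2 iff h(L1) is a subset of L2 for some length preserving
# morphism h, i.e. some map h: Sigma1 -> Sigma2 applied letter by letter.

def leq(L1, S1, L2, S2):
    for values in product(S2, repeat=len(S1)):
        h = dict(zip(S1, values))
        if all("".join(h[c] for c in w) in L2 for w in L1):
            return True
    return False

assert leq({"aa"}, "a", {"bb", "c"}, "bc")       # h(a) = b maps aa to bb
assert not leq({"bb", "c"}, "bc", {"aa"}, "a")   # "c" maps to a length-1 word
```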
Let L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗ . Define L1 × L2 = {(a1 , b1 ) . . . (an , bn ) | a1 . . . an ∈ L1 , b1 . . . bn ∈ L2 } ⊆ (Σ1 × Σ2 )∗ , and let L1 + L2 be the disjoint union of L1 and L2 , that is, the language L1 ∪ L2 if Σ1 ∩ Σ2 = ∅. If Σ1 ∩ Σ2 ≠ ∅
Table (for Theorem 21): closure properties of the considered language classes.

language class          ∪   ∩REG  ∩    −    εh   lh−1  h−1
REG                     +    +    +    +    +    +     +
CFL                     +    +    −    −    +    +     +
CSL                     +    +    +    +    +    +     +
LOGCFL                  +    +    +    +    ?    +     +
RUD                     +    +    +    +    +    +     +
L                       +    +    +    +    ?    +     +
NL                      +    +    +    +    ?    +     +
P                       +    +    +    +    ?    +     +
Q                       +    +    +    ?    +    +     +
NP                      +    +    +    ?    +    +     +
co-NP                   +    +    +    ?    ?    +     +
Σ_k^p (k ≥ 1)           +    +    +    ?    +    +     +
Π_k^p (k ≥ 1)           +    +    +    ?    ?    +     +
PH                      +    +    +    +    +    +     +
PSPACE                  +    +    +    +    +    +     +
DTIME(t) (t(n) ≥ n)     +    +    +    +    ?    +     +^1
NTIME(t) (t(n) ≥ n)     +    +    +    ?    +    +     +^1
DSPACE(s) (s(n) ≥ n)    +    +    +    +    +    +     +^2
NSPACE(s) (s(n) ≥ n)    +    +    +    +^3  +    +     +^2

The functions t and s are assumed to be increasing.
^1 Replace t with t(O(n)).
^2 Replace s with s(O(n)).
^3 Assume that s is space-constructible, i.e., the computation x → s(|x|) can be carried out in space s(|x|).
then create the new alphabet Σ̄ = {ā | a ∈ Σ2 } such that Σ1 ∩ Σ̄ = ∅ and a
copy L̄ ⊆ Σ̄ ∗ of L2 and take L1 + L2 = L1 ∪ L̄.
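For finite sample languages both operations can be computed directly from the definitions. In this sketch words are tuples of symbols, so letters of the product alphabet Σ1 × Σ2 need no extra encoding; the sample languages are illustrative.

```python
def product_lang(L1, L2):
    """L1 x L2: synchronized pairing of equal-length words."""
    return {tuple(zip(u, v)) for u in L1 for v in L2 if len(u) == len(v)}

def disjoint_union(L1, L2):
    """L1 + L2: tag each symbol with 1 or 2 to force disjoint alphabets."""
    return ({tuple((1, a) for a in w) for w in L1}
            | {tuple((2, a) for a in w) for w in L2})

L1 = {("a",), ("a", "a")}
L2 = {("b",), ("c", "c")}
# only equal-length pairs survive the product
assert product_lang(L1, L2) == {(("a", "b"),), (("a", "c"), ("a", "c"))}
assert len(disjoint_union(L1, L2)) == len(L1) + len(L2)
```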
It is easy to see that if L1 ∼ L3 and L2 ∼ L4 then L1 + L2 ∼ L3 + L4 and L1 × L2 ∼ L3 × L4 . It follows that the operations + and × lift consistently to ∼-equivalence classes of languages. It is clear that multiplication × and addition + on ∼-equivalence classes are commutative and associative operations. We denote the set of ∼-equivalence classes of languages by L. If F is a family of languages then we denote LF = {[L] ∩ F | L ∈ F}. By 1̊ ∈ L we denote the ∼-equivalence class containing the language {a}∗ for some a ∈ Σ∞ , and by ∅ ∈ L we denote the ∼-equivalence class containing the language ∅.
A lattice ⟨P ; ≤, +, ×⟩ is a partially ordered set in which for every two elements a, b ∈ P there exists a least upper bound, denoted by a + b, and a greatest lower bound, denoted by a × b.
A family F of languages is called a lattice family if F is closed under isomorphism, addition + and multiplication ×, and contains ∅ and Σ ∗ for all finite Σ ⊂ Σ∞ .
Theorem 31 (Kuich, Sauer, Urbanek [4]) ⟨L; ≤, +, ×⟩ is a lattice with least element ∅ and largest element 1̊. If F is a lattice family of languages then ⟨LF ; ≤, +, ×⟩ is a lattice with least element ∅ and largest element 1̊.
Lemma 32 For all L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗ there exist length preserving morphisms H, H1 , H2 such that
L1 + L2 = L1 ∪ H −1 (L2 )
and
L1 × L2 = H1−1 (L1 ) ∩ H2−1 (L2 ) .
Proof. (i) If Σ1 ∩ Σ2 = ∅ then H : Σ2∗ → Σ2∗ is the identity. If Σ1 ∩ Σ2 ≠ ∅ then create the new alphabet Σ̄ = {ā | a ∈ Σ2 } and define H : Σ̄ ∗ → Σ2∗ by H(ā) = a, a ∈ Σ2 .
(ii) Define Hi : (Σ1 × Σ2 )∗ → Σi∗ , i = 1, 2, by H1 ([a, b]) = a and H2 ([a, b]) =
b, a ∈ Σ1 , b ∈ Σ2 . Then L1 × L2 = H1−1 (L1 ) ∩ H2−1 (L2 ).
⊓⊔
From this and the previous theorem we conclude the following theorem.
Theorem 33 1. If F is a family of languages closed under union, intersection
and inverse length preserving morphism then F is also closed under addition
and multiplication.
2. If F is a family of languages that contains ∅ and Σ ∗ for all finite Σ ⊆ Σ∞
and that is closed under union, intersection, and inverse length preserving
morphism then F is a lattice family.
Corollary 34 The following families of languages are lattice families:
(i) REG, CSL, LOGCFL, and RUD.
(ii) L, NL, P, Q, NP, and PSPACE.
(iii) Σ_k^p and Π_k^p for k ≥ 1, and PH.
(iv) DTIME(t) and NTIME(t) for t(n) ≥ n.
(v) DSPACE(s) and NSPACE(s) for s(n) ≥ n.
Proof. This is an immediate consequence of Theorem 21. ⊓⊔
Let Σ = {h | h : Σ1 → Σ2 } be the set of all functions h : Σ1 → Σ2 , considered as an alphabet. This alphabet is denoted by Σ2^{Σ1}. For f = h1 . . . hn ∈ Σ^n and w = a1 . . . am ∈ Σ1^m define

    f(w) = h1 (a1 ) . . . hn (an )   if n = m,   and f(w) is undefined if n ≠ m

(and ε(ε) = ε if n = 0). For L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗ define

    L2^{L1} = {f ∈ Σ∗ | f(w) ∈ L2 for all w ∈ L1 for which f(w) is defined} .

Observe that L2^{L1} depends on the sets Σ1 and Σ2 .
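Since the definition quantifies over all w ∈ L1 of matching length, L2^{L1} can be enumerated by brute force for finite sample data and bounded word length. A sketch (the sample languages, alphabets, and the encoding of function letters as sorted pair tuples are assumptions for illustration):

```python
from itertools import product

def exponentiation(L1, L2, S1, S2, max_len):
    """Words over Sigma2^Sigma1 (functions S1 -> S2) of length <= max_len
    that map every matching-length member of L1 into L2."""
    letters = [dict(zip(S1, vals)) for vals in product(S2, repeat=len(S1))]
    result = set()
    for n in range(max_len + 1):
        for hs in product(letters, repeat=n):
            if all(tuple(h[a] for h, a in zip(hs, w)) in L2
                   for w in L1 if len(w) == n):
                # encode each function letter as a sorted tuple of pairs
                result.add(tuple(tuple(sorted(h.items())) for h in hs))
    return result

E = exponentiation(L1={("a",)}, L2={("1",)}, S1=("a",), S2=("0", "1"),
                   max_len=1)
# the empty word is in E vacuously; of the two length-1 function letters,
# only the one with a -> "1" qualifies
assert E == {(), ((("a", "1"),),)}
```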
The notion of exponentiation lifts to ∼-equivalence classes of languages. Hence, for ∼-equivalence classes of languages L1 and L2 , the class L2^{L1} is independent of the alphabets.
A lattice ⟨P ; ≤, +, ×⟩ is called a Heyting algebra if (i) for all a, b ∈ P there exists a greatest c ∈ P such that a × c ≤ b; this element c is denoted by b^a and is called the exponentiation of b by a; and (ii) there exists a least element 0 in P .
A family F of languages is stable if it is a lattice family and closed under
exponentiation and intersection with regular languages.
Theorem 35 (Kuich, Sauer, Urbanek [4]) Let F be a stable family of languages. Then ⟨LF ; ≤, +, ×⟩ is a Heyting algebra, where the class ∅ is the 0-element and 1̊ is the largest element.
Hence, for the equivalence classes of LF , where F is a stable family of languages, the computation rules given in Kuich, Sauer, Urbanek [4], Corollary 2.3, are valid, e.g., L^{L1+L2} = L^{L1} × L^{L2}, (L^{L1})^{L2} = L^{L1×L2}, (L1 × L2)^L = L1^L × L2^L for all L, L1 , L2 ∈ LF .
For L ⊆ Σ ∗ we define the complement of L by complΣ (L) = Σ ∗ − L.
Lemma 36 For all L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗ there exist length preserving morphisms H1 , H2 , H3 such that

    L2^{L1} = complΣ (H3 (H1−1 (L1 ) ∩ H2−1 (complΣ2 (L2 )))) ,

where Σ = Σ2^{Σ1}.
Proof. Define the morphisms H1 : (Σ × Σ1 )∗ → Σ1∗ , H2 : (Σ × Σ1 )∗ → Σ2∗ and
H3 : (Σ × Σ1 )∗ → Σ ∗ by H1 ([h, a]) = a, H2 ([h, a]) = h(a) and H3 ([h, a]) = h
for all h ∈ Σ and a ∈ Σ1 . Then, for all h1 , . . . , hn ∈ Σ, n ≥ 0,

    h1 . . . hn ∈ complΣ (L2^{L1})
      ⇔ ∃a1 , . . . , an (a1 . . . an ∈ L1 ∧ h1 (a1 ) . . . hn (an ) ∈ complΣ2 (L2 ))
      ⇔ ∃a1 , . . . , an (H1 ([h1 , a1 ]) . . . H1 ([hn , an ]) ∈ L1 ∧ H2 ([h1 , a1 ]) . . . H2 ([hn , an ]) ∈ complΣ2 (L2 ))
      ⇔ ∃a1 , . . . , an ([h1 , a1 ] . . . [hn , an ] ∈ H1−1 (L1 ) ∩ H2−1 (complΣ2 (L2 )))
      ⇔ h1 . . . hn ∈ H3 (H1−1 (L1 ) ∩ H2−1 (complΣ2 (L2 ))) . ⊓⊔
From this and the previous theorem we conclude the following theorem.
Theorem 37 1. If F is a family of languages closed under union, complement,
inverse length preserving morphism and length preserving morphism then F
is also closed under exponentiation.
2. If F is a family of languages that contains ∅ and Σ ∗ for all finite Σ ⊆ Σ∞ and
that is closed under union, complement, inverse length preserving morphism,
length preserving morphism and intersection with regular languages then F
is stable.
From this and Theorem 21 we obtain
Corollary 38 The following families of languages are stable (and hence closed
under exponentiation):
(i) REG, CSL, and RUD.
(ii) PH and PSPACE.
(iii) DSPACE(s) for s(n) ≥ n.
(iv) NSPACE(s) for space-constructible s(n) ≥ n.
4
On the Power of Exponentiation
In this section we will compare the power of exponentiation with the power of complement and ε-free morphism. In this comparison some other operations play a role, namely union, intersection with regular languages, and inverse morphism. However, these operations are weak in the sense that they do not really add power to language classes; they only smooth them. Practically all formal language classes and complexity classes are closed under these operations. On the other hand, the operations of length preserving morphism and complement are more powerful: ε-free morphisms introduce nondeterminism, and the class of context-free languages, for example, is not closed under complement.
In this section we prove that, in the presence of the above mentioned weak
operations, ε-free morphism and complement on the one side and exponentiation
on the other side are equally powerful.
We start with two lemmas showing how length preserving morphism and
complementation can be generated by exponentiation.
For Σ ⊂ Σ∞ we define EΣ ⊆ (Σ × Σ)∗ by EΣ = {[x, x] | x ∈ Σ}+ . Observe
that complΣ×Σ (EΣ ) is a regular language.
Lemma 41 For L ⊆ Σ ∗ there exists a length preserving morphism H : Σ ∗ → ((Σ × Σ)^Σ )∗ such that complΣ (L) = H −1 ((complΣ×Σ (EΣ ))^L ).
Proof. We define hb : Σ → Σ × Σ by hb (a) = [a, b] and the morphism H by
H(b) = hb for all a, b ∈ Σ. Then, for b1 , . . . , bn ∈ Σ, the equivalence
b1 . . . bn ∈ L ⇔ ∃a1 , . . . , an (a1 . . . an ∈ L ∧ a1 . . . an = b1 . . . bn )
implies the equivalences
b1 . . . bn ∈ complΣ (L) ⇔ ∀a1 , . . . , an (a1 . . . an ∈ L ⇒ a1 . . . an ≠ b1 . . . bn )
⇔ ∀a1 , . . . , an (a1 . . . an ∈ L ⇒ [a1 , b1 ] . . . [an , bn ] ∈ complΣ×Σ (EΣ ))
⇔ ∀a1 , . . . , an (a1 . . . an ∈ L ⇒ hb1 (a1 ) . . . hbn (an ) ∈ complΣ×Σ (EΣ ))
⇔ hb1 . . . hbn ∈ (complΣ×Σ (EΣ ))^L
⇔ H(b1 . . . bn ) ∈ (complΣ×Σ (EΣ ))^L
⇔ b1 . . . bn ∈ H −1 ((complΣ×Σ (EΣ ))^L ) ⊓⊔
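The mechanism of Lemma 41 can be seen on a finite sample: under H, a word b1 . . . bn lies in the inverse image of (complΣ×Σ(EΣ))^L exactly when it differs from every length-n member of L, i.e. when it lies in complΣ(L). A brute-force sketch over words of bounded length (sample alphabet and language assumed):

```python
from itertools import product

S = "ab"
L = {"ab", "ba"}

def in_exp(bs, L):
    """Is H(bs) = h_{b1}...h_{bn} in (compl E_Sigma)^L ?  Every matching-
    length w in L must pair with bs outside E_Sigma, i.e. differ from bs."""
    return all(w != bs for w in L if len(w) == len(bs))

complement = {"".join(bs) for n in range(3)
              for bs in product(S, repeat=n) if in_exp("".join(bs), L)}
assert complement == {"", "a", "b", "aa", "bb"}   # compl(L) up to length 2
```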
For a length preserving morphism h : Σ1∗ → Σ2∗ we define Eh = {[x, h(x)] |
x ∈ Σ1 }+ . Observe that Eh is a regular language.
Lemma 42 For L ⊆ Σ1∗ and a length preserving morphism h : Σ1∗ → Σ2∗ there exist length preserving morphisms H1 : Σ2∗ → ((Σ1 × Σ2 )^{Σ1} )∗ and H2 : (Σ1 × Σ2 )∗ → Σ1∗ such that

    h(L) = complΣ2 (H1−1 ((complΣ1 ×Σ2 (Eh ∩ H2−1 (L)))^{Σ1∗} )) .
Proof. We define hb : Σ1 → Σ1 × Σ2 by hb (a) = [a, b], H1 by H1 (b) = hb , and H2 by H2 ([a, b]) = a for all a ∈ Σ1 , b ∈ Σ2 . Then, for b1 , . . . , bn ∈ Σ2 , the equivalence

    b1 . . . bn ∈ h(L) ⇔ ∃a1 , . . . , an (h(a1 . . . an ) = b1 . . . bn ∧ a1 . . . an ∈ L)

implies the equivalences

b1 . . . bn ∈ complΣ2 (h(L))
⇔ ∀a1 , . . . , an (a1 . . . an ∈ Σ1∗ ⇒ ¬(h(a1 . . . an ) = b1 . . . bn ∧ a1 . . . an ∈ L))
⇔ ∀a1 , . . . , an (a1 . . . an ∈ Σ1∗ ⇒ [a1 , b1 ] . . . [an , bn ] ∉ Eh ∩ H2−1 (L))
⇔ ∀a1 , . . . , an (a1 . . . an ∈ Σ1∗ ⇒ hb1 (a1 ) . . . hbn (an ) ∈ complΣ1 ×Σ2 (Eh ∩ H2−1 (L)))
⇔ hb1 . . . hbn ∈ (complΣ1 ×Σ2 (Eh ∩ H2−1 (L)))^{Σ1∗}
⇔ H1 (b1 . . . bn ) ∈ (complΣ1 ×Σ2 (Eh ∩ H2−1 (L)))^{Σ1∗}
⇔ b1 . . . bn ∈ H1−1 ((complΣ1 ×Σ2 (Eh ∩ H2−1 (L)))^{Σ1∗} ) . ⊓⊔
The next lemma shows how ε-free morphisms can be generated by length
preserving morphisms (cf. [3]).
Lemma 43 Consider L ⊆ Σ1∗ and an ε-free morphism h : Σ1∗ → Σ2∗ . Then there exist a length preserving morphism h′ : Σ ∗ → Σ2∗ , a morphism H : Σ ∗ → Σ1∗ , and a regular set R ⊆ Σ ∗ such that

    h(L) = h′ (H −1 (L) ∩ R) .
Proof. Let Σ1 = {a1 , . . . , ak }, and let h(ai ) = bi1 bi2 . . . biri for i = 1, . . . , k.
Define the alphabet Σ by Σ = {aij | i = 1, . . . , k and j = 1, . . . , ri }, the length
preserving morphism h′ by h′ (aij ) = bij for i = 1, . . . , k and j = 1, . . . , ri , the
morphism H by H(ai1 ) = ai , H(aij ) = ε for i = 1, . . . , k and j = 2, . . . , ri , and
the regular set R by R = {ai1 ai2 . . . airi | i = 1, . . . , k}∗ . Then we obtain
h(L) = {h(ai1 ai2 . . . ain ) | ai1 ai2 . . . ain ∈ L}
     = {bi1 1 . . . bi1 ri1 bi2 1 . . . bi2 ri2 . . . bin 1 . . . bin rin | ai1 ai2 . . . ain ∈ L}
     = {h′ (ai1 1 . . . ai1 ri1 ai2 1 . . . ai2 ri2 . . . ain 1 . . . ain rin ) | ai1 ai2 . . . ain ∈ L}
     = h′ ({ai1 1 . . . ai1 ri1 ai2 1 . . . ai2 ri2 . . . ain 1 . . . ain rin | ai1 ai2 . . . ain ∈ L})
     = h′ (H −1 (L) ∩ R) . ⊓⊔
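The construction of Lemma 43 can be replayed on a small hypothetical morphism (the concrete h, h′, H, and R below are illustrative, not from the paper): each letter is expanded into a block of fresh letters, one per output position, on which the length preserving h′ acts letter by letter.

```python
h = {"a": "xy", "b": "z"}                 # a sample eps-free morphism
hp = {("a", 0): "x", ("a", 1): "y",       # length preserving h'
      ("b", 0): "z"}

def block_word(w):
    """The unique member of H^{-1}(w) intersected with R:
    expand each letter c into its block (c,0)...(c,r_c - 1)."""
    return [(c, j) for c in w for j in range(len(h[c]))]

def apply(m, w):
    return "".join(m[c] for c in w)

L = {"ab", "b"}
left = {apply(h, w) for w in L}                            # h(L) directly
right = {"".join(hp[x] for x in block_word(w)) for w in L}  # via the lemma
assert left == right == {"xyz", "z"}
```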
Using this notation we immediately obtain the following consequences from
Lemma 36, Lemma 41, Lemma 42, and Lemma 43.
Corollary 44 For any family F of languages there holds:
1. Γexp (F) ⊆ Γ∪,lh−1,lh,− (F)
2. Γ− (F) ⊆ Γlh−1,exp (F ∪ REG)
3. Γlh (F) ⊆ Γ∩REG,lh−1,−,exp (F)
4. Γεh (F) ⊆ Γ∩REG,h−1,lh (F)
Now we can prove the main theorem of this section. Informally it says that, in the presence of the weak operations ∪ and h−1 , the operation exp is as powerful as the operations εh and − (lh and − , resp.).
Theorem 45 For a family F of languages that contains REG, there holds
1. Γ∪,lh−1 ,lh,− (F) = Γ∪,lh−1 ,exp (F).
2. Γ∪,h−1 ,εh,− (F) = Γ∪,h−1 ,lh,− (F) = Γ∪,h−1 ,exp (F).
Proof. We conclude

Γ∪,lh−1,lh,− (F) ⊆ Γ∪,lh−1,∩REG,−,exp (F) = Γ∪,lh−1,−,exp (F)   (Corollary 44.3)
                ⊆ Γ∪,lh−1,exp (F)                               (Corollary 44.2)
                ⊆ Γ∪,lh−1,lh,− (F)                              (Corollary 44.1)

and

Γ∪,h−1,εh,− (F) ⊆ Γ∪,h−1,∩REG,lh,− (F) = Γ∪,h−1,lh,− (F)       (Corollary 44.4)
                ⊆ Γ∪,h−1,∩REG,−,exp (F) = Γ∪,h−1,−,exp (F)     (Corollary 44.3)
                ⊆ Γ∪,h−1,exp (F)                                (Corollary 44.2)
                ⊆ Γ∪,h−1,lh,− (F)                               (Corollary 44.1)
                ⊆ Γ∪,h−1,εh,− (F) .                             ⊓⊔
Corollary 46 1. Let F be a family of languages that contains REG and is
closed under union and inverse length preserving morphism. Then F is closed
under exponentiation iff it is closed under length preserving morphism and
complement.
2. Let F be a family of languages that contains REG and is closed under union
and inverse morphism. Then F is closed under exponentiation iff it is closed
under ε-free morphism and complement.
From this corollary we get directly the following three corollaries.
Corollary 47 Let F be a family of languages that contains REG and is closed
under union, inverse length preserving morphism, and length preserving morphism. Then F is closed under complement iff it is closed under exponentiation.
A family of languages is called a semi-AFL if it is closed under union, inverse morphism, ε-free morphism, and intersection with regular languages, and if it contains ∅ and Σ ∗ for all finite Σ ⊆ Σ∞ (see [3]).
Corollary 48 A semi-AFL is closed under complement iff it is closed under
exponentiation.
Corollary 49 1. Let F be a family of languages that contains REG and is
closed under union, complement and inverse length preserving morphism.
Then F is closed under length preserving morphism iff it is closed under
exponentiation.
2. Let F be a family of languages that contains REG and is closed under union,
complement and inverse morphism. Then F is closed under ε-free morphism
iff it is closed under exponentiation.
5
Application to Language Classes
In this section we apply the results of the previous section to the language classes
mentioned in Section 2. In the case that a class is not closed under exponentiation
we will characterize the closure of this class under exponentiation. In the case
that it is not known whether the class is closed under exponentiation we will
give equivalent conditions for the class being closed under exponentiation.
Let us start with the class CFL of context-free languages. By Lemma 2.1
of Kuich, Sauer, Urbanek [4], the context-free languages are not closed under
exponentiation. We are now able to determine the closure of CFL under exponentiation (together with some “weak” operations).
The class RUD of rudimentary languages, introduced by Smullyan in [5], can be considered as the linear-time analogue of the class PH of the polynomial-time hierarchy. From Theorem 45.2 and Theorem 21 we obtain the following theorem.
Theorem 51 The class RUD coincides with the closure of CFL under union,
inverse morphism and exponentiation.
Now we turn to classes which are not known to be closed under exponentiation. We start with some classes between L and P.
Theorem 52 Let F be a family of languages that is closed under union, complement, and logarithmic space many-one reducibility and that fulfills L ⊆ F ⊆ NP.
Then F is closed under exponentiation iff F = NP.
Proof. Obviously, closure under logarithmic space many-one reducibility implies
closure under inverse morphism. By Corollary 49.2 we obtain that F is closed
under exponentiation iff it is closed under ε-free morphism.
If F is closed under ε-free morphism, then we obtain F = Γεh (F) ⊇ Γεh (L).
A result by Springsteel [6] says that Γεh (L) ⊇ Q. Hence F ⊇ Q. The class Q
contains sets which are logarithmic space many-one complete for NP. Since F is
closed under logarithmic space many-one reducibility we get F ⊇ NP and hence
F = NP.
Conversely, if F = NP then, by Theorem 21, F is closed under ε-free morphism. ⊓⊔
Since the classes L, NL, LOGCFL, P, and NP ∩ coNP are closed under union,
complement, and logarithmic space many-one reducibility, we obtain the following corollary.
Corollary 53 1. L is closed under exponentiation iff L = NP.
2. NL is closed under exponentiation iff NL = NP.
3. LOGCFL is closed under exponentiation iff LOGCFL = NP.
4. P is closed under exponentiation iff P = NP.
5. NP ∩ coNP is closed under exponentiation iff NP = coNP.
The classes in the previous corollary are closed under complement but not known to be closed under ε-free morphism. For the nondeterministic time classes Q, NP, NTIME(t) and Σ_k^p the opposite is true. Here we can apply Corollary 47.
Theorem 54 1. Q is closed under exponentiation iff Q = co-Q.
2. NP is closed under exponentiation iff NP = co-NP.
3. For every increasing t : N → N such that t(n) ≥ n,
NTIME(t) is closed under exponentiation iff NTIME(t) = co-NTIME(t).
4. Σ_k^p is closed under exponentiation iff Σ_k^p = Π_k^p.
Note that Q = co-Q implies NP = co-NP, and NP = co-NP implies Σ_k^p = Π_k^p for k ≥ 2 (cf. [7]).
Finally we consider the classes Π_k^p of the polynomial-time hierarchy.
Theorem 55 For k ≥ 1, the class Π_k^p is closed under exponentiation iff Π_k^p = Σ_k^p.
Proof. If Π_k^p is closed under exponentiation then, by Corollary 44.2 and Theorem 21, Π_k^p is closed under complementation, i.e., Π_k^p = Σ_k^p.
Conversely, if Π_k^p = Σ_k^p then Π_k^p = PH. By Corollary 38 we obtain that Π_k^p is closed under exponentiation. ⊓⊔
References
[1] Balcázar J.L., Díaz J., Gabarró J.: Structural Complexity I. Second edition. Springer-Verlag, Berlin, 1995.
[2] Balbes R., Dwinger P.: Distributive Lattices. University of Missouri Press, 1974.
[3] Ginsburg S.: Algebraic and Automata-Theoretic Properties of Formal Languages.
North-Holland, 1975.
[4] Kuich W., Sauer N., Urbanek F.: Heyting algebras and formal languages. J.UCS
8(2002), 722–736.
[5] Smullyan R.: Theory of Formal Systems. Annals of Mathematical Studies vol. 47.
Princeton University Press, 1961.
[6] Springsteel F.N.: On the pre-AFL of log n space and related families of languages. Theoretical Computer Science 2(1976), 295–303.
[7] Wagner K., Wechsung G.: Computational Complexity. Deutscher Verlag der Wissenschaften, 1986.
Kleene’s Theorem for Weighted Tree-Automata
Christian Pech⋆
Technische Universität Dresden
Fakultät für Mathematik und Naturwissenschaften
D-01062 Dresden, Germany
pech@math.tu-dresden.de
Abstract. We sketch the proof of a Kleene-type theorem for formal tree-series
over commutative semirings. That is, for a suitable set of rational operations we
show that the proper rational formal tree-series coincide with the recognizable
ones. A complete proof is part of the PhD-thesis of the author, which is available
at [9].
Keywords: tree, automata, weight, language, Kleene’s theorem, Schützenberger’s
theorem, rational expression.
A formal tree-series is a function from the set TΣ of trees over a given ranked
alphabet Σ into a semiring K. The classical notion of formal tree-languages is obtained
if K is chosen to be the Boolean semiring.
Rational operations on formal tree-languages like sum, topcatenation, a-multiplication etc. have been used by Thatcher and Wright [11] to characterize the recognizable formal tree-languages by rational expressions. Thus they generalized the classical
Kleene-theorem [6] stating that rational and recognizable formal languages coincide.
The rational operations on tree-languages can be generalized to formal tree-series.
We would like to know the generating power of these operations. There are several results
on this problem—each for some restricted class of semirings—saying that for formal
tree-series the rational series coincide with the recognizable series, too. In particular it
was shown by Kuich [7] for complete, commutative semirings, by Bozapalidis [3] for
ω-additive, commutative semirings, by Bloom and Ésik [2] for commutative Conway semirings and by Droste and Vogler [5] for idempotent, commutative semirings. The necessary restrictions on the semiring are in contrast with the generality of Schützenberger's theorem for formal power series (i.e. functions from Σ ∗ into a semiring) [10], which is completely independent of the semiring.
Here we develop a technique for restricting the list of requirements to a minimum. The main idea is that, instead of working directly with formal tree-series, we introduce the notion of weighted tree-languages. They form a category which is algebraically more closely related to formal tree-languages than to formal tree-series. The environment that we obtain allows us to translate the known constructions of the rational operations directly to weighted tree-languages.
⋆
This work was supported by the German Research Council (DFG, GRK 433/2).
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 387–399, 2003.
© Springer-Verlag Berlin Heidelberg 2003
On the level of weighted tree-languages we can prove a Kleene-type theorem. Its proof is rather conventional and often uses classical automata-theoretic constructions tailored to the new categorical setting of weighted tree-languages.
Up to this point the results do not depend on the semiring at all. Only when translating our results to formal tree-series does the unavoidable restriction on the semiring become apparent. Luckily we only need to require the coefficient semiring to be commutative, a very mild restriction given that almost all semirings that are actually used in applications like image compression (cf. [4]) or natural language processing (cf. [8]) are commutative.
1
Preliminaries
A ranked alphabet (or ranked set) is a pair (Σ, rk) where Σ is a set of letters (an alphabet)
and rk : Σ → IN assigns to each letter its rank. With Σ (n) we denote the set of letters
from Σ with rank n. For any set X disjoint from Σ we define Σ(X) := (Σ ∪ X, rk′ )
where rk′|Σ := rk and rk′ (x) := 0 for all x ∈ X. If X consists just of one element x
then we also write Σ(x) instead of Σ({x}).
The set TΣ of trees is the smallest set of words such that Σ (0) ⊆ TΣ and, if f ∈ Σ (n) and t1 , . . . , tn ∈ TΣ , then f ⟨t1 , . . . , tn ⟩ ∈ TΣ .
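A minimal sketch of this definition (the nested-tuple encoding is an assumption made for illustration; the paper writes trees as words):

```python
rk = {"f": 2, "g": 1, "a": 0, "b": 0}   # a sample ranked alphabet

def is_tree(t):
    """t is (letter, child_1, ..., child_n) with n = rk(letter)."""
    head, *children = t
    return (head in rk and rk[head] == len(children)
            and all(is_tree(c) for c in children))

assert is_tree(("f", ("g", ("a",)), ("b",)))   # f<g<a>, b>
assert not is_tree(("g", ("a",), ("b",)))      # g has rank 1, not 2
```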
A semiring is a quintuple (K, ⊕, ⊙, 0, 1) such that (K, ⊕, 0) is a commutative
monoid, (K, ⊙, 1) is a monoid and the following identities hold: (x ⊕ y) ⊙ z =
(x ⊙ z) ⊕ (y ⊙ z), x ⊙ (y ⊕ z) = (x ⊙ y) ⊕ (x ⊙ z) and x ⊙ 0 = 0 ⊙ x = 0.
The set WTΣ of weighted trees is the smallest set of words such that [a|c] ∈ WTΣ for all a ∈ Σ (0) , c ∈ K and, if f ∈ Σ (n) , t1 , . . . , tn ∈ WTΣ and c ∈ K, then [f |c]⟨t1 , . . . , tn ⟩ ∈ WTΣ . Each weighted tree t has an underlying tree ut(t). This tree is obtained from t by deleting all weights from the nodes. Let a ∈ Σ (0) . To each tree s ∈ TΣ we associate its a-rank rka (s) ∈ IN. This is just the number of occurrences of the letter a in s. The a-rank can be lifted to weighted trees according to rka (t) := rka (ut(t)) (for t ∈ WTΣ ).
The semiring K acts naturally on WTΣ from the left. In particular, for every c, d ∈ K: d · [a|c] := [a|d ⊙ c], d · [f |c]⟨t1 , . . . , tn ⟩ := [f |d ⊙ c]⟨t1 , . . . , tn ⟩. Obviously (c ⊙ d) · t = c · (d · t) for c, d ∈ K and t ∈ WTΣ .
For a ∈ Σ (0) we define the operation of a-substitution on WTΣ . In particular, for t ∈ WTΣ and t1 , . . . , trka (t) ∈ WTΣ we define t ◦a ⟨t1 , . . . , trka (t) ⟩ by induction on the structure of t: [a|c] ◦a ⟨t1 ⟩ := c · t1 , [b|c] ◦a ⟨⟩ := [b|c] (where b ≠ a), and [f |c]⟨t1 , . . . , tn ⟩ ◦a ⟨s1,1 , . . . , sn,mn ⟩ := [f |c]⟨t1 ◦a ⟨s1,1 , . . . , s1,m1 ⟩, . . . , tn ◦a ⟨sn,1 , . . . , sn,mn ⟩⟩ (where mi = rka (ti )).
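The inductive definitions above can be sketched concretely over the semiring (IN, +, ·, 0, 1); the encoding ((letter, weight), child_1, ..., child_n) is an assumption made for illustration:

```python
def rka(t, a):
    """a-rank: number of a-labelled leaves in the underlying tree."""
    return (t[0][0] == a) + sum(rka(c, a) for c in t[1:])

def scale(d, t):
    """d . t: multiply the root weight by d (semiring action)."""
    (f, c), *children = t
    return ((f, d * c), *children)

def subst(t, a, args):
    """t o_a <args>, where len(args) == rka(t, a)."""
    (f, c), *children = t
    if f == a:                         # [a|c] o_a <t1> := c . t1
        return scale(c, args[0])
    out, i = [], 0
    for child in children:             # split args among the children
        n = rka(child, a)
        out.append(subst(child, a, args[i:i + n]))
        i += n
    return ((f, c), *out)

t = (("f", 2), (("a", 3),), (("a", 5),))
s = subst(t, "a", [(("b", 1),), (("b", 1),)])
# each a-leaf is replaced, with its weight acting on the substituted tree
assert s == (("f", 2), (("b", 3),), (("b", 5),))
```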
Next we equip WTΣ with the structure of a ranked monoid¹. Before we can do that, we need to introduce further notions:
A ranked semigroup is a triple (S, rk, ◦) where (S, rk) is a ranked set and where ◦ = (◦i )i∈IN is a family of composition operations ◦i : S (i) × S^i → S, written ◦i : (f, (g1 , . . . , gi )) ↦ f ◦ ⟨g1 , . . . , gi ⟩, such that rk(f ◦ ⟨g1 , . . . , gi ⟩) = rk(g1 ) + · · · + rk(gi ), and
¹ These structures were already used by Berstel and Reutenauer [1] under the name "magma"; however, this leads to a name clash with another type of algebraic structures.
(f ◦ ⟨g1 , . . . , gn ⟩) ◦ ⟨h1,1 , . . . , h1,m1 , . . . , hn,1 , . . . , hn,mn ⟩
    = f ◦ ⟨g1 ◦ ⟨h1,1 , . . . , h1,m1 ⟩, . . . , gn ◦ ⟨hn,1 , . . . , hn,mn ⟩⟩ .
The latter is called the superassociativity law.
A ranked monoid is a tuple (S, rk, ◦, 1) where (S, rk, ◦) is a ranked semigroup and 1 ∈ S (1) is a left- and right-unit of ◦. That is, x ◦ ⟨1, . . . , 1⟩ = x and 1 ◦ ⟨y⟩ = y for all x, y ∈ S. Examples of ranked monoids are (TΣ , rka , ◦a , a) for a ∈ Σ (0) and (WTΣ , rka , ◦a , [a|1]).
Homomorphisms between ranked semigroups (monoids) are defined in the evident
way—as rank-preserving functions between the carriers that additionally preserve the
composition-operation (and the unit 1). Ranked semigroups and ranked monoids may
be considered as a special kind of many-sorted algebras where the sorts are the natural
numbers. Hence there exist free structures. The free ranked monoid freely generated by
a ranked alphabet Σ will be denoted by (Σ, rk)∗ . With Σ ′ := Σ(ε) (where ε is a letter that is not in Σ) we have that

    (Σ, rk)∗ = (TΣ ′ , rkε , ◦ε , ε) .   (1)
2 Weighted Tree-Languages
Let K be a semiring and Σ = (Σ, rk) be a ranked alphabet. A weighted tree-language is a pair L = (L, |.|) where L is a set and |.| : L → WTΣ : s ↦ |s| is a function. Let L1 = (L1 , |.|1 ), L2 = (L2 , |.|2 ) be weighted tree-languages. A function h : L1 → L2 is called a homomorphism from L1 to L2 if |t|1 = |h(t)|2 holds for all t ∈ L1 . Thus the
weighted tree-languages form a category which will be denoted by WTLΣ . This category
is complete and cocomplete. The forgetful functor U : WTLΣ → Set creates colimits.
Moreover WTLΣ has an initial object (∅, ∅) and a terminal object (WTΣ , 1WTΣ ).
The action of K on WTΣ may be extended to a functor on WTLΣ . In particular, for c ∈ K we define the functor [c · −] : L ↦ c · L, h ↦ h, where c · (L, |.|) := (L, |.|′ ) such that |.|′ : t ↦ c · |t|.
Next we define the topcatenation. Let f ∈ Σ (n) , c ∈ K. Then we define the functor [f |c]⟨−1 , . . . , −n ⟩ : (L1 , . . . , Ln ) ↦ [f |c]⟨L1 , . . . , Ln ⟩, (h1 , . . . , hn ) ↦ h1 × · · · × hn , where for Li = (Li , |.|i ) (i = 1, . . . , n) we set [f |c]⟨L1 , . . . , Ln ⟩ := (L1 × · · · × Ln , |.|) such that |(t1 , . . . , tn )| := [f |c]⟨|t1 |1 , . . . , |tn |n ⟩.
Let a ∈ Σ (0) . We will now lift the a-substitution from weighted trees to weighted tree-languages. We do this in two steps. First we define t ·a L for t ∈ WTΣ , L ∈ WTLΣ . Later we will define L1 ·a L2 for L1 , L2 ∈ WTLΣ . As usual we proceed by induction: [[a|c] ·a −] := [c · −], [[b|c] ·a −] := C{[b|c]} where C{[b|c]} is the constant functor that maps each language to {[b|c]} and each homomorphism to the unit-homomorphism of {[b|c]}, and [[f |c]⟨t1 , . . . , tn ⟩ ·a −] := [f |c]⟨t1 ·a −, . . . , tn ·a −⟩. The connection of this operation with the a-substitution on weighted trees is as follows. Let t ∈ WTΣ with rka (t) = n, and let L = (L, |.|) ∈ WTLΣ . Then t ·a L ≅ (L^n , |.|t,a ) where |(t1 , . . . , tn )|t,a := t ◦a ⟨|t1 |, . . . , |tn |⟩.
The a-product of two weighted tree-languages is now obtained by [− ·a L2 ] : L1 ↦ ∐t∈L1 |t|1 ·a L2 . The definition of this functor on homomorphisms is done pointwise in the evident way. Of course we can give a more transparent construction of this operation: Let L be the set of words defined according to L := {t ◦a ⟨s1 , . . . , srka (t) ⟩ | t ∈ L1 , s1 , . . . , srka (t) ∈ L2 } and define a structure map |.| on L according to |t ◦a ⟨s1 , . . . , srka (t) ⟩| := |t|1 ◦a ⟨|s1 |2 , . . . , |srka (t) |2 ⟩. Then L1 ·a L2 ≅ (L, |.|).
As special case of the a-product we define [−¬a ] : L → L¬a where L¬a := L ·a ∅.
This operation is called a-annihilation.
Proposition 1. [c · −], [f |c]⟨−1 , . . . , −n ⟩, [− ·a L] and [−¬a ] preserve arbitrary colimits and monos. [t ·a −] and [L ·a −] preserve directed colimits and monos. ⊓⊔
Apart from the already defined operations on WTLΣ we also have the coproduct functor [−1 + −2 ]. We note that this functor also preserves directed colimits and monos. Recall that the composition of functors preserving directed colimits will again preserve directed colimits (the same holds for mono-preserving functors).
Our next step is to introduce some iteration operations on WTLΣ . This can be done as
for usual tree-languages, only using the appropriate categorical notions. Let us start with
the a-iteration—a generalization of the Kleene-star for formal languages to weighted
tree-languages. Define Sa : WTL2Σ → WTLΣ : (X, L) → (L ·a X) + {[a|1]}.
Then this functor preserves directed colimits. Since WTLΣ has an initial object (the
empty language), there exists an initial Sa (−, L)-algebra µX.Sa (X, L). Its carrier may
be chosen to be the colimit of the initial sequence
∅ → Sa (∅, L) → S2a (∅, L) → · · ·
It is called the a-iteration of L and is denoted by L∗a . Next we will reveal a very nice
connection between a-iteration and ranked monoids. It is this connection that makes the
a-iteration a generalization of the Kleene-star.
Proposition 2. Given L = (L, |.|) ∈ WTLΣ. For t ∈ L set rka(t) := rka(|t|). Let (L, rka)∗ be the free ranked monoid generated by (L, rka). Let L∗a be its carrier and let |.|∗a be the initial homomorphism from (L, rka)∗ to (WTΣ, rka, ◦a, [a|1]) induced by |.|. Then (L∗a, |.|∗a) ≅ L∗a.
Another important iteration operation is obtained from Ra : WTL2Σ → WTLΣ : (X, L) ↦ L ·a X. We call its initial algebra carrier µX.Ra(X, −) the a-recursion. The a-recursion of a weighted tree-language L will be denoted by Lµa. A close relation of a-recursion to a-iteration is given by the fact that Lµa ≅ (L∗a)¬a for any L ∈ WTLΣ.
Let us introduce a last iteration operation. Set Pa : WTL2Σ → WTLΣ : (X, L) ↦ L ·a (X + {[a|1]}). Then the initial algebra carrier µX.Pa(X, −) : L ↦ L+a will be called the a-semiiteration. The relation of this operation to a-iteration is given by L∗a ≅ L+a + {[a|1]}. An immediate consequence is that Lµa ≅ (L+a)¬a.
The following two properties of weighted tree-languages will be important later,
when we associate formal tree-series to weighted tree-languages. A weighted tree-language L = (L, |.|) is called finitary if for all t ∈ TΣ the set {s ∈ L | ut(|s|) = t} is
finite. It is called a-quasiregular (for some a ∈ Σ (0) ) if it does not contain any element s
with ut(|s|) = a. The full subcategory of WTLΣ of all finitary weighted tree-languages
will be denoted by WTLfΣ .
Proposition 3. Let L1, . . . , Ln ∈ WTLfΣ, c ∈ K, f ∈ Σ(n). Then L1 + L2, c · L1, [f|c]⟨L1, . . . , Ln⟩, L1 ·a L2, (L1)¬a are all finitary again. ⊓⊔
Proposition 4. Let L ∈ WTLfΣ. Then L∗a is finitary if and only if L is a-quasiregular. ⊓⊔
3 Weighted Tree-Automata
Given a ranked alphabet Σ and a semiring K, a finite weak weighted tree-automaton
(wWTA) is a 7-tuple (Q, I, ι, T, λ, S, σ) where Q is a finite set of states, I ⊆ Q is a
set of initial states, ι : I → K describes the initial weights, T is a finite ranked set
of transition-symbols and λ is a function assigning to each transition-symbol τ ∈ T
a transition where for τ ∈ T (n) a transition is a tuple (q, f, q1 , . . . , qn , c) such that
q, q1, . . . , qn ∈ Q, f ∈ Σ(n) and c ∈ K. Moreover, S is a finite set of silent transition-symbols and σ assigns to each silent transition-symbol a silent transition, where a silent transition is a triple (q1, q2, c) for q1, q2 ∈ Q, c ∈ K. Let A be a wWTA. For convenience, for τ ∈ T with λ(τ) = (q, f, q1, . . . , qn, c) we define lab(τ) := f, wt(τ) := c,
dom(τ ) := q, cdomi (τ ) := qi and cdom(τ ) := {q1 , . . . , qn } and for s ∈ S with
σ(s) = (q1 , q2 , c) we define dom(s) := q1 , cdom(s) := q2 and wt(s) := c.
Let A be a wWTA. Runs through A are defined inductively: if τ ∈ T, λ(τ) = (q, a, c), then τ is a run of A with root q along a. If s ∈ S, σ(s) = (q, q′, c) and p is a run of A with root q′ along t, then s · p is a run of A with root q along t. If finally τ ∈ T, λ(τ) = (q, f, q1, . . . , qn, c) and p1, . . . , pn are runs of A with roots q1, . . . , qn along trees t1, . . . , tn, respectively, then τ⟨p1, . . . , pn⟩ is a run of A with root q along f⟨t1, . . . , tn⟩. The root of a run p will be denoted by root(p). A run is called initial if
its root is in I. With runt (A) we denote the set of all initial runs in A along t and with
run(A) we denote the set of all initial runs of A. A (silent) transition symbol is called
reachable if it is involved in some initial run of A. A state of A is called reachable if it
is the domain of some reachable (silent) transition-symbol.
To each run p of A we may associate a weighted tree |p|. This is done by induction on
the structure of p. If p = τ, λ(τ) = (q, a, c), then |τ| := [a|c]. If p = s · p′ with σ(s) = (q1, q2, c), then |p| := c · |p′|, and if p = τ⟨p1, . . . , pn⟩, λ(τ) = (q, f, q1, . . . , qn, c), then |p| := [f|c]⟨|p1|, . . . , |pn|⟩. The weighted tree-language recognized by A is defined as LA := (run(A), |.|A) where |p|A := ι(root(p)) · |p|. A weighted tree-language L
is called weakly recognizable if there is a finite wWTA A with L ≅ LA. Two wWTAs A1, A2 are called equivalent (denoted by A1 ≡ A2) if LA1 ≅ LA2. A wWTA A is
called reduced if each of its states and (silent) transition-symbols is reachable. It is called
normalized if it has precisely one initial state and the initial weight of this state is equal
to 1. It is easy to see that for every wWTA A there is a reduced, normalized wWTA A′
such that A ≡ A′ . Therefore, from now on we will only consider normalized wWTAs.
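The inductive definition of the weighted tree |p| of a run can be mirrored directly in code. This is a hypothetical encoding of my own (not the paper's): λ and σ are dictionaries, runs are nested tuples, and the semiring is (ℤ, +, ·).

```python
# lam[tau] = (q, f, (q1, ..., qn), c) encodes a transition; leaf transitions
# have an empty state tuple. sig[s] = (q1, q2, c) encodes a silent transition.
# A run is ("trans", tau, (p1, ..., pn)) or ("silent", s, p).

def run_tree(p, lam, sig):
    """|p|: the weighted tree associated with the run p."""
    if p[0] == "silent":
        _, s, inner = p
        c = sig[s][2]
        label, w, children = run_tree(inner, lam, sig)
        return (label, c * w, children)          # |s . p| := c . |p|
    _, tau, subruns = p
    _, f, _, c = lam[tau]
    # [f|c]<|p1|, ..., |pn|>; for a leaf transition this is just [a|c]
    return (f, c, tuple(run_tree(q, lam, sig) for q in subruns))
```

Multiplying the result's root weight by ι(root(p)) then gives |p|A.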
Since the description of wWTAs by a tuple of certain sets and mappings is tedious, we sometimes prefer a graphical representation. In such a representation each transition-symbol τ with λ(τ) = (q, f, q1, . . . , qn, c) will be depicted by
[figure omitted: a node labelled f|c with an input-arm from q and output-arms to q1, . . . , qn]
The output-arms are always ordered counterclockwise starting directly after the input-arm. The initial weights are depicted by arrows to the initial states carrying weights.
Silent transition symbols are represented by arrows between states that are equipped
with a weight. In normalized wWTAs we usually omit the arrow with the initial weight.
Let us give a small example of a wWTA:
[figure omitted: an example wWTA with initial state i, states q2, . . . , q6, transitions labelled f|1, g|2, f|3, ∗|1 and ∗|2, and silent transitions with weights 1 and 2]
A weighted tree-automaton (WTA) is a wWTA with an empty set of silent transition-symbols. A weighted tree-language L is called recognizable if there is a WTA A such that L ≅ LA.
Proposition 5. Let L1, . . . , Ln be recognizable weighted tree-languages, c ∈ K, f ∈ Σ(n), a ∈ Σ(0). Then c · L1, L1 + L2, [f|c]⟨L1, . . . , Ln⟩ and L1 ·a L2 are also recognizable. ⊓⊔
Note that recognizable weighted tree-languages are always finitary. In particular we
can see already that the recognizable weighted tree-languages will not be closed with
respect to a-iteration (e.g. {[a|c]}∗a is not recognizable).
It is clear that recognizability implies weak recognizability. However, the converse
does not hold. In the next few paragraphs we will give necessary and sufficient conditions
for a wWTA to recognize a recognizable weighted tree-language.
A word s = s1 · · · sk ∈ S∗ of silent transitions of A is called a silent path if cdom(si) = dom(si+1) (1 ≤ i < k). By convention, the empty word ε also counts as a silent path. We may extend dom and cdom to non-empty silent paths according to dom(s) := dom(s1), cdom(s) := cdom(sk). A silent path s with dom(s) = cdom(s) is called a silent cycle. If any silent transition of a silent cycle is reachable, then the cycle is called reachable. The set of all silent paths of A is denoted by sPA.
To each silent path s ∈ sPA we assign a weight wt(s) ∈ K according to wt(ε) := 1 and wt(s · s′) := wt(s) ⊙ wt(s′) for a silent path s and a silent transition-symbol s′ ∈ S.
Silent cycles play a crucial role in the characterization of the finitary weakly recognizable weighted tree-languages.
Proposition 6. Let A be a wWTA. Then LA is finitary if and only if A does not contain a reachable silent cycle. ⊓⊔
Proposition 7. Let A be a wWTA without reachable silent cycles. Then there is a WTA A′ such that LA ≅ LA′.
Proof. Since the normalization and reduction of wWTAs do not introduce new silent cycles, we can assume that A = (Q, {i}, ι, T, λ, S, σ) is normalized and reduced. Let A have no silent cycles. Then we claim that sPA is finite: assume it is not; then it contains words of arbitrary length (because S is finite). Hence it would also contain a word of length > |Q|, but such a word necessarily contains a cycle, a contradiction.
Let us construct the WTA A′ now. Its state set is Q and the set of transitions T′ of A′ is defined as follows: T′ := {(s, t) | s ∈ sPA, t ∈ T, s = ε or cdom(s) = dom(t)} and λ′(s, t) := (q′, f, q1, . . . , qn, c′) where λ(t) = (q, f, q1, . . . , qn, c), c′ := wt(s) ⊙ c, and where q′ = q if s = ε and q′ = dom(s) otherwise. Altogether A′ = (Q, {i}, ι, T′, λ′, ∅, ∅). We skip the proof that A′ is indeed equivalent to A. ⊓⊔
As an immediate consequence we get that a weakly recognizable weighted tree-language is recognizable if and only if it is finitary. Another important question is how to decide whether a given wWTA recognizes an a-quasiregular weighted tree-language. A moment's thought reveals that a wWTA A fails to be a-quasiregular if and only if either there is some t ∈ T with dom(t) ∈ I, lab(t) = a, or there exists a silent path s starting in I and ending in a state that is the domain of a transition t ∈ T with lab(t) = a.
Proposition 8. Let L1, . . . , Ln be weakly recognizable weighted tree-languages, c ∈ K, f ∈ Σ(n), a ∈ Σ(0). Then c · L1, L1 + L2, [f|c]⟨L1, . . . , Ln⟩, L1 ·a L2, (L1)∗a, (L1)+a and (L1)µa are also weakly recognizable.
Proof. Each operation is defined as a construction on wWTAs. Then we argue that the assignment A ↦ LA preserves the operations up to isomorphism.
[figures omitted: wWTA constructions realizing c · A, A1 + A2, [f|c]⟨A1, . . . , Ak⟩, the a-product A1 ·a A2, the a-semiiteration and the a-recursion]
The a-iteration of a wWTA A can now be defined according to A∗a := A+a + A′ where A′ is a wWTA that recognizes {[a|1]}. ⊓⊔
4 A Kleene-Type Result
Let X be a set of variable symbols disjoint from Σ and let K be a semiring. The set
Rat(Σ, K, X) of rational expressions over Σ, X and K is the set E of words given by
the following grammar:
E ::= a | x | c · E | E + E | f⟨E, . . . , E⟩ | µx.(E)    (a ∈ Σ(0), x ∈ X, c ∈ K, f ∈ Σ)
where in f⟨E, . . . , E⟩ the number of E's is equal to the rank of f.
The semantics of rational expressions is given in terms of weighted tree-languages
over the ranked alphabet Σ(X). It is defined inductively: [[a]] := {[a|1]}, [[x]] := {[x|1]}, [[f⟨e1, . . . , en⟩]] := [f|1]⟨[[e1]], . . . , [[en]]⟩, [[c · e]] := c · [[e]], [[e1 + e2]] := [[e1]] + [[e2]] and [[µx.(e)]] := [[e]]µx.
We have already seen that the semantics of each rational expression is weakly recognizable. Showing the opposite direction, namely that each weakly recognizable weighted tree-language is isomorphic to the semantics of a rational expression, will be our goal in the next few paragraphs. As a first step in this direction we introduce the accessibility graph of wWTAs.
Let A = (Q, {i}, ι, T, λ, S, σ) be a normalized wWTA. Let E1 := ⋃_{j∈ℕ\{0}} T(j) × {1, 2, . . . , j}, let E := E1 ∪̇ S, and define s : E → Q according to s(e) = dom(t) if e = (t, j), t ∈ T, and s(e) = dom(e) if e ∈ S. Moreover define d : E → Q according to d(e) := cdomj(t) if e = (t, j), t ∈ T, and d(e) := cdom(e) if e ∈ S. Then the multigraph ΓA = (Q, E, s, d) is called the accessibility-graph of A.²
A path of length n in ΓA = (Q, E, s, d) is a word e1e2 · · · en where e1, . . . , en ∈ E and such that d(ej) = s(ej+1) (j = 1, . . . , n−1). Such a path is called cyclic if d(en) = s(e1). It is called a minimal cycle if for all 1 ≤ j, k ≤ n we have s(ej) = s(ek) ⇒ j = k. The number of minimal cycles of ΓA is called the cyclicity of A. It is denoted by cyc(A). A state q of A is called a source if it is a source of ΓA, that is, if there does not exist any arc e of ΓA with d(e) = q.
Let A = (Q, {i}, ι, T, λ, S, σ) be a normalized wWTA. Let τ ∈ T with domain i. Assume λ(τ) = (i, f, q1, . . . , qn, c). Then for 1 ≤ k ≤ n the derivation of A by (τ, k) is the reduction of the automaton (Q, {qk}, ι′, T, λ, S, σ) where ι′ maps qk to 1. It will be denoted by ∂A/∂(τ, k). Moreover we define the complete derivation of A by τ as the tuple ∂A/∂τ := ⟨∂A/∂(τ, 1), . . . , ∂A/∂(τ, n)⟩.
Analogously, for s ∈ S with σ(s) = (i, q, c) we define the derivation of A by s as the reduction of the automaton (Q, {q}, ι′, T, λ, S, σ) where ι′ maps q to 1. It will be denoted by ∂A/∂s.
Proposition 9. With the notions from above let Ti ⊆ T, Si ⊆ S be the sets of all transition-symbols and silent transition-symbols with domain i, respectively. Then
A ≡ Σ_{τ∈Ti} [lab(τ)|wt(τ)]∂A/∂τ + Σ_{s∈Si} wt(s) · ∂A/∂s.   ⊓⊔
² The function names s and d are abbreviations for "source" and "destination" of arcs, respectively.
Proposition 10. Let A = (Q, {i}, ι, T, λ, S, σ) be a reduced and normalized wWTA whose initial state i is not a source. Let x be a variable symbol that does not occur in A. Define Q′ := Q + {q′} and T′ := T + {τ′} and let ϕ : Q → Q′ be such that ϕ(q) = q if q ≠ i and ϕ(i) = q′. For τ in T with λ(τ) = (q, f, q1, . . . , qn, c) define λ′(τ) := (q, f, ϕ(q1), . . . , ϕ(qn), c) and for s in S with σ(s) = (q1, q2, c)
define σ ′ (s) := (q1 , ϕ(q2 ), c). Finally define λ′ (τ ′ ) := (q ′ , x, 1). Then the wWTA
A′ = (Q′ , {i}, ι, T ′ , λ′ , S, σ ′ ) is still normalized and reduced with i being a source.
Moreover (A′ )µx ≡ A.
[figure omitted: the automata A′, with new state q′ carrying the transition [x|1], and (A′)µx compared with A]
Theorem 11. Every weakly recognizable weighted tree-language is definable by a rational expression.
Proof. We prove inductively that each wWTA recognizes a rationally definable weighted
tree-language.
To each normalized automaton A = (Q, {i}, ι, T, λ, S, σ) we associate the pair of
integers (cyc(A), |Q|). On these integer-pairs we consider the lexicographical order:
(x, y) ≤ (u, v) ⇐⇒ x < u ∨ (x = u ∧ y ≤ v) and take this as an induction-index.
Since any wWTA has an initial state, the smallest possible index is (0, 1). Such
an automaton has Q = {i} and S = ∅. Moreover if T = {t1 , . . . , tn } then there
are a1 , . . . , an ∈ Σ (0) ∪ X and c1 , . . . , cn ∈ K such that λ(tk ) = (i, ak , ck )
(k = 1, . . . , n). The weighted tree-language that is recognized by such an automaton is {[a1|c1], . . . , [an|cn]}; it is definable by the rational expression c1 · a1 + · · · + cn · an.
Suppose now the claim holds for all wWTAs with index less than (n, m). Let A =
(Q, {i}, ι, T, λ, S, σ) be a normalized wWTA with cyc(A) = n and |Q| = m.
If i is a source, then we use Proposition 9 and obtain
A ≡ Σ_{τ∈Ti} [lab(τ)|wt(τ)]∂A/∂τ + Σ_{s∈Si} wt(s) · ∂A/∂s.
For τ ∈ Ti of arity k let Aτ,k := ∂A/∂(τ, k) and for s ∈ Si let As := ∂A/∂s.
Since the number of states of Aτ,k is strictly smaller than that of A and the cyclicity of Aτ,k is not greater than that of A, we conclude that the index of Aτ,k is strictly smaller
than that of A. Hence the weighted tree-language that is recognized by Aτ,k is rationally
definable. The same holds for the derivations by silent transitions.
For j ∈ ℕ, τ ∈ Ti(j), 1 ≤ k ≤ j let eτ,k be a rational expression defining a weighted tree-language isomorphic to the one recognized by Aτ,k. Moreover, for s ∈ Si let es
be a rational expression defining a weighted tree-language that is isomorphic to the one
recognized by As . Then
Σ_{j∈ℕ} Σ_{τ∈Ti(j)} [lab(τ)|wt(τ)]⟨eτ,1, . . . , eτ,j⟩ + Σ_{s∈Si} wt(s) · es
is a rational expression defining a weighted tree-language isomorphic to LA .
If i is not a source then we use Proposition 10 and obtain a wWTA A′ such that (A′)µx ≡ A. Clearly, A′ has a smaller cyclicity and hence also a smaller index than A.
By induction hypothesis there is a rational expression e such that [[e]] ≅ LA′. Therefore µx.(e) is a rational expression for LA. ⊓⊔
If we want to characterize the recognizable weighted tree-languages in a similar way, then we must address the problem that only the a-recursion of a-quasiregular recognizable weighted tree-languages is guaranteed to be recognizable again. Therefore we restrict the set of rational expressions: the set pRat(Σ, X, K) of proper rational expressions shall consist of all words of the language E defined by the following grammar:
E ::= a | x | c · E | E + E | f⟨E, . . . , E⟩ | µx.(Ex)
Ex ::= a | y | c · Ex | Ex + Ex | f⟨E, . . . , E⟩ | µx.(Ex)
where a ∈ Σ(0), x, y ∈ X, x ≠ y, c ∈ K and f ∈ Σ. The semantics of proper rational
expressions is the same as for rational expressions. The essential difference between
Rat and pRat is that an expression µx.(e) is in pRat only if [[e]] is x-quasiregular.
Therefore it is clear that the semantics of proper rational expressions are always going
to be recognizable.
Theorem 12. For every recognizable weighted tree-language L there is a proper rational expression e such that L ≅ [[e]].
Proof. L is recognized by a wWTA without silent cycles. The decomposition steps to
obtain a rational expression for L never introduce new silent cycles (in fact they never
introduce any cycles). Therefore the construction from the proof of Theorem 11 produces
a proper rational expression. ⊓⊔
5 Formal Tree-Series
Given a ranked alphabet (Σ, rk) and a semiring (K, ⊕, ⊙, 0, 1) let TΣ be the set of all
trees over Σ. A function S : TΣ → K is called a formal tree-series. We will adopt the
usual notation and write (S, t) for the image of t under S. With KΣ we will denote
the set of all formal tree-series over Σ.
Let WTΣ be the set of all weighted trees over Σ with weights from K. To each
weighted tree t we associate its weight wt(t) ∈ K and its underlying tree ut(t) ∈ TΣ .
The function ut we already defined above. The function wt : WTΣ → K is defined inductively: wt([a|c]) := c and wt([f|c]⟨t1, . . . , tn⟩) := c ⊙ ∏_{i=1}^{n} wt(ti).
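In code, for the semiring (ℤ, +, ·) and weighted trees encoded as (label, weight, children) tuples (an encoding of my own, not the paper's), wt is a short recursion:

```python
def wt(t):
    """wt([a|c]) = c and wt([f|c]<t1, ..., tn>) = c * wt(t1) * ... * wt(tn)."""
    _, c, children = t
    for child in children:
        c = c * wt(child)
    return c
```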
An easy property of wt is that wt(c · t) = c ⊙ wt(t) for all t ∈ WTΣ. Another very crucial property only holds if K is commutative, namely for t ∈ WTΣ with rka(t) = n and for s1, . . . , sn ∈ WTΣ:
wt(t ◦a ⟨s1, . . . , sn⟩) = wt(t) ⊙ ∏_{i=1}^{n} wt(si).
From now on we assume that K is commutative.
Given now a finitary L = (L, |.|) ∈ WTLΣ we associate a formal tree-series SL with L according to:
(SL, t) := Σ_{s∈L, ut(|s|)=t} wt(|s|)    (t ∈ TΣ).
Since L is finitary, SL is well-defined.
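For a finite L the series SL can be computed directly. The sketch below uses a hypothetical encoding (weighted trees as (label, weight, children) tuples, semiring (ℤ, +, ·)): ut forgets the weights, and weights of trees with the same underlying tree are summed.

```python
from collections import defaultdict

def ut(t):
    """Underlying tree: forget all weights."""
    label, _, children = t
    return (label, tuple(ut(c) for c in children))

def wt(t):
    """Weight of a weighted tree in the semiring (Z, +, *)."""
    c = t[1]
    for child in t[2]:
        c = c * wt(child)
    return c

def series(L):
    """(S_L, t) := sum of wt(s) over s in L with ut(s) = t."""
    S = defaultdict(int)
    for s in L:
        S[ut(s)] += wt(s)
    return dict(S)
```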
We call S ∈ KΣ a-quasiregular if (S, a) = 0 and we call S recognizable if there
is a recognizable L ∈ WTLΣ with SL = S. It is easy to see that if a finitary weighted
tree-language L is a-quasiregular, then SL is a-quasiregular.
The operations of sum and product with scalars can be introduced for formal tree-series pointwise. That is, (S1 + S2, t) := (S1, t) ⊕ (S2, t) and (c · S1, t) := c ⊙ (S1, t) for any S1, S2 ∈ KΣ, c ∈ K. It is not surprising that for any L1, L2 ∈ WTLfΣ we have SL1+L2 = SL1 + SL2 and Sc·L1 = c · SL1.
Next we define the a-product of formal tree-series S1, S2 for a ∈ Σ(0) according to
(S1 ·a S2, t) := Σ_{s, s1, . . . , srka(s) ∈ TΣ; t = s ◦a ⟨s1, . . . , srka(s)⟩} (S1, s) ⊙ ∏_{i=1}^{rka(s)} (S2, si).
Whenever K is commutative, then for L1, L2 ∈ WTLfΣ we have SL1·aL2 = SL1 ·a SL2.
Let f ∈ Σ(n), c ∈ K, S1, . . . , Sn ∈ KΣ. Then we define the topcatenation [f|c]⟨S1, . . . , Sn⟩ according to
([f|c]⟨S1, . . . , Sn⟩, t) := c ⊙ ∏_{i=1}^{n} (Si, ti) if t = f⟨t1, . . . , tn⟩, and 0 otherwise.
Again, some thought reveals that we have S[f|c]⟨L1,...,Ln⟩ = [f|c]⟨SL1, . . . , SLn⟩ for all L1, . . . , Ln ∈ WTLfΣ. Note that here the semiring does not need to be commutative.
The most delicate operation to define for formal tree-series is the a-iteration. Luckily we showed its close relation to free ranked monoids. We use this relationship to define the a-iteration on formal tree-series. Let S ∈ KΣ, a ∈ Σ(0) such that (S, a) = 0. Let (TΣ, rka)∗ be the free ranked monoid generated by (TΣ, rka) (cf. (1) in Section 1). Let TΣ∗ be its carrier and ε be its neutral element. Let ϕ : (TΣ, rka)∗ → (TΣ, rka, ◦a, a) be the unique homomorphism induced by the identity map of TΣ. On TΣ∗ we define a
weight-function wt∗S inductively:
wt∗S(s) := 1 if s = ε,
wt∗S(s) := (S, s) if s ∈ TΣ,
wt∗S(s) := (S, t) ⊙ ∏_{i=1}^{rka(t)} wt∗S(ti) if s = t⟨t1, . . . , trka(t)⟩ with t ∈ TΣ and t1, . . . , trka(t) ∈ TΣ∗.
Then we define S∗a ∈ KΣ according to
(S∗a, t) := Σ_{s∈TΣ∗, ϕ(s)=t} wt∗S(s).
Assume K is commutative and L ∈ WTLfΣ is a-quasiregular. Then SL∗a = (SL )∗a .
Summing up we obtain:
Proposition 13. If K is commutative, then the assignment L ↦ SL preserves sum, product with scalars, a-product, topcatenation and a-iteration. ⊓⊔
For S ∈ KΣ we define the a-annihilation by S¬a := S ·a 0 where 0 denotes the
series that maps each tree to 0. Clearly we have SL¬a = (SL )¬a for any L ∈ WTLfΣ .
The a-recursion of formal tree-series can also be introduced easily now. Let S ∈ KΣ be a-quasiregular. Then we define Sµa := (S∗a)¬a. Using the characterization of the a-recursion through the a-iteration of weighted tree-languages, it is evident that SLµa = (SL)µa.
It is clear that for any e ∈ pRat(Σ, X, K) we get that S[[e]] is a recognizable element
of KΣ(X). From Theorem 12 and from Proposition 13 we obtain immediately the
following result:
Theorem 14. Let K be commutative and let S ∈ KΣ(X) be recognizable. Then there is a proper rational expression e with S = S[[e]]. ⊓⊔
Using that the a-product preserves recognizability and that the a-recursion may be simulated by a-iteration and a-product, we can also formulate a more conventional Kleene-type result:
Corollary 15. Let K be commutative. Then the set of all recognizable formal tree-series
over Σ(X) is the smallest subset of KΣ(X) that contains all polynomials and that
is closed with respect to sum, product with scalars, x-product (x ∈ X) and x-iteration (x ∈ X). ⊓⊔
References
1. Berstel, J., Reutenauer, C.: Recognizable formal power series on trees. Theoretical Computer
Science 18 (1982) 115–148
2. Bloom, S.L., Ésik, Z.: An extension theorem with an application to formal tree series. BRICS
Report Series RS-02-19, University of Aarhus (2002)
3. Bozapalidis, S.: Equational elements in additive algebras. Theory Comput. Systems 32 (1999)
1–33
4. Culik, K., Kari, J.: Image compression using weighted finite automata. Computer and
Graphics 17 (1993) 305–313
5. Droste, M., Vogler, H.: A Kleene theorem for weighted tree automata. Technical Report TUD-FI02-04, Technische Universität Dresden (2002)
6. Kleene, S.C.: Representation of events in nerve nets and finite automata. In Shannon, C.E.,
McCarthy, J., eds.: Automata Studies. Princeton University Press, Princeton, N.J. (1956) 3–42
7. Kuich, W.: Formal power series over trees. In: Proc. of the 3rd International Conference
Developments in Language Theory, Aristotle University of Thessaloniki (1997) 60–101
8. Mohri, M.: Finite-state transducers in language and speech processing. Computational
Linguistics 23 (1997) 269–311
9. Pech, C.: Kleene-type results for weighted tree-automata. Dissertation, TU-Dresden (2003)
http://www.math.tu-dresden.de/˜pech/diss.ps.
10. Schützenberger, M.P.: On the definition of a family of automata. Information and Control 4
(1961) 245–270
11. Thatcher, J.W., Wright, J.B.: Generalized finite automata theory with application to a decision
problem of second-order logic. Math. Systems Theory 2 (1968) 57–81
Weak Cardinality Theorems for First-Order
Logic
(Extended Abstract)
Till Tantau
Fakultät IV – Elektrotechnik und Informatik
Technische Universität Berlin
Franklinstraße 28/29, D-10587 Berlin, Germany
tantau@cs.tu-berlin.de
Abstract. Kummer’s cardinality theorem states that a language A is
recursive if a Turing machine can exclude for any n words w1 , . . . , wn
one of the n + 1 possibilities for the cardinality of {w1 , . . . , wn } ∩ A. It
is known that this theorem does not hold for polynomial-time computations, but there is evidence that it holds for finite automata: at least
weak cardinality theorems hold for them. This paper shows that some
of the weak recursion-theoretic and automata-theoretic cardinality theorems are instantiations of purely logical theorems. Apart from unifying
previous results in a single framework, the logical approach allows us to
prove new theorems for other computational models. For example, weak
cardinality theorems hold for Presburger arithmetic.
1 Introduction
Given a language A and n input words, we often wish to know which of these
words are in the language. For languages like the satisfiability problem this
problem is presumably difficult to solve, for languages like the halting problem
it is impossible to solve. To tackle such problems, Gasarch [7] has proposed to
study a simpler problem instead: we just count how many of the input words
are elements of A. To make things even easier, we do not require this number to
be computed exactly, but only approximately. Indeed, let us just try to exclude
one possibility for the number of input words in A.
In recursion theory, Kummer’s cardinality theorem [16] states that, using a
Turing machine, excluding one possibility for the number of input words in A
is just as hard as deciding A. It is not known whether this statement carries
over to automata theory, that is, it is not known whether a language A must be
regular if a finite automaton can always exclude one possibility for the number
of input words in A. However, several weak forms of this theorem are known to
hold for automata theory. For example, the finite automata cardinality theorem
is known [25] to hold for n = 2.
These parallels between recursion and automata theory are surprising insofar
as computational models ‘in between’ exhibit a different behaviour: there are
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 400–411, 2003.
© Springer-Verlag Berlin Heidelberg 2003
languages A outside the class P of problems decidable in polynomial time for
which we can always exclude, in polynomial time, for any n ≥ 2 words one
possibility for their number in A.
The present paper explains (at least partly) why the parallels between recursion and automata theory exist and why they are not shared by the models in
between. Basically, the weak cardinality theorems for Turing machines and finite
automata are just different instantiations of the same logical theorems. These
logical theorems cannot be instantiated for polynomial time, because polynomial
time lacks a logical characterisation in terms of elementary definitions.
Using logic for the formulation and proof of the weak cardinality theorems
has another advantage, apart from unifying previous results. Theorems formulated for arbitrary logical structures can be instantiated in novel ways: the weak
cardinality theorems all hold for Presburger arithmetic and the nonspeedup theorem also holds for ordinal number arithmetic.
In the logical setting ‘computational models’ are replaced by ‘logical structures’ and ‘computations’ are replaced by ‘elementary definitions’. For example,
the cardinality theorem for n = 2 now becomes the following statement: Let S be
a logical structure with universe U satisfying certain requirements and let A ⊆ U .
If there exists a function f : U × U → {0, 1, 2} with f (x, y) = |{x, y} ∩ A| for all
x, y ∈ U that is elementarily definable in S, then A is elementarily definable in S.
Cardinality computations have applications in the study of separability. As
argued in [26], ‘cardinality theorems are separability results in disguise’. In recursion theory and in automata theory one can rephrase the weak cardinality
theorems as separability results. Such a rephrasing is also possible for the logical versions and we can formulate purely logical separability theorems that are
interesting in their own right. An example of such a theorem is the following
statement: Let S be a logical structure with universe U satisfying certain requirements and let A ⊆ U . If there exist elementarily definable supersets of
A × A, A × Ā, and Ā × Ā whose intersection is empty, then A is elementarily
definable in S.
This paper is organised as follows. In section 2 the history of the cardinality
theorem is retraced and the weak cardinality theorems are formulated rigorously.
Section 3 prepares the logical formulation of the weak cardinality theorems. It is
shown how the class of regular languages and the class of recursively enumerable
languages can be characterised in terms of appropriate elementary definitions.
In section 4 the weak cardinality theorems for first-order logic are formulated.
In section 5 applications of the theorems to separability are discussed.
This extended abstract does not include any proofs due to lack of space.
They can be found in the full technical report version of the paper [27].
2 History of the Cardinality Theorem
2.1 The Cardinality Theorem for Recursion Theory
For a set A, the cardinality function #An takes n words as input and yields the
number of words in A as output, that is, #An (w1 , . . . , wn ) = |{w1 , . . . , wn } ∩ A|.
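As a sanity check, #An is one line of Python. Note the set on the right-hand side: duplicates among the input words count only once.

```python
def card(A, words):
    """#A_n(w1, ..., wn) = |{w1, ..., wn} intersected with A|."""
    return len(set(words) & A)
```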
The cardinality function and the idea of ‘counting input words’, which is due to
Gasarch [7] in its general form, play an important role in a variety of proofs both
in complexity theory [9,12,14,18,23] and recursion theory [4,16,17]. For example,
the core idea of the Immerman–Szelepcsényi theorem is to count the number of
reachable vertices in a graph in order to decide a reachability problem.
One way of quantifying the complexity of #An is to consider its enumeration
complexity, which is the smallest number m such that #An is m-enumerable.
Enumerability, which was first defined by Cai and Hemaspaandra [6] in the
context of polynomial-time computations and which was later transferred to
recursive computations, can be regarded as ‘generalised approximability’. It is
defined as follows: a function f, taking n-tuples of words as input, is m-Turing-enumerable if there exists a Turing machine that on input w1, . . . , wn starts
a possibly infinite computation during which it prints words onto an output
tape. At most m different words may be printed and one of them must be
f (w1 , . . . , wn ).
Intuitively, the larger m, the easier it should be to m-Turing-enumerate #An .
This intuition is wrong. Kummer’s cardinality theorem, see below, states that
even n-Turing-enumerating #An is just as hard as deciding A. In other words,
excluding just one possibility for #An (w1 , . . . , wn ) is just as hard as deciding A.
Intriguingly, the intuition is correct for polynomial-time computations since the
work of Gasarch, Hoene, and Nickelsen [7,11,20] shows that a polynomial-time
version of the cardinality theorem does not hold for n ≥ 2.
Theorem 2.1 (Cardinality theorem [16]). If #An is n-Turing-enumerable,
then A is recursive.
The cardinality theorem has applications for instance in the study of semirecursive sets [13], which play a key role in the solution of Post’s problem [22].
The proof of the cardinality theorem is difficult. Several less general results had
already been proved when Kummer wrote his paper ‘A proof of Beigel’s cardinality conjecture’ [16]. The title of Kummer’s paper refers to the fact that
Richard Beigel was the first to conjecture the cardinality theorem as a generalisation of his so-called ‘nonspeedup theorem’ [3]. In the following formulation
of the nonspeedup theorem χnA denotes the n-fold characteristic function of A,
which maps any n words w1 , . . . , wn to a bitstring whose ith bit is 1 iff wi ∈ A.
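The n-fold characteristic function is equally direct to state in code (Python, my own encoding of the bitstring as a str):

```python
def chi(A, words):
    """chi_A^n(w1, ..., wn): bitstring whose i-th bit is 1 iff w_i is in A."""
    return "".join("1" if w in A else "0" for w in words)
```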
The nonspeedup theorem is a simple consequence of the cardinality theorem.
Theorem 2.2 (Nonspeedup theorem [3]). If χnA is n-Turing-enumerable,
then A is recursive.
Owings [21] succeeded in proving the cardinality theorem for n = 2. For
larger n he could only show that if #An is n-Turing-enumerable, then A is recursive in the halting problem. Harizanov et al. [8] have formulated a restricted
cardinality theorem, whose proof is somewhat simpler than the proof of the full
cardinality theorem.
Theorem 2.3 (Restricted cardinality theorem [8]). If #An is n-Turing-enumerable via a Turing machine that never enumerates both 0 and n simultaneously, then A is recursive.
Weak Cardinality Theorems for First-Order Logic

2.2 Weak Cardinality Theorems for Automata Theory
If we restrict the computational power of Turing machines, the cardinality theorem no longer holds [7,11,20]: there are languages A ∉ P for which we can always exclude one possibility for #An(w1, ..., wn) in polynomial time for n ≥ 2. However, if we restrict the computational power even further, namely if we consider finite automata, there is strong evidence that the cardinality theorem holds once more, as expressed in the following conjecture:
Conjecture 2.4 ([25]). If #An is n-fa-enumerable, then A is regular.
The conjecture refers to the notion of m-enumerability by finite automata. This notion was introduced in [24] and is defined as follows: a function f is m-fa-enumerable if there exists a finite automaton for which, for every input tuple (w1, ..., wn), the output attached to the last state reached is a set of size at most m that contains f(w1, ..., wn). The different components of the tuple are put onto n different tapes, shorter words being padded with blanks, and the automaton scans the tapes synchronously, which means that all heads advance exactly one symbol in each step. The same method of feeding multiple words to a finite automaton has been used in [1,2,15].
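The synchronous multi-tape mechanism can be made concrete with a small sketch. The representation below (blank symbol, transition table, output sets) is invented for illustration; the toy automaton on two tapes 1-fa-enumerates the regular predicate "are the two input words equal?", so every output set has size 1.

```python
BLANK = "#"  # padding symbol for the shorter words (invented convention)

def run_synchronous(delta, start, outputs, words):
    """Advance all heads one symbol per step over padded tapes;
    return the output set attached to the last state reached."""
    length = max(len(w) for w in words)
    padded = [w + BLANK * (length - len(w)) for w in words]
    state = start
    for symbols in zip(*padded):        # one tuple of symbols per step
        state = delta[state, symbols]
    return outputs[state]

# Toy two-tape automaton over the alphabet {a, b}: tracks equality.
delta = {}
for a in "ab" + BLANK:
    for b in "ab" + BLANK:
        delta["eq", (a, b)] = "eq" if a == b else "ne"
        delta["ne", (a, b)] = "ne"
outputs = {"eq": {True}, "ne": {False}}

print(run_synchronous(delta, "eq", outputs, ["ab", "ab"]))  # {True}
print(run_synchronous(delta, "eq", outputs, ["a", "ab"]))   # {False}
```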
In a line of research [1,2,15,24,25,26], the following three theorems were established. They support the above conjecture by showing that all of the historically
earlier, weak forms of the recursion-theoretic cardinality theorem hold for finite
automata.
Theorem 2.5 ([24]). If χ^n_A is n-fa-enumerable, then A is regular.
Theorem 2.6 ([25]). If #A2 is 2-fa-enumerable, then A is regular.
Theorem 2.7 ([25,2]). If #An is n-fa-enumerable via a finite automaton that
never enumerates both 0 and n simultaneously, then A is regular.
3 Computational Models as Logical Structures
The aim of formulating purely logical versions of the weak cardinality theorems
is to abstract from concrete computational models. The present section explains
which logical abstraction is used.
3.1 Presburger Arithmetic
Let us start with an easy example: Presburger arithmetic. This notion is easily
transferred to a logical setting since it is defined in terms of first-order logic
in the first place. A set A of natural numbers is called definable in Presburger
arithmetic if there exists a first-order formula φ(x) over the signature {+}, where + is a binary function symbol, with the following property: A contains exactly those numbers a that make φ(x)
true if we interpret x as a and the symbol + as the normal addition of natural
numbers. For example, the set of even natural numbers is definable in Presburger
arithmetic using the formula φ(x) = ∃y (y + y = x).
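Since y + y = x forces y ≤ x, the existential quantifier in this particular formula can be checked by a bounded search, which gives a small executable rendering of "A contains exactly those a that make φ(x) true" (the function names are invented for this sketch):

```python
# Membership in the set defined by φ(x) = ∃y (y + y = x) over (N, +).
# The bound x + 1 suffices because any witness y satisfies y + y = x, so y <= x.

def phi(x):
    return any(y + y == x for y in range(x + 1))

evens = [n for n in range(10) if phi(n)]
print(evens)  # [0, 2, 4, 6, 8]
```

Bounded evaluation is of course special to this formula; general Presburger formulas are decided by other means (e.g. automata-based procedures).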
In the abstract logical setting used in the next sections the ‘computational
model Presburger arithmetic’ is represented by the logical structure (N, +). The
class of sets that are ‘computable in Presburger arithmetic’ is given by the class
of sets that are elementarily definable in (N, +). Recall that a relation R is
called elementarily definable in a logical structure S if there exists a first-order
formula φ(x1 , . . . , xn ) such that (a1 , . . . , an ) ∈ R iff φ(x1 , . . . , xn ) holds in S if
we interpret each xi as ai .
3.2 Finite Automata
In order to make finite automata and regular languages accessible to a logical setting, for a given alphabet Σ we need to find a logical structure SREG,Σ
with the following property: a language A ⊆ Σ ∗ is regular iff it is elementarily
definable in SREG,Σ .
It is known that such a structure SREG,Σ exists: Büchi has proposed one [5],
though a small correction is necessary as pointed out by McNaughton [19].
However, the elements of Büchi’s structure are natural numbers, not words,
and thus a reencoding is necessary. A more directly applicable structure is discussed in [26], where it is shown that for non-unary alphabets the structure
(Σ ∗ , Iσ1 , . . . , Iσ|Σ| ) has the desired properties. The relations Iσi , one for each
symbol σi ∈ Σ, are binary relations that hold for a pair (u, v) of words if the
|v|-th letter of u is σi . For unary alphabets, an appropriate structure SREG,Σ
can also be constructed.
3.3 Polynomially Time-Bounded Turing Machines
There is no logical structure S such that the class of languages that are elementarily definable in S is exactly the class P of languages decidable in polynomial
time. To see this, consider the relation R = {(M, t) | M halts on input M after t
steps}. This relation is in P, but the language defined by the first-order formula
φ(M ) = ∃t R(M, t) is exactly the halting problem. Thus in any logical structure
in which we can elementarily define R we can also elementarily define the halting
problem.
3.4 Resource-Unbounded Turing Machines
On the one hand, the class of recursive languages cannot be defined elementarily: the argument for polynomial-time machines also applies here. On the other
hand, the arithmetical hierarchy contains exactly the sets that are elementarily
definable in (N, +, ·).
The most interesting case, the class of recursively enumerable languages, is
more subtle. Since the class is not closed under complement, it cannot be characterised by elementary definitions. However, it can be characterised by positive
elementary definitions, which are elementary definitions that do not contain
negations: For every alphabet Σ there is a structure SRE,Σ such that a language A ⊆ Σ ∗ is recursively enumerable iff it is positively elementarily definable
in SRE,Σ . An example of such a structure SRE,Σ is the following: its universe
is Σ∗ and it contains all recursively enumerable relations over Σ∗.
4 Logical Versions of the Weak Cardinality Theorems
In this section the weak cardinality theorems for first-order logic are presented.
The theorems are first formulated for elementary definitions, which allows us to
apply them to all computational models that can be characterised in terms of elementary definitions. As argued in the previous section, this includes Presburger
arithmetic, finite automata, and the arithmetical hierarchy, but misses the recursively enumerable languages. This is remedied later in this section, where
positive elementary definitions are discussed. It is shown that at least the nonspeedup theorem can be formulated in a ‘positive’ way. At the end of the section
higher-order logics are briefly touched upon.
We are still missing one crucial definition for the formulation of the weak
cardinality theorems: what does it mean for a function to be 'm-enumerable in a logical structure'?
Definition 4.1. Let S be a logical structure with universe U and m a positive
integer. A function f : U → U is (positively) elementarily m-enumerable in S if
there exists a relation R ⊆ U × U with the following properties:
1. R is (positively) elementarily definable in S,
2. the graph of f is contained in R,
3. R is m-bounded, that is, for every x ∈ U there exist at most m different y
with (x, y) ∈ R.
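For finite data the three conditions of Definition 4.1 can be checked directly; the sketch below (all sets and names invented, and with definability in S replaced by plain set membership, since only conditions 2 and 3 are finitary) tests whether a relation R m-enumerates a function f:

```python
def is_m_enumeration(R, f, universe, m):
    """Check conditions 2 and 3 of Definition 4.1 on finite data:
    the graph of f lies inside R, and R relates each x to <= m values."""
    graph_contained = all((x, f(x)) in R for x in universe)
    m_bounded = all(
        sum(1 for (x2, _) in R if x2 == x) <= m for x in universe
    )
    return graph_contained and m_bounded

U = [0, 1, 2]
f = lambda x: x % 2
R = {(0, 0), (0, 1), (1, 1), (2, 0), (2, 2)}

print(is_m_enumeration(R, f, U, 2))  # True: graph of f inside R, <= 2 values per x
print(is_m_enumeration(R, f, U, 1))  # False: 0 and 2 each have two candidates
```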
The definition is easily adapted to functions f that take more than one
input or yield more than one output. This definition does, indeed, reflect the
notion of enumerability: A function with finite range is m-fa-enumerable iff it is
elementarily m-enumerable in SREG,Σ ; a function is m-Turing-enumerable iff it
is positively elementarily m-enumerable in SRE,Σ .
4.1 The Non-positive First-Order Case
We are now ready to formulate the weak cardinality theorems for first-order logic. In the following theorems, a logical structure is called well-orderable if a well-ordering of its universe can be defined elementarily. For example, (N, +) is well-orderable using the formula φ≤(x, y) = ∃z (x + z = y). The cross product of two functions f and g is defined in the usual way by (f × g)(u, v) = (f(u), g(v)).
The first of the weak cardinality theorems, the nonspeedup theorem, is actually just a corollary of a more general theorem that is formulated first: the cross
product theorem.
Theorem 4.2 (Cross product theorem). Let S be a well-orderable logical
structure with universe U . Let f, g : U → U be functions. If f × g is elementarily
(n + m)-enumerable in S, then f is elementarily n-enumerable in S or g is
elementarily m-enumerable in S.
Theorem 4.3 (Nonspeedup theorem). Let S be a well-orderable logical
structure with universe U . Let A ⊆ U . If χ^n_A is elementarily n-enumerable in S,
then A is elementarily definable in S.
Theorem 4.4 (Cardinality theorem for two words). Let S be a well-orderable logical structure with universe U . Let every finite relation on U be
elementarily definable in S. Let A ⊆ U . If #A2 is elementarily 2-enumerable
in S, then A is elementarily definable in S.
Theorem 4.5 (Restricted cardinality theorem). Let S be a well-orderable
logical structure with universe U . Let every finite relation on U be elementarily
definable in S. Let A ⊆ U . If #An is elementarily n-enumerable in S via a relation R that never ‘enumerates’ 0 and n simultaneously, then A is elementarily
definable in S.
The premises of the first two and the last two of the above theorems differ in
the following way: for the last two theorems we require that every finite relation
on U is elementarily definable in S. An example of a logical structure where this
is not the case is (ω1 , +, ·), where ω1 is the first uncountable ordinal number and
+ and · denote ordinal number addition and multiplication. Since this structure
is uncountable, while there are only countably many first-order formulas, there exists a singleton set A = {α} with α ∈ ω1 that is not elementarily definable in (ω1, +, ·). For this structure, theorems 4.4 and 4.5 do
not hold: #A2 is elementarily 2-enumerable in (ω1 , +, ·) since #A2 (x, y) ∈ {0, 1}
for all x, y ∈ ω1 , but A is not elementarily definable in (ω1 , +, ·).
4.2 The Positive First-Order Case
The above theorems cannot be applied to Turing enumerability since they refer to
elementary definitions, not to positive elementary definitions. Unfortunately, the
proofs of the theorems cannot simply be reformulated in a 'positive' way. They use negations to define the smallest element of a set B with respect to a well-ordering <. The defining formula is given by φ(x) = B(x) ∧ ¬∃x′ (x′ < x ∧ B(x′)). This is a fundamental problem: the set {(M, x) | x is the smallest word accepted by M} is not recursively enumerable. Thus if we insist on finding the
smallest element in every recursively enumerable set, we will not be able to apply the theorems to Turing machines. Fortunately, a closer examination of the
proofs shows that we do not actually need the smallest element in B, but just
any element of B as long as the same element is always chosen.
This is not as easy as it may sound, as is well recognised in set theory, where the axiom of choice is needed for this choosing operation. Suppose you and a
friend wish to agree on a certain element of B, but neither you nor your friend knows the set B beforehand. Rather, you must decide on a generic method of
picking an element such that, when the set B becomes known to you and your
friend, you will both pick the same element. Agreements like ‘pick some element
from B’ will not guarantee that you both pick the same element, except if the
set happens to be a singleton.
We need a (partial) recursive choice function that assigns a word that is
accepted by M to every Turing machine M , provided such a word exists. Such a
choice function does, indeed, exist: it maps M to the first word that is accepted
by M during a dovetailed simulation of M on all words.
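The dovetailing can be sketched as follows. The predicate `accepts_within(M, w, t)` ("M accepts w within t steps") is a hypothetical but decidable stand-in for a step-bounded simulation; here it is stubbed by a table mapping words to acceptance times, so the dovetailing itself can be run. The table M and all names are invented for this sketch.

```python
from itertools import count, product

def all_words():
    """Enumerate Σ* for Σ = {a, b} in length-lexicographic order."""
    yield ""
    for length in count(1):
        for tup in product("ab", repeat=length):
            yield "".join(tup)

def accepts_within(M, w, t):
    """Stub for the decidable predicate 'M accepts w within t steps'."""
    return M.get(w, float("inf")) <= t

def choice(M):
    """Dovetail M on all words; return the first word found accepted.
    Diverges iff M accepts no word at all, so the choice function is
    only partial, exactly as in the text."""
    for t in count(0):                      # stage t: try the first t+1 words...
        for i, w in enumerate(all_words()):
            if i > t:
                break
            if accepts_within(M, w, t):     # ...for t steps each
                return w

M = {"ba": 3, "a": 7}   # invented machine: accepts "ba" at step 3, "a" at 7
print(choice(M))        # "ba" is found first during the dovetailing
```

Because the search order is fixed, two parties running this procedure on the same M always pick the same word, which is exactly the property needed above.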
In the following, first-order logic is augmented by choice operators. Choice
operators have been used for example in [10], but the following definitions are
adapted to the purposes of this paper and differ from the formalism used in [10].
On the semantic side we augment logical structures by a choice function; on the
syntactic side we augment first-order logic by a choice operator ε:
Definition 4.6. A choice function on a set U is a function ζ : P(U ) → U such
that ζ(B) ∈ B for all nonempty B ⊆ U .
Definition 4.7. A choice structure is a pair (S, ζ) consisting of a logical structure S and a choice function ζ on the universe of S.
Definition 4.8 (Syntax of the choice operator). First-order formulas with
choice are defined inductively in the usual way with one addition: if x is a variable
and φ is a first-order formula with choice, so is ε(x, φ).
In the next definition, φ^(S,ζ)(x) = {u ∈ U | (S, ζ) |= φ[x = u]} denotes the set of all u that make φ hold in (S, ζ) when plugged in for the variable x.
Definition 4.9 (Semantics of the choice operator). The semantics of first-order logic with choice operator is defined in the usual way with the following addition: a formula of the form ε(x, φ) holds in a choice structure (S, ζ) for an assignment α if φ^(S,ζ)(x) is nonempty and α(x) = ζ(φ^(S,ζ)(x)).
As an example, consider the logical structure S = (N, +, ·, <, 0) and let ζ map every nonempty set of natural numbers to its smallest element. Let φ(x, y, z) = ε(z, 0 < z ∧ ∃a (x · a = z) ∧ ∃b (y · b = z)). Then φ^(S,ζ)(x, y, z) is the set of all triples (n, m, k) such that k is the least common multiple of n and m: the formula 0 < z ∧ ∃a (x · a = z) ∧ ∃b (y · b = z) is true for all positive z that are multiples of both x and y; thus the choice operator picks the smallest one of these.
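The example can be mimicked computationally over a bounded domain (the bound and the function names are invented; a real evaluation over all of N is of course not a finite search): ζ is the least-element choice function, and the ε-operator applied to "z is a positive common multiple of x and y" yields the least common multiple.

```python
def zeta(B):
    """Choice function on nonempty sets of naturals: pick the least element."""
    return min(B)

def eps_lcm(x, y, bound=1000):
    # Collect all z in the bounded domain satisfying the body of the
    # epsilon-formula, then apply the choice function zeta.
    satisfying = {z for z in range(1, bound) if z % x == 0 and z % y == 0}
    return zeta(satisfying) if satisfying else None

print(eps_lcm(4, 6))   # 12
print(eps_lcm(3, 5))   # 15
```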
The following theorem shows that the class of recursively enumerable sets
can be characterised in terms of first-order logic with choice.
Theorem 4.10. For every alphabet Σ there exists a choice structure (SRE,Σ , ζ)
such that a language A ⊆ Σ ∗ is recursively enumerable iff it is positively elementarily definable with choice in (SRE,Σ , ζ).
We can now formulate the cross product theorem and the nonspeedup theorem in such a way that they can be applied both to finite automata and to
Turing machines.
Theorem 4.11 (Cross product theorem, positive version). Let (S, ζ) be
a choice structure with universe U . Let the inequality relation on U be positively
elementarily definable in (S, ζ). Let every finite relation on U that is elementarily
definable with choice in (S, ζ) be positively elementarily definable with choice in
(S, ζ). Let f, g : U → U be functions. If f × g is positively (n + m)-enumerable
with choice in (S, ζ), then f is positively n-enumerable with choice in (S, ζ) or
g is positively m-enumerable with choice in (S, ζ).
Theorem 4.12 (Nonspeedup theorem, positive version). Let (S, ζ) be a
choice structure with universe U . Let the inequality relation on U be positively
elementarily definable in (S, ζ). Let every finite relation on U that is elementarily
definable with choice in (S, ζ) be positively elementarily definable with choice in
(S, ζ). Let A ⊆ U . If χ^n_A is positively n-enumerable with choice in (S, ζ), then
A is positively elementarily definable with choice in (S, ζ).
The cross product theorem, theorem 4.2, is a consequence of its positive
version, theorem 4.11. (And not the other way round, as one might perhaps
expect.) The same is true for the nonspeedup theorem. To see this, consider a
well-orderable structure S whose existence is postulated in theorem 4.2. Define
a choice structure (S ′ , ζ) as follows: S ′ has the same universe as S and contains
all relations that are elementarily definable in S. The function ζ maps each set A
to its smallest element with respect to the well-ordering of S's universe. With these
definitions, a relation is positively elementarily definable with choice in (S ′ , ζ)
iff it is elementarily definable in S.
4.3 The Higher-Order Case
We just saw that the cross product theorem for a certain logic, namely first-order logic, is a consequence of the cross product theorem for a less powerful
logic, namely positive first-order logic. We may ask whether we can similarly
apply the theorems for first-order logic to higher-order logics.
This is indeed possible and we can use the same kind of argument as above:
Consider any logical structure S. Define a new structure S ′ as follows: it has the
same universe as S and it contains every relation that is higher-order definable
in S. Then a relation is elementarily definable in S ′ iff it is higher-order definable
in S. This allows us to transfer the cross product theorem and all of the weak
cardinality theorems to all logics that are at least as powerful as first-order logic.
Just one example of such a transfer is the following:
Theorem 4.13 (Cross product theorem for higher-order logic). Let S be
a well-orderable logical structure with universe U . Let f, g : U → U be functions.
If f × g is higher-order (n + m)-enumerable in S, then f is higher-order n-enumerable in S or g is higher-order m-enumerable in S.
5 Separability Theorems for First-Order Logic
Kummer's cardinality theorem can be reformulated in terms of separability. In [26] it is shown that it is equivalent to the following statement, where A^(n,k) denotes the set of all n-tuples of distinct words such that exactly k of them are in A.

Theorem 5.1 (Separability version of Kummer's cardinality theorem). Let A be a language. Suppose there exist recursively enumerable supersets of A^(n,0), A^(n,1), ..., A^(n,n) whose intersection is empty. Then A is recursive.
In [26] it is also shown that the above statement is still true if we replace 'recursively enumerable' by 'co-recursively enumerable'.
The weak cardinality theorems for first-order logic can be reformulated in a
similar way. Let us start with the cardinality theorem for two words. It can be
stated equivalently as follows, where Ā = U \ A denotes the complement of A.
Theorem 5.2. Let S be a well-orderable logical structure with universe U . Let
every finite relation on U be elementarily definable in S. Let A ⊆ U . Suppose
there exist elementarily definable supersets of A × A, A × Ā, and Ā × Ā whose
intersection is empty. Then A is elementarily definable in S.
The restricted cardinality theorem can be reformulated in terms of elementary separability. Let us call two sets A and B elementarily separable in a structure S if there exists a set C with A ⊆ C ⊆ B̄ that is elementarily definable
in S.
Theorem 5.3. Let S be a well-orderable structure with universe U . Let every finite relation on U be elementarily definable in S. Let A ⊆ U . If A^(n,0) and A^(n,n) are elementarily separable in S, then A is elementarily definable in S.
6 Conclusion
This paper proposed a new, logic-based approach to the proof of (weak) cardinality theorems. The approach has two advantages:
1. It unifies previous results in a single framework.
2. The results can easily be applied to other computational models.
Regarding the first advantage, only the cross product theorem and the nonspeedup theorem are completely ‘unified’ by the theorems presented in this
paper: the Turing machine versions and the finite automata versions of these
theorems are just different instantiations of theorems 4.11 and 4.12.
For the cardinality theorem for two words and for the restricted cardinality
theorem, the situation is (currently) more complex. These theorems hold for Turing machines and for finite automata, but different proofs are used. In particular,
the logical theorems cannot be instantiated for Turing enumerability. Nevertheless, the logical approach is fruitful here: the logical theorems can be instantiated for new models like Presburger arithmetic.
Organised by computational model, the results of this paper can be summarised as follows: the cross product theorem and the nonspeedup theorem
– hold for Presburger arithmetic,
– hold for finite automata,
– do not hold for polynomial-time machines,
– hold for Turing machines,
– hold for natural number arithmetic,
– hold for ordinal number arithmetic.
The cardinality theorem for two inputs and the restricted cardinality theorem
– hold for Presburger arithmetic,
– hold for finite automata,
– do not hold for polynomial-time machines,
– hold for Turing machines,
– hold for natural number arithmetic,
– do not hold for ordinal number arithmetic.
The behaviour of ordinal number arithmetic is interesting: the cardinality theorem for two inputs and the restricted cardinality theorem fail since there exist ordinal numbers that are not elementarily definable in ordinal number arithmetic—
but this is not a ‘problem’ for the cross product theorem and the nonspeedup
theorem.
The results of this paper raise the question of whether the cardinality theorem holds for first-order logic. I conjecture that this is the case, that is, I
conjecture that for well-orderable structures S in which all finite relations can
be elementarily defined, if #An is elementarily n-enumerable then A is elementarily definable. Proving this conjecture would also settle the open problem of
whether the cardinality theorem holds for finite automata.
References
1. H. Austinat, V. Diekert, and U. Hertrampf. A structural property of regular
frequency computations. Theoretical Comput. Sci., 292(1):33–43, 2003.
2. H. Austinat, V. Diekert, U. Hertrampf, and H. Petersen. Regular frequency computations. In Proc. RIMS Symposium on Algebraic Systems, Formal Languages
and Computation, volume 1166 of RIMS Kokyuroku, pages 35–42. Research Inst.
for Mathematical Sci., Kyoto Univ., Japan, 2000.
3. R. Beigel. Query-Limited Reducibilities. PhD thesis, Stanford Univ., USA, 1987.
4. R. Beigel, W. I. Gasarch, M. Kummer, G. Martin, T. McNicholl, and F. Stephan.
The complexity of ODD^A_n. J. Symbolic Logic, 65(1):1–18, 2000.
5. J. R. Büchi. On a decision method in restricted second-order arithmetic. In Proc.
1960 International Congress on Logic, Methodology and Philosophy of Sci., pages
1–11. Stanford Univ. Press, 1962.
6. J.-Y. Cai and L. A. Hemachandra. Enumerative counting is hard. Inf. Computation, 82(1):34–44, 1989.
7. W. I. Gasarch. Bounded queries in recursion theory: A survey. In Proceedings of
the Sixth Annual Structure in Complexity Theory Conference, pages 62–78. IEEE
Computer Soc. Press, 1991.
8. V. Harizanov, M. Kummer, and J. Owings. Frequency computations and the
cardinality theorem. J. Symbolic Logic, 52(2):682–687, 1992.
9. L. A. Hemachandra. The strong exponential hierarchy collapses. J. Comput. Syst.
Sci., 39(3):299–322, 1989.
10. D. Hilbert and P. Bernays. Grundlagen der Mathematik II, volume 50 of Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen. Springer-Verlag, second edition, 1970.
11. A. Hoene and A. Nickelsen. Counting, selecting, and sorting by query-bounded
machines. In Proc. 10th International Symposium on Theoretical Aspects of Comp.
Sci., volume 665 of Lecture Notes on Comp. Sci., pages 196–205. Springer-Verlag,
1993.
12. N. Immerman. Nondeterministic space is closed under complementation. SIAM J.
Comput., 17(5):935–938, 1988.
13. C. G. Jockusch, Jr. Reducibilities in Recursive Function Theory. PhD thesis,
Massachusetts Inst. of Technology, USA, 1966.
14. J. Kadin. P^NP[O(log n)] and sparse Turing-complete sets for NP. J. Comput. Syst.
Sci., 39(3):282–298, 1989.
15. E. B. Kinber. Frequency computations in finite automata. Cybernetics, 2:179–187,
1976.
16. M. Kummer. A proof of Beigel’s cardinality conjecture. J. Symbolic Logic,
57(2):677–681, 1992.
17. M. Kummer and F. Stephan. Effective search problems. Mathematical Logic
Quarterly, 40(2):224–236, 1994.
18. S. R. Mahaney. Sparse complete sets for NP: Solution of a conjecture of Berman
and Hartmanis. J. Comput. Syst. Sci., 25(2):130–143, 1982.
19. R. McNaughton. Review of [5]. J. Symbolic Logic, 28(1):100–102, 1963.
20. A. Nickelsen. On polynomially D-verbose sets. In Proceedings of the 14th International Symposium on Theoretical Aspects of Computer Science, volume 1200 of
Lecture Notes on Comp. Sci., pages 307–318. Springer-Verlag, 1997.
21. J. C. Owings, Jr. A cardinality version of Beigel’s nonspeedup theorem. J. Symbolic
Logic, 54(3):761–767, 1989.
22. E. L. Post. Recursively enumerable sets of positive integers and their decision
problems. Bulletin of the American Mathematical Society, 50:284–316, 1944.
23. R. Szelepcsényi. The method of forced enumeration for nondeterministic automata.
Acta Informatica, 23(3):279–284, 1988.
24. T. Tantau. Comparing verboseness for finite automata and Turing machines. In
Proc. 19th International Symposium on Theoretical Aspects of Comp. Sci., volume
2285 of Lecture Notes on Comp. Sci., pages 465–476. Springer-Verlag, 2002.
25. T. Tantau. Towards a cardinality theorem for finite automata. In Proc. 27th International Symposium on Mathematical Foundations of Comp. Sci., volume 2420
of Lecture Notes on Comp. Sci., pages 625–636. Springer-Verlag, 2002.
26. T. Tantau. On Structural Similarities of Finite Automata and Turing Machine
Enumerability Classes. PhD thesis, Technical Univ. Berlin, Germany, 2003.
27. T. Tantau. Weak cardinality theorems for first-order logic. Technical Report TR03-024, Electronic Colloquium on Computational Complexity, www.eccc.uni-trier.de/eccc, 2003.
Compositionality of Hennessy-Milner Logic
through Structural Operational Semantics
Wan Fokkink^{1,2}, Rob van Glabbeek^{1}, and Paulien de Wind^{2}
1 CWI, Department of Software Engineering, PO Box 94079, 1090 GB Amsterdam, The Netherlands
2 Vrije Universiteit Amsterdam, Department of Theoretical Computer Science, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
wan@cwi.nl, http://www.cwi.nl/˜wan/
rvg@cs.stanford.edu, http://theory.stanford.edu/˜rvg/
pdwind@cs.vu.nl, http://www.cs.vu.nl/˜pdwind/
Abstract. This paper presents a method for the decomposition of HML
formulae. It can be used to decide whether a process algebra term satisfies an HML formula, by checking whether subterms satisfy certain formulae obtained by decomposing the original formula. The method uses
the structural operational semantics of the process algebra. The main
contribution of this paper is that an earlier decomposition method from
Larsen [14] for the De Simone format is extended to the more general
ntyft/ntyxt format without lookahead.
1 Introduction
In the past two decades, compositional methods have been developed for checking the validity of assertions in modal logics, used to describe the behaviour of
processes. This means that the truth of an assertion for a composition of processes can be deduced from the truth of certain assertions for the components of
the composition. Most research papers in this area focus on a particular process
algebra.
Barringer, Kuiper & Pnueli [3] present (a preliminary version of) a
compositional proof system for concurrent programs, which is based on a rich
temporal logic, including operators from process logic [10] and LTL [20]. For
modelling concurrent programs they define a language including assignment,
conditional and while statements. Interaction between parallel components is
done via shared variables.
In Stirling [22] modal proof systems are developed for subsets of CCS [16]
(with and without silent actions) including only sequential and alternative composition, to decide the validity of formulae from Hennessy-Milner Logic (HML)
[11]. In Stirling [23,24] the results from [22] are extended, creating proof systems for subsets of CCS and SCCS [18] including asynchronous and synchronous
parallelism and infinite behaviour, using ideas from [3]. In Stirling [25] the proposals in [23,24] are generalised to be able to cope with the restriction operator.
A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 412–422, 2003.
© Springer-Verlag Berlin Heidelberg 2003
In Winskel [26] a method is given to decompose formulae with respect
to each operation in SCCS. The language of assertions is HML with infinite
conjunction and disjunction. This decomposition provides the foundations of
Winskel’s proof system for SCCS with modal assertions. In [27], [2] and [1]
processes are described by specification languages inspired by CCS and CSP
[6]. The articles describe compositional methods for deciding whether processes
satisfy assertions from a modal µ-calculus [13].
Larsen [14] developed a more general compositional method for deciding
whether a process satisfies a certain property. Unlike the aforementioned methods, this method is not oriented towards a particular process algebra, but it is
based on structural operational semantics [19], which provides process algebras
and specification languages with an interpretation. A transition system specification, consisting of an algebraic signature and a set of transition rules of the form premises/conclusion, generates a transition relation between the closed terms over the signature. An example of a transition rule, for alternative composition, is

    x1 −a→ y
    ―――――――――――――
    x1 + x2 −a→ y

meaning for states t1, t2 and u that if state t1 can evolve into state u by the execution of action a, then so can state t1 + t2. Larsen showed how to decompose HML formulae with respect to a transition system specification in the De
Simone format [21]. This format was originally put forward to guarantee that
the bisimulation equivalence associated with a transition system specification is
a congruence, meaning that bisimulation equivalence is preserved by all functions in the signature. Larsen and Xinxin [15] extended this decomposition
method to HML with recursion (which is equivalent to the modal µ-calculus).
Since modal proof systems for specific process algebras are tailor-made, they
may be more concise than the ones generated by the general decomposition
method of Larsen (e.g., [23,24,25]). However, in some cases the general decomposition method does produce modal proof systems that are similar in spirit to
those in the literature (e.g., [22,26]).
In Bloom, Fokkink & van Glabbeek [4] a method is given for decomposing formulae from a fragment of HML with infinite conjunctions, with respect
to terms from any process algebra that has a structural operational semantics in
ntyft/ntyxt format [9] without lookahead. This format is a generalisation of the
De Simone format, and still guarantees that bisimulation equivalence is a congruence. The decomposition method is not presented in its own right, but is used
in the derivation of congruence formats for a range of behavioural equivalences
from van Glabbeek [8].
In this paper the decomposition method from [4] is extended to full HML with
infinite conjunction, again with respect to terms from any process algebra that
has a structural operational semantics in ntyft/ntyxt format without lookahead.
2 Preliminaries
In this section we give the basic notions of structural operational semantics
and Hennessy-Milner Logic (HML) that are needed to define our decomposition
method.
2.1 Structural Operational Semantics
Structural operational semantics [19] provides a framework to give an operational
semantics to programming and specification languages. In particular, because of
its intuitive appeal and flexibility, structural operational semantics has found
considerable application in the study of the semantics of concurrent processes.
Let V be an infinite set of variables. A syntactic object is called closed if it
does not contain any variables from V .
Definition 1 (signature). A signature is a collection Σ of function symbols f ∉ V , equipped with a function ar : Σ → N. The set T(Σ) of terms over a
signature Σ is defined recursively by:
– V ⊆ T(Σ),
– if f ∈ Σ and t1 , . . . , tar(f ) ∈ T(Σ), then f (t1 , . . . , tar(f ) ) ∈ T(Σ).
A term c() is abbreviated as c. For t ∈ T(Σ), var(t) denotes the set of variables
that occur in t. T (Σ) is the set of closed terms over Σ, i.e. the terms t ∈ T(Σ)
with var(t) = ∅. A Σ-substitution σ is a partial function from V to T(Σ).
If σ is a Σ-substitution and S is any syntactic object, then σ(S) denotes the
object obtained from S by replacing, for x in the domain of σ, every occurrence
of x in S by σ(x). In that case σ(S) is called a substitution instance of S. A
Σ-substitution is closed if it is a total function from V to T (Σ).
In the remainder, let Σ denote a signature and A a set of actions, satisfying
|Σ| ≤ |V | and |A| ≤ |V |.
Definition 2 (literal). A positive Σ-literal is an expression $t \xrightarrow{a} t'$ and a negative Σ-literal an expression $t \stackrel{a}{\nrightarrow}$, with t, t′ ∈ T(Σ) and a ∈ A. For t, t′ ∈ T(Σ) and a ∈ A, the literals $t \xrightarrow{a} t'$ and $t \stackrel{a}{\nrightarrow}$ are said to deny each other.
Definition 3 (transition rule). A transition rule over Σ is an expression of the form $\frac{H}{\alpha}$ with H a set of Σ-literals (the premises of the rule) and α a positive Σ-literal (the conclusion). The left- and right-hand sides of α are called the source and the target of the rule, respectively. A rule $\frac{H}{\alpha}$ with H = ∅ is also written α.
Definition 4 (transition system specification). A transition system specification (TSS) is a pair (Σ, R) with R a collection of transition rules over Σ.
Compositionality of Hennessy-Milner Logic
Definition 5 (proof). Let P = (Σ, R) be a TSS. A proof of a transition rule $\frac{H}{\alpha}$ from P is a well-founded, upwardly branching tree of which the nodes are labelled by Σ-literals, and some of the leaves are marked "hypothesis", such that:
– the root is labelled by α,
– H contains the labels of the hypotheses, and
– if β is the label of a node q which is not a hypothesis and K is the set of labels of the nodes directly above q, then $\frac{K}{\beta}$ is a substitution instance of a transition rule in R.
If a proof of $\frac{K}{\alpha}$ from P exists, then $\frac{K}{\alpha}$ is provable from P, notation $P \vdash \frac{K}{\alpha}$.
Definition 6 (transition relation). A transition relation over Σ is a relation → ⊆ T(Σ) × A × T(Σ). We write $p \xrightarrow{a} q$ for (p, a, q) ∈ → and $p \stackrel{a}{\nrightarrow}$ for $\neg\exists q \in T(\Sigma): p \xrightarrow{a} q$.
Thus a transition relation over Σ can be regarded as a set of closed positive
Σ-literals (transitions). A TSS with only positive premises specifies a transition
relation in a straightforward way as the set of all provable transitions. But it
is much less trivial to associate a transition relation to a TSS with negative
premises. Several solutions are proposed in Groote [9], Bol & Groote [5]
and van Glabbeek [7]. From the latter we adopt the notion of a well-supported
proof and a complete TSS.
Definition 7 (well-supported proof). Let P = (Σ, R) be a TSS. A well-supported proof of a closed literal α from P is a well-founded, upwardly branching tree of which the nodes are labelled by closed Σ-literals, such that:
– the root is labelled by α, and
– if β is the label of a node q and K is the set of labels of the nodes directly above q, then
1. either $\frac{K}{\beta}$ is a closed substitution instance of a transition rule in R,
2. or β is negative and for every set N of negative closed literals such that $P \vdash \frac{N}{\gamma}$ for γ a closed literal denying β, a literal in K denies one in N.
We say α is ws-provable from P , notation P ⊢ws α, if a well-supported proof of
α from P exists.
In [7] it was noted that ⊢ws is consistent, in the sense that no standard TSS
admits well-supported proofs of two literals that deny each other.
Definition 8 (completeness). A TSS P is complete if for any closed literal $p \stackrel{a}{\nrightarrow}$ either $P \vdash_{ws} p \xrightarrow{a} p'$ for some closed term p′ or $P \vdash_{ws} p \stackrel{a}{\nrightarrow}$.
Now a TSS specifies a transition relation if and only if it is complete. The
specified transition relation is then the set of all ws-provable transitions.
2.2 Hennessy-Milner Logic
A variety of modal logics have been developed to express properties of transition
relations. Modal logic aims to formulate properties of process terms, and to
identify terms that satisfy the same properties. Hennessy & Milner [11] have
defined a modal language, often called Hennessy-Milner Logic (HML), which
characterises the bisimulation equivalence relation on process terms, assuming
that each term has only finitely many outgoing transitions. This assumption can
be discarded if infinite conjunctions are allowed [17,12].
Definition 9 (Hennessy-Milner Logic). Assume an action set A. The set O
of potential observations or modal formulae is recursively defined by
$\varphi ::= \bigwedge_{i\in I} \varphi_i \;\mid\; \langle a\rangle\varphi \;\mid\; \neg\varphi$
with a ∈ A and I some index set.
Definition 10 (satisfaction relation). Let P = (Σ, R) be a TSS. The satisfaction relation |=P ⊆ T (Σ) × O is defined as follows, with p ∈ T (Σ):
$p \models_P \bigwedge_{i\in I}\varphi_i$ iff $p \models_P \varphi_i$ for all i ∈ I
$p \models_P \langle a\rangle\varphi$ iff there is a q ∈ T(Σ) such that $P \vdash_{ws} p \xrightarrow{a} q$ and $q \models_P \varphi$
$p \models_P \neg\varphi$ iff $p \not\models_P \varphi$
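As an aside (our own illustration, not part of the paper), the clauses of Definition 10 translate directly into a satisfaction checker over a finite transition relation; the tuple encoding of formulae below is ad hoc:

```python
# Modal formulae as tuples (ad-hoc encoding):
#   ("and", [phi, ...])  -- conjunction; ("and", []) plays the role of T (top)
#   ("dia", a, phi)      -- <a>phi
#   ("neg", phi)         -- negation

def sat(trans, p, phi):
    """Decide p |= phi over a finite transition relation trans: set of (p, a, q)."""
    kind = phi[0]
    if kind == "and":
        return all(sat(trans, p, psi) for psi in phi[1])
    if kind == "dia":
        _, a, psi = phi
        return any(sat(trans, q, psi)
                   for (r, b, q) in trans if r == p and b == a)
    if kind == "neg":
        return not sat(trans, p, phi[1])
    raise ValueError(phi)

TOP = ("and", [])
trans = {("p", "a", "q"), ("q", "b", "q")}
assert sat(trans, "p", ("dia", "a", ("dia", "b", TOP)))  # p satisfies <a><b>T
assert sat(trans, "p", ("neg", ("dia", "b", TOP)))       # p satisfies ¬<b>T
```

The three branches mirror the three clauses of the satisfaction relation; infinite conjunctions are of course restricted to finite ones here.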
We will use the binary conjunction ϕ1 ∧ ϕ2 as an abbreviation of $\bigwedge_{i\in\{1,2\}}\varphi_i$, whereas ⊤ is an abbreviation for the empty conjunction. We identify formulae that are logically equivalent, using the laws $\top \wedge \varphi \cong \varphi$, $\bigwedge_{i\in I}\big(\bigwedge_{j\in J_i}\varphi_j\big) \cong \bigwedge_{i\in I,\, j\in J_i}\varphi_j$ and $\neg\neg\varphi \cong \varphi$. This is justified because $\varphi \cong \psi$ implies $p \models_P \varphi \Leftrightarrow p \models_P \psi$.
3 Decomposing HML Formulae
In this section we will see how one can decompose HML formulae with respect
to process terms. The TSS defining the transition relation on these terms should
be in ready simulation format [4], allowing only ntyft/ntyxt rules [9] without
lookahead.
Definition 11 (ntytt, ntyxt, ntyft, nxytt). An ntytt rule is a transition rule in which the right-hand sides of positive premises are variables that are all distinct, and that do not occur in the source. An ntytt rule is an ntyxt rule if its source is a variable, and an ntyft rule if its source contains exactly one function symbol and no multiple occurrences of variables. An ntytt rule is an nxytt rule if the left-hand sides of its premises are variables.
Definition 12 (lookahead). A transition rule has no lookahead if the variables occurring in the right-hand sides of its positive premises do not occur in the left-hand sides of its premises.
Definition 13 (ready simulation format). A TSS is in ready simulation
format if its transition rules are ntyft or ntyxt rules that have no lookahead.
Definition 14 (free). A variable occurring in a transition rule is free if it does
not occur in the source nor in the right-hand sides of the positive premises of
this rule.
Definition 15 (decent). A transition rule is decent if it has no lookahead and
does not contain free variables.
In Bloom, Fokkink & van Glabbeek [4], for any TSS P in ready simulation format, the collection of P-ruloids is defined. These are decent nxytt rules for which the following holds:
Theorem 1. [4] Let P be a TSS in ready simulation format. Then $P \vdash_{ws} \sigma(t) \xrightarrow{a} p$ for t a term, p a closed term and σ a closed substitution, iff there are a P-ruloid $\frac{H}{t\xrightarrow{a}u}$ and a closed substitution σ′ with $P \vdash_{ws} \sigma'(\alpha)$ for α ∈ H, σ′(t) = σ(t) and σ′(u) = p.
Given a TSS P = (Σ, R) in ready simulation format, the following definition assigns to each term t ∈ T(Σ) and each observation ϕ ∈ O a collection $t^{-1}_P(\varphi)$ of decomposition mappings ψ : V → O. Each of these mappings $\psi \in t^{-1}_P(\varphi)$ guarantees, given a closed substitution σ, that σ(t) satisfies ϕ if σ(x) satisfies the formula ψ(x) for all x ∈ var(t). Moreover, whenever for some closed substitution σ the term σ(t) satisfies ϕ, there must be a decomposition mapping $\psi \in t^{-1}_P(\varphi)$ with σ(x) satisfying ψ(x) for all x ∈ var(t). This is formalised in Theorem 2 and proven thereafter.
Definition 16. Let P = (Σ, R) be a TSS in ready simulation format. Then $\cdot^{-1}_P : T(\Sigma) \to (O \to \mathcal{P}(V \to O))$ is defined by:
– $\psi \in t^{-1}_P(\langle a\rangle\varphi)$ iff there are a P-ruloid $\frac{H}{t\xrightarrow{a}u}$ and a $\chi \in u^{-1}_P(\varphi)$, and ψ : V → O is given by
$$\psi(x) = \begin{cases} \chi(x) \wedge \bigwedge_{(x\xrightarrow{b}y)\in H} \langle b\rangle\chi(y) \wedge \bigwedge_{(x\stackrel{c}{\nrightarrow})\in H} \neg\langle c\rangle\top & \text{if } x \in var(t) \\ \top & \text{if } x \notin var(t) \end{cases}$$
– $\psi \in t^{-1}_P(\bigwedge_{i\in I}\varphi_i)$ iff $\psi(x) = \bigwedge_{i\in I}\psi_i(x)$, where $\psi_i \in t^{-1}_P(\varphi_i)$ for i ∈ I.
– $\psi \in t^{-1}_P(\neg\varphi)$ iff there is a function $h : t^{-1}_P(\varphi) \to var(t)$ and ψ : V → O is given by
$$\psi(x) = \bigwedge_{\chi\in h^{-1}(x)} \neg\chi(x)$$
When clear from the context, the subscript P will be omitted.
It is not hard to see that if $\psi \in t^{-1}_P(\varphi)$ then ψ(x) = ⊤ for all x ∉ var(t).
Theorem 2. Let P = (Σ, R) be a complete TSS in ready simulation format.
Let ϕ ∈ O. For any term t ∈ T(Σ) and closed substitution σ : V → T (Σ) one
has
$\sigma(t) \models \varphi \;\Leftrightarrow\; \exists\psi \in t^{-1}(\varphi)\ \forall x \in var(t):\ \sigma(x) \models \psi(x)$
Proof. By induction on the structure of ϕ.
– ϕ = ⟨a⟩ϕ′
⇒ Suppose σ(t) |= ⟨a⟩ϕ′. Then by Definition 10 there is a p ∈ T(Σ) with $P \vdash_{ws} \sigma(t) \xrightarrow{a} p$ and p |= ϕ′. Thus, by Theorem 1 there must be a P-ruloid $\frac{H}{t\xrightarrow{a}u}$ and a closed substitution σ′ with $P \vdash_{ws} \sigma'(\alpha)$ for α ∈ H, σ′(t) = σ(t), i.e. σ′(x) = σ(x) for x ∈ var(t), and σ′(u) = p. Since σ′(u) |= ϕ′, the induction hypothesis can be applied, and there must be a χ ∈ u⁻¹(ϕ′) such that σ′(z) |= χ(z) for all z ∈ var(u). Furthermore σ′(z) |= χ(z) = ⊤ for all z ∉ var(u). Now define ψ as indicated in Definition 16. By definition, ψ ∈ t⁻¹(⟨a⟩ϕ′). Let x ∈ var(t). For $(x \xrightarrow{b} y) \in H$ one has $P \vdash_{ws} \sigma'(x) \xrightarrow{b} \sigma'(y)$ and σ′(y) |= χ(y), so σ′(x) |= ⟨b⟩χ(y). Moreover, for $(x \stackrel{c}{\nrightarrow}) \in H$ one has $P \vdash_{ws} \sigma'(x) \stackrel{c}{\nrightarrow}$, so the consistency of ⊢ws yields $P \not\vdash_{ws} \sigma'(x) \xrightarrow{c} q$ for all q ∈ T(Σ), and thus σ′(x) |= ¬⟨c⟩⊤. It follows that σ(x) = σ′(x) |= ψ(x).
⇐ Now suppose that there is a ψ ∈ t⁻¹(⟨a⟩ϕ′) such that σ(x) |= ψ(x) for all x ∈ var(t). This means that there is a P-ruloid
$$\frac{\{x \xrightarrow{a_i} y_i \mid i \in I_x,\ x \in var(t)\} \;\cup\; \{x \stackrel{b_j}{\nrightarrow} \mid j \in J_x,\ x \in var(t)\}}{t \xrightarrow{a} u}$$
and a decomposition mapping χ ∈ u⁻¹(ϕ′) such that, for all x ∈ var(t),
$$\sigma(x) \models \chi(x) \wedge \bigwedge_{i\in I_x}\langle a_i\rangle\chi(y_i) \wedge \bigwedge_{j\in J_x}\neg\langle b_j\rangle\top$$
By Definition 10 it follows that, for x ∈ var(t) and i ∈ I_x, $P \vdash_{ws} \sigma(x) \xrightarrow{a_i} p_i$ for some p_i ∈ T(Σ) with p_i |= χ(y_i). Moreover, for x ∈ var(t) and j ∈ J_x, $P \not\vdash_{ws} \sigma(x) \xrightarrow{b_j} q$ for all q ∈ T(Σ), so by the completeness of P, $P \vdash_{ws} \sigma(x) \stackrel{b_j}{\nrightarrow}$. Let σ′ be a closed substitution with σ′(x) = σ(x) for x ∈ var(t) and σ′(y_i) = p_i for i ∈ I_x and x ∈ var(t). Here we use that the variables x and y_i are all different. Now σ′(z) |= χ(z) for z ∈ var(u), using that u contains only variables that occur in t or in the premises of the ruloid. Thus the induction hypothesis can be applied, and σ′(u) |= ϕ′. Moreover, $P \vdash_{ws} \sigma'(x) \xrightarrow{a_i} \sigma'(y_i)$ for x ∈ var(t) and i ∈ I_x, and $P \vdash_{ws} \sigma'(x) \stackrel{b_j}{\nrightarrow}$ for x ∈ var(t) and j ∈ J_x. So, by Theorem 1, $P \vdash_{ws} \sigma'(t) \xrightarrow{a} \sigma'(u)$, which implies σ(t) = σ′(t) |= ⟨a⟩ϕ′.
– $\varphi = \bigwedge_{i\in I}\varphi_i$
$\sigma(t) \models \bigwedge_{i\in I}\varphi_i \Leftrightarrow \forall i \in I: \sigma(t) \models \varphi_i$
$\Leftrightarrow \forall i \in I\ \exists\psi_i \in t^{-1}(\varphi_i)\ \forall x \in var(t): \sigma(x) \models \psi_i(x)$
$\Leftrightarrow \exists\psi \in t^{-1}(\bigwedge_{i\in I}\varphi_i)\ \forall x \in var(t): \sigma(x) \models \psi(x)$.
– ϕ = ¬ϕ′
⇒ Suppose σ(t) |= ¬ϕ′. Then by Definition 10 we have σ(t) ⊭ ϕ′. Using the induction hypothesis, there is no χ ∈ t⁻¹(ϕ′) such that σ(x) |= χ(x) for all x ∈ var(t). So for all χ ∈ t⁻¹(ϕ′) there is an x ∈ var(t) such that σ(x) |= ¬χ(x). Let us denote this x as h(χ), so that we obtain a function h : t⁻¹(ϕ′) → var(t) such that σ(h(χ)) |= ¬χ(h(χ)) for all χ ∈ t⁻¹(ϕ′). Define ψ ∈ t⁻¹(¬ϕ′) as indicated in Definition 16, using h. Let x ∈ var(t). If x = h(χ) for some χ ∈ t⁻¹(ϕ′) then σ(x) |= ¬χ(x). Hence, $\sigma(x) \models \bigwedge_{\chi\in h^{-1}(x)} \neg\chi(x) = \psi(x)$.
⇐ Suppose that there is a ψ ∈ t⁻¹(¬ϕ′) such that σ(x) |= ψ(x) for all x ∈ var(t). By Definition 16 there is a function h : t⁻¹(ϕ′) → var(t) such that $\psi(x) = \bigwedge_{\chi\in h^{-1}(x)} \neg\chi(x)$ for all x ∈ var(t). So for all x ∈ var(t) and for all χ ∈ h⁻¹(x) we have that σ(x) |= ¬χ(x). In other words, for all χ ∈ t⁻¹(ϕ′) we have σ(h(χ)) |= ¬χ(h(χ)). So $\neg\exists\chi \in t^{-1}(\varphi')\ \forall x \in var(t):\ \sigma(x) \models \chi(x)$. Then using the induction hypothesis, we have σ(t) ⊭ ϕ′, so σ(t) |= ¬ϕ′.
We give a few examples of the application of Definition 16.
Example 1. Let A = {a, b} and let P = (Σ, R) with Σ consisting of the constant c and the binary function symbol f, and R:
$$c \xrightarrow{a} c \qquad \frac{x_1 \xrightarrow{a} y}{f(x_1,x_2) \xrightarrow{b} y} \qquad \frac{x_2 \xrightarrow{a} y \quad x_1 \stackrel{b}{\nrightarrow}}{f(x_1,x_2) \xrightarrow{b} y}$$
This TSS is complete and in ready simulation format. We proceed to compute f(x1, x2)⁻¹(⟨b⟩⊤). There are two P-ruloids with a conclusion of the form $f(x_1,x_2) \xrightarrow{b}$, namely $\frac{x_1\xrightarrow{a}y}{f(x_1,x_2)\xrightarrow{b}y}$ and $\frac{x_2\xrightarrow{a}y \quad x_1\stackrel{b}{\nrightarrow}}{f(x_1,x_2)\xrightarrow{b}y}$. According to Definition 16, we have f(x1, x2)⁻¹(⟨b⟩⊤) = {ψ1, ψ2} with ψ1 and ψ2 as defined below, using χ ∈ y⁻¹(⊤) (so χ(x) = ⊤ for all variables x ∈ V):
ψ1(x1) = χ(x1) ∧ ⟨a⟩χ(y) = ⊤ ∧ ⟨a⟩⊤ = ⟨a⟩⊤
ψ1(x2) = χ(x2) = ⊤
ψ1(x) = ⊤ for x ∉ var(f(x1, x2))
ψ2(x1) = χ(x1) ∧ ¬⟨b⟩⊤ = ⊤ ∧ ¬⟨b⟩⊤ = ¬⟨b⟩⊤
ψ2(x2) = χ(x2) ∧ ⟨a⟩χ(y) = ⊤ ∧ ⟨a⟩⊤ = ⟨a⟩⊤
ψ2(x) = ⊤ for x ∉ var(f(x1, x2))
By Theorem 2 a closed term f (u1 , u2 ) can execute a b if and only if the closed
term u1 can execute an a, or the closed term u1 can not execute a b and the
closed term u2 can execute an a. Looking at the premises, this is what we would
expect.
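This correspondence can be checked mechanically. The sketch below (our own illustration; the tuple representation of terms is ad hoc) computes the outgoing transitions of closed terms under the rules of Example 1 and compares "can execute a b" with the decomposed condition for a few small closed terms:

```python
# Closed terms: "c" or ("f", t1, t2)

def steps(t):
    """Transitions of a closed term under the three rules of Example 1."""
    if t == "c":
        return {("a", "c")}                              # rule: c -a-> c
    _, t1, t2 = t
    s1, s2 = steps(t1), steps(t2)
    out = {("b", y) for (l, y) in s1 if l == "a"}        # x1 -a-> y |- f -b-> y
    if not any(l == "b" for (l, _) in s1):               # negative premise x1 -b/->
        out |= {("b", y) for (l, y) in s2 if l == "a"}   # x2 -a-> y |- f -b-> y
    return out

def can(t, label):
    return any(l == label for (l, _) in steps(t))

small = ["c", ("f", "c", "c")]
for u1 in small:
    for u2 in small:
        t = ("f", u1, u2)
        # f(u1,u2) can do b  iff  u1 can do a, or u1 cannot do b and u2 can do a
        assert can(t, "b") == (can(u1, "a") or
                               (not can(u1, "b") and can(u2, "a")))
```

Since the TSS has no circular negative dependencies, this direct recursion computes exactly the ws-provable transitions.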
Example 2. Using the TSS and the mappings ψ1, ψ2 ∈ f(x1, x2)⁻¹(⟨b⟩⊤) from Example 1, we can compute f(x1, x2)⁻¹(¬⟨b⟩⊤). There are four possible functions h : f(x1, x2)⁻¹(⟨b⟩⊤) → var(f(x1, x2)), yielding four possible definitions of ψ ∈ f(x1, x2)⁻¹(¬⟨b⟩⊤).
1. If h(ψ1) = h(ψ2) = x1 then
ψ(x1) = ¬ψ1(x1) ∧ ¬ψ2(x1) = ¬⟨a⟩⊤ ∧ ¬¬⟨b⟩⊤ = ¬⟨a⟩⊤ ∧ ⟨b⟩⊤
ψ(x2) = ⊤
2. If h(ψ1) = h(ψ2) = x2 then
ψ(x1) = ⊤
ψ(x2) = ¬ψ1(x2) ∧ ¬ψ2(x2) = ¬⊤ ∧ ¬⟨a⟩⊤
3. If h(ψ1) = x1 and h(ψ2) = x2 then
ψ(x1) = ¬ψ1(x1) = ¬⟨a⟩⊤
ψ(x2) = ¬ψ2(x2) = ¬⟨a⟩⊤
4. If h(ψ1) = x2 and h(ψ2) = x1 then
ψ(x1) = ¬ψ2(x1) = ¬¬⟨b⟩⊤ = ⟨b⟩⊤
ψ(x2) = ¬ψ1(x2) = ¬⊤
By Theorem 2 a closed term f (u1 , u2 ) can not execute a b if and only if (1)
the closed term u1 can execute a b but not an a, or (3) the closed term u1 can
not execute an a and the closed term u2 can not execute an a. Looking at the
premises, this is again what we would expect. The other two possibilities (2) and
(4) do not qualify, since no term can ever satisfy ¬⊤.
A little less obvious example is the following:
Example 3. Let A = {a, b} and let P = (Σ, R) with Σ consisting of the constant c and the unary function symbol f, and R:
$$c \xrightarrow{a} c \qquad \frac{x \xrightarrow{a} y}{f(x) \xrightarrow{b} y} \qquad \frac{x \xrightarrow{b} y}{f(x) \xrightarrow{a} f(y)}$$
This TSS is complete and in ready simulation format. We proceed to compute f(f(x))⁻¹(⟨b⟩⟨a⟩⊤). The only P-ruloid that has a conclusion $f(f(x)) \xrightarrow{b}$ is $\frac{x\xrightarrow{b}y}{f(f(x))\xrightarrow{b}f(y)}$. So for each ψ ∈ f(f(x))⁻¹(⟨b⟩⟨a⟩⊤), ψ(x) = χ(x) ∧ ⟨b⟩χ(y) with χ ∈ f(y)⁻¹(⟨a⟩⊤). The only P-ruloid that has a conclusion $f(y) \xrightarrow{a}$ is $\frac{y\xrightarrow{b}z}{f(y)\xrightarrow{a}f(z)}$. So χ(y) = χ′(y) ∧ ⟨b⟩χ′(z) with χ′ ∈ f(z)⁻¹(⊤). Since χ′(y) = χ′(z) = ⊤ we have χ(y) = ⟨b⟩⊤. Moreover x ∉ var(f(y)) implies χ(x) = ⊤. Hence ψ(x) = ⟨b⟩⟨b⟩⊤.
By Theorem 2 a closed term f (f (u)) can execute a b followed by an a if and
only if the closed term u can execute two consecutive b’s.
The following example shows that in Theorem 2 it is essential that the TSS is complete. That is, the theorem would fail if we took the transition relation induced by a TSS to consist of those transitions for which a well-supported proof exists.
Example 4. Let A = {a, b} and let P = (Σ, R) with Σ consisting of the constant c and the unary function symbol f, and R:
$$\frac{x \stackrel{a}{\nrightarrow}}{f(x) \xrightarrow{b} c} \qquad \frac{c \stackrel{a}{\nrightarrow}}{c \xrightarrow{a} c}$$
This TSS, which is in ready simulation format, is incomplete. For example, neither $P \vdash_{ws} c \xrightarrow{a} t$ for a closed term t nor $P \vdash_{ws} c \stackrel{a}{\nrightarrow}$.
Let us assume that the transition relation induced by this TSS consists of those transitions for which a well-supported proof exists. Then there is no a-transition for c and no b-transition for f(c), so c ⊭ ⟨a⟩⊤ and f(c) ⊭ ⟨b⟩⊤. The only P-ruloid is $\frac{x\stackrel{a}{\nrightarrow}}{f(x)\xrightarrow{b}c}$. Hence Theorem 2 would yield f(c) |= ⟨b⟩⊤ ⇔ c |= ¬⟨a⟩⊤ ⇔ c ⊭ ⟨a⟩⊤. Since this is false, Theorem 2 would fail with respect to P.
References
1. H. R. Andersen, C. Stirling & G. Winskel (1994): A compositional proof
system for the modal µ-calculus. In Proceedings, Ninth Annual IEEE Symposium
on Logic in Computer Science, IEEE Computer Society Press, Paris, France, pp.
144–153.
2. H. R. Andersen & G. Winskel (1992): Compositional checking of satisfaction.
Formal Methods in System Design 1(4), pp. 323–354.
3. H. Barringer, R. Kuiper & A. Pnueli (1984): Now you may compose temporal
logic specifications. In ACM Symposium on Theory of Computing (STOC ’84),
ACM Press, Baltimore, USA, pp. 51–63.
4. B. Bloom, W. J. Fokkink & R. J. van Glabbeek (2003): Precongruence formats for decorated trace semantics. ACM Transactions on Computational Logic.
To appear.
5. R. Bol & J. F. Groote (1996): The meaning of negative premises in transition
system specifications. Journal of the ACM 43(5), pp. 863–914.
6. S. D. Brookes, C. A. R. Hoare & A. W. Roscoe (1984): A theory of communicating sequential processes. Journal of the ACM 31(3), pp. 560–599.
7. R. J. van Glabbeek (1996): The meaning of negative premises in transition
system specifications II. In F. Meyer auf der Heide & B. Monien, editors: Automata,
Languages and Programming, 23rd Colloquium (ICALP ’96), Lecture Notes in
Computer Science 1099, Springer-Verlag, Paderborn, Germany, pp. 502–513.
8. R. J. van Glabbeek (2001): The linear time – branching time spectrum I: The
semantics of concrete, sequential processes. In J. A. Bergstra, A. Ponse & S. A.
Smolka, editors: Handbook of Process Algebra, chapter 1, Elsevier, pp. 3–99.
9. J. F. Groote (1993): Transition system specifications with negative premises.
Theoretical Computer Science 118(2), pp. 263–299.
422
W. Fokkink, R. van Glabbeek, and P. de Wind
10. D. Harel, D. Kozen & R. Parikh (1982): Process logic: Expressiveness, decidability, completeness. Journal of Computer and System Sciences 25(2), pp.
144–170.
11. M. C. B. Hennessy & R. Milner (1985): Algebraic laws for non-determinism
and concurrency. Journal of the ACM 32(1), pp. 137–161.
12. M. C. B. Hennessy & C. Stirling (1985): The power of the future perfect in
program logics. Information and Control 67(1–3), pp. 23–52.
13. D. Kozen (1983): Results on the propositional µ-calculus. Theoretical Computer
Science 27(3), pp. 333–354.
14. K. G. Larsen (1986): Context-Dependent Bisimulation between Processes. PhD
thesis, University of Edinburgh, Edinburgh.
15. K. G. Larsen & L. Xinxin (1991): Compositionality through an operational
semantics of contexts. Journal of Logic and Computation 1(6), pp. 761–795.
16. R. Milner (1980): A Calculus of Communicating Systems. Springer-Verlag. Volume 92 of Lecture Notes in Computer Science.
17. R. Milner (1981): A modal characterization of observable machine-behaviour. In
E. Astesiano & C. Böhm, editors: CAAP ’81: Trees in Algebra and Programming,
6th Colloquium, Lecture Notes in Computer Science 112, Springer-Verlag, Genoa,
pp. 25–34.
18. R. Milner (1983): Calculi for synchrony and asynchrony. Theoretical Computer
Science 25(3), pp. 267–310.
19. G. D. Plotkin (1981): A structural approach to operational semantics. Technical
Report DAIMI FN-19, Computer Science Department, Aarhus University, Aarhus,
Denmark.
20. A. Pnueli (1981): The temporal logic of concurrent programs. Theoretical Computer Science 13, pp. 45–60.
21. R. De Simone (1985): Higher-level synchronising devices in Meije–SCCS. Theoretical Computer Science 37(3), pp. 245–267.
22. C. Stirling (1985): A proof-theoretic characterization of observational equivalence. Theoretical Computer Science 39(1), pp. 27–45.
23. C. Stirling (1985): A complete compositional modal proof system for a subset
of CCS. In W. Brauer, editor: Automata, Languages and Programming, 12th
Colloquium (ICALP ’85), Lecture Notes in Computer Science 194, Springer-Verlag,
pp. 475–486.
24. C. Stirling (1985): A complete modal proof system for a subset of SCCS. In
H. Ehrig, C. Floyd, M. Nivat & J. W. Thatcher, editors: Mathematical Foundations
of Software Development: Proceedings of the Joint Conference on Theory and
Practice of Software Development (TAPSOFT), Volume 1: Colloquium on Trees
in Algebra and Programming (CAAP ’85), Lecture Notes in Computer Science
185, Springer-Verlag, pp. 253–266.
25. C. Stirling (1987): Modal logics for communicating systems. Theoretical Computer Science 49(2-3), pp. 311–347.
26. G. Winskel (1986): A complete proof system for SCCS with modal assertions.
Fundamenta Informaticae IX, pp. 401–420.
27. G. Winskel (1990): On the compositional checking of validity (extended abstract). In J. C. M. Baeten & J. W. Klop, editors: CONCUR ’90: Theories of
Concurrency: Unification and Extension, Lecture Notes in Computer Science 458,
Springer-Verlag, Amsterdam, The Netherlands, pp. 481–501.
On a Logical Approach to Estimating Computational
Complexity of Potentially Intractable Problems⋆
Andrzej Szałas
The College of Economics and Computer Science, Olsztyn, Poland
and
Department of Computer Science, University of Linköping, Sweden
andsz@ida.liu.se
Abstract. In the paper we present a purely logical approach to estimating computational complexity of potentially intractable problems. The approach is based
on descriptive complexity and second-order quantifier elimination techniques.
We illustrate the approach on the case of the transversal hypergraph problem,
TransHyp, which has attracted a great deal of attention. The complexity of the problem has remained open for over twenty years. Given two hypergraphs, G and
H, TransHyp depends on checking whether G = Hd , where Hd is the transversal
hypergraph of H.
In the paper we provide a logical characterization of minimal transversals of
a given hypergraph and prove that checking whether G ⊆ Hd is tractable.
For the opposite inclusion the problem still remains open. However, we
interpret the resulting quantifier sequences in terms of determinism and bounded
nondeterminism. The results give better upper bounds than those known from the literature, e.g., in the case when the hypergraph H has a sub-logarithmic number of hyperedges and (for the deterministic case) all hyperedges have cardinality bounded by a function sub-linear wrt the maximum of the sizes of G and H.
Keywords: second-order logic, second-order quantifier elimination, descriptive
complexity, transversal hypergraph problem
1 Introduction
In the current paper we propose a rather general methodology for estimating the complexity of potentially intractable problems. The methodology consists of the following steps¹:
1. Specify the problem in the second-order logic.
The complexity of checking validity of second-order formulas in a finite model is PSpace-complete wrt the size of the model. Thus, for all problems in PSpace such a description exists. The existential fragment of the second-order logic² is NPTime-complete over finite models. Dually, the universal fragment of second-order logic is co-NPTime-complete over finite models.

⋆ Supported in part by the KBN grant 8 T11C 00919.
¹ Below and throughout the paper we apply well-known results of descriptive complexity theory. For the relevant details see, e.g., [5,12].
² I.e., the fragment consisting of formulas in which all second-order quantifiers are existential and appear only in prefixes of formulas.

A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 423–431, 2003.
© Springer-Verlag Berlin Heidelberg 2003
2. Try to eliminate second-order quantifiers.
An application of known methods, if successful, might result in³:
– a formula of the first-order logic, validity of which (over finite models) is in PTime and LogSpace. Here one can apply, e.g., the Ackermann lemma (see Lemma 2.4) or the SCAN algorithm of [10];
– a formula of the fixpoint logic, validity of which (over finite models) is in PTime.⁴ Here one can apply the elimination theorem of [15].
3. If the second-order quantifier elimination is not successful, which is likely to happen
for NPTime, co-NPTime or PSpace-complete problems, one can try to identify
subclasses of the problem, for which elimination of second-order quantifiers is
guaranteed. In such cases tractable (or quasi-polynomial) subproblems of the main
problem can be identified.
Below we apply the methodology to the transversal hypergraph problem and show
that inclusion in one direction is in PTime. We also identify some tractable and almost
tractable cases for verifying the opposite inclusion, and relate the results to a bounded
nondeterminism. Let us, however, emphasize that our main goal is to show how logic
can help in analyzing the complexity of problems which can be naturally expressed by
means of the second-order logic. The hypergraph problem is chosen mainly as a case
study.
Hypergraph theory [2] has many applications in computer science and artificial intelligence (see, e.g., [3,6,7,11,13]). In particular, the transversal hypergraph problem,
TransHyp, has attracted a great deal of attention. Many important problems of databases,
knowledge representation, Boolean circuits, duality theory, diagnosis, machine learning,
data mining, explanation finding, etc. can be reduced to TransHyp (see, e.g., [7]). However, the precise complexity of this problem has remained open for over twenty years. The best known algorithm, provided in [9], runs in quasi-polynomial time wrt the size of the input hypergraphs. More precisely, if n is the size of the input hypergraphs, then the algorithm of [9] requires $n^{o(\log n)}$ steps. The paper [8] relates TransHyp to limited nondeterminism by showing that the complement of the problem can be solved in polynomial time with $O(\chi(n) \cdot \log n)$ guessed bits, where $\chi(n)^{\chi(n)} = n$. As observed in [9], $\chi(n) \approx \log n/\log\log n = o(\log n)$.
2 Preliminaries
Let us first define notions related to the TransHyp problem. We provide definitions
slightly adapted for further logical characterization. However, the definitions are equivalent to those considered in the literature.
³ For an overview of known second-order quantifier elimination techniques see, e.g., [14].
⁴ Recall that fixpoint logic captures all problems solvable in deterministic polynomial time, provided that the underlying domain is linearly ordered.

Definition 2.1. By a hypergraph we mean a triple H = ⟨V, E, M⟩, where
– V and E are finite disjoint sets of elements and hyperedges, respectively
– M ⊆ E × V is an edge membership relation.
A transversal of H is any set T ⊆ V such that for any hyperedge e ∈ E there is v ∈ V
such that (T (v) ∧ M (e, v)) holds. A transversal is minimal iff it is minimal wrt set
inclusion.
In the sequel we sometimes identify hyperedges with sets of their members, i.e., any hyperedge e ∈ E of a hypergraph H = ⟨V, E, M⟩ is identified with the set {v ∈ V : M(e, v) holds}.
Definition 2.2. By the transversal hypergraph5 of a hypergraph H we mean hypergraph
Hd whose hyperedges are all minimal transversals of H.
Definition 2.3. By the transversal hypergraph problem, denoted by TransHyp, we mean
a problem of checking, for given hypergraphs G and H, whether G = Hd .
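For intuition (our own illustration, not part of the paper), the minimal transversals, and hence the transversal hypergraph, of a small instance can be enumerated by brute force; this is exponential in |V| and only meant for tiny inputs:

```python
from itertools import combinations

def minimal_transversals(V, edges):
    """All inclusion-minimal T ⊆ V hitting every hyperedge (edges: frozensets)."""
    hits = lambda T: all(T & e for e in edges)
    found = []
    for k in range(len(V) + 1):          # by increasing size, so every smaller
        for T in map(frozenset, combinations(sorted(V), k)):
            if hits(T) and not any(S < T for S in found):
                found.append(T)          # minimal transversal is already in found
    return set(found)

# H with hyperedges {1, 2} and {2, 3}
V = {1, 2, 3}
Hd = minimal_transversals(V, [frozenset({1, 2}), frozenset({2, 3})])
assert Hd == {frozenset({2}), frozenset({1, 3})}
# deciding TransHyp for a given G then amounts to comparing G's edge set with Hd
```

A transversal is non-minimal exactly when it properly contains a minimal one, which justifies the subset test against previously found sets.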
We say that a formula Φ is positive w.r.t. a predicate P iff any occurrence of P in
Φ appears within the scope of an even number of negations only6 . Dually, we say that
Φ is negative w.r.t. P iff any occurrence of P in Φ appears within the scope of an odd
number of negations only.
By $\Psi[P(\bar t) := \Phi^{\bar x}_{\bar t}]$ we understand the formula obtained from Ψ by replacing every occurrence of P in Ψ by Φ, where in each replacement the actual argument of P, say t̄, replaces the variables of x̄ in Φ (with renaming of bound variables whenever necessary).
The following lemma is substantial for the technique we propose.
Lemma 2.4. Let P be a predicate variable and let Φ and Ψ(P) be first-order formulas such that Ψ(P) is positive w.r.t. P and Φ contains no occurrences of P. Then
$$\exists P\,\big[\forall\bar x\,(P(\bar x) \rightarrow \Phi(\bar x)) \wedge \Psi(P)\big] \;\equiv\; \Psi[P(\bar t) := \Phi^{\bar x}_{\bar t}]$$
and similarly if the sign of P is switched to ¬ and Ψ is negative w.r.t. P.
Lemma 2.4 was proved by Ackermann in [1]. It can also be found in [16] and, in
the context of circumscription7 , in [4]. A substantially stronger elimination theorem
extending this lemma is given in [15].
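To see Lemma 2.4 at work on a minimal instance (our own illustrative choice, not from the paper), take Φ(x) = R(x) and Ψ(P) = ∃y P(y), which is positive w.r.t. P:

```latex
\exists P\,\big[\forall x\,(P(x) \rightarrow R(x)) \wedge \exists y\,P(y)\big]
   \;\equiv\; \exists y\,R(y)
% left to right: P \subseteq R, so any witness of P is a witness of R;
% right to left: given R(y), take P := \{y\}.
```

The right-hand side is exactly Ψ with P replaced by Φ, as the lemma prescribes.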
We shall also need the following simple proposition.
Proposition 2.5. Let P be a predicate variable and let Φ, Ψ be first-order formulas. Assume that P does not occur in Φ. Then
$$\exists P\,\big[\forall\bar x\,(P(\bar x) \equiv \Phi(\bar x)) \wedge \Psi(P)\big] \;\equiv\; \Psi[P(\bar t) := \Phi^{\bar x}_{\bar t}]$$
⁵ Called also a dual hypergraph.
⁶ Under the standard convention stating that the implication (Ψ1 → Ψ2) is treated as the disjunction (¬Ψ1 ∨ Ψ2), and the equivalence (Ψ1 ≡ Ψ2) is treated as the formula [(Ψ1 ∧ Ψ2) ∨ (¬Ψ1 ∧ ¬Ψ2)].
⁷ Observe that the conjunction (1)∧(2), substantial for our considerations, is simply the circumscribed formula (1), where T is minimized.
3 Characterization of Minimal Transversals of Hypergraphs
Obviously, T is a transversal of a hypergraph H = ⟨V, E, M⟩ iff
∀e ∈ E ∃v ∈ V (T(v) ∧ M(e, v)).
It is a minimal transversal iff

∀e ∈ E ∃v ∈ V (T(v) ∧ M(e, v)) ∧    (1)
∀T′ {[∀e ∈ E ∃v ∈ V (T′(v) ∧ M(e, v)) ∧ ∀w ∈ V (T′(w) → T(w))] → ∀u ∈ V (T(u) → T′(u))}    (2)
Formula (2) is a universal second-order formula. Application of this formula to the verification whether a given transversal is minimal is thus in co-NPTime. On the other hand, one can eliminate the second-order quantification by applying Lemma 2.4. To do this, we first negate (2):

∃T′ {[∀e ∈ E ∃v ∈ V (T′(v) ∧ M(e, v)) ∧ ∀w ∈ V (T′(w) → T(w))] ∧ ∃u ∈ V (T(u) ∧ ¬T′(u))}    (3)
Formula (3) is equivalent to

∃u ∈ V ∃T′ [∀w ∈ V (T′(w) → T(w)) ∧ ∀e ∈ E ∃v ∈ V (T′(v) ∧ M(e, v)) ∧ T(u) ∧ ¬T′(u)],    (4)

i.e., to

∃u ∈ V ∃T′ [∀w ∈ V (T′(w) → T(w)) ∧ ∀e ∈ E ∃v ∈ V (T′(v) ∧ M(e, v)) ∧ T(u) ∧ ∀w ∈ V (T′(w) → w ≠ u)],

and finally, to

∃u ∈ V ∃T′ [∀w ∈ V (T′(w) → (T(w) ∧ w ≠ u)) ∧ ∀e ∈ E ∃v ∈ V (T′(v) ∧ M(e, v)) ∧ T(u)].    (5)

After the application of Lemma 2.4 we obtain the following formula equivalent to (5):

∃u ∈ V [∀e ∈ E ∃v ∈ V (T(v) ∧ v ≠ u ∧ M(e, v)) ∧ T(u)].    (6)
After negating formula (6) and rearranging the result, we obtain the following first-order
formula equivalent to (2):
∀u ∈ V [T (u) → ∃e ∈ E∀v ∈ V ((T (v) ∧ M (e, v)) → v = u)].
(7)
Let H = ⟨V, E, M⟩ be a hypergraph. In the sequel we use the notation MinH(T), defined by

MinH(T) ≝ ∀e ∈ E ∃v ∈ V (T(v) ∧ M(e, v)) ∧
          ∀u ∈ V [T(u) → ∃e ∈ E ∀v ∈ V ((T(v) ∧ M(e, v)) → v = u)].    (8)

We now have the following lemma.
Lemma 3.1. For any hypergraph H = ⟨V, E, M⟩, T is a minimal transversal of H iff it satisfies formula MinH(T). In consequence⁸, checking whether a given T is a minimal transversal of a hypergraph is in PTime and LogSpace wrt the size of the hypergraph.
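As a sanity check (our own, not part of the paper), formula (8) can be evaluated directly over an explicit small hypergraph, with the membership relation M given as a set of pairs, and compared against a brute-force minimality test:

```python
from itertools import combinations

def min_h(V, E, M, T):
    """Evaluate formula (8): T hits every hyperedge, and every u in T is the
    sole member of T hitting some hyperedge (so no element is redundant)."""
    transversal = all(any(v in T and (e, v) in M for v in V) for e in E)
    tight = all(
        any(all(not (v in T and (e, v) in M) or v == u for v in V) for e in E)
        for u in T)
    return transversal and tight

def brute_minimal(V, E, M, T):
    """Reference check: T is a transversal and no T \\ {u} is one."""
    tr = lambda S: all(any(v in S and (e, v) in M for v in V) for e in E)
    return tr(T) and not any(tr(T - {u}) for u in T)

V = {1, 2, 3}
E = {"e1", "e2"}
M = {("e1", 1), ("e1", 2), ("e2", 2), ("e2", 3)}
for k in range(len(V) + 1):
    for T in map(set, combinations(sorted(V), k)):
        assert min_h(V, E, M, T) == brute_minimal(V, E, M, T)
```

Since being a transversal is monotone under set inclusion, minimality is equivalent to "no single-element removal stays a transversal", which is what both checks exploit.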
4 Specification of the TransHyp Problem in Logic

4.1 Specification of the TransHyp Problem in the Second-Order Logic
Let G = ⟨V, EG, MG⟩ and H = ⟨V, EH, MH⟩ be hypergraphs. In order to check whether G = Hd, we verify the inclusions G ⊆ Hd and Hd ⊆ G. The inclusions can be characterized in the second-order logic as follows:

∀e ∈ EG ∃e′ ∈ E_Hd ∀v ∈ V (MG(e, v) ≡ M_Hd(e′, v))    (9)
∀e′ ∈ E_Hd ∃e ∈ EG ∀v ∈ V (MG(e, v) ≡ M_Hd(e′, v)).    (10)

According to Lemma 3.1, formulas (9) and (10) can be expressed as

∀e ∈ EG ∃T [MinH(T) ∧ ∀v ∈ V (MG(e, v) ≡ T(v))]    (11)
∀T [MinH(T) → ∃e ∈ EG ∀v ∈ V (MG(e, v) ≡ T(v))].    (12)

The above specification leads to intractable algorithms (unless PTime = NPTime). In the following sections we attempt to reduce the complexity by eliminating second-order quantifiers from formulas (11) and (12).
4.2 The Case of Inclusion G ⊆ Hd

Consider the second-order part of formula (11), i.e.,

∃T [MinH(T) ∧ ∀v ∈ V (MG(e, v) ≡ T(v))].    (13)

Due to equivalence (8), Lemma 3.1 and Proposition 2.5, formula (13) is equivalent to⁹

∀e′ ∈ EH ∃v ∈ V (MG(e, v) ∧ MH(e′, v)) ∧
∀u ∈ V [MG(e, u) → ∃e′ ∈ EH ∀v ∈ V ((MG(e, v) ∧ MH(e′, v)) → v = u)].    (14)

In consequence, formula (11) is equivalent to

∀e ∈ EG ∀e′ ∈ EH ∃v ∈ V (MG(e, v) ∧ MH(e′, v)) ∧
∀e ∈ EG ∀u ∈ V [MG(e, u) → ∃e′ ∈ EH ∀v ∈ V ((MG(e, v) ∧ MH(e′, v)) → v = u)].    (15)
Thus the inclusion G ⊆ Hd is first-order definable by formula (15). We then have the following corollary.

Corollary 4.1. For any hypergraphs G = ⟨V, EG, MG⟩ and H = ⟨V, EH, MH⟩, checking whether G ⊆ Hd is in PTime and LogSpace wrt the maximum of the sizes of the hypergraphs G and H.
⁸ This easily follows from equivalence (8), by which MinH(T) is characterized by a first-order formula.
⁹ Note that in order to apply Proposition 2.5, the bound variable e is renamed to e′.
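Corollary 4.1 also yields a direct algorithm: by formula (15), G ⊆ Hd holds iff every edge of G is a minimal transversal of H. A hedged Python sketch (function names and the encoding of edges as vertex sets are mine, not from the paper):

```python
# Sketch of the Corollary 4.1 test: G ⊆ H^d iff each edge of G is a minimal
# transversal of H, which is exactly what formula (15) states first-order.

def is_minimal_transversal(T, edges):
    T = set(T)
    return (all(T & e for e in edges)
            and all(any(T & e == {u} for e in edges) for u in T))

def included_in_dual(G_edges, H_edges):
    # Formula (15): each e in E_G meets every edge of H, and each of its
    # vertices has a private edge in H.
    return all(is_minimal_transversal(e, H_edges) for e in G_edges)

H = [{1, 2}, {2, 3}, {3, 4}]
# The minimal transversals of H are {2,3}, {2,4} and {1,3}, i.e., H^d.
print(included_in_dual([{2, 3}, {1, 3}], H))  # True
print(included_in_dual([{1, 2}], H))          # {1,2} misses {3,4}: False
```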
A. Szałas
4.3 The Case of Inclusion Hd ⊆ G
Unfortunately, no known second-order quantifier elimination method is successful for
the inclusion (12). We thus equivalently transform formula (12) to a form where Lemma
2.4 is applicable. The verification of the resulting formula in finite models is, in general, of
exponential complexity. However, when some restrictions are assumed, the complexity
reduces to deterministic polynomial or quasi-polynomial time, as shown below.
By (8), formula (12) is equivalent to
∀T {[∀e ∈ EH ∃v ∈ V (T(v) ∧ MH(e, v)) ∧    (16)
∀u ∈ V [T(u) → ∃e ∈ EH ∀v ∈ V ((T(v) ∧ MH(e, v)) → v = u)]] →
∃e ∈ EG ∀v ∈ V (MG(e, v) ≡ T(v))}.
Let us assume that the inclusion G ⊆ Hd holds. If it does not, then the answer to TransHyp for this particular instance is negative. Under this assumption, formula (16) is equivalent to¹⁰
∀T {[∀e ∈ EH ∃v ∈ V (T(v) ∧ MH(e, v)) ∧    (17)
∀u ∈ V [T(u) → ∃e ∈ EH ∀v ∈ V ((T(v) ∧ MH(e, v)) → v = u)]] →
∃e ∈ EG ∀v ∈ V (MG(e, v) → T(v))}.
In order to apply Lemma 2.4 we first negate (17):
∃T {∀e ∈ EH ∃v ∈ V (T(v) ∧ MH(e, v)) ∧    (18)
∀u ∈ V [T(u) → ∃e ∈ EH ∀v ∈ V ((T(v) ∧ MH(e, v)) → v = u)] ∧
∀e ∈ EG ∃v ∈ V (MG(e, v) ∧ ¬T(v))}.
In order to simplify calculations, by Γ(T) we denote the conjunction of the formulas in the last two lines of (18). Formula (18) is then expressed by
∃T {∀e ∈ EH ∃v ∈ V (T(v) ∧ MH(e, v)) ∧ Γ(T)}.    (19)
Observe that Γ(T) is negative wrt T. Thus the main obstacle to applying Lemma 2.4 is created by the existential quantifier ∃v ∈ V appearing within the scope of ∀e ∈ EH.
Assume EH = {e1, . . . , ek} and denote by Ve the set {x : MH(e, x) holds}. Formula (19) can then be expressed by
∃T {∃v1 ∈ Ve1 T(v1) ∧ . . . ∧ ∃vk ∈ Vek T(vk) ∧ Γ(T)},
i.e., by
∃v1 ∈ Ve1 . . . ∃vk ∈ Vek ∃T {T(v1) ∧ . . . ∧ T(vk) ∧ Γ(T)},
which is equivalent to
∃v1 ∈ Ve1 . . . ∃vk ∈ Vek ∃T {∀v ∈ V [(v = v1 ∨ . . . ∨ v = vk) → T(v)] ∧ Γ(T)}.
¹⁰ By the minimality of Hd and the assumption G ⊆ Hd, the inclusion expressed by ∀v ∈ V (MG(e, v) → T(v)) is equivalent to the set equality expressed by ∀v ∈ V (MG(e, v) ≡ T(v)).
The application of Lemma 2.4 results in the following first-order formula:
∃v1 ∈ Ve1 . . . ∃vk ∈ Vek {Γ[T(t) := (t = v1 ∨ . . . ∨ t = vk)]}.
In consequence, formula (17) is equivalent to
∀v1 ∈ Ve1 . . . ∀vk ∈ Vek {¬Γ[T(t) := (t = v1 ∨ . . . ∨ t = vk)]},
i.e., to
∀v1 ∈ Ve1 . . . ∀vk ∈ Vek ∀u ∈ V {[(u = v1 ∨ . . . ∨ u = vk) →    (20)
∃e ∈ EH ∀v ∈ V [((v = v1 ∨ . . . ∨ v = vk) ∧ MH(e, v)) → v = u]] →
∃e ∈ EG ∀v ∈ V (MG(e, v) → (v = v1 ∨ . . . ∨ v = vk))}.
The dominant part of the complexity of checking whether given hypergraphs satisfy formula (20) comes from the sequence of quantifiers ∀v1 ∈ Ve1 . . . ∀vk ∈ Vek ∀u ∈ V. We then have the following theorem.
Theorem 4.2. For given hypergraphs G and H such that G ⊆ Hd, the problem of checking whether Hd ⊆ G is solvable in time O(|Ve1| ∗ . . . ∗ |Vek| ∗ p(n)), where p(n) is a polynomial¹¹, n is the maximum of the sizes of G and H, k is the number of edges in H, and, for i = 1, . . . , k, |Vei| denotes the cardinality of the set {x : MH(ei, x) holds}.
Accordingly, we have the following corollary.
Corollary 4.3. Under the assumptions of Theorem 4.2, if the cardinalities |Ve1|, . . . , |Vek| are bounded by a function f(n), then the problem of checking whether Hd ⊆ G is solvable in time O(f(n)^k ∗ p(n)).
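The procedure behind Theorem 4.2 can be sketched as follows (my own illustration, with edges as vertex sets): every minimal transversal of H arises by picking one vertex from each edge, so it suffices to scan the product Ve1 × . . . × Vek, i.e., |Ve1| ∗ . . . ∗ |Vek| candidates, each checked in polynomial time.

```python
# Sketch of the Theorem 4.2 enumeration: decide H^d ⊆ G, assuming G ⊆ H^d
# already holds, by ranging over one vertex per edge of H, as in the
# quantifier prefix of formula (20).

from itertools import product

def is_minimal_transversal(T, edges):
    T = set(T)
    return (all(T & e for e in edges)
            and all(any(T & e == {u} for e in edges) for u in T))

def dual_included(H_edges, G_edges):
    # Any minimal transversal S of H satisfies S = {v1, ..., vk} for some
    # choice of vi from the i-th edge (a strictly smaller selection would
    # contradict minimality), so scanning the product covers all of H^d.
    for choice in product(*H_edges):
        S = set(choice)
        if is_minimal_transversal(S, H_edges):
            # S belongs to H^d; the inclusion requires S to be an edge of G.
            if S not in [set(e) for e in G_edges]:
                return False
    return True

H = [{1, 2}, {2, 3}, {3, 4}]
print(dual_included(H, [{2, 3}, {2, 4}, {1, 3}]))  # True: exactly H^d
print(dual_included(H, [{2, 3}]))                  # False: {2, 4} is missing
```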
In view of the result given in [9], Corollary 4.3 can be useful if k is bounded by a (sub-)logarithmic function and f(n) is (sub-)linear wrt n. For instance, if both k and f(n) are bounded by log n, then the corollary gives us an upper bound of O((log n)^log n ∗ p(n)), which is better than that offered by the algorithm of [9]. Let us emphasize that in many cases |V|, and consequently f(n), is bounded by log n, since the dual hypergraph might be of size exponential wrt |V|.
The characterization provided by formula (20) is also related to bounded nondeterminism. Namely, consider the complement of the TransHyp problem. The sequence of quantifiers ∀v1 ∈ Ve1 . . . ∀vk ∈ Vek appearing in formula (20) is transformed into ∃v1 ∈ Ve1 . . . ∃vk ∈ Vek. In order to verify the negated formula it then suffices to guess k sequences of bits, each of size not greater than log max{|Ve| : e ∈ EH}. Thus, in the worst case, it suffices to guess k ∗ log |V| bits. By the result of [8], mentioned in Section 1, O(log² n) guessed bits suffice to solve the TransHyp problem in deterministic polynomial time. Thus the observation we just made is useful, e.g., when one considers an input hypergraph H whose number of edges is (sub-)logarithmic wrt n. Observe, however, that often n is exponentially larger than |V|.¹²
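As a back-of-envelope illustration of the bit counting (the concrete numbers are my own assumptions, not from the paper):

```python
# Guessing one vertex per edge of H costs at most k * ceil(log2 |V|) bits,
# which stays within the O(log^2 n) budget of [8] whenever k is
# (sub-)logarithmic in n.

import math

def guessed_bits(k, num_vertices):
    # k choices, each from a set of at most |V| vertices.
    return k * math.ceil(math.log2(num_vertices))

n = 2 ** 20   # size of the instance
V = 1024      # |V| small relative to n, as is typical in dualization
k = 20        # sub-logarithmic number of edges of H
print(guessed_bits(k, V))  # 200 bits
print(math.log2(n) ** 2)   # O(log^2 n) budget: 400.0
```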
11
12
Reflecting the complexity introduced by quantifiers inside formula (20).
This frequently happens in the duality theory, where the number of prime implicants and
implicates is exponential wrt the size of the input formula.
5 Conclusions
In this paper we presented a purely logical approach to estimating the computational complexity of potentially intractable problems. We illustrated the approach on the complexity of the TransHyp problem. We provided a logical characterization of minimal transversals of a given hypergraph and proved that checking the inclusion G ⊆ Hd is tractable. For the opposite inclusion the problem remains open. However, we interpreted the resulting quantifier sequences in terms of determinism and bounded nondeterminism. The results give better upper bounds than those known from the literature in the case when hypergraph H has a sub-logarithmic number of hyperedges and (for the deterministic case) all hyperedges have cardinality bounded by a function sub-linear wrt the maximum of the sizes of the input hypergraphs.
Let us also emphasize that only the simplest second-order quantifier elimination techniques were applied. In some cases it might be useful to apply the theorem of [15], which results in a fixpoint formula, i.e., a much stronger but still tractable formalism.
References
1. W. Ackermann. Untersuchungen über das Eliminationsproblem der mathematischen Logik. Mathematische Annalen, 110:390–413, 1935.
2. C. Berge. Hypergraphs, volume 45 of North-Holland Mathematical Library. Elsevier, 1989.
3. E. Boros, V. Gurvich, L. Khachiyan, and K. Makino. Generating partial and multiple transversals of a hypergraph. In Automata, Languages and Programming, volume 1853 of Lecture
Notes in Computer Science, pages 588–599. Springer, 2000.
4. P. Doherty, W. Łukaszewicz, and A. Szałas. Computing circumscription revisited. Journal of Automated Reasoning, 18(3):297–336, 1997.
5. H.-D. Ebbinghaus and J. Flum. Finite Model Theory. Springer-Verlag, Heidelberg, 1995.
6. T. Eiter and G. Gottlob. Identifying the minimal transversals of a hypergraph and related
problems. SIAM Journal on Computing, 24(6):1278–1304, 1995.
7. T. Eiter and G. Gottlob. Hypergraph transversal computation and related problems in logic
and AI. In S. Flesca, S. Greco, N. Leone, and G. Ianni, editors, Proceedings of the 8th
Conference JELIA 2002, LNAI 2424, pages 549–564. Springer-Verlag, 2002.
8. T. Eiter, G. Gottlob, and K. Makino. New results on monotone dualization and generating
hypergraph transversals. In ACM STOC 2002, pages 14–22, 2002.
9. M.L. Fredman and L. Khachiyan. On the complexity of dualization of monotone disjunctive
normal forms. Journal of Algorithms, 21:618–628, 1996.
10. D. M. Gabbay and H. J. Ohlbach. Quantifier elimination in second-order predicate logic.
In B. Nebel, C. Rich, and W. Swartout, editors, Principles of Knowledge Representation and Reasoning, KR '92, pages 425–435. Morgan Kaufmann, 1992.
11. G. Gogic, C.H. Papadimitriou, and M. Sideri. Incremental recompilation of knowledge.
Journal of Artificial Intelligence Research, 8:23–37, 1998.
12. N. Immerman. Descriptive Complexity. Springer-Verlag, New York, Berlin, 1998.
13. D.J. Kavvadias and E.C. Stavropoulos. Evaluation of an algorithm for the transversal hypergraph problem. In J. Scott Vitter and C. D. Zaroliagis, editors, Algorithm Engineering, 3rd
International Workshop, WAE ’99, volume 1668 of Lecture Notes in Computer Science, pages
72–84. Springer, 1999.
14. A. Nonnengart, H.J. Ohlbach, and A. Szałas. Elimination of predicate quantifiers. In H.J.
Ohlbach and U. Reyle, editors, Logic, Language and Reasoning. Essays in Honor of Dov
Gabbay, Part I, pages 159–181. Kluwer, 1999.
15. A. Nonnengart and A. Szałas. A fixpoint approach to second-order quantifier elimination
with applications to correspondence theory. In E. Orłowska, editor, Logic at Work: Essays
Dedicated to the Memory of Helena Rasiowa, volume 24 of Studies in Fuzziness and Soft
Computing, pages 307–328. Springer Physica-Verlag, 1998.
16. A. Szałas. On the correspondence between modal and classical logic: An automated approach.
Journal of Logic and Computation, 3:605–620, 1993.
Author Index
Ablayev, Farid 296
Aleksandrov, Lyudmil 246
Angel, Eric 39
Antunes, Luís 303
Arora, Sanjeev 1
Arpe, Jan 158
Asano, Takao 2
Bampis, Evripidis 39
Berstel, Jean 343
Boasson, Luc 343
Bodlaender, Hans 61
Brandstädt, Andreas 61
Bugliesi, Michele 364
Carton, Olivier 343
Ceccato, Ambra 364
Chlebík, Miroslav 27
Chlebíková, Janka 27
Cieliebak, Mark 98
Coja-Oghlan, Amin 15
Damaschke, Peter 183
Damgård, Ivan Bjerre 109, 118
Eidenbenz, Stephan 98
Evans, Patricia A. 210
Fokkink, Wan 412
Fomin, Fedor V. 73
Fortnow, Lance 303
Frandsen, Gudmund Skovbjerg 109, 118
Gainutdinova, Aida 296
Glabbeek, Rob van 412
Goerdt, Andreas 15
Gourvès, Laurent 39
Gramm, Jens 195
Gudmundsson, Joachim 86
Guo, Jiong 195
Halava, Vesa 355
Hammar, Mikael 234
Hansen, Kristoffer Arnsfelt 171
Harju, Tero 355
Heggernes, Pinar 73
Hoogeboom, Hendrik Jan 355
Jakoby, Andreas 158
Kik, Marcin 132
Kratsch, Dieter 61
Kuich, Werner 376
Kutrib, Martin 321
Lanka, André 15
Latteux, Michel 355
Liśkiewicz, Maciej 158
Lipton, Richard J. 311
Maheshwari, Anil 246
Mastrolilli, Monaldo 49
Miltersen, Peter Bro 171
Moser, Philippe 333
Niedermeier, Rolf 195
Nilsson, Bengt J. 234
Pagourtzis, Aris 98
Papadimitriou, Christos 157
Păun, Gheorghe 284
Pech, Christian 387
Persson, Mia 234
Petazzoni, Bruno 343
Pin, Jean-Éric 343
Rao, Michaël 61
Reif, John H. 258, 271
Rossi, Sabina 364
Sack, Jörg-Rüdiger 246
Schädlich, Frank 15
Smith, Andrew D. 210
Spinrad, Jeremy 61
Stachowiak, Grzegorz 144
Sun, Zheng 258, 271
Szałas, Andrzej 423
Tantau, Till 400
Telle, Jan Arne 73
Viglas, Anastasios 311
Vinay, V. 171
Vinodchandran, N.V. 303
Wagner, Klaus W. 376
Wind, Paulien de 412
Zhu, Binhai 222