Lecture Notes in Computer Science 2751
Edited by G. Goos, J. Hartmanis, and J. van Leeuwen

Andrzej Lingas, Bengt J. Nilsson (Eds.)

Fundamentals of Computation Theory
14th International Symposium, FCT 2003, Malmö, Sweden, August 12-15, 2003, Proceedings

Series Editors: Gerhard Goos, Karlsruhe University, Germany; Juris Hartmanis, Cornell University, NY, USA; Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors: Andrzej Lingas, Lund University, Department of Computer Science, Box 118, 221 00 Lund, Sweden, E-mail: Andrzej.Lingas@cs.lth.se; Bengt J. Nilsson, Malmö University College, School of Technology and Society, 205 06 Malmö, Sweden, E-mail: Bengt.Nilsson@ts.mah.se

CR Subject Classification (1998): F.1, F.2, F.4, I.3.5, G.2
ISSN 0302-9743
ISBN 3-540-40543-7 Springer-Verlag Berlin Heidelberg New York
© Springer-Verlag Berlin Heidelberg 2003

Preface

The papers in this volume were presented at the 14th Symposium on Fundamentals of Computation Theory. The symposium was established in 1977 as a biennial event for researchers interested in all aspects of theoretical computer science, in particular in algorithms, complexity, and formal and logical methods. The previous FCT conferences were held in the following cities: Poznań (Poland, 1977), Wendisch-Rietz (Germany, 1979), Szeged (Hungary, 1981), Borgholm (Sweden, 1983), Cottbus (Germany, 1985), Kazan (Russia, 1987), Szeged (Hungary, 1989), Gosen-Berlin (Germany, 1991), Szeged (Hungary, 1993), Dresden (Germany, 1995), Kraków (Poland, 1997), Iasi (Romania, 1999), and Riga (Latvia, 2001). The FCT conferences are coordinated by the FCT steering committee, which consists of B. Chlebus (Denver/Warsaw), Z. Esik (Szeged), M. Karpinski (Bonn), A. Lingas (Lund), M. Santha (Paris), E. Upfal (Providence), and I. Wegener (Dortmund).
The call for papers sought contributions on original research in all aspects of theoretical computer science, including design and analysis of algorithms, abstract data types, approximation algorithms, automata and formal languages, categorical and topological approaches, circuits, computational and structural complexity, circuit and proof theory, computational biology, computational geometry, computer systems theory, concurrency theory, cryptography, domain theory, distributed algorithms and computation, molecular computation, quantum computation and information, granular computation, probabilistic computation, learning theory, rewriting, semantics, logic in computer science, specification, transformation and verification, and algebraic aspects of computer science. There were 73 papers submitted, of which the majority were very good. Because of the FCT format, the program committee could select only 36 papers for presentation. In addition, invited lectures were presented by Sanjeev Arora (Princeton), George Păun (Romanian Academy), and Christos Papadimitriou (Berkeley). FCT 2003 was held on August 13-15, 2003, in Malmö, and Andrzej Lingas (Lund University) and Bengt Nilsson (Malmö University College) were, respectively, the program committee and the conference chairs.

We wish to thank all referees who helped to evaluate the papers. We are grateful to Lund University, Malmö University College, and the Swedish Research Council for their support.

Lund, May 2003
Andrzej Lingas
Bengt J. Nilsson

Organization

Organizing Committee
Bengt Nilsson, Malmö (Chair); Oscar Garrido, Malmö; Thore Husfeldt, Lund; Miroslaw Kowaluk, Warsaw

Program Committee
Arne Andersson, Uppsala; Stefan Arnborg, KTH Stockholm; Stephen Alstrup, ITU Copenhagen; Zoltan Esik, Szeged; Rusins Freivalds, UL Riga; Alan Frieze, CMU Pittsburgh; Leszek Gąsieniec, Liverpool; Magnus Halldórsson, UI Reykjavik; Klaus Jansen, Kiel; Juhani Karhumäki, Turku; Marek Karpinski, Bonn; Christos Levcopoulos, Lund; Ming Li, Santa Barbara; Andrzej Lingas, Lund (Chair); Jan Maluszyński, Linköping; Fernando Orejas, Barcelona; Jürgen Prömel, Berlin; Rüdiger Reischuk, Lübeck; Wojciech Rytter, Warsaw/NJIT; Miklos Santha, Paris-Sud; Andrzej Skowron, Warsaw; Paul Spirakis, Patras; Esko Ukkonen, Helsinki; Ingo Wegener, Dortmund; Pawel Winter, Copenhagen; Vladimiro Sassone, Sussex

Referees
M. Albert, A. Aldini, J. Arpe, A. Barvinok, C. Bazgan, S.L. Bloom, M. Bläser, M. Bodirsky, B. Bollig, C. Braghin, R. Bruni, A. Bucalo, G. Buntrock, M. Buscemi, B. Chandra, J. Chlebikova, A. Coja-Oghlan, L.A. Cortes, W.F. de la Vega, M. de Rougemont, W. Drabent, S. Droste, C. Durr, M. Dyer, L. Engebretsen, H. Eriksson, L.M. Favrholdt, H. Fernau, A. Ferreira, A. Fishkin, A. Flaxman, D. Fotakis, O. Gerber, G. Ghelli, O. Giel, M. Grantson, J. Gudmundsson, V. Halava, B.V. Halldórsson, L. Hemaspaandra, M. Hermo, M. Hirvensalo, F. Hoffmann, T. Hofmeister, J. Holmerin, J. Hromkovic, L. Ilie, A. Jakoby, T. Jansen, J. Jansson, A. Jarry, M. Jerrum, P. Kanarek, J. Kari, R. Karlsson, J. Katajainen, A. Kiehn, H. Klaudel, B. Klin, B. Konev, S. Kontogiannis, J. Kortelainen, G. Kortsarz, M. Koutny, D. Kowalski, M. Krivelevich, K.N. Kumar, M. Kääriäinen, G. Lancia, R. Lassaigne, M. Latteux, M. Libura, M. Liśkiewicz, K. Loryś, E.M. Lundell, F. Magniez, B. Manthey, M. Margraf, N. Marti-Oliet, M. Mavronicolas, E. Mayordomo, C. McDiarmid, T. Mielikäinen, M. Mitzenmacher, S. Nikoletseas, B.J. Nilsson, U. Nilsson, J. Nordström, H. Ohsaki, D. Osthus, A. Palbom, G. Persiano, T. Petkovic, I. Potapov, C. Priami, E. Prouff, K. Reinert, J. Rousu, M. Sauerhoff, H. Shachnai, J. Shallit, D. Slezak, J. Srba, F. Stephan, O. Sykora, P.
Tadepalli, M. Takeyama, A. Taraz, P. Thiemann, M. Thimm, P. Valtr, S.P.M. van Hoesel, J. van Leeuwen, S. Vempala, Y. Verhoeven, E. Venigoda, H. Vogler, B. Vöcking, H. Völzer, A.P.M. Wagelmans, R. Wanka, M. Westermann, A. Wojna, J. Wroblewski, Q. Xin, M. Zachariasen, G. Zhang, G.Q. Zhang, H. Zhang

Table of Contents

Approximability 1
Proving Integrality Gaps without Knowing the Linear Program (Sanjeev Arora) 1
An Improved Analysis of Goemans and Williamson's LP-Relaxation for MAX SAT (Takao Asano) 2
Certifying Unsatisfiability of Random 2k-SAT Formulas Using Approximation Techniques (Amin Coja-Oghlan, Andreas Goerdt, André Lanka, Frank Schädlich) 15

Approximability 2
Inapproximability Results for Bounded Variants of Optimization Problems (Miroslav Chlebík, Janka Chlebíková) 27
Approximating the Pareto Curve with Local Search for the Bicriteria TSP(1,2) Problem (Eric Angel, Evripidis Bampis, Laurent Gourvès) 39
Scheduling to Minimize Max Flow Time: Offline and Online Algorithms (Monaldo Mastrolilli) 49

Algorithms 1
Linear Time Algorithms for Some NP-Complete Problems on (P5,Gem)-Free Graphs (Hans Bodlaender, Andreas Brandstädt, Dieter Kratsch, Michaël Rao, Jeremy Spinrad) 61
Graph Searching, Elimination Trees, and a Generalization of Bandwidth (Fedor V. Fomin, Pinar Heggernes, Jan Arne Telle) 73
Constructing Sparse t-Spanners with Small Separators (Joachim Gudmundsson) 86
Composing Equipotent Teams (Mark Cieliebak, Stephan Eidenbenz, Aris Pagourtzis) 98

Algorithms 2
Efficient Algorithms for GCD and Cubic Residuosity in the Ring of Eisenstein Integers (Ivan Bjerre Damgård, Gudmund Skovbjerg Frandsen) 109
An Extended Quadratic Frobenius Primality Test with Average and Worst Case Error Estimates (Ivan Bjerre Damgård, Gudmund Skovbjerg Frandsen) 118
Periodic Multisorting Comparator Networks (Marcin Kik) 132
Fast Periodic Correction Networks (Grzegorz Stachowiak) 144

Networks and Complexity
Games and Networks (Christos Papadimitriou) 157
One-Way Communication Complexity of Symmetric Boolean Functions (Jan Arpe, Andreas Jakoby, Maciej Liśkiewicz) 158
Circuits on Cylinders (Kristoffer Arnsfelt Hansen, Peter Bro Miltersen, V. Vinay) 171

Computational Biology
Fast Perfect Phylogeny Haplotype Inference (Peter Damaschke) 183
On Exact and Approximation Algorithms for Distinguishing Substring Selection (Jens Gramm, Jiong Guo, Rolf Niedermeier) 195
Complexity of Approximating Closest Substring Problems (Patricia A. Evans, Andrew D. Smith) 210

Computational Geometry
On Lawson's Oriented Walk in Random Delaunay Triangulations (Binhai Zhu) 222
Competitive Exploration of Rectilinear Polygons (Mikael Hammar, Bengt J. Nilsson, Mia Persson) 234
An Improved Approximation Algorithm for Computing Geometric Shortest Paths (Lyudmil Aleksandrov, Anil Maheshwari, Jörg-Rüdiger Sack) 246
Adaptive and Compact Discretization for Weighted Region Optimal Path Finding (Zheng Sun, John H. Reif) 258
On Boundaries of Highly Visible Spaces and Applications (John H. Reif, Zheng Sun) 271

Computational Models and Complexity
Membrane Computing (Gheorghe Păun) 284
Classical Simulation Complexity of Quantum Machines (Farid Ablayev, Aida Gainutdinova) 296
Using Depth to Capture Average-Case Complexity (Luís Antunes, Lance Fortnow, N.V. Vinodchandran) 303

Structural Complexity
Non-uniform Depth of Polynomial Time and Space Simulations (Richard J. Lipton, Anastasios Viglas) 311
Dimension- and Time-Hierarchies for Small Time Bounds (Martin Kutrib) 321
Baire's Categories on Small Complexity Classes (Philippe Moser) 333

Formal Languages
Operations Preserving Recognizable Languages (Jean Berstel, Luc Boasson, Olivier Carton, Bruno Petazzoni, Jean-Éric Pin) 343
Languages Defined by Generalized Equality Sets (Vesa Halava, Tero Harju, Hendrik Jan Hoogeboom, Michel Latteux) 355
Context-Sensitive Equivalences for Non-interference Based Protocol Analysis (Michele Bugliesi, Ambra Ceccato, Sabina Rossi) 364
On the Exponentiation of Languages (Werner Kuich, Klaus W. Wagner) 376
Kleene's Theorem for Weighted Tree-Automata (Christian Pech) 387

Logic
Weak Cardinality Theorems for First-Order Logic (Till Tantau) 400
Compositionality of Hennessy-Milner Logic through Structural Operational Semantics (Wan Fokkink, Rob van Glabbeek, Paulien de Wind) 412
On a Logical Approach to Estimating Computational Complexity of Potentially Intractable Problems (Andrzej Szalas) 423

Author Index 433
Proving Integrality Gaps without Knowing the Linear Program

Sanjeev Arora
Princeton University

During the past decade we have had much success in proving (using probabilistically checkable proofs or PCPs) that computing approximate solutions to NP-hard optimization problems such as CLIQUE, COLORING, SET-COVER, etc. is no easier than computing optimal solutions. After the above notable successes, this effort is now stuck for many other problems, such as METRIC TSP, VERTEX COVER, GRAPH EXPANSION, etc. In a recent paper with Béla Bollobás and László Lovász we argue that NP-hardness of approximation may be too ambitious a goal in these cases, since NP-hardness implies a lower bound – assuming P ≠ NP – on all polynomial time algorithms. A less ambitious goal might be to prove a lower bound on restricted families of algorithms. Linear and semidefinite programs constitute a natural family, since they are used to design most approximation algorithms in practice. A lower bound result for a large subfamily of linear programs may then be viewed as a lower bound for a restricted computational model, analogous, say, to lower bounds for monotone circuits. The above paper showed that three fairly general families of linear relaxations for vertex cover cannot be used to design a 2-approximation for Vertex Cover. Our methods seem relevant to other problems as well. This talk surveys this work, as well as other open problems in the field. The most interesting families of relaxations involve those obtained by the so-called lift-and-project methods of Lovász-Schrijver and Sherali-Adams. Proving lower bounds for such linear relaxations involves elements of combinatorics (i.e., strong forms of classical Erdős theorems), proof complexity, and the theory of convex sets.

References
1. S. Arora, B. Bollobás, and L. Lovász. Proving integrality gaps without knowing the linear program. Proc. IEEE FOCS 2002.
2. S. Arora and C. Lund. Hardness of approximations. In [3].
3. D. Hochbaum, ed. Approximation Algorithms for NP-hard Problems. PWS Publishing, Boston, 1996.
4. L. Lovász and A. Schrijver. Cones of matrices and set-functions, and 0-1 optimization. SIAM Journal on Optimization, 1:166-190, 1990.
5. H.D. Sherali and W.P. Adams. A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM Journal on Discrete Mathematics, 3:411-430, 1990.

An Improved Analysis of Goemans and Williamson's LP-Relaxation for MAX SAT

Takao Asano
Department of Information and System Engineering, Chuo University, Bunkyo-ku, Tokyo 112-8551, Japan
asano@ise.chuo-u.ac.jp

Abstract. For MAX SAT, which is a well-known NP-hard problem, many approximation algorithms have been proposed. Two types of best approximation algorithms for MAX SAT were proposed by Asano and Williamson: one with best proven performance guarantee 0.7846 and the other with performance guarantee 0.8331 if a conjectured performance guarantee of 0.7977 is true in Zwick's algorithm. Both algorithms are based on their sharpened analysis of Goemans and Williamson's LP-relaxation for MAX SAT. In this paper, we present an improved analysis which is simpler than the previous analysis. Furthermore, algorithms based on this analysis will play a role as a better building block in designing an improved approximation algorithm for MAX SAT.
In fact, we show that algorithms based on this analysis lead to approximation algorithms with performance guarantee 0.7877 and conjectured performance guarantee 0.8353, which are slightly better than the best known corresponding performance guarantees of 0.7846 and 0.8331, respectively.

Keywords: Approximation algorithm, MAX SAT, LP-relaxation.

1 Introduction

MAX SAT, one of the most well-studied NP-hard problems, is stated as follows: given a set of clauses with weights, find a truth assignment that maximizes the sum of the weights of the satisfied clauses. More precisely, an instance of MAX SAT is defined by $(\mathcal{C}, w)$, where $\mathcal{C}$ is a set of boolean clauses, each clause $C \in \mathcal{C}$ being a disjunction of literals and having a positive weight $w(C)$. Let $X = \{x_1, \ldots, x_n\}$ be the set of boolean variables in the clauses of $\mathcal{C}$. A literal is a variable $x \in X$ or its negation $\bar{x}$. For simplicity we assume $x_{n+i} = \bar{x}_i$ ($x_i = \bar{x}_{n+i}$). Thus, $\bar{X} = \{\bar{x} \mid x \in X\} = \{x_{n+1}, x_{n+2}, \ldots, x_{2n}\}$ and $X \cup \bar{X} = \{x_1, \ldots, x_{2n}\}$. We assume that no literals with the same variable appear more than once in a clause in $\mathcal{C}$. For each $x_i \in X$, let $x_i = 1$ ($x_i = 0$, resp.) if $x_i$ is true (false, resp.). Then $x_{n+i} = \bar{x}_i = 1 - x_i$, and a clause $C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_{k_j}} \in \mathcal{C}$ can be considered to be a function $C_j = C_j(x) = 1 - \prod_{i=1}^{k_j}(1 - x_{j_i})$ on $x = (x_1, \ldots, x_{2n}) \in \{0,1\}^{2n}$. Thus, $C_j = C_j(x) = 0$ or $1$ for any truth assignment $x \in \{0,1\}^{2n}$ with $x_i + x_{n+i} = 1$ ($i = 1, 2, \ldots, n$), and $C_j$ is satisfied if $C_j(x) = 1$.

The value of a truth assignment $x$ is defined to be $F_{\mathcal{C}}(x) = \sum_{C_j \in \mathcal{C}} w(C_j) C_j(x)$. That is, the value of $x$ is the sum of the weights of the clauses in $\mathcal{C}$ satisfied by $x$. Thus, the goal of MAX SAT is to find an optimal truth assignment (i.e., a truth assignment of maximum value). We will also use MAX kSAT, a restricted version of the problem in which each clause has at most $k$ literals.

MAX SAT is known to be NP-hard and many approximation algorithms for it have been proposed. Håstad [5] has shown that no approximation algorithm for MAX SAT can achieve a performance guarantee better than 7/8 unless P = NP. On the other hand, Asano and Williamson [1] have presented a 0.7846-approximation algorithm and an approximation algorithm whose performance guarantee is 0.8331 if a conjectured performance guarantee of 0.7977 is true in Zwick's algorithm [9]. Both algorithms are based on their sharpened analysis of Goemans and Williamson's LP-relaxation for MAX SAT [3]. In this paper, we present an improved analysis which is simpler than the previous analysis by Asano and Williamson [1]. Furthermore, this analysis will lead to approximation algorithms with better performance guarantees if combined with other approximation algorithms which were (and will be) presented. Algorithms based on this analysis will be used as a building block in designing an improved approximation algorithm for MAX SAT. Actually, algorithms based on this analysis lead to approximation algorithms with performance guarantee 0.7877 and conjectured performance guarantee 0.8353, which are slightly better than the best known corresponding performance guarantees 0.7846 and 0.8331, respectively, if combined with the MAX 2SAT and MAX 3SAT algorithms by Halperin and Zwick [6] and Zwick's algorithm [9], respectively.
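To make the definitions of $C_j(x)$ and $F_{\mathcal{C}}(x)$ concrete, here is a minimal sketch (ours, with an ad hoc clause encoding, not part of the paper) that evaluates a small weighted instance:

```python
# Sketch: evaluating a weighted MAX SAT instance (C, w).
# A literal is an index in 1..2n, where literal n+i denotes the negation of x_i.
# This encoding is our own illustration.

n = 2  # variables x_1, x_2; literals 3, 4 are their negations

# Each clause is (list of literal indices, weight).
clauses = [([1, 2], 1.0),   # x1 v x2
           ([3, 2], 2.0),   # ~x1 v x2
           ([4], 1.5)]      # ~x2

def extend(assignment):
    """Extend (x_1,...,x_n) in {0,1}^n to (x_1,...,x_2n) with x_{n+i} = 1 - x_i."""
    return assignment + [1 - v for v in assignment]

def clause_value(lits, x):
    """C_j(x) = 1 - prod_i (1 - x_{j_i}); equals 1 iff some literal is true."""
    prod = 1
    for l in lits:
        prod *= 1 - x[l - 1]
    return 1 - prod

def value(assignment):
    """F_C(x) = sum_j w(C_j) * C_j(x)."""
    x = extend(assignment)
    return sum(w * clause_value(lits, x) for lits, w in clauses)

print(value([1, 1]))  # x1 = x2 = true satisfies the first two clauses: 3.0
print(value([0, 0]))  # satisfies clauses 2 and 3: 3.5
```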
To explain our result in more detail, we briefly review the 0.75-approximation algorithm of Goemans and Williamson based on the probabilistic method [3]. Let $x^p = (x^p_1, \ldots, x^p_{2n})$ be a random truth assignment with $0 \le x^p_i = p_i \le 1$ ($x^p_{n+i} = 1 - x^p_i = 1 - p_i = p_{n+i}$). That is, $x^p$ is obtained by setting independently each variable $x_i \in X$ to be true with probability $p_i$ (and $x_{n+i} = \bar{x}_i$ to be true with probability $p_{n+i} = 1 - p_i$). Then the probability of a clause $C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_{k_j}} \in \mathcal{C}$ being satisfied by the random truth assignment $x^p = (x^p_1, \ldots, x^p_{2n})$ is $C_j(x^p) = 1 - \prod_{i=1}^{k_j}(1 - x^p_{j_i})$. Thus, the expected value of the random truth assignment $x^p$ is $F_{\mathcal{C}}(x^p) = \sum_{C_j \in \mathcal{C}} w(C_j) C_j(x^p)$. The probabilistic method assures that there is a truth assignment $x^q \in \{0,1\}^{2n}$ of value at least $F_{\mathcal{C}}(x^p)$. Such a truth assignment $x^q$ can be obtained by the method of conditional probabilities [3].

Using an IP (integer programming) formulation of MAX SAT and its LP (linear programming) relaxation, Goemans and Williamson [3] obtained an algorithm for finding a random truth assignment $x^p$ of value $F_{\mathcal{C}}(x^p)$ at least $\sum_{k \ge 1} \left(1 - \left(1 - \frac{1}{k}\right)^k\right) \hat{W}_k \ge \left(1 - \frac{1}{e}\right) \hat{W} \approx 0.632\hat{W}$, where $e$ is the base of the natural logarithm, $\hat{W}_k = \sum_{C \in \mathcal{C}_k} w(C) C(\hat{x})$, and $F_{\mathcal{C}}(\hat{x}) = \sum_{k \ge 1} \hat{W}_k$ for an optimal truth assignment $\hat{x}$ of $(\mathcal{C}, w)$ ($\mathcal{C}_k$ denotes the set of clauses in $\mathcal{C}$ with $k$ literals). Goemans and Williamson also obtained a 0.75-approximation algorithm by using a hybrid approach of combining the above algorithm with Johnson's algorithm [7]. It finds a random truth assignment of value at least
$0.750\hat{W}_1 + 0.750\hat{W}_2 + 0.789\hat{W}_3 + 0.810\hat{W}_4 + 0.820\hat{W}_5 + 0.824\hat{W}_6 + \sum_{k \ge 7} \beta_k \hat{W}_k$,
where $\beta_k = \frac{1}{2}\left(2 - \frac{1}{2^k} - \left(1 - \frac{1}{k}\right)^k\right)$.

Asano and Williamson [1] showed that one of the non-hybrid algorithms of Goemans and Williamson finds a random truth assignment $x^p$ with value $F_{\mathcal{C}}(x^p)$ at least
$0.750\hat{W}_1 + 0.750\hat{W}_2 + 0.804\hat{W}_3 + 0.851\hat{W}_4 + 0.888\hat{W}_5 + 0.915\hat{W}_6 + \sum_{k \ge 7} \gamma_k \hat{W}_k$,
where $\gamma_k = 1 - \frac{1}{2}\left(\frac{3}{4}\right)^{k-1}\left(1 - \frac{1}{3(k-1)}\right)^{k-1}$ for $k \ge 3$ ($\gamma_k > \beta_k$ for $k \ge 3$). Actually, they obtained a 0.7846-approximation algorithm by combining this algorithm with known MAX kSAT algorithms. They also proposed a generalization of this algorithm which finds a random truth assignment $x^p$ with value $F_{\mathcal{C}}(x^p)$ at least
$0.914\hat{W}_1 + 0.750\hat{W}_2 + 0.750\hat{W}_3 + 0.766\hat{W}_4 + 0.784\hat{W}_5 + 0.801\hat{W}_6 + 0.817\hat{W}_7 + \sum_{k \ge 8} \gamma'_k \hat{W}_k$,
where $\gamma'_k = 1 - 0.914^k \left(1 - \frac{1}{k}\right)^k$ for $k \ge 8$. They showed that if this is combined with Zwick's MAX SAT algorithm with conjectured 0.7977 performance guarantee, then it leads to an approximation algorithm with performance guarantee 0.8331.

In this paper, we show that another generalization of the non-hybrid algorithms of Goemans and Williamson finds a random truth assignment $x^p$ with value $F_{\mathcal{C}}(x^p)$ at least
$0.750\hat{W}_1 + 0.750\hat{W}_2 + 0.815\hat{W}_3 + 0.859\hat{W}_4 + 0.894\hat{W}_5 + 0.920\hat{W}_6 + \sum_{k \ge 7} \zeta_k \hat{W}_k$,
where $\zeta_k = 1 - \frac{1}{4}\left(\frac{3}{4}\right)^{k-2}$ for $k \ge 3$ and $\zeta_k > \gamma_k$. We also present another algorithm which finds a random truth assignment $x^p$ with value $F_{\mathcal{C}}(x^p)$ at least
$0.914\hat{W}_1 + 0.750\hat{W}_2 + 0.757\hat{W}_3 + 0.774\hat{W}_4 + 0.790\hat{W}_5 + 0.804\hat{W}_6 + 0.818\hat{W}_7 + \sum_{k \ge 8} \gamma'_k \hat{W}_k$.
This will be used to obtain a 0.8353-approximation algorithm.

The remainder of the paper is structured as follows. In Section 2 we review the algorithms of Goemans and Williamson [3] and Asano and Williamson [1]. In Section 3 we give our main results and their proofs. In Section 4 we briefly outline improved approximation algorithms for MAX SAT obtained by our main results.
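The per-length coefficients quoted above are easy to re-derive numerically; the following sketch (ours, not from the paper) evaluates $\beta_k$, $\gamma_k$, and $\zeta_k$ and reproduces values such as $\beta_3 \approx 0.789$, $\gamma_3 \approx 0.804$, and $\zeta_4 \approx 0.859$:

```python
# Sketch: numerically checking the per-length guarantees quoted above.
def beta(k):   # Goemans-Williamson hybrid: (1/2)(2 - 2^-k - (1 - 1/k)^k)
    return 0.5 * (2 - 2**-k - (1 - 1/k)**k)

def gamma(k):  # Asano-Williamson, k >= 3
    return 1 - 0.5 * (3/4)**(k-1) * (1 - 1/(3*(k-1)))**(k-1)

def zeta(k):   # this paper, k >= 3 (at a = 3/4)
    return 1 - 0.25 * (3/4)**(k-2)

for k in range(3, 8):
    print(k, round(beta(k), 3), round(gamma(k), 3), round(zeta(k), 3))
# For every k >= 3 the ordering zeta_k > gamma_k > beta_k holds, matching
# the claimed improvement (e.g. k = 4: 0.810 < 0.851 < 0.859).
```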
2 MAX SAT Algorithms of Goemans and Williamson

Goemans and Williamson considered the following LP relaxation (GW) of MAX SAT [3]:

(GW) $\max \sum_{C_j \in \mathcal{C}} w(C_j) z_j$
s.t. $\sum_{i=1}^{k_j} y_{j_i} \ge z_j$ for all $C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_{k_j}} \in \mathcal{C}$,
$y_i + y_{n+i} = 1$ for all $i \in \{1, 2, \ldots, n\}$,
$0 \le y_i \le 1$ for all $i \in \{1, 2, \ldots, 2n\}$,
$0 \le z_j \le 1$ for all $C_j \in \mathcal{C}$.

In this formulation, the variables $y = (y_i)$ correspond to the literals $\{x_1, \ldots, x_{2n}\}$ and the variables $z = (z_j)$ correspond to the clauses $\mathcal{C}$. Thus, variable $y_i = 1$ if and only if $x_i = 1$. Similarly, $z_j = 1$ if and only if $C_j$ is satisfied. The first set of constraints implies that one of the literals in a clause is true if the clause is satisfied, and thus the IP formulation of (GW) with $y_i \in \{0,1\}$ ($\forall i \in \{1, 2, \ldots, 2n\}$) and $z_j \in \{0,1\}$ ($\forall C_j \in \mathcal{C}$) exactly corresponds to MAX SAT. Throughout this paper, let $(y^*, z^*)$ be an optimal solution to this LP relaxation of MAX SAT. Goemans and Williamson set each variable $x_i$ to be true with probability $y^*_i$. Then a clause $C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_{k_j}}$ is satisfied by this random truth assignment $x^p = y^*$ with probability $C_j(y^*) \ge \left(1 - \left(1 - \frac{1}{k}\right)^k\right) z^*_j$. Thus, the expected value $F(y^*)$ of $y^*$ obtained in this way satisfies
$F(y^*) = \sum_{C_j \in \mathcal{C}} w(C_j) C_j(y^*) \ge \sum_{k \ge 1} \left(1 - \left(1 - \frac{1}{k}\right)^k\right) W^*_k \ge \left(1 - \frac{1}{e}\right) W^*$,
where $W^*_k = \sum_{C_j \in \mathcal{C}_k} w(C_j) z^*_j$ (note that $W^* = \sum_{C_j \in \mathcal{C}} w(C_j) z^*_j$ and $W^* \ge \hat{W} = \sum_{C_j \in \mathcal{C}} w(C_j) \hat{z}_j$ for an optimal solution $(\hat{y}, \hat{z})$ to the IP formulation of MAX SAT). Since $1 - \frac{1}{e} \approx 0.632$, this is a 0.632-approximation algorithm for MAX SAT.

Goemans and Williamson [3] also considered three other non-linear randomized rounding algorithms. In these algorithms, each variable $x_i$ is set to be true with probability $f_\ell(y^*_i)$ defined as follows ($\ell = 1, 2, 3$):

$f_1(y) = \frac{3}{4}y + \frac{1}{4}$ if $0 \le y \le \frac{1}{3}$; $\frac{1}{2}$ if $\frac{1}{3} \le y \le \frac{2}{3}$; $\frac{3}{4}y$ if $\frac{2}{3} \le y \le 1$,
$f_2(y) = (2a-1)y + 1 - a$, $\frac{3}{4} \le a \le \frac{3}{\sqrt[3]{4}} - 1$,
$1 - 4^{-y} \le f_3(y) \le 4^{y-1}$.

Note that $f_\ell(y^*_i) + f_\ell(y^*_{n+i}) = 1$ holds for $\ell = 1, 2$ and that $f_3(y^*_i)$ has to be chosen to satisfy $f_3(y^*_i) + f_3(y^*_{n+i}) = 1$. They then proved that all the random truth assignments $x^p = f_\ell(y^*) = (f_\ell(y^*_1), \ldots, f_\ell(y^*_{2n}))$ obtained in this way have expected value at least $\frac{3}{4}W^*$ and lead to $\frac{3}{4}$-approximation algorithms.

Asano and Williamson [1] sharpened the analysis of Goemans and Williamson to provide more precise bounds on the probability of a clause $C_j = x_{j_1} \vee x_{j_2} \vee \cdots \vee x_{j_k}$ with $k$ literals being satisfied (and thus on the expected weight of satisfied clauses in $\mathcal{C}_k$) by the random truth assignment $x^p = f_\ell(y^*)$ for each $k$ (and $\ell = 1, 2$). From now on, we assume by symmetry that $x_{j_i} = x_i$ for each $i = 1, 2, \ldots, k$, since $f_\ell(x) = 1 - f_\ell(\bar{x})$ and we can set $x := \bar{x}$ if necessary. They considered the clause $C_j = x_1 \vee x_2 \vee \cdots \vee x_k$ corresponding to the constraint $y_1 + y_2 + \cdots + y_k \ge z_j$ in the LP relaxation (GW) of MAX SAT, and gave a bound on the ratio of $C_j(f_\ell(y^*))$ to $z^*_j$, where $C_j(f_\ell(y^*))$ is the probability of clause $C_j$ being satisfied by the random truth assignment $x^p = f_\ell(y^*)$ ($\ell = 1, 2$). Actually, they analyzed parametrized functions $f^a_1$ and $f^a_2$ with $\frac{1}{2} \le a \le 1$ defined as follows:

$f^a_1(y) = ay + 1 - a$ if $0 \le y \le 1 - \frac{1}{2a}$; $\frac{1}{2}$ if $1 - \frac{1}{2a} \le y \le \frac{1}{2a}$; $ay$ if $\frac{1}{2a} \le y \le 1$, (1)
$f^a_2(y) = (2a-1)y + 1 - a$.

Note that $f_1 = f_1^{3/4}$ and $f_2 = f^a_2$.
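For illustration, a small self-contained sketch of the (GW) relaxation follows; the toy instance and the use of scipy's LP solver are our own choices, not part of the original algorithm description. It solves the LP for a tiny formula and rounds the optimum with $f_1$:

```python
# Sketch: the (GW) LP for a toy MAX SAT instance, rounded with f1.
import numpy as np
from scipy.optimize import linprog

n = 2                                   # variables; literal n+i is the negation of x_i
clauses = [([1, 2], 1.0), ([3], 1.0), ([2, 3], 2.0)]   # ad hoc example
m = len(clauses)

# LP variables: y_1..y_2n, then z_1..z_m. Maximize sum w_j z_j.
c = np.zeros(2 * n + m)
c[2 * n:] = [-w for _, w in clauses]    # linprog minimizes

A_ub, b_ub = [], []
for j, (lits, _) in enumerate(clauses): # z_j - sum_{i in C_j} y_i <= 0
    row = np.zeros(2 * n + m)
    row[2 * n + j] = 1.0
    for l in lits:
        row[l - 1] = -1.0
    A_ub.append(row); b_ub.append(0.0)

A_eq, b_eq = [], []
for i in range(n):                      # y_i + y_{n+i} = 1
    row = np.zeros(2 * n + m)
    row[i] = row[n + i] = 1.0
    A_eq.append(row); b_eq.append(1.0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * (2 * n + m), method="highs")
y = res.x[:2 * n]

def f1(y):  # GW's piecewise-linear rounding function
    return 0.75 * y + 0.25 if y <= 1/3 else (0.5 if y <= 2/3 else 0.75 * y)

probs = [f1(v) for v in y[:n]]          # set x_i true with probability f1(y_i*)
print("LP optimum:", -res.fun, "rounding probabilities:", probs)
```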
Let
$\gamma^a_{k,1} = 1 - \frac{a^{k-1}}{2}\left(1 - \frac{1 - \frac{1}{2a}}{k-1}\right)^{k-1}$, (2)
$\gamma^a_{k,2} = 1 - a^k \left(1 - \frac{1}{k}\right)^k$, (3)
$\gamma^a_k = a$ if $k = 1$, and $\gamma^a_k = \min\{\gamma^a_{k,1}, \gamma^a_{k,2}\}$ if $k \ge 2$, (4)
and
$\delta^a_k = 1 - a^k \left(1 - \frac{2 - \frac{1}{a}}{k}\right)^k$. (5)

Then their results are summarized as follows.

Proposition 1. [1] For $\frac{1}{2} \le a \le 1$, let $C_j(f^a_\ell(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_\ell(y^*_i))$ be the probability of clause $C_j = x_1 \vee x_2 \vee \cdots \vee x_k \in \mathcal{C}$ being satisfied by the random truth assignment $x^p = f^a_\ell(y^*) = (f^a_\ell(y^*_1), \ldots, f^a_\ell(y^*_{2n}))$ ($\ell = 1, 2$). Then the following statements hold.
1. $C_j(f^a_1(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_1(y^*_i)) \ge \gamma^a_k z^*_j$, and the expected value $F(f^a_1(y^*))$ of $x^p = f^a_1(y^*)$ satisfies $F(f^a_1(y^*)) \ge \sum_{k \ge 1} \gamma^a_k W^*_k$.
2. $C_j(f^a_2(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_2(y^*_i)) \ge \delta^a_k z^*_j$, and the expected value $F(f^a_2(y^*))$ of $x^p = f^a_2(y^*)$ satisfies $F(f^a_2(y^*)) \ge \sum_{k \ge 1} \delta^a_k W^*_k$.
3. $\gamma^a_k > \delta^a_k$ holds for all $k \ge 3$ and for all $a$ with $\frac{1}{2} < a < 1$. For $k = 1, 2$, $\gamma^a_k = \delta^a_k$ ($\gamma^a_1 = \delta^a_1 = a$, $\gamma^a_2 = \delta^a_2 = \frac{3}{4}$) hold.

3 Main Results and Their Proofs

Asano and Williamson did not consider a parametrized function of $f_3$. In this section we consider a parametrized function $f^a_3$ of $f_3$ and show that it has better performance than $f^a_1$ and $f^a_2$. Furthermore, its analysis (proof) is simpler. We also consider a generalization of both $f^a_1$ and $f^a_2$. For $\frac{1}{2} \le a \le 1$, let $f^a_3$ be defined as follows:

$f^a_3(y) = 1 - \frac{a}{(4a^2)^y}$ if $0 \le y \le \frac{1}{2}$; $\frac{(4a^2)^y}{4a}$ if $\frac{1}{2} \le y \le 1$. (6)

For $\frac{3}{4} \le a \le 1$, let
$y_a = \frac{1}{a} - \frac{1}{2}$. (7)
Then the other parametrized function $f^a_4$ is defined as follows:

$f^a_4(y) = ay + 1 - a$ if $0 \le y \le 1 - y_a$; $\frac{a}{2}y + \frac{1}{2} - \frac{a}{4}$ if $1 - y_a \le y \le y_a$; $ay$ if $y_a \le y \le 1$. (8)

Thus, $f^a_3(y) + f^a_3(1-y) = 1$ and $f^a_4(y) + f^a_4(1-y) = 1$ hold for $0 \le y \le 1$. Furthermore, $f^a_3$ and $f^a_4$ are both continuous functions which are increasing in $y$. Thus, $f^a_3(\frac{1}{2}) = f^a_4(\frac{1}{2}) = \frac{1}{2}$. Let $\zeta^a_k$ and $\eta^a_k$ be the numbers defined as follows:

$\zeta^a_k = a$ if $k = 1$, and $\zeta^a_k = 1 - \frac{1}{4}a^{k-2}$ if $k \ge 2$; (9)
$\eta^a_{k,1} = \gamma^a_{k,2} = 1 - a^k \left(1 - \frac{1}{k}\right)^k$, $\eta^a_{k,2} = \zeta^a_k = 1 - \frac{a^{k-2}}{4}$, (10)
$\eta^a_{k,3} = 1 - \frac{a^k}{2}\left(1 - \frac{1 - y_a}{k-1}\right)^{k-1}$, $\eta^a_{k,4} = 1 - \frac{1}{2^k}\left(1 + \frac{a}{2} - \frac{a}{k}\right)^k$, (11)
$\eta^a_k = a$ if $k = 1$, and $\eta^a_k = \min\{\eta^a_{k,1}, \eta^a_{k,2}, \eta^a_{k,3}, \eta^a_{k,4}\}$ if $k \ge 2$. (12)

Then we have the following theorems for the two parametrized functions $f^a_3$ and $f^a_4$.

Theorem 1. For $\frac{1}{2} \le a \le \frac{\sqrt{e}}{2} = 0.82436$, the probability of $C_j = x_1 \vee x_2 \vee \cdots \vee x_k \in \mathcal{C}$ being satisfied by the random truth assignment $x^p = f^a_3(y^*) = (f^a_3(y^*_1), \ldots, f^a_3(y^*_{2n}))$ is $C_j(f^a_3(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_3(y^*_i)) \ge \zeta^a_k z^*_j$. Thus, the expected value $F(f^a_3(y^*))$ of $x^p = f^a_3(y^*)$ satisfies $F(f^a_3(y^*)) \ge \sum_{k \ge 1} \zeta^a_k W^*_k$.

Theorem 2. For $\frac{\sqrt{e}}{2} = 0.82436 \le a \le 1$, the probability of $C_j = x_1 \vee x_2 \vee \cdots \vee x_k \in \mathcal{C}$ being satisfied by the random truth assignment $x^p = f^a_4(y^*) = (f^a_4(y^*_1), \ldots, f^a_4(y^*_{2n}))$ is $C_j(f^a_4(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_4(y^*_i)) \ge \eta^a_k z^*_j$. Thus, the expected value $F(f^a_4(y^*))$ of $x^p = f^a_4(y^*)$ satisfies $F(f^a_4(y^*)) \ge \sum_{k \ge 1} \eta^a_k W^*_k$.

Theorem 3. The following statements hold for $\gamma^a_k$, $\delta^a_k$, $\zeta^a_k$, and $\eta^a_k$.
1. If $\frac{1}{\sqrt{2}} \le a \le \frac{\sqrt{e}}{2} = 0.82436$, then $\zeta^a_k > \gamma^a_k > \delta^a_k$ hold for all $k \ge 3$.
2. If $\frac{\sqrt{e}}{2} = 0.82436 \le a < 1$, then $\eta^a_k \ge \gamma^a_k > \delta^a_k$ hold for all $k \ge 3$. In particular, if $\frac{\sqrt{e}}{2} = 0.82436 \le a \le 0.881611$, then $\eta^a_k > \gamma^a_k > \delta^a_k$ hold for all $k \ge 3$.
3. For $k = 1, 2$, $\gamma^a_k = \delta^a_k = \zeta^a_k$ hold if $\frac{1}{2} \le a \le \frac{\sqrt{e}}{2} = 0.82436$, and $\gamma^a_k = \delta^a_k = \eta^a_k$ hold if $\frac{\sqrt{e}}{2} = 0.82436 \le a \le 1$.
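As a sanity check on these definitions (our own sketch, not part of the paper), $f^a_3$, $f^a_4$, and the coefficients can be coded directly; the script below confirms the complementarity $f^a_3(y) + f^a_3(1-y) = 1$ and spot-checks the ordering $\zeta^a_k > \gamma^a_k > \delta^a_k$ claimed in Theorem 3:

```python
# Sketch: the parametrized rounding functions and coefficient orderings.
def f3(y, a):  # Eq. (6), for 1/2 <= a <= sqrt(e)/2
    return 1 - a / (4*a*a)**y if y <= 0.5 else (4*a*a)**y / (4*a)

def f4(y, a):  # Eq. (8), for 3/4 <= a <= 1, with y_a = 1/a - 1/2
    ya = 1/a - 0.5
    if y <= 1 - ya: return a*y + 1 - a
    if y <= ya:     return a*y/2 + 0.5 - a/4
    return a*y

def gamma(k, a):  # Eqs. (2)-(4)
    g1 = 1 - a**(k-1)/2 * (1 - (1 - 1/(2*a))/(k-1))**(k-1)
    g2 = 1 - a**k * (1 - 1/k)**k
    return a if k == 1 else min(g1, g2)

def delta(k, a):  # Eq. (5)
    return 1 - a**k * (1 - (2 - 1/a)/k)**k

def zeta(k, a):   # Eq. (9)
    return a if k == 1 else 1 - a**(k-2)/4

a = 0.8                                  # within [1/sqrt(2), sqrt(e)/2]
assert all(abs(f3(y, a) + f3(1-y, a) - 1) < 1e-12
           for y in [i/100 for i in range(101)])
for k in range(3, 10):                   # Theorem 3(1): zeta > gamma > delta
    assert zeta(k, a) > gamma(k, a) > delta(k, a)
print("checks passed; e.g. k=3:", zeta(3, a), gamma(3, a), delta(3, a))
```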
In this paper, we first give a proof of Theorem 1. It is very simple and we use only the following lemma.

Lemma 1. If $\frac{1}{2} \le a \le \frac{\sqrt{e}}{2} = 0.82436$, then $f^a_3(y) \ge ay$.

Proof. Let $g(y) \equiv \frac{(4a^2)^y}{4a} - ay$. Then its derivative is $g'(y) = \ln(4a^2)\frac{(4a^2)^y}{4a} - a$. Thus, $g'(y)$ is increasing in $y$ and $g'(1) = a(\ln(4a^2) - 1) \le 0$, since $\ln(4a^2) \le \ln\left(4\left(\frac{\sqrt{e}}{2}\right)^2\right) = 1$. This implies that $g'(y) \le 0$ for all $0 \le y \le 1$ and that $g(y)$ is decreasing on $0 \le y \le 1$. Thus, $g(y)$ takes its minimum value at $y = 1$, i.e., $g(y) = \frac{(4a^2)^y}{4a} - ay \ge g(1) = \frac{4a^2}{4a} - a = 0$.

Now we are ready to prove the lemma. For $\frac{1}{2} \le y \le 1$, we have $f^a_3(y) - ay = g(y) = \frac{(4a^2)^y}{4a} - ay \ge 0$. For $0 \le y \le \frac{1}{2}$, we have
$f^a_3(y) - ay = 1 - \frac{a}{(4a^2)^y} - ay = -\left(\frac{(4a^2)^{1-y}}{4a} - a(1-y)\right) + 1 - a = -g(1-y) + 1 - a \ge -g\left(\tfrac{1}{2}\right) + 1 - a = \frac{1-a}{2} \ge 0$,
since $g(y)$ is decreasing and $g(1-y) \le g(\frac{1}{2}) = \frac{1-a}{2}$ for $\frac{1}{2} \le 1 - y \le 1$. ⊓⊔

Proof of Theorem 1. Noting that the clause $C_j = x_1 \vee x_2 \vee \cdots \vee x_k$ corresponds to the constraint
$y^*_1 + y^*_2 + \cdots + y^*_k \ge z^*_j$ (13)
in the LP relaxation (GW) of MAX SAT, we will show that
$C_j(f^a_3(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_3(y^*_i)) \ge \zeta^a_k z^*_j$
for $\frac{1}{2} \le a \le \frac{\sqrt{e}}{2} = 0.82436$. By symmetry, we assume $y^*_1 \le y^*_2 \le \cdots \le y^*_k$. Note that $y^*_k \le z^*_j$, since otherwise $(y^*, z^*)$ would not be an optimal solution to the LP relaxation (GW) of MAX SAT (if $y^*_k > z^*_j$ then $(y^*, z')$ with $z'_j = y^*_k$ and $z'_{j'} = z^*_{j'}$ ($j' \ne j$) would also be a feasible solution to (GW) with $\sum_{C_{j'} \in \mathcal{C}} w(C_{j'}) z'_{j'} > \sum_{C_{j'} \in \mathcal{C}} w(C_{j'}) z^*_{j'}$), a contradiction.

If $k = 1$, then we have $C_j(f^a_3(y^*)) = f^a_3(y^*_1) \ge ay^*_1 \ge az^*_j = \zeta^a_1 z^*_j$ by Lemma 1 and inequality (13). Next suppose $k \ge 2$. We consider two cases: Case 1: $0 \le y^*_k \le \frac{1}{2}$; and Case 2: $\frac{1}{2} < y^*_k \le 1$.

Case 1: $0 \le y^*_k \le \frac{1}{2}$. Since all $y^*_i \le \frac{1}{2}$ ($i = 1, 2, \ldots, k$), we have $f^a_3(y^*_i) = 1 - \frac{a}{(4a^2)^{y^*_i}}$ and $1 - f^a_3(y^*_i) = \frac{a}{(4a^2)^{y^*_i}}$. Thus, we have
$C_j(f^a_3(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_3(y^*_i)) = 1 - \frac{a^k}{(4a^2)^{\sum_{i=1}^k y^*_i}} \ge 1 - \frac{a^k}{(4a^2)^{z^*_j}} \ge \left(1 - \frac{a^k}{4a^2}\right) z^*_j = \left(1 - \frac{a^{k-2}}{4}\right) z^*_j = \zeta^a_k z^*_j$,
where the first inequality follows by inequality (13), and the second inequality follows from the fact that $1 - \frac{a^k}{(4a^2)^{z^*_j}}$ is a concave function in $0 \le z^*_j \le 1$.

Case 2: $\frac{1}{2} < y^*_k \le 1$. Suppose first that $y^*_{k-1} > \frac{1}{2}$. Then, since $f^a_3(y^*_i) \ge 1 - a$ ($i = 1, 2, \ldots, k$), we have $1 - f^a_3(y^*_i) \le a$ ($i = 1, 2, \ldots, k-2$), $1 - f^a_3(y^*_i) = 1 - \frac{(4a^2)^{y^*_i}}{4a} \le \frac{1}{2}$ ($i = k-1, k$), and $z^*_j \le 1$, and $C_j(f^a_3(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_3(y^*_i))$ satisfies
$C_j(f^a_3(y^*)) \ge 1 - a^{k-2}\cdot\frac{1}{2}\cdot\frac{1}{2} = 1 - \frac{a^{k-2}}{4} \ge \left(1 - \frac{a^{k-2}}{4}\right) z^*_j = \zeta^a_k z^*_j$.
Thus, we can assume $y^*_{k-1} \le \frac{1}{2}$. Since $1 - f^a_3(y^*_i) = \frac{a}{(4a^2)^{y^*_i}}$ ($i = 1, 2, \ldots, k-1$), we have
$C_j(f^a_3(y^*)) = 1 - \frac{a^{k-1}}{(4a^2)^{\sum_{i=1}^{k-1} y^*_i}}\left(1 - \frac{(4a^2)^{y^*_k}}{4a}\right) \ge 1 - \frac{a^{k-1}}{(4a^2)^{z^*_j - y^*_k}}\left(1 - \frac{(4a^2)^{y^*_k}}{4a}\right) = 1 - \frac{a^{k-1}}{(4a^2)^{z^*_j}}(4a^2)^{y^*_k}\left(1 - \frac{(4a^2)^{y^*_k}}{4a}\right) \ge 1 - \frac{a^{k-1}}{(4a^2)^{z^*_j}}\cdot a = 1 - \frac{a^k}{(4a^2)^{z^*_j}} \ge \left(1 - \frac{a^{k-2}}{4}\right) z^*_j = \zeta^a_k z^*_j$
by inequality (13), $y^*_k \le z^*_j$, the fact that $(4a^2)^{y^*_k}\left(1 - \frac{(4a^2)^{y^*_k}}{4a}\right) = u\left(1 - \frac{u}{4a}\right) \le a$ with $u = (4a^2)^{y^*_k}$, and the fact that $1 - \frac{a^k}{(4a^2)^{z^*_j}}$ is a concave function in $0 \le z^*_j \le 1$. ⊓⊔

Proofs of Theorems 2 and 3. The proofs of Theorems 2 and 3 are quite similar to the ones in Asano and Williamson [1]. In this sense, the proofs may be a little complicated; however, they can be done in a systematic way. Here we will give only an outline of the proof of Theorem 2; the proof of Theorem 3 is similar.
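Before turning to that outline, note that Lemma 1, on which the proof of Theorem 1 rests, can also be confirmed numerically; this one-off check (ours) evaluates $f^a_3(y) - ay$ on a grid at the extreme value $a = \sqrt{e}/2$, where the inequality is tightest:

```python
# Sketch: grid check of Lemma 1, f3^a(y) >= a*y for 1/2 <= a <= sqrt(e)/2.
import math

def f3(y, a):
    return 1 - a / (4*a*a)**y if y <= 0.5 else (4*a*a)**y / (4*a)

a = math.sqrt(math.e) / 2          # = 0.82436..., the boundary case
worst = min(f3(i/1000, a) - a * (i/1000) for i in range(1001))
print(worst)                        # about 0 (attained at y = 1); positive elsewhere
```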
Outline of Proof of Theorem 2. For a clause $C_j = x_1 \vee x_2 \vee \cdots \vee x_k$ corresponding to the constraint $y^*_1 + y^*_2 + \cdots + y^*_k \ge z^*_j$ as described in the proof of Theorem 1, we will show that $C_j(f^a_4(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_4(y^*_i)) \ge \eta^a_k z^*_j$ for $\frac{3}{4} \le a \le 1$. We assume that $y^*_1 \le y^*_2 \le \cdots \le y^*_k$ and $y^*_k \le z^*_j$ hold, as described before.

Suppose $k = 1$. Since $f^a_4(y) - ay = 1 - a \ge 0$ for $0 \le y \le 1 - y_a$ and $f^a_4(y) - ay = 0$ for $y_a \le y \le 1$, we consider the case when $1 - y_a \le y \le y_a$. In this case, $f^a_4(y) - ay = \frac{2 - a - 2ay}{4}$ is decreasing on $1 - y_a \le y \le y_a$, and we have $f^a_4(y) - ay = \frac{2 - a - 2ay}{4} \ge f^a_4(y_a) - ay_a = \frac{2 - a - 2ay_a}{4} = 0$ by Eq. (7). Thus, $C_j(f^a_4(y^*)) = f^a_4(y^*_1) \ge ay^*_1 \ge az^*_j = \eta^a_1 z^*_j$ by inequality (13).

Next suppose $k \ge 2$. We consider three cases: Case 1: $y^*_k \le 1 - y_a$; Case 2: $1 - y_a < y^*_k \le y_a$; and Case 3: $y_a \le y^*_k \le 1$.

Case 1: $y^*_k \le 1 - y_a$. Since all $y^*_i \le 1 - y_a$ ($i = 1, 2, \ldots, k$), $f^a_4(y^*_i) = 1 - a + ay^*_i$ and $1 - f^a_4(y^*_i) = a(1 - y^*_i)$. Thus, $C_j(f^a_4(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_4(y^*_i))$ satisfies
$C_j(f^a_4(y^*)) = 1 - a^k \prod_{i=1}^k (1 - y^*_i) \ge 1 - a^k\left(1 - \frac{\sum_{i=1}^k y^*_i}{k}\right)^k \ge 1 - a^k\left(1 - \frac{z^*_j}{k}\right)^k \ge \left(1 - a^k\left(1 - \frac{1}{k}\right)^k\right) z^*_j = \eta^a_{k,1} z^*_j$,
where the first inequality follows by the arithmetic/geometric mean inequality, the second by inequality (13), and the third by the fact that $1 - a^k(1 - \frac{z^*_j}{k})^k$ is a concave function in $0 \le z^*_j \le 1$.

Case 2: $1 - y_a \le y^*_k \le y_a$. Let $\ell$ be the number such that $y^*_\ell < 1 - y_a \le y^*_{\ell+1}$, and let $y_A = \sum_{i=1}^{\ell} y^*_i$ and $y_B = \sum_{i=\ell+1}^{k} y^*_i$. Then $k - \ell \ge 1$ and $\ell \ge 0$. If $\ell = 0$ then $f^a_4(y^*_i) = \frac{1}{2}\left(ay^*_i + 1 - \frac{a}{2}\right)$ ($i = 1, 2, \ldots, k$), and for the same reason as in Case 1 above, we have
$C_j(f^a_4(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_4(y^*_i)) = 1 - \frac{1}{2^k}\prod_{i=1}^k\left(1 + \frac{a}{2} - ay^*_i\right) \ge 1 - \frac{1}{2^k}\left(1 + \frac{a}{2} - \frac{ay_B}{k}\right)^k \ge 1 - \frac{1}{2^k}\left(1 + \frac{a}{2} - \frac{az^*_j}{k}\right)^k \ge \left(1 - \frac{1}{2^k}\left(1 + \frac{a}{2} - \frac{a}{k}\right)^k\right) z^*_j = \eta^a_{k,4} z^*_j$.

Now suppose $\ell > 0$ and that $y_B \le z^*_j$ (we omit the case when $y_B > z^*_j$, since it can be argued similarly). Then $C_j(f^a_4(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_4(y^*_i))$ satisfies
$C_j(f^a_4(y^*)) = 1 - a^\ell \prod_{i=1}^{\ell}(1 - y^*_i)\cdot\frac{1}{2^{k-\ell}}\prod_{i=\ell+1}^{k}\left(1 + \frac{a}{2} - ay^*_i\right) \ge 1 - a^\ell\left(1 - \frac{y_A}{\ell}\right)^{\ell}\frac{1}{2^{k-\ell}}\left(1 + \frac{a}{2} - \frac{ay_B}{k-\ell}\right)^{k-\ell} \ge 1 - a^\ell\frac{1}{2^{k-\ell}}\left(1 - \frac{z^*_j - y_B}{\ell}\right)^{\ell}\left(1 + \frac{a}{2} - \frac{ay_B}{k-\ell}\right)^{k-\ell} = 1 - a^\ell\frac{1}{2^{k-\ell}} g(y_B)$,
where $g(y_B) \equiv \left(1 - \frac{z^*_j - y_B}{\ell}\right)^{\ell}\left(1 + \frac{a}{2} - \frac{ay_B}{k-\ell}\right)^{k-\ell}$. Note that $g(y_B)$ is increasing in $y_B$. Thus, if $k - \ell \ge 2$ then, by $y_B \le z^*_j$ and $g(y_B) \le g(z^*_j)$, we have
$C_j(f^a_4(y^*)) \ge 1 - a^\ell\frac{1}{2^{k-\ell}} g(z^*_j) = 1 - a^\ell\frac{1}{2^{k-\ell}}\left(1 + \frac{a}{2} - \frac{az^*_j}{k-\ell}\right)^{k-\ell} \ge \left(1 - a^\ell\frac{1}{2^{k-\ell}}\left(1 + \frac{a}{2} - \frac{a}{k-\ell}\right)^{k-\ell}\right) z^*_j \ge \left(1 - \frac{a^{k-2}}{4}\right) z^*_j = \eta^a_{k,2} z^*_j$,
since $1 - a^\ell\frac{1}{2^{k-\ell}}\left(1 + \frac{a}{2} - \frac{az^*_j}{k-\ell}\right)^{k-\ell}$ is a concave function in $0 \le z^*_j \le 1$ and $1 - a^\ell\frac{1}{2^{k-\ell}}\left(1 + \frac{a}{2} - \frac{a}{k-\ell}\right)^{k-\ell}$ is increasing with $k - \ell$ for $\frac{3}{4} \le a \le 1$ (which can be shown by Lemma 2.5 in [1]).

Similarly, if $k - \ell = 1$, then $y_B = y^*_k \le y_a$ and
$C_j(f^a_4(y^*)) \ge 1 - a^{k-1}\frac{1}{2} g(y_a) = 1 - \frac{a^k}{2}\left(1 - \frac{z^*_j - y_a}{k-1}\right)^{k-1} \ge \left(1 - \frac{a^k}{2}\left(1 - \frac{1 - y_a}{k-1}\right)^{k-1}\right) z^*_j = \eta^a_{k,3} z^*_j$
by Eq. (7) and $g(y_B) \le g(y_a) = a\left(1 - \frac{z^*_j - y_a}{k-1}\right)^{k-1}$, since $1 - \frac{a^k}{2}\left(1 - \frac{z^*_j - y_a}{k-1}\right)^{k-1}$ is a concave function in $y_a \le z^*_j \le 1$ (see Lemma 2.4 in [1]).

Case 3: $y_a \le y^*_k \le 1$. If $y^*_{k-1} + y^*_k > 1$ then $(1 - f^a_4(y^*_{k-1}))(1 - f^a_4(y^*_k)) \le \frac{1}{4}$ and $1 - f^a_4(y^*_i) \le a$ ($i = 1, 2, \ldots, k$), and $C_j(f^a_4(y^*)) = 1 - \prod_{i=1}^k (1 - f^a_4(y^*_i))$ satisfies
$C_j(f^a_4(y^*)) \ge 1 - a^{k-2}(1 - f^a_4(y^*_{k-1}))(1 - f^a_4(y^*_k)) \ge 1 - \frac{a^{k-2}}{4} = \eta^a_{k,2} \ge \eta^a_{k,2} z^*_j$.
Thus, we can assume $y^*_{k-1} \le 1 - y_a$. Let $y_A = \sum_{i=1}^{k-1} y^*_i$.
Then we have
$C_j(f^a_4(y^*)) \ge 1 - a^{k-1}(1 - ay^*_k)\prod_{i=1}^{k-1}(1 - y^*_i) \ge 1 - a^{k-1}(1 - ay^*_k)\left(1 - \frac{y_A}{k-1}\right)^{k-1} \ge 1 - a^{k-1}(1 - ay^*_k)\left(1 - \frac{z^*_j - y^*_k}{k-1}\right)^{k-1} \ge 1 - a^{k-1}(1 - ay_a)\left(1 - \frac{z^*_j - y_a}{k-1}\right)^{k-1} = 1 - \frac{a^k}{2}\left(1 - \frac{z^*_j - y_a}{k-1}\right)^{k-1} \ge \left(1 - \frac{a^k}{2}\left(1 - \frac{1 - y_a}{k-1}\right)^{k-1}\right) z^*_j = \eta^a_{k,3} z^*_j$,
since $(1 - ay^*_k)\left(1 - \frac{z^*_j - y^*_k}{k-1}\right)^{k-1}$ is decreasing in $y^*_k$ ($y_a \le y^*_k \le 1$). ⊓⊔

4 Improved Approximation Algorithms

In this section, we briefly outline our improved approximation algorithms for MAX SAT based on a hybrid approach which is described in detail in Asano and Williamson [1]. We use a semidefinite programming relaxation of MAX SAT which is a combination of ones given by Goemans and Williamson [4], Feige and Goemans [2], Karloff and Zwick [8], Halperin and Zwick [6], and Zwick [9]. Our algorithms pick the best solution returned by the four algorithms corresponding to (1) $f^a_3$ in Goemans and Williamson [3], (2) the MAX 2SAT algorithm of Feige and Goemans [2] or of Halperin and Zwick [6], (3) the MAX 3SAT algorithm of Karloff and Zwick [8] or of Halperin and Zwick [6], and (4) Zwick's MAX SAT algorithm with a conjectured performance guarantee of 0.7977 [9]. The expected value of the solution is at least as good as the expected value of an algorithm that uses Algorithm (i) with probability $p_i$, where $p_1 + p_2 + p_3 + p_4 = 1$.

Our first algorithm picks the best solution returned by the three algorithms corresponding to (1) $f^a_3$ in Goemans and Williamson [3], (2) Feige and Goemans's MAX 2SAT algorithm [2], and (3) Karloff and Zwick's MAX 3SAT algorithm [8] (this implies that $p_4 = 0$). From the arguments in Section 3, the probability that a clause $C_j \in \mathcal{C}_k$ is satisfied by Algorithm (1) is at least $\zeta^a_k z^*_j$, where $\zeta^a_k$ is defined in Eq. (9). Similarly, from the arguments in [4,2], the probability that a clause $C_j \in \mathcal{C}_k$ is satisfied by Algorithm (2) is at least $0.93109 \cdot \frac{2}{k} z^*_j$ for $k \ge 2$, and at least $0.97653 z^*_j$ for $k = 1$. By an analysis obtained by Karloff and Zwick [8] and an argument similar to one in [4], the probability that a clause $C_j \in \mathcal{C}_k$ is satisfied by Algorithm (3) is at least $\frac{3}{k}\cdot\frac{7}{8} z^*_j$ for $k \ge 3$, and at least $0.87856 z^*_j$ for $k = 1, 2$. Suppose that we set $a = 0.74054$, $p_1 = 0.7861$, $p_2 = 0.1637$, and $p_3 = 0.0502$ ($p_4 = 0$). Then
$a p_1 + 0.97653 p_2 + 0.87856 p_3 \ge 0.7860$ for $k = 1$,
$\frac{3}{4} p_1 + 0.93109 p_2 + 0.87856 p_3 \ge 0.7860$ for $k = 2$,
$\zeta^a_k p_1 + \frac{2 \times 0.93109}{k} p_2 + \frac{3 \times 7}{k \times 8} p_3 \ge 0.7860$ for $k \ge 3$.
Thus this is a 0.7860-approximation algorithm. Note that the algorithm in Asano and Williamson [1] picking the best solution returned by the three algorithms corresponding to (1) $f^a_1$ with $a = \frac{3}{4}$ in Goemans and Williamson [3], (2) Feige and Goemans [2], and (3) Karloff and Zwick [8] only achieves the performance guarantee 0.7846.

Suppose next that we use the three algorithms (1) $f^a_3$ in Goemans and Williamson [3], (2) Halperin and Zwick's MAX 2SAT algorithm [6], and (3) Halperin and Zwick's MAX 3SAT algorithm [6] instead of Feige and Goemans [2] and Karloff and Zwick [8]. If we set $a = 0.739634$, $p_1 = 0.787777$, $p_2 = 0.157346$, and $p_3 = 0.054877$, then we have
$a p_1 + 0.9828 p_2 + 0.9197 p_3 \ge 0.7877$ for $k = 1$,
$\frac{3}{4} p_1 + 0.9309 p_2 + 0.9197 p_3 \ge 0.7877$ for $k = 2$,
$\zeta^a_k p_1 + \frac{2 \times 0.9309}{k} p_2 + \frac{3 \times 7}{k \times 8} p_3 \ge 0.7877$ for $k \ge 3$.
Thus we have a 0.7877-approximation algorithm for MAX SAT (note that the performance guarantees of Halperin and Zwick's MAX 2SAT and MAX 3SAT algorithms are based on numerical evidence [6]).
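These inequalities are straightforward to re-check by machine; the sketch below (ours, not from the paper) evaluates the three per-length bounds for the first parameter setting and confirms that the minimum over all clause lengths stays above 0.7860:

```python
# Sketch: verifying the 0.7860 hybrid bound for a = 0.74054,
# p1 = 0.7861, p2 = 0.1637, p3 = 0.0502 (p4 = 0).
a, p1, p2, p3 = 0.74054, 0.7861, 0.1637, 0.0502

def zeta(k):                 # Eq. (9)
    return a if k == 1 else 1 - a**(k-2) / 4

def bound(k):
    if k == 1:
        return a*p1 + 0.97653*p2 + 0.87856*p3
    if k == 2:
        return 0.75*p1 + 0.93109*p2 + 0.87856*p3
    # k >= 3: f3^a, Feige-Goemans (2*0.93109/k), Karloff-Zwick (21/(8k))
    return zeta(k)*p1 + 2*0.93109/k*p2 + 21/(8*k)*p3

print(min(bound(k) for k in range(1, 200)))   # about 0.78610
```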
Suppose finally that we use the two algorithms (1) $f^a_4$ in Goemans and Williamson [3] and (4) Zwick's MAX SAT algorithm with a conjectured performance guarantee of 0.7977 [9]. If we set $a = 0.907180$, $p_1 = 0.343137$, and $p_4 = 0.656863$ ($p_2 = p_3 = 0$), then the probability of a clause $C_j$ with $k$ literals being satisfied can be shown to be at least $0.8353 z^*_j$ for each $k \ge 1$. Thus, we can obtain a 0.8353-approximation algorithm for MAX SAT if the conjectured performance guarantee of 0.7977 is true in Zwick's MAX SAT algorithm [9,1].

Remarks. As described above, algorithms based on $f^a_3$ and $f^a_4$ can be used as a building block for designing an improved approximation algorithm for MAX SAT. We have examined several other parametrized functions, including the ones in Asano and Williamson [1], and we believe that algorithms based on $f^a_3$ and $f^a_4$ are nearly the best such building blocks among functions using an optimal solution $(y^*, z^*)$ to Goemans and Williamson's LP relaxation for MAX SAT.

Acknowledgments. I would like to thank Prof. B. Korte of Bonn University for having invited me to stay at his institute and do this work. I also thank Dr. D.P. Williamson for useful comments. This work was supported in part by the 21st Century COE Program: Research on Security and Reliability in Electronic Society, a Grant in Aid for Scientific Research of the Ministry of Education, Science, Sports and Culture of Japan, The Institute of Science and Engineering of Chuo University, and The Telecommunications Advancement Foundation.

References
1. T. Asano and D.P. Williamson, Improved approximation algorithms for MAX SAT, Journal of Algorithms 42, pp. 173-202, 2002.
2. U. Feige and M.X. Goemans, Approximating the value of two prover proof systems, with applications to MAX 2SAT and MAX DICUT, in Proc. 3rd Israel Symposium on Theory of Computing and Systems, pp. 182-189, 1995.
3. M.X. Goemans and D.P. Williamson, New 3/4-approximation algorithms for the maximum satisfiability problem, SIAM Journal on Discrete Mathematics 7, pp. 656-666, 1994.
4. M.X. Goemans and D.P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM 42, pp. 1115-1145, 1995.
5. J. Håstad, Some optimal inapproximability results, in Proc. 28th ACM Symposium on the Theory of Computing, pp. 1-10, 1997.
6. E. Halperin and U. Zwick, Approximation algorithms for MAX 4-SAT and rounding procedures for semidefinite programs, Journal of Algorithms 40, pp. 184-211, 2001.
7. D.S. Johnson, Approximation algorithms for combinatorial problems, Journal of Computer and System Sciences 9, pp. 256-278, 1974.
8. H. Karloff and U. Zwick, A 7/8-approximation algorithm for MAX 3SAT?, in Proc. 38th IEEE Symposium on the Foundations of Computer Science, pp. 406-415, 1997.
9. U. Zwick, Outward rotations: a tool for rounding solutions of semidefinite programming relaxations, with applications to MAX CUT and other problems, in Proc. 31st ACM Symposium on the Theory of Computing, pp. 679-687, 1999.

Certifying Unsatisfiability of Random 2k-SAT Formulas Using Approximation Techniques

Amin Coja-Oghlan¹, Andreas Goerdt², André Lanka², and Frank Schädlich²

¹ Humboldt-Universität zu Berlin, Institut für Informatik, Unter den Linden 6, 10099 Berlin, Germany, coja@informatik.hu-berlin.de
² Technische Universität Chemnitz, Fakultät für Informatik, Straße der Nationen 62, 09107 Chemnitz, Germany, {goerdt,lanka,frs}@informatik.tu-chemnitz.de

Abstract. Let k be an even integer.
We investigate the applicability of approximation techniques to the problem of deciding whether a random k-SAT formula is satisfiable. Let n be the number of propositional variables under consideration. First we show that if the number m of clauses satisfies m ≥ Cn^{k/2} for a certain constant C, then unsatisfiability can be certified efficiently using (known) approximation algorithms for MAX CUT or MIN BISECTION. In addition, we present an algorithm based on the Lovász ϑ function that within polynomial expected time decides whether the input formula is satisfiable, provided m ≥ Cn^{k/2}. These results improve previous work by Goerdt and Krivelevich [14]. Finally, we present an algorithm that approximates random MAX 2-SAT within expected polynomial time.

1 Introduction

The k-SAT problem is to decide whether a given k-SAT formula is satisfiable or not. Since it is well-known that the k-SAT problem is NP-complete for k ≥ 3, it is natural to ask for algorithms that can handle random formulas efficiently. Given a set of n propositional variables and a function c = c(n), a random k-SAT instance is obtained by picking c k-clauses over the set of n variables uniformly at random and independently of each other. Part of the recent interest in random k-SAT is due to the interesting threshold behavior: there exist values c_k = c_k(n) such that random k-SAT instances with at most (1 − ε)·c_k·n random clauses are satisfiable with high probability, whereas for at least (1 + ε)·c_k·n random clauses we have unsatisfiability with high probability. (Here, "with high probability" or "whp." means "with probability tending to 1 as n, the number of variables, tends to infinity".) In particular, according to current knowledge, c_k = c_k(n) lies in a bounded interval depending on k only. However, it is not known whether the threshold really is a constant independent of n, cf. [10]. In this paper, we are concerned with values of c(n) well above the threshold, and the problem is to certify efficiently that a random formula is unsatisfiable.

There are two different types of algorithms for deciding whether a random k-SAT formula is satisfiable or not. First, there are algorithms that on any input formula have a polynomial running time, and that whp. give the correct answer, "satisfiable" or "unsatisfiable". However, with probability o(1), the algorithm may give an inconclusive answer. Hence, the algorithm never makes an incorrect decision. We shall refer to algorithms of this type as efficient certification algorithms. Note that the trivial constant time algorithm always returning "unsatisfiable" is not an efficient certification algorithm in our sense, because it gives an incorrect answer in some (rare) cases. Secondly, there are algorithms that always answer correctly (either "satisfiable" or "unsatisfiable"), and that applied to a random formula have a polynomial expected running time. Let us emphasize that although an efficient certification algorithm may give an inconclusive answer in some (rare) cases, such an algorithm is still complete in the following sense. Given a random k-SAT instance such that the number of clauses is above the satisfiability threshold, whp. the algorithm will indeed give the correct answer ("unsatisfiable" in the present case).
Note that no polynomial time algorithm can answer "unsatisfiable" on all unsatisfiable inputs; completeness only refers to a subset whose probability tends to 1. Any certification algorithm can be turned into a satisfiability algorithm that answers correctly on any input, simply by invoking an enumeration procedure in case the efficient certification procedure gives an inconclusive answer. However, an algorithm obtained in this manner will not in general run in polynomial expected time, for the probability of an inconclusive answer may be too large (even though it is o(1)). Thus, asking for polynomial expected running time is a rather strong requirement.

From [11] and [14] it is essentially known that for random k-SAT instances with poly(log n)·n^{k/2} clauses we can efficiently certify unsatisfiability, in the case of even k. For odd k we need n^{(k/2)+ε} random clauses. Hence, it is an obvious problem to design algorithms that can certify unsatisfiability of random formulas efficiently for smaller numbers of clauses than given in [11,14]. To make further progress on this question, new techniques seem to be necessary. Therefore, in this paper we investigate what various algorithmic techniques contribute to the random k-SAT problem. We achieve some improvements for the case of even k, removing the polylogarithmic factor and obtaining an algorithm with a polynomial expected running time.

Based on reductions from 4-SAT instances to instances of graph theoretic optimization problems, we obtain efficient certification algorithms applying known approximation algorithms for the case of at least C·n² 4-clauses. Similar constructions involving approximation algorithms can be found in [6] or [13]. We present two different certification algorithms. One applies the MAX CUT approximation algorithm of Goemans and Williamson [12]. The other one employs the MIN BISECTION approximation algorithm of Feige and Krauthgamer [8]. Since the MAX CUT approximation algorithm is based on semidefinite programming, our first algorithm is not purely combinatorial. In contrast, the application of the MIN BISECTION algorithm yields a combinatorial algorithm.
We state our result only for k = 4, but it seems to be only a technical matter to extend it to arbitrary even numbers k and C·n^{k/2} clauses. Moreover, we obtain the first algorithm for deciding satisfiability of random k-SAT formulas with at least C·n^{k/2} random clauses in expected polynomial time (k even). Indeed, the algorithm can even handle semirandom formulas, cf. Sec. 4 for details. Since the algorithm is based on computing the Lovász number ϑ, it is not purely combinatorial. The analysis is based on a recent estimate on the probable value of the ϑ-function of sparse random graphs [4]. The paper [2] is also motivated by improving the n^{k/2} barrier. Further, in [9] another algorithm is given that certifies unsatisfiability of random 2k-SAT formulas consisting of at least Cn^{k/2} clauses with probability tending to 1 as C → ∞.

Though the decision version of the 2-SAT problem ("given a 2-SAT formula, is there a satisfying assignment?") can be solved in polynomial time, the optimization version MAX 2-SAT ("given a 2-SAT formula, find an assignment that satisfies the maximum number of clauses") is NP-hard. Therefore, we present an algorithm that approximates MAX 2-SAT in expected polynomial time. The algorithm is based on a probabilistic analysis of Goemans' and Williamson's semidefinite relaxation of MAX 2-SAT [12]. Concerning algorithms for worst case instances, cf. [7].

In Section 2 we give our certification algorithms and in Section 3 we state the theorem crucial for their correctness. Section 4, which is independent of Sections 2 and 3, deals with the expected polynomial time algorithm. Finally, in Section 5 we consider the MAX 2-SAT problem.

2 Efficient Certification of Unsatisfiability

Given a set of n propositional variables, Var = Var_n = {v_1, ..., v_n}, a literal over Var is a variable v_i or a negated variable ¬v_i. A k-clause is an ordered k-tuple l_1 ∨ l_2 ∨ ... ∨ l_k of literals such that the variables underlying the literals are distinct. A k-SAT instance is a set of k-clauses. We think of a k-SAT instance as C_1 ∧ C_2 ∧ ... ∧ C_m where each C_i is a k-clause. Given a truth value assignment a of Var, we can assign true or false to a k-SAT instance as usual. We let T_a be the set of variables x with a(x) = true and F_a the set of variables x with a(x) = false.

The probability space Form_{n,k,p} is the probability space of k-SAT instances obtained by picking each k-clause with probability p, independently. A k-uniform hyperedge, or simply k-tuple, over the vertex set V is a vector (x_1, x_2, ..., x_k) where the x_i ∈ V are all distinct. H = (V, E) is a k-uniform hypergraph if E is a set of k-tuples over the vertex set V. In the context of k-uniform hypergraphs we use the notion of type in the following sense: for X_1, X_2, ..., X_k ⊆ V, a k-tuple (x_1, x_2, ..., x_k) is of type (X_1, X_2, ..., X_k) if we have x_i ∈ X_i for all i. A random hypergraph H ∈ HG_{n,k,p} is obtained by picking each of the possible (n)_k k-tuples with probability p, independently.

Let S be a set of k-clauses over the set of variables Var, as defined above. The hypergraph H = (V, E) associated to S is defined by V = Var and (x_1, x_2, x_3, ..., x_k) ∈ E if and only if there is a k-clause l_1 ∨ l_2 ∨ ... ∨ l_k ∈ S such that for all i, l_i = x_i or l_i = ¬x_i. In the case of even k, the graph G = (V, E) associated to S is defined by V = {(x_1, ..., x_{k/2}) | x_i ∈ Var and x_i ≠ x_j for i ≠ j} and {(x_1, x_2, ..., x_{k/2}), (x_{(k/2)+1}, ..., x_k)} ∈ E if and only if there is a k-clause l_1 ∨ l_2 ∨ ... ∨ l_k ∈ S such that the variable underlying l_i is x_i.

The following asymptotic abbreviations are used: f(n) ∼_s g(n) iff there is an ε > 0 such that f(n) = g(n)·(1 + O(1/n^ε)). Here ∼_s stands for strong asymptotic equality. Similarly, we use f(n) = so(g(n)) iff f(n) = O(1/n^ε)·g(n). We say f(n) is negligible iff f(n) = so(1).

Parity properties analogous to the next theorem have been proved in [6] for 3-SAT instances with a linear number of clauses and in [13] for 4-SAT instances. But in the proof of [13] it is important that the probability of each clause is p ≤ 1/n^{2+ε} where ε > 0 is a constant. This implies that the number of occurrences of two given literals in several clauses of a random formula is small. This is no longer the case for p = C/n², and some complications arise.
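For k = 4, the graph associated to a clause set S is simple to build explicitly. The following sketch (our illustration; the representation of clauses as tuples of signed integers is an assumption of the sketch, not of the paper) constructs the projection graph on pairs of variables used by the certification algorithms below:

```python
# Sketch: the graph G associated to a set S of 4-clauses (k = 4, so k/2 = 2).
# A clause is a tuple of 4 signed literals, e.g. (1, -2, 3, 4) for
# x1 v ~x2 v x3 v x4; vertices of G are ordered pairs of distinct variables.
from collections import defaultdict

def associated_graph(S):
    """Return the multigraph on variable pairs: one labelled edge per clause,
    joining (var(l1), var(l2)) and (var(l3), var(l4))."""
    edges = defaultdict(list)
    for clause in S:
        v = [abs(l) for l in clause]             # underlying variables
        u, w = (v[0], v[1]), (v[2], v[3])        # the two k/2-tuples
        edges[frozenset((u, w))].append(clause)  # label edges by the clause
    return edges

S = [(1, 2, 3, 4), (-1, 2, 3, 4), (2, 1, 4, 3)]
for edge, labels in associated_graph(S).items():
    print(sorted(edge), "from clauses", labels)
```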
Theorem 1 (Parity Theorem). For a random F ∈ Form_{n,4,p} where p = C/n² and C is a sufficiently large constant, we can efficiently certify the following properties.
(a) Let S ⊆ F be the subset of all clauses of F corresponding to one of the 16 possibilities of placing negated and non-negated variables into the four slots of clauses available. Let G = (V, E) be the graph associated to S. Then |S| = C·n²·(1 + so(1)) and |E| = C·n²·(1 + so(1)).
(b) For all satisfying assignments a of F we have that |T_a| ∼_s (1/2)·n and |F_a| ∼_s (1/2)·n.
(c) Let S be the set of clauses of F consisting only of non-negated variables. Let H be the hypergraph associated to S. For all satisfying assignments a of F, the number of 4-tuples of H of each of the 8 types (T_a, T_a, T_a, F_a), (T_a, T_a, F_a, T_a), (T_a, F_a, T_a, T_a), (F_a, T_a, T_a, T_a), (F_a, F_a, F_a, T_a), (F_a, F_a, T_a, F_a), (F_a, T_a, F_a, F_a), (T_a, F_a, F_a, F_a) is (1/8)·C·n²·(1 + so(1)). The same statement applies when S is one of the remaining seven subsets of clauses of F which have a given even number of negated variables in a given subset of the four slots available.
(d) Let H be the hypergraph associated to those clauses of F whose first slot contains a negated variable and whose remaining three slots contain non-negated variables. The number of 4-tuples of H of each of the 8 types (T_a, T_a, T_a, T_a), (T_a, T_a, F_a, F_a), (T_a, F_a, T_a, F_a), (T_a, F_a, F_a, T_a), (F_a, F_a, F_a, F_a), (F_a, F_a, T_a, T_a), (F_a, T_a, F_a, T_a), (F_a, T_a, T_a, F_a) is (1/8)·C·n²·(1 + so(1)). A statement analogous to (c) applies.

The technical notion of the type of a 4-tuple of a hypergraph is defined above. Statement (b) means that there is an ε > 0 such that we can certify that all assignments a with |T_a| ≥ (1/2)·n·(1 + 1/n^ε) or |F_a| ≥ (1/2)·n·(1 + 1/n^ε) do not satisfy a random F. Similarly for the remaining statements. Of course, probabilistically there should be no satisfying assignment.

Given a graph G = (V, E), a cut is a partition of V into two subsets V_1 and V_2. The MAX CUT problem is the problem of maximizing the number of crossing edges, that is, the number of edges with one endpoint in V_1 and the other endpoint in V_2. There is a polynomial time approximation algorithm which, given G, finds a cut such that the number of crossing edges is guaranteed to be at least 0.87·Opt(G), see [12]. Note that the algorithm is deterministic.

Algorithm 2. Certifies unsatisfiability. The input is a 4-SAT instance F.
1. Certify the properties as stated in Theorem 1.
2. Let S be the subset of all clauses of F containing only non-negated variables. We construct the graph G = (V, E) as defined above, associated to S.
3. Apply the MAX CUT approximation algorithm to G.
4. If the cut found in 3. contains at most 0.86·|E| edges, the output is "unsatisfiable"; otherwise the algorithm gives an inconclusive answer.

Theorem 3. When applying Algorithm 2 to an F ∈ Form_{n,4,p} where p = C/n² and C is sufficiently large, the algorithm efficiently certifies the unsatisfiability of F.

Proof. To show that the algorithm is correct, let F be any satisfiable 4-SAT instance. Let a be a satisfying truth value assignment of F. Then Theorem 1 (c) implies that G has a cut comprising almost all edges, and the approximation algorithm finds sufficiently many edges, so that we do not answer "unsatisfiable". Completeness follows from Theorem 1 (c) and the fact that when C is a sufficiently large constant, any cut of G has at most slightly more than a fraction of 1/2 of all edges with high probability. ⊓⊔

At this point we know that an algorithm efficiently certifying unsatisfiability exists, because there exist suitable so(1)-terms, as we know from our theorems and considerations.
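The shape of Algorithm 2 can be summarized in code as follows (our sketch, reusing associated_graph from the sketch above; max_cut_approx is a placeholder for the deterministic 0.87-approximation algorithm of [12], which we do not reimplement):

```python
# Sketch of Algorithm 2: certify unsatisfiability of a random 4-SAT instance.
def certify_unsat_maxcut(F, max_cut_approx):
    S = [c for c in F if all(l > 0 for l in c)]       # all-positive clauses
    edges = associated_graph(S)                        # graph G as defined above
    m = sum(len(labels) for labels in edges.values())  # labelled edge count
    cut_size = max_cut_approx(edges)                   # crossing edges found
    if cut_size <= 0.86 * m:
        return "unsatisfiable"
    return "inconclusive"
```

The 0.86 threshold works because, by Theorem 1 (c), a satisfying assignment would induce a cut containing almost all edges, which the 0.87-approximation would then find; on a random instance every cut contains only about half of the edges, so the test fires whp.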
Given a graph G = (V, E) where |V| is even, a bisection of G is a partition of V into two subsets V_1 and V_2 with |V_1| = |V_2| = |V|/2. The MIN BISECTION problem is the problem of minimizing the number of crossing edges. There is a polynomial time approximation algorithm which, given G, finds a bisection such that the number of crossing edges is guaranteed to be at most O((log n)²)·Opt(G), where |V| = n, see [8].

Algorithm 4. Certifies unsatisfiability. The input is a 4-SAT instance F.
1. Certify the properties as stated in Theorem 1.
2. Let S be the subset of all clauses of F whose first literal is a negated variable and whose remaining literals are non-negated variables. We construct the graph G = (V, E) associated to this set S. Check if the maximal degree of G is at most 3·ln n.
3. Apply the MIN BISECTION approximation algorithm to G.
4. If the bisection found contains at least (1/3)·|E| edges, then the output is “unsatisfiable”, otherwise inconclusive.

Theorem 3 applies analogously to Algorithm 4. Now the proof relies on Theorem 1 (d).

3 Proof of the Parity Theorem

We present the algorithms to prove Theorem 1. To deal with the problem of multiple occurrences of pairs of variables in several clauses we need to work with labelled (multi-)graphs and labelled (multi-)hypergraphs. Here the edges between vertices are distinguished by labels. Let H = (V, E) be a standard 4-uniform hypergraph. When speaking of the projection of H onto coordinates 1 and 2 we think of H as a labelled multigraph in which the edge {x_1, x_2} with label (x_1, x_2, x_3, x_4) is present if and only if (x_1, x_2, x_3, x_4) ∈ E. We denote this projection by G = (V, E). Let e = |E|, V = {1, …, n}, X ⊆ V, and Y = V \ X. We denote the number of labelled edges of G with one endpoint in X and the other endpoint in Y by e(X, Y). Similarly, e(X) is the number of labelled edges with both endpoints in X. In an asymptotic setting we use our terminology from Section 2 and say that G has negligible discrepancy iff for all X ⊆ V with |X| = α·n, where β ≤ α ≤ 1 − β, and Y = V \ X we have e(X) ∼_s e·α² and e(X, Y) ∼_s 2·e·α·(1 − α). Here β > 0 is a constant. This extends the discrepancy notion from page 71ff. of [3] to multigraphs.

The n×n-matrix A = A_G is the adjacency matrix of G, where A(x, y) is the number of labelled edges between x and y. As A is real valued and symmetric, A has n eigenvectors and corresponding real eigenvalues, which we consider ordered as λ_{1,A} ≥ λ_{2,A} ≥ … ≥ λ_{n,A}. We let λ = λ_A = max_{2≤i≤n} |λ_{i,A}|. In an asymptotic context we speak of strong eigenvalue separation with respect to a constant k. By this we mean that Σ_{i=2}^n λ_{i,A}^k = so(λ_{1,A}^k). When k is even and constant, strong eigenvalue separation implies in particular that λ^k = so(λ_{1,A}^k). It is known that for any k ≥ 0, Trace(A^k) = Σ_{x=1}^n A^k(x, x) = Σ_{i=1}^n λ_{i,A}^k. Moreover, Trace(A^k) is equal to the number of closed walks of length k, that is, of k steps, in G. The degree d_x of the vertex x in G is the number of labelled edges in which x occurs. The n×n-matrix L = L_G is a normalized adjacency matrix; it is related to the Laplacian matrix. We have L(x, y) = A(x, y)/√(d_x·d_y). As L = L_G is real valued and symmetric, too, we use all the eigenvalue notation introduced for A analogously for L. Here λ_{1,L} = 1 is known. Let d = d(n) be given. In an asymptotic context we say that G is almost d-regular if for any vertex x of G we have d_{x,G} = d(n)·(1 + so(1)). Theorem 5.1 and its corollaries on pages 72–73 of [3] imply the following fact.
Fact 5. Let G = (V, E), where V = {1, …, n}, be a projection onto two coordinates of the 4-uniform hypergraph H = (V, E) with e = |E|. Let G be almost d-regular, let β ≤ α ≤ 1 − β where β > 0 is a constant, and let X ⊆ V with |X| = αn. Then we have, for Y = V \ X,
(a) |e(X) − e·α²| ≤ λ_L·e·α·(1 + so(1)),
(b) |e(X, Y) − 2·e·α·(1 − α)| ≤ λ_L·2·e·α·(1 − α)·(1 + so(1)).

We need methods to estimate λ_L; they are provided by the next lemma.

Lemma 6. Let G be the projection onto two given coordinates of the 4-uniform hypergraph H = (V, E) where V = {1, …, n}. If G is almost d-regular and A_G has strong eigenvalue separation with respect to a given constant k, then L_G has strong eigenvalue separation with respect to k.

Proof. Let W be the number of closed walks of length k in G. Then W = Trace(A^k) and Trace(L_G^k) = Σ_{x=1}^n L_G^k(x, x). An inductive argument shows that Trace(L_G^k) ≤ W·(1/d)^k·(1 + so(1)). Then we get Σ_{i=1}^n λ_{i,L_G}^k ≤ (Σ_{i=1}^n λ_{i,A_G}^k)·(1/d)^k·(1 + so(1)). As λ_{1,L_G} = 1, whereas λ_{1,A_G}^k = d^k·(1 + so(1)), we get that Σ_{i=2}^n λ_{i,L_G}^k = so(1). Note that λ_{1,A} is always at most the maximal degree of G and at least the minimal degree. □

We collect some probabilistic properties of labelled projections when H is a random hypergraph. The proof follows known principles.

Lemma 7. Let p = c/n² where c is a sufficiently large constant and let H = (V, E) be a hypergraph from HG_{n,4,p}. Let G = (V, E) be a labelled projection of H onto two coordinates.
(a) Let d = d(n) = 2·c·n. Then G is almost d-regular with probability at least 1 − e^{−Ω(n^ε)} for a constant ε > 0.
(b) The adjacency matrix A = A_G has strong eigenvalue separation with respect to k = 4 with high probability.

Algorithm 8. Efficiently certifies negligible discrepancy, with respect to a given constant β, of projection graphs. The input is a 4-uniform hypergraph H = (V, E). Let G = (V, E) be the projection onto two given coordinates of H. Check almost d-regularity of G, and check for the adjacency matrix A of G whether Trace(A⁴) = d⁴·(1 + so(1)).

The correctness of the algorithm follows from Fact 5; the completeness, when considering HG_{n,4,p} where p = C/n² and C is sufficiently large, follows from Lemma 7 and Fact 5.

We need to certify discrepancy properties of projections onto 3 given coordinates of a random 4-uniform hypergraph from HG_{n,4,p} where p = c/n². Let H = (V, E) be a standard 4-uniform hypergraph. When speaking of the projection of H onto coordinates 1, 2, and 3, we think of H as a labelled 3-uniform hypergraph G = (V, E) in which the 3-tuple (x_1, x_2, x_3) with label (x_1, x_2, x_3, x_4) is present if (x_1, x_2, x_3, x_4) ∈ E. We restrict attention to the projection onto coordinates 1, 2, and 3 in the following. For X, Y, Z ⊆ V we define e_G(X, Y, Z) = |{(x, y, z, −) ∈ E | (x, y, z) is of type (X, Y, Z)}|. For the notion of type we refer to the beginning of Section 2. With n = |V| and e = |E| we say that the projection G has negligible discrepancy with respect to β if for all X with |X| = αn, β ≤ α ≤ 1 − β, and Y = V \ X we have that e_G(X, X, X) ∼_s α³·e, e_G(X, Y, X) ∼_s α²·(1 − α)·e, and analogously for the remaining 6 possibilities of placing X and Y. For 1 ≤ i ≤ 3 and x ∈ V we let d_{x,i} be the number of 4-tuples in E which have x in the i-th slot. Given d = d(n), we say that G is almost d-regular if and only if d_{x,i} = d·(1 + so(1)) for all x ∈ V and all i = 1, 2, 3.
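Returning to Algorithm 8 above: its spectral check is straightforward to express with numpy. The sketch below is our own, with a tolerance chosen arbitrarily since the so(1)-terms are not made explicit here; it builds the labelled projection of a 4-uniform hypergraph onto two coordinates and tests the trace condition.

```python
import numpy as np

def projection_trace_check(hyperedges, n, i, j, d, tol=0.1):
    """Algorithm 8 sketch: labelled projection of a 4-uniform hypergraph
    onto coordinates i and j (0-based), almost-d-regularity check, and
    the test Trace(A^4) ~ d^4. Vertices are 0..n-1; `tol` is an ad hoc
    stand-in for the unspecified so(1) slack."""
    A = np.zeros((n, n))
    for t in hyperedges:            # each labelled 4-tuple gives one edge
        A[t[i], t[j]] += 1
        A[t[j], t[i]] += 1
    degrees = A.sum(axis=1)         # d_x = number of labelled edges at x
    regular = np.all(np.abs(degrees - d) <= tol * d)
    trace4 = np.trace(np.linalg.matrix_power(A, 4))
    return regular and abs(trace4 - d**4) <= tol * d**4
```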
We assign labelled product graphs to G.

Definition 9 (Labelled product). Let G = (V, E) be the projection onto coordinates 1, 2, and 3 of the 4-uniform hypergraph H = (V, E). The labelled product of G with respect to the first coordinate is the labelled graph P = (W, F), where W = V × V and F is defined as follows: for x_1, x_2, y_1, y_2 ∈ V with (x_1, y_1) ≠ (x_2, y_2) we have {(x_1, y_1), (x_2, y_2)} with label (h, k) in F iff h = (z, x_1, x_2, −) ∈ E, k = (z, y_1, y_2, −) ∈ E, and h ≠ k.

If the projection G is almost d-regular, the number of labelled edges of the product is n·d²·(1 + so(1)), provided d ≥ n^ε for a constant ε > 0. Discrepancy notions for labelled products are totally analogous to those for labelled projection graphs defined above. Theorem 10 is an adaptation of Theorem 3.2 in [13].

Theorem 10. Let ε > 0 and d = d(n) ≥ n^ε. Let G = (V, E) with |V| = n be the labelled projection hypergraph onto coordinates 1, 2, and 3 of the 4-uniform hypergraph H = (V, E). Assume that G and H have the following properties.
1. G is almost d-regular.
2. The labelled projection graphs of H onto any two of the coordinates 1, 2, and 3 have negligible discrepancy with respect to β > 0.
3. The labelled products of G have negligible discrepancy with respect to β².
Then the labelled projection G has negligible discrepancy with respect to β.

Lemma 11. Let H = (V, E) be a random hypergraph from HG_{n,4,p} where p = c/n² and c is sufficiently large. Let G be the labelled projection of H onto the coordinates 1, 2, and 3. Let P = (W, F) be the labelled product with respect to the first coordinate of G. Then we have
(a) P is almost d-regular, where d = 2·c²·n, with probability 1 − n^{−Ω(log log n)}.
(b) The adjacency matrix A_P has strong eigenvalue separation with respect to k = 6.

Proof. (a) We consider the vertex (x_1, y_1) ∈ W. First, assume that x_1 ≠ y_1. We introduce the random variables X_z = |{(z, x_1, −, −) ∈ E}|, Y_z = |{(z, y_1, −, −) ∈ E}|, X′_z = |{(z, −, x_1, −) ∈ E}|, Y′_z = |{(z, −, y_1, −) ∈ E}|, and finally D = Σ_z X_z·Y_z + Σ_z X′_z·Y′_z. Then D is the degree of the vertex (x_1, y_1) in the labelled product. The claim follows with Hoeffding's bound [16], page 104, Theorem 7. For x_1 = y_1 we can argue similarly.
(b) Applying standard techniques we get E[Trace(A_P⁶)] = (2c²n)⁶ + so(n⁶), which together with (a) implies strong eigenvalue separation with respect to k = 6 with high probability. □

Algorithm 12. Certifies negligible discrepancy of labelled projections onto 3 coordinates of 4-uniform hypergraphs. The input is a 4-uniform hypergraph H = (V, E). Let G = (V, E) be the projection of H onto the coordinates 1, 2, and 3.
1. Check if there is a suitable d such that G is almost d-regular, that is, check if d_{x,i} = d·(1 + so(1)) for all vertices x and all i = 1, 2, 3.
2. Check if the labelled projections onto any two of the coordinates 1, 2, 3 of H have negligible discrepancy. Apply Algorithm 8.
3. Check if the products of G are almost d-regular with d = 2c²n.
4. For each of the 3 labelled products P of G check if Trace(A_P⁶) = (2c²n)⁶·(1 + so(1)), where A_P is the adjacency matrix of P.
5. Certification for G is successful iff all checks are positive.

Correctness of the algorithm follows with Theorem 10. Completeness for HG_{n,4,p} with p = C/n² and C sufficiently large follows with Theorem 10 and Lemma 11, whose proof shows that the property concerning the trace holds with high probability and implies strong eigenvalue separation. Now we can prove Theorem 1. Theorem 1 (a) is trivial.
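Steps 3 and 4 of Algorithm 12 also translate into a few lines of numpy; again this is our own sketch, with an arbitrary tolerance standing in for the so(1)-terms.

```python
import numpy as np
from collections import defaultdict

def labelled_product_trace_check(hyperedges, n, c, tol=0.1):
    """Algorithm 12, steps 3-4, for the product w.r.t. the first
    coordinate: build the labelled product P of the projection onto
    coordinates 1,2,3 and test Trace(A_P^6) ~ (2c^2 n)^6."""
    by_first = defaultdict(list)        # z -> labelled tuples (z, a, b, -)
    for t in hyperedges:
        by_first[t[0]].append(t)
    idx = {}                            # vertex (x, y) of W -> matrix index
    def ind(v):
        return idx.setdefault(v, len(idx))
    entries = defaultdict(int)
    for z, group in by_first.items():
        for h in group:                 # h = (z, x1, x2, -), k = (z, y1, y2, -)
            for k in group:
                if h != k and (h[1], k[1]) != (h[2], k[2]):
                    entries[(ind((h[1], k[1])), ind((h[2], k[2])))] += 1
    m = len(idx)
    A = np.zeros((m, m))
    for (u, v), w in entries.items():   # symmetric multigraph adjacency
        A[u, v] += w
        A[v, u] += w
    d = 2 * c**2 * n
    trace6 = np.trace(np.linalg.matrix_power(A, 6))
    return abs(trace6 - d**6) <= tol * d**6
```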
Concerning Theorem 1 (b) we consider the following algorithm.

Algorithm 13. Certifies Theorem 1 (b). The input is a 4-SAT instance F. Let H = (V, E) be the hypergraph associated to the subset of clauses which consist of non-negated variables only.
1. Check that the labelled projection of H onto coordinates 1, 2, 3 has negligible discrepancy.
2. Check that the labelled projection of H onto coordinates 2, 3, 4 has negligible discrepancy.
3. Do the same as 1. and 2. for the hypergraph associated to the clauses consisting only of negated variables.
4. If all checks have been successful, certify Theorem 1 (b).

Let F be any 4-SAT instance such that the algorithm is successful. Let a be an assignment with |F_a| ≥ (1/2)·n·(1 + δ), where δ = δ(n) > 0 is not negligible in the sense of Section 2 (for example δ = 1/log n). From Step 1 we know that the fraction of 4-tuples of H of type (F_a, F_a, F_a, −) is ((1/2)·(1 + δ))³·(1 + so(1)). Under the assumption that a satisfies F, the empty slot is filled with a variable from T_a. From Step 2 we know that the fraction of 4-tuples of H of type (−, F_a, F_a, T_a) is ((1/2)·(1 + δ))²·(1/2)·(1 − δ). As δ is not negligible, this contradicts negligible discrepancy of the labelled projection onto coordinates 2, 3, and 4 of H. In the same way we can exclude assignments with more variables set to true than false, because Step 3 is successful. Therefore the algorithm is correct. For random F the hypergraphs constructed are random hypergraphs, and the completeness of Algorithm 12 implies the completeness of the algorithm.

Concerning Theorem 1 (c) we consider the following algorithm.

Algorithm 14. Certifies parity properties. The input is a 4-SAT instance F.
1. Invoke Algorithm 13.
2. Let H be the hypergraph associated to the clauses of F consisting only of non-negated variables.
3. Certify that all 4 labelled projections onto any 3 different coordinates of H have negligible discrepancy (with respect to a suitable β > 0).
4. Certify that all 6 labelled projections onto any two coordinates of H have negligible discrepancy.
5. Certify Theorem 1 (c) if all preceding checks are successful.

Correctness and completeness follow similarly as for the preceding algorithm. Those cases of Theorem 1 which are left open by now can be treated analogously, and the Parity Theorem is proved.

4 Deciding Satisfiability in Expected Polynomial Time

Let Var = Var_n = {x_1, …, x_n} be a set of variables, and let Form_{n,k,m} denote a k-SAT formula chosen uniformly at random among all (2n)^{k·m} possibilities. Further, we consider semirandom formulas Form⁺_{n,k,m}, which are made up of a random share and a worst case part added by an adversary:
1. Choose F_0 = C_1 ∧ … ∧ C_m = Form_{n,k,m} at random.
2. An adversary picks any formula F = Form⁺_{n,k,m} over Var in which at least one copy of each C_i, i = 1, …, m, occurs.
Note that in general we cannot reconstruct F_0 from F. We say that an algorithm A has a polynomial expected running time applied to Form⁺_{n,k,m} if the expected running time remains bounded by a polynomial in the input length regardless of the decisions of the adversary.

Theorem 15. Let k ≥ 4 be an even integer. Suppose that m ≥ C·2^k·n^{k/2} for some sufficiently large constant C > 0. There exists an algorithm DecideSAT that satisfies the following conditions.
1. Let F be any k-SAT instance over Var. If F is satisfiable, then DecideSAT(F) will find a satisfying assignment. Otherwise DecideSAT(F) will output “unsatisfiable”.
2. Applied to Form⁺_{n,k,m}, DecideSAT runs in polynomial expected time.

DecideSAT exploits the following connection between the k-SAT problem and the maximum independent set problem. Let V = {1, …, n}^{k/2} and ν = n^{k/2}. Given any k-SAT instance F over Var_n we define two graphs G_F = (V, E_F), G′_F = (V, E′_F) as follows. We let {(v_1, …, v_{k/2}), (w_1, …, w_{k/2})} ∈ E_F iff the k-clause x_{v_1} ∨ … ∨ x_{v_{k/2}} ∨ x_{w_1} ∨ … ∨ x_{w_{k/2}} occurs in F. Similarly, {(v_1, …, v_{k/2}), (w_1, …, w_{k/2})} ∈ E′_F iff the k-clause ¬x_{v_1} ∨ … ∨ ¬x_{v_{k/2}} ∨ ¬x_{w_1} ∨ … ∨ ¬x_{w_{k/2}} occurs in F. Let α(G) denote the independence number of a graph G.

Lemma 16 ([14]). If F is satisfiable, then max{α(G_F), α(G′_F)} ≥ 2^{−k/2}·n^{k/2}.

Let G_{ν,µ} denote a graph with ν vertices and µ edges, chosen uniformly at random. We need the following slight extension of a lemma from [14].

Lemma 17. Let F ∈ Form_{n,k,m} be a random formula.
1. Conditioned on |E(G_F)| = µ, the graph G_F is uniformly distributed, i.e. G_F = G_{ν,µ}. A similar statement holds for G′_F.
2. Let ε > 0. Suppose that 2^k·n^{k/2} ≤ m ≤ n^{k−1}. Then with probability at least 1 − exp(−Ω(m)) we have min{|E(G_F)|, |E(G′_F)|} ≥ (1 − ε)·2^{−k}·m.

Thus, our next aim is to bound the independence number of a semirandom graph efficiently. Let 0 ≤ µ ≤ ν(ν−1)/2. The semirandom graph G⁺_{ν,µ} is produced in two steps: first, choose a random graph G_0 = G_{ν,µ}; then, an adversary adds to G_0 arbitrary edges, thereby completing G = G⁺_{ν,µ}. We employ the Lovász number ϑ, which can be seen as a semidefinite programming relaxation of the independence number. Indeed, ϑ(G) ≥ α(G) for any graph G, and ϑ(G) can be computed in polynomial time [15]. Our algorithm DecideMIS, which will output “typical” if the independence number of the input graph is “small”, and “not typical” otherwise, is based on ideas invented in [4,5].

Algorithm 18. DecideMIS(G, µ)
Input: A graph G of order ν, and a number µ. Output: “typical” or “not typical”.
1. If ϑ(G) ≤ C′·ν·(2µ)^{−1/2}, then terminate with output “typical”. Here C′ denotes some sufficiently large constant.
2. If there is no subset S of V, |S| = 25·ln(µ/ν)·ν/µ, such that |V \ (S ∪ N(S))| > 12·ν·(2µ)^{−1/2}, then output “typical” and terminate.
3. Check whether in G there is an independent set of size 12·ν·(2µ)^{−1/2}. If this is not the case, then output “typical”. Otherwise, output “not typical”.

Proposition 19. For any G, if DecideMIS(G, µ) outputs “typical”, then we have α(G) ≤ C′·ν·(2µ)^{−1/2}. Moreover, the probability that DecideMIS(G⁺_{ν,µ}, µ) outputs “not typical” is < exp(−ν). Applied to G⁺_{ν,µ}, DecideMIS has a polynomial expected running time, provided µ ≥ C″·ν for some constant C″ > 0.

Proof. The proof goes along the lines of [5] and is based on the following facts (cf. [4]): with high probability we have ϑ(G_{ν,µ}) ≤ c_1·ν·(2µ)^{−1/2}. Moreover, if M is a median of ϑ(G_{ν,µ}), and if ξ > 0, then Prob[ϑ(G_{ν,p}) ≥ M + ξ] ≤ 30·ν·exp(−ξ²/(5M + 10ξ)). To handle G⁺_{ν,µ}, we make use of the monotonicity of ϑ (cf. [15]). □

Algorithm 20. DecideSAT(F)
Input: A k-SAT formula F over Var_n. Output: Either a satisfying assignment of F or “unsatisfiable”.
1. Let µ = 2^{−k−1}·m. If both DecideMIS(G_F, µ) and DecideMIS(G′_F, µ) answer “typical”, then terminate with output “unsatisfiable”.
2. Enumerate all 2^n assignments and look for a satisfying one.

Thus, Theorem 15 follows from Lemmas 16 and 17 and Proposition 19.
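The reduction to independent sets is easy to code. Below is our own sketch of the graph construction and the DecideSAT skeleton; `decide_mis` and the constants are assumptions (the paper computes ϑ via semidefinite programming [15], which we do not reproduce here).

```python
def sat_graphs(F, k):
    """Build E_F and E'_F from a k-SAT instance F (k even). A clause is a
    tuple of (variable, sign) pairs; vertices are (k/2)-tuples of variables."""
    EF, EFn = set(), set()
    for cl in F:
        u = tuple(v for v, _ in cl[:k // 2])
        w = tuple(v for v, _ in cl[k // 2:])
        if all(s for _, s in cl):
            EF.add(frozenset((u, w)))       # all-positive clause
        if not any(s for _, s in cl):
            EFn.add(frozenset((u, w)))      # all-negative clause
    return EF, EFn

def decide_sat(F, n, k, m, decide_mis):
    """Algorithm 20 skeleton. `decide_mis(E, mu)` is assumed to implement
    DecideMIS (Algorithm 18), e.g. on top of a Lovász-theta SDP solver."""
    EF, EFn = sat_graphs(F, k)
    mu = m / 2**(k + 1)
    if decide_mis(EF, mu) == "typical" and decide_mis(EFn, mu) == "typical":
        return "unsatisfiable"
    # Fallback: exhaustive search over all 2^n assignments (rarely reached).
    for bits in range(2**n):
        a = [(bits >> i) & 1 for i in range(n)]
        if all(any(a[v - 1] == int(s) for v, s in cl) for cl in F):
            return a
    return "unsatisfiable"
```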
5 Approximating Random MAX 2-SAT

Theorem 21. Suppose that m = C·x²·n for some large constant C > 0 and some constant x > 0. There is an algorithm ApxM2S that approximates MAX 2-SAT within a factor of 1 − 1/x for any formula C ∈ Form_{n,2,m}, such that the expected running time of ApxM2S(Form_{n,2,m}) is polynomial.

The analysis of ApxM2S is based on the probabilistic analysis of the SDP relaxation SMS of MAX 2-SAT of Goemans and Williamson [12] (details omitted).

Algorithm 22. ApxM2S(C)
Input: An instance C ∈ Form_{n,2,m} of MAX 2-SAT. Output: An assignment of x_1, …, x_n.
1. Check whether the assignment x_i = true for all i satisfies at least 3m/4 − c_1·√(mn) clauses of C. If this is not the case, then go to 3. Here c_1 denotes some suitable constant.
2. Compute SMS(C). If SMS(C) ≤ 3m/4 + c_2·√(mn), then output the assignment x_i = true for all i and terminate. Here c_2 denotes some suitable constant.
3. Enumerate all 2^n assignments of x_1, …, x_n and output an optimal solution.

References
1. Alon, N., Spencer, J.: The Probabilistic Method. John Wiley and Sons, 1992.
2. Ben-Sasson, E., Bilu, Y.: A Gap in Average Proof Complexity. ECCC Report 003 (2002).
3. Chung, F.R.K.: Spectral Graph Theory. American Mathematical Society, 1997.
4. Coja-Oghlan, A.: The Lovász number of random graphs. Hamburger Beiträge zur Mathematik 169.
5. Coja-Oghlan, A., Taraz, A.: Colouring random graphs in expected polynomial time. Proc. STACS 2003, Springer LNCS 2607, 487–498.
6. Feige, U.: Relations between average case complexity and approximation complexity. Proc. 34th STOC (2002), 310–332.
7. Feige, U., Goemans, M.X.: Approximating the value of two prover proof systems, with applications to MAX 2SAT and MAX DICUT. Proc. 3rd Israel Symp. on Theory of Computing and Systems (1995), 182–189.
8. Feige, U., Krauthgamer, R.: A polylogarithmic approximation of the minimum bisection. Proc. 41st FOCS (2000), 105–115.
9. Feige, U., Ofek, E.: Spectral techniques applied to sparse random graphs. Report MCS03-01, Weizmann Institute of Science (2003).
10. Friedgut, E.: Necessary and Sufficient Conditions for Sharp Thresholds of Graph Properties and the k-SAT Problem. J. Amer. Math. Soc. 12 (1999), 1017–1054.
11. Friedman, J., Goerdt, A.: Recognizing more Unsatisfiable Random 3-SAT Instances efficiently. Proc. ICALP 2001, Springer LNCS 2076, 310–321.
12. Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42 (1995), 1115–1145.
13. Goerdt, A., Jurdzinski, T.: Some Results on Random Unsatisfiable k-SAT Instances and Approximation Algorithms Applied to Random Structures. Proc. MFCS 2002, Springer LNCS 2420, 280–291.
14. Goerdt, A., Krivelevich, M.: Efficient recognition of random unsatisfiable k-SAT instances by spectral methods. Proc. STACS 2001, Springer LNCS 2010, 294–304.
15. Grötschel, M., Lovász, L., Schrijver, A.: Geometric Algorithms and Combinatorial Optimization. Springer, 1988.
16. Hofri, M.: Probabilistic Analysis of Algorithms. Springer, 1987.

Inapproximability Results for Bounded Variants of Optimization Problems

Miroslav Chlebík¹ and Janka Chlebíková²⋆
¹ Max Planck Institute for Mathematics in the Sciences, Inselstraße 22-26, D-04103 Leipzig, Germany
² Christian-Albrechts-Universität zu Kiel, Institut für Informatik und Praktische Mathematik, Olshausenstraße 40, D-24098 Kiel, Germany, jch@informatik.uni-kiel.de

Abstract.
We study small degree graph problems such as Maximum Independent Set and Minimum Node Cover and improve approximation lower bounds for them and for a number of related problems, like Max-B-Set Packing, Min-B-Set Cover, and Max-Matching in B-uniform 2-regular hypergraphs. For example, we prove NP-hardness factors of 95/94 for Max-3DM and of 48/47 for Max-4DM; in both cases the hardness result applies even to instances with exactly two occurrences of each element.

1 Introduction

This paper deals with combinatorial optimization problems related to bounded variants of Maximum Independent Set (Max-IS) and Minimum Node Cover (Min-NC) in graphs. We improve approximation lower bounds for small degree variants of them and apply our results to even highly restricted versions of set covering, packing and matching problems, including Maximum 3-Dimensional Matching (Max-3DM). It has been well known that Max-3DM is MAX SNP-complete (or APX-complete) even when restricted to instances with the number of occurrences of any element bounded by 3. To the best of our knowledge, the first inapproximability result for bounded Max-3DM with the bound 2 on the number of occurrences of any element in triples appeared in our paper [5], where the first explicit approximation lower bound for the Max-3DM problem is given. (For the less restricted matching problem Max 3-Set Packing, a similar inapproximability result for instances with 2 occurrences follows directly from hardness results for the Max-IS problem on 3-regular graphs [2], [3].) For the B-Dimensional Matching problem with B ≥ 4, lower bounds on approximability were recently proven by Hazan, Safra and Schwartz [12]. A limitation of their method, as they explicitly state, is that it does not provide an inapproximability factor for 3-Dimensional Matching. But it is just the inapproximability factor for the 3-dimensional case that is of major interest, as it allows the improvement of hardness of approximation factors for several problems of practical interest, e.g. scheduling problems, some (even highly restricted) cases of the Generalized Assignment problem, and other packing problems. This fact, and the important role of small degree variants of the Max-IS (Min-NC) problem as intermediate steps in reductions to many other problems of interest, are good reasons for trying to push our technique to its limits. We build our reductions on a restricted version of Maximum Linear Equations over Z_2 with 3 variables per equation and with a (large) constant number of occurrences of each variable. Recall that this method, based on the deep Håstad's version of the PCP theorem, was also used to prove the (117/116 − ε)-approximability lower bound for the Traveling Salesman problem by Papadimitriou and Vempala [14], and for our lower bound of 96/95 for the Steiner Tree problem in graphs [6]. In this paper we optimize our equation gadgets and their coupling via a consistency amplifier. The notion of consistency amplifier varies slightly from problem to problem. Generally, they are graphs with suitable expanding (or mixing) properties. Interesting quantities, in which our lower bounds can be expressed, are parameters of consistency amplifiers that provably exist.

⋆ The author has been supported by EU-Project ARACNE, Approximation and Randomized Algorithms in Communication Networks, HPRN-CT-1999-00112.
Let us explain how our inapproximability results for bounded variants of Max-IS and Min-NC, namely B-Max-IS and B-Min-NC, imply the same bounds for some set packing, set covering and hypergraph matching problems. Max Set Packing (resp. Min Set Cover) is the following problem: given a collection C of subsets of a finite set S, find a maximum (resp. minimum) cardinality collection C′ ⊆ C such that each element in S is contained in at most one (resp. in at least one) set in C′. If each set in C is of size at most B, we speak about B-Set Packing (resp. B-Set Cover). It may be phrased also in hypergraph notation: the set of nodes is S and the elements of C are hyperedges. In this notation a set packing is just a matching in the corresponding hypergraph. For a graph G = (V, E) we define its dual hypergraph Ĝ = (E, V̂) whose node set is just E and whose hyperedge set is V̂ = {v̂ : v ∈ V}, where for each v ∈ V the hyperedge v̂ consists of all e ∈ E such that v ∈ e in G. The hypergraph Ĝ defined by this duality is clearly 2-regular: each node of Ĝ is contained in exactly two hyperedges. G is of maximum degree B iff Ĝ is of dimension B; in particular, G is B-regular iff Ĝ is B-uniform. Independent sets in G are in one-to-one correspondence with matchings in Ĝ (hence with set packings, in set-system notation), and node covers in G with set covers for Ĝ. Hence any approximation hardness result for B-Max-IS translates via this duality to one for Max-B-Set Packing (with exactly 2 occurrences), or to Max Matching in 2-regular B-dimensional hypergraphs. Similar is the relation of results on B-Min-NC to the Min-B-Set Cover problem.

If G is a B-regular edge-B-colored graph, then Ĝ is, moreover, B-partite with balanced B-partition determined by the corresponding color classes. Hence independent sets in such graphs correspond to B-dimensional matchings in a natural way, and any inapproximability result for the B-Max-IS problem restricted to B-regular edge-B-colored graphs translates directly to an inapproximability result for Max-B-Dimensional Matching (Max-B-DM), even on instances with exactly two occurrences of each element. Our results for Max-3DM and Max-4DM nicely complement recent results of [12] on Max-B-DM given for B ≥ 4. Comparing our results with theirs for B = 4, we have a better lower bound (48/47 vs. 54/53 − ε) and our result applies even to the highly restricted version with two occurrences. On the other hand, their hard gap result has almost perfect completeness.

The main new explicit NP-hardness factors of this contribution are summarized in the following theorem. In a more precise parametric way they are expressed in Theorems 3, 5, and 6. Better upper estimates on the parameters from these theorems immediately improve the lower bounds given below.

Theorem. It is NP-hard to approximate:
• Max-3DM and Max-4DM to within 95/94 and 48/47 respectively; both results apply to instances with exactly two occurrences of each element;
• 3-Max-IS (even on 3-regular graphs) and Max Triangle Packing (even on 4-regular line graphs) to within 95/94;
• 3-Min-NC (even on 3-regular graphs) and Min-3-Set Cover (with exactly two occurrences of each element) to within 100/99;
• 4-Max-IS (even on 4-regular graphs) to within 48/47;
• 4-Min-NC (even on 4-regular graphs) and Min-4-Set Cover (with exactly two occurrences) to within 53/52;
• B-Min-NC (B ≥ 3) to within 7/6 − 12·log B/B.
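The duality G ↦ Ĝ used above is mechanical; here is a small Python sketch of it (our own illustration), together with the translation of an independent set of G into a matching of Ĝ.

```python
def dual_hypergraph(V, E):
    """Dual hypergraph of G = (V, E): nodes are the edges of G, and each
    vertex v of G yields the hyperedge of all edges incident to v.
    Returns {v: set of edges containing v}; it is 2-regular by construction."""
    return {v: {e for e in E if v in e} for v in V}

# An independent set I in G corresponds to the matching {v-hat : v in I}
# in G-hat: no two such hyperedges share a node of G-hat, i.e. no edge of
# G meets two vertices of I.
V = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4}), frozenset({4, 1})}
dual = dual_hypergraph(V, E)
I = {1, 3}  # independent in the 4-cycle
assert all(not (dual[u] & dual[v]) for u in I for v in I if u != v)
```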
Preliminaries

Definition 1. Max-E3-Lin-2 is the following optimization problem: given a system I of linear equations over Z_2 with exactly 3 (distinct) variables in each equation, the goal is to maximize, over all assignments ϕ to the variables, the ratio sat(ϕ)/|I|, where sat(ϕ) is the number of equations of I satisfied by ϕ.

We use the notation Ek-Max-E3-Lin-2 for the same maximization problem where each variable occurs exactly k times. The following theorem follows from Håstad's results [11]; see [5] for more details.

Theorem 1. For every ε ∈ (0, 1/4) there is a constant k(ε) such that for every k ≥ k(ε) the following problem is NP-hard: given an instance of Ek-Max-E3-Lin-2, decide whether the fraction of equations satisfied by the optimal (i.e. maximizing) assignment is more than (1 − ε) or less than (1/2 + ε).

To use all properties of our equation gadgets, the order of variables in equations will play a role. We denote by E[k, k, k]-Max-E3-Lin-2 those instances of E3k-Max-E3-Lin-2 for which each variable occurs exactly k times as the first variable, k times as the second variable, and k times as the third variable in equations. Given an instance I_0 of Ek-Max-E3-Lin-2 we can easily transform it into an instance I of E[k, k, k]-Max-E3-Lin-2 with the same optimum, as follows: for any equation x + y + z = j of I_0 we put in I the triple of equations x + y + z = j, y + z + x = j, and z + x + y = j. Hence the same NP-hard gap as in Theorem 1 applies for E[k, k, k]-Max-E3-Lin-2 as well. We describe several reductions from E[k, k, k]-Max-E3-Lin-2 to bounded occurrence instances of NP-hard problems that preserve the hard gap of E[k, k, k]-Max-E3-Lin-2.

2 Consistency Amplifiers

As a parameter of our reduction for B-Max-IS (or B-Min-NC) (B ≥ 3), and for Max-3DM, we will use a graph H, a so-called consistency 3k-amplifier, with the following structure:
(i) The degree of each node is at most B.
(ii) There are 3k pairs of contact nodes {(c^i_0, c^i_1) : i = 1, 2, …, 3k}.
(iii) The degree of any contact node is at most B − 1.
(iv) The first 2k pairs of contact nodes {(c^i_0, c^i_1) : i = 1, 2, …, 2k} are implicitly linked in the following sense: whenever J is an independent set in H, there is an independent set J′ in H such that |J′| ≥ |J|, a contact node c can belong to J′ only if c ∈ J, and for any i = 1, 2, …, 2k at most one node of the pair (c^i_0, c^i_1) belongs to J′.
(v) The consistency property: let us denote C_j := {c^1_j, c^2_j, …, c^{3k}_j} for j ∈ {0, 1}, and M_j := max{|J| : J is an independent set in H such that J ∩ C_{1−j} = ∅}. Then M_0 = M_1 (=: M(H)), and for every ψ : {1, 2, …, 3k} → {0, 1} and for every independent set J in H \ {c^i_{1−ψ(i)} : i = 1, 2, …, 3k} we have |J| ≤ M(H) − min{|{i : ψ(i) = 0}|, |{i : ψ(i) = 1}|}.

Remark 1. Let j ∈ {0, 1} and let J be any independent set in H \ C_{1−j} such that |J| = M(H); then J ⊇ C_j. To show that, assume that c^l_j ∉ J for some l ∈ {1, 2, …, 3k}. Define ψ : {1, 2, …, 3k} → {0, 1} by ψ(l) = 1 − j and ψ(i) = j for i ≠ l. Now (v) above says |J| < M(H), a contradiction. Hence, in particular, C_j is an independent set in H.

To obtain better inapproximability results we use equation gadgets that require some further restrictions on the degrees of contact nodes of a consistency 3k-amplifier:
(iii-1) For B-Max-IS, B ≥ 6, the degree of any contact node is at most B − 2.
(iii-2) For B-Max-IS, B ∈ {4, 5}, the degree of any contact node c^i_j with i ∈ {1, …, k} is at most B − 1; the degree of c^i_j with i ∈ {k + 1, …, 3k} is at most B − 2, where j ∈ {0, 1}.
For integers B ≥ 3 and k ≥ 1 let G_{B,k} stand for the set of corresponding consistency 3k-amplifiers. Let µ_{B,k} := min{M(H)/k : H ∈ G_{B,k}}, λ_{B,k} := min{(|V(H)| − M(H))/k : H ∈ G_{B,k}} (if G_{B,k} = ∅, let λ_{B,k} = µ_{B,k} = ∞), µ_B = lim_{k→∞} µ_{B,k}, and λ_B = lim_{k→∞} λ_{B,k}. The parameters µ_B and λ_B play the role of quantities in which our inapproximability results for B-Max-IS and B-Min-NC can be expressed. Obtaining explicit lower bounds on approximability requires finding upper bounds on those parameters.

In what follows we describe some methods by which consistency 3k-amplifiers can be constructed. We will confine ourselves to highly regular amplifiers. This ensures that our inapproximability results apply to B-regular graphs for small values of B. We will look for a consistency 3k-amplifier H as a bipartite graph with bipartition (D_0, D_1), where C_0 ⊆ D_0, C_1 ⊆ D_1 and |D_0| = |D_1|. The idea is that if D_j (j = 0, 1) is significantly larger than 3k (= |C_j|), then a suitable probabilistic model of constructing bipartite graphs with bipartition (D_0, D_1) and prescribed degrees will produce, with high probability, a graph H with good “mixing properties” that ensure the consistency property with M(H) = |D_j|. We will not develop the probabilistic model here; rather, we will rely on what has already been proved (using similar methods) for amplifiers. The starting point of our construction of consistency 3k-amplifiers will be amplifiers, which were studied by Berman & Karpinski [3], [4] and Chlebík & Chlebíková [5].

Definition 2. A graph G = (V, E) is a (2, 3)-graph if G contains only nodes of degree 2 (contacts) and 3 (checkers). We denote Contacts = {v ∈ V : deg_G(v) = 2} and Checkers = {v ∈ V : deg_G(v) = 3}. Furthermore, a (2, 3)-graph G is an amplifier if for every A ⊆ V: |Cut A| ≥ |Contacts ∩ A| or |Cut A| ≥ |Contacts \ A|, where Cut A = {{u, v} ∈ E : exactly one of the nodes u and v is in A}. An amplifier G is called a (k, τ)-amplifier if |Contacts| = k and |V| = τ·k.

To simplify proofs we will use in our constructions only such (k, τ)-amplifiers as contain no edge between contact nodes. Recall that the infinite families of amplifiers with τ = 7 [3], and even with τ ≤ 6.9 constructed in [5], are of this kind.

The consistency 3k-amplifier for B = 3. Let a (3k, τ)-amplifier G = (V(G), E(G)) from Definition 2 be fixed, and let x_1, …, x_{3k} be its contact nodes. We assume, moreover, that there is a matching in G on the nodes V(G) \ {x_{2k+1}, …, x_{3k}}. Let us point out that both the wheel-amplifiers with τ = 7 [3] and also their generalization given in [5] with τ ≤ 6.9 clearly contain such matchings. Let one such matching M ⊆ E(G) be fixed from now on. Each node x ∈ V(G) is replaced with a small gadget A_x. The gadget of x ∈ V(G) \ {x_{2k+1}, …, x_{3k}} is a path of 4 nodes x_0, X_1, X_0, x_1 (in this order). For x ∈ {x_{2k+1}, …, x_{3k}} we take as A_x a pair of nodes x_0, x_1 without an edge. Denote E_x := {x_0, x_1} for each x ∈ V(G), and F_x := {X_0, X_1} for x ∈ V(G) \ {x_{2k+1}, …, x_{3k}}. The union of the gadgets A_x (over all x ∈ V(G)) already contains all nodes of our consistency 3k-amplifier H, and some of its edges. Now we identify the remaining edges of H.
For each edge {x, y} of G we connect the corresponding gadgets A_x, A_y with a pair of edges in H, as follows: if {x, y} ∈ M, we connect X_0 with Y_1 and X_1 with Y_0; if {x, y} ∈ E(G) \ M, we connect x_0 with y_1 and x_1 with y_0. Having done this, one after another for each edge {x, y} ∈ E(G), we obtain the consistency 3k-amplifier H = (V(H), E(H)) with contact nodes x^i_j determined by the contact nodes x_i of G, for j ∈ {0, 1}, i ∈ {1, 2, …, 3k}. The proof of all conditions from the definition of a consistency 3k-amplifier can be found in [7]. Hence µ_3 ≤ 40.4 and λ_3 ≤ 40.4 follow from this construction.

The construction of the consistency amplifier for B = 4 is similar and can also be found in [7]. In this case µ_4 ≤ 21.7 and λ_4 ≤ 21.7 follow from the construction. We do not try to optimize our estimates for B ≥ 5 in this paper; we are mainly focused on the cases B = 3 and B = 4. For larger B we provide our inapproximability results based on the small degree amplifiers constructed above. Of course, one can expect that amplifiers with much better parameters can be found for these cases by suitable constructions. We only slightly change the consistency 3k-amplifier H constructed for the case B = 4 to get some (very small) improvement for the case B ≥ 5. Namely, also for x ∈ {x_{k+1}, x_{k+2}, …, x_{2k}} we take as A_x a pair of nodes connected by an edge. The corresponding nodes c^i_0, c^i_1 of H will have degree 3 in H, but we will now have M(H) = 3τk. The same proof of consistency for H will work. This consistency amplifier H will clearly be simultaneously a consistency 3k-amplifier for any B ≥ 5. In this way we get the upper bounds µ_B ≤ 20.7 and λ_B ≤ 20.7 for any B ≥ 5.

3 The Equation Gadgets

In the reduction to our problems we use the equation gadgets G_j for equations x + y + z = j, j = 0, 1. To obtain better inapproximability results, we use slightly modified equation gadgets for distinct values of B in the B-Max-IS problem (or the B-Min-NC problem). For j ∈ {0, 1} we define equation gadgets G_j[3] for the 3-Max-IS problem (Fig. 1), G_j[4] for 4(5)-Max-IS (Fig. 2(i)), and G_j[6] for B-Max-IS, B ≥ 6 (Fig. 2(ii)). In each case the gadget G_1[∗] can be obtained from G_0[∗] by replacing each i ∈ {0, 1} in indices and labels by 1 − i. For each u ∈ {x, y, z} we denote by F_u the set of all accented u-nodes from G_j (hence F_u is a subset of {u′_0, u′_1, u″_0, u″_1}), and F_u := ∅ if G_j does not contain any accented u-node; T_u := F_u ∪ {u_0, u_1}. For a subset A of nodes of G_j and any independent set J in G_j we say that J is pure in A if all nodes of A ∩ J have the same lower index (0 or 1). If, moreover, A ∩ J consists exactly of all nodes of A of one index, we say that J is full in A. The following theorem describes the basic properties of the equation gadgets; the proof can be found in [7].

Theorem 2. Let G_j (j ∈ {0, 1}) be one of the following gadgets: G_j[3], G_j[4], or G_j[6], corresponding to an equation x + y + z = j. Let J be an independent set in G_j such that for each u ∈ {x, y} at most one of the two nodes u_0 and u_1 belongs to J. Then there is an independent set J′ in G_j with the following properties:
(I) |J′| ≥ |J|,
(II) for each u ∈ {x, y} it holds that J′ ∩ {u_0, u_1} = J ∩ {u_0, u_1},
(III) J′ ∩ {z_0, z_1} ⊆ J ∩ {z_0, z_1} and |J′ ∩ {z_0, z_1}| ≤ 1,
(IV) J′ contains (exactly) one special node, say ψ(x)ψ(y)ψ(z); furthermore, J′ is pure in T_u and full in F_u.
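Because each gadget has only a handful of nodes, Theorem 2-style properties can be checked exhaustively. The following generic brute-force verifier is our own sketch; it takes any candidate gadget (we do not transcribe the exact edge lists of Figs. 1–2 here) and checks a simplified variant of property (IV), ignoring the side conditions (II)–(III).

```python
from itertools import combinations

def independent_sets(nodes, edges):
    """Enumerate all independent sets of a small graph by brute force."""
    for r in range(len(nodes) + 1):
        for cand in combinations(nodes, r):
            s = set(cand)
            if all(not (u in s and v in s) for u, v in edges):
                yield s

def check_special_node_property(nodes, edges, special):
    """Check a Theorem 2 (IV)-style property: for every independent set J
    there is an independent set J' with |J'| >= |J| containing exactly one
    of the `special` nodes (000, 011, 101, 110 in the figures)."""
    all_is = list(independent_sets(nodes, edges))
    return all(
        any(len(j2) >= len(j) and len(j2 & special) == 1 for j2 in all_is)
        for j in all_is
    )
```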
[Fig. 1. The equation gadget G_0 := G_0[3] for 3-Max-IS and Max-3DM.]

[Fig. 2. The equation gadget (i) G_0 := G_0[4] for B-Max-IS, B ∈ {4, 5}; (ii) G_0 := G_0[6] for B-Max-IS (B ≥ 6).]

4 Reduction for B-Max-IS and B-Min-NC

For arbitrarily small fixed ε > 0 consider k large enough that the conclusion of Theorem 1 for E[k, k, k]-Max-E3-Lin-2 is satisfied. Further, let a consistency 3k-amplifier H have M(H)/k (resp. (|V(H)| − M(H))/k) as close to µ_B (resp. λ_B) as we need. Keeping one consistency 3k-amplifier H fixed, our reduction f (= f_H) from E[k, k, k]-Max-E3-Lin-2 to B-Max-IS (resp. B-Min-NC) is as follows: Let I be an instance of E[k, k, k]-Max-E3-Lin-2, V(I) the set of variables of I, and m := |V(I)|. Hence I has mk equations, and each variable u ∈ V(I) occurs in exactly 3k of them: k times as the first variable, k times as the second one, and k times as the third variable of the equation. Assume, for convenience, that the equations are numbered 1, 2, …, mk. Given a variable u ∈ V(I) and s ∈ {1, 2, 3}, let r^1_s(u) < r^2_s(u) < … < r^k_s(u) be the numbers of the equations in which the variable u occurs as the s-th variable. On the other hand, if for fixed r ∈ {1, 2, …, mk} the r-th equation is x + y + z = j (j ∈ {0, 1}), there are uniquely determined numbers i(x, r), i(y, r), i(z, r) ∈ {1, 2, …, k} such that r^{i(x,r)}_1(x) = r^{i(y,r)}_2(y) = r^{i(z,r)}_3(z) = r.

Take m disjoint copies of H, one for each variable. Let H_u denote the copy of H that corresponds to the variable u ∈ V(I). The corresponding contacts are in H_u denoted by C_j(u) = {u^i_j : i = 1, 2, …, 3k}, j = 0, 1. Now we take mk disjoint copies of equation gadgets G_r, r ∈ {1, 2, …, mk}. More precisely, if the r-th equation reads x + y + z = j (j ∈ {0, 1}), we take as G_r a copy of G_j[3] for 3-Max-IS (or G_j[4] for 4(5)-Max-IS, or G_j[6] for B-Max-IS, B ≥ 6). Then the nodes x_0, x_1, y_0, y_1, z_0, z_1 of G_r are identified with the nodes x^{i(x,r)}_0, x^{i(x,r)}_1 (of H_x), y^{k+i(y,r)}_0, y^{k+i(y,r)}_1 (of H_y), z^{2k+i(z,r)}_0, z^{2k+i(z,r)}_1 (of H_z), respectively. It means that in each H_u the first k-tuple of pairs of contacts corresponds to the occurrences of u as the first variable, the second k-tuple corresponds to the occurrences as the second variable, and the third one to the occurrences as the last variable. Making the above identification for all equations, one after another, we get a graph of degree at most B, denoted by f(I). Clearly, the above reduction f (using the fixed H as a parameter) to special instances of B-Max-IS is polynomial. It can be proved that the NP-hard gap of E[k, k, k]-Max-E3-Lin-2 is preserved ([7]). The following main theorem summarizes the results.

Theorem 3. It is NP-hard to approximate: the solution of 3-Max-IS to within any constant smaller than 1 + 1/(2µ_3 + 13); for B ∈ {4, 5} the solution of B-Max-IS to within any constant smaller than 1 + 1/(2µ_B + 3); and the solution of B-Max-IS, B ≥ 6, to within any constant smaller than 1 + 1/(2µ_B + 1). Similarly, it is NP-hard to approximate the solution of 3-Min-NC to within any constant smaller than 1 + 1/(2λ_3 + 18), for B ∈ {4, 5} the solution of B-Min-NC to within any constant smaller than 1 + 1/(2λ_B + 8), and the solution of B-Min-NC, B ≥ 6, to within any constant smaller than 1 + 1/(2λ_B + 6).
Using our upper bounds on µ_B and λ_B for the distinct values of B we obtain

Corollary 1. It is NP-hard to approximate the solution of 3-Max-IS to within 1.010661 (> 95/94); the solution of 4-Max-IS to within 1.0215517 (> 48/47); the solution of 5-Max-IS to within 1.0225225 (> 46/45); and the solution of B-Max-IS, B ≥ 6, to within 1.0235849 (> 44/43). Similarly, it is NP-hard to approximate the solution of 3-Min-NC to within 1.0101215 (> 100/99); the solution of 4-Min-NC to within 1.0194553 (> 53/52); the solution of 5-Min-NC to within 1.0202429 (> 51/50); and B-Min-NC, B ≥ 6, to within 1.021097 (> 49/48). For each B ≥ 3, the corresponding result applies to B-regular graphs as well.

5 Asymptotic Approximability Bounds

This paper is focused mainly on graphs of very small degree. In this section we also discuss the asymptotic relation between hardness of approximation and degree for the Independent Set and Node Cover problems in bounded degree graphs. The Independent Set problem in the class of graphs of maximum degree B is known to be approximable with performance ratio arbitrarily close to (B + 3)/5 (Berman & Fujito, [2]). But asymptotically better ratios can be achieved by polynomial algorithms; currently the best one approximates to within a factor of O(B·log log B/log B), as follows from [1], [13]. On the other hand, Trevisan [15] has proved that it is NP-hard to approximate the solution to within B/2^{O(√log B)}.

For the Node Cover problem the situation is more challenging, even in general graphs. A recent result of Dinur and Safra [10] shows that for any δ > 0 the Minimum Node Cover problem is NP-hard to approximate to within 10√5 − 21 − δ. One can observe that their proof can give a hardness result also for graphs with (very large) bounded degree B(δ). This follows from the fact that after their use of Raz's parallel repetition (where each variable appears in only a constant number of tests), the degree of the produced instances is bounded by a function of δ. But the dependence of B(δ) on δ in their proof is really very complicated. The earlier 7/6 − δ lower bound proved by Håstad [11] was extended by Clementi & Trevisan [9] to graphs with bounded degree B(δ). Our next result improves on theirs; it has a better trade-off between non-approximability and the degree bound. There are no hidden constants in our asymptotic formula, and it provides good explicit inapproximability results for degree bounds B starting from a few hundred. First we need to introduce some notation.

Notation. Denote F(x) := −x·log x − (1 − x)·log(1 − x), x ∈ (0, 1), where log means the natural logarithm. Further, G(c, t) := (F(t) + F(ct))/(F(t) − ctF(1/c)) for 0 < t < 1/c < 1, and g(t) := G((1 − t)/t, t) for t ∈ (0, 1/2). More explicitly, g(t) = 2·[−t·log t − (1 − t)·log(1 − t)]/[−2(1 − t)·log(1 − t) + (1 − 2t)·log(1 − 2t)]. Using the Taylor series of the logarithm near 1 we see that the denominator here is t²·Σ_{k=0}^∞ ((2^{k+2} − 2)/((k+1)(k+2)))·t^k > t², and −(1 − t)·log(1 − t) = t − t²·Σ_{k=0}^∞ (t^k/((k+1)(k+2))) < t; consequently g(t) < (2/t)·(1 + log(1/t)). For large enough B we look for δ ∈ (0, 1/6) such that 3·⌊g(δ/2)⌋ + 3 ≤ B. As g(1/12) ≈ 75.62 and g is decreasing on (0, 1/2), we can see that for B ≥ 228 any δ > δ_B := 2·g^{−1}(⌊B/3⌋) will do. Trivial estimates on δ_B (using g(t) < (2/t)·(1 + log(1/t))) are δ_B < (12/(B − 3))·(log(B − 3) + 1 − log 6) < 12·log B/B.

We will need the following lemma about regular bipartite expanders to prove Theorem 4 (see [7] for proofs).
Lemma 1. Let t ∈ (0, 1/2) and let d be an integer with d > g(t). For every sufficiently large positive integer n there is a d-regular n by n bipartite graph H with bipartition (V_0, V_1) such that for each independent set J in H either |J ∩ V_0| ≤ tn or |J ∩ V_1| ≤ tn.

Theorem 4. For every δ ∈ (0, 1/6) it is NP-hard to approximate Minimum Node Cover to within 7/6 − δ even in graphs of maximum degree ≤ 3·⌊g(δ/2)⌋ + 3 ≤ 3·⌈(4/δ)·(1 + log(2/δ))⌉. Consequently, for any B ≥ 228 it is NP-hard to approximate B-Min-NC to within any constant smaller than 7/6 − δ_B, where δ_B := 2·g^{−1}(⌊B/3⌋) < (12/(B − 3))·(log(B − 3) + 1 − log 6) < 12·log B/B.

Typically, the methods used for asymptotic results cannot be used for small values of B to achieve interesting lower bounds. Therefore we work on new techniques that improve the results of Berman & Karpinski [3] and Chlebík & Chlebíková [5].

6 Max-3DM and Other Problems

Clearly, the restriction of the B-Max-IS problem to edge-B-colored B-regular graphs is a subproblem of Maximum B-Dimensional Matching (see [5] for more details). Hence we want to prove that our reduction to the B-Max-IS problem can produce as instances edge-B-colored B-regular graphs. In this contribution we present results for B = 3, 4. For the equation x + y + z = j (j ∈ {0, 1}) of E[k, k, k]-Max-E3-Lin-2 we will use the equation gadget G_j[B], see Fig. 1 and Fig. 2(i). The basic properties of these gadgets are described in Theorem 2.

Maximum 3-Dimensional Matching. As follows from Fig. 1, the gadget G_0[3] can be edge-3-colored by colors a, b, c in such a way that all edges adjacent to nodes of degree one (contacts) are colored by one fixed color, say a (for G_1[3] we take the corresponding analogy). As the amplifier of our reduction f = f_H from E[k, k, k]-Max-E3-Lin-2 to Max-3DM we use a consistency 3k-amplifier H ∈ G_{3,k} with some additional properties: the degree of any contact node is exactly 2, the degree of any other node is 3, and moreover the graph H is edge-3-colorable by colors a, b, c in such a way that all edges adjacent to contact nodes are colored by the two colors b and c. Let G_{3DM,k} ⊆ G_{3,k} be the class of all such amplifiers. Denote µ_{3DM,k} = min{M(H)/k : H ∈ G_{3DM,k}} and µ_{3DM} := lim_{k→∞} µ_{3DM,k}. We use the same construction for consistency 3k-amplifiers as was presented for 3-Max-IS, but now we have to show that the produced graph H fulfills the conditions on the coloring of edges. For a fixed (3k, τ)-amplifier G and the matching M ⊆ E(G) on the nodes V(G) \ {x_{2k+1}, …, x_{3k}} we define the edge coloring in two steps:
(i) Take preliminarily the following edge coloring: for each {x, y} ∈ M we color the corresponding edges in H as depicted in Fig. 3(i). The remaining edges of H are easily 2-colored by colors b and c, as the rest of the graph is bipartite and of degree at most 2. So we have a proper edge-3-coloring, but some edges adjacent to contacts are colored by color a. This happens exactly if x ∈ {x_1, x_2, …, x_{2k}}, {x, y} ∈ M. (We assume that no two contacts of G are adjacent, hence y is a checker node of G.) Clearly, one can ensure that in the above extension of the coloring of edges by colors c and b both other edges adjacent to x_0 and x_1 have the same color.
(ii) Now we modify our edge coloring in all these violating cases as follows. Fix x ∈ {x_1, …, x_{2k}}, {x, y} ∈ M, and let both other edges adjacent to x_0 and x_1 have assigned color b. Then change the coloring according to Fig. 3(ii). The case when both edges have assigned color c can be solved analogously (see Fig.
3(iii)). From the construction it follows that µ_{3DM} ≤ 40.4.

[Fig. 3. The recoloring steps (i)–(iii); color a: dashed line, color b: dotted line, color c: solid line.]

Keeping one such consistency 3k-gadget H fixed, our reduction f (= f_H) from E[k, k, k]-Max-E3-Lin-2 is exactly the same as the one for B-Max-IS described in Section 4. Let us fix an instance I of E[k, k, k]-Max-E3-Lin-2 and consider the instance f(I) of 3-Max-IS. As f(I) is an edge-3-colored 3-regular graph, it is at the same time an instance of 3DM with the same objective function. We can show that the NP-hard gap of E[k, k, k]-Max-E3-Lin-2 is preserved in exactly the same way as for 3-Max-IS. Consequently it is NP-hard to approximate the solution of Max-3DM to within 1 + (1 − 4ε)/(2M(H)/k + 13 + 2ε), even on instances with each element occurring in exactly two triples.

Maximum 4-Dimensional Matching. We will use the following edge-4-coloring of our gadget G_0[4] in Fig. 2(i) (analogously for G_1[4]): a-colored edges {x′_0, 101}, {x′_1, 011}, {y_1, 000}, {y_0, 110}; b-colored edges {x′_0, 110}, {x′_1, 000}, {y_1, 101}, {y_0, 011}; c-colored edges {x_1, x′_0}, {x_0, x′_1}, {101, 110}, {z_0, 011}, {z_1, 000}; d-colored edges {x′_0, x′_1}, {000, 011}, {z_0, 101}, {z_1, 110}. Now we show that an edge-4-coloring of a consistency 3k-amplifier H exists that fits well with the above coloring of the equation gadgets. We suppose that the (3k, τ)-amplifier G from which H was constructed has a matching M of all checkers. (This is true for the amplifiers of [3] and [5].) The color d will be used for the edges {x_0, x_1}, x ∈ V(G) \ {x_{2k+1}, …, x_{3k}}. Also, for any x ∈ {x_{k+1}, …, x_{2k}}, the corresponding edge {X_0, X_1} will have color d too. The color c will be reserved for coloring the edges of H “along the matching M”, i.e. if {x, y} ∈ M, the edges {x_0, y_1} and {x_1, y_0} have color c. Furthermore, for x ∈ {x_{k+1}, …, x_{2k}} the corresponding edges {x_0, X_1} and {x_1, X_0} will be of color c too. The edges that are not colored by c and d form a 2-regular bipartite graph, hence they can be edge-2-colored by colors a and b. The above edge-4-coloring of H and G_j[4] (j ∈ {0, 1}) ensures that the instances produced by our reduction to 4-Max-IS are edge-4-colored 4-regular graphs. The following theorem summarizes both achieved results:

Theorem 5. It is NP-hard to approximate the solution of Max-3DM to within any constant smaller than 1 + 1/(2µ_{3DM} + 13) > 1.010661 > 95/94, and the solution of Max-4DM to within 1.0215517 (> 48/47). Both inapproximability results hold also on instances with each element occurring in exactly two triples, resp. quadruples.

Lower bounds for Min-B-Set Cover follow from those for B-Min-NC, as was explained in the Introduction. It is also easy to see that the instances obtained by our reduction for 3-Max-IS are 3-regular triangle-free graphs. Hence we get the same lower bound for Maximum Triangle Packing by a simple reduction (see [5] for more details).

Theorem 6.
It is NP-hard to approximate the solution of the problems Maximum Triangle Packing (even on 4-regular line graphs) to within any constant smaller than 1 + 1/(2µ_3 + 13) > 1.010661 > 95/94; Min-3-Set Cover with exactly two occurrences of each element to within any constant smaller than 1 + 1/(2λ_3 + 18) > 1.0101215 > 100/99; and Min-4-Set Cover with exactly two occurrences of each element to within any constant smaller than 1 + 1/(2λ_4 + 8) > 1.0194553 > 53/52.

Concluding Remarks. A plausible direction for improving our inapproximability results further is to give better upper bounds on the parameters λ_B, µ_B. We think that there is still potential for improvement here, using a suitable probabilistic model for the construction of amplifiers.

References
1. Alon, N., Kahale, N.: Approximating the independence number via the ϑ-function. Mathematical Programming 80 (1998), 253–264.
2. Berman, P., Fujito, T.: Approximating independent sets in degree 3 graphs. Proc. of the 4th WADS, LNCS 955, Springer, 1995, 449–460.
3. Berman, P., Karpinski, M.: On Some Tighter Inapproximability Results, Further Improvements. ECCC Report TR98-065, 1998.
4. Berman, P., Karpinski, M.: Efficient Amplifiers and Bounded Degree Optimization. ECCC Report TR01-053, 2001.
5. Chlebík, M., Chlebíková, J.: Approximation Hardness for Small Occurrence Instances of NP-Hard Problems. Proc. of the 5th CIAC, LNCS 2653, Springer, 2003 (also ECCC Report TR02-073, 2002).
6. Chlebík, M., Chlebíková, J.: Approximation Hardness of the Steiner Tree Problem on Graphs. Proc. of the 8th SWAT, LNCS 2368, Springer, 2002, 170–179.
7. Chlebík, M., Chlebíková, J.: Inapproximability results for bounded variants of optimization problems. ECCC Report TR03-026, 2003.
8. Chung, F.R.K.: Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, AMS, 1997.
9. Clementi, A., Trevisan, L.: Improved non-approximability results for vertex cover with density constraints. Theoretical Computer Science 225 (1999), 113–128.
10. Dinur, I., Safra, S.: The importance of being biased. ECCC Report TR01-104, 2001.
11. Håstad, J.: Some optimal inapproximability results. Journal of the ACM 48 (2001), 798–859.
12. Hazan, E., Safra, S., Schwartz, O.: On the Hardness of Approximating k-Dimensional Matching. ECCC Report TR03-020, 2003.
13. Karger, D., Motwani, R., Sudan, M.: Approximate graph coloring by semidefinite programming. Journal of the ACM 45(2) (1998), 246–265.
14. Papadimitriou, C.H., Vempala, S.: On the Approximability of the Traveling Salesman Problem. Proc. 32nd ACM Symposium on Theory of Computing, Portland, 2000.
15. Trevisan, L.: Non-approximability results for optimization problems on bounded degree instances. Proc. 33rd ACM Symposium on Theory of Computing, 2001.

Approximating the Pareto Curve with Local Search for the Bicriteria TSP(1,2) Problem (Extended Abstract)

Eric Angel, Evripidis Bampis, and Laurent Gourvès
LaMI, CNRS UMR 8042, Université d'Évry Val d'Essonne, France

Abstract. Local search has been widely used in combinatorial optimization [3]; however, in the case of multicriteria optimization almost no results are known concerning the ability of local search algorithms to generate “good” solutions with performance guarantee. In this paper, we introduce such an approach for the classical traveling salesman problem (TSP) [13].
We show that it is possible to obtain in linear time a 3/2-approximate Pareto curve, using an original local search procedure based on the 2-opt neighborhood, for the bicriteria TSP(1,2) problem, where every edge is associated with a couple of distances which are either 1 or 2 [12].

⋆ Research partially supported by the thematic network APPOL II (IST 2001-32007) of the European Union, and the France-Berkeley Fund project MULT-APPROX.

1 Introduction

The traveling salesman problem (TSP) is one of the most popular problems in combinatorial optimization. Given a complete graph where the edges are associated with a positive distance, we search for a cycle visiting each vertex of the graph exactly once and minimizing the total distance. It is well known that the TSP problem is NP-hard and that it cannot be approximated within a bounded approximation ratio, unless P=NP. However, for the metric TSP (i.e. when the distances satisfy the triangle inequality), Christofides proposed an algorithm with performance ratio 3/2 [1]. For more than 25 years, many researchers have attempted to improve this bound, but with no success. Papadimitriou and Yannakakis [12] studied a more restrictive version of the metric TSP, the case where all distances are either one or two, and they achieved a 7/6 approximation algorithm. This problem, known as the TSP(1,2) problem, remains NP-hard; it is in fact this version of the TSP that was shown NP-complete in the original reduction of Karp [2]. The TSP(1,2) problem is a generalization of the Hamiltonian cycle problem, since we are asking for the tour of the graph that contains the fewest possible non-edges (edges of weight 2). More recently, Monnot et al. obtained results for the TSP(1,2) with respect to the differential approximation ratio [8,9].

In this paper, we consider the bicriteria TSP(1,2) problem, which is a special case of the multicriteria TSP problem [14] in which every edge is associated with a couple of distances which are either 1 or 2, i.e. each edge can take a value from the set {(1, 1), (1, 2), (2, 1), (2, 2)}. As an application, consider two undirected graphs G_1 and G_2 on the same set V of n vertices. Does there exist a Hamiltonian cycle which is common to both graphs? This problem can be formulated as a special case of the bicriteria traveling salesman problem we consider. Indeed, for G = G_1 or G_2 let δ_G([i, j]) = 1 if there is an edge between vertices i and j in graph G, and let δ_G([i, j]) = 0 otherwise. We form a bicriteria TSP instance in a complete graph in the following way: for any couple of vertices {i, j} ⊆ V, we set the cost of the edge [i, j] to be c([i, j]) = (2 − δ_{G_1}([i, j]), 2 − δ_{G_2}([i, j])). Then there exists a Hamiltonian cycle common to both graphs if and only if there exists a solution for the bicriteria TSP achieving a cost (n, n). Here, we study the optimization version of this bicriteria TSP, in which we look for a common “Hamiltonian cycle” using the fewest possible non-edges in each graph, i.e. we are seeking a Hamiltonian cycle in the complete graph of the TSP(1,2) instance minimizing the cost in both coordinates. A solution of our problem is evaluated with respect to two different optimality criteria (see [5] for a recent book on multicriteria optimization).
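The reduction from the common Hamiltonian cycle question to a bicriteria instance is a one-liner per edge; the following Python fragment (our own illustration) builds the cost function c([i, j]) = (2 − δ_{G_1}, 2 − δ_{G_2}).

```python
from itertools import combinations

def bicriteria_tsp_instance(n, E1, E2):
    """Cost c([i,j]) = (2 - d_G1([i,j]), 2 - d_G2([i,j])) for all pairs
    of vertices 0..n-1, where E1, E2 are sets of frozenset edges.
    A tour of cost (n, n) exists iff G1 and G2 share a Hamiltonian cycle."""
    cost = {}
    for i, j in combinations(range(n), 2):
        e = frozenset((i, j))
        cost[e] = (2 - (e in E1), 2 - (e in E2))
    return cost
```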
Here, we are interested in the trade-off between the different objective functions, which is captured by the set of all solutions that are not dominated by other solutions (the so-called Pareto curve). Since the monocriterion TSP(1,2) problem is NP-hard, determining whether a point belongs to the Pareto curve is NP-hard. Papadimitriou and Yannakakis [11] considered an approximate version of the Pareto curve, the so-called (1 + ε)-approximate Pareto curve. Informally, a (1 + ε)-approximate Pareto curve is a set of solutions that dominates all other solutions approximately (within a factor 1 + ε) in all the objectives. In other words, for every other solution, the considered set contains a solution that is approximately as good (within a factor 1 + ε) in all objectives. We propose a bicriteria local search procedure using the 2-opt neighborhood which finds a 3/2-approximate Pareto curve (notice that a 2-approximate Pareto curve can be trivially constructed: just consider any tour). Interestingly, Khanna et al. [7] have shown that a local search algorithm using the 2-opt neighborhood achieves a 3/2 performance ratio for the monocriterion TSP(1,2) problem. We furthermore show that the gap between the cost of a local optimum produced by our local search procedure and a solution of the exact Pareto curve is 3/2, and thus our result is tight. To the best of our knowledge, no results were known about the ability of local search algorithms to provide solutions that are good from the approximation (performance guarantee) point of view in the area of multicriteria optimization.

1.1 Definitions
Given an instance of a multicriteria minimization problem with γ ≥ 1 objective functions Gi, i = 1, . . . , γ, its Pareto curve P is the set of all γ-vectors (cost vectors) such that for each v = (v1, . . . , vγ) ∈ P,
1. there exists a feasible solution s such that Gi(s) = vi for all i, and
2. there is no other feasible solution s′ such that Gi(s′) ≤ vi for all i, with a strict inequality for some i.
For ease of presentation, we will sometimes use P to denote a set of solutions which achieve these values. (If there is more than one solution with the same vi values, P contains one of them.) Since for the problem we consider computing the (exact) Pareto curve is infeasible in polynomial time (unless P=NP), we consider an approximation. Given ε > 0, a (1 + ε)-approximate Pareto curve, denoted P(1+ε), is a set of cost vectors of feasible solutions such that for every feasible solution s of the problem there is a solution s′ with cost vector in P(1+ε) such that Gi(s′) ≤ (1 + ε)Gi(s) for all i = 1, ..., γ.

2 Bicriteria Local Search
We consider the bicriteria TSP(1,2) with n cities. For an edge e, we denote by c(e) ∈ {(1,1), (1,2), (2,1), (2,2)} its cost, with c(e) = (c1(e), c2(e)). The objective is to find a tour T (a set of edges) minimizing G1(T) = Σ_{e∈T} c1(e) and G2(T) = Σ_{e∈T} c2(e). In the following we develop a local-search-based procedure to find a 3/2-approximate Pareto curve for this bicriteria problem. We use the well-known 2-opt neighborhood for the traveling salesman problem [4]. Given a tour T, its neighborhood N(T) is the set of all tours which can be obtained from T by removing two non-adjacent edges from T (a = [x, y] and b = [u, v] in Figure 1) and inserting two new edges (c = [y, v] and d = [x, u] in Figure 1) in order to obtain a new tour.
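Before turning to the figures, note that the defining property of a (1 + ε)-approximate Pareto curve translates directly into a checker. The following sketch (with illustrative names, for explicitly enumerated cost vectors) makes the quantifier structure explicit:

    def approx_dominates(v, u, eps):
        # v (1+eps)-approximately dominates u: v_i <= (1+eps) * u_i in every objective
        return all(vi <= (1 + eps) * ui for vi, ui in zip(v, u))

    def is_approx_pareto_curve(candidate, feasible_costs, eps):
        # every feasible cost vector must be (1+eps)-approximately dominated
        # by some member of the candidate set
        return all(any(approx_dominates(v, u, eps) for v in candidate)
                   for u in feasible_costs)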
Fig. 1. The 2-opt move.

In the bicriteria setting it is not obvious how to define properly what a local optimum is. The natural preference relation over the set of tours, denoted ≺n, is defined as follows.

Definition 1. Let T and T′ be two tours. One has T′ ≺n T iff
– G1(T′) ≤ G1(T) and G2(T′) < G2(T), or
– G1(T′) < G1(T) and G2(T′) ≤ G2(T).

If we use this natural preference relation to define the notion of a local optimum, i.e. if we say that a tour T is a local optimum tour with respect to the 2-opt neighborhood whenever there is no tour T′ ∈ N(T) such that T′ ≺n T, then there exist instances for which a local optimum tour gives a performance guarantee strictly worse than 3/2 for one criterion. Indeed, in Figure 2, the exact Pareto curve of the depicted instance contains only the tour abcdefghij of weight (10, 10). Thus, a 3/2-approximate Pareto curve of the instance must contain a tour of weight strictly less than 16 in both criteria. The tours aebicdfghj and adjigfecbh are both local optima with respect to ≺n, and their weights are respectively (16, 10) and (10, 16) (see Figure 2). Thus, using local optima with respect to ≺n is not appropriate for computing a 3/2-approximate Pareto curve of the considered problem (more details are given in the full paper).

Fig. 2. Non-represented edges have a weight (2, 2).

Hence, we introduce the following partial preference relations on pairs of edges. These preference relations, denoted by ≺1 and ≺2, are defined in Figure 3. The set of the ten possible couples of cost-vectors of the edges has been partitioned into three sets S1, S2 and S3, and for any s1 ∈ S1, s2 ∈ S2, s3 ∈ S3, we have s1 ≺1 s2, s1 ≺1 s3 and s2 ≺1 s3. Intuitively, preference relation ≺1 (resp. ≺2) means: pairs with at least one (1,1)-weighted edge in front of all others, and among the rest, pairs with at least one (1,2)-weighted edge (resp. (2,1)-weighted edge) in front.

Definition 2. We say that the tour T is a local optimum tour with respect to the 2-opt neighborhood and the preference relation ≺1 if there does not exist a tour T′ ∈ N(T), obtained from T by removing edges a, b and inserting edges c, d, such that {c, d} ≺1 {a, b}. A similar definition holds for the preference relation ≺2.

We consider the following local search procedure:
Bicriteria Local Search (BLS):
1. Let s1 be a 2-opt local optimum tour with the preference relation ≺1.
2. Let s2 be a 2-opt local optimum tour with the preference relation ≺2.
3. If s1 ≺n s2 output {s1}; if s2 ≺n s1 output {s2}; otherwise output {s1, s2}.

In order to find a local optimum tour, we start from an arbitrary solution (say s). We look for a solution s′ in the 2-opt neighborhood of s such that s′ ≺1 s (resp. s′ ≺2 s) and replace s by s′. The procedure stops when no such solution s′ exists, meaning that the solution s is a local optimum with respect to the preference relation ≺1 (resp. ≺2). Notice that the proposed 2-opt local search algorithm does not collapse to the traditional 2-opt local search when applied to the monocriterion special case of the TSP with c1(e) = c2(e) for all edges e.
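A compact sketch of this procedure under stated assumptions: tours are vertex lists, costs a dictionary over edges, and the classes S1, S2, S3 are encoded as 0, 1, 2; all helper names are ours:

    def pref_class_1(pair):
        # class under ≺1: S1 if the pair contains a (1,1) edge,
        # S2 if it contains a (1,2) edge but no (1,1), S3 otherwise
        if (1, 1) in pair:
            return 0
        return 1 if (1, 2) in pair else 2

    def pref_class_2(pair):
        # symmetric class under ≺2, with (1,2) and (2,1) exchanged
        if (1, 1) in pair:
            return 0
        return 1 if (2, 1) in pair else 2

    def local_opt(tour, cost, pref_class):
        # 2-opt local search driven by a preference relation: perform a move
        # whenever the inserted pair {c,d} lies in a strictly smaller class
        # than the removed pair {a,b}
        n, improved = len(tour), True
        while improved:
            improved = False
            for i in range(n - 1):
                for j in range(i + 2, n):
                    if i == 0 and j == n - 1:
                        continue  # the two removed edges must be non-adjacent
                    a = frozenset((tour[i], tour[i + 1]))
                    b = frozenset((tour[j], tour[(j + 1) % n]))
                    c = frozenset((tour[i + 1], tour[(j + 1) % n]))
                    d = frozenset((tour[i], tour[j]))
                    if pref_class([cost[c], cost[d]]) < pref_class([cost[a], cost[b]]):
                        tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                        improved = True
        return tour

Running local_opt from an arbitrary tour once with pref_class_1 and once with pref_class_2, and keeping the ≺n-non-dominated results, yields the output of BLS.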
Indeed, in this monocriterion case our BLS algorithm does not replace a pair of edges with weights (1,1) and (2,2) by a pair of edges with weights (1,1) and (1,1), even if this move improves the quality of the tour. However, allowing such moves does not improve the performance guarantee, as the example in Figure 7 shows. In the next section, we prove the following two theorems.

Theorem 1. The set of solution(s) returned by the Bicriteria Local Search (BLS) procedure is a 3/2-approximate Pareto curve for the multicriteria TSP problem with distances one and two. Moreover, this bound is asymptotically sharp.

Theorem 2. The number of 2-opt moves performed by BLS is O(n).

3 Analysis of BLS
The idea of the proof of Theorem 1 is, as in [7], to compare the numbers of the different types of cost vectors in the obtained local optimum solution(s) with the corresponding numbers in any other feasible solution (including the optimal one). In the following we assume that T is any 2-opt local optimum tour with respect to the preference relation ≺1. The tour O is any fixed tour (in particular, one of the exact Pareto curve). Let us denote by x (resp. y, z and t) the number of (1,1) (resp. (1,2), (2,1) and (2,2)) edges in tour T. We denote with a prime the same quantities for the tour O.

Lemma 1. With the preference relation ≺1 one has x ≥ x′/2.

Proof. Let UO (resp. UT) be the set of (1,1) edges in the tour O (resp. local optimum tour T). We define a function f : UO → UT in the following way. Let e be an edge in UO. If e ∈ UT then f(e) = e. Otherwise let e′ and e′′ be the two edges adjacent to e in the tour T, as depicted in Figure 4 (we assume an arbitrary orientation of T and consider that the only edges adjacent to e are e′ and e′′, and not e4 and e5). Let e′′′ be the edge forming a cycle of length 4 with e, e′ and e′′ (see Figure 4). We claim that there is at least one edge among e′ and e′′ with weight (1,1), and define f(e) to be one of those edges (possibly chosen arbitrarily). Otherwise, we would have {e, e′′′} ∈ S1 and {e′, e′′} ∈ S2 ∪ S3 (see Figure 3), contradicting the fact that T is a local optimum with respect to the preference relation ≺1. Now observe that for a given edge e′ ∈ UT, there can be at most two edges e ∈ UO such that f(e) = e′. Such a case occurs in Figures 5 and 6. Therefore we have |UT| ≥ |UO|/2. ⊓⊔

Fig. 3. The two preference relations ≺1 and ≺2. (a) Under ≺1 the ten possible pairs of edge cost-vectors are partitioned into S1 = {(1,1)(1,1), (1,1)(1,2), (1,1)(2,1), (1,1)(2,2)}, S2 = {(1,2)(1,2), (1,2)(2,1), (1,2)(2,2)} and S3 = {(2,1)(2,1), (2,1)(2,2), (2,2)(2,2)}. (b) Under ≺2 the roles of (1,2) and (2,1) are exchanged.

Fig. 4. The local optimum tour T (arbitrarily oriented).

Fig. 5. f(e1) = f(e2) = e′ with e1, e2 ∈ O and e′ ∈ T.

Fig. 6. f(e1) = f(e2) = e′ with e1, e2 ∈ O and e′ ∈ T.

Lemma 2. With the preference relation ≺2 one has x ≥ x′/2.

Proof. The proof of Lemma 2 is symmetric to the one of Lemma 1; just assume that T is any 2-opt local optimum tour with respect to the preference relation ≺2. ⊓⊔

Lemma 3. With the preference relation ≺1 one has x + y ≥ (x′ + y′)/2.

Proof. Let UO (resp. UT) be the set of (1,1) and (1,2) edges in the tour O (resp. local optimum tour T).
We define a function f : UO → UT in the following way. Let e be an edge in UO. If e ∈ UT then f(e) = e. Otherwise let e′ and e′′ be the two edges adjacent to e in the tour T as depicted in Figure 4 (we assume an arbitrary orientation of T, as in the proof of Lemma 1). Let e′′′ be the edge forming a cycle of length 4 with e, e′ and e′′ (see Figure 4). We claim that there is at least one edge among e′ and e′′ with weight (1,1) or (1,2), and define f(e) to be one of those edges (possibly chosen arbitrarily). Otherwise, we would have {e, e′′′} ∈ S1 ∪ S2 and {e′, e′′} ∈ S3 (see Figure 3), contradicting the fact that T is a local optimum with respect to the preference relation ≺1. Now observe that for a given edge e′ ∈ UT, there can be at most two edges e ∈ UO such that f(e) = e′. Therefore we have |UT| ≥ |UO|/2. ⊓⊔

Proposition 1. If the tour O has a cost (X, X + α) with X a positive integer (n ≤ X ≤ 2n) and n ≥ α ≥ 0, then the solution T achieves a performance guarantee of 3/2 relative to the solution O for both criteria.

Proof. Let (C1O, C2O) be the cost of the tour O and (C1T, C2T) be the cost of the tour T. We have C1T = 2n − x − y, C1O = 2n − x′ − y′, C2T = 2n − x − z and C2O = 2n − x′ − z′. Let us consider the first coordinate. We want to show that C1T/C1O = (2n − x − y)/(2n − x′ − y′) ≤ 3/2. Using Lemma 3 we get (2n − x − y)/(2n − x′ − y′) ≤ (2n − x′/2 − y′/2)/(2n − x′ − y′). Now we have to show
(2n − x′/2 − y′/2)/(2n − x′ − y′) ≤ 3/2 ⟺ 4n − x′ − y′ ≤ 6n − 3x′ − 3y′ ⟺ 2x′ + 2y′ ≤ 2n ⟺ x′ + y′ ≤ n,
which is true since x′ + y′ + z′ + t′ = n and z′, t′ ≥ 0. We consider now the second coordinate. Since the tour O has a cost (X, X + α), we have C2O = C1O + α and therefore z′ = y′ − α. We have to show
(2n − x − z)/(2n − x′ − z′) ≤ 3/2 ⟺ 4n − 2x − 2z ≤ 6n − 3x′ − 3z′ ⟺ 3x′ − 2x + 3z′ − 2z ≤ 2n ⟺ 3x′ − 2x + 3y′ − 3α − 2z ≤ 2(x′ + y′ + z′ + t′) ⟺ x′ − 2x − y′ − α − 2z ≤ 2t′,
which is true since x′ − 2x ≤ 0 by Lemma 1. ⊓⊔

We assume now that T is any 2-opt local optimum tour with respect to the preference relation ≺2. The tour O is any fixed tour. In a similar way as for Lemma 3 we can prove:

Lemma 4. With the preference relation ≺2 one has x + z ≥ (x′ + z′)/2.

Proof. The proof of Lemma 4 is symmetric to the one of Lemma 3. ⊓⊔

Proposition 2. If the tour O has a cost (X + α, X) with X a positive integer (n ≤ X ≤ 2n) and α > 0, then the solution T achieves a performance guarantee of 3/2 relative to the solution O for both criteria.

Proof. The proof of Proposition 2 is symmetric to the one of Proposition 1, using Lemma 4 and Lemma 2 instead of Lemma 3 and Lemma 1. ⊓⊔

Now, we are ready to prove Theorems 1 and 2.

Proof of Theorem 1.
Proof. Let s be an arbitrary tour. If s has a cost (X, X + α), α ≥ 0, then by Proposition 1 the solution s1 3/2-approximately dominates the solution s. Otherwise, s has a cost (X + α, X), α > 0, and by Proposition 2 the solution s2 3/2-approximately dominates the solution s. To see that this bound is asymptotically sharp, consider the instance depicted in Figure 7. The tour s1 s2 . . . s2n s1 is a local optimum with respect to ≺1 and ≺2, and it has a weight n × (1, 1) + n × (2, 2) = (3n, 3n), whereas the optimal tour s1 s3 s2n s4 s2n−1 . . . sn−1 sn+4 sn sn+3 sn+1 sn+2 s2 s1 has a weight (2n − 1) × (1, 1) + (2, 2) = (2n + 1, 2n + 1). ⊓⊔

Fig. 7.
The edges represented have a weight (1, 1), whereas non-represented edges have a weight (2, 2).

Proof of Theorem 2.
Proof. Let T be a tour. Let F1(T) = 3x + y, with x (resp. y) the number of (1, 1) edges (resp. (1, 2) edges) of T. We assume that one 2-opt move, with respect to ≺1, transforms T into T′. Then it is easy to see that F1(T′) ≥ F1(T) + 1 for any such 2-opt move. Indeed, each 2-opt move with respect to ≺1 either increases the number of (1, 2) edges without decreasing the number of (1, 1) edges, or increases the number of (1, 1) edges while decreasing the number of (1, 2) edges by at most two. Since 0 ≤ F1(T) ≤ 3(x + y) ≤ 3n and F1(T) ∈ N, a local search which uses ≺1 converges to a local optimum solution in less than 3n steps. The same proof works for ≺2; just take F2(T) = 3x + z, with x (resp. z) the number of (1, 1) edges (resp. (2, 1) edges) of a tour T. ⊓⊔

4 Concluding Remarks
In this paper we proposed a bicriteria local search procedure based on the standard 2-opt neighborhood which allows one to obtain a 3/2-approximate Pareto curve for the bicriteria TSP(1,2). Our results can be extended to the TSP(a, a + δ) with a ∈ R+* and 0 ≤ δ ≤ a; in that case we obtain a (1 + δ/(2a))-approximate Pareto curve. Since Chandra et al. [6] have shown that, for the TSP satisfying the triangle inequality, the worst-case performance ratio of 2-opt local search is at most 4√n and at least (1/4)√n (and at least (1/4)n^(1/(2k)) for k-opt), our constant approximation result cannot be extended to the metric case. It would, however, be interesting to establish lower and upper bounds for this more general case. Our results can also be applied to the bicriteria version of the MAX TSP(1,2) problem. In this problem, the objective is to maximize the length of the tour. For the monocriterion case the best known approximation algorithm has a performance ratio of 7/8 [8,9] (the previously known approximation algorithm had a performance ratio of 3/4 [10]). We can obtain a 2/3-approximate Pareto curve for the bicriteria case in the following way. The idea is to modify the instance by exchanging each edge weight (2,2) with (1,1) and each edge weight (1,2) with (2,1), and vice versa. It can be shown that obtaining a 3/2-approximate Pareto curve for the bicriteria MIN TSP(1,2) on this modified instance yields a 2/3-approximate Pareto curve for the bicriteria MAX TSP(1,2) on the original instance. Equivalently, we can work on the original instance but use modified preference relations ≺′1 and ≺′2, obtained from ≺1 and ≺2 by the same exchange of edge weights. An interesting question is whether it is possible to obtain constant approximation ratios for the more general k-criteria TSP(1,2) problem (for k > 2). It seems that our approach cannot be directly applied to this case.

References
1. N. Christofides. Worst-case analysis of a new heuristic for the traveling salesman problem. Technical Report, GSIA, Carnegie Mellon University, 1976.
2. R.M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, R.E. Miller and J.W. Thatcher (Eds.), Plenum, NY, 1972.
3. E. Aarts and J.K. Lenstra. Local Search in Combinatorial Optimization. John Wiley and Sons, 1997.
4. D.S. Johnson and L.A.
McGeoch. The traveling salesman problem: a case study in local optimization. Chapter in Local Search in Combinatorial Optimization, E. Aarts and J.K. Lenstra (eds.), John Wiley and Sons, 1997.
5. M. Ehrgott. Multicriteria Optimization. Lecture Notes in Economics and Mathematical Systems, vol. 491, Springer, 2000.
6. B. Chandra, H. Karloff and C. Tovey. New results on the old k-opt algorithm for the TSP. SIAM Journal on Computing, 28(6), 1998–2029, 1999.
7. S. Khanna, R. Motwani, M. Sudan and V. Vazirani. On syntactic versus computational views of approximability. SIAM Journal on Computing, 28(1), 164–191, 1998.
8. J. Monnot. Differential approximation results for the traveling salesman and related problems. Information Processing Letters, 82(5), 229–235, 2002.
9. J. Monnot, V. Th. Paschos and S. Toulouse. Differential approximation results for the traveling salesman problem with distances 1 and 2. European Journal of Operational Research, 145, 557–568, 2003.
10. A.I. Serdyukov. An algorithm with an estimate for the traveling salesman problem of the maximum. Upravlyaemye Sistemy, 25, 80–86, 1984.
11. C.H. Papadimitriou and M. Yannakakis. On the approximability of trade-offs and optimal access of web sources. In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, 86–92, 2000.
12. C.H. Papadimitriou and M. Yannakakis. The traveling salesman problem with distances one and two. Mathematics of Operations Research, 18(1), 1–11, 1993.
13. C.H. Papadimitriou and S. Vempala. On the approximability of the traveling salesman problem. In Proc. STOC'00, 126–133, 2000.
14. A. Gupta and A. Warburton. Approximation methods for multiple criteria traveling salesman problems. In Towards Interactive and Intelligent Decision Support Systems, Proc. of the 7th International Conference on Multiple Criteria Decision Making (Y. Sawaragi, ed.), Springer-Verlag, 211–217, 1986.

Scheduling to Minimize Max Flow Time: Offline and Online Algorithms⋆
Monaldo Mastrolilli
IDSIA, Galleria 2, 6928 Manno, Switzerland, monaldo@idsia.ch

Abstract. We investigate the max flow time scheduling problem in the off-line and on-line settings. We prove positive and negative theoretical results. In the off-line setting, we address the unrelated parallel machines model and present the first known fully polynomial time approximation scheme when the number of machines is fixed. In the on-line setting, when the machines are identical, we analyze the First In First Out (FIFO) heuristic when preemption is allowed. We show that FIFO is an on-line algorithm with a (3 − 2/m)-competitive ratio. Finally, we present two lower bounds on the competitive ratio of deterministic on-line algorithms.

1 Introduction
The m-machine scheduling problem is one of the most widely studied problems in computer science, with an almost limitless number of variants ([3,6,12,18] are surveys). The most common objective function is the makespan, which is the length of the schedule, or equivalently the time when the last job is completed. This objective function formalizes the viewpoint of the owner of the machines: if the makespan is small, the utilization of the machines is high; this captures the situation when the benefits of the owner are proportional to the work done. If we turn our attention to the viewpoint of a user, the time it takes to finish individual jobs may be more important; this is especially true in interactive environments.
Thus, if many jobs that are released early are postponed to the end of the schedule, this is unacceptable to the user of the system even if the makespan is optimal. For that reason other objective functions are studied. A well-studied such objective function is the total flow time [1,13,17]. The flow time of a job is the time the job spends in the system, i.e., its completion time minus the time when it first becomes available. The total flow time is the sum of these values over all jobs. The Shortest Remaining Processing Time (SRPT) heuristic produces a schedule with optimum total flow time when there is a single processor (see [12]). Unfortunately, this heuristic has the well-known drawback that it leads to starvation: some jobs may be delayed to an unbounded extent. Inducing starvation is an inherent property of the total flow time metric. In particular, there exist inputs where any schedule that is optimal for total flow time forces the starvation of some job (see Lemma 2.1 in [2]). This property is undesirable. From the discussion above, it is natural to conclude that in order to avoid starvation, one should bound the flow time of each job. This motivates the study of the minimization of the maximum flow time.
⋆ Supported by the "Metaheuristics Network", grant HPRN-CT-1999-00106, and by Swiss National Science Foundation project 20-63733.00/1, "Resource Allocation and Scheduling in Flexible Manufacturing Systems".
Problems: We address three basic types of parallel machine models. In each there are n jobs J1, ..., Jn to be scheduled on m machines M1, ..., Mm. Each machine can process at most one job at a time, and each job must be processed in an uninterrupted fashion on one of the machines. We will also consider the preemptive case, in which a job may be interrupted on one machine and continued later (possibly on another machine) without penalty. Job Jj (j = 1, ..., n) is released at time rj ≥ 0 and cannot start processing before that time. In the most general setting, the machines are unrelated: job Jj takes pij = pj/sij time units when processed by machine Mi, where pj is the processing requirement of job Jj and sij is the speed of machine Mi for job Jj. If the machines are uniformly related, then each machine Mi runs at a given speed si for all jobs. Finally, for identical machines, we assume that si = 1 for each machine Mi. We denote the completion time of job Jj in a schedule S by CjS, or Cj if no confusion is possible. The flow time of job Jj is defined as Fj = Cj − rj, and the maximum flow time is Fmax = max_{j=1,...,n} Fj. We seek to minimize the maximum flow time. In the off-line version of the problem, it is assumed that the scheduler has full information about the problem instance. By contrast, in the on-line version of the problem, jobs are introduced to the algorithm at their release times; thus the algorithm bases its decisions only upon information related to already released jobs. In the on-line paradigm, we distinguish between the clairvoyant and the non-clairvoyant model. In the clairvoyant model we assume that once a job is known to the scheduler, its processing time is also known. In the non-clairvoyant model the processing time of a job is unknown until its processing is completed.
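These definitions translate directly into code. The following sketch (with illustrative names) computes the maximum flow time Fmax = max_j (Cj − rj) achieved on m identical machines by the FIFO rule discussed under Previous Work below:

    import heapq

    def fifo_max_flow_time(jobs, m):
        # jobs: list of (r_j, p_j) pairs; FIFO takes them in release order
        # and puts each job on the machine on which it would finish first,
        # which on identical machines is the machine that frees up earliest
        free = [0.0] * m              # earliest idle time of each machine (a heap)
        fmax = 0.0
        for r, p in sorted(jobs):     # non-decreasing release times
            t = heapq.heappop(free)
            completion = max(t, r) + p    # C_j; processing cannot start before r_j
            heapq.heappush(free, completion)
            fmax = max(fmax, completion - r)
        return fmax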
Previous Work: To the best of our knowledge, the only known result for the non-preemptive max flow time scheduling problem is due to Bender et al. [2]. They address the on-line non-preemptive problem with identical parallel machines (in the notation of Graham et al. [6], this problem is denoted P|on-line; rj|Fmax). In [2] they claim that the First In First Out (FIFO) heuristic (that is, scheduling jobs in the order they arrive, each on the machine on which it will finish first) is a (3 − 2/m)-competitive algorithm1.
1 A ρ-competitive algorithm is an on-line algorithm that finds a solution within a ρ factor of the optimum.
When preemption is allowed, in each of the three types of parallel machine models there are polynomial-time off-line algorithms for finding optimal preemptive solutions: these are obtained by adapting the approaches proposed in [14,15] for the preemptive parallel machine problems with release times and deadlines. In [14,15] the objective function is the minimization of the maximum lateness Lmax = max Lj, where Lj is the lateness of job Jj, that is, the completion time of Jj minus its deadline (the time by which job Jj must be completed). We can use the algorithms in [14,15] for preemptive maximum flow time minimization by setting the deadline of each job equal to its release time. When the job release times are identical, the problem reduces to the classical makespan minimization problem. In this case the three types of parallel machine models have been studied extensively (see [3,6,12,18] for surveys). Here, we only mention that these related scheduling problems are all strongly NP-hard [5], and polynomial time approximation schemes2 (PTAS) are known when the machines are either identical or uniformly related [7,8]. For unrelated machines, Lenstra, Shmoys and Tardos [16] gave a polynomial-time 2-approximation algorithm for this problem, and this is the currently best known approximation ratio achieved in polynomial time. They also proved that for any positive ε < 1/2, no polynomial-time (1 + ε)-approximation algorithm exists, unless P=NP. Since the problem is NP-hard even for m = 2, it is natural to ask how well the optimum can be approximated when there is only a constant number of machines. In contrast to the previously mentioned inapproximability result for the general case, there exists a fully polynomial-time approximation scheme for the problem when m is fixed. Horowitz and Sahni [10] proved that for any ε > 0, an ε-approximate solution can be computed in O(nm(nm/ε)^(m−1)) time, which is polynomial in both n and 1/ε if m is constant. More recently, Jansen and Porkolab [11] presented a fully polynomial time approximation scheme for the problem whose running time is linear in the number of jobs, later improved by Fishkin, Jansen and Mastrolilli [4]. Note that, as the makespan problem is a special case of the max flow time problem, all the mentioned negative results also hold for the problems addressed in this paper.
Our Results: In this paper, we investigate the max flow time problem in the off-line and on-line settings. We prove positive and negative theoretical results. In the off-line setting, we address the unrelated parallel machines model (Section 2.1) and present, when the number m of machines is fixed, the first known fully polynomial time approximation scheme (FPTAS).
Observe that no polynomial time approximation scheme is possible when the number of machines is part of the input [16], unless P=NP. Therefore, for fixed m, obtaining an FPTAS is to some extent the strongest possible result.
2 Algorithms that, for any fixed ε > 0, find a solution within a (1 + ε) factor of the optimum in polynomial time. If the running time is bounded by a polynomial in the input size and 1/ε, then these algorithms are called fully polynomial time approximation schemes (FPTAS).
In the on-line setting and when the machines are identical, we analyze the (non-preemptive) FIFO heuristic when preemption is allowed (denoted P|on-line; pmtn; rj|Fmax in the notation of Graham et al. [6]). Bender et al. [2] claimed that this strategy is a (3 − 2/m)-competitive algorithm for the non-preemptive problem. We show (Section 3.1) that FIFO comes within the same bound of the optimal preemptive schedule length. Since FIFO does not depend on the sizes of the jobs, it is also an on-line non-clairvoyant algorithm with a (3 − 2/m)-competitive ratio. In Section 3.2 we show that no 1-competitive (optimal) on-line algorithm is possible for the preemptive problem (P|on-line; pmtn; rj|Fmax). This result should be contrasted with the related problem P|on-line; pmtn; rj|Cmax (i.e., the same problem with makespan as objective function), which admits an optimal on-line algorithm [9]. In Section 3.3, we show that in the non-clairvoyant model the competitive ratio cannot be better than 2. This proves that the competitive ratio of FIFO matches the lower bound when m = 2. Finally, in Section 3.4 we address the problem with uniformly related parallel machines and identical processing times (denoted Q|on-line; pj = p; rj|Fmax according to [6]). We show that in this case FIFO is 1-competitive (optimal). Due to the page limit, several proofs had to be omitted from this version of the paper. A complete version of the paper is available (http://www.idsia.ch/˜monaldo/research papers.html).

2 Offline Max Flow Time
2.1 An FPTAS for Unrelated Parallel Machines
In this section we consider the off-line problem of scheduling a set J = {J1, ..., Jn} of n independent jobs on a set M = {M1, ..., Mm} of m unrelated parallel machines. We present an FPTAS for the case that the number m of machines is a constant. Our approach consists of partitioning the set of jobs into blocks B(1), B(2), ..., such that jobs belonging to any block can be scheduled regardless of jobs belonging to other blocks (Separation Property). The FPTAS follows by presenting a (1 + ε)-approximation algorithm for each block of jobs.
Separation Property. Let pj = min_{i=1,...,m} pij denote the smallest processing time of job Jj. Let R = {r(1), r(2), ..., r(ρ)} be the set of all release dates (ρ ≤ n is the number of different release values). Assume, without loss of generality, that r(1) < r(2) < ... < r(ρ), and set r(ρ + 1) = ∞. Partition the jobs according to their release times and let N(i) = {Jj : rj = r(i)}, i = 1, ..., ρ, denote the set of jobs released at time r(i). Finally, let PN(i) be the sum of the smallest processing times of jobs from N(i), i.e., PN(i) = Σ_{Jj∈N(i)} pj.
Block Definition. The first block B(1) is defined as follows. If r(1) + PN(1) ≤ r(2) then B(1) = N(1). Otherwise, if r(1) + PN(1) + PN(2) ≤ r(3) then B(1) = N(1) ∪ N(2); else continue similarly.
More formally, B(1) = ∪_{i=1,...,b1} N(i), where b1 is the smallest positive integer such that r(1) + Σ_{i=1,...,b1} PN(i) ≤ r(b1 + 1). Therefore, if a job belongs to B(1), then it can be completed not later than time r(b1 + 1) (by assigning jobs to the machines with the smallest processing requirements). Other possible blocks are obtained in a similar way: if r(b1 + 1) ≤ r(ρ), then discard all jobs from B(1) and apply a similar procedure to obtain the next block B(2). More formally, for w = 2, 3, ..., the w-th block is defined as B(w) = ∪_{i=b_{w−1}+1,...,bw} N(i), where bw is the smallest positive integer such that r(b_{w−1} + 1) + Σ_{i=b_{w−1}+1,...,bw} PN(i) ≤ r(bw + 1). In the following, let us use β to denote the number of blocks. By definition, observe that bβ = ρ.
Block Property. Let rB(i) be the earliest release time of jobs from block B(i), i.e., rB(i) = min_{Jj∈B(i)} rj, and let PB(i) = Σ_{Jj∈B(i)} pj. Formally, we claim that jobs belonging to any block can be scheduled regardless of jobs belonging to other blocks. A sufficient condition for this separation property would be that in any 'good' (optimal or approximate) solution all jobs from block B(i) (i = 1, ..., β) could be scheduled between time rB(i) and rB(i) + PB(i). However, this is not always true for this problem, as Example 1 shows.

Example 1. Consider an instance with 3 jobs and 2 machines. The data are reported in the table of Figure 1. In this example we have only one block B(1), and rB(1) + PB(1) = 5. Figure 1 shows an optimal solution (F*max = 3) in which the last job completes at time 6 (> rB(1) + PB(1)).

j | rj | p1j | p2j
1 | 0  | 3   | 10
2 | 2  | 1   | 3
3 | 3  | 3   | 1

Fig. 1. Block example.

We overcome this problem by showing that there always exists at least one 'good' (optimal or approximate) solution in which all jobs from block B(i) (i = 1, ..., β) are scheduled between time rB(i) and rB(i) + PB(i). We prove this by exhibiting an algorithm which transforms any solution into another solution with the desired separation property; moreover, the objective function value of the new solution is not worse than that of the previous one.
Separation Algorithm. Assume that we have a solution SOL of value Fmax in which jobs from different blocks are not scheduled separately. Then there exists at least one block, say B(w), in which the last job of B(w) completes after time rB(w) + PB(w). For those blocks B(w), starting with the block with the lowest index w, we show how to reschedule the jobs from B(w) so that they are completed within time rB(w) + PB(w), without worsening the solution value. Let C(i) denote the time by which all jobs from N(i) are completed according to solution SOL, i.e., the time the last job from N(i) completes. Observe that Fmax = max_i (C(i) − r(i)). Recall the block definition B(w) = ∪_{i=b_{w−1}+1,...,bw} N(i), and let N(l) ⊆ B(w) be the last released group of jobs such that C(l) ≤ rB(w) + Σ_{i=b_{w−1}+1,...,l} PN(i). By construction we have C(x) > rB(w) + Σ_{i=b_{w−1}+1,...,x} PN(i) for x = l + 1, ..., bw. Now remove from SOL all jobs belonging to N(l + 1) ∪ ... ∪ N(bw) and reschedule them in order of non-decreasing release times, each on the machine requiring the lowest processing time. We claim that according to the new solution SOL′ the completion time C′(i) of every class N(i) is not increased, i.e. C′(i) ≤ C(i) for i = l + 1, ..., bw, and all jobs from B(w) are completed by time rB(w) + PB(w).
Indeed, the new completion time C′(l + 1) of jobs from N(l + 1) is bounded by C(l) + PN(l+1), which is at most rB(w) + Σ_{i=b_{w−1}+1,...,l+1} PN(i) and, by construction, less than C(l + 1). More generally, this property holds for every set N(x + 1) with x = l + 1, ..., bw, i.e.,
C′(x + 1) ≤ C′(x) + PN(x+1) ≤ rB(w) + Σ_{i=b_{w−1}+1,...,x+1} PN(i) < C(x + 1).
It follows that in solution SOL′ every job from N(x) (⊆ B(w)) is completed within time rB(w) + Σ_{i=b_{w−1}+1,...,x} PN(i), and therefore every job from B(w) is completed by time rB(w) + PB(w). Moreover, the maximum flow time F′max of the new solution is not increased, since F′max = max_i (C′(i) − r(i)) ≤ max_i (C(i) − r(i)) = Fmax.

Lemma 1. Without increasing the maximum flow time, any given solution can be transformed into a new feasible solution having all jobs from block B(w) (w = 1, ..., β) scheduled between time rB(w) and rB(w) + PB(w).

Block Approximation. By Lemma 1, a (1 + ε)-approximate solution can be obtained as follows: starting from the first block, compute a (1 + ε)-approximate schedule for each block B(w) that starts at time rB(w) and completes by time rB(w) + PB(w), i.e., not later than the earliest starting time of the next block B(w + 1) of jobs. A (1 + ε)-approximate solution can be computed in polynomial time if there exists a polynomial time (1 + ε)-approximation algorithm for each block of jobs. By the previous arguments, we focus our attention on a single block of jobs and assume, without loss of generality, that the input instance is given by this set of jobs. For simplicity of notation we again use n to denote the number of jobs in the block instance and {J1, ..., Jn} to denote the set of jobs. Moreover, we assume, without loss of generality, that the earliest release date is zero, i.e., min_j rj = 0. Observe that pmax = max_j pj is a lower bound for the minimum objective value F*max, i.e., F*max ≥ pmax. By the block definition, by Lemma 1, and since min_j rj = 0, all jobs can be completed by time Σ_{j=1,...,n} pj ≤ npmax. Moreover, any solution that completes within time npmax has a maximum flow time that cannot be larger than npmax. Therefore, the optimal objective value F*max can be bounded as follows: pmax ≤ F*max ≤ npmax. Without loss of generality, we restrict our attention to finding those solutions with maximum flow time at most npmax. Therefore we can discard all solutions whose last job completes later than 2npmax, since all solutions of greater length have a maximum flow time larger than npmax. Similarly, we will implicitly assume that job Jj cannot be scheduled on a machine Mi with pij > npmax, since otherwise the resulting schedule would have a maximum flow time larger than npmax. In the following we show how to compute a (1 + ε)-approximate solution in which the last job completes not later than 2npmax. By Lemma 1, this solution can always be transformed into a (1 + ε)-approximate solution with the last job completing not later than Σ_{j=1,...,n} pj. The (1 + ε)-approximation algorithm is structured in the following three steps:
1. Round input values.
2. Find an optimal solution of the rounded instance.
3. Unround values.
We will first describe step 2, then step 1 together with its "inverse" step 3.
An Optimal Algorithm. We start by making some observations regarding the maximum flow time of a schedule. First renumber the jobs such that r1 ≤ r2 ≤ ... ≤ rn holds.
A simple job interchange argument shows that for a single machine, the maximum flow time is minimized if the jobs are processed in non-decreasing order of release times. This property was first observed by Bender et al. [2]. We may view any m-machine schedule as an assignment of the set of jobs to machines, with the jobs assigned to machine Mi being processed in increasing order of index. Consequently, given an assignment, the max flow time is easily computed. We are interested in obtaining an assignment which minimizes Fmax; thus we may regard assignment and schedule as synonymous. A completion configuration c is an m-dimensional vector c = (c1, ..., cm): ci denotes the completion time of machine Mi, for i = 1, ..., m. A partial schedule σk is an assignment of the jobs J1, ..., Jk to machines. A completion schedule ωk is an assignment of the remaining jobs Jk+1, ..., Jn to machines. Consider two partial schedules σk1 and σk2 such that according to σk1 the last job on machine Mi (for i = 1, ..., m) completes not later than the last job scheduled on the same machine Mi according to σk2, and moreover the maximum flow time of σk1 is not larger than that of σk2. If this happens we say that σk1 dominates σk2. It is easy to check that, whatever the completion schedule ωk, the schedule obtained by combining σk1 with ωk cannot be worse than that attainable with σk2 and ωk. Therefore, with no loss, we can discard all dominated partial schedules. The reason is that by adding the remaining jobs Jk+1, ..., Jn in order of increasing index, the completion time of the current job Jj (j = k + 1, ..., n) is a monotone non-decreasing function of the completion times of the machines before scheduling Jj (and does not depend on how J1, ..., Jj−1 are actually scheduled). Therefore, if jobs J1, ..., Jk are scheduled according to σk1, then the maximum flow time of jobs Jk+1, ..., Jn, when scheduled according to any ωk, cannot be larger than the maximum flow time of the same set of jobs when J1, ..., Jk are scheduled according to σk2. We encode a feasible schedule s by an (m + 1)-dimensional vector s = (c1, ..., cm, F), where (c1, ..., cm) is a completion configuration and F is the maximum flow time in s. We say that schedule s1 = (c′1, ..., c′m, F′) dominates s2 = (c″1, ..., c″m, F″) if c′i ≤ c″i for i = 1, ..., m, and F′ ≤ F″. Moreover, since F*max ≤ npmax, we classify as dominated all those schedules s = (c1, ..., cm, F) with F > npmax. The latter implies ci ≤ 2npmax (i = 1, ..., m) in any non-dominated schedule. For every s = (c1, ..., cm, F), let us define the operator ⊕ as follows: s ⊕ pij = (c1, ..., c′i, ..., cm, F′), where c′i = ci + pij if rj ≤ ci and c′i = rj + pij otherwise, and F′ = max{F; c′i − rj}. The following dynamic programming algorithm computes the optimal solution:

Algorithm OPT-Fmax
1. Initialization: L0 ← {(c1 = 0, ..., cm = 0, 0)}
2. For j = 1 to n
3.   For i = 1 to m
4.     For every vector s ∈ Lj−1 put vector s ⊕ pij in Lj
5.   Discard from Lj all dominated schedules
6. Output: return the vector (c1, ..., cm, F) ∈ Ln with minimum F

At line 4, the algorithm schedules job Jj at the end of machine Mi. At line 5, all dominated partial schedules are discarded. The total running time of the dynamic program is O(nmD), where D is the maximum number of non-dominated schedules at steps 4 and 5.
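Under the stated assumptions (jobs renumbered by release time, the pij given explicitly), the block partition of the Separation Property and Algorithm OPT-Fmax can be sketched as follows; the function names are ours, and the quadratic dominance pruning is deliberately naive, for illustration only:

    def blocks(groups):
        # groups: sorted list of (r(i), P_N(i)) pairs; a block closes at the
        # first b such that r(start) + accumulated P's <= r(b + 1)
        out, i = [], 0
        while i < len(groups):
            start, load, blk = groups[i][0], 0.0, []
            while i < len(groups):
                load += groups[i][1]
                blk.append(i)
                i += 1
                nxt = groups[i][0] if i < len(groups) else float('inf')
                if start + load <= nxt:
                    break
            out.append(blk)
        return out

    def opt_fmax(r, p):
        # r[j]: release time of J_j (non-decreasing); p[i][j]: time of J_j on M_i.
        # States map completion configurations (c_1,...,c_m) to the best F.
        m = len(p)
        states = {(0.0,) * m: 0.0}
        for j in range(len(r)):
            nxt = {}
            for conf, F in states.items():
                for i in range(m):                       # the operator s ⊕ p_ij
                    ci = max(conf[i], r[j]) + p[i][j]
                    c2 = conf[:i] + (ci,) + conf[i + 1:]
                    F2 = max(F, ci - r[j])
                    if nxt.get(c2, float('inf')) > F2:
                        nxt[c2] = F2
            items = list(nxt.items())                    # prune dominated states
            states = {c: F for c, F in items
                      if not any(G <= F and (d, G) != (c, F)
                                 and all(dk <= ck for dk, ck in zip(d, c))
                                 for d, G in items)}
        return min(states.values())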
Let δ be the maximum number of different values that each machine completion time ci can take in any non-dominated schedule. The reader should have no difficulty bounding D by O(δ^m). Therefore, the described algorithm is, for every fixed m, a polynomial time algorithm iff δ is polynomial in n and 1/ε. The next subsection shows how to transform any given instance so that the latter holds.
Rounding and Unrounding Jobs. Let ε > 0 be an arbitrarily small rational number and assume, for simplicity, that 1/ε is an integral value. The first step is to round down every processing and release time to the nearest lower value of the form (εpmax/2n)·i, for i = 0, 1, . . . , 2n²/ε; clearly this does not increase the objective function value. Note that the largest release time rn is not greater than npmax, since all jobs can be completed by that time. Then, find the optimal solution SOL of the resulting instance by using the dynamic programming approach described in the previous subsection. Observe that, since in every non-dominated schedule the completion time ci of any machine Mi cannot be larger than 2npmax, the maximum number δ of different values of ci is now bounded by 1 + (2npmax)/(εpmax/2n) = 1 + 4n²/ε, i.e., polynomial in n and 1/ε. Solution SOL can easily be modified to be a feasible solution also for the original instance. First, delay the starting time of each job by εpmax/2n (this is sufficient to guarantee that no job starts before its original release date); the completion time of each job may increase by at most εpmax/2n. Second, replace the rounded processing values with the originals; now the completion time of each job may increase by at most εpmax/2 (here we are using the assumption that each processing time cannot be larger than npmax, and that each processing time may increase by at most εpmax/2n). Therefore, we may potentially increase the maximum flow time of SOL by at most εpmax/2 + εpmax/2n ≤ εF*max. This results in a (1 + ε)-approximate solution for the block instance. The total running time of the described FPTAS is determined by the dynamic programming algorithm, that is, O(nm(n²/ε)^m).

Theorem 1. For the problem of minimizing the maximum flow time in scheduling n jobs on m unrelated machines (m fixed), there exists a fully polynomial time approximation scheme that runs in O(nm(n²/ε)^m) time.

3 Online Max Flow Time
3.1 Analysis of FIFO for P|on-line; pmtn; rj|Fmax
In this section we analyze the FIFO heuristic when preemption is allowed, in the identical machines model. Bender et al. [2] claimed that this strategy is a (3 − 2/m)-competitive algorithm for non-preemptive scheduling. We show that FIFO (which is non-preemptive) comes within the same bound of the optimal preemptive schedule length. Since FIFO does not depend on the sizes of the jobs, it is also an on-line non-clairvoyant algorithm with a (3 − 2/m)-competitive ratio. In Section 3.2 we will show that no 1-competitive (optimal) on-line algorithm is possible.
Lower Bounds. First observe that pmax = max_j pj is a lower bound for the minimum objective value F*max, i.e., F*max ≥ pmax. In the following we provide a second lower bound. Consider a relaxed version of the problem in which a job Jj can be processed by more than one machine simultaneously, without changing the total processing time pj that Jj spends on the machines. Let us call this relaxed version of the problem the fractional problem.
Clearly the optimal value F°max of the fractional problem cannot be larger than F*max, i.e., the optimal preemptive max flow time. Now, recall the definitions given in Subsection 2.1 and, without loss of generality, renumber the jobs J1, J2, ..., Jn such that r1 ≤ r2 ≤ ... ≤ rn. Consider the following rule, which we call fractional FIFO: schedule jobs in order of increasing index, assigning pj/m time units of job Jj (j = 1, ..., n) to each machine.

Lemma 2. The optimal solution of the fractional problem can be obtained by using the fractional FIFO.

Now, according to the fractional FIFO, let the fractional load ℓ(i) at time r(i) be defined as the total sum of processing times of jobs that at time r(i) have been released but not yet finished. More formally, we have ℓ(1) = PN(1) and ℓ(i + 1) = PN(i+1) + max{ℓ(i) − m(r(i + 1) − r(i)); 0}. By Lemma 2, the maximum flow time FN(i) of jobs from N(i) is the time required to process all jobs that at time r(i) have been released but not yet finished, i.e., FN(i) = ℓ(i)/m. The optimal solution value F°max of the fractional solution is therefore equal to ℓmax/m = (1/m) max_{i=1,...,ρ} ℓ(i). We will refer to this value ℓmax as the maximal fractional load over time. Since the optimal solution value F*max of our original preemptive problem cannot be smaller than F°max, we have the following lower bounds:
F*max ≥ max{ℓmax/m; pmax}.   (1)
Analysis of FIFO. We start by showing that FIFO delivers a schedule whose maximum flow time is within ℓmax/m + 2(1 − 1/m)pmax.

Lemma 3. FIFO returns a solution with maximum flow time bounded by ℓmax/m + 2(1 − 1/m)pmax.

By Lemma 3 and inequality (1) it follows that FIFO is a (3 − 2/m)-competitive algorithm. Moreover, this bound is tight.

Theorem 2. FIFO is a (3 − 2/m)-competitive algorithm for P|on-line; pmtn; rj|Fmax, and this bound is tight.

3.2 A Lower Bound for P|on-line; pmtn; rj|Fmax
We show that no on-line preemptive algorithm can be 1-competitive.

Theorem 3. The competitive ratio of any deterministic algorithm for P|on-line; pmtn; rj|Fmax is at least 1 + 1/14.

This result should be contrasted with the related problem P|on-line; pmtn; rj|Cmax (i.e., the same problem with makespan as objective function), which admits an optimal on-line algorithm [9]. Moreover, we already observed that in the off-line setting the problem can be solved optimally in polynomial time by adapting the algorithms described in [14,15].

3.3 A Lower Bound for P|on-line-nclv; rj|Fmax
When job processing times are known at their arrival dates (the clairvoyant model), Bender et al. [2] observed a simple lower bound of 3/2 on the competitive ratio of any on-line deterministic algorithm. In the following we show that in the non-clairvoyant model the competitive ratio cannot be better than 2. This shows that the competitive ratio of FIFO matches the lower bound when m = 2.

Theorem 4. The competitive ratio of any deterministic algorithm for P|on-line-nclv; rj|Fmax is at least 2.

3.4 Analysis of FIFO for Q|on-line; pj = p; rj|Fmax
We address the problem with identical and uniformly related parallel machines. We assume that the processing times of the jobs are identical. Simple analysis shows that FIFO is optimal.

Theorem 5. FIFO is 1-competitive for Q|on-line; pj = p; rj|Fmax.

References
1. B. Awerbuch, Y. Azar, S. Leonardi, and O. Regev. Minimizing the flow time without migration.
In Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC'99), pages 198–205, 1999.
2. M. A. Bender, S. Chakrabarti, and S. Muthukrishnan. Flow and stretch metrics for scheduling continuous job streams. In Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'98), pages 270–279, 1998.
3. B. Chen, C. Potts, and G. Woeginger. A review of machine scheduling: complexity, algorithms and approximability. Handbook of Combinatorial Optimization, 3:21–169, 1998.
4. A. Fishkin, K. Jansen, and M. Mastrolilli. Grouping techniques for scheduling problems: simpler and faster. In 9th Annual European Symposium on Algorithms (ESA'01), LNCS 2161, pages 206–217, 2001.
5. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, 1979.
6. R. Graham, E. Lawler, J. Lenstra, and A. Rinnooy Kan. Optimization and approximation in deterministic sequencing and scheduling: a survey. Volume 5, pages 287–326. North-Holland, 1979.
7. D. Hochbaum and D. Shmoys. Using dual approximation algorithms for scheduling problems: theoretical and practical results. Journal of the ACM, 34:144–162, 1987.
8. D. Hochbaum and D. Shmoys. A polynomial approximation scheme for machine scheduling on uniform processors: using the dual approximation approach. SIAM Journal on Computing, 17:539–551, 1988.
9. K. S. Hong and J. Y.-T. Leung. On-line scheduling of real-time tasks. IEEE Transactions on Computers, 41:1326–1331, 1992.
10. E. Horowitz and S. Sahni. Exact and approximate algorithms for scheduling nonidentical processors. Journal of the ACM, 23(2):317–327, 1976.
11. K. Jansen and L. Porkolab. Improved approximation schemes for scheduling unrelated parallel machines. In Proceedings of the 31st Annual ACM Symposium on the Theory of Computing, pages 408–417, 1999.
12. D. Karger, C. Stein, and J. Wein. Scheduling algorithms. In M. J. Atallah, editor, Handbook of Algorithms and Theory of Computation. CRC Press, 1997.
13. H. Kellerer, T. Tautenhahn, and G. J. Woeginger. Approximability and nonapproximability results for minimizing total flow time on a single machine. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC'96), pages 418–426, 1996.
14. J. Labetoulle, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan. Preemptive scheduling of uniform machines subject to release dates. In W. R. Pulleyblank, editor, Progress in Combinatorial Optimization, pages 245–261. Academic Press, 1984.
15. E. Lawler and J. Labetoulle. On preemptive scheduling of unrelated parallel processors by linear programming. Journal of the ACM, 25:612–619, 1978.
16. J. K. Lenstra, D. B. Shmoys, and E. Tardos. Approximation algorithms for scheduling unrelated parallel machines. Mathematical Programming, 46:259–271, 1990.
17. S. Leonardi and D. Raz. Approximating total flow time on parallel machines. In Proceedings of the 29th Annual ACM Symposium on the Theory of Computing (STOC'97), pages 110–119, 1997.
18. J. Sgall. On-line scheduling – a survey. In A. Fiat and G. Woeginger, editors, On-Line Algorithms, Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1997.

Linear Time Algorithms for Some NP-Complete Problems on (P5,Gem)-Free Graphs (Extended Abstract)
Hans Bodlaender1, Andreas Brandstädt2, Dieter Kratsch3, Michaël Rao3, and Jeremy Spinrad4
1 Institute of Information and Computing Sciences, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands, hansb@cs.uu.nl
2 Fachbereich Informatik, Universität Rostock, A.-Einstein-Str.
21, 18051 Rostock, Germany, ab@informatik.uni-rostock.de
3 Université de Metz, Laboratoire d'Informatique Théorique et Appliquée, 57045 Metz Cedex 01, France, fax: ++ 00 33 387315309, {kratsch,rao}@sciences.univ-metz.fr
4 Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville TN 37235, U.S.A., spin@vuse.vanderbilt.edu

Abstract. A graph is (P5,gem)-free when it contains neither a P5 (an induced path on five vertices) nor a gem (the graph formed by making a universal vertex adjacent to each of the four vertices of an induced path P4) as an induced subgraph. Using a characterization of (P5,gem)-free graphs by their prime graphs with respect to modular decomposition and their modular decomposition trees [6], we obtain linear time algorithms for the following NP-complete problems on (P5,gem)-free graphs: Minimum Coloring, Maximum Weight Stable Set, Maximum Weight Clique, and Minimum Clique Cover.
Keywords: algorithms, graph algorithms, NP-complete problems, modular decomposition, (P5,gem)-free graphs.

1 Introduction
Graph decompositions play an important role in graph theory. The central role of decompositions in the recent proof of one of the major open conjectures in graph theory, the so-called Strong Perfect Graph Conjecture of C. Berge, is an exciting example [9]. Furthermore, various decompositions of graphs, such as decomposition by clique cutsets, tree-decomposition and clique-width, are often used to design efficient graph algorithms. There are even beautiful general results stating that a variety of NP-complete graph problems can be solved in linear time on graphs of bounded treewidth and bounded clique-width, respectively [1,12]. Despite the fact that modular decomposition is a well-known decomposition technique in graph theory whose algorithmic uses seem simple and obvious, there has been relatively little research on non-trivial uses of modular decomposition, such as designing polynomial time algorithms for NP-complete problems on special graph classes. An important exception is the large number of linear and polynomial time algorithms for cographs [10,11], i.e. P4-free graphs, which are known to have a cotree representation that allows one to solve various NP-complete problems in linear time when restricted to cographs, among them Maximum (Weight) Stable Set, Maximum (Weight) Clique, Minimum Coloring and Minimum Clique Cover [10,11]. The original motivation of the authors of [6] for studying (P5,gem)-free graphs, as a natural generalization of cographs, had been to construct a faster, possibly linear time algorithm for the Maximum Stable Set problem on (P5,gem)-free graphs. They established a characterization of the (P5,gem)-free graphs by their prime induced subgraphs, called the Structure Theorem for (P5,gem)-free graphs. We show in this paper that the Structure Theorem is a powerful tool for designing efficient algorithms for NP-complete problems on (P5,gem)-free graphs. All our algorithms use the modular decomposition tree of the input graph and the structure of the prime (P5,gem)-free graphs. We are convinced that efficient algorithms for other NP-complete graph problems (e.g. domination problems) on (P5,gem)-free graphs can also be obtained by this approach.
It is remarkable that only a few papers establish efficient algorithms for NP-complete graph problems using modular decomposition, and that most of them consider a single problem, namely Maximum (Weight) Stable Set. For work dealing with other problems we refer to [4,5,18]. Concerning the limits of modular decomposition, it is known, for example, that Achromatic Number, List Coloring, and λ2,1-Coloring with pre-assigned colors remain NP-complete on cographs [2,3,19]. This implies that these three problems are NP-complete on (P5,gem)-free graphs.1 There is also a strong relation between modular decomposition and the clique-width of graphs. For example, if all prime graphs of a graph class have bounded size, then this class has bounded clique-width. Problems definable in a certain logic, the so-called LinEMSOL(τ1,L)-definable problems, such as Maximum (Weight) Stable Set, Maximum (Weight) Clique and Minimum (Weight) Dominating Set, can be solved in linear time on any graph class of bounded clique-width, assuming a k-expression describing the graph is part of the input [12]. Many other NP-complete problems which are not LinEMSOL(τ1,L)-definable can be solved in polynomial time on graph classes of bounded clique-width [15,20]. Brandstädt et al. have shown that (P5,gem)-free graphs have clique-width at most five [7]. However, this does not yet imply linear time algorithms for LinEMSOL(τ1,L)-definable problems on (P5,gem)-free graphs, since their approach does not provide a linear time algorithm to compute a suitable k-expression. We present a linear time algorithm for the NP-complete Minimum Coloring problem on (P5,gem)-free graphs using modular decomposition in Section 5. The NP-complete problems Maximum Weight Stable Set, Maximum Weight Clique and Minimum Clique Cover can also be solved by linear time algorithms using modular decomposition on (P5,gem)-free graphs. Due to space constraints, these algorithms are not shown in this extended abstract.
1 A proof similar to the one in [3] shows that λ2,1-Coloring is NP-complete for graphs with at most one prime induced subgraph, the P4, and hence for (P5,gem)-free graphs.

2 Preliminaries
We assume the reader to be familiar with standard graph-theoretic notation. In this paper, G = (V, E) is a finite undirected graph with |V| = n and |E| = m. N(v) := {u : u ∈ V, u ≠ v, uv ∈ E} denotes the open neighborhood of v, and N[v] := N(v) ∪ {v} the closed neighborhood of v. The complement graph of G is denoted G̅ = (V, E̅). For U ⊆ V, let G[U] denote the subgraph of G induced by U. A graph is co-connected if its complement G̅ is connected. If, for U ⊂ V, a vertex not in U is adjacent to exactly k vertices of U, then it is called a k-vertex for U. A function f : V → N is a (proper) coloring of the graph G = (V, E) if {u, v} ∈ E implies f(u) ≠ f(v). The chromatic number of G, denoted χ(G), is the smallest k such that the graph G has a k-coloring f : V → {1, 2, . . . , k}. Let G = (V, E) be a graph with vertex weight function w : V → N. The weight of a vertex set U ⊆ V is defined to be w(U) := Σ_{u∈U} w(u). We let αw(G) denote the maximum weight of a stable set of G and ωw(G) the maximum weight of a clique of G. A weighted k-coloring of (G, w) assigns to each vertex v of G w(v) different colors, i.e. integers from {1, 2, . . . , k}, such that {x, y} ∈ E implies that no color assigned to x is equal to a color assigned to y.
Note that each weighted k-coloring of (G, w) corresponds to a multiset S1, S2, . . . , Sk of stable sets of G, where Si, i ∈ {1, 2, . . . , k}, is the set of all vertices of G to which color i is assigned.

3 Modular Decomposition

Modular decomposition is a fundamental decomposition technique that can be applied to graphs, partially ordered sets, hypergraphs and other structures. It has been described and used under different names, and it has been rediscovered various times. Gallai introduced and studied modular decomposition in his seminal 1967 paper [17], where it is used to decompose comparability graphs. A vertex set M ⊆ V is a module in G if every vertex x ∈ V \ M is either adjacent to all vertices in M or non-adjacent to all vertices in M. The trivial modules of G are ∅, V and the singletons. A homogeneous set in G is a nontrivial module in G. A graph containing no homogeneous set is called prime. Note that the smallest prime graph is the P4. A homogeneous set M is maximal if no other homogeneous set properly contains M. Modular decomposition of graphs is based on the following decomposition theorem.

Theorem 1 ([17]). Let G = (V, E) be a graph with at least two vertices. Then exactly one of the following conditions holds:
(i) G is not connected: it can be decomposed into its connected components;
(ii) G̅ is not connected: G can be decomposed into the connected components of G̅;
(iii) G is connected and co-connected. There is a set U ⊆ V and a unique partition P of V such that (a) |U| > 3, (b) G[U] is a maximal prime induced subgraph of G, and (c) for every class S of the partition P, S is a module of G and |S ∩ U| = 1.

Consequently there are three decomposition operations.
0-Operation: If G is disconnected, decompose it into its connected components G1, G2, . . . , Gr.
1-Operation: If G̅ is disconnected, decompose G into G1, G2, . . . , Gs, where G1, G2, . . . , Gs are the subgraphs of G induced by the connected components of G̅.
2-Operation: If G = (V, E) is connected and co-connected, then its maximal homogeneous sets are pairwise disjoint and they form the partition P of V. The graph G[U] is obtained from G by contracting every maximal homogeneous set of G to a single vertex; it is called the characteristic graph of G and denoted by G*. (Note that the characteristic graph of a connected and co-connected graph G is prime.)

The decomposition theorem and the above-mentioned operations lead to the uniquely determined modular decomposition tree T of G. The leaves of the modular decomposition tree are the vertices of G. The interior nodes of T are labeled 0, 1 or 2 according to the operation corresponding to the node. Thus we call them 0-nodes (parallel nodes), 1-nodes (series nodes) and 2-nodes (prime nodes). Any interior node x of T corresponds to the subgraph of G induced by the set of all leaves in the subtree of T rooted at x, denoted by G(x).
0-node. The children of a 0-node x correspond to the components obtained by a 0-operation applied to the disconnected graph G(x).
1-node. The children of a 1-node x correspond to the components obtained by a 1-operation applied to the not co-connected graph G(x).
2-node. The children of a 2-node x correspond to the subgraphs induced by the maximal homogeneous sets or single vertices of the connected and co-connected graph G(x). Additionally, the characteristic graph of G(x) is assigned to the 2-node x.
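A minimal sketch of how such a labeled tree might be represented and traversed bottom-up (the class and all names are illustrative assumptions, not from the paper; the 2-node handler anticipates the algorithmic scheme described next):

```python
# Minimal sketch (illustrative) of a modular decomposition tree and a
# generic bottom-up evaluation over it.

class MDNode:
    def __init__(self, kind, children=(), vertex=None, char_graph=None):
        self.kind = kind                # 'leaf', 0 (parallel), 1 (series), 2 (prime)
        self.children = list(children)
        self.vertex = vertex            # set for leaves only
        self.char_graph = char_graph    # characteristic graph G*, 2-nodes only

def evaluate(node, leaf_value, combine0, combine1, solve_prime):
    """Compute a value for G(x) at every node x, bottom up."""
    if node.kind == 'leaf':
        return leaf_value(node.vertex)
    vals = [evaluate(c, leaf_value, combine0, combine1, solve_prime)
            for c in node.children]
    if node.kind == 0:                  # G(x) disconnected
        return combine0(vals)
    if node.kind == 1:                  # complement of G(x) disconnected
        return combine1(vals)
    return solve_prime(node.char_graph, vals)   # the 2-node subproblem
```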
The modular decomposition tree is of basic importance for many algorithmic applications, and linear time algorithms for determining the modular decomposition tree of an input graph are given in [22,13,14]. Often, algorithms exploiting the modular decomposition have the following structure. Let Π be a graph problem to be solved on some graph class G, e.g. Maximum Stable Set on (P5,gem)-free graphs. First the algorithm computes the modular decomposition tree T of the input graph G using one of the linear time algorithms. Then, in a bottom-up fashion, the algorithm computes for each node x of T the optimal value for the subgraph G(x) of G induced by the set of all leaves of the subtree of T rooted at x. Thus the computation starts by assigning the optimal value to the leaves. Then the algorithm computes the optimal value of an interior node x from the optimal values of all children of x, depending on the type of the node. Finally, the optimal value of the root is the optimal value of Π for the input graph G. (Note that various more complicated variants of this method can be useful.) Thus, to specify such a modular decomposition based algorithm, we only have to describe how to obtain the value for the leaves, and which formula to evaluate or which subproblem to solve on 0-nodes, 1-nodes and 2-nodes, using the values of all children as input. For the NP-complete graph problems Maximum Weight Stable Set, Maximum Weight Clique, Minimum Coloring and Minimum Clique Cover, it is well known from the corresponding cograph algorithms how to do this for 0-nodes and 1-nodes [10,11]. On the other hand, identifying the algorithmic problem to solve on 2-nodes, called the 2-node subproblem, when solving a problem Π using modular decomposition can be quite challenging.

4 The Structure Theorem for (P5,Gem)-Free Graphs

To state the Structure Theorem of (P5,gem)-free graphs we need to define three classes of (P5,gem)-free graphs which together contain all prime (P5,gem)-free graphs.

Definition 1. A graph G = (V, E) is called matched cobipartite if its vertex set V is partitionable into two cliques C1, C2 with |C1| = |C2| or |C1| = |C2| − 1 such that the edges between C1 and C2 form a matching and at most one vertex in C1 and at most one vertex in C2 are not covered by the matching.

Definition 2. A graph G is called specific if it is the complement of a prime induced subgraph of one of the three graphs in Figure 1.

Fig. 1. (figure not reproduced)

To establish a definition of the third class of prime graphs, we need some more notions. A graph is chordal if it contains no induced cycle Ck, k ≥ 4. See e.g. [8] for properties of chordal graphs. A graph is cochordal if its complement graph is chordal. A vertex v is simplicial in G if its neighborhood N(v) in G is a clique. A vertex v is cosimplicial in G if it is simplicial in G̅. It is well known that every chordal graph has a simplicial vertex and that such a vertex can be found in linear time. We also need the following way of substituting a C5 for a vertex: for a graph G and a vertex v of G, let the result of the extension operation ext(G, v) be the graph G′ obtained from G by replacing v with a C5 (v1, v2, v3, v4, v5) of new vertices such that v2, v4 and v5 have the same neighborhood in G′ as v had in G, and v1, v3 have only their C5 neighbors, i.e. have degree 2 in G′.
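A minimal sketch of the extension operation under an assumed adjacency-dict representation (the naming scheme for the five new vertices is an illustrative assumption):

```python
# Minimal sketch of ext(G, v): replace v by a C5 (v1, ..., v5) in which
# v2, v4, v5 inherit v's old neighborhood and v1, v3 keep only C5 edges.

def ext(adj, v):
    adj = {u: set(nbrs) for u, nbrs in adj.items()}   # work on a copy
    outside = adj.pop(v)                              # N(v) in the old graph
    for u in outside:
        adj[u].discard(v)
    c5 = [f"{v}_{i}" for i in range(1, 6)]            # hypothetical vertex names
    for i, x in enumerate(c5):
        adj[x] = {c5[(i - 1) % 5], c5[(i + 1) % 5]}   # the C5 edges
    for x in (c5[1], c5[3], c5[4]):                   # v2, v4, v5 inherit N(v)
        adj[x] |= outside
        for u in outside:
            adj[u].add(x)
    return adj
```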
For a vertex set U ⊆ V of G, let ext(G, U) denote the result of repeatedly applying the extension operation to all vertices of U. Note that the resulting graph does not depend on the order in which the vertices of U are replaced.

Definition 3. For k ≥ 0, let Ck be the class of prime graphs G′ = ext(G, Q) resulting from a (not necessarily prime) cochordal gem-free graph G by extending a clique Q of exactly k cosimplicial vertices of G.

Thus, C0 is the class of prime cochordal gem-free graphs. Clearly each graph in Ck contains k vertex-disjoint C5's. It is also known that no graph in Ck has a C4 or a C6 as an induced subgraph [6].

Lemma 1. Let G = (V, E) be a graph of Ck, k ≥ 1. Then for every C5 C = (v1, v2, v3, v4, v5) of G, the vertex set V has a partition into {v1, v2, v3, v4, v5}, the stable set A of 0-vertices for C, and the set B of 3-vertices for C, such that all vertices of B have the same non-consecutive neighbors in C, say v2, v4, v5, and G[B] is a cograph.

Theorem 2 (Structure Theorem [6]). A connected and co-connected graph G is (P5,gem)-free if and only if the following conditions hold:
(1) The homogeneous sets of G are P4-free (i.e., induce a cograph);
(2) For the characteristic graph G* of G, one of the following conditions holds:
(2.1) G* is a matched cobipartite graph;
(2.2) G* is a specific graph;
(2.3) there is a k ≥ 0 such that G* is in Ck.

Consequently, the modular decomposition tree T of any connected (P5,gem)-free graph G contains at most one 2-node. If G is a cograph then T has no 2-node. If G is not a cograph then the only 2-node of T is its root.

5 An Algorithm for Minimum Coloring on (P5,Gem)-Free Graphs

In this section we present a linear time algorithm for the Minimum Coloring problem on (P5,gem)-free graphs. That is, we are given a (P5,gem)-free graph G and want to determine χ(G). Minimum Coloring is not LinEMSOL(τ1,L)-definable. Nevertheless there is a polynomial time algorithm for graphs of bounded clique-width [20]. However, this algorithm is only of theoretical interest: for graphs of clique-width at most five (which is the best known upper bound for the clique-width of (P5,gem)-free graphs [7]), the exponent r of its O(n^r) running time is larger than 2000.

5.1 The Subproblems

We use the approach discussed in Section 3. Thus, we start by computing (in linear time) the modular decomposition tree T of G. For each node x of T, we compute χ(G(x)). Suppose x1, x2, . . . , xr are the children of x. For leaves, 0-nodes, and 1-nodes x, the steps of the linear time algorithm for Minimum Coloring on cographs can be used: if x is a leaf of T, then χ(G(x)) := 1; if x is a 0-node, then χ(G(x)) := max_{i=1,...,r} χ(G(xi)); if x is a 1-node, then χ(G(x)) := Σ_{i=1}^{r} χ(G(xi)). Suppose x is a 2-node of T. Let G* = (V*, E*) be the characteristic graph assigned to x. We assign to the vertex set V* of G* the weight function w* : V* → N such that w*(vi) := χ(G(xi)). We have χ(G(x)) := χw*(G*). Thus, the Minimum Coloring problem on (P5,gem)-free graphs becomes the problem of computing the minimum number of colors for a weighted coloring of (G*, w*), where G* is a prime (P5,gem)-free graph. The remainder of this section is devoted to this problem. The Structure Theorem tells us that G* is either a matched cobipartite graph, a specific graph, or a member of Ck for some k ≥ 0. Each of these cases is dealt with in one of the following three subsections.
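Before the case analysis, a minimal sketch of the recurrences just listed, reusing the illustrative MDNode representation from Section 3 (the 2-node subproblem solver is assumed to be supplied by the three cases below):

```python
# Minimal sketch of the chi(G(x)) recurrences for Minimum Coloring.

def chromatic_number(node, weighted_coloring_prime):
    if node.kind == 'leaf':
        return 1                            # a single vertex needs one color
    vals = [chromatic_number(c, weighted_coloring_prime) for c in node.children]
    if node.kind == 0:                      # 0-node: disjoint union
        return max(vals)
    if node.kind == 1:                      # 1-node: join
        return sum(vals)
    # 2-node: weight vertex v_i of G* by chi(G(x_i)), then solve the weighted
    # coloring problem chi_{w*}(G*) on the characteristic graph. Assumes the
    # vertices of char_graph iterate in the order of the children x_1,...,x_r.
    weights = dict(zip(node.char_graph, vals))
    return weighted_coloring_prime(node.char_graph, weights)
```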
We also use the following notation and lemma. Let N = Σ_{v∈V*} w*(v) be the total weight. Observe that N is at most the number of vertices of the original (P5,gem)-free graph.

Lemma 2. Let G be a perfect graph and w a vertex weight function of G. Then χw(G) = ωw(G) and κw(G) = αw(G).

Proof. Let G′ be the graph obtained from G by substituting each vertex v of G by a clique of cardinality w(v). As any weighted coloring of (G, w) corresponds to a coloring of G′ and vice versa, we have χw(G) = χ(G′). Similarly, ωw(G) = ω(G′). Let G be perfect. Then G̅ is perfect by Lovász's Perfect Graph Theorem [21]. G′ is obtained from the perfect graph G by vertex multiplication, and thus it is perfect [21]. As G̅′ is the complement of the perfect graph G′, it is perfect as well. Since G′ is perfect we have χ(G′) = ω(G′) and thus χw(G) = ωw(G). Similarly, since G̅′ is perfect we obtain χw(G̅) = ωw(G̅). Hence κw(G) = αw(G). □

We now discuss how to solve the weighted coloring problem for each of the three classes of prime (P5,gem)-free graphs.

5.2 Matched Cobipartite Graphs

The graph G* is cobipartite and thus perfect. By Lemma 2 we obtain χw*(G*) = ωw*(G*). One easily finds in linear time a partition of the vertex set of G* into two cliques C1 and C2. Now, as each maximal clique of G* is either C1, C2, or an edge of G*, ωw*(G*) = χw*(G*) can be computed by a linear time algorithm.

5.3 Specific Graphs

Each specific graph G* is a prime induced subgraph of the complement of one of the three graphs in Figure 1. To solve the weighted coloring problem on specific graphs, we formulate it as an integer linear programming problem, and then argue that this ILP can be solved in constant time. Consider the specific graph G* with weights w*. Let S be the collection of all maximal stable sets of G*. We build an integer linear program with a variable xS for each S ∈ S, as follows:

minimize Σ_{S∈S} xS (1)
subject to Σ_{S∈S : v∈S} xS ≥ w(v) for all v ∈ V (2)
xS ≥ 0 for all S ∈ S (3)
xS integer for all S ∈ S (4)

With x we denote a vector containing a value xS for each S ∈ S. Let z be the optimal value of this ILP; z equals the minimum number of colors needed for (G*, w*). If we have a coloring of (G*, w*) with a minimum number of colors, then assign to each color one maximal stable set S ∈ S such that this color is given to (a subset of) all vertices in S. Let xS be the number of colors assigned to S. Clearly, xS is a non-negative integer. For each v ∈ V, as v has w(v) colors, we have Σ_{S∈S : v∈S} xS ≥ w(v), and Σ_{S∈S} xS equals the number of colors. Conversely, suppose we have an optimal solution xS of the ILP. For each S ∈ S, we can take a set of xS unique colors and use these colors to color the vertices in S. As S is stable, this gives a proper coloring, and as Σ_{S∈S : v∈S} xS ≥ w(v), each vertex has sufficiently many colors available. So this gives a coloring of (G*, w*) with z colors. The relaxation of the ILP is the linear program obtained by dropping the integrality condition (4):

minimize Σ_{S∈S} xS (5)
subject to Σ_{S∈S : v∈S} xS ≥ w(v) for all v ∈ V (6)
xS ≥ 0 for all S ∈ S (7)

Let x′ be an optimal solution of this relaxation, with value z′ = Σ_{S∈S} x′S. As G* is a specific graph, the linear program has a constant number of variables (namely, the number of maximal stable sets of G*) and a constant number of constraints (at most nine, one per vertex of G*), and hence can be solved in constant time. (E.g., enumerate all corners of the polyhedron defined by the program and take the optimal one.)
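A minimal sketch of the construction (not the authors' implementation): maximal stable sets of G* are enumerated as maximal cliques of the complement via Bron-Kerbosch, and the constraint data of (1)-(4) is assembled for any off-the-shelf ILP solver.

```python
# Minimal sketch: build the covering ILP (1)-(4) for a small graph.

def maximal_stable_sets(adj):
    """Maximal stable sets of G = maximal cliques of the complement of G."""
    verts = set(adj)
    comp = {v: verts - adj[v] - {v} for v in adj}     # complement adjacency
    found = []
    def bron_kerbosch(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & comp[v], x & comp[v])
            p.discard(v)
            x.add(v)
    bron_kerbosch(set(), set(verts), set())
    return found

def build_ilp(adj, w):
    """One covering constraint per vertex; one variable x_S per stable set."""
    sets = maximal_stable_sets(adj)
    rows = {v: ([1 if v in s else 0 for s in sets], w[v]) for v in adj}
    return sets, rows   # minimize sum of x_S subject to row . x >= w(v)
```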
Note that we can write the linear program in the form max{cx | Ax ≤ b} such that each element of A is either 0 or 1. Let ∆ be the maximum value of a subdeterminant of this matrix A. It follows that ∆ is bounded by a constant. Write s = |S|. Now we can use a result of Cook, Gerards, Schrijver, and Tardos, see Theorem 17.2 in [23]. This theorem tells us that the ILP has an optimal solution x″ such that for each S ∈ S, |x′S − x″S| ≤ s∆. Thus, the following is an algorithm that finds the optimal solution of the ILP (and hence the number of colors needed for (G*, w*)) in constant time. First, find an optimal solution x′ of the relaxation. Then, enumerate all integer vectors x″ with |x′S − x″S| ≤ s∆ for all S ∈ S. For each such x″, check whether it fulfils condition (2), and select the solution vector that fulfils the conditions with the minimum value. By Theorem 17.2 in [23], this is an optimal solution of the ILP. This method takes constant time, as s and ∆ are bounded by constants, and thus 'only' a constant number of vectors have to be checked, each of constant size. (Computer computation shows that ∆ ≤ 3 for specific graphs.) A straightforward implementation of this procedure would not be practical, as more than (s∆)^s vectors are checked, with s the number of maximal stable sets in one of the specific graphs. In a practical setting, one could first solve the linear program, and use its value as a starting point in a branch and bound procedure.

Remark 1. The method works not only for the specific graphs, but for any constant size graph. This implies that Minimum Coloring can be solved in linear time for graphs whose modular decomposition has a constant upper bound on the size of the characteristic graphs.

Remark 2. In the full version we present an O(N^3) time algorithm for the weighted coloring of the specific graphs that has no large hidden constant in its running time.

5.4 ⋃_{k≥0} Ck

Let G* ∈ Ck for some k ≥ 0, and let w* be the weight function of G*. All C5's of G* can be computed by a linear time algorithm that first computes all vertices of degree two. If G* = C5, then χw*(G*) can be computed in constant time with the technique applied to specific graphs. If G* ∈ C0, then it is cochordal and thus perfect. Hence χw*(G*) = ωw*(G*) by Lemma 2.

Lemma 3. The Maximum Weight Clique problem and the weighted coloring problem can be solved by linear time algorithms on cochordal graphs.

Proof. Frank [16] gave a linear time algorithm to compute the maximum weight of a stable set of a chordal graph. This immediately yields an O(n^2) algorithm to compute the maximum weight of a clique in a cochordal graph G, since ωw(G) = αw(G̅). To get a linear time algorithm, we must avoid the complementation; thus, we simulate Frank's algorithm applied to G̅. This is Frank's algorithm: first it computes a perfect elimination ordering v1, . . . , vn of the input chordal graph G = (V, E). Then a maximum weight stable set is constructed as follows. Initially, let cw(vi) = w(vi) for all 1 ≤ i ≤ n. For each i from 1 to n, if cw(vi) > 0 then colour vi red, and subtract cw(vi) from cw(vj) for all vj ∈ {vi} ∪ (N(vi) ∩ {vi+1, . . . , vn}). After all vertices have been processed, set I = ∅ and, for each i from n down to 1, if vi is red and not adjacent to any vertex of I, then I = I ∪ {vi}. When all vertices have been processed again, the algorithm terminates and outputs the maximum weight stable set I of (G, w).
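A minimal sketch of Frank's algorithm as just recounted, assuming a perfect elimination ordering of the chordal graph is already given (computing the ordering itself is a separate linear time step; the representation is an assumption):

```python
# Minimal sketch of Frank's maximum weight stable set algorithm on a
# chordal graph, given a perfect elimination ordering `peo` = v1, ..., vn.

def frank_mwss(adj, w, peo):
    pos = {v: i for i, v in enumerate(peo)}
    cw = dict(w)                              # the current weights c_w
    red = set()
    for v in peo:                             # first pass: mark red vertices
        if cw[v] > 0:
            red.add(v)
            delta = cw[v]
            for u in {v} | {u for u in adj[v] if pos[u] > pos[v]}:
                cw[u] -= delta                # subtract from v and higher nbrs
    stable = set()
    for v in reversed(peo):                   # second pass: build I greedily
        if v in red and not (adj[v] & stable):
            stable.add(v)
    return stable
```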
We now describe our simulation of this algorithm. First a perfect elimination ordering v1, v2, . . . , vn of G̅ is computed in linear time (see e.g. [22]). A maximum weight clique of G is constructed as follows. Initially, let W′ = 0 and s(vi) = 0 for all i (1 ≤ i ≤ n). For each i from 1 to n, if w(vi) − W′ + s(vi) > 0, then colour vi red, add w(vi) − W′ + s(vi) to s(vj) for all vj ∈ N(vi) ∩ {vi+1, . . . , vn}, and then set W′ = w(vi) + s(vi). After all vertices have been processed, set K = ∅ and, for each i from n down to 1, if vi is red and adjacent to all vertices of K, then K = K ∪ {vi}. Finally the algorithm outputs the maximum weight clique K of (G, w). Clearly our algorithm runs in linear time. Its correctness follows from the fact that, when treating the vertex vi, the difference W′ − s(vi) is precisely the value that the original algorithm of Frank, applied to the complement of G, would have subtracted from cw(vi) up to the point when it treats vi. Thus our algorithm simulates Frank's algorithm on G̅, and hence it is correct. □

In the remaining case, we consider a prime graph G* ∈ Ck, k ≥ 1, such that G* ≠ C5.

Lemma 4. Let k ≥ 1, G* ∈ Ck and G* ≠ C5. Let C = (v1, v2, v3, v4, v5) be a C5 in G* and v1 and v3 its vertices of degree two. Let w* be the vertex weight function of G*. Then there is a minimum weight coloring S* of (G*, w*) with precisely max(w*(v2), w*(v4) + w*(v5)) stable sets containing at least one of the vertices of {v2, v4, v5}.

Proof. By Lemma 1, the set A of 0-vertices for C = (v1, v2, v3, v4, v5) is a stable set, B = V* \ (C ∪ A) = N(v2) \ C = N(v4) \ C = N(v5) \ C, and G*[B] is a cograph. Let S be any minimum weight coloring of (G*, w*). Since N(v1) \ C = N(v3) \ C = ∅ and N(v2) \ C = N(v4) \ C = N(v5) \ C = B, we may assume that every stable set of S contains either none or two vertices of C. Therefore we study weighted colorings of a C5 C = (v1, v2, v3, v4, v5) of G* with vertex weights w* in which all stable sets are non-edges of C, and call them partial weight colorings (abbr. pwc) of C. Clearly any pwc of C = (v1, v2, v3, v4, v5) contains at least w*(v2) stable sets containing v2, and it contains at least w*(v4) + w*(v5) stable sets containing v4 or v5. Let S′ be a weighted coloring of G* containing the smallest possible number of stable sets S with S ∩ {v2, v4, v5} ≠ ∅. Let t be the number of stable sets S of S′ satisfying S ∩ {v2, v4, v5} ≠ ∅, and suppose that, contrary to the statement of the lemma, t > max(w*(v2), w*(v4) + w*(v5)). Let s(v) be the number of stable sets of S′ containing the vertex v. Then t > w*(v4) + w*(v5) implies s(v4) > w*(v4) or s(v5) > w*(v5). W.l.o.g. we may assume s(v4) > w*(v4). Hence there is a stable set S′ ∈ S′ containing v4. Consequently either S′ ⊆ {v2, v4} ∪ A or S′ ⊆ {v1, v4} ∪ A. In both cases we replace the stable set S′ of S′ by {v1, v3} ∪ A. The replacement decrements the number of stable sets containing v4 and possibly the number of stable sets containing v2. Thus we obtain a new weighted coloring S″ of G* with t − 1 stable sets S satisfying S ∩ {v2, v4, v5} ≠ ∅. This contradicts the choice of t. Consequently t = max(w*(v2), w*(v4) + w*(v5)). □
To extend any pwc of a C5 C to G*, only two parameters are important: the number a of stable sets {v1, v3} in the pwc of C, and the number b of non-edges in the pwc of C different from {v1, v3}. Each of the a stable sets {v1, v3} in the pwc of C can be extended to a maximal stable set {v1, v3} ∪ A′ of G*, where A′ is some maximal stable set of G* − C. Each of the b non-edges S, S ≠ {v1, v3}, in the pwc of C has the unique extension to the maximal stable set S ∪ A of G*. By Lemma 4, for each C5 of G* there is a minimum weight coloring of G* extending a pwc of the C5 C with b = max(w*(v2), w*(v4) + w*(v5)). Taking such a minimum weight coloring, we can clearly remove vertices v1 and v3 from stable sets containing both until we obtain the smallest possible value of a in a pwc of C with b = max(w*(v2), w*(v4) + w*(v5)). Finally, given a C5 C, the smallest possible value of a in a pwc of C with b = max(w*(v2), w*(v4) + w*(v5)) can be computed in constant time. (Details omitted.)

Now we are ready to present our coloring algorithm, which computes a minimum weight coloring of (G*, w*) for a graph G* of Ck, k ≥ 1. It removes a precomputed C5 from the current graph at most k times, until the remaining graph has no C5 and is therefore cochordal. Then, by Lemma 3, the weighted coloring problem for the cochordal graph can be solved in linear time. In each round, i.e. when removing one C5 C = (v1, v2, v3, v4, v5) from the current graph G′ with current weight function w′, the algorithm proceeds as follows. First it computes in constant time a pwc of C such that b = max(w′(v2), w′(v4) + w′(v5)) and a is as small as possible. Then the algorithm removes all vertices of C and obtains the graph G″ = G′ − C. Furthermore, it removes all vertices of the stable set A of 0-vertices for C in G′ with weight at most a, and decrements the weight of all other vertices in A by a. Recursively the algorithm solves the minimum weight coloring problem on the obtained graph G″ with weight function w″. Finally, the minimum number of stable sets in a weighted coloring of (G′, w′) is obtained using the formula χw′(G′) = a + max(b, χw″(G″)). Thus the algorithm removes a C5 at most k ≤ n times. Each pwc of a C5 can be computed in constant time. For the final cochordal graph, the minimum weight coloring can be computed in linear time. Hence the overall running time of the algorithm is linear. We have thus given a linear time algorithm for the weighted coloring problem on ⋃_{k≥0} Ck. We can finally conclude:

Theorem 3. There is a linear time algorithm to solve the Minimum Coloring problem on (P5,gem)-free graphs.

6 Conclusion

We have shown how modular decomposition and the Structure Theorem for (P5,gem)-free graphs can be used to obtain a linear time algorithm for the Minimum Coloring problem. In a quite similar way one can construct a linear time algorithm to solve the Minimum Clique Cover problem on (P5,gem)-free graphs. Modular decomposition can also be used to obtain linear time algorithms for the LinEMSOL(τ1,L)-definable NP-complete graph problems Maximum Weight Stable Set and Maximum Weight Clique on (P5,gem)-free graphs. These algorithms are given in the full version of this paper.

Acknowledgement. Thanks are due to Alexander Schrijver for pointing us towards Theorem 17.2 from his book [23].

References

1. S. Arnborg, J. Lagergren, D. Seese, Easy problems for tree-decomposable graphs, J. Algorithms 12 (1991) 308–340
2. H. Bodlaender, Achromatic number is NP-complete for cographs and interval graphs, Inform. Process. Lett. 31 (1989) 135–138
3. H.L. Bodlaender, H.J. Broersma, F.V. Fomin, A.V. Pyatkin, G.J. Woeginger, Radio labeling with pre-assigned frequencies, Proceedings of the 10th European Symposium on Algorithms (ESA 2002), LNCS 2461 (2002) 211–222
4. H.L. Bodlaender, K. Jansen, On the complexity of the maximum cut problem, Nord. J. Comput. 7 (2000) 14–31
5. H.L. Bodlaender, U. Rotics, Computing the treewidth and the minimum fill-in with the modular decomposition, Proceedings of the 8th Scandinavian Workshop on Algorithm Theory (SWAT 2002), LNCS 1851 (2002) 388–397
6. A. Brandstädt, D. Kratsch, On the structure of (P5,gem)-free graphs, Manuscript 2002
7. A. Brandstädt, H.-O. Le, R. Mosca, Chordal co-gem-free graphs have bounded clique-width, Manuscript 2002
8. A. Brandstädt, V.B. Le, J. Spinrad, Graph Classes: A Survey, SIAM Monographs on Discrete Math. Appl., Vol. 3, SIAM, Philadelphia (1999)
9. M. Chudnovsky, N. Robertson, P.D. Seymour, R. Thomas, The Strong Perfect Graph Theorem, Manuscript 2002
10. D.G. Corneil, H. Lerchs, L. Stewart-Burlingham, Complement reducible graphs, Discrete Applied Math. 3 (1981) 163–174
11. D.G. Corneil, Y. Perl, L.K. Stewart, Cographs: recognition, applications, and algorithms, Congressus Numer. 43 (1984) 249–258
12. B. Courcelle, J.A. Makowsky, U. Rotics, Linear time solvable optimization problems on graphs of bounded clique-width, Theory of Computing Systems 33 (2000) 125–150
13. A. Cournier, M. Habib, A new linear algorithm for modular decomposition, Trees in Algebra and Programming - CAAP '94, LNCS 787 (1994) 68–84
14. E. Dahlhaus, J. Gustedt, R.M. McConnell, Efficient and practical algorithms for sequential modular decomposition, J. Algorithms 41 (2001) 360–387
15. W. Espelage, F. Gurski, E. Wanke, How to solve NP-hard graph problems on clique-width bounded graphs in polynomial time, Proceedings of the 27th Workshop on Graph-Theoretic Concepts in Computer Science (WG 2001), LNCS 2204 (2001) 117–128
16. A. Frank, Some polynomial algorithms for certain graphs and hypergraphs, Proceedings of the Fifth British Combinatorial Conference (Univ. Aberdeen, Aberdeen, 1975) 211–226, Congressus Numerantium No. XV, Utilitas Math., Winnipeg, Man. (1976)
17. T. Gallai, Transitiv orientierbare Graphen, Acta Mathematica Academiae Scientiarum Hungaricae 18 (1967) 25–66
18. V. Giakoumakis, I. Rusu, Weighted parameters in (P5, P̅5)-free graphs, Discrete Appl. Math. 80 (1997) 255–261
19. K. Jansen, P. Scheffler, Generalized coloring for tree-like graphs, Discrete Appl. Math. 75 (1997) 135–155
20. D. Kobler, U. Rotics, Edge dominating set and colorings on graphs with fixed clique-width, Discrete Appl. Math. 126 (2003) 197–221
21. L. Lovász, Normal hypergraphs and the perfect graph conjecture, Discrete Math. 2 (1972) 253–267
22. R.M. McConnell, J. Spinrad, Modular decomposition and transitive orientation, Discrete Math. 201 (1999) 189–241
23. A. Schrijver, Theory of Linear and Integer Programming, John Wiley & Sons, Chichester, 1986

Graph Searching, Elimination Trees, and a Generalization of Bandwidth

Fedor V. Fomin, Pinar Heggernes, and Jan Arne Telle
Department of Informatics, University of Bergen, N-5020 Bergen, Norway
{fomin,pinar,telle}@ii.uib.no

Abstract. The bandwidth minimization problem has a long history and a number of practical applications.
In this paper we introduce a generalization of bandwidth to partially ordered layouts. We consider this generalization from two main viewpoints: graph searching and tree decompositions. The three graph parameters pathwidth, profile and bandwidth related to linear layouts can be defined by variants of graph searching using a standard fugitive. Switching to an inert fugitive, the two former parameters generalize to treewidth and fill-in, and our first viewpoint considers the analogous tree-like generalization that arises from the bandwidth variant. Bandwidth also has a definition in terms of ordered path decompositions, and our second viewpoint generalizes this in a natural way to ordered tree decompositions. In showing that both generalizations are equivalent we employ the third viewpoint of elimination trees, as used in the field of sparse matrix computations. We call the resulting parameter the treespan of a graph and prove some of its combinatorial and algorithmic properties.

1 Motivation through Graph Searching Games

Different versions of graph searching have been attracting the attention of researchers from Discrete Mathematics and Computer Science, owing to a variety of elegant and unexpected applications in different and seemingly unrelated fields. There is a strong resemblance of graph searching to certain pebble games [15] that model sequential computation. Other applications of graph searching can be found in VLSI theory, since this game-theoretic approach to some important parameters of graph layouts, such as the cutwidth [19], the topological bandwidth [18], the bandwidth [9], the profile [10], and the vertex separation number [8], is very useful for the design of efficient algorithms. There is also a connection between graph searching, pathwidth and treewidth, parameters that play an important role in the theory of graph minors developed by Robertson & Seymour [3,7,22]. Furthermore, some search problems have applications in problems of privacy in distributed environments with mobile eavesdroppers ('bugs') [11]. In the standard node-search version of searching, at every move a single searcher is placed at a vertex of a graph G, while searchers are removed from other vertices (see e.g. [15]). The purpose of searching is to capture an invisible fugitive moving fast along paths in G. The fugitive is not allowed to run through the vertices currently occupied by searchers. So the fugitive is caught when a searcher is placed on the vertex it occupies and it has no possibility to leave the vertex, because all the neighbors are occupied (guarded) by searchers. The goal of search games is to find a search strategy that guarantees the fugitive's capture while minimizing some resource usage. Because the fugitive is invisible, the only information the searchers possess is the previous search moves, which may give knowledge about subgraphs where the fugitive cannot possibly be present. This brings us to the interesting interpretation of the search problem [3] as the problem of fighting against damage spread in complex systems, e.g. the spread of a mobile computer virus in networks. Initially all vertices are viewed as contaminated (infected by a virus or damaged), and a contaminated vertex is cleared once it is occupied by a searcher (checked by an anti-virus program).
A clear vertex v is recontaminated if there is a path without searchers leading from v to a contaminated vertex. In some applications it is required that recontamination never occur, and in this case we are interested in so-called 'monotone' searching. For most of the search game variants considered in the literature it can be shown, sometimes by very clever techniques, that the resource usage does not increase in spite of this constraint [15,16,4,7]. The 'classical' goal of the search problem is to find a search program such that the maximum number of searchers in use at any move is minimized. The minimum number of searchers needed to clear the graph is related to the parameter called pathwidth. Dendris et al. [7] studied a variation of the node-search problem with an inert, or lazy, fugitive. In this version of the game the fugitive is allowed to move only just before a searcher is placed on the vertex it occupies. The smallest number of searchers needed to find the fugitive in this version of searching is related to the parameter called treewidth [7]. Another criterion of optimality in node-searching, namely search cost, was studied in [10]. Here the goal is to minimize the sum of the number of searchers in use over all moves of the search program. The search cost of a graph is equal to the interval completion number, or profile, which is the smallest number of edges in any interval supergraph of the given graph. Looking at the monotone search cost version, but now with an inert fugitive, it is easy to see that this parameter is equal to the smallest number of edges in a chordal supergraph of a given graph, the so-called fill-in. (It is not clear whether recontamination can help in this version of searching, and this is an interesting open question.) We thus have the following elegant relation: the parameters related to standard node searching (pathwidth, profile), expressible in terms of interval completion problems, correspond in inert fugitive searching to chordal completion problems (treewidth, fill-in). In this paper we want to minimize the maximum length of time (number of intermediate moves) during which a searcher occupies a vertex. A similar problem for pebbling games (which can be transferred into search terms) was studied by Rosenberg & Sudborough [23]. In terms of monotone pebbling (i.e., no recontamination allowed) this becomes the maximum lifetime of any pebble in the game. It turned out that this parameter is related to the bandwidth of a graph G, which is the minimum over all linear layouts of vertices in G of the maximum distance between images of adjacent vertices. The following table summarizes the known relations between monotone graph searching and graph parameters.

                  Number of Searchers   Cost of Searching   Occupation Time
Standard Search   pathwidth [15]        profile [10]        bandwidth [23]
Inert Search      treewidth [7]         fill-in             ???

One of the main questions answered in this paper concerns the entry labeled ??? above: what kind of graph parameter corresponds to the minimum occupation time (mot) for monotone inert fugitive search? In section 2 we introduce a generalization of bandwidth to tree-like layouts, called treespan, based on what we call ordered tree decompositions. In section 3 we give the formal definition of the parameter mot(G), and then in section 4 we show that it is equivalent to a parameter arising from elimination trees, as used in the sparse matrix computation community.
In section 5 we show the equivalence also between this elimination tree parameter and treespan, thereby providing evidence that the entry labeled ??? above indeed corresponds to a natural generalization of bandwidth to partially ordered (tree) layouts. Finally, in section 6 we obtain some algorithmic and complexity results on the treespan parameter.

2 Motivation through Tree Decompositions

We assume simple, undirected, connected graphs G = (V, E), where |V| = n. We let N(v) denote the neighbors of vertex v, and d(v) = |N(v)| is the degree of v. The maximum degree of any vertex in G is denoted by ∆(G). For a set of vertices U ⊆ V, N(U) = {v ∉ U | uv ∈ E and u ∈ U}. H ⊆ G means that H is a subgraph of G. For a rooted tree T and a vertex v in T, we let T[v] denote the subtree of T with root in v. A chord of a cycle C in a graph is an edge that connects two non-consecutive vertices of C. A graph G is chordal if there are no induced chordless cycles of length ≥ 4 in G. Given any graph G = (V, E), a triangulation G⁺ = (V, E⁺) of G is a chordal graph such that E ⊆ E⁺. A tree decomposition of a graph G = (V, E) is a pair (X, T), where T = (I, M) is a tree and X = {Xi | i ∈ I} is a collection of subsets of V called bags, such that:
1. ⋃_{i∈I} Xi = V;
2. uv ∈ E ⇒ ∃i ∈ I with u, v ∈ Xi;
3. for all vertices v ∈ V, the set {i ∈ I | v ∈ Xi} induces a connected subtree of T.

The width of a tree decomposition (X, T) is tw(X, T) = max_{i∈I} |Xi| − 1. The treewidth of a graph G is the minimum width over all tree decompositions of G. A path decomposition is a tree decomposition (X, T) such that T is a path. The pathwidth of a graph G is the minimum width over all path decompositions of G. We refer to Bodlaender's survey [5] for further information on treewidth. For a chordal graph G, the treewidth is one less than the size of the largest clique in G. For a non-chordal graph G, the treewidth is the minimum treewidth over all triangulations of G. This is due to the fact that a tree decomposition (X, T) of G actually corresponds to a triangulation of the given graph G: simply add edges to G such that each bag of X becomes a clique. The resulting graph, which we will call tri(X, T), is a chordal graph of which G is a subgraph. In addition, any triangulation G⁺ of G is equal to tri(X, T) for some tree decomposition (X, T) of G. Another reason why tree decompositions and chordal graphs are closely related is that chordal graphs are exactly the intersection graphs of subtrees of a tree [14]. Analogously, interval graphs are related to path decompositions, and they are the intersection graphs of subpaths of a path. A graph is interval if there is a mapping f of its vertices into sets of consecutive integers such that for each pair of vertices v, w the following is true: vw is an edge ⇔ f(v) ∩ f(w) ≠ ∅. Interval graphs form a subclass of chordal graphs. Similar to treewidth, the pathwidth of a graph G is one less than the smallest clique number over all triangulations of G into interval graphs. The bandwidth of G, bw(G), is defined as the minimum over all linear orders of the vertices of G of the maximum difference between labels of two adjacent vertices. Similar to pathwidth and treewidth, bandwidth can be defined in terms of triangulations as follows. A graph isomorphic to K1,3 is referred to as a claw, and a graph that does not contain an induced claw is said to be claw-free. An interval graph G is a proper interval graph if it is claw-free [21].
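Returning to the definition of a tree decomposition above, a minimal sketch (illustrative, not from the paper) that checks the three conditions and reports the width; the bags and the tree are assumed to be given as dicts over the same index set:

```python
# Minimal sketch: verify conditions 1-3 of a tree decomposition (X, T).

def is_tree_decomposition(adj, bags, tree_adj):
    # 1. every vertex of G appears in some bag
    if set().union(*bags.values()) != set(adj):
        return False
    # 2. every edge uv of G is contained in some bag
    for u in adj:
        for v in adj[u]:
            if not any(u in b and v in b for b in bags.values()):
                return False
    # 3. the bags containing any fixed vertex induce a connected subtree of T
    for v in adj:
        nodes = {i for i, b in bags.items() if v in b}
        seen, stack = set(), [next(iter(nodes))]
        while stack:                       # DFS restricted to `nodes`
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            stack.extend(tree_adj[i] & nodes)
        if seen != nodes:
            return False
    return True

def width(bags):
    return max(len(b) for b in bags.values()) - 1
```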
As observed by Parra & Scheffler [20], the bandwidth of a graph G is one less than the smallest clique number over all triangulations of G into proper interval graphs. One can also define bandwidth in terms of ordered path decompositions. In an ordered path decomposition, the bags are numbered 1, 2, ..., n from left to right. The first bag X1 contains only one vertex of G, and for 1 ≤ i ≤ n − 1 we have |Xi+1 \ Xi| = 1, meaning that exactly one new graph vertex is introduced in each new bag. The number of bags a vertex v belongs to is denoted by l(v). It is easy to show that bw(G) is the minimum, over all ordered path decompositions, of max{l(v) − 1 | v ∈ V}. The natural question here is: what kind of parameter corresponds to bandwidth when, instead of path decompositions, we switch to tree decompositions? This brings us to the definitions of ordered tree decomposition and treespan.

Definition 1. An ordered tree decomposition (X, T, r) of a graph G = (V, E) is a tree decomposition (X, T) of G where T = (I, M) is a rooted tree with root r ∈ I, such that |Xr| = 1, and if i is the parent of j in T, then |Xj \ Xi| = 1.

Definition 2. Given a graph G = (V, E) and an ordered tree decomposition (X, T, r) of G, we define l(v) = |{i ∈ I | v ∈ Xi}| (the number of bags that contain v), for each v ∈ V, and ts(X, T, r) = max{l(v) | v ∈ V} − 1. The treespan of a graph G is ts(G) = min{ts(X, T, r) | (X, T, r) is an ordered tree decomposition of G}.

Since every ordered path decomposition is an ordered tree decomposition, it is clear that ts(G) ≤ bw(G) for every graph G.

3 Search Minimizing Occupation Time with Inert Fugitive

In this section we give a formal definition of minimum occupation time for inert fugitive searching. A search program Π on a graph G = (V, E) is a sequence of pairs (A0, Z0), (A1, Z1), . . . , (Am, Zm) such that:
I. For i ∈ {0, . . . , m}, Ai ⊆ V and Zi ⊆ V. We say that the vertices of Ai are cleared, the vertices of V − Ai are contaminated, and the vertices of Zi are occupied by searchers at the ith step.
II. (Initial state.) A0 = ∅ and Z0 = ∅. All vertices are contaminated.
III. (Final state.) Am = V and Zm = ∅. All vertices are cleared.
IV. (Placing-removing searchers and clearing vertices.) For i ∈ {1, . . . , m} there exist v ∈ V and Yi ⊆ Ai−1 such that Ai − Ai−1 = {v} and Zi = Yi ∪ {v}. Thus at every step one of the searchers is placed on a contaminated vertex v while the others are placed on cleared vertices Yi. The searchers are removed from the vertices Zi−1 − Yi. Note that Yi is not necessarily a subset of Zi−1.
V. (Possible recontamination.) For i ∈ {1, . . . , m}, Ai − {v} is the set of vertices u ∈ Ai−1 such that every uv-path has an internal vertex in Zi. This means that the fugitive awakening in v can run to a cleared vertex u if there is a uv-path unguarded by searchers.

Dendris, Thilikos & Kirousis [7] initiated the study of the inert search problem, where the task is to find a search program Π with the smallest max_{i∈{0,...,m}} |Zi| (this maximum can be treated as the maximum number of searchers used in one step). It turns out that this number is equal to the treewidth of the graph. We find an alternative measure of search to be interesting as well. For a search program Π = (A0, Z0), (A1, Z1), . . . , (Am, Zm) on a graph G = (V, E) and a vertex v ∈ V we define δi(v) = 1 if v ∈ Zi and δi(v) = 0 otherwise. Then the number Σ_{i=0}^{m} δi(v) is the number of steps at which vertex v was occupied by searchers.
For a program Π we define the maximum vertex occupation time to be ot(Π, G) = max_{v∈V} Σ_{i=0}^{m} δi(v). The vertex occupation time of a graph G, denoted by ot(G), is the minimum maximum vertex occupation time over all search programs on G. A search program (A0, Z0), (A1, Z1), . . . , (Am, Zm) is monotone if Ai−1 ⊆ Ai for each i ∈ {1, . . . , m}. Note that recontamination does not occur when a searcher is placed on a contaminated vertex, thus awaking the fugitive. Finally, for a graph G we define mot(G) to be the minimum maximum vertex occupation time over all monotone search programs on G. We do not know whether mot(G) = ot(G) for every graph G, and leave this as an interesting open question.

4 Searching and Elimination Trees

In this section we discuss a relation between mot(G) and elimination trees of G. This relation is not only interesting in its own right but also serves as a tool in further proofs. For a graph G = (V, E), an elimination order α : {1, 2, ..., n} → V is a linear order of the vertices of G. For each given order α, a unique triangulation G⁺α of G can be computed by the following procedure: starting with vertex α(1), at each step i, make the higher numbered neighbors of vertex α(i) in the transitory graph into a clique by adding edges. The resulting graph, denoted G⁺α, is chordal [12], and the given elimination ordering decides the quality of this resulting triangulation. The following lemma follows from the definition of G⁺α.

Lemma 1. uv is an edge of G⁺α ⇔ uv is an edge of G or there is a path u, x1, x2, ..., xk, v in G with k ≥ 1 such that all xi are ordered before u and v by α (in other words, max{α⁻¹(xi) | 1 ≤ i ≤ k} < min{α⁻¹(u), α⁻¹(v)}).

Definition 3. For a vertex v ∈ V we define madj⁺(v) to be the set of vertices u ∈ V such that α⁻¹(u) ≥ α⁻¹(v) and uv is an edge of G⁺α. (The higher numbered neighbors of v in G⁺α.)

Given a graph G and an elimination order α on G, the corresponding elimination tree is a rooted tree ET = (V, P), where the edges in P are defined by the following parent function: parent(α(i)) = α(j), where j = min{k | α(k) ∈ madj⁺(α(i))}, for i = 1, 2, ..., n. Hence the elimination tree is a tree on the vertices of G, and vertex α(n) is always the root. The height of the elimination tree is the length of a longest path from a leaf to the root. The minimum elimination tree height of a graph G, mh(G), is the minimum height of an elimination tree corresponding to any triangulation of G. For a vertex u ∈ V we denote by ET[u] the subtree of ET rooted in u and containing all descendants (in ET) of u. It is important to note that, for two vertices u and v such that ET[u] and ET[v] are disjoint subtrees of ET, no vertex belonging to ET[u] is adjacent to any vertex belonging to ET[v] in G or G⁺α. In addition, N(ET[v]) is a clique in G⁺α, and a minimal vertex separator in both G⁺α and G when v is not the only child of its parent in ET. Let α be an elimination order of the vertices of a graph G = (V, E) and let ET be the corresponding elimination tree of G. Observe that the elimination tree ET gives enough information about the chordal completion G⁺ of G that ET corresponds to. It is important to understand that any post order α of the vertices of ET is an elimination order on G that results in the same chordal completion G⁺α = G⁺. Thus, given G and ET, we have all the information we need on the corresponding triangulation.
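A minimal sketch of the procedure just described, computing G⁺α and the elimination tree parent function under an assumed adjacency-dict representation:

```python
# Minimal sketch: the elimination game producing the triangulation G+_alpha
# and the elimination tree parent(alpha(i)) = lowest-ordered vertex of madj+.

def elimination_tree(adj, alpha):
    pos = {v: i for i, v in enumerate(alpha)}
    fill = {v: set(adj[v]) for v in adj}          # grows into G+_alpha
    parent = {}
    for v in alpha:
        higher = [u for u in fill[v] if pos[u] > pos[v]]   # madj+(v) \ {v}
        for a in higher:                          # make madj+(v) into a clique
            for b in higher:
                if a != b:
                    fill[a].add(b)
        if higher:                                # the root alpha(n) has no parent
            parent[v] = min(higher, key=pos.get)
    return fill, parent
```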
Definition 4. Given an elimination tree ET of G, the pruned subtree with root in x, ETp[x], is the subtree obtained from ET[x] by deleting all descendants of every vertex y ∈ ET[x] such that xy ∈ E(G) but no descendant of y is a neighbor of x in G.

Thus, the leaves of ETp[x] are neighbors of x in G, and all lower numbered neighbors of x in G⁺ are also included in ETp[x]. In addition, there might clearly appear vertices in ETp[x] that are not neighbors of x in G. However, every neighbor of x in G⁺ appears in ETp[x], as we prove in the following lemma.

Lemma 2. Let α be an elimination order of a graph G = (V, E) and let ET be the corresponding elimination tree. Then for any u, v ∈ V, u ∈ ETp[v] if and only if v ∈ madj⁺(u).

Proof. Let u ∈ ETp[v] and let w be a neighbor of v in G such that u is on a vw-path in ET. By the definition of the pruned tree such a vertex w always exists. Because ET is an elimination tree, there is a uw-path P⁺ in G⁺α such that for every vertex x of P⁺, α⁻¹(x) ≤ α⁻¹(u). By Lemma 1, this implies that there is also a uw-path P in G such that for every vertex x of P, α⁻¹(x) ≤ α⁻¹(u). Since w is adjacent to v in G, we conclude that v ∈ madj⁺(u). Conversely, let v ∈ madj⁺(u). Then there is a uv-path P in G (and hence in G⁺α) such that all inner vertices of the path are ordered before u in α. Let w be the vertex of P adjacent to v. Because ET is an elimination tree, we have that u is on the vw-path in ET. Thus u ∈ ETp[v].

We define a parameter called elimination span, es, as follows.

Definition 5. Given an elimination tree ET of a graph G = (V, E), for each vertex v ∈ V we define s(v) = |ETp[v]|, and es(ET) = max{s(v) | v ∈ V} − 1. The elimination span of a graph G is es(G) = min{es(ET) | ET is an elimination tree of G}.

Theorem 1. For any graph G = (V, E), es(G) = mot(G) − 1.

Proof. Let us first prove es(G) ≤ mot(G) − 1. Let Π = (A0, Z0), (A1, Z1), . . . , (Am, Zm) be a monotone search program. At every step of the program exactly one new vertex Ai − Ai−1 is cleared. Thus we can define the vertex ordering α by letting the vertex cleared at step i receive the number n − i + 1, i.e., α(n − i + 1) = Ai − Ai−1, for 1 ≤ i ≤ n. At the ith step, when a searcher is placed at a vertex u = Ai − Ai−1, every vertex v ∈ Ai such that there is a uv-path with no inner vertices in Ai must be occupied by a searcher (otherwise v would be recontaminated). Therefore, v ∈ madj⁺(u), and the number of steps at which a vertex v is occupied by searchers is |{u | v ∈ madj⁺(u)}|. By Lemma 2, |{u | v ∈ madj⁺(u)}| = s(v), and we arrive at es(ET) ≤ mot(Π, G) − 1. We now show that es(G) ≥ mot(G) − 1. Let ET be an elimination tree and let α be a corresponding elimination vertex ordering. We consider a search program Π where at the ith step of the program, 1 ≤ i ≤ n, the searchers occupy the set of vertices madj⁺(v), where v = α(n − i + 1). Let us first prove that Π is recontamination free. Suppose, on the contrary, that a vertex u is recontaminated at the ith step after placing a searcher on a vertex v. Then there is a uv-path P such that no vertex of P except v contains a searcher at the ith step. On the other hand, vertex u is after v in the ordering α. Thus P must contain a vertex w ∈ madj⁺(u), w ≠ u, occupied by a searcher. This is a contradiction. Since every vertex was occupied at least once and no recontamination occurs, we conclude that at the end of Π all vertices are cleared.
Every vertex v was occupied by searchers during |{u | v ∈ madj⁺(u)}| steps, and using Lemma 2 we conclude that es(ET) ≥ mot(Π, G) − 1.

5 Ordered Tree Decompositions and Elimination Trees

In this section we discuss a relation between the treespan ts(G) and elimination trees of G, establishing that ts(G) = es(G), and hence, by Theorem 1, ts(G) = mot(G) − 1. We first give a simplified view of ordered tree decompositions and then proceed to prove some of their properties. There are exactly n bags in X of an ordered tree decomposition (X, T, r) of G. Thus, the index set I for Xi, i ∈ I, can be chosen so that I = V, with r ∈ V. Then T is a tree on the vertices of G. To identify the bags and to define the correspondence between I and V uniquely, name the bags so that Xr is the bag corresponding to the root r of T. Regarding the bags in a top-down fashion according to T, name the bag in which vertex v appears for the first time Xv, and the corresponding tree node v. Thus if y is the parent of v in T, then Xv \ Xy = {v}. This explains how to rename the bags and the vertices of T with elements from V, given a tree decomposition based on I. Conversely, if we replace i with v and I with V in Conditions 1-3 of the definition of a tree decomposition, and change the condition in the definition of ordered tree decompositions to "Xr = {r}, and if y is the parent of v in T then Xv \ Xy = {v}", then this automatically gives a tree T on the vertices of G, as we have explained above. For the remainder of this paper, when we mention an ordered tree decomposition (X, T, r), we will assume that T is a tree on the vertices of G as explained here. The following lemma will make the role of T even clearer.

Lemma 3. Given a graph G = (V, E) and a rooted tree T = (V, P), there exists an ordered tree decomposition (X, T, r) of G ⇔ for every edge uv ∈ E, u and v have an ancestor-descendant relationship in T.

Proof. Assume that T corresponds to a valid ordered tree decomposition of G, but there is an edge uv in G such that T[u] and T[v] are disjoint subtrees of T. Xu is the first bag in which u appears and Xv is the first bag in which v appears; thus u and v do not appear together in any bag Xw where w is on the path from u to the root or from v to the root in T. Thus if u and v appeared together in any other bag Xy, where y belongs to T[u] or T[v] or any other disjoint subtree of T, this would violate Condition 3 of a tree decomposition. Therefore, u and v cannot appear together in any bag, and there cannot exist a valid decomposition (X, T, r) of G. For the reverse direction, assume that for every edge uv in G, u and v have an ancestor-descendant relationship in T. Assume without loss of generality that v is an ancestor of u. Then the bags can be defined so that 1) Xv contains v, 2) no bag Xy contains v where y is an ancestor of v, 3) for every vertex w on the path from v to u in T, Xw contains v (and w, of course), and 4) Xu contains both u and v. We can see that all the conditions of an ordered tree decomposition are satisfied.

Lemma 4. Let (X, T, r) be an ordered tree decomposition of a given graph. For every edge uv in tri(X, T), u and v have an ancestor-descendant relationship in T.

Proof. As we have seen in the proof of Lemma 3, if u and v belong to disjoint subtrees of T, then they cannot appear together in the same bag. Since only the bags are made into cliques, u and v cannot belong to the same clique in tri(X, T), which means that the edge uv does not exist in tri(X, T).
Lemma 5. Let (X, T, r) be an ordered tree decomposition of a given graph. Let uv be an edge of tri(X, T) such that v is an ancestor of u in T. Then v belongs to the bag Xw for every w on the path from v to u, including Xv and Xu.

Proof. Vertex v appears for the first time in Xv on the path from the root, and u appears for the first time in Xu. For every vertex w on the path from v to u, exactly vertex w is introduced in Xw. Thus Xu is the first bag to which both u and v can belong. In order for this to be possible, v must belong to the bag Xw for every vertex w on the path from v to u in T.

Lemma 6. For each graph G, there exists an ordered tree decomposition (X, T, r) of G of minimum treespan such that if u is a child of v in T then v ∈ Xu.

Proof. Assume that u is a child of v in T and v ∉ Xu. Clearly, uv is not an edge of G. Since v does not belong to any bag Xy for a descendant y of u, we can move u up to be a child of a node w in T, where uw is an edge of G and w is the first node on the path from v to the root that is a neighbor of u.

Lemma 7. Let (X, T, r) be an ordered tree decomposition of G, and let α : {1, ..., n} → V be a post order of T. Then G⁺α ⊆ tri(X, T).

Proof. Let uv be an edge of G⁺α, and assume without loss of generality that u has a lower number than v according to α. If uv is an edge of G, then we are done. Otherwise, due to Lemma 1, there must exist a path u, x1, x2, ..., xk, v in G with k ≥ 1 such that all xi are ordered before u. Since α is a post order of T, none of the vertices xi, i = 1, ..., k, can lie on the path from u to the root in T. Consequently, and due to Lemma 3, since ux1 is an edge of G, x1 belongs to T[u]. With the same argument, since x1, x2, ..., xk is a path in G, all the vertices x1, x2, ..., xk must belong to T[u]. Now, since vxk is an edge in G, v must be an ancestor of xk and thus of u in T, where u lies on the path from v to xk. By Lemma 5, vertex v must be present in all bags Xw where w lies on the path from v to xk, and consequently also in bag Xu. Therefore, u and v are both present in bag Xu and are neighbors in tri(X, T).

Lemma 8. Let (X, T, r) be an ordered tree decomposition of G, and let α be a post order of T. Let ET be the elimination tree of G⁺α. Then for any vertex u, if v is the parent of u in ET, then v lies on the path from u to the root in T.

Proof. Since v is the parent of u in ET, uv is an edge of G⁺α. By Lemma 7, uv is also an edge of tri(X, T). By Lemma 4, u and v must have an ancestor-descendant relationship in T. Since α is a post order of T and α⁻¹(u) < α⁻¹(v), v must be an ancestor of u in T.

Theorem 2. For any graph G, ts(G) = es(G).

Proof. First we prove that ts(G) ≤ es(G). Let ET = (V, P) be an elimination tree of G such that es(G) = es(ET), and let r be the root vertex of ET. We define an ordered tree decomposition (X = {Xv | v ∈ V}, T = ET, r) of G in the following way: for each vertex v in ET, put v in exactly the bags Xu such that u ∈ ETp[v]. Regarding ET top-down, each vertex u appears for the first time in bag Xu, and clearly |Xu \ Xv| = 1 whenever v is the parent of u. It remains to show that (X, ET) is a tree decomposition of G. Conditions 1 and 3 of a tree decomposition are trivially satisfied, since ETp[v] is connected and includes v, for every vertex v. For Condition 2, if uv is an edge of G, then the lower numbered of v and u is a descendant of the other in ET.
Let us say u is a descendant of v; then u ∈ ETp[v], and v and u both appear in bag Xu. Thus (X, ET, r) is an ordered tree decomposition of G, and clearly ts(X, ET, r) = es(G). Consequently, ts(G) ≤ es(G). Now we show that es(G) ≤ ts(G). Let (X, T, r) be an ordered tree decomposition of G with ts(X, T, r) = ts(G). Let α be a post order on T, and let ET be the elimination tree of G⁺α. For any two adjacent vertices u and v in G, u and v must have an ancestor-descendant relationship both in T and in ET. Moreover, due to Lemma 8, all vertices that are on the path between u and v in ET must also be present on the path between u and v in T. Assume, without loss of generality, that u is numbered lower than v. By Lemma 5, v must belong to all the bags corresponding to the vertices on the path from v to u in T. Thus for each vertex v, s(v) in ET is at most l(v) in (X, T, r). Consequently, es(G) ≤ ts(G), and the proof is complete.

Theorems 1 and 2 imply the main combinatorial result of this paper.

Corollary 1. For any graph G, ts(G) = es(G) = mot(G) − 1.

6 Treespan of Some Special Graph Classes

The diameter of a graph G, diam(G), is the maximum length of a shortest path between any two vertices of G. The density of a graph G is defined as dens(G) = (n − 1)/diam(G). The following result is well known.

Lemma 9 ([6]). For any graph G, bw(G) ≥ max{dens(H) | H ⊆ G}.

A caterpillar is a tree consisting of a main path of vertices of degree at least two, with some leaves attached to this main path.

Theorem 3. For any graph G, ts(G) ≥ max{dens(H) | H ⊆ G and H is a caterpillar}.

Proof. Let the caterpillar H be a subgraph of G with main path c1, c2, ..., c_{diam(H)−1}. We view the bags of an ordered tree decomposition as labeled by vertices of G in the natural manner (as described before Lemma 3). Let (X, T, r) be an ordered tree decomposition of G, with (X′, T′, r′) the topologically induced ordered tree decomposition on H, i.e. containing only bags labeled by a vertex from H, where we contract edges of T going to vertices labeled by vertices not in H to get T′. Let Xci be the 'highest' bag in (X′, T′, r′) labeled by a vertex from the main path, so that only the subtree of (X′, T′, r′) rooted at Xci contains any vertices from the main path. Let there be h + 1 bags on the path from Xci to the root Xr′ of (X′, T′, r′). Since vertex r′ of H (a leaf unless r′ = ci) is adjacent to a vertex on the main path, it appears in at least h + 1 bags, giving ts(G) ≥ h. Moreover, by applying Lemma 3 we get that T′ between its root Xr′ and Xci consists simply of a path without further children, so that the subtree rooted at Xci has |V(H)| − h bags. Each of these bags contains a vertex from the main path, since every leaf of H is adjacent in H only to a vertex on the main path, and by the pigeonhole principle we thus have that some main path vertex lives in at least ⌈(|V(H)| − h)/(diam(H) − 1)⌉ bags. If (|V(H)| − h)/(diam(H) − 1) is not an integer, then immediately we have the bound ts(G) ≥ ⌊(|V(H)| − h)/(diam(H) − 1)⌋. If, on the other hand, (diam(H) − 1) does divide (|V(H)| − h), then we apply the fact that at least diam(H) − 2 bags must contain at least two vertices from the main path, to account for the edges between them; for diam(H) ≥ 3 (which holds except for the trivial case of H a star) this increases the span of at least one main path vertex, and we again get ts(G) ≥ ⌊(|V(H)| − h)/(diam(H) − 1)⌋.
Thus ts(G) ≥ max{h, ⌊(|V(H)| − h)/(diam(H) − 1)⌋}. If h ≤ dens(H) we have that ⌊(|V(H)| − h)/(diam(H) − 1)⌋ ≥ (|V(H)| − 1)/diam(H), and therefore ⌊(|V(H)| − h)/(diam(H) − 1)⌋ ≥ dens(H). We conclude that ts(G) ≥ dens(H), and the theorem follows. ⊓⊔

With this theorem, in connection with the following result from [2], we can conclude that bw(G) = ts(G) for a caterpillar graph G.

Lemma 10. [2] For a caterpillar graph G, bw(G) ≤ max{dens(H) | H ⊆ G}.

Lemma 11. For a caterpillar graph G, bw(G) = ts(G) = max{dens(H) | H ⊆ G}.

Proof. Let G be a caterpillar graph. Then bw(G) ≥ ts(G) ≥ max{dens(H) | H ⊆ G} ≥ bw(G). The first inequality was mentioned in Section 5, the second inequality is due to Theorem 3, and the last inequality is due to Lemma 10, since G is a caterpillar. Thus all of the mentioned parameters on G are equal. ⊓⊔

A set of three vertices x, y, z of a graph G is called an asteroidal triple (AT) if for any two of these vertices there exists a path joining them that avoids the (closed) neighborhood of the third. A graph G is called an asteroidal triple-free (AT-free) graph if G does not contain an asteroidal triple. This notion was introduced by Lekkerkerker and Boland [17] for the following characterization of interval graphs: G is an interval graph if and only if it is chordal and AT-free. A graph G is said to be cobipartite if it is the complement of a bipartite graph. Notice that cobipartite graphs form a subclass of the AT-free claw-free graphs. Another subclass of the AT-free claw-free graphs are the proper interval graphs, which were mentioned earlier. Thus G is a proper interval graph if and only if it is chordal and AT-free claw-free. A minimal triangulation of G is a triangulation H such that no proper subgraph of H is a triangulation of G. The following result is due to Parra and Scheffler.

Theorem 4. [20] Let G be an AT-free claw-free graph. Then every minimal triangulation of G is a proper interval graph, and hence, bw(G) = pw(G) = tw(G).

Theorem 5. For an AT-free claw-free graph G, ts(G) = bw(G) = pw(G) = tw(G).

Proof. Let G be AT-free claw-free and let H be a minimal triangulation of G such that ts(G) = ts(H). Such a graph H must exist, since for an optimal ordered tree decomposition (X, T, r), the graph tri(X, T) is chordal and ts(tri(X, T)) = ts(G). Thus any minimal graph from the set of chordal graphs 'sandwiched' between tri(X, T) and G can be chosen as H. By Theorem 4, H is a proper interval graph. Thus ω(H) − 1 = bw(H) ≥ bw(G). Since ts(H) ≥ ω(H) − 1, we have that ts(G) = ts(H) ≥ ω(H) − 1 ≥ bw(G) ≥ ts(G). ⊓⊔

By the celebrated result of Arnborg, Corneil, and Proskurowski [1], computing the tree-width (and hence also the path-width and bandwidth) is NP-hard even for cobipartite graphs. Thus Theorem 5 yields the following corollary.

Corollary 2. Computing treespan is NP-hard for cobipartite graphs.

We conclude with an open question. For any graph G, ts(G) ≥ ⌈∆(G)/2⌉. For trees of maximum degree at most 3 it is easy to prove that ts(G) ≤ ⌈∆(G)/2⌉. It is an interesting question whether treespan can be computed in polynomial time for trees of larger maximum degree. Notice that bandwidth remains NP-complete on trees of maximum degree 3 [13].

References

1. S. Arnborg, D.G. Corneil, and A. Proskurowski, Complexity of finding embeddings in a k-tree, SIAM J. Alg. Disc. Meth., 8 (1987), pp. 277–284.
2. S.F. Assman, G.W. Peck, M.M. Syslo, and J. Zak, The bandwidth of caterpillars with hairs of length 1 and 2, SIAM J. Alg. Disc. Meth., 2 (1981), pp. 387–392.
3. D. Bienstock, Graph searching, path-width, tree-width and related problems (a survey), DIMACS Ser. in Discrete Mathematics and Theoretical Computer Science, 5 (1991), pp. 33–49.
4. D. Bienstock and P. Seymour, Monotonicity in graph searching, J. Algorithms, 12 (1991), pp. 239–245.
5. H.L. Bodlaender, A partial k-arboretum of graphs with bounded treewidth, Theor. Comp. Sc., 209 (1998), pp. 1–45.
6. P.Z. Chinn, J. Chvátalová, A.K. Dewdney, and N.E. Gibbs, The bandwidth problem for graphs and matrices – a survey, J. Graph Theory, 6 (1982), pp. 223–254.
7. N.D. Dendris, L.M. Kirousis, and D.M. Thilikos, Fugitive-search games on graphs and related parameters, Theor. Comp. Sc., 172 (1997), pp. 233–254.
8. J.A. Ellis, I.H. Sudborough, and J. Turner, The vertex separation and search number of a graph, Information and Computation, 113 (1994), pp. 50–79.
9. F. Fomin, Helicopter search problems, bandwidth and pathwidth, Discrete Appl. Math., 85 (1998), pp. 59–71.
10. F.V. Fomin and P.A. Golovach, Graph searching and interval completion, SIAM J. Discrete Math., 13 (2000), pp. 454–464 (electronic).
11. M. Franklin, Z. Galil, and M. Yung, Eavesdropping games: A graph-theoretic approach to privacy in distributed systems, J. ACM, 47 (2000), pp. 225–243.
12. D. Fulkerson and O. Gross, Incidence matrices and interval graphs, Pacific Journal of Math., 15 (1965), pp. 835–855.
13. M.R. Garey, R.L. Graham, D.S. Johnson, and D.E. Knuth, Complexity results for bandwidth minimization, SIAM J. Appl. Math., 34 (1978), pp. 477–495.
14. F. Gavril, The intersection graphs of subtrees in trees are exactly the chordal graphs, J. Combin. Theory Ser. B, 16 (1974), pp. 47–56.
15. L.M. Kirousis and C.H. Papadimitriou, Searching and pebbling, Theor. Comp. Sc., 47 (1986), pp. 205–218.
16. A.S. LaPaugh, Recontamination does not help to search a graph, J. ACM, 40 (1993), pp. 224–245.
17. C.G. Lekkerkerker and J.C. Boland, Representation of a finite graph by a set of intervals on the real line, Fund. Math., 51 (1962), pp. 45–64.
18. F.S. Makedon, C.H. Papadimitriou, and I.H. Sudborough, Topological bandwidth, SIAM J. Alg. Disc. Meth., 6 (1985), pp. 418–444.
19. F.S. Makedon and I.H. Sudborough, On minimizing width in linear layouts, Disc. Appl. Math., 23 (1989), pp. 201–298.
20. A. Parra and P. Scheffler, Treewidth equals bandwidth for AT-free claw-free graphs, Technical Report 436/1995, Technische Universität Berlin, Fachbereich Mathematik, Berlin, Germany, 1995.
21. F.S. Roberts, Indifference graphs, in Proof Techniques in Graph Theory, F. Harary, ed., Academic Press, 1969, pp. 139–146.
22. N. Robertson and P.D. Seymour, Graph minors – a survey, in Surveys in Combinatorics, I. Anderson, ed., Cambridge Univ. Press, 1985, pp. 153–171.
23. A.L. Rosenberg and I.H. Sudborough, Bandwidth and pebbling, Computing, 31 (1983), pp. 115–139.

Constructing Sparse t-Spanners with Small Separators

Joachim Gudmundsson⋆

Department of Mathematics and Computing Science, TU Eindhoven, 5600 MB Eindhoven, The Netherlands

⋆ Supported by The Netherlands Organisation for Scientific Research (NWO).

Abstract. Given a set of n points S in the plane and a real value t > 1, we show how to construct in time O(n log n) a t-spanner G of S such that there exists a set of vertices S′ of size O(√n · log n) whose removal leaves two disconnected sets A and B, neither of which is of size greater than 2/3 · n. The spanner also has some additional properties: low weight and constant degree.
1 Introduction

Complete graphs represent ideal communication networks, but they are expensive to build; sparse spanners represent low-cost alternatives. The weight of the spanner network is a measure of its sparseness; other sparseness measures include the number of edges, the maximum degree, and the number of Steiner points. Spanners for complete Euclidean graphs as well as for arbitrary weighted graphs find applications in robotics, network topology design, distributed systems, design of parallel machines, and many other areas, and have been the subject of considerable research [1,2,5,8,14].

Consider a set S of n points in the plane. A network on S can be modeled as an undirected graph G with vertex set S and with edges e = (u, v) of weight wt(e). In this paper we study Euclidean networks: a Euclidean network is a geometric network where the weight of the edge e = (u, v) is equal to the Euclidean distance d(u, v) between its two endpoints u and v. Let t > 1 be a real number. We say that G is a t-spanner for S if for every pair of points u, v ∈ S there exists a path in G of weight at most t times the Euclidean distance between u and v. A sparse t-spanner is defined to be a t-spanner with a linear number of edges and total weight (sum of edge weights) O(wt(MST(S))), where wt(MST(S)) is the total weight of a minimum spanning tree of S. Many algorithms are known that compute t-spanners with O(n) edges that have additional properties such as bounded degree, small spanner diameter (i.e., any two points are connected by a t-spanner path consisting of only a small number of edges), low weight (i.e., the total length of all edges is proportional to the weight of a minimum spanning tree of S), and fault-tolerance; see, e.g., [1,2,3,5,7,8,9,11,12,14,19], and the surveys [10,20]. All these algorithms compute t-spanners for any given constant t > 1.

In this paper, we consider the construction of a sparse t-spanner with constant degree and with a provably balanced separator. Finding small separators in a graph is a problem that has been studied extensively within theoretical computer science for the last three decades; a survey of the area can be found in the book by Rosenberg and Heath [17]. Spanners with good separators have, for example, applications in the construction of external memory data structures [16]. It is well known that planar graphs have small separators and, hence, any planar spanner has a small separator. Bose et al. [4] showed how to construct a planar t-spanner for t ≈ 10 with constant degree and low weight. Also, it is known that the Delaunay triangulation is a t-spanner for t = 2π/(3 cos(π/6)) [13]. For arbitrary values of t > 1 this article is, to the best of the author's knowledge, the first time that separators have been considered.

Definition 1. Given a graph G = (V, E), a separator is a set of vertices C ⊂ V whose removal leaves two disconnected sets A and B. A separator C is said to be balanced if the size of both A and B is at most 2/3 · |V|.

The main result of this paper is the following theorem.

Theorem 1. Given a set S of n points in the plane and a constant t > 1, there is an O(n log n)-time algorithm that constructs a graph G = (S, E)
1. that is a t-spanner of S,
2. that has a linear number of edges,
3. that has weight O(wt(MST(S))),
4. that has a balanced separator of size O(√n · log n), and
5. in which each node has constant degree.

The paper is organised as follows. First we present an algorithm that produces a t-spanner G. Then, in Section 3, we prove that G has all the properties stated in Theorem 1.

2 Constructing a t-Spanner

In this section we first show an algorithm that, given a set S of n points in the plane together with a real value t > 1, produces a t-spanner G. The algorithm works in two steps: first it produces a modified approximate θ-graph [6,12,18], denoted Gθ, which is then pruned using a greedy approach [1,5,8,11]. We show that the resulting graph, denoted G, has two basic properties that will be used to prove that it is a sparse spanner with a balanced separator.

2.1 The Algorithm

It has long been known that for any constant t > 1, every point set S in the plane has a t-spanner with O(n) edges. One such construction is the θ-graph of S. Let θ < π/4 be a value such that kθ = 2π/θ is a positive integer. The θ-graph of S is obtained by drawing kθ non-overlapping cones around each point p ∈ S, each spanning an angle of θ, and connecting p to the point in each cone closest to p. For each of these edges, p is said to be the source while the other endpoint is said to be the sink. The result is a tθ-spanner with at most n · kθ edges, where tθ = 1/(cos θ − sin θ). The time needed to construct the θ-graph for any constant θ is O(n log n) [12].

Approximate the θ-Graph. Here we build an approximate version of the θ-graph, which we denote a φ-graph Gφ = (S, Eφ). First build a θ′-graph (S, Eθ′) with θ′ = εθ, for some small constant ε, as shown in Fig. 1a. A point v ∈ S belongs to Sp if and only if (p, v) ∈ Eθ′ and p is the source of (p, v). Process each point p ∈ S iteratively as follows, until Sp is empty: let v be the point in Sp closest to p, add the edge (p, v) to Eφ′, and remove every point u from Sp for which ∠vpu < θ/2, as illustrated in Fig. 1b. Gφ′ is a tφ′-spanner with tφ′ = 1/(cos φ′ − sin φ′) and, since two adjacent cones may overlap, the number of outgoing edges per point is bounded by 4π/θ. Arya et al. [2] showed that a θ-graph can be pruned such that each point has constant degree. Applying this result to Gφ′ gives a tφ-spanner Gφ where each point has degree bounded by O(tφ′/(θ(tφ − tφ′))). Note that the value of φ′ is θ(1 + 2ε).

Remove "long" Edges Intersecting "short" Edges. The remaining two steps of the construction algorithm both prune the graph. Prune Gφ = (S, Eφ) to obtain a graph Gθ = (S, Eθ) as follows. Build the minimum spanning tree Tmst = (S, Emst) of S. Sort the edges in Eφ and in Emst with respect to their lengths. We obtain the two ordered sets Eφ = {e1, . . . , eO(n)} and Emst = {e′1, . . . , e′n−1}, respectively. The idea is to process the edges in Eφ in order, while maintaining a graph T that clusters vertices lying within distance l of each other, where l = |ei|/n² and ei is the edge just about to be processed. The graph also contains information about the convex hull of each cluster, and we will show that this can be maintained in linear total time if the minimum spanning tree is given. Initially T contains n clusters, where every cluster is a single point. Assume that we are about to process an edge ei = (u, v) ∈ Eφ.
The first step is to merge all clusters in T that are connected by an edge of length at most l = |ei|/n². This is done by repeatedly extracting the shortest edge e′j = (u′j, v′j) from Emst and merging the two clusters C1 and C2 containing u′j and v′j, respectively, until there are no more edges in Emst of length less than l = |ei|/n². At the same time we also compute the convex hull, denoted C, of the merged cluster; note that this can be done in time linear in the decrease in complexity from C1 and C2 to C. Hence, in total, updating the convex hulls of the clusters requires linear time. Now we are ready to process ei = (u, v). Let m(u, l) and m(v, l) denote the clusters in T containing u and v, respectively. If ei intersects the convex hull of either m(u, l) or m(v, l) then ei is discarded, otherwise it is added to Eθ, as shown in Fig. 1c. Since the original graph is a φ-graph, it is not hard to see that between every pair of clusters C1 and C2 there is at least one edge (u, v) ∈ Eφ such that u and v lie on the convex hulls of C1 and C2, respectively. This finishes the second part of the algorithm, and we sum it up by stating the following straightforward observation.

Fig. 1. (a) Constructing a θ′-graph, which is then (b) pruned to obtain a φ-graph. (c) Every edge is tested to see if it intersects the convex hulls of the clusters containing u and v.

Observation 1. The above algorithm produces in time O(n log n) a graph Gθ which is a tθ-spanner, where tθ ≤ 1/(cos φ − sin φ) + 1/n.

Greedily Pruning the Graph. We are given a modified approximate θ-graph Gθ for tθ = t/(1 + ε). The final step is to run the greedy tg-spanner algorithm with Gθ and tg = (1 + ε) as input. The basic idea of the standard greedy algorithm is to sort the edges by increasing weight and then process them in order. Greedy processing of an edge e = (u, v) entails a shortest path query, i.e., checking whether the shortest path in the graph built so far has length at most t · d(u, v). If the answer to the query is no, then edge e is added to the spanner G; otherwise it is discarded, see Fig. 2. The greedy algorithm was first considered by Althöfer et al. [1], and later variants of the greedy algorithm using clustering techniques improved the analysis [5,8,11]. In [8] it was observed that shortest path queries need not be answered precisely; approximate shortest path queries suffice. Of course, this means that the greedy algorithm itself is then only approximately simulated. The most efficient algorithm was recently presented by Gudmundsson et al. [11], who show an O(n log n)-time variant of the greedy algorithm. In the approximate greedy algorithm, an approximate shortest path query checks whether the path is longer than τ · d(u, v), where 1 < τ < t.

Algorithm Standard-Greedy(G = (S, E), t)
1. sort the edges in E by increasing weight
2. E′ := ∅
3. G′ := (S, E′)
4. for each edge (u, v) ∈ E do
5.     if ShortestPath(G′, u, v) > t · d(u, v) then
6.         E′ := E′ ∪ {(u, v)}
7.         G′ := (S, E′)
8. output G′

Fig. 2. The naive O(|E|² · |S| log |S|)-time greedy spanner algorithm.

2.2 Two Basic Properties

The final result is a t-spanner G = (S, E) with several nice properties, among them the following two simple and fundamental properties that will be used in the analysis: the obtuse Empty-cone property and the Leap-frog property.
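For concreteness, the following is a runnable sketch of Standard-Greedy from Fig. 2 above (the naive variant, not the O(n log n) approximate algorithm of [11]); Dijkstra's algorithm answers the shortest-path queries, and all function and variable names are our own:

```python
import heapq
from math import dist  # Euclidean distance, Python 3.8+

def standard_greedy(points, t):
    """Naive greedy t-spanner of the complete Euclidean graph on `points`.

    Follows Fig. 2: scan edges by increasing length and keep an edge (u, v)
    only if the spanner built so far has no u-v path of length <= t*d(u, v).
    """
    n = len(points)
    edges = sorted((dist(points[u], points[v]), u, v)
                   for u in range(n) for v in range(u + 1, n))
    adj = [[] for _ in range(n)]  # adjacency lists of the partial spanner G'

    def shortest_path(src, dst, bound):
        # Dijkstra, pruned once distances exceed `bound`.
        dist_to = {src: 0.0}
        pq = [(0.0, src)]
        while pq:
            d, x = heapq.heappop(pq)
            if x == dst:
                return d
            if d > dist_to.get(x, float("inf")) or d > bound:
                continue
            for y, w in adj[x]:
                nd = d + w
                if nd < dist_to.get(y, float("inf")):
                    dist_to[y] = nd
                    heapq.heappush(pq, (nd, y))
        return float("inf")

    spanner = []
    for w, u, v in edges:
        if shortest_path(u, v, t * w) > t * w:  # no short path yet: keep edge
            spanner.append((u, v))
            adj[u].append((v, w))
            adj[v].append((u, w))
    return spanner
```

Run on all Θ(n²) edges this is exactly the expensive procedure of Fig. 2; the construction in this paper instead feeds the greedy step the sparse graph Gθ, which is what makes the overall algorithm efficient.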
Let C(u, v, θ) denote the (unbounded) cone with apex at u, spanning an angle of θ, such that (u, v) splits the angle at u into two equal angles. An edge set E is said to have the Empty-cone property if for every edge e = (u, v) ∈ E it holds that v is the point closest to u within C(u, v, θ). From the definition of θ-graphs it is obvious that Gθ satisfies the Empty-cone property; in fact, the property can be somewhat strengthened to what we call the obtuse Empty-cone property. Assume w.l.o.g. that (u, v) is vertical, that u lies below v, and that u is the source of e. Since u and v lie on the convex hulls of m(u, l) and m(v, l) (otherwise e would have been discarded in the pruning step), there are two empty half-disks intersecting (u, v) with radii l = |e|/n² and centers at u and v, see Fig. 3a. The union of these half-disks and the part of the cone C(u, v, θ) within distance |uv| from u is said to be an obtuse cone, and is denoted Co(u, v, θ). The following observation is straightforward.

Observation 2. The shortest edge that intersects an edge e = (u, v) ∈ E satisfying the obtuse Empty-cone property must be longer than 2|e| sin(θ/2)/n².

Next we consider the Leap-frog property. Let t ≥ τ > 1. An edge set E satisfies the (t, τ)-leapfrog property if the following is true for every possible subset E′ = {(u1, v1), . . . , (um, vm)} of E:

τ · wt(u1, v1) < Σ_{i=2..m} wt(ui, vi) + t · ( Σ_{i=1..m−1} wt(vi, ui+1) + wt(vm, u1) ).

Informally, this definition says that if there exists an edge between u1 and v1, then any path not including (u1, v1) must have length greater than τ · wt(u1, v1), as illustrated in Fig. 3b.

Lemma 1. Given a set of points in the plane and a real value t > 1, the above algorithm produces a t-spanner G = (S, E) that satisfies the obtuse Empty-cone property and the Leap-frog property.

Fig. 3. (a) The shaded area, denoted Co(u, v, θ), is empty if e satisfies the obtuse Empty-cone property. (b) Illustrating the Leap-frog property.

Proof. Since E is a subset of the edges in the approximate θ-graph Gθ, it immediately follows that E has the obtuse Empty-cone property. Now, let C be the shortest simple cycle in G containing an arbitrary edge e = (u, v). To prove that G satisfies the leapfrog property we have to estimate wt(C) − wt(u, v). Let e′ = (u′, v′) be the longest edge of C. Among the cycle edges, e′ is examined last by the algorithm. What happens while the algorithm is examining e′? In [11] it was shown that if the algorithm adds an edge e′ to the graph, then the shortest path between u′ and v′ in the partial graph constructed so far must be longer than τ · d(u′, v′). Hence, wt(C) − d(u, v) ≥ wt(C) − d(u′, v′) > τ · d(u′, v′) ≥ τ · d(u, v). The lemma follows. ⊓⊔

The obtuse Empty-cone property will be used to prove that G has a balanced separator, and the Leap-frog property will mainly be used to prove that the total weight of G is small, as will be shown in Section 3.2.

3 The Analysis

In this section we perform a close analysis of the graph constructed by the algorithm presented in the previous section. First we study the separator property and then, in Section 3.2, we take a closer look at the remaining properties claimed in Theorem 1.

3.1 A Balanced Separator

In this subsection we prove that the graph G = (S, E) has a balanced separator of size O(√n · log n), by using the famous Planar Separator Theorem of Lipton and Tarjan [15].
Fact 1 (Planar Separator Theorem [15]). Every planar graph G with n vertices can be partitioned into three parts A, B and C such that C is a separator of G and |A| ≤ 2n/3, |B| ≤ 2n/3 and |C| ≤ 2√2 · √n. Furthermore, there is an algorithm to compute this partition in time O(n).

The following corollary is a straightforward consequence of Fact 1.

Corollary 1. Let G be a graph in the plane such that every edge of G intersects at most N other edges of G. It can be partitioned into three parts A, B and C such that C is a separator of G and |A| ≤ 2n/3, |B| ≤ 2n/3 and |C| ≤ 2√2 · √n · N.

This corollary immediately suggests a way to prove that G has a balanced separator of size O(N · √n): namely, prove that every edge in E intersects at most N other edges in E. It should be noted that it is not enough to prove that the intersection graph I of G has low complexity, since finding a balanced separator in I does not imply a balanced separator of G.

The first step is to partition the edge set E into a constant number of groups, each having the three nice properties listed below. The idea of partitioning the edge set into groups is borrowed from [7]. The edge set E can be partitioned into a constant number of groups such that the following three properties are satisfied for each subgroup:

1. Near-parallel property: Associate to each edge e = (u, v) a slope as follows. Let h be a horizontal segment with left endpoint at the source of e. The slope of e is the counter-clockwise angle between h and e. An edge e in E belongs to the subgroup Ei if the slope of e is between (i − 1)β and iβ, for some small angle β ≪ θ.

2. Length-grouping property: Let γ > 0 be a small constant. The lengths of any two edges in Ei,j differ by at most a factor δ = (1 − γ) or by at least a factor δ^(c−1). Consider a group Ei of near-parallel edges. Let the length of the longest edge in Ei be ℓ. Partition the interval [0, ℓ] into an infinite number of intervals {[ℓδ, ℓ], [ℓδ², ℓδ], [ℓδ³, ℓδ²], . . . }. Define the subgroup Ei,j as containing the edges whose lengths lie in the intervals {[ℓδ^(j+1), ℓδ^j], [ℓδ^(j+c+1), ℓδ^(j+c)], . . . }. There is obviously only a constant number of such subgroups.

3. Empty-region property: Any two edges e1 and e2 in Ei,j,k that are near-parallel and almost of equal length are separated by a distance which is a large multiple of |e1|. Hence, two "near-equal" edges cannot be close to each other. To achieve this grouping [7], construct a graph H whose nodes are the edges of Ei,j, and where two "near-equal" nodes in H, say e1 and e2, are connected by an edge if e1 intersects a large cylinder of radius α|e2| and height α|e2| centered at the center of the edge e2, for some large constant α. This graph has constant degree because, by the Leap-frog property, only a constant number of similar "near-equal" edges can have their endpoints packed into the cylinder. Thus this graph has a constant chromatic number, and consequently a constant number of independent sets. Hence, Ei,j is subdivided into a constant number of groups, denoted Ei,j,k.

Fig. 4. (a) Illustrating the split of Co(u, v, θ) into Coᵘ(u, v, θ) and Coᵛ(u, v, θ). (b) R′1 lies inside R1.

Let e = (u, v) be an arbitrary edge in E.
Next we prove that the number of edges in D = Ei,j,k, for any i, j and k, that may intersect e is bounded by O(log n); since there is only a constant number of groups, this implies that e is intersected by at most a logarithmic number of edges of E. For simplicity we assume that e is horizontal. To simplify the analysis we partition Co(u, v, θ) into two regions, Coᵘ(u, v, θ) and Coᵛ(u, v, θ), where every point in Coᵘ(u, v, θ) lies closer to u than to v, see Fig. 4a. We will prove that the number of edges intersecting (u, v) within the region Coᵘ(u, v, θ) is bounded by O(log n). By symmetry the proof also holds for the region Coᵛ(u, v, θ), since a cone of the size and shape of Coᵘ(u, v, θ) can be placed within Coᵛ(u, v, θ), see Fig. 4a. Hence, for the rest of this section we only consider the region Coᵘ(u, v, θ).

Let D′ = {e1, e2, . . . , er} be the edges in D intersecting the part of e within Coᵘ(u, v, θ), ordered from left to right with respect to their intersection with e. Let qi denote the intersection point between ei and e, and let yi denote the length of the intersection between a vertical line through qi and Coᵘ(u, v, θ).

Fig. 5. Illustrating the proof of Lemma 2.

Lemma 2. The distance between any pair of consecutive points qi and qi+1 along e is greater than (yi/2) · sin(θ/2).

Proof. We assume that ui and ui+1 lie above vi and vi+1. Note that in the calculations below we assume that the edges in D are parallel; since the final bound is far from the exact solution, the bound stated in the lemma is still valid. There are three cases to consider.

1. |ei+1| < δ^c · |ei|. We have two subcases:
a) ei+1 does not intersect C(ui, vi, θ), see Fig. 5a. The distance between ei and ei+1 is minimised when vi+1 is the intersection between the lower side of C(u, v, θ) and the right side of C(ui, vi, θ), and ui lies on the top side of C(u, v, θ). Now, straightforward trigonometry shows that the horizontal distance between qi and qi+1 is greater than yi sin(θ/2) > (yi/2) · sin(θ/2).
b) ei+1 intersects C(ui, vi, θ), see Fig. 5b. The distance between qi and qi+1 is minimised when ui+1 lies on the right side of C(ui, vi, θ) in a leftmost position. Again, using straightforward trigonometry we obtain that the distance between qi and qi+1 is greater than |ei|(1 − δ^(c−1)) sin(θ/2) > yi(1 − δ^(c−1)) sin(θ/2) > (yi/2) · sin(θ/2).

2. |ei| ≤ δ^c · |ei+1|. We have two subcases:
a) ei does not intersect C(ui+1, vi+1, θ), see Fig. 6a. The proof is almost identical to case 1a. The distance between qi and qi+1 is minimised when vi+1 is the intersection between the lower side of C(u, v, θ) and the right side of C(ui, vi, θ), and ui lies on the top side of C(u, v, θ). Simple calculations show that the distance between qi and qi+1 is greater than (yi/2) · sin(θ/2).
b) ei intersects C(ui+1, vi+1, θ), see Fig. 6b. The proof is similar to case 1b. The distance between qi and qi+1 is minimised when ui lies on the left side of C(ui+1, vi+1, θ) in a rightmost position. Again, using straightforward trigonometry we obtain that the distance between ei and ei+1 is at least |ei|(1 − δ^(c−1)) sin(θ/2) > yi(1 − δ^(c−1)) sin(θ/2) > (yi/2) · sin(θ/2).

3. δ^c · |ei| ≤ |ei+1| ≤ (1/δ^c) · |ei|. It follows from the Empty-region property of D that the distance between ei and ei+1 is at least α · max(|ei|, |ei+1|). ⊓⊔

Fig. 6. Illustrating the proof of Lemma 2.

We need one more lemma before we can state the main theorem of this section.

Lemma 3. e intersects O(log n) edges in G.

Proof. As above, we assume w.l.o.g. that e is horizontal. Partition Coᵘ(u, v, θ) into two regions: the region R1 containing all points in Coᵘ(u, v, θ) with horizontal distance at most |e|/n² from u, and the region R2 containing the remaining region. Consider the disk Du of radius |e|/n² with center at u. From the construction of G it holds that there is a half-disk centered at u with radius |e|/n² that is empty. We may assume w.l.o.g. that the half-disk covers the upper right quadrant of Du (otherwise it must cover the lower right quadrant of Du), see Fig. 4b.

Let us first consider the region R1. Let R′1 be the rectilinear box inside R1 with bottom left corner at u, width |e|/n² and height (|e|/n²) · sin(θ/2), as illustrated in Fig. 4b. Every edge intersecting e within R1 must also intersect R′1, hence we may consider R′1 instead of R1. According to Lemma 2, the distance between qi and qi+1 is at least (|e| sin(θ/2)/n²) · (sin(θ/2)/2), which implies that the total number of edges that may intersect e within R′1 is at most (|e|/n²) / ((|e| sin²(θ/2))/(2n²)) = O(1/sin²(θ/2)), which is constant since θ is a constant.

Next we consider the part of e within R2. The width of R2 is less than |e|/2, its left side has height at least (|e|/n²) · sin(θ/2), and its right side has height at most (|e|/2) · sin θ. From Lemma 2 it holds that yi+1 ≥ yi(1 + sin²(θ/2)/(2 cos(θ/2))), since the distance between qi and qi+1 is at least (yi/2) · sin(θ/2). Set λ = sin²(θ/2)/(2 cos(θ/2)). The length of the shortest edge, ℓmin, is Ω(|e|/n²) according to Observation 2, and the value of yi is at least (1 + λ)^(i−1) · ℓmin. The largest y-value is obtained for the rightmost intersection point qb. Obviously yb is bounded by (|e|/2) · sin θ, hence it holds that (1 + λ)^b · ℓmin = O(|e|), which implies that b = O(log n). ⊓⊔

Now we are ready to state the main theorem of this section, which is obtained by putting together Corollary 1 and Lemma 3.

Theorem 2. G has a balanced separator of size O(√n · log n).

3.2 Other Properties

Theorem 1 claims that G has five properties, which we discuss below, one by one:

1. G is a t-spanner of the complete Euclidean graph. Since Gθ is a (t/(1 + ε))-spanner of the complete Euclidean graph and since G is a (1 + ε)-spanner of Gθ, it follows that G is a t-spanner of the complete Euclidean graph.

2. G has a linear number of edges. This property is straightforward since G is a subgraph of Gθ, and we already know from Section 2.1 that the number of edges in Gθ is less than n · 4π/θ.

3. G has weight O(wt(MST)). Das and Narasimhan showed the following fact about the weight of graphs that satisfy the Leap-frog property.

Fact 2 (Theorem 3 in [8]). There exists a constant 0 < φ < 1 such that the following holds: if a set of line segments E in d-dimensional space satisfies the (t, τ)-leapfrog property, where t ≥ τ ≥ φt + 1 − φ > 1, then wt(E) = O(wt(MST)), where MST is a minimum spanning tree connecting the endpoints of E. The constant implicit in the O-notation depends on t and d.

The low weight property now follows from the above fact together with Lemma 1 and the fact that Gθ is a (t/(1 + ε))-spanner of the complete Euclidean graph of S; hence it also includes a spanning tree of weight O(wt(MST(S))).

4. G has a balanced separator. This follows from Theorem 2.

5. G has constant degree.
This property is straightforward since G is a subgraph of Gφ, constructed in Section 2.1, which has constant degree.

This concludes the proof of Theorem 1.

4 Conclusions and Further Research

We have shown the first algorithm that, given a set of points in the plane and a real value t > 1, constructs in time O(n log n) a sparse t-spanner with constant degree and with a provably balanced separator. There are two obvious questions: (1) Is there a separator of size O(√n)? (2) Will the algorithm produce a t-spanner with similar properties in higher dimensions? Another interesting question is whether the greedy algorithm by itself produces a t-spanner with a balanced separator.

Acknowledgements. I am grateful to Anil Maheswari for introducing me to the problem, and to Mark de Berg, Otfried Cheong and Andrzej Lingas for stimulating and helpful discussions during the preparation of this article.

References

1. I. Althöfer, G. Das, D. P. Dobkin, D. Joseph, and J. Soares. On sparse spanners of weighted graphs. Discrete Computational Geometry, 9.
2. S. Arya, G. Das, D. M. Mount, J. S. Salowe, and M. Smid. Euclidean spanners: short, thin, and lanky. In Proc. 27th Annual ACM Symposium on Theory of Computing, pages 489–498, 1995.
3. J. Bose, J. Gudmundsson, and P. Morin. Ordered theta graphs. In Proc. 14th Canadian Conference on Computational Geometry, 2002.
4. J. Bose, J. Gudmundsson, and M. Smid. Constructing plane spanners of bounded degree and low weight. In Proc. 10th European Symposium on Algorithms, 2002.
5. B. Chandra, G. Das, G. Narasimhan, and J. Soares. New sparseness results on graph spanners. International Journal of Computational Geometry and Applications, 5:124–144, 1995.
6. K. L. Clarkson. Approximation algorithms for shortest path motion planning. In Proc. 19th ACM Symposium on Computational Geometry, pages 56–65, 1987.
7. G. Das, P. Heffernan, and G. Narasimhan. Optimally sparse spanners in 3-dimensional Euclidean space. In Proc. 9th Annual ACM Symposium on Computational Geometry, pages 53–62, 1993.
8. G. Das and G. Narasimhan. A fast algorithm for constructing sparse Euclidean spanners. International Journal of Computational Geometry and Applications, 7:297–315, 1997.
9. G. Das, G. Narasimhan, and J. Salowe. A new way to weigh malnourished Euclidean graphs. In Proc. 6th ACM-SIAM Sympos. Discrete Algorithms, pages 215–222, 1995.
10. D. Eppstein. Spanning trees and spanners. In J.-R. Sack and J. Urrutia, editors, Handbook of Computational Geometry, pages 425–461. Elsevier Science Publishers, Amsterdam, 2000.
11. J. Gudmundsson, C. Levcopoulos, and G. Narasimhan. Improved greedy algorithms for constructing sparse geometric spanners. SIAM Journal of Computing, 31(5):1479–1500, 2002.
12. J. M. Keil. Approximating the complete Euclidean graph. In Proc. 1st Scandinavian Workshop on Algorithmic Theory, pages 208–213, 1988.
13. J. M. Keil and C. A. Gutwin. Classes of graphs which approximate the complete Euclidean graph. Discrete and Computational Geometry, 7:13–28, 1992.
14. C. Levcopoulos, G. Narasimhan, and M. Smid. Improved algorithms for constructing fault-tolerant spanners. Algorithmica, 32:144–156, 2002.
15. R. J. Lipton and R. E. Tarjan. A separator theorem for planar graphs. SIAM Journal of Applied Mathematics, 36:177–189, 1979.
16. A. Maheswari. Personal communication, 2002.
17. A. L. Rosenberg and L. S. Heath. Graph separators, with applications. Kluwer Academic/Plenum Publishers, Dordrecht, the Netherlands, 2001.
18. J. Ruppert and R. Seidel. Approximating the d-dimensional complete Euclidean graph. In Proc. 3rd Canadian Conference on Computational Geometry, pages 207–210, 1991.
19. J. S. Salowe. Construction of multidimensional spanner graphs with applications to minimum spanning trees. In Proc. 7th Annual ACM Symposium on Computational Geometry, pages 256–261, 1991.
20. M. Smid. Closest point problems in computational geometry. In J.-R. Sack and J. Urrutia, editors, Handbook of Computational Geometry, pages 877–935. Elsevier Science Publishers, Amsterdam, 2000.

Composing Equipotent Teams

Mark Cieliebak¹, Stephan Eidenbenz², and Aris Pagourtzis³

¹ Institute of Theoretical Computer Science, ETH Zürich, cieliebak@inf.ethz.ch
² Basic and Applied Simulation Science (CCS-5), Los Alamos National Laboratory†, eidenben@lanl.gov
³ Department of Computer Science, School of ECE, National Technical University of Athens, Greece‡, pagour@cs.ntua.gr

† LA–UR–03:1158; work done while at ETH Zürich.
‡ Work partially done while at ETH Zürich, supported by the Human Potential Programme of EU, contract no HPRN-CT-1999-00104 (AMORE).

Abstract. We study the computational complexity of k Equal Sum Subsets, in which we need to find k disjoint subsets of a given set of numbers such that the elements in each subset add up to the same sum. This problem is known to be NP-complete. We obtain several variations by considering different requirements as to how to compose teams of equal strength to play a tournament. We present:
– A pseudo-polynomial time algorithm for k Equal Sum Subsets with k = O(1) and a proof of strong NP-completeness for k = Ω(n).
– A polynomial-time algorithm under the additional requirement that the subsets should be of equal cardinality c = O(1), and a pseudo-polynomial time algorithm for the variation where the common cardinality is part of the input or not specified at all, which we prove NP-complete.
– A pseudo-polynomial time algorithm for the variation where we look for two equal sum subsets such that certain pairs of numbers are not allowed to appear in the same subset.
Our results are a first step towards determining the dividing lines between polynomial time solvability, pseudo-polynomial time solvability, and strong NP-completeness of subset-sum related problems; we leave an interesting set of questions that need to be answered in order to obtain the complete picture.

1 Introduction

The problem of identifying subsets of equal value among the elements of a given set is constantly attracting the interest of various research communities due to its numerous applications, such as production planning and scheduling, parallel processing, load balancing, cryptography, and multi-way partitioning in VLSI design, to name only a few. Most research has so far focused on the version where the subsets must form a partition of the given set; however, the variant where we skip this restriction is interesting as well. For example, the Two Equal Sum Subsets problem can be used to show NP-hardness for a minimization version of Partial Digest (one of the central problems in computational biology, whose exact complexity is unknown) [4]. Further applications may include forming similar groups of people for medical experiments or market analysis, web clustering (finding groups of pages of similar content), or fair allocation of resources.
Here, we look at the problem from the point of view of a tournament organizer: Suppose that you and your friends would like to organize a soccer tournament (you may replace soccer with the game of your choice) with a certain number of teams that will play against each other. Each team should be composed of some of your friends and – in order to make the tournament more interesting – you would like all teams to be of equal strength. Since you know your friends quite well, you also know how well each of them plays. More formally, you are given a set of n numbers A = {a1, . . . , an}, where the value ai represents the excellence of your i-th friend in the chosen game, and you need to find k teams (disjoint subsets¹ of A) such that the values of the players of each team add up to the same number. This problem can be seen as a variation of Bin Packing with a fixed number of bins; in this new variation we require that all bins be filled to the same level, while it is not necessary to use all the elements.

¹ Under a strict formalism we should define A as a set of elements which have values {a1, . . . , an}. For convenience, we prefer to identify elements with their values. Moreover, the term "disjoint subsets" refers to subsets that contain elements of A with different indices.

For any set A of numbers, let sum(A) := Σ_{a∈A} a denote the sum of its elements. We call our problem k Equal Sum Subsets, where k is a fixed constant:

Definition 1 (k Equal Sum Subsets). Given is a set of n numbers A = {a1, . . . , an}. Are there k disjoint subsets S1, . . . , Sk ⊆ A such that sum(S1) = . . . = sum(Sk)?

The problem k Equal Sum Subsets has recently been shown to be NP-complete for any constant k ≥ 3 [3]. The NP-completeness of the particular case where k = 2 was shown earlier by Woeginger and Yu [8]. To the best of our knowledge, the variations of k Equal Sum Subsets that we study in this paper have not been investigated before in the literature.

We have introduced the parameter k for the number of equal sum subsets as a fixed constant that is part of the problem definition. An interesting variation is to allow k to be a fixed function of the number of elements n, e.g. k = n/q for some constant q. In the sequel, we will always consider k as a function of n; whenever k is a constant we simply write k = O(1).

The definition of k Equal Sum Subsets corresponds to the situation in which it is allowed to form subsets that do not have the same number of elements. In some cases this makes sense; however, we may want to have the same number of elements in each subset (this would be especially useful in composing teams for a tournament). We thus define k Equal Sum Subsets of Specified Cardinality as follows:

Definition 2 (k Equal Sum Subsets of Specified Cardinality). Given are a set of n numbers A = {a1, . . . , an} and a cardinality c. Are there k disjoint subsets S1, . . . , Sk ⊆ A with sum(S1) = . . . = sum(Sk) such that each Si has cardinality c?

There are two nice variations of this problem, depending on the parameter c. The first is to require c to be a fixed constant; this corresponds to always playing a specific game (e.g. if you always play soccer then c = 11). We call this problem k Equal Sum Subsets of Cardinality c. The second variation is to require only that all teams have an equal number of players, without specifying this number; this indeed happens in several "unofficial" tournaments, e.g.
when composing groups of people for medical experiments, or in online computer games. We call this second problem k Equal Sum Subsets of Equal Cardinality.

Let us now consider another aspect of the problem. Your teams would be more efficient and happy if they consisted of players that like each other or, at least, that do not hate each other. Each of your friends has a list of people that she/he prefers as team-mates or, equivalently, a list of people that she/he would not like to have as team-mates. In order to compose k equipotent teams respecting such preferences/exclusions, you should be able to solve the following problem:

Definition 3 (k Equal Sum Subsets with Exclusions). Given are a set of n numbers A = {a1, . . . , an} and an exclusion graph Gex = (A, Eex) with vertex set A and edge set Eex ⊆ A × A. Are there k disjoint subsets S1, . . . , Sk ⊆ A with sum(S1) = . . . = sum(Sk) such that each set Si is an independent set in Gex, i.e., there is no edge between any two vertices in Si?

An overview of the results presented in this paper is given below. In Section 2, we propose a dynamic programming algorithm for k Equal Sum Subsets with running time O(n · S^k / k^(k−1)), where n is the cardinality of the input set and S is the sum of all numbers in the input set; the algorithm runs in pseudo-polynomial time² for k = O(1). For k Equal Sum Subsets with k = Ω(n), we show strong NP-completeness³ in Section 3 by proposing a reduction from 3-Partition.

² That is, the running time of the algorithm is polynomial in (n, m), where n denotes the cardinality of the input set and m denotes the largest number of the input, but it is not necessarily polynomial in the length of the representation of the input (which is O(n log m)).
³ This means that the problem remains NP-hard even when restricted to instances where all input numbers are polynomially bounded in the cardinality of the input set. In this case, no pseudo-polynomial time algorithm can exist for the problem (unless P = NP). For formal definitions and a detailed introduction to the theory of NP-completeness the reader is referred to [5].

In Section 4, we propose a polynomial-time algorithm for k Equal Sum Subsets of Cardinality c. The algorithm uses exhaustive search and runs in time O(n^(ck)), which is polynomial in n as the two parameters k and c are fixed constants. For k Equal Sum Subsets of Specified Cardinality, we show NP-completeness; the result holds also for k Equal Sum Subsets of Equal Cardinality. However, we show that none of these problems is strongly NP-complete, by presenting an algorithm that solves them in pseudo-polynomial time. In Section 5, we study k Equal Sum Subsets with Exclusions, which is NP-complete since it is a generalization of k Equal Sum Subsets. We present a pseudo-polynomial time algorithm for the case where k = 2. We also give a modification of this algorithm that additionally guarantees that the two sets have an equal (specified or not) cardinality. We conclude in Section 6 by presenting a set of open questions and problems.

1.1 Number Representation

In many of our proofs we use numbers that are expressed in the number system of some base B. We denote by ⟨a1, . . . , an⟩ the number Σ_{1≤i≤n} ai · B^(n−i); we say that ai is the i-th digit of this number. In our proofs, we will choose base B large enough such that even adding up all numbers occurring in the reduction does not lead to carries from one digit to the next. Therefore, we can add numbers digit by digit.
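Since the reductions below lean on this digit-wise behaviour, a tiny sketch of the ⟨·⟩ notation may help; the helper names are ours, and a single base B is assumed for simplicity (whereas the text sometimes allows a different base per digit):

```python
def num(digits, B):
    """<a1, ..., an>: the integer sum of a_i * B^(n-i)."""
    value = 0
    for d in digits:
        assert 0 <= d < B, "each digit must fit in the base"
        value = value * B + d
    return value

def concat(alpha, beta_digits, B):
    """alpha o <beta>: shift alpha by len(beta) digits, then add beta."""
    return alpha * B ** len(beta_digits) + num(beta_digits, B)

def delta(n, i):
    """Delta_n(i): n digits, all 0 except a 1 in position i (1-indexed)."""
    return [0] * (i - 1) + [1] + [0] * (n - i)

# With B = 27: <3,5,1> + <2,1,0> = <5,6,1> and 3 * <3,5,1> = <9,15,3>,
# i.e., sums and scalar products act digit by digit as long as no digit
# ever reaches B.
B = 27
assert num([3, 5, 1], B) + num([2, 1, 0], B) == num([5, 6, 1], B)
assert 3 * num([3, 5, 1], B) == num([9, 15, 3], B)
```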
The same holds for scalar products. For example, with base B = 27 and numbers α = ⟨3, 5, 1⟩ and β = ⟨2, 1, 0⟩, we have α + β = ⟨5, 6, 1⟩ and 3 · α = ⟨9, 15, 3⟩. We will generally make liberal use of the notation and allow different bases for each digit. We define the concatenation of two numbers by ⟨a1, . . . , an⟩ ∘ ⟨b1, . . . , bm⟩ := ⟨a1, . . . , an, b1, . . . , bm⟩, i.e., α ∘ β = α · B^m + β, where m is the number of digits in β. We will use Δn(i) := ⟨0, . . . , 0, 1, 0, . . . , 0⟩ for the number that has n digits, all 0's except for the i-th position, where the digit is 1. Furthermore, 1n := ⟨1, . . . , 1⟩ is the number that has n digits, all 1's, and 0n := ⟨0, . . . , 0⟩ has n zeros. Notice that 1n = (B^n − 1)/(B − 1).

2 A Pseudo-Polynomial Time Algorithm for k Equal Sum Subsets with k = O(1)

We present a dynamic programming algorithm for k Equal Sum Subsets that uses basic ideas of well-known dynamic programming algorithms for Bin Packing with a fixed number of bins [5]. For constant k, this algorithm runs in pseudo-polynomial time.

For an instance A = {a1, . . . , an} of k Equal Sum Subsets, let S = sum(A). We define boolean variables F(i, s1, . . . , sk), where i ∈ {1, . . . , n} and sj ∈ {0, . . . , ⌊S/k⌋} for 1 ≤ j ≤ k. Variable F(i, s1, . . . , sk) will be TRUE if there are k disjoint subsets X1, . . . , Xk ⊆ {a1, . . . , ai} with sum(Xj) = sj, for 1 ≤ j ≤ k.

There are k sets of equal sum if and only if there exists a value s ∈ {1, . . . , ⌊S/k⌋} such that F(n, s, . . . , s) = TRUE. Clearly, F(1, s1, . . . , sk) is TRUE if and only if either si = 0 for 1 ≤ i ≤ k, or there exists an index j such that sj = a1 and si = 0 for all 1 ≤ i ≤ k, i ≠ j. For i ∈ {2, . . . , n} and sj ∈ {0, . . . , ⌊S/k⌋}, variable F(i, s1, . . . , sk) can be expressed recursively as follows:

F(i, s1, . . . , sk) = F(i − 1, s1, . . . , sk) ∨ ⋁_{1≤j≤k, sj−ai≥0} F(i − 1, s1, . . . , sj−1, sj − ai, sj+1, . . . , sk).

The value of all variables can be determined in time O(n · S^k / k^(k−1)), since there are n · ⌊S/k⌋^k variables, and computing each variable takes at most time O(k). This yields the following

Theorem 1. There is a dynamic programming algorithm that solves k Equal Sum Subsets for input A = {a1, . . . , an} in time O(n · S^k / k^(k−1)), where S = sum(A). For k = O(1) this algorithm runs in pseudo-polynomial time.

3 Strong NP-Completeness of k Equal Sum Subsets with k = Ω(n)

In Section 2 we gave a pseudo-polynomial time algorithm for k Equal Sum Subsets assuming that k is a fixed constant. We will now show that such an algorithm is unlikely to exist if k is a fixed function of the cardinality n of the input set. In particular, we will prove that k Equal Sum Subsets is strongly NP-complete if k = Ω(n).

Let k = n/q for some fixed integer q ≥ 2. We provide a polynomial reduction from 3-Partition, which is defined as follows: Given a multiset of n = 3m numbers P = {p1, . . . , pn} and a number h with h/4 < pi < h/2, for 1 ≤ i ≤ n, are there m pairwise disjoint sets T1, . . . , Tm such that sum(Tj) = h, for 1 ≤ j ≤ m? Observe that in a solution for 3-Partition there are exactly three elements in each set Tj.

Lemma 1. If k = n/q for some fixed integer q ≥ 2, then 3-Partition can be reduced to k Equal Sum Subsets.

Proof. Let P = {p1, . . . , pn} and h be an instance of 3-Partition. If all elements in P are equal, then there is a trivial solution.
Otherwise, let r = 3 · (q − 2) + 1 and

ai = ⟨pi⟩ ∘ 0r, for 1 ≤ i ≤ n,
bj = ⟨h⟩ ∘ 0r, for 1 ≤ j ≤ 2n/3,
dk,ℓ = ⟨0⟩ ∘ Δr(k), for 1 ≤ k ≤ r, 1 ≤ ℓ ≤ n/3.

Here, we use base B = 2nh for all numbers. Let A be the set containing all numbers ai, bj and dk,ℓ. We will use A as an instance of k Equal Sum Subsets. The size of A is n′ = n + 2n/3 + r · n/3 = n + 2n/3 + (3 · (q − 2) + 1) · n/3 = q · n. We prove that there is a solution for the 3-Partition instance P and h if and only if there are n′/q disjoint subsets of A with equal sum.

"only if": Let T1, . . . , Tm be a solution for the 3-Partition instance. This induces m subsets of A with sum ⟨h⟩ ∘ 0r, namely Si = {ai | pi ∈ Ti}. Together with the 2n/3 subsets that contain exactly one of the bj's each, we have n = n′/q subsets of equal sum ⟨h⟩ ∘ 0r.

"if": Assume there is a solution S1, . . . , Sn for the k Equal Sum Subsets instance. Let Sj be any set in this solution. Then sum(Sj) will have a zero in the r rightmost digits, since for each of these digits there are only n/3 numbers in A for which this digit is non-zero (which are not enough to have one of them in each of the n sets Sj). Thus, only numbers ai and bj can occur in the solution; moreover, we only need to consider the first digit of these numbers (as the others are zeros). Since not all numbers ai are equal, and the solution consists of n′/q = n disjoint sets, there must be at least one bj in one of the subsets in the solution. Thus, for all j we have sum(Sj) ≥ h. On the other hand, the sum of all ai's and of all bj's is exactly n · h, therefore sum(Sj) = h, which means that all ai's and all bj's appear in the solution. More specifically, there are 2n/3 sets in the solution such that each of them contains exactly one of the bj's, and each of the remaining n/3 sets in the solution consists only of ai's, such that the corresponding ai's add up to h. Therefore, the latter sets immediately yield a solution for the 3-Partition instance. ⊓⊔

In the previous proof, r is a constant; therefore the numbers ai and bj are polynomial in h, and the numbers dk,ℓ are polynomially bounded as well. Since 3-Partition is strongly NP-complete [5], k Equal Sum Subsets is strongly NP-hard for k = n/q as well. Obviously, k Equal Sum Subsets is in NP even if k = n/q for some fixed integer q ≥ 2, thus we have the following

Theorem 2. k Equal Sum Subsets is NP-complete in the strong sense for k = n/q, for any fixed integer q ≥ 2.

4 Restriction to Equal Cardinalities

In this section we study the setting where we not only require the teams to be of equal strength, but to be of equal cardinality as well. If we are interested in a specific type of game, e.g. soccer, then the size of the teams is also fixed, say c = 11, and we have k Equal Sum Subsets of Cardinality c. This problem is solvable in time polynomial in n by exhaustive search as follows: compute all N = (n choose c) subsets of the input set A that have cardinality c; consider all N^k possible sets of k such subsets, and for each one check whether it consists of disjoint subsets of equal sum. This algorithm needs time O(n^(ck)), which is polynomial in n, since c and k are constants.

On the other hand, if the size of the teams is not fixed, but given as part of the input, then we have k Equal Sum Subsets of Specified Cardinality. We show that this problem is NP-hard by modifying a reduction used in [3] to show NP-completeness of k Equal Sum Subsets.
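Before turning to that reduction, the exhaustive search for fixed c and k described in the paragraph above can be made concrete with a small self-contained sketch (the function name and backtracking structure are ours):

```python
from itertools import combinations

def k_equal_sum_subsets_card_c(a, k, c):
    """k disjoint cardinality-c subsets of equal sum, by brute force.

    Tries all (n choose c) index subsets and all k-tuples of them,
    i.e. O(n^(c*k)) time for fixed constants c and k.
    """
    idx_subsets = list(combinations(range(len(a)), c))

    def search(chosen, used, target):
        if len(chosen) == k:
            return chosen
        for s in idx_subsets:
            if used.isdisjoint(s):
                t = sum(a[i] for i in s)
                if target is None or t == target:
                    res = search(chosen + [s], used | set(s), t)
                    if res:
                        return res
        return None

    return search([], frozenset(), None)

# Example: [1, 2, 3, 4, 5, 6] with k = 2 and c = 2 returns the index
# subsets [(0, 3), (1, 2)], i.e. the value sets {1, 4} and {2, 3}.
print(k_equal_sum_subsets_card_c([1, 2, 3, 4, 5, 6], 2, 2))
```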
The reduction is from Alternating Partition, which is the following NP-complete [5] variation of Partition: Given n pairs of numbers (u1, v1), . . . , (un, vn), are there two disjoint sets of indices I and J with I ∪ J = {1, . . . , n} such that Σ_{i∈I} ui + Σ_{j∈J} vj = Σ_{j∈J} uj + Σ_{i∈I} vi (i.e., from each pair, one number is counted on the one side and the other number on the other side)?

Lemma 2. Alternating Partition can be reduced to k Equal Sum Subsets of Specified Cardinality for any k ≥ 2.

Proof. We transform a given Alternating Partition instance with pairs (u1, v1), . . . , (un, vn) into a k Equal Sum Subsets of Specified Cardinality instance as follows: Let S = Σ_{i=1}^{n}(ui + vi). For each pair (ui, vi) we create two numbers u′i = ⟨ui⟩ ∘ Δn(i) and v′i = ⟨vi⟩ ∘ Δn(i). In addition, we create k − 2 (equal) numbers b1, . . . , bk−2 with bi = ⟨S/2⟩ ∘ Δn(n). Finally, for each bi we create n − 1 numbers di,j = ⟨0⟩ ∘ Δn(j), for 1 ≤ j ≤ n − 1. While we set the base of the first digit to k · S, for all other digits it suffices to use base n + 1, in order to ensure that no carries can occur in any addition in the following proof. The set A that contains all u′i's, v′i's, bi's, and di,j's, together with the chosen cardinality c = n, is our instance of k Equal Sum Subsets of Specified Cardinality.

Assume first that we are given a solution for the Alternating Partition instance, i.e., two index sets I and J. We create k equal sum subsets S1, . . . , Sk as follows: for i = 1, . . . , k − 2, we have Si = {bi, di,1, . . . , di,n−1}; for the remaining two subsets, we let u′i ∈ Sk−1 if i ∈ I, and v′j ∈ Sk−1 if j ∈ J, and we let u′j ∈ Sk if j ∈ J, and v′i ∈ Sk if i ∈ I. Clearly, all these sets have n elements, and their sum is ⟨S/2⟩ ∘ 1n.

Now assume we are given a solution for the k Equal Sum Subsets of Specified Cardinality instance, i.e., k equal sum subsets S1, . . . , Sk of cardinality n; in this case, all numbers participate in the sets Si, and the elements in each Si sum up to ⟨S/2⟩ ∘ 1n. Since the first digit of each bi equals S/2, we may assume w.l.o.g. that for each 1 ≤ i ≤ k − 2, set Si contains bi and does not contain any number with non-zero first digit (i.e., it does not contain any u′j or any v′j). Therefore, all u′i's and v′i's (and only these numbers) are in the remaining two subsets; this yields an alternating partition for the original instance, as u′i and v′i can never be in the same subset, since both have the (i + 1)-th digit non-zero. ⊓⊔

Since the problem k Equal Sum Subsets of Specified Cardinality is obviously in NP, we get the following:

Theorem 3. For any k ≥ 2, k Equal Sum Subsets of Specified Cardinality is NP-complete.

Remark: Note that the above reduction, and hence also the theorem, holds for the variation k Equal Sum Subsets of Equal Cardinality as well. This requires employing a method where additional extra digits are used in order to force the equal sum subsets to include all augmented numbers that correspond to numbers in the Alternating Partition instance; a similar method has been used in [8] to establish the NP-completeness of Two Equal Sum Subsets (called Equal-Subset-Sum there).

However, these problems are not strongly NP-complete for fixed constant k. We will now describe how to convert the dynamic programming algorithm of Section 2 to a dynamic programming algorithm for k Equal Sum Subsets of Specified Cardinality and for k Equal Sum Subsets of Equal Cardinality.
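As a reference point for this conversion, here is a compact sketch of the Section 2 dynamic program (Theorem 1); it stores the set of reachable sum-tuples explicitly instead of a boolean table, which is equivalent for deciding the problem, and the naming is ours:

```python
def k_equal_sum_subsets(a, k):
    """Decide k Equal Sum Subsets via the dynamic program of Section 2.

    A state is a sorted k-tuple (s_1, ..., s_k) of sums realizable by k
    disjoint subsets of the prefix processed so far; each a_i is either
    skipped or added to one of the k subsets.
    """
    cap = sum(a) // k  # k equal sums can each be at most S/k
    states = {(0,) * k}
    for ai in a:
        new_states = set(states)
        for st in states:
            for j in range(k):
                sj = st[j] + ai
                if sj <= cap:
                    new_states.add(tuple(sorted(st[:j] + (sj,) + st[j + 1:])))
        states = new_states
    return any(st[0] > 0 and all(x == st[0] for x in st) for st in states)

# Example: in {1, 2, 3, 4}, the sets {3} and {1, 2} have equal sum,
# but no three pairwise disjoint subsets share a common positive sum.
print(k_equal_sum_subsets([1, 2, 3, 4], 2))  # True
print(k_equal_sum_subsets([1, 2, 3, 4], 3))  # False
```

Keeping the tuples sorted collapses states that differ only by a permutation of the k subsets, which is harmless here since only the all-equal tuples matter at the end.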
It suffices to add to our variables k more dimensions corresponding to cardinalities of the subsets. We define boolean variables F (i, s1 , . . . , sk , c1 , . . . , ck ), where i ∈ {1, . . . , n}, sj ∈ {0, . . . , ⌊ Sk ⌋} for 1 ≤ j ≤ k, and cj ∈ {0, . . . , ⌊ nk ⌋} for 1 ≤ j ≤ k. Variable F (i, s1 , . . . , sk , c1 , . . . , ck ) will be TRUE if there are k disjoint subsets X1 , . . . , Xk ⊆ {a1 , . . . , ai } with sum(Xj ) = sj and the cardinality of Xj is cj , for 1 ≤ j ≤ k. There are k subsets of equal sum and equal cardinality c if and only if there exists a value s ∈ {1, . . . , ⌊ Sk ⌋} such that F (n, s, . . . , s, c, . . . , c) = TRUE. Also, there are k subsets of equal sum and equal (non-specified) cardinality if and only if there exists a value s ∈ {1, . . . , ⌊ Sk ⌋} and a value d ∈ {1, . . . , ⌊ nk ⌋} such that F (n, s, . . . , s, d, . . . , d) = TRUE. Clearly, F (1, s1 , . . . , sk , c1 , . . . , ck ) = TRUE if and only if either si = 0, ci = 0 for 1 ≤ i ≤ k, or there exists index j such that sj = a1 , cj = 1, and si = 0 and ci = 0 for all 1 ≤ i ≤ k, i = j. Each variable F (i, s1 , . . . , sk , c1 , . . . , ck ), for i ∈ {2, ..., n}, sj ∈ {0, . . . , ⌊ Sk ⌋}, and cj ∈ {0, . . . , ⌊ nk ⌋}, can be expressed recursively as follows: F (i, s1 , . . . , sk , c1 , . . . , ck ) = F (i − 1, s1 , . . . , sk , c1 , . . . , ck ) ∨  F (i − 1, s1 , . . . , sj − ai , . . . , sk , c1 , . . . , cj − 1, . . . , ck ). 1≤j≤k sj −ai ≥0 cj >0 k k+1 ·n The boolean value of all variables can be determined in time O( Sk2k−1 ), S k n k since there are n⌊ k ⌋ ⌊ k ⌋ variables, and computing each variable takes at most time O(k). This yields the following: Theorem 4. There is a dynamic programming algorithm that solves k Equal Sum Subsets of Specified Cardinality and k Equal Sum Subsets of k ·nk+1 ), Equal Cardinality for input A = {a1 , . . . , an } in running time O( Sk2k−1 where S = sum(A). For k = O(1) this algorithm runs in pseudo-polynomial time. 106 5 M. Cieliebak, S. Eidenbenz, and A. Pagourtzis Adding Exclusion Constraints In this section we study the problem k Equal Sum Subsets with Exclusions where we are additionally given an exclusion graph (or its complement: a preference graph) and ask for teams that take this graph into account. Obviously, k Equal Sum Subsets with Exclusions is NP-complete, since k Equal Sum Subsets (shown NP-complete in [3]) is the special case where the exclusion graph is empty (Eex = ∅). Here, we present a pseudo-polynomial algorithm for the case k = 2, using a dynamic programming approach similarin-spirit to the one used for finding two equal sum subsets (without exclusions) [1]. Let A = {a1 , . . . , an } and Gex = (A, Eex ) be an instance of k Equal Sum Subsets with Exclusions. We assume w.l.o.g. that the input values are orn dered, i.e., a1 ≤ . . . ≤ an . Let S = i=1 ai . We define boolean variables F (k, t) for k ∈ {1, . . . , n} and t ∈ {1, . . . , S}. Variable F (k, t) will be TRUE if there exists a set X ⊆ A such that X ⊆ {a1 , . . . , ak }, ak ∈ X, sum(X) = t, and X is independent in Gex . For a TRUE entry F (k, t) we store the corresponding set in a second variable X(k, t). We compute the value of all variables F (k, t) by iterating over t and k. The algorithm runs until it finds the smallest t ∈ {1, . . . , S} for which there are indices k, ℓ ∈ {1, . . . 
5 Adding Exclusion Constraints

In this section we study the problem k Equal Sum Subsets with Exclusions, where we are additionally given an exclusion graph (or its complement: a preference graph) and ask for teams that take this graph into account. Obviously, k Equal Sum Subsets with Exclusions is NP-complete, since k Equal Sum Subsets (shown NP-complete in [3]) is the special case where the exclusion graph is empty (E_ex = ∅). Here, we present a pseudo-polynomial algorithm for the case k = 2, using a dynamic programming approach similar in spirit to the one used for finding two equal sum subsets (without exclusions) [1].

Let A = {a_1, ..., a_n} and G_ex = (A, E_ex) be an instance of k Equal Sum Subsets with Exclusions. We assume w.l.o.g. that the input values are ordered, i.e., a_1 ≤ ... ≤ a_n. Let S = Σ_{i=1}^n a_i. We define boolean variables F(k, t) for k ∈ {1, ..., n} and t ∈ {1, ..., S}. Variable F(k, t) will be TRUE if there exists a set X ⊆ A such that X ⊆ {a_1, ..., a_k}, a_k ∈ X, sum(X) = t, and X is independent in G_ex. For a TRUE entry F(k, t) we store the corresponding set in a second variable X(k, t). We compute the value of all variables F(k, t) by iterating over t and k. The algorithm runs until it finds the smallest t ∈ {1, ..., S} for which there are indices k, ℓ ∈ {1, ..., n} such that F(k, t) = F(ℓ, t) = TRUE; in this case, the sets X(k, t) and X(ℓ, t) constitute a solution: sum(X(k, t)) = sum(X(ℓ, t)) = t, both sets are disjoint due to the minimality of t, and both sets are independent in G_ex.

We initialize the variables as follows. For all 1 ≤ k ≤ n, we set F(k, t) = FALSE for 1 ≤ t < a_k and for Σ_{i=1}^k a_i < t ≤ S; moreover, we set F(k, a_k) = TRUE and X(k, a_k) = {a_k}. Observe that these equations already define F(1, t) for 1 ≤ t ≤ S, and F(k, 1) for 1 ≤ k ≤ n.

After initialization, the table entries for k > 1 and a_k ≤ t ≤ Σ_{i=1}^k a_i can be computed recursively: F(k, t) is TRUE if there exists an index ℓ ∈ {1, ..., k − 1} such that F(ℓ, t − a_k) is TRUE and the subset X(ℓ, t − a_k) remains independent in G_ex when adding a_k. The recursive computation is as follows:

F(k, t) = ⋁_{ℓ=1}^{k−1} [ F(ℓ, t − a_k) ∧ ∀a ∈ X(ℓ, t − a_k): (a, a_k) ∉ E_ex ].

If F(k, t) is set to TRUE due to F(ℓ, t − a_k), then we set X(k, t) = X(ℓ, t − a_k) ∪ {a_k}.

The key observation for showing correctness is that for each F(k, t) considered by the algorithm there is at most one F(ℓ, t − a_k) that is TRUE, for 1 ≤ ℓ ≤ k − 1; if there were two, say ℓ_1, ℓ_2, then X(ℓ_1, t − a_k) and X(ℓ_2, t − a_k) would be a solution to the problem and the algorithm would have stopped earlier, a contradiction. This means that all subsets considered are constructed in a unique way, and therefore no information can be lost.

In order to determine the value F(k, t), the algorithm considers k − 1 table entries. As shown above, only one of them may be TRUE; for such an entry, say F(ℓ, t − a_k), the (at most ℓ) elements of X(ℓ, t − a_k) are checked to see if they exclude a_k. Hence, computation of F(k, t) takes time O(n), and the total time complexity of the algorithm is O(n² · S). Therefore, we have the following

Theorem 5. Two Equal Sum Subsets with Exclusions can be solved for input A = {a_1, ..., a_n} and G_ex = (A, E_ex) in pseudo-polynomial time O(n² · S), where S = sum(A).

Remarks: Observe that the problem k Equal Sum Subsets of Cardinality c with Exclusions, where the cardinality c is constant and an exclusion graph is given, can be solved by exhaustive search in time O(n^{kc}) in the same way as the problem k Equal Sum Subsets of Cardinality c is solved (see Section 4). Moreover, we can obtain a pseudo-polynomial time algorithm for k Equal Sum Subsets of Equal Cardinality with Exclusions, where the cardinality is part of the input, for k = 2, by modifying the dynamic programming algorithm for Two Equal Sum Subsets with Exclusions as follows. We introduce a further dimension in our table F, the cardinality, and set F(k, t, c) to TRUE if there is a set X with sum(X) = t (and all other conditions as before), and such that the cardinality of X equals c. Again, we can fill the table recursively, and we stop as soon as we find values k, ℓ ∈ {1, ..., n}, t ∈ {1, ..., S} and c ∈ {1, ..., n} such that F(k, t, c) = F(ℓ, t, c) = TRUE, which yields a solution. Notice that the corresponding two sets must be disjoint, since otherwise removing their intersection would yield two subsets of smaller equal cardinality that are independent in G_ex; thus, the algorithm, which constructs two sets of minimal cardinality, would have stopped earlier. Table F now has n² · S entries, thus we can solve Two Equal Sum Subsets of Equal Cardinality with Exclusions in time O(n³ · S).
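A compact sketch of the algorithm behind Theorem 5 (Python; the exclusion graph is assumed, as an illustrative choice, to be given as a set of frozenset index pairs). It returns index sets rather than value sets, so repeated input values are handled cleanly:

def two_equal_sum_subsets_with_exclusions(a, excluded):
    # Sketch of the O(n^2 * S) algorithm; a is sorted ascending, and
    # excluded is a set of frozenset({i, j}) index pairs (assumed format).
    n, S = len(a), sum(a)
    X = {}  # X[(k, t)]: independent index subset with largest index k and sum t
    for t in range(1, S + 1):
        hits = []
        for k in range(n):
            if a[k] == t:
                X[(k, t)] = frozenset([k])
            elif a[k] < t:
                for l in range(k):
                    prev = X.get((l, t - a[k]))
                    if prev is not None and all(frozenset((i, k)) not in excluded
                                                for i in prev):
                        X[(k, t)] = prev | frozenset([k])
                        break
            if (k, t) in X:
                hits.append(k)
            if len(hits) == 2:   # smallest t with two entries: disjoint by minimality
                return sorted(X[(hits[0], t)]), sorted(X[(hits[1], t)])
    return None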
Note that the above sketched algorithm does not work for specified cardinalities, because there may be exponentially many ways to construct a subset of the correct cardinality.

6 Conclusion – Open Problems

In this work we studied the problem k Equal Sum Subsets and some of its variations. We presented a pseudo-polynomial time algorithm for constant k, and proved strong NP-completeness for non-constant k, namely for the case in which we want to find n/q subsets of equal sum, where n is the cardinality of the input set and q a constant. We also gave pseudo-polynomial time algorithms for the k Equal Sum Subsets of Specified Cardinality problem and for the Two Equal Sum Subsets with Exclusions problem, as well as for variations of them.

Several questions remain open. Some of them are: determine the exact borderline between pseudo-polynomial time solvability and strong NP-completeness for k Equal Sum Subsets, for k being a function different from n/q, for example k = log^q n; find faster dynamic programming algorithms for k Equal Sum Subsets of Specified Cardinality; and, finally, determine the complexity of k Equal Sum Subsets with Exclusions, i.e., is it solvable in pseudo-polynomial time or strongly NP-complete?

Another promising direction is to investigate approximation versions related to the above problems, for example "given a set of numbers A, find k subsets of A with sums that are as similar as possible". For k = 2, the problem has been studied by Bazgan et al. [1] and Woeginger and Yu [8]; an FPTAS was presented in [1]. We would like to find out whether there is an FPTAS for any constant k. Finally, it would be interesting to study phase transitions of these problems with respect to their parameters, in a spirit similar to the work of Borgs, Chayes and Pittel [2], where they analyzed the phase transition of Two Equal Sum Subsets.

Acknowledgments. We would like to thank Peter Widmayer for several fruitful discussions and ideas in the context of this work.

References
1. C. Bazgan, M. Santha, and Zs. Tuza; Efficient approximation algorithms for the Subset-Sum Equality problem; Proc. ICALP'98, pp. 387–396.
2. C. Borgs, J.T. Chayes, and B. Pittel; Sharp Threshold and Scaling Window for the Integer Partitioning Problem; Proc. STOC'01, pp. 330–336.
3. M. Cieliebak, S. Eidenbenz, A. Pagourtzis, and K. Schlude; Equal Sum Subsets: Complexity of Variations; Technical Report 370, ETH Zürich, Department of Computer Science, 2003.
4. M. Cieliebak, S. Eidenbenz, and P. Penna; Noisy Data Make the Partial Digest Problem NP-hard; Technical Report 381, ETH Zürich, Department of Computer Science, 2002.
5. M.R. Garey and D.S. Johnson; Computers and Intractability: A Guide to the Theory of NP-completeness; Freeman, San Francisco, 1979.
6. R.M. Karp; Reducibility among combinatorial problems; in R.E. Miller and J.W. Thatcher (eds.), Complexity of Computer Computations, Plenum Press, New York, pp. 85–103, 1972.
7. S. Martello and P. Toth; Knapsack Problems; John Wiley & Sons, Chichester, 1990.
8. G.J. Woeginger and Z.L. Yu; On the equal-subset-sum problem; Information Processing Letters, 42(6), pp. 299–302, 1992.

Efficient Algorithms for GCD and Cubic Residuosity in the Ring of Eisenstein Integers⋆

Ivan Bjerre Damgård and Gudmund Skovbjerg Frandsen

BRICS⋆⋆, Department of Computer Science, University of Aarhus, Ny Munkegade, DK-8000 Aarhus C, Denmark
{ivan,gudmund}@daimi.au.dk

Abstract.
We present simple and efficient algorithms for computing gcd and cubic residuosity in the ring of Eisenstein integers, Z[ζ], i.e. the integers extended with ζ, a complex primitive third root of unity. The algorithms are similar and may be seen as generalisations of the binary integer gcd and derived Jacobi symbol algorithms. Our algorithms take time O(n²) for n bit input. This is an improvement over the known results based on the Euclidean algorithm, which take time O(n · M(n)), where M(n) denotes the complexity of multiplying n bit integers. The new algorithms have applications in practical primality tests and the implementation of cryptographic protocols.

1 Introduction

The Eisenstein integers, Z[ζ] = {a + bζ | a, b ∈ Z}, form the ring of integers extended with a complex primitive third root of unity, i.e. ζ is a root of x² + x + 1. Since the ring Z[ζ] is a unique factorisation domain, a greatest common divisor (gcd) of two numbers is well-defined (up to multiplication by a unit). The gcd of two numbers may be found using the classic Euclidean algorithm, since Z[ζ] is a Euclidean domain, i.e. there is a norm N(·) : Z[ζ] \ {0} → N such that for a, b ∈ Z[ζ] \ {0} there are q, r ∈ Z[ζ] such that a = qb + r with r = 0 or N(r) < N(b).

When a gcd algorithm is directly based on the Euclidean property, it requires a subroutine for division with remainder. For integers there is a very efficient alternative in the form of the binary gcd, which only requires addition/subtraction and division by two [12]. A corresponding Jacobi symbol algorithm has been analysed as well [11]. It turns out that there are natural generalisations of these binary algorithms over the integers to algorithms over the Eisenstein integers for computing the gcd and the cubic residuosity symbol. The role of 2 is taken by the number 1 − ζ, which is a prime of norm 3 in Z[ζ].

⋆ Partially supported by the IST Programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
⋆⋆ Basic Research in Computer Science, Centre of the Danish National Research Foundation.

We present and analyse these new algorithms. It turns out that they both have bit complexity O(n²), which is an improvement over the so far best known algorithms by Scheidler and Williams [8], Williams [16], and Williams and Holte [17]. Their algorithms have complexity O(n · M(n)), where M(n) is the complexity of integer multiplication, and the best upper bound on M(n) is O(n log n log log n) [10].

1.1 Related Work

The asymptotically fastest algorithm for integer gcd takes time O(n log n log log n) and is due to Schönhage [9]. There is a derived algorithm for the Jacobi symbol of complexity O(n(log n)² log log n). For practical input sizes the most efficient algorithms seem to be variants of the binary gcd and derived Jacobi symbol algorithms [11,7].

If ω_n is a complex primitive nth root of unity, say ω_n = e^{2πi/n}, then the ring Z[ω_n] is known to be norm-Euclidean for only finitely many n, and the smallest unresolved case is n = 17 [6,4]. Weilert has generalised both the "binary" and the asymptotically fast gcd algorithms to Z[ω_4] = Z[i], the ring of Gaussian integers [13,14]. In the latter case Weilert has also described a derived algorithm for computing the quartic residue symbol [15], and in all cases the complexity is identical to the complexity of the corresponding algorithm over Z.
Williams [16] and Williams and Holte [17] both describe algorithms for computing gcd and cubic residue symbols in Z[ω_3], the Eisenstein integers. Scheidler and Williams describe algorithms for computing gcd and nth power residue symbol in Z[ω_n] for n = 3, 5, 7 [8]. Their algorithms all have complexity O(n · M(n)), for M(n) being the complexity of integer multiplication.

Weilert suggests that his binary (i.e. (1 + i)-ary) gcd algorithm for the Gaussian integers may generalise to other norm-Euclidean rings of algebraic integers [13]. Our gcd algorithm for the Eisenstein integers was obtained independently, but it may nevertheless be seen as a confirmation of this suggestion in a specific case. It is an open problem whether the "binary" approach to gcd computation may be further generalised to Z[ω_5].

Weilert gives an algorithm for the quartic residue symbol that is derived from the asymptotically fast gcd algorithm over Z[i]. For practical purposes, however, it would be more interesting to have a version derived from the "binary" approach. In the last section of this paper, we sketch how one can obtain such an algorithm.

1.2 Applications

Our algorithms may be used for the efficient computation of cubic residuosity in other rings than Z[ζ] when using an appropriate homomorphism. As an example, consider the finite field GF(p) for a prime p ≡ 1 mod 3. A number z ∈ {1, ..., p − 1} is a cubic residue precisely when z^{(p−1)/3} ≡ 1 mod p, implying that (non)residuosity may be decided by a (slow) modular exponentiation. However, it is possible to decide cubic residuosity much faster provided we make some preprocessing depending only on p. The preprocessing consists in factoring p over Z[ζ], i.e. finding a prime π ∈ Z[ζ] such that p = ππ̄. A suitable π may be found as π = gcd(p, r − ζ), where r ∈ Z is constructed as a solution to the quadratic equation x² + x + 1 ≡ 0 mod p. Following this preprocessing, cubic residuosity of any z is decided using that z^{(p−1)/3} ≡ 1 mod p if and only if [z/π] = 1, where [·/·] denotes the cubic residuosity symbol.

When the order of the multiplicative group in question is unknown, modular exponentiation cannot be used, but it may still be possible to identify some nonresidues by computing residue symbols. In particular, the primality test of Damgård and Frandsen [2] uses our algorithms for finding cubic nonresidues in a more general ring. Computation of gcd and cubic residuosity is also used for the implementation of cryptosystems by Scheidler and Williams [8], and by Williams [16].

2 Preliminary Facts about Z[ζ]

Z[ζ] is the ring of integers extended with a primitive third root of unity ζ (a complex root of z² + z + 1). We will be using the following definitions and facts (see e.g. [3]).

Define the two conjugate mappings σ_i : Z[ζ] → Z[ζ] by σ_i(ζ) = ζ^i for i = 1, 2. The rational integer N(α) = σ_1(α)σ_2(α) ≥ 0 is called the norm of α ∈ Z[ζ], and N(a + bζ) = a² + b² − ab. (Note that σ_2(·) and N(·) coincide with complex conjugation and the complex norm, respectively.)

A unit in Z[ζ] is an element of norm 1. There are 6 units in Z[ζ]: ±1, ±ζ, ±ζ². Two elements α, β ∈ Z[ζ] are said to be associates if there exists a unit ε such that α = εβ. A prime π in Z[ζ] is a non-unit such that for any α, β ∈ Z[ζ], if π | αβ, then π | α or π | β. 1 − ζ is a prime in Z[ζ], and N(1 − ζ) = 3. A primary number has the form 1 + 3β for some β ∈ Z[ζ]. If α ∈ Z[ζ] is not divisible by 1 − ζ then α is associated to a primary number.
(The definition of primary seems to vary, in that some authors require the alternate forms ±1 + 3β [5] or −1 + 3β [3], but our definition is more convenient in the present context.) A simple computation reveals that the norm of a primary number has residue 1 modulo 3, and since the norm is a multiplicative homomorphism, it follows that every α ∈ Z[ζ] that is not divisible by 1 − ζ has N(α) ≡ 1 (mod 3).

3 Computing GCD in Z[ζ]

It turns out that the well-known binary integer gcd algorithm has a natural generalisation to a gcd algorithm for the Eisenstein integers. The generalised algorithm is best understood by relating it to the binary algorithm in a nonstandard version. The authors are not aware of any description of the latter in the literature (for the standard version see e.g. [1]).

A slightly nonstandard version of the binary gcd is the following. Every integer can be represented as (−1)^i · 2^j · (4m + 1), where i ∈ {0, 1}, j ≥ 0 and m ∈ Z. Without loss of generality, we may therefore assume that the numbers in question are of the form 4m + 1. One iteration consists in replacing the numerically larger of the two numbers by their difference. If it is nonzero, then the dividing 2-power (at least 2²) may be removed without changing the gcd. If necessary, the resulting odd number is multiplied by −1 to get a number of the form 4m + 1, and we are ready for the next iteration. It is fairly obvious that the product of the numeric values of the two numbers decreases by a factor at least 2 in each step until the gcd is found, and hence the gcd of two numbers a, b can be computed in time O(log² |ab|).
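A minimal sketch of this nonstandard binary gcd (Python, illustrative only; inputs are assumed odd, so any shared 2-power has already been divided out):

def gcd_4m_plus_1(a, b):
    # Nonstandard binary gcd: keep both numbers in the form 4m + 1.
    def normalize(x):
        while x % 2 == 0:                # remove the dividing 2-power
            x //= 2
        return x if x % 4 == 1 else -x   # multiply by -1 if needed
    a, b = normalize(a), normalize(b)
    while a != b:
        if abs(a) < abs(b):
            a, b = b, a
        a = normalize(a - b)             # a - b is divisible by at least 2^2
    return abs(a)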
To make the analogue, we recall that any element of Z[ζ] that is not divisible by 1 − ζ is associated to a (unique) primary number, i.e. a number of the form 1 + 3α. This implies that any element in Z[ζ] \ {0} has a (unique) representation of the form (−ζ)^i · (1 − ζ)^j · (1 + 3α), where 0 ≤ i < 6, 0 ≤ j and α ∈ Z[ζ]. In addition, the difference of two primary numbers is divisible by (1 − ζ)², since 3 = −ζ²(1 − ζ)².

Now a gcd algorithm for the Eisenstein integers may be formulated as an analogue to the binary integer gcd algorithm. We may assume without loss of generality that the two input numbers are primary. Replace the (normwise) larger of the two numbers with their difference. If it is nonzero, we may divide out any powers of 1 − ζ that divide the difference (at least (1 − ζ)²) and convert the remaining factor to primary form by multiplying with a unit. We have again two primary numbers, and the process may be continued. In each step we are required to identify the (normwise) larger of the two numbers. Unfortunately, it would be too costly to compute the relevant norm, but it suffices to choose the larger number based on an approximation that we can afford to compute. By a slightly nontrivial argument one may prove that the product of the norms of the two numbers decreases by a factor at least 2 in each step until the gcd is found, and hence the gcd of two numbers α, β can be computed in time O(log² N(αβ)). Algorithm 1 describes the details, including a start-up to bring the two numbers on primary form.

Theorem 1. Algorithm 1 takes time O(log² N(αβ)) to compute the gcd of α, β, or, formulated alternatively, the algorithm has bit complexity O(n²).

Algorithm 1 Compute gcd in Z[ζ]
Require: α, β ∈ Z[ζ] \ {0}
Ensure: g = gcd(α, β)
1: Let primary γ, δ ∈ Z[ζ] be defined by α = (−ζ)^{i_1} · (1 − ζ)^{j_1} · γ and β = (−ζ)^{i_2} · (1 − ζ)^{j_2} · δ.
2: g ← (1 − ζ)^{min{j_1, j_2}}
3: Replace α, β with γ, δ.
4: while α ≠ β do
5:   LOOP INVARIANT: α, β are primary
6:   Let primary γ be defined by α − β = (−ζ)^i · (1 − ζ)^j · γ
7:   Replace the "approximately" larger of α, β with γ.
8: end while
9: g ← g · α

Proof. Let us assume that a number α = a + bζ ∈ Z[ζ] is represented by the integer pair (a, b). Observe that since N(α) = a² + b² − ab, we have that log |a| + log |b| ≤ log N(α) ≤ 2(log |a| + log |b|) for a, b ≠ 0, i.e. the logarithm of the norm is proportional to the number of bits in the representation of a number. We may do addition and subtraction on general numbers and multiplication by units in linear time. Since (1 − ζ)^{−1} = (2 + ζ)/3, division by (and check for divisibility by) 1 − ζ may also be done in linear time.

Clearly, the start-up part of the algorithm that brings the two numbers on primary form can be done in time O(log² N(αβ)). Hence, we need only worry about the while loop. We want to prove that the norm of the numbers decreases in each iteration. The challenge is to see that forming the number α − β does not increase the norm too much. In fact N(α − β) ≤ 4 · max{N(α), N(β)}. This follows trivially from the fact that the norm is non-negative, combined with the equation N(α + β) + N(α − β) = 2(N(α) + N(β)), which may be proven by an elementary computation. Hence, for the γ computed in the loop of the algorithm, we get N(γ) = 3^{−j} N(α − β) ≤ 3^{−2} · 4 · max{N(α), N(β)}.

In each iteration, γ ideally replaces the one of α and β with the larger norm. However, we cannot afford to actually compute the norms to find out which one is the larger. Fortunately, by Lemma 1, it is possible in linear time to compute an approximate norm that may be slightly smaller than the exact norm, namely up to a factor 9/8. When γ replaces the one of α and β with the larger approximate norm, we know that N(αβ) decreases by a factor at least 9/4 · 8/9 = 2 in each iteration, i.e. the total number of iterations is O(log N(αβ)).

Each loop iteration takes time O(log N(αβ)), except possibly for finding the exponent of 1 − ζ that divides α − β. Assume that (1 − ζ)^{t_i} is the maximal power of 1 − ζ that divides α − β in the ith iteration. Then the combined time complexity of all loop iterations is O((Σ_i t_i) · log N(αβ)). We also know that the norm decreases by a factor at least 3^{t_i − 2} · 2 in the ith iteration, i.e. Π_i (3^{t_i − 2} · 2) ≤ N(αβ). Since there are only O(log N(αβ)) iterations, it follows that Π_i 3^{t_i} ≤ (9/2)^{O(log N(αβ))} · N(αβ), and hence Σ_i t_i = O(log N(αβ)).

Lemma 1. Given α = a + bζ ∈ Z[ζ], it is possible to compute an approximate norm Ñ(α) such that

(8/9) · N(α) ≤ Ñ(α) ≤ N(α)

in linear time, i.e. in time O(log N(α)).

Proof. Note that N(a + bζ) = ((a − b)² + a² + b²)/2. Given ε > 0, we let d̃ denote some approximation to the integer d satisfying (1 − ε)|d| ≤ d̃ ≤ |d|. Applying such approximations to each of t = a − b, a and b, we get

(1 − ε)² · N(a + bζ) ≤ (t̃² + ã² + b̃²)/2 ≤ N(a + bζ).

Since we may compute a − b in linear time, it suffices to compute ˜-approximations and square them in linear time for some ε < 1/18 (so that (1 − ε)² > 8/9). Given d in the usual binary representation, we take d̃ to be |d| with all but the 6 most significant bits replaced by zeroes, in which case (1 − 1/32)|d| ≤ d̃ ≤ |d|, and we can compute d̃² from d in linear time.
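A compact functional sketch of Algorithm 1 (Python; elements of Z[ζ] are pairs (a, b) representing a + bζ, and, for simplicity, exact norms are used where the paper uses the approximate norm Ñ). It returns (j, g) such that gcd(α, β) = (1 − ζ)^j · g up to units:

def eisenstein_gcd(x, y):
    # (1 - zeta)-ary gcd; x, y are pairs (a, b) = a + b*zeta, both nonzero.
    def norm(z):
        a, b = z
        return a * a + b * b - a * b
    def mul_zeta(z):                     # multiply by the unit zeta
        a, b = z
        return (-b, a - b)
    def div_one_minus_zeta(z):           # divide by 1 - zeta (when divisible)
        a, b = z
        return ((2 * a - b) // 3, (a + b) // 3)
    def primary_part(z):
        # write z = (unit) * (1 - zeta)^j * gamma with gamma primary (= 1 + 3*beta)
        j = 0
        while (z[0] + z[1]) % 3 == 0:    # z divisible by 1 - zeta
            z = div_one_minus_zeta(z)
            j += 1
        for _ in range(3):               # try the six associates +-zeta^k * z
            for w in (z, (-z[0], -z[1])):
                if w[0] % 3 == 1 and w[1] % 3 == 0:
                    return w, j
            z = mul_zeta(z)
        raise AssertionError("unreachable for z not divisible by 1 - zeta")
    (x, jx), (y, jy) = primary_part(x), primary_part(y)
    j = min(jx, jy)
    while x != y:
        d = (x[0] - y[0], x[1] - y[1])   # divisible by (1 - zeta)^2
        g, _ = primary_part(d)
        if norm(x) >= norm(y):           # the paper uses the linear-time approximate norm here
            x = g
        else:
            y = g
    return j, x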
4 Computing Cubic Residuosity in Z[ζ]

Just as the usual integer gcd algorithms may be used for constructing algorithms for the Jacobi symbol, so can our earlier strategy for computing the gcd in Z[ζ] be used as the basis for an algorithm for computing the cubic residuosity symbol. We start by recalling the definition of the cubic residuosity symbol [·/·] : Z[ζ] × (Z[ζ] − (1 − ζ)Z[ζ]) → {0, 1, ζ, ζ^{−1}}:

– For a prime π ∈ Z[ζ] that is not associated to 1 − ζ:
  [α/π] = α^{(N(π)−1)/3} mod π
– For a number β = Π_{i=1}^t π_i^{m_i} ∈ Z[ζ] that is not divisible by 1 − ζ:
  [α/β] = Π_{i=1}^t [α/π_i]^{m_i}

Note that these rules imply [α/ε] = 1 for a unit ε, and [α/β] = 0 when gcd(α, β) ≠ 1. In addition, we will need the following laws satisfied by the cubic residuosity symbol (recall that β is primary when it has the form β = 1 + 3γ for γ ∈ Z[ζ]) [5]:

– Modularity: [α/β] = [α′/β], when α ≡ α′ (mod β).
– Multiplicativity: [αα′/β] = [α/β] · [α′/β].
– The cubic reciprocity law: [α/β] = [β/α], when α and β are both primary.
– The complementary laws (for primary β = 1 + 3(m + nζ), where m, n ∈ Z):
  [1 − ζ/β] = ζ^m,  [ζ/β] = ζ^{−(m+n)},  [−1/β] = 1.

The cubic residuosity algorithm will follow the gcd algorithm closely. In each iteration we will assume the two numbers α, β to be primary with Ñ(α) ≥ Ñ(β). We write their difference in the form α − β = (−ζ)^i (1 − ζ)^j γ for primary γ, and write β = 1 + 3(m + nζ). By the above laws, [α/β] = ζ^{mj − (m+n)i} [γ/β]. If, after replacing α with γ, we have Ñ(α) < Ñ(β), we use the reciprocity law to swap α and β before starting a new iteration. The algorithm stops when the two primary numbers are identical. If the identical value (the gcd) is not 1, then the residuosity symbol evaluates to 0. Algorithm 2 describes the entire procedure, including a start-up to ensure that the numbers are primary.

Algorithm 2 Compute cubic residuosity in Z[ζ]
Require: α, β ∈ Z[ζ] \ {0}, and β is not divisible by 1 − ζ
Ensure: c = [α/β]
1: Let primary γ, δ ∈ Z[ζ] be defined by α = (−ζ)^{i_1} · (1 − ζ)^{j_1} · γ and β = (−ζ)^{i_2} · δ.
2: Let m, n ∈ Z be defined by δ = 1 + 3m + 3nζ.
3: t ← m·j_1 − (m + n)·i_1 mod 3
4: Replace α, β by γ, δ.
5: If Ñ(α) < Ñ(β) then interchange α, β.
6: while α ≠ β do
7:   LOOP INVARIANT: α, β are primary and Ñ(α) ≥ Ñ(β)
8:   Let primary γ be defined by α − β = (−ζ)^i · (1 − ζ)^j · γ
9:   Let m, n ∈ Z be defined by β = 1 + 3m + 3nζ.
10:  t ← t + m·j − (m + n)·i mod 3
11:  Replace α with γ.
12:  If Ñ(α) < Ñ(β) then interchange α, β.
13: end while
14: If α ≠ 1 then c ← 0 else c ← ζ^t

Theorem 2. Algorithm 2 takes time O(log² N(αβ)) to compute [α/β], or, formulated alternatively, the algorithm has bit complexity O(n²).

Proof. The complexity analysis from the gcd algorithm carries over without essential changes.
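A sketch of Algorithm 2 in the same representation as the gcd sketch above (Python; exact norms again stand in for Ñ, and decompose extends primary_part by also returning the unit exponent i with z = (−ζ)^i (1 − ζ)^j γ). It returns the exponent t, so the symbol is ζ^t, or 0 when the gcd is not 1:

def cubic_residuosity(alpha, beta):
    # [alpha/beta] in Z[zeta]; pairs (a, b) = a + b*zeta, alpha nonzero.
    def norm(z):
        a, b = z
        return a * a + b * b - a * b
    def mul_minus_zeta(z):               # multiply by the unit -zeta
        a, b = z
        return (b, b - a)
    def div_one_minus_zeta(z):
        a, b = z
        return ((2 * a - b) // 3, (a + b) // 3)
    def decompose(z):
        # z = (-zeta)^i * (1 - zeta)^j * gamma with gamma primary
        j = 0
        while (z[0] + z[1]) % 3 == 0:
            z = div_one_minus_zeta(z)
            j += 1
        for k in range(6):               # (-zeta)^k * z for k = 0..5
            if z[0] % 3 == 1 and z[1] % 3 == 0:
                return (6 - k) % 6, j, z
            z = mul_minus_zeta(z)
        raise AssertionError("unreachable")
    i1, j1, alpha = decompose(alpha)
    i2, j2, beta = decompose(beta)
    assert j2 == 0, "beta must not be divisible by 1 - zeta"
    m, n = (beta[0] - 1) // 3, beta[1] // 3          # beta = 1 + 3m + 3n*zeta
    t = (m * j1 - (m + n) * i1) % 3
    if norm(alpha) < norm(beta):                     # reciprocity: both primary
        alpha, beta = beta, alpha
    while alpha != beta:
        d = (alpha[0] - beta[0], alpha[1] - beta[1])
        i, j, gamma = decompose(d)
        m, n = (beta[0] - 1) // 3, beta[1] // 3
        t = (t + m * j - (m + n) * i) % 3
        alpha = gamma
        if norm(alpha) < norm(beta):
            alpha, beta = beta, alpha
    return t if alpha == (1, 0) else 0               # zeta^t, or 0 if gcd != 1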
5 Computing GCD and Quartic Residuosity in the Ring of Gaussian Integers

We may construct fast algorithms for gcd and quartic residuosity in the ring of Gaussian integers, Z[i] = {a + bi | a, b ∈ Z}, in a completely analogous way to the algorithms over the Eisenstein integers. In the case of the gcd, this was essentially done by Weilert [13]. However, the case of the quartic residue symbol may be of independent interest, since such an algorithm is likely to be more efficient for practical input values than the asymptotically ultrafast algorithm [15].

Here is a sketch of the necessary facts (see [5]). There are 4 units in Z[i]: ±1, ±i. 1 + i is a prime in Z[i] and N(1 + i) = 2. A primary number has the form 1 + (2 + 2i)β for some β ∈ Z[i]. If α ∈ Z[i] is not divisible by 1 + i, then α is associated to a primary number. In particular, any element in Z[i] \ {0} has a (unique) representation of the form i^j · (1 + i)^k · (1 + (2 + 2i)α), where 0 ≤ j < 4, 0 ≤ k and α ∈ Z[i]. In addition, the difference of two primary numbers is divisible by (1 + i)³, since 2 + 2i = −i(1 + i)³.

This is the basis for obtaining an algorithm for computing gcd over the Gaussian integers analogous to Algorithm 1. This new algorithm also has bit complexity O(n²), as one may prove using that N((1 + i)³) = 8 and N(α − β) ≤ 4 · max{N(α), N(β)}.

For computing quartic residuosity, we need more facts [5]. If π is a prime in Z[i] and π is not associated to 1 + i, then N(π) ≡ 1 (mod 4), and the quartic residue symbol [·/·] : Z[i] × (Z[i] − (1 + i)Z[i]) → {0, 1, −1, i, −i} is defined as follows:

– For a prime π ∈ Z[i] that is not associated to 1 + i:
  [α/π] = α^{(N(π)−1)/4} mod π
– For a number β = Π_{j=1}^t π_j^{m_j} ∈ Z[i] that is not divisible by 1 + i:
  [α/β] = Π_{j=1}^t [α/π_j]^{m_j}

The quartic residuosity symbol satisfies in addition:

– Modularity: [α/β] = [α′/β], when α ≡ α′ (mod β).
– Multiplicativity: [αα′/β] = [α/β] · [α′/β].
– The quartic reciprocity law: [α/β] = [β/α] · (−1)^{((N(α)−1)/4) · ((N(β)−1)/4)}, when α and β are both primary.
– The complementary laws (for primary β = 1 + (2 + 2i)(m + ni), where m, n ∈ Z):
  [1 + i/β] = i^{−n−(n+m)²},  [i/β] = i^{n−m}.

This is the basis for obtaining an algorithm for computing quartic residuosity analogous to Algorithm 2. This new algorithm also has bit complexity O(n²).
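Under the same representation conventions as before, a sketch of the (1 + i)-ary normalisation that drives the Gaussian analogue of Algorithm 1 (Python; the primary test "a odd, b even, a + b ≡ 1 (mod 4)" is our reformulation of the form 1 + (2 + 2i)(m + ni)):

def gaussian_gcd(x, y):
    # (1 + i)-ary gcd in Z[i]; pairs (a, b) = a + b*i, both nonzero.
    # Returns (k, g) with gcd = (1 + i)^k * g up to units, g primary.
    def norm(z):
        a, b = z
        return a * a + b * b
    def mul_i(z):
        a, b = z
        return (-b, a)
    def div_one_plus_i(z):               # z / (1 + i) = z * (1 - i) / 2
        a, b = z
        return ((a + b) // 2, (b - a) // 2)
    def primary_part(z):
        k = 0
        while (z[0] + z[1]) % 2 == 0:    # z divisible by 1 + i
            z = div_one_plus_i(z)
            k += 1
        for _ in range(4):               # exactly one associate i^j * z is primary
            if z[0] % 2 == 1 and z[1] % 2 == 0 and (z[0] + z[1]) % 4 == 1:
                return z, k
            z = mul_i(z)
        raise AssertionError("unreachable")
    (x, kx), (y, ky) = primary_part(x), primary_part(y)
    k = min(kx, ky)
    while x != y:
        d = (x[0] - y[0], x[1] - y[1])   # divisible by (1 + i)^3
        g, _ = primary_part(d)
        if norm(x) >= norm(y):
            x = g
        else:
            y = g
    return k, x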
References
1. Eric Bach and Jeffrey Shallit. Algorithmic number theory. Vol. 1: Efficient algorithms. Foundations of Computing Series. MIT Press, Cambridge, MA, 1996.
2. Ivan B. Damgård and Gudmund Skovbjerg Frandsen. An extended quadratic Frobenius primality test with average and worst case error estimates. Research Series RS-03-9, BRICS, Department of Computer Science, University of Aarhus, February 2003. Extended abstract in these proceedings.
3. Kenneth Ireland and Michael Rosen. A classical introduction to modern number theory. Vol. 84 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 1990.
4. Franz Lemmermeyer. The Euclidean algorithm in algebraic number fields. Exposition. Math. 13(5) (1995), 385–416.
5. Franz Lemmermeyer. Reciprocity laws. From Euler to Eisenstein. Springer Monographs in Mathematics. Springer-Verlag, Berlin, 2000.
6. Hendrik W. Lenstra, Jr. Euclidean number fields. I. Math. Intelligencer 2(1) (1979/80), 6–15.
7. Shawna Meyer Eikenberry and Jonathan P. Sorenson. Efficient algorithms for computing the Jacobi symbol. J. Symbolic Comput. 26(4) (1998), 509–523.
8. Renate Scheidler and Hugh C. Williams. A public-key cryptosystem utilizing cyclotomic fields. Des. Codes Cryptogr. 6(2) (1995), 117–131.
9. A. Schönhage. Schnelle Berechnung von Kettenbruchentwicklungen. Acta Informat. 1 (1971), 139–144.
10. A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing (Arch. Elektron. Rechnen) 7 (1971), 281–292.
11. Jeffrey Shallit and Jonathan Sorenson. A binary algorithm for the Jacobi symbol. ACM SIGSAM Bull. 27(1) (1993), 4–11.
12. J. Stein. Computational problems associated with Racah algebra. J. Comput. Phys. 1 (1967), 397–405.
13. André Weilert. (1 + i)-ary GCD computation in Z[i] is an analogue to the binary GCD algorithm. J. Symbolic Comput. 30(5) (2000), 605–617.
14. André Weilert. Asymptotically fast GCD computation in Z[i]. In Algorithmic number theory (Leiden, 2000), Vol. 1838 of Lecture Notes in Comput. Sci., pp. 595–613. Springer, Berlin, 2000.
15. André Weilert. Fast computation of the biquadratic residue symbol. J. Number Theory 96(1) (2002), 133–151.
16. H. C. Williams. An M³ public-key encryption scheme. In Advances in cryptology – CRYPTO '85 (Santa Barbara, Calif., 1985), Vol. 218 of Lecture Notes in Comput. Sci., pp. 358–368. Springer, Berlin, 1986.
17. H. C. Williams and R. Holte. Computation of the solution of x³ + Dy³ = 1. Math. Comp. 31(139) (1977), 778–785.

An Extended Quadratic Frobenius Primality Test with Average and Worst Case Error Estimates⋆ ⋆⋆

Ivan Bjerre Damgård and Gudmund Skovbjerg Frandsen

BRICS⋆⋆⋆, Department of Computer Science, University of Aarhus
{ivan,gudmund}@daimi.au.dk

⋆ Partially supported by the IST Programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
⋆⋆ Full paper is available at http://www.brics.dk/RS/03/9/index.html
⋆⋆⋆ Basic Research in Computer Science, Centre of the Danish National Research Foundation.

Abstract. We present an Extended Quadratic Frobenius Primality Test (EQFT), which is related to and extends the Miller-Rabin test and the Quadratic Frobenius test (QFT) by Grantham. EQFT takes time about equivalent to 2 Miller-Rabin tests, but has much smaller error probability, namely 256/331776^t for t iterations of the test in the worst case. We give bounds on the average-case behaviour of the test: consider the algorithm that repeatedly chooses random odd k bit numbers, subjects them to t iterations of our test and outputs the first one found that passes all tests. We obtain numeric upper bounds for the error probability of this algorithm as well as a general closed expression bounding the error. For instance, it is at most 2^{−143} for k = 500, t = 2. Compared to earlier similar results for the Miller-Rabin test, the results indicate that our test in the average case has the effect of 9 Miller-Rabin tests, while only taking time equivalent to about 2 such tests. We also give bounds for the error in case a prime is sought by incremental search from a random starting point.

1 Introduction

Efficient methods for primality testing are important, in theory as well as in practice. Tests that always return correct results exist, see for instance [1], but all known tests of this type are only of theoretical interest, because they are much too inefficient to be useful in practice. In contrast, tests that accept composite numbers with bounded probability are typically much more efficient. This paper presents and analyses one such test.

Primality tests are used, for instance, in public-key cryptography, where efficient methods for generating large, random primes are indispensable tools. Here, it is important to know how the test behaves in the average case. But there are also scenarios (e.g., in connection with Diffie-Hellman key exchange) where one needs to test if a number n is prime and where n may have been chosen by an adversary. Here the worst case performance of the test is important.

Virtually all known probabilistic tests are built on the same basic principle: from the input number n, one defines an Abelian group and then tests whether the group structure we would expect to see if n were prime is actually present.
The well-known Miller-Rabin test uses the group Z_n^* in exactly this way. A natural alternative is to try a quadratic extension of Z_n, that is, we look at the ring Z_n[x]/(f(x)), where f(x) is a degree 2 polynomial chosen such that it is guaranteed to be irreducible if n is prime. In that case the ring is isomorphic to the finite field with n² elements, GF(n²). This approach was used successfully by Grantham [6], who proposed the Quadratic Frobenius Test (QFT) and showed that it accepts a composite with probability at most 1/7710, i.e. a better bound than may be achieved using 6 independent Miller-Rabin tests, while asymptotically taking time approximately equivalent to only 3 such tests. Müller proposes a different approach based on computation of square roots, the MQFT [7,8], which takes the same time as QFT and has error probability essentially 1/131040 (the test and analysis results differ a bit depending on whether the input is 3 or 1 modulo 4, see [7,8] for details).

Just as for the Miller-Rabin test, however, it seems that most composites would be accepted with probability much smaller than the worst-case numbers. A precise result quantifying this intuition would allow us to give better results on the average case behaviour of the test, i.e., when it is used to test numbers chosen at random, say, from some interval. Such an analysis has been done by Damgård, Landrock and Pomerance for the Miller-Rabin test, but no corresponding result for QFT or MQFT is known.

In this paper, we propose a new test that can be seen as an extension of QFT. We call this the Extended Quadratic Frobenius test (EQFT). EQFT comes in two variants: EQFTac, which works well in an average case analysis, and EQFTwc, which is better for applications where the worst case behavior is important.

For the average case analysis: consider an algorithm that repeatedly chooses random odd k-bit numbers, subjects each number to t iterations of EQFTac, and outputs the first number found that passes all t tests. Under the ERH, each iteration takes expected time equivalent to about 2 Miller-Rabin tests, or 2/3 of the time for QFT/MQFT (the ERH is only used to bound the run time and does not affect the error probability). Let q_{k,t} be the probability that a composite is output. We derive numeric upper bounds for q_{k,t}, e.g., we show q_{500,2} ≤ 2^{−143}, and also show a general upper bound, namely that for 2 ≤ t ≤ k − 1, q_{k,t} is O(k^{3/2} 2^{(σ_t+1)t} t^{−1/2} 4^{−√(2σ_t t k)}) with an easily computable big-O constant, where σ_t = log₂ 24 − 2/t. Comparison to the similar analysis by Damgård et al. for the MR test indicates that for t ≥ 2, our test in the average case roughly speaking has the effect of 9 Miller-Rabin tests, while only taking time equivalent to 2 such tests. We also analyze the error probability when a random k-bit prime is instead generated using incremental search from a random starting point, still using (up to) t iterations of our test to distinguish primes from composites.

Concerning worst case analysis, we show that t iterations of EQFTwc err with probability at most 256/331776^t except for an explicit finite set of numbers (the bound applies whenever n has no prime factors less than 118, or n ≥ 2^{42}). The same worst case error probability can be shown for EQFTac, but this variant is up to 4 times slower on worst case inputs than in the average case, namely on numbers n where very large powers of 2 and 3 divide n² − 1. For EQFTwc, on the other hand, t iterations take time equivalent to about 2t + 2 MR tests on all inputs (still assuming ERH).
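For reference, the Miller-Rabin baseline against which all running times above are measured; a standard sketch (Python):

import random

def miller_rabin(n, t=1):
    # Standard Miller-Rabin test: t independent iterations.
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    u, q = 0, n - 1
    while q % 2 == 0:
        u, q = u + 1, q // 2          # n - 1 = 2^u * q with q odd
    for _ in range(t):
        z = random.randrange(2, n - 1)
        x = pow(z, q, n)
        if x in (1, n - 1):
            continue
        for _ in range(u - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False              # certainly composite
    return True                       # probable prime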
For comparison with QFT/MQFT, assume that we are willing to spend the same fixed amount of time testing an input number. Then EQFTwc gives asymptotically a better bound on the error probability: using time approximately corresponding to 6t Miller-Rabin tests, we get error probability 1/7710^{2t} ≈ 1/19.8^{6t} using QFT, 1/131040^{2t} ≈ 1/50.8^{6t} using MQFT, and 256/331776^{3t−1} ≈ 1/576^{6t} using EQFTwc.

2 The Intuition behind EQFT

2.1 A Simple Initial Idea

Given the number n to be tested, we start by constructing a quadratic extension Z_n[X]/(f(X)), which is kept fixed during the entire test (across all iterations). We let H be the multiplicative group in this extension ring. If n is prime, the quadratic extension is a field, and so H is cyclic of order n² − 1. We may of course assume that n is not divisible by 2 or 3, which implies that n² − 1 is always divisible by 24. Let H_24 be the subgroup of elements of order dividing 24. If H is cyclic, then clearly |H_24| = 24. On the other hand, if n is not prime, H is the direct product of a number of subgroups, one for each distinct prime factor in n, and we may have |H_24| ≫ 24.

Now, suppose we are already given an element r ∈ H of order 24. Then a very simple approach to a primality test could be the following: Choose a random element z in H, and verify that z^n = z̄, where z̄ refers to the standard conjugate (explained later). This implies z^{n²−1} = 1 for any invertible z, and so is similar to the classical Fermat test. It is, however, in general a much stronger test than just checking the order of z. Then, from z, construct an element z′ chosen from H_24 with some "suitable" distribution. For this intuitive explanation, just think of z′ as being uniform in H_24. Now check that z′ ∈ ⟨r⟩, i.e. is a power of r. This must be the case if n is prime, but may fail if n is composite. This is similar to the part of the MR test that checks for existence of elements of order 2 different from −1.

To estimate the error probability, let ω be the number of distinct prime factors in n. Since H is the direct product of ω subgroups, H_24 is typically of order 24^ω (it may be smaller, but then the Fermat-like part of the test is stronger than otherwise, so we only consider the maximal case in this section). As one might then expect, it can be shown that the error probability of the test is at most 24/24^ω times the probability that z^n = z̄. The factor 24^{1−ω} corresponds to the factor of 2^{1−ω} one obtains for the MR test.

2.2 Some Problems and Two Ways to Solve Them

It is not clear how to construct an element of order 24 (if it exists at all), and we have not specified how to construct z′ from z. We present two different approaches to these problems.

EQFTwc. In this approach, we run a start-up procedure that may discover that n is composite. But if not, it constructs an element of order 24 and also guarantees that H contains ω distinct subgroups, each of order divisible by 2^u 3^v, where 2^u, 3^v are the maximal 2- and 3-powers dividing n² − 1. This procedure runs in expected time O(1) Miller-Rabin tests. Details on the idea behind it are given in Section 5. Having run the start-up procedure, we construct z′ as z′ = z^{(n²−1)/24}. Note that without the condition on the subgroups of H, we could have z′ = 1 always, which would clearly be bad. Each z can be tested in time approximately 2 MR tests, for any n. This leads to the test we call EQFTwc (since it works well in a worst case analysis).
EQFTac. The other approach avoids spending time on the start-up. This comes at the cost that the test becomes slower on n's where u, v are very large. But this only affects a small fraction of the potential inputs and is not important when testing randomly chosen n, since then the expected values of u, v are constant. The basic idea is the following: we start choosing random z's immediately, and instead of trying to produce an element in H_24 from z, we look separately for an element of order dividing 3 and one of order dividing 8. For order 3, we compute z^{(n²−1)/3^v} and repeatedly cube this value at most v times. This is guaranteed to produce an element of order 3, if 3 divides the order of z. If we already know an element ξ_3 of order 3, we can check that the new element we produce is in the group generated by ξ_3, and if not, n is composite. Of course, we do not know an element of order 3 from the start, but note that the computations we do on each z may produce such an element. So if we do several iterations of the test, as soon as an iteration produces an element of order 3, this can be used as ξ_3 by subsequent iterations. A similar idea can be applied to elements of order 8.

This leads to a test of strength comparable to EQFTwc, except for one problem: the iterations we do before finding elements of the right order may have larger error probability than the others. This can be compensated for by a number of further tricks: rather than choosing z uniformly, we require that N(z) has Jacobi symbol 1, where N() is a fixed homomorphism from H to Z_n^* defined below. This means we can expect z to have order a factor 2 smaller than otherwise (this also means that we should look for an element ξ_4 of order 4, and not 8, in the part of the test that produces elements of order a 2-power), and this turns out to improve the error probability of the Fermat-like part of the test by a factor of 2^{1−ω}. Moreover, some partial testing of the elements we produce is always possible: for instance, we know n is composite if we see an element of order 2 different from −1. These tricks imply that the test, up to
A final comment relates to the comparison in running times between MillerRabin, Grantham’s and our test. Using the standard way to state running times in the literature, the Miller-Rabin, resp. Grantham’s, resp. our test run in time log n+o(log n) resp. 3 log n+o(log n) resp. 2 log n+o(log n)) multiplications in Zn . However, the running time of Miller-Rabin is actually log n squarings +o(log n) multiplications in Zn , while the 3 log n (2 log n) multiplications mentioned for the other tests are a mix of squarings and multiplications. So we should also compare the times for modular multiplications and squarings. On a standard, say, 32 bit architecture, a modular multiplication takes time about 1.25 times that of a modular squaring if the numbers involved are very large. However, if we use the fastest known modular multiplication method (which is Montgomery’s in this case, where n stays constant over many multiplications), the factor is smaller for numbers in the range of practical interest. Concrete measurements using highly optimized C code shows that it is between 1 and 1.08 for numbers of length 500-1000 bits. Finally, when using dedicated hardware the factor is exactly 1 in most cases. So we conclude that the comparisons we stated are quite accurate also for practical purposes. 2.4 The Ring R(n, c) and EQFTac Definition 1. Let n be an odd integer and let c be a unit modulo n. Let R(n, c) denote the ring Z[x]/(n, x2 − c). More concretely, an element z ∈ R(n, c) can be thought of as a degree 1 polynomial z = ax + b, where a, b ∈ Zn , and arithmetic on polynomials is modulo x2 − c where coefficients are computed on modulo n. An Extended Quadratic Frobenius Primality Test 123 Let p be an odd prime. If c is not a square modulo p, i.e. (c/p) = −1, then the polynomial x2 − c is irreducible modulo p and R(p, c) is isomorphic to GF (p2 ). Definition 2. Define the following multiplicative homomorphisms on R(n, c) (assume z = ax + b): · : R(n, c) → R(n, c), z = −ax + b N (·) : R(n, c) → Zn , (1) 2 2 N (z) = z · z = b − ca (2) and define the map (·/·) : Z × Z → {−1, 0, 1} to be the Jacobi symbol. The maps · and N (·) are both multiplicative homomorphisms whether n is composite or n is a prime. The primality test will be based on some additional properties that are satisfied when p is a prime and (c/p) = −1, in which case R(p, c) ≃ GF (p2 ): Frobenius property / generalised Fermat property: Conjugation, z → z, is a field automorphism on GF (p2 ). In characteristic p, the Frobenius map that raises to the p’th power is also an automorphism, using this it follows easily that z = zp (3) Quadratic residue property / generalised Solovay-Strassen property: The norm, z → N (z), is a surjective multiplicative homomorphism from GF (p2 ) to the subfield GF (p). As such the norm maps squares to squares and non-squares to non-squares, it follows from the definition of the norm and (3) that 2 z (p −1)/2 = N (z)(p−1)/2 = (N (z)/p) (4) 4’th-root-of-1-test / generalised Miller-Rabin property: Since GF (p2 ) is a field there are only four possible 4th roots of 1 namely 1, −1 and ξ4 , −ξ4 , the two roots of the cyclotomic polynomial Φ4 (x) = x2 + 1. In particular, this implies for p2 − 1 = 2u 3v q where (q, 6) = 1 that if z ∈ GF (p2 ) \ {0} is a square then z3 v q = ±1, or z 2 i v 3 q = ±ξ4 for some i = 0, . . . , u − 3 (5) 3’rd-root-of-1-test: Since GF (p2 ) is a field there is only three possible 3rd roots of 1 namely 1 and ξ3 , ξ3−1 , the two roots of the cyclotomic polynomial Φ3 (x) = x2 + x + 1. 
In particular, this implies for p² − 1 = 2^u 3^v q, where (q, 6) = 1, that if z ∈ GF(p²) \ {0} then

z^{2^u q} = 1, or z^{2^u 3^i q} = ξ_3^{±1} for some i = 0, ..., v − 1  (6)

The actual test has two parts (see Algorithm 1). In the first part, a specific quadratic extension is chosen, i.e. R(n, c) for an explicit c. In the second part, the above properties of R(n, c) are tested for a random choice of z. When EQFTac is run several times on the same n, only the second part is executed multiple times. The second part receives two extra inputs, a 3rd and a 4th root of 1. On the first execution of the second part these are both 1. During later executions of the second part, some nontrivial roots are possibly constructed. If so, they are transferred to all subsequent executions of the second part.

Algorithm 1 Extended Quadratic Frobenius Test (EQFTac).
First part (construct quadratic extension):
Require: input is an odd number n ≥ 13
Ensure: output is "composite" or c satisfying (c/n) = −1
1: if n is divisible by a prime less than 13 return "composite"
2: if n is a perfect square return "composite"
3: choose a small c with (c/n) = −1; return c
Second part (make actual test):
Require: input is n, c, r_3, r_4, where n ≥ 5 is not divisible by 2 or 3, (c/n) = −1, r_3 ∈ {1} ∪ {ξ ∈ R(n, c) | Φ_3(ξ) = 0} and r_4 ∈ {1, −1} ∪ {ξ ∈ R(n, c) | Φ_4(ξ) = 0}. Let u, v be defined by n² − 1 = 2^u 3^v q for (q, 6) = 1.
Ensure: output is "composite", or "probable prime", s_3, s_4, where s_3 ∈ {1} ∪ {ξ ∈ R(n, c) | Φ_3(ξ) = 0} and s_4 ∈ {1, −1} ∪ {ξ ∈ R(n, c) | Φ_4(ξ) = 0}
4: select a random z ∈ R(n, c)^* with (N(z)/n) = 1
5: if z̄ ≠ z^n or z^{(n²−1)/2} ≠ 1 return "composite"
6: if z^{3^v q} ≠ 1 and z^{2^i 3^v q} ≠ −1 for all i = 0, ..., u − 2 return "composite"
7: if we found i_0 ≥ 1 with z^{2^{i_0} 3^v q} = −1 (there can be at most one such value) then let R_4(z) = z^{2^{i_0 − 1} 3^v q}; else let R_4(z) = z^{3^v q} (= ±1);
   if (r_4 ≠ ±1 and R_4(z) ∉ {±1, ±r_4}) return "composite"
8: if z^{2^u q} ≠ 1 and Φ_3(z^{2^u 3^i q}) ≠ 0 for all i = 0, ..., v − 1 return "composite"
9: if we found i_0 ≥ 0 with Φ_3(z^{2^u 3^{i_0} q}) = 0 (there can be at most one such value) then let R_3(z) = z^{2^u 3^{i_0} q}; else let R_3(z) = 1;
   if (r_3 ≠ 1 and R_3(z) ∉ {1, r_3^{±1}}) return "composite"
10: if r_3 = 1 and R_3(z) ≠ 1 then let s_3 = R_3(z), else let s_3 = r_3;
    if r_4 = ±1 and R_4(z) ≠ ±1 then let s_4 = R_4(z), else let s_4 = r_4;
    return "probable prime", s_3, s_4

Here follow some more detailed comments on Algorithm 1:
Line 1 ensures that 24 | n² − 1. In addition, we will use that n has no small prime factors in the later error analysis.
Line 2 of the algorithm is necessary, since no c with (c/n) = −1 exists when n is a perfect square.
Line 3 of the algorithm ensures that R(n, c) ≃ GF(n²) when n is a prime. Lemma 2 defines more precisely what "small" means.
Line 4 makes sure that z is a square, when n is a prime.
Line 5 checks equations (3) and (4), the latter in accordance with the condition enforced in line 4.
Line 6 checks equation (5) to the extent possible without having knowledge of ξ_4, a primitive 4th root of 1.
Line 7f continues the check of equation (5) by using any ξ_4 given on the input.
Line 8 checks equation (6) to the extent possible without having knowledge of ξ_3, a primitive 3rd root of 1.
Line 9f continues the check of equation (6) by using any ξ_3 given on the input.
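A sketch of the underlying ring arithmetic (Python, illustrative names; elements of R(n, c) are pairs (a, b) for ax + b). The multiplication uses 3 large modular multiplications plus multiplications by the small constant c, matching Lemma 1 in the next subsection; rpow is plain square-and-multiply:

def rmul(z, w, n, c):
    # Product in R(n, c) = Z[x]/(n, x^2 - c) with 3 big multiplications.
    (az, bz), (aw, bw) = z, w
    m1, m2 = az * bw % n, bz * aw % n
    a = (m1 + m2) % n
    b = ((c * az + bz) * (aw + bw) - (c * m1 + m2)) % n
    return (a, b)

def rconj(z, n):                      # conjugate: ax + b -> -ax + b
    return (-z[0] % n, z[1])

def rnorm(z, n, c):                   # N(z) = b^2 - c*a^2 in Z_n
    return (z[1] * z[1] - c * z[0] * z[0]) % n

def rpow(z, e, n, c):                 # z^e by square-and-multiply
    r = (0, 1)
    while e:
        if e & 1:
            r = rmul(r, z, n, c)
        z = rmul(z, z, n, c)
        e >>= 1
    return r

# e.g. the Frobenius check of line 5: rpow(z, n, n, c) == rconj(z, n)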
2.5 Implementation of the Test

High powers of elements in R(n, c) may be computed efficiently when c is (numerically) small. Represent z ∈ R(n, c) in the natural way by (A_z, B_z) ∈ Z_n × Z_n, i.e. z = A_z x + B_z.

Lemma 1. Let z, w ∈ R(n, c):
1. z · w may be computed from z and w using 3 multiplications and O(log c) additions in Z_n.
2. z² may be computed from z using 2 multiplications and O(log c) additions in Z_n.

Proof. For 1, we use the equations A_{zw} = m_1 + m_2 and B_{zw} = (cA_z + B_z)(A_w + B_w) − (cm_1 + m_2), with m_1 = A_z B_w and m_2 = B_z A_w. For 2, we need only observe that in the proof of 1, z = w implies that m_1 = m_2.

We also need to argue that it is easy to find a small c with (c/n) = −1. One may note that if n ≡ 3 mod 4, then c = −1 can always be used, and if n ≡ 5 mod 8, then c = 2 will work. In general, we have the following:

Lemma 2. Let n be an odd composite number that is not a perfect square. Let π_−(x, n) denote the number of primes p ≤ x such that (p/n) = −1, and, as usual, let π(x) denote the total number of primes p ≤ x. Assuming the Extended Riemann Hypothesis (ERH), there exists a constant C (independent of n) such that

π_−(x, n)/π(x) > 1/3 for all x ≥ C(log n log log n)²

Proof. We refer to the full paper for the proof, which is based on [2, Th. 8.4.6].

Theorem 1. Let n be a number that is not divisible by 2 or 3, and let u ≥ 3 and v ≥ 1 be maximal such that n² − 1 = 2^u 3^v q. There is an implementation of Algorithm 1 that on input n takes expected time equivalent to 2 log n + O(u + v) + o(log n) multiplications in Z_n, when assuming the ERH.

Remark 1. We can only prove a bound on the expected time, due to the random selection of an element z (in line 4) having a property that is only satisfied by half the elements, and to the selection of a suitable c (line 3), where at least a third of the candidates are usable. Although there is in principle no bound on the maximal time needed, the variance around the expectation is small, because the probability of failing to find a useful z and c drops exponentially with the number of attempts. We emphasize that the ERH is only used to bound the running time (of line 3) and does not affect the error probability, as is the case with the original Miller test.

The detailed implementation of Algorithm 1 may be optimized in various ways. The implementation given in the proof that follows this remark has focused on simplicity more than on saving a few multiplications. However, we are not aware of any implementation that avoids the O(u + v) term in the complexity analysis.

Proof. We will first argue that only lines 5–9 in the algorithm have any significance in the complexity analysis.
Line 2: By Newton iteration the square root of n may be computed using O(log log n) multiplications.
Line 3: By Lemma 2, we expect to find a c of size O((log n log log n)²) such that (c/n) = −1 after three attempts (or discover that n is composite).
Line 4: z is selected randomly from R(n, c) \ {0}. We expect to find z with (N(z)/n) = 1 after two attempts (or discover that n is composite).
Lines 5–9: Here we need to explain how it is possible to simultaneously verify that z̄ = z^n, and do both a 4'th-root-of-1-test and a 3'rd-root-of-1-test, without using too many multiplications. We refer to Lemma 1 for the implementation of arithmetic in R(n, c). Define s, r by n = 2^u 3^v s + r for 0 < r < 2^u 3^v. A simple calculation confirms that

q = ns + rs + (r² − 1)/(2^u 3^v),  (7)

where the last fraction is integral.
Go through the following computational steps, using the z selected in line 4 of the algorithm:
1. compute z^s. This uses 2 log n + o(log n) multiplications in Z_n.
2. compute z^n. Starting from step 1 this requires O(v + u) multiplications in Z_n.
3. verify z^n = z̄.
4. compute z^q. One may compute z^q from step 1 using O(v + u) multiplications in Z_n, when using (7) and the shortcut z^{ns} = z̄^s, where the shortcut is implied by step 3 and by exponentiation and conjugation being commuting maps.
5. compute z^{3^v q}, z^{2·3^v q}, z^{2² 3^v q}, ..., z^{2^{u−2} 3^v q}. Starting from step 4 this requires O(v + u) multiplications in Z_n.
6. verify that z^{3^v q} = 1 or z^{2^i 3^v q} = −1 for some 0 ≤ i ≤ u − 2. If there is i_0 ≥ 1 with z^{2^{i_0} 3^v q} = −1 and if ξ_4 is present, verify that z^{2^{i_0 − 1} 3^v q} = ±ξ_4.
7. compute z^{2^u q}, z^{2^u 3 q}, z^{2^u 3² q}, ..., z^{2^u 3^{v−1} q}. Starting from step 4 this requires O(v + u) multiplications in Z_n.
8. By step 6 there must be an i (0 ≤ i ≤ v) such that z^{2^u 3^i q} = 1. Let i_0 be the smallest such i. If i_0 ≥ 1, verify that z^{2^u 3^{i_0 − 1} q} is a root of x² + x + 1. If ξ_3 is present, verify in addition that z^{2^u 3^{i_0 − 1} q} = ξ_3^{±1}.

3 An Expression Bounding the Error Probability

Theorem 2 assumes that the auxiliary inputs r_3, r_4 are "good", which should be taken to mean that they are non-trivial third and fourth roots of 1, and are roots of the third and fourth cyclotomic polynomials (provided such roots exist in R(n, c)). When EQFT is executed as described earlier, we cannot be sure that r_3, r_4 are good. However, the probability that they are indeed good is sufficiently large that the theorem can still be used to bound the actual error probability, as shown in Theorem 3 (for proofs, see the full paper):

Theorem 2. Let n be an odd composite number with prime power factorisation n = Π_{i=1}^ω p_i^{m_i}, let Ω = Σ_{i=1}^ω m_i, and let c satisfy (c/n) = −1. Given good values of the inputs r_3, r_4, the error probability of a single iteration of the second part of EQFTac (Algorithm 1) is bounded by

β(n, c) ≤ 24^{1−ω} Π_{i=1}^ω p_i^{2(1−m_i)} · sel[(c/p_i), gcd(n/p_i − 1, (p_i² − 1)/24) / ((p_i² − 1)/24), 12/(p_i − 1)] ≤ 24^{1−Ω}

where we have adopted the notation sel[±1, E_1, E_2] for a conditional expression with the semantics sel[−1, E_1, E_2] = E_1 and sel[1, E_1, E_2] = E_2.

Theorem 3. Let n be an odd composite number with ω distinct prime factors. For any t ≥ 1, the error probability β_t(n) of t iterations of EQFTac (Algorithm 1) is bounded by

β_t(n) ≤ max_{(c/n)=−1} 4^{ω−1} β(n, c)^t

4 EQFTac: Average Case Behaviour

4.1 Uniform Choice of Candidates

Let M_k be the set of odd k-bit integers (2^{k−1} < n < 2^k). Consider the algorithm that repeatedly chooses random numbers in M_k until one is found that passes t iterations of EQFTac, and outputs this number. The expected time to find a "probable prime" with this method is at most tT_k/p_k, where T_k is the expected time for running the test on a random number from M_k, and p_k is the probability that such a number is prime. Suppose we choose n at random and let n² − 1 = 2^u 3^v q, where q is prime to 2 and 3. It is easy to see that the expected values of u and v are constant, and so it follows from Theorem 1 that T_k is 2k + o(k) multiplications modulo a k bit number. This gives approximately the same time needed to generate a probable prime as if we had used 2t iterations of the Miller-Rabin test in place of t iterations of EQFTac. But, as we shall see, the error probability is much smaller than with 2t MR tests.
Let q_{k,t} be the probability that the algorithm above outputs a composite number. When running t iterations of our test on input n, it follows from Theorem 3 and Theorem 2 that the probability β_t(n) of accepting n satisfies

β_t(n) ≤ 4^{ω−1} 24^{t(1−Ω)} max{ gcd(n/p − 1, (p² − 1)/24) / ((p² − 1)/24), 12/(p − 1) }^t

where p is the largest prime factor in n and Ω is the number of prime factors in n, counted with multiplicity. This expression is extremely similar to the one for the Rabin test found in [5]. Therefore we can find bounds for q_{k,t} in essentially the same way as there. Details can be found in the full paper. We obtain numerical estimates for q_{k,t}; some sample results are shown in Table 1, which contains −log₂ of the estimates, so we assert that, e.g., q_{500,2} ≤ 2^{−143}.

Table 1. Lower bounds on −log₂ q_{k,t}

t \ k    300  400  500  600  1000
1         42   49   57   64    86
2        105  125  143  159   212
3        139  165  187  208   276
4        165  195  221  245   325

We also get a closed expression (with an easily computable big-O constant):

Theorem 4. For 2 ≤ t ≤ k − 1, we have that q_{k,t} is O(k^{3/2} 2^{(σ_t+1)t} t^{−1/2} 4^{−√(2σ_t t k)}), where σ_t = log₂ 24 − 2/t.

Comparing to corresponding results in [5] for the Miller-Rabin test, one finds that if several iterations of EQFTac are performed, then roughly speaking each iteration has the effect of 9 Miller-Rabin tests, while only taking time equivalent to about 2 M-R tests.

4.2 Incremental Search

The algorithm we have just analysed is in fact seldom used in practice. Most real implementations will not want to choose candidates for primes uniformly at random. Instead one will choose a random starting point n_0 in M_k and then test n_0, n_0 + 2, n_0 + 4, ... for primality until one is found that passes t iterations of the test. Many variations are possible, such as other step sizes and various types of sieving, but the basic principle remains the same. The reason for applying such an algorithm is that test division by small primes can be implemented much more efficiently (see for instance [4]). On the other hand, the analysis we did above depends on the assumption that candidates are independent. In [3], a way to get around this problem for the Miller-Rabin test was suggested. We apply an improvement of that technique here.

We will analyse the following example algorithm, which depends on parameters t and s: choose n_0 uniformly in M_k and test n_0, n_0 + 2, ..., n_0 + 2(s − 1) using t iterations of EQFTac. If no probable prime is found, start over with a new independently chosen value of n_0. Output the first number found that passes all t iterations of EQFTac.
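A minimal sketch of this incremental search (Python; passes_t_tests(n, t) is a hypothetical placeholder standing in for t iterations of EQFTac, so any probabilistic test with the same interface fits):

import random

def incremental_prime_search(k, t, s, passes_t_tests):
    # Try s consecutive odd candidates from a random odd k-bit start;
    # restart with a fresh n0 if none passes all t test iterations.
    while True:
        n0 = random.randrange(2**(k - 1) + 1, 2**k, 2)
        for n in range(n0, n0 + 2 * s, 2):
            if passes_t_tests(n, t):
                return n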
5 EQFTwc: Worst Case Analysis

We present in this section the version of our test (EQFTwc) which is fast for all n and has essentially the same error probability bound as EQFTac. The price for this is an expected start-up cost of ≤ 2 log n + o(log n) multiplications in Z_n for the first iteration of the test. For a comparison of our test with the earlier tests of Grantham, Müller and Miller-Rabin, assume that we are willing to spend some fixed amount of time testing an input number, say, approximately corresponding to the time for t Miller-Rabin tests. Then, using our test, we get asymptotically a better bound on the error probability: using Miller-Rabin, Grantham [6], Müller [7,8], and EQFTwc, respectively, we get error bounds 4^{−t}, 19.8^{−t}, 50.8^{−t} and approximately 576^{−t}.

In Section 2, the general idea behind EQFTwc was explained. The only point left open was the following: we need to design a start-up procedure that can either discover that n is composite, or construct an element r_24 of order 24, and also guarantee that all Sylow-2 and -3 subgroups of R(n, c)^* have order at least 2^u, 3^v (where as usual, 2^u, 3^v are the maximal 2- and 3-powers dividing n^2 − 1).

⁵ Of course, this refers to the run time when only the EQFTac is used. In practice, one would use test division and other tricks to eliminate some of the non-primes faster than EQFTac can do it. This may reduce the run time significantly. Any such method can be used without affecting the error estimates, as long as no primes are rejected.

Algorithm 2 Extended Quadratic Frobenius Test (EQFTwc).
First iteration:
Require: input is an odd number n ≥ 5
Ensure: output is “composite”, or “probable prime”, c ∈ Z_n, r_24 ∈ R(n, c)^*, where (c/n) = −1 and Φ_24(r_24) = 0
1: if n is divisible by 2 or 3 return “composite”
2: if n is a perfect square or a perfect cube return “composite”
3: choose a small c with (c/n) = −1
4: compute r ∈ R(n, c) satisfying r^2 + r + 1 = 0 (may return “composite”)
5: a: if n ≡ 1 mod 3 then select a random z ∈ R(n, c)^* with (N(z)/n) = −1 and res_3(z) ≠ 1
   b: if n ≡ 2 mod 3 then repeat
        make a Miller-Rabin primality test on n (may return “composite”)
        select a random z ∈ R(n, c)^* with (N(z)/n) = −1 and compute res_3(z)
      until either the Miller-Rabin test returns composite or the selected z satisfies res_3(z) ≠ 1
6: if z̄ ≠ z^n return “composite”
7: let r_24 = z^{(n^2−1)/24}; if r_24^8 ≠ r^{±1} or r_24^{12} ≠ −1 return “composite”
8: return “probable prime”, c, r_24
Subsequent iterations:
Require: input is n, c, r_24, where n ≥ 5 is not divisible by 2 or 3, (c/n) = −1, and Φ_24(r_24) = 0
Ensure: output is “composite” or “probable prime”
9: select a random z ∈ R(n, c)^*
10: if z̄ ≠ z^n return “composite”
11: if z^{(n^2−1)/24} ∉ {r_24^i | i = 0, ..., 23} return “composite”
12: return “probable prime”

We do this by choosing z ∈ R(n, c)^* in such a way that if n is prime, then z is both a non-square and a non-cube. This means that we can expect that z^{(n^2−1)/2} = −1 and that z^{(n^2−1)/3} = r^{±1}, where r is a primitive 3rd root of 1. If this is not the case, n is composite. If it is, n may still be composite, but we have the required condition on the Sylow-2 and -3 subgroups, and we can set r_24 = z^{(n^2−1)/24}. The subsequent iterations of the test are then very simple: take a random z ∈ R(n, c)^* and check whether z̄ = z^n and z^{(n^2−1)/24} ∈ {r_24^i | i = 0, ..., 23}.
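The subsequent iterations admit a compact implementation once arithmetic in R(n, c) = Z_n[x]/(x^2 − c) is available. The sketch below is our own illustration of lines 9–12, not the authors' optimized implementation; a tuple (a, b) encodes the element ax + b, and conjugation maps ax + b to −ax + b.

```python
def r_mul(u, v, n, c):
    """(a1*x + b1)(a2*x + b2) in R(n, c), reducing with x^2 = c."""
    a1, b1 = u
    a2, b2 = v
    return ((a1 * b2 + a2 * b1) % n, (b1 * b2 + c * a1 * a2) % n)

def r_pow(z, e, n, c):
    """Square-and-multiply exponentiation in R(n, c)."""
    result = (0, 1)                     # the unit element 0*x + 1
    while e:
        if e & 1:
            result = r_mul(result, z, n, c)
        z = r_mul(z, z, n, c)
        e >>= 1
    return result

def r_conj(z, n):
    a, b = z
    return ((-a) % n, b)

def subsequent_iteration(n, c, r24, z):
    """Lines 9-12 of Algorithm 2 for an already selected z in R(n, c)*."""
    if r_pow(z, n, n, c) != r_conj(z, n):      # Frobenius check: z^n = conj(z)
        return "composite"
    w = r_pow(z, (n * n - 1) // 24, n, c)
    powers, p = set(), (0, 1)
    for _ in range(24):                        # the set {r24^i | i = 0..23}
        powers.add(p)
        p = r_mul(p, r24, n, c)
    return "probable prime" if w in powers else "composite"
```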
Before presenting the algorithm, we need to define a homomorphism res_3 from the ring R(n, c)^* into the complex third roots of unity {1, ζ, ζ^2}. This homomorphism will be used to recognize cubic nonresidues.

Definition 3. For arbitrary n ≥ 5 with (n, 6) = 1 and arbitrary c with (c/n) = −1, assume there exists an r = gx + h ∈ R(n, c) with r^2 + r + 1 = 0, and if n ≡ 1 mod 3 assume in addition that r ∈ Z_n, i.e. g = 0. Define res_3 : R(n, c)^* → {1, ζ, ζ^2} ⊆ Z[ζ] by

res_3(ax + b) = [(b^2 − ca^2) / gcd(n, r − ζ)],   if n ≡ 1 mod 3,
res_3(ax + b) = [(b + a(ζ − h)/g) / n],           if n ≡ 2 mod 3,

where [·/·] denotes the cubic residuosity symbol.

To find the element z mentioned above, we note that computing the Jacobi symbol will let us recognize 1/2 of all elements as nonsquares. One might expect that applying res_3 would let us recognize 2/3 of all elements as noncubes. Unfortunately, all we can show is that res_3 is nontrivial except possibly when n is a perfect cube, or n is composite and n ≡ 2 mod 3. To handle this problem, we take a pragmatic solution: run a Miller-Rabin test and a search for noncubes in parallel. If n is prime then the search for a noncube will succeed, and if n is composite then the MR-test (or the noncube search) will succeed. The following results are proved in the full paper:

Theorem 5. There is an implementation of algorithm 2 that on input n takes expected time equivalent to at most 2 log n + o(log n) multiplications in Z_n per iteration, when assuming the ERH. The first iteration has an additional expected start-up cost equivalent to at most 2 log n + o(log n) multiplications in Z_n.

Theorem 6. Let n be an odd composite number with prime power factorisation n = ∏_{i=1}^{ω} p_i^{m_i}, and let Ω = ∑_{i=1}^{ω} m_i. If γ_t(n) denotes the probability that n passes t iterations of the EQFTwc test (algorithm 2) then

γ_t(n) ≤ max_{(c/n)=−1} 4^{ω−1} (24^{1−ω} ∏_{i=1}^{ω} p_i^{2(1−m_i)} sel[(c/p_i), (n/p_i − 1, p_i^2 − 1) / (p_i^2 − 1), (n^2/p_i^2 − 1, p_i − 1) / (p_i − 1)^2])^t ≤ 4^{ω−1} 24^{t(1−Ω)}.

If n has no prime factor ≤ 118 or n ≥ 2^{42} then γ_t(n) ≤ 4^4 · 24^{−4t} ≈ 2^{8−18.36t}.

References

1. Manindra Agrawal, Neeraj Kayal, and Nitin Saxena. PRIMES is in P. Preprint, Department of Computer Science & Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India, 2002.
2. Eric Bach and Jeffrey Shallit. Algorithmic Number Theory. Vol. 1: Efficient Algorithms. Foundations of Computing Series. MIT Press, Cambridge, MA, 1996.
3. Jørgen Brandt and Ivan Damgård. On generation of probable primes by incremental search. In Advances in Cryptology—CRYPTO ’92 (Santa Barbara, CA, 1992), Vol. 740 of Lecture Notes in Comput. Sci., pp. 358–370. Springer, Berlin, 1993.
4. Jørgen Brandt, Ivan Damgård, and Peter Landrock. Speeding up prime number generation. In Advances in Cryptology—ASIACRYPT ’91 (Fujiyoshida, 1991), Vol. 739 of Lecture Notes in Comput. Sci., pp. 440–449. Springer, Berlin, 1993.
5. Ivan Damgård, Peter Landrock, and Carl Pomerance. Average case error estimates for the strong probable prime test. Math. Comp. 61(203) (1993), 177–194.
6. Jon Grantham. A probable prime test with high confidence. J. Number Theory 72(1) (1998), 32–47.
7. Siguna Müller. A probable prime test with very high confidence for n ≡ 1 mod 4. In Advances in Cryptology—ASIACRYPT 2001 (Gold Coast), Vol. 2248 of Lecture Notes in Comput. Sci., pp. 87–106. Springer, Berlin, 2001.
8. Siguna Müller. A probable prime test with very high confidence for n ≡ 3 mod 4. J. Cryptology 16(2) (2003), 117–139.

Periodic Multisorting Comparator Networks⋆

Marcin Kik
Institute of Mathematics, Wrocław University of Technology
ul. Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
kik@im.pwr.wroc.pl

Abstract.
We present a family of periodic comparator networks that transform the input so that it consists of a few sorted subsequences. The depths of the networks range from 4 to 2 log n, while the number of sorted subsequences ranges from 2 log n to 2. They work in time c log^2 n + O(log n) with 4 ≤ c ≤ 12, and the remaining constants are also suitable for practical applications. The known periodic sorting networks of constant depth that run in time O(log^2 n) (a periodic version of the AKS network [7]) are impractical because of their complex structure and the very large constant factor hidden by the big “Oh”.
Keywords: sorting, comparator networks, parallel algorithms.

⋆ Research supported by KBN grant 7T11C 3220 in the years 2002, 2003.

1 Introduction

A comparator is a simple device capable of sorting two elements. Many comparators can be connected together to form a comparator network. This way we get the classical framework for sorting algorithms. Arranging the comparators optimally turned out to be a challenge. The main complexity measures of comparator networks are time complexity (depth or number of steps) and the number of comparators. The most famous sorting network is the AKS network with asymptotically optimal depth O(log n) [1]; however, the big constant hidden by the big “Oh” makes it impractical. The Batcher networks of depth ≈ (1/2) log^2 n [2] seem to be very attractive for practical applications. A periodic network is repeatedly applied to the intermediate results until the output becomes sorted; thus the same comparators are reused many times. In this case, the time complexity is the depth of the network multiplied by the number of iterations. The main advantage of periodicity is the reduction of the amount of hardware (comparators) needed for the realization of the sorting algorithm, with a very simple control mechanism providing the output of one iteration as the input for the next iteration. Dowd et al. [3] reduced the number of comparators from Ω(n log^2 n) to (1/2) n log n, while keeping the sorting time log^2 n, by the use of a periodic network of depth log n. (The networks of depth d have at most dn/2 comparators.) There are some periodic sorting networks of constant depth ([10], [5], [7]). In [7], constant depth networks with time complexity O(log^2 n) are obtained by “periodification” of the AKS network, and more practical solutions with time complexity O(log^3 n) are obtained by “periodification” of the Batcher network. On the other hand, no ω(log n) lower bound on the time complexity of periodic sorting networks of constant depth is known. Closing the gap between the known upper bound of O(log^2 n) and the trivial general lower bound Ω(log n) seems to be a very hard problem. Periodic networks of constant depth can also be used for simpler tasks, such as merging sorted sequences [6], or resorting sequences with few values modified [4].

1.1 New Results

We assume that the values are stored in the registers and the only allowed operations are compare-exchange operations (applications of comparators) on the pairs of registers. Such an operation takes the two values stored in the pair of registers and stores the lower value in the first register and the greater value in the second register. (This interpretation differs from the one presented for instance in [8] but is more useful when periodic comparator networks are concerned.)
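The compare-exchange model above translates directly into code. The following minimal sketch is our own illustration, not part of the paper: a layer is a set of ordered register pairs, the smaller value always goes to the first register of the pair (which also covers the nonstandard comparators used later), and a periodic network is iterated until the registers are sorted.

```python
def apply_layer(v, layer):
    """Apply one layer: for each ordered pair (i, j), min goes to v[i], max to v[j].
    A nonstandard comparator is simply a pair whose first index is the larger one."""
    for i, j in layer:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]

def run_periodic(v, layers, max_iterations=100_000):
    """Apply all layers repeatedly; return the number of iterations until sorted."""
    for iteration in range(1, max_iterations + 1):
        for layer in layers:
            apply_layer(v, layer)
        if all(v[i] <= v[i + 1] for i in range(len(v) - 1)):
            return iteration
    raise RuntimeError("iteration budget exceeded")

# Example: odd-even transposition as a periodic network of depth 2 on 16 registers.
n = 16
layers = [[(i, i + 1) for i in range(0, n - 1, 2)],
          [(i, i + 1) for i in range(1, n - 1, 2)]]
```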
We present a family of periodic comparator networks N_{m,k}. The input size of N_{m,k} is n = 4m · 2^k. The depth of N_{m,k} is 2⌈k/m⌉ + 2. In Section 4 we prove the following theorem.

Theorem. The periodic network N_{m,k} transforms the input into 2m sorted subsequences of length n/(2m) in time 4k^2 + 8km + O(k + m).

For example, the network N_{1,k} is a network of depth ≈ 2 log n that produces 2 sorted sequences in time ≈ 4 log^2 n + O(log n). On the other hand, N_{k,k} is a network of depth 4 that transforms the input into ≈ 2 log n sorted sequences in time ≈ 12 log^2 n + O(log n). Due to the large constants in the known periodic constant depth networks sorting in time O(log^2 n) [7], it could be an interesting alternative to use N_{k,k} to produce highly ordered (although not completely sorted) output. The output produced by N_{m,k} can be finally sorted by a network merging 2m sequences. This can be performed by the very efficient multiway merge sorting networks [9]. It is an interesting problem to find an efficient periodic network of constant depth that merges multiple sorted sequences. Periodic networks of constant depth that merge two sorted sequences in time O(log n) are already known [6]. As N_{m,k} outputs multiple sorted sequences, we call it a multisorting network. Much simpler multisorting networks of constant depth exist if some additional operations are allowed (such as permutations of the elements in the registers between the iterations). However, we consider only the case restricted to the compare-exchange operations.

2 Preliminaries

By a comparator network we mean a set of registers R_0, ..., R_{n−1} together with a finite sequence of layers of comparators. At every moment a register R_i contains a single value (denoted by v(R_i)) from some totally ordered set, say ℕ. We say that the network stores the sequence v(R_0), ..., v(R_{n−1}). A subset S of registers is sorted if for all R_i, R_j in S, i < j implies that v(R_i) ≤ v(R_j). A comparator is denoted by an ordered pair of registers (R_i, R_j). If v(R_i) = x and v(R_j) = y before an application of the comparator (R_i, R_j), then v(R_i) = min{x, y} and v(R_j) = max{x, y} after the application of (R_i, R_j). A set of comparators L forms a layer if each register is contained in at most one of the comparators of L. So all the comparators of a layer can be applied simultaneously. We call such an application a step. The depth of the network is the number of its layers. An input is the initial value of the sequence v(R_0), ..., v(R_{n−1}). An output of the network N is the sequence v(R_0), ..., v(R_{n−1}) obtained after application of all its layers (application of N) to some initial input sequence. We can iterate the network’s application, by applying it to the output of its previous application. We call such a network a periodic network. The time complexity of the periodic network is the number of steps performed in all iterations.

3 Definition of the Network N_{m,k}

We define a periodic network N_{m,k} for positive integers m and k. For the sake of simplicity we fix the values m and k and denote N_{m,k} by N. The network N contains n registers R_0, ..., R_{n−1}, where n = 4m · 2^k. It will be useful to imagine that the registers are arranged in a three-dimensional matrix M of size 2 × 2m × 2^k. For 0 ≤ x ≤ 1, 0 ≤ y ≤ 2m − 1 and 0 ≤ z ≤ 2^k − 1, the element M_{x,y,z} is the register R_i such that i = x + 2y + 4mz. For the intuitions, we assume that the Z and Y coordinates increase downwards and rightwards, respectively.
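The register arrangement is easy to express directly. The following helper functions are ours, for illustration; they convert between the register number i and the matrix coordinates (x, y, z).

```python
def register_index(x, y, z, m):
    """M[x][y][z] is the register R_i with i = x + 2*y + 4*m*z."""
    return x + 2 * y + 4 * m * z

def register_coords(i, m):
    """Inverse of register_index; bijective since 0 <= x + 2*y < 4*m."""
    return i % 2, (i // 2) % (2 * m), i // (4 * m)
```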
By a column C_{x,y} we mean the subset of registers M_{x,y,z} with 0 ≤ z < 2^k. P_y = C_{0,y} ∪ C_{1,y} is a pair of columns. A Z-slice is a subset of registers with the same Z coordinate. Let d = ⌈k/m⌉. We define the sets of comparators X, Y_0, Y_1, and Z_i, for 0 ≤ i < d, as follows. (Comparators of X, Y_j and Z_i are called X-comparators, Y-comparators and Z-comparators, respectively.) The comparators of X, Y_0 and Y_1 act in each Z-slice separately (see Figure 1). The set X contains the comparators (M_{0,y,z}, M_{1,y,z}), for all y and z. Let Y be an auxiliary set of all comparators (M_{x,y,z}, M_{x,y′,z}) such that y′ = (y + 1) mod 2m. Y_0 contains all comparators (M_{x,y,z}, M_{x,y′,z}) from Y such that y is even. Y_1 consists of those comparators from Y that are not in Y_0. Note that the layer Y_1 contains nonstandard comparators (M_{x,2m−1,z}, M_{x,0,z}) (i.e. comparators that place the greater value in the register with the lower index). In order to describe Z_i we define a matrix α of size d × 2m (with the rows indexed by the first coordinate) such that, for 0 ≤ i < d and 0 ≤ j < 2m:

– if j is even then α_{i,j} = d · j/2 + i,
– if j is odd then α_{i,j} = α_{i,2m−1−j}.

Fig. 1. Comparator connections within a single Z-slice. Dotted (respectively, dashed and solid) arrows represent comparators from X (respectively, Y_0 and Y_1).

For example, for m = 4 and 4 < k ≤ 8, α is the following matrix:

( 0 6 2 4 4 2 6 0 )
( 1 7 3 5 5 3 7 1 )

For 0 ≤ i < d, Z_i consists of the comparators (M_{1,y,z}, M_{0,y,z′}) such that 0 ≤ y < 2m and z′ = z + 2^{k−1−α_{i,y}}, provided that 0 ≤ z, z′ < 2^k and k − 1 − α_{i,y} ≥ 0. By the height of a comparator (M_{x,y,z}, M_{x′,y′,z′}) we mean z′ − z. Note that each single Z-comparator is contained within a single pair of columns, and all comparators of Z_i contained in the same pair of columns are of the same height, which is a power of two. All Z-comparators of height 2^{k−1}, 2^{k−2}, ..., 2^{k−d} (which are from Z_0, Z_1, ..., Z_{d−1}, respectively) are placed in the pairs of columns P_0 and P_{2m−1}. All Z-comparators of height 2^{k−1−d}, ..., 2^{k−2d} (from Z_0, ..., Z_{d−1}) are placed in P_2 and P_{2m−3}. And so on. Generally, for 0 ≤ i < d and 0 ≤ y < m, the height of all comparators of Z_i contained in P_{2y} and in P_{2m−1−2y} is 2^{k−1−dy−i}.

Fig. 2. Z-comparators of different heights within the pairs of columns, for k = 3.

The sequence of layers of the network N is (L_0, ..., L_{2d+1}) where L_{2i} = X and L_{2i+1} = Z_i, for 0 ≤ i < d, and L_{2d} = Y_0, L_{2d+1} = Y_1.

Fig. 3. Network N_{3,3}. For clarity, the Y-comparators are drawn separately.

A set of comparators K is symmetric if (R_i, R_j) ∈ K implies (R_{n−1−j}, R_{n−1−i}) ∈ K. Note that all layers of N are symmetric. Figure 3 shows the network N_{m,k} for k = m = 3. As m ≥ k, this network contains only one layer of Z-comparators, Z_0.

4 Analysis of the Computation of N_{m,k}

The following theorem is a more detailed version of the theorem stated in the introduction.

Theorem 1. After T ≤ 4k^2 + 8mk + 7k + 14m + 6k/m + 13 steps of the periodic network N_{m,k} all its pairs of columns are sorted.

We denote N_{m,k} by N. By the zero-one principle [8], it is enough to show this property for the case when only zeroes and ones are stored in the registers. We replace zeroes by negative numbers and ones by positive numbers.
These numbers can increase their absolute values between the applications of subsequent layers in the periodic computation of N, but cannot change their signs. We show that, after T steps, the negative values precede all positive values within each pair of columns. Initially, let v(R_0), ..., v(R_{n−1}) be an arbitrary sequence of values from {−1, 1}. We apply N to this sequence as a periodic network. We call the application of the layer Y_i (respectively, X, Z_i) a Y-step (respectively, X-step, Z-step). To make the analysis more intuitive, we assume that each register stores (besides the value) a unique element. The value of an element e stored in R_i (denoted v(e)) is equal to v(R_i). If v(e) > 0 then e is positive. Otherwise e is negative. If just before the application of a comparator c = (R_i, R_j) we have v(R_i) > v(R_j), then during the application of c the elements are exchanged between R_i and R_j. If c is from Y_0 or Y_1 then the elements are exchanged also if v(R_i) = v(R_j). If e is a positive (respectively, negative) element contained in R_i or R_j before the application of c, then e wins in c if, after the application of c, it ends up in R_j (respectively, R_i). Otherwise e loses in c. We call the elements that are stored during the X-steps and Z-steps in the pairs of columns P_{2i}, for 0 ≤ i < m, right-running elements. The remaining elements are called left-running. Let k′ = md. (Recall that d = ⌈k/m⌉.) Let δ = 1/(4k′). Note that k′δ < 1. By critical comparators we mean the comparators between P_{2m−1} and P_0 from the layer Y_1. We modify the computation of N as follows:

– After each Z-step, we increase the values of the positive right-running elements and decrease the values of the negative left-running elements by δ. (We call it the δ-increase.)
– When a positive right-running (respectively, negative left-running) element e wins in a critical comparator, we increase v(e) to ⌊v(e) + 1⌋ (respectively, decrease v(e) to ⌈v(e) − 1⌉).

Note that once a positive (respectively, negative) element becomes right-running (respectively, left-running) it remains right-running (respectively, left-running) forever. All the positive left-running and negative right-running elements have absolute value 1.

Lemma 1. If, during the Z-step t, |v(e)| = l + y′δ, where l and y′ are nonnegative integers such that l ≥ 2 and 0 ≤ y′ < k′, then, during t, e can be processed only by comparators of height 2^{k−1−y′}.

Let e be a positive element. (A negative element behaves symmetrically.) Since v(e) > 1, e is a right-running element during step t. At the moment when e started being right-running, its value was equal to 1. A right-running element can be δ-increased at most k′ times between its subsequent wins in the critical comparators, and k′δ < 1. Thus e had reached the value 2 when it entered P_0 for the first time. Then its value was being increased by δ after each Z-step (d times in each P_{2j}), and rounded up to the next integer during its wins in the critical comparators. The lemma follows from the definition of α and Z_i: the heights of the Z-comparators from the subsequent Z-layers Z_i, for 0 ≤ i < d, in the subsequent pairs of columns P_{2j}, for 0 ≤ j < m, are the decreasing powers of two. ✷

We say that a register M_{x,y,z} is l-dense for v if

– in the case v > 0: v(M_{x,y,z+i⌈2^l⌉}) ≥ v for all i ≥ 0 such that z + i⌈2^l⌉ < 2^k, and
– in the case v < 0: v(M_{x,y,z−i⌈2^l⌉}) ≤ v for all i ≥ 0 such that z − i⌈2^l⌉ ≥ 0.

Note that, for l < 0, “l-dense” means “0-dense”.
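This register condition can be transcribed directly; the predicate below is our illustration, modelling a column as the list of its 2^k values indexed by z and assuming an integer l.

```python
def is_l_dense(column, z, l, v):
    """Check whether the register at height z in this column is l-dense for v."""
    step = 2 ** max(l, 0)          # for l < 0, "l-dense" means "0-dense"
    if v > 0:
        return all(column[h] >= v for h in range(z, len(column), step))
    return all(column[h] <= v for h in range(z, -1, -step))
```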
An element is l-dense for v if it is stored in a register that is l-dense for v.

Lemma 2. If M_{x,y,z} is l-dense for v > 0 (respectively, v < 0), then, for 0 < v′ ≤ v (respectively, v ≤ v′ < 0), M_{x,y,z} is l-dense for v′. If M_{x,y,z} is l-dense for v > 0 (respectively, v < 0), then, for all j ≥ 0 (respectively, j ≤ 0), M_{x,y,z+j⌈2^l⌉} is l-dense for v. If M_{x,y,z} is l-dense for v > 0 (respectively, v < 0) and M_{x,y,z+⌊2^{l−1}⌋} (respectively, M_{x,y,z−⌊2^{l−1}⌋}) is l-dense for v, then M_{x,y,z} is (l − 1)-dense for v.

These properties can be easily derived from the definition. ✷

Lemma 3. Let L be any layer of N and (M_{x,y,z}, M_{x′,y′,z′}) ∈ L. If M_{x,y,z} or M_{x′,y′,z′} is l-dense for v > 0 (respectively, v < 0) just before an application of L, then M_{x′,y′,z′} (respectively, M_{x,y,z}) is l-dense for v just after the application of L. If M_{x,y,z} and M_{x′,y′,z′} are l-dense for v just before the application of L, then M_{x,y,z} and M_{x′,y′,z′} are l-dense for v just after the application of L.

Proof. The lemma follows from the fact that, for each integer i such that 0 ≤ z + i⌈2^l⌉, z′ + i⌈2^l⌉ < 2^k, the comparator (M_{x,y,z+i⌈2^l⌉}, M_{x′,y′,z′+i⌈2^l⌉}) is also in L. ✷

Corollary 1. If an element l-dense for v wins during an application of a layer L of N, then it remains l-dense for v. If it loses to another element l-dense for v, then it also remains l-dense for v. If it wins in a critical comparator and v > 0 (respectively, v < 0), then it becomes l-dense for ⌊v + 1⌋ (respectively, ⌈v − 1⌉). If just before a Z-step t, e is a right-running positive (respectively, left-running negative) element l-dense for v > 0 (respectively, v < 0), and, during t, e loses to another element l-dense for v or wins, then it becomes l-dense for v + δ (respectively, v − δ) after the δ-increase following t.

The following lemma states that each positive element e that has been right-running for a long time is contained in a dense foot of elements with value v(e) or greater; an analogous property holds for left-running negative values.

Lemma 4. Consider the configuration of N after a Z-step. For nonnegative integers l, s and y′ such that y′ ≤ k′, for each element e: If v(e) = l + 2 + s + y′δ, then e is (k − l)-dense for l + 2 + y′δ and, if y′ > l, then e is (k − l − 1)-dense for l + 2 + y′δ. If v(e) = −(l + 2 + s + y′δ), then e is (k − l)-dense for −(l + 2 + y′δ) and, if y′ > l, then e is (k − l − 1)-dense for −(l + 2 + y′δ).

Proof. We prove only the first part. The second part is analogous, since all layers of N are symmetric. The proof is by induction on l. Let 0 ≤ l < k. Let e be any element with v(e) = l + 2 + s + y′δ, for some nonnegative integers s, y′, where y′ ≤ k′. The element e was right-running during each of the last y′ Z-steps. These steps were preceded by a critical step t that increased v(e) to l + 2 + s. Let t_i (respectively, t′_i) be the (i + 1)-st X-step (respectively, Z-step) after step t. Let M_{x_i,y_i,z_i} (respectively, M_{x′_i,y_i,z′_i}) be the register that stored e just after t_i (respectively, t′_i). Let v_i denote the value l + 2 + iδ. During each step t_i and t′_i, all elements e′ with v(e′) ≥ v(e) in the pair of columns containing e are (k − l)-dense for v_i. (For l = 0 it is obvious, since the “height” of N is 2^k, and, for l > 0, it follows from the induction hypothesis and Corollary 1, since e′ was (k − l)-dense for l + 1 already before t, and, hence, (k − l)-dense for v_0 just after t.)

Claim (Breaking Claim).
For 0 ≤ i ≤ l, just after the X-step t_i, the registers M_{0,y_i,z_i+2^{k−i}} and M_{1,y_i,z_i+2^{k−i}} are (k − l)-dense for v_i, if they exist.

We prove the claim by induction on i. For i = 0 it is obvious. (M_{0,y_i,z_i+2^k} and M_{1,y_i,z_i+2^k} do not exist.) Let 0 < i ≤ l. Consider the configuration just after step t_{i−1}. (See Figure 4.) Since t_{i−1} was an X-step, v(M_{1,y_{i−1},z_{i−1}}) ≥ v(e) and, hence, M_{1,y_{i−1},z_{i−1}} is (k − l)-dense for v_{i−1}. Thus M_{1,y_{i−1},z_{i−1}+2^{k−i}} is (k − l)-dense for v_{i−1}, since 2^{k−i} is a multiple of 2^{k−l}. By the induction hypothesis of the claim, M_{0,y_{i−1},z_{i−1}+2^{k−i+1}} and M_{1,y_{i−1},z_{i−1}+2^{k−i+1}} are (k − l)-dense for v_{i−1}.

Fig. 4. The configuration after t_{i−1} in P_{y_{i−1}} in the registers with Z-coordinates z_{i−1} + j·2^{k−i}, for 0 ≤ j < 4. (Black registers are (k − l)-dense for v_{i−1}. Arrows denote the comparators from t′_{i−1}.)

Just after the step t′_{i−1}, M_{1,y_{i−1},z_{i−1}+2^{k−i}} and M_{1,y_{i−1},z_{i−1}+2^{k−i+1}} remain (k − l)-dense for v_{i−1}, since they were compared to the registers M_{0,y_{i−1},z_{i−1}+2^{k−i+1}} and M_{0,y_{i−1},z_{i−1}+2^{k−i+2}} that were (k − l)-dense for v_{i−1}. M_{0,y_{i−1},z_{i−1}+2^{k−i+1}} remains (k − l)-dense for v_{i−1}. M_{0,y_{i−1},z_{i−1}+2^{k−i}} also becomes (or remains) (k − l)-dense for v_{i−1}, since it was compared to M_{1,y_{i−1},z_{i−1}}. Thus, just after the Z-step t′_{i−1}, for x ∈ {0, 1}, the registers M′_x = M_{x,y_{i−1},z′_{i−1}+2^{k−i}} are (k − l)-dense for v_{i−1} (and for v_i, after the δ-increase). (Either z′_{i−1} = z_{i−1} and M′_x = M_{x,y_{i−1},z_{i−1}+2^{k−i}}, or z′_{i−1} = z_{i−1} + 2^{k−i} and M′_x = M_{x,y_{i−1},z_{i−1}+2^{k−i+1}}.) If i mod d = 0 then, during the next two Y-steps, the elements from both M′_0 and M′_1, together with the element e, are moved “horizontally” to P_{2i/d} (winning on the way). Thus, by Corollary 1, just before and after the X-step t_i, for x ∈ {0, 1}, the registers M_{x,y_i,z_i+2^{k−i}} are (k − l)-dense for v_i. This completes the proof of the claim.

The next claim shows how the values v_l or greater form a twice more condensed foot below e.

Claim (Condensing Claim). After the Z-step t′_l, e is (k − l − 1)-dense for v_l (and for v_{l+1}, after the δ-increase).

Consider the configuration just after the X-step t_l. The register M_{x_l,y_l,z_l} and, by the Breaking Claim, M_{0,y_l,z_l+2^{k−l}} and M_{1,y_l,z_l+2^{k−l}} are (k − l)-dense for v_l. Since the last step was an X-step, M_{1,y_l,z_l} is (k − l)-dense for v_l. Consider the following scenarios of the Z-step t′_l (see Figure 5):

Fig. 5. The scenarios of t′_l.

1. e remains in M_{0,y_l,z_l}: Then the register M_{0,y_l,z_l+2^{k−l−1}} becomes (k − l)-dense for v_l, by Lemma 3, since M_{1,y_l,z_l} was (k − l)-dense for v_l just before t′_l. Thus e becomes (k − l − 1)-dense for v_l, by Lemma 2.
2. e is moved from M_{1,y_l,z_l} to M_{0,y_l,z_l+2^{k−l−1}}: Then, by Corollary 1, e remains (k − l)-dense for v_l, and the register M_{0,y_l,z_l+2^{k−l}} remains (k − l)-dense for v_l. Thus e becomes (k − l − 1)-dense for v_l, by Lemma 2.
3. e remains in M_{1,y_l,z_l}: Then v(e) ≤ v(M_{0,y_l,z_l+2^{k−l−1}}) ≤ v(M_{1,y_l,z_l+2^{k−l−1}}) just before t′_l. (The second inequality is forced by the X-step t_l.) Hence, for x ∈ {0, 1}, R′_x = M_{x,y_l,z_l+2^{k−l−1}} was (k − l)-dense for v_l just before t′_l. During t′_l the register R′_1 is compared to M_{0,y_l,z_l+2^{k−l}}, so R′_1 remains (k − l)-dense for v_l. Since e was compared to R′_0, it also remains (k − l)-dense for v_l. By Lemma 2, e is (k − l − 1)-dense for v_l just after t′_l.
4.
e is moved from M_{0,y_l,z_l} to R′ = M_{1,y_l,z_l−2^{k−l−1}}: During t′_l, R′ was compared to M_{x_l,y_l,z_l}, and R′′ = M_{1,y_l,z_l} was compared to M_{0,y_l,z_l+2^{k−l−1}}, which was (k − l)-dense for v_l just before t′_l by the Breaking Claim applied to the element in R′. Thus, by Lemma 3, the registers R′ and R′′ remain (k − l)-dense for v_l just after t′_l. By Lemma 2, R′ is (k − l − 1)-dense for v_l just after t′_l.

Since there are no other scenarios for e, and the subsequent δ-increase is the same for all positive elements in P_{y_l}, the proof of the claim is completed.

By Corollary 1, the element e remains (k − l − 1)-dense for v_i, for i > l, since the other elements in its pair of columns with values v(e) or greater are now also (k − l − 1)-dense for v_i, and during Y-steps e is winning (right-running). For l ≥ k, “(k − l)-dense for v” means “0-dense for v”. The element e with v(e) = k + 1 + kδ is 0-dense for k + 1 + kδ. All the positive elements below it increase their values at the same rate as e. Thus, when v(e) reaches k + 2, it becomes 0-dense for k + 2. By repeating this reasoning for the values k + 2 and greater we complete the proof of Lemma 4. ✷

By Lemma 4, whenever any element e reaches the value k + 2 (in the pair of columns P_0) it is 0-dense for k + 2. Then, by the Breaking Claim, after the X-step after e reaches the value k + 2 + kδ, e is stored in a register M_{x,y,z} such that M_{0,y,z+1} is also 0-dense for k + 2 + kδ. Hence, all the elements following e in its pair of columns are 0-dense for k + 2 + kδ. By Corollary 1, this property of e remains valid forever. Since the network is symmetric, we have the following corollary:

Corollary 2. Consider a configuration in a pair of columns P_y just after an X-step. If, for some register R_i ∈ P_y, v(R_i) ≥ k + 2 + kδ, then, for all R_j ∈ P_y such that j ≥ i, we have v(R_j) ≥ k + 2 + kδ. If, for some register R_i ∈ P_y, v(R_i) ≤ −(k + 2 + kδ), then, for all R_j ∈ P_y such that j ≤ i, we have v(R_j) ≤ −(k + 2 + kδ).

Now it is enough to show that, after the last X-step of the first T steps, all right-running positive and all left-running negative elements have absolute values k + 2 + kδ or greater. Then in each pair of columns containing right-running elements the −1s are above the positive values, and in each pair of columns containing left-running elements the 1s are below the negative elements.

Lemma 5. If, after m Y-steps, the next k′(k + 1) + k Z-steps, and the next X-step, e is a left-running positive (respectively, right-running negative) element, then e remains left-running (respectively, right-running) forever.

Let e be positive. (The proof for e negative is analogous.) During each of the first m Y-steps, e was compared with positive right-running elements. For t ≥ 0, let y_t be such that e was in P_{y_t} just after the (t + 1)-st Y-step. For 0 ≤ i < m, let S_i (respectively, S′_i) denote the set of positive elements that were in P_{y_i} (respectively, P_{(y_i+1) mod 2m}) just after the (i + 1)-st Y-step. Let S′′ be the set of negative elements in P_{y_{m−1}} just after the m-th Y-step. For 0 ≤ i < m, |S_{m−1}| = 2 · 2^k − |S′′| ≤ |S′_i|, since S_{m−1} ⊆ S_i and |S_i| ≤ |S′_i|. Note that, for all t ≥ m, during the (t + 1)-st Y-step, the pair of columns containing (left-running) S′′ is compared with the pair of columns containing (right-running) S′_{t mod m}.
After the next k′(k + 1) + k Z-steps all the elements of S′′ have values −(k + 2 + kδ) or less, and, for 0 ≤ i < m, the elements of S′_i have values k + 2 + kδ or greater (they have walked at least k + 1 times through the critical comparators and then increased their values by δ at least k times during Z-steps). Let t′ be the next X-step. Let t be any Y-step after t′ such that e is still in the same pair of columns as S′′. Before the step t, the elements in S′′ and in each S′_i were processed by an X-step after their absolute values had reached k + 2 + kδ. Hence, by Corollary 2, just before the Y-step t, all the final |S′_i| registers of the pair of columns containing S′_i store values k + 2 + kδ or greater, and the pair of columns containing S′′ has all its initial |S′′| registers filled with values −(k + 2 + kδ) or less. Thus e is stored in one of the remaining 2 · 2^k − |S′′| final registers and, during the Y-step t, e is compared with a value k + 2 + kδ or greater and must remain left-running. ✷

The depth of N is 2d + 2. Each iteration of N performs two Y-steps as its last steps. Thus the first m Y-steps are performed during the first (2d + 2)⌈m/2⌉ steps. Each iteration of N performs d Z-steps. Thus, the next k′(k + 1) + k Z-steps are performed during the next (2d + 2)⌈(k′(k + 1) + k)/d⌉ steps. After the next X-step, t′, by Lemma 5, the set S of positive right-running and negative left-running elements remains fixed. After the next ⌈(k′(k + 1) + k)/d⌉ iterations the absolute values of the elements in S are k + 2 + kδ or greater. (t′ was the first step of these iterations.) After the first X-step of the next iteration, by Corollary 2, in all pairs of columns the negative values precede the positive values. We can now replace the negative values with zeroes and the positive values with ones, and, by the zero-one principle, we have all the pairs of columns sorted. (Note that, by the definition of N, once all the pairs of columns are sorted, they remain sorted forever.) We can estimate the number of steps by T ≤ (2d + 2)(⌈m/2⌉ + 2⌈(k′(k + 1) + k)/d⌉) + 1. Recall that d = ⌈k/m⌉. It can be verified that T ≤ 4k^2 + 8mk + 7k + 14m + 6k/m + 13. This completes the proof of Theorem 1.

Remarks: Note that the network N_{1,k} can be simplified to a periodic sorting network of depth 2 log n by removing the Y-steps and merging P_0 with P_1. However, better networks exist [3], with depth log n, that sort in log n iterations. Note also that the arrangement of the registers in the matrix M can be arbitrary. We can select the one that is most suitable for the subsequent merging.

Acknowledgments. I would like to thank Miroslaw Kutylowski for his useful suggestions and comments on this paper.

References

1. M. Ajtai, J. Komlós and E. Szemerédi. Sorting in c log n parallel steps. Combinatorica, Vol. 3, pages 1–19, 1983.
2. K. E. Batcher. Sorting networks and their applications. Proceedings of the 32nd AFIPS, pages 307–314, 1968.
3. M. Dowd, Y. Perl, L. Rudolph, and M. Saks. The periodic balanced sorting network. Journal of the ACM, Vol. 36, pages 738–757, 1989.
4. M. Kik. Periodic correction networks. Proceedings of Euro-Par 2000, Springer-Verlag, LNCS 1900, pages 471–478, 2000.
5. M. Kik, M. Kutylowski and G. Stachowiak. Periodic constant depth sorting network. Proceedings of the 11th STACS, Springer-Verlag, LNCS 775, pages 201–212, 1994.
6. M. Kutylowski, K. Loryś and B. Oesterdiekhoff. Periodic merging networks.
Proceedings of the 7th ISAAC, pages 336–345, 1996.
7. M. Kutylowski, K. Loryś, B. Oesterdiekhoff, and R. Wanka. Fast and feasible periodic sorting networks. Proceedings of the 35th IEEE-FOCS, 1994.
8. D. E. Knuth. The Art of Computer Programming. Volume 3: Sorting and Searching. Addison-Wesley, 1973.
9. De-Lei Lee and K. E. Batcher. A multiway merge sorting network. IEEE Transactions on Parallel and Distributed Systems 6, pages 211–215, 1995.
10. U. Schwiegelshohn. A short-periodic two-dimensional systolic sorting algorithm. IEEE International Conference on Systolic Arrays, pages 257–264, 1988.

Fast Periodic Correction Networks

Grzegorz Stachowiak
Institute of Computer Science, University of Wrocław, Przesmyckiego 20, 51-151 Wrocław, Poland
gst@ii.uni.wroc.pl

Abstract. We consider the problem of sorting N-element inputs differing from already sorted sequences on t entries. To perform this task we construct a comparator network that is applied periodically. The two constructions for this problem made by previous authors required O(log N + t) iterations of the network. Our construction requires O(log N + (log log N)^2 (log t)^3) iterations, which makes it faster for t ≫ log N.
Keywords: sorting network, comparator, periodic sorting network.

1 Introduction

Sorting is one of the most fundamental problems of computer science. A classical approach to sorting a sequence of keys is to apply a comparator network. Apart from a long tradition, comparator networks are particularly interesting due to hardware implementations. They can also be implemented as sorting algorithms for parallel computers. In our approach sorted elements are stored in registers r_1, r_2, ..., r_N. Registers are indexed with integers or elements of other linearly ordered sets. A comparator [i : j] is a simple device connecting registers r_i and r_j (i < j). It compares the keys they contain and, if the key in r_i is bigger, it swaps the keys.
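As a minimal illustration (ours, not from the paper) of the device [i : j] acting on the registers:

```python
def comparator(r, i, j):
    """The device [i : j] with i < j: if the key in r[i] is bigger, swap the keys.
    Registers are 1-indexed in the paper; a Python list is 0-indexed here."""
    if r[i] > r[j]:
        r[i], r[j] = r[j], r[i]

# A fixed comparator sequence sorting any 3 keys:
r = [3, 1, 2]
for i, j in [(0, 1), (1, 2), (0, 1)]:
    comparator(r, i, j)
assert r == [1, 2, 3]
```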
The general problem is the following. At the beginning of the computations the input sequence of keys is placed in the registers. Our task is to sort the sequence of keys according to the linear order of register indices by applying a sequence of comparators. The sequence of comparators is the same for all possible inputs. We assume that comparators connecting disjoint pairs of registers can work in parallel. Thus we arrange the sequence of comparators into a series of layers, which are sets of comparators connecting disjoint pairs of registers. The total time needed by such a network to sort a sequence is proportional to the number of layers, called the network’s depth. Much research concerning sorting networks was done in the past. The most famous results are the asymptotically optimal AKS [1] sorting network of depth O(log N) and the more ‘practical’ Batcher [2] network of depth ∼ (1/2) log^2 N (from now on all the logarithms are binary). Some research was devoted to problems concerning periodic sorting networks. Such a comparator network is applied not once but many times in a series of iterations. The input of the first iteration is the sequence to be sorted. The input of the (i + 1)-st iteration is the output of the i-th iteration. The output of the last iteration should always be sorted. The total time needed to sort an input sequence is the product of the number of iterations and the depth of the network. Constructing such networks, especially of small constant depth, gives hope to reduce the amount of hardware needed to build sorting comparator networks. It can be done by applying the same small chip many times to sort an input. We can also view such a network as a building block of a sorting network in which layers are repeated periodically. The main results concerning periodic sorting networks are presented in the table:

                      depth    # iterations
DPS [3]               log N    log N
Schwiegelshohn [15]   8        O(√N log N)
KKS [5]               O(k)     O(N^{1/k})
Loryś et al. [9]      3–5      O(log^2 N)

The last row of this table requires some words of explanation. The paper [9] describes a network of depth 5, but a later paper [10] reduces this value to 3. The number of iterations O(log^2 N) is achieved by periodification of the AKS sorting network, for which the constant hidden behind the big O is very big. Periodification of the Batcher network requires fewer iterations for practical sizes of the input, though it requires time O(log^3 N) asymptotically. It is not difficult to show that 3 is the minimal depth of a periodic sorting network which requires o(N) iterations to sort an arbitrary input. A sequence obtained from a sorted one by t changes, each being either a swap of a pair of elements or a change at a single position, we call t-disturbed. We define a t-correction network to be a specialized network sorting t-disturbed inputs. Such networks were designed to obtain a sorted sequence from an output produced by a sorting network having t faulty comparators [14,11,16]. There are also other potential applications in which we have to deal with sequences that do not differ much from a sorted one. Let us consider a large sorted database with N entries. In some period of time we make t modifications of the database and want to have it sorted back. It can be more effective to use a specialized correction unit in such a case than to apply a sorting network. Results concerning such correction networks are presented in [4,16]. There was some interest in constructing periodic comparator networks of constant depth that sort t-disturbed inputs. The reason is that the fastest known constant depth periodic sorting networks have running time O(log^2 N). On the other hand, in some applications faster correction networks can replace sorting networks. Two periodic correction networks were already constructed by Kik and Piotrów [6,12]. The first of them has depth 8 and the other has depth 6. Both of them require O(log N + t) iterations for the considered inputs, where N is the input size and t is the number of modifications. The running time is O(log N) for t = O(log N), and the constants hidden behind the big O are small. Unfortunately it is not known how fast these networks complete sorting if t ≫ log N. In this paper we construct a periodic t-correction network to deal with t in the range log N ≪ t ≪ N. The reason we assume that t is small in comparison to N is the following. If t is about the same as N, then the periodification scheme gives a practical periodic sorting network of depth 3 requiring O(log^3 N) = O(log^3 t) iterations. Actually we do not hope to get better performance in such a case. Our network has depth 3 and running time O(log N + (log log N)^2 (log t)^3). We should mention that in our construction we do not use the AKS sorting network. If this network was used (also in the auxiliary construction of a non-periodic t-correction network), we would get the running time O(log N + (log log N)(log t)^2).
In that case the AKS constant would stand in front of (log log N)(log t)^2. We now recall a couple of useful properties of comparator networks. The first of them is a general property of all comparator networks. Assume we have two inputs for a fixed comparator network. We say that we have the relation (x_1, x_2, ..., x_N) ≤ (y_1, y_2, ..., y_N) between these inputs if for all i we have x_i ≤ y_i.

Lemma 1.1. If we apply the same comparator network to inputs for which we have (x_1, x_2, ..., x_N) ≤ (y_1, y_2, ..., y_N), then this relation is preserved for the outputs.

The analysis of sorting networks is most often based on the following lemma [7].

Lemma 1.2 (zero–one principle). A comparator network is a sorting network if and only if it can sort any input consisting only of 0s and 1s.

This lemma is the reason why from now on we consider inputs consisting only of 0s and 1s. Thus we consider only t-disturbed sequences consisting of 0s and 1s. We note that a 0-1 sequence x_1, ..., x_N is t-disturbed if for some index b, called the border, at most t entries in x_1, ..., x_b are 1s and at most t entries in x_{b+1}, ..., x_N are 0s. These 1s (0s) we call displaced. Let us recall the proof of the zero–one principle. The input consists of arbitrary elements, and we prove that the comparator network sorts it. We consider an arbitrary element a from this input and show that it gets to the register corresponding to its rank in the sequence. We replace the elements bigger than a by 1 and the smaller ones by 0. Indeed, the only difference between the outputs for the sequences where a itself is replaced by 0 or by 1, respectively, is the register with the index corresponding to rank(a). Now we deal with an arbitrary t-disturbed input. We transform it to a t-disturbed 0-1 sequence as in the proof of the zero–one principle. This gives us a useful analog of the zero–one principle for t-correction networks.

Lemma 1.3. A comparator network is a t-correction network if it can sort any t-disturbed input consisting of 0s and 1s.

We define the dirty area for a 0-1 sequence stored in the registers during the computations of a comparator network. The dirty area is the smallest set of subsequent registers such that all registers with lower indices contain 0s and all registers with bigger indices contain 1s. A specialized comparator network that sorts any input having a dirty area of a given size we call a cleaning network.
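Lemmas 1.2 and 1.3 suggest a brute-force sanity check that is convenient for small examples. The sketch below is our illustration: it enumerates all 0-1 inputs for the zero–one principle, and samples t-disturbed 0-1 inputs with a chosen border.

```python
import random
from itertools import product

def sorts_all_01(layers, N):
    """Zero-one principle (Lemma 1.2): verify all 2^N inputs of 0s and 1s.
    Exponential, so only usable for tiny N."""
    for bits in product((0, 1), repeat=N):
        v = list(bits)
        for layer in layers:
            for i, j in layer:
                if v[i] > v[j]:
                    v[i], v[j] = v[j], v[i]
        if any(v[i] > v[i + 1] for i in range(N - 1)):
            return False
    return True

def random_t_disturbed_01(N, t):
    """A sorted 0-1 sequence (0s up to a border b, then 1s) with at most t
    single positions changed: a t-disturbed input in the sense of Lemma 1.3."""
    b = random.randrange(N + 1)
    v = [0] * b + [1] * (N - b)
    for _ in range(t):
        v[random.randrange(N)] = random.randrange(2)
    return v
```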
2 Periodic Sorting Networks

In this section we recall the periodification scheme of [9]. Actually, what we present is closer to the version of this scheme described by Oesterdiekhoff [10], which produces a network of depth 3. In comparison to the previous authors we change the construction of the Schwiegelshohn edges and embed only single copies of the sorting and merging networks. The analysis of the network is almost the same as in the abovementioned papers and we do not show it. The periodification scheme is a method to convert a non-periodic sorting network having T(p) layers for input size p into a periodic sorting network of depth 3. This periodic network sorts any input containing Θ(pT(p)) items in O(T(p) log p) iterations. We take advantage of the fact that for any sorting network T(p) = Ω(log p). The periodification scheme applied to the Batcher sorting network gives a periodic sorting network which needs O(log^3 N) iterations to sort an arbitrary input of size N. If we put the AKS sorting network into this scheme, we get a periodic sorting network requiring O(log^2 N) iterations, which is (due to the very large constants in AKS) a worse solution for practical N. In the periodification scheme registers are indexed with pairs (i, j), 1 ≤ i ≤ p, 1 ≤ j ≤ q, ordered lexicographically. Thus we view these registers as arranged in a rectangular matrix of p rows and q columns. We have the rows with the smallest indices i at the ‘top’ and those with the biggest indices at the ‘bottom’ of the array. We also view the columns with the smallest indices j to be on the left hand side and those with the biggest indices to be on the right hand side. The parameter q = 10(T(p) + log p) is an even number (for simplicity from now on we write log p instead of ⌈log p⌉). The periodic sorting network consists of three subsequent layers A, B and C. The layers A and B, which are layers of the odd-even transposition sort network, are called horizontal steps. They are the sets of comparators:

A = {[(i, 2j − 1) : (i, 2j)] | i, j = 1, 2, ...}
B = {[(i, 2j) : (i, 2j + 1)] | i, j = 1, 2, ...} ∪ {[(i, q) : (i + 1, 1)] | i = 1, 2, ...}

The edges of A and B connecting registers of the same row we call horizontal. The layers A, B alone sort any input, but in general the time to do it is very long. Defining the layer C, called the vertical step, is much more complicated. We first divide the columns of registers into six subsequent areas: S, M_L, X_L, Y, X_R, M_R. Each of the areas contains an even number of columns. The first two columns form the area S, where the so-called ‘Schwiegelshohn’ edges are located. The columns with numbers 3, 4, ..., 2 log p + 2 are in the area M_L. The next 2T(p) columns form the area X_L. The last 2 log p columns are contained in the area M_R. The area X_R consists of the 2T(p) columns directly preceding M_R. And the area Y contains all the columns between X_L and X_R. We now say where the comparators of layer C are in each area. In area S the comparators form the set

{[(2i − 1, 1) : (2i, 2)] | i = 1, 2, ...}

Note that this way of defining the “Schwiegelshohn” edges differs from the one described in previous papers on this subject. The comparators of C in all other areas, unlike those in S, always connect registers in the same column. There are no comparators in area Y on layer C. In each of the areas M_L and M_R we embed a single copy of a network which merges two sorted sequences of length p/2. In this network’s input of length p the even indexed entries form one sequence and the odd indexed entries form the other. We also assume that the sequence in the odd indexed entries does not have more 1s than the one contained in the even indexed entries.

Fig. 1. Areas defined to embed the C-layer. Arrows indicate the order of layers of the embedded networks.

A comparator network merging two such sequences is the series of layers L_1, L_2, ..., L_{log p − 1} where L_i = {[2j : 2j + 2^{log p − i} − 1] | j = 1, 2, ...}. Thus the set of comparators in M_L is equal to {[(k, 2j + 1) : (l, 2j + 1)] | [k : l] ∈ L_j, j = 1, 2, ...}. The set of comparators in M_R is equal to {[(k − 1, q − 2j + 2) : (l − 1, q − 2j + 2)] | [k : l] ∈ L_j, j = 1, 2, ...}. For technical reasons the network embedded in M_R is moved one row up. Finally we define the comparators in X_L and X_R. These comparators are an embedding of a single sorting network in each area. Let this sorting network be the series of layers L′_1, L′_2, ..., L′_{T(p)}.
Let j_L = 2 + 2 log p + 2T(p) be the last column of X_L. The set of comparators in X_L is equal to {[(k, j_L − 2(j − 1)) : (l, j_L − 2(j − 1))] | [k : l] ∈ L′_j, j = 1, 2, ...}. Analogously, if j_R = q − 2 log p − 2T(p) + 1 is the first column of X_R, then the set of comparators in X_R is equal to {[(k, j_R + 2(j − 1)) : (l, j_R + 2(j − 1))] | [k : l] ∈ L′_j, j = 1, 2, ...}. The edges connecting registers in the same column we call vertical. Almost all the edges of step C are vertical; only the slanted edges in S are not. Our aim in the analysis of the network obtained in the periodification scheme is to prove that it sorts any input in O(T(p) log p) steps. The proof easily follows from the key lemma.

Lemma 2.1 (key lemma). There exist constants c and d such that after d · q steps
– the bottom c · p rows contain only 1s if there are more 1s than 0s in the registers;
– the top c · p rows contain only 0s if there are more 0s than 1s in the registers.

Indeed, if we consider only the rows containing the dirty area in the key lemma, then this area is guaranteed to be reduced by a constant factor within O(q) steps. Thus, applying the key lemma O(log p) times, we reduce this area within O(q log p) steps to a constant number of rows. The next O(q) steps sort such a reduced dirty area. We do not describe the proof of the key lemma, but we define some notions from it to use them further in the paper. In this proof it is assumed that two 1s or two 0s compared by a horizontal edge are exchanged. In a given moment of the computations we call an item (i.e. a 0 or 1) right-running (left-running) if it is placed in the right (left) register by a horizontal edge of the recently executed horizontal step. We can extend this definition to the wrap-around edges of layer B in a natural way, saying that they put right-running items in the first column and left-running items in the last. A column containing right-running (left-running) items is called an R-column (L-column). Analyzing the network we can observe the ‘movement’ of R-columns of 1s to the right and of L-columns of 0s to the left. Thus any column is alternately an L-column and an R-column, and the change occurs during every horizontal step. The only exception are the two columns of S. From the proof of the key lemma it also follows that we have the following property.

Fact 2.2 Assume we add any vertical edges to the layer C in area Y. For such a new network the key lemma still holds.

Now we modify the periodification scheme step by step to obtain at the end a periodic t-correction network. First we introduce a construction of a periodic cleaning network sorting any N-element input with a dirty area of size qt, q ≥ 10(T(2t) + 2 log t). In this construction the registers are arranged into q columns and the dirty area is contained in t subsequent rows. This network needs O(q log t) iterations to do its job. The construction of the periodic correction network is based on this cleaning network. We first build a simple non-periodic cleaning network.

Lemma 2.3. Assume we have a sorting network of depth T(t) for input size t. We can construct a comparator network of depth T′(t) = T(2t) + log t which sorts any input with a dirty area of size t.

Proof. We divide the set of all registers r_1, r_2, ..., r_N into N/t disjoint parts, each consisting of t subsequent registers. Thus we obtain the part P_1 containing the registers r_1, ..., r_t, P_2 containing r_{t+1}, ..., r_{2t}, P_3 containing r_{2t+1}, ..., r_{3t}, and so on. The cleaning network consists of two steps.
First we have networks sorting the keys in P_{2i} ∪ P_{2i+1} for each i. This requires T(2t) layers. Then we have networks merging the elements in P_{2i−1} with those in P_{2i} for each i. This requires log t layers of the network.

Now we can build a periodic cleaning network. We do it by substituting the sorting network in the periodification scheme with the cleaning network described above. This way we can reduce X_L and X_R to 2T′(t) columns. We also reduce M_L and M_R to 2 log t columns, by embedding only the log t last layers of the merging network instead of the whole merging network applied in the periodification scheme. These layers are (after relabeling) L_1, L_2, ..., L_{log t} where L_i = {[2j + 1 : 2j + 2^{log t − i + 1}] | j = 1, 2, ...}. They merge any two sequences that do not differ by more than t/2 1s. So instead of a sorting network we use a cleaning one, and we reduce the merging network. Such reduced sorting and merging networks are not distinguishable from the original merging and sorting networks as long as we deal only with inputs having dirty areas of size at most qt. The analysis of such a periodification scheme for cleaning networks is the same as the original one for sorting networks and gives us the following fact.

Lemma 2.4. The periodic cleaning network described above has depth 3 and sorts any input with a dirty area having t rows in O(q log t) iterations.

One can notice that there are no edges of layer C in Y in this construction. If we add to layer C any vertical edges in Y, or any other edges with the difference between the row numbers of the end registers bigger than t, then the network remains a cleaning network. Roughly speaking, by adding such edges we are going to transform the periodic cleaning network into a periodic t-correction network.

3 Main Construction

In this section we define our periodic t-correction network. To do it we need another non-periodic comparator network. We call it a (t, ∆, δ)-semi-correction network. If a t-disturbed input with a dirty area of size ∆ is processed by such a network, then the dirty area size is guaranteed to be reduced to δ. We now present a rather unoptimized construction of a (t, ∆, δ)-semi-correction network. We divide the set of all registers r_1, r_2, ..., r_N into N/∆ disjoint parts, each consisting of ∆ subsequent registers. Thus we obtain the part P_1 containing the registers r_1, ..., r_∆, P_2 containing r_{∆+1}, ..., r_{2∆}, P_3 containing r_{2∆+1}, ..., r_{3∆}, and so on. The construction consists of two steps. In step 1 we give new indices to the registers of each sum P_{2k} ∪ P_{2k+1}, k = 1, 2, .... These indices are lexicographically ordered pairs (i, j), 1 ≤ i ≤ 2t∆/δ, 1 ≤ j ≤ δ/(2t). The ordering of the new indices is the same as the main ordering of the indices. We apply a t-correction network to each column j of each sum separately. This way we obtain a dirty area of size at most δ in each sum. In step 2 we repeat the construction from step 1 for the sums P_{2k−1} ∪ P_{2k}. Because any dirty area of size ∆ is contained in one of the sums P_l ∪ P_{l+1} from step 1 or 2, this dirty area is reduced to size δ. Thus we get the following lemma.

Lemma 3.1. Let t ≪ δ and t ≪ ∆/δ. There exists a (t, ∆, δ)-semi-correction network of depth O(log x + (log t log log x)^2), where x = ∆/δ.

Proof. Descriptions of t-correction networks of depth O(log N + (log t log log N)^2) (N is the input size) can be found in [4,16]. We apply such a network in the construction presented above and obtain a semi-correction network with the desired depth. The simple calculations are left to the reader.
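A direct, unoptimized transcription of the two-step construction can be sketched as follows. This is our illustration only: plain sorting of each column stands in for the depth-efficient t-correction networks of [4,16], and the index bookkeeping follows the row-major reading of the pairs (i, j).

```python
def semi_correction_step(v, Delta, delta, t, offset):
    """Process each sum of two adjacent Delta-sized parts starting at `offset`
    (offset=Delta pairs P2 u P3, P4 u P5, ...; offset=0 pairs P1 u P2, P3 u P4, ...).
    The registers of a sum are viewed as a matrix with delta/(2t) columns in
    row-major order; each column gets a t-correction pass (here simply sorted).
    Assumes delta is a multiple of 2*t."""
    cols = delta // (2 * t)
    for base in range(offset, len(v) - 2 * Delta + 1, 2 * Delta):
        for j in range(cols):
            idx = list(range(base + j, base + 2 * Delta, cols))  # column j
            for i, val in zip(idx, sorted(v[i] for i in idx)):
                v[i] = val

def semi_correction(v, Delta, delta, t):
    semi_correction_step(v, Delta, delta, t, offset=Delta)  # step 1
    semi_correction_step(v, Delta, delta, t, offset=0)      # step 2
```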
Now at last we get to the main construction of this paper. We assume that log N ≪ t ≪ N and want to construct an efficient periodic t-correction network. Without loss of generality we assume that t is even. Let S(N, t) = O(log N/log t + (log log N)^2 (log t)^2) be the maximum depth of a (t, ∆, δ)-semi-correction network for x = ∆/δ = N^{1/log t}. As before, T(t) is the depth of a sorting network. In our construction the registers are arranged into an array of q columns and N/q rows, where

q = max{10(T(4t + 4) + 2 log t), 4(T(4t + 4) + 2 log t) + 2S(N, t)}.

The rows of this array are divided into N/(pq) floors, which are sets of p = 4t + 4 subsequent rows. So floor 1 consists of rows 1, 2, ..., p, floor 2 of rows p + 1, p + 2, ..., 2p, and so on. We use the notions of ‘bottom’ and ‘top’ registers from the proof of the key lemma. Thus we divide each floor into two halves: top and bottom. They consist of the p/2 = 2t + 2 top and bottom rows of each floor, respectively. We define a family of floors to be the set of all floors whose indices differ by i · log t for some integer i. Altogether we have log t families of floors. To each family of floors we assign the index of its first floor. From now on we deal only with t-disturbed 0-1 input sequences. Any such sequence has a border index b. The b-th register we call the border register. Its row we call the border row. Its floor we call the border floor. In the analysis we take into account only the behavior of displaced 1s. Due to the symmetry of the network, the analysis for displaced 0s is the same and can be omitted. We begin by defining a particular kind of periodic cleaning network, which the whole construction is based on. By adding comparators to this network we finally obtain a periodic t-correction network. The periodic cleaning network is constructed in a similar way as the one in the previous chapter. Above all we want to have some relation between the vertical edges in the areas X_L and X_R and the division of rows into floors. These comparators are embeddings of a cleaning network for dirty area p/2 = 2t + 2 in each area. Note that such a network also sorts any input with a dirty area of size t, so it can be used in the construction of a periodic cleaning network for t dirty rows. The cleaning network consists of three subsequent parts. The first part are sorting networks, each sorting a group of p subsequent registers corresponding to a single floor. This part has depth T(p). The second part consists of merging networks which merge the neighboring upper and lower halves of each pair of subsequent groups from the first part. It has depth log p. The third part is a last layer which we can add arbitrarily, because any layer of comparators does nothing to a sorted sequence. This layer is defined a bit later in the paper. The parts S, M_L, M_R are defined exactly the same way as earlier for a periodic cleaning network. So, as we proved previously, the periodic network we have now defined is a cleaning network for dirty areas consisting of at most t rows, and the following key lemma describing its running time holds.

Lemma 3.2 (key lemma). Consider t′, t′ ≤ t, subsequent rows of the above defined network, such that above (below) these rows there are only 0s (1s), and suppose we have a majority of 0s (1s) in these rows. There exist constants c and d such that after d · q steps the top (bottom) c · t′ of these t′ rows contain only 0s (1s).

Note that if we add to C any edges in Y, or edges connecting rows whose difference is bigger than t, then the key lemma still holds, and so all its consequences hold too.
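For concreteness, the row/floor/family bookkeeping introduced at the start of this section can be written down as follows; this is our illustration, with rows and floors 1-indexed as in the text and log t assumed to be an integer.

```python
def floor_of_row(row, t):
    """Each floor consists of p = 4*t + 4 subsequent rows; floor 1 is rows 1..p."""
    p = 4 * t + 4
    return (row - 1) // p + 1

def family_of_floor(floor, log_t):
    """Floors whose indices differ by a multiple of log t form one family,
    labelled by the index of its first floor (1 .. log t)."""
    return (floor - 1) % log_t + 1
```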
Note that if we add to C any edges in Y, or edges connecting rows whose difference is bigger than t, then the key lemma still holds, and so do all its consequences. We prove the following lemma.

Lemma 3.3. The periodic cleaning network described above sorts the inputs under consideration with dirty areas of a · t rows in O(qa + q log t) iterations.

Proof. If the number of rows in the dirty area is smaller than t, then the standard reasoning for periodic sorting networks works. We only need to consider what happens if the number of rows in the dirty area is bigger than t. If at least the highest t/2 rows of the dirty area lie above the border row, then we can apply the key lemma to these rows. Since the input is t-disturbed, the majority of items in these rows are 0s, so we obtain ct/2 top rows of 0s in time dq. Thus the dirty area is reduced by ct/2 rows. In the opposite case, an analogous reasoning applies to the t/2 lowest rows, where 1s are in the majority.

Now we add some comparators to layer C so that our network gains new properties. First we add in area S the comparators {[(2i, 1) : (2i + t + 1, 2)] | i}. To formulate the fact which follows from the presence of these comparators, we must specify what exactly we mean by right-running items. In the proof of the key lemma, right-running items were those 0s and 1s which were on the right side of a horizontal edge after step A or B. We redefine this notion so that in area S items become right-running by going right in step C rather than in step A, that is, just after this step. Analogously we redefine left-running items. We say that two displaced 1s or two displaced 0s are swapped by an edge if it is an edge of step B, a slanted edge of step C, or an edge of step A not belonging to area S. Displaced 1s are not swapped with non-displaced 1s. We can now formulate a simple property of our network that is preserved when we add edges.

Fact 3.4. In the network defined above, right-running displaced 1s remain right-running as long as they are more than t + 1 rows above the border row.

For a while we now assume that we deal only with displaced 1s that are more than one floor above the border. Recall that R-columns and L-columns after a given step are the columns containing right-running and left-running items, respectively. Note that an R-column which reaches column jR while moving through XR is first sorted separately on each floor by the first part of the cleaning network. Next, the displaced 1s of each floor are moved half a floor down by the second part. An analogous process is performed for left-running 1s in XL as long as they remain left-running. Thus, after the second part of their way through XR, right-running displaced 1s are located at the bottom of the top half of each floor above the border floor. Analogously, just before the last layer embedded in XL, left-running displaced 1s are also moved to the bottom of the top half of each floor. We should now specify what the additional layer in the third part of XR does. Formally speaking, this layer is the set of comparators

{[(kp + p/2 + 2i) : (kp + p/2 − 2i + 1)] | 0 < i ≤ t/2}.

It moves the right-running displaced 1s that went through XR to odd-indexed rows in the middle of each floor. Analogously, the last layer embedded in XL is

{[(kp + p/2 + 2i − 1) : (kp + p/2 − 2i)] | 0 < i ≤ ⌈t/2⌉}.

It moves left-running displaced 1s to even-indexed rows in the middle of each floor. For the moment, let us call these rows the starting rows of these 1s.
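The two comparator sets above are easy to enumerate mechanically. The sketch below (my function names; the register indices follow the displayed formulas for a floor k) generates them, with ⌈t/2⌉ realized as (t + 1) // 2.

    def last_layer_XR(k, p, t):
        # moves right-running displaced 1s to odd-indexed rows mid-floor
        return [(k * p + p // 2 + 2 * i, k * p + p // 2 - 2 * i + 1)
                for i in range(1, t // 2 + 1)]

    def last_layer_XL(k, p, t):
        # moves left-running displaced 1s to even-indexed rows mid-floor
        return [(k * p + p // 2 + 2 * i - 1, k * p + p // 2 - 2 * i)
                for i in range(1, (t + 1) // 2 + 1)]

    t = 4; p = 4 * t + 4
    print(last_layer_XR(0, p, t))   # [(12, 9), (14, 7)]
    print(last_layer_XL(0, p, t))   # [(11, 8), (13, 6)]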
All these right-running displaced 1s then pass MR, S, ML and XL without being moved by the vertical edges in MR, ML, XL. Note that they encounter vertical edges only in ML, and there they are at the bottom ends of these edges. The same happens to left-running 1s when they pass ML, S, MR and XR. After passing XL, each right-running 1 is t + 2 rows below its starting (odd) row. After passing XR, each left-running 1 is 2 rows above its starting (even) row. These 1s are still on the same floors as their starting rows. Similar facts can be proved for displaced 0s below the border, which are also moved by the last layers of XL and XR described above.

Now we define the vertical edges added in area Y of layer C. These comparators are embeddings of four semi-correction networks in each family of floors. We describe the comparators embedded in the r-th family of floors. Let a1, a2, . . . , a_{2N/(q log t)} be the indices of the odd rows in this family of floors. We can build a (t, N^{1−(r−1)/log t}, N^{1−r/log t})-semi-correction network on the registers with these indices. By our assumption about q, the depth of this network is at most the number of odd-indexed columns in Y. Let this network be the sequence of layers L1, L2, L3, . . .. The first set of comparators is

{[(jL + 2j − 1, k) : (jL + 2j − 1, l)] | [k : l] ∈ Lj}.

We assumed that after passing XL right-running 1s are in odd rows. Assume that they can be present only in the N^{1−(r−1)/log t} odd rows of the r-th family directly above the border. When they pass Y, they can be present only in the N^{1−r/log t} odd rows of the r-th family directly above the border. Passing Y in family r = log t finally causes these 1s to reach some of the t odd rows of this family directly above the border. We formulate this assertion as a fact later, because we need some additional assumptions. Analogously, we embed the same network once again to deal with left-running 1s, which are in even rows. Formally speaking, we add to C the following set of comparators:

{[(jR − 2j + 1, k + 1) : (jR − 2j + 1, l + 1)] | [k : l] ∈ Lj}.

This set of edges again causes left-running 1s lying in the N^{1−(r−1)/log t} even rows of the r-th family directly above the border to reduce the number of such rows between them and the border to at most N^{1−r/log t}. Analogously, we also embed two copies of a (t, N^{r/log t}, N^{(r−1)/log t})-semi-correction network to deal with displaced 0s below the border row.

Fig. 2. Comparator networks embedded on a single floor.

We have described the whole network and, informally, the way it works. To make the analysis more formal, we assign colors to displaced 1s. We use five colors: blue, black, red, yellow and green. Let β be the index of the border floor. We assume the following rules for coloring displaced 1s:

– At the beginning the color of all displaced 1s is blue.
– If a blue 1 is compared with a non-blue 1 by a vertical edge, then the blue 1 behaves like a 0.
– When any 1 gets to a floor with index not smaller than β − 1, it changes its color to green.
– When a right-running non-blue 1 gets to floor β − 2, it changes its color to green.
– When a non-green left-running 1 becomes right-running, it becomes blue.
– When a blue 1 moves from Y to outside of Y, it changes its color to black.
– When a black 1 enters Y from outside of Y on a floor belonging to family 1, it changes its color to red.
– When a red 1 leaves Y on a floor in the last family of floors (family log t), it becomes yellow.

First we prove that all green 1s stay close to the border. They remain at all times on floors with indices not smaller than β − 2, so they are at most 13t rows above the border row. Notice that right-running 1s can only move to lower rows. Left-running 1s can move to higher rows only in area S of layer C and via the wrap-around edges of layer B, so only left-running 1s can go up from floor β − 2. In every q horizontal steps a left-running 1 can go up by at most t + 2 rows. On the other hand, in every q horizontal steps it passes XL once, and passing XL it moves to the t-th row or lower, counting from the bottom of floor β − 2. Thus, going up by at most t + 2 rows, it cannot leave floor β − 2. Moreover, because our network is a periodic correction network for dirty areas of t rows, we have the following fact.

Fact 3.5. If all displaced 1s are green, then the time to sort all the items above the border is O(q log t).

Now we consider a right-running blue 1, or a left-running blue 1 under the assumption that it stays left-running. From what we said before, a right-running 1 stops being right-running only when it is green. We want to see how quickly it becomes green. After O(q) steps this 1 stops being blue; the worst case is that it becomes black. The following fact can be viewed as a summary of what we said when defining the comparators of the last columns of XL and XR. We use the fact that right-running 1s that have just switched from being left-running above floor β − 1 are blue. We also use the fact that right-running 1s which are more than t rows above the border do not become left-running. Such 1s are the only factor that could prevent the 1s we are interested in from going one floor down.

Fact 3.6. Any black, red or yellow right-running 1 on a floor higher than β − 1 that passes the areas XR, MR, S, ML, XL goes one floor down and ends up in an odd-indexed row. Any black, red or yellow left-running 1 on a floor higher than β that passes the areas XL, ML, S, MR and XR goes one floor down and ends up in an even-indexed row.

The comparators in Y connect only rows belonging to the same family of floors, so a displaced 1 passing Y does not change its family of floors. Thus we have the next fact.

Fact 3.7. Every q horizontal steps, a black or red 1 gets from family r to family r + 1; the exception is family r = log t, from which it gets to family 1.

So after at most q log t horizontal steps a black 1 becomes red, unless it has become green. We measure the distance of a red 1 in family r to the border as the number of rows that belong to family r and lie between this 1 and floor β − 2. Passing Y in family 1, a red 1 reduces this distance from at most N to at most 2N^{1−1/log t}. Then it gets to families 2, 3, . . . , log t − 1. Passing Y in family r, a red 1 reduces this distance from 2N^{1−(r−1)/log t} to 2N^{1−r/log t}. After passing Y in family log t, a red 1 is at distance at most 2t; in this way a red 1 becomes yellow after q log t horizontal steps. It is then at most log t + 2 floors above the border. A yellow right-running 1 goes at least one floor down every q horizontal steps, until it becomes green after at most q log t horizontal steps.
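The coloring rules can be summarized as a small state machine. The following toy Python transcription (entirely my own encoding; the event names are invented and the geometric side conditions are reduced to flags) is only meant to make the worst-case progression blue → black → red → yellow → green explicit.

    def recolor(color, event, on_floor_near_border=False):
        # any 1 reaching a floor with index >= beta - 1 turns green
        if on_floor_near_border:
            return "green"
        transitions = {
            ("blue", "leaves_Y"): "black",
            ("black", "enters_Y_in_family_1"): "red",
            ("red", "leaves_Y_in_last_family"): "yellow",
        }
        if event == "turns_right_running" and color != "green":
            return "blue"
        return transitions.get((color, event), color)

    c = "blue"
    for ev in ["leaves_Y", "enters_Y_in_family_1", "leaves_Y_in_last_family"]:
        c = recolor(c, ev)
    print(c)   # 'yellow'; a later recolor(c, "...", on_floor_near_border=True)
               # finishes the progression at 'green'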
The whole process of changing color from blue to green thus takes altogether 3q log t horizontal steps, and it always succeeds for right-running 1s. A left-running 1 can become right-running before it becomes green, but it must do so within the 3q log t horizontal steps in which it would have become green had it remained left-running; in that case it inevitably becomes green within the next 3q log t iterations as a right-running 1. Thus we have the following fact.

Fact 3.8. All displaced 1s become green after at most 6q log t horizontal steps.

Because inputs having only green 1s are quickly sorted, we get the main result of the paper.

Theorem 3.9. The periodic t-correction network defined in this paper sorts any t-disturbed input in O(q log t) iterations, which is O(log N + (log log N)^2(log t)^3).

Acknowledgments. The author wishes to thank Marek Piotrów and the other coworkers from the algorithms and complexity group of his institute for helpful discussions.

References

1. M. Ajtai, J. Komlós, E. Szemerédi, Sorting in c log n parallel steps, Combinatorica 3 (1983), 1–19.
2. K.E. Batcher, Sorting networks and their applications, in AFIPS Conf. Proc. 32 (1968), 307–314.
3. M. Dowd, Y. Perl, M. Saks, L. Rudolph, The Periodic Balanced Sorting Network, Journal of the ACM 36 (1989), 738–757.
4. M. Kik, M. Kutyłowski, M. Piotrów, Correction Networks, in Proc. of 1999 ICPP, 40–47.
5. M. Kik, M. Kutyłowski, G. Stachowiak, Periodic constant depth sorting networks, in Proc. of the 11th STACS, 1994, 201–212.
6. M. Kik, Periodic Correction Networks, in EUROPAR 2000 Proceedings, LNCS 1900, 471–478.
7. D.E. Knuth, The Art of Computer Programming, Vol. 3, 2nd edition, Addison-Wesley, Reading, MA, 1975.
8. J. Krammer, Lösung von Datentransportproblemen in integrierten Schaltungen, Dissertation, TU München, 1991.
9. K. Loryś, M. Kutyłowski, B. Oesterdiekhoff, R. Wanka, Fast and Feasible Periodic Sorting Networks of Constant Depth, in Proc. of the 35th IEEE-FOCS, 1994, 369–380.
10. B. Oesterdiekhoff, On the Minimal Period of Fast Periodic Sorting Networks, Technical Report TR-RI-95-167, University of Paderborn, 1995.
11. M. Piotrów, Depth Optimal Sorting Networks Resistant to k Passive Faults, in Proc. 7th SIAM Symposium on Discrete Algorithms (1996), 242–251 (also accepted for SIAM J. Comput.).
12. M. Piotrów, Periodic Random-Fault-Tolerant Correction Networks, in Proceedings of the 13th SPAA, ACM 2001, 298–305.
13. L. Rudolph, A Robust Sorting Network, IEEE Transactions on Computers 34 (1985), 326–336.
14. M. Schimmler, C. Starke, A Correction Network for N-Sorters, SIAM J. Comput. 18 (1989), 1179–1197.
15. U. Schwiegelshohn, A short periodic two-dimensional systolic sorting algorithm, in International Conference on Systolic Arrays, Computer Society Press, Baltimore, 1988, 257–264.
16. G. Stachowiak, Fibonacci Correction Networks, in Algorithm Theory – SWAT 2000, M. Halldórsson (Ed.), LNCS 1851, Springer 2000, 535–548.

Games and Networks

Christos Papadimitriou
The Computer Science Division, University of California, Berkeley, Berkeley, CA 94720-1776
christos@cs.berkeley.edu

Abstract. Modern networks are the product of, and arena for, the complex interactions between selfish entities. This talk surveys recent work (with Alex Fabrikant, Eli Maneva, Milena Mihail, Amin Saberi, and Scott Shenker) on various instances in which the theory of games offers interesting insights to networks.
We study the Nash equilibria of a simple and novel network creation game in which nodes/players add edges, at a cost, to improve communication delays. We point out that the heavy tails in the degree distribution of the Internet topology can be the result of a trade-off between connection costs and quality of service for each arriving node. We study an interesting class of games called network congestion games, and prove positive and negative complexity results on the problem of computing pure Nash equilibria in such games. And we show that shortest path auctions, which are known to involve huge overpayments in the worst case, are "frugal" in expectation in several random graph models appropriate for the Internet.

One-Way Communication Complexity of Symmetric Boolean Functions

Jan Arpe, Andreas Jakoby, and Maciej Liśkiewicz
Institut für Theoretische Informatik, Universität zu Lübeck
{arpe,jakoby,liskiewi}@tcs.uni-luebeck.de
(J. Arpe was supported by DFG research grant Re 672/3. Part of A. Jakoby's work was done while visiting International University Bremen, Germany. M. Liśkiewicz is on leave from Instytut Informatyki, Uniwersytet Wrocławski, Wrocław, Poland.)

Abstract. We study deterministic one-way communication complexity of functions with Hankel communication matrices. In this paper some structural properties of such matrices are established and applied to the one-way two-party communication complexity of symmetric Boolean functions. It is shown that the number of required communication bits does not depend on the communication direction, provided that neither direction needs maximum complexity. Moreover, in order to obtain an optimal protocol, it is in any case sufficient to consider only the communication direction from the party with the shorter input to the other party. These facts do not hold for arbitrary Boolean functions in general. Next, gaps between one-way and two-way communication complexity for symmetric Boolean functions are discussed. Finally, we give some generalizations to the case of multiple parties.

1 Introduction

The communication complexity of two-party protocols was introduced by Yao in 1979 [15]. The theory of communication complexity has evolved into an important branch of computational complexity (for a general survey of the theory see e.g. Kushilevitz and Nisan [9]). In this paper we consider one-way communication, i.e. we restrict the communication to a single round. This simple model has been investigated by several authors for different types of communication such as fully deterministic, probabilistic, nondeterministic, and quantum (see e.g. [15,12,1,11,3,8,7]). We study the deterministic setting. One-way communication complexity finds application in a wide range of areas; e.g. it provides lower bounds on VLSI complexity and on the size of finite automata (cf. [5]). Moreover, the one-way communication complexity of symmetric Boolean functions is connected to binary decision diagrams by the following observation due to Wegener [14]: the size of an optimal protocol coincides with the number of nodes at a certain level in a minimal OBDD.

We consider the standard two-party communication model: initially the parties, called Alice and Bob, hold disjoint parts of input data x and y, respectively.
In order to compute a function f(x, y), they exchange messages between each other according to a communication protocol. In a (deterministic) one-way protocol P for f, one of the parties sends a single message to the other party, and then the latter party computes the output f(x, y). We call P a protocol of type A → B if Alice sends to Bob, and of type B → A if Bob sends to Alice. The size of P is the number of different messages that can potentially be transmitted via the communication channel according to P. The one-way communication size S^{A→B}(f) of f is the size of the best protocol of type A → B. It is clear that the respective one-way communication complexity is C^{A→B}(f) = ⌈log S^{A→B}(f)⌉. For the case when Bob sends messages to Alice, we analogously use the notation S^{B→A} and C^{B→A}. Note that throughout this paper, log always denotes the binary logarithm.

The main results of this paper deal with the one-way communication complexity of symmetric Boolean functions, an important subclass of all Boolean functions. A Boolean function F is called symmetric if permuting the input bits does not affect the function value. Some examples of symmetric functions are and, or, parity, majority, and arbitrary threshold functions. We assume that the input bits for a given F are partitioned into two parts, one part consisting of m bits held by Alice and the other part consisting of n bits only known to Bob. As the function value of a symmetric Boolean function only depends on the number of 1's in the input (cf. [13]), it is completely determined by the sum of the number of 1's in Alice's input part and the number of 1's in Bob's part. Hence for such functions we are faced with the problem of determining the one-way communication complexity of a function f : {0, . . . , m} × {0, . . . , n} → {0, 1} associated to F, where f(x, y) only depends on the sum x + y. Note that S^{A→B}(F) ≤ m + 1 is a trivial upper bound on the one-way communication size of F.

Let us assume that Alice's input part is at most as large as Bob's (i.e. let m ≤ n). While for arbitrary functions this property does not imply which communication direction admits the better one-way protocols, we show that for symmetric Boolean functions F it does: in this case we have C^{A→B}(F) ≤ C^{B→A}(F). Moreover, we prove that if some protocol of type A → B does not require maximal size, i.e. if S^{A→B}(F) < m + 1, then both directions yield the same complexities, i.e. C^{A→B}(F) = C^{B→A}(F).

We also present a class of families of symmetric Boolean functions for which one-way communication is almost as powerful as two-way communication. More precisely, for any family of symmetric Boolean functions F1, F2, F3, . . . with Fm : {0, 1}^{2m} → {0, 1}, let fm : {0, . . . , m} × {0, . . . , m} → {0, 1} denote the integer function associated to Fm. We prove that if fm ⊆ fm+1 for all m ∈ N, then either the one-way communication complexities of F1, F2, F3, . . . are almost all equal to a constant c, or the two-way communication complexities of F1, F2, F3, . . . are infinitely often maximal. We show that one can easily test which of the two cases occurs: the two-way communication complexities are infinitely often maximal if and only if the unary language {0^{k+ℓ} | fm(k, ℓ) = 1, m, k, ℓ ∈ N} is nonregular.
On the other hand, we construct an example of a symmetric Boolean function whose one-way communication complexity is exponentially larger than its two-way communication complexity. Finally, we generalize the two-party model to the case of multiple parties and extend our results to such a setting.

Our proofs are based on the fact that the communication matrix of the integer function f associated with a symmetric Boolean function F is a Hankel matrix. In general, the entries of the communication matrix Mf of f are defined by m_{i,j} = f(i, j). A Hankel matrix is a matrix in which the entries on each anti-diagonal are constant (equivalently, m_{i,j} only depends on i + j). A Hankel matrix is completely determined by the entries of its first row and its last column. Thus with any (m + 1) × (n + 1) Hankel matrix H we associate a function fH such that fH(0), fH(1), . . . , fH(n) compose the first row of H and fH(n), fH(n + 1), . . . , fH(m + n) make up its last column. One of the main technical contributions of this paper is a theorem saying that if m ≤ n and H has fewer than m + 1 different rows, then fH is periodic on a certain large interval. We apply this property to the one-way communication size using a known relationship between this measure and the number of different rows in communication matrices.

As a byproduct, we obtain a word-combinatorial property: let w be an arbitrary string over some alphabet Σ. Then, for m ≤ ⌈|w|/2⌉ and n = |w| − m + 1, the number of different substrings of w of length n is at most as large as the number of different substrings of w of length m. Moreover, if the former number is strictly less than m (note that it can be at most m in general), then the number of different substrings of length n and the number of different substrings of length m coincide.

The paper is organized as follows: In Section 2, we introduce basic definitions and notation. Section 3 examines the number of different rows and columns in Hankel matrices and the periodicity properties involved. In Section 4, we state some applications of these properties. Then, in Section 5, we present a class of symmetric Boolean functions with both maximal one-way and two-way communication complexity, and we construct a symmetric Boolean function with an exponential gap between its one-way and its two-way communication complexity. Finally, in Section 6, we discuss natural extensions of our results to the case of multiple parties.

2 Preliminaries

For any integers 0 ≤ k < k′, let [k..k′] denote the set {k, k + 1, . . . , k′}, and denote [0..k] by [k] for short. By N we denote the set of nonnegative integers. We consider deterministic one-way communication protocols between Alice and Bob for functions f : [m] × [n] → Σ, where Σ is an arbitrary (finite or infinite) nonempty set. More specifically, we assume that Alice holds a value x ∈ [m] and Bob holds a value y ∈ [n] for some fixed positive integers m and n. Their aim is to compute the value f(x, y). Let M(m, n) denote the set of all (m + 1) × (n + 1) matrices M = (m_{i,j}) with m_{i,j} ∈ Σ. It will be convenient for us to number the rows from 0 to m and the columns from 0 to n. For a given function f : [m] × [n] → Σ, we denote by Mf the corresponding communication matrix in M(m, n).

Definition 1.
For a matrix M ∈ M(m, n), define #row(M) to be the number of different rows of M, and similarly let #col(M) be the number of different columns of M. Furthermore, for any i, j ∈ [m], let i ∼M j denote that rows i and j of M are equal.

It is easy to characterize the one-way communication size by #row and #col.

Fact 1. For all m, n ∈ N and for every function f : [m] × [n] → Σ, it holds that S^{A→B}(f) = #row(Mf) and S^{B→A}(f) = #col(Mf).

In this paper we restrict ourselves to functions f that only depend on the sum of the arguments. Note that for such functions f the communication matrix Mf is a Hankel matrix. The problem of finding protocols for such restricted f arises naturally when one considers symmetric Boolean functions.

Definition 2. Let f : [s] → N, λ ≥ 1 and s1, s2 ∈ [s] with s1 ≤ s2 − λ. We call f λ-periodic on [s1..s2] if for all x ∈ [s1..s2 − λ], f(x) = f(x + λ).

Obviously, f is λ-periodic on [s1..s2] if and only if for all x, x′ ∈ [s1..s2] with λ | (x − x′), it holds that f(x) = f(x′).

3 Periodicity of Rows and Columns in Hankel Matrices

This section is devoted to examining the relationship between the number of different rows and the number of different columns in a Hankel matrix. Lemmas 1 through 3 are technical preparations for Theorem 1, which gives an explicit characterization of a certain periodic behaviour of the function associated with a Hankel matrix and of the Hankel matrix itself. Theorems 2 and 3 reveal all possible constellations of values of #row(H) and #col(H) for a Hankel matrix H. The results will be applied to the theory of one-way communication in Section 4.

Fact 2. Let f : [s] → N be λ-periodic on [s1..s2] ⊆ [s] and on [t1..t2] ⊆ [s] such that s1 ≤ t1 and t1 + λ ≤ s2. Then f is λ-periodic on [s1..t2].

Lemma 1. Let H ∈ M(m, n) be a Hankel matrix, m0, m1 ∈ [m] with m0 < m1, and λ ∈ [1..m1 − m0]. Then the following two statements are equivalent:
(a) fH is λ-periodic on [m0..m1 + n].
(b) For all x ∈ [m0..m1] and all k ∈ N such that x + kλ ≤ m1, x ∼H x + kλ.

Fig. 1. An illustration of Case 1.

Proof. "(a)⇒(b)": Let x ∈ [m0..m1] and k ∈ N such that x + kλ ≤ m1. For all y ∈ [n], x + y ≥ m0 and x + y + kλ ≤ m1 + n. Since fH is λ-periodic on [m0..m1 + n], we have fH(x + y) = fH(x + kλ + y).

"(b)⇒(a)": Let x ∈ [m0..m1 + n − λ]. We consider two cases. If x ≤ m0 + n, then fH(x) = fH(m0 + (x − m0)) = fH(m0 + λ + (x − m0)) = fH(x + λ), because m0 ∼H m0 + λ by hypothesis. If on the other hand x > m0 + n, then x − n > m0 and x − n + λ ≤ m1. By hypothesis, x − n ∼H x − n + λ, and thus fH(x) = fH(x − n + n) = fH(x − n + λ + n) = fH(x + λ). □

Corollary 1. Let H ∈ M(m, n) be a Hankel matrix and i, j ∈ [m] with i < j. Then i ∼H j if and only if fH is (j − i)-periodic on [i..j + n].

Corollary 2. Let H ∈ M(m, n) be a Hankel matrix. If fH is λ-periodic on [m0..m1 + n] for some m0, m1 ∈ [m] with m0 < m1 and some λ ∈ [1..m1 − m0], then #row(H) ≤ m0 + λ + m − m1, where equality holds if and only if all rows 0, . . . , m0 + λ − 1 and m1 + 1, . . . , m are pairwise different.

Lemma 2. Let H ∈ M(m, n) be a Hankel matrix and m0, m′0, i, j ∈ [m] such that m0 ≤ i < j, m′0 − m0 ≤ n + 1, j − m0 ≤ n + 1, i ∼H j, and m0 ∼H m′0. Then fH is (j − i)-periodic on [m0..j + n].

Proof. Choose λ = j − i and µ0 = m′0 − m0. By Corollary 1, fH is (i) µ0-periodic on [m0..m′0 + n] and (ii) λ-periodic on [i..j + n]. Let x ∈ [m0..j + n − λ].
In order to show that fH(x + λ) = fH(x), we consider:

Case 1: m0 ≤ x < i. Let k ∈ N be such that i ≤ x + kµ0 ≤ i + µ0 − 1. We need to show that

x, x + kµ0, x + kµ0 + λ, x + λ ∈ [m0..m′0 + n]   (1)

and

x + kµ0, x + kµ0 + λ ∈ [i..j + n]   (2)

in order to apply properties (i) and (ii) to the corresponding elements. Property (1) follows from m0 ≤ x and x + kµ0 + λ ≤ i + µ0 + λ − 1 = j + m′0 − m0 − 1 ≤ m′0 + n. Property (2) is due to i ≤ x + kµ0 and x + kµ0 + λ ≤ j − 1 + µ0 ≤ j + n. Now (cf. Fig. 1)

fH(x) = fH(x + kµ0) = fH(x + kµ0 + λ) = fH(x + λ),

where the first and the last equality follow from properties (1) and (i), and the middle equality is due to properties (2) and (ii).

Case 2: i ≤ x ≤ j + n − λ. In this case, fH(x) = fH(x + λ) by Corollary 1. □

The following lemma is symmetric to the previous one:

Lemma 3. Let H ∈ M(m, n) be a Hankel matrix and m1, m′1, i, j ∈ [m] such that i < j ≤ m1, m1 − m′1 ≤ n + 1, m1 − i ≤ n + 1, i ∼H j, and m1 ∼H m′1. Then fH is (j − i)-periodic on [i..m1 + n].

Proof. Let H = (h_{i,j}). We define λ = j − i and H′ = (h′_{µ,ν}) ∈ M(m, n) by h′_{µ,ν} = h_{m−µ,n−ν} for (µ, ν) ∈ [m] × [n], i.e. we rotate H by 180 degrees in the plane. Clearly, H′ is again a Hankel matrix. Moreover, we have fH(z) = fH′(m + n − z) for all z ∈ [m + n]. We set m0 = m − m1, m′0 = m − m′1, i′ = m − j, and j′ = m − i. Now it is easy to check that H′, i′, j′, m0, and m′0 fulfill the preconditions of Lemma 2 and that m + n − x − λ ∈ [m0..j′ + n − λ], thus yielding fH(x + λ) = fH′(m + n − x − λ) = fH′(m + n − x) = fH(x). □

Theorem 1. Let m ≤ n + 1 and let H ∈ M(m, n) be a Hankel matrix with #row(H) < m + 1. Then there exist λ ∈ [1..n] and m0, m1 ∈ [m] with m1 − m0 ≥ λ such that the following two properties hold:
(a) The function fH is λ-periodic on [m0..m1 + n].
(b) If i, j ∈ [m] with i < j and i ∼H j, then i, j ∈ [m0..m1] and λ | (j − i).
Moreover, m0, m1 and λ can be explicitly determined as follows:
m0 = min{k ∈ [m] | ∃k′ ∈ [m] with k′ > k and k ∼H k′},
m1 = max{k ∈ [m] | ∃k′ ∈ [m] with k′ < k and k ∼H k′}, and
λ = min{j − i | i, j ∈ [m] with i ∼H j and i < j}.

Proof. Since #row(H) < m + 1, there exist i, j ∈ [m] with i < j such that i ∼H j. Thus m0, m1 and λ are well-defined, and clearly m1 − m0 ≥ λ. Choose i0, j0 ∈ [m] such that i0 ∼H j0 and j0 − i0 = λ. Since m ≤ n, all preconditions of Lemma 2 and Lemma 3 are satisfied. Thus we conclude that fH is λ-periodic on both discrete intervals [m0..j0 + n] and [i0..m1 + n]. Fact 2 now yields property (a).

Now let i, j ∈ [m] with i < j and i ∼H j. Let k ∈ N be such that j − i = kλ + r with 0 ≤ r < λ. By property (a), fH is λ-periodic on [m0..m1 + n], and so by Lemma 1 (note that i + kλ = j − r ≤ j ≤ m1), we have i + kλ ∼H i ∼H j. As r = j − i − kλ < λ and λ is the minimal difference between two equal rows of different indices, we have r = 0, so λ | (j − i). □

Using Corollary 2 we deduce two consequences of Theorem 1:

Corollary 3. For H, m0, m1 and λ as in Theorem 1, #row(H) = m0 + λ + m − m1, i.e. H has exactly m0 + λ + m − m1 pairwise different rows.

Corollary 4. Let m ≤ n + 1 and let H ∈ M(m, n) be a Hankel matrix with #row(H) < m + 1. Then #col(H) ≤ #row(H).

The next lemma states an "expansion property" of Hankel matrices with at least two equal rows.

Lemma 4. For arbitrary m, n ∈ N, let H ∈ M(m, n) be a Hankel matrix with #row(H) < m + 1.
Then there exist m′ ≥ n and a Hankel matrix H̃ ∈ M(m′, n) such that #row(H̃) = #row(H) and #col(H̃) = #col(H).

Sketch of proof. We duplicate the area between two equal rows until the total number of rows exceeds the total number of columns n. This process affects neither the number of different rows nor the number of different columns. □

Theorem 2. Let m ≤ n + 1 and let H ∈ M(m, n) be a Hankel matrix with #row(H) < m + 1. Then #row(H) = #col(H).

Proof. From Corollary 4, we have #row(H) ≥ #col(H). By Lemma 4, there exist m′ ≥ n and a Hankel matrix H̃ ∈ M(m′, n) such that #row(H̃) = #row(H) and #col(H̃) = #col(H). Thus, again by Corollary 4, we obtain

#row(H) = #row(H̃) = #col(H̃^T) ≤ #row(H̃^T) = #col(H̃) = #col(H).

Consequently, we have #row(H) = #col(H). □

Theorem 3. Let m ≤ n and let H ∈ M(m, n) be a Hankel matrix with #row(H) = m + 1. Then #col(H) ≥ m + 1.

Proof. Induction on n. For n = m, we have H = H^T and thus #col(H) = #row(H^T) = #row(H) = m + 1. Now suppose that n > m. Let H′ ∈ M(m, n − 1) be the matrix H without its last column. We consider two cases:

Case 1: n ∼H^T n′ for some n′ ∈ [n − 1]. Then #col(H) = #col(H′). In addition, #row(H′) = m + 1, because if #row(H′) ≤ m were true, then we would have i ∼H′ j for some 0 ≤ i < j ≤ m, and thus i ∼H j, since fH(i + n) = fH(i + n′) = fH(j + n′) = fH(j + n). Thus we get #col(H) = #col(H′) ≥ m + 1 by the induction hypothesis.

Case 2: n ≁H^T n′ for all n′ ∈ [n − 1]. Then #col(H) = #col(H′) + 1. Once again, we have to consider two subcases:

Case 2a: #row(H′) = m + 1. Then #col(H) = #col(H′) + 1 ≥ m + 2 > m + 1 by the induction hypothesis.

Case 2b: #row(H′) ≤ m. Assume that #row(H′) < m, and let
m0 = min{k ∈ [m] | ∃k′ ∈ [m] with k′ > k and k ∼H k′},
m1 = max{k ∈ [m] | ∃k′ ∈ [m] with k′ < k and k ∼H k′},
λ = min{k′ − k | k, k′ ∈ [m] with k < k′ and k ∼H k′},
and let m′0, m′1 and λ′ be the corresponding numbers for H′. By Corollary 3, #row(H′) = m′0 + m − m′1 + λ′, and fH′ is λ′-periodic on [m′0..m′1 + n − 1] by Theorem 1. Since #row(H′) < m by assumption, λ′ < m′1 − m′0. In particular, m0 ∼H m0 + λ′, and thus λ | λ′ by Theorem 1. Consequently, m0 ≤ m′0, m1 ≥ m′1 − 1 and λ ≤ λ′. Hence, again by Corollary 3,

#row(H) = m0 + m − m1 + λ ≤ m′0 + m − (m′1 − 1) + λ′ ≤ m′0 + m − m′1 + λ′ + 1 = #row(H′) + 1 < m + 1,

contradicting the precondition #row(H) = m + 1. Thus #row(H′) = m. By Theorem 2, #col(H′) = #row(H′) = m. Consequently, #col(H) = #col(H′) + 1 = m + 1. □

Note that for Hankel matrices over Σ with |Σ| ≥ m + n + 1 we can say even more. Namely, if m ≤ n, then for all r ∈ [m + 1..n + 1] there exists a Hankel matrix H ∈ M(m, n) with #row(H) = m + 1 and #col(H) = r. To see this, define f : [m] × [n] → Σ = {a0, . . . , a_{m+n}} by f(x, y) = a_{(x+y) mod r}. Then H = Mf is a Hankel matrix fulfilling the requested properties.

4 Applications

Theorems 2 and 3 can be summarized in terms of one-way communication as follows.

Theorem 4. Let m ≤ n and let f : [m] × [n] → Σ be a function for which the corresponding communication matrix Mf is a Hankel matrix. Then the following properties hold:
(a) S^{A→B}(f) ≤ S^{B→A}(f).
(b) If S^{A→B}(f) < m + 1, then S^{A→B}(f) = S^{B→A}(f).

This result can immediately be applied to symmetric Boolean functions:

Corollary 5. Let m ≤ n and let F : {0, 1}^m × {0, 1}^n → {0, 1} be a symmetric Boolean function. Then the following properties hold:
(a) S^{A→B}(F) ≤ S^{B→A}(F).
(b) If S^{A→B}(F) < m + 1, then S^{A→B}(F) = S^{B→A}(F).

The results of the last paragraph can also be applied to word combinatorics as follows:

Theorem 5. Let w be an arbitrary string over some alphabet Σ, and let Nw(i) denote the number of different subwords of w of length i. Then, for m ≤ ⌈|w|/2⌉ and n = |w| − m + 1, we have Nw(n) ≤ Nw(m). Moreover, if Nw(n) < m (note that Nw(n) ≤ m in general), then Nw(n) = Nw(m).

5 One-Way versus Two-Way Protocols

In this section we first present a class of families of functions for which the one-way communication complexities are almost the same as the two-way communication complexities. We denote the two-way complexity of F by C(F). Let F1, F2, F3, . . . with Fm : {0, 1}^{2m} → {0, 1} be a family of symmetric Boolean functions, and let fm : [m] × [m] → {0, 1} denote the integer function associated to Fm, i.e. Fm(x1, . . . , x_{2m}) = 1 if and only if fm(Σ_{i=1}^{m} x_i, Σ_{i=m+1}^{2m} x_i) = 1.

Theorem 6. Let F1, F2, F3, . . . be a family of symmetric Boolean functions such that fm ⊆ fm+1 for all m ∈ N. Then either
(a) for almost all m ∈ N, C^{A→B}(Fm) = c for some constant c, or
(b) for infinitely many m ∈ N, C(Fm) = ⌈log(m + 1)⌉.
Moreover, (b) holds iff the language L = {0^{k+ℓ} | fm(k, ℓ) = 1, m, k, ℓ ∈ N} is nonregular.

Proof. First, Theorem 11.3 in [6] gives a nice characterization of (non)regular unary languages in terms of the rank of certain Hankel matrices. This characterization was first observed by Condon et al. in [2]. It says that the unary language L is nonregular if and only if for infinitely many m ∈ N, rank(M_{fm}) = m + 1 (i.e. the communication matrix M_{fm} has maximum rank). Second, Mehlhorn and Schmidt [10] showed that C(f) ≥ log(rank(Mf)) for every f. Combining these facts, we get that for nonregular L, C(fm) = ⌈log(m + 1)⌉ for infinitely many m ∈ N. On the other hand, if L is regular, then by the Myhill-Nerode Theorem [4] the infinite matrix M = (m_{i,j})_{i,j∈N} defined by m_{i,j} = 1 iff 0^{i+j} ∈ L has a constant number of different rows. Hence the theorem follows. □

Example 1. Let Fm(x1, x2, . . . , x_{2m}) = 1 iff the number of 1's in the sequence x1, x2, . . . , x_{2m} is the square of some integer. By Theorem 6, either for all m ∈ N, C(Fm), C^{A→B}(Fm) ≤ c for some constant c, or for infinitely many m ∈ N, C^{A→B}(Fm) = C(Fm) = ⌈log(m + 1)⌉. Since the language {0^n | n is the square of some integer} is nonregular, the (one-way) communication complexity of Fm is maximal for infinitely many m ∈ N.

Next, we construct a symmetric Boolean function with an exponential difference between its one-way and its two-way communication complexity. Let p0, p1, . . . with p_i < p_{i+1} for all i ∈ N be the sequence of all prime numbers. According to the Prime Number Theorem, there are at least ℓ/log ℓ prime numbers in the interval [ℓ] for all ℓ ≥ 5. For k = ⌈log log m⌉ and n = 2^k · (1 + Σ_{i=0}^{2^k−1} p_i), consider the function f : [m] × [n] → {0, 1} defined by f(x, y) = 1 iff ⌊z/2^k⌋ mod p_{z mod 2^k} = 0, where z = x + y. Using the following two-way protocol, one can see that the two-way communication complexity of f is at most 4 log log m: In the first round, Bob sends y0 = y mod 2^k to Alice. In the second round, Alice sends z0 = (x + y0) mod 2^k and z′ = ⌊(x + y0)/2^k⌋ mod p_{z0} to Bob. Finally, Bob computes f(x, y) by checking whether (⌊y/2^k⌋ + z′) mod p_{z0} = 0. Note that z0 = z mod 2^k. The correctness of the protocol can be seen by investigating the addition of integers in remainder representation.

Lemma 5. C(f) ≤ 4 log log m.
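Because the correctness of this protocol rests on a small piece of modular arithmetic, a runnable sketch may help. The following Python code (the helper names are mine; the paper gives no code) implements the two rounds literally and checks them against a direct evaluation of f; the messages carry k, k, and O(log p_{z0}) bits respectively, i.e. O(log log m) in total.

    def first_primes(n):
        primes, cand = [], 2
        while len(primes) < n:
            if all(cand % p for p in primes):
                primes.append(cand)
            cand += 1
        return primes

    def f(x, y, k, primes):
        # the function from the text: floor(z / 2^k) mod p_{z mod 2^k} == 0
        z = x + y
        return (z // 2**k) % primes[z % 2**k] == 0

    def two_way_protocol(x, y, k, primes):
        y0 = y % 2**k                                 # round 1: Bob -> Alice
        z0 = (x + y0) % 2**k                          # round 2: Alice -> Bob
        z_prime = ((x + y0) // 2**k) % primes[z0]
        return ((y // 2**k) + z_prime) % primes[z0] == 0   # Bob's local check

    k = 2
    primes = first_primes(2**k)
    assert all(two_way_protocol(x, y, k, primes) == f(x, y, k, primes)
               for x in range(200) for y in range(200))
    print("protocol agrees with f on all tested inputs")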
For the one-way communication complexity of f we obtain:

Lemma 6. #row(Mf) = m + 1, i.e. C^{A→B}(f) = ⌈log(m + 1)⌉.

Theorem 7. For the symmetric Boolean function F : {0, 1}^m × {0, 1}^n → {0, 1} associated with f, we have C(F) ∈ O(log log m) and C^{A→B}(F) ∈ Θ(log m).

6 Multiparty Communication

So far we have analyzed the case that a fixed input partition for a function is given. However, sometimes it is also of interest to examine the communication complexity of a fixed function under varying input partitions. A typical question in this scenario is whether we can partition the input in such a way that the communication complexities for protocols of type A → B and B → A coincide. The main tool for these examinations is the diversity ∆(f) of f, which we introduce below. For a function f : [s] → Σ and m ∈ [s], define fm : [m] × [s − m] → Σ by fm(x, y) = f(x + y) for x ∈ [m] and y ∈ [s − m], and let r_f(m) = #row(M_{fm}). We define ∆(f) = max_{m∈[s]} r_f(m).

Lemma 7. For every function f : [s] → Σ, the following conditions hold:
(a) r_f(m) = m + 1 for all m ∈ [∆(f) − 1],
(b) if ∆(f) ≤ s/2, then r_f(m) = ∆(f) for all m ∈ [∆(f) − 1 .. s − ∆(f) + 1],
(c) r_f(m) ≥ r_f(m + 1) for all m ∈ [∆(f) − 1 .. s − 1].

It is an immediate consequence of Lemma 7 that ∆(f) equals the minimum m such that M_{fm} has fewer than m + 1 different rows, provided that such an m exists.

The diversity is helpful for analyzing the case that more than two parties are involved. For such multiparty communication we assume that the input is distributed among d parties P1, . . . , Pd. Every party Pi knows a value x_i ∈ [m_i]. The goal is to compute a fixed function f : [m1] × . . . × [md] → Σ. Analogously to communication matrices in the two-party case, we use multidimensional arrays to represent f. Let M(m1, . . . , md) be the set of all d-dimensional (m1 + 1) × . . . × (md + 1) arrays M with entries M(i1, . . . , id) ∈ Σ for i_j ∈ [m_j], j ∈ [1..d]. M is called the communication array of a function f iff M(i1, . . . , id) = f(i1, . . . , id). We denote the communication array of f by Mf.

Recall that in the two-party model the sender has to specify the row/column his input belongs to. In the multiparty case each party will have to specify the type of subarray determined by his input value. Therefore, for each k ∈ [1..d] and each x ∈ [m_k], we define the subarray M^{(k)}_x ∈ M(m1, . . . , m_{k−1}, m_{k+1}, . . . , md) of M by M^{(k)}_x(i1, . . . , i_{k−1}, i_{k+1}, . . . , id) = M(i1, . . . , i_{k−1}, x, i_{k+1}, . . . , id) for all 0 ≤ i_j ≤ m_j, j ∈ [1..d] \ {k}. Finally, for k ∈ [1..d] we define #sub_k(M) as the number of different subarrays with fixed k-th dimension: #sub_k(M) = |{M^{(k)}_x | x ∈ [m_k]}|.

We call M ∈ M(m1, . . . , md) a Hankel array if M(i1, . . . , id) = M(j1, . . . , jd) for every pair (i1, . . . , id), (j1, . . . , jd) ∈ [m1] × . . . × [md] with i1 + . . . + id = j1 + . . . + jd. For a Hankel array M ∈ M(m1, . . . , md), let fM : [Σ_{i=1}^{d} m_i] → Σ be defined by fM(x) = M(x1, . . . , xd) if x = x1 + . . . + xd. Note that fM is well-defined since M is a Hankel array.

Lemma 8. For a function f such that the corresponding communication array M is a Hankel array, we have r_{fM}(m_k) = #sub_k(M) for every k ∈ [1..d].

As a natural extension of two-party communication complexity, we consider the case that the parties P1, . . .
, Pd are connected in a directed chain specified by a permutation π : [1..d] → [1..d], i.e. P_{π(i)} can only send messages to P_{π(i+1)} for i ∈ [d − 1]. Let S^π be the size of an optimal protocol; more precisely, S^π is the number of possible communication sequences on the network in an optimal protocol. We now present a protocol of minimal size for a fixed chain network and functions f such that Mf is a Hankel array. During the computation the parties use the arrays M_i ∈ M(Σ_{j=1}^{i} m_{π(j)}, m_{π(i+1)}, . . . , m_{π(d)}), where M_i is the Hankel array defined by M_i(y_i, . . . , y_d) = Mf(z_1, . . . , z_d) for all y_i ∈ [Σ_{j=1}^{i} m_{π(j)}], y_{i+1} ∈ [m_{π(i+1)}], . . . , y_d ∈ [m_{π(d)}] and values z_1 ∈ [m_1], . . . , z_d ∈ [m_d] with y_i = Σ_{j=1}^{i} z_{π(j)} and y_j = z_{π(j)} for all j ∈ [i + 1..d]. Furthermore, let Γ_i(y_i) be the minimum value z such that (M_i)^{(1)}_z = (M_i)^{(1)}_{y_i}. The protocol works as follows:

(1) P_{π(1)} sends γ_1 = Γ_1(x_{π(1)}) to P_{π(2)}.
(2) For i ∈ [2..d − 1], P_{π(i)} receives γ_{i−1} from P_{π(i−1)} and sends γ_i = Γ_i(x_{π(i)} + γ_{i−1}) to P_{π(i+1)}.
(3) P_{π(d)} receives γ_{d−1} from P_{π(d−1)}. Then M_d(γ_{d−1} + x_{π(d)}) gives the result of the function.

Theorem 8. For a function f such that Mf ∈ M(m1, . . . , md) is a Hankel array, the size of the protocol presented above is minimal.

Note that the communication size S^π may depend on the order π of the parties on the chain. We now state that if m_{π(i)} ≤ m_{π(i+1)} for all i ∈ [1..d − 1], then the ordering is optimal with respect to the communication size.

Theorem 9. Let f be a function such that Mf ∈ M(m1, . . . , md) is a Hankel array, and let π : [1..d] → [1..d] be a permutation with m_{π(i)} ≤ m_{π(i+1)} for all i ∈ [1..d − 1]. Then for every permutation π′ : [1..d] → [1..d], S^π(f) ≤ S^{π′}(f).

A second generalization of the two-party model is the simultaneous communication complexity (C^{||}), where all parties simultaneously write in a single round on a blackboard. This means that the messages sent by each party do not depend on the messages sent by the other parties. After finishing the communication round, each party has to be able to compute the result of the function (see e.g. [9]). For two-party communication it is well known that C^{||}(f) = C^{A→B}(f) + C^{B→A}(f). Similarly, for the d-party case we have C^{||}(f) = Σ_{i∈[1..d]} ⌈log #sub_i(Mf)⌉. Hence, if Mf is a Hankel array and if for some dimension k ∈ [1..d] we have #sub_k(Mf) ≤ min_{i∈[1..d]} m_i, then by Lemmas 7 and 8, C^{||}(f) = d · ⌈log ∆(f_{Mf})⌉.

As a third generalization, we consider the case that in each round some party can write a message on a blackboard. The message may depend on the messages that have been published on the board in previous rounds. We restrict the communication such that each party (except for the last one) publishes exactly one message on the blackboard, and in each round exactly one message is published. After finishing the communication rounds, at least one party has to be able to compute the result of the function. Let S^□ be the corresponding size of an optimal protocol. Note that this model generalizes both of the previous models.

Theorem 10. Let f be a function such that Mf ∈ M(m1, . . . , md) is a Hankel array, and let π : [1..d] → [1..d] be a permutation such that m_{π(i)} ≤ m_{π(i+1)} for all i ∈ [1..d − 1]. Then S^π(f_M) = S^□(f_M).
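For sum-based functions (i.e. Hankel communication arrays), the canonical-representative maps Γ_i of steps (1)-(3) can be sketched very compactly: the message γ_i is just the least prefix sum that induces the same function of the remaining inputs. The Python below is a minimal, unoptimized rendering of this idea; the names canonical and chain_protocol are mine.

    def canonical(prefix, f, remaining_max):
        """Least z with f(z + r) == f(prefix + r) for all remaining sums r."""
        for z in range(prefix + 1):
            if all(f(z + r) == f(prefix + r) for r in range(remaining_max + 1)):
                return z
        return prefix

    def chain_protocol(xs, ms, f):
        gamma = 0
        for i in range(len(xs) - 1):
            remaining = sum(ms[i + 1:])
            gamma = canonical(gamma + xs[i], f, remaining)  # message down the chain
        return f(gamma + xs[-1])                            # last party evaluates

    # Five parties computing whether the total sum is divisible by 3.
    ms = [3, 4, 2, 5, 3]
    f3 = lambda s: s % 3 == 0
    xs = [2, 1, 2, 4, 0]
    assert chain_protocol(xs, ms, f3) == f3(sum(xs))
    print(chain_protocol(xs, ms, f3))   # True: the sum is 9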
7 Conclusions and Open Problems

In this paper we have investigated the one-way communication complexity of functions for which the corresponding communication matrices are Hankel matrices, and we have established some structural properties of such matrices. As a direct application, we have obtained a complete solution to the problem of how the communication direction in deterministic one-way communication protocols affects the communication complexity of symmetric Boolean functions. One possible direction of future research is to study other kinds of one-way communication, such as nondeterministic and randomized, for the class of symmetric functions. Another interesting extension of the topic is to drop the restriction to one-way protocols and consider the deterministic two-way communication complexity of symmetric Boolean functions for both a bounded and an unbounded number of communication rounds; this particularly involves results about the computation of the rank of Hankel matrices. In addition, consequences for word combinatorics and OBDD theory may be of interest.

Acknowledgment. We would like to thank Ingo Wegener for his useful comment on the connection between one-way communication and OBDD theory.

References

1. F. Ablayev, Lower bounds for one-way probabilistic communication complexity and their application to space complexity, Theoretical Computer Science 157 (1996), 139–159.
2. A. Condon, L. Hellerstein, S. Pottle, and A. Wigderson, On the power of finite automata with both nondeterministic and probabilistic states, SIAM J. Comput. 27 (1998), 739–762.
3. P. Ďuriš, J. Hromkovič, J.D.P. Rolim, and G. Schnitger, On the power of Las Vegas for one-way communication complexity, finite automata, and polynomial-time computations, Proc. 14th STACS, Springer, 1997, 117–128.
4. J.E. Hopcroft and J.D. Ullman, Formal Languages and Their Relation to Automata, Addison-Wesley, Reading, Massachusetts, 1969.
5. J. Hromkovič, Communication Complexity and Parallel Computing, Springer, 1997.
6. I.S. Iohvidov, Hankel and Toeplitz Matrices and Forms, Birkhäuser, Boston, 1982.
7. H. Klauck, On quantum and probabilistic communication: Las Vegas and one-way protocols, Proc. 32nd STOC, 2000, 644–651.
8. I. Kremer, N. Nisan, and D. Ron, On randomized one-round communication complexity, Computational Complexity 8 (1999), 21–49.
9. E. Kushilevitz and N. Nisan, Communication Complexity, Cambridge University Press, 1997.
10. K. Mehlhorn and E.M. Schmidt, Las Vegas is better than determinism in VLSI and distributed computing, Proc. 14th STOC, 1982, 330–337.
11. I. Newman and M. Szegedy, Public vs. private coin flips in one round communication games, Proc. 28th STOC, 1996, 561–570.
12. C. Papadimitriou and M. Sipser, Communication complexity, J. Comput. System Sci. 28 (1984), 260–269.
13. I. Wegener, The Complexity of Boolean Functions, Wiley-Teubner, 1987.
14. I. Wegener, personal communication, April 2003.
15. A.C. Yao, Some complexity questions related to distributive computing, Proc. 11th STOC, 1979, 209–213.

Circuits on Cylinders

Kristoffer Arnsfelt Hansen¹, Peter Bro Miltersen¹, and V. Vinay²
¹ Department of Computer Science, University of Aarhus, Denmark, {arnsfelt,bromille}@daimi.au.dk
² Indian Institute of Science, Bangalore, India, vinay@csa.iisc.ernet.in

Abstract. We consider the computational power of constant width polynomial size cylindrical circuits and nondeterministic branching programs.
We show that every function computed by a Π2 ◦ MOD ◦ AC0 circuit can also be computed by a constant width polynomial size cylindrical nondeterministic branching program (or cylindrical circuit), and that every function computed by a constant width polynomial size cylindrical circuit belongs to ACC0.

1 Introduction

In this paper we consider the computational power of constant width, polynomial size cylindrical branching programs and circuits. It is well known that there is a rough similarity between the computational power of width restricted circuits and depth restricted circuits, but that this similarity is not a complete equivalence. For instance, the class of functions computed by a family of circuits of quasi-polynomial size and polylogarithmic depth is equal to the class of functions computed by a family of circuits of quasi-polynomial size and polylogarithmic width. On the other hand, the class of functions computed by a family of circuits of polynomial size and polylogarithmic width (non-uniform SC) is, in general, conjectured to be different from the class of functions computed by a family of circuits of polynomial size and polylogarithmic depth (non-uniform NC). For the case of constant depth and width, there is a provable difference in computational power: the class of functions computable by constant depth circuits of polynomial size, i.e. AC0, is a proper subset of the functions computable by constant width circuits (or branching programs) of polynomial size, the latter being, by Barrington's Theorem [1], the bigger class NC1. On the other hand, Vinay [9] and Barrington et al. [2,3] showed that by putting a geometric restriction on the computation, the difference disappears: the class of functions computable by upwards planar, constant width, polynomial size circuits (or nondeterministic branching programs) is exactly AC0. Thus, both AC0 and NC1 can be captured by a constant width as well as by a constant depth circuit model. It is then natural to ask if one can similarly capture classes between AC0 and NC1 defined by various constant depth circuit models, such as ACC0 and TC0, by some natural constant width circuit or branching program model.

Fig. 1. A cylindrical branching program of width 2 computing PARITY.

Building upon the results in this paper, such a characterisation has recently been obtained for ACC0 [6]: the class of functions computed by planar constant width, polynomial size circuits is exactly ACC0. In this paper we consider a slightly more relaxed geometric restriction than upwards planarity, yet one more restrictive than planarity: we consider the functions computed by cylindrical polynomial size, constant width circuits (or nondeterministic branching programs). Informally (for formal definitions, see the next section), a layered circuit (branching program) is cylindrical if it can be embedded on the surface of a cylinder in such a way that each layer is embedded on a cross section of the cylinder (disjoint from the cross sections of the other layers), no wires intersect, and all wires between two layers are embedded on the part of the cylinder between the two corresponding cross sections (see Fig. 1).
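The width-2 program of Fig. 1 is a useful running example. The sketch below fixes one possible encoding (mine, not the paper's): each layer is a list of arcs (u, v, literal), and evaluation is forward reachability from the initial node. The PARITY program keeps an "even" track and an "odd" track and swaps them on every 1 read.

    def eval_bp(layers, x, start=0, accept=0):
        """Evaluate a layered nondeterministic branching program on input x."""
        reach = {start}
        for arcs in layers:
            reach = {v for (u, v, lit) in arcs
                     if u in reach and (lit is True or x[lit[0]] == lit[1])}
        return accept in reach

    def parity_layers(n):
        # node 0 = "even so far", node 1 = "odd so far"; bit i keeps or swaps them
        return [[(0, 0, (i, 0)), (0, 1, (i, 1)), (1, 1, (i, 0)), (1, 0, (i, 1))]
                for i in range(n)]

    x = [1, 0, 1, 1, 0]
    print(eval_bp(parity_layers(len(x)), x, accept=1))   # True: three 1s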
It is immediate that constant width polynomial size cylindrical branching programs have more computational power than constant width polynomial size upwards planar branching programs: the latter compute only functions in AC0 [2], while the former may compute PARITY (see Fig. 1). We ask what their exact computational power is and show that their power does not extend much beyond computing functions such as PARITY. Indeed, they can only compute functions in ACC0. To be precise, the first main result of this paper is the following lower bound on the power of cylindrical computation.

Theorem 1. Every Boolean function computed by a polynomial size Π2 ◦ MOD ◦ AC0 circuit is also computed by a constant width, polynomial size cylindrical nondeterministic branching program.

By a Π2 ◦ MOD ◦ AC0 circuit we mean a polynomial size circuit with an AND gate at the output, a layer of OR gates feeding the AND gate, a layer of MODm gates (perhaps for many different constant-bounded values of m) feeding the OR gates, and a (multi-output) AC0 circuit feeding the MOD gates. It is not known if the inclusion is proper. We prove Theorem 1 by a direct construction, generalising and extending the simple idea of Fig. 1. Our second main result is the following upper bound on the power of cylindrical computation.

Theorem 2. Every Boolean function computed by a constant width, polynomial size cylindrical circuit is in ACC0.

Due to space constraints, the proof of Theorem 2 is omitted from this version of the paper. Instead we provide a proof of the weaker statement that cylindrical
We let MOD denote the family of MODA m gates for all constant bounded m and all A. Similarly will AND and OR denote the family of unbounded fanin AND and OR gates. If G is a family of boolean gates and C is a family of circuits we let G ◦ C denote the class of polynomial size circuit families consisting of a G gate taking circuits from C as inputs. AC0 is the class of functions computed by polynomial size bounded depth circuits consisting of NOT gates and unbounded fanin AND and OR gates. ACC0 is the class of functions computed when we also allow unbounded fanin MOD gates computing MODk for constants k. We will also use AC0 and ACC0 to denote the class of circuits computing the languages in the respective classes. Cylindrical Branching Programs and Circuits. A digraph D = (V, A) is called layered if there is a partition V = V0 ∪ V1 ∪ · · · ∪ Vh such that all arcs of 174 K.A. Hansen, P.B. Miltersen, and V. Vinay A goes from layer Vi to the next layer Vi+1 for some i. We call h the depth of D, |Vi | the width of layer i and k = max |Vi | the width of D. Let [k] denote the integers {1, . . . , k}. For a, b ∈ [k] where a ≡ b + 1 (mod k) we define the (cyclic) interval [a, b] to be the set {a, . . . , b} if a ≤ b and {a, . . . , k} ∪ {1, . . . , b} if a > b. Furthermore let (a, b) = [a, b] \ {a, b}, and let (a, b) = [k] \ {a, b} if a ≡ b + 1 (mod k). Let D be a layered digraph in which all layers have width k. We will assume the nodes in each layer numbered 1, . . . , k, and refer to nodes by these numbers. Then, D is called a cylindrical if the following property is satisfied: For every pair of arcs going from layer l to layer l + 1 connecting node a to node c and node b to node d the following must hold: Nodes in the interval (a, b) of layer l can only connect to nodes in the interval [c, d] of layer l + 1 and nodes in the interval (b, a) of layer l can only connect to nodes in the interval [d, c] of layer l + 1. Notice this is equivalent of saying that nodes in the interval (c, d) of layer l + 1 can only connect to nodes in the interval [a, b] of layer l and nodes in the interval (d, c) of layer l + 1 can only connect to nodes in the interval [b, a] of layer l. A nondeterministic branching program 1 is an acyclic digraph where all arcs are labelled by either a literal, i.e. a variable or a negated variable, or a boolean constant, and an initial and a terminal node. An input is accepted if and only if there is a path from the initial node to the terminal node in the digraph that results from substituting constants for the literals according to the input and then deleting arcs labelled by 0. We will only consider branching programs in layered form, that is, viewed as a digraph it is layered. We can assume that the initial node is in the first layer and the terminal node in the last layer, and furthermore that these are the only nodes incident to arcs in these layers. We can also assume that all layers have the same number of nodes, by the addition of dummy nodes. By a cylindrical branching program we will then mean a bounded-width nondeterministic branching program in layered form, which is cylindrical when viewed as a digraph. A cylindrical circuit is a circuit consisting of fanin 2 AND and OR gates and fanin 1 COPY gates, which when viewed as a digraph is a cylindrical digraph. Inputs nodes can be literals or boolean constants. The output gate is in the last layer. 
A standard simulation of nondeterministic branching programs by circuits extends to cylindrical branching programs and cylindrical circuits. We give the details for completeness.

Proposition 3. Every function computed by a width k, depth d cylindrical branching program is also computed by a width O(k), depth O(d log k) cylindrical circuit.

Proof. Replace every node in the branching program by an OR gate. Replace each arc, going from, say, node u to node v and labelled with the literal x, with a new AND gate taking two inputs, gate u and the literal x, and with the output of the AND gate feeding gate v. This transformation clearly preserves the cylindricality of the graph. Also, the width of the circuit is linear in the width of the branching program. The resulting OR gates may have fanin bigger than two. We replace each such gate with a tree of fanin 2 OR gates, preserving the width and blowing up the depth by at most a factor of O(log k). □

Monoids and Groups. Let x and y be elements of a group G. The commutator of x and y is the element x⁻¹y⁻¹xy. The subgroup G^(1) of G generated by all of the commutators in G is called the commutator subgroup of G. In general, let G^(i+1) denote the commutator subgroup of G^(i). G is solvable if G^(n) is the trivial group for some n. It follows that an Abelian group, and in particular a cyclic group, is solvable. A monoid is a set M with an associative binary operation and a two-sided identity. A subset G of M is a group in M if it is a group with respect to the operation of M. Note that a group G in M is not necessarily a submonoid of M, as the identity element of G may not be equal to the identity element of M. M is called solvable if every group in M is solvable. The word problem for a finite monoid M is the computation of the product x_1 x_2 · · · x_n given x_1, x_2, . . . , x_n as input. A theorem by Barrington and Thérien [4] states that the word problem for a solvable finite monoid is in ACC0.
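The solvability criterion can be checked by brute force for small permutation groups; a toy sketch computing the derived series (representation and names are ours, not the ACC0 machinery of [4]):

```python
from itertools import product

def compose(p, q):
    """Composition of permutations given as tuples: (p after q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def generated(gens, identity):
    """Close a finite set of permutations under composition."""
    group, frontier = {identity} | set(gens), list(gens)
    while frontier:
        g = frontier.pop()
        for h in list(group):
            for new in (compose(g, h), compose(h, g)):
                if new not in group:
                    group.add(new)
                    frontier.append(new)
    return group

def is_solvable(group, identity):
    """Iterate commutator subgroups G, G^(1), G^(2), ... until trivial or stable."""
    current = set(group)
    while len(current) > 1:
        comms = {compose(compose(inverse(x), inverse(y)), compose(x, y))
                 for x, y in product(current, repeat=2)}
        derived = generated(comms, identity)
        if derived == current:   # series stabilised above the trivial group
            return False
        current = derived
    return True

e = (0, 1, 2)
S3 = generated({(1, 0, 2), (1, 2, 0)}, e)   # symmetric group on 3 points
assert len(S3) == 6 and is_solvable(S3, e)
```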
3 Simulation of Bounded Depth Circuits by Cylindrical Branching Programs

In this section, we prove Theorem 1. As a starting point, we shall use the “only if” part of the following correspondence established by Vinay [9] and Barrington et al. [2]. We include here a proof of the “only if” part for completeness.

Theorem 4. A language is in AC0 if and only if it is accepted by a polynomial size, constant width upwards planar branching program.

Here an upwards planar branching program is a layered branching program satisfying that, for every pair of arcs going from layer l to layer l + 1 connecting node a to node c and node b to node d, if a < b then c ≤ d. We need some simple observations. First observe that if we can simulate a class of circuits C with upwards planar (cylindrical) branching programs, then we can also simulate AND ◦ C by upwards planar (cylindrical) branching programs, by simply concatenating the appropriate branching programs. Another way to combine branching programs is by substitution, where we simply substitute a branching program for the edges corresponding to a particular literal. The effect of this is captured in the following lemma.

Lemma 5. If f(x_1, . . . , x_n) is computed by an upwards planar (cylindrical) branching program of size s_1 and width w_1, and g_1, . . . , g_n and their negations ḡ_1, . . . , ḡ_n are computed by upwards planar branching programs, each of size s_2 and width w_2, then f(g_1, . . . , g_n) is computed by an upwards planar (cylindrical) branching program of size O(s_1 w_1 s_2) and width O(w_1² w_2).

Fig. 2. An upwards planar branching program computing OR.

Combining the above observations with the construction in Fig. 2, simulating an OR gate, we have established the “only if” part of Theorem 4. Simulation of a MOD_m^A gate can be done as shown in Fig. 3, if one disregards the top nodes in the first and last layers and modifies the connections between the last two layers to take the set A into account. Thus, combining this construction with Lemma 5, the “only if” part of Theorem 4, and the closure of cylindrical branching programs under polynomial fanin AND, we have established that we can simulate AND ◦ MOD ◦ AC0 circuits by bounded width polynomial size cylindrical branching programs.

Fig. 3. A cylindrical branching program fragment for MOD_4.

The construction shown in Fig. 3 actually has further use, by seeing it as computing elements of M_2, where M_2 is the monoid of binary relations on [2]. The general construction of a branching program fragment for MOD_m^A taking n inputs is as follows. Without loss of generality we can assume that |A| = 1 and in fact A = {0}, since we aim at simulating OR ◦ MOD. The branching program fragment will have n + 3 layers, the first and last layers of width 2 and the middle layers of width m. The top node in the first layer has arcs to all nodes of the next layer but node 1, and the bottom node has an arc to node 1. The top node in the last layer has arcs from all nodes but the one in A, and the bottom node has an arc from this node. The nodes in the middle layers represent the sum of a prefix of the input modulo m in the obvious way. Consider now the elements of M_2 shown in Fig. 4. The branching program fragment just described corresponds to (a) and (b) for m = 2 and m > 2, respectively, when the simulated MOD gate evaluates to 0. In both cases, the fragment corresponds to (c) when the simulated MOD gate evaluates to 1.

Fig. 4. Some elements of M_2.

We can now describe our construction for simulating OR ◦ MOD circuits. The construction interleaves branching program fragments for (d) between the branching program fragments for the MOD gates. This can be seen as a way of “short circuiting” the branching program in the case that one of the MOD gates evaluates to 1. Finally we add layers at both ends, picking out the appropriate nodes for the simulation. The entire construction is shown in Fig. 5, and its correctness can easily be verified. The simulation of OR ◦ MOD circuits, the “only if” part of Theorem 4, Lemma 5, and the closure of cylindrical branching programs under polynomial fanin AND together complete the proof of Theorem 1.

Fig. 5. A cylindrical branching program computing MOD ∨ · · · ∨ MOD.
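A small sketch of the MOD fragment just described, evaluated by forward reachability layer by layer (the node encoding is ours; A = {0} as in the text, with middle node j representing residue j − 1):

```python
def mod_fragment_layers(xs, m):
    """Layer-to-layer arcs of the MOD_m^{0} fragment on inputs xs (n + 3 node layers)."""
    layers = [[(1, 1)] + [(2, j) for j in range(2, m + 1)]]   # first layer, width 2
    for x in xs:  # middle layers, width m: advance the running residue by x
        layers.append([(j, 1 + ((j - 1 + x) % m)) for j in range(1, m + 1)])
    # last layer, width 2: bottom node 1 <- residue 0; top node 2 <- all others
    layers.append([(1, 1)] + [(j, 2) for j in range(2, m + 1)])
    return layers

def reachable(layers, start):
    cur = {start}
    for arcs in layers:
        cur = {v for (u, v) in arcs if u in cur}
    return cur

# The gate value is read off as reachability of the bottom node:
xs, m = [1, 0, 1, 1, 1], 4
assert (1 in reachable(mod_fragment_layers(xs, m), start=1)) == (sum(xs) % m == 0)
```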
4 Simulation of Cylindrical Branching Programs by Bounded Depth Circuits

In this section, we compensate for the omitted proof of Theorem 2, sketched in the next section, by giving a simpler (but similar) proof of the weaker result that constant width polynomial size cylindrical nondeterministic branching programs compute only functions in ACC0. In fact, we shall prove that for fixed k the following “branching program value problem” BPV_k is in ACC0: given a width k cylindrical branching program and a truth assignment to its variables, decide if the program accepts. As any function computed by a width k, polynomial size cylindrical branching program clearly is a Skyum-Valiant projection [8] of BPV_k, we will be done. We shall prove that BPV_k is in ACC0 by showing that it reduces, by an AC0 reduction, to the word problem of the monoid M_k we define next. Then, we show that the monoid M_k is solvable; since this implies, by the result of Barrington and Thérien [4], that the word problem for M_k is in ACC0, our proof will be complete.

We define M_k to be the monoid of binary relations on [k] which captures the calculation of width k branching programs embedded on a cylinder, in the following sense: M_k is the monoid generated by all the relations which express how arcs can travel between two adjacent layers in a width k cylindrical digraph. The monoid operation is the usual composition of binary relations, i.e., if A, B ∈ M_k and x, y ∈ [k], then xABy ⇔ ∃z : xAz ∧ zBy. BPV_k reduces to the word problem for M_k by the following AC0 reduction: substitute constants for the literals in the branching program according to the truth assignment. Consider now the cylindrical digraph D consisting only of arcs whose associated constant is 1. Then, the branching program accepts the given input if and only if there is a path from the initial node in the first layer to the terminal node in the last layer of D. We can decide this by simply decomposing D into a sequence A_1, A_2, . . . , A_h of elements from M_k, computing the product A = A_1 A_2 · · · A_h, and checking whether this is different from the zero element of M_k.
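The computational core of this reduction is just relation composition plus a reachability check; a minimal sketch (boolean-matrix representation is our own):

```python
def rel_compose(A, B):
    """Composition of binary relations on [k] as k x k boolean matrices:
    x (AB) y iff there is z with xAz and zBy."""
    k = len(A)
    return [[any(A[x][z] and B[z][y] for z in range(k)) for y in range(k)]
            for x in range(k)]

def bp_accepts(layers, initial, terminal):
    """Decide BPV_k after substituting constants: layers is the sequence
    A_1, ..., A_h of layer-to-layer relations of D; accept iff the product
    relates the initial node to the terminal node (i.e. is nonzero there)."""
    prod = layers[0]
    for A in layers[1:]:
        prod = rel_compose(prod, A)
    return bool(prod[initial][terminal])
```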
Thus, we just need to show that M_k is solvable. Our proof is finished by the following much stronger statement.

Proposition 6. All groups in M_k are cyclic.

Proof. Let G ⊆ M_k be a group with identity E. Let A ∈ G and let R be the set of all x such that xEx. As will be shown next, it is enough to consider elements of R to capture the structure of A. Let x ∈ R. Since AA⁻¹ = E, there exists z such that xAz and zA⁻¹x. Since A⁻¹A = E, it follows that zEz, that is, z ∈ R. Hence there exists a function π_A : R → R such that ∀x : xAπ_A(x) ∧ π_A(x)A⁻¹x. To see that A is completely described by π_A, we define a relation Â on [k] by xÂy ⇔ π_A(x) = y. That is, Â is just π_A viewed as a relation. Since Â ⊆ A, it follows that EÂE ⊆ EAE = A. Conversely, let xAy. Since EA = A, there exists z ∈ R such that xEz and zAy. Since π_A(z)A⁻¹z, we get π_A(z)Ey. That is, xEz, zÂπ_A(z) and π_A(z)Ey. Thus xEÂEy. Hence we obtain that A = EÂE. We would like to have both that π_A is a permutation and that {π_A | A ∈ G} is a group. This is in general not true, since E can be any transitive relation in M_k. To obtain this, we will first simplify the structure of the elements of G using the equivalence relation on [k] defined by x ∼ y ⇔ (xEy ∧ yEx) ∨ x = y. Let A ∈ G. If x ∼ x′ and y ∼ y′, then xAy ⇔ x′Ay′, since EAE = A. Thus A gives rise to a relation Ã on [k]/∼, where, writing [x] for the equivalence class of x, xAy ⇔ [x]Ã[y], and it will follow that {Ã | A ∈ G} is a group isomorphic to G. For this we need to show that (AB)~ = ÃB̃. This follows since

[x](AB)~[z] ⇔ xABz ⇔ ∃y : xAy ∧ yBz ⇔ ∃y : [x]Ã[y] ∧ [y]B̃[z] ⇔ [x]ÃB̃[z].

We can find an isomorphic copy of this group in M_k as follows. Choose for each equivalence class [x] a representative r([x]) in [x]. Define a relation C on [k] such that xCy ⇔ x = y = r([x]). Thus ∀x : r([x])Cr([x]). Let σ : G → M_k be given by σ(A) = CAC. Then σ(G) is the desired isomorphic copy of G. We can thus assume that the equivalence classes with respect to ∼ are of size 1.

We now return to the study of π_A. The following property holds for x, y ∈ R: xEy ⇔ π_A(x)Eπ_A(y). Indeed, if xEy then π_A(x)A⁻¹y, since A⁻¹E = A⁻¹. As A⁻¹A = E, it follows that π_A(x)Eπ_A(y). Conversely, if π_A(x)Eπ_A(y) then xAπ_A(y), since xAπ_A(x) and AE = A. As π_A(y)A⁻¹y and AA⁻¹ = E, it then follows that xEy. We can now conclude that π_A is a permutation on R: if π_A(x) = π_A(y) then π_A(x) ∼ π_A(y), so x ∼ y, that is, x = y. Also, π_A is uniquely defined: assume π̂_A : R → R satisfies ∀x : xAπ̂_A(x) ∧ π̂_A(x)A⁻¹x, and let x ∈ R. We then obtain π_A(x) ∼ π̂_A(x), so π_A(x) = π̂_A(x). Hence π_A = π̂_A. Now we can conclude that {π_A | A ∈ G} is a permutation group which is isomorphic to G. For this we need to show that π_{AB} = π_B ∘ π_A. Let x ∈ R. Since xAπ_A(x) and π_A(x)B(π_B ∘ π_A)(x), it follows that xAB(π_B ∘ π_A)(x). Since (π_B ∘ π_A)(x)B⁻¹π_A(x) and π_A(x)A⁻¹x, it follows that (π_B ∘ π_A)(x)B⁻¹A⁻¹x, i.e. (π_B ∘ π_A)(x)(AB)⁻¹x. Since π_{AB} is uniquely defined, the result follows.

To show that {π_A | A ∈ G} is cyclic, we need the following fact, which easily follows from the definition of cylindricality.

Fact. Let A be a relation which can be directly embedded on a cylinder. Let p_1 < p_2 < · · · < p_m and q_1 < q_2 < · · · < q_m, and let π be a permutation on [m] such that ∀i : p_i A q_{π(i)}. Then π is in the cyclic group of permutations on [m] generated by the cycle (1 2 . . . m).

Now let r_1 < r_2 < · · · < r_m be the elements of R. Write A ∈ G as A = A_1 A_2 · · · A_h, where the A_i's can be directly embedded on the cylinder. Since r_i A π_A(r_i), we have, for each i, elements q_i^0, q_i^1, . . . , q_i^h of [k] with r_i = q_i^0 and q_i^h = π_A(r_i), such that q_i^j A_{j+1} q_i^{j+1}. For fixed j, all the q_i^j's are distinct. If not, we would have i_1 and i_2 such that r_{i_1} A π_A(r_{i_2}) and r_{i_2} A π_A(r_{i_1}). But then, since π_A(r_{i_1})A⁻¹r_{i_1} and π_A(r_{i_2})A⁻¹r_{i_2}, we get r_{i_1} E r_{i_2} and r_{i_2} E r_{i_1}. That is, r_{i_1} ∼ r_{i_2}, which implies r_{i_1} = r_{i_2}. Now, by the fact and induction on h, we have a permutation π in the cyclic group generated by the cycle (1 2 . . . m) such that r_{π(i)} = π_A(r_i). Thus π_A is in the cyclic group generated by the cycle (r_1 r_2 . . . r_m), and we can conclude that G is cyclic. □

5 Simulation of Cylindrical Circuits by Bounded Depth Circuits

In this section we provide an overview of the proof of Theorem 2, which can be found in the technical report version of this paper [7]. The rough outline is similar to that of the last section. For fixed k we consider the following “circuit value problem” CV_k: given a width k cylindrical circuit and a truth assignment to its input variables, decide if the circuit evaluates to 1. This is then reduced, by an AC0 reduction, to the word problem of the monoid N̂_k defined next, which will be proved to be solvable.
By the result of Barrington and Thérien [4] it then follows that CV_k is in ACC0. Consider a width k cylindrical circuit C with k input nodes, all placed in the first layer. We can view this as computing a function mapping {0, 1}^k to {0, 1}^k by reading off the values of the nodes in the last layer. We let N̂_k be the monoid of such functions mapping {0, 1}^k to {0, 1}^k. This provides the basis for the desired AC0 reduction in the following way: given an instance of the circuit value problem, we substitute constants for the variables according to the truth assignment and then view each layer of the circuit as an element of N̂_k by preceding it with k input nodes. By computing the product of these and evaluating it on the constants given to the first layer, the desired result is obtained.

The monoid N̂_k is shown to be solvable as in the previous section, by proving that all its groups are cyclic. A first step to obtain this is to eliminate constants from the circuits corresponding to group elements. Let N_k be the monoid of functions mapping {0, 1}^k to {0, 1}^k which are computed by width k cylindrical circuits with k variable input nodes, all placed in the first layer, with constant input nodes disallowed. It is then proved that every group in N̂_k is isomorphic to a group in N_k. The tool for studying N_k will be an identification of an input vector in {0, 1}^k with its set of maximal 1-intervals, as considered in [3], only here we consider cyclic intervals. For example, the vector 1010011011 is identified with the set of intervals {[3, 3], [6, 7], [9, 1]}. Now consider a group G in N_k with identity e, and let f ∈ G. Since e ∘ e = e, we get that e is the identity mapping on the image of e, Im e. Thus any f ∈ G is a permutation of Im e, since f ∘ f⁻¹ = f⁻¹ ∘ f = e and e ∘ f = f. Also, since f ∘ e = f, it follows that f is completely described by its restriction to Im e. The fact that f has an inverse on Im e is shown to imply that f must preserve the number of intervals in any x ∈ Im e. The crucial property employed here is the monotonicity of the gate operations. This furthermore implies that f is completely described by its restriction to the set I of vectors in Im e consisting of only a single interval. Next, using the natural partial order on I given by lifting the order 0 < 1 pointwise, one can decompose I into antichains, on which f ∈ G is easy to describe. In fact, f is a cyclic shift on each of these antichains. Finally, by relating these cyclic shifts, one can conclude that G is a cyclic group.
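The interval identification used above, as a sketch (1-indexed positions, matching the example; code ours):

```python
def one_intervals(v):
    """Maximal cyclic 1-intervals of a bit vector v."""
    k = len(v)
    if all(v):                # the all-ones vector: one interval, no canonical ends
        return [(1, k)]
    intervals = []
    for i in range(k):
        # an interval starts where the cyclic predecessor is 0
        if v[i] == 1 and v[(i - 1) % k] == 0:
            j = i
            while v[(j + 1) % k] == 1:
                j += 1
            intervals.append((i + 1, j % k + 1))
    return intervals

# The example from the text:
assert one_intervals([1,0,1,0,0,1,1,0,1,1]) == [(3, 3), (6, 7), (9, 1)]
```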
6 Conclusion and Open Problems

We have located the class of functions computed by small constant width cylindrical circuits (or nondeterministic branching programs) between Π2 ◦ MOD ◦ AC0 and ACC0. It would be very interesting to get an exact characterisation of the power of cylindrical circuits and branching programs in terms of bounded depth circuits. It is not known whether Π2 ◦ MOD ◦ AC0 is different from ACC0, and this seems a difficult problem to resolve, so we cannot hope for an unconditional separation of the power of cylindrical circuits from ACC0. On the other hand, it seems difficult to generalise the simulation of Π2 ◦ MOD ◦ AC0 by cylindrical branching programs to handle more than one layer of MOD gates, and we tend to believe that such a simulation is in general not possible. Thus, one could hope that, by better understanding the structure of the monoids we have considered in this paper, it would be possible to prove a seemingly better upper bound than ACC0, such as, for instance, AC0 ◦ MOD ◦ AC0. It would also be interesting to separate the power of branching programs from the power of circuits. As circuits can be trivially negated while preserving cylindricality, we immediately have that not only Π2 ◦ MOD ◦ AC0 but also Σ2 ◦ MOD ◦ AC0 can be simulated by small constant width cylindrical circuits. On the other hand, we do not know if Σ2 ◦ MOD ◦ AC0 can be simulated by small constant width cylindrical branching programs. Note that in the upwards planar case both models capture AC0, and in the geometrically unrestricted case both models capture NC1, so it is not clear if one should a priori conjecture the cylindrical models to have different power. Note that if the models have identical power, then they can simulate AC0 ◦ MOD ◦ AC0. This follows from the fact that the branching program model is closed under polynomial fanin AND while the circuit model is closed under negation. An interesting problem concerns the blowup of width to depth when going from a cylindrical circuit or branching program to an ACC0 circuit. Our proof does not yield anything better than a doubly exponential blowup. Again, by better understanding the structure of the monoids we have considered, one could hope for a better upper bound.

Acknowledgements. The first two authors are supported by BRICS, Basic Research in Computer Science, a Centre of the Danish National Research Foundation.

References

1. D. A. Barrington. Bounded-width polynomial-size branching programs recognize exactly those languages in NC1. J. Comput. System Sci., 38(1):150–164, 1989.
2. D. A. M. Barrington, C.-J. Lu, P. B. Miltersen, and S. Skyum. Searching constant width mazes captures the AC0 hierarchy. In Proceedings of the 15th Annual Symposium on Theoretical Aspects of Computer Science, pages 73–83, 1998.
3. D. A. M. Barrington, C.-J. Lu, P. B. Miltersen, and S. Skyum. On monotone planar circuits. In 14th Annual IEEE Conference on Computational Complexity, pages 24–31. IEEE Computer Society Press, 1999.
4. D. A. M. Barrington and D. Thérien. Finite monoids and the fine structure of NC1. Journal of the ACM, 35(4):941–952, 1988.
5. V. Grolmusz and G. Tardos. Lower bounds for (mod p − mod m) circuits. SIAM Journal on Computing, 29(4):1209–1222, Aug. 2000.
6. K. A. Hansen. Constant width planar computation characterizes ACC0. Technical Report 25, Electronic Colloquium on Computational Complexity, 2003.
7. K. A. Hansen, P. B. Miltersen, and V. Vinay. Circuits on cylinders. Technical Report 66, Electronic Colloquium on Computational Complexity, 2002.
8. S. Skyum and L. G. Valiant. A complexity theory based on boolean algebra. Journal of the ACM, 32(2):484–502, 1985.
9. V. Vinay. Hierarchies of circuit classes that are closed under complement. In 11th Annual IEEE Conference on Computational Complexity, pages 108–117. IEEE Computer Society, 1996.

Fast Perfect Phylogeny Haplotype Inference

Peter Damaschke
Chalmers University, Computing Sciences, 41296 Göteborg, Sweden
ptr@cs.chalmers.se

Abstract. We address the problem of reconstructing haplotypes in a population, given a sample of genotypes and assumptions about the underlying population.
The problem is of major interest in genetics, because haplotypes are more informative than genotypes when it comes to searching for trait genes, but it is difficult to obtain them directly by sequencing. After showing that simple resolution-based inference can be terribly wrong in some natural types of population, we propose a different combinatorial approach exploiting intersections of sampled genotypes (considered as sets of candidate haplotypes). For populations with perfect phylogeny we obtain an inference algorithm which is both sound and efficient. It yields with high probability the complete set of haplotypes showing up in the sample, for a sample size close to the trivial lower bound. The perfect phylogeny assumption is often justified, but we also believe that the ideas can be further extended to populations obeying relaxed structural assumptions. The ideas are quite different from other existing practical algorithms for the problem.

1 Introduction

Somatic cells of diploid organisms such as higher animals and plants contain two copies of genetic material, in pairs of homologous chromosomes. The material on an arbitrary but fixed part of a single chromosome is called a haplotype. Formally we may describe a haplotype as a vector (a_1, . . . , a_s), where s is the number of sites considered and a_i is the genetic data at site i. Here the term site can refer to a gene, a short subsequence, or even a single nucleotide. The a_i are called alleles. The vector of unordered pairs ({a_1, b_1}, . . . , {a_s, b_s}) resulting from haplotypes (a_1, . . . , a_s) and (b_1, . . . , b_s) on homologous chromosomes is called a genotype. A site is homozygous if a_i = b_i, and heterozygous (or ambiguous) if a_i ≠ b_i. The terminology in the literature is not completely standardized; in the present paper we use it as introduced above. Usual sequencing methods yield only genotypes but not the pairs of haplotypes they are built from, the so-called phase information. Haplotyping techniques exist, but they are much more expensive, and it is expected that this relation will remain so for many years. On the other hand, haplotype data is often needed for analyzing the background of hereditary dispositions. For example, a hereditary trait often originates from a single mutation on a chromosome that has been transmitted over generations, while further silent mutations (without effect) supervened. This way the trait is associated with a certain subset of haplotypes. If one wants to find the relevant mutation amongst the silent ones, it is useful to recognize haplotypes of affected individuals and to search the corresponding chromosomes only. Genotype information alone is less specific, also for the purpose of prediction of traits. Other applications include questions from population dynamics. Therefore it is important to reconstruct haplotypes from observed genotypes. A genotype with k > 0 ambiguous sites can be explained by 2^{k−1} distinct haplotype pairs, and reconstruction is impossible if we consider isolated genotypes only. However, if we have a large enough genotype sample from a population and a proper assumption about the structure of this population, we may be able to infer the haplotypes with high confidence.
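A brute-force illustration of the 2^{k−1} count, using the 0/1/2 genotype encoding that the paper introduces shortly (the code itself is ours):

```python
from itertools import product

def explanations(genotype):
    """All unordered haplotype pairs {a, b} explaining a ternary genotype
    (0/1 = homozygous site, 2 = ambiguous site); for k > 0 ambiguous sites
    there are 2^(k-1) such pairs."""
    amb = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(amb)):
        a, b = list(genotype), list(genotype)
        for i, bit in zip(amb, bits):
            a[i], b[i] = bit, 1 - bit
        pairs.add(frozenset((tuple(a), tuple(b))))
    return pairs

assert len(explanations((2, 0, 2, 1))) == 2 ** (2 - 1)
```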
One such assumption is the following.

Definition 1. A population fulfills the random mating assumption (is in Hardy-Weinberg equilibrium) if the haplotypes form pairs at random, according to their frequencies in the population, i.e. the probability to have a specific ordered pair of haplotypes in a randomly chosen genotype is simply the product of their frequencies.

Although this is not perfectly true in real populations, due to mating preferences and spatial structure, the behaviour of an inference algorithm in such a setting says much about its appropriateness. We focus attention on the biallelic case where each a_i has two possible values, which we may denote by Boolean constants 0 and 1. This is not a severe restriction, because there exist only two alleles per locus if mutations affect every locus only once, which is typically the case. For notational convenience we write haplotypes as binary strings and genotypes as ternary strings where 0, 1, and 2 stand for {0, 0}, {1, 1}, and {0, 1}, respectively.

Definition 2. For β ⊂ {0, 1, 2}, the β-set of a genotype or haplotype is the set of all sites whose value is in β. We omit set parentheses in β.

Sometimes it is convenient to rename the alleles such that some specific haplotype is w.l.o.g. the zero string 00 . . . 0. Note that the 2-sets of genotypes are invariant under this renaming. One may also think of haplotypes as vertices of the s-dimensional cube of Boolean vectors of length s. Having this picture in mind, we identify a genotype with the subcube c having the generating haplotype pair as one of its diagonals, i.e. with the set of haplotypes a ∈ c. This relation holds true iff a_i = c_i for all i in the 0,1-set of c. We will use the notations interchangeably.

Related literature and our contribution. We try to give an overview of various attempts, and we apologize for any omission. In [2], the following resolution (or subtraction) method has been proposed. Assume that our sample contains a genotype with no or one ambiguous site. Then we immediately know one or two haplotypes, respectively, for sure. They are called resolved haplotypes. For any resolved haplotype a = a_1 . . . a_s and any genotype c = c_1 . . . c_s such that c_i ≠ 2 implies c_i = a_i, it is possible that c is composed of a and another haplotype b defined by b_i = a_i for c_i ≠ 2 and b_i = 1 − a_i for c_i = 2. We call b the complement of a in c. The classical resolution algorithm simply assumes that c is indeed built from a and b; it considers b as a new resolved haplotype, removes c as a resolved genotype from the sample, and so on, until no further resolution step can be executed. Objections against this heuristic have been noticed already in [2]. A minor problem is that we may not find a resolved haplotype to start with. A large enough sample will contain some homozygous genotypes w.h.p. (Here and henceforth, w.h.p. means: with high probability.) More seriously, any resolution step may be wrong, i.e. the subcube c containing vertex a may actually be formed by a different haplotype pair. This is called an anomalous match. Even worse, further resolution steps starting from a false haplotype b may cause a cascade of such errors. The rash removal of resolved genotypes is yet another source of errors, since the same genotype may well be formed by different haplotype pairs in a population. Resolution has been further studied in [7,8]. The output depends on the ordering in which the steps are performed, and the “true” ordering must resolve all genotypes.
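The classical resolution heuristic just described, as a sketch (any step may commit exactly the anomalous-match error discussed above; representation as in the previous snippet):

```python
def complement_in(a, c):
    """Complement b of resolved haplotype a in genotype c, or None if a is
    not contained in the subcube c."""
    if any(ci != 2 and ci != ai for ai, ci in zip(a, c)):
        return None
    return tuple(ai if ci != 2 else 1 - ai for ai, ci in zip(a, c))

def resolve(sample):
    """Classical resolution: start from genotypes with <= 1 ambiguous site,
    then repeatedly resolve and remove genotypes."""
    resolved, todo = set(), []
    for c in sample:
        amb = [i for i, ci in enumerate(c) if ci == 2]
        if len(amb) <= 1:                 # trivially resolved genotype
            for bit in (0, 1):
                h = list(c)
                if amb:
                    h[amb[0]] = bit
                resolved.add(tuple(h))
        else:
            todo.append(c)
    progress = True
    while progress:
        progress = False
        for c in list(todo):
            for a in list(resolved):
                b = complement_in(a, c)
                if b is not None:         # assume c = {a, b}; may be wrong!
                    resolved.add(b)
                    todo.remove(c)
                    progress = True
                    break
    return resolved
```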
Unfortunately, the corresponding maximization problem, to resolve as many genotypes as possible, is Max-SNP hard [7], and moreover, a large number of resolved genotypes does not guarantee that the inferred haplotypes are correct. (There exist some conjectures, heuristic reasoning, and experimental results around this question, but apparently without rigorous theoretical foundation.) More advanced resolution algorithms solve some integer programming problem on a resolution graph constructed from the sample, and they can find good results in experiments [8], but still the reliability question remains. A completely different approach to haplotype inference is Bayesian statistics under the random mating assumption; we refer to [4,11,12]. Although accuracy has certainly been noticed as an issue, it is not obvious how reliable every single haplotype in the output sets of the various algorithms actually is.

In the present paper we address the question of reliable combinatorial haplotype inference methods. For haplotype populations having a perfect phylogenetic tree (definitions are given later) we show that a combinatorial algorithm which is different from resolution is able to infer all haplotypes w.h.p. from a large enough sample, whereas resolution is provably bad. The perfect phylogeny assumption was first used for haplotype inference in [9], resulting in an almost linear but very complicated algorithm (via reduction to the graph realization problem). Slower but practical and elegant algorithms were discovered shortly thereafter, independently, by [1,3], and they proved useful on real data. The work presented here (including the principal idea to exploit perfect phylogeny structure) was mainly finished before we became aware of [9,1,3]. We propose another elementary algorithm. It happens to be quite different from the algorithms in [1,3], which work with pairs of sites. Our approach is “orthogonal”, so to speak, as it works with pairs of genotypes. This can be advantageous for the running time, since only certain pairs of genotypes have to be considered. It should be noticed that the algorithms in [1,3] output a representation of all consistent haplotyping results, whereas our primary goal is to output the haplotypes that can be definitely determined. We also study the size of a random sample that leads to a unique result w.h.p. This does not mean that the method gives a result only in the latter case: it still resolves many haplotypes if fewer genotypes are available, and it is incremental in the sense that new genotype data can be easily incorporated. Due to the different approach and focus, our expected time complexity is not directly comparable to the previous bounds, but under some circumstances it seems to be favourable. (Details follow later.) We believe that our approach complements the arsenal of haplotype inference methods. It seems that the ideas can be generalized to more complicated populations.

2 Preliminaries

In addition to the notions already introduced, we clarify some more terminology as we use it in the paper.

Definition 3. The genotype formed by haplotypes a and b (where a = b is allowed) is simply denoted ab. Haplotype b is called the complement of a in ab, and vice versa.

Recall that we sometimes consider genotypes as sets (subcubes) of haplotypes, and note that each haplotype has a unique complement in a genotype.

Definition 4. A population is a set P of haplotypes, equipped with a frequency of each haplotype in P. Clearly, the frequencies sum up to 1.
A sample from P is a multiset G of genotypes (not haplotypes!) ab with a, b ∈ P. (The same genotype may appear several times in G.) An anomalous match, with respect to G, is a triple of haplotypes a, b, c such that a, b ∈ P, ab ∈ G, c ∈ ab, but the complement of c in ab is not in P.

An anomalous match can cause a wrong resolution step, if c is used to resolve ab. (We do not demand c ∈ P, since c may already be the result of an earlier false resolution step.) Since very rare haplotypes are hard to find but, on the other hand, are also of minor significance, we take a parameter n and aim at finding those haplotypes with frequency at least 1/n, where n is chosen large enough that the haplotypes with frequency below 1/n make up only a negligible fraction of P. In the following we adopt the random mating assumption and make some technical simplifications for the analysis later on. We emphasize that they are not substantial and do not affect the algorithm itself. Let f_i (i = 1, 2, . . .) denote the haplotype frequencies. In the worst case P contains n different haplotypes, all with f_i = 1/n. In general we will for simplicity pretend that all f_i are (roughly) integer multiples of 1/n. Then a haplotype of frequency f_i is considered as a set of f_i n haplotypes which are equal as strings. Henceforth, if we speak of “k haplotypes” or “k genotypes”, we do not require that they are pairwise different. We say “identical” and “distinct” when we refer to these copies of haplotypes and genotypes, and “equal” and “different” when we refer to their string values. The probability that a randomly chosen genotype yields a resolved haplotype is 1/n in the worst case.

Definition 5. The sample graph of G has vertex set P (consisting of n distinct haplotypes) and edge set G, that is: an edge joins two haplotypes if they produced the genotype corresponding to that edge.

A sample graph may contain loops (completely homozygous genotypes) and multiple edges (if the same haplotype pair is sampled several times). Note that the sample graph is of course not “visible”; otherwise we would already know P. Our focus is on asymptotic results, so we consider sums of sufficiently many independent random variables, sharply concentrated around their expected values, such that we may simply take these expectations for deterministic values. A well-known result on the coupon collector's problem says that, if we choose one of k objects at random, then, after O(k log k) such trials, we have w.h.p. touched every object at least once (see e.g. [10]). Consequently, if we sample O(n² log n) genotypes, then w.h.p. all haplotypes are trivially resolved, because all vertices in the sample graph get loops. The interesting question is what can be accomplished by a smaller sample. Thus, suppose that G has size n^{1+g}, with g < 1. Then the sample graph has loops at (expected) n^g distinct vertices and about n^{1+g} further edges between distinct vertices.
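A toy simulation of this sampling model (a sketch under the simplifying assumptions above; all constants are illustrative only):

```python
import random
from collections import defaultdict

def sample_graph(population, sample_size):
    """Random mating: each genotype is an unordered pair of independently
    drawn haplotypes; loops are completely homozygous genotypes."""
    graph = defaultdict(int)
    for _ in range(sample_size):
        a, b = random.choice(population), random.choice(population)
        graph[frozenset((a, b))] += 1     # multiplicities of (multi)edges
    return graph

# With n haplotypes and n^{1+g} samples (g < 1), roughly n^g vertices get loops.
n, g = 100, 0.5
pop = list(range(n))                      # abstract haplotype ids
G = sample_graph(pop, int(n ** (1 + g)))
loops = sum(1 for e in G if len(e) == 1)
print(f"{loops} loop vertices (expected about n^g = {n ** g:.0f})")
```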
3 Populations with Tree Structure

Now we approach the particular contribution of this paper. A natural special type of population has a single founder haplotype and is exposed to random mutations over time. As long as the population is relatively young and the total number of mutations (and hence n) is bounded by some small fraction of √s, w.h.p. each of the s sites is affected at most once. (Calculations are simple.) Non-affected sites can be ignored; therefore s henceforth denotes the number of sites where different alleles appear. From the uniqueness of mutations at every site it follows that such a population P forms a phylogenetic tree T that enjoys some strong properties discussed below. We call T a perfect phylogeny [6].

Definition 6. A population P of s-site haplotypes has a perfect phylogeny T if the following holds:
(1) T is a tree. The vertices of T are labeled by haplotypes (bit strings) such that:
(1.1) P is a subset of the vertex set of T.
(1.2) Labels of any two vertices joined by an edge in T differ on exactly one site.
(2) Edges of T are labeled by sites, such that:
(2.1) The label of every edge is the site mentioned in (1.2).
(2.2) Each site is the label of at most one edge.

A branch vertex of T is a vertex with degree > 2. The vertices of T can be seen as the haplotypes that appeared in the history of P. However, not every vertex is necessarily in P, since it can have disappeared by extinction. Every edge in T is labeled by the site of the allele that has been changed by the mutation corresponding to that edge. Sometimes we identify vertices and edges of T with their labels, i.e. haplotypes and sites, respectively. Note that T is an undirected tree. (Knowing the root is immaterial for our purpose.) The distance of two vertices in T equals the Hamming distance of their labels. For every pair of haplotypes a, b let [a, b] = [b, a] denote the unique path (of length 0 if a = b) in T connecting a and b. Obviously, the edge labels on [a, b] are exactly the members of the 2-set of ab. It follows easily:

Lemma 1. A haplotype c from T belongs to (the subcube) ab if and only if the vertex labeled c is on [a, b].

Proof. We have c ∈ ab iff a, b, c agree at all sites in the 0,1-set of ab. These sites are exactly the labels of edges outside [a, b]. □

Lemma 1 implies that every such triple a, b, c is an anomalous match, unless c = a or c = b: if the complement d of c in ab were in P, then d would be on [a, b] and [c, d] = [a, b], an obvious contradiction, as c is an inner vertex of [a, b]. Therefore we have many anomalous matches already in trivial cases: Θ(n³) if T is a path. Even in more natural cases such as fat trees, the number of anomalous matches is still in the order of n² log n. In general, suppose that we have n^{2+d} anomalous matches and sampled n^{1+g} random genotypes. Consider any of the n^g haplotypes in P which are resolved right from the beginning. It has the role of c in (expected) n^{1+d} anomalous matches, but it has only 2n^g true haplotypes as neighbors in the sample graph. That means that already for d > g − 1, almost all resolution results would be false. (In contrast to perfect trees, resolution is a very good method if parts of the genetic material under consideration have a high mutation rate: O(log n) random sites are enough to destroy all anomalous matches.) In the next section we address haplotype inference from a genotype sample G, provided that the given population P has a perfect phylogeny. Since resolution is highly misleading then, we follow another natural idea: we utilize intersections of genotypes (considered as subcubes) from the sample G.

4 Haplotype Inference in a Perfect Phylogeny

Problem statement: Given an unknown population P of haplotypes and a known sample G of genotypes, as in Definition 4. We assume (or: it is promised) that P has a perfect phylogeny T (unknown, of course).
Identify as many haplotypes in P as possible.

We continue analyzing the problem. Note that the intersection of any two paths in T, say [a, b] and [c, d], is either empty or a path, say [e, f]. Genotype intersection neatly corresponds to path intersection in T:

Lemma 2. With the above denotations, the intersection of genotypes ab and cd is the genotype ef.

Proof. W.l.o.g. let a − e − f − b and c − e − f − d be the orderings of the vertices a, b, c, d, e, f (not necessarily distinct) on the paths [a, b] and [c, d], respectively. Let the label of e be w.l.o.g. the zero string. Let A, B, C, D, F denote the sets of edge labels on [a, e], [b, f], [c, e], [d, f], [e, f], respectively. Then the labels of a, b, c, d, f have the 1-sets A, B ∪ F, C, D ∪ F, F, respectively. Hence ab has the 2-set A ∪ B ∪ F and the 1-set ∅. Similarly, cd has the 2-set C ∪ D ∪ F and the 1-set ∅. We conclude that ab ∩ cd has the 2-set F and the 1-set ∅. On the other hand, ef has the 2-set F and the 1-set ∅. Now equality follows. □

Due to this exact correspondence we sometimes use the notions genotype and path interchangeably if we do not risk confusion.

Definition 7. For a subset S of vertices in T, the hull [S] of S is the unique smallest subtree of T that includes S.

Algorithm, phase 1: We reconstruct [U], where U is the set of haplotypes known in the beginning (i.e. from genotypes of size 1 and 2), utilizing the algorithm of [5], which runs in O(ns) time.

Surely, the output [U] is a (correct) subtree of T, since this reconstruction problem has a unique solution up to isomorphism. While the labels of vertices in U are already determined, we have to compute the labels of the branch vertices in [U] as well. For any branch vertex d, there exist three vertices a, b, c ∈ U such that the paths from d to them are pairwise edge-disjoint. By Lemma 1, d belongs to each of ab, ac, bc. Given three binary strings a, b, c of length s, their majority string, also of length s, is simply defined as follows: at each position, the bit in the majority string is the bit appearing there in a, b, c two or three times.

Lemma 3. With the above denotations, the label of d is the majority string of the labels of a, b, c.

Proof. Consider any bit position i, and w.l.o.g. let 1 be the bit which has majority among a_i, b_i, c_i. W.l.o.g. let a_i = b_i = 1. Since d ∈ ab, we must have d_i = 1. □

Algorithm, phase 2: Compute the labels of all branch vertices d in [U] in O(ns) time, using Lemma 3. Note that we can choose some fixed vertex from U as a, and b, c as descendants of two distinct children of d in the tree rooted at a.

Let U′ be the union of U and the set of branch vertices in [U]. Note that [U′] = [U], and that U′ partitions [U] into edge-disjoint paths. Since we have the vertex labels in U′, we know the 2-set assigned to each of these paths, but not the internal linear ordering of edge labels. This gives reason to define the following data structure:

Definition 8. A path-labeled tree consists of:
- a tree,
- a subset of its vertices called pins,
- labels of the pins,
- labels of the pin paths, where a pin path is a path that connects two pins, without a further pin as internal vertex.

In our case, every pin path label is simply the set of edge labels on that pin path, i.e. we forget the ordering of edge labels, and the set of pins is initially U′. The path-labeled tree for [U′] can be finished in O(ns) time, as we know the labels of the pins, including all the branch vertices.
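Lemmas 2 and 3 both have short realizations on the string encodings; a sketch (ternary genotype strings and binary haplotype strings as above):

```python
def majority_string(a, b, c):
    """Positionwise majority of three binary strings (Lemma 3)."""
    return ''.join('1' if (x + y + z).count('1') >= 2 else '0'
                   for x, y, z in zip(a, b, c))

def genotype_intersection(ab, cd):
    """Intersection of two genotypes as subcubes (cf. Lemma 2);
    returns None if the subcubes are disjoint."""
    out = []
    for x, y in zip(ab, cd):
        if x == y:
            out.append(x)
        elif x == '2':
            out.append(y)
        elif y == '2':
            out.append(x)
        else:              # 0 meets 1: empty intersection
            return None
    return ''.join(out)

assert majority_string('0110', '0011', '0010') == '0010'
assert genotype_intersection('2210', '0222') == '0210'
```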
Sometimes we abuse notation and identify edges and their labels if the context is clear.

Algorithm, phase 3: For each genotype in G, compute the intersection of its 2-set with [U].

Recall that this intersection must be a path in [U] (since the 2-set of every genotype ab ∈ G with a, b ∈ P corresponds to the path [a, b] in T). In particular, if the 2-set of a genotype is entirely contained in [U], we conclude that the ends of this path are haplotypes in P. All intersections are obviously computable in O(n^{1+g} s) time. In our path-labeled tree we recover the labels of the end vertices of all (at most n^{1+g}) intersection paths [a, b] (where not necessarily a, b ∈ P), as described in the following. Path [a, b] intersects one or more pin paths in [U], and we can recognize these pin paths by the nonempty intersection of their labels with the known 2-set of ab. If an end of [a, b], say a, happens to be a pin, then nothing remains to be done with a. Otherwise a is an inner vertex of a pin path with ends denoted by c and d. If [a, b] intersects parts of [U] outside [c, d], let c be that end of [c, d] not included in [a, b]. By computing set differences we get the path labels of [a, d] and [c, a]. Since we know the label of pin c, and now also the 2-set of ca, we can change exactly those sites of c lying in this 2-set and obtain the label of a. (By symmetry we could also start from d.) Due to this refinement of the path-labeled tree, a satisfies all requirements to become a new pin. A slightly more complicated argument applies if [a, b] is contained in [c, d]. Again let a denote the end of [a, b] closer to c. Since we have the label of c and the 0-, 1-, and 2-set of ab, we can split the set of sites into three subsets: the 2-set of ab, and the remaining sites being equal and different, respectively, in c and ab. (Note that their values are 0 or 1.) If we walk the path [c, d] starting in c, the sites in the 2-set and those being equal in c and ab cannot be changed before a is reached, whereas the sites being different must be changed before a is reached. These conditions uniquely determine the path label of [c, a]. Once this path label is established, we recover the label of a as in the previous case. This refinement of the path-labeled tree is successively done for all genotypes from G. The operations, which are merely manipulations along paths in [U′], can be implemented in O(n^{1+g} s) time for all genotypes. We summarize the preliminary results in the following lemma.

Lemma 4. We can identify, in O(n^{1+g} s) time, all haplotypes a ∈ P for which there exists another haplotype b ∈ P such that ab ∈ G and a, b ∈ [U]. □

Next we try to identify also haplotypes that do not fulfill the condition in Lemma 4. Let ab ∈ G be a genotype such that [a, b] intersects [U], in at least one vertex or in some path. The part of [a, b] outside [U] may consist of two paths. Obviously, it is not possible to determine the correct splitting of the 2-set of ab if we solely look at ab. However, we shall see that pairwise intersections of genotypes are useful.

Definition 9. At any moment, the known part K of T is the subtree represented by our path-labeled tree as described in Definition 8, where each pin is a haplotype from P or a branch vertex or both.

In particular, after the steps leading to Lemma 4 we have K = [U].
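The flipping step used throughout phase 3, in isolation (0-indexed sites are our own implementation choice):

```python
def label_from_pin(c_label, path_two_set):
    """Recover the label of a vertex a from the label of a pin c and the known
    2-set of the path [c, a]: exactly the sites on the path are flipped."""
    return ''.join(str(1 - int(b)) if i in path_two_set else b
                   for i, b in enumerate(c_label))

# Walking from c = 0000 along a path whose edges are labelled by
# sites {1, 3} yields a = 0101.
assert label_from_pin('0000', {1, 3}) == '0101'
```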
Consider a, b, c, d ∈ P with ab, cd ∈ G, ab ≠ cd, and ab ∩ cd ≠ ∅. W.l.o.g. suppose that ab ⊈ cd. Due to Lemma 2, these assumptions imply that [e, f] := [a, b] ∩ [c, d] ≠ ∅, and that some edge of [a, b] is not in [e, f] but incident to e or f. Let us call this edge an anchor. Remember that we can easily compute the 0-, 1-, and 2-set of ef from the sampled ab and cd. From the 2-set we also get K ∩ [e, f], if this intersection contains at least one edge. By the same method as described in phase 3, using the labels of pins and pin paths, we can also determine the labels of the ends of K ∩ [e, f], and thus the precise location of K ∩ [e, f] in K, and split the path labels of the affected pin paths in K accordingly. With the denotations from the previous paragraph, next suppose that the anchor is also an edge of K. We can recognize if this is true, since we know that K ∩ [a, b] is a path in K extending K ∩ [e, f], and we know the corresponding 2-sets. In fact, an anchor belongs to K iff the 2-set of K ∩ [a, b] properly contains the 2-set of K ∩ [e, f].

Definition 10. With respect to K, we call ab, cd an anchored pair of genotypes if they have a nonempty intersection which also intersects K in at least one edge, [e, f] = [a, b] ∩ [c, d] is not completely in K, and they have an anchor, i.e. an edge from the set difference, incident to e or f, in K.

In that case we can conclude that one end of the path [e, f] in T is exactly the vertex of K where the anchor is attached to [e, f], since otherwise [e, f] would not be the intersection of [a, b] and [c, d]. (This picture of a fixed point where some “rope” ends inspired the naming “anchor”.) Finally, if we start at the anchor and trace the edges of K whose labels are in the known 2-set of ef, we can reconstruct the entire path [e, f], thereby adding its last part to K. In particular, e and f and the vertex where [e, f] leaves K become pins in the tree K extended by [e, f] \ K. Thus, if [e, f] is not entirely in K, we have extended the known part of T.

Algorithm, phase 4: Choose an anchored pair of genotypes and extend K. Repeat this step as long as possible. Resolve the genotypes whose paths are completely contained in K, as in Lemma 4.
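A sketch of the anchored-pair test, in the rephrased form given in the next paragraph (the dict representation of a genotype's intersection path with K is entirely our own):

```python
def is_anchored(p, q, K_edges):
    """p, q: dicts with 'sites' (the genotype's 2-set) and 'ends' =
    (x_end, other_end), the end vertices of its intersection path with K.
    Anchored iff both paths end at the same vertex x, the other ends differ,
    and the intersection of the 2-sets has a nonempty part outside K."""
    same_x = p['ends'][0] == q['ends'][0]
    distinct_other = p['ends'][1] != q['ends'][1]
    outside_K = (p['sites'] & q['sites']) - K_edges
    return same_x and distinct_other and bool(outside_K)
```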
Rephrasing Definition 10, we see that a pair of genotypes is anchored if their intersection paths with K end in the same vertex, x say, in K, their other ends in K are different, and the part of the intersection of their 2-sets not yet in K is nonempty. Testing any two genotypes from G for nonempty intersection outside K takes O(s) time, and each pair must be tested at most once: if the test fails, the intersection outside K will always remain empty, since K only grows. If the test succeeds, the missing part of the intersection is attached to K at x. This gives a naive overall time bound of O(n^{2(1+g)} s). However, the nice thing here is that we need not check all pairs in G in order to find anchored pairs. (The following is simpler than the implementation suggested in an earlier manuscript.) In a random sample G we can expect that every set of genotypes in G whose intersection paths in K end at the same vertex x is much smaller than n^{1+g}. Since tests can be restricted to paths that end in the same x, this already gives an improvement. Moreover, the remaining 2-sets of genotypes outside K can be maintained in O(n^{1+g} s) time during the course of the algorithm. To find an anchored pair with common end vertex x we may randomly pick such paths, first with mutually distinct other ends, and mark their edges outside K in an array of length < s. As long as no intersection is found, the time is within O(s). If the degree of x is smaller than the number of distinct ends, we find a nonempty intersection in O(s) time by the pigeonhole principle. Otherwise, since the sample graph is random, a nonempty intersection involves w.h.p. two paths with distinct ends in K, such that a few extra trials succeed. Thus we conjecture O(s² + n^{1+g} s) expected time for all O(s) extension steps, under the probabilistic assumptions made, but the complete analysis could be subtle. The algorithms in [1,3] both run in guaranteed time O(n^{1+g} s²) (in our terminology); however, recall that they also output a representation of not completely identified haplotypes, and that improved time bounds might be established. It is hard to compare the algorithms directly. To resume our haplotype inference algorithm for tree populations: first determine the set U of resolved haplotypes (i.e. genotypes being homozygous in all positions except at most one), set up the path-labeled tree description of K = [U], and then successively refine and enlarge it by paths from G in K and intersection paths of anchored pairs, as long as possible. With all the notation from above we can now state the following, still rather technical, result:

Lemma 5. Given a sample G of genotypes from a population P of haplotypes with perfect phylogeny, we can determine, in polynomial time, all haplotypes v ∈ P that satisfy these two conditions: v belongs to the subtree K of T obtained by successively adding, to the initially known subtree [U], intersection paths of anchored pairs, and v is an endpoint of some path from G in the final K. □

Note that Lemma 5 is a combinatorial statement, saying which haplotypes can at least be inferred from a given sample G. No probabilistic assumptions have been made at this stage. However, if we plug in the random mating assumption, we can expect that singleton intersections and anchors occur frequently enough that the final subtree K covers the entire population P:

Theorem 1. Given a population of n haplotypes with perfect phylogeny which form genotypes by random mating, our algorithm reconstructs the population w.h.p. from a random sample of n^{1+g} genotypes, where for any desired confidence level, any g > 0 is sufficient for large enough n.

Proof. (Sketch) In T we may assign to every path from G a random orientation, such that the bundles of roughly n^g paths starting in each vertex of P are pairwise independent random sets. This can only double the sample size estimate, but it simplifies the argument. Recall that initially K = [U], where U is the set of haplotypes known from the beginning. The expected number of elements in U is n^g. A component (maximal subtree) of T \ K of size larger than Õ(n^{1−g}) does not exist w.h.p., since it would contain w.h.p. an element from U, which is impossible by the definition of K. Now let v ∈ P be any vertex in any component C of T \ K. Some pair of paths from G starting in v has an anchor in K that allows us to extend K up to v, unless all these paths end in the same component of T \ K or at the same vertex in K. Since roughly n^g paths of G start in v and end in random vertices, the probability of this bad event is of the order of (1/n^g)^{n^g} for any single v, and at most n times as large for all v. Thus we will eventually have K = [P] w.h.p., and all haplotypes inside K can be recovered. □
If the haplotype fractions f_i < 1/n sum up to some considerable fraction r(n), the analysis goes through, only at the cost of another factor 1/(1 − r(n))² = O(1) in the sample size. The tradeoff between error probability and sample size may be further analyzed. Here it was our main concern to show that much fewer than O(n²) genotypes are sufficient. We may also recognize a larger part of T in the beginning, since one can show that intersections of genotypes with cardinality at most 2 must be vertices of T; on the other hand, it costs extra time to find them.

5 Conclusions

Although perfect phylogeny is more than just a narrow special case, as discussed in [9,1,3], some extensions are desirable. Can we still apply the ideas if P has arisen from several founders by mutations, if mutations affected some sites more than once, if several evolutionary paths led to the same haplotype, if mutations are interspersed with a few crossover events, etc.? If P consists of several perfect phylogenetic trees with pairwise Hamming distance greater than the number of mutations in each tree, the method obviously works with a slight modification: genotypes with 2-set larger than this distance are ignored. Since the others are composed of two haplotypes from the same tree, the trees can be recovered independently. The fraction of “useful” genotypes in a random sample, and thus the blow-up in sample size, is constant, for any constant number of trees. However, this trivial extension is no longer possible if the trees are not so well separated.

Acknowledgments. This work was partially supported by SWEGENE and by The Swedish Research Council (Vetenskapsrådet), project title “Algorithms for searching and inference in genetics”, file no. 621-2002-4574. I also thank Olle Nerman (Chalmers, Göteborg) and Andrzej Lingas (Lund) for some inspiring discussions.

References

1. V. Bafna, D. Gusfield, G. Lancia, S. Yooseph: Haplotyping as perfect phylogeny: A direct approach, UC Davis Computer Science Tech. Report CSE-2002-21
2. A. Clark: Inference of haplotypes from PCR-amplified samples of diploid populations, Mol. Biol. Evol. 7 (1990), 111–122
3. E. Eskin, E. Halperin, R.M. Karp: Large scale reconstruction of haplotypes from genotype data, 7th Int. Conf. on Research in Computational Molecular Biology RECOMB 2003, 104–113
4. L. Excoffier, M. Slatkin: Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population, Amer. Assoc. of Artif. Intell. 2000
5. D. Gusfield: Efficient algorithms for inferring evolutionary trees, Networks 21 (1991), 19–28
6. D. Gusfield: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge Univ. Press 1997
7. D. Gusfield: Inference of haplotypes from preamplified samples of diploid populations, UC Davis, technical report csse-99-6
8. D. Gusfield: A practical algorithm for optimal inference of haplotypes from diploid populations, 8th Int. Conf. on Intell. Systems for Mol. Biology ISMB 2000 (AAAI Press), 183–189
9. D. Gusfield: Haplotyping as perfect phylogeny: Conceptual framework and efficient solutions (extended abstract), 6th Int. Conf. on Research in Computational Molecular Biology RECOMB 2002, 166–175
10. R. Motwani, P. Raghavan: Randomized Algorithms, Cambridge Univ. Press 1995
11. M. Stephens, N.J. Smith, P. Donnelly: A new statistical method for haplotype reconstruction from population data, Amer. J. Human Genetics 68 (2001), 978–989
12. J. Zhang, M. Vingron, M.R. Hoehe: On haplotype reconstruction for diploid populations, EURANDOM technical report, 2001
On Exact and Approximation Algorithms for Distinguishing Substring Selection

Jens Gramm⋆, Jiong Guo⋆⋆, and Rolf Niedermeier⋆⋆
Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Sand 13, D-72076 Tübingen, Fed. Rep. of Germany
{gramm,guo,niedermr}@informatik.uni-tuebingen.de
⋆ Supported by the Deutsche Forschungsgemeinschaft (DFG), project OPAL (optimal solutions for hard problems in computational biology), NI 369/2.
⋆⋆ Partially supported by the Deutsche Forschungsgemeinschaft (DFG), junior research group PIAF (fixed-parameter algorithms), NI 369/4.

Abstract. The NP-complete Distinguishing Substring Selection problem (DSSS for short) asks, given a set of “good” strings and a set of “bad” strings, for a solution string which is, with respect to the Hamming metric, “away” from the good strings and “close” to the bad strings. Studying the parameterized complexity of DSSS, we show that DSSS is W[1]-hard with respect to its natural parameters. This, in particular, implies that a recently given polynomial-time approximation scheme (PTAS) by Deng et al. cannot be replaced by a so-called efficient polynomial-time approximation scheme (EPTAS) unless an unlikely collapse in parameterized complexity theory occurs. By way of contrast, for a special case of DSSS, we present an exact fixed-parameter algorithm solving the problem efficiently. In this way, we exhibit a sharp border between fixed-parameter tractability and intractability results.

Keywords: Algorithms and complexity, parameterized complexity, approximation algorithms, exact algorithms, computational biology.

1 Introduction

Recently, there has been strong interest in developing polynomial-time approximation schemes (PTAS's) for several string problems motivated by computational molecular biology [6,15,16]. More precisely, all these problems adhere to a scenario where we are looking for a string which is “close” to a given set of strings and, in some cases, which shall also be “far” from another given set of strings (see Lanctot et al. [14] for an overview on these kinds of problems and their applications in molecular biology). The underlying distance measure is the Hamming metric. The list of problems in this context includes Closest (Sub)String [15], Consensus Patterns [16], and Distinguishing (Sub)String Selection [6]. All these problems are NP-complete; hence polynomial-time exact solutions are out of reach, and PTAS's might be the best one can hope for. PTAS's, however, often carry huge hidden constant factors that make them useless from a practical point of view. This difficulty also occurs with the problems mentioned above. Hence, two natural questions arise.
1. To what extent can the above approximation schemes be made really practical? (As Fellows [10] put it in his recent survey, “it would be interesting to sort out which problems with PTAS's have any hope of practical approximation”; also see the new survey by Downey [7] for a good exposition on this issue.)
2. Are there, besides pure heuristics, theoretically satisfying approaches to solve these problems exactly, perhaps based on a parameterized point of view [2,10]?

In this paper, we address both these questions, focusing on the Distinguishing Substring Selection problem (DSSS):

Input: Given an alphabet Σ of constant size, two sets of strings over Σ,
– Sg = {s_1, . . . , s_{kg}}, each string of length at least L (the “good” strings),
, s′kb}, each string of length at least L (the “bad” strings),
and two non-negative integers dg and db.
Question: Is there a length-L string s over Σ such that
– for every si ∈ Sg and every length-L substring ti of si, dH(s, ti) ≥ dg, and
– every s′i ∈ Sb has at least one length-L substring t′i with dH(s, t′i) ≤ db?
Here, dH(s, ti) denotes the Hamming distance between strings s and ti. The terminology “good” and “bad” has its motivation in the application [14] of designing genetic markers to distinguish the sequences of harmful germs (to which the markers should bind) from human sequences (to which the markers should not bind). Following Deng et al. [6], we distinguish DSSS from Distinguishing String Selection (DSS) in which all good and bad strings have the same length L; note that Lanctot et al. [14] did not make this distinction and denoted both problems as DSS. The above mentioned Closest Substring is the special case of DSSS where the set of good strings is empty. Furthermore, Closest String is the special case of Closest Substring where all given strings and the goal string have the same length. Since Closest String is known to be NP-complete [12,14], the NP-completeness of Closest Substring and DSSS immediately follows. All the mentioned problems carry at least two natural input parameters (“distance” and “number of input strings”) which often are small in practice when compared to the overall input size. This leads to the important question whether the seemingly inevitable “combinatorial explosion” in exact algorithms for these problems can be restricted to some of the parameters; this is the parameterized complexity approach [2,7,8,10]. In [13], it was shown that for Closest String this can successfully be done for the “distance” parameter as well as the parameter “number of input strings”. However, Closest String is the easiest of these problems. As to Closest Substring, fixed-parameter intractability (in the above sense of restricting the combinatorial explosion to parameters) was recently shown with respect to the parameter “number of input strings” [11]. More precisely, a proof of W[1]-hardness (see [8] for details on parameterized complexity theory) was given. It was conjectured that Closest Substring is also fixed-parameter intractable with respect to the distance parameter, but it is an open question to prove (or disprove) this statement. (In fact, more hardness results for unbounded alphabet size are known [11]; here, we refer to the practically most relevant case of constant alphabet size.) Now, in this work, we show that DSSS is fixed-parameter intractable (i.e., W[1]-hard) with respect to all natural parameters as given in the problem definition and, thus, in particular, with respect to the distance parameters. Besides its intrinsic interest concerning the impossibility of efficient exact fixed-parameter algorithms (unless an unlikely collapse in structural parameterized complexity theory occurs [10]), this result also has important consequences concerning approximation algorithms. More precisely, our result implies that no efficient polynomial-time approximation scheme (EPTAS) in the sense of Cesati and Trevisan [5] is available for DSSS. As a consequence, there is strong theoretical support for the claim that the recent PTAS of Deng et al. [6] cannot be made practical.
In addition, we indicate an instructive border between fixed-parameter tractability and fixed-parameter intractability for DSSS which lies between alphabets of size two and alphabets of size greater than two. Two proofs in Sect. 4 had to be omitted due to lack of space.

2 Preliminaries and Previous Work

Parameterized Complexity. Given a graph G = (V, E) with vertex set V, edge set E, and a positive integer k, the NP-complete Vertex Cover problem is to determine whether there is a subset of vertices C ⊆ V with k or fewer vertices such that each edge in E has at least one of its endpoints in C. Vertex Cover is fixed-parameter tractable with respect to the parameter k: there are now algorithms solving it in O(1.3^k + kn) time. The corresponding complexity class is called FPT. By way of contrast, consider the NP-complete Clique problem: Given a graph G = (V, E) and a positive integer k, Clique asks whether there is a subset of vertices C ⊆ V with at least k vertices such that C forms a clique by having all possible edges between the vertices in C. Clique appears to be fixed-parameter intractable: It is not known whether it can be solved in f(k) · n^O(1) time, where f might be an arbitrarily fast growing function depending only on k. Downey and Fellows developed a completeness program for showing fixed-parameter intractability [8]. We very briefly sketch some integral parts of this theory.

Let L, L′ ⊆ Σ∗ × N be two parameterized languages. (Generally, the second component, representing the parameter, can also be drawn from Σ∗; for most cases, assuming the parameter to be a positive integer, or a tuple of positive integers, is sufficient.) For example, in the case of Clique, the first component is the input graph and the second component is the positive integer k, that is, the parameter. We say that L reduces to L′ by a standard parameterized m-reduction if there are functions k → k′ and k → k′′ from N to N and a function (x, k) → x′ from Σ∗ × N to Σ∗ such that
1. (x, k) → x′ is computable in time k′′ · |x|^c for some constant c, and
2. (x, k) ∈ L iff (x′, k′) ∈ L′.
Observe that in the subsequent section we will present a reduction from Clique to DSSS, mapping the Clique parameter k into all four parameters of DSSS; i.e., k′ in fact is a four-tuple (kg, kb, dg, db) = (1, (k choose 2), k + 3, k − 2) (see Sect. 3.1 for details). Notably, most reductions from classical complexity turn out not to be parameterized ones. The basic reference degree for fixed-parameter intractability, W[1], can be defined as the class of parameterized languages that are equivalent to the Short Turing Machine Acceptance problem (also known as the k-Step Halting problem). Here, we want to determine, for an input consisting of a nondeterministic Turing machine M and a string x, whether or not M has a computation path accepting x in at most k steps. This can trivially be solved in O(n^(k+1)) time and we would be surprised if this could be much improved. Therefore, this is the parameterized analogue of the Turing Machine Acceptance problem that is the basic generic NP-complete problem in classical complexity theory, and the conjecture that FPT ≠ W[1] is very much analogous to the conjecture that P ≠ NP. Other problems that are W[1]-hard (and also W[1]-complete) include Clique and Independent Set, where the parameter is the size of the relevant vertex set [8].
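To make the contrast concrete, the fixed-parameter tractability of Vertex Cover mentioned above can be realized by a simple bounded search tree. The following sketch is ours (in Python, not from the paper); branching on the two endpoints of an arbitrary uncovered edge yields a search tree of size O(2^k), i.e., an overall running time of the form f(k) · n^O(1), weaker than the O(1.3^k + kn) bounds cited above but of the same pattern.

    def vertex_cover(edges, k):
        # Decide whether the graph, given as a list of edges (pairs of
        # vertices), has a vertex cover of size at most k.
        if not edges:
            return True           # no edge left to cover
        if k == 0:
            return False          # edges remain, but the budget is used up
        u, v = edges[0]
        # Every vertex cover must contain u or v; branch on both choices,
        # removing all edges covered by the chosen vertex.
        return (vertex_cover([e for e in edges if u not in e], k - 1)
                or vertex_cover([e for e in edges if v not in e], k - 1))

    # Example: vertex_cover([(1, 2), (2, 3), (3, 4)], 2) -> True ({2, 3} is a cover).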
W[1]-hardness gives a concrete indication that a parameterized problem with parameter k is unlikely to allow for a solving algorithm with f(k) · n^O(1) running time, i.e., restricting the combinatorial explosion to k.

Approximation. In the following, we explain some basic terms of approximation theory, restricting ourselves to minimization problems. Given a minimization problem, a solution of the problem is (1 + ǫ)-approximate if the cost of the solution is d, the cost of an optimal solution is d_opt, and d/d_opt ≤ 1 + ǫ. A polynomial-time approximation scheme (PTAS) is an algorithm that computes, for any given real ǫ > 0, a (1 + ǫ)-approximate solution in polynomial time where ǫ is considered to be constant. For more details on approximation algorithms, refer to [4]. Typically, PTAS’s have a running time of n^O(1/ǫ), often with large constant factors hidden in the exponent which make them infeasible already for moderate approximation ratios. Therefore, Cesati and Trevisan [5] proposed the concept of an efficient polynomial-time approximation scheme (EPTAS) where the PTAS is required to have an f(ǫ) · n^O(1) running time where f is an arbitrary function depending only on ǫ and not on n. Notably, most known PTAS’s are not EPTAS’s [7,10].

Previous Work. Lanctot et al. [14] initiated the research on the algorithmic complexity of distinguishing string selection problems. In particular, besides showing NP-completeness (an independent NP-completeness result was also proven by Frances and Litman [12]), they gave a polynomial-time factor-2 approximation for DSSS. Building on PTAS algorithms for Closest String and Closest Substring [15], Deng et al. [6] recently gave a PTAS for DSSS. There appear to be no nontrivial results on exact or fixed-parameter algorithms for DSSS. Since Closest Substring is a special case of DSSS, however, the fixed-parameter intractability results for Closest Substring [11] also apply to DSSS, implying that DSSS is W[1]-hard with respect to the parameter “number of input strings”. Finally, the special case DSS of DSSS (where all given input strings have exactly the same length as the goal string) is solvable in O((kg + kb) · L · (max{db + 1, (d′g + 1) · (|Σ| − 1)})^db) time with d′g = L − dg [13]; i.e., for constant alphabet size, it is fixed-parameter tractable with respect to the aggregate parameter (d′g, db). In a sense, DSS relates to DSSS as Closest String relates to Closest Substring and, thus, DSS should be regarded as considerably easier and of less practical importance than DSSS.

3 Fixed-Parameter Intractability of DSSS

We show that DSSS is, even for a binary alphabet, W[1]-hard with respect to the aggregate parameter (dg, db, kg, kb). This also means hardness for every single one of these parameters. With [5], this implies that DSSS does not have an EPTAS. To simplify the presentation, in the rest of this section we use the following technical terms. Regarding the good strings, we say that a length-L string s matches an si ∈ Sg or, equivalently, s is a match for si, if dH(s, ti) ≥ dg for every length-L substring ti of si. Regarding the bad strings, we say that a length-L string s matches an s′i ∈ Sb or, equivalently, s is a match for s′i, if there is a length-L substring t′i of s′i with dH(s, t′i) ≤ db.
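Concretely, both notions of matching amount to sliding-window Hamming distance computations. The following small sketch is ours (in Python, assuming 0-based string indexing):

    def d_H(x, y):
        # Hamming distance of two equal-length strings.
        return sum(a != b for a, b in zip(x, y))

    def matches_good(s, s_i, d_g):
        # s matches s_i in S_g iff EVERY length-L window of s_i is at distance >= d_g.
        L = len(s)
        return all(d_H(s, s_i[p:p + L]) >= d_g for p in range(len(s_i) - L + 1))

    def matches_bad(s, s_i, d_b):
        # s matches s_i in S_b iff SOME length-L window of s_i is at distance <= d_b.
        L = len(s)
        return any(d_H(s, s_i[p:p + L]) <= d_b for p in range(len(s_i) - L + 1))

A length-L string s then solves the DSSS instance iff matches_good(s, si, dg) holds for every si ∈ Sg and matches_bad(s, s′i, db) holds for every s′i ∈ Sb.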
Both these notions of matching for good as well as for bad strings generalize to sets of strings in the natural way. Our hardness proof has a structure similar to that of the W[1]-hardness proof for Closest Substring [11]. We give a parameterized reduction from Clique to DSSS. Here, however, the reduction has novel features in two ways. Firstly, from the technical point of view, the reduction becomes much more compact and, thus, more elegant. Secondly, for Closest Substring with binary alphabet, we could only show W[1]-hardness with respect to the number of input strings. Here, however, we can show W[1]-hardness with respect to, among others, the parameters dg and db. This has strong implications: Here, we can conclude that DSSS has no EPTAS, which is an open question for Closest Substring [11].

3.1 Reduction from Clique to DSSS

A Clique instance is given by an undirected graph G = (V, E), with a set V = {v1, v2, . . . , vn} of n vertices, a set E of m edges, and a positive integer k denoting the desired clique size. We describe how to generate two sets of strings over the alphabet {0, 1}, Sg (containing one string sg of length L := nk + 5) and Sb (containing (k choose 2) strings, each of length m · (2nk + 5) + (m − 1)), such that G has a clique of size k iff there is a length-L string s which is a match for Sg and also for Sb; this means that dH(s, sg) ≥ dg with Sg := {sg} and dg := k + 3, and every s′b ∈ Sb has a length-L substring t′b with dH(s, t′b) ≤ db and db := k − 2. In the following we use “◦” to denote the concatenation of strings.

Good String. Sg := {sg} where sg = 0^L, the all-zero string of length L.

Bad Strings. Sb := {s′1,2, . . . , s′1,k, s′2,3, s′2,4, . . . , s′k−1,k}, where every s′i,j has length m · (2nk + 5) + (m − 1) and encodes the whole graph; in the following, we describe how we generate a string s′i,j. We encode a vertex vr ∈ V, 1 ≤ r ≤ n, in a length-n string by setting the rth position of this string to “1” and all other positions to “0”, i.e., vertex(vr) := 0^(r−1) 1 0^(n−r). In s′i,j, we encode an edge {vr, vs} ∈ E, 1 ≤ r < s ≤ n, by a length-(nk) string consisting of k length-n blocks,
edge(i, j, {vr, vs}) := (0^n)^(i−1) ◦ vertex(vr) ◦ (0^n)^(j−i−1) ◦ vertex(vs) ◦ (0^n)^(k−j),
i.e., i − 1 all-zero blocks, then vertex(vr), then j − i − 1 all-zero blocks, then vertex(vs), then k − j all-zero blocks. Furthermore, we define
edge block(i, j, {vr, vs}) := edge(i, j, {vr, vs}) ◦ 01110 ◦ edge(i, j, {vr, vs}).
We choose this way of constructing the edge block(·, ·, ·) strings for the following reason: Let edge(i, j, {vr, vs})[h1, h2] denote the substring of edge(i, j, {vr, vs}) ranging from position h1 to position h2. Then, every length-L substring of edge block(·, ·, ·) (recall L = nk + 5) which contains the “01110” substring will have the form
edge(i, j, {vr, vs})[h, nk] ◦ 01110 ◦ edge(i, j, {vr, vs})[1, h − 1]
for 1 ≤ h ≤ nk + 1. This will be important because our goal is that a match for a solution in a bad string contains all information of edge(i, j, {vr, vs}). It is difficult to enforce that a match starts at a particular position, but we will show that we are able to enforce that it contains a “111” substring which, by our construction, implies that the match contains all information of edge(i, j, {vr, vs}). Then, given E = {e1, . . . , em}, we set
s′i,j := edge block(i, j, e1) ◦ 0 ◦ edge block(i, j, e2) ◦ 0 ◦ . . . ◦ 0 ◦ edge block(i, j, em).

Parameter Values. We set L := nk + 5 and generate kg := 1 good string and kb := (k choose 2) bad strings, and we set the distance parameters dg := k + 3 and db := k − 2.
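For concreteness, the gadget strings just defined can be generated directly. The following sketch is ours (in Python; vertices are numbered 1, . . . , n and edges are given as pairs (r, s) with r < s):

    from itertools import combinations

    def vertex_str(r, n):
        # vertex(v_r) = 0^(r-1) 1 0^(n-r)
        return "0" * (r - 1) + "1" + "0" * (n - r)

    def edge_str(i, j, r, s, n, k):
        # edge(i, j, {v_r, v_s}): k length-n blocks, with vertex(v_r) in
        # block i and vertex(v_s) in block j (i < j, r < s).
        blocks = ["0" * n] * k
        blocks[i - 1] = vertex_str(r, n)
        blocks[j - 1] = vertex_str(s, n)
        return "".join(blocks)

    def edge_block(i, j, r, s, n, k):
        e = edge_str(i, j, r, s, n, k)
        return e + "01110" + e          # length 2nk + 5

    def bad_string(i, j, edges, n, k):
        # s'_{i,j}: all edge blocks, separated by single "0"s;
        # total length m(2nk + 5) + (m - 1).
        return "0".join(edge_block(i, j, r, s, n, k) for (r, s) in edges)

    def instance(edges, n, k):
        S_g = ["0" * (n * k + 5)]
        S_b = [bad_string(i, j, edges, n, k)
               for i, j in combinations(range(1, k + 1), 2)]
        return S_g, S_b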
Example. Let G = (V, E) with V := {v1, v2, v3, v4} and E := {{v1, v3}, {v1, v4}, {v2, v3}, {v3, v4}} as shown in Fig. 1(a) and let k = 3. Fig. 1(b) displays the good string sg and the (k choose 2) = 3 bad strings s′1,2, s′1,3, and s′2,3. Additionally, we show the length-(nk + 5), i.e., length-17, string s which is a match for Sg = {sg} and a match for Sb = {s′1,2, s′1,3, s′2,3} and, thus, corresponds to the k-clique in G.

Fig. 1. Example for the reduction from a Clique instance to a DSSS instance with binary alphabet. (a) A Clique instance G = (V, E) with k = 3. (b) The produced DSSS instance. We indicate the “1”s of the construction by grey boxes, the “0”s by white boxes. We display the solution s that is found since G has a clique of size k = 3; matches of s in s′1,2, s′1,3, and s′2,3 are indicated by dashed boxes. By bold lines we indicate the substrings by which we constructed the bad strings: each edge block(·, ·, e) substring is built from edge(·, ·, e) for some e ∈ E, consisting of k length-n substrings, followed by “01110”, followed again by edge(·, ·, e). (c) Alignment of the matches t′1,2, t′1,3, and t′2,3 (marked by dashed boxes in (b)) with sg and s.

3.2 Correctness of the Reduction

We show the two directions of the correctness proof for the above construction by two lemmas.

Lemma 1 For a graph with a k-clique, the construction in Sect. 3.1 produces an instance of DSSS that has a solution, i.e., there is a length-L string s such that dH(s, sg) ≥ dg and every s′i,j ∈ Sb has a length-L substring t′i,j with dH(s, t′i,j) ≤ db.

Proof. Let h1, h2, . . . , hk denote the indices of the clique’s vertices, 1 ≤ h1 < h2 < · · · < hk ≤ n. Then, we can find a solution string s := vertex(vh1) ◦ vertex(vh2) ◦ · · · ◦ vertex(vhk) ◦ 01110. For every s′i,j, 1 ≤ i < j ≤ k, the bad string s′i,j contains a substring t′i,j with dH(s, t′i,j) ≤ db = k − 2, namely t′i,j := edge(i, j, {vhi, vhj}) ◦ 01110. Moreover, we have dH(s, sg) ≥ dg = k + 3. ⊓ ⊔

Lemma 2 A solution for the DSSS instance produced from a graph G by the construction in Sect. 3.1 corresponds to a k-clique in G.

Proof. We prove this statement in several steps:
(1) We observe that a solution for the DSSS instance has at least k + 3 “1”s since dH(s, sg) ≥ dg = k + 3 and sg consists only of “0”s.
(2) We observe that a solution for the DSSS instance has at most k + 3 “1”s: Following the construction, every length-L substring t′i,j of every bad string s′i,j, 1 ≤ i < j ≤ k, contains at most five “1”s and dH(s, t′i,j) ≤ k − 2.
(3) A match t′i,j for s in the bad string s′i,j contains exactly five “1”s: This follows from the observation that any length-L substring in a bad string contains at most five “1”s together with (1) and (2): Only if t′i,j contains five “1”s, all of which coincide with “1”s in s, do we have dH(s, t′i,j) ≤ (k + 3) − 5 = k − 2.
(4) All t′i,j, 1 ≤ i < j ≤ k, and s must contain a “111” substring, located at the same position: To show this, let t′i,j be a match of s in a bad string s′i,j for some 1 ≤ i < j ≤ k. From (3), we know that the match t′i,j must contain exactly five “1”s. Thus, since a substring of a bad string contains five “1”s only if it contains a “111” substring, t′i,j must also contain a “111” substring (which separates in s′i,j two substrings edge(i, j, e) for some e ∈ E). All “1”s in t′i,j have to coincide with “1”s chosen from the k + 3 “1”s in s.
In particular, the position of the “111” substring must be the same in the solution and in t′i,j for all 1 ≤ i < j ≤ k. This ensures a “synchronization” of the matches.
(5) W.l.o.g., all t′i,j, 1 ≤ i < j ≤ k, and s end with the “01110” substring: From (4), we know that all t′i,j contain a “111” substring at the same position. If they do not all end with “01110”, we can shift them such that the contained “111” substring is shifted to the appropriate position, as we describe more precisely in the following. Recall that every length-L substring which contains the “111” substring of edge block(i, j, e) has the form edge(i, j, e)[h, nk] ◦ 01110 ◦ edge(i, j, e)[1, h − 1] for 1 ≤ h ≤ nk and e ∈ E. Since all t′i,j, 1 ≤ i < j ≤ k, contain the “111” substring at the same position, they all have this form for the same h. Then, we can, instead, consider edge(i, j, e)[1, nk] ◦ 01110 and, by a circular shift, move the “111” substring in the solution to the appropriate position. Considering the solution s and the matches t′i,j for all 1 ≤ i < j ≤ k as a character matrix, this is a reordering of columns and, thus, the pairwise Hamming distances do not change.
(6) We divide the first nk positions of the matches and the solution into k “sections”, each of length n. In s, each of these sections has the form vertex(v) for a vertex v ∈ V by the following argument: By (5), all matches in bad strings end with “01110” and, by the way we constructed the bad strings, each of their sections either consists only of “0”s or has the form vertex(v) for a vertex v ∈ V. If the section encodes a vertex, it contains one “1” which has to coincide with a “1” in s. For the ith section, 1 ≤ i ≤ k, the matches in strings s′i,j for i < j ≤ k and in strings s′j,i for 1 ≤ j < i encode a vertex in their ith section. Therefore, each of the k sections in s contains a “1” and, since s (by (1) and (2)) contains exactly k + 3 “1”s and (by (5)) ends with “01110”, each of its sections contains exactly one “1”. Therefore, every section of s can be read as the encoding vertex(v) for a v ∈ V.

Conclusion. Following (6), let vhi, 1 ≤ i ≤ k, be the vertex encoded in the ith length-n section of s. Now, consider some 1 ≤ i < j ≤ k. Solution s has a match in s′i,j iff there is an edge(i, j, {vhi, vhj}) ◦ 01110 substring in s′i,j, and this holds iff {vhi, vhj} ∈ E. Since this is true for all 1 ≤ i < j ≤ k, all vh1, vh2, . . . , vhk are pairwise connected by edges in G and, thus, form a k-clique. ⊓ ⊔

Lemmas 1 and 2 yield the following theorem.

Theorem 1 DSSS with binary alphabet is W[1]-hard for every combination of the parameters kg, kb, dg, and db. (Note that this is the strongest statement possible for these parameters because it means that the combinatorial explosion cannot be restricted to a function f(kg, kb, dg, db).) ⊓ ⊔

Theorem 1 means, in particular, that DSSS with binary alphabet is W[1]-hard with respect to every single parameter kg, kb, dg, and db. Moreover, it allows us to exploit an important connection between parameterized complexity and the theory of approximation algorithms as follows.

Corollary 1 There is no EPTAS for DSSS unless W[1] = FPT.

Proof. Cesati and Trevisan [5] have shown that a problem with an EPTAS is fixed-parameter tractable with respect to the parameters that correspond to the objective functions of the EPTAS. In Theorem 1, we have shown W[1]-hardness for DSSS with respect to dg and db. Therefore, we conclude that DSSS cannot have an EPTAS for the objective functions dg and db unless W[1] = FPT.
⊓ ⊔

4 Fixed-Parameter Tractability for a Special Case

In this section, we give a fixed-parameter algorithm for a modified version of DSSS. First of all, we restrict the problem to a binary alphabet Σ = {0, 1}. Then, the problem input consists, similarly as in DSSS, of two sets Sg and Sb of binary strings, here with all strings in Sg being of length L. At the cost of increasing the number of good strings, we can easily transform an instance of DSSS into one in which all good strings have the same length L by replacing each string si ∈ Sg by a set containing all length-L substrings of si. Therefore, in the same way as Deng et al. [6], we assume in the following that all strings in Sg have length L. We now consider, instead of the parameter dg from the DSSS definition, the “dual parameter” d′g := L − dg such that we require a solution string s with dH(s, si) ≥ L − d′g for all si ∈ Sg. The idea behind this is that in some practical cases it might occur that, while dg is rather large, d′g is fairly small. Hence, restricting the combinatorial explosion to d′g might sometimes be more natural than restricting it to dg. Parameter d′g is said to be optimal if there is an s with dH(s, si) ≥ L − d′g for all si ∈ Sg and if there is no s′ with dH(s′, si) ≥ L − d′g + 1 for all si ∈ Sg. The question addressed in this section is to find the minimum integer db such that, for the optimal parameter value d′g, there is a length-L string s with dH(s, si) ≥ L − d′g for every si ∈ Sg and such that every s′i ∈ Sb has a length-L substring t′i with dH(s, t′i) ≤ db. Naturally, we also want to compute the length-L solution string s corresponding to the found minimum db. We refer to this modified version of DSSS as MDSSS. We can read the set Sg of kg length-L strings as a kg × L character matrix. We call a column in this matrix dirty if it contains “0”s as well as “1”s. In the following, we present an algorithm solving MDSSS. We conclude this section by pointing out the difficulties arising when giving up some of the restrictions concerning MDSSS.

4.1 Fixed-Parameter Algorithm

We present an algorithm that shows the fixed-parameter tractability of MDSSS with respect to the parameter d′g. There are instances of MDSSS where d′g is in fact smaller than the parameter dg. In these cases, solving MDSSS could be a way to circumvent the combinatorial difficulty of computing exact solutions for DSSS; notably, DSSS is not fixed-parameter tractable with respect to dg unless FPT = W[1] (Sect. 3), and we conjecture that it is not fixed-parameter tractable with respect to d′g either. The structure of the algorithm is as follows.
Preprocessing: Process all non-dirty columns of the input set Sg. If there are more than d′g · kg dirty columns then reject the input instance. Otherwise, proceed on the thereby reduced set Sg consisting only of dirty columns.
Phase 1: Determine all solutions s such that dH(s, si) ≥ L − d′g for every si ∈ Sg for the optimal d′g.
Phase 2: For every s found in Phase 1, determine the minimal value of db such that every s′i ∈ Sb has a length-L substring t′i with dH(s, t′i) ≤ db. Finally, find the minimum value of db over all examined choices of s.
Note that, in fact, Phase 1 and Phase 2 are interleaved; a sketch of the preprocessing step follows.
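The following is a minimal sketch of the preprocessing step, ours (in Python, over the binary alphabet): it fixes the solution characters for all non-dirty columns, collects the dirty ones, and rejects if their number exceeds d′g · kg (cf. Lemma 3 below).

    def preprocess(S_g, d_g_dual):
        # Read S_g as a k_g x L character matrix over {'0', '1'}.
        k_g, L = len(S_g), len(S_g[0])
        fixed = {}        # column index -> forced solution character
        dirty = []        # indices of dirty columns
        for p in range(L):
            column = {row[p] for row in S_g}
            if column == {"0"}:
                fixed[p] = "1"      # all-'0' column: the solution must differ
            elif column == {"1"}:
                fixed[p] = "0"      # all-'1' column: the solution must differ
            else:
                dirty.append(p)     # column contains both characters
        if len(dirty) > d_g_dual * k_g:
            return None             # reject: no solution for this d'_g
        return fixed, dirty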
Phase 1 of our algorithm extends the ideas behind a bounded search tree algorithm for Closest String in [13]. There, however, the focus was on finding one solution, whereas here we are required to find all solutions for the optimal parameter value. This extension was only mentioned in [13]; it will be described here.

Preprocessing. Reading the set Sg as a kg × L character matrix, we set, for an all-“0” (all-“1”) column in this matrix, the corresponding character in the solution to “1” (“0”); otherwise, we would not find a solution for an optimal d′g. If the number of remaining dirty columns is larger than d′g · kg then we reject the input instance since no solution is possible.

Phase 1. The precondition of this phase is an optimal parameter d′g. Since, in general, the optimal d′g is not known in advance, it can be found by looping through d′g = 0, 1, 2, . . . , each time invoking the procedure described in the following until we meet the optimal d′g. Notably, for each such d′g value, we do not have to redo the preprocessing, but only compare the number of dirty columns against d′g · kg. Phase 1 is realized as a recursive procedure: We maintain a length-L candidate string sc which is initialized as sc := inv(s1) for s1 ∈ Sg, where inv(s1) denotes the bitwise complement of s1. We call a recursive procedure Solve MDSSS, given in Fig. 2, working as follows. If sc is far away from all strings in Sg (i.e., dH(sc, si) ≥ L − d′g for all si ∈ Sg) then sc already is a solution for Phase 1. We invoke the second phase of the algorithm with the argument sc. Since it is possible that sc can be further transformed into another solution, we continue the traversal of the search tree: we select a string si ∈ Sg such that sc is not allowed to be closer to si (i.e., dH(sc, si) = L − d′g); such an si must exist since parameter d′g is optimal. We try all possible ways to move sc away from si (such that dH(sc, si) = L − (d′g − 1)), calling the recursive procedure Solve MDSSS for each of the produced instances. Otherwise, if sc is not a solution for Phase 1, we select a string si ∈ Sg such that sc is too close to si (i.e., dH(sc, si) < L − d′g) and try all possible ways to move sc away from si, calling the recursive procedure for each of the produced instances. The invocations of the recursive procedure can, thus, be described by a search tree. In the above recursive calls, we omit those calls trying to change a position in sc which has already been changed before. Therefore, we also omit further invocations of the recursive procedure if the current node of the search tree is already at depth d′g of the tree; otherwise, sc would move too close to s1 (i.e., dH(sc, s1) < L − d′g). Phase 1 is given more precisely in Fig. 2. It is invoked by Solve MDSSS(inv(s1), d′g).

Phase 2. The second phase deals with determining the minimal value of db such that there is a string s in the set of the solution strings found in the first phase with dH(s, t′i) ≤ db for 1 ≤ i ≤ kb, where t′i is a length-L substring of s′i. For a given solution string s from the first phase and a string s′i ∈ Sb, we use Abrahamson’s algorithm [1] to find the minimum number of mismatches between s and every length-L substring of s′i in O(|s′i| · √(L log L)) time. This minimum is equal to min over t′i of dH(s, t′i), where t′i ranges over the length-L substrings of s′i. Applying this algorithm to all strings in Sb, we get the value of db for s, namely max over i = 1, . . . , kb of min over t′i of dH(s, t′i). The minimum value of db is then the minimum, over all solution strings from Phase 1, of this distance to the bad strings, and the s achieving this minimum is the corresponding solution string. If we are given a fixed db and are asked whether there is a string s among the solution strings from the first phase which is a match for all strings in Sb, there is a more efficient algorithm by Amir et al. [3] for string matching with db mismatches, which takes only O(|s′i| · √(db log db)) time to find all length-L substrings of s′i whose Hamming distance to s is at most db.
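In place of the algorithms of Abrahamson and of Amir et al., a naive O(N · L) sliding-window computation already conveys what Phase 2 computes. The following sketch is ours (in Python; it assumes every bad string has length at least L):

    def phase2_value(s, S_b):
        # The db value achieved by a Phase 1 solution s: the maximum, over
        # all bad strings, of the minimum Hamming distance between s and a
        # length-L window of that bad string.
        L = len(s)
        def min_window_dist(t):
            return min(sum(a != b for a, b in zip(s, t[p:p + L]))
                       for p in range(len(t) - L + 1))
        return max(min_window_dist(t) for t in S_b)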
4.2 Correctness of the Algorithm

Preprocessing. The correctness of the preprocessing follows in a similar way as the correctness of the “problem kernel” for Closest String observed by Evans et al. [9] (proof omitted).

Recursive procedure Solve MDSSS(sc, ∆d):
Global variables: Sets Sg and Sb of strings, all strings in Sg of length L, and integer d′g.
Input: Candidate string sc and integer ∆d, 0 ≤ ∆d ≤ d′g.
Output: For optimal d′g, each length-L string ŝ with dH(ŝ, si) ≥ L − d′g for all si ∈ Sg and dH(ŝ, sc) ≤ ∆d.
Remark: The procedure calls, for each computed string ŝ, Phase 2 of the algorithm.
Method:
(0) if (∆d < 0) then return;
(1) if (dH(sc, si) ≤ L − (d′g + ∆d)) for some i ∈ {1, . . . , kg} then return;
(2) if (dH(sc, si) ≥ L − d′g) for all i = 1, . . . , kg then
      /* sc already is a solution for Phase 1 */
      call Phase 2(sc, Sb);
      choose i ∈ {1, . . . , kg} such that dH(sc, si) = L − d′g;
      P := { p | sc[p] = si[p] };
      for all p ∈ P do
          s′c := sc; s′c[p] := inv(sc[p]);
          call Solve MDSSS(s′c, ∆d − 1);
      end for
    else
      /* sc is not a solution for Phase 1 */
      choose i ∈ {1, . . . , kg} such that dH(sc, si) < L − d′g;
      Q := { p | sc[p] = si[p] };
      choose any Q′ ⊆ Q with |Q′| = d′g + 1;
      for all q ∈ Q′ do
          s′c := sc; s′c[q] := inv(sc[q]);
          call Solve MDSSS(s′c, ∆d − 1);
      end for
    end if
(3) return;

Fig. 2. Recursive procedure realizing Phase 1 of the algorithm for MDSSS.

Lemma 3 Given an MDSSS instance with the set Sg of kg good length-L strings and a positive integer d′g: if the resulting kg × L matrix has more than kg · d′g dirty columns then there is no string s with dH(s, si) ≥ L − d′g for all si ∈ Sg. ⊓ ⊔

Phase 1. From Step (2) in Fig. 2 it is obvious that every string s which is output by Phase 1, and for which, then, Phase 2 is invoked, satisfies dH(s, si) ≥ L − d′g for all si ∈ Sg. The reverse direction, i.e., to show that Phase 1 finds every length-L string s with dH(s, si) ≥ L − d′g for all si ∈ Sg, is more involved; the proof is omitted:

Lemma 4 Given an MDSSS instance, if s is an arbitrary length-L solution string, i.e., dH(s, si) ≥ L − d′g for all si ∈ Sg, then s can be found by calling procedure Solve MDSSS. ⊓ ⊔

Phase 2. The second phase is only an application of known algorithms.

4.3 Running Time of the Algorithm

Preprocessing. The preprocessing can easily be done in O(L · kg) time. Even if the optimal d′g is not known in advance, we can simply process the non-dirty columns and count the number Ld of dirty ones; therefore, the preprocessing has to be done only once. Then, while looping through d′g = 0, 1, 2, . . . in order to find the optimal d′g, we only have to check, for every value of d′g in constant time, whether Ld ≤ d′g · kg.

Phase 1. The dependencies of the recursive calls of procedure Solve MDSSS can be described as a search tree in which an instance of the procedure is the parent node of all its recursive calls.
One call of procedure Solve MDSSS invokes at most d′g + 1 new recursive calls. More precisely, if sc is a solution then it invokes at most d′g calls, and if sc is not a solution then it invokes at most d′g + 1 calls. Therefore, every node in the search tree has at most d′g + 1 children. Moreover, ∆d is initialized to d′g and every recursive call decreases ∆d by 1. As soon as ∆d = 0, no new recursive calls are invoked. Therefore, the height of the search tree is at most d′g. Hence, the search tree has a size of O((d′g + 1)^d′g) = O((d′g)^d′g). Regarding the running time needed for one call of procedure Solve MDSSS, note that, after the preprocessing, the instance consists of at most d′g · kg columns. Then, a central task in the procedure is to compute the Hamming distance of two strings. To this end, we initially build, in O(d′g · kg²) = O(L · kg) time, a table containing the distances of sc to all strings in Sg. Using this table, to determine whether or not sc is a match for Sg, or to find an si having at least d′g positions coinciding with sc, can both be done in O(kg) time. Identifying the positions in which sc coincides with an si ∈ Sg can be done in O(d′g · kg) time. After we change one position in sc, we only have to inspect one column of the kg × (d′g · kg) matrix induced by Sg and, therefore, can update the table in O(kg) time. Summarizing, one call of procedure Solve MDSSS can be done in O(d′g · kg) time. Together with the d′g = 0, 1, 2, . . . loop in order to find the optimal d′g, Phase 1 can be done in O((d′g)² · kg · (d′g)^d′g) time.

Phase 2. For every solution string found in Phase 1, the running time of the second phase is O(N · √(L log L)), where N denotes the sum of the lengths of all strings in Sb [1]. We obtain the following theorem:

Theorem 2 MDSSS can be solved in O(L · kg + ((d′g)² · kg + N · √(L log L)) · (d′g)^d′g) time, where N = Σ over s′i ∈ Sb of |s′i| is the total size of the bad strings. ⊓ ⊔

4.4 Extensions of MDSSS

The special requirements imposed on the input of MDSSS seem inevitable in order to obtain the above fixed-parameter tractability result. We discuss the problems arising when relaxing the constraints on the alphabet size and the value of d′g.

Non-binary alphabet. Already extending the alphabet size in the formulation of MDSSS from two to three makes our approach, described in Sect. 4.1, combinatorially so much more difficult that it no longer yields fixed-parameter tractability. A reason lies in the preprocessing. When having an all-equal column in the character matrix induced by Sg, for a three-letter alphabet there are two possible choices instead of one for the corresponding position in the solution string. Therefore, enumerating all solutions s with dH(s, si) ≥ L − d′g for all si ∈ Sg, which is essential for our approach, is not fixed-parameter tractable any more; the number of solutions is too large. Let L′ ≤ L be the number of non-dirty columns and let the alphabet size be three. Then, aside from the dirty columns, we already have 2^L′ assignments of characters to the positions corresponding to non-dirty columns.

Non-optimal d′g parameter. Also for a non-optimal d′g parameter, the number of solutions s with dH(s, si) ≥ L − d′g for all si ∈ Sg can become too large, and it appears to be fixed-parameter intractable with respect to d′g to enumerate them all. Consider the example where Sg = {0^L}. Then, there are more than (L choose d′g) strings s with dH(s, 0^L) ≥ L − d′g.
(If the value of d′g is only a fixed number larger than the optimal one, it could, nevertheless, be possible to enumerate all solution strings of Phase 1.)

5 Conclusion

We have shown that Distinguishing Substring Selection, which has a PTAS, cannot have an EPTAS unless FPT = W[1]. It remains open whether this also holds for the tightly related and similarly important computational biology problems Closest Substring and Consensus Patterns, each of which has a PTAS [15,16] and for each of which it is unknown whether an EPTAS exists. It has been shown that, even for constant-size alphabet, Closest Substring and Consensus Patterns are W[1]-hard with respect to the number of input strings [11]; the parameterized complexity with respect to the distance parameter, however, is open for these problems, whereas it has been settled for DSSS in this paper. It would be interesting to further explore the border between fixed-parameter tractability and intractability as initiated in Sect. 4.

References
1. K. Abrahamson. Generalized string matching. SIAM Journal on Computing, 16(6):1039–1051, 1987.
2. J. Alber, J. Gramm, and R. Niedermeier. Faster exact solutions for hard problems: a parameterized point of view. Discrete Mathematics, 229(1-3):3–27, 2001.
3. A. Amir, M. Lewenstein, and E. Porat. Faster algorithms for string matching with k mismatches. In Proc. of 11th ACM-SIAM SODA, pages 794–803, 2000.
4. G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi. Complexity and Approximation – Combinatorial Optimization Problems and their Approximability Properties. Springer, 1999.
5. M. Cesati and L. Trevisan. On the efficiency of polynomial time approximation schemes. Information Processing Letters, 64(4):165–171, 1997.
6. X. Deng, G. Li, Z. Li, B. Ma, and L. Wang. A PTAS for Distinguishing (Sub)string Selection. In Proc. of 29th ICALP, number 2380 in LNCS, pages 740–751, 2002. Springer.
7. R. G. Downey. Parameterized complexity for the skeptic (invited paper). In Proc. of 18th IEEE Conference on Computational Complexity, July 2003.
8. R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer, 1999.
9. P. A. Evans, A. Smith, and H. T. Wareham. The parameterized complexity of p-center approximate substring problems. Technical report TR01-149, Faculty of Computer Science, University of New Brunswick, Canada, 2001.
10. M. R. Fellows. Parameterized complexity: the main ideas and connections to practical computing. In Experimental Algorithmics, number 2547 in LNCS, pages 51–77, 2002. Springer.
11. M. R. Fellows, J. Gramm, and R. Niedermeier. On the parameterized intractability of Closest Substring and related problems. In Proc. of 19th STACS, number 2285 in LNCS, pages 262–273, 2002. Springer.
12. M. Frances and A. Litman. On covering problems of codes. Theory of Computing Systems, 30:113–119, 1997.
13. J. Gramm, R. Niedermeier, and P. Rossmanith. Exact solutions for Closest String and related problems. In Proc. of 12th ISAAC, number 2223 in LNCS, pages 441–453, 2001. Springer. Full version to appear in Algorithmica.
14. J. K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing string selection problems. In Proc. of 10th ACM-SIAM SODA, pages 633–642, 1999.
15. M. Li, B. Ma, and L. Wang. On the Closest String and Substring Problems. Journal of the ACM, 49(2):157–171, 2002.
16. M. Li, B. Ma, and L. Wang. Finding similar regions in many sequences. Journal of Computer and System Sciences, 65(1):73–96, 2002.
Complexity of Approximating Closest Substring Problems

Patricia A. Evans1 and Andrew D. Smith1,2
1 University of New Brunswick, P.O. Box 4400, Fredericton N.B., E3B 5A3, Canada
pevans@unb.ca
2 Ontario Cancer Institute, University Health Network, Suite 703, 620 University Avenue, Toronto, Ontario, M5G 2M9 Canada
fax: +1-506-453-3566
asmith@uhnres.utoronto.ca

Abstract. The closest substring problem, where a short string is sought that minimizes the number of mismatches between it and each of a given set of strings, is a minimization problem with a polynomial time approximation scheme [6]. In this paper, both this problem and its maximization complement, where instead the number of matches is maximized, are examined and bounds on their hardness of approximation are proved. Related problems differing only in their objective functions, seeking either to maximize the number of strings covered by the substring or maximize the length of the substring, are also examined and bounds on their approximability proved. For this last problem of length maximization, the approximation bound of 2 is proved to be tight by presenting a 2-approximation algorithm.

Keywords: Approximation algorithms; Hardness of approximation; Closest Substring

1 Introduction

Given a set F of strings, the closest substring problem seeks to find a string C of a desired length l that minimizes the maximum distance from C to a substring in each member of F. We call such a short string C a center for F. The corresponding substrings from each string in F are the occurrences of C. If all strings in F are of the same length n, and the center is also to be of length n, then this special case of the problem is known as closest string. We examine the complexity of approximating three problems related to closest substring with different objective functions. A center is considered to be optimal in the context of the problem under discussion, in that it either maximizes or minimizes the problem’s objective function. This examination of the problems’ approximability with respect to their differing objective functions reveals interesting differences between the optimization goals. In [6], a polynomial time approximation scheme (PTAS) is given for closest substring that has a performance ratio of 1 + 1/(2r − 1) + ǫ, for any 1 ≤ r ≤ m where m = |F|, and ǫ > 0. While closest substring minimizes the number of mismatches, max closest substring maximizes the number of matches. We show that the max closest substring problem cannot be approximated in polynomial time with ratio better than (log m)/4, unless P=NP. As the maximization complement of the closest substring problem, its reduction can also be applied to closest substring. This application produces a similarly complementary result indicating the necessity of the 1/O(m) term in the PTAS [6]. While the hard ratio for closest substring disappears asymptotically when m approaches infinity (as is to be expected given the PTAS [6]), it indicates a connection between the objective function and the number of strings given as input. This result supports the position that the 1/O(m) term in the PTAS performance ratio cannot be significantly improved by a polynomial time algorithm. In [8], Sagot presents an exponential exact algorithm for the decision problem version of closest substring, also known as common approximate substring.
Sagot also extends the problem to quorums, finding strings that are approximately present in at least a specified number of the input strings. This quorum size can be maximized as an alternate objective function, producing the maximum coverage approximate substring problem. A restricted version of this problem was examined in [7], and erroneously claimed to be as hard to approximate as clique. We give a reduction from the maximum coverage version of set cover, showing that the problem is hard to approximate within e/(e − 1) − ǫ (where e is the base of the natural logarithm) for any ǫ > 0. The longest common approximate substring problem seeks to maximize the length of a center string that is within some specified distance d from every occurrence. We give a 2-approximation algorithm for this problem and show that 2 is optimal unless P=NP.

2 Preliminary Definitions

Definition 1. Let x be an instance of optimization problem Π with optimal solution opt(x). Let A be an algorithm solving Π, and A(x) the solution value produced by A for x. The performance ratio of A with respect to x is
max{ A(x)/opt(x), opt(x)/A(x) }.
A is a ρ-approximation algorithm if and only if A always returns a solution with performance ratio less than or equal to ρ.

Definition 2. Let Π and Π′ be two minimization problems. A gap-preserving reduction (GP-reduction, ≤GP) from Π to Π′ with parameters (c, ρ), (c′, ρ′) is a polynomial-time algorithm f. For each instance I of Π, f produces an instance I′ = f(I) of Π′. The optima of I and I′, say opt(I) and opt(I′) respectively, satisfy the following properties:
opt(I) ≤ c ⇒ opt(I′) ≤ c′,
opt(I) > cρ ⇒ opt(I′) > c′ρ′,
where (c, ρ) and (c′, ρ′) are functions of |I| and |I′| respectively, and ρ, ρ′ > 1.

Observe that the above definition of gap-preserving reduction specifically refers to minimization problems, but can easily be adapted for maximization problems. Although it is implied by the name, GP-reductions do not require the size of the gap to be preserved, only that some gap remains [1]. We now formally specify the problems treated in this paper. All of these can be seen as variations on the closest substring problem. Note that dH(x, y) represents the number of mismatches, or Hamming distance, between two strings x and y of equal length |x| = |y|.

max closest substring
Instance: A set F = {S1, . . . , Sm} of strings over alphabet Σ such that max over 1 ≤ i ≤ m of |Si| = n, integer l (1 ≤ l ≤ n).
Question: Maximize min over i of (l − dH(C, si)), such that C ∈ Σ^l and si is a length-l substring of Si (1 ≤ i ≤ m).

maximum coverage approximate substring
Instance: A set F = {S1, . . . , Sm} of strings over alphabet Σ such that max over 1 ≤ i ≤ m of |Si| = n, integers d and l (1 ≤ d < l ≤ n).
Question: Maximize |F′|, F′ ⊆ F, such that for some C ∈ Σ^l and for all Si ∈ F′, there exists a length-l substring si of Si such that dH(C, si) ≤ d.

longest common approximate substring
Instance: A set F = {S1, . . . , Sm} of strings over alphabet Σ such that max over 1 ≤ i ≤ m of |Si| = n, integer d (1 ≤ d < n).
Question: Maximize l = |C|, C ∈ Σ∗, such that dH(C, si) ≤ d and si is a length-l substring of Si (1 ≤ i ≤ m).

Throughout this paper, when discussing different problems, the values of d, l and m may refer to either the optimal values of objective functions or the values specified as part of the input. These symbols are used in accordance with their use in the formal statement of whatever problem is being discussed.
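To fix the notation, the objective functions just defined can be evaluated directly for a candidate center C. The following sketch is ours (in Python; it assumes |C| is at most the length of every string in F):

    def dH(x, y):
        # Hamming distance of two equal-length strings.
        return sum(a != b for a, b in zip(x, y))

    def best_occurrence(C, S):
        # Minimum distance between C and a length-|C| substring of S.
        l = len(C)
        return min(dH(C, S[p:p + l]) for p in range(len(S) - l + 1))

    def max_closest_substring_value(C, F):
        # min_i (l - dH(C, s_i)): matches guaranteed in every string.
        return min(len(C) - best_occurrence(C, S) for S in F)

    def coverage(C, F, d):
        # |F'|: how many strings of F contain an occurrence within distance d.
        return sum(best_occurrence(C, S) <= d for S in F)

For longest common approximate substring, the objective is |C| subject to best_occurrence(C, S) ≤ d for every S ∈ F.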
3 Max Closest Substring

3.1 Hardness of Approximating Max Closest Substring

In this section we use a gap-preserving reduction from set cover to show inapproximability for max closest substring. Lund and Yannakakis [2], with a reduction from label cover to set cover, showed that set cover could not be approximated in polynomial time with performance ratio better than (log |B|)/4 (where B is the base set) unless NP = DTIME(2^poly(log n)). A result of Raz and Safra [3] indirectly strengthened the conjecture; set cover is now known to be NP-hard to approximate with ratio better than (log |B|)/4.

set cover
Instance: A set B of elements to be covered and a collection of sets L such that Li ⊆ B (1 ≤ i ≤ |L|).
Question: Minimize |R|, R ⊆ L, such that the union of R1, . . . , R|R| is B.

Let I = ⟨B, L⟩ be an instance of set cover. The reduction constructs, in polynomial time, a corresponding instance I′ = ⟨F, l⟩ of max closest substring. For all ρ > 1, there exists a ρ′ > 1 such that a solution for I with a ratio of ρ can be obtained in polynomial time from a solution to I′ with ratio ρ′.

The Alphabet. The strings of F are composed of characters from the alphabet Σ = Σ1 ∪ Σ2. The characters of Σ1 are referred to as set characters, and identify sets in L. The characters of Σ2 are referred to as element characters and are in one-to-one correspondence with elements of the base set B. Σ1 = {pi : 1 ≤ i ≤ |L|}, Σ2 = {ui : 1 ≤ i ≤ |B|}.

Substring Gadgets. The strings of F are made up of two types of substring gadgets. We use the function f, defined below, to ensure that the substring gadgets are sufficiently large. The gadgets are defined as follows:
Subset Selectors: set(i) = pi^f(|B|)
Separators: separator(j) = uj^f(|B|)

The Reduction. The string set F contains |B| strings, corresponding to the elements of B. For each j ∈ B, let Lj ⊆ L be the subfamily of sets containing the element j. With product notation referring to concatenation, define the string
Sj = the concatenation, over all q ∈ Lj, of set(q) separator(j).
The function f : N → N must be defined. It is necessary for f to have the property that for all positive integers x < |B|,
⌊f(|B|)/x⌋ > ⌊f(|B|)/(x + 1)⌋.
It is straightforward to check that f(y) = y² has this property. The maximum length of any member of F is n = 2|L||B|², the size of F is m = |B|, the length of the center is l = f(|B|) = |B|², and the alphabet size is |Σ| = |L| + |B|. We call any partition of F whose equivalence relation is the property of having an exact common substring a substring induced partition. For any two occurrences s, s′ of a center, we call s and s′ disjoint if for all 1 ≤ q ≤ |s|, s[q] ≠ s′[q]. Observe that the maximum distance to an optimal center, for any set of disjoint occurrences, increases with the size of the set.
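Written out, the construction is short. The following sketch is ours (in Python; each alphabet character pi or uj is modelled as a tagged tuple, so that one list entry is one character and Hamming distance is positionwise):

    def build_instance(B, L):
        # B: list of elements; L: list of sets (each a collection of elements).
        f = len(B) ** 2                          # f(y) = y^2
        F = []
        for j in B:
            S_j = []
            for i, L_i in enumerate(L):          # 0-based set indices
                if j in L_i:
                    S_j += [("p", i)] * f        # set(i) = p_i^f(|B|)
                    S_j += [("u", j)] * f        # separator(j) = u_j^f(|B|)
            F.append(S_j)
        return F, f                              # center length l = f(|B|)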
Lemma 1. Let F be a set of occurrences of an optimal center C such that |F| = k. If for each pair s, s′ ∈ F, dH(s, s′) = l, then for every s ∈ F, l − dH(C, s) ≥ ⌊l/k⌋. Also, there is at least one s ∈ F such that l − dH(C, s) = ⌊l/k⌋.

Proof. There are l total positions and for any position p, there is at most one s ∈ F such that s[p] = C[p]. If some s ∈ F had l − dH(C, s) < ⌊l/k⌋, then the center C would not be optimal, as a better center can be constructed by taking position symbols evenly from the k occurrences. If all s ∈ F had l − dH(C, s) > ⌊l/k⌋, then the total number of matches would exceed l, some pair of matches would have the same position, and thus some pair s, s′ ∈ F would have dH(s, s′) < l. ⊓ ⊔

The significance of our definition of f is apparent from the above proof. It is essential that, under the premise of Lemma 1, values of k (the number of distinct occurrences of a center) can be distinguished based on the maximum distance from any occurrence to the optimal center.

Lemma 2. set cover ≤GP max closest substring.

Proof. Suppose the optimal cover R for ⟨B, L⟩ has size less than or equal to c. Construct a string C of length |B|² as follows. To the positions in C, assign in equal amounts the set characters representing members of R. Then C is a center for F with minimum similarity ⌊|B|²/c⌋. Suppose |R| > c. Let F′ be the largest subset of F having a substring induced c-partition. By the reduction, since |R| > c, F′ ≠ F. Let S be any string in F \ F′. By Lemma 1, any optimal center for F′ must have minimum similarity ⌊|B|²/c⌋, and therefore has at least ⌊|B|²/c⌋ characters from a substring of every string in F′. But the occurrence in S is disjoint from the occurrences in F′, forcing the optimal center to match an equal number of positions in more than c disjoint occurrences. Hence, also by Lemma 1, the optimal center matches no more than ⌊|B|²/(c + 1)⌋ < ⌊|B|²/c⌋ characters in some occurrence. The gap-preserving property of the reduction follows since ⌊|B|²/c⌋ is a decreasing function of c. ⊓ ⊔

Theorem 1. max closest substring is not approximable within (log m)/4 in polynomial time unless P=NP.

Proof. The theorem follows from the fact that the NP-hard ratio for max closest substring remains identical to that of the source problem set cover. ⊓ ⊔

As max closest substring is the complementary maximization version of closest substring, and there is a bijection between feasible solutions to the complementary problems that preserves the order of solution quality, this reduction also applies to closest substring. The form of the hard performance ratio for closest substring provides evidence that the two separate sources of error, 1/O(m) and ǫ, are necessary in the PTAS of [6].

Theorem 2. closest substring cannot be approximated with performance ratio 1 + 1/ω(m) in polynomial time unless P=NP.

Proof. Since the NP-hard ratio for set cover is ρ = (1/4) log |B|, the NP-hard ratio obtained for closest substring in the above reduction is
ρ′ = (cρ − 1)/(cρ − ρ) = 1 + ((ρ − 1)/ρ) · (1/(c − 1)) ≥ 1 + 1/O(m). ⊓ ⊔

3.2 An Approximation Algorithm for Max Closest Substring

The preceding subsection showed that max closest substring cannot be approximated within (log m)/4. Here, we show that this bound is within a factor of 4 · |Σ| of being tight, by presenting an approximation algorithm that achieves a bound of |Σ| log m for max closest substring. Due to the complementary relationship between max closest substring and closest substring, we start by presenting a greedy algorithm for closest string. The greedy nature of the algorithm is due to the fact that it commits to a local improvement at each iteration. The algorithm also uses a lazy strategy that bases each decision on information obtained by examining a restricted portion of the input. This is the most naive form of local search; the algorithm is not expected to perform well. The idea of the algorithm is to read the input strings column by column, and for each column i, assign a character to C[i] before looking at any column j such that j > i. Algorithm 1 describes this procedure, named GreedyAndLazy, in pseudocode.
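The Algorithm 1 listing itself is not reproduced here; the following sketch is ours (in Python) and gives one reading of the description, using the column-majority rule stated in the proof of Lemma 3 below. Tie-breaking, and what is selected once every string already matches C somewhere, are our assumptions:

    from collections import Counter

    def greedy_and_lazy(F):
        # F: list of m strings, all of length n (a closest string instance).
        n = len(F[0])
        matched = [False] * len(F)   # does F[idx] already match C somewhere?
        C = []
        for i in range(n):
            # Column majority among the strings that match C in no position
            # yet (the set J_i); once all strings match somewhere, we fall
            # back to the majority over all strings (our assumption).
            active = [s for s, ok in zip(F, matched) if not ok] or F
            c = Counter(s[i] for s in active).most_common(1)[0][0]
            C.append(c)
            for idx, s in enumerate(F):
                if s[i] == c:
                    matched[idx] = True
        return "".join(C)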
Lemma 3. The greedy and lazy algorithm for closest string produces a center string with radius within a factor of m(1 − 1/(|Σ| ln m)) of the optimal radius.

Proof. Consider the number of iterations required to guarantee that each S ∈ F matches C in at least one position. Let Ji be the set of strings that do not match any position of C after the ith iteration; then
|Ji+1| ≤ ((|Σ| − 1)/|Σ|) · |Ji| ≤ exp(−1/|Σ|) · |Ji|.
This is because the algorithm always selects the column majority character of those strings in Ji. Let x be the number of iterations required before all members of F match C in at least one position. A bound on the value of x is given by the following inequality:
1/m > exp(−x/|Σ|).
Hence, for any strictly positive ǫ, after x = |Σ| ln m + ǫ iterations, each member of F matches C in at least one position. After the final iteration, the total distance from C to any member of F is at most n − n/(|Σ| ln m). The optimal distance is at least n/m, as otherwise some positions are identical in F (and thus should not be considered). Therefore the performance ratio of GreedyAndLazy is
(n − n/(|Σ| ln m)) / (n/m) ≤ m(1 − 1/(|Σ| ln m)). ⊓ ⊔

The running time of GreedyAndLazy, for m sequences of length n, is O(|Σ|mn²). Now consider applying GreedyAndLazy to the max closest substring problem by selecting an arbitrary set of substrings of length l to reduce the problem to a max closest string problem. The number of matches between any string in F and the constructed center will be at least Ω(l/(|Σ| log m)).

Corollary 1. GreedyAndLazy is an O(|Σ| log m)-approximation algorithm for max closest substring.

Since max closest substring is hard to approximate with ratio better than (log m)/4, this approximation algorithm is within 4 · |Σ| of optimal.

4 Maximum Coverage Approximate Substring

The incorrect reduction given in [7] claimed an NP-hard ratio of O(n^ǫ), ǫ = 1/4, for maximum coverage approximate substring when l = n and |Σ| = 2. Its error resulted from applying Theorem 5 of [5], proven only for alphabet size at least three, to binary strings. Hardness of approximation for the general problem is shown here by a reduction from maximum coverage.

maximum coverage
Instance: A set B of elements to be covered and a collection of sets L such that Li ⊆ B (1 ≤ i ≤ |L|), a positive integer k.
Question: Maximize |B′|, B′ ⊆ B, such that B′ is the union of k sets Li1, . . . , Lik ∈ L.

Given an instance ⟨B, L, k⟩ of maximum coverage, we construct an instance ⟨F, l, d⟩ of maximum coverage approximate substring where m = |B|, l = k, d = k − 1 and n ≤ k|L|. The construction of F is similar to the construction used when reducing from set cover to closest substring in Section 3; unnecessary parts are removed.

The Alphabet. The strings of F are composed of characters from the alphabet Σ. The characters of Σ correspond to the sets Li ∈ L that can be part of a cover, so Σ = {xi : 1 ≤ i ≤ |L|}.

The Reduction. The string set F = {S1, . . . , S|B|} will contain strings corresponding to the elements of B. To construct these strings, for each j ∈ B, let Lj ⊆ L be the subfamily of sets containing the element j. For each j ∈ B, define
Sj = the concatenation, over all Li ∈ Lj, of xi^k.
Set d = k − 1 and l = k.
We seek to maximize the number of strings in F containing occurrences of some center C.

Lemma 4. maximum coverage ≤GP maximum coverage approximate substring.

Proof. Suppose ⟨B, L, k⟩ is an instance of maximum coverage with a solution set R ⊂ L such that |R| = k and R covers b ≤ |B| elements. Then there is a center C for F of length l = k that has distance at most d = k − 1 from a substring of b strings in F. Let the k positions in C be assigned characters representing the k sets in the cover, i.e., for each Li ∈ R, there is a position p such that C[p] = xi. All b members of F corresponding to those covered elements in B contain a substring matching at least one character in C, and mismatching at most k − 1 characters. Suppose one cannot obtain a k-cover with ratio better than ρ. Then one cannot obtain a center for F that occurs in more than b/ρ strings of F, so the hard ratio is ρ′ = b/(b/ρ) = ρ. ⊓ ⊔

Theorem 3. maximum coverage approximate substring cannot be approximated with performance ratio e/(e − 1) − ǫ, for any ǫ > 0, unless P=NP.

Proof. It was shown in [4] that the NP-hard ratio for maximum coverage is e/(e − 1) − ǫ. This result combined with Lemma 4 proves the theorem. ⊓ ⊔

Note that this reduction shows hardness for the general version of the problem, and leaves open the restricted case of l = n with |Σ| = 2. No approximation algorithms with nontrivial ratios are known.

5 Longest Common Approximate Substring

The longest common approximate substring problem seeks to maximize the length of a center that is within a given distance from each string in the problem instance. That a feasible solution always exists can be seen by considering the case of a single character, since the problem is defined with d > 0. This problem is useful in finding seeds of high similarity for sequence comparisons. Here we show that a simple algorithm always produces a valid center that is at least half the optimal length. A valid center is any string that has distance at most d from at least one substring of each string in F. The algorithm simply evaluates each substring of members of F and tests them as centers. The procedure Extend accomplishes this with a time complexity of Θ(m²n³).
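The listing for Extend is not reproduced here; the following sketch, ours (in Python), is consistent with the description above: it tests every substring of every input string as a candidate center and keeps the longest valid one. This naive form is slower than the stated Θ(m²n³); that bound requires updating window distances incrementally.

    def extend(F, d):
        def valid(c):
            # c is a valid center iff it is within distance d of some
            # length-|c| window of every string in F.
            l = len(c)
            return all(any(sum(x != y for x, y in zip(c, S[p:p + l])) <= d
                           for p in range(len(S) - l + 1))
                       for S in F)
        best = ""
        for S in F:
            for i in range(len(S)):
                for j in range(i + 1, len(S) + 1):
                    if j - i > len(best) and valid(S[i:j]):
                        best = S[i:j]
        return best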
The performance ratio of 2 is optimal unless P=NP. We use a transformation from the vertex cover decision problem that introduces a gap in the objective function.

vertex cover
Instance: A graph G = (V, E) and a positive integer k.
Question: Does G have a vertex cover of size at most k, i.e., a set of vertices V′ ⊆ V, |V′| ≤ k, such that for each edge (u, v) ∈ E at least one of u and v belongs to V′?

Suppose that for some graph G we seek to determine whether G contains a vertex cover of size k. We construct an instance of longest common approximate substring with |E| strings corresponding to the edges of G. The intuition behind the reduction is that an occurrence of the center in each string corresponds to the occurrence of a cover vertex in the corresponding edge. Before giving the values of n and d, we describe the gadgets used in the reduction.

The Alphabet. The string alphabet is Σ = Σ_1 ∪ Σ_2 ∪ {A}. We refer to these as vertex characters (Σ_1), unique characters (Σ_2), and the alignment character (A), where Σ_1 = {v_i : 1 ≤ i ≤ |V|} and Σ_2 = {u_ij : (i, j) ∈ E}.

Substring Gadgets. We next describe the two "high level" component substrings used in the construction. The function f is an arbitrarily large polynomial function of |G|.

Vertex selectors: vertex(x, i, j, z) = A^{f(k)} u_{ij}^{z−1} v_x u_{ij}^{k−z} A^{f(k)} .
Separators: separator(i, j) = u_{ij}^{3f(k)} .

The Reduction. We construct F as follows. For each edge (i, j) ∈ E,

S_{ij} = ∏_{1 ≤ z ≤ k} vertex(i, i, j, z) · separator(i, j) · vertex(j, i, j, z) · separator(i, j) ,

where the product denotes concatenation. The length of each string is then n = k(10f(k) + 2k). The threshold distance is d = k − 1.

Theorem 5. longest common approximate substring cannot be approximated in polynomial time with performance ratio better than 2 − ε, for any ε > 0, unless P=NP.

Proof. For any set of strings F so constructed, there is an exact common substring of length f(k) corresponding to the f(k) repeats of the alignment character A. Suppose there is a size-k cover for the source instance of vertex cover. Construct a center C for F as follows. Assign the alignment character A to the first f(k) positions of C. To positions f(k) + 1 through f(k) + k, assign the characters corresponding to the vertices in the vertex cover, in any order. Finally, assign the alignment character A to the remaining f(k) positions of C. Each string in F contains a substring that matches at least 2f(k) + 1 of the 2f(k) + k positions of C, i.e., mismatches at most k − 1 = d positions, so C is a valid center. If there is no k-cover for the source instance of vertex cover, then for any string of length f(k) + k there will be some S ∈ F that mismatches k positions. As f can be an arbitrarily large polynomial function of k, the NP-hard performance ratio is

(2f(k) + k) / (f(k) + k) ≥ 2 − ε ,

for any constant ε > 0. To show hardness for 2 − ε where ε is not a constant (it can be a function of l), note that the hard ratio can be rewritten as

2 − k/(f(k) + k) .

Since l is the optimal length and l = 2f(k) + k, substituting f(k) = l/2 − k/2 into the performance ratio gives

2 − k/(l/2 − k/2 + k) = 2 − 2k/(l + k) .

Suppose we select l = k^c during the reduction, where c is an arbitrarily large constant. Then we have shown a hard performance ratio of

2 − 2l^{1/c}/(l + l^{1/c}) ≥ 2 − 2l^{1/c}/l = 2(1 − 1/l^{(c−1)/c}) . ⊓⊔
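As a sanity check on the gadget arithmetic, here is a hedged sketch of the construction (characters are modelled as short strings such as 'A', 'v3', 'u2_5', and fk stands for the value f(k); the names are ours):

```python
def vc_to_lcas_instance(E, k, fk):
    """Build the strings S_ij and the threshold d = k - 1 from the
    edge set E, the cover size k, and the value fk = f(k)."""
    A = ['A'] * fk
    def vertex(x, i, j, z):            # A^f u_ij^(z-1) v_x u_ij^(k-z) A^f
        u = [f'u{i}_{j}']
        return A + u * (z - 1) + [f'v{x}'] + u * (k - z) + A
    def separator(i, j):               # u_ij^(3f)
        return [f'u{i}_{j}'] * (3 * fk)
    F = {}
    for (i, j) in E:
        s = []
        for z in range(1, k + 1):
            s += (vertex(i, i, j, z) + separator(i, j)
                  + vertex(j, i, j, z) + separator(i, j))
        F[(i, j)] = s                  # len(s) == k * (10*fk + 2*k)
    return F, k - 1
```

Each value of z contributes 2(2f(k) + k) + 2·3f(k) = 10f(k) + 2k characters, which confirms the string length n = k(10f(k) + 2k) computed above.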
6 Conclusion

These results show that, unless P=NP, the max closest substring, maximum coverage approximate substring, and longest common approximate substring problems all have limitations on their approximability. The relationships between the different objective functions produce an interesting interplay between the approximability of minimizing d with l fixed, maximizing l with d fixed, and maximizing their difference l − d. While this last variant, the max closest substring problem, has a hard performance ratio directly related to the number of strings m, the two variants that fix one parameter and maximize the difference by optimizing the other have lower ratios of approximability. It is NP-hard to approximate max closest substring with a performance ratio better than (log m)/4, and we have provided an O(|Σ| log m)-approximation. For longest common approximate substring, with d fixed, the length can be approximately maximized with a ratio of 2, and it is NP-hard to approximate with any smaller ratio. The best ratio of approximation is for closest substring, where l is fixed and d is minimized; the PTAS of [6] achieves a ratio of (1 + 1/(2r − 1) + ε) for any 1 ≤ r ≤ m, and we have now shown that unless P=NP it cannot be approximated closer than 1 + O(1/m). For the quorum variant of closest substring, where the number of strings covered is instead the objective function to be maximized, it is NP-hard to obtain a performance ratio better than e/(e − 1). The restricted variant with l = n and |Σ| = 2, once thought to be proven hard by [7], remains open, with neither a hardness result nor a nontrivial approximation algorithm. Our reductions use alphabets whose size grows with the instance. The complexity of variants of these new problems where the alphabet size is treated as a constant is open, except as they relate to known results for constant alphabets [6,7].

References

1. Sanjeev Arora. Probabilistic checking of proofs and the hardness of approximation problems. PhD thesis, UC Berkeley, 1994.
2. Carsten Lund and Mihalis Yannakakis. On the hardness of approximating minimization problems. Journal of the ACM, 41(5), 1994.
3. Ran Raz and Shmuel Safra. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proceedings of the Annual ACM Symposium on Theory of Computing, 475–484, 1997.
4. Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652, 1998.
5. J. K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing string selection problems. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, 633–642. ACM Press, 1999.
6. Ming Li, Bin Ma, and Lusheng Wang. On the closest string and substring problems. Journal of the ACM, 49(2):157–171, 2002.
7. Bin Ma. A polynomial time approximation scheme for the closest substring problem. In Combinatorial Pattern Matching (CPM 2000), Lecture Notes in Computer Science 1848, 99–107. Springer, 2000.
8. Marie-France Sagot. Spelling approximate repeated or common motifs using a suffix tree. In LATIN'98, Lecture Notes in Computer Science 1380, 374–390. Springer, 1998.

On Lawson's Oriented Walk in Random Delaunay Triangulations⋆

Binhai Zhu
Department of Computer Science, Montana State University, Bozeman, MT 59717-3880, USA
bhz@cs.montana.edu

Abstract.
In this paper we study the performance of Lawson’s Oriented Walk, a 25-year old randomized point location algorithm without any preprocessing and extra storage, in 2-dimensional Delaunay triangulations. Given n pseudo-random points drawn from a convex set C with unit area and their Delaunay triangulation D, √ we prove that the algorithm locates a query point q in D in expected O( n log n) time. We also present an improved version of this algorithm, Lawson’s Oriented Walk with Sampling, which takes expected O(n1/3 ) time. Our technique is elementary and the proof is in fact to relate Lawson’s Oriented Walk with Walkthrough, another well-known point location algorithm without preprocessing. Finally, we present empirical results to compare these two algorithms with their siblings, Walkthrough and Jump&Walk. Keywords: Random Delaunay triangulation, point location, averagecase analysis. 1 Introduction Point location is one of the classical problems in computational geometry, GIS, graphics and solid modeling. In general, point location deals with the following problem: given a set of disjoint geometric objects, determine the object containing a query point. The theoretical problem is well studied in the computational geometry literature and several theoretically optimal algorithms have been proposed since early 1980s; see e.g., Snoeyink’s recent survey [Sn97]. In the last couple of years, optimal or close to optimal solutions (sometimes even in the average-case) are proposed with simpler data structures [ACMR00,AMM00, AMM01a,AMM01b,GOR97]. All these (theoretically) faster algorithms require preprocessing to obtain fast query bounds. However, it should be noted that in practice point location is mainly used as a subroutine for computing and updating large scale triangulations, like in mesh generation. Therefore, extra preprocessing and building additional data structure ⋆ The research is partially supported by NSF CARGO grant DMS-0138065 and a MONTS grant. A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 222–233, 2003. c Springer-Verlag Berlin Heidelberg 2003  On Lawson’s Oriented Walk in Random Delaunay Triangulations 223 is hard, if not impossible, to perform in practice. We need practical point location solutions which performs no or very little preprocessing in practice; moreover, as Delaunay triangulation is used predominantly in areas like mesh generation, finite-element analysis (FEA) and GIS we in fact need efficient practical point location algorithms in Delaunay triangulations. Practical point location in Delaunay triangulations only receives massive attention from computational geometers very recently [DMZ98,De98,MSZ99, DLM99]. All these works are somehow based on an idea due to Green and Sibson to use the “walkthrough” method to perform point location in Delaunay triangulation, a common data structure used in these areas. In particular the Jump&Walk method of [DMZ98,MSZ99] uses random sampling to select a good starting point to walk toward the destination while others mix the “walkthrough” idea with some extra simple tree-like data structure to make the algorithm more general [De98,DLM99] (e.g., deal with arbitrary-distributed data [De98] or handle extremely large input while bounding the query time [DLM99]). Some of these algorithms, e.g., Jump&Walk, has been used in important software packages [Sh96,TG+ 96,BDTY00]. 
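The "jump" of Jump&Walk is plain random sampling followed by a nearest-neighbor selection among the sample; a minimal sketch (the point representation and function name are ours, and the subsequent walk is omitted):

```python
import random

def jump_start(points, q, m):
    # Sample m data points without replacement and return the one
    # nearest to the query q; the walk then starts near this point.
    sample = random.sample(points, m)
    return min(sample,
               key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
```

Balancing the O(m) sampling cost against the expected length of the remaining walk is what yields the m = Θ(n^{1/3}) choice discussed below.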
Theoretically, for pseudo-uniformly distributed points in a convex set C, in 2D Jump&Walk is known to have a running time of O(n1/3 ) when the query point is slightly away from the boundary of C [DMZ98]. Similar result holds in 3D [MSZ99]. (We remark that similar “walk” ideas have also been used in ray shooting [AF97,HS95].) Lawson’s Oriented Walk, another randomized point location algorithm without preprocessing, was proposed in 1977 [La77]. It is known that, unlike the Walkthrough method, it could run into loops in arbitrary triangulations. But in Delaunay triangulations it always terminates [Ed90,DFNP91]. Almost no theoretical analysis was ever done on its performance and this question was raised again recently [DPT01]. In this paper, we focus on proving the expected performance of Lawson’s Oriented Walk algorithm in a random Delaunay triangulation (i.e., Delaunay triangulation of n random points). (We remark that given these random data, when enough preprocessing, i.e., Θ(n) expected time and space, is performed we can answer point location queries in expected O(1) time [AEI+ 85].) Delaunay Triangulations. For completeness, we briefly mention the following definitions. Further details can be found in some standard textbooks like [PS85]. The convex hull of a finite point set X is the smallest convex set containing X, denoted as CH(X). The convex hull of a set of k + 1 affinely independent points in Rd , for 0 ≤ k ≤ d, is called a k-simplex; i.e., a vertex, an edge, a triangle, or a tetrahedron, etc. If k = d, we also say the simplex is full dimensional. A triangulation T of X is a subdivision of the convex hull of X consisting of simplices with the following two properties: (1) for every simplex in T , all its faces are also simplices in T ; (2) the intersection of any two simplices in T is either empty or a face of both, in which case it is again a simplex in T . A Delaunay triangulation D of X is a triangulation in which the circumsphere of every full-dimensional simplex is empty, i.e., contains no points of X in its interior. 224 B. Zhu Point Location by Walking. The basic idea is straightforward; it goes back to early work on constructing Delaunay triangulations in 2D and 3D [GS78, Bo81]. Given a Delaunay triangulation D of a set X of n points in Rd , and a query point q; in order to locate the (full-dimensional) simplex in D containing q, start at some arbitrary simplex in D and then “walk” from the center of that simplex to neighboring simplex “in the general direction” of the target point q. Figure 1 shows an example for the straight Walkthrough method walking from an edge e to q. Other simple variations of this kind of “walk” are possible, e.g., the Orthogonal Walk [DPT01]. The underlying assumption for “walk” is that the D is given by an internal representation allowing constant-cost access between neighboring simplices (for example, in 2D, a linked list of triangles suffices as long as each triangle store its corresponding local information, i.e., the coordinates of its three vertices and pointers to its three edges and three neighboring triangles). The list of other suitable data structures includes the 2D quad-edge data structure [GS85], the edge-facet structure in 3D [DL89], its specialization and compactification to the domain of 3D triangulations [Mu93], or its generalization to d dimensions [Br93], etc. e1 e2 q e Fig. 1. An example for the walkthrough method and Lawson’s Oriented Walk. Lawson’s Oriented Walk. 
Given the Delaunay triangulation D of these n points {X1 , X2 , . . . , Xn }, and a query point q, Lawson’s Oriented Walk algorithm locates the simplex of D containing q, if such a simplex exists, as follows (Figure 1). (1) Select an edge e = Y1 Y2 at random from D. (2) Determine the triangle t adjacent to e such that t and q are on the same side of the line containing e. Let the other two edges of t be e1 , e2 . On Lawson’s Oriented Walk in Random Delaunay Triangulations 225 (3) Determine ei , i = 1, 2, such that the halfplane passing through ei and not containing t, hi , contains q. If both ei ’s have this property, randomly pick up one. If neither ei ’s have this property, return t as the triangle containing q. (4) Update e ← ei and repeat step (2)-(4). The advantage of Lawson’s Oriented Walk is that it handles geometric degeneracy better in practice compared with the Walkthrough method (in which some edges of D might be collinear with the walking segment). In the following, we focus on proving the expected performance of Lawson’s Oriented Walk algorithm under the assumption that the Delaunay triangulation D of n points X1 , ..., Xn are pseudo-uniformly distributed in a compact convex set C. 2 Theoretical Analysis We start by recalling some fundamental definitions. Let C be a compact convex set of R2 and let α and β be two reals such that 0 < α < β. We say that a probability measure P is an (α, β)-measure over C if P [C] = 1 and if we have α λ(S) ≤ P [S] ≤ β λ(S) for every measurable subset S of C, where λ is the usual Lebesgue measure. An R2 -valued random variable X is called an (α, β)-random variable over C if its probability law L(X) is an (α, β)-measure over C. A particular and important example of an (α, β)-measure P is when P is a probability measure with density f (x) such that α ≤ f (x) ≤ β for all x ∈ C. This probabilistic model was slightly more general than the uniform distribution and we will loosely call it pseudo-uniform or pseudo-random. Throughout this section, ci ’s are constants related to the local geometry (but not related to n). The idea of our analysis on Lawson’s Oriented Walk in random Delaunay triangulations is as follows. When e = Y1 Y2 is selected, we consider two situations. In case 1, the segment pq, where p is any point on e, is  O( logn n ) distance away from the boundary of C, ∂C. In case 2, the segment pq could be very close to ∂C (but this event has a very small probability). In both cases, we argue that the number of triangles visited by Lawson’s Oriented Walk is proportional to the number of triangles crossed by the segment pq. To estimate the number of triangles of D crossed by a line segment pq when pq is  O( logn n ) distance away from ∂C, we need the following lemma of [BD98] which is reorganized as follows. Lemma 1. Let C be a compact convex set with unit area in R2 and let X1 , . . . , Xn be n points drawn independently in C from an (α, β)-measure. Let D be the Delaunay triangulation of X1 , . . . , Xn . If L is a fixed line segment of length  |L| in C and is O( logn n ) distance away from the boundary of C and if L is independent of X, then the expected number of triangles or edges of the Delaunay triangulation D crossed by L is bounded by √ c3 + c4 |L| n . 226 B. Zhu We now prove the following lemma. Lemma 2. Let E[T1 (e, q)], where e = Y1 Y2 is a random edge picked by Lawson’s Oriented Walk  and the query point q is independent of X1 , . . . 
, X_n and both e and q are O(√(log n / n)) distance away from ∂C, be the expected number of triangles crossed by (or, visited by the walkthrough method along) a straight segment pq, where p ∈ e is any point of e. We have E[T_1(e, q)] ≤ c_5 + c_6 · E|pq| · √n.

Proof. Let D_e be the Delaunay triangulation of the data points {X_1, …, X_n} − {Y_1, Y_2}. Then L = pq, the line segment connecting p and q, is independent of the data points {X_1, …, X_n} − {Y_1, Y_2}. By Lemma 1, pq crosses an expected number of c_3 + c_4 · E|pq| · √(n − 2) edges of D_e. Let T_1(e, q) denote the number of triangles of D crossed by pq, p ∈ e. Clearly T_1(e, q) is bounded by the number of triangles of D_e crossed by pq plus the sum S of the degrees of Y_1 and Y_2 in the Delaunay triangulation D_e. To see this, note that L either crosses a triangle without Y_1 or Y_2 as a vertex (in which case the triangle is identical in D and D_e) or a triangle with Y_1 or Y_2 as a vertex, and the total number of triangles of the latter kind does not exceed S. The expected value of S is, by symmetry, twice the expected degree of Y_1, which is at most 6 by Euler's formula. Therefore we have

E T_1(e, q) ≤ 6 × 2 + c_3 + c_4 · E|pq| · √(n − 2) ≤ 12 + c_3 + c_4 · E|pq| · √n = c_5 + c_6 · E|pq| · √n , with c_5 = 12 + c_3 > 12 and c_6 = c_4.

This concludes the proof of Lemma 2. ⊓⊔

Lemma 2 has an interesting implication that will be useful in the proof of Theorem 1. We list it as a corollary.

Corollary 1. Let e, q, c_5, c_6 be as in Lemma 2. If c_5 + c_6 · E|p′q| · √n is greater than a given value for some p′ ∈ e, then so is c_5 + c_6 · E|pq| · √n for every p ∈ e.

Now we are ready to prove the following theorem regarding the expected performance of Lawson's Oriented Walk in a random Delaunay triangulation.

Theorem 1. Let C be a compact convex set with unit area in R², and let X_1, …, X_n be n points drawn independently in C from an (α, β)-measure. If the query point q is independent of X_1, …, X_n and is O(√(log n / n)) distance away from ∂C, then the expected number of triangles visited by Lawson's Oriented Walk is bounded by

c_1 + c_2 · √(n log n) .

Proof of Theorem 1. Let B be the event that e is O(√(log n / n)) distance away from the boundary of C, i.e., B = {e is O(√(log n / n)) distance away from ∂C}. Clearly, P[B] ≥ 1 − β · O(√(log n / n)) and P[B̄] ≤ β · O(√(log n / n)), following the property of the (α, β)-measure. Let E[T(e, q)], e = Y_1Y_2, be the expected number of triangles of D visited by Lawson's Oriented Walk. We first consider E[T(e, q)|B]. Let t be the triangle incident to e such that t and q are on the same side of the line through e, and let t = △Y_1Y_2Y_3. We have two cases: (a) Y_3 is inside △qY_1Y_2; and (b) Y_3 is outside △qY_1Y_2. We prove by induction that E[T(e, q)|B] ≤ c_7 + c_8 · E|pq| · √n for any point p ∈ e; moreover, c_7 = c_5 and c_8 = c_6 suffice.

Notice that in case (a) the algorithm picks e_1 or e_2 at random. Without loss of generality, assume that the algorithm picks e_1. We have E[T(e, q)|B] = 1 + E[T(e_1, q)|B]. In this case the distance from any point on e_1 to q is always smaller than the distance from some point on e to q. We extend qY_3 until it intersects e at a point Y_4, so that |qY_4| = |qY_3| + |Y_3Y_4| (Figure 2(a)). We prove by induction that in this case E[T(e, q)|B] ≤ c_7 + c_8 · E|pq| · √n for any p ∈ e. (The induction is on the number of edges visited by the algorithm, in reverse order.)
The basis is straightforward: if q is inside a triangle incident to e and p is any point on e, then E[T(e, q)|B] = 1 and, following Lemma 2, c_7 + c_8 · E|pq| · √n is less than or equal to c_7 + c_8 · O(√(log n / n)) · √n. (This is due to the fact that |pq| is less than the maximal edge length of the triangle containing q, and, following [BEY91,MSZ99], the expected maximal edge length in D is O(√(log n / n)) when the edge is O(√(log n / n)) distance away from the boundary of C.) Clearly, 1 ≤ c_7 + c_8 · O(√(log n / n)) · √n = c_7 + c_8 · O(√(log n)) (if we set c_7 = c_5 > 12). Let the inductive hypothesis be E[T(e_1, q)|B] ≤ c_7 + c_8 · E|qY′| · √n for any Y′ ∈ e_1. Consequently, E[T(e_1, q)|B] ≤ c_7 + c_8 · E|qY_3| · √n, as Y_3 ∈ e_1. We have

E[T(e, q)|B] = 1 + E[T(e_1, q)|B] ≤ 1 + c_7 + c_8 · E|qY_3| · √n = c_7 + c_8 · E(|qY_3| + |Y_3Y_4|) · √n + (1 − c_8 · E|Y_3Y_4| · √n) ,

which is bounded by c_7 + c_8 · E|qY_4| · √n, Y_4 ∈ Y_1Y_2, if 1 − c_8 · E|Y_3Y_4| · √n < 0, i.e., c_8 ≥ 1/(E|Y_3Y_4| · √n). Following [BEY91,MSZ99], E|Y_3Y_4| ≤ c_9 · √(log n / n). So in this case we just need to set c_8 = max{c_6, 1/(c_9 · √(log n))}, which is c_6 when n is sufficiently large. To finish our inductive proof for case (a) using Corollary 1, we can simply set c_7 = c_5. In other words, E[T(e, q)|B] ≤ c_7 + c_8 · E|pq| · √n for any point p ∈ e; moreover, c_7 = c_5 and c_8 = c_6.

Notice that in case (b) the algorithm can only pick one of e_1 and e_2. Without loss of generality, assume that the algorithm picks e_1. Let the intersection of qY_1 and e_1 be Y (Figure 2(b)). In this case we still have E[T(e, q)|B] = 1 + E[T(e_1, q)|B], and we can again prove by induction that E[T(e, q)|B] ≤ c_7 + c_8 · E|pq| · √n for any p ∈ e. We consider the line segment qY_1, for which |qY_1| = |qY| + |Y Y_1|.

Fig. 2. Illustration for the proof of Theorem 1.

From the inductive hypothesis we further have E[T(e_1, q)|B] ≤ c_7 + c_8 · E|qY| · √n. Therefore, in this case we also have

E[T(e, q)|B] = 1 + E[T(e_1, q)|B] ≤ 1 + c_7 + c_8 · E|qY| · √n = c_7 + c_8 · E|qY_1| · √n + (1 − c_8 · E|Y Y_1| · √n) .

To make E[T(e, q)|B] ≤ c_7 + c_8 · E|qY_1| · √n, we just need to set c_8 ≥ 1/(E|Y Y_1| · √n). Again, following [BEY91,MSZ99], E|Y Y_1| ≤ c_9 · √(log n / n), so in this case we also need to set c_8 = max{c_6, 1/(c_9 · √(log n))} = c_6. Similarly, we can set c_7 = c_5 and finish the inductive proof for case (b).

By definition, we have E[T(e, q)] = E[T(e, q)|B] · P[B] + E[T(e, q)|B̄] · P[B̄]. To conclude the proof, we note that E|pq| is Θ(1) in both cases. To see this, let p be any point on Y_1Y_2 and note that π|pq|² is the probability content of the circle centered at q with radius |pq|, and is therefore distributed as a uniform [0, c_10] random variable, which we call Z. Clearly E{Z} = c_10/2. Following the Cauchy–Schwarz inequality, E|pq| ≤ √(E|pq|²) = √(E(Z)/π) = √(c_10/(2π)). Also, note that E[T(e, q)|B̄] is bounded by the size of D, i.e., E[T(e, q)|B̄] = O(n). A final calculation shows that

E[T(e, q)] ≤ c_1 + c_2 · √(n log n) . ⊓⊔

3 Lawson's Oriented Walk with Sampling

We notice that it is easy to generalize Lawson's Oriented Walk by starting at a "closer" edge e found through random sampling, as done in [DMZ98]. The algorithm is presented as follows.
(1) Select m edges at random and without replacement from D, and let e = Y_1Y_2 be the one closest to q.
(2) Determine the triangle t adjacent to e such that t and q are on the same side of the line containing e.
Let the other two edges of t be e1 , e2 . (3) Determine ei , i = 1, 2, such that the halfplane passing through ei and not containing t, hi , contains q. If both ei ’s have this property, randomly pick up one. If neither ei ’s have this property, return t as the triangle containing q. (4) Update e ← ei and repeat step (2)-(4). In Step (1), the distance between a sample edge and q can be measured as the distance between the midpoint of the sample edge and q. The following theorem can be obtained in very much the way as in [DMZ98]. We hence omit the proof. Theorem 2. Let C be a compact convex set with unit area in R2 , and let X1 , . . . , Xn be n points drawn independently in C from  an (α, β)-measure. If the query point q is independent of X1 , . . . , Xn and is O( logn n ) distance away from ∂C, then the expected time of Lawson’ Oriented Walk with Sampling is bounded by  c11 m + c12 n/m . 1/3 1/3 If m =  Θ(n ), then the running time is optimized to O(n ), provided that q n ) distance away from ∂C. is O( log n1/3 4 Empirical Results In this section, we present some empirical results to compare the following algorithms: Green and Sibson’s Walkthrough method (Walk), Lawson’s Oriented Walk (Lawson), Jump and Walk (J&W) and Lawson’s Oriented Walk with Sampling (L&S). All the data points and query points are within a unit square Q bounded by (0,0) and (1,1). (Throughout this section, we define an axis-parallel square by giving the coordinates of its lower-left and upper-right corner points.) We mainly consider two classes of data: random (uniformly generated) points in Q and three clusters of random points in Q. The latter case does not satisfy the conditions of the theorems we have proved in this paper, but it covers practical situation when data points could be clustered. The 3-cluster contains three cluster squares defined by lower-left and upperright corner points: (0.40,0.10) and (0.63,0.33); (0.70,0.67) and (0.93,0.90); and, (0.10,0.67) and (0.33,0.90). Each cluster square has an area of 0.0529 (or 5.29% 230 B. Zhu Fig. 3. 200 random data points in Q and 200 random data points within the threecluster. of the area of Q). In Figure 3 we show two examples for random data and 3cluster data when there are 200 data points. In both situations, we include the four corner points of Q as data points. Our empirical results are summarized in Table 1 and Table 2. For each n, we record the average cost (i.e., # of triangles visited) over 10000 queries. The actual cost is also related to the actual implementation, especially the geometric primitives used. For Jump&Walk and Lawson’s Oriented Walk with Sampling, we use either s1 = ⌊n1/3 ⌋ or s2 = ⌈n1/3 ⌉ sample edges, depending on whether |n − s31 | or |n − s32 | is smaller. Table 1. Comparison of Walk, Jump&Walk, Lawson’s Oriented Walk and Lawson’s Oriented Walk with Sampling when the data points are random. n 10000 15000 20000 25000 30000 35000 40000 45000 50000 W alk 110 130 155 182 197 211 227 235 257 Lawson 127 140 173 193 209 244 243 258 265 J&W 24 28 31 33 35 38 39 40 42 L&S 25 29 33 35 37 41 42 43 45 From Table 1, we can see that when the data points are randomly generated Lawson’s Oriented Walk usually visits an extra (small) constant number of triangles compared with Green and Sibson’s walkthrough method. This conforms with the proof of Theorem 1 (in which we set c8 = c6 , i.e., the number of triangles visited by the two algorithms is bounded by the same function). 
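For concreteness, the triangle-to-triangle loop that both Lawson variants iterate (steps (2)–(4)) can be sketched as follows; this is a hedged restatement, with counter-clockwise vertex order, an explicit adjacency table, and a triangle (rather than an edge) as the walk state all being assumptions of the sketch:

```python
import random

def orient(a, b, c):
    # twice the signed area of abc; positive iff c lies left of a->b
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def lawson_walk(tri_pts, tri_adj, t, q):
    # tri_pts[t]: CCW vertices of triangle t; tri_adj[t][i]: neighbour
    # across the edge opposite vertex i (None on the convex hull).
    while True:
        a, b, c = tri_pts[t]
        edges = ((b, c), (c, a), (a, b))          # opposite a, b, c
        exits = [i for i, (u, v) in enumerate(edges)
                 if orient(u, v, q) < 0]          # q strictly right of u->v
        if not exits:
            return t                              # t contains q
        t = tri_adj[t][random.choice(exits)]      # random tie-breaking
```

As noted in the introduction, this loop is guaranteed to terminate in Delaunay triangulations but may cycle in arbitrary ones; the sketch also assumes q lies inside the convex hull, so a None neighbour is never followed.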
For Jump & Walk and Lawson’s Oriented Walk with Sampling, the difference is even smaller. Table 2. Comparison of Walk, Jump&Walk, Lawson’s Oriented Walk and Lawson’s Oriented Walk with Sampling when the data points are clustered. n 10000 15000 20000 25000 30000 35000 40000 45000 50000 W alk 87 114 137 148 156 170 184 184 187 Lawson 103 132 151 156 175 189 207 225 237 J&W 27 33 34 36 37 40 41 44 45 L&S 29 33 36 38 39 41 44 46 47 On Lawson’s Oriented Walk in Random Delaunay Triangulations 231 From Table 2, we can see that when the data points are clustered similar fact can be observed: Lawson’s Oriented Walk usually visits an extra constant number of triangles compared with Green and Sibson’s walkthrough method and the difference between Jump & Walk and Lawson’s Oriented Walk with Sampling is very small. One interesting observation is that the costs for walkthrough and Lawson’s Oriented Walk algorithms when data are clustered are lower than the corresponding costs when data are random. The reason is probably the following: As the three clusters have a total area of 15.87% of Q, most parts of the Delaunay triangulation in Q are ‘sparse’. Since the 10000 query points are randomly generated, we can say that most of the time these algorithms traverse those ‘sparse’ regions. 5 Closing Remarks We remark that similar results for Theorem 1 and Theorem 2 hold for d = 3, with a polylog factor and extra boundary conditions inherit from [MSZ99]. It is an interesting question whether we can generalize these results into any fixed dimension, possibly with no extra polylog factor. The theoretical results in this paper implies that within random Delaunay triangulations Lawson’s Oriented Walk performs in very much the same way as the Walkthrough method. Empirical results show the Walkthrough performs slightly better. Still, if we know in advance that degeneracies could appear in the data then Lawson’s Oriented Walk might be a better choice. It seems that when the input data points are random then such degeneracies do not occur. Acknowledgement. The author would like to thank Sunil Arya for communicating his research results. References [AEI+ 85] T. Asano, M. Edahiro, H. Imai, M. Iri, and K. Murota. Practical use of bucketing techniques in computational geometry. In G. T. Toussaint, editor, Computational Geometry, pages 153–195. North-Holland, Amsterdam, Netherlands, 1985. [AF97] B. Aronov and S. Fortune. Average-case ray shooting and minimum weight triangulations. In Proceedings of the 13th Symposium on Computational Geometry, pages 203–212, 1997. [ACMR00] S. Arya, S.W. Cheng, D. Mount and H. Ramesh. Efficient expected-case algorithms for planar point location. In Proceedings of the 7th Scand. Workshop on Algorithm Theory, pages 353–366, 2000. [AMM00] S. Arya, T. Malamatos and D. Mount. Nearly optimal expected-case planar point location. In Proceedings of the 41th IEEE Symp on Foundation of Computer Science, pages 208–218, 2000. [AMM01a] S. Arya, T. Malamatos and D. Mount. A simple entropy-based algorithm for planar point location. In Proceedings of the 12th ACM/SIAM Symp on Discrete Algorithms, pages 262–268, Jan, 2001. 232 B. Zhu [AMM01b] S. Arya, T. Malamatos and D. Mount. Entropy-preserving cuttings and space-efficient planar point location. In Proceedings of the 12th ACM/SIAM Symp on Discrete Algorithms, pages 256–261, Jan, 2001. [BD98] P. Bose and L. Devroye. Intersections with random geometric objects. Comp. Geom. Theory and Appl., 10:139–154, 1998. [BDTY00] J. Boissonnat, O. Devillers, M. 
Teillaud and M. Yvinc. Triangulations in CGAL triangulation. Proc. 16th Symp. On Computational Geometry, pages 11–18, 2000. [BEY91] M. Bern, D. Eppstein, and F. Yao. The expected extremes in a Delaunay triangulation. International Journal of Computational Geometry & Applications, 1:79–91, 1991. [Bo81] A. Bowyer. Computing Dirichlet tessellations. The Computer Journal, 24:162–166, 1981. [Br93] E. Brisson. Representing geometric structures in d dimensions: Topology and Order. Discrete & Computational Geometry, 9(4):387–426, 1993. [De98] O. Devillers. Improved incremental randomized Delaunay triangulation. In Proceedings of the 14th Symposium on Computational Geometry, pages 106–115, 1998. [DFNP91] L. De Floriani, B. Falcidieno, G. Nagy and C. Pienovi. On sorting triangles in a Delaunay tessellation. Algorithmica, 6: 522–532, 1991. [DLM99] L. Devroye, C. Lemaire and J-M. Moreau. Fast Delaunay point location with search structures. In Proceedings of the 11th Canadian Conf on Computational Geometry, pages 136–141, 1999. [DMZ98] L. Devroye, E. P. Mücke, and B. Zhu. A note on point location in Delaunay triangulations of random points. Algorithmica, Special Issue on Average Case Analysis of Algorithms, 22(4):477–482, 1998. [DL89] D. P. Dobkin and M. J. Laszlo. Primitives for the manipulation of threedimensional subdivisions. Algorithmica, 4(1):3–32, 1989. [DPT01] O. Devillers, S. Pion, and M. Teillaud. Walking in a triangulation. In Proceedings of 17th ACM Symposium on Computational Geometry (SCG’01), pages 106–114, 2001. [Ed90] H. Edelsbrunner. An acyclicity theorem for cell complexes in d dimensions. Combinatorica, 10(3):251–280, 1990. [GOR97] M. T. Goodrich, M. Orletsky, and K. Ramaiyer. Methods for achieving fast query times in point location data structures. In Proceedings of Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’97), pages 757–766, 1997. [GS78] P. J. Green and R. Sibson. Computing Dirichlet tessellations in the plane. The Computer Journal, 21:168–173, 1978. [GS85] L. J. Guibas and J. Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi diagrams. ACM Transactions on Graphics, 4(2):74–123, 1985. [HS95] J. Hershberger and S. Suri. A pedestrian approach to ray shootings: shoot a ray, take a walk. J. Algorithms, 18:403–431, 1995. [La77] C. L. Lawson. Software for C 1 surface interpolation. In J.R. Rice, editor, Mathematical Software III, pages 161–194. Academic Press, NY, 1977. [Mu93] E. P. Mücke. Shapes and Implementations in Three-Dimensional Geometry. Ph.D. thesis. Technical Report UIUCDCS-R-93-1836. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1993. On Lawson’s Oriented Walk in Random Delaunay Triangulations [MSZ99] [PS85] [Sh96] [Sn97] [TG+ 96] 233 E. P. Mücke, I. Saias and B. Zhu. Fast randomized point location without preprocessing in two and three-dimensional Delaunay triangulations. Comp. Geom. Theory and Appl., Special Issue for SoCG’96, 12(1/2):63– 83, 1999. F. P. Preparata and M.I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985. J. R. Shewchuk. Triangle: Engineering a 2D quality mesh generator and Delaunay triangulator. In Proceedings of the First ACM Workshop on Applied Computational Geometry, pages 124–133, 1996. J. Snoeyink. Point location. In J. E. Goodman and J. O’Rourke, editors, Handbook of Discrete and Computational Geometry, pages 559–574. CRC Press, Boca Raton, 1997. H. Trease, D. George, C. Gable, J. Fowler, E. Linnbur, A. Kuprat and A. 
Khamayseh. The X3D Grid Generation System. In Proceedings of the 5th International Conference on Numerical Grid Generation in Computational Field Simulations, 239–244, 1996. Competitive Exploration of Rectilinear Polygons Mikael Hammar1 , Bengt J. Nilsson2 , and Mia Persson2 1 Department of Computer Science, Salerno University, Baronissi (SA) - 84081, Italy. hammar@dia.unisa.it 2 Technology and Society, Malmö University College, S-205 06 Malmö, Sweden. {Bengt.Nilsson,Mia.Persson}@ts.mah.se Abstract. Exploring a polygon with a robot, when the robot does not have a map of its surroundings can be viewed as an online problem. Typical for online problems is that you must make decisions based on past events without complete information about the future. In our case the robot does not have complete information about the environment. Competitive analysis can be used to measure the performance of methods solving online problems. The competitive ratio of such a method is the ratio between the method’s performance and the performance of the best method having full knowledge of the future. We are interested in obtaining good upper bounds on the competitive ratio of exploring polygons and prove a 3/2-competitive strategy for exploring a simple rectilinear polygon in the L1 metric. 1 Introduction Exploring an environment is an important and well studied problem in robotics. In many realistic situations the robot does not possess complete knowledge about its environment, e.g., it may not have a map of its surroundings [1,2,4,6,7,8,9]. The search of the robot can be viewed as an online problem since the robot’s decisions about the search are based only on the part of its environment that it has seen so far. We use the framework of competitive analysis to measure the performance of an online search strategy S. The competitive ratio of S is defined as the maximum of the ratio of the distance traveled by a robot using S to the optimal distance of the search. We are interested in obtaining good upper bounds for the competitive ratio of exploring a rectilinear polygon. The search is modeled by a path or closed tour followed by a point sized robot inside the polygon, given a starting point for the search. The only information that the robot has about the surrounding polygon is the part of the polygon that it has seen so far. Deng et al. [4] show a deterministic strategy having competitive ratio two for this problem if distance is measured according to the L1 -metric. Hammar et al. [5] prove a strategy with competitive ratio 5/3 and Kleinberg [7] proves a lower bound of 5/4 for the competitive ratio of any deterministic strategy. We will show a deterministic strategy obtaining a competitive ratio of 3/2 for searching a rectilinear polygon in the L1 -metric. A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 234–245, 2003. c Springer-Verlag Berlin Heidelberg 2003  Competitive Exploration of Rectilinear Polygons 235 The paper is organized as follows. In the next section we present some definitions and preliminary results. In Section 3 we give an overview of the strategy by Deng et al. [4]. Section 4 contains an improved strategy giving a competitive ratio of 3/2. 2 Preliminaries We will henceforth always measure distance according to the L1 metric, i.e., the distance between two points p and q is defined by ||p, q|| = |px − qx | + |py − qy |, where px and qx are the x-coordinates of p and q and py and qy are the ycoordinates. 
We define the x-distance between p and q to be ||p, q||x = |px − qx | and the y-distance to be ||p, q||y = |py − qy |. If C is a polygonal curve, then the length of C, denoted length(C), is defined the sum of the distances between consecutive pairs of segment end points in C. Let P be a simple rectilinear polygon. Two points in P are said to see each other, or be visible to each other, if the line segment connecting the points lies in P. Let p be a point somewhere inside P. A watchman route through p is defined to be a closed curve C that passes through p such that every point in P is seen by some point on C. The shortest watchman route through p is denoted by SWR p . It can be shown that the shortest watchman route in a simple polygon is a closed polygonal curve [3]. Since we are only interested in the L1 length of a polygonal curve we can assume that the curve is rectilinear, that is, the segments of the curve are all axis parallel. Note that the shortest rectilinear watchman route through a point p is not necessarily unique. For a point p in P we define four quadrants with respect to p. Those are the regions obtained by cutting P along the two maximal axis parallel line segments that pass through p. The four quadrants are denoted Q1 (p), Q2 (p), Q3 (p), and Q4 (p) in anti-clockwise order from the top right quadrant to the bottom right quadrant. We let Qi,j (p) denote the union of Qi (p) and Qj (p). Consider a reflex vertex of P. The two edges of P connecting at the reflex vertex can each be extended inside P until the extensions reach a boundary point. The segments thus constructed are called extensions and to each extension a direction is associated. The direction is the same as that of the collinear polygon edge as we follow the boundary of P in clockwise order. We use the four compass directions north, west, south, and east to denote the direction of an extension. Lemma 1. (Chin, Ntafos [3]) A closed curve is a watchman route for P if and only if the curve has at least one point to the right of every extension of P. Our objective is thus to present a competitive online strategy that enables a robot to follow a closed curve from the start point s in P and back to s with the curve being a watchman route for P. 236 M. Hammar, B.J. Nilsson, and M. Persson An extension e splits P into two sets Pl and Pr with Pl to the left of e and Pr to the right. We say a point p is to the left of e if p belongs to Pl . To the right is defined analogously. As a further definition we say that an extension e is a left extension with respect to a point p, if p lies to the left of e, and an extension e dominates another extension e′ , if all points of P to the right of e are also to the right of e′ . By Lemma 1 we are only interested in the extensions that are left extensions with respect to the starting point s since the other ones already have a point (the point s) to the right of them. So without loss of clarity when we mention extensions we will always mean extensions that are left extensions with respect to s. 3 An Overview of GO Consider a rectilinear polygon P that is not a priori known to the robot. Let s be the robot’s initial position inside P. For the starting position s of the robot we associate a point f 0 on the boundary of P that is visible from s and call f 0 the principal projection point of s. For instance, we can choose f 0 to be the first point on the boundary that is hit by an upward ray starting at s. 
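For a rectilinear boundary, the upward ray shot that defines f^0 can be written directly; a sketch under the assumptions that the boundary is given as a list of axis-parallel edges in general position and that s lies strictly inside P:

```python
def principal_projection(s, boundary_edges):
    # f0: the lowest point above s at which the vertical ray x = s_x
    # meets a horizontal boundary edge ((x1, y), (x2, y)).
    sx, sy = s
    hits = [y1 for (x1, y1), (x2, y2) in boundary_edges
            if y1 == y2 and min(x1, x2) <= sx <= max(x1, x2) and y1 > sy]
    return (sx, min(hits))
```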
Let f be the end point of the boundary that the robot sees as we scan the boundary of P in clockwise order; see Figure 1(a). The point f is called the current frontier. f f 0 principal projection f frontier f s 0 ext(f ) f =v C (a) ext(f ) (b) f 0 C v fr fl p fl0 q (c) s fr0 (d) Fig. 1. Illustrating definitions. Let C be a polygonal curve starting at s. Formally a frontier f of C is a vertex of the visibility polygon, VP(C) of C adjacent to an edge e of VP(C) that is not an edge of P. Extend e until it hits a point q on C and let v be the vertex of P that is first encountered as we move along the line segment [q, f ] from q to f . We denote the left extension with respect to s associated to the vertex v by ext(f ); see Figures 1(b) and (c). Deng et al. [4] introduce an online strategy called greedy-online, GO for short, to explore a simple rectilinear polygon P in the L1 metric. If the starting point s lies on the boundary of P, their strategy, we call it BGO, goes as follows: from the starting point scan the boundary clockwise and establish the first frontier f . Competitive Exploration of Rectilinear Polygons 237 Move to the closest point on ext(f ) and establish the next frontier. Continue in this fashion until all of P has been seen and move back to the starting point. Deng et al. show that a robot using strategy BGO to explore a rectilinear polygon follows a tour with shortest length, i.e., BGO has competitive ratio one. They also present a similar strategy, called IGO, for the case when the starting point s lies in the interior of P. For IGO they show a competitive ratio of two, i.e., IGO specifies a tour that is at most twice as long as the shortest watchman route through s. IGO shoots a ray upwards to establish a principal projection point f 0 and then scans the boundary clockwise to obtain the frontier. Next, it proceeds exactly as BGO, moving to the closest point on the extension of the frontier, updating the frontier, and repeating the process until all of the polygon has been seen. It is clear that BGO could just as well scan the boundary anti-clockwise instead of clockwise when establishing the frontiers and still have the same competitive ratio. Hence, BGO can be seen as two strategies, one scanning clockwise and the other anti-clockwise. We can therefore parameterize the two strategies so that BGO(p, orient) is the strategy beginning at some point p on the boundary and scanning with orientation orient where orient is either clockwise cw or anti-clockwise aw . Similarly for IGO, we can not only choose to scan clockwise or anti-clockwise for the frontier but also choose to shoot the ray giving the first principal projection point in any of the four compass directions north, west, south, or east. Thus IGO in fact becomes eight different strategies that we can parameterize as IGO(p, dir , orient) and the parameter dir can be one of north, south, west, or east. We further define partial versions of GO starting at boundary and interior points. Strategies PBGO(p, orient, region) and PIGO(p, dir , orient, region) apply GO until either the robot has explored all of region or the robot leaves the region region. The strategies return as result the position of the robot when it leaves region or when region has been explored. Note that PBGO(p, orient, P) and PIGO(p, dir , orient, P) are the same strategies as BGO(p, orient) and IGO(p, dir , orient) respectively except that they do not move back to p when all of P has been seen. 
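The elementary move of GO, "move to the closest point on ext(f)", reduces for an axis-parallel extension to clamping the robot's coordinates to the segment; a small sketch (the segment representation is an assumption):

```python
def closest_point_on_extension(p, seg):
    # Closest point to p on an axis-parallel segment ((x1,y1),(x2,y2));
    # for such segments the L1- and L2-closest points coincide.
    (x1, y1), (x2, y2) = seg
    clamp = lambda v, lo, hi: max(lo, min(v, hi))
    return (clamp(p[0], min(x1, x2), max(x1, x2)),
            clamp(p[1], min(y1, y2), max(y1, y2)))
```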
4 The Strategy CGO We present a new strategy competitive-greedy-online(CGO) that explores two quadrants simultaneosly without using up too much distance. We assume that s lies in the interior of P since otherwise we can use BGO and achieve an optimal route. The strategy uses two frontier points simultaneously to improve the competitive ratio. However, to initiate the exploration, the strategy begins by performing a scan of the polygon boundary to decide in which direction to start the exploration. This is to minimize the loss inflicted upon us by our choice of initial direction. 238 M. Hammar, B.J. Nilsson, and M. Persson The initial scan works as follows: construct the visibility polygon VP(s) of the initial point s. Consider the set of edges in VP(s) not coinciding with the boundary of P. The end points of these edges define a set of frontier points each having an associated left extension. Let e denote the left extension that is furthest from s (distance being measured orthogonally to the extension), breaking ties arbitrarily. Let l be the infinite line containing e. We rotate the view point of s so that Q3 (s) and Q4 (s) intersect l whereas Q1 (s) and Q2 (s) do not. Hence, e is a horizontal extension lying below s. The initial direction of exploration is upwards through Q1 (s) and Q2 (s). The two frontier points used by the strategy are obtained as follows: the left frontier fl is established by shooting a ray towards the left for the left principal projection point fl0 and then scan the boundary in clockwise direction for fl ; see Figure 1(d). The right frontier fr is established by shooting a ray towards the right for the right principal projection point fr0 and then scan the boundary in anti-clockwise direction for fr ; see Figure 1(d). To each frontier point we associate a left extension ext(fl ) and a right extension ext(fr ) with respect to s. The strategy CGO, presented in pseudo code below makes use of three different substrategies: CGO-0, CGO-1, and CGO-2, that each takes care of specific cases that can occur. Our strategy ensures that whenever it performs one of the substrategies this is the last time that the outermost while-loop is executed. Hence, the loop is repeated only when the strategy does not enter any of the specified substrategies. The loop will lead the strategy to follow a straight line and we will maintain the invariant during the while-loop that all of the region Q3,4 (p) ∩ Q1,2 (s) has been seen. We distinguish four classes of extensions. A is the class of extensions e whose defining edge is above e, B is the class of extensions e whose defining edge is below e. Similarly, L is the class of extensions e whose defining edge is to the left of e, and R is the class of extensions e whose defining edge is to the right of e. For conciseness, we use C1 C2 as a shorthand for the Cartesian product C1 × C2 of the two classes C1 and C2 . fl fl = u fl = u s (a) u s (b) s (c) Fig. 2. Illustrating the key point u. We define two key vertices u and v together with their extensions ext(u) and ext(v) that are useful to establish the correct substrategy to enter. The vertex u lies in Q2 (s) and v in Q1 (s). If ext(fl ) ∈ A ∪ B, then u is the vertex issuing Competitive Exploration of Rectilinear Polygons 239 ext(fl ) and ext(u) = ext(fl ). If ext(fl ) ∈ L and ext(fl ) crosses the vertical line through s, then u is the vertex issuing ext(fl ) and again ext(u) = ext(fl ). 
If ext(fl ) ∈ L does not cross the vertical line through s, then u is the leftmost vertex of the bottommost edge visible from the robot, on the boundary going from fl clockwise until we leave Q2 (s). The extension ext(u) is the left extension issued by u, and hence, ext(u) ∈ A; see Figures 2(a), (b), and (c). The vertex v is defined symmetrically in Q1 (s) with respect to fr . Each of the substrategies is presented in sequence and for each of them we claim that if CGO executes the substrategy, then the competitive ratio of CGO is bounded by 3/2. Let FR s be the closed route followed by strategy CGO starting at an interior point s. Let FR s (p, q, orient) denote the subpath of FR s followed in direction orient from point p to point q, where orient can either be cw (clockwise) or aw (anti-clockwise). Similarly, we define the subpath SWR s (p, q, orient) of SWR s . We denote by SP (p, q) a shortest rectilinear path from p to q inside P. Strategy CGO 1 Establish the exploration direction by performing the initial scan of the polygon boundary 2 Establish the left and right principal projection points fl0 and fr0 for Q2 (s) and Q1 (s) respectively 3 while Q1 (s) ∪ Q2 (s) is not completely seen do 3.1 Obtain the left and right frontiers, fl and fr 3.2 if fl lies in Q2 (s) and fr lies in Q1 (s) then 3.2.1 Update vertices u and v as described in the text  3.2.2 if (ext(u), ext(v)) ∈ LR or (ext(u), ext(v)) ∈ AR∪LA and ext(u)  crosses ext(v) then 3.2.2.1 Go to the closest horizontal extension  elseif (ext(u), ext(v)) ∈ BR ∪ LB or (ext(u), ext(v)) ∈ AR ∪ LA  and ext(u) does not cross ext(v) then 3.2.2.2 Apply substrategy CGO-1 elseif (ext(u), ext(v)) ∈ AA ∪ AB ∪ BA ∪ BB then 3.2.2.3 Apply substrategy CGO-2 endif else 3.2.3 Apply substrategy CGO-0 endif endwhile 4 if P is not completely visible then 4.1 Apply substrategy CGO-0 endif End CGO 240 M. Hammar, B.J. Nilsson, and M. Persson We claim the following two simple lemmas without proof. Lemma 2. If t is a point on some tour SWR s , then length(SWR t ) ≤ length(SWR s ). Lemma 3. Let S be a set of points that are enclosed by some tour SWR s , and let S1 = S ∩ Q1,2 (s), S2 = S ∩ Q2,3 (s), S3 = S ∩ Q3,4 (s), and S4 = S ∩ Q1,4 (s). Then length(SWR s ) ≥ 2 max{||s, p||y } + 2 max{||s, p||x } + p∈S1 p∈S2 + 2 max{||s, p||y } + 2 max{||s, p||x }. p∈S3 p∈S4 The structure of the following proofs are very similar to each other. In each case we will establish a point t that we can ensure is passed by SWR s and that either lies on the boundary of P or can be viewed as to lie on the boundary of P. We then consider the tour SWR t and compare its length to the length of FR s . By Lemma 2 we know that length(SWR t ) ≤ length(SWR s ), hence the difference in length between FR s and SWR t is an upper bound on the loss produced by CGO. We start by presenting CGO-0, that does the following: Let p be the current robot position. If Q1 (p) is completely seen from p then we run PIGO(p, north, aw , P) and move back to the starting point s, otherwise Q2 (p) is completely seen from p and we run PIGO(p, north, cw , P) and move back to the starting point s. Lemma 4. If CGO-0 is applied, then length(FR s ) = length(SWR s ). Proof. Assume that CGO-0 realizes that when FR s reaches the point p, then Q1 (p) is completely seen from p. The other case, that Q2 (p) is completely seen from p is symmetric. Since the path FR s (s, p, orient) that the strategy has followed when it reaches point p is a straight line, the point p is the currently topmost point of the path. 
Hence, we can add a vertical spike issued by the boundary point immediately above p, giving a new polygon P′ having p on the boundary and furthermore with the same shortest watchman route through p as P. This means that performing strategy IGO(p, north, orient) in P yields the same result as performing BGO(p, orient) in P′ , p being a boundary point in P′ , and orient being either cw or aw . The tour followed is therefore a shortest watchman route through the point p in both P′ and P. Also the point p lies on an extension with respect to s, by the way p is defined, and it is the closest point to s such that all of Q1 (s) has been seen by the path FR s (s, p, orient) = SP (s, p). Hence, there is a route SWR s that contains p and by Lemma 2 length(SWR p ) ≤ length(SWR s ). The tour followed equals FR s = SP (s, p) ∪ SWR p (p, s, aw ), and we have that length(FR s ) = length(SWR p ) ≤ length(SWR s ), and since FR s cannot be strictly shorter than SWR s the equality holds which concludes the proof. Competitive Exploration of Rectilinear Polygons p FR s p FR s v 241 v u u s SWR u (a) r s SWR u (b) Fig. 3. Illustrating the cases in Lemma 5 when ||s, p||y + ||s, u||x ≤ ||s, v||x . Next we present CGO-1. Let u and v be vertices as defined earlier. The strategy does the following: if (ext(u), ext(v)) ∈ LA ∪ LB, we mirror the polygon P at the vertical line through s and swap the names of u and v. Hence, (ext(u), ext(v)) ∈ AR ∪ BR. We continue moving upwards updating fr and v until either all of Q1 (s) has been seen or ext(v) no longer crosses the vertical line through s. If all of Q1 (s) has been seen then we explore the remaining part of P using PIGO(p, east, aw , P), where p is the current robot position. If ext(v) no longer crosses the vertical line through s then we either need to continue the exploration by moving to the right or return to u and explore the remaining part of the polygon from there. If ||s, p||y + ||s, u||x ≤ ||s, v||x we choose to return to u. If ext(u) ∈ A we run PBGO(u, aw , P) and if ext(u) ∈ B we use PBGO(u, cw , P); see Figure 3. Otherwise, ||s, p||y + ||s, u||x > ||s, v||x and in this case we move to the closest point v ′ on ext(v). By definition, the extension of v is either in A or B in this case. If ext(v) ∈ B then v = v ′ and we choose to run PBGO(v, aw , P). Otherwise, ext(v) ∈ A. If Q1 (v ′ ) is seen from v ′ then the entire quadrant has been explored and we run PIGO(v ′ , east, aw , P) to explore the remainder of the polygon. If Q1 (v ′ ) is not seen from v ′ then there are still things hidden from the robot in Q1 (v). We explore the rest of the quadrant using PBGO(v ′ , north, aw , Q1 (v)) reaching a point q where a second decision needs to be made. If v is seen from the starting point and ||s, q||x ≤ ||s, v||, we go back to v and run PBGO(v, aw , P), otherwise we run PIGO(q, east, cw , P) from the interior point q; see Figure 5. If v is not seen from the starting point s then we go back to v and run PBGO(v, aw , P). To finish the substrategy CGO-1 our last step is to return to the starting point s. 242 M. Hammar, B.J. Nilsson, and M. Persson Lemma 5. If CGO-1 is applied, then length(FR s ) ≤ 32 length(SWR s ). v p v FR s u p FR s u r SWR v s (a) v′ r SWR v s (b) Fig. 4. Illustrating the proof of Lemma 5 when ||s, p||y + ||s, u||x > ||s, v||x . Proof. We handle each case separately. Assume for the first case that when FR s reaches the point p, then Q1 (p) is completely visible. 
Hence, we have the same situation as in the proof of Lemma 4 and using the same proof technique it follows that length(FR s ) = length(SWR s ). Assume for the second case that CGO-1 decides to go back to u, i.e., that ||s, p||y + ||s, u||x ≤ ||s, v||x ; see Figures 3(a) and (b). The tour followed equals one of  SP (s, p) ∪ SP (p, u) ∪ SWR u ∪ SP (u, s) FR s = SP (s, p) ∪ SP (p, u) ∪ SWR u (u, r, cw ) ∪ SP (r, s) where r is the last intersection point of FR s with the horizontal line through s. Using that ||s, p||y + ||s, u||x ≤ ||s, v||x it follows that the length of FR s in both cases is bounded by length(FR s ) = length(SWR u ) + 2||s, p||y + 2||s, u||x ≤ length(SWR s ) + 3 + ||s, p||y + ||s, u||x + ||s, v||x ≤ length(SWR s ). 2 The inequalities follow from the assumption together with Lemmas 2 and 3. Assume for the third case that CGO-1 goes to the right, i.e., that ||s, p||y + ||s, u||x > ||s, v||x . We begin by handling the different subcases that are independent of whether s sees v; see Figures 4(a) and (b). The tour followed equals one of  SP (s, v) ∪ SWR v (v, r, aw ) ∪ SP (r, s) FR s = SP (s, v ′ ) ∪ SWR v′ (v ′ , r, aw ) ∪ SP (r, s) Since ||s, v||x = ||s, v ′ ||x the length of FR s is in both subcases bounded by length(FR s ) ≤ length(SWR s ) + 2||s, v||x < length(SWR s ) + 3 length(SWR s ), + ||s, p||y + ||s, u||x + ||s, v||x ≤ 2 Competitive Exploration of Rectilinear Polygons 243 The inequalities follow from Lemmas 2 and 3. v v ′ p v′ u FR s r SWR v q v p FR s u q s s SWR v (a) v q (b) p v′ u q′ FR s SWR v s r (c) Fig. 5. Illustrating the proof of Lemma 5. Assume now that CGO-1 goes to the right, i.e., that ||s, p||y + ||s, u||x > ||s, v||x and that v is indeed seen from s; see Figures 5(a) and (b). The tour followed in this case is one of  SP (s, v) ∪ SWR v (v, q, cw ) ∪ SP (q, v) ∪ SWR v (v, r, aw ) ∪ SP (r, s) (∗) FR s = SP (s, v) ∪ SWR v ∪ SP (v, s) where q is the resulting location after exploring Q1 (v). Here we use that v is seen from s, and hence, that the initial scan guarantees that there is a point t of SWR s in Q3,4 (s) such that ||s, t||y ≥ ||s, v||x , thus FR s is bounded by length(FR s ) = length(SWR v ) + 2 min{||s, v||, ||s, q||x } ≤ length(SWR s ) + + ||s, v||y + ||s, v||x + ||s, q||x < length(SWR s ) + 3 length(SWR s ). + ||s, v||y + ||s, t||y + ||s, q||x + ||s, u||x ≤ 2 On the other hand, when v is not seen from s, the tour follows the path marked with (∗) above; see Figure 5(c). Thus, the polygon boundary obscures the view from s to v, and hence, there is a point q ′ on the boundary such that the shortest path from s to v ′ contains q ′ . The path our strategy follows between s 244 M. Hammar, B.J. Nilsson, and M. Persson and v ′ is a shortest path and we can therefore assume that it also passed through q ′ . We use that ||s, q ′ ||x ≤ ||s, v||x ≤ ||s, q||x to get the bound. length(FR s ) = length(SWR v ) + 2||s, q ′ ||x ≤ length(SWR s ) + + ||s, v||x + ||s, q||x < length(SWR s ) + 3 length(SWR s ). + ||s, v||y + ||s, u||x + ||s, q||x ≤ 2 The inequalities above follow from Lemmas 2 and 3 and this concludes the proof. Fig. 6. Illustrating the cases in the proof of Lemma 6. We continue the analysis by first showing the substrategy CGO-2 and then claiming its competitive ratio. The strategy does the following: if ||s, u||x ≤ ||s, v||x then we mirror P at the vertical line through s also swapping the names of u and v. This means that v is closer to the current point p with respect to x-distance than u. Next, go to v ′ , the closest point on ext(v). 
If ext(v) ∈ B, run PBGO(v, aw, P), since v = v′. If ext(v) ∈ A and Q1(v) is seen from v′, then run PIGO(v′, east, aw, P). If ext(v) ∈ A but Q1(v) is not completely seen from v′, then explore Q1(v) using PBGO(v′, north, cw, Q1(v′)). Once Q1(v) is explored, we have reached a point q and make a second decision: if ||s, q||_x ≤ ||s, v||, go back to v and run PBGO(v, aw, P); otherwise run PIGO(q, east, cw, P). Finally, go back to s. We claim the following lemma without proof; the proof idea is the same as that of Lemma 5.

Lemma 6. If CGO-2 is applied, then length(FR_s) ≤ (3/2) length(SWR_s).

We have the following theorem.

Theorem 1. CGO is 3/2-competitive.

5 Conclusions

We have presented a 3/2-competitive strategy for exploring a rectilinear simple polygon in the L1 metric. An obvious open problem is to reduce the gap between the lower bound of 5/4 and our upper bound of 3/2 for deterministic strategies. It would also be interesting to look at variants of this problem, e.g., what happens if we are only interested in finding a shortest path, rather than a closed tour, that sees all of the polygon; see Deng et al. [4].

References

1. M. Betke, R.L. Rivest, M. Singh. Piecemeal Learning of an Unknown Environment. Machine Learning, 18(2-3):231-254, 1995.
2. K-F. Chan, T.W. Lam. An On-Line Algorithm for Navigating in an Unknown Environment. International Journal of Computational Geometry & Applications, 3:227-244, 1993.
3. W. Chin, S. Ntafos. Optimum Watchman Routes. Information Processing Letters, 28:39-44, 1988.
4. X. Deng, T. Kameda, C.H. Papadimitriou. How to Learn an Unknown Environment I: The Rectilinear Case. Journal of the ACM, 45(2):215-245, 1998.
5. M. Hammar, B.J. Nilsson, S. Schuierer. Improved Exploration of Rectilinear Polygons. Nordic Journal of Computing, 9(1):32-53, 2002.
6. F. Hoffmann, C. Icking, R. Klein, K. Kriegel. The Polygon Exploration Problem. SIAM Journal on Computing, 31(2):577-600, 2001.
7. J.M. Kleinberg. On-Line Search in a Simple Polygon. In Proc. 5th ACM-SIAM Symposium on Discrete Algorithms, pages 8-15, 1994.
8. A. Mei, Y. Igarashi. An Efficient Strategy for Robot Navigation in Unknown Environment. Information Processing Letters, 52:51-56, 1994.
9. C.H. Papadimitriou, M. Yannakakis. Shortest Paths Without a Map. Theoretical Computer Science, 84(1):127-150, 1991.

An Improved Approximation Algorithm for Computing Geometric Shortest Paths

Lyudmil Aleksandrov (1), Anil Maheshwari (2), and Jörg-Rüdiger Sack (2)

(1) Bulgarian Academy of Sciences, CICT, Acad. G. Bonchev Str. Bl. 25-A, 1113 Sofia, Bulgaria
(2) School of Computer Science, Carleton University, Ottawa, Ontario K1S 5B6, Canada

Research supported in part by NSERC.

Abstract. Consider a polyhedral surface consisting of n triangular faces, each with an associated positive weight. The cost of travel through a face is the Euclidean distance traveled multiplied by the weight of the face. We present an approximation algorithm for computing a path such that the ratio of the cost of the computed path to the cost of a shortest path is bounded by (1 + ε), for a given 0 < ε < 1. The algorithm is based on a novel way of discretizing the polyhedral surface. We employ a generic greedy approach for solving shortest path problems in the geometric graphs produced by such discretization. We improve upon existing approximation algorithms for computing shortest paths on polyhedral surfaces [1,4,5,10,12,15].
1 Introduction

Shortest path problems are among the fundamental problems studied in computational geometry and graph algorithms. They arise naturally in application areas such as motion planning, navigation, and geographical information systems. Aside from their importance in their own right, shortest path problems often appear in the solutions of other problems. Existing algorithms for many shortest path problems are quite complex in design and implementation or have very large time and space complexities; hence they are unappealing to practitioners and pose a challenge to theoreticians. Since geographic and spatial models are approximations of reality, and since high-quality paths are favored over optimal paths that are "hard" to compute, approximation algorithms are suitable and necessary.

In this paper we present algorithms for computing approximate shortest paths on (weighted) polyhedral surfaces. Our solutions employ the paradigm of partitioning a continuous geometric search space into a discrete combinatorial search space. Discretization methods are natural, theoretically interesting, and enable implementation: they transform geometric shortest path problems into combinatorial shortest path problems in graphs, which are well studied and for which general solutions with implementations are readily available. We consider surfaces that are polyhedral 2-manifolds, whereas most previous algorithms were designed to handle particular geometric instances, such as convex polyhedra or non-convex hole-free polyhedra. Also, we allow arbitrary (positive) weights to be assigned to the faces of the domain, thus generalizing from the uniform and obstacle-avoidance scenarios. While graph shortest path algorithms are available and applicable to the graphs generated here, the geometric structure of shortest path problems can be exploited to design more efficient algorithms.

Brief Literature Review: Shortest path problems can be categorized by various factors, including the dimensionality of the space, the type and number of objects or obstacles, and the distance measure used. We discuss those contributions that relate directly to this paper. The table below summarizes the results for shortest path problems on polyhedral surfaces. We need a few preliminaries in order to read the table. Let P be a polyhedral surface in 3-dimensional Euclidean space consisting of n triangular faces. A path π′ is a (1+ǫ)-approximation of a shortest path π between two vertices of P if ||π′|| ≤ (1+ǫ)||π||, where ||π|| denotes the length of π and ǫ > 0. A natural generalization of the Euclidean shortest path problems are shortest path problems set in weighted surfaces: a triangulated polyhedral surface consisting of n faces is given, where each face has a positive weight representing the cost of traveling through that face. The cost of a path is defined to be the sum of the Euclidean lengths of the sub-paths within each face traversed, multiplied by the corresponding face weights. (The bounds for weighted shortest paths involve geometric parameters, which have been omitted from the table for the sake of clarity.)
Surface    | Cost Metric | Approx. Ratio | Time Complexity                   | Reference
Convex     | Euclidean   | Exact         | O(n^3 log n)                      | [14]
Non-convex | Euclidean   | Exact         | O(n^2 log n)                      | [11]
Non-convex | Euclidean   | Exact         | O(n^2)                            | [7]
Non-convex | Euclidean   | Exact         | O(n log^2 n)                      | [9]
Convex     | Euclidean   | 2             | O(n)                              | [8]
Convex     | Euclidean   | 1 + ǫ         | O(n log(1/ǫ) + 1/ǫ^3)             | [3]
Convex     | Euclidean   | 1 + ǫ         | O(n/√ǫ + 1/ǫ^4)                   | [2]
Non-convex | Euclidean   | 7(1 + ǫ)      | O(n^(5/3) log^(5/3) n)            | [1]
Non-convex | Euclidean   | 15(1 + ǫ)     | O(n^(8/5) log^(8/5) n)            | [1]
Non-convex | Weighted    | 1 + ǫ         | O(n^8 log(n/ǫ))                   | [12]
Non-convex | Weighted    | Additive      | O(n^3 log n)                      | [10]
Non-convex | Weighted    | 1 + ǫ         | O((n/ǫ^2) log n log(1/ǫ))         | [4]
Non-convex | Weighted    | 1 + ǫ         | O((n/ǫ) log(1/ǫ)(1/√ǫ + log n))   | [5]
Non-convex | Weighted    | 1 + ǫ         | O((n/ǫ) log(n/ǫ) log(1/ǫ))        | [15]
Non-convex | Weighted    | 1 + ǫ         | O((n/√ε) log(n/ε) log(1/ε))       | This paper

From a practical point of view the "exact" algorithms are unappealing, since they are fairly complex, numerically unstable, and may require an exponential number of bits to perform the computations associated with the "unfolding" of faces. These drawbacks have motivated researchers to look into practical approximation algorithms. The approximation algorithms of [8,2,10,15,5,4] have been implemented.

New Results - Overview and Significance: The results of this paper are:
1. We provide a new discretization of polyhedral surfaces. For a given approximation parameter ε ∈ (0, 1), the size of the discretization for a polyhedral surface consisting of n triangular faces is O((n/√ε) log(1/ε)). We precisely evaluate the constants hidden in the big-O notation. (Section 2)
2. We define approximation graphs with nodes corresponding to the Steiner points of the discretization. We show that the distance between any pair of nodes in the approximation graph is within a factor of (1 + ε) of the cost of a shortest path in the corresponding surface. (Section 3)
3. We describe a greedy approach for solving the single source shortest path (SSSP) problem in the approximation graph and obtain an O((n/√ε) log(n/ε) log(1/ε))-time (1 + ε)-approximation algorithm for the SSSP problem on a polyhedral surface. (Section 4)

Our scheme places Steiner points, for the first time, in the interior of the faces and not on the face boundaries. While this is somewhat counter-intuitive, we can show that the desired approximation properties can still be proven, now using a much sparser mesh. (In addition, this leads to algorithmic simplifications by avoiding the construction of the "cones" used in [5].) The size of the discretization is smaller than those previously established, and the improvement is by a factor of √ε. A greedy approach for computing SSSP in the approximation graph was proposed in [15]. However, edges in our approximation graphs do not correspond to line segments as required by their algorithm, and their approach does not seem to generalize to three dimensions. We propose an alternative greedy algorithm which is applicable here and also generalizes to three dimensions.

Geographical information systems are an immediate application domain for shortest path problems on polyhedral surfaces and terrains. In such applications the number of faces, n, may be huge (several million); storage and time complexities are functions of n, and constants are critical. In terms of computational complexity, our algorithm improves upon previous approximation algorithms for solving shortest path problems on polyhedral surfaces [1,4,5,10,12,15]. The running time of our algorithm improves upon the most recent algorithm of [15] by a factor of √ε.
Ignoring the geometric parameters, the original algorithm of [12] has been improved by a factor of roughly n^7. The algorithm of [12] uses O(n^4) space; this was improved substantially in [5,15]. The discretization presented here improves further on the storage requirement by reducing the number of Steiner points by a factor of √ε over [5,15]. The practicality of discretization for solving geodesic shortest path problems has been demonstrated in [10,15,16]. From a theoretical viewpoint the discretization scheme proposed here is more complex and requires very careful analysis; its implementation, however, would be similar to our previous ε-schemes [4,5], which have been implemented and experimentally verified in [16]. More precisely, the algorithm presented here does not require any complex data structures (just linked lists, binary search trees, and priority queues), and existing software libraries for computing shortest paths in graphs (Dijkstra's algorithm) can be used. We provide explicit calculation of key constants often hidden through the use of the big-O notation. The constant in the estimate on the total number of Steiner points (Lemma 1) is 12Γ log L, where Γ is the average of the reciprocals of the sines of the angles of the faces of P. For example, if no face of P has an angle smaller than 10°, then Γ ≤ 5. Moreover, the simplicity of our algorithm, coupled with the fact that we obtain theoretically guaranteed approximation factors, should make it a very promising candidate for the application domain. It is important to note that the edges and Steiner points of the discretization can be produced on-the-fly: when Dijkstra's algorithm requests the edges incident to the current vertex, all incident edges (connecting Steiner points) are generated.

2 Preliminaries and Discretization

Let P be a triangulated polyhedral surface in 3-dimensional Euclidean space. P can be any polyhedral 2-manifold; we do not assume any additional geometric or topological properties such as convexity, being a terrain, or absence of holes. Assume that P consists of n triangular faces denoted by t_1, ..., t_n. Positive weights w_1, ..., w_n are associated with the triangles t_1, ..., t_n, representing the cost of traveling inside them. The cost of traveling along an edge is the minimum of the weights of the triangles incident to that edge; edges are assumed to be part of the triangle from which they inherit their weight. Any continuous (rectifiable) curve lying in P is called a path. The cost of a path π is defined by ||π|| = Σ_{i=1}^{n} w_i |π_i|, where |π_i| denotes the Euclidean length of the intersection of π with triangle t_i, i.e., π_i = π ∩ t_i. Given two distinct points u and v in P, a minimum cost path π(u, v) joining u and v is called a geodesic path. Without loss of generality we may assume that u and v lie on the boundary of a face. In this setting, it is well known that geodesic paths are simple (non-self-intersecting) and consist of a sequence of segments whose endpoints are on the edges of P. The intersection of a geodesic path with the interior of faces or edges is a set of disjoint segments. More precisely, each segment on a geodesic path is of one of the following two types: 1) face-crossing: a segment that crosses a face, joining two points on its boundary; 2) edge-using: a sub-segment of an edge. We define linear paths to be simple paths consisting exclusively of face-crossing and edge-using segments. Thus, any geodesic path is a linear path.
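As a small illustration of this cost definition, the sketch below computes ||π|| for a linear path given its bending points together with the weight charged to each segment; which weight applies to which segment (the face weight for face-crossing segments, the cheaper incident face weight for edge-using segments) is assumed to be resolved by the caller:

    import math

    # Minimal sketch of ||pi|| = sum_i w_i * |pi_i| for a linear path.
    # points: bending points a_0, ..., a_{l+1} as coordinate tuples;
    # weights[j]: weight charged to segment (points[j], points[j+1]).
    def path_cost(points, weights):
        assert len(weights) == len(points) - 1
        return sum(w * math.dist(p, q)
                   for (p, q), w in zip(zip(points, points[1:]), weights))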
A linear path π(u, v) is represented as a sequence of its segments {s_1, ..., s_l}, or equivalently as a sequence of points a_0, ..., a_{l+1} lying on the edges that are the endpoints of these segments, i.e., s_i = (a_{i-1}, a_i), u = a_0, and v = a_{l+1}. Points a_i that are not vertices of P are called bending points of the path. Geodesic paths satisfy Snell's law of refraction at each of their bending points (see [12] for details). In the following we introduce a function d(x), defined as the minimum Euclidean distance from a point x ∈ P to the edges around x. The distance d(x) is a lower bound on the length of a face-crossing segment incident to x and plays an essential role in our constructions.

Definition 1. Given a point x ∈ P, let E(x) be the set of edges of the triangles incident to x, minus the edges incident to x. The distance d(x) is defined as the minimum Euclidean distance from x to the edges in E(x).

Throughout the paper, ε is a real number in (0, 1). Next we define a set of points on P, called Steiner points, that together with the vertices of P constitute a (1+ε)-approximation mesh for the set of linear paths on P. That is, we define a graph G_ε whose set of nodes consists of the vertices of P and the Steiner points. The edges of G_ε correspond to local shortest paths between their endpoints and have cost equal to the cost of their corresponding paths. We then show how the graph G_ε can be used to approximate geodesic paths between vertices of P.

Using Definition 1, for each vertex v of P we define a weighted radius

  r(v) = (w_min(v) / (7 w_max(v))) d(v),   (1)

where w_max(v) and w_min(v) are the maximum and minimum weights of the faces incident to v. Using the weighted radius r(v), for each face incident to v we define a "small" isosceles triangle with two sides of length εr(v) incident to v. These triangles around v form a star-shaped polygon S(v), which we call the vertex vicinity of v. In all previous approximation schemes, Steiner points were placed on the edges of P. Here we place Steiner points inside the faces of P; in this way we reduce the total number of Steiner points by a factor of √ε. We will need to show that the (1 + ε)-approximation property of the resulting mesh is preserved.

Let the triangle t be a face of P. Steiner points inside t are placed along the three bisectors of t as follows. Let v be a vertex of t and ℓ the bisector of the angle α of t at v. We define a set of Steiner points p_1, ..., p_k on ℓ by

  |p_{i-1} p_i| = sin(α/2) √(ε/2) |v p_{i-1}|,  i = 1, ..., k,   (2)

where p_0 is the point on ℓ and on the boundary of the vertex vicinity S(v) (Figure 1). The next lemma establishes estimates on the number of Steiner points inserted on a particular bisector and on their total number.

Lemma 1. (a) The number of Steiner points inserted on a bisector ℓ of an angle α at a vertex v is bounded by C(ℓ) (1/√ε) log_2(2/ε), where the constant C(ℓ) < (4/sin α) log_2(|ℓ|/(r(v) cos(α/2))). (b) The total number of Steiner points on P is less than

  C(P) (n/√ε) log_2(2/ε),   (3)

where C(P) < 12Γ log L, L is the maximum of the ratios |ℓ(v)|/(r(v) cos(α/2)), and Γ is the average of the reciprocals of the sines of the angles of P, i.e., Γ = (1/(3n)) Σ_{i=1}^{3n} 1/sin α_i.
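To make the placement rule (2) concrete, here is a minimal sketch generating the distances |v p_i| along one bisector. It assumes the reading used in the proof of Lemma 1 below: |v p_0| = ε r(v) cos(α/2), and each step multiplies the distance by λ = 1 + √(ε/2) sin(α/2):

    import math

    # Sketch of Steiner-point placement along a bisector (equation (2)).
    # alpha: angle at vertex v; eps: approximation parameter in (0, 1);
    # r_v: weighted radius r(v); ell: length of the bisector from v.
    def steiner_on_bisector(alpha, eps, r_v, ell):
        lam = 1.0 + math.sqrt(eps / 2.0) * math.sin(alpha / 2.0)
        d = eps * r_v * math.cos(alpha / 2.0)  # |v p_0|: vicinity boundary
        dists = []
        while d <= ell:
            dists.append(d)                    # |v p_i| = lam**i * |v p_0|
            d *= lam
        return dists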
Proof: We estimate the number of Steiner points on a bisector ℓ of an angle α at a vertex v. From (2) it follows that |v p_i| = λ^i ε r(v) cos(α/2), where λ = 1 + √(ε/2) sin(α/2). Therefore the number of Steiner points on ℓ is

  k ≤ log_λ (|ℓ| / (ε r(v) cos(α/2)))
    = (ln(|ℓ| / (2 r(v) cos(α/2))) + ln(2/ε)) / ln(1 + √(ε/2) sin(α/2))
    ≤ (4 / (√ε sin α)) log_2(|ℓ| / (r(v) cos(α/2))) log_2(2/ε).

This proves (a). Estimate (b) is obtained by summing (a) over all bisectors of P. ⊓⊔

Fig. 1. (Left) Steiner points inserted on a bisector ℓ. (Right) Illustration of the proof of Lemma 2: the sines of the angles ∠p_i x_1 p_{i+1} and ∠p_i x_2 p_{i+1} are at most √(ε/2), implying |x_1 p_i| + |p_i x_2| ≤ (1 + ε/2)|x_1 x_2|.

The set of Steiner points partitions the bisectors into intervals that we call Steiner intervals. The following lemma establishes two important properties of Steiner intervals (Figure 1).

Lemma 2. (a) Let ℓ be the bisector of the angle formed by edges e_1 and e_2 of P. If (p_i, p_{i+1}) is a Steiner interval on ℓ and x is a point on e_1 or e_2, then

  sin(∠p_i x p_{i+1}) ≤ √(ε/2).   (4)

(b) Let x_1 and x_2 be points on e_1 and e_2, respectively, outside the vertex vicinity of the vertex incident to e_1 and e_2. If p is the Steiner point closest to the intersection between the segment (x_1, x_2) and ℓ, then

  |x_1 p| + |p x_2| ≤ (1 + ε/2)|x_1 x_2|.   (5)

Proof: Statement (a) follows easily from the definition of the Steiner points. Here we prove (b). Denote by θ, θ_1, and θ_2 the angles of the triangle p x_1 x_2 at p, x_1, and x_2, respectively. From (a) and ε ≤ 1 it follows that θ ≥ π/2, and we have

  |x_1 p| + |p x_2| = (1 + 2 sin(θ_1/2) sin(θ_2/2) / sin(θ/2)) |x_1 x_2|
                    = (1 + sin θ_1 sin θ_2 / (2 sin(θ/2) cos(θ_1/2) cos(θ_2/2))) |x_1 x_2|
                    ≤ (1 + ε / (4 sin^2(θ/2))) |x_1 x_2|
                    ≤ (1 + ε/2) |x_1 x_2|.  ⊓⊔

3 Discrete Paths

Next we define a graph G_ε = (V(G_ε), E(G_ε)). The set of nodes V(G_ε) consists of the vertices of P and the Steiner points. The set of edges E(G_ε) is defined as follows. A node that is a vertex of P is connected to all Steiner points on bisectors in the faces incident to this vertex. The cost of such an edge equals the cost of the shortest path between its endpoints restricted to lie inside the triangle containing them; these shortest paths consist either of a single segment joining the vertex and the corresponding Steiner point, or of two segments, the first of which follows one of the edges incident to the vertex. The remaining edges of G_ε join pairs of Steiner points lying on neighboring bisectors, as follows. Let e be an edge of P. In general, there are four bisectors incident to e. We define graph edges between pairs of nodes (Steiner points) on these four bisectors and refer to all of them as the edges of G_ε crossing the edge e of P. Let (p, q) be an edge between Steiner points p and q crossing e. The cost of (p, q) is defined as the cost of the shortest path between p and q restricted to lie inside the quadrilateral formed by the two triangles around e, that is, ||pq|| = min_{x,y ∈ e} (||px|| + ||xy|| + ||yq||). (Note that we do not need edges in G_ε between pairs of Steiner points for which the local shortest paths do not intersect e.) Paths in G_ε are called discrete paths. The cost of a discrete path π is the sum of the costs of its edges and is denoted by ||π||. Note that if we replace each edge of a discrete path with the corresponding segments (at most three) forming the shortest path used to compute its cost, we obtain a path on P of the same cost.

Theorem 1. Let π̃(v_0, v) be a linear path joining two different vertices v_0 and v of P. There exists a discrete path π(v_0, v) such that ||π|| ≤ (1 + ε)||π̃||.

Proof: First, we discuss the structure of linear paths.
Following from the definition, a linear path π̃(v_0, v) consists of face-crossing and edge-using segments and is determined by the sequence of their endpoints, called bending points, which are located on the edges of P. Following the path from v_0 onwards, let a_0 be the last bending point on π̃ that is inside the vertex vicinity S(v_0). Next, let b_1 be the first bending point after a_0 that is in a vertex vicinity, say S(v_1), and let a_1 be the last bending point in S(v_1). Continuing in this way, we define a sequence of vertices v_0, v_1, ..., v_l = v and a sequence of bending points a_0, b_1, a_1, ..., a_{l-1}, b_l on π̃ such that, for i = 0, ..., l, the points b_i, a_i are in S(v_i) (we assume b_0 = v_0, a_l = v). Furthermore, the portions of π̃ between a_i and b_{i+1} do not intersect vertex vicinities. Thereby the path π̃ is partitioned into portions

  π̃(v_0, a_0), π̃(a_0, b_1), π̃(b_1, a_1), ..., π̃(b_l, v).   (6)

The portions π̃(a_i, b_{i+1}), i = 0, ..., l-1, are called between-vertex-vicinities portions, and the portions π̃(b_i, a_i), i = 0, ..., l (b_0 = v_0), are called vertex-vicinities portions. Consider a between-vertex-vicinities portion π̃(a_i, b_{i+1}) for some 0 ≤ i < l. We define π̃′(v_i, v_{i+1}) to be the linear path from v_i to v_{i+1} along the sequence of inner bending points of π̃(a_i, b_{i+1}). Using the triangle inequality and the definition of vertex vicinities (1), we obtain

  ||π̃′(v_i, v_{i+1})|| ≤ ||π̃(a_i, b_{i+1})|| + ||v_i a_i|| + ||b_{i+1} v_{i+1}||
                       ≤ ||π̃(a_i, b_{i+1})|| + (ε/7)(w_min(v_i)d(v_i) + w_min(v_{i+1})d(v_{i+1})).   (7)

Changing all between-vertex-vicinities portions in this way, we obtain a linear path π̃′(v_0, v) = {π̃′(v_0, v_1), π̃′(v_1, v_2), ..., π̃′(v_{l-1}, v)} consisting of between-vertex-vicinities portions only. Next we approximate each of these portions by a discrete path. Consider a portion π̃′_i = π̃′(v_i, v_{i+1}) for some 0 ≤ i < l and let s_j = (x_{j-1}, x_j), j = 1, ..., ν, be the segments forming this portion (x_0 = v_i, x_ν = v_{i+1}). The segments s_j are face-crossing and edge-using segments; indeed, there are no consecutive edge-using segments. Let s_j be a face-crossing segment. Then s_j intersects the bisector ℓ_j of the angle formed by the edges of P containing the endpoints of s_j. We define p_j to be the Steiner point closest to the intersection between s_j and ℓ_j. Now we replace each face-crossing segment s_j of π̃′_i by the two-segment path x_{j-1}, p_j, x_j, and denote the resulting path by π̃′′_i. From (5) it follows that ||π̃′′_i|| ≤ (1 + ε/2)||π̃′_i||. The sequence of bending points of π̃′′_i contains as a subsequence the Steiner points p_{j_1}, ..., p_{j_{ν_1}} (ν_1 ≤ ν) corresponding to the face-crossing segments of π̃′_i. Note that the pairs (v_i, p_{j_1}) and (p_{j_{ν_1}}, v_{i+1}) are adjacent in G_ε. Furthermore, between any two consecutive Steiner points p_{j_μ}, p_{j_{μ+1}} there is at most one edge-using segment and, according to our definition of the graph G_ε, they are connected in G_ε. The cost of each edge (p_{j_μ}, p_{j_{μ+1}}) is at most the cost of the portion of π̃′′_i from p_{j_μ} to p_{j_{μ+1}}. Therefore, the sequence of nodes {v_i, p_{j_1}, ..., p_{j_{ν_1}}, v_{i+1}} defines a discrete path π(v_i, v_{i+1}) such that

  ||π(v_i, v_{i+1})|| ≤ ||π̃′′_i|| ≤ (1 + ε/2)||π̃′(v_i, v_{i+1})||.   (8)

We combine the discrete paths π(v_0, v_1), ..., π(v_{l-1}, v) and obtain a discrete path π(v_0, v) from v_0 to v. We complete the proof by estimating the cost of this path.
We denote w_min(v_i)d(v_i) + w_min(v_{i+1})d(v_{i+1}) by κ_i and use (8) and (7), obtaining

  ||π(v_0, v)|| = Σ_{i=0}^{l-1} ||π(v_i, v_{i+1})||
               ≤ (1 + ε/2) Σ_{i=0}^{l-1} ||π̃′(v_i, v_{i+1})||
               ≤ (1 + ε/2) Σ_{i=0}^{l-1} (||π̃(a_i, b_{i+1})|| + εκ_i/7)
               ≤ (1 + ε/2)||π̃(v_0, v)|| + (3ε/14) Σ_{i=0}^{l-1} κ_i.   (9)

It remains to estimate the sum Σ_{i=0}^{l-1} κ_i appearing above. From the definitions of d(·), (6), and (1) it follows that κ_i ≤ 2||π̃(a_i, b_{i+1})|| + ||v_i a_i|| + ||b_{i+1} v_{i+1}|| ≤ 2||π̃(a_i, b_{i+1})|| + κ_i/7. Thus κ_i ≤ (7/3)||π̃(a_i, b_{i+1})||, and substituting this into (9) we obtain the desired estimate ||π(v_0, v)|| ≤ (1 + ε)||π̃(v_0, v)||. ⊓⊔

4 Algorithms

In this section we discuss algorithms for solving the single source shortest paths (SSSP) problem in the approximation graphs G_ε. Straightforwardly, one can apply Dijkstra's algorithm; implemented using Fibonacci heaps, it solves the SSSP problem in O(|E_ε| + |V_ε| log |V_ε|) time. By Lemma 1, |V_ε| = O((n/√ε) log(1/ε)), and by the definition of the edges, |E_ε| = O((n/ε) log^2(1/ε)). Thus the SSSP problem can be solved by Dijkstra's algorithm in O((n/ε) log(n/ε) log(1/ε)) time; already this matches the best previously known bound [15]. In the remainder of this section we show how the geometric properties of our model can be used to obtain a more efficient algorithm for SSSP in the corresponding approximation graph. More precisely, we present an algorithm that runs in O(|V_ε| log |V_ε|) = O((n/√ε) log(n/ε) log(1/ε)) time.

First, we discuss the general structure of our algorithm. Let G(V, E) be a directed graph with positive costs (lengths) assigned to its edges, and let s be a fixed vertex of G. The SSSP problem is to find shortest paths from s to every other vertex of G. The standard greedy approach for solving the SSSP problem works as follows: a subset of vertices S to which the shortest path has already been found is maintained, together with the set of edges E(S) connecting S with S^a ⊂ V \ S, where S^a consists of the vertices not in S but adjacent to S. In each iteration an optimal edge e(S) = (u, v) in E(S) is selected, its target v is added to S, and E(S) is updated correspondingly. An edge e = e(S) is optimal for S if it minimizes the value δ(u) + c(e), where δ(u) is the distance from s to u and c(e) is the cost of e. The correctness of this approach follows from the fact that when e = (u, v) is optimal, the distance δ(v) equals δ(u) + c(e). Different strategies for maintaining information about E(S) and finding an optimal edge e(S) in each iteration result in different algorithms for computing SSSP. For example, Dijkstra's algorithm maintains only a subset Q(S) of E(S) which always contains an optimal edge: for each vertex v in S^a, Dijkstra's algorithm keeps in Q(S) one edge only, namely the one that ends the shortest path to v using vertices in S only. Alternatively, one may maintain a subset Q(S) of E(S) containing one edge per vertex u ∈ S. The target vertex of this edge is called the representative of u and is denoted by ρ(u); the vertex u itself is called the predecessor of its representative. The representative ρ(u) is defined to be the target of the minimum cost edge in the propagation set I(u) of u, where I(u) ⊂ E(S) consists of all edges (u, v) such that δ(u) + c(u, v) ≤ δ(u′) + c(u′, v) for any other vertex u′ ∈ S (ties are broken arbitrarily). The union of the propagation sets forms a subset Q(S) of E(S) that always contains an optimal edge. The propagation sets I(u), u ∈ S, form a partition of Q(S), which we call a propagation diagram and denote by I(S).
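For reference, the baseline mentioned at the start of this section, Dijkstra's algorithm run directly on G_ε, is easy to state. The sketch below (a generic implementation, not from the paper) requests edges through a neighbors callback, matching the earlier remark that Steiner points and their incident edges can be generated on-the-fly:

    import heapq
    from itertools import count

    # Generic Dijkstra SSSP; neighbors(u) yields (v, cost) pairs with cost > 0.
    def dijkstra_sssp(source, neighbors):
        tie = count()            # tiebreaker so the heap never compares nodes
        dist = {source: 0.0}
        settled = set()
        heap = [(0.0, next(tie), source)]
        while heap:
            d, _, u = heapq.heappop(heap)
            if u in settled:
                continue
            settled.add(u)
            for v, c in neighbors(u):
                nd = d + c
                if v not in settled and nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, next(tie), v))
        return dist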
A similar scheme was used in [15]. A possible implementation of this alternative strategy is to maintain the set of representatives R ⊂ S^a organized in a priority queue, where the key of a vertex ρ(u) in R is defined to be δ(u) + c(u, ρ(u)). Observe that the edge corresponding to the minimum in R is an optimal edge for S. In each iteration, the minimum key node v in R is selected and the following three steps are performed:

Step 1. The vertex v is moved from R into S. Then the propagation set I(v) is computed and the propagation diagram I(S) is updated accordingly.
Step 2. The representative ρ(v) of v and a new representative ρ(u) for the predecessor u of v are computed.
Step 3. The new representatives ρ(u) and ρ(v) are either inserted into R together with their corresponding keys, or (if they are already in R) their keys are updated and the decrease-key operation is executed in R if necessary.

Clearly, this leads to a correct algorithm for solving the SSSP problem in G. The total time for the priority queue operations, if R is implemented with Fibonacci heaps, is O(|V| log |V|). Therefore the efficiency of this strategy depends on the maintenance of the propagation diagrams, the complexity of the propagation sets, and efficient updates of the new representatives. Our approach is as follows. We partition the set of edges E(S) into groups, so that the propagation sets and the corresponding propagation diagrams, when restricted to a fixed group, become simple and allow efficient updates. Then, for each vertex u in S, we keep multiple representatives in R, one for each group in which edges incident to u participate. As a result, a vertex in S^a may eventually have multiple predecessors. As we describe below, the number of groups in which u can participate is O(1). We will be able to compute new representatives in O(1) time and update propagation diagrams in logarithmic time in our approximation graphs G_ε.

Next, we present some details and state the complexity of the resulting algorithm. The edges of the approximation graph G_ε were defined to join pairs of nodes (Steiner points) lying on neighboring bisectors, where two bisectors are neighbors if the angles they split share an edge of P. Since the polyhedral surface P is triangulated, a fixed bisector may have at most six neighbors. We partition the set of edges of G_ε into groups E(ℓ, ℓ_1) corresponding to pairs of neighboring bisectors ℓ and ℓ_1. For a node u on a bisector ℓ we maintain one representative ρ(u, ℓ_1) per bisector ℓ_1 neighboring ℓ. The representative ρ(u, ℓ_1) is defined to be the target of the minimum cost edge in the propagation set I(u; ℓ, ℓ_1), consisting of the edges (u, v) in E(ℓ, ℓ_1) such that δ(u) + c(u, v) ≤ δ(u′) + c(u′, v) for any node u′ ∈ ℓ ∩ S. A node on ℓ with a non-empty propagation set on ℓ_1 is called active for E(ℓ, ℓ_1).

Consider now an iteration of our greedy algorithm. Let v be the node produced by the extract-min operation in the priority queue R of representatives. Denote the set of predecessors of v by R^{-1}(v). Our task is to compute new representatives for v and for each of the predecessors u ∈ R^{-1}(v). Consider first the case where v is a vertex of the polyhedral surface P. We assume that the edges incident to a vertex v have been sorted with respect to their cost, so that when a new representative for v is required we simply report the target of the smallest cost edge joining v with S^a.
Thereby a new representative for a node that is a vertex of P can be computed in constant time. The total number of edges incident to vertices of P is O((n/√ε) log(1/ε)), and sorting them in a preprocessing step takes O((n/√ε) log^2(1/ε)) time. Consider now the case where v is a node on a bisector, say ℓ. Efficient computation of representatives in this case is based on the following two lemmas.

Lemma 3. The propagation set I(v; ℓ, ℓ_1) of an active node v is characterized by an interval (x_1, x_2) on ℓ_1; i.e., it consists of all edges in E(ℓ, ℓ_1) whose targets belong to (x_1, x_2). Furthermore, the function dist(v, x), measuring the cost of the shortest path from v to x restricted to lie in the union of the two triangles containing ℓ and ℓ_1, is convex on (x_1, x_2).

Lemma 4. Let v_1, ..., v_k be the active nodes for E(ℓ, ℓ_1). The propagation diagram I(ℓ, ℓ_1) = I(v_1, ..., v_k) is characterized by k intervals. Updating the diagram I(v_1, ..., v_k) to the propagation diagram I(v_1, ..., v_k, v), where v is a new active node on ℓ, takes O(log k) time.

Thus, to compute a new representative of v on a neighboring bisector ℓ_1, we update the propagation diagram I(ℓ, ℓ_1), consider the interval characterizing the propagation set I(v; ℓ, ℓ_1), and select the minimum cost edge whose target is in that interval and in S^a. Assume that the nodes on ℓ_1 currently in S^a are maintained in a doubly linked list together with their positions on ℓ_1. Using the convexity of the function dist(v, x), this selection can be done in time logarithmic in the number of these nodes, which is O(log(1/ε)). There are at most six new representatives of v, corresponding to the bisectors around ℓ, to be computed; thus the total time for the updates related to v is O(log(1/ε)). The update of the representative for a node u ∈ R^{-1}(v) on ℓ takes constant time, since no change in the propagation set I(u; ·, ℓ) occurred and the new representative of u is a neighbor of the current one in the list of nodes of S^a on ℓ. The set of predecessors R^{-1}(v) contains at most six nodes, so their representatives are updated in constant time. Hence computing representatives in an iteration takes O(log(1/ε)) time, and O(|V_ε| log(1/ε)) in total. The following theorem summarizes the result of this section.

Theorem 2. The SSSP problem in the approximation graph G_ε for a polyhedral surface P can be solved in O((n/√ε) log(n/ε) log(1/ε)) time.

In the following theorem we summarize the main result of this paper. Starting from a vertex v_0, our algorithm solves the SSSP problem in the graph G_ε and constructs a shortest paths tree rooted at v_0. By Theorem 1, the output distances from v_0 to the other vertices of P are within a factor of 1 + ε of the costs of the shortest paths. Using the definition of the edges of G_ε, an approximate shortest path between a pair of vertices can be output in time proportional to the number of segments on this path. The approximate shortest paths tree rooted at v_0 and containing all Steiner points and vertices of P can be output in O(|V_ε|) time. Thus we have established the following theorem.

Theorem 3. Let P be a weighted polyhedral surface with n triangular faces, and let ε ∈ (0, 1). Shortest paths from a vertex v_0 to all other vertices of P can be approximated within a factor of (1 + ε) in O((n/√ε) log(n/ε) log(1/ε)) time.

Extensions: We briefly comment on how our approach can be applied to approximate shortest paths in weighted polyhedral domains and formulate the corresponding result.
In 3-dimensional space most shortest path problems are difficult. Given a set of pairwise disjoint polyhedra in 3D and two points s and t, the Euclidean 3D shortest path problem is to compute a shortest path between s and t that avoids the interiors of the polyhedra, viewed as obstacles. Canny and Reif have shown that this problem is NP-hard [6] (even for the case of axis-parallel triangles in 3D). Papadimitriou [13] gave the first fully polynomial (1 + ǫ)-approximation algorithm for the 3D problem. There are numerous other results on this problem, but due to space constraints we omit their discussion and refer the reader to the most recent work [5] for a literature review.

Let P be a tetrahedralized polyhedral domain in 3-dimensional Euclidean space consisting of n tetrahedra. Assume that positive weights are assigned to the tetrahedra of P and that the cost of traveling inside a tetrahedron t equals the Euclidean distance traveled multiplied by the weight of t. Using the approach of this paper, we are able to approximate shortest paths in P within a (1 + ε) factor as follows. Discretization in this case is done by inserting Steiner points on the bisectors of the dihedral angles of the tetrahedra of P; the total number of Steiner points is O((n/ε^2) log(1/ε)). The construction of the Steiner points and the proof of the approximation properties of the resulting graph G_ε involve a more elaborate analysis because of the presence of edge vicinities (small spindle-like regions around edges) in addition to vertex vicinities. Nevertheless, an analogue of Theorem 1 holds, and SSSP in the graph G_ε can be computed by a greedy approach like that of Section 4.

References

1. K.R. Varadarajan, P.K. Agarwal. Approximating Shortest Paths on a Nonconvex Polyhedron. SIAM Journal on Computing, 30(4):1321-1340, 2000.
2. P.K. Agarwal, S. Har-Peled, M. Karia. Computing Approximate Shortest Paths on Convex Polytopes. Algorithmica, 33:227-242, 2002.
3. P.K. Agarwal et al. Approximating Shortest Paths on a Convex Polytope in Three Dimensions. Journal of the ACM, 44:567-584, 1997.
4. L. Aleksandrov, M. Lanthier, A. Maheshwari, J.-R. Sack. An ε-Approximation Algorithm for Weighted Shortest Paths. In Proc. SWAT, LNCS 1432, pages 11-22, 1998.
5. L. Aleksandrov, A. Maheshwari, J.-R. Sack. Approximation Algorithms for Geometric Shortest Path Problems. In Proc. 32nd STOC, pages 286-295, 2000.
6. J. Canny, J.H. Reif. New Lower Bound Techniques for Robot Motion Planning Problems. In Proc. 28th FOCS, pages 49-60, 1987.
7. J. Chen, Y. Han. Shortest Paths on a Polyhedron. In Proc. 6th ACM Symposium on Computational Geometry, pages 360-369, 1990. Also in International Journal of Computational Geometry & Applications, 6:127-144, 1996.
8. J. Hershberger, S. Suri. Practical Methods for Approximating Shortest Paths on a Convex Polytope in R^3. In Proc. 6th SODA, pages 447-456, 1995.
9. S. Kapoor. Efficient Computation of Geodesic Shortest Paths. In Proc. 31st STOC, 1999.
10. M. Lanthier, A. Maheshwari, J.-R. Sack. Approximating Weighted Shortest Paths on Polyhedral Surfaces. Algorithmica, 30(4):527-562, 2001.
11. J.S.B. Mitchell, D.M. Mount, C.H. Papadimitriou. The Discrete Geodesic Problem. SIAM Journal on Computing, 16:647-668, 1987.
12. J.S.B. Mitchell, C.H. Papadimitriou. The Weighted Region Problem: Finding Shortest Paths Through a Weighted Planar Subdivision. Journal of the ACM, 38:18-73, 1991.
13. C.H. Papadimitriou. An Algorithm for Shortest Path Motion in Three Dimensions. Information Processing Letters, 20:259-263, 1985.
14. M. Sharir, A. Schorr. On Shortest Paths in Polyhedral Spaces. SIAM Journal on Computing, 15:193-215, 1986.
15. Z. Sun, J. Reif. BUSHWHACK: An Approximation Algorithm for Minimal Paths Through Pseudo-Euclidean Spaces. In Proc. 12th ISAAC, LNCS 2223, pages 160-171, 2001.
16. M. Ziegelmann. Constrained Shortest Paths and Related Problems. Ph.D. thesis, Universität des Saarlandes (Max-Planck-Institut für Informatik), 2001.

Adaptive and Compact Discretization for Weighted Region Optimal Path Finding

Zheng Sun and John H. Reif

Department of Computer Science, Duke University, Durham, NC 27708, USA
{sunz,reif}@cs.duke.edu

Abstract. This paper presents several results on the weighted region optimal path problem. An often-used approach to solving this problem approximately is to apply a discrete search algorithm to a graph Gǫ generated by a discretization of the problem; this graph is guaranteed to contain an ǫ-approximation of an optimal path between given source and destination points. We first provide a discretization scheme such that the size of Gǫ does not depend on the ratio between the maximum and minimum unit weights. This leads to the first ǫ-approximation algorithm whose complexity does not depend on the unit weight ratio. We also introduce an empirical method, called the adaptive discretization method, that improves the performance of approximation algorithms by placing discretization points densely only in areas that may contain optimal paths. BUSHWHACK is a discrete search algorithm used for finding optimal paths in Gǫ. We add two heuristics to BUSHWHACK to improve its performance and scalability.

1 Introduction

In the past two decades geometric optimal path problems have been studied extensively (see [1] for a review). These problems have a wide range of applications in robotics and geographical information systems. In this paper we study the path planning problem for a point robot in a 2D space consisting of n triangular regions, each of which is associated with a distinct unit weight. Such a space can be used to model an area consisting of different geographical features, such as deserts, forests, grasslands, and lakes, in which the traveling costs for the robot differ. The goal is to find, between given source and destination points s and t, an optimal path (a path with minimum weighted length).

Unlike the unweighted 2D optimal path problem, which can be solved in O(n log n) time, this problem is believed to be very difficult. Much of the effort has been focused on ǫ-approximation algorithms that are guaranteed to find ǫ-good approximate optimal paths (see [2,3,4,5,6]). For any two points s and t in the space, we say that a path p connecting s and t is an ǫ-good approximate optimal path if ||p|| < (1 + ǫ)||p_opt(s, t)||, where p_opt(s, t) represents an optimal path from s to t and ||·|| represents the weighted length, or cost, of a path. Equivalently, we say that p is an ǫ-approximation of p_opt(s, t).

Before we give a review of previous work, we first define some notation. We let V be the set of vertices of all regions, and let E be the set of all boundary edges. We use w_r to denote the unit weight of any region r. For a boundary edge e separating two regions r_1 and r_2, the unit weight w_e of e is defined to be min{w_{r_1}, w_{r_2}}.
We define the unit weight ratio µ to be w_max/w_min, where w_max (respectively, w_min) is the maximum (respectively, minimum) unit weight among all regions. We use |p| to denote the Euclidean length of a path p, and p_1 + p_2 to denote the concatenation of two paths p_1 and p_2.

The first ǫ-approximation algorithm for this problem was given by Mitchell and Papadimitriou [2]. Their algorithm uses Snell's law and the "continuous Dijkstra" method to compute an optimal-path map for a given source point s. The time complexity of their algorithm is O(n^8 log(nµ/ǫ)); in practice, however, the running time is expected to be much lower. Later, Mata and Mitchell [3] presented another ǫ-approximation algorithm based on constructing a "pathnet graph" of size O(nk), where ǫ = O(µ/k). The time complexity, in terms of ǫ and n, is O(n^3 µ/ǫ).

Some of the existing algorithms construct from the original continuous space a weighted graph Gǫ(V′, E′) by placing discretization points, called Steiner points, on boundary edges. The node set V′ of Gǫ contains all Steiner points as well as the vertices of the regions. The edge set E′ of Gǫ contains every edge v_1 v_2 such that v_1 and v_2 are on the border of the same region. The weight of edge v_1 v_2 is determined by the weighted length of segment v_1 v_2 in the original weighted space. Gǫ is guaranteed to contain an ǫ-good approximate optimal path between s and t, and therefore the task of finding an ǫ-good approximate optimal path reduces to computing a shortest path in Gǫ, which we call an optimal discrete path, using a discrete search algorithm such as Dijkstra's algorithm or BUSHWHACK [5,7]. In the remainder of this paper we mainly discuss techniques for approximation algorithms using this approach. Since an optimal discrete path from s to t in Gǫ is used as an ǫ-approximation of the real optimal path, the phrases "optimal discrete path" and "ǫ-good approximate optimal path" are used interchangeably, and both are denoted by p′_opt(s, t).

Aleksandrov et al. [4,6] proposed two discretization schemes that place O((1/ǫ) log(1/ǫ) log µ) Steiner points on each boundary edge to construct Gǫ for a given ǫ. Combining the discretization scheme of [6] with a "pruned" Dijkstra's algorithm, they provided an ǫ-approximation algorithm that runs in roughly O((n/ǫ)(1/√ǫ + log n) log(1/ǫ) log µ) time. It is important to note, however, that the discretization size (and therefore the time complexity) of these approximation algorithms also depends on various geometric parameters, such as the smallest angle between two adjacent boundary edges and the maximum integer coordinate of the vertices; these parameters are omitted here since they are irrelevant to our discussion.

In this paper we present the following results on finding ǫ-good approximate optimal paths in weighted regions:

Compact Discretization Scheme. The complexity of each of the approximation algorithms mentioned above depends more or less on µ, either linearly ([3]) or logarithmically ([2,4,6]). This dependency is caused by the corresponding discretization scheme used. In particular, the discretization scheme of Aleksandrov et al. [6] places O((1/ǫ) log(1/ǫ) log µ) Steiner points on each boundary edge (here again we omit the other geometric parameters). The main obstacle to removing the dependency on µ from the size of Gǫ is that otherwise it is difficult to prove that for each optimal path p_opt there exists in Gǫ a discrete path that is an ǫ-approximation of p_opt.
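For concreteness, a simplified sketch of the Gǫ construction just described follows: every pair of discretization points on the border of a common region r is joined by an edge of cost w_r · |v_1 v_2|. (The rule that a segment lying along a boundary edge is charged the cheaper incident region weight is approximated here by keeping the minimum cost over the regions sharing a point pair.)

    import math
    from itertools import combinations

    # Sketch: assemble the edge costs of G_eps from per-region point lists.
    # regions: iterable of (weight, points), points being (x, y) tuples on
    # the region's border (vertices and Steiner points).
    def build_graph_edges(regions):
        edges = {}
        for w, pts in regions:
            for p, q in combinations(pts, 2):
                key = (min(p, q), max(p, q))
                cost = w * math.dist(p, q)
                if cost < edges.get(key, float("inf")):
                    edges[key] = cost
        return edges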
One traditional proof technique for establishing the existence of such a discrete path is to decompose p_opt into k subpaths p_1, p_2, ..., p_k and then construct a discrete path p′ = p′_1 + p′_2 + ... + p′_k such that ||p′_i|| ≤ (1 + ǫ)||p_i|| for each i. Ideally, we could choose p′_i such that p_i and p′_i lie in the same region, so that the discretization would only need to ensure that |p′_i| ≤ (1 + ǫ)|p_i|. However, due to the discrete nature of Gǫ, it is not always possible to find such a p′_i for each p_i. For example, as shown in Figure 1.a, p_opt may cross a series of boundary edges near a vertex v, crossing each boundary edge e between v and the Steiner point on e closest to v. In that case, p′_i may travel in regions different from those in which p_i lies, and therefore, to bound ||p′_i|| with respect to ||p_i||, the discretization scheme has to take the variance of the unit weights into consideration.

By modifying the above proof technique, we provide in Section 2 an improvement on the discretization scheme of Aleksandrov et al. [6]. The number of Steiner points inserted by the new discretization scheme is O((1/ǫ) log(1/ǫ)), with the dependency on the other geometric parameters unchanged. Combining BUSHWHACK with this discretization scheme, we obtain the first ǫ-approximation algorithm whose time complexity does not depend on µ.

Fig. 1. (a) A "bad" optimal path. (b) Searching for the cheapest flight.

Adaptive Discretization Method. Traditional approximation algorithms construct from the original space a graph Gǫ and compute, with a discrete search algorithm, an optimal discrete path in Gǫ in a one-step manner. We call this the fixed discretization method. For the single query problem, this method is rather inefficient: although the goal is to find an ǫ-good approximate optimal path p′_opt(s, t) from s to t, it actually computes an ǫ-good approximate optimal path from s to every point v in Gǫ whose path cost is less than that of p′_opt(s, t). Much of this effort is unnecessary, as most of these points do not help to find an ǫ-good approximate optimal path from s to t.

We use flight ticket booking as an example. When trying to find the cheapest flight from Durham to Malmö with one stop (supposing no direct flight is available), a travel agent does not need to consider Mexico City as a candidate for the connecting airport if she knows the following: a) there is always a route from Durham to Malmö with one stop that costs less than $980; b) any direct flight from Durham to Mexico City costs no less than $300; and c) any direct flight from Mexico City to Malmö costs no less than $750. Therefore, she does not need to find out the exact prices of the direct flights from Durham to Mexico City and from Mexico City to Malmö, saving two queries to the ticketing database. Analogously, we do not need to compute p′_opt(s, v) and p′_opt(v, t) for a point v ∈ Gǫ if we know in advance that v does not lie on any optimal discrete path between s and t. However, while the travel agent can rely on knowledge previously gained, approximation algorithms using the fixed discretization method have no prior knowledge to draw upon. In Section 3 we discuss a multiple-stage discretization method that we call the adaptive discretization method.
It starts with a coarse discretization G′ = Gǫ_1 for some ǫ_1 > ǫ and adaptively refines G′ until it is guaranteed to contain an ǫ-good approximate optimal path from s to t. Approximate optimal path information acquired in each stage is used to identify areas through which no optimal path from s to t can pass, and in which therefore no further Steiner points need to be inserted in the next stage.

Heuristics for BUSHWHACK. The BUSHWHACK algorithm is an alternative algorithm for computing optimal discrete paths in Gǫ. It uses a number of complex data structures to keep track of all potential optimal paths. When m, the number of Steiner points placed on each boundary edge, is small, the efficiency gained by accessing only a subgraph of Gǫ is outweighed by the cost of establishing and maintaining these data structures. Another weakness of BUSHWHACK is that its performance improvement diminishes when the number of regions in the space is large. These weaknesses affect the practicality of BUSHWHACK, since in most cases the desired approximation quality does not require many Steiner points per boundary edge, while the given 2D space may contain an arbitrary number of regions. In Section 4 we introduce two cost-saving heuristics for the original BUSHWHACK algorithm that overcome the weaknesses mentioned above.

2 Compact Discretization Scheme

In this section we provide an improvement on the discretization scheme of Aleksandrov et al. [6] by removing the dependency of the size of Gǫ on the unit weight ratio µ. For any point v, we let E(v) be the set of boundary edges incident to v, and let d(v) be the minimum distance between v and the boundary edges in E \ E(v). For each edge e ∈ E, we let d(e) = sup{d(v) | v ∈ e} and let v_e be the point on e such that d(v_e) = d(e). For each vertex v of a region, the radius r′(v) of v is defined to be d(v)/5, and the weighted radius r(v) of v is defined to be (w_min(v)/w_max(v)) · r′(v), where w_min(v) and w_max(v) are the minimum and maximum unit weights among all regions incident to v, respectively.

According to the discretization scheme of Aleksandrov et al. [6], for each boundary edge e = v_1 v_2, the Steiner points on e are chosen as follows. Each vertex v_i has a "vertex vicinity" S(v_i) of radius r_ǫ(v_i) = ǫ r(v_i), and the Steiner points v_{i,1}, v_{i,2}, ..., v_{i,k_i} are placed on the segment of e outside the vertex vicinities so that |v_i v_{i,1}| = r_ǫ(v_i), |v_{i,j} v_{i,j+1}| = ǫ d(v_{i,j}), and |v_{i,k_i} v_i| + ǫ d(v_{i,k_i}) ≥ |v_i v_e|. The number of Steiner points placed on e can be bounded by C(e) · (1/ǫ) log(1/ǫ), where C(e) = O(|e|/d(e) · log(|e|/√(r(v_1) r(v_2)))) = O(|e|/d(e) · (log(|e|/√(r′(v_1) r′(v_2))) + log µ)). This discretization can guarantee a 3ǫ-good approximate optimal path.

Observe that, in this discretization scheme, Steiner points on each boundary edge e are placed more densely in the portions of e closer to the two endpoints, with the exception that no Steiner point is placed inside the vertex vicinities. Therefore, the larger the vertex vicinities are, the fewer Steiner points the discretization needs to use. In the following we show that the radius r_ǫ(v) of the vertex vicinity of v can be increased to ǫ r′(v) while still guaranteeing the same error bound. Here we assume that ǫ ≤ 1/2.
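A minimal sketch of this placement rule for one endpoint v_i of e follows; the distance function d(·) is assumed to be available as a callback, and positions are measured along e from v_i:

    # Sketch of the Steiner placement of [6] on v_i's side of edge e.
    # dist_to_ve: |v_i v_e|; r_eps: vertex-vicinity radius r_eps(v_i);
    # d_of(x): d(.) evaluated at the point of e at distance x from v_i.
    def steiner_on_edge(dist_to_ve, r_eps, eps, d_of):
        pts = []
        x = r_eps                       # |v_i v_{i,1}| = r_eps(v_i)
        while x < dist_to_ve:
            pts.append(x)
            step = eps * d_of(x)        # |v_{i,j} v_{i,j+1}| = eps * d(v_{i,j})
            if x + step >= dist_to_ve:  # stopping rule of the scheme
                break
            x += step
        return pts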
A piecewise linear path p is said to be a normalized path if it does not cross region boundaries inside vertex vicinities other than at the vertices. That is, for each bending point u of p, if u is located on a boundary edge e = v_1 v_2, then either u is one of the endpoints of e, or |v_i u| ≥ r_ǫ(v_i) for i = 1, 2. For example, the path shown in Figure 2 is not a normalized path, as it passes through u_1 and u_2, both of which are inside the vertex vicinity of v. We first state the following lemma:

Lemma 1. For any path p from s to t, there is a normalized path p̂ from s to t such that ||p̂|| ≤ (1 + ǫ/2) · ||p||.

Proof. In the following, for a path p and two points u_1, u_2 ∈ p, we use p[u_1, u_2] to denote the subpath of p between u_1 and u_2. Suppose path p passes through the vertex vicinity S(v) of v, as shown in Figure 2. We use u_1 (u_2, respectively) to denote the first (last, respectively) bending point of p inside S(v), and u′′_1 (u′′_2, respectively) to denote the first (last, respectively) bending point of p on the border of the union of all regions incident to v. By the definition of d(v) we have |p[u′′_1, u_1]| + |u_1 v| ≥ d(v) and |p[u_2, u′′_2]| + |v u_2| ≥ d(v). Therefore, as |u_1 v| ≤ ǫ d(v)/5,

  |u_1 v| / |p[u′′_1, u_1]| ≤ (ǫ d(v)/5) / (d(v) − ǫ d(v)/5) = ǫ/(5 − ǫ) ≤ ǫ/4.

Similarly, we can prove that |v u_2| / |p[u_2, u′′_2]| ≤ ǫ/4. We let r_1 be the region with the minimum unit weight among all regions crossed by the subpath p[u′′_1, u_1], and u′_1 the point where p[u′′_1, u_1] enters region r_1 for the first time. Similarly, we let r_2 be the region with the minimum unit weight among all regions crossed by the subpath p[u_2, u′′_2], and u′_2 the point where p[u_2, u′′_2] leaves region r_2 for the last time.
Fig. 2. A path passing through the vicinity of a vertex.

Consider replacing the subpath p[u′′_1, u′′_2] by the normalized subpath p̂[u′′_1, u′′_2] = p[u′′_1, u′_1] + u′_1 v + v u′_2 + p[u′_2, u′′_2]. We have the following inequality:

  ||p̂[u′′_1, u′′_2]|| − ||p[u′′_1, u′′_2]||
    = w_{r_1}·|u′_1 v| + w_{r_2}·|v u′_2| − ||p[u′_1, u_1]|| − ||p[u_1, u_2]|| − ||p[u_2, u′_2]||
    ≤ (w_{r_1}·|u′_1 v| − ||p[u′_1, u_1]||) + (w_{r_2}·|v u′_2| − ||p[u_2, u′_2]||)
    ≤ w_{r_1}(|u′_1 v| − |p[u′_1, u_1]|) + w_{r_2}(|v u′_2| − |p[u_2, u′_2]|)
    ≤ w_{r_1}·|u_1 v| + w_{r_2}·|v u_2|
    ≤ w_{r_1}·(ǫ/4)|p[u′′_1, u_1]| + w_{r_2}·(ǫ/4)|p[u_2, u′′_2]|
    ≤ (ǫ/4)·(||p[u′′_1, u_1]|| + ||p[u_2, u′′_2]||) ≤ (ǫ/4)·||p[u′′_1, u′′_2]||.

Therefore, ||p̂[u′′_1, u′′_2]|| ≤ (1 + ǫ/4)||p[u′′_1, u′′_2]||. Suppose p passes through k vertex vicinities S(v_1), S(v_2), ..., S(v_k). For each v_i, we replace the subpath p_i of p that passes through S(v_i) by a normalized subpath p̂_i as described above. Let p̂ be the resulting normalized path. Since the sum of the weighted lengths of p_1, p_2, ..., p_k is less than twice the weighted length of p, we have ||p̂|| ≤ ||p|| + (ǫ/4) Σ_{i=1}^{k} ||p_i|| ≤ (1 + ǫ/2)||p||. ⊓⊔

We call a segment of a boundary edge bounded by two adjacent Steiner points a Steiner segment. Each segment u_1 u_2 of a normalized path p̂ is significantly long compared to the Steiner segment on which u_1 or u_2 lies; therefore it is easy to find a discrete path in Gǫ that is an ǫ-approximation of p̂. With Lemma 1 we can prove the claimed error bound for the modified discretization:

Theorem 1. For any two vertices s and t, the discretization constructed with r_ǫ(v) = ǫ r′(v) contains a 3ǫ-good approximation of an optimal path p_opt from s to t.

Proof. We first construct a normalized path p̂ such that ||p̂|| ≤ (1 + ǫ/2)||p_opt||. Then we can use a proof similar to the one provided in [6] to show that, for any normalized path p̂, there is a discrete path p′ such that ||p′|| ≤ (1 + 2ǫ)||p̂||. Therefore,

  ||p′|| ≤ (1 + 2ǫ)(1 + ǫ/2)||p_opt|| = (1 + (5/2)ǫ + ǫ^2)||p_opt|| ≤ (1 + 3ǫ)||p_opt||,

assuming ǫ ≤ 1/2. ⊓⊔

With this modification of the radius of each vertex vicinity, the number of Steiner points placed on each boundary edge e is reduced to C′(e)·(1/ǫ) log(1/ǫ), where C′(e) = O(|e|/d(e) · log(|e|/√(r′(v_1) r′(v_2)))). Note that C′(e) is independent of µ. The significance of this compact discretization scheme is that, by combining it with either Dijkstra's algorithm or BUSHWHACK, we get an approximation algorithm whose time complexity does not depend on µ. To the best of our knowledge, all previous ǫ-approximation algorithms have time complexities dependent on µ.

3 Adaptive Discretization Method

Even with the compact discretization scheme, the size of Gǫ can still be very large for a modest ǫ, since the number of Steiner points placed on each boundary edge is also determined by a number of geometric parameters. Therefore, computing an ǫ-good approximate optimal path by directly applying a discrete search algorithm to Gǫ may be very costly. In particular, a discrete search algorithm such as Dijkstra's algorithm will compute an optimal discrete path from s to every point v ∈ Gǫ that is closer to s than t is, meaning that it has to search through a large space with the same (small) error tolerance ǫ.

Here we elaborate further on the flight ticket booking example. With the knowledge accumulated through past experience, the travel agent may know, for any intermediate airport A, a lower bound L_{D,A} on the cost of a direct flight from Durham to A as well as a lower bound L_{A,M} on the cost of a direct flight from A to Malmö. Further, she also knows an upper bound, say $980, on the cost of the cheapest flight (with one stop) from Durham to Malmö. In that case, the travel agent only needs to consider airport A as a possible stop between Durham and Malmö if L_{D,A} + L_{A,M} < 980. For example, it is at least worth the effort to query the database for the exact cost of the flight from Durham to Malmö via New York, as shown in Figure 1.b.

The A* algorithm partially addresses this issue, as it first explores points that are estimated, using a heuristic function, to be closer to the destination point t. However, if the unit weights of the regions vary significantly, it is difficult for a heuristic function to provide a close estimate of the weighted distance between any point and t. As a result, the A* algorithm may still have to search through many points in Gǫ unnecessarily.

Here we introduce a multi-stage approximation algorithm that uses an adaptive discretization method. For each i, 1 ≤ i ≤ d, the method computes an ǫ_i-good approximate optimal path from s to t in a subgraph G′_{ǫ_i} of G_{ǫ_i}, where ǫ_1 > ǫ_2 > ... > ǫ_{d-1} > ǫ_d = ǫ. In each stage, using the approximate optimal path information acquired in the previous stage, the algorithm can identify for each boundary edge the portion of the edge where more Steiner points need to be placed to guarantee an approximate optimal path with a reduced error bound; in the remaining portion of the boundary edge, no further Steiner points need to be placed. We say that a path p′ neighbors an optimal path p_opt if, for any Steiner segment that p_opt crosses, p′ passes through one of the two Steiner points that bound the Steiner segment.
Our method requires that the discretization scheme satisfy the following property (which is the case for the discretization schemes of [4,6] and the one described in Section 2):

Property 1. For any two vertices v₁ and v₂ in the original (continuous) space and any optimal path p_opt from v₁ to v₂, there is a discrete path from v₁ to v₂ in the discretization with a cost no more than $(1 + \epsilon) \cdot \|p_{opt}(v_1, v_2)\|$ that neighbors p_opt.

For any two points v₁, v₂ ∈ G′_{εᵢ}, we denote the optimal discrete path found from v₁ to v₂ in the i-th stage by p′_{εᵢ}(v₁, v₂). We say that a point v ∈ G′_{εᵢ} is a searched point if an optimal discrete path p′_{εᵢ}(s, v) from s to v in G′_{εᵢ} has been determined. For each searched point v, we also compute an optimal discrete path p′_{εᵢ}(v, t) from v to t. We say that a point v is a useful point if either $\|p'_{\epsilon_i}(s, v)\| + \|p'_{\epsilon_i}(v, t)\| \le (1 + \epsilon_i) \cdot \|p'_{\epsilon_i}(s, t)\|$ or v is a vertex; we say that a Steiner segment is a useful segment if at least one of its endpoints is useful. An optimal path p_opt will not pass through a useless segment, and therefore in the next stage the algorithm can avoid putting more Steiner points in such a segment.

1. i ← 1
2. construct a discretization G′_{ε₁} = G_{ε₁}
3. repeat
4.   compute p′_{εᵢ}(s, t) in G′_{εᵢ}
5.   if i = d then return p′_{εᵢ}(s, t)
6.   continue to compute p′_{εᵢ}(s, v) for each point v in G′_{εᵢ} until ‖p′_{εᵢ}(s, v)‖ grows beyond (1 + εᵢ) · ‖p′_{εᵢ}(s, t)‖
7.   apply Dijkstra's algorithm in reverse to compute p′_{εᵢ}(v, t) for every searched point v
8.   G′_{ε_{i+1}} ← ∅
9.   for each useful point v ∈ G′_{εᵢ}
10.    add v to G′_{ε_{i+1}}
11.  for each point v ∈ G_{ε_{i+1}}
12.    if v is located inside a useful Steiner segment of G′_{εᵢ} then
13.      add v to G′_{ε_{i+1}}
14.  i ← i + 1

Algorithm 1: Adaptive

Each stage consists of a forward search and a backward search. These two searches can be performed simultaneously using Dijkstra's two-tree algorithm [8]. To prove the correctness of our multi-stage approximation algorithm, it suffices to show the following theorem:

Theorem 2. For any optimal path p_opt(s, t), each G′_{εᵢ} contains a discrete path p′(s, t) with a cost no more than $(1 + \epsilon_i) \cdot \|p_{opt}(s, t)\|$ that neighbors p_opt(s, t).

Proof. We prove the theorem by induction.

Base Step: When i = 1, G′_{ε₁} = G_{ε₁}, and therefore the proposition is true, according to Property 1.

Inductive Step: We assume that, for any optimal path p_opt(s, t), G′_{εᵢ} contains a discrete path p′(s, t) neighboring p_opt(s, t) such that $\|p'(s, t)\| \le (1 + \epsilon_i) \cdot \|p_{opt}(s, t)\|$.

We first show that p_opt(s, t) does not pass through any useless Steiner segment u₁u₂ in G′_{εᵢ}. Suppose otherwise that p_opt(s, t) passes through a point between u₁ and u₂. By the induction hypothesis, we can construct a discrete path p′(s, t) from s to t with a cost no more than $(1 + \epsilon_i) \cdot \|p_{opt}(s, t)\|$ that neighbors p_opt(s, t). This implies that p′(s, t) passes through either u₁ or u₂. W.l.o.g. we assume that p′(s, t) passes through u₁. Because $\|p_{opt}(s, t)\| \le \|p'_{\epsilon_i}(s, t)\|$, the cost of p′(s, t) is no more than $(1 + \epsilon_i) \cdot \|p'_{\epsilon_i}(s, t)\|$. This contradicts the fact that $\|p'_{\epsilon_i}(s, u_1)\| + \|p'_{\epsilon_i}(u_1, t)\| > (1 + \epsilon_i) \cdot \|p'_{\epsilon_i}(s, t)\|$, as p′(s, t) cannot be better than the concatenation of p′_{εᵢ}(s, u₁) and p′_{εᵢ}(u₁, t).

Since no optimal path from s to t passes through a useless Steiner segment, G′_{ε_{i+1}}, which includes all the Steiner points of G_{ε_{i+1}} except those inside useless Steiner segments, contains every discrete path in G_{ε_{i+1}} that neighbors one of the optimal paths from s to t. This finishes the proof. ⊓⊔
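To make the stage structure of Algorithm 1 concrete, here is a minimal sketch of one stage over an abstract graph. It is our own illustration, not the implementation used in our experiments: it uses a plain Dijkstra for both the forward and the backward search (the two-tree variant [8] would interleave them), runs both searches to completion rather than stopping at the (1 + εᵢ) bound of step 6, and leaves the geometric part (placing new Steiner points only inside useful segments) abstract.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest path costs over a dict-of-dicts graph."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def adaptive_stage(graph, s, t, eps_i):
    """One stage of the adaptive loop: forward search from s, backward
    search from t (the graph is assumed undirected), then the usefulness
    test dist(s,v) + dist(v,t) <= (1 + eps_i) * dist(s,t)."""
    fwd = dijkstra(graph, s)
    bwd = dijkstra(graph, t)
    bound = (1.0 + eps_i) * fwd[t]
    useful = {v for v in graph
              if fwd.get(v, float("inf")) + bwd.get(v, float("inf")) <= bound}
    return fwd[t], useful

# Toy graph: vertex 'c' lies on a detour and is pruned for small eps_i.
G = {"s": {"a": 1, "c": 3}, "a": {"s": 1, "t": 1},
     "c": {"s": 3, "t": 3}, "t": {"a": 1, "c": 3}}
print(adaptive_stage(G, "s", "t", 0.5))  # (2.0, {'s', 'a', 't'})
```

In the next stage, only the Steiner segments touching the returned useful points would receive additional Steiner points.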
The adaptive discretization method has both pros and cons when compared with the fixed discretization method. It has to run a discrete search algorithm on d different graphs, and each run involves both a forward and a backward search. However, in the earlier stages it explores approximate optimal paths with a high error tolerance, while in the later stages, as it gradually reduces the error tolerance, it only searches for approximate optimal paths in a small subspace (that is, the useful segments of the boundary edges) instead of the entire original space (all boundary edges). Our experimental results show that, when the desired error tolerance ε is small, the adaptive discretization method performs more efficiently than the fixed discretization method. This discretization method can also be applied to other geometric optimal path problems, such as the time-optimum movement planning problem in regions with flows [9], the anisotropic optimal path problem [10,11], and the 3D Euclidean shortest path problem [12,13].

4 Heuristics for BUSHWHACK

The BUSHWHACK algorithm was originally designed for the weighted region optimal path problem [5] and was later generalized to a class of piecewise pseudo-Euclidean optimal path problems [7]. BUSHWHACK, just like Dijkstra's algorithm, is used to compute optimal discrete paths in a graph G_ε generated by a discretization scheme. Unlike Dijkstra's algorithm, which applies to any arbitrary weighted graph, BUSHWHACK is adept at finding optimal discrete paths in graphs derived from geometric spaces with certain properties, one of which is the following:

Property 2. Two optimal discrete paths that originate from the same source point cannot intersect in the interior of any region.

[Fig. 3. Intersecting edges associated with two interval lists: (a) edges associated with ILIST_{e,e′}; (b) edges associated with ILIST_{e″,e′}; (c) edges associated with either interval list.]

One implication of Property 2 is that, if two edges v₁v₂ and u₁u₂ of G_ε intersect inside region r, they cannot both be useful.
An edge is said to be useful if it contributes to optimal discrete paths originating from s. To exploit this property, BUSHWHACK maintains a list ILIST_{e,e′} of intervals for each pair of boundary edges e and e′ that lie on the border of the same region r. A point v is said to be discovered if an optimal discrete path p′_opt(s, v) has been determined. ILIST_{e,e′} contains, for each discovered point v ∈ e, an interval I_{v,e,e′} defined as follows:

$$I_{v,e,e'} = \{v^* \in e' \mid w_r \cdot |vv^*| + \|p'_{opt}(s, v)\| \le w_r \cdot |v'v^*| + \|p'_{opt}(s, v')\| \;\; \forall\, v' \in \mathrm{PLIST}_e\}.$$

Here PLIST_e is the list of all discovered points on e. We say that edge vv* is associated with interval list ILIST_{e,e′} if v ∈ e and v* ∈ I_{v,e,e′}. It is easy to see that any edge vv* that crosses region r is useful only if it is associated with an interval list inside r. If m is the number of Steiner points placed on each boundary edge, the total number of edges associated with interval lists inside a region r is Θ(m). Dijkstra's algorithm, on the other hand, has to consider all Θ(m²) edges inside r. By avoiding access to most of the useless edges, BUSHWHACK takes only O(nm log nm) time to compute an optimal discrete path from s to t, as compared to O(nm² + nm log nm) time for Dijkstra's algorithm.

In this section we introduce BUSHWHACK⁺, a variation of BUSHWHACK. On top of the original BUSHWHACK algorithm, BUSHWHACK⁺ uses several cost-saving heuristics.

The necessity of the first heuristic is rather obvious. Let r be a triangular region with boundary edges e, e′ and e″. There are six interval lists for each triangular region r, one for each ordered pair of boundary edges of r. Although the edges associated with the same interval list do not intersect each other, two edges associated with different interval lists may still intersect inside r. Therefore, BUSHWHACK may still use some intersecting edges to construct candidate optimal paths. Figures 3.a and 3.b show the edges associated with ILIST_{e,e′} and ILIST_{e″,e′}, respectively. Figure 3.c shows that these two sets of edges intersect each other, meaning that some of them must be useless. To address this issue, BUSHWHACK⁺ merges ILIST_{e,e′} and ILIST_{e″,e′} into a single list ILIST_{r,e′}. Any point v* ∈ e′ is included in one and only one interval in this list. (In BUSHWHACK, every such point is included in two intervals, one in ILIST_{e,e′} and one in ILIST_{e″,e′}.) More specifically, for any discovered point v ∈ e ∪ e″, v* ∈ I_{v,r,e′} if and only if $w_r \cdot |vv^*| + \|p'_{opt}(s, v)\| \le w_r \cdot |v'v^*| + \|p'_{opt}(s, v')\|$ for any other discovered point v′ ∈ e ∪ e″. Therefore, any two edges associated with ILIST_{r,e′} do not intersect each other inside r. As BUSHWHACK⁺ constructs candidate optimal paths using only edges associated with interval lists, it avoids using both of two intersecting edges v₁v₁* and v₂v₂* with v₁, v₂ ∈ e ∪ e″ and v₁*, v₂* ∈ e′.
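The minimisation defining ILIST_{r,e′} can be written down directly. The brute-force sketch below is our own illustration, not the incremental interval maintenance BUSHWHACK actually performs: it computes, for every Steiner point v* on e′, which discovered point on e ∪ e″ owns it; the intervals I_{v,r,e′} are then exactly the maximal runs of consecutive v* owned by the same v.

```python
import math

def merged_interval_owner(candidates, w_r, steiner_points):
    """candidates: list of (point, cost) pairs for the discovered points on
    e u e'', where cost = ||p'_opt(s, v)||; steiner_points: points v* on e'.
    Returns a dict v* -> discovered point minimising w_r*|v v*| + cost(v)."""
    return {
        v_star: min(candidates,
                    key=lambda vc: w_r * math.dist(vc[0], v_star) + vc[1])[0]
        for v_star in steiner_points
    }

# Two discovered points competing for three Steiner points on e'.
owners = merged_interval_owner(
    candidates=[((0.0, 0.0), 1.0), ((1.0, 0.0), 0.2)],
    w_r=2.0,
    steiner_points=[(0.0, 1.0), (0.5, 1.0), (1.0, 1.0)],
)
print(owners)  # the cheaper-to-reach point (1,0) owns the two rightmost v*
```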
The second heuristic is rather subtle. It reduces the size of QLIST, the list of candidate optimal paths. Possible operations on this list include inserting a new candidate optimal path and deleting the minimum cost path in the list. On average, each such operation costs O(log(nm)) time. As each iteration of the algorithm invokes one or more such operations, it is very important to contain the size of QLIST. In the original BUSHWHACK, for any point v ∈ e, QLIST may contain six or more candidate optimal paths from s to v. Among these paths, four are propagated through edges associated with interval lists, while the remaining ones are extended to v from the left and right along the edge e. This is a serious disadvantage compared with a Dijkstra-based approximation algorithm, which keeps only one path from s to v in the Fibonacci heap for each Steiner point v. When n is relatively large, the performance gain BUSHWHACK achieves by accessing only a small subgraph of G_ε will be entirely offset by the time wasted on a larger path list.

If multiple candidate optimal paths for v are inserted into QLIST, BUSHWHACK keeps each of them until it is time to extract that path from QLIST, even though it could be decided immediately that all of those paths except one cannot be optimal (by comparing the costs of those paths). This is because BUSHWHACK generates new candidate optimal paths from these paths in different ways. A (non-optimal) path may lead to the generation of a true optimal discrete path, and therefore it cannot simply be discarded. What BUSHWHACK does is keep the path in QLIST until it becomes the minimum cost path; at that time, it is extracted from QLIST and a new candidate optimal path generated from the old path is inserted into QLIST.

BUSHWHACK⁺, however, uses a slightly different propagation scheme to avoid keeping multiple paths with the same ending point. Let p(s, v′) be a candidate optimal path from s to v′ that has just been inserted into QLIST. If there is already another candidate optimal path p′(s, v′) in QLIST, instead of keeping both of them in QLIST, BUSHWHACK⁺ takes the more costly one, say p′(s, v′), and immediately extracts it from QLIST. This extracted path is then processed as if it had been extracted in the normal situation (in which it would have been the minimum cost path in the list). This is, in essence, a "propagation-in-advance" strategy that is somewhat contradictory to the "lazy" propagation scheme of BUSHWHACK, and it may cause edges to be accessed unnecessarily. It is a trade-off between reducing the path list size and reducing the number of edges accessed.
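The bookkeeping behind this heuristic fits in a small priority-queue wrapper. The sketch below is our own illustration: `propagate` is a hypothetical callback standing in for BUSHWHACK's path-expansion step, and stale heap entries are skipped lazily on extraction.

```python
import heapq

class CandidateQueue:
    """QLIST variant keeping at most one stored candidate path per point."""

    def __init__(self, propagate):
        self.heap = []    # entries (cost, point); may contain stale ones
        self.best = {}    # point -> cost of the single stored candidate
        self.propagate = propagate  # expands a path ending at `point`

    def insert(self, point, cost):
        if point not in self.best:
            self.best[point] = cost
            heapq.heappush(self.heap, (cost, point))
            return
        # Second candidate for the same point: store only the cheaper one,
        # and propagate the costlier one immediately ("in advance").
        if cost < self.best[point]:
            cost, self.best[point] = self.best[point], cost
            heapq.heappush(self.heap, (self.best[point], point))
        self.propagate(point, cost)

    def extract_min(self):
        while self.heap:
            cost, point = heapq.heappop(self.heap)
            if self.best.get(point) == cost:  # skip stale entries
                del self.best[point]
                return point, cost
        return None
```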
5 Preliminary Experimental Results

To provide a performance comparison, we implemented the following three algorithms in Java: 1) BUSHWHACK⁺; 2) pure Dijkstra's algorithm, which searches every incident edge of a Steiner point in G_ε; and 3) the two-stage adaptive discretization method, which uses pure Dijkstra's algorithm in each stage and chooses ε₁ = 2ε. All the timed results were acquired on a Sun Blade-1000 workstation with 4GB of memory.

For our experiments we chose triangulations converted from terrain maps in grid data format. More specifically, we used the DEM (Digital Elevation Model) file of the Kaweah River basin. It is a 1424×1163 grid with 30m between two neighboring grid points. We randomly took twenty 60×45 patches and converted them to TINs by connecting two grid points diagonally for each grid cell. Therefore, in each example there are 5192 triangular faces. For each triangular face r, we assign to r a unit weight w_r equal to 1 + 10 tan α_r, where α_r is the angle between r and the horizontal plane.

Table 1. Running time (in seconds) / number of visited edges per region

  1/ε   BUSHWHACK⁺       pure Dijkstra       adaptive discretization
  3     156.9 / 2371     243.0 / 16558       281.3 / 10877
  5     290.7 / 4603     711.0 / 55797       570.2 / 24041
  7     440.6 / 7098     1506.0 / 124086     1054.7 / 40827
  9     631.9 / 9795     2672.5 / 224987     1528.9 / 60495

For each TIN, we ran the three algorithms five times, each time with randomly generated source and destination points. For each algorithm, we took the average of the running times over all experiments. We repeated the experiments with 1/ε = 3, 5, 7 and 9. From Table 1 it is easy to see that, as 1/ε grows, the running times of the BUSHWHACK⁺ algorithm and the adaptive discretization method grow much more slowly than that of pure Dijkstra's algorithm. We also list the average number of visited edges per region for each algorithm and each ε value; the number of visited edges per region and the running time appear to be closely correlated.

6 Conclusion

In this paper we provided several improvements on the approximation algorithms for the weighted region optimal path problem: 1) a compact discretization scheme that removes the dependency on the unit weight ratio; 2) an adaptive discretization method that selectively puts Steiner points with high density on boundary edges; and 3) a revised BUSHWHACK algorithm with two cost-saving heuristics.

Acknowledgement. This work is supported by NSF ITR Grant EIA-0086015, DARPA/AFSOR Contract F30602-01-2-0561, NSF EIA-0218376, and NSF EIA-0218359.

References

1. Mitchell, J.S.B.: Geometric shortest paths and network optimization. In Sack, J.R., Urrutia, J., eds.: Handbook of Computational Geometry. Elsevier Science Publishers B.V. North-Holland, Amsterdam (2000) 633–701
2. Mitchell, J.S.B., Papadimitriou, C.H.: The weighted region problem: Finding shortest paths through a weighted planar subdivision. Journal of the ACM 38 (1991) 18–73
3. Mata, C., Mitchell, J.: A new algorithm for computing shortest paths in weighted planar subdivisions. In: Proceedings of the 13th Annual ACM Symposium on Computational Geometry. (1997) 264–273
4. Aleksandrov, L., Lanthier, M., Maheshwari, A., Sack, J.R.: An ε-approximation algorithm for weighted shortest paths on polyhedral surfaces. In: Proceedings of the 6th Scandinavian Workshop on Algorithm Theory. Volume 1432 of Lecture Notes in Computer Science. (1998) 11–22
5. Reif, J.H., Sun, Z.: An efficient approximation algorithm for weighted region shortest path problem. In: Proceedings of the 4th Workshop on Algorithmic Foundations of Robotics. (2000) 191–203
6. Aleksandrov, L., Maheshwari, A., Sack, J.R.: Approximation algorithms for geometric shortest path problems. In: Proceedings of the 32nd Annual ACM Symposium on Theory of Computing. (2000) 286–295
7. Sun, Z., Reif, J.H.: BUSHWHACK: An approximation algorithm for minimal paths through pseudo-Euclidean spaces. In: Proceedings of the 12th Annual International Symposium on Algorithms and Computation. Volume 2223 of Lecture Notes in Computer Science. (2001) 160–171
8. Helgason, R.V., Kennington, J., Stewart, B.: The one-to-one shortest-path problem: An empirical analysis with the two-tree Dijkstra algorithm. Computational Optimization and Applications 1 (1993) 47–75
9. Reif, J.H., Sun, Z.: Movement planning in the presence of flows. In: Proceedings of the 7th International Workshop on Algorithms and Data Structures. Volume 2125 of Lecture Notes in Computer Science. (2001) 450–461
10. Lanthier, M., Maheshwari, A., Sack, J.R.: Shortest anisotropic paths on terrains. In: Proceedings of the 26th International Colloquium on Automata, Languages and Programming. Volume 1644 of Lecture Notes in Computer Science. (1999) 524–533
11. Sun, Z., Reif, J.H.: On energy-minimizing paths on terrains for a mobile robot. In: Proceedings of the 2003 IEEE International Conference on Robotics and Automation. (2003) To appear.
12. Papadimitriou, C.H.: An algorithm for shortest-path motion in three dimensions. Information Processing Letters 20 (1985) 259–263
13. Choi, J., Sellen, J., Yap, C.K.: Approximate Euclidean shortest path in 3-space. In: Proceedings of the 10th Annual ACM Symposium on Computational Geometry. (1994) 41–48

On Boundaries of Highly Visible Spaces and Applications

John H. Reif and Zheng Sun
Department of Computer Science, Duke University, Durham, NC 27708, USA
{reif,sunz}@cs.duke.edu

Abstract. The purpose of this paper is to investigate the properties of a certain class of highly visible spaces. For a given geometric space S containing obstacles specified by disjoint subsets of S, the free space F is defined to be the portion of S not occupied by these obstacles. The space is said to be highly visible if at each point in F a viewer can see at least an ε fraction of the entire F. This assumption has been used for robotic motion planning in the analysis of random sampling of points in the robot's configuration space, as well as for the upper bound on the minimum number of guards needed for art gallery problems. However, there is no prior result on what this assumption implies for the geometry of the space under study. For the two-dimensional case, with the additional assumptions that S is bounded within a rectangle of constant aspect ratio and that the volume ratio between F and S is a constant, we show by "charging" each obstacle boundary to a certain portion of S that the total length of all obstacle boundaries in S is $O(\sqrt{n\mu(F)/\epsilon})$, if S contains polygonal obstacles with a total of n boundary edges; or $O(\frac{1}{\epsilon}\sqrt{n\mu(F)})$, if S contains n convex obstacles that are piecewise smooth. In both cases, μ(F) is the volume of F. For the polygonal case, this bound is tight, as we can construct a space whose boundary size is $\Theta(\sqrt{n\mu(F)/\epsilon})$. These results can be partially extended to three dimensions. We show that these results can be applied to the analysis of certain probabilistic roadmap planners, as well as to a variation of the art gallery problem.

1 Introduction

Computational geometry is now a mature field with a multiplicity of well-defined foundational problems associated, in many cases, with efficient algorithms as well as well-established applications over a broad range of areas including computer vision, robotic motion planning and rendering. However, compared with some other fields, computational geometry has not yet explored as thoroughly the methodology of studying reasonable sub-classes of the inputs that appear in practice. For example, in matrix computation, there is a well-established set of specialized matrices, such as sparse matrices, structured matrices, and banded matrices, for which there are especially efficient algorithms.

One assumption that has been used in a number of previous works in computational geometry is the assumption that, for a given geometric space S with
a specified set of obstacles, a viewer can see from every point of the free space F an ε fraction of the entire volume of F. Here obstacles are defined to be compact subsets of S, while the free space F is defined to be the portion of S not occupied by the obstacles. In this paper we call this assumption ε-visibility (though note that some of the prior authors called it ε-goodness instead).

1.1 Probabilistic Roadmap Planners

The ε-visibility assumption, in particular, has been used in the analysis of randomized placements of points in the robot's configuration space for probabilistic roadmap (PRM) planners [1,2]. A classic PRM planner [3,4] randomly picks in the free space of the robot's configuration space a set of points, called milestones. With these milestones, it constructs a roadmap by connecting each pair of milestones between which a collision-free path can be computed using a simple local planner. For any given initial and goal configurations s and t, the planner first finds two milestones s′ and t′ such that a simple collision-free path can be found connecting s (t, respectively) with s′ (t′, respectively), and then searches the roadmap for a path connecting s′ and t′.

PRM planners have proved to be very effective in practice, capable of solving robotic motion planning problems with many degrees of freedom. They also find applications in other areas such as computer animation, computational biology, etc.

The performance of a PRM planner depends on two key features of the roadmaps it constructs, visibility and connectivity. Firstly, for any given (initial or goal) configuration v, there should exist in the roadmap a milestone v′ such that a local planner can find a path connecting v and v′. Since in practice most PRM planners use local planners that connect configurations by straight line segments, this implies that the milestones collectively need to see the entire free space, or at least a significant portion of it. Secondly, the roadmap should capture the connectivity of the free space it represents. Any two milestones in the same connected component of the free space should also be connected via the roadmap, as otherwise the planner would give "false negative" answers to some queries.

The earlier PRM planners pick milestones with a uniform distribution in the free space. The success of these planners motivated Kavraki et al. [1] to establish a theoretical foundation for the effectiveness of this sampling method. They showed that, for an ε-visible configuration space, $O(\frac{1}{\epsilon}\log\frac{1}{\epsilon})$ milestones uniformly sampled in the free space suffice to adequately cover the free space with high probability.
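A toy rendering of the classic PRM construction may help fix ideas. Everything below is our own illustration (a unit-square world with hypothetical disk obstacles and a straight-line local planner tested by dense sampling); it is a sketch of the scheme, not the planners of [3,4] themselves.

```python
import math
import random

# Toy world: free space = unit square minus disk obstacles (hypothetical).
OBSTACLES = [((0.5, 0.5), 0.2)]  # (center, radius)

def is_free(p):
    return all(math.dist(p, c) > r for c, r in OBSTACLES)

def segment_free(p, q, step=0.01):
    """Straight-line local planner: densely sample the segment pq."""
    n = max(1, int(math.dist(p, q) / step))
    return all(is_free((p[0] + (q[0] - p[0]) * i / n,
                        p[1] + (q[1] - p[1]) * i / n))
               for i in range(n + 1))

def build_roadmap(n_milestones, radius=0.3):
    """Sample milestones uniformly in the free space and connect every
    pair within `radius` that the local planner can join."""
    milestones = []
    while len(milestones) < n_milestones:
        p = (random.random(), random.random())
        if is_free(p):
            milestones.append(p)
    edges = {p: [] for p in milestones}
    for i, p in enumerate(milestones):
        for q in milestones[i + 1:]:
            if math.dist(p, q) <= radius and segment_free(p, q):
                edges[p].append(q)
                edges[q].append(p)
    return edges

roadmap = build_roadmap(60)
```

Answering a query then amounts to attaching s and t to nearby visible milestones and running any graph search over `edges`.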
1.2 Art Gallery Problems

The ε-visibility assumption has also been used in bounding the number of guards needed for art gallery problems [5,6,7,8]. Potentially, this assumption might also allow for much more efficient algorithms in this case. The assumption appears to be reasonable in a large number of practical cases, as long as the considered area is within a closed area (such as a room).

The original art gallery problem was first proposed by V. Klee, who described it as follows: how many guards are necessary, and how many guards are sufficient, to guard the paintings and works of art in an art gallery with n walls? Later, Chvátal [9] showed that ⌊n/3⌋ guards are always sufficient and occasionally necessary to guard a simple polygon with n edges. Since then, there have been numerous variations of the art gallery problem, including, but not limited to, the vertex guard problem, the edge guard problem, the fortress and prison yard problems, etc. (See [10] for a comprehensive review of various art gallery problems.)

Although in the worst case the number of guards needed is Θ(n) for polygonal galleries with n edges, intuitively one would expect that galleries that are ε-visible should require far fewer guards. Translating the result of Kavraki et al. [1] into the context of art gallery problems, a uniformly random placement of $O(\frac{1}{\epsilon}\log\frac{1}{\epsilon})$ guards is very likely to guard an adequate portion of the gallery. Kavraki et al. [1] also conjectured that in d-dimensional space any ε-visible polygonal gallery with h holes can be guarded by at most f_d(h, 1/ε) guards, for some polynomial function f_d. Following some ideas of an earlier work by Kalai and Matoušek [5], Valtr [6] confirmed the 2D version of the conjecture by showing that $f_2(h, \frac{1}{\epsilon}) = (2 + o(1))\frac{1}{\epsilon}\log\frac{1}{\epsilon}\log(h + 2)$. However, Valtr [7] disproved the 3D version of the conjecture by constructing, for any integer k, a 5/9-visible art gallery that cannot be guarded by k guards. Kirkpatrick [8] later showed that $64 \cdot \frac{1}{\epsilon}\log\log\frac{1}{\epsilon}$ vertex guards suffice to guard all vertices of a simply connected polygon P with the property that each vertex of P can see at least an ε fraction of the other vertices of P. He also gave a similar result for boundary guards.

It has been proved that, for various art gallery problems, finding the minimum number of guards is difficult. Lee and Lin [11] proved that the minimum vertex guard problem for polygons is NP-hard. Schuchardt and Hecker [12] further showed that even for orthogonal polygons, whose edges are parallel to either the x-axis or the y-axis, the minimum vertex and point guard problems are NP-hard. Ghosh [13] presented an O(n⁵ log n) algorithm that computes a vertex guard set whose size is at most O(log n) times the minimum number of guards needed. However, with the assumption of ε-visibility, one can use a simple and efficient randomized approximation algorithm based on the result of Kavraki et al. [1] for the original art gallery problem. Moreover, this approximation algorithm does not require the assumption that the space is polygonal.
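In the same toy world as the PRM sketch above (the is_free and sees helpers are repeated, under the same hypothetical assumptions, so the snippet stands alone), one can estimate by Monte Carlo the fraction of the free space that a candidate guard sees, which is exactly the quantity that ε-visibility bounds from below.

```python
import math
import random

OBSTACLES = [((0.5, 0.5), 0.2)]  # toy world: disks inside the unit square

def is_free(p):
    return all(math.dist(p, c) > r for c, r in OBSTACLES)

def sees(p, q, step=0.01):
    n = max(1, int(math.dist(p, q) / step))
    return all(is_free((p[0] + (q[0] - p[0]) * i / n,
                        p[1] + (q[1] - p[1]) * i / n))
               for i in range(n + 1))

def visible_fraction(p, samples=2000):
    """Monte Carlo estimate of mu(V_p) / mu(F) for a free-space point p."""
    hits = total = 0
    while total < samples:
        q = (random.random(), random.random())
        if is_free(q):
            total += 1
            hits += sees(p, q)
    return hits / total

print(visible_fraction((0.1, 0.1)))  # below 1: the disk blocks part of F
```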
1.3 Our Result

Intuitively, for an ε-visible space, the total size of all obstacle boundaries cannot be arbitrarily large; an excessive amount of obstacle boundary would inevitably cause some point in F to lose ε-visibility by blocking a significant portion of its view. The main result of this paper is an upper bound on the boundary size of ε-visible spaces in two and (in some special cases) three dimensions. The upper bound on the boundary size not only is a fundamental property of geometric spaces of this type, but may also have implications for other applications that use this assumption.

We show that, for an ε-visible 2D space, the total length of all obstacle boundaries is $O(\sqrt{n\mu(F)/\epsilon})$, if the space contains polygonal obstacles with a total of n boundary edges; or $O(\frac{1}{\epsilon}\sqrt{n\mu(F)})$, if the space contains n convex obstacles that are piecewise smooth. In both cases, μ(F) is the area of F. For the case of polygonal obstacles, this bound is tight, as one can construct an ε-visible space containing obstacle boundaries with a total length of $\Theta(\sqrt{n\mu(F)/\epsilon})$.

Our result can be used to bound the number of guards needed for the following variation of the original art gallery problem: given a space with a specified set of obstacles, place points on the boundaries of the obstacles so that these points see the entire space, or a significant portion of it. We call this problem the boundary art gallery problem. This problem finds applications in practical situations where physical constraints only allow points to be placed on obstacle boundaries. For example, one might need to install lights on the walls to illuminate a closed space consisting of rooms and corridors. If this result can be extended to higher dimensions, we can also apply it to bounding the number of randomly sampled boundary points needed to adequately cover the free space. Although it is difficult to uniformly sample points on the boundary of a space without an explicit algebraic description, there exist PRM planners [14,15] that place milestones "pseudo-uniformly" on the boundary of the free space using various techniques. These planners have proved to be more effective in capturing the connectivity of the configuration space in the presence of narrow passages.

2 Bounding the Boundary Size of 2D and 3D ε-Visible Spaces

In this section we prove an upper bound on the boundary size of 2D ε-visible spaces. We also show that this result can be partially extended to 3D ε-visible spaces.

2.1 Preliminaries

Suppose S is the 2D space bounded inside a rectangle R. We let B denote the union of all obstacles in S, and let ∂B denote the boundaries of all obstacles. For each point v ∈ F, we let V_v = {v′ | line segment vv′ ⊂ F}. That is, V_v is the set of all free space points that can be seen from v. We assume that the aspect ratio of R, defined to be the ratio between the lengths of the shorter and longer sides of R, is no less than λ, where 0 < λ < 1. We also assume that μ(F) ≥ ρ·μ(S), for some constant ρ > 0. In the full version of the paper, we give examples where the boundary size cannot be bounded if λ and ρ are not bounded by constants.

A segment of the boundary (which we call a sub-boundary) of an obstacle is said to be smooth if the curvature is continuous along the curve defining the boundary. The boundary of an obstacle is said to be piecewise smooth if it consists of a finite number of smooth sub-boundaries. In this section we assume that the boundaries of all obstacles inside R are piecewise smooth.

For a smooth sub-boundary c, the turning angle, denoted by A(c), is defined to be the integral of the curvature along c. For a piecewise smooth sub-boundary c, the turning angle is defined to be the sum of the turning angles of all smooth sub-boundaries of c, plus the sum of the instantaneous angular changes at the joint points. Observe that the turning angle of the boundary of an obstacle is 2π if the obstacle is convex, or greater than 2π if it is non-convex. In some sense, the turning angle of the boundary of an obstacle reflects the geometric complexity of the obstacle. For each sub-boundary c, we use |c| to denote the length of c, and use c[u₁, u₂] to denote the part of c between points u₁ and u₂ on c. For any point v ∈ F, we let u₁ and u₂ be the two points on c such that c lies between the two rays $\vec{vu_1}$ and $\vec{vu_2}$. We call u₁ and u₂ the bounding points of c by v. We define the viewing angle of c from v to be ∠u₁vu₂.
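As a quick discrete analogue of these definitions, the turning angle of a polygonal chain is the sum of the absolute exterior angles at its interior vertices. The snippet below is our own illustration; it reproduces the 2π value stated above for a convex obstacle boundary.

```python
import math

def turning_angle(points):
    """Total turning angle of a polygonal chain: the sum of the absolute
    exterior angles at its interior vertices (a discrete analogue of
    integrating |curvature| along a smooth curve)."""
    total = 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        total += abs(math.atan2(cross, dot))
    return total

# Walking once around a unit square (two points repeated to close the
# tour at every corner) yields a turning angle of 2*pi, as for any
# convex obstacle boundary.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]
print(turning_angle(square) / math.pi)  # 2.0
```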
[Fig. 1. Lines and curves are not drawn proportionally: (a) various ε-flat sub-boundaries bounded between two arcs; (b) blocked visibility near an ε-flat sub-boundary.]

For each obstacle, we decompose its boundary into a minimum number of ε-flat sub-boundaries. A sub-boundary c is said to be ε-flat if $A(c) \le \theta_\epsilon$, where $\theta_\epsilon = \frac{\lambda\rho}{16(1+\lambda^2)} \cdot \epsilon$. Let u₁ and u₂ be the two endpoints of c. Observe that c is bounded between two minor arcs, each with chord u₁u₂ and angle 2θ_ε, as shown in Figure 1.a. Therefore, the width of c, defined as |u₁u₂|, is no less than $|c| \cdot \cos\frac{\theta_\epsilon}{2}$, while the height of c, defined as the maximum distance between any point u ∈ c and line segment u₁u₂, is no more than $\frac{|c|}{2} \cdot \sin\frac{\theta_\epsilon}{2}$.

Since ε-flat sub-boundaries are "relatively" flat, any point v ∈ F "sandwiched" between two ε-flat sub-boundaries has limited visibility, as we show in the following lemma:

Lemma 1. If v ∈ F is a point between two ε-flat sub-boundaries c₁ and c₂ and the total viewing angle of c₁ and c₂ from v is more than 2π − 6θ_ε, then v is not ε-visible.

Proof Abstract. For each i = 1, 2, let u_{i,1} and u_{i,2} be the two endpoints of cᵢ. V_v is contained in the union of the following three regions: I) the region bounded by sub-boundary c₁, vu_{1,1} and vu_{1,2}; II) the region bounded by sub-boundary c₂, vu_{2,1} and vu_{2,2}; and III) the region not inside either ∠u_{1,1}vu_{1,2} or ∠u_{2,1}vu_{2,2}. Since the total viewing angle of v blocked by c₁ and c₂ is more than 2π − 6θ_ε, and ∠u_{1,1}vu_{1,2} ≤ π + θ_ε and ∠u_{2,1}vu_{2,2} ≤ π + θ_ε, we have ∠u_{1,1}vu_{1,2} > π − 7θ_ε and ∠u_{2,1}vu_{2,2} > π − 7θ_ε. Since c₁ is ε-flat, Region I is bounded by the union of △u_{1,1}vu_{1,2} and the arc region with chord |c₁| and angle 2θ_ε, as shown in Figure 1.b. Since $|c_1| \cdot \cos(\theta_\epsilon/2) \le |u_{1,1}u_{1,2}| \le L_R \le \sqrt{\frac{\lambda^2+1}{\lambda^2\rho}\,\mu(F)}$, where L_R is the length of the diagonal of R, the volume of Region I is bounded by O(εμ(F)). Region III is the union of two (possibly merged) cones with a total angle of 6θ_ε, and therefore the volume of Region III is also O(εμ(F)). Hence, the region visible from v has a total volume of O(εμ(F)). (In the full version of the paper we show that the volume is actually less than εμ(F).) Therefore, v is not ε-visible. ⊓⊔

In the rest of this section we prove the following theorem:

Theorem 1. If the boundaries of all obstacles can be divided into n ε-flat sub-boundaries, the total length of all obstacle boundaries is bounded by $O\big(\sqrt{\frac{n\mu(F)}{\epsilon}}\big)$.

To prove Theorem 1 we need two lemmas, which we prove in the next subsection. In Subsection 2.3 we give the proof of this theorem as well as its corollaries.

2.2 Forbidden Neighborhoods of ε-Flat Sub-boundaries

For each ε-flat sub-boundary c with endpoints u₁ and u₂, we divide it into 15 equal-length segments, and let u′₁ and u′₂ be the two endpoints of the middle segment. The ε-neighborhood of c, denoted by N_ε(c), is defined to be the set of points from each of which the viewing angle of c[u′₁, u′₂] is greater than π − θ_ε, as shown in Figure 2.a. It is easy to see that, for any v ∈ N_ε(c), the distance between v and line segment u′₁u′₂ is no more than $\frac{|c[u'_1, u'_2]|}{2} \cdot \tan\theta_\epsilon = \frac{|c|}{30} \cdot \tan\theta_\epsilon$. The distance between v and line segment u₁u₂ is no more than the sum of the distance between v and u′₁u′₂ and the maximum distance between u′₁u′₂ and u₁u₂, which is $\frac{|c|}{30} \cdot \tan\theta_\epsilon + \frac{|c|}{2} \cdot \sin\frac{\theta_\epsilon}{2}$.
These neighborhoods are "forbidden" in the sense that they do not overlap with each other if the corresponding sub-boundaries are of roughly the same length, as we show in Lemma 2. By "charging" a certain portion of S to each ε-flat sub-boundary, we show that the total length of all ε-flat sub-boundaries, that is, the length of ∂B, can be upper-bounded.

Lemma 2. The ε-neighborhoods of two sub-boundaries c₁ and c₂ do not overlap if $\frac{|c_1|}{2} \le |c_2| \le 2|c_1|$.

Proof. Suppose for the sake of contradiction that v ∈ S is a point inside N_ε(c₁) ∩ N_ε(c₂), where the length ratio between c₁ and c₂ is between 1/2 and 2. For each i = 1, 2, we let u_{i,1} and u_{i,2} be the two endpoints of cᵢ, and let u′_{i,1} and u′_{i,2} be the endpoints of the portion of cᵢ incident to the ε-neighborhood of cᵢ. Let vᵢ be the projection of v on line segment u_{i,1}u_{i,2}, and let v′ᵢ be the intersection of cᵢ and the straight line that passes through both vᵢ and v.

[Fig. 2. Lines and curves are not drawn proportionally: (a) the ε-neighborhood of a sub-boundary c; (b) ε-neighborhoods are non-overlapping for sub-boundaries with similar lengths.]

The intuition here is the following: since c₁ and c₂ are "relatively" flat, non-intersecting, and about the same length, for N_ε(c₁) and N_ε(c₂) to overlap, u_{1,1}u_{1,2} and u_{2,1}u_{2,2} have to be "almost" parallel and also close to each other. That way, we can find in the free space between c₁ and c₂ a point that can see less than εμ(F) of the free space, as its visibility is mostly "blocked" by c₁ and c₂, leading to a contradiction to the assumption that S is ε-visible.

There are a number of cases corresponding to different geometric arrangements of the points, line segments and curves (sub-boundaries). In the following we assume that u_{1,1}u_{1,2} and u_{2,1}u_{2,2} do not intersect, v lies between u_{1,1}u_{1,2} and u_{2,1}u_{2,2}, and v′₁ (v′₂, respectively) lies between v and v₁ (v₂, respectively), as shown in Figure 2.b. The other cases can be analyzed in an analogous manner.

Since line segments u_{1,1}u_{1,2} and u_{2,1}u_{2,2} do not intersect, either both v₁u_{2,1} and v₁u_{2,2} lie between u_{1,1}u_{1,2} and u_{2,1}u_{2,2}, or both v₂u_{1,1} and v₂u_{1,2} lie between u_{1,1}u_{1,2} and u_{2,1}u_{2,2}. Without loss of generality we assume the former. Let l₁ = |vv₁| and l₂ = |vv₂|. Let u″_{2,1} (u″_{2,2}, respectively) be the projection of u′_{2,1} (u′_{2,2}, respectively) on u_{2,1}u_{2,2}. Observe that v′₁ lies inside the small rectangle of width |u″_{2,1}u″_{2,2}| + 2l₁ and height l₁ + l₂ (the solid rectangle in Figure 2.b). Since $|u_{2,2}u''_{2,2}| = |u_{2,2}u_{2,1}| - |u''_{2,2}u_{2,1}| > |u_{2,2}u_{2,1}| - |c[u'_{2,2}, u_{2,1}]|$, we have

$$\tan\angle v'_1 u_{2,1} u_{2,2} \le \frac{l_1 + l_2}{|u_{2,2}u_{2,1}| - |c[u'_{2,2}, u_{2,1}]| - l_1} \le \frac{(\frac{1}{30}\tan\theta_\epsilon + \frac{1}{2}\sin\frac{\theta_\epsilon}{2}) \cdot (|c_1| + |c_2|)}{|c_2|\cos\frac{\theta_\epsilon}{2} - \frac{8|c_2|}{15} - (\frac{1}{30}\tan\theta_\epsilon + \frac{1}{2}\sin\frac{\theta_\epsilon}{2}) \cdot |c_1|}.$$

Applying $|c_1| \le 2|c_2|$ and $\theta_\epsilon < \frac{1}{12}$, we now have

$$\tan\angle v'_1 u_{2,1} u_{2,2} \le \frac{\theta_\epsilon \cdot (\frac{1}{30\cos\theta_\epsilon} + \frac{1}{4}) \cdot 3|c_2|}{(\cos\frac{\theta_\epsilon}{2} - \frac{8}{15} - (\frac{1}{15}\tan\theta_\epsilon + \sin\frac{\theta_\epsilon}{2})) \cdot |c_2|} \le \frac{5}{2}\tan\theta_\epsilon \le \tan\frac{5\theta_\epsilon}{2}.$$

It follows that ∠v′₁u_{2,1}u_{2,2} ≤ 5θ_ε/2. Similarly, we can show that ∠v′₁u_{2,2}u_{2,1} ≤ 5θ_ε/2, and therefore ∠u_{2,1}v′₁u_{2,2} ≥ π − 5θ_ε. Since v′₁ is on c₁, ∠u_{1,1}v′₁u_{1,2} ≥ π − θ_ε. Therefore, the viewing angle from v′₁ not blocked by c₁ and c₂ is no more than 2π − (π − θ_ε) − (π − 5θ_ε) = 6θ_ε. According to Lemma 1, v′₁ is not ε-visible.
Therefore, we can find a point v₁* ∈ F close to v′₁ that is also not ε-visible, a contradiction to the assumption that S is ε-visible. ⊓⊔

Next we give a lower bound on the volume of the ε-neighborhood of any ε-flat sub-boundary:

Lemma 3. For any ε-flat sub-boundary c, the volume of N_ε(c) is Ω(θ_ε · |c|²).

Proof. We show that the ε-neighborhood of c has a volume no less than $\mu_0 = \frac{\theta_\epsilon |c[u'_1, u'_2]|^2}{18\kappa_1}$, for some constant κ₁ > 1. (We explain later how this constant κ₁ is chosen.)

[Fig. 3. Each panel shows only the portion of sub-boundary c between u′₁ and u′₂: (a) ε-flat sub-boundary, case I; (b) ε-flat sub-boundary, case II.]

We divide c[u′₁, u′₂] into three equal-length segments, c₁, c₂, and c₃. For any point u on c[u′₁, u′₂], we say that v ∈ F is the lookout point of u if line segment vu is normal to c[u′₁, u′₂] and the viewing angle of c[u′₁, u′₂] from v is π − θ_ε. We call the length of uv the lookout distance of c[u′₁, u′₂] at u.

We first consider Case I, where for each point u ∈ c₂ the lookout distance of c at u is at least $l = \frac{\theta_\epsilon |c[u'_1, u'_2]|}{3\kappa_1}$, as shown in Figure 3.a. In this case, the volume of the ε-neighborhood of c outside c₂ is at least

$$|c_2| \cdot l - \frac{l^2\theta_\epsilon}{2} = \frac{|c[u'_1, u'_2]|^2\theta_\epsilon}{9\kappa_1}\Big(1 - \frac{\theta_\epsilon^2}{2\kappa_1}\Big) \ge \frac{|c[u'_1, u'_2]|^2\theta_\epsilon}{18\kappa_1} = \mu_0,$$

and therefore the volume of the ε-neighborhood of c is no less than μ₀.

Now we consider Case II, where there exists a point u₀ ∈ c₂ such that the lookout distance at u₀ is less than l, as shown in Figure 3.b. Let v₀ be the lookout point of u₀. Since A(c[u′₁, u′₂]) ≤ A(c) ≤ θ_ε, v₀ will see at least one of the two endpoints of c[u′₁, u′₂], as otherwise the viewing angle of v₀ would be less than π − θ_ε. Without loss of generality we let u′₁ be an endpoint of c[u′₁, u′₂] that is visible from v₀. c[u₀, u′₁], the part of c between u₀ and u′₁, lies below line segment v₀u′₁. Since u₀ ∈ c₂, we have $|c[u_0, u'_1]| \ge |c_1| = \frac{|c[u'_1, u'_2]|}{3}$.

Since curve c[u₀, u′₁] is also ε-flat, we have $|u_0 u'_1| \ge |c[u_0, u'_1]| \cdot \cos\frac{\theta_\epsilon}{2} > \frac{|c[u'_1, u'_2]|}{6}$. We use u₀u′₁ as the chord to draw a minor arc of angle 2θ_ε outside c[u₀, u′₁]. The radius of this arc is $r_0 = \frac{|u_0 u'_1|}{2\sin\theta_\epsilon} \ge \frac{|c[u'_1, u'_2]|}{12\theta_\epsilon}$. Let v₁ be the point where this arc intersects v₀u′₁. We claim that any point v′ inside the closed region bounded by the arc from u₀ to v₁ and the segment u′₁v₁ belongs to the ε-neighborhood of c. First of all, v′ is outside c[u₀, u′₁], as c[u₀, u′₁] lies below v₀u′₁. Secondly, the viewing angle of c[u′₁, u′₂] from v′ is no less than the viewing angle of c[u₀, u′₁] from v′, which is at least π − θ_ε.

Now we consider the volume of this region. It is a circular segment with angle θ₀ = 2θ_ε − 2∠u₀u′₁v₀ and radius r₀. We have $\angle u_0 u'_1 v_0 < \frac{|u_0 v_0|}{|u_0 u'_1|} < \frac{l}{|c[u'_1, u'_2]|/6} = \frac{2\theta_\epsilon}{\kappa_1}$. As long as we choose κ₁ large enough, ∠u₀u′₁v₀ < θ_ε/2 and therefore θ₀ > θ_ε. The volume of the circular segment, therefore, is

$$\frac{r_0^2}{2}(\theta_0 - \sin\theta_0) \ge \frac{r_0^2\theta_0^3}{14} \ge \frac{|c[u'_1, u'_2]|^2\theta_\epsilon}{14\cdot 12^2}.$$

Once again, if we choose κ₁ large enough, this is at least $\frac{\theta_\epsilon|c[u'_1, u'_2]|^2}{18\kappa_1} = \mu_0$, and therefore the volume of the ε-neighborhood of c is greater than μ₀. Since |c[u′₁, u′₂]| = |c|/15, we have μ(N_ε(c)) = Ω(θ_ε · |c|²). ⊓⊔
2.3 Putting It Together

With the lemmas established in the last subsection, we are ready to prove Theorem 1:

Proof of Theorem 1. Let L_max be the maximum length of all ε-flat sub-boundaries inside R. We divide all ε-flat sub-boundaries into subsets S₁, S₂, ..., S_k. For each i, Sᵢ contains the sub-boundaries whose lengths are between L_max/2^i and L_max/2^{i−1}. We let c_{i,1}, c_{i,2}, ..., c_{i,nᵢ} be the nᵢ sub-boundaries in Sᵢ. By Lemma 2, N_ε(c_{i,j}) ∩ N_ε(c_{i,j′}) = ∅ for any j and j′, 1 ≤ j < j′ ≤ nᵢ. By Lemma 3, there exists a constant K > 0 such that μ(N_ε(c_{i,j})) ≥ K · θ_ε · |c_{i,j}|² for all i and j. Therefore, we have

$$\frac{\mu(F)}{\rho} \ge \mu(S) \ge \mu\Big(\bigcup_{j=1}^{n_i} N_\epsilon(c_{i,j})\Big) = \sum_{j=1}^{n_i} \mu(N_\epsilon(c_{i,j})) \ge \sum_{j=1}^{n_i} K\theta_\epsilon |c_{i,j}|^2 \ge n_i \cdot K\theta_\epsilon \cdot \frac{L_{max}^2}{4^i}.$$

Hence we have $n_i \le \frac{4^i \mu(F)}{K\theta_\epsilon L_{max}^2 \rho}$. Let $K' = \frac{\mu(F)}{K\theta_\epsilon L_{max}^2 \rho}$. Now we give an upper bound on |∂B|, defined as $\sum_{i=1}^{k}\sum_{j=1}^{n_i} |c_{i,j}|$, the sum of the lengths of all ε-flat sub-boundaries. Since $|c_{i,j}| \le \frac{L_{max}}{2^{i-1}}$, we have $|\partial B| \le L_{max} \cdot \sum_{i=1}^{k} n_i \cdot 2^{-i+1}$.

Observe that $\sum_{i=1}^{k} n_i = n$, and $\sum_{i=1}^{k} n_i \cdot 2^{-i+1}$ is maximized when $n_i = K' \cdot 4^i$ for $i < \log_4\frac{3n}{K'}$ and $n_i = 0$ for $i \ge \log_4\frac{3n}{K'}$. Therefore, we have

$$\sum_{i=1}^{k} n_i \cdot 2^{-i+1} \le \sum_{i=1}^{\log_4\frac{3n}{K'}-1} K' \cdot 4^i \cdot 2^{-i+1} = 2K'\sum_{i=1}^{\log_4\frac{3n}{K'}-1} 2^i < 2K' \cdot 2^{\log_4\frac{3n}{K'}} = \sqrt{12n \cdot K'}.$$

Therefore, |∂B| is no more than

$$L_{max}\sqrt{12n \cdot K'} = \sqrt{\frac{12n \cdot \mu(F)}{K \cdot \theta_\epsilon \cdot \rho}}.$$

Recalling that K and ρ are constants and that θ_ε = Θ(ε), we have $|\partial B| = O\big(\sqrt{\frac{n\mu(F)}{\epsilon}}\big)$. ⊓⊔

If all the obstacles inside S are polygons, each boundary edge is an ε-flat sub-boundary, and therefore we have the following corollary:

Corollary 1. If S contains polygonal obstacles with a total of n edges, |∂B| is $O\big(\sqrt{\frac{n\mu(F)}{\epsilon}}\big)$.

If all obstacles inside S are convex, the boundary of each obstacle can be decomposed into $\frac{2\pi}{\theta_\epsilon}$ ε-flat sub-boundaries, and therefore we have:

Corollary 2. If S contains n convex obstacles that are piecewise smooth, |∂B| is $O\big(\frac{1}{\epsilon}\sqrt{n\mu(F)}\big)$.

In some sense, the upper bound stated in Corollary 1 is tight, as one can construct an ε-visible space inside a square consisting of n = 1/ε rectangular free-space "cells," each with length $\sqrt{\mu(F)}$ and width $\epsilon\sqrt{\mu(F)}$. The total length of the obstacle boundaries is $\Theta\big(\frac{1}{\epsilon}\sqrt{\mu(F)}\big) = \Theta\big(\sqrt{\frac{n\mu(F)}{\epsilon}}\big)$.

Nonetheless, we still conjecture that the best bound is the following:

Conjecture 1. |∂B| is $O\big(\frac{1}{\epsilon}\sqrt{\mu(F)}\big)$.

2.4 Extension to Three Dimensions

In this subsection we show how to generalize our proof of Theorem 1 to 3D spaces. For simplicity, we assume that the boundary (surface) of each obstacle is smooth, meaning that the curvature is continuous everywhere on the surface. To replicate the proofs of Lemmas 1, 2, and 3 for the 3D case, we first need to define the ε-flat surface patch, the 3D counterpart of the ε-flat sub-boundary. A surface patch s is said to be ε-flat if, for any point u ∈ s and any plane p that contains the line l_{s,u}, the curve c = p ∩ s is ε-flat. Here l_{s,u} is the line that passes through u and is normal to s. Moreover, we also need the surface patch to be "relatively round." More specifically, we require that for each ε-flat surface patch s there exists a "center" v_s such that max{|v_s v| : v ∈ ∂s} / min{|v_s v| : v ∈ ∂s} is bounded by a constant. Here ∂s is the closed curve that defines the boundary of s. We call R_{s,v_s} = min{|v_s v| : v ∈ ∂s} the minimum radius of s at center v_s.

We define the ε-neighborhood N_ε(s) of an ε-flat surface patch similarly to the case of an ε-flat sub-boundary.
We choose a small "sub-patch" s′ of s at the center of s so that the distance between v_s and every point on the boundary of s′ is k₁ · R_{s,v_s}, for some constant k₁ < 1. For any point v outside the obstacle that s bounds, v ∈ N_ε(s) if and only if there exist two points u₁, u₂ ∈ s′ such that ∠u₁vu₂ > π − k₂ε, for some constant k₂ > 0.

We use a sequence of planes, each containing l_{s,v_s}, to "sweep" through the volume of N_ε(s). Each such plane p contains a "slice" of N_ε(s) with an area of no less than $\Theta(\epsilon \cdot R_{s,v_s}^2)$, following the same argument as in the proof of Lemma 3. Therefore, the total volume of N_ε(s) is $\Theta(\epsilon \cdot R_{s,v_s}^3) = \Theta(\epsilon \cdot \mu(s)^{3/2})$. We leave the details of this proof, as well as the proofs of the 3D versions of the other lemmas, to the full version of the paper, and only state the result:

Theorem 2. If S contains convex obstacles bounded by a total of n ε-flat surface patches, |∂B| is $O\big(\big(\frac{n\mu(F)^2}{\epsilon^2}\big)^{1/3}\big)$.

3 Applications and Open Problems

It is easy to see that in a 2D ε-visible space $|\partial B_v| = \Omega(\epsilon\sqrt{\mu(F)})$ for any v ∈ F, where ∂B_v denotes the portion of ∂B visible from v. Therefore, we can derive a lower bound on the fraction of all obstacle boundaries that each free space point can see, for various cases, by using Corollaries 1 and 2. In particular, if Conjecture 1 holds, each free space point can see at least an Ω(ε²) fraction of all obstacle boundaries. Then, using the same proof technique as [1]¹, we can show that $O(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon})$ randomly sampled boundary points can view a significant portion of F with high probability. These results can be applied to the boundary art gallery problem to provide an upper bound on the number of boundary guards needed to adequately guard the space.

It occurs to us that, although one can construct an example where there exists a free space point that can only see obstacle boundaries of size $\Theta(\epsilon\sqrt{\mu(F)})$, the total volume of such points could be upper-bounded. In particular, we have the following conjecture:

Conjecture 2. Every point in F, except for a small subset of volume $O(\sqrt{\epsilon}\,\mu(F))$, can see obstacle boundaries of size $\Omega(\sqrt{\epsilon\mu(F)})$.

If we can prove both Conjecture 1 and Conjecture 2, we can reduce the number of boundary points needed to adequately cover the space with high probability to $O(\frac{1}{\epsilon^{3/2}}\log\frac{1}{\epsilon})$.

So far our results are limited to 2D ε-visible spaces and some special cases of 3D ε-visible spaces. If we can extend these results to higher dimensions, we will be able to provide a theoretical foundation for analyzing the effectiveness of the PRM planners [14,15] that (randomly) pick milestones close to boundaries of obstacles. These planners have been shown to be more efficient than the earlier PRM planners based on uniform sampling in the free space, by better capturing narrow passages in the configuration space; that is, the roadmaps they construct have better connectivity. However, there has been no prior theoretical result on the visibility of the roadmaps constructed using the sampled boundary points. With upper bound results analogous to the ones for the 2D and 3D cases, we will be able to prove an upper bound on the number of milestones uniformly sampled on obstacle boundaries needed to adequately cover the free space F with high probability, a result similar to the one provided by Kavraki et al. [1] for the uniform sampling method.

¹ The difference is that, in our proof, every point v in the free space sees at least an ε² fraction of the obstacle boundaries, and therefore the probability that k points uniformly sampled on obstacle boundaries cannot see v is (1 − ε²)^k.
4 Conclusion

In this paper we provided some preliminary results, as well as several conjectures, on the upper bound of the boundary size of ε-visible spaces in 2D and 3D. These results can be used to bound the number of guards needed for the boundary art gallery problem. Potentially, they can also be applied to the analysis of a certain class of PRM planners that sample points close to obstacle boundaries.

Acknowledgement. This work is supported by NSF ITR Grant EIA-0086015, DARPA/AFSOR Contract F30602-01-2-0561, NSF EIA-0218376, and NSF EIA-0218359.

References

1. Kavraki, L.E., Latombe, J.C., Motwani, R., Raghavan, P.: Randomized query processing in robot motion planning. In: Proceedings of the 27th Annual ACM Symposium on Theory of Computing. (1995) 353–362
2. Hsu, D., Kavraki, L., Latombe, J.C., Motwani, R., Sorkin, S.: On finding narrow passages with probabilistic roadmap planners. In: Proceedings of the 3rd Workshop on Algorithmic Foundations of Robotics. (1998)
3. Kavraki, L., Latombe, J.C.: Randomized preprocessing of configuration space for fast path planning. In: Proceedings of the 1994 International Conference on Robotics and Automation. (1994) 2138–2145
4. Overmars, M.H., Švestka, P.: A probabilistic learning approach to motion planning. In: Proceedings of the 1st Workshop on Algorithmic Foundations of Robotics. (1994) 19–37
5. Kalai, G., Matoušek, J.: Guarding galleries where every point sees a large area. Israel Journal of Mathematics 101 (1997) 125–139
6. Valtr, P.: Guarding galleries where no point sees a small area. Israel Journal of Mathematics 104 (1998) 1–16
7. Valtr, P.: On galleries with no bad points. Discrete & Computational Geometry 21 (1999) 193–200
8. Kirkpatrick, D.: Guarding galleries with no nooks. In: Proceedings of the 12th Canadian Conference on Computational Geometry. (2000) 43–46
9. Chvátal, V.: A combinatorial theorem in plane geometry. Journal of Combinatorial Theory Series B 18 (1975) 39–41
10. Urrutia, J.: Art gallery and illumination problems. In Sack, J.R., Urrutia, J., eds.: Handbook of Computational Geometry. Elsevier Science Publishers B.V. North-Holland, Amsterdam (2000) 973–1026
11. Lee, D.T., Lin, A.K.: Computational complexity of art gallery problems. IEEE Transactions on Information Theory 32 (1986) 276–282
12. Schuchardt, D., Hecker, H.: Two NP-hard art-gallery problems for ortho-polygons. Mathematical Logic Quarterly 41 (1995) 261–267
13. Ghosh, S.K.: Approximation algorithms for art gallery problems. In: Proceedings of the Canadian Information Processing Society Congress. (1987)
14. Amato, N.M., Bayazit, O.B., Dale, L.K., Jones, C., Vallejo, D.: OBPRM: An obstacle-based PRM for 3D workspaces. In: Proceedings of the 3rd Workshop on Algorithmic Foundations of Robotics. (1998) 155–168
15. Boor, V., Overmars, M.H., van der Stappen, A.F.: The Gaussian sampling strategy for probabilistic roadmap planners. In: Proceedings of the 1999 IEEE International Conference on Robotics and Automation. (1999) 1018–1023

Membrane Computing

Gheorghe Păun
Institute of Mathematics of the Romanian Academy, PO Box 1-764, 70700 Bucureşti, Romania, and
Research Group on Mathematical Linguistics, Rovira i Virgili University, Pl. Imperial Tárraco 1, 43005 Tarragona, Spain
gpaun@imar.ro, gp@astor.urv.es

Abstract. This is a brief overview of membrane computing, about five years after this area of natural computing was initiated.
We informally introduce the basic ideas and the basic classes of membrane systems (P systems), some directions of research that are already well developed (mentioning only some central results or types of results along these directions), as well as several research topics which seem to be of interest.

1 Foreword

Membrane computing is a branch of natural computing which abstracts distributed parallel computing models from the structure and functioning of the living cell. The devices investigated in this framework, called membrane systems or P systems, are both capable of Turing-universal computations and, in certain cases where an enhanced parallelism is provided, able to solve intractable problems in polynomial time (by trading space for time). The domain is well developed at the mathematical level, still waiting for implementations of practical computational interest, but several applications in modelling various biological phenomena (as well as phenomena related to ecology, artificial life, abstract chemistry, even linguistics) have been reported.

At less than five years since the paper [6] was circulated on the Internet, the bibliography of the domain is already large and continuously growing, hence the present survey will only mention the main directions of research and their central results, as well as some topics for further investigation. The goal is to give the reader an idea of what membrane computing deals with, rather than to provide a formal presentation of membrane systems of various types or a list of precise results. Also, we do not give complete references. The domain is rapidly evolving – in particular, several results are repeatedly improved – hence we suggest that the interested reader consult the web page http://psystems.disco.unimib.it for updated details and references. Of special interest can be the collective volumes available on the web page, those devoted to the series of Workshops on Membrane Computing (held in Curtea de Argeş, Romania, in 2000, 2001, and 2002, and in Tarragona, Spain, in 2003), as well as the proceedings of the Brainstorming Week on Membrane Computing, held in Tarragona in February 2003. For a comprehensive introduction to membrane computing one can also use the monograph [7].

2 The Basic Class of P Systems

The fundamental ingredients of a membrane system are the (1) membrane structure and the sets of (2) evolution rules which process (3) multisets of (4) objects placed in the compartments of the membrane structure.

A membrane structure is a hierarchically arranged set of membranes (understood as three-dimensional vesicles), as suggested in Figure 1. We distinguish the external membrane (corresponding to the plasma membrane and usually called the skin membrane) and several internal membranes (corresponding to the membranes present in a cell, around the nucleus, in the Golgi apparatus, in vesicles, etc.); a membrane without any other membrane inside it is said to be elementary. Each membrane determines a compartment, also called a region: the space delimited from above by it and from below by the membranes placed directly inside, if any exist. The correspondence between membranes and regions is one-to-one; that is why we sometimes use these terms interchangeably, and we identify by the same label a membrane and its associated region. (Mathematically, a membrane structure is represented by the unordered tree which describes it, or by a sequence of matching labelled parentheses.)
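To illustrate the parenthesis representation, here is a small sketch. It is our own illustration: the bracket syntax `[<label> ... ]` is a hypothetical ASCII rendering of the usual labelled-parentheses notation, and the particular nesting is likewise hypothetical.

```python
def parse_membranes(text):
    """Parse a labelled-parentheses membrane structure, e.g.
    '[1 [2 ] [3 ] [4 [5 ] [6 [8 ] [9 ] ] [7 ] ] ]', into a nested
    (label, children) tree.  '[<label>' opens a membrane, ']' closes it."""
    root = ("root", [])
    stack = [root]
    for tok in text.split():
        if tok.startswith("["):
            node = (tok[1:], [])
            stack[-1][1].append(node)
            stack.append(node)
        elif tok == "]":
            stack.pop()
    return root[1][0]  # the skin membrane

# A hypothetical nesting with skin membrane 1; membranes 2, 3, 5, 7, 8, 9
# are elementary (they contain no further membranes).
skin = parse_membranes("[1 [2 ] [3 ] [4 [5 ] [6 [8 ] [9 ] ] [7 ] ] ]")
print(skin[0], [child[0] for child in skin[1]])  # 1 ['2', '3', '4']
```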
[Fig. 1. A membrane structure: the skin membrane (labelled 1) encloses internal membranes labelled 2–9, some of them elementary; each membrane delimits a region, and the environment lies outside the skin.]

In the basic variant of P systems, each region contains a multiset of symbol-objects, which correspond to the chemicals swimming in a solution in a cell compartment; these chemicals are considered here as unstructured, which is why we describe them by symbols from a given alphabet.

The objects evolve by means of evolution rules, which are also localized, associated with the regions of the membrane structure. The rules correspond to the chemical reactions possible in the compartments of a cell. The typical form of such a rule is aad → (a, here)(b, out)(b, in), with the following meaning: two copies of object a and one copy of object d react, and the reaction produces one copy of a and two copies of b; the new copy of a remains in the same region (indication here), one of the copies of b exits the compartment (indication out), and the other enters one of the directly inner membranes (indication in). We say that the objects a, b, b are communicated as indicated by the commands associated with them in the right-hand member of the rule. When an object exits a compartment, it goes to the surrounding compartment; in the case of the skin membrane this is the environment, hence the object is "lost": it never comes back into the system. If no inner membrane exists (that is, if the rule is associated with an elementary membrane), then the indication in cannot be followed, and the rule cannot be applied.

The communication of objects through membranes is reminiscent of the fact that biological membranes contain various (protein) channels through which molecules can pass (in a passive way, due to concentration differences, or in an active way, with a consumption of energy), in a rather selective manner.

A rule as above, with several objects in its left-hand member, is said to be cooperative; a particular case is that of catalytic rules, of the form ca → cu, where a is an object and c is a catalyst, appearing only in such rules and never changing. A rule of the form a → u, where a is an object, is called non-cooperative.

The rules associated with a compartment are applied to the objects from that compartment in a maximally parallel way: all objects which can evolve by means of the local rules should do so (we assign objects to rules until no further assignment is possible). The used objects are "consumed", and the newly produced objects are placed in the compartments of the membrane structure according to the communication commands assigned to them. The rules to be used and the objects to evolve are chosen in a nondeterministic manner. In turn, all compartments of the system evolve at the same time, synchronously (a common clock is assumed for all membranes). Thus, we have two levels of parallelism, one at the level of compartments and one at the level of the whole "cell".

A membrane structure and the multisets of objects in its compartments identify a configuration of a P system. By a nondeterministic maximally parallel use of rules as suggested above we pass to another configuration; such a step is called a transition. A sequence of transitions constitutes a computation. A computation is successful if it halts, that is, if it reaches a configuration where no rule can be applied to the existing objects.

With a halting computation we can associate a result in various ways. The simplest possibility is to count the objects present in the halting configuration in a specified elementary membrane; this is called internal output. We can also count the objects which leave the system during the computation, and this is called external output. In both cases the result is a number. If we distinguish among different objects, then we can have as the result a vector of natural numbers.
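As an illustration of the maximally parallel step just described, the sketch below is our own toy rendering: it is restricted to a single region, ignores priorities, catalysts, and membrane operations, and only reports which objects would be sent out or in rather than routing them.

```python
import random
from collections import Counter

# A rule is (lhs, rhs): lhs is a multiset of consumed objects, rhs a list
# of (object, target) commands.  The rule  a a d -> (a,here)(b,out)(b,in)
# from the text becomes:
RULES = [({"a": 2, "d": 1}, [("a", "here"), ("b", "out"), ("b", "in")])]

def applicable(lhs, contents):
    return all(contents[obj] >= k for obj, k in lhs.items())

def max_parallel_step(contents, rules):
    """One maximally parallel rule application in a single region.
    Objects are assigned to nondeterministically chosen applicable rules
    until no rule applies; products appear only after the step ends."""
    produced, sent_out, sent_in = Counter(), [], []
    while True:
        usable = [r for r in rules if applicable(r[0], contents)]
        if not usable:
            break  # no further assignment possible: maximality reached
        lhs, rhs = random.choice(usable)  # nondeterministic choice
        for obj, k in lhs.items():
            contents[obj] -= k            # consume the reactants
        for obj, target in rhs:
            if target == "here":
                produced[obj] += 1
            elif target == "out":
                sent_out.append(obj)
            else:
                sent_in.append(obj)
    contents.update(produced)             # products become visible now
    return contents, sent_out, sent_in

region = Counter({"a": 4, "d": 2})
print(max_parallel_step(region, RULES))
# Two parallel applications: two b's sent out, two b's sent in.
```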
A computation is successful if it halts, that is, if it reaches a configuration where no rule can be applied to the existing objects. With a halting computation we can associate a result in various ways. The simplest possibility is to count the objects present in the halting configuration in a specified elementary membrane; this is called internal output. We can also count the objects which leave the system during the computation, and this is called external output. In both cases the result is a number. If we distinguish among different objects, then we can have as the result a vector of natural numbers. The objects which leave the system can also be arranged in a sequence according to the moments when they exit the skin membrane, and in this case the result is a string. This last possibility is worth emphasizing because of the qualitative difference between the data structure used inside the system (multisets of objects, hence numbers) and the data structure of the result, which is a string: it carries positional information, a syntax. A string can also be obtained by following the trace of a distinguished object (a "traveller") through the membranes.

Because of the nondeterminism in the application of rules, starting from an initial configuration we can get several successful computations, hence several results. Thus, a P system computes (one also says generates) a set of numbers, or a set of vectors of numbers, or a language.

We stress the fact that the data structure used in this basic type of P systems is the multiset (of symbols), hence membrane computing can be considered a biologically inspired algorithmic framework for processing multisets (in a distributed, parallel, nondeterministic manner). Moreover, the main type of evolution rule is the rewriting-like rule. Thus, membrane computing has natural connections with many areas of (theoretical) computer science: formal languages (L systems, commutative languages, formal power series, grammar systems, regulated rewriting), automata theory, DNA (more generally: molecular) computing, the chemical abstract machine, the Gamma language, Petri nets, complexity theory, etc.

3 Further Ingredients

With motivations coming from biology (trying to have systems as faithful as possible to the cell's structure and functioning), from computer science (looking for computationally powerful and/or efficient models), or from mathematics (minimalistic models, even if not realistic, are more elegant, challenging, appealing), many types of P systems have been introduced and investigated. The number of features considered in this framework is very large. For instance, we can add a partial order relation to each set of rules, interpreted as a priority relation among rules (this corresponds to the fact that certain reactions are more likely to occur – are more active – than others), and in this way the nondeterminism is decreased. The rules can also have effects other than changing the multisets of objects; namely, they can control the membrane permeability (this corresponds to the fact that the protein channels in cell membranes can sometimes be closed, e.g., when an undesirable substance should be kept isolated, and reopened when the "poison" vanishes). If a membrane is non-permeable, then no rule which asks for passing an object through it can be used. In this way, the processes taking place in a membrane system can be controlled ("programmed").
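The priority relation just mentioned can be grafted onto the earlier sketch with a one-line change of the rule-selection policy. This is our own reading of priorities (a rule may fire only if no higher-priority rule is applicable); weaker interpretations also appear in the literature.

# Priorities as a tweak to the earlier sketch: RULES is now ordered
# from highest to lowest priority, and a rule may be chosen only if
# no higher-priority rule is currently applicable.
def applicable_with_priority(objects):
    for rule in RULES:                 # RULES ordered by priority
        if applicable(rule, objects):
            return [rule]              # only the strongest rule fires
    return []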
In particular, membranes can be dissolved (all objects and membranes of a dissolved membrane are left free in the surrounding compartment; the skin membrane is never dissolved, because this would destroy the "computer"; the rules of the dissolved membrane are removed, as they are supposed to be specific to the reaction conditions of the former compartment, hence they cannot be applied in the upper compartment, which has its own rules), created, and divided (as in biology: when a membrane is divided, its contents are replicated in the newly obtained membranes).

Furthermore, the rules can be used in a conditional manner, depending on the contents of the region where they are applied. The conditions can be of a permitting-context type (a rule is applied only if certain associated objects are present) or of a forbidding-context type (a rule is applied only if certain associated objects are not present). This, too, recalls biological facts: the promoters and inhibitors which regulate many biochemical reactions. Several other ingredients can be considered, but we do not enter into details here.

4 Processing Structured Objects

The case of symbol-objects corresponds to a level of "zooming into" the cell where we distinguish the internal compartmentalization and the chemicals in the compartments, but not the structure of these chemicals. However, most of the molecules present in a cell have a complex structure, and this observation makes it necessary to also consider structured objects in P systems. A particular case of interest is that where the chemicals can be described by strings (this is the case with DNA, RNA, etc.).

String-objects have been considered in membrane systems from the very beginning. There are two possibilities: to work with sets of strings (hence languages, in the usual sense) or with multisets of strings, where we count the different copies of the same string. In both cases we need evolution rules based on string-processing operations, while the second case makes necessary the use of operations which increase and decrease the number of (copies of) strings. Among the operations used in this framework, the basic ones are rewriting and splicing (well known in DNA computing: two strings are cut at specific sites and the fragments are recombined), but less common operations have also been used, such as rewriting with replication, splitting, conditional concatenation, etc.

The next step is to consider trees or arbitrary graphs as objects, with corresponding operations, then two-dimensional arrays, or even more complex pictures. The bibliography on the web page mentioned above contains titles which refer to all these possibilities.

A common feature of the membrane systems which work with strings or with more complex objects is that the halting condition can be avoided when defining the successful computations and their results: a number is not "completely computed" until the computation is finished, since it can grow at any further step, but a string sent out of the system at any time remains unchanged, irrespective of whether or not the computation continues. Thus, if we compute/generate languages, then the powerful "programming technique" of the halting condition can be dispensed with (this is also biologically motivated, as, in general, biological processes aim to last as long as possible, not to reach a "dead state").
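As an illustration of the splicing operation mentioned above, here is a minimal sketch. We assume the common reading of a splicing rule (u1, u2; u3, u4): one string is cut between u1 and u2, the other between u3 and u4, and the prefixes are recombined with the opposite suffixes. The function name and the flat string encoding are ours.

def splice(x, y, rule):
    """Apply a splicing rule (u1, u2, u3, u4) to strings x and y:
    x = x1 u1 u2 x2 and y = y1 u3 u4 y2 yield x1 u1 u4 y2 and,
    symmetrically, y1 u3 u2 x2. Returns all results for all cut sites."""
    u1, u2, u3, u4 = rule
    out = []
    i = x.find(u1 + u2)
    while i != -1:
        j = y.find(u3 + u4)
        while j != -1:
            out.append(x[:i + len(u1)] + u4 + y[j + len(u3) + len(u4):])
            out.append(y[:j + len(u3)] + u2 + x[i + len(u1) + len(u2):])
            j = y.find(u3 + u4, j + 1)
        i = x.find(u1 + u2, i + 1)
    return out

# Example: cut x between 'a' and 'b', cut y between 'c' and 'b'.
print(splice("xaby", "ucbv", ("a", "b", "c", "b")))   # ['xabv', 'ucby']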
5 Universality

From a computability point of view, it is quite interesting that many types of P systems (that is, many combinations of ingredients such as those described in the previous sections), of rather restricted forms, are computationally universal. In the case when numbers are computed, this means that these systems can compute all Turing-computable sets of natural numbers. When the result of a computation is a string or a set of strings, we get characterizations of the family of recursively enumerable languages. This is true even for systems with simple rules (catalytic) and a very small number of membranes (most of the universality results recalled in [7] refer to systems with fewer than five membranes).

The proof techniques frequently used in such universality results are based on the universality of matrix grammars with appearance checking (in certain normal forms) or on the universality of register machines – and this is rather interesting, as both these machineries are "old stuff" in computer science, having been well investigated three to four decades ago (in both cases, improvements of the old results became necessary, motivated by the applications to membrane computing; for instance, new normal forms for matrix grammars, sharper than those known from the literature, were recently proved).

The abundance of universality results obtained in membrane computing shows, on the one hand, that "the cell is a powerful computer"; on the other hand, it asks for an "explanation" of this phenomenon. Roughly speaking, the explanation lies in the fact that Turing computability is based on the possibility of using an arbitrarily large workspace, and this means really using it, that is, controlling all this space, sending messages over an arbitrary distance (in general, this can be reformulated as context-sensitivity); besides context-sensitivity, the possibility of erasing is essential. Membrane systems possess erasing by definition (sending objects to the environment or to a "garbage collector" membrane can mean erasing), while the synchronized use of rules (the maximal parallelism), together with the compartmentalization and the halting condition, provides "sufficient" context-sensitivity. Thus, universality is to be expected; the only challenge is to obtain it with systems having a small number of membranes and features as restricted as possible. For instance, using catalytic rules which also have an associated priority relation, it is rather easy to get universality; it is not so easy to replace the priority with the possibility of controlling the membrane permeability, but this can be done. However, it is surprising to get universality by using catalytic rules only, with no other ingredient. An additional problem concerns the number of catalysts. The initial proof (by P. Sosik) of the universality of catalytic P systems used eight catalysts; then the number was decreased to six, then to five (R. Freund and P. Sosik); it was shown that one catalyst does not suffice (O.H. Ibarra et al.), but the question of the optimal result from this point of view remains open. Similar "races" for the best result can be found concerning the number of membranes for various other types of P systems (just one example: for a while, matrix grammars without appearance checking were simulated by rewriting string-object P systems with four membranes, but recently the result was improved to three – M. Madhu – without knowing whether this is optimal).
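The register-machine route to universality can be made concrete with a toy translation. The encoding below is only one common flavor of such simulations, written by us for illustration: the current instruction label is an object q_i, register r holds value n as n copies of an object r, and an ADD instruction becomes a non-cooperative rule q_i → q_j r. (Faithfully simulating SUB instructions with zero-test is exactly where priorities, catalysts, or other ingredients are needed; that part is omitted here.)

from collections import Counter

# Toy encoding: instruction (i, 'ADD', r, j) becomes the rule
# q_i -> q_j r  (increment register r, jump to label j).
PROGRAM = [(0, "ADD", "r1", 1), (1, "ADD", "r2", 2)]  # label 2 halts

def run(program, registers=("r1", "r2")):
    state = Counter({"q0": 1})
    rules = {f"q{i}": Counter({f"q{j}": 1, r: 1})
             for (i, _, r, j) in program}
    while True:
        label = next(o for o in state if o.startswith("q"))
        if label not in rules:         # no rule for the halt label
            return {r: state[r] for r in registers}
        state -= Counter({label: 1})
        state += rules[label]

print(run(PROGRAM))   # {'r1': 1, 'r2': 1}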
6 Computing by Communication Only

Chemicals do not always pass through membranes alone; a coupled transport is often encountered, where two solutes pass together through a protein channel, either in the same direction or in opposite directions. In the first case the process is called symport, in the latter case antiport. For completeness, uniport names the case when a single molecule passes through a membrane.

The idea of coupled transport can be captured in membrane-computing terms in a rather easy way: for the symport case, consider rules of the form (ab, in) or (ab, out), while for the antiport case write (a, out; b, in), with the obvious meaning. Mathematically, we can generalize this idea and consider rules which move arbitrarily many objects through a membrane.

The use of such rules suggests a very interesting question (research topic): can we compute only by communication, only by transferring objects through membranes? This question leads to systems which contain only symport/antiport rules, which only change the places of objects, not their "names" (no object is created or destroyed). One starts with (finite) multisets of objects placed in the regions of the system, and with certain objects available in the environment in arbitrarily many copies (the environment is an inexhaustible provider of "raw materials"; otherwise we could only deal with the finite number of objects given at the beginning; note that by symport and/or antiport rules associated with the skin membrane we can bring objects from the environment into the system); the symport/antiport rules associated with the membranes are used in the standard nondeterministic maximally parallel manner – and in this way we get a computation.

Note that such systems have several interesting properties, besides the fact that they compute by communication only: the rules are directly inspired by biology, the environment takes part in the process, nothing is created, nothing is destroyed, hence the conservation law is observed – and all these features are rather close to reality. Surprising at first sight, but expected in view of the context-sensitivity and erasing possibilities available in symport/antiport P systems, these systems are again universal, even when using a small number of membranes and symport and/or antiport rules of small "weights" (the weight of a rule is the number of objects it involves).

7 P Automata

Up to now we have discussed only P systems which behave like a grammar: one starts from an initial configuration and evolves according to the given evolution rules, collecting some results, numbers or strings, in a specified membrane or in the environment. An automata-like behavior is also possible, especially in the case of systems using only symport/antiport rules. For instance, we can say that a string is accepted by a P system if it consists of the symbols brought into the system during a halting computation (we can imagine a tape present in the environment, whose symbols are taken by symport or antiport rules and introduced into the system; if the computation halts, then the contents of the tape is accepted). This simple and natural definition was considered by R. Freund and M. Oswald. More automata ingredients were considered by E. Csuhaj-Varju and G.
Vaszil (the contents of the regions are considered states, which control the computation, while only symport rules of the form (x, in) are used, hence the communication is one-way; further features are considered, but we omit them here), and by K. Krithivasan, M. Mutyam, and S.V. Varma (special objects are used, playing the role of states, which raises interesting questions concerning the minimisation of P automata, both with respect to the number of membranes and to the number of states). The next step is to consider not only an input but also an output of a P system, and this step was also taken, by considering P transducers (G. Ciobanu, Gh. Păun, and Gh. Ştefănescu).

As expected, also in the case of P automata (and P transducers) we get universality: the recursively enumerable languages (respectively, the Turing translations) are characterized in all the circumstances mentioned above, always with systems of a reduced size.

8 Computational Efficiency

Computational power is only one criterion for assessing the quality of a new computing machinery; from a practical point of view, at least equally important is the efficiency of the new device. P systems display a high degree of parallelism. Moreover, at the mathematical level, rules of the form a → aa are allowed, and by iterating such rules we can produce an exponential number of objects in linear time. Parallelism and the possibility of producing an exponential workspace are standard ways to speed up computations. In the general framework of P systems with symbol-objects (and without membrane division or membrane creation) these ingredients do not suffice to solve computationally hard problems (e.g., NP-complete problems) in polynomial time: in [11] it is proved that any deterministic P system can be simulated by a deterministic Turing machine with a linear slowdown.

However, pleasantly enough, if additional features are considered, either able to provide enhanced parallelism (for instance, membrane division, which may produce exponentially many membranes in linear time) or to better structure the multisets of objects (membrane creation), then NP-complete problems can be solved in polynomial (often linear) time. The procedure is as follows (it has some specific features, slightly different from standard computational complexity requirements). Given a decision problem, we construct in polynomial time a family of P systems (each of polynomial size) which solves the instances of the problem in the following sense: within a well-specified time, bounded by a given function, the system corresponding to the instances of a given size will send to its environment a special object yes if and only if the instance of the problem introduced into the initial configuration of the system has a positive answer. During the computation, the system can grow exponentially (in the number of objects and/or the number of membranes) and can work in a nondeterministic manner; what is important is that it always halts. Standard problems for illustrating this approach are SAT (satisfiability of propositional formulas in conjunctive normal form) and HPP (the existence of a Hamiltonian path in a directed graph), but many other problems have also been considered. Details can be found in [7] and [9].
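The membrane-division approach to SAT can be mimicked on a sequential computer, at the cost of the exponential blow-up happening in time rather than in parallel space. The sketch below is our own simplification: each "membrane" is just a partial truth assignment, the division phase doubles the population once per variable, and a final checking phase decides whether the object yes would be sent out.

def sat_by_division(num_vars, clauses):
    """clauses: list of lists of nonzero ints; v means x_v, -v means
    not x_v (DIMACS style). Mimics n division steps followed by a
    check; a real P system does the doubling in parallel, in n steps."""
    membranes = [{}]                       # one initial membrane
    for v in range(1, num_vars + 1):       # division phase
        membranes = [{**m, v: b} for m in membranes
                     for b in (False, True)]
    def satisfied(m):
        return all(any(m[abs(l)] == (l > 0) for l in c) for c in clauses)
    # 'yes' is sent out iff some membrane survives the checking phase
    return "yes" if any(satisfied(m) for m in membranes) else "no"

# (x1 or not x2) and (x2 or x3)
print(sat_by_division(3, [[1, -2], [2, 3]]))   # yes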
There is an interesting point here: we have said that the family of P systems solving a given problem is constructed in polynomial time, but this does not necessarily mean that the construction is uniform: it may start not from n but from the n-th instance of the problem. Because the construction (done by a Turing machine) takes polynomial time, it is honest: it cannot hide the solution of the problem inside the very system which solves the problem. This "semi-uniformity" (we may call it fairness, or honesty) is usual in molecular computing. However, if we insist on having uniform constructions in the classic sense of complexity theory, then this can also be obtained in many cases. A series of results in this direction were obtained by the Sevilla membrane computing group (M.J. Perez-Jimenez, A. Romero-Jimenez, F. Sancho-Caparrini, etc.).

Recently, a surprising result was reported by P. Sosik: P systems with membrane division can also solve in polynomial time problems known to be PSPACE-complete. P. Sosik has shown this for QBF (satisfiability of quantified propositional formulas). The family of P systems used in the proof is constructed in the semi-uniform manner mentioned above, and the systems use the division operation not only for elementary membranes but also for arbitrary membranes. It is an open problem whether the result can be improved in these two respects.

All the previous remarks refer to P systems with symbol-objects. Polynomial (often linear) solutions to NP-complete problems can also be obtained in the framework of string-objects, for instance when string replication is used to obtain an exponential workspace.

9 Recent Research Topics

The two types of attractive results mentioned in the previous sections – computational universality and computational efficiency – as well as the versatility of P systems explain the very rapid development of the membrane computing area. Besides the topics discussed above, many others have been investigated (normal forms concerning the shape of the membrane structure and the number and type of rules used, decidability problems, links with Eilenberg X machines, parallel rewriting of string-objects, ways to avoid communication deadlock in this case, associating energy with objects or with reactions, and so on), but we do not enter into details here. Instead, we briefly mention some topics considered recently, some of which promise to open new research vistas in membrane computing.

A P system is a computing model, but at the same time it is a model of a cell, however reductionistic it may be in a given form; hence one can consider its evolution, its "life", as the main topic of investigation, and not a number/string produced at the end of a halting computation. This leads to interpreting P systems as dynamic systems, possibly evolving forever, and this viewpoint raises specific questions, different from the computer-science ones. Such an approach (P systems as dynamic systems) was initiated by V. Manca and F. Bernardini, and promises to be of interest for biological applications (see also the next section).

At the theoretical level, a fruitful recent idea is to associate with a P system (with string-objects) not just one language, as usual for grammar- or automata-like devices, but a family of languages. This recalls the "old" idea of grammar forms, but also the forbidding-enforcing systems of [3]. Actually, M. Cavaliere and N.
Jonoska have started from such a possible bridge between forbidding-enforcing systems and membrane systems, considering P systems with a way of defining the new populations of strings in terms of forbidding-enforcing conditions. A different idea for defining a family of languages "generated" by a P system was followed by A. Alhazov.

Returning to the abundance of universality results, which somehow ends the research interest in the respective classes of P systems (the equivalence with Turing machines directly implies conclusions regarding decidability, complexity, closure properties, etc.), a related question of interest is to investigate the sub-universal classes of P systems. For instance, several universality results refer to systems with arbitrary catalytic rules (of the form ca → cu) used together with non-catalytic rules; also, a given number of membranes is necessary (although in many cases one does not know the sharp borderline between universality and sub-universality from this point of view). What about the power and the properties of P systems which are not universal? Some problems are known to be decidable for them; what is the complexity of these problems? What are the closure properties of the associated families of numbers or of languages? Topics of this type have been addressed from time to time, but recently O.H. Ibarra and his group have started a systematic study, considering both new (restricted) classes of P systems and new problems (e.g., the reachability of a configuration and the complexity of deciding it).

Rather promising seems to be the use of P systems for handling two-dimensional objects. There are several papers in this area, dealing with graphs, arrays, and other types of pictures (R. Freund, M. Oswald, K. Krithivasan and her group, R. Ceterchi, R. Gramatovici, N. Jonoska, K.G. Subramanian, etc.). Especially interesting is the following idea (suggested several times in membrane computing papers and now pursued by R. Ceterchi and her colleagues in Tarragona): instead of using the membrane structure as the support of a computation whose "main" subjects are the objects present in the regions of the membrane structure, take the tree which describes the membrane structure as the subject of the computation, and use the contents of the regions as auxiliary tools in the computation.

A very important direction of research – important especially from the point of view of applications in biology and related areas – is to bring to membrane computing some approximate-reasoning tools, some non-crisp mathematics, in the sense of probability theory, fuzzy sets, or rough sets – or a mixture of all these. Randomized P algorithms, which solve hard problems in polynomial time, using polynomial space, with a controlled probability, have already been proposed by A. Obtulowicz, who has also started a systematic study of the possibility of modelling uncertainty in membrane computing.

It is highly probable that all these topics will be intensively investigated in the near future, with a special emphasis on complexity matters and on issues related to applications, to the adequacy of membrane computing to biological reality.

10 Implementations and Applications

Some branches of natural computing, such as neural computing and evolutionary computing, start from biology and try to improve the way we use existing electronic computers, while DNA computing has the ambition of finding a new support for computations, a new hardware.
For membrane computing it is not yet clear in which direction one should look for implementations. In any case, it seems too early to try to implement computations at the level of a cell, however attractive this may be. However, there have been several attempts to implement (actually, to simulate) P systems on usual computers. Of course, the biochemically inspired nice features of P systems (in particular, the nondeterminism and the parallelism) are lost, as they can only be simulated on deterministic sequential computers, but the resulting simulators can still be useful for certain practical purposes (not to mention their didactic usefulness). At this moment, at least a dozen programs for implementing P systems of various types have been reported – see the references on the web page, where some of the programs are also available.

On the other hand, several applications of membrane computing have been reported in the literature, in general of the following type: one takes a piece of reality, most frequently from cell biology, but also from artificial life, abstract chemistry, or the biology of ecosystems; one constructs a P system modelling this piece of reality; then one writes a program which simulates this P system and runs experiments, carefully tuning the system parameters (especially the form of the rules and their probabilities of being applied); statistics about the populations of objects in the various compartments of the system are obtained, sometimes suggesting interesting conclusions. Typical examples can be found in [1] (including an approach to the famous Brusselator model, with conclusions which fit the known ones obtained by using continuous mathematics – by Y. Suzuki et al.; an investigation of photosynthesis – by T. Nishida; signaling pathways and T cell activation – by G. Ciobanu and his collaborators). Several other (preliminary) applications of P systems to cryptography, linguistics, and distributed computing can be found in the volumes [1,8], while [2] contains a promising application to writing sorting algorithms.

The turn of the domain towards applications in biology is rather natural: P systems are (discrete, algorithmic, well-investigated) models of the cell, and cell biologists lack efficient global models of the cell, despite the fact that modelling and simulating the living cell is a very important task (as has been stated in several places, this is one of the main challenges of bioinformatics at this beginning of the millennium).

11 Final Remarks

At the end of this brief and informal excursion into membrane computing, we stress that our goal was only to give a general impression of this fast-growing research area; hence we strongly suggest that the interested reader access the web page mentioned in the first section of the paper for any additional information. The page contains the full current bibliography, many downloadable papers, the addresses of people who have contributed to membrane computing, lists of open problems, calls for participation in related meetings, some software for simulating P systems, etc.

References

1. C.S. Calude, Gh. Păun, G. Rozenberg, A. Salomaa, eds., Multiset Processing. Mathematical, Computer Science, and Molecular Computing Points of View, Lecture Notes in Computer Science, 2235, Springer, Berlin, 2001.
2. M. Cavaliere, C. Martin-Vide, Gh. Păun, eds., Proceedings of the Brainstorming Week on Membrane Computing, Tarragona, February 2003, Technical Report 26/03, Rovira i Virgili University, Tarragona, 2003.
3.
A. Ehrenfeucht, G. Rozenberg, Forbidding-enforcing systems, Theoretical Computer Science, 292 (2003), 611–638.
4. O.H. Ibarra, On the computational complexity of membrane computing systems, submitted, 2003.
5. K. Krithivasan, S.V. Varma, On minimising finite state P automata, submitted, 2003.
6. Gh. Păun, Computing with membranes, Journal of Computer and System Sciences, 61, 1 (2000), 108–143.
7. Gh. Păun, Computing with Membranes: An Introduction, Springer, Berlin, 2002.
8. Gh. Păun, G. Rozenberg, A. Salomaa, C. Zandron, eds., Membrane Computing. International Workshop, WMC-CdeA 2002, Curtea de Argeş, Romania, Revised Papers, Lecture Notes in Computer Science, 2597, Springer, Berlin, 2003.
9. M. Perez-Jimenez, A. Romero-Jimenez, F. Sancho-Caparrini, Teoría de la Complejidad en Modelos de Computación Celular con Membranas, Editorial Kronos, Sevilla, 2002.
10. P. Sosik, The computational power of cell division in P systems: Beating down parallel computers?, Natural Computing, 2003 (in press).
11. C. Zandron, A Model for Molecular Computing: Membrane Systems, PhD Thesis, Università degli Studi di Milano, 2001.

Classical Simulation Complexity of Quantum Machines

Farid Ablayev and Aida Gainutdinova

Dept. of Theoretical Cybernetics, Kazan State University, 420008 Kazan, Russia
{ablayev,aida}@ksu.ru
(Supported by the Russia Fund for Basic Research under grant 03-01-00769.)

Abstract. We present a classical probabilistic simulation technique for quantum Turing machines. As corollaries of this technique we obtain several results on the relationship between classical and quantum complexity classes, such as $PrQP = PP$, $BQP \subseteq PP$, and $PrQSPACE(S(n)) = PrSPACE(S(n))$.

1 Introduction

Investigation of the different aspects of quantum computation has, in the last decade, become a very intensively growing area of mathematics, computer science, physics, and technology. A good source of information on quantum computation is Nielsen's and Chuang's book [8].

Notice that quantum mechanics and quantum computing traditionally use a "right-left" presentation of the computational process: the current general state of a quantum system is presented as a column vector $|\psi\rangle$, which is multiplied by a unitary transition matrix $U$ to obtain the next general state $|\psi'\rangle = U|\psi\rangle$. In this paper we use a "left-right" presentation of the quantum computational process (as is customary for presenting classical deterministic and stochastic computational processes). That is, the current general state of the quantum system is presented as a row vector $\langle\psi|$ (the elements of $\langle\psi|$ are the complex conjugates of the elements of $|\psi\rangle$), which is multiplied by the unitary transition matrix $W = U^{\dagger}$ to obtain the next general state $\langle\psi'| = \langle\psi|W$.

In the paper we consider probabilistic and quantum complexity classes. Here $BQSpace(S(n))$ and $PrQSpace(S(n))$ stand for the complexity classes determined by $O(S(n))$ space-bounded quantum Turing machines that recognize languages with bounded and unbounded error, respectively. $PrSpace(S(n))$ stands for the complexity class determined by $O(S(n))$ space-bounded classical probabilistic Turing machines that recognize languages with unbounded error. $BQTime(T(n))$ and $PrQTime(T(n))$ stand for the complexity classes determined by $O(T(n))$ time-bounded quantum Turing machines that recognize languages with bounded and unbounded error, respectively.
$PrTime(T(n))$ stands for the complexity class determined by $O(T(n))$ time-bounded classical probabilistic Turing machines that recognize languages with unbounded error. We assume $T(n) \geq n$ and $S(n) \geq \log n$ are fully time and space constructible, respectively. For most of the paper we refer to the polynomial-time case, where $T(n) = S(n) = n^{O(1)}$.

Classical simulations of quantum computational models use different techniques; see for example [3,9,10,6,7]. In our paper we view the computation process of classical one-tape probabilistic Turing machines (PTM) and quantum Turing machines (QTM) as a linear process. That is, a computation on a PTM for a particular input $u$ is a Markov process, in which the vector of the probability distribution over configurations at a given step is multiplied by a fixed stochastic transition matrix $M$ to obtain the vector of the probability distribution over configurations at the next step. A computation on a QTM is a unitary-linear process similar to the Markov process: a quantum computation step corresponds to multiplying the general state (the vector of the amplitude distribution over all possible configurations) at the current step by a fixed complex unitary transition matrix to obtain the general state at the next step. We refer to the paper [6] for more information.

In the paper we present the classical Simulation Theorem 2 (a simulation technique for the quantum computation process), which states that given a unitary-linear process we can construct an equivalent (in the sense of language presentation) Markov process. This simulation technique allows us to gather together different complexity results on the classical simulation of quantum computations. As a corollary of Theorem 2 we have the following relations among complexity classes.

Theorem 1.
1. $PrQTime(T(n)) = PrTime(T(n))$; in particular $PrQP = PP$.
2. $BQTime(T(n)) \subseteq PrTime(T(n))$ [1]; in particular $BQP \subseteq PP$.
3. $BQSpace(S(n)) \subseteq PrSpace(S(n))$ [10] and $PrQSpace(S(n)) = PrSpace(S(n))$ [10].

Proof (Sketch): The quantum simulation technique for classical probabilistic Turing machines is well known; see for example [5,8]. This technique establishes the inclusions $PrSpace(S(n)) \subseteq PrQSpace(S(n))$ and $PrTime(T(n)) \subseteq PrQTime(T(n))$. The Simulation Theorem 2 and the observation in Section 4 prove the inclusions $BQTime(T(n)) \subseteq PrTime(T(n))$, $PrQTime(T(n)) \subseteq PrTime(T(n))$, $BQSpace(S(n)) \subseteq PrSpace(S(n))$, and $PrQSpace(S(n)) \subseteq PrSpace(S(n))$.

2 Classical Simulation of Quantum Turing Machines

We consider a two-tape Turing machine (probabilistic or quantum) with a read-only input tape and a read-write work tape. We call a Turing machine M a t(n)-time, s(n)-space machine if every computation of M on an input of length n halts in at most t(n) steps and uses at most s(n) cells of the read-write tape during the computation. We assume t(n) ≥ n and s(n) ≥ log n are fully time and space constructible, respectively. We will always have $s(n) \leq t(n) \leq 2^{O(s(n))}$. By a configuration C of a Turing machine we mean the content of its read-write tape, the tape pointers, and the current state of the machine.

Definition 1. A probabilistic Turing machine (PTM) P consists of a finite set Q of states, a finite input alphabet Σ, a finite tape alphabet Γ, and a transition function
$\delta : Q \times \Sigma \times \Gamma \times Q \times \Gamma \times \{L,R\} \times \{L,R\} \to [0,1]$,
where $\delta(q, \sigma, \gamma, q', \gamma', d_1, d_2)$ gives the probability with which the machine in state q reading σ and γ will enter state q′, write γ′, and move in directions $d_1$ and $d_2$ on the read and read-write tapes, respectively.
Definition 2. A quantum Turing machine (QTM) Q consists of a finite set Q of states, a finite input alphabet Σ, a finite tape alphabet Γ, and a transition function
$\delta : Q \times \Sigma \times \Gamma \times Q \times \Gamma \times \{L,R\} \times \{L,R\} \to \mathbb{C}$,
where $\mathbb{C}$ is the set of complex numbers, and $\delta(q, \sigma, \gamma, q', \gamma', d_1, d_2)$ gives the amplitude with which the machine in state q reading σ and γ will enter state q′, write γ′, and move in directions $d_1$ and $d_2$ on the read and read-write tapes, respectively.

Vector-Matrix Machine. From now on we view a Turing machine computation as a linear process, as described in [6]. Below we present a formal description of probabilistic and quantum machines in matrix form. For fairness we should only allow efficiently computable matrix entries, where we can compute the i-th bit in time polynomial in i.

First we define a general d-dimensional, t-time "vector-matrix machine" ((d,t)-VMM) that serves our needs for a linear presentation of the computation procedure of probabilistic and quantum machines. Fix an input u. Then
$VMM(u) = \langle\, \langle a(0)|, T, F \,\rangle$,
where $\langle a(0)| = (a_1, \dots, a_d)$ is an initial row vector for the input u, T is a $d \times d$ transition matrix, and $F \subseteq \{1, \dots, d\}$ is an accepting set of states. VMM(u) proceeds in t steps as follows: in each step i the current vector $\langle a(i)|$ is multiplied by the $d \times d$ matrix T to obtain the next vector, that is, $\langle a(i+1)| = \langle a(i)|\, T$. From the resulting vector $\langle a(t)|$ we determine the numbers $Pr^1_{accept}(u)$ and $Pr^2_{accept}(u)$ as follows:

1. $Pr^1_{accept}(u) = \sum_{i \in F} |a_i(t)|$;
2. $Pr^2_{accept}(u) = \sum_{i \in F} |a_i(t)|^2$.

These numbers will express the probability of accepting u for probabilistic and quantum machines, respectively. We call a VMM(u) that uses $Pr^1(VMM(u))$ (respectively $Pr^2(VMM(u))$) as its acceptance probability a Type I VMM(u) (respectively a Type II VMM(u)).

Linear Presentation of a Probabilistic Machine. Let P be a t(n)-time, s(n)-space PTM. The computation of P on an input u of length n can be presented by a finite Markov chain with $d(n) = 2^{O(s(n))}$ states (the states of this Markov chain correspond to configurations of the PTM) and a $d(n) \times d(n)$ stochastic matrix M. Notice that for a polynomial-time computation, given configurations $C_i$, $C_j$ and input u, one can compute in polynomial time the probability M(i,j) of the transition from $C_i$ to $C_j$, even though the whole transition matrix M is too big to write down in polynomial time. Formally, the computation on input u, |u| = n, can be described by a stochastic machine
$SM(u) = \langle\, \langle p(0)|, M, F \,\rangle$,
where SM is a Type I (d(n), t(n))-VMM with the following restrictions: $\langle p(0)| = (p_1, \dots, p_{d(n)})$ is the stochastic row vector of the initial probability distribution over configurations, that is, $p_i = 1$ and $p_j = 0$ for $j \neq i$, where $C_i$ is the initial configuration of P on input u; M is the stochastic matrix defined above; and $F \subseteq \{1, \dots, d(n)\}$ is the set of indices of the accepting configurations of P.

Linear Presentation of a Quantum Machine. Consider a t(n)-time, s(n)-space QTM Q. The computation of Q on an input u of length n can be presented by the following restricted quantum system (a unitary-linear process) with $d(n) = 2^{O(s(n))}$ basis states corresponding to configurations of the QTM and a $d(n) \times d(n)$ complex-valued unitary matrix W. Notice that for a polynomial-time computation, given configurations $C_i$, $C_j$ and input u, one can compute in polynomial time the amplitude W(i,j) of the transition from $C_i$ to $C_j$, as for the PTM. Formally, the computation on input u, |u| = n, can be described by a linear machine
$LM(u) = \langle\, \langle \mu(0)|, W, F \,\rangle$,
where LM(u) is a Type II (d(n), t(n))-VMM with the following restrictions: $\langle \mu(0)| = (z_1, \dots, z_{d(n)})$ is the initial general state (the complex row vector of the initial amplitude distribution over configurations); namely, $z_j = 0$ for $j \neq i$ and $z_i = 1$, where $C_i$ is the initial configuration of Q on input u; W is the unitary matrix defined above; and $F \subseteq \{1, \dots, d(n)\}$ is the set of indices of the accepting configurations of Q.
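A (d,t)-VMM is straightforward to run numerically. The sketch below (ours, using numpy) iterates the row-vector/matrix product for t steps and reads off both acceptance values; Type I with a stochastic matrix corresponds to the probabilistic machine, Type II with a unitary matrix to the quantum one.

import numpy as np

def run_vmm(a0, T, F, t, vmm_type):
    """Run a (d, t)-VMM: a0 is the initial row vector, T the d x d
    transition matrix, F the accepting index set."""
    a = np.asarray(a0, dtype=complex)
    for _ in range(t):
        a = a @ T                      # one step: <a(i+1)| = <a(i)| T
    mass = np.abs(a[list(F)])
    return mass.sum() if vmm_type == 1 else (mass ** 2).sum()

# Type I: a 2-state Markov chain; Type II: a unitary "rotation".
M = np.array([[0.5, 0.5], [0.0, 1.0]])
W = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
print(run_vmm([1, 0], M, {1}, t=3, vmm_type=1))   # 0.875
print(run_vmm([1, 0], W, {1}, t=1, vmm_type=2))   # 0.5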
Formally computation on input u, |u| = n, can be described by linear machine LM (u) LM (u) = µ(0)|, W, F  where LM (u) is Type II (d(n), t(n)) − VMM with the following restrictions: µ(0)| = (z1 , . . . , zd(n) ) is the initial general state (complex row-vector of initial amplitudes distribution of configurations). Namely, zj = 0 for j = i and zi = 1 where Ci is the initial configuration of Q for the input u. W is the unitary matrix defined above. F ⊆ {1, . . . , d(n)} is a set of indexes of accepting configurations of Q. Language Acceptance Criteria. We use standard unbounded error and bounded error acceptance criteria. For a language L, for an n ≥ 1 denote Ln = L ∩ Σ n . We say that language Ln is unbounded error recognized by Type I (Type II) (d(n), t(n))−VMM if for arbitrary input u ∈ Σ n there exists Type I (Type II) (d(n), t(n))−VMM (u) such that it is holds that P r(VMM (u)) > 1/2 for u ∈ Ln and P r(VMM (u)) < 1/2 for u ∈ Ln . Similarly we say that language Ln is (d(n), t(n))−VMM bounded error recognized by Type I (Type II) (d(n), t(n))−VMM if for ǫ ∈ (0, 1/2), arbitrary u ∈ Σ n there exists Type I (Type II) (d(n), t(n))−VMM (u) such that it is holds that P r(VMM (u)) ≥ 1/2 + ǫ for u ∈ Ln and P r(VMM (u)) ≤ 1/2 − ǫ for u ∈ Ln . We say that VMM (u) process its input u with threshold 1/2. 300 F. Ablayev and A. Gainutdinova Let M be a classic probabilistic P or quantum Q Turing machine. We say that M unbounded (bounded) error recognizes language L ⊆ Σ ∗ if for all n ≥ 1 corresponding (d(n), t(n)) − VMM unbounded (bounded) error recognizes language Ln . Theorem 2 (Simulation Theorem). Let language Ln be unbounded error (bounded error) recognized by quantum machine (d(n), t(n))−LM . Then there exists stochastic machine (d′ (n), t′ (n))−SM that unbounded error recognizes Ln with d′ (n) ≤ 4d2 (n) + 3, and t′ (n) = t(n). We present the sketch of the proof of Theorem 2 in the next section. 3 Proof of Simulation Theorem For the proof let us fix arbitrary input u, |u| = n, and let d = d(n) and t = t(n). We call VMM (u) complex-valued (real-valued) if VMM has complex-valued (real-valued) entries for initial vector and transition matrix. Lemma 1. Let LM (u) be complex-valued (d, t)−LM (u). Then there exists realvalued (2d, t)−LM ′ (u) such that P r(LM (u)) = P r(LM ′ (u)). Proof: The proof uses the real-valued simulation of complex-valued matrix multiplication (which is now folklore) and is omitted. Next Lemma states complexity relation among machines of Type I and Type II (among “linear” and “non linear” extracting a result of computation). Lemma 2. Let LM (u) be real-valued (d, t) − LM (u). Then there exists realvalued Type I (d2 , t)−VMM (u) such that P r(V M M (u)) = P r(LM (u)). Proof: Let LM (u) = µ(0)|, W, F . We construct VMM (u) = τ (0)|, T, F ′  as follows. The initial general state τ (0)| = µ(0) ⊗ µ(0)| — is d2 -dimension vector, T = W ⊗ W is d2 × d2 matrix. Accepting set F ′ ⊆ {1, . . . , d2 (n)} of states is defined in according to F ⊆ {1, . . . , d} as follows F ′ = {j : j = (i−1)d+i, i ∈ F }. We denote |i – d-dimensional unit column-vector with value 1 at i and 0 elsewhere. Using the fact that for real valued vectors c, b it is holds that  t c|b2 = c ⊗ c|b ⊗ b we have that T t = W ⊗ W = W t ⊗ W t and P r(VMM (u)) =  τ (0)|T t |j = = µ(0) ⊗ µ(0)|W t ⊗ W t |i ⊗ i i∈F j∈F ′   t 2 µ(0)|W |i = P r(LM (u)). i∈F Lemma 3. Let (d, t)−VMM (u) be real-valued Type I machine with k, k ≤ d, accepting states. 
Then there exists a real-valued Type I (d,t)-VMM′(u) with a unique accepting state such that Pr(VMM(u)) = Pr(VMM′(u)).

Proof: The proof uses a standard technique from linear automata theory (see for example the book [4]) and is omitted.

The next lemma presents the classical probabilistic simulation complexity of linear machines.

Lemma 4. Let VMM(u) be a real-valued Type I (d,t)-VMM(u). Then there exists a stochastic machine (d+2,t)-SM(u) such that
$Pr(SM(u)) = c^t \, Pr(VMM(u)) + 1/(d+2)$,
where the constant $c \in (0,1]$ depends on VMM(u).

Proof: Let $VMM(u) = \langle\, \langle \tau(0)|, T, F \,\rangle$. According to Lemma 3 we may assume that VMM(u) has a unique accepting state. We construct $SM(u) = \langle\, \langle p(0)|, M, F' \,\rangle$ as follows. For the $d \times d$ matrix T we define the $(d+2) \times (d+2)$ matrix
$A = \begin{pmatrix} 0 & 0 \cdots 0 & 0 \\ b & T & 0 \\ \beta & q & 0 \end{pmatrix}$,
where b is a column, q is a row, and β is a number, all chosen so that the sum of the elements of each row and each column of A is zero (we are free to select the elements of b and q and the number β). The k-th power $A^k$ of A preserves this zero-sum property. Now let R be the stochastic $(d+2) \times (d+2)$ matrix whose (i,j)-entry is 1/(d+2). Select a positive constant c ≤ 1 such that the matrix M defined as $M = cA + R$ is stochastic. By induction on k we have that the k-th power $M^k$ of M is also a stochastic matrix with the same structure, that is, $M^k = c^k A^k + R$. By selecting a suitable initial probability distribution $\langle p(0)|$ and accepting state we can pick from $M^t$ the entry we need (the entry that gives the probability of accepting u). From the construction of the stochastic machine (d+2,t)-SM(u) we have
$Pr(SM(u)) = c^t \, Pr(VMM(u)) + 1/(d+2).$

Lemma 4 says that given a Type I (d,t)-VMM(u) that processes its input u with threshold 1/2, one can construct a stochastic machine (d+2,t)-SM(u) that processes u with threshold $\lambda = c^t/2 + 1/(d+2)$.

Lemma 5. Let (d,t)-SM(u) be a stochastic machine that processes its input u with threshold λ ∈ [0,1). Then for an arbitrary λ′ ∈ (λ,1) there exists a (d+1,t)-SM′(u) that processes u with threshold λ′.

Proof: The proof uses a standard technique from probabilistic automata theory (see for example the book [4]) and is omitted.

4 Observation

For machines presented in vector-matrix form, Theorem 2 states the complexity characteristics of the classical simulation of quantum machines. The vector-matrix technique keeps the dimension of the classical machine close to the dimension of the quantum machine, and, remarkably, the simulation time does not increase. But from Lemma 4 we see that the stochastic simulation of a linear machine is not completely free of charge: we lose the ε-isolation of the threshold (the bounded-error acceptance property) of the machine.

Notice that we present our classical simulation technique for the quantum computation process (the Simulation Theorem) in the form of vector-matrix machines and omit the description of how to come back to the uniform Turing machine. Obviously, in the case of Turing machines such simulations incur a slowdown, but this slowdown keeps the simulations within the polynomial-time restriction. Recall that the threshold-changing technique for Turing machine models is well known (it was used for proving the inclusion $NP \subseteq PP$; see for example [2]).

Acknowledgments. We are grateful to the referees for helpful remarks and for mentioning that the technique of the paper [1] also works for proving the first statement, $PrQTime(T(n)) = PrTime(T(n))$, of Theorem 1.

References

1. L. Adleman, J.
Demarrais, M. Huang, Quantum computability, SIAM J. on Computing, 26(5) (1997), 1524–1540.
2. J. Balcázar, J. Díaz, and J. Gabarró, Structural Complexity I, An EATCS Series, Springer-Verlag, 1995.
3. E. Bernstein and U. Vazirani, Quantum complexity theory, SIAM J. Comput., 26(5) (1997), 1411–1473.
4. R. Bukharaev, The Foundation of Theory of Probabilistic Automata, Moscow, Nauka, 1985 (in Russian).
5. J. Gruska, Quantum Computing, The McGraw-Hill Publishing Company, 1999.
6. L. Fortnow, One complexity theorist's view of quantum computing, Theoretical Computer Science, 292(3) (2003), 597–610.
7. C. Moore, J. Crutchfield, Quantum automata and quantum grammars, Theoretical Computer Science, 237 (2000), 275–306.
8. M. Nielsen and I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, 2000.
9. D. Simon, On the power of quantum computation, SIAM J. Comput., 26(5) (1997), 1474–1483.
10. J. Watrous, Space-bounded quantum complexity, Journal of Computer and System Sciences, 59(2) (1999), 281–326.

Using Depth to Capture Average-Case Complexity

Luís Antunes 1,*, Lance Fortnow 2, and N.V. Vinodchandran 3,**

1 DCC-FC & LIACC-University of Porto, R. Campo Alegre, 823, 4150-180 Porto, Portugal, lfa@ncc.up.pt
2 NEC Laboratories America, 4 Independence Way, Princeton, NJ 08540, fortnow@nec-labs.com
3 Department of Computer Science and Engineering, University of Nebraska, vinod@cse.unl.edu

(* Research done during an academic internship at NEC. This author is partially supported by funds granted to LIACC through the Programa de Financiamento Plurianual, Fundação para a Ciência e Tecnologia and Programa POSI. ** Research done while a postdoctoral scientist at NEC Research Institute, Princeton.)

Abstract. We give the first characterization of Turing machines that run in polynomial time on average. We show that a Turing machine M runs in average polynomial time if for all inputs x the Turing machine uses time exponential in the computational depth of x, where the computational depth is a measure of the amount of "useful" information in x.

1 Introduction

In theoretical computer science we analyze most algorithms based on their worst-case performance. Many algorithms with bad worst-case performance nevertheless perform well in practice: the instances that require a large running time rarely occur. Levin [Lev86] developed a theory of average-case complexity to capture this issue. Levin gives a clean definition of Average Polynomial Time for a given language L and a distribution µ. Some languages may remain hard in the worst case but can be solved in Average Polynomial Time for all reasonable distributions.

We give a crisp formulation of such languages using computational depth as developed by Antunes, Fortnow and van Melkebeek [AFvM01]. Define $depth^t(x)$ as the difference between $K^t(x)$ and $K(x)$, where $K(x)$ is the usual Kolmogorov complexity and $K^t(x)$ is the version where the running times are bounded by time t. The $depth^t$ function [AFvM01] measures, in some sense, the "useful information" in a string.

We have two main results that hold for every language L.

1. If (L, µ) is in Average Polynomial Time for all P-samplable distributions µ, then there exists a Turing machine M computing L and a polynomial p such that for all x, the running time of M(x) is bounded by $2^{O(depth^p(x) + \log|x|)}$.
2.
If there exists a Turing machine M and a polynomial p such that M computes L and for all inputs x the running time of M(x) is bounded by $2^{O(depth^p(x) + \log|x|)}$, then (L, µ) is in Average Polynomial Time for all P-computable distributions.

We do not get an exact characterization from these results: the first result requires P-samplable distributions, and the second holds only for the smaller class of P-computable distributions. However, we can get an exact characterization by considering the time-bounded universal distribution $m^t$. We show that the following are equivalent for every language L and every polynomial p:

– $(L, m^p)$ is in Average Polynomial Time.
– There is some Turing machine M computing L such that for all inputs x the running time of M is bounded by $2^{O(depth^p(x) + \log|x|)}$.

Since the polynomial-time-bounded universal distribution is dominated by a P-samplable distribution and dominates all P-computable distributions (see [LV97]), our main results follow from this characterization.

We prove our results for arbitrary time bounds t, and as we take t towards infinity we recover Li and Vitányi's [LV92] result that under the (non-time-bounded) universal distribution, average-case complexity and worst-case complexity coincide. Our theorems can thus be viewed as a time-bounded version of Li and Vitányi's result. This directly addresses the issue raised by Miltersen [Mil93] of relating a time-bounded version of Li and Vitányi's result to Levin's average-case complexity.

2 Preliminaries

We use the binary alphabet Σ = {0,1} for encoding strings. Our computation model is prefix-free Turing machines: Turing machines with a one-way input tape (the input head can only read from left to right), a one-way output tape, and a two-way work tape. The function log denotes log₂. All explicit resource bounds we use in this paper are time constructible.

2.1 Kolmogorov Complexity and Computational Depth

We give the essential definitions and basic results in Kolmogorov complexity for our needs and refer the reader to the textbook by Li and Vitányi [LV97] for more details. We are interested in self-delimiting Kolmogorov complexity (denoted by K(·)).

Definition 1. Let U be a fixed prefix-free universal Turing machine. Then for any string $x \in \{0,1\}^*$, the Kolmogorov complexity of x is $K(x) = \min_p\{|p| : U(p) = x\}$. For any time-constructible t, the t-time-bounded Kolmogorov complexity of x is $K^t(x) = \min_p\{|p| : U(p) = x \text{ in at most } t(|x|) \text{ steps}\}$.

The Kolmogorov complexity of a string is a rigorous measure of the amount of information contained in it. A string with high Kolmogorov complexity contains a lot of information; a random string has high Kolmogorov complexity and is hence very informative. Intuitively, however, the very fact that it is random restricts its utility in computational complexity theory. How can we measure the nonrandom information in a string? Antunes, Fortnow and van Melkebeek [AFvM01] propose a notion of computational depth as a measure of the nonrandom information in a string. Intuitively, strings of high depth are strings of low Kolmogorov complexity (hence nonrandom) for which a resource-bounded machine cannot identify this fact. Indeed, Bennett's logical depth [Ben88] can be viewed as such a measure, but its definition is rather technical. Antunes, Fortnow and van Melkebeek suggest that the difference between two Kolmogorov complexity measures captures the intuitive notion of nonrandom information.
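Since K and K^t are not computable, any code can only illustrate the idea. A common classroom proxy (ours, not the authors') replaces "length of the shortest program" with "length under a fixed compressor at a given effort level", which at least mimics the depth intuition: a string that is compressible in principle but not by a cheap, fast method looks "deep" to the fast method.

import zlib

def c_fast(x: bytes) -> int:
    """Proxy for K^t: length under a cheap, time-bounded compressor."""
    return len(zlib.compress(x, 1))

def c_slow(x: bytes) -> int:
    """Proxy for K: length under the best effort we allow ourselves."""
    return len(zlib.compress(x, 9))

def depth_proxy(x: bytes) -> int:
    """Crude stand-in for depth^t(x) = K^t(x) - K(x); only the
    qualitative behavior, not the actual quantity, is meaningful."""
    return c_fast(x) - c_slow(x)

print(depth_proxy(b"ab" * 5000))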
Based on this intuition, and with simplicity in mind, in this work we use the following depth measure.

Definition 2 (Antunes-Fortnow-van Melkebeek). Let t be a constructible time bound. For any string $x \in \{0,1\}^*$, $depth^t(x) = K^t(x) - K(x)$.

Average-Case Complexity

We give the definitions from average-case complexity theory necessary for our purposes [Lev86]. For more details readers can refer to the survey by Jie Wang [Wan97]. In average-case complexity theory, a computational problem is a pair (L, µ) where $L \subseteq \Sigma^*$ and µ is a probability distribution. The probability distribution is a function from $\Sigma^*$ to the real interval [0,1] such that $\sum_{x \in \Sigma^*} \mu(x) \leq 1$. For a probability distribution µ, the distribution function, denoted by µ*, is given by $\mu^*(x) = \sum_{y \leq x} \mu(y)$.

The notion of polynomial on average is central to the theory of average-case completeness.

Definition 3. Let µ be a probability distribution function on $\{0,1\}^*$. A function $f : \Sigma^+ \to \mathbb{N}$ is polynomial on µ-average if there exists an ǫ > 0 such that $\sum_x \frac{f(x)^\epsilon}{|x|}\,\mu(x) < \infty$.

From the definition it follows that any polynomial is polynomial on µ-average for any µ. It is easy to show that if the functions f and g are polynomial on µ-average, then the functions f·g, f + g, and $f^k$ for a constant k are also polynomial on µ-average.

Definition 4. Let µ be a probability distribution and $L \subseteq \Sigma^*$. Then the pair (L, µ) is in Average Polynomial Time (denoted Avg-P) if there is a Turing machine accepting L whose running time is polynomial on µ-average.

We need the notion of domination for comparing distributions. The next definition formalizes this notion.

Definition 5. Let µ and ν be two distributions on $\Sigma^*$. Then µ dominates ν if there is a constant c such that for all $x \in \Sigma^*$, $\mu(x) \geq \frac{1}{|x|^c}\,\nu(x)$. We also say ν is dominated by µ.

Proposition 1. If a function f is polynomial on µ-average, then for all distributions ν dominated by µ, f is also polynomial on ν-average.

Average-case analysis is, in general, sensitive to the choice of distribution: if we allow arbitrary distributions, then average-case complexity classes take the form of traditional worst-case complexity classes [LV92]. So it is important to restrict attention to distributions which are in some sense simple. Usually simple distributions are identified with the polynomial-time computable or polynomial-time samplable distributions.

Definition 6. Let t be a time-constructible function. A probability distribution function µ on $\{0,1\}^*$ is said to be t-time computable if there is a deterministic Turing machine that on every input x and positive integer k runs in time t(|x| + k) and outputs a fraction y such that $|\mu^*(x) - y| \leq 2^{-k}$.

The most controversial definition in average-case complexity theory is the association of the class of simple distributions with P-computable, which may seem too restrictive. Ben-David et al. [BCGL92] introduced a wider family of natural distributions, P-samplable, consisting of distributions that can be sampled by randomized algorithms working in time polynomial in the length of the sample generated.

Definition 7. A probability distribution µ on $\{0,1\}^*$ is said to be P-samplable if there is a probabilistic Turing machine M which on input $0^k$ produces a string x such that $|Pr(M(0^k) = x) - \mu(x)| \leq 2^{-k}$ and M runs in time poly(|x| + k).

Every P-computable distribution is also P-samplable; however, the converse is unlikely.

Theorem 1 ([BCGL92]).
If one-way functions exist, then there is a P-samplable probability distribution µ which is not dominated by any polynomial-time computable probability distribution ν.

Universal Distributions

The Kolmogorov complexity function K(·) naturally defines a probability distribution on $\Sigma^*$: to any string x assign the probability $2^{-K(x)}$. Kraft's inequality implies that this indeed is a probability distribution. This distribution is called the universal distribution and is denoted by m. The universal distribution has many equivalent formulations and many nice properties; refer to the textbook by Li and Vitányi [LV97] for an in-depth study of m. The main drawback of m is that it is not computable. In this paper we consider the resource-bounded version of the universal distribution.

Definition 8. The t-time-bounded universal distribution $m^t$ is given by $m^t(x) = 2^{-K^t(x)}$.

One important property of $m^t$ is that it dominates certain computable distributions.

Theorem 2 ([LV97]). $m^t$ dominates any t/n-time computable distribution.

Proof. (Sketch) Let µ be a t/n-time computable distribution and let µ* denote its distribution function. We will show that for any $x \in \Sigma^n$, $K^t(x) \leq -\log(\mu(x)) + C_\mu$ for a constant $C_\mu$ which depends on µ. Let $B_i = \{x \in \Sigma^n \mid 2^{-(i+1)} \leq \mu(x) < 2^{-i}\}$. Since for any x in $B_i$, $\mu(x) \geq 2^{-(i+1)}$, we have $|B_i| \leq 2^{i+1}$. Consider the real interval [0,1] and divide it into intervals of size $2^{-i}$. Since $\mu(x) \geq 2^{-(i+1)}$, for any j, $0 \leq j \leq 2^i$, the j-th interval $[j2^{-i}, (j+1)2^{-i}]$ contains the µ*-value of at most one $x \in B_i$. Since µ is t/n-time computable, for any $x \in B_i$, given j, we can do a binary search to output the unique x satisfying $\mu^*(x) \in [j2^{-i}, (j+1)2^{-i}]$; this involves computing µ* correctly up to $2^{-(i+1)}$. So the total running time of the process is bounded by O((t/n)·n). Hence we have the theorem.

Note that $m^t$ approaches m as $t \to \infty$. In the proof of Theorem 2, $m^t$ dominates t/n-time computable distributions very strongly, in the sense that $m^t(x) \geq \frac{1}{2^{C_\mu}}\,\mu(x)$. The definition of domination that we follow only requires $m^t$ to dominate µ within a polynomial. It is then natural to ask whether there exists a polynomial-time computable distribution dominating $m^t$. Schuler [Sch99] showed that if such a distribution exists, then no polynomially secure pseudorandom generators exist. Pseudorandom generators are efficiently computable functions which stretch a seed into a long string so that, for a random input, the output looks random to a resource-bounded machine.

Theorem 3 ([Sch99]). If there exists a polynomial-time computable distribution that dominates $m^t$, then pseudorandom generators do not exist.

While it is unlikely that there are polynomial-time computable distributions dominating universal distributions, we show that there are P-samplable distributions dominating the time-bounded universal distributions.

Lemma 1. For any polynomial t, there is a P-samplable distribution µ which dominates $m^t$.

Proof. (Sketch) We define a samplable distribution $\mu^t$ by prescribing a sampling algorithm for it as follows. Let U be the universal machine.

1. Sample $n \in \mathbb{N}$ with probability $\frac{1}{n^2}$.
2. Sample $1 \leq j \leq n$ with probability 1/n.
3. Sample $y \in \Sigma^j$ uniformly.
4. Run U(y) for t steps. If U stops and outputs a string $x \in \Sigma^n$, output x.

For any string x of length n, $K^t(x) \leq n$. Hence it is clear that the probability of x is at least $\frac{1}{n^3}\,2^{-K^t(x)}$.
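The sampler in Lemma 1 is concrete enough to run, modulo the choice of universal machine. In the sketch below (ours), the universal machine is faked by a placeholder run_U that the reader should replace with a real prefix-free universal machine; everything else follows the four steps of the proof.

import random

def run_U(y: str, steps: int):
    """Placeholder for the universal machine U run for `steps` steps;
    returns the output string or None. Any concrete prefix-free
    universal machine could be plugged in here."""
    return None  # stub: no program halts in this toy version

def sample_mu_t(t_steps: int, n_max: int = 1000):
    # Step 1: sample n with probability proportional to 1/n^2.
    weights = [1.0 / n**2 for n in range(1, n_max + 1)]
    n = random.choices(range(1, n_max + 1), weights=weights)[0]
    # Step 2: sample a program length j uniformly from 1..n.
    j = random.randint(1, n)
    # Step 3: sample a program y uniformly from {0,1}^j.
    y = "".join(random.choice("01") for _ in range(j))
    # Step 4: run U(y) for t steps; output only length-n results.
    x = run_U(y, t_steps)
    return x if x is not None and len(x) == n else None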
3 Computational Depth and Average Polynomial Time

We state our main theorem, which relates computational depth to average polynomial time.

Theorem 4. Let T be a constructible time bound. Then for any time-constructible t, the following statements are equivalent.
1. T(x) ∈ 2^{O(depth^t(x) + log|x|)}.
2. T is polynomial on m^t-average.

In [LV92], Li and Vitányi showed that when the inputs to any algorithm are distributed according to the universal distribution, the algorithm's average-case complexity is of the same order of magnitude as its worst-case complexity. Rephrasing this connection in the setting of average polynomial time, we can make the following statement.

Theorem 5 (Li-Vitányi). Let T be a constructible time bound. The following statements are equivalent.
1. T(x) is bounded by a polynomial in |x|.
2. T is polynomial on m-average.

As t → ∞, K^t approaches K, so depth^t approaches 0 and m^t approaches m. Hence our main theorem can be seen as a generalization of Li and Vitányi's theorem.

We can apply the implication (1 ⇒ 2) of the main theorem in the following way. Let M be a Turing machine, let L(M) denote the language accepted by M, and let T_M denote its running time. If T_M(x) ∈ 2^{O(depth^t(x) + log|x|)}, then (L(M), µ) is in Avg-P for any µ which is computable in time t/n. The following corollary follows from our main theorem and the universality of m^t (Theorem 2).

Corollary 1. Let M be a deterministic Turing machine whose running time is bounded by 2^{O(depth^t(x) + log|x|)} for some polynomial t. Then for any t/n-computable distribution µ, the pair (L(M), µ) is in Avg-P.

Hence a sufficient condition for a language L (accepted by M) to be in Avg-P with respect to all polynomial-time computable distributions is that the running time of M is bounded by an exponential in depth^t, for all polynomials t. An obvious question is whether this condition is necessary. We have already partially answered this question (Lemma 1) by exhibiting an efficiently samplable distribution µ^t that dominates m^t. Hence if (L(M), µ^t) is in Avg-P, then (L(M), m^t) is also in Avg-P. From the implication (2 ⇒ 1) of the main theorem, we have that T_M(x) ∈ 2^{O(depth^t(x) + log|x|)}. From Lemma 1, we get that if a machine runs in time polynomial on average for all P-samplable distributions, then it runs in time exponential in its depth.

Corollary 2. Let M be a machine which runs in time T_M. Suppose that for all P-samplable distributions µ, T_M is polynomial on µ-average. Then T_M(x) ∈ 2^{O(depth^t(x) + log|x|)} for some polynomial t.

We now prove our main theorem.

Proof. (Theorem 4) (1 ⇒ 2). We show that statement 1 implies that T(x) is polynomial on m^t-average. Let T(x) ∈ 2^{O(depth^t(x) + log|x|)}. Because of the closure properties of functions which are polynomial on average, it is enough to show that the function T′(x) = 2^{depth^t(x)} is polynomial on m^t-average. This essentially follows from the definitions and Kraft's inequality. The details are as follows. Consider the sum

$$\sum_{x\in\Sigma^*} \frac{T'(x)}{|x|}\, m^t(x) \;=\; \sum_{x\in\Sigma^*} \frac{2^{\mathrm{depth}^t(x)}}{|x|}\, 2^{-K^t(x)} \;=\; \sum_{x\in\Sigma^*} \frac{2^{K^t(x)-K(x)}}{|x|}\, 2^{-K^t(x)} \;=\; \sum_{x\in\Sigma^*} \frac{2^{-K(x)}}{|x|} \;\le\; \sum_{x\in\Sigma^*} 2^{-K(x)} \;<\; 1.$$

The last inequality is Kraft's inequality.

(2 ⇒ 1). Let T(x) be a time-constructible function which is polynomial on m^t-average. Then for some ε > 0 we have

$$\sum_{x\in\Sigma^*} \frac{T(x)^{\varepsilon}}{|x|}\, m^t(x) \;<\; 1.$$

Define S_{i,j,n} = {x ∈ Σⁿ | 2^i ≤ T(x) < 2^{i+1} and K^t(x) = j}, and let 2^r be the approximate size of S_{i,j,n}.
Then the Kolmogorov complexity of the elements of S_{i,j,n} is r, up to an additive O(log n) term. The following claim (proof omitted) states this fact more formally.

Claim. For i, j ≤ n², let 2^r ≤ |S_{i,j,n}| < 2^{r+1}. Then for any x ∈ S_{i,j,n}, K(x) ≤ r + O(log n).

Consider the above sum restricted to elements of S_{i,j,n}. Then we have

$$\sum_{x\in S_{i,j,n}} \frac{T(x)^{\varepsilon}}{|x|}\, m^t(x) \;<\; 1.$$

Here T(x) ≥ 2^i, m^t(x) = 2^{−j}, and there are at least 2^r elements in the sum. Hence the sum is lower-bounded by (2^r · 2^{iε} · 2^{−j})/|x|^c for some constant c. This gives us

$$1 \;>\; \sum_{x\in S_{i,j,n}} \frac{T(x)^{\varepsilon}}{|x|}\, m^t(x) \;\ge\; \frac{2^{r}\cdot 2^{i\varepsilon}\cdot 2^{-j}}{|x|^{c}} \;=\; 2^{\,i\varepsilon + r - j - c\log n}.$$

That is, iε + r − j − c log n < 1. From the Claim, it follows that there is a constant d such that for all x ∈ S_{i,j,n}, iε ≤ depth^t(x) + d log|x|. Hence

$$T(x) \;\le\; 2^{i+1} \;\le\; 2^{\frac{d}{\varepsilon}(\mathrm{depth}^t(x) + \log|x|)}. \qquad \square$$

Acknowledgment. We thank Paul Vitányi for useful discussions.

References

[AFvM01] Luis Antunes, Lance Fortnow, and Dieter van Melkebeek. Computational depth. In Proceedings of the 16th IEEE Conference on Computational Complexity, pages 266–273, 2001.
[BCGL92] S. Ben-David, B. Chor, O. Goldreich, and M. Luby. On the theory of average case complexity. Journal of Computer and System Sciences, 44(2):193–219, 1992.
[Ben88] Charles H. Bennett. Logical depth and physical complexity. In R. Herken, editor, The Universal Turing Machine: A Half-Century Survey, pages 227–257. Oxford University Press, 1988.
[HILL99] Johan Håstad, Russell Impagliazzo, Leonid A. Levin, and Michael Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, August 1999.
[Lev86] Leonid A. Levin. Average case complete problems. SIAM Journal on Computing, 15(1):285–286, 1986.
[Lev84] Leonid A. Levin. Randomness conservation inequalities: information and independence in mathematical theories. Information and Control, 61:15–37, 1984.
[LV92] Ming Li and Paul M. B. Vitányi. Average case complexity under the universal distribution equals worst-case complexity. Information Processing Letters, 42(3):145–149, May 1992.
[LV97] Ming Li and Paul M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, 2nd edition, 1997.
[Mil93] Peter Bro Miltersen. The complexity of malign measures. SIAM Journal on Computing, 22(1):147–156, 1993.
[Sch99] Rainer Schuler. Universal distributions and time-bounded Kolmogorov complexity. In Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science, pages 434–443, 1999.
[Wan97] Jie Wang. Average-case computational complexity theory. In Alan L. Selman, editor, Complexity Theory Retrospective II, 1997.

Non-uniform Depth of Polynomial Time and Space Simulations

Richard J. Lipton¹ and Anastasios Viglas²

¹ College of Computing, Georgia Institute of Technology, and Telcordia Applied Research. rjl@cc.gatech.edu
² University of Toronto, Computer Science Department, 10 King's College Road, Toronto, ON M5S 3G4, Canada. aviglas@cs.toronto.edu

Abstract. We discuss some connections between polynomial time and non-uniform, small-depth circuits. A connection is shown with simulating deterministic time in small space. The well-known result of Hopcroft, Paul and Valiant [HPV77], showing that space is more powerful than time, can be improved by making an assumption about the connection between deterministic time computations and non-uniform, small-depth circuits.
To be more precise, we prove the following: if every linear-time deterministic computation can be done by non-uniform circuits of polynomial size and sub-linear depth, then DTIME(t) ⊆ DSPACE(t^{1−ε}) for some constant ε > 0. We also apply the same techniques to prove an unconditional result, a trade-off-type theorem for the size and depth of a non-uniform circuit that simulates a uniform computation.

Keywords: Space simulations, non-uniform depth, block-respecting computation.

1 Introduction

We present an interesting connection between non-uniform characterizations of polynomial time and time-versus-space results. Hopcroft, Paul and Valiant [HPV77] proved that space is more powerful than time: DTIME(t) ⊆ DSPACE(t/log t). The proof of this trade-off result is based on pebbling techniques and the notion of block-respecting computation. Improving the space simulation of deterministic time has been a long-standing open problem. Paul, Tarjan and Celoni [PTC77] proved an n/log n lower bound for pebbling a certain family of graphs. This lower bound implies that the trade-off result DTIME(t) ⊆ DSPACE(t/log t) of [HPV77] cannot be improved using similar pebbling arguments.

In this work we present a connection between space simulations of deterministic time and the depth of non-uniform circuits simulating polynomial-time computations. This connection gives a way to improve the space simulation result from [HPV77] mentioned above, by making a non-uniform assumption: if every problem in linear deterministic time can be solved by polynomial-size non-uniform circuits of small (sub-linear) depth, then every deterministic computation of time t can be simulated in space t^{1−ε} for some constant ε > 0 (which depends only on our assumption about the non-uniform depth of linear time):

$$\mathrm{DTIME}(n) \subseteq \mathrm{SIZE\text{-}DEPTH}(\mathrm{poly}(n),\, n^{\delta}) \;\Longrightarrow\; \mathrm{DTIME}(t) \subseteq \mathrm{DSPACE}(t^{1-\varepsilon}) \qquad (1)$$

where δ < 1 and ε > 0. Note that we allow the size of the non-uniform circuit to be any polynomial. Since DTIME(t) ⊆ SIZE(t·log t) (proved in [PF79]), our assumption basically asks to reduce the depth of the non-uniform circuit by a small amount, allowing the size to increase by any polynomial factor.

It is interesting to note that in this result a non-uniform assumption (P has small non-uniform depth) is used to prove a purely uniform result (deterministic time can be simulated in small space). This can also be considered an interesting result on the power of non-uniformity: if non-uniformity is powerful enough to allow small-depth circuits for linear-time deterministic computations, then we can improve the space-bounded simulation of deterministic time given by Hopcroft, Paul and Valiant.

A related result was shown by Sipser [Sip86,Sip88], from the point of view of reducing the randomness required by randomized algorithms. His result considers the problem of constructing expanders with certain properties. Assuming that those expanders can be constructed efficiently, the main theorem proved is that either P is equal to RP or the space simulation of Hopcroft, Paul and Valiant [HPV77] can be improved: under the hypothesis that certain expanders have explicit constructions, there exists an ε > 0 such that

$$\mathrm{P} = \mathrm{RP} \quad\text{or}\quad (\mathrm{DTIME}(t) \cap 1^*) \subseteq \mathrm{DSPACE}(t^{1-\varepsilon}). \qquad (2)$$

An explicit construction for the expanders mentioned above was given by Saks, Srinivasan and Zhou [SSZ98].
The theorem mentioned above reveals a deep connection between pseudorandomness and efficient space simulations (for unary languages): either space-bounded simulations of deterministic time can be improved, or we can construct (pseudorandom) sequences that can be used to improve the derandomization of certain algorithms. The result we present in this work, on the other hand, gives a connection between the power of non-uniformity and the power of space-bounded computations. Other related results include Dymond and Tompa [DT85], where it is shown that DTIME(t) ⊆ ATIME(t/log t), improving the Hopcroft-Paul-Valiant theorem, and Paterson and Valiant [PV76], proving SIZE(t) ⊆ DEPTH(t/log t).

We also show how to apply the same techniques to prove an unconditional trade-off-type result for the size and depth of a non-uniform circuit that simulates a uniform computation. Any deterministic time-t computation can be simulated by a non-uniform circuit of size roughly 2^{√t} and depth √t which has "semi-unbounded" fan-in: all AND gates have polynomially bounded fan-in and OR gates are unbounded, or vice versa. Similar results were given in [DT85], showing that time t is in PRAM time √t.

2 Notation – Definitions

We use the standard notation DTIME(t) and DSPACE(t) for time and space complexity classes. SIZE-DEPTH(s, d) denotes the class of non-uniform circuits with size (number of gates) O(s) and depth O(d). We also use NC/poly (NC with polynomial advice) to denote the class of non-uniform circuits of polynomial size and poly-logarithmic depth, SIZE-DEPTH(poly, polylog). At some points in the paper we avoid writing poly-logarithmic factors in detail and use the notation Õ(n) for O(n log^k n), k constant.

In this work we consider time complexity functions that are time constructible: a function t(n) is called fully time constructible if there exists a deterministic Turing machine that on every input of length n halts after exactly t(n) steps. In general, a function f(n) is t-time constructible if there is a deterministic Turing machine that on input x outputs 1^{f(|x|)} and runs in time O(t). (t, s)-time-space constructible functions are defined similarly. We also write "TM" for "deterministic Turing machine".

For the proof of the main result we use the notion of block-respecting Turing machines, introduced by Hopcroft, Paul and Valiant in [HPV77].

[Fig. 1. Block respecting computation: the t computation steps are split into segments of b steps, and each tape into blocks of b bits.]

Definition 1. Let M be a machine running in time t(n), where n is the length of its input x. Let the computation of M be partitioned into a(n) segments, where each segment consists of b(n) consecutive steps, a(n)·b(n) = t(n). Let also the tapes of M be partitioned into a(n) blocks, each consisting of b(n) bits (cells), on each tape. We call M block respecting if during each segment of its computation, each head visits only one block on each tape.

Every Turing machine can be converted to a block-respecting machine with only a constant-factor slowdown in its running time. The construction is simple. Let M be a deterministic Turing machine running in time t. Break the computation steps (1 ... t) into segments of size B, and break the work tapes into blocks of the same size B. If at the start of a computation segment σ the work-tape head is in block b_j, then during the B steps of that segment the head can only visit the adjacent blocks b_{j−1} or b_{j+1}. Keep a copy of those two blocks along with b_j and do all the computation of segment σ reading from and updating these copies (if needed). At the end of the computation of every segment there is a clean-up step: update the blocks b_{j−1} and b_{j+1} and move the work-tape head to the appropriate block to start the computation of the next segment. This construction can be done for different block sizes B; for our purposes B will be t^c for a small constant c < 1.
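As a small illustration of the bookkeeping behind this conversion (not of the construction itself, which also manipulates tape contents), the sketch below computes, from the trace of head positions on one work tape, which block each B-step segment is charged to and which adjacent blocks the clean-up step must write back:

```python
def block_schedule(head_positions, B):
    """For each segment of B steps, return the index j of the block the
    head occupies at the segment's start, together with the adjacent
    blocks j-1 and j+1 that the block-respecting simulation copies in and
    writes back during the clean-up step (illustrative sketch)."""
    schedule = []
    for start in range(0, len(head_positions), B):
        j = head_positions[start] // B       # block at segment start
        schedule.append((j, max(j - 1, 0), j + 1))
    return schedule

# Example: a head sweeping right one cell per step, with B = 4: segments
# start at cells 0, 4, 8 and are charged to blocks 0, 1, 2 respectively.
print(block_schedule(list(range(12)), 4))    # [(0, 0, 1), (1, 0, 2), (2, 1, 3)]
```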
Block-respecting Turing machines are also used in [PPST83] to prove that non-deterministic linear time is more powerful than deterministic linear time (see also [PR81] for a generalization of the results from [HPV77] to RAMs and other machine models).

3 Main Results

We show that if linear time has small non-uniform circuit depth (for polynomial-size circuits), then DTIME(t) ⊆ DSPACE(t^{1−ε}) for a constant ε > 0. To be more precise, the strongest form of the main result is the following: if (deterministic) linear time has polynomial-size non-uniform circuits of sub-linear depth (for example depth n^δ for 0 < δ < 1), then DTIME(t) ⊆ DSPACE(t^{1−ε}) for a small positive ε > 0:

$$\mathrm{DTIME}(n) \subseteq \mathrm{SIZE\text{-}DEPTH}(\mathrm{poly},\, n^{\delta}) \;\Longrightarrow\; \mathrm{DTIME}(t) \subseteq \mathrm{DSPACE}(t^{1-\varepsilon}). \qquad (3)$$

The main idea is the following. Start with a deterministic Turing machine M running in time t and convert it to a block-respecting machine M_B with block size B. In each segment of the computation, M_B reads and/or writes exactly one block on each tape. We will argue that we can check the computation in each such segment with the same sub-circuit, and we can actually construct this sub-circuit with polynomial size and small (poly-logarithmic or sub-linear) depth. Combining all these sub-circuits, we can build a larger circuit that checks the entire computation of M_B in small depth. The final step is a technical lemma that shows how to evaluate this circuit in small space (equal to its depth).

We start by proving the main theorem using the assumption P ⊆ NC/poly. It is easy to see that an assumption of the form DTIME(n) ⊆ NC/poly implies P ⊆ NC/poly by padding arguments.

Theorem 1. Let t be a polynomial time complexity function. If P ⊆ NC/poly, then DTIME(t) ⊆ DSPACE(t^{1−ε}) for some constant ε > 0.

Proof. (Any "reasonable" time complexity function could be used in the statement of this theorem.) Consider any Turing machine M running in deterministic time t. Here is how to simulate M in small space, using the assumption that polynomial time has shallow (poly-logarithmic depth), polynomial-size circuits:

1. Convert the given TM into a block-respecting machine with block size B.
2. Construct the graph that describes the computation. Each vertex corresponds to a computation segment of B steps.
3. The computation at each vertex can be checked by the same TM U, which runs in polynomial (in fact linear) time.
4. Since P ⊆ NC/poly, there is a circuit U_C that can replace U. U_C has polynomial size and poly-logarithmic depth.
5. Construct U_C by trying all possible circuits.
6. Plug the sub-circuit U_C into the entire graph. This graph is the description of a small-depth circuit that corresponds to the computation of the given TM. Evaluate the circuit (in small space).

In more detail: convert M to a block-respecting machine M_B.
Break the computation of M_B (on input x) into segments of size B each; the number of segments is t/B. Consider the directed graph G corresponding to the computation of the block-respecting machine, as described in [HPV77]: G has one vertex for every time segment (that is, t/B vertices), and the edges are defined by the sequence of head positions. Let v(∆) denote the vertex corresponding to time segment ∆, and let ∆_i be the last time segment before ∆ during which the i-th head was scanning the same block as during segment ∆. Then the edges of G are v(∆−1) → v(∆) and, for all 1 ≤ i ≤ l, v(∆_i) → v(∆). The number of edges can be at most O(t/B), and therefore the number of bits required to describe the graph is O((t/B)·log(t/B)).

[Fig. 2. Graph description of a block respecting computation: each B-step segment becomes a vertex; the tape blocks read and written during a segment determine the edges.]

Figure 2 shows the idea behind the construction of the graph for the block-respecting computation. The computation is partitioned into segments of size B. Every segment corresponds to a vertex (denoted by a circle in Figure 2). Each segment accesses only one block on each tape. Figure 2 shows the tape blocks which are read during a computation segment (input blocks for that vertex) and those that are written during the same segment (shown as output blocks). If a block is written during a segment and the same block is read by another computation segment later in the computation, then the second segment depends directly on the first, and there will be an edge connecting the corresponding vertices in our graph.

Each vertex of this graph corresponds to B computation steps of M_B. During this computation, M_B reads and writes only one block from each tape. In order to check the computation that corresponds to a vertex of this graph, we would need to simulate M_B for B steps and check O(B) bits from M_B's tapes. For each vertex we need to check/simulate a different segment of M_B's computation; this can be done by a Turing machine that checks the corresponding computation of M_B. We argue that the same Turing machine can be used at every vertex. The computation we need to do at each vertex of the graph is essentially the same: given the "input" and "output" contents of certain tape blocks, simulate the machine M_B for B steps and check that the output contents are correct. The only thing that changes is the actual segment of the computation of M_B that we are going to simulate (which B steps of M_B we should simulate). This means that the exact same "universal" Turing machine checks the computation for each segment/vertex; this universal machine also takes as input a description of the computation that it needs to simulate at each vertex (for example, the index of the part of the computation of the initial machine M_B it will need to simulate, or any reasonable encoding). Therefore we have the same machine U at all vertices of the graph, and U runs in deterministic polynomial time.

[Fig. 3. Insert the (same) sub-circuit on all vertices: each vertex is replaced by a circuit of size polynomial in B and depth polylogarithmic in B.]

If P ⊆ SIZE-DEPTH(n^k, log^l n), then U can be simulated by a circuit U_C of size O(B^k) and small depth O(log^l B), for some k, l. The same circuit is used at all vertices of the graph. In order to construct this circuit, we can try all possible circuits and simulate them on all possible inputs.
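Step 5 of the outline, constructing U_C by exhaustive search, amounts to the following loop (a sketch; `encodings` enumerates candidate circuit descriptions, and `evaluate` and `check_machine` are hypothetical stand-ins for circuit evaluation and for the checking machine U). The search takes exponential time, but it keeps only one candidate at a time, which is all the space bound needs:

```python
def find_subcircuit(encodings, evaluate, check_machine, inputs):
    """Return the first circuit encoding that agrees with the checking
    machine U on every relevant input (brute-force search sketch)."""
    for c in encodings:
        if all(evaluate(c, x) == check_machine(x) for x in inputs):
            return c
    return None
```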
This requires exponential time but only a small amount of space: the size of the circuit is B^k and its depth is polylogarithmic in B. We need Õ(B^k) bits to write down the circuit and only polylogarithmic space to evaluate it (using Lemma 1 below).

Once we have constructed U_C, we can build the entire circuit that simulates M_B. This circuit derives directly from the (block-respecting) computation graph, where each vertex is an instance of the sub-circuit U_C. The size of the entire circuit is too big to write down: we have up to t/B sub-circuits (U_C), which would require size Õ((t/B)·B^k) for some constant k. But since it is the same sub-circuit U_C that appears throughout the graph, we can describe the entire circuit implicitly in much less space. For the evaluation of the circuit, we only need to be able to describe the exact position of a vertex in the graph and to determine the immediate neighbors of a given vertex (previous and next vertices). This can easily be done in space Õ(t/B + B^k).

In order to complete the simulation, we need to show how to evaluate a small-depth circuit in small space (see Borodin [Bor77]).

Lemma 1. Consider a directed acyclic graph G with one source (root). Assume that the leaves are labeled from {0, 1}, its inner nodes are either AND or OR nodes, and the depth is at most d. Then we can evaluate the graph in space at most O(d).

Proof (of lemma; see [Bor77] for more details). Convert the graph to a tree (by making copies of the nodes). The tree will have much bigger size, but the depth remains the same. One can prove (by induction) that the value of the tree is the same as the value of the graph from which we started. Evaluating the tree corresponds to computing the value of its root. In order to find the value of any node v in the tree, proceed as follows. Let u_1, ..., u_k denote the child nodes of v. If v is an AND node, compute (recursively) the value of its first child u_1. If value(u_1) = 0, then the value of v is also 0; otherwise continue with the next child. If the last child has value 1, then the value of v is 1. Notice that we do not need to remember the values of the child nodes already evaluated. If v is an OR node, the same idea applies. We can use a stack for the evaluation of the tree; it is easy to see that the size of the stack will be at most O(d), that is, as big as the depth of the tree. ⊓⊔
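The lemma's procedure is essentially a depth-first traversal that recomputes child values instead of storing them, so the only memory is a stack of height at most d. A minimal sketch (nodes are nested tuples; Python's `all`/`any` short-circuit exactly as the proof requires):

```python
def evaluate(node):
    """Evaluate an AND/OR circuit given as a tree of nested tuples:
    ('leaf', bit), ('and', children) or ('or', children).  Child values
    are recomputed rather than remembered, so the space used is the
    recursion stack, i.e. O(depth)."""
    kind, payload = node
    if kind == 'leaf':
        return bool(payload)
    if kind == 'and':
        return all(evaluate(child) for child in payload)  # stop at first 0
    return any(evaluate(child) for child in payload)      # stop at first 1

# Example: (1 AND (0 OR 1)) evaluates to True.
print(evaluate(('and', [('leaf', 1), ('or', [('leaf', 0), ('leaf', 1)])])))
```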
The total amount of space used is

$$\tilde{O}\Big(B^{2k} + \frac{t}{B}\,\log^{l} B\Big). \qquad (4)$$

To get the desired result, we need to choose the block size B appropriately to balance the two terms in (4); B will be t^{1/c} for some constant c larger than k. As mentioned above, the exact same proof works even if we allow almost linear depth for the non-uniform circuits, for just linear deterministic time instead of P. The stronger theorem is the following.

Theorem 2. If DTIME(n) ⊆ SIZE-DEPTH(n^k, n^δ) for some k > 0 and δ < 1, then DTIME(t) ⊆ DSPACE(t^{1−ε}) where ε = (1−δ)/(2k+1).

Proof. From the proof of Theorem 1 we can calculate the space required for the simulation. In order to find the correct sub-circuit, which has size B^k and depth B^δ, we need O(B^{2k} log B) space to write it down and O(B^δ) space to evaluate it. To evaluate the entire circuit, which has depth (t/B)·B^δ, we use only space

$$O\Big(\frac{t}{B}\cdot B^{\delta}\log B \;+\; \log t \;+\; B^{2k}\log B\Big). \qquad (5)$$

The first term in (5) is the space required to evaluate the entire circuit, which has depth (t/B)·B^δ; the second and third terms are the space required to write down an implicit description of the entire circuit (the description of the graph from the block-respecting computation, and the description of the smaller sub-circuit). The total space used (to find the correct sub-circuit and to evaluate the entire circuit) is

$$O\Big(\frac{t}{B}\cdot B^{\delta}\log B + B^{2k}\log B\Big). \qquad (6)$$

If we set B = t^{1/(2k+1)}, then the space bound is

$$O\big(t^{\,1-\frac{1-\delta}{2k+1}}\big). \qquad (7)$$

In these calculations, 2k + 1 just means something greater than 2k. ⊓⊔

These proof ideas seem to fail if we try to simulate non-deterministic time in small space: in that case, evaluating the circuit would be more complicated, since we would need more space to make sure that the non-deterministic guesses are consistent throughout the evaluation of the circuit.

4 Semi-unbounded Circuits

These simulation ideas using block-respecting computation can also be used to prove an unconditional result relating uniform polynomial time and non-uniform small-depth circuits. The simulation of the previous section implies, unconditionally, a trade-off-type result for the size and depth of non-uniform circuits that simulate uniform computations. The next theorem shows that any deterministic time-t computation can be simulated by a non-uniform circuit of size t·2^{√t}, or 2^{O(√t)}, and depth √t, which has "semi-unbounded" fan-in. Previous work by Dymond and Tompa [DT85] presents similar results, showing that deterministic time t is in PRAM time √t.

Theorem 3. Let t be a reasonable time complexity function. Then DTIME(t) ⊆ SIZE-DEPTH(2^{O(√t)}, √t), where the simulating circuits require exponential fan-in for AND gates and polynomial fan-in for OR gates (or vice versa).

Proof. Given a Turing machine running in DTIME(t), construct its block-respecting version and repeat the same construction as in the proof of Theorem 1: construct the graph describing the block-respecting computation, which has t/B nodes, where every node corresponds to a segment of B computation steps (we will choose the size B later in the proof). Use this graph to construct the non-uniform circuit: for every node, build a circuit, say in DNF, that corresponds to the computation that takes place at that node. This circuit has size exponential in B in the worst case, 2^{O(B)}, and depth 2. The entire graph describes a circuit of size (t/B)·2^{O(B)} and depth O(B). Also note that for every sub-circuit corresponding to a node, the input gates (the AND gates of the DNF) have fan-in at most O(B), while the second level might need exponential fan-in. This construction yields a circuit of "semi-unbounded" fan-in type. ⊓⊔
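It may help to see why B = √t is the right choice (a reading of the proof's parameters; the text itself does not spell this out). With t/B nodes, each realized by a DNF of size 2^{O(B)}, and the segments chained one after another,

$$\text{size} \;=\; \frac{t}{B}\cdot 2^{O(B)} \quad\text{and}\quad \text{depth} \;=\; O\!\Big(\max\Big(B,\ \frac{t}{B}\Big)\Big), \qquad\text{so } B=\sqrt{t}\ \text{ gives size } 2^{O(\sqrt{t})} \text{ and depth } O(\sqrt{t}).$$

A larger B inflates the 2^{O(B)} size factor past 2^{O(√t)}, while a smaller B lengthens the chain of t/B segments; the two pressures meet at B = √t, matching the bounds of Theorem 3.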
5 Discussion – Open Problems

In this work we have shown a connection between the power of non-uniformity and the power of space-bounded computation. The proof of the main theorem is based on the notion of block-respecting computation and various techniques for simulating Turing machine computations. The main result states that if polynomial time has small non-uniform depth, then space can simulate deterministic time faster. An interesting open question is whether the same ideas can be used to prove a similar space simulation for non-deterministic time. It also seems possible that a result could be proved for probabilistic classes. A different approach would be to make a stronger assumption (about complexity classes) and reach a contradiction with some hierarchy theorem or other diagonalization result, thus proving a complexity class separation.

Acknowledgments. We would like to thank Nicola Galesi, Toni Pitassi and Charlie Rackoff for many discussions on these ideas. Also many thanks to Dieter van Melkebeek and Lance Fortnow.

References

[Bor77] A. Borodin. On relating time and space to size and depth. SIAM Journal on Computing, 6(4):733–744, December 1977.
[DT85] Patrick W. Dymond and Martin Tompa. Speedups of deterministic machines by synchronous parallel machines. Journal of Computer and System Sciences, 30(2):149–161, April 1985.
[HPV77] J. Hopcroft, W. Paul, and L. Valiant. On time versus space. Journal of the ACM, 24(2):332–337, April 1977.
[PF79] Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. Journal of the ACM, 26(2):361–381, April 1979.
[PPST83] Wolfgang J. Paul, Nicholas Pippenger, Endre Szemerédi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In 24th Annual Symposium on Foundations of Computer Science, pages 429–438, Tucson, Arizona, November 1983. IEEE.
[PR81] W. Paul and R. Reischuk. On time versus space II. Journal of Computer and System Sciences, 22(3):312–327, June 1981.
[PTC77] Wolfgang J. Paul, Robert Endre Tarjan, and James R. Celoni. Space bounds for a game on graphs. Mathematical Systems Theory, 10:239–251, 1977.
[PV76] M. S. Paterson and L. G. Valiant. Circuit size is nonlinear in depth. Theoretical Computer Science, 2(3):397–400, September 1976.
[Sip86] M. Sipser. Expanders, randomness, or time versus space. In Alan L. Selman, editor, Proceedings of the Conference on Structure in Complexity Theory, volume 223 of LNCS, pages 325–329, Berkeley, CA, June 1986. Springer.
[Sip88] M. Sipser. Expanders, randomness, or time versus space. Journal of Computer and System Sciences, 36:379–383, 1988.
[SSZ98] Michael Saks, Aravind Srinivasan, and Shiyu Zhou. Explicit OR-dispersers with polylogarithmic degree. Journal of the ACM, 45(1):123–154, January 1998.

Dimension- and Time-Hierarchies for Small Time Bounds

Martin Kutrib

Institute of Informatics, University of Giessen, Arndtstr. 2, D-35392 Giessen, Germany. kutrib@informatik.uni-giessen.de

Abstract. Recently, infinite time hierarchies of separated complexity classes in the range between real time and linear time have been shown. This result is generalized to arbitrary dimensions. Furthermore, for fixed time complexities of the form id + r, where r ∈ o(id) is a sublinear function, proper dimension hierarchies are presented. The hierarchy results are established by counting arguments: for an equivalence relation and a family of witness languages, the number of induced equivalence classes is compared to the number of equivalence classes distinguishable by the model in question, and the properness of the inclusions is proved by contradiction.

1 Introduction

If one is particularly interested in computations with small time bounds, say in the range between real time and linear time, most of the relevant Turing machine results were published in the early days of computational complexity. In the sequel we are concerned with time bounds of the form id + r, where id denotes the identity function on the integers and r ∈ o(id) is a sublinear function.
Most of the previous investigations in this area have been done in terms of one-dimensional Turing machines. Recently, infinite time hierarchies of separated complexity classes in the range in question have been shown [10]. In [2] it was proved that the complexity class Q, defined by nondeterministic multitape real-time computations, is equal to the corresponding class of linear-time languages. Moreover, it was shown that two work tapes and a one-way input tape are sufficient to accept the languages from Q in real time. On the other hand, in [13] an NP-complete language was exhibited which is accepted by a nondeterministic single-tape Turing machine in time id + O(id^{1/2}·log) but not in real time. This interesting result stresses the power of nondeterminism impressively and motivates the exploration of the world below linear time once more. For deterministic machines the situation is different. Though for one tape the identity DTIME_1(id) = DTIME_1(LIN) has been proved in [7], for a total of at least two tapes the real-time languages are strictly included in the linear-time languages.

Another aspect that, at first glance, might affect the time range of interest is a possible speed-up. The well-known linear speed-up [6] from t(n) to id + ε·t(n) for arbitrary ε > 0 yields complexity classes close to real time (i.e. DTIME(LIN) = DTIME((1+ε)·id)) for k-tape and multitape machines, but does not allow assertions about the range between real time and linear time. An application to the time bound id + r, r ∈ o(id), would result in a slow-down to id + ε·(id + r) ≥ id + ε·id.

Let us recall the known time hierarchy results. For a number k ≥ 2 of tapes, the hierarchy DTIME_k(t′) ⊂ DTIME_k(t), if t′ ∈ o(t) and t is constructible, has been shown in [5,14]. By the linear speed-up we obtain the necessity of the condition t′ ∈ o(t); the necessity of the constructibility of t follows from the well-known Gap Theorem [9]. Since in the case of multitape machines one needs to construct a Turing machine with a fixed number of tapes that simulates machines with possibly more tapes, the proof of a corresponding hierarchy involves a reduction of the number of tapes, which costs a factor of log in the time complexity. The hierarchy DTIME(t′) ⊂ DTIME(t), if t′·log(t′) ∈ o(t) and t is constructible, has been proved in [6]. Due to the necessary condition t′ ∈ o(t), resp. t′·log(t′) ∈ o(t), the range between real time and linear time is, again, not affected by the known time hierarchy results. Moreover, it follows immediately from the condition t′ ∈ o(t) and the linear speed-up that there are no infinite hierarchies for time bounds of the form t + r, r ∈ o(id), if t ≥ c·id, c > 1.

Related work concerning higher-dimensional Turing machines can be found, e.g., in [8], where for on-line computations the trade-off between time and dimensionality is investigated. Upper bounds for the reduction of the number of dimensions are dealt with, e.g., in [12,15,16,19].

Here, on one hand, we are going to present infinite time hierarchies below linear time for any dimension. Such hierarchies are also known for one-dimensional iterative arrays [3]. On the other hand, dimension hierarchies are presented for each time bound in question. Thus, we obtain a double time-dimension hierarchy.

The basic notions and a preliminary result of technical flavor are the objects of the next section.
Section 3 is devoted to the time hierarchies below linear time. They are established by counting arguments: for an equivalence relation and a family of witness languages, the number of induced equivalence classes is compared to the number of equivalence classes distinguishable by the model in question; by contradiction, the properness of the inclusions follows. In Section 4, for fixed time complexities of the form id + r, r ∈ o(id), proper dimension hierarchies are proved.

2 Preliminaries

We denote the rational numbers by ℚ, the integers by ℤ, the positive integers {1, 2, ...} by ℕ, and the set ℕ ∪ {0} by ℕ₀. The reversal of a word w is denoted by w^R, and |w| denotes the length of w. We use ⊆ for inclusions and ⊂ if the inclusions are strict. Let e_i = (0, ..., 0, 1, 0, ..., 0) (the 1 at position i) denote the i-th d-dimensional unit vector; then we define E_d = {e_i | 1 ≤ i ≤ d} ∪ {−e_i | 1 ≤ i ≤ d} ∪ {(0, ..., 0)}. For a function f : ℕ₀ → ℕ we denote its i-fold composition by f^{[i]}, i ∈ ℕ. If f is increasing and unbounded, then its inverse is defined according to f^{−1}(n) = min{m ∈ ℕ | f(m) ≥ n}. The identity function n ↦ n is denoted by id. As usual, we define the set of functions that grow strictly more slowly than f by

$$o(f) = \Big\{g : \mathbb{N}_0 \to \mathbb{N} \;\Big|\; \lim_{n\to\infty} \frac{g(n)}{f(n)} = 0\Big\}.$$

In terms of orders of magnitude, f is an upper bound of the set O(f) = {g : ℕ₀ → ℕ | ∃ n₀, c ∈ ℕ : ∀ n ≥ n₀ : g(n) ≤ c·f(n)}. Conversely, f is a lower bound of the set Ω(f) = {g : ℕ₀ → ℕ | f ∈ O(g)}.

A d-dimensional Turing machine with k ∈ ℕ tapes consists of a finite-state control, a read-only one-dimensional one-way input tape, and k infinite d-dimensional working tapes. On the input tape a read-only head, and on each working tape a read-write head, is positioned. At the outset of a computation the Turing machine is in the designated initial state, the input is the inscription of the input tape, and all other tapes are blank. The head of the input tape scans the leftmost input symbol, whereas all other heads are positioned on arbitrary tape cells. Depending on the current state and the currently scanned symbols on the k + 1 tapes, the Turing machine changes its state, rewrites the symbols at the head positions of the working tapes, and possibly moves the heads independently to a neighboring cell. The head of the input tape may only be moved to the right. With an eye towards language recognition, the machines have no extra output tape, but the states are partitioned into accepting and rejecting states. More formally:

Definition 1. A deterministic d-dimensional Turing machine with k ∈ ℕ tapes (DTM^d_k) is a system ⟨S, T, A, δ, s₀, F⟩, where
1. S is the finite set of internal states,
2. T is the finite set of tape symbols containing the blank symbol ⊔,
3. A ⊆ T \ {⊔} is the set of input symbols,
4. s₀ ∈ S is the initial state,
5. F ⊆ S is the set of accepting states,
6. δ : S × (A ∪ {⊔}) × T^k → S × T^k × {0, 1} × E_d^k is the partial transition function.

Since the input tape cannot be rewritten, we need no new symbol for its current tape cell. For the same reason, δ may only expect symbols from A ∪ {⊔} on it. The input tape is one-dimensional and one-way and, thus, its head moves according to {0, 1}. The set of rejecting states is implicitly given by the partitioning, i.e. S \ F. The unit vectors correspond to the possible moves of the read-write heads.

Let M be a DTM^d_k.
A configuration of M at some time t ≥ 0 is a description of its global state, which is a (2(k+1)+1)-tuple (s, f₀, f₁, ..., f_k, p₀, p₁, ..., p_k), where s ∈ S is the current state, f₀ : ℤ → A ∪ {⊔} and f_i : ℤ^d → T are functions that map the tape cells of the corresponding tape to their current contents, and p₀ ∈ ℤ and p_i ∈ ℤ^d are the current head positions, 1 ≤ i ≤ k. The initial configuration (s₀, f₀, f₁, ..., f_k, 1, 0, ..., 0) at time 0 is defined by the input word w = a₁···a_n ∈ A*, the initial state s₀, and blank working tapes:

$$f_0(m) = \begin{cases} a_m & \text{if } 1 \le m \le n \\ \sqcup & \text{otherwise} \end{cases} \qquad\qquad f_i(m_1,\ldots,m_d) = \sqcup \quad \text{for } 1 \le i \le k.$$

Successor configurations are computed according to the global transition function ∆: let (s, f₀, f₁, ..., f_k, p₀, p₁, ..., p_k) be a configuration. Then

$$(s', f_0, f_1', \ldots, f_k', p_0', p_1', \ldots, p_k') = \Delta(s, f_0, f_1, \ldots, f_k, p_0, p_1, \ldots, p_k)$$

if and only if δ(s, f₀(p₀), f₁(p₁), ..., f_k(p_k)) = (s′, x₁, ..., x_k, j₀, j₁, ..., j_k) such that

$$f_i'(m_1,\ldots,m_d) = \begin{cases} f_i(m_1,\ldots,m_d) & \text{if } (m_1,\ldots,m_d) \ne p_i \\ x_i & \text{if } (m_1,\ldots,m_d) = p_i \end{cases}, \qquad p_i' = p_i + j_i,\quad p_0' = p_0 + j_0,$$

for 1 ≤ i ≤ k. Thus, the global transition function ∆ is induced by δ. Throughout the paper we are dealing with so-called multitape machines (DTM^d), where every machine has an arbitrary but fixed number of working tapes.

A Turing machine halts iff the transition function is undefined for the current configuration. An input word w ∈ A* is accepted by a Turing machine M if the machine halts at some time in an accepting state; otherwise it is rejected. L(M) = {w ∈ A* | w is accepted by M} is the language accepted by M. If t : ℕ₀ → ℕ, t(n) ≥ n, is a function, then M is said to be t-time-bounded, or of time complexity t, iff it halts on all inputs w after at most t(|w|) time steps. If t equals the function id, acceptance is said to be in real time. The linear-time languages are defined according to time complexities t = c·id, where c ∈ ℚ with c ≥ 1. Since time complexities are mappings to the positive integers and have to be greater than or equal to id, c·id actually means max{⌈c·id⌉, id}; for convenience we simplify the notation in the sequel. The family of all languages which can be accepted by a DTM^d with time complexity t is denoted by DTIME_d(t).

In order to prove tight time hierarchies, in almost all cases well-behaved time-bounding functions are required. Usually, the notion "well-behaved" is made concrete in terms of computability or constructibility of the functions with respect to the device in question.

Definition 2. Let d ∈ ℕ be a constant. A function f : ℕ₀ → ℕ is said to be DTM^d-constructible iff there exists a DTM^d which for every n ∈ ℕ, on input 1ⁿ, halts after exactly f(n) time steps.

Another common definition of constructibility demands the existence of an O(f)-time-bounded Turing machine that computes the binary representation of the value f(n) on input 1ⁿ. Both definitions have been proven to be equivalent for multitape machines [11]. The following definition summarizes the properties of well-behaved (in our sense) functions and names them.

Definition 3. The set of all increasing, unbounded, DTM^d-constructible functions f with the property ∀ c ∈ ℕ : ∃ c′ ∈ ℕ : c·f(n) ≤ f(c′·n) is denoted by T(DTM^d). The set of their inverses is T^{−1}(DTM^d) = {f^{−1} | f ∈ T(DTM^d)}.
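For instance (a routine check, not spelled out in the original), the polynomial f = id^c with c ≥ 1 meets all the requirements of Definition 3: it is increasing, unbounded, constructible, and for every c̃ ∈ ℕ,

$$\tilde c \cdot f(n) \;=\; \tilde c\, n^{c} \;\le\; \big(\lceil \tilde c^{\,1/c}\rceil\, n\big)^{c} \;=\; f\big(\lceil \tilde c^{\,1/c}\rceil\, n\big), \qquad f^{-1}(n) \;=\; \min\{m \in \mathbb{N} \mid m^{c} \ge n\} \;=\; \lceil n^{1/c}\rceil,$$

so the root functions ⌈id^{1/c}⌉ belong to T^{−1}(DTM^d); these are exactly the shape of the bounds r used in Example 7 below.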
Since we are interested in time bounds of the form id + r, we need small functions r below the identity. The constructible functions are necessarily greater than the identity; therefore, the inverses of constructible functions are used. The properties "increasing" and "unbounded" are straightforward. At first glance the property ∀ c ∈ ℕ : ∃ c′ ∈ ℕ : c·f(n) ≤ f(c′·n) seems restrictive, but it is not: it is easily verified that almost all of the commonly considered bounding functions above the identity have this property (e.g., the identity itself, polynomials, exponential functions, etc.). As usual, we remark that even the family T(DTM¹) is very rich; more details can be found, for example, in [1,17,20].

In order to clarify later calculations, we observe the following. Let r ∈ T^{−1}(DTM^d) be some function. Then there must exist a constructible function r̂ ∈ T(DTM^d) such that r = r̂^{−1}. Moreover, for all n we obtain r(r̂(n)) = n by definition: r(r̂(n)) = min{m ∈ ℕ | r̂(m) ≥ r̂(n)} implies m = n and, thus, r(r̂(n)) = n. In general we do not have equality for the converse r̂(r(n)), but in the sequel we will need only the equality case.

The following equivalence relation is well known (cf. the Myhill-Nerode theorem on regular languages).

Definition 4. Let L ⊆ A* be a language over an alphabet A and let l ∈ ℕ₀ be a constant. Two words w and w′ are l-equivalent with respect to L if and only if w·w_l ∈ L ⟺ w′·w_l ∈ L for all w_l ∈ A^l. The number of l-equivalence classes of words of length n − l with respect to L (i.e. |w·w_l| = n) is denoted by N(n, l, L).

The underlying idea is to bound the number of distinguishable equivalence classes. The following lemma gives a necessary condition for a language to be (id + r)-time acceptable by a DTM^d.

Lemma 5. Let r : ℕ₀ → ℕ be a function and d ∈ ℕ be a constant. If L ∈ DTIME_d(id + r), then there exists a constant p > 1 such that

$$N(n, l, L) \;\le\; p^{(l + r(n))^{d}}.$$

Proof. Let M = ⟨S, T, A, δ, s₀, F⟩ be an (id + r)-time DTM^d that accepts L. In order to determine an upper bound on the number of l-equivalence classes, we consider the possible situations of M after reading all but l input symbols. The remaining computation depends on the current internal state and on the contents of the at most (2(l + r(n)) + 1)^d cells on each tape that are still reachable during the last at most l + r(n) time steps. Let p₁ = max{|T|, |S|, 2}. For the (2(l + r(n)) + 1)^d cells per tape there are at most p₁^{(2(l+r(n))+1)^d} different inscriptions. For some k ∈ ℕ tapes we obtain altogether at most p₁^{k(2(l+r(n))+1)^d + 1} different situations, which bounds the number of l-equivalence classes. The lemma follows for p = p₁^{(k+1)·3^d}. ⊓⊔
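Definition 4 can be tried out on toy examples; the following brute-force check (illustrative only, exponential in n) computes N(n, l, L) for a language given as a membership predicate:

```python
from itertools import product

def num_classes(L, n, l):
    """Count N(n, l, L): the number of l-equivalence classes of words of
    length n - l with respect to L.  Two prefixes are l-equivalent iff
    they agree on membership for every suffix of length l."""
    suffixes = [''.join(bits) for bits in product('01', repeat=l)]
    signatures = set()
    for bits in product('01', repeat=n - l):
        w = ''.join(bits)
        signatures.add(tuple(L(w + s) for s in suffixes))
    return len(signatures)

# Example: for L = "strings with an even number of 1s", any n and l give
# exactly two classes, matching the two states of the obvious automaton.
print(num_classes(lambda x: x.count('1') % 2 == 0, n=6, l=2))  # -> 2
```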
3 Time Hierarchies

In this section we present the time hierarchies between real time and linear time for any dimension d ∈ ℕ.

Theorem 6. Let r : ℕ₀ → ℕ and r′ : ℕ₀ → ℕ be two increasing functions and let d ∈ ℕ be a constant. If r ∈ T^{−1}(DTM^d), r ∈ O(id^{1/d}), and either r′ ∈ o(r) if d = 1, or r′ ∈ o(r^{1−ε}) for an arbitrarily small ε > 0 if d > 1, then DTIME_d(id + r′) ⊂ DTIME_d(id + r).

Before proving the theorem we give the following example, which is naturally based on root functions. The dimension hierarchies to be proved in Theorem 8 are also depicted.

Example 7. Since T(DTM^d) contains the polynomials id^c, c ≥ 1, the functions id^{1/c} belong to T^{−1}(DTM^d). (Actually, the inverses of id^c are ⌈id^{1/c}⌉, but as mentioned before we simplify the notation for convenience.) For d = 1, trivially, id^{1/(i+1)} ∈ o(id^{1/i}). For d > 1 we need to find an ε such that id^{1/(i+1)} ∈ o(id^{(1/i)(1−ε)}). The condition is fulfilled if and only if 1/(i+1) < (1/i)(1−ε), thus if i/(i+1) < 1 − ε, and therefore if ε < 1 − i/(i+1). We conclude that the condition is fulfilled for all ε < 1/(i+1). The hierarchy is depicted in Figure 1. ⊓⊔

[Fig. 1. Double hierarchy based on root functions.]

Proof (of Theorem 6). At first we adjust a constant q dependent on ε: choose q such that

$$\frac{d-1}{d^{q} + d} \;\le\; \varepsilon$$

for d > 1, and q = 1 for d = 1. Since r ∈ T^{−1}(DTM^d), i.e. r is the inverse of a constructible function, there exists a constructible function r^{−1} ∈ T(DTM^d) such that r(r^{−1}(n)) = n. Now we are prepared to define a witness language L₁ for the assertion. The words of L₁ are of the form

$$a^{l}\, b^{\,r^{-1}(l^{1+d^{q-1}})}\, w_1 \$ w_1^R\, ¢\, w_2 \$ w_2^R\, ¢ \cdots ¢\, w_s \$ w_s^R\, ¢\, d_1 \cdots d_m\, y,$$

where l ∈ ℕ is a positive integer, s = l^{d^q}, m = (d−1)·l^{d^{q−1}}, y, w_i ∈ {0,1}^l for 1 ≤ i ≤ s, and d_i ∈ E_{d−1} for 1 ≤ i ≤ m. The acceptance of such a word is best described by the behavior of an accepting DTM^d M.

During a first phase, M reads a^l and stores it on a tape. Since d and q are constants, f(l) = l^{1+d^{q−1}} is a polynomial and thus constructible. The function r^{−1} is constructible by assumption, and the constructible functions are closed under composition. Therefore, during a second phase, M can simulate a constructor for r^{−1}(f) on the stored input a^l and verify the number of b's. In parallel with what follows, M verifies the lengths of the subwords w_i to be l (with the help of the stored a^l) and verifies the numbers s and m (s = l^{d^q} as well as m = (d−1)·l^{d^{q−1}} are constructible functions).

When w₁ appears in the input, M begins to store the subwords w_i in a d-dimensional area of size l^{d^{q−1}} × ··· × l^{d^{q−1}} × l^{1+d^{q−1}}. Suppose the area consists of l^{d^{q−1}} hypercubes with edge length l that are stacked up. The subwords are stored along the last coordinate, such that l^{d^{q−1}} subwords are stacked up, respectively. If, for example, the head of the corresponding tape is located at coordinates (m₁, ..., m_d), then the following subword w_i is stored into the cells (m₁, ..., m_{d−1}, m_d), (m₁, ..., m_{d−1}, m_d + 1), ..., (m₁, ..., m_{d−1}, m_d + l − 1). Temporarily, w_i is also stored on another tape. Now M has to decide where to store the next subword w_{i+1} (for this purpose it simulates appropriate constructors for l^{d^{q−1}}). In principle there are two possibilities. The first is that w_{i+1} is stored as a neighbor of w_i; in this case the head has to move back to position (m₁, ..., m_d) and change the d-th coordinate appropriately. The second is that the subword w_{i+1} is stored below w_i; in this case the head keeps its position (m₁, ..., m_d + l). The head is possibly moved while reading w_i^R; in both cases w_i^R is verified against the temporarily stored w_i.

The last phase leads to acceptance or rejection. After storing all subwords w_i, we may assume that the last coordinate of the head position is l^{1+d^{q−1}} (i.e., the head is on the bottom face of the area). While reading the d_i, M changes its head position simply by adding d_i to the current position. Since d_i ∈ E_{d−1}, the d-th coordinate is not affected. This phase leads to a head position (m₁, ..., m_{d−1}, l^{1+d^{q−1}}). Now the subword y is read and stored on two other tapes.
Finally, M verifies whether or not y matches one of the subwords which have been stacked up in the cells (m₁, ..., m_{d−1}, 0), ..., (m₁, ..., m_{d−1}, l^{1+d^{q−1}} − 1) (if there are stored subwords in these cells at all). Continuous comparisons without delay are achieved by alternately moving one head back and forth over one of the stored copies of y, while the other head moves forth and back over the second copy. The machine M accepts if and only if it finds a matching subword.

Altogether, M needs n time steps for reading the whole input and at most another l^{1+d^{q−1}} time steps for comparing y with the stacked-up subwords. The first part of the input contains r^{−1}(l^{1+d^{q−1}}) symbols b. Therefore n > r^{−1}(l^{1+d^{q−1}}) and, since r is increasing, r(n) ≥ r(r^{−1}(l^{1+d^{q−1}})) = l^{1+d^{q−1}}. We conclude that M obeys the time complexity id + r and, hence, L₁ ∈ DTIME_d(id + r).

Assume now that L₁ is acceptable by some DTM^d M with time complexity id + r′. Two words

$$a^{l} b^{\,r^{-1}(l^{1+d^{q-1}})} w_1\$w_1^R ¢ w_2\$w_2^R ¢ \cdots ¢ w_s\$w_s^R ¢ \quad\text{and}\quad a^{l} b^{\,r^{-1}(l^{1+d^{q-1}})} w_1'\$w_1'^R ¢ w_2'\$w_2'^R ¢ \cdots ¢ w_s'\$w_s'^R ¢$$

are not (m + l)-equivalent with respect to L₁ if the sets {w₁, ..., w_s} and {w₁′, ..., w_s′} are different. There exist exactly $\binom{2^l}{l^{d^q}}$ different subsets of {0,1}^l with s = l^{d^q} elements. For l large enough such that log(l^{d^q}) ≤ l/4, it follows that

$$N(n, l+m, L_1) \;\ge\; \binom{2^l}{l^{d^q}} \;>\; \left(\frac{2^l - l^{d^q}}{l^{d^q}}\right)^{l^{d^q}} \;\ge\; \left(2^{\frac{l}{2} - \log(l^{d^q})}\right)^{l^{d^q}} \;\ge\; \left(2^{\frac{l}{4}}\right)^{l^{d^q}} \;=\; 2^{\frac{l}{4}\, l^{d^q}} \;\in\; 2^{\Omega(l^{1+d^q})}.$$

On the other hand, by Lemma 5, the number of equivalence classes distinguishable by M is bounded, for a constant p > 1, by

$$N(n, l+m, L_1) \;\le\; p^{(l+m+r'(n))^{d}}.$$

For n we have

$$n \;=\; l + r^{-1}(l^{1+d^{q-1}}) + (2l+2)\cdot l^{d^q} + (d-1)\cdot l^{d^{q-1}} + l \;=\; O(l^{1+d^q}) + r^{-1}(l^{1+d^{q-1}}).$$

Since r ∈ O(id^{1/d}), it follows that r^{−1} ∈ Ω(id^d). Therefore, r^{−1}(l^{1+d^{q−1}}) ∈ Ω(l^{d+d^q}). We conclude n ≤ c₁·r^{−1}(l^{1+d^{q−1}}) for some c₁ ∈ ℕ. Due to the property ∀ c ∈ ℕ : ∃ c′ ∈ ℕ : c·r^{−1}(n) ≤ r^{−1}(c′·n), we obtain n ≤ r^{−1}(c₂·l^{1+d^{q−1}}) for some c₂ ∈ ℕ. From

$$1 - \varepsilon \;\le\; 1 - \frac{d-1}{d^q+d} \;=\; \frac{d^q+1}{d^q+d} \;=\; \frac{d^{q-1} + \frac{1}{d}}{d^{q-1}+1}$$

and r′ ∈ o(r^{1−ε}) it follows that

$$r'(n) \;\le\; r'\big(r^{-1}(c_2 \cdot l^{1+d^{q-1}})\big) \;\in\; o\Big(r\big(r^{-1}(c_2 \cdot l^{1+d^{q-1}})\big)^{\frac{d^{q-1}+1/d}{d^{q-1}+1}}\Big) \;=\; o\big(l^{\frac{1}{d} + d^{q-1}}\big).$$

By l + m = l + (d−1)·l^{d^{q−1}} ∈ O(l^{d^{q−1}}), it holds that

$$(l + m + r'(n))^{d} \;\in\; \big(O(l^{d^{q-1}}) + o(l^{\frac{1}{d}+d^{q-1}})\big)^{d} \;=\; o\big(l^{\frac{1}{d}+d^{q-1}}\big)^{d} \;=\; o(l^{1+d^q}).$$

So the number of distinguishable equivalence classes is

$$N(n, l+m, L_1) \;\le\; p^{o(l^{1+d^q})} \;=\; 2^{o(l^{1+d^q})}.$$

Now we have the contradiction that N(n, l+m, L₁) was previously calculated to be at least 2^{Ω(l^{1+d^q})}, which proves L₁ ∉ DTIME_d(id + r′). ⊓⊔

For one-dimensional machines we have hierarchies from real time to linear time. Due to the possible speed-up from id + r to id + ε·r, the condition r′ ∈ o(r) cannot be relaxed.

4 Dimension Hierarchies

Now we are going to show that there exist infinite dimension hierarchies for all time complexities in question; so we obtain double hierarchies. It turns out that dimensions are more powerful than small time bounds.

Theorem 8. Let r : ℕ₀ → ℕ be an increasing function and d ∈ ℕ be a constant. If r ∈ o(id^{1/d}), then DTIME_{d+1}(id) \ DTIME_d(id + r) ≠ ∅.

Again, before proving the theorem, we present an example based on natural functions; it shows another double hierarchy.

Example 9. Since T(DTM^d) is closed under composition and contains 2^{id}, the functions log^{[i]}, i ≥ 1, belong to T^{−1}(DTM^d).
For d = 1, trivially, log^{[i+1]} ∈ o(log^{[i]}). For d > 1 we need to find an ε such that log^{[i+1]} ∈ o((log^{[i]})^{1−ε}). We have log^{[i+1]} = log(log^{[i]}), and since log(x) ∈ o(x^{1−ε}) for every ε < 1, the condition is fulfilled for all ε < 1. The hierarchy is depicted in Figure 2. ⊓⊔

[Fig. 2. Double hierarchy based on iterated logarithms.]

Proof (of Theorem 8). The words of the witness language L₂ are of the form

$$w_1 \$ w_1^R\, ¢\, w_2 \$ w_2^R\, ¢ \cdots ¢\, w_s \$ w_s^R\, ¢\, d_1 \cdots d_m\, y,$$

where l ∈ ℕ is a positive integer, s = l^d, m = d·l, y, w_i ∈ {0,1}^l for 1 ≤ i ≤ s, and d_i ∈ E_d for 1 ≤ i ≤ m.

An accepting (d+1)-dimensional real-time machine M works as follows. The subwords w_i are stored in a (d+1)-dimensional area of size l × l × ··· × l. The first symbols of the subwords w_i are stored at the l^d positions (0, 0, ..., 0) to (l−1, l−1, ..., l−1, 0). The remaining symbols of each w_i are stored along the (d+1)-st dimension, respectively. After storing the subwords, M moves its corresponding head as requested by the d_i. Since the d_i belong to E_d, this movement is within the first d dimensions only. Finally, when y appears in the input, M tries to compare it with the subword stored at the current position. M accepts if a subword has been stored at the current position at all and if that subword matches y. Thus, L₂ ∈ DTIME_{d+1}(id).

In order to apply Lemma 5, we observe that, again, two words

$$w_1 \$ w_1^R ¢ w_2 \$ w_2^R ¢ \cdots ¢ w_s \$ w_s^R ¢ \quad\text{and}\quad w_1' \$ w_1'^R ¢ w_2' \$ w_2'^R ¢ \cdots ¢ w_s' \$ w_s'^R ¢$$

are not (m + l)-equivalent with respect to L₂ if the sets {w₁, ..., w_s} and {w₁′, ..., w_s′} are different. Therefore, L₂ induces at least

$$N(n, l+m, L_2) \;\ge\; \binom{2^l}{l^{d}} \;\ge\; 2^{\Omega(l^{d+1})}$$

equivalence classes for all sufficiently large l. On the other hand, we obtain an upper bound on the number of equivalence classes distinguishable by an (id + r)-time DTM^d M as follows:

$$N(n, l+m, L_2) \;\le\; p^{(l+m+r(n))^{d}} \;=\; p^{(l + d\cdot l + r((2l+2)\cdot l^{d} + d\cdot l + l))^{d}} \;\le\; p^{(c_1 l + r(c_2 l^{d+1}))^{d}} \;\in\; p^{(O(l) + o((c_2 l^{d+1})^{1/d}))^{d}} \;=\; p^{(O(l)+o(l^{\frac{d+1}{d}}))^{d}} \;=\; p^{o(l^{\frac{d+1}{d}})^{d}} \;=\; p^{o(l^{d+1})} \;=\; 2^{o(l^{d+1})},$$

since r ∈ o(id^{1/d}). From the contradiction, L₂ ∉ DTIME_d(id + r) follows. ⊓⊔

The inclusions DTIME_{d+1}(id) ⊆ DTIME_{d+1}(id + r) and DTIME_d(id + r) ⊆ DTIME_{d+1}(id + r) are trivial. An application of Theorem 8 yields the hierarchies:

Corollary 10. Let r : ℕ₀ → ℕ be an increasing function and d ∈ ℕ be a constant. If r ∈ o(id^{1/d}), then DTIME_d(id + r) ⊂ DTIME_{d+1}(id + r).

Note that despite the condition r ∈ o(id^{1/d}), the dimension hierarchies can touch r = id^{1/d}: id^{1/d} ∈ o(id^{1/(d−1)}) and DTIME_{d−1}(id + id^{1/d}) ⊂ DTIME_d(id + id^{1/d}).

References

1. Balcázar, J. L., Díaz, J., and Gabarró, J. Structural Complexity I. Springer, Berlin, 1988.
2. Book, R. V. and Greibach, S. A. Quasi-realtime languages. Math. Systems Theory 4 (1970), 97–111.
3. Buchholz, T., Klein, A., and Kutrib, M. Iterative arrays with small time bounds. Mathematical Foundations of Computer Science (MFCS 2000), LNCS 1893, Springer, 2000, pp. 243–252.
4. Cole, S. N. Real-time computation by n-dimensional iterative arrays of finite-state machines. IEEE Trans. Comput. C-18 (1969), 349–365.
5. Fürer, M. The tight deterministic time hierarchy. Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing (STOC '82), 1982, pp. 8–16.
6. Hartmanis, J. and Stearns, R. E. On the computational complexity of algorithms. Trans. Amer. Math. Soc. 117 (1965), 285–306.
7. Hennie, F. C. One-tape, off-line Turing machine computations. Inform. Control 8 (1965), 553–578.
8. Hennie, F. C. On-line Turing machine computations. IEEE Trans. Elect. Comput. EC-15 (1966), 35–44.
9. Hopcroft, J. E. and Ullman, J. D. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, Massachusetts, 1979.
10. Klein, A. and Kutrib, M. Deterministic Turing machines in the range between real-time and linear-time. Theoret. Comput. Sci. 289 (2002), 253–275.
11. Kobayashi, K. On proving time constructibility of functions. Theoret. Comput. Sci. 35 (1985), 215–225.
12. Loui, M. C. Simulations among multidimensional Turing machines. Theoret. Comput. Sci. 21 (1982), 145–161.
13. Michel, P. An NP-complete language accepted in linear time by a one-tape Turing machine. Theoret. Comput. Sci. 85 (1991), 205–212.
14. Paul, W. J. On time hierarchies. J. Comput. System Sci. 19 (1979), 197–202.
15. Paul, W., Seiferas, J. I., and Simon, J. An information-theoretic approach to time bounds for on-line computation. J. Comput. System Sci. 23 (1981), 108–126.
16. Pippenger, N. and Fischer, M. J. Relations among complexity measures. J. Assoc. Comput. Mach. 26 (1979), 361–381.
17. Reischuk, R. Einführung in die Komplexitätstheorie. Teubner, Stuttgart, 1990.
18. Rosenberg, A. L. Real-time definable languages. J. Assoc. Comput. Mach. 14 (1967), 645–662.
19. Stoß, H.-J. Zwei-Band Simulation von Turingmaschinen. Computing 7 (1971), 222–235.
20. Wagner, K. and Wechsung, G. Computational Complexity. Reidel, Dordrecht, 1986.

Baire's Categories on Small Complexity Classes

Philippe Moser

Computer Science Department, University of Geneva. moser@cui.unige.ch

Abstract. We generalize resource-bounded Baire's categories to small complexity classes such as P, QP and SUBEXP, and to probabilistic classes such as BPP. We give an alternative characterization of small sets via resource-bounded Banach-Mazur games. As an application we show that for almost every language A ∈ SUBEXP, in the sense of Baire's category, P^A = BPP^A.

1 Introduction

Resource-bounded measure and resource-bounded Baire's category were introduced by Lutz in [1] and [2] for the complexity classes E and EXP. They provide a means of investigating the sizes of various subsets of E and EXP. In resource-bounded measure the small sets are those with measure zero; in resource-bounded Baire's category the small sets are those of first category (meager sets). Both smallness notions satisfy the following three axioms: first, every single language L ∈ E is small; second, the whole class E is large; and third, "easy infinite unions" of small sets are small. These axioms meet the essence of Lebesgue's measure and Baire's category and ensure that it is impossible for a subset of E to be both large and small.

The first goal of Lutz's approach was to extend existence results, such as "there is a language in C satisfying property P", to abundance results such as "most languages in C satisfy property P", which is more informative, since an abundance result reflects the typical behavior of languages in a class, whereas an existence result could as well correspond to an exception in the class. Both resource-bounded measure and resource-bounded Baire's category have been used successfully to understand the structure of the exponential-time classes E and EXP.
An important problem in resource-bounded measure theory was to generalize Lutz's measure theory to small complexity classes such as P, QP and SUBEXP, and to probabilistic classes such as BPP and BPE. These issues were solved in the series of papers [3], [4], [5] and [6]. As noticed in [7], the same question in the Baire's category setting was still left unanswered. In this paper we solve this problem by generalizing resource-bounded Baire's categories to small complexity classes such as P, QP and SUBEXP, and to probabilistic classes such as BPP. We also give an alternative characterization of meager sets through Banach-Mazur games. As an application we improve the result of [3], where it was shown that for almost every language A ∈ SUBEXP, in the sense of resource-bounded measure, P^A = BPP. The question whether the same result holds with P^A = BPP^A was raised in [3]. We answer this question affirmatively in the resource-bounded Baire's category setting, by showing that for almost every language A ∈ SUBEXP, in the sense of resource-bounded Baire's category, P^A = BPP^A.

The remainder of the paper is organized as follows. In Section 3 we introduce resource-bounded Baire's category on P. In Section 3.1 we give another characterization of small sets through resource-bounded Banach-Mazur games. In Section 4 we introduce resource-bounded Baire's category on BPP with the corresponding resource-bounded Banach-Mazur games formulation. Finally, in Section 5 we prove the result on BPP mentioned above.

2 Preliminaries

We use standard notation for traditional complexity classes; see for instance [8], [9], or [10]. For ε > 0, denote by E_ε the class E_ε = ∪_{δ<ε} DTIME(2^{n^δ}). SUBEXP is the class ∩_{ε>0} E_ε, and quasi-polynomial time refers to the class QP = ∪_{k≥1} DTIME(n^{log^k n}).

Let us fix some notation for strings and languages. Let s_0, s_1, ... be the standard enumeration of the strings in {0,1}* in lexicographical order, where s_0 = λ denotes the empty string. A sequence is an element of {0,1}^∞. If w is a string or a sequence and 1 ≤ i ≤ |w|, then w[i] and w[s_i] denote the i-th bit of w. Similarly, w[i ... j] and w[s_i ... s_j] denote the i-th through j-th bits, and dom(w) the domain of w, where w is viewed as a partial function. We identify a language L with its characteristic function χ_L, where χ_L is the sequence such that χ_L[i] = 1 iff s_i ∈ L. For a string s_i, define its position by pos(s_i) = i. If w_1 is a string and w_2 is a string or a sequence extending w_1, we write w_1 ⊑ w_2. We write w_1 ⊏ w_2 if w_1 ⊑ w_2 and w_1 ≠ w_2. For two strings τ, σ ∈ {0,1}*, we denote by τ^∧σ the concatenation of τ followed by σ. For a, b ∈ N, let a−̇b denote max(a − b, 0). We identify N with {0,1}*; thus we denote by N^N the set of all functions mapping strings to strings.

2.1 Finite Extension Strategies

Whereas resource-bounded measure is defined via martingales, resource-bounded Baire's category is defined via finite extension strategies. Here is a definition.

Definition 1. A function h : {0,1}* → {0,1}* is a finite extension strategy, or a constructor, if for every string τ ∈ {0,1}*, τ ⊑ h(τ).

For simplicity we will use the word "strategy" for finite extension strategy. We will often consider indexed strategies. An indexed strategy is a function h : N × {0,1}* → {0,1}*, such that h_i := h(i, ·) is a strategy for every i ∈ N.
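To make Definition 1 concrete, here is a minimal Python sketch of a constructor and an indexed strategy, with a language represented by a membership predicate on bit positions. The names (is_constructor, diag_constructor) and the simplified position indexing are ours, not the paper's; the diagonal idea anticipates the proof of Theorem 2 below.

def is_constructor(h, samples):
    # Definition 1: tau must be a prefix of h(tau) for every string tau.
    return all(h(tau).startswith(tau) for tau in samples)

def diag_constructor(member):
    # Constructor avoiding the language whose bit at position n is
    # member(n): always extend tau by the opposite bit.
    def h(tau):
        return tau + ('0' if member(len(tau) + 1) else '1')
    return h

def indexed(i, tau):
    # A toy indexed strategy h_i: append i + 1 ones.
    return tau + '1' * (i + 1)

h = diag_constructor(lambda n: n % 2 == 0)   # toy language: even positions
assert is_constructor(h, ['', '0', '0110'])
assert is_constructor(lambda t: indexed(3, t), ['', '01'])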
If h is a strategy and τ ∈ {0,1}*, define ext h(τ) to be the unique string u such that h(τ) = τ^∧u. We say a strategy h avoids some language A (or language A avoids strategy h) if for every string τ ∈ {0,1}* we have h(τ) ⋢ χ_A. We say a strategy h meets some language A if h does not avoid A.

For the results in Section 5 we will need the following definition of the relativized hardness of a pseudorandom generator.

Definition 2. Let A be any language. The hardness H^A(G_{m,n}) of a random generator G_{m,n} : {0,1}^m → {0,1}^n is defined as the minimal s such that there exists an n-input circuit C with oracle gates for A, of size at most s, for which

|Pr_{x∈{0,1}^m}[C(G_{m,n}(x)) = 1] − Pr_{y∈{0,1}^n}[C(y) = 1]| ≥ 1/s.   (1)

Klivans and Melkebeek [11] noticed that Impagliazzo and Wigderson's [12] pseudorandom generator construction relativizes; i.e., for any language A, there is a deterministic polynomial-time procedure that converts the truth table of a Boolean function that is hard to compute for circuits having oracle gates for A into a pseudorandom generator that is pseudorandom for circuits with A-oracle gates. More precisely:

Theorem 1 (Klivans-Melkebeek [11]). Let A be any language. There is a polynomial-time computable function F : {0,1}* × {0,1}* → {0,1}* with the following properties. For every ε > 0, there exist a, b ∈ N such that

F : {0,1}^{n^a} × {0,1}^{b log n} → {0,1}^n,   (2)

and if r is the truth table of an (a log n)-variable Boolean function of A-oracle circuit complexity at least n^{εa}, then the function G_r(s) = F(r, s) is a generator, mapping {0,1}^{b log n} into {0,1}^n, which has hardness H^A(G_r) > n.

3 Baire's Category on P

To define a resource-bounded Baire's category on P, we will consider strategies computed by Turing machines which have random access to their inputs, i.e., on input τ, the machine can query any bit of τ via its oracle. For such a random-access Turing machine M running on input τ, we denote this convention by M^τ(·). Note that random-access Turing machines can compute the length of their input τ in O(log |τ|) steps by bisection. We will consider machines running in time polylogarithmic in the input length |τ|, or equivalently polynomial in |s_{|τ|}|. Note that such machines cannot read their entire input, but only a sparse subset of it.

Definition 3. An indexed strategy h : N × {0,1}* → {0,1}* is P-computable if there is a random-access Turing machine M as above such that for every τ ∈ {0,1}* and every i ∈ N,

M^τ(0^i) = ext h_i(τ),   (3)

where M runs in time polynomial in |s_{|τ|}| + i.

We say a class is small if there is a single indexed strategy that avoids every language in the class. More precisely:

Definition 4. A class C of languages is P-meager if there exists a P-computable indexed strategy h such that for every L ∈ C there exists i ∈ N such that h_i avoids L.

In order to formalize the third axiom we need to define "easy infinite unions" precisely.

Definition 5. X = ∪_{i∈N} X_i is a P-union of P-meager sets if there exists an indexed P-computable strategy h : N × N × {0,1}* → {0,1}* such that for every i ∈ N, h_{i,·} witnesses X_i's meagerness.
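The remark above, that a random-access machine can compute |τ| in O(log |τ|) steps by bisection, is easy to make concrete. A sketch, assuming the oracle answers whether a queried position exists; function names are ours.

def input_length(defined):
    # Compute |tau| with O(log |tau|) oracle queries, where defined(i)
    # is True iff position i exists, i.e. i < |tau|.
    if not defined(0):
        return 0
    hi = 1
    while defined(hi):          # exponential search: find hi >= |tau|
        hi *= 2
    lo = hi // 2                # invariant: defined(lo), not defined(hi)
    while hi - lo > 1:          # binary search for the boundary
        mid = (lo + hi) // 2
        if defined(mid):
            lo = mid
        else:
            hi = mid
    return lo + 1

tau = '0110101'
assert input_length(lambda i: i < len(tau)) == 7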
Let us prove the three basic axioms.

Theorem 2. For any language L in P, the singleton {L} is P-meager.

Proof. Let L ∈ P be any language. We describe a P-computable constructor h which avoids {L}. Consider the following Turing machine M computing h: on input string σ, M^σ simply outputs 1 − L(s_{|σ|+1}). h is clearly P-computable, and h avoids {L}. ⊓⊔

The proof of the third axiom is straightforward.

Theorem 3. 1. All subsets of a P-meager set are P-meager. 2. A P-union of P-meager sets is P-meager.

Proof. Immediate from the definition of P-meagerness. ⊓⊔

Let us prove the second axiom, which says that the whole space P is not small.

Theorem 4. P is not P-meager.

Proof. Let h be an indexed P-computable constructor and let M be a Turing machine computing h. We construct a language L ∈ P which meets h_i for every i. The idea is to construct a language L with the following characteristic function:

χ_L = B_0^∧B_1^∧B_2^∧···, where B_0 = 0 and, for i ≥ 1, B_i = ext h_i(B_0^∧B_1^∧···^∧B_{i−1}) followed by a padding with 0's,   (4)

where block B_i corresponds to all strings of size i. B_i is large enough to contain ext h_i(B_0^∧B_1^∧···^∧B_{i−1}), because M's output length is bounded by a polynomial in i. Let us construct a polynomial-time Turing machine N deciding L. On input x, where |x| = n:

1. Compute p, where x is the p-th word of length n.
2. For i = 1 to n, simulate M^{B_0^∧B_1^∧···^∧B_{i−1}}(0^i). Answer M's queries with the previously stored binary sequences B̄_1, B̄_2, ..., B̄_{i−1} in the following way. Suppose that during its simulation M^{B_0^∧B_1^∧···^∧B_{i−1}}(0^i) queries the k-th bit of B_0^∧B_1^∧···^∧B_{i−1} of its oracle. To answer this query, simply compute s_k, its length l_k, and its position p_k among the words of size l_k. Look up whether the stored binary sequence B̄_{l_k} contains a p_k-th bit b_k. If this is the case, answer M's query with b_k; else answer M's query with 0. Finally, store the output of M^{B_0^∧B_1^∧···^∧B_{i−1}}(0^i) under B̄_i.
3. If the stored binary sequence B̄_n contains a p-th bit, then output this bit; else output 0 (x is in the padded zone of B_n).

Let us check that L is in P. The first and third steps are clearly computable in time polynomial in n. For the second step, each of the n recursive steps involves at most a polynomial number of queries (because h is P-computable), and each simulation of M, once the queries are answered, takes time polynomial in n because M is polynomial. Note that all B̄_i's have size polynomial in n, so storing them poses no problem. ⊓⊔

3.1 Resource-Bounded Banach-Mazur Games

We give an alternative characterization of small sets via resource-bounded Banach-Mazur games. Informally speaking, a Banach-Mazur game is a game between two strategies f and g, where the game begins with the empty string λ. Then g ◦ f is applied successively to λ. Such a game yields a unique infinite string, or equivalently a language, called the result of the play between f and g. For a class C, we say that g is a winning strategy if it can force the result of the game with any strategy f to be a language not in C. We show that the existence of a winning strategy is equivalent to the meagerness of C. This equivalence result is useful in practice, since it is often easier to find a winning strategy than a finite extension strategy.

Definition 6. 1. A play of a Banach-Mazur game is a pair (f, g) of strategies such that for every string τ ∈ {0,1}*, τ ⊏ g(τ).
2. The result R(f, g) of the play (f, g) is the unique element of {0,1}^∞ that extends (g ◦ f)^i(λ) for every i ∈ N.
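A toy illustration of Definition 6: iterating g ◦ f from λ produces longer and longer prefixes of the result R(f, g). The two strategies below are illustrative stand-ins of our own, not strategies from the paper.

def play(f, g, rounds):
    # Apply g o f repeatedly to the empty string; each round yields a
    # longer prefix of R(f, g), since g strictly extends its input.
    tau = ''
    for _ in range(rounds):
        tau = g(f(tau))
    return tau

f = lambda tau: tau + '1'     # player I: appends a single 1
g = lambda tau: tau + '00'    # player II: strictly extends by 00
print(play(f, g, 4))          # '100100100100', a prefix of R(f, g)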
For a class of languages C and two function classes F_I and F_II, denote by G[C, F_I, F_II] the Banach-Mazur game with distinguished set C, where player I must choose a strategy in F_I, and player II a strategy in F_II. We say player II wins the play (f, g) if R(f, g) ∉ C; otherwise we say player I wins. We say player II has a winning strategy for the game G[C, F_I, F_II] if there exists a strategy g ∈ F_II such that for every strategy f ∈ F_I, player II wins (f, g).

The following result states that a class is meager iff there is a winning strategy for player II. This is very useful since in practice it is often easier to give a winning strategy for player II than to exhibit a constructor avoiding every language in the class.

Theorem 5. Let X be any class of languages. The following are equivalent.
1. Player II has a winning strategy for G[X, N^N, P].
2. X is P-meager.

Proof. Suppose the first statement holds and let g be a P-computable winning strategy for player II. Let M be a Turing machine computing g. We define an indexed P-computable constructor h. For k ∈ N and σ ∈ {0,1}*,

h_k(σ) := g(σ′) where σ′ = σ^∧0^{k−̇|σ|}.   (5)

h is P-computable because computing h_k(σ) simply requires simulating M^{σ′}, answering M's queries in dom(σ′)\dom(σ) with 0. We show that if a language A meets h_k for every k ∈ N, then A ∉ X. This implies that X is P-meager as witnessed by h. To do this we show that for every α ⊏ χ_A there is a string β such that

α ⊑ β ⊑ g(β) ⊏ χ_A.   (6)

If this holds, then player I has a strategy yielding R(f, g) = A: for a given α, player I extends it to obtain the corresponding β, thus forcing player II to extend to a prefix of χ_A; since g is winning, it follows that A ∉ X. So let α be any prefix of χ_A, with |α| = k. Since A meets h_k, there is a string σ ⊏ χ_A such that

σ′ ⊑ g(σ′) = h_k(σ) ⊏ χ_A,   (7)

where σ′ = σ^∧0^{k−̇|σ|}. Since |α| ≤ |σ′| and α, σ′ are prefixes of χ_A, we have α ⊑ σ′. Define β to be σ′.

For the other direction, let X be P-meager as witnessed by h, i.e., for every A ∈ X there exists i ∈ N such that h_i avoids A. Let N be a Turing machine computing h. We define a P-computable constructor g inducing a winning strategy for player II in the game G[X, N^N, P]. We show that for any strategy f, R(f, g) meets h_i for every i ∈ N, which implies R(f, g) ∉ X. Here is a description of a Turing machine M computing g. For a string σ, M^σ does the following.

1. Compute n_0 = min_{t≤n} [(∀τ ⊑ σ such that |τ| ≤ n) h_t(τ) ⋢ σ], where n = |s_{|σ|}|.
2. If no such n_0 exists, output 0.
3. If n_0 exists (h_{n_0} is the next strategy to be met), simulate N^{σ^∧0}(0^{n_0}), answering N's queries in dom(σ^∧0)\dom(σ) with 0; denote N's answer by ω. Output 0^∧ω.

g is clearly P-computable. We show that R(f, g) meets every h_i for any strategy f. Suppose for a contradiction that this is not the case, i.e., there is a strategy f such that R(f, g) does not meet h. Let n_0 be the smallest index such that R(f, g) does not meet h_{n_0}. Since R(f, g) meets h_{n_0−1}, there is a string τ such that h_{n_0−1}(τ) ⊏ R(f, g). Since g strictly extends strings at every round, after at most 2^{O(|τ|)} rounds f will output a string σ long enough to enable step 1 (of M's description) to find out that h_{n_0−1}(τ) ⊑ σ ⊏ R(f, g), thus incrementing n_0 − 1 to n_0. At this round we have g(σ) = σ^∧0^∧ext h_{n_0}(σ^∧0), i.e., h_{n_0}(σ^∧0) ⊑ R(f, g), which is a contradiction. ⊓⊔
It is easy to check that throughout Section 3, P can be replaced by QP or E_ε, thus yielding a Baire's category notion on both quasi-polynomial and subexponential time classes.

4 Baire's Category on BPP

To construct a notion of Baire's category on probabilistic classes, we will use the following probabilistic indexed strategies.

Definition 7. An indexed strategy h : N × {0,1}* → {0,1}* is BPP-computable if there is a probabilistic oracle Turing machine M such that for every τ ∈ {0,1}* and every i, n ∈ N,

Pr[M^τ(0^i, 0^n) = ext h_i(τ)] ≥ 1 − 2^{−n},   (8)

where the probability is taken over the internal coin tosses of M, and M runs in time polynomial in |s_{|τ|}| + i + n.

By using standard Chernoff bound arguments it is easy to show that Definition 7 is robust, i.e., the error probability can range from 1/2 + 1/p(n) to 1 − 2^{−q(n)} for any polynomials p, q, without enlarging (resp. reducing) the class of strategies defined in Definition 7.

As in Section 3, a class is meager if there is a single probabilistic strategy that avoids every language in the class.

Definition 8. A class of languages C is BPP-meager if there exists a BPP-computable indexed strategy h such that for every L ∈ C there exists i ∈ N such that h_i avoids L.

As in Section 3, we need to define "easy infinite unions" precisely in order to prove the third axiom.

Definition 9. X = ∪_{i∈N} X_i is a BPP-union of BPP-meager sets if there exists an indexed BPP-computable strategy h : N × N × {0,1}* → {0,1}* such that for every i ∈ N, h_{i,·} witnesses X_i's meagerness.

Let us prove that all three axioms hold for our Baire's category notion on BPP.

Theorem 6. For any language L in BPP, {L} is BPP-meager.

Proof. The proof is similar to that of Theorem 2, except that the constructor h is computed with error probability smaller than 2^{−n}. ⊓⊔

The third axiom holds by definition.

Theorem 7. 1. All subsets of a BPP-meager set are BPP-meager. 2. A BPP-union of BPP-meager sets is BPP-meager.

Proof. Immediate from the definition of BPP-meagerness. ⊓⊔

Let us prove the second axiom.

Theorem 8. BPP is not BPP-meager.

Proof. The proof is similar to that of Theorem 4, except for the second step of N's computation, where every simulation of M is performed with error probability smaller than 2^{−n}. Since there are n distinct simulations of M, the total error probability is smaller than n2^{−n}, which ensures that L is in BPP. ⊓⊔

4.1 Resource-Bounded Banach-Mazur Games

Similarly to Section 3.1, we give an alternative characterization of meager sets through resource-bounded Banach-Mazur games.

Theorem 9. Let X be any class of languages. The following are equivalent.
1. Player II has a winning strategy for G[X, N^N, BPP].
2. X is BPP-meager.

Proof. The direction from 1 to 2 is similar to Theorem 5, except that h_k(σ) is computed with error probability smaller than 2^{−n}. For the other direction, the only difference from Theorem 5 is that the first and third steps of M's computation are performed with small error probability. ⊓⊔
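The error bounds invoked throughout this section (the robustness of Definition 7, the n·2^{−n} union bound in Theorem 8) rest on standard majority-vote amplification. A self-contained sketch with an invented noisy trial; the bound cited in the comment is Hoeffding's inequality, not a claim from the paper.

import random

def amplify(trial, k):
    # Majority vote over k independent runs of a 0/1-valued trial.
    # If one run errs with probability 1/2 - gamma, Hoeffding's
    # inequality bounds the majority's error by exp(-2 * gamma**2 * k).
    return sum(trial() for _ in range(k)) * 2 > k

# Invented trial: returns the correct answer 1 with probability only 0.6.
noisy = lambda: 1 if random.random() < 0.6 else 0
print(amplify(noisy, 1001))   # 1 except with probability < exp(-20)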
5 Application to the P = BPP Problem

It was shown in [3] that for every ε > 0, almost every language A ∈ E_ε, in the sense of resource-bounded measure, satisfies P^A = BPP. We improve their result by showing that for every ε > 0, almost every language A ∈ E_ε, in the sense of resource-bounded Baire's category, satisfies P^A = BPP^A.

Theorem 10. For every ε > 0, the set of languages A such that P^A ≠ BPP^A is E_ε-meager.

Proof. Let ε > 0. Let 0 < δ < min(ε, 1/4) and b > 2kδ/ε, where k is a constant that will be determined later. Consider the following strategy h, computed by the following Turing machine M. On input σ, where |s_{|σ|}| = n, M does the following. At the start, Z = ∅ and i = 1. M computes z_i in the following way. Determine whether pos(s_{|σ|+i}) = pos(0^{2^{b|u|}}u) for some string u of size log(n^{2/b}). If not, then z_i = 0; output z_i and compute z_{i+1}. Else denote by u_i the corresponding string u. Construct the set T_i of all truth tables of |u_i|-input Boolean circuits C with oracle gates for σ, of size less than 2^{δ|u_i|}, such that C(u_j) = z_j for every (u_j, z_j) ∈ Z. Compute M_i = Majority_{C∈T_i}[C(u_i)], and let z_i = 1 − M_i. Add (u_i, z_i) to Z. Output z_i and compute z_{i+1}, unless u_i = 1^{log(n^{2/b})} (i.e., u_i is the last string of size log(n^{2/b})), in which case M stops.

Since there are 2^{n^{4δ/b}} circuits to simulate, simulating such a circuit takes time O(n^{4δ/b}) (answering its queries to σ with the input σ itself), and computing the majority M_i takes time 2^{O(n^{4δ/b})}, the total running time is less than 2^{n^{2cδ/b}} for some constant c, which is less than 2^{n^{ε′}} with ε′ < ε for an appropriate choice of k.

Let A be any language and consider F(A) := {u | 0^{2^{b|u|}}u ∈ A}. It is clear that F(A) ∈ E^A. Consider H_δ^A, the set of languages with high circuit complexity, i.e., H_δ^A = {L | every n-input circuit with oracle gates for A of size less than 2^{δn} fails to compute L}. By Theorem 1, F(A) ∈ H_δ^A implies P^A = BPP^A. We show that h avoids every language A such that F(A) ∉ H_δ^A. So let A be any such language, i.e., there is an n-input circuit family {C_n}_{n>0}, with oracle gates for A, of size less than 2^{δn}, computing F(A). We have

C(u_i) = 1 iff 0^{2^{b|u_i|}}u_i ∈ A, for every string u_i such that (u_i, z_i) ∈ Z   (9)

(for simplicity we omit C's index). Consider the set D_n of all circuits with log(n^{2/b}) inputs, of size at most n^{2δ/b}, with oracle gates for A, satisfying equation (9). We have |D_n| ≤ 2^{n^{4δ/b}}. By construction, every z_i such that (u_i, z_i) ∈ Z reduces the cardinality of D_n by a factor 2. Since there are n^{2/b} z_i's such that (u_i, z_i) ∈ Z, we have |D_n| ≤ 2^{n^{4δ/b}} · 2^{−n^{2/b}} < 1, i.e., D_n = ∅. Therefore h(σ) ⋢ χ_A. ⊓⊔

6 Conclusion

Theorem 4 shows that the class SPARSE of all languages with polynomial density is not P-meager. To remedy this situation we can increase the power of P-computable strategies by considering locally computable strategies, which can avoid SPARSE and even the class of languages of subexponential density. This issue will be addressed in [13].

References
1. Lutz, J.: Category and measure in complexity classes. SIAM Journal on Computing 19 (1990) 1100–1131
2. Lutz, J.: Almost everywhere high nonuniform complexity. Journal of Computer and System Science 44 (1992) 220–258
3. Allender, E., Strauss, M.: Measure on small complexity classes, with applications for BPP. Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science (1994) 807–818
4. Strauss, M.: Measure on P: strength of the notion. Inform. and Comp. 136:1 (1997) 1–23
5. Regan, K., Sivakumar, D.: Probabilistic martingales and BPTIME classes. In: Proc. 13th Annual IEEE Conference on Computational Complexity (1998) 186–200
6. Moser, P.: A generalization of Lutz's measure to probabilistic classes. Submitted (2002)
7. Ambos-Spies, K.: Resource-bounded genericity. Proceedings of the Tenth Annual Structure in Complexity Theory Conference (1995) 162–181
8. Balcázar, J.L., Díaz, J., Gabarró, J.: Structural Complexity I. EATCS Monographs on Theoretical Computer Science, Volume 11, Springer-Verlag (1995)
9. Balcázar, J.L., Díaz, J., Gabarró, J.: Structural Complexity II. EATCS Monographs on Theoretical Computer Science, Volume 22, Springer-Verlag (1990)
10. Papadimitriou, C.: Computational Complexity. Addison-Wesley (1994)
11. Klivans, A., Melkebeek, D.: Graph nonisomorphism has subexponential size proofs unless the polynomial hierarchy collapses. Proceedings of the 31st Annual ACM Symposium on Theory of Computing (1999) 659–667
12. Impagliazzo, R., Wigderson, A.: P = BPP if E requires exponential circuits: derandomizing the XOR lemma. Proceedings of the 29th Annual ACM Symposium on Theory of Computing (1997) 220–229
13. Moser, P.: Locally computed Baire's categories on small complexity classes. Submitted (2002)

Operations Preserving Recognizable Languages

Jean Berstel¹, Luc Boasson², Olivier Carton², Bruno Petazzoni³, and Jean-Éric Pin²

¹ Institut Gaspard Monge, Université de Marne-la-Vallée, 5, boulevard Descartes, Champs-sur-Marne, F-77454 Marne-la-Vallée Cedex 2, berstel@univ-mlv.fr
² LIAFA, Université Paris VII and CNRS, Case 7014, 2 Place Jussieu, F-75251 Paris Cedex 05, France†, {Olivier.Carton,Luc.Boasson,Jean-Eric.Pin}@liafa.jussieu.fr
³ Lycée Marcelin Berthelot, Saint-Maur, bpetazzoni@ac-creteil.fr

Abstract. Given a subset S of N, filtering a word a_0a_1···a_n by S consists in deleting the letters a_i such that i is not in S. By a natural generalization, denote by L[S], where L is a language, the set of all words of L filtered by S. The filtering problem is to characterize the filters S such that, for every recognizable language L, L[S] is recognizable. In this paper, the filtering problem is solved, and a unified approach is provided to solve similar questions, including the removal problem considered by Seiferas and McNaughton. There are two main ingredients in our approach: the first one is the notion of residually ultimately periodic sequences, and the second one is the notion of representable transductions.

1 Introduction

The original motivation of this paper was to solve an automata-theoretic puzzle, proposed by the fourth author (see also [8]), that we shall refer to as the filtering problem. Given a subset S of N, filtering a word a_0a_1···a_n by S consists in deleting the letters a_i such that i is not in S. By a natural generalization, denote by L[S], where L is a language, the set of all words of L filtered by S. The filtering problem is to characterize the filters S such that, for every recognizable language L, L[S] is recognizable. The problem is nontrivial since, for instance, it can be shown that the filter {n! | n ∈ N} preserves recognizable languages.

The quest for this problem led us to search for analogous questions in the literature. Similar puzzles were already investigated in the seminal paper of Stearns and Hartmanis [14], but the most relevant reference is the paper [12] of Seiferas and McNaughton, in which the so-called "removal problem" was solved: characterize the subsets S of N² such that, for each recognizable language L, the language

P(S, L) = {u ∈ A* | there exists v ∈ A* such that (|u|, |v|) ∈ S and uv ∈ L}

is recognizable.

† Work supported by INTAS project 1224.
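For intuition, the removal operation P(S, L) can be brute-forced over bounded lengths. The following sketch is ours and only approximates the operation, which is defined on all of A*; L is given as a membership predicate and S as a finite set of length pairs.

from itertools import product

def words(alphabet, n):
    # All words of length n over the alphabet.
    return (''.join(p) for p in product(alphabet, repeat=n))

def removal(S, L, alphabet, max_u, max_v):
    # Bounded approximation of P(S, L): keep u iff uv is in L for some
    # v with (|u|, |v|) in S and |v| <= max_v.
    out = set()
    for m in range(max_u + 1):
        lengths = [n for (mm, n) in S if mm == m and n <= max_v]
        for u in words(alphabet, m):
            if any(L(u + v) for n in lengths for v in words(alphabet, n)):
                out.add(u)
    return out

# L = even-length words of a's; S relates each length to itself.
L = lambda w: len(w) % 2 == 0 and set(w) <= {'a'}
print(sorted(removal({(0, 0), (1, 1), (2, 2)}, L, 'ab', 2, 2)))
# ['', 'a', 'aa']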
The aim of this paper is to provide a unified approach to solve at the same time the filtering problem, the removal problem, and similar questions. There are two main ingredients in our approach. The first one is the notion of residually ultimately periodic sequences, introduced in [12] as a generalization of a similar notion introduced by Siefkes [13]. The second one is the notion of representable transductions, introduced in [9,10]. Complete proofs will be given in the extended version of this article.

Our paper is organized as follows. Section 2 introduces some basic definitions: rational and recognizable sets, etc. The precise formulation of the filtering problem is given in Section 3. Section 4 is dedicated to transductions. Residually ultimately periodic sequences are studied in Section 5, and the properties of differential sequences are analyzed in Section 6. Section 7 is devoted to residually representable transductions. Our main results are presented in Section 8. Further properties of residually ultimately periodic sequences are discussed in Section 9. The paper ends with a short conclusion.

2 Preliminaries and Background

2.1 Rational and Recognizable Sets

Given a multiplicative monoid M, the subsets of M form a semiring P(M) under union as addition and subset multiplication defined by XY = {xy | x ∈ X and y ∈ Y}. Throughout this paper, we shall use the following convenient notation: if X is a subset of M and K is a subset of N, we set X^K = ∪_{n∈K} X^n.

Recall that the rational subsets of a monoid M form the smallest subset R of P(M) containing the finite subsets of M and closed under finite union, product, and star (where X* is the submonoid generated by X). The set of rational subsets of M is denoted by Rat(M). It is a subsemiring of P(M). Recall that a subset P of a monoid M is recognizable if there exist a finite monoid F and a monoid morphism ϕ : M → F such that P = ϕ^{−1}(ϕ(P)). By Kleene's theorem, a subset of a finitely generated free monoid is recognizable if and only if it is rational.

Various characterizations of the recognizable subsets of N are given in Proposition 1 below, but first we need to introduce some definitions. A sequence (s_n)_{n≥0} of elements of a set is ultimately periodic (u.p.) if there exist two integers m ≥ 0 and r > 0 such that, for each n ≥ m, s_n = s_{n+r}. The (first) differential sequence of an integer sequence (s_n)_{n≥0} is the sequence ∂s defined by (∂s)_n = s_{n+1} − s_n. Note that the integration formula s_n = s_0 + Σ_{0≤i≤n−1}(∂s)_i allows one to recover the original sequence from its differential and s_0. A sequence is syndetic if its differential sequence is bounded. If S is an infinite subset of N, the enumerating sequence of S is the unique strictly increasing sequence (s_n)_{n≥0} such that S = {s_n | n ≥ 0}. The differential sequence of this sequence is simply called the differential sequence of S. A set is syndetic if its enumerating sequence is syndetic. The characteristic sequence of a subset S of N is the sequence c_n equal to 1 if n ∈ S and to 0 otherwise. The following elementary result is folklore.

Proposition 1. Let S be a set of non-negative integers. The following conditions are equivalent:
(1) S is recognizable,
(2) S is a finite union of arithmetic progressions,
(3) the characteristic sequence of S is ultimately periodic.
If S is infinite, these conditions are also equivalent to the following one:
(4) the differential sequence of S is ultimately periodic.
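Conditions (3) and (4) of Proposition 1 are easy to test empirically on finite prefixes. A small sketch, with helper names of our own, run on a finite union of arithmetic progressions.

def characteristic(S, n):
    # First n terms of the characteristic sequence of S (condition (3)).
    return [1 if i in S else 0 for i in range(n)]

def differential(seq):
    # (ds)_n = s_{n+1} - s_n; condition (4) concerns this sequence.
    return [b - a for a, b in zip(seq, seq[1:])]

def is_ultimately_periodic(seq, m, r):
    # Witness check: seq[n] == seq[n + r] for every n >= m in the prefix.
    return all(seq[n] == seq[n + r] for n in range(m, len(seq) - r))

S = {1} | {3 + 2 * k for k in range(60)}     # union of progressions
assert is_ultimately_periodic(characteristic(S, 80), m=3, r=2)
assert is_ultimately_periodic(differential(sorted(S)), m=1, r=1)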
Example 1. Let

S = {1, 3, 4, 9, 11} ∪ {7 + 5n | n ≥ 0} ∪ {8 + 5n | n ≥ 0}
  = {1, 3, 4, 7, 8, 9, 11, 12, 13, 17, 18, 22, 23, 27, 28, ...}.

Its characteristic sequence 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, ... and its differential sequence 2, 1, 3, 1, 1, 2, 1, 1, 4, 1, 4, 1, 4, ... are ultimately periodic.

2.2 Relations

Given two sets E and F, a relation on E and F is a subset of E × F. The inverse of a relation S on E and F is the relation S^{−1} on F and E defined by (y, x) ∈ S^{−1} if and only if (x, y) ∈ S. A relation S on E and F can also be considered as a function from E into P(F), the set of subsets of F, by setting, for each x ∈ E, S(x) = {y ∈ F | (x, y) ∈ S}. It can also be viewed as a function from P(E) into P(F) by setting, for each subset X of E:

S(X) = ∪_{x∈X} S(x) = {y ∈ F | there exists x ∈ X such that (x, y) ∈ S}.

Dually, S^{−1} can be viewed as a function from P(F) into P(E) defined, for each subset Y of F, by S^{−1}(Y) = {x ∈ E | S(x) ∩ Y ≠ ∅}. When this "dynamical" point of view is adopted, we say that S is a relation from E into F and we use the notation S : E → F. A relation S : N → N is recognizability preserving if, for each recognizable subset R of N, the set S^{−1}(R) is recognizable.

3 Filtering Languages

A filter is a finite or infinite increasing sequence s of non-negative integers. If u = a_0a_1a_2··· is an infinite word (the a_i are letters), we set u[s] = a_{s_0}a_{s_1}···. Similarly, if u = a_0a_1a_2···a_n is a finite word, we set u[s] = a_{s_0}a_{s_1}···a_{s_k}, where k is the largest integer such that s_k ≤ n < s_{k+1}. Thus, for instance, if s is the sequence of squares, abracadabra[s] = abcr. By extension, if L is a language (resp. a set of infinite words), we set

L[s] = {u[s] | u ∈ L}.

If s is the enumerating sequence of a subset S of N, we also use the notation L[S]. If, for every recognizable language L, the set L[s] is recognizable, we say that the filter s preserves recognizability. The filtering problem is to characterize the recognizability preserving filters.

4 Transductions

In this paper, we consider transductions that are relations from a free monoid A* into a monoid M. Transductions were intensively studied in connection with context-free languages [1]. Some transductions can be realized by a non-deterministic automaton with output in P(M), called a transducer. More precisely, a transducer is a 6-tuple T = (Q, A, M, I, F, E) where Q is a finite set of states, A is the input alphabet, M is the output monoid, I = (I_q)_{q∈Q} and F = (F_q)_{q∈Q} are arrays of elements of P(M), called respectively the initial and final outputs, and the set of transitions E is a finite subset of Q × A × P(M) × Q. Intuitively, a transition (p, a, R, q) is interpreted as follows: if a is an input letter, the automaton moves from state p to state q and produces the output R. A path is a sequence of consecutive transitions:

q_0 --a_1|R_1--> q_1 --a_2|R_2--> q_2 ··· q_{n−1} --a_n|R_n--> q_n

The (input) label of the path is the word a_1a_2···a_n. Its output is the set I_{q_0}R_1R_2···R_nF_{q_n}. The transduction realized by T maps each word u of A* onto the union of the outputs of all paths with input label u. A transduction τ : A* → M is said to be rational if τ is a rational subset of the monoid A* × M. By the Kleene-Schützenberger theorem [1], a transduction τ : A* → M is rational if and only if it can be realized by a rational transducer, that is, a transducer with outputs in Rat(M).
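The filtering operation of Section 3 is itself straightforward to compute. A sketch of our own (function names ours) that reproduces the abracadabra example above.

def filter_word(u, s):
    # u[s]: keep the letters in positions s_0, s_1, ... that fall in u.
    return ''.join(u[i] for i in s if i < len(u))

def filter_language(L, s):
    # L[s] on a finite sample of L.
    return {filter_word(u, s) for u in L}

squares = [k * k for k in range(6)]          # 0, 1, 4, 9, 16, 25
assert filter_word('abracadabra', squares) == 'abcr'
print(filter_language({'abracadabra', 'aaaaaaaaaa'}, squares))
# {'abcr', 'aaaa'}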
A transduction τ : A* → M is said to preserve recognizability if, for each recognizable subset P of M, τ^{−1}(P) is a recognizable subset of A*. It is well known that rational transductions preserve recognizability, but this property is also shared by the larger class of representable transductions, introduced in [9,10]. Two types of transduction will play an important role in this paper, the removal transductions and the filtering transductions. Given a subset S of N², considered as a relation on N, the removal transduction of S is the transduction σ_S : A* → A* defined by

σ_S(u) = ∪_{(|u|,n)∈S} uA^n.

The filtering transduction of a filter s is the transduction τ_s : A* → A* defined by

τ_s(a_0a_1···a_n) = A^{s_0}a_0A^{s_1}a_1···A^{s_n}a_nA^{{0,1,...,s_{n+1}}}.

The main idea of [9,10] is to write an n-ary operator Ω on languages as the inverse of some transduction τ : A* → A* × ··· × A*, that is, Ω(L_1, ..., L_n) = τ^{−1}(L_1 × ··· × L_n). If the transduction τ turns out to be representable, the results of [9,10] give an explicit construction of a monoid recognizing Ω(L_1, ..., L_n), given monoids recognizing L_1, ..., L_n, respectively.

In our case, we claim that P(S, L) = σ_S^{−1}(L) and L[s] = τ_{∂s−1}^{−1}(L). Indeed, we have on the one hand

σ_S^{−1}(L) = {u ∈ A* | ∪_{(|u|,n)∈S} uA^n ∩ L ≠ ∅}
 = {u ∈ A* | there exists v ∈ A* such that (|u|, |v|) ∈ S and uv ∈ L} = P(S, L),

and on the other hand

τ_{∂s−1}^{−1}(L) = {a_0a_1···a_n ∈ A* | A^{s_0}a_0A^{s_1−s_0−1}a_1···A^{s_n−s_{n−1}−1}a_nA^{{0,1,...,s_{n+1}−s_n−1}} ∩ L ≠ ∅} = L[s].

Unfortunately, the removal transductions and the filtering transductions are not in general representable. We shall see in Section 7 how to overcome this difficulty. But we first need to introduce our second major tool, the residually ultimately periodic sequences.

5 Residually Ultimately Periodic Sequences

Let M be a monoid. A sequence (s_n)_{n≥0} of elements of M is residually ultimately periodic (r.u.p.) if, for each monoid morphism ϕ from M into a finite monoid F, the sequence ϕ(s_n) is ultimately periodic. We are mainly interested in the case where M is the additive monoid N of non-negative integers. The following connection with recognizability preserving sequences was established in [5,7,12,16].

Proposition 2. A sequence (s_n)_{n≥0} of non-negative integers is residually ultimately periodic if and only if the function n → s_n preserves recognizability.

For each non-negative integer t, define the congruence threshold t by setting x ≡ y (thr t) if and only if x = y < t, or x ≥ t and y ≥ t. Thus threshold counting can be viewed as a formalisation of children counting: zero, one, two, three, ..., many.

A function s : N → N is said to be ultimately periodic modulo p if, for each monoid morphism ϕ : N → Z/pZ, the sequence u_n = ϕ(s(n)) is ultimately periodic. Equivalently, there exist two integers m ≥ 0 and r > 0 such that, for each n ≥ m, u_n ≡ u_{n+r} (mod p). A sequence is said to be cyclically ultimately periodic (c.u.p.) if it is ultimately periodic modulo p for every p > 0. These functions are called "ultimately periodic reducible" in [12,13]. Similarly, a function s : N → N is said to be ultimately periodic threshold t if, for each monoid morphism ϕ : N → N_{t,1}, the sequence u_n = ϕ(s(n)) is ultimately periodic. Equivalently, there exist two integers m ≥ 0 and r > 0 such that, for each n ≥ m, u_n ≡ u_{n+r} (thr t).
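The two counting modes just defined, counting modulo p and counting threshold t, can both be checked on sequence prefixes by searching for small witnesses (m, r). A sketch with names of our own, run on n! (cf. Example 2 below).

import math

def thr(x, t):
    # Value of x under the congruence threshold t: 0, 1, ..., t-1, many.
    return min(x, t)

def up_witness(vals, max_m, max_r):
    # Search a witness (m, r) with vals[n] == vals[n + r] for n >= m.
    for r in range(1, max_r + 1):
        for m in range(max_m + 1):
            if all(vals[n] == vals[n + r] for n in range(m, len(vals) - r)):
                return (m, r)
    return None

fact = [math.factorial(n) for n in range(40)]
print(up_witness([x % 6 for x in fact], 10, 6))      # (3, 1): n! = 0 mod 6 for n >= 3
print(up_witness([thr(x, 5) for x in fact], 10, 6))  # (3, 1): n! >= 5 for n >= 3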
Proposition 3. A sequence of non-negative integers is residually ultimately periodic if and only if it is ultimately periodic modulo p for all p > 0 and ultimately periodic threshold t for all t ≥ 0.

The next proposition gives a very simple criterion to generate sequences that are ultimately periodic threshold t for all t.

Proposition 4. A sequence (u_n)_{n≥0} of integers such that lim_{n→∞} u_n = +∞ is ultimately periodic threshold t for all t ≥ 0.

Example 2. The sequence n! is residually ultimately periodic. Indeed, let p be a positive integer. Then for each n ≥ p, n! ≡ 0 (mod p), and thus n! is ultimately periodic modulo p. Furthermore, Proposition 4 shows that, for each t ≥ 0, n! is ultimately periodic threshold t.

The class of cyclically ultimately periodic functions has been thoroughly studied by Siefkes [13], who gave in particular a recursion scheme for producing such functions. Residually ultimately periodic sequences have been studied in [3,5,7,12,15,16]. Their properties are summarized in the next theorem.

Theorem 1 ([16,3]). Let (u_n)_{n≥0} and (v_n)_{n≥0} be r.u.p. sequences. Then the following sequences are also r.u.p.:
(1) (composition) u_{v_n},
(2) (sum) u_n + v_n,
(3) (product) u_n v_n,
(4) (difference) u_n − v_n, provided that u_n ≥ v_n and lim_{n→∞}(u_n − v_n) = +∞,
(5) (exponentiation) u_n^{v_n},
(6) (generalized sum) Σ_{0≤i≤v_n} u_i,
(7) (generalized product) Π_{0≤i≤v_n} u_i.

In particular, the sequences n^k and k^n (for a fixed k) are residually ultimately periodic. However, r.u.p. sequences are not closed under quotients. For instance, let u_n be the sequence equal to 1 if n is prime and to n! + 1 otherwise. Then n!u_n is r.u.p. but u_n is not r.u.p. This answers a question left open in [15].

The sequence 2^{2^{···^2}} (exponential stack of 2's of height n), considered in [12], is also an r.u.p. sequence, according to the following result.

Proposition 5. Let k be a positive integer. Then the sequence u_n defined by u_0 = 1 and u_{n+1} = k^{u_n} is r.u.p.

The existence of non-recursive r.u.p. sequences was established in [12]: if ϕ : N → N is a strictly increasing, non-recursive function, then the sequence u_n = n!ϕ(n) is non-recursive but is residually ultimately periodic. The proof is similar to that of Example 2.

6 Differential Sequences

An integer sequence is called differentially residually ultimately periodic (d.r.u.p. for short) if its differential sequence is residually ultimately periodic. What are the connections between d.r.u.p. sequences and r.u.p. sequences? First, the following result holds.

Proposition 6 ([3, Corollary 28]). Every d.r.u.p. sequence is r.u.p.

However, the two notions are not equivalent: for instance, it was shown in [3] that if b_n is a non-ultimately periodic sequence of 0's and 1's, the sequence u_n = (Σ_{0≤i≤n} b_i)! is r.u.p. but is not d.r.u.p. It suffices to observe that (∂u)_n ≡ b_n threshold 1. Note that, if only cyclic counting were used, it would make no difference:

Proposition 7. Let p be a positive number. A sequence is ultimately periodic modulo p if and only if its differential sequence is ultimately periodic modulo p.

There is a special case for which the notions of r.u.p. and d.r.u.p. sequences are equivalent. Indeed, if the differential sequence is bounded, Proposition 1 can be completed as follows.

Lemma 1. If a syndetic sequence is residually ultimately periodic, then its differential sequence is ultimately periodic.
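Proposition 5's tower sequence can be probed directly for small heights, since exact big-integer arithmetic handles the values up to height 5; the residues modulo 10 visibly stabilize, as the proposition predicts. Code and names are ours.

def tower_sequence(k, n):
    # u_0 = 1, u_{i+1} = k ** u_i, computed exactly, so n must stay
    # small: u_5 for k = 2 already has about 20000 decimal digits.
    seq = [1]
    for _ in range(n):
        seq.append(k ** seq[-1])
    return seq

# Residues modulo 10: 1, 2, 4, 6, 6, 6 -- ultimately constant, in line
# with Proposition 5 (the tower sequence is r.u.p.).
print([u % 10 for u in tower_sequence(2, 5)])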
Putting everything together, we obtain:

Proposition 8. Let s be a syndetic sequence of non-negative integers. The following conditions are equivalent:
(1) s is residually ultimately periodic,
(2) ∂s is residually ultimately periodic,
(3) ∂s is ultimately periodic.

Proof. Proposition 6 shows that (2) implies (1). Furthermore, (3) implies (2) trivially. Finally, Lemma 1 shows that (1) implies (3).

Proposition 9. Let S be an infinite syndetic subset of N. The following conditions are equivalent:
(1) S is recognizable,
(2) the enumerating sequence of S is residually ultimately periodic,
(3) the differential sequence of S is residually ultimately periodic,
(4) the differential sequence of S is ultimately periodic.

Proof. The last three conditions are equivalent by Proposition 8, and the equivalence of (1) and (4) follows from Proposition 1.

The class of d.r.u.p. sequences was thoroughly studied in [3].

Theorem 2 ([3, Theorem 22]). Differentially residually ultimately periodic sequences are closed under sum, product, exponentiation, generalized sum and generalized product. Furthermore, given two d.r.u.p. sequences (u_n)_{n≥0} and (v_n)_{n≥0} such that u_n ≥ v_n and lim_{n→∞}(∂u)_n − (∂v)_n = +∞, the sequence u_n − v_n is d.r.u.p.

7 Residually Representable Transductions

Let M be a monoid. A transduction τ : A* → M is residually rational (resp. residually representable) if, for every monoid morphism α from M into a finite monoid N, the transduction α ◦ τ : A* → N is rational (resp. representable). Since a rational transduction is (linearly) representable, every residually rational transduction is residually representable. Furthermore, every representable transduction is residually representable. We now show that the removal transductions and the filtering transductions are residually rational. We first consider the removal transductions.

Fig. 1. A transducer realizing β.

Proposition 10. Let S be a recognizability preserving relation on N. The removal transduction of S is residually rational.

Proof. Let α be a morphism from A* into a finite monoid N. Let β = α ◦ σ_S and R = α(A). Since the monoid P(N) is finite, the sequence (R^n)_{n≥0} is ultimately periodic. Therefore, there exist two integers r ≥ 0 and q > 0 such that, for all n ≥ r, R^n = R^{n+q}. Consider the following subsets of N: K_0 = {0}, K_1 = {1}, ..., K_{r−1} = {r−1}, K_r = {r, r+q, r+2q, ...}, K_{r+1} = {r+1, r+q+1, r+2q+1, ...}, ..., K_{r+q−1} = {r+q−1, r+2q−1, r+3q−1, ...}. The sets K_i, for i ∈ {0, 1, ..., r+q−1}, are recognizable, and since S is recognizability preserving, each set S^{−1}(K_i) is also recognizable. By Proposition 1, there exist two integers t_i ≥ 0 and p_i > 0 such that, for all n ≥ t_i, n ∈ S^{−1}(K_i) if and only if n + p_i ∈ S^{−1}(K_i). Setting t = max_{0≤i≤r+q−1} t_i and p = lcm_{0≤i≤r+q−1} p_i, we conclude that, for all n ≥ t and for 0 ≤ i ≤ r+q−1, n ∈ S^{−1}(K_i) if and only if n + p ∈ S^{−1}(K_i), or equivalently,

S(n) ∩ K_i ≠ ∅ ⟺ S(n + p) ∩ K_i ≠ ∅.

It follows that the sequence (R_n) of P(N) defined by R_n = R^{S(n)} is ultimately periodic with threshold t and period p, that is, R_n = R_{n+p} for all n ≥ t. Consequently, the transduction β can be realized by the transducer represented in Figure 1, in which a stands for a generic letter of A. Therefore β is rational, and σ_S is residually rational.

Fig. 2. A transducer realizing γ_s.

Proposition 11. Let s be a residually ultimately periodic sequence. Then the filtering transduction τ_s is residually rational.

Proof. Let α be a morphism from A* into a finite monoid N. Let γ_s = α ◦ τ_s and R = α(A).
Finally, let ϕ : N → P(N) be the morphism defined by ϕ(n) = R^n. Since P(N) is finite and s_n is residually ultimately periodic, the sequence ϕ(s_n) = R^{s_n} is ultimately periodic. Therefore, there exist two integers t ≥ 0 and p > 0 such that, for all n ≥ t, R^{s_{n+p}} = R^{s_n}. It follows that the transduction γ_s can be realized by the transducer represented in Figure 2, in which a stands for a generic letter of A. Therefore γ_s is rational, and thus τ_s is residually rational.

The fact that the two previous transductions preserve recognizability is now a direct consequence of the following general statement.

Theorem 3. Let M be a monoid. Any residually rational transduction τ : A* → M preserves recognizability.

Proof. Let P be a recognizable subset of M and let α : M → N be a morphism recognizing P, where N is a finite monoid. By definition, α^{−1}(α(P)) = P. Since τ is residually rational, the transduction α ◦ τ : A* → N is rational. Since N is finite, every subset of N is recognizable. In particular, α(P) is recognizable, and since α ◦ τ, being rational, preserves recognizability, (α ◦ τ)^{−1}(α(P)) is recognizable. The theorem follows, since (α ◦ τ)^{−1}(α(P)) = τ^{−1}(α^{−1}(α(P))) = τ^{−1}(P).

8 Main Results

The aim of this section is to provide a unified solution for the filtering problem and the removal problem.

8.1 The Filtering Problem

Theorem 4. A filter preserves recognizability if and only if it is differentially residually ultimately periodic.

Proposition 11 and Theorem 3 show that if a filter is d.r.u.p., then it preserves recognizability. We now establish the converse.

Proposition 12. Every recognizability preserving filter is differentially residually ultimately periodic.

Proof. Let s be a recognizability preserving filter. By Propositions 3 and 7, it suffices to prove the following properties:
(1) for each p > 0, s is ultimately periodic modulo p,
(2) for each t ≥ 0, ∂s is ultimately periodic threshold t.

(1) Let p be a positive integer and let A = {0, 1, ..., p−1}. Let u = a_0a_1··· be the infinite word whose i-th letter a_i is equal to s_i modulo p. At this stage, we shall need two elementary properties of ω-rational sets. The first one states that an infinite word u is ultimately periodic if and only if the ω-language {u} is ω-rational. The second one states that, if L is a recognizable language of A*, then L⃗ (the set of infinite words having infinitely many prefixes in L) is ω-rational. We claim that u is ultimately periodic. Define L as the set of prefixes of the infinite word (0 1 2 ··· (p−1))^ω. Then L[s] is the set of prefixes of u. Since L is recognizable, L[s] is recognizable, and thus the set L⃗[s] is ω-rational. But this set reduces to {u}, which proves the claim. Therefore, the sequence (s_n)_{n≥0} is ultimately periodic modulo p.

(2) The proof is quite similar to that of (1), but slightly more technical. Let t be a non-negative integer and let B = {0, 1, ..., t} ∪ {a}, where a is a special symbol. Let d = d_0d_1··· be the infinite word whose i-th letter d_i is equal to s_{i+1} − s_i − 1 threshold t. Let us prove that d is ultimately periodic. Consider the recognizable prefix code P = {0, 1a, 2a², 3a³, ..., ta^t, a}. Then P*[s] is recognizable, and so is the language R = P*[s] ∩ {0, 1, ..., t}*. We claim that, for each n > 0, the word p_n = d_0d_1···d_{n−1} is the maximal word of R of length n in the lexicographic order induced by the natural order 0 < 1 < ··· < t.
First, p_n = u[s], where u = a^{s_0}d_0a^{s_1−s_0−1}d_1···d_{n−1}a^{s_n−s_{n−1}−1}, and thus p_n ∈ R. Next, let p′_n = d′_0d′_1···d′_{n−1} be another word of R of length n. Then p′_n = u′[s] for some word u′ ∈ P*. Suppose that p′_n comes after p_n in the lexicographic order. We may assume that, for some index i ≤ n−1, d_0 = d′_0, d_1 = d′_1, ..., d_{i−1} = d′_{i−1} and d_i < d′_i. Since u′ ∈ P*, the letter d′_i, which occurs in position s_i in u′, is followed by at least d′_i letters a. Now d′_i > d_i, whence d_i < t and d_i = s_{i+1} − s_i − 1. It follows in particular that in u′, the letter in position s_{i+1} is an a, a contradiction, since u′[s] contains no occurrence of a. This proves the claim.

Let now A be a finite deterministic trim automaton recognizing R. It follows from the claim that in order to read d in A, starting from the initial state, it suffices to choose, in each state q, the unique transition with maximal label in the lexicographic order. It follows at once that d is ultimately periodic. Therefore, the sequence (∂s) − 1 is ultimately periodic threshold t, and so is ∂s.

8.2 The Removal Problem

The solution of the removal problem was given in [12].

Theorem 5. Let S be a subset of N². The following conditions are equivalent:
(1) for each recognizable language L, the language P(S, L) is recognizable,
(2) S is a recognizability preserving relation.

The most difficult part of the proof, (2) implies (1), follows immediately from Proposition 10 and Theorem 3.

9 Further Properties of d.r.u.p. Sequences

Coming back to the filtering problem, the question arises to characterize the filters S such that, for every recognizable language L, both L[S] and L[N \ S] are recognizable. By Theorem 4, the sequences defined by S and its complement should be d.r.u.p. This implies that S is recognizable, according to the following slightly more general result.

Proposition 13. Let S and S′ be two infinite subsets of N such that S ∪ S′ and S ∩ S′ are recognizable. If the enumerating sequence of S is d.r.u.p. and the enumerating sequence of S′ is r.u.p., then S and S′ are recognizable.

One can show that the conclusion of Proposition 13 no longer holds if S′ is only assumed to be residually ultimately periodic.

10 Conclusion

Our solution to the filtering problem was based on the fact that any residually rational transduction preserves recognizability. There are several advantages to our approach. First, it gives a unified solution to apparently disconnected problems, like the filtering problem and the removal problem. Actually, most of (if not all) the automata-theoretic puzzles proposed in [4,5,6,7,9,10,11,12,14] and [15, Section 5.2] can be solved by using the stronger fact that any residually representable transduction preserves recognizability.

Next, refining the approach of [9,10], if τ : A* → A* × ··· × A* is a residually representable transduction, one can give an explicit construction of a monoid recognizing τ^{−1}(L_1 × ··· × L_n), given monoids recognizing L_1, ..., L_n, respectively (the details will be given in the full version of this paper). This information can be used, in turn, to see whether a given operation on languages preserves star-free languages, or other standard classes of rational languages.

Acknowledgements. Special thanks to Michèle Guerlain for her careful reading of a first version of this paper, and to the anonymous referees for their suggestions.

References
1. J. Berstel, Transductions and context-free languages, Teubner, Stuttgart, 1979.
2. O. Carton and W. Thomas, The monadic theory of morphic infinite words and generalizations, in MFCS 2000, Lecture Notes in Computer Science 1893, M. Nielsen and B. Rovan, eds., 2000, 275–284.
3. O. Carton and W. Thomas, The monadic theory of morphic infinite words and generalizations, Inform. Comput. 176 (2002), 51–76.
4. S. R. Kosaraju, Finite state automata with markers, in Proc. Fourth Annual Princeton Conference on Information Sciences and Systems, Princeton, N.J., 1970, 380.
5. S. R. Kosaraju, Regularity preserving functions, SIGACT News 6 (2) (1974), 16–17. Correction to "Regularity preserving functions", SIGACT News 6 (3) (1974), 22.
6. S. R. Kosaraju, Context-free preserving functions, Math. Systems Theory 9 (1975), 193–197.
7. D. Kozen, On regularity-preserving functions, Bull. Europ. Assoc. Theor. Comput. Sci. 58 (1996), 131–138. Erratum: On regularity-preserving functions, Bull. Europ. Assoc. Theor. Comput. Sci. 59 (1996), 455.
8. A. B. Matos, Regularity-preserving letter selections, DCC-FCUP Internal Report.
9. J.-É. Pin and J. Sakarovitch, Operations and transductions that preserve rationality, in 6th GI Conference, Lecture Notes in Computer Science 145, Springer-Verlag, Berlin, 1983, 617–628.
10. J.-É. Pin and J. Sakarovitch, Une application de la représentation matricielle des transductions, Theoret. Comput. Sci. 35 (1985), 271–293.
11. J. I. Seiferas, A note on prefixes of regular languages, SIGACT News 6 (1974), 25–29.
12. J. I. Seiferas and R. McNaughton, Regularity-preserving relations, Theoret. Comput. Sci. 2 (1976), 147–154.
13. D. Siefkes, Decidable extensions of monadic second order successor arithmetic, in: Automatentheorie und formale Sprachen (Mannheim, 1970), J. Dörr and G. Hotz, eds., B.I. Hochschultaschenbücher, 441–472.
14. R. E. Stearns and J. Hartmanis, Regularity preserving modifications of regular expressions, Information and Control 6 (1963), 55–69.
15. Guo-Qiang Zhang, Automata, Boolean matrices, and ultimate periodicity, Information and Computation 152 (1999), 138–154.
16. Guo-Qiang Zhang, Periodic functions for finite semigroups, preprint.

Languages Defined by Generalized Equality Sets

Vesa Halava¹, Tero Harju¹, Hendrik Jan Hoogeboom², and Michel Latteux³

¹ Department of Mathematics and TUCS – Turku Centre for Computer Science, University of Turku, FIN-20014 Turku, Finland, {vehalava,harju}@utu.fi
² Dept. of Comp. Science, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands, hoogeboom@liacs.nl
³ Université des Sciences et Technologies de Lille, Bâtiment M3, 59655 Villeneuve d'Ascq Cédex, France, latteux@lifl.fr

Abstract. We consider generalized equality sets of the form E_G(a, g_1, g_2) = {w | g_1(w) = ag_2(w)}, determined by instances of the generalized Post Correspondence Problem, where the morphisms g_1 and g_2 are nonerasing and a is a letter. We are interested in the family consisting of the languages h(E_G(J)), where h is a coding and J is a shifted equality set of the above form. We prove several closure properties for this family.

1 Introduction

In formal language theory, languages are often determined by their generating grammars or accepting machines. It is also customary to say that languages generated by grammars of a certain form, or accepted by automata of a specific type, form a language family.
Here we shall study a language family defined by simple generalized equality sets of the form E_G(J), where J = (a, g_1, g_2) is an instance of the shifted Post Correspondence Problem consisting of a letter a and two morphisms g_1 and g_2. The set E_G(J) consists of the words w that satisfy g_1(w) = ag_2(w). Our motivation for these generalized equality sets comes partly from a result of [2], where it was proved that the family of regular valence languages is equal to the family of languages of the form h(E_G(J)), where h is a coding (i.e., a letter-to-letter morphism) and, moreover, in the instance J = (a, g_1, g_2) the morphism g_2 is periodic. Here we shall consider the general case, where we do not assume g_2 to be periodic, but require both morphisms to be nonerasing. We study this family CE of languages through its closure properties. In particular, we show that CE is closed under union, product, Kleene plus, and intersection with regular sets. Also, more surprisingly, CE is closed under nonerasing morphisms and inverse morphisms.

2 Preliminaries

Let A be an alphabet, and denote by A* the monoid of all finite words under the operation of catenation. Note that the empty word, denoted by ε, is in the monoid A*. The semigroup A* \ {ε} generated by A is denoted by A⁺. For two words u, v ∈ A*, u is a prefix of v if there exists a word z ∈ A* such that v = uz. This is denoted by u ≤ v. If v = uz, then we also write u = vz^{−1} and z = u^{−1}v.

In the following, let A and B be alphabets and g : A* → B* a mapping. For a word x ∈ B*, we denote by g^{−1}(x) = {w ∈ A* | g(w) = x} the inverse image of x under g. Then g^{−1}(K) = ∪_{x∈K} g^{−1}(x) is the inverse image of K ⊆ B* under g, and g(L) = {g(w) | w ∈ L} is the image of L ⊆ A* under g. Also, g is a morphism if g(uv) = g(u)g(v) for all u, v ∈ A*. A morphism g is a coding if it maps letters to letters, that is, if g(A) ⊆ B. A morphism g is said to be periodic if there exists a word w ∈ B* such that g(A*) ⊆ w*. In the following, for an alphabet A, the alphabet Ā = {ā | a ∈ A} is a copy of A, with A ∩ Ā = ∅.

In the Post Correspondence Problem, PCP for short, we are given two morphisms g_1, g_2 : A* → B*, and it is asked whether or not there exists a nonempty word w ∈ A⁺ such that g_1(w) = g_2(w). Here the pair (g_1, g_2) is an instance of the PCP, and the word w is called a solution. As a general reference to the problems and results concerning the Post Correspondence Problem, we give [3]. For an instance I = (g_1, g_2) of the PCP, let E(I) = {w ∈ A* | g_1(w) = g_2(w)} be its equality set. It is easy to show that an equality set E = E(g_1, g_2) is always a monoid, that is, E = E*. In fact, it is a free monoid, and thus the algebraic structure of E is relatively simple, although the problem whether or not E is trivial is undecidable.

We shall now consider special instances of the generalized Post Correspondence Problem in order to obtain slightly more structured equality sets. In the shifted Post Correspondence Problem, or shifted PCP for short, we are given two morphisms g_1, g_2 : A* → B* and a letter a ∈ B, and it is asked whether there exists a word w ∈ A* such that

g_1(w) = ag_2(w).   (2.1)

The triple J = (a, g_1, g_2) is called an instance of the shifted PCP, and a word w satisfying equation (2.1) is called a solution of J. It is clear that a solution w is always nonempty.
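Since the shifted PCP is undecidable in general, solutions can only be searched for within a length bound. A brute-force sketch over a toy instance of our own devising; all names are ours.

from itertools import product

def image(g, w):
    # Extend the letter map g : A -> B* to a morphism on words.
    return ''.join(g[x] for x in w)

def shifted_solutions(a, g1, g2, max_len):
    # All solutions w with g1(w) = a g2(w) and |w| <= max_len.
    A = sorted(g1)
    return [''.join(w)
            for n in range(1, max_len + 1)
            for w in product(A, repeat=n)
            if image(g1, w) == a + image(g2, w)]

# Toy instance: both morphisms nonerasing.
g1 = {'x': 'ab', 'y': 'a'}
g2 = {'x': 'b', 'y': 'a'}
print(shifted_solutions('a', g1, g2, 3))
# ['x', 'xy', 'yx', 'xyy', 'yxy', 'yyx']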
We let

E_G(J) = {w ∈ A⁺ | g_1(w) = ag_2(w)}

be the generalized equality set of J. We shall denote by CE the set of all languages h(E_G(J)), where h is a coding and the morphisms in the instance J of the shifted PCP are both nonerasing.

In [2], CE_per was defined as the family of languages h(E_G(J)), where h is a coding and one of the morphisms in the instance J of the shifted PCP was assumed to be periodic. It was proved in [2] that CE_per is equal to the family of languages defined by the regular valence grammars (see [6]). It is easy to see that the morphisms in the instances could have been assumed to be nonerasing in order to get the same result. Therefore, the family CE studied in this paper is a generalization of CE_per; more precisely, CE_per is a subfamily of CE.

3 Closure Properties of CE

The closure properties of the family CE_per follow from the known closure properties of regular valence languages. In this section, we study the closure properties of the more general family CE under various operations. Before we start our journey through the closure results, we first make some assumptions on the instances of the shifted PCP defining the languages at hand. First of all, we may always assume that in an instance J = (a, g_1, g_2) of the shifted PCP the shift letter a is a special symbol satisfying: the shift letter a can appear only as the first letter in the images of g_1, and it does not occur at all in the images of g_2. To see this, consider any language L = h(E_G(a, g_1, g_2)), where g_1, g_2 : A* → B* and h : A* → C*. Let # be a new letter not in A ∪ B. Construct a new instance (#, g_1′, g_2′), where g_1′, g_2′ : (A ∪ Ā)* → (B ∪ {#})* and Ā is a copy of A, by setting, for all x ∈ A,

g_2′(x) = g_2′(x̄) = g_2(x), g_1′(x) = g_1(x), and
g_1′(x̄) = g_1(x) if a is not a prefix of g_1(x), and g_1′(x̄) = #w if g_1(x) = aw.

Define a new coding h′ : (A ∪ Ā)* → C* by h′(x) = h′(x̄) = h(x) for all x ∈ A. It is now obvious that L = h′(E_G(#, g_1′, g_2′)). We shall call such an instance (#, g_1′, g_2′) shift-fixed: the shift letter # is used only as a first letter.

The next lemma shows that we may also assume that the instance (g_1, g_2) has no nontrivial solutions, that is, E(g_1, g_2) = {ε} for every instance J = (a, g_1, g_2) defining the language h(E_G(J)). For this result we introduce two mappings which are used for desynchronizing a pair of morphisms. Let d be a new letter. For a word u = a_1a_2···a_n, where each a_i is a letter, define

ℓ_d(u) = da_1da_2d···da_n and r_d(u) = a_1da_2d···a_nd.

In other words, ℓ_d is the morphism that adds d in front of every letter, and r_d is the morphism that adds d after every letter of a word.

Lemma 1. For every instance J of the shifted PCP and coding h, there exist an instance J′ = (a, g_1′, g_2′) and a coding h′ such that h(E_G(J)) = h′(E_G(J′)) and E(g_1′, g_2′) = {ε}.
Lemma 1. For every instance J of the shifted PCP and coding h, there exist an instance J′ = (a, g1′, g2′) and a coding h′ such that h(EG(J)) = h′(EG(J′)) and E(g1′, g2′) = {ε}.

Proof. Let J = (#, g1, g2) be a shift-fixed instance of the shifted PCP, where g1, g2 : A∗ → B∗, and let h : A∗ → C∗ be a coding. We define new morphisms g1′, g2′ : (A ∪ Ā)∗ → (B ∪ {d})∗, where d ∉ B is a new letter and Ā is a copy of A, as follows. For all x ∈ A,

g2′(x) = ℓd(g2(x))   and   g2′(x̄) = ℓd(g2(x)) d,    (3.1)
g1′(x) = g1′(x̄) = #d · rd(w),   if g1(x) = #w,
g1′(x) = g1′(x̄) = rd(g1(x)),    if # ≰ g1(x).       (3.2)

Note that the letters of Ā can occur only as the last letter of a solution of (#, g1′, g2′). Since every image under g2′ begins with the letter d, and d is not a prefix of any image under g1′, we obtain E(g1′, g2′) = {ε}. On the other hand, (#, g1′, g2′) has a solution wx̄ if and only if wx is a solution of (#, g1, g2). Therefore we define h′ : (A ∪ Ā)∗ → C∗ by h′(x) = h′(x̄) = h(x) for all x ∈ A. The claim of the lemma follows, since obviously h(EG(J)) = h′(EG(J′)). ⊓⊔

We call an instance (a, g1, g2) reduced if it is shift-fixed and E(g1, g2) = {ε}.

3.1 Union and Product

Theorem 2. The family CE is closed under union.

Proof. Let K, L ∈ CE with K = h1(EG(J1)) and L = h2(EG(J2)), where J1 = (a1, g11, g12) and J2 = (a2, g21, g22) are reduced, and g11, g12 : Σ∗ → B1∗ and g21, g22 : Ω∗ → B2∗. Without restriction we may assume that Σ ∩ Ω = ∅. (Otherwise we take a primed copy of the alphabet Ω that is disjoint from Σ and define a new instance J2′ by replacing the letters with their primed copies.) Assume also that B1 ∩ B2 = ∅. Let B = B1 ∪ B2, and let # be a new letter. First replace every appearance of the shift letters a1 and a2 in J1 and J2 by #. Define morphisms g1, g2 : (Σ ∪ Ω)∗ → B∗ as follows: for all x ∈ Σ ∪ Ω,

g1(x) = g11(x),  if x ∈ Σ,        g2(x) = g12(x),  if x ∈ Σ,
g1(x) = g21(x),  if x ∈ Ω,        g2(x) = g22(x),  if x ∈ Ω.

Define a coding h : (Σ ∪ Ω)∗ → C∗ similarly:

h(x) = h1(x),  if x ∈ Σ,
h(x) = h2(x),  if x ∈ Ω.    (3.3)

Since Σ ∩ Ω = ∅, and J1 and J2 are reduced (i.e., E(g11, g12) = {ε} = E(g21, g22)), the solutions in EG(J1) and EG(J2) cannot be combined or mixed. Thus it is straightforward to show that h(EG(#, g1, g2)) = K ∪ L. ⊓⊔
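Here is a hedged sketch (ours, not the authors' construction verbatim) of the instance built in this proof; it assumes shift-fixed instances over disjoint source and target alphabets, with # a fresh letter.

# A minimal sketch of the union construction from Theorem 2.
def union_instance(a1, g11, g12, a2, g21, g22):
    """Merge two reduced shifted-PCP instances; source alphabets disjoint."""
    assert not set(g11) & set(g21), "input alphabets must be disjoint"
    rename1 = lambda w: w.replace(a1, "#")   # unify the shift letters;
    rename2 = lambda w: w.replace(a2, "#")   # safe for shift-fixed instances
    g1 = {x: rename1(w) for x, w in g11.items()}
    g1.update({x: rename2(w) for x, w in g21.items()})
    g2 = {x: rename1(w) for x, w in g12.items()}
    g2.update({x: rename2(w) for x, w in g22.items()})
    return "#", g1, g2

The coding h is merged in the same letter-wise fashion. Combined with the enumeration sketch after Section 2, one can confirm h(EG(#, g1, g2)) = K ∪ L on small examples.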
Next we consider the product KL of languages.

Theorem 3. The family CE is closed under product of languages.

Proof. Let K, L ∈ CE with K = h1(EG(J1)) and L = h2(EG(J2)), where J1 = (a1, g11, g12) and J2 = (a2, g21, g22) are shift-fixed. Assume that g11, g12 : Σ∗ → B1∗ and g21, g22 : Ω∗ → B2∗, where again we may assume that Σ ∩ Ω = ∅ and, similarly, that B1 ∩ B2 = ∅. We also assume that the images of the morphisms have length at least 2 (actually, this is needed only for g11); this can be guaranteed, for example, by the construction in Lemma 1. We shall prove that KL = {uv | u ∈ K, v ∈ L} is in CE. For this we define morphisms g1, g2 : (Σ ∪ Ω)∗ → (B1 ∪ B2)∗ in the following way: for each x ∈ Σ,

g1(x) = ℓ_{a2}(g11(x)),   if a1 ≰ g11(x),
g1(x) = a1 y ℓ_{a2}(w),   if g11(x) = a1 y w (y ∈ B1),

and g2(x) = r_{a2}(g12(x)); and for each x ∈ Ω, g1(x) = g21(x) and g2(x) = g22(x). If we now define h by combining h1 and h2 as in (3.3), we obtain h(EG(a1, g1, g2)) = KL. ⊓⊔

We now extend the above result by proving that CE is closed under Kleene plus, i.e., if K ∈ CE, then K+ = ∪_{i≥1} K^i ∈ CE. Clearly CE is not closed under Kleene star, since the empty word does not belong to any language in CE.

Theorem 4. The family CE is closed under Kleene plus.

Proof. Let K = h(EG(#, g1, g2)), where g1, g2 : A∗ → B∗ are nonerasing morphisms, h : A∗ → C∗ is a coding, and the instance (#, g1, g2) is shift-fixed. Let Ā be a copy of A, and define ḡ1, ḡ2 : (A ∪ Ā)∗ → B∗ in the following way: for each x ∈ A,

ḡ1(x) = g1(x)   and   ḡ2(x) = g2(x),
ḡ1(x̄) = ℓ#(g1(x)),   if # ≰ g1(x),
ḡ1(x̄) = ℓ#(w),       if g1(x) = #w,
ḡ2(x̄) = r#(g2(x)).

Extend h to Ā by setting h(x̄) = h(x) for all x ∈ A. It is now clear that h(EG(#, ḡ1, ḡ2)) = K+, since ḡ1(w) = # ḡ2(w) if and only if w = x1 ⋯ xn xn+1, where xi ∈ Ā+ for 1 ≤ i ≤ n, xn+1 ∈ A+, ḡ1(xi) # = # ḡ2(xi) for 1 ≤ i ≤ n, and ḡ1(xn+1) = # ḡ2(xn+1). After removing the bars from the letters xi (by h), we obtain words in EG(#, g1, g2). ⊓⊔

3.2 Intersection with Regular Languages

We show now that CE is closed under intersection with regular languages. Note that for CEper this closure already follows from the closure properties of Reg(Z) languages.

Theorem 5. The family CE is closed under intersection with regular languages.

Proof. Let J = (a, g1, g2) be an instance of the shifted PCP, g1, g2 : Σ∗ → B∗, and let L = h(EG(J)), where h : Σ∗ → C∗ is a coding. We shall prove that h(EG(J)) ∩ R is in CE for all regular R ⊆ C∗. We note first that h(EG(J)) ∩ R = h(EG(J) ∩ h−1(R)), and therefore it is sufficient to show that, for all regular languages R ⊆ Σ∗, h(EG(J) ∩ R) is in CE. We therefore give a construction of instances J′ of the shifted PCP such that EG(J′) = EG(J) ∩ R.

Assume R ⊆ Σ∗ is a regular language, and let G = (N, Σ, P, S) be a right-linear grammar generating R (see [7]). Let N = {A0, …, An−1}, where S = A0, and assume without restriction that there are no productions having S = A0 on the right-hand side. We consider the set P of productions as an alphabet. Let # and d be new letters. We define new morphisms g1′, g2′ : P∗ → (B ∪ {d, #})∗ as follows. Assume that g1(a) = a1 a2 … ak and g2(a) = b1 b2 … bm for the (generic) letter a ∈ Σ. We define

g1′(π) = # d^n a1 d^n a2 d^n ⋯ d^n ak d^j,      if π = (A0 → aAj),
g1′(π) = d^{n−i} a1 d^n a2 d^n ⋯ d^n ak d^j,    if π = (Ai → aAj),
g1′(π) = # d^n a1 d^n a2 d^n ⋯ d^n ak,          if π = (A0 → a),
g1′(π) = d^{n−i} a1 d^n a2 d^n ⋯ d^n ak,        if π = (Ai → a),

and

g2′(π) = d^n b1 d^n b2 ⋯ d^n bm,   if π = (A → aX), where X ∈ N ∪ {ε}.

As in [4], EG(J′) = EG(J) ∩ R for the new instance J′ = (#, g1′, g2′). The claim follows from this. ⊓⊔

3.3 Morphisms

Next we present a construction for the closure under nonerasing morphisms. This construction is somewhat more involved than the previous ones.

Theorem 6. The family CE is closed under taking images of nonerasing morphisms.

Proof. Let J = (a, g1, g2) be an instance of the shifted PCP, where g1, g2 : A∗ → B∗, and let L = h(EG(J)), where h : A∗ → C∗ is a coding. Assume that f : C∗ → Σ∗ is a nonerasing morphism. We shall construct h′, g1′ and g2′ such that f(L) = h′(EG(J′)) for the new instance J′ = (a, g1′, g2′).

First we show that we can restrict ourselves to the case where

min{|g1(x)|, |g2(x)|} ≥ |f(h(x))|   for all x ∈ A.    (3.4)

Indeed, suppose the instance J does not satisfy (3.4). We construct a new instance J̄ = (#, ḡ1, ḡ2) and a coding h̄ such that h̄(EG(J̄)) = h(EG(J)) and ḡ1 and ḡ2 fulfill (3.4). Let c ∉ B be a new letter and let k = max_{x∈A} |f(h(x))|. We define ḡ1(x) = ℓ_{c^k}(g1(x)) and ḡ2(x) = ℓ_{c^k}(g2(x)) for all x ∈ A, where ℓ_{c^k} inserts the word c^k in front of every letter. We also need a new copy x′ of each letter x for which a is a prefix of g1(x): if g1(x) = aw, where w ∈ B∗, then define ḡ1(x′) = # ℓ_{c^k}(w). It now follows that if u ∈ EG(J̄), then u = x′v for some word v ∈ A∗ with xv ∈ EG(J). Therefore, defining h̄ by

h̄(y) = h(y),   if y ∈ A,
h̄(y) = h(x),   if y = x′,

we have h̄(EG(J̄)) = h(EG(J)) as required.

Now assume that (3.4) holds for J = (a, g1, g2) and f, and consider the nonerasing morphism f ∘ h : A∗ → Σ∗.
Note that the morphism f ∘ h also satisfies (3.4). In order to prove the claim, it is therefore sufficient to consider the case where h is the identity mapping, that is, f = f ∘ h.

First we define, for every image f(x) with x ∈ A, a new alphabet Ax = {bx | b ∈ Σ}, and we consider the words (b1 b2 … bm)x = (b1)x (b2)x … (bm)x for f(x) = b1 … bm. Let c and d be new letters and let n = Σ_{x∈A} |f(x)|. Assume that A = {x1, x2, …, xq}. Partition the integers 1, 2, …, n into q sets such that to the letter xi there corresponds a set Si = {i1, i2, …, i_{|f(xi)|}} of |f(xi)| integers. Assume that f(xi) = b1 … bm, g1(xi) = a1 a2 … aℓ, and g2(xi) = a1′ a2′ … ak′. We define new morphisms g1′ and g2′ as follows:

g1′((b1)xi) = c^n d^n a1 c^{i1},
g1′((bj)xi) = c^{n−i_{j−1}} d^n aj c^{ij}   for j = 2, …, m−1,
g1′((bm)xi) = c^{n−i_{m−1}} d^n am c^n d^n ⋯ c^n d^n aℓ,

and

g2′((b1)xi) = c^n d^n a1′ c^n d^{i1},
g2′((bj)xi) = d^{n−i_{j−1}} aj′ c^n d^{ij}   for j = 2, …, m−1,
g2′((bm)xi) = d^{n−i_{m−1}} am′ c^n d^n ⋯ c^n d^n ak′.

Then

g1′((b1 … bm)xi) = c^n d^n a1 c^n d^n a2 ⋯ c^n d^n aℓ,
g2′((b1 … bm)xi) = c^n d^n a1′ c^n d^n a2′ ⋯ c^n d^n ak′.

The beginning still has to be fixed. For the cases where a1 = a we need new letters (b1)′xi, for which we define g1′((b1)′xi) = a c^{i1} and g2′((b1)′xi) = c^n d^n a1′ c^n d^{i1}. This completes the construction of the morphisms g1′ and g2′. Next we define h′ by setting h′((bi)x) = bi and h′((b1)′x) = b1 for all i and x. We obtain h′(EG(J′)) = f(h(EG(J))), which proves the claim. ⊓⊔

Next we prove that the family CE is closed under inverses of nonerasing morphisms.

Theorem 7. The family CE is closed under inverses of nonerasing morphisms.

Proof. Consider a language h(EG(J)), where J = (#, g1, g2) with gi : A∗ → B∗, and h : A∗ → C∗ is a coding. We may assume that h(A) = C. Moreover, let g : Σ∗ → C∗ be a nonerasing morphism. For each a ∈ Σ, let h−1(g(a)) = {va,1, va,2, …, va,ka}, and let Σa = {a(1), …, a(ka)} be a set of new letters for a. Denote Θ = ∪_{a∈Σ} Σa, and define the morphisms g1′, g2′ : Θ∗ → B∗ and the coding t : Θ∗ → Σ∗ by

gj′(a(i)) = gj(va,i) for j = 1, 2,   and   t(a(i)) = a   for each a(i) ∈ Θ.

Consider the instance J′ = (#, g1′, g2′). Assume that x = a1 a2 … an ∈ g−1(h(EG(J))), with ai ∈ Σ. Then there exists a word w = w1 w2 … wn such that g1(w) = # g2(w) and ai ∈ g−1(h(wi)), that is, wi = v_{ai,ri} ∈ h−1(g(ai)) for some ri. Hence g1′(w′) = # g2′(w′) for the word w′ = a1^{(r1)} a2^{(r2)} … an^{(rn)}, for which t(w′) = x. Therefore x ∈ t(EG(J′)). The converse inclusion, t(EG(J′)) ⊆ g−1(h(EG(J))), is clear by the above constructions. ⊓⊔

Let A and B be two alphabets. A mapping τ : A∗ → 2^{B∗}, where 2^{B∗} denotes the set of all subsets of B∗, is a substitution if τ(uv) = τ(u)τ(v) for all u, v ∈ A∗. Note that τ is actually a morphism from A∗ to 2^{B∗}. A substitution τ is called finite if τ(a) is finite for all a ∈ A, and nonerasing if ∅ ≠ τ(a) and τ(a) ≠ {ε} for all a ∈ A.

Corollary 8. The family CE is closed under nonerasing finite substitutions.

Proof. Every nonerasing finite substitution is a composition of the inverse of a coding and a nonerasing morphism. Since CE is closed under nonerasing morphisms and under inverses of nonerasing morphisms, the claim follows. ⊓⊔
Note that CE is almost a trio (see [1]), but it seems not to be closed under all inverse morphisms. It is also almost a bifaithful rational cone (see [5]), but since the languages in CE do not contain ε, CE is not closed under bifaithful finite transductions.

References

1. S. Ginsburg, Algebraic and Automata-theoretic Properties of Formal Languages, North-Holland, 1975.
2. V. Halava, T. Harju, H. J. Hoogeboom and M. Latteux, Valence Languages Generated by Generalized Equality Sets, Tech. Report 502, Turku Centre for Computer Science, August 2002, submitted.
3. T. Harju and J. Karhumäki, Morphisms, in: Handbook of Formal Languages (G. Rozenberg and A. Salomaa, eds.), vol. 1, Springer-Verlag, 1997.
4. M. Latteux and J. Leguy, On the composition of morphisms and inverse morphisms, Lecture Notes in Comput. Sci. 154 (1983), 420–432.
5. M. Latteux and J. Leguy, On the Usefulness of Bifaithful Rational Cones, Math. Systems Theory 18 (1985), 19–32.
6. G. Păun, A new generative device: valence grammars, Revue Roumaine de Math. Pures et Appliquées 6 (1980), 911–924.
7. A. Salomaa, Formal Languages, Academic Press, New York, 1973.

Context-Sensitive Equivalences for Non-interference Based Protocol Analysis⋆

Michele Bugliesi, Ambra Ceccato, and Sabina Rossi

Dipartimento di Informatica, Università Ca' Foscari di Venezia
via Torino 155, 30172 Venezia, Italy
{bugliesi, ceccato, srossi}@dsi.unive.it

Abstract. We develop new proof techniques, based on non-interference, for the analysis of safety and liveness properties of cryptographic protocols expressed as terms of the process algebra CryptoSPA. Our approach draws on new notions of behavioral equivalence, built on top of a context-sensitive labelled transition system, that allow us to characterize the behavior of a process in the presence of any attacker with a given initial knowledge. We demonstrate the effectiveness of the approach with an example of a fair exchange protocol.

1 Introduction

Non-interference has been advocated by various authors [1, 9] as a powerful method for the analysis of cryptographic protocols. In [9], Focardi et al. propose a general schema for specifying security properties with a uniform and concise definition. The approach draws on earlier work by the same authors on characterizing information-flow security in terms of non-interference for the Security Process Algebra (SPA, for short). We briefly review the main ideas below.

SPA is a variant of CCS in which the set of actions is partitioned into two sets: L, for low, and H, for high. A non-interference property 𝒫 for a process E is expressed as follows:

E ∈ 𝒫 if ∀Π ∈ 𝓔H : (E||Π) \ H ≈𝒫 E \ H    (1)

where 𝓔H is the set of all high-level processes, ≈𝒫 is an observation equivalence (parametric in 𝒫), || is parallel composition, and \ is restriction. The processes E \ H and (E||Π) \ H represent the low-level views of E and of E||Π, respectively. The basic intuition is expressed by the slogan: "if no high-level process can change the low behavior, then no flow of information from high to low is possible".

In [9] this idea is refined to provide a general definition of security properties for cryptographic protocols described as terms of CryptoSPA, a process algebra that extends SPA with cryptographic primitives. Intuitively, the refinement amounts to viewing the participants of a protocol as low-level processes, while the high-level processes represent the external attackers. Non-interference then implies that the attackers have no way to change the low (honest) behavior of the protocol.
⋆ This work has been partially supported by the MIUR project "Modelli formali per la sicurezza (MEFISTO)" and the EU project IST-2001-32617 "Models and types for security in mobile distributed systems (MyThS)".

There are two problems that need to be addressed to formalize this idea. First, the intruder should be assumed to have complete control over the public components of the network. Consequently, any step in a protocol involving a public channel should be classified as a high-level action. However, since a protocol specification is usually entirely determined by the exchange of messages over public channels, a characterization like (1) becomes trivial, as (E||Π) \ H and E \ H are simply the null processes. This is easily rectified by extending the protocol specification with low-level actions that are used to specify the desired security property.

A further problem arises from the formalization of the perfect cryptography assumption that is usually made in the analysis of the logical properties of cryptographic protocols. In [9] this assumption is expressed by making the definition of non-interference dependent on the initial knowledge of the attacker and on a deduction system by which the attacker may compute new information. The initial knowledge, noted φ, includes private data (e.g., the enemy's private keys) as well as any piece of publicly available information, such as names of entities and public keys. Property (1) is thus reformulated for a protocol P as follows:

P ∈ 𝒫 if ∀Π ∈ 𝓔Hφ : (P||Π) \ H ≈𝒫 P \ H    (2)

where 𝓔Hφ is the set of the high-level processes Π which can perform only actions using the public channel names and whose messages (those syntactically appearing in Π) can be deduced from φ.

This framework is very general and lends itself to the characterization of various security properties, obtained by instantiating the equivalence ≈𝒫 in the schema above. It is less effective as a proof method, however, due to the universal quantification over the possible intruders Π in the class 𝓔Hφ. In [9] the problem is circumvented by analyzing the protocol in the presence of the "hardest attacker". However, this characterization is proved correct in [9] only for relations ≈𝒫 that are behavioral preorders on processes. In particular, the proof method is not applicable to equivalences based on bisimulation and, consequently, to the analysis of certain branching-time liveness properties, such as fairness.

We partially rectify the problem by developing a technique which requires us to exhibit neither an explicit attacker nor, in particular, a hardest attacker. Our approach draws on ideas from [4] to represent the attacker indirectly, in terms of a context-sensitive labelled transition system. The labelled transitions take the form φ ⊲ P –a→ φ′ ⊲ P′, where φ represents the context's knowledge prior to the transition, and φ′ is the new knowledge resulting from P performing the action a. Building on this labelled transition system, we provide quantification-free characterizations of different instantiations of (2), specifically when ≈𝒫 is instantiated to trace equivalence and to weak bisimulation equivalence. This allows us to apply our technique to the analysis of safety as well as liveness security properties.
We demonstrate the latter with an example of a fair exchange protocol. The rest of the presentation proceeds as follows: Section 2 briefly reviews the process algebra CryptoSPA, Section 3 introduces context-sensitive labelled transition systems, Section 4 gives characterizations of various security properties, Section 5 illustrates the example, and Section 6 draws some conclusions. All the results presented in this paper are described and proved in [7].

2 The CryptoSPA Language

The Cryptographic Security Process Algebra (CryptoSPA, for short) [9] is an extension of SPA [8] with cryptographic primitives and constructs for value passing. The syntax is based on the following elements: a set M of basic messages and a set K of encryption keys with a function ·−1 : K → K such that (k−1)−1 = k; a set 𝓜, ranged over by m, of all messages, defined as the least set containing M ∪ K and closed under the deduction rules in Table 1 (more on this below); a set 𝒞 of channels, partitioned into two sets H and L of high and low channels, respectively; a function Msg which maps every channel c into the set of messages that can be sent and received on c, and such that Msg(c) = Msg(c̄); a set 𝓛 = {c(m) | m ∈ Msg(c)} ∪ {c̄m | m ∈ Msg(c)} of visible actions, and the set Act = 𝓛 ∪ {τ} of all actions, ranged over by a, where τ is the internal (invisible) action; a function chan(a) which returns c if a is either c(m) or c̄m, and the special channel void when a = τ; a set Const of constants. By an abuse of notation, we write c(m), c̄m ∈ H whenever c ∈ H, and similarly for L.

The syntax of CryptoSPA terms (or processes) is defined as follows:

P ::= 0 | c(x).P | c̄m.P | τ.P | P + P | P||P | P \ C | P[f]
    | A(m1, …, mn) | [m = m′]P; P | [⟨m1 … mn⟩ ⊢rule x]P; P

Both c(x).P and [⟨m1 … mn⟩ ⊢rule x]P; P′ bind the variable x in P. Constants are defined as A(x1, …, xn) =def P, where P is a CryptoSPA process that may contain no free variables except x1, …, xn, which must be pairwise distinct.

Table 1. Inference system for message manipulation, where m, m′ ∈ 𝓜 and k, k−1 ∈ K

(⊢pair)  from m and m′, derive (m, m′)
(⊢fst)   from (m, m′), derive m
(⊢snd)   from (m, m′), derive m′
(⊢enc)   from m and k, derive {m}k
(⊢dec)   from {m}k and k−1, derive m

Intuitively, 0 is the empty process; c(x).P waits for input m on channel c and then behaves as P[m/x] (i.e., P with all occurrences of x substituted by m); c̄m.P outputs m on channel c and continues as P; P1 + P2 represents the nondeterministic choice between P1 and P2; P1||P2 is parallel composition, where executions are interleaved, possibly synchronized on complementary input/output actions, producing an internal action τ; P \ C is like P but prevented from sending and receiving messages on channels in C ⊆ 𝒞; in P[f] every channel c is relabelled into f(c); A(m1, …, mn) behaves like the respective definition where the variables x1, …, xn are substituted with the messages m1, …, mn; [m = m′]P1; P2 behaves as P1 if m = m′ and as P2 otherwise; finally, [⟨m1 … mn⟩ ⊢rule x]P1; P2 tries to deduce an information z from the tuple ⟨m1 … mn⟩ through rule ⊢rule; if it succeeds it behaves as P1[z/x], otherwise it behaves as P2.
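The deduction relation of Table 1 can be prototyped as follows. This is a hedged sketch of ours, not the authors' implementation: it uses the standard two-phase analysis/synthesis approach and, for simplicity, ignores decryption with keys that themselves need to be synthesized; the key names are hypothetical.

# Messages are atoms (strings) or tagged tuples ("pair", m1, m2) and
# ("enc", m, k); inv maps each key to its inverse key.
def analyze(phi, inv):
    """Close phi under the projection and decryption rules (fst, snd, dec)."""
    known = set(phi)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            if isinstance(m, tuple) and m[0] == "pair":
                new = {m[1], m[2]} - known
            elif (isinstance(m, tuple) and m[0] == "enc"
                  and inv.get(m[2]) in known):
                new = {m[1]} - known
            else:
                new = set()
            if new:
                known |= new
                changed = True
    return known

def deducible(phi, m, inv):
    """Check phi |- m: analyze phi, then synthesize m by pair/enc."""
    known = analyze(phi, inv)
    def synth(m):
        if m in known:
            return True
        if isinstance(m, tuple) and m[0] in ("pair", "enc"):
            return synth(m[1]) and synth(m[2])
        return False
    return synth(m)

inv = {"k": "k", "skA": "pkA", "pkA": "skA"}        # hypothetical keys
phi = [("enc", "nA", "k"), "k", "pkA"]              # attacker knowledge
print(deducible(phi, ("pair", "nA", "pkA"), inv))   # True: dec, then pair
print(deducible(phi, "skA", inv))                   # False: keys are not guessed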
In formalizing the security properties of interest, we will find it convenient to rely on (an equivalent of) the hiding operator of CSP, noted P/C with P a process and C ⊆ 𝒞, which turns all actions using channels in C into internal τ's. This operator can be defined in CryptoSPA as follows: given any set C ⊆ 𝒞, P/C =def P[fC], where fC(a) = a if chan(a) ∉ C and fC(a) = τ if chan(a) ∈ C.

We denote by 𝓔 the set of all CryptoSPA processes and by 𝓔H the set of all high-level processes, i.e., those constructed only using actions in H ∪ {τ}.

The operational semantics of CryptoSPA is defined in terms of the labelled transition system (LTS) in Table 2. Most of the transitions are standard and simply formalize the intuitive semantics of the process constructs discussed above. The two rules (⊢i) connect the deduction system in Table 1 with the transition system. The former system is used to model the ability of the attacker to deduce new information from its initial knowledge. Note, in particular, that secret keys, not initially known to the attacker, may not be deduced (hence we disregard cryptographic attacks based on guessing secret keys). We say that m is deducible from a set of messages φ (and write φ ⊢ m) if m can be obtained from φ by applying the inference rules in Table 1. As in [9], we assume that ⊢ is decidable.

Table 2. The operational rules for CryptoSPA

(input)     m ∈ Msg(c)  ⟹  c(x).P –c(m)→ P[m/x]
(output)    m ∈ Msg(c)  ⟹  c̄m.P –c̄m→ P
(tau)       τ.P –τ→ P
(+1)        P1 –a→ P1′  ⟹  P1 + P2 –a→ P1′
(||1)       P1 –a→ P1′  ⟹  P1||P2 –a→ P1′||P2
(||2)       P1 –c(m)→ P1′ and P2 –c̄m→ P2′  ⟹  P1||P2 –τ→ P1′||P2′
(=1)        m = m′ and P1 –a→ P1′  ⟹  [m = m′]P1; P2 –a→ P1′
(=2)        m ≠ m′ and P2 –a→ P2′  ⟹  [m = m′]P1; P2 –a→ P2′
([f])       P –a→ P′  ⟹  P[f] –f(a)→ P′[f]
(\C)        P –a→ P′ and chan(a) ∉ C  ⟹  P \ C –a→ P′ \ C
(constant)  P[m1/x1, …, mn/xn] –a→ P′ and A(x1, …, xn) =def P  ⟹  A(m1, …, mn) –a→ P′
(⊢1)        ⟨m1, …, mn⟩ ⊢rule m and P1[m/x] –a→ P1′  ⟹  [⟨m1, …, mn⟩ ⊢rule x]P1; P2 –a→ P1′
(⊢2)        ∄m : ⟨m1, …, mn⟩ ⊢rule m and P2 –a→ P2′  ⟹  [⟨m1, …, mn⟩ ⊢rule x]P1; P2 –a→ P2′

(the symmetric rules for + and || are omitted)

We complement the definition of the semantics with a corresponding notion of observation equivalence, which is used to establish equalities among processes and is based on the idea that two systems have the same semantics if and only if they cannot be distinguished by an external observer. The equivalences that are relevant to the present discussion are trace equivalence, noted ≈T, and weak bisimulation, noted ≈B (see [13]). In the next section, we introduce coarser versions of these equivalences, noted ≈φT and ≈φB, which distinguish processes in contexts with initial knowledge φ. These context-sensitive notions of equivalence are built on a refined version of the labelled transition system, which we introduce next.

3 Context-Sensitive Equivalences

Following [4], we characterize the behavior of processes in terms of "context-sensitive labelled transitions", where each process transition depends on the knowledge of the context. To motivate, consider a process P that produces and sends a message {m}k, reaching the state P′, and assume that m and k are known to P but not to the context. Under these hypotheses, the context will never be able to replay the message m to P′ (or any continuation thereof). Hence, if P′ waits for further input, we can safely leave any input transition involving m out of the LTS, as P′ will never receive m from the context.

The states of the new labelled transition system are configurations of the form φ ⊲ P, where P is a process and φ is the current knowledge of the context, represented through a set of messages.
Table 3. Inference rules for the ELTS

(output)  P –c̄m→ P′ and c̄m ∈ H  ⟹  φ ⊲ P –c̄m→ φ ∪ {m} ⊲ P′
(input)   P –c(m)→ P′, c(m) ∈ H and φ ⊢ m  ⟹  φ ⊲ P –c(m)→ φ ⊲ P′
(tau)     P –τ→ P′  ⟹  φ ⊲ P –τ→ φ ⊲ P′
(low)     P –a→ P′ and a ∈ L  ⟹  φ ⊲ P –a→ φ ⊲ P′

The transitions represent interactions between the process and the context, and now take the form

φ ⊲ P –a→ φ′ ⊲ P′,

where a is the action executed by the process P and φ′ is the new knowledge at the disposal of the context for further interactions with P′. The transitions between configurations, given in Table 3, are defined rather directly from the corresponding transitions between processes. In rule (output), the context's knowledge is augmented with the information sent by the process. Dually, rule (input) assumes that the context performs an output action synchronizing with the input of the process. The message sent by the context must be completely deducible from the context's knowledge φ; otherwise the corresponding transition is impossible: this is how the new transitions provide an explicit account of the attacker's knowledge. The remaining rules, (tau) and (low), state that internal actions of the protocol and low actions do not contribute to the knowledge of the context in any way.

In the rest of the presentation we refer to the transition rules in Table 3 collectively as the enriched LTS (ELTS, for short). Also, we assume that the initial knowledge of the context includes only public information and the context's private names. This is a reasonable condition, since it simply corresponds to assuming that each protocol run starts with fresh keys and nonces, a condition that is readily guaranteed by relying on time-dependent elements (e.g., time-stamps) and assuming that session keys are distinct for distinct executions.
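The rules of Table 3 amount to a small state transformer over process transitions. The sketch below (ours, not from the paper) makes this explicit; proc_transitions is an assumed finite relation tagged with the kind of action, and deducible is any decision procedure for φ ⊢ m, e.g., the two-phase check sketched in Section 2.

# A minimal sketch of the ELTS rules: triples (kind, m, P') with kind in
# {"out_high", "in_high", "tau", "low"}; deducible(phi, m) -> bool.
def elts_steps(phi, proc_transitions, deducible):
    """Lift process transitions of P to configuration transitions of phi |> P."""
    steps = []
    for kind, m, succ in proc_transitions:
        if kind == "out_high":                  # (output): the context learns m
            steps.append((kind, m, frozenset(phi | {m}), succ))
        elif kind == "in_high":                 # (input): the context supplies m
            if deducible(phi, m):
                steps.append((kind, m, frozenset(phi), succ))
        else:                                   # (tau), (low): phi is unchanged
            steps.append((kind, m, frozenset(phi), succ))
    return steps

phi = frozenset({"k"})
moves = [("out_high", "nA", "P1"), ("in_high", "nB", "P2"), ("low", "done", "P3")]
print(elts_steps(phi, moves, lambda phi, m: m in phi))
# the in_high move is pruned: nB is not deducible from {"k"}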
The notions of trace and weak bisimulation equivalence extend in the expected way from processes to ELTS configurations, as we discuss below.

We write φ ⊲ P =a⇒ φ′ ⊲ P′ to denote the sequence of transitions φ ⊲ P (–τ→)∗ φ ⊲ P1 –a→ φ′ ⊲ P2 (–τ→)∗ φ′ ⊲ P′, where, as expected, φ = φ′ if a is an input, low or silent action. Furthermore, let γ = a1 … an ∈ 𝓛∗ be a sequence of (non-silent) actions; then φ ⊲ P =γ⇒ φ′ ⊲ P′ if there are P1, P2, …, Pn−1 ∈ 𝓔 and states φ1, φ2, …, φn−1 such that φ ⊲ P =a1⇒ φ1 ⊲ P1 =a2⇒ … =a_{n−1}⇒ φn−1 ⊲ Pn−1 =an⇒ φ′ ⊲ P′. As usual, the notation φ ⊲ P =â⇒ φ′ ⊲ P′ stands for φ ⊲ P =a⇒ φ′ ⊲ P′ if a ∈ 𝓛, and for φ ⊲ P (–τ→)∗ φ′ ⊲ P′ if a = τ.

Definition 1 (Trace equivalence over configurations).
– T(φ ⊲ P) = {γ ∈ 𝓛∗ | ∃φ′, P′ : φ ⊲ P =γ⇒ φ′ ⊲ P′} is the set of traces associated with the configuration φ ⊲ P.
– Two configurations φP ⊲ P and φQ ⊲ Q are trace equivalent, denoted by φP ⊲ P ≈cT φQ ⊲ Q, if T(φP ⊲ P) = T(φQ ⊲ Q).

Based on trace equivalence over configurations we can then define a corresponding notion of equivalence for processes executing in an environment with initial knowledge φ. Formally, P ≈φT Q whenever φ ⊲ P ≈cT φ ⊲ Q.

Definition 2 (Weak bisimulation over configurations).
– A binary relation R over configurations is a weak bisimulation if, assuming (φP ⊲ P, φQ ⊲ Q) ∈ R, one has, for all a ∈ Act:
  • if φP ⊲ P –a→ φP′ ⊲ P′, then there exists a configuration φQ′ ⊲ Q′ such that φQ ⊲ Q =â⇒ φQ′ ⊲ Q′ and (φP′ ⊲ P′, φQ′ ⊲ Q′) ∈ R;
  • if φQ ⊲ Q –a→ φQ′ ⊲ Q′, then there exists a configuration φP′ ⊲ P′ such that φP ⊲ P =â⇒ φP′ ⊲ P′ and (φP′ ⊲ P′, φQ′ ⊲ Q′) ∈ R.
– Two configurations φP ⊲ P and φQ ⊲ Q are weakly bisimilar, denoted by φP ⊲ P ≈cB φQ ⊲ Q, if there exists a weak bisimulation containing the pair (φP ⊲ P, φQ ⊲ Q).

It is not difficult to prove that the relation ≈cB is the largest weak bisimulation over configurations, and that it is an equivalence relation. As for trace equivalence, we can recover an equivalence relation on processes executing in a context with initial knowledge φ by defining P ≈φB Q if and only if φ ⊲ P ≈cB φ ⊲ Q.
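On finite state spaces, Definition 2 can be checked by a greatest-fixpoint computation. The following naive sketch (ours; quadratic in state pairs, with no optimization) applies verbatim to ELTS configurations by taking the states to be pairs φ ⊲ P.

# A minimal sketch of weak bisimulation checking on a finite LTS.
# trans maps each state to a set of (action, successor) pairs; "tau"
# is the silent action.
def tau_closure(trans, s):
    """All states reachable from s by zero or more tau steps."""
    seen, stack = {s}, [s]
    while stack:
        for a, t in trans.get(stack.pop(), set()):
            if a == "tau" and t not in seen:
                seen.add(t); stack.append(t)
    return seen

def weak_steps(trans, s, a):
    """All t with s =a=> t (tau* a tau*), or s =tau^=> t when a == 'tau'."""
    pre = tau_closure(trans, s)
    if a == "tau":
        return pre
    mid = {t for p in pre for (b, t) in trans.get(p, set()) if b == a}
    return {u for m in mid for u in tau_closure(trans, m)}

def weakly_bisimilar(trans, p, q):
    """Greatest-fixpoint refinement of the full relation on states."""
    states = list(trans)
    rel = {(s, t) for s in states for t in states}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            ok = all(any((s2, t2) in rel for t2 in weak_steps(trans, t, a))
                     for (a, s2) in trans.get(s, set())) and \
                 all(any((s2, t2) in rel for s2 in weak_steps(trans, s, a))
                     for (a, t2) in trans.get(t, set()))
            if not ok:
                rel.discard((s, t)); changed = True
    return (p, q) in rel

trans = {0: {("tau", 1)}, 1: {("a", 2)}, 2: set(), 3: {("a", 4)}, 4: set()}
print(weakly_bisimilar(trans, 0, 3))   # True: the tau step is unobservable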
4 Non-interference Proof Techniques

We show that the new definitions of behavioral equivalence may be used to construct effective proof methods for various security properties within the general schema proposed in [9]. In particular, we show that making our equivalences dependent on the initial knowledge of the attacker provides us with security characterizations that are stated independently of the attacker itself.

The first property we study, known as NDC, results from instantiating ≈𝒫 in (2) (see the introduction) to the trace equivalence relation ≈T. As discussed in [9], NDC is a generalization of the classical idea of non-interference to nondeterministic systems, and it can be used for analyzing different security properties of cryptographic protocols such as secrecy, authentication and integrity. NDC can readily be extended to account for the context's knowledge as follows:

Definition 3 (NDCφ). P ∈ NDCφ if P \ H ≈φT (P||Π) \ H for all Π ∈ 𝓔Hφ.

A process P is NDCφ if for every high-level process Π with initial knowledge φ a low-level user cannot distinguish P from P||Π, i.e., if Π cannot interfere with the low-level execution of the process P.

Focardi et al. show in [9] that when φ is finite it is possible to find a most general intruder Topφ, so that verifying NDCφ reduces to checking P \ H ≈T (P||Topφ) \ H. Here we provide an alternative¹, quantification-free characterization of NDCφ. Let P/H denote the process resulting from P by replacing all high-level actions with the silent action τ (cf. Section 2).

¹ An analogous result has recently been presented by Gorrieri et al. in [11] for a timed extension of CryptoSPA. We discuss the relationship between our result and theirs in Section 6.

Theorem 1 (NDCφ). P ∈ NDCφ if and only if P \ H ≈φT P/H.

More interestingly, our approach allows us to find a sound proof method for the BNDCφ property, which results from instantiating (2) in the introduction with the equivalence ≈B:

Definition 4 (BNDCφ). P ∈ BNDCφ if P \ H ≈φB (P||Π) \ H for all Π ∈ 𝓔Hφ.

As for NDCφ, the definition falls short of providing a proof method, due to the universal quantification over Π. Here, however, the problem may not be circumvented by resorting to a hardest attacker, as the latter does not exist, there being no (known) preorder on processes corresponding to weak bisimilarity. What we propose here is a partial solution that relies on a coinductive (and quantification-free) characterization of a sound approximation of BNDCφ, based on the following persistent version of BNDCφ.

Definition 5 (P BNDCφ). P ∈ P BNDCφ if P′ ∈ BNDCφ for all P′ reachable from P.

P BNDCφ is the context-sensitive version of the P BNDC property studied in [10]. Following the technique in [10], one can show that P BNDCφ is a sound approximation of BNDCφ which admits elegant quantification-free characterizations. Specifically, like P BNDC, P BNDCφ can be characterized both in terms of a suitable weak bisimulation relation "up to high-level actions", noted ≈φ\H, and in terms of unwinding conditions, as discussed next. We first need the following definition.

Definition 6. Let a ∈ Act. The transition relation =â⇒\H is defined by

=â⇒\H = =â⇒              if a ∉ H,
=â⇒\H = =â⇒ or =τ̂⇒      if a ∈ H.

That is, =â⇒\H is defined as =â⇒, except that it may treat H-level actions as silent actions.

Now, weak bisimulations up to H over configurations are defined as weak bisimulations over configurations, except that they allow a high action to be matched by zero or more high actions. Formally:

Definition 7 (Weak bisimulation up to H over configurations).
– A binary relation R over configurations is a weak bisimulation up to H if (φP ⊲ P, φQ ⊲ Q) ∈ R implies that, for all a ∈ Act:
  • if φP ⊲ P –a→ φP′ ⊲ P′, then there exists a configuration φQ′ ⊲ Q′ such that φQ ⊲ Q =â⇒\H φQ′ ⊲ Q′ and (φP′ ⊲ P′, φQ′ ⊲ Q′) ∈ R;
  • if φQ ⊲ Q –a→ φQ′ ⊲ Q′, then there exists a configuration φP′ ⊲ P′ such that φP ⊲ P =â⇒\H φP′ ⊲ P′ and (φP′ ⊲ P′, φQ′ ⊲ Q′) ∈ R.
– Two configurations φP ⊲ P and φQ ⊲ Q are weakly bisimilar up to H, denoted by φP ⊲ P ≈c\H φQ ⊲ Q, if there exists a weak bisimulation up to H containing the pair (φP ⊲ P, φQ ⊲ Q).

Again, we can prove that the relation ≈c\H is the largest weak bisimulation up to H over configurations and that it is an equivalence relation. Also, as for the previous relations over configurations, we can recover an associated relation over processes in a context with initial knowledge φ by defining P ≈φ\H Q if and only if φ ⊲ P ≈c\H φ ⊲ Q.

We can finally state the two characterizations of P BNDCφ. The first is expressed in terms of ≈φ\H (with no quantification over reachable states or over high-level malicious processes).

Theorem 2 (P BNDCφ 1). P ∈ P BNDCφ if and only if P \ H ≈φ\H P.

The second characterization of P BNDCφ is given in terms of unwinding conditions, which demand properties of individual actions. Unwinding conditions aim at "distilling" the local effect of performing high-level actions, and are useful for defining both proof systems (see, e.g., [6]) and refinement operators that preserve security properties, as done in [12].

Theorem 3 (P BNDCφ 2). P ∈ P BNDCφ if and only if for all φi ⊲ Pi reachable from φ ⊲ P, if φi ⊲ Pi –h→ φi′ ⊲ Pi′ for h ∈ H, then φi ⊲ Pi =τ̂⇒ φi′′ ⊲ Pi′′ such that φi′ ⊲ Pi′ \ H ≈cB φi′′ ⊲ Pi′′ \ H.

Both characterizations can be used for verifying cryptographic protocols. A concrete example, a fair exchange protocol, is illustrated in the next section.

5 An Example: The ASW Fair Exchange Protocol

The ASW contract signing protocol [2] is used in electronic commerce transactions to enable two parties, named O (originator) and R (responder), to obtain each other's commitment on a previously agreed contractual text M. To deal with unfair situations, each party may appeal to a trusted third party T, which can decide, on the basis of the data it has received, whether to issue a replacement contract or an abort token. If both O and R are honest, and they receive the messages sent to them, then they both obtain a valid contract upon completion of the protocol.

We say that the protocol guarantees fairness to O (dually, to R) on message M if, whatever malicious R (O) is considered, if R (O) gets evidence that O (R) has originated M, then O (R) will also eventually obtain the evidence that R (O) has received M.
Notice that this is a branching-time liveness property: we require that something should happen whenever O (resp. R) gets his evidence, namely that R (resp. O) should also get his evidence, along all execution traces of the protocol (cf. [9] for a thorough discussion of this point).

The protocol consists of three independent sub-protocols: exchange, abort and resolve. Here we focus on the main exchange sub-protocol, which is specified by the following four messages, where M is the contractual text on which we assume the two parties previously agreed, while SKO and SKR (PKO and PKR) are the private (public) keys of O and R, respectively:

O → R : me1 = {M, h(NO)}SKO
R → O : me2 = {{M, h(NO)}SKO, h(NR)}SKR
O → R : me3 = NO
R → O : me4 = NR

In the first step, O commits to the contractual text by hashing a random number NO and signing a message that contains both h(NO) and M. While O does not actually reveal the value of its contract authenticator NO to the recipient of message me1, O is committed to it. As in a standard commitment protocol, we assume that it is not computationally feasible for O to find a different number NO′ such that h(NO′) = h(NO). In the second step, R replies with its own commitment. Finally, O and R exchange the actual contract authenticators.

We specify the sub-protocol in CryptoSPA (see Fig. 1), introducing low-level actions to verify the correctness of protocol executions. We say that an execution is correct if we observe the sequence of low-level actions received me1, received me2, received NO, received NR, in this order.

O(M, NO) =def [⟨NO, kh⟩ ⊢enc n] [⟨(M, n), SKO⟩ ⊢enc p] c̄p. c(v).
            [⟨v, PKR⟩ ⊢dec i] [⟨i⟩ ⊢fst p′] [⟨i⟩ ⊢snd r′] [p′ = p] received v.
            c̄NO. c(j). [⟨j, kh⟩ ⊢enc r′′] [r′′ = r′] received j

R(M, NR) =def c(q). [⟨q, PKO⟩ ⊢dec s] [⟨s⟩ ⊢fst m] [⟨s⟩ ⊢snd n′] [m = M] received q.
            [⟨NR, kh⟩ ⊢enc r] [⟨(q, r), SKR⟩ ⊢enc t] c̄t. c(u).
            [⟨u, kh⟩ ⊢enc n′′] [n′′ = n′] received u. c̄NR

P =def O(M, NO) || R(M, NR)

Fig. 1. The CryptoSPA specification of the exchange sub-protocol of ASW

We can show that the protocol does not satisfy property P BNDCφ when φ consists of public information and the private data of a possible attacker. This can easily be checked by applying Theorem 3. Indeed, just by observing the protocol ELTS, one can immediately notice that there exists a configuration transition φ ⊲ P –a→ φ′ ⊲ P′, where a = c̄me1, but there are no φ′′ and P′′ such that φ ⊲ P =τ̂⇒ φ′′ ⊲ P′′ and φ′ ⊲ P′ \ H ≈cB φ′′ ⊲ P′′ \ H. In fact, it is easy to prove that φ′ ⊲ P′ \ H ≈cB 0 for all φ′, while φ′′ ⊲ P′′ \ H ≉cB 0 for all P′′ and φ′′ such that φ ⊲ P =τ̂⇒ φ′′ ⊲ P′′. However, the fact that in this case the ASW protocol does not satisfy P BNDCφ does not represent a real attack on the protocol, since such a situation is resolved by invoking the trusted party T.

More interestingly, we can analyze the protocol under the assumption that one of the participants is corrupt. This can be done by augmenting the knowledge φ with the corrupt party's private information, such as its private key and its contract authenticator. We can show that the protocol does not satisfy P BNDCφ when O is corrupt, finding the attack already described in [14].
6 Conclusions and Related Work

We have studied context-sensitive equivalence relations, and related proof techniques, within the process algebra CryptoSPA for the analysis of protocols. Our approach builds on context-sensitive labelled transition systems, whose transitions are constrained by the knowledge of the environment. We showed that our technique can be used to analyze both safety and liveness properties of cryptographic protocols.

In a recent paper, Gorrieri et al. [11] prove results related to ours for a real-time extension of CryptoSPA. In particular, they prove an equivalent of Theorem 1; however, while the results are equivalent, the underlying proof techniques are not. More precisely, instead of using context-sensitive LTSs, [11] introduces a special hiding operator /φ and proves that P ∈ NDCφ if and only if P \ H ≈T P /φ H. The process P /φ H corresponds exactly to our configuration φ ⊲ P/H, in that the corresponding LTSs are isomorphic. However, the approach of [11] is still restricted to the class of observation equivalences that are behavioral preorders on processes, and thus it does not extend to bisimulations.

As we pointed out at the outset, our approach is inspired by Boreale, De Nicola and Pugliese's work [4] on characterizing may testing and barbed congruence in the spi calculus by means of trace and bisimulation equivalences built on top of context-sensitive LTSs. Based on the same technique, symbolic semantics and compositional proofs have recently been studied in [3, 5], providing effective tools for the verification of cryptographic protocols. Symbolic description methods could be exploited to deal with the state-explosion problems that are intrinsic in the construction of context-sensitive labelled transition systems. Future plans include work in that direction.

References

1. M. Abadi. Security Protocols and Specifications. In W. Thomas, editor, Proc. of the Second International Conference on Foundations of Software Science and Computation Structures (FoSSaCS'99), volume 1578 of LNCS, pages 1–13. Springer-Verlag, 1999.
2. N. Asokan, V. Shoup, and M. Waidner. Asynchronous Protocols for Optimistic Fair Exchange. In Proc. of the IEEE Symposium on Research in Security and Privacy, pages 86–99. IEEE Computer Society Press, 1998.
3. M. Boreale and M. G. Buscemi. A Framework for the Analysis of Security Protocols. In Proc. of the 13th International Conference on Concurrency Theory (CONCUR'02), volume 2421 of LNCS, pages 483–498. Springer-Verlag, 2002.
4. M. Boreale, R. De Nicola, and R. Pugliese. Proof Techniques for Cryptographic Processes. In Proc. of the 14th IEEE Symposium on Logic in Computer Science (LICS'99), pages 157–166. IEEE Computer Society Press, 1999.
5. M. Boreale and D. Gorla. On Compositional Reasoning in the spi-calculus. In Proc. of the 5th International Conference on Foundations of Software Science and Computation Structures (FoSSaCS'02), volume 2303 of LNCS, pages 67–81. Springer-Verlag, 2002.
6. A. Bossi, R. Focardi, C. Piazza, and S. Rossi. A Proof System for Information Flow Security. In M. Leuschel, editor, Proc. of the Int. Workshop on Logic Based Program Development and Transformation, LNCS. Springer-Verlag, 2002. To appear.
7. A. Ceccato. Analisi di protocolli crittografici in contesti ostili. Laurea thesis, Università Ca' Foscari di Venezia, 2001.
8. R. Focardi and R. Gorrieri. Classification of Security Properties (Part I: Information Flow). In R. Focardi and R. Gorrieri, editors, Foundations of Security Analysis and Design, volume 2171 of LNCS. Springer-Verlag, 2001.
9. R. Focardi, R. Gorrieri, and F. Martinelli. Non Interference for the Analysis of Cryptographic Protocols. In U. Montanari, J.D.P. Rolim, and E. Welzl, editors, Proc. of the Int. Colloquium on Automata, Languages and Programming (ICALP'00), volume 1853 of LNCS, pages 744–755. Springer-Verlag, 2000.
10. R. Focardi and S. Rossi. Information Flow Security in Dynamic Contexts. In Proc. of the 15th IEEE Computer Security Foundations Workshop, pages 307–319. IEEE Computer Society Press, 2002.
11. R. Gorrieri, E. Locatelli, and F. Martinelli. A Simple Language for Real-time Cryptographic Protocol Analysis. In Proc. of the 12th European Symposium on Programming Languages and Systems, LNCS. Springer-Verlag, 2003. To appear.
12. H. Mantel. Unwinding Possibilistic Security Properties. In Proc. of the European Symposium on Research in Computer Security, volume 2895 of LNCS, pages 238–254. Springer-Verlag, 2000.
13. R. Milner. Communication and Concurrency. Prentice-Hall, 1989.
14. V. Shmatikov and J. C. Mitchell. Analysis of a Fair Exchange Protocol. In Proc. of the 7th Annual Symposium on Network and Distributed System Security (NDSS 2000), pages 119–128. Internet Society, 2000.

On the Exponentiation of Languages

Werner Kuich¹ and Klaus W. Wagner²

¹ Institut für Algebra und Computermathematik, Technische Universität Wien
Wiedner Hauptstraße 8, A-1040 Wien
kuich@tuwien.ac.at
² Institut für Informatik, Bayerische Julius-Maximilians-Universität Würzburg
Am Hubland, D-97074 Würzburg, Germany
wagner@informatik.uni-wuerzburg.de

Abstract. We characterize the exponentiation of languages in terms of other language operations: in the presence of some "weak" operations, exponentiation is exactly as powerful as complement and ε-free morphism. This characterization implies, among other things, that a semi-AFL is closed under complement iff it is closed under exponentiation. As an application we characterize the exponentiation closure of the context-free languages. Furthermore, P is closed under exponentiation iff P = NP, and NP is closed under exponentiation iff NP = co-NP.

1 Introduction

Kuich, Sauer and Urbanek [4] defined addition + and multiplication × of languages (different from concatenation) in such a way that equivalence classes of formal languages, defined with the help of length preserving morphisms, form a lattice. They defined lattice families of formal languages and showed that, if F is a lattice family of languages, then LF is a lattice with a least and a largest element. Here LF is a set of equivalence classes defined by a family F of languages.

Moreover, Kuich, Sauer and Urbanek [4] defined exponentiation of formal languages as a new operation. They then defined stable families of languages (essentially, these are lattice families of languages closed under exponentiation) and showed that, if F is a stable family of languages, then LF is a Heyting algebra with a largest element. Moreover, they proved that stable families F of languages can be used to characterize the join and meet irreducibility of LF (see Theorems 4.2 and 4.3 of Kuich, Sauer, Urbanek [4]).

From the point of view of lattice theory it is, by the results quoted above, very interesting to find families of languages that are lattice families or stable families. The paper consists of this and four more sections.
In Section 2 we introduce the language operations and language families (formal language classes as well as complexity classes) considered in this paper, and we cite from the literature the present knowledge on the closure properties of these classes.

In Section 3 we examine which "classical" language operations are needed to generate the operations of addition, multiplication and exponentiation. As corollaries we obtain lists of classes which are closed under these operations and which are lattice families or stable families, respectively. It turns out that the regular languages, the context-sensitive languages, the rudimentary languages, the class PH of the polynomial-time hierarchy, and the complexity classes PSPACE, DSPACE(s) for s(n) ≥ n, and NSPACE(s) for space-constructible s(n) ≥ n are stable families and hence closed under exponentiation.

In Section 4 we prove that, for every family F of languages that contains all regular languages, the closure of F under union, inverse morphism and exponentiation coincides with the closure of F under union, inverse morphism, ε-free morphism and complement. Since union and inverse morphism are weak operations which only smooth given language classes, this result can informally be stated as follows: exponentiation is just as powerful as ε-free morphism and complement together. As one of the possible consequences we obtain: a semi-AFL is closed under exponentiation iff it is closed under complement.

In Section 5 we apply the results of Section 4 to various classes of languages which are not closed, or not known to be closed, under exponentiation. Kuich, Sauer and Urbanek [4] proved that the class CFL of context-free languages is not closed under exponentiation. We show that the closure of CFL under exponentiation and the weak operations of union and inverse morphism coincides with Smullyan's class RUD of rudimentary languages. Furthermore, we prove that the family of languages P (languages accepted by a deterministic Turing machine in polynomial time) is closed under exponentiation iff P = NP, and that the family of languages NP (languages accepted by a nondeterministic Turing machine in polynomial time) is closed under exponentiation iff NP = co-NP.

It is assumed that the reader has a basic knowledge of lattice theory (see Balbes, Dwinger [2]), formal language and automata theory (see Ginsburg [3]), and complexity theory (see Balcázar, Díaz, Gabarró [1] and Wagner, Wechsung [7]).

2 Families of Languages and Their Closure Properties

In this paper we consider several classical operations on languages. We use the symbol εh (lh, h−1, lh−1, ∩REG, and −, resp.) for the operation of ε-free morphism (length preserving morphism, inverse morphism, inverse length preserving morphism, intersection with regular languages, and complement, resp.). Given operations O1, O2, …, Or on languages, we introduce the closure operator ΓO1,O2,…,Or on families of languages as follows: for a family F of languages, ΓO1,O2,…,Or(F) is the closure of F under the operations O1, O2, …, Or, i.e., the least family of languages containing F and closed under the operations O1, O2, …, Or.

Let REG, CFL, and CSL be the classes of regular, context-free, and context-sensitive languages, respectively. The class LOGCFL consists of the languages which are logarithmic-space many-one reducible to context-free languages.
The class RUD 378 W. Kuich and K.W. Wagner of rudimentary languages is the smallest class of languages that contains CFL and is closed under ε-free morphism and complement, i.e., RUD = Γεh,− (CFL). The classes P and NP consist of all languages which can be accepted in polynomial time by deterministic and nondeterministic, resp., Turing machines. Let co-NP be the class of all languages whose complement is in NP. With Q we denote the classes of languages which can be accepted in linear time by nondeterministic Turing machines. The classes L and NL consist of all languages which can be accepted in logarithmic space by deterministic and nondeterministic, resp., Turing machines. The class PSPACE consists of all languages which can be accepted in polynomial space by deterministic Turing machines. Let Σkp and Πkp , k ≥ 1, be the classes of the polynomial-time hierarchy, i.e., p p Σ1 = N P , Σk+1 is the class of all languages which are nondeterministically polynomial-time Turing-reducible to languages from Σkp , and Πkp is the class of all languages whose complement is in Σkp (k ≥ 1). Finally, PH is the union of all these classes Σkp and Πkp . Notice that PH ⊆ PSPACE. For a function t : N → N, the classes DTIME(t) and NTIME(t) consist of all languages which can be accepted in time t by deterministic and nondeterministic, resp., Turing machines. For a function s : N → N, the classes DSPACE(s) and NSPACE(s) consist of all languages which can be accepted in space s by deterministic and nondeterministic, resp., Turing machines. For exact definitions and more information about these classes see e.g. [1] and [7]. The following table shows the known closure properties of these classes (cf. [7]). Theorem 21 An entry + (-, ?, resp.) in the following table means that the class in this row is closed (not closed, not known to be closed, resp.) under the operation in this column. 3 Lattice Families and Stable Families of Languages In this section we introduce the operations of addition, multiplication and exponentiation of languages, and we see how they can be generated by “classical” operations on languages. Throughout this paper the symbol Σ (possibly provided with indices) denotes a finite subalphabet of some infinite alphabet Σ∞ of symbols. Let L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗ . Define L1 ≤ L2 if h(L1 ) ⊆ L2 for some length preserving morphism h : Σ1∗ → Σ2∗ and L1 ∼ L2 if L1 ≤ L2 and L2 ≤ L1 . Then ∼ is an equivalence relation. If L1 ∼ L′1 and L2 ∼ L′2 then L1 ≤ L2 iff L′1 ≤ L′2 . It follows that ≤ is a partial order relation on the ∼-equivalence classes. Let [L] be the ∼-equivalence class including the language L. Let L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗ . Define L1 ×L2 = {(a1 , b1 ) . . . (an , bn ) | a1 . . . an ∈ L1 , b1 . . . bn ∈ L2 } ⊆ (Σ1 × Σ2 )∗ , and let L1 + L2 be the disjoint union of L1 and L2 . That is the language defined as L1 ∪L2 given that Σ1 ∩Σ2 = ∅. If Σ1 ∩Σ2 = ∅ On the Exponentiation of Languages language classes REG CFL CSL LOGCFL RUD L NL P Q NP co-NP Σkp (k ≥ 1) Πkp (k ≥ 1) PH PSPACE DTIME(t) (t(n) ≥ n) NTIME(t) (t(n) ≥ n) DSPACE(s) (s(n) ≥ n) NSPACE(s) (s(n) ≥ n) operations ∪ ∩REG ∩ − εh lh−1 + + + + + + + + − − + + + + + + + + + + + + ? + + + + + + + + + + + ? + + + + + ? + + + + + ? + + + + ? + + + + + ? + + + + + ? ? + + + + ? + + + + + ? ? + + + + + + + + + + + + + + + + + ? + + + + ? + + + + + + + + + + + +3 + + 379 h−1 + + + + + + + + + + + + + + + +1 +1 +2 +2 The functions t and s are assumed to be increasing. 
+1 - Replace t with t(O(n)) +2 - Replace s with s(O(n)) +3 - Assume that s is space-constructible, i.e., the computation x → s(|x|) can be carried out in space s(|x|). then create the new alphabet Σ̄ = {ā | a ∈ Σ2 } such that Σ1 ∩ Σ̄ = ∅ and a copy L̄ ⊆ Σ̄ ∗ of L2 and take L1 + L2 = L1 ∪ L̄. It is easy to see that if L1 ∼ L3 and L2 ∼ L4 then L1 + L2 ∼ L3 + L4 and L1 × L2 ∼ L3 × L4 . It follows that the operations + and × lift consistently to ∼equivalence classes of languages. It is clear that multiplication × and addition + on ∼-equivalence classes are commutative and associative operations. We denote the set of ∼-equivalence classes of languages by L. If F is a family of languages ◦ then we denote LF = {[L] ∩ F | L ∈ F}. By 1 ∈ L we denote the ∼-equivalence class containing the language {a}∗ for some a ∈ Σ∞ and by ∅ ∈ L we denote the ∼-equivalence class containing the language ∅. A lattice P ; ≤, +, × is a partially ordered set in which for every two elements a, b ∈ P there exists a least upper bound, denoted by a + b, and a greatest lower bound, denoted by a × b. A family F of languages is called lattice family if F is closed under isomorphism, plus + and times ×, and contains ∅ and Σ ∗ for all finite Σ ⊂ Σ∞ . 380 W. Kuich and K.W. Wagner Theorem 31 (Kuich, Sauer, Urbanek [4]) L; ≤, +, × is a lattice with least ◦ element ∅ and largest element 1. If F is a lattice family of languages then ◦ LF ; ≤, +, × is a lattice with least element ∅ and largest element 1. Lemma 32 For all L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗ there exist length preserving morphisms H, H1 , H2 such that L1 + L2 = L1 ∪ H −1 (L2 ) and L1 × L2 = H1−1 (L1 ) ∩ H2−1 (L2 ) . Proof. (i) If Σ1 ∩ Σ2 = ∅ then H : Σ2∗ → Σ2∗ is the identity. If Σ1 ∩ Σ2 = ∅ then create the new alphabet Σ̄ = {ā | a ∈ Σ2 } and define H : Σ̄ ∗ → Σ2∗ by H(ā) = a, a ∈ Σ2 . (ii) Define Hi : (Σ1 × Σ2 )∗ → Σi∗ , i = 1, 2, by H1 ([a, b]) = a and H2 ([a, b]) = b, a ∈ Σ1 , b ∈ Σ2 . Then L1 × L2 = H1−1 (L1 ) ∩ H2−1 (L2 ). ⊓ ⊔ From this and the previous theorem we conclude the following theorem. Theorem 33 1. If F is a family of languages closed under union, intersection and inverse length preserving morphism then F is also closed under addition and multiplication. 2. If F is a family of languages that contains ∅ and Σ ∗ for all finite Σ ⊆ Σ∞ and that is closed under union, intersection, and inverse length preserving morphism then F is a lattice family. Corollary 34 The following families of languages are lattice families: (i) REG, CSL, LOGCFL, and RUD. (ii) L, NL, P, Q, NP, and PSPACE. (iii) Σkp , Πkp for k ≥ 1, and PH. (iv) DTIME(t) and NTIME(t) for t(n) ≥ n. (v) DSPACE(s) and NSPACE(s) for s(n) ≥ n. Proof. This is an immediate consequence of Theorem 21 ⊓ ⊔ Let Σ = {h | h : Σ1 → Σ2 } be the set of all functions h : Σ1 → Σ2 considered as an alphabet. This alphabet is denoted by Σ2Σ1 . For f = h1 . . . hn ∈ Σ n and w = a1 . . . am ∈ Σ1m define  h1 (a1 ) . . . hn (an ) if n = m f (w) = undefined if n = m. (and ε(ε) = ε if n = 0). For L1 ⊆ Σ1∗ , L2 ⊆ Σ2∗ define ∗ 1 LL 2 = {f ∈ Σ | f (w) ∈ L2 for all w ∈ L1 for which f (w) is defined} . 1 Observe that LL 2 depends on the sets Σ1 and Σ2 . On the Exponentiation of Languages 381 The notion of exponentiation lifts to ∼-equivalence classes of languages. Hence, for ∼-equivalence classes of languages L1 and L2 the class LL2 1 is independent of the alphabets. A lattice P ; ≤, +, × is called Heyting algebra if (i) for all a, b ∈ P there exists a greatest c ∈ P such that a × c ≤ b. This element c is denoted by ba . 
The notion of exponentiation lifts to ∼-equivalence classes of languages. Hence, for ∼-equivalence classes of languages L1 and L2, the class L2^L1 is independent of the alphabets.

A lattice ⟨P; ≤, +, ×⟩ is called a Heyting algebra if (i) for all a, b ∈ P there exists a greatest c ∈ P such that a × c ≤ b; this element c is denoted by b^a and is called the exponentiation of b by a; and (ii) there exists a least element 0 in P.

A family F of languages is stable if it is a lattice family and closed under exponentiation and intersection with regular languages.

Theorem 35 (Kuich, Sauer, Urbanek [4]). Let F be a stable family of languages. Then ⟨LF; ≤, +, ×⟩ is a Heyting algebra, where the class ∅ is the 0-element and 1̊ is the largest element.

Hence, for the equivalence classes of LF, where F is a stable family of languages, the computation rules given in Corollary 2.3 of Kuich, Sauer, Urbanek [4] are valid, e.g.,

L^{L1+L2} = L^{L1} × L^{L2},   (L^{L1})^{L2} = L^{L1×L2},   (L1 × L2)^L = L1^L × L2^L

for all L, L1, L2 ∈ LF.

For L ⊆ Σ∗ we define the complement of L by complΣ(L) = Σ∗ − L.

Lemma 36. For all L1 ⊆ Σ1∗ and L2 ⊆ Σ2∗ there exist length preserving morphisms H1, H2, H3 such that

L2^L1 = complΣ(H3(H1−1(L1) ∩ H2−1(complΣ2(L2)))),

where Σ = Σ2^Σ1.

Proof. Define the morphisms H1 : (Σ × Σ1)∗ → Σ1∗, H2 : (Σ × Σ1)∗ → Σ2∗ and H3 : (Σ × Σ1)∗ → Σ∗ by H1([h, a]) = a, H2([h, a]) = h(a) and H3([h, a]) = h for all h ∈ Σ and a ∈ Σ1. Then, for all h1, …, hn ∈ Σ, n ≥ 0,

h1 … hn ∈ complΣ(L2^L1)
⇔ ∃a1, …, an (a1 … an ∈ L1 ∧ h1(a1) … hn(an) ∈ complΣ2(L2))
⇔ ∃a1, …, an (H1([h1, a1]) … H1([hn, an]) ∈ L1 ∧ H2([h1, a1]) … H2([hn, an]) ∈ complΣ2(L2))
⇔ ∃a1, …, an ([h1, a1] … [hn, an] ∈ H1−1(L1) ∩ H2−1(complΣ2(L2)))
⇔ h1 … hn ∈ H3(H1−1(L1) ∩ H2−1(complΣ2(L2))).  ⊓⊔

From this and the previous theorem we conclude the following theorem.

Theorem 37.
1. If F is a family of languages closed under union, complement, inverse length preserving morphism and length preserving morphism, then F is also closed under exponentiation.
2. If F is a family of languages that contains ∅ and Σ∗ for all finite Σ ⊆ Σ∞ and that is closed under union, complement, inverse length preserving morphism, length preserving morphism and intersection with regular languages, then F is stable.
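Lemma 36 can be sanity-checked on bounded instances by computing both sides of the identity on exponent words of one fixed length n (coherent here, since all morphisms involved are length preserving). A sketch of ours, reusing the function-letter encoding from the previous sketch:

# A bounded check of Lemma 36: L2^L1 equals the complement, within
# Sigma^n, of H3(H1^{-1}(L1) /\ H2^{-1}(compl(L2))), where
# H1([h,a]) = a, H2([h,a]) = h(a), H3([h,a]) = h.
from itertools import product

def check_lemma36(L1, L2, sigma1, sigma2, n):
    funcs = [tuple(zip(sigma1, imgs))
             for imgs in product(sigma2, repeat=len(sigma1))]
    L1n = {w for w in L1 if len(w) == n}
    lhs = {f for f in product(funcs, repeat=n)          # the definition
           if all("".join(dict(h)[a] for h, a in zip(f, w)) in L2
                  for w in L1n)}
    pairs = [(h, a) for h in funcs for a in sigma1]     # alphabet Sigma x Sigma1
    bad = set()
    for pw in product(pairs, repeat=n):
        w = "".join(a for h, a in pw)                   # H1-image
        img = "".join(dict(h)[a] for h, a in pw)        # H2-image
        if w in L1n and img not in L2:                  # H2-image in compl(L2)
            bad.add(tuple(h for h, a in pw))            # H3-projection
    rhs = set(product(funcs, repeat=n)) - bad           # outer complement
    return lhs == rhs

print(check_lemma36({"ab"}, {"cc", "cd"}, "ab", "cd", 2))  # True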
Lemma 4.1. For L ⊆ Σ∗ there exists a length-preserving morphism H : Σ∗ → ((Σ × Σ)^Σ)∗ such that complΣ(L) = H⁻¹((complΣ×Σ(EΣ))^L).

Proof. We define hb : Σ → Σ × Σ by hb(a) = [a, b] and the morphism H by H(b) = hb for all a, b ∈ Σ. Then, for b1, . . . , bn ∈ Σ, the equivalence
b1 · · · bn ∈ L ⇔ ∃a1, . . . , an (a1 · · · an ∈ L ∧ a1 · · · an = b1 · · · bn)
implies the equivalences
b1 · · · bn ∈ complΣ(L)
⇔ ∀a1, . . . , an (a1 · · · an ∈ L ⇒ a1 · · · an ≠ b1 · · · bn)
⇔ ∀a1, . . . , an (a1 · · · an ∈ L ⇒ [a1, b1] · · · [an, bn] ∈ complΣ×Σ(EΣ))
⇔ ∀a1, . . . , an (a1 · · · an ∈ L ⇒ hb1(a1) · · · hbn(an) ∈ complΣ×Σ(EΣ))
⇔ hb1 · · · hbn ∈ complΣ×Σ(EΣ)^L
⇔ H(b1 · · · bn) ∈ complΣ×Σ(EΣ)^L
⇔ b1 · · · bn ∈ H⁻¹(complΣ×Σ(EΣ)^L). ⊓⊔

For a length-preserving morphism h : Σ1∗ → Σ2∗ we define Eh = {[x, h(x)] | x ∈ Σ1}⁺. Observe that Eh is a regular language.

Lemma 4.2. For L ⊆ Σ1∗ and a length-preserving morphism h : Σ1∗ → Σ2∗ there exist length-preserving morphisms H1 : Σ2∗ → ((Σ1 × Σ2)^{Σ1})∗ and H2 : (Σ1 × Σ2)∗ → Σ1∗ such that
h(L) = complΣ2(H1⁻¹((complΣ1×Σ2(Eh ∩ H2⁻¹(L)))^{Σ1∗})).

Proof. We define hb : Σ1 → Σ1 × Σ2 by hb(a) = [a, b], H1 by H1(b) = hb, and H2 by H2([a, b]) = a for all a ∈ Σ1, b ∈ Σ2. Then, for b1, . . . , bn ∈ Σ2, the equivalence
b1 · · · bn ∈ h(L) ⇔ ∃a1, . . . , an (h(a1 · · · an) = b1 · · · bn ∧ a1 · · · an ∈ L)
implies the equivalences
b1 · · · bn ∈ complΣ2(h(L))
⇔ ∀a1, . . . , an (a1 · · · an ∈ Σ1∗ ⇒ ¬(h(a1 · · · an) = b1 · · · bn ∧ a1 · · · an ∈ L))
⇔ ∀a1, . . . , an (a1 · · · an ∈ Σ1∗ ⇒ [a1, b1] · · · [an, bn] ∉ Eh ∩ H2⁻¹(L))
⇔ ∀a1, . . . , an (a1 · · · an ∈ Σ1∗ ⇒ hb1(a1) · · · hbn(an) ∈ complΣ1×Σ2(Eh ∩ H2⁻¹(L)))
⇔ hb1 · · · hbn ∈ (complΣ1×Σ2(Eh ∩ H2⁻¹(L)))^{Σ1∗}
⇔ H1(b1 · · · bn) ∈ (complΣ1×Σ2(Eh ∩ H2⁻¹(L)))^{Σ1∗}
⇔ b1 · · · bn ∈ H1⁻¹((complΣ1×Σ2(Eh ∩ H2⁻¹(L)))^{Σ1∗}). ⊓⊔

The next lemma shows how ε-free morphisms can be generated by length-preserving morphisms (cf. [3]).

Lemma 4.3. Consider L ⊆ Σ1∗ and an ε-free morphism h : Σ1∗ → Σ2∗. Then there exist a length-preserving morphism h′ : Σ∗ → Σ2∗, a morphism H : Σ∗ → Σ1∗, and a regular set R ⊆ Σ∗ such that h(L) = h′(H⁻¹(L) ∩ R).

Proof. Let Σ1 = {a1, . . . , ak}, and let h(ai) = bi1 bi2 · · · biri for i = 1, . . . , k. Define the alphabet Σ by Σ = {aij | i = 1, . . . , k and j = 1, . . . , ri}, the length-preserving morphism h′ by h′(aij) = bij for i = 1, . . . , k and j = 1, . . . , ri, the morphism H by H(ai1) = ai, H(aij) = ε for i = 1, . . . , k and j = 2, . . . , ri, and the regular set R by R = {ai1 ai2 · · · airi | i = 1, . . . , k}∗. Then we obtain
h(L) = {h(ai1 ai2 · · · ain) | ai1 ai2 · · · ain ∈ L}
= {bi1 1 · · · bi1 ri1 bi2 1 · · · bi2 ri2 · · · bin 1 · · · bin rin | ai1 ai2 · · · ain ∈ L}
= {h′(ai1 1 · · · ai1 ri1 ai2 1 · · · ai2 ri2 · · · ain 1 · · · ain rin) | ai1 ai2 · · · ain ∈ L}
= h′({ai1 1 · · · ai1 ri1 ai2 1 · · · ai2 ri2 · · · ain 1 · · · ain rin | ai1 ai2 · · · ain ∈ L})
= h′(H⁻¹(L) ∩ R). ⊓⊔

Using this notation we immediately obtain the following consequences from Lemma 3.6, Lemma 4.1, Lemma 4.2, and Lemma 4.3.

Corollary 4.4. For any family F of languages the following inclusions hold:
1. Γexp(F) ⊆ Γ∪,lh⁻¹,lh,−(F)
2. Γ−(F) ⊆ Γlh⁻¹,exp(F ∪ REG)
3. Γlh(F) ⊆ Γ∩REG,lh⁻¹,−,exp(F)
4. Γεh(F) ⊆ Γ∩REG,h⁻¹,lh(F)
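As a small illustration of the construction in Lemma 4.3: let Σ1 = {a}, Σ2 = {b, c} and h(a) = bc. Then Σ consists of two letters a11 and a12 with h′(a11) = b, h′(a12) = c, H(a11) = a, H(a12) = ε, and R = {a11 a12}∗. For the word aa we get H⁻¹(aa) ∩ R = {a11 a12 a11 a12}, and indeed h′(a11 a12 a11 a12) = bcbc = h(aa).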
Now we can prove the main theorem of this section. Informally it says that, in the presence of the weak operations ∪ and h⁻¹, the operation exp is as powerful as the operations εh and − (lh and −, respectively).

Theorem 4.5. For a family F of languages that contains REG, the following equalities hold:
1. Γ∪,lh⁻¹,lh,−(F) = Γ∪,lh⁻¹,exp(F).
2. Γ∪,h⁻¹,εh,−(F) = Γ∪,h⁻¹,lh,−(F) = Γ∪,h⁻¹,exp(F).

Proof. We conclude
Γ∪,lh⁻¹,lh,−(F) ⊆ Γ∪,lh⁻¹,∩REG,−,exp(F) = Γ∪,lh⁻¹,−,exp(F)   (Corollary 4.4.3)
⊆ Γ∪,lh⁻¹,exp(F)   (Corollary 4.4.2)
⊆ Γ∪,lh⁻¹,lh,−(F)   (Corollary 4.4.1)
and
Γ∪,h⁻¹,εh,−(F) ⊆ Γ∪,h⁻¹,∩REG,lh,−(F) = Γ∪,h⁻¹,lh,−(F)   (Corollary 4.4.4)
⊆ Γ∪,h⁻¹,∩REG,−,exp(F) = Γ∪,h⁻¹,−,exp(F)   (Corollary 4.4.3)
⊆ Γ∪,h⁻¹,exp(F)   (Corollary 4.4.2)
⊆ Γ∪,h⁻¹,lh,−(F)   (Corollary 4.4.1)
⊆ Γ∪,h⁻¹,εh,−(F). ⊓⊔

Corollary 4.6. 1. Let F be a family of languages that contains REG and is closed under union and inverse length-preserving morphism. Then F is closed under exponentiation iff it is closed under length-preserving morphism and complement.
2. Let F be a family of languages that contains REG and is closed under union and inverse morphism. Then F is closed under exponentiation iff it is closed under ε-free morphism and complement.

From this corollary we get directly the following three corollaries.

Corollary 4.7. Let F be a family of languages that contains REG and is closed under union, inverse length-preserving morphism, and length-preserving morphism. Then F is closed under complement iff it is closed under exponentiation.

A family of languages is called a semi-AFL if it is closed under union, inverse morphism, ε-free morphism, and intersection with regular languages and if it contains ∅ and Σ∗ for all Σ ⊆ Σ∞ (see [3]).

Corollary 4.8. A semi-AFL is closed under complement iff it is closed under exponentiation.

Corollary 4.9. 1. Let F be a family of languages that contains REG and is closed under union, complement and inverse length-preserving morphism. Then F is closed under length-preserving morphism iff it is closed under exponentiation.
2. Let F be a family of languages that contains REG and is closed under union, complement and inverse morphism. Then F is closed under ε-free morphism iff it is closed under exponentiation.

5 Application to Language Classes

In this section we apply the results of the previous section to the language classes mentioned in Section 2. In the case that a class is not closed under exponentiation we will characterize the closure of this class under exponentiation. In the case that it is not known whether the class is closed under exponentiation we will give equivalent conditions for the class being closed under exponentiation.

Let us start with the class CFL of context-free languages. By Lemma 2.1 of Kuich, Sauer, Urbanek [4], the context-free languages are not closed under exponentiation. We are now able to determine the closure of CFL under exponentiation (together with some "weak" operations). The class RUD of rudimentary languages, introduced by Smullyan in [5], can be considered as the linear-time analogue of the class PH of the polynomial-time hierarchy. From Theorem 4.5.2 and Theorem 2.1 we obtain the following theorem.

Theorem 5.1. The class RUD coincides with the closure of CFL under union, inverse morphism and exponentiation.

Now we turn to classes which are not known to be closed under exponentiation. We start with some classes between L and P.

Theorem 5.2. Let F be a family of languages that is closed under union, complement, and logarithmic-space many-one reducibility and that fulfills L ⊆ F ⊆ NP. Then F is closed under exponentiation iff F = NP.
Proof. Obviously, closure under logarithmic-space many-one reducibility implies closure under inverse morphism. By Corollary 4.9.2 we obtain that F is closed under exponentiation iff it is closed under ε-free morphism. If F is closed under ε-free morphism, then we obtain F = Γεh(F) ⊇ Γεh(L). A result by Springsteel [6] says that Γεh(L) ⊇ Q. Hence F ⊇ Q. The class Q contains sets which are logarithmic-space many-one complete for NP. Since F is closed under logarithmic-space many-one reducibility we get F ⊇ NP and hence F = NP. On the other hand, if F = NP then, by Theorem 2.1, F is closed under ε-free morphism. ⊓⊔

Since the classes L, NL, LOGCFL, P, and NP ∩ coNP are closed under union, complement, and logarithmic-space many-one reducibility, we obtain the following corollary.

Corollary 5.3. 1. L is closed under exponentiation iff L = NP.
2. NL is closed under exponentiation iff NL = NP.
3. LOGCFL is closed under exponentiation iff LOGCFL = NP.
4. P is closed under exponentiation iff P = NP.
5. NP ∩ coNP is closed under exponentiation iff NP = coNP.

The classes in the previous corollary are closed under complement but not known to be closed under ε-free morphism. For the nondeterministic time classes Q, NP, NTIME(t) and Σ_k^p the opposite is true. Here we can apply Corollary 4.7.

Theorem 5.4. 1. Q is closed under exponentiation iff Q = co-Q.
2. NP is closed under exponentiation iff NP = co-NP.
3. For every increasing t : N → N such that t(n) ≥ n, NTIME(t) is closed under exponentiation iff NTIME(t) = co-NTIME(t).
4. Σ_k^p is closed under exponentiation iff Σ_k^p = Π_k^p.

Note that Q = co-Q implies NP = co-NP, and NP = co-NP implies Σ_k^p = Π_k^p for k ≥ 2 (cf. [7]).

Finally we consider the classes Π_k^p of the polynomial-time hierarchy.

Theorem 5.5. For k ≥ 1, the class Π_k^p is closed under exponentiation iff Π_k^p = Σ_k^p.

Proof. If Π_k^p is closed under exponentiation then, by Corollary 4.4.2 and Theorem 2.1, Π_k^p is closed under complementation, i.e., Π_k^p = Σ_k^p. On the other hand, if Π_k^p = Σ_k^p then Π_k^p = PH. By Corollary 3.8 we obtain that Π_k^p is closed under exponentiation. ⊓⊔

References
[1] Balcázar, J.L., Díaz, J., Gabarró, J.: Structural Complexity I. Second edition. Springer-Verlag, Berlin, 1995.
[2] Balbes, R., Dwinger, P.: Distributive Lattices. University of Missouri Press, 1974.
[3] Ginsburg, S.: Algebraic and Automata-Theoretic Properties of Formal Languages. North-Holland, 1975.
[4] Kuich, W., Sauer, N., Urbanek, F.: Heyting algebras and formal languages. J.UCS 8 (2002), 722–736.
[5] Smullyan, R.: Theory of Formal Systems. Annals of Mathematical Studies, vol. 47. Princeton University Press, 1961.
[6] Springsteel, F.N.: On the pre-AFL of log n space and related families of languages. Theoretical Computer Science 2 (1976), 295–303.
[7] Wagner, K., Wechsung, G.: Computational Complexity. Deutscher Verlag der Wissenschaften, 1986.

Kleene's Theorem for Weighted Tree-Automata

Christian Pech⋆
Technische Universität Dresden, Fakultät für Mathematik und Naturwissenschaften, D-01062 Dresden, Germany
pech@math.tu-dresden.de

Abstract. We sketch the proof of a Kleene-type theorem for formal tree-series over commutative semirings. That is, for a suitable set of rational operations we show that the proper rational formal tree-series coincide with the recognizable ones. A complete proof is part of the PhD thesis of the author, which is available at [9].
Keywords: tree, automata, weight, language, Kleene's theorem, Schützenberger's theorem, rational expression.

A formal tree-series is a function from the set TΣ of trees over a given ranked alphabet Σ into a semiring K. The classical notion of formal tree-languages is obtained if K is chosen to be the Boolean semiring. Rational operations on formal tree-languages like sum, topcatenation, a-multiplication etc. have been used by Thatcher and Wright [11] to characterize the recognizable formal tree-languages by rational expressions. Thus they generalized the classical Kleene theorem [6] stating that rational and recognizable formal languages coincide. The rational operations on tree-languages can be generalized to formal tree-series. We would like to know the generating power of these operations. There are several results on this problem, each for some restricted class of semirings, saying that for formal tree-series the rational series coincide with the recognizable series, too. In particular it was shown by Kuich [7] for complete, commutative semirings, by Bozapalidis [3] for ω-additive, commutative semirings, by Bloom and Ésik [2] for commutative Conway semirings, and by Droste and Vogler [5] for idempotent, commutative semirings. The necessary restrictions on the semiring are in contrast with the generality of Schützenberger's theorem for formal power series (i.e. functions from Σ∗ into a semiring) [10], which is completely independent of the semiring.

Here we develop a technique for restricting the list of requirements to a minimum. The main idea is that, instead of working directly with formal tree-series, we introduce the notion of weighted tree-languages. They form a category which algebraically is more closely related to formal tree-languages than to formal tree-series. The environment that we obtain allows us to translate the known constructions of the rational operations directly to weighted tree-languages.

⋆ This work was supported by the German Research Council (DFG, GRK 433/2).

On the level of weighted tree-languages we can prove a Kleene-type theorem. Its proof is rather conventional and often uses classical automata-theoretic constructions tailored to the new categorical setting of weighted tree-languages. Up to this point the results do not depend on the semiring at all. Only when translating our results to formal tree-series does the unavoidable restriction on the semiring become apparent. Luckily we only need to require the coefficient semiring to be commutative, a very mild restriction given that almost all semirings that are actually used in applications like image compression (cf. [4]) or natural language processing (cf. [8]) are commutative.

1 Preliminaries

A ranked alphabet (or ranked set) is a pair (Σ, rk) where Σ is a set of letters (an alphabet) and rk : Σ → IN assigns to each letter its rank. With Σ(n) we denote the set of letters from Σ with rank n. For any set X disjoint from Σ we define Σ(X) := (Σ ∪ X, rk′) where rk′|Σ := rk and rk′(x) := 0 for all x ∈ X. If X consists of just one element x then we also write Σ(x) instead of Σ({x}). The set TΣ of trees is the smallest set of words such that Σ(0) ⊆ TΣ and, if f ∈ Σ(n) and t1, . . . , tn ∈ TΣ, then f⟨t1, . . . , tn⟩ ∈ TΣ.
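For example, with Σ(2) = {f}, Σ(1) = {g} and Σ(0) = {a} we have a ∈ TΣ, g⟨a⟩ ∈ TΣ and f⟨g⟨a⟩, a⟩ ∈ TΣ; here f⟨t1, t2⟩ denotes the word consisting of the root letter f followed by the subtrees t1 and t2.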
A semiring is a quintuple (K, ⊕, ⊙, 0, 1) such that (K, ⊕, 0) is a commutative monoid, (K, ⊙, 1) is a monoid and the following identities hold: (x ⊕ y) ⊙ z = (x ⊙ z) ⊕ (y ⊙ z), x ⊙ (y ⊕ z) = (x ⊙ y) ⊕ (x ⊙ z) and x ⊙ 0 = 0 ⊙ x = 0.

The set WTΣ of weighted trees is the smallest set of words such that [a|c] ∈ WTΣ for all a ∈ Σ(0), c ∈ K and, if f ∈ Σ(n), t1, . . . , tn ∈ WTΣ, c ∈ K, then [f|c]⟨t1, . . . , tn⟩ ∈ WTΣ. Each weighted tree t has an underlying tree ut(t). This tree is obtained from t by deleting all weights from the nodes.

Let a ∈ Σ(0). To each tree s ∈ TΣ we associate its a-rank rka(s) ∈ IN. This is just the number of occurrences of the letter a in s. The a-rank can be lifted to weighted trees according to rka(t) := rka(ut(t)) (for t ∈ WTΣ).

The semiring K acts naturally on WTΣ from the left. In particular, for every c, d ∈ K: d · [a|c] := [a|d ⊙ c] and d · [f|c]⟨t1, . . . , tn⟩ := [f|d ⊙ c]⟨t1, . . . , tn⟩. Obviously (c ⊙ d) · t = c · (d · t) for c, d ∈ K and t ∈ WTΣ.

For a ∈ Σ(0) we define the operation of a-substitution on WTΣ. In particular, for t ∈ WTΣ and t1, . . . , t_{rka(t)} ∈ WTΣ we define t ◦a ⟨t1, . . . , t_{rka(t)}⟩ by induction on the structure of t: [a|c] ◦a ⟨t1⟩ := c · t1, [b|c] ◦a ⟨⟩ := [b|c] (where b ≠ a), and [f|c]⟨t1, . . . , tn⟩ ◦a ⟨s1,1, . . . , sn,mn⟩ := [f|c]⟨t1 ◦a ⟨s1,1, . . . , s1,m1⟩, . . . , tn ◦a ⟨sn,1, . . . , sn,mn⟩⟩ (where mi := rka(ti)).

Next we equip WTΣ with the structure of a ranked monoid¹. Before we can do that, we need to introduce further notions: A ranked semigroup is a triple (S, rk, ◦) where (S, rk) is a ranked set and where ◦ = (◦i)i∈IN is a family of composition operations ◦i : S(i) × S^i → S with ◦i : (f, (g1, . . . , gi)) ↦ f ◦ ⟨g1, . . . , gi⟩ such that rk(f ◦ ⟨g1, . . . , gi⟩) = rk(g1) + · · · + rk(gi), and
(f ◦ ⟨g1, . . . , gn⟩) ◦ ⟨h1,1, . . . , h1,m1, . . . , hn,1, . . . , hn,mn⟩ = f ◦ ⟨g1 ◦ ⟨h1,1, . . . , h1,m1⟩, . . . , gn ◦ ⟨hn,1, . . . , hn,mn⟩⟩.
The latter is called the superassociativity law. A ranked monoid is a tuple (S, rk, ◦, 1) where (S, rk, ◦) is a ranked semigroup and 1 ∈ S(1) is a left- and right-unit of ◦. That is, x ◦ ⟨1, . . . , 1⟩ = x and 1 ◦ ⟨y⟩ = y for all x, y ∈ S. Examples of ranked monoids are (TΣ, rka, ◦a, a) for a ∈ Σ(0) and (WTΣ, rka, ◦a, [a|1]).

¹ These structures were already used by Berstel and Reutenauer [1] under the name "magma"; however, this leads to a name clash with another type of algebraic structures.

Homomorphisms between ranked semigroups (monoids) are defined in the evident way, as rank-preserving functions between the carriers that additionally preserve the composition operation (and the unit 1). Ranked semigroups and ranked monoids may be considered as a special kind of many-sorted algebras where the sorts are the natural numbers. Hence there exist free structures. The free ranked monoid freely generated by a ranked alphabet Σ will be denoted by (Σ, rk)∗. With Σ′ := Σ(ε) (where ε is a letter that is not in Σ) we have that
(Σ, rk)∗ = (TΣ′, rkε, ◦ε, ε).   (1)

2 Weighted Tree-Languages

Let K be a semiring and Σ = (Σ, rk) be a ranked alphabet. A weighted tree-language is a pair L = (L, |.|) where L is a set and |.| : L → WTΣ : s ↦ |s|. Let L1 = (L1, |.|1), L2 = (L2, |.|2) be weighted tree-languages. A function h : L1 → L2 is called a homomorphism from L1 to L2 if for all t ∈ L1 we have |t|1 = |h(t)|2. Thus the weighted tree-languages form a category, which will be denoted by WTLΣ. This category is complete and cocomplete.
The forgetful functor U : WTLΣ → Set creates colimits. Moreover WTLΣ has an initial object (∅, ∅) and a terminal object (WTΣ, 1_{WTΣ}).

The action of K on WTΣ may be extended to a functor on WTLΣ. In particular, for c ∈ K we define the functor [c · −] : L ↦ c · L, h ↦ h, where c · (L, |.|) := (L, |.|′) such that |.|′ : t ↦ c · |t|.

Next we define the topcatenation. Let f ∈ Σ(n), c ∈ K. Then we define the functor [f|c]⟨−1, . . . , −n⟩ : (L1, . . . , Ln) ↦ [f|c]⟨L1, . . . , Ln⟩, (h1, . . . , hn) ↦ h1 × · · · × hn, where for Li = (Li, |.|i) (i = 1, . . . , n) we set [f|c]⟨L1, . . . , Ln⟩ := (L1 × · · · × Ln, |.|) such that |(t1, . . . , tn)| := [f|c]⟨|t1|1, . . . , |tn|n⟩.

Let a ∈ Σ(0). We will now lift the a-substitution from weighted trees to weighted tree-languages. We do this in two steps. First we define t ·a L for t ∈ WTΣ, L ∈ WTLΣ. Later we will define L1 ·a L2 for L1, L2 ∈ WTLΣ. As usual we proceed by induction: [[a|c] ·a −] := [c · −]; [[b|c] ·a −] := C{[b|c]} (b ≠ a), where C{[b|c]} is the constant functor that maps each language to {[b|c]} and each homomorphism to the unit-homomorphism of {[b|c]}; and [[f|c]⟨t1, . . . , tn⟩ ·a −] := [f|c]⟨t1 ·a −, . . . , tn ·a −⟩.

The connection of this operation with the a-substitution on weighted trees is as follows. Let t ∈ WTΣ with rka(t) = n and let L = (L, |.|) ∈ WTLΣ. Then t ·a L ≅ (L^n, |.|t,a) where |(t1, . . . , tn)|t,a := t ◦a ⟨|t1|, . . . , |tn|⟩.

The a-product of two weighted tree-languages is now obtained by [− ·a L2] : L1 ↦ Σ_{t∈L1} |t|1 ·a L2 (a coproduct). The definition of this functor on homomorphisms is done pointwise in the evident way. Of course we can give a more transparent construction of this operation: Let L be the set of words defined according to L := {t ◦a ⟨s1, . . . , s_{rka(t)}⟩ | t ∈ L1, s1, . . . , s_{rka(t)} ∈ L2} and define a structure map |.| on L according to |t ◦a ⟨s1, . . . , s_{rka(t)}⟩| := |t|1 ◦a ⟨|s1|2, . . . , |s_{rka(t)}|2⟩. Then L1 ·a L2 ≅ (L, |.|).

As a special case of the a-product we define [−¬a] : L ↦ L¬a where L¬a := L ·a ∅. This operation is called a-annihilation.

Proposition 1. [c · −], [f|c]⟨−1, . . . , −n⟩, [− ·a L] and [−¬a] preserve arbitrary colimits and monos. [t ·a −] and [L ·a −] preserve directed colimits and monos. ⊓⊔

Apart from the already defined operations on WTLΣ we also have the coproduct functor [−1 + −2]. We note that this functor also preserves directed colimits and monos. Recall that the composition of functors preserving directed colimits will again preserve directed colimits (the same holds for mono-preserving functors).

Our next step is to introduce some iteration operations on WTLΣ. This can be done as for usual tree-languages, only using the appropriate categorical notions. Let us start with the a-iteration, a generalization of the Kleene star for formal languages to weighted tree-languages. Define Sa : WTL²Σ → WTLΣ : (X, L) ↦ (L ·a X) + {[a|1]}. Then this functor preserves directed colimits. Since WTLΣ has an initial object (the empty language), there exists an initial Sa(−, L)-algebra µX.Sa(X, L). Its carrier may be chosen to be the colimit of the initial sequence ∅ → Sa(∅, L) → Sa²(∅, L) → · · ·. It is called the a-iteration of L and is denoted by L∗a.
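For instance, over Σ with Σ(2) = {f} and Σ(0) = {a}, and with L = {[f|c]⟨[a|1], [a|1]⟩}, the a-iteration L∗a contains (up to the structure map) [a|1], the weighted tree [f|c]⟨[a|1], [a|1]⟩ itself, and everything obtained by repeatedly substituting it into a-positions, e.g. [f|c]⟨[f|c]⟨[a|1], [a|1]⟩, [a|1]⟩; informally, all binary f-trees carrying the weight c at every inner node.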
Next we will reveal a very nice connection between a-iteration and ranked monoids. It is this connection that makes the a-iteration a generalization of the Kleene star.

Proposition 2. Given L = (L, |.|) ∈ WTLΣ, for t ∈ L set rka(t) := rka(|t|) and let (L, rka)∗ be the free ranked monoid generated by (L, rka). Let L∗a be its carrier and let |.|∗a be the initial homomorphism from (L, rka)∗ to (WTΣ, rka, ◦a, [a|1]) induced by |.|. Then (L∗a, |.|∗a) ≅ L∗a.

Another important iteration operation is obtained from Ra : WTL²Σ → WTLΣ : (X, L) ↦ L ·a X. We call its initial algebra carrier µX.Ra(X, −) the a-recursion. The a-recursion of a weighted tree-language L will be denoted by Lµa. A close relation of a-recursion to a-iteration is given by the fact that Lµa ≅ (L∗a)¬a for any L ∈ WTLΣ.

Let us introduce a last iteration operation. Set Pa : WTL²Σ → WTLΣ : (X, L) ↦ L ·a (X + {[a|1]}). Then the initial algebra carrier µX.Pa(X, −) : L ↦ L⁺a will be called the a-semiiteration. The relation of this operation to a-iteration is given by L∗a ≅ L⁺a + {[a|1]}. An immediate consequence is that Lµa ≅ (L⁺a)¬a.

The following two properties of weighted tree-languages will be important later, when we associate formal tree-series to weighted tree-languages. A weighted tree-language L = (L, |.|) is called finitary if for all t ∈ TΣ the set {s ∈ L | ut(|s|) = t} is finite. It is called a-quasiregular (for some a ∈ Σ(0)) if it does not contain any element s with ut(|s|) = a. The full subcategory of WTLΣ of all finitary weighted tree-languages will be denoted by WTL^f_Σ.

Proposition 3. Let L1, . . . , Ln ∈ WTL^f_Σ, c ∈ K, f ∈ Σ(n). Then L1 + L2, c · L1, [f|c]⟨L1, . . . , Ln⟩, L1 ·a L2 and (L1)¬a are all finitary again. ⊓⊔

Proposition 4. Let L ∈ WTL^f_Σ. Then L∗a is finitary if and only if L is a-quasiregular. ⊓⊔

3 Weighted Tree-Automata

Given a ranked alphabet Σ and a semiring K, a finite weak weighted tree-automaton (wWTA) is a 7-tuple (Q, I, ι, T, λ, S, σ) where Q is a finite set of states, I ⊆ Q is a set of initial states, ι : I → K describes the initial weights, T is a finite ranked set of transition symbols and λ is a function assigning to each transition symbol τ ∈ T a transition, where for τ ∈ T(n) a transition is a tuple (q, f, q1, . . . , qn, c) such that q, q1, . . . , qn ∈ Q, f ∈ Σ(n) and c ∈ K. Moreover, S is a finite set of silent transition symbols and σ assigns to each silent transition symbol a silent transition, where a silent transition is a triple (q1, q2, c) for q1, q2 ∈ Q, c ∈ K.

Let A be a wWTA. For convenience, for τ ∈ T with λ(τ) = (q, f, q1, . . . , qn, c) we define lab(τ) := f, wt(τ) := c, dom(τ) := q, cdomi(τ) := qi and cdom(τ) := {q1, . . . , qn}, and for s ∈ S with σ(s) = (q1, q2, c) we define dom(s) := q1, cdom(s) := q2 and wt(s) := c.

Runs through A are defined inductively: If τ ∈ T, λ(τ) = (q, a, c), then τ is a run of A with root q along a. If s ∈ S, σ(s) = (q, q′, c) and p is a run of A with root q′ along t, then s · p is a run of A with root q along t. If finally τ ∈ T, λ(τ) = (q, f, q1, . . . , qn, c) and p1, . . . , pn are runs of A with roots q1, . . . , qn along trees t1, . . . , tn, respectively, then τ⟨p1, . . . , pn⟩ is a run of A with root q along f⟨t1, . . . , tn⟩. The root of a run p will be denoted by root(p). A run is called initial if its root is in I. With run_t(A) we denote the set of all initial runs in A along t, and with run(A) we denote the set of all initial runs of A. A (silent) transition symbol is called reachable if it is involved in some initial run of A. A state of A is called reachable if it is the domain of some reachable (silent) transition symbol. To each run p of A we may associate a weighted tree |p|.
This is done by induction on the structure of p. If p = τ, λ(τ) = (q, a, c), then |τ| := [a|c]. If p = s · p′ with σ(s) = (q1, q2, c), then |p| := c · |p′|, and if p = τ⟨p1, . . . , pn⟩, λ(τ) = (q, f, q1, . . . , qn, c), then |p| := [f|c]⟨|p1|, . . . , |pn|⟩. The weighted tree-language recognized by A is defined as LA := (run(A), |.|A) where |p|A := ι(root(p)) · |p|. A weighted tree-language L is called weakly recognizable if there is a finite wWTA A with L ≅ LA. Two wWTAs A1, A2 are called equivalent (denoted by A1 ≡ A2) if LA1 ≅ LA2.

A wWTA A is called reduced if each of its states and (silent) transition symbols is reachable. It is called normalized if it has precisely one initial state and the initial weight of this state is equal to 1. It is easy to see that for every wWTA A there is a reduced, normalized wWTA A′ such that A ≡ A′. Therefore, from now on we will only consider normalized wWTAs.

Since the description of wWTAs by a tuple of certain sets and mappings is tedious, we sometimes prefer a graphical representation. In such a representation each transition symbol τ with λ(τ) = (q, f, q1, . . . , qn, c) is depicted as a node labelled f|c with an input arm coming from q and output arms leading to q1, q2, . . . , qn. [Figure: depiction of a transition symbol with label f|c, input state q and output states q1, . . . , qn.] The output arms are always ordered counterclockwise, starting directly after the input arm. The initial weights are depicted by arrows to the initial states carrying weights. Silent transition symbols are represented by arrows between states that are equipped with a weight. In normalized wWTAs we usually omit the arrow with the initial weight. [Figure: a small example wWTA with initial state i, states q2, . . . , q6, transitions labelled f|1, g|2, f|3, ∗|1, ∗|2, and silent transitions with weights 1 and 2.]

A weighted tree-automaton (WTA) is a wWTA with an empty set of silent transition symbols. A weighted tree-language L is called recognizable if there is a WTA A such that L ≅ LA.

Proposition 5. Let L1, . . . , Ln be recognizable weighted tree-languages, c ∈ K, f ∈ Σ(n), a ∈ Σ(0). Then c · L1, L1 + L2, [f|c]⟨L1, . . . , Ln⟩ and L1 ·a L2 are also recognizable. ⊓⊔

Note that recognizable weighted tree-languages are always finitary. In particular we can see already that the recognizable weighted tree-languages will not be closed with respect to a-iteration (e.g. {[a|c]}∗a is not recognizable). It is clear that recognizability implies weak recognizability. However, the converse does not hold. In the next few paragraphs we will give necessary and sufficient conditions for a wWTA to recognize a recognizable weighted tree-language.

A word s = s1 · · · sk ∈ S∗ of silent transitions of A is called a silent path if cdom(si) = dom(si+1) (1 ≤ i < k). By convention, the empty word ε also counts as a silent path. We may extend dom and cdom to non-empty silent paths according to dom(s) := dom(s1), cdom(s) := cdom(sk). A silent path s with dom(s) = cdom(s) is called a silent cycle. If any silent transition of a silent cycle is reachable then the cycle is called reachable. The set of all silent paths of A is denoted by sPA. To each silent path s ∈ sPA we assign a weight wt(s) ∈ K according to wt(ε) := 1 and wt(s · p) := wt(s) ⊙ wt(p), where s is a silent transition and p a silent path.

Silent cycles play a crucial role in the characterization of the finitary weakly recognizable weighted tree-languages.

Proposition 6. Let A be a wWTA. Then LA is finitary if and only if A does not contain a reachable silent cycle. ⊓⊔

Proposition 7. Let A be a wWTA without reachable silent cycles. Then there is a WTA A′ such that LA ≅ LA′.
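The simplest instance of Proposition 6: a reachable silent transition s with σ(s) = (q, q, c) is already a reachable silent cycle, and pumping it turns one initial run into infinitely many initial runs along the same underlying tree, so LA cannot be finitary.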
Proof. Since the normalization and reduction of wWTAs do not introduce new silent cycles, we can assume that A = (Q, {i}, ι, T, λ, S, σ) is normalized and reduced. Let A have no silent cycles. Then we claim that sPA is finite: assume it is not; then it contains words of arbitrary length (because S is finite), hence it would also contain a word of length > |Q|, but such a word necessarily contains a cycle, a contradiction.

Let us construct the WTA A′ now. Its state set is Q and the set of transitions T′ of A′ is defined as follows: T′ := {(s, t) | s ∈ sPA, t ∈ T, s = ε or cdom(s) = dom(t)} and λ′(s, t) := (q′, f, q1, . . . , qn, c′) where λ(t) = (q, f, q1, . . . , qn, c), c′ := wt(s) ⊙ c, and q′ = q if s = ε and q′ = dom(s) else. Altogether A′ = (Q, {i}, ι, T′, λ′, ∅, ∅). We skip the proof that A′ is indeed equivalent to A. ⊓⊔

As an immediate consequence we get that a weakly recognizable weighted tree-language is recognizable if and only if it is finitary. Another important question is how to decide whether a given wWTA recognizes an a-quasiregular weighted tree-language. A short thought reveals that a wWTA A fails to be a-quasiregular if and only if either there is some t ∈ T with dom(t) ∈ I and lab(t) = a, or there exists a silent path s starting in I and ending in a state that is the domain of a transition t ∈ T with lab(t) = a.

Proposition 8. Let L1, . . . , Ln be weakly recognizable weighted tree-languages, c ∈ K, f ∈ Σ(n), a ∈ Σ(0). Then c · L1, L1 + L2, [f|c]⟨L1, . . . , Ln⟩, L1 ·a L2, (L1)∗a, (L1)⁺a and (L1)µa are also weakly recognizable.

Proof. Each operation is defined as a construction on wWTAs. Then we argue that the assignment A ↦ LA preserves the operations up to isomorphism. [Figure: the original presents these constructions as wWTA diagrams for c · A, A1 + A2, [f|c]⟨A1, . . . , An⟩, A1 ·a A2, A⁺a and Aµa; they combine the given automata by means of new initial states, weighted silent transitions, and a redirection of the a|c-transitions.] The a-iteration of a wWTA A can now be defined according to A∗a := A⁺a + A′, where A′ is a wWTA that recognizes {[a|1]}. ⊓⊔

4 A Kleene-Type Result

Let X be a set of variable symbols disjoint from Σ and let K be a semiring. The set Rat(Σ, K, X) of rational expressions over Σ, X and K is the set E of words given by the following grammar:
E ::= a | x | c · E | E + E | f⟨E, . . . , E⟩ | µx.(E)   (a ∈ Σ(0), x ∈ X, f ∈ Σ)
where in f⟨E, . . . , E⟩ the number of E's is equal to the rank of f. The semantics of rational expressions is given in terms of weighted tree-languages over the ranked alphabet Σ(X). It is defined inductively: [[a]] := {[a|1]}, [[x]] := {[x|1]}, [[f⟨e1, . . . , en⟩]] := [f|1]⟨[[e1]], . . . , [[en]]⟩, [[c · e]] := c · [[e]], [[e1 + e2]] := [[e1]] + [[e2]] and [[µx.(e)]] := [[e]]µx.

We have already seen that the semantics of each rational expression is weakly recognizable. Showing the opposite direction, namely that each weakly recognizable weighted tree-language is isomorphic to the semantics of a rational expression, will be our goal in the next few paragraphs. As a first step in this direction we introduce the accessibility graph of wWTAs.

Let A = (Q, {i}, ι, T, λ, S, σ) be a normalized wWTA. Let E1 := ⋃_{j∈IN∖{0}} T(j) × {1, 2, . . . , j} and E := E1 ∪̇ S, and define s : E → Q according to s(e) = dom(t) if e = (t, j), t ∈ T, and s(e) = dom(e) if e ∈ S. Moreover define d : E → Q according to d(e) := cdomj(t) if e = (t, j), t ∈ T, and d(e) := cdom(e) if e ∈ S. Then the multigraph ΓA = (Q, E, s, d) is called the accessibility graph of A.²
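In other words, ΓA has an arc for every way of moving from a state to a state in one step: a transition symbol τ of arity n with dom(τ) = q contributes the n arcs (τ, 1), . . . , (τ, n) from q to cdom1(τ), . . . , cdomn(τ), and every silent transition symbol contributes a single arc.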
A path of length n in ΓA = (Q, E, s, d) is a word e1 e2 · · · en where e1, . . . , en ∈ E and d(ej) = s(ej+1) (j = 1, . . . , n−1). Such a path is called cyclic if d(en) = s(e1). It is called a minimal cycle if for all 1 ≤ j, k ≤ n we have s(ej) = s(ek) ⇒ j = k. The number of minimal cycles of ΓA is called the cyclicity of A. It is denoted by cyc(A). A state q of A is called a source if it is a source of ΓA, that is, if there does not exist any arc e of ΓA with d(e) = q.

² The function names s and d are abbreviations for "source" and "destination" of arcs, respectively.

Let A = (Q, {i}, ι, T, λ, S, σ) be a normalized wWTA. Let τ ∈ T with domain i and assume λ(τ) = (i, f, q1, . . . , qn, c). Then for 1 ≤ k ≤ n the derivation of A by (τ, k) is the reduction of the automaton (Q, {qk}, ι′, T, λ, S, σ) where ι′ maps qk to 1. It will be denoted by ∂A/∂(τ,k). Moreover we define the complete derivation of A by τ as the tuple ∂A/∂τ := (∂A/∂(τ,1), . . . , ∂A/∂(τ,n)). Analogously, for s ∈ S with σ(s) = (i, q, c) we define the derivation of A by s as the reduction of the automaton (Q, {q}, ι′, T, λ, S, σ) where ι′ maps q to 1. It will be denoted by ∂A/∂s.

Proposition 9. With the notions from above, let Ti ⊆ T and Si ⊆ S be the sets of all transition symbols and silent transition symbols with domain i, respectively. Then
A ≡ Σ_{τ∈Ti} [lab(τ)|wt(τ)]⟨∂A/∂τ⟩ + Σ_{s∈Si} wt(s) · (∂A/∂s). ⊓⊔

Proposition 10. Let A = (Q, {i}, ι, T, λ, S, σ) be a reduced and normalized wWTA whose initial state i is not a source. Let x be a variable symbol that does not occur in A. Define Q′ := Q + {q′} and T′ := T + {τ′}, and let ϕ : Q → Q′ be such that ϕ(q) = q if q ≠ i and ϕ(i) = q′. For τ in T with λ(τ) = (q, f, q1, . . . , qn, c) define λ′(τ) := (q, f, ϕ(q1), . . . , ϕ(qn), c), and for s in S with σ(s) = (q1, q2, c) define σ′(s) := (q1, ϕ(q2), c). Finally define λ′(τ′) := (q′, x, 1). Then the wWTA A′ = (Q′, {i}, ι, T′, λ′, S, σ′) is still normalized and reduced with i being a source. Moreover (A′)µx ≡ A. [Figure: the original illustrates A′, (A′)µx and A; in A′ all arcs into the old initial state i are redirected to the new state q′, which carries the transition x|1.]

Theorem 11. Every weakly recognizable weighted tree-language is definable by a rational expression.

Proof. We prove inductively that each wWTA recognizes a rationally definable weighted tree-language. To each normalized automaton A = (Q, {i}, ι, T, λ, S, σ) we associate the pair of integers (cyc(A), |Q|). On these integer pairs we consider the lexicographical order: (x, y) ≤ (u, v) ⇐⇒ x < u ∨ (x = u ∧ y ≤ v), and take this as induction index.

Since any wWTA has an initial state, the smallest possible index is (0, 1). Such an automaton has Q = {i} and S = ∅. Moreover, if T = {t1, . . . , tn} then there are a1, . . . , an ∈ Σ(0) ∪ X and c1, . . . , cn ∈ K such that λ(tk) = (i, ak, ck) (k = 1, . . . , n). The weighted tree-language recognized by such an automaton is {[a1|c1], . . . , [an|cn]}; this is definable by the rational expression
c1 · a1 + · · · + cn · an.

Suppose now the claim holds for all wWTAs with index less than (n, m). Let A = (Q, {i}, ι, T, λ, S, σ) be a normalized wWTA with cyc(A) = n and |Q| = m. If i is a source, then we use Proposition 9 and obtain
A ≡ Σ_{τ∈Ti} [lab(τ)|wt(τ)]⟨∂A/∂τ⟩ + Σ_{s∈Si} wt(s) · (∂A/∂s).
For τ ∈ Ti of arity k let Aτ,k := ∂A/∂(τ,k), and for s ∈ Si let As := ∂A/∂s.
Since the number of states of Aτ,k is strictly smaller than that of A and the cyclicity of Aτ,k is not greater than that of A, we conclude that the index of Aτ,k is strictly smaller than that of A. Hence the weighted tree-language that is recognized by Aτ,k is rationally definable. The same holds for the derivations by silent transitions.

For j ∈ IN, τ ∈ Ti^{(j)} and 1 ≤ k ≤ j let eτ,k be a rational expression defining a weighted tree-language isomorphic to the one recognized by Aτ,k. Moreover, for s ∈ Si let es be a rational expression defining a weighted tree-language that is isomorphic to the one recognized by As. Then
Σ_{j∈IN} Σ_{τ∈Ti^{(j)}} [lab(τ)|wt(τ)]⟨eτ,1, . . . , eτ,j⟩ + Σ_{s∈Si} wt(s) · es
is a rational expression defining a weighted tree-language isomorphic to LA.

If i is not a source then we use Proposition 10 and obtain a wWTA A′ such that (A′)µx ≡ A. Clearly, A′ has a smaller cyclicity and hence also a smaller index than A. By induction hypothesis there is a rational expression e such that [[e]] ≅ LA′. Therefore µx.(e) is a rational expression for LA. ⊓⊔

If we want to characterize the recognizable weighted tree-languages in a similar way, then we must take care of the problem that only the a-recursion of a-quasiregular recognizable weighted tree-languages is guaranteed to be recognizable again. Therefore we restrict the set of rational expressions: The set pRat(Σ, X, K) of proper rational expressions shall consist of all words of the language E defined by the following grammar:
E ::= a | x | c · E | E + E | f⟨E, . . . , E⟩ | µx.(Ex)
Ex ::= a | y | c · Ex | Ex + Ex | f⟨E, . . . , E⟩ | µx.(Ex)
where a ∈ Σ(0), x, y ∈ X, x ≠ y, c ∈ K and f ∈ Σ. The semantics of proper rational expressions is the same as for rational expressions. The essential difference between Rat and pRat is that an expression µx.(e) is in pRat only if [[e]] is x-quasiregular. Therefore it is clear that the semantics of proper rational expressions are always going to be recognizable.

Theorem 12. For every recognizable weighted tree-language L there is a proper rational expression e such that L ≅ [[e]].

Proof. L is recognized by a wWTA without silent cycles. The decomposition steps to obtain a rational expression for L never introduce new silent cycles (in fact they never introduce any cycles). Therefore the construction from the proof of Theorem 11 produces a proper rational expression. ⊓⊔

5 Formal Tree-Series

Given a ranked alphabet (Σ, rk) and a semiring (K, ⊕, ⊙, 0, 1), let TΣ be the set of all trees over Σ. A function S : TΣ → K is called a formal tree-series. We will adopt the usual notation and write (S, t) for the image of t under S. With K⟨⟨Σ⟩⟩ we will denote the set of all formal tree-series over Σ.

Let WTΣ be the set of all weighted trees over Σ with weights from K. To each weighted tree t we associate its weight wt(t) ∈ K and its underlying tree ut(t) ∈ TΣ. The function ut we have already defined above. The function wt : WTΣ → K is defined inductively: wt([a|c]) := c and
wt([f|c]⟨t1, . . . , tn⟩) := c ⊙ wt(t1) ⊙ · · · ⊙ wt(tn).

An easy property of wt is that wt(c · t) = c ⊙ wt(t) for all t ∈ WTΣ. Another very crucial property only holds if K is commutative, namely, for t ∈ WTΣ with rka(t) = n and for s1, . . . , sn ∈ WTΣ:
wt(t ◦a ⟨s1, . . . , sn⟩) = wt(t) ⊙ wt(s1) ⊙ · · · ⊙ wt(sn).
From now on we assume that K is commutative. Given now a finitary L = (L, |.|) ∈ WTLΣ, we associate a formal tree-series SL with L according to:
(SL, t) := Σ_{s∈L, ut(|s|)=t} wt(|s|)   (t ∈ TΣ).
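For a concrete instance take K = (IN, +, ·, 0, 1) and, hypothetically, Σ(1) = {f}, Σ(0) = {a}, and L = {s1, s2} with |s1| = [f|2]⟨[a|3]⟩ and |s2| = [f|1]⟨[a|4]⟩. Both weighted trees have the underlying tree f⟨a⟩, so (SL, f⟨a⟩) = 2 · 3 + 1 · 4 = 10, and (SL, t) = 0 for every other tree t.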
Since L is finitary, SL is well-defined. We call S ∈ K⟨⟨Σ⟩⟩ a-quasiregular if (S, a) = 0, and we call S recognizable if there is a recognizable L ∈ WTLΣ with SL = S. It is easy to see that if a finitary weighted tree-language L is a-quasiregular, then SL is a-quasiregular.

The operations of sum and product with scalars can be introduced for formal tree-series pointwise. That is, (S1 + S2, t) := (S1, t) ⊕ (S2, t) and (c · S1, t) := c ⊙ (S1, t) for any S1, S2 ∈ K⟨⟨Σ⟩⟩, c ∈ K. It is not surprising that for any L1, L2 ∈ WTL^f_Σ we have SL1+L2 = SL1 + SL2 and Sc·L1 = c · SL1.

Next we define the a-product of formal tree-series S1, S2 for a ∈ Σ(0) according to
(S1 ·a S2, t) := Σ (S1, s) ⊙ (S2, s1) ⊙ · · · ⊙ (S2, s_{rka(s)}),
where the sum ranges over all s ∈ TΣ and s1, . . . , s_{rka(s)} ∈ TΣ with t = s ◦a ⟨s1, . . . , s_{rka(s)}⟩. Whenever K is commutative, then for L1, L2 ∈ WTL^f_Σ we have SL1·aL2 = SL1 ·a SL2.

Let f ∈ Σ(n), c ∈ K, S1, . . . , Sn ∈ K⟨⟨Σ⟩⟩. Then we define the topcatenation [f|c]⟨S1, . . . , Sn⟩ according to
([f|c]⟨S1, . . . , Sn⟩, t) = c ⊙ (S1, t1) ⊙ · · · ⊙ (Sn, tn) if t = f⟨t1, . . . , tn⟩, and 0 else.
Again, some thought reveals that we have S[f|c]⟨L1,...,Ln⟩ = [f|c]⟨SL1, . . . , SLn⟩ for all L1, . . . , Ln ∈ WTL^f_Σ. Note that here the semiring does not need to be commutative.

The most delicate operation to define for formal tree-series is the a-iteration. Luckily we showed its close relation to free ranked monoids. This relationship we use to define the a-iteration on formal tree-series. Let S ∈ K⟨⟨Σ⟩⟩ and a ∈ Σ(0) be such that (S, a) = 0. Let (TΣ, rka)∗ be the free ranked monoid generated by (TΣ, rka) (cf. (1) in Section 1). Let TΣ∗ be its carrier and ε its neutral element. Let ϕ : (TΣ, rka)∗ → (TΣ, rka, ◦a, a) be the unique homomorphism induced by the identity map of TΣ. On TΣ∗ we define a weight function wt∗S inductively:
wt∗S(s) := 1 if s = ε,
wt∗S(s) := (S, s) if s ∈ TΣ,
wt∗S(s) := (S, t) ⊙ wt∗S(t1) ⊙ · · · ⊙ wt∗S(t_{rka(t)}) if s = t⟨t1, . . . , t_{rka(t)}⟩ with t ∈ TΣ and t1, . . . , t_{rka(t)} ∈ TΣ∗.
Then we define S∗a ∈ K⟨⟨Σ⟩⟩ according to
(S∗a, t) := Σ_{s∈TΣ∗, ϕ(s)=t} wt∗S(s).
Assume K is commutative and L ∈ WTL^f_Σ is a-quasiregular. Then SL∗a = (SL)∗a. Summing up we obtain:

Proposition 13. If K is commutative, then the assignment L ↦ SL preserves sum, product with scalars, a-product, topcatenation and a-iteration. ⊓⊔

For S ∈ K⟨⟨Σ⟩⟩ we define the a-annihilation by S¬a := S ·a 0, where 0 denotes the series that maps each tree to 0. Clearly we have SL¬a = (SL)¬a for any L ∈ WTL^f_Σ. The a-recursion of formal tree-series can also be introduced easily now. Let S ∈ K⟨⟨Σ⟩⟩ be a-quasiregular. Then we define Sµa := (S∗a)¬a. Using the characterization of the a-recursion through the a-iteration of weighted tree-languages, it is evident that SLµa = (SL)µa.

It is clear that for any e ∈ pRat(Σ, X, K) we get that S[[e]] is a recognizable element of K⟨⟨Σ(X)⟩⟩. From Theorem 12 and from Proposition 13 we immediately obtain the following result.

Theorem 14. Let K be commutative and let S ∈ K⟨⟨Σ(X)⟩⟩ be recognizable. Then there is a proper rational expression e with S = S[[e]]. ⊓⊔

Using that the a-product preserves recognizability and that the a-recursion may be simulated by a-iteration and a-product, we can also formulate a more conventional Kleene-type result:

Corollary 15. Let K be commutative. Then the set of all recognizable formal tree-series over Σ(X) is the smallest subset of K⟨⟨Σ(X)⟩⟩ that contains all polynomials and that is closed with respect to sum, product with scalars, x-product (x ∈ X) and x-iteration (x ∈ X). ⊓⊔

References
1.
Berstel, J., Reutenauer, C.: Recognizable formal power series on trees. Theoretical Computer Science 18 (1982) 115–148 2. Bloom, S.L., Ésik, Z.: An extension theorem with an application to formal tree series. BRICS Report Series RS-02-19, University of Aarhus (2002) Kleene’s Theorem for Weighted Tree-Automata 399 3. Bozapalidis, S.: Equational elements in additive algebras. Theory Comput. Systems 32 (1999) 1–33 4. Culik, K., Kari, J.: Image compression using weighted finite automata. Computer and Graphics 17 (1993) 305–313 5. Droste, M., Vogler, H.: A Kleene theorem for weighted tree automata. technical report TUD-FI02-04, Technische Universität Dresden (2002) 6. Kleene, S.E.: Representation of events in nerve nets and finite automata. In Shannon, C.E., McCarthy, J., eds.: Automata Studies. Princeton University Press, Princeton, N.J. (1956) 3–42 7. Kuich, W.: Formal power series over trees. In: Proc. of the 3rd International Conference Developments in Language Theory, Aristotle University of Thesaloniki (1997) 60–101 8. Mohri, M.: Finite-state transducers in language and speech processing. Computational Linguistics 23 (1997) 269–311 9. Pech, C.: Kleene-type results for weighted tree-automata. Dissertation, TU-Dresden (2003) http://www.math.tu-dresden.de/˜pech/diss.ps. 10. Schützenberger, M.P.: On the definition of a family of automata. Information and Control 4 (1961) 245–270 11. Thatcher, J.W., Wright, J.B.: Generalized finite automata theory with application to a decision problem of second-order logic. Math. Systems Theory 2 (1968) 57–81 Weak Cardinality Theorems for First-Order Logic (Extended Abstract) Till Tantau Fakultät IV – Elektrotechnik und Informatik Technische Universität Berlin Franklinstraße 28/29, D-10587 Berlin, Germany tantau@cs.tu-berlin.de Abstract. Kummer’s cardinality theorem states that a language A is recursive if a Turing machine can exclude for any n words w1 , . . . , wn one of the n + 1 possibilities for the cardinality of {w1 , . . . , wn } ∩ A. It is known that this theorem does not hold for polynomial-time computations, but there is evidence that it holds for finite automata: at least weak cardinality theorems hold for them. This paper shows that some of the weak recursion-theoretic and automata-theoretic cardinality theorems are instantiations of purely logical theorems. Apart from unifying previous results in a single framework, the logical approach allows us to prove new theorems for other computational models. For example, weak cardinality theorems hold for Presburger arithmetic. 1 Introduction Given a language A and n input words, we often wish to know which of these words are in the language. For languages like the satisfiability problem this problem is presumably difficult to solve, for languages like the halting problem it is impossible to solve. To tackle such problems, Gasarch [7] has proposed to study a simpler problem instead: we just count how many of the input words are elements of A. To make things even easier, we do not require this number to be computed exactly, but only approximately. Indeed, let us just try to exclude one possibility for the number of input words in A. In recursion theory, Kummer’s cardinality theorem [16] states that, using a Turing machine, excluding one possibility for the number of input words in A is just as hard as deciding A. 
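As a concrete illustration of the counting idea: for n = 2 the quantity |{w1, w2} ∩ A| is one of the three values 0, 1, 2, and the theorem says that a Turing machine that can always name two of these values, one of which is guaranteed correct, can be turned into a decision procedure for A.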
It is not known whether this statement carries over to automata theory, that is, it is not known whether a language A must be regular if a finite automaton can always exclude one possibility for the number of input words in A. However, several weak forms of this theorem are known to hold for automata theory. For example, the finite automata cardinality theorem is known [25] to hold for n = 2. These parallels between recursion and automata theory are surprising insofar as computational models ‘in between’ exhibit a different behaviour: there are A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 400–411, 2003. c Springer-Verlag Berlin Heidelberg 2003  Weak Cardinality Theorems for First-Order Logic 401 languages A outside the class P of problems decidable in polynomial time for which we can always exclude, in polynomial time, for any n ≥ 2 words one possibility for their number in A. The present paper explains (at least partly) why the parallels between recursion and automata theory exist and why they are not shared by the models in between. Basically, the weak cardinality theorems for Turing machines and finite automata are just different instantiations of the same logical theorems. These logical theorems cannot be instantiated for polynomial time, because polynomial time lacks a logical characterisation in terms of elementary definitions. Using logic for the formulation and proof of the weak cardinality theorems has another advantage, apart from unifying previous results. Theorems formulated for arbitrary logical structures can be instantiated in novel ways: the weak cardinality theorems all hold for Presburger arithmetic and the nonspeedup theorem also holds for ordinal number arithmetic. In the logical setting ‘computational models’ are replaced by ‘logical structures’ and ‘computations’ are replaced by ‘elementary definitions’. For example, the cardinality theorem for n = 2 now becomes the following statement: Let S be a logical structure with universe U satisfying certain requirements and let A ⊆ U . If there exists a function f : U × U → {0, 1, 2} with f (x, y) = |{x, y} ∩ A| for all x, y ∈ U that is elementarily definable in S, then A is elementarily definable in S. Cardinality computations have applications in the study of separability. As argued in [26], ‘cardinality theorems are separability results in disguise’. In recursion theory and in automata theory one can rephrase the weak cardinality theorems as separability results. Such a rephrasing is also possible for the logical versions and we can formulate purely logical separability theorems that are interesting in their own right. An example of such a theorem is the following statement: Let S be a logical structure with universe U satisfying certain requirements and let A ⊆ U . If there exist elementarily definable supersets of A × A, A × Ā, and Ā × Ā whose intersection is empty, then A is elementarily definable in S. This paper is organised as follows. In section 2 the history of the cardinality theorem is retraced and the weak cardinality theorems are formulated rigorously. Section 3 prepares the logical formulation of the weak cardinality theorems. It is shown how the class of regular languages and the class of recursively enumerable languages can be characterised in terms of appropriate elementary definitions. In section 4 the weak cardinality theorems for first-order logic are formulated. In section 5 applications of the theorems to separability are discussed. 
This extended abstract does not include any proofs due to lack of space. They can be found in the full technical report version of the paper [27]. 2 2.1 History of the Cardinality Theorem The Cardinality Theorem for Recursion Theory For a set A, the cardinality function #An takes n words as input and yields the number of words in A as output, that is, #An (w1 , . . . , wn ) = |{w1 , . . . , wn } ∩ A|. 402 T. Tantau The cardinality function and the idea of ‘counting input words’, which is due to Gasarch [7] in its general form, play an important role in a variety of proofs both in complexity theory [9,12,14,18,23] and recursion theory [4,16,17]. For example, the core idea of the Immerman–Szelepcsényi theorem is to count the number of reachable vertices in a graph in order to decide a reachability problem. One way of quantifying the complexity of #An is to consider its enumeration complexity, which is the smallest number m such that #An is m-enumerable. Enumerability, which was first defined by Cai and Hemaspaandra [6] in the context of polynomial-time computations and which was later transferred to recursive computations, can be regarded as ‘generalised approximability’. It is defined as follows: a function f , taking n tuples of words as input, is m-Turingenumerable if there exists a Turing machine that on input w1 , . . . , wn starts a possibly infinite computation during which it prints words onto an output tape. At most m different words may be printed and one of them must be f (w1 , . . . , wn ). Intuitively, the larger m, the easier it should be to m-Turing-enumerate #An . This intuition is wrong. Kummer’s cardinality theorem, see below, states that even n-Turing-enumerating #An is just as hard as deciding A. In other words, excluding just one possibility for #An (w1 , . . . , wn ) is just as hard as deciding A. Intriguingly, the intuition is correct for polynomial-time computations since the work of Gasarch, Hoene, and Nickelsen [7,11,20] shows that a polynomial-time version of the cardinality theorem does not hold for n ≥ 2. Theorem 2.1 (Cardinality theorem [16]). If #An is n-Turing-enumerable, then A is recursive. The cardinality theorem has applications for instance in the study of semirecursive sets [13], which play a key role in the solution of Post’s problem [22]. The proof of the cardinality theorem is difficult. Several less general results had already been proved when Kummer wrote his paper ‘A proof of Beigel’s cardinality conjecture’ [16]. The title of Kummer’s paper refers to the fact that Richard Beigel was the first to conjecture the cardinality theorem as a generalisation of his so-called ‘nonspeedup theorem’ [3]. In the following formulation of the nonspeedup theorem χnA denotes the n-fold characteristic function of A, which maps any n words w1 , . . . , wn to a bitstring whose ith bit is 1 iff wi ∈ A. The nonspeedup theorem is a simple consequence of the cardinality theorem. Theorem 2.2 (Nonspeedup theorem [3]). If χnA is n-Turing-enumerable, then A is recursive. Owings [21] succeeded in proving the cardinality theorem for n = 2. For larger n he could only show that if #An is n-Turing-enumerable, then A is recursive in the halting problem. Harizanov et al. [8] have formulated a restricted cardinality theorem, whose proof is somewhat simpler than the proof of the full cardinality theorem. Theorem 2.3 (Restricted cardinality theorem [8]). If #An is n-Turingenumerable via a Turing machine that never enumerates both 0 and n simultaneously, then A is recursive. 
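Note how tight these statements are: #An takes one of the n + 1 values 0, 1, . . . , n, so it is trivially (n + 1)-Turing-enumerable by printing all of them; the cardinality theorem concerns the first non-trivial bound n, i.e., excluding a single value.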
Weak Cardinality Theorems for First-Order Logic 2.2 403 Weak Cardinality Theorems for Automata Theory If we restrict the computational power of Turing machines, the cardinality theorem no longer holds [7,11,20]: there are languages A ∈ / P for which we can always exclude one possibility for #An (w1 , . . . , wn ) in polynomial time for n ≥ 2. However, if we restrict the computational power even further, namely if we consider finite automata, there is strong evidence that the cardinality theorem holds once more, see the following conjecture: Conjecture 2.4 ([25]). If #An is n-fa-enumerable, then A is regular. The conjecture refers to the notion of m-enumerability by finite automata. This notion was introduced in [24] and is defined as follows: A function f is mfa-enumerable if there exists a finite automaton for which for every input tuple (w1 , . . . , wn ) the output attached to the last state reached is a set of size at most m that contains f (w1 , . . . , wn ). The different components of the tuple are put onto n different tapes, shorter words padded with blanks, and the automaton scans the tapes synchronously, which means that all heads advance exactly one symbol in each step. The same method of feeding multiple words to a finite automaton has been used in [1,2,15]. In a line of research [1,2,15,24,25,26], the following three theorems were established. They support the above conjecture by showing that all of the historically earlier, weak forms of the recursion-theoretic cardinality theorem hold for finite automata. Theorem 2.5 ([24]). If χnA is n-fa-enumerable, then A is regular. Theorem 2.6 ([25]). If #A2 is 2-fa-enumerable, then A is regular. Theorem 2.7 ([25,2]). If #An is n-fa-enumerable via a finite automaton that never enumerates both 0 and n simultaneously, then A is regular. 3 Computational Models as Logical Structures The aim of formulating purely logical versions of the weak cardinality theorems is to abstract from concrete computational models. The present section explains which logical abstraction is used. 3.1 Presburger Arithmetic Let us start with an easy example: Presburger arithmetic. This notion is easily transferred to a logical setting since it is defined in terms of first-order logic in the first place. A set A of natural numbers is called definable in Presburger arithmetic if there exists a first-order formula φ(x) over the signature {+2 } with the following property: A contains exactly those numbers a that make φ(x) 404 T. Tantau true if we interpret x as a and the symbol + as the normal addition of natural numbers. For example, the set of even natural numbers is definable in Presburger arithmetic using the formula φ(x) = ∃y (y + y = x). In the abstract logical setting used in the next sections the ‘computational model Presburger arithmetic’ is represented by the logical structure (N, +). The class of sets that are ‘computable in Presburger arithmetic’ is given by the class of sets that are elementarily definable in (N, +). Recall that a relation R is called elementarily definable in a logical structure S if there exists a first-order formula φ(x1 , . . . , xn ) such that (a1 , . . . , an ) ∈ R iff φ(x1 , . . . , xn ) holds in S if we interpret each xi as ai . 3.2 Finite Automata In order to make finite automata and regular languages accessible to a logical setting, for a given alphabet Σ we need to find a logical structure SREG,Σ with the following property: a language A ⊆ Σ ∗ is regular iff it is elementarily definable in SREG,Σ . 
It is known that such a structure SREG,Σ exists: Büchi has proposed one [5], though a small correction is necessary as pointed out by McNaughton [19]. However, the elements of Büchi’s structure are natural numbers, not words, and thus a reencoding is necessary. A more directly applicable structure is discussed in [26], where it is shown that for non-unary alphabets the structure (Σ ∗ , Iσ1 , . . . , Iσ|Σ| ) has the desired properties. The relations Iσi , one for each symbol σi ∈ Σ, are binary relations that hold for a pair (u, v) of words if the |v|-th letter of u is σi . For unary alphabets, an appropriate structure SREG,Σ can also be constructed. 3.3 Polynomially Time-Bounded Turing Machines There is no logical structure S such that the class of languages that are elementarily definable in S is exactly the class P of languages decidable in polynomial time. To see this, consider the relation R = {(M, t) | M halts on input M after t steps}. This relation is in P, but the language defined by the first-order formula φ(M ) = ∃t R(M, t) is exactly the halting problem. Thus in any logical structure in which we can elementarily define R we can also elementarily define the halting problem. 3.4 Resource-Unbounded Turing Machines On the one hand, the class of recursive languages cannot be defined elementarily: the argument for polynomial-time machines also applies here. On the other hand, the arithmetical hierarchy contains exactly the sets that are elementarily definable in (N, +, ·). The most interesting case, the class of recursively enumerable languages, is more subtle. Since the class is not closed under complement, it cannot be characterised by elementary definitions. However, it can be characterised by positive Weak Cardinality Theorems for First-Order Logic 405 elementary definitions, which are elementary definitions that do not contain negations: For every alphabet Σ there is a structure SRE,Σ such that a language A ⊆ Σ ∗ is recursively enumerable iff it is positively elementarily definable in SRE,Σ . An example of such a structure SRE,Σ is the following: its universe is Σ ∗ and it contains all recursively enumerable relations over the alphabet Σ ∗ . 4 Logical Versions of the Weak Cardinality Theorems In this section the weak cardinality theorems for first-order logic are presented. The theorems are first formulated for elementary definitions, which allows us to apply them to all computational models that can be characterised in terms of elementary definitions. As argued in the previous section, this includes Presburger arithmetic, finite automata, and the arithmetical hierarchy, but misses the recursively enumerable languages. This is remedied later in this section, where positive elementary definitions are discussed. It is shown that at least the nonspeedup theorem can be formulated in a ‘positive’ way. At the end of the section higher-order logics are briefly touched. We are still missing one crucial definition for the formulation of the weak cardinality theorems: What does it mean that a function is ‘m-enumerable in a logical structure’ ? Definition 4.1. Let S be a logical structure with universe U and m a positive integer. A function f : U → U is (positively) elementarily m-enumerable in S if there exists a relation R ⊆ U × U with the following properties: 1. R is (positively) elementarily definable in S, 2. the graph of f is contained in R, 3. R is m-bounded, that is, for every x ∈ U there exist at most m different y with (x, y) ∈ R. 
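As an example in Presburger arithmetic: the function f : N → N with f(x) = ⌊x/2⌋ is elementarily 2-enumerable in (N, +) via the relation R defined by φ(x, y) = (x = y + y) ∨ (x = y + y + 1) ∨ (x = y + y + 2); this R contains the graph of f and every x has at most two witnesses y. (Of course this particular f is even elementarily definable; the example merely illustrates the definition.)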
The definition is easily adapted to functions f that take more than one input or yield more than one output. This definition does, indeed, reflect the notion of enumerability: A function with finite range is m-fa-enumerable iff it is elementarily m-enumerable in SREG,Σ ; a function is m-Turing-enumerable iff it is positively elementarily m-enumerable in SRE,Σ . 4.1 The Non-positive First-Order Case We are now ready to formulate the weak cardinality theorems for first-order logic. In the following theorems, a logical structure is called well-orderable if a well-ordering of its universe can be defined elementarily. For example (N, +) is well-orderable using the formula φ≤ (x, y) = ∃z (x + z = y). The cross  product of  two function f and g is defined in the usual way by (f × g)(u, v) = f (u), g(v) . The first of the weak cardinality theorems, the nonspeedup theorem, is actually just a corollary of a more general theorem that is formulated first: the cross product theorem. 406 T. Tantau Theorem 4.2 (Cross product theorem). Let S be a well-orderable logical structure with universe U . Let f, g : U → U be functions. If f × g is elementarily (n + m)-enumerable in S, then f is elementarily n-enumerable in S or g is elementarily m-enumerable in S. Theorem 4.3 (Nonspeedup theorem). Let S be a well-orderable logical structure with universe U . Let A ⊆ U . If χnA is elementarily n-enumerable in S, then A is elementarily definable in S. Theorem 4.4 (Cardinality theorem for two words). Let S be a wellorderable logical structure with universe U . Let every finite relation on U be elementarily definable in S. Let A ⊆ U . If #A2 is elementarily 2-enumerable in S, then A is elementarily definable in S. Theorem 4.5 (Restricted cardinality theorem). Let S be a well-orderable logical structure with universe U . Let every finite relation on U be elementarily definable in S. Let A ⊆ U . If #An is elementarily n-enumerable in S via a relation R that never ‘enumerates’ 0 and n simultaneously, then A is elementarily definable in S. The premises of the first two and the last two of the above theorems differ in the following way: for the last two theorems we require that every finite relation on S is elementarily definable in S. An example of a logical structure where this is not the case is (ω1 , +, ·), where ω1 is the first uncountable ordinal number and + and · denote ordinal number addition and multiplication. Since this structure is uncountable, there exist a singleton set A = {α} with α ∈ ω1 that is not elementarily definable in (ω1 , +, ·). For this structure theorems 4.4 and 4.5 do not hold: #A2 is elementarily 2-enumerable in (ω1 , +, ·) since #A2 (x, y) ∈ {0, 1} for all x, y ∈ ω1 , but A is not elementarily definable in (ω1 , +, ·). 4.2 The Positive First-Order Case The above theorems cannot be applied to Turing enumerability since they refer to elementary definitions, not to positive elementary definitions. Unfortunately, the proofs of the theorems cannot simply be reformulated in a ‘positive’ way. They use negations to define the smallest element in a set B with respect to a well  ordering <. The defining formula is given by φ(x) = B(x)∧¬∃x′ x′ < x∧B(x′ ) . This is a fundamental problem: the set {(M, x) | x is the smallest word accepted by M } is not recursively enumerable. Thus if we insist on finding the smallest element in every recursively enumerable set, we will not be able to apply the theorems to Turing machines. 
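As the next paragraphs show, the way out is to settle for some canonical accepted word rather than the smallest one. A minimal sketch of the dovetailed search used there, assuming a hypothetical step-bounded simulator accepts(machine, word, steps) (no such function is fixed by the paper):

```python
from itertools import count, product

def first_accepted(accepts, machine, alphabet="ab"):
    """Dovetailed simulation of `machine` on all words.

    For s = 1, 2, ... every word of length < s is run for s steps; the
    first word observed to be accepted is returned.  This yields a
    (partial) recursive choice function: it returns *some* canonical
    element of L(machine) whenever that language is nonempty, but not
    necessarily the smallest one.
    """
    for s in count(1):
        for length in range(s):
            for letters in product(alphabet, repeat=length):
                word = "".join(letters)
                if accepts(machine, word, s):
                    return word
```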
Fortunately, a closer examination of the proofs shows that we do not actually need the smallest element in B, but just any element of B as long as the same element is always chosen. This is not as easy as it may sound—as is well-recognised in set theory, where the axiom of choice is needed for this choosing operation. Suppose you and a Weak Cardinality Theorems for First-Order Logic 407 friend wish to agree on a certain element of B, but neither you nor your friend know the set B beforehand. Rather, you must decide on a generic method of picking an element such that, when the set B becomes known to you and your friend, you will both pick the same element. Agreements like ‘pick some element from B’ will not guarantee that you both pick the same element, except if the set happens to be a singleton. We need a (partial) recursive choice function that assigns a word that is accepted by M to every Turing machine M , provided such a word exists. Such a choice function does, indeed, exist: it maps M to the first word that is accepted by M during a dovetailed simulation of M on all words. In the following, first-order logic is augmented by choice operators. Choice operators have been used for example in [10], but the following definitions are adapted to the purposes of this paper and differ from the formalism used in [10]. On the sematic side we augment logical structures by a choice function; on the syntactic side we augment first-order logic by a choice operator ε: Definition 4.6. A choice function on a set U is a function ζ : P(U ) → U such that ζ(B) ∈ B for all nonempty B ⊆ U . Definition 4.7. A choice structure is a pair (S, ζ) consisting of a logical structure S and a choice function ζ on the universe of S. Definition 4.8 (Syntax of the choice operator). First-order formulas with choice are defined inductively in the usual way with one addition: if x is a variable and φ is a first-order formula with choice, so is ε(x, φ).   In the next definition φ(S,ζ) (x) = u ∈ U | (S, ζ) |= φ[x = a] denotes the set of all u that make φ hold in (S, ζ) when plugged in for the variable x. Definition 4.9 (Semantics of the choice operator). The semantics of firstorder logic with choice operator is defined in the usual way with the following addition: a formula of the form ε(x, φ) holds in a choice structure (S, ζ) for an   assignment α if φ(S,ζ) (x) is nonempty and α(x) = ζ φ(S,ζ) (x) . As an example, consider the logical structure S = (N, +, ·, <, 0) and let ζ map every nonempty set of natural numbers to its smallest element. Let φ(x, y, z) =  ε z, 0 < z ∧ ∃a (x · a = z) ∧ ∃b (y · b = z) . Then φ(S,ζ) (x, y, z) is the set of all triples (n, m, k) such that k is the least common multiple of n and m: the formula 0 < z ∧ ∃a (x · a = z) ∧ ∃b (y · b = z) is true for all positive z that are multiples of both x and y; thus the choice operator picks the smallest one of these. The following theorem shows that the class of recursively enumerable sets can be characterised in terms of first-order logic with choice. Theorem 4.10. For every alphabet Σ there exists a choice structure (SRE,Σ , ζ) such that a language A ⊆ Σ ∗ is recursively enumerable iff it is positively elementarily definable with choice in (SRE,Σ , ζ). 408 T. Tantau We can now formulate the cross product theorem and the nonspeedup theorem in such a way that they can be applied both to finite automata and to Turing machines. Theorem 4.11 (Cross product theorem, positive version). Let (S, ζ) be a choice structure with universe U . 
Let the inequality relation on U be positively elementarily definable in (S, ζ). Let every finite relation on U that is elementarily definable with choice in (S, ζ) be positively elementarily definable with choice in (S, ζ). Let f, g : U → U be functions. If f × g is positively (n + m)-enumerable with choice in (S, ζ), then f is positively n-enumerable with choice in (S, ζ) or g is positively m-enumerable with choice in (S, ζ). Theorem 4.12 (Nonspeedup theorem, positive version). Let (S, ζ) be a choice structure with universe U . Let the inequality relation on U be positively elementarily definable in (S, ζ). Let every finite relation on U that is elementarily definable with choice in (S, ζ) be positively elementarily definable with choice in (S, ζ). Let A ⊆ U . If χnA is positively n-enumerable with choice in (S, ζ), then A is positively elementarily definable with choice in (S, ζ). The cross product theorem, theorem 4.2, is a consequence of its positive version, theorem 4.11. (And not the other way round, as one might perhaps expect.) The same is true for the nonspeedup theorem. To see this, consider a well-orderable structure S whose existence is postulated in theorem 4.2. Define a choice structure (S ′ , ζ) as follows: S ′ has the same universe as S and contains all relations that are elementarily definable in S. The function ζ maps each set A to its smallest element with respect the well-ordering of S’s universe. With these definitions, a relation is positively elementarily definable with choice in (S ′ , ζ) iff it is elementarily definable in S. 4.3 The Higher-Order Case We just saw that the cross product theorem for a certain logic, namely firstorder logic, is a consequence of the cross product theorem for a less powerful logic, namely positive first-order logic. We may ask whether we can similarly apply the theorems for first-order logic to higher-order logics. This is indeed possible and we can use the same kind of argument as above: Consider any logical structure S. Define a new structure S ′ as follows: it has the same universe as S and it contains every relation that is higher-order definable in S. Then a relation is elementarily definable in S ′ iff it is higher-order definable in S. This allows us to transfer the cross product theorem and all of the weak cardinality theorems to all logics that are at least as powerful as first-order logic. Just one example of such a transfer is the following: Theorem 4.13 (Cross product theorem for higher-order logic). Let S be a well-orderable logical structure with universe U . Let f, g : U → U be functions. If f × g is higher-order (n + m)-enumerable in S, then f is higher-order n-enumerable in S or g is higher-order m-enumerable in S. Weak Cardinality Theorems for First-Order Logic 5 409 Separability Theorems for First-Order Logic Kummer’s cardinality theorem can be reformulated in terms of separability. In n [26] it is shown that it is equivalent to the following statement, where A(k ) denotes the set of all n-tuples of distinct words such that exactly k of them are in A. Theorem 5.1 (Separability version of Kummer’s cardinality theorem). n Let A be a language. Suppose there exist recursively enumerable supersets of A( 0 ) , n n A( 1 ) , . . . , A(n) whose intersection is empty. Then A is recursive. In [26] it is also shown that the above statement is still true if we replace ‘recursive enumerable’ by ‘co-recursively enumerable’. The weak cardinality theorems for first-order logic can be reformulated in a similar way. 
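For finite toy universes the sets A^(n;k) of Theorem 5.1 can be enumerated directly, which is handy for experimenting with the separability statements. A sketch (Python; representing A and the universe as explicit sets is our assumption):

```python
from itertools import permutations

def a_n_k(A, universe, n, k):
    """All n-tuples of distinct elements with exactly k components in A
    (the sets written A^(n;k) in Theorem 5.1), for a finite universe."""
    return {t for t in permutations(universe, n)
            if sum(w in A for w in t) == k}
```

In these terms, the hypothesis of Theorem 5.1 asks for supersets of a_n_k(A, U, n, 0), ..., a_n_k(A, U, n, n) in the respective class whose intersection is empty.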
Let us start with the cardinality theorem for two words. It can be stated equivalently as follows, where Ā = U \ A denotes the complement of A. Theorem 5.2. Let S be a well-orderable logical structure with universe U . Let every finite relation on U be elementarily definable in S. Let A ⊆ U . Suppose there exist elementarily definable supersets of A × A, A × Ā, and Ā × Ā whose intersection is empty. Then A is elementarily definable in S. The restricted cardinality theorem can be reformulated in terms of elementary separability. Let us call two sets A and B elementarily separable in a structure S if there exists a set C with A ⊆ C ⊆ B̄ that is elementarily definable in S. Theorem 5.3. Let S be a well-orderable structure with universe U . Let every n n finite relation on U be elementarily definable in S. Let A ⊆ U . If A( 0 ) and A(n) are elementarily separable in S, then A is elementarily definable in S. 6 Conclusion This paper proposed a new, logic-based approach to the proof of (weak) cardinality theorems. The approach has two advantages: 1. It unifies previous results in a single framework. 2. The results can easily be applied to other computational models. Regarding the first advantage, only the cross product theorem and the nonspeedup theorem are completely ‘unified’ by the theorems presented in this paper: the Turing machine versions and the finite automata versions of these theorems are just different instantiations of theorems 4.11 and 4.12. For the cardinality theorem for two words and for the restricted cardinality theorem the situation is (currently) more complex. These theorem hold for Turing machines and for finite automata, but different proofs are used. In particular, 410 T. Tantau the logical theorems cannot be instantiated for Turing enumerability. Nevertheless, the logical approach is fruitful here: the logical theorem can be instantiated for new models like Presburger arithmetics. Organised by computational model, the results of this paper can be summarised as follows: the cross product theorem and the nonspeedup theorem – – – – – – hold for Presburger arithmetic, hold for finite automata, do not hold for polynomial-time machines, hold for Turing machines, hold for natural number arithmetic, hold for ordinal number arithmetic. The cardinality theorem for two inputs and the restricted cardinality theorem – – – – – – hold for Presburger arithmetic, hold for finite automata, do not hold for polynomial-time machines, hold for Turing machines, hold for natural number arithmetic, do not hold for ordinal number arithmetic. The behaviour of ordinal number arithmetic is interesting: the cardinality theorem for two inputs and the restricted cardinality theorem fail since there exist ordinal numbers that are not elementarily definable in ordinal number arithmetic— but this is not a ‘problem’ for the cross product theorem and the nonspeedup theorem. The results of this paper raise the question of whether the cardinality theorem holds for first-order logic. I conjecture that this is the case, that is, I conjecture that for well-orderable structures S in which all finite relations can be elementarily defined, if #An is elementarily n-enumerable then A is elementarily definable. Proving this conjecture would also settle the open problem of whether the cardinality theorem holds for finite automata. References 1. H. Austinat, V. Diekert, and U. Hertrampf. A structural property of regular frequency computations. Theoretical Comput. Sci., 292(1):33–43, 2003. 2. H. Austinat, V. Diekert, U. 
Hertrampf, and H. Petersen. Regular frequency computations. In Proc. RIMS Symposium on Algebraic Systems, Formal Languages and Computation, volume 1166 of RIMS Kokyuroku, pages 35–42. Research Inst. for Mathematical Sci., Kyoto Univ., Japan, 2000. 3. R. Beigel. Query-Limited Reducibilities. PhD thesis, Stanford Univ., USA, 1987. 4. R. Beigel, W. I. Gasarch, M. Kummer, G. Martin, T. McNicholl, and F. Stephan. The complexity of ODDA n . J. Symbolic Logic, 65(1):1–18, 2000. 5. J. R. Büchi. On a decision method in restricted second-order arithmetic. In Proc. 1960 International Congress on Logic, Methodology and Philosophy of Sci., pages 1–11. Stanford Univ. Press, 1962. Weak Cardinality Theorems for First-Order Logic 411 6. J.-Y. Cai and L. A. Hemachandra. Enumerative counting is hard. Inf. Computation, 82(1):34–44, 1989. 7. W. I. Gasarch. Bounded queries in recursion theory: A survey. In Proceedings of the Sixth Annual Structure in Complexity Theory Conference, pages 62–78. IEEE Computer Soc. Press, 1991. 8. V. Harizanov, M. Kummer, and J. Owings. Frequency computations and the cardinality theorem. J. Symbolic Logic, 52(2):682–687, 1992. 9. L. A. Hemachandra. The strong exponential hierarchy collapses. J. Comput. Syst. Sci., 39(3):299–322, 1989. 10. D. Hilbert and P. Bernay. Grundlagen der Mathematik II, volume 50 of Die Grundlehren der mathematischen Wissenschaft in Einzeldarstellungen. SpringerVerlag, second edition, 1970. 11. A. Hoene and A. Nickelsen. Counting, selecting, and sorting by query-bounded machines. In Proc. 10th International Symposium on Theoretical Aspects of Comp. Sci., volume 665 of Lecture Notes on Comp. Sci., pages 196–205. Springer-Verlag, 1993. 12. N. Immerman. Nondeterministic space is closed under complementation. SIAM J. Comput., 17(5):935–938, 1988. 13. C. G. Jockusch, Jr. Reducibilities in Recursive Function Theory. PhD thesis, Massachusetts Inst. of Technology, USA, 1966. 14. J. Kadin. PNP[O(log n)] and sparse Turing-complete sets for NP. J. Comput. Syst. Sci., 39(3):282–298, 1989. 15. E. B. Kinber. Frequency computations in finite automata. Cybernetics, 2:179–187, 1976. 16. M. Kummer. A proof of Beigel’s cardinality conjecture. J. Symbolic Logic, 57(2):677–681, 1992. 17. M. Kummer and F. Stephan. Effecitive search problems. Mathematical Logic Quarterly, 40(2):224–236, 1994. 18. S. R. Mahaney. Sparse complete sets for NP: Solution of a conjecture of Berman and Hartmanis. J. Comput. Syst. Sci., 25(2):130–143, 1982. 19. R. McNaughton. Review of [5]. J. Symbolic Logic, 28(1):100–102, 1963. 20. A. Nickelsen. On polynomially D-verbose sets. In Proceedings of the 14th International Symposium on Theoretical Aspects of Computer Science, volume 1200 of Lecture Notes on Comp. Sci., pages 307–318. Springer-Verlag, 1997. 21. J. C. Owings, Jr. A cardinality version of Beigel’s nonspeedup theorem. J. Symbolic Logic, 54(3):761–767, 1989. 22. E. L. Post. Recursively enumerable sets of positive integers and their decision problems. Bulletin of the American Mathematical Society, 50:284–316, 1944. 23. R. Szelepcsényi. The method of forced enumeration for nondeterministic automata. Acta Informatica, 23(3):279–284, 1988. 24. T. Tantau. Comparing verboseness for finite automata and Turing machines. In Proc. 19th International Symposium on Theoretical Aspects of Comp. Sci., volume 2285 of Lecture Notes on Comp. Sci., pages 465–476. Springer-Verlag, 2002. 25. T. Tantau. Towards a cardinality theorem for finite automata. In Proc. 
27th International Symposium on Mathematical Foundations of Comp. Sci., volume 2420 of Lecture Notes on Comp. Sci., pages 625–636. Springer-Verlag, 2002. 26. T. Tantau. On Structural Similarities of Finite Automata and Turing Machine Enumerability Classes. PhD thesis, Technical Univ. Berlin, Germany, 2003. 27. T. Tantau. Weak cardinality theorems for first-order logic. Technical Report TR03-024, Electronic Colloquium on Computational Complexity, www.eccc.unitrier.de/eccc, 2003. Compositionality of Hennessy-Milner Logic through Structural Operational Semantics Wan Fokkink1,2 , Rob van Glabbeek1 , and Paulien de Wind2 1 2 CWI, Department of Software Engineering PO Box 94079, 1090 GB Amsterdam, The Netherlands Vrije Universiteit Amsterdam, Department of Theoretical Computer Science De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands wan@cwi.nl, http://www.cwi.nl/˜wan/ rvg@cs.stanford.edu, http://theory.stanford.edu/˜rvg/ pdwind@cs.vu.nl, http://www.cs.vu.nl/˜pdwind/ Abstract. This paper presents a method for the decomposition of HML formulae. It can be used to decide whether a process algebra term satisfies a HML formula, by checking whether subterms satisfy certain formulae, obtained by decomposing the original formula. The method uses the structural operational semantics of the process algebra. The main contribution of this paper is that an earlier decomposition method from Larsen [14] for the De Simone format is extended to the more general ntyft/ntyxt format without lookahead. 1 Introduction In the past two decades, compositional methods have been developed for checking the validity of assertions in modal logics, used to describe the behaviour of processes. This means that the truth of an assertion for a composition of processes can be deduced from the truth of certain assertions for the components of the composition. Most research papers in this area focus on a particular process algebra. Barringer, Kuiper & Pnueli [3] present (a preliminary version of) a compositional proof system for concurrent programs, which is based on a rich temporal logic, including operators from process logic [10] and LTL [20]. For modelling concurrent programs they define a language including assignment, conditional and while statements. Interaction between parallel components is done via shared variables. In Stirling [22] modal proof systems are developed for subsets of CCS [16] (with and without silent actions) including only sequential and alternative composition, to decide the validity of formulae from Hennessy-Milner Logic (HML) [11]. In Stirling [23,24] the results from [22] are extended, creating proof systems for subsets of CCS and SCCS [18] including asynchronous and synchronous parallelism and infinite behaviour, using ideas from [3]. In Stirling [25] the proposals in [23,24] are generalised to be able to cope with the restriction operator. A. Lingas and B.J. Nilsson (Eds.): FCT 2003, LNCS 2751, pp. 412–422, 2003. c Springer-Verlag Berlin Heidelberg 2003  Compositionality of Hennessy-Milner Logic 413 In Winskel [26] a method is given to decompose formulae with respect to each operation in SCCS. The language of assertions is HML with infinite conjunction and disjunction. This decomposition provides the foundations of Winskel’s proof system for SCCS with modal assertions. In [27], [2] and [1] processes are described by specification languages inspired by CCS and CSP [6]. The articles describe compositional methods for deciding whether processes satisfy assertions from a modal µ-calculus [13]. 
Larsen [14] developed a more general compositional method for deciding whether a process satisfies a certain property. Unlike the aforementioned methods, this method is not oriented towards a particular process algebra, but it is based on structural operational semantics [19], which provides process algebras and specification languages with an interpretation. A transition system specification, consisting of an algebraic signature and a set of transition rules of the form premises/conclusion, generates a transition relation between the closed terms over the signature. An example of a transition rule, for alternative composition, is

      x1 --a--> y
  -------------------
   x1 + x2 --a--> y

meaning for states t1, t2 and u that if state t1 can evolve into state u by the execution of action a, then so can state t1 + t2. Larsen showed how to decompose HML formulae with respect to a transition system specification in the De Simone format [21]. This format was originally put forward to guarantee that the bisimulation equivalence associated with a transition system specification is a congruence, meaning that bisimulation equivalence is preserved by all functions in the signature. Larsen and Xinxin [15] extended this decomposition method to HML with recursion (which is equivalent to the modal µ-calculus). Since modal proof systems for specific process algebras are tailor-made, they may be more concise than the ones generated by the general decomposition method of Larsen (e.g., [23,24,25]). However, in some cases the general decomposition method does produce modal proof systems that are similar in spirit to those in the literature (e.g., [22,26]). In Bloom, Fokkink & van Glabbeek [4] a method is given for decomposing formulae from a fragment of HML with infinite conjunctions, with respect to terms from any process algebra that has a structural operational semantics in ntyft/ntyxt format [9] without lookahead. This format is a generalisation of the De Simone format, and still guarantees that bisimulation equivalence is a congruence. The decomposition method is not presented in its own right, but is used in the derivation of congruence formats for a range of behavioural equivalences from van Glabbeek [8]. In this paper the decomposition method from [4] is extended to full HML with infinite conjunction, again with respect to terms from any process algebra that has a structural operational semantics in ntyft/ntyxt format without lookahead.

2 Preliminaries

In this section we give the basic notions of structural operational semantics and Hennessy-Milner Logic (HML) that are needed to define our decomposition method.

2.1 Structural Operational Semantics

Structural operational semantics [19] provides a framework to give an operational semantics to programming and specification languages. In particular, because of its intuitive appeal and flexibility, structural operational semantics has found considerable application in the study of the semantics of concurrent processes. Let V be an infinite set of variables. A syntactic object is called closed if it does not contain any variables from V.

Definition 1 (signature). A signature is a collection Σ of function symbols f ∉ V, equipped with a function ar : Σ → N. The set T(Σ) of terms over a signature Σ is defined recursively by:

– V ⊆ T(Σ),
– if f ∈ Σ and t1, ..., t_ar(f) ∈ T(Σ), then f(t1, ..., t_ar(f)) ∈ T(Σ).

A term c() is abbreviated as c. For t ∈ T(Σ), var(t) denotes the set of variables that occur in t.
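This term formation has a direct computational reading. In the sketch below (the encoding of terms as strings for variables and as tuples (f, t1, ..., tn) for applications is our choice, not the paper's), var computes the set of variables of a term, so closedness is simply var(t) == set():

```python
def var(t):
    """Set of variables occurring in a term (Definition 1).

    Terms are encoded as strings (variables from V) or as tuples
    (f, t1, ..., tn) for a function symbol f of arity n; a constant c
    is the 1-tuple ("c",).
    """
    if isinstance(t, str):                 # a variable
        return {t}
    _, *args = t                           # drop the function symbol
    return set().union(*(var(s) for s in args))

assert var(("f", "x1", ("c",))) == {"x1"}  # f(x1, c) has the single variable x1
```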
T (Σ) is the set of closed terms over Σ, i.e. the terms t ∈ T(Σ) with var(t) = ∅. A Σ-substitution σ is a partial function from V to T(Σ). If σ is a Σ-substitution and S is any syntactic object, then σ(S) denotes the object obtained from S by replacing, for x in the domain of σ, every occurrence of x in S by σ(x). In that case σ(S) is called a substitution instance of S. A Σ-substitution is closed if it is a total function from V to T (Σ). In the remainder, let Σ denote a signature and A a set of actions, satisfying |Σ| ≤ |V | and |A| ≤ |V |. a Definition 2 (literal). A positive Σ-literal is an expression t −→ t′ and a a with t, t′ ∈ T(Σ) and a ∈ A. For t, t′ ∈ negative Σ-literal an expression t −→  a a ′  are said to deny each other. T(Σ) and a ∈ A, the literals t −→ t and t −→ Definition 3 (transition rule). A transition rule over Σ is an expression of the form H α with H a set of Σ-literals (the premises of the the rule) and α a positive Σ-literal (the conclusion). The left- and right-hand side of α are called the source and the target of the rule, respectively. A rule H α with H = ∅ is also written α. Definition 4 (transition system specification). A transition system specification (TSS) is a pair (Σ, R) with R a collection of transition rules over Σ. Compositionality of Hennessy-Milner Logic 415 Definition 5 (proof ). Let P = (Σ, R) be a TSS. A proof of a transition rule H α from P is a well-founded, upwardly branching tree of which the nodes are labelled by Σ-literals, and some of the leaves are marked “hypothesis”, such that: – the root is labelled by α, – H contains the labels of the hypotheses, and – if β is the label of a node q which is not an hypothesis and K is the set of labels of the nodes directly above q, then K β is a substitution instance of a transition rule in R. If a proof of K α from P exists, then K α is provable from P , notation P ⊢ K α. Definition 6 (transition relation). A transition relation over Σ is a relation a a  for → ⊆ T (Σ) × A × T (Σ). We write p −→ q for (p, a, q) ∈ → and p −→ a ¬∃q ∈ T (Σ) : p −→ q. Thus a transition relation over Σ can be regarded as a set of closed positive Σ-literals (transitions). A TSS with only positive premises specifies a transition relation in a straightforward way as the set of all provable transitions. But it is much less trivial to associate a transition relation to a TSS with negative premises. Several solutions are proposed in Groote [9], Bol & Groote [5] and van Glabbeek [7]. From the latter we adopt the notion of a well-supported proof and a complete TSS. Definition 7 (well-supported proof ). Let P = (Σ, R) be a TSS. A wellsupported proof of a closed literal α from P is a well-founded, upwardly branching tree of which the nodes are labelled by closed Σ-literals, such that: – the root is labelled by α, and – if β is the label of a node q and K is the set of labels of the nodes directly above q, then 1. either K β is a closed substitution instance of a transition rule in R 2. or β is negative and for every set N of negative closed literals such that P ⊢N γ for γ a closed literal denying β, a literal in K denies one in N . We say α is ws-provable from P , notation P ⊢ws α, if a well-supported proof of α from P exists. In [7] it was noted that ⊢ws is consistent, in the sense that no standard TSS admits well-supported proofs of two literals that deny each other. Definition 8 (completeness). A TSS P is complete if for any closed literal a a a either P ⊢ws p −→ p′ for some closed term p′ or P ⊢ws p −→.  
p −→  Now a TSS specifies a transition relation if and only if it is complete. The specified transition relation is then the set of all ws-provable transitions. 416 2.2 W. Fokkink, R. van Glabbeek, and P. de Wind Hennessy-Milner Logic A variety of modal logics have been developed to express properties of transition relations. Modal logic aims to formulate properties of process terms, and to identify terms that satisfy the same properties. Hennessy & Milner [11] have defined a modal language, often called Hennessy-Milner Logic (HML), which characterises the bisimulation equivalence relation on process terms, assuming that each term has only finitely many outgoing transitions. This assumption can be discarded if infinite conjunctions are allowed [17,12]. Definition 9 (Hennessy-Milner Logic). Assume an action set A. The set O of potential observations or modal formulae is recursively defined by  ϕ ::= ϕi | a ϕ | ¬ϕ i∈I with a ∈ A and I some index set. Definition 10 (satisfaction relation). Let P = (Σ, R) be a TSS. The satisfaction relation |=P ⊆ T (Σ) × O is defined as follows, with p ∈ T (Σ):  p |=P ϕi iff p |=P ϕi for all i ∈ I i∈I a p |=P a ϕ iff there is a q ∈ T (Σ) such that P ⊢ws p −→ q and q |=P ϕ p |=P ¬ϕ iff p |=P ϕ  We will use the binary conjunction ϕ1 ∧ ϕ2 as an abbreviation of i∈{1,2} ϕi , whereas ⊤ is an abbreviation for the empty conjunction. We formu  identify ∼ ∼ ( lae that are logically equivalent using the laws ⊤ ∧ ϕ ϕ, = j∈Ji ϕj ) = i∈I  ∼ ∼ ϕ. This is justified because ϕ ψ implies p |= ϕ ⇔ ϕ and ¬¬ϕ = = P i∈I, j∈Ji j p |=P ψ. 3 Decomposing HML Formulae In this section we will see how one can decompose HML formulae with respect to process terms. The TSS defining the transition relation on these terms should be in ready simulation format [4], allowing only ntyft/ntyxt rules [9] without lookahead. Definition 11 (ntyxt,ntyft,nxytt). An ntytt rule is a transition rule in which the right-hand sides of positive premises are variables that are all distinct, and that do not occur in the source. An ntytt rule is an ntyxt rule if its source is a variable, and an ntyft rule if its source contains exactly one function symbol and no multiple occurrences of variables. An ntytt rule is an nxytt rule if the left-hand sides of its premises are variables. Definition 12 (lookahead). A transition rule has no lookahead if the variables occurring in the right-hand sides of its positive premises do not occur in the lefthand sides of its premises. Compositionality of Hennessy-Milner Logic 417 Definition 13 (ready simulation format). A TSS is in ready simulation format if its transition rules are ntyft or ntyxt rules that have no lookahead. Definition 14 (free). A variable occurring in a transition rule is free if it does not occur in the source nor in the right-hand sides of the positive premises of this rule. Definition 15 (decent). A transition rule is decent if it has no lookahead and does not contain free variables. In Bloom, Fokkink & van Glabbeek [4] for any TSS P in ready simulation format the collection of P -ruloids is defined. These are decent nxytt rules for which the following holds: Theorem 1. [4] Let P be a TSS in ready simulation format. Then P ⊢ws a σ(t) −→ p for t a term, p a closed term and σ a closed substitution, iff there are a P -ruloid H and a closed substitution σ ′ with P ⊢ws σ ′ (α) for α ∈ H, a t−→u σ ′ (t) = σ(t) and σ ′ (u) = p. 
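Definition 10 becomes executable once the transition relation is finite and only finite conjunctions are used (the paper's infinite conjunctions have no finite encoding, so this is a genuine restriction). A sketch with an ad-hoc formula encoding of ours:

```python
TOP = ("and", [])                          # the empty conjunction, i.e. ⊤

def sat(p, phi, trans):
    """p |= phi over a finite transition relation trans, given as a set
    of triples (source, action, target), following Definition 10."""
    kind = phi[0]
    if kind == "and":                      # finite conjunction
        return all(sat(p, psi, trans) for psi in phi[1])
    if kind == "dia":                      # <a>psi
        _, a, psi = phi
        return any(sat(q, psi, trans)
                   for (r, b, q) in trans if r == p and b == a)
    if kind == "not":                      # negation
        return not sat(p, phi[1], trans)
    raise ValueError("unknown connective")
```

For instance, with trans = {("p", "a", "q")}, sat("p", ("dia", "a", TOP), trans) holds while sat("q", ("dia", "a", TOP), trans) does not.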
Given a TSS P = (Σ, R) in ready simulation format, the following definition assigns to each term t ∈ T(Σ) and each observation ϕ ∈ O a collection t−1 P (ϕ) of decomposition mappings ψ : V → O. Each of these mappings ψ ∈ t−1 P (ϕ) guarantees, given a closed substitution σ, that σ(t) satisfies ϕ if σ(x) satisfies the formula ψ(x) for all x ∈ var (t). Moreover, whenever for some closed substitution σ the term σ(t) satisfies ϕ, there must be a decomposition mapping ψ ∈ t−1 P (ϕ) with σ(x) satisfying ψ(x) for all x ∈ var (t). This is formalised in Theorem 2 and proven thereafter. Definition 16. Let P = (Σ, R) be a TSS in ready simulation format. Then ·−1 P : T(Σ) → (O → P(V → O)) is defined by: H and a χ ∈ u−1 – ψ ∈ t−1 a P (ϕ) and ψ : V → O P ( a ϕ) iff there is a P -ruloid t−→u is given by     ¬ c ⊤ if x ∈ var (t) b χ(y) ∧ χ(x) ∧ c b ψ(x) = (x−→)∈H  (x−→y)∈H   ⊤ if x ∈ var (t)  – ψ ∈ t−1 P ( i∈I ϕi ) iff  ψi (x) ψ(x) = i∈I t−1 P (ϕi ) where ψi ∈ for i ∈ I. (¬ϕ) iff there is a function h : t−1 – ψ ∈ t−1 P (ϕ) → var (t) and ψ : V → O is P given by  ψ(x) = ¬χ(x) χ∈h−1 (x) 418 W. Fokkink, R. van Glabbeek, and P. de Wind When clear from the context, the subscript P will be omitted. It is not hard to see that if ψ ∈ t−1 P (ϕ) then ψ(x) = ⊤ for all x ∈ var (t). Theorem 2. Let P = (Σ, R) be a complete TSS in ready simulation format. Let ϕ ∈ O. For any term t ∈ T(Σ) and closed substitution σ : V → T (Σ) one has   σ(t) |= ϕ ⇔ ∃ψ ∈ t−1 (ϕ)∀x ∈ var (t) σ(x) |= ψ(x) Proof. With induction on the structure of ϕ. – ϕ = a ϕ′ ⇒ Suppose σ(t) |= a ϕ′ . Then by Definition 10 there is a p ∈ T (Σ) with a P ⊢ws σ(t) −→ p and p |= ϕ′ . Thus, by Theorem 1 there must be a P -ruloid H and a closed substitution σ ′ with P ⊢ws σ ′ (α) for α ∈ H, σ ′ (t) = σ(t), a t−→u ′ i.e. σ (x) = σ(x) for x ∈ var (t), and σ ′ (u) = p. Since σ ′ (u) |= ϕ′ , the induction hypothesis can be applied, and there must be a χ ∈ u−1 (ϕ′ ) such that σ ′ (z) |= χ(z) for all z ∈ var (u). Furthermore σ ′ (z) |= χ(z) = ⊤ for all z ∈ var (u). Now define ψ as indicated in Definition 16. By definition, ψ ∈ b b t−1 ( a ϕ′ ). Let x ∈ var (t). For (x −→ y) ∈ H one has P ⊢ws σ ′ (x) −→ σ ′ (y) c  ∈ H one has and σ ′ (y) |= χ(y), so σ ′ (x) |= b χ(y). Moreover, for (x −→) c c ′ ′  so the consistency of ⊢ws yields P ⊢ws σ (x) −→ q for all P ⊢ws σ (x) −→, q ∈ T (Σ), and thus σ ′ (x) |= ¬ c ⊤. It follows that σ(x) = σ ′ (x) |= ψ(x). ⇐ Now suppose that there is a ψ ∈ t−1 ( a ϕ′ ) such that σ(x) |= ψ(x) for all x ∈ var (t). This means that there is a P -ruloid bj a i {x −→ yi | i ∈ Ix , x ∈ var (t)} ∪ {x −→|  j ∈ Jx , x ∈ var (t)} a t −→ u and a decomposition mapping χ ∈ u−1 (ϕ′ ) such that, for all x ∈ var (t),   ¬ bj ⊤ ai χ(yi ) ∧ σ(x) |= χ(x) ∧ i∈Ix j∈Jx a i By Definition 10 it follows that, for x ∈ var (t) and i ∈ Ix , P ⊢ws σ(x) −→ pi for some pi ∈ T (Σ) with pi |= χ(yi ). Moreover, for x ∈ var (t) and j ∈ Jx , bj P ⊢ws σ(x) −→ q for all q ∈ T (Σ), so by the completeness of P , P ⊢ws bj σ(x) −→.  Let σ ′ be a closed substitution with σ ′ (x) = σ(x) for x ∈ var (t) and σ ′ (yi ) = pi for i ∈ Ix and x ∈ var (t). Here we use that the variables x and yi are all different. Now σ ′ (z) |= χ(z) for z ∈ var (u), using that u contains only variables that occur in t or in the premises of the ruloid. Thus the induction hypothesis can be applied, and σ ′ (u) |= ϕ′ . Moreover, a bj i P ⊢ws σ ′ (x) −→ σ ′ (yi ) for x ∈ var (t) and i ∈ Ix , and P ⊢ws σ ′ (x) −→  for a ′ ′ x ∈ var (t) and j ∈ Jx . 
So, by Theorem 1, P ⊢ws σ (t) −→ σ (u), which implies σ(t) = σ ′ (t) |= a ϕ′ . Compositionality of Hennessy-Milner Logic 419  – ϕ = i∈I  ϕi σ(t) |= i∈I ϕi ⇔ ∀i ∈ I : σ(t) |= ϕi ⇔ ∀i ∈ I ∃ψi∈ t−1 (ϕi ) ∀x ∈ var (t) : σ(x) |= ψi (x) ⇔ ∃ψ ∈ t−1 ( i∈I ϕi ) ∀x ∈ var (t) : σ(x) |= ψ(x). – ϕ = ¬ϕ′ ⇒ Suppose σ(t) |= ¬ϕ′ . Then by Definition 10 we have σ(t) |= ϕ′ . Using the induction hypothesis, there is no χ ∈ t−1 (ϕ′ ) such that σ(x) |= χ(x) for all x ∈ var (t). So for all χ ∈ t−1 (ϕ′ ) there is an x ∈ var (t) such that σ(x) |= ¬χ(x). Let us denote this x as h(χ), so that we obtain a function h : t−1 (ϕ′ ) → var (t) such that σ(h(χ)) |= ¬χ(h(χ)) for all χ ∈ t−1 (ϕ′ ). Define ψ ∈ t−1 (¬ϕ′ ) as indicated in Definition 16, using h. Let x ∈ var (t). If x = h(χ) for some χ ∈ t−1 (ϕ′ ) then σ(x) |= ¬χ(x). Hence, σ(x) |=  χ∈h−1 (x) ¬χ(x) = ψ(x). ⇐ Suppose that there is a ψ ∈ t−1 (¬ϕ′ ) such that σ(x) |= ψ(x) for all −1 ′ x ∈ var (t).  By Definition 16 there is a function h : t (ϕ ) → var (t) such that ψ(x) = χ∈h−1 (x) ¬χ(x) for all x ∈ var (t). So for all x ∈ var (t) and for all χ ∈ h−1 (x) we have that σ(x) |= ¬χ(x). In other words, for all χ ∈ t−1 (ϕ′ ),we  have σ(h(χ)) |= ¬χ(h(χ)). So ¬∃χ ∈ t−1 (ϕ′ )∀x ∈ var (t) σ(x) |= χ(x) . Then using the induction hypothesis, we have σ(t) |= ϕ′ , so σ(t) |= ¬ϕ′ . We give a few examples of the application of Definition 16. Example 1. Let A = {a, b} and let P = (Σ, R) with Σ consisting of the constant c and the binary function symbol f and R is: a a c −→ c a x1 −→ y x2 −→ y b b x1 −→  b f (x1 , x2 ) −→ y f (x1 , x2 ) −→ y This TSS is complete and in ready simulation format. We proceed to compute f (x1 , x2 )−1 ( b ⊤). There are two P -ruloids with a conclusion of the form a b f (x1 , x2 ) −→ , namely x1 −→y b f (x1 ,x2 )−→y a and x2 −→y b x1 −→  b f (x1 ,x2 )−→y . According to Definition −1 16, we have f (x1 , x2 ) ( b ⊤) = {ψ1 , ψ2 } with ψ1 and ψ2 as defined below, using χ ∈ y −1 (⊤) (so χ(x) = ⊤ for all variables x ∈ V ): ψ1 (x1 ) = χ(x1 ) ∧ a χ(y) = ⊤ ∧ a ⊤ = a ⊤ ψ1 (x2 ) = χ(x2 ) = ⊤ ψ1 (x) = ⊤ for x ∈ var (f (x1 , x2 )) ψ2 (x1 ) = χ(x1 ) ∧ ¬ b ⊤ = ⊤ ∧ ¬ b ⊤ = ¬ b ⊤ ψ2 (x2 ) = χ(x2 ) ∧ a χ(y) = ⊤ ∧ a ⊤ = a ⊤ ψ2 (x) = ⊤ for x ∈ var (f (x1 , x2 )) By Theorem 2 a closed term f (u1 , u2 ) can execute a b if and only if the closed term u1 can execute an a, or the closed term u1 can not execute a b and the closed term u2 can execute an a. Looking at the premises, this is what we would expect. 420 W. Fokkink, R. van Glabbeek, and P. de Wind Example 2. Using the TSS and the mappings ψ1 , ψ2 ∈ f (x1 , x2 )−1 ( b ⊤) from Example 1, we can compute f (x1 , x2 )−1 (¬ b ⊤). There are four possible functions h : f (x1 , x2 )−1 ( b ⊤) → var (f (x1 , x2 )), yielding four possible definitions of ψ ∈ f (x1 , x2 )−1 (¬ b ⊤). 1. If h(ψ1 ) = h(ψ2 ) = x1 then ψ(x1 ) = ¬ψ1 (x1 ) ∧ ¬ψ2 (x1 ) = ¬ a ⊤ ∧ ¬¬ b ⊤ = ¬ a ⊤ ∧ b ⊤ ψ(x2 ) = ⊤ 2. If h(ψ1 ) = h(ψ2 ) = x2 then ψ(x1 ) = ⊤ ψ(x2 ) = ¬ψ1 (x2 ) ∧ ¬ψ2 (x2 ) = ¬⊤ ∧ ¬ a ⊤ 3. If h(ψ1 ) = x1 and h(ψ2 ) = x2 then ψ(x1 ) = ¬ψ1 (x1 ) = ¬ a ⊤ ψ(x2 ) = ¬ψ2 (x2 ) = ¬ a ⊤ 4. If h(ψ1 ) = x2 and h(ψ2 ) = x1 then ψ(x1 ) = ¬ψ2 (x1 ) = ¬¬ b ⊤ = b ⊤ ψ(x2 ) = ¬ψ1 (x2 ) = ¬⊤ By Theorem 2 a closed term f (u1 , u2 ) can not execute a b if and only if (1) the closed term u1 can execute a b but not an a, or (3) the closed term u1 can not execute an a and the closed term u2 can not execute an a. Looking at the premises, this is again what we would expect. The other two possibilities (2) and (4) do not qualify, since no term can ever satisfy ¬⊤. 
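The conclusion of Example 1 can be machine-checked on small closed terms. Since the premises of both f-rules only probe the arguments of f, the transition relation of this complete TSS can be computed bottom-up, negative premise included; a sketch (Python; the term encoding is ours):

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def transitions(term):
    """Transitions of a closed term in the TSS of Example 1.
    Closed terms are "c" or ("f", t1, t2); the result is a frozenset
    of (action, target) pairs."""
    if term == "c":
        return frozenset({("a", "c")})                          # c --a--> c
    _, t1, t2 = term
    out = {("b", y) for (a, y) in transitions(t1) if a == "a"}  # first f-rule
    if all(a != "b" for (a, _) in transitions(t1)):             # premise: x1 has no b-move
        out |= {("b", y) for (a, y) in transitions(t2) if a == "a"}  # second f-rule
    return frozenset(out)

def can(t, action):
    return any(a == action for (a, _) in transitions(t))

terms = ["c", ("f", "c", "c"), ("f", ("f", "c", "c"), "c")]
for t1, t2 in product(terms, repeat=2):
    # the decomposition computed in Example 1:
    assert can(("f", t1, t2), "b") == (can(t1, "a") or (not can(t1, "b") and can(t2, "a")))
```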
A little less obvious example is the following: Example 3. Let A = {a, b} and let P = (Σ, R) with Σ consisting of the constant c and the unary function symbol f and R is: a c −→ c a x −→ y b f (x) −→ y b x −→ y a f (x) −→ f (y) This TSS is complete and in ready simulation format. We proceed to compute b f (f (x))−1 ( b a ⊤). The only P -ruloid that has a conclusion f (f (x)) −→ b is x−→y . So for each ψ ∈ f (f (x))−1 ( b a ⊤), ψ(x) = χ(x) ∧ b χ(y) b f (f (x))−→f (y) −1 with χ ∈ f (y) b is y −→z . a f (y)−→f (z) a ( a ⊤). The only P -ruloid that has a conclusion f (y) −→ So χ(y) = χ′ (y) ∧ b χ′ (z) with χ′ ∈ f (z)−1 (⊤). Since χ′ (y) = χ′ (z) = ⊤ we have χ(y) = b ⊤. Moreover x ∈ var (f (y)) implies χ(x) = ⊤. Hence ψ(x) = b b ⊤. By Theorem 2 a closed term f (f (u)) can execute a b followed by an a if and only if the closed term u can execute two consecutive b’s. Compositionality of Hennessy-Milner Logic 421 The following example shows that in Theorem 2 it is essential that the TSS is complete. That is, the theorem would fail if we would take the transition relation induced by a TSS to consist of those transitions for which a well-supported proof exists. Example 4. Let A = {a, b} and let P = (Σ, R) with Σ consisting of the constant c and the unary function symbol f and R is: a x −→  b f (x) −→ c a c −→  a c −→ c This TSS, which is in ready simulation format, is incomplete. For example, a a neither P ⊢ws c −→ t for a closed term t nor P ⊢ws c −→.  Let us assume that the transition relation induced by this TSS consists of those transitions for which a well-supported proof exists. Then there is no atransition for c and no b-transition for f (c), so c |= a ⊤ and f (c) |= b ⊤. a The only P -ruloid is x−→  b f (x)−→c . Hence Theorem 2 would yield f (c) |= b ⊤ ⇔ c |= ¬ a ⊤ ⇔ c |= a ⊤. Since this is false, Theorem 2 would fail with respect to P . References 1. H. R. Andersen, C. Stirling & G. Winskel (1994): A compositional proof system for the modal µ-calculus. In Proceedings, Ninth Annual IEEE Symposium on Logic in Computer Science, IEEE Computer Society Press, Paris, France, pp. 144–153. 2. H. R. Andersen & G. Winskel (1992): Compositional checking of satisfaction. Formal Methods in System Design 1(4), pp. 323–354. 3. H. Barringer, R. Kuiper & A. Pnueli (1984): Now you may compose temporal logic specifications. In ACM Symposium on Theory of Computing (STOC ’84), ACM Press, Baltimore, USA, pp. 51–63. 4. B. Bloom, W. J. Fokkink & R. J. van Glabbeek (2003): Precongruence formats for decorated trace semantics. ACM Transactions on Computational Logic. To appear. 5. R. Bol & J. F. Groote (1996): The meaning of negative premises in transition system specifications. Journal of the ACM 43(5), pp. 863–914. 6. S. D. Brookes, C. A. R. Hoare & A. W. Roscoe (1984): A theory of communicating sequential processes. Journal of the ACM 31(3), pp. 560–599. 7. R. J. van Glabbeek (1996): The meaning of negative premises in transition system specifications II. In F. Meyer auf der Heide & B. Monien, editors: Automata, Languages and Programming, 23rd Colloquium (ICALP ’96), Lecture Notes in Computer Science 1099, Springer-Verlag, Paderborn, Germany, pp. 502–513. 8. R. J. van Glabbeek (2001): The linear time – branching time spectrum I: The semantics of concrete, sequential processes. In J. A. Bergstra, A. Ponse & S. A. Smolka, editors: Handbook of Process Algebra, chapter 1, Elsevier, pp. 3–99. 9. J. F. Groote (1993): Transition system specifications with negative premises. Theoretical Computer Science 118(2), pp. 
263–299. 422 W. Fokkink, R. van Glabbeek, and P. de Wind 10. D. Harel, D. Kozen & R. Parikh (1982): Process logic: Expressiveness, decidability, completeness. Journal of Computer and System Sciences 25(2), pp. 144–170. 11. M. C. B. Hennessy & R. Milner (1985): Algebraic laws for non-determinism and concurrency. Journal of the ACM 32(1), pp. 137–161. 12. M. C. B. Hennessy & C. Stirling (1985): The power of the future perfect in program logics. Information and Control 67(1–3), pp. 23–52. 13. D. Kozen (1983): Results on the propositional µ-calculus. Theoretical Computer Science 27(3), pp. 333–354. 14. K. G. Larsen (1986): Context-Dependent Bisimulation between Processes. PhD thesis, University of Edinburgh, Edinburgh. 15. K. G. Larsen & L. Xinxin (1991): Compositionality through an operational semantics of contexts. Journal of Logic and Computation 1(6), pp. 761–795. 16. R. Milner (1980): A Calculus of Communicating Systems. Springer-Verlag. Volume 92 of Lecture Notes in Computer Science. 17. R. Milner (1981): A modal characterization of observable machine-behaviour. In E. Astesiano & C. Böhm, editors: CAAP ’81: Trees in Algebra and Programming, 6th Colloquium, Lecture Notes in Computer Science 112, Springer-Verlag, Genoa, pp. 25–34. 18. R. Milner (1983): Calculi for synchrony and asynchrony. Theoretical Computer Science 25(3), pp. 267–310. 19. G. D. Plotkin (1981): A structural approach to operational semantics. Technical Report DAIMI FN-19, Computer Science Department, Aarhus University, Aarhus, Denmark. 20. A. Pnueli (1981): The temporal logic of concurrent programs. Theoretical Computer Science 13, pp. 45–60. 21. R. De Simone (1985): Higher-level synchronising devices in Meije–SCCS. Theoretical Computer Science 37(3), pp. 245–267. 22. C. Stirling (1985): A proof-theoretic characterization of observational equivalence. Theoretical Computer Science 39(1), pp. 27–45. 23. C. Stirling (1985): A complete compositional modal proof system for a subset of CCS. In W. Brauer, editor: Automata, Languages and Programming, 12th Colloquium (ICALP ’85), Lecture Notes in Computer Science 194, Springer-Verlag, pp. 475–486. 24. C. Stirling (1985): A complete modal proof system for a subset of SCCS. In H. Ehrig, C. Floyd, M. Nivat & J. W. Thatcher, editors: Mathematical Foundations of Software Development: Proceedings of the Joint Conference on Theory and Practice of Software Development (TAPSOFT), Volume 1: Colloquium on Trees in Algebra and Programming (CAAP ’85), Lecture Notes in Computer Science 185, Springer-Verlag, pp. 253–266. 25. C. Stirling (1987): Modal logics for communicating systems. Theoretical Computer Science 49(2-3), pp. 311–347. 26. G. Winskel (1986): A complete proof system for SCCS with modal assertions. Fundamenta Informaticae IX, pp. 401–420. 27. G. Winskel (1990): On the compositional checking of validity (extended abstract). In J. C. M. Baeten & J. W. Klop, editors: CONCUR ’90: Theories of Concurrency: Unification and Extension, Lecture Notes in Computer Science 458, Springer-Verlag, Amsterdam, The Netherlands, pp. 481–501. On a Logical Approach to Estimating Computational Complexity of Potentially Intractable Problems⋆ Andrzej Szaáas The College of Economics and Computer Science, Olsztyn, Poland and Department of Computer Science, University of Linköping, Sweden andsz@ida.liu.se Abstract. In the paper we present a purely logical approach to estimating computational complexity of potentially intractable problems. 
The approach is based on descriptive complexity and second-order quantifier elimination techniques. We illustrate the approach on the case of the transversal hypergraph problem, TransHyp, which has attracted a great deal of attention. The complexity of the problem has remained unsolved for over twenty years. Given two hypergraphs, G and H, TransHyp amounts to checking whether G = H^d, where H^d is the transversal hypergraph of H. In the paper we provide a logical characterization of minimal transversals of a given hypergraph and prove that checking whether G ⊆ H^d is tractable. For the opposite inclusion the problem still remains open. However, we interpret the resulting quantifier sequences in terms of determinism and bounded nondeterminism. The results give better upper bounds than those known from the literature, e.g., in the case when hypergraph H has a sub-logarithmic number of hyperedges and (for the deterministic case) all hyperedges have cardinality bounded by a function sub-linear wrt the maximum of the sizes of G and H.

Keywords: second-order logic, second-order quantifier elimination, descriptive complexity, transversal hypergraph problem

⋆ Supported in part by the KBN grant 8 T11C 00919.

1 Introduction

In the current paper we propose a rather general methodology for estimating the complexity of potentially intractable problems. Below and throughout the paper we apply well-known results of descriptive complexity theory; for the relevant details see, e.g., [5,12]. The methodology consists of the following steps:

1. Specify the problem in the second-order logic. The complexity of checking validity of second-order formulas in a finite model is PSpace-complete wrt the size of the model. Thus, for all problems in PSpace such a description exists. The existential fragment of the second-order logic, i.e., the fragment consisting of formulas in which all second-order quantifiers are existential and appear only in prefixes of formulas, is NPTime-complete over finite models. Dually, the universal fragment of second-order logic is co-NPTime-complete over finite models.
2. Try to eliminate second-order quantifiers. An application of known methods (for an overview of second-order quantifier elimination techniques see, e.g., [14]), if successful, might result in:
– a formula of the first-order logic, validity of which (over finite models) is in PTime and LogSpace. Here one can apply, e.g., the Ackermann lemma (see Lemma 2.4) or the SCAN algorithm of [10];
– a formula of the fixpoint logic, validity of which (over finite models) is in PTime; recall that fixpoint logic captures all problems solvable in deterministic polynomial time, provided that the underlying domain is linearly ordered. Here one can apply the elimination theorem of [15].
3. If the second-order quantifier elimination is not successful, which is likely to happen for NPTime-, co-NPTime- or PSpace-complete problems, one can try to identify subclasses of the problem for which elimination of second-order quantifiers is guaranteed. In such cases tractable (or quasi-polynomial) subproblems of the main problem can be identified.

Below we apply the methodology to the transversal hypergraph problem and show that inclusion in one direction is in PTime. We also identify some tractable and almost tractable cases for verifying the opposite inclusion, and relate the results to bounded nondeterminism. Let us, however, emphasize that our main goal is to show how logic can help in analyzing the complexity of problems which can be naturally expressed by means of the second-order logic. The hypergraph problem is chosen mainly as a case study.
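Before the formal development, the problem itself is easy to state operationally. The following brute-force reference implementation (Python; exponential in the number of vertices, so purely illustrative and in no way the method of this paper) computes all minimal transversals and decides TransHyp by comparison:

```python
from itertools import combinations

def minimal_transversals(V, edges):
    """All minimal transversals of the hypergraph (V, edges), where
    edges is a collection of sets of vertices.  Brute force: enumerate
    every vertex set, keep the hitting sets, discard non-minimal ones."""
    V = list(V)
    hitting = [frozenset(T)
               for r in range(len(V) + 1)
               for T in combinations(V, r)
               if all(frozenset(T) & e for e in edges)]
    return {T for T in hitting if not any(S < T for S in hitting)}

def trans_hyp(G_edges, H_V, H_edges):
    """Decide whether G equals the transversal hypergraph of H."""
    return set(map(frozenset, G_edges)) == minimal_transversals(H_V, H_edges)

# The hypergraph with hyperedges {1,2} and {2,3} has exactly the minimal
# transversals {2} and {1,3}:
assert minimal_transversals({1, 2, 3}, [{1, 2}, {2, 3}]) == {frozenset({2}), frozenset({1, 3})}
```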
Hypergraph theory [2] has many applications in computer science and artificial intelligence (see, e.g., [3,6,7,11,13]). In particular, the transversal hypergraph problem, TransHyp, has attracted a great deal of attention. Many important problems of databases, knowledge representation, Boolean circuits, duality theory, diagnosis, machine learning, data mining, explanation finding, etc. can be reduced to TransHyp (see, e.g., [7]). However, the precise complexity of this problem has remained open for over twenty years. The best known algorithm, provided in [9], runs in quasi-polynomial time wrt the size of the input hypergraphs. More precisely, if n is the size of the input hypergraphs, then the algorithm of [9] requires n^o(log n) steps. The paper [8] provides a result that relates TransHyp to limited nondeterminism by showing that the complement of the problem can be solved in polynomial time with O(χ(n) · log n) guessed bits, where χ(n)^χ(n) = n. As observed in [9], χ(n) ≈ log n / log log n = o(log n).

2 Preliminaries

Let us first define notions related to the TransHyp problem. We provide definitions slightly adapted for further logical characterization. However, the definitions are equivalent to those considered in the literature.

Definition 2.1. By a hypergraph we mean a triple H = ⟨V, E, M⟩, where
We shall also need the following simple proposition. Proposition 2.5. Let P be a predicate variable and let Φ, Ψ be first–order formulas. Assume that P does not occur in Φ. Then  x̄  ∃P ∀x̄ (P (x̄) ≡ Φ(x̄)) ∧ Ψ (P ) ≡ Ψ P (t̄) := [Φ]t̄ 5 6 7 Called also a dual hypergraph. Under the standard convention stating that implication (Ψ1 → Ψ2 ) is treated as the disjunction (¬Ψ1 ∨ Ψ2 ), and equivalence (Ψ1 ≡ Ψ2 ) is treated as formula [(Ψ1 ∧ Ψ2 ) ∨ (¬Ψ1 ∧ ¬Ψ2 )]. Observe that the conjunction (1)∧(2), substantial for our considerations, is simply the circumscribed formula (1), where T is minimized. 426 3 A. Szaáas Characterization of Minimal Transversals of Hypergraphs Obviously, T is a transversal of hypergraph H = V, E, M  iff ∀e ∈ E∃v ∈ V (T (v) ∧ M (e, v)). It is a minimal transversal iff ∀e ∈ E∃v ∈ V (T (v) ∧ M (e, v))∧ ′ ′ (1) ′ ∀T {[∀e ∈ E∃v ∈ V (T (v) ∧ M (e, v)) ∧ ∀w ∈ V (T (w) → T (w))] → ∀u ∈ V (T (u) → T ′ (u))} (2) Formula (2) is a universal second-order formula. Application of this formula to the verification whether a given transversal is minimal, is thus in co-NPTime. On the other hand, one can eliminate the second-order quantification by applying Lemma 2.4. To do this, we first negate (2): ∃T ′ {[∀e ∈ E∃v ∈ V (T ′ (v) ∧ M (e, v)) ∧ ∀w ∈ V (T ′ (w) → T (w))]∧ ∃u ∈ V (T (u) ∧ ¬T ′ (u))} (3) Formula (3) is equivalent to ∃u ∈ V ∃T ′ [∀w ∈ V (T ′ (w) → T (w))∧ ′ (4) ′ ∀e ∈ E∃v ∈ V (T (v) ∧ M (e, v)) ∧ T (u) ∧ ¬T (u)], i.e., to ∃u ∈ V ∃T ′ [∀w ∈ V (T ′ (w) → T (w))∧ ∀e ∈ E∃v ∈ V (T ′ (v) ∧ M (e, v)) ∧ T (u)∧ ∀w ∈ V (T ′ (w) → w = u)], and finally, to ∃u ∈ V ∃T ′ [∀w ∈ V (T ′ (w) → (T (w) ∧ w = u))∧ (5) ∀e ∈ E∃v ∈ V (T ′ (v) ∧ M (e, v)) ∧ T (u)]. After the application of Lemma 2.4 we obtain the following formula equivalent to (5): ∃u ∈ V [∀e ∈ E∃v ∈ V (T (v) ∧ v = u ∧ M (e, v)) ∧ T (u)]. (6) After negating formula (6) and rearranging the result, we obtain the following first-order formula equivalent to (2): ∀u ∈ V [T (u) → ∃e ∈ E∀v ∈ V ((T (v) ∧ M (e, v)) → v = u)]. (7) Let H = V, E, M  be a hypergraph. In the sequel we use notation M inH (T ), defined by def M inH (T ) ≡ ∀e ∈ E∃v ∈ V (T (v) ∧ M (e, v))∧ ∀u ∈ V [T (u) → ∃e ∈ E∀v ∈ V ((T (v) ∧ M (e, v)) → v = u)]. We now have the following lemma. (8) On a Logical Approach to Estimating Computational Complexity 427 Lemma 3.1. For any hypergraph H = V, E, M , T is a minimal transversal of H iff it satisfies formula M inH (T ). In consequence8 , checking whether a given T is a minimal transversal of a hypergraph is in PTime and LogSpace wrt the size of the hypergraph. 4 4.1 Specification of the TransHyp Problem In Logic Specification of the TransHyp Problem in the Second-Order Logic Let G = V, EG , MG  and H = V, EH , MH  be hypergraphs. In order to check whether G = Hd , we verify inclusions G ⊆ Hd and Hd ⊆ G. The inclusions can be characterized in the second-order logic as follows: d d ∀e ∈ EG ∃ e′ ∈ EH ∀v ∈ V (MG (e, v) ≡ MH (e′ , v)) d d ∀e′ ∈ EH ∃ e ∈ EG ∀v ∈ V (MG (e, v) ≡ MH (e′ , v)). (9) (10) According to Lemma 3.1, formulas (9) and (10) can be expressed as ∀e ∈ EG ∃ T [M inH (T ) ∧ ∀v ∈ V (MG (e, v) ≡ T (v))] ∀T [M inH (T ) → ∃ e ∈ EG ∀v ∈ V (MG (e, v) ≡ T (v))]. (11) (12) The above specification leads to intractable algorithms (unless PTime = NPTime). In the following sections we attempt to reduce the complexity by eliminating second-order quantifiers from formulas (11) and (12). 4.2 The Case of Inclusion G ⊆ Hd Consider the second-order part of formula (11), i.e., ∃ T [M inH (T ) ∧ ∀v ∈ V (MG (e, v) ≡ T (v))]. 
(13) 9 Due to equivalence (8), Lemma 3.1 and Proposition 2.5, formula (13) is equivalent to ∀e′ ∈ EH ∃v ∈ V (MG (e, v) ∧ MH (e′ , v))∧ (14) ∀u ∈ V [MG (e, u) → ∃e′ ∈ EH ∀v ∈ V ((MG (e, v) ∧ MH (e′ , v)) → v = u)]. In consequence, formula (11) is equivalent to ∀e ∈ EG ∀e′ ∈ EH ∃v ∈ V (MG (e, v) ∧ MH (e′ , v))∧ (15) ∀e ∈ EG ∀u ∈ V [MG (e, u) → ∃e′ ∈ EH ∀v ∈ V ((MG (e, v) ∧ MH (e′ , v)) → v = u)]. Thus the inclusion G ⊆ Hd is first-order definable by formula (15). We then have the following corollary. Corollary 4.1. For any hypergraphs G = V, EG , MG  and H = V, EH , MH , checking whether G ⊆ Hd , is in PTime and LogSpace wrt the maximum of sizes of hypergraphs G and H. 8 This easily follows from the equivalence (8) by which M inH (T ) is characterized by a firstorder formula. 9 Note that in order to apply Proposition 2.5, bound variable e is renamed into e′ 428 A. Szaáas 4.3 The Case of Inclusion Hd ⊆ G Unfortunately, no known second-order quantifier elimination method is successful for the inclusion (12). We thus equivalently transform formula (12) to a form where Lemma 2.4 is applicable. The verification of the resulting formula in finite models is, in general, of exponential complexity. However, when some restrictions are assumed, the complexity reduces to the deterministic polynomial or quasi-polynomial time, as shown below. By (8), formula (12) is equivalent to ∀T {[∀e ∈ EH ∃v ∈ V (T (v) ∧ MH (e, v))∧ (16) ∀u ∈ V [T (u) → ∃e ∈ EH ∀v ∈ V ((T (v) ∧ MH (e, v)) → v = u)]] → ∃ e ∈ EG ∀v ∈ V (MG (e, v) ≡ T (v))} Let us assume that the inclusion G ⊆ Hd holds. If not, then the answer to TransHyp for this particular instance is negative. Under this assumption, formula (16) is equivalent to10 ∀T {[∀e ∈ EH ∃v ∈ V (T (v) ∧ MH (e, v))∧ (17) ∀u ∈ V [T (u) → ∃e ∈ EH ∀v ∈ V ((T (v) ∧ MH (e, v)) → v = u)]] → ∃ e ∈ EG ∀v ∈ V (MG (e, v) → T (v))}. In order to apply Lemma 2.4 we first negate (17): ∃T {∀e ∈ EH ∃v ∈ V (T (v) ∧ MH (e, v))∧ (18) ∀u ∈ V [T (u) → ∃e ∈ EH ∀v ∈ V ((T (v) ∧ MH (e, v)) → v = u)]∧ ∀ e ∈ EG ∃v ∈ V (MG (e, v) ∧ ¬T (v))}. In order to simplify calculations, by Γ (T ) we denote the conjunction of formulas given in the last two lines of (18). Formula (18) is then expressed by ∃T {∀e ∈ EH ∃v ∈ V (T (v) ∧ MH (e, v)) ∧ Γ (T )}. (19) Observe that Γ (T ) is negative wrt T . Thus the main obstacle for applying Lemma 2.4 is created by the existential quantifier ∃v ∈ V appearing within the scope of ∀e ∈ EH . def Assume EH = {e1 , . . . , ek }. Denote by Ve = {x : MH (e, x) holds}. Formula (19) can then be expressed by ∃T {∃v1 ∈ Ve1 T (v1 ) ∧ . . . ∧ ∃vk ∈ Vek T (vk ) ∧ Γ (T )}, i.e., by ∃v1 ∈ Ve1 . . . ∃vk ∈ Vek ∃T {T (v1 ) ∧ . . . ∧ T (vk ) ∧ Γ (T )}, which is equivalent to ∃v1 ∈ Ve1 . . . ∃vk ∈ Vek ∃T {∀v ∈ V [(v = v1 ∨ . . . ∨ v = vk ) → T (v)] ∧ Γ (T )}. 10 By minimality of Hd , and the assumption G ⊆ Hd , inclusion expressed by ∀v ∈ V (MG (e, v) → T (v)) is equivalent to the set equality, expressed by ∀v ∈ V (MG (e, v) ≡ T (v)). On a Logical Approach to Estimating Computational Complexity 429 The application of Lemma 2.4 results in the following first-order formula: v ∃v1 ∈ Ve1 . . . ∃vk ∈ Vek {Γ [T (t) := [(v = v1 ∨ . . . ∨ v = vk )]t ] }. In consequence, formula (17) is equivalent to v ∀v1 ∈ Ve1 . . . ∀vk ∈ Vek {¬Γ [T (t) := [(v = v1 ∨ . . . ∨ v = vk )]t ] }, i.e., to ∀v1 ∈ Ve1 . . . ∀vk ∈ Vek ∀u ∈ V {[(u = v1 ∨ . . . ∨ u = vk ) → (20) ∃e ∈ EH ∀v ∈ V [((v = v1 ∨ . . . ∨ v = vk ) ∧ MH (e, v)) → v = u]] → ∃ e ∈ EG ∀v ∈ V (MG (e, v) → (v = v1 ∨ . . . ∨ v = vk ))}. 
4.3 The Case of the Inclusion H^d ⊆ G

Unfortunately, no known second-order quantifier elimination method succeeds for inclusion (12). We therefore transform formula (12) equivalently into a form to which Lemma 2.4 is applicable. Verifying the resulting formula in finite models is, in general, of exponential complexity; however, under certain restrictions the complexity reduces to deterministic polynomial or quasi-polynomial time, as shown below.

By (8), formula (12) is equivalent to

∀T {[∀e ∈ E_H ∃v ∈ V (T(v) ∧ M_H(e, v)) ∧
∀u ∈ V [T(u) → ∃e ∈ E_H ∀v ∈ V ((T(v) ∧ M_H(e, v)) → v = u)]] →
∃e ∈ E_G ∀v ∈ V (M_G(e, v) ≡ T(v))}.  (16)

Let us assume that the inclusion G ⊆ H^d holds; if it does not, the answer to TransHyp for this particular instance is negative. Under this assumption, formula (16) is equivalent to¹⁰

∀T {[∀e ∈ E_H ∃v ∈ V (T(v) ∧ M_H(e, v)) ∧
∀u ∈ V [T(u) → ∃e ∈ E_H ∀v ∈ V ((T(v) ∧ M_H(e, v)) → v = u)]] →
∃e ∈ E_G ∀v ∈ V (M_G(e, v) → T(v))}.  (17)

¹⁰ By the minimality of H^d and the assumption G ⊆ H^d, the inclusion expressed by ∀v ∈ V (M_G(e, v) → T(v)) is equivalent to the set equality expressed by ∀v ∈ V (M_G(e, v) ≡ T(v)).

In order to apply Lemma 2.4 we first negate (17):

∃T {∀e ∈ E_H ∃v ∈ V (T(v) ∧ M_H(e, v)) ∧
∀u ∈ V [T(u) → ∃e ∈ E_H ∀v ∈ V ((T(v) ∧ M_H(e, v)) → v = u)] ∧
∀e ∈ E_G ∃v ∈ V (M_G(e, v) ∧ ¬T(v))}.  (18)

To simplify the calculations, we denote by Γ(T) the conjunction of the formulas in the last two lines of (18). Formula (18) is then expressed by

∃T {∀e ∈ E_H ∃v ∈ V (T(v) ∧ M_H(e, v)) ∧ Γ(T)}.  (19)

Observe that Γ(T) is negative wrt T. Thus the main obstacle to applying Lemma 2.4 is the existential quantifier ∃v ∈ V appearing within the scope of ∀e ∈ E_H.

Assume E_H = {e_1, ..., e_k} and denote V_e =_def {x : M_H(e, x) holds}. Formula (19) can then be expressed by

∃T {∃v_1 ∈ V_{e_1} T(v_1) ∧ ... ∧ ∃v_k ∈ V_{e_k} T(v_k) ∧ Γ(T)},

i.e., by

∃v_1 ∈ V_{e_1} ... ∃v_k ∈ V_{e_k} ∃T {T(v_1) ∧ ... ∧ T(v_k) ∧ Γ(T)},

which is equivalent to

∃v_1 ∈ V_{e_1} ... ∃v_k ∈ V_{e_k} ∃T {∀v ∈ V [(v = v_1 ∨ ... ∨ v = v_k) → T(v)] ∧ Γ(T)}.

The application of Lemma 2.4 results in the following first-order formula:

∃v_1 ∈ V_{e_1} ... ∃v_k ∈ V_{e_k} Γ[T(t) := (t = v_1 ∨ ... ∨ t = v_k)].

In consequence, formula (17) is equivalent to

∀v_1 ∈ V_{e_1} ... ∀v_k ∈ V_{e_k} ¬Γ[T(t) := (t = v_1 ∨ ... ∨ t = v_k)],

i.e., to

∀v_1 ∈ V_{e_1} ... ∀v_k ∈ V_{e_k} ∀u ∈ V {[(u = v_1 ∨ ... ∨ u = v_k) →
∃e ∈ E_H ∀v ∈ V [((v = v_1 ∨ ... ∨ v = v_k) ∧ M_H(e, v)) → v = u]] →
∃e ∈ E_G ∀v ∈ V (M_G(e, v) → (v = v_1 ∨ ... ∨ v = v_k))}.  (20)

The major part of the complexity of checking whether given hypergraphs satisfy formula (20) comes from the sequence of quantifiers ∀v_1 ∈ V_{e_1} ... ∀v_k ∈ V_{e_k} ∀u. We then have the following theorem.

Theorem 4.2. For given hypergraphs G and H such that G ⊆ H^d, the problem of checking whether H^d ⊆ G is solvable in time O(|V_{e_1}| ∗ ... ∗ |V_{e_k}| ∗ p(n)), where p(n) is a polynomial¹¹, n is the maximum of the sizes of G and H, k is the number of hyperedges of H, and, for i = 1, ..., k, |V_{e_i}| denotes the cardinality of the set {x : M_H(e_i, x) holds}.

¹¹ Reflecting the complexity introduced by the quantifiers inside formula (20).

Accordingly, we have the following corollary.

Corollary 4.3. Under the assumptions of Theorem 4.2, if the cardinalities |V_{e_1}|, ..., |V_{e_k}| are bounded by a function f(n), then the problem of checking whether H^d ⊆ G is solvable in time O(f(n)^k ∗ p(n)).

In view of the result of [9], Corollary 4.3 is useful when k is bounded by a (sub-)logarithmic function and f(n) is (sub-)linear wrt n. For instance, if both k and f(n) are bounded by log n, then the corollary gives the upper bound O((log n)^{log n} ∗ p(n)), which is better than the bound offered by the algorithm of [9]. Let us emphasize that in many cases |V|, and consequently f(n), is bounded by log n, since the dual hypergraph might be of size exponential wrt |V|.

The characterization provided by formula (20) is also related to bounded nondeterminism. Namely, consider the complement of the TransHyp problem. The sequence of quantifiers ∀v_1 ∈ V_{e_1} ... ∀v_k ∈ V_{e_k} appearing in formula (20) is then transformed into ∃v_1 ∈ V_{e_1} ... ∃v_k ∈ V_{e_k}. In order to verify the negated formula it suffices to guess k sequences of bits, each of length not greater than log max_{i=1,...,k} |V_{e_i}|; thus, in the worst case, it suffices to guess k ∗ log |V| bits. By the result of [8], mentioned in Section 1, O(log² n) guessed bits suffice to then solve the TransHyp problem in deterministic polynomial time. The observation just made is therefore useful, e.g., when the input hypergraph H has a number of hyperedges (sub-)logarithmic wrt n. Observe, however, that n is often exponentially larger than |V|.¹²

¹² This frequently happens in duality theory, where the number of prime implicants and implicates is exponential wrt the size of the input formula.
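To make the procedure behind Theorem 4.2 concrete, here is a brute-force Python sketch of the test (20) (our illustration, under the same hypothetical set-based representation, again reusing is_minimal_transversal); it presupposes that G ⊆ H^d has already been verified, e.g. with g_included_in_dual above.

```python
from itertools import product

def dual_included_in_g(G_edges, H_edges):
    """Exhaustive test of formula (20), assuming G ⊆ H^d already holds.

    Every choice (v_1, ..., v_k), with v_i ranging over the i-th
    hyperedge of H, yields a candidate set S that hits all hyperedges
    of H by construction.  H^d ⊆ G fails iff some such S is a minimal
    transversal of H that contains no hyperedge of G.  The loop runs
    |V_{e_1}| * ... * |V_{e_k}| times, matching Theorem 4.2.
    """
    H_edges = [set(e) for e in H_edges]
    for choice in product(*H_edges):
        S = set(choice)
        if (is_minimal_transversal(H_edges, S)
                and not any(set(e) <= S for e in G_edges)):
            return False  # S is a minimal transversal of H missing from G
    return True
```

A complete TransHyp test is then the conjunction g_included_in_dual(G, H) and dual_included_in_g(G, H).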
5 Conclusions

In this paper we presented a purely logical approach to estimating the computational complexity of potentially intractable problems, and we illustrated the approach on the complexity of the TransHyp problem. We provided a logical characterization of the minimal transversals of a given hypergraph and proved that checking the inclusion G ⊆ H^d is tractable. For the opposite inclusion the problem remains open; however, we interpreted the resulting quantifier sequences in terms of determinism and bounded nondeterminism. The results give better upper bounds than those known from the literature in the case where the hypergraph H has a sub-logarithmic number of hyperedges and (for the deterministic case) all hyperedges have cardinality bounded by a function sub-linear wrt the maximum of the sizes of the input hypergraphs.

Let us also emphasize that only the simplest second-order quantifier elimination techniques were applied here. In some cases it might be useful to apply the theorem of [15], which results in a fixpoint formula, i.e., a much stronger but still tractable formalism.

References

1. W. Ackermann. Untersuchungen über das Eliminationsproblem der mathematischen Logik. Mathematische Annalen, 110:390–413, 1935.
2. C. Berge. Hypergraphs, volume 45 of North-Holland Mathematical Library. Elsevier, 1989.
3. E. Boros, V. Gurvich, L. Khachiyan, and K. Makino. Generating partial and multiple transversals of a hypergraph. In Automata, Languages and Programming, volume 1853 of Lecture Notes in Computer Science, pages 588–599. Springer, 2000.
4. P. Doherty, W. Łukaszewicz, and A. Szałas. Computing circumscription revisited. Journal of Automated Reasoning, 18(3):297–336, 1997.
5. H.-D. Ebbinghaus and J. Flum. Finite Model Theory. Springer-Verlag, Heidelberg, 1995.
6. T. Eiter and G. Gottlob. Identifying the minimal transversals of a hypergraph and related problems. SIAM Journal on Computing, 24(6):1278–1304, 1995.
7. T. Eiter and G. Gottlob. Hypergraph transversal computation and related problems in logic and AI. In S. Flesca, S. Greco, N. Leone, and G. Ianni, editors, Proceedings of the 8th Conference JELIA 2002, volume 2424 of Lecture Notes in Artificial Intelligence, pages 549–564. Springer-Verlag, 2002.
8. T. Eiter, G. Gottlob, and K. Makino. New results on monotone dualization and generating hypergraph transversals. In ACM STOC 2002, pages 14–22, 2002.
9. M.L. Fredman and L. Khachiyan. On the complexity of dualization of monotone disjunctive normal forms. Journal of Algorithms, 21:618–628, 1996.
10. D.M. Gabbay and H.J. Ohlbach. Quantifier elimination in second-order predicate logic. In B. Nebel, C. Rich, and W. Swartout, editors, Principles of Knowledge Representation and Reasoning, KR 92, pages 425–435. Morgan Kaufmann, 1992.
11. G. Gogic, C.H. Papadimitriou, and M. Sideri. Incremental recompilation of knowledge. Journal of Artificial Intelligence Research, 8:23–37, 1998.
12. N. Immerman. Descriptive Complexity. Springer-Verlag, New York, Berlin, 1998.
13. D.J. Kavvadias and E.C. Stavropoulos. Evaluation of an algorithm for the transversal hypergraph problem. In J. Scott Vitter and C.D. Zaroliagis, editors, Algorithm Engineering, 3rd International Workshop, WAE '99, volume 1668 of Lecture Notes in Computer Science, pages 72–84. Springer, 1999.
14. A. Nonnengart, H.J. Ohlbach, and A. Szałas. Elimination of predicate quantifiers. In H.J. Ohlbach and U. Reyle, editors, Logic, Language and Reasoning. Essays in Honor of Dov Gabbay, Part I, pages 159–181. Kluwer, 1999.
15. A. Nonnengart and A. Szałas. A fixpoint approach to second-order quantifier elimination with applications to correspondence theory. In E. Orłowska, editor, Logic at Work: Essays Dedicated to the Memory of Helena Rasiowa, volume 24 of Studies in Fuzziness and Soft Computing, pages 307–328. Springer Physica-Verlag, 1998.
16. A. Szałas. On the correspondence between modal and classical logic: An automated approach. Journal of Logic and Computation, 3:605–620, 1993.
Author Index

Ablayev, Farid · Aleksandrov, Lyudmil · Angel, Eric · Antunes, Luís · Arora, Sanjeev · Arpe, Jan · Asano, Takao · Bampis, Evripidis · Berstel, Jean · Boasson, Luc · Bodlaender, Hans · Brandstädt, Andreas · Bugliesi, Michele · Carton, Olivier · Ceccato, Ambra · Chlebík, Miroslav · Chlebíková, Janka · Cieliebak, Mark · Coja-Oghlan, Amin · Damaschke, Peter · Damgård, Ivan Bjerre · Eidenbenz, Stephan · Evans, Patricia A. · Fokkink, Wan · Fomin, Fedor V. · Fortnow, Lance · Frandsen, Gudmund Skovbjerg · Gainutdinova, Aida · Glabbeek, Rob van · Goerdt, Andreas · Gourvès, Laurent · Gramm, Jens · Gudmundsson, Joachim · Guo, Jiong · Halava, Vesa · Hammar, Mikael · Hansen, Kristoffer Arnsfelt · Harju, Tero · Heggernes, Pinar · Hoogeboom, Hendrik Jan · Jakoby, Andreas · Kik, Marcin · Kratsch, Dieter · Kuich, Werner · Kutrib, Martin · Lanka, André · Latteux, Michel · Liśkiewicz, Maciej · Lipton, Richard J. · Maheshwari, Anil · Mastrolilli, Monaldo · Miltersen, Peter Bro · Moser, Philippe · Niedermeier, Rolf · Nilsson, Bengt J. · Pagourtzis, Aris · Papadimitriou, Christos · Păun, Gheorghe · Pech, Christian · Persson, Mia · Petazzoni, Bruno · Pin, Jean-Éric · Rao, Michaël · Reif, John H. · Rossi, Sabina · Sack, Jörg-Rüdiger · Schädlich, Frank · Smith, Andrew D. · Spinrad, Jeremy · Stachowiak, Grzegorz · Sun, Zheng · Szałas, Andrzej · Tantau, Till · Telle, Jan Arne · Viglas, Anastasios · Vinay, V. · Vinodchandran, N.V. · Wagner, Klaus W. · Wind, Paulien de · Zhu, Binhai