Frontiers in Mathematics
Jens Flemming
Variational Source Conditions, Quadratic Inverse Problems, Sparsity Promoting Regularization
Frontiers in Mathematics
Advisory Editorial Board Leonid Bunimovich (Georgia Institute of Technology, Atlanta) William Y. C. Chen (Nankai University, Tianjin, China) Benoît Perthame (Université Pierre et Marie Curie, Paris) Laurent Saloff-Coste (Cornell University, Ithaca) Igor Shparlinski (Macquarie University, New South Wales) Wolfgang Sprößig (TU Bergakademie Freiberg) Cédric Villani (Institut Henri Poincaré, Paris)
More information about this series at http://www.springer.com/series/5388
Jens Flemming
Variational Source Conditions, Quadratic Inverse Problems, Sparsity Promoting Regularization: New Results in Modern Theory of Inverse Problems and an Application in Laser Optics
Jens Flemming, Fakultät für Mathematik, Technische Universität Chemnitz, Chemnitz, Germany
ISSN 1660-8046 ISSN 1660-8054 (electronic) Frontiers in Mathematics ISBN 978-3-319-95263-5 ISBN 978-3-319-95264-2 (eBook) https://doi.org/10.1007/978-3-319-95264-2 Library of Congress Control Number: 2018950663 Mathematics Subject Classification (2010): 65J22, 65J20, 47A52, 47J06 © Springer Nature Switzerland AG 2018 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This book is published under the imprint Birkhäuser, www.birkhauser-science.com by the registered company Springer Nature Switzerland AG part of Springer Nature. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
This book is dedicated to Bernd Hofmann on the occasion of his retirement.
Preface
This book grew out of the author's habilitation thesis, which was completed in January 2018. Parts II and III cover and slightly extend the material of the thesis. Part I, on the one hand, provides an introduction to the other parts and, on the other hand, contains new results on variational source conditions in the context of convergence rates theory for ill-posed inverse problems.

The intention of writing this book was to demonstrate new and to some extent unorthodox ideas for handling ill-posed inverse problems. This book is not a comprehensive introduction to inverse problems. Instead, it focuses on a few research topics and handles them in depth. The three topics of the book, variational source conditions, quadratic inverse problems, and ℓ1-regularization, seem to be quite different. The first one is of great generality and establishes the basis for several more concrete results in the book. The second one is concerned with nonlinear mappings in a classical Hilbert space setting, whereas the third deals with linear mappings in non-reflexive Banach spaces. At second sight, quadratic inverse problems and linear inverse problems in a sparsity context have similar structures and their handling shows several parallels. Nevertheless, I decided to divide the book into three more or less independent parts and to give hints on cross connections from time to time. The advantage of this decision is that the reader may study the three parts in arbitrary order.

Finishing this book would not have been possible without constant support and advice by Prof. Bernd Hofmann (TU Chemnitz). I thank him a lot for his efforts in several regards during all the years I have been working in his research group. I also want to thank my colleagues and coauthors, especially Steven Bürger and Daniel Gerth, for interesting and fruitful discussions. Last but not least I have to express my thanks to the Faculty of Mathematics at TU Chemnitz as a whole for the cordial and cooperative working atmosphere.

Chemnitz, Germany
May 2018
Jens Flemming
Contents

Part I  Variational Source Conditions

1  Inverse Problems, Ill-Posedness, Regularization
   1.1  Setting
   1.2  Ill-Posedness
        1.2.1  Global Definitions by Hadamard and Nashed
        1.2.2  Local Definitions by Hofmann and Ivanov
        1.2.3  Interrelations
        1.2.4  Nashed's Definition in Case of Uncomplemented Null Spaces
   1.3  Tikhonov Regularization

2  Variational Source Conditions Yield Convergence Rates
   2.1  Evolution of Variational Source Conditions
   2.2  Convergence Rates

3  Existence of Variational Source Conditions
   3.1  Main Theorem
   3.2  Special Cases
        3.2.1  Linear Equations in Hilbert Spaces
        3.2.2  Bregman Distance in Banach Spaces
        3.2.3  Vanishing Error Functional

Part II  Quadratic Inverse Problems

4  What Are Quadratic Inverse Problems?
   4.1  Definition and Basic Properties
   4.2  Examples
        4.2.1  Autoconvolutions
        4.2.2  Kernel-Based Autoconvolution in Laser Optics
        4.2.3  Schlieren Tomography
   4.3  Local Versus Global Ill-Posedness
   4.4  Geometric Properties of Quadratic Mappings' Ranges
   4.5  Literature on Quadratic Mappings

5  Tikhonov Regularization

6  Regularization by Decomposition
   6.1  Quadratic Isometries
   6.2  Decomposition of Quadratic Mappings
   6.3  Inversion of Quadratic Isometries
   6.4  A Regularization Method
   6.5  Numerical Example

7  Variational Source Conditions
   7.1  About Variational Source Conditions
   7.2  Nonlinearity Conditions
   7.3  Classical Source Conditions
   7.4  Variational Source Conditions Are the Right Tool
   7.5  Sparsity Yields Variational Source Conditions

Part III  Sparsity Promoting Regularization

8  Aren't All Questions Answered?

9  Sparsity and ℓ1-Regularization
   9.1  Sparse Signals
   9.2  ℓ1-Regularization
   9.3  Other Sparsity Promoting Regularization Methods
   9.4  Examples
        9.4.1  Denoising
        9.4.2  Bidiagonal Operator
        9.4.3  Simple Integration and Haar Wavelets
        9.4.4  Simple Integration and Fourier Basis

10  Ill-Posedness in the ℓ1-Setting

11  Convergence Rates
    11.1  Results in the Literature
    11.2  Classical Techniques Do Not Work
    11.3  Smooth Bases
    11.4  Non-smooth Bases
    11.5  Convergence Rates Without Source-Type Assumptions
    11.6  Convergence Rates Without Injectivity-Type Assumptions
          11.6.1  Distance to Norm Minimizing Solutions
          11.6.2  Sparse Solutions
          11.6.3  Sparse Unique Norm Minimizing Solution
          11.6.4  Non-sparse Solutions
          11.6.5  Examples

A  Topology, Functional Analysis, Convex Analysis
   A.1  Topological Spaces and Nets
   A.2  Reflexivity, Weak and Weak* Topologies
   A.3  Subdifferentials and Bregman Distances

B  Verification of Assumption 11.13 for Example 11.18

References
Index
Part I Variational Source Conditions
1 Inverse Problems, Ill-Posedness, Regularization
Abstract
We introduce the mathematical setting as well as basic notation used throughout the book. Different notions of ill-posedness in the context of inverse problems are discussed and the need for regularization leads us to Tikhonov-type methods and their behavior in Banach spaces.
1.1 Setting
Let X and Y be Banach spaces over R or C and let F : X ⊇ D(F) → Y be a mapping between them with domain D(F). We aim to solve equations

F(x) = y†,  x ∈ D(F),   (1.1)
with exact and attainable data y† in Y. Solving such equations requires, in some sense, inversion of F; hence the term inverse problem. The mathematical field of inverse problems is not concerned with Eq. (1.1) in general but only with equations that are ill-posed. Loosely speaking, an equation is ill-posed if the inversion process is very sensitive to perturbations in the right-hand side y†. Such perturbations cannot be avoided in practice because y† represents some measured quantity and measurements are always corrupted by noise. We provide and discuss different precise definitions of ill-posedness in the next section.
To analyze and overcome ill-posedness, noise has to be taken into account. In other words, the exact right-hand side y† is not available for the inversion process. Instead, we only have some noisy measurement yδ at hand, which is assumed to belong to Y, too, and to satisfy

‖yδ − y†‖ ≤ δ   (1.2)

with nonnegative noise level δ. For later reference we list the following restrictions on our setting.

Assumption 1.1 We assume that
(i) equation (1.1) has a solution,
(ii) the domain D(F) is weakly sequentially closed,
(iii) the mapping F is weakly sequentially continuous.

Items (ii) and (iii) are satisfied if and only if for each sequence (xn)n∈N in D(F) and each x in X we have

xn ⇀ x  ⇒  x ∈ D(F),  F(xn) ⇀ F(x).
1.2 Ill-Posedness

1.2.1 Global Definitions by Hadamard and Nashed
The classical definition of ill-posedness was introduced by Hadamard.

Definition 1.2 The mapping F in Eq. (1.1) is well-posed in the sense of Hadamard if
(i) for each y† in Y there exists a solution,
(ii) for each fixed right-hand side y† there is at most one solution,
(iii) solutions depend continuously on the data.
Else F is ill-posed in the sense of Hadamard.

Items (i) and (ii) of the definition require that F is bijective and item (iii) says that the inverse mapping has to be continuous with respect to the norm or some other topology. Due to its restrictive nature Hadamard's definition only plays a minor role in modern theory of
inverse problems. Existence of solutions is usually formulated as an assumption, cf. item (i) in Assumption 1.1, and uniqueness is not required because the developed theory will cover the case of multiple solutions.

In [1] Nashed proposed a definition of ill-posedness for bounded linear mappings F between Banach spaces X and Y with domain D(F) = X.

Definition 1.3 Let F in Eq. (1.1) be linear and bounded. Then F is well-posed in the sense of Nashed if the range of F is closed in Y and ill-posed in the sense of Nashed if the range of F is not closed in Y.

Nashed's definition does not consider existence and uniqueness of solutions, but focuses on continuous (generalized) invertibility. If a generalized inverse exists, then it is continuous if and only if F is well-posed in the sense of Nashed, see [2, Theorem 5.6(b)]. But one should be aware of the fact that in general Banach spaces generalized inverses are not always available, because the null space of F or the closure of the range may be uncomplemented, see Proposition 1.10 and Sect. 1.2.4 below. An important example for this situation is the setting used for analyzing ℓ1-regularization in Part III. If F is injective, then the inverse F⁻¹ : Y ⊇ R(F) → X is continuous on R(F) if and only if R(F) is closed. If X and Y are Hilbert spaces, then the Moore–Penrose inverse is a generalized inverse which always exists. Thus, in Hilbert spaces well-posedness in the sense of Nashed is equivalent to continuity of the Moore–Penrose inverse.

Nashed distinguished two types of ill-posedness in [1]. In Chap. 10 we have a closer look at this distinction in the context of ℓ1-regularization.
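The practical effect behind Nashed's definition can be made tangible numerically. The following sketch is not part of the book; it assumes NumPy and uses a standard discretization of the integration operator, whose continuous counterpart is compact with non-closed range and hence ill-posed in the sense of Nashed. Naive inversion amplifies even tiny data noise:

```python
import numpy as np

# Discretized integration operator on [0, 1]: (A x)(t) = integral_0^t x(s) ds.
# Its continuous counterpart is compact with non-closed range and therefore
# ill-posed in the sense of Nashed.
n = 200
h = 1.0 / n
A = h * np.tril(np.ones((n, n)))

t = np.linspace(h, 1.0, n)
x_true = np.sin(2 * np.pi * t)              # exact solution
y_exact = A @ x_true                        # exact data

rng = np.random.default_rng(0)
delta = 1e-3
y_noisy = y_exact + delta * rng.standard_normal(n)   # noise of level about delta

x_naive = np.linalg.solve(A, y_noisy)       # naive inversion amplifies the noise

print("relative data error    :", np.linalg.norm(y_noisy - y_exact) / np.linalg.norm(y_exact))
print("relative solution error:", np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true))
```

The data error stays in the range of the noise level, while the reconstructed solution is dominated by amplified noise; this is exactly the instability that regularization methods have to counteract.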
1.2.2 Local Definitions by Hofmann and Ivanov
Hadamard's and Nashed's definitions of ill-posedness are of global nature. For nonlinear mappings F properties may vary from point to point and ill-posedness has to be understood in a local manner. Following the ideas in [3] we have to distinguish between local ill-posedness at a point x in X and local ill-posedness at a point y in Y.

The aim of defining precisely what is meant by ill-posedness is to describe the following situation mathematically: Given a sequence (yn)n∈N in R(F) approximating the unknown exact data y† in (1.1), a sequence (xn)n∈N of corresponding solutions to F(x) = yn, x ∈ D(F), does not converge to a solution of (1.1). The difficulties are to choose concrete types of approximation and convergence and to handle the case of multiple solutions.

One possibility for defining ill-posedness locally at a point of the domain D(F) has been suggested in [4] by Hofmann, see also [5].
Definition 1.4 The mapping F is locally well-posed in the sense of Hofmann at a point x0 in D(F) if there is some positive ε such that for each sequence (xn)n∈N in Bε(x0) ∩ D(F) the implication

F(xn) → F(x0)  ⇒  xn → x0

is true. Otherwise, F is locally ill-posed in the sense of Hofmann at the point x0.

Local well-posedness in the sense of Hofmann implies that x0 has to be an isolated solution to F(x) = F(x0), x ∈ D(F). In this sense, local uniqueness is part of this type of local well-posedness.

Ivanov introduced a similar concept in [6], but locally in Y. Thus, he gets around the question of uniqueness. See also [3, Definition 1].

Definition 1.5 The mapping F is locally well-posed in the sense of Ivanov at a point y0 in R(F) if for each sequence (yn)n∈N in R(F) the implication

yn → y0  ⇒  sup_{x̃ ∈ F⁻¹(yn)} inf_{x ∈ F⁻¹(y0)} ‖x̃ − x‖ → 0

is true. Otherwise, F is locally ill-posed in the sense of Ivanov at the point y0.

The set-to-set distance

sup_{x̃ ∈ M̃} inf_{x ∈ M} ‖x̃ − x‖

between two subsets M̃ and M of X used in Definition 1.5 is not symmetric. It expresses the maximum distance of elements in M̃ to the set M. Since we cannot control which of possibly many approximate solutions is chosen by an inversion method, this type of distance is the right choice. The only drawback of Definition 1.5 is that norm convergence cannot easily be replaced by other types of convergence, for example to define ill-posedness with respect to the weak topology. The following proposition provides an equivalent reformulation which avoids explicit use of norms. The proposition was already mentioned briefly in [3, Remark 1].

Proposition 1.6 The mapping F is locally well-posed in the sense of Ivanov at a point y0 in R(F) if and only if for each sequence (yn)n∈N in R(F) converging to y0 and for each sequence (x̃n)n∈N of preimages x̃n from F⁻¹(yn) there exists a sequence (xn)n∈N in F⁻¹(y0) with ‖x̃n − xn‖ → 0.
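Before turning to the proof, the asymmetry of the set-to-set distance can be checked on a toy example. The following sketch is not part of the book; it assumes NumPy and uses finite point sets in place of the solution sets:

```python
import numpy as np

def set_to_set_distance(M_tilde, M):
    """sup over x~ in M_tilde of inf over x in M of ||x~ - x||, for finite sets."""
    return max(min(np.linalg.norm(x_t - x) for x in M) for x_t in M_tilde)

M_tilde = [np.array([0.0, 0.0]), np.array([5.0, 0.0])]
M = [np.array([0.0, 0.0])]

print(set_to_set_distance(M_tilde, M))   # 5.0: some element of M_tilde is far from M
print(set_to_set_distance(M, M_tilde))   # 0.0: every element of M is close to M_tilde
```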
Proof Let F be locally well-posed in the sense of Ivanov at the point y0 and let (yn)n∈N be a sequence in R(F) converging to y0. Given a sequence (x̃n)n∈N with x̃n ∈ F⁻¹(yn) we immediately see

inf_{x ∈ F⁻¹(y0)} ‖x̃n − x‖ → 0.

Fixing ε, for each n we find xn in F⁻¹(y0) with

‖x̃n − xn‖ ≤ inf_{x ∈ F⁻¹(y0)} ‖x̃n − x‖ + ε.

Thus, we obtain ‖x̃n − xn‖ ≤ 2ε for all sufficiently large n, which implies convergence ‖x̃n − xn‖ → 0.

Now let y0 be in R(F) and let (yn)n∈N be a sequence in R(F) converging to y0. Further, assume that for each sequence (x̃n)n∈N of preimages x̃n from F⁻¹(yn) there exists a sequence (xn)n∈N in F⁻¹(y0) with ‖x̃n − xn‖ → 0. If there were some positive fixed ε with

sup_{x̃ ∈ F⁻¹(yn)} inf_{x ∈ F⁻¹(y0)} ‖x̃ − x‖ > ε

for infinitely many n, then (after passing to a corresponding subsequence) we would find a sequence (x̃n)n∈N of preimages with

inf_{x ∈ F⁻¹(y0)} ‖x̃n − x‖ > ε

for all n. Thus, there would be a sequence (xn)n∈N in F⁻¹(y0) with

ε < inf_{x ∈ F⁻¹(y0)} ‖x̃n − x‖ ≤ ‖x̃n − xn‖ → 0,

which contradicts ε > 0. This shows

sup_{x̃ ∈ F⁻¹(yn)} inf_{x ∈ F⁻¹(y0)} ‖x̃ − x‖ → 0.
Remark 1.7 From Proposition 1.6 we easily see that the following condition is sufficient for local well-posedness in the sense of Ivanov at y0 : Each sequence (xn )n∈N in D(F ) with F (xn ) → y0 contains a convergent subsequence and the limits of all convergent subsequences are solutions corresponding to the right-hand side y0 .
Throughout this book ill-posedness is to be understood in the sense of Ivanov if not otherwise stated.
1.2.3 Interrelations
The definitions of Hofmann and Ivanov are closely connected, but differ in two aspects. On the one hand, Hofmann's definition works in X and Ivanov's definition works in Y. On the other hand, and also as a consequence of the first difference, in Hofmann's definition well-posedness is restricted to isolated solutions, whereas Ivanov's definition works for arbitrary solution sets.

Both views have their advantages. Hofmann's definition allows for a deeper analysis of ill-posedness phenomena. Due to its locality in X we can distinguish between well-posedness and ill-posedness at each element of a set of isolated solutions. That is, for one fixed data element there might exist, at the same time, solutions at which the mapping is well-posed and solutions at which the mapping is ill-posed in the sense of Hofmann. Analyzing an inverse problem with Hofmann's definition thus makes it possible to identify regions of well-posedness and regions of ill-posedness, and restricting the domain of the mapping F with the help of Hofmann's definition could make the inverse problem well-posed. Ivanov's definition does not allow for such a detailed analysis. But its advantage is that it is closer to the issue of numerical instability. Given a data element, we want to know whether a sequence of approximate solutions based on noisy data becomes arbitrarily close to the set of exact solutions if the noise is reduced until it vanishes. This is exactly what Ivanov's definition expresses.

The interrelations between Hofmann's definition and Ivanov's definition are made precise by the following two propositions. The first proposition is a slightly extended version of [3, Proposition 2] and the second stems from oral communication with Bernd Hofmann (Chemnitz).

Proposition 1.8 If the mapping F is locally well-posed in the sense of Ivanov at some point y0 in R(F), then F is locally well-posed in the sense of Hofmann at each isolated solution corresponding to the data y0.

Proof Let F be locally well-posed in the sense of Ivanov at y0 and let x0 be an isolated solution to data y0. Take a positive radius ε such that x0 is the only solution to data y0 in B2ε(x0). For each sequence (x̃n)n∈N in Bε(x0) ∩ D(F) and for the corresponding sequence (yn)n∈N with yn := F(x̃n), Proposition 1.6 yields a sequence (xn)n∈N in D(F) with F(xn) = y0 and ‖x̃n − xn‖ → 0. Since (x̃n)n∈N lies in Bε(x0) and x0 is the only solution in B2ε(x0), we obtain xn = x0 for all sufficiently large n. Consequently, x̃n → x0, which proves local well-posedness in the sense of Hofmann at x0.
Proposition 1.9 There exist mappings F and points x0 in D(F) such that F is locally well-posed in the sense of Hofmann at x0 but locally ill-posed in the sense of Ivanov at F(x0).

Proof Choose X := R, Y := R and F(x) := x²/(1 + x⁴) with D(F) = X. Then x0 := 0 is the only solution to F(x) = 0, x ∈ X, and continuous invertibility of F near zero immediately implies local well-posedness in the sense of Hofmann. On the other hand, we may consider a sequence (yn)n∈N with elements yn := F(xn) such that xn → ∞. Then yn → 0, but

sup_{x̃ ∈ F⁻¹(yn)} inf_{x ∈ F⁻¹(0)} ‖x̃ − x‖ ≥ inf_{x ∈ F⁻¹(0)} ‖xn − x‖ = |xn| → ∞.
Thus, F is locally ill-posed in the sense of Ivanov at F (0).
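The behavior exploited in this proof is easy to observe numerically. The following sketch is not part of the book; it simply evaluates the example mapping F(x) = x²/(1 + x⁴) at large arguments:

```python
F = lambda x: x**2 / (1 + x**4)

# The solution set of F(x) = 0 is {0}.  Data values coming from large arguments
# tend to zero, although their preimages stay far away from the solution set:
for xn in (10.0, 100.0, 1000.0):
    yn = F(xn)
    print(f"x_n = {xn:7.1f}   y_n = F(x_n) = {yn:.2e}   dist(x_n, F^(-1)(0)) = {abs(xn):.1f}")
```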
Finally, we state the interrelation between Nashed's definition and Ivanov's definition. The special case of Hilbert spaces, where each closed subspace is complemented, can be found in [3, Proposition 1].

Proposition 1.10 Let F be a bounded linear operator with domain D(F) = X between the Banach spaces X and Y and let the null space N(F) be complemented in X. Then F is well-posed in the sense of Nashed if and only if F is locally well-posed in the sense of Ivanov at every point of R(F), and F is ill-posed in the sense of Nashed if and only if F is locally ill-posed in the sense of Ivanov at every point of R(F).

Proof Let N(F) be complemented by U, that is, U is a closed linear subspace of X and X = N(F) ⊕ U. One easily shows that the restriction F|U of F to U is bijective between U and R(F). Thus, the inverse (F|U)⁻¹ is a well-defined linear operator, which due to R(F) = R(F|U) is bounded if and only if R(F) is closed. We see that F is well-posed in the sense of Nashed if and only if (F|U)⁻¹ is bounded.

Let (F|U)⁻¹ be bounded. To show local well-posedness in the sense of Ivanov at an arbitrary point y0 in R(F) we choose sequences (yn)n∈N in R(F) with yn → y0 and (x̃n)n∈N in X with F(x̃n) = yn. By Proposition 1.6 we have to show that there exists a sequence (xn)n∈N with F(xn) = y0 and ‖x̃n − xn‖ → 0. Such a choice is given by xn := x̃n − (F|U)⁻¹F(x̃n) + (F|U)⁻¹(y0), because F(xn) = F(x̃n) − F(x̃n) + y0 = y0
and ‖x̃n − xn‖ = ‖(F|U)⁻¹F(x̃n) − (F|U)⁻¹(y0)‖ = ‖(F|U)⁻¹(yn − y0)‖ → 0.

Now let (F|U)⁻¹ be unbounded. We want to show local ill-posedness in the sense of Ivanov at y0 = 0. Local ill-posedness for arbitrary y0 in R(F) then follows easily by translation. Since (F|U)⁻¹ is unbounded, there exists a sequence (yn)n∈N in R(F) with yn → 0 but ‖(F|U)⁻¹(yn)‖ ↛ 0. Set x̃n := (F|U)⁻¹(yn) for all n in N. By Proposition 1.6 we have to show that there is no sequence (xn)n∈N in X with F(xn) = 0 and ‖x̃n − xn‖ → 0. Suppose such a sequence existed and denote the projection of X onto U along N(F) by PU : X → U. This projection exists and it is a bounded linear operator because U and N(F) are complementary closed subspaces of X. Then x̃n = PU(x̃n − xn) → 0, because x̃n ∈ U and xn ∈ N(F). But this contradicts ‖x̃n‖ ↛ 0. Thus, F has to be locally ill-posed in the sense of Ivanov at y0 = 0.
1.2.4 Nashed's Definition in Case of Uncomplemented Null Spaces
Given a closed subspace U of a Banach space X we always find a second subspace V such that X = U ⊕ V is the direct sum of U and V, see [7, Proposition 3.2.3]. The space U is said to be algebraically complemented by V. If, in addition, V is closed, then U is said to be topologically complemented or simply complemented by V. Banach spaces which are not isomorphic to a Hilbert space always contain uncomplemented closed subspaces, see [8]. An example is ℓ∞ with the uncomplemented closed subspace c0, cf. [7, Theorem 3.2.20]. In Hilbert spaces each closed subspace is complemented.

In Proposition 1.10 we followed the lines of [1], [2, Section 5.2], that is, we excluded the case that the null space of the considered linear operator might be uncomplemented. But since this might happen in every Banach space which is not isomorphic to a Hilbert space, we want to discuss implications of Nashed's definition of well- and ill-posedness on the continuity of (generalized) inverses and on Ivanov's definition. Such a discussion seems to be missing in the literature.

In this subsection we only consider bounded linear operators F between Banach spaces X and Y with domain D(F) = X. To highlight linearity we use the more common notation A := F. The null space N(A) of A is a closed subspace of X. Given an algebraic complement U of N(A) we may define an inverse A⁺_U : R(A) → X of A with respect to U as the inverse
of the restriction A|U of A to U. Obviously, A|U is a bijective bounded linear operator from U onto R(A). Note that if the closure of the range R(A) is (topologically) complemented by V in Y, then we could extend A⁺_U to a generalized inverse on the dense subspace R(A) ⊕ V of Y.

For operators with uncomplemented null space both well-posedness in the sense of Nashed and ill-posedness in the sense of Nashed may occur. An example for well-posedness is the mapping A : ℓ1 → ℓ2 defined by

A x := Σ_{n∈N} x_n z^{(n)},

where (z^{(n)})n∈N is a dense subset of the open unit ball in ℓ2. See [9, Proof of Theorem 2.3.1] for details. We now show that if N(A) is uncomplemented, then Nashed's definition of well-posedness does not tell anything about continuity of the inverse A⁺_U.

Proposition 1.11 Let N(A) be uncomplemented and let U be an algebraic complement of N(A). Then the corresponding inverse A⁺_U is unbounded.

Proof Since N(A) is uncomplemented we have N(A) ∩ Ū ≠ {0}, where Ū denotes the closure of U. Choose some u in (N(A) ∩ Ū) \ {0} and a sequence (un)n∈N in U with un → u. If A⁺_U were bounded, then

un = A⁺_U A un → A⁺_U A u = A⁺_U 0 = 0,

which contradicts un → u ≠ 0.
It remains to clarify the relation between Nashed's definition and Ivanov's definition. We only have the partial result that a closed range implies well-posedness in the sense of Ivanov, regardless of possible uncomplementedness of N(A).

Proposition 1.12 Let R(A) be closed. Then A is locally well-posed in the sense of Ivanov at every point of R(A).

Proof Obviously, it suffices to show local well-posedness in the sense of Ivanov at zero. Let (yn)n∈N be a sequence in R(A) with yn → 0 and take some positive ε. Then the set

Mε := {x̃ ∈ X : inf_{x ∈ N(A)} ‖x̃ − x‖ < ε}

is open and the open mapping theorem implies that the image A Mε is open, too. Since 0 ∈ A Mε, we have yn ∈ A Mε for all sufficiently large n. Consequently, we find some x̃n in Mε with A x̃n = yn. Letting ε → 0 we obtain a sequence (x̃n)n∈N with A x̃n = yn and inf_{x ∈ N(A)} ‖x̃n − x‖ → 0.
Noting

inf_{x ∈ N(A)} ‖x̃ − x‖ = inf_{x ∈ N(A)} ‖x̃n − x‖  for all x̃ ∈ A⁻¹yn,

we arrive at

sup_{x̃ ∈ A⁻¹yn} inf_{x ∈ N(A)} ‖x̃ − x‖ = inf_{x ∈ N(A)} ‖x̃n − x‖ → 0.
x∈N (A)
x˜n − x → 0.
If N (A) is uncomplemented and R(A) is not closed, then the author does not have any result on well- or ill-posedness in the sense of Ivanov. But he conjectures that both situations are possible.
1.3 Tikhonov Regularization
To solve Eq. (1.1) numerically we have to take into account noisy data and lacking stability of the inversion process due to ill-posedness. Noisy data need not belong to the range of F. Thus, there need not exist a solution. To overcome this difficulty we content ourselves with 'least squares' solutions defined by

(1/p) ‖F(x) − yδ‖^p → min over x ∈ D(F),   (1.3)

where yδ denotes noisy data with noise level δ as introduced in (1.2). The exponent p can be used to ease numerical minimization. We assume p > 1.

Without further assumptions on F there need not exist a solution to the minimization problem (1.3). Even if we have a sequence (yδn)n∈N of noisy data with decreasing noise levels δn, that is, δn → 0, for which solutions to (1.3) exist, we cannot guarantee that they converge to the solution set of (1.1). This is a simple consequence of local ill-posedness in the sense of Ivanov at the exact right-hand side y†, cf. Definition 1.5.

There exist several approaches to stabilize the minimization problem and to obtain convergence of the stabilized problem's solutions to the solution set of (1.1). We restrict our attention to the very flexible class of Tikhonov-type regularization methods

Tα(x, yδ) := (1/p) ‖F(x) − yδ‖^p + α Ω(x) → min over x ∈ D(F).   (1.4)
The penalty functional Ω : X → (−∞, ∞] has to be chosen in a way which stabilizes the minimization problem, see Assumption 1.13 below. The positive regularization parameter
α controls the trade-off between data fitting and stabilization. To obtain convergence of the Tikhonov minimizers to the solution set of (1.1) as the noise level decreases, we have to choose α depending on δ or yδ or on both. Tikhonov-type methods are well established in the field of inverse problems and detailed information can be found in almost all monographs and textbooks in the field, see, e.g., [10–12].

Assumption 1.13 We assume that
Assumption 1.1 is true, there exists a solution to (1.1) at which Ω is finite, Ω is convex and proper, Ω is weakly sequentially lower semicontinuous, each sequence in a sublevel set {x ∈ X : Ω(x) ≤ c}, c ∈ R, contains a weakly convergent subsequence.
Under these assumptions one easily shows that there exist solutions to (1.1) which minimize Ω in the set of all solutions. Such solutions will be referred to as Ω minimizing solutions and are usually denoted by x † . Tikhonov minimizers only converge to this special type of solutions. Theorem 1.14 Let Assumption 1.13 be true. Then the following assertions on existence, stability and convergence of Tikhonov regularized solutions are true. (i) For each y in Y and each positive α the minimization problem (1.4) has a solution. (ii) Let α > 0. If (yn )n∈N is a sequence in Y converging to some fixed y in Y and if (xn )n∈N is a sequence of corresponding minimizers of Tα (·, yn ), then (xn )n∈N has a weakly convergent subsequence and the limit of each weakly convergent subsequence ¯ for each weakly convergent is a minimizer of Tα (·, y). Further Ω(xnk ) → Ω(x) ¯ subsequence (xnk )k∈N with limit x. (iii) Let (δn )n∈N be a sequence of noise levels converging to zero and let (y δn )n∈N be a sequence of corresponding data elements. Further assume that (αn )n∈N is a sequence p of positive regularization parameters with αn → 0 and δαnn → 0. Then each sequence (xn )n∈N of corresponding minimizers of Tαn (·, y δn ) has a weakly convergent subsequence and the limit of each weakly convergent subsequence is an Ω minimizing solution to (1.1). Further Ω(xnk ) → Ω(x †) for each weakly convergent subsequence (xnk )k∈N with limit x † . Proof See [12, Section 4.1] or [13, Chapter 3].
14
1 Inverse Problems, Ill-Posedness, Regularization
Convergence to Ω minimizing solutions may be arbitrarily slow and we are interested in estimates for the convergence speed. Of course, additional assumptions are required for such estimates. This issue will be discussed in the next chapter. At first, we have to decide how to measure the speed of convergence. To cover all relevant situations we introduce a general error functional E † : X → [0, ∞), where the symbol † indicates that the error functional depends on one or more Ω minimizing solutions. The canonical choice in Hilbert spaces is the squared norm distance E † (x) = x − x † 2 between regularized and exact solution. If there are multiple solutions, the point-to-set distance E † (x) = inf x − x † 2 x † ∈S
is a suitable choice if S denotes the set of norm minimizing solutions. In Banach spaces alternatives like the Bregman distance E † (x) = BξΩ† (x, x † ) (see Sect. A.3 for a definition) proved to be useful. But Banach space norms can also be used. In 1 (N) we could choose E † (x) = x − x † 1
or
E † (x) = inf x − x † 1 x † ∈S
with S denoting again the set of norm minimizing solutions. All these examples will be discussed in Sect. 3.2 in more detail. Denoting by xαδ the minimizers of the Tikhonov minimization problem (1.4), we aim at asymptotic estimates E † (xαδ ) = O(ϕ(δ)),
δ → 0,
(1.5)
where α may depend on δ and y δ . The function ϕ shall be an index function in the following sense. Definition 1.15 A function ϕ : [0, ∞) → [0, ∞) is an index function if it is continuous, monotonically increasing, strictly increasing in a neighborhood of zero, and satisfies ϕ(0) = 0.
2 Variational Source Conditions Yield Convergence Rates
Abstract
We introduce variational source conditions and derive convergence rates for Tikhonov-type regularization methods.
2.1 Evolution of Variational Source Conditions
Different techniques have been developed to prove convergence rates (1.5). The most prominent tools are source conditions for linear ill-posed inverse problems in Hilbert spaces. The classical concept is described in [10, Section 3.2] and general source conditions are studied in [14]. See also the references given in [14] for the origins of general source conditions. In both cases the norm distance between exact and regularized solution is used as error functional E†. For Banach spaces the usage of source conditions is quite limited. But in 2007 variational source conditions were introduced in [15] and they have been thoroughly studied and developed during the past 10 years, see, e.g., [11, 13, 16–18]. This type of condition allows one to prove convergence rates for many different settings, especially for nonlinear operators and general penalty functionals in (1.4). In its original version Bregman distances (see Sect. A.3) were used as error functional E†.

Variational source conditions are also known as variational inequalities, but this term conflicts with the already existing mathematical field of the same name. A second alternative was introduced in the book [13]. There the term variational smoothness assumption is used, because several kinds of smoothness (not only of the underlying exact solution, as is the case for classical source conditions) are jointly described by one expression. The term variational source condition rouses associations to classical source conditions. But the new concept has no similarity to classical source conditions, most notably there
is no source element. Nevertheless, in most recent literature 'variational source condition' seems to be used more often than 'variational inequality', whereas 'variational smoothness assumption' is not used by other authors. Thus, we write 'variational source condition' to name the technique described and applied below and, to avoid drawing too many parallels to source conditions, we read it as 'variational replacement for source conditions'.

The definition of variational source conditions in this book, see also [19], will be slightly more general than other variants before, because it is not connected to one fixed solution of (1.1). Instead, we allow multiple Ω minimizing solutions and refer to [20] and Sect. 7.1 for a concrete example. For this purpose we denote by Ω† the value of Ω at the Ω minimizing solutions, that is, Ω† := min{Ω(x†) : x† ∈ D(F), F(x†) = y†}. Assumption 1.13 guarantees existence of Ω minimizing solutions.

Definition 2.1 Let β > 0 be a constant and let ϕ : [0, ∞) → [0, ∞) be an index function. A variational source condition for fixed right-hand side y† holds on a set M ⊆ D(F) if

β E†(x) ≤ Ω(x) − Ω† + ϕ(‖F(x) − y†‖)  for all x in M.   (2.1)
If the set M is large enough to contain all minimizers of the Tikhonov functional (1.4), then a variational source condition (2.1) implies the desired convergence rate (1.5). Although our variant is slightly more general, the proofs of this fact given in [13, Chapter 4] or in [21] still work with trivial modifications. Suitable choices of the regularization parameter α are discussed there, too. For the sake of completeness we provide the proof in the next section. The constant β plays only a minor role. In principle we could hide it in the functional † E , but then E † would depend on the chosen index function ϕ and not solely on exact and regularized solutions. The implied convergence rate does not depend on β, only the O-constant contains the factor β1 . Variational source conditions originally were developed to obtain rates for Tikhonovtype regularization, but can also be used in the context of other methods. See [22] for the residual method and [18] for iteratively regularized Newton methods. A major drawback of variational source conditions is that the best obtainable rate may be slower than the best possible one. This is for instance the case for rates faster than √ 2 O( δ) in the classical linear Hilbert space setting, where the best one is O(δ 3 ). On the other hand, in 1 -regularization rates up to the best possible one O(δ) for the error norm can be obtained, see [23] and Sect. 11.3. An approach to overcome technical rate limitations was undertaken in [24], but it is limited to linear equations.
2.2 Convergence Rates
2.2
17
Convergence Rates
In this section we prove that variational source conditions (2.1) imply rates (1.5). To choose the regularization parameter we consider an a priori parameter choice α = α(δ) specified below and an a posteriori parameter choice α = α(δ, y δ ) known as discrepancy principle. The later consists in choosing α such that δ ≤ F (xαδ ) − y δ ≤ τ δ
(2.2)
where τ ≥ 1 is a parameter and xαδ denotes a Tikhonov regularized solution. In case of the a priori choice we apply techniques from [21], but slightly improve the constants in the error estimate. For the discrepancy principle we take the proof from [13] and specialize it to our setting. Both proofs can also be found in [25] for the special case of 1 -regularization. To shorten the two proofs we mention two properties of concave index functions ϕ. Simple calculations show that t → ϕ(tt ) is decreasing. As a consequence we see that ϕ(c t) ≤ c ϕ(t) if c ≥ 1. Both observations will be used without further notice. Proposition 2.2 Let the variational source condition (2.1) be satisfied with a convex index function ϕ and choose α in (1.4) such that c1
δp δp ≤ α ≤ c2 ϕ(δ) ϕ(δ)
for all positive δ with positive constants c1 , c2 . Then E † (xαδ ) ≤
1 β
1 1 + (1 + 2 p c2 ) p−1 ϕ(δ) 1+ p c1
for all positive δ. Proof Because xαδ is a minimizer of (1.4), for an arbitrary Ω minimizing solution x † to (1.1) we have Ω(xαδ ) − Ω † =
1 α
≤
1 α
≤
1 α
1 Tα (xαδ , y δ ) − α Ω(x †) − F (xαδ ) − y δ p p 1 1 † δ p δ δ p F (x ) − y − F (xα ) − y p p 1 p 1 δ δ p δ − F (xα ) − y . p p
18
2 Variational Source Conditions Yield Convergence Rates
and thus the variational source condition (2.1) implies β E † (xαδ ) ≤
1 p δ − F (xαδ ) − y δ p + ϕ F (xαδ ) − F (x † ) . pα
(2.3)
Because β E † (xαδ ) ≥ 0, we obtain F (xαδ ) − y δ p ≤ δ p + p α ϕ F (xαδ ) − F (x † ) . If F (xαδ ) − y δ ≤ δ, then the triangle inequality, the properties of ϕ and the parameter choice imply F (xαδ ) − y δ p ≤ δ p + p α ϕ(2 δ) ≤ δ p + 2 p α ϕ(δ) ≤ (1 + 2 p c2 ) δ p , that is, 1
1
F (xαδ ) − y δ ≤ (1 + 2 p c2 ) p δ ≤ (1 + 2 p c2 ) p−1 δ. If, on the other hand, F (xαδ ) − y δ > δ, then F (xαδ ) − y δ p ≤ δ p + p α ϕ F (xαδ ) − y δ + δ ϕ F (xαδ ) − y δ + δ p F (xαδ ) − y δ + δ = δ +pα δ δ F (xα ) − y + δ ϕ(δ) F (xαδ ) − y δ + δ δ ϕ(δ) F (xαδ ) − y δ ≤ δ p−1 F (xαδ ) − y δ + 2 p α δ ≤ δp + p α
and thus, 1 1 ϕ(δ) p−1 F (xαδ ) − y δ ≤ δ p−1 + 2 p α ≤ (1 + 2 p c2 ) p−1 δ. δ In both cases (2.3) can be further estimated to obtain 1 p δ − F (xαδ ) − y δ p + ϕ F (xαδ ) − y δ + δ pα
1 δp + ϕ 1 + (1 + 2 p c2 ) p−1 δ ≤ pα
1 δp ≤ + 1 + (1 + 2 p c2 ) p−1 ϕ(δ) pα
β E † (xαδ ) ≤
2.2 Convergence Rates
19
and the lower bound for α leads to β E † (xαδ ) ≤
1 ϕ(δ)
+ 1 + (1 + 2 p c2 ) p−1 ϕ(δ). p c1
Note that in the proof we used arguments similar to the ones in [21], but made changes in the details leading to a better constant in the obtained error estimate. Corresponding estimates in [21, Theorem 1] lead to E † (xαδ ) ≤
1 1
1 + 2 (2 + p) p−1 ϕ(δ), β
which has a greater constant factor than our estimate. Our estimate with the parameter choice from [21], that is c1 = c2 = 1, reads E † (xαδ ) ≤
1 β
1 1 1 + + (1 + 2 p) p−1 ϕ(δ). p
Proposition 2.3 Let the variational source condition (2.1) be satisfied and choose α in (1.4) according to the discrepancy principle (2.2). Then E † (xαδ ) ≤
1+τ ϕ(δ) β
for all positive δ. Proof Because xαδ is a minimizer of (1.4), for an arbitrary Ω minimizing solution to (1.1) we have 1 1 Tα (xαδ , y δ ) − α Ω(x †) − F (xαδ ) − y δ p Ω(xαδ ) − Ω † = α p 1 1 1 † δ p δ δ p ≤ F (x ) − y − F (xα ) − y α p p 1 1 p 1 δ δ p ≤ δ − F (xα ) − y α p p and taking into account the left-hand inequality in (2.2) we obtain Ω(xαδ ) − Ω(x †) ≤ 0.
20
2 Variational Source Conditions Yield Convergence Rates
The variational source condition (2.1) thus implies β E † (xαδ ) ≤ ϕ F (xαδ ) − F (x † ) ≤ ϕ F (xαδ ) − y δ + δ and the right-hand side in (2.2) yields β E † (xαδ ) ≤ ϕ (1 + τ ) δ ≤ (1 + τ ) ϕ(δ).
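To close this chapter, the following sketch (not part of the book) indicates how the discrepancy principle (2.2) can be realized for a linear operator given as a matrix, with p = 2 and Ω(x) = ‖x‖². It is a crude geometric search over α; the function name, the default τ = 1.2 and the shrinking factor are illustrative assumptions, and a careful implementation would enforce the full two-sided condition δ ≤ ‖F(xαδ) − yδ‖ ≤ τδ:

```python
import numpy as np

def discrepancy_alpha(A, y_delta, delta, tau=1.2, alpha0=1.0, q=0.5):
    """Crude a posteriori search in the spirit of (2.2) for the linear case with
    p = 2 and Omega(x) = ||x||^2: shrink alpha geometrically until the residual
    of the Tikhonov minimizer drops below tau * delta."""
    alpha = alpha0
    while True:
        x_alpha = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y_delta)
        residual = np.linalg.norm(A @ x_alpha - y_delta)
        if residual <= tau * delta or alpha < 1e-14:
            return alpha, x_alpha
        alpha *= q
```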
3 Existence of Variational Source Conditions
Abstract
We prove that variational source conditions are available in almost all situations.
3.1 Main Theorem
The aim of this chapter is to answer the question under which conditions we find a variational source condition of type (2.1). For linear equations in Hilbert spaces we know that general source conditions can always be satisfied, see [26], and that general source conditions imply variational source conditions without reducing the implied convergence rate, see [13, Chapter 13]. In Banach spaces up to now variational source conditions were only verified for few concrete examples, see, e.g., [15, 20, 27]. The only exception is 1 -regularization for linear equations, where in [25] it was shown that one always finds a variational source condition for some index function ϕ, see also Sect. 11.5. Of course, we have to connect the error functional E † in some way to the other ingredients of a variational source condition. Assumption 3.1 Given β > 0 and M ⊆ D(F ) we assume the following. (i) M is weakly sequentially closed. (ii) All solutions x ∗ to (1.1) satisfy the inequality in (2.1) if they belong to M, that is, β E † (x ∗ ) ≤ Ω(x ∗) − Ω † for all x ∗ in M with F (x ∗ ) = y † .
(iii) The mapping x → −β E † (x) + Ω(x) is weakly sequentially lower semicontinuous on M. (iv) There are constants β˜ > β and c˜ ≥ 0 such that β˜ E † (x) − Ω(x) ≤ c˜
for all x in M.
These assumptions will be verified for several important special cases in the next section. There we will see that all relevant settings are covered. Theorem 3.2 Let Assumptions 1.13 and 3.1 be true for a constant β > 0 and a subset M of D(F ). Then there exists a concave index function ϕ such that the variational source condition (2.1) is satisfied. The theorem appeared first in [19]. The assertion of the theorem is quite similar to the main result of [26], where it is shown that in linear Hilbert space settings one always finds an index function such that a corresponding general source condition is satisfied. Our theorem extends this results to nonlinear Banach space settings. Remark 3.3 As already noted in Sect. 2.1, a variational source condition (2.1) only implies the desired convergence rate (1.5) if the set M is large enough. Here, ‘large enough’ means that all regularized solutions have to belong to M, at least for small noise levels δ. In Theorem 3.2 the size of M is restricted only by Assumption 3.1 and, as we will see in the next section, Assumption 3.1 is usually satisfied with M = D(F ). Thus, Theorem 3.2 not only yields some artificial variational source condition, but a variational source condition which implies rates. The proof of the theorem relies on the technique of approximate variational source conditions introduced in [28] and thoroughly studied in [13, Chapter 12]. Here we only introduce the parts we need for the proof. The idea is to measure the violation of a variational source condition with linear index function ϕ by the distance function
(3.1) Dβ (r) := sup β E † (x) − Ω(x) + Ω † − r F (x) − y † x∈M
for r ≥ 0. This function is the supremum of affine functions and thus convex and continuous on the interior of its domain. Obviously, it is monotonically decreasing. Lemma 3.4 Let Dβ be defined by (3.1). If Dβ (0) > 0 and lim Dβ (r) = 0,
r→∞
3.1 Main Theorem
23
then ϕ(t) = inf Dβ (r) + r t , r≥0
t ≥ 0,
defines a concave index function and a variational source condition (2.1) with this ϕ and same β and M as in (3.1) holds true. Proof From β E † (x) − Ω(x) + Ω † = inf β E † (x) − Ω(x) + Ω † − r F (x) − y † + r F (x) − y † r≥0
≤ inf Dβ (r) + r F (x) − y † r≥0
for all x in M we obtain the asserted variational source condition if ϕ is indeed an index function. Since Dβ is decreasing and goes to zero, it has to be nonnegative. Consequently, 0 ≤ ϕ(t) < ∞. We also immediately see that ϕ is monotonically increasing. In addition, ϕ is concave and upper semicontinuous as an infimum of affine functions. Thus, ϕ is continuous on the interior (0, ∞) of its domain. The decay of Dβ to zero yields ϕ(0) = 0, which, together with upper semicontinuity, yields continuity on the whole domain [0, ∞). The assumption Dβ (0) > 0 ensures ϕ(t) > 0 for some t. Thus, ϕ has to be strictly increasing near zero.
Proof of Theorem 3.2 We want to apply Lemma 3.4. If Dβ (0) ≤ 0, then the variational source condition holds for arbitrary index functions ϕ. So we may assume Dβ (0) > 0 and the lemma reduces the proof to verification of Dβ (r) → 0 if r → ∞. In addition, we may assume Dβ (r) ≥ 0 for all r > 0, because Dβ (r) < 0 for some r would imply a variational source condition with the best possible concave index function ϕ(t) = r t. As first step we show that for fixed r ≥ 0 the supremum in the definition (3.1) of Dβ (r) is attained. We write
(3.2) Dβ (r) := − inf −β E † (x) + Ω(x) − Ω † + r F (x) − y † x∈M
and denote by (xn )n∈N an infimizing sequence. The functional in the infimum is bounded on the sequence by some constant c. With β˜ and c˜ as in Assumption 3.1(iv) we see β ˜ † β † −β E (xn ) + Ω(xn ) = −β E (xn ) + Ω(xn ) + 1 − Ω(xn ) β˜ β˜ β β c˜ + 1− Ω(xn ), ≥ β˜ β˜
24
3 Existence of Variational Source Conditions
which implies β β c˜ 1− Ω(xn ) ≤ −β E † (xn ) + Ω(xn ) − ˜ β β˜ ≤ −β E † (xn ) + Ω(xn ) − Ω † + r F (xn ) − y † + Ω † − ≤ c + Ω† −
β c˜ β˜
β c˜ . β˜
Thus, (Ω(xn ))n∈N is bounded and we find a subsequence of (xn )n∈N converging weakly to some x˜ ∈ X. The subsequence will be denoted again by (xn )n∈N . Assumption 3.1(i) ensures x˜ ∈ M and with item (iii) one easily shows that x˜ is a minimizer in (3.2). As second step we take a sequence (rn )n∈N in [0, ∞) with rn → ∞ and corresponding maximizers xn in the definition (3.1) of Dβ (rn ) to show Dβ (r) → ∞ if r → ∞. As discussed above, Assumption 3.1(iv) implies β β c˜ 1− Ω(xn ) ≤ −β E † (xn ) + Ω(xn ) − Ω † + rn F (xn ) − y † + Ω † − ˜ β β˜ = −Dβ (rn ) + Ω † − ≤ Ω† −
β c˜ β˜
β c˜ . β˜
Thus, (Ω(xn ))n∈N is bounded and we find a subsequence of (xn )n∈N converging weakly to some x˜ ∈ M. The subsequence will be denoted again by (xn )n∈N with corresponding (rn )n∈N . From Dβ (rn ) ≥ 0 we obtain rn F (xn ) − y † ≤ β E † (xn ) − Ω(xn ) + Ω † and the right-hand side is bounded by Assumption 3.1(iv). Consequently, rn → ∞ implies ˜ = y†, F (xn ) − y † → 0 and the lower semicontinuity of x → F (x) − y † yields F (x) that is, the maximizers in the definition of Dβ converge (subsequentially) to solutions of (1.1). If we combine this observation with items (ii) and (iii) in Assumption 3.1, we obtain 0 ≤ lim inf Dβ (rn ) ≤ lim sup Dβ (rn ) = − lim inf −Dβ (rn ) n→∞
n→∞
n→∞
= − lim inf −β E † (xn ) + Ω(xn ) − Ω † + rn F (xn ) − y † n→∞
≤ − lim inf −β E † (xn ) + Ω(xn ) − Ω †
n→∞
≤ − −β E † (x) ˜ + Ω(x) ˜ − Ω† ≤ 0.
3.2 Special Cases
25
This proves Dβ (rn ) → 0 and, since (rn )n∈N was chosen arbitrarily, also Dβ (r) → 0 if r → ∞. The fact that (rn )n∈N is only a subsequence of the original sequence causes no troubles, because Dβ is monotonically decreasing. Thus, Dβ (rn ) → 0 has to hold for the original sequence, too.
Theorem 3.2 states that there is always an index function ϕ for a variational source condition. In principle the proof is constructive, but calculating the distance function Dβ is a difficult task. From the proof of Lemma 3.4 we see that also a majorant for Dβ yields an index function ϕ as long as this majorant decays to zero at infinity. Index functions ϕ constructed in this spirit can be found in [20, 25] and Sects. 7.5 and 11.4 for special settings. A very similar approach to obtain index functions ϕ is discussed in [29].
3.2
Special Cases
We provide three special cases for which Theorem 3.2 is applicable. Two other special cases are considered in Sect. 7.4 (autoconvolution with multiple solutions) and Sect. 11.6 (1 -regularization).
3.2.1
Linear Equations in Hilbert Spaces
Let X and Y be Hilbert spaces and let F be linear and bounded with D(F ) = X. For the Tikhonov functional (1.4) we choose p = 2 and Ω(x) = x2 . For each α there is exactly one Tikhonov minimizer. Equation (1.1) may have multiple solutions, but there is exactly one Ω minimizing solution, which we denote by x † . As error functional E † we choose E † (x) = x − x † 2 ,
x ∈ X.
Corollary 3.5 For each β ∈ (0, 1) there exists a concave index function ϕ such that the variational source condition (2.1) is satisfied with M = X. Proof Assumption 1.13 is obviously true and we only have to check Assumption 3.1 to apply Theorem 3.2. Item (i) is trivially true. Item (ii) reads β x ∗ − x † 2 ≤ x ∗ 2 − x † 2
for all x ∗ ∈ X with F (x ∗ ) = y † .
26
3 Existence of Variational Source Conditions
For solutions x ∗ the difference x ∗ − x † is in the nullspace of F and x † is in the orthogonal complement of the null space. Thus, the inner product x † , x ∗ −x † vanishes and we obtain the desired estimate β x ∗ − x † 2 = β x ∗ 2 − x † 2 − 2 Re x † , x ∗ − x † = β x ∗ 2 − x † 2 ≤ x ∗ 2 − x † 2 . Concerning item (iii) we observe −β x − x † 2 + x2 = (1 − β) x2 − 2 β Re x, x † + β x † 2 for all x ∈ X. Since β < 1 by assumption, the functional is obviously weakly sequentially lower semicontinuous. Finally, item (iv) follows from ˜ x2 − 2 β˜ Re x, x † + βx ˜ † 2 β˜ x − x † − x2 = −(1 − β) ˜ x2 + 2 β˜ x x † + βx ˜ † 2 , ≤ −(1 − β) because the right-hand side is bounded for all β˜ < 1. So we may choose β˜ ∈ (β, 1).
The fact that in a linear Hilbert space setting there is always a variational source condition has been proven already in a different and more complicated way in [13, Chapter 13] based on the result that there is always a general source condition (cf. [26]). In [13, Section 13.2] it is shown that the constant β in a variational source condition does not influence the index function ϕ in a linear Hilbert space setting, up to some scaling.
3.2.2
Bregman Distance in Banach Spaces
A first convergence rate result for Tikhonov regularization (1.4) with convex penalty Ω can be found in [30] based on a source condition. There the error functional E † is a Bregman distance with respect to Ω, which is defined as follows. Let x † be an Ω minimizing solution with nonempty subdifferential and denote by ξ † ∈ X∗ a subgradient of Ω at x † . Then BξΩ† (x, x † ) := Ω(x) − Ω(x † ) − ξ † , x − x † X∗ ×X ,
x ∈ X,
is the Bregman distance between x and x † . If Ω is a Hilbert space norm, then the corresponding Bregman distance is the usual norm distance.
3.2 Special Cases
27
Bregman distances became a standard tool for convergence rate analysis in Banach spaces. The original variational source condition in [15] used them, too. With Theorem 3.2 we can prove the following. Corollary 3.6 Let Assumption 1.13 be true and assume that there is only one solution x † to (1.1) with subgradient ξ † of Ω. Let E † be the corresponding Bregman distance E † (x) = BξΩ† (x, x † ),
x ∈ X.
Then there exist a constant β ∈ (0, 1) and a concave index function ϕ such that the variational source condition (2.1) is satisfied with M = D(F ). Proof We have to show that Assumption 3.1 is true. Item (i) is a consequence of the weak continuity of F and weak closedness of D(F ). Item (ii) is trivially true, because there is only one solution. Weak lower semicontinuity of x → −β E † (x) + Ω(x) = (1 − β) Ω(x) + β Ω † + β ξ † , x − x † X∗ ×X follows from Assumption 1.13(iv) and β ≤ 1. It remains to show boundedness of x → β˜ E † (x) − Ω(x) for some β˜ > β. Weak sequential compactness of the sublevel sets of Ω implies boundedness of the sublevel sets. Thus, by [31, Exercise 2.41 and pp. 324–326] there are a positive constant c1 and a constant c2 such that Ω(x) ≥ c1 x + c2
for all x ∈ X.
With this observation and with β˜ ∈ (β, 1] we obtain β˜ E † (x) − Ω(x) = (β˜ − 1) Ω(x) − β˜ Ω † − β˜ ξ † , x − x † X∗ ×X ˜ † , x † X∗ ×X ≤ (β˜ − 1) (c1 x + c2 ) − β˜ Ω † + β˜ ξ † x + βξ ˜ † , x † X∗ ×X = (c1 + ξ † ) β˜ − c1 x + (β˜ − 1) c2 − β˜ Ω † + βξ The last expression is bounded with respect to x ∈ X if β˜ ≤
c1 . c1 + ξ †
Thus, Assumption 3.1 is true for all β ∈ 0, c
c1 † . +ξ 1
28
3.2.3
3 Existence of Variational Source Conditions
Vanishing Error Functional
In [32] convergence rates for the Bregman distance between regularized and exact solutions are proven. But compared to our definition of the Bregman distance in the previous subsection the roles of regularized and exact solution are interchanged. That is, the subgradient of Ω is taken at the regularized solutions. As smoothness assumption a variational source condition (2.1) is used, but the error functional E † there is not the Bregman distance from the rate result. Instead, the choice E † (x) = 0,
x ∈ X,
in (2.1) suffices to prove rates Bξαδ (x † , xαδ ) = O(ϕ(δ)),
δ → 0,
where ξαδ is a subgradient of Ω at xαδ and x † is an Ω minimizing solution to (1.1). With Theorem 3.2 we can show that this special type of variational source condition is always available. Corollary 3.7 Let Assumption 1.13 be true and let E † be null everywhere. Then there exists a concave index function ϕ such that the variational source condition (2.1) holds with M = D(F ). Proof From Assumption 1.13 we easily derive Assumption 3.1. Thus, Theorem 3.2 applies.
Part II Quadratic Inverse Problems
4
What Are Quadratic Inverse Problems?
Abstract
We introduce the notion of quadratic inverse problems, present some examples and discuss ill-posedness as well as some geometric properties of quadratic mappings.
4.1
Definition and Basic Properties
Ill-posed inverse problems are frequently divided into two classes: linear and nonlinear ones. The reason for this distinction is that for linear inverse problems there is a huge and almost closed theory of regularization whereas for nonlinear ones there are only weak theoretical results and each concrete nonlinear inverse problem has to be handled in a different way. In the present part of the book we consider a subclass of nonlinear inverse problems and develop theoretical results and algorithms which can be applied to all inverse problems from this class. We call the subclass the class of quadratic inverse problems. Let X and Y be Banach spaces over R or C and let F : X → Y be a (nonlinear) mapping. The inverse problem under consideration is the equation F (x) = y †
(4.1)
with given y † in Y and sought-for x in X. We assume that there exists a solution, that is, y † belongs to the range of F .
© Springer Nature Switzerland AG 2018 J. Flemming, Variational Source Conditions, Quadratic Inverse Problems, Sparsity Promoting Regularization, Frontiers in Mathematics, https://doi.org/10.1007/978-3-319-95264-2_4
31
32
4 What Are Quadratic Inverse Problems?
Definition 4.1 The mapping F is called quadratic if there is a continuous bilinear mapping B : X × X → Y such that F (x) = B(x, x) holds for all x in X. Note that by definition quadratic mappings are always continuous. The quadratic structure implies several simple facts which will be used throughout the book. So let F be quadratic with underlying bilinear mapping B. Then we have F (x + u) = B(x + u, x + u) = B(x, x) + B(x, u) + B(u, x) + B(u, u) = F (x) + B(x, u) + B(u, x) + F (u) for x and u in X and also F (t x) = B(t x, t x) = t 2 B(x, x) = t 2 F (x) for scalars t. In particular, quadratic mappings cannot be injective because F (x) = F (−x). We also see F (0) = 0. Note that there might be many different bilinear mappings B, the diagonals of which produce the same quadratic mapping F . But restricting our attention to symmetric bilinear mappings, we enforce uniqueness. Proposition 4.2 Let F be quadratic. Then there is exactly one symmetric bilinear mapping BF : X × X → Y with F (x) = BF (x, x) for all x in X. The mapping BF is given by BF (x, u) = F for x and u in X.
x+u 2
−F
x−u 2
(4.2)
4.1 Definition and Basic Properties
33
Proof Obviously, BF (x, x) = F (x) for all x and from x−u u−x u−x F =F − =F 2 2 2 we see that BF is symmetric. Assume that there is another symmetric bilinear mapping B with B(x, x) = F (x). Then 1 B(x + u, x + u) − B(x − u, x − u) 4 x−u x+u −F = BF (x, u) =F 2 2
B(x, u) =
for all x and u in X. That is, BF = B.
We already mentioned above that quadratic mappings cannot be injective because F (x) = F (−x). We want to distinguish between mappings which only have this simple non-injectivity and mappings which are really non-injective. Definition 4.3 A quadratic mapping F is injective up to sign if for all x and u in X equality F (x) = F (u) implies x = u or x = −u. Injectivity up to sign can be characterized by the behavior of F around zero. This is quite similar to linear mappings. Proposition 4.4 A quadratic mapping F is injective up to sign if and only if for all x and u in X equality BF (x, u) = 0 implies x = 0 or u = 0. Proof Injectivity up to sign is equivalent to F (x) ˜ = F (u) ˜
⇒
x˜ = u˜ or x˜ = −u. ˜
Setting x := x˜ − u˜ and u := x˜ + u˜ this can be rewritten as x−u x+u =F ⇒ x = 0 or u = 0 F 2 2 and simplifying the equality on the left we obtain BF (x, u) = 0
⇒
x = 0 or u = 0.
Like for linear mappings global properties of quadratic mappings can be deduced from their local behavior.
34
4 What Are Quadratic Inverse Problems?
Proposition 4.5 Let F be quadratic and let ε be positive. The mapping F is uniquely determined by its values on the closed ε-ball around any point x0 . Proof For x in X \ {0} write x2 F (x) = 2 ε
x x 1 1 F x0 + ε + F x0 − ε − F (x0 ) 2 x 2 x
to verify the assertion.
Another similarity to linear mappings is that quadratic mappings constitute a normed vector space. Obviously, (pointwise) sums and scalar multiples of quadratic mappings are again quadratic mappings. It remains to define a norm on this vector space. But since the vector space of all continuous bilinear mappings B carries the norm B :=
sup
B(x, u)
x,u∈X x≤1,u≤1
we may define F := BF . This immediately provides the estimate F (x) ≤ F x2
(4.3)
for all x in X. Together with Proposition 4.2 we obtain that the space of quadratic mappings is isometrically isomorphic to the space of continuous symmetric bilinear mappings. For later use we introduce the notion of the adjoint of a bilinear mapping. Given a symmetric bounded bilinear mapping B : X × X → Y and elements x from X and η from Y ∗ the functional u → η, B(x, u) on X is obviously linear and bounded. Thus it can be represented by some element from X∗ which we denote by B ∗ (x, η). In this way we obtain a bounded sesquilinear mapping B ∗ : X × Y ∗ → X∗ , to which we refer as the adjoint mapping of B. The mapping B is antilinear in its first component and linear in its second, that is, B ∗ (a x, b η) = a b B ∗ (x, η) for complex numbers a and b. The same construction works for the functional u → η, B(u, x) and, since B is symmetric, yields the same sesquilinear mapping B ∗ . The adjoint B ∗ is uniquely determined by η, B(x, u) = B ∗ (x, η), u
for all u, x in X and all η in Y ∗ .
(4.4)
4.2 Examples
35
A last basic property of quadratic mappings we want to mention in this introductory section is that quadratic mappings always are differentiable. Proposition 4.6 Each quadratic mapping F is Fréchet differentiable on X. The Fréchet derivative F [x] : X → Y at x is given by F [x] h = 2 BF (x, h)
for h in X.
Proof For x and h in X we have F (x + h) − F (x) − 2 BF (x, h) F (h) = ≤ F h. h h Taking the limit h → 0 completes the proof.
Quadratic mappings are also twice Fréchet differentiable. For F we have F [x](h1 , h2 ) = 2 B(h1 , h2 ) for all h1 , h2 and x in X. Obviously, all higher derivatives are zero. Thus, the Taylor expansion of F at x0 is F (x) = F (x0 ) + 2 B(x0 , x − x0 ) + B(x − x0 , x − x0 ) for all x in X.
4.2
Examples
We provide several examples of quadratic mappings and corresponding inverse problems. The focus in this book is on different types of autoconvolutions but the results apply to other quadratic mappings, too.
4.2.1
Autoconvolutions
Let X = L2 (0, 1) be the space of real-valued or complex-valued square integrable functions over the interval (0, 1). There are three common types of autoconvolution operations on this space.
4.2.1.1 Autoconvolution of Functions with Uniformly Bounded Support We may interpret functions in L2 (0, 1) as functions defined on the whole real line, but with support contained in (0, 1). In other words, the functions have uniformly bounded
36
4 What Are Quadratic Inverse Problems?
support and without loss of generality we assume that the support is bounded by zero and one. Then the usual autoconvolution of real-valued or complex-valued functions over the real line reduces to
F (x) (s) :=
⎧s ⎪ ⎪ ⎪ ⎪ x(s − t) x(t) dt, ⎪ ⎪ ⎨
if s ∈ (0, 1),
1 ⎪ ⎪ ⎪ ⎪ x(s − t) x(t) dt, ⎪ ⎪ ⎩
if s ∈ (1, 2)
0
s−1
or, which is the same,
F (x) (s) =
min{s, 1}
x(s − t) x(t) dt,
s ∈ (0, 2).
(4.5)
max{0, s−1} 2 Proposition 4.7 The mapping F defined by (4.5) is a quadratic mapping from L (0, 1) into L2 (0, 2) with 23 ≤ F ≤ 1. Further, F is injective up to sign.
Proof With BF : L2 (0, 1) × L2 (0, 1) → L2 (0, 2) given by BF (x, u) (s) :=
min{s, 1}
x(s − t) u(t) dt,
s ∈ (0, 2)
(4.6)
max{0, s−1}
we have F (x) = BF (x, x). To prove that F is quadratic it remains to show that the symmetric bilinear mapping BF is continuous. We write 2 2
BF (x, u) =
BF (x, u) (s)2 ds
0
2 1 2 1 s = x(s − t) u(t) dt + x(s + 1 − t) u(t) dt ds 0
0
s
4.2 Examples
37
and apply the Cauchy–Schwarz inequality to bound the first inner integral by s 2 x(s − t) u(t) dt ≤ x|(0,s)2 u|(0,s)2 0
and the second by 2 1 x(s + 1 − t) u(t) dt ≤ x|(s,1)2 u|(s,1)2 s
2
2 = x2 − x|(0,s) u2 − u|(0,s) . Thus, we see BF (x, u)2 1 ≤
2
2 2
2 x2 u2 − x|(0,s) u2 − u|(0,s) − u|(0,s) x2 − x|(0,s) ds
0
1 =
2 2 2 2 x2 u2 − x|(0,s) u|(s,1) − u|(0,s) x|(s,1) ds
0
and therefore 1 BF (x, u) ≤
x2 u2 ds = x2 u2 .
2
0
Next to the continuity of F we also obtain from this estimate that F ≤ 1. Injectivity up to sign is a consequence of Titchmarsh’s theorem (see [33, Theorem VII]). It states that if BF (x, u) = 0 then either x = 0 or u = 0 has to be true. This is exactly the characterization of injectivity up to sign provided by Proposition 4.4. The lower bound for F follows from F (x) =
2 3
if x(t) = 1 for all t.
A proof of F ≤ 1 has already been given in [34, Corollary 24]. But one seems to be only an upper bound which is not sharp.
38
4 What Are Quadratic Inverse Problems?
Conjecture 4.8 For F defined by (4.5) we conjecture that F =
2 . 3
This conjecture is based on two observations: For x(t) = 1, t ∈ (0, 1), we have 2 F (x) = 3 and numerical approximation of F suggests that F (x) is maximized by x ≡ 1 under the constraint x ≤ 1. Several attempts to prove this conjecture failed. For real spaces in [35, Proposition 2.3] local ill-posedness in the sense of Hofmann (cf. Definition 1.4) of F at every point was shown if the domain of F is restricted to nonnegative functions. Local ill-posedness in the sense of Hofmann in every point was shown in [36, Examples 3.1, 3.2] for the complex case. Since F is injective up to sign equation (4.1) only has isolated solutions. Thus, by Proposition 1.8 the mapping F is locally illposed in the sense of Ivanov (cf. Definition 1.5) at each data element. The mapping F is known to be weakly continuous and to not being compact, see [37, Proposition 1]. The type of autoconvolution discussed here appears for example if the density of two identically and independently distributed random variables with density supported in [0, 1] has to be computed. The corresponding inverse problem is to find the density of the underlying random variables from the density of their sum.
4.2.1.2 Truncated Autoconvolution of Functions with Uniformly Bounded Support The autoconvolution operation introduced above maps from L2 (0, 1) into L2 (0, 2). Restricting the images to the interval (0, 1) yields a mapping F : L2 (0, 1) → L2 (0, 1) given by F (x) (s) :=
s x(s − t) x(t) dt,
s ∈ (0, 1).
(4.7)
0
Proposition 4.9 The mapping F defined by (4.7) is a quadratic mapping from L2 (0, 1) into L2 (0, 1) with 0.6860 ≤ F ≤ 1. Proof The proposition can be proven analogously to Proposition 4.7. The lower bound for √ F follows from F (x) = 16 + π32 ≥ 0.6860 if x(t) = 2 cos π2 t for all t.
Obviously the mapping F is not injective up to sign because all functions with support contained in ( 12 , 1) are mapped to zero. Note that the lower bound for F is greater than the conjectured norm of F in the untruncated case, see previous subsection.
4.2 Examples
39
In [38] several further properties of F in case of real spaces were proven. For instance, that F is weakly continuous on properly restricted domains, that F (x) is a continuous function for all x, and that F is compact on certain subsets but not on the whole space. Weak continuity on the whole (real) space was shown in [34, Proposition 8]. The mapping F between real spaces is locally ill-posed in the sense of Hofmann everywhere, see [36, Example 2.1]. The same is true if the L2 -spaces are replaced by spaces of continuous functions, see [39, Example 2.2]. Truncated autoconvolution mappings as discussed here play an important role in spectroscopy. See, e.g., [40] for an application in appearance potential spectroscopy.
4.2.1.3 Autoconvolution of Periodic Functions We may interpret real-valued or complex-valued functions in L2 (0, 1) as periodic functions on the whole real line. Then the usual autoconvolution of periodic functions can be written as F (x) (s) =
s
1 x(s − t) x(t) dt +
x(s + 1 − t) x(t) dt,
s ∈ (0, 1).
(4.8)
s
0
Proposition 4.10 The mapping F defined by (4.8) is a quadratic mapping from L2 (0, 1) into L2 (0, 1) with F = 1. Proof The underlying symmetric bilinear mapping BF can be defined in the obvious way, cf. proof of Proposition 4.7. To show its boundedness we write 1 BF (x, u)2 =
⎞2 ⎛ s 1 ⎝ x(s − t) u(t) dt + x(s + 1 − t) u(t) dt ⎠ ds
0
0
s
and apply the Cauchy–Schwarz inequality to obtain 1
2 x|(0,s) u|(0,s) + x|(s,1) u|(s,1) ds. BF (x, u) ≤ 2
0
Using again the Cauchy–Schwarz inequality, but now for vectors with two components, we see x|(0,s) u|(0,s) + x|(s,1) u|(s,1) 2 2 2 2 ≤ x|(0,s) + x|(s,1) u|(0,s) + u|(s,1) = x u
40
4 What Are Quadratic Inverse Problems?
and therefore BF (x, u)2 ≤ x2 u2 . Thus, BF is bounded and F ≤ 1. Equality F = 1 follows from F (x) = x for x(t) = 1, t ∈ (0, 1).
Note that F defined by (4.8) is not injective up to sign, because we find x = 0 and u = 0 with BF (x, u) = 0 (cf. Proposition 4.4). Choose, e.g., x(t) = sin(2 π t)
and u(t) = sin(4 π t),
t ∈ (0, 1).
Exploiting the Fourier convolution theorem we see that BF (x, u) = 0 if and only if the Fourier transforms of x and u have disjoint supports. This type of autoconvolution is well-known in many branches of mathematics. In the sequel we will not consider the corresponding inverse problem, but we will use this type of autoconvolution for proving results for the before mentioned variants of autoconvolution (cf. Example 7.6).
4.2.2
Kernel-Based Autoconvolution in Laser Optics
The example of a quadratic mapping presented in this subsection originates from joint research work of TU Chemnitz (professorship ‘Inverse Problems’) and Max Born Institute for Nonlinear Optics and Short Pulse Spectroscopy in Berlin (research group ‘Solid State Light Sources’).
4.2.2.1 Ultra-Short Laser Pulses With today’s laser technology scientists are able to produce sequences of extremely short laser pulses. Up to 100 million pulses per second, each lasting only 5–100 fs, can be generated with so called femtosecond lasers. Such ultra-short pulses are the shortest events mankind can produce. To get a better idea of the duration of a femtosecond we note that a femtosecond is related to a second in the same way as a second is related to 31.7 million years. Also note that light travels only 0.3 μm/fs and that the period of visible light is about 2 fs. Since the reaction time of the pigments in the human eye is at 200 fs we are not able to see ultra-short laser pulses. Applications of ultra-short laser pulses are manifold. They allow, for example, to machine inside a material without affecting its surface or to drill and cut without measurable burr and without build-up heat. Also the observation of chemical reactions in realtime can be realized with the help of ultra-short laser pulses.
4.2 Examples
41
4.2.2.2 SD-SPIDER Method Characterizing ultra-short laser pulses is a major challenge due to the fact that no direct measurements can be obtained. Each pulse lasts only few optical cycles and thus even light is to slow to obtain full information about the variation of the electric field during one pulse. Existing approaches for characterizing ultra-short laser pulses are autocorrelation techniques, frequency-resolved optical gating (FROG) and spectral phase interferometry for direct electric field reconstruction (SPIDER). In 2010 the research group ‘Solid State Light Sources’ at Max Born Institute for Nonlinear Optics and Short Pulse Spectroscopy in Berlin presented the self-diffraction SPIDER method (SD-SPIDER), which allows for fully characterizing ultra-short pulses but requires the solution of an autoconvolution problem. Figure 4.1 shows the experimental setup at Max Born Institute Berlin and Fig. 4.2 provides a scheme of the SD-SPIDER method. At first the beam of ultra-short pulses generated by the laser is split into two beams. One passes through a long glass cylinder which results in chirped pulses, that is, due to frequency-dependent speed of light in glass frequencies are segregated and the pulse is stretched. The other beam is split again. After delaying one part slightly both parts are rejoined, resulting in a doubling of each pulse. In practice the doubling is realized by the first splitter since both sides of the splitter reflect the beam with a slight delay between both reflected beams. The two beams are rejoined inside a so called nonlinear optical material with such a delay that a chirped pulse meets a doubled pulse, resulting in a beam of spectrally sheared delayed pairs of pulses, which then enters a spectrograph. With the help of the spectrograph
Fig. 4.1 Experimental setup for SD-SPIDER method. The lines indicate the path of the laser pulses (photo courtesy by S. Birkholz from Max Born Institute Berlin, lines and text added by the author)
42
4 What Are Quadratic Inverse Problems? ultra-short pulse
streched pulse
glass cylinder
laser
delayed pair
delay spectrally sheared delayed pair
nonlinear material delay spectrograph Fig. 4.2 Scheme of experimental setup for SD-SPIDER method
the interferences in the frequency domain are recorded. The nonlinear material used at Max Born Institute Berlin is barium fluoride.
4.2.2.3 The Inverse Problem We do not go into the details of the full path from the measured data to absolute value and phase of the original laser pulse. Figure 4.3 shows a data set from the experiments in Berlin and Fig. 4.4 shows the data which results from preprocessing steps (mainly the Takeda algorithm) and which is the relevant information for the inverse problem part of the reconstruction process. The inverse problem consists in solving a kernel-based autoconvolution equation. Denote by L2C (0, 1) the space of complex-valued square integrable functions over the interval (0, 1) and let k : D(k) → C be a bounded and continuous function with domain D(k) = (s, t) : 0 ≤ s ≤ 2, max{0, s − 1} ≤ t ≤ min{s, 1} . Then the mapping F : L2C (0, 1) → L2C (0, 2) in (4.1) for the SD-SPIDER reconstruction problem is given by F (x) (s) =
min{s, 1}
k(s, t) x(s − t) x(t) dt, max{0, s−1}
which is the same as (4.5) if the kernel k is one everywhere.
s ∈ (0, 2),
(4.9)
4.2 Examples
43
200
150
100
50
0
390
395
400
405
410
frequency in THz
Fig. 4.3 SD-Interferograms with (black) and without (gray) nonlinear optical material recorded at Max Born Institute Berlin. The interferogram without nonlinear material is used for calculating the delay between the two pulses of a pair
200
150
100
50
0
390
395
400
frequency in THz
405
410
Fig. 4.4 Absolute value (black) and phase (gray) of right-hand side of the inverse problem (4.1). The phase is only sufficiently accurate where the absolute value is not close to zero. Thus only frequencies between 392 and 407 THz are of interest. This interval is rescaled to (0, 1) for simplicity. Implementation of the preprocessing steps (Takeda algorithm) was done by Steven Bürger (TU Chemnitz)
44
4 What Are Quadratic Inverse Problems?
0.9
2.5
0.85
|k(s, t)|
2
0.8
1.5
0.75
1
0.7
0.5
0.65
0 1
t
0.5
0.5
0 0
s
1
1.5
2
Fig. 4.5 A kernel function for (4.9) as it occurs in the SD-SPIDER method. The absolute value is given on the vertical axis and the phase is indicated by the gray level
The function x represents the Fourier transform of the desired electric field of the ultrashort laser pulse over time and the function F (x) is some intermediate step between the original electric field and the recorded SD-interferogram. In practice the absolute value of x can be measured, but the absolute value of F (x) is only accessible with very low accuracy. These facts can be incorporated into the reconstruction process and we refer to [34, 37] for such approaches. The kernel function depends on properties of the experimental setup. Essential influence have the thickness of the nonlinear material and the angle between the two beams arriving at it. Both parameters have to be measured as accurately as possible. The kernel always is bounded away from zero and it is bounded above. A typical kernel function is plotted in Fig. 4.5. Details on the kernel and an explicit formula can be found in [34, 41]. Proposition 4.11 Let |k(s, t)| ≤ c for all (s, t) in D(k). Then the mapping F defined by (4.9) is a quadratic mapping from L2C (0, 1) into L2C (0, 2) with F ≤ c. Proof The symmetric bilinear mapping BF from Proposition 4.2 is min{s, 1}
BF (x, u) (s) =
k(s, t)
x(s − t) u(t) + u(s − t) x(t) dt, 2
s ∈ (0, 2)
max{0, s−1}
and can be rewritten as
BF (x, u) (s) =
min{s, 1}
max{0, s−1}
k(s, t) + k(s, s − t) x(s − t) u(t) dt, 2
s ∈ (0, 2).
4.2 Examples
45
Thus we have 2 BF (x, u)2 ≤ c2 0
⎛ ⎜ ⎝
min{s, 1}
⎞2 ⎟ |x(s − t) u(t)| dt ⎠ ds,
max{0, s−1}
which yields BF (x, u) ≤ c x u as in the proof of Proposition 4.7.
In general the inverse problem with F defined in (4.9) is locally ill-posed everywhere, because for k ≡ 1 we obtain the kernelless autoconvolution problem discussed above. Fixing the amplitude and allowing only perturbations in the phase, locally well-posed situations can be observed, see [36, Proposition 3.3]. Analogously to the case k ≡ 1 the mapping F is weakly continuous and not compact, see [37, Propositions 1, 2].
4.2.3
Schlieren Tomography
Ultrasound transducers are integral parts in medical diagnostics and treatment as well as in material testing and many other fields. Next to the wanted effects of the produced acoustic pressure also unwanted effects can occur (e.g. cavitation in medical applications). Therefore the shape and the intensity of the pressure distribution should be known as accurately as possible. But direct measurements at sufficiently many points inside the sonicated medium are too expensive and too time-consuming. Schlieren tomography is a widely applied and not too complex alternative to direct measurements, which allows to visualize the pressure distribution in fluids with high resolution and accuracy. The principal construction of a Schlieren system is shown in Fig. 4.6. A cylindrical tank filled with water is illuminated by parallel light. The ultrasound transducer, the pressure distribution of which shall be examined, is mounted at the top of the cylinder and at the bottom a sound absorbing material avoids reflections. Due to variations in the water’s density light is diffracted more or less. A lens focuses the diffracted light onto a screen and the undiffracted light is filtered out. The results are dark and light areas on the screen corresponding to negative and positive pressure regions along the light’s path. Up to negligible side effects the intensity of light arriving at a point of the screen is proportional to the square of the integral over the pressure distribution along the corresponding ray. Thus, up to the square, the relation between pressure and observed image is the same as in X-ray imaging between density of the body and observed image. Taking images from many sides of the cylinder, tomographic reconstruction of the pressure distribution in space is possible. The corresponding mapping F in (4.1) is the pointwise square of the Radon transform.
46
4 What Are Quadratic Inverse Problems?
ultrasound transducer water tank
screen (camera)
lense
lense
filter
light source
diffracted light ultrasound absorber
undiffracted light
Fig. 4.6 The principal setup of a Schlieren imaging system
To make things more precise we look at a horizontal slice Ω := {(v, w) ∈ R2 : v 2 + w2 < 1} of the cylinder and denote by S1 := {σ ∈ R2 : σ12 + σ22 = 1} the set of all directions for which Schlieren images are taken. Here, the screen is assumed to be in parallel to the chosen direction (σ and −σ yield the same image, but for simplicity we do not exclude this doubling). By σ ⊥ := (−σ2 , σ1 ) we denote the direction which is orthogonal to a given direction σ . Figure 4.7 shows a sketch of the setting. With this notation at hand the Schlieren mapping is given by ⎞2 ⎛ √ 2 1−s ⎟ ⎜ x(s σ + t σ ⊥ ) dt ⎟ F (x) (s, σ ) := ⎜ ⎠ , ⎝ √ −
s ∈ (−1, 1), σ ∈ S1 ,
(4.10)
1−s 2
where x is real-valued and defined on Ω. Proposition 4.12 The mapping F : L2R (Ω) → L1R (0, 1) × S1 defined by (4.10) is a quadratic mapping with F ≤ 4 π.
4.2 Examples
47
Fig. 4.7 Parametrization of two-dimensional slice through the cylinder
screen
σ 0
Proof The underlying symmetric bilinear mapping BF is given by ⎛ √ ⎞ 2 1−s ⎜ ⎟ B(x, u) (s, σ ) := ⎜ x(s σ + t σ ⊥ ) dt ⎟ ⎝ ⎠ √ −
1−s 2
⎛ √ ⎞ 2 1−s ⎜ ⎟ ⎜ u(s σ + t σ ⊥ ) dt ⎟ ⎝ ⎠ √ −
1−s 2
and we have BF (x, u) = (−1,1)×S1
√ 1−s 2 ⊥ x(s σ + t σ ) dt √ − 1−s 2
√ 1−s 2 ⊥ d(s, σ ). u(s σ + t σ ) dt √ − 1−s 2
The Cauchy–Schwarz inequality applied√to both inner integrals and then to the outer integral in combination with the estimate 1 − s 2 ≤ 1 yields BF (x, u) ≤2 (−1,1)×S1
√ √ 1−s 2 1−s 2 x(s σ + t σ ⊥ )2 dt u(s σ + t σ ⊥ )2 dt d(s, σ ) √ √ −
1−s 2
−
1−s 2
48
4 What Are Quadratic Inverse Problems?
≤ 2
√ 2 1−s
x(s σ + t σ ⊥ )2 dt d(s, σ )×
√
(−1,1)×S1 −
×
1−s 2
√ 2 1−s
√
(−1,1)×S1 −
u(s σ + t σ ⊥ )2 dt d(s, σ ).
1−s 2
The first double integral reduces to √ 2 1−s
√
(−1,1)×S1 −
x(s σ + t σ ⊥ )2 dt d(s, σ ) =
√ 2 1−s
1
√
S1 −1 −
1−s 2
x(s σ + t σ ⊥ )2 dt ds dσ
1−s 2
x2 dσ = 2 π x2
= S1
and the second analogously to 2 π u2 . Thus, BF (x, u) ≤ 4 π.
4.3
Local Versus Global Ill-Posedness
Quadratic mappings are nonlinear and ill-posedness properties may vary from point to point. On the other hand Proposition 4.5 shows, that information about local ill-posedness or well-posedness at each point is contained in an arbitrarily small ball around zero (or any other point) and thus should have some structure. As a result in this direction the following proposition states that ill-posedness or well-posedness of quadratic mappings does not vary on rays. Proposition 4.13 If a quadratic mapping F is locally well-posed (or ill-posed) in the sense of Ivanov at y0 then it is locally well-posed (or ill-posed) at t y0 for all positive t. Proof We apply Proposition 1.6. Let F be locally well-posed at y0 . Denote by (yk )k∈N a sequence in R(F ) converging to t y0 and let (x˜k )k∈N be a sequence in X with F (x˜k ) = yk for all k. Then 1t yk → y0 and F ( √1t x˜k ) = 1t yk . Local well-posedness at y0 implies existence of a sequence (xk )k∈N with F (xk ) = y0 and √1t x˜k − xk → 0. Thus, for the √ √ √ sequence ( t xk )k∈N we have F ( t xk ) = t y0 and x˜k − t xk → 0. This shows local well-posedness at t y0 .
4.4 Geometric Properties of Quadratic Mappings’ Ranges
49
If F is locally ill-posed at y0 but not locally ill-posed at some point t y0 , then the proof’s first part would imply local well-posedness at y0 (use y˜0 := t y0 ). Thus, the proof is complete.
There are quadratic mappings which are everywhere locally well-posed, for instance strong quadratic isometries, see Sect. 6.1. There are also quadratic mappings which are everywhere locally ill-posed. This is for instance the case for autoconvolution of functions with uniformly bounded support presented in Sect. 4.2.1. The problem, whether there exist quadratic mappings which are locally ill-posed at some point but locally well-posed at another point, has been solved in [3]. There, an example shows that local well-posedness and local ill-posedness can occur simultaneously at different points.
4.4
Geometric Properties of Quadratic Mappings’ Ranges
Here we collect some observations on the geometric structure of the range of a quadratic mapping. These observation will not be used in subsequent chapters, but may help to get some intuition about the behavior of quadratic mappings. As we will see, ellipses play a central role. Therefore we start with some remarks on the definition of ellipses. In the plane R2 an ellipse with half-axis lengths a and b in standard form is the set of all points (x1 , x2 ) such that x12 x22 + 2 = 1. a2 b A diameter of an ellipse is a line segment between two points of the ellipse through the ellipse’s center. Two diameters are called conjugate diameters if the one is in parallel with the tangents to the ellipse at the other’s intersection points with the ellipse. Analogously conjugate radii can be defined. Now let E be an ellipse in the plane R2 with center (x1 , x2 ) and with conjugate radii represented by the vectors [r1 , r2 ]T and [s1 , s2 ]T . Then E = {x + α r + β s : α 2 + β 2 = 1}. If, on the other hand, a set E can be represented in this form, then E is an ellipse with corresponding center and conjugate radii. In analogy to convex combinations, we call a linear combination, the squared coefficients of which add to one, elliptic combination. The same way ellipses can be described as elliptic combinations of two conjugate radii, n-dimensional ellipsoids are determined by there center and n conjugate radii (tangents in the definition of conjugate radii have to be replaced by tangent planes). With this technique finite-dimensional ellipsoids can be defined in infinite-dimensional spaces, too.
50
4 What Are Quadratic Inverse Problems?
Given three elements x, r, s in a Hilbert space X, by ell(x, r, s) := {x + α r + β s : α 2 + β 2 = 1} we denote the ellipse with center x and conjugate radii r, s. In the following X and Y are real Hilbert spaces and F : X → Y is a quadratic mapping. Proposition 4.14 Each quadratic mapping maps two-dimensional subspaces to elliptic cones. More precisely, F (x) + F (u) F (x) − F (u) F span {x, u} = , , BF (x, u) t ell . 2 2 t ≥0
Proof For coefficients a, b with a 2 + b 2 = 1 we have F (a x + b u) = a 2 F (x) + 2 a b BF (x, u) + b2 F (u) =
a 2 − b2 a 2 + b2 (F (x) + F (u)) + (F (x) − F (u)) + 2 a b BF (x, u) 2 2
=
2 a2 − 1 1 (F (x) + F (u)) + (F (x) − F (u)) + 2 a b BF (x, u). 2 2
Because (2 a 2 − 1)2 + (2 a b)2 = 4 a 4 − 4 a 2 + 1 + 4 a 2 (1 − a 2 ) = 1, (u) F (x)−F (u) we see that ell(0, x, u) is mapped to ell F (x)+F , , BF (x, u) . The observation 2 2 span {x, u} =
t ell(0, x, u)
t ≥0
completes the proof.
From the proof we immediately see that ellipses centered at the origin are mapped to ellipses centered somewhere else. In particular, intersections of the unit sphere and twodimensional subspaces are mapped to ellipses. This observation instigates the idea that intersections of the unit sphere and n-dimensional subspaces are mapped to ellipsoids. But this is not the case if n > 2. Figure 4.8 shows the image of the unit sphere in
4.4 Geometric Properties of Quadratic Mappings’ Ranges
51
1.0
0.5
0.0
–0.5
–1.0 –1.0
–0.5
–0.0
–0.5
–1.0 –5.0 0.0 0.5 1.0
1.0
1.0
0.5
0.5
0.0
0.0
–0.5
–0.5
–1.0 –1.0
–0.5
0.0
0.5
–1.0 –0.5 0.0 0.5 1.0
–1.0 –1.0
1.0
1.0
0.5
0.5
0.0
0.0
–0.5
–0.5
–1.0 –1.0
–0.5
0.0
–0.5
1.0 0.5 0.0 –0.5 –1.0
–1.0 –1.0
–0.5
0.0
0.5
–1.0 –0.5 0.0 0.5 1.0
–0.5
0.0
–0.5
1.0 0.5 0.0 –0.5 –1.0
Fig. 4.8 Image of the unit sphere in R3 under the quadratic mapping (x1 , x2 , x3 ) → (x12 + √ √ √ 2 x2 x3 , x22 + 2 x1 x3 , x32 + 2 x1 x2 ). Upper left: unit sphere, middle left: full image, lower left: same as middle left but seen from the opposite direction, middle right: same as middle left but with the cap cut off, lower right: same as lower left but with the cap cut off
52
4 What Are Quadratic Inverse Problems?
three-dimensional space under the quadratic mapping F : R3 → R3 defined by √ √ √ F (x) := x12 + 2 x2 x3 , x22 + 2 x1 x3 , x32 + 2 x1 x2 ,
x ∈ R3 .
This is not an ellipsoid. Nevertheless, the structure of the unit sphere’s image under quadratic mappings can be described with the help of ellipses. Note, that the image of the whole space then is the cone spanned by this image set. Proposition 4.15 Let (ek )k∈N be an orthonormal basis in X and denote by Sn−1 the intersection of the unit sphere in X with span {e1 , . . . , en }. Then F (S1 ) is an ellipse and F (Sn−1 ) =
x∈Sn−1
ell
F (en ) + F (x) F (en ) − F (x) , , BF (en , x) 2 2
for n > 1. Proof That F (S1 ) is an ellipse follows from the proof of Proposition 4.14. For each element in Sn−1 there is some x in Sn−2 such that the element is contained in ell(0, en , x). Thus, F (Sn−1 ) =
F ell(0, en , x)
x∈Sn−2
and the assertion follows as in the proof of Proposition 4.14.
4.5
Literature on Quadratic Mappings
There is only very few literature on quadratic mappings, especially on quadratic mappings in infinite-dimensional spaces. Most of the publications focus on autoconvolutions. In this section we provide a brief overview of the literature relevant for this part of the book. The core material this book builds up on are publications on de-autoconvolution as an inverse problem. There was a first accumulation in the 1990s [35, 38, 42], followed by the articles [43–45]. A second accumulation started in the past five years with [20, 34, 36, 37, 39, 41, 46–51]. In the engineering literature several practical methods for de-autoconvolution in finitedimensional settings are discussed. Some articles of this type can be found in the reference list of [40]. There are also relatively old works on convexity properties of the range of quadratic mappings in finite dimensions, see [52–54]. For extensions of those results see [55, 56] and references therein.
4.5 Literature on Quadratic Mappings
53
In principle a quadratic mapping can be regarded as a special case of tensors. Thus, results about tensors apply to some extend also to quadratic mappings. We mention [57, 58] here, where singular value decompositions for tensors in finite dimensions are discussed. Although such concepts could be useful for solving quadratic inverse problems, closer inspection and numerical tests lead to the decision to not follow this path.
5
Tikhonov Regularization
Abstract
We briefly consider Tikhonov-type regularization for solving quadratic inverse problems.
A simple but effective regularization method to approximate solutions of ill-posed nonlinear equations is the method of Tikhonov. We restrict our attention to separable Hilbert spaces X and Y . Then Tikhonov’s method for quadratic equations (4.1) consists in minimizing the Tikhonov functional ¯ 2, Tα (x, y) := F (x) − y2 + α x − x
x ∈ X, y ∈ Y
with respect to x. The element y is an approximation to the exact right-hand side y † of (4.1), the reference element x¯ in X deals as initial guess of the exact solution and the positive regularization parameter α controls the trade-off between data fitting and stabilization. Classical results in [59] on existence, stability and convergence of Tikhonov minimizers are based an the assumption that the mapping F is sequentially weak-to-weak continuous, see also Sect. 1.3. In general, quadratic mappings do not have this property. An example is the mapping F : X → R defined by F (x) = x2 . For an orthonormal basis (ek )k∈N we have F (ek ) = 1 although the sequence (ek ) converges weakly to zero. Verification of weak-to-weak continuity reduces to its verification at zero. Proposition 5.1 A quadratic mapping is sequentially weak-to-weak continuous on X if and only if it is sequentially weak-to-weak continuous at zero.
© Springer Nature Switzerland AG 2018 J. Flemming, Variational Source Conditions, Quadratic Inverse Problems, Sparsity Promoting Regularization, Frontiers in Mathematics, https://doi.org/10.1007/978-3-319-95264-2_5
55
56
5 Tikhonov Regularization
Proof Let (xk )k∈N be a sequence in X converging weakly to some x in X. The mapping F is weak-to-weak continuous if for each such sequence η, F (xk ) − F (x) → 0 for all η in Y . Rewriting F (xk ) − F (x) = F (xk − x) + 2 BF (xk − x, x) and using the notion of adjoint bilinear mappings defined by (4.4) we see η, F (xk ) − F (x) = η, F (xk − x) + 2 η, BF (xk − x, x) = η, F (xk − x) + 2 BF∗ (x, η), xk − x). Thus, η, F (xk ) − F (x) → 0 if and only if η, F (xk − x) → 0. In other words, F is weak-to-weak continuous if and only if xk − x 0
⇒
F (xk − x) 0,
which by F (0) = 0 is simply the definition of weak-to-weak continuity at zero.
All examples of quadratic mappings provided in Sect. 4.2 are weak-to-weak continuous (see provided references there) and hence Tikhonov regularization is a stable and convergent approximation method for the solutions of (4.1). Remark 5.2 The problem of non-injectivity of quadratic mappings carries over to the Tikhonov functional. If the reference element x¯ is zero, we obviously have that Tα (x, y) = Tα (−x, y) for all x and y. This issue cannot be solved by choosing x¯ = 0, but only slightly tempered. For all x with x, x ¯ = 0 we still have Tα (x, y) = Tα (−x, y). Thus, we have to expect multiple Tikhonov minimizers. Analogously, there might by multiple norm minimizing solutions to (4.1) (i.e., solutions with minimal distance to x), ¯ even if x¯ = 0. For x¯ = 0 we want to mention the useful observation that the range of sensible regularization parameters is bounded above. This behavior is typical only for Tikhonov regularization with quadratic mappings and for sparsity promoting regularization with linear mappings (cf. Proposition 9.7). Proposition 5.3 Let y ∈ Y and set αmax := 2 sup Re F (x), y. x∈X x≤1
If α ≥ αmax , then 0 ∈ argmin Tα (x, y). x∈X
5 Tikhonov Regularization
57
If, in addition, α > αmax or F is injective up to sign, then argmin Tα (x, y) = {0}. x∈X
Proof If x = 0 we have Tα (x, y) = F (x)2 − 2 Re F (x), y + y2 + α x2 ≥ F (x) − 2 Re F (x), y + y + 2 Re F 2
2
! x , y x2 x
= F (x)2 + y2 ≥ y2 = Tα (0, y), proving the first assertion. If α > αmax , the first inequality sign is strict. If F is injective up to sign, x = 0 implies F (x) = 0, making the second inequality sign a strict one.
Remark 5.4 If αmax is chosen greater than in the proposition, the proposition remains true. An easy to calculate replacement is 2 F y. Theory for Tikhonov regularization with nonlinear mappings is well developed. But in practice this method suffers from the need for global minimizers. For quadratic mappings the Tikhonov functional is not convex, which makes numerical minimization challenging. The only available tailor-made algorithm for our concrete minimization problem is the TIGRA method proposed in [44].
6
Regularization by Decomposition
Abstract
We propose a regularization method which splits a quadratic ill-posed inverse problem in Hilbert spaces into an ill-posed linear problem and a well-posed quadratic one. The technique is based on the notion of quadratic isometries. We present the decomposition approach and its application in a regularization method. Numerical tests complete the chapter.
Throughout this chapter we consider the ill-posed quadratic problem (4.1) in real or complex separable Hilbert spaces X and Y . The results presented in this chapter were published in [47, 49], but only real Hilbert spaces were considered there.
6.1
Quadratic Isometries
It is well known that a linear operator preserves inner products if and only if it preserves norms. In the quadratic case there are (strong) isometries, which preserve both inner products and norms, and there are (weak) isometries, which only preserve norms. Definition 6.1 A quadratic mapping F is a strong isometry if F (x), F (u) = x, u2
© Springer Nature Switzerland AG 2018 J. Flemming, Variational Source Conditions, Quadratic Inverse Problems, Sparsity Promoting Regularization, Frontiers in Mathematics, https://doi.org/10.1007/978-3-319-95264-2_6
59
60
6 Regularization by Decomposition
for all x and u from X and a weak isometry if F (x) = x2 for all x. Obviously, each strong isometry is also weak. The following example shows that there are weak quadratic isometries which are not strong. Example 6.2 Define F : R2 → R2 by "
# x12 − x22 F (x) := . 2 x1 x2 Then F (x)2 = (x12 − x22 )2 + 4 x12 x22 = x2 for all x, but $ %" #& %" #&' $" # " #'2 1 0 1 0 F ,F = −1 = 0 = , . 0 1 0 1 Thus, F is a weak isometry but not a strong one.
An example of a strong quadratic isometry in infinite-dimensional spaces will be given in Sect. 6.2. Figure 6.1 visualizes a strong quadratic isometry mapping between R2 and R3 . For checking isometric properties of quadratic mappings we provide the following criterion.
Fig. 6.1 Example√of a strong quadratic isometry acting between R2 and R3 . The element (x1 , x2 ) is mapped to (x12 , 2 x1 x2 , x22 ). Left-hand side: unit disc in R2 . Right-hand side: image set of unit disc
6.1 Quadratic Isometries
61
Proposition 6.3 Let (ei )i∈N be an orthonormal basis in X. A quadratic mapping F : X → Y is a strong isometry if and only if the following two conditions hold: (i) BF (ei , ej ) = (ii) The set
⎧ ⎨1,
if j = i,
⎩ √1 , 2 {BF (ei , ej ) : i
j < i. ∈ N, j ≤ i} is an orthogonal system.
Proof Necessity follows from calculation of BF (ei , ej ), BF (ek , el ). With (4.2) we obtain BF (ei , ej ), BF (ek , el ) =
=
1 1 ei , ek ej , el + ei , el ej , ek 2 2 ⎧ ⎪ ⎪1, if i = j = k = l, ⎨ 1 2, ⎪ ⎪ ⎩ 0,
if i = k = l = j or i = l = k = j, else,
which directly yields the two conditions in the proposition. For sufficiency we observe & %∞ &' $ %∞ x i ei , F uk e k F i=1
=
k=1
∞ ∞ ∞ ∞
xi xj uk ul BF (ei , ej ), BF (ek , el )
i=1 j =1 k=1 l=1
=
∞ ∞ i=1 j =1
x i x j ui uj =
%∞ i=1
$
&2 x i ui
=
∞ i=1
x i ei ,
∞
'2 uk e k
.
k=1
As one might expect from an isometry, each strong quadratic isometry is continuously invertible. Remember that quadratic mappings cannot be injective because F (x) = F (−x) for all x. Thus, we have to use a slightly generalized notion of continuous invertibility. In view of Remark 1.7 strong quadratic isometries always are locally well-posed at each point. Proposition 6.4 Let F be a strong quadratic isometry and denote by F −1 (y) the full preimage of F at some point y. If a sequence (yk )k∈N in Y converges to some y in Y and if (xk )k∈N is a sequence of corresponding preimages xk from F −1 (yk ) and x is a preimage of y, then (xk )k∈N converges to x or −x or it decomposes into two subsequences (xk+ )k∈N and (xk− )k∈N such that xk+ → x and xk− → −x.
62
6 Regularization by Decomposition
Proof Define index sets I + := {k ∈ N : Re xk , x ≥ 0} and I − := {k ∈ N : Re xk , x < 0}. If (xk ) does not converge to x or −x both sets have infinitely many elements. Then (xk+ )k∈N is the subsequence (xk )k∈I + and (xk− )k∈N is the subsequence (xk )k∈I − . Since F is a strong isometry we have ( xk+ − x2 = xk+ 2 − 2 Re xk+ , x + x2 = yk − 2 Re yk , y + y √ where Re yk , y yields the same value for both complex roots. The first summand converges to y and the second to −2 y. Thus, xk+ − x2 → 0. Analogously, we obtain ( xk− − (−x)2 = xk− 2 + 2 Re xk− , x + x2 = yk − 2 Re yk , y + y → 0. The second equality follows from xk− , x2 = yk , y and Re xk− , x < 0.
Remark 6.5 From the proposition we immediately see that strong isometries are injective up to sign. If F (u) = F (x), choose y = F (x), yk = F (u) and xk = u in the proposition to obtain u = x or u = −x. For linear mappings continuous invertibility is equivalent to closedness of the range. Since this in general is not true for nonlinear mappings we now prove the following result. Proposition 6.6 Let F be a weak quadratic isometry which is weak-to-weak continuous. Then its range R(F ) is closed. Proof Let (F (xk ))k∈N be a sequence in R(F ) which converges to some y in Y . Then xk 2 = F (xk ) → y. Thus, there is a weakly convergent subsequence (xkl )l∈N with limit x and weak-to-weak continuity of F implies F (xkl ) F (x). Together with F (xkl ) y we obtain y = F (x).
6.2 Decomposition of Quadratic Mappings
63
If we look at strong quadratic isometries F and replace the image space Y by the closed subspace span R(F ), then F is always weak-to-weak continuous and, thus, has closed range. Proposition 6.7 Each strong quadratic isometry F : X → Y is weak-to-weak continuous as a mapping between the Hilbert spaces X and span R(F ). Proof Let (xk )k∈N be a sequence in X converging weakly to zero. Then for all x in X we have F (xk ), F (x) = xk , x2 → 0, that is, F (xk ), y → 0 for all y in R(F ). Consequently this also holds for all y in span R(F ). Now let y be an element from span R(F ) and let (yl )l∈N be a sequence in span R(F ) converging to y. Then |F (xk ), y| ≤ F (xk ) y − yl + |F (xk ), yl | for all l and all k. Weak convergence of (xk ) implies boundedness of (xk ) and thus F (xk ) ≤ c for all k with some c > 0. For arbitrary positive ε we may choose l¯ such that ¯ This shows |F (xk ), y| ≤ ε y −yl¯ ≤ 2εc and k¯ such that |F (xk ), yl¯| ≤ 2ε for all k ≥ k. ¯ for k ≥ k. We deduce weak convergence F (xk ) 0 if F is considered as a mapping between X and span R(F ), which proves weak-to-weak continuity of F at zero. Proposition 5.1 yields weak-to-weak continuity on X.
6.2
Decomposition of Quadratic Mappings
Bounded linear operators can be decomposed into a partial isometry and a selfadjoint operator. In a similar spirit we suggest the decomposition of quadratic mappings into a quadratic isometry and a linear operator. Theorem 6.8 Each quadratic mapping F can be decomposed into a strong quadratic isometry Q : X → 2 (N) and a densely defined linear operator A : 2 (N) → Y such that F (x) = A Q(x) for all x in X.
(6.1)
64
6 Regularization by Decomposition
The proof is constructive and will be given in the following. The two lemmas provide a possible choice of the quadratic part Q and the linear part A. But as we will discuss below, other choices are possible and maybe advantageous. For easier handling of indices we define the mapping κ : {(i, j ) ∈ N × N : 1 ≤ j ≤ i} → N by κ(i, j ) := j +
i (i − 1) . 2
(6.2)
This is a bijection. Lemma 6.9 Let (ei )i∈N be an orthonormal basis of X. The mapping Q : X → 2 (N) defined by
Q(x)
κ(i,j )
⎧√ ⎨ 2 x, e x, e , i j := 2 ⎩x, ei ,
if j < i, if j = i
(6.3)
for (i, j ) in N × N with 1 ≤ j ≤ i and x in X is a strong quadratic isometry. Proof The underlying symmetric bilinear mapping of Q is given by ⎧ ⎨ √1 x, e u, e + x, e u, e , i j j i 2 BQ (x, u) κ(i,j ) := ⎩x, e u, e , i i
if j < i, if j = i.
(6.4)
Thus, BQ (ei , ej ) is one or √1 at position κ(i, j ) if i = j or i = j , respectively, and zero 2 at all other positions. The assertion of the lemma now follows from Proposition 6.3.
Lemma 6.10 Let (ei )i∈N be an orthonormal basis of X and let F be quadratic. Denote by D(A) ⊆ 2 (N) the set of all z in 2 (N) for which ⎞ ⎛ ∞ i−1 √ ⎝ A z := 2 zκ(i,j ) BF (ei , ej ) + zκ(i,i) BF (ei , ei )⎠ i=1
(6.5)
j =1
converges. Then the corresponding mapping A : D(A) → Y is linear and its domain D(A) is dense in 2 (N).
6.2 Decomposition of Quadratic Mappings
65
Proof Linearity is obvious and since BF is bounded we have BF (ei , ej ) ≤ F for all
i and j . Thus, the dense subspace 1 (N) of 2 (N) belongs to the domain of A. Now the proof of the main theorem is quite simple. Proof of Theorem 6.8 With Q from (6.3) and A from (6.5) we have F (x) = A Q(x) for all x in X. Since F (x) is defined for each x, in particular we see that the range of Q belongs to the domain of A.
The mapping A in the decomposition may be unbounded, even if F is injective up to sign as the following example shows. In Sect. 6.5 we discuss a setting with bounded A. Example 6.11 This example is based on an idea by Steven Bürger (TU Chemnitz, November 2016), which has not been published elsewhere. Let X = Y = 2R (N) and define F by [F (x)]1 := x2
[F (x)]1+k := [Q(x)]k ,
and
k ∈ N,
for all x ∈ 2R (N) with Q from (6.3). This mapping is injective up to sign because Q is injective up to sign (cf. Remark 6.5) and A : 2R (N) → 2R (N) is given by [A z]1 =
∞
zκ(k,k)
[A z]1+k = zk ,
and
k ∈ N.
k=1
If we choose z(n) in 2R (N) with (n)
zκ(k,l)
⎧ ⎨ √1 , n := ⎩0,
if k = l and k ≤ n, else,
then z(n) = 1 and 2 A z(n) 2 ≥ [A z(n)]1 = n. Thus, A z(n) → ∞ for n → ∞, which proves unboundedness of A.
The constructed decomposition in the proof of the theorem is not the only one. Choosing another quadratic isometry Q one can improve the properties of A. If, for instance, A in the proof is injective and bounded, we can write it as A = A˜ U with selfadjoint A˜ : Y → Y and a linear isometry U : 2 (N) → Y . This follows from the polar
66
6 Regularization by Decomposition
decomposition of the adjoint A∗ . Then Q˜ := U Q is again a strong quadratic isometry and ˜ F = A˜ Q. The following proposition states that the converse is also true: each strong quadratic isometry is the composition of a linear isometry and Q from (6.3), with the underlying basis (ei )i∈N chosen arbitrarily. Note that restriction to the image space 2 (N) comes from the context of the present section, but from the proof we immediately see that 2 (N) can be replaced by any separable Hilbert space. Proposition 6.12 Let Q˜ : X → 2 (N) be a strong quadratic isometry and let (ei )i∈N be ˜ → 2 (N) an orthonormal basis in X. Then there is a linear isometry U : span {R(Q)} 2 such that the strong quadratic isometry U Q˜ : X → (N) attains the form of Q in (6.3). Proof By Proposition 6.3 the set √ ˜ i) : i ∈ N ∪ Q(e 2 BQ˜ (ei , ej ) : i, j ∈ N, j < i ˜ Let (fi )i∈N be the standard orthonormal basis in is an orthonormal basis in span {R(Q)}. ˜ → 2 (N) by 2 (N) and define the linear mapping U : span {R(Q)} ˜ i ) := fκ(i,i) U Q(e
and
U BQ˜ (ei , ej ) := fκ(i,j )
for all i and j with j < i (the index map κ is defined in (6.2)). Then U transfers an orthonormal basis to an orthonormal basis and thus U is an isometry. From ⎛ ⎞ ∞ i−1 √ ⎝ ˜ 2 x, ei x, ej fκ(i,j ) + x, ei 2 fκ(i,i) ⎠ U Q(x) = i=1
for all x in X
j =1
˜ attains the form of Q in (6.3). we see that U Q
Regarding different choices of Q in the decomposition (6.1) the question arises whether boundedness of corresponding operators A depends on the choice of Q. The answer is ‘No’. Proposition 6.13 Let Q : X → 2 (N) be as in (6.3) and let Q˜ : X → 2 (N) be a second strong quadratic isometry. Further let A, A˜ : 2 (N) → Y be two linear mappings such ˜ if and only if A is bounded on 2 (N). that A˜ Q˜ = A Q. Then A˜ is bounded on span {R(Q)} ˜ → 2 (N) such Proof By Proposition 6.12 there is a linear isometry U : span {R(Q)} ˜ = Q. Consequently A˜ Q ˜ = AU Q ˜ and hence A˜ = A U on span {R(Q)} ˜ and that U Q
6.3 Inversion of Quadratic Isometries
67
˜ If A˜ is also A = A˜ U −1 on 2 (N). If A is bounded, then A˜ is bounded on span {R(Q)}. 2 ˜ bounded on span {R(Q)}, then A is bounded on (N).
The isometry Q defined by (6.3) has the advantage that it is weak-to-weak continuous. This property will be used in the next section. As a consequence its range is closed (cf. Proposition 6.6). The same is true for all isometries Q˜ := U Q constructed as described above. The weak-to-weak continuity of Q is a direct consequence of Proposition 6.7. Proposition 6.14 The isometry Q defined by (6.3) is weak-to-weak continuous. Proof We only have to show that the range of Q spans the whole space 2 (N) (cf. Proposition 6.7). Let (ei )i∈N be the orthonormal basis used in the definition of Q and let (fi )i∈N be the standard orthonormal basis in 2 (N). The range of Q spans the whole space if (fi ) belongs to span R(Q). But this can easily be seen because
fκ(i,j )
⎧ ⎨ √1 Q(ei + ej ) − Q(ei ) − Q(ej ), 2 = ⎩Q(e ), i
if j < i, if j = i.
6.3
Inversion of Quadratic Isometries
With the decomposition (6.1) at hand regularization of a quadratic mapping F reduces to regularization of one possibly unbounded linear operator. At least if A is bounded this can be done by standard techniques. The interested reader finds information about regularization of unbounded linear operators in [60]. After inverting A by some regularization method we have to invert the strong quadratic isometry Q. As shown in Proposition 6.4 such mappings are continuously invertible and therefore no regularization is required. Only the fact that the solution z of the regularized linear problem typically lies in the orthogonal complement of the null space of A and possibly not in the range of Q has to be handled somehow. This can be done by projecting z onto the range of Q. A more advanced approach to tackle this problem will be presented in the next section. In the present section we state results at first for general quadratic isometries Q and then we apply them to the concrete Q defined in (6.3). Projection of some element z in 2 (N) onto the range of an isometry Q : X → 2 (N) can be realized by solving Q(x) − z → min . x∈X
(6.6)
68
6 Regularization by Decomposition
Existence and stability of minimizers can be shown for weak-to-weak continuous quadratic mappings, especially for the strong isometry introduced in (6.3) (cf. Proposition 6.14). Proposition 6.15 Let Q : X → 2 (N) be a weak-to-weak continuous quadratic mapping. Then the minimization problem (6.6) has at least one solution. If (zk )k∈N is a sequence in 2 (N) with limit z and if (xk )k∈N is a sequence of corresponding minimizers, then (xk ) has a convergent subsequence and the limits of all convergent subsequences are solutions to (6.6). Proof To prove existence, take a minimizing sequence (xk )k∈N and observe xk 2 = Q(xk ) ≤ Q(xk ) − z + z → inf Q(x) − z + z, x∈X
that is, (xk ) is bounded. Thus, there is a weakly convergent subsequence and due to weak lower semi-continuity of the norm limits of convergent subsequences are minimizers of (6.6). For proving stability take a sequence (zk )k∈N with limit z and a sequence of corresponding minimizers (xk )k∈N . Then xk 2 = Q(xk ) ≤ Q(xk ) − zk + zk ≤ Q(0) − z + z = 2 z, that is, (xk ) is bounded. Thus, there is a weakly convergent subsequence and the limit x¯ of each weakly convergent subsequence satisfies (with (xk ) denoting the subsequence) Q(x) ¯ − z ≤ lim inf Q(xk ) − zk ≤ lim inf Q(x) − zk = Q(x) − z k→∞
k→∞
for all x. To obtain convergence in norm we observe Q(x) ¯ − z ≤ lim inf Q(xk ) − z ≤ lim sup Q(xk ) − z k→∞
k→∞
¯ − zk + zk − z ≤ lim sup Q(xk ) − zk + zk − z ≤ lim sup Q(x) k→∞
k→∞
= Q(x) ¯ − z, which yields Q(x) ¯ − z = lim Q(xk ) − z. k→∞
Equivalently we may write x ¯ 4 − 2 Re Q(x), ¯ z = lim xk 4 − 2 Re Q(xk ), z . k→∞
6.3 Inversion of Quadratic Isometries
69
Weak convergence Q(xk ) Q(x) ¯ implies Re Q(xk ), z → Re Q(x), ¯ z and thus xk converges to x. ¯ Now, weak convergence in combination with convergence of the norms ¯ → 0.
yields xk − x Next we show how to calculate the minimizers of (6.6). We start with a lemma and then provide two theorems, one for real Hilbert spaces and one for complex Hilbert spaces. The ∗ of B appearing in the two theorems has been defined in (4.4). adjoint BQ Q Lemma 6.16 Let Q be a weak quadratic isometry. If Re Q(x), z ≤ 0 for all x, then zero is a solution to (6.6). Else the normalized minimizers of (6.6) coincide with the maximizers of Re Q(x), z → max .
(6.7)
x=1
and the minimizers of (6.6) have norm
(
Re Q(x), ˜ z, where x˜ is a maximizer of (6.7).
Proof If Re Q(x), z ≤ 0 for all x, then Q(0) − z2 = z2 ≤ Q(x)2 − 2 Re Q(x), z + z2 = Q(x) − z2 , that is, zero is a minimizer of (6.6). Else set x := t u where t = x ≥ 0 and u = 1, and minimize with respect to t for each u. The minimum of hu (t) := Q(t u) − z2 = t 4 − 2 t 2 Re Q(u), z + z2 is at ⎧ ⎨0, t= √ ⎩ Re Q(u), z,
if Re Q(u), z ≤ 0, if Re Q(u), z > 0.
Thus, for each u with u = 1 we have min hu (t) = t ≥0
⎧ ⎨z2 ,
if Re Q(u), z ≤ 0, ⎩z2 − (Re Q(u), z)2 , if Re Q(u), z > 0,
and the minimization problem (6.6) turns out to be equivalent to (Re Q(u), z)2 →
max
,
u=1 Re Q(u),z>0
(6.8)
70
6 Regularization by Decomposition
which can be rewritten as Re Q(u), z → max . u=1
Theorem 6.17 Let Q be a weak quadratic isometry between real Hilbert spaces X and 2R (N). If√ Q(x), z ≤ 0 for all x, then zero is a solution to (6.6). Else each minimizer is of the form λ x˜ where λ is the largest eigenvalue of the selfadjoint bounded linear operator C : X → X defined by ∗ C x := BQ (x, z)
for all x in X
and x˜ is a corresponding normalized eigenelement. In particular, C has positive eigenvalues. Proof By Lemma 6.16 the minimization problem (6.6) is equivalent to (6.7). If x is a maximizer of (6.7), which exists by Proposition 6.15, then it is a stationary point of the Lagrange function (cf. [61, Theorem 43.A]), that is, there is some non-zero real number λ such that C x − λ x = 0. Here we use that C is the Fréchet derivative of u → 12 Q(u), z. It remains to show that the stationary points of the Lagrange function with largest Lagrange multiplier λ are indeed maximizers of (6.7). Taking some stationary point x with multiplier λ this follows from Q(x), z = x, C x = x, λ x = λ. In addition we see that existence of a maximizer implies existence of positive eigenvalues.
Theorem 6.17 holds, in principle, also if X and 2 (N) are considered over the complex numbers. But we are faced with two more or less technical difficulties, which force us to take some additional care of the complex case. On the one hand the mapping x → ∗ (x, z) is antilinear, that is, B ∗ (a x, z) = a B ∗ (x, z) for complex numbers a. Thus we BQ Q Q would have to use spectral theory for antilinear operators, which is of course quite similar to spectral theory of linear operators, but hardly covered in the literature. On the other hand, optimization over complex Hilbert spaces is hardly covered in the literature, too. Our aim is to reduce the complex case to a problem in real Hilbert spaces. Before we state the theorem we have to introduce some notation. Let (ek )k∈N be an orthonormal basis in the complex Hilbert space X. By ) * 2 XR := ak ek : a ∈ R (N) k∈N
6.3 Inversion of Quadratic Isometries
71
we denote the real Hilbert space spanned by (ek ). For x in X we define Re x :=
(Re x, ek ) ek ,
Im x :=
k∈N
(Im x, ek ) ek . k∈N
Thus, Re x ∈ XR and Im x ∈ XR . Analogously, for z in 2C (N) we define Re z := (Re z1 , Re z2 , . . .),
Im z := (Im z1 , Im z2 , . . .),
which are elements of 2R (N). The isometry Q : X → 2C (N) can be considered as a mapping on XR and we define its real part R : XR → 2R (N) and its imaginary part S : XR → 2R (N) by R(x) := Re Q(x)
and
S(x) := Im Q(x)
for x in XR .
(6.9)
The mappings R and S are quadratic mappings, too. Theorem 6.18 Let Q be a weak quadratic isometry between complex Hilbert spaces X and 2C (N) with real part R and imaginary part S as defined in (6.9). If the inequality Re Q(x), √z ≤ 0 holds for all x, then zero is a solution to (6.6). Else each minimizer is of the form λ x˜ where λ is the largest eigenvalue of the selfadjoint bounded linear operator C : XR × XR → XR × XR defined by " # # " u BR∗ (u, Re z) − BS∗ (v, Re z) + BS∗ (u, Im z) + BR∗ (v, Im z) C := −BR∗ (v, Re z) − BS∗ (u, Re z) − BS∗ (v, Im z) + BR∗ (u, Im z) v for all u and v in XR and x˜ = u˜ + i v˜ with [u, ˜ v] ˜ T being a corresponding normalized eigenelement. In particular, C has positive eigenvalues. Proof The mapping C is obviously linear and bounded. Selfadjointness follows from $
" # " #' , + u u˜ C , = BR∗ (u, Re z) − BS∗ (v, Re z) − BS∗ (u, Im z) − BR∗ (v, Im z), u˜ v v˜ , + + −BR∗ (v, Re z) − BS∗ (u, Re z) + BS∗ (v, Im z) − BR∗ (u, Im z), v˜ = Re z, BR (u, u) ˜ − Re z, BS (v, u) ˜ − Im z, BS (u, u) ˜ − Im z, BR (v, u) ˜ − Re z, BR (v, v) ˜ − Re z, BS (u, v) ˜ + Im z, BS (v, v) ˜ − Im z, BR (u, v) ˜
72
6 Regularization by Decomposition
= BR∗ (u, ˜ Re z), u − BS∗ (u, ˜ Re z), v − BS∗ (u, ˜ Im z), u − BR∗ (u, ˜ Im z), v − BR∗ (v, ˜ Re z), v − BS∗ (v, ˜ Re z), u + BS∗ (v, ˜ Im z), v − BR∗ (v, ˜ Im z), u $ " # " #' $" # " #' u˜ u u u˜ = C , = ,C . v˜ v v v˜ By the definition of R and S we have Q(x) = Q(Re x + i Im x) = Q(Re x) − Q(Im x) + 2 i BQ (Re x, Im x) = R(Re x) + i S(Re x) − R(Im x) − i S(Im x) + 2 i BR (Re x, Im x) − 2 BS (Re x, Im x) and thus Re Q(x) = R(Re x) − R(Im x) − 2 BS (Re x, Im x), Im Q(x) = S(Re x) − S(Im x) + 2 BR (Re x, Im x). From the definition of the inner product in 2C (N) we see Re Q(x), z = Re Q(x), Re z + Im Q(x), Im z, where the first inner product is in 2C (N) and the second and third are in 2R (N). Now, remembering the definition of C, we rewrite the objective function as , + Re Q(x), z = Re z, R(Re x) − R(Im x) − 2 BS (Re x, Im x) + , + Im z, S(Re x) − S(Im x) + 2 BR (Re x, Im x) = BR∗ (Re x, Re z), Re x − BR∗ (Im x, Re z), Im x − BS∗ (Re x, Re z), Im x − BS∗ (Im x, Re z), Re x + BS∗ (Re x, Im z), Re x − BS∗ (Im x, Im z), Im x + BR∗ (Re x, Im z), Im x + BR∗ (Im x, Im z), Re x $" # " #' Re x Re x = ,C . Im x Im x From here on the proof is analogous to the proof of Theorem 6.17.
6.3 Inversion of Quadratic Isometries
73
We now specify the mapping C from Theorems 6.17 and 6.18 for the strong isometry Q from Lemma 6.9. Lemma 6.19 Let X be a real Hilbert space and let Q : X → 2R (N) be defined by (6.3) ∗ (x, z) then has the symmetric with an orthonormal basis (ei )i∈N . The mapping x → BQ matrix representation Dz in RN×N with respect to (ei ) given by ⎧ ⎨ √1 z , if j < i, 2 κ(i,j ) [Dz ]i,j := ⎩z , if j = i, κ(i,j )
where κ is the index map defined in (6.2). Proof From ⎛ ⎞' ∞ ∗ BQ (x, z), ei = z, BQ (x, ei ) = z, BQ ⎝ x, ej ej , ei ⎠ $
j =1
=
∞ x, ej z, BQ (ej , ei ) j =1
we see [Dz ]i,j = z, BQ (ei , ej ) =
k ∞
. zκ(k,l) BQ (ei , ej ) κ(k,l)
k=1 l=1
and by the definition of BQ , see (6.4), for j ≤ i we have -
BQ (ei , ej )
. κ(k,l)
=
⎧ 1 ⎪ ⎪ √2 , ⎨ 1, ⎪ ⎪ ⎩ 0,
if j < i and k = i and l = j, if j = i and k = l = i, else,
which together with the previous equation shows the assertion of the lemma.
Proposition 6.20 Let Dz be the (infinite) matrix from Lemma 6.19 and let Q be defined by (6.3) with an orthonormal basis (ei )i∈N . Then the mapping C in Theorem 6.17 has the matrix representation Dz with respect to (ei ) and the mapping C in Theorem 6.18 has the matrix representation " # DRe z DIm z , DIm z −DRe z which has to be understood as a mapping from 2R (N) × 2R (N) into 2R (N) × 2R (N).
74
6 Regularization by Decomposition
Proof The matrix representation of C in Theorem 6.17 is a direct consequence of Lemma 6.19. To obtain the matrix representation for C in Theorem 6.18 we note that the imaginary part S of Q is zero and the real part R is the mapping Q restricted to the real Hilbert space XR . Consequently, # ∗ (·, Re z) B ∗ (·, Im z) BQ Q , C= ∗ (·, Im z) −B ∗ (·, Re z) BQ Q "
where BQ is considered as a mapping from XR × XR into 2R (N). Lemma 6.19 applied to ∗ -mappings in real spaces completes the proof. the four BQ
Remark 6.21 From the matrix representation of C (real and complex case) and the fact that z is in 2 (N) we immediately see that C is a Hilbert–Schmidt operator and thus compact. We close this section with two properties of projections onto the range of an quadratic isometry. Proposition 6.22 Denote by ζ an orthogonal projection of some z in 2 (N) onto the range of a weak quadratic isometry Q : X → 2 (N). Then ζ ≤ z. Proof Let x in X be such that Q(x) = ζ . If x = 0 then ζ = 0 and the assertion is obviously true. If x = 0, Lemma 6.16 in combination with the Cauchy–Schwarz inequality yields
! x ζ = x = Re Q , z ≤ z. x 2
Proposition 6.23 If zero is an orthogonal projection of some z in 2 (N) and also of −z onto the range of a weak quadratic isometry Q : X → 2 (N), then z = 0. Proof By Lemma 6.16 we have Re Q(x), z ≤ 0
and
Re Q(x), −z ≤ 0
for all x in X. Thus, Re ˜z, z = 0 for all z˜ in span R(Q), that is, for all z in 2 (N) (cf. proof of Proposition 6.14), which is equivalent to z = 0.
As a consequence of the last proposition we immediately see that projecting a nontrivial subspace onto the range of a quadratic isometry always yields a non-trivial set of projections.
6.4 A Regularization Method
6.4
75
A Regularization Method
In view of Theorem 6.8 regularized inversion of a quadratic mapping can be realized in two steps: regularized inversion of a linear mapping and inversion of a quadratic isometry. The second has been discussed in the previous section. The first can be done by standard regularization methods in Hilbert spaces, for example Tikhonov regularization, Landweber iteration or spectral cut-off. Here we focus on Tikhonov regularization A z − y2 + α z2 → min
z∈2 (N)
with positive regularization parameter α and data y in Y for approximate but stable solution of A z = y †,
z ∈ 2 (N).
(6.10)
For Q we choose (6.3). Throughout the present section we assume that the linear mapping A is bounded. From Example 6.11 we know that this is not always the case, but in the next section we apply the described regularization method to a mapping F for which one can prove boundedness of A. An issue that only becomes visible at the second sight requires our attention and will have essential influence on the regularization procedure to be developed: From regularization theory in Hilbert spaces (cf. [10]) we know that given noisy data y δ with positive noise level δ, that is y δ − y † ≤ δ, corresponding regularized solutions zαδ converge to the norm minimizing solution z† of (6.10) if δ tends to zero and the regularization parameter α is chosen in the right way depending on δ. Thus, the projections onto the range of Q converge to the projections of z† . Here the question arises, whether Q(x † ) is a projection of z† onto the range of Q, where x † denotes a solution of (4.1). Else we cannot expect convergence of the minimizers xαδ of (6.6) with z = zαδ to x † . If A is injective then obviously z† belongs to the range of Q. If A is not injective we have to modify the regularization procedure in a way which forces the regularized solutions zαδ to converge to some solution of (6.10) which lies in the range of Q. Such a solution always exists because y † belongs to the range of F by assumption. Tikhonov regularization can be modified to shift the limit of regularized solutions: A z − y δ 2 + α z − ζ 2 → min . z∈2 (N)
Corresponding minimizers then converge to a solution of (6.10) which minimizes the distance to ζ over the set of all solutions. Choosing a suitable reference element ζ in the penalty term may force convergence to Q(x † ). In the method we propose below ζ is chosen iteratively.
76
6 Regularization by Decomposition
We first present the algorithm and then discuss its construction: 1. 2. 3. 4. 5. 6. 7.
Choose α0 > 0, q ∈ (0, 1) and τ > 1. Set ζ (0) = 0 ∈ 2 (N) and k = 1. Set αk = q αk−1 . Solve (A A∗ + αk I ) yk = y δ − A ζ (k−1) for yk . Set z(k) = A∗ yk + ζ (k−1). Find a minimizer xk of (6.6) with z = z(k) . Set ζ (k) = Q(xk ). If A ζ (k) − y δ ≤ τ δ then stop. Else increase k by one and go to 2.
The three main ingredients are Tikhonov regularization as described above, the discrepancy principle for stopping the algorithm and an alternating projections approach to determine reference elements for the Tikhonov penalty. Solving the Tikhonov minimization problem A z − y δ 2 + αk z − ζ (k−1)2 → min
z∈2 (N)
is equivalent to solving the first order optimality condition (A∗ A + αk I ) z = A∗ y δ + αk ζ (k−1).
(6.11)
This holds for both real and complex Hilbert spaces 2 (N) and Y . In a finite-dimensional situation (e.g. after discretization) the matrix A∗ A is extremely large. To see this take an orthonormal basis (ei )i∈N in X and consider only the n-dimensional subspace spanned by e1 , . . . , en . Applying Q to this subspace yields a subset of 2 (N) which spans a subspace of dimension n (n+1) . Consequently A would map this subspace to some finite-dimensional 2 subspace of Y with dimension m ≤ n (n+1) . The corresponding matrix A∗ A then would 2 4
× n (n+1) or, in other words, the matrix would have more than n4 have dimension n (n+1) 2 2 entries. To avoid solving such a large system we rewrite the first order optimality condition (6.11) in a way which involves A A∗ instead of A∗ A. In finite dimensions the correspond. In the ing matrix would have m × m entries and typically m is much smaller than n (n+1) 2 example presented in the next section we will have m = 2 n. We start rewriting (6.11) by substituting z˜ := z − ζ (k−1). Then (6.11) becomes (A∗ A + αk I ) z˜ = A∗ y δ − A∗ A ζ (k−1) or, equivalently, z˜ = (A∗ A + αk I )−1 A∗ y δ − A ζ (k−1) .
6.4 A Regularization Method
77
A simple calculation shows (A∗ A + αk I )−1 A∗ = A∗ (A A∗ + αk I )−1 , which yields z˜ = A∗ (A A∗ + αk I )−1 y δ − A ζ (k−1) . To obtain z˜ we thus have to solve (A A∗ + αk I ) yk = y δ − A ζ (k−1) for yk and then calculate z˜ = A∗ yk or, equivalently, z = A∗ yk + ζ (k−1). This is done in steps 3 and 4 of the algorithm. Step 5 is based on Theorems 6.17 and 6.18 in connection with Proposition 6.20. Here we have to find the largest eigenvalue and a corresponding eigenelement of a symmetric Hilbert–Schmidt operator (or a symmetric matrix in finite dimensions). There are several standard algorithms for this purpose. We mention the power iteration (also known as Von Mises iteration). The iteration is stopped in step 7 if the so called discrepancy principle is satisfied, that is, if the obtained approximate solution xk fulfils F (xk ) − y δ ≤ τ δ where τ should be slightly greater than one. Note that F (xk ) = A ζ (k). The idea behind this well-known principle is that there is no reason to get closer to the noisy data y δ than the exact data y † is. To complete the description of the algorithm the interplay of the Tikhonov minimization problem and the projection in steps 5 and 6 has to be discussed. At first we consider the idea without regularization. So the question is: How to iteratively approximate Q(x † )? Having approximations of Q(x † ) we also have approximations of x † . Thus, everything can be considered in 2 (N). Obviously, Q(x † ) belongs to the intersection of the range of Q and the shifted null space z† + N (A). The alternating projections method can be used to find points in the intersection of two sets. One starts at some point (we use z† ) by projecting it orthogonally onto one of the sets. Then this projected point is projected orthogonally onto the other set. With this second projection point the procedure is repeated. One can show that this alternating projections method converges weakly to a point in the intersection of the two sets if both sets are closed and convex (see [62, Theorem 1.3(a)]). The set z† +N (A) is closed and convex, but the range of Q is only closed and not convex. Thus, it is not clear whether the method converges and we were not able to solve this issue. Now we incorporate regularization into the idea of alternating projections. Projecting some ζ (k−1) onto z† + N (A) means that we search for the solution of (6.10) with minimal distance to the original point ζ (k−1). Such a solution can be found approximately but stable by minimizing a Tikhonov functional with reference element ζ (k−1) in the penalty term. For fixed k the regularization parameter α in the Tikhonov functional can be chosen by
78
6 Regularization by Decomposition
starting with a large value α0 and then decreasing α by some factor q until some stopping rule (e.g. discrepancy principle) is fulfilled. In principle this parameter choice has to be realized for each k. To save computation time parameter choice and (outer) iteration with respect to k can be combined because in the first outer iterations we do not need maximal accuracy. Thus, solving the Tikhonov minimization problem only for one regularization parameter and decreasing the regularization parameter for the next iteration with new reference element seems to be a valid approach. A similar idea is used for instance in the TIGRA method (see [44]). Due to the lack of convexity of the range of Q we do not have a full convergence proof for our proposed algorithm. We only have proven that it is stable, which is a consequence of standard regularization theory in Hilbert spaces and of Proposition 6.15, and we gave a precise motivation why we expect that the algorithm yields useful results.
6.5
Numerical Example
To demonstrate practicality of the algorithm proposed in the previous section we implemented it to solve the complex-valued autoconvolution problem with full data described in Sect. 4.2.1. The corresponding quadratic mapping was introduced in (4.5). This problem is very similar to the autoconvolution problem for the SD-SPIDER method described in Sect. 4.2.2. The only difference is that in (4.9) we have an additional kernel function k. Assuming that this kernel has a multiplicative structure ˜ k(s ˜ − t), k(s, t) = k(t)
(s, t) ∈ D(k)
with a function k˜ : (0, 1) → C inverting the SD-SPIDER-related mapping is equivalent to ˜ Of course our method can be applied to the kernel-based inverting (4.5) and dividing by k. autoconvolution problem even if the kernel has not such a multiplicative structure. The only additional difficulty would be that we no longer would be able to explicitly calculate several expressions when discretizing our spaces and mappings. Instead we would have to use numerical integration. We discretize functions in X = L2C (0, 1) by cutting off their Fourier coefficients. For k in N set / 0 k k γk := (−1) 2 and ek (t) := e2 π i γk t ,
t ∈ (0, 1).
6.5 Numerical Example
79
Then (ek )k∈N is the usual Fourier basis, but re-indexed to avoid negative indices. The indices k = 1, 2, . . . are mapped to γk = 0, 1, −1, 2, −2, . . .. For our computations we only use the first n Fourier coefficients, that is, approximate solutions belong to the span of e1 , . . . , en . The mapping Q from (6.3) satisfies [Q(x)]κ(k,l) = 0 for k > n, that is, only the first n (n+1) components of Q(x) are non-zero. 2 To obtain the linear mapping A we need BF (ek , el ) for k = 1, 2, . . . and l = 1, . . . , k. Evaluating the integral in (4.6) we obtain ⎧ ⎨s e2 π i γk s , F (ek ) (s) = ⎩(2 − s) e2 π i γk s ,
if s ∈ (0, 1), if s ∈ (1, 2),
(6.12)
and
⎧ 2π iγ s 1 ⎨ k − e2 π i γ l s , e BF (ek , el ) (s) = 2 π i (γk −γl ) −1 2 π i γ k s − e2 π i γ l s , ⎩ 2 π i (γk −γl ) e
if s ∈ (0, 1), if s ∈ (1, 2),
(6.13)
for k = 1, 2, . . . and l = 1, . . . , k −1. Here we see that the images of F (x) = A Q(x) for x in span {e1 , . . . , en } are continuous. Thus, to discretize them we may use piecewise linear interpolation on an equispaced grid with m + 1 nodes in (0, 2) at 2mk for k = 0, 1, . . . , m. Proposition 6.24 The mapping A is bounded. Proof For z in D(A) we have (A z)(s) =
%k−1 ∞ k=1
l=1
& √ 2 zκ(k,l) 2 π i γk s e − e2 π i γl s + zκ(k,k) s e2 π i γk s . 2 π i (γk − γl )
We show that this series converges in L2C (0, 2) for each z in 2C (N), that is, we have D(A) = 2C (N). Then A is obviously bounded. We restrict our attention to convergence in L2C (0, 1). Analogous steps lead to convergence in L2C (1, 2) and thus to convergence in L2C (0, 2). At first we show that an (s) :=
n k−1 zκ(k,l) 2 π i γk s e − e2 π i γ l s , γk − γl k=1 l=1
s ∈ (0, 1),
80
6 Regularization by Decomposition
defines a Cauchy sequence (an )n∈N and then that bn (s) :=
n
zκ(k,k) s e2 π i γk s ,
s ∈ (0, 1),
k=1
defines a Cauchy sequence (bn )n∈N . We have −
n n n k−1 n n zκ(k,l) 2 π i γl s zκ(k,l) 2 π i γl s zκ(l,k) 2 π i γk s e =− e = e γk − γl γk − γl γk − γl k=1 l=1
l=1 k=l+1
k=1 l=k+1
and hence %k−1 & n n zκ(k,l) zκ(l,k) e2 π i γ k s . + an (s) = γk − γl γk − γl k=1
l=1
l=k+1
For m in N (without loss of generality we assume m < n) we obtain m % n & z κ(l,k) an − am = e2 π i γ k · γk − γl 2
k=1
l=m+1
n
+
k=m+1
2 %k−1 & n zκ(k,l) zκ(l,k) e2 π i γ k · + γk − γl γk − γl 2 l=1
l=k+1
n 2 k−1 2 m n n z zκ(l,k) zκ(l,k) κ(k,l) = + + γk − γl γk − γl γk − γl k=1 l=m+1
k=m+1 l=1
LC (0,1)
l=k+1
and the Cauchy–Schwarz inequality yields n 2 % n &% n & z 1 κ(l,k) |zκ(l,k) |2 ≤ γk − γl |γk − γl |2 l=m+1
l=m+1
l=m+1
as well as k−1 2 n z zκ(l,k) κ(k,l) + γk − γl γk − γl l=1
l=k+1
≤
%k−1 l=1
n 1 1 + 2 |γk − γl | |γk − γl |2 l=k+1
& %k−1 l=1
|zκ(k,l)| + 2
n l=k+1
& |zκ(l,k)|
2
.
6.5 Numerical Example
81
Together with ∞ l=1 l =k
∞ 1 1 π2 ≤ 2 = 3 |γk − γl |2 l2 l=1
and an extensive re-indexing we see % m %k−1 && n n n π2 2 2 2 2 |zκ(l,k) | + |zκ(k,l)| + |zκ(l,k)| an − am ≤ 3 =
2 π2 3
k=1 l=m+1
k=m+1
n k−1
m k−1
%
|zκ(k,l) |2 −
k=1 l=1
l=1
&
l=k+1
|zκ(k,l)|2 .
k=1 l=1
Since ∞ k−1
|zκ(k,l)|2 ≤ z2 < ∞
k=1 l=1
we obtain an − am → 0 if m, n → ∞. Now we come to (bn )n∈N . Here for m in N (again m < n) we have 2 1 1 n 2 π i γk s bn − bm = zκ(k,k) s e ds = |s|2 2
0
m+1
0
n 2 2 π i γk s zκ(k,k) e ds m+1
2 1 n n 2 π i γk s zκ(k,k) e |zκ(k,k)|2 , ≤ ds = 0
m+1
k=m+1
which together with ∞
|zκ(k,k)|2 ≤ z2 < ∞
k=1
implies bn − bm → 0 if m, n → ∞. To complete the proof note that %√ & 2 an + bn A z = lim n→∞ 2 π i and this limit exists for all z in 2C (N).
82
6 Regularization by Decomposition
The adjoint A∗ : L2C (0, 2) → 2C (N) of A is given by [A∗ y]κ(k,l)
⎧ ⎨y, F (e ), k = √ ⎩ 2 y, BF (ek , el ),
l = k,
(6.14)
l 0
⇔
x † − x0 < x † .
In particular, x0 = 0 implies x † = 0.
100
7 Variational Source Conditions
Proof Let v satisfy (7.5) and define G : X → X by G(x) := x0 + F [x]∗ v. Then G is a contraction, because G(x) − G(u) = 2 BF∗ (x − u, v) ≤ 2 BF∗ (x − u, · ) v = 2 BF (x − u, · ) v ≤ 2 F v x − u holds for all x and u and we have 2 F v < 1. Thus, Banach’s fixed point theorem implies existence of x † with x † = G(x † ), which is equivalent to (7.4). Now let x † and v satisfy (7.4). Then x † − x0 = F [x † ]∗ v = 2 BF∗ (x † , v) ≤ 2 F v x † ≤ x † and strict inequality x † − x0 < x † holds if x † = 0. Simple manipulations lead to x0 2 ≤ 2 Re x † , x0 and x0 2 < 2 Re x † , x0 if x † = 0. From these two inequalities one easily obtains the first equivalence. The second equivalence is not hard to prove, too, since Re x † , x0 > 0
implies x † = 0. The proposition shows that for non-trivial x † the a priori chosen reference element x0 has to be close enough to the unknown solution x † . Typically, no useful a priori information is available and one chooses x0 = 0. Then source condition and smallness condition are only satisfied if x † = 0. Thus, the classical concept of source conditions for obtaining convergence rates in case of nonlinear inverse problems is not applicable for quadratic mappings.
7.4
Variational Source Conditions Are the Right Tool
We now show that variational source conditions (7.2) are always satisfied in case of quadratic inverse problems and thus are a suitable tool for proving convergence rates (7.1). This results has not been published elsewhere. Theorem 7.3 Let F be quadratic, weakly continuous and injective up to sign. Then for each x † in X and for each positive β with β < 1 there is a concave index function ϕ such that the variational source condition (7.2) holds on X.
7.5 Sparsity Yields Variational Source Conditions
101
Proof Fix x † and note that due to injectivity up to sign we have S = {x † , −x † } for the set of solutions in (7.2). Thus, dist(x, S)2 = min x − x † 2 , x + x † 2 = x2 + x † 2 + 2 min{−Re x, x † , Re x, x † } = x2 + x † 2 − 2 |Re x, x † | for all x in X. The assertion follows from Theorem 3.2 if Assumption 3.1 can be verified. Items (i) and (ii) of Assumption 3.1 are trivially true. For item (iii) we observe −β E † (x) + Ω(x) = (1 − β) x2 − β x † 2 + 2 β |Re x, x † | for all x, which is a lower semicontinuous functional. Item (iv) follows from ˜ x2 − 2 β˜ |Re x, x † | + β˜ x † 2 β˜ E † (x) − Ω(x) = −(1 − β) ≤ β˜ x † 2 for β˜ ∈ (β, 1).
7.5
Sparsity Yields Variational Source Conditions
In the previous section we saw that there are always a constant β and an index function ϕ such that a variational source condition (7.2) is satisfied for a fixed quadratic mapping F . Now we demonstrate a method to obtain a concrete function ϕ depending on the behavior of the exact solutions and of the mapping F . The core of this section has been published in [20]. First we consider general quadratic mappings and later we apply the results to autoconvolution of functions with uniformly bounded support. The following lemma provides a variational source condition for very specific quadratic mappings, which will be the basis for the more general result. Lemma 7.4 If there are an exact solution x † to (4.1) and an orthonormal basis (ek )k∈N in X such that (i) F (ek ) k∈N is an orthonormal system in Y , (ii) BF (ek , el ) = 0 for all k and l in N with k = l, (iii) {k ∈ N : x † , ek = 0} is finite, that is, x † is sparse with respect to (ek )k∈N ,
102
7 Variational Source Conditions
then a variational source condition (7.2) is satisfied with β=1
ϕ(t) = 2
and
√ n t,
where n is the number of non-zero coefficients of x † . Proof Set xk† := x † , ek and xk := x, ek for all x and all k. Further set fk := F (ek ). Then (fk )k∈N is an orthonormal system in Y . From F (x) = F (x † )
(xk† )2 = xk2
⇔
for all k
we see that the set S of all solutions to (1.1) is S = {x ∈ X : xk = xk† or xk = −xk† for all k in N}. Thus, dist(x, S)2 =
∞
min |xk − xk† |2 , |xk + xk† |2 .
k=1
To simplify this expression further, for each x in X we define a sequence (ξk (x))k∈N by ξk (x) :=
⎧ ⎨1,
if Re (xk xk† ) ≥ 0,
⎩−1, else.
Then we easily derive dist(x, S)2 =
∞
|xk − ξk (x) xk†|2 .
k=1
Fix x and set x˜ † :=
∞ k=1
ξk (x) xk† ek .
(7.6)
7.5 Sparsity Yields Variational Source Conditions
103
Obviously, x˜ † ∈ S. From (i) and (ii) in the lemma we thus obtain 2 ∞ ∞ † 2 2 F (x) − F (x ) = F (x) − F (x˜ ) = xk fk − (x˜k ) fk †
2
†
2
k=1
=
k=1
∞ ∞ 2 x − (x˜ † )2 2 = xk − ξk (x) x †2 xk + ξk (x) x †2 k k k k k=1
k=1
and a simple calculation shows xk + ξk (x) x †2 ≥ x † 2 . k k Denoting by I := {k ∈ N : xk† = 0} the support of (xk† )k∈N with cardinality n we obtain F (x) − F (x † )2 ≥
∞ ∞ xk − ξk (x) x † 2 x † 2 = xk − ξk (x) x † x † 2 k k k k k=1
k=1
and by applying the Cauchy–Schwarz inequality and the triangle inequality we obtain the estimate 1 F (x) − F (x ) ≥ n †
2
1 ≥ n
%
∞ xk − ξk (x) x † x † k
&2
k
k=1
∞ 2 ,2 1 + † † xk − ξk (x) xk ξk (x) xk = x˜ † , x − x˜ † . n k=1
On the other hand, we have dist(x, S)2 − x2 + x † 2 = x − x˜ † 2 − x2 + x˜ † 2 + + , , = −2 Re x˜ † , x − x˜ † ≤ 2 x˜ † , x − x˜ † , completing the proof.
Note that if F is the autoconvolution of periodic functions (see Sect. 4.2.1), then (i) and (ii) in the lemma are satisfied with (ek )k∈N being the Fourier basis in L2 (0, 1). Theorem 7.5 Let F : X → Y be quadratic and injective up to sign and denote the solutions to (4.1) by x † and −x † . If there exist an orthonormal basis (ek )k∈N in X and a
104
7 Variational Source Conditions
bounded linear operator A : Y → Y such that (i) A F (ek ) k∈N is an orthonormal system in Y , (ii) A BF (ek , el ) = 0 for all k and l in N with k = l, (iii) {k ∈ N : x † , ek = 0} is finite, that is, x † is sparse with respect to (ek )k∈N , then a variational source condition (7.2) is satisfied with X replaced by the subset M := {x ∈ X : Re (xk xk† ) ≥ 0 for all k or Re (xk xk† ) ≤ 0 for all k} with β =1
and
√ ϕ(t) = 2 A n t,
where n is the number of non-zero coefficients of x † . The solutions x † and −x † are interior points of M. Proof The solution x † of (4.1) is also a solution of A F (x) = A y †, x ∈ X. Thus we can apply Lemma 7.4 to the quadratic mapping A F and obtain the variational source condition dist(x, SA F )2 ≤ x2 − x † 2 + 2
√ n A F (x) − A F (x † )
for all x in X, where SA F denotes the set of all solutions to A F (x) = A y † , x ∈ X. The estimate A F (x) − A F (x †) ≤ A F (x) − F (x † ) yields the right-hand side of the desired variational source condition. From (7.6) in the proof of Lemma 7.4 we know that dist(x, SA F )2 =
∞
|xk − xk† |2 = x − x † 2 ≥ dist(x, {x †, −x † })2
k=1
if Re (xk x † ) ≥ 0 for all k. On the other hand, if Re (xk x † ) ≤ 0 for all k, then dist(x, SA F )2 =
∞
|xk + xk† |2 = x + x † 2 ≥ dist(x, {x † , −x † })2 .
k=1
Thus, for all x in M we have dist(x, SA F )2 ≥ dist(x, {x † , −x † })2 , which proves the variational source condition on M.
7.5 Sparsity Yields Variational Source Conditions
105
To show that x † and −x † are interior points of M we show that balls centered at the solutions with radius r :=
min
k∈N: xk† =0
|xk† |
lie in M. From x − x † ≤ r we obtain |xk − xk† | ≤ |xk | for all k with xk† = 0. Thus, |xk |2 − 2 Re (xk xk† ) ≤ 0, which yields Re (xk xk† ) ≥ 0. Analogously, x + x † ≤ r implies Re (xk xk† ) ≤ 0.
Note that the variational source condition in the theorem does not hold on the whole space, but only on a smaller set M. Nevertheless convergence rates can be obtained from such a variational source condition because the exact solutions are interior points of M and, thus, regularized solutions lie in M if the data noise level is small enough, see, e.g. [13]. Example 7.6 We consider autoconvolution of functions with uniformly bounded support, that is, X = L2C (0, 1), Y = L2C (0, 2) and F is given by (4.5). Let (ek )k∈N be the Fourier basis as described in Sect. 6.5 and let A : Y → Y be defined by (A y)(s) := y(s) + y(s + 1),
s ∈ (0, 1).
Then one easily verifies that √ A F is the autoconvolution of periodic functions defined in (4.8) and that A ≤ 2. Thus, Theorem 7.5 is applicable because the convolution theorem yields assumptions (i) and (ii) in the above theorem.
Lemma 7.4 can be extended to non-sparse solutions x † . With a technique very similar to the proof of Theorem 11.7 in Part III we would obtain a variational source condition (7.2) with some positive β and with an index function ⎛
⎞ † √ 1 2 |xk | + 2 2 n + 1 t ⎠ , ϕ(t) = inf ⎝ n∈N 1 − β
t ≥ 0,
(7.7)
|k|>n
which is essentially determined by the decay of the coefficients of x † . The variational source condition on a subset M of X in Theorem 7.5 can be proven for non-sparse x † , too. But x † and −x † are no longer interior points of M. Thus, we cannot ensure that regularized solutions belong to M and the usual convergence rates proof does not work.
Part III Sparsity Promoting Regularization
8
Aren’t All Questions Answered?
Abstract
We collect several questions concerning sparsity promoting regularization which remained unanswered during the past years of extensive research in this field.
Sparsity promoting regularization techniques for linear operator equations, which will be introduced in the next chapter, were a very active field of research during the first 10 years of the new millennium. Methods were suggested, algorithms were developed and the theoretical backing had been extended extensively. The major sparsity promoting regularization method is 1 -regularization and this method is used in many applications today and is an integral part of signal processing tool boxes. Research in the inverse problems community started in 2004 with the influential paper [64] and went on for several years. From about 2010 on the focus changed to other, more involved sparsity promoting methods including regularization with non-convex penalties and it seemed that the theory of 1 -regularization was more or less complete, at least complete enough to ensure proper behavior of corresponding algorithms in applications. In the third part of this book we resume research on the fundamentals of 1 -regularization and answer questions which were not posed in first years, but are of importance for understanding the method and for verifying its applicability in a wide field of problems. These questions include: • What happens if the frequent assumption, that the sought-for solution has a sparse representation in a certain basis, fails? Is 1 -regularization capable of producing sufficiently precise approximations to such non-sparse solutions?
© Springer Nature Switzerland AG 2018 J. Flemming, Variational Source Conditions, Quadratic Inverse Problems, Sparsity Promoting Regularization, Frontiers in Mathematics, https://doi.org/10.1007/978-3-319-95264-2_8
109
110
8 Aren’t All Questions Answered?
• Next to sparsity of the solution, which additional conditions have to be satisfied to bound the distance between 1 -regularized and exact solutions in terms of the data noise level? What is the weakest set of such conditions? • What about non-injective operators? Is it possible to prove error estimates if the underlying linear operator is not injective, even not in a weakened sense? These questions will be answered in detail in the following chapters. The presented results were published in [23, 25, 65–68]. If we would aim at brevity, we could formulate a very technical theorem first and then derive all results as corollaries. Instead, to increase readability, we start with the less technical results and then go step by step to the more involved questions and their answers. Following this longer path allows to reconstruct the core ideas of the final theorem on convergence rates for 1 -regularization with non-injective operators, which else would be covered by technical and notational difficulties. Depending on the reader’s knowledge in the fields of set theoretic topology, functional analysis and convex analysis it might be beneficial to first look into Appendix A prior to reading the next chapters.
Sparsity and 1 -Regularization
9
Abstract
In this section we describe the basic ideas of sparsity promoting regularization techniques, introduce 1 -regularization and briefly discuss alternative methods. In addition we have a first look at several examples which will appear again later on in the text.
9.1
Sparse Signals
Let X˜ and Y be real Banach spaces. For fixed y † in Y we aim to solve the operator equation A˜ x˜ = y †
(9.1)
˜ where A : X˜ → Y is assumed to be linear and bounded. If this equation is for x˜ in X, ill-posed, we have to take into account possible inaccuracies in the right-hand side, that is, only some y δ in Y with y δ − y † Y ≤ δ
(9.2)
is accessible. Here, the positive constant δ denotes the noise level. To stabilize the inversion process additional information on the exact solutions is required. This is the point where sparsity comes into play. Especially in signal processing applications, which include all kinds of image processing, one often knows a priori that the sought-for solution to (9.1) is sparse or almost sparse in the following sense.
© Springer Nature Switzerland AG 2018 J. Flemming, Variational Source Conditions, Quadratic Inverse Problems, Sparsity Promoting Regularization, Frontiers in Mathematics, https://doi.org/10.1007/978-3-319-95264-2_9
111
9 Sparsity and 1 -Regularization
112
Definition 9.1 Let (e˜k )k∈N be a Schauder basis in X˜ and denote be (x˜k )k∈N corresponding ˜ The Banach space element x˜ is sparse with respect to the basis coefficients of some x˜ in X. (e˜k )k∈N if only finitely many coefficients do not vanish. The element x˜ is almost sparse with respect to that basis if (x˜k )k∈N belongs to 1 . As usual, 1 denotes the Banach space of absolutely summable real sequences. If the underlying space X˜ is a sequence space and no basis is explicitly specified, then sparsity is considered with respect to standard basis (e(k) )k∈N , where el(k)
⎧ ⎨1, if l = k, := ⎩0, else.
Note that existence of Schauder bases is not guaranteed for each Banach space. Obviously, if there is a Schauder basis, then the space is separable. However, there are separable Banach spaces which do not have a Schauder basis (see [69, page 35] and references therein). A Schauder basis (e˜k )k∈N might be unbounded, but yields a bounded Schauder basis if each element e˜k is divided by its norm e˜k X˜ . Thus, existence of Schauder bases is equivalent to existence of bounded Schauder bases. But note that normalizing a Schauder basis as described changes the set of almost sparse elements, whereas the set of sparse elements remains unchanged. In most applications (and publications) X˜ is a separable Hilbert space and (e˜k )k∈N is an orthonormal basis. Wavelet bases are a common choice. We do not restrict our considerations to Hilbert spaces, because, on the one hand, the Banach space setting does not imply ponderable additional expenses and, on the other hand, it allows us to test our results and their limitations with a much wider class of examples. Sparsity is a property of the coefficient sequence with respect to a basis and not of a Banach space element itself. Thus, we may replace X˜ by some sequence space. A good choice is 1 for two reasons: first, it is large enough to contain the coefficient sequences of all almost sparse elements, and second, it is small enough to ensure that each sequence ˜ More precisely, we could say is indeed a coefficient sequence of some element in X. 1 that contains exactly those sequences which are coefficient sequences of almost sparse ˜ Because we are looking for (almost) sparse solutions to (9.1), there is no elements in X. need to consider coefficient sequences which do not belong to 1 . Proposition 9.2 Let (e˜k )k∈N be a Schauder basis in X˜ and define L : 1 → X˜ by L x :=
∞ k=1
xk e˜k
(9.3)
9.2 1 -Regularization
113
for x in 1 . Then L is linear, injective and bounded with LL (1 ,X) ˜ = sup e˜k X˜ . k∈N
Proof Linearity and injectivity are obvious. Boundedness and the upper bound for the norm of L follow from L xX˜ ≤
∞
|xk | e˜k X˜ ≤ x1 sup e˜k X˜ k∈N
k=1
for x in 1 . Choose x = e˜k to see that the upper bound is also a lower bound for the norm of L.
Now define the bounded linear operator A : 1 → Y by A := A˜ L and consider the equation A x = y †,
x ∈ 1 .
(9.4)
This is the equation we are going to deal with in the remaining chapters of this part of the book. If we know a solution to (9.4), then applying L yields a solution to the original Eq. (9.1).
9.2
1 -Regularization
If Eq. (9.4) is ill-posed, exact solutions are not accessible, because they do not depend continuously on the (noisy) right-hand side. Instead we have to seek for approximate but stable solutions. Knowing a priori that the sought-for solutions to (9.4) are sparse, it is sensible to look for sparse approximations. One way to obtain such sparse approximations is to minimize the Tikhonov-type functional p
Tα (x, y δ ) := A x − y δ Y + α x1
(9.5)
over x in 1 . Here, p is some positive exponent for simplifying numerical minimization. If Y is a Hilbert space, then p = 2 is a good choice. In any case we assume p > 1. The trade-off between data fitting and stabilization is controlled by the positive regularization parameter α. In [64] an algorithm for finding the minimizers numerically had been proposed. That article popularized 1 -regularization in the inverse problems community and motivated numerous further publications on the subject. The minimizers of Tα (·, y δ ) are referred to as 1 -regularized solutions. To show that this method is well-defined and regularizing and that the minimizers are sparse, we formulate
9 Sparsity and 1 -Regularization
114
the theorem below. The assertions of the theorem are well known in the literature and we repeat them here for the sake of completeness. The basic assumption required for the theorem can be stated in several equivalent ways as shown in the lemma. As usual, c0 denotes the Banach space of real sequences converging to zero. Remember that (c0 )∗ = 1 . Note that in Sect. 1.3 we already discussed Tikhonov-type regularization methods in general Banach spaces. Results there were based on the weak topology in X. But the 1 -penalty does not satisfy item (v) of Assumption 1.13. Thus, we have to use a weaker topology on 1 . Lemma 9.3 The following assertions are equivalent. (i) (ii) (iii) (iv)
(A e(k))k∈N converges weakly to zero. R(A∗ ) ⊆ c0 . A is weak*-to-weak continuous. A is sequentially weak*-to-weak continuous.
Proof Let (i) be satisfied. Then for each A∗ η from R(A∗ ) we have [A∗ η]k = A∗ η, e(k)∞ ×1 = η, A e(k)Y ∗ ×Y → 0 if k → ∞, that is, A∗ η ∈ c0 . Now let (ii) be true. If we take a weakly* convergent net (x (κ) )κ∈N with limit x, then η, A x (κ) Y ∗ ×Y = A∗ η, x (κ) ∞ ×1 and, since A∗ η belongs to c0 and 1 is the dual of c0 , we may write A∗ η, x (κ) ∞ ×1 = x (κ) , A∗ η1 ×c0 . Thus, limη, A x (κ) Y ∗ ×Y = limx (κ) , A∗ η1 ×c0 = x, A∗ η1 ×c0 , κ
κ
showing limη, A x (κ) Y ∗ ×Y = η, A xY ∗ ×Y κ
for all η ∈ Y ∗ .
Finally, (iii) immediately implies (iv) and from (iv) and the obvious fact that (e(k))k∈N converges weakly* to zero we immediately obtain (i).
Theorem 9.4 Let A be weak*-to-weak continuous. Then the following assertions are true. (i) Existence: There exist solutions to (9.4) with minimal norm (referred to as norm minimizing solutions) and there exist minimizers of the Tikhonov-type functional (9.5). Further, all minimizers of Tα (·, y δ ) are sparse. (ii) Stability: If (yk )k∈N converges to y δ and if (x (k))k∈N is a corresponding sequence of minimizers of (9.5) with y δ replaced by yk , then this second sequence has a
9.2 1 -Regularization
115
weakly* convergent subsequence and each weakly* convergent subsequence converges weakly* to a minimizer of Tα (·, y δ ). (iii) Convergence: If (δk )k∈N converges to zero and if (yk )k∈N satisfies yk − y † Y ≤ δk , then there is a sequence (αk )k∈N such that each corresponding sequence of minimizers of Tαk (·, y δk ) contains a weakly* convergent subsequence. Each such subsequence converges in norm to some norm minimizing solution of (9.4). Proof Closed balls in 1 are weakly* compact, see [7, Theorem 2.6.18], which guarantees that bounded sequences have a weak* accumulation point and that the 1 -norm is weakly* lower semi-continuous. Taking also the weak*-to-weak continuity of A and the weak lower semi-continuity of the norm in Y into account we may apply standard results on Tikhonovtype regularization methods in Banach spaces, see [12, 13]. Note that the 1 -norm satisfies the so called weak* Kadec-Klee property, which yields convergence in norm in item (iii) of the theorem. It only remains to show that each minimizer of (9.5) has only finitely many nonzero components. This is a consequence of R(A∗ ) ⊆ c0 (cf. Lemma 9.3). By standard arguments from convex analysis we see that some x is a minimizer of Tα (·, y δ ) if and only if there is some ξ in ∞ such that −ξ ∈ α ∂ · 1 (x) and ξ ∈ A∗ ∂(A · − y δ Y )(x). p
Thus, |ξk | = α whenever xk = 0 and ξ belongs to c0 . This is only possible if x has at most finitely many non-zero components.
The assumption that A is weak*-to-weak continuous is very weak, because it is automatically satisfied if A has a bounded extension to some q -space with q > 1. To see this, let Eq : 1 → q be the bounded embedding of 1 into q and let Aq : q → Y be the bounded extension of A to q . Then A e(k) = Aq Eq e(k) = Aq e(k) and (e(k))k∈N converges weakly to zero in q . The weak-to-weak continuity of bounded linear operators thus implies Aq e(k) 0. If the original space X˜ (cf. previous section) is a Hilbert space and (e˜k )k∈N is an orthonormal basis, then A has a bounded extension to 2 and thus is weak*-to-weak continuous. Nevertheless there are bounded linear operators which are not weak*-to-weak continuous. An example is the identity mapping A = I if Y = 1 . In 1 weak sequential convergence coincides with norm convergence, but not with weak* convergence. Thus, the identity mapping cannot be sequentially weak*-to-weak continuous. To cover also this special case, we may weaken the assumption of weak*-to-weak continuity slightly by demanding only weak*-to-weak* continuity. This is possible if Y is the dual space of some other Banach space. If Y is reflexive then weak and weak* convergence coincide. If Y is not reflexive, as it is the case for Y = 1 , then weak convergence implies weak* convergence but not vice versa.
9 Sparsity and 1 -Regularization
116
The above lemma can be modified as follows to deal with the weaker condition. The implication from (ii) to (iv) in the lemma was formulated and proven by Bernd Hofmann (TU Chemnitz) and has recently been published in [70, Proposition 5.1]. Lemma 9.5 Let Z be a Banach space and let Y = Z ∗ . Then the following assertions are equivalent. (i) (ii) (iii) (iv)
(A e(k))k∈N converges weakly* to zero. R(A∗ |Z ) ⊆ c0 . A is weak*-to-weak* continuous. A is sequentially weak*-to-weak* continuous.
Proof Let (i) be satisfied. Then for each A∗ η from R(A∗ |Z ), that is, η ∈ Z ⊆ Y ∗ , we have [A∗ η]k = A∗ η, e(k)∞ ×1 = η, A e(k) Y ∗ ×Y = A e(k), ηZ ∗ ×Z → 0 if k → ∞, that is, A∗ η ∈ c0 . Now let (ii) be true. If we take a weakly* convergent net (x (κ) )κ∈N with limit x, then A x (κ) , ηZ ∗ ×Z = η, A x (κ) Y ∗ ×Y = A∗ η, x (κ) ∞ ×1 for all η in Z and, since A∗ η belongs to c0 and 1 is the dual of c0 , we may write A∗ η, x (κ) ∞ ×1 = x (κ) , A∗ η1 ×c0 . Thus, limA x (κ) , ηZ ∗ ×Z = limη, A x (κ)Y ∗ ×Y = limx (κ) , A∗ η1 ×c0 → x, A∗ η1 ×c0 , κ
κ
κ
showing limA x (κ), ηZ ∗ ×Z = A x, ηZ ∗ ×Z κ
for all η ∈ Z.
Finally, (iii) immediately implies (iv) and from (iv) and the obvious fact that (e(k))k∈N converges weakly* to zero we immediately obtain (i).
The above theorem on existence, stability and convergence remains true, except for sparsity of minimizers, if weak*-to-weak continuity is replaced by weak*-to-weak* continuity, because the proof only relies on the fact that A ·−y δ Y is weakly* lower semicontinuous. This is the case in both variants, with or without *, since the norm functional is weakly and also weakly* lower semi-continuous.
9.2 1 -Regularization
117
Note that the case Y = 1 , A = I is mainly of theoretical interest. It is a tool for exploring the frontiers of the theoretic framework we have chosen for investigating 1 regularization. For practical applications it is irrelevant because one easily verifies that with the natural choice p = 1 in (9.5) the 1 -regularized solutions coincide with the data y δ if α < 1. For the sake of completeness we mention that there exist bounded linear operators which not even are weak*-to-weak* continuous. Example 9.6 If Y = 1 and
[A x]k :=
⎧∞ 1 ⎪ ⎨ xl ,
if k = 1,
l=1
⎪ ⎩x , k
else,
for all k in N and all x in 1 , then A e(k) = e(1) + e(k) if k > 1. Thus, A e(k) ∗ e(1) but e(k) ∗ 0. The same operator A considered as mapping into Y = 2 is an example of a not weak*-to-weak continuous bounded linear operator in the classical Hilbert space
setting for 1 -regularization. See also [70] for further discussion of the weak*-to-weak* setting. We close this section with an observation about the maximum sensible regularization parameter α. An analogous observation was made in Part II for standard Hilbert space Tikhonov regularization applied to quadratic mappings, cf. Proposition 5.3. To the author’s best knowledge the following proposition does not appear in the literature, although its proof is quite simple. Proposition 9.7 Let Y be a Hilbert space, let p = 2 in (9.5) and set αmax := 2
sup A x, y δ Y ×Y .
x∈X x1 ≤1
If α ≥ αmax , then 0 ∈ argmin Tα (x, y δ ). x∈X
If, in addition, α > αmax or A is injective, then argmin Tα (x, y δ ) = {0}. x∈X
9 Sparsity and 1 -Regularization
118
Proof If x = 0 we have Tαδ (x) = A x2Y − 2 A x, y δ Y ×Y + y δ 2Y + α x1 ! x 2 δ δ 2 δ ≥ A x)Y − 2 A x, y Y ×Y + y Y + 2 A x1 ,y x1 Y ×Y = A x2Y + y δ 2Y ≥ y δ 2Y = Tαδ (0), proving the first assertion. If α > αmax , the first inequality sign is strict. If A is injective, x = 0 implies A x = 0, making the second inequality sign a strict one.
Remark 9.8 If αmax is chosen greater than in the proposition, the proposition remains true. An easy to calculate replacement is 2 AL (1 ,Y ) y δ Y . Note that some of the results here and in the sequel can also be derived for nonlinear equations, if the nonlinearity is controlled by additional assumptions, see [71]
9.3
Other Sparsity Promoting Regularization Methods
We briefly mention three other regularization methods which, under suitable assumptions, produce sparse approximate solutions to linear operator equations (9.4). The first is the residual method, followed by so called non-convex regularization and the elastic net method. All three methods are closely related to 1 -regularization. The residual method consists in solving the minimization problem x1 → min
subject to
A x − y δ Y ≤ δ.
In finite dimensions the method is investigated, for instance, in [72]. For results in a very general infinite-dimensional setting we refer to [73]. Under suitable assumptions, the residual method yields the same approximate solutions as 1 -regularization with α chosen by the discrepancy principle, see [73, page 2] for a reference. The term non-convex regularization typically refers to Tikhonov-type methods with non-convex penalty. That is, minimization problems of the form p
A x − y δ Y + α R(x) → min x∈1
are considered, where R : 1 → [0, ∞] is some non-convex stabilizing functional. A common choice for R is R(x) :=
∞ k=1
|xk |q
9.4 Examples
119
with q in (0, 1) and R(x) = ∞ if the series does not converge. Such methods yield sparse minimizers, but are numerically very challenging. For details and references we refer to [74, 75]. Elastic net regularization is an extension of 1 -regularization which aims at simplifying numerical minimization of the Tikhonov-type functional. The idea is to add a second, smooth regularization term, typically the 2 norm: p
A x − y δ Y + α1 x1 + α2 x22 → min . x∈1
Additional difficulties arise from the fact that here two regularization parameters have to be chosen in the right way to guarantee proper behavior of the method. For details we refer to [76] and references therein.
9.4
Examples
In this chapter we introduce several examples which will be discussed in more detail in subsequent chapters to demonstrate certain features of the obtained results.
9.4.1
Denoising
Reconstructing a sparse signal from noisy measurements is an important application of 1 -regularization. Reduced to sequence spaces, that is, after representing the signal with respect to a suitable basis, the task is to find a sparse approximation x to given y δ . The noise is typically measured in a norm weaker than the 1 -norm. Thus, let Y := p with p in (1, ∞). The operator A in (9.4) is the bounded embedding Ep : 1 → p of 1 into p . Because Ep obviously has a bounded extension to p , the operator is weak*-to-weak continuous, cf. discussion in Sect. 9.2. Thus, Theorem 9.4 applies. For later reference we note ) * ∞ p ∗ ∞ p−1 |ξk |