About the Authors:
M. Martinez-Garcia
Roles Software, Writing – review & editing
Affiliations Image Processing Lab., Univ. València, València, Spain, Instituto de Neurociencias, CSIC, Alicante, Spain
P. Cyriac
Roles Software, Writing – review & editing
Affiliation: Information and Communication Technologies Dept., Univ. Pompeu Fabra, Barcelona, Spain
T. Batard
Roles Writing – original draft
Affiliation: Information and Communication Technologies Dept., Univ. Pompeu Fabra, Barcelona, Spain
M. Bertalmío
Roles Conceptualization, Funding acquisition, Writing – review & editing
Affiliation: Information and Communication Technologies Dept., Univ. Pompeu Fabra, Barcelona, Spain
ORCID: http://orcid.org/0000-0002-1023-8325
J. Malo
Roles Conceptualization, Formal analysis, Funding acquisition, Software, Supervision, Writing – original draft, Writing – review & editing
* E-mail: [email protected]
Affiliation: Image Processing Lab., Univ. València, València, Spain
ORCID: http://orcid.org/0000-0002-5684-8591
Abstract
In vision science, cascades of Linear+Nonlinear transforms are very successful in modeling a number of perceptual experiences. However, the conventional literature usually focuses only on describing the forward input-output transform. Instead, in this work we present the mathematics of such cascades beyond the forward transform, namely the Jacobian matrices and the inverse. The fundamental reason for this analytical treatment is that it offers useful insight into the psychophysics, the physiology, and the function of the visual system. For instance, we show how the trends of the sensitivity (the volume of the discrimination regions) and the adaptation of the receptive fields can be identified in the expression of the Jacobian w.r.t. the stimulus. This matrix also tells us which regions of the stimulus space are encoded more efficiently in multi-information terms. The Jacobian w.r.t. the parameters shows which aspects of the model have a bigger impact on the response, and hence their relative relevance. The analytic inverse implies conditions on the response and the model parameters to ensure appropriate decoding. From the experimental and applied perspective, (a) the Jacobian w.r.t. the stimulus is necessary in new experimental methods based on the synthesis of visual stimuli with interesting geometrical properties, (b) the Jacobian matrices w.r.t. the parameters are convenient for learning the model from classical experiments or from the optimization of alternative goals, and (c) the inverse is a promising model-based alternative to blind machine-learning methods for neural decoding, which do not include meaningful biological information. The theory is checked by building and testing a vision model that actually follows a modular Linear+Nonlinear program. Our illustrative differentiable and invertible model consists of a cascade of modules that account for brightness, contrast, energy masking, and wavelet masking.
To stress the generality of this modular setting we show examples where some of the canonical Divisive Normalization modules are substituted by equivalent modules such as the Wilson-Cowan interaction model (at the V1 cortex) or a tone-mapping model (at the retina).
Citation: Martinez-Garcia M, Cyriac P, Batard T, Bertalmío M, Malo J (2018) Derivatives and inverse of cascaded linear+nonlinear neural models. PLoS ONE 13(10): e0201326. https://doi.org/10.1371/journal.pone.0201326
Editor: Wolfgang Einhäuser, Technische Universitat Chemnitz, GERMANY
Received: November 29, 2017; Accepted: July 11, 2018; Published: October 15, 2018
Copyright: © 2018 Martinez-Garcia et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All image distortion files used in the model optimization are available from the publicly available TID database http://www.ponomarenko.info/tid2008.htm.
Funding: This work was partially funded by the Spanish Ministerio de Economia y Competitividad projects CICYT TEC2013-50520-EXP and CICYT BFU2014-59776-R, by the European Research Council, Starting Grant ref. 306337, by the Spanish government and FEDER Fund, grant ref. TIN2015-71537-P (MINECO/FEDER,UE), and by the ICREA Academia Award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
The mathematics of Linear+Nonlinear (L+NL) transforms is interesting in neuroscience because cascades of such modules are key in explaining a number of perceptual experiences [1]. For instance, in visual neuroscience, perceptions of color, motion and spatial texture are tightly related to L+NL models of similar functional form [2–4]. The literature is usually focused on describing the behavior, i.e. setting the parameters of the forward input-output transform. However, understanding the transform computed by the sensory system, S, goes beyond predicting the output from the input. The mathematical properties of the model (namely the derivatives, ∇S, and the inverse, S−1), are also relevant. Here we show that the Jacobian matrices and the inverse provide analytical insight into fundamental aspects of the psychophysics of the visual system, its physiology, and its function. Additionally, the Jacobian matrices and the inverse enable new experimental designs, data analysis and applications in visual neuroscience. Finally, related applied disciplines like image processing that require computable and interpretable models of visual perception may also benefit from this formulation.
Derivatives are relevant
The Jacobian, ∇S, represents a local linear approximation of the nonlinear system, S. From a fundamental perspective, the analytical expressions of the Jacobian matrices are of interest in several areas of visual neuroscience. In physiology, the dot-product definition of receptive field introduced for linear systems [5, 6] can be extended to nonlinear systems using the Jacobian matrix with regard to the stimulus. Therefore, this Jacobian is convenient to properly formulate concepts such as adaptive (stimulus-dependent) receptive fields or adaptive features. On the other hand, the Jacobian of the response w.r.t. the parameters makes it possible to assess the impact of the different aspects of the model on the response, and hence the relative relevance of these aspects. In psychophysics, the sensitivity of the system is characterized by its discrimination abilities (the inverse of the volume of the regions determined by the just noticeable differences, JNDs [7, 8]). Discrimination depends on models that compute perceptual differences from the internal representation [9–11] or on models of noise at the internal representation [12–14]. In either case, the way the sensory system, S, deforms the stimulus space is critical to understand how the discrimination regions in the internal representation transform back into the image space. It does not matter whether these internal JNDs stem from internal noise or from an assumed internal metric. The change of variables theorem [15, 16] implies that the Jacobian w.r.t. the stimulus controls how the volume element is enlarged or compressed in the deformations undergone by the representation along the neural pathway. That is the key to describe how metric matrices change under nonlinear transforms in Riemannian geometry [16]. For the same reason, this Jacobian w.r.t. the stimulus is also the key to characterize the propagation of noise throughout the system [17].
In analyzing the function of the sensory system in information-theoretic terms the relation between the information and the volume of the signal manifold is crucial [18, 19]. According to this, for the same geometrical reasons stated above [15, 16], the Jacobian wrt the stimulus plays an important role in determining the amount of information lost (or neglected) along the neural pathway. More specifically [19], the Jacobian wrt the stimulus determines the multi-information shared by the different sensors of the neural representation.
From an experimental and applied perspective, the Jacobian matrices are also relevant in visual neuroscience. Novel psychophysical techniques such as Maximum Differentiation [20–23] synthesize the stimuli for the experiments through the gradient of the perceptual distance, which depends on the Jacobian w.r.t. the input. On the other hand, characterizing the Jacobian w.r.t. the parameters is also important. First, it is relevant in order to learn the L+NL cascade that best reproduces classical experiments (e.g. physiological responses or psychophysical judgements), as opposed to approaches that rely on exhaustive search (as in [11, 24–26]). Second, an explicit expression for this Jacobian is important to understand the optimization for alternative goals such as optimal coding, as opposed to approaches that rely on implicit automatic differentiation (as in [27]).
Finally, related disciplines such as image processing may benefit from analytically interpretable models. Reliable subjective image distances (and hence the Jacobian w.r.t. stimulus) have paramount relevance in image processing applications judged by human viewers [28–30]. Examples include tone mapping and contrast enhancement [31], image coding [9, 10, 32], motion estimation and video coding [24, 33, 34], denoising [35, 36], visual pattern recognition [37], or search in image databases [38]. In all these cases, either the subjective distance between the original and the processed image has to be minimized, or the distance used to find image matches has to be perceptually meaningful.
Inverse is relevant
In neuroscience, visual brain decoding [39–41] may benefit from the analytic inverse, S−1, because it may lead to improvements over the current techniques based on blind regression [42]. Interestingly, the benefits of the inverse may not be limited to straightforward improvements in decoding: the inverse may also give rise to more accurate methods to estimate the model. For instance, the best parameters of S would be those that lead to the best reconstructions through the corresponding S−1. Note that another relevant aspect of ∇S is its relation to S−1: according to the inverse function theorem [15], non-singularity of the Jacobian is a necessary condition for the existence of a differentiable inverse.
On the image processing side, the relevance of the inverse is obvious in perceptual image/video coding, where the signal is transformed to the perceptual representation prior to quantization [10, 32–34]: decompression requires the inverse to reconstruct the image. Another example is white balance based on human color constancy (or chromatic adaptation): in general, adaptation may be understood as a transform to an invariant representation which is insensitive to irrelevant changes (for instance, the nature of the illumination) [43–45]. Models of this class of invariant representations could easily be applied to color constancy if the transform is invertible.
In this paper we derive three analytic results for neural models consisting of cascades of canonical Linear+Nonlinear modules: (i) the Jacobian with regard to the stimulus, (ii) the Jacobian with regard to the parameters, and (iii) the inverse.
We discuss the use of the above results in the context of illustrative differentiable and invertible vision models made of cascades of L+NL modules. This kind of model is used to illustrate both (a) the fundamental insight that can be obtained from the analytical expressions and (b) their usefulness in designing new experiments and applications in visual neuroscience.
Regarding the insight obtained from the analytical expressions, in physiology, (a.1) we show how the context-dependence of the receptive fields of the sensors can be explicitly seen in the expression of the Jacobian w.r.t. the stimulus. Likewise, (a.2) we show that the expression of the Jacobian w.r.t. the parameters reveals that the impact on the response of uncertainty in the filters (or synaptic weights) may vary over the stimulus space, and that this trend may depend on the sensor. In psychophysics, (a.3) we show how the general trends of the sensitivity over the stimulus space can be seen from the determinant of the metric based on the Jacobian w.r.t. the stimulus. Finally, in studying the function of the system in coding terms, (a.4) we show that the Jacobian w.r.t. the stimulus implies different efficiency (different multi-information reduction) in different regions of the stimulus space.
Regarding the experimental and applied interest of the expressions, we address three examples: (b.1) the Jacobian wrt the image is used for stimuli generation in geometry-based psychophysics as in [21]; (b.2) the Jacobian wrt the parameters is used to maximize the alignment with subjective distortion measures, improving the brute-force approaches in [11, 24, 26]; and finally, (b.3) we discuss how the analytic inverse may be a successful alternative to decoding techniques based on blind linear regression [46] or nonlinear kernel-ridge regression [41].
To stress the generality of the modular L+NL cascade we show examples where some of the canonical Divisive Normalization modules [1] are substituted by equivalent modules such as the Wilson-Cowan interaction [47, 48] (at the V1 cortex) or a tone-mapping model [49] (at the retina). Of course, these were selected just as illustrative nonlinearities in an active field in which new alternatives are being explored [50–53].
Despite the relevance of these ubiquitous neural models, the above mathematical issues have not been addressed in detail in the experimental literature. Interestingly, although the machine learning literature deals with similar architectures [54], these details are also not made explicit due to the growing popularity of automatic differentiation [55]. For instance, in [14, 27, 53, 56, 57] biologically plausible L+NL architectures are optimized according to physiological data, psychophysical data or to efficient coding principles. Unfortunately, the Jacobian w.r.t. the parameters was hidden behind automatic differentiation.
In contrast, here we show how the explicit expressions provide intuition about the role of biologically relevant parameters.
2 Results
2.1 Notation and general considerations
Stimuli as vectors.
An image in the retina, x0(p, λ), is a function describing the spectral irradiance in each spatial location, p, and wavelength, λ. Here regular, bold, and capital letters will represent scalars, vectors and matrices (or multivariate applications) respectively. Assuming a dense enough sampling, the continuous input can be represented by a discrete spectral array with no information loss, regardless of the specific sampling pattern [58]. Here we will assume Cartesian sampling in space and wavelength. This implies that the spectral cube consists of b matrices of size h × w, where the l-th matrix represents the discrete spatial distribution of the energy of l-th discrete wavelength (l = 1, …, b). In vision models the spatio-spectral resolution of viewers should determine the sampling frequencies. Given the cut-off frequencies of the contrast sensitivities [59, 60], and given the smoothness of the achromatic and opponent spectral sensitivities [61, 62], the spatial dimensions may be sampled at about 80 samples/deg (cpd) and the spectral dimension at about 0.1 samples/nm [63].
Using an appropriate rearrangement of the spectral array, the input image can be thought of as a vector in a d0-dimensional space,

(1) \( \mathbf{x}_0 = \text{vect}\big( x_0(\mathbf{p},\lambda) \big) \in \mathbb{R}^{d_0} \)

i.e. the input stimuli, x0(p, λ), which are functions defined on a discrete 3-dimensional domain, are rearranged as d0-dimensional column vectors, where d0 = h × w × b. Note that with the considered sampling frequencies, the dimension of the input stimuli is huge even for moderate image sizes (small angular field in the visible spectral range).
The particular scanning pattern in the rearrangement function, vect(⋅), has no major relevance as long as it can be inverted back to the original spatio-spectral domain. Here we will use the last-dimension-first convention used in the Matlab functions im2col.m and col2im.m. The selected rearrangement pattern has no fundamental effect, but it has to be taken into account to make sense of the structure of the matrices of the model acting on the input vector.
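As a concrete illustration, the rearrangement and its inverse can be sketched in a few lines of NumPy. This is a minimal sketch with toy sizes; the column-major scanning order is an arbitrary but fixed choice, analogous to the Matlab im2col/col2im convention mentioned above (only invertibility matters):

```python
import numpy as np

# Toy spectral cube: h x w spatial samples, b wavelength bands (illustrative sizes)
h, w, b = 4, 5, 3
cube = np.random.rand(h, w, b)

def vect(x):
    """Rearrange the spatio-spectral array into a column vector (fixed scanning order)."""
    return x.reshape(-1, order='F')   # column-major order, as in Matlab

def unvect(v, shape):
    """Invert the rearrangement back to the original spatio-spectral domain."""
    return v.reshape(shape, order='F')

v = vect(cube)                        # d0-dimensional vector, d0 = h*w*b
assert v.shape == (h * w * b,)
assert np.array_equal(unvect(v, (h, w, b)), cube)   # lossless round trip
```

Any other scanning pattern would work equally well, provided the same pattern is used consistently when interpreting the rows and columns of the model matrices.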
This rearrangement function, vect(⋅), will be also a convenient choice when computing derivatives with regard to the elements of the matrices involved in the model.
The visual pathway: Modular L+NL architecture
The visual system may be thought of as an operator, S, transforming the input d0-dimensional vectors (stimuli) into dn-dimensional output vectors (or sets of dn responses),

(2) \( \mathbf{x}_n = S(\mathbf{x}_0, \Theta), \quad \mathbf{x}_n \in \mathbb{R}^{d_n} \)

where xn is the response vector, and Θ is the set of parameters of the model. The vectorial output is equivalent to considering dn separate sensors (or mechanisms) acting on the stimulus, x0, leading to the corresponding individual responses, \( x_n^k \), where k = 1, 2, … dn. In this view, the k-th sensor is responsible for the k-th dimension of the response vector, xn. The number of separate sensors analyzing the signal may not be the same as the input dimension, so in general dn ≠ d0. The number of parameters of the model, dΘ, depends on the specific functional form of the considered transform.
As suggested in [1], the global response described above may be decomposed as a series of feed-forward elementary operations, or a cascade of n modules (stages or layers), S(i), where i = 1, 2, ⋯, n,

(3) \( S = S^{(n)} \circ S^{(n-1)} \circ \cdots \circ S^{(2)} \circ S^{(1)} \)

i.e. the global response is the composition of the elementary responses:

\( \mathbf{x}_n = S^{(n)}\big( \cdots S^{(2)}\big( S^{(1)}(\mathbf{x}_0, \Theta_1), \Theta_2 \big) \cdots, \Theta_n \big) \)
The intermediate representations of the signal along this response path may have different dimensions, i.e. \( \mathbf{x}_i \in \mathbb{R}^{d_i} \) with di ≠ di−1 in general, because the number of mechanisms in stage S(i) may be different from the number of mechanisms in S(i−1). Each layer in the above deep network architecture has its own parameters, xi+1 = S(i)(xi, Θi). Again, dΘi depends on the specific functional form of the i-th layer. Each layer performs a linear+nonlinear (L+NL) operation:

(4) \( S^{(i)} = N^{(i)} \circ L^{(i)} \)

i.e. each layer is a composition of two operations: \( \mathbf{x}_i = N^{(i)}\big( L^{(i)}(\mathbf{x}_{i-1}) \big) \). Let us briefly note that, while in most models the linear operation is followed by the nonlinear one, which is why we use this formulation here, in some instances an inverted scheme of a nonlinear+linear model might be more suitable [64, 65]. That scenario can be handled by our framework as well, after a trivial modification (e.g. choosing the linear operation in the first layer of the L+NL model to be the identity, so that the first layer becomes in practice the nonlinear operation of the first layer followed by the linear operation of the second layer, and the whole cascade is shifted into a NL+L form).
The linear operation, L(i), is represented by a matrix \( L_i \in \mathbb{R}^{d_i \times d_{i-1}} \). The number of rows in the matrix Li corresponds to the number of linear sensors in layer S(i). This number of mechanisms determines the dimension of the linear output, \( \mathbf{y}_i \in \mathbb{R}^{d_i} \),

(5) \( \mathbf{y}_i = L_i \cdot \mathbf{x}_{i-1} \)

In the nonlinear operation, N(i), each output of the previous linear operation undergoes a saturation transform. Phenomena such as masking or lateral inhibition imply that the saturation of \( y_i^k \) should depend on the neighbors \( y_i^{k'} \) with k′ ≠ k. This saturation is usually formalized using divisive normalization [1]. This adaptive saturation is a canonical neural operation and it is at the core of models for color [2], motion [3], and spatial texture vision [4]. Nevertheless, other alternative nonlinearities may be considered, as discussed below. In general, this saturating interaction will depend on certain parameters θi,

(6) \( \mathbf{x}_i = N^{(i)}(\mathbf{y}_i, \theta_i) \)
Summarizing, in this cascaded setting, the parameters of the i-th layer are (1) the weights of the bank of linear sensors (the rows of the matrix Li), and (2) the parameters of the nonlinear saturating interaction, θi, i.e.

(7) \( \Theta_i = \{ L_i, \theta_i \} \)

Note that according to Eq 5, the rows of Li play the same scalar-product role as standard linear receptive fields [5, 6]. The only difference is that the rows of Li are defined in the space of vectors xi−1 instead of being defined in the input image space (of vectors x0).
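The generic L+NL layer just described can be sketched as follows. This is a toy example: the point-wise Naka-Rushton-style saturation used here is only a placeholder standing in for the nonlinearity N(i) (the paper's canonical choice, divisive normalization, is discussed in the next subsection):

```python
import numpy as np

def layer(x_prev, L, N, theta):
    """One L+NL stage: linear filter bank (Eq 5) followed by a saturation (Eq 6)."""
    y = L @ x_prev       # rows of L act as receptive fields (dot products)
    return N(y, theta)   # nonlinear, possibly interacting, saturation

def naka_rushton(y, b):
    # Placeholder point-wise saturation (illustrative only)
    return np.sign(y) * np.abs(y) / (b + np.abs(y))

rng = np.random.default_rng(1)
L1 = rng.standard_normal((3, 5))        # 3 sensors on a 5-dim input: d_i != d_{i-1}
x0 = rng.standard_normal(5)
x1 = layer(x0, L1, naka_rushton, 0.2)
assert x1.shape == (3,)                 # output dimension set by the rows of L1
```

Note how the output dimension is decided by the number of rows of the linear stage, exactly as in the text.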
Canonical and alternative nonlinearities
Divisive normalization in matrix notation.
The conventional expressions of the canonical divisive normalization saturation use an element-wise formulation [1],

(8) \( x_i^k = \text{sign}(y_i^k) \, \frac{|y_i^k|^{\gamma}}{b^k + \sum_{k'} H^{k k'} |y_i^{k'}|^{\gamma}} \)

This expression, in which the energy of each linear response is \( e^k = |y_i^k|^{\gamma} \), combines conventional matrix-on-vector operations (such as the product H ⋅ e in the denominator) with a number of element-wise operations: the division of each coefficient of the vector in the numerator by the corresponding coefficient of the inhibitory denominator vector, b + H ⋅ e; the element-wise absolute value (or rectification) to compute the energy; the element-wise exponentiation; the element-wise computation of the sign, and its preservation in the response through an element-wise product. Therefore, the parameters of this divisive normalization are: the excitation and inhibition exponent, γ; the semisaturation constants in the vector, b; and the interaction matrix in the denominator, H.
The matrix-on-vector operation in the denominator is key to understanding masking and adaptation. This is because the k-th row of Hi describes how the neighbor activities saturate (or mask) the k-th nonlinear response. The effect of these parameters is extensively analyzed elsewhere [1].
From a formal perspective, the combination of element-wise and matrix-on-vector operations in the conventional expression makes differentiation and inversion from Eq 8 extremely cumbersome. This can be alleviated by a matrix-vector expression where the individual coefficients, k, are not explicitly present. Incidentally, this matrix expression will imply more efficient code in matrix-oriented environments such as Matlab.
In order to get such a matrix-vector form, it is convenient to recall the equivalence between the element-wise (or Hadamard) product and the operation with diagonal matrices [66]. Given two vectors a and b, their Hadamard product is:

(9) \( \mathbf{a} \odot \mathbf{b} = \mathbb{D}_{\mathbf{a}} \cdot \mathbf{b} \)

where \( \mathbb{D}_{\mathbf{a}} \) is the diagonal matrix with the vector a in the diagonal.
Using the matrix form of the Hadamard product and the definitions of the energy, \( \mathbf{e}_i = |\mathbf{y}_i|^{\gamma_i} \), and the denominator vector, \( \mathbf{b}_i + H_i \cdot \mathbf{e}_i \), the conventional Divisive Normalization, Eq 8, can be re-written with diagonal matrices without referring to the individual components of the vectors:

(10) \( \mathbf{x}_i = \mathbb{D}_{\text{sign}(\mathbf{y}_i)} \cdot \mathbb{D}^{-1}_{(\mathbf{b}_i + H_i \cdot \mathbf{e}_i)} \cdot \mathbf{e}_i \)

where the model parameters are θi = {γi, bi, Hi}. Similarly to Von Kries adaptation [67], this matrix form of Divisive Normalization is nonlinear because the diagonal of the matrix depends on the signal. The derivation of the results (proofs given in the Supporting Information Files) shows that the above matrix version of Divisive Normalization is extremely convenient to avoid cumbersome element-wise partial derivatives and to compute the analytic inverse.
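In a matrix-oriented environment, the equivalence between the element-wise form (Eq 8) and the diagonal-matrix form (Eq 10) can be checked directly. This is a toy sketch: the dimensions and the values of γ, b and H are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
y = rng.standard_normal(d)          # linear responses of one layer
gamma = 2.0                         # excitation/inhibition exponent
b = np.full(d, 0.1)                 # semisaturation constants
H = 0.05 * np.ones((d, d))          # interaction kernel in the denominator

# Element-wise form (Eq 8)
e = np.abs(y) ** gamma              # energies
x_elem = np.sign(y) * e / (b + H @ e)

# Matrix form (Eq 10): Hadamard products become diagonal matrices
D = np.diag
x_mat = D(np.sign(y)) @ np.linalg.inv(D(b + H @ e)) @ e

assert np.allclose(x_elem, x_mat)   # both formulations agree
```

The matrix form is what makes the analytic Jacobians and the inverse tractable, since all the element-wise bookkeeping is absorbed by the diagonal matrices.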
Alternative nonlinearities: Wilson-Cowan equations and tone-mapping.
Even though all the elementary L+NL layers of the deep network in Eq 3 could be implemented by a composition of Eqs 5 and 10 (as suggested in [1]), here we also consider particular alternatives for the nonlinearities that have been proposed to account for the response at specific stages in the visual pathway. Namely, the Wilson-Cowan equations [47, 48], which could account for the masking between local-oriented sensors [26]; and nonlinear models of brightness perception such as the ones used in tone mapping [49, 68, 69]. The consideration of these alternatives for specific stages stresses the generality of the proposed framework since, as shown in the examples of the Discussion, the network equations can be applied no matter the specific functional form of each stage (provided the elementary derivatives and inverses are known).
The Wilson-Cowan equations [47, 48] describe the temporal evolution of the mean activity of a population of neurons in the V1 cortex. In what follows, we consider the following form of the Wilson-Cowan equations

(11) \( \frac{d\mathbf{x}_i(t)}{dt} = \alpha \, \mathbf{y}_i - \mu \, \mathbf{x}_i(t) - \lambda \, W \cdot f(\mathbf{x}_i(t)) \)

where α, μ, λ are coupling coefficients, W = (Wk,k′) is a kernel which decays with the difference |k − k′|, f is a sigmoid function, and t is time.
The steady-state equation of the evolution Eq (11) is

(12) \( \alpha \, \mathbf{y}_i = \mu \, \mathbf{x}_i + \lambda \, W \cdot f(\mathbf{x}_i) \)

Existence and uniqueness of the solution of the steady-state Eq (12) are not guaranteed in the general case. We refer the reader to [70] for some conditions on the coefficients α, μ, λ and the sigmoid f under which existence and uniqueness of the solution are guaranteed.
From now on, we assume that we are in a case where the solution of the steady-state equation exists and is unique. Then, we define the Wilson-Cowan transform N(i)(yi) of yi as the unique solution xi of the steady-state equation.
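When the steady-state map is contractive, the Wilson-Cowan transform can be computed by simple fixed-point iteration. The sketch below assumes a steady-state balance of the form α y = μ x + λ W·f(x) (one plausible reading of the evolution equation; the kernel, sigmoid, and coefficient values are illustrative choices):

```python
import numpy as np

def wilson_cowan_transform(y, alpha, mu, lam, W, f, n_iter=500):
    """Steady state x of the assumed balance alpha*y = mu*x + lam*W.f(x),
    obtained by fixed-point iteration (valid when the map is contractive)."""
    x = np.zeros_like(y)
    for _ in range(n_iter):
        x = (alpha * y - lam * W @ f(x)) / mu
    return x

d = 5
k = np.arange(d)
W = np.exp(-np.abs(k[:, None] - k[None, :]))   # kernel decaying with |k - k'|
f = np.tanh                                    # a sigmoid
y = np.linspace(-1.0, 1.0, d)
x = wilson_cowan_transform(y, alpha=1.0, mu=2.0, lam=0.1, W=W, f=f)

# The result satisfies the steady-state balance up to iteration tolerance
residual = 1.0 * y - (2.0 * x + 0.1 * W @ f(x))
assert np.max(np.abs(residual)) < 1e-8
```

For these coefficients the iteration is a strong contraction, so convergence is fast; in general, the conditions in [70] would have to be checked before trusting such an iteration.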
While the Wilson-Cowan equations are sensible for populations of cortical neurons, brightness-from-luminance models may account for nonlinearities at earlier stages of the visual pathway (e.g. in the retina). An illustrative example of these specific nonlinearities, connected to image enhancement applications through tone mapping, is the two-gamma model in [68]. In this model the nonlinear saturation is a simple exponential function with no interaction between neighboring dimensions,

(13) \( x^k = \text{sign}(y^k) \, |y^k|^{\gamma(|y^k|)} \)

where all operations (sign, rectification, exponentiation) are dimension-wise. However, note that the exponent is a function of the magnitude of the input tristimulus value. Specifically,

(14) \( \gamma(|y|) = \gamma_L + (\gamma_H - \gamma_L) \, \frac{|y|^m}{|y|^m + \mu_1^m} \)

The exponent has different values for low and high inputs, γL and γH respectively (hence the two-gamma name). The transition of γ between γL and γH happens around the value |y| = μ1. This transition is smooth, and its sharpness is controlled by the exponent m.
This expression for γ has statistical grounds, since the resulting nonlinearity approximately equalizes the probability density function (PDF) of luminance values in natural scenes [69, 71], which is a sensible goal in the information-maximization context [72]. This nonlinearity can be applied both to linear luminance values [68, 73] and to linear opponent color channels [43, 74]. Therefore, this specific nonlinearity could be applied after a linear stage where the spectrum in each spatial location is transformed into opponent tristimulus values. A special modification of the nonlinearity around zero is required to address the singularity of the derivative at zero. We will be more specific on this point when we address the Jacobian of this two-gamma model below.
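The behavior of the two-gamma saturation can be sketched numerically. This is only an illustration: the Hill-type interpolation used for γ(|y|) is an assumed smooth transition matching the verbal description (values γL and γH, transition at μ1, sharpness m), not necessarily the authors' exact expression, and the clipping at zero is one simple way to handle the singular derivative at the origin:

```python
import numpy as np

def two_gamma(y, gamma_L, gamma_H, mu1, m, eps=1e-6):
    """Dimension-wise two-gamma saturation (illustrative sketch)."""
    a = np.abs(y)
    # exponent moves from gamma_L (low inputs) to gamma_H (high inputs),
    # centered at |y| = mu1, with sharpness controlled by m
    g = gamma_L + (gamma_H - gamma_L) * a**m / (a**m + mu1**m)
    # keep |y| away from zero to avoid the singular derivative at the origin
    return np.sign(y) * np.maximum(a, eps) ** g

y = np.array([0.01, 0.1, 0.5, 0.9])
x = two_gamma(y, gamma_L=0.5, gamma_H=1.0, mu1=0.3, m=4)
assert np.all(np.diff(x) > 0)   # monotonic (hence invertible) for these inputs
```

With γL < 1 the transform is expansive for dark inputs, which is the PDF-equalizing behavior mentioned in the text.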
Jacobian matrices of L+NL cascades
In the modular setting outlined above, variation of the responses may come either from variations of the stimulus, x0, or from variations of the parameters, Θ. On the one hand, for a given set of fixed parameters, many properties of the sensory system depend on how the output depends on the stimuli, i.e. many properties depend on the Jacobian of the transform with regard to the image, ∇x0S (where the subindex at the derivative operator indicates the derivation variable). In particular, this Jacobian is critical to decode the neural representation (existence of inverse), and to describe perceptual distance between stimuli. As an example, the Discussion shows how this Jacobian is key in the generation of stimuli fulfilling certain geometric requirements involved in recent psychophysics. On the other hand, when looking for the parameters that better explain certain experimental behavior, it is necessary to know how the response depends on the parameters, i.e. the key is the Jacobian with regard to the parameters, ∇ΘS. As an example, the Discussion shows how this Jacobian can be used to maximize the correlation with subjective opinion in visual distortion psychophysics.
In these notation preliminaries we address the general properties of these Jacobian matrices (both ∇x0S and ∇ΘS) in the context of the modular network outlined above. The interest of these preliminaries is that they show that the problem of computing ∇x0S and ∇ΘS reduces to the computation of the Jacobian matrices of the elementary nonlinearities (∇yiN(i) and ∇θiN(i), respectively). These elementary Jacobians, ∇yiN(i) and ∇θiN(i), and the inverse, N(i)−1 (whose existence is related to ∇yiN(i)), are the three analytical results of the paper, and will be addressed in the next subsections. Specifically, for the divisive normalization, in Eq 24 (Result I), Eqs 29–36 (Result II), and Eq 39 (Result III).
Local-linear approximation.
The response function, S, can be seen as a nonlinear change of coordinates depending on the (independent) variables x0 and Θ. Therefore, around a certain (x0A, ΘA), this function can be expanded in a Taylor series and its properties depend on the matrices of derivatives with regard to these variables [15, 16], in this case the Jacobian matrices ∇x0S and ∇ΘS,

(15) \( S(\mathbf{x}_0, \Theta) \approx S(\mathbf{x}_0^A, \Theta^A) + \nabla_{\mathbf{x}_0} S \cdot \Delta\mathbf{x}_0 + \nabla_{\Theta} S \cdot \Delta\Theta \)

This is the local-linear approximation of the nonlinear response for small perturbations of the stimulus or the parameters. In Eq 15 the derivatives are computed at (x0A, ΘA); the vector Δx0 = x0 − x0A is the variation of the stimulus; and ΔΘ is a vector with a perturbation of the dΘ parameters of the model. Note that the column vector of model parameters (of dimension dΘ) is obtained simply by concatenating the parameters of the different layers.
The Jacobian with regard to the parameters necessarily involves variables from different layers, so it makes extensive use of the chain rule. Therefore, let's start with the Jacobian with regard to the stimulus and introduce the chain rule for this simpler case.
Global Jacobian with regard to the stimulus.
At a certain point x0A, one may make independent variations in all the dimensions of the input. Note that statistical independence of the dimensions of the stimuli is a different issue (different from formal mathematical independence in the expression). Actually, in general, the dimensions of natural stimuli are not statistically independent [45, 75]. Omitting the (fixed) parameters, ΘA, for the sake of clarity, the Jacobian with regard to the input is the following concatenation (independent variables imply concatenation of derivatives [15, 16]),

\( \nabla_{\mathbf{x}_0} S = \left[ \frac{\partial S}{\partial x_0^1} \;\; \frac{\partial S}{\partial x_0^2} \;\; \cdots \;\; \frac{\partial S}{\partial x_0^{d_0}} \right] \)

where each column \( \frac{\partial S}{\partial x_0^j} \in \mathbb{R}^{d_n} \). Expanding these column vectors, we see that \( \nabla_{\mathbf{x}_0} S \) is a dn × d0 matrix:

(16) \( [\nabla_{\mathbf{x}_0} S]_{kj} = \frac{\partial S^k}{\partial x_0^j} \)
Note that this Jacobian may depend on the input, x0, because the slope of the response (the behavior of the system) may be different at different points of the stimulus space.
Note also that, for fixed parameters, according to Eq 15, the global nonlinear behavior of the system can be linearly approximated in a neighborhood of some stimulus, x0A, using the Jacobian with regard to the stimulus, i.e. variations of the response depend linearly on variations of the input for small distortions Δx0.
Chain rule: Global Jacobian in terms of the Jacobians of the layers.
The Jacobian of a composition of functions (e.g. the multi-layer architecture we have here) can be decomposed as the product of the individual Jacobian matrices. For example, given the composition f ∘ g ∘ h = f(g(h(x))), the application of the chain rule leads to:

\( \nabla_x \big( f \circ g \circ h \big) = \nabla f \cdot \nabla g \cdot \nabla h \)

Note that when inputs and outputs are multidimensional (matrix chain rule) the order of the product of Jacobians is important for obvious reasons. Following the above, the Jacobian of the cascade can be expressed in terms of the Jacobian of each layer:

(17) \( \nabla_{\mathbf{x}_0} S = \nabla_{\mathbf{x}_{n-1}} S^{(n)} \cdot \nabla_{\mathbf{x}_{n-2}} S^{(n-1)} \cdots \nabla_{\mathbf{x}_1} S^{(2)} \cdot \nabla_{\mathbf{x}_0} S^{(1)} \)

Similarly to ∇x0S, in general ∇xi−1S(i) depends on the input and is rectangular. Note that \( \nabla_{\mathbf{x}_{i-1}} S^{(i)} \in \mathbb{R}^{d_i \times d_{i-1}} \). Given the L+NL structure of each layer, \( S^{(i)} = N^{(i)} \circ L^{(i)} \), we can also apply the chain rule inside each layer,

(18) \( \nabla_{\mathbf{x}_{i-1}} S^{(i)} = \nabla_{\mathbf{y}_i} N^{(i)} \cdot L_i \)

where we used the trivial derivative of a linear function [76]: \( \nabla_{\mathbf{x}_{i-1}} (L_i \cdot \mathbf{x}_{i-1}) = L_i \).
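The layer-wise chain rule can be verified numerically on a toy cascade. In this sketch a point-wise tanh stands in for the nonlinearity (so that the elementary Jacobian of the nonlinear stage is diagonal); the analytic product of per-layer Jacobians is compared against finite differences:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-layer toy cascade with a point-wise tanh as placeholder nonlinearity
L1 = rng.standard_normal((4, 5))
L2 = rng.standard_normal((3, 4))

def S(x0):
    x1 = np.tanh(L1 @ x0)      # layer 1: linear + nonlinear
    return np.tanh(L2 @ x1)    # layer 2

def jacobian(x0):
    # Product of per-layer Jacobians, each = (grad of N w.r.t. y) . L
    y1 = L1 @ x0
    J1 = np.diag(1 - np.tanh(y1)**2) @ L1    # diagonal grad for point-wise tanh
    y2 = L2 @ np.tanh(y1)
    J2 = np.diag(1 - np.tanh(y2)**2) @ L2
    return J2 @ J1                           # note the order: last layer first

x0 = rng.standard_normal(5)
J = jacobian(x0)

# Central finite-difference check, one input dimension at a time
eps = 1e-6
J_fd = np.stack([(S(x0 + eps*np.eye(5)[k]) - S(x0 - eps*np.eye(5)[k])) / (2*eps)
                 for k in range(5)], axis=1)
assert J.shape == (3, 5)
assert np.allclose(J, J_fd, atol=1e-6)
```

The same scheme extends to any number of layers and to the interacting nonlinearities of the paper, where the elementary Jacobian is a full matrix rather than a diagonal one.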
Note that assuming we know the parameters of the system (the linear weights, Li, in each layer, and the parameters of the nonlinearities, θi), after Eqs 17 and 18 the final piece to compute the Jacobian of the system with regard to the stimulus is the Jacobian of the specific nonlinearities, . Solving this remaining unknown will be the first analytical result of the paper (Result I), namely Eq 24.
Jacobian with regard to the parameters.
For a given set of parameters, ΘA, one may introduce independent perturbations in the parameters of each layer. Therefore, the Jacobian with regard to the parameters is the following concatenation,(19)where each block is a rectangular matrix, with dΘi being the dimension of Θi; the input (xA, ΘA) was omitted for the sake of clarity. Note that actual independence among the different parameters is different from formal mathematical independence in the expression. In fact, certain interactions between layers may be required to achieve a given computational goal.
Applying the chain rule for the Jacobian with regard to the parameters of the i-th layer, (20)Note how Eq 20 makes sense, both dimensionally and qualitatively. It is natural that the effect of changing the parameters in the i-th layer has two terms: one describing how the change affects the response of this layer (given by ∇ΘiS(i)), and the other describing the propagation of the perturbation through the remaining layers of the network (given by the product of the Jacobians of the subsequent layers with regard to the stimulus).
Now, taking into account that in each layer the parameters come from the linear and the nonlinear parts, and that these can be varied independently, we obtain:where we applied the chain rule to the Jacobian with regard to the matrix Li, together with the L+NL definition of the layer.
Further development of the first term requires the derivative of a linear function with regard to the elements of the matrix Li. This technical issue is addressed in the S2 File. Using the result derived there, namely Eq S2.4, the above equation reduces to: (21)where, as stated in Eq S2.4, the Jacobian w.r.t. the weights is just a block diagonal matrix made from di replications of the (known) vector xi−1, and this expression assumes that the elements of the perturbations ΔLi are vector-arranged row-wise, e.g. using vect(ΔLi⊤). Note that in Eq 21, the only unknown terms are the Jacobian of the nonlinearity w.r.t. the stimulus, already referred to as the first analytical result of this work (Eq 24), and the Jacobian of the nonlinearity w.r.t. the parameters, which will be the second analytical result of the work (Result II), namely Eqs 29–36.
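The block-diagonal structure of Eq S2.4 used in Eq 21 can be illustrated with a small numerical sketch (the dimensions and weights here are arbitrary): the Jacobian of y = L·x with respect to the row-wise vectorized entries of L is a block-diagonal replication of x⊤, so Δy = J · vect(ΔL).

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in = 3, 4
L = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)

# Jacobian of y = L @ x w.r.t. the entries of L, arranged row-wise:
# a block-diagonal matrix with d_out replications of x^T (structure of Eq S2.4).
J = np.kron(np.eye(d_out), x)          # shape (d_out, d_out*d_in)

dL = 1e-6 * rng.normal(size=(d_out, d_in))
dy_pred = J @ dL.reshape(-1)           # reshape(-1) is the row-wise vectorization
dy_true = (L + dL) @ x - L @ x         # exact, since the map is linear in L
print(np.allclose(dy_pred, dy_true))
```

Because the map is linear in the weights, the agreement here is exact (no finite-difference error), which is why this term of Eq 21 needs no further approximation.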
Jacobian and perceptual distance
In the input-output setting represented by S, perceptual decisions (e.g. discrimination between stimuli) will be made on the basis of the information available in the response (output) space and not in the input space. This role of the response space in stimulus discrimination is consistent with (i) the psychophysical practice that assumes uniform just noticeable differences in the response domain to derive the slope of the response from experimental thresholds [2, 4, 43], and (ii) the formulation of subjective distortion metrics as Euclidean measures in the response domain [9–11, 77].
Perceptual distance: General expression.
The perceptual distance between two images can be defined as the Euclidean distance between the corresponding responses:(22)
A Euclidean distance in the response domain implies a non-Euclidean measure in the input image domain [9, 11, 14, 16, 78]. One may imagine that, for nontrivial S−1, the preimage of the sphere of radius |Δxn|2 around a response will no longer be a sphere (not even a convex region!) in the input space. The size and orientation of these discrimination regions determine the visibility of distortions Δx0 on top of a certain background image. Different Euclidean lengths in the image space (different |Δx0|2) will be required in different directions in order to lead to the same perceptual distance. The variety of orientations and sizes of the well-known Brown-MacAdam color discrimination regions [79] is an intuitive (just three-dimensional) example of the above concepts.
Perceptual distance: 2nd-order approximation.
Assuming the local-linear approximation of the response around the reference image, Eq 15, the perceptual distance from the reference image reduces to:(23)with metric matrix M(x0) = ∇x0S⊤ ⋅ ∇x0S. Therefore, this matrix plays the role of a non-Euclidean metric induced by the sensory system. This is a 2nd-order approximation because, in this way, the perceived distortion only depends on the interaction between the deviations in pairs of locations.
Note that a constant value for the distance in Eq 23 defines an ellipsoid oriented and scaled according to the metric matrix. In this 2nd-order approximation, the discrimination regions reduce to discrimination ellipsoids. The properties of these ellipsoids depend on the metric and hence on the Jacobian of the response w.r.t. the stimulus (i.e. on Result I below). In particular, the orientation depends on the eigenvectors of M and the scaling depends on the eigenvalues.
The simplicity of Eq 23 relies on the assumption of a quadratic norm in Eq 22 (as opposed to other possible summation exponents in Minkowski metrics [15]). Using other norms would prevent writing the distance in the response domain through the dot product of Δxn, so the local-linear approximation would not be as convenient. With non-quadratic summation the distance would still depend on the elements of the Jacobian (and hence on Result I), but the expression would be more complicated, and the reasoning through Jacobian-related eigenvectors would not be as intuitive.
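The identity behind Eq 23 can be checked numerically. With an arbitrary (illustrative) Jacobian J at the reference stimulus, the squared Euclidean distance in the response domain equals the quadratic form defined by the metric M = J⊤J, and the discrimination ellipsoid follows from the eigen-decomposition of M:

```python
import numpy as np

rng = np.random.default_rng(2)
J = rng.normal(size=(3, 3))            # stand-in Jacobian of the response at x0
M = J.T @ J                            # induced metric (Eq 23)

dx = 1e-3 * rng.normal(size=3)         # small distortion around the reference
d2_metric = dx @ M @ dx                # quadratic form in the input domain
d2_response = np.sum((J @ dx) ** 2)    # squared Euclidean distance in the response domain
print(np.allclose(d2_metric, d2_response))

# 2nd-order discrimination ellipsoid: axes along the eigenvectors of M,
# semi-axes proportional to 1/sqrt(eigenvalues)
evals, evecs = np.linalg.eigh(M)
semi_axes = 1.0 / np.sqrt(evals)
```

The identity holds exactly (not just to second order) because the local-linear approximation has already been applied; the 2nd-order nature of Eq 23 refers to the neglected curvature of S itself.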
2.2 Result I: Jacobian with regard to the stimulus
The problem of computing the Jacobian with regard to the stimulus in the cascade of L+NL modules, ∇x0S, reduces, according to Eqs 17 and 18, to the computation of the Jacobian of the nonlinearity with regard to the stimulus in every layer. In this section we give the analytical expression of the required Jacobian in the canonical divisive normalization case, and for two alternative nonlinearities. Proofs of this first set of analytical results are given in the S3 File. The role of this analytical result in generating stimuli for novel psychophysics is illustrated in the Discussion, Section 3.2.
Jacobian of the canonical nonlinearity with regard to the stimulus.
The matrix form of the divisive normalization, Eq 10, based on the diagonal matrix notation for the Hadamard products, makes it easy to compute the Jacobian (see the explicit derivation in the S3 File), which leads to:(24)
Eq 24 shows that the Jacobian depends on the subtraction of two matrices, where the first one is diagonal and the second one depends on Hi, the matrix describing the interaction between the intermediate linear responses. Note that the role of the interaction is subtractive, i.e. it reduces the slope (for positive Hi). In situations where there is no interaction between the different coefficients of yi (diagonal Hi), the resulting Jacobian is point-dependent, but diagonal.
Eq 24 also shows that the sign of the linear coefficients has to be considered twice (through the multiplication by the diagonal matrices at the left and right). This detail in the sign (which is crucial to set the direction in gradient descent) was not properly addressed in previous reports of this Jacobian (e.g. in [10, 11, 80]) because this literature was focused on properties which are independent of the sign (diagonal nature, effect on the metric, and determinant, respectively).
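To make the structure described above concrete, the following sketch implements a divisive normalization of the form x = sign(y)⊙|y|^γ/(b + H|y|^γ), our stand-in for Eq 10 with illustrative parameter values, and checks an analytic Jacobian with the announced structure (a diagonal term minus an H-dependent term, with sign diagonals on both sides) against finite differences:

```python
import numpy as np

def div_norm(y, b, H, gamma):
    """Divisive normalization (stand-in for Eq 10): x_k = sign(y_k)|y_k|^g / (b_k + (H|y|^g)_k)."""
    e = np.abs(y) ** gamma
    return np.sign(y) * e / (b + H @ e)

def div_norm_jac(y, b, H, gamma):
    """Analytic Jacobian with the structure of Eq 24: subtraction of a diagonal matrix and
    an H-dependent matrix, with sign diagonals on the left and on the right."""
    s, e = np.sign(y), np.abs(y) ** gamma
    den = b + H @ e
    inner = np.diag(1.0 / den) - np.diag(e / den ** 2) @ H   # diagonal minus interaction
    De = np.diag(gamma * np.abs(y) ** (gamma - 1))           # derivative of the energy e
    return np.diag(s) @ inner @ De @ np.diag(s)

rng = np.random.default_rng(3)
d = 5
y = np.sign(rng.normal(size=d)) * rng.uniform(0.5, 1.5, size=d)   # keep |y| away from 0
b = 0.1 * np.ones(d)
H = 0.1 * np.abs(rng.normal(size=(d, d)))
gamma = 0.7

J = div_norm_jac(y, b, H, gamma)
eps = 1e-6
J_fd = np.column_stack([(div_norm(y + eps * v, b, H, gamma) - div_norm(y, b, H, gamma)) / eps
                        for v in np.eye(d)])
print(np.allclose(J, J_fd, atol=1e-4))
```

Dropping either sign diagonal breaks the finite-difference agreement for stimuli with negative coefficients, which is exactly the sign issue discussed above.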
Jacobian of alternative nonlinearities with regard to the stimulus.
The forward Wilson-Cowan transform does not have an explicit expression since the solution evolves from a differential equation. As a result, there is no analytic expression of its Jacobian either. However, its inverse is analytical (as detailed in the next section, Eq 40). Therefore, given that the Jacobian matrices of a function and its inverse are matrix inverses of each other, we can compute the Jacobian of the forward Wilson-Cowan transform from the Jacobian of its inverse.
Specifically, differentiation with regard to the response in the analytic inverse given in Eq 40 is straightforward, and it leads to:(25)As a result, the Jacobian of the forward Wilson-Cowan nonlinearity is the matrix inverse,(26)assuming that the Jacobian of the inverse is nonsingular at the considered point. Note that, in general, this Jacobian matrix will be nondiagonal because of the inhibitory interactions between sensors expressed in the (nondiagonal) matrix W.
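The inverse-function relation between Eqs 25 and 26 can be illustrated with a stand-in steady state of the form y = αx + W f(x); the tanh activation and the parameter values below are assumptions of this sketch, not the exact form of Eq 12. The forward transform has no closed form, so it is computed by Newton iteration on the analytic inverse, and its Jacobian is obtained as the matrix inverse of the Jacobian of the inverse:

```python
import numpy as np

f = np.tanh                                # saturating activation (illustrative choice)
df = lambda x: 1.0 / np.cosh(x) ** 2       # its derivative

def inverse(x, alpha, W):                  # analytic inverse (role of Eq 40)
    return alpha * x + W @ f(x)

def inv_jac(x, alpha, W):                  # Jacobian of the inverse (role of Eq 25)
    return alpha * np.eye(len(x)) + W @ np.diag(df(x))

def forward(y, alpha, W, iters=50):        # forward transform by Newton iteration
    x = y / alpha
    for _ in range(iters):
        x = x - np.linalg.solve(inv_jac(x, alpha, W), inverse(x, alpha, W) - y)
    return x

rng = np.random.default_rng(4)
d = 4
alpha, W = 1.0, 0.1 * rng.normal(size=(d, d))   # weak coupling keeps the inverse nonsingular
x_true = rng.normal(size=d)
y = inverse(x_true, alpha, W)

# Jacobian of the forward transform = inverse of the Jacobian of the inverse (role of Eq 26)
J_fwd = np.linalg.inv(inv_jac(x_true, alpha, W))
eps = 1e-6
J_fd = np.column_stack([(forward(y + eps * v, alpha, W) - forward(y, alpha, W)) / eps
                        for v in np.eye(d)])
print(np.allclose(J_fwd, J_fd, atol=1e-4))
```

Note that J_fwd is nondiagonal whenever W is, which is the point made above about the inhibitory interactions.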
For the other example of alternative nonlinearity, the two-gamma saturation model, the Jacobian with regard to the stimulus is a diagonal matrix since this special nonlinearity is a point-wise operation. From Eq 13, according to the derivation given in the S3 File, the Jacobian of the two-gamma model is:(27)Note that the logarithm and the division by |y| imply a singularity at zero. Then, in order to guarantee the differentiability of the nonlinear transform, we propose a modification of the nonlinearity in a small neighborhood of 0. By choosing an arbitrarily small ϵ, with 0 < ϵ ≪ 1, we modify Eq 13 for small inputs in this way, (28)where,
With this modification around zero the two-gamma nonlinearity and its derivative are continuous and well defined everywhere: the Jacobian for |y| > ϵ is given by Eq 27, and for smaller inputs by the derivative of the modified expression, which is well defined at zero.
2.3 Result II: Jacobian with regard to the parameters
The problem of computing the Jacobian with regard to the parameters in the cascade of L+NL modules, ∇ΘS, reduces, according to Eqs 19–21, to the computation of the Jacobian of the nonlinearity with regard to the parameters in every layer. In this section we give the analytical expression of the required Jacobian in the canonical divisive normalization case. Proofs of this second analytical result are given in the S4 File. The role of this analytical result in getting optimal models from classical psychophysics is illustrated in the Discussion, Section 3.3.
Jacobian w.r.t. parameters: General equations.
The parameters of the divisive normalization of the i-th layer that may be independently modified are θi = {γi, bi, Hi}. Therefore, the Jacobian w.r.t. θi is given by this concatenation:(29)where, according to the derivation given in the S4 File, we have,(30)(31)(32)Here, a diagonal matrix with the vector v in its diagonal is denoted as stated in Eq 9, and a block diagonal matrix built by d-times replication of the matrix (or vector) v is denoted as stated in the S2 File (in Eq S2.4). Note also that, consistently with the derivative of a linear function w.r.t. its parameters (in the S2 File), in order to apply the Jacobian in Eq 32 on small perturbations of the matrix Hi, the corresponding perturbation ΔHi should undergo row-wise vectorization before the product.
Jacobian w.r.t. parameters: Specific equations for Gaussian kernels.
The qualitative meaning of Hi (interaction between neighboring neurons) naturally leads to proposing specific structures in the rows of these matrices. For instance, stronger interaction between closer neurons naturally leads to the idea of Gaussian kernels [4]. This functional parametrization implies a dramatic reduction in the number of unknowns because each row, with dimension di, could be described by a Gaussian with only two parameters: amplitude and width. In the considered retina-V1 pathway the identity of each sensor is characterized by its 2D spatial location or by its 4D spatio-frequency location. In the most general case the index, k, of the sensor has spatio-frequency meaning:where pk = (pk1, pk2) is the optimal 2D location, fk is the optimal spatial frequency, and ϕk is the optimal orientation of the k-th sensor. In V1, the interaction between the linear response and the neighbors decreases with the distance between k and k′ in space, frequency and orientation [4]. Restricting ourselves to intra-subband interactions (which incidentally are the most relevant [11, 80]) one has:(33)where the relevant parameters respectively stand for the amplitude and width of the Gaussian centered on the k-th sensor, the exponent contains the squared distance between the sensors, and dpk1dpk2 is just the spatial area of the discrete grid of sensors that sample the visual space in this subband. The pool of all the interactions follows from summing these Gaussian contributions.
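A minimal sketch of this Gaussian parametrization (here on a 1D grid, intra-subband, with normalization conventions that are assumptions of this sketch) makes the reduction in free parameters explicit: each row of H is fully specified by an amplitude and a width instead of di independent numbers:

```python
import numpy as np

def gaussian_kernel_rows(positions, amplitudes, widths, dp):
    """Interaction matrix H with one Gaussian per row (spirit of Eq 33, 1D case):
    H[k, k'] = amp_k * exp(-|p_k - p_k'|^2 / width_k^2) * dp,
    where dp is the area (here, length) of the discrete grid cell."""
    D2 = (positions[:, None] - positions[None, :]) ** 2   # squared distances between sensors
    return amplitudes[:, None] * np.exp(-D2 / widths[:, None] ** 2) * dp

pos = np.linspace(0, 1, 8)                 # 1D grid of sensor positions
amp = 0.5 * np.ones(8)                     # one amplitude per row
wid = 0.2 * np.ones(8)                     # one width per row
H = gaussian_kernel_rows(pos, amp, wid, dp=pos[1] - pos[0])
print(H.shape)                             # (8, 8): 16 parameters instead of 64
```

Allowing different amplitudes and widths per row corresponds to the "different Gaussian in each row" case considered next, while tying them across a subband corresponds to the structure matrices discussed below.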
In the case of different interactions per sensor (a different Gaussian in each row), the derivatives with regard to the independent widths are, (34)With this parametrization of H we can develop Eq 32 further: the dependence on individual widths can be obtained through the chain rule, and the final result (see the S4 File) is:(35)where,
A diagonal matrix here makes sense because the modification of the interaction width of a sensor only affects the nonlinear response of that sensor (similarly to the diagonal nature of the Jacobian in Eq 31).
The derivative with regard to the vector of amplitudes of the Gaussian interactions is a concatenation of columns (similarly to Eq 34). It can also be computed from the chain rule and from the derivative w.r.t. the corresponding variables. The result is:(36)where,
The number of free parameters can be further reduced if one assumes that the values of the semisaturation, or the parameters of the Gaussians, have a certain structure (e.g. constant along the visual space in each subband). One may impose this structure in Eqs 31, 35 and 36 by right-multiplication of the Jacobian by a binary matrix that describes the structure of the considered vector. For instance, assuming the same width all over each scale in a two-scale image representation, one only has two independent parameters. In that case:where the structure matrix selects which coefficients belong to each scale:
2.4 Result III: Analytic inverse
The inverse of the global transform can be obtained by inverting each individual L+NL layer in turn,(37)where, (38)Here we will focus on the nonlinear part because the linear part can be addressed by standard matrix inversion.
Here we present the analytical inverse of the canonical divisive normalization and of the Wilson-Cowan alternative. The inverse of the two-gamma nonlinearity is not addressed here but in the S1 File because, given the coupling between the input and the exponent, it has no analytical inverse. Nevertheless, a simple and efficient iterative method is proposed there to compute the inverse. The role of the analytical inverse in improving conventional decoding of visual signals is illustrated in the Discussion, Section 3.4.
A note on the linear part: the eventual rectangular nature of Li (different number of outputs than inputs in the i-th layer) requires the standard pseudoinverse, (⋅)†, instead of the regular square-matrix inversion, (⋅)−1; and it may be regularized through standard methods [81, 82] in case Li is ill-conditioned. Information loss in the pseudoinverse due to strong dimensionality reduction in Li is not serious in the central region of the visual field due to mild undersampling of the fovea throughout the neural pathway [83]. The only aspect of the input that definitely cannot be recovered from the responses is the spectral distribution at each location. In color perception models the first stage is a linear spectral integration giving opponent tristimulus values at each spatial location [62]. This very first linear stage is represented by an extremely fat rectangular matrix in each location (300 wavelengths in the visible spectral region reduce to 3 tristimulus values), which definitely is not invertible through standard regularized pseudoinversion. Therefore, the inversion of a standard retina-V1 model such as the one used in the Discussion may recover the tristimulus images but not the whole hyperspectral array.
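This spectral many-to-one argument is easy to simulate: with a fat 3×300 integration matrix (the Gaussian sensitivities below are stand-ins for actual color matching functions), the pseudoinverse recovers the tristimulus values exactly but not the original spectrum, which is the essence of metamerism:

```python
import numpy as np

rng = np.random.default_rng(7)
n_wl = 300
wl = np.linspace(0, 1, n_wl)
# Stand-in smooth "spectral sensitivities": three Gaussian bumps over wavelength
T = np.stack([np.exp(-(wl - c) ** 2 / 0.02) for c in (0.3, 0.5, 0.7)])   # 3 x 300, very fat

spectrum = np.abs(rng.normal(size=n_wl))     # arbitrary input spectrum
t = T @ spectrum                              # tristimulus values: 300 -> 3, many-to-one
spectrum_hat = np.linalg.pinv(T) @ t          # pseudoinverse: minimum-norm preimage

print(np.allclose(T @ spectrum_hat, t))       # tristimulus values recovered exactly...
print(np.allclose(spectrum_hat, spectrum))    # ...but not the full spectrum
```

The minimum-norm preimage lives in the 3-dimensional row space of T, so any spectral detail orthogonal to the three sensitivities is irrecoverably lost, regardless of regularization.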
The metamerism concept (the many-to-one transform) can be generalized beyond spectral integration. In higher levels of processing, it has been suggested that stimuli may not be represented by the specific responses of a population of neurons, but by their statistical properties [84]. These statistical summaries can be thought of as a stronger nonlinear dimensionality reduction which cannot be decoded through regular pseudoinversion. Therefore, the proposed inverse is applicable only to the (early) stages in which the information is still encoded in the responses of the population and not in summarized descriptions of these responses.
Analytic inverse of the divisive normalization.
Analytic inversion of the standard divisive normalization, Eq 8, is not obvious. However, using the diagonal matrix notation for the Hadamard product, the inverse is (see the S5 File), (39)where exponentiation of a vector is understood element-wise.
Consistently with generic inverse-through-integration approaches based on ∇xS−1 [85], here Eq 39 shows more specifically that in this linear-nonlinear architecture, inversion reduces to matrix inversion. While the linear filtering operations, Li, may be inverted without the need of an explicit matrix inversion through surrogate signal representations (deconvolution in the Fourier or Wavelet domains), there is no way to avoid the inverse (I − ε)−1 in Eq 39. This may pose severe computational problems in high-dimensional situations (e.g. in redundant wavelet representations). A series expansion alternative for that matrix inversion was proposed in [10], where it is substituted by a (more affordable) series of matrix-on-vector operations.
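The structure of Eq 39 (inversion reduced to a single matrix inversion plus element-wise undoing of the exponent and the sign) can be verified on a stand-in divisive normalization, x = sign(y)⊙|y|^γ/(b + H|y|^γ), an assumption of this sketch: since |x|⊙(b + He) = e is linear in the energies e = |y|^γ, recovering e requires only one matrix inversion.

```python
import numpy as np

def div_norm(y, b, H, gamma):
    """Stand-in divisive normalization: x_k = sign(y_k)|y_k|^g / (b_k + (H|y|^g)_k)."""
    e = np.abs(y) ** gamma
    return np.sign(y) * e / (b + H @ e)

def div_norm_inv(x, b, H, gamma):
    """Analytic inverse in the spirit of Eq 39: solve |x|*(b + H e) = e for e = |y|^g,
    a single matrix inversion, then undo the exponent and restore the sign."""
    a = np.abs(x)
    e = np.linalg.solve(np.eye(len(x)) - np.diag(a) @ H, a * b)
    return np.sign(x) * e ** (1.0 / gamma)

rng = np.random.default_rng(5)
d = 6
y = np.sign(rng.normal(size=d)) * rng.uniform(0.5, 1.5, size=d)
b = 0.3 * np.ones(d)
H = 0.05 * np.abs(rng.normal(size=(d, d)))
x = div_norm(y, b, H, gamma=0.6)
print(np.allclose(div_norm_inv(x, b, H, gamma=0.6), y))   # exact recovery of the input
```

For high-dimensional wavelet representations the explicit `solve` above is exactly the expensive step that the series expansion of [10], mentioned in the text, replaces by matrix-on-vector products.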
Inverse of the Wilson-Cowan equations.
The expression of the inverse of the Wilson-Cowan transform is straightforward: by reordering the terms in the steady-state equation, Eq 12, it follows,(40)Note that this inverse function is easily differentiable w.r.t. xi, which is required to obtain the corresponding Jacobian of the forward transform, Eq 26.
Relation between Result I and Result III.
Result III (inverse) is obviously related to Result I (Jacobian with regard to the stimulus) because a sufficient condition for invertibility is that the Jacobian with regard to the stimulus is nonsingular for every image. Note that if the Jacobian is nonsingular, the inverse of the Jacobian can be integrated and hence the global inverse can be obtained from the local-linear approximations, as in other local-to-global methods, e.g. [43, 45, 85, 86].
This general statement is perfectly illustrated by the similarity between Eqs 39 and 24. According to Eq 39, inverting the divisive normalization reduces to inverting a matrix that combines the responses and the interaction kernel. Similarly, according to Eq 24, the singularity of the Jacobian depends on the very same matrix. As a result, a specific interest in invertible models implies restrictions on the responses and on the parameters of H: the eigenvalues of the relevant matrix product have to be smaller than 1 [10].
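This invertibility condition and the series alternative of [10] can be sketched as follows. Assuming, as in the inverse sketched for Eq 39, that the relevant matrix product is diag(|x|)·H, when its eigenvalues are smaller than 1 in magnitude the explicit matrix inversion can be replaced by a convergent Neumann series of matrix-on-vector products:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 6
a = rng.uniform(0.5, 1.5, size=d)       # |x|: magnitudes of the nonlinear responses
H = 0.05 * np.abs(rng.normal(size=(d, d)))
b = 0.3 * np.ones(d)

M = np.diag(a) @ H
rho = np.max(np.abs(np.linalg.eigvals(M)))
print(rho < 1)                          # invertibility condition: spectral radius below 1

# Series alternative to the explicit inversion of (I - M), as proposed in [10]:
# (I - M)^(-1) v = v + M v + M^2 v + ...   using only matrix-on-vector products
e_series = np.zeros(d)
term = a * b
for _ in range(200):
    e_series += term
    term = M @ term
e_exact = np.linalg.solve(np.eye(d) - M, a * b)
print(np.allclose(e_series, e_exact))
```

When the spectral radius approaches 1 the series converges slowly and the inverse becomes ill-conditioned, which is the practical face of the restriction on the responses and on H stated above.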
3 Discussion
In this section we consider illustrative vision models based on cascades of L+NL stages to point out (a) the fundamental insight into the system behavior that can be obtained from the analytic expressions, and (b) the usefulness of the expressions to develop new experiments and methods in visual neuroscience. The first consists of using the analytical expressions to identify basic trends in the physiology, the psychophysics, and the function of the sensory system. Specifically, (a.1) we show how the context dependence of the receptive fields of the sensors can be explicitly seen in the expression of the Jacobian w.r.t. the stimulus. (a.2) We show that the expression of the Jacobian w.r.t. the parameters reveals that the impact on the response of uncertainty in the filters, or synaptic weights, varies over the stimulus space, and that this trend is different for different sensors. (a.3) We show how the general trends of the sensitivity over the stimulus space can be seen from the determinant of the metric based on the Jacobian w.r.t. the stimulus. (a.4) We show that this Jacobian also implies different efficiency (different multi-information reduction) in different regions of the stimulus space. The second includes (b.1) stimulus design in novel psychophysics, (b.2) more accurate model fitting in classical physiology and psychophysics, and (b.3) new proposals for the decoding of visual signals.
Let’s briefly describe the kind of vision model used as an example throughout the discussion. The case-study model follows the program suggested in [1]: a cascade of four isomorphic canonical L+NL modules addressing brightness, contrast, frequency-filtered contrast masked in the spatial domain, and orientation/scale masking. The general architecture is certainly not new, but the proposed expressions were very helpful to psychophysically tune these specific modules to work together for the first time. The response of the model to an image is illustrated in Fig 1.
[Figure omitted. See PDF.]
Fig 1. A cascade of isomorphic L+NL modules based on canonical Divisive Normalization.
The input is the spatial distribution of the spectral irradiance at the retina. For this illustration we generated a spectral image from the publicly available reflectance samples in ColorLab [87] and a regular color image from the public USC-SIPI Database [88] to get a scene with reduced contrast at the left. (1) The linear part of the first layer consists of three positive LMS spectral sensitivities and a linear recombination of the LMS values with positive/negative weights. This leads to three tristimulus values at each spatial location: one is proportional to the luminance, and the other two have opponent meaning (red-green and yellow-blue). These linear responses undergo adaptive saturation transforms. Perception of brightness is mediated by an adaptive Weber-like nonlinearity applied to the luminance at each location. This nonlinearity enhances the response in the low-luminance regions. (2) The linear part of the second layer computes the deviation of the brightness at each location from the local brightness. Then, this deviation is normalized by the local brightness to give the local nonlinear contrast. (3) Local contrast is convolved with center-surround receptive fields (or filtered by the Contrast Sensitivity Function). Then the linearly filtered contrast is normalized by the local contrast. Again, normalization increases the response in the regions with small input (low contrast). (4) After a linear wavelet transform, each response is normalized by the activity of the surrounding neurons. Again, the activity relatively increases in the regions with low input. The common effect of all the nonlinear modules throughout the network is response equalization. The S8 File shows the PDFs of the responses along the network, which are consistent with previous reports on the equalization effect of Divisive Normalization.
https://doi.org/10.1371/journal.pone.0201326.g001
Before going into the many details of the full 4-layer model (given in the S1 File, and in the code available in http://isp.uv.es/docs/BioMultiLayer_L_NL.zip), let’s look at a cartoon version for a better interpretation of the analytical expressions.
Consider a system with only three sensors acting on three-pixel images. Assume it is a cascade of just two L+NL layers, one for brightness and the next for spatial frequency analysis:
* Layer 1: brightness from radiance,
* Layer 2: spatial frequency analyzers and contrast response,
The biological basis of this simplified model is straightforward: integration over wavelengths is done using the standard spectral sensitivity function, Vλ [61], and we assume a simple, point-wise and fixed, exponential relation between luminance and brightness [61, 62]. Regarding spatial pattern detection, we assume frequency-selective linear analyzers [7] in the rows of F. The first sensor (first row) is tuned to the DC component of brightness, the second sensor (second row) to the low frequency component, and the last sensor (third row) to the high frequency. Each of these linear sensors has different (frequency dependent) gain in the diagonal matrix G. This gain is band-pass, i.e. similar to the Contrast Sensitivity Function, CSF [59]. Finally, the contrast response undergoes a compressive transform where the interactions between coefficients are neglected as in [89, 90], by using an identity matrix as interaction kernel H.
As a result, the responses of the k-th photoreceptor in the first L+NL layer represent the luminance and the brightness at the k-th spatial location. Given the frequency-analysis meaning of F, the first responses (k = 1) are related to the average brightness of the image, while the responses with k > 1 are related to the amplitude or contrast of the low- and high-frequency AC components. With this in mind we will be able to identify the trends of biologically meaningful magnitudes from the proposed expressions in terms of the luminance and contrast of the images in the stimulus space.
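The cartoon model can be written down in a few lines. The brightness exponent, the frequency analyzers, and the CSF-like gains below are illustrative numbers chosen for this sketch, not fitted values:

```python
import numpy as np

x0 = np.array([80.0, 100.0, 120.0])            # luminance at the three locations

beta = 1.0 / 3.0                               # brightness exponent (illustrative)
x1 = x0 ** beta                                # layer 1: point-wise brightness

F = np.array([[1.0, 1.0, 1.0],                 # DC sensor (average brightness)
              [1.0, 0.0, -1.0],                # low-frequency sensor
              [1.0, -2.0, 1.0]])               # high-frequency sensor
F = F / np.linalg.norm(F, axis=1, keepdims=True)   # orthonormal rows, |F| = 1
G = np.diag([0.5, 1.0, 0.8])                   # band-pass, CSF-like gains

y2 = G @ F @ x1                                # layer 2, linear part
b, gamma = 0.1, 0.7
e = np.abs(y2) ** gamma
x2 = np.sign(y2) * e / (b + e)                 # saturation with H = I (no interactions)
print(x2)
```

With H = I the normalization is point-wise, as assumed for this cartoon, so each of the three channels (mean brightness, low- and high-frequency contrast) saturates independently.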
The first part of the discussion is focused on examples of the insight into the system that can be obtained from the presented analytical results. Then, we show that Result I is convenient in new psychophysics such as MAximum Differentiation (MAD) [20], and that Result II is convenient for parameter estimation in classical experiments. In fact, MAD and Result I were used to determine the 2nd and 3rd layers of the illustrative L+NL cascade, and Result II was used as an alternative to brute-force optimization to maximize correlation with subjective opinion in the 1st and 4th layers. The good visual examples of MAD and the goal-optimization curves are practical demonstrations of the correctness of the analytical results. Finally, the analytical inversion, Result III, is compared here with conventional blind decoding techniques [39–41] used for visual brain decoding.
3.1 Physiological, psychophysical and functional trends from the expressions
Physiology.
The receptive field of a neuron is the function that describes how the amplitude of the stimulus at different locations affects its response. In the simplest (linear) setting, the receptive field of the k-th neuron of the n-th layer is a vector of weights, and the variation of the response is given by the dot product of this vector with the variation of the stimulus [5, 6]. In a nonlinear system, S, the variation of the response(s) due to the variation of the input is described by the first term of the linear approximation in Eq 15, Δxn = ∇x0S ⋅ Δx0. Therefore, the receptive fields of the sensors at the n-th layer can be thought of as the rows of the corresponding Jacobian w.r.t. the stimulus.
Using the above receptive field definition based on the Jacobian, a number of interesting qualitative consequences can be extracted from the analytical Result I (Eq 24) and the associated chain rule expressions, Eqs 17 and 18.
First, the shape of the receptive fields at the i-th layer is mediated by the functions in the rows of the matrices Li, but it is going to be a signal-dependent combination of these functions. Note that if the Jacobian is not diagonal, the receptive fields are spatially nontrivial, i.e. not simple delta functions. In the cartoon example this nondiagonal nature comes from the matrix with frequency analyzers, F. In the same way, in the cortical layer of our full model (4th layer), this spatially meaningful part is mediated by a filterbank of wavelet-like linear sensors (in the matrix L4). According to the chain rule, Eq 18, these wavelet-like receptive fields will be modified by the Jacobian of the nonlinearity. This is relevant because, if the Jacobian of the nonlinear part is signal-dependent and nondiagonal, the corresponding linear filters will be recombined in interesting ways, leading to variations of the receptive fields. Result I tells us that, in general, this is going to be the case because the diagonal matrices in Eq 24 depend on the signal, and the interaction matrix H is, in principle, nondiagonal. This anticipates that the receptive fields at this cortical layer are going to be signal-dependent combinations of wavelet functions. A closer look at Eq 24 allows us to make more specific statements about this adaptive behavior of the receptive fields.
Second, the fundamental effect of the background signal is reducing the amplitude of the receptive fields (or reducing the global gain of the sensors). Note that in Eq 24 this is done in two different ways: a global divisive effect through the diagonal attenuation factors, and a subtractive effect through the H-dependent term. In both cases, the bigger the input or output activity (the bigger e and |x|, i.e. the contrast), the stronger the attenuation and subtraction. Moreover, this reduction is sensor-specific. Note that left-multiplication by diagonal matrices implies a different factor per row (see Eq S9.3 in the S9 File); therefore, activity in the k-th sensor is going to reduce the amplitude of the k-th row and it is going to increase the subtraction of the linear combination described by the k-th row of H (but not of other rows!). These terms mean that H would determine a fixed combination of the wavelet filters that would be subtracted from the original filters to a bigger or smaller extent depending on the contrast of each wavelet component of the signal. However, there is an extra signal-dependent matrix in the Jacobian, considered next.
Third, the way the linear filters are recombined depends on H, but this recombination is not fixed: it may be signal-dependent. However, if γ = 1 this dependence vanishes. This effect comes from the extra matrix mentioned above, which right-multiplies the interaction kernel H, and hence its effect is substantially different: it applies a different factor per column (see Eq S9.4 in the S9 File). As a result, the neighboring filters will be combined differently if γ ≠ 1. Otherwise this matrix becomes diagonal and the combination of neighbors is totally determined by H, leading to a contrast-dependent attenuation but not a strong change in shape.
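The attenuation effect can be simulated with a stand-in divisive normalization (positive inputs, with H, γ, and the identity filterbank chosen for illustration): the rows of the Jacobian, i.e. the adapted receptive fields, shrink when the contrast of the input grows.

```python
import numpy as np

def dn_jacobian(y, b, H, gamma):
    """Jacobian of x = y^g / (b + H y^g) for positive inputs y
    (signs omitted for clarity in this sketch)."""
    e = y ** gamma
    den = b + H @ e
    inner = np.diag(1.0 / den) - np.diag(e / den ** 2) @ H
    return inner @ np.diag(gamma * y ** (gamma - 1))

d = 4
L = np.eye(d)                       # stand-in for the wavelet filterbank in L4
b = 0.1 * np.ones(d)
H = 0.1 * np.ones((d, d))           # uniform interactions (illustrative)
gamma = 0.7

low, high = 0.5 * np.ones(d), 2.0 * np.ones(d)   # low- vs high-contrast linear responses
rf_low = dn_jacobian(low, b, H, gamma) @ L       # adapted receptive fields: rows of J . L
rf_high = dn_jacobian(high, b, H, gamma) @ L

print(np.linalg.norm(rf_high) < np.linalg.norm(rf_low))   # attenuation with contrast
```

With a nontrivial filterbank in L the same mechanism also recombines the filters (change of shape), most strongly when γ ≠ 1, as argued above.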
In summary, in Result I, Eq 24, one can identify specific signal-dependent changes in the receptive fields that involve (i) attenuation for sensors stimulated with their preferred signal and (ii) stronger effects on the shape of the receptive field depending on the excitation-inhibition exponent γ. All these effects can be seen in the simulation of Fig 2, where we compare the receptive fields at the 4-th layer tuned to different frequencies at different locations of illustrative signals of different contrast. We compare the receptive fields using γ = 0.65 (found in the experiments described in the next sections) with those obtained using γ = 1.
[Figure omitted. See PDF.]
Fig 2. Insight into physiology I: Context dependence of receptive fields (from ∇xS).
Comparison of cortical receptive fields tuned to different frequencies, orientations, and locations while adapted to illustrative stimuli of different contrast. The stimuli are shown at the left column. Each row corresponds to the receptive fields induced by the corresponding stimulus. The panel at the center shows the receptive fields assuming γ ≠ 1 (as obtained from the experiments). On the contrary, the panel at the right displays the receptive fields setting γ = 1 at every layer on purpose. Note that signal-dependent changes in the receptive fields involve (i) attenuation for sensors stimulated with their preferred signal and (ii) stronger effects on the shape of the receptive field in the left panel when γ ≠ 1, as predicted by the theory.
https://doi.org/10.1371/journal.pone.0201326.g002
Result II is also useful to address physiologically interesting questions. For instance, how does uncertainty in the synaptic weights affect the response of the sensors? Such a question is interesting because the assumptions made to set these filters may be poor, e.g. the selection of a wavelet filterbank in the cortical layer which is not biologically plausible. Similarly, parameters coming from experimental measurements are noisy. How critical is the experimental error in terms of the final impact on the response? In such situations, the Jacobian w.r.t. the parameters (Result II), which describes the impact of variations of the parameters on the response, has obvious interest.
Here we show an example of this use of Result II in the simplified three-sensor model outlined above. In particular, we address how uncertainty in the frequency analyzers (rows of F) impacts the response of the different sensors across the stimulus space. In the absence of Result II, we could add random noise to the filters and empirically check the variation for natural images of different luminance and contrast (see Fig 3). However, Result II allows us to anticipate the outcome of such an experiment. In this case, using the part of Eq 21 that corresponds to the derivative w.r.t. the filters in the matrix L2 = G ⋅ F, the impact of a variation of the k-th filter in F, the row vector ΔFk, is: (41)In this equation we can see that different filters behave differently over the image space. On the one hand, the impact on the response of the AC sensors (k > 1) will increase with the luminance of the input due to the direct dependence on x1, which is related to brightness. However, when increasing the contrast of the images the AC responses also increase and then the subtractive and divisive effects in the fraction of Eq 41 reduce the impact. Of course, increasing the semisaturation or the gain implies the corresponding decrease or increase in the impact of the AC filters. On the contrary, the impact of the DC filter behaves quite differently: it increases (a little) with the contrast, through the energy that may be captured by the random variation ΔFk. But, more importantly, it strongly decreases with luminance because of the subtractive and divisive effects caused by the increased DC responses, which are increasing functions of brightness. Fig 3 confirms all these trends.
[Figure omitted. See PDF.]
Fig 3. Insight into physiology II: Impact on the response of uncertainty in different filters (from ∇θS).
Top: distortions in the zero-frequency filter; Middle: distortions in the low-frequency filter; Bottom: distortions in the high-frequency filter. Variations in the response of the zero-, low-, and high-frequency sensors are represented in red, green, and blue respectively. The different columns were computed using variations in the parameters of the simplified model (baseline) to point out the trends seen in Eq 41. Specifically: (i) the impact on the responses of the AC sensors increases with input luminance and gain, and decreases when contrast or semisaturation increase; (ii) the impact of the DC filter increases with contrast and strongly decreases with luminance.
https://doi.org/10.1371/journal.pone.0201326.g003
Psychophysics.
The sensitivity of the system is characterized by its discrimination ability: sensitivity is greater where the discrimination regions determined by the JNDs are smaller [7, 8]. The trends of the sensitivity of the system over the image space, for a range of luminance and contrast, can be identified from Result I, Eq 24, and the associated expression for the metric, Eq 23. This is because the volume of the discrimination region at each point of the stimulus space is inversely proportional to the determinant of the metric. In our simplified model the frequency-analysis transform is orthonormal, i.e. |F| = 1; as a result, the sensitivity in the space of luminance images depends on the three factors in brackets in Eq 42. The first factor clearly decreases with luminance because its three terms decrease with luminance, brightness, and the nonlinear response to brightness (either by division or by subtraction). Note that the first term in this factor is divisive because γ1 < 1 (saturating transform), and hence 2(γ1 − 1) < 0. Setting γ1 = 1 would reduce the dependence on luminance because the first term in this factor would be 1 for every image. The second factor decreases with contrast because the responses of the AC sensors of the second layer increase with contrast and hence the two terms of the second factor decrease with contrast (by division and subtraction respectively). Note that increasing the semisaturation factor will reduce the dependence on contrast and luminance. Finally, the third factor increases with the area under the CSF-like gain in the diagonal of the matrix G. In Fig 4 we compute the inverse of the volumes of the discrimination regions for 3-pixel natural images covering the luminance and contrast range, and the above trends are confirmed.
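The relation between the metric, its determinant, and the discrimination volume can be illustrated with a minimal sketch. Assuming a toy saturating brightness stage, y = x^γ with γ < 1 (a hypothetical stand-in for the full cascade), the JND-ellipsoid volume, proportional to the inverse square root of the determinant of the metric, grows with luminance, i.e. sensitivity decreases:

```python
import numpy as np

def jacobian(S, x, eps=1e-6):
    """Numerical Jacobian of a vector map S at x (forward differences)."""
    y0 = S(x)
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        xp = x.copy(); xp[j] += eps
        J[:, j] = (S(xp) - y0) / eps
    return J

# Toy saturating brightness stage, y = x**gamma with gamma < 1 (illustrative)
gamma = 0.5
S = lambda x: x**gamma

def discrimination_volume(x):
    J = jacobian(S, x)
    M = J.T @ J                                   # 2nd-order perceptual metric
    return 1.0 / np.sqrt(np.linalg.det(M))        # JND volume (up to a constant)

dark   = np.array([0.1, 0.12, 0.11])
bright = np.array([1.0, 1.2, 1.1])
# Smaller JND region at low luminance: sensitivity decreases with luminance
print(discrimination_volume(dark) < discrimination_volume(bright))
```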
[Figure omitted. See PDF.]
Fig 4. Insight into psychophysics: Sensitivities from the volume of the JND regions (related to ∇xS).
From left to right: (a) baseline situation (shows the expected luminance/contrast dependence), (b) luminance-to-brightness response set to linear (γ1 = 1), (c) increased contrast sensitivity, (d) increased semisaturation.
https://doi.org/10.1371/journal.pone.0201326.g004
Function.
It has been argued that one of the basic functions of the retina-cortex neural pathway is maximizing the information about the stimuli transmitted to subsequent areas of the brain [5, 75, 91]. Under certain conditions [92], the transmitted information is maximized by reducing the redundancy between the coefficients of the neural representation of the stimulus. Therefore, measuring how redundancy is reduced when facing different kinds of stimuli tells us how efficient the system is in transmitting the information about them. A very general measure of redundancy in a set of variables is the multi-information, MI, which measures the number of bits shared among them [19]. Therefore, an appropriate way to assess the efficiency of a system in transmitting the information about certain stimuli is to measure how much the multi-information in the internal representation, MI(xn), is reduced with regard to the multi-information in the input space, MI(x0). Interestingly, this difference depends on the Jacobian of the transform from x0 to xn [19], and hence our Result I is helpful here. According to [19], the multi-information reduction under a transform, S, is given by Eq 43, where h(⋅) represents the entropy of the considered scalar variable, which is easy to compute from the corresponding univariate probability density function of the stimuli, and the second term is the expected value of the log2 of the determinant of the Jacobian over the considered kind of stimuli. In the case of natural images and vision models with reasonable parameters, the nonlinearities perform a sort of PDF equalization [80, 93]; therefore the first term, Δh, should be fairly independent of the average contrast and luminance, and one would expect that the main dependence of ΔMI is given by the term that depends on the Jacobian. In our simplified model, the determinant of the Jacobian is the square root of the sensitivity given in Eq 42.
As a result, one would expect the efficiency to show the same trends as the sensitivity.
In the illustration shown in Fig 5 we took 10^5 natural images of 3 pixels and adjusted their average luminance and contrast to get different sets of stimuli over the whole range. For each set of 10^5 samples we computed ΔMI according to Eq 43. Fig 5 shows that Δh is fairly constant and that the redundancy reduction follows the trends expected from Eq 42. Additionally, sensitivity and efficiency do follow similar trends: they are larger in the low-luminance, low-contrast regions of the input space. Interestingly, these are the regions most populated by natural images [71, 94, 95].
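The structure of Eq 43 can be sanity-checked numerically. For a purely component-wise transform, the multi-information cannot change, so the entropy term must exactly cancel the Jacobian term. The sketch below (illustrative lognormal stimuli and a saturating exponent, with the sign convention ΔMI = Δh + E[log2|det ∇S|]) verifies this cancellation with plug-in histogram entropies:

```python
import numpy as np

rng = np.random.default_rng(0)

def sum_marginal_entropies(samples, bins=64):
    """Sum of plug-in differential entropies (bits) of the columns, via histograms."""
    H = []
    for col in samples.T:
        p, edges = np.histogram(col, bins=bins, density=True)
        w = np.diff(edges)
        nz = p > 0
        H.append(-np.sum(p[nz] * np.log2(p[nz]) * w[nz]))
    return np.sum(H)

# Component-wise saturating transform: a case where Delta-MI must vanish,
# so the marginal-entropy term has to cancel the Jacobian term of Eq 43.
gamma = 0.5
x = rng.lognormal(mean=-2.0, sigma=0.4, size=(20000, 3))
y = x**gamma

delta_h  = sum_marginal_entropies(x) - sum_marginal_entropies(y)
jac_term = np.mean(np.sum(np.log2(gamma * x**(gamma - 1)), axis=1))
dmi = delta_h + jac_term
print(dmi)   # close to 0: marginal transforms do not change the shared information
```

With a non-diagonal cascade (as in the actual model) the two terms no longer cancel, and the Jacobian term drives the dependence of ΔMI on luminance and contrast.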
[Figure omitted. See PDF.]
Fig 5. Insight into function: Reduction in multi-information from the difference in marginal entropies and the Jacobian of the transform.
Top: differences in marginal entropies. Middle: term depending on the Jacobian. Bottom: final multi-information reduction. From left to right: (a) baseline situation (shows the expected luminance/contrast dependence), (b) linear luminance-to-brightness response, (c) increased contrast sensitivity, (d) increased semisaturation. We can see that Δh is fairly constant and that the final efficiency follows the trends of the Jacobian term: it is large in the low-luminance, low-contrast regions of the input space. Interestingly, these are the regions most populated by natural images.
https://doi.org/10.1371/journal.pone.0201326.g005
3.2 Jacobian with regard to the image in stimulus synthesis
Stimulus design often requires that the desired image fulfill certain properties in the response domain. Examples include (i) artistic style transfer [96], in which the response to the synthesized image should be close to the response to the image from which the content is inherited, and should have a covariance structure close to that of the response to the image from which the style is inherited; and (ii) Maximum Differentiation [20, 21, 23], in which the synthesized images should have maximum/minimum perceptual distance with regard to a certain reference image, with a constraint on the energy of the distortion. In both cases, fulfilling the requirements implies modifying the image so that the response is modified in a certain direction. In such situations the Jacobian of the response with regard to the image (Result I) is critical.
Here we discuss in detail the case of MAximum Differentiation (MAD). This technique is used to rank competing vision models by using them to solve a simple geometric problem and visually assessing which one gives the better solution. While in conventional psychophysics the decision between two models relies on how well they fit thousands of individual measurements (either contrast incremental thresholds or subjective ratings of distortions), in MAD the decision between two models reduces to a single visual experiment.
The geometric question posed to the perception model in MAD is the following [20]: given a certain reference image and the set of distorted images departing by a certain amount of energy from it, i.e. the sphere centered on the reference image with a certain fixed radius (or a certain Mean Squared Error), the problem is to look for the images with maximum and minimum perceptual distance on the sphere; let us call them the best and worst images. If the vision model is meaningful, these two images should have a very different visual appearance. The more accurate vision model will be the one leading to the pair of images that are maximally different. The discriminative power of this visual experiment comes from the fact that the synthesis of these stimuli involves comparing the performance of the models under consideration in every possible direction of the space of images.
Fig 6(a) illustrates the geometric problem in MAD. The following paragraphs show how the different solutions to this geometric problem reduce to the use of Result I.
[Figure omitted. See PDF.]
Fig 6. Stimulus generation in MAximum Differentiation (MAD).
(a) The MAD concept: given an original image (e.g. the point in light blue), a perceptual distance measure coming from a vision model (leading to the discrimination region in green), and a certain fixed Euclidean distance (sphere in black), the problem is to look for the best and worst images on the sphere according to the perceptual distance. The solution is given by the images in the directions (or paths) leading to the biggest and smallest perceptual distortions (in dark blue and orange respectively). In the example, the path leading to the biggest perceptual difference (for the same Euclidean length) is the curve in orange because it represents the shortest path from the original image to the discrimination boundary in green. Equivalently, the path leading to the lowest perceptual difference (for the same Euclidean length) is the curve in blue because it represents the longest path to the discrimination boundary in green. (b) The MAD algorithm: start from a random point on the sphere and modify it to increase (or alternatively decrease) the perceptual distance following its gradient. Note that a naive application of the gradient takes the solution out of the sphere. It has to be projected back onto the sphere through the appropriate correction: first remove the component parallel to the gradient of the Euclidean distance, and then project in the radial direction. (c) Second-order approximation: approximate the perceptual discrimination regions by ellipsoids (local-linear approximation of the vision model). In this way the MAD images are given by the directions of the maximum and minimum eigenvalues of the 2nd-order metric matrix.
https://doi.org/10.1371/journal.pone.0201326.g006
General, but numeric, solution to MAD.
In general there is no analytical solution for this problem, and hence one has to start from a guess image and modify it according to the direction of the gradient of the perceptual distance in order to maximize or minimize this distance. Of course, the problem, illustrated in Fig 6(b), is that a naive modification of the m-th guess in the direction of this gradient puts the solution out of the sphere: note the location of the point in pink in Fig 6(b). As proposed in [20], this departure from the sphere is solved by (i) subtracting the component parallel to the gradient of the Euclidean distance (see the point in red), and (ii) projecting this displaced point back onto the sphere (see the point in orange). In summary, the complete iteration for the stimulus that maximizes/minimizes the distance is given by Eq 44 [20], where λ is the constant that controls the convergence of the gradient descent, and ν can be computed analytically since the Euclidean distance of the point projected onto the sphere should equal the radius; given the gradients, the only unknown is ν. Note that the gradients of the distances are row vectors since they should be applied to the column vectors describing the increments in the images (row vector times column vector). That is why we need to transpose the gradients before adding the modifications to the current guess, and that is the reason for the transposes in the scalar products of gradients (as in the projection, row vector times column vector). Note also that the gradients without prime are computed at the current guess, and the gradient with prime is computed at the displaced point.
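The projected-gradient iteration of Eq 44 can be sketched in a few lines. Here we assume a hypothetical quadratic perceptual distance (a fixed metric matrix M standing in for the model-induced distance) and iterate ascent steps with the two corrections described above: removing the component parallel to the Euclidean gradient, then re-projecting onto the sphere. In this quadratic case the ascent should converge to the top eigenvector of M, which gives an easy check:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical quadratic perceptual metric (stand-in for the model-induced J.T @ J)
A = rng.normal(size=(4, 4)); M = A.T @ A
x_ref = rng.normal(size=4)
r = 0.1                                       # fixed Euclidean radius (RMSE constraint)

grad_P = lambda x: 2 * M @ (x - x_ref)        # gradient of the perceptual distance

v0 = rng.normal(size=4)
x = x_ref + r * v0 / np.linalg.norm(v0)       # random initial point on the sphere
for _ in range(1000):
    gE = (x - x_ref) / np.linalg.norm(x - x_ref)   # gradient of the Euclidean distance
    gP = grad_P(x)
    step = gP - (gP @ gE) * gE                     # remove component parallel to grad D_E
    x = x + 0.05 * step
    x = x_ref + r * (x - x_ref) / np.linalg.norm(x - x_ref)  # project back onto sphere

# Ascent should align the distortion with the top eigenvector of M
w, V = np.linalg.eigh(M)
align = abs(V[:, -1] @ (x - x_ref) / r)
print(align)   # close to 1
```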
Now let us address the gradients. The Euclidean distance of a certain image with regard to the reference image is the norm of their difference; therefore, the gradient of the Euclidean distance is the transposed difference vector divided by that norm, i.e. the unit vector in the radial direction.
More interestingly (since this was not addressed in [20]), the gradient of the perceptual distance in the cascaded setting considered here, which is defined in the response domain, Eq 22, is given by Eq 45, which depends on the responses for the considered images, xn = S(x0), and on the Jacobian of the response with regard to the input.
Eq 45, together with the auxiliary results on ∇x0S (Eqs 17 and 18), implies that the application of MAD in the cascaded setting considered here reduces to the use of Result I, i.e. Eq 24 for the canonical nonlinearity, or the equivalent equations for the alternative nonlinearities considered (i.e. Eqs 26 and 27).
Analytic, but approximated, solution to MAD.
As stated in Section 2.1 when discussing the distance, Eq 23, in the local-linear approximation the general discrimination regions are approximated by ellipsoids. In the illustration of Fig 6, the (general) curved region in dark green in Fig 6(a) and 6(b) is approximated by the ellipsoid in Fig 6(c).
Under this approximation, the minimization/maximization of the perceptual distance on the Euclidean sphere has a clear analytic solution: the images with maximum and minimum perceptual distance are those in the directions of the eigenvectors with maximum and minimum eigenvalues of the metric matrix. These, again, depend on the Jacobian of the response with regard to the stimulus, and hence on Result I.
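A minimal sketch of this analytic solution, assuming a toy nonlinear response (a tanh stage standing in for the full cascade, with an illustrative mixing matrix W): compute the Jacobian at the reference image, form the metric J⊤J, and take its extreme eigenvectors as the most/least visible distortion directions:

```python
import numpy as np

def jacobian(S, x, eps=1e-6):
    """Numerical Jacobian of a vector map S at x (forward differences)."""
    y0 = S(x)
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        xp = x.copy(); xp[j] += eps
        J[:, j] = (S(xp) - y0) / eps
    return J

# Toy nonlinear response (hypothetical stand-in for the full L+NL cascade)
W = np.array([[1.0, 0.4, 0.1],
              [0.4, 1.0, 0.4],
              [0.1, 0.4, 1.0]])
S = lambda x: np.tanh(W @ x)

x_ref = np.array([0.2, -0.1, 0.3])
J = jacobian(S, x_ref)
M = J.T @ J                                  # 2nd-order perceptual metric
w, V = np.linalg.eigh(M)                     # eigenvalues in ascending order
v_min, v_max = V[:, 0], V[:, -1]             # least / most visible distortion directions

r = 1e-3                                     # small radius so the 2nd-order view holds
dist = lambda v: np.linalg.norm(S(x_ref + r * v) - S(x_ref))
print(dist(v_max) > dist(v_min))             # True: eigen-directions bracket the distance
```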
The view of the MAD problem in terms of a metric matrix is also useful when breaking large images into smaller patches for computational convenience. In these patch-wise scenarios the global metric matrix actually has block-diagonal structure (see the S6 File). Therefore, given the properties of block-diagonal matrices [82], the global eigenvectors (and hence the solution to MAD) actually reduce to the computation of the eigenvectors of the smaller metric matrices for each patch.
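This reduction can be verified directly: the spectrum of a block-diagonal matrix is the union of the spectra of its blocks, so the global eigen-analysis decouples into per-patch problems. A small sketch with two illustrative 3×3 blocks:

```python
import numpy as np

rng = np.random.default_rng(2)

# Per-patch metric blocks: under patch-wise independence, the global metric
# is block-diagonal (illustrative random symmetric positive blocks)
blocks = [A.T @ A for A in (rng.normal(size=(3, 3)), rng.normal(size=(3, 3)))]
M = np.zeros((6, 6))
M[:3, :3], M[3:, 3:] = blocks

# Global eigen-analysis vs per-block eigen-analysis
w_global = np.sort(np.linalg.eigvalsh(M))
w_blocks = np.sort(np.concatenate([np.linalg.eigvalsh(B) for B in blocks]))
print(np.allclose(w_global, w_blocks))   # True: global MAD reduces to per-patch problems
```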
Illustration of the general and the analytic solutions.
Here we take a reference image and launch a gradient descent/ascent search on the sphere of constant Root Mean Square Error (RMSE) to look for the best/worst version of this image.
For the same image we compute the Jacobian with regard to the stimulus and the eigenvectors with the largest and smallest eigenvalues, i.e. the directions that lead to the most/least visible distortions in the 2nd-order approximation of the distance (the approximate analytic MAD solution). For computational convenience we take a patch-wise approach considering distinct regions subtending 0.65 deg. This region-oriented approach certainly generates some artifacts at the block boundaries. However, the moderate visual impact of these edge effects suggests that for regions of this size (and above) it is fair to assume the block-wise independence of the distortion. See additional comments on this computationally convenient assumption in the S6 File.
Fig 7 shows the evolution of the general MAD distances and the solutions from the initial guess on the sphere (an image corrupted with white noise). The monotonic increase and decrease of the red and blue distance curves, and the progressive degradation or improvement of the images, indicate both (a) the correctness of Result I, and (b) the accuracy of the parameters of the model used in this illustration. Figs 8 and 9 show the results of the general MAD search and its analytic approximation respectively.
[Figure omitted. See PDF.]
Fig 7. Gradient-descent/ascent MAD.
Left panel: evolution of the perceptual distance maximized (red curve) or minimized (blue curve) from an initial white-noise-distorted image. Right panel (from left to right): evolution of the intermediate MAD images on the sphere of constant RMSE. The two rows at the top show the evolution of progressively worse images while maximizing the perceptual distance; the two rows at the bottom show the equivalent evolution of progressively better images while minimizing it. In each case (top and bottom) we show the image+distortions and the isolated distortions, and the first image on the left is the initial randomly selected image on the sphere of constant RMSE, which then gets progressively worse/better. The reference original image is shown in Fig 8.
https://doi.org/10.1371/journal.pone.0201326.g007
[Figure omitted. See PDF.]
Fig 8. General numeric solution.
Top: original image (picture of the corresponding author taken by Virginia Amblar). Central row: extreme images (MAD best and worst, left and right) with the same RMSE as the white-noise-corrupted image at the center. The central image is the initial random selection on the sphere of constant RMSE. Extreme images were computed using the gradient descent/ascent described in Eq 44 (i.e. they require Result I). The fact that these extreme images are visually better and worse than the central image indicates that the theory works. Bottom row: isolated distortions of the same energy, versus the initial white noise.
https://doi.org/10.1371/journal.pone.0201326.g008
[Figure omitted. See PDF.]
Fig 9. Analytic, but approximated, solution.
Top: original image. Central row: extreme images (MAD best and worst, left and right) with the same RMSE as the white-noise-corrupted image at the center. Extreme images are built from the eigenvectors of the metric matrix (hence requiring Result I). In this case no search is carried out: the image corrupted with white noise in the center of the 2nd row is there only for comparison purposes. The fact that the extreme images are visually better and worse than the central image indicates that the theory works. Bottom row: isolated distortions of the same energy, versus white noise.
https://doi.org/10.1371/journal.pone.0201326.g009
The main trend is this: the numerical procedure leads to noises of similar visual nature to those of the analytic procedure. This means that the iterative search is certainly attracted to the subspaces with low and high eigenvalues of the 2nd-order metric. More specifically, in both cases (a) the algorithms tend to allocate high-contrast low-frequency artifacts in low-contrast and low-luminance regions (e.g. the background) to increase the visibility of the noise, and (b) the algorithms tend to allocate high-frequency noise in high-contrast regions (e.g. the face) to minimize its visibility. These distortions are completely consistent with the trends identified above in the analytical sensitivity and efficiency of the system, Eqs 42 and 43 and Figs 4 and 5: note the focus on the low-contrast region and the role played by the CSF-like gain.
There are differences between the general and the analytic solutions. In this example the visual difference within the pair given by the analytic (approximate) solution seems bigger than that within the pair given by the general (numeric) solution. In principle, the numeric solution follows more closely the actual geometry induced by perception (an amorphous discrimination region versus an approximating ellipsoid). However, the finite length of the gradient-descent search and eventual trapping in local minima may prevent the practical use of the general technique (not to mention the substantially higher computational cost of the search).
Nevertheless, the qualitative similarities between the solutions (the nature of the distortions and their spatial location) are more relevant than the small quantitative differences. In the model considered throughout this Discussion (Fig 1), the 2nd and 3rd layers (contrast and energy masking) were determined using analytic MAD as in [21]. The experimental determination consisted of deciding between distorted pairs corresponding to eigenvectors coming from models with different parameters. The final values found are those given in the associated code (see the S1 and S8 Files). Interestingly, the MAD images in this paper (Figs 7–9, and Fig S8.3 in the demo of the Toolbox in the S8 File) were computed using two extra layers in the model (the 1st and 4th, accounting for brightness and wavelet masking respectively). The important point is that both the numeric and the analytic MAD images give rise to distinct pairs by putting wavelet-like localized distortions of the right frequency in the regions with the right contrast or luminance, as expected from the sensitivity and efficiency discussed above.
Recently proposed visualization techniques to assess the biological plausibility of deep-network architectures [14] reduce to our analytic-MAD result originally proposed in [21]. The relevance of Result I is that it makes explicit the Jacobian expressions that are hidden in the automatic differentiation techniques used in [14]. With the expressions proposed here one may anticipate which kinds of patterns, and where they should be located, lead to highly (or hardly) visible distortions.
MAD may suffer from two different problems: (a) difficulty in identifying the best/worst distortions for a given image (due to local minima in the distance or noise in the eigenvectors), and (b) substantial change of the directions of the distortions across the image space. In our experience we found that, given a reference image, the directions are fairly independent of the initial guess and consistent with the eigenvectors. Therefore, the second problem is more severe than the first, because if the dependence is strong, multiple experiments would be needed to decide between models [23]. Note that even in this undesirable situation the analytical Jacobian is also useful, to assess how poor MAD can be: this assessment could be done by measuring the variability of the directions defined by the Jacobian.
3.3 Jacobian with regard to the parameters in model optimization
The standard methodology to set the free parameters of a model is to look for the values that best reproduce experimental results (either direct physiological recordings or indirect psychophysical results). Sometimes brute-force exhaustive search (as done in [11, 24–26]) is good enough given the low dimensionality of the parameter space. However, when considering thousands of parameters (as may happen in the considered model), brute-force approaches are definitely infeasible. In this high-dimensional scenario the Jacobian with regard to the model parameters (i.e. Result II) may be very convenient in the search for the optimal solution, for instance using gradient descent.
Interestingly, model-fitting procedures based on alternative goals (for instance optimal encoding/decoding performance, as in [27]) also depend on gradient descent and on this Jacobian w.r.t. the parameters. Unfortunately, the use of this Jacobian in similar biological models for optimal encoding/decoding (in [27]) or to reproduce psychophysical data (in [14]) was hidden behind automatic differentiation. On the contrary, here we give the explicit equations (Result II) and show their practical performance and correctness in analyzing psychophysical data.
In this section we discuss how to use the presented Result II (the generic Eqs 19–21 and the specific equations for the Divisive Normalization, Eqs 29–36) to obtain the model parameters from classical subjective image quality ratings.
Reproducing direct input-output data.
In a controlled input-output situation (as in [25]), it is usual to have a set of experimental physiological responses for a given set of known inputs, and the goal is to find the model that behaves like the recorded data.
A popular cost function depending on the parameters is the quadratic norm of the deviation between the theoretical and the experimental responses. Minimization of this deviation requires the derivative of the cost with regard to the parameters, Eq 46, which of course depends on the Jacobian w.r.t. the parameters (and hence on Result II).
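The structure of Eq 46 can be illustrated with a one-parameter toy response (a tanh nonlinearity with illustrative "recorded" responses, not the paper's model): the analytic gradient of the quadratic cost, built from the derivative of the response w.r.t. the parameter, matches a finite-difference estimate:

```python
import numpy as np

# Toy parametric response S(x; theta) = tanh(theta * x) and quadratic data-fit cost
x = np.array([0.3, -0.8, 1.2])
y_exp = np.array([0.2, -0.5, 0.9])          # hypothetical "recorded" responses

S      = lambda th: np.tanh(th * x)
dS_dth = lambda th: x * (1 - np.tanh(th * x)**2)   # Jacobian w.r.t. the parameter

cost  = lambda th: np.sum((S(th) - y_exp)**2)
dcost = lambda th: 2 * (S(th) - y_exp) @ dS_dth(th)   # Eq 46 structure: residual x Jacobian

th = 0.7
fd = (cost(th + 1e-6) - cost(th - 1e-6)) / 2e-6       # central finite difference
print(dcost(th), fd)   # analytic and numerical gradients agree
```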
The analytical inverse, Result III, has an interesting consequence for the determination of Θ in a controlled input-output situation. In general estimation problems the solution is necessarily more accurate if one combines multiple constraints to restrict the range of possible outcomes. In our particular case, an alternative constraint to the one considered above is the minimization of the distance between the actual input and the theoretical input that would be obtained by applying the inverse to the actual output. Assuming that the decoding transform is the inverse of the encoding transform, this implies minimizing the norm of the difference between the actual input and the decoded output. This extra constraint would lead to a gradient similar to Eq 46, but involving the inverse function, that is, using Result III and the Jacobian of the inverse w.r.t. the parameters. Interestingly, it holds that ∇ΘS−1 = −(∇x0S)−1 ⋅ ∇ΘS. Therefore, this additional constraint reduces to the combined use of Results I, II, and III.
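The identity ∇ΘS−1 = −(∇x0S)−1 ⋅ ∇ΘS follows from differentiating the relation S(S−1(y; Θ); Θ) = y w.r.t. Θ, and it is easy to verify on a scalar toy map with a known inverse (here tanh, chosen only for illustration):

```python
import numpy as np

# S(x; th) = tanh(th * x); its inverse is S^{-1}(y; th) = arctanh(y) / th
th, x = 0.8, 0.6
y = np.tanh(th * x)

dSinv_dth_direct = -np.arctanh(y) / th**2      # differentiate the inverse directly
dS_dx  = th * (1 - y**2)                       # gradient of S w.r.t. the stimulus
dS_dth = x * (1 - y**2)                        # gradient of S w.r.t. the parameter
dSinv_dth_implicit = -dS_dth / dS_dx           # -(grad_x S)^{-1} * grad_th S

print(dSinv_dth_direct, dSinv_dth_implicit)    # the identity holds
```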
Reproducing indirect data.
By indirect data we refer to behavior that is mediated by the responses of the underlying L+NL mechanisms but differs from the actual responses themselves. This is the conventional situation in psychophysics. An illustrative example is the subjective assessment of perceived differences in image quality databases.
In this image quality situation, instead of having a set of physiological responses for a given input, we have the Mean Opinion Score (MOS) of a set of distorted images (the usual ground truth in the image quality literature [97]), and we want to adjust our model to reproduce this opinion.
In this case, the goal function is the correlation between the experimental subjective distance and the perceptual distance computed using the model explained above. More specifically, consider a set of N corrupted images, with c = 1…N. For this set, we assume we know the N mean opinion scores, M = (m[1], …, m[N])⊤, and we can compute the N perceptual distortions, D, using the model (e.g. using Eq 22).
Therefore, the optimal parameters will be those maximizing the alignment between the ground truth, M, and the model predictions, D. Using the Pearson correlation, ϱ, as the alignment measure, we have Eq 47, where the subindex s stands for subtraction of the mean of the vectors.
The maximization of the correlation, ϱ, requires its derivative with regard to the parameters of the model. Interestingly (see the S7 File), it turns out that the derivative of this goal function, Eq 48, also depends on ∇ΘS (i.e. it depends on Result II).
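A sketch of the structure of Eq 48: the gradient of the Pearson correlation w.r.t. the parameters is the gradient w.r.t. the distances chained with the Jacobian of the distances w.r.t. the parameters. Below, a hypothetical linear-in-parameters distance model stands in for the full perceptual distance; the chained gradient matches finite differences:

```python
import numpy as np

rng = np.random.default_rng(3)
mos = rng.normal(size=20)                 # mean opinion scores (fixed ground truth)
feat = rng.normal(size=(20, 2))           # per-image features feeding a toy distance model

D = lambda th: feat @ th                  # toy model distances, linear in the parameters
dD_dth = feat                             # Jacobian of the distances w.r.t. parameters

def pearson_and_grad(th):
    d = D(th) - D(th).mean()              # mean-subtracted distances (subindex s)
    m = mos - mos.mean()                  # mean-subtracted opinions
    nd, nm = np.linalg.norm(d), np.linalg.norm(m)
    rho = (m @ d) / (nm * nd)
    drho_dD = m / (nm * nd) - rho * d / nd**2   # gradient w.r.t. the distances
    return rho, drho_dD @ dD_dth                # chain rule: needs dD/dtheta (Result II)

th = np.array([0.5, -0.2])
rho, g = pearson_and_grad(th)
eps = 1e-6
fd = np.array([(pearson_and_grad(th + eps * np.eye(2)[i])[0] - rho) / eps
               for i in range(2)])
print(g, fd)   # analytic and finite-difference gradients agree
```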
Cascaded L+NL model in image quality.
The reproduction of image quality ratings is a good way to check the performance of vision models under a variety of observation conditions (a variety of natural backgrounds and a variety of suprathreshold tests).
The image quality results discussed in this section illustrate three interesting issues:
* They provide complementary evidence of the quality of the modular model (different from the MAD results of the previous section).
* They point out the benefits of the modular nature of the model, since the inclusion of extra layers leads to consistent improvements in performance (either by using canonical L+NL layers or by using alternative formulations such as the two-gamma tone-mapping model or the Wilson-Cowan nonlinearity after a linear wavelet stage).
* They reveal the relevance of Result II in finding the model parameters in high-dimensional scenarios and the correctness of the presented expressions.
Fig 10 shows the performance of the kind of cascaded L+NL models we are considering here in the reproduction of mean opinion scores. We include two reference models (in red) for convenient comparison. The first reference is just the Euclidean distance between inputs (RMSE). The second reference is the most popular perceptual quality predictor in the image processing community, the Structural SIMilarity index (SSIM) [99]. Our baseline model is the 2-stage L+NL model whose parameters were tuned using MAximum Differentiation [21]. The results reported here are better than those reported in [21] (using the same parameters), probably for two reasons: (1) here we are using bigger patches, and hence the patch-independence assumption holds better, and (2) we are now applying luminance calibration to the digital values of the TID database. This baseline model corresponds to the 2nd and 3rd canonical stages of the global model we are considering throughout the discussion section (see model details in the S1 File).
[Figure omitted. See PDF.]
Fig 10. Prediction of subjective distortion.
Correlation between the predicted subjective distortion (abscissa) and the actual opinion (ordinate) is a good performance measure for models under a variety of observation conditions. These results show the Pearson, Spearman, and Kendall correlations of different models in reproducing the subjective opinion in the TID database [98]. Scatter plots in red correspond to two simpler models for convenient reference: the Euclidean distance, RMSE, and a widely used model of perceptual distortion, SSIM [99]. Scatter plots in blue correspond to progressive improvements of the baseline 2-layer (L+NL + L+NL) model at the left. Labels on the abscissa indicate (a) the perceptual phenomena taken into account by the layers of the models, and (b) the structure of the layers and how they were estimated. MAD-DN stands for a Divisive Normalization layer with parameters estimated using MAD experiments. Brute-force TM and Brute-force WC stand for Tone-Mapping and Wilson-Cowan layers estimated through the maximization of the Pearson correlation using exhaustive search on a grid. Finally, opt-DN stands for Divisive Normalization layers estimated through gradient optimization of the Pearson correlation (using Result II).
https://doi.org/10.1371/journal.pone.0201326.g010
The substantial jumps in correlation from the RMSE result indicate the well-known limitation of the naive Euclidean distance [30], but also the potential of MAD to set the parameters of this 2-stage model [21]. Note that, assuming the Contrast Sensitivity Function (CSF) of the Standard Spatial Observer [24], this 2-stage model only has 5 free parameters (something affordable using MAD): the widths of the kernels and the semisaturation for the contrast computation, and the width of the kernel, the semisaturation, and the excitation exponent in the masking nonlinearity.
The modularity and interpretability of the model are convenient because they allow straightforward improvements of the baseline: one can simply introduce extra layers according to the program suggested in [1]. Accordingly, we included a brightness perception layer before the contrast computation, and a wavelet interaction model after the CSF+spatial-masking layer. To stress the generality of the proposed modular approach, our first brightness model was the two-gamma tone-mapping operator cited in Section 2.1, and our first wavelet masking scheme was the Wilson-Cowan model cited in Section 2.1 applied to each subband of a steerable pyramid. Following [26], the 5 extra parameters of these extra layers were obtained through brute-force search using 50% of the database. Exhaustive search in 5 dimensions is hard but still feasible. The resulting model not only improves the Pearson correlation in image quality (as expected by construction), but also behaves sensibly in reproducing contrast masking [26].
Finally, we explicitly explored the maximization of the correlation using different versions of the brightness and wavelet+masking stages. In this final case we used canonical Divisive Normalization layers. Note that the joint optimization of the 1st and 4th layers is an interesting way to check Results I and II at once. First, the relation to the Jacobian w.r.t. the parameters in Result II is obvious from Eq 48. But, more interestingly, note that the chain rule, Eq 20, implies that distortions due to variations in the parameters propagate throughout the network. Then the Jacobian w.r.t. the stimulus (i.e. Result I) of all the layers following the one under optimization is also required in the joint optimization of the 1st and 4th layers.
As a result of the chain rule, this optimization has great experimental value but also a great computational cost. That is why we split the optimization of this illustration into two separate phases.
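To make the role of the chain rule tangible, the following minimal numerical sketch (a hypothetical two-layer toy cascade, not the model of the paper) builds the gradient of the final response w.r.t. the parameters of the first layer by combining the local parameter Jacobian (the role of Result II) with the stimulus Jacobian of the subsequent layer (the role of Result I), and verifies the product against finite differences:

```python
import numpy as np

def layer1(x0, W1):
    # toy 1st stage under optimization: x1 = tanh(W1 x0), parameters W1
    return np.tanh(W1 @ x0)

def layer2(x1, b=0.5):
    # fixed later stage: elementwise divisive-like nonlinearity
    return x1 / (b + np.abs(x1))

def jac_layer2(x1, b=0.5):
    # "Result I" of the later layer: Jacobian w.r.t. its input (diagonal)
    return np.diag(b / (b + np.abs(x1)) ** 2)

def grad_wrt_W1(x0, W1, b=0.5):
    # Chain rule: d x2 / d W1_ij = (nabla_x1 x2) . (d x1 / d W1_ij)
    x1 = layer1(x0, W1)
    J2 = jac_layer2(x1, b)
    d_tanh = 1.0 - x1 ** 2            # tanh'(W1 x0)
    n, m = W1.shape
    grad = np.zeros((n, n, m))        # grad[k, i, j] = d x2_k / d W1_ij
    for i in range(n):
        grad[:, i, :] = np.outer(J2[:, i] * d_tanh[i], x0)
    return grad

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))
x0 = rng.normal(size=4)
g = grad_wrt_W1(x0, W1)

# finite-difference check of the entry W1[1, 2]
eps = 1e-6
W1p = W1.copy(); W1p[1, 2] += eps
fd = (layer2(layer1(x0, W1p)) - layer2(layer1(x0, W1))) / eps
assert np.allclose(g[:, 1, 2], fd, atol=1e-4)
```

The same product structure holds for deeper cascades: the Jacobian w.r.t. the stimulus of every layer after the one under optimization multiplies the local parameter Jacobian.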
In phase one we addressed the (highly illustrative but extremely demanding) joint optimization of the 1st and 4th stages. In this phase we used a reduced training set to limit the computational burden, and structured versions of the parameters, as discussed after Eq 36, to obtain a relatively low-dimensional problem (but still one not affordable through brute force). Then, in phase two, we took the results of phase one (which are only a first approximation to the right solution because of the small size of the training set) and focused on the optimization of a single parameter that is fast to compute but extremely high-dimensional, to point out even more clearly the necessity of Result II. Positive results of these two learning phases are illustrated in Fig 11.
[Figure omitted. See PDF.]
Fig 11. Maximization of correlation with subjective opinion (phase one and phase two).
Phase one (left) involved the joint optimization of the following parameters: β, γ of the 1st stage and b, σ, γ, c of the 4th stage. In this case there is one parameter per orientation (4 orientations) and per scale (4 scales), plus both residuals. Note that despite the relatively small number of parameters, the dimensionality is still too large for a brute-force approach. Given the computational cost of phase one, we used stochastic gradient ascent on an extremely small training set. The results shown in the curve correspond to the randomly varying training set. In phase one the correlation for the whole database only reaches 0.84 because more iterations would be required. Phase two (right) only involved the optimization of b4. This is fast enough to use deterministic gradient ascent trained with half the database. However, note that if no structure is imposed on b4, it has thousands of elements, so brute force is not possible. In this case, the correlation results shown correspond to the whole database (indicating proper generalization). The parameters leading to the maximum correlation in the test set (peak of the red curve) are those used for the scatter plot of Fig 10.
https://doi.org/10.1371/journal.pone.0201326.g011
The computational cost of the joint optimization (phase one) implies that only a few training points can be used in the gradient ascent. Therefore, in order to generalize, the training set has to be stochastically updated. We show the result of such stochastic maximization of the correlation using only 48 samples of the TID database at a time. The gradient-ascent search in phase one (blue curve on the left) certainly increases the correlation for the considered small training set. Note that the oscillations come from the random modification of the training set at each iteration. The consistent increase of correlation in the stochastic phase one points out the correctness of Results I and II. However, over the explored iterations of phase one, the correlation on the whole dataset only increased up to 0.84. This generalization problem means that the training set is too small to avoid overfitting or, equivalently, that extra iterations would be necessary for this small set to visit the whole variability of the dataset.
Once Result II (and also Result I) had been checked in the most demanding situation (the joint optimization of two layers in phase one), we switched to phase two, in which only the semisaturation of the 4th stage (the vector b4) was optimized. Note that in this restricted case we no longer need Result I for the intermediate stages. In fact, the computation of the Jacobian w.r.t. this single parameter is so fast that we allowed the search over the full dimensionality of the vector, using deterministic gradient ascent on a substantial part of the available database. In particular, in phase two we trained with 800 randomly chosen points of the database (about 50%), as opposed to the reduced number of random regions taken from 48 points in phase one. In this high-dimensional case (note the huge dimension of b4) brute force is certainly not possible, and hence gradient ascent (i.e., Result II) is the most sensible approach. In phase two, the correlation on the whole database (in red) consistently increases at the beginning of the search, indicating both the correctness of Result II (for this parameter) and the representativeness of the training set. Finally, as expected in any learning problem with a limited training set, overfitting eventually occurs and the correlation for the test set starts to oscillate. The values found at the (trustable) highlighted point are those used in the final scatter plot of Fig 10.
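The gradient-ascent maximization of the correlation can be sketched as follows. This is a toy stand-in, not the actual model: the distortion prediction is assumed linear in the optimized parameter, with `Phi` playing the role of the Jacobian of the predictions w.r.t. that parameter (the role of Result II), and the opinion scores are synthetic.

```python
import numpy as np

def pearson(d, s):
    # Pearson correlation between predicted distortions d and opinions s
    dc, sc = d - d.mean(), s - s.mean()
    return dc @ sc / (np.linalg.norm(dc) * np.linalg.norm(sc))

def grad_pearson_wrt_b(Phi, b, s):
    # Analytic gradient of the correlation w.r.t. the parameter vector b,
    # chained through the (assumed linear) prediction d = Phi b.
    d = Phi @ b
    dc, sc = d - d.mean(), s - s.mean()
    nd, ns = np.linalg.norm(dc), np.linalg.norm(sc)
    rho = dc @ sc / (nd * ns)
    g_d = sc / (nd * ns) - rho * dc / nd**2      # d rho / d d
    return Phi.T @ g_d

rng = np.random.default_rng(1)
Phi = rng.normal(size=(200, 50))                 # 200 stimuli, 50-dim parameter
b_true = rng.normal(size=50)
s = Phi @ b_true + 0.3 * rng.normal(size=200)    # mock subjective scores

b = rng.normal(size=50)                          # random initialization
rho_init = pearson(Phi @ b, s)
for _ in range(500):                             # deterministic gradient ascent
    g = grad_pearson_wrt_b(Phi, b, s)
    b += 0.1 * g / np.linalg.norm(g)
    b /= np.linalg.norm(b)                       # rho is scale-invariant in b

assert pearson(Phi @ b, s) > rho_init
```

In the actual phase two, `Phi` changes with `b` (the response is nonlinear in the semisaturation), so the Jacobian must be re-evaluated at each step; the ascent structure is the same.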
Only the brute-force result and the result of the phase-two optimization are compared in Fig 10. Both methods used the same amount of training samples, but note that the complexity of the model is much larger in the wavelet-DN case (where brute force is definitely not possible). Figs 10 and 11 (right) report the results for the whole database: of course, as in any learning problem, correlation values in the separate train and test sets are slightly higher and lower, respectively.
3.4 Analytic inverse in visual brain decoding
Visual brain decoding refers to the reconstruction of the input stimulus from physiological recordings of the neural activity (for instance fMRI) [39]. Conventional decoding techniques are based on learning the response-stimulus relation through examples. The first approaches to decoding used plain linear regression [46], but the current practice is to use nonlinear regression, for instance based on kernel methods (as in [41, 100]). However, given that models of the BOLD signal also have this cascaded L+NL structure [25], the analytic inverse of the transform proposed here (Result III) may have an obvious application in decoding the input from the recorded output.
In order to illustrate the potential benefits of the analytic inverse in the visual decoding problem, in this discussion we consider a simulation in which conventional blind machine-learning techniques (linear regression and nonlinear kernel-ridge regression, as in [41]) are compared to the analytic inversion. Here we simulate the recorded neural signal by applying the forward model to noisy inputs and distorting the output. This controlled scenario allows us to generate as many corresponding input-output pairs as necessary to train the machine-learning techniques, as done in the experimental acquisition phase in the brain decoding literature.
In fact, regression techniques depend both on the nature of the input-output pairs and on the nature of the distortions. In our simulation we controlled both:
* We controlled the nature of the signals by augmenting a calibrated set of natural images (Van Hateren database [101]) using controlled modifications of the illumination conditions. Specifically, we linearly modified the images to have different average luminance and contrast. We considered 7×9 combinations of luminance and contrast covering the range available in a conventional computer display (see Fig 12).
* Distortion in the signals comes from random variations in the input (e.g. photon noise at the retina), random variations of the output (e.g. noise in the cortical response), and distortions due to the measurement (e.g. blurring and noise in the BOLD signal). There is a debate on the psychophysical relevance of the noise at the input versus the neural noise [17, 102, 103] that we do not address here. For the sake of the illustration, we controlled these distortions by using uncorrelated noise at the input and blur+noise in the acquisition of the neural signal given by the model. The outputs of the model were blurred using a Gaussian kernel with a width of 0.05 degrees (in visual angle units, in the spatial domain corresponding to each subband). We considered two distortion regimes: low distortion and high distortion. The low-distortion regime involved Gaussian noise at the input with σ = 3 cd/m2 and Poisson noise at the responses with Fano factor 0.02. The high-distortion regime involved the same sources of noise with input deviation σ = 30 cd/m2 and internal Fano factor 0.05.
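The simulated acquisition can be sketched as follows. The forward model here is a toy stand-in (`model` is just an illustrative contrast nonlinearity, not the paper's cascade), the blur width in pixels is an arbitrary choice, and the Poisson-like neural noise is approximated as Gaussian with variance equal to the Fano factor times the mean response:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def model(image):
    # toy forward transform standing in for the L+NL cascade
    return np.tanh((image - image.mean()) / (image.mean() + 1e-6))

def simulate_recording(image, sigma_in, fano, blur_px, rng):
    noisy_in = image + rng.normal(0.0, sigma_in, image.shape)  # input noise (cd/m2)
    r = model(noisy_in)                                        # forward model
    r = gaussian_filter(r, blur_px)                            # measurement blur
    # Poisson-like neural noise: variance = Fano factor x |mean response|
    r += rng.normal(0.0, np.sqrt(fano * np.abs(r)), r.shape)
    return r

rng = np.random.default_rng(2)
img = 80.0 + 40.0 * rng.random((64, 64))  # luminances in cd/m2
low = simulate_recording(img, sigma_in=3.0, fano=0.02, blur_px=2.0, rng=rng)
high = simulate_recording(img, sigma_in=30.0, fano=0.05, blur_px=2.0, rng=rng)

clean = gaussian_filter(model(img), 2.0)
assert np.abs(high - clean).mean() > np.abs(low - clean).mean()
```

Each call produces one simulated input-output pair, so arbitrarily many training examples can be generated for the learned decoders.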
[Figure omitted. See PDF.]
Fig 12. Range of illumination conditions.
Average luminance increases from top to bottom and contrast increases from left to right. Luminance is in the range [0, 160] cd/m2. Average luminances are in the range [25, 80] cd/m2, and average contrasts are in the range [0.1, 0.9]. The learning-based decoders were trained with natural images from the central luminance/contrast condition, under two levels of distortion. The undistorted image comes from the publicly available Van Hateren database [101] and is reproduced here with permission from Hans Van Hateren.
https://doi.org/10.1371/journal.pone.0201326.g012
We trained the machine-learning algorithms with images from the central luminance/contrast condition and responses under the low- and high-distortion regimes, considering 5000 input-output examples in the training. We tested on an image not included in the training set. In the test we considered the different illumination conditions (the one used in the training and the other conditions in Fig 12), and we applied the two distortion regimes.
In each test example, where the illumination may or may not correspond to the training condition, we decoded the response with 5 decoders: (1) a linear decoder trained for the considered distortion, (2) a linear decoder trained with noise of a different nature, (3) a nonlinear decoder trained for the considered distortion, (4) a nonlinear decoder trained with noise of a different nature, and (5) the analytical decoder.
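The comparison can be sketched in a toy setting where the forward model is exactly invertible. All dimensions, noise levels, and the model itself are illustrative assumptions, and for brevity only a learned linear decoder is compared against the analytic inverse (the kernel-ridge decoder suffers the same kind of generalization problem):

```python
import numpy as np

# Toy forward model r = g(W x) with g(u) = u / (1 + |u|), invertible as
# g^{-1}(v) = v / (1 - |v|).  The analytic decoder is x = W^{-1} g^{-1}(r);
# the learned decoder is a least-squares linear map fitted to pairs from
# one "illumination condition" only.
rng = np.random.default_rng(3)
n = 20
W = np.eye(n) + 0.2 * rng.normal(size=(n, n)) / np.sqrt(n)

g = lambda u: u / (1.0 + np.abs(u))
g_inv = lambda v: v / (1.0 - np.abs(v))

def forward(x, noise=0.005):
    return g(W @ x) + noise * rng.normal(size=x.shape)

def analytic_decode(r):
    return np.linalg.solve(W, g_inv(np.clip(r, -0.99, 0.99)))

# Training condition: stimuli around mean 0.5 (one illumination).
X_tr = 0.5 + 0.2 * rng.normal(size=(n, 5000))
R_aug = np.vstack([forward(X_tr), np.ones((1, 5000))])  # responses + bias
D, *_ = np.linalg.lstsq(R_aug.T, X_tr.T, rcond=None)    # learned decoder

# Test condition: a different mean luminance, outside the training range.
x_test = 2.0 + 0.2 * rng.normal(size=n)
r_test = forward(x_test)

x_learned = D.T @ np.append(r_test, 1.0)
x_analytic = analytic_decode(r_test)

mae_learned = np.abs(x_learned - x_test).mean()
mae_analytic = np.abs(x_analytic - x_test).mean()
assert mae_analytic < mae_learned
```

The learned decoder is accurate in the condition it was trained for but degrades away from it, while the analytic inverse is condition-independent by construction.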
Distortion in the decoded signals is shown in Fig 13. Here we use the Mean Absolute Error in the input domain (in cd/m2 units) as the distortion measure because it has a direct physical interpretation (subjective accuracy will be apparent in the visual examples below). Results show that the error of the analytic decoder is lower and substantially more independent of the illumination conditions than the error of the machine-learning models, which depend on the training. The error surfaces of the data-dependent decoders are curved because they are trained for the central condition of the range, and therefore they have generalization problems in other regions. For bigger distortions all methods perform worse (the analytic decoding is affected too), but note that in this case appropriate training becomes more critical: using decoders trained in other distortion conditions increases the error (see how the green surface rises in the right plot).
[Figure omitted. See PDF.]
Fig 13. Decoding error.
The left panels show the Mean Absolute Error (MAE) in the low-distortion regime, and the right panels show the MAE in the high-distortion regime. Image luminance is in the range [0, 160] cd/m2. In each case the red, green, and blue surfaces represent the error of the linear regression, nonlinear regression, and analytic decoders, respectively. In each distortion regime, the left panel shows the results of the decoders trained for that regime, while the right panel represents the case where the signal is reconstructed by decoders trained in a different distortion regime. The highlighted points correspond to the error of the decoded images shown in Figs 14 and 15. Note that the points at the center correspond to the optimum for the learning-based decoders: the training illumination and distortion. In the axis labels, L means Luminance and C means Contrast.
https://doi.org/10.1371/journal.pone.0201326.g013
Beyond the Mean Absolute Error or alternative arbitrary measures of reconstruction accuracy (all of them perceptually arguable), it is worth taking an explicit look at the reconstructed images. Representative examples of the decoded signals are shown in Figs 14 and 15.
[Figure omitted. See PDF.]
Fig 14. Reconstructions in the low-distortion regime.
These six reconstructions correspond to the highlighted dots in the surfaces of the left plot of Fig 13. The analytic decoding clearly outperforms the learning algorithms even when the image has the illumination conditions used in the training (medium luminance/contrast) and the decoders are those trained for the considered distortion.
https://doi.org/10.1371/journal.pone.0201326.g014
[Figure omitted. See PDF.]
Fig 15. Reconstructions in the high-distortion regime.
These six reconstructions correspond to the highlighted dots in the surfaces of the right plot of Fig 13. In this example, with decoders properly trained in the high-distortion regime, the analytic decoding does not achieve the best MAE values (MAE is not a visually meaningful measure anyway), but it is certainly better at preserving the visual structures in the scene.
https://doi.org/10.1371/journal.pone.0201326.g015
Note that the set of visual examples includes the best-case scenario for the learning-based decoders: training in the same illumination conditions and with distortion of the same nature (top row of Figs 14 and 15). Even in this best-case scenario, the analytic decoder better reproduces the visual structures, even in the high-distortion condition. These visual examples illustrate the advantage of the analytic decoding (its generalization ability) with regard to conventional linear and nonlinear regression.
4 Concluding remarks
This paper addressed relevant mathematical details of biologically plausible feed-forward cascades of linear+nonlinear neural models. These details, namely the Jacobians of the transform (w.r.t. the stimulus and w.r.t. the parameters) and the decoding transform, are usually disregarded in the conventional experimental literature (e.g. [1] and references therein), which is focused on obtaining the encoding transform in specific experimental settings.
The analytical results presented here show that for the considered L+NL cascades the Jacobian with regard to the stimulus, ∇x0S, the Jacobian with regard to the parameters, ∇ΘS, and the inverse, S−1, reduce to knowing the corresponding Jacobians and inverse of the nonlinear part of each layer of the cascade. These necessary elements are given explicitly here for the case of the canonical Divisive Normalization. Equivalent results for alternative nonlinearities such as the Wilson-Cowan model [47, 48] and models of brightness perception [49] are also given.
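The layer-by-layer structure of the decoding transform can be sketched generically: if the cascade is a composition of layers, its inverse applies the layer inverses in reverse order. The toy layers below are illustrative invertible maps, not the model's actual stages:

```python
import numpy as np

# Each entry pairs a toy invertible layer with its analytic inverse.
layers = [
    (lambda x: 2.0 * x + 1.0, lambda y: (y - 1.0) / 2.0),                    # linear
    (lambda x: np.sign(x) * x**2, lambda y: np.sign(y) * np.sqrt(np.abs(y))),
    (lambda x: x / (1.0 + np.abs(x)), lambda y: y / (1.0 - np.abs(y))),      # divisive-like
]

def forward(x):
    # S: apply each layer in order
    for f, _ in layers:
        x = f(x)
    return x

def inverse(y):
    # S^{-1}: invert each layer, in reverse order
    for _, f_inv in reversed(layers):
        y = f_inv(y)
    return y

x = np.array([0.3, -1.2, 2.0])
assert np.allclose(inverse(forward(x)), x)
```

The same composition argument is what reduces the inverse of the full cascade to the inverse of each nonlinearity (plus the inverse of each linear stage).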
The fundamental reason for this analytical treatment is the analytical insight into the physiology, the psychophysics, and the function of the visual system. We explicitly saw that the context-dependent changes in the receptive fields, the impact on the response of uncertainty in the filters (or synaptic weights), the trends of the sensitivity and JNDs, and the efficiency of the system in multi-information terms can all be identified in the analytical expressions of the Jacobians ∇x0S and ∇ΘS.
It is true that the artificial neural networks literature addresses similar cascaded architectures [54], and this community has recently been attracted by the applications of their derivatives and inverses (e.g. in image synthesis [96], or in new visualization methods to assess deep networks using derivatives [14] and the inverse [104]). However, this literature does not address all the analytic results reported here for biologically plausible nonlinearities such as the Divisive Normalization or the Wilson-Cowan model. One reason is that the most popular artificial networks use simplified nonlinearities, and biological plausibility is not one of the goals when training artificial models [54]. More importantly, given the growing popularity of automatic differentiation methods [55], derivatives may be used extensively, but explicit expressions are not given.
It is important to stress that, even when training biologically plausible models, the use of automatic (implicit) differentiation certainly leads to successful working models, but it makes the models more difficult to understand: one must check their behavior empirically because it is not summarized in equations. On the contrary, as shown in the examples presented in the first part of the discussion, the explicit expressions allow one to identify the basic properties of the system. In this way one may anticipate the kind of results that will emerge from the model (e.g. the nature and location of the distortions computed in MAD) and their relation to the sensitivity or the efficiency of the system, because this can be seen in the equations.
Additional examples of the advantages of the analytical treatment include the following. On the one hand, the analytical expressions may suggest simplifications that enable links between models of a different nature. For instance, the truncation of the expansion of the analytical inverse of the Divisive Normalization is convenient to establish the equivalence between the Divisive Normalization and the Wilson-Cowan model [105]. On the other hand, the analytical expressions allow sensible comparisons between normal and anomalous versions of a model. For instance, the corresponding-pair procedure to simulate the perception of anomalous observers [106] could be used in more general models if the analytical inverse is available and depends on the parameter that describes the anomaly.
From the experimental and applied perspective, we discussed how ∇x0S can be used in the design of stimuli for novel psychophysics (MAD); how ∇ΘS can be used to obtain the parameters of the model using classical psychophysics in image quality ratings; and how visual representation decoding may benefit from the use of S−1. These illustrations are a practical demonstration of the correctness of the presented expressions and suggest that (i) the proposed modular model can easily be extended with extra layers that can be fitted without relying on brute-force techniques (hence improving the results in [11, 24, 26]), and (ii) the analytic inverse seems an interesting alternative to the blind regression techniques [41, 46] previously used in visual decoding.
Finally, it is important to acknowledge that relevant aspects of the sensory system were not explicitly considered in the examples shown here, such as its dynamics or the possible presence of feedback.
Regarding time-varying stimuli, the L+NL cascaded architecture considered here has also been used successfully to model motion-sensitive areas such as V1 and MT [3]. Therefore, the spatio-temporal extension of the presented formulation is completely straightforward. Focusing on the stationary solution of the system, as done in our consideration of the Wilson-Cowan model, implies ignoring transients. Nevertheless, networks with divisive feedback lead to regular divisive-normalization-like steady states in which the semisaturation depends on the signal [107]. Similarly, when divisive normalization is considered to be equivalent to the steady state of the Wilson-Cowan model, signal dependence appears in the kernel H [105]. Therefore, some of the parameters that we assumed to be constant should actually vary with the environment with specific time constants.
In relation to the limitations due to ignoring feedback in the considered models, we would like to stress that we are not advocating feedforward L+NL cascades as the perfect approach in all situations, but rather introducing the mathematics needed to make better use of a very popular framework, which will in turn help advance our understanding of how biological vision works.
In the same vein, a number of alternative models also assume multiple L+NL stages and adaptive nonlinearities [14, 50–53, 57], and they are all successfully fitted to data via gradient-based methods. While some of them provide a detailed account of how the parameters are fitted [50–52], others rely on automatic differentiation [14, 53, 57]. An interesting example related to the approach proposed here is the model considered in [108], which explicitly provides not only the Jacobian w.r.t. the parameters to fit the model, but also the Jacobian w.r.t. the input to synthesize MAD stimuli. Unfortunately, as in the other cases, it does not address the inverse.
In summary, given the insight that can be obtained from the explicit expressions, future modeling efforts should not be restricted to the forward transform; they should also address the derivatives and the inverse whenever the level of abstraction is low enough to invert the model.
Supporting information
[Figure omitted. See PDF.]
S1 File. A cascaded linear+nonlinear vision model.
https://doi.org/10.1371/journal.pone.0201326.s001
(PDF)
S2 File. Derivative of a linear function with regard to its parameters.
https://doi.org/10.1371/journal.pone.0201326.s002
(PDF)
S3 File. Derivation of the Jacobian with regard to the stimulus.
https://doi.org/10.1371/journal.pone.0201326.s003
(PDF)
S4 File. Derivation of the Jacobian with regard to the parameters.
https://doi.org/10.1371/journal.pone.0201326.s004
(PDF)
S5 File. Derivation of the inverse.
https://doi.org/10.1371/journal.pone.0201326.s005
(PDF)
S6 File. Region-based approach to Maximum Differentiation.
https://doi.org/10.1371/journal.pone.0201326.s006
(PDF)
S7 File. Maximization of correlation with subjective opinion.
https://doi.org/10.1371/journal.pone.0201326.s007
(PDF)
S8 File. The BioMultiLayer-L+NL Toolbox.
https://doi.org/10.1371/journal.pone.0201326.s008
(PDF)
S9 File. Toolbox-oriented matrix properties.
https://doi.org/10.1371/journal.pone.0201326.s009
(PDF)
Citation: Martinez-Garcia M, Cyriac P, Batard T, Bertalmío M, Malo J (2018) Derivatives and inverse of cascaded linear+nonlinear neural models. PLoS ONE 13(10): e0201326. https://doi.org/10.1371/journal.pone.0201326
1. Carandini M, Heeger DJ. Normalization as a canonical neural computation. Nature Rev Neurosci. 2012;13(1):51–62.
2. Hillis JM, Brainard DH. Do common mechanisms of adaptation mediate color discrimination and appearance? JOSA A. 2005;22(10):2090–2106. pmid:16277280
3. Simoncelli EP, Heeger D. A Model of Neuronal Responses in Visual Area MT. Vision Research. 1998;38(5):743–761. pmid:9604103
4. Watson AB, Solomon JA. A model of visual contrast gain control and pattern masking. JOSA A. 1997;14:2379–2391. pmid:9291608
5. Olshausen B, Field D. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 1996;381:607–609.
6. Ringach DL, Hawken MJ, Shapley R. Receptive field structure of neurons in monkey primary visual cortex revealed by stimulation with natural image sequences. Journal of Vision. 2002;2(1):2.
7. Graham N. Visual Pattern Analyzers. Oxford University Press; 1989.
8. Regan D. Spatial Vision. CRC Press; 1991.
9. Epifanio I, Gutierrez J, Malo J. Linear transform for simultaneous diagonalization of covariance and perceptual metric matrix in image coding. Pattern Recognition. 2003;36(8):1799–1811.
10. Malo J, Epifanio I, Navarro R, Simoncelli EP. Nonlinear image representation for efficient perceptual coding. IEEE Transactions on Image Processing. 2006;15(1):68–80. pmid:16435537
11. Laparra V, Muñoz-Marí J, Malo J. Divisive normalization image quality metric revisited. JOSA A. 2010;27(4):852–864. pmid:20360827
12. Seriès P, Stocker AA, Simoncelli EP. Is the Homunculus “Aware” of Sensory Adaptation? Neural Computation. 2009;21(12):3271–3304. pmid:19686064
13. da Fonseca M, Samengo I. Derivation of Human Chromatic Discrimination Ability from an Information-Theoretical Notion of Distance in Color Space. Neural Computation. 2016;28(12):2628–2655. pmid:27764598
14. Berardino A, Ballé J, Laparra V, Simoncelli EP. Eigen-distortions of hierarchical representations. Adv Neur Inf Proc (NIPS-17). 2017.
15. Spivak M. Calculus on manifolds: a modern approach to classical theorems of advanced calculus. Mathematics monograph series. Reading, Mass. Addison-Wesley; 1965.
16. Dubrovin B, Novikov S, Fomenko A. Modern Geometry: Methods and Applications. New York: Springer Verlag; 1982.
17. Ahumada A. Putting the visual system noise back in the picture. J Opt Soc Am A. 1987;4(12):2372–2378. pmid:3430224
18. Cover TM, Thomas JA. Elements of Information Theory, 2nd Edition. 2nd ed. Wiley-Interscience; 2006.
19. Studeny M, Vejnarova J. In: Jordan MI, editor. The Multi-information function as a tool for measuring stochastic dependence. Kluwer; 1998. p. 261–298.
20. Wang Z, Simoncelli EP. Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities. Journal of Vision. 2008;8(12):8–8. pmid:18831621
21. Malo J, Simoncelli E. Geometrical and statistical properties of vision models obtained via maximum differentiation. In: SPIE Electronic Imaging. International Society for Optics and Photonics; 2015. p. 93940L–93940L.
22. Ma K, Wu Q, Duanmu Z, Wang Z, Yong H, Zhang L, et al. Group MAD competition: A new methodology to compare objective image quality models. IEEE ICCV. 2016;.
23. Malo J, Kane D, Bertalmio M. The Maximum Differentiation competition depends on the Viewing Conditions. J Vision. 2016;16(12):822–822.
24. Watson AB, Malo J. Video quality measures based on the standard spatial observer. In: IEEE Proc. ICIP 02. vol. 3. IEEE; 2002. p. III–41.
25. Kay KN, Winawer J, Rokem A, Mezer A, Wandell BA. A two-stage cascade model of BOLD responses in human visual cortex. PLoS Comput Biol. 2013;9(5):e1003079.
26. Bertalmio M, Cyriac P, Batard T, Martinez-Garcia M, Malo J. The Wilson-Cowan model describes Contrast Response and Subjective Distortion. J Vision. 2017;17(10):657–657.
27. Ballé J, Laparra V, Simoncelli EP. End-to-end Optimized Image Compression. Int Conf Learn Repres. 2017;5. doi:arXiv:1611.01704.
28. Sakrison DJ. On the role of the observer and a distortion measure in image transmission. IEEE Trans Commun. 1977;25:1251–1267.
29. Watson AB, editor. Digital Images and Human Vision. Cambridge, MA, USA: MIT Press; 1993.
30. Wang Z, Bovik AC. Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE Signal Processing Magazine. 2009;26(1):98–117.
31. Mantiuk R, Myszkowski K, Seidel H. A perceptual framework for contrast processing of high dynamic range images. ACM Trans Appl Percept. 2000;3(3):286–308.
32. Wallace GK. The JPEG Still Picture Compression Standard. Commun ACM. 1991;34(4):30–44.
33. Le Gall D. MPEG: A Video Compression Standard for Multimedia Applications. Commun ACM. 1991;34(4):46–58.
34. Malo J, Gutiérrez J, Epifanio I, Ferri FJ, Artigas JM. Perceptual feedback in multigrid motion estimation using an improved DCT quantization. IEEE Trans Im Proc. 2001;10(10):1411–1427.
35. Gutiérrez J, Ferri FJ, Malo J. Regularization operators for natural images based on nonlinear perception models. IEEE Trans Im Proc. 2006;15(1):189–200.
36. Laparra V, Gutiérrez J, Camps-Valls G, Malo J. Image denoising with kernels based on natural image relations. The Journal of Machine Learning Research. 2010;11:873–903.
37. Coen-Cagli R, Schwartz O. The impact on midlevel vision of statistically optimal divisive normalization in V1. Journal of Vision. 2013;13(8):13. pmid:23857950
38. Del Bimbo A. Visual Information Retrieval. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 1999.
39. Kay KN, Naselaris T, Prenger RJ, Gallant JL. Identifying natural images from human brain activity. Nature. 2008;452(7185):352–355. pmid:18322462
40. Kamitani Y, Tong F. Decoding the visual and subjective contents of the human brain. Nature Neurosci. 2005;8(5):679–685. pmid:15852014
41. Marre O, Botella-Soler V, Simmons KD, Mora T, Tkačik G, Berry MJ. High accuracy decoding of dynamical motion from a large retinal population. PLoS Comput Biol. 2015;11(7):e1004304.
42. Martinez-Garcia M, Galan B, Malo J. Image Reconstruction from Neural Responses: what can we learn from the analytic inverse? Perception, ECVP-16. 2016;45(S2):46–46.
43. Laparra V, Jiménez S, Camps-Valls G, Malo J. Nonlinearities and adaptation of color vision from sequential principal curves analysis. Neural Comp. 2012;24(10):2751–2788.
44. Gutmann MU, Laparra V, Hyvärinen A, Malo J. Spatio-chromatic adaptation via higher-order canonical correlation analysis of natural images. PloS one. 2014;9(2):e86481. pmid:24533049
45. Laparra V, Malo J. Visual aftereffects and sensory nonlinearities from a single statistical framework. Frontiers in Human Neuroscience. 2015;9:557. pmid:26528165
46. Stanley GB, Li FF, Dan Y. Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J Neuroscience. 1999;19(18):8036–8042.
47. Wilson HR, Cowan JD. Excitatory and inhibitory interactions in localized populations of model neurons. Biophys J. 1972;12:1–24. pmid:4332108
48. Cowan JD, Neuman J, van Drongelen W. Wilson–Cowan Equations for Neocortical Dynamics. J Math Neurosci. 2016;6(1):1–24. pmid:26728012
49. Cyriac P, Bertalmio M, Kane D, Vazquez-Corral J. A tone mapping operator based on neural and psychophysical models of visual perception. Proc IS&T/SPIE Electronic Imaging. 2015;9394.
50. McFarland JM, Cui Y, Butts DA. Inferring Nonlinear Neuronal Computation Based on Physiologically Plausible Inputs. PLOS Computational Biology. 2013;9(7):1–18.
51. Vintch B, Movshon JA, Simoncelli EP. A Convolutional Subunit Model for Neuronal Responses in Macaque V1. Journal of Neuroscience. 2015;35(44):14829–14841. pmid:26538653
52. Cui Y, Wang YV, Park SJH, Demb JB, Butts DA. Divisive suppression explains high-precision firing and contrast adaptation in retinal ganglion cells. eLife. 2016;5:e19460. pmid:27841746
53. Antolík J, Hofer SB, Bednar JA, Mrsic-Flogel TD. Model Constrained by Visual Hierarchy Improves Prediction of Neural Responses to Natural Scenes. PLOS Computational Biology. 2016;12(6):1–22.
54. Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016.
© 2018 Martinez-Garcia et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
In vision science, cascades of Linear+Nonlinear transforms are very successful in modeling a number of perceptual experiences. However, the conventional literature focuses mainly on describing the forward input-output transform. In this work we present the mathematics of such cascades beyond the forward transform, namely the Jacobian matrices and the inverse. The fundamental reason for this analytical treatment is that it offers useful insight into the psychophysics, the physiology, and the function of the visual system. For instance, we show how the trends of the sensitivity (volume of the discrimination regions) and the adaptation of the receptive fields can be identified in the expression of the Jacobian w.r.t. the stimulus. This matrix also tells us which regions of the stimulus space are encoded more efficiently in multi-information terms. The Jacobian w.r.t. the parameters shows which aspects of the model have a bigger impact on the response, and hence their relative relevance. The analytic inverse implies conditions on the response and model parameters to ensure appropriate decoding. From the experimental and applied perspective, (a) the Jacobian w.r.t. the stimulus is necessary in new experimental methods based on the synthesis of visual stimuli with interesting geometrical properties, (b) the Jacobian matrices w.r.t. the parameters are convenient for learning the model from classical experiments or from the optimization of alternative goals, and (c) the inverse is a promising model-based alternative to blind machine-learning methods for neural decoding that do not include meaningful biological information. The theory is checked by building and testing a vision model that actually follows a modular Linear+Nonlinear program. Our illustrative differentiable and invertible model consists of a cascade of modules that account for brightness, contrast, energy masking, and wavelet masking. To stress the generality of this modular setting we show examples where some of the canonical Divisive Normalization modules are substituted by equivalent modules such as the Wilson-Cowan interaction model (at the V1 cortex) or a tone-mapping model (at the retina).
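To make the three objects of the abstract concrete, the following is a minimal NumPy sketch of a single Linear+Nonlinear stage (a linear filter followed by canonical Divisive Normalization), together with its Jacobian w.r.t. the stimulus and its analytic inverse. All parameter values and function names here are illustrative assumptions for a one-stage toy, not the paper's actual four-module model: the stage computes w = Lx and y_i = w_i / (b_i + Σ_j H_ij |w_j|), so |y| = |w|/(b + H|w|) rearranges into the linear system (I − diag(|y|) H) |w| = b ⊙ |y|, which is the decoding condition mentioned in the abstract (the system must be nonsingular).

```python
import numpy as np

def forward(x, L, b, H):
    """One Linear+Nonlinear stage: linear filter L, then divisive normalization
    with semisaturation b (vector > 0) and interaction kernel H (nonnegative)."""
    w = L @ x
    y = w / (b + H @ np.abs(w))
    return w, y

def jacobian_wrt_x(x, L, b, H):
    """Analytic Jacobian dy/dx = (dy/dw) L, with
    dy_i/dw_j = delta_ij / d_i - (w_i / d_i^2) H_ij sign(w_j),  d = b + H|w|."""
    w = L @ x
    d = b + H @ np.abs(w)
    J_w = np.diag(1.0 / d) - (w / d**2)[:, None] * H * np.sign(w)[None, :]
    return J_w @ L

def invert(y, L, b, H):
    """Analytic decoding: since |y| = |w| / (b + H|w|) and sign(w) = sign(y),
    solve the linear system (I - diag(|y|) H) |w| = b * |y| for |w|,
    then undo the linear stage. Requires both systems to be nonsingular."""
    a = np.abs(y)
    w_abs = np.linalg.solve(np.eye(len(y)) - np.diag(a) @ H, b * a)
    w = np.sign(y) * w_abs
    return np.linalg.solve(L, w)
```

With small interaction weights (so that diag(|y|) H stays well below the identity) the decoding is exact; in the full model the same algebra is applied module by module, from the last response back to the stimulus.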
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer