2017 IEEE International Symposium on Information Theory (ISIT), 2017
Estimation of functions of $d$ variables is considered using ridge combinations of the form $\sum_{k=1}^{m} c_{1,k}\,\phi\big(\sum_{j=1}^{d} c_{0,j,k}\,x_j - b_k\big)$, where the activation function $\phi$ is a function with bounded value and derivative. These include single-hidden-layer neural networks, polynomials, and sinusoidal models. From a sample of size $n$ of possibly noisy values at random sites $X \in B = [-1, 1]^d$, the minimax mean square error is examined for functions in the closure of the $\ell_1$ hull of ridge functions with activation $\phi$. It is shown to be of order $d/n$ to a fractional power (when $d$ is of smaller order than $n$), and to be of order $(\log d)/n$ to a fractional power (when $d$ is of larger order than $n$). Dependence on constraints $v_0$ and $v_1$ on the $\ell_1$ norms of the inner parameter $c_0$ and outer parameter $c_1$, respectively, is also examined. Lower and upper bounds on the fractional power are also given. The heart of the analysis is the development of information-theoretic packing numbers for these classes of functions.
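As a reading aid for the abstract above, the following minimal sketch evaluates one ridge combination. It is only an illustration under stated assumptions: the function names, the choice of `math.tanh` as the activation (one example of a function with bounded value and derivative), the storage layout `c0[k][j]` for $c_{0,j,k}$, and all numeric values are hypothetical and do not come from the paper.

```python
import math


def ridge_combination(x, c0, c1, b, phi=math.tanh):
    """Evaluate sum_k c1[k] * phi(sum_j c0[k][j] * x[j] - b[k]).

    c0[k][j] stores the inner parameter c_{0,j,k} (layout is an assumption).
    """
    total = 0.0
    for inner_weights, outer_weight, shift in zip(c0, c1, b):
        projection = sum(w * xj for w, xj in zip(inner_weights, x))
        total += outer_weight * phi(projection - shift)
    return total


def l1_norm(values):
    """Sum of absolute values, the quantity constrained by v_0 and v_1."""
    return sum(abs(v) for v in values)


# Hypothetical example with d = 3 inputs in [-1, 1] and m = 2 ridge terms.
x = [0.5, -1.0, 0.25]
c0 = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
c1 = [0.7, -0.3]
b = [0.0, 0.1]

value = ridge_combination(x, c0, c1, b)
outer_norm = l1_norm(c1)
```

Because $|\tanh| \le 1$, the output of this sketch is bounded in absolute value by the $\ell_1$ norm of the outer parameters, which is the role the constraint $v_1$ plays in the abstract.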
We establish $L^\infty$ and $L^2$ error bounds for functions of many variables that are approximated by linear combinations of ReLU (rectified linear unit) and squared ReLU ridge functions with $\ell_1$ and $\ell_0$ controls on their inner and outer parameters. With the squared ReLU ridge function, we show that the $L^2$ approximation error is inversely proportional to the inner layer $\ell_0$ sparsity and it need only be sublinear in the outer layer $\ell_0$ sparsity. Our constructions are obtained using a variant of the Jones-Barron probabilistic method, which can be interpreted as either stratified sampling with proportionate allocation or two-stage cluster sampling. We also provide companion error lower bounds that reveal near optimality of our constructions. Despite the sparsity assumptions, we showcase the richness and flexibility of these ridge combinations by defining a large family of functions, in terms of certain spectral conditions, that are particularly well approximated by them.
IEEE Transactions on Information Theory, May 1, 2012
For the additive white Gaussian noise channel with average codeword power constraint, new coding methods are devised in which the codewords are sparse superpositions, that is, linear combinations of subsets of vectors from a given design, with the possible messages indexed by the choice of subset. Decoding is by least squares, tailored to the assumed form of linear combination. Communication is shown to be reliable with error probability exponentially small for all rates up to the Shannon capacity.
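The encoding step described above, a codeword formed as a linear combination of a subset of design vectors chosen by the message, can be sketched as follows. This is a hedged illustration, not the paper's construction: the partition of the design into equal-sized sections with one vector chosen per section, the Gaussian design entries, the equal coefficients, and every name and size below are assumptions made for the example. Least-squares decoding is not shown.

```python
import random


def encode(message, design, section_size, coefficient=1.0):
    """Superimpose one design vector per section, selected by the message.

    message[s] is an index in range(section_size) for section s
    (the sectioned layout is an assumption of this sketch).
    """
    n = len(design[0])
    codeword = [0.0] * n
    for section, choice in enumerate(message):
        column = design[section * section_size + choice]
        for i in range(n):
            codeword[i] += coefficient * column[i]
    return codeword


# Hypothetical sizes: codeword length 8, 3 sections of 4 vectors each.
rng = random.Random(0)
n, sections, section_size = 8, 3, 4
design = [[rng.gauss(0.0, 1.0) for _ in range(n)]
          for _ in range(sections * section_size)]

message = [2, 0, 3]  # one chosen index per section
codeword = encode(message, design, section_size)
```

Under these assumptions there are `section_size ** sections` possible messages, so the subset choice alone carries the information and the codeword is sparse in the design.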
The adsorption of polyphenols from apples, a good source of polyphenols in the human diet, onto β-glucan, a soluble dietary fibre, was studied. Polyphenols were extracted from the flesh and peel of two apple varieties (wild apple and Slavonska srčika) and adsorbed onto β-glucan for 16 hours. The adsorption capacities (mg/g) and equilibrium polyphenol concentrations (mg/l) were modelled with Freundlich and Langmuir isotherms. Polyphenols from the flesh and peel showed different behaviours: flesh polyphenols exhibited greater affinity and peel polyphenols greater theoretical adsorption capacity. The analysis of individual polyphenols with high-performance liquid chromatography revealed that the composition of the flesh and peel differed (flesh was rich in phenolic acids, peel in flavonols), which could explain the contrasting adsorption behaviour. This study shows that polyphenols from apples can be adsorbed onto β-glucan, that the flesh and peel exhibit distinct adsorption behaviours, and that the polyphenol composition can affect the adsorption mechanism.
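For readers unfamiliar with the two isotherm models named above, the sketch below writes them in their standard textbook forms. It is an illustration only: the function names and all parameter values are hypothetical and are not the fitted values from the study.

```python
def langmuir(c_eq, q_max, k_l):
    """Langmuir isotherm: q = q_max * K_L * C / (1 + K_L * C).

    c_eq is the equilibrium concentration (mg/l); q_max is the theoretical
    maximum adsorption capacity (mg/g); k_l reflects affinity (l/mg).
    """
    return q_max * k_l * c_eq / (1.0 + k_l * c_eq)


def freundlich(c_eq, k_f, n):
    """Freundlich isotherm: q = K_F * C ** (1 / n)."""
    return k_f * c_eq ** (1.0 / n)


# Hypothetical parameters, chosen only to show the shape of each model.
q_half = langmuir(c_eq=2.0, q_max=10.0, k_l=0.5)   # C = 1 / K_L gives q_max / 2
q_unit = freundlich(c_eq=1.0, k_f=3.0, n=2.0)      # C = 1 gives K_F
```

In the Langmuir form, a larger `k_l` corresponds to greater affinity and a larger `q_max` to greater theoretical capacity, which are the two quantities the abstract contrasts between flesh and peel polyphenols.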
New families of Fisher information and entropy power inequalities for sums of independent random variables are presented. These inequalities relate the information in the sum of n independent random variables to the information contained in sums over subsets of the random variables, for an arbitrary collection of subsets. As a consequence, a simple proof of the monotonicity of information in central limit theorems is obtained, both in the setting of i.i.d. summands and in the more general setting of independent summands with variance-standardized sums.
For Gaussian regression, we develop and analyse methods for combining estimators from various models. For squared-error loss, an unbiased estimator of the risk of a mixture of general estimators is developed. Special attention is given to the case that the components are least-squares projections into arbitrary linear subspaces. We relate the unbiased risk estimate for the mixture estimator to estimates of the risks achieved by the components. This results in accurate bounds on the risk and its unbiased estimate: without advance knowledge of which model is best, the resulting performance is comparable to what is achieved by the best of the individual models.
The minimum description length principle applied to function estimation can yield a criterion of the form log(likelihood) + const·m instead of the familiar log(likelihood) + (m/2) log n, where m is the number of parameters and n is the sample size. The improved criterion yields minimax optimal rates for redundancy and statistical risk. The analysis suggests an information-theoretic reconciliation of criteria proposed by Rissanen
For the additive Gaussian noise channel with average codeword power constraint, sparse superposition codes and adaptive successive decoding are developed. Codewords are linear combinations of subsets of vectors, with the message indexed by the choice of subset. A feasible decoding algorithm is presented. Communication is reliable with error probability exponentially small for all rates below the Shannon capacity.
In the present study, the optimal conditions of ultrasonic-assisted extraction (UAE) of capsaicinoids from hot chili peppers were determined for large-scale preparation. First, single-factor experiments were performed to optimize the extraction procedure of capsaicinoids, and the initial optimized results were: ratio of solvent to mass of 6 to 10 ml/g, extraction temperature of 25 to 35°C, and extraction time of 0 to 30 min. Then, an orthogonal array experimental design ($L_9(3^4)$) was used to further optimize the extraction procedure. The results of the F-test and P-value indicated that the order of effect on the extraction yield of capsaicinoids, from high to low, was ratio of solvent to mass, extraction time, and extraction temperature. The maximum extraction yield of capsaicinoids was obtained at a ratio of solvent to mass of 10 ml/g, an extraction time of 40 min, and an extraction temperature of 25°C. Under these conditions, the extraction yields of capsaicinoids were 2.35 ± 0.042 and 3.92 ± 0.089 mg/g for the conventional and UAE methods, respectively.
We extend the correspondence between two-stage coding procedures in data compression and penalized likelihood procedures in statistical estimation. Traditionally, this had required restriction to countable parameter spaces. We show how to extend this correspondence to the uncountable parameter case. Leveraging the description length interpretations of penalized likelihood procedures, we devise new techniques to derive adaptive risk bounds for such procedures. We show that the existence of certain countable coverings of the parameter space implies adaptive risk bounds, and thus our theory is quite general. We apply our techniques to illustrate risk bounds for $\ell_1$-type penalized procedures in canonical high-dimensional statistical problems such as linear regression and Gaussian graphical models. In the linear regression problem, we also demonstrate how the traditional $\ell_0$ penalty times $\log(n)/2$ plus lower order terms has a two-stage description length interpretation and present risk bounds ...
For any ReLU network there is a representation in which the sum of the absolute values of the weights into each node is exactly 1, and the input layer variables are multiplied by a value $V$ coinciding with the total variation of the path weights. Implications are given for Gaussian complexity, Rademacher complexity, statistical risk, and metric entropy, all of which are shown to be proportional to $V$. There is no dependence on the number of nodes per layer, except for the number of inputs $d$. For estimation with sub-Gaussian noise, the mean square generalization error bounds that can be obtained are of order $V\sqrt{L + d}/\sqrt{n}$, where $L$ is the number of layers and $n$ is the sample size.
We give conditions for an $O(1/n)$ rate of convergence of Fisher information and relative entropy in the Central Limit Theorem. We use the theory of projections in $L^2$ spaces and Poincaré inequalities to provide a better understanding of the decrease in Fisher information implied by results of Barron and Brown. We show that if the standardized Fisher information ever becomes finite then it converges to zero.
New inequalities are proved for the variance of the Pitman estimators (minimum variance equivariant estimators) of $\theta$ constructed from samples of fixed size from populations $F(x - \theta)$. The inequalities are closely related to the classical Stam inequality for the Fisher information, its analog in small samples, and a powerful variance drop inequality. The only condition required is finite variance of $F$; even the absolute continuity of $F$ is not assumed. As corollaries of the main inequalities for small samples, one obtains alternate proofs of known properties of the Fisher information, as well as interesting new observations like the fact that the variance of the Pitman estimator based on a sample of size $n$ scaled by $n$ monotonically decreases in $n$. Extensions of the results to the polynomial versions of the Pitman estimators and a multivariate location parameter are given. Also, the search for characterization of equality conditions for one of the inequalities leads to a Cauchy-type functional equation for independent random variables, and an interesting new behavior of its solutions is described.
2006 IEEE International Symposium on Information Theory, 2006
We provide a simple proof of the monotonicity of information in the Central Limit Theorem for i.i.d. summands. Extensions to the more general case of independent, not identically distributed summands are also presented. New families of Fisher information and entropy power inequalities are discussed.
Papers by Andrew Barron