List of works
Preprint
A Criterion for Aggregation Error for Multivariate Spatial Data
Posted to a preprint site 12/19/2023
The criterion for aggregation error (CAGE) is an important metric that aims to measure errors that arise in multiscale (or multi-resolution) spatial data, referred to as the modifiable areal unit problem (MAUP) and the ecological fallacy. Specifically, CAGE is a measure of between-scale variance of eigenvectors in a Karhunen-Loève expansion (KLE), motivated by a theoretical result, referred to as the "null-MAUP-theorem," that states that the MAUP/ecological fallacy are not present when this variance is zero. CAGE was originally developed for univariate spatial data, but it has been applied to multivariate spatial data without the development of a null-MAUP-theorem in the multivariate spatial setting. To fill this gap, we provide theoretical justification for a multivariate CAGE (MVCAGE), which includes multiscale multivariate extensions of the KLE, Mercer's theorem, and the null-MAUP-theorem. Additionally, we provide technical results that demonstrate that the MVCAGE is preferable to spatial-only CAGE, and extend basis functions commonly used to compute CAGE to the multivariate spatial setting. Empirical results are provided to demonstrate the use of MVCAGE for uncertainty quantification and regionalization.
Journal article
Correcting for informative sampling in spatial covariance estimation and kriging predictions
Published 10/2023
Journal of geographical systems, 25, 4, 587 - 613
Informative sampling designs can impact spatial prediction, or kriging, in two important ways. First, the sampling design can bias spatial covariance parameter estimation, which in turn can bias spatial kriging estimates. Second, even with unbiased estimates of the spatial covariance parameters, since the kriging variance is a function of the observation locations, these estimates will vary based on the sample and overestimate the population-based estimates. In this work, we develop a weighted composite likelihood approach to improve spatial covariance parameter estimation under informative sampling designs. Then, given these parameter estimates, we propose three approaches to quantify the effects of the sampling design on the variance estimates in spatial prediction. These results can be used to make informed decisions for population-based inference. We illustrate our approaches using a comprehensive simulation study. Then, we apply our methods to perform spatial prediction using real estate data across a metropolitan area.
Journal article
REDS: Random ensemble deep spatial prediction
Published 02/2023
Environmetrics (London, Ont.), 34, 1, e2780
There has been a great deal of recent interest in the development of spatial prediction algorithms for very large datasets and/or prediction domains. These methods have primarily been developed in the spatial statistics community, but there has been growing interest in the machine learning community for such methods, primarily driven by the success of deep Gaussian process regression approaches and deep convolutional neural networks. These methods are often computationally expensive to train and implement, and consequently there has been a resurgence of interest in random projections and deep learning models based on random weights, so-called reservoir computing methods. Here, we combine several of these ideas to develop the random ensemble deep spatial (REDS) approach to predict spatial data. The procedure uses random Fourier features as inputs to an extreme learning machine (a deep neural model with random weights), and with calibrated ensembles of outputs from this model based on different random weights, it provides a simple uncertainty quantification. The REDS method is demonstrated on simulated data and on a classic large satellite data set.
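The pipeline described in this abstract (random Fourier features, a random-weight network with a trained output layer, and an ensemble over random draws) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: it uses a single hidden layer rather than a deep model, reports the raw ensemble spread without any calibration, and all names and settings (`fit_member`, `ensemble_predict`, `n_rff`, `n_hidden`, `scale`, `ridge`, the toy data) are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_member(locs, y, rng, n_rff=20, n_hidden=15, scale=3.0, ridge=1e-3):
    """Fit one ensemble member; only the output weights are estimated."""
    # Random Fourier features: random frequencies and phases (never trained).
    w = [[rng.gauss(0.0, scale) for _ in range(2)] for _ in range(n_rff)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_rff)]
    # Random hidden-layer weights (never trained), as in an extreme learning machine.
    v = [[rng.gauss(0.0, 1.0) for _ in range(n_rff)] for _ in range(n_hidden)]

    def hidden(s):
        z = [math.sqrt(2.0 / n_rff) * math.cos(wk[0] * s[0] + wk[1] * s[1] + bk)
             for wk, bk in zip(w, b)]
        # tanh hidden units plus a constant term for the intercept
        return [math.tanh(sum(vj[k] * z[k] for k in range(n_rff))) for vj in v] + [1.0]

    h = [hidden(s) for s in locs]
    p = n_hidden + 1
    # Ridge-regularised normal equations for the output weights.
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(h, y)) for i in range(p)]
    beta = solve_linear(gram, rhs)
    return lambda s: sum(bi * hi for bi, hi in zip(beta, hidden(s)))


def ensemble_predict(locs, y, new_loc, n_members=10, seed=0):
    """Mean and standard deviation of predictions across random-weight members."""
    preds = [fit_member(locs, y, random.Random(seed + m))(new_loc)
             for m in range(n_members)]
    return statistics.mean(preds), statistics.stdev(preds)


# Toy data (hypothetical): a smooth surface on the unit square plus noise.
data_rng = random.Random(1)
locs = [(data_rng.random(), data_rng.random()) for _ in range(60)]
y = [math.sin(2 * math.pi * s[0]) * math.cos(2 * math.pi * s[1])
     + data_rng.gauss(0.0, 0.1) for s in locs]
mean, spread = ensemble_predict(locs, y, (0.5, 0.5))
print(f"prediction {mean:.3f}, ensemble spread {spread:.3f}")
```

Because only the output layer is estimated, each member costs one small linear solve, which is what makes repeating the fit over many random draws inexpensive.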
Journal article
An Overview of Univariate and Multivariate Karhunen Loeve Expansions in Statistics
Published 12/2022
Journal of the Indian Society for Probability and Statistics, 23, 2, 285 - 326
Dependent data are ubiquitous in statistics and across various subject matter domains, with dependencies across space, time, and variables. Basis expansions have proven quite effective in modeling such processes, particularly in the context of functional data and high-dimensional spatial, temporal, and spatio-temporal data. One of the most useful basis function representations is given by the Karhunen-Loeve expansion (KLE), which is derived from the covariance kernel that controls the dependence of a random process, and can be expressed in terms of reproducing kernel Hilbert spaces. The KLE has been used in a wide variety of disciplines to solve many different types of problems, including dimension reduction, covariance estimation, and optimal spatial regionalization. Despite its utility in the univariate context, the multivariate KLE has been used much less frequently in statistics. This manuscript provides an overview of the KLE, with the goal of illustrating the utility of the univariate KLE and bringing the multivariate version to the attention of a wider audience of statisticians and data scientists. After deriving the KLE from a univariate perspective, we derive the multivariate version and illustrate the implementation of both via simulation and data examples.
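For reference, the standard univariate KLE of a zero-mean process with continuous covariance kernel on a compact domain can be written as below. The notation ($Y$, $C$, $D$, $\lambda_k$, $\phi_k$, $\alpha_k$) is assumed here for illustration and is not taken from the article.

```latex
\begin{aligned}
Y(s) &= \sum_{k=1}^{\infty} \alpha_k \, \phi_k(s),
  & \alpha_k &= \int_D Y(s)\, \phi_k(s)\, ds, \\
\int_D C(s,u)\, \phi_k(u)\, du &= \lambda_k \, \phi_k(s),
  & \mathrm{E}[\alpha_j \alpha_k] &= \lambda_k \, \delta_{jk}, \\
C(s,u) &= \sum_{k=1}^{\infty} \lambda_k \, \phi_k(s)\, \phi_k(u)
  && \text{(Mercer's theorem)}.
\end{aligned}
```

The eigenfunctions $\phi_k$ are orthonormal on $D$, and the random coefficients $\alpha_k$ are uncorrelated with variances $\lambda_k$, which is why truncating the sum at the largest eigenvalues gives a dimension reduction.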
Journal article
Published 11/2021
Bulletin - Calcutta Statistical Association, 73, 2, 127 - 145
Growth is a fundamental aspect of a living organism. Growth curves play an important role in explaining the complex dynamics of growth trajectories. The development of a large class of growth models provides more choices to explain complex growth dynamics. However, identifying a suitable growth curve from a broad class of growth models becomes a challenging task. Relative Growth Rate (RGR) is the most popular measure in growth-related studies. It serves many purposes in the growth curve literature, including constructing goodness-of-fit indices for some growth dynamics. However, the goodness-of-fit test based on RGR is restricted to only simple growth models. This study aims to develop a new growth rate function, instantaneous maturity rate (IMR), which can play an important role in identifying growth models. We find that the measure has synergy in mathematical form with the hazard rate. However, unlike the hazard rate, IMR is a random variable when the size/RGR variable is stochastic. We have derived the exact and asymptotic distribution of this measure under the Gaussian setup of both the size and RGR variables. We have constructed a goodness-of-fit test for the extended Gompertz growth model based on the instantaneous maturity rate. We have checked the performance of the test through simulation studies as well as real data.
AMS 2010 subject classifications: 62Mxx, 92Bxx, 62P10
Preprint
Posted to a preprint site 08/27/2021
Informative sampling designs can impact spatial prediction, or kriging, in two important ways. First, the sampling design can bias spatial covariance parameter estimation, which in turn can bias spatial kriging estimates. Second, even with unbiased estimates of the spatial covariance parameters, since the kriging variance is a function of the observation locations, these estimates will vary based on the sample and overestimate the population-based estimates. In this work, we develop a weighted composite likelihood approach to improve spatial covariance parameter estimation under informative sampling designs. Then, given these parameter estimates, we propose three approaches to quantify the effects of the sampling design on the variance estimates in spatial prediction. These results can be used to make informed decisions for population-based inference. We illustrate our approaches using a comprehensive simulation study. Then, we apply our methods to perform spatial prediction on nitrate concentration in wells located throughout central California.
Preprint
Deep Neural Network in Cusp Catastrophe Model
Posted to a preprint site 04/05/2020
Catastrophe theory was originally proposed to study dynamical systems that exhibit sudden shifts in behavior arising from small changes in input. These models can offer a reasonable explanation for abrupt jumps in nonlinear dynamic models. Among the different catastrophe models, the cusp catastrophe model has attracted the most attention due to its relatively simple dynamics and rich domain of application. Due to the complex behavior of the response, the parameter space becomes highly non-convex, and hence it becomes very hard to optimize for the generating parameters. Instead of solving for these generating parameters, we demonstrate how a machine learning model can be trained to learn the dynamics of the cusp catastrophe model without ever solving for the generating model parameters. Simulation studies and applications to a few well-known datasets are used to validate our approach. To our knowledge, this is the first paper in which a neural-network-based approach has been applied to the cusp catastrophe model.
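The "sudden shifts arising from small changes in input" can be seen in the canonical deterministic cusp form, whose equilibria solve y^3 - beta*y - alpha = 0. The sketch below assumes that textbook form; it is not the stochastic model or the neural network from the preprint, and the names `cusp_equilibria`, `alpha`, and `beta` are illustrative.

```python
import math


def cusp_equilibria(alpha, beta):
    """Real roots of y**3 - beta*y - alpha = 0, sorted in ascending order."""
    p, q = -beta, -alpha                # depressed cubic y**3 + p*y + q = 0
    disc = -4.0 * p**3 - 27.0 * q**2    # discriminant of the cubic
    if disc > 0:
        # Three distinct real roots (p < 0 here): trigonometric solution.
        r = 2.0 * math.sqrt(-p / 3.0)
        arg = max(-1.0, min(1.0, (3.0 * q) / (2.0 * p) * math.sqrt(-3.0 / p)))
        theta = math.acos(arg) / 3.0
        roots = [r * math.cos(theta - 2.0 * math.pi * k / 3.0) for k in range(3)]
    elif disc == 0:
        # Boundary of the bistable region: a repeated root.
        roots = [0.0] if p == 0 else [3.0 * q / p, -3.0 * q / (2.0 * p)]
    else:
        # One real root: Cardano's formula with real cube roots.
        root = math.sqrt(q * q / 4.0 + p**3 / 27.0)
        cbrt = lambda x: math.copysign(abs(x) ** (1.0 / 3.0), x)
        roots = [cbrt(-q / 2.0 + root) + cbrt(-q / 2.0 - root)]
    return sorted(roots)


# With beta = 3 the bistable region ends at alpha = 2: a small increase in
# alpha across that value removes the lower equilibrium branch entirely.
beta = 3.0
for alpha in (1.9, 1.99, 2.01, 2.1):
    print(alpha, [round(y, 3) for y in cusp_equilibria(alpha, beta)])
```

When three equilibria exist, the outer two are stable and the middle one is unstable, so a state sitting on the lower branch must jump to the upper one once the lower branch disappears.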