The visualization, unbiased estimation and interpretation of distributional regression models
Distributional regression is a modern approach to regression modeling in which multiple parameters of a parametric response distribution, not only the mean, can be linked simultaneously to structured additive predictors that take parametric or non-parametric forms. This thesis contributes to the field in three ways. Contribution 1) develops a framework for the visualization of distributional regression models that focuses on predicted conditional moments and the shape of the whole distribution, instead of relying solely on the distributional parameters, as is commonly done. It is implemented as an extensive, interactive R package named distreg.vis, designed with usability in mind. Contribution 2) identifies a bias in the estimated coefficients of all distribution parameters that arises when the model equation of a single parameter is incorrectly specified. A solution for two-parameter distributions is outlined, implemented and tested in a simulation study; it is based on a numerically solved system of ordinary differential equations (ODEs) constructed from the covariance matrix of the parameters' maximum likelihood estimates (MLE). Contribution 3) fills a gap in the interpretation of fitted distributional regression models. Existing metrics for ranking the importance of variables in linear regression models are discussed, with “relative weights” and “hierarchical partitioning” standing out as the most suitable because they are robust to the scale of the covariates, account for cross-correlation between variables, do not depend on the order in which variables enter the model, and are suitable for effects with more than one degree of freedom. These metrics are subsequently extended to generalized linear models (GLM) and to generalized additive models for location, scale and shape (GAMLSS) with linear predictors, taking into account the possibly multi-parametric response structure and the likelihood-based nature of the fitted regression models.
These extensions are implemented in an R package called vibe, which provides methods compatible with several other packages. The contributions above are showcased on datasets covering wages in the Mid-Atlantic region of the USA, gym visitor numbers in Göttingen, extreme rainfall in Tasmania, Australia, patient satisfaction with a health care provider in North Macedonia, and malnutrition scores in India.
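
To make the idea of order-independent variable importance concrete, the following is a minimal sketch of hierarchical partitioning for an ordinary linear regression with R² as the goodness-of-fit measure. It is not the implementation in vibe and does not cover the GLM or GAMLSS extensions; the function names (`_solve`, `r_squared`, `hierarchical_partitioning`) and the toy data are assumptions made for illustration only. The sketch is written in Python with the standard library, although the thesis software is in R.

```python
from itertools import combinations
from math import factorial


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y, subset):
    """R^2 of an OLS fit of y on an intercept plus the covariates indexed by `subset`."""
    if not subset:
        return 0.0  # the intercept-only model explains no variance
    n = len(y)
    y_mean = sum(y) / n
    tss = sum((v - y_mean) ** 2 for v in y)
    design = [[1.0] * n] + [columns[j] for j in subset]  # stored column-wise
    k = len(design)
    xtx = [[sum(design[r][i] * design[c][i] for i in range(n)) for c in range(k)]
           for r in range(k)]
    xty = [sum(design[r][i] * y[i] for i in range(n)) for r in range(k)]
    beta = _solve(xtx, xty)  # normal equations; assumes no perfect collinearity
    rss = sum((y[i] - sum(beta[r] * design[r][i] for r in range(k))) ** 2
              for i in range(n))
    return 1.0 - rss / tss


def hierarchical_partitioning(columns, y):
    """Average each covariate's R^2 gain over all sub-models that exclude it.

    Sub-models are weighted so that every model size counts equally, which
    makes the result independent of the order in which covariates are added.
    """
    p = len(columns)
    shares = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        share = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                base = list(subset)
                gain = r_squared(columns, y, base + [j]) - r_squared(columns, y, base)
                share += weight * gain
        shares.append(share)
    return shares


# Toy data (hypothetical): two uncorrelated covariates with equal effects.
x1 = [-1.0, -1.0, 1.0, 1.0]
x2 = [-1.0, 1.0, -1.0, 1.0]
y = [a + b for a, b in zip(x1, x2)]
shares = hierarchical_partitioning([x1, x2], y)
```

In this toy example the two covariates are uncorrelated and enter with equal coefficients, so each receives half of the explained variance, and the shares add up to the R² of the full model. The enumeration visits every subset of covariates, so this sketch is only practical for a small number of variables.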