Macquarie University
01whole.pdf (38 MB)

Investigating trace metal contamination and potential human health risks using novel environmental data science techniques

posted on 2024-07-03, 03:00 authored by Xiaochi Liu

Resolving the nexus between environmental contamination and human health is the first step in designing and planning targeted, exposure-specific intervention strategies. A multitude of variables can make it challenging to evaluate the differential influence of complex environmental sources and exposure factors on human health outcomes. The burgeoning field of environmental data science offers novel analytical techniques to help decipher the relationships hidden within complex data. This thesis applied the latest advances in environmental data science to investigate trace metal contamination and its potential connection to health outcomes. The thesis looks at multiple trace metal exposures but has a special focus on environmental Pb sources because of the persistence and prevalence of Pb in the environment, coupled with its well-known adverse health effects even at low concentrations of exposure. Specifically, the thesis undertakes three main steps: (1) quantifies the effect of potential sources on trace metal concentrations in residential soil (e.g., soil Pb); (2) assesses the effect of environmental Pb exposure (e.g., soil Pb) on childhood blood lead levels (BLL); and (3) evaluates the effect of low-level prenatal Pb exposure (i.e., cord BLL < 5 μg/dL) on birth outcomes.

To explore the potential sources of trace metal concentrations in soil, Step 1 analysed the spatial data of 8,221 soil samples from 1,828 homes in the Greater Sydney area. The study, detailed in Chapter 2, applied a novel spatial statistical method, the optimal parameter-based geographical detector (OPGD) model, to quantify the effects of anthropogenic and natural factors on soil trace metal concentrations. The OPGD model applies a data-driven approach to optimise the relevant parameters (i.e., the spatial scale selection and the spatial discretization method) to improve the accuracy and effectiveness of the analysis. The analyses revealed that anthropogenic factors (e.g., aged/painted home density, road density, industrial trace metal emissions) were the main contributors to soil As, Cd, Cr, Cu, Pb, and Zn, whereas natural factors (e.g., soil pH value, regolith stability, soil type) were the main factors associated with soil Mn and Ni. In particular, soil Pb contamination was shown to be the most concerning given that 42.7% of homes within the Greater Sydney area have soils above the Australian residential soil Pb guideline (300 mg/kg). Areas with high risk of soil Pb contamination were identified to inform potential future targeted mitigation strategies.
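The core quantity in a geographical detector analysis is the q-statistic, which measures how much of the variance in a response (e.g., soil Pb) is explained by splitting the samples into strata of an explanatory factor. The following is a minimal sketch of that idea, not the thesis's implementation: the function names, the rank-based equal-count binning, and the candidate bin counts are all assumptions chosen for illustration, and a parameter search of the kind OPGD performs is reduced here to picking the bin count with the largest q.

```python
from statistics import fmean


def q_statistic(values, strata):
    """Geographical detector q: 1 - within-strata sum of squares / total sum of squares."""
    overall_mean = fmean(values)
    total_ss = sum((v - overall_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # constant response: nothing to explain
    groups = {}
    for v, s in zip(values, strata):
        groups.setdefault(s, []).append(v)
    within_ss = 0.0
    for members in groups.values():
        m = fmean(members)
        within_ss += sum((v - m) ** 2 for v in members)
    return 1.0 - within_ss / total_ss


def quantile_strata(factor, k):
    """Assign each factor value to one of k equal-count bins by rank (illustrative choice)."""
    order = sorted(range(len(factor)), key=lambda i: factor[i])
    strata = [0] * len(factor)
    for rank, i in enumerate(order):
        strata[i] = rank * k // len(factor)
    return strata


def best_discretization(values, factor, candidate_k=(2, 3, 4, 5)):
    """Return the bin count with the highest q, plus all scores (hypothetical helper)."""
    scores = {k: q_statistic(values, quantile_strata(factor, k)) for k in candidate_k}
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

A q close to 1 means the strata of the factor account for almost all of the variation in the response; a q close to 0 means the factor explains almost none of it.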

Given the main contributors to soil Pb detected in Step 1 and other confounding variables, Step 2 assessed the effect of environmental Pb exposures (e.g., soil Pb) on childhood BLL by analysing Australia’s longest and largest blood Pb data set: 25 years (1991–2015) of childhood BLL records (n = 23,749) from Broken Hill. The study, detailed in Chapter 3, screened multiple machine learning (ML) algorithms, identified the one with optimal performance, and then applied a series of model-agnostic interpretation methods. Results show that Stacked Ensemble (SE), a method for optimally combining multiple prediction algorithms, enhanced predictive performance by 1.1% with respect to mean absolute error (p < 0.01) and 2.6% for root-mean-squared error (p < 0.01) compared to the best performing constituent algorithm (random forest). The interpretation of the SE model showed that childhood BLL had a near-linear positive association with soil Pb; children had higher BLL if they resided within 1.0 km of the central mining area or 1.37 km of the railroad; and BLL increased faster in Aboriginal than in non-Aboriginal children at 9–10 and 12–18 months of age.
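The performance gains quoted above compare two error metrics between the ensemble and its best constituent model. As a minimal sketch, assuming the percentages are relative reductions in error against the baseline (the abstract does not spell out the formula), the calculation looks like this; the function names are hypothetical.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-squared error; penalises large misses more than MAE does."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def percent_improvement(baseline_error, new_error):
    """Relative error reduction of a new model over a baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

Under this definition, a baseline error of 2.0 and an ensemble error of 1.5 would give `percent_improvement(2.0, 1.5) == 25.0`; a positive value means the ensemble has the lower error.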

The modelling procedure developed in Step 2 was further applied in Step 3 to evaluate the effect of low-level prenatal Pb exposure on birth outcomes such as gestational age, newborn head circumference, newborn weight, and newborn length. The study, detailed in Chapter 4, analysed 1,091 mother-newborn observations from 2009 to 2021 extracted from Broken Hill pregnancy and cord BLL data. Results show that SE models consistently enhanced the predictive performance, consistent with the result in Chapter 3. Model interpretation revealed that as cord BLL increased, newborn head circumference and newborn length had a generally decreasing trend. Specifically, newborn head circumference decreased below the model’s mean prediction (34.5 cm) when cord BLL exceeded 1.7 μg/dL, and newborn length decreased below the model’s mean prediction (50.6 cm) when cord BLL exceeded 1.4 μg/dL. Cord BLL showed no evident influence on gestational age or newborn weight. Moreover, cord BLL strongly interacted with alcohol consumption when predicting newborn length and newborn weight: where pregnant mothers reported not drinking alcohol, the predicted newborn length and newborn weight did not vary as cord BLL increased; whereas for mothers who reported consuming alcohol, the ±1 standard deviation range of predicted newborn length decreased from [51.2, 55.1] cm at cord BLL = 0 μg/dL to [45.8, 50.9] cm at cord BLL = 5.0 μg/dL, and that of predicted newborn weight decreased from [3113, 3783] g at cord BLL = 0 μg/dL to [2945, 3579] g at cord BLL = 3.7 μg/dL.
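Statements of the form "the prediction falls as exposure rises, but only in one subgroup" are typically read off model-agnostic tools such as partial dependence curves computed separately per subgroup. The sketch below illustrates that general technique only; the abstract does not name the specific interpretation methods used, and `toy_model`, the feature names `exposure` and `flag`, and all numbers are synthetic placeholders rather than thesis data or results.

```python
from statistics import fmean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(fmean(preds))
    return curve


def toy_model(row):
    # Hypothetical stand-in for a fitted model: the response drops with
    # exposure only when flag == 1, i.e. the two features interact.
    return 10.0 - 2.0 * row["exposure"] * row["flag"]


# Synthetic rows; values carry no real-world meaning.
rows = [
    {"exposure": 0.5, "flag": 0},
    {"exposure": 1.0, "flag": 0},
    {"exposure": 1.5, "flag": 1},
    {"exposure": 2.0, "flag": 1},
]
grid = [0.0, 1.0, 2.0]

# Curves per subgroup expose the interaction that the pooled curve averages away.
pd_flag0 = partial_dependence(toy_model, [r for r in rows if r["flag"] == 0], "exposure", grid)
pd_flag1 = partial_dependence(toy_model, [r for r in rows if r["flag"] == 1], "exposure", grid)
pd_all = partial_dependence(toy_model, rows, "exposure", grid)
```

In this toy setup the `flag == 0` curve is flat, the `flag == 1` curve declines, and the pooled curve sits between them, which is the pattern one would look for when checking whether two predictors interact.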

The application of new investigation tools from environmental data science in this thesis has demonstrated their utility for examining the nexus between environmental contamination and human health exposure risks and outcomes. The OPGD model applied in this thesis is suitable for exploring influential factors across large geographical scales. The “Stacked Ensemble + model-agnostic interpretation” analytical framework developed and applied to the environmental and human health data in this thesis was shown to be reliable and robust for untangling complex relationships and discovering nuanced insights in exposure datasets. Both approaches have clear value and potential for other environmental health research.


Table of Contents

1. Introduction -- 2. Trace metal contamination in residential soil -- 3. How environmental Pb exposure affect childhood blood lead levels -- 4. How prenatal Pb exposure affect birth outcomes -- 5. Discussion -- 6. Conclusion -- Appendix -- References


Additional Supervisor 4: Chenyin Dong
Additional Supervisor 5: Yongze Song
Additional Supervisor 6: Xincai Wu

Awarding Institution

Macquarie University

Degree Type

Thesis PhD


Doctor of Philosophy

Department, Centre or School

School of Natural Sciences

Year of Award


Principal Supervisor

Mark Taylor

Additional Supervisor 1

Janaki Amin

Additional Supervisor 2

Marjorie Aelion


Copyright: The Author




416 pages

Former Identifiers

AMIS ID: 286248