Introduction
Severe malaria caused by the parasite
Severe falciparum malaria has been defined by experts convened by the World Health Organization (WHO) as clinical or laboratory evidence of vital organ dysfunction in the presence of circulating asexual
Our goal was to develop a biomarker-based model that can differentiate probabilistically between ‘true severe malaria’ and severe illness not caused primarily by malaria, but with concomitant parasitaemia. We define ‘true severe malaria’ conceptually as a febrile illness caused by malaria parasites, with organ dysfunction, that can result in death whereby mortality is attributable directly to the malaria parasites. This attributable mortality can be given a formal causal definition by using a conceptual (albeit unethical) randomised experiment of delayed versus prompt antimalarial therapy. In a theoretical patient population with true severe malaria, delay in administration of an effective antimalarial would result in increased mortality (Warrell et al., 1982; Gomes et al., 2009) whereas in a population with severe illness not caused by malaria (‘not severe malaria’) there would not be a corresponding increase in mortality.
We developed a probabilistic diagnostic model of severe malaria based on haematological biomarkers using data from 1704 adults and children mainly from low transmission settings whose diagnosis of severe malaria is considered to be highly specific. We used this model to demonstrate low phenotypic specificity in a cohort of 2220 Kenyan children who were diagnosed clinically with severe malaria. We validated the predictions using a natural experiment, the distribution of sickle cell trait (HbAS), the genetic polymorphism with the strongest known protective effect against all forms of clinical malaria (Malaria Genomic Epidemiology Network, 2014). Building on work on ‘data-tilting’ (Nie et al., 2013), we suggest a new method for testing genetic associations in the context of case-control studies in which cases are re-weighted by the probability that the severe malaria diagnosis is correct under the model. As proof of concept, we ran a genome-wide association study across 9.6 million imputed biallelic variants using the subset of cases with genome-wide genotype data (n = 1297) and population controls (n = 1614). Adjusting for case mis-classification decreased genome-wide FDRs (Storey, 2002) and increased effect sizes in three of the top regions of the human genome most strongly associated with protection from severe malaria in East Africa (
Results
Reference model of severe malaria
We used the joint distribution of platelet counts and white blood cell counts (both on a logarithmic scale) to develop a simple biomarker-based reference model of severe malaria. To fit the reference model (i.e. P[Data | Severe malaria]), we used platelet and white count data from (i) severe malaria patient cohorts enrolled in low transmission areas where severe disease accompanied by a positive blood stage parasitaemia has a high positive predictive value for severe malaria (930 adults from Vietnam [Hien et al., 1996; Phu et al., 2010] and 653 adults and children from Thailand and Bangladesh); and (ii) severely ill African children with plasma
Figure 1A shows the reference data (green triangles: patients with a highly specific diagnosis of severe malaria, summarised in Table 1) alongside data from a large Kenyan cohort of hospitalised children diagnosed with severe malaria, whose diagnosis had unknown specificity (pink squares). The median platelet count in the reference data was 57,000 per μL, and the median total white blood cell count was 8400 per μL. In contrast, the median platelet count in the Kenyan children was 120,000 per μL, and the median total white blood cell count was 13,000 per μL. Direct comparisons of white counts across these two datasets are confounded by geography and age. Total white blood cell counts are known to be age-dependent and vary across genetic backgrounds, in particular lower neutrophil counts are associated with mutations in the
Figure 1.
Platelet counts and white blood cell counts as diagnostic predictors of severe falciparum malaria.
Panel (A) shows the bivariate marginal distribution for the reference data (thought to be highly specific to severe malaria, green triangles, n = 1704, summarised in Table 1) and for the Kenyan case data (pink squares, n = 2220; black diamonds: HbAS). The dashed ellipses show the 50% and 95% bivariate normal probability contours approximating each dataset (dark green: reference data; purple: Kenyan data). Panel (B) shows the relationship between platelet counts and plasma
Table 1.
Summary of severe disease datasets used in our analyses.
For age and parasite density, we show the median values as the distributions are highly skewed. *For the FEAST trial, the severe malaria reference dataset only included platelet and white count data from the 121 patients who had
Dataset | Bangladesh-Thailand | Vietnam | FEAST (Uganda) | Kenya |
---|---|---|---|---|
Description | Observational studies of severe malaria | Randomised controlled trials in severe malaria | Randomised controlled trial in severe febrile illness | Observational severe malaria cohort |
Purpose | Reference data | Reference data | Reference data* and Figure 1B | Testing data |
Published references | Leopold et al., 2019 | Hien et al., 1996; Phu et al., 2010 | Maitland et al., 2011 | MalariaGEN Consortium et al., 2018 |
n | 653 | 930 | 567 | 2220 |
Median age (years, range) | 28 (2–80) | 30 (15–79) | 2.1 (0–12) | 2.3 (0–13) |
Median parasite density (per μL, IQR) | 48,984 (8289–187,395) | 83,084 (13,047–316,512) | 400 (0–53,200) | 72,000 (6208–315,250) |
Mortality (%) | 18.2 | 12.9 | 11.3 | 11.6 |
Estimating the proportion of children mis-diagnosed with severe malaria
We can consider the hospitalised Kenyan children in this series as a mixture of two latent sub-populations, ‘severe malaria’ and ‘not severe malaria’ (i.e. an alternative aetiology for severe illness). To estimate the proportion of each, we use the distribution of HbAS, the human polymorphism most protective against all forms of clinical falciparum malaria. HbAS provides at least 90% protection against severe malaria (Taylor et al., 2012; Malaria Genomic Epidemiology Network, 2014). The causal SNP rs334 was genotyped in 2213 of the Kenyan children, of whom 57 were HbAS. Because the prevalence of HbAS is expected to differ between the two latent sub-populations, the distribution of HbAS can be used to infer the marginal probability P(Severe malaria) in the Kenyan cohort, as illustrated by causal pathways (a) and (b) in Figure 2 (note that all children were selected into the study on the basis of clinical symptoms consistent with severe malaria).
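The mixture argument can be illustrated with a simple method-of-moments calculation: the HbAS prevalence observed among all cases is a weighted average of the prevalences in the two latent sub-populations. The sketch below is only an illustration of that identity. The function name and the two sub-population prevalences (1% and 5%) are hypothetical placeholders, not estimates from this study, and the estimate reported in the paper comes from a Bayesian binomial mixture model (see Materials and methods), not from this calculation.

```python
def proportion_true_severe_malaria(prev_cases, prev_true_sm, prev_not_sm):
    """Solve prev_cases = pi * prev_true_sm + (1 - pi) * prev_not_sm for pi.

    pi is the proportion of cases drawn from the 'severe malaria' sub-population.
    The result is clipped to the interval [0, 1].
    """
    if prev_true_sm == prev_not_sm:
        raise ValueError("HbAS prevalence must differ between sub-populations")
    pi = (prev_not_sm - prev_cases) / (prev_not_sm - prev_true_sm)
    return min(1.0, max(0.0, pi))


# Observed HbAS prevalence among genotyped Kenyan cases (from the text).
observed_prevalence = 57 / 2213

# Hypothetical sub-population prevalences, chosen only for illustration.
pi_hat = proportion_true_severe_malaria(
    prev_cases=observed_prevalence,
    prev_true_sm=0.01,  # assumed HbAS prevalence in 'severe malaria'
    prev_not_sm=0.05,   # assumed HbAS prevalence in 'not severe malaria'
)
print(f"Illustrative proportion with true severe malaria: {pi_hat:.2f}")
```

The calculation shows why HbAS is informative: the further apart the two sub-population prevalences are, the more precisely the observed prevalence pins down the mixing proportion.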
Figure 2.
Theoretical causal pathways that lead to the clinical diagnosis of severe malaria under the current WHO definition (World Health Organisation, 2014).
Pathways (a) and (b) represent the two ways patients can be mis-classified as severe malaria. For both pathways (a) and (b), we expect a higher prevalence of HbAS relative to the population with true severe malaria as a consequence of the protective bottlenecks. In this causal model, we assume that HbAS does not protect against asymptomatic parasitaemia, although this assumption is not strictly necessary. Adapted with permission from Small et al., 2017.
We assumed that cases with the highest likelihood values P(Data | Severe malaria) under the reference model (a bivariate
Estimating individual probabilities of severe malaria
We then estimated P(Severe malaria | Data) for each Kenyan case by fitting a mixture model to the reference data and to the Kenyan data jointly. The model assumed that the platelet and white count data for the Kenyan children were drawn from a mixture of P(Data | Severe malaria) and P(Data | Not severe malaria). The reference data (Asian adults and children with severe malaria and African children with
Figure 3A shows the bimodal distribution of the posterior individual estimates of P(Severe malaria | Data). As expected, the individual posterior probabilities of severe malaria were highly predictive of HbAS (generalised additive logistic regression model fit; Figure 3C). The individual probabilities were also predictive of in-hospital mortality (generalised additive model fit; Figure 3D) and admission peripheral blood parasite density (generalised additive model fit; Figure 3E). In the top quintile of patients with the highest estimated P(Severe malaria | Data), the prevalence of HbAS was 0.7% (3 out of 446). In contrast, for patients in the lowest quintile of estimated P(Severe malaria | Data), the prevalence of HbAS was 4.8% (21 out of 444). The patients with a low probability of severe malaria had a substantially higher case fatality ratio (18.8% mortality for patients in the bottom quintile of P[Severe malaria | Data] versus 6.1% mortality for the top quintile of P[Severe malaria | Data]). This may be explained by the higher case-specific mortality of severe bacterial sepsis (the most likely alternative cause of severe illness). The admission parasite densities in patients with a probability of severe malaria close to 1 were approximately fivefold higher than in patients with a probability of severe malaria close to 0. The blood culture positive rate was 2.1% in the top quintile of P(Severe malaria | Data) and 4.4% in the lowest quintile of P(Severe malaria | Data), and the individual probabilities were predictive of blood culture results (generalised additive logistic regression model fit).
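For a two-component mixture, each individual probability P(Severe malaria | Data) follows from Bayes' rule once the mixing proportion and the two component likelihoods are known. The sketch below shows that step in isolation, working on the log scale for numerical stability. It assumes fixed, already estimated inputs; the function and argument names are ours, and the estimates in this paper come from a joint Bayesian fit rather than from plug-in values.

```python
import math


def posterior_severe_malaria(prior_sm, loglik_sm, loglik_not_sm):
    """Posterior probability of 'severe malaria' under a two-component mixture.

    prior_sm      : mixing proportion P(Severe malaria), strictly between 0 and 1
    loglik_sm     : log P(Data | Severe malaria) for one patient
    loglik_not_sm : log P(Data | Not severe malaria) for the same patient
    """
    if not 0.0 < prior_sm < 1.0:
        raise ValueError("prior_sm must lie strictly between 0 and 1")
    a = math.log(prior_sm) + loglik_sm
    b = math.log1p(-prior_sm) + loglik_not_sm
    m = max(a, b)  # subtract the maximum before exponentiating
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))


# When both components explain the data equally well, the posterior equals the prior.
print(posterior_severe_malaria(0.6, -3.0, -3.0))
```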
Figure 3.
Model estimates of P(Severe malaria | Data) in 2220 Kenyan children clinically diagnosed with severe malaria.
Panel (A) shows the distribution of posterior probabilities of severe malaria being the correct diagnosis. Panel (B) shows these same probabilities plotted as a function of the platelet and white counts on which they are based (dark red: probability close to 0; dark blue: probability close to 1). The black diamonds show the HbAS individuals. Panels (C–E) show the relationship between the estimated probabilities of severe malaria and HbAS, in-hospital mortality and admission parasite density, respectively. The black lines (shaded areas) show the mean estimated values (95% confidence intervals) from a generalised additive logistic regression model with a smooth spline term for the likelihood (R package
Accounting for case imprecision in case-control studies
‘False-positive’ cases reduce statistical power and dilute effect size estimates in case-control studies. We propose a novel approach for case-control studies with phenotypic imprecision based on data-tilting (Nie et al., 2013). The idea is to ‘tilt’ the cases towards a pseudo-population with higher specificity for severe malaria. We can do this by re-weighting the data by the probabilities P(Severe malaria | Data), that is, re-weighting the contribution to the log-likelihood in an association model.
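The re-weighting idea can be sketched with a weighted logistic regression in which each individual's contribution to the log-likelihood is multiplied by a weight: 1 for controls and P(Severe malaria | Data) for cases. The example below is a minimal pure-Python sketch under stated assumptions, not the implementation used for the analyses in this paper. The function name, the single binary covariate and the toy counts are all hypothetical and chosen only to show how down-weighting likely mis-classified cases moves the estimated odds ratio away from 1.

```python
import math


def fit_weighted_logistic(y, x, w, n_iter=25):
    """Maximise sum_i w_i * loglik_i for the model logit P(y = 1) = b0 + b1 * x.

    Uses Newton-Raphson on the two parameters and returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi, wi in zip(y, x, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)        # weighted score contribution
            v = wi * p * (1.0 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data: rows are (case status, carrier status, weight).
rows = (
    [(0, 1, 1.0)] * 150 + [(0, 0, 1.0)] * 850    # controls, weight 1
    + [(1, 1, 0.9)] * 6 + [(1, 0, 0.9)] * 594    # likely true severe malaria
    + [(1, 1, 0.1)] * 60 + [(1, 0, 0.1)] * 340   # likely mis-classified cases
)
y, x, w = zip(*rows)

_, b1_unweighted = fit_weighted_logistic(y, x, [1.0] * len(y))
_, b1_weighted = fit_weighted_logistic(y, x, w)
print(f"Unweighted odds ratio: {math.exp(b1_unweighted):.2f}")
print(f"Weighted odds ratio:   {math.exp(b1_weighted):.2f}")
```

In this toy example the mis-classified cases carry the allele at the same frequency as the controls, so giving them full weight pulls the odds ratio towards 1; the weighted fit recovers a stronger protective effect.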
We applied this approach as proof of concept to a genome-wide association study using the subset of Kenyan children who had clinical and genome-wide data available (after quality control checks n = 1297 cases) and a set of matched population controls (n = 1614), across 9.6 million biallelic variants on the autosomal chromosomes (Band et al., 2019). We compared the data-tilting method to the standard non-weighted approach by estimating local FDRs (Storey, 2002). Compared to the standard non-weighted GWAS, data-tilting substantially increased the number of significant associations for local FDRs in the range of 1–5% (Figure 4). For example, at an FDR of 2%, the number of significant hits more than doubled, with all of the additional hits located around known loci associated with protection from severe malaria. We note that if the data weights were not predictive of the true latent phenotype, we would expect fewer significant hits for a given FDR because of the reduction in effective sample size. This is demonstrated by permuting the data weights (for the cases only), which results in a 50–75% reduction in the number of significant hits at FDRs < 5% (Appendix 3).
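The cost of re-weighting in terms of effective sample size can be illustrated with Kish's approximation, n_eff = (Σw)² / Σw². This formula is our assumption for illustration; the text does not state which definition of effective sample size is intended, and the function name is hypothetical.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size for a set of non-negative weights."""
    total = sum(weights)
    total_sq = sum(wi * wi for wi in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


# Equal weights lose nothing; unequal weights reduce the effective number of cases.
print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))
print(effective_sample_size([1.0, 1.0, 0.0, 0.0]))
```

Weights that track the true phenotype compensate for this loss by removing noise from the case definition; weights that are unrelated to the phenotype, such as permuted weights, incur the loss without any compensating gain.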
Figure 4.
The number of significant hits as a function of the FDR for the genome-wide association study across 9.6 million biallelic variants.
This analysis is based on the subset of the Kenyan children with whole-genome data available and passing quality checks (n = 1297 cases) and on n = 1614 population controls. Dashed line: weighted model; thick line: non-weighted model.
Examining three major genetic regions strongly associated with protection from severe malaria in East Africa (
Figure 5.
The three regions in the human genome with the greatest evidence for protection against severe malaria in East Africa (
The Manhattan plots (left panels) compare p-values from the weighted model (blue) and the non-weighted model (orange). Each Manhattan plot is centred around the known causal position shown by the vertical dashed line (0.5 Mb region). The horizontal dashed line shows (threshold often used for defining genome-wide significance). The 10 positions with the greatest –
Reappraisal of directly typed polymorphisms
We re-analysed case-control associations for 120 polymorphisms on 70 candidate malaria-protective genes which were typed directly in the 2220 Kenyan children along with 3940 population controls. In this case-control cohort, 14 polymorphisms had previously been identified as associated with protection or increased risk in severe malaria (MalariaGEN Consortium et al., 2018). A re-analysis of these 14 variants using the same models of association as previously published and down-weighting the likely mis-classified cases replicated the majority of associations, with increased effect sizes and increased –
We explored whether there was evidence of differential effects in the Kenyan cases using P[Severe malaria | Data] to assign probabilistically each case to the ‘severe malaria’ versus ‘not severe malaria’ sub-populations. We fitted a categorical logistic regression model predicting the latent sub-population label versus control, where the latent case label was estimated from the weights shown in Figure 3A. This resulted in approximately 1279 cases in the ‘severe malaria’ sub-population and 941 cases in the ‘not severe malaria’ sub-population. Differential effects were tested by comparing the estimated log-odds for the two sub-populations. After accounting for multiple testing, two polymorphisms showed significant differential effects: rs334 (derived allele encodes haemoglobin S, ) and rs1050828 (derived allele encodes
Figure 6.
Exploring differential effects in 120 directly typed polymorphisms across 70 candidate malaria-protecting genes.
(A) Case-control effect sizes estimated for the ‘severe malaria’ sub-population versus the ‘not severe malaria’ sub-population (n = 3940 controls and n = 2220 cases, with approximately 1279 in the ‘severe malaria’ sub-population and 941 in the ‘not severe malaria’ sub-population). The vertical and horizontal grey lines show the 95% credible intervals. (B) The
Discussion
The clinical diagnosis of severe falciparum malaria in African children is imprecise (Taylor et al., 2004; Bejon et al., 2007; White et al., 2013). Even with quantitation of parasite densities, specificity is still imperfect (Bejon et al., 2007). In children with cerebral malaria (unrouseable coma with malaria parasitaemia), the most specific of the severe malaria clinical syndromes, postmortem examination revealed another diagnosis in a quarter of cases studied in Blantyre, Malawi (Taylor et al., 2004). Diagnostic specificity can be improved by visualisation of the obstructed microcirculation in vivo (e.g. through indirect ophthalmoscopy) or from parasite biomass indicators (quantitation and staging of malaria parasites on thin blood films, counting of neutrophil-ingested malaria pigment, measurement of plasma concentrations of
This re-analysis using rich clinical data provides additional evidence for the three major genetic polymorphisms protective against severe malaria present in East Africa. After probabilistic down-weighting of the likely mis-classified cases, substantial increases in effect sizes were found. Dilution of effect sizes resulting from mis-classification could partially explain the large heterogeneity in effects noted in the largest severe malaria GWAS to date (Band et al., 2019). For haemoglobin S (rs334), there was a fourfold variation in estimated odds ratios across participating sites. Some of this heterogeneity can be attributed to variations in linkage disequilibrium affecting imputation accuracy (Malaria Genomic Epidemiology Network et al., 2013), but our analysis shows an additional substantial source of heterogeneity which results from diagnostic imprecision. This can be adjusted for if detailed clinical data are available. For example, in the case of rs334 (directly typed), the data-tilting approach results in a 25% increase in effect size on the log-odds scale, corresponding to 35% decrease in estimated odds ratios (0.1 versus 0.16).
As for the interpretation of genetic effects, one of the most interesting results concerns the
The limitations of our diagnostic model can be summarised as follows. First, the validity and interpretation of the individual probabilities of severe malaria is heavily dependent on the reference model and thus the reference data. Our reference data were primarily from Asian adults in whom diagnostic specificity for severe malaria is thought to be very high. Diagnostic checks suggested that the marginal distributions of platelet counts were similar between adults and children, and we made age corrections to the white blood cell count, but small deviations could reduce the discriminatory value (e.g. lower white counts associated with the Duffy negative phenotype; Reich et al., 2009). Second, it is possible that rare genetic conditions exist in which the probabilities of severe malaria under this model might be biased. One example is sickle cell disease (HbSS, <0.5% in the Kenyan cases), which results in chronic inflammation with high white counts and low platelet counts relative to the normal population (Sadarangani et al., 2009). The 11 children with HbSS in this cohort were all assigned low probabilities of severe malaria, but this should be interpreted with caution. Whether HbSS is protective against severe malaria or increases the risk of severe malaria remains unclear (Williams and Obaro, 2011). For these patients, other biomarkers such as plasma
In summary, under a probabilistic model based on routine blood count data, we have shown that it is possible to estimate mis-classification rates in diagnosed severe childhood malaria in a malaria endemic area of East Africa and compute probabilistic weights that can downweight the contribution of likely mis-classified cases. The well-established protective effect of HbAS provided an independent validation of the model. Relative to predicted mis-classified cases, patients predicted to have ‘true severe malaria’ had a substantially lower prevalence of HbAS, higher parasite densities, lower rates of positive blood cultures and lower mortality. These data strongly support the current guideline to give broad-spectrum antibiotics to all children with suspected severe malaria and suggest that normal range platelet counts (>200,000 per μL) could be used as a simple exclusion criterion in studies of severe malaria. Based on this analysis, we recommend that future studies in severe malaria collect and record complete blood count data. Further studies of platelet and white blood cell counts from a diverse cohort of children with severe falciparum malaria, confirmed using high-specificity diagnostic techniques such as visualisation of the microcirculation, and measurement of plasma
Materials and methods
Data
Kenyan case-control cohort
The Kenyan case-control cohort has been described in detail previously (MalariaGEN Consortium et al., 2018). Severe malaria cases consisted of all children aged <14 years who were admitted with clinical features of severe falciparum malaria to the high-dependency ward of Kilifi County Hospital between 11 June 1999 and 12 June 2008. Severe malaria was defined as a positive blood film for
Fluid Expansion as Supportive Therapy (FEAST)
FEAST was a multicentre randomised controlled trial comparing fluid boluses for severely ill children (n = 3161) that was not specific to severe malaria (Maitland et al., 2011). Platelet counts, white blood cell counts, parasite densities and
AQ Vietnam and AAV randomised controlled trials
The AQ and the AAV studies were two randomised clinical trials in Vietnamese adults diagnosed clinically with severe falciparum malaria recruited to a specialist ward of the Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam, between 1991 and 2003 (Hien et al., 1996; Phu et al., 2010). AQ Vietnam was a double-blind comparison of intramuscular artemether versus intramuscular quinine (n = 560); AAV compared intramuscular artesunate and intramuscular artemether (n = 370).
Observational studies in Thai and Bangladeshi adults and children
We included data from multiple observational studies in severe falciparum malaria conducted by the Mahidol Oxford Tropical Medicine Research Unit in Thailand and Bangladesh between 1980 and 2019. These pooled data have been described previously (Leopold et al., 2019). Platelet counts and white blood cell counts were available in 657 patients. We excluded one 30-year-old adult from Bangladesh whose recorded platelet count was 1000 per μL and three other adults with platelet counts greater than 450,000 per μL as outliers reflecting likely data entry errors. Plasma
Multiple imputation
In the Kenyan severe malaria cohort (n = 2220), data on platelet counts were missing in 18%, white blood counts were missing in 0.2% and parasite density was missing in 1.6%. In-hospital outcome (died/survived) was missing for 13 patients. rs334 genotype was missing for 7 patients and α-thalassaemia genotype was missing for 101 patients. In the Vietnamese adults, platelet counts were missing in 4%, white counts in 2% and parasitaemia in 0%.
We did multiple imputation using random forests for all available clinical variables using the R package
Reference model of severe malaria
Biological rationale
Thrombocytopenia accompanied by a normal white blood count and a normal neutrophil count are typical features of severe malaria (Hanson et al., 2015; Leblanc et al., 2020), but they may also occur in some systemic viral infections and in severe sepsis. Neutrophil leukocytosis may sometimes occur in very severe malaria, but is more characteristic of pyogenic bacterial infections. These indices, whilst individually not very specific, could each have useful discriminatory value. We reasoned therefore that their joint distribution could help discriminate between children with severe malaria versus those severely ill with coincidental parasitaemia. The Kenyan severe malaria cohort did not have differential white count data, so we used platelet counts and total white blood cell counts as the two diagnostic biomarkers in the reference model of severe malaria.
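A reference model built on these two biomarkers can be sketched as a bivariate distribution fitted to the log-transformed counts, which is then used to evaluate P(Data | Severe malaria) for a new patient. The sketch below assumes a bivariate normal on the log10 scale, in line with the normal probability contours used to approximate the data in Figure 1; whether this matches the exact distribution used for the reference model is an assumption. The function names are ours, and the simulated counts are hypothetical values centred on the reference medians reported in the Results, with made-up spreads.

```python
import math
import random
import statistics


def fit_reference_model(platelets, white_counts):
    """Fit a bivariate normal to log10 platelet and log10 white cell counts."""
    xs = [math.log10(v) for v in platelets]
    ys = [math.log10(v) for v in white_counts]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)
    return {"mu": (mx, my), "sd": (sx, sy), "rho": cov / (sx * sy)}


def log_density(model, platelets, white_count):
    """Log density of one patient's counts (per microlitre) under the fitted model."""
    (mx, my), (sx, sy), rho = model["mu"], model["sd"], model["rho"]
    zx = (math.log10(platelets) - mx) / sx
    zy = (math.log10(white_count) - my) / sy
    one_minus_r2 = 1.0 - rho * rho
    log_norm = math.log(2.0 * math.pi * sx * sy * math.sqrt(one_minus_r2))
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * one_minus_r2)
    return -log_norm - quad


# Hypothetical reference data: simulated, not the study data.
random.seed(1)
sim_platelets = [10 ** random.gauss(math.log10(57_000), 0.35) for _ in range(500)]
sim_white = [10 ** random.gauss(math.log10(8_400), 0.20) for _ in range(500)]
model = fit_reference_model(sim_platelets, sim_white)

# A patient at the reference medians versus one at the Kenyan cohort medians.
print(log_density(model, 57_000, 8_400))
print(log_density(model, 120_000, 13_000))
```

Working on the log scale keeps both biomarkers approximately symmetric, so that a low platelet count with a normal white count sits near the centre of the reference distribution and receives a high likelihood.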
Choice of reference data and confounders
The best data for fitting the biomarker model are either from children or adults from low transmission areas (where parasitaemia has a high positive predictive value) or in children or adults with high plasma
In the first years of life, white blood cell counts are often much higher than in adults because of lymphocytosis. We used data from 858 children from the FEAST trial, in whom white counts were measured, to estimate the relationship between age and mean white count in severe illness (median age was 24 months). The estimated relationship is shown in Appendix 8 (using a generalised additive linear model with the white count on the
There is also a systematic difference associated with the Duffy negative phenotype which is near fixation in Africa but absent in Asia. Duffy negative individuals have lower neutrophil counts (termed benign ethnic neutropenia) (Reich et al., 2009). The use of Asian adults to estimate the reference distribution of white counts in severe malaria could thus falsely include individuals with elevated white counts (relative to the normal ranges). However, a diagnostic quantile-quantile plot (Appendix 1, on the log scale) comparing the white blood cell count distribution in Vietnamese adults and in children in the FEAST trial who had
For platelet counts (which have the greatest diagnostic value for severe malaria in our series), age is not a confounder and published data support the hypothesis that thrombocytopenia is highly specific for ‘true’ severe malaria in children as well as adults suspected of having severe malaria (with a diagnostic and a prognostic value). The French national guidelines specifically mention thrombocytopenia (<150,000 per μL) for the diagnosis of severe malaria in children who have travelled to a malaria endemic area. In a French paediatric severe malaria series in travellers, almost half had severe thrombocytopenia (<50,000 per μL) (Lanneaux et al., 2016; Pediatric Imported Malaria Study Group for the ‘Centre National de Référence du Paludisme’ et al., 2017). In Dakar, Senegal (one of the lowest transmission areas in Africa), thrombocytopenia was an independent predictor of death and the median platelet count was 100,000 per μL (Gérardin et al., 2007; Gérardin et al., 2002). Comparison of the distributions of platelet counts (on the log scale) between Asian children and Asian adults suggested no major differences (Appendix 1), although we had few data for Asian children. In the seminal Blantyre autopsy study (Taylor et al., 2004), platelet counts were substantially different between fatal cases confirmed postmortem to be severe malaria (62,000 per μL and 56,000 per μL for the children with sequestration only and sequestration + microvascular pathology, respectively) and fatal cases with a mis-diagnosis of severe malaria (no sequestration: 176,000 per μL; the inter-group difference was statistically significant). A larger cohort from the same centre in Malawi reported substantially higher platelet counts in retinopathy-negative cerebral malaria (mean platelet count was 161,000 per μL, n = 288) compared to retinopathy-positive cerebral malaria (mean count was 81,000 per μL, n = 438) (Small et al., 2017).
We visually checked approximate normality for each marginal distribution using quantile-quantile plots (Appendix 9). On the
Limitations of the model
The diagnostic model of severe malaria using platelet counts and white blood cell counts cannot be applied to all patients. We summarise here the known and possible limitations. The first concerns selection bias: when using this model to estimate the association between a genetic polymorphism and the risk of severe malaria, if the genetic polymorphism of interest affects the complete blood count independently, there will be selection bias (see the directed acyclic graph in Appendix 10). One example is HbSS. Children with HbSS have chronic inflammation with white blood cell counts about 2–3 times higher than normal and slightly lower platelet counts (Sadarangani et al., 2009). All 11 children in the Kenyan cohort with HbSS were assigned low probabilities of having severe malaria (Appendix 10), but these probabilities could reflect a deficiency of the model. Including or excluding these children from the analysis had no impact on the results as they represent less than 0.5% of the cases.
The second possible limitation concerns the validation using HbAS. Previous studies have suggested negative epistasis between the malaria-protective effects of HbAS and α-thalassaemia (Williams et al., 2005; Opi et al., 2014). The 3.7 kb deletion across the
The third possible limitation concerns the use of white blood cell counts in relation to invasive bacterial infections. Bacteraemia could either be the cause of severe illness (with coincidental parasitaemia) or it could be concomitant, that is, a result of severe malaria (possibly arising from extensive sequestration of parasitised erythrocytes in the gut). The former should be identified as ‘not severe malaria’ (as bacteraemia is the main cause of illness), but the latter should be identified as ‘severe malaria’ and might be mis-classified as ‘not severe malaria’ under our model. However, in a series of 845 Vietnamese adults (high diagnostic specificity), only one of eight patients who had concomitant invasive bacterial infections and a white count measured had leukocytosis (median white count was 8100; range 3500–14,850 per μL; Phu et al., 2020).
Estimating the diagnostic specificity in the Kenyan cohort
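We assume that the Kenyan cases are a latent mixture of two sub-populations, ‘severe malaria’ and ‘not severe malaria’, with mixing proportion π: P(Data) = π P(Data | Severe malaria) + (1 − π) P(Data | Not severe malaria).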
We can infer the value of π (proportion correctly classified as severe malaria) without making parametric assumptions about
Out of the 2213 Kenyan cases with rs334 genotyped, we assume that cases in the top 40th percentile of the likelihood distribution under are drawn from
For the other cases, the proportion drawn from
Finally, additional information is incorporated by using data from a cohort of individuals with severe disease from the same hospital who had positive malaria blood slides but whose diagnosis was not severe malaria , of which were HbAS) (Uyoga et al., 2019).
Under these assumptions, we can fit a Bayesian binomial mixture model to these data with three parameters: . The likelihood is given by
The priors used were (i.e. 5% prior probability with 100 pseudo observations); (1% prior probability with 100 pseudo observations). A sensitivity analysis with flat beta priors (Beta[1,1]) did not qualitatively change the result (by one percentage point for the final estimate of π). To check the validity of the use of the external population from Uyoga et al., 2019, we did a sensitivity analysis using the lowest quintile of the likelihood ratio distribution as a population drawn entirely from
Estimating P(Severe malaria | Data) in the Kenyan cohort
Denote the platelet and white count data from the FEAST trial as ; the data from the Vietnamese adults and children as ; the data from the Kenyan children as . We fit the following joint model to the reference biomarker data and the Kenyan biomarker data.with the following prior distributions and hyperparameters, where such that :
The covariance matrices and were parameterised as their Cholesky LKJ decomposition, where the L correlation matrices had a uniform prior (i.e. hyperparameter ν = 1). The model was implemented in
This models the biomarker data in ‘not severe malaria’ as a mixture of
Re-weighted likelihood for case-control analyses
For each , we estimate the posterior probability of being drawn from the sampling distribution
Genome-wide association study
Anonymised whole-genome data from the Illumina Omni 2.5M platform for 1944 severe malaria cases and 1738 population controls were downloaded from the European Genome-Phenome Archive (dataset accession ID: EGAD00010001742, release date March 2019; Band et al., 2019). This contained sequencing data on 2,383,648 variants. We used the quality control metadata provided with the 2019 data release to select SNPs and individuals with high-quality data. We first excluded 386 individuals (due to relatedness: 155; missing data or low intensity: 226; gender: 5). We then removed 616,426 SNPs that did not pass quality control, leaving a total of 1,767,222 SNPs. We used
Case-control study in directly typed polymorphisms
We fit a categorical (multinomial) logistic regression model to the case-control status as a function of the directly typed polymorphisms (120 after discarding those that are monomorphic in this population; see MalariaGEN Consortium et al., 2018 for additional details). We modelled the severe malaria cases as two separate sub-populations with a latent variable: ‘severe malaria’ versus ‘not severe malaria’, resulting in three possible labels (controls, ‘severe malaria’, ‘not severe malaria’). The models adjusted for self-reported ethnicity and sex. The model was coded in
Code availability
Code, along with a minimal clinical dataset for reproducibility of the diagnostic phenotyping model, is available via a GitHub repository: https://github.com/jwatowatson/Kenyan_phenotypic_accuracy (Watson, 2021; copy archived at swh:1:rev:03a2de285d38b85a769aa25de46b7960487efc62).
Data availability
A curated minimal clinical dataset is currently available alongside the code on the GitHub repository. This will also be made available at publication via the KEMRI-Wellcome Harvard Dataverse (https://dataverse.harvard.edu/dataverse/kwtrp).
This paper used genome-wide genotyping data generated by Band et al., 2019, available on request from the European Genome-Phenome Archive (dataset accession ID: EGAD00010001742).
Requests for access to appropriately anonymised clinical data and directly typed genetic variants (Malaria Genomic Epidemiology Network, 2014) for the Kenyan severe malaria cohort can be made by application to the data access committee at the KEMRI-Wellcome Trust Research Programme by email to [email protected].
The FEAST trial datasets are available from the principal investigator on reasonable request ([email protected]). Requests for access to appropriately anonymised clinical data from the AQ and AAV Vietnam study and the Asian paediatric cohort can be made via the Mahidol Oxford Tropical Medicine Research Unit data access committee by emailing the corresponding author JAW ([email protected]) or Rita Chanviriyavuth ([email protected]).
2 Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
3 KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast, Kilifi, Kenya
4 The Wellcome Sanger Institute, Cambridge, United Kingdom
5 Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
6 Medical Research Council Clinical Trials Unit, University College London, London, United Kingdom
7 Institute of Global Health Innovation, Imperial College London, London, United Kingdom
8 Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
9 Department of Statistics, University of Oxford, Oxford, United Kingdom
The University of Melbourne, Australia
University of Geneva, Switzerland
© 2021, Watson et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Severe falciparum malaria has substantially affected human evolution. Genetic association studies of patients with clinically defined severe malaria and matched population controls have helped characterise human genetic susceptibility to severe malaria, but phenotypic imprecision compromises discovered associations. In areas of high malaria transmission, the diagnosis of severe malaria in young children and, in particular, the distinction from bacterial sepsis are imprecise. We developed a probabilistic diagnostic model of severe malaria using platelet and white count data. Under this model, we re-analysed clinical and genetic data from 2220 Kenyan children with clinically defined severe malaria and 3940 population controls, adjusting for phenotype mis-labelling. Our model, validated by the distribution of sickle trait, estimated that approximately one-third of cases did not have severe malaria. We propose a data-tilting approach for case-control studies with phenotype mis-labelling and show that this reduces false discovery rates and improves statistical power in genome-wide association studies.