A Genome-Wide Association Study (GWAS) for Bronchopulmonary Dysplasia
OBJECTIVE: Twin studies suggest that heritability of moderate-severe bronchopulmonary dysplasia (BPD) is 53% to 79%. We conducted a genome-wide association study (GWAS) to identify genetic variants associated with the risk for BPD.
METHODS: The discovery GWAS was completed on 1726 very low birth weight infants (gestational age = 25 0/7 to 29 6/7 weeks) who had a minimum of 3 days of intermittent positive pressure ventilation and were in the hospital at 36 weeks’ postmenstrual age. At 36 weeks’ postmenstrual age, moderate-severe BPD cases (n = 899) were defined as requiring continuous supplemental oxygen, whereas controls (n = 827) inhaled room air. An additional 795 comparable infants (371 cases, 424 controls) were a replication population. Genomic DNA from case and control newborn screening bloodspots was used for the GWAS. The replication study interrogated single-nucleotide polymorphisms (SNPs) identified in the discovery GWAS and those within the HumanExome beadchip.
RESULTS: Genotyping using genomic DNA was successful. We did not identify SNPs associated with BPD at the genome-wide significance level (5 × 10⁻⁸), and no SNP identified in previous studies reached statistical significance (Bonferroni-corrected P value threshold .0018). Pathway analyses were not informative.
CONCLUSIONS: We did not identify genomic loci or pathways that account for the previously described heritability for BPD. Potential explanations include causal variants that were not assayed or that map to many distributed loci, inadequate sample size, the race/ethnicity of our study population, or case-control differences that are not attributable to underlying common genetic variation.
- genome-wide association study (GWAS)
- chronic lung disease
- genetic predisposition to disease
- very low birth weight infant
- BPD: bronchopulmonary dysplasia
- BW: birth weight
- CPQCC: California Perinatal Quality Care Collaborative
- GA: gestational age
- gDNA: genomic DNA
- GWAS: genome-wide association study
- IPPV: intermittent positive pressure ventilation
- NBS: newborn screening bloodspots
- OR: odds ratio
- PC: principal component
- PMA: postmenstrual age
- PNA: postnatal age
- SNP: single-nucleotide polymorphism
- VLBW: very low birth weight
What’s Known on This Subject:
Twin studies suggest that bronchopulmonary dysplasia (BPD) is heritable; however, only a small number of genetic loci have been associated with BPD and these explain only a limited amount of this heritability.
What This Study Adds:
A genome-wide association study of singleton infants (899 BPD cases and 827 controls) of 25 to 30 weeks’ gestational age did not identify single-nucleotide polymorphisms associated with BPD at the genome-wide significance level but did identify polymorphisms warranting further study.
Bronchopulmonary dysplasia (BPD), a disorder characterized by impairment of alveolarization, remains a leading cause of morbidity and mortality in premature infants.1 The risk of developing BPD rises with decreasing gestational age (GA) or birth weight (BW).2,3 Infants with BPD require prolonged stays in NICUs, and after discharge many infants require supplemental oxygen therapy and experience frequent hospitalizations.
Although mechanisms responsible for BPD have been investigated,2 and there is evidence for mediators and pathways,3 there has been little progress in decreasing the incidence of BPD in very low birth weight (VLBW) infants (BW <1500 g). Because 2 separate twin studies4,5 indicated that genetic factors were a major risk for developing BPD, investigators have used different strategies to identify heritable factors, including identification of frequency differences in single-nucleotide polymorphisms (SNPs) in candidate genes. Although there have been reports of promising genes, such as SFTPB, these efforts have been largely unsuccessful (reviewed in refs 6 and 7).
To investigate potential genetic etiologies of BPD in VLBW infants, we conducted a population-based genome-wide association study (GWAS) among cases and controls in California to identify disease-susceptible genes and generate biological hypotheses.
Patient Population and Phenotype Definitions
Records of infants for this case-control study of singletons were identified from the California Perinatal Quality Care Collaborative (CPQCC; http://www.cpqcc.org/)8 database during calendar years 2005 to 2008. The CPQCC prospectively collects clinical data from 128 member hospitals representing more than 90% of all NICU admissions in California. CPQCC conducts yearly data abstractor training at California locations. Each record has range and logic checks during data collection and before data closeout. Records with excessive missing data are audited.
Inclusion criteria were GA 25 0/7 to 29 6/7 weeks, BW <1500 g, and a minimum of 3 days of intermittent positive pressure ventilation (IPPV) during the infant’s hospitalization up to 36 weeks’ postmenstrual age (PMA). The database did not differentiate between nasal ventilation and tracheal ventilation. The minimum of 3 days of IPPV was included as 1 inclusion criterion so that this “environmental factor” would be consistent for both cases and controls. We used standard National Institute of Child Health and Human Development/National Institutes of Health criteria for diagnosis of mild, moderate, and severe BPD.1,9 Because Lavoie et al5 demonstrated that heritability was associated with moderate-severe, but not mild, BPD, we defined cases as infants requiring supplemental oxygen at 36 weeks’ PMA and controls as infants who were breathing room air at 36 weeks’ PMA. The need for supplemental oxygen was determined by the practices of the individual NICU, and physiologic assessments10 were not routinely carried out. Infants were excluded if they did not meet inclusion criteria, were 1 of a multiple birth, had major congenital abnormalities, died or left the hospital before 36 weeks’ PMA, or had unknown supplemental oxygen status at 36 weeks’ PMA. We also excluded infants who had major surgery, as interventions, including assisted ventilation, may have introduced bias. Patent ductus arteriosus ligation was not an exclusion criterion. Control infants met all inclusion criteria and were in the hospital but did not require supplemental oxygen at 36 weeks’ PMA (Supplemental Methods).
Research Ethics Boards
The Institutional Review Board of Stanford University and the Health and Welfare Agency Committee for the Protection of Human Subjects of the State of California approved this study. California newborn screening bloodspots (NBS) may be used for anonymous research studies unless parents refuse.
The California Department of Public Health linked study subjects’ CPQCC information to their NBS. Genomic DNA (gDNA) was extracted from bloodspots by using the protocol described by St Julien et al11 and genotyped (Illumina HumanOmni2.5 beadchip, San Diego, CA). Nonamplified gDNA was used and the genotype calls were made by using GenomeStudio software (Illumina, 2011) after quality control procedures (Supplemental Methods).
To identify potential factors for risk adjustment in analyses between SNPs and BPD, we first examined univariate relationships between BPD and variables reported to be associated with its risk.12 Variables with P < .20 were entered into a logistic model and those with P ≤ .05 were retained in models using backward selection. Based on these results, potential associations between measured SNPs and BPD were evaluated with an additive genetic model using logistic regression, adjusted for genetic ancestry, gender, and BW. Bivariate analyses of a number of factors, as well as findings from the literature, indicated that both gender and BW were significantly associated with BPD. To estimate genetic ancestry, we calculated principal components (PCs) from a subset of 114 764 SNPs (see Supplemental Information)13 and adjusted for PCs that were significantly associated with BPD. We also performed self-reported race/ethnicity stratified analyses within African American, Hispanic, and Caucasian individuals. In addition to exploring genetic risks under an additive effect model, we explored risks under a dominant model. We also explored potential confounding on BPD risks associated with clinical practice variation across NICUs by using a propensity score model and found no major confounding effect. To refine association signals, an additional 9 217 535 SNPs imputed from the 1000 Genomes Project were analyzed as additive dosages.13 In addition to single SNP analyses, gene sets and pathways (Kyoto Encyclopedia of Genes and Genomes, Reactome, Biocarta, and Gene Ontology, Supplemental Methods) were assessed for association with BPD.
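As a rough illustration of the additive and dominant codings described above, the following Python sketch fits a single-SNP logistic model by Newton-Raphson. It is a minimal sketch only: the function names (`dominant`, `fit_logistic`) and the toy counts are hypothetical, and it omits the adjustments for genetic ancestry, gender, and BW that the study's models included.

```python
import math

def dominant(minor_allele_count):
    """Dominant coding: 1 if at least one minor allele is carried, else 0."""
    return 1 if minor_allele_count > 0 else 0

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p           # gradient of the log-likelihood (intercept)
            g1 += (yi - p) * xi    # gradient of the log-likelihood (slope)
            h00 += w               # entries of the 2 x 2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, with the 2 x 2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data (hypothetical, NOT from the study): minor-allele counts and case status.
# Carriers: 30 cases, 10 controls. Non-carriers: 20 cases, 40 controls.
genotypes = [1] * 20 + [2] * 10 + [1] * 8 + [2] * 2 + [0] * 20 + [0] * 40
status = [1] * 30 + [0] * 10 + [1] * 20 + [0] * 40

# Additive model: the predictor is the minor-allele count itself (0, 1, or 2).
_, b_additive = fit_logistic(genotypes, status)
or_additive = math.exp(b_additive)   # odds ratio per additional minor allele

# Dominant model: the predictor is carrier status (0 or 1).
_, b_dominant = fit_logistic([dominant(g) for g in genotypes], status)
or_dominant = math.exp(b_dominant)   # odds ratio for carriers vs non-carriers
```

With a single binary predictor and no covariates, the fitted dominant-model OR coincides with the cross-product ratio of the 2 × 2 table, here (30 × 40)/(10 × 20) = 6, which gives a simple check on the fit.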
After analyzing our discovery GWAS data, we selected 5673 SNPs for replication in an independent replication data set of 371 BPD cases and 424 controls based on their P values and odds ratios (ORs; Supplemental Methods). Infants were selected from CPQCC calendar years 2009 to 2010 by using the same criteria as in the discovery GWAS. All DNA replication samples were genotyped by using the Illumina HumanExome beadchip with custom SNP content added for the 5673 SNPs. Nonamplified gDNA was used and genotype calls were made by using GenomeStudio software (Illumina, 2011). Replication study data were analyzed both individually and in combination with the discovery set, the latter being termed joint analysis.
From 2005 to 2008 there were more than 2 million births in California and 61 695 infants were admitted to the NICUs that report to the CPQCC. After applying inclusion and exclusion criteria, we identified 1063 controls and 1091 cases with moderate-severe BPD (total = 2154). NBS were matched to 851 controls and 922 infants with moderate-severe BPD (total = 1773). See Supplemental Fig 3 for number of infants at each step. The 2154 enrolled infants demonstrated the expected increased risk of moderate-severe BPD as GA and BW decreased (Table 1). Because we included only VLBW infants who had a minimum of 3 days IPPV, the overall proportion of moderate-severe BPD is higher than what would be expected if all VLBW neonates had been included. From 2005 to 2010, the incidence of BPD in all premature infants who weighed 501 to 1500 g at birth and were entered in the CPQCC database was 32%.
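The cohort counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported above; the variable and function names are illustrative.

```python
# Counts reported in the text for the discovery period (2005 to 2008).
eligible = {"controls": 1063, "cases": 1091}   # after inclusion/exclusion criteria
matched = {"controls": 851, "cases": 922}      # infants with a matched NBS

def total(group):
    """Total number of infants in a group."""
    return group["controls"] + group["cases"]

def case_fraction(group):
    """Proportion of infants in a group with moderate-severe BPD."""
    return group["cases"] / total(group)

# Fraction of eligible infants whose bloodspot could be matched.
match_rate = total(matched) / total(eligible)

print(total(eligible), total(matched))
print(f"NBS match rate: {match_rate:.1%}")
print(f"Case fraction among eligible infants: {case_fraction(eligible):.1%}")
```

The totals reproduce the 2154 and 1773 given in the text, and the case fraction of roughly one half reflects the selection of infants with at least 3 days of IPPV rather than the 32% incidence among all CPQCC infants weighing 501 to 1500 g.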
Self-reported race/ethnicity followed a frequency pattern similar to all of California’s births (Table 1, Supplemental Fig 4). Factors known to increase or decrease BPD risk12 were observed (Table 1, Supplemental Tables 3 and 4). Univariate and bivariate analyses of a number of factors, as well as findings from the literature, indicated that both gender and BW were significantly associated with BPD.
Genomic DNA extracted from 1726 bloodspots (827 controls and 899 cases) was used in the GWAS discovery study and 1 795 103 SNPs with minor allele frequency ≥0.01 were successfully genotyped. The first 3 components of PC analysis identified 4 race/ethnic groups, representing African American, Hispanic, Caucasian, and Asian (Fig 1A). There were differences between genotyped race/ethnic background and self-identified race/ethnicity background (Fig 1B). The genomic inflation factor (λGC = 1.0051)14 and the quantile-quantile plot revealed no evidence of inflation of test statistics due to population stratification (Supplemental Fig 5).
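For readers unfamiliar with the genomic inflation factor, it is commonly computed as the median of the observed 1-degree-of-freedom χ² association statistics divided by the median expected under the null (about 0.455). The sketch below shows that calculation starting from P values, assuming each P value comes from a 1-degree-of-freedom test; the function names are hypothetical and this is not the study's own pipeline.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the square of the 75th percentile of the standard normal.
EXPECTED_MEDIAN_CHI2 = _STD_NORMAL.inv_cdf(0.75) ** 2

def p_to_chi2(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)
    return z * z

def genomic_inflation(p_values):
    """Lambda_GC: observed median chi-square over the expected null median."""
    observed = median([p_to_chi2(p) for p in p_values])
    return observed / EXPECTED_MEDIAN_CHI2

# Perfectly uniform P values (the null expectation) give lambda close to 1.
uniform_p = [(i + 0.5) / 1001 for i in range(1001)]
lambda_null = genomic_inflation(uniform_p)
```

Values well above 1 would suggest inflation of test statistics, for example from uncorrected population stratification; the value of 1.0051 reported here is essentially the null expectation.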
Assuming a genome-wide significance threshold15 of 5 × 10⁻⁸, no SNPs were significantly associated with moderate-severe BPD (Fig 2). The SNP with the smallest P value was rs8089528 (P = 8.64 × 10⁻⁷), an intergenic SNP on chromosome 18. Additional SNPs with small P values included rs118078182 (P = 1.30 × 10⁻⁶), an intronic SNP in collagen, type XXIII, alpha 1 (COL23A1), and rs12571250 (P = 1.24 × 10⁻⁶), an intronic SNP in the bicaudal C homolog 1 (BICC1) gene.
We further evaluated our GWAS discovery cohort and identified 259 “super-control” infants who were breathing room air at both 28 days’ postnatal age (PNA) and 36 weeks’ PMA and 568 infants with mild BPD1,9 who were receiving supplemental oxygen at 28 days’ PNA but breathing room air at 36 weeks’ PMA. Assuming a genome-wide significance threshold15 of 5 × 10⁻⁸, we did not find any SNPs that were significantly associated with moderate-severe BPD relative to either mild BPD or the super-control group.
The replication population-based study cohort of 371 moderate-severe BPD cases and 424 controls had demographic and clinical characteristics similar to those of the discovery GWAS cohort. No SNP from the discovery stage replicated, and joint analysis of discovery and replication cohorts did not yield any genome-wide significant SNPs (Supplemental Fig 6). Nevertheless, our joint analysis identified 6 SNPs with P ≤ 10⁻⁵, a criterion used by the National Human Genome Research Institute (http://www.genome.gov/26525384). The strongest signals from the joint analysis were rs556493 (P = 1.42 × 10⁻⁶, OR = 0.74), an intronic SNP in syntaxin binding protein 5 (STXBP5), and rs12356475 (P = 1.54 × 10⁻⁶, OR = 0.70), an intronic SNP in catenin (cadherin-associated protein), α 3 (CTNNA3) (Table 2). Nine SNPs with P ≤ 10⁻⁵ that were detected in the discovery stage did not replicate (Table 2). SNPs with P ≤ 10⁻⁵ and in linkage disequilibrium with the 3 index SNPs in Table 2 are tabulated in Supplemental Table 5.
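The two P value criteria used in this section (genome-wide significance at 5 × 10⁻⁸ and the P ≤ 10⁻⁵ reporting criterion) can be expressed as a simple classification. The sketch below applies them to the joint-analysis P values quoted above; the labels and the function name `classify` are illustrative choices, not terms from the study.

```python
GENOME_WIDE = 5e-8   # genome-wide significance threshold used in the text
SUGGESTIVE = 1e-5    # reporting criterion (P <= 1e-5) cited in the text

def classify(p_value):
    """Label a P value against the two thresholds used in this section."""
    if p_value < GENOME_WIDE:
        return "genome-wide significant"
    if p_value <= SUGGESTIVE:
        return "suggestive"
    return "below reporting criterion"

# Strongest joint-analysis signals reported above.
joint_signals = {"rs556493": 1.42e-6, "rs12356475": 1.54e-6}

labels = {snp: classify(p) for snp, p in joint_signals.items()}
for snp, label in labels.items():
    print(snp, label)
```

Both SNPs meet the P ≤ 10⁻⁵ criterion but fall short of genome-wide significance by more than an order of magnitude.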
SNPs Identified in Previous Studies
We examined 27 SNPs previously associated with BPD (Supplemental Table 6). None reached replication-wide significance (P < .05/27 = 0.0018). Supplemental Table 7 shows SNPs with P values <0.1 in our discovery GWAS analysis or race/ethnicity subgroup analysis (adjusted for PCs); these include SPOCK2 (rs1245560), vascular endothelial growth factor A (VEGFA) (rs699947 and rs833061), superoxide dismutase 2, mitochondrial (SOD2) (rs5746136), toll-interleukin 1 receptor (TIR) domain containing adaptor protein (TIRAP) (rs8177374), and mannose-binding lectin (protein C) 2, soluble (MBL2) (rs5030737). In our set-based analysis, surfactant protein D (SFTPD), interleukin 18 (interferon-gamma-inducing factor) (IL18), superoxide dismutase 3, extracellular (SOD3), matrix metallopeptidase 16 (membrane-inserted) (MMP16), and selectin L (SELL) had P < .1 (Supplemental Table 6).
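The replication-wide threshold quoted above is a Bonferroni correction: the overall α of .05 divided by the number of SNPs tested. A minimal sketch of that arithmetic follows; the function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def passes_bonferroni(p_value, alpha, n_tests):
    """True if a P value is below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)

# 27 previously reported SNPs tested at an overall alpha of .05.
threshold = bonferroni_threshold(0.05, 27)
print(f"{threshold:.5f}")
```

The result is approximately 0.00185, which the text reports truncated as 0.0018.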
Genetic Models and Covariables
GWAS analyses included the examination of differing genetic models and covariates as possible influences on risk of moderate-severe BPD. Results associated with a dominant genetic model largely agreed with those from the additive model, as expected (the additive model is still highly powered for a dominant genetic model); P values associated with 12 SNPs that were larger than 10⁻⁵ when analyzed under an additive model became less than 10⁻⁵ when analyzed under a dominant model in joint analyses (Supplemental Table 8). We also tested models without adjustments for gender and BW. Although there was no dramatic change in the results, P values of 5 SNPs were substantially smaller (up to 1.4 orders of magnitude), suggesting confounding of these SNPs with gender and BW (Supplemental Table 9). Further examination showed that confounding was associated with BW. The higher-order term of BW showed no association with moderate-severe BPD in the discovery data set but a significant association in the replication data set. However, adding the higher-order term of BW had a negligible effect on discovery results. Analyses adjusting for NICUs using a propensity score did not reveal substantively different results despite prevalence differences of moderate-severe BPD across NICUs. Year of birth did not affect findings.
Stratified analyses by self-reported race/ethnicity (and adjusted for PCs) identified a total of 21 SNPs with P ≤ 10⁻⁵ in both GWAS discovery samples and combined samples (Supplemental Table 10). The strongest statistical signal was identified in Caucasians in discovery samples: rs6988306 (OR = 0.23, P = 2.81 × 10⁻⁷), an intronic SNP in GRHL2. This SNP did not show statistical evidence of replication (replication OR = 0.81, P = .52; joint analysis OR = 0.36, P = 2.8 × 10⁻⁶). The SNPs identified in each race/ethnic subpopulation did not overlap.
Imputed SNPs and Pathway Analyses
Results from imputed SNPs in the discovery phase supported associations of genotyped SNPs and 5 imputed SNPs showed a stronger signal than their linked genotyped SNPs in joint analysis (Table 2).
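The imputed SNPs were analyzed as additive dosages. As a minimal sketch of what a dosage is, the expected count of the effect allele can be computed from the imputed genotype probabilities and then used in place of the 0/1/2 genotype in an additive model. The function name and the probability ordering (homozygous reference, heterozygous, homozygous alternate) are assumptions made for illustration.

```python
def allele_dosage(p_hom_ref, p_het, p_hom_alt, tolerance=1e-6):
    """Expected number of alternate alleles (0 to 2) from genotype probabilities."""
    total = p_hom_ref + p_het + p_hom_alt
    if abs(total - 1.0) > tolerance:
        raise ValueError("genotype probabilities must sum to 1")
    # Heterozygotes carry 1 alternate allele, alternate homozygotes carry 2.
    return p_het + 2.0 * p_hom_alt

# A confidently imputed heterozygote has a dosage close to 1.
confident_het = allele_dosage(0.01, 0.98, 0.01)

# An uncertain call lands between the hard genotype values.
uncertain = allele_dosage(0.1, 0.3, 0.6)
```

Using the dosage rather than the most likely genotype carries the imputation uncertainty into the regression instead of discarding it.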
Our set-based analyses investigated how groups of SNPs affected risk of moderate-severe BPD. No gene or pathway was statistically associated with moderate-severe BPD (Bonferroni correction for P values: 5 × 10⁻⁶ for gene set analysis and 2 × 10⁻⁵ for pathway analysis).
Exome SNP Analyses
Associations with ∼41 000 SNPs with minor allele frequency >0.01 present on the exome array were evaluated in replication samples. One SNP, a missense mutation in muscular Lamin A/C–interacting protein, had P < 10⁻⁵ (Supplemental Fig 7, Supplemental Table 11). No exome array SNPs reached statistical significance using set-based (gene) analyses.
To systematically investigate genetic influences on VLBW infants developing moderate-severe BPD, we used a large population-based approach to evaluate millions of SNPs by using both a GWAS and an exome-based genotyping platform. We neither identified SNPs associated with moderate-severe BPD at the genome-wide significance level (5 × 10⁻⁸) nor replicated previous findings. This indicates that heritability of BPD is not attributable to 1, or only a few, of the large number of ancestrally conserved genetic variants assayed herein. We did identify 15 SNPs with P ≤ 10⁻⁵ in either our discovery stage or joint analyses, which will help guide future studies trying to characterize potential genetic risks for moderate-severe BPD.4,5 Our novel demonstration that nonamplified gDNA from NBS can be used for a GWAS has significant implications for the millions of stored NBS.
Our most promising SNP was rs118078182 in COL23A1, a transmembrane collagen expressed in the developing lung’s mesenchyme.16 COL23A1 has a cleavable canonical collagen domain localized to the outside of the cell. COL23A1 may anchor mesenchyme cells to the basement membrane through the collagen domain.
Although COL23A1 and the other SNPs did not replicate in our replication study, our replication sample size was limited and real signals may have gone undetected. Others suggest that a P value criterion of 5 × 10⁻⁸ for genome-wide significance is too stringent.15 Therefore, testing the types of SNPs described here for functional relevance has merit.
A GWAS of 43 French patients with BPD using pooled DNA from cases and controls identified sparc/osteonectin, cwcv, and kazal-like domains proteoglycan (testican) 2 (SPOCK2) as an at-risk gene.17 Hadchouel et al replicated their findings in Finnish infants.17 The association for SPOCK2 SNP rs1245560 was seen for both Caucasians (P = .02, OR = 1.85) and Africans (P = .007, OR = 2.43), whereas the association of rs1049269 was observed only in Caucasians (P = .025, OR = 1.79).17 The association was observed with moderate-severe, but not mild, BPD, which is consistent with previous observations.5 Our GWAS did not replicate Hadchouel et al's findings in the overall case-control analyses (OR = 1.0, P = .97 for rs1245560; and OR = 1.0, P = .95 for rs1049269). Analyses stratified by self-reported race/ethnicity (adjusted for PCs) showed a weak association for rs1245560 (P = .08, OR = 1.32) only in Caucasians, thus lending some support for an association of SPOCK2 with BPD. The P values of these 2 SNPs in replication samples and the overall P value of the SPOCK2 gene showed no association with BPD. Because the ancestral proportions for racially admixed populations have been shown to influence lung function in adults18 and children,19 it is possible we did not observe an association with SPOCK2 because of the relatively small number of Caucasian and African American infants.
Two SNPs (rs3771159 and rs3771171) in interleukin-18 have been associated with BPD.20 We did not replicate these associations. However, SNP rs61731845, a missense mutation in paralemmin 3 (PALM3), which binds to immunoglobulin interleukin-1 receptor-related molecule (SIGIRR),21 had an OR of 6.97 (P = 8.5 × 10⁻⁵; risk allele = G) in our discovery GWAS. This SNP did not replicate, however (OR = 1.09, P = .88). There were significant differences between the patients in the previous study20 and those in our study. The previous study evaluated 1091 Caucasian and African American infants who were born between 1989 and 2008 and had a GA <35 weeks, and it defined BPD by the use of supplemental oxygen at 28 days’ PNA. In contrast, our population was predominantly Hispanic, was VLBW, and had a GA <30 weeks. Infants who were on supplemental oxygen at 28 days’ PNA but on room air at 36 weeks’ PMA were included in our control group because previous work demonstrated that heritability of BPD was not observed in infants with mild BPD.5
Most genetic studies for BPD have been small, have included a limited set of genes, and primarily targeted Caucasians (reviewed in ref 6). Our GWAS included a highly diverse population (Fig 1) and has by far the largest sample size and number of SNPs interrogated to date for BPD. Confounding from genetic ancestries was accounted for in logistic regression by using PCs, a method that addresses population stratification in heterogeneous populations.22 However, relative to GWASs focusing on adult diseases, which have involved tens of thousands of patients, our sample size is relatively small. This limited power to detect small-moderate associations.
Our inability to detect loci at genome-wide significance indicates that genetic risk for moderate-severe BPD is not likely to be solely influenced by 1, or a small number, of major ancestrally conserved genetic variants. Our approach may have missed some other heritable genetic paradigm, such as rare mutations, epigenetic effects, joint effects of multiple SNPs, copy number variations, or interactions among SNPs and nongenetic factors. Assessment of other models for heritability for BPD will require either different assay platforms or development of more advanced interrogation methods. Our study population does differ from those of the twin studies4,5 reporting substantive heritability underlying BPD. These twin studies did not report race/ethnicity of patients and their estimation of heritability did not investigate possible genetic heterogeneity among different populations. Based on the geographic location of the studies,4,5 we speculate that most of the patients were Caucasian, whereas our cases and controls were predominantly of Mexican-Hispanic origin (Fig 1B). Genetic heterogeneity can affect power of detecting associations, especially in a highly heterogeneous population such as ours and, as discussed previously, there is convincing evidence that ancestrally (genetically) determined race/ethnicity affects lung function in both children and adults. Finally, the eligibility criteria for both our cases and controls required a minimum of 3 days of IPPV. We chose this approach to better define the BPD phenotype and decrease the “environmental” differences between the groups in the hope that this would enhance our ability to detect genetic factors. However, BPD sometimes occurs in extremely premature infants who did not require IPPV and the findings of others may reflect, at least in part, the fact that they did not use this as one of the eligibility criteria.
Moreover, unknown differences between the NICUs in California regarding their clinical approaches to these infants may have affected our ability to detect genetic effects. All these factors may have contributed to the lack of genome-wide significant findings of our study.
The authors express their appreciation to Dr Richard Bland for his contributions in writing the grant to obtain funding for the research, to Drs Fred Lorey and Shabbir Ahmad for so aptly directing efforts to make newborn blood specimens available for analyses, to Allan Santos for his detailed efforts in finding and processing bloodspots, and to the many individuals associated with the CPQCC for their efforts to create such an important database.
- Accepted May 13, 2013.
- Address correspondence to Hugh O’Brodovich, MD, Department of Pediatrics, Stanford University, 300 Pasteur Dr (Room H310), Stanford, CA, 94305. E-mail:
Dr Wang and Ms St Julien contributed equally to this work.
Drs Shaw and O’Brodovich had access to all of the data and take responsibility for the integrity of the data and accuracy of the data analysis. Drs Gould, Hoffmann, Jelliffe-Pawlowski, Krasnow, O’Brodovich, Shaw, Stevenson, and Witte and Ms Quaintance contributed to study concept and design. Dr Lazzeroni contributed to the statistical design and analysis plan in the grant application. Drs Gould, Hoffmann, Jelliffe-Pawlowski, Shaw, and Wang, and Mr Oehlert and Ms St Julien contributed to acquisition of data. All authors contributed to analysis and interpretation of data and critical revision of the manuscript for important intellectual content. Drs Gould, Hoffmann, Jelliffe-Pawlowski, Shaw, Witte, and Wang, and Mr Oehlert performed the statistical analyses. Dr O’Brodovich, as principal investigator, and Drs Gould, Krasnow, Lazzeroni, Shaw, and Stevenson, and Ms Quaintance obtained funding. Dr Jelliffe-Pawlowski, Ms Quaintance, and Mr Oehlert provided administrative, technical, or material support. Drs O’Brodovich and Shaw supervised the study.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.
FUNDING: Funded by National Institutes of Health/National Heart, Lung, and Blood Institute grant RC2 HL101748. Funded by the National Institutes of Health (NIH).
- Bhandari A, Bhandari V
- Bhandari V, Bizzarro MJ, Shetty A, et al; Neonatal Genetics Study Group
- Lavoie PM, Pham C, Jang KL
- Walsh MC, Yao Q, Gettner P, et al; National Institute of Child Health and Human Development Neonatal Research Network
- Laughon MM, Langer JC, Bose CL, et al; Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network
- Panagiotou OA, Ioannidis JP; Genome-Wide Significance Project
- Visel A, Thaller C, Eichele G
- Copyright © 2013 by the American Academy of Pediatrics