Genomics and bioinformatics

Scientific publications - Genomics and bioinformatics


Vongsangnak, W., Olsen, P., Hansen, K., Krogsgaard, S., Nielsen, J.

"Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae"

BMC Genomics, 9, art. no. 245. (2008)

Background: Since ancient times the filamentous fungus Aspergillus oryzae has been used in the fermentation industry for the production of fermented sauces and the production of industrial enzymes. Recently, the genome sequence of A. oryzae with 12,074 annotated genes was released but the number of hypothetical proteins accounted for more than 50% of the annotated genes. Considering the industrial importance of this fungus, it is therefore valuable to improve the annotation and further integrate genomic information with biochemical and physiological information available for this microorganism and other related fungi. Here we proposed gene prediction by construction, sequencing and assembly of an A. oryzae Expressed Sequence Tag (EST) library. We enhanced function assignment using an annotation strategy we developed. The resulting improved annotation was used to reconstruct the metabolic network, leading to a genome-scale metabolic model of A. oryzae. Results: Using our assembled EST sequences, we identified 1,046 newly predicted genes in the A. oryzae genome. Furthermore, it was possible to assign putative protein functions to 398 of the newly predicted genes. Notably, our annotation strategy resulted in assignment of new putative functions to 1,469 hypothetical proteins already present in the A. oryzae genome database. Using the substantially improved annotated genome we reconstructed the metabolic network of A. oryzae. This network contains 729 enzymes, 1,314 enzyme-encoding genes, 1,073 metabolites and 1,846 (1,053 unique) biochemical reactions. The metabolic reactions are compartmentalized into the cytosol, the mitochondria, the peroxisome and the extracellular space. Transport steps between the compartments and the extracellular space represent 281 reactions, of which 161 are unique. The metabolic model was validated and shown to correctly describe the phenotypic behavior of A. oryzae grown on different carbon sources.
Conclusion: A much enhanced annotation of the A. oryzae genome was performed and a genome-scale metabolic model of A. oryzae was reconstructed. The model accurately predicted the growth and biomass yield on different carbon sources. The model serves as an important resource for gaining further insight into our understanding of A. oryzae physiology. © 2008 Vongsangnak et al; licensee BioMed Central Ltd.

Martinez, D., Berka, R.M., Henrissat, B., Saloheimo, M., Arvas, M., Baker, S.E., Chapman, J., Chertkov, O., Coutinho, P.M., Cullen, D., Danchin, E.G.J., Grigoriev, I.V., Harris, P., Jackson, M., Kubicek, C.P., Han, C.S., Ho, I., Larrondo, L.F., De Leon, A.L., Magnuson, J.K., Merino, S., Misra, M., Nelson, B., Putnam, N., Robbertse, B., Salamov, A.A., Schmoll, M., Terry, A., Thayer, N., Westerholm-Parvinen, A., Schoch, C.L., Yao, J., Barbote, R., Nelson, M.A., Detter, C., Bruce, D., Kuske, C.R., Xie, G., Richardson, P., Rokhsar, D.S., Lucas, S.M., Rubin, E.M., Dunn-Coleman, N., Ward, M., Brettin, T.S.

"Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina)"

Nature Biotechnology, 26 (5), pp. 553-560. (2008)

Trichoderma reesei is the main industrial source of cellulases and hemicellulases used to depolymerize biomass to simple sugars that are converted to chemical intermediates and biofuels, such as ethanol. We assembled 89 scaffolds (sets of ordered and oriented contigs) to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models. Unexpectedly, considering the industrial utility and effectiveness of the carbohydrate-active enzymes of T. reesei, its genome encodes fewer cellulases and hemicellulases than any other sequenced fungus able to hydrolyze plant cell wall polysaccharides. Many T. reesei genes encoding carbohydrate-active enzymes are distributed nonrandomly in clusters that lie between regions of synteny with other Sordariomycetes. Numerous genes encoding biosynthetic pathways for secondary metabolites may promote survival of T. reesei in its competitive soil habitat, but genome analysis provided little mechanistic insight into its extraordinary capacity for protein secretion. Our analysis, coupled with the genome sequence data, provides a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.

David, H., Özçelik, I.Ş., Hofmann, G., Nielsen, J.

"Analysis of Aspergillus nidulans metabolism at the genome-scale"

BMC Genomics, 9, art. no. 163. (2008)

Background: Aspergillus nidulans is a member of a diverse group of filamentous fungi, sharing many of the properties of its close relatives with significance in the fields of medicine, agriculture and industry. Furthermore, A. nidulans has been a classical model organism for studies of developmental biology and gene regulation, and thus it has become one of the best-characterized filamentous fungi. It was the first Aspergillus species to have its genome sequenced, and automated gene prediction tools predicted 9,451 open reading frames (ORFs) in the genome, of which less than 10% were assigned a function. Results: In this work, we have manually assigned functions to 472 orphan genes in the metabolism of A. nidulans, by using a pathway-driven approach and by employing comparative genomics tools based on sequence similarity. The central metabolism of A. nidulans, as well as biosynthetic pathways of relevant secondary metabolites, was reconstructed based on detailed metabolic reconstructions available for A. niger and Saccharomyces cerevisiae, and information on the genetics, biochemistry and physiology of A. nidulans. Thereby, it was possible to identify metabolic functions without a gene associated, and to look for candidate ORFs in the genome of A. nidulans by comparing its sequence to sequences of well-characterized genes in other species encoding the function of interest. A classification system, based on defined criteria, was developed for evaluating and selecting the ORFs among the candidates, in an objective and systematic manner. The functional assignments served as a basis to develop a mathematical model, linking 666 genes (both previously and newly annotated) to metabolic roles.
The model was used to simulate metabolic behavior and additionally to integrate, analyze and interpret large-scale gene expression data concerning a study on glucose repression, thereby providing a means of upgrading the information content of experimental data and getting further insight into this phenomenon in A. nidulans. Conclusion: We demonstrate how pathway modeling of A. nidulans can be used as an approach to improve the functional annotation of the genome of this organism. Furthermore we show how the metabolic model establishes functional links between genes, enabling the upgrade of the information content of transcriptome data. © 2008 David et al; licensee BioMed Central Ltd.

Baker, S.E., Thykaer, J., Adney, W.S., Brettin, T.S., Brockman, F.J., D'haeseleer, P., Martinez, A.D., Miller, R.M., Rokhsar, D.S., Schadt, C.W., Torok, T., Tuskan, G., Bennett, J., Berka, R.M., Briggs, S.P., Heitman, J., Taylor, J., Gillian Turgeon, B., Werner-Washburne, M., Himmel, M.E.

"Fungal genome sequencing and bioenergy"

Fungal Biology Reviews, 22 (1), pp. 1-5. (2008)

To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

Oh, Y.-K., Palsson, B.O., Park, S.M., Schilling, C.H., Mahadevan, R.

"Genome-scale reconstruction of metabolic network in Bacillus subtilis based on high-throughput phenotyping and gene essentiality data"

Journal of Biological Chemistry, 282 (39), pp. 28791-28799. (2007)

In this report, a genome-scale reconstruction of Bacillus subtilis metabolism and its iterative development based on the combination of genomic, biochemical, and physiological information and high-throughput phenotyping experiments is presented. The initial reconstruction was converted into an in silico model and expanded in a four-step iterative fashion. First, network gap analysis was used to identify 48 missing reactions that are needed for growth but were not found in the genome annotation. Second, the computed growth rates under aerobic conditions were compared with high-throughput phenotypic screen data, and the initial in silico model could predict the outcomes qualitatively in 140 of 271 cases considered. Detailed analysis of the incorrect predictions resulted in the addition of 75 reactions to the initial reconstruction, and 200 of 271 cases were correctly computed. Third, in silico computations of the growth phenotypes of knock-out strains were found to be consistent with experimental observations in 720 of 766 cases evaluated. Fourth, the integrated analysis of the large-scale substrate utilization and gene essentiality data with the genome-scale metabolic model revealed the requirement of 80 specific enzymes (transport, 53; intracellular reactions, 27) that were not in the genome annotation. Subsequent sequence analysis resulted in the identification of genes that could be putatively assigned to 13 intracellular enzymes. The final reconstruction accounted for 844 open reading frames and consisted of 1020 metabolic reactions and 988 metabolites. Hence, the in silico model can be used to obtain experimentally verifiable hypotheses on the metabolic functions of various genes. © 2007 by The American Society for Biochemistry and Molecular Biology, Inc.

Tang, M.R., Sternberg, D., Behr, R.K., Sloma, A., Berka, R.M.

"Use of transcriptional profiling & bioinformatics to solve production problems"

Industrial Biotechnology, 2 (1), pp. 66-74. (2006)

The production of pigments during industrial fermentations is undesirable, and expensive purification steps are often required to remove colored compounds from anticipated commercial products. We observed that a recombinant Bacillus subtilis strain synthesizing the heterologous polysaccharide hyaluronic acid (HA) produced copious amounts of a red pigment with biochemical properties characteristic of pulcherrimin, a previously characterized iron-binding pyrazine compound. An apigmented B. subtilis mutant was isolated following chemical mutagenesis and compared to its parent strain using DNA microarray transcriptome analysis. Among the genes whose transcription was significantly (p < 0.05) altered in the apigmented mutant, yvmC and cypX were selected as likely pulcherrimin biosynthetic genes on the basis of in silico bioinformatics examination, which suggested that the yvmC gene product contains a putative phosphopantetheine domain characteristic of some cyclic peptide synthases, and that the cypX gene encodes a cytochrome P450-like enzyme. Pulcherrimin biosynthesis was previously proposed to occur via cyclization of two leucine molecules followed by a redox reaction involving molecular oxygen. Additionally, yvmC and cypX genes are juxtaposed on the B. subtilis chromosome and appear to be coordinately expressed based on hierarchical cluster analysis of microarray data. Disruption of the yvmC or cypX genes yielded strains that did not produce red pigment. This study illustrates that the combination of transcriptional profiling and bioinformatics is an effective approach that can be employed to solve industrial fermentation problems.

Woods, K., Hilu, K.W., Borsch, T., Wiersema, J.H.

"Pattern of variation and systematics of Nymphaea odorata: II. Sequence information from ITS and trnL-trnF"

Systematic Botany, 30 (3), pp. 481-493. (2005)

Sequence data from the nuclear internal transcribed spacer (ITS) and the plastid trnL-trnF regions were used to assess relationships among populations of N. odorata across its North American range, and to evaluate whether subsp. odorata and subsp. tuberosa form distinct taxonomic units. Nymphaea mexicana was included because of suspected hybridization with N. odorata. The trnL-trnF region provided a single informative site in N. odorata. In contrast, the ITS region was more variable. Phylogenetic analysis of ITS data supports the monophyly of the two species. Within N. odorata, two clades were resolved largely representing subsp. odorata and subsp. tuberosa, although a few individuals appeared outside the respective clades. Polymorphic sites were detected in ITS, indicating possible hybridization between the subspecies. The geographic location of these hybrids suggests a possible hybrid zone. Overall, molecular evidence supports the segregation of subsp. odorata and subsp. tuberosa, with limited gene flow between them. © Copyright 2005 by the American Society of Plant Taxonomists.

Lin, J.T., Connelly, M.B., Amolo, C., Otani, S., Yaver, D.S.

"Global transcriptional response of Bacillus subtilis to treatment with subinhibitory concentrations of antibiotics that inhibit protein synthesis"

Antimicrobial Agents and Chemotherapy, 49 (5), pp. 1915-1926. (2005)

Global gene expression patterns of Bacillus subtilis in response to subinhibitory concentrations of protein synthesis inhibitors (chloramphenicol, erythromycin, and gentamicin) were studied by DNA microarray analysis. B. subtilis cultures were treated with subinhibitory concentrations of protein synthesis inhibitors for 5, 15, 30, and 60 min, and transcriptional patterns were measured throughout the time course. Three major classes of genes were affected by the protein synthesis inhibitors: genes encoding transport/binding proteins, genes involved in protein synthesis, and genes involved in the metabolism of carbohydrates and related molecules. Similar expression patterns for a few classes of genes were observed due to treatment with chloramphenicol (0.4x MIC) or erythromycin (0.5x MIC), whereas expression patterns of gentamicin-treated cells were distinct. Expression of genes involved in metabolism of amino acids was altered by treatment with chloramphenicol and erythromycin but not by treatment with gentamicin. Heat shock genes were induced by gentamicin but repressed by chloramphenicol. Other genes induced by the protein synthesis inhibitors included the yheIH operon encoding ABC transporter-like proteins, with similarity to multidrug efflux proteins, and the ysbAB operon encoding homologs of LrgAB that function to inhibit cell wall cleavage (murein hydrolase activity) and convey penicillin tolerance in Staphylococcus aureus. Copyright © 2005, American Society for Microbiology. All Rights Reserved.

Rey, M.W., Ramaiya, P., Nelson, B.A., Brody-Karpin, S.D., Zaretsky, E.J., Tang, M., Lopez de Leon, A., Xiang, H., Gusti, V., Clausen, I.G., Olsen, P.B., Rasmussen, M.D., Andersen, J.T., Jørgensen, P.L., Larsen, T.S., Sorokin, A., Bolotin, A., Lapidus, A., Galleron, N., Ehrlich, S.D., Berka, R.M.

"Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species."

Genome Biology, 5 (10), pp. R77. (2004)

BACKGROUND: Bacillus licheniformis is a Gram-positive, spore-forming soil bacterium that is used in the biotechnology industry to manufacture enzymes, antibiotics, biochemicals and consumer products. This species is closely related to the well studied model organism Bacillus subtilis, and produces an assortment of extracellular enzymes that may contribute to nutrient cycling in nature. RESULTS: We determined the complete nucleotide sequence of the B. licheniformis ATCC 14580 genome which comprises a circular chromosome of 4,222,336 base-pairs (bp) containing 4,208 predicted protein-coding genes with an average size of 873 bp, seven rRNA operons, and 72 tRNA genes. The B. licheniformis chromosome contains large regions that are colinear with the genomes of B. subtilis and Bacillus halodurans, and approximately 80% of the predicted B. licheniformis coding sequences have B. subtilis orthologs. CONCLUSIONS: Despite the unmistakable organizational similarities between the B. licheniformis and B. subtilis genomes, there are notable differences in the numbers and locations of prophages, transposable elements and a number of extracellular enzymes and secondary metabolic pathway operons that distinguish these species. Differences include a region of more than 80 kilobases (kb) that comprises a cluster of polyketide synthase genes and a second operon of 38 kb encoding plipastatin synthase enzymes that are absent in the B. licheniformis genome. The availability of a completed genome sequence for B. licheniformis should facilitate the design and construction of improved industrial strains and allow for comparative genomics and evolutionary studies within this group of Bacillaceae.

Martinez, D., Larrondo, L.F., Putnam, N., Sollewijn Gelpke, M.D., Huang, K., Chapman, J., Helfenbein, K.G., Ramaiya, P., Detter, J.C., Larimer, F., Coutinho, P.M., Henrissat, B., Berka, R., Cullen, D., Rokhsar, D.

"Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78"

Nature Biotechnology, 22 (6), pp. 695-700. (2004)

White rot fungi efficiently degrade lignin, a complex aromatic polymer in wood that is among the most abundant natural materials on earth. These fungi use extracellular oxidative enzymes that are also able to transform related aromatic compounds found in explosive contaminants, pesticides and toxic waste. We have sequenced the 30-million base-pair genome of Phanerochaete chrysosporium strain RP78 using a whole genome shotgun approach. The P. chrysosporium genome reveals an impressive array of genes encoding secreted oxidases, peroxidases and hydrolytic enzymes that cooperate in wood decay. Analysis of the genome data will enhance our understanding of lignocellulose degradation, a pivotal process in the global carbon cycle, and provide a framework for further development of bioprocesses for biomass utilization, organopollutant degradation and fiber bleaching. This genome provides a high quality draft sequence of a basidiomycete, a major fungal phylum that includes important plant and animal pathogens.

Hansen, E.H., Schembri, M.A., Klemm, P., Schäfer, T., Molin, S., Gram, L.

"Elucidation of the Antibacterial Mechanism of the Curvularia Haloperoxidase System by DNA Microarray Profiling"

Applied and Environmental Microbiology, 70 (3), pp. 1749-1757. (2004)

A novel antimicrobial enzyme system, the Curvularia haloperoxidase system, was examined with the aim of elucidating its mechanism of antibacterial action. Escherichia coli strain MG1655 was stressed with sublethal concentrations of the enzyme system, causing a temporary arrest of growth. The expression of genes altered upon exposure to the Curvularia haloperoxidase system was analyzed by using DNA microarrays. Only a limited number of genes were involved in the response to the Curvularia haloperoxidase system. Among the induced genes were the ibpA and ibpB genes encoding small heat shock proteins, a gene cluster of six genes (b0301-b0306) of unknown function, and finally, cpxP, a member of the Cpx pathway. Knockout mutants were constructed with deletions in b0301-b0306, cpxP, and cpxARP, respectively. Only the mutant lacking cpxARP was significantly more sensitive to the enzyme system than was the wild type. Our results demonstrate that DNA microarray technology cannot be used as the only technique to investigate the mechanisms of action of new antimicrobial compounds. However, by combining DNA microarray analysis with the subsequent creation of knockout mutants, we were able to pinpoint one of the specific responses of E. coli, namely the Cpx pathway, which is important for managing the stress response from the Curvularia haloperoxidase system.

R.M. Berka; B.A. Nelson; E.J. Zaretsky; W.T. Yoder; M.W. Rey.

"Genomics of Fusarium venenatum: an alternative fungal host for making enzymes". 

In: Applied Mycology & Biotechnology, Vol. 4, Fungal Genomics (Arora, D.K. and Khachatourians, G.G., eds.) Elsevier Science, Amsterdam (2003)

Fusarium venenatum A3/5 (formerly F. graminearum Schwabe A3/5) has been used since 1985 as the commercial source of Quorn™ mycoprotein, a processed form of fungal mycelia applied in several human food products to simulate chunks of chicken or beef.  Regulatory approval of the organism for human consumption made it an attractive candidate to consider as a host for the production of industrial and food grade enzymes. Systems for genetic manipulation and transformation of F. venenatum cells have been developed together with several strong promoters and selectable markers for the introduction and expression of heterologous genes.  Recent marketing of a heterologous xylanase and a fungal trypsin have provided a "proof of concept" for F. venenatum as a useful alternative to more traditional fungal hosts such as Aspergillus niger or A. oryzae.  However, compared to the latter organisms and well-studied model fungi such as Neurospora crassa and A. nidulans, information regarding the genomics of F. venenatum is inadequate.  This chapter provides one of the first overviews of F. venenatum genomic information based on a compilation of expressed sequence tags and chromosomal gene sequences to initiate momentum for more comprehensive genome sequencing efforts.


R.M. Berka; X. Cui; C.Yanofsky.

"Genome-wide transcriptional changes associated with genetic alterations and nutritional supplementation affecting tryptophan metabolism in Bacillus subtilis."

Proc. Nat. Acad. Sci. USA, 100, 5682-5687 (2003)

DNA microarrays comprising approximately 95% of the Bacillus subtilis annotated protein coding ORFs were deployed to generate a series of snapshots of genomewide transcriptional changes that occur when cells are grown under various conditions that are expected to increase or decrease transcription of the trp operon segment of the aromatic supraoperon. Comparisons of global expression patterns were made between cells grown in the presence of indole acrylic acid, a specific inhibitor of tRNATrp charging; cells deficient in expression of the mtrB gene, which encodes the tryptophan-activated negative regulatory protein, TRAP; WT cells grown in the presence or absence of two or three of the aromatic amino acids; and cells harboring a tryptophanyl tRNA synthetase mutation conferring temperature-sensitive tryptophan-dependent growth. Our findings validate expected responses of the tryptophan biosynthetic genes and presumed regulatory interrelationships between genes in the different aromatic amino acid pathways and the histidine biosynthetic pathway. Using a combination of supervised and unsupervised statistical methods we identified approximately 100 genes whose expression profiles were closely correlated with those of the genes in the trp operon. This finding suggests that expression of these genes is influenced directly or indirectly by regulatory events that affect or are a consequence of altered tryptophan metabolism.

C. Workman; L.J. Jensen; H. Jarmer; R. Berka; L. Gautier; H.H. Saxild; C. Nielsen; S. Brunak; S. Knudsen. 

"Methods to reduce variability in DNA microarray experiments."

Genome Biology, 3, 0048.1-0048.16 (2002)

Microarray data are subject to multiple sources of variation, of which biological sources are of interest whereas most others are only confounding. Recent work has identified systematic sources of variation that are intensity-dependent and non-linear in nature. Systematic sources of variation are not limited to the differing properties of the cyanine dyes Cy5 and Cy3 as observed in cDNA arrays, but are the general case for both oligonucleotide microarray (Affymetrix GeneChips) and cDNA microarray data. Current normalization techniques are most often linear and therefore not capable of fully correcting for these effects. We present here a simple and robust non-linear method for normalization using array signal distribution analysis and cubic splines. These methods compared favorably to normalization using robust local-linear regression (lowess). The application of these methods to oligonucleotide arrays reduced the relative error between replicates by 5-10% compared with a standard global normalization method. Application to cDNA arrays showed improvements over the standard method and over Cy3-Cy5 normalization based on dye-swap replication. In addition, a set of known differentially regulated genes was ranked higher by the t-test. In either cDNA or Affymetrix technology, signal-dependent bias was more than ten times greater than the observed print-tip or spatial effects. Intensity-dependent normalization is important for both high-density oligonucleotide array and cDNA array data. Both the regression and spline-based methods described here performed better than existing linear methods when assessed on the variability of replicate arrays. Dye-swap normalization was less effective at Cy3-Cy5 normalization than either regression or spline-based methods alone.

R.M. Berka; J. Hahn; I. Draskovic; M. Persuh; X. Cui; A. Sloma; W. Widner; D. Dubnau.

"Microarray analysis of the Bacillus subtilis K-state: genome-wide expression changes induced by ComK."

Mol. Microbiol., 43, 1331-1345 (2002)

In Bacillus subtilis, the competence transcription factor ComK activates its own transcription as well as the transcription of genes that encode DNA transport proteins. ComK is expressed in about 10% of the cells in a culture grown to competence. Using DNA microarrays representing approximately 95% of the protein-coding open reading frames in B. subtilis, we compared the expression profiles of wild-type and comK strains, as well as of a mecA mutant (which produces active ComK in all the cells of the population) and a comK mecA double mutant. In these comparisons, we identified at least 165 genes that are upregulated by ComK and relatively few that are downregulated. The use of reporter fusions has confirmed these results for several genes. Many of the ComK-regulated genes are organized in clusters or operons, and 23 of these clusters are preceded by apparent ComK-box promoter motifs. In addition to those required for DNA uptake, other genes that are upregulated in the presence of ComK are probably involved in DNA repair and in the uptake and utilization of nutritional sources. From this and previous work, we conclude that the ComK regulon defines a growth-arrested state, distinct from sporulation, of which competence for genetic transformation is but one notable feature. We suggest that this is a unique adaptation to stress and that it be termed the 'K-state'.

H. Jarmer; R. Berka; S. Knudsen; H.H. Saxild.

"Transcriptome analysis documents induced competence of Bacillus subtilis during nitrogen limiting conditions."

FEMS Microbiol. Lett., 206, 197-200 (2002)

DNA microarrays were used to analyze the changes in gene expression in Bacillus subtilis strain 168 when nitrogen limiting (glutamate) and nitrogen excess (ammonium plus glutamate) growth conditions were compared. Among more than 100 genes that were significantly induced during nitrogen starvation we detected the comG, comF, comE, nin-nucA and comK transcription units together with recA. DNA was added to B. subtilis grown in minimal medium with glutamate as the sole nitrogen source and it was demonstrated that the cells were competent. Based on these observations we propose a simplification of previously designed one-step transformation procedures for B. subtilis strain 168.