Thanks to rapid advances in genome sequencing technology, we can now obtain the entire genome sequence of a microorganism, which encodes thousands of new genes. Making sense of all this information requires sophisticated bioinformatics tools and databases.

Most of Novozymes’ products are based on enzyme genes from one of the myriad microorganisms found in nature. A typical bacterium contains around 5,000 genes, and ten years ago sequencing a single gene was a three-month lab project. Now all 5,000 genes can be identified in less than a week, shifting the challenge from the lab toward bioinformatics and database technologies.

At the gene level, the challenge for bioinnovation companies like Novozymes has shifted from data generation to data interpretation.
Rapidly sequencing the entire genomes of our production microorganisms also lets us zoom in on what makes a specific strain a particularly good enzyme secretor, and it can suggest ways to improve the strain further genetically.

Bioinformatics tools and sequence databases

In the search for novel microbial enzymes, the thousands of genome sequences available in the public domain, combined with Novozymes’ proprietary strains, put a premium on sophisticated yet user-friendly tools for selecting the right candidate genes before valuable lab time is spent on cloning and expression.
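To make the idea of candidate selection concrete, here is a minimal sketch of one possible filtering step. It assumes each gene has already been annotated with a predicted EC number, a predicted secretion signal peptide, and its percent identity to the closest already-known enzyme; the record type, field names, and thresholds are all hypothetical and do not describe Novozymes’ actual tools.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedGene:
    """One annotated gene from a sequenced genome (hypothetical fields)."""
    gene_id: str
    ec_number: str              # predicted EC number, e.g. "3.2.1.4"
    has_signal_peptide: bool    # predicted to be secreted
    best_known_identity: float  # % identity to the closest known enzyme


def select_candidates(genes, ec_prefix="3.", max_identity=90.0):
    """Keep secreted genes in the target EC class (default: hydrolases, EC 3)
    that are not near-duplicates of known enzymes, most novel first."""
    hits = [
        g for g in genes
        if g.ec_number.startswith(ec_prefix)
        and g.has_signal_peptide
        and g.best_known_identity < max_identity
    ]
    # Lower identity to known enzymes is treated as "more novel" here.
    return sorted(hits, key=lambda g: g.best_known_identity)


# Illustrative toy annotations, not real data.
genes = [
    AnnotatedGene("g0001", "3.2.1.4", True, 62.5),
    AnnotatedGene("g0002", "3.2.1.4", True, 97.0),   # near-duplicate of a known enzyme
    AnnotatedGene("g0003", "2.7.1.1", True, 40.0),   # not a hydrolase
    AnnotatedGene("g0004", "3.1.1.3", False, 55.0),  # not predicted to be secreted
    AnnotatedGene("g0005", "3.1.1.3", True, 48.0),
]
print([g.gene_id for g in select_candidates(genes)])  # ['g0005', 'g0001']
```

In practice the ranking criteria would come from the scientists’ own priorities; the point of the sketch is only that a cheap in-silico filter can shrink thousands of genes to a short list before any cloning starts.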
The key to discovering the next potential blockbuster enzyme is having sequence analysis workflows and database views that let scientists focus on the biological relevance of the data rather than on IT technicalities. Providing such workflows while still giving access to every level of DNA and protein sequence information requires complex database structures behind user-friendly web interfaces.
A relational data structure that lets every scientist submit enzyme-related data (sequence, assay performance, stability, etc.) makes it possible to mine the combined data for correlations between enzyme sequences and physical properties such as thermostability.
These data flows and analyses require close interplay between bioinformaticians, lab scientists, and IT experts. For example, a bacterial strain must be selected and its DNA purified in the lab, the genome sequence must be assembled and annotated by bioinformaticians, and the individual gene sequences must be integrated into the company-wide relational database. It is this collaboration across disciplines and technologies that enables Novozymes to discover and engineer the products it offers.