
Edited by John Ross, Stanford University, Stanford, CA, and approved January 22, 2009 (received for review January 17, 2009)

## Abstract

One of the major unsolved problems of modern biology is deep understanding of the complex relationship between the information encoded in the genome of an organism and the phenotypic properties manifested by that organism. Fundamental advances must be made before we can begin to approach the goal of predicting the phenotypic consequences of a given mutation or an organism's response to a novel environmental challenge. Although this problem is often portrayed as if the task were to find a more or less direct link between genotypic and phenotypic levels, on closer examination the relationship is far more layered and complex. Although there are some intuitive notions of what is meant by phenotype at the level of the organism, it is far from clear what this term means at the biochemical level. We have described design principles that are readily revealed by representation of molecular systems in an appropriate design space. Here, we first describe a generic approach to the construction of such a design space in which qualitatively distinct phenotypes can be identified and counted. Second, we show how the boundaries between these phenotypic regions provide a method of characterizing a system's tolerance to large changes in the values of its parameters. Third, we illustrate the approach for one of the most basic modules of biochemical networks and describe an associated design principle. Finally, we discuss the scaling of this approach to large systems.

Keywords: biological design principles, piecewise power–law representation, robustness, biochemical systems theory, metabolic network motifs

The molecular revolution that swept through biology in the latter half of the 20th century made it apparent that we would soon be completing the parts catalog for a few of the simplest and best-studied organisms. However, it has become increasingly clear that our knowledge of even the best-studied organisms is still incomplete and fragmentary. We lack the ability to predict the organism's response to a novel mutation in its gene sequence or to a novel (e.g., man-made) compound in its environment. This is the central problem of relating the genotype to the phenotype of the organism. For many, what bridges the enormous divide between genotype and phenotype is gene circuitry, interpreted in the broadest sense to include all of the molecules and interactions that link genes to one another. The function of this circuitry is obvious on one level. The genotype is determined by the information encoded in the DNA sequence, the phenotype by the context-dependent expression of the genome, and the circuitry is there to interpret the context and orchestrate appropriate responses for the organism. This is all true, but it is not very helpful in relating the genotype to the phenotype.

If we look deeper, we find a multilevel hierarchy of systems between the genotype and the phenotype. At each level of the hierarchy, one finds a diversity of designs for achieving what often appears to be the same function. This raises the fundamental question: are these differences in design the result of accident or rule (1)? Are they the result of frozen accidents in the evolutionary process that happen to work well enough to survive natural selection? Alternatively, are they the result of selection to perform subtly different functions, and, if so, can we discover rules that predict when a given design might evolve to perform a particular function in a specific context? The various designs at a given level can be analyzed and compared within a common framework of representation to reveal functional differences; however, relating designs at different levels requires detailed knowledge of the mapping between the levels.

## Representations and Mappings.

Questions of representation arise at every level, and at many levels the challenges are grand. Clearly, there are many different representations for the same object, and one must decide which is the most appropriate for a given purpose [see supporting information (SI) Dataset S1]. If we are to relate genotype to phenotype, we must consider representations that are appropriate for these two levels and for the various intervening levels.

Relating phenotype and genotype from first principles requires at least two major and qualitatively different mappings. First, the digital representation of the genotype space must be related to the analog representation of the parameter space for the environment and subsystems comprising the global system that is the organism, at present a fundamental unsolved problem. For example, knowing the DNA sequence does not allow one to determine the kinetic parameters of an enzyme. Second, the parameter space must be identified for the specific environment and subsystems, which in many cases remain to be discovered, and then related to the phenotype space, another fundamental unsolved problem. For example, knowing the parameter values does not tell one how many qualitatively distinct phenotypes are in the organism's repertoire or the relative fitness of the phenotypes in different environments.

The parameter space, which can be considered an intermediate in relating different hierarchical levels, can only be defined through the elucidation and parametric characterization of the intervening system. Examples that illustrate this process can be found for many of the hierarchical levels between an organism's genotype and phenotype. These examples also demonstrate the important point that “phenotypes” and “environments” are manifested at all of these levels and not just at the level of the organism.

## Design Space.

Our goal is to construct a “design space” that includes relationships among genotype, phenotype, and environment and that illuminates the function, design, and fitness of the organism. Currently, we must forgo the mapping of genotype to system parameters; as noted above, this is a fundamental unsolved problem. Instead, we must use parameter values as a proxy for the genotype.

The representation of such a design space, as distinct from the parameter space, involves a dimensional compression of parameter space by forming dimensionless quantities and grouping parameters and independent variables into aggregate factors. Such a design space is of lower dimension but still includes all of the parameters and independent variables, at least implicitly. This dimensional compression can be seen graphically in the examples of handcrafted design spaces that are described in Dataset S2. These design spaces are partitioned into regions by a variety of geometrical relationships representing physical limits, functional constraints, different qualitative dynamics, and other qualitatively distinct aspects of the phenotype. Indeed, one may consider the regions of such a design space as corresponding to different phenotypes.

## Toward a Generic Construction of Design Space.

Such handcrafted examples have motivated us to explore the possibilities of developing a more generic approach to the construction of design spaces based on the power–law formalism (Dataset S1). We first describe our general approach in this section and then illustrate it with specific examples, one in the next section and two others in Dataset S3 and Dataset S4. We begin with the most common types of models used to represent biochemical systems.

One of the two most common rate–law models of biochemical systems can be described in terms of mass action equations, in which Xi represents a concentration. In traditional chemical kinetics, the sums of products of power laws involve coefficients (αik and βik, rate constants) that are positive real and exponents (gijk and hijk, kinetic orders) that have small positive integer values: 1, 2, or very rarely 3. In the power–law formalism (2), the exponents are not restricted to positive integer values, but can have real values, thus producing a generalized mass action (GMA) model.
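For reference, the GMA form described in this paragraph can be sketched as follows. This is a reconstruction from the symbols named in the text (αik, βik, gijk, hijk) and the standard power–law formalism, not a copy of the article's own display equation. The index ranges are notational assumptions: n dependent variables, m variables in total, and Pi positive and Ni negative terms in the ith equation.

```latex
\frac{dX_i}{dt}
  = \sum_{k=1}^{P_i} \alpha_{ik} \prod_{j=1}^{m} X_j^{\,g_{ijk}}
  - \sum_{k=1}^{N_i} \beta_{ik} \prod_{j=1}^{m} X_j^{\,h_{ijk}},
  \qquad i = 1, \dots, n
```

Traditional mass action kinetics corresponds to the special case in which every exponent is a small positive integer.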

The other most common rate–law model of biochemical systems can be described in terms of rational function equations, again, in which the sums of products of power laws involve positive coefficients and exponents that have small positive integer values. By using a straightforward process called recasting (2), one can transform such equations into an equivalent but larger set of GMA equations.

At steady state, the equivalent set of GMA equations within the power–law formalism is given below. Traditional mass action models are simply special cases of the GMA representation. Rational function models can be recast into an equivalent GMA representation in a number of ways. The following is a simple example recasting the steady state of Eq. 2 by the introduction of two new variables for each equation. Thus, without loss of generality, we can focus on models in the GMA representation within the power–law formalism.

Each side of the steady-state equations is a sum of several terms. When one term on each side is dominant (i.e., is the largest term on its side), the system of equations can be approximated locally by an S-system (Dataset S1) in steady state. These equations have a single analytical solution that is linear in the logarithms of the concentration variables and rate constants (2, 3). The conditions for any given term to be dominant are provided by a set of linear inequalities in log space.
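The log-linearity of the S-system steady state can be illustrated with a minimal sketch. It assumes an S-system with one production term and one consumption term per equation, with independent variables absorbed into the rate constants. The function names and the numerical values are hypothetical and chosen only for illustration. Taking logarithms of αi ∏ Xj^gij = βi ∏ Xj^hij gives the linear system (G − H) y = ln(β/α) with yj = ln Xj.

```python
import math

def solve_linear(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: no unique steady state")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (M[i][n] - tail) / M[i][i]
    return y

def s_system_steady_state(alpha, beta, G, H):
    """Return X with alpha_i * prod_j X_j**G[i][j] == beta_i * prod_j X_j**H[i][j]."""
    n = len(alpha)
    A = [[G[i][j] - H[i][j] for j in range(n)] for i in range(n)]
    b = [math.log(beta[i] / alpha[i]) for i in range(n)]
    return [math.exp(v) for v in solve_linear(A, b)]  # y_j = ln X_j

def net_rates(X, alpha, beta, G, H):
    """Production minus consumption for each variable (zero at steady state)."""
    rates = []
    for i in range(len(X)):
        production = alpha[i] * math.prod(x ** g for x, g in zip(X, G[i]))
        consumption = beta[i] * math.prod(x ** h for x, h in zip(X, H[i]))
        rates.append(production - consumption)
    return rates

# Hypothetical two-variable system (values are illustrative, not from the article).
alpha = [2.0, 1.0]
beta = [1.0, 1.5]
G = [[0.0, -0.5], [1.0, 0.0]]   # kinetic orders of the production terms
H = [[0.5, 0.0], [0.0, 0.8]]    # kinetic orders of the consumption terms
X = s_system_steady_state(alpha, beta, G, H)
print(X, net_rates(X, alpha, beta, G, H))
```

Because the solution is obtained by a single linear solve, each dominant-term combination yields at most one candidate steady state, which is then tested against the dominance inequalities.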

Because each term on each side is potentially dominant, there are as many potential solutions as there are combinations of terms in the GMA system; hence a bound on the number of steady-state solutions is provided. However, not all potential solutions are necessarily valid. A test of each potential solution against the inequalities necessary for its validity will determine whether or not a potential solution is in fact a valid solution. The pathway example discussed in the next section will make these ideas more concrete.

The inequalities and the corresponding solution define the boundaries for a region in design space. Within each region, there is a qualitatively distinct solution that can be characterized (2) with respect to signal amplification (logarithmic gains), robustness (parameter insensitivity), and stability (eigenvalues) involving local (small) changes in variables and parameters. These characteristics can be compared against a set of quantitative performance criteria to determine the relative fitness of each region. These qualitatively distinct solutions can be defined as molecular phenotypes.

In addition to characterizing local (small) changes in performance, this approach of partitioning design space provides a natural means, based on the boundaries, for calculating tolerances to global (large) changes in parameter values. These tolerances are defined for each parameter as the ratio of its value at the nominal steady state (normal operating point for the system) to its value on the boundary to an adjacent phenotypic region (or the inverse, depending on which value is larger). Although comparison of the actual and the piecewise solutions reveals discrepancies, which are greatest near the boundaries, accuracy throughout the design space is not the primary concern here. Rather, it is the ease with which boundaries can be analytically defined based on the underlying equations and can be used to quantify global tolerances.

The strategy outlined above will be illustrated by constructing design spaces for the most widespread elements of biochemical networks: pathways (in the next section), cycles (Dataset S3), and branches (Dataset S4).

## Generic Construction of the Design Space for Pathways.

Although there are a few important cases in which a single biochemical pathway (or part of a pathway) undergoes a reversal of net flux, most pathways in cells operate in a unidirectional fashion. If a metabolite is synthesized under one set of conditions and catabolized under another set of conditions, these processes are seldom carried out by reversal of the same pathway. For example, in Salmonella typhimurium there is a histidine biosynthetic pathway (4) and a separate histidine utilization pathway (5).

## Kinetic model.

Consider the simplest example of such a pathway that can be used to illustrate the construction of the design space (Fig. 1A). The two reactions follow reversible Michaelis–Menten mechanisms (6). The substrate and product concentrations are fixed such that the flux through the pathway is driven to the right, far from equilibrium, and the reverse reactions can be neglected. The resulting equation for the intermediate X is or, in dimensionless form, where the dimensionless variables and parameters are given by x = X/K, xS = XS/KS, xP = XP/KP, ρ = VMax,2/VMax,1, κ = K/KI, τ = (VMax,1/K)t. This effectively reduces the number of parameters and independent variables from 8 to 4.

Fig. 1. Metabolic pathway. (A) Reversible pathway with substrate XS, intermediate X, and product XP. (B) Design space in which the nominal operating point (white dot) is depicted in phenotypic region 1, and the tolerances to phenotypic regions 2 (T12) and 3 (T13) are indicated by arrows. See Results for discussion.

## Recasting.

The recast GMA version of Eq. 7, in steady state (and letting x1 = x), is obtained by the procedure described in the previous section in which two new variables [x2 = (1 + xS) + x1/κ and x3 = (1 + xP) + x1] are introduced. In addition to the normalization described above, we have formed aggregate parameters (1 + xS) and (1 + xP) to promote dimensional compression of the design space that is eventually generated.

The number of combinations of terms in Eq. 8 gives a bound on the total number of potential phenotypic regions (T) where Pi and Ni are the number of positive and negative terms in the ith equation.
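The bound can be written out as a short worked expression. This is a sketch rather than the article's own display equation. The term counts are assumptions read from the recast variables above: one flux-balance equation with one positive and one negative term, and two defining equations (for x2 and x3), each with two positive terms and one negative term.

```latex
T = \prod_{i=1}^{n} P_i \, N_i ,
\qquad
T_{\text{pathway}} = (1 \cdot 1)\,(2 \cdot 1)\,(2 \cdot 1) = 4
```

Under these assumptions the count agrees with the four potential phenotypic regions considered in the following subsections.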

## Breakpoint conditions.

The breakpoints between the two positive terms in the second and third of Eq. 8 are obtained by equating the two terms and taking logarithms to generate linear expressions. The need for logarithms may not be obvious in this simple example because each term only involves a single dependent variable, but in more complex examples, taking logarithms generates linear equations that greatly simplify the construction of design space.
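As a sketch of this step, assume the two positive terms in question are those appearing in the definitions of x2 and x3 given under Recasting, namely (1 + xS) versus x1/κ and (1 + xP) versus x1. Equating each pair and taking logarithms then gives the following; the article's own numbered equations may be arranged differently.

```latex
(1 + x_S) = \frac{x_1}{\kappa}
  \;\Longrightarrow\;
  \ln x_1 = \ln \kappa + \ln (1 + x_S),
\qquad
(1 + x_P) = x_1
  \;\Longrightarrow\;
  \ln x_1 = \ln (1 + x_P)
```

On either side of each breakpoint, one of the two terms is the larger, which is what turns each equality into a pair of dominance inequalities.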

These equations give rise to four linear inequality conditions that are based on the dominant terms in Eq. 8 and involve the aggregate parameters κ, (1 + xS), and (1 + xP). Consider in detail one of the four potential phenotypic regions, call it phenotypic region 1, for which the breakpoint conditions are given by

## Local solution.

In this region, the dominant terms from Eq. 8 comprise an S-system that has the following linear form in logarithmic space. The unique solution for the intermediate is given by

## Phenotypic boundaries.

The boundaries of the phenotypic region in which this solution is valid are obtained by inserting the linear solution (Eq. 13) into the corresponding linear breakpoint conditions (Eq. 11). The result is the following set of boundaries that are linear in log space. The same procedure can be applied to each of the other three combinations of breakpoint inequalities. The results for the three meaningful steady states are summarized graphically in the design space of Fig. 1B.

Note that in this case there is a compression of the 8-dimensional space of the original parameters and independent variables to a 2-dimensional design space. As a natural consequence of constructing the design space, the original 8 parameters and independent variables give rise to 4 aggregate factors [ρ, κ, xS/(1 + xS), and (1 + xP)/(1 + xS)] (Fig. 1B).

## Local performance.

The local performance in each of the phenotypic regions can be readily compared on the basis of relevant quantitative criteria because the system representation within each phenotypic region is a simple but nonlinear S-system for which determination of local behavior reduces to conventional linear analysis (2, 3). Thus, the behavior involving local (small) variations is completely determined, and there are criteria that can be defined and evaluated to characterize the performance of the system. These criteria are quantified by using logarithmic gains, parameter sensitivities, and response times.

Logarithmic gains in concentrations and fluxes in response to changes in value for an independent variable are defined by the relative derivative of the explicit steady-state solution. For example, using the intermediate concentration x, pathway flux v, and independent substrate concentration xS, representative logarithmic gains for the pathway in Fig. 1A are

Parameter sensitivities of such state variables in response to changes in value for the parameters that define the structure of the system (e.g., Michaelis constants and maximal velocities) are defined by the relative derivative of the explicit steady-state solution. For example,

Response times are determined by the inverse of the real part of the dominant eigenvalue.
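The "relative derivative" used in these definitions can be sketched in general form. The notation L(·,·) and S(·,·) and the generic parameter symbol p are assumptions introduced here for illustration; the region-specific expressions are those summarized in the article's tables.

```latex
L(x, x_S) = \frac{\partial \ln x}{\partial \ln x_S}, \qquad
L(v, x_S) = \frac{\partial \ln v}{\partial \ln x_S}, \qquad
S(x, p)   = \frac{\partial \ln x}{\partial \ln p}
```

Because the steady-state solution within a region is linear in logarithms, each of these derivatives is a constant there, determined by the kinetic orders alone.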

## Results

The analytical expressions for the boundaries and steady states for each phenotypic region (see Fig. 1B) are summarized in Table 1. Note that in addition to the three meaningful steady states there is one unrealistic case. The local performance of designs in the three phenotypic regions that are meaningful can be readily compared on the basis of relevant quantitative criteria (Table 2). The results are summarized in Table 3.

Table 1. Steady-state solutions and boundaries for the four phenotypic regions in the design space of the pathway in Fig. 1A

Table 2. Criteria for the local performance of the pathway in Fig. 1A

Table 3. Comparison of local performance in the three phenotypic regions of Fig. 1B

The two most important criteria are the ability of a change in substrate (Criterion 1) or product (Criterion 2) to increase pathway flux with a minimal increase in intermediate. The designs in phenotypic region 3 are unable to do either; those in regions 1 and 2 respond equally well to changes in substrate, but those in region 2 are unable to respond to changes in product.

Increases in the concentration of the intermediate should be minimized in response to an increase in either substrate (Criterion 3) or product (Criterion 4). The designs in phenotypic region 1 are best in responding to substrate if xS < 1. They also are better than those in phenotypic region 2 in responding to product if xP < 2. This means that the best designs in phenotypic region 1 are located down and to the right, near the boundary with region 2. (Although designs in phenotypic region 3 are best in responding to product, this is irrelevant because pathway flux is unresponsive.)

The robustness of pathway flux (Criterion 5) is best for designs in phenotypic region 2, whereas the robustness of the intermediate concentration (Criterion 6) is equally good for designs in phenotypic regions 1 and 2. Finally, the response time for an increase in substrate (Criterion 7), which in these special cases can be obtained analytically, is best for designs in phenotypic region 1. In summary, the designs in region 3 clearly exhibit the worst local performance. Thus, designs in regions 1 and 2 are the only ones that merit further comparison.

There are two classes of intermediates that need to be considered. When the intermediate only functions as a route to producing product, we shall call it a “monofunctional intermediate.” When the intermediate has that function and serves as an important signaling molecule that influences other processes, we shall call it a “bifunctional intermediate.” The accumulation of a monofunctional intermediate compromises the cell's limited solvent capacity, slows the temporal response, and, for the many instances in which the intermediate is toxic, is damaging to the cell. Thus, it is imperative to minimize accumulation. Under these circumstances, the local performance of designs in phenotypic region 1 is better than or equal to that of designs in phenotypic region 2 (based on 6 of the 7 criteria). However, the concentration of a bifunctional intermediate must be fairly responsive to changes to promote its signaling function. Under these circumstances, the local performance of designs in phenotypic region 2 is better than or equal to that of designs in phenotypic region 1 (based on 5 of the 7 criteria).

“Global tolerance” to large changes in parameters and independent variables is defined as the value at the boundary between adjacent phenotypic regions relative to the normal operating value (or the inverse if the normal value is greater than the value at the boundary). We will use the expression [TD, TI] to describe the global tolerances, where TD = tolerance to a fold decrease and TI = tolerance to a fold increase (because boundaries can be crossed either by decreasing or increasing a parameter or independent variable).
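The definition above amounts to a fold-change ratio that is always at least 1. A minimal sketch follows; the function names, the use of infinity when no boundary is crossed in a given direction, and the numerical values are all assumptions made for illustration and are not taken from Table 4.

```python
import math

def fold_tolerance(nominal, boundary):
    """Fold change (>= 1) separating a nominal value from a boundary value."""
    if nominal <= 0 or boundary <= 0:
        raise ValueError("parameter values must be positive")
    # Take whichever ratio is larger, as in the definition in the text.
    return max(nominal / boundary, boundary / nominal)

def tolerance_pair(nominal, lower_boundary=None, upper_boundary=None):
    """Return [T_D, T_I]: tolerated fold decrease and fold increase.

    A missing boundary is reported as infinity (an assumption of this sketch).
    """
    t_decrease = math.inf if lower_boundary is None else fold_tolerance(nominal, lower_boundary)
    t_increase = math.inf if upper_boundary is None else fold_tolerance(nominal, upper_boundary)
    return [t_decrease, t_increase]

# Hypothetical parameter with nominal value 2.0 and region boundaries at 0.5 and 10.0.
example = tolerance_pair(nominal=2.0, lower_boundary=0.5, upper_boundary=10.0)
print(example)
```

Expressing both directions as fold changes makes tolerances for different parameters directly comparable, regardless of their units.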

For a design with a monofunctional intermediate, the best local performance corresponds to a set of nominal values for the parameters and independent variables that locate the operating point down and to the right in the best phenotypic region (region 1). As a consequence, the global tolerances to changes that would move the design into the poorer phenotypic region (region 2) are smaller than those that would move it into the worst phenotypic region (region 3). Given typical values for such a well-defined system, one can readily determine the global tolerances, which are summarized in Table 4. Thus, a well-designed system has, in addition to good local performance, large global tolerances to changes that degrade performance. This has also been observed for other systems (see Dataset S3) but has not been established in general. We postulate that this will be true of well-adapted systems in nature.

Table 4. Global tolerances to change in parameters and independent variables for a system operating in phenotypic region 1 (see Fig. 1B)

As mentioned before, pathways are one of the three most widespread elements in metabolic networks. The second is the moiety-transfer cycle. This form of coupling between reactions is prevalent in metabolism. For example, of all of the enzyme-catalyzed bisubstrate reactions in the reconstructed metabolic networks of Escherichia coli (7) and Saccharomyces cerevisiae (8), 836 (75%) and 561 (67%), respectively, participate in moiety-transfer cycles. These calculations exclude cycles involving the ubiquitous metabolites H2O and H+, and pairs of forward–reverse reactions. Redundant reactions catalyzed by distinct (iso)enzymes were counted as a single reaction. The construction and analysis of the design space for moiety-transfer cycles are given elsewhere and summarized in Dataset S3.

The third widespread element in metabolic networks is the branch point, a metabolite on which two or more pathways converge or from which two or more pathways diverge. Important decisions are made at these branch points, such as how much each converging pathway will contribute to the overall synthesis or how much each diverging pathway will draw from a common flux. Even higher-level decisions, such as cell fate differentiation, ultimately involve one or more branch point decisions. Given the importance of branch point decisions, it is not surprising that many regulatory mechanisms have been found to modulate them. An abstracted version of several amino acid biosynthetic pathways (9, 10) includes diverging and converging pathways regulated by a well-established pattern of nested control. The construction and analysis of the design space for this branched model are summarized in Dataset S4.

## Discussion

The methodology based on the power–law formalism that we have presented has its foundation in algebraic geometry (11). Thus, the boundaries in design space are straight lines in the log space of independent variables and rate constants. The slopes and intercepts of these lines are rational functions of the kinetic orders. As we have seen, these methods provide well-defined bounds on the number of phenotypic regions in design space. In all of the examples given (pathways, cycles, and branch points), there is a single steady state in each phenotypic region, and all of these are locally stable. The same approach applies to other systems that have unstable or multiple steady states. Because construction of the design space involves only linear algebra, the method will scale numerically to larger problems.

One of the two main motivations for developing the design space concept is to facilitate the identification of design principles. The second is to provide a quantitative measure of global tolerance to (large) parameter variation based on well-defined boundaries between qualitatively distinct phenotypic regions in design space.

The construction of a design space for the pathway in Fig. 1A, and the results from the corresponding analysis, suggest the following design principle. First, the worst phenotype (Fig. 1B, region 3) will be avoided if ρ > xS/(1 + xS). In biological terms, the maximal velocity for the second enzyme should be larger than that of the first (large ρ). The condition κ < [xS(1 + xP)/(1 + xS)²]ρ will produce the optimal monofunctional phenotype (Fig. 1B, region 1), and the reverse will produce the optimal bifunctional phenotype (Fig. 1B, region 2). The results for moiety-transfer cycles (Dataset S3) and for branched systems (Dataset S4) also suggest design principles that are supported by experimental evidence.

The second of the two main motivations for developing the design space concept is to provide a quantitative measure of global tolerance to (large) parameter variation based on well-defined boundaries between qualitatively distinct phenotypic regions in design space. The results in the previous section and in Dataset S3 and Dataset S4 show that these tolerances can be readily calculated and compared. In each case, the results suggest that systems tend to operate in phenotypic regions with good local performance, and that the location of their nominal operating point within these phenotypic regions confers large tolerances to parameter variation, thereby achieving both robust local behavior and large global tolerance.

## Acknowledgments

We thank an anonymous reviewer for insightful comments and suggestions that greatly improved the revised manuscript. This work was supported in part by U.S. Public Health Service Grant R01-GM30054 (to M.A.S.), by Grant PTDC/QUI/70523/2006 and Fellowship SFRH/BPD/945002 from the Portuguese Fundação para a Ciência e Tecnologia (to A.S.), by U.S. Public Health Service Training Grant T32-EB003827 (to D.A.T.), by an Earl C. Anthony fellowship (to R.A.F.), and by Fellowship SFRH/BD/8304/2002 from the Portuguese Fundação para a Ciência e Tecnologia (to P.M.B.M.C.).

## Footnotes

1To whom correspondence should be addressed. E-mail: masavageau{at}ucdavis.edu

Author contributions: M.A.S., P.M.B.M.C., and A.S. designed research; M.A.S., P.M.B.M.C., and D.A.T. performed research; M.A.S., P.M.B.M.C., R.A.F., D.A.T., and A.S. analyzed data; and M.A.S., P.M.B.M.C., R.A.F., D.A.T., and A.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0809869106/DCSupplemental.

## References

1. Savageau MA (1989) in Theoretical Biology: Epigenetic and Evolutionary Order, Are there rules governing patterns of gene regulation? eds Goodwin BC, Saunders PT (Edinburgh Univ Press, Edinburgh), pp 42–66.
2. Savageau MA (2001) Design principles for elementary gene circuits: Elements, methods, and examples. Chaos 11:142–159.
3. Savageau MA (1976) Biochemical Systems Analysis: A Study of Function and Design in Molecular Biology (Addison–Wesley, Reading, MA).
4. Brenner M, Ames BN (1971) in Metabolic Pathways, The histidine operon and its regulation, ed Vogel HJ (Academic, New York) Metabolic Regulation, 3rd Ed, Vol V, pp 349–387.
5. Smith GR, Magasanik B (1971) Nature and self-regulated synthesis of the repressor of the hut operons in Salmonella typhimurium. Proc Natl Acad Sci USA 68:1493–1497.
6. Fromm HJ (1975) Initial Rate Enzyme Kinetics (Springer, New York).
7. Edwards JS, Palsson BO (2000) The Escherichia coli MG1655 in silico metabolic genotype: Its definition, characteristics, and capabilities. Proc Natl Acad Sci USA 97:5528–5533.
8. Forster J, Famili I, Fu P, Palsson BO, Nielsen J (2003) Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res 13:244–253.
9. Stadtman ER (1963) Symposium on multiple forms of enzymes and control mechanisms. II. Enzyme multiplicity and function in the regulation of divergent metabolic pathways. Bacteriol Rev 27:170–181.
10. Truffa-Bachi P, Cohen GN (1968) Some aspects of amino acid biosynthesis in microorganisms. Annu Rev Biochem 37:79–108.
11. Cox DA, Little JB, O'Shea D (2005) Using Algebraic Geometry (Springer, New York), 2nd Ed.