Edited by John Ross, Stanford University, Stanford, CA, and approved February 12, 2009 (received for review October 1, 2008)

## Abstract

Cellular decision making in differentiation, proliferation, or cell death is mediated by molecular signaling processes, which control the regulation and expression of genes. Vice versa, the expression of genes can trigger the activity of signaling pathways. We introduce and describe a statistical method called Dynamic Nested Effects Model (D-NEM) for analyzing the temporal interplay of cell signaling and gene expression. D-NEMs are Bayesian models of signal propagation in a network. They decompose observed time delays of multiple-step signaling processes into single steps. Time delays are assumed to be exponentially distributed. Rate constants of signal propagation are model parameters, whose joint posterior distribution is assessed via Gibbs sampling. They hold information on the interplay of different forms of biological signal propagation. Molecular signaling in the cytoplasm acts at high rates, direct signal propagation via transcription and translation acts at intermediate rates, while secondary effects operate at low rates. D-NEMs allow the dissection of biological processes into signaling and expression events, and analysis of cellular signal flow. An application of D-NEMs to embryonic stem cell development in mice reveals a feed-forward loop dominated network, which stabilizes the differentiated state of cells and points to Nanog as the key sensitizer of stem cells for differentiation stimuli.

Keywords: perturbation data, network reconstruction

Intracellular signaling processes control the activity of transcription factors and the expression of genes. Changes in gene expression can activate further signaling processes, leading to secondary effects, which themselves give rise to tertiary effects and so on. The result is an intricate interplay of cell signaling and gene regulation. Whereas protein modification in the cytoplasm can propagate signals in seconds, transcription and translation processes last hours, and secondary effects often become visible only after days. Our goal is to model the temporal interplay of signaling and expression in complex biological processes involving several signaling pathways and spanning multiple rounds of cell signaling, gene regulation, and gene expression.

Numerous statistical methods have been suggested for the analysis and reconstruction of regulatory networks. Among the most widely used are relevance networks (1), graphical Gaussian models (2, 3), methods from information theory (4), Bayesian networks (5), including dynamic Bayesian networks (6), and methods based on ordinary differential equations (7, 8). All of these methods employ purely observational data, where the network was not perturbed experimentally. Simulation (9, 10) and experimental studies (9, 11) show that perturbation experiments improve performance in network reconstruction. Rung et al. (12) built a directed disruption graph by connecting two genes where perturbation of the first gene resulted in expression changes in the other gene. However, disruption networks do not separate direct from indirect effects. Wagner (13) uses transitive reductions to find parsimonious subgraphs explaining a disruption network. The framework of Bayesian networks was also extended to account for perturbation data (14, 15). Yeang et al. (16) searched for topologies that are consistent with observed downstream effects of interventions. Although this algorithm is not confined to the transcriptional level of regulation, it requires that most signaling genes show effects when perturbing others.

The method described here builds on Nested Effects Models (NEMs), which have been proposed by Markowetz et al. (15) for the analysis of nontranscriptional signaling networks. NEMs infer the graph of upstream/downstream relations for a set of signaling genes from perturbation effects. Because nontranscriptional signaling is too fast to be analyzed by delays of downstream effects, time series have not been used in this approach. This changes when analyzing slow-going biological processes like cell differentiation.

Following Markowetz et al. (15), we call the perturbed genes S-genes for signaling genes and denote them by S = S1,…,Sn. The genes that change expression after perturbation are called E-genes and we denote them by E = E1,…,EN. We further denote the set of E-genes displaying expression changes in response to the perturbation of Si by Di. In a nutshell, NEMs infer that S1 acts upstream of S2 if all downstream effects of a perturbation in S2 can also be triggered by perturbing S1, that is, if D2 is a subset of D1. This suggests that the perturbation of S1 causes a perturbation of S2 and acts upstream of S2. The graph of upstream/downstream relations is estimated from the nested structure of downstream effects. Due to noise in the data, we do not expect strict super-/subset relations. Instead, NEMs recover rough nesting.
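The nesting idea can be illustrated with a small sketch. This is not the NEM scoring function, which is a Bayesian posterior; it is only a hypothetical overlap score, with made-up E-gene labels, showing how rough nesting of the sets Di hints at the direction of an upstream/downstream relation.

```python
def nesting_score(d_up, d_down):
    """Fraction of d_down's effects that also appear in d_up (1.0 = strict subset)."""
    d_up, d_down = set(d_up), set(d_down)
    if not d_down:
        return 1.0
    return len(d_up & d_down) / len(d_down)

# Hypothetical downstream-effect sets D_i; the E-gene labels are illustrative only.
D = {
    "S1": {"E1", "E2", "E3", "E4"},
    "S2": {"E2", "E3", "E5"},  # E5 plays the role of a noisy observation
}

score_12 = nesting_score(D["S1"], D["S2"])  # how well D_2 nests inside D_1
score_21 = nesting_score(D["S2"], D["S1"])  # how well D_1 nests inside D_2
```

Here `score_12` exceeds `score_21`, so the data are more compatible with S1 acting upstream of S2 than with the reverse, even though the subset relation is not strict.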

In the Bayesian framework of Markowetz et al. (15), networks are scored by posterior probabilities. By enumerating all network topologies, the maximum posterior network is chosen. The exhaustive search limits the method to small networks of up to 8 S-genes. Greedy search heuristics (17, 18) and divide-and-conquer approaches (17, 19) enable the analysis of larger networks with hundreds of S-genes. The latter divide the graph into smaller units, use exhaustive enumeration for each subgraph, and then reassemble the complete network. The division into subgraphs can either be into all pairs or triples of nodes (19) or data-dependent into coherent modules (17). For a review and software see articles by Froehlich et al. (20, 21).

Note that there is a difference between the upstream/downstream relations of a network and the actual signal flow. If S1 is upstream of S2 and S2 is upstream of S3, consistency requires that S1 is also upstream of S3. In fact, all proposed methods except ref. 18 confine the model space to transitively closed graphs. Although the consistency argument is valid for upstream/downstream relations, it does not hold for signal flows. Assume we have a linear cascade of S-genes where the signal flows from S1 via S2 to S3. Whether there is an alternative signal flow from S1 directly to S3 does not follow from upstream/downstream relations. However, evidence of the alternative signal flow comes from time delays of downstream effects. If the time spent to propagate a downstream effect from S1 to S2 plus the time spent to propagate it from S2 to S3 is larger than the time needed to propagate the effect from S1 to S3, then there must exist an alternative shortcut pathway from S1 to S3. Thus, temporal expression measurements yield additional insight into the cellular signal flow.
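A minimal sketch of why hierarchy alone cannot settle this question: a linear cascade and the same cascade with a shortcut edge have identical transitive closures. The helper `transitive_closure` and the edge sets below are illustrative assumptions, not part of any NEM software.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of directed edges (a, b)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a -> b and b -> d imply a -> d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

cascade = {("S1", "S2"), ("S2", "S3")}
with_shortcut = cascade | {("S1", "S3")}

# Both signal-flow graphs imply the same upstream/downstream relations.
same_hierarchy = transitive_closure(cascade) == transitive_closure(with_shortcut)
```

Because `same_hierarchy` is true, only additional information such as time delays can distinguish the two signal-flow graphs.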

Fig. 1 illustrates the idea of D-NEMs in an elementary example. Shown is the hierarchical structure of a network and discrete time series data for three E-genes. A one indicates that a signal has reached the E-gene, while a zero indicates that the expression of this gene has not yet changed. Note that the graph topology is consistent with the nested structure of ones in the final time point t5, shown in red.

Fig. 1. Elementary example of a D-NEM. Shown is a network of three S-genes together with binary time series tables for typical E-genes connected to the S-genes. Each table holds three rows corresponding to the three possible perturbation experiments of S-genes. A one in column ti, row Sj of table Ek represents the observation of a downstream effect in Ek, ti time units after perturbation of Sj.

Signals starting in S1 reach E2 one time unit after they have arrived at E1, suggesting that signal propagation from S1 to S2 takes one unit of time. The same argument using the data from perturbation of S2 suggests that it takes two time units to propagate from S2 to S3. Consequently, going from S1 to S3 via S2 takes 3 time units. However, the time delay from perturbation of S1 to observing effects in E3 is only 1 time unit (marked in blue). This suggests the existence of a direct signal flow from S1 to S3. Evidence comes from the two blue ones. If they were zeros, the time delay between S1 and S3 would have been the sum of the times spent when going via S2. In this case, there would be no evidence for a shortcut pathway and we would decide on the more parsimonious graph. A real-world analysis is more difficult than the toy example. Signal propagation is a stochastic process, measurements are prone to noise, and we do not know which E-genes are controlled by which S-genes. These sources of uncertainty are addressed by D-NEMs.
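The arithmetic of the toy example can be retraced in a few lines. The binary tables below are hypothetical values chosen to match the delays described in the text (the figure itself is not reproduced here), and `first_hit` is an illustrative helper.

```python
def first_hit(series):
    """1-based index of the first 1 in a binary time series, or None."""
    for t, value in enumerate(series, start=1):
        if value == 1:
            return t
    return None

# Hypothetical tables: tables[E-gene][perturbed S-gene] -> series over t1..t5.
tables = {
    "E1": {"S1": [1, 1, 1, 1, 1]},
    "E2": {"S1": [0, 1, 1, 1, 1], "S2": [1, 1, 1, 1, 1]},
    "E3": {"S1": [0, 1, 1, 1, 1], "S2": [0, 0, 1, 1, 1]},
}

hit = {e: {s: first_hit(x) for s, x in rows.items()} for e, rows in tables.items()}

delay_12 = hit["E2"]["S1"] - hit["E1"]["S1"]  # S1 -> S2
delay_23 = hit["E3"]["S2"] - hit["E2"]["S2"]  # S2 -> S3
delay_13 = hit["E3"]["S1"] - hit["E1"]["S1"]  # S1 -> S3 as observed
via_s2 = delay_12 + delay_23                  # delay expected without a shortcut
shortcut_evidence = delay_13 < via_s2
```

With these values the route via S2 would take 3 time units while the observed delay is 1, so `shortcut_evidence` is true. The sketch ignores noise and stochastic delays, which is exactly what the full model is meant to handle.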

We assume exponentially distributed time delays for individual signal propagation steps. The rate constants of the exponential distributions differ from case to case and are the main parameters of the model. All edges of a transitively closed network are associated with an individual rate constant, whose posterior distribution is inferred by using Gibbs sampling. As explained before, molecular signaling in the cytoplasm occurs at high rates, direct signal propagation via transcription and translation at intermediate rates, and secondary effects at low rates. The joint posterior of the rate constants will be used to analyze the interplay of signaling networks and gene expression in complex biological processes. It is also used to unravel molecular signal flow in cells.

## Model

The input of a D-NEM consists of (i) a set of microarray time series that measure the response of cells to molecular perturbations, and (ii) a transitively closed directed graph on vertex set S representing a hypothetical hierarchical structure of upstream/downstream relations. The output consists of (i) the joint posterior distribution of rate constants describing the dynamics of signal propagation, and (ii) a not necessarily transitive subgraph of the input graph that describes signal flow rather than hierarchical structure. Let D(i,k,l,s) denote the expression measurement of Ek at time point ts of the l-th replicate of a time series recorded after perturbation of Si. Following Markowetz et al. (19), we assume that the data are binary, where zero encodes the wild-type expression level of a gene, and one encodes that the expression of this E-gene changed because of perturbation-induced signal propagation.

We assume that the time spent for propagating a signal from node Si to node Sj is exponentially distributed with a rate constant kij. Note that the expected time spent in this step of signal transduction is 1/kij. Fast processes are associated with high rate constants, whereas slow processes are associated with small rate constants. Exponential distributions are widely used to model temporal processes in complex systems (22, 23).

We do not observe the time spent for signal propagation between S-genes directly. Instead, we observe the time delay between a perturbation of an S-gene and the occurrence of downstream effects in E-genes. Following Markowetz et al. (19) we introduce parameters Θ = (θ1,…,θN) to link E- to S-genes. If θk = i, then Ek is linked to Si. Moreover, we assume that every E-gene is linked to a single S-gene. The set of E-genes attached to the same S-gene is a regulatory module under the common regulatory control of the S-gene. The module of E-genes attached to Si is denoted by Ei. Finally, we introduce additional rate constants kiE that represent the time delay between activation of Si and regulation of its target module Ei. A single common rate is used for all E-genes in the module. Note that the exponential distributions of time delays nevertheless yield model flexibility to accommodate potential variability across absolute time delays for S-gene to E-gene signal propagation within the same target module. Note also that for exponentially distributed time delays, the reciprocal rate constant of the distribution is equal to the average time delay.

Following ideas in Tresch and Markowetz (18), we add an additional node denoted by +, which is not connected to any of the S-genes. However, E-genes can be linked to this node if they do not fit in any of the Ei. The +-node implicitly selects E-genes. Genes linked to + are excluded from the model. We denote the complete set of rate constants including rates between S-genes and rates between S- and E-genes by K.

Although the θk are discrete parameters by nature, rate constants are usually modeled as continuous parameters. However, for the sake of computational efficiency, we confine the rates to a discrete set of values denoted by (κ0,…,κT+1). If the data include time points (t1,…,tT), we pick (κ0,1/t1,…,1/tT,κT+1), where κ0 is set to a high value (i.e., 1,000) that represents the very fast signal transduction through posttranslational protein modification like phosphorylation. Moreover, κT+1 is set to a value close to zero, indicating that no signal at all flows through this edge; such edges can be excluded from the network. Overall, we have a set of discrete parameters (K,Θ).
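A minimal sketch of how such a rate grid could be built from the sampling times. The value 1,000 for κ0 is taken from the text; the value `1e-6` for κT+1 and the function name `rate_grid` are assumptions made here for illustration, since the text only says "close to zero".

```python
def rate_grid(time_points, fast_rate=1000.0, off_rate=1e-6):
    """Discrete rate values (kappa_0, 1/t_1, ..., 1/t_T, kappa_{T+1})."""
    # off_rate is an assumed stand-in for "a value close to zero".
    return [fast_rate] + [1.0 / t for t in time_points] + [off_rate]

# Example with hypothetical sampling times in days.
grid = rate_grid([1, 2, 4])

# For exponential delays the mean delay is the reciprocal of the rate.
mean_delays = [1.0 / k for k in grid[1:-1]]
```

Each interior grid value corresponds to an average delay equal to one of the observation times, which is what makes the grid interpretable against the measured time series.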

## Prior Distributions.

Assuming independent prior distributions for K and Θ, Bayes's theorem yields P(Θ,K|D) = P(D|K,Θ)P(K)P(Θ)/P(D). The prior distribution P(Θ) can be chosen to incorporate prior knowledge on the interactions of S- with E-genes. Such information might be derived from ChIP data or regulatory motif analysis. The prior provides an interface through which the model can be linked to different biological data types in integrative modeling approaches. Here we use the prior for calibrating E-gene selection. We set P(θk = +) to Δ, while distributing the remaining weight of 1−Δ uniformly on the values 1,…,n.

Similarly, the prior distribution P(K) yields an interface for incorporating biological knowledge. If one knows that S1 and S2 belong to the same molecular signaling pathway, one can set P(k12 = κ0) to one, because signaling will operate at a high rate. In this article we exploit the fact that transcription takes hours and set P(kiE = κ0) to zero while assuming a uniform prior for the remaining values. Moreover, we set the prior probability that a given transitive edge is absent to P(kij = κT+1) = 0.5, while again assuming a uniform prior for the remaining values.
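The three priors described above can be written down as small probability tables. This is a sketch under stated assumptions: rate values are indexed 0,…,T+1 in grid order, the function names are hypothetical, and the value Δ = 0.1 in the example is made up.

```python
def theta_prior(n_sgenes, delta):
    """P(theta_k): mass delta on the '+' node, the rest uniform on S-genes 1..n."""
    prior = {i: (1.0 - delta) / n_sgenes for i in range(1, n_sgenes + 1)}
    prior["+"] = delta
    return prior

def e_rate_prior(n_grid):
    """P(k_iE): zero mass on kappa_0 (index 0), uniform on the other grid values."""
    return [0.0] + [1.0 / (n_grid - 1)] * (n_grid - 1)

def transitive_edge_prior(n_grid, p_absent=0.5):
    """P(k_ij) for a transitive edge: p_absent on kappa_{T+1}, uniform elsewhere."""
    return [(1.0 - p_absent) / (n_grid - 1)] * (n_grid - 1) + [p_absent]

theta = theta_prior(n_sgenes=3, delta=0.1)   # delta = 0.1 is an assumed value
k_e = e_rate_prior(n_grid=5)
k_edge = transitive_edge_prior(n_grid=5)
```

Keeping each prior as an explicit table makes it easy to swap in informed values, for example from ChIP data, without touching the rest of the model.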

## Likelihood.

Let us first consider a fixed linear path g in Φ, which connects the S-gene Si with the E-gene Ek:

$$S_i \xrightarrow{k_1} \cdots \xrightarrow{k_q} E_k,$$

where for simplicity of notation we reduce the double indices of rate constants to single indices and write k1,k2,…,kq to denote the rate constants. We are interested in the time needed for propagating a signal from Si down the path to Ek. More precisely, we want to calculate the probability that the signal has reached Ek before some fixed time point t*. If Zg is the sum of q independent and exponentially distributed random variables with rate constants k1,…,kq, then this probability equals P(Zg < t*). The density function of Zg is given by the convolution of independent exponential distributions

$$f_{Z_g} = \psi_1 * \psi_2 * \cdots * \psi_q,$$

where ψu(τ) = ku exp(−kuτ) is the density of an exponential with rate ku. Laplace transformation yields a closed form for the cumulative distribution function of Zg:

$$P(Z_g < t^*) = 1 - \sum_{u=1}^{q} \Bigl( \prod_{v \neq u} \frac{k_v}{k_v - k_u} \Bigr) e^{-k_u t^*}. \tag{1}$$

Note that the right-hand side is not defined if two or more of the ku are identical. However, as right and left limits exist and are identical, we can evaluate the probability by adding tiny distinct jitter values to the ku.
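The closed form for a sum of independent exponentials with pairwise distinct rates (the hypoexponential distribution) can be sketched as follows. The multiplicative jitter scheme and its size are assumptions made for this illustration; the text only states that tiny distinct jitter values are added.

```python
import math

def path_cdf(rates, t, jitter=1e-6):
    """P(Z_g < t) for a sum of independent exponential delays with given rates."""
    # Make the rates pairwise distinct; the closed form divides by differences.
    ks = [k * (1.0 + jitter * i) for i, k in enumerate(rates)]
    total = 0.0
    for u, ku in enumerate(ks):
        coeff = 1.0
        for v, kv in enumerate(ks):
            if v != u:
                coeff *= kv / (kv - ku)
        total += coeff * math.exp(-ku * t)
    return 1.0 - total

# Hypothetical two-step path with equal rates of one per time unit.
p_equal = path_cdf([1.0, 1.0], t=1.0)
```

For two equal rates the exact answer is the Erlang value 1 − 2/e, and the jittered evaluation agrees with it to several digits. With many nearly identical rates the differences in the denominators become tiny, so a careful implementation would watch for cancellation errors.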

In the general case a signal can be propagated from Si to Ek via multiple alternative paths. In this case we assume that the fastest path determines the time delay for downstream effects to be seen. We enumerate all linear paths connecting Si to Ek. For each path we construct a random variable Zu as described above. If the alternative paths do not share edges, the probability that the signal has arrived at Ek before time t* via at least one of the paths is given by

$$P\bigl(\min_u Z_u < t^*\bigr) = 1 - \prod_u \bigl(1 - P(Z_u < t^*)\bigr). \tag{2}$$

In the general case, paths share edges, which leads to dependencies of signal propagation times. Nevertheless, simulations show that Eq. 2 is a good approximation of the distribution of time delays, except maybe in some very unfortunate topological constellations. It is an approximation based on the assumption that the interactions among merging pathways can be neglected, similar to the mean-field approximation from many-body theories in statistical physics. In the examples discussed below the approximation error does not affect any of the conclusions (simulation data not shown).
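Path enumeration and the combination of per-path arrival probabilities under the independence assumption can be sketched as below. The graph, the helper names, and the per-path probabilities are all hypothetical; the enumeration assumes an acyclic graph, so bidirectional edges would need separate handling.

```python
def simple_paths(graph, start, goal, prefix=None):
    """Enumerate all directed paths from start to goal in an acyclic graph."""
    prefix = (prefix or []) + [start]
    if start == goal:
        return [prefix]
    paths = []
    for nxt in graph.get(start, []):
        paths.extend(simple_paths(graph, nxt, goal, prefix))
    return paths

def any_path_cdf(path_probs):
    """Probability that at least one path has delivered the signal,
    treating the paths as independent."""
    miss = 1.0
    for p in path_probs:
        miss *= 1.0 - p
    return 1.0 - miss

# Hypothetical network: cascade S1 -> S2 -> S3 plus shortcut S1 -> S3.
graph = {"S1": ["S2", "S3"], "S2": ["S3"], "S3": ["E3"]}
paths = simple_paths(graph, "S1", "E3")

# Made-up arrival probabilities for the two paths at some fixed time point.
p_total = any_path_cdf([0.3, 0.6])
```

The two paths found here share the final edge from S3 to E3, so treating them as independent is exactly the kind of approximation the text describes.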

Eqs. 1 and 2 describe the stochastic nature of signal propagation in the cell. Note that by Eq. 2 the average overall time delay between Si and Ek is smaller than the average time delay associated with the fastest path connecting them because, with some positive probability, the process that is slower on average will actually be the faster one. This speedup-by-stochasticity effect is a consequence of the stochastic nature of time delays.
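As a worked illustration of this speedup, assume (hypothetically) two edge-disjoint single-step paths with rates $k_a \ge k_b > 0$ and independent exponential delays $Z_a$ and $Z_b$. Combining them as independent alternatives gives

$$P\bigl(\min(Z_a, Z_b) < t\bigr) = 1 - e^{-k_a t}\, e^{-k_b t} = 1 - e^{-(k_a + k_b)\, t}, \qquad \mathrm{E}\bigl[\min(Z_a, Z_b)\bigr] = \frac{1}{k_a + k_b} < \frac{1}{k_a}.$$

For example, with $k_a = 1$ and $k_b = 0.5$ per day, the faster path alone has an average delay of 1 day, whereas the average delay of the combined process is 2/3 of a day.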

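Before calculating the likelihood, we need to consider a second source of stochasticity, namely measurement error. Following Markowetz et al. (19), we denote the probabilities for false positive and false negative signals by α and β, respectively. Writing p_iks for the probability from Eq. 2 that a signal starting in Si has reached Ek before time ts, and assuming conditional independence, the likelihood factorizes into

$$P(D \mid K, \Theta) = \prod_{D(i,k,l,s)=1} \bigl[(1-\beta)\, p_{iks} + \alpha\, (1 - p_{iks})\bigr] \prod_{D(i,k,l,s)=0} \bigl[\beta\, p_{iks} + (1-\alpha)\, (1 - p_{iks})\bigr],$$

where the first product is over all data points for which we observe a downstream effect, and the second product is over those for which we do not.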
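A sketch of how false positive and false negative rates enter a log-likelihood for binary observations. The function name, the data layout as `(d, p)` pairs, and the example values of α and β are assumptions made for illustration; `p` stands for the model probability that the signal has arrived by the observation time.

```python
import math

def log_likelihood(observations, alpha, beta):
    """Log-likelihood of binary data under false positive rate alpha
    and false negative rate beta.

    observations: iterable of (d, p), d the observed 0/1 value and
    p the model probability that the signal has arrived.
    """
    total = 0.0
    for d, p in observations:
        # A one is seen either as a true effect that was not missed,
        # or as a false positive when no effect is present.
        p_one = (1.0 - beta) * p + alpha * (1.0 - p)
        total += math.log(p_one if d == 1 else 1.0 - p_one)
    return total

# Hypothetical data points and error rates.
ll = log_likelihood([(1, 0.9), (0, 0.1), (1, 0.2)], alpha=0.1, beta=0.2)
```

As long as α and β are strictly between 0 and 1, every observation has nonzero probability, so the logarithm never sees a zero.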

## Gibbs Sampling.

With N E-genes, n S-genes, and L edges in the input graph, the model comprises N + n + L discrete parameters. For simplicity of notation, we reduce the double indices of rate constants to single indices such that the joint posterior is written

$$P(k_1, \ldots, k_{n+L}, \theta_1, \ldots, \theta_N \mid D).$$

We initialize the parameters with random values from their domains. Then we iteratively cycle through all rate constants, updating them by sampling from the conditional posterior distributions

$$P(k_i \mid k_1, \ldots, k_{i-1}, k_{i+1}, \ldots, k_{n+L}, \theta_1, \ldots, \theta_N, D).$$

With only discrete parameters, updating is straightforward. We calculate all values

$$P(D \mid k_1, \ldots, k_i = \kappa_j, \ldots, k_{n+L}, \Theta)\, P(k_i = \kappa_j), \qquad j = 0, \ldots, T+1,$$

normalize them to sum up to one, and draw a new value for ki from this distribution. The iteration is completed by similarly updating all θk. In the supporting information (SI) Appendix, we analyze the convergence and mixing properties of the Gibbs sampler. In general, convergence is fast and scale reduction factors between 1 and 1.1 are reached after a burn-in of 500 iterations. We typically start 2 independent runs of the Gibbs sampler with random start points, discard the first 500 iterations in each trajectory, and combine the remaining samples for further inference of signal propagation. Choosing positive values for the tuning parameters α and β protects the conditional posterior distributions from singularities and ensures the good convergence properties of the Gibbs sampler.
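A single Gibbs update of one discrete parameter can be sketched as follows: evaluate the unnormalized conditional posterior at every admissible value, normalize, and draw. The function `gibbs_update` and the toy log-posterior are hypothetical; in the model the log-posterior would be the log-likelihood plus the log-prior with all other parameters held fixed.

```python
import math
import random

def gibbs_update(log_post, values, rng):
    """Draw a new value for one discrete parameter.

    log_post(v) returns the unnormalised log conditional posterior at v.
    Returns the drawn value and the normalised probabilities.
    """
    logs = [log_post(v) for v in values]
    top = max(logs)  # subtract the maximum to avoid underflow
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(values, weights=probs, k=1)[0], probs

# Toy example on a hypothetical rate grid with a made-up log-posterior.
grid = [1000.0, 1.0, 0.5, 1e-6]
rng = random.Random(0)
new_value, probs = gibbs_update(lambda k: -abs(math.log(k)), grid, rng)
```

Working on the log scale and subtracting the maximum keeps the normalization stable even when likelihood values differ by many orders of magnitude. A full sampler would loop this update over all rate constants and all θk in every iteration.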

## Inference of Signal Flow.

Under the natural assumption that perturbation effects propagate down the signaling network to all descendants of a perturbed gene, the nested structure of downstream effects resolves the network only up to its transitivity class. Network topologies with identical transitive closures produce the same nesting of downstream effects and, hence, cannot be distinguished. As explained above, temporal data hold the potential of further resolving these transitivity classes. D-NEMs start from a transitively closed network. Posterior distributions are calculated across a discrete set of rate constants including a very small rate constant κT+1. As explained above, kij = κT+1 reflects a network constellation in which no signal is flowing through the edge from Si to Sj. Note that if a rate constant is set to κT+1, the corresponding edge does not contribute to the likelihood according to Eq. 2. The edge is effectively excluded from the model. Hence, in addition to estimating average time delays, the Gibbs sampling procedure facilitates network refinement. If the posterior probability that no signal flows through the edge from Si to Sj satisfies P[kij=κT+1|D] > 0.6, we exclude the edge from the network.
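The refinement rule can be sketched by estimating the posterior mass on κT+1 from sampled rate values. The sample layout, the numeric stand-in for κT+1, and the function name `edges_to_keep` are assumptions for illustration; the threshold 0.6 is the one stated in the text.

```python
def edges_to_keep(samples, off_rate, threshold=0.6):
    """Keep edges whose estimated posterior mass on the 'no signal' rate
    does not exceed the threshold.

    samples: dict mapping edge -> list of sampled rate values (after burn-in).
    """
    kept = []
    for edge, draws in samples.items():
        p_off = sum(1 for k in draws if k == off_rate) / len(draws)
        if p_off <= threshold:
            kept.append(edge)
    return kept

OFF = 1e-6  # assumed numeric stand-in for kappa_{T+1}
samples = {
    ("S1", "S2"): [1.0, 1.0, 0.5, OFF],  # mostly carries signal
    ("S1", "S3"): [OFF, OFF, OFF, 1.0],  # mostly switched off
}
kept = edges_to_keep(samples, off_rate=OFF)
```

Because the rates live on a discrete grid, comparing sampled values with `==` is safe here; with continuous rates one would instead threshold on a small value.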

Because of the long running times of the Gibbs sampler it is not possible to reconstruct the network topology from scratch as was done for standard NEMs in refs. 18–20. Nevertheless, we use our method to discriminate between small numbers of candidate topologies. Model selection in the Bayesian context is based on Bayes factors and requires the computation of marginal likelihoods. This is known to be a hard problem, and approximative methods are therefore adopted. Here, we use the deviance information criterion (DIC) of Spiegelhalter et al. (24).
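The standard definition of the DIC is the posterior mean deviance plus the effective number of parameters, where the deviance is −2 times the log-likelihood. The sketch below computes it from sampled log-likelihoods; how a point estimate is chosen for discrete parameters is not stated in the text, so `log_lik_at_estimate` is left as an input and the numbers are made up.

```python
def dic(log_liks, log_lik_at_estimate):
    """Deviance information criterion from sampled log-likelihoods.

    log_liks: log-likelihood at each retained posterior sample.
    log_lik_at_estimate: log-likelihood at a posterior point estimate
    (the choice of estimate is an assumption left to the caller).
    """
    mean_dev = -2.0 * sum(log_liks) / len(log_liks)  # posterior mean deviance
    dev_at_est = -2.0 * log_lik_at_estimate          # deviance at the estimate
    p_d = mean_dev - dev_at_est                      # effective number of parameters
    return mean_dev + p_d

# Made-up numbers for two candidate topologies; lower DIC is preferred.
dic_a = dic([-10.0, -12.0], log_lik_at_estimate=-9.0)
dic_b = dic([-14.0, -16.0], log_lik_at_estimate=-13.0)
preferred = "A" if dic_a < dic_b else "B"
```

The criterion only needs quantities that the sampler already produces, which is what makes it a practical substitute for marginal likelihoods.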

A first test of a complex data model is to validate its performance in simulation scenarios where data are artificially generated according to the model assumptions. In SI Appendix we show that our model recovers average time delays in noisy data and detects transitive shortcut edges even in situations where noise is high and differences in average time delay are subtle.

## Application to Murine Stem Cell Development.

We apply the D-NEM approach to a dataset on molecular mechanisms of self-renewal in murine embryonic stem cells. Ivanova et al. (25) used RNA interference techniques to down-regulate six gene products associated with self-renewal regulatory function, namely Nanog, Oct4, Sox2, Esrrb, Tbx3, and Tcl1. They combined perturbation of these gene products with time series of microarray gene expression measurements. Mouse embryonic stem cells (ESCs) were grown in the presence of the leukemia inhibitory factor LIF, thus retaining their undifferentiated self-renewing state (positive controls). Cell differentiation associated changes in gene expression were detected by inducing differentiation of stem cells through removing LIF and adding retinoic acid (RA) (negative controls). Finally, RNAi-based silencing of the 6 regulatory genes was used in (LIF+, RA−) cell cultures to investigate whether silencing of these genes partially activates cell differentiation mechanisms. Time series at 6-7 time points in one-day intervals were taken for the positive control culture (LIF+, RA−), the negative control culture (LIF−, RA+), and the six RNAi assays. In the context of the D-NEM framework the 6 regulatory gene products Nanog, Oct4, Sox2, Esrrb, Tbx3, and Tcl1 are S-genes, whereas all genes showing significant expression changes in response to LIF depletion are used as E-genes. Downstream effects of interest are those where the expression of an E-gene is pushed from its level in self-renewing cells to its level in differentiated cells. Our goal is to model the temporal occurrence of these effects across all time series simultaneously.

In a comparison of the (LIF+, RA−) to the (LIF−, RA+) cell cultures, 137 genes showed a >2-fold up- or down-regulation across all time points. These were used as E-genes in our analysis. The two time series without RNAi were used to discretize the time series of perturbation experiments following a simple discretization method detailed in SI Appendix, thereby setting an E-gene state to 1 in an RNAi experiment if its expression value is far from the positive controls, and 0 otherwise. Genes that did not show any 1 after discretization across all experiments were removed, leaving 122 E-genes for further analysis.

D-NEMs assume that once a perturbation effect has reached an E-gene, it persists until the end of the time series. In other words, a one at time point t indicates that a downstream effect has reached the E-gene prior to t and not that it is still observable at this time. Hence, a typical discretized time series starts with zeros, eventually switches to ones, and then stays one until the end of the series. We refer to these patterns as admissible patterns. For the vast majority of E-genes, the discretized data roughly followed admissible patterns. Nevertheless, exceptions were observed, most likely due to measurement noise. We replaced the time series for each gene by the closest admissible pattern, based on edit distances. In the case where several admissible patterns had the same edit distance to the time series, we chose the pattern holding the most ones. These curated data were used in further analysis.
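The curation step can be sketched as a search over the few admissible patterns of a given length. As an assumption, the edit distance between equal-length binary series is taken here to be the number of differing positions (substitutions only); the function name is hypothetical.

```python
def closest_admissible(series):
    """Closest pattern of the form 0...01...1 to a binary series.

    Distance is the number of differing positions; ties are broken
    in favour of the pattern holding the most ones.
    """
    n = len(series)
    best = None
    for zeros in range(n + 1):  # zeros = number of leading zeros
        pattern = [0] * zeros + [1] * (n - zeros)
        dist = sum(a != b for a, b in zip(series, pattern))
        key = (dist, zeros)     # fewer leading zeros means more ones
        if best is None or key < best[0]:
            best = (key, pattern)
    return best[1]

# A noisy hypothetical series: two patterns are at distance 1,
# and the one with more ones is chosen.
curated = closest_admissible([0, 1, 0, 1, 1])
```

For a series of length T there are only T + 1 admissible patterns, so the exhaustive search is cheap even for many E-genes.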

Since long computation times for Gibbs sampling prohibit the reconstruction of the network's topology from scratch by using D-NEMs, we used the triplet search approach for the standard nested effects approach (19) applied to the final time point to determine a topology for the network. Note that the final time point of an admissible pattern accumulates information along the time series, because it reports a one whenever a downstream signal has reached the E-gene at any time. The binary data of the last time point across all S-gene perturbations are shown in Fig. 2A, while Fig. 2B shows the reconstructed network. A nested structure is visible. For example, the top 4 rows in Fig. 2A show a staircase-like pattern of nested sets consistent with the linear cascade Nanog → Sox2 → Oct4 → Tcl1. We refer to this cascade as the inner backbone of the network. The discretization step involves a cutoff parameter, whose value has some influence on the derived network. Although some edges can vary for different parameter settings, key features of the network like the inner backbone are not affected. All inference on stem cell development in the rest of this article is exclusively based on stable substructures of the network. SI Appendix gives the details of our network stability analysis.

Fig. 2. Stem cell data analysis. (A) Discretized data of the last time point across E-genes (rows) and S-gene perturbations (columns), with black representing downstream effects and white no effects. (B) The transitively closed nested effects model estimated from the data shown in A using static NEM. (C) A histogram of the posterior probabilities for the average time delay associated with the edge from Oct4 to its target E-genes. (D) Heat map of the posterior distribution of average time delays. Rows correspond to edges of the network including those between S- and E-genes, whereas columns refer to average time delays. Marginal posterior probabilities are gray-scale coded. The top row corresponds to the histogram shown above. (E) The final network structure estimated by time delay analysis using D-NEM. Edge colors correspond to estimated average time delays: fast signal propagation (green), intermediate signal propagation (blue), and slow signal propagation (red).

The topology is based exclusively on the nesting of downstream effects. Time delays of signal propagation can now be used for fine tuning the topology. Originally, the NEM analysis suggested a bidirectional arrow between Oct4 and Tcl1, suggesting that the nesting of downstream effects in the final time point cannot resolve the direction of interaction between these genes. Time delays, in contrast, strongly favor a model that places Oct4 upstream of Tcl1. To show this, we fitted independent D-NEM models for the two networks, which place Oct4 up- or downstream of Tcl1. We used the deviance information criterion DIC (24) to decide which hypothesis is better supported by the observed time delays. The DIC strongly favors the model that places Oct4 upstream of Tcl1 (DIC of 5491.1 compared with 5581.7).

Next, we exploit the D-NEM Gibbs sampler trajectories associated with the network topology from Fig. 2B to infer average time delays and regulatory control of E-genes. Fig. 2C shows the histogram of average time delays (reciprocal rate constants) along the Gibbs sampling trajectory for the edge between Oct4 and its target E-genes. It is equivalent to the top-most gray-scale intensity profile of the heat map in Fig. 2D. The histogram reflects the marginal posterior probability of this parameter. The posterior heat map for all edges is shown in Fig. 2D. Light gray indicates high marginal posterior probability and dark-gray tones stand for low marginal posterior probabilities. The posterior mass either concentrates around zero, indicating no time delay for this step of signal propagation, or on intermediate values explaining secondary and tertiary effects, or on high values with most of the posterior mass on κT+1 (shown as x), suggesting that no signal is flowing through this edge. We exclude an edge if the posterior mass on κT+1 is >0.6. The resulting network is shown in Fig. 2E. Strikingly, the time delay data provide evidence that all but three of the edges from Fig. 2B actually transport signal. Note that the time delay data have also overruled the static NEM in one instance, in that they have removed the nontransitive edge between Nanog and Tbx3.

## Discussion

The most striking feature of our early stem cell differentiation model is the high frequency of transitive edges. The circuitry is nonparsimonious, raising the question of why evolution has chosen this complex network topology. Note that a transitive edge is consistent with the concept of feed-forward loops first introduced in ref. 28 and summarized in ref. 29. The authors have shown that feed-forward loops are the most frequent network motif in transcriptional networks (28). In this light, the high density of the early stem cell differentiation model with its transcriptional components is not surprising.

To understand the regulatory dynamics mediated by transitive edges, consider the subnetwork consisting of Nanog, Oct4, and Tcl1. The E-genes controlled by Tcl1 change from self-renewal expression levels to levels typical for differentiated cells both in response to blocking the signal from Nanog via Oct4 to Tcl1 and in response to blocking the transitive edge from Nanog to Tcl1. Signals from both branches are jointly needed to activate Tcl1. Their inputs are integrated by an AND-gate. AND-gates in feed-forward loops are known to facilitate relative acceleration of OFF-step signaling compared with ON-step signaling (29). In our example, down-regulation of Nanog causes a fast response in the Tcl1 E-genes, whereas the model suggests that the response of Tcl1 E-genes to up-regulation of Nanog is delayed, if there is any at all. These E-genes can be shifted from stem cell levels to differentiated levels simply by blocking Tcl1's input from the shortcut path. Down-regulation of Nanog alone achieves this swiftly. Shifting E-genes from differentiated levels back to stem cell level requires reestablishing both pathways in the feed-forward loop. Nanog needs to reestablish the expression of Oct4, which not only delays responses but also requires stimuli for activating Oct4 other than those included in the model.

In the early stem cell differentiation model, all transitive edges correspond to feed-forward loops. The top-ranking gene in the hierarchy, Nanog, has fast control over most E-genes via direct edges connecting it with all other S-genes. Moreover, down-regulation of Nanog alone shifts the expression of all E-genes to levels of differentiated cells. Because signal propagation along the transitive edges is fast, short fluctuations of Nanog trigger differentiation. The position on top of the regulatory hierarchy puts Nanog into the role of a key sensitizer for cell differentiation. The other S-genes control only parts of the E-genes and in many cases signal propagation is considerably slower. We hypothesize that one possible role of the other S-genes is to ensure that the differentiation process is virtually unidirectional. Support for this hypothesis comes from the frequent AND-gates within the network. To shift the expression values of E-genes from the differentiated state back to the stem cell state, the expression of Nanog needs to be raised again. However, because of the AND-gates a raise of Nanog alone does not trigger a fast cellular response. The transitive outer edges control differentiation but not the reverse process. This model-based prediction is in line with the observation that down-regulation of Nanog alone triggers stem cell differentiation (26), whereas a constitutive overexpression of several genes is needed to revert the differentiation process, as in the generation of induced pluripotent stem cells (27).

Taken together, the feed-forward loop dominated circuitry of the early stem cell development network stabilizes the differentiated state of cells relative to the self-renewal state. The transitive edges guard against redifferentiation events. They filter noisy fluctuations in the activity of key regulatory genes like Nanog, Oct4, and Sox2. Evolution has developed this non-parsimonious circuitry to make cell differentiation a predominantly unidirectional process and thus to maintain the integrity of differentiated tissues. At the same time, the circuitry destabilizes the self-renewal state, which can only be maintained through joint and tight control of all S-genes in concert. Fluctuations in the activity of individual S-genes can trigger differentiation, with Nanog being the key sensitizer for differentiation stimuli.

## Acknowledgments

This work was supported by the Bavarian Genome Network BayGene and the ReForM-M program of the Regensburg School of Medicine. M.O.V. was supported by the National Science Foundation and Programul Cercetare de Excelenta Grant M1-C2-3004/2006-Response of the Romanian Ministry of Research and Education.

## Footnotes

1 To whom correspondence should be addressed. E-mail: rainer.spang{at}klinik.uniregensburg.de

Author contributions: R.S. designed research; B.A., J.J., A.T., M.O.V., and R.S. performed research; B.A., M.J.S., and R.S. analyzed data; and M.J.S., P.J.O., and R.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0809822106/DCSupplemental.

## References

1. Stuart J, Segal E, Koller D, Kim K (2003) A gene-coexpression network for global discovery of conserved genetic modules. Science 302:249–255.
2. Wille A, et al. (2004) Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biol 5:R92.
3. Schaefer J, Strimmer K (2005) An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 21:754–764.
4. Basso K, et al. (2005) Reverse engineering of regulatory networks in human B cells. Nat Genet 37:382–390.
5. Friedman N, Linial M, Nachman I, Pe'er D (2000) Using Bayesian networks to analyze expression data. J Comput Biol 7:601–620.
6. Husmeier D (2003) Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics 19:2271–2282.
7. Quach M, Brunel N, d'Alché-Buc N (2007) Estimating parameters and hidden variables in non-linear state-space models based on ODEs for biological networks inference. Bioinformatics 23:3209–3216.
8. Klipp E, Liebermeister W (2006) Mathematical modeling of intracellular signaling pathways. BMC Neurosci 7:S10.
9. Werhli A, Grzegorczyk M, Husmeier D (2006) Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks. Bioinformatics 22:2523–2531.
10. Markowetz F, Spang R (2003) Evaluating the effect of perturbations in reconstructing network topologies. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, March 20–22, 2003, Vienna, Austria. Available at: http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Proceedings/MarkowetzSpang.pdf. Accessed November 2007.
11. Sachs K, Perez O, Pe'er D, Lauffenburger DA, Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308:523–529.
12. Rung J, Schlitt T, Brazma A, Freivalds K, Vilo J (2002) Building and analyzing genomewide gene disruption networks. Bioinformatics 18:202–210.
13. Wagner A (2002) Estimating coarse gene network structure from large-scale gene perturbation data. Genome Res 12:309–315.
14. Pe'er D, Regev A, Elidan G, Friedman N (2001) Inferring subnetworks from perturbed expression profiles. Bioinformatics 17:S215–S224.
15. Markowetz F, Bloch J, Spang R (2005) Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics 21:4026–4032.
16. Yeang CH, Ideker T, Jaakkola T (2004) Physical network models. J Comput Biol 11:243–262.
17. Froehlich H, Fellmann M, Sueltmann H, Poustka A, Beissbarth T (2007) Large scale statistical inference of signaling pathways from RNAi and microarray data. BMC Bioinformatics 8:386.
18. Tresch A, Markowetz F (2008) Structure learning in nested effects models. Stat Appl Genet Mol Biol 7:Article9.
19. Markowetz F, Kostka D, Troyanskaya OG, Spang R (2007) Nested effects models for high-dimensional phenotyping screens. Bioinformatics 13:i305–i312.
20. Froehlich H, Fellmann M, Sueltmann H, Poustka A, Beissbarth T (2008) Estimating large scale signaling networks through nested effect models with intervention effects from microarray data. Bioinformatics 24(22):2650–2656.
21. Froehlich H, Beissbarth T, Tresch A, Kostka D, Jacob J, Spang R, Markowetz F (2008) Analyzing gene perturbation screens with nested effects models in R and Bioconductor. Bioinformatics 24(21):2549–2550.
22. Vlad MO, et al. (2002) Neutrality condition and response law for nonlinear reaction-diffusion equations, with application to population genetics. Phys Rev E 65:1–17, 061110.
23. Vlad MO, Arkin A, Ross J (2004) Response experiments for nonlinear systems with application to reaction kinetics and genetics. Proc Natl Acad Sci USA 101:7223–7228.
24. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc B 64(4):583–616.
25. Ivanova N, et al. (2006) Dissecting self-renewal in stem cells with RNA interference. Nature 442:533–538.
26. Hyslop L, et al. (2005) Downregulation of NANOG induces differentiation of human embryonic stem cells to extraembryonic lineages. Stem Cells 23(8):1035–1043.
27. Okita K, Ichisaka T, Yamanaka S (2007) Generation of germline-competent induced pluripotent stem cells. Nature 448:313–317.
28. Mangan S, Zaslaver A, Alon U (2003) The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. J Mol Biol 334(2):197–204.
29. Alon U (2007) An Introduction to Systems Biology: Design Principles of Biological Circuits (Chapman & Hall/CRC, New York).