Steps Toward Heuristic Design at the Molecular-Scale
James A. Gurney^{a}, Edward A. Rietman^{*, b}, Matthew A. Marcus^{c}, and Mark P. Andrews^{a}
^{a}McGill University, Chemistry Dept. Montreal, QC, H3A 2K6, Canada ^{b}Bell Labs, Lucent Technologies, Murray Hill, NJ 07974, USA ^{c}KLA Instruments, San Jose, CA 95161, USA
We desire the ability to design molecular-scale components and new materials using a heuristic programming technique such as genetic algorithms. To this end we have been investigating the possibility of designing new materials in the space of cellular automata. Molecules interact, more or less, only with their nearest neighbors. This suggests that cellular automata (arrays of nearest neighbor interacting finite state machines) may be used for modeling the dynamics of molecules. Since it is well known that structures can be "grown" in the space of the cellular automata (CA) we conjecture that by manipulation of the rule table or rule vector describing the CA dynamics we can evolve desired structures in the CA space. To support this supposition we present work on modeling the etching and deposition (dissolution and growth) of crystals. We empirically find a mapping between the automaton rule table and the surface physics of the crystal. In order to keep this "extended abstract" within bounds we assume the readers know about cellular automata. The full text of this paper will appear in Gurney et al. (1998).
Introduction
In order to succeed with any technique of heuristic design of new materials there must be a mathematical mapping (empirical will be adequate) between the rules that describe the dynamics and the final structure. That is, the structures evolved in the CA space (or even the "space" in a molecular dynamics simulation) must map to molecules, and therefore the CA space (CAS) must map to the space of real world (RW) materials.
This mapping relation could be very difficult to obtain. Further, the elements manipulated by the heuristic program should naturally be the rule table for the automaton not the CA space directly. So, we need to find a mapping relation between the rule table (R) and RW.
Of course, the mapping between the rules and the CA structures, or CA space, are found by simply running the cellular automata. If we find the map r we can circumvent the CA simulation. In affect we would have a mapping relation between a vector describing the microscale dynamics of the molecules and the physics of the materials.
One Parameter Mapping
In order to test the basic mapping conjecture we started with a "slab" in CA space and call this slab the crystal. We used a von Neumann neighborhood (5 cells) and two state automata for the simulation of etching and deposition (desorption and condensation). An atom would jump from a site with a probability, P, and an atom would jump to a site with a probability, 1-P. That is, the probability of a desorption is P and the probability of a condensation is 1-P. Using this simple scheme the 32 rules that describe the dynamics of the CA can be described by P.
Figure 1
Figure 1 is a "screen dump" from a CA run where the P value results in etching. The crystal starts with a thickness of 50 pixels and after 10 iterations the surface already has some degree of roughness. Continuing like this for many P values we can obtain plots of the rate of change on the crystal surface (etching or deposition) and the roughness. These plots are shown in Figures 2 and 3 respectively.
Figure 2
Figure 3
The data in Figure 2 show a Boltzmann-type sigmoid with only a slight perturbation at P=0.37. Figure 3 shows a cusp in the roughness at the same probability value. Both of these curves support the one-parameter mapping conjecture. That is, Figure 2 is a bijective mapping from the P value (a metric representing the rule table) to the rate, and Figure 3 shows an injective map from P to the roughness. These maps imply that for this simple one parameter system we can compute the rate and roughness (as well as surface entropy, see Gurney et al. 1998) directly from the rule table of the CA. This metric, representing the rule table, hints to the possibility that we could design new materials using only the P value and the mapping relations shown in the figure. Thus it is trivial support for the mapping relation r shown above. Gurney et al. (1998) also report studies on the mechanisms associated with the surface diffusion for this one parameter system.
Many Parameter Mapping
In the above section we have seen that cellular automata can be used in modeling etching and deposition processes - complex processes that could be modeled by systems of partial differential equations. Cellular automata are especially good at modeling natural systems that can be described by massive numbers of locally interacting simple objects. Many real world processes are intractable. No computational procedure, short of a complete particle-particle interaction simulation, could short cut these computationally irreducible processes (Wolfram, 1984). For these intractable problems the cellular automata machine becomes a world emulator for the process. The input to the CA machine is the CA program and the initial conditions, and the output is the final configuration. The CA machine does the simulation and computes the final configuration.
We would like to find a short cut to this intractable computation - some computational process that will be much faster and still give reasonably good answers, albeit with some error. The results for a trivial mapping of this type were described above. We would now like to extend the computation to include an almost totally stochastic automaton. That is, the rule table describing the automaton updates will consist of 32 unique probability values. Unlike the above simulations, the elements are random probability numbers not coupled to each other. If we can find a mapping between stochastic rule tables and physics values (e.g. roughness, rate) than we have a chance at using heuristic programming to design new materials. In this case the stochastic rule table, representing the dynamics and forces acting at the molecular level, is the domain to the mapping, and the physics values are the codomain.
Wulff and Hertz (1993) have demonstrated that a neural network can learn the dynamics of cellular automata. Similarly, we demonstrate that a neural network can be used for mapping the relation between the rule table and the physics values. Of course this comes about at the expense of an error and the loss of physical meaning to the adjustable parameters.
For the etching and deposition simulations the final configuration (arbitrarily defined as n time steps) is the end result of the CA computation. We then fed that final configuration information and some parameters logged during the simulation into an analytical computation to calculate the physics values. These two computational procedures - the CA computation and the analytical computation - can be viewed as a black box computational process into which we feed the initial conditions and a computer program, and the output is the physics values (e.g. roughness, entropy).
The initial condition for this computation is the initial configuration of the CA space. The computer program for the CA machine is the rule table. A neural network can emulate this computational black box. Since the input to our black box is the rule table for the CA machine, that will be the primary input to the neural network. The black box process also receives the initial configuration. To the neural network this initial configuration would simply be a constant so it will have no impact on the final computation. The output of our black box is the physics values. Similarly, we will make the output to the neural network the physics values. This is all shown schematically in Figure 4.
Figure 4
Neural networks are known to be Turing complete and capable of universal mapping (Hornik, 1989; Hornik et al., 1990). (This suggests their use for the current problem.) They have been well described in the literature (cf. Rumelhart et al., 1986 and Weiss and Kulikowski, 1991), and the most common learning algorithm is a gradient descent in weight space.
The input vector, in our case the stochastic rule vector, is presented as the input to a feed forward, one-hidden layer neural net with one output. The output is one of the physics values. A separate network was trained for each physics value for which we had calculations from the black box model. The output of the "neurons" in the hidden layer are given by the hyperbolic tangent of the sum of the product of the inputs to that neuron with the connection weights between the input and that hidden neuron. The output of the network is compared with the target value and least mean square error is computed and used to find the gradient in weight space
In training the neural network we ran the computational black box simulator to collected a training/testing set of 20,000 samples. The training consisted of minimizing the error between the neural network output and the output from the black box. In other words, minimizing the error in the neural network computation of the physics values. The connection weights in the network are simply matrices of adjustable parameters.
Our network had 33 input nodes representing the 32 elements in the rule vector and one bias input. The one hidden layer consisted of 10 nodes. There was one output representing the physics parameter of interest. The 33-10-1 network thus had a total of 340 adjustable, "physics free," parameters. Although this may seem like a large number of parameters the impact of them was reduced by statistical regularization (cf., Tikhonov and Arsenin, 1977; and Kirsch, 1996). Vapnik (1995) has shown that the critical issue for generalization of the network - that is the expected error from test samples - is the capacity of the network, not the number of connections. Furthermore, we used 10,000 samples to tune these 340 adjustable parameters (a ratio of 29:1).
The neural networks were trained with 10,000 samples and tested with the remaining 10,000. Table 1 summarizes the results for each of the physics values. The neural network is an estimator of the physics parameters. From the simulator we collected a population of observations and calculate the mean and standard deviation for the physics values. Based on a gaussian fit of this population data, we can get a random estimate of the physics value from a rule vector. For comparison the table also shows this value. Each neural network estimate is more than 80% closer to the correct value than a random estimate.
Table 1. Neural network and random guess compared
std. dev. of NN
std. dev. of random
roughness
2.30
13.54
rate
11.31
94.31
It is likely that the surjective mapping done by the cellular automata machine causes the primary error associated with this procedure. A dynamical system is surjective if every output state has at least one input state, but the inverse mapping is not true. In other words, the surjective mapping is a many to one map. Several members of the domain can map to the same member of the codomain. Two-dimensional cellular automata are well known to be surjective (Durando, 1994; Kari, 1994).
We have demonstrated that the neural network can map between the rule table and the physics values - even for stochastic rule table elements. This implies that it may be possible to use a rule table mapping for heuristic design of materials. The surjective nature of the mapping and the error associated with the neural network implies that our design will be somewhat open. The end result will be an approximation. However, in order to make this a useful tool we must know both it capabilities and its limitations.
References
Durando, B., "The Surjectivity Problem for 2D Cellular Automata," J. of Computer and System Sciences, 49, 718-725 (1994)
Gurney, J. A., Rietman, E. A., Marcus, M. A. and Andrews, M. P. , "Mapping the Rule Table of a 2-D Probabilistic Cellular Automata to the Chemical Physics of Etching and Depositon," submitted, 1998
Hornik, K., "Multilayer Feedforward Networks are Universal Approximators," Neural Networks, 2, 359-366, 1989
Hornik, K., Stinchcombe, M., and White, H., "Universal approximation of an Unknown Mapping and its derivatives using Multilayer Feedforward Networks," Neural Networks, 3, 551-560, 1990
Kirsch, A., An Introduction to the Mathematical Theory of Inverse Problems, Springer, New York, 1996
Rumelhart, D. E., McClelland, J. L. and the PDP Research Group (editors), Parallel Distributed Processing: Explorations in the Microstructure of Cognition," MIT Press, Cambridge, MA, 1986
Tikhonov, A. N. and Arsenin, V. Y., Solutions of Ill-Posed Problems, John Wiley, New York, 1977
Vapnik, V. N., The Nature of Statistical Learning Theory, Springer, New York, 1995
Weiss, S. M. and Kulikowski, C. A., Computer Systems that Learn, Morgan Kaufmann, San Mateo, CA, 1991
Wolfram, S., "Undecidability and Intractability in Theoretical Physics," Physical Review Letters, 54 (8), 735-738, 1985
Wulff, N. H. and Hertz, J. A., "Learning Cellular Automaton Dynamics with Neural Networks," pp631-638 in Advances in Neural Information Processing Systems 5, S. J. Hanson, J. D. Cowan and C. Lee Giles, (editors) Mrogan Kaufmann, San Mateo, CA, 1993
^{*}Corresponding Address:
Edward A. Rietman
Bell Labs, Lucent Technologies
Murray Hill, NJ 07974, USA
E-mail: ear@bell-labs.com