“Psychology has two traditional models of learning, classical and operant conditioning. In classical conditioning an agent learns the association of features, food and bell ringing in Pavlov’s famous example, provided the features meet various conditions, such as proximity of time of occurrence. Measured by drops of saliva, Pavlov’s dogs learned to expect food from bell ringing, and I count that as learning an association, but they did not learn anything about causation—they did not learn the effects of any intervention; for example, how to bring about or to prevent either the presentation of food or the ringing of the bell.
Some associations between a prior feature and a subsequent feature hold because the occurrence of the first feature caused the occurrence of the second feature, and some associations hold for other reasons, often because some third feature caused both the first and the second: Pavlov (or his assistants) caused both the bell to ring and the food to appear. By contrast, in operant conditioning an agent learns both an association and at least a fragment of a causal relation. Skinner’s pigeons learned that pecking a target is associated with the appearance of food pellets, and they learned at the same time how to control or influence the appearance of food pellets: by pecking the target. That partial causal knowledge was evidenced by an acquired skill, a competence at bringing about the presence of food by appropriate pecking, and of course not by anything linguistic. Skinner and his assistants arranged the mechanism, but given that mechanism each bird learned a causal conditional: if it pecks, food appears.
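The contrast between an association produced by a common cause and one produced by a direct cause can be sketched in a small simulation. Everything in it is an illustrative assumption rather than a model of the actual experiments: the function names, the 50% trial rate, and the deterministic mechanisms are all hypothetical. The `force_bell` and `force_peck` arguments stand in for an intervention that sets a feature from outside the system.

```python
import random

def classical_trials(n, rng, force_bell=None):
    """Common-cause setup: the experimenter's decision drives both bell and food."""
    rows = []
    for _ in range(n):
        experimenter = rng.random() < 0.5  # assumed: half of all trials are feeding trials
        # An intervention on the bell overrides its usual cause.
        bell = experimenter if force_bell is None else force_bell
        food = experimenter                # food depends only on the experimenter
        rows.append((bell, food))
    return rows

def operant_trials(n, rng, force_peck=None):
    """Direct-cause setup: given the mechanism, pecking itself produces food."""
    rows = []
    for _ in range(n):
        peck = (rng.random() < 0.5) if force_peck is None else force_peck
        food = peck                        # assumed deterministic mechanism
        rows.append((peck, food))
    return rows

def p_food_given(rows, value=True):
    """Relative frequency of food among trials where the first feature equals `value`."""
    outcomes = [food for feature, food in rows if feature == value]
    return sum(outcomes) / len(outcomes)

rng = random.Random(0)
observed_bell = p_food_given(classical_trials(10_000, rng))
forced_bell = p_food_given(classical_trials(10_000, rng, force_bell=True))
observed_peck = p_food_given(operant_trials(10_000, rng))
forced_peck = p_food_given(operant_trials(10_000, rng, force_peck=True))
print(observed_bell, forced_bell, observed_peck, forced_peck)
```

Under these assumptions the bell predicts food perfectly when merely observed, but forcing the bell to ring leaves the frequency of food at roughly the base rate. Forcing a peck, by contrast, brings food just as reliably as an observed peck does.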
The causal knowledge acquired in operant conditioning may be radically incomplete if it is confined to implicit knowledge of the effects of the learner’s own actions, and not generalized to yield an understanding of the effects of other sources of intervention. It is one thing to know that if I peck on the target, a food pellet will appear, another to know that if there is a blow on the target, from whatever source, a food pellet will appear. A full causal understanding separates events that are subject to a system of causal relations from interventions that alter them, and implies a general grasp of the relevant interventions. Learning by imitation seems to indicate a more complete causal understanding. Meltzoff and Moore (1977) showed that very young babies imitate some of the actions of others, and of course older children and adults imitate all the time. Imitation can be for its own sake, from which useful consequences may later be discovered, or may be acquired along with knowledge of the consequences of the act imitated. In the latter case, imitation is the manifestation of an efficient way of acquiring causal knowledge, a way that identifies an act as a generic kind, that recognizes the causal power of the kind, and that recognizes the agent’s own action as an instance of the kind, no matter how different one’s own action may look or feel to oneself from the observed action of another. There is, of course, a reverse inference, from observation of the consequences of one’s own actions to knowledge of the consequences of like actions by others.
Learning causal relations from observations of others’ actions is essential for the accretion of causal knowledge that constitutes culture. It is, therefore, interesting that recent studies suggest that nonhuman primate modes of learning by imitation may be seriously limited in comparison with humans, either because they do not imitate, or do not learn from observations of others the consequences of imitated actions.
Whatever the biological constraints and imperatives concerning the formation of concepts of kinds, it seems likely that humans and perhaps other creatures also have the capacity to fashion kinds to suit their environments. Adults certainly fashion many categories to discriminate causal powers, and presumably children do as well. There may be many ways that causal roles influence categorization, but consider just one of the ways suggested by network representations. From the point of view of Bayes nets, fashioning kinds is fashioning variables, deciding when perceptual or historical differences should be used to separate things into distinct kinds, and when they should be ignored. The “should be” has to do with whatever promotes causal understanding, prediction, and control.
Simon has often insisted that intelligence works best in an “approximately decomposable” world, a world where not everything is hooked up to everything else, and the influences of causes are approximately separable. One of the morals of computational research on Bayes nets is that their utilities are only available when the domain is structured so that the networks are sparse. If every variable is in fact dependent on every other variable, conditional on every other set of variables, little about causal structure can be learned from the data short of a complete set of randomized experiments, and, were such a complex causal structure to be known, prediction and control would be infeasible to compute with it.
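One way to make the cost of density concrete is to count the probabilities a Bayes net must specify. The sketch below assumes binary variables, so a node with k parents needs 2^k numbers, one per joint setting of its parents. The helper names `free_parameters`, `complete_dag`, and `chain` are hypothetical and exist only for this illustration.

```python
def free_parameters(parents):
    """Count the free probabilities in a Bayes net over binary variables.

    `parents` maps each node to the list of its parents. A node needs one
    probability per joint setting of its parents, i.e. 2 ** len(parents).
    """
    return sum(2 ** len(ps) for ps in parents.values())

def complete_dag(n):
    """Densest case: every variable depends on all the variables before it."""
    return {i: list(range(i)) for i in range(n)}

def chain(n):
    """A sparse case: each variable depends only on its predecessor."""
    return {i: ([i - 1] if i > 0 else []) for i in range(n)}

for n in (5, 10, 20):
    print(n, free_parameters(complete_dag(n)), free_parameters(chain(n)))
```

On this count the fully connected network grows exponentially with the number of variables, while the chain grows linearly. That is one facet of why learning and computation are feasible mainly when the structure is sparse.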
But knowledge of a causal structure is useful in prediction and control only if the structure is not completely empty, only if some features influence other features. Causal knowledge from fragmentary observations, possible when the causal structure is sparse, is useful only when some things influence others, but the structure is still sparse.
Whether the causal relations in a domain are sufficiently sparse to be tractable and sufficiently dense to aid prediction and control depends in part on how the variables are specified—on the kinds. Dense causal structures are sometimes resolved into sparser causal structures by introducing new kinds.”
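As a purely hypothetical picture of that last point (it is not an example from the text), suppose several features each influence several others, and that all of this influence in fact passes through a single intermediate kind. Introducing that kind as a variable replaces a dense set of direct connections with a sparser set. The names `direct_edges`, `mediated_edges`, and `new_kind` are assumptions made for the sketch.

```python
def direct_edges(causes, effects):
    """Dense description: every cause is linked directly to every effect."""
    return [(c, e) for c in causes for e in effects]

def mediated_edges(causes, effects, mediator):
    """Sparser description: causes act on effects only through one new kind."""
    return [(c, mediator) for c in causes] + [(mediator, e) for e in effects]

causes = [f"c{i}" for i in range(4)]
effects = [f"e{i}" for i in range(5)]

dense = direct_edges(causes, effects)
sparse = mediated_edges(causes, effects, "new_kind")
print(len(dense), len(sparse))
```

With four causes and five effects, the direct description needs 20 connections and the mediated one needs 9. The saving is real only if the new kind genuinely screens off the causes from the effects, which is an empirical matter and not something the bookkeeping can guarantee.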