The Link between Action and Language: Recent Findings and Future Perspectives

This paper presents a critical review of studies on embodied cognition and, more specifically, on the relationship between language and action. Studies using methods such as TMS and fMRI are analyzed critically, and their results are discussed from both a theoretical and a methodological standpoint. Then, in response to some inconsistencies that emerged from the analysis of the literature, Virtual Reality is presented as a possible answer to, or enrichment of, the study of this topic. Possible future research tracks and applications are discussed.


Introduction
Traditional theories of cognition are based on the idea that knowledge is represented in the brain in the form of concepts and stored in memory systems as semantic information. Concepts, from this perspective, are conceived as amodal, abstract, and arbitrary (Fodor 1975), and therefore independent of the brain's modal systems of perception (e.g., vision, audition) and action (e.g., movement, proprioception). Chomsky's theory of language (Chomsky 1965) is completely aligned with this view: The theory of Universal Grammar considers language as a corpus of abstract symbols combined according to formal syntactic rules. Among other properties, generativity and compositionality are distinctive of human language.
In more recent years, however, a radically different conception of knowledge has been taken into account, one that brings together data from different methodological approaches such as neurobiology, brain imaging, and neuropsychology: the theory of Embodied Cognition (Wilson 2002; Gibbs 2006). According to the embodied cognition hypothesis, concepts are not amodal and knowledge relies on body states and experiences. Therefore, there is a tight link between concepts, action, and perception, to the extent that conceptual knowledge is mapped within the sensory-motor system. The notion that cognition is grounded in action and perception is encapsulated in the term 'embodiment'.
Metaphorically speaking, embodied theories of cognition extended the boundaries of the anatomical structures to which a specific function was traditionally assigned: The mind is no longer confined to the brain but also includes other body parts, such as hands, legs, and eyes. Moreover, within the brain, the separation between primary areas, recruited for basic sensory and motor processing, and associative areas, in which more complex processes take place, is no longer strictly defined: The distinction between low-level and high-level processes gives way to a more integrated model. This new model proposes an interplay that allows the recruitment of primary areas even during cognitive processes such as language and conceptualization.
In the last decades many neuroscientists have focused on the theory of embodied cognition in general, and on embodied language in particular. Embodied theories of language predict that the neural structures involved in sensory, perceptual, or motor processing are also active when processing words whose meaning embeds prominent sensory (auditory and tactile features; Goldberg et al. 2006), perceptual (color; Martin et al. 1995; or faces and places; Aziz-Zadeh et al. 2008), or motor features (see below for a detailed review). This interest in embodied theories of cognitive processes has thus yielded a growing corpus of data, yet many topics are still unclear and deserve further investigation. One of the most intriguing is the link between language processing and the motor system, which has been extensively investigated in recent years.
Starting from these considerations, the present paper has two main purposes. The first is to briefly review the recent literature that addresses the relationship between the motor system and language processing, distinguishing studies on the basis of the tool used to investigate this issue (transcranial magnetic stimulation or functional magnetic resonance imaging). The intention is to show how and to what extent experimental protocols with different methodologies and tools sometimes lead to contrasting results; moreover, special attention will be paid to the capabilities that each technique inherently offers. The second goal is to reflect on future perspectives. In particular, we will present a new tool for the study of embodiment that, to our knowledge, has not been used so far for studying cognitive processes: virtual reality.

TMS Studies
Transcranial magnetic stimulation (TMS) has proved to be an efficient and promising method to investigate the link between action and language. Thanks to its temporal and spatial resolution, TMS has become one of the most widely used tools to study where and when language processes are mapped within the motor system. Most researchers applied single-pulse TMS protocols over the primary motor cortex (M1) during a linguistic task and recorded motor evoked potentials (MEPs) from the muscles that are expected to respond depending on the portion of the cortex stimulated. The rationale is the following: If the linguistic task engages to some extent the portion of the cortex stimulated at the time of stimulation, then it should result in a modulation of cortico-spinal excitability and thus of the MEP amplitude (compared to the rest condition). This kind of experimental design has mostly been employed to investigate the role of M1 during the processing of abstract vs. action verbs, but results are sometimes contrasting. For example, Papeo et al. (2009) reported an increase of MEPs recorded while participants read action verbs compared with verbs describing abstract concepts; in contrast, Buccino et al. (2005) described the reverse situation during language comprehension: MEPs recorded from hand muscles were lower while participants heard hand-related action verbs compared to foot-related action verbs, indicating an effector-specific inhibition. Although these findings might seem inconsistent, several different experimental features can account for them; one of these is the timing of stimulation, which is an important issue to consider when studying the excitability of such dynamic systems. In fact, we can argue that stimulation of an area occurring just while the process is taking place should produce an interference effect, and hence an inhibition of that area; by contrast, stimulation delivered shortly before the onset of the process in the given area might act as a prime and produce a sort of facilitation effect (preactivation) for that area. Papeo et al. (2009) evaluated the effects of TMS over M1 at different time windows from the linguistic stimulus onset: They reported an involvement of M1 in the linguistic process only when stimulation was delivered after 500 ms post-stimulus, that is, in the post-conceptual stage but not in the earlier ones. This result would lead us to think that lexical-semantic processing of action verbs does not automatically activate M1, whose activation is instead modulated in a top-down manner.
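The rationale behind these single-pulse protocols can be illustrated with a minimal sketch, assuming that each MEP is quantified as the peak-to-peak amplitude of the EMG trace in a short window after the pulse and that task trials are expressed as a ratio to the rest condition. The function names, the 20-50 ms window, and the sampling rate are illustrative assumptions and are not taken from any of the studies reviewed here.

```python
from statistics import mean


def peak_to_peak(emg, sfreq, t_start=0.020, t_end=0.050):
    """Peak-to-peak amplitude of one EMG trace in a post-pulse window.

    `emg` is a sequence of samples with the TMS pulse at index 0 and
    `sfreq` is the sampling rate in Hz. The 20-50 ms window is an
    assumed, illustrative choice.
    """
    first = int(round(t_start * sfreq))
    last = int(round(t_end * sfreq))
    window = emg[first:last + 1]
    return max(window) - min(window)


def mep_ratio(task_trials, rest_trials, sfreq):
    """Mean task MEP amplitude divided by mean rest MEP amplitude."""
    task = mean(peak_to_peak(trial, sfreq) for trial in task_trials)
    rest = mean(peak_to_peak(trial, sfreq) for trial in rest_trials)
    return task / rest
```

Under these assumptions, a ratio above 1 would indicate facilitation of cortico-spinal excitability relative to rest, and a ratio below 1 would indicate inhibition.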
The second element to take into account is the specific linguistic task performed by participants. In the literature we can find different studies that employed different linguistic tasks to evaluate motor activation, each of which entailed different linguistic processes. In some cases lexical decision was required (Pulvermuller et al. 2005), while others used reading (Fadiga et al. 2002), semantic judgments (Buccino et al. 2005), imagery (Fourkas et al. 2006), or transformation tasks (Oliveri et al. 2004). Tomasino et al. (2008) systematically compared the effects of different timings of stimulation during different kinds of tasks (silent reading, motor imagery, and frequency judgments) and found that M1 plays a role only during motor imagery; they concluded that the recruitment of motor networks is not required for language understanding but occurs only when explicit motor simulation is requested. However, an effect of TMS in modulating MEPs during semantic judgments of nouns (natural vs. tools; graspable vs. ungraspable) has been reported, even without any overt motor simulation (Gough et al. 2012). Whether simulation and imagery are the same process or distinct ones is still an open question, even if imaging data seem to support the distinction hypothesis (Willems et al. 2010b, see below).
Recently, TMS protocols have been employed to discover the role of morpho-syntactic features in the activity of M1: Papeo and colleagues (Papeo et al. 2011) compared MEPs recorded during reading of action vs. abstract verbs presented in the first or the third person singular (I vs. he/she); they found an increase in MEP amplitude selectively for action verbs in the first person, and inferred from these data that motor simulation is facilitated when the conceptual representation of the verb includes the self as agent. Furthermore, a sensitivity of the primary motor cortex to the polarity of sentences was highlighted: Active action-related sentences suppressed cortico-spinal reactivity compared to passive action-related sentences and to either active or passive abstract sentences (Liuzza et al. 2011).
Finally, TMS can be used in offline procedures, delivering repeated trains of stimulation over a period of several minutes (rTMS or TBS) in order to transiently modify cortical excitability and investigate the role of the stimulated area in a given process. In this case experimenters are not interested in defining the exact timing of the cognitive process but rather aim to discover whether the area is involved in that process. The studies carried out by Gerfo et al. (2008) and Willems et al. (2011) belong to this field of application. In both studies, motor networks (primary and/or premotor cortices) were found to be functionally relevant in action-related language understanding.
Future studies are needed to investigate with offline (facilitatory and inhibitory) stimulation the role of motor areas in different linguistic tasks in order to deepen the knowledge about their function (causal or epiphenomenal?) during language processing.

Imaging Studies
Functional magnetic resonance imaging (fMRI) is so far the imaging technique preferred by researchers who intend to shed light on the relationship between motor areas and language processing. While TMS studies make it possible to establish a causal link between experimental manipulations (i.e., site of stimulation) and behavioural tasks (i.e., linguistic tasks), fMRI experiments are correlational protocols by nature: They make it possible to identify, among all brain areas, those engaged during a specific process and within a precise time window; further, fMRI makes it possible to track down networks of activations, reflecting the dynamic features of the process under investigation.
A first line of research aimed to determine if and where language processing recruits brain areas usually activated during motor tasks (considered in a broad sense, i.e., motor observation, preparation, execution). This topic often intersects with and includes theoretical issues that arise from studies focused on mirror neurons. In fact, it is well known that mirror neurons in monkeys are activated not only by the observation of a movement performed by others but also when the noise associated with the action is heard (Kohler et al. 2002). In humans, action-related auditory inputs are well represented in language stimuli: This happens in particular when sentences describing actions are presented auditorily. Many studies have been carried out to explore the possibility that the understanding of action-related sentences relies on the same observation-execution system by means of mirror neurons (see Aziz-Zadeh & Damasio 2008 for a review). Most of these studies, relying on different linguistic tasks, reported a somatotopic activation of premotor cortex, primary motor cortex, and Broca's region (Hauk et al. 2004; Tettamanti et al. 2005; Aziz-Zadeh et al. 2006). Interestingly, this pattern of activation is confirmed even in children (age 4-6), as described by James & Maouene (2009), indicating that the embodied nature of language appears early in child development, when language is not yet wholly acquired. Nevertheless, it is noteworthy that there is no strong consensus about a somatotopic organisation of action word meaning representations, and this is not surprising considering that the organization of the premotor cortex is still poorly understood. For example, Postle et al. (2008), combining functional MRI with cyto-architectonically defined probabilistic maps of left hemisphere primary and premotor cortices, failed to find a direct correspondence between the activations triggered by the meaning of effector-specific action words and those found during the real movement of the same effectors.
As noted when reviewing TMS studies, in this case too the kind of task and the features of the verbal material seem to yield different results. Raposo et al. (2009), comparing cerebral activation across different semantic contexts (isolated action verbs, literal sentences, idiomatic sentences), found that the neural response in motor areas was maximal for isolated verbs and minimal for idiomatic sentences, with literal sentences in between; according to the authors' discussion, these findings suggest that the motor response during language processing is context-dependent rather than automatic and invariable. From a similar perspective, van Dam and collaborators (van Dam et al. 2010) examined brain activity during the semantic judgment of verbs describing actions with different degrees of kinematic detail: A region within the bilateral inferior parietal lobule proved to be sensitive to the specificity of motor programs associated with the action verbs, with a greater BOLD signal for the finest-grained actions.
Finally, fMRI can contribute to refining the theory of embodied language and also to testing hypotheses that, if confirmed, can add data in favor of this theoretical position. In one recent study, Willems et al. (2010b) investigated the construct of mental simulation, which is thought to be one of the core mechanisms of embodiment, although it is still unclear whether it is equivalent to explicit imagery. In particular, the authors found that implicit simulation of actions during language understanding is neurally dissociated from explicit motor imagery, thus confirming that the two processes are distinct in nature. Furthermore, according to the simulation hypothesis, as stated by Willems et al. (2010a), "if understanding action words involves mentally simulating one's own actions, then the neurocognitive representation of word meanings should differ for people with different kinds of bodies, who perform actions in systematically different ways" (i.e., right- vs. left-handers): This prediction has been corroborated by fMRI data, which showed a preferential activation of the right premotor cortex during lexical decision on action verbs for left-handers, and the opposite pattern of activation for right-handers.
As shown in this short excursus, fMRI studies have made an important contribution to the study of the link between language processes and perceptual brain areas, thus adding essential pixels to the big picture of embodied semantics theory; however, besides traditional neuroscience techniques such as fMRI and TMS, other tools could demonstrate great capabilities in this field of application: The next section is dedicated to the description of one of them, virtual reality.

Virtual Reality: A New Frontier for Neuroscience Research
A virtual reality (VR) system is a combination of technological devices that allows users to create, explore, and interact with 3D environments. Typically, people entering a virtual environment feel like part of this world and have the opportunity to interact with it almost as they would in the real world: Just by turning his or her head, a user can visually explore the scene, and with other user-friendly controls he or she can move through the environment, approach objects, select them, and meet other people presented as avatars or on videotape. This capability is made possible by the use of input tools (trackers, gloves, mice) that send the position and movement of the user to the computer in real time, graphic rendering that changes the environment coherently with the information acquired, and output devices (visual, aural, and haptic) that return feedback on the interaction to the user. However, it is the user's immersion in a synthetic environment that distinguishes VR from interactive computer graphics or multimedia. In fact, the sense of presence in a virtual world elicited by immersive VR technology indicates that VR applications may differ fundamentally from those commonly associated with graphics and multimedia systems. Even if there is not yet common agreement about what Presence is, common definitions are the "sense of being there" (Steuer 1992), "the feeling of being in a world that exists outside the self" (Waterworth et al. 2010; Riva et al. 2011), or the "perceptual illusion of non-mediation" (Lombard & Ditton 1997). In general, the scientific literature has identified a set of factors that have a direct influence on the experience of presence (IJsselsteijn & Riva 2003; Riva 2006; Youngblut 2007): (a) the combination of multimodal input (visual, tactile, auditory, kinesthetic, olfactory) from the virtual experience into coherent perceptual categories, so that the virtual experience is recognized as 'real'; (b) the processing of the multimodal input in an egocentric reference frame, so that the user feels that he or she is within the environment as opposed to observing it from a third-person perspective; and (c) the ability to give a meaning to the multimodal input, so that the virtual experience is recognized as 'meaningful' and 'relevant'.
Far from being a merely recreational tool, VR is increasingly used in research and clinical settings (Riva 2002). Traditionally, the most common application of VR in mental health is related to the treatment of anxiety disorders (Emmelkamp 2005; Parsons & Rizzo 2008): from simple phobias (Rothbaum et al. 2006; Krijn et al. 2007), to panic disorders (Vincelli et al. 2003; Botella et al. 2007), post-traumatic stress disorder (Rothbaum et al. 2001; Gerardi et al. 2008), and generalized anxiety disorder (Repetto et al. 2009a, 2009b; Repetto & Riva 2011). The reason for the diffusion of VR in this field of application is its versatility for implementing exposure therapy (VRET): In fact, VRET is safer, more controllable, less embarrassing, and less costly than in vivo exposure, but at the same time its immersive nature provides a real-like experience that may be more emotionally engaging than imaginal exposure (Riva 2010).
Recently, Bohil and colleagues (Bohil et al. 2011) described the advantages of using virtual environments in several domains of neuroscience, such as spatial navigation, multisensory integration, social neuroscience, pain remediation, and neuro-rehabilitation. The authors pointed out the capabilities of VR for implementing experiments that overcome traditional limitations encountered by researchers interested in understanding the functioning of the central nervous system.
One of these limitations is the gap between the degree of complexity typical of the real world and that embedded in the stimuli created ad hoc for the experimental protocol. In fact, participants in research settings usually perform tasks by interacting with several different devices (e.g., computers, button boxes), none of which is designed to simulate the real experience in which the investigated process occurs. Virtual reality, by contrast, makes it possible to bypass a common criticism of the experimental setting, namely its poor ecological validity: By immersing participants in virtual environments, one could gain ecological validity without giving up controllability and replicability.
For researchers interested in studying cognitive processes from an embodied point of view this is a great opportunity: If representations in the cognitive system are multimodal, then to investigate their properties one should recreate the multimodal experience that can trigger the process. Furthermore, with the advance of technology, the interface between subject and VR system is increasingly intended to become a non-mediated process, in which the body itself is the navigation tool (without the need for control devices). For these reasons, VR could be thought of as an ideal medium for investigating several cognitive domains (Riva 1998), but its capabilities are not confined to the fact that inside the virtual experience many different sources of stimulation can work together to recreate a realistic environment. In fact, VR can be considered an 'embodied technology' for its effects on body perceptions (Riva 2002): VR can be used to induce controlled changes in the experience of the body. On the one hand, VR has been used to improve the experience of the body in patients with eating disorders (Perpiña et al. 1999; Riva et al. 2003; Ferrer-García & Gutiérrez-Maldonado 2012) or obesity (Riva et al. 2006). On the other hand, different authors have used VR to induce illusory perceptions, e.g., a fake limb (Slater et al. 2009) or a body transfer illusion (Slater et al. 2010), by altering the normal association between touch and its visual correlate. Being an embodied technology, VR seems a promising tool for the investigation of the link between language and action. In the recent past, the discovery of mirror neurons changed the outlook of neuroscience and established a connection between language and the motor system (Gallese & Lakoff 2005; Chen & Yuan 2008).
The embodiment theory of language assigns an important role to this class of motor neurons in understanding action-related concepts: Mirror neurons should be activated by the linguistic stimulus, and this should result in a modulation of the primary and premotor cortex (Gallese 2008). As reviewed in previous sections, several studies confirmed that language itself triggers motor-like responses within the cerebral areas where movement is represented (Hauk et al. 2004; Buccino et al. 2005). The opposite way to understand the relationship between language and action is to investigate if and to what extent motor inputs affect language representation and acquisition. Paulus and colleagues (Paulus et al. 2009) asked participants to learn functional verbal knowledge of new objects while performing different motor tasks. They found motor interference when the acquisition of manual object knowledge was paired with a concurrent manual action, but not when concurrent actions were performed with the feet. Furthermore, Macedonia and colleagues (Macedonia et al. 2011) studied the impact of iconic gestures on foreign language word learning: When the learning of novel words was coupled with iconic gestures, participants retained the verbal material better over time than when it was coupled with meaningless gestures; these behavioral data were accompanied by imaging data indicating an activation of premotor cortices only for words encoded with iconic gestures.
Studies that use actions to understand the interplay between language, the motor system, and mirror neurons find in VR a privileged medium in which to be implemented. VR gives users the opportunity to see themselves moving in the environment while being comfortably seated in a chair. Thanks to different input devices, participants could virtually perform any action, even those typically not performable in an experimental setting (to jump a rope, kick a ball, or shoot something, for example). Thus, within a virtual environment, experimenters could investigate the effect of performing different actions on language processing. The fact that users are not really moving their bodies in real space, but still have the subjective sensation of being 'in action', places VR in an intermediate position between real action and mere action observation (such as in a video): It has been demonstrated that cortical excitability is modified by the observation of movements performed by others (Strafella & Paus 2000), but this modulation is greater if the orientation of the movement is compatible with the point of view of the observer (Maeda et al. 2002). The advantage of VR is that the movement the individual makes is egocentric, exactly as if he or she were acting in the real world.
As Cameirao has argued (Cameirao et al. 2010), the first-person perspective could engage the mirror neuron system more strongly because this is the perspective the system is most frequently exposed to. This observation has important implications for the field of rehabilitation: If the enactment of verbal material facilitates learning in non-pathological samples, it should be investigated whether this effect is replicable in people with language deficits. Moreover, patients with different types of aphasia often have motor deficits as well, and VR could give them the opportunity to take advantage of action-language coupling protocols even without moving at all.
Finally, VR experiments can also be conducted in association with imaging techniques, such as fMRI: Further studies using virtual environments during fMRI scans could thus shed light on the cortical activations triggered by virtual movements, and on the role of mirror neurons in these processes.

Conclusions
This contribution, starting from a theoretical reflection on the importance of the embodied cognition, aims to emphasize the relevance of this topic for the study of the relationship between language and the motor cortex.
Recently, many studies related to this topic have been presented, but a review of those studies has revealed conflicting results. What could be the cause of these differences? A critical analysis has allowed us to hypothesize that they may be at least partly attributable to different experimental protocols, each of which would study a specific stage of the neurocognitive processes being examined. As a result, the protocols measure different things, and they report different results when the recorded values are related to the same functions without stressing the differences in timing along the whole examined process.
After this first clarification, it is encouraging to be able to highlight, on the basis of the review presented here, that the investigation techniques used in the reviewed studies, although extremely different from each other, aimed at different aspects, and framed with either a causal perspective (TMS) or a correlational one (fMRI), still revealed a strong and reliable theoretical link between language and action. But how is it possible to operationalize these results while also taking into account, with a critical perspective, the differences to which different research protocols lead? In order to answer this question, in this paper we suggest and discuss the theoretical and operational usefulness and relevance of VR.
Owing to its functional characteristics, described extensively above, this tool makes it possible to test many of the theories previously investigated with other techniques, but in a more ecologically valid setting and with a reversed approach (starting from real action and not from an abstract/verbal stimulus), which would genuinely enrich this specific area.
In addition, the already well-known applications of VR in clinical settings open up new fields of application for studies linking language and action (with particular attention to the contribution of studies on mirror neurons). Indeed, this method not only enriches specific knowledge of the phenomenon but can also be considered a promising avenue for applying theoretical insights to improve the learning or relearning of language or motor skills in deficit conditions.