Abstracts

Alphabetical by last name.

Albano, Mariangela. University Sorbonne Nouvelle. Title: Adaptive memory in case of linguistic contamination: a comparison of the reception of TV News in France after FIFA World Cup in 2014 and after November 2015 Paris attacks. Abstract: This paper analyses a fieldwork based on the adaptive memory in the case of linguistic “contamination”. We analyse the reception of TV News (2 reports in French) from a sample of 50 individuals from 20 to 30 years old (25 women/25 men). In particular, our analysis concerns the reception of a report about the FIFA World Cup 2014 and a report relating to the Paris attacks in November 2015. Each interview will be characterised by a psychological methodology and, in particular, we take into consideration the research of Bonin and Bugaiska (2013). We will analyse the 50 interviews using a joint psychological and linguistic approach. In particular, we take into consideration the evolutionist approach in psychology. According to this approach, memory is considered problem-oriented because there is a specialisation to memorize certain kinds of information related to survival problems. In this study, we show that it is possible to apply the perspective of physical contamination to the case of a “linguistic virus”. Moreover, media language is characterised by a high degree of stereotypes that are entrenched in our brain through the normal process of neural learning. According to cognitive linguistics, we acquire them automatically and unconsciously through the learning of conventional everyday blends. Indeed, the most important concepts are conceptualized by multiple metaphorical and analogical structures. From this perspective, in order to analyse the interviews, a cognitive linguistics approach will be applied to show how the answers of the participants could be influenced by the linguistic and visual choices of the newscasts. 
The aim of this research is to understand how the survival pressures have a primary role in the memorisation process, to show which elements of the TV News activate automatic stereotypes and how categorization works in the creation of metaphorical blendings.

Alcaraz Carrión, Daniel; Valenzuela, Javier. Lancaster University; University of Murcia. Title: Temporal co-speech gestures: A comparison between spatial and non-spatial temporal expressions. Abstract: In this study, we compare the co-speech gestures triggered by a number of English temporal expressions as a means of finding out about English speakers’ conceptualization of time. The temporal expressions examined belong to three different categories. The first category involves temporal expressions that do not employ spatial language (earlier, later) and are thus non-metaphoric; the other two consist of spatial temporal metaphors (that is, temporal expressions which include spatial language), which are further subdivided into directional expressions, that is, spatial expressions which mention explicitly the direction as in back in those days or months ahead, and non-directional spatial expressions such as distant past or near future, which include spatial terms -distant, near- but do not make reference to a specific spatial location. The aim of the study is to determine whether or not there is a difference in co-speech gestures (and thus, a different conceptualization) among these different categories. Data was obtained through the NewsScape Library, a multimodal corpus which contains more than 10 years of television news and talk shows and allows us to gather high-quality, natural data. We collected a total of 469 temporal co-speech gestures, divided among the three categories (127 for non-spatial, 146 for spatial directional and 196 for spatial non-directional expressions). All the data was qualitatively analyzed by two different coders to ensure its attestability. Our results provide support for previous hypotheses, for instance, the tendency reported in the literature for English speakers to create online timelines on the lateral axis when conceptualizing time. 
This has been found to be the preferred axis, though other axes are also employed in gesture realization (sagittal and vertical), in a proportion which has been observed to depend on the specific type of temporal category. For example, lateral gestures are more likely to be performed in non-spatial language (72%) rather than directional (59%) or non-directional (64%) language. This is congruent with the linguistic terms used, since items such as back or ahead are linked to the sagittal axis rather than to the lateral one. However, contrary to expectation, a sagittal gesture is more likely to be triggered by non-directional linguistic items (37%) than by directional ones (19%). We hypothesize that sagittal gestures are more frequent in non-directional metaphors because the speaker needs to establish a clear temporal point by gesturing, since such information is not linguistically indicated.

Ambrus, Laura. Eötvös Loránd University, Budapest, Cultural Linguistic Doctoral Program. Title: The role of gestures in the interpretation of digital memes. Abstract: The spread of the internet has had a great impact on language use and communication. Among the biggest changes we can mention the speed at which information spreads, the dismissal of grammatical and orthographic rules, a fact that nowadays influences everyday language use, and most importantly, the very frequent usage of non-linguistic elements with specific communicative functions. In line with this tendency, a large proportion of digital communication is performed via pictures and pictorial/textual elements, even in those cases in which a textual description would carry the information. The attention-grabbing function of pictures is not in doubt today, but several questions arise from a cognitive linguistic point of view: what meaning-making processes do memes perform? How do we interpret them? What functions do the pictorial and the textual elements have, and how do they correlate? The main hypothesis is that digital memes may be multimodal not only in a textual-pictorial form, but can also make use of gestures and facial expressions, in such a way that in certain cases the gesture/facial expression is represented by the pictorial element, so in the meaning-making process the picture carries the meaning of the gesture/facial expression. It can generally be stated that metaphors and other figurative devices are not represented exclusively in a multimodal way; however, in most cases the emotions and opinions represented by the pictorial elements may be metaphoric or metonymic. Interestingly, in the most frequently used memes, the gesture represented acquires a particular meaning that was not initially present in the original picture; consequently, it can be used without any textual element, since its meaning (the meaning of the gesture) is well known among users. 
However, academic opinion regarding facial movements is highly divided: only a few authors consider them gestures (Ekman and Friesen, Argyle), with the restriction of intentionality: if the speaker intentionally displays a facial expression, it can be qualified as a gesture. In the present study facial expressions are considered intentional, because in the case of memes it can be assumed that the pictorial element that represents a gesture is chosen by the producer for a reason. The pictures representing a gesture or expressing an emotion via facial expression are combined with textual (contextual) messages that in reality have nothing to do with that specific picture. It can be assumed that the gestures and facial expressions represented in the pictorial elements are chosen very carefully and that they carry a meaning which is of great importance in understanding certain memes. Presumably the pictorial elements are selected so that the represented gestures are quite typical of certain emotions and opinions, even though the original picture was not captured to express these emotions. The analysis of the metaphors is based on Conceptual Metaphor Theory (CMT); blends, which are very common, are analysed with Conceptual Integration Theory, which performs better there; and aspects of multimodality are taken into account based on Charles Forceville’s works. The gestures are investigated in a CMT-friendly manner.

Arruda-Doná, Beatriz; Pereira de Souza, Aline. UNESP (State University of São Paulo). Title: The seduction of the image: an attractive way of constructing texts. Abstract: In light of the concept of Cognitive Projections, discussed by Fauconnier (1995; 1997; 2001), Fauconnier and Turner (1994; 2002), Turner (2014), Bergen (2012) and Hofstadter and Sander (2013), this paper aims to show the relevance of analogical projection processes, such as metaphor, metonymy and parables, in the construction of written text, and in particular of argumentative text. This work focuses on the main role of a text introduction and relates it to the effect that images have on this creative process. In this sense, analogy is an important cognitive process that can be used to contextualize the theme and introduce the point of view defended in an argumentative text. It plays an important role in seducing the interlocutor into the text content, creating an emotional environment that draws the interlocutor’s attention to what will be discussed throughout the text. Based on the studies of Arruda (2007), the conscious use of these processes (metaphor, metonymy and parables) functions as an important tool in the introduction of an argumentative text, since they create images with greater visibility and refer to situations with which the interlocutors are familiar. As a secondary school teacher in Brazil, working on “reading” and “writing”, I believe that students need to master these domains not only to go on to higher education, considering that most selection processes in Brazil require the production of an argumentative text, but mainly because they need reasoning skills for their everyday communication. This paper presents the results of the analysis of three students’ work, produced after the theory outlined above was applied in the classroom setting. In conclusion, it will show how the seduction of images provides a very attractive and creative way of constructing texts.

Augustyn, Rafał. Maria Curie-Skłodowska University. Title: Verbo-audio-visual ensembles in science communication. Abstract: Human cognition is reliant on various sensory stimuli and their simultaneous processing has a largely positive impact on the way we process information, inter alia, by raising our attention, boosting our memory capacity and engaging us emotionally. Therefore meaning, which in Cognitive Linguistics paradigm is usually reduced to conceptualisation, cannot be limited to a single (verbal) mode of representation, as meaning-making is based on a number of human cognitive faculties, including our ability to think in metaphors, recognizing patterns and finding correspondences between concepts deriving from different sensory inputs (cf. Forceville & Urios-Aparisi 2009). Multimodal communication can also be successfully used for educational purposes (cf. Kress 2010), and thus in recent years we have been witnessing unprecedented rise of different and creative forms of communication aimed at popularising knowledge pertaining to virtually every science field and discipline. Owing to modern social media (YouTube, Vlogs, dedicated websites) and special events held in real life (e.g. TED conferences, FameLab) both researchers and lay science communicators use different channels available and create multimodal presentations (using mainly picture and sound) on strictly scientific or science-related topics to attract attention of different target groups. As the aim of such presentations is to attract the attention of the audience and the presenters usually have a very limited time to tackle frequently a complex scientific issue, it requires great planning skills as to both the content and the form of presentation. 
With this in mind, based on selected examples of FameLab competition entries and TED talks in which the presenters use verbal message, pictures, gestures and music, I attempt to account for the interplay of different inputs in the process of meaning construal as intended by the presenters. To this end, I will use Fauconnier and Turner’s (2002) standard model of Conceptual Blending, as well as its later modifications (cf. Oakley & Coulson 2008 and Brandt 2013) which, in my opinion, can give further insight into how inputs from various modalities are fused together to produce a semantically rich yet succinct blend that can later be successfully unpacked by the audience. In particular, I am concerned with the following questions: a) Can we identify individual patterns of multimodal conceptual integration in science communication? – A pilot study suggests that, for instance, music tends to be conceptualised as a moving object in space (cf. Pérez-Sobrino 2014) and thus it is possible to integrate “musical dimensions” (such as rhythm, pitch, tempo, timbre) with spatial dimensions as used in gestures or orientational metaphors in language. b) Can we establish the degree of merger between different types of multimodal inputs (verbal, audio, visual)? – Is it possible to assess the contribution of individual inputs to the actual meaning of the multimodal blend? c) Can we exactly define the role conceptualisers play in multimodal blend creation? – Is it more a subconscious or conscious process?

Avelar, Maíra; Ferrari, Lilian. State University of Southwest Bahia (UESB); Federal University of Rio de Janeiro (UFRJ). Title: Experiential Integration and Deixis: the discourse role of gestures. Abstract: Since spoken utterances are usually accompanied by hand gestures (HOSTETTER & ALIBALI 2008), we propose an analysis of deictic multimodal constructions (e.g., STEEN & TURNER 2013) based on preliminary Brazilian Portuguese data. More specifically, we focus on the spatial deictics ‘aqui’ (here) and ‘lá’ (there) in Brazilian Portuguese political discourse, using videotaped data. Drawing on the Cognitive Linguistics framework, the analysis combines Mental Spaces Theory and Cognitive Grammar, relying on the notion of Conceptual Integration (FAUCONNIER & TURNER, 2002), and the related notion of Experiential Integration (AUCHLIN, 2013), as well as on the operation of construal, and the related concept of focalization based on foreground/background distinctions (LANGACKER, 1991, 2008, 2013). For the characterization of emergent structure in the blended space, we investigate the relations between specific deictic uses of ‘aqui’ (here) and ‘lá’ (there) and associated gestural configurations. Previous results on pointing gestures in Brazilian Portuguese corpora showed that all seven pointing gestures categorized by Kendon (2004) were performed, and that most of the pointed referents were not concrete, but abstract (AVELAR, 2016). Based on the notion of gesture excursion (KENDON, 2004), we analyze the gesture strokes that co-occur with ‘aqui’ and ‘lá’, and the results suggest that discursive uses of these spatial deictics emerge from the integration of a conceptual input space, which contains the deictics’ linguistic structure, and an experiential input space, containing specific gesture configurations. The integration of these input spaces allows the speaker to select the parts of discourse s/he wishes to make prominent.

Averina, Anna. Professor of the Chair for Germanic Philology of Moscow Teacher Training University. Title: Modal particles in combination with modal words as multimodal constructions in German. Abstract: The purpose of this paper is to describe the different ways in which modal particles can be used with modal words in German sentences. Modal particles in German such as doch, wohl and ja can be combined with modal words according to certain rules. The common position of modal particles in front of modal words has already been described (Coniglio 2011; Abraham 2011). However, there are cases in which some modal words take the position in front of modal particles. The following sentence exemplifies this point: (1) Dafür war aber eigentlich wohl eine andere Person zuständig, der ja meines Wissens arabisch-islamische Gruppen in Berlin koordiniert (DeWaC). This paper is concerned with the following issues: the modal meanings that the modal particles wohl, doch and ja encode; the reasons why the modal particles are multimodal; the reasons why some modal words can be used in front of modal particles whereas others cannot, e.g.: (2) Dass uns das sehr überraschen musste, wird hoffentlich (*) wohl sehr leicht begreiflich sein (DeWaC). (3) Der Topf, in den dieses Geld fließt, macht allerdings wohl etwa 25 Prozent der gesamten GEMA-Erträge aus (DeWaC); the functions modal particles have when used after modal words, and whether these functions differ from those they have when used before modal words; whether the compatibility of modal particles depends on the semantics of modal words; and the way the contextual environment can influence the possibility of using modal particles after modal words.

Barbieri Vieira, Sarah; Suarez Abreu, Antônio. São Paulo State University/ Estácio - UniSEB; São Paulo State University (UNESP). Title: Multimodal metonymies and metaphors in political discourse. Abstract: Traditionally, metonymies and metaphors are seen as something anchored in visual images, as in We need more hands in our company or Einstein was a giant. Our proposal in this work is to explore multimodal sources linked to our five senses (sight, taste, smell, hearing and touch) in political discourse. Our knowledge of the world is essentially metonymical. We see only parts of things and people. We can retrieve what they are or who they are because we keep their wholes within our long-term memory. So far we are talking about images. However, when I receive a phone call from my mother, I do not need to ask her who is calling me because I can easily metonymically link her voice to her, retrieving this information from my long-term memory. When I am driving my car on a very long descent that forces me to use the brakes uninterruptedly and I start to smell the brakes, I immediately link that smell to brake deterioration. When I taste a pineapple, I can say if it is sour or sweet, comparing it to tastes previously stored in my long-term memory. When I handle a melon at the fair, I know, by touch, if it is ripe, comparing its consistency to the consistency of ripe melons that I have stored in my long-term memory. We retrieve all those multimodal experiences from our five senses in our everyday experiences when we use metaphors, as in She is in the clouds because she is getting married tomorrow (sight); This dress has yelling colours (hearing); These partnerships between government and contractors do not smell good (smell); The high of interest is a bitter remedy (taste); We need to trim the edges between negotiators (touch). These resources help us form in our minds not only visual images, as Bergen (2012) said, but multimodal ones. 
As Lakoff says in his well-known, recently reissued book Don't think of an elephant, words matter. Language is part of the way we construct the world within our mind. “You might think that the world exists independently of how we understand. You would be mistaken.” (Lakoff, 2014, p. 35). Looking randomly at the Washington Post (14.01.2017), we ran into some of those multimodal metaphors: Obama is not going quietly in the twilight of his presidency (sight); Trump will break the mold (touch); It’s a long step from those kinds of outbursts to any rhetoric… (hearing). We believe that studying multimodal metonymies and metaphors in political discourse will lead us to understand one of the important tools politicians use in their rhetoric for framing the way the audience must build the world as they intend.

Beltrán-Palanques, Vincente; Campoy-Cubillo, Mari Carmen. Universitat Jaume I. Title: Multimodal pragmatics in conversation: What the active listener reveals verbally and non-verbally. Abstract: Research in interlanguage pragmatics (ILP) has largely ignored the multimodal nature of communication. Furthermore, ILP research has rarely been explored from a conversation analysis (CA) perspective (Kasper, 2006). A multimodal approach to CA would enable researchers to explore how speakers construct action over the course of the conversation from a multimodal perspective. In following this perspective, one might observe not only how speakers take turns to elicit sequences but also simultaneous talk performed through different communicative modes. We thus observe how speakers construct talk that is supported by a variety of modes. Although some studies have been conducted to explore simultaneous talk in authentic data, and particularly, backchannels (Adolphs, 2008; Knight, 2011), there is a need to examine this phenomenon in simulated data in the arena of ILP. Although simultaneous talk might sometimes be seen as a face-threatening act (Brown and Levinson, 1987), it can also be regarded as an expression of positive politeness in the sense that speakers may attempt to talk simultaneously as they are actively engaged in the conversation (Fernández-Amaya, 2013). A conversation feature that is commonly observed in everyday interaction is the backchannel, which signals active listenership. The present audio-visual pragmatic corpus-based study consists of simulated data elicited from foreign language learners performing complaint sequences. By means of a video corpus, a multimodal perspective was used to explore aspects of visual interaction (Goodwin, 2000a, 2000b; Heath, Hindmarsh and Luff, 2010). 
Particularly, for the purpose of the current study, we follow a multimodal CA perspective (Mondada, 2016; Streeck, Goodwin and LeBaron, 2011) that allows us to examine multimodal pragmatics in conversation and, more precisely, the multimodal construction of interaction by focusing on the nature of the active listener, that is, backchannels. The present study reveals how participants perform backchannels not only verbally but also non-verbally and how those signals affect the current speaker’s construction and the elicitation of multimodal turns. Hence, we examined the multimodal construction of interaction by paying attention to the active listeners’ interplay of verbal production, facial expressions and gestures.

Berindeanu, Florin. Case Western Reserve University, Department of Classics. Title: Deleuze Beyond Language: Communication between Semiotics and Pragmatics. Abstract: Gilles Deleuze is one of the most prominent post-structuralist philosophers whose work - especially his major titles - defends the precursory roots of pragmatics over semiotics. Such a claim will be analyzed in this paper with particular emphasis on how Deleuze attempts to redefine a certain "deterritorialization" of the sign. A particular regard will be given to the way in which the philosopher describes the circulation of signs from grammatical interaction to literary and mytho-critical pragmatics. Examples sustained by Deleuze's own references to the Stoics, Kafka and myth will illustrate the whole point of the intervention, namely that what is traditionally called philosophy represents in fact the passage of becoming from pragmatics to semiotics.

Berov, Leonid. University of Osnabrück. Title: Mental Simulation for Multi-Modal Story Telling. Abstract: We present phenomenological evidence suggesting that the creation of (multimodal forms of) narratives involves mental simulation as an exploratory process as part of an iterative engagement-reflection cycle. Our domain of study is film, an inherently multimodal genre. As Russell Davies, screenwriter of the fiction series "Doctor Who" puts it: "it's not just music + picture + character in separate beats. No, they're all interconnected. The pictures aren't just pictures; they're the tone, the wit, the style, the plot, the people, all in one" (Davies & Cook, 2010). How does the mind of a screenwriter synchronize these different modalities: sound, picture and the spatio-temporality of plot? Evidence on the employed mental models can be uncovered from phenomenological accounts of the creative process, e.g. Davies' "The Writer's Tale", an epistolary description of the writing process of "Doctor Who". Phenomenology -- the description of the subjective experience of a process -- is considered a valid source of information during the first steps of cognitive modeling, since cognitive models need to show "how causal relationships at a deeper level give rise to the phenomena of cognition at a higher, phenomenological level" (Sun, 2005). Repeatedly, Davies describes instances of hearing and seeing a narrative unfold during writing. This quasi-automatic execution of symbolic representations of the external world is reminiscent of simulation, a mental model that is also employed during the consumption of textual narratives (Oatley, 1999; Steen, 2005). Reading in itself is a process that transforms unimodal text into a rich, multimodal mental representation. It seems reasonable to assume that the mental machinery used to decode a narrative should be the same as the one used to encode it. 
Our phenomenological evidence suggests that, during writing, mental simulations can be executed repeatedly under varying preconditions, which affect the result. This can be interpreted as an exploratory process that helps synchronize the spatio-temporality of events in the plot with the other modalities that require the writer's consideration during screenwriting. Under such an interpretation, mental simulation should be considered a part of the engagement-reflection model of creative writing (Sharples, 1999). We argue that this is enough to establish the hypothesis that mental simulation is part of the mental processes involved in writing (especially multi-modal) narratives, and suggest that an analysis of deeper structures (according to Sun's definition) is called for as a next step.

Bhatt, Mehul; Suchan, Jakob; Kondyli, Vasiliki. Human-Centred Cognitive Assistance Lab; The Designspace Group University of Bremen. Title: Multi-Modal Studies of Human Behaviour: The Case of Embodied Experiences and Visuo-Spatial Cognition. Abstract: The field of visuo-spatial cognition and computation has established its foundational significance for the design and implementation of computational cognitive systems, and multimodal interaction & assistive technologies where people-centred perceptual sensemaking and interaction with cognitively founded conceptualisations of space, events, actions, and change are crucial. In particular, our research on visuo-spatial cognition and computation addresses a wide range of domains where perceptual sensemaking (e.g., abstraction, reasoning, learning) with (potentially) large volumes of highly dynamic visuo-spatial imagery is central. Against this backdrop, deep semantics denotes the existence of declaratively grounded models (e.g., for spatial and temporal knowledge) and systematic formalisation that can be used to perform reasoning and query answering, relational learning, embodied grounding and simulation etc. The talk will present a computational framework for the grounding and semantic interpretation of multi-modal dynamic visuo-spatial imagery (e.g., consisting of video, eye-tracking and other human behaviour data). The framework and its applications in two areas of industrial and academic interest are showcased: (1) architecture design, aimed at predictive and empirical / evidence-based design involving analysis of pre- and post-design user behaviour in buildings, e.g., pertaining to visual perception of signage, wayfinding etc.; (2) cognitive film studies, aimed at investigating attention and recipient effects in observers vis-a-vis the motion picture. 
I will particularly focus on the capability to perform Q/A with space-time histories within constraint logic programming (CLP); space-time histories as primitive objects provide the foundation for grounding perceptual and analytical entities, e.g., areas of visual attention, moving objects, as native objects that can be qualitatively and declaratively reasoned about within CLP. I will present running examples and a demo application involving automated analytical summarisation of large-scale visual perception experiments. Demonstrations and Case-Study: In support of the conceptual overview of our research, this presentation will also utilise supporting demos of key outcomes and a range of prototypical tools developed. As a case study, select examples from large-scale human-behaviour experiments that we have conducted in the domains of film and hospital design will be utilised.

Bilkic, Maida. University of Bern. Title: Sensing the war: The multimodal production of public memory in Bosnia-Herzegovina. Abstract: The 1992-1995 war in Bosnia-Herzegovina (B&H) emerges in public memory in all sorts of complicated ways and across various cinematic, literary, and political domains. Important sites which (re)produce discourses of memory and memorialization are museum and gallery exhibitions, which are also fully multimodal and, indeed, multi-sensory texts (Levent & Pascual-Leone, 2014; Kress & van Leeuwen, 2001). Exhibitions, as primarily visual experiences and primary composite mediums, construe ideologies through specifically presented and arranged semiotic resources co-deployed to make a meaningful textual whole (Pang, 2004). My paper presents a critical multimodal discourse analysis (cf. Machin & Mayr, 2012) of one gallery and two museum exhibitions in Sarajevo: “Sarajevo under siege”, “War childhood”, and “Srebrenica”. In each case – and across the three cases – my analysis treats the exhibitions as complex semiotic assemblages constituted through the selection and composition of verbal narratives, artefacts, images, spaces and technologies. Following the lead of Ravelli’s (2006) research on museums as communication, I identify four key rhetorical strategies through which discourses of public memories are staged and produced: (1) hardship and survival, (2) victimhood and suffering, (3) internment, and (4) malevolent enemies. Through this analysis, I demonstrate how the multimodal design and organization of the exhibitions work ideologically to produce an over-riding narrative of national victimhood which remains locked in a vicious cycle of (re)experiencing its past. Ultimately, visitors are engaged in surely meaningful, but also socially desirable, processes of (re)imagining and sensing the war, often without agency.

Bolek, Elwira. Maria Curie-Skłodowska University. Title: Play on signs and meanings in Polish artistic theatre posters. Abstract: The starting point for the planned presentation is the premise that an artistic poster is a message in which the utilised sign systems – verbal and visual – interact. Noticing that the semiotic categories are interwoven allows the use of multimodality in regard to the poster. In previous research on the semantic and cultural interpretations of artistic posters, I have always noticed the coexistence of domains contributing to the individual conceptual framework of the message’s recipient. The input spaces which blend into a single entity in the semantic and cultural interpretations of a poster are the following: object-oriented, verbal, and visual. The following elements constitute the object-oriented domain: the title of the advertised play, playwright’s name, venue, name of the poster’s author, year of creation. Some verbal signs might become independent of the object they refer to and they might begin to play on meanings. When words do not point to the designate, but create associations, they enter the verbal domain. For example, the title refers to the play, but sometimes (when the recipient is not familiar with the play’s content) it might only create connotations with the words used. This is the situation we face when reading the meanings of Henryk Tomaszewski’s poster for Historia. For the recipients who did not know Gombrowicz’s play, but saw the poster in a street in 1983, the very word historia [history], read in the context of the political situation, referred to Martial Law. This example shows also that the domains cannot be read in isolation, as the reception of the reference to Martial Law simultaneously activated elements from other input spaces. The visual domain is the area of graphical signs, with their connotations, symbolism, and the colours’ meaning. 
In artistic multimodal messages, linguistic signs tend to serve the function of the image. In the example, the described association of the word history with Martial Law is created partially by graphical signs – in this case the “V” (Victoria – “victory”) – a symbol of anti-system resistance, used by the demonstrating crowds, and the date 1983 – information about the poster’s creation date, but also a reference to the year of Martial Law being lifted. Tomaszewski’s message announces the content of the play – a visual depiction of a green foot refers to Gombrowicz’s category of “barefootedness”, key to the advertised Historia – but also, through a play on the meanings of verbal and visual elements, it is associated with the political situation. This partial analysis of that poster shows that text and image interpenetrate and complement each other, blending within the poster into a macrosign, the meaning of which can be read only when recognising its blended nature. Henryk Tomaszewski’s posters for plays such as Zemsta or Ślub shall constitute representative research material. The material analyses and interpretations will show how many meanings and contexts can be activated by one multimodal message. The verbal and visual signs of artistic theatre posters play on meanings and allow the recipient multiple ways of reading, reinterpreting them, and revising meanings.

Bolognesi, Marianna. University of Amsterdam. Title: Flickr Distributional Tagspace: Modeling the multimodal semantic information encoded in Flickr tags. Abstract: Social tagging can be defined as a large-scale uncoordinated effort through which internet users annotate digital resources (for example, digital pictures or whole web pages) with keywords known as “tags” (e.g. Cattuto et al. 2009). Social tagging is an increasingly popular phenomenon on the web that has caught the attention of many academic researchers interested in mining the semantic information encoded in these streams of Big Data. The idea behind this is that each little piece of data (that is, each tag) is a trace of human behavior and offers us a potential clue to understand basic cognitive principles. Flickr users, for example, tag their personal pictures with a variety of tags that range beyond the simple visual features (e.g. concrete objects, lights and colors) represented in the pictures. Flickr tags encompass abstract concepts, sensory-related words that range beyond the visual modality, emotions, concrete objects that are not physically represented but are conceptually cued, etc. Flickr tagsets, in other words, encompass multimodal semantic information, triggered by visual cues (i.e. Flickr pictures). The multimodal semantic information encoded in Flickr tags provides genuine insights into salient aspects emerging from the personal experiences that have been captured in the picture. By means of distributional semantics (e.g. LSA; Landauer & Dumais 1997) it is possible to mine the emergent semantic patterns of these complex, open-ended, large-scale bodies of uncoordinated annotations provided by humans.
I hereby show how a distributional method based exclusively on Flickr tagsets (Author, 2014; 2016a; 2016b) can model human-like behavior in determining the correct order of the color-spectrum of the 6 primary and secondary colors that constitute the rainbow, and in clustering coherent classes of semantically related words. The performance of this method is compared to the performance of language-based classic distributional methods which, unlike the Flickr-based method, are based solely on the semantic information that can be retrieved from linguistic co-occurrences. Finally, the method is used to model the semantic similarity that characterizes two metaphor terms in, respectively, a sample of representative visual metaphors and a sample of representative linguistic metaphors. The method can successfully capture the semantic similarity between concepts aligned in visual but not in linguistic metaphors. The theoretical implications are hereby discussed, in support of a multimodal account of cognition.
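The general idea of a distributional method over tagsets can be illustrated with a minimal sketch. It is not the author's implementation: the toy tagsets, the function names, and the choice of raw co-occurrence counts with cosine similarity are all assumptions made for illustration. Each tag is represented by the counts of the other tags it shares a picture with, and two tags count as similar when those count vectors point in the same direction.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_vectors(tagsets):
    """Map each tag to a Counter of the tags it co-occurs with (one tagset = one picture)."""
    vectors = {}
    for tags in tagsets:
        # Count each unordered pair of distinct tags once per picture.
        for a, b in combinations(sorted(set(tags)), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Invented toy data, not real Flickr tagsets.
TOY_TAGSETS = [
    ["red", "sunset", "warm", "sky"],
    ["orange", "sunset", "warm", "sky"],
    ["blue", "sea", "cold", "sky"],
    ["green", "forest", "calm"],
]

vectors = cooccurrence_vectors(TOY_TAGSETS)
sim_red_orange = cosine(vectors["red"], vectors["orange"])
sim_red_green = cosine(vectors["red"], vectors["green"])
```

In this toy data "red" and "orange" share all their contexts while "red" and "green" share none, so the first pair scores higher. A realistic tagspace would presumably need far more data and some weighting of the raw counts.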

Bonifazi, Anna. University of Stuttgart. Title: The multimodal poetics of weeping from Homer to the Beatles. Abstract: The paper presents the results of a pragmatic and cognitive analysis of multimodality emerging from a sample of ancient literary texts designed for performance, and from a sample of modern songs. The texts and lyrics under consideration differ in language, genre, and date, yet they share the same topic: all of them describe entities weeping about something. The literary texts include lines from the Homeric epic poem The Iliad (8th to 6th cent. BCE); fragments of funerary laments (thrēnoi, 6th cent. BCE); and lines from tragedies by Euripides (5th cent. BCE). The analysis in these cases relies on the linguistic and the metrical components, besides indirect and general information about the performative context. For the songs, conversely, we can count on multiple recordings and live performances: they include a piece for voice and lute by John Dowland titled “Flow my tears” (1600), the aria from Haendel’s Rinaldo “Lascia ch’io pianga” [“Let me weep”] (1711), and the song “While my guitar gently weeps” by the Beatles (1968). I take into account the figurative language being used, the word arrangement, the nonlinguistic activities mentioned together with the act of weeping, the nonlinguistic (acoustic and visual) aspects of the performances, and how the medium “intrudes” into the semantic content in these pieces. I discuss the multimodal mappings of the conceptual metaphor performing is weeping. The analysis points out recurrent features in spite of chronological, cultural, and genre diversity: the iconic level of communication, the connection between weeping and narrating, the aesthetics of fluidity (against rigidity), and how the sense of perdurability/unstoppability is embodied.
Further thoughts regard the metaphorical extensions provided by the visual representations of mourning gestures in late Bronze Age artifacts, and by labels such as the (contemporary) plant name “Niobe weeping willow.”

Borkent, Mike. SSHRCC Postdoctoral Visiting Fellow; Department of English, University of Calgary. Title: Multimodality and Medium-specificity: The Case of Temporal Inferences in Comics. Abstract: While multimodal communication is often discussed in terms of natural language use and recordings thereof (such as in gesture studies), a distinction needs to be maintained between unmediated and mediated forms of multimodal communication. Media (such as video, websites, and print) function as environments that constrain and control which and how modalities can be employed, including developing discrete discursive and generic conventions (such as tweets and hashtags). While ignoring the medium is acceptable when recording and analyzing natural, multimodal communication, the medium cannot be ignored when it informs communicative practices. Mediated multimodal communicative artifacts (such as printed works, videos, websites, sculptures, and weavings) permeate human cultures and require a flexible cognitive framework that engages with the material qualities that mediate and inform representational strategies. I model such a framework through the analysis of the medium of comics. Comics and other graphic narratives are a common and popular form of multimodal communication that employ the sequential segmentation of panels, which are composed of images and text (such as in the single panel included below), that readers are required to piece together to interpret. Out of the juxtaposition and interrelation of images, language, and layouts, comics develop complex event structures, characterization, and conceptual layers. For instance, the panel below from a 1965 Captain America comic illustrates a dynamic altercation between the protagonist and a villain, including distinct physical, emotional, and political content. This paper presents a medium-specific approach to comics, offering in particular cognitive tools for engaging with temporal construal.
I develop an analytical paradigm drawing on blending theory (Dancygier 2012; Fauconnier and Turner 2002) and its extension to material anchors (Hutchins 2005) and time lines (Coulson and Cánovas 2009), grounded in the embodied theories of cognitive domains (Clausner and Croft 1999; Fillmore 1982; Langacker 2001) and mental simulation (Bergen 2012; Hickok 2014). I examine how comics anchor schematic timelines in their medial sequencing of panels, while also embedding temporal information in the verbal and pictographic modalities (such as the conventional motion lines that extend the implied duration of the image). Blending theory helps us trace how each modality impacts temporal construal about events and characters within and between panels, especially through the partial activation of domain knowledge through the development of mental simulations. Importantly, pictographic and linguistic cues contribute substantially different temporal qualities. Images typically represent a brief moment in time while language often offers lengthier dialogue or narrative cues. For instance, in the example, we see the villain mid-fall after being struck. Meanwhile, the protagonist makes a longer statement about freedom. There is an inherent asynchronicity between the modalities that must be reconciled by readers to interpret the panel as a cohesive scene. Combining the mental simulation of embodied domains through mental space blending helps show how asynchronous mediated modalities can synthesize into an emergent mental space that approximates real co-speech actions while construing event qualities. This model aligns with recent empirical research into comics comprehension (Foulsham, Wybrow, and Cohn 2016), while showing how mediation impacts meaning.

Brenger, Bela; Schüller, Daniel; Beecks, Christian; Mittelberg, Irene. RWTH Aachen University. Title: Methods and tools for motion-capture data analysis: Advances in exploiting multimodal corpora. Abstract: Co-verbal communicative action, deeply embedded in multimodal discourse, has been a subject of inquiry for several decades (e.g., Kendon 1975; McNeill et al. 2001; Müller et al. 2013). In multimodal interaction research, motion-capture technology has proven to be a powerful tool to investigate furtive dynamic structures of gestures and head movements in minute detail, to visualize motion trajectories, and to facilitate statistical analyses. However, gesture researchers are still facing various challenges when adapting this technology to communicative movements to respond to specific linguistic and semiotic research questions. Compared to video recordings, motion-capture technology has the advantage of providing precise 3-dimensional numerical data corresponding to each gestural gestalt, e.g. hand/finger configurations and movements tracked on a millimeter and millisecond scale. These kinetic data may then build the basis for numerical and statistical analyses, thus opening new avenues for gesture and sign language research (e.g. Pfeiffer 2013). This paper presents recent technical and methodological advances, both qualitative and quantitative, made in the Natural Media Lab. First, we introduce an automatization toolchain that significantly reduces the time needed for data postprocessing. Our auto-labeling algorithm uses only a few manually labeled reference frames per subject and calculates the distances between the tracked markers. An algorithm then calculates distances in the unlabeled data and finds the best fitting solution, while taking into account that distances may exhibit higher or lower variance depending on where they are placed on the body. With this procedure, fully-labelled data can be produced within minutes.
Furthermore, the velocity and acceleration of markers are used in a threshold-based algorithm that creates gesture segments functioning as pre-segmented reference tiers for the annotation of hand and head movements in ELAN, which significantly speeds up the traditional procedure of segmentation, annotation, and analysis (Author 1 & Author 4 2015). In addition, normalizing gesture data makes it possible to aggregate trajectories across subjects. This allows us to visualize the use of gesture space (e.g. via heat maps), either accounting for individual gesture styles or generalizing over several subjects (Priesters & Author 4 2013). In another study, we employed pre-segmented data to create gesture signatures of movement types that served as input for a similarity algorithm searching the data set for matching trajectories, thus affording faster retrieval of all the tokens of a given gesture type from motion-capture corpora (Author 2 et al. to appear; Author 3 et al. 2016). Qualitative analyses of single gestures may further be enhanced by 3-dimensional visual representations of their configurations and trajectories, e.g. through diagrams (Author 2 & Author 4). In combination with speech-annotation and POS-tagging, this enables multimodal discourse analysis as exemplified by Author 4 and Coauthor (to appear). Overall, motion-capture technology is an asset to multimodal gesture research in that it can be employed to automatize tasks that are otherwise done manually. Resulting numeric data streams afford analyses that exceed the limits of traditional video recordings. Motion-capture thus shows great potential to provide both new insights and richer information that can assist researchers in answering certain kinds of research questions more efficiently and effectively.
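A threshold-based segmentation of the kind described can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' toolchain: it uses marker speed only (no acceleration), and the function names, the frame rate, the threshold, and the toy trajectory are hypothetical. A segment is a run of consecutive frames during which one marker moves faster than the threshold.

```python
from math import dist


def speeds(positions, fps):
    """Speed between consecutive frames, in position units per second."""
    return [dist(p, q) * fps for p, q in zip(positions, positions[1:])]


def segment_by_speed(positions, fps, threshold):
    """Return (start_frame, end_frame) pairs where the marker moves faster than threshold."""
    segments = []
    start = None
    for i, speed in enumerate(speeds(positions, fps)):
        if speed > threshold and start is None:
            start = i                      # movement begins at frame i
        elif speed <= threshold and start is not None:
            segments.append((start, i))    # marker has come to rest at frame i
            start = None
    if start is not None:                  # still moving at the end of the recording
        segments.append((start, len(positions) - 1))
    return segments


# Hypothetical wrist marker in millimeters, recorded at an assumed 100 frames per second.
toy_track = [(0, 0, 0), (0, 0, 0), (10, 0, 0), (20, 0, 0), (20, 0, 0), (20, 0, 0)]
toy_segments = segment_by_speed(toy_track, fps=100, threshold=500.0)
```

Segments of this form could then be exported as time intervals for an annotation tier. A practical version would presumably also smooth the signal and discard very short segments.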

Brône, Geert; Oben, Bert; Feyaerts, Kurt. University of Leuven, Department of Linguistics. Title: The grounding function of eye gaze in interactional humor. A corpus-based study using eye-tracking data. Abstract: Studies in conversation analysis, cognitive discourse psychology and (interactional) linguistics have pointed at the important role of eye gaze as a grounding mechanism for both speakers and hearers (Kendon 1967, Goodwin & Goodwin 1986, Bavelas et al. 2002, Rossano 2010, 2012a). For instance, gaze can be used by speakers to elicit and monitor response by the recipients in an attempt to establish or extend common ground in the interaction. Hearers, on their part, can establish and maintain eye contact with the speaker or other participants as a display of attention, engagement and understanding (Rossano 2012b, Holler & Kendrick 2015). These phenomena lead to highly synchronized sequences of gaze behaviour across participants, realizing mutual gaze situations (i.e. both participants looking at each other, referred to as gaze windows) in which speakers elicit and recipients realize a form of minimal response (also referred to as back channels or continuers, Bavelas et al. 2002). In case of questions, for instance, speaker gaze towards the recipient may serve as an additional resource to communicate that the speaker is expecting a response, especially when the question is not marked intonationally (Rossano 2010). In the present study, we zoom in on one pervasive phenomenon in face-to-face interaction which requires particular attention in terms of grounding, viz. interactional humor. 
The different manifestations of this phenomenon, including teasing, joint fantasy, irony and sarcasm, all crucially hinge on the different participants’ awareness of the layered nature of the speaker’s communicative intent (Clark 1996, 2016): speakers set up complex constellations of mental spaces (author 2008, 2012; Coulson 2005) and may signal this in a multimodal way through the use of gesture or facial expressions (Tabacaru 2014). In the present study, we focus on the role of eye gaze as a grounding mechanism in interactional humor, using humorous sequences (n=100) taken from a multimodal video corpus of three-party face-to-face interactions, in which the gaze behavior of all participants was recorded using mobile eye-tracking devices (authors 2015). From the speaker perspective, we study gaze patterns as a feedback monitoring mechanism, allowing speakers to track the reaction from the different recipients in the interaction. From the perspective of the recipients, we are mainly interested in reaction monitoring between the recipients. The results of the analysis, which is based on a systematic comparison between the humorous sequences and a random selection of nonhumorous sequences taken from the same corpus, reveal a number of interesting patterns: i. there are more gaze shifts, both on the part of the speakers and the recipients, in humorous turns compared to nonhumorous turns of a similar length; ii. there is a different ratio between speaker and hearer gaze in humorous vs. nonhumorous cases (with a higher ratio of hearer gaze in humorous turns); iii. the synchronization of mutual gaze events between recipients is stronger in humorous turns in comparison to nonhumorous turns.

Brunner, Marie-Louise; Diemer, Stefan. Saarland University; Trier University of Applied Sciences. Title: Developing a taxonomy of gestures in multimodal communication via Skype. Abstract: Gestures have been increasingly studied as a key means of meaning making (cf. e.g. McNeill 2000, Streeck 2010), and there have been calls for a stronger consideration of multimodal elements in corpora (e.g. Adolphs & Carter 2013). Quantitative gesture analyses are rare, not least due to a lack of multimodal corpora that allow for the detailed study of non-verbal aspects in corpus linguistics. One of the main issues with quantifying gestures in a corpus is their retrievability, if they are transcribed at all, as they are difficult to systematize. Such a systematic approach is, however, indispensable to ensure a consistent and “replicable coding scheme” (see Adolphs and Carter 2013:155), which in turn allows results to be quantified. We use data from CASE, the Corpus of Academic Spoken English (forthcoming), as a basis for developing a taxonomy of gestures. CASE consists of Skype conversations between speakers of English as a Lingua Franca from eight European countries. For quantitative analysis, we use a dataset of 20 conversations, BabyCASE (forthcoming). BabyCASE consists of 13 hours of Skype conversations, totaling roughly 115 000 words in the annotated version. The interaction between verbal discourse and non-verbal elements in CASE allows a differentiated view that has not yet been explored in other corpora. We follow a bottom-up approach in developing a taxonomy for gestures in CASE. Gestures contributing to meaning making were marked in a descriptive way by transcribers and then extracted and grouped in systematically retrievable descriptive categories. Our taxonomy is illustrated with a case study of the eight most frequent gestures in our data: nods, head shakes, shrugs, pointing, air quotes, imitating gestures, waving, and physical stance shifts.
Gestures cannot be considered in isolation, but as being interconnected with verbal interaction in a dynamic process of meaning making. Quantitative and qualitative methods were employed to analyze how gestures contribute to the negotiation of meaning in interaction. Keyword and context analyses were used to isolate co-occurring items, allowing additional categorization and quantification. Nodding, for example, most frequently co-occurs with “yeah”, “mhm”, “right”, and “okay”, indicating (and emphasizing) support and agreement. However, such co-occurrences could only be observed for two thirds of the cases, leaving the remaining instances open for interpretation; likewise, many of the other gestures had less obvious or frequent co-occurrences. Those instances of gestures were qualitatively analyzed to further categorize the different levels of meaning. Head shakes, for example, can have multiple, and even opposing, meanings (e.g. confirmation and negation, awe and despair, resignation, lack of understanding, etc.), depending on conversational context and speaker background (see also Brunner, Diemer and Schmidt forthcoming). Our findings suggest that a descriptive categorization is essential when creating a taxonomy of gestures suitable for analysis, and that a mixed-methods quantitative and qualitative approach allows for a more nuanced and complete interpretation. Our paper thus contributes to the integration of multimodal elements as part of the analysis of spoken language data, developing a taxonomy which can serve as a basis for automated gesture recognition.

Caponetto, Paolo G.; Vinci, Elisabetta. University of Catania. Title: Multimodality and Cultural Heritage: the Lamberto Puggelli's archive and the "Teatro Machiavelli". Abstract: Our contribution aims to present a project for the valorization of a theatre archive and of a historical place in Catania (Sicily, Italy) using multimodality. The place in question is the “Teatro Machiavelli”, located in the basement of an eighteenth-century building, Palazzo San Giuliano, situated in Piazza Università, one of the most important squares of Catania. The Palace was the residence of the noble family Paternò Castello, who played an important role not only in the political events of Catania, but also in Europe, since Antonino Paternò Castello, sixth Marquis of San Giuliano, was Foreign Minister of the Kingdom of Italy between 1910 and 1914. Throughout the years, the Teatro Machiavelli was subject to several architectural modifications, producing corresponding changes in its relation with citizens. Today it is managed by two non-profit institutions (Association Ingresso Libero, Lamberto Puggelli Foundation). The Lamberto Puggelli Foundation owns the library and the archive of the Italian theatre director Lamberto Puggelli. The archive comprises about 300 folders containing different sources (writings, articles, images, photographs), thus documenting the activity of the famous director throughout Italy, Europe and the USA. Both the theatre and the archive represent part of the Italian cultural heritage. Our research project focuses on the valorization and public enjoyment of these elements through multimodal tools. In particular we aim to make people aware of the history of the place and to allow them to consult the archive online. In this paper we will present some examples already available and proposals for the future. One of the available tools is an audioguide now published on the website izi.TRAVEL, a platform based on storytelling.
Among the proposals still to be realized we list: the creation of a database to preserve Lamberto Puggelli's archive, using the DBMS MySQL and phpMyAdmin and “one-to-many” relationships, intended as a useful instrument for theatre scholars and students of theatrical literature to learn about the career of the Italian artist Lamberto Puggelli, who worked in the most important opera houses of the world, such as San Carlo in Naples, Teatro Regio in Parma and La Scala in Milan; an exhibition of croquis and sketches used by Lamberto Puggelli in his works; the creation of a theatrical play concerning events and characters related to the Teatro Machiavelli and the Palace; and 3D reconstructions of archaeological elements found on the premises of the theatre, so as to make people aware of the past history both of the place and the city in an immersive way. To conclude, as far as our project is concerned, multimodality can represent a way to increase the audience's involvement and find alternative strategies to give value to important cultural heritage.
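A “one-to-many” relationship of the kind proposed for the archive database can be sketched as one folder record linked to many source records. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names, and placeholder rows are hypothetical rather than taken from the project.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database named in the abstract.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: one folder holds many items (writings, articles, images, ...).
conn.executescript("""
CREATE TABLE folder (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE item (
    id          INTEGER PRIMARY KEY,
    folder_id   INTEGER NOT NULL REFERENCES folder(id),
    kind        TEXT NOT NULL,
    description TEXT NOT NULL
);
""")

# Placeholder rows, not real archive content.
conn.execute("INSERT INTO folder (id, title) VALUES (1, 'Hypothetical folder A')")
conn.executemany(
    "INSERT INTO item (folder_id, kind, description) VALUES (?, ?, ?)",
    [(1, "image", "placeholder image"), (1, "article", "placeholder article")],
)

# Retrieve every item together with the folder it belongs to.
rows = conn.execute(
    "SELECT f.title, i.kind FROM folder f JOIN item i ON i.folder_id = f.id ORDER BY i.id"
).fetchall()
```

The foreign key on item.folder_id is what makes the relationship one-to-many: each item points to exactly one folder, while a folder may be referenced by any number of items.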

Cichmińska, Monika. University of Warmia and Mazury in Olsztyn. Title: Global and local multimodal metaphors in (television) series. Abstract: The aim of the present paper is to propose an approach to the issue of locality versus globality of conceptual metaphors as used in series made by television networks and other services (Netflix, Amazon, HBO). In cognitive stylistics, we differentiate between micrometaphors and megametaphors (Stockwell 2002). Forceville (2008) differentiates between metaphors with a local focus and embedded metaphors. Our aim is to present an approach especially adjusted for the needs of analysis of (television) series. We would like to propose that conceptual metaphors (both mono- and multimodal) can be either global or local in scope, but also that both globality and locality need to be understood differently than in film and literary studies. Global metaphors in the case of series may refer to metaphors underlying the whole series, but also one or more seasons. Local metaphors, on the other hand, may be analysed with reference to one particular scene, or a series of scenes in one episode. We would also like to address the issue of cognitive processing as proposed by Bordwell (1989) and the "thinking viewer" as proposed by Ostaszewski (1994) with reference to how those metaphors may be understood by viewers. The paper will use examples from selected series produced within the last 10 years (for example, House MD, Fargo, Homeland) to demonstrate that both local and global metaphors are a vital element of any quality television production. The paper is a part of the author’s research into monomodal and multimodal metaphors in series.

Cienki, Alan. Vrije Universiteit Amsterdam; Moscow State Linguistic University. Title: Utterance Construction Grammar (UCxG): An approach to constructions as variably multimodal. Abstract: Some proponents of the theory of Construction Grammar have been investigating how it might address the nature of spoken language usage as multimodal (e.g., Andrén 2010; Schoonjans 2014; Steen & Turner 2013; Zima 2014). Problems confronted in this endeavour have included the variability with which gesture is used with speech in terms of its (in)frequency and its (non)obligatoriness: for some expressions, a certain kind of gesture is basically obligatory (witness the speech-linked gestures accompanying deictic expressions [McNeill 1992]), but for many others, gesture is a variably optional component, the use of which depends upon contextual (top-down) and cognitive (bottom-up) factors (the “gesture threshold” discussed in Hostetter & Alibali 2008). Following Kendon (2004), “utterance” is proposed here as a level of description above that of speech and gesture for characterizing face-to-face communicative constructions. It picks up on earlier proposals to consider constructions as prototype categories with more central and more peripheral features (Gries 2003; Imo 2007; Lakoff 1987). The language community’s knowledge of a given utterance construction and that of any language user are discussed as “deep structures” (in a non-Chomskian sense) that provide a set of options (some more central and others more peripheral) for expression. In this sense, any “surface structure” is a metonymic precipitation in context of the construction’s features. Furthermore, these deep and surface structures can be thought of in terms of language users’ knowledge of their potential forms, or in terms of how they are actually used in given communicative usage events (Langacker 2008).
For example, what Fried (2015) calls a construct (a concrete utterance token that has actually been produced) is a surface structure as actually used, while a potential surface structure involves the level of all the possible allomorphs of a given utterance construction. An important attentional mechanism proposed that guides production and comprehension (“uptake”) of utterance constructions is the dynamic scope of relevant behaviors. Building on the notions of the selective activation of meaning (Müller 2008) and attentional analysis of meaning (Oakley 2009), the claim is that a given producer’s focus on what behaviors to deploy as relevant in a given context, and an attender’s focus on which behaviors of the producer are relevant in a given context, are both variable. This means that the producer’s and attender’s scope can be broader or narrower, involving more or fewer behaviors (from among speech, facial expressions, manual gestures, etc.), but that greater alignment in the range of their scopes, and in the degree and kinds of multimodal cues being included as relevant, should result in interactants being more in tune with each other (i.e., interpreting the surface structures used as indexing the same “deep structures”). Taken together in terms of a more fully elaborated framework, it is hoped that the elements of this approach may help bring Construction Grammar closer to being a truly usage-based theory (Barlow & Kemmer 2000).

Císlerová, Eva. Masarykův ústav vyšších studií, ČVUT, Praha (Masaryk Institute of Business and Interdisciplinary Studies, CTU). Title: Architecture: Language and Paralanguage. Abstract: The paper observes recorded interactions and studies the role and the limits of spoken language in the field of architecture. The paper describes a small study of 15 interactions (video recordings of approximately one hour each) that took place in architectural contexts: lectures, discussions and project teamwork. The recorded interactions took place in English, but they were not monocultural, because their participants came from several countries and cultures (Switzerland, Denmark, Czech Republic; architects, academia and NGOs). The aim of the analysis was to observe paralinguistically distinct situations and find features of the interactions in which verbal communication was insufficient and which were related to the topic of architecture. The data was analysed with the help of the qualitative data analysis and research software ATLAS.ti. The theoretical background of the paper is the multidisciplinary work of Fernando Poyatos, and the research is based on four theoretical concepts: culture, interaction, dialogue and text. Culture is traced both as language culture and as an individual culture with associated norms and values. Each interaction is seen as a unique encounter of cultures. The most significant paralinguistic features of the interactions were gestures that pointed to models and drawings (illustrators). Pointing at a model was often connected with difficulties in describing features of a building and in finding the correct architectural term. These difficulties originate from the immaterial (language) and material (architecture) character of the two phenomena.
However, architecture is not a purely material phenomenon and there were several situations in the interactions in which aesthetic qualities of language (poetry) helped to explain an architectonic model. Another significant paralinguistic feature of the interactions was the considerably lower tempo of speech when talking (both lecturing and discussing) about the concepts and meanings of architectonic projects. The paper enables linguists and communication scientists to discover features of language and paralanguage that emerge from specific language interactions (language for specific purposes), and experts in architecture to discover the role and limitations of language in their branch. The most interesting situations will be chosen; their transcribed versions will be presented and played for the audience.

Colautti, Jesse. Graduate Student, Department of English: University of British Columbia. Title: The Lord of the White House: Blending fictive narratives with reality in multimodal memes and gifs. Abstract: This paper focuses on a subset of examples I use to examine why image-plus-caption clips from films and television series are popularized in social media as metaphors for real world events and problems faced by users. I expand on current research into multimodal internet discourse (Dancygier and Vandelanotte, 2015, 2016) to demonstrate that memes and gifs metonymically evoke salient frames to blend reassuring fictive narratives with threatening real world events to offer a positive model of interpretation. I suggest my data is indicative of a broader linguistic pattern in which people use known fictive narratives as models for cognition (Herman 2003). Shared memes and gifs provide new evidence of this phenomenon through the easily accessible and quantifiable medium of social media. Specifically, this paper analyzes a variety of multimodal memes and gifs used in the days following the 2016 U.S. presidential election, all of which frame the unsettling election result in terms of events from The Lord of the Rings films. For example, multimodal blends in the examples below provide fictive representation of two aspects of the election: reframing the future White House as Barad Dur, Sauron’s evil fantasy locale, and using the movie dialogue and characters to offer a consoling message. The major mechanism prompting these types of blends is frame metonymy: recognizing the evil eye of Sauron and the White House as metonymic representations of the two (now connected) realms, the US and Mordor, or tapping into the knowledge of The Lord of the Rings narrative as a whole to create an imagined positive outcome. 
I consider several types of visual information: the iconic nature of the images (the eye, the White House), the representations of characters’ eye gaze and gesture (in ways that prompt the engagement of the viewer as addressee or overhearer), and the choice of textual snippets, as relevant to both the fantasy story structure and the real situation being framed. They can be viewed consistently in terms of viewpoint networks, as described in Dancygier and Vandelanotte 2015, 2016. The spontaneous emergence of such artifacts in the immediate aftermath of the election raises two key issues. Firstly, the rapid sharing of these blends through social media suggests that narratives that have become simplified through pop culture can metonymically evoke richer frames through single images or lines of text. Thus, the long narrative of The Lord of the Rings in these examples is salient to social media users because it is accessed simply as a successful, yet arduous journey against evil, even by those who are not familiar with its more complex narrative. Secondly, the speed of these multimodal constructions also suggests the power and primacy of fictive blends in web users’ cognition. I argue these blends are more than just creative and popular metaphors for real world events, but rather crucial tools of cognition.

Colín Rodea, Marisela; Mihaeliescu, Mihaela. Universidad Nacional Autónoma de México. Title: Modes and submodes in Mihai Eminescu’s literary work. Abstract: This research is an observational study that asks how a literary text changes its modality from writing to orality, passing from poetry to song and to film until it reaches electronic media as image and as video. In essence, all communication is multimodal, but Zuberbühler (2002:32) says that the term multimodal communication is used where 'multi' does not just imply 'sound and vision', but the fact that several different forms of communication (in the sense of submodes) can be implemented within both the auditory and visual channels (Zuberbühler 2002:3). Our corpus is the literary work of Mihai Eminescu, a Romanian writer of the romantic period in 1850. Our study analyzes the multimodality acquired by this literary work in the communication and identity of the Romanian people. We explain how the literary work circulates across writing, orality and image, and across the submodes audio-orality and orality-gestures-images. We show how the text is inserted in different textual genres: poetry, song, dialogue, image and video. We explain how other means work (Kress, 2009), how the use of electronic media moves the work toward hyperspatiality (Lussault, 2013), and which changes in reception occur. The theoretical framework draws on Kress’s (2009) social semiotic theory, which is based on cultural and social context and meaning. This author understands multimodality as a communication norm centered on the different means of delivering content.
The assumptions of this proposal are: 1) that it is culture that provides the resources to reproduce meanings; 2) that cultural diversity implies differences in representation and meaning; 3) that language is not the only means of conveying meaning; and 4) that the objects of study can be varied and dynamic: the book, the film, the web page, the artwork, or any piece that can trigger processes of meaning creation. According to Kress (2009, p. 265), recent years have seen an increasing use of virtual spaces, a combination of media and the emergence of new communication patterns. A means is defined as a set of socially constructed material resources, namely speech (S), writing (W), gesture (G), dance (D), images (I) and movements (M), that materialize the message to be conveyed. The analytical tools were based on the taxonomy of communication proposed by Zuberbühler (2003:31-33). The data were generated from an experimental exercise of audiovisual reception with groups of native informants. We observed perception based on meanings and emotions. We showed the cultural side of multimodal communication.

De Beugher, Stijn. KU Leuven EAVISE. Title: A semi-automatic annotation tool for unobtrusive gesture analysis. Abstract: One of the key challenges that researchers of multimodal communication face is that the empirical analysis of speech in relation to gestural behavior, gaze and other modalities requires high-quality video data and detailed annotation of the different semiotic resources under scrutiny. In the majority of cases, the annotation of hand position, hand motion, gesture type, etc. is done manually, which is a time-consuming enterprise requiring multiple annotators. In this paper we present a semi-automatic alternative, in which the focus lies on minimizing the manual workload while guaranteeing highly accurate gesture annotations. More specifically, we zoom in on three features of our system: (i) segmentation-based hand detection, (ii) positioning of hands in gesture space, and (iii) analysis of the directionality of gestures. Our gesture analysis builds on a semi-automatic, segmentation-based hand detection approach as proposed by [3]. Once the positions of the hands are obtained, our framework automatically segments a recording into gesture and non-gesture segments based on the position of the hands. We validated our gesture segmentation on a recording of the NeuroPeirce corpus [1] and of the SaGA corpus [4], since these data sets include manual annotations of gesture segments. In total, the two recordings have a duration of 13.5 minutes and consist of 23500 video frames. Comparing our automatic gesture segmentation against manual segmentation resulted in an average F1 score of 88.61%, which demonstrates the usefulness of our approach. Furthermore, the manual effort is reduced to a minimum: in only 2.6% of the frames, the system required manual validation or correction. In a second step, the result of the gesture segmentation described above is used as a basis for calculating the position of the hands in gesture space.
Manually annotating the gesture space is extremely labor-intensive, since ideally, one has to assign a specific spatial position to each individual frame of a gesture sequence. For that reason, most manual annotations of spatial information only provide one value for an entire gesture phase or unit. To overcome this problem, our approach automatically analyzes the position in gesture space for each gesture segment according to McNeill’s gesture space [5] and automatically defines the appropriate sector and sub-sector for each hand in each frame of the segment. A third analytical layer concerns the directionality of gestures. Several gesture annotation systems (e.g. [2]) and empirical accounts include the direction and movement of hand gestures, resulting in a specific trajectory. Comparable to the positioning in gesture space (cf. ii), we noticed that manual annotation is often restricted to a partial analysis. For example, the directionality of an entire leftward pointing gesture is often annotated as “left”, since this is the major direction of movement. To further support and refine the annotation, we propose an automatic alternative. Here, we calculate the direction of movement for each frame by comparing the hand positions of the current frame and the positions in the previous frame. This generates a reliable and fine-grained movement analysis that can be used for further statistical and time-sensitive analysis.
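The per-frame direction step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `frame_directions`, the pixel threshold `min_move` and the four-way label set are assumptions made for illustration, and the hand positions are taken to be image coordinates (x grows rightward, y grows downward), so "left" and "right" are from the camera's viewpoint.

```python
import math

def frame_directions(positions, min_move=2.0):
    """Label the movement between each pair of consecutive frames.

    positions: list of (x, y) hand centres, one per frame, in image coordinates.
    min_move:  hypothetical threshold (pixels) below which the hand counts as still.
    """
    labels = []
    # Compare every frame with the frame before it.
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        dx, dy = x1 - x0, y1 - y0
        if math.hypot(dx, dy) < min_move:
            labels.append("hold")
        elif abs(dx) >= abs(dy):
            # Horizontal displacement dominates.
            labels.append("right" if dx > 0 else "left")
        else:
            # Vertical displacement dominates (y grows downward in images).
            labels.append("down" if dy > 0 else "up")
    return labels

# A short made-up track: two leftward steps, a hold, then an upward step.
track = [(100, 50), (90, 50), (80, 52), (80, 52), (79, 40)]
print(frame_directions(track))
```

A track of n frames yields n - 1 labels, so a gesture annotated manually as "left" overall can still show holds or vertical components frame by frame.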

De Smedt, Tom; De Pauw, Guy; Daelemans, Walter. Textgain; University of Antwerp. Title: Real-time text mining applications. Abstract: Recent advances in natural language processing have made it possible to evaluate large volumes of text in real-time, from syntax (e.g., part-of-speech tagging) and semantics (concept extraction) to state of mind (sentiment analysis) and demography (age & gender prediction). We present our ongoing work in developing such tools, by bootstrapping publicly available and multilingual resources (e.g., social media, news, blogs, wikis), and applying them to rising problems in online communication. One example is the proliferation of hate speech, for example in the form of cyberbullying and jihadist propaganda. To address this, we are using a combination of text analysis and image analysis, the latter of which has become practically feasible with today’s neural networks and deep learning techniques. Combining both approaches is useful, since the meaning of an image of a burning Eiffel Tower is different when accompanied by the blurb “Tonight on the sci-fi channel!” than when accompanied by an unusual religious quote. Finally, we will discuss problems and solutions in making such systems publicly available and/or industry-ready, in terms of privacy, viability and scalability.

Engh, Line Cecilie. The Norwegian Institute, Rome; University of Oslo. Title: Transforming Group Identities through Liturgy. Abstract: Liturgies are multisensory reconstructions of narratives. Enacted in highly structured architectural spaces, supported by visual representations, by sounds, smell, touch, and movement, medieval liturgies created and sustained a repertoire of narratives, reframing identities and social meanings. These multimodal forms of social communication tap into powerful structuring processes in human cognition, redefining the boundaries between self and other and reorganizing the hierarchy of values. Participatory multimodal enactments of narrative dramas, defining new goals, obstacles, resources, and strategies, communicate a collective vision that not only informs, but transforms. Liturgies involve individual and collective identity formation within texts, material culture, and performance. The full resources of memory, imagination, advanced social cognition, and decision-making are evoked by the extraordinary multimodal forms and actions of the liturgy. Liturgies are accordingly a laboratory for investigating the cognitive science of multimodal communication. This talk will discuss the principles and technologies of medieval liturgies, that is, forms of expression in various modalities (visual, auditory, gestural, olfactory, kinetic, etc.) that prompt for the construction of meaning (Palazzo 2014, Jørgensen, Laugerud & Skinnebach 2015). In particular, the talk will concentrate on constructions used to prompt for transformations in the group. Emphasis will be placed on the Cistercian liturgy of the twelfth century and the Cistercian abbot Bernard of Clairvaux’s Sermones per annum (Sermons for the Liturgical Year). The Cistercians enriched the classical Roman liturgy by infusing it with drama, poetry, and symbolism (Chupungco 1997, Waddell 1994).
But how did Bernard of Clairvaux stage (by imaginative, orchestrated, multimodal performances of words, sounds, and visual cues) the transformation of himself and his monks into semi-angelic and pseudo-biblical beings who breached the boundaries between here and now, there and then, heaven and earth? It has been noted that medieval people imagined and projected themselves into sacred narratives such as biblical texts and hagiography (Leclercq 1982, Boynton & Reilly 2011, Engh 2014). I build on this central insight to show how Cistercian liturgy and architecture, in conjunction with Bernard’s sermons, functioned as vectors for (re)positioning selves and communities through multimodal constructions of viewpoints and perspectives. The fundamental assumption of the liturgy is to (re)activate Christ’s presence. Activation is effected by the reading or chanting of text, but also by architectural space and visual aids such as iconographical schemes in frescos and mosaics, incense, garments, illuminated liturgical books and other liturgical artifacts. To create new identities of the celebrants, however, requires cognitive processes of compression and blended viewpoints. Forms of drama like the Easter Visitatio sepulchri made use of liturgical texts staging the liturgical ministers as dramatis personae. Within the complex process of multimodal, performative sensory experience, Cistercian liturgy cast the performers in a series of narratives and roles that worked to profoundly shift their perspectives. Blended viewpoints and deictic displacements form part of a repertoire of multimodal constructions that Bernard employed systematically. Viewpoint can be expressed linguistically and gesturally (Sweetser 2012, Steen & Turner 2013), and serves to create a spatial framework orienting interlocutors. Bernard and the Cistercians blended viewpoints by linguistic markers such as pronouns (ego, tu, vos, nos, etc.: in Latin these would be marked by verb inflection unless emphasized).
For instance, when the ‘we’ of a Psalm verse was sung by the monks, their viewpoint was blended with that of the biblical figures. Such deictic displacements of liturgical performers mapped identities and meaning, triggering new concepts of society and its boundaries, communal experience, empathy, and engagement of the self.

Ferre, Gaelle. University of Nantes. Title: Weather Reports as Multimodal Constructions. Abstract: As already pointed out by Steen & Turner (2013), weather reports are good examples of multimodal constructions as they afford intertwined oral and visual information in the weather predictions they propose. Expanding on this, we would like to add that weather reports are extremely condensed communication situations, mainly composed of descriptive moves and containing very few non-descriptive moves (e.g. greetings or linking the weather forecast with the previous or following television program). Speech in weather reports is generally described as very rapid (Moore Mauroux, 2016), and is also often elliptical, with elision of some articles but mainly of verbs and the presentative construction.
When verbs are clearly expressed, they are mostly in the present tense or the infinitive, thus neutralizing the expression of time in speech, one of the major semantic domains of weather reports. Semantically speaking then, the three domains present in descriptive moves are Time, Space and Weather condition and/or Temperature. The three domains may be present both in speech and the graphic background screen but one domain may be present in only one semiotic mode. Besides, each domain can be reinforced by a focal accent (Katz and Selkirk, 2011) or a gesture. For instance, as shown in Figure 1 below, the forecaster says: “and by the end of the night also some thundery downpours approaching the far south”, which forms a semantically complete discourse move. Time is expressed in speech but is also present in print on the map. Speech mentions weather conditions (thundery downpours), whereas the map shows temperatures (brighter orange revealing higher temperatures). Finally, spatial information is present both in speech (far south) and the map itself (showing the UK and northern France), but is also highlighted by the pointing gesture towards the south of England and a focal accent on “south”. There are of course many possible combinations for presenting the four domains across the various semiotic modes, but our initial prediction was that Space and Weather would be preferentially represented visually (on a map and/or in gesture) due to their greater iconic potential (although Ferré and Brisson, 2015, showed that pointing at a map may generate viewpoint mismatches in the expression of space), whereas Time and Temperature would rather be represented orally (in speech, potentially reinforced by a focal accent).
Our second prediction was that the frequent loss of verbs in this genre would have a stronger impact on a language like French, as opposed to English, since the former language has a richer conjugation paradigm than English and the loss is therefore more important in terms of the expression of time. This greater loss would have to be compensated in some respect. We tested these two hypotheses on a preliminary corpus of 5 French and 5 English weather forecasts, completely transcribed and analyzed for focal accents in Praat (Boersma and Weenink, 2009) as well as annotated for visual components in Elan (Sloetjes and Wittenburg, 2008). Our first prediction is only partially confirmed: oral features are used in a proportion of 45% for each of the semantic domains and visual cues represent a proportion of 55%. Yet, whereas Space and Weather are more often reinforced by gesture (a visual cue), Time and Temperature are more often reinforced by a focal accent (an oral cue). Our second prediction is confirmed: in French, gestures are used to express or reinforce Time expressions twice as much as in English. No other semiotic mode shows any difference for the expression of Time in the two languages. Since the French forecasters do not gesture more than the English and do not compensate in any other semiotic mode, this higher reinforcement of Time expressions by gestures could well be a compensatory device for the loss of tense marking in weather forecasts.

Fonseca, Paula; Pascual, Esther. Polytechnic Institute of Viseu, Centre for the Study of Education, Technologies and Health (CSETH); Zhejiang University. Title: Reality, Fiction and Fictivity: Multimodal interaction blends in The Daily Show with Jon Stewart. Abstract: Political entertainment television programs like the award-winning Daily Show with Jon Stewart discuss politics in humorous, creative ways that viewers are receptive to due to their ability to speak to them through comprehensible and enjoyable means (Jones 2010). In our paper, we discuss a multimodal communicative strategy based on the face-to-face conversation frame, fictive interaction (Pascual 2002; 2008; 2014), which was successfully used by Jon Stewart in his monologues. In particular, Stewart often introduced actual previously produced utterances (e.g. through video clips and other news media) as well as entirely fictitious ones (e.g. made-up utterances ascribed to a real dead individual inhabiting a cartoon world), creatively integrating them into his discourse. He did so through multiple modes of communication, as a means to not only entertain his audience and inform them on the latest news, but also to present his own views and opinions on those news items (Fonseca 2016). We argue that this strategy involves the conceptual integration of factual past or fictitious counterfactual mental spaces with the here-and-now space of the ongoing show, in which the actual or fictitious utterances at issue constitute fictive interactions with the audience (turned fictive overhearers), who ultimately are to be convinced of the host’s views. Specifically, we focus on Stewart’s discussion of the first presidential debate between Mitt Romney and Barack Obama in the 2012 US presidential election. Stewart’s conceptual framing of the debate in his discourse to viewers is imaginative and complex.
He strategically combines different communicative modes, like the verbal and visual mode, as well as prosodic features (e.g. exaggerating what he says using pitch, tone and rate of speech) and gestures, to convey his opinions and emotions. These in turn affect the overall meaning of complex conceptual blending networks involving fictive interaction. He uses the conversation frame together with other cognitive mechanisms, like metaphor and metonymy, to effectively perform his non-actual conversations with a number of fictive addressees. By creating fictitious conversations with the candidates both in the here-and-now space of the show and in the past reality space of the debate and in other actual and imaginary spaces, he is able to take his viewers back to the presidential debate and analyze what happened using their ‘mind’s eye’ to construe moments in the debate that led Barack Obama to lose it. The host turns actual discourse characters into fictive conversational participants by: 1) fictively talking to them in an actual past-present blend; 2) talking for them in an alternative imaginary counterfactual space; and 3) fictively talking to them in the here-and-now. We maintain that the indisputable success of this “fake news” satirical program during Jon Stewart’s reign is largely due to these complex multimodal fictive interaction blends he created to analyze and criticize politicians’ discourse. In doing so, Stewart was holding the individuals he discussed accountable for what they said in order to better inform and raise his viewers’ awareness of political issues.

Giangreco, Ivan; Rossetto, Luca; Tanase, Claudiu; Schuldt, Heiko. Databases and Information Systems Research Group, Department of Mathematics and Computer Science, University of Basel. Title: vitrivr: A Multimedia Search System supporting Multimodal Interactions. Abstract: vitrivr is an open source, full-stack content-based multimedia retrieval system that primarily focuses on video content. At the interface, vitrivr offers multimodal interactions by providing a large variety of different query paradigms, such as keyword queries, search for semantic concepts, query-by-example, query-by-sketch, motion queries, and any combination thereof. Despite its focus on video content, its modular architecture makes it easy to process different types of media as well. Keyword search is based on manual annotations and on the results of OCR applied to the content of the collection. Semantic queries rely on features obtained from Deep Neural Networks in three areas: semantic class labels for entry-level concepts, hidden layer activation vectors for query-by-example and 2D semantic similarity results display. Furthermore, the system considers various low-level features used for query-by-example, query-by-sketch and motion queries. vitrivr’s grid-based result navigation interface supports browsing. At the database back-end, the distributed polystore ADAMpro supports large-scale multimedia applications and ensures the scalability to steadily growing collections. Search is provided by Cineast, the retrieval engine of vitrivr. The user interface includes options for keyword specification, sketching, the specification of sample objects, the selection of semantic class labels, and the specification of flow fields for motion. All these query modes can be applied either simultaneously (supporting multi-object, multi-feature queries) or sequentially, for query refinement. In addition, users can even naturally interact with the vitrivr system by using spoken commands.
Multimodal commands enable the combination of spoken instructions with manual pointing and sketching. This also fosters the collaborative interaction with the vitrivr system. The IMOTION project, which is based on the vitrivr system, won the 2017 Video Browser Showdown (VBS 2017), an international video retrieval competition that consists of known-item search tasks based on visual descriptions, textual descriptions, and ad-hoc search tasks, all on the basis of the TRECVID 2016 Ad-hoc Video Search (AVS) dataset of approx. 600 hours (144 GB) of video content.
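To make the idea of applying several query modes simultaneously more concrete, the following Python sketch merges per-mode result lists by a weighted average (a generic late-fusion scheme). It is an illustration only and does not reproduce how Cineast or ADAMpro score results: the function `fuse_scores`, the mode names, the segment identifiers and the assumption that each mode returns similarity scores in [0, 1] are all hypothetical.

```python
def fuse_scores(per_mode_scores, weights=None):
    """Combine per-mode similarity scores into one ranked list.

    per_mode_scores: dict mapping a query mode (e.g. "sketch") to a dict of
                     {segment_id: score}, with scores assumed to lie in [0, 1].
    weights:         optional dict of per-mode weights; equal weights by default.
    """
    if weights is None:
        weights = {mode: 1.0 for mode in per_mode_scores}
    total_weight = sum(weights[mode] for mode in per_mode_scores)

    fused = {}
    for mode, scores in per_mode_scores.items():
        for segment_id, score in scores.items():
            # A segment missing from a mode simply contributes nothing for that mode.
            fused[segment_id] = fused.get(segment_id, 0.0) + weights[mode] * score / total_weight

    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Made-up results for a sketch query combined with a keyword query.
results = {
    "sketch": {"a": 0.9, "b": 0.4},
    "keyword": {"b": 0.8, "c": 0.6},
}
ranked = fuse_scores(results)
print([segment_id for segment_id, _ in ranked])
```

Segments that match in several modes rise in the ranking, which is one simple way a multi-feature query could favour results consistent with every part of the user's input.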

Gondorf, Carsten; Suchan, Jakob; Bhatt, Mehul. Center for Cognitive Science, University of Kaiserslautern; Human-Centred Cognitive Assistance Lab. and The Designspace Group, University of Bremen. Title: Multi-Modal Studies of Visuo-Spatial Problem Solving Strategies in Children with Dyslexia: General methods, and Preliminary Results from a Large-Scale Experiment. Abstract: This presentation pursues a twofold objective: 1. reporting preliminary results from a large-scale experiment involving the multi-modal analysis of visuo-spatial problem solving strategies in children with developmental dyslexia; and 2. against the backdrop of the experiments in (1), demonstrating the underlying integrated analytical and empirical evidence-based methods for understanding the complex multi-modal interactions between task characteristics and the situated visuo-spatial problem solving activity in a wide range of human behaviour studies. The multi-modality that we allude to in the context of this presentation encompasses: (1) analysis of embodied gestural behaviours of the children; (2) visual perceptual analysis based on eye-tracking data gathered during the problem solving process; and (3) fine-grained human expert annotated event segments (i.e., moments of interest, deep analysis based on dialogic components) in the experimental cycle. Our presentation will demonstrate that: (1) high-level analysis of spatio-temporally grounded behavioural data (in this case pertaining to the problem solving strategies of children) can help in determining visuo-spatial reasoning processes and applied problem solving strategies in the experimental domain of developmental psychology (focussing particularly on the case of children with developmental dyslexia); and (2) our approach is useful for identifying individual approaches to solving a given problem, be it in real-world or lab settings, and is a useful step toward the semantic interpretation of human-behaviour data in a much broader range of experimental settings.

Gonzalez-Marquez, Monica; Bergmann, Jerome. RWTH-Aachen University. Title: MsquareC: A Python tool for coding multi-method data. Abstract: We introduce a tool, MsquareC (Multi-Method Coding), to code qualitative data produced using experimental methods. The aim of MsquareC is to significantly increase the objectivity of the subjective evaluation of qualitative data. Specifically, our goal is to reduce bias, while coding, that may result from access to information about sex, gender, age, educational level, previous response, experimental condition, etc.; in short, from any information inadvertently available to a coder that is not the information she is coding. MsquareC is intended to fill the gap between traditional qualitative coding software such as QDA Miner Lite and Coding Analysis Toolkit, and experiment generators such as Psyscript and E-Prime, by combining the focus on qualitative data of the former with the experimental controls available in the latter. It has also been designed to be relatively easy to use, even for researchers unfamiliar with programming. MsquareC has been written using Python, and will be freely available on GitHub. Following are two situations where MsquareC may be useful. A) A researcher has a set of open-ended written participant responses. MsquareC will make it possible to evaluate the responses individually without bias. B) A researcher is evaluating a gesture production study. She is coding for emotional valence, and needs to ascertain the emotional response of naïve viewers of the gestures. Our tool offers the following features: 1. Subject and experimental condition information is hidden during coding, such that only the responses to be coded are visible to a coder (this prevents any bias associated with any of the independent variables, i.e. sex, experimental condition, etc.). 2. Each set of responses to a given question is available as a set, so that coders can code one question at a time (i.e. all 250 responses to the same question, instead of going down a list of each participant's responses to successive questions). The sets of data to be coded are also randomizable. This prevents coder expectation bias, e.g. when the first response by a given participant was of low quality, which might bias the evaluation of that participant's response to the next question. This feature also helps prevent any order effects. 3. The current coder is blind to the evaluations of other coders. 4. It is possible to stop coding midway through a list, and resume later from that point. 5. Data can be easily exported for statistical analysis. 6. Coding can take the form of binary categories, scales and comments. 7. Text, audio, and video data can be taken as input. 8. Tools available in other typical qualitative data analysis software, e.g. word counts and term classification, will be added as needed, although the central concern of MsquareC is automating blind coding. During our presentation, we will describe the need for a tool such as MsquareC when conducting multi-method research. We will also introduce the use of MsquareC using actual data, including a detailed description of the features described above.
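Features 1 and 2 (hiding subject and condition information, and presenting responses one question at a time in random order) can be sketched in a few lines of Python. This is not MsquareC's code or interface: the function `blind_batches`, the record layout and the field names `id`, `participant`, `sex`, `condition`, `question` and `text` are assumptions made purely for illustration.

```python
import random

def blind_batches(responses, hidden_fields=("participant", "sex", "condition"), seed=None):
    """Group responses by question, shuffle each group, and strip hidden fields.

    responses:     list of dicts, each with at least "id", "question" and "text".
    hidden_fields: keys the coder must not see (hypothetical field names).
    seed:          optional seed so a shuffled order can be reproduced.
    """
    rng = random.Random(seed)

    # Collect all responses to the same question into one batch.
    by_question = {}
    for row in responses:
        by_question.setdefault(row["question"], []).append(row)

    batches = {}
    for question, rows in by_question.items():
        shuffled = list(rows)      # copy, so the input order is left untouched
        rng.shuffle(shuffled)      # randomize presentation order within the batch
        batches[question] = [
            {key: value for key, value in row.items() if key not in hidden_fields}
            for row in shuffled
        ]
    return batches

# Made-up data: two participants, two questions each.
data = [
    {"id": 1, "participant": "P01", "sex": "f", "condition": "A", "question": "q1", "text": "first answer"},
    {"id": 2, "participant": "P01", "sex": "f", "condition": "A", "question": "q2", "text": "second answer"},
    {"id": 3, "participant": "P02", "sex": "m", "condition": "B", "question": "q1", "text": "third answer"},
    {"id": 4, "participant": "P02", "sex": "m", "condition": "B", "question": "q2", "text": "fourth answer"},
]
batches = blind_batches(data, seed=0)
for question, items in batches.items():
    print(question, [item["id"] for item in items])
```

The opaque `id` is kept so that codes can be joined back to the full records after coding, at which point participant and condition information is needed again for the statistical analysis.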

Gordejuela, Adriana. Institute for Culture and Society, University of Navarra (UNav). Title: Film flashbacks and the conceptualization of time. Abstract: In the study of how time is depicted in multimodal discourse, it is of great interest to analyze how cinema represents the passing of time and the temporal leaps in a narrative. Among all the resources that movies usually employ in this sense, the flashback or temporal retrospection is the most attractive one from a multimodal point of view: it uses different visual resources (various types of framing, camera movements, expression and motion of the actors, transitions and visual effects, and many more), as well as acoustic ones (music, dialogue, various sounds and sound effects, and so forth) which are combined to represent a temporal leap from the present to the past. The question that follows then is: how is time conceptualized and rendered in film flashbacks, and how do viewers make sense of them? Is there a time-space mapping in those representations, in the same way as in many verbal and gestural depictions of time? To provide an answer, we take a look at the multimodal cues that a movie offers in a retrospective scene and analyze the cognitive processes that those cues activate in the viewers, and which make the comprehension of the flashback possible. Among those formal cues, the “eyeline match” structure reveals itself as a fundamental one in many film retrospections. This classic film technique, which contributes to the “continuity system”, consists of the combination of at least a shot of a character looking in a certain direction off-screen and a shot of an object (or another character) towards which the first person looks. Ultimately, what lies at the basis of that continuity device is a joint-attention scheme, upon which the flashback is built. Also, the “eyeline match” provides the foundation for a time-space mapping that represents the past as being in front of the character.
Through the analysis of flashback examples from different films —Big Fish (2003) and The Help (2011), first, and also Rebecca (1940) and Begin Again (2013) for creative variations of the same technique— it will be shown how certain cognitive processes are activated by the multimodal cues presented in each scene, thus making the viewer understand the retrospection without difficulty. Some of the cognitive issues that will be discussed are: blended joint attention, time compression, viewpoint integration, or identity and analogy connections.

Grogan, Kimberly L. Case Western Reserve University. Title: Multimodality and Conceptual Blending: Political Graffiti Murals as a Form of Fictive Communication. Abstract: Art is a form of communication (Turner, 2014) and can metaphorically “speak” to viewers (Pascual, 2014; Sullivan, 2016). Political graffiti murals share this communicative trope, yet are a visual discourse occupying a unique communicative space that further highlights and makes use of the artist-as-speaker blend (Sullivan, 2009), while their illicit creation rhetorically situates the message as a subversion of authority. The graffiti artist’s desire to create joint attentional scenes is predicated upon the belief that an audience will be able to see and construct meaning from their work; classic joint attention (Tomasello, 1999) is co-opted to transcend the present and intuit future interactions and affordances. This in turn gives rise to blended joint attention (Cánovas & Turner, 2015) and fictive communication (Pascual, 2008; 2014). Public locations are chosen with communicative intent, while vivid representations that reference and subvert culturally salient events and mental states ensure attentional processing. Frames (Fillmore, 1980), conceptual blending (Fauconnier & Turner, 2002), and compression (Fauconnier & Turner, 2000) allow for the viewer to extract semantic content from the visual components. To evaluate the ways in which political graffiti murals serve their distinctive communicative function and deploy their anti-authoritarian message, works by the internationally renowned graffiti artist known by the pseudonym Banksy will be assessed.

Gullapuram, Shruti; Shukla, Abhinav. IIIT Hyderabad, Red Hen Lab, CCExtractor. Title: Visual feature extraction, shot classification, and affect recognition in broadcast news videos. Abstract: Red Hen processes massive amounts of news data every day. This work presents a pipeline which analyzes news videos at the shot level and enriches the NewsScape corpus with visual information. An integral part of the pipeline is the news shot classifier, a convolutional neural network based on the AlexNet [1] architecture and trained on 40 hours of annotated video data. As far as we know, news shot classification in the existing literature has mainly been limited to the binary distinction between anchor shots and “news stories”. With an accuracy of 86%, comparable to the state of the art, our classifier assigns each shot to one of five classes: Newsperson (which includes shots containing studio/correspondent/reporter), Background Roll, Weather section, Sports section, and Graphics (including branding graphics, text, or data visualization). Such a classification may help us study how programs/shows across different news networks are temporally structured. In addition, low-level semantic information such as the scene type or the presence of particular objects in a shot, associated with the shot class, may enhance visual content-based search and assist in communication studies. Further, the shot characterization pipeline may be used as a preprocessing tool for a gesture detection framework, by narrowing down potential scenes of interesting gestures to classes such as ‘Newsperson’ shots, which are predominantly human-centric. Another aspect that we aim to study is affect recognition in news videos. Red Hen already has NLP-based sentiment analysis for the caption text modality. We propose a deep-learning-based computational model which uses audio-visual features to predict the valence (level of pleasantness) and arousal (level of intensity) elicited in news videos.
From these predictions, we then try to derive emotional keywords based on [2], to help us estimate the emotion present in the visual/audio content of news.

Hampe, Beate (Mittelberg, Irene; Turner, Mark; Uhrig, Peter). University of Erfurt (RWTH Aachen; Case Western Reserve University; University of Erlangen). Title: Multimodal constructions? There-constructions ‘in the wild’. Abstract: Communication is multimodal by nature, and spontaneous, face-to-face interaction relies on several semiotic channels simultaneously – both verbal and nonverbal ones. Crucially, these offer radically different means for coding/expressing meaning, diverging in the extent to which they can draw on iconic motivation (e.g. Mittelberg 2014). The recent surging interest in multimodal communication in Cognitive Linguistics has put research on speech-accompanying gestures center stage, even hypothesizing about the multimodal nature of syntactic constructions (e.g. Turner & Steen 2013). While it cannot be doubted in the least that spontaneous gestures present a vital aspect of ongoing, interactive meaning-making, and certainly also open a new window to the conceptualizations motivating an individual speaker’s speech at any given point in time – even to the point of “exbodying” vital aspects of individual speakers’ embodied simulations of this content (e.g. Mittelberg 2014; Hostetter & Alibali 2008), it is far less clear whether (at least some of) the speech-accompanying gestures must also be seen as an integral part of the syntactic constructions structuring the verbal (and also the prosodic) planes of expressions in spontaneous discourse. On the one hand, understanding syntactic structures as “grammatical symbols” (i.e. in terms of a widened notion of de Saussure’s linguistic sign) is compatible with such an idea. The fact that leaving out or repressing speech-accompanying gestures may make language production/comprehension more effortful, i.e. hinder or block the unfolding of McNeill’s “growth point” in processes of what Slobin called “thinking for speaking” (e.g. McNeill & Duncan 2000), could also be seen as supporting this idea. On the other hand, such a repression does not make any utterance structurally unacceptable. In this, gestures clearly contrast with inadequate verbal structures, and also (though maybe to a lesser extent) with prosodic structures. Investigating the multi-modality of syntactic constructions thus incurs the necessity to carefully weigh the contributions and functions of the verbal, the prosodic and the gestural plane in real-time communication. The question of whether syntactic constructions are multi-modal clearly also involves other long-standing issues in construction-based syntax research to do with (i) mode-specific aspects/properties of syntax more generally, especially those in the spoken mode that are caused by time and interaction pressures (e.g. Biber et al. 1999), as well as (ii) the role of mid-level schemas in construction-based syntax in general. The latter point is important because it connects back to the issue of the precise role of syntactic constructions that are relatively specific, perhaps even tied to particular “semantic frames” (rather than very generic scenarios generalizing over any number of specific frames). To follow up some of these issues, this contribution makes use of the data provided by the RedHen archives, more specifically the US KABC talk show “The View” (2016) as a part of the “NewsScape” archive accessible via the CQPweb Erlangen. It presents analyses of the linguistic and gestural properties of large random samples of expressions instantiating there-constructions with the copula BE, including deictic, existential and discourse-deictic uses. The implications of the empirical findings for the theoretical issues laid out above are discussed in depth.

He, Xu. Red Hen Lab, Jacobs University Bremen. Title: Audio Processing Pipeline for NewsScape Datasets. Abstract: Large amounts of audio signals are generated every day in television, video sharing websites and other digital media platforms. Recent growth in hardware computing power and modern developments in signal processing and machine learning technology have made it possible to automatically parse these raw audio data and extract useful information for further research in Linguistics, Communication Studies and Cognitive Science. In this talk, we present an audio processing pipeline implemented on a High-Performance Computing Cluster and applied to process archived videos from the NewsScape dataset. This pipeline consists of modules for media format conversion, acoustic fingerprinting, forced alignment, diarization, speaker recognition and gender identification. Each of these modules will be introduced in terms of its function, working principles and sample results; further extensions of this pipeline will also be discussed.

Hedblom, Maria M.; Kutz, Oliver. Free University of Bozen-Bolzano. Title: Multi-modal Image Schemas. Abstract: The theory of image schemas was introduced as a missing link between embodied experience and mental representation. The theory proposes a relatively small number of conceptual building blocks based on spatio-temporal relationships called “image schemas” upon which reasoning and different forms of communication can be built. While image schemas are often described as spatio-temporal relationships, the temporal dimension is frequently omitted. Identifying and formally discussing image schemas in their static sense is complicated enough, but it is conceptually impossible to discuss the phenomena of image schemas while ignoring the dynamics of temporal change. For instance, the image schema CONTAINMENT is proposed to be learned from the movement of objects in and out of containers rather than the inside-border-outside relationship presented in cognitive linguistics research. It is a prerequisite that an infant understands in and out movement before it can understand concepts such as enclosure and containment. Image schemas have found increased interest in research on artificial intelligence as they offer a cognitively inspired bridge to computational concept comprehension and concept invention. One assumption is that the integration of image schemas will enable artificial intelligence and language comprehension tools to support a better ‘understanding’ of abstract language, conceptual metaphors, or analogies. However, formal renderings of image schemas have so far been largely restricted to describing them as purely static relationships. In order to have a more accurate formal description, the temporal dimension needs more attention. This abstract is intended to highlight the importance of time and change for image schemas, as these constitute some of the most important aspects of these conceptual building blocks.
The theory of image schemas is therefore naturally and closely linked to the fields of multi-modal and qualitative modelling, which we intend to explore further in our work, in particular with attention to the cognitive adequacy of the chosen formalisms. Formalising image schemas qualitatively may then employ e.g. temporal logics, trajectory calculus, and a variety of spatial calculi. Here, the appropriate combination of these formal methods is of the essence for capturing the full multi-modality of the respective image schema.
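To illustrate why the temporal dimension matters, the sketch below treats CONTAINMENT not as a static inside/outside relation but as a sequence of qualitative states over time, from which movement events such as going in or out can be read off. This is only an illustration under simplifying assumptions, not the authors' formalism: the three-valued state set, the function `containment_events`, and the event names are all hypothetical.

```python
from enum import Enum


class Relation(Enum):
    """Qualitative position of an object relative to a container (hypothetical state set)."""
    OUTSIDE = "outside"
    BORDER = "border"
    INSIDE = "inside"


def containment_events(trajectory):
    """Derive dynamic events from a time-ordered list of Relation states.

    A static description sees only single states; the events below exist
    only as changes between states over time.
    """
    events = []
    last_settled = None  # last state that was not BORDER
    for state in trajectory:
        if state is Relation.BORDER:
            continue  # crossing in progress; wait for where the object settles
        if last_settled is Relation.OUTSIDE and state is Relation.INSIDE:
            events.append("going_in")
        elif last_settled is Relation.INSIDE and state is Relation.OUTSIDE:
            events.append("going_out")
        last_settled = state
    return events


# Hypothetical observation: an object enters a container and leaves again.
outside, border, inside = Relation.OUTSIDE, Relation.BORDER, Relation.INSIDE
observed = [outside, outside, border, inside, inside, border, outside]
print(containment_events(observed))
```

No single state in `observed` contains the information "the object went in"; it only appears once consecutive states are compared, which is the point the abstract makes about static formalisations.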

Hirrel, Laura. University of New Mexico. Title: Multimodal Constructions Incorporating Cyclic Gestures and Finger-spread Handshapes. Abstract: The cyclic gesture is a recurrent co-speech gesture form that is characterized by a circular movement of the hand or arm. Cyclics have been described as functioning to represent “ongoing” events (Ladewig, 2011) and to metaphorically depict processes or transitions (McNeill, 1992). Cyclics have also been found to occur with specific motion constructions in English in which the verbal element encodes a circular path or manner of movement (Zima, 2014). The current research expands understanding of cyclic gesture functions by identifying new functional types not previously described in the literature. It further aims to demonstrate that single gesture strokes have the potential to be symbolically complex gestural constructions that are used in systematic ways with spoken language constructions. This research is part of a larger study that takes a multimodal construction-based approach to identifying patterns in 1000 tokens of cyclic gesture use in American English talk show data. The analysis included identifying the types of spoken language constructions co-expressed with cyclic gestures and analyzing specific functional properties expressed within those types. Using cluster analysis and chi-squared tests, patterns were identified in the mapping between formal properties of the gestural construction and functional properties of the spoken language construction. I analyze cyclic gestures as component symbolic structures that participate in different gestural and multimodal constructions. Research findings suggest that across constructions cyclic gestures schematically profile relationships. The specific type of relationship a cyclic designates is instantiated by the composite multimodal construction with which it is used. 
As relationships, cyclics are conceptually dependent on more autonomous structures, the “things” constituting the relationships (Langacker, 2008). Cyclic gestures are also phonologically dependent structures. As movements, they are dependent on entities moving (handshapes) and the locations of the movements in space. I argue that these phonologically more autonomous structures that simultaneously occur with cyclic movements are potentially meaningful and symbolic in their own right. This presentation focuses on constructions involving cyclic gestures and finger-spread handshapes. Findings suggest that finger-spread handshapes are symbolic units that are systematically used with cyclic gestures in multimodal constructions that share semantic properties. Multimodal constructions involving cyclic gestures and finger-spread handshapes were found to be used to express meanings of relative magnitude, degree, approximation, indefiniteness and hypotheticality. Recurrent constructions that incorporate cyclics and finger-spread handshapes include quantifiers, degree adjectives, hedges, indefinite pronouns, and conditional clauses (specifically, the protasis clause). I suggest that cyclics used in gestural constructions with finger-spread handshapes prototypically profile non-processual (atemporal) relationships. This contrasts with temporal relational meanings for which cyclics have previously been described as being used. I describe other formal properties that distinguish different types of constructions involving finger-spread and cyclics. For example, quantifier constructions are associated with the performance of two-handed cyclic rotations while hedges are associated with multiple rotations and eye gaze toward the interlocutor. Using examples from the data, I demonstrate how the specific functions of cyclic gestures that profile atemporal relationships are elaborated by the composite multimodal constructions with which they co-occur.

Hoetjes, Marieke. Radboud University Nijmegen. Title: The effect of gesture on fluency. Abstract: There is general agreement in the literature that speech and gesture are closely related. However, many aspects of the precise relationship between speech and gesture are still unclear, and several hypotheses about the details of the speech-gesture relationship exist. Some suggestions are that gestures can aid lexical retrieval (Krauss, Chen, & Gottesman, 2000), and that gestures can help in information packaging (Kita, 2000). What these particular hypotheses have in common is that they imply that producing gestures somehow helps people speak. One aspect of being able to speak is fluency. A question is whether this means that people who gesture are also able to speak more fluently. Some relevant studies have been done, indicating that gestures occur mainly during fluent speech (Graziano & Gullberg, 2013), and that during disfluencies, both speech and gesture stop (e.g. Seyfeddinipur, 2006). However, to the best of our knowledge, there has not yet been a study comparing the overall fluency of people who naturally gesture (a lot) with people who do not gesture. In the present study we analysed an existing dataset to gain more insight into whether gestures may help verbal fluency. The data consisted of descriptions of hard-to-describe objects, which were collected previously in the context of a referential description task. We selected all descriptions without any gestures, which came from 10 speakers and totalled 19 target descriptions, and matched these to the same number of descriptions of the same objects by speakers who produced many gestures. We then compared these two groups (gesturers/non-gesturers) on several aspects of verbal fluency. We found that the speakers who gestured showed a higher speech rate (in words per second), fewer filled pauses, and fewer self-corrections, but these differences were not statistically significant.
However, there were statistically significant differences with regard to the mean utterance length (number of words), the number of verbal hedges (per 100 words), and the overall number of disfluencies (per 100 words), with gesturers producing longer utterances (M=19.2, SD=7.7) than non-gesturers (M=15.2, SD=3.8, F(1,36)=4.112, p<.05), fewer hedges (M=3.01, SD=1.42) than non-gesturers (M=5.32, SD=2.76, F(1,36)=10.60, p<.01), and fewer disfluencies overall (M=12.3, SD=5.) than non-gesturers (M=16.1, SD=5.06, F(1,36)=4.91, p<.05). Although some of the findings of this study were not statistically significant (which may not be surprising given the small dataset), the overall picture seems to be that speakers who naturally produced many gestures produced longer utterances, with fewer disfluencies. This suggests that producing gestures while speaking indeed helps speakers to be more fluent. Future work could study these results in more detail, for example by examining the timing of the gestures in relation to particular disfluencies.
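The reported F statistics can be roughly cross-checked from the summary statistics alone. The sketch below is an illustration and not part of the original analysis: it assumes two equal groups of 19 descriptions each (consistent with the reported degrees of freedom, 1 and 36) and that the reported SDs are sample standard deviations; the function name `f_from_summary` is hypothetical.

```python
def f_from_summary(mean_a, sd_a, mean_b, sd_b, n):
    """One-way ANOVA F for two equal-sized groups, from means and SDs.

    Assumes both groups have n observations and that sd_a and sd_b are
    sample standard deviations (an assumption, not stated in the abstract).
    """
    grand_mean = (mean_a + mean_b) / 2
    # Between-groups mean square (df = 1 for two groups).
    ms_between = n * ((mean_a - grand_mean) ** 2 + (mean_b - grand_mean) ** 2)
    # Within-groups mean square: pooled variance (df = 2n - 2).
    ms_within = (sd_a ** 2 + sd_b ** 2) / 2
    return ms_between / ms_within


# Values as reported in the abstract (gesturers vs. non-gesturers).
f_length = f_from_summary(19.2, 7.7, 15.2, 3.8, n=19)
f_hedges = f_from_summary(3.01, 1.42, 5.32, 2.76, n=19)
print(f"utterance length: F(1,36) = {f_length:.2f}")
print(f"verbal hedges:    F(1,36) = {f_hedges:.2f}")
```

Under these assumptions both values come out close to the reported ones; small deviations are expected because the published means and SDs are rounded.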

Horst, Dorothea. European University Viadrina, Frankfurt (Oder). Title: The embodied and multimodal dynamics of metaphor in campaign commercials. Abstract: Metaphor is considered a fundamental tool for making sense of the world, not only in everyday language but also in the realm of political discourse (e.g., Lakoff 1996; Musolff 2004; Charteris-Black 2005). The paper aims to address a research object hitherto largely underexplored in the study of metaphors in political discourse: the role of metaphor in audio-visual political advertising, more precisely in campaign commercials. In particular, it touches upon the question of how metaphoric meaning making plays out in the dynamics of speech and audio-visual images and thereby enables spectators to literally make sense of candidates. The study draws on a transdisciplinary cognitive-linguistic and film-analytic account of audio-visual metaphor (Kappelhoff and Müller 2011; Müller and Schmitt 2015; Schmitt, Greifenstein, and Kappelhoff 2014). Its core assumption is that in experiencing audio-visual compositions, spectators get affectively involved through the temporal and aesthetic orchestration of their perception process evoking felt experiences. On the basis of such embodied experiences, they are enabled to construct metaphoric meaning. Using the example of two campaign commercials from Germany and Poland, different ways of metaphorizing the two candidates shall be outlined and juxtaposed. Both of them were challengers in the respective parliamentary elections: Frank-Walter Steinmeier (Social Democratic Party of Germany, SPD) in 2009, and Jarosław Kaczyński (Law and Justice, PiS party) in 2011. In order to discern the differences in the emergent image of the candidates that are apparently rooted in the specifics of the metaphoric meaning making process in the two spots, the analysis puts a strong focus on the concrete audio-visual context and on the dynamic interplay of speech and audio-visual staging.
Such a dynamic perspective, which conceives of metaphor as a process instead of a product (Müller 2008a,b, 2011), brings to light different forms of staging metaphoricity: one proceeding explicitly from speech, with audio-visual staging playing a rather depicting role, and one grounded in the modulation of affective experience through the audio-visual composition, with speech playing a coordinated explicating role. Taking into account the media situatedness, the interplay of articulatory modalities, and the processuality of metaphor thus allows for productive insights into the embodied and multimodal dynamics of meaning making.

Hougaard, Anders. Institute of Language and Communication, University of Southern Denmark. Title: Hyperembodiment in Multimodal Instant Picture Messaging. Abstract: The world at our feet is a rapidly developing, advanced, multimodal, technological world. From constant technological change follows a constant change in the ecology that we are thrown into, and from this in turn follows an eternal process of developing an optimal grip on a novel world (Merleau-Ponty 1962 [1945]). Multimodal digital communication technology presents a special case of practices for getting a grip on each other. As media theorist Marshall McLuhan proposed back in the 1960s (e.g. McLuhan 1964), any technology “extends” human beings in some way. However, when we “are” in the media, we are, as McLuhan also observed (McLuhan 1977), disembodied. These two factors together, extension and disembodiment, create a fascinating “fourth dimension” (Laurence Scott 2016) of communication, perception and social presence, a dimension which lacks the quality of embodied presence while at the same time facilitating “hyperembodied” effects that constitute emergent ways of getting a grip on each other and each other’s worlds through technology. This paper presents analyses of “snaps” from the very popular instant picture messaging application Snapchat. The snaps exemplify use of multimodal, mediated communication which produces “hyperembodied” effects. Two types of effects will be discussed: hyperembodiment through “high definition” representation in a selfie and hyperembodiment through “extraembodied” representation in a “footie”. The analyses will elaborate on the phenomenon of hyperembodiment and show how through mediated multimodal communication a whole new order of perception and intersubjectivity is created.
Moreover, the analyses will show how users perform “conceptual integrations” (Fauconnier and Turner 2002, Turner 2014) of technological affordances and embodied social presence and perception in order to achieve these effects. The paper thus contributes to the development of a social-cognitive approach to computer-mediated communication (Walther 2011).

House, David; Ambrazaitis, Gilbert; Alexanderson, Simon; Ewald, Otto; Kelterer, Anneliese. KTH (Royal Institute of Technology); Lund University. Title: Temporal organization of eyebrow beats, head beats and syllables in multimodal signaling of prominence. Abstract: Prominent words and syllables are frequently conveyed in speech communication by a complex of multimodal signals involving both speech and gesture. Important components of prominence include stressed syllables and focal accents with concomitant facial gestures such as eyebrow and head beats. These beat gestures tend to align with focal pitch accents, and there is evidence suggesting that eyebrow and head beat gestures are more likely to occur with perceptually strong accents. While there is ample evidence of the co-occurrence of beat gesture movement and focal accents, we are still lacking detailed accounts of the timing relationships between the gestures and syllables comprising multimodal prominence. This study presents data concerning the temporal organization of beat gestures and accented syllables obtained from two different data sources: recordings of spontaneous dialogue and television news reading. For the spontaneous speech analysis, a 20-minute dialogue between a female and a male speaker was taken from the Spontal corpus of Swedish dialogue, a database of unrestricted conversation comprised of high-quality audio and video recordings and motion capture. The motion capture data enabled automatic detection of head movements resulting in 150 nod segments for the female speaker and 64 nod segments for the male. As the automatic detection process resulted in the extraction of head movements having various functions (e.g. feedback giving and confirmation), two annotators classified the head nods as to whether or not they had a clear beat function. 50 head nods from the female speaker were marked as such by both annotators (65% agreement). 
Syllable boundaries and nuclei were automatically derived from the audio signals following. Although the timing relationships between the nods and the co-occurring syllable show a considerable amount of variation, the peak rotation of the nod is on average aligned with the nucleus of the stressed syllable, and the onset of the nod tends to slightly precede the onset of the syllable. The television corpus consists of 31 brief news readings from Swedish Television, comprising speech from four news anchors (two female, two male) and 986 words in total (6 1/2 minutes). It was annotated for focal accents and head and eyebrow beats, independently by three annotators. A previous study has reported on the frequency of combined prominence markers in the material. For the current study, two additional independent annotators marked the temporal location of the eyebrow movement related to the head movement. The locations of 51 eyebrow movements were agreed upon (Cohen’s Unweighted Kappa 0.75) with 57% preceding the head movement, 41% simultaneous, and 2% following the head movement. These results suggest a possible general temporal ordering of multimodal signals for prominence where eyebrow movement precedes head nods with both being anchored to but followed by a focally accented syllable. In terms of bodily mobilization for multimodal prominence gestures, the relatively small mass of the eyebrows may facilitate their use as precursors to the greater muscle activation needed for the head nod.
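For readers unfamiliar with the agreement measure reported above, Cohen's unweighted kappa corrects the observed agreement between two annotators for the agreement expected by chance. The following is a minimal sketch; the label sequences are invented toy data, not the study's annotations, and `cohens_kappa` is a hypothetical helper name.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's unweighted kappa for two annotators' categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items labelled identically.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal proportions.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (hypothetical): timing of eyebrow movement relative to the head nod.
annotator_1 = ["before", "before", "same", "same", "before", "after", "same", "before"]
annotator_2 = ["before", "same", "same", "same", "before", "after", "before", "before"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Because the chance term depends on how often each annotator uses each label, the same raw agreement rate can yield different kappa values for different label distributions.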

Hsu, Hui-Chieh; Feyaerts, Kurt; Brône, Geert. KU Leuven. Title: The hand, the eye and the bow: Embodied multimodal depicting in cello master classes. Abstract: Depictions, as recently identified and defined by Clark, are physical scenes people create and display with a single set of actions at a single place and time, for others to use in imagining the scenes depicted (Clark, 2016: 324–325). Situated in staging theory (Clark, 2016), the framework of depicting promises the potential of unifying numerous linguistic phenomena thus far approached individually, such as quotation, demonstration, iconic gesture, and constructed action (Cormier, Smith, & Sevcikova, 2013; Kendon, 2004; McNeill, 1992; Streeck, 2009), all of which involve depicting as a common denominator. To better understand how depicting works in real-life interaction, we examined video recordings in the context of musical instruction, an arena particularly rich in information communicated multimodally. This is attributable to the nature of the interaction, namely that the instructor often needs to communicate nonverbal information on musical interpretation or techniques to the student. Specifically, we zoomed in on two 70-minute-long cello master classes (Masterclass Media Foundation, 2007, 2008), in both of which the cellist Steven Isserlis instructs a student on the interpretation of a musical piece. Selected segments were coded for speech (including humming and vocal mimicry; cf. Perlman & Cain, 2014; Hsu, Anible, & Occhino-Kehoe, 2016), gesture (manual, facial, and bodily; following the MUMIN coding scheme [Allwood et al., 2007]), eye gaze (cf. Rossano, 2012), and importantly, interaction with artifacts (e.g. musical instrument, score, and music stand). Systematic patterns were found where the instructor, when depicting, uses the cello bow as an extension of his arm. That is, the instructor would sometimes stage a depiction with the cello bow still held in his right hand, incorporating the bow in the depiction, as if it were part of his limb.
This phenomenon is especially common in depictions where the instructor attempts to illustrate the metaphorical contours of musical phrases, echoing the recent finding that musical instruments can be used as material anchors in musical instruction (Sambre & Feyaerts, in press), as well as speakers’ propensity to construe physical tools (e.g. car and bicycle) as extensions of their body (cf. Lakoff & Johnson, 1999). Equally pervasive are depictions where the instructor’s eye gaze and manual gesture converge. When depicting a musical concept with a manual gesture, for instance, the instructor would often follow his own gesture with gaze. In numerous cases, these two phenomena co-occur: The instructor’s gaze would follow his own bow-incorporated depiction. Bringing in the additional dimension of artifact incorporation to multimodal depicting, the results reveal how different modalities can be creatively coordinated by the language user to jointly stage depictions to facilitate the addressee’s comprehension. The use of the cello bow as an extension of the arm also suggests the highly embodied nature of musical instruction. As a preliminary qualitative case study, the present project will benefit from further quantitative and experimental investigations (on, among others, the instructor’s gaze-gesture convergence and instructor-student interaction) as well as comparisons to other types of instruction settings.

Hubbard, Amy L. German Sport University Cologne. Title: Walk this way: How movement analysis is changing treatment options for one neuroclinical population. Abstract: As the International Conference on Multimodal Communication demonstrates, it is well-established that human communication is not limited to linguistic information exchanged via the auditory modality. Whenever humans become visually proximate, the very ordinary, unconscious production and interpretation of multiple visual cues yield extraordinary meaningfulness. Here, we show how the seemingly ordinary action of walking can communicate invaluable information about individuals with multiple sclerosis (MS) and how this information can be used to question what is currently the status quo for treating individuals with this chronic neurological disease. Over 80 percent of individuals with MS suffer from a disease-specific form of debilitating fatigue; consequently, this MS symptom denies an overwhelming majority of individuals with MS the choice of engaging in basic life activities such as working, walking, driving, and socializing with family and friends. Although there are three FDA-approved medications which clinical neurologists have observed as having a positive impact on MS fatigue [amantadine hydrochloride-Symmetrel®, modafinil-Provigil®, and armodafinil-Nuvigil®], these medications are neither indicated nor FDA-approved for treatment of MS fatigue. At the root of what is a longtime stalemate in treatment strategies for MS fatigue lies the lack of an objective test and measurement scale for MS fatigue. All attempts to characterize MS fatigue have either been based fully on patients’ subjective responses using a Likert Scale or have focused solely on cognitive fatigue, thus both limiting research on MS fatigue to individuals with MS who are “suffering from subjective fatigue” and occluding pharmaceutical testing for this elusive phenomenon.
Here, we present a novel, objective tool for measuring MS fatigue. This tool is based upon a standardized test of movement behavior (BAST) that has been previously validated for diagnosis of neuropsychiatric conditions. The 10-minute videorecorded test includes basic physical tasks (e.g., 30-second segments of walking and stomping) followed by four one-minute improvisations. Movement analysis of a subject walking at the beginning of the test versus walking after the final improvisation offers a physiological representation of the presence or absence of MS fatigue. Systematic operationalization of MS fatigue, including observations by clinical neuroimmunologists as well as movement specialists, has guided the creation of the BAST+MS rating scale. As this will be the first public presentation of the large-scale goal of designing, testing, and validating the BAST+MS, the audience will be encouraged both to ask questions and to share their movement observations of potential fatigue indices that they see in the video presentation of test data.

Ibrahim, Wesam Mohamed Abdel Khalek. Tanta University. Title: A Multimodal Analysis of Political Cartoons in Post-Revolution Egypt. Abstract: ‘It is in the condensation of a complex idea into one striking and memorable image that we find the appeal of [a] great cartoon’ (Gombrich, 1963, 130). This paper presents a multimodal analysis of political cartoons posted on Facebook about the Egyptian political and economic situation after the 25th January 2011 Revolution. The focus will be particularly on the crises related to shortages of propane gas bottles and soaring prices. The Facebook political cartoons, which have become embedded in the everyday political culture of the ordinary Egyptian citizen, provide a good example of multimodal texts, ‘since they typically combine the verbal and visual semiotic mode to create meaning’ (El Refaie and Horschelmann, 2010, 195) or, in other words, provide an ‘interpretation of larger societal practices and forces through the use of textual and visual codes’ (Chaplin 1994: 1). Generally, political cartoons might merely summarize a political event, depict a political figure, or comment on a political situation. They tend to consist of illustrations which address a current political issue from a critical point of view and are accompanied by verbal elements which make a satirical, witty, or humorous point. However, Facebook cartoonists have developed their cartoon style by adding pictures or extracts from films, TV series or advertisements to this combination. The humour embedded in cartoons is defined by James Scott (1990: xii) as ‘a hidden transcript that represents a critique of power spoken behind the back of the dominant.’ This definition, however, does not cover the situation in Egypt after the revolution. The emerging ‘revolutionary character’ (Boime 1992: 256), the widespread freedom of expression and the absence of censorship on Facebook postings allowed the critique of power to surface and become more explicit.
Humour occurs when the cartoonist relies on the unexpected, the incongruous, or the absurd to convey his meaning (Duus, 2001: 966). However, the humour associated with the Egyptian cartoons can be described, using Hewitson’s term (2012: 213), as ‘black humour’ or ‘humour deriving from the contemplation of suffering or death’, since the cartoons tackle social and economic crises. This paper uses the cognitive linguistic theory of conceptual blending to illustrate how recipients of cartoons construct meanings as well as humour from these cartoons. Conceptual blending (Fauconnier and Turner, 2002) can be seen as a dynamic process that occurs at the moment of perception to create new meanings from existing ways of thinking. It is a common cognitive activity, closely related to analogy and metaphor (Fauconnier, 2001). Blending can occur in both verbal and visual domains, which makes it ideally suited to account for the specific features exhibited by cartoons. This paper illustrates the potential contribution of blending theory to the analysis of political cartoons and shows its explanatory capacity to provide detailed descriptions of the reception process of these cartoons.

Iriskhanova, Olga; Prokofyeva, Olga. Moscow State Linguistic University, Institute of Linguistics, RAS, Moscow. Title: Focusing in visual perception, speech and gestures: A multimodal analysis of oral descriptive discourse in Russian. Abstract: The present research is an empirical study of the relation between attention (focusing), speech and gesture in oral descriptive discourse. It has been shown in cognitive linguistic works that the attentional systems are at the heart of linguistic meaning construction (Langacker 1999, Talmy 2000a-b, Fauconnier & Turner 2002, Oakley 2009). However, much less has been said about how the processes of foregrounding and backgrounding carry across various modes in different communicative practices and languages. In this research we concentrate mainly on two modalities of oral descriptive communication in Russian – verbal and gestural, with the pictorial modality serving as the initial stimulus. Building on ideas about co-speech gestures (McNeill & Duncan 2000, Kita 2000, Mueller, Cienki et al. 2013, Mittelberg 2013, Steen & Turner 2013, Sweetser 2012), we investigate how these modalities complement each other in foregrounding objects and their qualities in discourse. Specifically, we are interested in what types of gestures (and how often) are used to support profiling of entities in descriptive discourse by linguistic means. The corpus of texts under analysis consists of oral descriptions of paintings, provided by 20 Russian participants (aged from 19 to 23) in their native language. Four paintings of different art movements and styles were used as visual stimuli for each participant. Each description was produced from memory, immediately after a 40-second demonstration of a painting. The whole procedure resulted in 2 sets of data: the eye movement data recorded by an eye tracking device (SMI iView X RED 4 (FireWire)) and the discourse data registered by a video camera. 
For the eye movement analysis the areas of interest (AOI) were chosen and the total fixation duration per AOI was determined. The discourse data set was analyzed in terms of the co-occurrence of gestures and the linguistic expressions denoting objects and their qualities, used in the focal position in the text. The focal status of a linguistic expression was determined with regard to a number of language-specific criteria: syntactic position, part of speech characteristics, lexical semantic features (abstract vs. concrete, literal vs. figurative), prosody, etc. Quantitative and qualitative analysis of the 2 data sets has shown the correlation between perceptual and discursive focusing, on the one hand, and between verbal and gestural focusing, on the other hand. The research has demonstrated that linguistic and gestural foci in descriptive discourse relate to the AOI of the perceptual focusing at the stage of observing the image. Importantly, the co-occurrence of linguistic means and gestures used for foregrounding, as well as their semantic and functional characteristics, are influenced by the features of descriptive discourse and the task situation. This indicates that discourse elements belonging to different modalities jointly contribute not only to the “mechanics” of the selection system, but to the interpersonal system of sharing, harmonizing and directing another person’s attention (Oakley 2009).

Jaki, Sylvia. University of Hildesheim, Institut für Übersetzungswissenschaft und Fachkommunikation. Title: Extending the notion of documentary: A multimodal analysis of the docudrama series Roman Empire: Reign of Blood (2016). Abstract: Documentary film has come a long way since the time it was almost entirely associated with the transmission of factual knowledge. Today, individual documentaries and documentary series need to attract and sustain viewers’ interest to survive in a highly competitive market. As they are mostly consumed for entertainment, producers are not only concerned with conveying information, but also with the manifold techniques of making their formats more entertaining, especially by appealing to the audience’s emotions. As a consequence, a variety of hybrid, so-called docutainment, formats have evolved which combine fact and fiction in several ways (cf. Wolf 2006). One of these is Roman Empire: Reign of Blood about Emperor Commodus, a 2016 production by the very successful online streaming platform Netflix. While typical historical documentary formats have long employed re-enactment as a means of visualising historical facts, in Roman Empire we are dealing with a primarily re-enacted, strictly chronological mini-series, complete with historical facts provided by historians and a narrator. This paper analyses Roman Empire in comparison to more typical documentary formats. Indeed, some characteristics of this series are highly striking when set against its more staid predecessors, for example the dramatic series title, the extensive re-enactment noted above, as well as the cliff-hanger teasers for subsequent episodes, all of which clearly indicate that the series focuses primarily on entertainment through emotionalisation. 
The question that motivates this analysis is, however, in which ways the passages of Roman Empire where factual knowledge is explicitly conveyed by the off-screen commentary or the experts are comparable to the corresponding parts in more staid documentaries. It goes without saying that both types of formats are multimodal media products. As “moving-image producers capitalize on the affordances of modes to emotionalize moving-image texts” (Rowsell 2014: 308), even more traditional documentaries display a multimodal design that may resemble proper entertainment formats like disaster movies or television drama (cf. Hobden 2016). Hence, a look at the different modes is paramount for an analytically-significant comparison, especially at how they interact, since “it will typically not be justified (sometimes even misleading) to assume a simple additive effect of ‘contributing mood variances’ from different modalities” (Vitouch 2001: 81 on sound and image in film). The above-mentioned focus on the passages provided by the off-screen commentary and the experts intends to show to what extent and how even those elements that are most apt for constructing historical knowledge subtly try to appeal to the audience’s emotions by a complex and well thought-through multimodal make-up.

Jehlicka, Jakub; Leheckova, Eva. Department of Linguistics, Faculty of Arts, Charles University, Prague; Institute of Czech Language and Theory of Communication, Faculty of Arts, Charles University, Prague. Title: Functions of beat gestures in spontaneous interaction: evidence from English and Czech. Abstract: Beat gestures, or “batons”, have been traditionally regarded as the least “semantic” class of co-speech gestures (Kendon, 1988, McNeill, 1992). Taken as mere involuntary movements accompanying the production of spoken language, their relation to language has often been limited to phonological aspects: rhythm or prosody in general. The fundamental association of beat gestures with prosody has been acknowledged since the very beginning of linguistic interest in gesture (e.g. Pike, 1967). Closely linked to prosody is another widely recognized function of beat gestures – discourse marking. As prosody serves as a marker of distinction between discursive units (information structure), so may gesture (Kendon, 1972). Our paper focuses on a) what roles beat gestures take on when co-occurring with spontaneously produced language in interaction, and b) the usefulness of distinguishing between the canonical co-speech gesture categories (beats, iconic, deictic, metaphoric gestures) with respect to their functions. Our data come from two comparable multimodal corpora of English and Czech. The English corpus is the AMI corpus (Carletta, 2006), a freely available resource of spontaneous interactions. The Czech data were obtained from our multimodal corpus (in development). The English and Czech subsamples each comprise the spontaneous speech and gesture production of 10 native speakers. Coding was performed in ELAN by three independent coders with significant inter-annotator agreement. Gestures were coded for their formal features. Adjacent constructions were coded for their semantic features as well as discursive status (focus, topic, contrastive topic, according to Lambrecht (1994)). 
In both languages, we observed a general tendency for beat gestures to integrate various functions within a single unit. In particular, beat gestures were found to often serve rhythmic, discursive and semantic functions all at once. A more robust tendency to accumulate these 3 clusters of functions was observed in English. Such a difference may be accounted for by the different encoding of meaning through morphological markers in English and Czech: whereas in analytic English there is a need for additional profiling of the verb meaning via gesture, the morphologically rich Czech enables speakers to express the verb meaning using affixes. Based on our data, we argue that co-speech gestures should rather be approached as aggregates of various functions (Kok, Bergmann, Cienki, & Kopp, 2016) – simultaneously co-expressing rhythmic, discursive, semantic and other functions. We illustrate the similarities between the co-expressivity of gesture, the co-expressivity of signs in signed languages, and the hand movements of a music conductor (Rudolf, 1950).

Jelec, Anna. Department of Cognitive Linguistics, Adam Mickiewicz University in Poznań. Title: Grounding conceptualisation in bodily mimesis. Abstract: Research demonstrates that a vast majority of abstract concepts are represented in concrete terms both in speech and gesture (e.g. Cienki and Müller 2008), and abstract subjects are commonly described as sensorimotor experiences (cf. Lakoff and Johnson 2003). Although there is some consensus that the emergence of meaning depends on interactions with the world (Pecher and Zwaan 2005), there is little clarity as to how non-physical abstract concepts could be grounded in physical experience. Perhaps the origins of abstract conceptualisation can be sought in the body itself. Bodily mimesis, the use of the body for representational means (Donald 1991, 2001), is one plausible link between action and mental representation. A particular act of cognition or communication is an act of bodily mimesis if and only if it fulfils certain conditions: cross-modality, volition, representation and communicative function. This study analyses recordings of a congenitally blind girl explaining a set of concepts to a computer. Data were gathered over the course of three years as part of a longitudinal study of blind children’s gesture (Jelec 2014). The gestural, verbal and vocal performance of the child when she was 7 years old showed a remarkable overlap of features with Donald’s definition of mimesis, in that both gestures and descriptions relied heavily on reenactments of situations and associated sounds and movement. However, by the age of 10, the mimetic behaviours in language had been replaced almost exclusively by verbal descriptions, while the number of mimetic gestures remained at the same level. 
These results are in line with the findings of Zlatev, who hypothesised that bodily mimesis grounds but does not constitute linguistic meaning (Zlatev 2007: 327), as well as with studies showing a developmental leap in abstract concept acquisition between the ages of 7 and 9 (Ponari et al. 2016). Hence, mimetic behaviour could pave the way for concept understanding in a way predicted by some novel approaches to the symbol grounding problem.

Jurewicz, Joanna. Warsaw University. Title: The Dancing Śiva and the Work of Mind. Cognitive Analysis of the Religious Image. Abstract: The aim of the paper is to analyse the meaning of the figure of the dancing Śiva and to show how the models of conceptual metonymy, metaphor and blending can explain the power of the human mind to express and understand abstract philosophical concepts via visual art. The meaning of this figure has been analysed by Zvelebil (1985). He has shown that each detail of the figure evokes abstract philosophical and religious concepts, such as creation and destruction of the world, its protection, death, immortality, salvation etc. I will show that these concepts are activated thanks to metonymic and metaphoric links, the vehicles and source domains of which are visually represented, and that these links can be fully understood only against the background of cultural knowledge of the Indian religious and philosophical heritage. Moreover, in order to understand the full meaning of the figure, the recipient has to compress all the concepts into one huge blend which gives global insight into - as Zvelebil (1985) writes - the "systemic, structural totality that constitutes the meaning of the whole – the meaning of the ānanda-taṇḍava (the cosmic dance of Śiva)". The rich content of the recipient’s creation realised during contemplation of the figure is not only an esthetic experience, but may also lead to the mystic vision which consists in blending and which, in Indian tradition, is controlled by the agent thanks to yoga discipline.

Kibrik, Andrej A.; Fedorova, Olga V.; Nikolaeva, Julia V. Institute of Linguistics, Russian Academy of Sciences; Lomonosov Moscow State University. Title: Russian Multichannel Discourse. Abstract: This paper reports a project implemented at the Institute of Linguistics, Russian Academy of Sciences, and aimed at creating a corpus of Russian multichannel discourse. The term “multimodal communication”, as it is used in modern studies, is somewhat overstated, as one normally takes into account only two modalities: auditory (or vocal) and visual (or kinetic). There is a multitude of communication channels belonging to these two modalities, as shown in Figure 1. It is for this reason that we prefer the notion of multichannel discourse/communication. Human beings are involved in multichannel communication throughout most of their lives. However, this phenomenon is still poorly understood and even poorly documented, for at least two reasons. First, multichannel communication is an ephemeral process: it goes by, normally leaving behind no objective traces. Second, studies of multichannel communication are traditionally divided between different disciplines; in particular, linguists are mostly interested in the verbal channel, while kinetic communication is primarily explored by psychologists. In this project we attempt, first, to register the multiple channels of communication via the use of advanced technological solutions and, second, to overcome the traditional disciplinary boundaries, looking at most of the channels shown in Figure 1 within one framework. We have collected a resource named “Russian pear chats and stories”, see www.multidiscourse.ru (the web site is in Russian). This resource consists of 24 communication episodes, or “recordings”, lasting from 12 to 38 minutes each. Four persons took part in each recording, with fixed communicative roles: Narrator, Commentator, Reteller, and Listener. At the preparatory stage, two participants (Narrator and Commentator) watch the so-called Pear Film – a well-known stimulus material, created in the 1970s by a research group led by Wallace Chafe, see www.linguistics.ucsb.edu/faculty/chafe/pearfilm.htm. The main part of each recording consists of three stages. First, Narrator tells the content of the film to Reteller; this is a monologic stage. 
The second stage is conversational: Reteller asks clarifying questions, Commentator provides additional comments on the film, and all three discuss the content of the film in detail. The third stage is again monologic: Reteller, who did not see the film, recounts it to Listener, who joins the group immediately before that. Finally, Listener writes down the content of the film, which is important for motivating Reteller to recount the content of the film well. The spatial design of the communication scene can be seen in Figure 2. Note that Narrator and Reteller wear lightweight eyetracker glasses that allow us to register their eye gaze throughout communication. The audio signal was recorded with the help of a six-channel recorder ZOOM H6 Handy Recorder. Three industrial video cameras JAI GO-5000M (100 frames per second) recorded three participants, shooting individually from a frontal perspective. These cameras use the mjpeg format, free of interframe compression, which is crucial for subsequent frame-by-frame annotation. The camera GoPro Hero 4 (50 frames per second) was used to record in the cover shot mode. Two eyetrackers Tobii Glasses II (50 Hz) were used. In the corpus, each recording is represented with a set of 10 synchronized media files that objectively register the participants’ vocal, kinetic, and gaze behavior. Furthermore, for each recording various annotations are provided, including transcripts of speech (the verbal component and many aspects of prosody), annotations of manual and cephalic gestures, and annotations of oculomotor behavior. At this time, three recordings have been uploaded to multidiscourse.ru, and we are working towards a fuller coverage. As a pilot subproject, we have also created annotations of phonetics, corporal gestures, facial expressions, and proxemics. This corpus is growing into one of the first resources allowing one to analyse human communication in its actual richness. There is a wide variety of research issues that are being explored with the help of this resource, ranging from the most theoretical ones, e.g. 
a multichannel reinterpretation of communication roles, turn-taking, or pausing, to more technical ones, such as precise temporal coordination between prosodic and gestural units. ("Research underlying this paper is conducted with support of grant #14-18-03819 from the Russian Science Foundation.")

Krajcik, Chelsea. SOAS, London. Title: Placement and removal verbs in Joola Kujireray - using gesture to explore semantic information. Abstract: This research presents a description of the domain of placement and removal events as expressed by speakers of Joola Kujireray, spoken in the lower Casamance region of Senegal in West Africa, by observing the co-speech manual gestures of speakers. Placement and removal events are described as “caused motion events involving an action where an agent causes an object (the figure object) to move to an end location (a goal ground) to which it will relate in a resulting spatial relationship” (Gullberg, 2011:7). This event domain varies cross-linguistically, both on a syntactic and semantic level: speakers of various languages express different information in verbs and/or in adverbial phrases. Speakers retrieve particular lexical items depending on the varying features found within a placement or removal event, such as a figure’s size, shape, or canonical/non-canonical position, or features of the ground such as whether or not the figure is visible, contained, or on a flat, horizontal surface. Joola Kujireray is a language of the Atlantic family. It is the nominal language of the village Brin, with approximately 500 speakers. The first comprehensive description of this language was carried out by Dr. Rachel Watson from 2011 to 2015, now allowing for more fine-grained studies on specific domains such as placement and removal events. This research was carried out in the fall of 2015 and 2016 through elicitation sessions and two Director-Matcher tasks adapted from the Put Project (Bowerman, Gullberg, Majid & Narasimhan, 2004) and the Caused Positions (Hellwig & Lüpke, 2001) set. 18 participants between the ages of 18 and 29 completed the task. Preliminary results show that Joola Kujireray has a complex verbal system of expressing placement and removal events, including both broad and fine-grained semantic information in verb roots. 
This system includes the use of positionals such as -fil- ‘lie’ and -il- ‘stand’ for certain inanimate figures. The observation of co-speech gestures has provided insight into finer-grained semantic differences between verbs, both for the more semantically general verbs, such as e-baŋ ‘to put; to hold; to put down’ and e-kan ‘to put; to do; to make’, and for the semantically specific verbs e-toot ‘to pick up’, e-jop ‘to pick up a quantity of X by hand’, and e-women ‘to gather items together’. Co-speech gestures were coded for handshape and show additional semantic information about the figure’s size and shape. Furthermore, these gestures express contrastive manners of motion, aiding the description of the verbs’ semantics. This research presents the semantics of a selection of verbs in Joola Kujireray and describes the various features of the figure and ground determining the choice of verb used. This research also serves to advocate for the incorporation of co-speech gestures into language description, as they provide additional semantic information which is valuable to the description of a language. Providing a description of the co-speech gestures and the semantic information expressed in both modalities sheds light on the extent of cross-linguistic variation by speakers in the domain of placement and removal events.

Krug, Maximilian. University of Duisburg-Essen. Title: Rehearsing in theatre: Collective achievement of an interactional system through the coordination of multiple activities. Abstract: Shows at professional theatres often take more than six weeks to develop. This period of practice in which directors, actors, and other members of the staff develop a theatrical play is called rehearsal. Although rehearsals are an important part of Western culture, the practices of rehearsal processes remain largely uninvestigated (McAuley 2012). Therefore, this paper aims to analyse the organised ways in which participants of a theatre rehearsal contribute to the creative process according to their specific work tasks and how multimodal resources (especially gaze, gesture, and alignment) are used to achieve the different task-related activities. The data used for this paper are taken from a corpus of 200 hours of video recordings of rehearsals at a professional theatre in Germany. The corpus covers 31 rehearsals of a devised theatre play with two actors, the director, and his assistant director over the course of six weeks. In addition to camcorders, two mobile eye tracking glasses were worn by the director and his assistant director. This allows insight into the gaze behaviour of primarily visual activities like observing and reading (Holler/Kendrick 2015). For the purpose of this paper, a one-minute sequence is chosen to demonstrate the multimodal coordination practices of multiple activities (‘Multiactivity’, Haddington et al. 2014) and their implications for the organisation of rehearsal practices. The sequence consists of a monologue rehearsal with one actor playing, the director observing the actor’s activities, and the assistant director making notes and prompting when the actor forgets parts of the text. 
Employing eye tracking, it becomes visible that activities such as reading and observing are closely coordinated with other activities and other participants’ actions through mutual monitoring. To perform the prompting (multi-)activity, the assistant director cancels the note-making, starts to read the script, observes the actor for hints of “running dry” and eventually starts to prompt the required text. The actor uses multimodal resources such as pauses, prosodic features, gaze, and body orientation towards the assistant director in order to display a need for a prompt. In comparison, director and assistant director both observe the actor’s actions as part of their professional vision (Goodwin 1994). However, while the assistant director performs the observation activity as part of the prompt (multi-)activity, the director treats the observation of the actor as his main activity by subordinating and eventually cancelling other activities such as searching for the script, reading, and reacting to the actor’s performance. Thus, the aim of this paper is to investigate how participants in a theatre workplace setting make use of their multimodal resources in order to collectively establish a rehearsal as one interactional system. By examining multi-party interactions with the methods of multimodal interaction analysis, the fine-tuned multimodal practices with which participants solve practical problems of coordinating multiple activities in specific contextual arrangements become observable, thus helping us understand how social interactions are achieved through multimodal resources.

Lackner, Helmut K.; Papousek, Ilona; Rominger, Christian; Brône, Geert; Oben, Bert; Feyaerts, Kurt. University of Graz; University of Leuven. Title: Mapping patterns of interactive alignment in and across verbal and physiological behaviour. An empirical study. Abstract: People in face-to-face interaction are subjected to processes of mutual alignment (Pickering & Garrod 2006; Author 2015). In this contribution, we investigate interactive alignment from a multimodal perspective by zooming in on the relation between verbal and physiological alignment. More specifically, we discuss findings of an empirical study, in which we investigated the synchronisation of two interaction partners in their verbal use of two types of viewpoint phenomena, amplifiers and comical hypotheticals (Authors, subm.), as well as in their physiological behavior. We also analysed whether the verbal and physiological synchronisations covaried. Our study is based on the analysis of 24 dyads, in each of which male participants, unknown to each other at the start of the experiment, engaged in 22.5 minutes (3 phases of 7.5 minutes) of spontaneous face-to-face conversation, which took place under controlled circumstances (Authors, 2012). We recorded the heart rate using high-resolution electrocardiogram, respiration, and gross movements using a 3D accelerometer. The measurements allowed us to compute heart rate fluctuations while taking into account changes in the breathing patterns as well as movement artefacts. The analysis of synchronisation between heart rate fluctuations of the two interaction partners was based upon the weak coupling of two chaotic systems (phase synchronisation, cf. Lackner et al., 2011), with simultaneous consideration of the speech patterns. 
As synchronised patterns may also appear by chance, we compared the synchronisation in the real dialogues to the synchronisation in control dialogues, which were obtained by randomly matching interlocutors that never actually talked to each other (Dale et al. 2006; Author 2015). In this contribution, we report inter-personal alignment patterns, which were observed in both modalities (verbal, physiological), as well as some intriguing interactions across different modalities.

Lelandais, Manon; Ferré, Gaëlle. University of Nantes. Title: The multimodal expression of subordination in spontaneous conversation. Abstract: Based on a video recording of conversational British English, this study aims at providing a qualitative analysis of the multimodal construction of subordination in conversation. We focus specifically on the contribution of gestures in speech sequences containing syntactic subordinate structures. These constructions are described in linguistics as elements specifying or elaborating upon some primary features, or as additions associated with another propositional content in the host or embedding structure (Huddleston and Pullum 2002: 1048). Some specific subordinate constructions, such as conditional and concessive adverbial patterns, are increasingly regarded as practices in interaction which accomplish a range of interactional and pragmatic actions in specific sequential environments (Ford 1997; Evans 2007; Debaisieux 2016; Ehmer 2016). They have been shown to involve bundles of resources, and the focus has been on specific lexico-syntactic and prosodic means (e.g., Chafe 1988; Couper-Kuhlen 1996; 2012; Laury & Suzuki 2011). In face-to-face interactions, participants use different channels (verbal, vocal, visual) when they speak. One of the aims of Multimodal Discourse Analysis is to study the contribution of each channel to the information content of messages. While the independent pragmatic actions and the different modes of prosodic realisation of subordinate constructions have been highlighted, few studies have focused on the articulation of gesture with the different communicative modalities in their production process. If such constructions emerge from the local contingencies of spontaneous conversation in context, a multimodal account of subordinate constructions through the sequential analysis of several examples helps to better delineate their levels of action and modes of integration in discourse. 
It also demonstrates how gestures are part of the linguistic structure of utterances in interaction. The corpus used for this study, ENVID, is a collection of dialogues in British English. This collaborative corpus gathers video recordings realised in soundproof studios between 2000 and 2012, making up a total of 2 hours and 10 minutes of interaction. They involve British people aged 20 to 23 who were friends or had already met. The corpus was first transcribed orthographically before going through a range of syntactic, discursive, prosodic, and gestural annotations, using both Praat for speech events (Boersma and Weenink 2013) and Elan (Sloetjes and Wittenburg 2008) for gestures and to relate information in the different domains. Beyond showing that gesture brings both semantic and prosodic features to the utterance, the results shed new light on syntactic subordination, seen as part of a composite message in which information is not presented in isolation but in a contiguity relationship. Gesture may thus bring a different kind of mental coherence to the representation of a message (Ping & Goldin-Meadow 2010), which increases the efficiency of representation, but also benefits the interaction.

Liao, Zhenggang. Jilin University. Title: The Cognitive Interpretation of Basic Color Words in the Yellow Emperor Bible. Abstract: Corpus methods are used to investigate the constructional patterns involving basic color words in The Yellow Emperor Bible. There are many such interesting patterns. The paper analyzes the cognitive and conceptual bases for these patterns. The Yellow Emperor Bible deploys basic color words in ways quite different from those presented by Berlin and Kay as belonging to a universal cognitive structure. Subcategorization of color words in The Yellow Emperor Bible involves conceptual mappings between five colors and five organs and a cultural scheme of color cognition. The cognitive basis of this color scheme is discussed. One common grammatical pattern involves the blending of two basic color words, where the concepts evoked by those words bear vital relationships such as cause-effect, sequential relation, and so on. An additional pattern is “extensive adjective + basic color word.” The pattern “extensive adjective + basic color word” is seen less often in The Yellow Emperor Bible than would be expected. The Yellow Emperor Bible presents a range of abnormal phenomena for color words that are not basic. Expressions in The Yellow Emperor Bible about the four seasons also show interesting patterns. The color of heaven, the color of earth, and the color of man are conceptually mapped to perceptual organs. Most interestingly, these color schemes play a role in the conceptual understanding of mental conditions and in particular the diagnoses of mental conditions, where the diagnoses are designed to guide clinical treatment. The constructional patterns involving color words are thereby connected to language, vision, embodied understanding, conception of the universe and a human being’s place in it, and especially to the understanding of other minds based on analyzing the multimodal communication presented by those other minds.

Libura, Agnieszka. Wrocław University. Title: "Any Minuten now... Wörk!" Playing with stereotypes in Polandball memes. Abstract: The aim of this paper is to investigate the Polandball memes. These Internet cartoons originally featured a ball-shaped creature with the reversed color scheme of the Polish flag and subsequently evolved to include other ball-like cartoon characters representing various countries, marked with their respective flag color schemes. First, the paper analyzes the national stereotypes behind countryball characters, differentiating between elements of autostereotypes and heterostereotypes. Second, the relations between visual and verbal modalities are examined. The paper focuses especially on two cases: (1) when the meaning is conveyed by simple visual elements like shapes, and (2) when the meaning is conveyed by the combination of wordplay and striking visual elements. The study shows that, in spite of their amateur-looking and unpolished quality, the Polandball memes are able to precisely construct meaning and values, participating in political debate, challenging political correctness and co-creating post-modern folklore.

Lou, Adrian. University of British Columbia. Title: Internet Memes as Multimodal Similes: A Cognitive Linguistic Analysis. Abstract: Though a widely recognized trope throughout the history of rhetoric (Aristotle 1960; Perelman and Olbrechts-Tyteca 1969), simile has been relatively understudied and often simplified as a subtype of metaphor. However, cognitive linguists Barbara Dancygier and Eve Sweetser argue that metaphors and similes have “different patterns of mapping” (2014: 138). Building on conceptual blending theory (see Fauconnier and Turner 2002), Dancygier and Sweetser put forth the term “limited-scope blend” to underscore the ways in which similes select foregrounded elements of one domain. Unpacking the simile “The classroom was buzzing like a beehive” (2014: 145), they illustrate how the simile construction triggers the mapping of a specific vivid perceptual element of the source domain (i.e. the sound of the beehive) onto the target domain (i.e. classroom). Other possible elements associated with the source (e.g. the shape and size of the beehive) are not mapped or blended together. Expanding on their work, this paper analyzes the “when” meme, a popular online image macro which juxtaposes a when clause with an ostensibly unrelated image (see Figure 1). Despite the initial incongruity between what is presented textually compared to what is shown visually, I argue that the meme prompts selective mapping between the text and image to produce a multimodal simile. For instance, Figure 1 is an image macro depicting a car driving away from a gas pump with the nozzle still attached to the car. The car appears to have stopped, and the pump has been ripped from its base. The text on top does not refer to any of the objects featured in the photograph. Rather, the verbal element establishes a scene where a person walks away from his/her laptop but has forgotten to remove the earphones that are still attached to the person’s head as well as the computer. Viewers are able to reconcile the incongruities by decontextualizing aspects of the text and the picture. That is, viewers are made to focus on the resistance of the string-like object depicted in both modalities and how this force acts on one moving and one stable object.
The force dynamic present in both modalities functions as the conceptual link that makes the text and image alike. Therefore, what makes the “when” meme a simile and not a metaphor is the fact that the domains are never blended together. The meme’s unique multimodal construction, moreover, is able to convey the idea that one phenomenon is similar to another one without having to use the simile’s conventional like or as form. My exploration of the “when” meme attempts to bolster recent cognitive linguistic analyses of simile (Israel et al. 2004; Moder 2008, 2010; Dancygier and Sweetser 2014) and to engage with existing research on multimodal metaphors (Forceville and Urios-Aparisi 2009). In adopting a cognitive linguistic approach, this paper also participates in an ongoing discussion on multimodal discourse by joining other scholars (Dancygier and Vandelanotte 2015, 2016, submitted) who view internet memes as novel communicative forms that offer new insight into the relationship between multimodality and figurative language.

Lüdtke, Jana; Jacobs, Arthur M. Department of Education and Psychology, Freie Universität Berlin; Center for Cognitive Neuroscience (CCNB), Freie Universität Berlin. Title: ‘QNAing’ Shakespeare: Tools and predictions for neurocognitive poetics. Abstract: Two main goals of the emerging field of neurocognitive poetics are i) the use of more natural and ecologically valid stimuli, tasks and contexts and ii) the provision of methods and models that make it possible to quantify distinctive features of the complex verbal materials used in such tasks and contexts, e.g. metaphors or entire sonnets (Jacobs, 2015; Jacobs & Willems, 2017; Willems & Jacobs, 2016). Recent progress in the development of quantitative narrative analysis (QNA) and computational modeling/machine learning tools makes it possible to quantify a wealth of features of complex stimuli at all relevant text levels (i.e., sublexical, lexical, interlexical and supralexical; Jacobs et al., 2016). In this talk we demonstrate an application of QNA tools that allows us to i) predict the main topics of the 154 sonnets by Shakespeare and ii) generate various testable hypotheses for neurocognitive poetics studies using multiple response measures such as ratings, eye tracking or brain activity (Jacobs et al., 2017).

Mendes de Oliveira, Milene. University of Potsdam. Title: Blended classic joint attention in user-created YouTube videos. Abstract: Etymologically speaking, to communicate means “to make common,” i.e., “to establish information as common or shared” (Clark and Henetz, 2014: 18). Within the realm of communication, the phenomenon of joint attention is very representative of this intersubjective experience highlighted in the etymology of the verb. Tomasello (1995) defines joint attention as a type of socio-cognitive phenomenon in which interactants are aware of jointly paying attention to a matter or an object. This act of “seeing together” (Tobin 2010) is made possible because interactants share the understanding of what the “ground” is in a certain communicative situation (Turner 2015). Scholars have made the case that traditional media relies heavily on viewers’ knowledge of scenes of classic joint attention (CJA) in the production of broadcast news (Turner 2015; Steen and Turner 2013). In these programs, anchorpersons simulate CJA scenes by ‘talking’ to the viewer as if they were in the same physical space. The viewer is not fooled by this simulation (Steen and Turner, 2013: 4); but the scene of CJA is an important input space upon which the viewer bases her interpretations of (multimodal) constructions. However, this is not the only input space the viewer bases her interpretations upon; another one refers to her actual experience of being in a living room, looking at a TV, and watching the news. These – and probably other – input spaces are blended and a scene of blended classic joint attention (BCJA) is created where multimodal constructions receive meanings to a certain extent independent from those in the input spaces. In news programs in the established media, BCJA usually goes unnoticed (Steen and Turner 2013); i.e., when watching the anchorperson utter ‘see you tomorrow,’ viewers are not trying to make sense of the complete mental web that surrounds the blend.
In contrast, some productions might choose to use simulations of CJA moments very explicitly in order to create certain viewer effects. This paper explores how some user-created YouTube videos play with established media’s traditional use of CJA by a) exaggerating CJA and making viewers aware of it or b) failing to simulate it, mostly on purpose. In this paper, CJA-oriented moments and ‘CJA slips’ will be identified and analyzed in videos by the Brazilian vlogger Jout Jout, who has a very popular channel (Jout Jout Prazer) on YouTube. Verbal and non-verbal (gestures, eye-gaze, pitch movements, and visual effects) clues referring to CJA scenes and ‘CJA slips’ will be considered in data analysis. I conclude that the combination of exaggerated and ‘failed’ CJA moments might be one of the reasons for the vlogger’s rising success among Brazilians. Even the ‘CJA failures’ might have the effect of reinforcing her connection with the audience by creating a more real and human image of herself, in contrast with the image of TV stars shown by the big media.

Menke, Peter. Universität Paderborn. Title: "Es war einmal" - A multimodal corpus of a collaborative storytelling card game. Abstract: We present interim findings of an ongoing project that analyzes multimodal interaction during the play of a collaborative storytelling game. In this game, the overarching goal is to tell a fairy tale. In each session, three players have cards on their hands that stand for concepts that are distinctive of fairy tales. During the narration (and the coherent continuation) of the story they are allowed to play a card from their hands each time they use its concept in the story. The player who plays all her cards first wins. We conducted a pilot study which was recorded and then annotated partially. Using the resulting data, we analyzed several phenomena from the area of multimodal interaction. Our focus is on those actions that are used by players in order to organize the interaction, or to clarify and to disambiguate their contributions, especially with respect to the jointly produced fairy tale. In particular, the following kinds of behavior are interesting: 1. The players make extensive use of pointing gestures of various kinds in order to refer to earlier concepts as well as to points in time as well as temporal relations within the narrated story. This is possible because the concept cards are played in a row. Thus, they encode the temporal course and development of the story. 2. In situations where a long story emerged, players produce complex gestures to support the disambiguation of terms or referents. In these gestures, they make use of distinctive hand shapes or spatial locations that stand for different entities. While McNeill’s concept of cohesives appears to cover the essential part of these gestures (cf. McNeill, 1995: 16f.), we assume that there are other properties and functions of this kind of gesture that need to be examined further. 3. 
In the pilot study, we observed that players regularly synchronize the placement of a card with the peak of the current intonation unit. In most (but not all) cases, this peak appears at the word that represents the concept depicted on that card. We intend to record more participants in order to collect a data set that is large enough for a quantitative analysis of this phenomenon. During the processing of the pilot data we devised a first version of an annotation schema that helps us investigate the questions above as well as others. One of the goals of the development of this schema is to open up the data to a wide array of research questions. While this annotation schema is rather lightweight, it could be extended or attached to more elaborate, established annotation schemas such as FORM (Martell, 2005) or MUMIN (Allwood et al., 2007). Also, we intend to widen our focus to additional fields of research (such as narratology, game studies, etc.) as soon as a larger corpus is available.

Micklos, Ashley. Max Planck Institute for Psycholinguistics. Title: Multimodal repair initiation in silent gesture communication games: Eye gaze, facial expressions, and timing. Abstract: In ordinary conversation we encounter problems in understanding our interlocutors. Fortunately we have an arsenal of resources for repairing such problems. In face-to-face conversation, interlocutors recruit a number of resources simultaneously to indicate that a problem has arisen (Schegloff, Jefferson, & Sacks, 1977). The body (Seo & Koshik, 2010), the face (Kendrick, 2015), and eye gaze (Manrique, 2016) can all be used along with the corresponding talk (or sign) to highlight a problem of hearing or understanding. However, in a context of limited interactive affordances, as in experiments in language evolution, are participants still able to signal repair effectively? Problems of understanding arise in these settings as participants negotiate form-meaning mappings in a novel communicative mode. In a silent gesture communication game participants acting as gesturer and guesser were required to disambiguate similarly gestured - and thus easily confusable - noun-verb pairs. Here repair became a vital resource for establishing optimal disambiguation strategies. Other-initiated repairs were often performed with eye gaze and facial expressions. In instances of the guesser’s returned gaze to the gesturer, either open or specific class repairs were initiated (Dingemanse & Enfield, 2015). Open class repairs, which indicate a general trouble of understanding, were performed with a furrowed brow which was accompanied by either a mouth frown or half-smile - the latter likely being a mitigating strategy to the face-threat other-repair can impart. Similarly, specific offers would incorporate mouth frowns or half-smiles as indications of try marking (Byun et al, forthcoming) in candidate understandings. 
A guesser’s specific request saw the simultaneous use of a point to the gesture space with a brow raise hinting at the suggestive nature of the point’s referent and the repair initiation itself. Throughout the task, participants become entrained to the timing of gestures and guesses. Longer gaps between the end of the gesture and the guess could then signal a problem with the prior, akin to longer pauses in turn taking (Kendrick, 2015). The subtle cue to repair can be made more salient with facial gestures that are not visually directed at the gesturer (that is, no gaze return), but at the array of meanings in front of the guesser. As with those made to the gesturer, array-oriented displays of confusion included the furrowed brow and mouth frown. The averted eye gaze could again be a means to attenuate face-threat. Self-initiated repairs, which typically arose in later iterations of the game, were made salient with simultaneous face and hand gestures. While the hands gestured the intended meaning, the face highlighted the self repair with a high brow raise. The brow raise laminates the current gesture as a repaired one in an attempt to draw specific attention to the repairable as a deviation from previously established conventions. Sometimes apologetic half-smiles accompanied these self-repairs. Even with limited affordances, participants were able to make use of multimodal and interactive repair sequences - while simultaneously attending to the socio-pragmatic constraints on repair - to negotiate and establish signaling conventions.

Mierzwinska-Hajnos, Agnieszka. Maria Curie-Skłodowska University, Lublin. Title: When Music does not Need Words: Multimodality in Walt Disney’s Fantasia. A Conceptual Blending Analysis. Abstract: Assuming, after Eggins, that language ‘is modelled as networks of interconnected linguistic systems from which we choose in order to make the meaning we need to make to achieve our communicative purposes’ (2004: 327), the present paper aims to explore the multimodal character of Walt Disney’s animation series Fantasia, adopting a conceptual blending approach, both in the original framework as proposed by Fauconnier and Turner (1998, 2002) and its further modifications (cf. Brandt and Brandt 2005, Brandt 2013). While analyzing various segments of Fantasia 1940 and Fantasia 2000, it is easy to observe that the linguistic component, usually manifesting itself in texts, inscriptions, commentaries, or other forms of linguistic expression, has virtually been replaced with non-linguistic modes of communication, mainly with visual and auditory channels. The overlapping of visual and auditory information as depicted in Fantasia allows us to account for complex cognitive processes which accompany multimodality, thus making it a successful mode of communication (cf. Murray 2013, Kress 2010, Zbikowski 2009, 2015). Of vital importance here is the interaction between the sender and the receiver of the encoded message. Therefore, at least three problems ensue while approaching Fantasia: (i) how meanings of particular segments in the series are created, (ii) how they are interpreted by the audience, and (iii) to what extent the audience’s interpretation diverges from the initial intention of Fantasia’s creators. For the purpose of this presentation, a thorough study will be carried out on George Gershwin’s Rhapsody in Blue, one of the segments in Fantasia 2000.

Müller-Viezens, Johannes. TU Chemnitz. Title: Performing speech under copresence - Multimodal communication and its conceptualizing in face-to-face role-playing games. Abstract: Role-playing games under copresence, such as Dungeons and Dragons, Warhammer or Unknown Armies, represent a specific form of communication that is limited and restricted by a complex system of rules. Through creating characters using a pre-scripted and pre-organized gaming-system, players act in a fictional world which is only discernible through performing it. Players have to negotiate the rules of this fictional world and need to describe and perform speech acts to create a collaborative story. To do so, they not only have to perform speech acts as themselves; they also face the complex problem of anticipating how their characters would use language to create speech acts. This situation brings forth a variety of specific verbal speech-acts that are mostly of a highly performative nature. This talk presents the results of an analysis which focuses on the conceptualization and realization of performative speech acts and multimodality - especially gesture and visual signs and their conceptualization - in such role-playing encounters. Based on about twelve hours of audio data, it will be shown how the structure of the role-playing games rests upon principles of interactivity and narration and, in particular, the performance of speech acts (Austin 1972, Searle 1983). By taking into account aspects of selected media and game studies works (Eco 1977, Fleischmann 2008, Müller 2014) and specific perspectives on group communication (Kieserling 1999, Goffman 1959), the presentation aims to point out the connection of interaction, gaming and multimodal communication. The highly situationally adapted language that is developed in these particular role-playing encounters offers insights into the online cognitive conceptualization of speech and the nature of communication itself.

Nirme, Jens; Garde, Henrik. Lund University Humanities Lab; Lund University. Title: Computational camera placement optimization improves motion capture data quality. Abstract: 3D motion tracking data collected from multimodal communication setups by use of passive reflective markers and high frequency infrared cameras is very accurate. However, occlusion and visual overlap complicate unambiguous identification of markers and decrease data quality considerably. These problems could be related to suboptimal camera positions. Data loss is often experienced during recordings of natural, spontaneous, undirected, unpredictable multimodal communication. It is especially difficult to work with gestures, characterized by short changes in velocity and directionality of markers on hands and fingers. Data loss occurs when hands or fingers are folded, close to each other or to objects or subjects that are parts of the experiment. The data loss causes problems with post processing like filling in occluded markers as well as event tracking. Typical camera setups are generally well suited for capturing a full body motion on the floor whereas hand gestures are often small movements close to the body. To find optimal camera positions and angles manually by trial and error is very time-consuming due to the number of possible camera positions and to constraints such as distracting IR beam sources obtained directly by other cameras. Here we present a possible solution, namely a simulation of motion capture that predicts marker visibility, given scenarios and camera configurations. In a VR program enabling a virtual room to be equipped with props like tables and armchairs (similar to the experimental set-up in the real world) we add an animated skeleton with a simple body and virtual, simulated markers attached to it. Based on realistic movements, i.e. similar to those expected to take place in an experiment with a real participant, we then run the simulation, which assesses the chosen camera configuration by counting visible markers frame-by-frame. By repeating this for other configurations we arrive at a qualified ‘best’ camera set-up. With this automated guide to optimize camera setup for difficult motion capture a range of projects can achieve customized set-ups for experiments, depending, for example, on different levels of mobility or fidelity, i.e. number of markers, cameras and limited positions to mount them. A dynamic motion capture set-up combined with VR technology also allows the creation of experimental platforms to study speech-gesture processing, for instance. An easily set up framework based on 3D body motion data and recorded speech allows extremely controlled experiments to be run on a larger scale. Optimized camera placement is an essential part of that setup.
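
The scoring step described in this abstract (counting visible markers frame-by-frame for each candidate camera configuration and keeping the best one) can be illustrated with a minimal sketch. This is not the authors' VR simulation. It assumes, purely for illustration, that occluding props and body parts are approximated by spheres, that cameras see in all directions, and that a marker counts as visible when at least two cameras have a clear line of sight to it. All names (`visible_marker_count`, `best_configuration`, `min_cameras`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Sphere = Tuple[Vec, float]  # (centre, radius): hypothetical stand-in for a prop or body part


def _segment_hits_sphere(a: Vec, b: Vec, sphere: Sphere) -> bool:
    """True if the straight line of sight from a to b passes through the sphere."""
    (cx, cy, cz), r = sphere
    dx, dy, dz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    fx, fy, fz = cx - a[0], cy - a[1], cz - a[2]
    seg_len2 = dx * dx + dy * dy + dz * dz
    # Parameter of the point on the segment closest to the sphere centre, clamped to [0, 1].
    t = 0.0 if seg_len2 == 0.0 else max(0.0, min(1.0, (fx * dx + fy * dy + fz * dz) / seg_len2))
    px, py, pz = a[0] + t * dx, a[1] + t * dy, a[2] + t * dz
    return (cx - px) ** 2 + (cy - py) ** 2 + (cz - pz) ** 2 < r * r


def visible_marker_count(
    cameras: Sequence[Vec],
    frames: Sequence[Sequence[Vec]],
    occluders: Sequence[Sphere],
    min_cameras: int = 2,  # assumption: a marker needs two unobstructed views to be tracked
) -> int:
    """Sum, over all frames, the markers seen by at least `min_cameras` cameras."""
    total = 0
    for markers in frames:
        for marker in markers:
            views = sum(
                1
                for cam in cameras
                if not any(_segment_hits_sphere(cam, marker, occ) for occ in occluders)
            )
            if views >= min_cameras:
                total += 1
    return total


def best_configuration(
    configurations: Sequence[Sequence[Vec]],
    frames: Sequence[Sequence[Vec]],
    occluders: Sequence[Sphere],
) -> Tuple[int, List[int]]:
    """Return the index of the highest-scoring configuration and all scores."""
    scores = [visible_marker_count(cfg, frames, occluders) for cfg in configurations]
    return max(range(len(scores)), key=scores.__getitem__), scores
```

A real set-up would also need camera orientation, field of view and the interference between infrared sources mentioned above. The sketch only shows the compare-by-count idea.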

Olza, Inéz; Valenzuela, Javier; Alcaraz Carrión, Daniel; Pagán Cánovas, Cristóbal. Institute for Culture and Society, University of Navarra; University of Murcia; University of Lancaster; Institute for Culture and Society, University of Navarra. Title: The Red Hen Lab’s NewsScape Library and Gesture tagging in the CREATIME project: Methodology, theoretical implications and contribution to machine-learning. Abstract: Among other research tasks and objectives, CREATIME’s workflow includes gesture analysis insofar as it offers evidence of the conceptualization patterns shaping our discourse on time. Following a review of several thousand examples (more than 4,500) drawn from the NewsScape Library of International TV News, we analyzed and tagged around 475 clips containing English temporal expressions accompanied by any kind of relevant body movement. This presentation aims to reflect on the methodological and theoretical implications underlying CREATIME’s co-speech gesture analyses. (1) On the one hand, a full overview of the categories used in CREATIME gesture tagging is offered, with reference to how it relies on, and in certain aspects goes beyond, other existing ontologies for gesture annotation (see, among others, Bressem, Ladewig & Müller 2013). (2) On the other hand, our approach to co-speech gesture differs from mainstream analyses that seek to offer individual and exhaustive descriptions of body behavior using software (e.g. ELAN or motion capture) that allows for it (Duncan, Rohlfing & Loehr 2013; Pfeiffer 2013). In contrast, our use of large amounts of data calls for a less atomized approach that makes it possible to detect recurrent body behavior and significant patterns in the way speakers gesture when talking about time. At this point, we will reflect on the advantages and limitations of such an approach.
Along with this, the kind of analysis described above recently opened a door for collaboration with computer scientists at CWRU (Ray and Turchyn) in the development of tools for automatic gesture recognition. We will offer here an update of the advances in CREATIME gesture annotation fostered so far by the mentioned tool.

Pagán Cánovas, Cristóbal; Olza, Inéz; Valenzuela, Javier. Institute for Culture and Society, University of Navarra; University of Murcia. Title: Conceptual integration templates for time: converging evidence from big multimodal data. Abstract: This will be a wrap-up talk putting forward some conclusions and further working hypotheses based on the CREATIME empirical work. It will combine the exposition of the panel coordinator and a brief round-table debate with the other presenters. The picture that is emerging from the CREATIME studies indicates that time is constructed through more complex patterns than what mere conceptual transfer or direct projection can explain. A more complex mental network model is needed, in which various sets of knowledge and information come together during live performance or in the complex production and processing of poetry or film. Therefore, the patterns learned go far beyond the entrenched projections from sensorimotor experience to abstract concepts, as suggested by transfer theories of mind and brain (Feldman 2008; Lakoff 2008, 2014). Instead, our data show a complex interplay not only between cognition and cultural background (Donald 1991, 2002), but also between mental habits and the circumstances of communication or action: specific purposes, context, conditions of the communicative exchange, pragmatic functions, and so forth. The different recipes that connect and blend various mental packages for producing temporal meanings are manipulated with breathtaking speed and ease to suit complex strategies, not only in the delayed conceptualization of technology-mediated discourse, but also in oral conversation. 
Across all our data, temporal representations are swiftly adapted, playing with a variety of parameters to serve ad-hoc rhetorical goals: speed, directionality, orientation (left-to-right, right-to-left, up-down, etc.), axis (sagittal, lateral, vertical), manner of motion, distance in space, various aspects of agency, integration and separation of viewpoints, and many more. Moreover, time itself, without the intervention of other concepts, is manipulated with great agility to produce a wide variety of effects: durations can be stretched to centuries or reduced to seconds, time leaps are created by conflating moments and viewpoints through different verbal and multimodal patterns, scenes with impossible interactions between past and future elements are created, temporal relations are even suppressed to carry out certain reasoning or to express emotions, among other phenomena. A fluid model that triangulates cognition-culture-action is needed to account for the construction of temporal meaning, and for any other complex conceptual work. Based on the CREATIME empirical work, we propose a preliminary model for some major mental patterns in time-related mappings that recur across modalities (Santiago, Román, & Ouellet, 2011; Coulson & Pagán Cánovas, 2013; Pagán Cánovas, Valenzuela, & Santiago, 2015; Pagán Cánovas & Piata, forthcoming). We put forward the hypothesis that human beings interiorize habits for blending as well as for compressing and expanding conceptual materials (Fauconnier & Turner 2002; Turner 2014; Pagán Cánovas & Valenzuela Manzanares 2014; Pagán Cánovas & Turner 2016). Multimodality research shows that it is not only crucial to understand how these habits work in cognitive terms, but also how they are instantiated in situated communication and mental imagery, combining different modalities and serving various communicative strategies and goals. These habits apply across modalities and often integrate various multimodal elements. 
The templates include basic cognitive structures and cultural representations, mappings between them, and knowledge on how to build effective blending networks based on the mappings (what to fuse, what to project separately to the blend, how to run the emerging scene in the blend, and so forth). Besides, the templates also include the practical know-how about their own use in various cultural and communicative settings, for a variety of purposes in discourse, action, and reasoning. Creativity does not only operate by manipulating the mappings and integrations themselves, but also by adapting them to specific contexts and goals.

Page, Jeremy. Ph.D. Candidate, Macquarie University, Sydney. Title: ‘The Michelangelo of Flow’: Revised Methods for Visualising Rhythmic Synchronicity in Rap. Abstract: As applied to the analysis of rap music, rhythmic synchronicity refers to the rhythmic interplay between a rapper’s vocal performance and the (usually 4/4) pulse of backing instrumentation, where the vocals can either align with rhythmic pulses (synchronic) or stray between beats (asynchronic). Such analysis entails an examination of what is usually referred to as a rapper’s flow, an integral stylistic aspect of rap music, and a core element in the communication of (possible) meaning. Beginning with Krims (2000), adapted by Bradley (2009), Edwards (2009) and most recently developed by Caldwell (2010), several methods for analysing and visually representing rhythmic synchronicity (or flow) have so far been developed. However, while each approach offers its own advantages in bringing certain sonic and rhythmic qualities to the fore, each also exhibits notable disadvantages, usually being either overcomplicated in its presentation or oversimplified in its scope. As a starting point, and drawing on the important work of those mentioned, this paper first presents a revised method for analysing and visually representing rhythmic synchronicity, which both expands the scope of past approaches and simplifies visual forms. Utilising this revised methodology, the discussion then proceeds to an analysis of rhythmic synchronicity in a small sample of songs by artists Jay-Z, Roc Marciano and Tupac Shakur, highlighting the benefits (and some possible shortfalls) of the proposed approach. Since an increasingly sophisticated understanding of rhythmic synchronicity has noteworthy implications, not only for our understanding of rap but for our theories of art more generally, the paper closes with some brief remarks on what such broad implications may be.
Though there is much work remaining to be done, the revised approach presented throughout is geared in the direction of automating an accurate digital rendering of rhythmic synchronicity, by dividing a rapper’s a cappella (vocals) from backing instrumentation and tracking peaks separately, then combining such peaks to generate a dual pulse rendering. This process marks the early stages toward developing a program that could, given some input variables, automatically generate accurate graphic visualisations of any rap song fed through it. As a characteristically (and increasingly) multimodal form of expression, along with perhaps our most popular, music may be our richest art form available for analysis and exploration by both literary theorists and the cognitive sciences. Accordingly, an increasingly multimodal approach to its interpretation and analysis is likely to produce both the most fruitful and the most engaging results.
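
The envisaged pipeline (tracking peaks separately in the vocal and instrumental tracks, then combining them) can be sketched in a few lines. This is an illustrative sketch only, not the author's method. It assumes the two tracks have already been separated and reduced to amplitude envelopes sampled at the same rate, and it labels a vocal peak synchronic when it falls within a chosen tolerance of the nearest instrumental peak. The function names, the threshold and the tolerance are all hypothetical.

```python
from typing import List, Sequence, Tuple


def pick_peaks(envelope: Sequence[float], threshold: float) -> List[int]:
    """Indices of local maxima in the envelope that reach at least `threshold`."""
    return [
        i
        for i in range(1, len(envelope) - 1)
        if envelope[i] >= threshold
        and envelope[i] > envelope[i - 1]
        and envelope[i] >= envelope[i + 1]
    ]


def dual_pulse(
    vocal_peaks: Sequence[int],
    beat_peaks: Sequence[int],
    tolerance: int,
) -> List[Tuple[int, int, int, str]]:
    """Pair each vocal peak with its nearest beat peak.

    Each row is (vocal index, nearest beat index, signed offset, label), where the
    label is 'synchronic' if the offset is within `tolerance` samples and
    'asynchronic' otherwise. `beat_peaks` must not be empty.
    """
    rows = []
    for v in vocal_peaks:
        nearest = min(beat_peaks, key=lambda b: abs(b - v))
        offset = v - nearest
        label = "synchronic" if abs(offset) <= tolerance else "asynchronic"
        rows.append((v, nearest, offset, label))
    return rows
```

The resulting rows could be drawn as two aligned rows of pulses. How tolerance is set, and how envelopes are obtained from audio, are left open here.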

Paliichuk, Elina. Borys Grinchenko Kyiv University. Title: The paper focuses on semiotic peculiarities of social agenda mental pictures constructed in discourses. The research of the human trafficking situation in mass media shows an iconic structure shaping audiences’ worldview in a specific way. Refraction of the social agenda through the cobweb picture gives rise to further study in terms of iconic framing. In particular, the interest lies in relevant semiotic representations across multimodal discourses and with regard to other social events. The value of such an approach is predetermined by the possibility of providing linguistic and extra-linguistic signs in visuals for raising effectiveness of social awareness and prevention campaigns through iconic structuring of societal realities. The contemporary world is increasingly living in mediated realities by consuming pre-constructed messages about the hottest issues: national identity, multiculturalism, tolerance, migration, human trafficking, threats to national security, religion, ecology, information technologies or other agenda. The question of how these ready-made media messages shape societies’ world-images has become the latest concern not only of linguistic thought but also of transdisciplinary research (see Blanpain 2005; Bochel and Daly 2014 and others). To this end, the human trafficking situation highlighted in contemporary media discourse has previously been investigated, revealing a cobweb-like conceptual model. Representing the social agenda through the lenses of a natural phenomenon is nothing short of a metaphorical structuring of the objective realities via contemporary media. However, not only imagery but also logic is activated via iconic shapes of a thought related to trade in human beings.
With an eye to reconstructing the human trafficking mental picture, frame-modelling techniques (Fillmore 1985; Langacker 1987; Zhabotynska 2010) have been applied to the corpus of linguistic evidence. In particular, the following basic actants have been distinguished: victim, family, traffickers, clients, police, governments, and international organizations. Each has been analyzed conceptually through a combination of frames for their static and dynamic predicates and semantic roles. Taken together, the frames show the victim being focalized at the expense of the other actants’ direct relations thereto, thus shaping centripetal visual lines. At the same time, the intertwining networks show the actants dealing with each other over how to benefit from the trafficking activity, which is represented by circular lines within the mental structure (Paliichuk 2011). Framed as a cobweb mental picture, human trafficking is represented both metaphorically and iconically with the help of verbal units seen as linguistic pencils for drawing ideas. This way of thinking invites further semiotic research into other social agendas from both linguistic and multimodal perspectives.

Pearlman, Karen. Macquarie University, Sydney. Title: Cutting Rhythms: Editing and Cognition Beyond Continuity. Abstract: Film editing is the focus of some scholarship in cognitive studies of the moving image, with the most commonly studied form of editing being continuity cutting. Heimann et al. (2016) propose that 'Research trying to differentiate physiological responses across different kinds of cuts or edits is … rare.' Cognitive study of the editors’ processes in creating editing that transcends simple preservation of continuity and becomes what Scherer et al. call ‘cinematic expressive movement’ (2014) is rarer still. This article argues that there are considerations beyond continuity that dominate editors’ conscious and non-conscious creative processes. It further argues that the edits arising from these more complex considerations have a significant impact on audience narrative comprehension and emotional alignment with characters in film. This argument builds on analyses of rhythm in film editing by Karen Pearlman in Cutting Rhythms, which describes the work of editors as cognitively complex artistry of “time, energy and movement shaped by timing, pacing and trajectory phrasing for the purposes of creating cycles of tension and release” (Pearlman 2009, 2015). This paper begins by looking at two influential theories in cognitive studies of the moving image and asking if they could be used as explanatory frameworks for aspects of film editors’ creative decision making. Film editors tend to say that they make their decisions by what ‘feels right’ (see Oldham, 1992). This article proposes that one source of something feeling ‘right’ to an editor working with continuity editing may be the smooth transference of attention as demonstrated by Tim J. Smith in Attentional Theory of Cinematic Continuity (2012).
Another source of rightness of feeling may be the depth and appropriateness of what Vittorio Gallese and Michele Guerra (2011, 2012, 2014) have identified as an ‘embodied simulation response’. Building on the premise that these two theories can explain some, but not all, aspects of film editors’ creative decision making, this article proposes refinements and further questions that cognitive studies of the moving image could pursue. These further questions concern edits that do not conform precisely to the principles of maximum attentional efficiency and instead are generated by editors’ feeling for rhythmic phrases of movement, for character alignment and for cycles of tension and release. It proposes that there are studies that could be done by cognitivist scholars of the moving image that could shed light on editors’ creative process and artistry and the audience experiences these provoke. These studies would have to look at editing decisions that create subtly unique expressive forms in narrative films by deploying artistry of a higher order than the relatively straightforward rules of continuity cutting.

Pereira de Souza, Aline; Arruda-Doná, Beatriz. UNESP (State University of São Paulo). Title: Metaphor and metonymy providing text comprehension. Abstract: This work is part of research we have been developing on metonymy and metaphor as cognitive processes that are not confined to the literary environment but are part of our daily routine and of all genres of discourse, including those classified as ‘non-literary’. In this sense, Turner (1996, p.07), in his book titled “The literary mind”, says about the prime value of the figures: “If we want to study the everyday mind, we can begin by turning to the literary mind exactly because the everyday mind is essentially literary.” Taking his idea into consideration, the hypothesis here is that understanding metaphor and metonymy as cognitive and analogical processes can considerably help students to develop the ability to understand and interpret texts. The theoretical framework is based on the studies of Conceptual Integration Theory as proposed by Fauconnier and Turner (2002) and Turner (2014) with the Blending Theory and Mental Spaces, and also on the theory of Analogy, as proposed by Hofstadter and Sander (2013). The examples of analysis we bring here are part of a corpus we work with in our everyday classes as teachers of “Reading and Comprehension” at a secondary school in Brazil. As we deal with several types of texts, such as newspaper articles and editorials, readers’ letters, comments on blogs, memes on Facebook and other network texts, it was necessary to limit the variety and quantity of these texts for this presentation in order to guarantee well-executed qualitative analyses. Accordingly, a comment on a blog and a Facebook meme were chosen to explain how they are attractive and persuasive because of the use of these figures, and how the figures play an important role in the argumentative function of these texts.

Pfaender, Stefan. Chair of Romance Linguistics, University of Freiburg. Title: Synchronization in Embodied Interaction: Collaborative Utterances as Multimodal Constructions? Abstract: Inspired by the growing body of research on synchronization in embodied interaction (Delaherche et al. 2012, Kim 2015, Schoonjans et al. 2016) and multimodal constructions (Steen & Turner 2013, Zima 2014, Mittelberg 2017), this paper examines how participants align both emergent linguistic constructions and emergent gestural patterns in dyadic real time interaction (Auer 2015). The data for this talk stem from the TACO(=TAlking COuples)-Corpus, containing to date 86 multimodally transcribed interactions; these data have been compiled within our recently started interdisciplinary research project “Synchronisation in Embodied Interaction”, funded by the Marie Skłodowska-Curie EU-Programme. The setting involves couples seated in familiar, naturalistic spaces – on a sofa in their living room, on chairs in their kitchen, or on a bench in a changing room, etc. While this setting is fairly naturalistic, the seating arrangement does resemble an experimental set-up because the participants are sitting very close to each other and both face the camera, so that they have to actively turn to the side if they want to look at each other. This set-up allows for comparative analyses of the collaboratively told stories in French, Spanish, Italian, German and Russian. As is to be expected for collaborative storytelling, the corpus contains a large number of collaboratively built utterances, or co-constructions (Günthner 2015). A closer look at the data reveals that these collaborative utterances are not limited to the interactants’ verbal behavior. Rather, participants also heavily synchronize their bodily behavior, particularly hand gestures, head movements and (upper) body sway, anticipating the syntactic and/or prosodic co-construction of an emergent utterance.
Within the framework of Interactional Linguistics (Selting & Couper-Kuhlen 2001), we thus want to analyse the TACO-Corpus in order to delve deeper into both empirical and theoretical considerations of the display of togetherness (Schumann 2017) via both verbal and bodily synchronization in jointly produced utterances. Combining syntactic and motion analysis, the talk will aim at providing answers to the following questions: 1. How do participants synchronize their multimodal resources during their sense-making on the fly (= self-synchronisation)? 2. How do participants invite alignment, especially when taking deontic or epistemological stances, and how do participants react to invitations, negotiating the emerging stance (= other-synchronisation)? 3. While achieving self- and other-synchrony in talk-in-interaction, do the participants make use of ad hoc strategies and/or sedimented multimodal constructions?

Piata, Anna; Soriano, Cristina. University of Geneva and University of Neuchâtel; Swiss Center for Affective Sciences, University of Geneva. Title: Spatial representations of time and emotion: Evidence on mental imagery from the PoetiCog literary database. Abstract: Time conceptualization has long been viewed as being metaphorically structured in spatial terms across languages (see, e.g., Clark 1973; Lakoff & Johnson 1980, 1999; Radden 2006) in the form of two main conceptual patterns: the Time-moving metaphor and the Ego-moving metaphor, each one assigning motion to a different agent (time and the experiencer of time, respectively) to construe the passage of time. However, a question arises as to what factors may influence which pattern speakers prefer to use in a given communicative event. In other words, in which context would one say Christmas is approaching and when instead We are approaching Christmas? Psycholinguistic research has recently suggested that affect is one of the contextual factors determining which metaphorical construal of time speakers opt for; positive events seem to be associated with the Ego-moving construal, while negative events are more likely to be represented as moving toward the experiencer (Margolies & Crawford 2008), a pattern that varies depending on whether events belong to the past or the future (Lee & Ji 2013). Similarly, psycholinguistic experiments have also observed a correlation between particular emotions and certain temporal representations; happiness tends to correlate with Ego-moving construals of time while anxiety and depression with Time-moving ones (Richmond et al. 2012). The proposed explanation for this time-affect association is that positive events foster agency and volition that tallies with the Ego-moving metaphor of time, whereas negative events are more conducive to passivity that can be better expressed through the Time-moving metaphor. 
There is no doubt that such evidence sheds new light on the metaphorical conceptualization of time through spatial imagery, by adding a new player to the scene: affect. However, to the best of our knowledge, thus far there is no relevant linguistic study of the alleged time-affect association. This paper aims to address this issue from the vantage point of language use and in particular from a discourse domain that favors the emergence of expressive and affective meanings: poetry. Our data come from PoetiCog, a corpus of poetic texts (19th-20th century) in three languages (English, Modern Greek and Spanish) stored in and accessed through SketchEngine. For the purposes of our study, we focus on English; we perform searches using lexemes from the semantic field of time (e.g., hour, moment, past…) and then annotate the corpus data that include a metaphorical motional pattern in the event representation. Our annotation categories include both motional features (moving entity, speed, directionality, etc.) and affective factors manifested in the immediate linguistic context of the temporal expression (positive or negative valence, emoter, emotion, etc.). This corpus-based line of research aims to empirically test the time-affect association assumed in the psycholinguistic literature. At the same time, it aspires to broaden the scope of the possible explanations behind this association by taking into account discursive factors (rather than solely psychological ones) that may play a role, such as generic genre expectations and particular stylistic choices and rhetorical goals.

Pleshakova, Anna. School of Interdisciplinary Area Studies, University of Oxford. Title: Viewpoint Construction in Russian Media: "A Right to Voice One's Opinion". Abstract: We witness daily how the ideas of national unity, nationalism and patriotism promoted by persuasive and manipulative TV news messages often generate hatred and animosity at both national and international levels. There is a pressing need to understand how such messages are conveyed to audiences, and how and why they are likely to be understood by them. To satisfy this need, by working in collaboration with Prof. Francis Steen (UCLA, US) and Prof. Mark Turner (CWRU, US) (since January 2016), the PI has been conducting a pilot research project on multimodal analysis of Russian news, in which she created and analysed a pilot dataset of TVC programs. Using Red Hen Lab (www.redhen.org) as a platform, the ongoing project combines cognitive (linguistic) analysis - with the viewpoint blending analysis and blended joint attention (Fauconnier & Turner 2002, Dancygier & Sweetser 2012, Turner 2014, Pleshakova 2016) constituting the core method, media analysis and computational analysis to understand how the anchor in a news discussion program (“A Right to Voice One’s Opinion”) establishes a viewpoint which is subtly imposed on the viewer as authoritative. TVC’s “A Right to Voice One’s Opinion” is a Russian TV program, which focuses on the main international news of the day. Conducting the multimodal study of “A Right to Voice One’s Opinion” tests the proposed interdisciplinary approach on a smaller amount of data. The study focuses on the functioning of metaphor, which is working jointly with such productive multimodal viewpoint construction operations as: irony, parody, counterfactual, and expository questions. The study brings more immediate results in the form of answers to the following research questions: 1) What makes the viewer regard the anchor’s viewpoint as authoritative? 
2) Are there markers that trigger and constrain the construction of viewpoint blending relying on metaphoric operations? 3) What do the findings tell us about the efficiency, limitations and possible development of the research methods and computer-based tools in their application to multimodal viewpoint construction analysis of TV news? 4) What does the analysis of an anchor’s performance as a viewpoint blend tell us about the construction of Russian media discourses? Ultimately, the paper presents the preliminary findings of this proof-of-concept study to show how persuasive and manipulative communication strategies and techniques can be analysed from an interdisciplinary perspective through the creation and multimodal (cognition-based) analysis of a large media dataset, using (open access) computer-based tools.

Pollaroli, Chiara. Institute of Argumentation, Linguistics and Semiotics (IALS), Università della Svizzera italiana, Lugano. Title: An integrated method for the analysis of multimodal tropes and their argumentative-rhetorical function. Abstract: The persuasiveness multimodal tropes provoke is not only connected to the goals of appeal and recall, but, more importantly, to the argumentative-rhetorical inferential work they invite the audience to operate (Kjeldsen 2012; Pollaroli and Rocci 2015; Rocci, Mazzali-Lurati, and Pollaroli forthcoming). Multimodal metaphors in product advertising, for example, are blended absurd scenarios (Fauconnier and Turner 1998, 2002) which invite the audience to infer a property of the product (usually the target of the metaphor (Forceville 1996)) by exploiting familiar frames (Fillmore 2006 [1982]) which work as source domains. The inferential process stemming from the blended scenario is controlled by a generic space (Fauconnier and Turner 1998, 2002). Multimodal metonymies may manifest similar blended scenarios, but their inferential process operates differently: two (or more) entities of a complex scenario are highlighted in order to make the audience infer the causal chain composing the complex scenario itself (Fauconnier and Turner 1999; Rocci, Mazzali-Lurati, and Pollaroli forthcoming). This cognitive work has an argumentative-rhetorical function because it makes the audience infer an argument in support of a claim; in particular, the generic space works as the contextual premise of a rhetorical syllogism. Multimodal metaphors usually correspond to arguments by analogy, whereas multimodal metonymies usually correspond to causal argumentation (Kjeldsen 2012; Pollaroli and Rocci 2015; Rocci, Mazzali-Lurati, and Pollaroli forthcoming). How can this cognitive inferential work be made explicit? How can the rhetorical syllogisms activated by multimodal tropes be reconstructed and their persuasive effect described?
A five-step method which integrates insights from semiotics, cognitive linguistics, multimodal discourse analysis, argumentation theory, and rhetoric (Bateman 2008; Phillips and McQuarrie 2004; Forceville 1996, 2008; Schilperoord 2014; Fauconnier and Turner 1998, 1999, 2002; Rigotti and Greco Morasso 2010) is proposed for the analysis of multimodal tropes and their argumentative-rhetorical value (Figure 1). Step 1 and step 2 are preliminary levels (cf. Bateman 2008) which enable the analysts to describe the multimodal piece of communication employing a multimodal trope. Step 3 stems from a combination of findings by Phillips and McQuarrie (2004), Schilperoord (2014), Forceville (1996, 2008), and insights from semiotics; here, after having identified the two or more entities of the trope (e.g., target and source of a multimodal metaphor), the possible visual structure and the contribution of textual information can be detected. Step 4 aims at understanding in more depth the conceptual integration network composing the trope by identifying the generic abstract structure governing it (Fauconnier and Turner 1998, 2002). The analysis of the multimodal structure and of the conceptual integration network is completed by the argumentative reconstruction of the enthymematic structure of the multimodal piece of communication (argumentative messages are usually constructed as enthymemes, i.e. with missing premises retrievable from the common ground (Rigotti and Greco Morasso 2010)). The method is illustrated by means of the analysis of the multimodal tropes employed in the campaigns by Helvetas (https://www.helvetas.org/), a Swiss international network of organizations committed to improving the living conditions of disadvantaged people in Africa, Asia, America, and Eastern Europe.

Rekittke, Linn-Marlen. RWTH Aachen University. Title: Multimodal expressions of discontent: The role of interactive gestures in negotiating agreement during travel planning. Abstract: Besides the use of rising intonation and pauses (Clark & Wilkes-Gibbs 1986) and verbal sociocentric sequences marking common ground (e.g., you know; Bernstein 1962), Bavelas and colleagues (1992, 1995) have identified a set of gestural means that serve to coordinate conversations and to integrate dialogue partners, thus driving and orchestrating the development of an ongoing dialogue. The following four main categories of such interactive gestures have been established (Bavelas 1995): (1) delivering information; (2) citing the discourse partner; (3) word seeking; and (4) turn-taking. This paper aims to contribute to the research on interactive functions of co-speech gestures by focusing on stancetaking strategies. Stancetaking is a broad concept (e.g., Englebretson 2007) investigated from different perspectives including corpus linguistics (e.g., Biber & Finegan 1989), discourse analysis (e.g., DuBois 2007) as well as multimodal interaction research (e.g., Debras 2013; Schoonjans 2014). One fundamental property of everyday conversations is the speaker’s alignment and positioning with respect to her discourse partner(s) and the discourse contents (e.g., DuBois 2007). Drawing on DuBois’s (2007) intersubjective account of stancetaking as well as previous work on interactive functions of co-speech gestures (Bavelas et al. 1995), this study examines a collaborative task of planning an Interrail tour. This task type was chosen because it requires discussing and negotiating different interests and habits. The state of the participants’ geographic knowledge also has a strong influence on the dynamics of the coordination processes involved.
Here the multimodal acts of (dis)alignment and (dis)agreement with the conversational partner and the related evaluative processes of both discourse partner and contents are of primary interest. German dialogue data from the MuSKA (Multimodal Speech and Kinesic Action) Corpus compiled in the Natural Media Lab served as the basis of analysis. Dyads were asked to jointly plan an Interrail tour through Europe and were recorded with audio, video and motion-capture technologies (Brenger & Mittelberg 2015). In line with microanalysis of face-to-face dialogue (Bavelas et al. 2016), we take a functional and multimodal perspective on interaction. The multimodal data were interpreted through careful observation and transcription of audible and visible responses (Schegloff 1998). In particular, our analysis highlights the interplay and dynamics of affective and epistemic stance markers of these highly contextualized, multimodal utterances. We discuss in detail a sequence in which a discrepancy in geographical knowledge between the dialogue partners leads to multimodal expressions of discontent. Whereas participant A exhibits better geographical knowledge and demonstrates power over participant B, the latter defends herself and replies with another multimodally expressed objection. Our analysis revealed not only “downtoning” (Waltereit 2006) modal particles in German such as ja and halt and the accompanying head gestures (Schoonjans 2014) and shrugs involving several body parts (e.g. Debras 2013; Streeck 2009), but also interactive gestural indices (Mittelberg & Waugh 2014) that ask for clarification or fend off the interlocutor. With this frame-by-frame analysis of a dialogue on travel planning we would like to contribute to the broader theoretical framework of interactive functions of co-speech gestures and their affective and epistemic motivation.

Rocamora Abellán, Rafael. School of Tourism, University of Murcia. Title: Multimodal metaphors in the promotion of intangible products. Abstract: The present paper is an attempt to apply the multimodal metaphor analyses of Cognitive Linguistics, one of the most multidisciplinary and current branches of linguistics, to the study of the advertising of intangible products. This framework has already been applied to the study of advertisements from various fields of the market and has proved to be effective and appropriate for applied research in virtually any subject (Turner and Fauconnier 2008; Forceville 2007, 2009, 2012; Forceville and Urios-Aparisi 2009; Pérez-Sobrino 2013, 2016, etc.). Advertising, on the other hand, is a tremendously dynamic field and offers some very particular communicative conditions that make it especially attractive as an object of study. For this paper, different examples of services, not goods, that pose difficulties for advertising creative departments have been chosen, compared and contrasted to establish their similarities and differences as far as a multimodal metaphor analysis is concerned. Examples taken from different sources (press, brochures, websites, magazines, etc.) and from different fields (tourism, banking, insurance, etc.), with their special characteristics, prove to be a really suitable field of evidence to develop some of the techniques of analysis of applied Cognitive Linguistics. Once a number of advertising examples for intangible products have been analyzed, we will try to answer the two research questions we begin with: first, is it the combination of different modes that “creates” the metaphor? Or, second, is the metaphor activated by means of only one of the modes, upon which the others depend? Authors skilled in multimodal metaphor find a rich source of examples in advertising environments.
In fact, metaphor and advertising coexist naturally since the internal logic of metaphor as a cognitive operation fits perfectly with the conventions of advertising as a genre: both are based on the ascription of the attributes of one particular domain to another (in the case of metaphor, of the source domain to the target; in the case of advertising, of the desired values to the advertised product). In order to achieve a classification of the multimodal patterns used in these fields of advertising, the analytical mechanisms of multimodal metaphor and the theory of conceptual integration will be used to analyze what kind of associations are established between different modalities (image, sound, text, etc.) and how these projections between domains give rise to phenomena of conceptualization based on processes of induced categorization as well as the emergence of meanings useful for the advertising process.

Sadowska-Dobrowolska, Katarzyna. Maria Curie-Skłodowska University. Title: Multimodality in the translation of humour in comics. Abstract: The paper reflects on the translation of comic books, defined by Farid as a literary genre that communicates a narrative message, on the one hand, through the medium of the image and, on the other hand, through the medium of the text (Farid 1989: 11). When translating a comic book, the translator must take into account its multimodal nature, which unites the verbal and iconic codes to create a complete message. In addition, the generic nature of comics creates other difficulties to overcome in translation: the linguistic elements (especially wordplays) are subordinate to the drawings, with which they form an inseparable whole. Thus the meaning of a comic book goes beyond the text itself and requires special translation strategies and techniques. All this constitutes a real challenge for the translator who, during his work, must focus not only on a good transmission of the original message but also on maintaining the play of meaning between the discourse and the image. A translator must also take care that the iconic and linguistic messages are just as consistent in the target text as in the original. Our focus will be primarily on humor and wordplays in comics and their cultural motivation (we treat translation as a double act of intercultural communication that encompasses cultural as well as linguistic elements). The cultural value of wordplays concerns, on the one hand, the semantic structure of a word (e.g. connotations) and, on the other, the cultural context that motivates their composition, internal structure and meaning. At the lexical level, wordplays constitute bundles of semantic features that come from the meaning of the words constructing these games and which relate to customs, beliefs, stereotypes, etc. typical of the users of a language (and which are actualised in texts).
From the point of view of translation, it is important that the lexicon of a language presents specific semantic values that are frequently lacking in other languages. All these problems are really important in translation and will be the object of our reflection and analysis. How can wordplays be translated, and are they translatable at all? Which translation strategies can overcome lexical or cultural asymmetry? What solutions make it possible to restore the wordplays of the original? In seeking an answer, we will present an interpretative analysis of French comics. It will aim at a comparison of the proposals made in the Polish and English versions and a reflection on the semantic effect they introduce into the text. We will examine the strategies chosen by the translators and their influence on the lexical and stylistic structure of the target wordplay.

Schönherr, Beatrix. Institut für Germanistik, Universität Innsbruck. Title: Some Functions of Prosody and Gesture in Conversations on Stage. Abstract: Research addressing nonverbal communication and research upholding the paradigm of multimodality have both proven that nonverbal behaviour plays a crucial role in face-to-face interaction and relates closely to the verbal message. These findings hold true not only for the pragmatic aspects of conversation, but also for the most central dimensions of speech, namely syntax and semantics (cf. Schönherr 1997, Fricke 2012). The perception of syntactic structures and semantic relationships in spoken language is supported by nonverbal cues, particularly certain kinds of gestures, which in relation to syntax and semantics are quite similar to certain prosodic features and often coincide with prosodic cues. Co-occurrences of speech and gesture are thereby stable enough to be observed when an utterance is not produced spontaneously, but recited, as is the case with conversations on stage. In this talk, theatre data are considered to be a special kind of natural data. From a cognitive point of view, since the act of memorising and reciting language differs starkly from spontaneous utterance, the multimodal analysis of scenes on stage can provide new insights into the relationships among speech, prosody and gesture. In this talk, scenes from classical German plays are analysed following the approach of interactional linguistics (cf. Selting/Couper-Kuhlen 2001). It is shown how prosody and gesture structure speech and foreground important information as well as elucidate reformulations regarding their semantic and pragmatic implications. In this respect, prosody and gesture on stage do not differ from natural conversations.
This result seems not to be very surprising, but if you keep in mind the entirely different conditions in which prosody and gesture on stage are produced, then the similarity between exchanges on stage and natural conversations cannot be regarded as a matter of course.

Schwering, Nathalie. Johannes Gutenberg-Universität Mainz. Title: Reading as Interaction: A Model for Shared and Varied Reader Reactions. Abstract: Reading is a fundamental human activity, even when there are no texts around. Every day, we engage in the constant and automatic activity of reading (and judging) each other. We are so used to passing judgment that sometimes we do it without being aware of it, and even if we are, we do not always question our judgments, or indeed where they came from. Recent findings in moral psychology have shown that our evolved mechanisms of moral judgment are quite fragile in the sense that they are very easy to interfere with. Literary authors can use a variety of narrative strategies (both consciously and unconsciously) to trigger, bias, and even reverse readers’ moral judgments while readers interact with texts. These strategies include emotional priming, triggering moral emotions, activating readers’ affective and/or cognitive biases, narrative deceptions like unreliability and metafictional twists, and, potentially most effective, changing or distorting the moral categories we employ to make moral judgments. Our mechanisms of moral judgment are at play both within the textual universe and in our interactions with texts. My model of reading as interaction uses categories of moral judgment to explain both shared and varied reader reactions to the same text. The text transmits the author’s multimodal communications to the readers, who in turn not only process the information, but react to it by drawing on their own multimodal communication skills. What is more, readers interact with narrative manipulations by employing real-life psychological mechanisms (again, both consciously and unconsciously) of deception detection, source tagging, withdrawing trust, and punishment behaviour. Crucially, in some cases, these behaviours are not only directed at the text, but also at the author.
Understanding the biological roots of narrative manipulation not only helps to explain both shared and varied reader responses to literary texts, but it can also help us improve our deception detection skills and question our initial reactions (and those of others). Conversely, applying our thusly “improved” reading skills to non-literary texts such as news and social media can help us recognise emotional priming, intentional biasing, the triggering of dangerous moral emotions like disgust, impending ingroup/outgroup behaviour, and the skewing of our own moral judgments – critical skills we sorely need in the age of Brexit, Trump, and fake news.

Senkbeil, Karsten. Department for Intercultural Communication University of Hildesheim. Title: Multimodality in intercultural communication: a cognitive-pragmatic approach to intercultural transfers of multimodal cultural texts. Abstract: Many studies on authentic intercultural communication have revealed that miscommunication and misunderstanding among speakers from different native languages and cultural backgrounds occur less often than generally assumed in culture-contrastive studies of language (Bührig & ten Thije 2006, Kecskes 2014). This can partly be explained by factors centrally discussed in the relatively young linguistic sub-discipline Intercultural Pragmatics, such as the interlocutors’ heightened degree of awareness about linguistic differences and commonalities in authentic intercultural encounters, and a more conscious cooperation and co-construction process, to name only a few (Kecskes 2014, 19). Moreover, we can observe that authentic intercultural conversations feature – and in fact appear to benefit from – more bodily expressions and gestures than intracultural communication, a fact that can be explained by the cognitive roots of figurative language as the basis of many (though not all) gestures, such as the embodied mind (Lakoff & Johnson 2010) and image schemas (Hampe & Grady 2005). Meanwhile, internationally successful movies and television series, i.e. multimodal texts that cross cultural boundaries, have a long tradition of letting images and actors’ bodies “speak for themselves”, benefiting from culture- and language-independent patterns of multimodality to communicate complex stories to culturally and linguistically diverse audiences around the world. 
While the transcultural aesthetics of remarkable films has of course been recognized in film studies, cultural studies, and intercultural transfer studies, the necessary bridge to linguistics and, particularly, the cognitive linguistic roots of multimodal metaphors, image schemas, and embodiment has rarely been built. I argue that a combination of a cognitive and a pragmatic linguistic approach to multimodal communication sharpens our view of what happens when filmmakers and other text producers communicate across cultures, shedding light on why, for example, some cinematic macrometaphors seem to work worldwide, while others do not (and usually restrict the reach of that particular text to its ‘home culture’). I will draw on results from example studies on interculturally successful film (Senkbeil 2017), advertising, and literature (Senkbeil 2015, 2017) to argue that the communication of compositionally complex meaning (multimodal, figurative, etc.) in intercultural contexts is often based on a “hidden common ground” across cultures, for which a cognitive-pragmatic approach may provide viable inroads.

Shukla, Abhinav; Fernandez, Carlos. IIIT Hyderabad, CCExtractor. Title: CCExtractor - Towards extracting all international formats of caption text and building a holistic subtitle processing tool. Abstract: Caption text is a very informative and useful modality in media analysis. Subtitles act not only as an accessibility aid for people with auditory impairment, but also as abundant sources of raw information which can be fed to natural language processing systems in order to gain valuable insights into human communication. CCExtractor is a tool which can extract subtitles from videos, with support for a wide variety of subtitle standards prevalent around the world. It is a core part of Red Hen’s data processing pipeline and was used to extract all the 2.5 billion words of caption text in the NewsScape dataset. Current CCExtractor features (or under active development) that we will talk about: A map of the world and the regions whose subtitle formats are supported, and what we plan to support in the near future. A description and demonstration of the DVB support that we have implemented to perfection, allowing modern European TV subtitles to be flawlessly extracted. Support for burned-in subtitle extraction using video processing and OCR. Support for continuous ticker tape style captions such as those present in news channels and prices of stocks etc. Real time processing of multiple incoming live TV feeds to build a worldwide repository of captions. Future efforts will be centered around the following: Automatic detection of ‘interesting’ parts of videos and annotating them in transcripts, such as laughter/jokes, events in sports videos such as goals, and other segments of interest. Subtitle synchronization between two different versions of the same footage using audio fingerprinting (e.g. 
to sync subtitles in a video after cropping commercials out of it). Developing a Python library to interface with other popular data science tools commonly used for multimodal processing. Supporting caption extraction for any new capture stations that Red Hen sets up anywhere in the world, if not already supported. Aside from this, since we aim to develop ‘the’ holistic tool which is able to do anything that involves subtitles, we will be happy to hear any suggestions for use cases or things that we can work on. If it has anything to do with subtitles, we want to be involved.

Stave, Matthew; Pederson, Eric. University of Oregon. Title: When does a nod mean “Yes”? The timing of back-channel head gestures during narration. Abstract: Head nodding has been mentioned from the point of view of kinesics (Birdwhistell 1970) and as a backchannel behavior (Schegloff 1982, Duncan 1975, Goodwin 1986). In the last thirty years, there has been more in-depth exploration of head nod behavior: Iwano et al. (1996) explore the function of different head motions, while Maynard (1986, 1987) and Kita (1998, 2007) look at the integration of head nods with verbal back-channel behavior. Kogure (2007) also looks at the use of nods with other non-verbal behavior such as gaze coordination. However, none of these previous studies precisely examines the temporal relationship between listener head nods and other speaker and listener behaviors. What literature there is on interactional timing generally focuses on the verbal aspects of turn-taking rather than head gestures, e.g., Fusaroli & Tylen (2016) and Stivers et al. (2009). There are exceptions: Ishi et al. (2014) examine the synchronization of speaker head nods in Japanese speech and Louwerse et al. (2012) examine the co-occurrence of head nods for both speaker and listener, but such studies have not specifically examined the relationship between head nods and other behaviors. This paper reports on data collected from 40 narratives in English taken from dyad conversations using video (for precise spatial descriptions) and Kinect recording (for 3-D motion tracking). All head movements, changes in gaze direction, speech/vocalizations, and manual gestures were coded for speakers and listeners. We compared each listener behavior with each temporally proximate (within -1500ms to +1500ms) speaker behavior. Listener nods are extremely responsive to the speaker’s gaze shift towards them (nearly 50% of such gaze shifts elicit listener head behavior).
This visual modality result parallels findings about listener speech back-channels, which tend to be timed precisely to follow immediately after the offset of speaker speech. Long nods (nods or chains of nods in the upper tertile of duration) follow speaker speech (45%), head gesture (63%), and gaze toward the listener (69%). Short nods and other head behaviors are generally less responsive to the close of speaker behavior. This suggests that long (mostly multiple) nods may be largely in response to what the speaker is communicating (more ‘yes’-like), while other listener head behaviors function more along the lines of ‘continuer’ back-channels. Current work is investigating precisely to what extent the function of a head nod aligns with the form, as opposed to the timing, of the nod. For example, can we determine that a listener head nod following speaker’s speech offset is typically an affirmation of content whereas a nod during speaker’s speech is an indication of attention or other interactive information? The former is presumably a more consciously guided activity and depends on semantic processing of the speaker’s speech. The latter is presumably a largely automatic response and can be more closely linked to speaker non-verbal behavior. This work represents a first foray into a quantitative approach to the temporal and morphological structure of both verbal and nonverbal listener and speaker interaction.
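The temporal-proximity comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the `Event` record, the event labels, and the choice to measure the lag from speaker offset to listener onset are all assumptions made for illustration; only the ±1500 ms window comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One coded behavior (hypothetical record layout)."""
    actor: str      # "speaker" or "listener"
    kind: str       # e.g. "nod", "speech", "gaze_to_listener"
    onset_ms: int
    offset_ms: int


WINDOW_MS = 1500  # window stated in the abstract: -1500 ms to +1500 ms


def proximate_pairs(listener_events, speaker_events, window_ms=WINDOW_MS):
    """Pair each listener event with every speaker event whose offset lies
    within +/- window_ms of the listener event's onset.

    Assumption: lag = listener onset - speaker offset, so a positive lag
    means the listener behavior began after the speaker behavior ended.
    """
    pairs = []
    for le in listener_events:
        for se in speaker_events:
            lag = le.onset_ms - se.offset_ms
            if -window_ms <= lag <= window_ms:
                pairs.append((le, se, lag))
    return pairs


# Hypothetical toy data, not taken from the study.
speaker = [
    Event("speaker", "speech", 0, 2000),
    Event("speaker", "gaze_to_listener", 1800, 2600),
    Event("speaker", "speech", 6000, 9000),
]
listener = [Event("listener", "nod", 2300, 2900)]

pairs = proximate_pairs(listener, speaker)
for le, se, lag in pairs:
    print(f"{le.kind} vs {se.kind}: lag {lag} ms")
```

With real annotations, the lags collected this way could be binned per speaker behavior type to see which behaviors listener nods tend to follow.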

Steen, Francis. Professor, Department of Communication, UCLA. Title: On establishing an integrated research agenda and workflow for multimodal communication. Abstract: Human communication is a core research area, intersecting with every other human endeavor. An improved understanding of the actual and effective complexities of human communication will have consequences for a broad range of fields, from politics and religion to education and business. Historically, the study of human communication has long been recognized as a central discipline, reaching back to antiquity when the study of rhetoric was the core element in education. Theoretical advances in the understanding of human communication date back millennia, to the early grammarians. The focus was mostly on written language, since the data record lent itself to systematic study. Human communication, however, has always been multimodal, and modern communication technologies have been developed to allow the full visual and auditory channels of face-to-face communication to be broadcast globally. These broadcasts can in turn be electronically captured and stored, making vast datasets of real world multimodal communication for the first time available for systematic scientific study. These new datasets present a radical new challenge: to develop a new and integrative model of the full complexity of human communication, building on existing advances in linguistics. To advance research into human multimodal communication and its role in human endeavors, Mark Turner and I founded The Distributed Little Red Hen Lab. Red Hen is designed to function as a global research commons, a practical and theoretical platform for research into multimodal communication. 
It provides core datasets, maintains a wide and rapidly growing network of researchers, develops an expanding suite of new computational tools, and facilitates the exchange of skills and the identification of the complementary forms of expertise required to make rapid progress. It aims to create an efficacious multilevel integrated research infrastructure and workflow along the following lines.

Suchan, Jakob; Bhatt, Mehul. Human-Centred Cognitive Assistance Lab., University of Bremen. Title: Obsessed by Symmetry: A Multi-Modal Study on the Perception and Interpretation of Symmetry in the Films of Wes Anderson. Abstract: We focus on the case of “symmetry” in the visual and cinematographic structure of the moving image, and present a multi-level model of interpreting symmetric patterns therefrom. Whereas our computational model is general, we particularly focus (in this presentation) on the works of Wes Anderson as one case-study for its compelling, vivid, and holistic use of symmetry in a range of aspects such as character placement, editing, and even on-screen character interactions at the level of the screenplay. Our model provides the foundation for integrating scene analysis with the analysis of its visuo-spatial perception based on eye-tracking data. This is achieved by the integration of: computational semantic interpretation of the scene, involving scene objects (people, objects in the scene) and cinematographic aids (camera movement, shot types, cuts and scene structure); and perceptual artefacts (fixations, saccades, scan-path, areas of attention). The model is compared with an independent large-scale empirical human study where subjects qualitatively rate their perception of the levels of symmetry for a subset of randomly sampled images (going beyond the domain of film) from a large pool. Visuo-Cinematographic Symmetry - Patterns in Space-Time: Symmetry is represented within our computational framework as a multi-level model allowing formal analysis at different layers of abstraction; we look at symmetry on three levels: Scene Level: Symmetry in the editing of a scene is defined by symmetric use of cinematographic aids, e.g. intercutting between characters, symmetric camera movements; Object Level: Symmetry on the object level is defined based on the placement of objects and people in the frame. E.g.
placing characters in the symmetry axis of the frame; Image Level: Image level symmetry is defined based on low-level features that support a symmetry axis, i.e., contrast edges, textures, etc. On this level symmetry can occur in multiple places and objects can be symmetric within themselves. Our multi-level model of symmetry provides a comprehensive characterisation of the symmetric structure of a scene, connecting high-level conceptual categories to low-level visual features in the image. Space-Time Symmetry and Visual Perception: Eye-Movement Patterns and Gaze Transitions: The perceptual reception of symmetry is studied by analysing eye-movement behaviour of spectators, and correlating them to the multi-level symmetry model of the scene. Perceptual data encompasses individual eye-movements, and aggregated gaze data of multiple spectators. We investigate the perception of symmetry in the context of an eye-tracking dataset. Our experiment focusses on analysing the relationship between spatio-temporal symmetry and resulting gaze transitions vis-a-vis low-level features and corresponding high-level objects. Our preliminary results suggest that symmetry in the composition of frames and the editing reduces the gaze transitions on the object level and that symmetric editing and camera movement reduce eye-movements after a cut.

Sukhova, N. V.; Nikolaeva, Y. V. Lomonosov Moscow State University; National University of Science and Technology "MISiS"; Lomonosov Moscow State University. Institute of Linguistics, Russian Academy of Sciences. Title: On Defining Cephalic Gesture Categories. Abstract: There is a wide variety of studies on gestures in general (cf. McNeill 2000; Müller et al. 2013; Seyfeddinipur, Gullberg 2014), mostly considering hand gestures, whereas other kinetic forms (head movements in particular) have not been investigated so widely (see Hadar et al. 1985; Bull 1987; Kousidis et al. 2013). The paper reports some initial steps towards the definition of cephalic gesture categories in discourse. Head movements are studied on the corpus resource called "Russian Pear Chats and Stories" (see www.multidiscourse.ru in Russian) aimed at describing multimodal data within a unified approach. For annotation purposes we should address some fundamental concerns. Firstly, we know that the nature of cephalic gestures is inherently different from that of manual gesticulation. Hand gestures are distinct from insignificant movements called adaptors (Ekman, Friesen 1969) or manipulators (Ekman 1999). Hence the first question: whether we should discriminate between head gestures and head movements that are not meaningful signs. The second question is what the unit of annotation is: a movement or a period between movements (we may call it a head posture), thus discerning figure and ground following Müller, Bressem et al. (2013). The third question is whether a head gesture is a single movement or can include several movements. In our project we consider all head movements as gestures, since at this stage it is impossible to differentiate meaningless head movements and voluntary head gestures, while all of them apparently play a role in communication. Secondly, some head movements are conventionalized and conceptualized in language, like nod, shake and jerk in English.
Literature analysis suggests that research usually revolves around those notions/gestures (McClave 2000; Allwood, Cerrato 2003; Benoit, Caplier 2005). Here, though, there is undoubtedly a danger of mixing formal, functional and semantic approaches to defining cephalic gestures. Moreover, from this viewpoint it is not quite clear how we should tackle all the other head movements which can be seen in discourse and which do not fall into those categories. Thus, our consideration is that gestures are the movements and the pauses between them are rest positions. There should be a physical effort, a change in the direction of a movement along the three axes, i.e. vertical, horizontal and angular (see Figure 1, p. 212, in Wagner et al. 2014), and a change in its intensity for us to single out or label a gesture. Thus, annotating head gestures in ELAN, we have a vocabulary for a free Type of Head Gestures, which includes all kinds from simple to compound to complex gestures, where e.g. a simple gesture is a drop; a compound gesture is a forward-back; and a complex gesture is a multiple nod. There are 20 items in the vocabulary with a physical description of each. Some movements may combine (for example, a nod and a turn-right), so an annotation represents the physical complexity of the movement. (Research underlying this paper is conducted with the support of grant no. 14-18-03819 from the Russian Science Foundation.)

Tong, Yao. Vrije Universiteit Amsterdam. Title: Words accompanied by representational gestures, their grammatical categories and semantic fields: A corpus-linguistic approach. Abstract: Representational gestures primarily refer to either an abstract or concrete entity or an event, in contrast to gestures primarily serving a pragmatic or discourse-structuring function. They are claimed to be activated by mentally simulated actions (Hostetter & Alibali, 2008) and can usually be seen as bearing semantic content (McNeill, 1992). This research attempts to provide an overview of the linguistic profile of representational gestures, that is, what kinds of words, grammatical categories, and semantic fields are likely to co-occur with representational gestures. The corpus used in this research contains 50 short video clips (about 200 minutes) taken from The Ellen DeGeneres Show. The spoken utterances that co-occurred with representational gestures were identified by two coders. The whole corpus was divided into two sub-corpora: the corpus of spoken utterances accompanied by representational gestures and the corpus of spoken utterances without such gestures. The two sub-corpora were tokenized, lemmatized, and grammatically tagged using Stanford natural language processing pipelines (Manning et al., 2014) and the Wmatrix semantic tagging tool (Rayson, 2008). The representational gesture corpus comprises 4,229 tokens in total and the speech-only corpus 35,658 tokens. The study uses the Relative Frequency Ratio (Damerau, 1993) to measure the association strength of linguistic categories with representational gestures. The corpus-based observations have shown that representational gestures are likely to co-occur with cardinal numbers, nouns, prepositional or subordinating conjunctions; and tend to repel modals, interjections, and wh-determiners/pronouns/adverbs.
The semantic fields that most attract representational gestures tend to be more human-oriented, for example, ‘body and the individual’, ‘architecture, buildings, houses and the home’, ‘movement, location, travel and transport’; and the semantic fields that repel representational gestures tend to be more abstract, for example, ‘emotional actions, states and processes’ and ‘psychological actions and processes’. This research provides a rich resource to examine patterns of representational gestures and their natural linguistic contexts.
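As a rough illustration of the association measure named in this abstract, the sketch below computes a relative frequency ratio as the relative frequency of a category in the gesture sub-corpus divided by its relative frequency in the speech-only sub-corpus. The two corpus sizes are those reported in the abstract; the per-category counts are hypothetical placeholders, and the function name is an assumption rather than part of any tool used in the study.

```python
GESTURE_TOKENS = 4229        # gesture sub-corpus size, from the abstract
SPEECH_ONLY_TOKENS = 35658   # speech-only sub-corpus size, from the abstract


def relative_frequency_ratio(count_a, size_a, count_b, size_b):
    """Relative frequency in corpus A divided by relative frequency in corpus B.

    Values above 1 suggest the category is attracted to corpus A,
    values below 1 suggest it is repelled.
    """
    if count_b == 0:
        raise ValueError("category is absent from corpus B; ratio is undefined")
    return (count_a / size_a) / (count_b / size_b)


# HYPOTHETICAL counts per part-of-speech tag: (gesture corpus, speech-only corpus).
# These numbers are invented for illustration and are not results of the study.
hypothetical_counts = {
    "CD": (120, 500),   # cardinal numbers
    "MD": (20, 600),    # modals
}

ratios = {
    tag: relative_frequency_ratio(a, GESTURE_TOKENS, b, SPEECH_ONLY_TOKENS)
    for tag, (a, b) in hypothetical_counts.items()
}

for tag, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{tag}: {ratio:.2f}")
```

Normalizing by corpus size matters here because the speech-only sub-corpus is roughly eight times larger than the gesture sub-corpus, so raw counts are not comparable.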

Trujillo, James P.; Vaitonyté, Julija; Simanova, Irina; Bekkering, Harold; Özyürek, Asli. Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, The Netherlands; Centre for Language Studies, Radboud University Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics. Title: Kinect and gesture research: a validation study of automatic feature coding. Abstract: Human communication is multimodal, utilizing not only speech but also non-verbal channels such as eye-gaze, facial expression, gesture, and action. Multimodality research enables us to understand how information is expressed and transferred using these non-verbal cues. Although there is a wealth of information available in any given interaction, there are also many different approaches to quantifying and analyzing the data. Typically, researchers analyze multimodal corpora manually, using annotation tools such as ELAN to code various movement features such as gesture size and complexity. Manual coding is, however, time-consuming and may be imprecise due to the need for human coders to annotate the data using perceptual metrics (e.g., judging size based on a 2-dimensional video still-frame, or making subjective categorizations of the complexity of a particular gesture). We therefore set out to develop and validate an automatic and efficient way to collect and analyze gesture data. In a previous study, we collected kinematic data using the Microsoft Kinect, which tracks human joint positions without the need for markers or extensive calibration. Forty participants were recorded while producing pantomimes of 31 actions (e.g., “cut the paper with the scissors”) using both the Kinect and HD video recording.
We developed scripts to automatically calculate several kinematic features from the Kinect data, including sub-movements (number of individual movements comprising the overall gesture), peak-velocity (fastest movement within a gesture), hold-time (amount of time in which the hands are static), and use-of-space (how much vertical space is used for a gesture). For the current study, we asked a human coder to calculate these same values using manual coding of a subset of videos from this previous study (n = 120), representing 3 videos from each participant and 3-4 videos of each pantomime. These data are thus representative of an even selection (10%) of the original gesture data. We ask whether the automatic coding is measuring the same features (strong concurrence between manual and automatic coding), and subsequently discuss the differences between the two approaches in terms of precision and utility. We show that the Kinect can be used for naturalistic data collection and can provide accurate calculation of several kinematic features. The automatic (Kinect) coding is in line with the manually coded features. Specifically, we found a strong correlation between manual and automatic coding of sub-movements and hold-time, and a significant concurrence between manual and automatic coding of space, as assessed with Cohen’s kappa. Peak velocity was not found to have a positive correlation, which may be due to the perceptual difficulty of coding this feature. Automatic coding may therefore provide a solution. This study therefore has implications for multimodality research in general as it provides a novel technique that can be advantageous for studying gesture features and kinematics. Furthermore, it provides relatively untapped potential for multimodal researchers to further develop automatic and efficient coding routines for gesture data.
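To make the feature definitions in this abstract concrete, here is a minimal sketch of how peak-velocity, hold-time, and use-of-space might be derived from a sequence of tracked 3-D hand positions. It is not the authors' script: the frame rate, the stillness threshold, the assumption that the second coordinate is the vertical axis, and all function names are illustrative choices. Sub-movement counting is left out because it requires segmenting the velocity profile, which the abstract does not specify.

```python
import math


def speeds(positions, fps):
    """Frame-to-frame speed (units per second) for a list of (x, y, z) points."""
    dt = 1.0 / fps
    return [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]


def peak_velocity(positions, fps):
    """Fastest frame-to-frame movement within the gesture."""
    return max(speeds(positions, fps))


def hold_time(positions, fps, still_threshold=0.05):
    """Seconds during which the hand moves slower than still_threshold.

    still_threshold is a hypothetical cut-off, not a value from the study.
    """
    still_frames = sum(1 for s in speeds(positions, fps) if s < still_threshold)
    return still_frames / fps


def vertical_space(positions):
    """Vertical extent used by the gesture, assuming index 1 is the vertical axis."""
    heights = [p[1] for p in positions]
    return max(heights) - min(heights)


# Hypothetical hand trajectory (metres) sampled at an assumed 10 frames per second.
hand = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.3, 0.0), (0.0, 0.3, 0.0)]
FPS = 10

print("peak velocity:", peak_velocity(hand, FPS))
print("hold time:", hold_time(hand, FPS))
print("vertical space:", vertical_space(hand))
```

In practice the raw joint positions would likely need smoothing before differencing, since sensor jitter inflates frame-to-frame speed estimates.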

Turner, Mark. Institute Professor and Professor of Cognitive Science, Case Western Reserve University. Title: “Red Hen tools for the study of multimodal constructions.” Abstract: The Distributed Little Red Hen Lab (http://redhenlab.org) has been developing new tools for several years, with support from various agencies, including Google, which has provided two awards for Google Summers of Code, in 2015 and 2016. These tools concern search, tagging, data capture and analysis, language, audio, video, gesture, frames, and multimodal constructions. Red Hen now has several hundred thousand hours of recordings, more than 3 billion words, in a variety of languages and from a variety of countries. She ingests and processes about an additional 150 hours per day, and is expanding the number of languages held in the archive. The largest component of the Red Hen archive is called “Newsscape,” but Red Hen has several other components with a variety of content and in a variety of media. Red Hen is entirely open-source; her new tools are free to the world; they are built to apply to almost any kind of recording, from digitized text to cinema to news broadcasts to experimental data to surveillance video and more. This interactive workshop will present in technical detail some topical examples, taking a theoretical research question and showing how Red Hen tools can be applied to achieve final research results.

Vajrabhaya, Prakaiwan. RWTH Aachen University. Title: Not so smooth criminal: Pointing gestures in crime news report in Thai newspapers. Abstract: The crime section in Thai newspapers often reports when a suspect of a crime has been apprehended by the police. In addition to reporting details of the crime in prose (e.g., name, date, time, location, and description of the crime), it is common for Thai newspapers to include a photo of the suspect identification process in the news report. In this observational study, over fifty photos of the suspect identification process, taken from four major Thai newspapers, were examined. In these photos, the victim is found to stand to the side of the suspect, with a fully or partially extended arm and the index finger extended towards the suspect. No other kinds of pointing gestures (e.g., open hand palm up or pointing with thumb) are found. Pointing gestures vary in form and can serve a wide array of functions in discourse (Kendon, 2004). Pointing with the index finger also varies in form and could, for example, function to introduce a new referent (Kendon & Versante, 2003) or to create joint attention during an interaction (Mondada, 2014). The function of pointing with the index finger in the context of crime news reports in Thai newspapers, however, extends beyond a communicative function. In particular, I suggest that it is conducted to publicly shame and/or humiliate the suspect, which fulfills a social function. The discussion will be situated in the realm of: (1) impoliteness and Thai social norms (e.g., Andrén & Zlatev, 2016 on Thai children’s sensitivity to pointing at people); and (2) how non-verbal communication plays a role in mass media in Thailand.

Wabende, Scholastica. PhD student, Moi University. Title: Gender and Power as Constructed in Bukusu Circumcision Discourses. Abstract: This study is an analysis of circumcision discourses among the Bukusu from the approach of interactional sociolinguistics (Gumperz, 1982a, 1982b, 1996), integrated with a multimodal approach (Kress and Van Leeuwen, 1996; Jewitt, 2006). Circumcision is a rite of passage which is held after every year as a way of initiating the young ones into adulthood (Wanyama, 2009; Makila, 2004). During such ceremonies, songs are sung for entertainment, and these songs are predominantly about male and female relationships. The songs are to some extent marked by interaction; of particular importance are the paralinguistic features that interact with verbal aspects in these discourses and the ways they construct gender and power among the Bukusu people. We therefore argue that verbal aspects in these discourses are not enough to describe how language functions. This is in line with recent studies on gender and power in relation to such discourses (Coates, 1996; Gunthner, 1992; Bourdieu, 1994; Tannen, 1981). This study therefore seeks to find out the significance of paralinguistic features in the construction of gender and power relations among the Bukusu community. Multimodality is necessary for capturing the relationship between verbal aspects, based on the argument by Kress and Van Leeuwen (1996). This approach analyses communication in all its forms but is mostly concerned with texts which contain the interaction and integration of two or more semiotic modes of communication in order to achieve the communicative functions of the text. These resources include aspects of speech such as intonation and other vocal characteristics, and the semiotic action of other bodily resources such as gestures.
Circumcision discourses are accompanied by dances, ululations, chants, speeches, the use of different tones as captured in the songs and other interactions, and facial expressions, among others. Interactional sociolinguistics (IS) is important in this analysis because the centre of the argument is how users create meaning and, in this case, how gender and power are constructed. The following key areas of IS will be crucial. First, linguistic features can play a large part in conveying meaning and hence in negotiating relationships. Secondly, speakers of different cultural backgrounds develop systematically different conventions for using and interpreting linguistic features. There is also a demonstration that differing contextualization cues contribute to the perception and the reality of social inequality and discrimination. The study conforms to the contributions by Coates that discourses construct gender and that through this there are representations of inequalities, as there is a struggle for dominance, hence the revelation of power (Coates, 1996). The study benefits from ethnography as a method of analyzing data, as this will allow access to Bukusu cultural meanings of some lexical items through a close association and familiarity with the social setting (Brewer 2000:10). Ethnography is a method of social inquiry that facilitates the study of people in naturally occurring settings or fields by means of methods which capture their social meanings and ordinary activities. The research also adopts a self-observing approach which calls for close participation in the interactions. The study focuses on audio and video recordings of the interactions, analysed within the framework of interactional sociolinguistics.

Wahn, Basil; König, Peter. Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany; Department of Neurophysiology and Pathophysiology, Center of Experimental Medicine, University Medical Center Hamburg-Eppendorf, Hamburg. Title: Is Attentional Resource Allocation Across Sensory Modalities Task-Dependent? Abstract: Human information processing is limited by attentional resources. That is, via attentional mechanisms, humans select a limited amount of sensory input to process while neglecting other sensory inputs. In multisensory research, a matter of ongoing debate is whether there are distinct pools of attentional resources for each sensory modality or whether sensory modalities share attentional resources. Recent studies have suggested that attentional resource allocation across sensory modalities is in part task-dependent. That is, the recruitment of attentional resources across the sensory modalities depends on whether processing involves object-based attention (e.g., the discrimination of stimulus attributes) or spatial attention (e.g., the localization of stimuli). Here, we report and review findings in multisensory research related to this view. For the visual and auditory sensory modalities, results suggest that distinct resources are recruited when humans perform object-based attention tasks, whereas for the visual and tactile sensory modalities, partly shared resources are recruited. If object-based attention tasks are time-critical, shared resources are recruited across the sensory modalities. When humans perform an object-based attention task in combination with a spatial attention task, partly shared resources are recruited across the sensory modalities as well. Conversely, for spatial attention tasks, attentional processing does consistently involve shared attentional resources for the sensory modalities. Generally, findings suggest that the attentional system flexibly allocates attentional resources depending on task demands. 
We propose that such flexibility reflects a large-scale optimization strategy that minimizes the brain’s costly resource expenditures and simultaneously maximizes its capability to process currently relevant information.

Wozny, Jacek. University of Wroclaw. Title: Multimodal blending in meaning construction of Polish "chodzi o" and Czech "jde o" ('it is about'). Abstract: Conceptual Blending Theory in its original form (Fauconnier and Turner 2002, Turner 2014) and extensions (for example, Brandt and Brandt 2005, Oakley and Coulson 2008, Brandt 2013, Pérez-Sobrino 2014) describes the basic, dynamic and transient mental operations of juxtaposing 'input spaces' to construct meaning. One important area of the rapidly expanding research on conceptual integration examines the case in which the input spaces are triggered by signals belonging to different modalities (for example, aural and visual). Such cases are referred to as 'multimodal blending'. The paper investigates the meaning construction of the Polish and Czech expressions "chodzi o" and "jde o" (in literal translation: "it walks about"), which are typically used to refer anaphorically to the topics of newscasts in Polish and Czech media. The main research question is how inputs belonging to different modalities interact with one another in the construction of the meaning of these expressions. The dataset comes from the television news archive (UCLA Newsscape Archive) of the Distributed Little Red Hen Lab, co-directed by Francis Steen and Mark Turner. Excerpts of Polish and Czech news programs (video and text) containing the expressions "chodzi o" and "jde o" were collected using the UCLA Edge search engine. Each of the excerpts was then analyzed with respect to the multimodal input spaces and their interaction. In conclusion, the paper proposes that in each of those cases three blends are constructed from input spaces belonging to conceptual, aural and visual modalities. The conceptual input mentioned above can be described as a 'small spatial story' (Turner 1996: 13), and the remaining two input spaces are triggered by the spoken news commentary and the accompanying image on the television screen. 
The research method and analysis can be applied to other 'metaphorical' expressions ("chodzi o" and "jde o" can be categorized as instances of TOPICS ARE PHYSICAL OBJECTS IN SPACE) to shed more light on the nature of the interaction between individual inputs in multimodal blending.

Zima, Elisabeth; Brône, Geert; Weiß, Clarissa. University of Freiburg; University of Leuven; University of Freiburg. Title: On the role of eye gaze in competition for talk. Simultaneous starts in triadic interactions. Abstract: Studies in conversation analysis have abundantly shown that face-to-face interaction is a tightly organized process, in which the participants resort to a variety of semiotic resources to manage a smooth transition of speaker turns (Sacks et al. 1974, Goodwin 1981, Mondada 2007, Clayman 2013, Hayashi 2013 a.o.). Whereas normal conversation is typically assumed to unfold according to the norm that speakers take only one turn at a time (i.e. there is hardly any overlap that may disrupt the flow of the interaction), the instances in which speakers overlap may be of particular interest to (multimodal) interaction analysis (Schegloff 2000). These include, for instance, terminal overlaps, in which a speaker starts his/her turn before the prior speaker has finished, based on a projected completion (Jefferson 1984), and choral forms of talk which are expected to be done simultaneously, such as collective greetings and congratulations. In this paper, we zoom in on a type of overlap that is potentially more competitive than the previous two examples, viz. simultaneous starts of turns by two or more speakers. Based on the core assumptions of the turn-taking model, an overlap resolution device should deal with these potentially problematic instances of simultaneous talk, resulting in one speaker dropping out of the competition (Schegloff 2000). From a multimodal perspective, simultaneous starts present an interesting case, because their resolution seems to be managed in great part nonverbally, and more specifically through specific gaze constellations. In this paper, we take data from triadic conversations (both in German and Dutch), in which the participants’ eye gaze behavior was measured using mobile eye tracking. 
As a result, we have access to all participants’ visual fixation points at each point in time during the entire course of the interaction. We compare the gaze behavior of all three participants in cases of (a) simultaneous starts (defined as two turns starting within a time frame of 0–200 milliseconds, Walker 2015) of two (or occasionally even three) speakers that result in only one speaker finishing his/her turn, while the other abandons his/her turn before reaching a point of turn completion, as well as (b) sequences in which more than one speaker starts a turn simultaneously and they all finish their turns, resulting in a longer stretch of overlapping speech. The analysis reveals three dominant gaze patterns. First, speakers are more likely to finish their turn if they manage to secure their recipients’ gaze. This pattern is highly robust and dominant in very different conversational activities, such as storytelling (by one or two speakers), joint brainstorming activities, discussions etc. The gaze analysis hence corroborates and strengthens prior claims on the influence of recipients on speakers’ behavior in conversation (Lerner 2004). Second, we found that if competing speakers were engaged in mutual gaze before the simultaneous start, the one who looks away first, i.e. dissolves the mutual gaze phase, is more likely to abandon his/her turn. Third, if competing speakers did not share mutual gaze before starting to talk simultaneously, the speaker who first gazes at the competitor is more likely to drop out.