SSoL 2019: lecture abstracts – Summer School of Linguistics

Kateřina Chládková

Phonetic learning in adults

Language users exhibit remarkable phonetic learning abilities throughout the lifespan. Speech sound learning is typically considered a mechanism that is most active in infancy and early childhood and is thought to attenuate with age. In this talk I will review experimental evidence from, for example, lexical adaptation studies, distributional training, and talker and accent adaptation experiments, all of which indicate that a powerful speech sound learning mechanism operates throughout adulthood.

Speech sound acquisition in infants

Infants acquire the speech sound inventory of their native language(s) roughly within the first year of life: at the outset they seem to be nearly perfect (universal) perceivers able to distinguish any speech sound difference, and they gradually become attuned specifically to those differences that are ‘meaningful’ in their language environment. I will present findings on speech sound development across languages, across phoneme classes, and across infant ages and will sketch a proposal that could account for a variety of language- and phoneme-specific developmental trajectories that have been observed.

Phonetic learning before birth

Humans are not born as linguistic blank slates: they learn a lot about (their mother’s) language while still in the womb. Studies show that near-term fetuses and newborns recognize their native language, a familiar song or rhyme, and are even able to discriminate (at least some) speech sound differences. In this talk I will review the various aspects of language-specificity that are found shortly before and shortly after birth. I will focus on recent findings on cortical tuning to (native) speech and will present experimental data compatible with a prenatal onset of speech sound acquisition.

Jan Chromý

Erroneous processing of locally ambiguous sentences

The lecture will address the idea of Good-Enough processing of locally ambiguous sentences (so-called “garden-path” sentences). Results of four experiments on Czech using self-paced reading will be presented. I will show that native speakers of Czech have problems analyzing sentences of this kind, just as the Good-Enough Approach would claim. However, certain findings of these experiments cannot be accounted for by this approach. Other possible interpretations, both of my data and of the results of various experiments conducted under the Good-Enough Approach, will be discussed and evaluated.

Václav Cvrček, Zuzana Komrsková

Register variation in Czech: A multi-dimensional approach (workshop)

In this workshop we will guide the participants through multi-dimensional (MD) analysis, a widely used and highly valued corpus-based approach to register variation. We will start by identifying the theoretical assumptions, then move on to the specifics of MD analysis in Czech and cover the resources available (corpora, tools, sources for compiling the list of relevant linguistic features). The workshop will focus on the crucial part of any MD analysis, i.e. the interpretation of results. Participants will be provided with the results of a statistical (factor) analysis of the Czech data and will be guided through the process of labelling the dimensions, taking into account the feature loadings and factor scores.

Ondřej Dufek

Corpus linguistics and (critical) discourse analysis together: A case study of language ideologies in public discourse

The lecture will present the benefits and discuss the limitations of corpus approaches to discourse analysis. Special attention will be paid to the concept of keyness (keywords are a common way of “looking into” a corpus/discourse) and to how it can be measured in a way that suits what we want to find out. I will show a possible application of corpus tools in a case study of language ideologies in Czech public discourse, represented by its parliamentary and media components.
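To make the notion of keyness concrete, here is a minimal sketch of one widely used measure, the log-likelihood statistic, which compares a word's frequency in a target corpus with its frequency in a reference corpus. The abstract does not say which measure the lecture favours, so the choice of statistic, the function name, and the counts below are assumptions made purely for illustration.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word (illustrative helper, not from the lecture).

    freq_* are the word's frequencies, size_* the corpus sizes in tokens.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were equally common in both corpora
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Made-up counts: a word occurring 30 times in a 1,000-token target corpus
# and 10 times in a 1,000-token reference corpus.
keyness = log_likelihood(30, 1000, 10, 1000)
print(round(keyness, 2))
```

The statistic only says how surprising the frequency difference is, not in which corpus the word is overrepresented, so in practice it is usually read together with the relative frequencies or an effect-size measure.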

Viktor Elšík

Linguistic Atlas of Central Romani

Central Romani, one of the major dialect groups within Romani (e.g. Boretzky 1999, Matras 2002, Elšík & Beníšek 2019), traditionally spoken in the area of the historical Kingdoms of Bohemia, Hungary, and Galicia, remains one of the last idioms of East Central Europe and one of the last major dialect groups of Romani that lack a comprehensive description of their cross-dialectal variability. A dialectological atlas of Romani (Boretzky & Igla 2004), which draws almost exclusively on published sources, represents Central Romani by fewer than a dozen regional varieties. The few existing publications on the cross-dialectal variability within Central Romani (Lípa 1965, Boretzky 1999, Elšík et al. 1999, Červenka 2006) are likewise seriously limited in their coverage and sources of data.

The lecture will present the objectives and the methodology of an on-going project that will result in the publication of the Linguistic Atlas of Central Romani, a comprehensive resource based on a wealth of data from several hundred local varieties of Central Romani. I will discuss the geographical scope and coverage of the project; its sources of data; the structure of the Linguistic Questionnaire for the Documentation of Central European Romani, which has been used to elicit linguistic data in the field; the structure of the Linguistic Database of Central European Romani, which has been used to store and analyse the data; the list of linguistic features to be included in the Atlas; the structure of the prose comments to the maps; and several methodological problems encountered during the project. I will also present several preliminary maps of selected features.

References

Boretzky, Norbert. 1999. Die Gliederung der Zentralen Dialekte und die Beziehungen zwischen Südlichen Zentralen Dialekten (Romungro) und Südbalkanischen Romani-Dialekten. In: Halwachs, Dieter W. & Florian Menz (eds.) Die Sprache der Roma. Perspektiven der Romani-Forschung in Österreich im interdisziplinären und internationalen Kontext. Klagenfurt: Drava. 210–276.

Boretzky, Norbert & Birgit Igla. 2004. Kommentierter Dialektatlas des Romani. Wiesbaden: Harrassowitz Verlag.

Červenka, Jan. 2006. Dialektní specifika severocentrální romštiny ve středoslovenských oblastech Kysuce, Turiec a Liptov. Praha: Signeta.

Elšík, Viktor, Milena Hübschmannová & Hana Šebková. 1999. The Southern Central (ahi-imperfect) Romani dialects of Slovakia and Northern Hungary. In: Halwachs, Dieter W. & Florian Menz (eds.) Die Sprache der Roma. Perspektiven der Romani-Forschung in Österreich im interdisziplinären und internationalen Kontext. Klagenfurt: Drava. 277–390.

Elšík, Viktor & Michael Beníšek. 2019+, in print. Romani dialectology. In: Matras, Yaron & Anton Tenser (eds.) The Palgrave Handbook of Romani Language and Linguistics. Palgrave Macmillan.

Lípa, Jiří. 1965. Cikánština v jazykovém prostředí slovenském a českém. K otázkám starých a novějších složek v její gramatice a lexiku. Praha: Nakladatelství Československé akademie věd.

Matras, Yaron. 2002. Romani: A linguistic introduction. Cambridge: Cambridge University Press.

Monique Flecken

Seeing and speaking about events

In this lecture I will discuss theories on language production, with a specific focus on the process of sentence production. I will talk about empirical research on the relation between seeing and speaking, for the domain of events. Cross-linguistic comparisons of event description in native speakers and second language learners will be discussed.

Cross-linguistic influences in event cognition

In this lecture I will discuss theories of linguistic relativity. I will outline empirical, cross-linguistic research on event cognition (perception and memory, specifically).

Christina Kim

Ellipsis, focus, and discourse structure

A puzzle about ellipsis is that, on the one hand, different elliptical constructions appear to be constrained by construction-specific, language-specific grammatical factors. For example, Verb Phrase ellipsis is possible in English but typologically uncommon, whereas other varieties of ellipsis like sluicing appear to be typologically widespread. On the other hand, ellipsis in general seems tightly linked to information structure, which should in principle be language-independent. While an account based on a general information-structural condition related to discourse well-formedness is conceptually attractive, it has proven difficult to fully account for the range of acceptability of various ellipsis constructions with such a condition alone. In these sessions, we will look at some of these attempts, then take a slightly different approach, adopting a Question Under Discussion-type framework (van Kuppevelt 1995; Roberts 1996) of discourse structure, and explaining variability in judgments across elliptical constructions in terms of the recoverability and availability of the appropriate QUD. We will ask how implicit questions are cued by different focusing devices, and look at the relationship between adjacency/sequential order and discourse structure.

Nicholas Lester

Taking up spaCy: A beginner’s guide to Natural Language Processing with Python

(Requirements: laptop required; install: Python and spaCy – instructions here)

Linguists of all stripes are relying more and more heavily on corpora to produce and validate their theories of language structure. A key aspect of these corpora is their size, which allows one to collect many different instances of the target phenomenon. But this size can also be daunting: one should probably not read through 100 million words of text to find all relative clauses! Therefore, we need to automate our searches, especially when we seek abstract units (such as “all relative clauses”). To do this efficiently, we need annotation. To get this annotation (on the grand scale), we need NLP. This course will introduce you to a simple but powerful tool for NLP called spaCy. We will cover some basics of the Python programming language before diving into practical examples of how to tokenize, lemmatize, tag, parse, and extract semantic vectors from a corpus. We will end with a brief tutorial on how to train your own spaCy models based on existing databases. Don’t worry; spaCy makes all of this quite easy!
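As a taste of what such automated annotation looks like, the sketch below reads the token, lemma, part-of-speech tag, and dependency label off a parsed spaCy document and picks out the verbs that head relative clauses. It is an illustration only, not course material: the helper names are hypothetical, and the pretrained English model "en_core_web_sm" (whose parser uses the label "relcl" for relative-clause modifiers) is an assumption, since the abstract does not name a model.

```python
def load_pipeline(model="en_core_web_sm"):
    """Load a pretrained spaCy pipeline.

    Assumes spaCy and the named model are installed, e.g. via
    `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported here so the helpers below work on any parsed doc
    return spacy.load(model)


def token_rows(doc):
    """Return (text, lemma, POS tag, dependency label, head) for every token."""
    return [(t.text, t.lemma_, t.pos_, t.dep_, t.head.text) for t in doc]


def relative_clause_verbs(doc):
    """Return the tokens that the parser marks as heads of relative clauses."""
    return [t.text for t in doc if t.dep_ == "relcl"]
```

With the model installed, `doc = load_pipeline()("The linguist who built the corpus found many examples.")` yields a parsed document that can be passed to both helpers; the same two functions can then be run over every sentence of a corpus instead of reading it by hand.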

Linguistic distributions in lexical processing and acquisition

Combinatoriality is a central component of human language. By rearranging linguistic units in various ways, we achieve almost limitless expressive power. One consequence of this fact is that we experience these linguistic units in a vast number of different contexts. As it turns out, this experience plays a critical role in shaping our linguistic representations, from early infancy all the way into adulthood. This course will introduce the nature of contextual variability at several levels of linguistic structure – segmental, prosodic, morphological, lexical, syntactic, and discourse – and how to measure it for words. Effects of contextual variability are discussed in three domains: word learning, lexical comprehension, and lexical production. Data come from computational, experimental, and observational (corpus-based) studies. The findings are framed in current theoretical models of learning and lexical representation.
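One simple way to put a number on a word's contextual variability is the Shannon entropy of the words that follow it: the more evenly a word is spread over different following words, the higher the value. The abstract does not say which measures the course uses, so this particular measure, the function name, and the toy sentence are assumptions chosen only to illustrate the idea.

```python
import math
from collections import Counter, defaultdict


def following_word_entropy(tokens):
    """Entropy (in bits) of the next-word distribution for each word.

    Illustrative measure of lexical contextual variability; higher values
    mean the word is followed by a more varied set of words.
    """
    followers = defaultdict(Counter)
    for word, next_word in zip(tokens, tokens[1:]):
        followers[word][next_word] += 1
    entropies = {}
    for word, counts in followers.items():
        total = sum(counts.values())
        entropies[word] = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
    return entropies


# Made-up toy text: "the" is followed by several different words,
# whereas "saw" is always followed by "the".
tokens = "the dog saw the cat and the bird saw the dog".split()
entropies = following_word_entropy(tokens)
print(entropies["the"], entropies["saw"])
```

The same counting scheme extends to other levels of structure by swapping the following word for, say, the following part-of-speech tag or the syntactic frame.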

Quantitative analysis of language acquisition in under-resourced languages

Despite a vast literature, research on child language acquisition suffers from three major issues. First, while the number of human languages is estimated to fall somewhere near 7,000, studies of child language cover only a minuscule proportion of that figure. Looking at quantitative studies, we see even fewer. Second, the languages that have been studied most extensively tend to be both typologically and geographically homogeneous. Thus, we miss out not only on the majority of languages but also on the majority of language types. Third, for those languages that do differ typologically from the bulk of the available samples, the amount of data is often too small to be useful for standard quantitative assessment. To combat these issues, colleagues at the University of Zürich have begun collecting and standardizing a corpus of 10 (soon 13) maximally typologically and geographically distinct samples of naturalistic child-produced and child-surrounding speech. This course will introduce the corpus and present several quantitative studies that overcome the issue of sample size while preserving theoretical significance.

Chris Montgomery

Perceptual Dialectology: History and development

Perceptual Dialectology: Methods and data processing

These lectures will detail the history and development of perceptual dialectology, an area of study that aims to uncover non-linguists’ thoughts, beliefs, and perceptions in relation to regional dialect variation. The first lecture will chart the development of interest in non-linguists’ dialect perceptions, and discuss contemporary approaches in the field. The second session will discuss methods for collecting and processing data, including the use of Geographical Information Systems to produce composite maps of dialect perceptions.

Emma Moore

Language Variation and Social Meaning

Variationist Sociolinguistics examines the relationship between language and social factors. In order to do so, research in this field has focused on exploring the social patterning of “the linguistic variable”, which has been defined as “two or more ways of saying the same thing” (Labov 1972: 271). But what does it mean to say that two variants ‘say the same thing’? In this series of talks, we will consider how variationist sociolinguists have defined ‘social meaning’ in their research. When a speaker uses one variant over another (for instance, saying “I didn’t do nothing” rather than “I didn’t do anything” in British English), are they simply expressing their social class status (given that traditional variationist sociolinguistic studies have found that nonstandard expressions like “I didn’t do nothing” are used more frequently by working class speakers), or are they communicating an attitude or stance that would be communicated less effectively (if at all) by the expression “I didn’t do anything”? To answer this question, we will think about what kinds of social meanings can be communicated by linguistic variables; whether phonetic/phonological variables can communicate different types of social meanings to morphosyntactic variables; and the methods required to investigate the social meanings of linguistic variation.

Pieter Muysken

In three lectures I situate language contact on a temporal axis, going from contemporary contact situations with code-switching, to the historical perspective of European colonial expansion with creoles, to the pre-historical perspective of deep time contacts in South America.

An integrative perspective on code-switching

Code-switching is a highly complex phenomenon, in which different aspects of bilingual speech come together. Starting from a linguistic analysis in terms of the grammatical properties of intrasentential code-switching, I present an integrative perspective on the phenomenon, in which sociolinguistic, psycholinguistic, and grammatical aspects are combined. Particular attention is paid to the question of language distance, and how it influences code-switching.

Contextualizing creoles: the case of Surinam

Creole languages are generally studied as a separate group of languages, together with pidgins, within creole studies. In this lecture I take Surinam, a country where half a dozen creole languages are spoken together with many other languages, as the point of departure, to see how creoles fit into the larger context of language contact studies.

Deep time language contact: the case of South America

There is a growing interest in adopting multidisciplinary perspectives on the early history of the languages of the world, and on the role that language contact may have played in that early history. In this lecture I take the example of the diversification and spread of South American indigenous languages to study the methodological and theoretical aspects of this enterprise. In doing so, I explore the possibility of deep time survival of very early grammatical traits.

Mits Ota

The relationship between early phonological and lexical development

Open a textbook on first language acquisition and you will typically find separate chapters on speech/phonological acquisition and lexical acquisition (usually in that order), giving us the impression that they are independent aspects of language development with one preceding the other. But in fact the process of learning sound patterns and the process of learning words in a language are heavily intertwined. In these sessions, I will review and discuss the interplay between phonological and lexical development during infancy and childhood, focusing on the following questions:

1. Can phonetic categories and phonological patterns be learned independently of lexical knowledge?

2. What is the role of phonological knowledge in lexical learning?

3. Do children learn to produce sounds or words?

4. Does the profile of phonological input affect lexical development?

Radek Skarnitzl

The true matched guise: Creating reliable stimuli for perception experiments (workshop)

(Requirements: laptop and headphones required; install: Audacity and Praat; files for the workshop are available in the shared folder)

The traditional application of the Matched Guise Technique (MGT) consists in multi-lingual or -dialectal speakers performing “guises” in different languages or dialects. That may, however, involve undesirable changes, beyond those reflecting the linguistic differences (as admitted, for instance, by Purnell, Idsardi & Baugh, 1999). In the workshop, we will learn to create what I refer to as “true” MGT stimuli in a controlled way, using computer resynthesis. That allows experimenters to precisely control their speech stimuli, making their findings more robust: any differences in listeners’ perceptual impressions should be due to the experimental manipulation. The workshop will include manipulations of melodic and temporal patterning, as well as vowel quality.

Ondřej Tichý

Analysis & Visualisation of Linguistic Networks in Gephi (workshop)

(Requirements: laptop required; install: Gephi – see instructions below)

A number of both language-internal and language-external features are regularly structured into sets of linked nodes, that is, into graphs or, in other words, networks. Think about syntagmatic features like valency, paradigmatic features like synonymy and antonymy, or even relationships between members of linguistic communities and their communication.

Often these features do not yield easily to the more traditional representations of corpus linguistics like concordance lines or quantified visualisations like charts – or at least not beyond a certain level of complexity.

The workshop will focus on the practical matters of transforming sets of linguistic data (corpora) into visualisations that encourage exploration of their networked features specifically. We will focus on the current de facto standard tool in the field of network analysis and visualisation – Gephi. We will learn to prepare the data for Gephi, explore the data using it, and export the data into visual representations that can be both highly informative and visually stunning.
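As an illustration of what preparing data for Gephi can involve, the sketch below turns a tiny tokenized corpus into a weighted co-occurrence edge list in CSV form, with the column headers Source, Target, and Weight that Gephi's spreadsheet import reads as an edge table. The choice of sentence-level co-occurrence as the network, the function names, and the toy sentences are assumptions for the sake of the example; the workshop itself may use different data and a different procedure.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count how often two distinct word types share a sentence."""
    counts = Counter()
    for sentence in sentences:
        word_types = sorted(set(sentence))  # sorting keeps each pair in one order
        counts.update(combinations(word_types, 2))
    return counts


def write_edge_list(edges, handle):
    """Write the edges as CSV rows under a Source,Target,Weight header."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


# Made-up toy corpus of tokenized sentences
sentences = [
    ["dogs", "chase", "cats"],
    ["cats", "chase", "mice"],
    ["dogs", "bark"],
]
edges = cooccurrence_edges(sentences)
buffer = io.StringIO()
write_edge_list(edges, buffer)
print(buffer.getvalue())
```

To produce a file for import, the in-memory buffer can be swapped for a file opened with `open("edges.csv", "w", newline="", encoding="utf-8")`; since the pairs carry no direction, the graph would be imported as undirected.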

Installation instructions:

If you plan to attend the Analysis & Visualisation of Linguistic Networks in Gephi workshop, please follow the instructions below to install the necessary software prior to the workshop – there may not be enough time during the workshop to solve technical problems:

1. if you don’t have Java, download & install it from https://www.java.com/en/download/
2. download & install Gephi from https://gephi.org/users/download/
3. optionally, download & install any free advanced text editor (i.e. not notepad or MS Word), e.g. https://notepad-plus-plus.org/download
4. try running Gephi – if it opens without reporting any errors, you are good to go

If Gephi does not open, there are three common problems with its installation:

1. if Gephi reports an error with Java, make sure it is installed properly and it is version 7 or higher
2. if Gephi reports that it cannot find Java even though it is installed on your system, look for the folder where you have Java installed (usually “C:/Program Files (x86)/Java/jre1.8.0_151” or similar depending on your Java version) and edit the gephi.conf file in C:\Program Files\Gephi-0.9.2\etc (or wherever you installed Gephi), updating the path there from #jdkhome="\path\here" to jdkhome="C:\Program Files (x86)\Java\jre1.8.0_151" (note that you need to remove the ‘#’ symbol)
3. if Gephi reports an error concerning your GPU drivers (Intel or OpenGL drivers), try to update the drivers for your graphics card in addition to following the previous point

If all else fails, try starting Gephi by double-clicking gephi64.exe in C:\Program Files\Gephi-0.9.2\bin (or wherever you installed Gephi). In case of further problems, do not hesitate to contact me at ondrej.tichy@ff.cuni.cz.