Hi, this is a tutorial designed to accompany an introductory-level hands-on workshop. The tutorial (along with the workshop) aims to help researchers in phonetics run some useful statistical analyses in R. We will discuss R code for the following tasks, assuming no prior knowledge of programming.

  1. t-tests;
  2. SSANOVA;
  3. LME;
  4. plotting with ggplot2.

There are many ways to do the same thing, but here we will discuss only a few of them.

We will be using the following packages: gss, ggplot2, lmerTest, and plyr, along with some artificially generated data.
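If you have not installed these packages yet, you can do so once with install.packages():

#install the packages used in this tutorial (only needed once)
install.packages(c("gss", "ggplot2", "lmerTest", "plyr"))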


1 preliminaries

First, set the working directory to the folder where you saved the data files. The line below opens an interactive folder picker (available on Windows):

setwd(choose.dir())

If you encounter difficulty with the above code, you may alternatively set the working directory by typing the path directly, as in the example below. The content enclosed in the quotation marks should be the path to the folder where you saved the data files:

setwd("C:/Users/v/Desktop/Rworkshop")

2 t-tests

2.1 paired t-test

Below we use a dataset with 3 columns, where

  • column ‘token’ denotes the repetition number;
  • column ‘Y.ei_C_F1’ denotes the mean F1 value for each token of the diphthong /ei/ in Cantonese produced by a Cantonese speaker;
  • column ‘Y.ei_E_F1’ denotes the mean F1 value for each token of the diphthong /ei/ in English produced by the same speaker.
dttest <- read.csv("F1_ttest.csv",fileEncoding="UTF-8-BOM")
dttest

We can run a paired t-test to see whether the mean F1 differs significantly between the two language contexts.

t.test(dttest$Y.ei_C_F1,dttest$Y.ei_E_F1,paired=TRUE)
## 
##  Paired t-test
## 
## data:  dttest$Y.ei_C_F1 and dttest$Y.ei_E_F1
## t = 9.5462, df = 5, p-value = 0.0002135
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  46.10818 80.09094
## sample estimates:
## mean of the differences 
##                63.09956

t.test() returns useful statistics such as the t value, degrees of freedom (df), and p value.
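t.test() also returns its results as an object whose components can be accessed individually, which is handy if you want to reuse a statistic later. A minimal sketch (res is just an arbitrary name):

#store the result and extract individual statistics
res <- t.test(dttest$Y.ei_C_F1, dttest$Y.ei_E_F1, paired=TRUE)
res$statistic   # t value
res$parameter   # degrees of freedom
res$p.value     # p value
res$conf.int    # 95 percent confidence interval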

2.2 independent 2-group t-test

If you omit the argument ‘paired=TRUE’ in t.test(), R will run an independent two-group t-test (a Welch two-sample t-test by default) for you.

t.test(dttest$Y.ei_C_F1,dttest$Y.ei_E_F1) 
## 
##  Welch Two Sample t-test
## 
## data:  dttest$Y.ei_C_F1 and dttest$Y.ei_E_F1
## t = 7.521, df = 9.9494, p-value = 2.072e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  44.39313 81.80599
## sample estimates:
## mean of x mean of y 
##  449.0327  385.9332

2.3 one sample t-test

This is often used to check whether something is significantly different from chance level, or from some other given value. The value to be tested against is specified in the argument ‘mu=’.

Here, for demonstration, we test whether the mean F1 for /ei/ in Cantonese in our dataset is significantly different from 450 Hz.

t.test(dttest$Y.ei_C_F1,mu=450) 
## 
##  One Sample t-test
## 
## data:  dttest$Y.ei_C_F1
## t = -0.16919, df = 5, p-value = 0.8723
## alternative hypothesis: true mean is not equal to 450
## 95 percent confidence interval:
##  434.3366 463.7289
## sample estimates:
## mean of x 
##  449.0327

3 SSANOVA

Below we will work on a different data file, and use SSANOVA to test whether the formant values of the diphthong /ei/ are significantly different in bilinguals’ Cantonese and English.

3.1 load data

dssanova <- read.csv("F1_F2.csv",fileEncoding="UTF-8-BOM")
tail(dssanova)

Our data contain F1 and F2 values of /ei/ produced in Cantonese and English:

  • column ‘word’ lists the vowel label, e.g. ‘ei_C_F2’ means the F2 of /ei/ in Cantonese;
  • column ‘X’ records the normalized measurement points;
  • column ‘Y’ denotes the actual formant values at each corresponding measurement point.

3.2 run SSANOVA

#we use code in this section to generate SSANOVA values, which will later be used for plotting.
library(gss)
# actual comparisons
engF1 <-"ei_E_F1"
canF1 <-"ei_C_F1"

F1 <- subset(dssanova,word %in% c(engF1, canF1))
F1$word <- factor(F1$word, levels = unique(F1$word))


# fit model
ssa.fit2 <-ssanova(Y~word+X+word:X,data=F1)

# get predictions for constructing confidence intervals
X=seq(min(F1$X), max(F1$X), by=0.01)
grid <- expand.grid(X=X, word = c(engF1, canF1))
grid$ssa.fit <- predict(ssa.fit2,newdata = grid,se = T)$fit
grid$ssa.SE <- predict(ssa.fit2,newdata = grid,se = T)$se.fit
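Before plotting, you can take a quick look at the prediction grid: each row contains a fitted value and its standard error at one time point for one word condition.

head(grid)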

3.3 plot

We can use the package ‘ggplot2’ to make graphs. We will plot the F1 values (along with the confidence intervals generated from the SSANOVA fit) on the y axis, and the measurement points (time) on the x axis.

# plotting comparison; setting up plot object:
library(ggplot2)
p01 <- ggplot(grid,aes(x = X, colour = word, group = word)) +
        theme_bw() + 
        geom_line(aes(y = ssa.fit),alpha = 1,colour = "grey20", lty=3) +
        geom_ribbon(aes(ymin = ssa.fit-(1.96*ssa.SE), ymax = ssa.fit+(1.96*ssa.SE),fill = word),alpha = 0.5,color="NA") +
        xlab("normalized time") +
        ylab("Formant (Hz)")+
        theme(axis.text = element_text(colour="black", size=15),
                  text = element_text(size=15),
                  title = element_text(size=15),
                  axis.title.x=  element_text(vjust=-0.45),
                  axis.title.y = element_text(vjust=.2),
                  axis.ticks = element_line(colour="black"),
                  axis.line = element_line())
p01
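If you would like to keep the figure, ggsave() from ggplot2 writes a plot to a file; the file name and dimensions below are arbitrary examples.

#save the plot to the current working directory
ggsave("ssanova_F1.png", plot = p01, width = 6, height = 4)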

~challenge task

The sections above have compared the F1 values in English and Cantonese. Try working on the F2 values yourself.

Hint: change the specification for comparisons.

When you’re done, you may check the last section for a reference answer.


4 LME

In this section we build models to see how the various pieces of information we have about the speaker or the linguistic context of the word influence the duration of the vowel.

Suppose we have a corpus of children’s speech and we are primarily interested in the relationship between word frequency and vowel duration. We decide to look at two specific vowels, schwa and ae, and sample vowel durations from the speech of 40 children. The children form 4 groups of 10, such that each group shares the same first or dominant language, i.e. English, French, Mandarin, or Japanese. From each child’s speech we sample 10 tokens of schwa and 10 tokens of ae. Because it is a corpus, we cannot control for the children’s age; overall, they are aged between 5 and 10. Also, the words in which we measure the vowels may vary across speakers: for instance, one of the tokens from one speaker may come from “bad”, while none of the tokens from another speaker comes from “bad”. Altogether there are 48 unique words.

Let’s load the data and call it dlmer.

dlmer<-read.csv("dlmer.csv",header=T)

4.1 data snapshot

Take a closer look at the data.

4.1.1 example data

head(dlmer)

4.1.2 data description

The dataset consists of 800 rows (10 tokens x 2 vowels x 10 speakers x 4 groups) and 8 columns with the following column names:

  • speakerID: 40 speakers, 1-40 (10 speakers per language; balanced in sex);
  • sex: F/M;
  • age: age of speakers, 5-10;
  • lang: native language: English/French/Mandarin/Japanese;
  • frequency: centered word frequency;
  • word: the word in which the vowel duration was measured, 48 unique words;
  • vowel: the vowel identity: schwa/ae;
  • duration: duration of the vowel.

4.2 why lme: some intuition

Since we are interested in the effect of word frequency on vowel duration, we may plot all 800 data points and fit a linear model of the relation between frequency and duration. There seems to be a positive correlation between frequency and duration: vowels in higher-frequency words appear, quite counterintuitively, to be longer than those in lower-frequency words.
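A plot along these lines can be produced with ggplot2; below is a minimal sketch using the column names described above.

#all 800 points, with one regression line fitted to the pooled data
library(ggplot2)
ggplot(dlmer, aes(x = frequency, y = duration)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") +
  theme_bw()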

We get the same pattern when we separate the data for each vowel…

or when we further separate the data based on speakers’ L1…

But when we take into consideration the identity of individual speakers, things begin to look different. The correlation is clearly negative.
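One way to see this is to draw one regression line per speaker; a sketch:

#one regression line per speaker: within speakers, the slope is negative
ggplot(dlmer, aes(x = frequency, y = duration,
                  colour = factor(speakerID), group = speakerID)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  theme(legend.position = "none")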

Although we could build individual models for each speaker, we would also like to know how strong the correlation is in general, i.e. across speakers (imagine a line that represents the grand average of all the lines). This is why we need mixed models.

In other words…

It’s important to take into consideration the individuals/idiosyncracies.

A lme model (linear mixed effects model) allows us to assess the relationship between the variables we are interested in (e.g. frequency and duration) in aggregated data while taking into consideration the idiosyncracies (e.g. speakers).

4.3 random effects

m0<-lm(duration~frequency,data=dlmer)                             # ordinary linear regression, no random effects

library(lmerTest)

m1<-lmer(duration~frequency+(1|speakerID),data=dlmer)             # random intercept for each speaker

m2<-lmer(duration~frequency+(1+frequency|speakerID),data=dlmer)   # random intercept and random slope for frequency

Compare the figures produced by the above models with the one based on the raw data.

Clearly, among the three models, m2 best predicts the real data.
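For example, the per-speaker predictions of m2 can be visualized by plotting its fitted values over the raw data; a sketch (fit.m2 is just a temporary column name used for illustration):

#per-speaker fitted lines from m2, overlaid on the raw data points
dlmer$fit.m2 <- fitted(m2)
ggplot(dlmer, aes(x = frequency, colour = factor(speakerID))) +
  geom_point(aes(y = duration), alpha = 0.2) +
  geom_line(aes(y = fit.m2, group = speakerID)) +
  theme_bw() +
  theme(legend.position = "none")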

The magical element is

(1 + frequency | speakerID)

(1|speakerID) says

speakers can have variable baselines in terms of the duration (i.e. random intercept);

(1+frequency|speakerID) says, in addition to random intercept,

the effects of frequency on duration could also vary for different speakers (i.e. random slope).
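Once m2 has been fitted, you can inspect these speaker-specific intercepts and slopes directly:

coef(m2)$speakerID    # per-speaker intercepts and frequency slopes
ranef(m2)$speakerID   # the same, expressed as deviations from the fixed effects
fixef(m2)             # the overall (fixed-effect) intercept and slope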

4.4 the 3 pillars of an lme model

Let’s straighten out some terms.

lmer(duration ~ frequency + (1 + frequency | speakerID), data=dlmer)

The following elements are the building blocks of a mixed-effects model. From left to right, they are:

duration: the dependent variable. To the left of the tilde.

frequency: the fixed effect(s) (also called independent variables). To the right of the tilde, not in brackets.

(1+frequency|speakerID): the random effects. To the right of the tilde, in brackets.

4.5 significance testing: model comparison

In our toy models above, we included only frequency, but real data oftentimes contain more than one variable, and we would like to know whether specific variables are significant. In our case, we would like to know whether variables such as lang, age, and vowel impact the duration of the vowel.

One way to do that is through model comparison.

m2<-lmer(duration~frequency+(1+frequency|speakerID),data=dlmer)
m3<-lmer(duration~frequency+lang+(1+frequency|speakerID),data=dlmer)
anova(m2,m3)
## refitting model(s) with ML (instead of REML)

Notice that, in comparison with m2, m3 has one more fixed effect, lang.

The p value (0.19) is higher than 0.05 (a conventional threshold for hypothesis rejection), meaning that the addition of lang as a fixed factor did not significantly improve the model; we conclude that lang is not a significant predictor of duration.

m3inter<-lmer(duration~frequency*lang+(1+frequency|speakerID),data=dlmer)
anova(m3,m3inter)
## refitting model(s) with ML (instead of REML)

frequency * lang = frequency as a fixed factor + lang as a fixed factor + an interaction between frequency and lang.
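In R’s formula syntax, the * operator is shorthand for the main effects plus their interaction, so m3inter could equivalently have been written with an explicit frequency:lang term (the model name below is only for illustration):

#equivalent to frequency*lang: both main effects plus their interaction
m3inter2 <- lmer(duration ~ frequency + lang + frequency:lang +
                   (1 + frequency | speakerID), data = dlmer)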

4.6 significance testing: t-values

Another way to do significance testing is to read the results of the models.

m4<-lmer(duration~frequency+vowel+(1+frequency|speakerID),data=dlmer)
summary(m4)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: duration ~ frequency + vowel + (1 + frequency | speakerID)
##    Data: dlmer
## 
## REML criterion at convergence: -8.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8906 -0.5885  0.0376  0.6601  2.5157 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev. Corr
##  speakerID (Intercept) 0.29721  0.5452       
##            frequency   0.03013  0.1736   0.01
##  Residual              0.04129  0.2032       
## Number of obs: 800, groups:  speakerID, 40
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   0.12736    0.08761  38.39802   1.454    0.154    
## frequency    -0.19510    0.03152  39.13341  -6.190 2.78e-07 ***
## vowelschwa   -0.24958    0.01521 736.15015 -16.408  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr) frqncy
## frequency   0.018       
## vowelschwa -0.092 -0.124

summary() offers p values based on the t distribution, with degrees of freedom estimated by the Satterthwaite approximation.

Here, for instance, the p value for vowelschwa is < .001, indicating that the duration of the vowel schwa is significantly different from that of the vowel ae.
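If you also want confidence intervals for the fixed effects, confint() works on lmer models; the Wald method below is quick, while the default profile method is more accurate but slower.

#Wald confidence intervals (variance parameters are returned as NA with this method)
confint(m4, method = "Wald")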

~challenge task

Build your own model and test the significance of your fixed effects. Experiment with different random intercepts, random slopes, and interactions.

4.7 beyond this tutorial…

  • Baayen (2008: Ch. 7): Analyzing Linguistic Data

  • Cunnings & Finlayson (2015): Mixed effects modelling and longitudinal data analysis

  • Field et al. (2012: Ch. 19): Discovering Statistics Using R


5 plotting with ggplot2

In the SSANOVA section, we used ggplot2 to generate a graph of formants. In this section we will build simpler graphs, like bar charts and line charts, with a focus on the code itself, in order to understand how ggplot2 works.

5.1 bar plot

Let’s make a bar plot with dssanova.

#take a look at what our data set is like
head(dssanova)

5.1.1 data preparation

Plots with error bars are usually more informative than those without. To have error bars, we need to prepare the data beforehand. Here we will plot bar charts with ±1 standard deviation as error bars. The code below defines a function that calculates the mean and sd for each group.

library(plyr)
#+++++++++++++++++++++++++
# Function to calculate the mean and the standard deviation
  # for each group
#+++++++++++++++++++++++++
# data : a data frame
# varname : the name of a column containing the variable
  #to be summarized
# groupnames : vector of column names to be used as
  # grouping variables
data_summary <- function(data, varname, groupnames){
  require(plyr)
  summary_func <- function(x, col){
    c(mean = mean(x[[col]], na.rm=TRUE),
      sd = sd(x[[col]], na.rm=TRUE))
  }
  data_sum<-ddply(data, groupnames, .fun=summary_func,
                  varname)
  data_sum <- rename(data_sum, c("mean" = varname))
 return(data_sum)
}

Run the function with our dataset.

dssanova.sum<-data_summary(dssanova, varname="Y", groupnames=c("word"))
dssanova.sum

5.1.2 a basic bar plot

Now we can build a basic bar chart.

#plot
library(ggplot2)
p<-ggplot(dssanova.sum, aes(x = word, y=Y)) +  
  geom_bar(stat="identity")+
  geom_errorbar(aes(ymin=Y-sd, ymax=Y+sd))
p  

5.1.3 prettify the plot by adding code at various places…

make the error bars narrower

p<-ggplot(dssanova.sum, aes(x = word, y=Y)) +  
  geom_bar(stat="identity")+
  geom_errorbar(aes(ymin=Y-sd, ymax=Y+sd), width=.4,position=position_dodge(.9))
p 

make the background white

p<-p+theme_bw()
p

add more informative axis titles

p<-p+ylab("formant (Hz)")+xlab("language and formant condition")
p

~challenge task

Make a bar plot where the bars are clustered by formant number, i.e. the bars should appear in the order ei_C_F1, ei_E_F1, ei_C_F2, ei_E_F2. The space between ei_C_F1 and ei_E_F1 (and between ei_C_F2 and ei_E_F2) should be smaller than the space between ei_E_F1 and ei_C_F2.

Answer appears at the end of the tutorial.

5.2 line plot

Bar charts offer some idea of the overall mean, but a line chart showing the trajectory could be more informative.

5.2.1 data preparation

Get mean and sd at each measurement point for each word condition.

dssanova.sumline<-data_summary(dssanova, varname="Y", groupnames=c("word","X"))
dssanova.sumline

5.2.2 a line plot

pp<-ggplot(dssanova.sumline, aes(x = X, y=Y, color=word)) +  
  geom_errorbar(aes(ymin=Y-sd, ymax=Y+sd), width=.4,linetype="solid")+
  geom_line()+
  geom_point()+
  ylab("formant (Hz)")+xlab("position")+
  theme_bw()
pp

5.2.3 panels

We can also make a plot with 2 panels, one comparing F1 in Cantonese and English, the other comparing F2. To do so we summarize the data by the columns ‘language’ and ‘formant’, which code these two factors separately.

dssanova.sumline2<-data_summary(dssanova, varname="Y", groupnames=c("X","language","formant"))
pp<-ggplot(dssanova.sumline2, aes(x = X, y=Y, color=language)) +  
  geom_errorbar(aes(ymin=Y-sd, ymax=Y+sd), width=.4,linetype="solid")+
  geom_line()+
  geom_point()+
  ylab("formant (Hz)")+xlab("position")+
  theme_bw()+
  facet_wrap(~formant,scales="free")
pp

6 answers

There are many ways to do the same thing. Your code doesn’t have to be the same as the code below; as long as it works, things are fine.

6.1 SSANOVA challenge

library(gss)
# this is where we make a change
engF1 <-"ei_E_F2"
canF1 <-"ei_C_F2"

#reuse all the rest of the code, and we'll get a plot for F2

F1 <- subset(dssanova,word %in% c(engF1, canF1))
F1$word <- factor(F1$word, levels = unique(F1$word))


# fit model
ssa.fit2 <-ssanova(Y~word+X+word:X,data=F1)

# get predictions for constructing confidence intervals
X=seq(min(F1$X), max(F1$X), by=0.01)
grid <- expand.grid(X=X, word = c(engF1, canF1))
grid$ssa.fit <- predict(ssa.fit2,newdata = grid,se = T)$fit
grid$ssa.SE <- predict(ssa.fit2,newdata = grid,se = T)$se.fit


ggplot(grid,aes(x = X, colour = word, group = word)) +
        theme_bw() + 
        geom_line(aes(y = ssa.fit),alpha = 1,colour = "grey20", lty=3) +
        geom_ribbon(aes(ymin = ssa.fit-(1.96*ssa.SE), ymax = ssa.fit+(1.96*ssa.SE),fill = word),alpha = 0.5,color="NA") +
        xlab("normalized time") +
        ylab("Formant (Hz)")+
        theme(axis.text = element_text(colour="black", size=15),
                  text = element_text(size=15),
                  title = element_text(size=15),
                  axis.title.x=  element_text(vjust=-0.45),
                  axis.title.y = element_text(vjust=.2),
                  axis.ticks = element_line(colour="black"),
                  axis.line = element_line())

6.2 lme challenge

mchallenge<-lmer(duration~frequency+sex+(1+frequency|speakerID),data=dlmer)
anova(m2,mchallenge)
## refitting model(s) with ML (instead of REML)
mchallenge2<-lmer(duration~frequency*sex+(1+frequency|speakerID),data=dlmer)
anova(mchallenge2,mchallenge)
## refitting model(s) with ML (instead of REML)

frequency * sex = frequency as a fixed factor + sex as a fixed factor + an interaction between frequency and sex

There is a difference between boys’ and girls’ vowel durations, since the inclusion of sex significantly improved the model (m2), but the effect of frequency did not vary between the sexes.

6.3 bar-plot challenge

dssanova.sum2<-data_summary(dssanova, varname="Y", groupnames=c("formant","language"))

p<-ggplot(dssanova.sum2, aes(x = formant, y=Y, fill=language)) +  
  geom_bar(stat="identity",position=position_dodge())+
  geom_errorbar(aes(ymin=Y-sd, ymax=Y+sd), width=.4,position=position_dodge(.9))+
  ylab("formant (Hz)")+xlab("formant")+
  theme_bw()
p

7 acknowledgement

This tutorial has benefited from input from various sources, in particular Sang-im Lee-Kim (on SSANOVA), Albert Lee (on LME), and Josef Fruehwald (style sheet).

All errors are mine.


Congratulations on making it to the end of this tutorial!

You’re now better equipped as a researcher than you used to be.

Off into the wild world of research you go!