As noted in Chapter 1, this thesis is investigating two main points--the effects of the variables in the study on whether or not a given vowel was devoiced, and, for the voiced vowels only, the effects of the variables on the duration of voicing activity. These two points will be dealt with in this and the next chapter.
First it is necessary to ask whether the use of the three computer-controlled display times elicited three adequately distinct sets of data to represent three distinct SRs. This is necessary because of the lack of known previous studies using computer-controlled display times to elicit SR differences. The question of whether or not the program adequately produced 3 distinct data sets will be addressed in the first section of this chapter through rigorous examination of the data generated by the experimental setup. It will be shown that the answer to this question is yes, the program was quite successful.
Further, it is important to determine any effects that the setup of the study had on the data collected. As an example, the participants in the study included both females and males. Previous research (Yuen & Hubbard 1997) has shown that in Japanese, as in other languages, males tend to have faster SRs than females. It is therefore necessary to show that the use of both genders in the study did not compromise the elicitation of the 3 data sets; i.e. that the males did not produce significantly shorter duration tokens than the females even though they were performing the same elicitation task. In addition to checking for effects of the variable gender on mean token durations (MTDs), checks for effects of the variables participant, repetition, block, token, and #dev were also made.
The outline of this chapter is as follows. §5.2 provides detailed descriptive statistics of the token duration data sets for each SR, including discussion of the usability of the data for further statistical checks for effects of variables. §5.3 discusses the setup of the ANCOVA models used to check for effects of the variables controlled for in the study, while §5.4 presents the results of those checks and their interpretation. Finally, §5.5 summarizes the chapter.
As noted in Ch. 1, this chapter and the next two contain background statistical information in addition to the results of the statistical checks. This is for the benefit of those who, like the author, have phonological backgrounds that include little or no exposure to statistics.
This section presents descriptive statistics and distribution characteristics for the data collected. In particular, checks for normalcy (§5.2.1 below) and homogeneity of variance (§5.2.2 below) of the data are important, since the statistical procedures being utilized are based on the assumptions that the normalcy and homogeneity of variance of the data are within acceptable limits.
Statistical checks are often made on the main body of a data set, ignoring data points that are far from the mean value of the data set, the so-called outliers. (A typical criterion for inclusion of a data point is that it fall within 95% of the data points closest to the mean value of the data set.) However, on the advice of Milton (1992: 31-32), outliers were not removed from the data set before measures of normalcy and homogeneity of variance were made; instead statistical values are given both with and without the outliers. The extent of the outliers can be seen in the distribution plot of the entire data set below, where the most extreme outlier, the long production at token duration = 867 ms in the slow data set, lies well outside the range of the rest of the data.

Figure 5.1 Distribution plot of token duration vs. SR; all data points (n = 1800).
The base assumptions underlying the statistical checks will now be presented, focusing on the normalcy and homogeneity of variance of the three data sets.
The first of the base assumptions of most statistical calculations is that the data sets being compared be normally distributed, i.e. that they follow a normal curve, also known as a bell curve. To check normalcy, plots known as histograms are used to show the frequency distribution of the data. For the current data set, the histograms show the distribution of the token duration values. These histograms therefore show how many tokens occur in each equal-width interval of the total range of token durations (note 5-1).
The data are separated into three sets corresponding to the three SRs since the carrier sentence display program artificially imposed three separate SRs on the participants' productions. Figures 5.2 to 5.7 present a separate histogram for each SR, Figures 5.2 to 5.4 for all data and 5.5 to 5.7 with the outliers removed. The axes have been kept constant to show the relative distribution of the data for each SR, and a normal curve has been fitted to each histogram to aid visual inspection.
The size of the groups of data represented by the bars in the graph was determined by a statistical procedure known as Sturges' rule (note 5-2); the entire range of the data is divided into a number of even duration intervals for comparison with the normal curve. For the data sets presented in this chapter, 10 groups were used. (Not all 10 divisions are visible in the first 3 histograms due to the small number of tokens in the data ranges for the very short and very long durations; the number of data points in these lowest and highest groups is so small as to be indistinguishable from the x-axis.)
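As an illustration of how the number of groups follows from Sturges' rule as given in note 5-2, the calculation can be sketched in a few lines of Python. This is only a minimal sketch: the function names and the sample duration range are hypothetical and are not taken from the software used for the study.

```python
import math


def sturges_categories(data_count):
    """Number of histogram categories: truncate(log2(data count)) + 1.

    Note 5-2 states the rule for data counts greater than 16.
    """
    return math.floor(math.log2(data_count)) + 1


def category_edges(lowest, highest, n_categories):
    """Edges of n_categories equal-width intervals covering [lowest, highest]."""
    width = (highest - lowest) / n_categories
    return [lowest + i * width for i in range(n_categories + 1)]


# Each SR data set in this chapter contains 600 tokens.
n_categories = sturges_categories(600)

# Hypothetical duration range in ms, used only to show the equal-width binning.
edges = category_edges(100.0, 600.0, n_categories)
print(n_categories, edges)
```

With a data count of 600 the function gives the 10 groups used for the histograms; the interval edges depend entirely on the range supplied, which here is an assumed one.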

Figure 5.2 Histogram for token duration, slow SR; n = 600 (all data points).

Figure 5.3 Histogram for token duration, normal SR; n = 600 (all data points).

Figure 5.4 Histogram for token duration, fast SR; n = 600 (all data points).

Figure 5.5 Histogram for token duration, slow SR; n = 576 (24 outliers removed).

Figure 5.6 Histogram for token duration, normal SR; n = 579 (21 outliers removed).

Figure 5.7 Histogram for token duration, fast SR; n = 573 (27 outliers removed).
As can be seen from the figures, the data for the normal and fast SRs approximate the fitted normal curves fairly well both with outliers included and outliers removed. The data sets for token duration are therefore considered to fulfill the requirement of normalcy for statistical tests.
As a side note, it can also be seen that the elicitation program was successful in targeting a different mean token production at each SR--the center of the fitted normal curve shifts toward a shorter duration value from the slow to the normal to the fast SR. The values associated with the centers of the fitted normal curves, averaged across all data points for all 10 tokens used in the study (Figures 5.2 to 5.4), are 472 ms for the slow SR, 356 ms for the normal SR, and 273 ms for the fast SR. These values represent 12%, 13% and 16% of the total sentence display times, respectively--approximately an equal percentage of the total sentence display time, indicating that the change in SR resulted in approximately the same amount of time being spent on producing the tokens at the beginning of the stimuli sentences.
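The percentages quoted above can be checked with simple arithmetic. The following Python sketch uses the mean token durations given in this section together with the sentence display times reported later in this chapter (4000 ms, 2700 ms and 1700 ms for the slow, normal and fast SRs); the variable and function names are illustrative only.

```python
# Mean token durations (ms) and sentence display times (ms) as reported in this chapter.
mean_token_duration_ms = {"slow": 472, "normal": 356, "fast": 273}
display_time_ms = {"slow": 4000, "normal": 2700, "fast": 1700}


def share_of_display_time(sr):
    """Return the MTD as a percentage of the sentence display time for one SR."""
    return 100 * mean_token_duration_ms[sr] / display_time_ms[sr]


# Round to whole percentages, as in the text.
shares = {sr: round(share_of_display_time(sr)) for sr in mean_token_duration_ms}
for sr, pct in shares.items():
    print(f"{sr}: {pct}%")
```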
The more important base assumption of most statistical tests is that the data in different groups being compared display homogeneity of variance--the data of each group being compared should be distributed about the mean of that group in approximately the same way. That is to say, in order to make observations about the differences in the mean values for groups of data, the distribution of values about those means should be about the same. The variance of a data set is a measure of this distribution of the values about the mean; i.e. a measure of how the data vary about the mean value for the data set. The variances of the three data sets corresponding to the three SRs are given below, along with other statistical measures.
Table 5.1 Summary statistics for the token duration data, split by SR; all data points.
| | fast | normal | slow |
Table 5.2 Summary statistics for the token duration data, split by SR; outliers removed.
It is important to note that both the range of values of each data set and the distribution of data throughout each range (as indicated by the variance) increase from the fast to the normal to the slow data set.
The relationship between the mean and variance of the three subsets of data can be seen more clearly in what is known as a box plot. Each box indicates the main body of data points for each data set; the line through the center of each box is the mean value of that data set. The outer bars of each box give the standard deviation, while the extreme values, the outliers, lie above and below each box. The height of each box and range of the outliers therefore indicate the distribution of the data in each set.

Figure 5.8 Box plot for token duration, separated by SR.
It can be seen that the mean of each data set increases as SR decreases; i.e. the MTD of each data set increased as the presentation time of the stimuli sentences increased. The fast data set, corresponding to a sentence presentation time of 1700 ms, had an MTD of 273 ms; the normal data set, corresponding to a presentation time of 2700 ms, had an MTD of 356 ms; and the slow data set (presentation time of 4000 ms) had an MTD of 472 ms. The increase in MTD roughly parallels the increase in sentential display duration.
More importantly, the range of most of the data in the three subsets, indicated by the height of the boxes and the spread of the points outside the boxes, again shows an increase in variance as the SR decreases (i.e. the range of the data is greatest for the slow SR). Gravetter & Wallnau (1996) state that for large samples (n > 10), there is reason for concern if the variance of one set of data is more than twice the variance of another. Following this rule of thumb, non-homogeneity is a problem for the 3 SR data sets for token duration. As can be seen from the tables above, the variance of the slow group of data is about three times the variance of the fast data, and the variance of the normal group of data is over twice the variance of the fast data for both the full set of data and the data with the outliers removed.
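The rule of thumb applied here (concern if one variance is more than twice another) is easy to express as a check over every pair of groups. The Python sketch below is a minimal illustration only: the function names are hypothetical, and the durations are invented toy values, not the data collected for this study.

```python
from itertools import combinations
from statistics import variance


def variance_ratios(groups):
    """Return {(larger, smaller): ratio} of sample variances for every pair of groups."""
    variances = {name: variance(values) for name, values in groups.items()}
    ratios = {}
    for a, b in combinations(variances, 2):
        larger, smaller = (a, b) if variances[a] >= variances[b] else (b, a)
        ratios[(larger, smaller)] = variances[larger] / variances[smaller]
    return ratios


def flagged_pairs(groups, limit=2.0):
    """Pairs of groups whose variance ratio exceeds the rule-of-thumb limit."""
    return [pair for pair, ratio in variance_ratios(groups).items() if ratio > limit]


# Toy durations in ms (hypothetical values, NOT the thesis data).
toy = {
    "fast": [250, 260, 270, 280, 290],
    "normal": [330, 345, 355, 365, 385],
    "slow": [420, 450, 470, 495, 525],
}
print(flagged_pairs(toy))
```

Any pair returned by `flagged_pairs` would, under this rule of thumb, call the assumption of homogeneity of variance into question.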
These differences can also be seen in the different shapes of the fitted normal curves in Figures 5.2 to 5.7 above. The shape of the fitted normal curve for the slow data is much broader than the shape of the fitted normal curve for the fast data, indicating that the slow data is distributed throughout its range farther away from the data set mean. In addition, the width of the bars in the graph also indicates the overall range, since they are equal to 1/10th of the total range. The bars are much wider at the slow SR than at the fast, again indicating a wider distribution of data at the slow SR. The differences can be seen just as well in the box plot in Figure 5.8 in the increasing height of the boxes, which also shows the greater range of the data at the slow SR.
The statisticians consulted for this project pointed out that this increase in variance indicates a relationship between mean token duration and variance--variance increases along with mean token duration. This relationship is thought to be due to the slower SR allowing the participants more freedom in adjusting their sentence production to the display duration of the target sentence. Since the tokens were the first words in all stimuli sentences, they may have been more subject to variation than other parts of the sentence, with 'fine-tuning' adjustments coming later in the production after the participant had time to see whether their initially chosen SR for a given production would actually match the display duration or not.
A relation like this where the variance changes along with the dependent variable often indicates a logarithmic relationship between the two variables (Abacus Concepts 1994: 305). This type of relationship is quite commonly found in the natural sciences, where a period of slow increase is followed by rapid growth or vice versa. The following figure represents this type of relationship.

Figure 5.9 Representation of a logarithmic relationship between two variables; linear y-axis.
In order to show the relationship of the two variables represented on the x- and y-axis (in the example figures, variables A and B; in the current data set, variables SR and token duration), the data can be presented on a plot that has a logarithmic axis. In this type of plot, the logarithmic increases are accounted for by adjusting the scaling of the axis. This is equivalent to plotting the logarithmically adjusted values on a linear y-axis, but it preserves the units (i.e. the units for the current data set will still be in ms, rather than in logarithm values). The example plot given above would look something like the following when presented on a plot with a logarithmic y-axis.

Figure 5.10 Representation of a logarithmic relationship between two variables; logarithmic y-axis.
On the consultants' advice, and as recommended in Abacus Concepts (1994: 305), the data was adjusted with the formula adjusted value = log (token duration). Descriptive statistics were then generated to check the variance of the adjusted values. The variance of the three SRs is given again in the table below, along with the variance of the log adjusted values for comparison.
Table 5.3 Descriptive statistics of the data, before and after logarithmic transformation.
| | duration fast | duration normal | duration slow |
| Mean | | | |
| SD | | | |
| Variance | | | |
| Mean (log trans.) | | | |
| SD (log trans.) | | | |
| Variance (log trans.) | | | |
The adjustment of the variance by the log transformation can be seen in the last row of the table. Whereas the linear data showed a continually increasing mean and variance as the SR decreased, the log transformed data showed a more linear increase in mean and an increase and then decrease in variance as the SR decreased. The largest variance, that of the normal SR, is no longer twice as large as the variance of the other two data subsets.
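Why a log transformation can even out variances is perhaps easiest to see in an idealized case. The Python sketch below assumes, purely for illustration, that the spread of each data set is exactly proportional to its mean: each group is built by multiplying the MTD reported in this chapter by the same set of hypothetical factors. Base 10 logarithms are also an assumption, since the base of the logarithm is not specified above; the choice of base does not affect the ratio of variances.

```python
import math
from statistics import variance

# Hypothetical multiplicative factors shared by all groups (not measured values).
factors = [0.80, 0.90, 1.00, 1.10, 1.20]

# MTDs (ms) reported in this chapter for the three SRs.
group_means_ms = {"fast": 273, "normal": 356, "slow": 472}

# Idealized raw durations: spread proportional to the group mean.
raw = {sr: [m * f for f in factors] for sr, m in group_means_ms.items()}

# adjusted value = log(token duration); base 10 is assumed here.
logged = {sr: [math.log10(x) for x in xs] for sr, xs in raw.items()}

raw_var = {sr: variance(xs) for sr, xs in raw.items()}
log_var = {sr: variance(xs) for sr, xs in logged.items()}

raw_ratio = raw_var["slow"] / raw_var["fast"]
log_ratio = log_var["slow"] / log_var["fast"]
print(raw_ratio, log_ratio)
```

In this idealized case the raw variances differ by the square of the ratio of the means, while the variances of the log values are identical. The actual data do not behave this neatly, as the unequal variances remaining after transformation show.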
Again, the information is perhaps easiest to see in the form of a box plot. The following box plot shows the same data given in Figure 5.8 plotted on a logarithmic y-axis.

Figure 5.11 Box plot for token duration at each SR; logarithmic axis.
The height of the boxes and the spread of the data points outside the boxes are now fairly uniform, indicating the variance of the three log-transformed data subsets is much more uniform (i.e. the distribution of the data in the three subsets is much more homogeneous).
However, the transformation is by no means completely satisfactory, either. If there were a true logarithmic relationship between the 3 data subsets, the variance of all 3 sets should be fairly equal. The larger variance of the normal data subset suggests that some other more complex relationship is at work here.
Also, it is suspected that this logarithmic increase in variance will not hold as SR slows even further. The most likely scenario is that once the stimuli sentences are displayed for a long enough duration, the differing variation in the distribution of the token duration data will level out for data collected at even longer stimuli sentence display durations. That is, while the token durations will continue to fluctuate around the mean value within a given range, after a certain slow SR is reached the range over which the data fluctuate is expected to remain fairly constant as display duration is further increased. The data is therefore suspected to be best represented by some complex relationship (i.e. logarithmic up to a certain slow SR, then plateauing out to some fairly constant level of variance) that will need to be investigated in further research.
For the present study, there are two courses that further analysis can take. First, further statistical checks can be made on the logarithmically adjusted data, either with or without the inclusion of the outliers. This would ensure that the statistics generated by those tests would not have a hidden error due to differing variances of the data subsets. A second option, however, is to simply analyze all of the raw data and consider it unlikely that the differences in variance will cause an erroneous statistic to be generated.
Rosenthal & Rosnow (1991) state that while homogeneity of variance is an important assumption of ANOVA tests, the likelihood of erroneous results stemming from non-homogeneous variances is minimized by using samples of the same size. In their words,
"Only if the two population variances are very different and if the two sample sizes are very different is the violation of this assumption [homogeneity of variance] likely to lead to serious consequences."
Rosenthal & Rosnow (1991: 315); emphasis in the original
In addition, with the possible exception of the most extreme outlier in the slow data set, none of the productions are by any means performance errors; all productions were normal-sounding utterances. Their varying duration is simply a function of the inherent variability of the speech act. Moreover, the differences in variance are caused by only one factor in the experimental setup, the factor of SR; all other factors are the same for all 3 data sets. Notably, the effects of this factor are very large for both the raw data and the transformed data; the only differences in preliminary investigations using both raw and transformed data were in the effects of other, less significant factors.
The second option is therefore adopted in this thesis. The untransformed data will be used for all statistical checks and the plots showing the relationships being discussed. In the interests of completeness, however, the statistical checks were run a second time on the data with outliers removed, outliers again being defined as those points falling outside the 95% of the data closest to the mean of that data set. Both results will be included in the discussion of the results.
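The outlier criterion described in this chapter (retaining the 95% of the data points closest to the mean of a data set) can be sketched as follows. This is a literal reading of that criterion only, with a hypothetical function name and toy values; a fixed 95% cut of 600 tokens would remove 30 points, whereas Figures 5.5 to 5.7 report 24, 21 and 27 removed, so the procedure actually used evidently differed in detail and is not reproduced here.

```python
from statistics import mean


def split_outliers(values, keep_fraction=0.95):
    """Return (kept, outliers), keeping the keep_fraction of points closest to the mean."""
    center = mean(values)
    # Rank the points by their distance from the mean, closest first.
    ranked = sorted(values, key=lambda v: abs(v - center))
    n_keep = round(len(values) * keep_fraction)
    return ranked[:n_keep], ranked[n_keep:]


# Toy durations in ms: 19 hypothetical values plus the 867 ms production
# mentioned in this chapter as the most extreme outlier.
toy = [400 + 5 * i for i in range(19)] + [867]
kept, outliers = split_outliers(toy)
print(len(kept), outliers)
```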
This section details the effects that the variables to be discussed had on the mean token durations (MTDs), and the interaction among those variables (note 5-3). Significant interactions will be judged at the 95% level of confidence.
Due to the exploratory nature of this investigation, it was decided to include multiple checks for various combinations of variables. Because of the relationship between participant and gender, checks could not be made on these two variables at the same time. The gender of the participants did not change during the study (and to the best of our knowledge still has not) so no comparisons involving both variables could be made.
Effects of the inherent segmental material in the tokens and the following syntactic clitic are not directly included in the following model. While there may well be interaction between the segmental material of the tokens themselves, the location of that material in either the first or second mora, and the syntactic clitic following a token (either ka or to), the interaction among all of these factors is quite difficult to interpret in the current study because the experimental setup did not adequately control these factors--the tokens used in the study did not cover all possible combinations of CV material in each mora and with each clitic. Therefore the only check that will be made involving segmental material will be for the factor token. All of the segmental effects mentioned above are therefore combined in this one variable. The interaction among the component effects will be left for further, more controlled research.
Since the number of devoiced vowels is suspected to have a significant effect on mean token duration, the variable #dev was included in the models as well. As mentioned in §4.4, #dev is actually a regressor in these models, and the models themselves therefore become analyses of covariance (ANCOVA), reflecting the fact that two dependent variables are being checked for effects on each other.
Finally, as was discussed in Chapter 3, because accented vowels are regularly being devoiced in the Tokyo dialect, and because the decision to devoice an accented vowel or to manipulate the accent location to avoid devoicing an accented vowel appears to be in free variation, the variable pitch was not included in the models.
The ANCOVA model constructed by the statistical consultants for this project was therefore expanded to include both gender and participant. The two models run are as follows:
token duration = SR * token * #dev + participant * block + participant * repetition (block)
token duration = SR * token * #dev + gender * block + gender * repetition (block)
where SR * token * #dev indicates all 3 variables by themselves and all possible combinations, participant * block indicates those two variables by themselves and the interaction between them, and participant * repetition (block) indicates those two variables by themselves and the interaction between them where repetition is nested within block (i.e. 3 repetitions were produced in each of the two repetition blocks).
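The expansion of a crossed term such as SR * token * #dev into its component effects can be listed mechanically. The Python sketch below is illustrative only: the function name and the "x" notation for interactions are assumptions made for this example rather than the syntax of the statistical package used, and nesting, as in repetition (block), is not handled.

```python
from itertools import combinations


def expand_crossed(*factors):
    """Expand 'A * B * C' into every main effect and every interaction term."""
    terms = []
    for size in range(1, len(factors) + 1):
        # Each combination of `size` factors is one term; size 1 gives the main effects.
        for combo in combinations(factors, size):
            terms.append(" x ".join(combo))
    return terms


fixed_terms = expand_crossed("SR", "token", "#dev")
for term in fixed_terms:
    print(term)
```

For three crossed variables this yields seven terms: the three variables by themselves, the three two-way interactions, and the single three-way interaction.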
SR, token and #dev were grouped together since these are what are known as fixed effects, effects that are controlled in or defined by the experiment--an investigation is being made into the effects of these particular 3 SRs, these particular 10 tokens, and these particular 3 numbers of devoiced vowels (0, 1 or 2). On the other hand, participant and gender are combined with repetition and block because these are what are known as random effects, effects that are not specifically controlled by the experiment--an investigation is not being made into the effects of these particular 10 participants, but we wish to extend the results to the larger population of young Japanese speakers of the Tokyo dialect. Similarly, we are not interested in the effects of the 3 repetitions in each of these particular 2 repetition blocks, but we wish to extend the results to any number of such repetition blocks.
All models were run as multivariate repeated measures models, since token durations were measured 6 times (3 repetitions in each of 2 blocks) for each participant in each experimental condition.
The following sections will discuss the significant effects found, their interpretation, and their implications for further research.
1. While statistical measures such as the skewness and kurtosis of the data curve and the modified Kolmogorov-Smirnov (K-S) test are available for quantitatively checking normalcy, the statisticians consulted for this project were of the opinion that visual checks for approximation to the normal curve are sufficient due to the robust nature of the ANOVA and ANCOVA procedures. Statistical measures of normalcy are therefore not presented.
2. Sturges' rule (Sturges 1926: 65-66; as quoted by Milton 1992: 18-19) is based on the base 2 logarithm of the data count for data counts greater than 16:
# of categories = truncate (log2 (data count)) + 1
i.e. the next highest integer above the whole number component of the base 2 exponent for the number of data points. For the current data sets the data count = 600, so the number of categories is given by truncate (log2 (600)) + 1 = truncate (9.2) + 1 = 10. The range of data for each of the three data sets was therefore divided into 10 equal duration ranges, and counts of data points were made within each duration range.
3. The examination of the effects of any two or more independent variables must begin with their interaction. For example, if one studies the effects of air temperature on human discomfort without taking into account humidity, it might seem as if people living in Tokyo are much more sensitive to heat than those living in Phoenix. Air temperature and humidity interact in their effect on discomfort; i.e. how much one affects human discomfort depends on the value of the other. The effects of heat must therefore be studied within the contexts of humidity and vice versa.