11.3 Stress


Stress is the variation in loudness that differentiates strong and weak syllables, a variation that in turn characterizes a word's spoken identity (Crystal 1995). Thus the word "originality" can only be the word "originality" if primary stress falls on the "nal" syllable. If we stress the "lit" syllable instead, the word no longer means "originality." The word "produce" means two very different things depending on which syllable is stressed: either the verb that is synonymous with "create" or the noun that refers to fruits and vegetables.

Stress also applies to longer utterances. For example, in the question "Does he like it?" sentence-level stress falls on the verb "like." The use of stress at both the word and the sentence level is critical for intelligibility. Non-native speakers of English who fail to master the stress patterns of this language may have a hard time making themselves understood.

The chief function of sentence-level stress is to signal informational prominence, or focus. The focal point of an utterance is the piece of information the speaker intends the listener to consider "new" and to pay special attention to. By default, in English we focus on the last open-class element of the sentence (as discussed in Chapter 10). The default stress position in "Does he like it?" is therefore on "like," because this is the last open-class element of the question; "it" is a pronoun and therefore a closed-class item. For further illustration of this default stress principle and how it can be overridden, consider example (4).

(4)

Somebody must have TAKEN it.


The last open-class item in this sentence is the participle "taken," and so it receives stress by default. Less commonly, however, other parts of the sentence may also host prominence, as in (5) and (6), where stress functions contrastively.

(5)

Somebody MUST have taken it. [It's no use your arguing.]


(6)

SOMEbody must have taken it. [. . . even if you didn't.]


This is known as contrastive stress, which refers to the way that people override default accent patterns to highlight any word (or portion of a word) they please. Unlike default stress, contrastive stress can even fall on closed-class items, which are not stressed under normal (default) conditions: for example, "and" in "John AND Mary went? Together?", "un" in "I thought she looked UNhappy," and "the" with a long e in "The Four Seasons is THE place to stay on the Big Island."
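
The default rule and its contrastive override can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CLOSED_CLASS` word list is a hand-written stand-in for a real part-of-speech tagger, the function names are hypothetical, and the whole word is capitalized rather than only its stressed syllable.

```python
# Hypothetical, hand-written list of closed-class words (pronouns,
# auxiliaries, articles, conjunctions). A real system would use a tagger.
CLOSED_CLASS = {
    "somebody", "must", "have", "it", "does", "he", "she",
    "is", "that", "the", "a", "an", "and", "or", "to", "of",
}


def stress_index(words, contrast=None):
    """Return the index of the word that carries sentence-level stress."""
    if contrast is not None:
        # Contrastive stress overrides the default and may fall anywhere,
        # even on a closed-class item.
        return contrast
    # Default: scan from the end for the last open-class word.
    for i in range(len(words) - 1, -1, -1):
        if words[i].lower().strip('?.!,"') not in CLOSED_CLASS:
            return i
    return len(words) - 1  # fallback if every word is closed-class


def mark_stress(sentence, contrast=None):
    """Capitalize the stressed word, as in the numbered examples."""
    words = sentence.split()
    i = stress_index(words, contrast)
    words[i] = words[i].upper()
    return " ".join(words)


print(mark_stress("Somebody must have taken it."))
print(mark_stress("Somebody must have taken it.", contrast=1))
```

With no `contrast` argument the sketch reproduces the default pattern of (4); passing the index of "must" reproduces the contrastive pattern of (5).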

Users benefit significantly when you use contrastive stress appropriately in your VUI. Consider example (7).

(7)

SYSTEM:

I heard "Tennessee." Is that RIGHT?

CALLER:

No.

SYSTEM:

What about "Texas"? Is THAT right?


"Is that RIGHT?" expresses the default stress pattern; stress falls on the last open-class item of the sentence, which is the adjective "right." In the second prompt ("Is THAT right?"), stress falls on the pronoun, which is a closed-class item, for contrast. By shifting stress from "right" to "that" as the dialog progresses, the VUI's persona effectively gives the impression that he or she is keeping track of the conversation, as in authentic, human-to-human conversations. The appropriate use of contrastive stress endows the persona with humanlike linguistic intelligence as well as an attentive character.

In contrast, there are VUIs that repeat the same prompt, such as "Is this correct?" several times in a short sequence, as many times as there are confirmations needed. This context-insensitive repetition is unnatural and prosodically unfamiliar to users. Reminiscent of a broken record, it detracts from our perception of the persona.
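
One way to avoid the broken-record effect is to choose the confirmation prompt according to how many candidates have already been offered. The Python sketch below is a minimal illustration, not a prescribed implementation: the function name and the `attempt` counter are assumptions, and in a deployed system each returned string would correspond to a separately recorded prompt with the stress pattern shown in capitals.

```python
def confirmation_prompt(candidate, attempt):
    """Return the confirmation wording for the given attempt (0-based).

    The first confirmation uses default stress on "right"; later ones
    shift stress to "that" to contrast with the rejected candidate.
    """
    if attempt == 0:
        return f'I heard "{candidate}." Is that RIGHT?'
    return f'What about "{candidate}"? Is THAT right?'


for attempt, state in enumerate(["Tennessee", "Texas"]):
    print(confirmation_prompt(state, attempt))
```

Run on the two candidates of example (7), the loop produces the two system prompts shown there.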

When attention is not paid to linguistic context, the resulting prompts may contain stress anomalies, as in (8).

(8)

CALLER:

Get my MESsages.

SYSTEM:

You have no new MESsages.


The system response in (8) stresses "messages," as if this were established by the context as the new focal point, but it is not. Compare prosodically anomalous (8) with the more conversational, prosodically well-formed (9).

(9)

CALLER:

Get my MESsages.

SYSTEM:

You don't HAVE any messages.


In destressing "messages," the system response is prosodically appropriate. It appears to be keeping track of the conversation in a humanlike way, as a dynamic flow of information, and this is where the system in (8) fails. Whereas (8) simply issues a context-insensitive translation of the backend return "N equals zero," the context-sensitive message in (9) further manages the caller's assumption that there are messages to be retrieved, by shifting the informational focus to the negated verb of having.

A similar stress violation occurs in a bill payment application in (10).

(10)

CALLER:

Bill payment.

SYSTEM:

What bill would you like to PAY?

CALLER:

MasterCard.

SYSTEM:

How much would you like to PAY?

CALLER:

Two hundred and fifteen dollars.

SYSTEM:

On what date would you like it PAID?


Participating in this dialog, over the telephone, a user is likely to think, "Something sounds stilted or unnatural, but what exactly is it?" When we read the transcript of this dialog, however, the chief culprit leaps off the page, thanks to the use of capital letters. The verb "pay" bears sentence-level stress in every utterance in which it appears, one after the other, as if it were the new informational focus of each prompt.

The dialog sounds better if "pay," after it makes its debut, is prosodically defocused (in other words, destressed). There are several other changes that can make the resulting dialog seem more natural, as shown in (11).

(11)

CALLER:

Bill payment.

SYSTEM:

Sure, bill payment. What bill would you like to PAY?

CALLER:

MasterCard.

SYSTEM:

How MUCH would you like to pay?

CALLER:

Two hundred and fifteen dollars.

SYSTEM:

And on what DATE would you like it paid?
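
The defocusing in (11) can be approximated by tracking which words have already been introduced in the dialog. The following Python sketch is an illustration under stated assumptions: the candidate focus words for each prompt are supplied by hand, the `LEMMAS` table is a hypothetical stand-in for real morphological analysis, and the function name is invented for this example.

```python
# Hypothetical lemma table so that "paid" counts as a repeat of "pay".
LEMMAS = {"paid": "pay"}


def lemma(word):
    """Map a word to its base form using the hand-written table."""
    w = word.lower()
    return LEMMAS.get(w, w)


def choose_focus(candidates, given):
    """Return the last candidate that is new to the dialog.

    `candidates` lists the words that could host stress in one prompt,
    in order of occurrence. `given` is the set of lemmas already
    introduced; it is updated so later prompts treat them as old.
    """
    new = [w for w in candidates if lemma(w) not in given]
    focus = new[-1] if new else candidates[-1]
    given.update(lemma(w) for w in candidates)
    return focus


given = set()
prompts = [["pay"], ["much", "pay"], ["date", "paid"]]
foci = [choose_focus(c, given) for c in prompts]
print(foci)
```

For the three system prompts of (11), the sketch selects "pay," then "much," then "date," leaving the repeated verb destressed after its first appearance.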


To this point, our examples have featured cases where we would naturally expect to find contrastive stress, but because the messages have been recorded with little or no attention to context, contrastive stress is inappropriately lacking. In the next example, however, we find the opposite scenario. Contrastive stress seems natural in the context of the recording session, but it turns out to be inappropriate in the context in which the caller will experience the recording. In the context of reporting the time of day in natural, everyday conversation, it is the "M" of "AM" and "PM" that is stressed, as shown in (12).

(12)

EIGHT ay-EM

FOUR pee-EM


In this context, "A" and "P" are weak. The strong syllables are those at the edge (the beginning and end) of each phrase, that is, the hour and "M." In the recording studio, however, voice actors sometimes stress "AM" and "PM" contrastively (13).

(13)

AY-em

PEE-em


As a result, the stress pattern in (13) would sound appropriate only in the context of questioning or clarifying "AM" versus "PM," as in, "Did you just say 'AY-em' or 'PEE-em'?" or "The flight isn't leaving at six AY-em; it's leaving at six PEE-em." Unfortunately, this stress pattern is unsuitable for reporting the time in a neutral way. Instead of prosodically well-formed (12), what we hear is anomalous "EIGHT | AY-em" and "FOUR | PEE-em," where the vertical bar ( | ) indicates a concatenation break.

The case of "AM" and "PM" is only one example of a commonplace prosodic violation in concatenated messages that can be explained as an artifact of inadequate scripting. That is, concatenation units are often recorded in simple list format, without the benefit of contextual cues. Whoever is recording prompts, however, should be aware that lists are subject to their own prosodic patterns, which are not always compatible with the intended use in the dialog. The concatenation result is prosodically ungrammatical, meaning that it sounds strange.

Often, the use of contrastive stress also affects the prosody of numbers. For example, when we count, we naturally stress the unit preceding "teen," as in (14). In other words, we count contrastively.

(14)

. . . THIRteen, FOURteen, FIFteen, SIXteen, SEVenteen, EIGHTeen, NINEteen . . .


Whether or not this stress pattern is appropriate in a specific dialog context, many voice actors will read a list of numbers on a script in just this way if they do not know the dialog context. This is only natural. Apart from such number lists, however, stress may fall on either the first or the second half of each item, depending on the specific syntactic context. Compare the stress pattern of numbers not followed by nouns in (15) with numbers followed by nouns in (16).

(15)

Message fourTEEN.

Today is June fifTEENTH.

Today's high will be twenty-two degrees Fahrenheit, with a low of eighTEEN.

The New York Mets defeated the San Diego Padres twenty to thirTEEN.


(16)

You have SEVenteen messages.

The refund amount is THIRteen dollars and twenty-eight cents.

It's NINEteen degrees below zero, with the wind-chill factor of . . .

. . . a SIXteen percent increase.


In (15), stress falls on the last syllable of the numbers ("-TEEN"), whereas in (16), it falls on the first syllable (e.g., "SEV-"). This is known as stress shift. It is beyond the scope of this chapter to provide a formal analysis of this phenomenon, but in short, the patterns in (15) and (16) reflect the preference in English for stress at the edges of certain kinds of syntactic units (Selkirk 1995). The phrase "THIRteen-year-old GIRL" satisfies this preference, whereas "thirTEEN-year-old GIRL" violates it at the left edge and is therefore prosodically anomalous.[3]

[3] This principle is superseded, however, when stress is required for contrast: for example, ". . . with a high of NINEteen and a low of THIRteen" and "The score was SIXteen to FOURteen."
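
In a concatenative system, stress shift implies keeping two recordings of each "teen" number and selecting between them at run time. The Python sketch below shows one possible selection rule, assuming a hypothetical `TEENS` table of recording labels and a caller that knows whether a noun follows or whether the number is being contrasted; it is not a formal analysis of the phenomenon.

```python
# Hypothetical labels for two recordings of each number:
# (initial stress, final stress).
TEENS = {
    13: ("THIRteen", "thirTEEN"),
    14: ("FOURteen", "fourTEEN"),
    15: ("FIFteen", "fifTEEN"),
    16: ("SIXteen", "sixTEEN"),
    17: ("SEVenteen", "sevenTEEN"),
    18: ("EIGHTeen", "eighTEEN"),
    19: ("NINEteen", "nineTEEN"),
}


def teen_variant(n, followed_by_noun, contrastive=False):
    """Pick the recording whose stress suits the syntactic context.

    Initial stress is used before a noun, as in (16), and when the
    number is contrasted with another, as in the footnote. Otherwise
    final stress is used, as in (15).
    """
    initial, final = TEENS[n]
    return initial if (followed_by_noun or contrastive) else final


print("Message " + teen_variant(14, followed_by_noun=False) + ".")
print("You have " + teen_variant(17, followed_by_noun=True) + " messages.")
```

The two printed lines correspond to the first items of (15) and (16), respectively.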

Stress shift is an example of a purely structural principle of prosody that is often violated when speech is concatenated. You must always consider the context surrounding units such as "a.m." and "seventeen." To avoid creating prosodic anomalies, you should always contextualize concatenation units on the recording script for the director and voice actor (see Chapter 17).
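
One way to contextualize units is to generate the recording script from carrier phrases instead of a bare list. The Python sketch below is a minimal illustration: the carrier phrases are borrowed from (15) and (16), while the context labels, the line format, and the bracket convention marking the portion to be extracted are all assumptions made for this example.

```python
# Hypothetical carrier phrases, one per prosodic context.
CARRIERS = {
    "final": "Message {unit}.",
    "prenominal": "You have {unit} messages.",
}


def script_lines(unit, carriers=CARRIERS):
    """Return one script line per context, with the unit in brackets.

    The brackets show the voice actor and director which portion of
    the recording will be extracted as the concatenation unit.
    """
    return [
        f"{unit}_{label}: " + carrier.format(unit=f"[{unit}]")
        for label, carrier in carriers.items()
    ]


for line in script_lines("seventeen"):
    print(line)
```

Each unit then appears on the script once per context, so the voice actor reads it with the stress pattern that context requires rather than with list intonation.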



Voice User Interface Design 2004
ISBN: 321185765
EAN: N/A
Year: 2005
Pages: 117
