Text to Speech Pronunciation Instructions

<break>

Controls pausing or other prosodic boundaries between words. Using <break> between any pair of tokens is optional. If this element is not present between words, the break is automatically determined based on the linguistic context.

Attribute   Description
time        Sets the length of the break in seconds or milliseconds (e.g. “3s” or “250ms”).
strength    Sets the strength of the output’s prosodic break in relative terms. Valid values are: “x-weak”, “weak”, “medium”, “strong”, and “x-strong”. The value “none” indicates that no prosodic break boundary should be output, which can be used to prevent a prosodic break that the processor would otherwise produce.

Example

<speak>
  Step 1, take a deep breath. <break time="200ms"/>
  Step 2, exhale.
  Step 3, take a deep breath again. <break strength="weak"/>
  Step 4, exhale.
</speak>
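
The time and strength rules for <break> can be checked before markup is sent to a speech service. The following is a minimal sketch, not part of any SDK: the make_break helper and its names are hypothetical, and accepting decimal values such as "1.5s" is an assumption that this page does not confirm.

```python
import re

# Strength values listed in this section, including "none".
BREAK_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}

# A number followed by "s" or "ms", e.g. "3s" or "250ms".
# Allowing a decimal part is an assumption, not something this page states.
_TIME_RE = re.compile(r"^\d+(\.\d+)?(ms|s)$")


def make_break(time=None, strength=None):
    """Return a <break/> tag string (hypothetical helper)."""
    attrs = []
    if time is not None:
        if not _TIME_RE.match(time):
            raise ValueError(f"invalid break time: {time!r}")
        attrs.append(f'time="{time}"')
    if strength is not None:
        if strength not in BREAK_STRENGTHS:
            raise ValueError(f"invalid break strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    # With no attributes, the processor chooses the break from context.
    return "<break " + " ".join(attrs) + "/>" if attrs else "<break/>"
```

Calling make_break(time="200ms") yields the same tag used in the example above, while a value such as strength="loud" raises ValueError instead of producing invalid markup.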

<say-as>

This element lets you indicate information about the type of text construct that is contained within the element. It also helps specify the level of detail for rendering the contained text.

The <say-as> element has one required attribute, interpret-as, which determines how the value is spoken. The optional attributes format and detail may be used depending on the particular interpret-as value.

Example

cardinal

The following example is spoken as “Twelve thousand three hundred forty five” (for US English) or “Twelve thousand three hundred and forty five” (for UK English):

<speak>
  <say-as interpret-as="cardinal">12345</say-as>
</speak>

ordinal

The following example is spoken as “First”:

<speak>
  <say-as interpret-as="ordinal">1</say-as>
</speak>

characters

The following example is spoken as “C A N”:

<speak>
  <say-as interpret-as="characters">can</say-as>
</speak>

date

The format attribute is a sequence of date field character codes. Supported field character codes in format are {y, m, d} for year, month, and day (of the month) respectively. If the field code appears once for year, month, or day, then the number of digits expected is 4, 2, and 2 respectively. If the field code is repeated, then the number of expected digits is the number of times the code is repeated. Fields in the date text may be separated by punctuation and/or spaces.

The detail attribute controls the spoken form of the date. For detail="1", only the day field and one of the month or year fields are required, although both may be supplied. This is the default when fewer than all three fields are given. The spoken form is “The {ordinal day} of {month}, {year}”.

The following example is spoken as “The tenth of September, nineteen sixty”:

<speak>
  <say-as interpret-as="date" format="yyyymmdd" detail="1">
    1960-09-10
  </say-as>
</speak>
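
The digit-count rule for the format attribute can be illustrated with a short sketch. The expected_digits and split_date functions below are hypothetical names introduced for illustration. They assume that every non-digit character in the date text is a separator. They do not reproduce how any particular speech processor parses dates.

```python
import re

# Digit counts when a field code appears once: year 4, month 2, day 2.
DEFAULT_DIGITS = {"y": 4, "m": 2, "d": 2}


def expected_digits(fmt):
    """Return [(field_code, digit_count), ...] for a format such as "yyyymmdd"."""
    fields = []
    for match in re.finditer(r"y+|m+|d+", fmt):
        run = match.group(0)
        # A single code uses the default width; a repeated code uses its length.
        count = DEFAULT_DIGITS[run[0]] if len(run) == 1 else len(run)
        fields.append((run[0], count))
    return fields


def split_date(text, fmt):
    """Split date text into {field_code: digits} according to fmt."""
    # Assumption: punctuation and spaces only separate fields, so drop them.
    digits = re.sub(r"\D", "", text)
    result, pos = {}, 0
    for code, count in expected_digits(fmt):
        piece = digits[pos:pos + count]
        if len(piece) != count:
            raise ValueError(f"expected {count} digits for field {code!r}")
        result[code] = piece
        pos += count
    if pos != len(digits):
        raise ValueError("date text has more digits than the format describes")
    return result
```

With the example above, split_date("1960-09-10", "yyyymmdd") separates the text into a four-digit year, a two-digit month, and a two-digit day. The shorter format "ymd" describes the same widths, because each code appearing once falls back to its default digit count.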

<p>, <s>

Sentence and paragraph elements.

Example

<p><s>This is sentence one.</s><s>This is sentence two.</s></p>
  • Use <s>…</s> tags to wrap full sentences, especially if they contain SSML elements that change prosody (that is, <audio>, <break>, <emphasis>, <par>, <prosody>, <say-as>, <seq>, and <sub>).
  • If a break in speech is intended to be long enough that you can hear it, use <s>…</s> tags and put that break between sentences.
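
Wrapping plain sentences in <s> and <p> tags is easy to automate. The sketch below uses a hypothetical wrap_paragraph helper. It accepts plain text only and escapes it, so sentences that already contain SSML elements would need different handling.

```python
from xml.sax.saxutils import escape


def wrap_paragraph(sentences):
    """Wrap each sentence in <s> and the whole group in <p> (hypothetical helper)."""
    # escape() converts &, <, and > in plain text so the markup stays well-formed.
    body = "".join(f"<s>{escape(sentence)}</s>" for sentence in sentences)
    return f"<p>{body}</p>"
```

Passing the two sentences from the example above to wrap_paragraph returns the same <p><s>…</s><s>…</s></p> structure.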

<prosody>

Used to customize the pitch, speaking rate, and volume of text contained by the element. Currently the rate, pitch, and volume attributes are supported.

Option       Description
Relative     Specify a relative value (e.g. “low”, “medium”, “high”, etc.) where “medium” is the default pitch.
Semitones    Increase or decrease pitch by “N” semitones using “+Nst” or “-Nst” respectively. Note that “+/-” and “st” are required.
Percentage   Increase or decrease pitch by “N” percent by using “+N%” or “-N%” respectively. Note that “%” is required but “+/-” is optional.

Example

The following example uses the <prosody> element to speak slowly at 2 semitones lower than normal:

<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
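
The three pitch notations differ only in which parts are required, which a pair of regular expressions can capture. This is a sketch under stated assumptions: classify_pitch is a hypothetical helper, the relative names beyond “low”, “medium”, and “high” are taken from the W3C SSML specification rather than from this page, and decimal amounts are assumed to be allowed.

```python
import re

# "low", "medium", and "high" come from this page; the other names are
# assumed from the W3C SSML specification.
RELATIVE_PITCH = {"x-low", "low", "medium", "high", "x-high", "default"}

# Semitones: the sign and "st" are both required, e.g. "+2st" or "-2st".
_SEMITONES_RE = re.compile(r"^[+-]\d+(\.\d+)?st$")
# Percentage: "%" is required and the sign is optional, e.g. "+10%" or "10%".
_PERCENT_RE = re.compile(r"^[+-]?\d+(\.\d+)?%$")


def classify_pitch(value):
    """Return "relative", "semitones", or "percentage" for a pitch value."""
    if value in RELATIVE_PITCH:
        return "relative"
    if _SEMITONES_RE.match(value):
        return "semitones"
    if _PERCENT_RE.match(value):
        return "percentage"
    raise ValueError(f"unrecognized pitch value: {value!r}")
```

The value "-2st" from the example above is classified as semitones, whereas "2st" is rejected because the sign is missing. By contrast, "10%" is accepted without a sign.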

<emphasis>

Used to add or remove emphasis from text contained by the element. The <emphasis> element modifies speech similarly to <prosody>, but without the need to set individual speech attributes.

This element supports an optional “level” attribute with the following valid values:

  • strong
  • moderate
  • none
  • reduced

Example

The following example uses the <emphasis> element to make an announcement:

<emphasis level="moderate">This is an important announcement</emphasis>