Harmony Assistant

Virtual Singer

Editing Phonemes

 
    Very important: This chapter refers to advanced concepts of digital signal processing. Some knowledge of acoustics and digital signal processing will be needed to make use of it.

We saw earlier that phonemes are considered the basic acoustic elements of the spoken or sung voice (see the chapters on "Voice technical background").
Virtual Singer uses complex algorithms to synthesize these phonemes.
This kind of synthesis, called formant synthesis, relies on original internal algorithms, inspired mainly by the writings of D. Klatt (see bibliography) as well as other sources.
The algorithm was designed and refined through our own research into reproducing the sung voice.

When you edit the voice timbre, an "Advanced" button opens the dialog box in which the individual phonemes are defined. Changes made in this window only modify the current singer's voice; other voices remain unchanged.

A few technical details


 
Question: How does Virtual Singer generate a phoneme?
An excitation digital signal (historically called a "glottal source") is generated according to the power and fundamental frequency of the phoneme to be sung. This signal is made of a parabolic half-period followed by a silent half-period (glottal stop). The first harmonic (the fundamental frequency), the second harmonic (twice the fundamental frequency) and the third harmonic (three times the fundamental frequency) are then amplified, to approximate as closely as possible the sound of a sung vocal source. This source is then amplified to a greater or lesser degree, according to the voicing value.
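The shape of the excitation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Virtual Singer's internal code: the parabola 4t(1 - t), the 22 050 Hz sample rate and the function name `glottal_source` are hypothetical choices, and the amplification of the first three harmonics is deliberately left out.

```python
def glottal_source(f0, voicing, fs=22050, n_periods=3):
    """Hypothetical sketch: each period is a parabolic half-period followed by
    a silent half-period, scaled by the voicing level. Harmonic emphasis is
    not modelled."""
    period = round(fs / f0)            # samples per glottal period
    open_len = period // 2             # length of the parabolic half-period
    one_period = []
    for k in range(period):
        if k < open_len:
            t = k / open_len           # normalised time in [0, 1)
            one_period.append(4.0 * t * (1.0 - t))  # parabola peaking at 1.0
        else:
            one_period.append(0.0)     # silent half-period
    # Repeat the period and apply the voicing gain.
    return [voicing * x for x in one_period * n_periods]
```

For example, `glottal_source(220.0, voicing=0.8)` returns three periods of 100 samples each, peaking at 0.8 in the middle of each parabolic half and silent during the second half of each period.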

Then the processing is divided into two parts:

Cascade processing: a noise, called aspiration noise, is added to the excitation source. The resulting signal then passes through a series (cascade) of filters, one filter per formant.

Parallel processing: a noise, called frication noise, is added to the excitation source. The first-order derivative of this signal is then processed by a set of parallel filters, one filter per formant. The amplitude of each formant is applied at this stage, increasing or decreasing that formant's contribution to the output signal.
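The two branches can be sketched with the two-pole digital resonator commonly used in Klatt-style formant synthesizers. The resonator coefficients below are the standard ones from that literature; everything else (function names, the `(frequency, bandwidth, gain)` tuple layout, linear rather than dB gains, noise passed in already scaled by its level) is an assumption made for illustration, not the program's actual implementation.

```python
import math

def resonator(x, freq, bw, fs):
    """Two-pole resonator with Klatt-style coefficients (unity gain at 0 Hz)."""
    T = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw * T)
    b = 2.0 * math.exp(-math.pi * bw * T) * math.cos(2.0 * math.pi * freq * T)
    a = 1.0 - b - c
    y1 = y2 = 0.0                      # previous two output samples
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def cascade_branch(source, aspiration, formants, fs):
    """Source + aspiration noise, filtered by every formant in series.
    Formant gains are ignored here: only frequency and bandwidth matter."""
    signal = [s + n for s, n in zip(source, aspiration)]
    for freq, bw, _gain in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal

def parallel_branch(source, frication, formants, fs):
    """First-order difference of (source + frication noise), sent through
    one resonator per formant; each output is weighted by the formant gain."""
    mixed = [s + n for s, n in zip(source, frication)]
    diff = [mixed[0]] + [mixed[i] - mixed[i - 1] for i in range(1, len(mixed))]
    out = [0.0] * len(diff)
    for freq, bw, gain in formants:
        for i, y in enumerate(resonator(diff, freq, bw, fs)):
            out[i] += gain * y
    return out
```

Adding a formant with gain 0 leaves `parallel_branch` unchanged but still reshapes the output of `cascade_branch`, which is exactly the behavior of a zero-amplitude formant discussed in this chapter.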

The outputs of these two branches are then added together and, if necessary, modulated by a low-frequency (20 Hz) oscillator to simulate a rolling effect (as in a Spanish rolled "R").

Once the output gain and the treble/bass settings have been applied, the output signal is complete.
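A sketch of this final mixing stage follows. It assumes a sinusoidal 20 Hz oscillator and a hypothetical `roll` depth between 0 and 1 standing in for the rolling level; the exact modulation shape used by Virtual Singer is not documented here, and the treble/bass correction is not modelled.

```python
import math

def mix_and_roll(cascade, parallel, roll, fs, output_gain=1.0, roll_hz=20.0):
    """Sum the cascade and parallel outputs, apply an optional low-frequency
    amplitude modulation (rolling), then the output gain.
    roll = 0 disables the modulation; roll = 1 is full depth (assumed mapping)."""
    out = []
    for n, (c, p) in enumerate(zip(cascade, parallel)):
        lfo = 0.5 * (1.0 + math.cos(2.0 * math.pi * roll_hz * n / fs))  # swings 0..1
        envelope = (1.0 - roll) + roll * lfo    # roll = 0 gives a constant 1.0
        out.append(output_gain * (c + p) * envelope)
    return out
```

With `roll=0` the stage reduces to a plain sum scaled by the output gain; with `roll=1` the amplitude dips to zero 20 times per second.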

In concrete terms, this algorithm has major implications for how a phoneme is processed:

  • The amplitude for each formant is only processed by the parallel portion of the processing algorithm. Thus, even if a formant amplitude is set to zero, this formant will still have an effect on the resulting signal, because of its action in the cascade processing.
  • Aspiration noise passes through the cascade filter set. It is therefore strongly shaped by the phoneme's formants and comes out as a more filtered (softer) noise, which can be used to simulate breath generated at the far back of the vocal tract.
  • The first-order derivative of the frication noise passes through the parallel filter set. This yields a higher-pitched noise, which can be used to simulate the sibilant, whistling sounds made by the front part of the mouth.
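The higher pitch of the differentiated frication noise follows from the frequency response of a first-order difference, y[n] = x[n] - x[n-1], whose gain is 2|sin(πf/fs)|. A quick numerical check, assuming a 22 050 Hz sample rate and a hypothetical helper `first_difference_gain`:

```python
import cmath
import math

def first_difference_gain(freq, fs):
    """Magnitude response of y[n] = x[n] - x[n-1] at frequency freq (Hz)."""
    z_inv = cmath.exp(-2j * math.pi * freq / fs)   # one-sample delay at freq
    return abs(1 - z_inv)

for f in (200, 1000, 5000):
    print(f, round(first_difference_gain(f, 22050), 3))
```

At 22 050 Hz the gain is roughly 0.06 at 200 Hz but about 1.31 at 5 kHz, so the high-frequency components of the noise are strongly favored.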

Fragments


The basic phonetic element is the phoneme. However, as we have seen, some complex phonemes, such as diphthongs, are made up of several successive states.
We therefore need the notion of a fragment, which represents a "static" state within a phoneme. A phoneme can thus be made of one or several fragments.

The list on the left of this window shows all the fragments needed to pronounce any phoneme in any language. Fragments displayed in bold are used by the current language.


Important Note: In this window, you can change the pronunciation of one or several fragments. These changes apply only to the singer currently being edited; the pronunciation of the other singers is not affected.

Once a fragment has been modified, it is displayed in color in the list. When a modified fragment is selected, its default values can be restored by clicking the Original button below the list.

In the right part of this window, several graphical objects allow you to modify the fragment data.

At the top of the window, a pop-up menu shows the fragment type:
Vowel means that this fragment can be stretched when the syllable that contains it is extended in time.
If the syllable does not contain any vowel fragments, Virtual Singer will try to stretch transitional vowel fragments.
If neither of those two types is present, vocalized consonant fragments will be stretched, and failing those, unvocalized consonants.
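The stretching order above can be sketched as a simple priority search. The type labels, the millisecond units and the rule that extra time is shared equally among all fragments of the selected type are assumptions made for illustration; the manual does not say how the extra time is divided.

```python
# Fragment types in the order they are tried for stretching (labels hypothetical).
STRETCH_ORDER = ["vowel", "transitional vowel",
                 "vocalized consonant", "unvocalized consonant"]

def stretch(fragments, target_duration):
    """fragments: list of (name, type, natural_duration_ms).
    Only fragments of the first type found in STRETCH_ORDER are lengthened;
    the extra time is split equally between them (an assumption)."""
    natural = sum(d for _, _, d in fragments)
    extra = max(0.0, target_duration - natural)
    for kind in STRETCH_ORDER:
        chosen = [i for i, (_, t, _) in enumerate(fragments) if t == kind]
        if chosen:
            break
    else:
        return [d for _, _, d in fragments]   # nothing stretchable
    share = extra / len(chosen)
    return [d + share if i in chosen else d
            for i, (_, _, d) in enumerate(fragments)]
```

For a syllable with no vowel, such as one made of "m" (vocalized) and "s" (unvocalized), only the "m" fragment absorbs the extra time.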

The fragment duration can be changed with a slider.
This value is the natural (unstretched) duration of the fragment. If the fragment is stretched, its duration is increased.

Note: When a value is changed graphically (with a slider, for example), its numerical value appears in a frame at the bottom right of the window.

Static part of a fragment

This is the set of values that define the static part of the fragment, i.e. the portion that is independent of any transitions to or from adjacent fragments. These parameters can be modified in the large graphical area in the right part of the window.

Formants are displayed as triangles drawn with colored lines. For each formant, you can change the center frequency (in Hz), the amplitude (in dB) and the bandwidth (the width of the triangle's base, in Hz). A set of checkboxes below this graph allows you to activate or deactivate each formant in the parallel part of the voice generator.

Note: As explained above, even if a formant is deactivated and no longer displayed in the graph, its frequency and bandwidth are still used in the cascade part of the voice generator.

On the right, a set of vertical sliders allows you to change the levels of voicing (av), rolling (Rl), aspiration (asp) and frication (af).
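The static data of a fragment, as edited in this area, can be pictured as a small data model. The class and field names below are hypothetical and the numeric values in any example are arbitrary; the sketch only mirrors the parameters listed above, including the rule that a deactivated formant still reaches the cascade.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Formant:
    center_hz: float
    amplitude_db: float             # only used by the parallel branch
    bandwidth_hz: float             # width of the triangle's base in the editor
    parallel_enabled: bool = True   # checkbox below the graph

@dataclass
class Fragment:
    name: str
    kind: str                       # fragment type from the pop-up menu, e.g. "vowel"
    duration_ms: float              # natural (unstretched) duration
    formants: List[Formant] = field(default_factory=list)
    av: float = 0.0                 # voicing level
    rl: float = 0.0                 # rolling level
    asp: float = 0.0                # aspiration level
    af: float = 0.0                 # frication level

    def cascade_formants(self) -> List[Tuple[float, float]]:
        # Every formant's frequency and bandwidth reach the cascade, enabled or not.
        return [(f.center_hz, f.bandwidth_hz) for f in self.formants]

    def parallel_formants(self) -> List[Tuple[float, float, float]]:
        # Only enabled formants contribute, with their amplitude, to the parallel branch.
        return [(f.center_hz, f.amplitude_db, f.bandwidth_hz)
                for f in self.formants if f.parallel_enabled]

@dataclass
class Phoneme:
    symbol: str
    fragments: List[Fragment]       # one fragment, or several successive states (e.g. a diphthong)
```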
 

Tip: While you edit a formant's center frequency or bandwidth graphically, two vertical lines are displayed. They show the lower and upper bounds of that parameter, for that formant, across all the fragments in the list. This helps you avoid setting the parameter to too "exotic" a value.
 

Fragment transition curves

In speech or singing, the transition from one fragment to the next is not instantaneous: the next fragment starts being pronounced before the previous one is completely finished. This smooth transition between fragments is called coarticulation.

For each parameter (formant frequency, amplitude, bandwidth and the various levels), the graphic area at the bottom of the window lets you define its transition curve over time. The parameter whose curve is displayed is circled in red in the upper area.

On the transition curve, by convention, the previous value of the parameter is represented by the lowest point of the graph's vertical axis, and the static value of the currently selected fragment (selected in the upper graph) by the highest point.

Note: this is a schematic display; it does not show the actual or relative values of the parameter.

The parameter's transition from its previous value to the current static value is displayed as two segments:
The first segment, on the left, has its duration "stolen" from the previous fragment's time. Over this segment, the parameter moves from the previous fragment's static value to an intermediate value, defined by the two vertical sliders to the left of the curve.
The ratio slider (Ra) sets the weight of the previous parameter value relative to the current fragment's static value (the value to be reached during the transition). For example:
A 0% ratio sets the intermediate value to the value to be reached.
A 100% ratio sets the intermediate value to the previous parameter value.
A 50% ratio sets the intermediate value to the average of the previous and current values.

The starting offset (Od) adds a fixed amount to the intermediate value.

For example, with a ratio (Ra) of 50% and an offset (Od) of 100, the intermediate value equals 100 + the average of the previous and current values.
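The ratio and offset rules above can be summarized in a single formula. The symbols are our own notation, not the program's: R_a is the ratio expressed as a fraction (0% = 0, 100% = 1), O_d the offset, v_prev the previous fragment's static value and v_target the value to be reached.

```latex
v_{\mathrm{int}} = R_a \, v_{\mathrm{prev}} + (1 - R_a) \, v_{\mathrm{target}} + O_d
```

Setting R_a = 0, 1 or 0.5 with O_d = 0 gives v_target, v_prev and their average, as listed above. With hypothetical values v_prev = 500 Hz, v_target = 700 Hz, R_a = 0.5 and O_d = 100 Hz, the intermediate value is 0.5 × 500 + 0.5 × 700 + 100 = 700 Hz.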

On the curve, the second segment gives the transition time from the intermediate value to the value to be reached (the static value of this parameter for the current fragment). This time is "stolen" from the current fragment.
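The incoming half of a transition curve can be sketched as a piecewise function of time. The assumptions are loud ones: straight-line interpolation between the points (the editor only shows segments), times in milliseconds measured from the boundary between the two fragments, and hypothetical function names. The outgoing half toward the next fragment would mirror this.

```python
def intermediate_value(prev, target, ratio, offset):
    """ratio (Ra) as a fraction in [0, 1]; offset (Od) in the parameter's units."""
    return ratio * prev + (1.0 - ratio) * target + offset

def incoming_transition(prev, target, ratio, offset, t1, t2, t):
    """Parameter value at time t (ms) relative to the fragment boundary.
    Segment 1 spans [-t1, 0], time taken from the previous fragment;
    segment 2 spans [0, t2], time taken from the current fragment.
    Linear interpolation is assumed on both segments."""
    mid = intermediate_value(prev, target, ratio, offset)
    if t < -t1:
        return prev                                   # still the previous static value
    if t < 0:
        return prev + (mid - prev) * (t + t1) / t1    # segment 1 (only reached if t1 > 0)
    if t < t2:
        return mid + (target - mid) * t / t2          # segment 2 (only reached if t2 > 0)
    return target                                     # current static value reached
```

For instance, going from 500 to 700 with Ra = 50%, Od = 0, t1 = 20 ms and t2 = 30 ms, the value passes through 600 at the boundary and reaches 700 thirty milliseconds later.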

Symmetrically, the two segments on the right, with a corresponding pair of sliders, allow you to define the transition from the current static value to the static value of the next fragment.

Each fragment can thus define a transition curve both from the previous fragment's static value and to the next fragment's static value.

Which transition segments are used depends on which fragment has the higher priority. If the current fragment has a higher priority than the previous one, its "transition from previous" segments are used instead of the previous fragment's "transition to next" segments; otherwise, the previous fragment's "transition to next" segments are used. Priority is given by the fragment's position in the fragment list: the higher in the list, the greater the priority.

Example:
If the list only includes three fragments, "a, b, c" in that order, and the syllable to be sung is "bacb", the following transitions will be made for each fragment parameter:

  • static value of fragment "b",
  • transition to the value of fragment "a", using the first two segments of the "a" transition curve (because "a" has a greater priority than "b"),
  • static value of fragment "a",
  • transition to the value of fragment "c", using the last two segments of the "a" transition curve (because "a" has a greater priority than "c"),
  • static value of fragment "c",
  • transition to the value of fragment "b", using the first two segments of the "b" transition curve (because "b" has a greater priority than "c"),
  • static value of fragment "b".
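The priority rule can be sketched as a small function. It applies the rule as stated (the fragment higher in the list supplies the curve; the next fragment's "transition from previous" segments are its first two, the previous fragment's "transition to next" segments are its last two) to the sequence b, a, c, b. The function name is hypothetical, and giving ties (the same fragment twice in a row) to the previous fragment is an assumption.

```python
def resolve_transitions(priority_order, sequence):
    """priority_order: fragment names as listed, highest priority first.
    sequence: fragment names in the order they are sung.
    Returns (previous, next, curve owner, segments used) for each boundary."""
    rank = {name: i for i, name in enumerate(priority_order)}  # smaller = higher priority
    plan = []
    for prev, nxt in zip(sequence, sequence[1:]):
        if rank[nxt] < rank[prev]:
            # Next fragment wins: its "transition from previous" segments.
            plan.append((prev, nxt, nxt, "first two segments"))
        else:
            # Previous fragment wins (ties assumed to go this way too).
            plan.append((prev, nxt, prev, "last two segments"))
    return plan

for prev, nxt, owner, half in resolve_transitions(["a", "b", "c"], "bacb"):
    print(f'{prev} -> {nxt}: {half} of the "{owner}" curve')
```

Running it prints:

    b -> a: first two segments of the "a" curve
    a -> c: last two segments of the "a" curve
    c -> b: first two segments of the "b" curve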

Action buttons

These buttons, located in the bottom-right corner of the window, perform several actions:

Try button

You can try the modified fragment by typing a simple sentence in the corresponding frame and clicking the button. The list of fragments used to pronounce the sentence is then displayed. The symbols > and < between fragment names show the relative priority of each fragment compared to its neighbors.
Note: whenever you select a fragment in the fragment list, a sample word for that fragment is inserted in the text area.

Language pop-up menu

When another language is selected, the fragments used in that language appear in bold in the fragment list.

Copy/Paste buttons

These buttons copy all the parameters and transition curves of a fragment so that you can paste them onto another fragment.


(c) Myriad 2008 - All rights reserved