Virtual Singer
Editing Phonemes
Very important: This chapter refers to advanced concepts of digital signal processing.
Some knowledge of acoustics and digital signal processing will be needed to make use of it.
We saw earlier that phonemes are considered
the basic acoustic elements for the spoken or sung voice (see the
chapters on
"Voice technical background").
Virtual Singer uses complex algorithms in order to
synthesize
these phonemes.
This kind of synthesis, called formant synthesis, uses
original
internal algorithms, inspired mainly by the writings of D. Klatt (see
bibliography), as well as other sources.
The algorithm has been designed and refined following our own research
into the reproduction of the sung voice.
While editing the voice timbre, an "Advanced" button
opens the dialog
box for defining the individual phonemes. Changes made in this window
only modify
the current singer's voice. Other voices will remain unchanged.
A few technical details
Question: How does Virtual Singer generate a phoneme?
An excitation digital signal (historically called a "glottal source")
is generated, depending on the power and fundamental frequency of the
phoneme to be sung. This signal is composed of a parabolic half-period,
followed by a silent half-period (glottal stop). The first harmonic (the
fundamental frequency), the second harmonic (twice the fundamental
frequency), and the third harmonic (three times the fundamental
frequency) are then amplified, in order to approximate the sound of a
sung vocal source as closely as possible. This source is then
amplified to a greater or lesser degree, according to the voicing value.
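The shape of the excitation signal described above can be sketched as follows. This is only an illustration, not Virtual Singer's actual code: the function name `glottal_period`, the 44100 Hz sample rate and the exact parabola are assumptions, and the amplification of the first three harmonics is left out.

```python
def glottal_period(f0_hz, sample_rate=44100, power=1.0):
    """Return one period of a simplified glottal source (hypothetical helper).

    First half-period: a parabolic arch that peaks at `power`.
    Second half-period: silence.
    """
    n = int(round(sample_rate / f0_hz))   # samples in one period
    half = n // 2                         # samples in the parabolic half
    samples = []
    for i in range(n):
        if i < half:
            t = i / half                  # 0 <= t < 1 across the first half
            samples.append(power * 4.0 * t * (1.0 - t))  # 0 at t=0, peak at t=0.5
        else:
            samples.append(0.0)           # silent half-period
    return samples

# One period of a 220 Hz source.
period = glottal_period(220.0)
```

Repeating this period back to back gives a pitched source at the requested fundamental frequency; scaling `power` plays the role of the voicing level.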
Then the processing is divided into two parts:
- Cascade processing: a noise, called aspiration noise, is added to the
excitation source. This signal is then processed by a serial
filter sequence (cascade), each filter corresponding to a formant.
- Parallel processing: a noise, called frication noise, is added to the
excitation source. The first-order derivative of this signal is then
processed by a parallel filter set, each filter corresponding to a
formant. The amplitude of each formant is adjusted, in order to increase
or decrease the respective influence
of each formant in the output signal.
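The two branches can be sketched with the two-pole resonator published by Klatt (1980). Whether Virtual Singer uses exactly this filter is an assumption, as are the function names, the sample rate, the linear (not dB) amplitudes and the formant values below; the aspiration and frication noises are omitted to keep the example short.

```python
import math

def resonator(signal, freq_hz, bw_hz, sample_rate=44100):
    """Two-pole digital resonator in the form given by Klatt (1980)."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c                      # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2      # y[n] = A x[n] + B y[n-1] + C y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

def cascade(signal, formants, sample_rate=44100):
    """Serial chain: each resonator filters the output of the previous one."""
    for freq_hz, bw_hz, _amp in formants:
        signal = resonator(signal, freq_hz, bw_hz, sample_rate)
    return signal

def parallel(signal, formants, sample_rate=44100):
    """Parallel bank: first difference, then an amplitude-weighted sum."""
    diff = [signal[0]] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]
    out = [0.0] * len(signal)
    for freq_hz, bw_hz, amp in formants:
        band = resonator(diff, freq_hz, bw_hz, sample_rate)
        out = [o + amp * s for o, s in zip(out, band)]
    return out

# Hypothetical formants: (center frequency in Hz, bandwidth in Hz, linear amplitude).
formants = [(700.0, 80.0, 1.0), (1200.0, 90.0, 0.5), (2600.0, 120.0, 0.25)]
impulse = [1.0] + [0.0] * 255
cascade_out = cascade(impulse, formants)
parallel_out = parallel(impulse, formants)
# Same formants with every amplitude set to zero.
silenced = parallel(impulse, [(f, bw, 0.0) for f, bw, _ in formants])
```

In this sketch, setting every amplitude to zero silences the parallel branch only; the cascade branch ignores the amplitudes and is still shaped by each formant's frequency and bandwidth.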
The results of the two processes listed above are then
added, and modulated if
necessary by a low-frequency (20 Hz) oscillator to simulate a rolling
effect (as in Spanish "R"s).
After applying the output gain and treble/bass setting,
the output signal
is finally complete.
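The final mixing stage might look like the sketch below. The text only states that a 20 Hz oscillator modulates the signal; treating this as a cosine-shaped amplitude modulation whose depth follows the rolling level is an assumption, as is the function name `mix_and_roll`. The treble/bass setting is not modeled.

```python
import math

def mix_and_roll(cascade_out, parallel_out, rolling=0.0,
                 sample_rate=44100, roll_hz=20.0, gain=1.0):
    """Sum both branches and apply an optional low-frequency amplitude modulation.

    `rolling` runs from 0.0 (no modulation) to 1.0 (full-depth modulation).
    """
    out = []
    for n, (c, p) in enumerate(zip(cascade_out, parallel_out)):
        # Oscillator value between 1.0 and 0.0, completing roll_hz cycles per second.
        lfo = 0.5 * (1.0 + math.cos(2.0 * math.pi * roll_hz * n / sample_rate))
        envelope = 1.0 - rolling * (1.0 - lfo)
        out.append(gain * envelope * (c + p))
    return out
```

With `rolling=0.0` the envelope stays at 1 and the output is simply the gain-scaled sum of the two branches; with `rolling=1.0` the level dips to zero twenty times per second.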
In concrete terms, this algorithm has major implications
for how a phoneme is processed:
- The amplitude for each formant is only
processed by the parallel
portion of the processing algorithm. Thus, even if a formant amplitude
is
set to zero, this formant will still have an effect on the resulting
signal,
because of its action in the cascade processing.
- Aspiration noise passes through the cascade
filter set. It
is then highly distorted by the phoneme's formants, and its output is a
more filtered (softer) noise, which can be used to simulate the effects
of breath,
generated at the far back of the vocal tract.
- The first order derivative of the frication noise
passes
through
the parallel filter set. It gives a more high-pitched noise,
which
can be used to simulate the sibilant, whistling noises made by the
front part of the
mouth.
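To see why the derivative step yields a more high-pitched noise, assume the first-order derivative is implemented as a first difference between consecutive samples (a common discrete approximation; the text does not say how Virtual Singer computes it). The gain of that operation grows with frequency:

```python
import math

def first_difference_gain(freq_hz, sample_rate=44100):
    """Magnitude response of y[n] = x[n] - x[n-1] at `freq_hz`."""
    # |1 - exp(-j*w)| = 2*|sin(w/2)| with w = 2*pi*freq_hz/sample_rate
    return 2.0 * abs(math.sin(math.pi * freq_hz / sample_rate))

# Gain at a low, a middle and a high frequency.
for f in (100, 1000, 8000):
    print(f, round(first_difference_gain(f), 3))
```

Low frequencies are strongly attenuated while components near half the sample rate are doubled, which is consistent with the sibilant character described for the parallel branch.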
Fragments
The basic phonetic element is the phoneme.
But we have seen that some complex phonemes, such as diphthongs, can be
made up of several
successive states.
Because of this, we must define the notion of a fragment, which
represents a "static" state within a phoneme.
Thus, a phoneme can be made of one or several fragments.
The list on the left of this window displays the
complete list of all
fragments needed
to pronounce any phoneme in any language. Fragments
displayed in bold are used by the current
language.
Important Note:
In this window, you can change the pronunciation of one or several
fragments. These changes apply only to the singer currently being
edited; other singers are not affected.
Once a fragment is modified, it is displayed in color in the list. When
a modified fragment is selected, you can restore its default
values by clicking the Original button below the list.
In the right part of this window, several graphical
objects allow you to
modify the fragment data.
In the topmost part of the window, a pop-up menu shows
the fragment
type:
Vowel means that this fragment can be stretched
when the syllable that contains it is extended in time.
If the syllable does not include any vowels, Virtual Singer will try
to stretch transitional vowel fragments.
In the absence of either of those two types, vocalized
consonant fragments will be stretched, and failing that, unvocalized
consonants.
The fragment duration can be changed through a
slider.
This value is the natural time for the fragment. If this
fragment
is stretched, its duration will be increased.
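The stretching order described above can be sketched as follows. The type labels, the tuple layout and the equal sharing of the extra time between several candidate fragments are assumptions made for the example, not documented behavior.

```python
# Fragment types in the stretching order described above (labels are illustrative).
STRETCH_ORDER = ["vowel", "transitional vowel",
                 "vocalized consonant", "unvocalized consonant"]

def fragments_to_stretch(syllable):
    """Return the indices of the fragments that absorb extra duration.

    `syllable` is a list of (name, fragment_type, natural_duration_ms) tuples.
    """
    for wanted in STRETCH_ORDER:
        hits = [i for i, (_name, ftype, _dur) in enumerate(syllable) if ftype == wanted]
        if hits:
            return hits
    return []

def stretch(syllable, target_ms):
    """Lengthen the chosen fragments so that the syllable lasts `target_ms`."""
    durations = [dur for _name, _ftype, dur in syllable]
    extra = target_ms - sum(durations)
    hits = fragments_to_stretch(syllable)
    if extra > 0 and hits:
        for i in hits:
            durations[i] += extra / len(hits)   # assumption: extra time is shared equally
    return durations

# A syllable with a vowel: only the vowel is lengthened.
with_vowel = stretch([("l", "vocalized consonant", 60), ("a", "vowel", 120)], 500)
# A syllable without any vowel: the vocalized consonant is lengthened instead.
no_vowel = stretch([("s", "unvocalized consonant", 80), ("m", "vocalized consonant", 70)], 300)
```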
Note:
When a value is changed graphically (through
a slider, for example), its numeric value appears in a frame at the
bottom right of the window.
Static part of a fragment
This is the set of values used to define the static
part of the fragment, i.e. the portion that is independent of any
transitions to or from adjacent
fragments. These parameters can be modified using the large graphical
area in the right part of the window.
Formants are displayed as triangles of colored lines.
For each formant, the center
frequency (in Hz), amplitude (in dB) and bandwidth
(width of the triangle's base, in Hz) can be changed. A set of
checkboxes below this graphic allows you to activate or deactivate
each formant in the parallel part of the voice generator.
Note:
As explained above, even if a formant is deactivated
and is no longer displayed in the graphic, its frequency and bandwidth
are still used in the cascade part of the
voice generator.
On the right, a set of vertical sliders allows you to change the levels
of voicing (av), rolling (Rl), aspiration (asp) and frication (af).
Tip:
While you edit a formant's center
frequency or bandwidth graphically, two vertical lines
are displayed.
They show the upper and lower bounds of that parameter, for
that formant, among all the phonemes in the
list. This helps you avoid setting the parameter to an overly "exotic"
value.
Fragment transition curves
During a spoken or sung part, the transition from one
fragment
to another
is not instantaneous: the next fragment starts to be said before the
previous
one is completely finished. This smooth transition between fragments is
called coarticulation.
For each parameter (formant frequency,
amplitude, bandwidth and various levels), the graphic area on the
bottom of the window lets you define its transition curve over time.
The parameter whose
curve is displayed is circled in red in the upper area.
On the transition curve, by convention, the previous
value of the parameter is represented by the lowest value on the
graph's vertical axis. The static value for the currently selected
fragment (selected in the upper
graphics) is represented by the highest value on the axis.
Note: this is
a schematic display, not directly related to the actual or relative
values of the parameter described.
The parameter's transition from its previous value to the
current
static
value is displayed as two segments:
The first segment, on the left, has a duration that is "stolen"
from the previous fragment's time. This segment makes the parameter
evolve from the previous fragment's
static value to an intermediate value, defined by the two vertical
sliders to the left of the curve.
The ratio slider (Ra) lets you select
the importance of the previous
parameter value relative to that of the current fragment's static value
(the value to be reached during the transition).
For example: a 0% ratio sets the intermediate value to
the value to be
reached.
A 100% ratio sets the intermediate value to the
previous parameter
value.
A 50% ratio sets the intermediate value to the average of the previous
and current values.
The starting offset (Od) allows you to add a
fixed
amount to the intermediate
value.
For example: with a ratio (Ra) of 50% and an
offset (Od) of 100, the intermediate value is equal to 100 plus the
average of the previous and current values.
On the curve, the second segment gives the
transition time between the intermediate value and the value to be
reached (the static value of
this parameter for the current
fragment). This time is "stolen" from the current fragment.
Symmetrically, the two segments on the right, with a
corresponding pair of sliders, allow you to define the transition from
the current static value to the static
value of the next fragment.
Thus a transition curve can be defined from the
previous
fragment's static value, as well as to the next fragment's
static value.
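The ratio and offset examples given above are consistent with a weighted average plus an offset, which can be sketched as follows. The function names are hypothetical, and drawing each segment as a straight line is an assumption; the text does not specify the interpolation.

```python
def intermediate_value(previous, current, ratio_pct, offset=0.0):
    """Intermediate value set by the ratio (Ra) and offset (Od) sliders."""
    ra = ratio_pct / 100.0
    return ra * previous + (1.0 - ra) * current + offset

def transition_in(previous, current, ratio_pct, offset, t1_ms, t2_ms, t_ms):
    """Parameter value `t_ms` after the incoming transition starts.

    Segment 1 (t1_ms, taken from the previous fragment): previous -> intermediate.
    Segment 2 (t2_ms, taken from the current fragment): intermediate -> current.
    Assumption: each segment is a straight line.
    """
    mid = intermediate_value(previous, current, ratio_pct, offset)
    if t_ms <= 0:
        return previous
    if t_ms < t1_ms:
        return previous + (mid - previous) * t_ms / t1_ms
    if t_ms < t1_ms + t2_ms:
        return mid + (current - mid) * (t_ms - t1_ms) / t2_ms
    return current

# A formant frequency moving from 300 Hz to 700 Hz with Ra = 50% and Od = 100.
mid_point = intermediate_value(300.0, 700.0, 50, 100.0)
```

The outgoing transition, toward the next fragment, would use the same two functions with the roles of the static values swapped.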
Which transition curve segments are used depends on
which fragment has a higher priority.
If the current fragment has a higher priority than the previous one,
its "transition from previous" segments will be used, instead of the
previous fragment's "transition to next" segments. Priority
is given by the order of the fragment in the fragment list: the higher
in the list, the greater the priority.
Example:
If the list only includes three fragments,
"a, b, c" in that order, and the syllable to be sung is "bacb", the
following
transitions will be made for each fragment parameter:
- static value of fragment "b",
- transition to the value of fragment "a",
using the first two
segments of the "a" transition curve (because "a" has a greater
priority
than "b"),
- static value of fragment "a",
- transition to the value of fragment "c",
using the last two
segments of the "a" transition curve (because "a" has a greater
priority
than "c"),
- static value of fragment "c",
- transition to the value of fragment "b",
using the first two
segments of the "b" transition curve (because "b" has a greater
priority
than "c"),
- static value of fragment "b".
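The priority rule can be sketched as a small function that reports, for each pair of adjacent fragments, whose transition curve is used. It applies the rule literally (the fragment higher in the list wins) to the fragment order b, a, c, b; the function name, the tuple layout and the handling of two fragments with equal priority are assumptions.

```python
def transition_sources(fragment_list, sequence):
    """For each pair of adjacent fragments, report whose curve drives the transition.

    `fragment_list` is ordered by priority (earlier in the list = higher priority).
    Returns (previous, current, owner, half) tuples, where `half` names the half
    of the owner's transition curve that is used.
    """
    rank = {name: i for i, name in enumerate(fragment_list)}
    plan = []
    for prev, cur in zip(sequence, sequence[1:]):
        if rank[cur] < rank[prev]:
            # The incoming fragment has priority: use its first two segments.
            plan.append((prev, cur, cur, "from previous"))
        else:
            # The outgoing fragment has priority (assumed for ties too): use its last two segments.
            plan.append((prev, cur, prev, "to next"))
    return plan

plan = transition_sources(["a", "b", "c"], ["b", "a", "c", "b"])
for prev, cur, owner, half in plan:
    print(f"{prev} -> {cur}: curve of '{owner}', transition {half}")
```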
Action buttons
These buttons, located in the bottom-right corner of the
window, perform several actions:
Try button
You can try the modified fragment by typing a
simple sentence
in the corresponding frame, then clicking the button. Then, a list
of fragments used to pronounce the sentence is
displayed. The symbols > and < between the
fragment names give
the
relative priority of each fragment compared to the adjacent ones.
Note: whenever you select a fragment in
the fragment list, a sample word for that fragment will be inserted in
the
text
area.
Language pop-up menu
When another language has been selected,
fragments used
in that language
appear in bold in the fragment list.
Copy/Paste buttons
These buttons allow you to copy all of the
parameters and
transition curves of
a fragment, in order to paste them onto another fragment.