Melody Assistant
 


 
 
 

Virtual Singer

Voice technical background

Sung voice synthesis


In voice synthesis, for speech as well as for singing, three main methods can be used:
  • vocal tract simulation,
  • connection of recorded elements,
  • formant synthesis.

    Vocal tract simulation


    Historically, this is the oldest method. The very first speech synthesizer was a mechanical automaton that used a collection of tubes and valves to simulate a vocal tract. Computer models of this process have not yet given convincing results, because the vocal tract is extremely complex to simulate.

    Connection of recorded elements

    A singer or a speaker is digitally recorded, in order to store the whole set of phonemes (or groups of phonemes). Then these samples are connected in sequence to rebuild the voice. Complex algorithms are used to alter the recorded phonemes and make them follow the vocal intonation (prosody).
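    As a rough illustration of the "connected in sequence" step, the sketch below joins sample lists with a short linear crossfade at each junction. It is a minimal sketch under stated assumptions: the function name crossfade_concat and the overlap parameter are hypothetical, the units are plain lists of floats standing in for recorded phonemes, and the prosody-altering algorithms mentioned above are not shown.

```python
def crossfade_concat(units, overlap):
    """Join lists of samples, blending `overlap` samples at each junction.

    `units` stands in for recorded phonemes (hypothetical data layout).
    """
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit, rising toward 1
            out[start + i] = (1.0 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out


# Two dummy "recordings": a constant 1.0 segment followed by a constant 0.0 segment.
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
```

    With an overlap of 2, the last two samples of the first unit fade toward the second unit instead of jumping abruptly, and the result is two samples shorter than a plain concatenation.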

    This method provides excellent results for standard speech. However, the algorithms are poorly suited to generating a singing voice, because singing covers a much wider frequency range than speech. Another drawback of this method is that it requires very large voice description files.

    To define another voice, it is necessary to record another speaker/singer. Furthermore, the whole set of phonemes for each language must be recorded separately. To create multilingual software, it is thus necessary to record several different speakers/singers, and to store these samples in a huge file, often several megabytes in size.

    Formant synthesis

    This method is based on the analysis of vocal sound. Acousticians have determined that vocal tract resonances amplify a small number of frequency ranges, which depend on the phoneme being produced. These frequency ranges are called "formants". A formant is characterized by its center frequency, its bandwidth (the width of the frequency range) and its energy (strength).

    Note: In electronics or computing, a formant can be simulated by a resonant bandpass filter.
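    To make this note concrete, here is a minimal sketch of one common way to build such a filter digitally: a two-pole resonator whose pole angle sets the center frequency and whose pole radius sets the bandwidth. This is a generic textbook design, not necessarily what Virtual Singer does; the function names, the 700 Hz / 100 Hz formant and the 16 kHz sample rate are all assumptions chosen for illustration.

```python
import cmath
import math


def resonator_coefficients(center_hz, bandwidth_hz, sample_rate_hz):
    """Return (b0, a1, a2) for y[n] = b0*x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)  # pole radius: narrower band, r closer to 1
    theta = 2.0 * math.pi * center_hz / sample_rate_hz      # pole angle: center frequency
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    b0 = 1.0 + a1 + a2  # scale so that the gain at 0 Hz is exactly 1
    return b0, a1, a2


def gain_at(freq_hz, coeffs, sample_rate_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    b0, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate_hz)
    return abs(b0 / (1.0 + a1 * z1 + a2 * z1 * z1))


# Hypothetical formant: 700 Hz center, 100 Hz bandwidth, 16 kHz sampling.
coeffs = resonator_coefficients(700.0, 100.0, 16000.0)
for f in (300.0, 700.0, 2000.0):
    print(f, round(gain_at(f, coeffs, 16000.0), 3))
```

    The printed gains should be clearly largest near 700 Hz, which is the "amplified frequency range" behavior that defines a formant.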

    In the early 1960s, the first devices used electronic filters to generate recognizable phonemes. Acousticians then realized that three to six formants are enough to generate a phoneme with acceptable quality. The advantage of this method is that only a small amount of data is required to generate a phoneme, and that it is far easier to modify these data slightly to produce another voice timbre.
    However, the result is generally less realistic than with recorded speech elements.
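    The sketch below illustrates how little data a formant description needs: each vowel is just three (center frequency, bandwidth) pairs, and a pulse train at the sung pitch is passed through one resonant filter per formant. It is a simplified illustration, not Virtual Singer's implementation; the formant values are rough illustrative figures, and all names (VOWELS, resonate, pulse_train, synthesize_vowel) and the 16 kHz sample rate are assumptions.

```python
import math

SAMPLE_RATE = 16000  # Hz (assumed)

# Illustrative (center Hz, bandwidth Hz) pairs; rough figures, not Virtual Singer's data.
VOWELS = {
    "a": [(700, 110), (1200, 120), (2600, 160)],
    "i": [(300, 60), (2300, 100), (3000, 150)],
}


def resonate(signal, center_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Filter a signal through one two-pole resonator (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    b = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    c = -r * r
    a = 1.0 - b - c  # unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def pulse_train(pitch_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Crude stand-in for the vocal cords: one unit pulse per pitch period."""
    period = round(sample_rate / pitch_hz)
    n = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(vowel, pitch_hz=220.0, duration_s=0.5):
    """Pass the pulse train through each formant filter in turn, then normalize."""
    signal = pulse_train(pitch_hz, duration_s)
    for center_hz, bandwidth_hz in VOWELS[vowel]:
        signal = resonate(signal, center_hz, bandwidth_hz)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]


samples = synthesize_vowel("a", pitch_hz=220.0)
```

    In this sketch, changing the sung note only changes pitch_hz, while changing the timbre only means nudging a few numbers in VOWELS. A real singing synthesizer would also need a more natural excitation signal, consonants, and transitions between phonemes, which are left out here.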

    This third method is used in Virtual Singer.



(c) Myriad - All rights reserved