IBM Watson™ Ideas

Welcome to the IBM Watson™ Ideas Portal


We welcome and appreciate your feedback on IBM Watson™ Products to help make them even better than they are today!


If you are looking for troubleshooting help or wondering how to use our products and services, please check the IBM Watson™ documentation. Please do not use the Ideas Portal to report bugs. Instead, report bugs or product issues by contacting IBM support.


Before you submit an idea, please search the portal first, as a similar idea may already have been submitted.


If a related idea is not yet listed, please create a new idea and include a description of the expected behavior, why the feature would improve the service, and how it would address your use case.

Phoneme timings in Text to Speech service

We'd like to use the Text to Speech service to control an animatronic. The animatronic has a mouth and needs to move its lips and jaw as it speaks. We were using Amazon Polly, which has phoneme and viseme support, but we're switching to Watson for our upcoming demo and could not find anything related to the "mouth position" that would correspond with the audio. We tried generating the mouth shapes using acoustic models, but the result doesn't look good. We'd like to retrieve both the audio and the phonetic information while the robot is speaking so we can control the mouth directly. Is there any way to do that with IBM Watson's Text to Speech system? See the attached image for the different mouth shapes used by companies like Disney.

Related links: https://docs.aws.amazon.com/polly/latest/dg/viseme.html
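
To illustrate the kind of data this request is about, here is a minimal sketch of how timed viseme marks could drive the mouth. It assumes the marks arrive as JSON lines with `time` (milliseconds), `type`, and `value` fields, as in the Amazon Polly speech marks linked above. The `MOUTH_SHAPES` table and the function names are hypothetical and only for illustration; nothing here is an existing Watson API.

```python
import json
from bisect import bisect_right

# Hypothetical mapping from viseme codes to animatronic mouth poses.
# The pose names are placeholders, not part of any service.
MOUTH_SHAPES = {"p": "closed", "a": "open", "sil": "rest"}


def parse_viseme_marks(text):
    """Parse JSON-lines speech marks into a sorted list of (time_ms, viseme)."""
    schedule = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark = json.loads(line)
        # Keep only viseme marks; word or sentence marks are ignored.
        if mark.get("type") == "viseme":
            schedule.append((mark["time"], mark["value"]))
    schedule.sort(key=lambda item: item[0])
    return schedule


def mouth_shape_at(schedule, t_ms, default="rest"):
    """Return the mouth pose active at playback time t_ms."""
    times = [t for t, _ in schedule]
    idx = bisect_right(times, t_ms) - 1
    if idx < 0:
        return default  # before the first mark
    return MOUTH_SHAPES.get(schedule[idx][1], default)
```

With output like this alongside the audio, a playback loop could call `mouth_shape_at` with the current audio position and move the lips and jaw accordingly.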

  • Guest
  • Mar 28 2019