Crafting a short story with LLMs and Text-to-Speech

I’ve spent a lot of time exploring image generation with Stable Diffusion, which has been fascinating. A couple of months ago, though, I also tried out some text-to-speech services and models, and I have to admit the results were surprisingly good. 🔊

The Story

I spent only about five minutes experimenting with prompts in a standard ChatGPT (GPT-3.5) session, and this is the prompt I settled on: “Write me a short children’s story about a kid that grew up to become a dragon.”

Here is the full text:

Once upon a time, there was a little boy named Tom who loved to read books about dragons. He was fascinated by their powerful wings, sharp claws, and fiery breath. Tom spent hours imagining what it would be like to be a dragon and fly high in the sky.
As he grew older, Tom’s fascination with dragons never faded. In fact, he became even more obsessed with them. He read every book he could find, watched every movie, and even dressed up as a dragon for Halloween.
One day, Tom woke up and felt different. His skin was tingling, and he could feel something growing on his back. He looked in the mirror and couldn’t believe his eyes. He had sprouted wings! And as he tried to speak, a little puff of smoke came out of his mouth.
Tom soon realized that his childhood dream had come true - he had become a dragon! At first, he was scared and didn’t know what to do, but soon he learned how to control his wings and his fiery breath. He flew high in the sky, feeling the wind beneath his wings and the sun on his scales.
Everyone in the village was amazed when they saw Tom’s new form, but they soon realized that he was still the same kind-hearted boy they had always known. Tom loved to play with the village children and always made sure to use his fire-breath only for good.
And so, Tom the dragon became a beloved member of the village, and the children loved to hear stories about his adventures in the sky. Tom knew that he had fulfilled his childhood dream and felt grateful for every moment he spent soaring through the clouds.

It’s clearly not a fully developed story, more of a rough sketch, but I found it rather cute and thought it would be perfect for testing text-to-speech.

Giving it a Voice

After a quick Google search, I came across ElevenLabs. After experimenting with several voice options, I settled on Bella from the available pre-made voices. At the time of writing, ElevenLabs’ free tier allows up to 2,500 characters per generation, out of a total quota of 10,000 characters, all at no cost.
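The web interface is all this experiment needed, but the same step can also be scripted. Below is a minimal Python sketch using only the standard library. It assumes ElevenLabs’ REST endpoint `POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` with the `xi-api-key` header, as described in their API documentation when this was written, so check the current docs before relying on it. The voice ID and API key are placeholders (you would look up Bella’s ID in your own account). The helper names `split_text`, `build_request` and `synthesize`, and the environment variable names, are hypothetical. Because a single generation is capped at 2,500 characters, the sketch splits longer text on paragraph boundaries before sending it.

```python
import json
import os
import urllib.request

# Per-request limit quoted above for the free tier; it may have changed since.
MAX_CHARS_PER_REQUEST = 2500

# Assumed endpoint, based on ElevenLabs' public API docs at the time of writing.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def split_text(text: str, limit: int = MAX_CHARS_PER_REQUEST) -> list[str]:
    """Group paragraphs into chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.splitlines()):
        if not paragraph:
            continue  # skip blank lines between paragraphs
        candidate = f"{current}\n{paragraph}" if current else paragraph
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Hard-split a single paragraph that is longer than the limit on its own.
        while len(paragraph) > limit:
            chunks.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks


def build_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for one chunk."""
    if len(text) > MAX_CHARS_PER_REQUEST:
        raise ValueError(f"text is {len(text)} chars; limit is {MAX_CHARS_PER_REQUEST}")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(text: str, voice_id: str, api_key: str, out_prefix: str = "story") -> None:
    """Send each chunk and save the returned audio as story_00.mp3, story_01.mp3, ..."""
    for i, chunk in enumerate(split_text(text)):
        req = build_request(chunk, voice_id, api_key)
        with urllib.request.urlopen(req) as resp, open(f"{out_prefix}_{i:02d}.mp3", "wb") as f:
            f.write(resp.read())


if __name__ == "__main__":
    # Hypothetical environment variable names; set them to your own key and voice ID.
    api_key = os.environ.get("ELEVENLABS_API_KEY")
    voice_id = os.environ.get("ELEVENLABS_VOICE_ID")
    if api_key and voice_id:
        with open("story.txt", encoding="utf-8") as f:
            synthesize(f.read(), voice_id, api_key)
    else:
        print("Set ELEVENLABS_API_KEY and ELEVENLABS_VOICE_ID to run the synthesis.")
```

The dragon story above is well under 2,500 characters, so it would go out as a single request. The chunking only matters for longer texts, and each chunk still counts against the overall character quota.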

A screenshot of ElevenLabs' Speech Synthesis

After a short wait, voilà! In my humble opinion, the result sounds remarkably good. It bears no resemblance to the old, mechanical-sounding text-to-speech voices we used to come across. The clarity and naturalness of the output are striking, a notable advance compared with the last time I used text-to-speech myself.

Amazing, right?

Conclusion

Is AI here to steal our creativity or our jobs? My take? Nope. AI is more like a creative sidekick. It boosts our ideas and helps us do cool stuff, but it doesn’t replace us. All these new tools are easily accessible to anyone, with new user interfaces, clients, and more being released every day. Go deeper with APIs or fine-tuned models, and the possibilities are limitless.

An important caveat: as with many other technological advances, the benefits of AI will only truly materialize when it is accessible to everyone and when AI models aren’t confined within walled gardens. Fortunately, we’re seeing positive strides in this direction, thanks to models like Stable Diffusion and LLaMA, among others.