Tired of waiting for your favourite ebook to be released as an audiobook? With Audiblez, a Python program, you can convert your EPUB ebooks into audiobooks on Linux, macOS, and Windows.
Introduction
Have you ever found yourself eagerly anticipating the audiobook release of a new book, only to be disappointed by the long wait?
Or maybe you're interested in a niche title that hasn't been picked up by audiobook platforms?
If this sounds familiar, Audiblez might be the solution you've been looking for.
Audiblez is a Python-based tool that allows you to generate your own audiobooks from e-books using text-to-speech technology. Powered by the Kokoro v0.19 model, Audiblez delivers remarkably natural-sounding narration, transforming your digital reading experience.
What is Kokoro?
Kokoro is a high-quality, relatively small text-to-speech (TTS) model, released under the Apache 2.0 licence and capable of generating audiobooks.
Kokoro stands out for its small size (82 million parameters) and remarkably high-quality output, even surpassing models many times its size.
Kokoro utilizes a combination of innovative architectures, including:
- StyleTTS 2: A method for high-fidelity and expressive speech synthesis.
- ISTFTNet: A neural network architecture designed for efficient audio signal processing.
Kokoro is trained using only permissive/non-copyrighted audio data, ensuring ethical considerations are addressed. Examples of such data include public domain audio and synthetic audio generated by commercial TTS models.
Key Features of Kokoro
- High Elo Rating: It achieved the #1 ranking in the TTS Spaces Arena, a competitive benchmark for TTS models. This indicates its superior performance compared to other models, even those with significantly more parameters and training data.
- Efficient Training: Kokoro v0.19 was trained on A100 instances with 80 GB of VRAM for approximately 500 GPU hours, at an estimated total cost of $400.
- Limited Data Requirements: The model was trained on less than 100 hours of audio data, demonstrating its efficiency in learning from relatively small datasets.
- Multilingual Capabilities: Kokoro currently supports American and British English, with the potential for expanding to other languages.
- Diverse Voice Options: The model offers a selection of voice packs, including male and female voices with distinct accents.
Despite its strengths, Kokoro v0.19 has some limitations, primarily stemming from its training data and architectural choices:
- Lack of voice cloning capability.
- Reliance on an external grapheme-to-phoneme (g2p) converter, introducing potential failure points.
- Training data focused on long-form reading and narration, potentially limiting its performance in conversational settings.
What is Audiblez?
Audiblez is a command-line Python program that uses Kokoro to convert epub ebooks into m4b audiobooks. It works in three steps: it extracts the text from the epub, synthesises speech with Kokoro, and then combines the audio chapters into a single audiobook file using ffmpeg.
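To get a feel for the first step, here is a minimal sketch of pulling text out of an epub with nothing but the Python standard library. An epub is a ZIP archive of XHTML documents, so the sketch opens the archive, visits each (X)HTML entry, and strips the markup. This is an illustration under simplifying assumptions, not Audiblez's actual code: it reads documents in file-name order rather than following the book's reading order, and the names extract_chapters and html_to_text are made up for this example.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(markup):
    """Return the text content of an (X)HTML string, joined by single spaces."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def extract_chapters(epub):
    """Return [(entry_name, text), ...] for each non-empty (X)HTML entry.

    `epub` may be a file path, an open binary file object, or raw bytes.
    Entries are visited in sorted name order (a simplification).
    """
    if isinstance(epub, bytes):
        epub = io.BytesIO(epub)
    chapters = []
    with zipfile.ZipFile(epub) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                markup = archive.read(name).decode("utf-8", errors="replace")
                text = html_to_text(markup)
                if text:
                    chapters.append((name, text))
    return chapters
```

Calling extract_chapters("book.epub") returns one text string per document, which is roughly the form a TTS model needs as input.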
Audiblez supports multiple languages and voices, and offers customisation options for speed and chapter selection.
Please note that Audiblez only supports .epub files at the moment.
Key Features of Audiblez
- High-quality Speech Synthesis: Audiblez leverages Kokoro, an 82M parameter TTS model known for its impressive output quality, surpassing even larger models in certain benchmarks. This ensures your audiobooks sound clear and engaging.
- Support for Multiple Languages: Audiblez currently caters to English (both American and British accents), French, Japanese, Korean, and Mandarin, allowing you to enjoy audiobooks in various languages.
- Diverse Voice Options: Choose from a variety of voices to personalise your listening experience. Explore the different options available for each language and find the perfect voice to accompany your reading.
- Chapter Detection: Audiblez automatically identifies chapters within your e-book, streamlining the conversion process. This helps to organise your audiobook and makes navigation easier.
- Easy to Use: Installing and using Audiblez is straightforward. With a few simple commands, you can start converting your e-books into audiobooks.
Supported Languages
Audiblez currently supports these languages: American English (en-us), British English (en-gb), French (fr-fr), Japanese (ja), Korean (kr), and Mandarin (cmn).
You can specify the language using the -l option when running Audiblez. For example, to use British English, run:
audiblez book.epub -l en-gb
Supported Voices
You can choose from these voices using the -v option: af, af_bella, af_nicole, af_sarah, af_sky, am_adam, am_michael, bf_emma, bf_isabella, bm_george, and bm_lewis.
You can listen to samples of each voice at the Kokoro-TTS demo: https://huggingface.co/spaces/hexgrad/Kokoro-TTS.
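The voice names appear to follow a pattern. The short sketch below groups the voices listed above by their two-letter prefix. The reading of that prefix (first letter "a" or "b" for American or British accent, second letter "f" or "m" for female or male) is an assumption based on the names themselves, so treat the output as a guess and check it against the samples. The helper names are made up for this example.

```python
from collections import defaultdict

# Voice names as listed in this article.
VOICES = [
    "af", "af_bella", "af_nicole", "af_sarah", "af_sky",
    "am_adam", "am_michael",
    "bf_emma", "bf_isabella",
    "bm_george", "bm_lewis",
]

# Assumed meaning of the two-letter prefix (not confirmed by the article).
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice):
    """Return (accent, gender) guessed from the voice name's two-letter prefix."""
    prefix = voice.split("_", 1)[0]
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognised voice name: {voice!r}")
    return ACCENTS[prefix[0]], GENDERS[prefix[1]]


def group_voices(voices=VOICES):
    """Group voice names by their guessed (accent, gender) pair."""
    groups = defaultdict(list)
    for voice in voices:
        groups[describe_voice(voice)].append(voice)
    return dict(groups)


for (accent, gender), names in group_voices().items():
    print(f"{accent} {gender}: {', '.join(names)}")
```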
How to Install Audiblez in Linux
First, make sure you have installed ffmpeg on your system.
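If you are not sure whether ffmpeg is installed, a quick check from Python is to look for the executable on your PATH. The sketch below uses shutil.which from the standard library; the function names are made up for this example, and it only confirms that an executable called ffmpeg can be found, not that it works.

```python
import shutil


def ffmpeg_path():
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def describe_ffmpeg():
    """Return a short human-readable status message about ffmpeg."""
    path = ffmpeg_path()
    if path is None:
        return "ffmpeg was not found on PATH; install it before creating M4B files"
    return f"ffmpeg found at {path}"


print(describe_ffmpeg())
```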
You can install Audiblez using any Python package installer, such as conda, pip, pipx, pipenv, or uv.
For instance, to install Audiblez using pipx, run the following command:
pipx install audiblez
Next, you need to download the Kokoro model and voice files. I recommend putting them in a dedicated folder for easy management.
mkdir My\ Audio\ Books
cd My\ Audio\ Books/
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
This will download the Kokoro v0.19 model file.
Next, download the voice file with this command:
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json
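If wget is not available on your system (on Windows, for example), the same two downloads can be sketched in Python with the standard library. The URLs are the ones used above; the function names and the skip-if-already-present behaviour are choices made for this example.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# The two files downloaded with wget in this article.
MODEL_FILES = [
    "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx",
    "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json",
]


def target_path(url, folder):
    """Local path for a URL: the last segment of the URL path, inside `folder`."""
    return Path(folder) / Path(urlparse(url).path).name


def missing_files(folder, urls=MODEL_FILES):
    """Return the URLs whose local copy does not exist yet."""
    return [url for url in urls if not target_path(url, folder).exists()]


def download_missing(folder, urls=MODEL_FILES):
    """Download only the files that are not already present (needs network access)."""
    Path(folder).mkdir(parents=True, exist_ok=True)
    for url in missing_files(folder, urls):
        print(f"Downloading {url}")
        urlretrieve(url, target_path(url, folder))
```

Running download_missing("My Audio Books") creates the folder if needed and fetches whichever of the two files is not there yet.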
We now have both the model and the voice files. It's time to generate audiobooks from epub ebooks.
Convert Epub to Audiobook using Audiblez
Generating audiobooks from epub files is easy and straightforward.
Run the Audiblez command-line tool with the e-book file, the desired language, and the voice, as shown below:
audiblez book.epub -l en-us -v af_sky
Replace book.epub with your own epub ebook and af_sky with your preferred voice model.
Audiblez generates individual WAV files for each chapter and combines them into an M4B audiobook file, compatible with most audiobook players.
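To illustrate the combining step, here is a minimal sketch that joins per-chapter WAV files into one WAV with Python's standard wave module, and then builds (without running) an ffmpeg command that would encode the result as M4B. This is not how Audiblez itself does it; the function names are made up, the sketch assumes all chapters share the same audio format, and the ffmpeg arguments are one plausible invocation rather than the ones Audiblez uses.

```python
import wave


def concatenate_wavs(chapter_paths, output_path):
    """Append the frames of each chapter WAV, in order, into one WAV file.

    All inputs must share channel count, sample width, and sample rate.
    Returns the total number of audio frames written.
    """
    chapter_paths = list(chapter_paths)
    if not chapter_paths:
        raise ValueError("no chapter files given")
    expected = None
    total_frames = 0
    with wave.open(str(output_path), "wb") as out:
        for path in chapter_paths:
            with wave.open(str(path), "rb") as chapter:
                fmt = (chapter.getnchannels(), chapter.getsampwidth(), chapter.getframerate())
                if expected is None:
                    # The first chapter defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(chapter.readframes(chapter.getnframes()))
                total_frames += chapter.getnframes()
    return total_frames


def ffmpeg_m4b_command(wav_path, m4b_path):
    """Build an ffmpeg command (as an argument list) that encodes a WAV as AAC in an M4B file."""
    return ["ffmpeg", "-i", str(wav_path), "-c:a", "aac", str(m4b_path)]
```

The list returned by ffmpeg_m4b_command can be passed to subprocess.run if you want to try the encoding step yourself.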
Please Note: To create M4B files, you need to have ffmpeg installed on your machine.
Adjust Audio Speed
You can adjust the speed of the audio generated by Audiblez. By default, Audiblez generates audio at normal speed.
However, you can slow the audio down to half speed or speed it up to double speed by passing a value between 0.5 and 2.0 to the -s option.
For example, to generate audio 1.5 times faster than the normal speed, you would use the command:
audiblez book.epub -l en-gb -v af_sky -s 1.5
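If you convert many books from a script, it can help to validate the options before launching Audiblez. The sketch below builds the command line from the -l, -v, and -s options described in this article and rejects values outside the documented lists and the 0.5 to 2.0 speed range. The function name is made up for this example, and the checks mirror this article rather than Audiblez's own validation.

```python
import shlex

# Language codes and voice names as listed in this article.
LANGUAGES = ("en-us", "en-gb", "fr-fr", "ja", "kr", "cmn")
VOICES = (
    "af", "af_bella", "af_nicole", "af_sarah", "af_sky",
    "am_adam", "am_michael", "bf_emma", "bf_isabella",
    "bm_george", "bm_lewis",
)


def build_audiblez_command(book, language, voice, speed=1.0):
    """Return the audiblez command as an argument list, after basic validation."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    return ["audiblez", str(book), "-l", language, "-v", voice, "-s", str(speed)]


command = build_audiblez_command("My Book.epub", "en-gb", "af_sky", speed=1.5)
# shlex.join quotes arguments that contain spaces, so the line can be pasted into a shell.
print(shlex.join(command))
```

Returning a list rather than a single string means the command can be handed to subprocess.run directly, without worrying about spaces in file names.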
Running Audiblez on GPU
By default, Audiblez runs on the CPU and uses the CPU-enabled ONNX Runtime, which is installed automatically.
To run Audiblez on a GPU for faster performance, you need to install the GPU-enabled ONNX Runtime and specify a runtime provider with the --providers flag.
Here's how to do it.
1. Install the GPU-enabled ONNX Runtime. This needs to be done manually, in the same Python environment that Audiblez runs in. For example, using pip:
pip install onnxruntime-gpu
2. Specify ONNX providers using the --providers flag when running Audiblez.
To use an NVIDIA GPU, use CUDAExecutionProvider:
audiblez book.epub -l en-gb -v af_sky --providers CUDAExecutionProvider
3. You can specify a provider hierarchy by listing multiple providers separated by spaces, in order of preference. For example:
audiblez book.epub -l en-gb -v af_sky --providers CUDAExecutionProvider CPUExecutionProvider
This command will try to use the CUDAExecutionProvider first, and if that's not available, it will fall back to the CPUExecutionProvider.
To see a list of available providers on your system, run:
audiblez --help
or
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
This will display the ONNX providers that can be used.
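The fallback idea can also be sketched in a few lines of Python: keep the providers you would like to use, in order of preference, but only those that onnxruntime reports as available. The helper names are made up for this example. ort.get_available_providers() is the same call used in the one-liner above, and the sketch returns an empty list if onnxruntime is not installed.

```python
def available_providers():
    """Return the providers reported by onnxruntime, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


def pick_providers(preferred, available):
    """Keep the preferred providers that are actually available, in preference order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("none of the preferred providers are available")
    return chosen


# Example with a hard-coded "available" list, as on a machine without a usable GPU.
print(pick_providers(
    ["CUDAExecutionProvider", "CPUExecutionProvider"],
    ["CPUExecutionProvider"],
))
```

On a real system you would pass available_providers() as the second argument and hand the resulting names to the --providers flag.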
Conclusion
With Audiblez, you can easily convert your existing e-book library into an audio library with simple commands.
While Audiblez is still under development, it already provides an easy and efficient way to create audiobooks from ebooks.
Future updates may include improvements to chapter detection, the ability to add chapter navigation within the M4B file, and even narration for images using image-to-text technology.
Featured Image by Stas Knop from Pexels.