Sometimes you may only need text from a small part of a long YouTube video, maybe a section from a talk or a clip from a podcast. Downloading the entire video is a waste of time. But with the right tools, you can pull out only the part you want and turn it into text quickly.
In this post, I'll show you how to extract text from a specific portion of a YouTube video using free tools such as yt-dlp, ffmpeg, and Whisper.
Everything runs right on Linux, and my setup stays clean thanks to pipx, which installs Whisper without cluttering the system.
Why I Use This Method
I like to keep control over my data. Most web services for transcription upload files somewhere I don't know. With free, open source command-line tools, I work locally and keep everything private.
This method lets me:
- Download only the audio from a YouTube video.
- Cut out just the section I care about.
- Turn that audio into text I can edit, quote, or summarize.
The whole process is simple and reliable, and once the tools and the model are installed, the transcription itself runs entirely offline. Let us get to work.
WARNING: Whisper and its dependencies can take up significant space (around 5–8 GB during installation). The first time you run it, it may briefly connect to download the model files. After that, it runs completely offline. Your data (audio) never leaves your computer. If you're on limited data or a slow connection, consider using lighter alternatives like faster-whisper or online transcription tools instead.
Step 1: Install the Necessary Tools
As mentioned, I use the following open source tools to transcribe audio from a YouTube video, that is, to extract text from it. Here's what each tool does in our case:
| Tool | Purpose | Benefit |
|---|---|---|
| yt-dlp | Downloads full audio from YouTube | Fast and reliable |
| ffmpeg | Cuts only the part you need | No re-encoding required |
| Whisper | Converts speech to text | Accurate and offline |
| pipx | Installs Whisper cleanly | Keeps system Python untouched |
Let us install these tools.
On Debian and Ubuntu-based systems, you can install them with the following two commands:
sudo apt install yt-dlp ffmpeg pipx
pipx ensurepath
Next, install Whisper through pipx so it doesn't scatter files across your system:
pipx install openai-whisper
pipx keeps Whisper and its dependencies inside their own isolated environment under its home directory (~/.local/pipx/ or ~/.local/share/pipx/, depending on the pipx version). That means you can remove or upgrade it anytime without touching your main Python setup.
Now you have these commands available everywhere:
- yt-dlp
- ffmpeg
- whisper
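Before moving on, it can help to confirm that all three commands really are on your PATH, since pipx ensurepath only takes effect in a new terminal. Here is a minimal sketch; check_tools is a helper name I made up for illustration, not part of any of these tools:

```shell
# check_tools is a hypothetical helper: it reports whether each given
# command can be found on PATH and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools yt-dlp ffmpeg whisper || echo "Install the missing tools, or open a new terminal so PATH is refreshed."
```

If whisper shows up as missing right after installation, opening a new terminal is usually the first thing to try.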
Step 2: Download Only the Audio from a YouTube Video using yt-dlp
To grab audio from YouTube, you can use yt-dlp:
yt-dlp -f bestaudio -x --audio-format mp3 -o "video_audio.%(ext)s" "https://www.youtube.com/watch?v=xxxx"
Replace "https://www.youtube.com/watch?v=xxxx" in the above command with your actual URL.
This downloads only the audio stream, converts it to MP3, and saves it as video_audio.mp3.
Even though you only need a part of the video, this workflow downloads the full audio file first. That's fine, because the next step trims it down in seconds.
Step 3: Trim the Exact Portion with FFmpeg
Let's say you need the section from 1:52:00 to 2:30:00 of the downloaded audio file (video_audio.mp3 in our case).
To cut it out, run ffmpeg like this:
ffmpeg -ss 01:52:00 -to 02:30:00 -i "video_audio.mp3" -c copy clip.mp3
This creates a smaller file called clip.mp3 that contains only that time range.
The -c copy option tells ffmpeg to skip re-encoding, so the process finishes almost instantly even for long recordings.
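To double-check a time range before cutting, the timestamps can be converted to seconds with plain shell and awk. This is only a small sketch for whole-second timestamps; to_seconds and clip_seconds are helper names I made up for illustration:

```shell
# Convert HH:MM:SS (or MM:SS) into a number of seconds (whole seconds only).
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Print the clip length in seconds, or fail if the range is reversed or empty.
clip_seconds() {
    clip_start=$(to_seconds "$1")
    clip_end=$(to_seconds "$2")
    if [ "$clip_end" -le "$clip_start" ]; then
        echo "end time must be after start time" >&2
        return 1
    fi
    echo $((clip_end - clip_start))
}

clip_seconds 01:52:00 02:30:00   # prints 2280 (38 minutes)
```

A quick check like this catches swapped start and end times before ffmpeg produces an empty clip.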
Note: If you want to transcribe the whole YouTube video, skip Step 3 and pass video_audio.mp3 to Whisper directly in Step 4.
Step 4: Convert Speech to Text using Whisper
Now use Whisper to transcribe the audio clip that we trimmed in step 3:
whisper clip.mp3 --model base --language en
I got good results with the base model. You can also try other models such as tiny, small, medium, or large; larger models are usually more accurate but slower. Check the Whisper GitHub page for the available models. For example, to use the small model:
whisper clip.mp3 --model small --language en
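If you're unsure which model to pick, one option is to transcribe the same clip with a couple of models and compare the results. A rough sketch, assuming whisper is on your PATH; compare_models is a hypothetical helper name, and each model writes into its own folder via --output_dir so the files don't overwrite each other:

```shell
# compare_models CLIP MODEL...
# Runs Whisper once per model and stores each transcript in out_MODEL/.
compare_models() {
    clip=$1
    shift
    for model in "$@"; do
        echo "Transcribing $clip with the $model model..."
        whisper "$clip" --model "$model" --language en \
            --output_format txt --output_dir "out_$model" || return 1
    done
}

# Example (commented out because larger models take a while):
# compare_models clip.mp3 base small
# diff out_base/clip.txt out_small/clip.txt
```

Each new model is downloaded the first time it is used, so expect some extra network traffic on the first run.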
Whisper produces several files:
- clip.txt: plain text transcript
- clip.srt and clip.vtt: subtitle files with timestamps
It runs entirely offline and is surprisingly accurate.
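If you want GPU acceleration or faster results, you can use Faster-Whisper instead. As far as I can tell, the faster-whisper package is a Python library and doesn't install a command of its own, so with pipx you need a command-line front end built on it, such as whisper-ctranslate2:
pipx install whisper-ctranslate2
whisper-ctranslate2 clip.mp3 --model medium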
Both versions work great, but the regular Whisper model is often enough for short clips.
As I mentioned already, Whisper produces many files. If you want to save the text in one file, you can use this command:
whisper clip.mp3 --model base --language en --output_format txt --output_dir .
Note the dot (.) at the end; it stands for the current directory. This command creates a single file named clip.txt containing the transcribed text. Here's what each part of the command does:
- --model base: uses the base model, which is small and fast.
- --language en: tells Whisper the audio is English, so it skips language detection.
- --output_format txt: generates only a plain-text file.
- --output_dir .: saves it in the current folder.
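Once clip.txt exists, ordinary text tools are enough to pull quotes out of it. A small sketch, assuming the transcript has one segment per line (which is how Whisper's .txt output normally looks); find_quote is a hypothetical helper name:

```shell
# find_quote KEYWORD [FILE]
# Shows matching transcript lines, case-insensitively, with line numbers
# and one line of context on each side. FILE defaults to clip.txt.
find_quote() {
    grep -i -n -C 1 -- "$1" "${2:-clip.txt}"
}

# Example: find_quote "open source" clip.txt
```

Matching lines are marked with a colon after the line number, context lines with a dash.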
Step 5: Transcribe YouTube Videos using a Script (Optional)
If you do this often, you can use a small script to handle all the steps. Save the following code in a text file called yt_extract_text.sh:
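#!/bin/bash
set -e  # stop if any step fails
url=$1
start="01:52:00"
end="02:30:00"
# Download the audio only, saved as temp.mp3
yt-dlp -f bestaudio -x --audio-format mp3 -o "temp.%(ext)s" "$url"
# Cut the selected range without re-encoding
ffmpeg -ss "$start" -to "$end" -i temp.mp3 -c copy clip.mp3
# Transcribe the clip
whisper clip.mp3 --model small --language en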
Adjust the start and end times, audio format, output file name, model, and language in the script to suit your needs.
Then make it executable:
chmod +x yt_extract_text.sh
To use it, simply type:
./yt_extract_text.sh "https://www.youtube.com/watch?v=xxxx"
The script downloads, trims, and transcribes automatically. You end up with clip.txt, a plain-text transcript of just the part you wanted, next to the subtitle files Whisper writes by default.
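If editing the script for every video gets tedious, the time range can be passed as arguments instead. A minimal sketch under the same assumptions as the script (yt-dlp, ffmpeg, and whisper on your PATH); the extract_text function name, the argument order, and the default model are my own choices:

```shell
# extract_text URL START END [MODEL]
# Downloads the audio, cuts START..END, and transcribes the clip to clip.txt.
extract_text() {
    if [ "$#" -lt 3 ]; then
        echo "usage: extract_text URL START END [MODEL]" >&2
        return 2
    fi
    url=$1
    start=$2
    end=$3
    model=${4:-base}

    # Download the audio only; the %(ext)s template ends up as temp.mp3
    yt-dlp -f bestaudio -x --audio-format mp3 -o "temp.%(ext)s" "$url" || return 1
    # Cut the range without re-encoding; -y overwrites an older clip.mp3
    ffmpeg -y -ss "$start" -to "$end" -i temp.mp3 -c copy clip.mp3 || return 1
    # Write only the plain-text transcript into the current folder
    whisper clip.mp3 --model "$model" --language en \
        --output_format txt --output_dir . || return 1
    rm -f temp.mp3   # the full download is no longer needed
}

# Example:
# extract_text "https://www.youtube.com/watch?v=xxxx" 01:52:00 02:30:00 small
```

Putting the function in your shell startup file (or sourcing it from a script) makes it available in every terminal session.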
FAQ: Extracting Text from YouTube Videos on Linux
A: Not directly. YouTube doesn't allow partial downloads. The clean way is to download the full audio with yt-dlp, then use ffmpeg to trim the exact section before transcribing it with Whisper.
A: Use OpenAI Whisper or Faster-Whisper. Both run locally, work with many languages, and deliver high accuracy compared to online transcription tools.
A: Install pipx first, then run: pipx install openai-whisper
This method keeps Whisper in its own isolated environment and avoids cluttering your system Python packages.
A: Yes. Whisper is open-source and completely free. You can run it locally on Linux without any subscription or cloud cost.
A: You need three tools:
1. yt-dlp to download the audio
2. ffmpeg to trim the portion you want
3. Whisper to transcribe speech to text
All three are free and open-source.
A: Yes. The same commands work on Windows and macOS with small path adjustments. Linux users just find it easier because most tools are available from the terminal by default.
A: There's no strict limit. Whisper can handle long audio files as long as your system has enough RAM and storage. For very long videos, trimming them into smaller segments works best.
A: Whisper generates .txt, .srt, and .vtt files. The .txt file contains plain text, while .srt and .vtt include timestamps for use as subtitles.
A: No. Once you’ve downloaded the audio with yt-dlp, both ffmpeg and Whisper work completely offline. The only exception is Whisper's first run, when it downloads the model files.
A: Yes. Every step runs on your local machine. No audio or text leaves your system, which keeps your data private.
Conclusion
I use this method often to collect quotes and notes from long YouTube videos. I no longer waste time transcribing by hand or relying on online tools that store my data. Everything happens locally, and I get high-quality text from exactly the part I need.
If you watch long lectures, podcasts, or interviews, this setup can save you hours. Try it once: it's simple, clean, and it just works.
Featured image by Pixabay.

4 comments
This really interested me and I decided to give it a try, However I really think you should be upfront and warn people of the size of this software. On my modest connection the install of open-ai-whisper and pipx totalled 10.8gb! Yes 10.8gb and it was downloading for what seemed like an age with no indication of what was happening. On running the program it immediately made an https connection to god knows where and I instantly disconnected from the internet for safety. So much for not sending your data out to who knows where!. This software would not work without an internet connection. I would not recommend anyone to try this software, unless you can put my concerns to rest?.
You’re right. It downloads a significant amount of data at first install. Please note that Whisper is large but safe. The only network activity is model downloading, not data uploading.
Once installed, it’s fully offline, private, and transparent. I will add this warning right away. Somehow I missed it. Thanks for bringing this to my attention.
Thanks for your quick response and for adding the warning to the article, its appreciated. I have had chance to test again and have found that whisper was re-downloading the model (base) I selected as it said that the files sha256 did not match with the copy I had installed. I also tested with the ‘small’ model and this produced very accurate results on the 9 minute youtube clip I downloaded. I’m impressed with it and look forward to trying it some more.
Glad it helped you. Whisper is one of the most useful open source projects.