Extract Text from A Specific Portion Of A YouTube Video On Linux

Sometimes you may only need text from a small part of a long YouTube video, maybe a section from a talk or a clip from a podcast. Downloading the entire video is a waste of time. But with the right tools, you can pull out only the part you want and turn it into text quickly.

In this post, I'll show you how to extract text from a specific portion of a YouTube video using free tools such as yt-dlp, ffmpeg, and Whisper.

Everything runs right on Linux, and my setup stays clean thanks to pipx, which installs Whisper without cluttering the system.

Why I Use This Method

I like to keep control over my data. Most web services for transcription upload files somewhere I don't know. With free, open source command-line tools, I work locally and keep everything private.

This method lets me:

Download only the audio from a YouTube video.
Cut out just the section I care about.
Turn that audio into text I can edit, quote, or summarize.

The whole transcription process is simple and reliable, and once the audio and the Whisper model are downloaded, it works entirely offline. Let us get to work.

WARNING: Whisper and its dependencies can take up significant space (around 5–8 GB during installation). The first time you run it, it may briefly connect to download the model files. After that, it runs completely offline. Your data (audio) never leaves your computer. If you're on limited data or a slow connection, consider using lighter alternatives like faster-whisper or online transcription tools instead.

Step 1: Install the Necessary Tools

As already mentioned, I am going to use the following open source tools to transcribe audio from a YouTube video, i.e. to extract text from it:

yt-dlp
FFmpeg
Whisper
pipx (to install Whisper)

Here's what each tool does in our case:


Tool	Purpose	Benefit
yt-dlp	Downloads full audio from YouTube	Fast and reliable
ffmpeg	Cuts only the part you need	No re-encoding required
Whisper	Converts speech to text	Accurate and offline
pipx	Installs Whisper cleanly	Keeps system Python untouched

Let us install these tools.

On Debian and Ubuntu-based systems, you can install them like this:

sudo apt install yt-dlp ffmpeg pipx
pipx ensurepath

Next, install Whisper through pipx so it doesn't scatter files across your system:

pipx install openai-whisper

pipx keeps Whisper and its dependencies inside their own isolated environment (under ~/.local/pipx/ on older pipx versions, ~/.local/share/pipx/ on newer ones). That means you can remove or upgrade it anytime without touching your main Python setup.

Now you have these commands available everywhere:

yt-dlp
ffmpeg
whisper
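Before moving on, it can help to confirm that all three commands really are on your PATH, since pipx ensurepath usually needs a new terminal session to take effect. Here is a minimal sketch, assuming a POSIX-style shell. The function name missing_tools is my own choice for illustration, not part of any of these tools:

```shell
# Print the name of every listed command that is NOT found on PATH.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can find the command
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

missing="$(missing_tools yt-dlp ffmpeg whisper)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Not found on PATH:" $missing
else
    echo "All tools are available."
fi
```

If whisper shows up as missing right after installation, open a new terminal and run the check again.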

Step 2: Download the Audio Only from a YouTube Video using yt-dlp

To grab audio from YouTube, you can use yt-dlp:

yt-dlp -f bestaudio -x --audio-format mp3 -o "video_audio.%(ext)s" "https://www.youtube.com/watch?v=xxxx"

Replace "https://www.youtube.com/watch?v=xxxx" in the above command with your actual URL.

This downloads only the audio stream, converts it to MP3, and saves it as video_audio.mp3.

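Even though you only need a part of the video, this method downloads the full audio file first. Recent yt-dlp versions also have a --download-sections option that can fetch just a time range, but downloading the whole audio track and trimming it locally is simple and reliable, and the next step trims it down in seconds.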

Step 3: Trim the Exact Portion with FFmpeg

Let's say you need the section from 1:52:00 to 2:30:00 of the audio file downloaded in Step 2 (video_audio.mp3 in our case).

You can cut it out with ffmpeg like this:

ffmpeg -ss 01:52:00 -to 02:30:00 -i "video_audio.mp3" -c copy clip.mp3

This creates a smaller file called clip.mp3 that contains only that time range.

The -c copy option tells ffmpeg to skip re-encoding, so the process finishes almost instantly even for long recordings.
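To double-check a time range before cutting, you can convert both timestamps to seconds and look at the difference. The sketch below does that arithmetic in plain shell. The helper name to_seconds is made up for this example, and it assumes timestamps written as HH:MM:SS:

```shell
# Convert an HH:MM:SS timestamp to seconds.
# The function body runs in a subshell, so the IFS change stays local.
to_seconds() (
    IFS=:
    set -- $1                      # split on colons into $1 $2 $3
    # Drop one leading zero so values like 08 are not read as octal
    h=${1#0}; m=${2#0}; s=${3#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
)

start=01:52:00
end=02:30:00
length=$(( $(to_seconds "$end") - $(to_seconds "$start") ))
echo "Clip length: $length seconds"    # 9000 - 6720 = 2280 seconds (38 minutes)
```

A negative or zero result means the start and end times are swapped or identical, which is worth catching before running ffmpeg.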

Note: If you want to transcribe the whole YouTube video, skip Step 3 and pass video_audio.mp3 to Whisper in the next step.

Step 4: Convert Speech to Text using Whisper

Now use Whisper to transcribe the audio clip that we trimmed in step 3:

whisper clip.mp3 --model base --language en

I got good results with the base model. You can also use other models such as tiny, small, medium, or large; check the Whisper GitHub page for the full list. For example, to use the small model:

whisper clip.mp3 --model small --language en

Whisper produces several files:

clip.txt: plain text transcript
clip.srt and clip.vtt: subtitle files with timestamps
clip.tsv and clip.json: the same segments in tab-separated and JSON form
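Because the subtitle files carry timestamps, they are handy for finding where something was said. Here is a rough sketch with awk, assuming the usual SRT layout in which each cue has a timing line containing "-->" followed by its text. The helper name find_quote is hypothetical:

```shell
# Print the timing line and text of every cue that contains a phrase.
# Usage: find_quote FILE.srt "phrase"   (the match is case-insensitive)
find_quote() {
    awk -v phrase="$2" '
        / --> / { stamp = $0; next }    # remember the most recent timing line
        index(tolower($0), tolower(phrase)) > 0 { print stamp; print $0 }
    ' "$1"
}
```

For example, find_quote clip.srt "open source" prints each matching line together with its time range. Keep in mind that the times in clip.srt count from the start of clip.mp3, so add the clip's start time (01:52:00 in the example above) to locate the passage in the original video.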

It runs entirely offline and is surprisingly accurate.

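If you want faster results, you can try Faster-Whisper. Note that the faster-whisper package is a Python library and, as far as I know, does not install a command of its own. One command-line front end built on it is whisper-ctranslate2, which accepts options similar to Whisper's:

pipx install whisper-ctranslate2
whisper-ctranslate2 clip.mp3 --model medium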

Both versions work great, but the regular Whisper model is often enough for short clips.

As mentioned above, Whisper writes several output files by default. If you only want the plain-text transcript, use this command:

whisper clip.mp3 --model base --language en --output_format txt --output_dir .

Please note the dot (.) at the end. This command creates a file named clip.txt containing the transcribed text. Here's what each part of the command does:

--model base: uses the base model, which is small and fast.
--language en: sets the language to English instead of relying on auto-detection.
--output_format txt: generates a plain-text file.
--output_dir .: saves it in the current folder.
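Whisper writes the plain-text transcript with one segment per line, which is fine for reading but awkward to paste into notes. If you would rather have one flowing paragraph, a small sketch like this joins the lines. The helper name join_transcript is hypothetical:

```shell
# Join all lines of a transcript into a single paragraph.
join_transcript() {
    # paste -s merges the lines of the file; -d ' ' puts a space between them
    paste -s -d ' ' "$1"
}
```

For example, join_transcript clip.txt > clip_paragraph.txt leaves the original file untouched and writes the joined text to a new one.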

Step 5: Transcribe YouTube Videos using a Script (Optional)

If you do this often, you can use a small script to handle all the steps. Save the following code in a text file called yt_extract_text.sh:

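#!/bin/bash
# Usage: ./yt_extract_text.sh URL
set -euo pipefail   # stop at the first failing step

url="${1:?Usage: $0 URL}"
start="01:52:00"
end="02:30:00"

# Download the audio only and convert it to MP3 (saved as temp.mp3)
yt-dlp -f bestaudio -x --audio-format mp3 -o "temp.%(ext)s" "$url"
# Cut the chosen time range without re-encoding
ffmpeg -ss "$start" -to "$end" -i temp.mp3 -c copy clip.mp3
# Transcribe the clip; the output files land in the current folder
whisper clip.mp3 --model small --language en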

Adjust the start and end times, audio format, output file names, model, and language in the script to suit your needs.

Then make it executable:

chmod +x yt_extract_text.sh

To use it, simply type:

./yt_extract_text.sh "https://www.youtube.com/watch?v=xxxx"

The script downloads, trims, and transcribes automatically. You will end up with a neat .txt file containing just the part you wanted.
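If you would rather not edit the script for every clip, one option is to pass the time range as arguments. The sketch below is a variation on the same three commands, not a replacement for the script. The names extract_text and run are my own, and the DRY_RUN variable is an addition of mine that prints the commands without running them so you can review them first:

```shell
# Print a command, then run it unless DRY_RUN=1 is set.
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Usage: extract_text URL START END   (times as HH:MM:SS)
extract_text() {
    if [ "$#" -ne 3 ]; then
        echo "Usage: extract_text URL START END" >&2
        return 1
    fi
    # Each step runs only if the previous one succeeded
    run yt-dlp -f bestaudio -x --audio-format mp3 -o "temp.%(ext)s" "$1" &&
    run ffmpeg -ss "$2" -to "$3" -i temp.mp3 -c copy clip.mp3 &&
    run whisper clip.mp3 --model small --language en
}
```

In bash, after loading these functions, DRY_RUN=1 extract_text "https://www.youtube.com/watch?v=xxxx" 00:10:00 00:15:00 prints the three commands. Run the same line without DRY_RUN=1 to execute them.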

FAQ: Extracting Text from YouTube Videos on Linux

Q: Can I extract text from a specific part of a YouTube video without downloading the whole thing?

A: This guide downloads the full audio with yt-dlp, trims the exact section with ffmpeg, and then transcribes it with Whisper. If bandwidth is a concern, recent yt-dlp versions have a --download-sections option that can fetch only a time range.

Q: What is the most accurate way to convert YouTube audio to text offline?

A: Use OpenAI Whisper or Faster-Whisper. Both run locally, work with many languages, and deliver high accuracy compared to online transcription tools.

Q: How do I install Whisper cleanly on Linux?

A: Install pipx first, then run: pipx install openai-whisper

This method keeps Whisper in its own isolated environment and avoids cluttering your system Python packages.

Q: Is Whisper free to use?

A: Yes. Whisper is open-source and completely free. You can run it locally on Linux without any subscription or cloud cost.

Q: What are the main tools used in this method?

A: You need three tools:

1. yt-dlp to download the audio
2. ffmpeg to trim the portion you want
3. Whisper to transcribe speech to text

All three are free and open-source.

Q: Can I use this method on Windows or macOS?

A: Yes. The same commands work on Windows and macOS with small path adjustments. Linux users just find it easier because most tools are available from the terminal by default.

Q: How long can Whisper transcribe audio?

A: There's no strict limit. Whisper can handle long audio files as long as your system has enough RAM and storage. For very long videos, trimming them into smaller segments works best.

Q: What output formats does Whisper create?

A: By default, Whisper generates .txt, .srt, and .vtt files, along with .tsv and .json. The .txt file contains plain text, while .srt and .vtt include timestamps for use as subtitles.

Q: Does this method need an internet connection after downloading the video?

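A: No, with one exception. Once you've downloaded the audio with yt-dlp, ffmpeg and Whisper run locally. The first time you use a given Whisper model, it connects briefly to download the model files; after that, both tools work completely offline.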

Q: Is this workflow safe and private?

A: Yes. Every step runs on your local machine. No audio or text leaves your system, which keeps your data private.

Conclusion

I use this method often to collect quotes and notes from long YouTube videos. I no longer waste time transcribing by hand or relying on online tools that store my data. Everything happens locally, and I get high-quality text from exactly the part I need.

If you watch long lectures, podcasts, or interviews, this setup can save you hours. Try it once: it's simple, clean, and it just works.


4 comments

livewire October 31, 2025 - 7:49 pm

This really interested me and I decided to give it a try, However I really think you should be upfront and warn people of the size of this software. On my modest connection the install of open-ai-whisper and pipx totalled 10.8gb! Yes 10.8gb and it was downloading for what seemed like an age with no indication of what was happening. On running the program it immediately made an https connection to god knows where and I instantly disconnected from the internet for safety. So much for not sending your data out to who knows where!. This software would not work without an internet connection. I would not recommend anyone to try this software, unless you can put my concerns to rest?.

sk October 31, 2025 - 7:58 pm

You’re right. It downloads a significant amount of data at first install. Please note that Whisper is large but safe. The only network activity is model downloading, not data uploading.
Once installed, it’s fully offline, private, and transparent. I will add this warning right way. Somehow I missed it. Thanks for bringing this to my attention.

LIVEWIRE October 31, 2025 - 9:11 pm

Thanks for your quick response and for adding the warning to the article, its appreciated. I have had chance to test again and have found that whisper was re-downloading the model (base) I selected as it said that the files sha256 did not match with the copy I had installed. I also tested with the ‘small’ model and this produced very accurate results on the 9 minute youtube clip I downloaded. I’m impressed with it and look forward to trying it some more.

sk October 31, 2025 - 9:14 pm

Glad it helped you. Whisper is one of the useful opensource project.
