Bitcoin Forum
Author Topic: WANTED: YouTube Live (or X Live) transcribe via AI  (Read 98 times)
DannyHamilton (OP)
Legendary

Activity: 3640
Merit: 5177

May 15, 2025, 02:01:31 AM
 #1

I'd like to be able to transcribe and summarize YouTube Live broadcasts via a tool like fireflies.ai.

I need a way to capture and route the audio from the live broadcast directly into the service.

I don't really have the time (or motivation) right now to solve this myself.

What would be GREAT would be if someone could get an AWS EC2 instance all set up and working to accomplish this, and then could just share the AMI so that I can fire up the instance whenever I want it, point it at a live broadcast and AI tool of my choice, and then shut it down when I'm done with it.

Does anybody here have the time and skills to set something like that up, and if so, at what cost?

It doesn't necessarily have to be an EC2 instance. That's just the first solution that came to mind. I'm open to other solutions, or even a well-written set of step-by-step instructions on exactly how to set it up myself (assuming that the steps could be followed without much knowledge, and can be completed in less than 2 hours).
suzanne5223
Hero Member

Activity: 3066
Merit: 689

May 15, 2025, 06:01:15 PM
 #2

I have never used Fireflies.ai or AWS EC2, and I don't know how well they transcribe live video. However, I have used live.maestra.ai to transcribe YouTube live video.
It does misspell words at some points due to pronunciation, which I believe you will understand; AI does that sometimes.

Since you're open to other solutions, or to instructions on how to set up an AWS EC2 instance, and since I don't see Fireflies.ai as a platform that's hard to use, I'm willing to take this on. I would just like to know whether there's a specific day, time, and duration for the YouTube Live and X Live broadcasts, so I can be sure it's something that will work for me.

KarmaHODL
Newbie

Activity: 8
Merit: 0

May 20, 2025, 07:12:00 PM
Last edit: May 21, 2025, 04:48:03 PM by KarmaHODL
 #3

Hello DannyHamilton!

I'm sharing instructions below on how to do it.

I can try to set this up for you on EC2, but I'll admit I haven't worked much with EC2. If the free tier allows it, I can give it a shot.

I focused on YouTube; this script won't work with X, but if you'd like, I can try to write something that does.
It won't work with X because yt-dlp doesn't currently support livestreams from X, and the platform doesn't provide direct access to live video streams in a downloadable format.


::How to Transcribe::

1. Requirements
-> Linux/macOS or Windows with WSL installed
-> Terminal access
-> Internet connection

2. Install required tools
On Linux (Ubuntu/Debian) or WSL terminal:
Code:
sudo apt update
sudo apt install -y ffmpeg python3-pip
pip3 install yt-dlp
pip3 install openai-whisper

On macOS (with Homebrew):
Code:
brew install ffmpeg
pip3 install yt-dlp openai-whisper
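Before moving on, it may help to confirm the tools actually installed and are on your PATH. A minimal check (works in any POSIX shell) could look like this:

```shell
# Report whether each required tool is reachable on PATH
for tool in ffmpeg yt-dlp whisper; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING" >&2
  fi
done
```

If anything prints MISSING, re-run the matching install step above (and on Linux, check that pip's script directory, usually ~/.local/bin, is on your PATH).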

3. Create the recording & transcription script
Create a file named (for example) record_and_transcribe.sh with this content:
Code:
#!/bin/bash

# Check if URL argument is given
if [ -z "$1" ]; then
  echo "Usage: $0 <YouTube_Live_URL> [duration_in_seconds]"
  exit 1
fi

URL=$1
DURATION=${2:-1800}  # default 1800 seconds (30 minutes)
OUTPUT="live_audio_$(date +%Y%m%d_%H%M%S).wav"
TRANSCRIPT="transcript_$(date +%Y%m%d_%H%M%S).txt"
# The OUTPUT and TRANSCRIPT filenames can of course be changed - we can also modify the script to accept them as arguments.

echo "Recording audio from: $URL for $DURATION seconds..."
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -t "$DURATION" -c:a pcm_s16le "$OUTPUT"

echo "Recording finished: $OUTPUT"
echo "Starting transcription with Whisper..."

whisper "$OUTPUT" --model tiny --output_format txt --output_dir .
# Whisper's tiny model is fast and light.
# For better accuracy use larger models (base, small, medium, or large) but they require more resources.

# Rename transcript to consistent filename
mv "${OUTPUT%.*}.txt" "$TRANSCRIPT"

echo "Transcription complete."
echo "Audio file: $OUTPUT"
echo "Transcript file: $TRANSCRIPT"

Make it executable:
Code:
chmod +x record_and_transcribe.sh

4. Run the script
For example, to record 10 minutes:
Code:
./record_and_transcribe.sh https://www.youtube.com/watch?v=YOUR_LIVE_LINK 600

5. Result
-> A .wav audio file with recorded live stream audio
-> A .txt file with the transcript of the audio

6. Optional notes
Adjust the duration_in_seconds parameter to change the recording length (it’s the second argument you provide when running the script).

Whisper’s "tiny" model is fast and light; for better accuracy use larger models (base, small, medium, or large) but they require more resources.

For Windows users, use WSL Ubuntu or Git Bash with Linux tools installed.


::How to Summarize the Transcription::

1. Open your terminal.

2. Clone the repository:
Code:
git clone https://github.com/ggerganov/llama.cpp

3. Change directory:
Code:
cd llama.cpp

4. Build the program:
Code:
make

5. Download the Mistral-7B-Instruct model in .gguf format and place it inside llama.cpp/models/mistral/.
For example, you can download it from here: https://huggingface.co/itlwas/Mistral-7B-Instruct-v0.1-Q4_K_M-GGUF/blob/main/mistral-7b-instruct-v0.1-q4_k_m.gguf
Note that the downloaded file is named mistral-7b-instruct-v0.1-q4_k_m.gguf; rename it to mistral-7b-instruct-v0.1.Q4_K_M.gguf (or adjust the -m path in step 7) so the filename and the command match.

This model and version are lightweight and fast, making them perfect for local use without heavy hardware. It is instruction-tuned, so it handles commands like “Summarize this text” very well. The Q4_K_M quantization reduces the model size and speeds up inference with minimal quality loss. The GGUF format is optimized for llama.cpp, making it easy and efficient to run locally.

It strikes a good balance between speed, quality, and usability on typical personal computers.

6. Copy your transcript file (for example, the one generated earlier) into the llama.cpp folder and name it (for example) transcript.txt.

7. Run the summary command and save the output to a file:
Code:
./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Summarize this text: $(cat transcript.txt)" > summary.txt
(Of course, the output file name can be different.)

8. Open summary.txt to read the summary.

9. Notes & Tips
The command assumes you run it inside the llama.cpp folder, where ./main and models/mistral/ reside.

Whisper’s tiny model is fast but less accurate. You can use other Whisper models (base, small, medium, large) by changing the --model flag.

If the transcript is very long, llama.cpp might not handle the entire text in one go (due to token limits). In that case, consider splitting the transcript and summarizing parts separately.

On Windows, use WSL for Linux compatibility.
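One more caveat on step 7: "$(cat transcript.txt)" expands the entire transcript into a single command-line argument, so besides the model's token limit there is also the kernel's argument-size limit (ARG_MAX). A quick sanity check before running the command:

```shell
# The kernel rejects command lines larger than ARG_MAX bytes, and
# "$(cat transcript.txt)" puts the whole transcript on the command line.
getconf ARG_MAX            # maximum argument size in bytes on this system
if [ -f transcript.txt ]; then
  wc -c transcript.txt     # transcript size in bytes, for comparison
fi
```

If the transcript approaches that limit, splitting it (as suggested above) avoids the problem entirely.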


::Complete Guide: Transcribe & Summarize YouTube Live Stream in One Script::

1. Prerequisites
Operating System: Linux/macOS or Windows with WSL
Terminal with internet access

Installed tools:
ffmpeg
yt-dlp
python3-pip

Python packages: openai-whisper

llama.cpp repository built with make

Mistral-7B-Instruct model downloaded in .gguf format and placed in llama.cpp/models/mistral/

2. Install Required Tools
Linux (Ubuntu/Debian) or WSL:
Code:
sudo apt update
sudo apt install -y ffmpeg python3-pip make git build-essential
pip3 install yt-dlp openai-whisper

macOS (with Homebrew):
Code:
brew install ffmpeg
pip3 install yt-dlp openai-whisper

3. Download and Prepare llama.cpp and Mistral Model
Code:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

Download the Mistral-7B-Instruct model in .gguf format and place it inside:

llama.cpp/models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf

You can download it from:
https://huggingface.co/itlwas/Mistral-7B-Instruct-v0.1-Q4_K_M-GGUF/blob/main/mistral-7b-instruct-v0.1-q4_k_m.gguf
(The downloaded file is named mistral-7b-instruct-v0.1-q4_k_m.gguf; rename it so it matches the path above.)
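If you prefer to fetch the model from the terminal, note that the link above points at the Hugging Face web page for the file; the raw file itself is served under /resolve/ rather than /blob/. A small sketch (the wget line is commented out so you can review the URL before starting a multi-gigabyte download):

```shell
BLOB_URL="https://huggingface.co/itlwas/Mistral-7B-Instruct-v0.1-Q4_K_M-GGUF/blob/main/mistral-7b-instruct-v0.1-q4_k_m.gguf"
# Hugging Face serves the raw file under /resolve/ instead of /blob/
DOWNLOAD_URL=$(printf '%s\n' "$BLOB_URL" | sed 's|/blob/|/resolve/|')
echo "$DOWNLOAD_URL"

mkdir -p models/mistral
# Uncomment to download (several GB) into the path the scripts expect:
# wget -O models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf "$DOWNLOAD_URL"
```

Saving it under the .Q4_K_M.gguf name directly avoids the rename step.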

4. Create the Combined Script
Create a bash script file called record_transcribe_summarize.sh with the following content:

Code:
#!/bin/bash

if [ -z "$1" ]; then
  echo "Usage: $0 <YouTube_Live_URL> [duration_in_seconds]"
  exit 1
fi

URL=$1
DURATION=${2:-1800} # default 30 minutes

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
OUTPUT_AUDIO="live_audio_${TIMESTAMP}.wav"
TRANSCRIPT="transcript_${TIMESTAMP}.txt"
SUMMARY="summary_${TIMESTAMP}.txt"

echo "Recording audio from $URL for $DURATION seconds..."
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -t "$DURATION" -c:a pcm_s16le "$OUTPUT_AUDIO"

echo "Audio recording finished: $OUTPUT_AUDIO"
echo "Starting transcription with Whisper..."

whisper "$OUTPUT_AUDIO" --model tiny --output_format txt --output_dir .

mv "${OUTPUT_AUDIO%.*}.txt" "$TRANSCRIPT"

echo "Transcription complete: $TRANSCRIPT"

echo "Starting summary generation with llama.cpp and Mistral-7B..."

./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Summarize this text: $(cat "$TRANSCRIPT")" > "$SUMMARY"

echo "Summary saved to: $SUMMARY"

Make it executable:

Code:
chmod +x record_transcribe_summarize.sh

5. Run the Script
Run the script providing the YouTube Live URL and optionally duration in seconds:

Code:
./record_transcribe_summarize.sh https://www.youtube.com/watch?v=YOUR_LIVE_LINK 600

This will record 10 minutes of audio from the live stream, transcribe it, then generate a summary.

Output files:

Audio: live_audio_TIMESTAMP.wav

Transcript: transcript_TIMESTAMP.txt

Summary: summary_TIMESTAMP.txt

6. Notes & Tips
The script assumes you run it inside the llama.cpp folder where ./main and models/mistral/ reside.

Whisper’s tiny model is fast but less accurate. You can use other Whisper models (base, small, medium, large) by changing the --model flag.

If the transcript is very long, llama.cpp might not handle the entire text in one go (due to token limits). In that case, consider splitting the transcript and summarizing parts separately.
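The splitting itself can be done with standard tools. A rough sketch, wrapped in a function so you can call it from inside the llama.cpp folder (the chunk size of 20 lines x ~1000 characters is an arbitrary guess at what fits the context window; tune it for your model):

```shell
# Sketch: break a long transcript into pieces that fit the model's
# context window, summarize each piece, then collect the partial summaries.
summarize_chunks() {
  local transcript=$1
  local model="models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf"

  mkdir -p chunks
  # fold wraps the (possibly single-line) transcript at ~1000 characters,
  # then split groups every 20 wrapped lines into its own chunk file
  fold -s -w 1000 "$transcript" | split -l 20 - chunks/part_

  : > partial_summaries.txt
  for f in chunks/part_*; do
    ./main -m "$model" -p "Summarize this text: $(cat "$f")" >> partial_summaries.txt
  done

  # Optionally condense the partial summaries into one final overview
  ./main -m "$model" -p "Combine these partial summaries into one: $(cat partial_summaries.txt)" > summary.txt
}

# Usage (inside the llama.cpp folder):
# summarize_chunks transcript_TIMESTAMP.txt
```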

On Windows, use WSL for Linux compatibility.


::How to Record Without Specifying Duration - Auto Stop When Live Ends::

To record until the YouTube live stream ends automatically without specifying duration, modify the recording command in your script to remove the duration limit.

Replace this line in the script:
Code:
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -t $DURATION -c:a pcm_s16le "$OUTPUT_AUDIO"

with this:
Code:
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -c:a pcm_s16le "$OUTPUT_AUDIO"

What this does:
yt-dlp streams the audio continuously until the live stream ends

ffmpeg records all audio until yt-dlp stops

Recording automatically finishes when the live stream ends (no manual duration needed)

Code:
#!/bin/bash

if [ -z "$1" ]; then
  echo "Usage: $0 <YouTube_Live_URL>"
  exit 1
fi

URL=$1

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
OUTPUT_AUDIO="live_audio_${TIMESTAMP}.wav"
TRANSCRIPT="transcript_${TIMESTAMP}.txt"
SUMMARY="summary_${TIMESTAMP}.txt"

echo "Recording audio from $URL until live ends..."
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -c:a pcm_s16le "$OUTPUT_AUDIO"

echo "Recording finished: $OUTPUT_AUDIO"
echo "Starting transcription with Whisper..."

whisper "$OUTPUT_AUDIO" --model tiny --output_format txt --output_dir .

mv "${OUTPUT_AUDIO%.*}.txt" "$TRANSCRIPT"

echo "Transcription complete: $TRANSCRIPT"

echo "Starting summary generation with llama.cpp and Mistral-7B..."

./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Summarize this text: $(cat "$TRANSCRIPT")" > "$SUMMARY"

echo "Summary saved to: $SUMMARY"

Make sure it’s executable:
Code:
chmod +x record_transcribe_summarize.sh

Run it like this (no duration argument needed):
Code:
./record_transcribe_summarize.sh https://www.youtube.com/watch?v=YOUR_LIVE_LINK

::Cons of not specifying recording duration (auto-stop)::
-> Large files may quickly use up disk space.
-> Recording may stop early if the stream disconnects or buffers.
-> No control over how long you record.
-> Very long files take more time and resources to transcribe and summarize.
-> Whisper and llama.cpp may struggle with very large inputs.
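To soften the disk-space risk from the first point, the script could refuse to start when free space is low. A minimal pre-flight check (the 5 GB threshold is an arbitrary example):

```shell
# Abort early if the working directory has less than ~5 GB free,
# since an unattended recording can otherwise fill the disk.
REQUIRED_KB=$((5 * 1024 * 1024))               # 5 GB expressed in KiB
AVAIL_KB=$(df -Pk . | awk 'NR==2 {print $4}')  # POSIX df: 4th column is free KiB
if [ "$AVAIL_KB" -lt "$REQUIRED_KB" ]; then
  echo "Not enough free disk space to record safely." >&2
  # exit 1   # uncomment when placed inside the recording script
fi
```

Dropped at the top of the auto-stop script, this keeps an open-ended recording from running a small machine out of disk.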

If this works for you, a tip would make me do a happy dance :) (bc1q955fz4agkyt9fy53gznlx99w30xyvl46e9ynnd)

If you want me to set this up on EC2, I can give it a try.
I can customize or extend the script to handle long transcription chunking or automate model downloads if needed.