Bitcoin Forum
Author Topic: WANTED: YouTube Live (or X Live) transcribe via AI  (Read 98 times)
DannyHamilton (OP)
Legendary

Activity: 3640
Merit: 5177

May 15, 2025, 02:01:31 AM
 #1

I'd like to be able to transcribe and summarize YouTube Live broadcasts via a tool like fireflies.ai.

I need a way to capture and route the audio from the live broadcast directly into the service.

I don't really have the time (or motivation) right now to solve this myself.

What would be GREAT would be if someone could get an AWS EC2 instance all set up and working to accomplish this, and then could just share the AMI so that I can fire up the instance whenever I want it, point it at a live broadcast and AI tool of my choice, and then shut it down when I'm done with it.

Does anybody here have the time and skills to set something like that up, and if so, at what cost?

It doesn't necessarily have to be an EC2 instance. That's just the first solution that came to mind. I'm open to other solutions, or even a well-written set of step-by-step instructions on exactly how to set it up myself (assuming that the steps could be followed without much knowledge, and can be completed in less than 2 hours).
suzanne5223
Hero Member

Activity: 3066
Merit: 689

May 15, 2025, 06:01:15 PM
 #2

I have never used Fireflies.ai or AWS EC2, and I don't know how well they transcribe live video. However, I have used live.maestra.ai to transcribe YouTube live video.
It does misspell words at some points due to pronunciation, which I believe you will understand; AI does that sometimes.

Since you're open to other solutions, or to instructions on how to set up an AWS EC2 instance, and since I don't see Fireflies.ai as a platform that's hard to use, I'm willing to take this on. I would just like to know whether there's a specific day, time, and duration for the YouTube Live and X Live broadcasts, so I can be sure it's something that will work for me.

KarmaHODL
Newbie

Activity: 8
Merit: 0

May 20, 2025, 07:12:00 PM
Last edit: May 21, 2025, 04:48:03 PM by KarmaHODL
 #3

Hello DannyHamilton!

I'm sharing instructions below on how to do it.

I can try to set this up for you on EC2, but I'll admit I haven't worked much with EC2. If the free tier allows it, I can give it a shot.

I focused on YouTube; this script won't work with X, but if you'd like, I can try to write something that does.
It won't work with X because yt-dlp doesn't currently support livestreams from X, and the platform doesn't provide direct access to live video streams in a downloadable format.


::How to Transcribe::

1. Requirements
-> Linux/macOS or Windows with WSL installed
-> Terminal access
-> Internet connection

2. Install required tools
On Linux (Ubuntu/Debian) or WSL terminal:
Code:
sudo apt update
sudo apt install -y ffmpeg python3-pip
pip3 install yt-dlp
pip3 install openai-whisper

On macOS (with Homebrew):
Code:
brew install ffmpeg
pip3 install yt-dlp openai-whisper
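Before moving on, it may help to confirm the tools actually installed and are on your PATH. A minimal check (works in any POSIX shell) could look like this:

```shell
# Report whether each required tool is reachable on PATH
for tool in ffmpeg yt-dlp whisper; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING" >&2
  fi
done
```

If anything prints MISSING, re-run the matching install step above (and on Linux, check that pip's script directory, usually ~/.local/bin, is on your PATH).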

3. Create the recording & transcription script
Create a file named (for example) record_and_transcribe.sh with this content:
Code:
#!/bin/bash

# Check if URL argument is given
if [ -z "$1" ]; then
  echo "Usage: $0 <YouTube_Live_URL> [duration_in_seconds]"
  exit 1
fi

URL=$1
DURATION=${2:-1800}  # default 1800 seconds (30 minutes)
OUTPUT="live_audio_$(date +%Y%m%d_%H%M%S).wav"
TRANSCRIPT="transcript_$(date +%Y%m%d_%H%M%S).txt"
# The OUTPUT and TRANSCRIPT filenames can of course be changed - we can also modify the script to accept them as arguments.

echo "Recording audio from: $URL for $DURATION seconds..."
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -t "$DURATION" -c:a pcm_s16le "$OUTPUT"

echo "Recording finished: $OUTPUT"
echo "Starting transcription with Whisper..."

whisper "$OUTPUT" --model tiny --output_format txt --output_dir .
# Whisper's tiny model is fast and light.
# For better accuracy use larger models (base, small, medium, or large) but they require more resources.

# Rename transcript to consistent filename
mv "${OUTPUT%.*}.txt" "$TRANSCRIPT"

echo "Transcription complete."
echo "Audio file: $OUTPUT"
echo "Transcript file: $TRANSCRIPT"

Make it executable:
Code:
chmod +x record_and_transcribe.sh

4. Run the script
For example, to record 10 minutes:
Code:
./record_and_transcribe.sh https://www.youtube.com/watch?v=YOUR_LIVE_LINK 600

5. Result
-> A .wav audio file with recorded live stream audio
-> A .txt file with the transcript of the audio

6. Optional notes
Adjust the duration_in_seconds parameter to change the recording length (it’s the second argument you provide when running the script).

Whisper’s "tiny" model is fast and light; for better accuracy use larger models (base, small, medium, or large) but they require more resources.

For Windows users, use WSL Ubuntu or Git Bash with Linux tools installed.


::How to Summarize the Transcription::

1. Open your terminal.

2. Clone the repository:
Code:
git clone https://github.com/ggerganov/llama.cpp

3. Change directory:
Code:
cd llama.cpp

4. Build the program:
Code:
make

5. Download the Mistral-7B-Instruct model in .gguf format and place it inside llama.cpp/models/mistral/.
For example, you can download it from here: https://huggingface.co/itlwas/Mistral-7B-Instruct-v0.1-Q4_K_M-GGUF/blob/main/mistral-7b-instruct-v0.1-q4_k_m.gguf
Note that the downloaded file is named mistral-7b-instruct-v0.1-q4_k_m.gguf; rename it to mistral-7b-instruct-v0.1.Q4_K_M.gguf (or adjust the -m path in step 7) so the filename and the command match.

This model and version are lightweight and fast, making them perfect for local use without heavy hardware. It is instruction-tuned, so it handles commands like “Summarize this text” very well. The Q4_K_M quantization reduces the model size and speeds up inference with minimal quality loss. The GGUF format is optimized for llama.cpp, making it easy and efficient to run locally.

It strikes a good balance between speed, quality, and usability on typical personal computers.

6. Copy your transcript file (for example, the one generated earlier) into the llama.cpp folder and name it (for example) transcript.txt.

7. Run the summary command and save the output to a file:
Code:
./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Summarize this text: $(cat transcript.txt)" > summary.txt
(Of course, the output file name can be different.)

8. Open summary.txt to read the summary.

9. Notes & Tips
The command assumes you run it inside the llama.cpp folder, where ./main and models/mistral/ reside.

Whisper’s tiny model is fast but less accurate. You can use other Whisper models (base, small, medium, large) by changing the --model flag.

If the transcript is very long, llama.cpp might not handle the entire text in one go (due to token limits). In that case, consider splitting the transcript and summarizing parts separately.

On Windows, use WSL for Linux compatibility.
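One more caveat on step 7: "$(cat transcript.txt)" expands the entire transcript into a single command-line argument, so besides the model's token limit there is also the kernel's argument-size limit (ARG_MAX). A quick sanity check before running the command:

```shell
# The kernel rejects command lines larger than ARG_MAX bytes, and
# "$(cat transcript.txt)" puts the whole transcript on the command line.
getconf ARG_MAX            # maximum argument size in bytes on this system
if [ -f transcript.txt ]; then
  wc -c transcript.txt     # transcript size in bytes, for comparison
fi
```

If the transcript approaches that limit, splitting it (as suggested above) avoids the problem entirely.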


::Complete Guide: Transcribe & Summarize YouTube Live Stream in One Script::

1. Prerequisites
Operating System: Linux/macOS or Windows with WSL
Terminal with internet access

Installed tools:
ffmpeg
yt-dlp
python3-pip

Python packages: openai-whisper

llama.cpp repository built with make

Mistral-7B-Instruct model downloaded in .gguf format and placed in llama.cpp/models/mistral/

2. Install Required Tools
Linux (Ubuntu/Debian) or WSL:
Code:
sudo apt update
sudo apt install -y ffmpeg python3-pip make git build-essential
pip3 install yt-dlp openai-whisper

macOS (with Homebrew):
Code:
brew install ffmpeg
pip3 install yt-dlp openai-whisper

3. Download and Prepare llama.cpp and Mistral Model
Code:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

Download the Mistral-7B-Instruct model in .gguf format and place it inside:

llama.cpp/models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf

You can download it from:
https://huggingface.co/itlwas/Mistral-7B-Instruct-v0.1-Q4_K_M-GGUF/blob/main/mistral-7b-instruct-v0.1-q4_k_m.gguf
(The downloaded file is named mistral-7b-instruct-v0.1-q4_k_m.gguf; rename it so it matches the path above.)
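If you prefer to fetch the model from the terminal, note that the link above points at the Hugging Face web page for the file; the raw file itself is served under /resolve/ rather than /blob/. A small sketch (the wget line is commented out so you can review the URL before starting a multi-gigabyte download):

```shell
BLOB_URL="https://huggingface.co/itlwas/Mistral-7B-Instruct-v0.1-Q4_K_M-GGUF/blob/main/mistral-7b-instruct-v0.1-q4_k_m.gguf"
# Hugging Face serves the raw file under /resolve/ instead of /blob/
DOWNLOAD_URL=$(printf '%s\n' "$BLOB_URL" | sed 's|/blob/|/resolve/|')
echo "$DOWNLOAD_URL"

mkdir -p models/mistral
# Uncomment to download (several GB) into the path the scripts expect:
# wget -O models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf "$DOWNLOAD_URL"
```

Saving it under the .Q4_K_M.gguf name directly avoids the rename step.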

4. Create the Combined Script
Create a bash script file called record_transcribe_summarize.sh with the following content:

Code:
#!/bin/bash

if [ -z "$1" ]; then
  echo "Usage: $0 <YouTube_Live_URL> [duration_in_seconds]"
  exit 1
fi

URL=$1
DURATION=${2:-1800} # default 30 minutes

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
OUTPUT_AUDIO="live_audio_${TIMESTAMP}.wav"
TRANSCRIPT="transcript_${TIMESTAMP}.txt"
SUMMARY="summary_${TIMESTAMP}.txt"

echo "Recording audio from $URL for $DURATION seconds..."
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -t "$DURATION" -c:a pcm_s16le "$OUTPUT_AUDIO"

echo "Audio recording finished: $OUTPUT_AUDIO"
echo "Starting transcription with Whisper..."

whisper "$OUTPUT_AUDIO" --model tiny --output_format txt --output_dir .

mv "${OUTPUT_AUDIO%.*}.txt" "$TRANSCRIPT"

echo "Transcription complete: $TRANSCRIPT"

echo "Starting summary generation with llama.cpp and Mistral-7B..."

./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Summarize this text: $(cat "$TRANSCRIPT")" > "$SUMMARY"

echo "Summary saved to: $SUMMARY"

Make it executable:

Code:
chmod +x record_transcribe_summarize.sh

5. Run the Script
Run the script providing the YouTube Live URL and optionally duration in seconds:

Code:
./record_transcribe_summarize.sh https://www.youtube.com/watch?v=YOUR_LIVE_LINK 600

This will record 10 minutes of audio from the live stream, transcribe it, then generate a summary.

Output files:

Audio: live_audio_TIMESTAMP.wav

Transcript: transcript_TIMESTAMP.txt

Summary: summary_TIMESTAMP.txt

6. Notes & Tips
The script assumes you run it inside the llama.cpp folder where ./main and models/mistral/ reside.

Whisper’s tiny model is fast but less accurate. You can use other Whisper models (base, small, medium, large) by changing the --model flag.

If the transcript is very long, llama.cpp might not handle the entire text in one go (due to token limits). In that case, consider splitting the transcript and summarizing parts separately.
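The splitting itself can be done with standard tools. A rough sketch, wrapped in a function so you can call it from inside the llama.cpp folder (the chunk size of 20 lines x ~1000 characters is an arbitrary guess at what fits the context window; tune it for your model):

```shell
# Sketch: break a long transcript into pieces that fit the model's
# context window, summarize each piece, then collect the partial summaries.
summarize_chunks() {
  local transcript=$1
  local model="models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf"

  mkdir -p chunks
  # fold wraps the (possibly single-line) transcript at ~1000 characters,
  # then split groups every 20 wrapped lines into its own chunk file
  fold -s -w 1000 "$transcript" | split -l 20 - chunks/part_

  : > partial_summaries.txt
  for f in chunks/part_*; do
    ./main -m "$model" -p "Summarize this text: $(cat "$f")" >> partial_summaries.txt
  done

  # Optionally condense the partial summaries into one final overview
  ./main -m "$model" -p "Combine these partial summaries into one: $(cat partial_summaries.txt)" > summary.txt
}

# Usage (inside the llama.cpp folder):
# summarize_chunks transcript_TIMESTAMP.txt
```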

On Windows, use WSL for Linux compatibility.


::How to Record Without Specifying Duration - Auto Stop When Live Ends::

To record until the YouTube live stream ends automatically without specifying duration, modify the recording command in your script to remove the duration limit.

Replace this line in the script:
Code:
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -t $DURATION -c:a pcm_s16le "$OUTPUT_AUDIO"

with this:
Code:
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -c:a pcm_s16le "$OUTPUT_AUDIO"

What this does:
yt-dlp streams the audio continuously until the live stream ends

ffmpeg records all audio until yt-dlp stops

Recording automatically finishes when the live stream ends (no manual duration needed)

Code:
#!/bin/bash

if [ -z "$1" ]; then
  echo "Usage: $0 <YouTube_Live_URL>"
  exit 1
fi

URL=$1

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
OUTPUT_AUDIO="live_audio_${TIMESTAMP}.wav"
TRANSCRIPT="transcript_${TIMESTAMP}.txt"
SUMMARY="summary_${TIMESTAMP}.txt"

echo "Recording audio from $URL until live ends..."
yt-dlp -f bestaudio -o - "$URL" | ffmpeg -i pipe:0 -c:a pcm_s16le "$OUTPUT_AUDIO"

echo "Recording finished: $OUTPUT_AUDIO"
echo "Starting transcription with Whisper..."

whisper "$OUTPUT_AUDIO" --model tiny --output_format txt --output_dir .

mv "${OUTPUT_AUDIO%.*}.txt" "$TRANSCRIPT"

echo "Transcription complete: $TRANSCRIPT"

echo "Starting summary generation with llama.cpp and Mistral-7B..."

./main -m models/mistral/mistral-7b-instruct-v0.1.Q4_K_M.gguf -p "Summarize this text: $(cat "$TRANSCRIPT")" > "$SUMMARY"

echo "Summary saved to: $SUMMARY"

Make sure it’s executable:
Code:
chmod +x record_transcribe_summarize.sh

Run it like this (no duration argument needed):
Code:
./record_transcribe_summarize.sh https://www.youtube.com/watch?v=YOUR_LIVE_LINK

::Cons of not specifying recording duration (auto-stop)::
-> Large files may quickly use up disk space.
-> Recording may stop early if the stream disconnects or buffers.
-> No control over how long you record.
-> Very long files take more time and resources to transcribe and summarize.
-> Whisper and llama.cpp may struggle with very large inputs.
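To soften the disk-space risk from the first point, the script could refuse to start when free space is low. A minimal pre-flight check (the 5 GB threshold is an arbitrary example):

```shell
# Abort early if the working directory has less than ~5 GB free,
# since an unattended recording can otherwise fill the disk.
REQUIRED_KB=$((5 * 1024 * 1024))               # 5 GB expressed in KiB
AVAIL_KB=$(df -Pk . | awk 'NR==2 {print $4}')  # POSIX df: 4th column is free KiB
if [ "$AVAIL_KB" -lt "$REQUIRED_KB" ]; then
  echo "Not enough free disk space to record safely." >&2
  # exit 1   # uncomment when placed inside the recording script
fi
```

Dropped at the top of the auto-stop script, this keeps an open-ended recording from running a small machine out of disk.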

If this works for you, a tip would make me do a happy dance :) (bc1q955fz4agkyt9fy53gznlx99w30xyvl46e9ynnd)

If you want me to set this up on EC2, I can give it a try.
I can customize or extend the script to handle long transcription chunking or automate model downloads if needed.