Automatic Subtitle Generation for YouTube Videos: OpenAI's Whisper & Google Colab

esmile1 2024. 12. 20. 00:30

Automatically generating subtitles for YouTube videos helps both content creators and viewers. With the Whisper model developed by OpenAI, the process is straightforward and efficient. This post walks through how to convert the speech in a YouTube video to text with Whisper.

About the Whisper Model

Whisper is a speech recognition model developed by OpenAI. It handles many languages and accents and performs well even on noisy audio. It comes in several sizes, so you can pick one that fits your needs and your hardware.

Available Whisper Models

Whisper is available in the following sizes:

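|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |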

Things to consider when choosing a model:

tiny and base: fast, but less accurate
small and medium: a balance of accuracy and speed
large: most accurate, but slowest
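
The trade-off above can be turned into a simple rule of thumb: take the largest model whose approximate VRAM requirement fits the GPU. The sketch below is only an illustration. The numbers are copied from the table, and `WHISPER_MODELS` and `pick_model` are hypothetical names, not part of Whisper.

```python
# Approximate figures copied from the model table above.
# (name, required VRAM in GB, speed relative to "large")
WHISPER_MODELS = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 6),
    ("medium", 5, 2),
    ("large", 10, 1),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits the budget."""
    fitting = [name for name, vram, _ in WHISPER_MODELS if vram <= available_vram_gb]
    if not fitting:
        raise ValueError("No model in the table fits in the available VRAM")
    # The list is ordered from smallest to largest, so the last match is the biggest.
    return fitting[-1]


print(pick_model(15))  # -> large
print(pick_model(4))   # -> small
```

Picking the largest model that fits favors accuracy. If turnaround time matters more, step down one size and use the relative speed column as a guide.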

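System Requirements

To run Whisper effectively, you need the following:

GPU: T4 or better recommended (available for free on Google Colab)
VRAM: about 1 GB to 10 GB, depending on the model you choose
Storage: depends on the video length and the output formats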
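
To see whether the runtime actually meets these requirements, you can ask PyTorch which device is available and how much memory it has. This is a minimal sketch that assumes PyTorch may or may not be installed. `detect_device` is a hypothetical helper name, while `torch.cuda.is_available()` and `torch.cuda.get_device_properties()` are standard PyTorch calls.

```python
def detect_device():
    """Return (device, total_vram_gb); VRAM is None when no CUDA GPU is usable."""
    try:
        import torch
    except ImportError:
        # Without PyTorch, Whisper cannot run at all; report CPU as a placeholder.
        return "cpu", None
    if torch.cuda.is_available():
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        return "cuda", total_bytes / 1024**3
    return "cpu", None


device, vram_gb = detect_device()
if vram_gb is None:
    print("No CUDA GPU detected; transcription will run on the CPU and be slow.")
else:
    print(f"GPU detected with about {vram_gb:.1f} GB of VRAM.")
```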

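Generating YouTube Subtitles with Whisper

Generating subtitles for a YouTube video with Whisper involves the following steps:

Install the required libraries
Enter the YouTube video URL and choose a Whisper model
Extract the audio from the YouTube video
Transcribe the audio with the Whisper model
Save the transcription as text or in SRT format

Let's look at each step in detail.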

Detailed Walkthrough (20 Steps)

Open Google Colab: go to Google Colab (colab.research.google.com) in your web browser.
Create a new notebook: select 'File' > 'New notebook'.
Enable the GPU: select 'Runtime' > 'Change runtime type' > 'Hardware accelerator' > 'GPU'.
Install the required libraries: run the following commands, one per line.
!pip install -q pytube
!pip install -q git+https://github.com/openai/whisper.git
Import the required modules: run the following code.
import os, re
import torch
from pathlib import Path
from pytube import YouTube
import whisper
from whisper.utils import get_writer
Enter the YouTube URL: set the URL of the video you want to transcribe.
YouTube_URL = "https://www.youtube.com/watch?v=your_video_id"
Choose a Whisper model: select the model to use.
whisper_model = "medium"  # one of "tiny", "base", "small", "medium", "large"
Set the output formats: choose whether to save the result as text (.txt), SRT (.srt), or both.
text = True
srt = True
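Define the audio download function: define a function that extracts the audio from a YouTube video.
def download_audio_from_youtube(url, file_name=None, out_dir="."):
    ...  # function body; the full definition appears under Step 3 of the notebook code later in this post
Define the transcription function: define a function that transcribes the audio with the Whisper model.
def transcribe_audio(model, file, text, srt):
    ...  # function body; the full definition appears under Step 3 of the notebook code later in this post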
Load the Whisper model: load the model you selected, on the GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model(whisper_model).to(device)
Download the audio: extract the audio from the YouTube video.
audio = download_audio_from_youtube(YouTube_URL)
Transcribe the audio: transcribe the extracted audio with the Whisper model.
result = transcribe_audio(model, audio, text, srt)
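Check the results: review the transcription and confirm that the text and SRT files were created.
Download the files: download the generated files to your computer.
Edit the text: where needed, edit the generated text to improve its accuracy.
Check the SRT file: confirm that the SRT file is correctly formatted.
Apply the subtitles: add the generated SRT file to your YouTube video.
Evaluate: assess the accuracy of the generated subtitles and switch to a different model size if needed.
Repeat and improve: repeat the process for other videos and refine the overall workflow.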
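
The SRT check in the walkthrough can be partly automated. An SRT file is a series of cues separated by blank lines, where each cue has a running index, a timing line such as `00:00:01,000 --> 00:00:03,500`, and one or more lines of text. The sketch below checks only that basic layout. `find_srt_problems` is a hypothetical helper, not part of Whisper, and it does not check that timestamps are in order.

```python
import re

# A timestamp looks like HH:MM:SS,mmm (comma before the milliseconds).
TIMESTAMP = r"\d{2,}:\d{2}:\d{2},\d{3}"
TIMING_LINE = re.compile(rf"^{TIMESTAMP} --> {TIMESTAMP}$")


def find_srt_problems(srt_text: str) -> list[str]:
    """Return human-readable layout problems; an empty list means none were found."""
    problems = []
    # Cues are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]
    for position, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if len(lines) < 3:
            problems.append(f"cue {position}: expected index, timing, and text lines")
            continue
        if lines[0].strip() != str(position):
            problems.append(f"cue {position}: index line is {lines[0]!r}")
        if not TIMING_LINE.match(lines[1].strip()):
            problems.append(f"cue {position}: bad timing line {lines[1]!r}")
    return problems


sample = "1\n00:00:00,000 --> 00:00:02,500\nHello\n\n2\n00:00:02,500 --> 00:00:05,000\nWorld\n"
print(find_srt_problems(sample))  # -> []
```

To check a generated file, read it with `Path("your_file.srt").read_text(encoding="utf-8")` and pass the text to the function.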

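Notes and Tips

Model choice: pick a model that suits the video length and the accuracy you need.
Copyright: be mindful of copyright issues around automatically generated subtitles.
Multilingual support: for multilingual content, choose a multilingual model rather than an `.en` variant.
Post-processing: always review and correct automatically generated subtitles by hand.
GPU: use a GPU whenever possible to speed up processing.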
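
When reviewing subtitles by hand, it helps to put a number on how far the automatic transcript is from your corrected one. This post does not name a metric, so the following is an assumption: word error rate (WER), a common measure for speech recognition, computed as the word-level edit distance divided by the number of words in the reference. `word_error_rate` is a hypothetical helper written for this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


corrected = "the quick brown fox"
automatic = "the quick brown box"
print(word_error_rate(corrected, automatic))  # -> 0.25
```

Splitting on whitespace works reasonably for English. For languages where word boundaries are less clear, a character-level variant may be more informative.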

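Conclusion

Automatic subtitle generation with OpenAI's Whisper model is a real help to content creators: it saves time and makes content more accessible. Even so, always check the accuracy of the generated subtitles and correct them by hand where needed. As Whisper continues to improve, subtitle generation should become even more accurate and efficient.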

# YouTube Video Transcription with OpenAI's Whisper

[![License](https://img.shields.io/github/license/kazuki-sf/youtube-whisper)](https://github.com/kazuki-sf/youtube-whisper)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kazuki-sf/youtube-whisper/blob/main/youtube_whisper.ipynb)

## How to Use the Notebook
Feel free to `Copy to Drive` the notebook or run it directly.
1. Enter the URL of the YouTube video or shorts you want to transcribe.
2. Choose the whisper model you want to use.
3. Run the code cell (Step 1-3) and wait for the transcription to complete.

## Notes
* `T4 GPU` or higher is recommended for running the notebook. You can change the runtime type by going to `Runtime` -> `Change runtime type` -> `Hardware accelerator` -> `GPU`.
* Whenever you change the YouTube URL or the Whisper model, rerun `Step 1` and then `Step 3` (you can skip `Step 2` if you have already run it).
* When you run `Step 3`, your browser might ask for permission to download multiple files.
* This project is not affiliated with OpenAI. The code provided here is for educational purposes only.
* The table below lists the Whisper models and their relative speeds. For more information, please visit the official GitHub page: https://github.com/openai/whisper#available-models-and-languages
---

|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |

# @title Step 1: Enter URL & Choose Whisper Model

# @markdown Enter the URL of the YouTube video
YouTube_URL = "" #@param {type:"string"}

# @markdown Choose the whisper model you want to use
whisper_model = "tiny" # @param ["tiny", "base", "small", "medium", "large", "large-v2", "large-v3"]

# @markdown Save the transcription as text (.txt) file?
text = True #@param {type:"boolean"}

# @markdown Save the transcription as an SRT (.srt) file?
srt = True #@param {type:"boolean"}
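
Before spending time on a download, it can help to check that the URL entered in Step 1 looks like a YouTube video or Shorts link. The sketch below uses only the standard library and recognizes three common URL shapes (`watch?v=`, `youtu.be/`, and `/shorts/`). `extract_video_id` is a hypothetical helper, and the list of shapes is an assumption that is not exhaustive.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str):
    """Return the video ID from a common YouTube URL shape, or None if unrecognized."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]

    candidate = ""
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Regular videos: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith("/shorts/"):
            # Shorts: https://www.youtube.com/shorts/<id>
            candidate = parsed.path.split("/")[2]
    return candidate or None


print(extract_video_id("https://www.youtube.com/watch?v=your_video_id"))  # -> your_video_id
print(extract_video_id("not a url"))  # -> None
```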

# Step 2: Install Dependencies (this may take about 2-3 min)

!pip install -q pytube
!pip install -q git+https://github.com/openai/whisper.git

import os, re
import torch
from pathlib import Path
from pytube import YouTube
import whisper
from whisper.utils import get_writer
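
Since the install in Step 2 takes a few minutes, you may want to check whether the packages are already present before running it again. The sketch below uses the standard library's `importlib.util.find_spec`. `missing_packages` is a hypothetical helper, and the default module names are the ones imported in Step 2.

```python
from importlib.util import find_spec


def missing_packages(names=("torch", "whisper", "pytube")):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Run the install commands in Step 2; missing:", ", ".join(missing))
else:
    print("All dependencies are importable; Step 2 can be skipped.")
```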

# Step 3: Transcribe the video/audio data

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model(whisper_model).to(device)

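# Util function to turn a video title into a filesystem-safe snake_case name
def to_snake_case(name):
    # Collapse each run of non-word characters (spaces, ":", "/", ".", ...) into one "_";
    # a "/" or "." left in the title would otherwise break the output path
    return re.sub(r"\W+", "_", name.lower()).strip("_") or "audio"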

# Download the audio data from YouTube video
def download_audio_from_youtube(url,  file_name = None, out_dir = "."):
    print(f"\n==> Downloading audio...")
    yt = YouTube(url)
    if file_name is None:
        file_name = Path(out_dir, to_snake_case(yt.title)).with_suffix(".mp4")
    yt = (yt.streams
            .filter(only_audio = True, file_extension = "mp4")
            .order_by("abr")
            .desc())
    return yt.first().download(filename = file_name)

# Transcribe the audio data with Whisper
def transcribe_audio(model, file, text, srt):
    print("\n=======================")
    print(f"\n🔗 YouTube URL: {YouTube_URL}")
    print(f"\n🤖 Whisper Model: {whisper_model}")
    print("\n=======================")

    file_path = Path(file)
    output_directory = file_path.parent

    # Run Whisper to transcribe audio
    print(f"\n==> Transcribing audio")
    result = model.transcribe(file, verbose = False)

    if text:
        print(f"\n==> Creating .txt file")
        txt_path = file_path.with_suffix(".txt")
        with open(txt_path, "w", encoding="utf-8") as txt:
            txt.write(result["text"])
    if srt:
        print(f"\n==> Creating .srt file")
        srt_writer = get_writer("srt", output_directory)
        srt_writer(result, str(file_path.stem))

    # Download the transcribed files locally
    from google.colab import files

    colab_files = Path("/content")
    stem = file_path.stem

    for colab_file in colab_files.glob(f"{stem}*"):
        if colab_file.suffix in [".txt", ".srt"]:
            files.download(str(colab_file))

    print("\n✨ All Done!")
    print("=======================")
    return result

# Download & Transcribe the audio data
audio = download_audio_from_youtube(YouTube_URL)
result = transcribe_audio(model, audio, text, srt)
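
The notebook relies on Whisper's `get_writer` to produce the `.srt` file. To make clearer what that file contains, here is a minimal sketch of the same idea: the `result` returned by `model.transcribe` includes a `"segments"` list whose items carry `"start"` and `"end"` times in seconds plus the `"text"`. `format_srt_time` and `segments_to_srt` are hypothetical helpers for illustration, not Whisper's own implementation.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from items that have 'start', 'end', and 'text' keys."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Made-up segments standing in for result["segments"].
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": " Today we look at Whisper."},
]
print(segments_to_srt(example_segments))
```

Writing the cues yourself is useful when you want to post-process segments, for example to merge very short cues, before saving the file.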
