Audio Video 2 Text
Extract Audio
ffmpeg -i input.mp4 -ar 16000 -ac 1 output.wav
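The `-ar 16000 -ac 1` flags resample the audio to 16 kHz and downmix it to mono. For `.wav` output, ffmpeg encodes 16-bit PCM (`pcm_s16le`) by default, which is the format pocketsphinx's default English model expects. The sketch below checks a file before you transcribe it. `wav_format` is a hypothetical helper, not part of ffmpeg or pocketsphinx. It reads the channel count and sample rate straight from the WAV header and assumes the canonical layout, where the `fmt ` chunk starts right after the 12-byte RIFF header.

```shell
#!/bin/bash
# Hypothetical helper: print "<sample_rate> <channels>" read from a WAV header.
# Assumes the canonical layout: "fmt " chunk at byte 12, so NumChannels sits at
# byte 22 (2 bytes) and SampleRate at byte 24 (4 bytes), both little-endian.
wav_format() {
    local f="$1"
    local -a b
    # Dump the 6 bytes starting at offset 22 as unsigned decimals
    read -r -a b <<< "$(od -An -t u1 -j 22 -N 6 "$f")"
    # Reassemble the little-endian integers byte by byte (works on any host endianness)
    local channels=$(( b[0] + b[1] * 256 ))
    local rate=$(( b[2] + b[3] * 256 + b[4] * 65536 + b[5] * 16777216 ))
    echo "$rate $channels"
}
```

After running the ffmpeg command above, `wav_format output.wav` should print `16000 1`. If `ffprobe` is installed (it ships with ffmpeg), `ffprobe output.wav` reports the same details and also handles headers that don't follow the canonical layout.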
Transcribe to text
pocketsphinx_continuous -infile output.wav >/tmp/out.txt
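Do a whole bunch (every .mp4 in the current folder)
#!/bin/bash
shopt -s nullglob   # skip the loop entirely if there are no .mp4 files
for file in *.mp4; do
    base="${file%.mp4}"   # strip the extension: talk.mp4 -> talk
    ffmpeg -i "$file" -ar 16000 -ac 1 "$base.wav"
    pocketsphinx_continuous -infile "$base.wav" > "$base.txt"
done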
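A long batch is easier to trust if you can preview it first. `dry_run` is a hypothetical function, not part of ffmpeg or pocketsphinx, that walks the same `*.mp4` glob and prints each command the loop would execute instead of running it. It shows how `${file%.mp4}` derives the `.wav` and `.txt` names and how filenames containing spaces come through.

```shell
#!/bin/bash
# Hypothetical dry run of the batch loop: print each command instead of running it.
# nullglob makes "*.mp4" expand to nothing when no files match, so the loop is
# skipped instead of running once on the literal string "*.mp4".
shopt -s nullglob

dry_run() {
    local file base
    for file in *.mp4; do
        base="${file%.mp4}"   # drop the trailing .mp4: "talk 1.mp4" -> "talk 1"
        # %q shell-quotes each name, so spaces show up escaped (talk\ 1.wav)
        printf 'ffmpeg -i %q -ar 16000 -ac 1 %q\n' "$file" "$base.wav"
        printf 'pocketsphinx_continuous -infile %q > %q\n' "$base.wav" "$base.txt"
    done
}
```

Run `dry_run` in the folder with the videos. Once the list looks right, run the real script.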
Datasets for LLM Training
- LLMDataHub: Awesome Datasets for LLM Training: https://github.com/Zjh-819/
- LLM Datasets: https://github.com/mlabonne/llm-datasets
- 9 Open-Sourced Datasets for Training Large Language Models (Kili Technology): https://kili-technology.com/large-language-models-llms/9-open-sourced-datasets-for-training-large-language-models