Clip Genius: ML Sports Highlight Generator
Clip Genius is a machine learning tool that processes sports videos to automatically detect score changes and generate highlight clips. Developed during a datathon hosted by Google Waterloo, the project combines FAISS-based similarity search, computer vision (OpenCV), and multi-threaded video processing to streamline sports content creation. View it on GitHub here.
Final Output:
FAISS & Transcription Process:
How it works:
1. Extract Audio
- Extracts the audio from an MP4 video file and saves it as a WAV file.
- Uses FFmpeg to convert the audio to 16kHz mono PCM WAV, ideal for speech recognition.
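The extraction step can be sketched as a thin wrapper around the FFmpeg command line. This is a minimal sketch, not the project's actual code: the function names and file paths are hypothetical, and only the target format (16 kHz, mono, PCM WAV) comes from the description above.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path, wav_path):
    """Build an FFmpeg command that writes 16 kHz mono 16-bit PCM WAV audio."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input MP4
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run FFmpeg (assumed to be on PATH) and return the WAV file's path."""
    subprocess.run(build_extract_command(video_path, wav_path), check=True)
    return Path(wav_path)
```

Keeping the command builder separate from the `subprocess.run` call makes the arguments easy to inspect without FFmpeg installed.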
2. Split Audio
- Breaks the extracted WAV file into 30-second chunks, with an additional 5-second buffer.
- Saves each chunk separately in the clips folder for easier processing.
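One way to compute the chunk boundaries is shown below. The description does not say how the 5-second buffer is applied, so this sketch assumes it extends each chunk past its nominal end, which makes consecutive chunks overlap; the function name and defaults are illustrative.

```python
def chunk_windows(duration_s, chunk_s=30.0, buffer_s=5.0):
    """Return (start, end) times in seconds for each audio chunk.

    Assumption: the buffer extends each chunk past its nominal end, so
    consecutive chunks overlap by up to buffer_s seconds.
    """
    windows = []
    start = 0.0
    while start < duration_s:
        # Clamp the end so the last chunk never runs past the audio.
        end = min(start + chunk_s + buffer_s, duration_s)
        windows.append((start, end))
        start += chunk_s
    return windows
```

For a 70-second file this yields three windows: 0 to 35, 30 to 65, and 60 to 70 seconds. The overlap reduces the chance that a sentence is cut in half at a chunk boundary.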
3. Process File
- Transcribes each audio chunk using NeMo ASR.
- Converts the transcript into a vector representation using MiniLM.
- Ranks clips by similarity to key moments using FAISS.
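The ranking idea can be illustrated without any external libraries. In this sketch, short toy vectors stand in for MiniLM embeddings and a brute-force loop stands in for the FAISS index (an inner-product search over normalized vectors produces the same ordering). Scoring each clip by its best match against a set of "key moment" vectors is an assumption, as are all names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_clips(clip_vectors, key_vectors):
    """Return (clip_name, score) pairs, best match first.

    A clip's score is its highest similarity to any key-moment vector.
    """
    scored = [
        (name, max(cosine(vec, key) for key in key_vectors))
        for name, vec in clip_vectors.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A real index matters once there are many clips or many reference phrases; for a handful of chunks the brute-force version behaves the same way.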
4. Transcribe & Filter
- Runs the process in parallel (multi-threaded) for efficiency.
- Sorts clips by highest relevance to game highlights.
- Keeps only the top 40% of relevant clips.
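The top-40% filter might look like the sketch below. Rounding up, always keeping at least one clip, and restoring playback order afterwards are assumptions made for illustration; the description only states the 40% cutoff.

```python
import math


def keep_top_fraction(scored_clips, fraction=0.4):
    """Keep the highest-scoring fraction of clips, returned in playback order.

    scored_clips: list of (clip_index, score) pairs.
    """
    if not scored_clips:
        return []
    keep = max(1, math.ceil(len(scored_clips) * fraction))
    # Pick the best-scoring clips first...
    best = sorted(scored_clips, key=lambda clip: clip[1], reverse=True)[:keep]
    # ...then put them back in chronological order for merging.
    return sorted(best, key=lambda clip: clip[0])
```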
5. Merge Clips
- Combines all highlight clips into one final merged video.
- Uses FFmpeg to concatenate clips in the correct order.
- Deletes temporary clips after merging to save space.
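The merge step could be implemented with FFmpeg's concat demuxer, which reads a text file listing the clips in order. This is a sketch under that assumption; the helper names and the list-file name are hypothetical, and paths containing single quotes would need escaping.

```python
import subprocess
from pathlib import Path


def concat_list_text(clip_paths):
    """Return the list-file contents that FFmpeg's concat demuxer reads."""
    return "".join(f"file '{Path(p).as_posix()}'\n" for p in clip_paths)


def merge_clips(clip_paths, output_path, list_path="concat_list.txt"):
    """Concatenate clips in the given order, then delete the temporary files."""
    Path(list_path).write_text(concat_list_text(clip_paths), encoding="utf-8")
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", list_path, "-c", "copy", output_path],
        check=True,
    )
    # Clean up only after FFmpeg has finished successfully.
    for path in [*clip_paths, list_path]:
        Path(path).unlink()
```

Relative paths in the list file are resolved relative to the list file itself, so absolute paths are the safer choice when the clips live in a different folder. Using `-c copy` avoids re-encoding, which only works when all clips share the same codec settings.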
OpenCV Scoreboard Detection:
How it works:
The video processing pipeline uses OpenCV, PyTesseract, and FFmpeg to efficiently analyze frames. It starts by opening the video with cv.VideoCapture, resizing frames to 512x512, and detecting the scoreboard using edge detection and Hough Line Transform. Once located, the scoreboard region is extracted, and OCR processes it to recognize scores with a confidence threshold of 75. The detected scores are converted to absolute coordinates, overlaid onto the video, and a timestamp is added.
To reduce computational load, only the necessary pixels are processed. OCR extracts numeric scores by cropping and preprocessing the scoreboard area: converting it to grayscale, resizing, and denoising. It then filters out non-numeric text and returns the score, or zero if no digits are found. This automated score tracking is what drives highlight generation.
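The digit-filtering step can be sketched independently of the OCR engine. The function below assumes the OCR output arrives as two parallel lists, recognized tokens and their confidences, similar in shape to the `text` and `conf` fields that pytesseract's `image_to_data` returns in dictionary mode. Taking the first confident token that contains digits is an illustrative choice, not necessarily what the project does.

```python
import re


def score_from_ocr(words, confidences, min_conf=75.0):
    """Return the first confident run of digits as an int, or 0 if none."""
    for word, conf in zip(words, confidences):
        if float(conf) < min_conf:
            continue  # skip low-confidence OCR tokens
        digits = re.sub(r"\D", "", word)  # strip every non-digit character
        if digits:
            return int(digits)
    return 0
```

Confidences are passed through `float()` because OCR tools may report them as strings or numbers depending on the version.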
- fetch_score_coords(filepath): Fetches the coordinates of the scoreboard.
- split_video(filepath, SEGMENT_SIZE, tempfolder, "segments_%03d.mp4"): Splits the video into smaller segments for parallel processing.
- analyze_segments_with_threads(tempfolder, cords): Analyzes the video segments concurrently using multiple threads.
- sorted(results): Sorts the results from all threads.
- process_results(filepath, results): Processes the results and returns control to the script.
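The concurrent analysis and the sort that follows it can be sketched with Python's standard thread pool. This is not the project's implementation: `analyze_concurrently` and the `analyze_one` callback are hypothetical names, and the assumed result shape is a list of (timestamp, score) tuples per segment.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyze_concurrently(segment_paths, analyze_one, max_workers=4):
    """Run analyze_one on every segment in a thread pool; return sorted results.

    analyze_one(index, path) is a hypothetical callback that returns a list of
    (timestamp_seconds, score) tuples for one segment.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(analyze_one, index, path)
            for index, path in enumerate(segment_paths)
        ]
        for future in as_completed(futures):  # completion order is arbitrary
            results.extend(future.result())
    return sorted(results)  # restore chronological order by timestamp
```

Because segments finish in whatever order the threads complete, the final sort is what puts the detections back on a single timeline before the results are processed.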
User Interface
The frontend of Clip Genius is built using PyQt5, providing a sleek and interactive GUI for users to generate AI-powered sports highlights. It simplifies the complex backend processing into an intuitive interface where users can:

- Upload a sports video for analysis.
- Select highlight duration (e.g., 1, 5, or 10 minutes).
- Choose transcript options (e.g., English, French, or Spanish).
- Generate highlights with a single click and receive the result as an MP4 file.
Technologies Used
Our ML video processing system utilizes the following key technologies:
- FAISS - Fast similarity search for detecting key game moments.
- OpenCV - Real-time scoreboard and object detection.
- pytesseract - OCR for extracting numeric scores from images.
- FFmpeg - Video processing and clip generation.
- Python Multiprocessing - Efficiently processes video segments in parallel.
- PyQt5 - GUI framework for building an interactive and user-friendly application.
Acknowledgments
We extend our gratitude to Laurier Analytics and Google Waterloo for organizing the datathon, providing mentorship, and fostering innovation in AI and data science.
Also shout out to the GOAT Shavam Garg 🐐
Contributors: Robert Pevec, JD, Swaab Anas, Suhana Khullar
Source Code: GitHub Repository