
Seoul National University of Science and Technology researchers propose PV2DOC: A tool to summarize presentation videos into structured documents. PV2DOC organizes both the audio and visual information from presentation videos into structured PDF documents, making the content easier to understand and access. Credit: Associate Professor Hyuk-Yoon Kwon from Seoul National University of Science and Technology

You have likely encountered presentation-style videos that combine slides, figures, tables, and spoken explanations. These videos have become a widely used medium for delivering information, particularly after the COVID-19 pandemic, when stay-at-home measures were implemented.

While videos are an engaging way to access content, a major drawback is that they are time-consuming, since one must watch the entire video to find specific information. They also take up considerable storage space because of their large file size.

Researchers led by Professor Hyuk-Yoon Kwon at Seoul National University of Science and Technology in South Korea set out to address these issues with PV2DOC, a software tool that converts presentation videos into summarized documents. Unlike other video summarizers, which require a transcript alongside the video and become ineffective when only the video is available, PV2DOC overcomes this limitation by combining both visual and audio data to convert the video into a document.

Their research was made available online on October 11, 2024, and was published in the journal SoftwareX on December 1, 2024.

"For users who need to watch and study numerous videos, such as lectures or conference presentations, PV2DOC generates summarized reports that can be read within two minutes. Additionally, PV2DOC manages figures and tables separately, linking them to the summarized content so users can refer to them when needed," explains Prof. Kwon.

For image processing, PV2DOC extracts frames from the video at one-second intervals. It uses a method called the structural similarity index (SSIM), which compares each frame with the previous one to identify unique frames. Objects in each frame, such as figures, tables, graphs, and equations, are then detected by two object detection models, Mask R-CNN and YOLOv5.
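As a rough illustration of the frame-selection step, the sketch below samples one frame per second and keeps a frame only when it differs enough from the one before it. It is a simplified, stdlib-only assumption and not PV2DOC's code: frames are small grayscale pixel grids, SSIM is computed over the whole image in a single window (standard implementations slide a local window across the image), and the helper names and the 0.9 threshold are hypothetical.

```python
from typing import List, Sequence

Frame = List[List[float]]  # grayscale pixel rows, values in [0, 255]


def sample_indices(total_frames: int, fps: float) -> List[int]:
    """Hypothetical helper: index of one frame per second of video."""
    step = max(1, round(fps))
    return list(range(0, total_frames, step))


def global_ssim(x: Frame, y: Frame, data_range: float = 255.0) -> float:
    """Simplified SSIM computed over the whole image as one window."""
    a = [p for row in x for p in row]
    b = [p for row in y for p in row]
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((p - mu_b) ** 2 for p in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def unique_frames(frames: Sequence[Frame], threshold: float = 0.9) -> List[int]:
    """Keep frame 0, plus every frame whose SSIM with its predecessor is below threshold."""
    if not frames:
        return []
    keep = [0]
    for i in range(1, len(frames)):
        if global_ssim(frames[i - 1], frames[i]) < threshold:
            keep.append(i)
    return keep
```

For example, with two repeated "slides" `A, A, B, B`, only the first appearance of each is kept (indices 0 and 2), since identical consecutive frames score an SSIM of 1.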

During this process, some images may become fragmented because of whitespace or sub-figures. To resolve this, PV2DOC uses a figure merge technique that identifies overlapping regions and combines them into a single figure. Next, the system applies optical character recognition (OCR) using Google's Tesseract engine to extract text from the images. The extracted text is then organized into a structured format, such as headings and paragraphs.
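The figure-merge idea can be pictured as merging bounding boxes. In this hypothetical sketch, which is not taken from the PV2DOC source, each detected region is an axis-aligned rectangle `(x1, y1, x2, y2)`, and any two rectangles that overlap are replaced by the smallest rectangle covering both, repeating until no overlaps remain.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share some area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest rectangle that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_boxes(boxes: List[Box]) -> List[Box]:
    """Merge overlapping boxes pairwise until no two boxes overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since a grown box may now overlap others
    return merged
```

Restarting the scan after every merge is simple rather than fast, but it handles chains: fragments A and C that do not overlap are still joined if a middle fragment B overlaps both.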

In parallel, PV2DOC extracts the audio from the video and uses the Whisper model, an open-source speech-to-text (STT) tool, to convert it into written text. The transcribed text is then summarized using the TextRank algorithm, producing a summary of the main points.
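For readers unfamiliar with TextRank, the sketch below shows the extractive idea: sentences become graph nodes, edges are weighted by word overlap, and a PageRank-style iteration scores each sentence. It assumes the transcript is already plain text (for example, the `"text"` field of the dictionary returned by `transcribe` in the open-source `openai-whisper` package). The sentence splitter, the damping factor of 0.85, the iteration count, and `k` are illustrative assumptions, not PV2DOC's settings.

```python
import math
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a: str, b: str) -> float:
    """Shared-word count normalised by log sentence lengths (TextRank's measure)."""
    wa = re.findall(r"\w+", a.lower())
    wb = re.findall(r"\w+", b.lower())
    if not wa or not wb:
        return 0.0
    common = len(set(wa) & set(wb))
    denom = math.log(len(wa)) + math.log(len(wb))
    return common / denom if denom > 0 else 0.0


def textrank_summary(text: str, k: int = 2, d: float = 0.85, iters: int = 50) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    # Weighted, undirected sentence graph without self-loops.
    w = [[0.0 if i == j else similarity(sentences[i], sentences[j]) for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        # PageRank-style update: neighbour j passes on a share of its score
        # proportional to the edge weight w[j][i].
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Because TextRank is extractive, the summary reuses transcript sentences verbatim rather than generating new wording; a sentence that shares no words with the rest of the talk receives only the baseline score `1 - d`.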

The extracted images and text are combined into a Markdown document, which can then be converted into a PDF file. The final document presents the video's content, including text, figures, and formulas, in a clear and organized way that follows the structure of the original video.
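To make the assembly step concrete, here is a minimal sketch of turning per-slide results into Markdown. The `Section` structure, its field names, and the layout (one `##` heading per slide, followed by the summary and any figure images) are hypothetical choices for illustration; the article does not describe PV2DOC's actual document template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """Hypothetical slide-level unit of the output document."""
    heading: str
    summary: str
    figures: List[str] = field(default_factory=list)  # image file paths


def build_markdown(title: str, sections: List[Section]) -> str:
    """Render a title, then one heading, summary, and figure list per section."""
    lines = [f"# {title}", ""]
    for sec in sections:
        lines += [f"## {sec.heading}", "", sec.summary, ""]
        for i, path in enumerate(sec.figures, start=1):
            # Standard Markdown image syntax: ![alt text](path)
            lines += [f"![{sec.heading}, figure {i}]({path})", ""]
    return "\n".join(lines)
```

The resulting string could be written to a `.md` file and converted with a general-purpose tool such as Pandoc (`pandoc doc.md -o doc.pdf`, which needs a LaTeX engine installed for PDF output); which converter PV2DOC itself uses is not stated in the article.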

By converting unorganized video data into structured, searchable documents, PV2DOC makes the video's content more accessible and reduces the storage space needed to share and store it.

"This software simplifies data storage and facilitates data analysis for presentation videos by transforming unstructured data into a structured format, thus offering significant potential from the perspectives of information accessibility and data management. It provides a foundation for more efficient use of presentation videos," says Prof. Kwon.

The researchers plan to further streamline video content into accessible formats. Their next goal is to train a large language model (LLM), similar to ChatGPT, to provide a question-answering service in which users can ask questions about the content of the videos and the model generates accurate, contextually relevant answers.

More information:
Won-Ryeol Jeong et al, PV2DOC: Converting the presentation video into the summarized document, SoftwareX (2024). DOI: 10.1016/j.softx.2024.101922

Provided by
Seoul National University of Science & Technology

Citation:
PV2DOC: New tool summarizes presentation videos into searchable, structured PDF documents (2024, December 30)
retrieved 30 December 2024
from https://techxplore.com/news/2024-12-pv2doc-tool-videos-searchable-pdf.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.
