Rich metadata is the key to content discovery and monetization. It powers advanced video search and recommendation engines that lie at the heart of buying and viewing decisions. By creating and managing both asset-level and time-based metadata at the production stage, content owners can add greater value to the life of their video content. Wazee Digital’s automated metadata technology, powered by Wazee Digital Core, extracts a wide range of asset-level and time-based metadata detail from the content itself during ingest — before storing, managing, and delivering it for multiple downstream uses. Core uses a series of automated workflows that enable phonetic keyword scanning, transcript synchronization, and facial recognition services. Through these processes (and a wide range of other metadata management services available via Wazee Digital’s Core platform), users can more effectively access, utilize, and monetize their content. This article explains automated metadata capture, describes the technology behind Wazee Digital’s advanced automated metadata capture services, and offers examples of how content owners have used them.
David Candler has worked in the media and entertainment industry for more than 21 years. He has an impressive track record of successfully implementing and developing both commercial and operational relationships with major blue-chip customers across the broadcast media sector. David’s contributions have played an important role in the growth and success of many major media companies, including Orbit Showtime Network, Todd AO, BBC, Pearson Television, RTL, Ascent Media, Technicolor, Prime Focus Technologies, Deluxe Media Europe, and Wazee Digital.
Robin Melhuish has nearly 30 years of management experience in the media sector in production, postproduction, and digital image processing. Robin specializes in leading change for media companies that are taking advantage of the transition to digital to create disruptive new business models. He has worked with and contributed to the success of many major media companies, including major and minor Hollywood studios, major broadcasters, and groundbreaking startups such as Lowry Digital and Digital Vision.
[Title] Automated Advanced Metadata Extraction
[Subtitle] Advanced Metadata Unlocks the Value of a Video Library
by David Candler, Senior Director, Customer Solutions, Wazee Digital and Robin Melhuish, Director, Customer Solutions, Wazee Digital
Many media organizations have archives full of valuable content that could potentially be repurposed internally or sold to third parties for reuse in a variety of ways. For example, sports leagues and movie studios own the rights to innumerable video clips that contain memorable moments — moments that broadcasters, advertising agencies, production companies, and many others are willing to pay to insert into their own productions.
Wazee Digital has millions of assets under management and handles content licensing for many rights holders. These customers know that repurposing and monetization can only happen if people can find their content in the first place, and the key to doing that is using metadata. Not just the technical metadata that comes from the camera, but rich, descriptive metadata that tells people what’s happening inside the video. The names of the actors and what they’re doing and saying in the scene. A football player’s name and number, the type of play, and the weather conditions during the game. The contents and circumstances of a speech made by a cultural icon. This kind of information gets captured in bits and pieces throughout the production process — in a script, a prop report, a transcript, even the label on a film can — and some of it might make its way into metadata fields within a database, but the key is to make all of that metadata available at once in an asset management system so that it can be used to find and purchase clips or whole assets.
Wazee Digital’s metadata services are built on Core, the company’s cloud-native asset management platform designed for monetizing content. When a content owner begins using Core for the first time, the system aggregates whatever metadata already exists across many disparate sources, but often that data is incomplete. Knowing that one of the richest sources of metadata is inside the content itself, Wazee Digital created an automated metadata augmentation technology that extracts every relevant metadata detail from ingested content before it enters the Core library, making that content much easier to discover and, ultimately, sell for downstream usage.
Core uses a combination of Wazee Digital’s advanced metadata-capture capabilities and third-party technologies to create time-based metadata, which means that the metadata applies not to an entire asset, but to a particular timecode within an asset. Time-based metadata is critical to locating any single moment within a particular asset, such as a piece of dialog within a movie or a certain serve within a tennis match.
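The distinction between asset-level and time-based metadata can be sketched as a simple data structure. This is an illustrative model only — the class and field names below are assumptions for the sake of the example, not Core’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class TimedAnnotation:
    """One piece of time-based metadata, anchored to a timecode range."""
    start: float   # seconds from the start of the asset
    end: float
    track: str     # e.g. "dialog", "faces", "keywords"
    value: str     # the descriptive payload

@dataclass
class Asset:
    asset_id: str
    title: str
    # Asset-level metadata applies to the whole file ...
    tags: dict = field(default_factory=dict)
    # ... while time-based metadata pins each detail to a moment.
    timeline: list = field(default_factory=list)

    def find(self, query: str) -> list:
        """Return every timeline annotation whose value mentions the query."""
        q = query.lower()
        return [a for a in self.timeline if q in a.value.lower()]

# A tennis match with two annotated moments:
match = Asset("m-001", "Semifinal, Court 1")
match.timeline.append(TimedAnnotation(754.2, 760.0, "keywords", "ace serve"))
match.timeline.append(TimedAnnotation(1203.5, 1211.0, "keywords", "double fault"))

hits = match.find("serve")
# Each hit carries its own timecode, so a player can jump straight to the moment.
```

Because every annotation carries its own timecode range, a search result points at a playable moment rather than at a whole file.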
There are three different automated metadata services within Core that users can apply against their assets: phonetic keyword scanning, transcript synchronization, and facial recognition. All of these services are transparent to the user, but there’s a lot going on behind the scenes.
Phonetic Keyword Scanning
For the phonetic keyword scanning service, Core integrates advanced audio-scanning technology from Nexidia (now owned by Avid), an industry-renowned expert in the business of parsing recorded audio into phonetic sounds that make it easily searchable. Core uses Nexidia software to “listen” to the audio and identify phonetic keywords during the ingest workflow.
It works like this: Core ingests a video asset via Amazon S3 and uses S3 policies to push it down predefined workflows. Core creates renditions of the master video asset, including a low-resolution browse proxy for previewing. The browse proxy automatically runs through a Nexidia software workflow, which scans for phonetic keywords. Core automatically updates the asset with this new metadata and presents it as a searchable timeline. Core users can preview the asset and jump to all new keywords marked along the asset’s timeline. From there they can download all video assets and metadata for downstream utilization.
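The ingest steps above can be sketched as a simple pipeline. The function names and the canned keyword hits below are hypothetical stand-ins — the real transcode and Nexidia passes are external services, not Python functions:

```python
# Hypothetical sketch of the ingest workflow described above; function
# names and the canned scan results are illustrative, not Core's API.

def make_browse_proxy(master_path: str) -> str:
    """Stand-in for transcoding the master into a low-res browse proxy."""
    return master_path.replace(".mxf", "_proxy.mp4")

def phonetic_scan(proxy_path: str) -> list:
    """Stand-in for the phonetic pass: returns (timecode, keyword) hits."""
    return [(12.4, "touchdown"), (98.7, "interception")]  # canned demo data

def ingest(master_path: str) -> dict:
    asset = {"master": master_path, "timeline": []}
    proxy = make_browse_proxy(master_path)           # step 1: make the rendition
    for timecode, keyword in phonetic_scan(proxy):   # step 2: scan the audio
        asset["timeline"].append({"t": timecode, "keyword": keyword})
    return asset                                     # step 3: searchable timeline

asset = ingest("game_0417.mxf")
```

In the real workflow, the trigger is an S3 event policy rather than a direct function call, and the keyword hits come back from the audio-scanning service asynchronously.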
Transcript Synchronization
The transcript synchronization service is another automated metadata component that uses Nexidia technology, this time to accurately align a transcript to the timecodes within an asset. In this case, Core imports and parses an .SRT transcript file in one of a few ways depending on what suits the content owner’s workflow. Once the .SRT file is in place, Core’s automatic workflows take over. Nexidia software interprets the audio, syncs it with the transcript, and saves the updated timeline data back into Core. The result is a script of the dialog that is perfectly aligned with the video, and users can then search against that script in Core. Transcript synchronization provides a powerful way of searching within large assets, such as movies. Users can simply type a piece of memorable dialog into the search field, and Core will jump straight to that point in the asset.
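The .SRT format itself is simple: numbered cues, each with a timecode range and one or more lines of dialog. A minimal parser and dialog search can be sketched as follows (the sample transcript and function names are illustrative; the real alignment pass refines these timecodes against the audio itself):

```python
import re

# A tiny sample transcript in standard SubRip (.SRT) form.
SRT_SAMPLE = """\
1
00:00:01,000 --> 00:00:04,000
We choose to go to the moon.

2
00:00:05,500 --> 00:00:08,250
Not because it is easy.
"""

def parse_timecode(tc: str) -> float:
    """Convert 'HH:MM:SS,mmm' into seconds."""
    h, m, rest = tc.split(":")
    s, ms = rest.split(",")
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(text: str) -> list:
    """Split the file into cues: {'start', 'end', 'text'} per cue."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        start, end = [parse_timecode(t.strip()) for t in lines[1].split("-->")]
        cues.append({"start": start, "end": end, "text": " ".join(lines[2:])})
    return cues

def jump_to(cues: list, phrase: str):
    """Return the start time of the first cue containing the phrase."""
    for cue in cues:
        if phrase.lower() in cue["text"].lower():
            return cue["start"]
    return None

cues = parse_srt(SRT_SAMPLE)
t = jump_to(cues, "easy")  # the second cue, starting at 5.5 seconds
```

This is exactly the user-facing behavior the article describes: type a piece of memorable dialog, get back a timecode to jump to.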
Facial Recognition
For the facial recognition service, Core uses the first generation of Google facial recognition software (formerly Pittsburgh Pattern Recognition) to identify faces during the ingest workflow. Once an asset is ingested and the browse proxy made, Core automatically runs the browse proxy through the Google software workflow to find all similar faces within the asset. Core then presents the new facial recognition metadata as a zip file for users to download to a local computer. The user then unzips the download and loads the relevant files into Wazee Digital’s desktop Face Sorter application. The Face Sorter app maps all required names to faces along the timeline and lets users quickly confirm the results or make changes as they see fit. Once complete, they simply upload the data from the app back into Core to be associated with the asset’s timeline.
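The human-in-the-loop step here — assigning names to anonymous face clusters and flattening the result into a timeline — can be sketched as follows. The Face Sorter file formats are not public, so all of the structures below are assumptions for illustration:

```python
# Illustrative sketch of the name-to-face mapping step; the real Face
# Sorter app and its data formats are not public, so these structures
# are assumptions.

# What a facial-recognition pass might emit: anonymous face clusters,
# each with the timecode ranges where that face appears.
detected_clusters = {
    "face_0": [(10.0, 14.5), (92.0, 101.2)],
    "face_1": [(33.0, 40.0)],
}

# The human-in-the-loop step: an operator assigns a name to each cluster.
confirmed_names = {"face_0": "Charlie Rose", "face_1": "Guest A"}

def to_timeline(clusters: dict, names: dict) -> list:
    """Flatten confirmed clusters into time-based metadata entries."""
    timeline = []
    for cluster_id, segments in clusters.items():
        name = names.get(cluster_id)
        if name is None:
            continue  # unconfirmed faces stay out of the timeline
        for start, end in segments:
            timeline.append({"start": start, "end": end, "person": name})
    return sorted(timeline, key=lambda e: e["start"])

timeline = to_timeline(detected_clusters, confirmed_names)
# The sorted timeline is what gets uploaded back and attached to the asset.
```

The key design point is that the software only clusters similar faces; a person supplies the identities once per cluster, and the confirmed names then propagate to every timecode where that face appears.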
The Big Benefit
Even if it were practical to have a person — or even a group of people — watch and log an entire library full of assets, automated metadata technology is much faster, more accurate, more scalable, and more consistent than humans could ever be. The rich metadata it generates greatly enhances discoverability and monetization for content producers or rights holders. Users can now find valuable moments within a content library simply by typing keywords or phrases into the search field, whether or not they know precisely what they’re looking for when they begin their search.
Looking to the future, Wazee Digital is evaluating many of the leading solutions available on the market today, with the intention of continually enhancing its automated metadata services. The company is particularly interested in applying artificial intelligence and machine learning to visual search.
Automated Metadata Services in Action
Wazee Digital’s Core users can apply any or all automated metadata services to their video assets, unlocking metadata that makes it easier to find specific moments within the content. Here are just a couple of examples of rights holders who have put Core into action.
“Charlie Rose”
Wazee Digital worked with show staffers and others to launch a new viewer-facing content portal for the “Charlie Rose” television series, an interview show distributed throughout the United States. The portal is powered by Wazee Digital Core and enables the show’s staff to curate 25 years’ worth of interviews that journalist and anchor Charlie Rose has conducted with some of the world’s top entertainers, athletes, politicians, business leaders, and thought leaders. Public users can browse the entire archive — including full episodes and individual segments — and search for specific guests, dates, and topics at www.charlierose.com.
To make it happen, Wazee Digital and the Charlie Rose team used two Core metadata services: phonetic keyword scanning and transcript synchronization. Once the phonetic keyword scanning service indexed the assets and generated keywords, the transcript synchronization service took over. It aligned closed-caption transcripts with associated video for every interview in the library and then stored the results as timeline-based metadata inside Core. Consequently, portal users can easily find and play back specific interviews inside the portal. The transcript plays alongside the video and matches the dialog word for word. Should he choose to do so, Charlie Rose could license the assets through Core or make them available to download for a fee.
Major Hollywood Film and Television Studio
One of Hollywood’s biggest film studios uses all three of Core’s automated metadata-processing services to catalog its film and TV inventory with granular specificity. That automatically generated metadata augments source metadata and manual inputs at the frame level. From there the studio can produce countless timelines of metadata and parse them in different ways to drive internal processes and external services such as Crackle and iTunes.
All told, a fully indexed feature film typically contains about 12,000 discrete timeline segments across all timelines (descriptive, facial, dialog, music, legal, and scene selects) and more than 1,500 descriptive keyword search terms. The studio estimates it has captured over 5 million time-based moments to date, thanks in large part to Core. Having such detailed, multilevel, frame-accurate metadata has enabled the studio to increase the number of monetizable moments in its library, in turn making its library all the more valuable.
Metadata is an engineering challenge for the media industry. Most content owners know that they need detailed, granular-level metadata about their video assets in order to sell or repurpose those assets effectively, but automatically generating that metadata — and then making all of it available in one platform for people to search against — is a tricky proposition. Great engineering minds are working on this challenge, but it is a difficult problem to solve, and there is no perfect solution yet.
Fortunately there are automated tools, such as the metadata services inside Wazee Digital Core, that make the process easier. Wazee Digital has made significant strides in the field of advanced automated metadata capture and partners with third-party technology companies that are recognized experts in audio scanning and facial recognition. Together they create metadata that ultimately makes an organization’s content more valuable.
By adopting metadata-driven solutions that use advanced algorithms to extract, normalize, and manage metadata within a sophisticated discovery platform, content owners can get more value from their video than ever before.