comScore Sets New Standards for Streaming Media Measurement
In 2003, comScore began researching a way to collect data on the millions of consumers who stream media content on the Web. It partnered with RealNetworks to develop technology that captures key pieces of information about every media request made by one of its panelists. It gathers data on requests sent via streaming servers, as well as on media delivered via progressive download and pre-caching methods.
Unlike other services, comScore’s measurement looks at the media request itself--not at a server log or image beacon (a 1x1 pixel embedded in a Web page, image, or video file that is used for tracking purposes). As its panelists surf the Web, comScore dissects each request to glean important information about the URL of the file and the protocol used to transmit it. It also gathers data on the file’s MIME type, codec, and format, as well as the software used to play it. When metadata is available, it captures that, too.
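To make the idea of dissecting a request concrete, here is a minimal Python sketch that pulls the protocol, host, and a guessed format out of a media URL. It is an illustration only, not comScore's technology: the `dissect_request` function, the `FORMAT_BY_EXTENSION` table, and the example URL are all assumptions made for this example, and a real system would also inspect headers, codecs, and player information that a URL alone cannot reveal.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical extension-to-format table, invented for this sketch.
FORMAT_BY_EXTENSION = {
    ".rm": "RealMedia",
    ".ram": "RealMedia",
    ".wmv": "Windows Media",
    ".asf": "Windows Media",
    ".mov": "QuickTime",
    ".mp3": "MP3",
}


def dissect_request(url):
    """Split a media request URL into the fields discussed above."""
    parts = urlsplit(url)
    extension = PurePosixPath(parts.path).suffix.lower()
    return {
        "protocol": parts.scheme.lower(),
        "host": parts.hostname,
        "path": parts.path,
        # Guessing the format from the file extension is an assumption;
        # it stands in for the richer inspection a real meter would do.
        "format": FORMAT_BY_EXTENSION.get(extension, "unknown"),
    }


# Made-up example request over RTSP.
record = dissect_request("rtsp://media.example.com/news/clip.rm")
print(record)
```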
This information is then combined with demographic data provided by each panelist during the signup process. A proprietary process matches observed traffic to each individual within the household. This lets comScore tell, for instance, whether it’s a 15-year-old teen who is streaming or his 45-year-old dad.
Each month, statisticians examine the composition of the comScore panel and assign each panelist a weight that reflects the estimated size of that panelist’s demographic group. The millions of records are then projected to provide a scientifically reliable estimate of the digital media activity taking place throughout the country, according to the company.
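The weighting step can be pictured with a simple calculation. The Python sketch below assumes the most basic scheme, in which each panelist's weight is the estimated population of his or her demographic group divided by the number of panelists in that group; comScore's actual statistical method is not described here and may differ. The function names, group labels, and all figures are invented for illustration.

```python
from collections import Counter


def panel_weights(panelists, population_by_group):
    """Weight = estimated group population / panelists in that group."""
    counts = Counter(p["group"] for p in panelists)
    return {
        p["id"]: population_by_group[p["group"]] / counts[p["group"]]
        for p in panelists
    }


def projected_streamers(panelists, weights):
    """Sum the weights of panelists who streamed during the period."""
    return sum(weights[p["id"]] for p in panelists if p["streamed"])


# Illustrative numbers only: 2 teen and 4 adult panelists.
panel = [
    {"id": 1, "group": "teen", "streamed": True},
    {"id": 2, "group": "teen", "streamed": False},
    {"id": 3, "group": "adult", "streamed": True},
    {"id": 4, "group": "adult", "streamed": True},
    {"id": 5, "group": "adult", "streamed": False},
    {"id": 6, "group": "adult", "streamed": False},
]
population = {"teen": 1000, "adult": 6000}

weights = panel_weights(panel, population)
estimate = projected_streamers(panel, weights)
print(estimate)  # 1 teen x 500 + 2 adults x 1500 = 3500.0
```

In this toy panel each teen stands in for 500 people and each adult for 1,500, so three streaming panelists project to 3,500 streamers.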
Many different views
The resulting data are used by a variety of clients who are interested in specific views of the industry. The comScore Streaming Key Measures report delivers Internet-wide metrics on streams served by protocol, format, and player. It ranks the top sites on the Internet by number of unique streaming visitors. It further breaks these data out by home, university, and work segments.
comScore worked closely with Arbitron to develop the comScore/Arbitron Online Radio Ratings Report. This report monitors popular radio services on Yahoo, AOL, MSN, and Live365, and the company plans to add more providers soon. The comScore data are merged with Arbitron’s established terrestrial radio formulas for cumulative audience and average quarter-hour audience by demographic breakout and by daypart, allowing advertisers to compare the online world to the traditional radio ratings they’re used to.
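For readers unfamiliar with the radio terms, the following Python sketch shows one simplified way to compute them from listening sessions. It assumes the conventional definitions: a listener is credited to a quarter hour after at least five minutes of listening within it, average quarter-hour (AQH) is the mean credited audience per quarter hour in a daypart, and cume is the number of distinct listeners credited at least once. The code works on raw, unweighted counts, accumulates a listener's minutes across sessions within a quarter hour (an assumption), and is not the comScore/Arbitron implementation. All names and sample sessions are hypothetical.

```python
from collections import defaultdict

QUARTER_HOUR = 15  # minutes
MIN_MINUTES = 5    # credit threshold assumed for this sketch


def quarter_hour_audiences(sessions, daypart_start, daypart_end):
    """Map each quarter-hour start minute to the listeners credited in it."""
    minutes = defaultdict(int)  # (listener, quarter start) -> minutes heard
    for listener, start, end in sessions:
        # Clip the session to the daypart before measuring overlap.
        start, end = max(start, daypart_start), min(end, daypart_end)
        for q_start in range(daypart_start, daypart_end, QUARTER_HOUR):
            overlap = min(end, q_start + QUARTER_HOUR) - max(start, q_start)
            if overlap > 0:
                minutes[(listener, q_start)] += overlap
    audiences = defaultdict(set)
    for (listener, q_start), heard in minutes.items():
        if heard >= MIN_MINUTES:
            audiences[q_start].add(listener)
    return audiences


def aqh_and_cume(sessions, daypart_start, daypart_end):
    """Return (average quarter-hour audience, cume) for one daypart."""
    audiences = quarter_hour_audiences(sessions, daypart_start, daypart_end)
    quarter_hours = (daypart_end - daypart_start) // QUARTER_HOUR
    aqh = sum(len(a) for a in audiences.values()) / quarter_hours
    cume = len(set().union(*audiences.values()))
    return aqh, cume


# Sessions as (listener, start minute, end minute) within a 60-minute daypart.
sessions = [("a", 0, 30), ("b", 10, 20), ("c", 58, 60)]
aqh, cume = aqh_and_cume(sessions, 0, 60)
print(aqh, cume)  # listener "c" heard only 2 minutes and is not credited
```

Here listeners "a" and "b" are each credited in the first two quarter hours, giving four credits across four quarter hours, so the AQH is 1.0 and the cume is 2.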
In a similar fashion, comScore produces monthly video ratings that are used by media aggregators to sell advertising on their services and improve their products. Though comScore isn’t working with a partner the way it is on Internet radio ratings, a company spokesperson says the company is "actively working with many constituents, including leading content delivery networks (CDNs) and content providers, to move video tracking forward."
Challenges remain
Although comScore has tackled a number of challenges to make such reporting possible, significant hurdles remain. Chief among them is cataloguing and identifying streams, particularly those served via CDNs.
comScore’s proprietary dictionary technology contains a massive catalog of URLs that identify Web pages by their media channels and owners. This identification process creates a problem in the streaming world, where a stream for a site like CNN.com may actually be served up from a CDN like Akamai. In some cases only the CDN gets credit for the stream, when the news channel the viewer was visiting should get credit as well.
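The attribution problem can be sketched in a few lines of Python. The example below assumes a tiny catalog that maps domains to owners and uses the referring page, when one is available, to credit the originating site alongside the CDN. The catalog entries, the CDN domain, and the function names are hypothetical and are not drawn from comScore's dictionary.

```python
from urllib.parse import urlsplit

# Hypothetical catalog: domain -> owner. The CDN domain is made up.
OWNER_BY_DOMAIN = {
    "cnn.com": "CNN.com",
    "cdn.example.net": "Example CDN",
}
CDN_OWNERS = {"Example CDN"}


def owner_of(url):
    """Look up the catalog owner for a URL's host, or None if unknown."""
    host = urlsplit(url).hostname or ""
    for domain, owner in OWNER_BY_DOMAIN.items():
        if host == domain or host.endswith("." + domain):
            return owner
    return None


def credited_owners(stream_url, referrer_url=None):
    """Credit the stream's host and, for CDN streams, the referring site."""
    credited = []
    stream_owner = owner_of(stream_url)
    if stream_owner in CDN_OWNERS and referrer_url:
        site_owner = owner_of(referrer_url)
        if site_owner and site_owner != stream_owner:
            credited.append(site_owner)
    if stream_owner:
        credited.append(stream_owner)
    return credited


stream = "rtsp://a123.cdn.example.net/cnn/clip.rm"
with_referrer = credited_owners(stream, "http://www.cnn.com/video/")
without_referrer = credited_owners(stream)
print(with_referrer)     # ['CNN.com', 'Example CDN']
print(without_referrer)  # ['Example CDN']
```

Without a referrer the sketch can credit only the CDN, which is exactly the gap described above.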
Even if a CDN is not involved, it can still be a challenge to know exactly where a user was when they played a particular stream. Streaming servers often operate independently of Web servers, so streams are usually not stored in a hierarchical folder structure that corresponds to the Web site. Additionally, a digital media file may be referenced from many places on a particular site or even across groups of sites, and referring data is not always available or consistent.