Bloggers: Big Media Is Watching

As content recognition software gets more sophisticated, expect more copyright-related battles online like the recent AP-blogger flap

The Associated Press unleashed a firestorm in the blogosphere earlier this month when it demanded that a political site take down AP content it said violated copyrights. Bloggers, including Michael Arrington and Daily Kos's Markos Moulitsas, cried foul, saying the AP's move threatened the free flow of information over the Web. The furor abated a few days later when the AP tempered its demands.

But the dustup between the AP and bloggers was just an early skirmish in what's likely to become a protracted war over how and where media content is published online. On one side are bloggers and other Web sites eager to ensure continued access to information. On the other are media companies intent on controlling or cashing in on the dissemination of their stories, videos, and other digital media. One reason: For the first time, content owners are able to track exactly where and how their words and images show up, thanks to an emerging class of technology called content recognition systems.

Manual Tracking No Longer Necessary

The AP, a not-for-profit news cooperative owned by thousands of subscriber newspapers, has been using a system from Redwood City (Calif.)-based startup Attributor. Like other content recognition systems, Attributor's software extracts a small digital fingerprint (a string of bits unique to a given article, song, or video) from each work and collects the fingerprints in a database. Then it continually crawls billions of Web sites and blogs, much as Google does to build its search index, to detect where a fingerprint recurs. In the recent incident, AP had unearthed instances where its content, at times whole articles, was posted to the liberal-leaning Web site Drudge Retort. Other Attributor customers include Thomson Reuters (TRI), Condé Nast Publications' CondéNet, and the Canadian Press. The AP and Attributor declined to comment on the incident.
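The fingerprint-and-crawl process described above can be illustrated with a short sketch. This is not Attributor's method, which the article does not detail; it assumes a simple text-only approach in which each run of five consecutive words is hashed, and a page is flagged when it contains a large share of a registered article's hashes. The function names (fingerprint, containment, find_matches), the 0.5 threshold, and the example URLs are all hypothetical, and a real system would have to handle audio, video, and crawling at far larger scale.

```python
import hashlib
import re


def fingerprint(text, shingle_size=5):
    """Return a set of short hashes, one per run of `shingle_size` consecutive words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < shingle_size:
        shingles = [" ".join(words)]
    else:
        shingles = [
            " ".join(words[i:i + shingle_size])
            for i in range(len(words) - shingle_size + 1)
        ]
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def containment(original_fp, page_fp):
    """Fraction of the original work's fingerprint that also appears on the page."""
    if not original_fp:
        return 0.0
    return len(original_fp & page_fp) / len(original_fp)


def find_matches(database, crawled_pages, threshold=0.5):
    """Compare every crawled page against every registered work (hypothetical threshold)."""
    matches = []
    for url, page_text in crawled_pages.items():
        page_fp = fingerprint(page_text)
        for title, original_fp in database.items():
            score = containment(original_fp, page_fp)
            if score >= threshold:
                matches.append((title, url, round(score, 2)))
    return matches


# Illustrative data only: one registered article and two "crawled" pages.
article = (
    "The senator said on Tuesday that the committee would vote "
    "on the measure before the end of the month."
)
database = {"wire-story-1": fingerprint(article)}
crawled_pages = {
    "https://blog.example/post": "Big news today. " + article + " More thoughts later.",
    "https://other.example/recipe": (
        "Mix the flour and sugar, then fold in the eggs and bake for twenty minutes."
    ),
}
matches = find_matches(database, crawled_pages)
print(matches)
```

Measuring how much of the original appears on a page, rather than requiring an exact copy, is what would let a system of this kind distinguish a whole reposted article from a brief excerpt, although the cutoff between the two is a judgment call that the sketch simply assumes.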

For a media executive, the appeal of a content recognition system is clear. With a glance, a publisher or studio head can plainly see where, when, and how their content is being viewed. In a demonstration for BusinessWeek earlier this year, Attributor executives showed how many times scenes from The Sopranos had appeared on 20 leading video sites since they first aired on TV. In all, 1,500 scenes from 52 episodes had been viewed 32 million times. For Time Warner's (TWX) HBO, those viewings might have brought in more than $1 million, said Attributor Chief Executive Officer Jim Brock.

The availability of technology like Attributor's represents a sea change for companies that until recently had to track online content manually or hire an outside company to do it. The new systems can automate the job and do it more cheaply. Most big TV and movie studios, including NBC Universal and Walt Disney (DIS), use systems from companies such as Audible Magic and Vobile to monitor the massive traffic at online video sites such as Google's (GOOG) YouTube. Even phone giant AT&T (T) plans to use Vobile technology (11/7/07) to help it root out piracy over the Web.

More Ways to Make Money?

But many bloggers, sites, and free speech advocates are concerned about how widely the technology will be deployed. The software can be programmed to automatically send out "takedown notices" that require sites to remove contested content, and the data it generates could end up being used to build a case against alleged copyright infringers. Viacom, which has filed a $1 billion lawsuit accusing YouTube of violations, is said to be testing Vobile's video fingerprinting technology, and may introduce its reports as evidence if the case goes to trial.

Bloggers have particular cause for alarm. The software could act as a kind of ever-present police, having a chilling effect on writers concerned they'll be dragged into court for inadvertently excerpting too large a chunk of material. "These systems are like magnets looking for needles in haystacks," says Robert Cox, president of the Media Bloggers Assn., an advocacy group. As more media companies adopt content recognition systems, more lawsuits against bloggers are likely to ensue, Cox says. Already, the number of suits against bloggers has surged, to 500 today from 10 in 2004, he says.

Fans of content recognition systems say these fears are unfounded. They say the technology will hasten a flowering of information on the Web. Their argument is that the systems give copyright holders more ways to make money by distributing their content on the Web. Publishers will know which sites are attracting the most traffic using their articles, and can demand a cut of the associated ad revenue. "Without good content recognition, you're negotiating without the facts," says Facebook Chief Financial Officer Gideon Yu, an investor in Vobile. As former CFO for YouTube, he was involved in the failed licensing talks with Viacom that led to the lawsuit in early 2007.

Keeping Tabs on Content

"It's not just about interdiction," says Scott Teissler, chief information officer of Turner Broadcasting System, which has invested in Attributor and is testing the technology for its own use. "It's about finding out how your content is consumed. That's where the interesting opportunities are."

Just ask Sarah Chubb, president of CondéNet, owner of sites ranging from cooking and fashion sites to WiredDigital, the online arm of Wired magazine. A few years ago, Chubb enlisted a team of people to scour the Web for unlicensed content use. Now she has a team that does the opposite: figuring out how to get CondéNet's recipes, fashion photos, and other content onto up-and-coming blogs and social networking sites. Her team is using Attributor's system not to issue takedown notices but to spot new targets. "We used to build our sites on the idea that people would come to our home page," Chubb says. "Now, we're consciously trying to put our content in a lot of places. In most of those cases, there's a revenue opportunity for us," she says, adding that she has no interest in using the technology to launch lawsuits.

Still, most customers are only kicking the tires on content recognition systems. The technology is not perfect; for example, most systems struggle to identify when a clip is being used as part of a book review or political parody, both legally protected uses. And as the AP flap shows, companies believed to use the systems to trammel the spread of information can face a backlash. What's clear is that it's becoming a lot easier for content owners to keep tabs on where their assets are being consumed, and how.
