Rapleaf's Web: How You Are Profiled on the Web

Earlier, I posted about San Francisco-based Internet information aggregator Rapleaf, a service that collects, sorts, and repackages data about many of us who spend an inordinate amount of time on the Internet. I started poking around and discovered many startups that are using data from Rapleaf, but it's not just startups. Just take a look at this article on Rapleaf in Fast Company from last year:

"By accessing its database of 378,968,953 consumer e-mail profiles, banks, retailers, and anti-fraud firms (all of which it counts among its clients) Rapleaf can quickly confirm legitimate customers and weed out scammers, cutting verification costs and improving the user experience. 'Companies spend as much as $100 getting customers to their site. The goal is to filter out the bad people and keep as many good people as possible,' (Joel) Jewitt (Rapleaf's vice-president of business development) says. 'If a customer's e-mail address is attached to three or four social networking sites with 300 friends, the e-mail likely isn't fake and the retailer can put that person in the "good" pile.' "

One of our readers pointed out that because Rapleaf is sending data to these companies, which may be caching your information, more information about you is leaking out on the Web. Opting out of Rapleaf's service isn't going to do you any good. Let's put it bluntly: For better or worse, the genie is out of the bottle.

How Rapleaf Works

To better understand how, exactly, Rapleaf works, I did some investigating. On a basic level, Rapleaf is like a credit-card company's database. When you're at a store and the cashier slides your credit card through, the store checks your card information against the credit-card company's database to make sure your card hasn't expired and you have enough credit.

Rapleaf's database contains e-mail addresses. Say an airline offers a discount coupon in exchange for your e-mail address. When you sign up for the coupon, the airline looks up your address in Rapleaf's database; Rapleaf confirms the address is valid by checking it against your profile in its database; and the airline knows it can send you its e-mail newsletter.

When I contacted Rapleaf, the company said it had built its database by crawling the Web, looking for connections, and building profiles with its own technology. "Like Google, we crawl publicly available data on the Web—as long as robots.txt allows search engines like us to crawl (we stop crawling if people disallow search engines)," Chief Executive Officer Auren Hoffman e-mailed. He added:
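Hoffman's description implies a crawler that consults each site's robots.txt before fetching a page. The check a well-behaved crawler makes can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration of the convention, not Rapleaf's code: the robots.txt rules, the user-agent name `ExampleCrawler`, and the URLs are all hypothetical.

```python
from urllib import robotparser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text instead of fetching them
    return parser.can_fetch(user_agent, url)

# "ExampleCrawler" is a made-up user-agent name.
print(may_crawl(ROBOTS_TXT, "ExampleCrawler", "https://example.com/profile/jane"))  # True
print(may_crawl(ROBOTS_TXT, "ExampleCrawler", "https://example.com/private/jane"))  # False
```

Note that robots.txt is a voluntary convention: it only restrains crawlers that choose to honor it, and it says nothing about what happens to data a crawler is allowed to collect.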

"Rapleaf is working hard to protect consumers. We are a data company that, like 99 percent of data companies, is opt-out (rather than opt-in). But we are a white-hat data company who helps companies safely provide a more personalized experience to their customers. We try really hard to protect consumers—we've thought a lot about consumer protection and are proud of everything we are doing. However, we are open to ideas on how we can improve and I encourage your readers to e-mail me at auren.hoffman@rapleaf.com with ideas on how we can improve and better protect consumers. While we cannot commit to implementing any idea from your readers, we can commit that we will consider all thought-out suggestions."

The company argues that what it does is no different from what various ad networks do, and that its policies are more consumer-friendly. Hoffman said you can opt out of Rapleaf through the company's opt-out page. Nevertheless, Rapleaf's services are clearly much in demand, judging by this response from CEO Hoffman:

"Today we help hundreds of top retailers, hotels, advertising agencies, large brands, tech startups, educational organizations, and nonprofits personalize their customers' experiences. (We sign NDAs with our customers so we cannot release their names.)"

Think of Rapleaf as a provider of FICO scores for e-mail addresses. An e-mail address comes with a Facebook ID, a Flickr ID, Twitter account information, and other social details. For a marketer, or even someone trying to hit you up for business, these are pretty relevant data, because they allow the marketer to target customers and connect them socially. In another scenario, you can buy a list of a million e-mail addresses for $1,000, check them against Rapleaf, and end up with about 10,000 addresses worth targeting. That's a pretty good deal.
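To make the list-cleaning economics concrete, here is a minimal sketch. The "good pile" test borrows the thresholds from Jewitt's fraud-screening example quoted earlier (three or more social networks, 300 or more friends); Rapleaf's actual scoring is not public, so `EmailProfile`, `looks_legitimate`, and the thresholds are hypothetical. The dollar figures are the ones from the scenario above.

```python
from dataclasses import dataclass

@dataclass
class EmailProfile:
    address: str
    social_networks: int  # number of social sites linked to this address
    friends: int          # friends/followers across those sites

# Hypothetical thresholds, loosely modeled on Jewitt's example of an address
# "attached to three or four social networking sites with 300 friends".
MIN_NETWORKS = 3
MIN_FRIENDS = 300

def looks_legitimate(p: EmailProfile) -> bool:
    """Crude 'good pile' test: enough linked networks and enough friends."""
    return p.social_networks >= MIN_NETWORKS and p.friends >= MIN_FRIENDS

def clean_list(profiles: list[EmailProfile]) -> list[str]:
    """Keep only the addresses whose profiles pass the test."""
    return [p.address for p in profiles if looks_legitimate(p)]

def unit_cost(list_price: float, count: int) -> float:
    """Price paid per address, given how many addresses the price covers."""
    return list_price / count

# The article's scenario: $1,000 buys 1,000,000 raw addresses,
# of which roughly 10,000 survive as worth targeting.
print(unit_cost(1_000, 1_000_000))  # 0.001 dollars per raw address
print(unit_cost(1_000, 10_000))     # 0.1 dollars per address worth targeting
```

In other words, under the article's numbers only about 1 percent of the list survives, so each useful address effectively costs ten cents rather than a tenth of a cent, which is still cheap compared with the up to $100 per customer that Jewitt says companies spend to acquire site visitors.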

A Good E-Mail ID Is Worth Money

In order for Rapleaf to succeed, it needs to keep growing its database of good e-mail addresses, which is why it's giving startups, such as Facebook game makers and social CRM companies, liberal access to its APIs. When a social CRM company, such as Rapportive, plugs into your Gmail account, it confirms to Rapleaf that your e-mail address is valid. Since the social CRMs create profiles of the people who e-mail you, the services confirm to Rapleaf that your friends' addresses are valid, too. Technically, no data are exchanged, but the sheer quantity of lookups is enough to beef up Rapleaf's database.

Think of it this way: Companies like Rapportive, by making simple queries, are becoming the sources of the best and highest-quality e-mails/IDs that Rapleaf has ever obtained. I think this is the crux of the problem. Here's a question I sent to Rapleaf and the answer I received (emphasis mine).

"Does Rapportive (and others like them, such as Gist) pay for the service? If yes, how much? What happens to the queries that originate from Rapportive? Say e-mail x@x.com. Does that data get stored in your databases?

Unfortunately we're not able to go into details about specific relationships because of our confidentiality agreements, but all of our customers pay us for our service. We do have a free API (up to 1,000 queries per month) that many companies use—but companies need to pay for Rapleaf for queries above that. We only allow companies to learn more about their existing customers (and we have never given out e-mail addresses) and when they query their customers' e-mail, we return the most updated information Rapleaf has associated with that e-mail. If this is a new e-mail we have not seen before, it may be cached to provide better user experience in the future or it can be removed via opt-out."
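Rapleaf's answer describes two mechanics: a metered API (free up to 1,000 queries per month, paid above that) and caching of e-mail addresses it has not seen before. The following is a minimal Python model of both, meant only to make the concern concrete: a lookup for an unknown address leaves that address behind in the service's store. The class, its method, the per-client monthly counter, and the in-memory storage are all hypothetical; Rapleaf has not published how its system works.

```python
from collections import defaultdict
from typing import Optional

FREE_QUERIES_PER_MONTH = 1_000  # free-tier size stated in Rapleaf's answer

class ProfileLookupService:
    """Hypothetical model of a metered e-mail lookup API (not Rapleaf's code)."""

    def __init__(self, profiles: dict[str, dict]):
        self.profiles = profiles               # known address -> profile data
        self.seen_unknown: set[str] = set()    # new addresses cached on lookup
        self.usage = defaultdict(int)          # (client, month) -> queries made

    def lookup(self, client: str, month: str, email: str) -> tuple[Optional[dict], bool]:
        """Return (profile or None, whether this query is billable)."""
        key = (client, month)
        self.usage[key] += 1
        billable = self.usage[key] > FREE_QUERIES_PER_MONTH
        profile = self.profiles.get(email)
        if profile is None:
            # The side effect at issue: an address the service had never
            # seen is now recorded, simply because a client asked about it.
            self.seen_unknown.add(email)
        return profile, billable

svc = ProfileLookupService({"known@example.com": {"twitter": "@known"}})
print(svc.lookup("startup-a", "2010-10", "new@example.com"))  # (None, False)
print("new@example.com" in svc.seen_unknown)                  # True
```

Even though the client receives nothing back for the unknown address, the service now knows that a real customer of that client uses it, which is exactly the kind of signal the author argues makes every lookup valuable to Rapleaf.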

Given that Rapleaf's core competency is its ability to take e-mail addresses, map them to data on the Web, and build profiles, I find the argument that data are cached for a better user experience hard to swallow. With nearly a billion e-mail addresses in its database, any lookup helps Rapleaf cull the best e-mails from the giant morass of addresses. At least two companies I spoke to have declined to work with Rapleaf and refused its offer of free data, mostly because they found the workflow unsavory, to put it mildly.

Rapleaf's Startup Web

Regardless, here is a list of Internet startups that have access to data from Rapleaf. Clearly it is incomplete, and, for some of these companies, it is not clear if they send data back to Rapleaf (I've noted the companies that confirmed they only look up data). I am going to update this post with more comments as I get them.

Rapportive. The CEO has confirmed that the company doesn't pass any data back and forth.

eTacts. It says it's not passing information back to Rapleaf.

Gist. The CTO confirmed the company isn't passing any information back to Rapleaf.

Flowtown. Co-founder Ethan Bloch left a comment indicating Flowtown doesn't pass any information back to Rapleaf.

SocialShield. Arad Rostampour denied passing any data back to Rapleaf.

As I said earlier, even if the companies aren't passing any data, every time they do an e-mail-based lookup against Rapleaf's database, they are essentially helping make Rapleaf's database more powerful.

Casting the Social Web

Verifying e-mails is one thing. But today, far more valid social information about demographics, interests, location, and so on is available, and a company like Rapleaf could use it to fill out its profiles. I'm as concerned about startups using Rapleaf's API as I am about how the company continues to mine data from huge, data-rich social services such as LinkedIn. LinkedIn data are ending up on Rapleaf, and from there, they're appearing on other services such as Flowtown. When I contacted LinkedIn, its spokesperson sent the following response:

"As we've always said, our user data belong to our users. It is provided by them and, unless they have restricted it, is available on our site. We don't share personally identifiable information with third parties without user consent. We also have teams that help protect our members' professional profiles from scraping, spamming, and any other activity that violates our terms of service. We don't have any business relationship with Rapleaf."

However, LinkedIn data end up at Rapleaf and, via Rapleaf, at other services through scraping of the publicly available data. Some people with knowledge of the subject believe that alternative tactics are being used to get around the API limitations of services such as LinkedIn. (If you know more, please get in touch with me.)

To be clear, I don't have old-fashioned notions about privacy on the Internet. I know the realities of today's Internet life. To enjoy the convenience of using Web-based services, one has to make some sacrifices, and living socially online will eventually lead to an erosion of privacy. However, what I find egregious is how the information is surreptitiously collected all over the Web, then aggregated and sold, without our having any control over those data or any ability to look into them. Sure, we can opt out, but only if we know that we're being profiled. (Ironically, you have to register to opt out.)

I don't want to blame only Rapleaf; ad networks are doing this as well, giving it cutesy names like behavioral targeting. U.S. Representatives Edward Markey (D-Mass.) and Joe Barton (R-Tex.) recently sent a letter to Mark Zuckerberg and Facebook, questioning him about privacy breaches at the social network. In August 2010, these same representatives asked various Web services for information on cookies and how they use them. Maybe they should consider looking at these data collectors as well. Perhaps they will conclude that this industry needs some kind of oversight.

