What's Google's algorithm for ferreting out "racism" in Portuguese?
Google has agreed to provide Brazilian authorities with data on users who encourage racism, homophobia and pedophilia. (ex Battelle) Plenty of serious questions about privacy and freedom of expression, of course. But I'm wondering exactly how Google goes about locating hate speech.
It can't just be a question of looking for hateful words. If it were, a literary analysis of Huckleberry Finn could end up in the batch. There are plenty of more advanced methods, which analyze the syntax and verb groupings in a text. But those take up a lot of computing power and still produce lots of false positives.
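To see why word-spotting alone falls short, here is a minimal sketch of a keyword filter. It is purely illustrative and not a description of anything Google actually runs; the blocklist, the function name `flag_by_keyword` and the sample posts are all made up, with the placeholder word "slur" standing in for a real offensive term.

```python
import re

# Hypothetical blocklist; "slur" is a placeholder for a real offensive term.
BLOCKLIST = {"slur"}

def flag_by_keyword(text, blocklist=BLOCKLIST):
    """Return True if any blocklisted word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in blocklist for word in words)

# Made-up sample posts for illustration only.
hateful_post = "You are a slur and you should leave."
literary_essay = "Twain puts the slur in Huck's mouth to expose the racism of his society."
harmless_post = "Happy birthday to my daughter!"

for post in (hateful_post, literary_essay, harmless_post):
    print(flag_by_keyword(post), "-", post)
```

The filter flags the literary essay just as readily as the hateful post, because it sees the word and nothing of the context around it.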
Blog analysis companies like Umbria Inc. use human readers to pick out examples of what they're looking for. Then they use these as templates to "teach" the machine how to find more of the same. Some anti-spam companies use a similar approach. As we all know, they don't always get it right. Regardless of the technical specifics, I'm betting that some Brazilian who puts off-color jokes on Orkut, or perhaps pictures of his eight-year-old daughter's birthday party, is going to be IDed by Google's computers as a criminal suspect.
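The "teach the machine with human-picked examples" approach can be sketched with a tiny naive Bayes text classifier. This is a toy under loud assumptions: it is not Umbria's or Google's method, the four training sentences and the labels "flag" and "ok" are invented, and a real system would need far more data and far better features.

```python
import math
from collections import Counter

def tokenize(text):
    """Split text into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()

def train(labeled_examples):
    """Count word frequencies per label from human-labeled examples."""
    word_counts = {}
    label_counts = Counter()
    for text, label in labeled_examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts

def classify(text, model):
    """Pick the label with the highest naive Bayes log-score (add-one smoothing)."""
    word_counts, label_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(counts.values())
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented "human-picked" templates; labels are hypothetical.
examples = [
    ("those people are vermin and should get out", "flag"),
    ("get out of our country you vermin", "flag"),
    ("lovely photos from the birthday party", "ok"),
    ("great match last night see you at the party", "ok"),
]
model = train(examples)

print(classify("you vermin get out", model))       # resembles the flagged templates
print(classify("photos from the party", model))    # resembles the harmless ones
print(classify("get out you are joking", model))   # friendly banter, flagged anyway
```

The last line is the worry in miniature: banter between friends shares enough words with the flagged templates that the classifier labels it "flag".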