The Semantic Web and The Quilombo

I study a Brazilian martial art called Capoeira that was practiced by Brazil’s slave population for about 400 years. It is said to have originated in Angola, from which most Brazilian slaves were kidnapped, and the martial art is full of songs which tell a story of rebellion and oppression. One of the main stories is of a hero named Zumbi who fought against the ruthless Portuguese land owners and founded a hide-out in the Amazon jungle called a “quilombo”. During Brazil’s slave period, men like Zumbi and their quilombos (which had grown in number over the years) fought a guerrilla war against the Portuguese, and finally the slaves overthrew their slave masters, using Capoeira to forge modern Brazil.

It’s entirely possible that, without the well-hidden and well-defended quilombos and rebellious people like Zumbi, Brazil would have been a slave country for much longer. What made the quilombos possible is that the slave masters couldn’t find them. These small fortress villages were too well hidden in the dense Amazon jungle, which is known to be hostile to most humans. If the slave masters could have just clicked a “find the Quilombo” button on a web browser, then Brazil’s history would have been quite different.

In World War II Germany there was something just like this “find the Quilombo” button. Right before Hitler and the Nazis rounded up the Jews, they hired IBM to create a computer they could use in their next census. This computer took information from the most recent census and literally spit out a list of all the Jewish people in Germany. Then, one day, the Germans moved in and collected men, women, and children to be exploited and exterminated. But the Jews also had their Quilombos thanks to the few resistance fighters, especially in Poland. Books like “The Avengers: A Jewish War Story” detail how guerrilla fighters not only took on the Germans but also continued to hunt down Nazis after WWII.

These bits of history demonstrate that humans frequently use technology to kill and enslave other humans. During Brazil’s history the technology was ships, transportation, steel, and firearms. In Hitler’s case it was practically the same stuff, but with a new, even more devastating weapon: information processing. These days it’s common that if a government finds a new way to analyze its citizens then the next step is oppression. You laugh, but look at how traditionally oppressive governments like China have used technology to find and prosecute dissidents. Or look at how the US Department of Transportation uses CAPPS II to find “potential terrorists” based on “information collected in government and commercial databases”. That’s right, some dipshit with a high school education at American Airlines is going to take down your personal information and then tag you with a color to indicate if you’re a “threat”.

So what does the Semantic Web have to do with all of this? The purpose of the Semantic Web is to make the web machine processable. This sounds like a great idea, but who needs it? I mean, who needs to process all of the documents people on the Internet write in order to analyze them? I don’t need it. I’m pretty sure you don’t need it. So who benefits from this? The answer is anyone who wants to control that information and possibly the people who generate it. People continually put forth how this is going to save you so much time. How you’ll be able to search for things by their “semantic meaning”, even though I seriously doubt all of human experience can be distilled into a bunch of <humor>, <related>, <subject> and <person> tags.

Hopefully it will be a massively failed project and nobody will need to worry about it. I doubt it though, since it will be such a great boon for organizations that want to organize information. Think of the semantic web in terms of government information processing needs. Imagine if President Bush wants the Secret Service to find everyone on-line who has said anything bad about him. The Secret Service’s only choice right now is to go use a web search engine like Google, and all they really get is a list of web pages without any real relational information or much information about who really wrote the pages.

Now let’s add the semantic web and its ability to not only let people search for things by their semantic meaning, but also projects like FOAF which let you find out how people are related. The Secret Service does a search for the semantic phrase “? > hates > Bush” and pulls up all the people who hate GW. Then they use FOAF to figure out who knows these people. Instantly the Secret Service is able to go after you, and if they can’t then they can go after your friends, lovers, children, and anyone who owes you money. In the now famous words of Guinness … “Brilliant!”
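
To make that scenario concrete, here is a rough sketch of how little code the “? > hates > Bush” search plus a FOAF-style lookup would take once everything is machine processable. Everything in it is made up for illustration: the names, the `TRIPLES` list, and the `match` and `acquaintances` helpers are my own assumptions, not any real Semantic Web or FOAF API.

```python
# Hypothetical (subject, predicate, object) statements, as if they had been
# harvested from machine-readable pages. All names are invented.
TRIPLES = [
    ("alice", "hates", "Bush"),
    ("bob", "likes", "Bush"),
    ("alice", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("bob", "knows", "erin"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None plays the role of '?'."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def acquaintances(triples, person, depth=1):
    """Follow 'knows' links outward from person, up to `depth` hops."""
    found = set()
    frontier = {person}
    for _hop in range(depth):
        next_frontier = set()
        for current in frontier:
            for (_s, _p, other) in match(triples, subject=current, predicate="knows"):
                if other != person and other not in found:
                    found.add(other)
                    next_frontier.add(other)
        frontier = next_frontier
    return found


# Step 1: the "? > hates > Bush" query.
haters = sorted({s for (s, _p, _o) in match(TRIPLES, predicate="hates", obj="Bush")})

# Step 2: the FOAF-style lookup of everyone within two hops of each match.
network = {h: sorted(acquaintances(TRIPLES, h, depth=2)) for h in haters}
```

With this toy data, `haters` holds only `"alice"`, and `network` maps her to `"carol"` and `"dave"`. The point is that both steps are a simple filter and a short graph walk; the hard part today is getting the data into this shape, which is exactly what the Semantic Web sets out to do.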

The only catch to this whole scenario is that nobody is required to use the W3C’s products. Right? Yeah sure, and you have all the freedom to choose between maybe 6 whole versions of HTML when you do your web pages and you can even use 2 versions of HTTP. That’s a whole lot of choice, sure (I’m being sarcastic here). No, I’m sure the W3C will have no problems turning their wonderfully crafted personal data crunching web formats into an “official” standard. And with a Congress that has already forced all libraries in America to place “content filtering” on every computer with Internet access, I’m sure it’s only a small step to sneak in a law requiring certain formats. You know how SGML became popular? The US Government started requiring all submitted documentation in SGML as the official format. And this is just the US. Countries with even poorer human rights records wouldn’t have any problem doing this.

Enforcement is a piece of cake as well since the semantic web is machine parseable. Simply state in the law that “da gov” will use a big computer to continually scan the Internet, and if the computer finds a document which is not parseable according to the W3C standards then they fine the person who wrote it. Or “detain”, “imprison”, “kill”, whatever, they’re just people anyway and I’m sure corporations will be exempt somehow.

I really don’t think that the W3C has it out for people, I just think that maybe they haven’t thought of the implications of introducing their technology. Especially projects like FOAF, which just really pisses me off. They don’t even think that people might consider their contacts personal. Now LOAF has the right idea. Their social network format is a Bloom Filter, which makes it nearly impossible to figure out who is related to whom unless you already know the target person. These folks thought ahead and made sure that you can share acquaintances without fear of reprisal or having someone start sending you SPAM because you happen to know a guy with a small penis. Or, you happen to have a small penis and all your friends start sending you e-mails asking, “Hey man, how come I started getting Enzyte offers?”
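
For anyone who hasn’t run into a Bloom filter, here is a minimal sketch of the idea. This is not LOAF’s actual file format or code; the `BloomFilter` class, its sizes, the SHA-256 hashing scheme, and the example addresses are all assumptions I picked to keep the example short.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: stores hashed bit positions, never the items."""

    def __init__(self, size_bits=1024, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True if every position is set: "probably present".
        # False if any position is clear: "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Publish a filter of your contacts instead of the contact list itself.
contacts = BloomFilter()
for address in ["carol@example.org", "dave@example.org"]:
    contacts.add(address)

# Someone who already knows an address can test it against the filter.
known = "carol@example.org" in contacts
```

What gets shared is just `contacts.bits`, a pile of set bits. There is no way to list the addresses back out of it; all you can do is ask “is this specific address in there?”, which only helps if you already have the address in hand. The trade-off is that a Bloom filter can give false positives, and someone with a big enough list of candidate addresses could still test them one by one, so it raises the cost of snooping rather than making it impossible.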

The W3C probably can’t do anything about these problems really. They are just a bunch of geeks trying to do something cool. What I hope they do though is to start thinking about these things and maybe start to work with lawmakers to make sure that their tools don’t get used in a bad way. A great law I would like to see is that the government cannot mix databases and can’t use commercial information. This would mean that the IRS cannot share its information with the Department of Homeland Security without a specific targeted warrant for that person’s information. Throw in provisions against “surreptitious information collection against private citizens” and you’ve got the beginning of a great law. The “surreptitious” provision would simply state that the government cannot collect or store information about a citizen unless it was in relation to a transaction that citizen initiated. Filing a tax return, getting arrested, joining the military, and applying for student loans are all transactions requiring some information to complete. Going to a war protest, buying lunch at Denny’s, or having sex with your girlfriend are not transactions.