25 February 2011

Computers and Language: Part 1

Last week, peppered amidst the news of uprisings in the Middle East and Northern Africa were reports of a more sinister nature. Well, sinister according to the conspiracy theorists among us. For the rest of us, the computer Watson and his romp on Jeopardy were simple, light fun and a small precursor to what the future may hold for us.

This video is of the first half of the first episode airing the matchup of Watson against the two best contestants humanity could offer, Ken Jennings and Brad Rutter. Feel free to watch the whole nine-minute ordeal, but the section that interests me is at the beginning, when IBM has a small documercial (documentary commercial, a word i made up for this sentence) about Watson and the team who designed him.

With Watson and IBM so prominent in the news, articles are cropping up left and right concerning machine intelligence and the day, looming ever closer, when we can refer to a machine as a thinking entity.

So far, we are safe from a machine uprising. In the first of two games, Watson went into Final Jeopardy with a commanding lead over the two human opponents. Under the category of ‘U.S. Cities,’ the clue given somehow managed to trip up Watson and leave even non-Jeopardy players scratching their heads.

‘Its largest airport is named for a World War II hero; its second largest for a World War II battle.’

The correct answer: Chicago.

Watson’s answer: Toronto.

Obviously, and most North Americans would know this, Toronto is not a U.S. city, residing as it does in the grand country known as Canada.

Jennings and Rutter both answered correctly, so why did Watson get the answer so very wrong? Steve Hamm at IBM, writing on the company's Smarter Planet blog, had an answer for us:

First, the category names on Jeopardy! are tricky. The answers often do not exactly fit the category. Watson, in his training phase,  learned that categories only weakly suggest the kind of answer that is expected, and, therefore, the machine downgrades their significance.  The way the language was parsed provided an advantage for the humans and a disadvantage for Watson, as well. “What US city” wasn’t in the question. If it had been, Watson would have given US cities much more weight as it searched for the answer. Adding to the confusion for Watson, there are cities named Toronto in the United States and the Toronto in Canada has an American League baseball team. It probably picked up those facts from the written material it has digested. Also, the machine didn’t find much evidence to connect either city’s airport to World War II. (Chicago was a very close second on Watson’s list of possible answers.) So this is just one of those situations that’s a snap for a reasonably knowledgeable human but a true brain teaser for the machine.

The problem Watson ran into was partly due to its programming and training (not putting as much weight on the category as was necessary) and partly due to the wording of the question.
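To make the weighting idea concrete, here is a toy sketch in Python. It is not IBM's scoring method. The function name, the evidence numbers, and the weights are all invented for illustration. It only shows how a candidate that does not fit the category can still come out on top when the category counts for very little, and how the ranking flips when the category counts for more.

```python
# Toy illustration only: every number here is made up, and this is not
# how Watson actually scores answers.

def rank_candidates(candidates, category_weight):
    """Return candidates sorted by combined score, best first."""
    def score(candidate):
        # Combined score = evidence from the clue text, plus a bonus
        # for fitting the category, scaled by how much the category is trusted.
        return candidate["evidence"] + category_weight * candidate["fits_category"]
    return sorted(candidates, key=score, reverse=True)

# Hypothetical candidates: weak evidence for both, Toronto slightly ahead,
# but only Chicago fits the 'U.S. Cities' category.
candidates = [
    {"name": "Toronto", "evidence": 0.32, "fits_category": 0.0},
    {"name": "Chicago", "evidence": 0.30, "fits_category": 1.0},
]

# Category barely counts: the slightly stronger evidence wins.
low = rank_candidates(candidates, category_weight=0.01)

# Category counts for a lot: the candidate that fits it wins.
high = rank_candidates(candidates, category_weight=0.5)

print(low[0]["name"], high[0]["name"])
```

With the category nearly ignored, the tiny edge in evidence decides the ranking, which is roughly the situation the IBM explanation describes.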

Jeopardy, as difficult as it can be due to the puns and wordplay often involved in the clues, is a static medium. There is always a category, always a clue, and always an answer, phrased as a question, that responds to the clue and fits within the category. It’s not the fluid, dynamic web of language we call a conversation.

As Dr. Katharine Frase states in the documercial, normal humans communicate more in a style she calls ‘open-question answering.’ There is a small bit of chaos in our interactions with one another.

This idea of chaotic conversation leads to more roads, and another blog for next week.

For the record, the final totals after the two games left Watson and IBM with $77,147, Jennings with $24,000 and Rutter at the bottom with $21,600. IBM says it will donate the money to charity.

10 February 2011

A Rape by Any Other Name

Last week, the Republicans in the House of Representatives removed a provision from an Act on the floor that would have been detrimental to women's rights and sent the wrong message to the public. Not to mention the damage it would have done to language.

The 'No Taxpayer Funding for Abortion Act' originated with Chris Smith, a Republican Representative from New Jersey. The purpose is pretty obvious: prevent government money earmarked for health care and Medicare from being used to pay for abortions. The obvious goal behind this is to chip away at the current stance the government takes on abortion, with the ultimate goal of making it illegal. The stated impetus is to save government money in a time when our deficit is measured in numbers generally reserved for grade school children attempting hyperbole.

I won't get into the complicated and muddied topic of abortion; that's not in the scope of this blog, nor is it a topic i have considerable knowledge of. Suffice it to say, i know it's not always as black and white as each side paints it, so let's leave it there.

Instead, i want to talk about one word used in the No Taxpayer Funding for Abortion Act, from this point referred to as H.R. 3, or simply HR3. So much more concise. In the bill, one passage stands out above the rest. It states the Act 'shall not apply to an abortion' if, for instance, 'the pregnancy occurred because the pregnant female was the subject of an act of forcible rape or, if a minor, an act of incest.' The document has since been changed to remove the word 'forcible,' but only after websites such as Mother Jones and MoveOn.org brought attention to the matter.

If 'forcible' had stayed in the bill, it would have left certain women up a creek without an option: for example, a woman raped while drugged, a mentally challenged woman taken advantage of, or even a victim of certain instances of date rape.

I read a passage in President Obama's book The Audacity of Hope that rang true for this circumstance.
'Much of the time, the law is settled and plain. But life turns up new problems, and lawyers, officials, and citizens debate the meaning of terms that seemed clear years or even months before. For in the end laws are just words on a page—words that are sometimes malleable, opaque, as dependent on context and trust as they are in a story or poem or promise to someone, words whose meanings are subject to erosion, sometimes collapsing in the blink of an eye.'

What Obama refers to is the trouble we often have with interpretation of laws, often even the Constitution. We like to think of our laws, and especially our Constitution, as being immutable and firm. Firm they may be, but never immutable. The three branches of government are constantly re-examining documents and laws to further refine them.

The problem with HR3 is how it attempted to shortcut the process by refining the definition of 'rape.' Whoever was inspired to add the word 'forcible' probably thought they were being smart by removing the case for statutory rape, while still being compassionate and allowing abortions for those women who were physically beaten into submission.

Whether HR3 passes and becomes law remains to be seen, but if it does, it will allow for the original coverage, which is broader and more compassionate.

03 February 2011

The Pyramids Are Revolting

In response to the recent unrest in Egypt, the government in China has been closely monitoring the internet, specifically preventing netizens from searching for the term 'Egypt' on social networking sites, as well as keeping the news coverage in official channels to the bare minimum.

Censorship in China is no new practice, even when it comes to the internet. Since 2003, the government has operated what is known as the Great Firewall of China to prevent its citizens from gaining access to certain sites, IP addresses, and keyword searches.

When it comes to technology, especially when using it to prevent people from doing something, there are always work-arounds. For the Great Firewall, proxy servers outside of China, virtual private networks and various free programs allow, to varying extents, access to information and websites not allowed by the Great Firewall.

As for the keyword searches, there is an even easier fix: a different keyword. In order to better monitor its citizens, China has its own version of Twitter called Weibo. It is one of the sites which prevents users from searching for the term 'Egypt.' As any high school student in the United States can tell you, when one word is disallowed, another more innocuous word takes its place. Where Chinese citizens cannot search for 'Egypt,' they might very well be looking around for posts and articles about 'pyramids,' 'Nile,' 'Cairo,' or countless other new keywords.
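For the curious, a naive keyword block can be sketched in a few lines of Python. This is only an assumption about the simplest possible filter, not a description of how Weibo or the Great Firewall actually works, and the names `BLOCKED_TERMS` and `is_search_allowed` are made up for the example.

```python
# Hypothetical blocklist filter: rejects a search only when one of its
# words exactly matches a blocked term (case-insensitive).
BLOCKED_TERMS = {"egypt"}

def is_search_allowed(query, blocked_terms=BLOCKED_TERMS):
    """Return True if no word in the query is on the blocklist."""
    words = query.lower().split()
    return not any(word in blocked_terms for word in words)

queries = ["Egypt", "pyramids", "Nile", "Cairo"]

# Only the exact blocked term is stopped; the related words pass through.
allowed = [q for q in queries if is_search_allowed(q)]
print(allowed)
```

A filter like this only knows the words it has been told about, so every new stand-in word has to be found and added by hand, and by then people have usually moved on to the next one.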

Short of killing the entire country's internet, dissident information will always leak to the people looking for it. Even without internet, people are able to coordinate well enough to disseminate information, and this Wired article tells you how.

That's the beauty of language for me; as Malcolm tells us in Jurassic Park, '[language] finds a way.' As literacy grows, so does knowledge. As knowledge grows, so do the possibilities. Language and literacy are doors that open to wider worlds, not smaller. I guess that's what makes it so scary to the Chinese government.