LaRue's Views: June 23, 1999 - How Filters Work

Welcome

This blog represents most of the newspaper columns (appearing in various Colorado Community Newspapers and Yourhub.com) written by me, James LaRue, during the time in which I was the director of the Douglas County Libraries in Douglas County, Colorado. (Some columns are missing, due to my own filing errors.) This blog covers the time period from April 11, 1990 to January 12, 2012.

Unless I say so, the views expressed here are mine and mine alone. They may be quoted elsewhere, so long as you give attribution. The dates are (at least according to my records) the dates of publication in one of the above print newspapers.

The blog archive (web view) is in chronological order. The display of entries, below, seems to be in reverse order, new to old.

All of the mistakes are of course my own responsibility.

Wednesday, June 23, 1999

June 23, 1999 - How Filters Work

I frankly admit my biases about this issue: I am flabbergasted that as part of a post-Columbine response, all across the country we’re loosening laws regarding access to guns — which do kill people — and locking down library terminals — which don’t.

I don’t have much of an opinion about gun laws. It’s not my area of expertise. But I do know something about libraries and the Internet. The current crop of proposed legislation requiring software “filters” (two bills at the federal level, some dozen at the state level) is so wildly off-base that I thought I’d point out some significant technical issues.

How do filters work? Well, there are only five kinds of data in a web page:

- the URL (Uniform Resource Locator) — as in “douglas.lib.co.us.” Some filtering software consists of databases of “bad” URLs, which are then blocked. But some tens of thousands of new sites come up every day, from all over the world, which makes tracking them a problem.

- the IP (Internet Protocol) address, as in 198.59.43.1. This is the numeric address that the host name in a URL stands for. The same problem applies as with URLs, but with a new wrinkle: there are several ways to enter an IP address such that (at least at this writing) it will bypass most filtering software. For instance, the IP can be entered in hexadecimal.

- metatags. This is coded text appearing in the header of a page, usually invisible when you view the page through a browser. The purpose is to offer information to a search engine about the content of the page. There are two problems here. The first is that some disreputable types put deliberately misleading information in the metatags. For instance, shortly after Columbine, some pornographic web sites added “columbine” to their metatags. Many other common words and phrases are used the same way. Filtering software cannot possibly block these words without blocking access to nearly anything of interest.

The second problem is that filtering software tends to be very literal: it looks for word patterns, but has notoriously bad judgment about the thousands of exceptions. My favorite example is “XXX” — which just happens to appear in the site for several Super Bowls (where “XXX” stands for “thirty”). Another commonly cited problem is “breast” — which then blocks access to information about breast cancer.

- text within the page. Here, too, filtering software stumbles over pattern recognition. Some filtering software vendors are working hard on using new technologies — fuzzy logic, artificial intelligence. Nonetheless, it is still very difficult to determine the difference between an anti-Semitic web site, and a web site ABOUT anti-Semitism. Should both be blocked? Should either?

- graphics. If a web page is nothing but a picture (as many web pages are), filtering software simply can’t deal with it. Cartographers, meteorologists, and many others would love to see some pattern recognition regarding images. But that looks to be some 10 years out.
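To make the first two items concrete, here is a minimal sketch in Python of a blocklist that compares host names as literal strings, and of how a hexadecimal spelling of the same IP address slips past it. The blocklist contents and the function names are hypothetical, chosen only for illustration; the address is the one used as an example above. Whether a given browser actually accepts such spellings is taken from the description above, not tested here.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical blocklist. "bad.example" is a made-up host name;
# 198.59.43.1 is the example address from the list above.
BLOCKED_HOSTS = {"bad.example", "198.59.43.1"}


def naive_is_blocked(url: str) -> bool:
    """Compare the host in the URL to the blocklist as a literal string."""
    host = urlsplit(url).hostname or ""
    return host in BLOCKED_HOSTS


def to_dotted_quad(host: str) -> str:
    """Rewrite a host given as one hex or decimal number in dotted form.

    Any host that is not a single number in the 32-bit range is
    returned unchanged.
    """
    try:
        value = int(host, 0)  # base 0 accepts "0x..." hex and plain decimal
    except ValueError:
        return host
    if not 0 <= value <= 0xFFFFFFFF:
        return host
    return str(ipaddress.IPv4Address(value))


def normalized_is_blocked(url: str) -> bool:
    """Normalize the host first, then compare it to the blocklist."""
    host = urlsplit(url).hostname or ""
    return to_dotted_quad(host) in BLOCKED_HOSTS


dotted = "http://198.59.43.1/"
hexed = "http://0xC63B2B01/"  # the same four bytes: C6.3B.2B.01

print(naive_is_blocked(dotted))      # the literal entry matches
print(naive_is_blocked(hexed))       # the hex spelling does not match
print(normalized_is_blocked(hexed))  # matches once the host is normalized
```

Normalizing closes this one gap, but it does nothing about the larger problem of new sites appearing faster than any list can track them.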

See why librarians have a problem with this “solution”? On every count, it can be, and has been, clearly demonstrated that filtering software simply does not work.

Moreover, many of the things that are blocked raise serious issues of censorship. Exhaustive studies conducted by librarians show that not only do tons of inappropriate materials still get through, but also that many fine, perfectly appropriate sites do not, for reasons that are often unclear.
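Both failures are easy to reproduce with a toy example. The sketch below, in Python, assumes the most literal kind of filter: a case-insensitive substring match against a keyword list. The keyword list uses the two words mentioned earlier, and the page texts are invented for illustration; no real product is being modeled.

```python
# Hypothetical keyword list, using the two examples from this column.
BLOCKED_KEYWORDS = ("xxx", "breast")


def keyword_filter_blocks(page_text: str) -> bool:
    """Block a page if any keyword appears anywhere in its text."""
    lowered = page_text.lower()
    return any(keyword in lowered for keyword in BLOCKED_KEYWORDS)


# Invented page texts. The last one stands for a page that is nothing
# but a picture, so the filter has no text to examine.
pages = {
    "football": "Super Bowl XXXIII game summary",
    "health": "Breast cancer screening guidelines",
    "image only": "",
}

for label, text in pages.items():
    verdict = "blocked" if keyword_filter_blocks(text) else "allowed"
    print(f"{label}: {verdict}")
```

The football and health pages are blocked because the pattern matches inside perfectly ordinary text, while the image-only page is allowed no matter what the picture shows.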

There’s a librarian out in Oregon who’s a big believer in filters. He recently crowed that libraries that use them only get an average of two complaints per month. Well, we don’t use filtering at all, and we only get about one complaint per quarter. (Of course, we don’t put Internet terminals in our children’s areas, either.)

I get more complaints from our patrons about almost anything else — the air-conditioning, the dandelions — every single day. If wanton Internet access isn’t a local problem, why do we need a state or federally mandated solution?

As I noted in a recent column, librarians aren’t wearing blinders. If we observe something that demonstrates extraordinary incivility — a man summoning Penthouse photos just as a Girl Scout troop arrives at the library — we would ask him to stop. And he would, or he would be ejected. In other words, we continue to supervise public space, as we always have.

So far, we have yet to be required to peer over your shoulder — physically or electronically — to see what text you’re reading, and to stop you if somebody in Denver or DC doesn’t approve. So far, librarians have yet to be persuaded that the root cause of youth violence is children who are spending too much time at the library.
