Information Representation

LIS 4701 Information Representation

Week 8: Tying representation to the retrieval tool

This week's reading:

Tufte, Chapter 6: Multiples in space and time.

Administrative items:

Reminder: we have guests!

Your papers are done. We will use them in class today!

Thursday: you may either come to 216 and do a lab with me from 3:35 to 4:35, or you may attend Jessica Larkey's talk in room 258 from 4:00-5:00 p.m. If you attend Jessica's talk, you need to send me a paragraph about it in order to receive credit for that day's session. I will probably drop in at 4:35, just FYI :-).

Don't forget that the GICCA is next Thursday, March 4. You should come to class already divided into groups of 3 (preferable) or 4 (we will have to have a couple of these to even it out).

What we will do this week:

1) Highlight the important points of Tufte's Chapter 6
2) Discuss what we mean by tying representation to the retrieval tool

Tufte Chapter 6: Multiples in space and time

Why this chapter now?

Because Tufte's discussion of multiples helps us to think about systems rather than individual things.

The more information you include AND try to represent in a standardized way (see p. 107), the more likely you are to be able to use your knowledge in conjunction with that information to build correct new knowledge (p. 106).

In part, this is because, as we see on p. 108, looking at multiples in space and time (see why he makes this his centerpiece example?) helps us to ease over confusing transitions in the data.

Sort of like looking at stock market graphs. Seen with few multiples (few data points) over short periods of time, the market looks very volatile. Seen with many multiples (many data points) over long periods of time, the steady upward trend becomes clear.

[Notice Tufte's important point on p. 109: in depicting movement in flatland, when "space replaces time as the sequencing dimension", it's important that the space accurately represents time.]

Now look at page 111 and let's think about tying what we know about representation to the simple retrieval system shown.

Important point: for this whole graphical representation (retrieval system), someone had to make a decision about which pieces of information were relevant and appropriate to include, and how to represent them. So, someone had to look and decide: what are the relevant attributes of this item to include in this system?

Notice that this decision can happen from the specific to the general (like your last assignment) or from the general to the specific.

Specific to general: you work with each item that you want to represent, deciding what are the important bits to include. Then, looking over your collection as a whole, you generalize to decide what TYPES of information to include. For item 1, you note: contains sugar. For item 2, you note: contains aspartame. For item 3, you note: contains acesulfame K. Then, looking at the collection, you decide you need a TYPE of information called Sweetener, whose values can be sugar, aspartame, acesulfame K, AND ... others! So the values of the type are not necessarily restricted to the specific items you were working with. You've achieved generalizability, and this is a good thing (note that if you do decide to restrict the value of the attribute Sweetener to one of several acceptable values, you've done what? Yes, you've made a controlled vocabulary, albeit a very small one).
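The specific-to-general move can be sketched in a few lines of Python. This is only an illustration: the names item_notes, records, and is_allowed are made up for the sketch, and "stevia" is a hypothetical fourth value that was not in the original collection.

```python
# Notes taken one item at a time (the specific).
item_notes = {
    "item 1": "sugar",
    "item 2": "aspartame",
    "item 3": "acesulfame K",
}

# Generalizing: each note turns out to be a value of one attribute TYPE.
ATTRIBUTE = "Sweetener"
records = [{"id": item, ATTRIBUTE: value} for item, value in item_notes.items()]

# Open attribute: a value we never saw in the first three items still fits.
# ("stevia" is a hypothetical example value.)
records.append({"id": "item 4", ATTRIBUTE: "stevia"})

# Restricted attribute: a (very small) controlled vocabulary.
CONTROLLED_SWEETENERS = {"sugar", "aspartame", "acesulfame K"}

def is_allowed(value):
    """Return True if the value is in the controlled vocabulary."""
    return value in CONTROLLED_SWEETENERS
```

With the open attribute, item 4 goes in without complaint. With the controlled vocabulary, is_allowed("stevia") is False, so you would have to decide whether to reject the item or extend the vocabulary.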

General to specific: You know that for each item, you want to include the date. This is a general type of information, or an attribute of all the pieces of information you're going to include. Then, you work through the specific values as they come up. You might find that for some, you only have the month/year, and you have to figure out how to deal with that specific. Or you might not have a date, and you have to figure out how to handle a null value.
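One possible way to handle these specifics, sketched in Python. The ItemDate class and the parse_date function are hypothetical names, and the sketch assumes dates arrive as text in the form YYYY, YYYY-MM, or YYYY-MM-DD; your own data may well look different.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ItemDate:
    """A date attribute whose parts may each be missing (None)."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None

    def precision(self):
        """Say how much of the date we actually have."""
        if self.year is None:
            return "unknown"
        if self.month is None:
            return "year"
        if self.day is None:
            return "month"
        return "day"

def parse_date(text):
    """Parse 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD'; empty or None gives a null date."""
    if not text:
        return ItemDate()
    parts = [int(p) for p in text.split("-")]
    return ItemDate(*parts)
```

Here parse_date("2004-03") keeps the month and year and leaves the day empty, and parse_date(None) gives a date whose precision is "unknown" instead of a made-up value.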

But since Tufte's thing is the visual, it will stand us in good stead to address the visual for a moment, as on pp. 112-113.

Meaningless clusters created by the inclusion of unnecessary attributes (oh, wait, this isn't restricted to the visual either ...). This is very important. When you were doing A2, you probably noticed that in your compare/contrast bit, you found things that were alike and different that were true yet seemed perhaps trivial?

This of course depends on what? (yes, the context)

But back to Tufte ... false clusters by color, and false time sequences by color, too.

But ... back to contexts of use of an information system: making the representation match the retrieval system means acknowledging context of use, such as underwater. (Yes, this also attaches to the user, which we'll do later.) Submersible! A value for "form" that also identifies the attribute of "purpose": quite an economical approach, unless you are concerned (as you should be) about conflating the contents of multiple fields. (This is why city, state, and ZIP+4 don't all go in one container.)
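A small Python sketch of what goes wrong when fields are conflated. The two addresses are invented for illustration (the ZIP+4 values are placeholders, not real codes), and the function names are hypothetical.

```python
# One container per item: city, state, and ZIP+4 all run together.
combined = [
    "Kansas City, Missouri 00000-0001",
    "Topeka, Kansas 00000-0002",
]

# One container per attribute.
separate = [
    {"city": "Kansas City", "state": "Missouri", "zip4": "00000-0001"},
    {"city": "Topeka", "state": "Kansas", "zip4": "00000-0002"},
]

def find_in_combined(records, state):
    """Best we can do with one container: look for the text anywhere."""
    return [r for r in records if state in r]

def find_in_separate(records, state):
    """With separate fields, we can ask about the state and only the state."""
    return [r["city"] for r in records if r["state"] == state]
```

Searching the combined strings for the state "Kansas" also returns Kansas City, Missouri, because the system cannot tell a city name from a state name. Searching the separate fields returns only Topeka.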

p. 116 disinformation design by fishes. Tying representation to the realities of retrieval happens even in nature.

p. 116 (and p. 104): multiples in representation can reinforce perceptions. (the more grammatically correct sentences you assemble in one place, the more likely I am to think that you're an adequate writer).

p. 119 but what bad thing has happened from page 118?

Let's look at an example

(Hint: what happens differently in your head when you see a color photo as opposed to a black-and-white one?)

And so we go ...

2) tying representation to the retrieval tool

Basically what we mean is this:

when you create a representation you decide what to include

In deciding what to include, you are determining what your retrieval system will be able to do later.

If you don't include phone numbers in the representation, your retrieval system will never be able to retrieve by phone number, or for phone number.

If you don't include the name of the creator of a web page in the representation (metadata) for every web page, then later, you will never be able to use your retrieval system to aggregate documents by the author (and you won't be able to tell who wrote stuff, either).
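To make this concrete, here is a minimal Python sketch. The three page records and the group_by function are hypothetical; the point is only that the retrieval step can use nothing beyond what the representation recorded.

```python
# Hypothetical representations (metadata) for three web pages.
pages = [
    {"url": "page-a", "title": "Multiples", "creator": "Author A"},
    {"url": "page-b", "title": "Flatland", "creator": "Author A"},
    {"url": "page-c", "title": "Color"},  # creator was never recorded
]

def group_by(records, attribute):
    """Aggregate page URLs by the value of one attribute."""
    groups = {}
    for record in records:
        key = record.get(attribute)  # None when the attribute is absent
        groups.setdefault(key, []).append(record["url"])
    return groups

by_creator = group_by(pages, "creator")
by_phone = group_by(pages, "phone")
```

Grouping by creator works for the two pages that recorded one and leaves page-c under None. Grouping by phone puts every page under None, because no representation ever included a phone number.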

Many people, when they think about retrieval systems, think about speed, or how much stuff the system can get back ... but not about the representation of the items that went in.

Think about all of the attributes that are missing (or may be missing) from the representations that Google uses for its search, because the representations for what goes in haven't been the focus of the design.

Successful tying of your representation to its robust use in a retrieval tool only happens if you standardize your representations, making things consistent from one item to the next. This means including the same types of information for each item, figuring out what to do if an item has an attribute that you WANT to include but that doesn't have a slot in your scheme, and, vice versa, figuring out what to do when your item doesn't have an attribute to fill in a needed slot.
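One way to picture standardization, as a Python sketch. The three-slot SCHEMA and the standardize function are assumptions made for illustration; a real scheme would have its own slots and its own policy for leftovers.

```python
# Hypothetical scheme: every item gets exactly these slots.
SCHEMA = ("title", "creator", "date")

def standardize(raw):
    """Fit one item's raw description into the scheme.

    Returns (record, extras):
      record has every slot in SCHEMA, with None where the item has no value;
      extras holds attributes the item has but the scheme has no slot for.
    """
    record = {slot: raw.get(slot) for slot in SCHEMA}
    extras = {k: v for k, v in raw.items() if k not in SCHEMA}
    return record, extras

record, extras = standardize({"title": "Multiples", "phone": "555-0100"})
```

The item comes back with empty creator and date slots, and its phone number lands in extras. Both outcomes force a decision: leave the slot null, and either drop the extra attribute or add a slot for it across the whole collection.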

3 muddiest points? 3 clearest points?