Finding it On the Net: Being A Digital Detective

Presented by
David Warlick
David is an Instructional Technology Consultant and Web Applications Developer.  He conducts workshops and speaks at conferences around the world.

The Landmark Project
Raleigh, North Carolina
919-571-3292
919-571-2760
david@landmark-project.com
http://landmark-project.com
http://landmark-project.com/dfw/

Perhaps the most important thing to understand about searching for information on the Internet is the fact that it is more like being a detective than simply pressing buttons. When researchers approach the Internet, they are looking for information on a certain topic. In an important way they already have an idea of what they will find...

•          The kind of Internet resource it might be,

•          Other topics that might be discussed on that web page,

•          The type of server on which the information might be found.

In the same way, a detective at the scene of a crime gets an immediate impression of the events that transpired, and then sets out to find clues and evidence that support that initial impression.  Internet researchers use their assumptions about what is available and what they will find as clues in their search.  And just as a crime detective keeps an eye open for evidence that tells a different story, successful Internet researchers stay open to types of information that they had not anticipated, and to resources and formats that they did not expect.

Because searching the Internet means investigating a digital information world, it is important to understand that researching on the Internet is a process. It is almost never a single search with Alta Vista producing the best solution for your problem at the top of the list. It is always a series of searches, each revealing new clues, new avenues, and ultimately, the best information for your needs.

Earlier tools for finding resources on the Internet included programs like Archie, Veronica, and Jughead -- names owing to the eccentricities of the programmers who developed the tools. A popular analogy during those years was that the Internet was like a used bookstore -- after an earthquake. Today's tools make the Internet appear, at least, to be more organized. They help you find information and learn more about the Internet at the same time.  They enable more systematic approaches to Internet research and more success, which is especially impressive considering the geometric growth of the global electronic library.

There are essentially three tools or categories of tools that you can use to find digital resources on the Internet. They are: Topic Oriented Directories (or web directories), Search Engines, and what I call Net-Smarts -- or becoming Net-wise.

 

Advantages of Topic Oriented Directories:

There are two major advantages of topic-oriented directories.

1.       They rely on a logical organization of information. Unlike encyclopedias and other reference books that organize information in alphabetical order, topic directories are organized by subject and topic. Therefore the information (rather than spelling) becomes the navigator. Another benefit of this information-based organization is that cross-references tend to exist linking different paths of research together.

2.       The second advantage lies in the final page that we receive.  That final list of web pages will be a short and concentrated list of resources, concentrated in that they will all be about butterflies and moths.  If you have used a larger search engine before, such as Alta Vista, you know that a search for butterflies will produce a very large list of resources, only some of which are actually about butterflies.  For instance, a search of Alta Vista generated a list of 264,150 web pages with the word butterflies or butterfly in them.  By contrast, Yahoo produced a list of only 37 sites, each one about butterflies.

 

Web Links available at Landmarks for Schools – http://landmark-project.com/

Just click “Online Handouts”.


Search Engines

 

Boolean Searching

Concept

Explanation/Example

Keyword

A keyword is a word or term that we want the search engine to consider in looking for relevant information. In our example, one word that would likely appear in a web page about Native Americans is Indian.

 

Example:

Indian

OR

 

In many cases, there may be a synonym of our keyword that might appear in the web page instead of the keyword we have already chosen. So we will want to expand the number of pages that the search engine sends us to include the ones using the synonym. In the case of our example, many web pages would likely use the term Native American, which is more commonly used today than Indian. In this case we would use the operator, OR, to say that we want web pages with either the word Indian or the term Native American.

 

Example:

Indian OR Native American

AND

 

Since we are looking for information about Native Americans in the state of Ohio, then an additional keyword will be Ohio. We want to narrow the web pages that we get to only those about Native Americans in Ohio, so we will say that both terms must be present. Here is where we will use AND.

 

Example:

Indian OR Native American AND Ohio

NOT

As we think through the information that we are likely to receive, we realize that there is a baseball team in Cleveland, Ohio called the Indians. We will want to filter out all web pages about the baseball team. So we will add a new keyword, baseball, and connect it to our search expression with the operator NOT. We are saying that an acceptable web page should NOT have the keyword baseball in it.

 

Example:

Indian OR Native American AND Ohio NOT baseball

Quotes

Just as we use commas, question marks, and other punctuation to help communicate with people, we use special symbols to clarify what we want from a search engine. One example is the use of quotation marks to define phrases. In our example, Native American is going to look like two separate words to the search engine that could each appear any place in the web page. To communicate that these two words belong together as a distinct phrase, we use quotes.

 

Example:

Indian OR "Native American" AND Ohio NOT baseball

Parentheses

 

Each operator in a search expression defines a distinct keyword concept.

Keyword 1 AND Keyword 2

Keyword 3 OR Keyword 4

Keyword 5 NOT Keyword 6

A keyword concept can consist of:

A single keyword or phrase

Two single keywords or phrases connected by an operator

Keyword concepts connected by an operator to other keyword concepts or single keywords or phrases.

Individual keyword concepts are marked by enclosing them in parentheses. In our example, the following are distinct keyword concepts:

Indian

(Indian OR "Native American")

((Indian OR "Native American") AND Ohio)

The final keyword concept, the one that includes all constituent keyword concepts, is called our search expression.

 

Example:

((Indian OR "Native American") AND Ohio) NOT baseball
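To see why the parentheses matter, the sketch below evaluates the example expression against three invented page snippets in Python. The page texts, the function names, and the simple substring matching are assumptions made for illustration; real search engines match words against an index and may read an ungrouped expression differently than shown here.

```python
def keywords(page_text):
    """Report which of the example keywords appear in a page (substring match)."""
    text = page_text.lower()
    return {
        "indian": "indian" in text,  # note: also matches "Indians"
        "native american": "native american" in text,  # quoted phrase: words must be adjacent
        "ohio": "ohio" in text,
        "baseball": "baseball" in text,
    }


def grouped(page_text):
    """((Indian OR "Native American") AND Ohio) NOT baseball"""
    k = keywords(page_text)
    return ((k["indian"] or k["native american"]) and k["ohio"]) and not k["baseball"]


def ungrouped(page_text):
    """One other possible reading: Indian OR ("Native American" AND Ohio NOT baseball)"""
    k = keywords(page_text)
    return k["indian"] or (k["native american"] and k["ohio"] and not k["baseball"])


# Hypothetical page snippets, invented for illustration.
pages = {
    "history": "Native American mounds in southern Ohio",
    "sports": "The Cleveland Indians play baseball in Ohio",
    "travel": "Indian cuisine in Toronto",
}

grouped_hits = [name for name, text in pages.items() if grouped(text)]
ungrouped_hits = [name for name, text in pages.items() if ungrouped(text)]

print(grouped_hits)    # ['history']
print(ungrouped_hits)  # ['history', 'sports', 'travel']
```

With the parentheses, only the history page qualifies. Under the alternative reading, any page containing Indian slips through, including the baseball page.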

Search engines are the miracle of the Internet.  These sophisticated tools seem to reach right into the global network, and scour its contents at your command.  In reality, they do not work exactly in this way, although the true nature of search engines is no less fascinating.  Technically, in order to be called a Search Engine, a search tool must be made up of three major components.

It must be a web page, which topic directories have; it must have an index or database, which topic directories have; but what topic directories don’t have are spiders or crawlers.  These are tiny programs, pieces of software that are constantly scouring the Internet, looking for new web pages to include in their index.
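The relationship between a spider and an index can be sketched in a few lines of Python. This is a toy model, not how any particular search engine is built: the three-page "web", its addresses, and the crawl function are all invented for illustration, and the pages live in memory rather than on the network.

```python
from collections import defaultdict, deque

# A tiny in-memory "web": each hypothetical address maps to (page text, outgoing links).
WEB = {
    "a.example/home": ("butterflies and moths of Ohio", ["a.example/moths", "b.example/garden"]),
    "a.example/moths": ("moths are active at night", ["a.example/home"]),
    "b.example/garden": ("plant a garden for butterflies", []),
}


def crawl(start_url):
    """Follow links breadth-first from start_url and build a word -> addresses index."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        # Record every word on this page in the index.
        for word in text.lower().split():
            index[word].add(url)
        # Queue any linked pages that have not been visited yet.
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("a.example/home")
print(sorted(index["butterflies"]))  # ['a.example/home', 'b.example/garden']
```

A search then becomes a lookup in the index rather than a live sweep of the network, which is why results can come back quickly and why very new pages may be missing.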

 

The Language of Search Engines

Search engines are your helpers. They are information assistants who aid you in finding the information that you need to solve a problem, answer a question, or make a decision. Like any other assistant, the degree to which they are able to help depends on the degree to which you are able to tell them what you want. Therefore, communicating with your search engine is a critical part of the search process.

Search engines need to know what information you seek, and they need this information communicated in a logical way -- they are, after all, computers. The language that we traditionally use to talk with computer-based searching tools is called Boolean, named after George Boole, a mathematician of the 19th century.

In Boolean Logic we use keywords to describe what words to look for when searching the index. We also use operators to describe the relationships between our keywords and the information that we need. The basic operators are AND, OR, and NOT.

Let's use an example to explore how we would use Boolean Logic to search for information on the Internet. We will look for information about Native Americans in the state of Ohio.  In the Boolean Searching table we explore several concepts involved in speaking Boolean and relate these concepts to our search.

Admittedly, Boolean Logic is not the simplest thing to understand or teach. However, it is a very effective way of communicating your information needs to search engines.
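One way to make the three operators less abstract is to treat each keyword as the set of pages that contain it. The sketch below does this in Python with an invented index (the page numbers are made up); OR becomes a union, AND an intersection, and NOT a difference. It illustrates the logic only and is not a description of any search engine's internals.

```python
# Hypothetical index: each keyword maps to the set of page numbers that contain it.
index = {
    "indian": {1, 2, 3, 5},
    "native american": {2, 4},
    "ohio": {2, 3, 4, 6},
    "baseball": {3, 6},
}

either = index["indian"] | index["native american"]  # OR  -> union
in_ohio = either & index["ohio"]                      # AND -> intersection
final = in_ohio - index["baseball"]                   # NOT -> difference

print(sorted(either))   # [1, 2, 3, 4, 5]
print(sorted(in_ohio))  # [2, 3, 4]
print(sorted(final))    # [2, 4]
```

Notice that OR makes the list of pages longer, while AND and NOT each make it shorter.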

To make things easier for casual users, Internet search engines have developed alternatives to traditional Boolean Logic. One of the most common conventions is the use of pluses (+) and minuses (-), to indicate which terms must (+) and must not (-) be present in the returned documents. Each search engine has developed its own version of these searching conventions, each trying to improve upon these standards, and this evolution of the search language continues. None is perfect and you will find that finding information from the Internet is more a process than the click of a button.

 

An Alternative Search Convention

Pluses (+)

Any keywords in your search expression that MUST appear in your target web page should be preceded by a plus symbol (+).

If the keyword is a phrase, then it should be enclosed by quotes.

Example: +basketball +"Mike Jordan"

Minuses (-)

Any keyword that must NOT appear in your target web page should be preceded by a minus symbol (-).

As when using the plus symbol, if the keyword is a phrase, then it should be enclosed by quotes.

Example: +basketball +"Mike Jordan" -Nike

Pipe (|)

This character is usually found above the backslash (\) on the keyboard.

The pipe character helps you to fine tune your search.  Placing a pipe character between two search terms tells the search engine to search for the first term and then search for the second term within the first term's hits.

Example: Internet|Web
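As a rough illustration of the plus and minus convention, the Python sketch below splits a query into required and excluded terms and tests a page against them. The function names are hypothetical, matching is by simple substring, terms without a + or - are ignored, and the pipe character is not handled; actual search engines differ in these details.

```python
import shlex


def parse_query(query):
    """Split a +/- query into (required, excluded) lists of lowercase terms."""
    required, excluded = [], []
    # shlex.split keeps a quoted phrase together and drops the quote marks.
    for token in shlex.split(query):
        if token.startswith("+"):
            required.append(token[1:].lower())
        elif token.startswith("-"):
            excluded.append(token[1:].lower())
        # Terms with no prefix are ignored in this sketch.
    return required, excluded


def page_matches(page_text, query):
    """True if the page has every required term and none of the excluded ones."""
    required, excluded = parse_query(query)
    text = page_text.lower()
    return all(term in text for term in required) and not any(term in text for term in excluded)


query = '+basketball +"Mike Jordan" -Nike'
print(parse_query(query))  # (['basketball', 'mike jordan'], ['nike'])

print(page_matches("Mike Jordan plays basketball", query))           # True
print(page_matches("Mike Jordan basketball shoes by Nike", query))   # False
```

Each plus works like AND, and each minus works like NOT, so the query above says much the same thing as basketball AND "Mike Jordan" NOT Nike.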

Advantages of Using Search Engines

There are three very compelling advantages of most search engines.

1.       The indexes of search engines are usually vast, representing significant portions of the Internet and offering a wide variety and quantity of information resources.

2.       The growing sophistication of search engine software enables us to precisely describe the information that we seek.

3.        The large number and variety of search engines enrich the Internet, making it at least appear to be organized.

 

 

Being Net-Smart

Net-smarts is perhaps your most valuable tool in finding information on the Internet. It is a growing awareness of what is available on the Internet and how it works, and a growing sense of "where is the best first place to start?" As mentioned earlier, searching the Internet involves investigating an information environment, turning over stones, checking for fingerprints, examining strands of hair. It means having an idea of what you are looking for, and at the same time being open for the unexpected.

More than anything, being net-smart involves asking questions. Here are some questions that must be asked and considered when embarking on an information safari on the Internet.

1.       What do you want to find?

How Big are the Search Engines

Search Engine      Millions of Web Pages Indexed      Percent of the Internet (assuming 1 billion pages)
AltaVista          350                                35%
Fast               340                                34%
Northern Light     260                                26%
Google             230/512                            23%
Excite             214                                21%
Inktomi            110                                11%
Go (Infoseek)      50                                 5%
Lycos              50                                 5%

From Search Engine Watch – http://www.searchenginewatch.com

2.       Will the information most likely be found in articles, company web pages, software, conferences, discussion groups, or people? The answer to this question helps you decide on a search strategy.

3.       Why would someone publish this information on the Internet?

4.       Who would publish this information on the Internet?

5.       Who would host this information on the Internet?

6.       What would a web page with the information I seek look like?

Questions two through five would each help us in developing our search phrase.

7.       Do you want to broaden your knowledge of a general topic, or do you want narrower, more specific information?

Broad or general information is usually best found in topic-oriented directories. More specific information is usually best found with search engines.

The S.E.A.R.C.H. Process

Conducting effective searches of the Internet is rarely a matter of typing in a single keyword and being presented with the solution to your problem. It is much more frequently a series of searches, each revealing more clues about the information that is available, and where that information can be found.

Developing a search process can be difficult, because each person's process depends on their personal style of using information and the particular types of information that they typically need.  However, there is a process that can be used as a springboard to the personal procedures that you develop with experience.  The process is called S.E.A.R.C.H.  It is an acronym for the process that has you Start with a small database search tool, Edit your search expression, Advance to a larger database search tool, Refine your search phrase, Cycle back and advance again, and finally, Harvest your information gems.

The S.E.A.R.C.H. process is laid out in more detail below.

 


Search Strategy

Search with a key term on Yahoo or another small index search tool.

Notes:

 

You start with a small index search tool for two reasons:

1.        You will receive a limited and manageable number of hits.

2.        The hits that you get will be representative of what is available on the subject

Examine the hit pages collecting words that are common among the relevant hits and words that are common among the irrelevant hits.

Edit the search expression with terms gleaned from the initial search.

 

 

Add words collected from the initial search, including words common among relevant and irrelevant pages.  Construct a Boolean search expression that effectively communicates the information that you seek.

Advance into more advanced and extensive search engines.

 

 

Enter the edited search phrase into a larger index search tool.  Examples are:

•          Excite                            http://www.excite.com

•          InfoSeek                          http://www.infoseek.com

•          Alta Vista                        http://www.altavista.digital.com

•          HotBot                            http://www.hotbot.com

 

Refine the search expression

 

 

Explore the pages reported by the larger search engine and refine the expression even more, further defining the relevant hits, and filtering the irrelevant.  Again, examine both good hits and bad hits.

 

Cycle back and Advance again.

 

 

Return to the advanced search engine that you used before or use another search engine.

 

Harvest the results

 

 

Collect the needed information by printing, downloading, forwarding by e-mail or just reading.
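The Start and Edit steps can be sketched in Python as follows. The hit pages, the short stop-word list, and the common_words helper are all assumptions for illustration; in practice you would do this collecting by eye while reading the hit pages, and adjust the resulting expression to the search engine you use.

```python
from collections import Counter

# A few very common words to skip; a hypothetical, deliberately short list.
STOP_WORDS = frozenset({"the", "of", "in", "a", "and"})


def common_words(pages, min_pages=2):
    """Return the words that appear in at least min_pages of the given pages."""
    counts = Counter()
    for text in pages:
        # Count each word once per page, ignoring stop words.
        counts.update(set(text.lower().split()) - STOP_WORDS)
    return {word for word, n in counts.items() if n >= min_pages}


# Invented hit pages from an initial search, sorted by hand into good and bad hits.
relevant = [
    "shawnee villages in ohio history",
    "history of the shawnee and miami tribes in ohio",
]
irrelevant = [
    "cleveland indians baseball schedule",
    "ohio baseball scores cleveland",
]

keep = common_words(relevant) - common_words(irrelevant)  # words that mark good hits
drop = common_words(irrelevant) - common_words(relevant)  # words that mark bad hits

# Edit the search expression: require the good words, exclude the bad ones.
expression = "(" + " AND ".join(sorted(keep)) + ")"
for word in sorted(drop):
    expression += " NOT " + word

print(expression)  # (history AND ohio AND shawnee) NOT baseball NOT cleveland
```

The edited expression is what you would carry forward into a larger search engine in the Advance step, then refine again after examining the new hits.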