
Perhaps the most
important thing to understand about searching for information on the Internet is the fact
that it is more like being a detective than simply pressing buttons. When researchers
approach the Internet, they are looking for information on a certain topic. In an
important way they already have an idea of what they will find...
- The kind of Internet resource it might be,
- Other topics that might be discussed on that web page,
- The type of server on which the information might be found.
In the same way that the detective at the scene of a crime gets an immediate impression of
the events that transpired, and then sets out to find clues and evidence that support that
initial impression, Internet researchers use their assumptions about what is available and
what they will find as clues in their search. And just as a crime detective keeps an eye
open for evidence that paints a different story, successful Internet researchers stay open
to types of information they had not anticipated, and to resources and formats they did not
expect.
Because searching
the Internet means investigating a digital information world, it is important to
understand that researching on the Internet is a process. It is almost never a single
search with Alta Vista producing the best solution for your problem at the top of the
list. It is always a series of searches, each revealing new clues, new avenues, and
ultimately, the best information for your needs.
Earlier tools for finding resources on the Internet included programs like Archie, Veronica,
and Jughead -- names owing to the eccentricities of the programmers who developed the tools.
A popular analogy during those years was that the Internet was like a used bookstore -- after
an earthquake. Today's tools make the Internet appear, at least, to be more organized. They
help you find information and learn more about the Internet at the same time. They enable
more systematic approaches to Internet research and more success, which is especially
impressive considering the geometric growth of the global electronic library.
There are essentially three tools, or categories of tools, that you can use to find digital
resources on the Internet. They are Topic Oriented Directories (or web directories), Search
Engines, and what I call Net-Smarts -- or becoming Net-wise.
Advantages of Topic Oriented Directories:
There are two major advantages of topic-oriented directories.
1. They rely on a logical
organization of information. Unlike encyclopedias and other reference books that organize
information in alphabetical order, topic directories are organized by subject and topic.
Therefore the information (rather than spelling) becomes the navigator. Another benefit of
this information-based organization is that cross-references tend to exist linking
different paths of research together.
2. The second advantage lies in the final page that we receive. That final list of web pages
is short and concentrated, in that every resource on it is about butterflies and moths.
If you have used a larger search engine before, such as AltaVista, you know that a search
for butterflies will produce a very large list of resources, only some of which are actually
about butterflies. For instance, a search of AltaVista generated a list of 264,150 web pages
with the word butterflies or butterfly in them. By contrast, Yahoo produced a list of only
37 sites, each one about butterflies.
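The subject-based organization described above can be pictured as a tree of categories. The following is only a minimal sketch: the category names, site titles, and the `browse` helper are invented for illustration and do not reflect how Yahoo or any real directory stores its data.

```python
# Hypothetical topic directory: each key is a category, and each value is either
# a nested dict of subcategories or a list of site titles (the final page).
DIRECTORY = {
    "Science": {
        "Biology": {
            "Zoology": {
                "Insects": {
                    "Butterflies and Moths": [
                        "Butterfly Identification Guide",
                        "Moths of the Night Garden",
                    ],
                    "Beetles": ["Beetle Field Notes"],
                },
            },
        },
    },
    "Recreation": {
        "Gardening": {
            "Butterfly Gardens": ["Planting for Butterflies"],
        },
    },
}


def browse(directory, path):
    """Follow a list of category names down the tree and return what is there."""
    node = directory
    for category in path:
        node = node[category]  # raises KeyError if the category does not exist
    return node


sites = browse(
    DIRECTORY,
    ["Science", "Biology", "Zoology", "Insects", "Butterflies and Moths"],
)
print(sites)
```

Each step narrows the subject, so the final list is short and every entry is on topic. The second branch under Recreation hints at how different research paths can lead to related material.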
Boolean Searching

| Concept | Explanation | Example |
|---|---|---|
| Keyword | A keyword is a word or term that we want the search engine to consider in looking for relevant information. In our example, one word that would likely appear in a web page about Native Americans is Indian. | Indian |
| OR | In many cases, a synonym of our keyword might appear in the web page instead of the keyword we have already chosen, so we will want to expand the number of pages that the search engine sends us to include the ones using the synonym. In our example, many web pages would likely use the term Native American, which is more commonly used today than Indian. In this case we use the operator OR to say that we want web pages with either the word Indian or the term Native American. | Indian OR Native American |
| AND | Since we are looking for information about Native Americans in the state of Ohio, an additional keyword will be Ohio. We want to narrow the web pages that we get to only those about Native Americans in Ohio, so we will say that both terms must be present. Here is where we use AND. | Indian OR Native American AND Ohio |
| NOT | As we think through the information that we are likely to receive, we realize that there is a baseball team in Cleveland, Ohio called the Indians. We will want to filter out all web pages about the baseball team, so we add a new keyword, baseball, and connect it to our search expression with the operator NOT. We are saying that an acceptable web page should NOT have the keyword baseball in it. | Indian OR Native American AND Ohio NOT baseball |
| Quotes | Just as we use commas, question marks, and other punctuation to help communicate with people, we use special symbols to clarify what we want from a search engine. One example is the use of quotation marks to define phrases. In our example, Native American is going to look to the search engine like two separate words that could each appear any place in the web page. To communicate that these two words belong together as a distinct phrase, we use quotes. | Indian OR "Native American" AND Ohio NOT baseball |
| Parentheses | Each operator in a search expression defines a distinct keyword concept: Keyword 1 AND Keyword 2; Keyword 3 OR Keyword 4; Keyword 5 NOT Keyword 6. A keyword concept can consist of (1) a single keyword or phrase, (2) two single keywords or phrases connected by an operator, or (3) keyword concepts connected by an operator to other keyword concepts or single keywords or phrases. Individual keyword concepts are marked by enclosing them in parentheses. In our example, the following are distinct keyword concepts: Indian; (Indian OR "Native American"); ((Indian OR "Native American") AND Ohio). The final keyword concept, the one that includes all constituent keyword concepts, is called our search expression. | ((Indian OR "Native American") AND Ohio) NOT baseball |

Search engines are the miracle of the Internet. These sophisticated tools seem to reach
right into the global network and scour its contents at your command. In reality, they do
not work exactly in this way, although the true nature of search engines is no less
fascinating. Technically, in order to be called a search engine, a search tool must be made
up of three major components. It must have a web page, which topic directories have; it must
have an index or database, which topic directories also have; and it must have spiders or
crawlers, which topic directories don't have. These are small programs that constantly scour
the Internet, looking for new web pages to include in the index.
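To make the three components concrete, here is a toy sketch of a spider, an index, and a search function. It is an illustration only: the `WEB` dictionary stands in for the Internet, the URLs and page texts are made up, and real search engines are far more elaborate.

```python
from collections import defaultdict, deque

# A made-up, in-memory "web": URL -> (page text, links to other pages).
WEB = {
    "a.example/home": ("butterflies and moths of Ohio", ["a.example/moths", "b.example/"]),
    "a.example/moths": ("moths are active at night", ["a.example/home"]),
    "b.example/": ("baseball scores from Ohio", []),
}


def crawl(start_url):
    """Spider: follow links breadth-first and yield each reachable page once."""
    seen, queue = {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        yield url, text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)


def build_index(pages):
    """Index: map each lowercase word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages:
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, keyword):
    """Search page: look the keyword up in the index, not on the live web."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(crawl("a.example/home"))
print(search(index, "Ohio"))
print(search(index, "moths"))
```

Note that `search` never touches the pages themselves. It only consults the index that the spider filled in earlier, which is why a search engine can only find pages its spiders have already visited.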
The Language of Search Engines
Search engines are
your helpers. They are information assistants who aid you in finding the information that
you need to solve a problem, answer a question, or make a decision. Like any other
assistant, the degree to which they are able to help depends on the degree to which you
are able to tell them what you want. Therefore, communicating with your search engine is a
critical part of the search process.
Search engines need
to know what information you seek, and they need this information communicated in a
logical way -- they are, after all, computers. The language that we traditionally use to
talk with computer-based searching tools is called Boolean,
named after George Boole, a mathematician of the 19th century.
In Boolean Logic we use keywords to describe what words to look for when
searching the index. We also use operators to
describe the relationships between our keywords
and the information that we need. The basic operators
are AND, OR, and NOT.
Let's use an example to explore how we would use Boolean Logic to search for information on the Internet. We will look for information about Native Americans in the state of Ohio. The Boolean Searching table explores several concepts involved in speaking Boolean and relates these concepts to our search.
Admittedly, Boolean Logic is not the simplest thing to understand or teach. However, it is
a very effective way of communicating your information needs to search engines.
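One way to see what the operators do is to treat each keyword as the set of pages that contain it. The sketch below assumes a handful of invented page texts and uses simple substring matching, which is cruder than what a real search engine does. OR becomes a set union, AND an intersection, and NOT a difference.

```python
# Made-up page texts, used only to illustrate the operators.
PAGES = {
    "history": "Native American mound builders of Ohio",
    "museum": "Indian artifacts on display in Ohio",
    "team": "Cleveland Indians baseball schedule Ohio Indian mascot",
    "plains": "Native American nations of the Great Plains",
}


def pages_with(term):
    """Return the set of page names whose text contains the term or phrase."""
    needle = term.lower()
    return {name for name, text in PAGES.items() if needle in text.lower()}


# ((Indian OR "Native American") AND Ohio) NOT baseball
either_name = pages_with("Indian") | pages_with("Native American")  # OR  -> union
in_ohio = either_name & pages_with("Ohio")                          # AND -> intersection
result = in_ohio - pages_with("baseball")                           # NOT -> difference

print(sorted(result))
```

Each intermediate variable corresponds to one parenthesized keyword concept, so the nesting of the parentheses is simply the order in which the sets are combined.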
To make
things easier for casual users, Internet search engines have developed alternatives to
traditional Boolean Logic. One of the most common conventions is the use of pluses (+) and
minuses (-), to indicate which terms must (+) and must not (-) be present in the returned
documents. Each search engine has developed its own version of these searching
conventions, each trying to improve upon these standards, and this evolution of the search
language continues. None is perfect, and you will discover that finding information on the
Internet is more a process than the click of a button.
An Alternative Search Convention

Pluses (+)
Any keywords in your search expression that MUST appear in your target web page should be
preceded by a plus symbol (+). If the keyword is a phrase, then it should be enclosed by
quotes.
Example: +basketball +"Mike Jordan"

Minuses (-)
Any keyword that must NOT appear in your target web page should be preceded by a minus
symbol (-). As when using the plus symbol, if the keyword is a phrase, then it should be
enclosed by quotes.
Example: +basketball +"Mike Jordan" -Nike

Pipe (|)
This character is usually found above the backslash (\) on the keyboard. The pipe character
helps you to fine-tune your search. Placing a pipe character between two search terms tells
the search engine to search for the first term and then search for the second term within
the first term's hits.
Example: Internet|Web

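The plus and minus convention can be sketched in a few lines of code. This is a simplified illustration, not the parser of any particular search engine: the helper names `parse_query` and `matches` are invented, terms without a prefix are ignored, and matching is by plain substring.

```python
import shlex


def parse_query(query):
    """Split a +/- query into (required, excluded) lists of lowercase terms."""
    required, excluded = [], []
    # shlex.split keeps a quoted phrase together and drops the quote marks.
    for token in shlex.split(query):
        if token.startswith("+"):
            required.append(token[1:].lower())
        elif token.startswith("-"):
            excluded.append(token[1:].lower())
        # Terms without a prefix are ignored in this sketch.
    return required, excluded


def matches(text, required, excluded):
    """True if every required term appears in the text and no excluded term does."""
    text = text.lower()
    return all(t in text for t in required) and not any(t in text for t in excluded)


required, excluded = parse_query('+basketball +"Mike Jordan" -Nike')
print(required, excluded)
print(matches("Mike Jordan basketball highlights", required, excluded))
print(matches("Mike Jordan basketball shoes by Nike", required, excluded))
```

In Boolean terms, every plus term is joined with AND and every minus term with NOT, which is why this convention is easier to type but cannot express OR.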
Advantages of Using Search Engines
There are three very
compelling advantages of most search engines.
1. The indexes of
search engines are usually vast, representing significant portions of the Internet and
offering a wide variety and quantity of information resources.
2. The growing
sophistication of search engine software enables us to precisely describe the information
that we seek.
3. The large
number and variety of search engines enriches the Internet, making it at least appear to
be organized.
Being Net-Smart
Net-smarts is
perhaps your most valuable tool in finding information on the Internet. It is a growing
awareness of what is available on the Internet and how it works, and a growing sense of
"where is the best first place to start?" As mentioned earlier, searching the
Internet involves investigating an information environment, turning over stones, checking
for fingerprints, examining strands of hair. It means having an idea of what you are
looking for, and at the same time being open to the unexpected.
More than anything, being net-smart involves asking questions. Here are some questions that must be asked and considered when embarking on an information safari on the Internet.
1. What do you want to
find?

How Big Are the Search Engines

| Search Engine | Millions of Web Pages Indexed | Percent of the Internet (assuming 1 billion pages) |
|---|---|---|
| AltaVista | 350 | 35% |
| Fast | 340 | 34% |
| Northern Light | 260 | 26% |
| Google | 230/512 | 23% |
| Excite | 214 | 21% |
| Inktomi | 110 | 11% |
| Go (Infoseek) | 50 | 5% |
| Lycos | 50 | 5% |

From Search Engine Watch http://www.searchenginewatch.com

2. Will the information most likely be found in articles, company web pages, software,
conferences, discussion groups, or people? The answer to this question helps you decide on a
search strategy.
3. Why would someone
publish this information on the Internet?
4. Who would publish this
information on the Internet?
5. Who would host this
information on the Internet?
6. What would a web page with the information I seek look like?
Questions two through
five would each help us in developing our search phrase.
7. Do you want to broaden your knowledge of a general topic, or do you want narrower, more
specific information?
Broad or general information is usually best found in topic-oriented directories. Narrower,
more specific information is usually best found with search engines.
Conducting effective
searches of the Internet is rarely a matter of typing in a single keyword and being
presented with the solution to your problem. It is much more frequently a series of
searches, each revealing more clues about the information that is available, and where
that information can be found.
Developing a search
process can be difficult, because each person's process depends on their personal style of
using information and the particular types of information that they typically need. However, there is a process that can be used as a
springboard to the personal procedures that you develop with experience. The process is called S.E.A.R.C.H. It is an acronym for the process that has you Start with a small database search tool, Edit your search expression, Advance to a larger database search tool, Refine your search phrase, Cycle back and advance again, and finally, Harvest your information gems.
Below is a larger representation of the S.E.A.R.C.H. process.

Search Strategy

| Step | Notes |
|---|---|
| Search with a key term on Yahoo or another small index search tool. | You start with a small index search tool for two reasons: (1) you will receive a limited and manageable number of hits, and (2) the hits that you get will be representative of what is available on the subject. Examine the hit pages, collecting words that are common among the relevant hits and words that are common among the irrelevant hits. |
| Edit the search expression with terms gleaned from the initial search. | Add words collected from the initial search, including words common among relevant and irrelevant pages. Construct a Boolean search expression that effectively communicates the information that you seek. |
| Advance into more advanced and extensive search engines. | Enter the edited search phrase into a larger index search tool. Examples are Excite (http://www.excite.com), InfoSeek (http://www.infoseek.com), Alta Vista (http://www.altavista.digital.com), and HotBot (http://www.hotbot.com). |
| Refine the search expression. | Explore the pages reported by the larger search engine and refine the expression even more, further defining the relevant hits and filtering out the irrelevant. Again, examine both good hits and bad hits. |
| Cycle back and Advance again. | Return to the advanced search engine that you used before or use another search engine. |
| Harvest the results. | Collect the needed information by printing, downloading, forwarding by e-mail, or just reading. |

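The Edit step, where words gathered from good and bad hits are folded into a Boolean expression, can be sketched roughly as follows. Everything here is an assumption made for illustration: the sample hit texts, the tiny stopword list, the `minimum` threshold, and the choice to join helpful words with OR and unhelpful words with NOT. Your own process may combine them differently.

```python
from collections import Counter

# A tiny, hypothetical list of words too common to be useful as keywords.
STOPWORDS = {"the", "of", "in", "and", "a", "for"}


def words(text):
    """Distinct lowercase words in a page, minus the stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def common_words(pages, minimum):
    """Words that appear in at least `minimum` of the given pages."""
    counts = Counter(w for page in pages for w in words(page))
    return {w for w, n in counts.items() if n >= minimum}


def edit_expression(key_term, relevant, irrelevant, minimum=2):
    """Build a Boolean expression from words common to good and bad hits."""
    good = common_words(relevant, minimum) - {key_term}
    bad = common_words(irrelevant, minimum) - good - {key_term}
    expression = key_term
    if good:
        expression = f"({expression} AND ({' OR '.join(sorted(good))}))"
    for word in sorted(bad):
        expression = f"({expression} NOT {word})"
    return expression


# Made-up hits from a first search on the key term "indian".
relevant = ["indian villages in ohio history", "ohio indian mounds history"]
irrelevant = ["cleveland indian baseball tickets", "indian baseball scores"]

print(edit_expression("indian", relevant, irrelevant))
```

Running the same function again on the hits from a larger search engine corresponds to the Refine and Cycle steps: each pass adds or removes terms based on what the previous results revealed.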