SEARCH ADVISOR:
Training Internet users towards search sessions.

Avgoustos A. Tsinakos and Konstandinos G. Margaritis
Department of Informatics
University of Macedonia


ABSTRACT

The Internet and the World Wide Web have grown explosively in recent years. As the volume of online information has increased, it has become increasingly difficult for most Internet users to locate and exploit the information available. The development of Intelligent Agents seems to be the most promising answer to this problem. Autonomous agents offer a radically new approach that allows easy and efficient information retrieval. This paper outlines the use of Intelligent Agents developed on the World Wide Web for user training and information retrieval tasks. It also describes a system called "SEARCH ADVISOR", which helps Internet users with Internet-based information retrieval by drawing on the knowledge of expert searchers. As it is a propose-and-revise system, it can be used as an Intelligent Agent in the construction of search strategies for information retrieval.

Keywords: Information retrieval, Internet Learning and Teaching, Artificial Intelligence in Education, Intelligent Agents, Internet, Search.


1. Introduction

The rapid growth in the volume and diversity of data on the Internet and the World Wide Web has created significant problems related to the efficiency and accuracy of information retrieval. Additionally, information in existing Internet repositories is heterogeneous, inconsistent and sometimes incomplete [1], which makes the problem harder still. To make effective use of this wealth of information, users need means to locate it. In the past few years, a number of such resource discovery tools have been created, such as:

  1. Internet Browsing and Exploring systems, such as Gopher, Hytelnet, Global Network Navigator,
  2. Subject-Oriented Search systems, such as WWW Virtual Library, Yahoo, USENET Frequently Asked Questions Archive,
  3. Word-Oriented Search systems, such as Lycos, Web Crawler, Knowbot, Archie, WAIS.
These tools have gained wide popular acceptance on the Internet.
Further, models originally developed for Artificial Intelligence research have been applied to Information Retrieval, leading to the development and evaluation of intelligent retrieval models for text documents, such as those found in bibliographic databases. These retrieval models specify strategies for evaluating documents with respect to a given query, typically resulting in a ranked output. Hypertext researchers, on the other hand, have emphasized flexible organizations of multimedia "nodes" through connections made with user-specified links, and interfaces that facilitate browsing in this network of links. A number of approaches to the integration of query-based retrieval strategies and browsing in hypertext networks have been proposed. The I3R system [3] and the medical handbook system described by Frisse, for example, use query-based retrieval strategies to form a ranked list of candidate "starting points" for hypertext browsing. A number of probabilistic retrieval models for hypertext have also been proposed [5] [6]. These models view hypertext links as specifying important dependencies between hypertext nodes. The aim of the retrieval strategies based on these models is to improve the effectiveness of retrieval and to provide better starting points for browsing [4].

2. Intelligent Agents

The development of agent software has brought a new approach to information retrieval. Broadly defined, an agent is a program that performs unique tasks without direct human supervision. Agents are programs that have some special skill and are able to engage and help users in complex actions. As such, agents transform the user from a worker into a manager who delegates tasks to the agent [2] (Fig 1). An agent is the carrier of will, the entity that chooses between possible actions. Agents themselves cannot be seen; we can only see what they do.

Figure 1,
User Agent interaction.

To be called "intelligent", an agent must satisfy several interrelated criteria. Daniel Weld offers five attributes which capture the essence of an intelligent agent [7]:
  1. Integrated. The agent must support an understandable, consistent interface.
  2. Expressive. The agent must accept requests in different modalities.
  3. Goal-oriented. The agent must determine how and when to achieve a goal.
  4. Cooperative. The agent must collaborate with the user.
  5. Customised. The agent must adapt to different users.
In summary, an intelligent agent must be capable of autonomous, goal-oriented behavior in some environment, acting as a personal assistant to the user so that tedious and time-consuming human tasks (such as information retrieval) can be performed better, more easily and more efficiently. In this realm, an agent will help to reduce complexity and increase efficiency. Intelligent Agents seem to be the future of information retrieval on the Net, allowing people to spend less time searching for information and more time utilizing or analyzing "good" information that is "auto-retrieved". They will allow information retrieval novices to achieve expert "power searcher" results.

SEARCH ADVISOR, the system we are going to describe, helps Internet users with Internet-based information retrieval by drawing on the knowledge of expert searchers. Because it meets the criteria mentioned above, SEARCH ADVISOR can be considered an intelligent agent.

3. Introduction of SEARCH ADVISOR system.

Even though a variety of search engines is available on the Net, there is no mechanism able to construct a global search strategy. SEARCH ADVISOR, in the sense of an Intelligent Agent, is a propose-and-revise system which automates the construction of a search strategy (in a specific domain) for Internet-based information retrieval, in order to help Internet users access and retrieve information using a variety of Internet meta-search engines and information resources. SEARCH ADVISOR can help and train "novice searchers" in a search task by providing additional information about the decision tree that the system constructs during the search session. By combining the meta-knowledge acquired from expert searchers with the user-defined search term, it can be used as a trainer for novice searchers. By providing a justification for each step of the proposed search strategy, SEARCH ADVISOR enables novices to identify the criteria that an expert searcher uses during a search task and allows them to follow the expert's reasoning. The main goal of SEARCH ADVISOR is not only to carry out the retrieval session accurately, but at the same time to enrich and improve the user's search skills during the session.

The SEARCH ADVISOR system can be analysed in the following levels (Fig 2).

1) User interaction - Data Input-Output level: The system is accessed by the user via its front-end interface. As a first step, the user enters a search term in a dialogue box; the user later receives the results of both the proposed search strategy and the Internet search.

2) SEARCH ADVISOR's Advisor level: At this level the system combines the user inputs with the expert suggestions (using the pre-stored Librarian, Internet and Domain expert knowledge) in order to report the recommended search strategy back to the user. A detailed description of this level is given in Section 5.

Figure 2,
Overview of Search Advisor system.

3) Query Transformation level: This level is responsible for the transformation of the proposed search plan to individual queries towards Internet search engines.
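
As a rough illustration of this level, the sketch below turns a search plan into one query URL per plan step. It is only a sketch under stated assumptions: the engine names, endpoints and parameter names in `ENGINES`, the shape of the `plan` structure and the function `plan_to_queries` are hypothetical and are not taken from the SEARCH ADVISOR implementation, which uses CGI scripts.

```python
from urllib.parse import urlencode

# Hypothetical engine endpoints and query parameter names (assumptions);
# real search engines each define their own URL format.
ENGINES = {
    "engine_a": ("http://engine-a.example/search", "q"),
    "engine_b": ("http://engine-b.example/find", "query"),
}


def plan_to_queries(plan):
    """Turn a search plan into one query URL per plan step.

    Each step names an engine and the terms (the user's term plus any
    expert-suggested synonyms) that are OR-ed together in the query.
    """
    urls = []
    for step in plan:
        base, param = ENGINES[step["engine"]]
        # Quote multi-word terms so that they are treated as phrases.
        terms = [f'"{t}"' if " " in t else t for t in step["terms"]]
        query = " OR ".join(terms)
        urls.append(f"{base}?{urlencode({param: query})}")
    return urls


# Illustrative plan (assumption): two steps aimed at two different engines.
plan = [
    {"engine": "engine_a", "terms": ["intelligent agents", "softbots"]},
    {"engine": "engine_b", "terms": ["information retrieval"]},
]
for url in plan_to_queries(plan):
    print(url)
```

Keeping the engine-specific details in a single table means that a new search engine could be supported by adding one entry, without touching the plan itself.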

4) Information Retrieval level: Here the system reaches Internet repositories and reports the results back to the user (via the second level interface).

HTML3 and CGI (Common Gateway Interface) scripts are used to implement the interactive components of the SEARCH ADVISOR system (levels 1, 3 and 4), and Common LISP is used to implement the Advisor Component (level 2).

4. SEARCH ADVISOR flowchart.

SEARCH ADVISOR's flowchart is as follows (Fig 3):
STEP 1: The system is initialised using the user input and the meta-knowledge stored in the three different KBS. The user accesses SEARCH ADVISOR via the WWW and defines the search term.
STEP 2: The user inputs are transferred to the Advisor Component so that the search strategy can be determined.

Figure 3,
Flowchart of a search session using Search Advisor.

STEP 3: The results of SEARCH ADVISOR's Advisor Component are reported back to the user. At this point, the user can ask for the justification of the criteria that the system used to construct the search strategy.
STEP 4: SEARCH ADVISOR reaches Internet repositories and reports the search results back to the user.
STEP 5: If the user accepts the retrieved results, the search task ends; otherwise the entire search session is refined.
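
Steps 1 to 5 can be read as a propose-and-revise loop. The following minimal sketch shows that control flow only. The four callables, the toy corpus, the synonym table and the `max_rounds` bound are all assumptions made for illustration; they do not describe the actual Advisor Component or its knowledge bases.

```python
def run_session(term, propose, retrieve, accept, revise, max_rounds=5):
    """Drive one search session through steps 1 to 5.

    All four callables are hypothetical stand-ins:
      propose(term)              -> strategy  (steps 1-2, Advisor Component)
      retrieve(strategy)         -> results   (step 4, Internet repositories)
      accept(strategy, results)  -> bool      (steps 3 and 5, user decision)
      revise(strategy, results)  -> strategy  (refinement after a rejection)
    """
    strategy = propose(term)
    results = []
    for round_no in range(1, max_rounds + 1):
        results = retrieve(strategy)
        if accept(strategy, results):
            return strategy, results, round_no
        strategy = revise(strategy, results)
    # Give up after max_rounds refinements (an assumption of this sketch).
    return strategy, results, max_rounds


# Toy data standing in for Internet repositories and expert knowledge.
CORPUS = {
    "doc1": "softbots on the internet",
    "doc2": "intelligent agents for retrieval",
    "doc3": "gardening tips",
}
SYNONYMS = {"intelligent agents": ["softbots"]}


def propose(term):
    return [term]


def retrieve(strategy):
    return sorted(d for d, text in CORPUS.items()
                  if any(t in text for t in strategy))


def accept(strategy, results):
    # Stand-in for the user's judgement: wants at least two documents.
    return len(results) >= 2


def revise(strategy, results):
    extra = [s for t in strategy for s in SYNONYMS.get(t, [])
             if s not in strategy]
    return strategy + extra


strategy, results, rounds = run_session(
    "intelligent agents", propose, retrieve, accept, revise)
print(strategy, results, rounds)
```

In this toy run the first proposal finds a single document, the user rejects it, and the revised strategy, extended with a synonym, is accepted in the second round.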

5. Further analysis of the Advisor Component

The Advisor Component of the SEARCH ADVISOR system comprises:
a. An automated Knowledge Acquisition component: This will be responsible for eliciting knowledge from a domain expert and for transforming the acquired knowledge into a Knowledge Base System (KBS) as a side-effect of a man-machine dialogue. The knowledge elicitation stage requires three different kinds of domain experts, so that three different kinds of KBS can be constructed.
b. Three different Knowledge Base Systems (LKBS, IKBS, DKBS): The first knowledge base system, named LKBS (Librarian Knowledge Base System), is going to be constructed from the knowledge acquired from a Librarian expert (a person specialised in subject- or word-related search). This KBS will include the top-level rules that a Librarian expert searcher usually follows in order to locate the information of interest. The meta-knowledge used by the expert to refine a search task is also included in the LKBS.
The second knowledge base system, named IKBS (Internet Knowledge Base System), will be based on the knowledge acquired from an Internet expert (a person specialised in locating information on the Internet) and will likewise include the top-level rules, tricks and tips that the expert usually follows in order to retrieve specific information from Internet repositories. Both LKBS and IKBS will include rules and knowledge that are domain independent and can therefore be reused regardless of the user-defined search term.
Finally, the third knowledge base, named DKBS (Domain Knowledge Base System), will include information provided by the domain expert, in the form of concepts related to, or synonyms of, the user-defined search term, in order to enhance the potential of the user's search.


Figure 4,
System Advisor components.

All the information stored in DKBS will be organised in distinct, domain-dependent sub-knowledge bases. Every time a new knowledge elicitation takes place, the acquired domain-specific knowledge will be added to the DKBS. Consequently, the information stored in DKBS is not static; it grows gradually with each new knowledge elicitation (Fig 3).
c. A forward chaining Inference Engine: This Inference Engine is going to be initialised using the user's inputs. Based on these, and in combination with the information stored in the three pre-constructed knowledge bases LKBS, IKBS and DKBS, it will apply forward chaining to find all the rules and related concepts that contribute to reaching a search plan. The Inference Engine is also responsible for refining the search session if the search results are rejected by the user.
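
The forward chaining described above can be sketched in a few lines. This is a generic forward-chaining loop written in Python for illustration, not the Common LISP Inference Engine itself. The `Rule` structure, the fact strings and every rule in `RULES` are invented examples; the real contents of LKBS, IKBS and DKBS are not given in this paper.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    source: str           # knowledge base the rule is assumed to come from
    premises: frozenset   # facts that must all hold for the rule to fire
    conclusion: str       # fact added when the rule fires


# Illustrative rules only (assumptions), tagged with a knowledge base name.
RULES = [
    Rule("DKBS", frozenset({"term:agents"}), "related:softbots"),
    Rule("LKBS", frozenset({"term:agents"}), "action:search-by-subject"),
    Rule("IKBS", frozenset({"action:search-by-subject"}),
         "use:subject-catalogue"),
    Rule("IKBS", frozenset({"related:softbots", "use:subject-catalogue"}),
         "plan:catalogue-then-keyword"),
]


def forward_chain(initial_facts, rules):
    """Fire rules until no new fact can be derived.

    Returns the final set of facts and the list of rules that fired,
    in firing order.
    """
    facts = set(initial_facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= facts and rule.conclusion not in facts:
                facts.add(rule.conclusion)
                trace.append(rule)
                changed = True
    return facts, trace


# The user's input initialises the engine, as in item c.
facts, trace = forward_chain({"term:agents"}, RULES)
for rule in trace:
    print(f"[{rule.source}] {sorted(rule.premises)} -> {rule.conclusion}")
```

The list of fired rules is kept because it is the kind of record from which justifications for each step of a proposed strategy could be presented to the user.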

6. Conclusions

7. Further research.

Further research topics include the investigation of an appropriate model for representing the acquired knowledge, and concept piling. The isolation and identification of the meta-knowledge used by expert searchers during a search session is another crucial point. Mismatches between the user-defined search term and subject headings are one of the most common reasons that a search task fails. To improve the performance of an information retrieval session, we have to eliminate those mismatches by developing a mechanism that maps the search terms defined by the user to the appropriate subject headings. Finally, SEARCH ADVISOR must be kept up to date and compatible with new search mechanisms that will arise on the Net.

8. References

[1] Bowman, C.M., Danzig, P.B., Manber, U., & Schwartz, M.F. (1994). Scalable Internet Resource Discovery: Research Problems and Approaches. Communications of the ACM, 37 (8), 98-107.
[2] Chen, H. et al. (1996). Towards Intelligent Meeting Agents. Computer Magazine, August 1996, pp. 62-69.
[3] Croft, W.B., & Thompson, R.H. (1987). I3R: A new approach to the design of document retrieval systems. Journal of the American Society for Information Science, 38 (6), 389-404.
[4] Croft, W.B., & Turtle, H. (1993). Retrieval Strategies for Hypertext. Information Processing and Management, 29 (3), 313-324.
[5] Frisse, M.E., & Cousins, S.B. (1989). Information retrieval from hypertext: Update on the dynamic medical handbook project. In Hypertext '89 Proceedings, New York: ACM Press, pp. 199-212.
[6] Savoy, J., & Desbois, D. (1991). Bayesian inference networks in hypertext. In Proceedings RIAO 3, Paris: CID, pp. 662-681.
[7] Weld, D. (1995). The Role of Intelligent Systems in the National Information Infrastructure. AI Magazine, Fall 1995, pp. 45-64.


Avgoustos A. Tsinakos and Konstandinos G. Margaritis
Department of Informatics
University of Macedonia
54006 Thessaloniki
GREECE
Tel: +30-31-891 891
E-mail: tsinakos/kmarg@macedonia.uom.gr


© 1997. The authors, Avgoustos A. Tsinakos and Konstandinos G. Margaritis, assign to the University of New Brunswick and other educational and non-profit institutions a non-exclusive license to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive license to the University of New Brunswick to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers, and for the document to be published on mirrors on the World Wide Web. Any other usage is prohibited without the express permission of the authors.