AlephWeb: a CSCW Large Scale Trader

Gerard Rodríguez, Leandro Navarro

gerard@ac.upc.es, leandro@ac.upc.es

Universitat Politècnica de Catalunya UPC Barcelona Spain


Abstract: AlephWeb is a CSCW Trader that indexes information on the Internet, especially on the Web. Its purpose is to act as an information point for people who want to find a web page or a web server of a specific organisation. AlephWeb might be considered a common search engine, since it keeps track of web pages to be retrieved later on, but it has been designed with CSCW purposes in mind, so its main features are those that permit cooperation between users and the system. Furthermore, AlephWeb has a distributed structure: the AlephWeb servers cooperate to provide a search engine service by means of the Aleph federation mechanisms.

1. Introduction

The Web is a good tool for making documents available, but specific information is sometimes difficult to find. There is a pressing need for tools either to announce where the information is or to look for specific web documents throughout the network. Until now, this problem has been addressed with search engines. These engines gather as many references as possible by looking for web servers throughout the whole Internet. However, current search engines have several problems that decrease their real effectiveness in an environment as large as the current Internet: they use a central database, and the providers of information have no control over the semantic content and indexing policies of that database. In other words, current search engines do not provide tools to support CSCW communities, especially those placed in a large and heterogeneous environment.

The Aleph Computational Model proposes a step forward in scaling up the current model by moving from a central service to a cluster of local, specialised, cooperating servers. We propose to interconnect them with a federated structure that does not limit the independence of its members, promoting the emergence of many servers with diverse local contents and diverse gathering policies. So, in terms of searching for information, what we have is a group of small search engines that cooperate with one another. This framework is the Aleph Computational Model, where applications are replicated and cooperate using the Aleph federation mechanism.

The rest of this paper is structured as follows. Section two presents the main concepts of the Aleph Computational Model (the core of the AlephWeb prototype): the federation structure and the federation mechanisms guiding its behaviour. Section three presents the status of an application based on the Aleph Computational Model: the AlephWeb search engine. Finally, the last section presents some conclusions together with ongoing and future work.

2. The Aleph Computational Model

Figure 1: The components of the Aleph Computational Model: the Aleph-based applications that use the Aleph federation tools, the federation managers, and the databases with the contextual information of each entity.

The Aleph Computational Model has been designed with large environments in mind. In this sort of environment, scalability and heterogeneity are important issues. By scalability we mean that the system must be able to grow without affecting the current state of the Aleph-based applications already working. Heterogeneity, on the other hand, is closely related to the diversity of the large social and computational settings that govern every entity: every entity is under the control of a set of people with their own point of view of their organisational setting. This diversity of viewpoints might decrease the ability to cooperate or share documents among several entities [Schmidt 93].

In order to address these drawbacks and achieve real and easy cooperation among the entities, we have taken the federated structure as our schema. In this structure, distinct entities from different environments decide to cooperate without giving up their independence. There is no central authority: communication takes place between any members of the federation without the mediation of a central entity. Because entities are completely independent and no central entity exists, the boundary problems caused by the dynamism of the structure increase. The element that deals with this dynamism is the federation manager.

The federation manager offers the whole federation environment to each entity. We might see the federation manager as the door to the outside space: if an entity needs to connect to the outside, it must cooperate with the federation manager. The federation manager performs all the tasks required to make efficient use of the federation, such as choosing the best foreign entities to address the user's query, merging the results coming from the federation, and maintaining the list of partners. It should also hide the complex structure of the federation from the entity as much as possible.
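As an illustration of the result-merging task, the following minimal sketch combines the answers returned by several partners into a single ranked list. It assumes, purely for the example, that each partner returns (URL, score) pairs with comparable scores; the function name and the data layout are hypothetical and are not part of the Aleph prototype.

```python
def merge_results(result_lists):
    """Merge per-partner result lists into one ranked list.

    Assumption for this sketch: each result is a (url, score) pair and
    scores from different partners are directly comparable.
    """
    best = {}
    for results in result_lists:
        for url, score in results:
            # Keep the highest score seen for a URL reported by several partners.
            if url not in best or score > best[url]:
                best[url] = score
    # Highest score first; ties broken by URL so the order is deterministic.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


partner_a = [("http://a.example/cscw", 0.5), ("http://b.example/web", 0.9)]
partner_b = [("http://a.example/cscw", 0.7)]
merged = merge_results([partner_a, partner_b])
print(merged)
```

Duplicates are collapsed by URL, so the entity sees a single answer list and does not need to know how many partners were contacted.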

The federation manager uses a short description of the contextual setting of other Aleph-based applications to direct the query to the most suitable set of them. We call this functionality dynamic query routing. Whenever the federation manager has to choose a set of partners to which to forward the user's query, it looks in these descriptions and decides which are the most relevant. The query is not resolved against all the federated entities; instead, it is addressed to the most valuable of them, so the result is restricted to the information that best matches the user's requirements (quality policy). These short descriptions of the contextual setting of each federated entity are represented with IAFA templates [Deutsch 95], and they are stored in the dynamic tables of every federation manager.
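Dynamic query routing can be sketched as follows. This is only an illustration under stated assumptions: each partner description is reduced to a set of keywords (a hypothetical stand-in for an IAFA template, whose real attributes are not reproduced here), relevance is measured by simple term overlap, and the names `PartnerDescription`, `route_query` and `max_partners` are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PartnerDescription:
    """Short description of a federated entity's contextual setting.

    Hypothetical stand-in for an IAFA-style template: the field names
    are assumptions, not the actual template attributes.
    """
    name: str
    keywords: set = field(default_factory=set)


def score(query_terms, partner):
    """Number of query terms that appear in the partner's description."""
    return len(query_terms & partner.keywords)


def route_query(query, table, max_partners=2):
    """Return the names of the most relevant partners for a query.

    Partners with no matching term are never contacted (quality policy).
    """
    terms = set(query.lower().split())
    scored = [(score(terms, p), p.name) for p in table]
    relevant = [(s, n) for s, n in scored if s > 0]
    # Highest score first; ties broken alphabetically for determinism.
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in relevant[:max_partners]]


# Toy dynamic table with three federated entities.
dynamic_table = [
    PartnerDescription("upc", {"university", "barcelona", "engineering"}),
    PartnerDescription("museum", {"art", "barcelona", "exhibitions"}),
    PartnerDescription("weather", {"forecast", "climate"}),
]

print(route_query("engineering university barcelona", dynamic_table))
```

In this toy table the query is forwarded to `upc` and `museum` only; `weather` shares no term with the query and is therefore not contacted.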

2.1. The federation manager

The federation manager is the core of the federated structure and carries out the tasks required to maintain it. It has to provide functions for:

* choosing which entities will cooperate to serve a request. Each request is relevant only to a group of entities inside the whole federation, and a request cannot be served by communicating with all the federated entities because that would break the quality policy. So, the federation manager should address the request to the group of entities that are most relevant according to the content of the request (Partner identification).

* dealing with changes in the federated contextual settings. This means re-adapting the federation managers' databases according to new features of the federated applications, such as a new Web service registered in the case of AlephWeb (Context learning).

* dealing with changes in membership: entities either joining or leaving the federation (Registration functionality).
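The three functions listed above can be pictured as the interface of a small class. The sketch below is a toy under stated assumptions, not the Aleph implementation: the class and method names are hypothetical, and each entity's contextual setting is reduced to a set of keywords held in an in-memory table.

```python
class FederationManager:
    """Toy federation manager; all names here are hypothetical."""

    def __init__(self):
        # Dynamic table: entity name -> keywords describing its context.
        self.dynamic_table = {}

    # Registration functionality: entities join or leave the federation.
    def register(self, entity, keywords):
        self.dynamic_table[entity] = set(keywords)

    def unregister(self, entity):
        self.dynamic_table.pop(entity, None)

    # Context learning: adapt the stored description to new features.
    def learn_context(self, entity, new_keywords):
        if entity not in self.dynamic_table:
            raise KeyError(f"unknown entity: {entity}")
        self.dynamic_table[entity] |= set(new_keywords)

    # Partner identification: keep only entities relevant to the request.
    def identify_partners(self, request_terms):
        terms = set(request_terms)
        return sorted(
            name for name, kws in self.dynamic_table.items() if kws & terms
        )


fm = FederationManager()
fm.register("alephweb-a", {"cscw", "groupware"})
fm.register("alephweb-b", {"networks"})
before = fm.identify_partners({"cscw"})
fm.learn_context("alephweb-b", {"cscw"})  # b registers a new CSCW service
after = fm.identify_partners({"cscw"})
fm.unregister("alephweb-a")
remaining = fm.identify_partners({"cscw"})
print(before, after, remaining)
```

The usage lines show how a change in one entity's context, or in membership, immediately changes which partners a later request would be sent to.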

2.1.1. Partner identification

Partner identification refers to the mechanisms for choosing the best set of partners to serve a request according to the contextual setting [Duda 94]. Selecting a set of partners is required because there can be a lot of entities (with federation managers acting as their contact points) in a federated environment. Thus, a user's request cannot be addressed by contacting all the existing entities: this would provoke an excessive number of interactions and return irrelevant information. Therefore, a mechanism to guide the process of looking for these potential partners is required. In our computational schema these mechanisms