SCI-Map is a new PC-based system for mapping the scientific literature. Starting from a seed item, the user can build a network or cluster of nodes interactively and can view the structure as it is being built. New nodes are selected for addition to the network by the strength of their links to the items already clustered, and the positions of new nodes are determined by a geometric triangulation method. SCI-Map can be used to perform cluster-based retrieval using co-citation or other measures of document association, and it enables the user to explore the structure of large document sets. This case study focuses on the AIDS literature and shows how the network is built up topic by topic, reports the recall of the final cluster, and identifies where AIDS connects to the literature of other fields.
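The growth-and-placement loop described above can be illustrated with a minimal Python sketch. It makes several assumptions that are not stated in the abstract: link strengths are normalized to [0, 1] (for example, normalized co-citation), each step adds the unclustered node with the strongest *single* link to the cluster, and each new node is placed by triangulating from its two strongest-linked, already-placed neighbours at a target distance of `1 - strength`. The function names `grow_cluster`, `circle_intersection` and `place` are hypothetical; this is not SCI-Map's actual implementation.

```python
import math

def strength(links, a, b):
    """Association strength between two nodes; missing pairs count as 0."""
    return links.get(frozenset((a, b)), 0.0)

def grow_cluster(nodes, links, seed, max_size):
    """Greedily add the node with the strongest single link to the cluster (assumed rule)."""
    order = [seed]
    while len(order) < max_size:
        best, best_s = None, 0.0
        for n in nodes:
            if n in order:
                continue
            s = max(strength(links, n, m) for m in order)
            if s > best_s:
                best, best_s = n, s
        if best is None:  # nothing left that links to the cluster
            break
        order.append(best)
    return order

def circle_intersection(p1, r1, p2, r2):
    """Points at distance r1 from p1 and r2 from p2.

    Falls back to a single point on the line through p1 and p2 if the circles do not meet.
    """
    (x1, y1), (x2, y2) = p1, p2
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0:
        return [(x1 + r1, y1)]
    a = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    h2 = r1 ** 2 - a ** 2
    if h2 < 0:
        return [(mx, my)]
    h = math.sqrt(h2)
    ox, oy = -h * (y2 - y1) / d, h * (x2 - x1) / d
    return [(mx + ox, my + oy), (mx - ox, my - oy)]

def place(order, links):
    """Assign 2-D positions in the order in which nodes joined the cluster."""
    pos = {order[0]: (0.0, 0.0)}
    for n in order[1:]:
        def dist(m):
            # Hypothetical rule: stronger link -> shorter target distance.
            return 1.0 - strength(links, n, m)
        ranked = sorted(pos, key=lambda m: -strength(links, n, m))
        if len(ranked) == 1:
            x, y = pos[ranked[0]]
            pos[n] = (x + dist(ranked[0]), y)
            continue
        a, b = ranked[0], ranked[1]
        candidates = circle_intersection(pos[a], dist(a), pos[b], dist(b))
        # Prefer the candidate farther from the centroid of the existing layout.
        cx = sum(p[0] for p in pos.values()) / len(pos)
        cy = sum(p[1] for p in pos.values()) / len(pos)
        pos[n] = max(candidates, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return pos
```

Triangulating from two anchors fixes each new node's distances to its two closest associates exactly, while older placements are never moved, which matches the idea of watching the structure grow incrementally.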
Data visualization techniques have opened up new possibilities for science mapping. To exploit this opportunity, new methods
are needed to position tens of thousands of documents in a single coordinate space. A general framework is described for achieving
this goal involving hierarchical clustering, ordination of clusters, and the merging of ordinations into a common coordinate
space. The SciViz system is presented as one particular implementation of this framework.
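The ordination and merging steps of such a framework can be sketched as follows, assuming the hierarchical clustering step has already produced the clusters. The sketch uses classical (Torgerson) multidimensional scaling as the ordination method and merges ordinations by shrinking each cluster's local map by a hypothetical `scale` factor and centring it on the cluster's global position. These are illustrative choices, not necessarily the methods used in SciViz.

```python
import numpy as np

def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate `dist`."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering   # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dims]     # keep the `dims` largest
    coords = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
    if coords.shape[1] < dims:              # fewer points than dimensions
        coords = np.hstack([coords, np.zeros((n, dims - coords.shape[1]))])
    return coords

def merge_ordinations(cluster_dist, local_dists, scale=0.1):
    """Place each cluster's local map, shrunk by `scale`, at the cluster's global position.

    cluster_dist: distances between clusters (one row per cluster).
    local_dists: for each cluster, distances between its own documents.
    """
    centers = classical_mds(cluster_dist)
    parts = [c + scale * classical_mds(local) for c, local in zip(centers, local_dists)]
    return np.vstack(parts)  # one row per document, in a single coordinate space
```

Because classical MDS returns centred coordinates, each local map lands with its centroid exactly on its cluster's global position; the `scale` factor keeps neighbouring clusters from overlapping when inter-cluster distances are large relative to intra-cluster ones.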
Science mapping projects have been revived by the advent of virtual reality software capable of navigating large synthetic
three-dimensional spaces. Unlike earlier mapping efforts, which aimed at creating simple maps at either a global or local level,
current work focuses on creating large-scale maps displaying many thousands of documents that can be input into the new VR systems.
This paper presents a general framework for creating large-scale document spaces, as well as some new methods that perform
some of the individual processing steps. The methods are designed primarily for citation data but could be applied to other
types of data, including hypertext links.
At ISI we have used a consistent method for clustering the combined Science Citation Index and Social Sciences Citation Index for the last seven years (1983 to 1989). This method involves clustering highly cited documents by single-link clustering and then clustering the resulting clusters, for a total of four clustering passes. This gives a hierarchical, or nested, structure of clusters four levels deep. Relationships among clusters at a given level can be depicted by multidimensional scaling, and by comparing successive annual maps we can see how the relationships among major disciplines have changed from year to year. We focus mainly on the two highest levels of aggregation, C4 and C5, to make observations about structural changes in science involving the major disciplines. A distinction is made between changes which appear to be cyclic or oscillatory in nature and those which appear to be more permanent or unidirectional.
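The idea of "clustering the clusters" can be shown with a toy sketch: single-link clustering at a threshold (connected components of sufficiently strong links), followed by aggregation of links between the resulting clusters so the next level can be clustered the same way. The aggregation rule (summing member-to-member link strengths) and the per-level thresholds are assumptions for illustration, not ISI's documented parameters.

```python
from collections import defaultdict

def single_link(items, links, threshold):
    """Single-link clustering: connected components of links with strength >= threshold."""
    parent = {i: i for i in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), s in links.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for i in items:
        groups[find(i)].append(i)
    return list(groups.values())

def aggregate(clusters, links):
    """Link strength between two clusters = sum of their member-to-member links (assumed)."""
    where = {i: k for k, members in enumerate(clusters) for i in members}
    agg = defaultdict(float)
    for (a, b), s in links.items():
        ka, kb = where[a], where[b]
        if ka != kb:
            agg[(min(ka, kb), max(ka, kb))] += s
    return dict(agg)

def hierarchy(items, links, thresholds):
    """Cluster once per threshold; each level's clusters become the next level's items."""
    levels = []
    for t in thresholds:
        clusters = single_link(items, links, t)
        levels.append(clusters)
        links = aggregate(clusters, links)
        items = list(range(len(clusters)))
    return levels
```

Passing four thresholds to `hierarchy` yields four nested levels, mirroring the four clustering passes described above; at higher levels, items are cluster indices from the level below.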
The technique of co-citation cluster analysis is applied to a special three-year (1972–1974) file of the Social Sciences Citation Index. An algorithm is devised for identifying clusters which belong to a discipline, based on the percentage of a cluster's source documents that appear in a disciplinary journal set. Clusters in three disciplines (economics, sociology and psychology) are identified using this algorithm. Clusters in a natural science specialty (particle physics), obtained from the 1973 Science Citation Index, are compared and contrasted with the three groups of social science clusters. Certain structural characteristics common to the social science and natural science groups suggest that knowledge is developing in parts of the social science disciplines in a manner similar to the natural sciences.
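A minimal sketch of the journal-set rule: compute, for each discipline, the share of a cluster's source (citing) documents published in that discipline's journal set, and assign the cluster to the best-scoring discipline only if its share reaches a cutoff. The 50% cutoff (`min_share`) and the tie-breaking rule are hypothetical; the abstract does not give the actual threshold.

```python
def assign_discipline(source_journals, journal_sets, min_share=0.5):
    """Return the discipline whose journal set publishes the largest share of the
    cluster's source documents, or None if no share reaches `min_share`.

    source_journals: journal of each source document citing into the cluster.
    journal_sets: discipline name -> set of journals defining that discipline.
    """
    if not source_journals:
        return None
    best, best_share = None, 0.0
    for discipline, journals in journal_sets.items():
        share = sum(j in journals for j in source_journals) / len(source_journals)
        if share > best_share:  # ties keep the first discipline encountered
            best, best_share = discipline, share
    return best if best_share >= min_share else None
```

For example, with toy journal sets `{"economics": {"AER", "Econometrica"}, "sociology": {"ASR", "AJS"}}`, a cluster whose sources are `["AER", "AER", "ASR", "Econometrica"]` has a 75% economics share and is assigned to economics, while a cluster split evenly across fields stays unassigned.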
The specialty of collagen research is tracked over a ten-year period, 1970–1979, using the methodology of co-citation cluster strings. Independently obtained annual clusters are linked over time by the percentage of highly cited documents continuing from one year to the next. All inter-year links are clustered by single-linkage to form the strings, one of which corresponds to the collagen specialty. Maps of the individual year clusters within the string reveal an alternating pattern of expansion and innovation followed by contraction and consolidation. At the same time, the subject focus of research gradually shifts. The institutional affiliations and funding sources of highly cited documents show a trend from early dominance by a few institutions and sources to a multiplicity of centers and sources, and collaboration among them, later on, due in part to the migration of researchers from an initially dominant institution.
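Cluster strings can be sketched as single-linkage over inter-year links. In this illustration, the link from a cluster in year *t* to a cluster in year *t+1* is the percentage of the earlier cluster's highly cited documents that reappear in the later one, and links at or above a hypothetical 30% threshold join clusters into the same string. Both the normalization (by the earlier cluster's size) and the threshold are assumptions, not the paper's stated parameters.

```python
from collections import defaultdict

def continuity(core_a, core_b):
    """Percentage of cluster A's highly cited documents that reappear in cluster B."""
    return 100.0 * len(core_a & core_b) / len(core_a) if core_a else 0.0

def build_strings(years, threshold=30.0):
    """Link clusters in consecutive years and return single-linkage strings.

    years: list in time order; each entry maps cluster id -> set of core document ids.
    Returns a list of strings, each a list of (year_index, cluster_id) pairs.
    """
    parent = {(t, cid): (t, cid) for t, clusters in enumerate(years) for cid in clusters}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for t in range(len(years) - 1):
        for a, core_a in years[t].items():
            for b, core_b in years[t + 1].items():
                if continuity(core_a, core_b) >= threshold:
                    parent[find((t, a))] = find((t + 1, b))

    strings = defaultdict(list)
    for node in list(parent):
        strings[find(node)].append(node)
    return list(strings.values())
```

Because single-linkage only needs one qualifying link, a string can branch or merge across years, which is what allows expansion and contraction phases to appear within one specialty's string.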
Earlier experiments in the use of co-citations to cluster the Science Citation Index (SCI) database are reviewed. Two proposed improvements in the methodology are introduced: fractional citation counting and variable-level clustering with a maximum cluster size limit. Results of an experiment using the 1979 SCI are described, comparing the new methods with those previously employed. Fractional citation counting is found to reduce the bias toward high-referencing fields such as biomedicine and biochemistry that is inherent in the use of an integer citation count threshold, and to increase the range of subject matter covered by clusters. Variable-level clustering, on the other hand, increases recall as measured by the percentage of highly cited items included in clusters. It is concluded that the two new methods used in combination will improve our ability to generate comprehensive maps of science as envisioned by Derek Price. This topic will be discussed in a forthcoming paper.
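Both improvements can be sketched in a few lines. Fractional counting credits each cited item with 1/n from a citing paper that has n references, so papers in high-referencing fields contribute less per reference. Variable-level clustering is sketched here as: cluster at a starting co-citation level, and re-cluster any cluster larger than `max_size` at the next higher level until it fits. Using raw integer co-citation counts with a step of 1 is an assumption for illustration; the paper's actual similarity measure and level increments may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations

def fractional_counts(reference_lists):
    """Each citing paper with n distinct references credits each cited item with 1/n."""
    counts = Counter()
    for refs in reference_lists:
        unique = set(refs)
        for r in unique:
            counts[r] += 1.0 / len(unique)
    return counts

def cocitations(reference_lists, selected):
    """Count how often each pair of selected items is cited together."""
    co = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs) & selected), 2):
            co[(a, b)] += 1
    return co

def components(items, co, level):
    """Connected components of items linked by co-citation >= level (single-link)."""
    adj = defaultdict(set)
    for (a, b), s in co.items():
        if s >= level and a in items and b in items:
            adj[a].add(b)
            adj[b].add(a)
    seen, out = set(), []
    for i in sorted(items):
        if i in seen:
            continue
        seen.add(i)
        stack, comp = [i], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
        out.append(sorted(comp))
    return out

def variable_level(items, co, level, max_size):
    """Cluster at `level`; re-cluster any oversized cluster at level + 1."""
    result = []
    for comp in components(items, co, level):
        if len(comp) > max_size:
            result.extend(variable_level(set(comp), co, level + 1, max_size))
        else:
            result.append(comp)
    return result
```

In practice, the highly cited items passed to `cocitations` would be those whose fractional count meets a threshold (for example, `{r for r, c in counts.items() if c >= t}` for some chosen `t`). Because oversized clusters are split only where needed, small specialties keep the low clustering level, which is one way variable-level clustering can raise recall compared with a single global threshold.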