We propose a semi-automatic method based on finite-state techniques for the unification of corporate source data, with potential
applications for bibliometric purposes. Bibliographic and citation databases suffer from a well-known inconsistency of
data at the micro and meso levels, which affects the quality of bibliometric searches and the evaluation of research performance.
The unification method applies parametrized finite-state graphs (P-FSG) and involves three stages: (1) breaking corporate
source data into independent units of analysis; (2) creating binary matrices; and (3) drawing finite-state graphs. This procedure
was tested on university departmental addresses downloaded from the ISI Web of Science. Evaluation used an adaptation
of the precision and recall measures. The results demonstrate the usefulness of this approach, though it requires some
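
The evaluation above relies on an adaptation of precision and recall whose details are not given in this abstract. As a reference point only, the Python sketch below computes the standard, unadapted measures for a unification run, assuming a hypothetical gold standard that maps each raw address variant to its correct canonical form. The function name and the sample variants are illustrative and are not taken from the study.

```python
# Standard (unadapted) precision and recall for a unification run.
# ASSUMPTION: the gold standard and address variants below are hypothetical.

def precision_recall(predicted: dict[str, str], gold: dict[str, str]) -> tuple[float, float]:
    """Return (precision, recall) for variant -> canonical-form assignments.

    predicted: each raw address variant the method unified, mapped to the
               canonical form it proposed (unresolved variants are absent).
    gold:      each raw variant that should be unified, mapped to its correct
               canonical form.
    """
    correct = sum(1 for variant, form in predicted.items() if gold.get(variant) == form)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = {
    "Dept Lib Sci": "Library Science",
    "Dpto Bibliotecon": "Library Science",
    "Dept Comp Sci": "Computer Science",
    "Dep Informat": "Computer Science",
}
predicted = {
    "Dept Lib Sci": "Library Science",
    "Dpto Bibliotecon": "Library Science",
    "Dept Comp Sci": "Library Science",  # a wrong assignment
}
print(precision_recall(predicted, gold))
```

Here two of the three proposed assignments are correct (precision 2/3), and two of the four gold variants are recovered (recall 1/2).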

This paper describes an approach for improving the data quality of corporate sources when databases are used for bibliometric
purposes. Research management relies on bibliographic databases and citation index systems as analytical tools, yet the raw
resources for bibliometric studies are plagued by a lack of consistency in field formatting for institution data. The present
contribution puts forth a Natural Language Processing (NLP)-oriented method for the identification of the structures guiding
corporate data and their mapping into a standardized format. The proposed unification process is based on the definition of
address patterns and the ensuing application of Enhanced Finite-State Transducers (E-FST). Our procedure was tested on address
formats downloaded from the INSPEC, MEDLINE and CAB Abstracts databases. The results demonstrate the usefulness of the method,
provided that errors in the formats to be unified are closely controlled. The computational efficacy of the model is noteworthy,
because it is firmly guided by the definition of data in the application domain.
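
To make the idea of address patterns applied by a transducer more concrete, the following is a toy, ordinary deterministic finite-state transducer, not the Enhanced Finite-State Transducers of the study, whose definition is not reproduced here. It assumes addresses are comma-separated, classifies each segment with hypothetical keyword lists, and emits a standardized record only when the sequence of segment types matches an accepted pattern (an optional department followed by a university, then any trailing segments).

```python
import re
from typing import Optional

# Toy deterministic finite-state transducer for comma-separated addresses.
# ASSUMPTION: this is NOT the E-FST of the study; keyword lists, states and
# output fields are hypothetical.
DEPT_PREFIXES = {"dept", "dep", "dpto", "departamento", "department"}
UNIV_PREFIXES = {"univ", "universidad", "university"}

# (state, input symbol) -> (next state, output field, or None to emit nothing)
TRANSITIONS = {
    ("start", "DEPT"): ("dept", "department"),
    ("start", "UNIV"): ("univ", "university"),
    ("dept", "UNIV"): ("univ", "university"),
    ("univ", "OTHER"): ("univ", None),  # city, country, etc. are consumed silently
}
ACCEPTING = {"univ"}


def classify(segment: str) -> tuple[str, str]:
    """Map one address segment to an input symbol plus the text to emit."""
    parts = re.split(r"[\s.]+", segment.strip(), maxsplit=1)
    head = parts[0].lower()
    rest = parts[1] if len(parts) > 1 else ""
    if head in DEPT_PREFIXES:
        return "DEPT", rest
    if head in UNIV_PREFIXES:
        return "UNIV", rest
    return "OTHER", segment.strip()


def transduce(address: str) -> Optional[dict[str, str]]:
    """Return a standardized record, or None if the address fits no pattern."""
    state = "start"
    record: dict[str, str] = {}
    for segment in address.split(","):
        symbol, text = classify(segment)
        step = TRANSITIONS.get((state, symbol))
        if step is None:  # no transition defined: reject the address
            return None
        state, field = step
        if field is not None:
            record[field] = text
    return record if state in ACCEPTING else None


print(transduce("Dept. Comp. Sci., Univ Carlos III Madrid, Leganes, Spain"))
print(transduce("Leganes, Univ Carlos III Madrid"))
```

The first call yields a record with department "Comp. Sci." and university "Carlos III Madrid"; the second returns None because no accepted pattern starts with a place name. Expanding abbreviations such as "Comp. Sci." into a full canonical name would need a further lexicon stage, which this sketch leaves out.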

Cuban scientific output at the macro level has rarely been studied in the scientometric literature. The current paper
explores the different metric approaches to Cuban scientific activity taken by national and international authors.
The article also develops a scientometric study of Cuban scientific production as indexed in Scopus during the period
1996–2007, using socio-economic indicators combined with bibliometric indicators supported by the SCImago Journal & Country Rank. Web of Science and Scopus are compared as information sources. The results confirm that Scopus can be used to obtain an
objective picture of the behaviour of Cuban science at the end of the 1990s and the beginning of the 21st century. In this case, the SCImago Journal & Country Rank offers an important set of indicators. Combining these indicators with those related to the socio-economic
aspects of Science and Technology activities allows the authors to present a perspective on the evolution of the Cuban science system
during the period analyzed. The inclusion in Scopus of less-cited Spanish-language journals, and its impact on
productivity and citation-based indicators, is also discussed. Our investigation found that Cuban scientific production grew
throughout the whole period, consistent with the country's efforts and expenditures in Research and Development

Clustering is applied to web co-outlink analysis to represent the heterogeneous nature of the World Wide Web in terms of the
“triple helix” model (university-industry-government). An initial categorization based on families of websites is
then matched with Spanish institutions from diverse sectors represented on the Web, in order to uncover cognitive structures and related
subgroups with common interests and to confirm the junction of the sectors of the “triple helix” model. We conclude that
clustering applied to web co-outlink analysis succeeds in making interconnections manifest when fully institutionalized
organizations are studied.
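
The abstract does not specify the similarity measure or the clustering algorithm, so the sketch below rests on two stated assumptions: co-outlink strength between two sites is the number of third sites to which both link (the web analogue of bibliographic coupling), and the clustering is single-linkage, cut at a minimum co-outlink count. The site names are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: hypothetical outlink data; each site maps to the sites it links to.
OUTLINKS = {
    "univ-a.example": {"ministry.example", "council.example", "univ-b.example"},
    "univ-b.example": {"ministry.example", "council.example"},
    "firm-x.example": {"chamber.example", "firm-y.example"},
    "firm-y.example": {"chamber.example"},
}


def co_outlink_counts(outlinks: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """For each pair of sites, count the target sites that both link to."""
    counts: dict[tuple[str, str], int] = {}
    for a, b in combinations(sorted(outlinks), 2):
        shared = len(outlinks[a] & outlinks[b])
        if shared:
            counts[(a, b)] = shared
    return counts


def single_linkage_clusters(sites, counts, threshold: int) -> list[list[str]]:
    """Single-linkage clustering cut at `threshold`: connected components of
    the graph whose edges are site pairs with co-outlink count >= threshold."""
    parent = {s: s for s in sites}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for (a, b), n in counts.items():
        if n >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for s in sites:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


counts = co_outlink_counts(OUTLINKS)
print(single_linkage_clusters(OUTLINKS, counts, threshold=1))
print(single_linkage_clusters(OUTLINKS, counts, threshold=2))
```

With a threshold of 1 the sketch returns two groups, the two universities and the two firms; raising it to 2 leaves the firms as singletons. Raw counts would usually be normalized (for example with a cosine measure) before clustering, since large sites otherwise dominate.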

A bibliometric analysis of the 50 most frequently publishing Spanish universities shows large differences in publication activity and citation impact among research disciplines within an institution. The Gini index is a useful measure of an institution's disciplinary specialization and can roughly categorize universities as general or specialized. A study of the Spanish academic system reveals that any assessment of a university's research performance must take into account the disciplinary breadth of its publication activity and citation impact. The study proposes graphs showing not only a university's article production and citation impact but also its disciplinary specialization. Such graphs serve both as a warning against and as a remedy for one-dimensional approaches to the assessment of institutional research performance.
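
As an illustration of how the Gini index separates general from specialized institutions, the sketch below applies the standard Gini coefficient to a hypothetical vector of article counts per discipline. Which quantity the study feeds into the index (articles, citations, or both) and over which set of disciplines are assumptions here.

```python
def gini(counts: list[float]) -> float:
    """Standard Gini coefficient of a distribution, e.g. articles per discipline.

    G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n, with x_1 <= ... <= x_n.
    0 means output is spread evenly; (n - 1) / n is the maximum, reached when
    all output falls in a single discipline.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one discipline with a positive count")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# ASSUMPTION: hypothetical article counts over the same five disciplines.
generalist = [120, 95, 110, 100, 105]
specialized = [400, 20, 10, 5, 15]
print(round(gini(generalist), 3), round(gini(specialized), 3))
```

The evenly spread profile scores close to 0 (about 0.045), while the profile dominated by one discipline scores 32/45, about 0.711, near the five-discipline maximum of 0.8.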

A study is described of the rank/JIF (Journal Impact Factor) distributions in the high-coverage Scopus database, using recent
data and a three-year citation window. It includes a comparison with an older study of the Journal Citation Reports categories
and indicators, and a determination of the factors that most influence the distributions. While all the specific subject areas
fit a negative logarithmic law fairly well, those with a greater External JIF have distributions with a more sharply defined
peak and a longer tail, something like an iceberg. No S-shaped distributions, such as those predicted by Egghe, were found. A strong
correlation was observed between the knowledge export and import ratios. Finally, data from both Scopus and ISI were used
to characterize the rank/JIF distributions by subject area.
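
The negative logarithmic law is not written out in the abstract. The sketch below assumes the common form JIF(r) = a - b ln r, with journals ranked by decreasing JIF, fits a and b by ordinary least squares on (ln r, JIF), and reports R² as a rough goodness-of-fit indicator. The input is synthetic, not data from Scopus or ISI.

```python
import math


def fit_negative_log(jifs: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of JIF(r) = a - b * ln(r), journals ranked by decreasing JIF.

    ASSUMPTION: this functional form; the study's exact law may differ.
    Returns (a, b, r_squared).
    """
    ys = sorted(jifs, reverse=True)  # rank 1 = highest JIF
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two journals")
    xs = [math.log(rank) for rank in range(1, n + 1)]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # ordinary least squares slope on (ln r, JIF)
    a, b = mean_y - slope * mean_x, -slope
    ss_res = sum((y - (a - b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return a, b, r_squared


# Synthetic check: values generated exactly from a = 5.0, b = 1.2.
synthetic = [5.0 - 1.2 * math.log(r) for r in range(1, 51)]
print(fit_negative_log(synthetic))
```

Because the synthetic values lie exactly on the curve, the fit recovers a = 5.0 and b = 1.2 with R² equal to 1 up to floating-point rounding.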

In recent years a number of studies have focused on Argentina’s 2001 economic crisis and its political, social, and institutional
repercussions. To date, however, no studies have analyzed its effects on the country’s scientific system from a scientometric
perspective, in terms of resources dedicated to scientific activity and the final output and impact. The present study does
so by means of a set of scientometric indicators that reflect economic effort, human resources dedicated to research, publications,
collaborative relations, and the international visibility of scientific contributions.

The present paper proposes a method for detecting, identifying and visualizing research groups. The data used refer to nine
departments of Carlos III University of Madrid, and the findings for the Communication Technologies Department illustrate the
method. Structural analysis was used to generate co-authorship networks. Research groups were identified on the basis of factor
analysis of the raw data matrix and similarities in the choice of co-authors. The resulting networks distinguished the researchers
participating in the intra-departmental network from those not involved and identified the existing research groups. Fields
of research were characterized by the Journal Citation Reports subject categories assigned to the bibliographic references
cited in the papers written by the author-factors. The results, i.e., the graphic displays of the structures of the socio-centric
and co-authorship networks and the strategies underlying collaboration among researchers, were later discussed with the members
of the departments analyzed. The paper constitutes a starting point for understanding and characterizing networking within

The intellectual structure and main research fronts of the Faculty of Natural Sciences and Museum of the National University
of La Plata, Argentina, are studied, based on cocitation analysis of the subject categories, journals and authors of its scientific
publications collected in the Science Citation Index, CD-ROM version, for the period 1991–2000. The objective of this study
is to test the utility of these techniques for exploring and visualizing the intellectual structure and research fronts of multidisciplinary
institutional domains. Special emphasis is laid on the identification of multilevel structures by combining
subject category cocitation analysis and journal cocitation analysis.
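
Cocitation analysis rests on a simple count: two items are cocited when the same citing paper references both. The sketch below counts raw cocitations of journals over hypothetical reference lists and then applies Salton's cosine, a normalization commonly used before mapping; whether the study used this particular normalization is not stated here. The same functions apply unchanged to subject categories or authors.

```python
import math
from collections import Counter
from itertools import combinations


def cocitation(reference_lists: list[list[str]]) -> tuple[Counter, Counter]:
    """Count how often each pair of items (journals, categories, authors) is
    cited together by the same citing paper, and how many papers cite each item."""
    pair_counts: Counter = Counter()
    item_counts: Counter = Counter()
    for refs in reference_lists:
        unique = sorted(set(refs))  # count each item once per citing paper
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return pair_counts, item_counts


def salton_cosine(pair_counts: Counter, item_counts: Counter) -> dict[tuple[str, str], float]:
    """Normalize raw counts: cocit(i, j) / sqrt(cit(i) * cit(j)).

    ASSUMPTION: Salton's cosine is one common choice, not necessarily the study's.
    """
    return {
        (i, j): n / math.sqrt(item_counts[i] * item_counts[j])
        for (i, j), n in pair_counts.items()
    }


# ASSUMPTION: hypothetical reference lists of three citing papers, reduced to journals.
papers = [
    ["Journal A", "Journal B", "Journal A"],
    ["Journal B", "Journal C"],
    ["Journal A", "Journal B", "Journal C"],
]
pairs, items = cocitation(papers)
print(pairs.most_common())
print(salton_cosine(pairs, items))
```

Journals A and C are cocited once and each is cited by two papers, giving a cosine of 1/sqrt(2 × 2) = 0.5.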