The most popular method for judging the impact of biomedical articles is citation count, the number of citations an article receives.
Its most significant limitation is that it cannot evaluate articles at the time of publication, since citations
accumulate over time. This work presents computer models that accurately predict the citation counts of biomedical publications
over a long horizon of 10 years, using only predictive information available at publication time. Our experiments show that
it is indeed feasible to accurately predict future citation counts with a mixture of content-based and bibliometric features
using machine learning methods. The models pave the way for practical prediction of the long-term impact of publications, and
their statistical analysis provides greater insight into citation behavior.
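As a rough illustration of predicting impact from publication-time information, the sketch below trains a logistic-regression classifier to estimate whether an article will exceed a citation threshold. Everything here is an assumption made for illustration: the three features (a bibliometric journal score, an author-count score and a binary content flag), the toy data and the choice of logistic regression are hypothetical, not the models reported in the work above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic-regression weights and bias by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step along the mean gradient of the log-loss.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that the article becomes highly cited."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical publication-time features, scaled to [0, 1]:
# [journal score (bibliometric), author-count score, content flag]
X = [
    [0.9, 0.8, 1.0],
    [0.8, 0.6, 1.0],
    [0.7, 0.9, 0.0],
    [0.2, 0.1, 0.0],
    [0.1, 0.3, 1.0],
    [0.3, 0.2, 0.0],
]
# Toy labels: 1 if the article later exceeded a chosen citation threshold.
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
new_article = [0.85, 0.7, 1.0]
print(f"P(highly cited) = {predict_proba(w, b, new_article):.2f}")
```

Framing the task as a thresholded classification rather than raw count regression is only one possible design; predicting the count directly would call for a regression model instead.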
Patents represent technological or inventive activity and output across different fields, regions, and time. The analysis
of information from patents could help focus efforts in research and the economy; however, the roles of the factors
that can be extracted from patent records are still not entirely understood. To better understand the impact of these factors
on patent value, machine learning techniques such as feature selection and classification are used to analyze patents in a
sample industry, nanotechnology. Each nanotechnology patent was represented by a comprehensive set of numerical features that
describe inventors, assignees, patent classification, and outgoing references. After a careful design that included selecting
the most relevant features, and selecting classification models aimed at finding the most valuable (top-performing) patents and
optimizing their accuracy, we used the generated models to analyze which factors differentiate the top-performing patents from
the remaining nanotechnology patents. Several factors emerge as important, such as the past performance of inventors and
assignees, and the count of referenced patents.
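The "select features, then classify" pipeline described above can be sketched with a filter-style ranking followed by a simple classifier. This is a minimal sketch under assumptions: the Fisher-style score and the nearest-centroid rule are swapped-in techniques chosen for brevity, not necessarily those used in the study, and the feature names and values are hypothetical.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Filter-style relevance: squared gap between class means over summed class variances."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return (mean(pos) - mean(neg)) ** 2 / spread

def select_features(X, y, names, k):
    """Return names and column indices of the k highest-scoring features."""
    scores = [(fisher_score([row[j] for row in X], y), j) for j in range(len(names))]
    scores.sort(reverse=True)
    top = [j for _, j in scores[:k]]
    return [names[j] for j in top], top

def nearest_centroid(X, y, cols, x):
    """Label x as top-performing (1) if it is closer to that class centroid on the selected columns."""
    def centroid(label):
        rows = [r for r, l in zip(X, y) if l == label]
        return [mean(r[c] for r in rows) for c in cols]
    def dist(c):
        return sum((x[col] - cj) ** 2 for col, cj in zip(cols, c))
    return 1 if dist(centroid(1)) < dist(centroid(0)) else 0

# Hypothetical patent features (scaled), echoing the categories in the abstract.
names = ["inventor_past_performance", "assignee_past_performance",
         "n_referenced_patents", "n_classification_codes"]
X = [
    [0.9, 0.8, 0.7, 0.3],
    [0.8, 0.6, 0.9, 0.6],
    [0.7, 0.7, 0.5, 0.4],
    [0.2, 0.4, 0.2, 0.5],
    [0.1, 0.3, 0.3, 0.3],
    [0.3, 0.5, 0.1, 0.6],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = top-performing patent (toy labels)

selected, cols = select_features(X, y, names, k=2)
print(selected)  # ['inventor_past_performance', 'n_referenced_patents']
print(nearest_centroid(X, y, cols, [0.85, 0.2, 0.8, 0.5]))
```

A filter ranking like this is cheap and model-agnostic; wrapper methods that score subsets by classifier accuracy are a common, costlier alternative.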
Authors: Mingyang Wang, Guang Yu, Shuang An, and Daren Yu
Then, the KNN classifier is used to cross-validate the classification accuracy of the feature subsets.
Classification by KNN classifier
The KNN algorithm is amongst the simplest of all machine learning algorithms for
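Cross-validating a KNN classifier on candidate feature subsets, as described in the snippet above, might look roughly like the following. The choices of Euclidean distance, k = 3 and leave-one-out cross-validation are assumptions, since this excerpt does not state the paper's settings, and the toy data (one informative feature, one noisy one) are invented.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, subset, k=3):
    """Leave-one-out cross-validated KNN accuracy using only the features in subset."""
    proj = [[row[j] for j in subset] for row in X]
    correct = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, proj[i], k) == y[i]:
            correct += 1
    return correct / len(proj)

# Toy data: feature 0 separates the classes, feature 1 is mostly noise.
X = [
    [1.0, 0.3], [1.2, 0.9], [0.9, 0.1], [1.1, 0.6],  # class 1
    [0.1, 0.4], [0.2, 0.8], [0.0, 0.2], [0.3, 0.7],  # class 0
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

for subset in [(0,), (1,), (0, 1)]:
    print(subset, loo_accuracy(X, y, subset))
```

Because KNN has no training phase, leave-one-out evaluation only requires dropping one point from the reference set per iteration.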
Authors: Julie Callaert, Joris Grouwels, and Bart Van Looy
compare the obtained indicators in terms of occurrence and contingencies. Overall, our observations reveal non-trivial differences for both indicators.
Methodology for characterizing NPRs
A supervised machine learning approach
Authors: Shlomo Argamon, Jeff Dodick, and Paul Chase
Recently, philosophers of science have argued that the epistemological requirements of different scientific fields lead necessarily
to differences in scientific method. In this paper, we examine possible variation in how language is used in peer-reviewed
journal articles from various fields to see if features of such variation may help to elucidate and support claims of methodological
variation among the sciences. We hypothesize that significant methodological differences will be reflected in related differences
in scientists’ language style.
This paper reports a corpus-based study of peer-reviewed articles from twelve separate journals in six fields of experimental
and historical sciences. Machine learning methods were applied to compare the discourse styles of articles in different fields,
based on easily extracted linguistic features of the text. Features included function word frequencies, as often used in computational
stylistics, as well as lexical features based on systemic functional linguistics, which affords rich resources for comparative
textual analysis. We found that the style of writing in the historical sciences is indeed readily distinguishable from that
of the experimental sciences. Furthermore, the most significant linguistic features of these distinctive styles are directly
related to the methodological differences posited by philosophers of science between historical and experimental sciences,
lending empirical weight to their contentions.
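The idea of representing each article by function-word frequencies and letting a classifier separate fields can be sketched as follows. This is a toy illustration under stated assumptions: the twelve-word function-word list, the invented training sentences and the nearest-centroid classifier (swapped in for simplicity) do not come from the study, which worked with full journal articles and a richer feature set.

```python
import math
import re
from collections import Counter

# Tiny illustrative function-word list (stylometric studies typically use hundreds).
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that",
                  "we", "was", "is", "may", "this", "which"]

def function_word_profile(text):
    """Relative frequency of each function word among all word tokens in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(text, centroids):
    """Assign text to the field whose centroid profile is nearest (Euclidean)."""
    profile = function_word_profile(text)
    return min(centroids, key=lambda field: math.dist(profile, centroids[field]))

# Hypothetical training sentences, invented for illustration only.
training = {
    "experimental": [
        "We measured the rate and we recorded the yield.",
        "We heated the sample and the signal was stable.",
    ],
    "historical": [
        "This evidence may indicate that strata which formed earlier were eroded.",
        "The fossil record may suggest that this lineage, which diverged, persisted.",
    ],
}
centroids = {field: centroid([function_word_profile(t) for t in texts])
             for field, texts in training.items()}

print(classify("The record may show that this basin, which subsided, was filled.", centroids))
# → historical
```

Function words are attractive for this kind of comparison because they are largely topic-independent, so differences in their rates tend to reflect style rather than subject matter.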
Data mining is an interdisciplinary field that combines artificial intelligence, database management, data visualization, machine learning, mathematical algorithms, and statistics.
CiteSeer data, which we will briefly mention.
Zhou et al. (2007) have investigated documents from CiteSeer to discover temporal social network communities in the domains of databases and machine learning. On the other hand, Hopcroft et al. (2004
Authors: Michael Eckmann, Anderson Rocha, and Jacques Wainer
work quickly often prefer to publish in conference and workshop proceedings.
Many computer science subareas have their own top conferences and journals. For example, in machine learning, the two conferences Intl. Conference on Machine