Original Articles
 

By Dr. Juan S Yakisich
Corresponding Author Dr. Juan S Yakisich
Department of Clinical Neuroscience R54, Karolinska Institute, SE-141 86, Sweden
Submitting Author Dr. Juan S Yakisich
MISCELLANEOUS

Keywords: Scientometrics, Bibliometrics, Citation, Impact Factor, H-Index, Relevance Index, Evaluation, Article Influence.

Yakisich JS. Relevance Index (RI): Proposal of a Novel Parameter to Evaluate the Impact of Scholar Articles and Scientists for Specific Areas of Science. WebmedCentral MISCELLANEOUS 2011;2(12):WMC002770
doi: 10.9754/journal.wmc.2011.002770

This is an open-access article distributed under the terms of the Creative Commons Attribution License(CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Submitted on: 22 Dec 2011 04:14:27 PM GMT
Published on: 23 Dec 2011 06:19:39 PM GMT

Abstract


In order to objectively evaluate scientists, several parameters are often used. At present, most of these parameters reflect the impact of journals rather than the relevance of specific scientific articles or individual scientists in a particular area of science. Alternative methods, such as the increasingly popular h-index, have also been criticized because of important biases. In this manuscript, the “Relevance Index” (RI), based on the number of citing articles containing relevant keywords in their titles and/or abstracts for a particular research area, is proposed as a novel indicator of the relevance of specific articles for specific areas of science. As a case analysis, the titles and abstracts of the citing articles for two specific publications were downloaded as EndNote files from the Web of Knowledge (Institute for Scientific Information (ISI)) and exported as rich text format files. In each text file, relevant keywords for six research fields or subfields were screened using the word finder tool. The RI of each article was calculated as the number of citing articles that contain at least one of the keywords considered relevant for that specific research field or subfield. The RI of an individual scientist can be calculated as the summation of the RIs of all articles published by that researcher. Although this proposal will require validation with powerful computer-based algorithms to evaluate its potential utility, the case analysis reported here suggests that it is possible to find and use specific keywords present in the titles and abstracts of citing articles to calculate the relevance of scholarly articles, and perhaps of individual scientists, for specific areas of research.

Introduction


There is growing interest in measuring the quality of research articles and scholars. For instance, universities need to evaluate applicants when hiring professors. This task is unique since new professors are usually appointed to specific departments. Funding agencies also need to evaluate scientists in order to finance those who can best achieve the purpose of the grant. In both examples, the relevance of the researcher in a particular field of science needs to be evaluated in order to select the best candidate. The most commonly used parameters for evaluation are the number of publications and the impact factor of the journals where the articles were published (Schutte and Svec, 2007; West et al., 2010). The number of publications by itself is of very limited value since poor research can always be published. In an attempt to solve this issue, the impact factor (IF) of journals is also included. Despite the criticism (Hecht et al., 1998; Kumar et al., 2009; Rieder et al., 2010; Seglen, 1997), several major universities and funding agencies still rely on the combined use of these two parameters. Alternative ways to measure the importance of scientific journals, such as the Eigenfactor™ and the Article Influence Score™ (Bergstrom and West, 2008; Rizkallah and Sin, 2010; West et al., 2010), have been developed to try to overcome the limitations of the IF. Not all scientific articles published in the same journal (even in the same year) are cited to the same extent. For instance, two articles published in 1996 in the same journal (Mol Cell Biol) from the same laboratory (Dr. Blackburn), by Gilley et al. and Strahl et al. (Gilley and Blackburn, 1996; Strahl and Blackburn, 1996), have received different numbers of citations: 69 and 261, respectively (1996 - Jan/2011 period). This particular case is not unique, but it serves to illustrate that the impact factor alone, or any other parameter that evaluates journals instead of scientists, is a useless parameter for evaluating individuals, because individual articles have very different citation numbers.
In an attempt to evaluate individual scientists, other parameters were created, such as the h-index (Hirsch, 2005) and its variants (Schreiber, 2010), the g-index (Egghe, 2006), the scholar factor (Bourne and Fink, 2008), the Discounted Cumulated Impact (DCI) index (Ahlgren and Jarvelin, 2010), the weighted PageRank algorithm (Yan and Ding, 2011), the “single researcher impact factor” (Castelnuovo et al., 2010) and many others (Bollen et al., 2009; Kreiman and Maunsell, 2011). Although these indexes rely in part on the number of citations of articles, none of them actually measures the relevance of articles or scientists in their specific field of research, which may be one of the most important factors when considering, for instance, appointing a new professor to a specific department or awarding a research grant for a particular field.
This manuscript proposes that the relevance of specific articles in a particular field of science can be estimated by counting the number of citing articles relevant to that field or subfield. This can be achieved by analyzing the presence of specific keywords in the abstracts and titles of the citing articles (see rationale). On this basis, a novel index, here called the Relevance Index (RI), is proposed to evaluate the relevance of not only scholarly articles but also scientists for specific areas of science.

Rationale and calculation of the RI of specific articles


For practical purposes, as a case analysis example, the RIs of two "telomerase related" articles, A1 and A2 (Gilley and Blackburn, 1996; Montalto and Ray, 1996), were calculated manually as follows: 1) all citing articles were downloaded into an EndNote file and these references (titles and abstracts only) were exported into a word processor file; 2) in this file, keywords relevant to specific areas of life sciences, e.g. "telomere", "cancer", "aging", "stem cells", "diabetes" and "stroke", were screened using the word finder tool. The relevance of the article for a particular field was determined as the number of citing articles containing relevant keywords in the title and/or abstract for that particular field or subfield. Examples and results are presented in Illustrations 1 and 2; a minimal computational sketch of this keyword screening step is given below.
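The manual screening step can, in principle, also be expressed computationally. The sketch below is only illustrative: it assumes the exported references (titles and abstracts) are saved as a plain-text file with one citing article per blank-line-separated block, and the file name and keyword lists are assumptions rather than part of the original procedure. The counting mirrors the manual word-finder search, i.e. a plain case-insensitive substring match, with the ambiguities that this entails (see Discussion).

```python
# Minimal sketch of the keyword screening step, under the assumptions stated above.

FIELD_KEYWORDS = {
    "telomere_biology": ["telomere", "telomerase"],
    "cancer": ["cancer", "tumor", "tumour"],
    "aging": ["aging"],
    "stem_cells": ["stem cell"],
    "diabetes": ["diabetes"],
    "stroke": ["stroke"],
}

def relevance_index(records, keywords):
    """RI = number of citing records (title + abstract) containing at least one keyword.

    Mirrors the manual 'word finder' screening: a plain, case-insensitive
    substring search over each citing record.
    """
    hits = 0
    for record in records:
        text = record.lower()
        if any(kw.lower() in text for kw in keywords):
            hits += 1
    return hits

if __name__ == "__main__":
    # Hypothetical export file: one citing article (title + abstract) per block.
    with open("citing_articles_A1.txt", encoding="utf-8") as fh:
        records = [block for block in fh.read().split("\n\n") if block.strip()]
    for field, kws in FIELD_KEYWORDS.items():
        print(field, relevance_index(records, kws))
```

In principle, running this over the export for A1 would yield the per-field counts of the kind reported in Illustration 2, subject to the keyword ambiguities discussed in the Discussion.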
The logic behind the RI is the assumption that, if a scientific article (e.g., A1) is relevant to the cancer field (FCAN), the titles and/or abstracts of the articles that cite A1 will likely contain specific words relevant to that field. The number of citing articles containing relevant words for the cancer field (NFCAN) will therefore be a number "N" between zero and the total number of citing articles. Thus, the higher the RI, the higher the relevance of that article for that particular research field.
Therefore, we can define the relevance index of article A1 (RIA1) for the cancer field (FCAN) as follows:
RIA1(FCAN) = NFCAN
Where
RIA1(FCAN) = relevance index of article A1 for the cancer field FCAN
NFCAN = number of citing articles that contain relevant keywords for FCAN
For instance, for article A1 (which has been cited 69 times), the RI for the telomere biology field (FTEL) is 65 (RIA1(FTEL) = NFTEL = 65), since the keywords chosen to represent the telomere field ("telomerase" and "telomere") were present in the titles and/or abstracts of 65 citing articles.
In a similar way, the relevance of the same article can be determined for other fields by counting the number of citing articles containing relevant keywords for those fields. For the same article A1, the relevant words for the cancer field (FCAN), "cancer" and/or "tumor", were present in only 7 articles. Thus, the RI for the cancer field is RIA1(FCAN) = NFCAN = 7.
If we compare this value with another article (A2) that was published the same year and received only 11 citations, 6 of which contained "cancer" and/or "tumor" (RIA2(FCAN) = NFCAN = 6), we can observe that both articles have almost the same RI in the cancer field despite the difference in their absolute numbers of citations. This may compensate for the difference in citation rate between different journals.
This case analysis indicates that both articles, A1 and A2, were, as expected, more often cited (and thus more relevant) in articles in the telomere biology field. They were also cited in cancer-related articles, but not in stroke- or diabetes-related articles (see Illustration 2). Most importantly, it shows that both articles are similarly relevant in the cancer field despite the difference in the absolute number of citations (69 vs. 11) and the different 2010 IFs of the journals where the articles were published (6.188 and 5.403 for A1 and A2, respectively). For comparison, the words "cancer" and/or "tumor" were present in only 17 citing articles for another article, A3, published the same year in Nature (IF 2010 = 36.103), which received 131 citations from 1996 to 2010. This demonstrates once again that neither the IF of the journal nor the absolute number of citations an article has received is per se a reliable indicator of relevance, in this case for the cancer field.
The reason for using only the title and abstract instead of the full article is that, if the full article is relevant to a particular field, the relevant keywords for that field are likely to be included in the title and abstract at least once. This also prevents overestimating the number of relevant citing articles, since it excludes keywords appearing in affiliations (e.g., "Cancer Research Institute"), acknowledgements (e.g., "This work was supported by the American Cancer Society") or journal names in the reference list, which may be present in the full article but are not actually relevant to its content.
Each field and subfield should be represented by its most relevant keywords. In our example we used "cancer" and "tumor" for the cancer field, and "telomere" and "telomerase" for the telomere biology field (see Illustration 2).
Calculation of the RI of scientists (RIS)
For a particular scientist, the relevance for a research field, e.g. cancer (RIS(FCAN)), can be calculated as the summation of the RIs, in the cancer field, of all articles published by that scientist:

RIS(FCAN) = RIA1(FCAN) + RIA2(FCAN) + … + RIAn(FCAN)

i.e., the summation of RIAi(FCAN) over the scientist's articles i = A1 … An.

Where:
RIS(FCAN) = relevance index of a scientist (RIS) for the cancer field (FCAN)
RIA1(FCAN) = relevance index of article A1 for the cancer field
RIAn(FCAN) = relevance index of article An for the cancer field
n = number of articles published by the scientist
Variants of the RIS
When evaluating scientists, different variants of the RIS can be used depending on the need: i) the total relevance index (RIST), defined as the summation of the RIs of all articles (RIA1 … RIAn) published by the researcher. Since this variant reflects the lifetime achievement of a scientist, it could be used for the evaluation of the most senior researchers (e.g., full professor, chairman) or for purely statistical purposes (e.g., who is the most relevant scientist in the stroke field); ii) the relevance index of the scientist for the last 10 or 5 years (RIS10 or RIS5), defined as the summation of the RIAs of all articles published by the researcher in the last 10 or 5 years, respectively. These variants may be more appropriate for the evaluation of more junior positions, to avoid the disadvantage inherent to applicants who are early in their career when competing with more senior investigators. A computational sketch of these variants is given below.
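The following sketch shows how the RIST and the time-window variants could be computed, assuming the per-article, per-field RI values have already been obtained (e.g., with the keyword screening shown earlier) and that each article record carries its publication year. The record structure is an illustrative assumption; the cancer-field RI values shown for A1 and A2 are those from the case analysis, and no other values are implied.

```python
from datetime import date

# Hypothetical records for one scientist's articles. The cancer-field RI values
# are taken from the case analysis (A1 = 7, A2 = 6, telomere biology for A1 = 65);
# the dictionary layout itself is an assumption.
articles = [
    {"title": "A1", "year": 1996, "ri": {"cancer": 7, "telomere_biology": 65}},
    {"title": "A2", "year": 1996, "ri": {"cancer": 6}},
    # ... further articles by the same scientist
]

def ris_total(articles, field):
    """RIST: lifetime summation of the field-specific RI of every article."""
    return sum(a["ri"].get(field, 0) for a in articles)

def ris_window(articles, field, years, today=None):
    """RIS10 / RIS5: the same summation restricted to the last `years` years."""
    current_year = (today or date.today()).year
    return sum(a["ri"].get(field, 0)
               for a in articles
               if a["year"] >= current_year - years)

print(ris_total(articles, "cancer"))       # RIST(FCAN)
print(ris_window(articles, "cancer", 10))  # RIS10(FCAN)
print(ris_window(articles, "cancer", 5))   # RIS5(FCAN)
```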

Discussion


In this manuscript, the RIs of A1 and A2 for six research fields and subfields were calculated manually (Illustration 2) as a case analysis, in order to propose the concept rather than to provide a statistical correlation with other parameters or a definitive validation of the RI. The latter will require the development of sophisticated tools to analyze entire databases, which is beyond the scope of this article. As with other indexes, automatic (web-based) calculation systems or specialized software can be developed to facilitate the calculation of the RI and to correct and filter for potential sources of mistakes and ambiguities: for example, the keyword "aging" can also be found within "passaging" (see Illustration 1, top). One possibility is to use the existing ISI database, which already contains the citations of most scientific articles (including titles and abstracts), and to develop the above-mentioned algorithms to calculate the RI for specific fields and subfields.
The keyword(s) relevant to each field or subfield should also be carefully chosen to avoid bias. Underestimation can be avoided by choosing a set of relevant keywords for a particular field or subfield. For a broad field of research such as cancer, some articles may contain only one or a few cancer-relevant keywords in the title and abstract. For instance, while one citing article may contain only "cancer" but not "tumor" or "tumour" (Illustration 1, top), another citing article may contain other cancer-relevant keywords such as "tumor" and "chemotherapy" but not "cancer" (Illustration 1, bottom).
While for some subfields one keyword should be representative enough (e.g., diabetes, Sjögren disease, Parkinson disease), others will need more than one keyword (e.g., for telomere biology: "telomere" and "telomerase") to avoid underestimating the number of citing articles. One possible keyword-matching refinement that addresses both issues is sketched below.
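Both concerns, accidental substring matches such as "aging" inside "passaging" and fields that require a set of keywords rather than a single word, could be handled with whole-word matching against a keyword set. This is a sketch of one possible refinement of the earlier counting step, not a method described in the manuscript; note that whole-word matching in turn requires plural or spelling variants (e.g., "telomeres", "tumour") to be listed explicitly.

```python
import re

def contains_relevant_keyword(text, keywords):
    r"""Case-insensitive whole-word match for any keyword in the set.

    Word-boundary anchors (\b) prevent substring false positives, e.g. the
    keyword "aging" matching inside "passaging"; the trade-off is that
    variants such as plurals ("telomeres") must be added to the set explicitly.
    """
    pattern = r"\b(?:" + "|".join(re.escape(kw) for kw in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# The ambiguity noted above:
print(contains_relevant_keyword("serial passaging of human fibroblasts", ["aging"]))   # False
print(contains_relevant_keyword("replicative aging of human fibroblasts", ["aging"]))  # True
# A multi-keyword field such as telomere biology:
print(contains_relevant_keyword("regulation of telomerase activity", ["telomere", "telomerase"]))  # True
```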
Despite the apparent limitations and possible biases that the RI may have, it may constitute a reliable measure for specific subfields where one or two relevant keywords are likely to be present in most articles that cite the original article. For instance, if a scientist publishes an article on Sjögren or Parkinson disease, most of the articles that cite his/her work are likely to contain "Sjögren" or "Parkinson" either in the title or in the abstract.
The RI can be refined in several ways to improve its accuracy with respect to factors that have been criticized in other indexes: i) self-citations can be removed from the total list of citations, and co-authorship can be fractionalized as described for the h-index (Schreiber, 2009); ii) other types of articles that contribute to a favorable citation bias, such as review articles and editorials, can also be excluded (a minimal filtering sketch follows).
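As a sketch of refinements (i) and (ii), citing records could be filtered before the keyword counting step. The record fields used here (authors and article_type) are hypothetical, standing in for whatever metadata a citation database export provides; fractionalized co-author counting (Schreiber, 2009) is not shown.

```python
def filter_citing_records(records, cited_authors):
    """Remove self-citations and citation-inflating article types before computing the RI.

    Each record is assumed to be a dict with hypothetical 'authors' (list of
    names) and 'article_type' fields alongside the title/abstract text used
    for keyword screening.
    """
    excluded_types = {"Review", "Editorial"}
    cited = {name.lower() for name in cited_authors}
    kept = []
    for rec in records:
        if rec.get("article_type") in excluded_types:
            continue  # exclude reviews and editorials (favorable citation bias)
        if any(name.lower() in cited for name in rec.get("authors", [])):
            continue  # self-citation: shares at least one author with the cited article
        kept.append(rec)
    return kept

# Minimal example with hypothetical records citing A1:
records = [
    {"authors": ["Gilley D", "Blackburn EH"], "article_type": "Article", "text": "..."},
    {"authors": ["Smith J"], "article_type": "Review", "text": "..."},
    {"authors": ["Lee K"], "article_type": "Article", "text": "..."},
]
print(len(filter_citing_records(records, cited_authors=["Gilley D", "Blackburn EH"])))  # 1
```

The filtered records would then be passed to the same relevance_index counting shown earlier.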
The RI is a flexible parameter and has the potential to be developed into a useful tool to evaluate the scientific relevance of individual scholarly articles and scientists for specific areas of research. Its flexibility can be exploited to define the criteria used to evaluate the relevance of an individual scientist for a particular application. If the RI proves to be useful, it should, like other indexes, be used with caution and only as an accessory parameter to evaluate the relevance of researchers.

References


1. Ahlgren, P., and Jarvelin, K. (2010). Measuring Impact of Twelve Information Scientists Using the DCI Index. Journal of the American Society for Information Science and Technology 61, 1424-1439.
2. Bergstrom, C. T., and West, J. D. (2008). Assessing citations with the Eigenfactor (TM) Metrics. Neurology 71, 1850-1851.
3. Bollen, J., Van de Sompel, H., Hagberg, A., and Chute, R. (2009). A principal component analysis of 39 scientific impact measures. PLoS One 4, e6022.
4. Bourne, P. E., and Fink, J. L. (2008). I am not a scientist, I am a number. PLoS Comput Biol 4, e1000247.
5. Castelnuovo, G., Limonta, D., Sarmiento, L., and Molinari, E. (2010). A more comprehensive index in the evaluation of scientific research: the single researcher impact factor proposal. Clin Pract Epidemiol Ment Health 6, 109-114.
6. Egghe, L. (2006). Theory and practise of the g-index. Scientometrics 69, 131-152.
7. Gilley, D., and Blackburn, E. H. (1996). Specific RNA residue interactions required for enzymatic functions of Tetrahymena telomerase. Molecular and Cellular Biology 16, 66-75.
8. Golubovskaya, V. M., Presnell, S. C., Hooth, M. J., Smith, G. J., and Kaufmann, W. K. (1997). Expression of telomerase in normal and malignant rat hepatic epithelia. Oncogene 15, 1233-1240.
9. Hamilton, S. E., Pitts, A. E., Katipally, R. R., Jia, X. Y., Rutter, J. P., Davies, B. A., Shay, J. W., Wright, W. E., and Corey, D. R. (1997). Identification of determinants for inhibitor binding within the RNA active site of human telomerase using PNA scanning. Biochemistry 36, 11873-11880.
10. Hecht, F., Hecht, B. K., and Sandberg, A. A. (1998). The journal "impact factor": a misnamed, misleading, misused measure. Cancer Genet Cytogenet 104, 77-81.
11. Hirsch, J. E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America 102, 16569-16572.
12. Kreiman, G., and Maunsell, J. H. (2011). Nine criteria for a measure of scientific output. Front Comput Neurosci 5, 48.
13. Kumar, V., Upadhyay, S., and Medhi, B. (2009). Impact of the impact factor in biomedical research: its use and misuse. Singapore Med J 50, 752-755.
14. Montalto, M. C., and Ray, F. A. (1996). Telomerase activation during the linear evolution of human fibroblasts to tumorigenicity in nude mice. Carcinogenesis 17, 2631-2634.
15. Rieder, S., Bruse, C. S., Michalski, C. W., Kleeff, J., and Friess, H. (2010). The impact factor ranking-a challenge for scientists and publishers. Langenbecks Archives of Surgery 395, S57-S61.
16. Rizkallah, J., and Sin, D. D. (2010). Integrative approach to quality assessment of medical journals using impact factor, eigenfactor, and article influence scores. PLoS One 5, e10204.
17. Schreiber, M. (2009). The influence of self-citation corrections and the fractionalised counting of multi-authored manuscripts on the Hirsch index. Annalen Der Physik 18, 607-621.
18. Schreiber, M. (2010). Twenty Hirsch index variants and other indicators giving more or less preference to highly cited papers. Annalen Der Physik 522, 536-554.
19. Schutte, H. K., and Svec, J. G. (2007). Reaction of Folia Phoniatrica et Logopaedica on the current trend of impact factor measures. Folia Phoniatrica Et Logopaedica 59, 281-285.
20. Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. British Medical Journal 314, 498-502.
21. Strahl, C., and Blackburn, E. H. (1996). Effects of reverse transcriptase inhibitors on telomere length and telomerase activity in two immortalized human cell lines. Molecular and Cellular Biology 16, 53-65.
22. West, J. D., Bergstrom, T. C., and Bergstrom, C. T. (2010). The Eigenfactor Metrics (TM): A Network Approach to Assessing Scholarly Journals. College & Research Libraries 71, 236-244.
23. Yan, E. J., and Ding, Y. (2011). Discovering author impact: A PageRank perspective. Information Processing & Management 47, 125-134.

Source(s) of Funding


Research in the author’s lab is supported by grants from the Swedish Research Council and the Karolinska Institute. Helpful discussion with members of the Karolinska Institute Library is acknowledged.

Competing Interests


The author declares no conflicts of interest.

