Measuring impact: scholarly communication and social media

Johan Bollen
School of Informatics and Computing
Indiana University
Bloomington, IN
jbollen@indiana.edu

Overview

  1. Scholarly Impact
  2. Impact metrics
  3. New metrics
  4. Social media

cf. Stevan Harnad: “The aim is to report their findings to their peers and contribute to the ongoing cycle of creating more knowledge. They don’t want to make money from their texts but to reach as many minds as possible.”

Science as a gift economy

  1. Scientists “give” ideas to community
  2. Gift is acknowledged by citation
  3. Status in community is measured by number of citations

More citations = better! Hence citation-based impact metrics…

Myriad of impact metrics based on citations.

  1. Cites: articles, authors, journals
  2. h-index: largest n such that n publications each have at least n citations
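The h-index definition above can be sketched in a few lines. This is only an illustration: the function name `h_index` and the citation counts are made up for the example and do not come from Scholarometer or any other tool.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited publication first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the rank-th publication still has >= rank citations
        else:
            break         # every later publication has even fewer citations
    return h


# Hypothetical citation counts for one author's five publications.
example = [10, 8, 5, 4, 3]
print(h_index(example))  # -> 4
```

Four publications have at least 4 citations each, but there are not five with at least 5, so h = 4.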

Kaur J et al. (2012) Scholarometer: A Social Framework for Analyzing Impact across Disciplines. PLoS ONE 7(9): e43235.

Journal Impact Factor

Mean 2-year journal citation count: \[ IF_j = \frac{\sum_i w_{ij}}{N_j} \]
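A small worked example of the formula, assuming the usual reading of its symbols: w_ij is the number of citations from journal i, in the census year, to articles that journal j published in the two preceding years, and N_j is the number of articles j published in those two years. The journal names and counts are hypothetical.

```python
def impact_factor(citations_to_j, n_articles_j):
    """IF_j = (sum over citing journals i of w_ij) / N_j."""
    if n_articles_j <= 0:
        raise ValueError("N_j must be positive")
    return sum(citations_to_j.values()) / n_articles_j


# Hypothetical counts: citations received by journal j from three citing journals.
w_to_j = {"Journal A": 120, "Journal B": 45, "Journal C": 35}
n_j = 80  # articles journal j published in the two-year window (made up)

print(impact_factor(w_to_j, n_j))  # -> 2.5
```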

New developments: PageRank, Eigenfactor

\[ P_i = (1-\lambda) + \lambda\sum_{j \rightarrow i} \frac{P_j}{O(j)} \]

\[ PR_w(v_i) = \frac{1-\lambda}{N} + \lambda \sum_j PR_w(v_j) \times w(v_j,v_i) \]

Johan Bollen, Marko A. Rodriguez, and Herbert Van de Sompel. Journal status. Scientometrics, 69(3), December 2006 (arxiv.org:cs.DL/0601030, DOI: 10.1007/s11192-006-0176-z)
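The weighted PageRank recurrence can be sketched as a plain power iteration. This is a minimal illustration, not the implementation from the cited paper. It assumes that w(v_j, v_i) is the raw citation weight from v_j to v_i divided by v_j's total outgoing weight, and that the rank of nodes without outgoing links is spread uniformly. The function name, the default λ = 0.85 and the toy graph are all hypothetical.

```python
def weighted_pagerank(weights, lam=0.85, tol=1e-12, max_iter=1000):
    """weights[src][dst] = raw weight of the link src -> dst."""
    nodes = sorted(set(weights) | {d for out in weights.values() for d in out})
    n = len(nodes)
    out_total = {v: sum(weights.get(v, {}).values()) for v in nodes}
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: rank held by nodes without out-links is shared by all nodes.
        dangling = sum(pr[v] for v in nodes if out_total[v] == 0)
        new = {v: (1.0 - lam) / n + lam * dangling / n for v in nodes}
        for src, out in weights.items():
            if out_total[src] == 0:
                continue
            for dst, w in out.items():
                # lam * PR_w(v_j) * w(v_j, v_i), with w normalized per source
                new[dst] += lam * pr[src] * w / out_total[src]
        delta = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:
            break
    return pr


# Hypothetical citation counts between three journals.
toy = {"A": {"B": 3, "C": 1}, "B": {"C": 2}, "C": {"A": 4}}
ranks = weighted_pagerank(toy)
for journal in sorted(ranks, key=ranks.get, reverse=True):
    print(journal, round(ranks[journal], 4))
```

With this normalization the scores sum to 1, so they can be read as a probability distribution over journals.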

Two major issues.

  1. data: publication delay, domain-dependency, and authors only.
  2. metrics: ignore network properties

Wait… did I say “two”? That was 2006. Now there are three.

  1. Data
  2. Metrics
  3. New modes of scholarly communication
    • Online ecology
    • Social media

Beyond citation impact

Attention, impact, influence, Kloudness, overlaid with social media.

Our own work

  1. MESUR: Impact metrics from large-scale usage data
  2. Twitter analytics for scholarly impact

(1) measures impact from scholarly usage data. (2) measures impact from attention indicators in social media

MESUR

  1. 2006-2009: LANL - Andrew W. Mellon
  2. 2009-2012: IU - Andrew W. Mellon and NSF

Acknowledgement: Herbert Van de Sompel and LANL RL Digital Library Prototyping and Research group

Aggregation of usage data

MESUR data flow

MESUR providers

BMC, Blackwell, UC, CSU (23), EBSCO, ELSEVIER, EMERALD, INGENTA, JSTOR, LANL, MIMAS/ZETOC, THOMSON, UPENN, UTEXAS (9)

1,000,000,000 usage events, 500,000,000+ citations, 50M+ articles, roughly 100,000 serials

Data structure

Clickstreams:

Session IDs (or anonymized user ID)

Mapping citation relations to usage relations

Converting usage data to usage networks
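One way to picture this conversion is sketched below. It assumes that two journals become linked whenever they are requested consecutively within the same session, and that the edge weight counts how often that happens. The event layout `(session_id, timestamp, journal)` and the function name are hypothetical, not the MESUR schema.

```python
from collections import defaultdict


def usage_network(events):
    """Build weighted directed edges from (session_id, timestamp, journal) events."""
    by_session = defaultdict(list)
    for session_id, timestamp, journal in events:
        by_session[session_id].append((timestamp, journal))

    edges = defaultdict(int)
    for requests in by_session.values():
        requests.sort()  # order each session's requests by timestamp
        for (_, a), (_, b) in zip(requests, requests[1:]):
            if a != b:   # ignore repeated requests to the same journal
                edges[(a, b)] += 1
    return dict(edges)


# Hypothetical clickstream: three sessions.
events = [
    ("s1", 1, "Science"), ("s1", 2, "PNAS"), ("s1", 3, "Chemosphere"),
    ("s2", 1, "Science"), ("s2", 2, "PNAS"),
    ("s3", 5, "PNAS"),
]
network = usage_network(events)
print(network)
```

Sessions with a single request, like `s3`, contribute no edges.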

Example: Betweenness centrality

\[ C_b(v_k) = \sum_{i \neq j \neq k}\frac{\sigma _{i,j}(v_k)}{\sigma _{i,j}} \]

  1. Science
  2. PNAS
  3. Environmental Health Perspectives
  4. Chemosphere
  5. Journal of Advanced Nursing
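The betweenness formula can be evaluated with Brandes' algorithm. The sketch below covers unweighted graphs only and sums over ordered pairs (i, j), as the formula is written, so an undirected graph (stored with both edge directions) counts each pair twice. It is an illustration on a toy graph, not the computation behind the ranking above. The function name and node labels are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """adj[v] = iterable of nodes reachable from v in one step."""
    nodes = set(adj) | {w for nbrs in adj.values() for w in nbrs}
    cb = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:                 # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


# Toy undirected path a - b - c, stored with both edge directions.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = betweenness(path)
print(scores["b"])  # -> 2.0, from the ordered pairs (a, c) and (c, a)
```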

Metrics survey: 39 metrics

MESUR now

  1. Andrew W. Mellon foundation grant to develop sustainable, community-supported environment
  2. Ongoing work:
    • collecting usage data
    • remapping science
    • tracking innovation (NSF grant)

From usage data to social media indicators

Temporal effects?

The data:

The numbers…

                    N           articles   period
arXiv downloads     2,904,816   4,606      10/04/2010 - 05/09/2011
Twitter mentions    5,752       4,415      10/04/2010 - 05/09/2011
early citations     431         70         10/04/2010 - 09/30/2011

As you can see, pretty preliminary data!

R=0.505

R=0.452

R=0.387

Multivariate linear regression analysis

Citations C vs. Twitter mentions T, arXiv downloads of the article A, and time since submission of the article P

\[ C=\beta _{1}T+\beta _{2}A+\beta _{3}P+\varepsilon _3 \]

\[ \beta_{1} = 0.120^{***}, \quad \text{std. error} = 0.040 \]

\[ \beta_{2} = 0.0001, \quad \text{std. error} = 0.00008 \]

\[ \beta_{3} = 0.041^{**}, \quad \text{std. error} = 0.019 \]
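For readers who want to see how coefficients of this form are estimated, here is a minimal ordinary least squares sketch for the model C = β1 T + β2 A + β3 P (no intercept, as in the equation above). It solves the normal equations directly. The data are synthetic and noise-free, generated from made-up coefficients, so the output says nothing about the actual arXiv and Twitter results. All function names are hypothetical.

```python
def solve_linear(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_ols(rows, y):
    """Least squares without intercept: solve (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic articles: (Twitter mentions T, downloads A, days since submission P).
rows = [(5, 1000, 30), (0, 200, 60), (12, 5000, 10), (3, 800, 90), (1, 100, 45)]
true_beta = (0.5, 0.002, 0.1)  # made-up coefficients used to generate C
y = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]

beta = fit_ols(rows, y)
print([round(b, 6) for b in beta])
```

Because the synthetic citations contain no noise, the fit recovers the generating coefficients up to rounding. Standard errors and significance stars like those quoted above need a residual-variance estimate, which this sketch leaves out.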

Conclusion

  1. Impact is not what we think it is
    • citation, usage, and perhaps even social media
    • Multi-dimensional concept
    • Projection of underlying social network dynamics?
  2. Scholarship is changing
    • Increasingly online, digital: everything can get published
    • Social media and crowd-sourcing play an increasing role
    • Challenging peer review and existing funding models
    • We need impact assessment that changes too

ready for something radical?

a GEDANKENEXPERIMENT:

In 2010 alone the NSF convened panels of more than 15,000 scientists to review 55,542 proposals.

Running the circus from the monkey cage experts

From funding agencies to crowd!

Chaos?

receive: \[ A^t _i = B + \sum _{j \in [1,N]} O _{j \rightarrow i}^{t-1} \]

give: \[ \sum _{j \in [1,N]} O^t _{i \rightarrow j} = F A^{t} _i \quad \forall i \in [1,N] \]

This is PageRank… Funding circulates through social network, settles into steady-state shaped by collective action?
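The receive/give rules can be iterated directly to see the steady state emerge. The sketch below assumes each scientist splits the fraction F of their funding according to a fixed row of preference weights; that allocation rule, the three-person matrix and the function name are hypothetical. B and F take the values quoted on the Simulation slide.

```python
def simulate_funding(weights, base=100_000.0, fraction=0.5, years=200):
    """weights[i][j] = share of scientist i's donations that goes to scientist j."""
    n = len(weights)
    received = [0.0] * n          # donations received in the previous year
    funding = [base] * n
    for _ in range(years):
        # receive: A_i^t = B + sum_j O_{j->i}^{t-1}
        funding = [base + received[i] for i in range(n)]
        # give: scientist i passes on a total of F * A_i^t, split by weights[i]
        received = [0.0] * n
        for i in range(n):
            give = fraction * funding[i]
            for j in range(n):
                received[j] += give * weights[i][j]
    return funding


# Hypothetical preferences: scientists 1 and 2 give everything to scientist 0,
# who splits donations evenly between them. Each row sums to 1.
weights = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
funding = simulate_funding(weights)
print([round(a, 2) for a in funding])
```

Summing the receive rule over all scientists shows that total annual funding settles at N·B/(1−F), here 3 × $200,000. How that total is divided depends only on the preference weights.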

Features (presumed)

  1. Fair: base amount of funding, yet merit-based
  2. Robust: distributed, decisions made by the community instead of 3 reviewers
  3. Responsive: distribution changes year-to-year, base funding stable
  4. Efficient: Five minutes/year, log in to online brokerage system
  5. Conflict of interest and certification

Simulation

Assumptions:

We took:

B=$100,000

F=0.5; simulated vs. actual NSF and NIH funding: R=0.2683, ρ=0.2999

Selected Publications

Xin Shuai, Alberto Pepe, Johan Bollen. How the Scientific Community Reacts to Newly Submitted Preprints: Article Downloads, Twitter Mentions, and Citations. PLoS ONE (in press), 2012.

Johan Bollen, Herbert Van de Sompel, Aric Hagberg and Ryan Chute. A Principal Component Analysis of 39 Scientific Impact Measures. PLoS ONE, June 2009. URL: http://dx.plos.org/10.1371/journal.pone.0006022.

Michael Kurtz and Johan Bollen. Usage bibliometrics. Annual Review of Information Science and Technology. Volume 44 (2010), 2009.

Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, Chute R, et al. 2009 Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE 4(3): e4803. doi:10.1371/journal.pone.0004803.

Johan Bollen, Herbert Van de Sompel and Marko A. Rodriguez. Towards usage-based impact metrics: first results from the MESUR project, JCDL 2008, Pittsburgh, PA, June 2008. (arXiv:0804.3791v1, best paper finalist)

Johan Bollen and Herbert Van de Sompel. Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics. Journal of the American Society for Information Science and Technology, 59(1), January 2008, pages 1-14 (cs.DL/0610154).