Johan Bollen
School of Informatics and Computing
Indiana University
Bloomington, IN
jbollen@indiana.edu
Overview
- Scholarly Impact
- Impact metrics
- New metrics
- Social media
cf. Stevan Harnad: “The aim is to report their findings to
their peers and contribute to the ongoing cycle of creating
more knowledge. They don’t want to make money from their texts
but to reach as many minds as possible.”
Science as a gift economy
- Scientists “give” ideas to community
- Gift is acknowledged by citation
- Status in community is measured by number of citations
More citations = better!
Hence citation-based impact metrics…
Myriad of impact metrics based on citations.
- Cites: articles, authors, journals
- h-index: the largest n such that n publications each have at least n citations
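A minimal sketch of the h-index computation, assuming the only input is a list of per-publication citation counts (the function name and the sample counts are illustrative, not taken from Scholarometer):

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only decrease from here on
    return h


# Hypothetical author with six papers.
print(h_index([10, 8, 5, 4, 3, 0]))  # -> 4
```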
Kaur J et al. (2012)
Scholarometer: A Social Framework for Analyzing Impact across Disciplines.
PLoS ONE 7(9): e43235.
Journal Impact Factor
Mean 2-year journal citation count:
\[ IF_j = \frac{\sum_i w_{ij}}{N_j} \]
\[ P_i = (1-\lambda) + \lambda\sum_j \frac{P_j}{O(j)} \]
\[ PR_w(v_i) = \frac{\left(1-\lambda\right)}{N} + \lambda \sum_j PR_w(v_j) \times w(v_j,v_i) \]
Johan Bollen, Marko A. Rodriguez, and Herbert Van de Sompel.
Journal status. Scientometrics, 69(3),
December 2006 (arxiv.org:cs.DL/0601030, DOI: 10.1007/s11192-006-0176-z)
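The impact factor and the weighted PageRank formula above can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the cited paper: w_ij is read as the number of citations from journal i to journal j in the 2-year window, N_j as the number of articles journal j published in that window, and w(v_j, v_i) as the share of journal j's outgoing citations that go to journal i. The uniform handling of journals without outgoing citations is also an assumption.

```python
def impact_factor(citations_received, n_articles):
    """IF_j: citations received in the window divided by articles published."""
    return sum(citations_received) / n_articles


def weighted_pagerank(weights, lam=0.85, iterations=100):
    """weights[j][i] = raw citation count from journal j to journal i."""
    nodes = sorted(set(weights) | {i for out in weights.values() for i in out})
    n = len(nodes)
    # Normalize outgoing weights so each journal's shares sum to 1.
    norm = {}
    for j in nodes:
        out = weights.get(j, {})
        total = sum(out.values())
        norm[j] = {i: w / total for i, w in out.items()} if total else {}
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - lam) / n for v in nodes}
        for j in nodes:
            if norm[j]:
                for i, w in norm[j].items():
                    new[i] += lam * pr[j] * w
            else:
                # Assumption: a journal citing nobody spreads its score evenly.
                for i in nodes:
                    new[i] += lam * pr[j] / n
        pr = new
    return pr


# Toy citation counts between three hypothetical journals.
toy = {"A": {"B": 3}, "B": {"A": 1}, "C": {"B": 1}}
ranks = weighted_pagerank(toy)
print(sorted(ranks, key=ranks.get, reverse=True))  # -> ['B', 'A', 'C']
```

Unlike the impact factor, the PageRank score of a journal depends on who cites it, not only on how often it is cited.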
Two major issues.
- data: publication delay, domain-dependency, and authors only.
- metrics: ignore network properties
Wait… did I say “two”? That was 2006. Now there are three.
- Data
- Metrics
- New modes of scholarly communication
- Online ecology
- Social media
Beyond citation impact
- Impact: interactions with peer-reviewed publications
- Social media: blogs, Twitter, Facebook, web pages, etc.
- Network of communication items, authors, readers, commentators, aggregators, etc.
attention, impact, influence, Kloudness, overlaid with social media.
Our own work
- MESUR: Impact metrics from large-scale usage data
- Twitter analytics for scholarly impact
(1) measures impact from scholarly usage data.
(2) measures impact from attention indicators in social media
MESUR
- 2006-2009: LANL - Andrew W. Mellon
- 2009-2012: IU - Andrew W. Mellon and NSF
Acknowledgement: Herbert van de Sompel and LANL RL Digital Library Prototyping and Research group
Aggregation of usage data
MESUR data flow
MESUR providers
BMC, Blackwell, UC, CSU (23), EBSCO, ELSEVIER, EMERALD,
INGENTA, JSTOR, LANL, MIMAS/ZETOC, THOMSON, UPENN, UTEXAS (9)
1,000,000,000 usage events, 500,000,000+ citations,
50M+ articles, ~100,000 serials
Data structure
- Separate user requests
- Date-time stamp down to second
- Document identifier or sufficient metadata to de-duplicate
- Request type identifier
Clickstreams:
Session IDs (or anonymized user ID)
Mapping citation relations to usage relations
Converting usage data to usage networks
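One way to turn clickstreams into a usage network, sketched under an assumption the slides do not spell out: two documents are linked whenever they are requested consecutively within the same session, and the edge weight counts how often that happens. The record layout (session ID, timestamp, document ID) and all names below are illustrative.

```python
from collections import defaultdict


def usage_network(events):
    """events: iterable of (session_id, timestamp, doc_id) request records."""
    sessions = defaultdict(list)
    for session_id, timestamp, doc_id in events:
        sessions[session_id].append((timestamp, doc_id))

    edges = defaultdict(int)
    for requests in sessions.values():
        requests.sort()  # order each session's requests by timestamp
        docs = [doc for _, doc in requests]
        for src, dst in zip(docs, docs[1:]):
            if src != dst:  # ignore repeated requests for the same document
                edges[(src, dst)] += 1
    return dict(edges)


# Hypothetical log: two sessions, ISO timestamps with second resolution.
log = [
    ("s1", "2010-10-04T09:00:01", "J1"),
    ("s1", "2010-10-04T09:00:40", "J2"),
    ("s1", "2010-10-04T09:02:10", "J3"),
    ("s2", "2010-10-04T11:15:00", "J1"),
    ("s2", "2010-10-04T11:15:30", "J2"),
]
print(usage_network(log))  # -> {('J1', 'J2'): 2, ('J2', 'J3'): 1}
```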
Example: Betweenness centrality
\[ C_b(v_k) = \sum_{i \neq j \neq k}\frac{\sigma _{i,j}(v_k)}{\sigma _{i,j}} \]
| Rank | Journal |
|---|---|
| 1 | Science |
| 2 | PNAS |
| 3 | Environmental Health Perspectives |
| 4 | Chemosphere |
| 5 | Journal of Advanced Nursing |
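A small sketch of the betweenness formula above, using Brandes' accumulation scheme on an unweighted directed graph. It counts ordered pairs (i, j), as in the sum over i ≠ j ≠ k. The toy graph is hypothetical and far smaller than any real usage network; a production analysis would more likely rely on a graph library.

```python
from collections import deque


def betweenness(graph):
    """graph: dict mapping node -> iterable of successor nodes."""
    nodes = set(graph) | {w for nbrs in graph.values() for w in nbrs}
    cb = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s: count shortest paths (sigma)
        # and remember each node's predecessors on those paths.
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating the share of
        # shortest paths from s that pass through each node.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


# Three hypothetical journals in a chain: J2 bridges J1 and J3.
toy = {"J1": ["J2"], "J2": ["J1", "J3"], "J3": ["J2"]}
scores = betweenness(toy)
print(max(scores, key=scores.get))  # -> J2
```

Journals with high betweenness sit on many shortest paths between other journals, which is why interdisciplinary titles can rank highly.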
Metrics survey: 39 metrics
MESUR now
- Andrew W. Mellon foundation grant to develop sustainable, community-supported environment
- Ongoing work:
- collecting usage data
- remapping science
- tracking innovation (NSF grant)
- scholarly usage data -> scholarly community
- growing importance of social media: 750M people on Facebook, 400M on Twitter!
- related to scholarly impact?
Temporal effects?
The data:
The numbers…
| Indicator | N | articles | period |
|---|---|---|---|
| arXiv downloads | 2,904,816 | 4,606 | 10/04/2010 - 05/09/2011 |
| Twitter mentions | 5,752 | 4,415 | 10/04/2010 - 05/09/2011 |
| early citations | 431 | 70 | 10/04/2010 - 09/30/2011 |
As you can see, pretty preliminary data!
R=0.505
R=0.452
R=0.387
Multivariate linear regression analysis
Citations C vs. Twitter mentions T, arXiv downloads A, and time since submission P
\[ C=\beta _{1}T+\beta _{2}A+\beta _{3}P+\varepsilon _3 \]
\[ \beta _{1} = 0.120 ^{***}, \quad \text{std. error} = 0.040 \]
\[ \beta _{2} = 0.0001, \quad \text{std. error} = 0.00008 \]
\[ \beta _{3} = 0.041 ^{**}, \quad \text{std. error} = 0.019 \]
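To make the regression model concrete, here is a minimal least-squares sketch for C = β1·T + β2·A + β3·P with no intercept, solved through the normal equations. The five data rows are synthetic and noise-free, built only so that the fit recovers the coefficients it was generated from; they are not the study's data, and the function name is illustrative.

```python
def ols_no_intercept(rows, y):
    """Least-squares coefficients for y ~ rows @ beta (no intercept term)."""
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y, stored as an augmented matrix.
    aug = [
        [sum(r[a] * r[b] for r in rows) for b in range(k)]
        + [sum(r[a] * yi for r, yi in zip(rows, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic articles: (Twitter mentions T, arXiv downloads A, time P).
X = [(0, 200, 30), (2, 900, 60), (5, 1500, 90), (1, 400, 120), (12, 5000, 45)]
true_beta = (0.120, 0.0001, 0.041)
C = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = ols_no_intercept(X, C)
print([round(b, 4) for b in beta])
```

With real data the fit would also need standard errors and significance tests, which a statistics package provides.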
Conclusion
- Impact is not what we think it is
- citation, usage, and perhaps even social media
- Multi-dimensional concept
- Projection of underlying social network dynamics?
- Scholarship is changing
- Increasingly online, digital: everything can get published
- Social media and crowd-sourcing play an increasing role
- Challenging peer review and existing funding models
- We need impact assessment that changes too
ready for something radical?
a GEDANKENEXPERIMENT:
- “follow the money”: change funding model so as to change science
- acknowledge and even rely on “gift economy” aspect of science
- embrace the network
In 2010 alone the NSF convened panels of more than 15,000 scientists
to review 55,542 proposals.
- Billions of dollars distributed in this manner.
- Enormous overhead costs:
- Scientists writing proposals about what they will do rather than doing it.
- Reviewing, project administration, budget management
- Effects on science
- Project focus
- Innovation?
- Academic freedom?
- Competition.
Running the circus from the monkey cage experts
From funding agencies to crowd!
Chaos?
receive:
\[ A^t _i = B + \sum _{j \in [1,N]} O _{j \rightarrow i}^{t-1} \]
give:
\[ \sum _{j \in [1,N]} O^t _{i \rightarrow j} = F A^{t} _i \,\,\,\,\, \forall i \in [1,N] \]
This is PageRank… Funding circulates through social network, settles into steady-state shaped by collective action?
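The receive/give rules above can be iterated directly. The sketch below assumes each scientist splits the donated fraction F according to fixed shares; the three scientists and their shares are hypothetical, and only B and F follow the notation of the equations.

```python
def simulate_funding(preferences, base=100_000.0, fraction=0.5, years=50):
    """preferences[i][j] = share of i's donations given to j (rows sum to 1)."""
    scientists = list(preferences)
    received = {i: 0.0 for i in scientists}  # donations from the previous year
    funding = {}
    for _ in range(years):
        # Receive: A_i^t = B + sum_j O_{j->i}^{t-1}
        funding = {i: base + received[i] for i in scientists}
        # Give: scientist i passes on F * A_i^t, split by preference shares.
        received = {i: 0.0 for i in scientists}
        for i in scientists:
            for j, share in preferences[i].items():
                received[j] += fraction * funding[i] * share
    return funding


# Hypothetical community: alice and carol both donate to bob,
# bob splits his donations evenly between them.
prefs = {
    "alice": {"bob": 1.0},
    "bob": {"alice": 0.5, "carol": 0.5},
    "carol": {"bob": 1.0},
}
result = simulate_funding(prefs)
print({name: round(amount) for name, amount in result.items()})
# -> {'alice': 166667, 'bob': 266667, 'carol': 166667}
```

In this toy run the amounts settle quickly: total funding converges to N·B/(1−F), and the split between scientists reflects who the community chooses to support.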
Features (presumed)
- Fair: base amount of funding, yet merit-based
- Robust: distributed, decisions made by the community instead of 3 reviewers
- Responsive: distribution changes year-to-year, base funding stable
- Efficient: Five minutes/year, log in to online brokerage system
- Conflict of interest and certification
Simulation
Assumptions:
- Citation: acknowledgement of intellectual “debt”
- Those we cite, we’d donate to?
We took:
- 20 years (1992-2010) of WoS citation data (donated by LANL, from TR): 37M papers, 770M citations, 867,872 authors (at least 1 paper per year)
- resolved all citations, author to author
- ran the system under the assumption that authors donate according to their past 5 years of citation
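The donation rule in the last step could look roughly like this: each author's donation shares are proportional to how often they cited each other author during the preceding five years. The record layout, the function name, and the exclusion of self-citations are assumptions made for illustration.

```python
from collections import Counter


def donation_shares(citations, year, window=5):
    """citations: iterable of (citing_author, cited_author, citation_year)."""
    counts = {}
    for citing, cited, cite_year in citations:
        in_window = year - window <= cite_year < year
        if in_window and citing != cited:  # assumption: skip self-citations
            counts.setdefault(citing, Counter())[cited] += 1
    # Convert raw counts to shares that sum to 1 for each citing author.
    return {
        author: {cited: n / sum(c.values()) for cited, n in c.items()}
        for author, c in counts.items()
    }


# Hypothetical author-to-author citation records.
records = [
    ("a", "b", 2008), ("a", "b", 2009), ("a", "c", 2009),
    ("a", "d", 2001),  # outside the 5-year window for 2010
    ("b", "a", 2009),
]
shares = donation_shares(records, year=2010)
print(sorted(shares["a"]))  # -> ['b', 'c']
```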
B=$100,000
F=0.5, vs. actual NSF and NIH, R=0.2683, ρ = 0.2999
Selected Publications
Xin Shuai, Alberto Pepe, Johan Bollen. How the Scientific Community Reacts to Newly Submitted Preprints: Article Downloads, Twitter Mentions, and Citations. PLoS ONE (in press), 2012.
Johan Bollen, Herbert Van de Sompel, Aric Hagberg and Ryan Chute. A Principal Component Analysis of 39 Scientific Impact Measures. PLoS ONE, June 2009. URL: http://dx.plos.org/10.1371/journal.pone.0006022.
Michael Kurtz and Johan Bollen. Usage bibliometrics. Annual Review of Information Science and Technology. Volume 44 (2010), 2009.
Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, Chute R, et al. 2009 Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE 4(3): e4803. doi:10.1371/journal.pone.0004803.
Johan Bollen, Herbert Van de Sompel and Marko A. Rodriguez. Towards usage-based impact metrics: first results from the MESUR project, JCDL 2008, Pittsburgh, PA, June 2008. (arXiv:0804.3791v1, best paper finalist)
Johan Bollen and Herbert Van de Sompel. Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics. Journal of the American Society for Information Science and Technology, 59(1), January 2008, pages 1-14 (cs.DL/0610154).