Identification, location and temporal evolution of topics
Data and algorithm — comparison of approaches
Library and Information Centre of the Hungarian Academy of Sciences, 29-30. Aug. 2016.
Conference summary
The timely problem
From science studies to research evaluation to science policy, there is an increasing need for
trustworthy information on how the science system is organized and evolving, where research fronts
are located, and so on. The branch of scientometrics called science mapping
has developed a wide variety of methods to address such issues. In fact, it has reached a point where a next generation of questions
has naturally arisen: How to identify the most suitable methods? What benchmarks to use for validating
results of topic detection and for delineating fields of science? How should field experts and their expertise
be engaged? How, and to what extent, can research evaluation or science policy utilize, or even build
upon, the results of science mapping? The workshop in Budapest, co-funded by the Knowescape Cost Action and the IMPACT EV FP7 project,
and following a series of workshops in Berlin, Amsterdam and Istanbul (at ISSI 2015), was organized to address these problems,
summed up in the title of the corresponding special issue of Scientometrics as "Same data, different results".
Bibliometric advancements and competing methodologies
Demonstrating the core concept of the workshop, Theresa Velden laid out the fundamental challenge stemming from the rich variety of
bibliometric methods available for scientific topic detection. Based on a large-scale publication dataset on astrophysics,
both citation-and-reference-based and text mining solutions, implemented in a joint exercise by expert groups worldwide
(CWTS, Ecoom, SciTech Inc, OCLC, etc.), were set against each other. A systematic comparison of the methods and the resulting topical structures
for the field of astrophysics revealed that both the choice of data model (using citation links as direct citations,
or for measuring bibliographic coupling or co-citation) and the choice of extraction (clustering) algorithm significantly affect the topical landscape.
This points to the importance of selecting the method that best fits the research or policy question at hand, which is
probably both the solution to, and the main challenge of, topic identification. Beyond testing up-to-date variants of now-conventional methods
acting on metadata, (full-)text mining approaches in bibliometric settings were also a prominent strand of the talks.
Wolfgang Glänzel proposed statistically re(de)fined methods for mining the topical composition of scholarly corpora, borrowed from
quantitative linguistics and labelled "nano-level" scientometrics for evaluative purposes. Haluk Bingol focused on citation analysis
that is sensitive to the textual context of a citation, while George Kampis presented a "blindfolded" solution for uncovering topical dynamics
within large-scale online textual data. Finally, Edgar Schiebel combined citation- and text-based methods
in a sophisticated hybrid workflow for detecting research fronts, based on various recent developments.
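The three citation-based data models mentioned above can be illustrated on a toy example. The following is a minimal sketch, not the pipeline used by any of the participating groups; the paper identifiers and the `cites` mapping are invented for illustration. It derives direct-citation links, bibliographic-coupling strengths (shared references) and co-citation counts (shared citing papers) from the same citation data, showing how one dataset yields three different networks.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each paper maps to the set of papers it cites.
cites = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"E"},
    "D": set(),
    "E": set(),
}

def direct_citation(cites):
    """Undirected links between a citing paper and each paper it cites."""
    return {frozenset((src, dst)) for src, refs in cites.items() for dst in refs}

def bibliographic_coupling(cites):
    """Weight of a pair of papers = number of references they share."""
    weights = Counter()
    for p, q in combinations(sorted(cites), 2):
        shared = len(cites[p] & cites[q])
        if shared:
            weights[(p, q)] = shared
    return weights

def co_citation(cites):
    """Weight of a pair of papers = number of papers that cite both."""
    weights = Counter()
    for refs in cites.values():
        for p, q in combinations(sorted(refs), 2):
            weights[(p, q)] += 1
    return weights

print(sorted(tuple(sorted(link)) for link in direct_citation(cites)))
print(dict(bibliographic_coupling(cites)))
print(dict(co_citation(cites)))
```

In this toy case, papers A and B are linked only under bibliographic coupling, and papers C and D only under co-citation, so a clustering algorithm run on each network would start from a different picture of the same data.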
Algorithms: The physics of bibliometrics
Beyond data models (link- or text-based) and the associated information science methods, another salient direction of the two-day discourse was
the interplay and methodological overlap between bibliometrics and various scientific domains with regard to topic detection. Most prominently,
experts from physics and from the study of complex systems and complex networks presented valuable insights on how advances in network science
could be better utilized in science mapping. Tim Evans introduced a rather unconventional approach of remodelling document citation networks
within the framework of space-time geometry ("netometry"), to uncover topics and their evolution in a natural way. At the heart of
Péter Pollner's approach lay the successful "cfinder" algorithm, developed for complex networks to uncover overlapping communities
(hence, topics) and their relations, which also underpins the identification of changing roles for publications throughout their citation history.
Gergely Tibély, from the same Hungarian research group, continued with a set of models tailored towards detecting hierarchies in complex
networks, used in constructing a science map of the organization of disciplines via a hierarchical ordering of scientific journals
by citation relations.
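The notion of overlapping communities can be made concrete with a small sketch. It assumes that the "cfinder" algorithm follows the clique percolation method, in which two k-cliques are adjacent if they share k-1 nodes and a community is the union of a chain of adjacent cliques. The code below is a brute-force illustration of that principle on an invented toy graph, not the cfinder implementation itself; all function and variable names are hypothetical.

```python
from itertools import combinations

def k_cliques(adj, k):
    """Enumerate all k-node cliques by brute force (toy graphs only)."""
    return [
        frozenset(nodes)
        for nodes in combinations(sorted(adj), k)
        if all(v in adj[u] for u, v in combinations(nodes, 2))
    ]

def clique_percolation(edges, k=3):
    """Communities = unions of k-cliques chained via k-1 shared nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge cliques that are adjacent, i.e. share exactly k-1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(sorted(c) for c in communities.values())

# Hypothetical toy graph: two dense groups that overlap in node "d".
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("d", "f"), ("e", "f"),
]
communities = clique_percolation(edges, k=3)
print(communities)
```

Node "d" ends up in both communities, which is the property that makes this family of methods attractive for topic detection: a publication may belong to more than one topic.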
Outreach: Interfaces with science policy
Given their outstanding importance, the issues and methods of mapping the science system (e.g. the delineation of fields) as a science policy
tool played an important role in the workshop. Kevin Boyack triggered great interest by highlighting the findings of his group's recent research
on hitherto neglected factors behind the research focus of nations, namely altruistic vs. economic motives, a study that drew on their
proposed high-precision global science map. Petra Ahrweiler introduced a new project that utilizes knowledge mapping techniques and visual
analytics to reveal the relations between societal expectations and European policies (such as New and Emerging Technologies, NEST, and
Responsible Research and Innovation, RRI). The interplay between science policy and science mapping was articulated by Sándor Soós while
presenting the work done under the IMPACT EV FP7 project, which focuses on the impact of European SSH research. Science mapping,
in this case, served as a tool for comparing the evolution and aspects of multidisciplinarity within the social vs. the natural sciences, in order
to inform research evaluation practices targeting the outcome of EU-funded SSH projects.
Lessons to learn
Complemented by a series of theme-oriented discussions and author panels, the workshop offered a great deal to learn,
in terms of both novel technical solutions and long-needed conceptual insights. A fundamental consensus emerged from the various
discussions (including an author panel on an upcoming special issue of Scientometrics entitled "Same data, different results", and a roundtable
discussion on validation methods and future challenges, led by Andrea Scharnhorst, Jochen Gläser and Theresa Velden): bibliometrics
is a fast-evolving field utilizing diverse methods, analytic frameworks and techniques from various scientific domains (cf. the theory of complex
networks), and smoother, more fruitful communication should therefore take place between these domains. This is necessary to avoid
the "black box" effect of transdisciplinary applications (as Jochen Gläser put it), that is, to gain full awareness of the built-in assumptions
and scope of methods, and of what is artefactual vs. real in mapping results. Better communication would also ensure that state-of-the-art
methods find their way into applications sooner. Synergies between the workshop and the IMPACT EV project were also discussed, with a view to assisting the
characterization of SSH research with the aid of science mapping.
Workshop program
Abstracts
- Edgar Schiebel: Bibliometric field delineation with heat maps of bibliographically coupled publications using core documents and a cluster approach-the case of multiscale simulation and modelling
- George Kampis: Blindfolded NLP: Unsupervised Learning for Automatically Generating Topic Labels
- Gergely Tibély: Hierarchical organisation of scientific journals
- Haluk Bingol: Context sensitive article ranking with citation context analysis
- Kevin Boyack: The Tradeoff between Altruism and Economic Growth in the Research Focus of Nations
- Péter Pollner: Quantifying the changing role of past publications
- Theresa Velden: Mapping the cognitive structure of astrophysics combining different levels of organization: a citation and journal based approach
- Theresa Velden: Future plans on the challenge of topic extraction (Discussion)
- Tim Evans: The Location of Papers in Topic Space-Time
- Wolfgang Glänzel: Lexical analysis of scientific publications for nano-level scientometrics
Presentations
- Edgar Schiebel: Bibliometric field delineation with heat maps of bibliographically coupled publications using core documents and a cluster approach-the case of multiscale simulation and modelling
- George Kampis: Blindfolded NLP: Unsupervised Learning for Automatically Generating Topic Labels
- Gergely Tibély: Hierarchical organisation of scientific journals
- Haluk Bingol: Context sensitive article ranking with citation context analysis
- Kevin Boyack: The Tradeoff between Altruism and Economic Growth in the Research Focus of Nations
- Péter Pollner: Quantifying the changing role of past publications
- Sándor Soós: Disciplinary structure and topical complexity in SSH-the IMPACT EV mission
- Theresa Velden: Mapping the cognitive structure of astrophysics combining different levels of organization: a citation and journal based approach
- Tim Evans: The Location of Papers in Topic Space-Time
- Wolfgang Glänzel: Lexical analysis of scientific publications for nano-level scientometrics
Gallery
Photos: Klára Láng