This vignette introduces the main functions of the package, using the data shipped with it.
The biblio_coupling() function is the most general function of the package. It takes as input a direct citation data frame (entities, like articles, authors or institutions, citing references) and produces an edge list for a bibliographic coupling network, with the number of references that different articles share, as well as the coupling angle value of edges (Sen and Gan 1983).
This is a standard way to build a bibliographic coupling network using Salton’s cosine measure: it divides the number of references that two articles share by the square root of the product of both articles' bibliography lengths. This avoids giving too much importance to articles with a large bibliography. It looks like:

$$\frac{R(A) \bullet R(B)}{\sqrt{L(A) \cdot L(B)}}$$

with R(A) and R(B) the references of documents A and B, R(A) • R(B) being the number of references shared by A and B, and L(A) and L(B) the lengths of the bibliographies of documents A and B.
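To make the measure concrete, here is a minimal base R sketch that computes the coupling angle by hand for two documents. The data frame, its column names (citing, ref) and the identifiers are made up for illustration; this is not how the package computes it internally.

```r
# Toy direct-citation data: one row per (citing document, cited reference).
# All identifiers are hypothetical.
citations <- data.frame(
  citing = c("A", "A", "A", "A", "B", "B", "B"),
  ref    = c("r1", "r2", "r3", "r4", "r2", "r3", "r5")
)

refs_A <- unique(citations$ref[citations$citing == "A"])  # bibliography of A
refs_B <- unique(citations$ref[citations$citing == "B"])  # bibliography of B

# Number of shared references (r2 and r3)
shared <- length(intersect(refs_A, refs_B))

# Shared references divided by the square root of the product of both lengths
coupling_angle <- shared / sqrt(length(refs_A) * length(refs_B))
coupling_angle
```

Here A has four references and B has three, so the two shared references are divided by the square root of 12.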
The output is an edge list linking nodes together (see the from and to columns) with a weight for each edge being the coupling angle measure. If normalized_weight_only is set to FALSE, another column displays the number of references shared by the two nodes.
This example uses the Ref_stagflation data frame.
library(biblionetwork)
biblio_coupling(Ref_stagflation,
                source = "Citing_ItemID_Ref",
                ref = "ItemID_Ref",
                normalized_weight_only = FALSE,
                weight_threshold = 1)
#> Key: <Source>
#> from to weight nb_shared_references Source
#> <char> <char> <num> <int> <int>
#> 1: 214927 2207578 0.14605935 4 214927
#> 2: 214927 5982867 0.04082483 1 214927
#> 3: 214927 8456979 0.09733285 3 214927
#> 4: 214927 10729971 0.29848100 7 214927
#> 5: 214927 16008556 0.04714045 1 214927
#> ---
#> 2712: 1111111161 1111111172 0.03434014 1 1111111161
#> 2713: 1111111161 1111111180 0.02003610 1 1111111161
#> 2714: 1111111161 1111111183 0.04050542 2 1111111161
#> 2715: 1111111172 1111111180 0.03646625 1 1111111172
#> 2716: 1111111182 1111111183 0.27060404 8 1111111182
#> Target
#> <int>
#> 1: 2207578
#> 2: 5982867
#> 3: 8456979
#> 4: 10729971
#> 5: 16008556
#> ---
#> 2712: 1111111172
#> 2713: 1111111180
#> 2714: 1111111183
#> 2715: 1111111180
#> 2716: 1111111183
This function is fairly general and can also be used for other kinds of networks, for instance for co-citation, by swapping the source and ref columns, although the dedicated biblio_cocitation() function is preferable for that purpose.

The function only keeps the edges whose non-normalized weight (the number of shared references) is at least equal to weight_threshold. In a large bibliographic coupling network, you may consider, for instance, that sharing only one reference is not significant enough for two articles to be linked together. Raising this parameter also helps to avoid creating intractable networks with too many edges.
biblio_coupling(Ref_stagflation,
                source = "Citing_ItemID_Ref",
                ref = "ItemID_Ref",
                weight_threshold = 3)
#> Key: <Source>
#> from to weight Source Target
#> <char> <char> <num> <int> <int>
#> 1: 214927 2207578 0.14605935 214927 2207578
#> 2: 214927 8456979 0.09733285 214927 8456979
#> 3: 214927 10729971 0.29848100 214927 10729971
#> 4: 214927 19627977 0.11202241 214927 19627977
#> 5: 1021902 12824456 0.06537205 1021902 12824456
#> ---
#> 958: 1111111147 1111111156 0.17325923 1111111147 1111111156
#> 959: 1111111147 1111111161 0.13333938 1111111147 1111111161
#> 960: 1111111156 1111111161 0.08580846 1111111156 1111111161
#> 961: 1111111159 1111111171 0.24333213 1111111159 1111111171
#> 962: 1111111182 1111111183 0.27060404 1111111182 1111111183
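The effect of the threshold can be mimicked on an existing edge list. The sketch below uses a hypothetical edge list that borrows the column names of the output above, and assumes the comparison is inclusive, as the outputs suggest (an edge with exactly three shared references is still present with weight_threshold = 3).

```r
# Hypothetical edge list with the same column names as the package output
edges <- data.frame(
  from = c("a", "a", "b"),
  to = c("b", "c", "c"),
  weight = c(0.30, 0.05, 0.12),
  nb_shared_references = c(7L, 1L, 3L)
)

weight_threshold <- 3

# Keep only the edges with at least `weight_threshold` shared references
# (inclusive comparison is an assumption based on the outputs shown above)
kept <- edges[edges$nb_shared_references >= weight_threshold, ]
kept
```

The filter applies to the raw count of shared references, not to the normalized weight, so an edge with a low coupling angle can survive if the two documents share enough references.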
As explained above, you can use the biblio_coupling() function to create a co-citation network: you just have to put the references in the source column (they will be the nodes of your network) and the citing articles in ref. As this is likely to create some confusion, the package also provides a biblio_cocitation() function, which has a similar structure to biblio_coupling() but is explicitly designed for co-citation: citing articles stay in source and references stay in ref. You can see in the next example that they produce the same results:
biblio_coupling(Ref_stagflation,
                source = "ItemID_Ref",
                ref = "Citing_ItemID_Ref")
#> Key: <Source>
#> from to weight Source Target
#> <char> <char> <num> <int> <int>
#> 1: 49248 180162 1.0000000 49248 180162
#> 2: 49248 804988 0.3162278 49248 804988
#> 3: 49248 1999903 1.0000000 49248 1999903
#> 4: 49248 2031010 1.0000000 49248 2031010
#> 5: 49248 3580645 0.7071068 49248 3580645
#> ---
#> 87664: 1111112223 1111112225 1.0000000 1111112223 1111112225
#> 87665: 1111112223 1111112227 1.0000000 1111112223 1111112227
#> 87666: 1111112224 1111112225 1.0000000 1111112224 1111112225
#> 87667: 1111112224 1111112227 1.0000000 1111112224 1111112227
#> 87668: 1111112225 1111112227 1.0000000 1111112225 1111112227
biblio_cocitation(Ref_stagflation,
                  source = "Citing_ItemID_Ref",
                  ref = "ItemID_Ref")
#> Key: <Source>
#> from to weight Source Target
#> <char> <char> <num> <int> <int>
#> 1: 49248 180162 1.0000000 49248 180162
#> 2: 49248 804988 0.3162278 49248 804988
#> 3: 49248 1999903 1.0000000 49248 1999903
#> 4: 49248 2031010 1.0000000 49248 2031010
#> 5: 49248 3580645 0.7071068 49248 3580645
#> ---
#> 87664: 1111112223 1111112225 1.0000000 1111112223 1111112225
#> 87665: 1111112223 1111112227 1.0000000 1111112223 1111112227
#> 87666: 1111112224 1111112225 1.0000000 1111112224 1111112225
#> 87667: 1111112224 1111112227 1.0000000 1111112224 1111112227
#> 87668: 1111112225 1111112227 1.0000000 1111112225 1111112227
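The reason swapping the two columns turns coupling into co-citation can be seen on a small incidence matrix. The base R sketch below uses made-up identifiers and column names, and only computes raw counts, not the normalized weights returned by the package.

```r
# Toy direct-citation data (hypothetical identifiers)
citations <- data.frame(
  citing = c("A", "A", "A", "B", "B", "C"),
  ref    = c("r1", "r2", "r3", "r1", "r2", "r2")
)

# Binary incidence matrix: rows are citing documents, columns are references
incidence <- (unclass(table(citations$citing, citations$ref)) > 0) * 1

# Coupling: documents linked by the references they share
coupling_counts <- incidence %*% t(incidence)

# Co-citation: references linked by the documents that cite them together,
# i.e. the same product with the roles of rows and columns swapped
cocitation_counts <- t(incidence) %*% incidence

coupling_counts["A", "B"]       # references shared by A and B
cocitation_counts["r1", "r2"]   # documents citing both r1 and r2
```

Both networks come from the same table; only the side that plays the role of node changes.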
coupling_strength() function

The coupling_strength() function calculates the coupling strength measure (Shen et al. 2019) from a direct citation data frame. It is a refinement of biblio_coupling(): it takes into account the frequency with which a reference shared by two articles has been cited in the whole corpus. In other words, the most cited references weigh less in the link between two articles than references that have rarely been cited. To a certain extent, it is similar to the tf-idf measure.

It looks like:

$$\frac{1}{L(A)} \cdot \frac{1}{L(B)} \sum_{j} \log\left(\frac{N}{freq(R_{j})}\right)$$

with N the number of articles in the whole dataset and freq(R_j) the number of times the reference j (which is shared by documents A and B) is cited in the whole corpus.
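As a worked illustration, the base R sketch below computes a coupling strength by hand for two documents. It assumes the measure is the sum of log(N / freq) over the shared references, divided by the product of the two bibliography lengths; the data and column names are hypothetical, and the package's own implementation may differ in its details.

```r
# Toy direct-citation data (hypothetical identifiers)
citations <- data.frame(
  citing = c("A", "A", "A", "B", "B", "C", "C", "D"),
  ref    = c("r1", "r2", "r3", "r1", "r2", "r1", "r4", "r4")
)

n_docs <- length(unique(citations$citing))  # N: number of citing documents (4)
freq <- table(citations$ref)                # times each reference is cited

refs_A <- unique(citations$ref[citations$citing == "A"])
refs_B <- unique(citations$ref[citations$citing == "B"])
shared <- intersect(refs_A, refs_B)         # r1 and r2

# Rarely cited shared references get a larger log(N / freq) term
strength <- sum(log(n_docs / as.numeric(freq[shared]))) /
  (length(refs_A) * length(refs_B))
strength
```

In this toy corpus r1 is cited three times and r2 twice, so r2 contributes more to the link between A and B than r1 does.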
coupling_strength(Ref_stagflation,
                  source = "Citing_ItemID_Ref",
                  ref = "ItemID_Ref",
                  weight_threshold = 1)
#> Key: <Source>
#> from to weight Source Target
#> <char> <char> <num> <int> <int>
#> 1: 214927 2207578 0.019691698 214927 2207578
#> 2: 214927 5982867 0.005331122 214927 5982867
#> 3: 214927 8456979 0.011752248 214927 8456979
#> 4: 214927 10729971 0.046511251 214927 10729971
#> 5: 214927 16008556 0.008648490 214927 16008556
#> ---
#> 2712: 1111111161 1111111172 0.005067554 1111111161 1111111172
#> 2713: 1111111161 1111111180 0.001168603 1111111161 1111111180
#> 2714: 1111111161 1111111183 0.002580798 1111111161 1111111183
#> 2715: 1111111172 1111111180 0.003870999 1111111172 1111111180
#> 2716: 1111111182 1111111183 0.037748271 1111111182 1111111183
Rather than focusing on documents, you may want to study the relationships between authors, institutions/affiliations or journals. The coupling_entity() function allows you to do that. Coupling links are calculated using the coupling angle measure (like biblio_coupling()) or the coupling strength measure (like coupling_strength()). They depend on the number of references two authors share, taking into account the minimum number of times both authors cite each reference. For instance, if two entities share a reference, the first one citing it twice (in other words, citing it in two different articles) and the second one three times, the function takes two as the minimum value. In addition to the features of the coupling strength or coupling angle measure, this means that if two entities share two references, the first being cited at least four times by both entities and the second at least only once, the first reference contributes more to the edge weight than the second. This use of the minimum number of shared citations for entity coupling comes from Zhao and Strotmann (2008). With the coupling strength measure, it looks like:

$$\frac{1}{L(A)} \cdot \frac{1}{L(B)} \sum_{j} \min(C_{Aj}, C_{Bj}) \log\left(\frac{N}{freq(R_{j})}\right)$$

with C_{Aj} and C_{Bj} the number of times entities A and B cite the reference j.
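The minimum rule can be checked on a small example. The base R sketch below reproduces the case described above (one author citing a reference in two articles, the other in three); the data frame and its column names are hypothetical and only the minimum counts are computed, not the full edge weight.

```r
# Hypothetical entity-level data: one row per (author, citing article, reference)
entity_citations <- data.frame(
  author  = c("X", "X", "Y", "Y", "Y", "X", "Y"),
  article = c("a1", "a2", "b1", "b2", "b3", "a1", "b1"),
  ref     = c("r1", "r1", "r1", "r1", "r1", "r2", "r2")
)

# Number of times each author cites each reference (across their articles)
counts <- table(entity_citations$author, entity_citations$ref)

# For each shared reference, keep the smaller of the two citation counts
min_shared <- pmin(counts["X", ], counts["Y", ])
min_shared
```

Author X cites r1 twice and author Y three times, so r1 counts twice in their link, while r2, cited once by each, counts once.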
This example uses the Ref_stagflation and the Authors_stagflation data frames.
# Merge the references data with the citing authors' information from Authors_stagflation
entity_citations <- merge(Ref_stagflation,
                          Authors_stagflation,
                          by.x = "Citing_ItemID_Ref",
                          by.y = "ItemID_Ref",
                          allow.cartesian = TRUE)
# allow.cartesian = TRUE is needed because there are several authors per article, so the
# merged result has more rows than the longer of the two merged data frames
coupling_entity(entity_citations,
                source = "Citing_ItemID_Ref",
                ref = "ItemID_Ref",
                entity = "Author.y",
                method = "coupling_angle")
#> Key: <Source>
#> from to weight Source Target
#> <char> <char> <num> <char> <char>
#> 1: ALBANESI-S CHARI-V 0.032897585 ALBANESI-S CHARI-V
#> 2: ALBANESI-S CHRISTIANO-L 0.025302270 ALBANESI-S CHRISTIANO-L
#> 3: ALBANESI-S BALL-L 0.024296477 ALBANESI-S BALL-L
#> 4: ALBANESI-S MANKIW-G 0.038924947 ALBANESI-S MANKIW-G
#> 5: ALBANESI-S ROTEMBERG-J 0.030457245 ALBANESI-S ROTEMBERG-J
#> ---
#> 3461: WILLIAMS-J YOUNG-W 0.008684168 WILLIAMS-J YOUNG-W
#> 3462: WILLIAMS-J WILLIAMS-N 0.014002801 WILLIAMS-J WILLIAMS-N
#> 3463: WILLIAMS-J ZHA-T 0.014002801 WILLIAMS-J ZHA-T
#> 3464: WILLIAMS-N ZHA-T 0.040000000 WILLIAMS-N ZHA-T
#> 3465: WOODFORD-M YOUNG-W 0.020672456 WOODFORD-M YOUNG-W
#> Weighting_method
#> <char>
#> 1: coupling_angle
#> 2: coupling_angle
#> 3: coupling_angle
#> 4: coupling_angle
#> 5: coupling_angle
#> ---
#> 3461: coupling_angle
#> 3462: coupling_angle
#> 3463: coupling_angle
#> 3464: coupling_angle
#> 3465: coupling_angle
The biblionetwork package contains bibliometric data built by Goutsmedt (2021). These data gather the academic articles and books that endeavoured to explain the United States stagflation of the 1970s, published between 1975 and 2013. They also gather all the references cited by these articles and books on stagflation. The Nodes_stagflation file contains information about the academic articles and books on stagflation (the stagflation documents), as well as about the references cited by at least two of these stagflation documents. Ref_stagflation is a data frame of direct citations, with the identifiers of citing documents and the identifiers of cited documents. Authors_stagflation is a data frame with the list of documents explaining the US stagflation and all the authors of these documents (Nodes_stagflation only keeps the first author of each document).
I take authors as an example here, but the function could also be used for calculating a co-authorship network with institutions or countries as nodes.