arXiv:physics/0603242v3  [physics.soc-ph]  20 Oct 2006
The Geography of Scientiﬁc Productivity: Scaling in U.S. Computer Science
Rui Carvalho∗and Michael Batty†
Centre for Advanced Spatial Analysis, 1-19 Torrington Place,
University College London, WC1E 6BT United Kingdom
Here we extract the geographical addresses of authors in the Citeseer database of computer science
papers. We show that the productivity of research centres in the United States follows a power-law
regime, apart from the most productive centres for which we do not have enough data to reach
deﬁnite conclusions. To investigate the spatial distribution of computer science research centres in
the United States, we compute the two-point correlation function of the spatial point process and
show that the observed power-laws do not disappear even when we change the physical representation
from geographical space to cartogram space. Our work suggests that the eﬀect of physical location
poses a challenge to ongoing eﬀorts to develop realistic models of scientiﬁc productivity. We propose
that the introduction of a ﬁne scale geography may lead to more sophisticated indicators of scientiﬁc
output.
PACS numbers: 89.65.-s, 89.75.Da, 89.75.Fb, 89.90.+n
I.
INTRODUCTION
In the last decade, the analysis of mankind’s scien-
tiﬁc endeavour has become a rapidly expanding interdis-
ciplinary ﬁeld. This has been mainly due to the advent of
comprehensive online preprint servers and paper repos-
itories, from which patterns of productivity and collab-
oration networks of individual scientists can be readily
ascertained [1]. The vast amount of available data raises
the hope that scientists and policy makers will soon be
able to gain unprecedented insights into the location of
research centres and their productivity. Indeed, little is
known today about the inﬂuence that geographical loca-
tion may have on ”invisible colleges” [34] (but see [3, 4]).
Conversely, we are only just beginning to uncover how
the historical growth of these ”invisible colleges” gener-
ates heterogeneities in the physical location of research
centres and, therefore, of the scientists themselves.
Previous investigations of bibliometric data [5] by
physicists have followed two main directions.
On one
hand, eﬀorts have focused on characterizing the topologi-
cal structure of collaboration networks [6, 7, 8, 9]. On the
other, researchers have used tools of statistical physics to
gain insight into the growth dynamics of scientiﬁc out-
puts [10, 11, 12]. Despite this considerable progress, the
relation of collaboration networks to the productivity of
scientists depends on the still poorly understood ﬁne ge-
ographical location of research centres.
Matia et al. approached the challenge of characteriz-
ing institutional productivity by analyzing 408 U.S. in-
stitutes for the 11 year period 1991 −2001 [12]. They
observe a bimodal distribution and conjecture that this
is indicative of a clustering eﬀect of institutes of two dif-
ferent size classes [12].
The characterization of spatial structures at large geo-
∗Electronic address: rui.carvalho@ucl.ac.uk
†Electronic address: mbatty@geog.ucl.ac.uk
graphical scales has a long tradition. In 1971, Glass and
Tobler were the ﬁrst to apply the radial distribution func-
tion (or two-point correlation function, as it is known in
astrophysics [13]) to the study of cities on a part of the
Spanish plateau [14, 16]. They choose a 40 mile square,
homogeneous in town size and density, and apply con-
cepts developed in the study of the statistical mechanics
of equilibrium liquids. Although their analysis does not
detect clustering, we would expect the two-point corre-
lation function to reveal patterns of concentration and
clustering in data whose population sizes vary over many
orders of magnitude.
Recently, Yook et al. showed that the nodes of the in-
ternet are embedded on a fractal support driven by the
fractal structure of the population worldwide [17]. This
suggests that, in spatial networks with strong geograph-
ical constraints, the nodes may not be distributed ran-
domly in space [18], but may be clustered as a function
of population density.
Further, Gastner and Newman
presented an algorithm based on physical diﬀusion to
draw density equalizing maps, or cartograms, in which
the sizes of geographic regions appear in proportion to
their population or some other property [19]. Cartograms
give us a tool to probe into the dependence of one spa-
tial variable (e.g. cancer occurrences) upon another (e.g.
population). In particular, processes which are spatially
clustered, but dependent on population densities, are ex-
pected to display random spatial distributions once the
data are transformed by the cartogram [19, 20].
In order to bring the productivity of research centres
and their spatial interaction patterns under a single roof,
we follow a diﬀerent, but complementary approach to the
ones presented above. Indeed, research centres are not
homogeneously distributed in geographical space and it
is likely that location will impact on their productivity
and the structure of collaboration networks. However, to
fully understand the role of location on the production
of science and its networks, one must ﬁrst characterize
the underlying spatial processes, and this is the road we
take here. We therefore investigate scientiﬁc productiv-

2
ity as a function of ﬁne scale geographical location. Fur-
thermore, to underpin these results, we characterize the
spatial point process generated by the physical location
of research centres.
To investigate the role of ﬁne scale geography in
the production of science,
one needs to analyze a
large dataset.
Traditional investigations of bibliomet-
ric data have been carried out by analyzing databases
like PubMed, arXiv.org or Thomson ISI. However, these
databases suﬀer from drawbacks. Either the data con-
tains only the address of the ﬁrst (PubMed) or cor-
responding author (arXiv.org), or researchers are not
uniquely associated with their addresses (Thomson ISI).
A more promising source of data is the Citeseer digi-
tal library, created in 1998 as a prototype of Autonomous
Citation Indexing [21]. Citeseer locates computer science
articles on the web in Postscript or PDF format and ex-
tracts citations from and to documents [22]. Citeseer has
made its metadata available online [35] and the inclusion
of an address and aﬃliation ﬁelds for each author allows
a ﬁrst rigorous analysis into the geography of a very large
bibliometric database.
II.
SPATIAL STRUCTURE
We studied the Citeseer metadata, which contains
716, 772 records, some of which are repeated and some
of which have authors with empty address ﬁelds.
We
considered the N = 379, 111 (52.9%) unique papers for
which citeseer identiﬁes all authors and their respective
addresses. Out of these N unique papers, we analyzed
the M = 128, 348 (pUS = 33.9%) papers which have one
or more U.S. authors. Interestingly, pUS, is in reasonable
accordance with Thomson ISI global indicators, which
state that between 1997 and 2001, the United States out-
put 34.86 % of the world’s highly cited publications [23].
For each paper, we extracted the 5–digit ZIP code from
each author’s address ﬁeld and geocoded this ZIP into a
(latitude, longitude) pair of coordinates [36]. We iden-
tiﬁed ZIP codes from the address ﬁeld, by using regular
expressions to match a ﬁve-digit code (plus the optional
four digit code, which we ignored) preceded or followed
by a U.S. state (or its abbreviation) or the acronym USA.
This will leave out addresses like Roma 00185, Italy or Is-
rael 84105, but will also fail to locate the address Physics
Department, Northeastern University, Boston MA USA
as it lacks a ZIP code. We restricted the analysis to the 48
conterminous U.S. states plus the District of Columbia.
We identiﬁed a total of 116, 771 distinct authors with
a U.S. address. Out of these, 103, 928 (89%) list a single
ZIP code in their address, 10, 579 (9.1%) belong to insti-
tutions located in two ZIP codes and 2, 264 (1.9%) are
located in three or more institutions.
Rank
Zip
Fractional Count Institution
1
15213
2343.36
Carnegie Mellon University
2
02139
1891.18
MIT
3
94305
1512.12
Stanford University
4
94720
1496.76
University of California,
Berkeley
5
20742
1144.70
University of Maryland,
College Park
TABLE I: Most productive ZIP codes and respective Univer-
sities.
A.
Productivity of Research Centres
To investigate the concept of scaling in publication out-
put of academic research centres, we computed the prob-
ability distribution of total paper output per ZIP code.
We note that ZIP codes were not aggregated.
If two
research centres belonging to the same institution have
addresses with distinct ZIP codes, we considered them
as distinct centres. This has the disadvantage of possibly
counting more than one research centre per institution
(instead of aggregating both to the same institution).
However, Citeseer covers scientiﬁc articles in the ﬁeld of
computer science and it would be the exception that one
institution would have several geographically separated
computer science centres.
Our analysis identiﬁed 3, 393 diﬀerent ZIP codes that
matched the U.S. census bureau tables. We implemented
a version of fractional counting [5, 24] to compute the
productivity of U.S. research centres. For every paper,
we parsed each author’s address ﬁeld and extracted the
ZIP codes therein (there may be more than one ZIP, if
the author belongs to more than one U.S. institution).
Each occurrence of a ZIP code in an address ﬁeld of a
paper increments the productivity of the research centre
physically located at that ZIP code by 1/φ, where the
normalization factor φ is computed as follows. For every
address ﬁeld in the paper being analyzed, we made φ :=
φ + 1 if the address contains no ZIP codes (i.e. it is a
non-U.S. address), or φ := φ + m if the address contains
m ≥1 ZIP codes (in which case that speciﬁc author will
belong to m distinct U.S. institutions).
Identifying research centres by ZIP code has the ad-
vantage of simplifying the data parsing algorithm, which
is why we preferred this method to others based on ag-
gregation by host institution. However, the method is an
approximation, as it cannot distinguish between non-U.S.
addresses.
Table I displays the ﬁve most productive ZIP codes
and their host institutions. Interestingly, the two most
productive institutions, Carnegie Mellon University and
MIT are also the two most acknowledged entities as
shown by Giles and Councill in a previous study [21].
We then asked the question: what is the probability
distribution of the research output of each research cen-
tre? To investigate this, we plot the probability density
and cumulative distribution (P [X > x] =
R ∞
x p (y) dy)

3
in Figure 1. We found a bimodal probability distribution
of research output by ZIP code (see Figure 1a), in agree-
ment with a previous study of the Thomson ISI database
by Matia et. al [12].
Our results suggest that this probability distribution
displays power-law decay up to the ”knee” where the
regime changes.
Data was insuﬃcient to determine
whether the upper tail of the distribution also decays as
a power-law, albeit with a diﬀerent exponent. This ob-
servation is in apparent contradiction with the ﬁndings
of Matia et al. who do not ﬁnd a power-law regime. The
authors examine the productivity of 408 U.S. institutes,
whereas our method revealed that papers had been out-
put at 3, 393 U.S. institutes. Therefore, the power-law
decay which we observed may be due to our methodol-
ogy which included all research institutes in the meta-
data.
On the other hand, our analysis was limited in
scope to the Citeseer database, whereas Matia et al. an-
alyze the Thomson ISI dataset, hence comparisons with
their wider study are necessarily inconclusive. Neverthe-
less, our results raise the question of whether power-law
decay only appears once one is able to identify a large
percentage of all research institutes.
B.
The Pulling Power of Research Clusters
A simple point process in R2 may be considered as a
random countable set X ⊂R2. The ﬁrst moment of a
point process can be speciﬁed by a single number, the in-
tensity, ρ, giving the expected number of points per unit
area. The second moment can be speciﬁed by Ripley’s
K function [16], where ρK(r) is the expected number
of points within distance r of an arbitrary point of the
pattern.
The product density
ρ2 (x1, x2) dA (x1) dA (x2) = ρ2g (r) dA (x1) dA (x2)
(1)
describes the probability to ﬁnd a point in the area el-
ement dA (x1) and another point in dA (x2), at the dis-
tance r = |x1 −x2|, and g (r) is the two-point correlation
function. Ripley’s K function is related to g (r) by [26]
K (r) = 2π
Z
g (r) rdr
(2)
In other words, g (r) is the density of K (r) with re-
spect to the radial measure rdr. The benchmark of com-
plete randomness is the spatial Poisson process, for which
g (r) = 1 and K(r) = πr2, the area of the search region
for the points. Values larger than this indicate cluster-
ing on that distance scale, and smaller values indicate
regularity.
The two-point correlation function can be estimated
from N data points x ∈D inside a sample window W by
[27]:
g(r) =
|W|
N (N −1)
X
x∈D
X
y∈D
Φr (x, y)
2πr∆
ω (x, y)
(3)
10
−2
10
0
10
2
10
4
10
−2
10
−1
10
0
Probability Density
Paper output per ZIP code
10
−2
10
0
10
2
10
4
10
−3
10
−2
10
−1
10
0
Cumulative Distribution
Paper output per ZIP code
α=1.55
500 1000 1500 2000 2500 3000
1
1.5
2
2.5
3
α
k
a)
b)
FIG. 1: Probability distribution of paper output (fractional
counts) per ZIP code. (a) The probability density is bi-modal
and can be approximated by a power-law regime between the
two local maxima. (b) A least squares ﬁt to the linear region
of the cumulative distribution yields α = 1.55.
The inset
shows the Hill plot [25] as the number of upper order statistics,
k, is varied. The match between the plateau on the Hill plot
and the least squares ﬁt (dashed horizontal line), shows that
our estimate of α is appropriate.
where 2πr∆is the area of the annulus centred at x with
radius r and thickness ∆. Here |W| is the area of the
sample window, and the sum is restricted to pairs of dif-
ferent points x ̸= y. The function Φr is symmetric in its
argument and Φr (x, y) = [r ≤d (x, y) ≤r + ∆], where
d (x, y) is the Euclidean distance between the two points
and the condition in brackets equals 1 when true and 0
otherwise.
The function ω (x, y) accounts for a bounded W by
weighting points where the annulus intersects the edges
of W. There are a number of edge-corrections available,
but that of Ripley [15] has a long tradition both in human

4
2343
234
23
2
a)
c)
b)
d)
FIG. 2: a) Albers’ equal-area projection of the 48 conterminous states of the US plus the district of Columbia. Research centres
are identiﬁed by circles with area proportional to their productivity on a logarithmic scale. b)-d) Data in a) after a cartogram
transformation with R&D expenditure by state (b), and population by state (c) and county (d), respectively. For each panel,
we trace the 14, 605 point border polygon used in the computation of the two-point correlation function.
geography [16] and physics [27]:
ω (x, y) =
2πr
̥ (∂Br (x) ∩W)
(4)
where ̥ (∂Br (x) ∩W) is the fraction of the perimeter of
the circle Br (x) with radius r = |x −y| around x inside
W –e.g. ̥ (∂Br (x) ∩W) = πr if only half of the annulus
falls inside W. Note that ω (x, y) = 1 iﬀ∂Br (x) ⊂W,
in which case the summand in (3) is simply the sum of
Φr (x, y) weighted by the area of the annulus centred at
x with radius r and thickness ∆. If ∂Br (x) ∩W ̸= ∅,
that is the circle Br (x) is only partially in the sample
window W, then Φr (x, y) is weighted by the area of the
fraction of the annulus which is inside W.
Of special physical interest is whether the two-point
correlation is scale-invariant. A scale-invariant g (r) is
an indicator of a fractal distribution of research centres,
and is expected in critical phenomena [28].
To investigate the presence of power-law decay in the
two-point correlation function we selected the 1, 046 re-
search centres (ZIP codes) which had a total fractional
count of two papers or more. We chose this productivity
threshold for two main reasons. A ﬁrst factor was to con-
sider only research centres which can be clearly identiﬁed
as active. Second, the computation of the two-point cor-
relation function requires reasonable computer resources
as W is a ﬁne boundary of the United States –in our case,
a polygon with 14, 605 points.
Next,
we
projected
the
U.S.
map
and
the
(latitude, longitude)
pairs
of
the
research
centres
with the Albers’ equal area projection [29][37] and
computed the two-point correlation function, g (r), of
the resulting point process.
To investigate whether the decay of g (r) is a function
of the distribution of R&D funding or population, we ap-
plied several cartogram transformations to the base map
and the points: ﬁrst, we computed the cartogram pro-
jection using U.S. R&D funding expenditure, by state,
for the year 2001 [30, table B-17]; second we computed
the cartogram with U.S. population, by state and county,
from the 2000 census [38]. The points representing the re-
search centres were transformed accordingly to each car-

5
togram. Figure 2a) shows the Albers’ equal area projec-
tion and each centre is represented by a circle with area
proportional to the number of papers output on a loga-
rithmic scale. Figures 2b)-d) show the cartograms with
R&D expenditure by state, and population by state and
county, respectively. It is obvious from these maps that
as the cartogram transformation uses ﬁner spatial scales
(e.g. from U.S. states to counties), the points become
more homogeneously distributed spatially.
The two-point correlation function computed for the
projected data (see Figure 2a)) is plot in Figure 3, where
we observe a power-law decay g (r) ∼r−γ with γ ≃1.16.
Next we asked the following question: can the power-
law decay of g (r) be explained by a clustering of re-
search centres in areas where research funding or pop-
ulation is higher? To answer this question, we computed
g (r) for the same point process, but now using the data
transformed by the cartograms with R&D expenditure by
state (Figure 2b)), population by state (Figure 2c)), and
population by county (Figure 2d)). Our results showed
that the power-law decay was still present after the car-
togram projections, although as the transformation was
performed at ﬁner spatial scales, g (r) approached the ex-
pected value for a Poisson process, g (r) = 1, at shorter
distances.
10
0
10
1
10
2
10
3
10
−1
10
0
10
1
10
2
10
3
r in Km
g(r)
 
 
γ=1.16
g(r) in geographical space
g(r) in cartogram space [R&D per state]
g(r) in cartogram space [population per state]
g(r) in cartogram space [population per county]
FIG. 3: Variation of the two-point correlation function with
distance (Km). In blue, g (r) computed from projections of
the border and the points with the Albers’ equal area pro-
jection. In red, green and black, g (r) computed from further
transforming the data by the cartogram projection with R&D
per state, population per state and county, respectively. The
horizontal line at g = 1 is the expected value of g (r) for a
Poisson process.
III.
DISCUSSION
Considerable advances have been made over the past
few years in understanding the structure of scientiﬁc pro-
duction and its networks.
Along this road, physicists
have computed a number of quantities to characterize
networks of scientiﬁc collaborations, mainly by analyz-
ing data from online preprint servers and repositories.
However, these studies have not addressed the impact of
ﬁne scale physical location on the statistical characteri-
zation of the scientiﬁc enterprise and it networks. Here
we have presented a detailed study of the productivity
of research centres in U.S. computer science (identiﬁed
by ZIP codes) and characterized the pattern of spatial
concentration which these centres display.
A ﬁrst important conclusion of our study is that the
productivity of U.S. research centres in computer science
was highly skewed. A surprising result of our study was
the power-law decay of the probability distribution of re-
search output for some orders of magnitude. A second
important conclusion is that the physical location of re-
search centres in the U.S. formed a fractal set, which
was not completely destroyed by population or research
funding patterns.
Although we consider our results to be promising, there
are still several caveats. First our conclusions are clearly
only valid for the U.S. [12, 31] and even from the Citeseer
database, which we consider is the best currently avail-
able for such analysis, there are problems of missing and
inaccurate data which we are not able to quantify. Nev-
ertheless, our results are consistent with those from the
burgeoning geography of information technology which
suggests in qualitative fashion, that such technologies are
correlated with population but also have their own dy-
namic [32, 33]. In this sense, our result that the scaling
inherent in the geographical distribution of paper pro-
duction in U.S. computer science is still present once the
geography has been normalized with respect to the distri-
bution of population and R&D expenditures, implies pro-
cesses that are endogenous to the dynamics of research
[11].
In summary, the method introduced in this paper could
serve as a starting point for an investigation of the role
of the ﬁne scale physical location of research centres in
the production of science.
Our study focused on U.S.
computer science but further analyses should be possi-
ble as preprint server repositories make more elaborate
metadata available. And such developments may lead to
a better understanding of the role of physical location
not just in science, but for a much wider class of complex
spatial systems.
Acknowledgments
We wish to thank Michael Gastner (SFI) for help with
the code to generate cartograms and Isaac Councill (Penn
State) for help with the Citeseer database. This research

6
was supported by the Engineering and Physical Sciences
Research Council under grant EP/C513703/1.
[1] R. M. Shiﬀrin, K. B¨orner, Mapping knowledge domains,
Proc. Nat. Acad. Sci. U.S.A. 101 (suppl. 1) (2004) 5183–
5185.
[2] D. Crane, Invisible Colleges: Diﬀusion of Knowledge in
Scientiﬁc Communication, University of Chicago Press,
1972.
[3] M. Batty, The geography of scientiﬁc citation, Environ.
Plan. A 35 (2003) 761–765.
[4] K. B¨orner, S. Penumarthy, M. Meiss, W. Ke, Mapping
the diﬀusion of information among major U.S. research
institutions, Scientometrics 68 (3) (2006) 415-426.
[5] L. Egghe, R. Rousseau, Introduction to Informetrics:
Quantitative Methods in Library, Documentation and In-
formation Science., Elsevier, 1990.
[6] M. E. J. Newman, The structure of scientiﬁc collabora-
tion networks, Proc. Nat. Acad. Sci. U.S.A. 98 (2) (2001)
404–409.
[7] M. E. J. Newman, Scientiﬁc collaboration networks: I.
network construction and fundamental results, Phys.
Rev. E 64 (2001) 016131.
[8] M. E. J. Newman, Scientiﬁc collaboration networks: II.
shortest paths, weighted networks, and centrality, Phys.
Rev. E 64 (2001) 016132.
[9] A. L. Barab´asi, H. Jeong, Z. N´eda, E. Ravasz, A. Schu-
bert, T. Vicsek, Evolution of the social network of scien-
tiﬁc collaborations, Physica A 311 (2002) 590–614.
[10] V. Plerou, L. A. N. Amaral, P. Gopikrishnan, M. Meyer,
H. E. Stanley, Similarities between the growth dynamics
of university research and of competitive economic activ-
ities, Nature 400 (1999) 433–437.
[11] L. A. N. Amaral, P. Gopikrishnan, K. Matia, V. Plerou,
H. E. Stanley, Application of statistical physics methods
and concepts to the study of Science and technology sys-
tems, Scientometrics 51 (1) (2001) 9–36.
[12] K. Matia, L. A. N. Amaral, M. Luwel, H. F. Moed, H. E.
Stanley, Scaling phenomena in the growth dynamics of
scientiﬁc output, J. Am. Soc. Inf. Sci. Technol. 56 (9)
(2005) 893–902.
[13] Peacock J., Cosmological Physics, Cambridge University
Press, 1999.
[14] L. Glass, W. R. Tobler, Uniform distribution of objects
in a homeogeneous ﬁeld: Cities on a plain, Nature 233
(1971) 67–68.
[15] B. D. Ripley, The second-order analysis of stationary
point processes, J. Appl. Probab. 13 (2) (1976) 255-266.
[16] B. D. Ripley, Modelling spatial patterns, J. R. Stat. Soc.
Ser. B-Stat. Methodol. 39 (2) (1977) 172–212.
[17] S.-H. Yook, H. Jeong, A.-L. Barab´asi, Modeling the inter-
net’s large-scale topology, Proc. Nat. Acad. Sci. U.S.A.
99 (2002) 13382–13386.
[18] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-
U. Hwang, Complex networks: Structure and dynamics,
Phys. Rep. 424 (2006) 175–308.
[19] M. T. Gastner, M. E. J. Newman, Diﬀusion-based
method for producing density-equalizing maps, Proc.
Nat. Acad. Sci. U.S.A. 101 (20) (2004) 7499–7504.
[20] M. T. Gastner, M. E. J. Newman, Optimal design of
spatial distribution networks, Phys. Rev. E 74 (2006)
016117.
[21] C. L. Giles, I. G. Councill, Who gets acknowledged:
Measuring scientiﬁc contributions through automatic ac-
knowledgment indexing, Proc. Nat. Acad. Sci. U.S.A.
101 (51) (2004) 17599–17604.
[22] A. A. Goodrum, K. W. McCain, S. Lawrence, C. L. Giles,
Scholarly publishing in the internet age: a citation anal-
ysis of computer Science literature, Inf. Proc. Manag. 37
(2001) 661–675.
[23] D. A. King, The scientiﬁc impact of nations, Nature 430
(2004) 311–316.
[24] D. de Solla Price, Letter to the editor, Science 212 (1981)
987.
[25] H. Dress, L. d. Haan, S. Resnick, How to make a Hill
plot, Ann. Stat. 28 (1) (2000) 254–274.
[26] D. Stoyan, Basic ideas of spatial statistics, in: K. Mecke,
D. Stoyan (Eds.), Statistical physics and spatial statis-
tics, Springer-Verlag, Heidelberg, 2000, pp. 3–21.
[27] M. Kerscher, I. Szapudi, A. S. Szalay, A comparison of
estimators for the two-point correlation function, Astro-
phys. J. 535 (2000) L13–L16.
[28] M. Kerscher, Statistical analysis of large-scale structure
in the universe, in: K. Mecke, D. Stoyan (Eds.), Statis-
tical physics and spatial statistics, Springer-Verlag, Hei-
delberg, 2000, pp. 36–71.
[29] A. H. Robinson, J. L. Morrison, P. C. Muehrcke, A. J.
Kimerling, S. C. Guptill, Elements of Cartography, John
Wiley and Sons, 1995.
[30] National Science Foundation, Division of Science Re-
sources Statistics, National Patterns of Research and De-
velopment Resources: 2003, NSF 05-308, Brandon Shack-
elford (Arlington, VA 2005).
[31] H. F. Moed, M. Luwel, The business of research, Nature
400 (1999) 411–412.
[32] M. A. Zook, The Geography Of the Internet Indus-
try: Venture Capital, Dot-Coms, And Local Knowledge,
Blackwell, 2005.
[33] M. Dodge, Understanding cyberspace cartographies: A
critical analysis of internet network infrastructure maps,
Ph.D. thesis, University College London (2006).
[34] An invisible college is a loose network of researchers who
”communicate with each other and transmit information
across the whole ﬁeld (...) to monitor the rapidly chang-
ing research ’front’.” [2, p35].
[35] http://citeseer.ist.psu.edu/oai.html
Accessed 22/02/2006.
[36] http://www.census.gov/geo/www/tiger/zip1999.html
[37] http://www.census.gov/geo/www/cob/
[38] http://www.census.gov/popest/datasets.html
