TechRank
arXiv:2210.07824v1 [q-fin.CP] 14 Oct 2022
Anita Mezzetti1, Loïc Maréchal2,3, Dimitri Percia David4, William Lacube2,
Sébastien Gillard5,6, Michael Tsesmelis2, Thomas Maillart5, and Alain
Mermoud2
1 Credit Suisse
2 Cyber-Defence Campus, armasuisse Science and Technology
3 University of Lausanne (HEC Lausanne)
4 University of Applied Science (HES-SO Valais-Wallis)
5 GSEM, University of Geneva
6 Military Academy, ETH Zurich
Abstract
We introduce TechRank, a recursive algorithm based on a bi-partite graph with
weighted nodes. We develop TechRank to link companies and technologies based on
the method of reflection. We allow the algorithm to incorporate exogenous variables
that reflect an investor’s preferences. We calibrate the algorithm in the cybersecurity
sector. First, our results help estimate each entity’s influence and explain companies’
and technologies’ ranking. Second, they provide investors with a quantitative optimal
ranking of technologies and thus, help them design their optimal portfolio. We propose
this method as an alternative to traditional portfolio management and, in the case of
private equity investments, as a new way to price assets for which cash flows are not
observable.
JEL classification: C14, C69, G17, G24
Keywords: private equity, bipartite networks, technology monitoring, portfolio optimization
This document is the result of a research project funded by the Cyber-Defence Campus, armasuisse
Science and Technology.
1. Introduction
This work investigates the innovation structure and the dynamics underlying the life
cycle of technologies. We fill two research gaps. The first concerns the identification of
future benefits and risks of emerging technologies for society. The second regards the
valuation of early-stage companies and optimal investment decisions. To fill these gaps,
we introduce the TechRank algorithm. Our methodology assigns a score to each entity,
i.e., technologies and firms, based upon their contribution to the technological ecosystem.
We expect this method to help stakeholders in forming optimal decisions for investment,
procurement, and technology monitoring.
We calibrate our model in the cybersecurity sector, although TechRank could apply to
any sector. The cybersecurity technological landscape represents a particular challenge for
this calibration, given the important share of start-ups and fast innovations it yields[16].
Moreover, the large number of cyber-attacks and the increasing costs they incur have
boosted cybersecurity investments.1 According to Bloomberg, “the global cybersecurity
market size is expected to reach USD 326.4 billion by 2027, registering a compound annual
growth rate of 10.0% from 2020 to 2027”.2
To develop the TechRank algorithm, we first model and map the ecosystem of companies
and technologies from the Crunchbase dataset using a bi-partite network. The bi-partite
network structure accurately describes this complex and heterogeneous system. We evaluate
the relative influence of the network nodes in the ecosystem by adapting a recursive algorithm
that estimates network-centrality.
This methodology should help decision-makers and investors to assess the influence of
entities in the cybersecurity ecosystem, reducing investment uncertainties. In fact, around
90% of startups fail and in 42% of the cases this is due to incorrect evaluation of the market
demand. The second reason (29%) is that they run out of funding and personal money.3
Christensen (1997) highlights that well-managed companies also break down, because they
over-invest in new technologies[8]. Thus, selecting the right technologies to invest in goes
hand in hand with an optimal investment strategy.
Our research takes inspiration from Google’s PageRank algorithm, which ranks web pages
according to readers’ interest[25]. We use a similar approach with bi-partite networks to
assign a score to companies and technologies. Our method is flexible and allows incorporating
investors’ preferences such as location or previous funding rounds. TechRank lets
investors select the entities’ features that reflect their interests. The algorithm uses their choices
as input and adjusts the entities’ scores accordingly. This enables investors to select a
personalized portfolio strategy using a quantitative methodology. The evaluation of companies
and new technologies largely depends on investors’ personal choices, which may lead them to
misread market demand. This work aims to support more methodical decision making for
investors.
1 The New York Times: “As Cyberattacks Surge, Security Start-Ups Reap the Rewards” by Erin Woo
(July 26, 2021). Yahoo Finance: “Microsoft Securing its Position with Cybersecurity Investments”
by TipRanks (July 20, 2021).
2 Bloomberg: “Global Cybersecurity Market Could Exceed $320 Billion in Revenues by 2027” (July 29,
2020).
3 Findstack: “The Ultimate List of Startup Statistics for 2021” by Jack Steward.
The remainder of this article proceeds as follows. Section 2 presents the literature review
and hypotheses. Section 3 details the data and the methodology. Section 4 presents the
results. Section 5 concludes.
2. Literature review and hypotheses development
2.1. Centrality measures
In network analysis, the centrality estimates the importance of nodes through ranking.
The simplest centrality estimate is the “degree”, which counts the number of neighbours
of a node. One of its drawbacks is that it does not show which node stands at the center of
the network. Two nodes may share the same degree, while being more or less peripheral.
Thus, the degree is a local centrality measure, which does not capture the influence across
nodes within the graph.
Fig. 1: Central and peripheral nodes
This figure depicts the difference between central (red) and peripheral (brown) nodes in a graph.
Another important centrality measure is the “closeness”, which measures how long it
takes for information to spread from one node to the next. Specifically, closeness is defined
as the reciprocal of the “farness”, i.e. the sum of distances of one node with respect to all
other nodes. The “betweenness centrality” of a node measures how often a node stands in
the shortest path between a pair of other nodes (see, e.g., Bavelas, 1948; Saxena and Iyengar,
2020; and Freeman, 1978)[2, 28, 14].
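The difference between a local and a global centrality measure can be illustrated with a minimal Python sketch. It covers only degree and closeness, assumes an unweighted, connected, undirected graph stored as an adjacency dictionary, and uses a hypothetical five-node path graph; the function names are illustrative and not taken from the paper.

```python
from collections import deque


def degree(adj):
    """Degree centrality: the number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def closeness(adj):
    """Closeness: reciprocal of the farness (sum of shortest-path distances).

    Assumes an unweighted, connected graph, so breadth-first search
    returns the shortest-path distances.
    """
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        farness = sum(dist.values())
        result[source] = 1 / farness if farness else 0.0
    return result


# Hypothetical path graph: a - b - c - d - e
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
deg = degree(adj)
clo = closeness(adj)
print(deg["b"], deg["c"])   # same degree for both nodes
print(clo["b"], clo["c"])   # c is closer to the rest of the graph than b
```

In this toy graph, nodes b and c share the same degree, yet c has the higher closeness because it sits in the middle of the path.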
Another strand of research focuses on the top-K shortest path identification in a complex network, a topic less tackled by the literature than centrality. To rank nodes, one
must compute the centrality of all nodes and compare them to extract the rank, which is
not always feasible due to the size of the network. To overcome this problem, Saxena and
Iyengar (2017) attempt to estimate the global centrality of a node without analyzing the
whole network[29]. Similarly, Bavelas (1948) develops a structural centrality measure in
the context of social graphs[2]. Other centrality concepts include the eigenvector, Katz, or
PageRank centralities[5, 25, 20]. Finally, Freeman (1978) creates a formal mathematical
framework for centrality, which includes degree, closeness, and betweenness and advocates
for the combination of different kinds of centrality measures[14].
2.2. PageRank
Page, Brin, Motwani, and Winograd (1999) develop the PageRank algorithm[25]. Its primary
goal is to rank web pages objectively, a challenge given the fast-growing web. PageRank
assigns a score to each web page based on its relations with other web pages in the graph.
Other fields have benefited from PageRank and have provided modifications and improvements. Xing and
Ghorbani (2014) extend the algorithm and propose the weighted PageRank (WPR)[32]. This
algorithm assigns larger rank values to more important pages, instead of dividing the rank
evenly among its outlink pages.4 Each outlink page gets a value proportional to its popularity,
taking into account the link weights. One caveat of PageRank and its variants is that
they do not consider n-partite structures, because they assume that web pages can all be linked to one another.
Bi-partite networks address this issue and capture this complexity, among other structures.
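As an illustration of the basic (unweighted) PageRank recursion described above, the following Python sketch divides each page's rank evenly among its outlinks. It is not the implementation of [25] or of WPR; the damping factor of 0.85, the uniform treatment of pages without outlinks, and the three-page toy web are assumptions made for the example.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    Every link target must also appear as a key of `out_links`.
    The damping factor 0.85 is a common choice, assumed here.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages without outlinks is spread uniformly.
        dangling = sum(rank[p] for p in pages if not out_links[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for q in out_links[p]:
                # Each outlink receives an equal share of the page's rank.
                new[q] += damping * rank[p] / len(out_links[p])
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
print(ranks)
```

Page C receives rank from both A and B, so it ends up ranked above B, which only receives half of A's rank.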
2.3. Bi-partite networks
Networks are a fundamental tool to capture the relations between entities. Graphs (G)
are composed of vertices (V ) and edges (E), and we denote G = (V, E). We build links and
mathematically analyze many properties of the whole system and of singular entities. To
graphically represent a real system, we synthesize its information into a simple graph framework. This simplification generates an information loss in the modelling process. Simple
network structures might discard important information about the structure and function
of the original system [23]. As a consequence, the failure of a very small fraction of nodes
in one network may lead to the complete fragmentation of a system[6]. To solve the problem, extensions to the simple structure G = (V, E) are added and yield graphs with more
powerful features. For instance, in case vertices are connected by relationships of different
kinds, Battiston, Nicosia, and Latora (2014) advocate working with multiplex networks, i.e.
networks where each node appears in a set of different layers, and each layer describes all the
edges of a given type [1]. When it is possible to distinguish the nature of the edges, multiplex
networks are an effective approach, which starts from embedding the edges in different layers
according to their type. However, in our case, even if we have two kinds of nodes, the nature of the edges
is unique. Therefore, a more suitable approach is bi-partite networks. Bi-partite networks
are, for instance, a good way to describe the technological and business landscape. In Figure
2, we depict two sets of nodes, companies and technologies, which are interconnected but do
not present edges within the same set.
There are multiple adaptations of the PageRank algorithm to bi-partite structures[3, 12,
31, 21]. In particular, Benzi, Estrada, and Klymko (2015), Donato, Laura, Leonardi, and
Millozzi (2004), and Tu, Jiang, Song, and Zhang (2018) extend the PageRank algorithm
to multiplex networks. They assume that only some clusters of the graph are multiplex
networks and extend the PageRank algorithm only to analyze the sub-graph centrality. Bipartite networks are used to transform directed into undirected networks with twice the
number of vertices.
4 Given a web page W, an inlink of W is a link from another web page pointing to W.
An outlink of W is a link appearing in W, which points to another web page.
Fig. 2: Bi-partite structure of companies and technologies
The left panel depicts a typical bi-partite structure. The right panel provides an illustration of this
structure with companies (layer 1) and technologies (layer 2).
Klein, Maillart, and Chuang (2015) extend PageRank in the Wikipedia editors and articles
context [21]. The application of this algorithm to the case of interactions between
companies and technologies is straightforward. A major benefit of this approach is that it
starts from an unweighted graph, linking editors and articles. The authors develop a recursive
algorithm in which the two entities contribute to the quality (for articles) or the expertise (for
editors) of each other. They develop a bi-partite random walker by building the adjacency
matrix $M_{e,a}$ that takes value 1 if editor e has edited article a and 0 otherwise, which tracks
all the editors’ contributions. They obtain $M_{e,a} \in \mathbb{R}^{n_e \times n_a}$, where $n_e$ and $n_a$ are the number of
editors and articles. They sort editors by the number of articles they contributed to and assign a
contribution (quality) value to each editor (article) based on their degree. The expertise $w_e^0$
(quality $w_a^0$) is given by the number of articles (editors) they have worked on (have received
modifications from).
The second part of the algorithm follows a Markov process in its iterations. The step $w^n$
($w^n = w^n(\alpha, \beta)$) only depends on information available at $w^{n-1}$. At each step, the algorithm
incorporates information about the expertise of editors and the quality of articles, within the
bi-partite network structure. The process is a random walker with jumps, whose transition
probability is zero in the case $M_{e,a} = 0$. Next, the authors define two variables for the
transition probability, $G_{e,a}(\beta)$ and $G_{a,e}(\alpha)$. $G_{e,a}(\beta)$ represents the probability of jumping
from article a to editor e and $G_{a,e}(\alpha)$ represents the probability of jumping from an editor to an
article. Both depend on initial conditions, and the optimal parameters α and β are selected
through a grid search that maximizes the Spearman rank-correlation between the
rank given by the model and a ground-truth metric obtained independently. Finally, Klein et
al. (2015) observe a “less-is-more” situation since the presence of too many editors working
on an article is detrimental to its quality. Studying different categories of Wikipedia articles,
they find α to remain constant, while β varies significantly across categories.
Estimating the global rank of a node starting from local information and centrality measures
is still an open research question in many sectors [28]. In particular, no research to
our knowledge uses this approach for investment decisions and portfolio optimization. Yet,
this approach could help overcome the limitations of standard financial models in private
equity, in which the network structure is easily obtainable, whereas the cash flow process is
not.
2.4. Private equity valuation
Private firms are not required to publicly disclose their financial statements, which makes
it difficult to measure their past performance and estimate their expected returns, without
insider information. Moreover, since they are not listed on exchanges we do not observe
expectations of market participants. Thus, standard asset pricing methods fall short in this
context. As a result, private equity analysts must rely on insider or private information to
value private firms. These valuations generally occur around a financing round and require
accounting for capital dilution to compute realized returns on the firm[17].
One approach that attempts to overcome these limitations and estimate the expected
returns and risk in venture capital is Cochrane (2005)[9]. He uses a maximum likelihood
estimation method to obtain these values at the market and sector levels, such as healthcare,
biotechnology, technology companies, and retail services. He finds a mean arithmetic return
of 59%, an alpha of 32%, a beta of 1.9 and a volatility of 86% (equivalent to a 4.7% daily
volatility). Given that the distribution of returns is heavily positively skewed in venture
capital, he adopts a logarithmic model that also accounts for the inherent selection bias of
this asset class.5 Ewens (2009) updates this method on returns computed from one financing
round to the next[13]. He adopts a three-regime mixture model (failure, medium returns,
and “home-runs”). He also corrects for the selection bias and obtains an alpha of 27% and a
beta of 2.4. He finds that 60% of all venture capital investments have a negative log return.
Altogether, the results are similar: venture capital investments exhibit positive alpha, large
beta, and high volatility. Other attempts to evaluate the market parameters of the venture
capital asset class include Korteweg and Nagel (2016) and Moskowitz and Vissing-Jorgensen
(2010), with results in line with the aforementioned studies[22, 24].
Another strand of research attempts to index and benchmark the private equity market.
Peng (2001) builds a venture capital index from 1987 to 1999 from about 13,000 financing
rounds targeting over 5,600 firms[26]. He addresses the problems of missing data, censored
data, and sample selection using a re-weighting procedure and method-of-moments regressions.
From the index perspective, the results are qualitatively the same, that is, high and volatile
returns to venture capital (average return of 55.18% per year). He finds his index to display
a much higher volatility than the S&P 500 and NASDAQ indices and high exposure to
these indices (betas of 2.4 and 4.7, respectively). Other venture capital index constructions
include Hwang, Quigley, and Woodward (2005), Schmidt (2006), and Cumming, Haß, and
Schweizer (2013), all obtaining results on par with the aforementioned studies.
5 Most venture capital data are private, and available data are more often related to successful firms than
under-performing ones.
One limitation of the above studies is that they only estimate these parameters at an
aggregate level. An investor could form her investment decisions and portfolio choices
by segregating among sectors, but could not obtain the actual firms’ parameters. One exception is
Schwartz and Moon (2000), who provide an approach based upon real-options theory to
price individual firms[30]. However, this method requires the observation of cash flows and
they only provide one calibration example with Amazon. Thus, there remains a gap in
methodology to help investors form optimal decisions using all the information available.
Given the recent venture capital boom, Zhong, Chuanren, Zhong, and Xiong (2018) advocate
the use of quantitative methodologies of screening and evaluation[33]. However,
there is a clear research gap on methodologies that enable valuing early-stage companies and
forming optimal portfolios. Existing methodologies either only enable valuing a sector, instead
of a specific company, or require the use of cash flows that are unobservable. Non-financial
features and relations between companies and technologies are instead numerous and easily
observable[10]. We thus formulate our hypotheses as follows,
H1 Using a bi-partite network structure allows us to create an algorithm that ranks companies
based on their links with technologies.
H2 This algorithm and its ranking may be improved and tilted towards investors’ preferences.
H3 The performance of the ranking is independent of the sector considered.
3. Data and methods
3.1. Data
We use Crunchbase data.6 Crunchbase is a commercial database that provides access to
financial and managerial data on private and public companies globally. It was created in
2007 by TechCrunch, which is a source of information about start-up activities and their
financing within and across countries. This database has been largely adopted by both
academics and industry practitioners[4]. It is also used by international organizations such
as the OECD[11].
Crunchbase is made of data collected with a multifaceted approach that combines crowdsourcing
(through venture programs or direct contributions), machine learning, in-house
processing, and aggregation of third-party providers’ data. Crunchbase updates and revises
data on a daily basis. The data are organized into several entities such as organizations, people,
events, acquisitions, or IPOs. The primary focus of Crunchbase is the technology industry,
although it also includes data on other sectors.
Data can be accessed in two ways: using an API or downloading a *.csv file directly from
the Crunchbase website. Data is split into several databases depending on their type. We
provide a non-exhaustive list in Table 1.7
We first analyze the Crunchbase dataset dedicated to investors. With a total of 185,784
investors divided into 78,001 (41.98%) organisations and 107,783 (58.02%) persons, there
are more investors than target companies. Figure 3 shows that the majority of investors are
pure investors (87.11%). Some organisations are both investee and investor (12.65%). The
remainder of the sample are typically universities. Crunchbase ranks the top 1,000 investors
through its proprietary algorithm. Figure 3 indicates that the largest share of investors is
located in the USA (29.62%). In particular, there is a wide gap between the first and second
country, China, where 7.04% of investors are located.
6 Crunchbase website: https://www.crunchbase.com/
7 Crunchbase daily CSV export from https://data.crunchbase.com/docs/daily-csv-export. Data
downloaded on April 28, 2021.
Field name            Description
organizations         Organisation profiles
organization desc     Long descriptions of organisation profiles
acquisitions          List of all acquisitions
org parents           Map between parent organisations and subsidiaries
ipos                  Detail for each IPO
people                People profiles
people desc           Long descriptions for people profiles
degrees               Detail of people’s education background
jobs                  List of all jobs and advisory roles
investors             Active investors (organisations and people)
investments           All investments
investment partners   Partners responsible for their firm’s investments
funds                 Details of investment funds
funding rounds        Details for each funding round in the dataset
events                Event details
event appearances     Event participation details
Table 1: Crunchbase files description
This table reports the main fields available from Crunchbase.
[Figure: two bar charts titled “Number of investors for each role”. Left panel: count by roles (investor; investor,company; other). Right panel: count by country_code (USA, other, CHN, GBR, IND).]
Fig. 3: Summary statistics of investors in Crunchbase data
The left panel depicts the distribution of investors according to their types. The right panel depicts the
number of investors per country.
3.2. Methodology
3.2.1. Adaptation of the work by Klein et al. (2015).
In this research we use a bi-partite network that describes the relations among companies
and the technologies they are involved in. Figure 2 describes the typical bi-partite
network structure. This structure benefits from advances in fields such as network theory,
Markov chains, and machine learning. We adapt the recursive algorithm with the method
of Hidalgo, Hausmann, and Dasgupta (2009)[18]. We expect the resulting rank to incorporate
the positive influence of well-established companies on technologies and, at the same
time, the positive impact of new companies that explore new fields. We build the adjacency
matrix $M^{CT}_{c,t} \in \mathbb{R}^{n^c \times n^t}$, which takes value 1 if a company c works on a technology t and 0
otherwise. $n^c$ and $n^t$ represent the number of companies and technologies. We assume that
well-established companies have more means to diversify their expertise and therefore that
such an entity has a relatively high number of neighbours[7, 15]. Thus, we initialize the algorithm
with the degree of each entity, i.e., the count of its neighbours,
$$
\begin{cases}
w_c^0 = \sum_{t=1}^{n^t} M^{CT}_{c,t} = k_c \\
w_t^0 = \sum_{c=1}^{n^c} M^{CT}_{c,t} = k_t
\end{cases}
\tag{1}
$$
The algorithm is a “random walker” that incorporates information about company expertise
and technology relevance at each step. The transition probabilities, $G_{c,t}$ and $G_{t,c}$, describe
the extent to which the entities’ weights change over the iterations. If the relation between
c and t increases (decreases) the value, the entity weight increases (decreases) in proportion
with the transition probabilities. We define $G_{c,t}$ and $G_{t,c}$,
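$$
\begin{cases}
G_{c,t}(\beta) = \dfrac{M^{CT}_{c,t} \, k_c^{-\beta}}{\sum_{c'=1}^{n^c} M^{CT}_{c',t} \, k_{c'}^{-\beta}} \\[2ex]
G_{t,c}(\alpha) = \dfrac{M^{CT}_{c,t} \, k_t^{-\alpha}}{\sum_{t'=1}^{n^t} M^{CT}_{c,t'} \, k_{t'}^{-\alpha}},
\end{cases}
\tag{2}
$$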
where α and β inform how coordination generates value. Next, we define the recursive step,
$$
\begin{cases}
w_c^{n+1} = \sum_{t=1}^{n^t} G_{c,t}(\beta) \, w_t^n \\
w_t^{n+1} = \sum_{c=1}^{n^c} G_{t,c}(\alpha) \, w_c^n
\end{cases}
\tag{3}
$$
As in PageRank, the recursion ends when the algorithm converges. Our algorithm allows us to
consider the market complexity and feedback loops (investments’ impact on companies and
on technologies). We discuss this feature and the optimization of α and β after the addition
of exogenous factors.
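To make Eqs. (1) to (3) concrete, the following is a minimal pure-Python sketch of the recursion, not the authors' implementation. The function name `techrank`, the convergence tolerance, the iteration cap, and the toy adjacency matrix are all assumptions made for illustration.

```python
def techrank(M, alpha, beta, tol=1e-9, max_iter=10_000):
    """Iterate Eq. (3) on a binary company-technology matrix M[c][t]."""
    n_c, n_t = len(M), len(M[0])
    # Eq. (1): initial weights are the degrees k_c and k_t.
    k_c = [sum(row) for row in M]
    k_t = [sum(M[c][t] for c in range(n_c)) for t in range(n_t)]

    # Eq. (2): transition probabilities; entries with M[c][t] == 0 stay 0.
    G_ct = [[0.0] * n_t for _ in range(n_c)]
    G_tc = [[0.0] * n_c for _ in range(n_t)]
    for t in range(n_t):
        norm = sum(k_c[c] ** -beta for c in range(n_c) if M[c][t])
        for c in range(n_c):
            if M[c][t]:
                G_ct[c][t] = k_c[c] ** -beta / norm
    for c in range(n_c):
        norm = sum(k_t[t] ** -alpha for t in range(n_t) if M[c][t])
        for t in range(n_t):
            if M[c][t]:
                G_tc[t][c] = k_t[t] ** -alpha / norm

    # Eq. (3): update both weight vectors until they stop changing.
    w_c = [float(k) for k in k_c]
    w_t = [float(k) for k in k_t]
    for _ in range(max_iter):
        new_c = [sum(G_ct[c][t] * w_t[t] for t in range(n_t)) for c in range(n_c)]
        new_t = [sum(G_tc[t][c] * w_c[c] for c in range(n_c)) for t in range(n_t)]
        delta = max(abs(a - b) for a, b in zip(new_c + new_t, w_c + w_t))
        w_c, w_t = new_c, new_t
        if delta < tol:
            break
    return w_c, w_t


# Hypothetical network: 3 companies (rows) and 3 technologies (columns).
M = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
w_c, w_t = techrank(M, alpha=0.0, beta=0.0)
print(w_c, w_t)
```

With α = β = 0 the transition probabilities reduce to a plain random walk on the bi-partite graph, so the initial degrees are already a fixed point of Eq. (3); other parameter values reweight the walk towards or away from highly connected entities.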
3.2.2. Inclusion of exogenous factors
We include exogenous factors as ground truth in the parameters’ calibration step. This
allows us to keep the algorithm tractable, while letting it capture the technological structure.
We use this ground truth to compute the Spearman correlation, $\rho_c$ for companies and $\rho_t$ for
technologies. Because $\rho_c$ and $\rho_t$ depend on α and β [see Eq. (2)], we find the parameters
that maximize these correlations,
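$$
\begin{cases}
(\alpha^*, \beta^*) = \arg\max_{\alpha,\beta} \rho_c(\alpha, \beta) \\
(\alpha^*, \beta^*) = \arg\max_{\alpha,\beta} \rho_t(\alpha, \beta),
\end{cases}
\tag{4}
$$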
and we solve this optimization problem with a grid search. Eq. (4) shows that parameters
depend on both companies and technologies. This dependence reflects the structure
of the bi-partite graph. To obtain the correlation between the TechRank score, which assigns
a weight $w_c$ ($w_t$) to each company (technology), and the ground truth evaluation, which
assigns $\hat{w}_c$ ($\hat{w}_t$) to each company (technology), we normalize both TechRank results and the
exogenous measure to the same range [0, 1].
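The calibration in Eq. (4) can be sketched as a plain grid search over (α, β) that maximizes the Spearman correlation with the ground truth. The code below is an illustration under assumptions: the grid from -2 to 2 in steps of 0.04 is inferred from the values reported later in the paper, `model` stands for any callable returning scores for a given (α, β), and the toy model used here is hypothetical rather than TechRank itself.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            result[order[m]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def grid_search(model, truth, grid):
    """Return (rho, alpha, beta) with the highest Spearman correlation."""
    best = None
    for alpha in grid:
        for beta in grid:
            rho = spearman(model(alpha, beta), truth)
            if best is None or rho > best[0]:
                best = (rho, alpha, beta)
    return best


# Assumed grid: -2.00, -1.96, ..., 2.00.
grid = [i / 25 for i in range(-50, 51)]

# Hypothetical stand-in for the model; in practice this would return
# the TechRank scores obtained with the given (alpha, beta).
def toy_model(alpha, beta):
    return [alpha, 0.0, beta]

truth = [1.0, 2.0, 3.0]  # hypothetical ground-truth evaluation
rho, alpha_star, beta_star = grid_search(toy_model, truth, grid)
print(rho, alpha_star, beta_star)
```

Because the Spearman correlation only depends on ranks, rescaling both score vectors to [0, 1] does not change the result of the search.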
Investors use the entities’ features to select companies and the investment amount they
want to allocate. We suppose that an investor has $n^{(C)}$ features to pick from, denoted as
$f_1^{(C)}, \dots, f_{n^{(C)}}^{(C)}$, where C (T ) represents the association with the companies (technologies).
Each feature $f_i^{(C)}$ is associated with a percentage of interest $p_i^{(C)}$ and $\sum_{i=1}^{n^{(C)}} p_i^{(C)} = 1$. For
instance, if a company’s features are the amount of previous investments and its geographical
proximity to the investor, $n^{(C)} = 2$. An investor may then decide to be interested at 80% in
the first feature and at 20% in the second, by selecting $p_1^{(C)} = 0.8$ and $p_2^{(C)} = 0.2$. Investors
may also be deterred by a feature, in which case we multiply it by -1. We define all
notations in Table 2.
Variable          ∈                                    Description
$n^{(C)}$         $\mathbb{N}$                         Number of external features available for companies.
$n^c$             $\mathbb{N}$                         Number of companies.
$n^{(T)}$         $\mathbb{N}$                         Number of external features available for technologies.
$n^t$             $\mathbb{N}$                         Number of technologies.
$p_i^{(C)}$       $[0, 1]$                             Percentage of interest in the company preference number i.
$p_j^{(T)}$       $[0, 1]$                             Percentage of interest in the technology preference number j.
$f_i^{(C)}$       $\mathbb{R}^{n^c}$                   Vector of factors associated to the company preference number i.
$f_j^{(T)}$       $\mathbb{R}^{n^t}$                   Vector of factors associated to the technology preference number j.
$n^i$             $\mathbb{N}$                         Number of investors.
$M^{CT}$          $\mathbb{R}^{n^c \cdot n^t}$         Adjacency matrix of the C-T bipartite network
$M^{IC}$          $\mathbb{R}^{n^i \cdot n^c}$         Adjacency matrix of the I-C bipartite network
$\gamma_t^{i,c}$  $\mathbb{R}$                         Amount in funding round between c and i at time t
$e^{IC}$          $\mathbb{R}^{n^i \cdot n^c}$         Total amount of investment between each investor to each company
$e^C$             $\mathbb{R}^{n^c}$                   Total amount of investments toward each company
$e^T$             $\mathbb{R}^{n^t}$                   Total amount of investments toward each technology
$e^C_{max}$       $\mathbb{R}$                         Maximum amount of total investments among all the companies
$e^T_{max}$       $\mathbb{R}$                         Maximum amount of total investments among all the technologies
$f_c^C$           $[0, 1]$                             Factor related to previous investments into the company number c
$f_t^T$           $[0, 1]$                             Factor related to previous investments into the technology number t
Table 2: Variable definitions
This table presents the variable definitions used throughout the article.
We convert quantitative and qualitative properties into a number $f_i^{(C)} \in [0, 1]$. Once we
have created all the factors $f^{(C)} = (f_1^{(C)}, \dots, f_{n^{(C)}}^{(C)})$, the exogenous evaluation $\hat{w}_c$ is given by,
$$
\hat{w}_c = \sum_{i=1}^{n^{(C)}} p_i^{(C)} f_i^{(C)} = p^{(C)} \cdot f^{(C)}.
\tag{5}
$$
Considering $\sum_{i=1}^{n^{(C)}} p_i^{(C)} = 1$ and that $f_i^{(C)} \in [0, 1]$ for each feature i, we have $\hat{w}_c \in [0, 1]$.
The same holds for $\hat{w}_t$. Finally, we have
$$
\begin{cases}
\hat{w}_c = p^{(C)} \cdot f^{(C)} \\
\hat{w}_t = p^{(T)} \cdot f^{(T)} \\
\sum_{i=1}^{n^{(C)}} p_i^{(C)} = 1 \\
\sum_{i=1}^{n^{(T)}} p_i^{(T)} = 1,
\end{cases}
\tag{6}
$$
where $n^{(C)}$ ($n^{(T)}$) is the number of the company- (technology-)related features and $f^{(C)} =
(f_1^{(C)}, \dots, f_{n^{(C)}}^{(C)})$. To select the features, we use Crunchbase data about companies and
investors (see Table 1).
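The weighted sum in Eq. (5) can be sketched in a few lines of Python. The function name and the feature values below are hypothetical; the sketch assumes non-negative preference weights that sum to one and features already scaled to [0, 1], and it does not cover the case where a feature is multiplied by -1.

```python
def exogenous_score(preferences, features):
    """Eq. (5): w_hat[c] = sum_i p_i * f_i[c].

    preferences: list of weights p_i that sum to 1.
    features: list of feature vectors f_i, one value in [0, 1] per entity.
    """
    if abs(sum(preferences) - 1.0) > 1e-9:
        raise ValueError("preference weights must sum to 1")
    n_entities = len(features[0])
    return [
        sum(p * f[c] for p, f in zip(preferences, features))
        for c in range(n_entities)
    ]


# Hypothetical example: two features (previous investments, proximity)
# for three companies, with an 80% / 20% split of interest.
p = [0.8, 0.2]
f = [
    [1.0, 0.5, 0.125],  # previous-investment factor per company
    [0.0, 1.0, 0.5],    # proximity factor per company
]
w_hat = exogenous_score(p, f)
print(w_hat)
```

Each score stays in [0, 1] because it is a convex combination of values in [0, 1].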
3.3. Previous Investments
We assume that previous investments are an essential factor to evaluate companies. Investors
may be willing to invest in companies which have already received capital or look for
higher returns, targeting newer firms.
To compute this factor, we use the Crunchbase field “funding rounds”, which reports the
amount of all funding rounds from an investor i to a company c. We capture this structure
with another bi-partite network that describes the links among investors (I) and companies
(C). In this case, we weight the edges by the sum of all previous investments from investor
i to company c, until the current period (T ), and compute the adjacency matrix $M^{IC}$. We
define the amount of a single investment from i to c at time t by $\gamma_t^{i,c}$. The weight of the
edge i − c is given by $e_{i,c} = \sum_{t=0}^{T} \gamma_t^{i,c}$ (see Table 2). We then sum the contributions of all
investors to each company and divide by the maximum total across companies, which yields
the attribute $f_c^C \in [0, 1]$ for a company c.8
Fig. 4: Investors-companies bi-partite network
This figure depicts a stylized bi-partite network between investors and companies.
Figure 4 depicts the investment structure as an example. We consider two investors $i_1$
and $i_2$ and three companies $c_1$, $c_2$, and $c_3$. We compute the maximum $e_{max}$ as
$\max\{e_{11} + e_{21}, e_{22}, e_{23}\}$ and the features related to the investments for each company as,
$$
\begin{cases}
f_1^C = \dfrac{e_{11} + e_{21}}{e_{max}} \\[1.5ex]
f_2^C = \dfrac{e_{22}}{e_{max}} \\[1.5ex]
f_3^C = \dfrac{e_{23}}{e_{max}},
\end{cases}
\tag{7}
$$
where $n^i$ ($n^c$) is the total number of investors (companies). Generalizing, we get,
$$
\begin{cases}
e^{IC}_{i,c} = \sum_{t=0}^{T} \gamma_t^{i,c} & \forall i, c \\
e^{C}_{c} = \sum_{i=1}^{n^i} e^{IC}_{i,c} M^{IC}_{i,c} & \forall c \\
e_{max} = \max_c e^{C}_{c} \\
f_c^{(C)} = e^{C}_{c} / e_{max},
\end{cases}
\tag{8}
$$
for each $c \in \{1, \dots, n^c\}$. We present the corresponding algorithm in the Appendix 1. With
Eq. (8), for each company we have a factor between 0 and 1 that summarizes the amount
of previous investments.
8 Note that here, $f_c^C$ represents the factor related to a company.
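Eq. (8) can be sketched in Python as follows. This is an illustrative version, not the algorithm of Appendix 1: the function name, the tuple layout of a funding round, and the amounts are assumptions, and the input is assumed to be non-empty with positive amounts.

```python
from collections import defaultdict


def company_investment_factor(funding_rounds):
    """Eq. (8): total funding per company divided by the maximum total.

    funding_rounds: iterable of (investor, company, amount) tuples,
    assumed non-empty with positive amounts.
    """
    total = defaultdict(float)  # e_c^C: total investments per company
    for _investor, company, amount in funding_rounds:
        total[company] += amount
    e_max = max(total.values())
    return {company: e / e_max for company, e in total.items()}


# Hypothetical amounts mirroring the structure of Figure 4:
# i1 funds c1; i2 funds c1, c2, and c3.
rounds = [
    ("i1", "c1", 10.0),
    ("i2", "c1", 30.0),
    ("i2", "c2", 20.0),
    ("i2", "c3", 5.0),
]
f_c = company_investment_factor(rounds)
print(f_c)
```

The company with the largest total funding gets a factor of 1, and every other company is expressed as a fraction of that maximum.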
Fig. 5: Tripartite structure of investors, companies, and technologies
The left hand side figure depicts a typical tri-partite structure. The right-hand side provides an illustration
of this structure with investments (layer 1), companies (layer 2), and technologies (layer 3).
We link the two bi-partite structures investors-companies and companies-technologies
to obtain an I-C-T tri-partite structure depicted in Figure 5. This structure allows us to assign
some features to technologies from companies (direct link) or investors (indirect link). Thus,
we can find the amount of previous investments on a technology through companies’ funding
rounds. The previous investments’ factor for technology is given by,
$$
\begin{cases}
e^{(I,C)}_{i,c} = \sum_{t=0}^{T} \gamma_t^{i,c} & \forall i, c \\
e^{C}_{c} = \sum_{i=1}^{n^i} e^{(I,C)}_{i,c} & \forall c \\
e^{T}_{t} = \sum_{c=1}^{n^c} e^{C}_{c} M^{CT}_{c,t} \\
e_{max} = \max_t e^{T}_{t} \\
f_t^{(T)} = e^{T}_{t} / e_{max}.
\end{cases}
\tag{9}
$$
We provide the algorithm of this methodology in the Appendix 2.
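The last three lines of Eq. (9) propagate company-level funding to technologies through the adjacency matrix. The sketch below illustrates this step only; it is not the algorithm of Appendix 2, and the function name, funding totals, and adjacency matrix are hypothetical.

```python
def technology_investment_factor(e_c, M):
    """Eq. (9): funding reaching each technology, scaled by the maximum.

    e_c: total investments per company (list of length n_c).
    M:   binary company-technology matrix M[c][t].
    Assumes at least one technology receives positive funding.
    """
    n_c, n_t = len(M), len(M[0])
    # e_t^T: sum of the funding of all companies working on technology t.
    e_t = [sum(e_c[c] * M[c][t] for c in range(n_c)) for t in range(n_t)]
    e_max = max(e_t)
    return [e / e_max for e in e_t]


# Hypothetical totals for three companies and their links to three technologies.
e_c = [40.0, 20.0, 5.0]
M = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
f_t = technology_investment_factor(e_c, M)
print(f_t)
```

Note that a company's full funding is counted towards every technology it is linked to, as in Eq. (9); the sketch makes no attempt to split funding across technologies.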
3.4. Location
The second feature we consider is the distance between investors and companies’ locations.
We retrieve the addresses of firms and investors from Crunchbase (c address) and map
them to geographic coordinates. We compute the Haversine approximation to measure the
distance. We detail the Haversine approximation in the Appendix 5.3. Investors may prefer
short-distance investments or places with high potential. If they face some investment
restrictions, we filter the companies based on these criteria before applying the algorithm.
Otherwise, we add a distance factor to the algorithm.
We use the Haversine distance h to obtain a factor $f_c^{(C)} \in [0, 1]$ for each company. We
consider the distance $h_{i,c}$ between the company c and the investor i. We assume that the
factor is the proximity so that $f_c^{(C)}$ tends to one as the distance decreases,
$$
f_c^{(C)} \to 1 \quad \text{when} \quad h_{i,c} \to 0.
\tag{10}
$$
To compute $f_c^{(C)}$, we first find $h_{i,c}$ for each company and identify the maximum distance
$h_{max}$ among all companies. We normalize by the maximum to obtain a distance that lies in
the [0, 1] range with $f_c^{(C)} = 1 - h_{i,c}/h_{max}$, so that a distance of zero corresponds to a value of
$f_c^{(C)} = 1$. We report the algorithm in the Appendix 3. We implement the algorithm and run
the experiments using Python and the NumPy, Pandas, NetworkX, Matplotlib, and Seaborn
libraries.
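The proximity factor can be sketched with the standard Haversine formula. This is not the algorithm of Appendix 3: the function names, the mean Earth radius of 6371 km, and the example coordinates (approximate locations of Zurich, Geneva, and New York) are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def proximity_factors(investor, companies):
    """f_c = 1 - h_{i,c} / h_max for one investor and a list of companies."""
    dists = [haversine_km(*investor, *company) for company in companies]
    h_max = max(dists)
    if h_max == 0:
        # All companies share the investor's location.
        return [1.0] * len(dists)
    return [1 - h / h_max for h in dists]


# Hypothetical investor in Zurich and companies in Zurich, Geneva, New York.
investor = (47.3769, 8.5417)
companies = [(47.3769, 8.5417), (46.2044, 6.1432), (40.7128, -74.0060)]
factors = proximity_factors(investor, companies)
print(factors)
```

The company at the investor's location gets a factor of 1 and the farthest company gets 0, so the factor depends on which companies are included in the sample.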
4. Results
4.1. Cybersecurity field
We select all the companies whose description contains at least two cybersecurity-related
terms and obtain 2,429 companies and 477 technologies.9 Figure 6 displays the structure of
the bi-partite network between technologies and companies.
9 The word list is in the Appendix 5.3.
Law
Service Industry
Physical
EdTech
Manufacturing
National Security Enforcement
Security
Electronics Aerospace
E-Learning
Intrusion Detection
Homeland Security
Innovation Management
Market Research
Swiss Security Solutions
Honeywell International
Ethereum Machine Learning
Private Social Networking
Fitchain
Personal Health
Information Technology
Mobile Devices
Cyber Security
Network Security
Privacy
Computer
1Password
Security
Consulting
Personal Finance
Blockchain
Artificial Intelligence
Silver Shark Solutions
Acronis
Management
Consulting
Big Data
Information Services
Digital Marketing
Marketing Automation
Identity Management
Software
Enterprise Software Web Development
Cloud Infrastructure
Cloud
File Sharing
Computing
Fig. 6: Bi-partite network of cybersecurity companies
This figure describes the bi-partite network of a subset of cybersecurity companies (red nodes) and the
technologies they are involved in (blue nodes). The node size represents the number of neighbours.
We assume that investors are only interested in previous investments, both for technologies and companies. We examine how the parameters’ calibration step changes when we
change the investors’ preferences using a smaller sample of companies. Figure 7 shows the
optimization in which the correlations ρc and ρt change according to α and β. In Table 3,
we identify the optimal α∗ and β∗ to be 0.04 and -1.88 for companies and 0.48 and -2.00
for technologies, respectively. Next, we plug these values into the recursive algorithm.
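A grid search of this kind can be sketched as below. This is a generic illustration under stated assumptions: the optimum is taken to be the grid point with the highest correlation, the 0.04 grid step is inferred from the values reported in Table 3, and `toy_objective` is a hypothetical stand-in for the correlation that the real calibration would compute by running TechRank for each (α, β) pair.

```python
import numpy as np

def grid_search(objective, alphas, betas):
    """Return (alpha, beta, value) maximizing objective(alpha, beta) over the grid."""
    best_alpha, best_beta, best_value = None, None, -np.inf
    for alpha in alphas:
        for beta in betas:
            value = objective(alpha, beta)
            if value > best_value:
                best_alpha, best_beta, best_value = alpha, beta, value
    return best_alpha, best_beta, best_value

def toy_objective(alpha, beta):
    """Hypothetical surrogate with a single peak at (0.4, -1.2); not TechRank output."""
    return -((alpha - 0.4) ** 2 + (beta + 1.2) ** 2)

# Assumed grid: values from -2.0 upward in steps of 0.04, rounded to two decimals.
grid = np.round(np.arange(-2.0, 2.0, 0.04), 2)
alpha_star, beta_star, best = grid_search(toy_objective, grid, grid)
```

The cost grows with the product of the two grid sizes, and each evaluation requires a full run of the recursion, which is consistent with the calibration step taking a large share of the runtime.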
            Companies                     Technologies
Number      α∗        β∗        Number    α∗        β∗
10         -0.36      1.92      26       -2.00      0.00
100        -0.04      0.92      134       0.52     -1.04
499        -0.08      0.88      306       0.68     -1.36
997        -0.12      0.80      371      -2.00      0.00
1,494      -0.12      0.80      416       0.92     -0.12
1,990      -0.04      0.92      449       0.56     -2.00
2,429       0.04     -1.88      477       0.48     -2.00
Table 3: Optimal parameters in cybersecurity
This table reports the optimal parameters α and β for companies and technologies in cybersecurity,
depending on the number of companies and linked technologies considered as input.
Fig. 7: Grid search of parameters α and β
This figure displays the results of the grid search for parameters α and β for 2,429 companies and 477
technologies in cybersecurity when investors’ preferences are fully set in previous investments. The panels
are titled “Correlation for Companies” and “Correlation for Technologies”; the axis ticks range from -2.0 to 1.84.
We illustrate the evolution of the TechRank random walker in Figure 8. While the
entities’ positions significantly change over the first steps, they gradually stabilize. With
the 2,429 companies and 477 technologies, the algorithm requires 723 (1,120) iterations for
companies (technologies) to converge. Entities starting with a high score (the initialisation
is the degree of the node) do not significantly change rank and remain among the best ones.
Thus, the algorithm assigns good scores to entities with many neighbours. In contrast, entities
starting with a low degree may significantly change their score, especially in the case of
technologies. TechRank not only recognizes the importance of the most established
entities, it also helps identify emerging technologies. Figure 11 shows the top-ranked
entities in cybersecurity.
We check how TechRank performs when we change the number of companies and technologies. We fix the number of companies nc, which yields a resulting number of technologies
nt. For instance, in the cybersecurity field, by selecting 10 companies randomly, we get 26
technologies. Considering that there are 2,429 cybersecurity-related companies on Crunchbase, we study the runtime of the algorithm for 10, 100, 499, 997, 1,494, 1,990, and
2,429 companies and 26, 134, 306, 371, 416, 449, and 477 technologies, respectively.
Fig. 8: TechRank scores evolution in cybersecurity
This figure displays the TechRank scores evolution over the iterations for 2,429 companies and 477
technologies in cybersecurity.
Figure 9 displays the results of TechRank applied on a subset of 10 cybersecurity companies. We note that “AppOmni”’s position does not change over the iterations, while two
of its technologies, “Software as a Service” (SaaS) and “cloud management”, increase their
scores. Figure 10 displays this restricted network of 10 companies and shows that
SaaS and cloud management do not have other links. Hence, the strength of this company
depends on its ability to combine important technologies (software, cyber security, and cloud
security) with more exotic fields. Similarly, “Integrity Marketing Group” is the only company involved
in some fields (marketing, digital marketing, and advertising). This company does not use
more established technologies and thus does not improve its score. Again, in Figure 10 we
observe that these technologies lie outside the main network. Conversely, “Lacework” and
“Acronis” follow opposite trends: Lacework (Acronis) significantly increases (decreases)
its score. One explanation for this behaviour is that Acronis is involved in many
technologies, most of which are not explored by other companies. Lacework, on the other hand,
relies on recognized technologies (security, cloud security, and software). Interestingly, the compliance technology benefits from its connections, increasing its rank by three
positions.
Fig. 9: TechRank scores evolution of 10 companies in cybersecurity
This figure displays the TechRank scores evolution over the iterations on a subset of 10 companies and 26
technologies in cybersecurity.
Fig. 10: Circular network representation of 10 companies in cybersecurity
This figure displays a circular network representation of a subset of 10 companies and 26 technologies in
cybersecurity.
Fig. 11: TechRank top five scores in cybersecurity
The left (right) panel displays the top five companies (technologies) according to the TechRank score when
run on a subset of 10 companies in cybersecurity. The panels are titled “TechRank best Companies in
Cybersecurity” and “TechRank best Technologies in Cybersecurity”, with the TechRank score on the horizontal
axis. The companies shown are Axis Security, MeWe, Lacework, CAST Software, and GAN Integrity; the
technologies shown are Software, Cyber Security, Security, Information Technology, and Enterprise Software.
Table 4 reports the number of algorithm iterations before reaching convergence. The
number of iterations needed appears to be independent from the number of entities. Technologies need more iterations than companies, which we explain by the fact that there are
many more companies than technologies. Since each company has at least one edge, the
technology nodes have a higher degree than the companies on average. Thus, we expect
the structure and the dynamics related to technologies to be more complex. The algorithm
complexity depends not only on the number of entities, but also on the network structure.
        Companies                 Technologies
Number      Iterations C      Number      Iterations T
10          32                26          18
100         100               134         155
499         134               306         2,469
997         196               371         194
1,494       180               416         871
1,990       240               449         5,000
2,429       723               477         1,120
Table 4: TechRank convergence
This table reports the number of TechRank iterations before convergence for companies and technologies in
cybersecurity.
4.2. Investment strategy
We investigate how investors can select the strategy that reflects their preferences. If
investors prefer to focus on technologies, they should choose companies working on the best
technologies as selected by the (highest) TechRank score. This decision involves many criteria
such as the number of technologies they want to be invested in, the capital allocation for
each company, and the diversification. We sketch the procedure to solve this decision process
quantitatively in Figure A1 of the Appendix.
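One branch of this decision process (focus on technologies, pick the best company for each selected technology, and split the capital equally) can be sketched as follows. The function name, the data structures, and the toy scores are hypothetical and chosen only to illustrate the idea; other branches, such as unequal splits, are not covered.

```python
def allocate_capital(capital, tech_scores, company_scores, links, n_technologies):
    """Pick the best-scored company for each top technology and split capital equally.

    tech_scores / company_scores: dicts mapping names to TechRank scores.
    links: dict mapping each technology to the companies working on it.
    """
    top_techs = sorted(tech_scores, key=tech_scores.get, reverse=True)[:n_technologies]
    selected = []
    for tech in top_techs:
        # Skip companies already chosen so that each technology adds a new company.
        candidates = [c for c in links.get(tech, []) if c not in selected]
        if candidates:
            selected.append(max(candidates, key=company_scores.get))
    if not selected:
        return {}
    share = capital / len(selected)   # equal split: K / n
    return {company: share for company in selected}

# Hypothetical scores and links.
tech_scores = {"Software": 6.0, "Security": 4.0, "Marketing": 1.0}
company_scores = {"A": 5.0, "B": 3.0, "C": 2.0}
links = {"Software": ["A", "B"], "Security": ["A", "C"], "Marketing": ["C"]}
portfolio = allocate_capital(100.0, tech_scores, company_scores, links, n_technologies=2)
```

Here the two best technologies are Software and Security; company A is chosen for Software and, since A is already held, company C for Security, each receiving half of the capital.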
4.3. Comparison with the Crunchbase rank
Crunchbase assigns a rank to the top companies of each industry, which takes into account
the entity’s strength of relationships, funding events, news articles, and acquisitions.10 We
compare our results in the cybersecurity sector with the Crunchbase rank and investigate
the strength of the association between the two scores using Spearman’s correlation.
To make the ranks comparable, we convert our algorithm’s output into a ranking. The
resulting Spearman’s correlation of 1.4% indicates that the two ranks are uncorrelated. We
explain these differences by the fact that the Crunchbase rank is fixed, while TechRank is
customizable according to investors’ preferences. Moreover, the Crunchbase rank focuses on
the company’s level of activity and not on its market influence. Furthermore, the Crunchbase
rank results from an algorithm that involves all the companies, while we focus only on
a subset. We also vary the investors’ preferences and never obtain correlation
coefficients above 2%. Other explanations we propose for this divergence include the fact
that we assign a weight which identifies the distance between entities in the ranking. Along the
same lines, TechRank allows decision-makers to set a threshold as a starting parameter before
running the algorithm. Finally, the Crunchbase algorithm is not open source and we do not
know its mechanism, which makes it difficult to identify the source of the divergence.
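The comparison step can be sketched as follows: scores are first converted into ranks, and Spearman's correlation is then the Pearson correlation of the two rank vectors. The function names and the toy data are assumptions made for this example, and ties are assumed absent for simplicity.

```python
import numpy as np

def ranks_descending(scores):
    """Convert scores to ranks, with rank 1 for the highest score (no tie handling)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=float)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks

def spearman(rank_a, rank_b):
    """Spearman's correlation computed as the Pearson correlation of two rank vectors."""
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Hypothetical example with four companies.
techrank_scores = [0.9, 0.1, 0.5, 0.7]   # higher score means better
crunchbase_rank = [1, 4, 3, 2]           # rank 1 means best
rho = spearman(ranks_descending(techrank_scores), crunchbase_rank)
```

In this toy case the two orderings coincide, so the correlation equals one; values close to zero, as reported above, indicate that the two rankings are essentially unrelated.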
4.4. Runtime
We detail the code related to the TechRank algorithm in the Appendix 1. We run it
on a machine with a 16-core Intel Xeon CPU E5-2620 v4 @ 2.10GHz and with 126GB of
memory. We investigate the variations in runtime given changes in the number of companies
and technologies.
The runtime increases with the number of entities. For technologies, the curve is
much steeper than that for companies. However, considering that the number of technologies
is directly linked to the number of companies, we repeat the experiment treating companies and
technologies together. The random walk phase lines represent the runtime to convergence.
There is a strong similarity between the runtime for companies and technologies, which is
surprising given their different numbers. This also shows how strongly they are correlated
and supports the capability of TechRank to capture the complexity of the cybersecurity
technological landscape. Table 5 reports all the runtimes and we report the corresponding
runtime comparisons for technologies and companies of both the cybersecurity and medical fields
in Figure A2 of the Appendix.
10 https://about.crunchbase.com/blog/influential-companies/
                    Companies                                              Technologies
Number    Parameters’ calibration (s)    Convergence (s)    Number    Parameters’ calibration (s)    Convergence (s)
10        10.21                          0.56               26        11.75                          0.57
100       28.69                          13.24              134       35.37                          13.72
499       189.03                         470.10             306       154.79                         483.25
997       730.43                         2,023.18           371       312.65                         2,392.46
1,494     1,372.17                       4,514.11           416       482.18                         4,404.48
1,990     2,057.42                       8,396.26           449       656.95                         8,096.69
2,429     3,230.99                       16,890.26          477       1,071.84                       12,779.62
Table 5: TechRank runtime
This table reports the TechRank runtime for companies and technologies in cybersecurity.
4.5. Exogenous factors
We conduct a sensitivity analysis based on investors’ preferences. We restrict the analysis
to 1,000 companies only, given the long runtime required. We assume investors to be
interested in firm location only and consider the case of an investor based in New York City
and in San Francisco, in turn. In Table 6 we report the outcome in terms of location for
the five top ranked companies in both cases. We uncover a location change in the company
ranking, with the first one being in the state of New York (investor based in New York City)
and in the state of California (investor based in San Francisco), respectively. The companies
with lower rank also reflect these geographical preferences, albeit with some exceptions
(Singapore and Beijing). This implies that, even if remote companies are disadvantaged,
their other attributes can overcome this disadvantage.
Company rank/
Investor location
New York City
San Francisco
1
New York City
(USA)
California
(USA)
2
3
Massachusetts
Quebec
(USA)
(Canada)
Illinois
California
(USA)
(USA)
4
5
California
(USA)
Beijing
(China)
Singapore
(Singapore)
Arizona
(USA)
Table 6: Companies TechRank scores with location
This table reports the location of the top five TechRank companies’ scores when the geographical
preference is fully set in the location of the investors (New York City and San Francisco).
4.6. Robustness tests
To test the robustness of our algorithm and benchmark the cybersecurity sector, we apply
TechRank to companies in the medical sector. We choose this sector given the large
number of companies (twice the number of companies working in cybersecurity). We select
the companies with the same methodology, which yields a total of 4,996 companies and 437
technologies. Figure 12 shows the results of TechRank in the medical sector. The runtime
for these companies, reported in Figure A2 of the Appendix, is on par with that of the cybersecurity
sector. To make the two fields comparable, we set as x-label the number of entities, for both
companies and technologies. The results reveal that the runtimes of the two fields, for both
the parameter calibration and the random walker steps, follow the same behaviour, for both
companies and technologies. Increasing the number of entities does not yield any significant
change in terms of either convergence or runtime behaviour. Finally, unlike Klein et al.
(2015), for which α remains constant and β changes significantly, we observe that all of
our parameters significantly change across sectors [21].
Fig. 12: TechRank scores evolution in the medical field
This figure displays the TechRank scores evolution over the iterations for 2,429 companies and 477
technologies in the medical field.
5. Conclusion
5.1. Limitations
We choose technologies related to cybersecurity according to the Crunchbase description,
e.g., security, privacy, or confidentiality. Since these words may overlap other fields, we require the description to contain at least two of them to classify a company as cybersecurity.
This naive strategy could be improved with more sophisticated techniques, such as natural
language processing (NLP). We also face limitations given the lack of information about companies’ resource allocation towards technologies. We only have a list of technologies for each
company without more information. Instead, it would be helpful to observe the amount of
expenditures towards each technology. Our algorithm is static as we do not have access to
time series and it would be interesting to study how the bi-partite network changes. Finally,
we are well aware that introducing exogenous variables may induce a bias given potential
outliers. Our normalization procedure divides the factors by their maximum, which may lead
to disproportionate results if the maximum is an outlier. However, we do not believe that
removing outliers is a viable solution, since this would lead to overlooking potentially profitable
opportunities.
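The sensitivity of max-normalization to an outlier can be seen in a small numerical sketch. The function name and the numbers below are hypothetical and serve only to illustrate the effect described above.

```python
import numpy as np

def max_normalize(values):
    """Divide each value by the maximum, as in the factor normalization."""
    values = np.asarray(values, dtype=float)
    return values / values.max()

# Hypothetical funding amounts, without and with one extreme value.
regular = max_normalize([10.0, 20.0, 40.0])
with_outlier = max_normalize([10.0, 20.0, 40.0, 4000.0])
```

Without the outlier the factors are 0.25, 0.5, and 1; once the extreme value is added, the same three entities are compressed to 0.0025, 0.005, and 0.01, so their relative differences barely influence the ranking.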
5.2. Further research
This research can be expanded with time series, to investigate, for instance, the outcome
of a company divesting from a technology, or investing in a new one. This would also
make it possible to assess which new technologies are successful and would give more insight into
investment choices in very recent ideas. A focus on percolation theory could help assess
the effects of a node disappearance in the network [27]. For this purpose, machine learning
methods could also be employed. Further research should be devoted to investigating additional
exogenous factors in TechRank, giving investors the widest possible range of features.
Alternative exogenous factors include the inception date of the company, number
of employees, social networks activity, and even the Crunchbase rank, which is itself based
upon the entity’s strength of relations, funding events, news articles, or acquisitions. By the same
token, further investigation should confirm the pertinence of the TechRank algorithm in fields
that include more entities, as an increase in nodes could lead to coordination problems.
Finally, it would be crucial to assess the long-term effects of the TechRank algorithm on
investment returns and technologies’ development through back-testing, which, once again,
requires time series.
5.3. Conclusion
We introduce TechRank, an algorithm that assigns a score to companies and technologies
in complex systems. This methodology constitutes the first step towards a new data-driven
investment strategy, which enables investors to follow their preferences while benefiting from
a quantitative approach. We include investors’ preferences based upon a case-by-case study.
The convergence of our algorithm depends on the number of entities and the complexity of the
relationships within the bi-partite network. Using a restricted number of companies in cybersecurity, we analyze the TechRank scores and we explain the score variations of entities
over iterations. We also explore how results change depending on the company’s location. Finally, we conduct robustness tests in the medical field, for which our results are qualitatively
similar.
We believe that our approach brings value by helping investors form their decisions. Moreover, our algorithm’s flexibility allows the inclusion of exogenous factors and preferences, which
is impossible in alternative existing company ranks, such as that of Crunchbase. Given our
algorithm’s performance for cybersecurity, a highly complex market, as a case study, we believe that our algorithm would perform well in all markets. TechRank is a complementary,
if not alternative, way to look at portfolio choices.
References
[1] Battiston, F., Nicosia, V., Latora, V., 2014. Structural measures for multiplex networks.
Physical Review E 89, 1–14.
[2] Bavelas, A., 1948. A mathematical model for group structures. Applied Anthropology
7 (3), 16–30.
[3] Benzi, M., Estrada, E., Klymko, C., 2013. Ranking hubs and authorities using matrix
functions. Linear Algebra and its Applications 438, 2447–2474.
[4] Besten den, M. L., 2021. Crunchbase research: Monitoring entrepreneurship research in
the age of big data. Available at http://dx.doi.org/10.2139/ssrn.3724395
[5] Bonacich, P., 1972. Factoring and weighting approaches to status scores and clique
identification. Journal of Mathematical Sociology 2, 113–120.
[6] Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E., Havlin, S., 2010. Catastrophic
cascade of failures in interdependent networks. Nature 464, 1025–1028.
[7] Canito, J., Ramos, P., Moro, S., Rita, P., 2018. Unfolding the relations between companies and technologies under the Big Data umbrella. Computers in Industry 99, 1–8.
[8] Christensen, C. M., 1997. The Innovator’s Dilemma: When New Technologies Cause
Great Firms to Fail. Harvard Business School Press, Boston, MA.
[9] Cochrane, J. H., 2005. The risk and return of venture capital. Journal of Financial
Economics 75, 3–52.
[10] Dalle, J.-M., Besten den, M. L., Menon, C., 2017. Using Crunchbase for economic and
managerial research. Available at https://doi.org/10.1787/18151965
[11] Dalle, J.-M., Besten den, M. L., Menon, C., 2017. Using Crunchbase for economic and
managerial research. Available at https://doi.org/10.1787/18151965
[12] Donato, D., Laura, L., Leonardi, S., Millozzi, S., 2004. Large scale properties of the
Webgraph. European Physical Journal B 38, 239–243.
[13] Ewens, M., 2009. A new model of venture capital risk and return. Available at
http://dx.doi.org/10.2139/ssrn.1356322
[14] Freeman, L. C., 1978. Centrality in social networks conceptual clarification. Social Networks 1, 215–239.
[15] Gold, A. H., Malhotra, A., Segars, A. H., 2001. Knowledge management: An organizational capabilities perspective. Journal of Management Information Systems 18,
185–214.
[16] Gordon, L. A., Loeb, M. P., Lucyshyn, W., Zhou, L., 2018. Empirical evidence on the
determinants of cybersecurity investments in private sector firms. Journal of Information
Security 9, 133–153.
[17] Gornall, W., Strebulaev, I. A., 2020. Squaring venture capital valuations with reality.
Journal of Financial Economics 135, 120–143.
[18] Hidalgo, C. A., Hausmann, R., Dasgupta, P. S., 2009. The building blocks of economic
complexity. Proceedings of the National Academy of Sciences 26, 10570–10575.
[19] Ingole, P. V., Nichat, M. K., 2013. Landmark based shortest path detection by using dijkestra algorithm and haversine formula. International Journal of Engineering Research
and Applications (IJERA) 3, 162–165.
[20] Katz, L., 1953. A new status index derived from sociometric analysis. Psychometrika
18, 39–43.
[21] Klein, M., Maillart, T., Chuang, J., 2015. The virtuous circle of Wikipedia: Recursive
measures of collaboration structures. In: Proceedings of the 18th ACM Conference on
Computer Supported Cooperative Work & Social Computing, 1106–1115.
[22] Korteweg, A., Nagel, S., 2016. Risk-adjusting the returns to venture capital. Journal of
Finance 71, 1437–1470.
[23] Kurant, M., Thiran, P., 2006. Layered complex networks. Physical Review Letters 96
(13), 1–4.
[24] Moskowitz, T. J., Vissing-Jørgensen, A., 2002. The returns to entrepreneurial investment: A private equity premium puzzle? American Economic Review 92, 745–778.
[25] Page, L., Brin, S., Motwani, R., Winograd, T., 1999. The PageRank citation ranking:
Bringing order to the web. Available at http://ilpubs.stanford.edu:8090/422/
[26] Peng, L., 2001. Building a venture capital index. Available at
http://dx.doi.org/10.2139/ssrn.281804
[27] Piraveenan, M., Prokopenko, M., Hossain, L., 2013. Percolation centrality: Quantifying
graph-theoretic impact of nodes during percolation in networks. PLOS ONE 8 (1), 1–14.
[28] Saxena, A., Iyengar, S., 2020. Centrality measures in complex networks: A survey.
Available at https://doi.org/10.48550/arXiv.2011.07190
[29] Saxena, A., Iyengar, S. R. S., 2017. Global rank estimation. Available at
https://arxiv.org/abs/1710.11341
[30] Schwartz, E. S., Moon, M., 2000. Rational pricing of internet companies. Financial
Analysts Journal 56 (3), 62–75.
[31] Tu, X., Jiang, G.-P., Song, Y., Zhang, X., 2018. Novel multiplex PageRank in multilayer
networks. IEEE Access 6, 12530–12538.
[32] Xing, W., Ghorbani, A., 2004. Weighted PageRank algorithm. In: Second Annual Conference on Communication Networks and Services Research, 305–314.
[33] Zhong, H., Chuanren, L., Zhong, J., Xiong, H., 2018. Which startup to invest in: A
personalized portfolio strategy. Annals of Operations Research 263, 339–360.
Appendix
[Flowchart. Boxes and decisions, in reading order: Start; the investor has a capital K to invest; any strong
preference? (filter entities, or set preferences and weights); run the TechRank algorithm and get the
entities’ score; focus on companies (C) or technologies (T)?; one T or more Ts?; select the Cs working on
that T; one C or more Cs?; one C for each T?; select the best C for each T according to TechRank; select how
many Cs for each T; same number of Cs for each T?; decide how to define the number of Cs for each T
according to Ts’ score; invest the whole capital K in C; split capital K equally?; invest K/n in each
company (n: number of companies); define how much to invest in each C; End.]
Fig. A1: Flowchart of the investment process
This flowchart sketches a potential investment process that uses TechRank and investment preferences
(exogenous factors and investment styles) before reaching an optimal investment and portfolio choice.
List of words used to identify companies’ sectors
List of words related to cybersecurity: cybersecurity, confidentiality, integrity, availability, secure, security, safe, reliability, dependability, confidential, confidentiality, integrity,
availability, defence, defensive, privacy.
List of words related to the medical field: cure, medicine, surgery, doctors, nurses, hospital, medication, prescription, pill, health, cancer, antibiotic, HIV, cancers, disease, resonance, rays, CAT, blood, blood transfusion, accident, injuries, emergency, poison, transplant, biotechnology, health care, healthcare, health-tech, genetics, DNA, RNA, lab, heart,
lung, lungs, kidneys, brain, gynaecologist, cholesterol, diabetes, stroke, infections, infection,
ECG, sonogram.
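The selection rule (a description must contain at least two words from the list) can be sketched as follows. The tokenization, the choice to count distinct whole words, and the function names are assumptions of this example; the source does not specify how matches are counted.

```python
import re

# Cybersecurity word list from the text above, with repeated entries merged.
CYBER_WORDS = {
    "cybersecurity", "confidentiality", "integrity", "availability", "secure",
    "security", "safe", "reliability", "dependability", "confidential",
    "defence", "defensive", "privacy",
}

def count_matches(description, vocabulary):
    """Count distinct vocabulary words appearing as whole words in the description."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return len(tokens & vocabulary)

def is_cybersecurity(description, min_matches=2):
    """Apply the rule of at least two matching words (assumed to mean distinct words)."""
    return count_matches(description, CYBER_WORDS) >= min_matches

# Hypothetical company descriptions.
kept = is_cybersecurity("We build secure cloud storage with strong privacy guarantees.")
dropped = is_cybersecurity("A safe place for your photos.")
```

The first description matches two words (secure and privacy) and is kept, while the second matches only one and is dropped.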
[Figure A2 panels: “Runtime evolution” and “Runtime parameters calibration”. Ordinate: Seconds (logarithmic
scale); abscissa: Number of entities. Series: C medicine, T medicine, C cybersecurity, T cybersecurity.]
Fig. A2: Runtime comparisons
The left panel displays the grid search runtime for cybersecurity and medical fields. The right panel
displays the parameters’ calibration runtime for the cybersecurity and medical fields. The ordinate axis
uses a logarithmic scale.
Algorithm 1 Previous investments factor for companies
e_C ← [0] · len(c_names)
for c ∈ range(len(c_names)) do
    for i ∈ range(len(i_names)) do
        e_IC[i, c] ← Σ_{t=0}^{T} γ_{i,c}^{t}    ⊲ γ_{i,c}^{t} is the amount of the investment from i to c at time t
        e_C[c] ← e_C[c] + e_IC[i, c]
    end for
end for
e_max ← max(e_C)
f_C ← e_C / e_max    ⊲ f_C: list of previous investments for each company
return f_C
Algorithm 2 Previous investments factor for technologies
e_C ← [0] · len(c_names)
for c ∈ range(len(c_names)) do
    for i ∈ range(len(i_names)) do
        e_IC[i, c] ← Σ_{t=0}^{T} γ_{i,c}^{t}    ⊲ γ_{i,c}^{t} is the amount of the investment from i to c at time t
        e_C[c] ← e_C[c] + e_IC[i, c]
    end for
end for
e_T ← e_C · M_CT    ⊲ Matrix multiplication
e_max ← max(e_T)
f_T ← e_T / e_max    ⊲ f_T: list of previous investments for each technology
return f_T
Algorithm 3 Geographic coordinates factor
h_dict ← {}
for c_name, c_address ∈ c_locations do
    lat ← c_address.latitude
    lon ← c_address.longitude
    h ← haver_dist(lat, lon, lat_inv, lon_inv)    ⊲ haver_dist is a function we have created
    h_dict[c_name] ← h    ⊲ store the distance itself so that the normalization below matches 1 − h/h_max
end for
h_max ← max(h_dict.values())
for c_name, h ∈ h_dict do
    h_dict[c_name] ← 1 − h/h_max
end for
return h_dict
Distance computation
We obtain the distance between two points on earth with the Haversine approximation
(hav(θ)), using latitude and longitude of the locations [19].
Let $(\lambda_1, \varphi_1)$ and $(\lambda_2, \varphi_2)$ be the longitude and latitude in radians of two points on a
sphere and $\theta$ the central angle given by the spherical law of cosines. The Haversine distance
writes,
\begin{equation}
h = \operatorname{hav}(\theta) = \operatorname{hav}(\varphi_2 - \varphi_1) + \cos\varphi_1 \cos\varphi_2 \operatorname{hav}(\lambda_2 - \lambda_1).
\tag{11}
\end{equation}
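For reference, the haversine function and the step from $h$ to a length can be written out as follows. The sphere radius $r$ and the resulting distance $d$ are symbols introduced here for illustration; the text does not state which radius is used.

```latex
\begin{align}
\operatorname{hav}(\theta) &= \sin^2\!\left(\frac{\theta}{2}\right) = \frac{1 - \cos\theta}{2}, \\
d &= r\,\theta = 2r \arcsin\!\left(\sqrt{h}\right).
\end{align}
```

Since $h$ lies in $[0, 1]$, the central angle $\theta = 2\arcsin(\sqrt{h})$ lies in $[0, \pi]$, and $d$ is the great-circle distance between the two points.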