TechRank

2022, arXiv (Cornell University)

We introduce TechRank, a recursive algorithm based on a bi-partite graph with weighted nodes. We develop TechRank to link companies and technologies based on the method of reflection. We allow the algorithm to incorporate exogenous variables that reflect an investor's preferences. We calibrate the algorithm in the cybersecurity sector. First, our results help estimate each entity's influence and explain companies' and technologies' ranking. Second, they provide investors with a quantitative optimal ranking of technologies and thus, help them design their optimal portfolio. We propose this method as an alternative to traditional portfolio management and, in the case of private equity investments, as a new way to price assets for which cash flows are not observable.

arXiv:2210.07824v1 [q-fin.CP] 14 Oct 2022

Anita Mezzetti^1, Loïc Maréchal^2,3, Dimitri Percia David^4, William Lacube^2, Sébastien Gillard^5,6, Michael Tsesmelis^2, Thomas Maillart^5, and Alain Mermoud^2

^1 Credit Suisse
^2 Cyber-Defence Campus, armasuisse Science and Technology
^3 University of Lausanne (HEC Lausanne)
^4 University of Applied Science (HES-SO Valais-Wallis)
^5 GSEM, University of Geneva
^6 Military Academy, ETH Zurich

JEL classification: C14, C69, G17, G24

Keywords: private equity, bipartite networks, technology monitoring, portfolio optimization

This document is the result of a research project funded by the Cyber-Defence Campus, armasuisse Science and Technology.

1. Introduction

This work investigates the innovation structure and the dynamics underlying the life cycle of technologies. We fill two research gaps. The first concerns the identification of future benefits and risks of emerging technologies for society. The second regards the valuation of early-stage companies and optimal investment decisions. To fill these gaps, we introduce the TechRank algorithm.
Our methodology assigns a score to each entity, i.e., technologies and firms, based upon its contribution to the technological ecosystem. We expect this method to help stakeholders form optimal decisions for investment, procurement, and technology monitoring. We calibrate our model in the cybersecurity sector, although TechRank could apply to any sector. The cybersecurity technological landscape represents a particular challenge for this calibration, given its large share of start-ups and its fast pace of innovation[16]. Moreover, the large number of cyber-attacks and the increasing costs they incur have boosted cybersecurity investments.^1 According to Bloomberg, “the global cybersecurity market size is expected to reach USD 326.4 billion by 2027, registering a compound annual growth rate of 10.0% from 2020 to 2027”.^2

To develop the TechRank algorithm, we first model and map the ecosystem of companies and technologies from the Crunchbase dataset using a bi-partite network. The bi-partite network structure accurately describes this complex and heterogeneous system. We evaluate the relative influence of the network nodes in the ecosystem by adapting a recursive algorithm that estimates network centrality. This methodology should help decision-makers and investors assess the influence of entities in the cybersecurity ecosystem, reducing investment uncertainties. In fact, around 90% of startups fail, and in 42% of the cases this is due to an incorrect evaluation of market demand. The second most common reason (29%) is that they run out of funding and personal money.^3 Christensen (1997) highlights that well-managed companies also break down, because they over-invest in new technologies[8]. Thus, selecting the right technologies to invest in goes hand in hand with the optimal investment strategy.

Our research takes inspiration from Google’s PageRank algorithm, which ranks web pages according to readers’ interest[25].
We use a similar approach with bi-partite networks to assign a score to companies and technologies. Our method is flexible and allows incorporating investors’ preferences such as location or previous funding rounds. TechRank lets investors select the entities’ features that reflect their interests. The algorithm uses their choices as input, which tweaks the entities’ scores to reflect them. This enables investors to select a personalized portfolio strategy using a quantitative methodology. The evaluation of companies and new technologies largely depends on investors’ personal choices, which may lead them to misread market demand. This work aims to lead to more methodical decision making for investors.

The remainder of this article proceeds as follows. Section 2 presents the literature review and hypotheses. Section 3 details the data and the methodology. Section 4 presents the results. Section 5 concludes.

^1 The New York Times: “As Cyberattacks Surge, Security Start-Ups Reap the Rewards” by Erin Woo (July 26, 2021). Yahoo Finance: “Microsoft Securing its Position with Cybersecurity Investments” by TipRanks (July 20, 2021).
^2 Bloomberg: “Global Cybersecurity Market Could Exceed $320 Billion in Revenues by 2027” (July 29, 2020).
^3 Findstack: “The Ultimate List of Startup Statistics for 2021” by Jack Steward.

2. Literature review and hypotheses development

2.1. Centrality measures

In network analysis, centrality estimates the importance of nodes through ranking. The simplest centrality estimate is the “degree”, which counts the number of neighbours of a node. One of its drawbacks is that it does not show which node stands in the center of the network. Two nodes may share the same degree, while being more or less peripheral. Thus, the degree is a local centrality measure, which does not capture the influence across nodes within the graph.

Fig. 1: Central and peripheral nodes. This figure depicts the difference between central (red) and peripheral (brown) nodes in a graph.

Another important centrality measure is the “closeness”, which measures how long it takes for information to spread from one node to the next. Specifically, closeness is defined as the reciprocal of the “farness”, i.e. the sum of distances of one node with respect to all other nodes. The “betweenness centrality” of a node measures how often a node stands in the shortest path between a pair of other nodes (see, e.g., Bavelas, 1948; Saxena and Iyengar, 2020; and Freeman, 1978)[2, 28, 14]. Another strand of research focuses on the top-K shortest path identification in a complex network, a topic less tackled by the literature than centrality. To rank nodes, one must compute the centrality of all nodes and compare them to extract the rank, which is not always feasible due to the size of the network. To overcome this problem, Saxena and Iyengar (2017) attempt to estimate the global centrality of a node without analyzing the whole network[29]. Similarly, Bavelas (1948) develops a structural centrality measure in the context of social graphs[2]. Other centrality concepts include the eigenvector, Katz, or PageRank centralities[5, 25, 20]. Finally, Freeman (1978) creates a formal mathematical framework for centrality, which includes degree, closeness, and betweenness, and advocates for the combination of different kinds of centrality measures[14].

2.2. Page Rank

Page, Brin, Motwani, and Winograd (1999) develop the PageRank algorithm[25]. Its primary goal is to rank web pages objectively, a challenge with the fast-growing web. PageRank assigns a score to each web page based on its relations with other web pages in the graph. Other fields have benefited from PageRank, providing modifications and improvements. Xing and Ghorbani (2014) extend the algorithm and propose the weighted PageRank (WPR)[32].
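To illustrate how a score can arise from a page's relations with other pages, the following is a minimal sketch of the standard (unweighted) PageRank power iteration, not of WPR. The function name `pagerank`, the four-page example, and the damping factor of 0.85 (the conventional choice, not a value taken from this article) are assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Every page first receives the same "teleportation" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page without outlinks spreads its rank over all pages.
            targets = out_links[p] or pages
            share = damping * rank[p] / len(targets)
            for q in targets:
                new_rank[q] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web: A, B, and C all link to D; D links back to A.
web = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
scores = pagerank(web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, D collects rank from three pages and passes most of it on to A, so both end up ahead of B and C, which receive no inlinks. Each page divides its rank evenly among its outlinks here; a weighted variant would replace the even `share` with link-specific weights.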
WPR assigns larger rank values to more important pages, instead of dividing the rank evenly among a page's outlink pages.^4 Each outlink page gets a value proportional to its popularity, taking into account the link weights. One caveat of PageRank and its variants is that they do not consider n-partite structures, because they assume that web pages can all be linked to one another. Bi-partite networks address this issue and capture this complexity, among other structures.

2.3. Bi-partite networks

Networks are a fundamental tool to capture the relations between entities. Graphs (G) are composed of vertices (V) and edges (E), and we denote G = (V, E). We build links and mathematically analyze many properties of the whole system and of individual entities. To graphically represent a real system, we synthesize its information into a simple graph framework. This simplification generates an information loss in the modelling process. Simple network structures might discard important information about the structure and function of the original system[23]. As a consequence, the failure of a very small fraction of nodes in one network may lead to the complete fragmentation of a system[6]. To solve the problem, extensions to the simple structure G = (V, E) are added and yield graphs with more powerful features. For instance, in case vertices are connected by relationships of different kinds, Battiston, Nicosia, and Latora (2014) advocate working with multiplex networks, i.e. networks where each node appears in a set of different layers, and each layer describes all the edges of a given type[1]. When it is possible to distinguish the nature of the edges, multiplex networks are an effective approach, which starts from embedding the edges in different layers according to their type. However, in our case, even though we have two kinds of nodes, the edges are of a single kind. Therefore, a more suitable approach is bi-partite networks.
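A bi-partite graph with two node sets can be stored as a rectangular 0/1 matrix, with one row per node of the first set and one column per node of the second, so that edges within a set cannot be represented at all. The sketch below builds such a matrix with NumPy; the company and technology labels and the list of links are hypothetical placeholders.

```python
import numpy as np

# Hypothetical company-technology links; the labels are placeholders.
links = [
    ("c1", "t1"), ("c1", "t2"),
    ("c2", "t2"),
    ("c3", "t2"), ("c3", "t3"),
]

companies = sorted({c for c, _ in links})
technologies = sorted({t for _, t in links})
row = {c: i for i, c in enumerate(companies)}
col = {t: j for j, t in enumerate(technologies)}

# Biadjacency matrix: M[c, t] = 1 if company c is linked to technology t.
M = np.zeros((len(companies), len(technologies)), dtype=int)
for c, t in links:
    M[row[c], col[t]] = 1

company_degree = M.sum(axis=1)      # number of technologies per company
technology_degree = M.sum(axis=0)   # number of companies per technology
```

Row and column sums give the degree of each node, the local centrality measure discussed in Section 2.1.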
Bi-partite networks are, for instance, a good way to describe the technological and business landscape. In Figure 2, we depict two sets of nodes, companies and technologies, which are interconnected but do not present edges within the same set. There are multiple adaptations of the PageRank algorithm to bi-partite structures[3, 12, 31, 21]. In particular, Benzi, Estrada, and Klymko (2015), Donato, Laura, Leonardi, and Millozzi (2004), and Tu, Jiang, Song, and Zhang (2018) extend the PageRank algorithm to multiplex networks. They assume that only some clusters of the graph are multiplex networks and extend the PageRank algorithm only to analyze the sub-graph centrality. Bipartite networks are used to transform directed into undirected networks with twice the number of vertices. Klein, Maillart, and Chuang (2015) extend PageRank in the context of Wikipedia editors and articles[21]. The application of this algorithm to the case of interactions between companies and technologies is straightforward. A major benefit of this approach is that it starts from an unweighted graph, linking editors and articles. They develop a recursive algorithm in which the two entities contribute to the quality (for articles) or the expertise (for editors) of each other. They develop a bi-partite random walker by building the adjacency matrix $M_{e,a}$ that takes value 1 if editor e has edited article a and 0 otherwise, which tracks all the editors’ contributions. They obtain $M_{e,a} \in \mathbb{R}^{n_e \times n_a}$, where $n_e$ and $n_a$ are the number of editors and articles.

^4 Given a web page W, an inlink of W is a link of another web page that includes a link pointing to W. An outlink of W is a link appearing in W, which points to another web page.

Fig. 2: Bi-partite structure of companies and technologies. The left panel depicts a typical bi-partite structure. The right panel provides an illustration of this structure with companies (layer 1) and technologies (layer 2).
They sort editors by the number of articles’ contributions and assign a contribution (quality) value to each editor (article) based on their degree. The expertise $w_e^0$ (quality $w_a^0$) is given by the number of articles (editors) they have worked on (have received modifications from). The second part of the algorithm follows a Markov process in its iterations. The step $w^n$ ($w^n = w^n(\alpha, \beta)$) only depends on information available at $w^{n-1}$. At each step, the algorithm incorporates information about the expertise of editors and the quality of articles, within the bi-partite network structure. The process is a random walker with jumps, whose transition probability is zero in the case $M_{e,a} = 0$. Next, the authors define two variables for the transition probability, $G_{e,a}(\beta)$ and $G_{a,e}(\alpha)$. $G_{e,a}(\beta)$ represents the probability of jumping from article a to editor e and $G_{a,e}(\alpha)$ represents the probability of jumping from an editor to an article. Both parameters depend on initial conditions, and the selection of optimal parameters is done through a grid search that maximizes the Spearman rank-correlation between the rank given by the model and a ground-truth metric obtained independently. Finally, Klein et al. (2015) observe a “less-is-more” situation, since the presence of too many editors working on an article is detrimental to its quality. Studying different categories of Wikipedia articles, they find α to remain constant, while β varies significantly across categories.

Estimating the global rank of a node starting from local information and centrality measures is still an open research question in many sectors[28]. In particular, no research to our knowledge uses this approach for investment decisions and portfolio optimization. Yet, this approach could help overcome the limitations of standard financial models in private equity, in which the network structure is easily obtainable, whereas the cash flow process is not.

2.4. Private equity valuation

Private firms are not required to publicly disclose their financial statements, which makes it difficult to measure their past performance and estimate their expected returns without insider information. Moreover, since they are not listed on exchanges, we do not observe the expectations of market participants. Thus, standard asset pricing methods fall short in this context. Similarly, private equity analysts must rely on insider or private information to value private firms. These valuations generally occur around a financing round and require accounting for capital dilution to compute realized returns on the firm[17].

One approach that attempts to overcome these limitations and estimate the expected returns and risk in venture capital is Cochrane (2005)[9]. He uses a maximum likelihood estimation method to obtain these values at the market and sector levels, such as healthcare, biotechnology, technology companies, and retail services. He finds a mean arithmetic return of 59%, an alpha of 32%, a beta of 1.9, and a volatility of 86% (equivalent to a 4.7% daily volatility). Given that the distribution of returns is heavily positively skewed in venture capital, he adopts a logarithmic model that also accounts for the inherent selection bias of this asset class.^5 Ewens (2009) updates this method on returns computed from one financing round to the next[13]. He adopts a three-regime mixture model (failure, medium returns, and “home-runs”). He also corrects for the selection bias and obtains an alpha of 27% and a beta of 2.4. He finds that 60% of all venture capital investments have a negative log return. Altogether, the results are similar: venture capital investments exhibit positive alpha, large beta, and high volatility. Other attempts to evaluate the market parameters of the venture capital asset class include Korteweg and Nagel (2016) and Moskowitz and Vissing-Jorgensen (2010), with results in line with the aforementioned studies[22, 24].
Another strand of research attempts to index and benchmark the private equity market. Peng (2001) builds a venture capital index from 1987 to 1999 from about 13,000 financing rounds targeting over 5,600 firms[26]. He addresses the problems of missing data, censored data, and sample selection using a re-weighting procedure and method-of-moments regressions. From the index perspective, the results are qualitatively the same, that is, high and volatile returns to venture capital (average return of 55.18% per year). He finds his index to display a much higher volatility than the S&P 500 and NASDAQ indices and high exposure to these indices (betas of 2.4 and 4.7, respectively). Other venture capital index constructions include Hwang, Quigley, and Woodward (2005), Schmidt (2006), and Cumming, Haß, and Schweizer (2013), all obtaining results on par with the aforementioned studies.

One limitation of the above studies is that they only estimate these parameters at an aggregate level. An investor could form her investment decisions and portfolio choices by discriminating among sectors, but could not obtain the parameters of individual firms. One exception is Schwartz and Moon (2000), who provide an approach based upon real-options theory to price individual firms[30]. However, this method requires the observation of cash flows, and they only provide one calibration example with Amazon. Thus, there remains a methodological gap in helping investors form optimal decisions using all the information available. Given the recent venture capital boom, Zhong, Chuanren, Zhong, and Xiong (2018) advocate for the use of quantitative methodologies of screening and evaluation[33]. However, there is a clear research gap on methodologies that enable valuing early-stage companies and forming optimal portfolios.

^5 Most venture capital data is private, and available data are more often related to successful firms than to under-performing ones.
These methodologies either only enable valuing a sector, instead of a specific company, or require the use of cash flows that are unobservable. Non-financial features and relations between companies and technologies are instead numerous and easily observable[10]. We thus formulate our hypotheses as follows:

H1 Using a bi-partite network structure makes it possible to create an algorithm that ranks companies based on their links with technologies.
H2 This algorithm and its ranking may be improved and tilted towards investors’ preferences.
H3 The performance of the ranking is independent of the sector considered.

3. Data and methods

3.1. Data

We use Crunchbase data.^6 Crunchbase is a commercial database that provides access to financial and managerial data on private and public companies globally. It was created in 2007 by TechCrunch, which is a source of information about start-up activities and their financing within and across countries. This database has been largely adopted by both academics and industry practitioners[4]. It is also used by international organizations such as the OECD[11]. Crunchbase is made of data collected with a multifaceted approach that combines crowdsourcing (through venture programs or direct contributions), machine learning, in-house processing, and aggregation of third-party providers’ data. Crunchbase updates and revises data on a daily basis. The data is organized into several entities such as organizations, people, events, acquisitions, or IPOs. The primary focus of Crunchbase is the technology industry, although it also includes data on other sectors. Data can be accessed in two ways: using an API or downloading a *.csv file directly from the Crunchbase website. Data is split into several databases depending on their type. We provide a non-exhaustive list in Table 1.^7

We first analyze the Crunchbase dataset dedicated to investors.
With a total of 185,784 investors divided into 78,001 (41.98%) organisations and 107,783 (58.02%) persons, there are more investors than target companies. Figure 3 shows that the majority of investors are pure investors (87.11%). Some organisations are both investee and investor (12.65%). The remainder of the sample are typically universities. Crunchbase ranks the top 1,000 investors through its proprietary algorithm. Figure 3 also indicates that the largest share of investors is located in the USA (29.62%). In particular, there is a wide gap between the first and the second country, China, where 7.04% of investors are located.

^6 Crunchbase website: https://www.crunchbase.com/
^7 Crunchbase daily CSV export from https://data.crunchbase.com/docs/daily-csv-export. Data downloaded on April 28, 2021.

Field name            Description
organizations         Organisation profiles
organization desc     Long descriptions of organisation profiles
acquisitions          List of all acquisitions
org parents           Map between parent organisations and subsidiaries
ipos                  Detail for each IPO
people                People profiles
people desc           Long descriptions for people profiles
degrees               Detail of people’s education background
jobs                  List of all jobs and advisory roles
investors             Active investors (organisations and people)
investments           All investments
investment partners   Partners responsible for their firm’s investments
funds                 Details of investments funds
funding rounds        Details for each funding round in the dataset
events                Event details
event appearances     Event participation details

Table 1: Crunchbase files description. This table reports the main fields available from Crunchbase.

Fig. 3: Summary statistics of investors in Crunchbase data. The left panel depicts the distribution of investors according to their types (x-axis: roles; y-axis: count). The right panel depicts the number of investors per country (x-axis: country_code; y-axis: count).

3.2. Methodology

3.2.1. Adaptation of the work by Klein et al. (2015)

In this research we use a bi-partite network that describes the relations among companies and the technologies they are involved in. Figure 2 describes the typical bi-partite network structure. This structure benefits from advances in fields such as network theory, Markov chains, and machine learning. We adapt the recursive algorithm with the method of Hidalgo, Hausmann, and Dasgupta (2009)[18]. We expect the resulting rank to incorporate the positive influence of well-established companies on technologies and, at the same time, the positive impact of new companies that explore new fields. We build the adjacency matrix $M^{CT}_{c,t} \in \mathbb{R}^{n^c \times n^t}$, which takes value 1 if a company c works on a technology t and 0 otherwise. $n^c$ and $n^t$ represent the number of companies and technologies. We assume that well-established companies have more means to diversify their expertise and therefore have a relatively high number of neighbours[7, 15]. Thus, we initialize the algorithm with the degree of each entity, i.e. the number of its neighbours,

$$
\begin{cases}
w_c^0 = \sum_{t=1}^{n^t} M^{CT}_{c,t} = k_c \\
w_t^0 = \sum_{c=1}^{n^c} M^{CT}_{c,t} = k_t
\end{cases}
\tag{1}
$$

The algorithm is a “random walker” that incorporates information about company expertise and technology relevance at each step. The transition probabilities, $G_{c,t}$ and $G_{t,c}$, describe the extent to which the entities’ weights change over the iterations. If the relation between c and t increases (decreases) the value, the entity weight increases (decreases) in proportion with the transition probabilities. We define $G_{c,t}$ and $G_{t,c}$,

$$
\begin{cases}
G_{c,t}(\beta) = \dfrac{M^{CT}_{c,t}\, k_c^{-\beta}}{\sum_{c'=1}^{n^c} M^{CT}_{c',t}\, k_{c'}^{-\beta}} \\[2ex]
G_{t,c}(\alpha) = \dfrac{M^{CT}_{c,t}\, k_t^{-\alpha}}{\sum_{t'=1}^{n^t} M^{CT}_{c,t'}\, k_{t'}^{-\alpha}},
\end{cases}
\tag{2}
$$

where α and β inform how coordination generates value.
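As a worked illustration of Eqs. (1) and (2), consider a hypothetical network (not taken from the Crunchbase data) with two companies and two technologies, in which $c_1$ works on both technologies and $c_2$ only on $t_2$:

$$
M^{CT} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},
\qquad k_{c_1} = 2,\quad k_{c_2} = 1,\quad k_{t_1} = 1,\quad k_{t_2} = 2 .
$$

Technology $t_1$ has a single neighbour, so $G_{c_1,t_1}(\beta) = 1$ for any β. For $t_2$, Eq. (2) gives

$$
G_{c_1,t_2}(\beta) = \frac{2^{-\beta}}{2^{-\beta} + 1},
\qquad
G_{c_2,t_2}(\beta) = \frac{1}{2^{-\beta} + 1},
$$

that is, $(1/3,\, 2/3)$ for $\beta = 1$, $(1/2,\, 1/2)$ for $\beta = 0$, and $(2/3,\, 1/3)$ for $\beta = -1$. In this example, a positive β gives the larger share to the less diversified company $c_2$, and a negative β to $c_1$. By construction, the entries of $G_{c,t}$ sum to one over companies for each technology, and those of $G_{t,c}$ sum to one over technologies for each company.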
Next, we define the recursive step,

$$
\begin{cases}
w_c^{n+1} = \sum_{t=1}^{n^t} G_{c,t}(\beta)\, w_t^{n} \\
w_t^{n+1} = \sum_{c=1}^{n^c} G_{t,c}(\alpha)\, w_c^{n}
\end{cases}
\tag{3}
$$

As in PageRank, the recursion ends when the algorithm converges. Our algorithm accounts for the market complexity and feedback loops (investments’ impact on companies and on technologies). We discuss this feature and the optimization of α and β after the addition of exogenous factors.

3.2.2. Inclusion of exogenous factors

We include exogenous factors as ground truth in the parameters’ calibration step. This keeps the algorithm tractable, while letting it capture the technological structure. We use this ground truth to compute the Spearman correlation, $\rho_c$ for companies and $\rho_t$ for technologies. Because $\rho_c$ and $\rho_t$ depend on α and β [see Eq. (2)], we find the parameters that maximize these correlations,

$$
\begin{cases}
(\alpha^*, \beta^*) = \arg\max_{\alpha,\beta} \rho_c(\alpha, \beta) \\
(\alpha^*, \beta^*) = \arg\max_{\alpha,\beta} \rho_t(\alpha, \beta),
\end{cases}
\tag{4}
$$

and we solve this optimization problem with a grid search. Eq. (4) shows that the parameters depend on both companies and technologies. This dependence allows the calibration to capture the structure of the bi-partite graph. To obtain the correlation between the TechRank score, which assigns a weight $w_c$ ($w_t$) to each company (technology), and the ground truth evaluation, which assigns $\hat{w}_c$ ($\hat{w}_t$) to each company (technology), we normalize both the TechRank results and the exogenous measure in the same range [0, 1].

Investors use the entities’ features to select companies and the investment amount they want to allocate. We suppose that an investor has $n^{(C)}$ features to pick from, denoted as $f_1^{(C)}, \dots, f_{n^{(C)}}^{(C)}$, where C (T) represents the association with the companies (technologies). Each feature $f_i^{(C)}$ is associated with a percentage of interest $p_i^{(C)}$, and $\sum_{i=1}^{n^{(C)}} p_i^{(C)} = 1$. For instance, if a company’s features are the amount of previous investments and its geographical proximity to the investor, $n^{(C)} = 2$.
An investor may then decide to assign 80% of their interest to the first feature and 20% to the second, by selecting $p_1^{(C)} = 0.8$ and $p_2^{(C)} = 0.2$. Investors may also be put off by a feature, in which case we multiply it by -1. We define all notations in Table 2.

Variable             Domain                          Description
$n^{(C)}$            $\mathbb{N}$                    Number of external features available for companies
$n^c$                $\mathbb{N}$                    Number of companies
$n^{(T)}$            $\mathbb{N}$                    Number of external features available for technologies
$n^t$                $\mathbb{N}$                    Number of technologies
$p_i^{(C)}$          $[0, 1]$                        Percentage of interest in the company preference number i
$p_j^{(T)}$          $[0, 1]$                        Percentage of interest in the technology preference number j
$f_i^{(C)}$          $\mathbb{R}^{n^c}$              Vector of factors associated with the company preference number i
$f_j^{(T)}$          $\mathbb{R}^{n^t}$              Vector of factors associated with the technology preference number j
$n^i$                $\mathbb{N}$                    Number of investors
$M^{CT}$             $\mathbb{R}^{n^c \cdot n^t}$    Adjacency matrix of the C-T bipartite network
$M^{IC}$             $\mathbb{R}^{n^i \cdot n^c}$    Adjacency matrix of the I-C bipartite network
$\gamma_t^{i,c}$     $\mathbb{R}$                    Amount in funding round between c and i at time t
$e^{IC}$             $\mathbb{R}^{n^i \cdot n^c}$    Total amount of investment from each investor to each company
$e^{C}$              $\mathbb{R}^{n^c}$              Total amount of investments toward each company
$e^{T}$              $\mathbb{R}^{n^t}$              Total amount of investments toward each technology
$e^{C}_{\max}$       $\mathbb{R}$                    Maximum amount of total investments among all the companies
$e^{T}_{\max}$       $\mathbb{R}$                    Maximum amount of total investments among all the technologies
$f_c^{C}$            $[0, 1]$                        Factor related to previous investments into the company number c
$f_t^{T}$            $[0, 1]$                        Factor related to previous investments into the technology number t

Table 2: Variable definitions. This table presents the variable definitions used throughout the article.

We convert quantitative and qualitative properties into a number $f_i^{(C)} \in [0, 1]$. Once we have created all the factors $f^{(C)} = (f_1^{(C)}, \dots, f_{n^{(C)}}^{(C)})$, the exogenous evaluation $\hat{w}_c$ is given by,

$$
\hat{w}_c = \sum_{i=1}^{n^{(C)}} p_i^{(C)} f_i^{(C)} = p^{(C)} \cdot f^{(C)}.
\tag{5}
$$

Considering that $\sum_{i=1}^{n^{(C)}} p_i^{(C)} = 1$ and that $f_i^{(C)} \in [0, 1]$ for each i, we have $\hat{w}_c \in [0, 1]$. The same holds for $\hat{w}_t$.
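The pieces introduced so far, namely the degree initialisation of Eq. (1), the transition probabilities of Eq. (2), the recursion of Eq. (3), the exogenous score of Eq. (5), and the grid search of Eq. (4), can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`techrank`, `spearman`, `calibrate`), the toy matrix `M`, the feature values in `F`, the stopping tolerance, and the coarse nine-point grid are all hypothetical, and the sketch assumes that every company and technology has at least one link.

```python
import numpy as np


def techrank(M, alpha, beta, tol=1e-10, max_iter=10_000):
    """Iterate Eq. (3) from the degree initialisation of Eq. (1).

    M is the 0/1 company-technology matrix, with companies in rows.
    """
    M = np.asarray(M, dtype=float)
    k_c, k_t = M.sum(axis=1), M.sum(axis=0)              # degrees, Eq. (1)
    num_ct = M * (k_c ** -beta)[:, None]                  # numerators of Eq. (2)
    num_tc = M * (k_t ** -alpha)[None, :]
    G_ct = num_ct / num_ct.sum(axis=0, keepdims=True)     # G_{c,t}(beta): sum over c'
    G_tc = num_tc / num_tc.sum(axis=1, keepdims=True)     # G_{t,c}(alpha): sum over t'
    w_c, w_t = k_c, k_t
    for _ in range(max_iter):
        new_c, new_t = G_ct @ w_t, G_tc.T @ w_c           # Eq. (3)
        step = max(np.abs(new_c - w_c).max(), np.abs(new_t - w_t).max())
        w_c, w_t = new_c, new_t
        if step < tol:                                    # assumed stopping rule
            break
    return w_c, w_t


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    def ranks(v):
        v = np.round(np.asarray(v, dtype=float), 9)       # near-equal scores count as ties
        _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
        last = np.cumsum(counts)                          # last position of each tie group
        return (last - (counts - 1) / 2.0)[inverse]       # mean position of the group
    rx, ry = ranks(x), ranks(y)
    if rx.std() == 0 or ry.std() == 0:
        return 0.0                                        # correlation undefined: uninformative
    return float(np.corrcoef(rx, ry)[0, 1])


def calibrate(M, w_hat_c, grid):
    """Grid search of Eq. (4) for companies: maximise rho_c(alpha, beta)."""
    best = (-np.inf, None, None)
    for alpha in grid:
        for beta in grid:
            w_c, _ = techrank(M, alpha, beta)
            rho = spearman(w_c, w_hat_c)
            if rho > best[0]:
                best = (rho, alpha, beta)
    return best


# Hypothetical inputs: 4 companies x 3 technologies, two company features in [0, 1].
M = np.array([[1, 1, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])
F = np.array([[1.0, 0.6, 0.2, 0.1],    # f_1: previous-investment factor
              [0.3, 0.9, 0.5, 0.7]])   # f_2: proximity factor
p = np.array([0.8, 0.2])               # 80% / 20% interest, as in the text
w_hat_c = p @ F                        # exogenous score, Eq. (5)

rho, alpha, beta = calibrate(M, w_hat_c, grid=np.linspace(-2.0, 2.0, 9))
```

Because the Spearman correlation only depends on ranks, rescaling the scores to [0, 1] would not change `rho`, so the sketch skips that normalisation. The same `calibrate` logic applies to technologies by comparing the second output of `techrank` with $\hat{w}_t$.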
Finally, we have

$$
\begin{cases}
\hat{w}_c = p^{(C)} \cdot f^{(C)} \\
\hat{w}_t = p^{(T)} \cdot f^{(T)} \\
\sum_{i=1}^{n^{(C)}} p_i^{(C)} = 1 \\
\sum_{i=1}^{n^{(T)}} p_i^{(T)} = 1,
\end{cases}
\tag{6}
$$

where $n^{(C)}$ ($n^{(T)}$) is the number of company- (technology-)related features and $f^{(C)} = (f_1^{(C)}, \dots, f_{n^{(C)}}^{(C)})$. To select the features, we use Crunchbase data about companies and investors (see Table 1).

3.3. Previous investments

We assume that previous investments are an essential factor to evaluate companies. Investors may be willing to invest in companies which have already received capital or look for higher returns, targeting newer firms. To compute this factor, we use the Crunchbase field “funding rounds”, which reports the amount of all funding rounds from an investor i to a company c. We capture this structure with another bi-partite network that describes the links among investors (I) and companies (C). In this case, we weight the edges by the sum of all previous investments from investor i to company c, until the current period (T), and compute the adjacency matrix $M^{IC}$. We define the amount of a single investment from i to c at time t by $\gamma_t^{i,c}$. The weight of the edge i-c is given by $e_{i,c} = \sum_{t=0}^{T} \gamma_t^{i,c}$ (see Table 2). We then sum the contributions of all investors and divide by the maximum total investment to find the attribute $f_c^{C} \in [0, 1]$ for a company c.^8

Fig. 4: Investors-companies bi-partite network. This figure depicts a stylized bi-partite network between investors and companies.

Figure 4 depicts the investment structure as an example. We consider two investors $i_1$ and $i_2$ and three companies $c_1$, $c_2$, and $c_3$. We compute the maximum $e_{\max}$ as $\max\{e_{11} + e_{21}, e_{22}, e_{23}\}$ and the features related to the investments for each company as,

$$
\begin{cases}
f_1^{C} = \dfrac{e_{11} + e_{21}}{e_{\max}} \\[1.5ex]
f_2^{C} = \dfrac{e_{22}}{e_{\max}} \\[1.5ex]
f_3^{C} = \dfrac{e_{23}}{e_{\max}},
\end{cases}
\tag{7}
$$

where $n^i$ ($n^c$) is the total number of investors (companies).
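The stylized example of Fig. 4 can be reproduced numerically. In the sketch below, the investment amounts are hypothetical and only serve to show how the company factors of Eq. (7) follow from column sums and a division by the maximum.

```python
import numpy as np

# Hypothetical cumulated investments e[i, c] (say, in USD millions) for the
# network of Fig. 4: rows are investors i1, i2; columns are companies c1, c2, c3.
# A zero means that the investor never funded the company.
e = np.array([[2.0, 0.0, 0.0],     # i1 -> c1
              [1.0, 4.0, 0.5]])    # i2 -> c1, c2, c3

e_c = e.sum(axis=0)        # total received per company: e11 + e21, e22, e23
e_max = e_c.max()          # largest total among the companies
f_c = e_c / e_max          # Eq. (7): one factor in [0, 1] per company
```

With these amounts, the company that received the most funding in total gets a factor of exactly one, and every other company is expressed as a fraction of it.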
Generalizing, we get,

$$
\begin{cases}
e^{IC}_{i,c} = \sum_{t=0}^{T} \gamma_t^{i,c} & \forall i, c \\
e^{C}_{c} = \sum_{i=1}^{n^i} e^{IC}_{i,c}\, M^{IC}_{i,c} & \forall c \\
e_{\max} = \max_{c}\, e^{C}_{c} \\
f_c^{(C)} = e^{C}_{c} / e_{\max},
\end{cases}
\tag{8}
$$

for each $c \in \{1, \dots, n^c\}$. We present the corresponding algorithm in Appendix 1. With Eq. (8), for each company we have a factor between 0 and 1 that summarizes the amount of previous investments.

^8 Note that here, $f_c^{C}$ represents the factor related to a company.

Fig. 5: Tripartite structure of investors, companies, and technologies. The left-hand side figure depicts a typical tri-partite structure. The right-hand side provides an illustration of this structure with investments (layer 1), companies (layer 2), and technologies (layer 3).

We link the two bi-partite structures investors-companies and companies-technologies to obtain the I-C-T tri-partite structure depicted in Figure 5. This structure allows us to assign some features to technologies from companies (direct link) or investors (indirect link). Thus, we can find the amount of previous investments in a technology through companies’ funding rounds. The previous investments’ factor for technologies is given by,

$$
\begin{cases}
e^{IC}_{i,c} = \sum_{t=0}^{T} \gamma_t^{i,c} & \forall i, c \\
e^{C}_{c} = \sum_{i=1}^{n^i} e^{IC}_{i,c} & \forall c \\
e^{T}_{t} = \sum_{c=1}^{n^c} e^{C}_{c}\, M^{CT}_{c,t} \\
e_{\max} = \max_{t}\, e^{T}_{t} \\
f_t^{(T)} = e^{T}_{t} / e_{\max}.
\end{cases}
\tag{9}
$$

We provide the algorithm of this methodology in Appendix 2.

3.4. Location

The second feature we consider is the distance between investors’ and companies’ locations. We retrieve the addresses of firms and investors from Crunchbase (c address) and map them to geographic coordinates. We compute the Haversine approximation to measure the distance. We detail the Haversine approximation in Appendix 5.3. Investors may prefer short-distance investments or places with high potential. If they face some investment restrictions, we filter the companies based on these criteria before applying the algorithm. Otherwise, we add a distance factor to the algorithm.
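For reference, the standard Haversine formula for the great-circle distance between two points given by latitude and longitude can be sketched as below. The function name `haversine_km`, the mean Earth radius of 6,371 km, and the approximate city coordinates are assumptions made for illustration; the article's own derivation is in its appendix.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical investor in Zurich and company in Geneva (approximate coordinates).
distance = haversine_km(47.3769, 8.5417, 46.2044, 6.1432)
```

The formula treats the Earth as a sphere, which is why the text refers to it as an approximation.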
We use the Haversine distance h to obtain a factor $f^{(C)}_{c} \in [0, 1]$ for each company. We consider the distance $h_{i,c}$ between the company c and the investor i. We assume that the factor measures proximity, so that $f^{(C)}_{c}$ tends to one as the distance decreases,

$$
f^{(C)}_{c} \to 1 \quad \text{when} \quad h_{i,c} \to 0. \tag{10}
$$

To compute $f^{(C)}_{c}$, we first find $h_{i,c}$ for each company and identify the maximum distance $h_{\max}$ among all companies. We normalize by the maximum to obtain a distance that lies in the [0, 1] range, with $f^{(C)}_{c} = 1 - h_{i,c}/h_{\max}$, so that a distance of zero corresponds to a value of $f^{(C)}_{c} = 1$. We report the algorithm in the Appendix (Algorithm 3). We implement the algorithm and run the experiments using Python and the NumPy, Pandas, NetworkX, Matplotlib, and Seaborn libraries.

4. Results

4.1. Cybersecurity field

We select all the companies whose description contains at least two cybersecurity-related terms and obtain 2,429 companies and 477 technologies.9 Figure 6 displays the structure of the bi-partite network between technologies and companies.

9 The word list is in the Appendix 5.3.

Fig. 6: Bi-partite network of cybersecurity companies. This figure describes the bi-partite network of a subset of cybersecurity companies (red nodes) and the technologies they are involved in (blue nodes). The node size represents the number of neighbours.

We assume that investors are only interested in previous investments, both for technologies and companies. We examine how the parameters' calibration step changes when we change the investors' preferences using a smaller sample of companies. Figure 7 shows the optimization in which the correlations ρ_c and ρ_t change according to α and β. In Table 3, we identify the optimal α* and β* to be 0.04 and -1.88 for companies and 0.48 and -2.00 for technologies, respectively. Next, we plug these values into the recursive algorithm.

Companies                      Technologies
Number    α*       β*          Number    α*       β*
10        -0.36    1.92        26        -2.00    0.00
100       -0.04    0.92        134       0.52     -1.04
499       -0.08    0.88        306       0.68     -1.36
997       -0.12    0.80        371       -2.00    0.00
1,494     -0.12    0.80        416       0.92     -0.12
1,990     -0.04    0.92        449       0.56     -2.00
2,429     0.04     -1.88       477       0.48     -2.00

Table 3: Optimal parameters in cybersecurity. This table reports the optimal parameters α and β for companies and technologies in cybersecurity, depending on the number of companies and linked technologies considered as input.

We illustrate the evolution of the TechRank random walker in Figure 8. While the entities' positions change significantly over the first steps, they gradually stabilize.
Fig. 7: Grid search of parameters α and β (panels: correlation for companies, correlation for technologies). This figure displays the results of the grid search for parameters α and β for 2,429 companies and 477 technologies in cybersecurity when investors' preferences are fully set in previous investments.

With the 2,429 companies and 477 technologies, the algorithm requires 723 (1,120) iterations for companies (technologies) to converge. Entities starting with a high score (the initialisation is the degree of the node) do not significantly change rank and remain among the best ones. Thus, the algorithm assigns good scores to entities with many neighbours. In contrast, entities starting with a low degree may significantly change their score, especially in the case of technologies. TechRank not only recognizes the importance of the most established entities but also helps identify emerging technologies. Figure 11 shows the top-ranked entities in cybersecurity. We check how TechRank performs when we change the number of companies and technologies. We fix the number of companies n_c, which yields a resulting number of technologies n_t. For instance, in the cybersecurity field, by selecting 10 companies randomly, we get 26 technologies.
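The subsampling and degree initialisation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the dictionary representation of the company-technology links, and the fixed random seed are all assumptions made for the example.

```python
import random


def sample_subnetwork(company_techs, n_companies, seed=0):
    """Draw n_companies at random and keep only the technologies linked to them.

    company_techs is a hypothetical dict: company name -> iterable of technologies.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(company_techs), n_companies)
    sub = {c: set(company_techs[c]) for c in chosen}
    techs = set().union(*sub.values())  # n_t follows from the chosen companies
    return sub, techs


def initial_scores(sub):
    """Degree initialisation: every node starts with its number of neighbours."""
    company_score = {c: len(ts) for c, ts in sub.items()}
    tech_score = {}
    for ts in sub.values():
        for t in ts:
            tech_score[t] = tech_score.get(t, 0) + 1
    return company_score, tech_score
```

Fixing n_c and reading off the size of the returned technology set mirrors the way the number of technologies is a consequence of the company sample rather than a free parameter.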
Considering that there are 2,429 cybersecurity-related companies on Crunchbase, we study the runtime by running the algorithm for 10, 100, 499, 997, 1,494, 1,990, and 2,429 companies and 26, 134, 306, 372, 431, 456, and 477 technologies, respectively. Figure 9 displays the results of TechRank applied to a subset of 10 cybersecurity companies. We note that the position of "AppOmni" does not change over the iterations, while two of its technologies, "Software as a Service" (SaaS) and "cloud management", increase their scores. In Figure 10, we display this restricted network of 10 companies, which shows that SaaS and cloud management do not have other links. Hence, the strength of this company depends on its ability to combine important technologies (software, cyber security, and cloud security) with more exotic fields. Similarly, "Integrity Marketing Group" is the only company involved in some fields (marketing, digital marketing, and advertising). This company does not use more established technologies and thus does not improve its score. Again, in Figure 10 we observe that these technologies lie outside the main network. Conversely, "Lacework" and "Acronis" follow an opposite trend. Lacework (Acronis) significantly increases (decreases) its score. One explanation for this behaviour is that Acronis is involved in many technologies, most of which are not explored by other companies. On the other hand, Lacework relies on recognized technologies (security, cloud security, and software). Interestingly, the compliance technology benefits from its connections, increasing its rank by three positions.

Fig. 8: TechRank scores evolution in cybersecurity. This figure displays the TechRank scores evolution over the iterations for 2,429 companies and 477 technologies in cybersecurity.
Fig. 9: TechRank scores evolution of 10 companies in cybersecurity. This figure displays the TechRank scores evolution over the iterations on a subset of 10 companies and 26 technologies in cybersecurity.

Fig. 10: Circular network representation of 10 companies in cybersecurity. This figure displays a circular network representation of a subset of 10 companies and 26 technologies in cybersecurity.

Table 4 reports the number of algorithm iterations before reaching convergence. The number of iterations needed appears to be independent of the number of entities. Technologies need more iterations than companies, which we explain by the fact that there are many more companies than technologies. Since each company has at least one edge, the technology nodes have a higher degree than the companies on average. Thus, we expect the structure and the dynamics related to technologies to be more complex. The algorithm complexity depends not only on the number of entities, but also on the network structure.

Fig. 11: TechRank top five scores in cybersecurity (panels: TechRank best companies in cybersecurity, TechRank best technologies in cybersecurity; x-axis: TechRank). The left (right) panel displays the top five companies (technologies) according to the TechRank score when run on a subset of 10 companies in cybersecurity.

Companies                  Technologies
Number    Iterations C     Number    Iterations T
10        32               26        18
100       100              134       155
499       134              306       2,469
997       196              371       194
1,494     180              416       871
1,990     240              449       5,000
2,429     723              477       1,120

Table 4: TechRank convergence. This table reports the number of TechRank iterations before convergence for companies and technologies in cybersecurity.

4.2. Investment strategy

We investigate how investors can select the strategy that reflects their preferences. If investors prefer to focus on technologies, they should choose companies working on the best technologies, as selected by the highest TechRank scores. This decision involves many criteria, such as the number of technologies they want to invest in, the capital allocation for each company, and the diversification. We sketch the procedure to solve this decision process quantitatively in the Appendix (Fig. A1).

4.3. Comparison with the Crunchbase rank

Crunchbase assigns a rank to the top companies of each industry, which takes into account the entity's strength of relationships, funding events, news articles, and acquisitions.10 We compare our results in the cybersecurity sector with the Crunchbase rank and investigate the strength of the association between the two scores using Spearman's correlation. To make the ranks comparable, we convert our algorithm's output into a ranking. The resulting Spearman's correlation of 1.4% indicates that the two ranks are uncorrelated.
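The comparison above amounts to converting the TechRank scores into ranks and correlating them with the Crunchbase ranks. The following is a minimal, dependency-free sketch of that computation; the function names and the toy data are illustrative assumptions, and the authors' actual implementation is not shown in the paper. It assumes neither input is constant, since the correlation is undefined in that case.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: a higher TechRank score is better, a lower Crunchbase rank is better.
techrank_scores = [7.1, 3.2, 5.0, 0.4]
crunchbase_rank = [1, 3, 2, 4]
rho = spearman([-s for s in techrank_scores], crunchbase_rank)
```

Negating the scores before ranking makes rank 1 correspond to the highest TechRank score, so that both inputs are oriented the same way.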
We explain these differences by the fact that the Crunchbase rank is fixed, while TechRank is customizable according to investors' preferences. Moreover, the Crunchbase rank focuses on the company's level of activity and not on its market influence. Furthermore, the Crunchbase rank results from an algorithm that involves all the companies, while we focus only on a subset. We attempt to change the investors' preferences and never obtain correlation coefficients above 2%. Other explanations we propose for this divergence include the fact that we assign a weight which identifies the distance between entities in the ranking. Along the same lines, TechRank allows decision-makers to set a threshold as a starting parameter before running the algorithm. Finally, the Crunchbase algorithm is not open source and we do not know its mechanism, which makes it difficult to identify the source of the divergence.

4.4. Runtime

We detail the code related to the TechRank algorithm in the Appendix 1. We run it on a machine with a 16-core Intel Xeon CPU E5-2620 v4 @ 2.10GHz and 126GB of memory. We investigate the variations in runtime given changes in the number of companies and technologies. The runtime increases with the number of entities. For technologies, the curve is much steeper than that for companies. However, considering that the number of technologies is directly linked to the number of companies, we repeat the experiment treating companies and technologies together. The random walk phase lines represent the runtime to convergence. There is a strong similarity between the runtime for companies and technologies, which is surprising given their different numbers. This also shows how strongly they are correlated and supports the capability of TechRank to capture the complexity of the cybersecurity technological landscape.
Table 5 reports all the runtimes, and we report the corresponding runtime comparisons for technologies and companies in both the cybersecurity and medical fields in the Appendix (Fig. A2).

10 https://about.crunchbase.com/blog/influential-companies/

Companies                                        Technologies
Number    Parameters' calibration  Convergence   Number    Parameters' calibration  Convergence
10        10.21                    0.56          26        11.75                    0.57
100       28.69                    13.24         134       35.37                    13.72
499       189.03                   470.10        306       154.79                   483.25
997       730.43                   2,023.18      371       312.65                   2,392.46
1,494     1,372.17                 4,514.11      416       482.18                   4,404.48
1,990     2,057.42                 8,396.26      449       656.95                   8,096.69
2,429     3,230.99                 16,890.26     477       1,071.84                 12,779.62

Table 5: TechRank runtime. This table reports the TechRank runtime for companies and technologies in cybersecurity (the magnitudes are consistent with the seconds reported in Fig. A2).

4.5. Exogenous factors

We conduct a sensitivity analysis based on investors' preferences. We restrict the analysis to 1,000 companies only, given the long runtime required. We assume investors to be interested in firm location only and consider the case of an investor based in New York City and in San Francisco, in turn. In Table 6, we report the outcome in terms of location for the five top-ranked companies in both cases. We uncover a location change in the company ranking, with the first one being in the state of New York (investor based in New York City) and in the state of California (investor based in San Francisco), respectively. The companies with lower rank also reflect these geographical preferences, albeit with some exceptions (Singapore and Beijing). This implies that, even if remote companies are disadvantaged, their other attributes can overcome this flaw.
Investor location    Rank 1                 Rank 2                 Rank 3              Rank 4                   Rank 5
New York City        New York City (USA)    Massachusetts (USA)    Quebec (Canada)     California (USA)         Beijing (China)
San Francisco        California (USA)       Illinois (USA)         California (USA)    Singapore (Singapore)    Arizona (USA)

Table 6: Companies TechRank scores with location. This table reports the location of the top five TechRank companies' scores when the geographical preference is fully set in the location of the investors (New York City and San Francisco).

4.6. Robustness tests

To test the robustness of our algorithm and benchmark the cybersecurity sector, we apply TechRank to companies in the medical sector. We choose this sector given its large number of companies (twice the number of companies working in cybersecurity). We select the companies with the same methodology, which yields a total of 4,996 companies and 437 technologies. Figure 12 shows the results of TechRank in the medical sector. The runtime for these companies, reported in the Appendix (Fig. A2), is on par with that of the cybersecurity sector. To make the two fields comparable, we set the number of entities as the x-axis label, for both companies and technologies. The results reveal that the runtime of the two fields, for both the parameter calibration and the random walker steps, follows the same behaviour, for both companies and technologies. Increasing the number of entities does not yield any significant change in terms of either convergence or runtime behaviour. Finally, unlike Klein et al. (2015), for which α remains constant and β changes significantly, we observe that all of our parameters change significantly across sectors [21].

Fig. 12: TechRank scores evolution in the medical field. This figure displays the TechRank scores evolution over the iterations for 2,429 companies and 477 technologies in the medical field.

5. Conclusion

5.1. Limitations

We choose technologies related to cybersecurity according to terms in the Crunchbase description, e.g., security, privacy, or confidentiality.
Since these words may overlap with other fields, we require the description to contain at least two of them to classify a company as cybersecurity. This naive strategy could be improved with more sophisticated techniques, such as natural language processing (NLP). We also face a limitation given the lack of information about companies' resource allocation towards technologies. We only have a list of technologies for each company without more information. Instead, it would be helpful to observe the amount of expenditure towards each technology. Our algorithm is static, as we do not have access to time series, and it would be interesting to study how the bi-partite network changes over time. Finally, we are well aware that introducing exogenous variables may induce a bias given potential outliers. Our normalization procedure divides the factors by their maximum, which may lead to disproportionate results if the maximum is an outlier. However, we do not believe that removing outliers is a viable solution, since this would lead us to overlook potentially profitable opportunities.

5.2. Further research

This research can be expanded with time series to investigate, for instance, the outcome of a company divesting from a technology or investing in a new one. This would also enable us to assess which new technologies are successful and would give more insights about investment choices in very recent ideas. A focus on percolation theory could help assess the effects of a node's disappearance from the network [27]. For this purpose, machine learning methods could also be employed. Further research should be devoted to investigating additional exogenous factors in TechRank, giving investors the widest possible range of features. Alternative exogenous factors include the inception date of the company, the number of employees, social network activity, and even the Crunchbase rank, which is itself based upon the entity's strength of relationships, funding events, news articles, or acquisitions.
By the same token, further investigation should confirm the pertinence of the TechRank algorithm in fields that include more entities, as an increase in nodes could lead to coordination problems. Finally, it would be crucial to assess the long-term effects of the TechRank algorithm on investment returns and technologies' development through back-testing, which, once again, requires time series.

5.3. Conclusion

We introduce TechRank, an algorithm that assigns a score to companies and technologies in complex systems. This methodology constitutes the first step towards a new data-driven investment strategy, which enables investors to follow their preferences while benefiting from a quantitative approach. We include investors' preferences based upon a case-by-case study. Our algorithm's convergence depends on the number of entities and the complexity of the relationships within the bi-partite network. Using a restricted number of companies in cybersecurity, we analyze the TechRank scores and we explain the score variations of entities over iterations. We also explore how results change depending on the company's location. Finally, we conduct robustness tests in the medical field, for which our results are qualitatively similar. We believe that our approach brings value by helping investors form their decisions. Moreover, our algorithm's flexibility allows the inclusion of exogenous factors and preferences, which is impossible in alternative existing company ranks, such as that of Crunchbase. Given our algorithm's performance for cybersecurity, a highly complex market, as a case study, we believe that our algorithm would perform well in all markets. TechRank is a complementary, if not alternative, way to look at portfolio choices.

References

[1] Battiston, F., Nicosia, V., Latora, V., 2014. Structural measures for multiplex networks. Physical Review E 89, 1–14.
[2] Bavelas, A., 1948. A mathematical model for group structures. Applied Anthropology 7 (3), 16–30.
[3] Benzi, M., Estrada, E., Klymko, C., 2013. Ranking hubs and authorities using matrix functions. Linear Algebra and its Applications 438, 2447–2474.
[4] Besten den, M. L., 2021. Crunchbase research: Monitoring entrepreneurship research in the age of big data. Available at http://dx.doi.org/10.2139/ssrn.3724395
[5] Bonacich, P., 1972. Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology 2, 113–120.
[6] Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E., Havlin, S., 2010. Catastrophic cascade of failures in interdependent networks. Nature 464, 1025–1028.
[7] Canito, J., Ramos, P., Moro, S., Rita, P., 2018. Unfolding the relations between companies and technologies under the Big Data umbrella. Computers in Industry 99, 1–8.
[8] Christensen, C. M., 1997. The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail. Harvard Business School Press, Boston, MA.
[9] Cochrane, J. H., 2005. The risk and return of venture capital. Journal of Financial Economics 75, 3–52.
[10] Dalle, J.-M., Besten den, M. L., Menon, C., 2017. Using Crunchbase for economic and managerial research. Available at https://doi.org/10.1787/18151965
[11] Dalle, J.-M., Besten den, M. L., Menon, C., 2017. Using Crunchbase for economic and managerial research. Available at https://doi.org/10.1787/18151965
[12] Donato, D., Laura, L., Leonardi, S., Millozzi, S., 2004. Large scale properties of the Webgraph. European Physical Journal B 38, 239–243.
[13] Ewens, M., 2009. A new model of venture capital risk and return. Available at http://dx.doi.org/10.2139/ssrn.1356322
[14] Freeman, L. C., 1978. Centrality in social networks conceptual clarification. Social Networks 1, 215–239.
[15] Gold, A. H., Malhotra, A., Segars, A. H., 2001. Knowledge management: An organizational capabilities perspective. Journal of Management Information Systems 18, 185–214.
[16] Gordon, L. A., Loeb, M. P., Lucyshyn, W., Zhou, L., 2018.
Empirical evidence on the determinants of cybersecurity investments in private sector firms. Journal of Information Security 9, 133–153.
[17] Gornall, W., Strebulaev, I. A., 2020. Squaring venture capital valuations with reality. Journal of Financial Economics 135, 120–143.
[18] Hidalgo, C. A., Hausmann, R., Dasgupta, P. S., 2009. The building blocks of economic complexity. Proceedings of the National Academy of Sciences 26, 10570–10575.
[19] Ingole, P. V., Nichat, M. K., 2013. Landmark based shortest path detection by using dijkestra algorithm and haversine formula. International Journal of Engineering Research and Applications (IJERA) 3, 162–165.
[20] Katz, L., 1953. A new status index derived from sociometric analysis. Psychometrika 18, 39–43.
[21] Klein, M., Maillart, T., Chuang, J., 2015. The virtuous circle of Wikipedia: Recursive measures of collaboration structures. In: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 1106–1115.
[22] Korteweg, A., Nagel, S., 2016. Risk-adjusting the returns to venture capital. Journal of Finance 71, 1437–1470.
[23] Kurant, M., Thiran, P., 2006. Layered complex networks. Physical Review Letters 96 (13), 1–4.
[24] Moskowitz, T. J., Vissing-Jørgensen, A., 2002. The returns to entrepreneurial investment: A private equity premium puzzle? American Economic Review 92, 745–778.
[25] Page, L., Brin, S., Motwani, R., Winograd, T., 1999. The PageRank citation ranking: Bringing order to the web. Available at http://ilpubs.stanford.edu:8090/422/
[26] Peng, L., 2001. Building a venture capital index. Available at http://dx.doi.org/10.2139/ssrn.281804
[27] Piraveenan, M., Prokopenko, M., Hossain, L., 2013. Percolation centrality: Quantifying graph-theoretic impact of nodes during percolation in networks. PLOS ONE 8 (1), 1–14.
[28] Saxena, A., Iyengar, S., 2020. Centrality measures in complex networks: A survey.
Available at https://doi.org/10.48550/arXiv.2011.07190
[29] Saxena, A., Iyengar, S. R. S., 2017. Global rank estimation. Available at https://arxiv.org/abs/1710.11341
[30] Schwartz, E. S., Moon, M., 2000. Rational pricing of internet companies. Financial Analysts Journal 56 (3), 62–75.
[31] Tu, X., Jiang, G.-P., Song, Y., Zhang, X., 2018. Novel multiplex PageRank in multilayer networks. IEEE Access 6, 12530–12538.
[32] Xing, W., Ghorbani, A., 2004. Weighted PageRank algorithm. In: Second Annual Conference on Communication Networks and Services Research, 305–314.
[33] Zhong, H., Chuanren, L., Zhong, J., Xiong, H., 2018. Which startup to invest in: A personalized portfolio strategy. Annals of Operations Research 263, 339–360.

Appendix

Fig. A1: Flowchart of the investment process. This flowchart sketches a potential investment process that uses TechRank and investment preferences (exogenous factors and investment styles) before reaching an optimal investment and portfolio choice.
Flowchart nodes, in extraction order (n: number of companies): Start; The investor has a capital K to invest; Any strong preference? (yes/no); Filter entities; Set preferences and weights; Run the TechRank algorithm and get the entities' scores; Focus on companies (C) or technologies (T)?; One T or more Ts?; Select the Cs working on that T; One C or more Cs?; One C for each T?; Invest the whole capital K in C; Select the best C for each T according to TechRank; Split capital K equally?; Select how many Cs for each T; Invest K/n in each company; Define how much to invest in each C; Same number of Cs for each T?; Decide how to define the number of Cs for each T according to the Ts' scores; End.

List of words used to identify companies' sectors

List of words related to cybersecurity: cybersecurity, confidentiality, integrity, availability, secure, security, safe, reliability, dependability, confidential, confidentiality, integrity, availability, defence, defensive, privacy.
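The selection rule that uses this list (a description must contain at least two cybersecurity-related terms) can be sketched as below. This is an illustrative assumption about the matching details, which the paper does not specify: the sketch counts distinct terms, matches whole words only, and ignores case. The names `CYBER_TERMS`, `count_terms`, and `is_cybersecurity` are hypothetical.

```python
import re

# Distinct terms from the cybersecurity word list above.
CYBER_TERMS = {
    "cybersecurity", "confidentiality", "integrity", "availability",
    "secure", "security", "safe", "reliability", "dependability",
    "confidential", "defence", "defensive", "privacy",
}


def count_terms(description, terms=CYBER_TERMS):
    """Count how many distinct list terms appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return len(words & terms)


def is_cybersecurity(description, min_terms=2):
    """Classify a company as cybersecurity if enough distinct terms match."""
    return count_terms(description) >= min_terms
```

Counting distinct terms rather than raw occurrences is one possible reading of the rule; a description repeating "security" several times would not qualify under this sketch.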
List of words related to the medical field: cure, medicine, surgery, doctors, nurses, hospital, medication, prescription, pill, health, cancer, antibiotic, HIV, cancers, disease, resonance, rays, CAT, blood, blood transfusion, accident, injuries, emergency, poison, transplant, biotechnology, health care, healthcare, health-tech, genetics, DNA, RNA, lab, heart, lung, lungs, kidneys, brain, gynaecologist, cholesterol, diabetes, stroke, infections, infection, ECG, sonogram.

Fig. A2: Runtime comparisons (panels: runtime evolution, runtime parameters calibration; x-axis: number of entities; y-axis: seconds; series: C medicine, T medicine, C cybersecurity, T cybersecurity). The left panel displays the grid search runtime for cybersecurity and medical fields. The right panel displays the parameters' calibration runtime for the cybersecurity and medical fields. The ordinate axis uses a logarithmic scale.
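The previous-investments factors of Eqs. (8) and (9), which Algorithms 1 and 2 give as pseudocode, can be illustrated with a short runnable sketch. It is not the authors' implementation: the function name, the record format `(investor, company, amount)`, and the dictionary of company-technology links are assumptions made for the example. Summing over funding records plays the role of the sums over t and i.

```python
from collections import defaultdict


def investment_factors(investments, company_techs):
    """Return (f_C, f_T), the normalised previous-investment factors.

    investments: iterable of hypothetical (investor, company, amount) records.
    company_techs: dict mapping each company to the technologies it works on,
        i.e. the non-zero entries of the company-technology matrix.
    """
    # e^C_c: total amount received by each company.
    e_c = defaultdict(float)
    for _investor, company, amount in investments:
        e_c[company] += amount

    # e^T_t: each technology inherits the totals of the companies linked to it.
    e_t = defaultdict(float)
    for company, techs in company_techs.items():
        for tech in techs:
            e_t[tech] += e_c.get(company, 0.0)

    def normalise(totals):
        top = max(totals.values(), default=0.0)
        if top == 0:
            return {k: 0.0 for k in totals}
        return {k: v / top for k, v in totals.items()}

    f_c = normalise({c: e_c.get(c, 0.0) for c in company_techs})
    f_t = normalise(e_t)
    return f_c, f_t
```

Dividing by the maximum keeps every factor in [0, 1], at the cost of the sensitivity to outliers discussed in the limitations.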
Algorithm 1 Previous investments factor for companies
  e^C ← [0] · len(c_names)
  for c ∈ range(c_names) do
    for i ∈ range(i_names) do
      e^IC_{i,c} ← Σ_{t=0}^{T} γ^t_{i,c}        ⊲ γ^t_{i,c} is the amount of the investment from i to c at time t
      e^C[c] ← e^C[c] + e^IC_{i,c}
    end for
  end for
  e_max ← max(e^C)
  f^C ← e^C / e_max        ⊲ f^C: list of previous investments factors for each company
  return f^C

Algorithm 2 Previous investments factor for technologies
  e^C ← [0] · len(c_names)
  for c ∈ range(c_names) do
    for i ∈ range(i_names) do
      e^IC_{i,c} ← Σ_{t=0}^{T} γ^t_{i,c}        ⊲ γ^t_{i,c} is the amount of the investment from i to c at time t
      e^C[c] ← e^C[c] + e^IC_{i,c}
    end for
  end for
  e^T ← e^C · M^CT        ⊲ Matrix multiplication
  e_max ← max(e^T)
  f^T ← e^T / e_max        ⊲ f^T: list of previous investments factors for each technology
  return f^T

Algorithm 3 Geographic coordinates factor
  h_dict ← {}
  for c_name, c_address ∈ c_locations do
    lat ← c_address.latitude
    lon ← c_address.longitude
    h ← haver_dist(lat, lon, lat_inv, lon_inv)        ⊲ haver_dist is a function we have created
    h_dict[c_name] ← h
  end for
  h_max ← max(h_dict)
  for c_name, h ∈ h_dict do
    h_dict[c_name] ← 1 − h/h_max
  end for
  return h_dict

Distance computation

We obtain the distance between two points on Earth with the Haversine approximation (hav(θ)), using the latitude and longitude of the locations [19]. Let (λ_1, φ_1) and (λ_2, φ_2) be the longitude and latitude, in radians, of two points on a sphere and θ the central angle given by the spherical law of cosines. The Haversine distance writes,

$$
h = \operatorname{hav}(\theta) = \operatorname{hav}(\varphi_2 - \varphi_1) + \cos\varphi_1 \cos\varphi_2 \operatorname{hav}(\lambda_2 - \lambda_1), \tag{11}
$$

where hav(x) = sin²(x/2). On a sphere of radius r, the corresponding great-circle distance is 2r arcsin(√hav(θ)).
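As a minimal sketch of Eq. (11) and of the proximity factor of Algorithm 3, the code below converts hav(θ) into a great-circle distance and then normalises by the maximum distance. The name `haver_dist` follows Algorithm 3, but this body is an illustration rather than the authors' function; the Earth radius constant, the degree-valued inputs, and the `proximity_factors` helper are assumptions of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haver_dist(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # hav(theta) as in Eq. (11), with hav(x) = sin^2(x / 2).
    hav = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, hav)))


def proximity_factors(c_locations, lat_inv, lon_inv):
    """Map each company to 1 - h / h_max, so the closest company scores highest.

    c_locations: hypothetical dict of company name -> (latitude, longitude).
    """
    h = {
        name: haver_dist(lat, lon, lat_inv, lon_inv)
        for name, (lat, lon) in c_locations.items()
    }
    h_max = max(h.values())
    if h_max == 0:  # every company sits at the investor's location
        return {name: 1.0 for name in h}
    return {name: 1 - dist / h_max for name, dist in h.items()}
```

With this normalisation the farthest company always receives a factor of zero, which is one reason the choice of the company sample matters for the location preference.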