Hannes Leo

Context

EIT Health Austria, active since 2022, is part of the largest European expert network for innovation and entrepreneurship in healthcare. Its members collaborate to develop and validate new healthcare solutions and to accelerate their path from prototype to market. The stakeholder analysis commissioned by EIT Health Austria aims to cover all relevant elements of the eHealth ecosystem, i.e. the key Austrian stakeholders that interact and collaborate to create a thriving environment for life science companies. These include healthcare sector institutions whose services are complementary to, competitive with, or otherwise significant for EIT Health Austria. According to the Life Sciences Directory of aws and FFG, about 90 institutions fit this description; 62 of them were included in the analysis.

Method

Natural language processing (NLP) techniques were used to position these institutions based on the descriptions of the institutions and their services. A large language model (LLM), specifically a BERT model (Bidirectional Encoder Representations from Transformers) from Hugging Face, was used to compute an embedding of each institution’s description. Embeddings are numerical vectors, in this case 768-dimensional, that place each institution within a “language space” in which texts with similar meaning lie close together.
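The text does not say how a single 768-dimensional vector was obtained from BERT, which outputs one vector per token. This sketch assumes a common choice: mean pooling over the non-padding tokens. It uses NumPy, and a random array stands in for the model output. In practice that array would be the `last_hidden_state` returned by a Hugging Face BERT model loaded with `transformers.AutoModel`. The checkpoint used in the analysis is not stated.

```python
import numpy as np

HIDDEN_SIZE = 768  # hidden size of a BERT-base model

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors of each text, ignoring padding positions.

    hidden_states:  (n_texts, n_tokens, hidden_size), e.g. BERT's last_hidden_state
    attention_mask: (n_texts, n_tokens), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)       # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # number of real tokens, never zero
    return summed / counts                            # one vector per text

# Random stand-in for model output: 3 descriptions of up to 5 tokens (illustration only)
rng = np.random.default_rng(0)
hidden = rng.normal(size=(3, 5, HIDDEN_SIZE))
mask = np.array([[1, 1, 1, 1, 1],
                 [1, 1, 1, 0, 0],
                 [1, 1, 0, 0, 0]])
embeddings = mean_pool(hidden, mask)
print(embeddings.shape)  # (3, 768)
```

Mean pooling is only one option. Another is the vector of the `[CLS]` token, and the choice can shift the resulting positions.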

The vectors of the 62 institutions were reduced by principal component analysis (PCA), and the institutions were positioned in a coordinate system spanned by the first two principal components (see Figure 1). The x-axis represents an institution’s proximity to the enterprise sector, while the y-axis represents its thematic focus.
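The PCA step can be sketched as follows. First center the 62 × 768 embedding matrix and take its singular value decomposition. Projecting the centered data onto the first right-singular vectors then gives the principal component scores used as coordinates. This is a minimal NumPy sketch with random stand-in data, not the analysis code. `sklearn.decomposition.PCA` would give the same coordinates up to the sign of each component.

```python
import numpy as np

def pca_scores(X: np.ndarray, n_components: int):
    """Project the rows of X onto their first principal components.

    Returns (scores, explained_variance_ratio) for the kept components.
    """
    Xc = X - X.mean(axis=0)                        # center each embedding dimension
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T              # coordinates on PC1, PC2, ...
    var = S ** 2                                   # proportional to variance per component
    return scores, var[:n_components] / var.sum()

rng = np.random.default_rng(42)
X = rng.normal(size=(62, 768))    # stand-in for 62 institution embeddings
coords, ratio = pca_scores(X, 2)
print(coords.shape)               # (62, 2)
```

The sign of each principal component is arbitrary. Which end of an axis corresponds to, for example, the enterprise sector is therefore decided by inspecting where known institutions fall, not by the PCA itself.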

Results

The positioning of the institutions clearly follows their main tasks in the ecosystem. Institutions in the blue circle in the upper left part of Figure 1 tend to focus on scientific research, while the institutions in the green circle are health-oriented. Life science support institutions, including EIT Health Austria, are grouped in the orange circle in the middle. On the right-hand side is a cluster of start-up support and regional support institutions. EIT Health Austria (EIT H) is positioned centrally among institutions with similar aims and activities. This central position indicates that EIT Health Austria is well placed to interact with stakeholders from the science, health, and business sectors.

A further cluster analysis provides additional insight into how stakeholders group around specific topics. A 3-cluster solution separates business-related, scientific, and healthcare institutions into distinct clusters. Each institution is thus assigned to the business, science, or health domain and positioned in a three-dimensional space using the first three principal components (see Figure 2).
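The clustering algorithm is not named in the text. This sketch assumes k-means with k = 3, run on three-dimensional points that stand in for the first three principal component scores. It uses a deterministic farthest-point initialization so repeated runs give the same result. The three synthetic groups (sizes 20, 21 and 21, totalling 62) are invented for illustration and do not represent the actual institutions.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100):
    """Lloyd's k-means with a deterministic farthest-point initialization.

    Returns (labels, centers).
    """
    # Initialization: start from the first point, then repeatedly add the point
    # that is farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its points (keep it if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for PC1..PC3 scores: three separated groups, 62 points in total
rng = np.random.default_rng(7)
means = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
sizes = [20, 21, 21]
X = np.vstack([rng.normal(m, 0.5, size=(n, 3)) for m, n in zip(means, sizes)])

labels, centers = kmeans(X, k=3)
print(np.bincount(labels))  # number of points per cluster
```

k-means needs the number of clusters as input. In practice k = 3 would be chosen by comparing solutions, for example by their within-cluster variance, rather than fixed in advance.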

Figure 1: Positioning with principal component analysis 

Figure 2: 3-cluster model


Insights

The embeddings calculated by a large language model (LLM) from the institutions’ descriptions, combined with principal component analysis, allow for an intuitive and coherent positioning of stakeholders within the Austrian life sciences ecosystem. The results can be displayed in a two- or three-dimensional coordinate system in which the distances between institutions reflect their proximity and thematic positioning. This demonstrates that descriptions alone are sufficient to position and describe the ecosystem meaningfully in a quantitative way.
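One way to inspect such proximities directly is to rank institutions by the cosine similarity of their embeddings to a chosen institution. Cosine similarity is an assumption here. The text describes distances in the PCA coordinate system, which is only a projection of the full embeddings. The toy three-dimensional vectors and the names `A` to `D` are hypothetical.

```python
import numpy as np

def nearest_neighbours(E: np.ndarray, names: list, query: str, k: int = 3):
    """Return the k institutions most similar to `query` by cosine similarity."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)  # scale rows to unit length
    sims = En @ En[names.index(query)]                 # cosine similarity to the query row
    order = np.argsort(-sims)                          # most similar first
    return [(names[i], float(sims[i])) for i in order if names[i] != query][:k]

# Hypothetical toy embeddings (real ones would be 768-dimensional)
names = ["A", "B", "C", "D"]
E = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5]])
result = nearest_neighbours(E, names, "A", k=2)
print([(n, round(s, 3)) for n, s in result])  # [('B', 0.994), ('D', 0.707)]
```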

Traditional methods (e.g., data collection via questionnaires and/or interviews) could also be used to position the institutions, but they would require significantly more resources and more subjective decisions. The AI/NLP-based approach via embeddings is effective even with small datasets. Combined with cluster analysis, it enables an empirical structuring of the ecosystem that can be used for further analysis. Although a thematic grouping into science, business, and health could also be achieved with other analytical methods, here it was derived empirically, without subjective judgments.

This approach is applicable to many research and consulting questions for which textual information is available. While it is already beneficial for small datasets, the advantages of AI/NLP-based positioning grow with dataset size.