Two weeks prior to the conference, I connected with my colleague Tuomas Syrjänen, co-founder of Futurice. Tuomas is an expert in data and artificial intelligence, and is passionate about exploring and learning from data. Whenever the databases or data sets he is working with don’t offer sufficient ways to search for what he needs, he writes additional code to help him dig deeper.
During our chat, Tuomas instantly recognized the problem I was facing. Fortunately, he was also able to tell me about the possibilities of text mining and how I could put it to good use. Here’s what I learned from him.
What is text mining?
As a company, you want to stay on top of your game, ahead of the competition and aware of the market. This will enable you to work proactively so that you’re not at the mercy of the circumstances, but rather in control of how your organization is navigating the changing realities of business.
To do so, you can take a variety of traditional steps – like doing desk research or talking to people and learning from them – but the market is big. How do you get the big picture?
Text mining is a process that centers on recognizing and extracting patterns from large collections of text data. It makes key concepts, trends and relationships visible, giving you insights to base your decisions on and validate assumptions with. It can also point out your blind spots where you need to delve deeper and learn more.
How to get started with text mining
Text mining is used to search for answers or tendencies in databases, on social media, in job listing services, and so on. Databases can cover any number of different sources, for example news articles, press releases, patent libraries, academic articles and social media. At Futurice, we have paid to gain access to various relevant databases and data sets – our playground for doing text mining.
But a database as such is just a starting point. To make a search in a database, you need two key things:
- Focus: What question are we asking? What would be relevant to look into and learn about? In our health industry example, we had very high-level questions: What is the main focus in news articles, or at conferences? What can we learn from patents?
- Data points to start with: In our case, in which clinics do we look for answers? We decided to focus on a selection of the largest clinics – five in the U.S. and nine in Europe – as we wanted to learn about the differences between the two markets.
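To make these two ingredients more concrete, here is a minimal Python sketch of how a focus and a set of data points could be written down and applied to a handful of documents. It is not the code Tuomas used: the topic keywords, the clinic names and the example texts are all made up for illustration, and simple keyword matching stands in for whatever method a real project would choose.

```python
import re
from collections import Counter

# Hypothetical focus: topics to track, each with a few trigger keywords.
FOCUS_TOPICS = {
    "telehealth": ["telehealth", "telemedicine", "remote consultation"],
    "sustainability": ["sustainability", "recycling", "emissions"],
}

# Hypothetical data points: made-up clinic names mapped to a market.
CLINIC_REGION = {
    "Clinic A": "US",
    "Clinic B": "US",
    "Clinic C": "Europe",
}


def topics_in(text, topics=FOCUS_TOPICS):
    """Return the set of topics whose keywords appear in the text."""
    lowered = text.lower()
    found = set()
    for topic, keywords in topics.items():
        if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords):
            found.add(topic)
    return found


def count_by_region(documents):
    """Count documents mentioning each topic, keyed by (region, topic)."""
    counts = Counter()
    for clinic, text in documents:
        region = CLINIC_REGION.get(clinic)
        if region is None:
            continue  # skip sources outside the chosen data points
        for topic in topics_in(text):
            counts[(region, topic)] += 1
    return counts


# Made-up example documents as (clinic, text) pairs.
docs = [
    ("Clinic A", "New telehealth service launched for outpatients."),
    ("Clinic B", "Hospital reports lower emissions and expands telemedicine."),
    ("Clinic C", "Recycling programme cuts clinical waste."),
    ("Clinic D", "Telehealth pilot announced."),
]
counts = count_by_region(docs)
```

Keeping the focus and the data points as plain dictionaries makes it easy to change the question later without touching the counting logic.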
Which rich data sources should you explore?
Once you have outlined your focus and data points, you can start to investigate different databases and data sets:
- Which patents are being registered in a certain area? We know that if something is patented, it will be mainstream in the next year or two. This can give companies a good indication of what their competitors are focusing on, enabling them to ask themselves how and where they want to position themselves in this game.
- What open positions are being posted in the industry and what can we learn from them? Job listings tell a lot about a company – they reveal what the strategic focus is, what new technologies are being explored, what skills are wanted, and so on.
- What topics do companies talk about, and which keywords are essential to them? This tells a lot about their strategic – or at least operational – focus.
- What do companies say about mergers and acquisitions or strategic partnerships? This can give you an idea about how a company is expanding (or planning to).
- How do companies talk about their products and services, and which new products and services are they launching? It can be inspiring to learn and take cues from how others are engaging with the market.
Answers to these questions can support companies as they rethink their own position in the market, and help them gain inspiration when creating strategies, developing new products and services, evaluating their value chain, and so on.
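As an illustration of the job listing question, the following Python sketch groups job titles into rough role categories and counts them. The categories, the phrases and the example titles are assumptions made up for this example, not data from our project, and substring matching is only a crude first pass.

```python
from collections import Counter

# Hypothetical mapping from role category to phrases found in job titles.
ROLE_KEYWORDS = {
    "data": ["data engineer", "data scientist", "data analyst"],
    "software": ["software developer", "software engineer"],
}


def classify_title(title):
    """Return the first role category whose phrase appears in the title."""
    lowered = title.lower()
    for category, phrases in ROLE_KEYWORDS.items():
        if any(phrase in lowered for phrase in phrases):
            return category
    return "other"


def role_mix(titles):
    """Count how many job titles fall into each role category."""
    return Counter(classify_title(title) for title in titles)


# Made-up example job titles.
titles = [
    "Senior Software Developer",
    "Data Engineer (Cloud)",
    "Registered Nurse",
    "Software Engineer, Imaging",
]
mix = role_mix(titles)
```

A tally like this only hints at where a company is investing. Titles vary a lot between organizations, so the phrase lists usually need a few rounds of adjustment.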
What to keep in mind when doing text mining
When exploring the possibilities of text mining, one of the most important things is to stay curious, and routinely remind yourself that you are on a journey! Keep asking yourself why you’re doing text mining and what you’re trying to achieve with it, and search in several data sources to validate your insights.
“What is the most relevant question to answer?” you might be asking. The answer is that the most relevant question will always change over time. Be willing to shift your question as you learn – you might realize you haven’t been focusing on what’s important from the beginning.
It’s also important to be aware of confirmation bias. If your data fits your purpose, you might be able to see the results you want in the data. But are you just trying to confirm what you would like to see? Remember to challenge your approach and be aware of your own biases.
Are your data sets representative? On a global scale, you’ll most likely have better access to data from the U.S. than from other countries. We should expect data from (developing) countries not to be anywhere near as widely represented or available as data from the western world. Consider whether you need to combine different methods to get more representative insights.
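One simple way to account for uneven data availability is to compare shares rather than raw counts. The Python sketch below assumes you already know how many documents you have per region and how many of them mention a topic. The function name and all numbers are invented for illustration.

```python
def mention_share(mentions, corpus_sizes):
    """Return, per region, the share of documents that mention a topic.

    Regions with no documents get None instead of a misleading zero.
    """
    shares = {}
    for region, total in corpus_sizes.items():
        if total == 0:
            shares[region] = None
            continue
        shares[region] = mentions.get(region, 0) / total
    return shares


# Made-up numbers: raw mention counts and corpus sizes per region.
mentions = {"US": 120, "Europe": 30}
corpus_sizes = {"US": 1000, "Europe": 200}
shares = mention_share(mentions, corpus_sizes)
```

With these invented numbers, the U.S. has four times as many raw mentions, yet the share of documents mentioning the topic is higher in Europe. Normalizing does not fix a data set that is missing whole markets, but it does keep a large corpus from drowning out a small one.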
So what exactly is happening in the health industry today?
To gain an overview of what is happening in the health industry, Tuomas took a deep dive into data sets with news articles, press releases and patents. Here’s what he learned:
1. Sustainability
Unsurprisingly, sustainability is also a trending keyword in the health industry, and it is discussed significantly more actively in the U.S. than in Europe. In Europe, the sustainability discussion is dominated by waste and recycling, while in the U.S., environmental aspects are discussed more. Among the growing topics in Europe are emissions and air quality, and in the U.S., trending topics include environmental factors, emissions and ESG.
2. Telehealth
Telehealth is an emerging topic that gained traction during the pandemic, driven by an increased need for online consultations. Telehealth, however, is much more than that – it is also about optimizing outpatient settings, monitoring patients, and offering wearables as well as other products and services that help keep patients at home. We were not surprised to learn that this is a trending topic, given the number of patents registered in this area. Telehealth is primarily discussed in U.S. hospitals, not so much in European ones.
3. Hiring for data expertise
The majority of actors in the healthcare industry aim to create more coherent journeys for both patients and healthcare professionals. This is made possible only if actors, products, services and systems in healthcare work in unison and share more than they do today. The industry needs to become more connected, and share data more actively. Our text mining exercise showed that in general, there is a lot of talk about data sharing, yet open positions are all about software developers – not data engineers and data scientists. Open positions related to data are much more common in other industries.
4. Patents in health
The number of patents focusing on digital surgery, digital operating rooms and 3D printing is growing steadily. So far, the industry has been focusing a lot on robotic surgery. The next step along that path is digital surgery, which requires introducing artificial intelligence and machine learning to assist in surgery planning and decision making – as well as remote patient monitoring using sensors and wearables; AR/VR assisted tools for education, training and visualization; and finally, better networks and connectivity to enable real-time data sharing and remote surgery.
5. Interoperability
Interoperability is the ability of two or more systems to exchange health information and use the information once it is received. It is a huge topic in health and one of the key success factors required for a better experience for both patients and healthcare professionals. Our text mining effort revealed that currently, only U.S. hospitals tend to discuss interoperability. It also showed that the topic is gaining traction among tech companies (e.g. Microsoft and Oracle) as well as software companies (such as Epic).
Data-informed conversations are a continuous journey
In our roundtable event, these insights sparked interesting conversations. The findings themselves were discussed at length, but they also served as an invitation for the participants to share their organizations’ perspectives – which also gave rise to an interesting exchange. The findings clearly sparked a lot of interest, and the participants were curious about where the data came from.
The results of the roundtable were encouraging and certainly did their part to motivate us to keep digging deeper into the topics and learn more. After all, a learning journey should ideally not stop, but rather continue in an iterative cycle.
Ideally, text mining should be used in iterations, starting with exploring high-level questions. After gaining the first insights, it’s time to reflect on those, refocus and decide to go in one direction, to zoom in on something. After that, the learning cycle starts again.
What needs can text mining solve?
Text mining is great for gaining insights, learning about various tendencies, and identifying problems. It typically serves to start conversations, not end them. It helps us identify which topics we should dig deeper into or which questions we should explore. Ideally, you should repeat the exercise regularly to learn more – and combine it with other methods like qualitative interviews, which dig deeper, validate, and give you more insights to work with.
Using text mining to gain an overview of your problem landscape and the issues prevalent in your industry can help you open your eyes, sharpen your focus, leverage conversations, and make decisions more efficiently. For the time being, involving a data scientist or data engineer to explore data is highly beneficial, but I believe that in the very near future, AI-enabled search engines will allow anyone to ask questions and get well-structured answers.
- Louise Fuglsang, Head of Health, Germany