Data science and data engineering
What you can expect in 2021–2022
MLOps is trending heavily in the data science domain. Over the next couple of years we’ll see improved access to and availability of easy API integrations, advanced out-of-the-box machine learning features, and NLP tools. Fairness and privacy issues are on the agenda as well.
AI solutions and other data-related projects depend on having access to data. Data engineers build data lakes and ensure that the right data is available at the right moment.
Data modeling creates clarity when business-domain-specific databases are consolidated, while data governance and other data management skills ensure the integrity and security of data.
Today, our largest clients are already in the process of developing their data warehouses. Smaller players will follow suit just a couple of steps behind.
Democratization of machine learning
Image and speech recognition and other solutions to generic problems are already available as easily integrated APIs, and their significance will keep growing. The number of ready-made machine learning (ML) services will also continue to grow. As a result, developers will be able to implement advanced features without deep knowledge of data science.
Even in projects that require custom ML solutions, AutoML will partly automate model optimization and thereby increase the impact of the data scientists’ work.
NLP for smaller languages
Companies want to capitalize on their textual and speech data, and this applies to languages other than English as well.
Cloud platforms already provide NLP tools for a number of languages and plan to release more in the future to cater to additional markets. Even so, plenty of specialized problems remain to be solved that are not universal enough to convince the big players to implement purpose-built tools for relatively small languages.
Meeting the need for both simpler rule-based models and more advanced pre-trained deep learning models in these languages will therefore fall to smaller boutique players.
MLOps
MLOps, or the management of the lifecycle of ML projects, will become increasingly important when we develop long-term solutions and not just proofs of concept (PoCs).
MLOps means developing a systematic approach to the monitoring, scalability and evaluation of data pipelines and ML models. Model training should be reproducible, and deployment should be as automated as possible.
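As a rough illustration of what reproducible training can mean in practice, the sketch below seeds a local random generator from the run configuration and tags each run with a hash of that configuration. The function names, the configuration keys and the toy model (fitting y = w * x) are all assumptions made for this example, not part of any particular MLOps tool.

```python
import hashlib
import json
import random


def config_fingerprint(config: dict) -> str:
    """Return a short, stable hash of the training configuration."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def train(config: dict) -> dict:
    """Toy training run: fit y = w * x with SGD on synthetic data."""
    # A local, seeded generator avoids hidden global state between runs.
    rng = random.Random(config["seed"])
    data = [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in range(1, 21)]

    w = 0.0
    for _ in range(config["epochs"]):
        rng.shuffle(data)
        for x, y in data:
            # Gradient of the squared error (w * x - y) ** 2 with respect to w.
            w -= config["learning_rate"] * 2 * (w * x - y) * x

    mse = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return {"run_id": config_fingerprint(config), "weight": w, "mse": mse}


if __name__ == "__main__":
    config = {"seed": 42, "epochs": 50, "learning_rate": 0.001}
    first = train(config)
    second = train(config)
    print(first["run_id"], first == second)
```

Because every source of randomness is derived from the configuration, two runs with the same configuration return identical results, and the run ID gives monitoring and evaluation steps something stable to refer to. A real pipeline would also need to pin data versions and library versions, which this sketch leaves out.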
The ability to rapidly build end-to-end solutions allows for an improved focus on providing genuine business value.
Fairness and privacy
As many aspects of our daily lives are becoming automated by machine learning systems, it is important to mitigate their inherent risks and potential harm. Data scientists need to be acutely aware of potential sources of bias and have the necessary tools to evaluate the impact of the systems they are building.
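One simple way to start evaluating such impact is to compare how often a model gives a positive decision to different groups. The sketch below computes the demographic parity difference, which is only one of several fairness metrics and not necessarily the right one for a given application. The function names and the example data are made up for illustration.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Share of positive predictions (value 1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for prediction, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(prediction == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rate."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical model decisions for two groups, "a" and "b".
    predictions = [1, 1, 1, 0, 1, 0, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(selection_rates(predictions, groups))
    print(demographic_parity_difference(predictions, groups))
```

A value of 0 means every group receives positive decisions at the same rate. A large gap does not prove that the system is unfair, but it is a signal that the data and the model deserve a closer look.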
Data privacy is important for securing the end users’ trust in the application. Technical solutions, such as federated learning, can help ensure that private data is not shared more widely than necessary, but data scientists will also need to be aware of privacy best practices.
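The idea behind federated learning can be sketched with federated averaging: each client trains on its own data and sends back only model weights and a sample count, and the server combines them into a new global model. The example below is a minimal sketch under simplifying assumptions, using a toy linear model y = w * x + b and hypothetical function names. Production systems typically add protections such as secure aggregation, which are not shown here.

```python
def local_update(weights, data, learning_rate=0.05, epochs=5):
    """Train on one client's private (x, y) pairs; the data never leaves this function."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            error = (w * x + b) - y
            w -= learning_rate * 2 * error * x
            b -= learning_rate * 2 * error
    return (w, b), len(data)


def federated_average(client_results):
    """Combine client weights, weighting each client by its number of samples."""
    total = sum(count for _, count in client_results)
    w = sum(weights[0] * count for weights, count in client_results) / total
    b = sum(weights[1] * count for weights, count in client_results) / total
    return (w, b)


def run_rounds(client_datasets, rounds=100):
    """Server loop: broadcast the global weights, collect updates, average them."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        results = [local_update(global_weights, data) for data in client_datasets]
        global_weights = federated_average(results)
    return global_weights


if __name__ == "__main__":
    # Three hypothetical clients whose private data all follow y = 2x + 1.
    clients = [
        [(x, 2 * x + 1) for x in (-2, -1, 0)],
        [(x, 2 * x + 1) for x in (0, 1, 2)],
        [(x, 2 * x + 1) for x in (-1, 1)],
    ]
    print(run_rounds(clients))
```

The server only ever sees weights and counts, not the underlying records. Model updates can still leak information about the training data, which is one reason privacy best practices remain necessary alongside the technique.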