If you know anything about artificial intelligence and the internet of things, you most likely can’t help but feel a great deal of excitement at the idea of combining the extraordinary promises that these two fields have to offer. With the unparalleled advances in computer hardware, some of the advanced theoretical knowledge that has been around for decades can finally be leveraged and put to use in real-life, practical applications. And with an ever-growing IoT market, getting high-quality data has never been so easy, enabling the development of ever better, more performant models, and in particular of deep learning models.
Wait, Before we Start: What is Deep Learning, Exactly?
Deep learning is a family of artificial intelligence techniques responsible for countless technological advances in recent years. Even though deep learning (formally referred to as “deep neural networks”) has existed since the 1950s and saw its key algorithmic breakthroughs in the 1980s, it is only recently that computer hardware advances have caught up with the theory, giving computer scientists access to the enormous computational power and data storage that they needed to make neural networks reach their full potential.
Deep learning algorithms are characterized by a unique multi-layer architecture in which knowledge is progressively “abstracted away” from the raw data they take as an input. For instance, when a deep neural net is trained to identify the pictures of a particular type of animal out of a pool of random images, it starts by considering the images at the pixel level, then converts this knowledge into “higher-level” knowledge such as the edges of the object of interest, then identifies specific features (such as the paws, the tail, etc.) before eventually abstracting a general visualization of the object (or, to continue our example, of the animal) as a whole. The interesting point here is that the early layers that concern the lower-level knowledge could be reused with an entirely new type of image, such as satellite imagery, because edges remain edges regardless of what an image represents.
Remarkably enough, no data scientist ever explicitly programmed a computer to perform such a task. Instead, the learning algorithm is fed hundreds of thousands of images until it figures out by itself (and learns by itself!) how to recognize the desired objects.
Deep Learning on the Edge: Why it Matters
In parallel to the hardware improvements that helped the advancement of deep learning research, some tremendous discoveries have been made that have allowed the rapid development of IoT devices. In particular, the prevalence of microcontroller units (MCUs) is creating a unique opportunity to make AI applications, especially deep learning-enabled applications, accessible to the consumer at an unprecedented scale and speed.
MCUs offer remarkable advantages for the deployment of deep learning-based applications, as they reduce latency, conserve bandwidth and offer better guarantees in terms of privacy. When such AI applications are installed directly on an IoT device, that’s what I’d refer to as the deployment of AI applications on the edge. Choosing whether an application is best deployed on a server, in the cloud or on the edge comes down to a trade-off between speed (through latency) and accuracy (as more complex, and therefore larger, models can’t always be stored in the more limited memory of an IoT device).
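To make the memory side of that trade-off concrete, here is a rough back-of-the-envelope sketch. The function names, the 1 MiB storage budget and the parameter counts are all assumptions picked for illustration; real devices and models vary widely, and a deployed model needs working memory beyond the weights themselves.

```python
def model_size_bytes(num_parameters: int, bytes_per_parameter: int = 4) -> int:
    """Approximate storage for the weights alone (4 bytes = one float32 each)."""
    return num_parameters * bytes_per_parameter


def fits_on_device(num_parameters: int, storage_budget_bytes: int,
                   bytes_per_parameter: int = 4) -> bool:
    """True if the weights fit within the storage set aside for the model."""
    return model_size_bytes(num_parameters, bytes_per_parameter) <= storage_budget_bytes


# Hypothetical numbers, chosen only for illustration.
STORAGE_BUDGET = 1024 * 1024   # 1 MiB reserved for the model on the device
small_model = 200_000          # parameters -> 800,000 bytes as float32
large_model = 5_000_000        # parameters -> 20,000,000 bytes as float32

print(fits_on_device(small_model, STORAGE_BUDGET))  # True
print(fits_on_device(large_model, STORAGE_BUDGET))  # False
```

Under these assumed numbers, the smaller model can live on the device while the larger, potentially more accurate one would have to stay on a server.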
What is Transfer Learning?
In the context of psychology, transfer learning is defined as the study of the dependency of human learning or performance on prior experience. Humans are not taught how to perform every single possible task in order to master it. Rather, when they encounter new situations, they manage to solve problems in an ad-hoc manner by extrapolating old knowledge to new environments. For example, when a child learns to swing a tennis racket, he or she can easily transfer that skill set to ping pong or baseball. The same goes for conceptual understanding, like applying statistics or other mathematics to budgeting as an adult.
In contrast to the way humans function, most machine learning algorithms, once implemented, tend to be specific to a particular data set or to a particular, discrete task. Machine learning researchers have recently been giving more and more attention to how to empower computers to reuse their acquired knowledge and re-apply it to new tasks and new domains, attempting to abstract and transfer the “smarts” extracted from the data across multiple applications, usually somewhat similar to each other.
While deep learning can be used in an unsupervised context, it generally powers supervised learning. That means the algorithm learns from examples made of input-output pairs, which it then uses to try to identify patterns and extract relationships, with the goal of predicting an outcome from new, unseen data. Deep learning has applications in countless areas of research, but suffers from one major drawback: these extremely data-greedy algorithms require massive quantities of data in order to tune the thousands of parameters that come into play in a neural network architecture. This means that not only is a lot of data required to achieve good performance, but this data also needs to be labeled, which can be an expensive and time-consuming endeavor. Even worse, it is often not even possible to obtain quality data, or to label it accurately enough to be able to train a model from it.
This is where transfer learning can really help. It allows for a model developed for a specific task to be reused as the starting point on another one. It is an exciting area for machine learning scientists because it mimics the way humans generalize knowledge from one specialized task to another. In fact, it is a key strategy when it comes to reducing the required size of datasets in order for deep neural networks to even become a viable option.
Remember the example we surfaced earlier about how a model can learn to identify object edges in a deep learning network? That’s the exact sort of knowledge that can be transferred to another computer vision task. In this way, a model that learns to see regular objects like cats and dogs can transfer that understanding to more complex tasks, like identifying nuclei in cancer imagery.
Figure 1: The different learning approaches when building AI applications (Source: Figure Eight)
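The idea of reusing early layers can be sketched in a few lines. This is a toy stand-in rather than a real vision model: the "pretrained" weights are simply assumed, they are kept frozen, and only a small classification head is trained on four labeled target examples. All names and numbers here are hypothetical.

```python
import math

# Stand-in for pretrained early layers. These weights are an assumption made
# for illustration and are never updated below (they are "frozen").
FROZEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]


def extract_features(x):
    """Fixed linear map followed by a ReLU, mimicking a reused early layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def predict(x, head_w, head_b):
    """Logistic-regression head applied on top of the frozen features."""
    z = sum(w * f for w, f in zip(head_w, extract_features(x))) + head_b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=200, lr=0.5):
    """Stochastic gradient descent on the head only."""
    head_w, head_b = [0.0] * len(FROZEN_WEIGHTS), 0.0
    for _ in range(epochs):
        for x, y in data:
            error = predict(x, head_w, head_b) - y
            feats = extract_features(x)
            head_w = [w - lr * error * f for w, f in zip(head_w, feats)]
            head_b -= lr * error
    return head_w, head_b


# A tiny labeled set from the (hypothetical) target task.
target_data = [([2.0, 0.0], 1), ([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.0, 2.0], 0)]
head_w, head_b = train_head(target_data)

print(predict([4.0, 1.0], head_w, head_b) > 0.5)  # unseen positive example
print(predict([1.0, 4.0], head_w, head_b) < 0.5)  # unseen negative example
```

In a real project the frozen part would be the early layers of an existing network; the point of the sketch is only that a handful of target examples can be enough when the lower-level features are already in place.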
Transfer Learning Strategies
Transfer learning certainly comes across as an elegant and natural way to solve the dilemma of information poverty in the age of Big Data. However, some important tactical questions regarding transfer learning need to be answered prior to being able to use it in practice:
When is it appropriate to use it?
What’s the best way to perform the transfer?
To answer those questions, it is worthwhile to dig deeper into the theory of transfer learning to understand the different transfer learning approaches that exist.
As we have seen, getting labeled data in the target domain is sometimes challenging, while labeled data exists in abundance in another source domain. This is when transductive transfer learning becomes useful. In some cases, the source and target feature spaces are different, and sometimes they are the same but the marginal probability distributions of the input data are different. The latter case of transductive transfer learning is closely related to domain adaptation for knowledge transfer.
In the subcase that scientists refer to as inductive transfer learning, however, it is the target and source tasks that differ from each other; in fact, sometimes even the source and target domains are not the same. In this case, some labeled data specific to the target domain remains necessary in order to induce an objective predictive model for the target domain.
Finally, the unsupervised transfer learning setting makes it possible (remarkably!) to reuse models even when the target task is different from, but still related to, the source task, typically with no labeled data available in either domain.
Figure 2: The different transfer learning strategies (Source: Figure Eight)
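One way to keep the three settings straight is to write them down as a small decision helper. This is only a sketch of the commonly used taxonomy: the function name and its boolean inputs are hypothetical, and the label-availability condition for the unsupervised case is an assumption drawn from that taxonomy rather than a rule set by this article.

```python
def transfer_setting(target_labels: bool, source_labels: bool, same_task: bool) -> str:
    """Map label availability and task similarity to a transfer learning setting."""
    if target_labels and not same_task:
        # Tasks differ, and some labeled target data is available.
        return "inductive"
    if source_labels and not target_labels and same_task:
        # Same task, but labels exist only in the source domain.
        return "transductive"
    if not source_labels and not target_labels and not same_task:
        # Related but different tasks, no labels on either side (assumption).
        return "unsupervised"
    return "not covered by the three settings"


print(transfer_setting(target_labels=True, source_labels=True, same_task=False))
print(transfer_setting(target_labels=False, source_labels=True, same_task=True))
print(transfer_setting(target_labels=False, source_labels=False, same_task=False))
```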
Why it Matters for Data Scientists
Supervised machine learning algorithms function under the assumption that the training data belongs to the same feature space and has the same distribution as the data used for inference. In fact, if a trained model fails to generalize, the very first thing a data scientist would do is check whether both statistical distributions match. Wouldn’t it be amazing, though, if instead there was a way to leverage data from an entirely different domain where data is abundant and cheap to label, and use it as training data for a problem that is difficult to solve precisely because of data sparsity? Well, this is exactly what transfer learning sets out to achieve.
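As an illustration of what "checking whether both distributions match" can look like for a single numeric feature, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the two empirical cumulative distribution functions. The choice of this particular statistic and the sample values are assumptions; other checks are equally valid.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)  # share of a that is <= value
        cdf_b = bisect_right(b, value) / len(b)  # share of b that is <= value
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical feature values, chosen only for illustration.
training = [1.0, 2.0, 3.0, 4.0, 5.0]
inference_similar = [1.5, 2.5, 3.5, 4.5]
inference_shifted = [11.0, 12.0, 13.0, 14.0]

print(round(ks_statistic(training, inference_similar), 2))  # 0.2
print(round(ks_statistic(training, inference_shifted), 2))  # 1.0
```

A value close to 1 signals that the inference data lives in a different region from the training data, which is the situation where a model trained the traditional way tends to generalize poorly.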
As we have seen, transfer learning helps address the potential challenges related to the collection and labeling of large amounts of data that would typically be necessary for supervised learning. Beyond the promise of applying pre-learned knowledge to entirely new tasks, there is an implicit expectation from data scientists that by reusing this prior knowledge, the model performance should improve, and the training time should dramatically decrease compared to “traditional” supervised learning scenarios.
In fact, in successful cases of transfer learning, data scientists can expect several practical benefits, such as improved learning performance without extensive data collection and labeling efforts:
They expect a stronger starting point, meaning that the initial predictive power (prior to refining the model) on the source model is higher than it otherwise would be.
They expect a better improvement rate (a better “learning curve”) of the source model than they would normally get training a target model using supervised learning on the target data set.
They hope that the final performance of the trained model, once it has converged, is better than it would be without transfer learning.
In short, data scientists expect an acceleration of their modeling productivity, a drop in the amount of data as well as the amount of compute required for their tasks, a more predictable development process and even some level of risk mitigation, since the approach or the data that is being transferred is already “tried and tested”.
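The three expectations above (a stronger start, faster improvement, a better final result) can be read directly off a pair of learning curves. The sketch below assumes one validation score per epoch, higher being better; the function names, the threshold and the curve values are invented purely for illustration.

```python
def epochs_to_reach(curve, threshold):
    """First epoch index at which the score reaches the threshold, or None."""
    for epoch, score in enumerate(curve):
        if score >= threshold:
            return epoch
    return None


def compare_curves(scratch, transfer, threshold):
    """Summarize the three benefits from two per-epoch validation curves."""
    scratch_epoch = epochs_to_reach(scratch, threshold)
    transfer_epoch = epochs_to_reach(transfer, threshold)
    if scratch_epoch is None or transfer_epoch is None:
        epochs_saved = None  # at least one curve never reaches the threshold
    else:
        epochs_saved = scratch_epoch - transfer_epoch
    return {
        "start_gain": round(transfer[0] - scratch[0], 2),    # stronger starting point
        "epochs_saved": epochs_saved,                        # faster improvement
        "final_gain": round(transfer[-1] - scratch[-1], 2),  # better converged result
    }


# Hypothetical validation accuracies, not measured results.
scratch_curve = [0.50, 0.60, 0.70, 0.78, 0.82]
transfer_curve = [0.70, 0.80, 0.86, 0.89, 0.90]

print(compare_curves(scratch_curve, transfer_curve, threshold=0.78))
```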
Transfer Learning as a Ready-to-Go Model
Technically, transfer learning is about encoding knowledge within the parameters of a model and making use of that knowledge to solve different problems. In practice, the transfer can occur in two different ways, starting with existing models built on preliminary data or with existing data alone.
A data scientist looks for an existing model that appears to be a good candidate to address his/her own needs. He/she might not even be aware of the data this model was originally trained on. In these scenarios, we talk about “pretrained models”: the knowledge that is transferred is already encoded into the model itself, and the ML expert’s job is solely focused on fine-tuning the model using a small amount of data taken from the target domain, through a process called warm restart. This is one of the most appealing ways to use the process, as it actually enables AI systems to learn from a small number of cases, which makes the training phase both fast and efficient. This process is revolutionary in the sense that it opens the door to collaborative artificial intelligence, in which each developer or scientist can build on top of existing expertise and focus solely on the added value that they can bring as an expert.
A data scientist realizes that the data available to train a supervised model is insufficient in quantity and/or quality, or that labeling the data is too tedious or expensive, and decides to build a model from scratch by leveraging another existing labeled data set originally developed in the context of another task or another domain. This allows him/her to make use of existing labels in order to save money or gather labels more easily. In this case, the data scientist starts with a regular supervised learning approach solving a different problem (a different task), or the problem at hand using another data set. He/she then finishes by fine-tuning the model to best adapt it to his/her problem.
Reusing an existing model or data set in this way also offers an opportunity to save compute power, which is particularly important in the context of computationally heavy models.
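A toy version of a warm restart makes the compute saving visible. The sketch below fits a one-variable linear model with plain gradient descent, first on an abundant "source" data set, then on three "target" points, once from scratch and once starting from the source parameters. The tasks, learning rate and tolerance are all assumptions chosen for illustration; a real deep learning workflow would use a proper framework.

```python
def mse(params, data):
    """Mean squared error of the line y = w * x + b on the data."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, lr=0.05, max_steps=10_000, tolerance=1e-4):
    """Plain gradient descent; returns fitted parameters and steps used."""
    w, b = params
    for step in range(max_steps):
        if mse((w, b), data) < tolerance:
            return (w, b), step
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b), max_steps


# Hypothetical source task (y = 2x + 1) and a related target task (y = 2.2x + 1.1).
source_data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
target_data = [(x, 2.2 * x + 1.1) for x in (0.0, 1.0, 2.0)]

pretrained, _ = train((0.0, 0.0), source_data)      # the existing model
_, cold_steps = train((0.0, 0.0), target_data)      # target model from scratch
_, warm_steps = train(pretrained, target_data)      # warm restart from the source

print("steps from scratch:", cold_steps)
print("steps with warm restart:", warm_steps)
```

Because the two tasks are closely related, the warm-started run begins much nearer to the solution and needs fewer gradient steps to reach the same tolerance.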
Why is Transfer Learning Important for IoT?
A hairy problem when developing customized applications is known as the “cold start” problem. Cold start refers to a new user hitting a site where it is very difficult to build a personalized offering because his/her behavior or preferences are still unknown as no preliminary data has been collected. This is why your recommendations on an eCommerce website are usually disappointing until you start generating more data by browsing and shopping.
IoT sees a similar issue. After all, it’s always hard to train a model for a new connected product because the new user's preferences are unknown. Under these circumstances, a transfer learning approach can help because a warm restart typically requires much less data than would be necessary to recreate an entire model. Therefore, it offers an opportunity to ‘onboard’ new customers faster without the need to collect large amounts of data. In this scenario, the ‘base’ model would be generated using a combination of the data collected across a large number of customers, which is more likely to be a fair representation of the expectations of the average user.
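As a deliberately simplified picture of that onboarding flow, the sketch below uses a single number (say, a preferred thermostat temperature) in place of a full model. The 'base' estimate is the average across existing customers, and a new user's few readings gradually pull it toward his/her own preference. The blending rule, the `prior_weight` parameter and all readings are assumptions made for illustration, not a description of any particular product.

```python
from statistics import mean


def build_base_estimate(per_customer_readings):
    """'Base' model: the average preference across all existing customers."""
    return mean(mean(readings) for readings in per_customer_readings.values())


def personalize(base_estimate, user_readings, prior_weight=5):
    """Blend the base estimate with a new user's readings.

    prior_weight behaves like that many pseudo-observations of the base
    estimate, so the base dominates until enough user data arrives.
    """
    n = len(user_readings)
    return (prior_weight * base_estimate + sum(user_readings)) / (prior_weight + n)


# Hypothetical readings (degrees Celsius) from existing customers.
existing_customers = {
    "a": [20.0, 21.0, 22.0],
    "b": [22.0, 22.0],
    "c": [19.0, 21.0],
}
base = build_base_estimate(existing_customers)

print(base)                            # 21.0, what a brand-new user starts with
print(personalize(base, []))           # 21.0, no user data yet
print(personalize(base, [24.0] * 5))   # 22.5, moving toward the user's own preference
```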
Transfer learning is also critical to the successful deployment of IoT deep learning applications that require complex machine-generated information of such volume and velocity that it would be simply impossible to ever find enough human experts to label it and train new models from scratch in a reasonable amount of time. Think about it: if it takes days or even weeks to label data and train a particular model, by the time the model is ready for consumption, it is very likely to already be obsolete!
No Magic Bullet
Transfer learning truly opens the door to new applications, and is definitely one of the key elements for building smart IoT applications. However, some particular precautions need to be taken to ensure proper usage of this technology.
With ever-growing consumer concerns over the usage of their data, it is only natural to question the potential data security concerns related to IoT. As seen previously, transfer learning often means that a model built from an existing data set is used as the starting point for the development of a new solution. And this data isn’t necessarily open source data: it might sometimes be the aggregation of the data of a large group of people, or even (more rarely) the data set of a particular consumer. As a general rule, data scientists prefer to use aggregated data sets, not only because they give a more reliable worldview to start with when creating a model for a new customer, but also because the more data involved, the less likely it is that a specific individual might be directly identifiable.
There are also more technical challenges related to transfer learning. In fact, in some situations, knowledge should not be transferred at all, as brute-force transfer may hurt the performance of learning in the target domain. This situation is referred to as negative transfer. How to avoid negative transfer is an important open issue that is attracting more and more attention in the research community, including at Figure Eight.
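Avoiding negative transfer remains an open question, but detecting it after the fact is straightforward: compare the transferred model against a baseline trained without transfer on the same held-out target data, and keep the transfer only if it actually helps. The sketch below shows that check; the function name, the `min_gain` margin and the scores are hypothetical.

```python
def choose_model(baseline_score: float, transferred_score: float, min_gain: float = 0.0):
    """Keep the transferred model only if it beats the baseline by more than min_gain.

    Both scores are assumed to be measured on the same held-out target data,
    with higher meaning better.
    """
    gain = transferred_score - baseline_score
    if gain > min_gain:
        return "transferred", gain
    return "baseline", gain  # transfer did not help: possible negative transfer


# Hypothetical validation scores, chosen only for illustration.
print(choose_model(baseline_score=0.80, transferred_score=0.86)[0])  # transferred
print(choose_model(baseline_score=0.80, transferred_score=0.74)[0])  # baseline
```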
Jennifer Prendki is currently the VP of Machine Learning at Figure Eight, the Human-in-the-Loop AI category leader. She has spent most of her career creating a data-driven culture wherever she went, succeeding in sometimes highly skeptical environments. She is particularly skilled at building and scaling high-performance Machine Learning teams, and is known for enjoying a good challenge. Trained as a particle physicist, she likes to use her analytical mind not only when building complex models, but also as part of her leadership philosophy. She is pragmatic yet detail-oriented. Jennifer also takes great pleasure in addressing both technical and non-technical audiences at conferences and seminars, and is passionate about attracting more women to careers in STEM.