Take me straight to the MVP
Machine learning is complicated?
Agreed, machine learning as a subject is complicated. End users should not have to worry about the details and inner workings of a machine learning system. However, users are intelligent enough to build models, test them, and deploy them to automate decision making. Currently, the barriers to entry are very high: today's machine learning ecosystem has not made room for users who just want to tweak and use the system. We often see products built on an underlying machine learning system that is boxed off from the user. We need to empower people to interface directly with machine learning systems and manipulate them to serve their purpose.
User interfaces for machine learning systems
User interfaces for machine learning will have to accommodate users who are not computationally and mathematically adept. I liken this to the text editor revolution, in which command-based editors were largely displaced by WYSIWYG text editors. I imagine future machine learning interfaces bringing the WYSIWYG concept to machine learning.
Visualizations as input tools
Traditionally, visualizations have been seen as tools to bring data to life. The concept of visualization implies a one-way flow of information: data is processed into a visualization-friendly format, and the processed data is then fed into a visualization to populate it.
The mainstream concept of visualization as a data-viewing device.
Instead, visualizations should be looked at as interfaces to the underlying data. This leap is not a new one; we use common forms of visualization as input tools all the time. Color pickers, WikiMapia, and font selectors are examples of visualizations that serve as input tools. A two-way flow of information should look as follows.
Visualization acting as an input-output interface to the underlying data.
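To make the two-way flow concrete, here is a minimal sketch of a view that both displays data and writes user edits back to it. The names `DataStore`, `ScatterView`, and `drag` are hypothetical and chosen only for illustration; they are not taken from the MVP.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


class DataStore:
    """Holds the underlying data and notifies listeners when it changes."""

    def __init__(self, points: List[Point]) -> None:
        self.points = list(points)
        self._listeners: List[Callable[[List[Point]], None]] = []

    def subscribe(self, listener: Callable[[List[Point]], None]) -> None:
        self._listeners.append(listener)

    def update(self, index: int, point: Point) -> None:
        # A change made through the visualization lands here,
        # then every subscribed view is told to redraw.
        self.points[index] = point
        for listener in self._listeners:
            listener(self.points)


class ScatterView:
    """Stand-in for a chart: it renders the data and accepts edits."""

    def __init__(self, store: DataStore) -> None:
        self.store = store
        self.rendered: List[Point] = list(store.points)
        store.subscribe(self.render)

    def render(self, points: List[Point]) -> None:
        # Output path: data flows from the store to the screen.
        self.rendered = list(points)

    def drag(self, index: int, to: Point) -> None:
        # Input path: a user gesture flows from the screen back to the store.
        self.store.update(index, to)


store = DataStore([(1.0, 2.0), (3.0, 4.0)])
view = ScatterView(store)
view.drag(0, (1.5, 2.5))  # the edit reaches the data and the view redraws
```

The view never owns the data. It only forwards edits to the store and redraws when the store announces a change, which is what lets the same picture act as both display and input device.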
The main challenge in developing interfaces for machine learning systems is that the tools and data vary with the user's use case. Machine learning problems can be broadly divided into two categories:
- Regression problems: where the output of the machine learning model is a continuous value. These problems involve predicting the value of a variable given some data.
- Classification problems: where data must be sorted into discrete classes. These problems involve, for example, deciding whether an image shows a cat or a dog.
As an example, we will use the regression problem of pricing real estate when the crime rate (incidents per 1000 residents) and the number of rooms (a function of square footage) are known.
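The kind of model this example calls for can be sketched as follows. I am assuming a plain linear model fitted by least squares; this write-up does not say which algorithm the MVP uses, and the training rows below are invented purely for illustration.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small system a @ x = b by Gauss-Jordan elimination.

    Uses partial pivoting; raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows: Sequence[Tuple[float, float]], prices: Sequence[float]) -> List[float]:
    """Least-squares fit of price = w0 + w1 * crime + w2 * rooms."""
    design = [[1.0, crime, rooms] for crime, rooms in rows]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, prices)) for i in range(3)]
    return solve(xtx, xty)


# Invented data: (crime per 1000 residents, rooms) and price in thousands.
rows = [(5.0, 3.0), (20.0, 2.0), (10.0, 4.0), (2.0, 5.0), (15.0, 3.0)]
prices = [180.0, 120.0, 200.0, 246.0, 160.0]

w0, w1, w2 = fit(rows, prices)
print(f"price = {w0:.2f} + ({w1:.2f} * crime) + ({w2:.2f} * rooms)")
```

The invented prices were generated from an exactly linear rule, so the fit recovers it up to rounding. Real listings would be noisy, and the fitted weights would only approximate the trend.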
The objective of this project is to:
- Demonstrate that real-estate agents can codify their knowledge and wisdom into machine learning models
- Design a usable machine learning interface for end-users
- Demonstrate the viability of this concept with an MVP product
Operations on the interface
Because this is a regression problem, we will use a scatter plot to represent the underlying data. The scatter plot is interactive, and interacting with it triggers automatic training on the dataset being represented.
Moving a data point
Adjusting a data point on the interface triggers re-training on the dataset.
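A minimal sketch of this behavior, under simplifying assumptions: the class `InteractiveScatter` and its method `move_point` are hypothetical names, and for brevity the model is a single-feature line (price against rooms) rather than the two-feature model used in the example.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (rooms, price)


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Closed-form least squares for price = intercept + slope * rooms."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


class InteractiveScatter:
    """Scatter plot state that re-trains its model whenever a point moves."""

    def __init__(self, points: List[Point]) -> None:
        self.points = list(points)
        self.retrain_count = 0
        self.model: Optional[Tuple[float, float]] = None
        self._retrain()  # initial training on the data as loaded

    def _retrain(self) -> None:
        self.model = fit_line(self.points)
        self.retrain_count += 1

    def move_point(self, index: int, new_point: Point) -> None:
        # The user drags a point: update the data, then re-train immediately.
        self.points[index] = new_point
        self._retrain()


plot = InteractiveScatter([(2.0, 120.0), (3.0, 150.0), (4.0, 180.0)])
before = plot.model
plot.move_point(2, (4.0, 240.0))  # drag the last point upward
after = plot.model
print("before:", before, "after:", after)
```

Re-training on every move is affordable here because the fit is a closed-form computation over a handful of points. A slower model would need a cheaper update strategy.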
Predicting a value
Given the current training on the dataset and the supplied values for crime and rooms, the predicted data point is drawn in red.
Predicting a value using a visualization-facilitated interface.
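In code, the prediction step might look like the sketch below. It assumes a linear model, and the weights are hypothetical stand-ins for whatever the current training produced. `PlotPoint` and `predict_point` are illustrative names, not part of the MVP.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PlotPoint:
    crime: float
    rooms: float
    price: float
    color: str


def predict_point(weights: Tuple[float, float, float],
                  crime: float, rooms: float) -> PlotPoint:
    """Predict a price and wrap it as a point the plot can draw."""
    w0, w1, w2 = weights
    price = w0 + w1 * crime + w2 * rooms
    # Predicted points are marked red so they stand apart from training data.
    return PlotPoint(crime, rooms, price, color="red")


# Hypothetical weights standing in for the result of the current training.
weights = (100.0, -2.0, 30.0)
point = predict_point(weights, crime=8.0, rooms=4.0)
print(point)
```

Returning the prediction as a plot point, rather than a bare number, keeps the result in the same visual vocabulary as the data the user has been manipulating.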
Interactive machine learning
This project explored the emerging field of interactive machine learning, which merges the disciplines of design, computer science, and math. The field includes improvements to machine learning algorithms that facilitate interactivity and responsiveness to small changes. It also involves inventing new interfaces for machine learning systems; historically this problem did not exist, so it calls for novel thinking. An interactive real-estate pricing model is available for you to play around with; please take a look at the MVP here.
Future work involves refining the existing MVP to allow batched updates for re-training, and re-purposing the existing work to serve regression problems with more than three training variables. With this project, we have only scratched the surface of what might be tomorrow's commoditized machine learning systems for end users.