CSE512 Projects (Winter 2014)

Interactive Visualizations for Supervised Learning

Alexandre Bykov, Stanley Wang
The main screen, showing the scatterplot matrix for the last iteration and the number of errors per iteration on the lower chart. The prominent orange and blue marks are classification errors.

In this project we create a visualization tool for analyzing machine learning classification algorithms. The key idea behind the tool is to give the user per-iteration performance information for the algorithm. This is done through two main views. The first view contains a scatterplot matrix of the data projected onto multiple pairs of dimensions. Each point is marked with its actual and predicted labels to highlight where in the dataset errors occur in each pair of dimensions. The second view provides summary statistics (classification accuracy, number of errors, etc.) at each iteration and an interface for scrolling through every iteration of the algorithm. Both views are updated in real time while the algorithm is running. As a test of the system, the provided implementation visualizes a linear SVM algorithm running on a breast cancer survival dataset with about 200 points and 3 dimensions. The implementation can be extended to other algorithms and datasets.
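To make the per-iteration statistics concrete, here is a minimal sketch of the kind of data the two views could consume. It is not the project's code: the project uses scikit-learn's SVM, while this sketch swaps in a plain subgradient-descent linear SVM so that it runs with the standard library alone. The names make_blobs, predict, and train_linear_svm, the synthetic two-class data, and all parameter values are assumptions made for illustration.

```python
import itertools
import random


def make_blobs(n_per_class=100, dim=3, seed=0):
    """Hypothetical stand-in data: two Gaussian blobs labeled -1 and +1."""
    rng = random.Random(seed)
    points, labels = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            points.append([rng.gauss(2.0 * label, 1.0) for _ in range(dim)])
            labels.append(label)
    return points, labels


def predict(w, b, x):
    """Classify x by the sign of the linear decision function."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


def train_linear_svm(points, labels, epochs=20, lr=0.01, reg=0.01, seed=0):
    """Subgradient descent on the regularized hinge loss.

    Returns the weights, the bias, and one statistics record per pass
    over the data (the "iterations" a summary view would scroll through).
    """
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    b = 0.0
    order = list(range(len(points)))
    history = []
    for epoch in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wj * (1 - lr * reg) for wj in w]
            if margin < 1:
                # The point violates the margin: step toward classifying it.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
        # Snapshot after this pass: which points are currently misclassified.
        wrong = [i for i, (x, y) in enumerate(zip(points, labels))
                 if predict(w, b, x) != y]
        history.append({
            "iteration": epoch,
            "errors": len(wrong),
            "accuracy": 1 - len(wrong) / len(points),
            "misclassified": wrong,
        })
    return w, b, history


points, labels = make_blobs()
w, b, history = train_linear_svm(points, labels)

# Dimension pairs that a scatterplot matrix over 3 dimensions would show.
pairs = list(itertools.combinations(range(3), 2))

for stats in history[:3]:
    print(stats["iteration"], stats["errors"], round(stats["accuracy"], 3))
```

Each record in history holds what a summary chart needs for one iteration, and the misclassified indices identify the points that a scatterplot view would highlight as errors in every dimension pair.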

Software

Since the visualization depends on Python's scikit-learn SVM implementation, it must be run locally on a machine with the latest version of scikit-learn installed (download here).

To run the visualization, download the full repository and follow these steps:
1) Run python -m SimpleHTTPServer from the root directory of the visualization. This serves the page on port 8000.
2) In a separate terminal, run python websocketserver.py from the same directory.
3) Open http://localhost:8000/mlviz.html to view the visualization.

Note: If the visualization does not start immediately after the page is reloaded, pressing Ctrl-C in the webserver window should fix it.

Materials