Classificationbox uses machine learning to automatically classify many types of data, including text, images, and other structured or unstructured content. Because it provides continuous, live learning, Classificationbox avoids the need for expensive training sessions, GPUs, or large data sets.
Classificationbox has a variety of uses:
- Learn how your company is perceived by grouping tweets into positive and negative sentiment
- Automatically group photos of cats and dogs
- Group emails into spam and non-spam categories
- Build a classifier to detect the language of a piece of text based on previously taught examples
Classificationbox in aiWARE
You can use Classificationbox in aiWARE by uploading .classificationbox files to a library and then using the Classificationbox engine to process content and classify images or frames from videos.
Run Classificationbox
When you run Classificationbox, it provides an interactive administration console that includes everything you need to get going.
- Make sure you have Docker running with at least 2 CPUs and 4GB RAM.
- Run this code in your terminal to start the box:
MB_KEY="YOURKEYHERE"
docker run -p 8080:8080 -e "MB_KEY=$MB_KEY" machinebox/classificationbox
- Go to http://localhost:8080/ in your browser to see what your box can do.
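Once the box is running, you interact with it over its HTTP API. The sketch below shows the general shape of creating a model, teaching it an example, and asking for a prediction. The endpoint paths, payload fields, and the `sentiment` model name are assumptions based on the Classificationbox HTTP API; verify them against the console at http://localhost:8080/ before relying on them.

```python
import json
import urllib.request

# Base URL of a locally running Classificationbox (see the docker run step above).
BASE_URL = "http://localhost:8080"

def teach_payload(class_name, text):
    """Build the JSON body for a teach request: one text feature per example."""
    return {
        "class": class_name,
        "inputs": [{"key": "text", "type": "text", "value": text}],
    }

def post_json(path, payload):
    """POST a JSON payload to the box and return the decoded JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Create a model, teach one example, then ask for a prediction.
    # These paths follow the Classificationbox HTTP API as documented in its
    # console; treat them as a sketch, not an authoritative reference.
    post_json("/classificationbox/models",
              {"id": "sentiment", "name": "sentiment",
               "classes": ["positive", "negative"]})
    post_json("/classificationbox/models/sentiment/teach",
              teach_payload("positive", "I love this product"))
    prediction = post_json("/classificationbox/models/sentiment/predict",
                           {"limit": 1,
                            "inputs": [{"key": "text", "type": "text",
                                        "value": "great service"}]})
    print(prediction)
```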
Updating Classificationbox
If you already have Classificationbox installed, you can update it with the following:
docker pull machinebox/classificationbox:latest
A few tools exist to help you train your classifier:
- imgclass - Train Classificationbox with images from your hard drive
- textclass - Train Classificationbox with text files to build a text-based classifier
Best practices
- The quality of a classifier depends largely on the input data and how you teach the model.
- Aim to have at least 100 examples for each class - exact requirements differ by case.
- Have the same (or very similar) number of examples per class.
- Ensure the quality of the examples you train with, and make sure each example is in the correct class.
- Take a random selection of 80% of your examples for teaching, and use the other 20% for validating. The percentage of the validation set that the model predicts correctly is the model accuracy. You can experiment with different data sets and compare their accuracies to decide which gives you the best results.
- To avoid a biased model, teach examples in random order. Do not teach all the examples for one class in a group; instead, spread the teaching out among all classes.
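The split-and-validate workflow above can be sketched as follows. Here `examples` is a hypothetical list of (class, input) pairs, and `predict` stands in for whatever function calls your trained model:

```python
import random

def split_examples(examples, teach_fraction=0.8, seed=42):
    """Shuffle examples (so classes are interleaved, not taught in blocks)
    and split them into a teaching set and a validation set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)  # random order avoids a class-order bias
    cut = int(len(shuffled) * teach_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(validation, predict):
    """Fraction of validation examples whose predicted class matches the label."""
    correct = sum(1 for label, value in validation if predict(value) == label)
    return correct / len(validation)
```

Teach the model from the first set, then report `accuracy` over the second; repeating this with different data sets lets you compare them on equal footing.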