A weblog by Will Fitzgerald



Orange is component-based data mining software. It includes a range of preprocessing, modelling and data exploration techniques. It is based on C++ components that are accessed either directly (not very common), through Python scripts (easier and better), or through GUI objects called Orange Widgets. … Orange is distributed free under the GPL.
Some of the readily-available features of Orange include:
Data input/output: Orange can read from and write to tab-delimited files and C4.5 files, and also supports some more exotic formats.
Preprocessing: feature subset selection, categorization, feature utility estimation for predictive tasks.
Predictive modelling: classification trees, naive Bayes, k-NN, majority classifier, support vector machines, logistic regression.

Ensemble methods like boosting and bagging are also included.
Model validation: different data sampling and validation techniques (like cross-validation, random sampling, etc.), and various statistics for model validation (classification accuracy, AUC, sensitivity, specificity, …) are included.

Orange evaluation schemas support caching: validation results (class probabilities) are stored, and rerunning the validation will only validate new classifiers.
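
To make the caching idea concrete, here is a minimal sketch in plain Python. It does not use Orange's API; the names `CachedCrossValidation`, `MajorityLearner`, `k_fold_indices` and `accuracy` are all hypothetical and exist only for illustration. It assumes the cache is keyed by a learner name and holds the cross-validated class probabilities for every example, so a second run only fits learners it has not seen before.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class MajorityLearner:
    """Illustrative stand-in for a classifier: predicts training class frequencies."""

    def fit(self, rows, labels):
        counts = Counter(labels)
        total = len(labels)
        self.probs = {cls: n / total for cls, n in counts.items()}
        return self

    def predict_proba(self, row):
        # Ignores the row; a majority classifier only knows class frequencies.
        return self.probs


class CachedCrossValidation:
    """Hypothetical evaluator that stores class probabilities per learner name."""

    def __init__(self, rows, labels, k=5, seed=0):
        self.rows = rows
        self.labels = labels
        self.folds = k_fold_indices(len(rows), k, seed)
        self.cache = {}  # learner name -> list of class-probability dicts
        self.fits = 0    # counts model fits, to show what the cache saves

    def evaluate(self, learners):
        """learners maps a name to a zero-argument factory returning a learner."""
        for name, make_learner in learners.items():
            if name in self.cache:
                continue  # already validated: reuse the stored probabilities
            probs = [None] * len(self.rows)
            for fold in self.folds:
                held_out = set(fold)
                train = [i for i in range(len(self.rows)) if i not in held_out]
                model = make_learner().fit(
                    [self.rows[i] for i in train],
                    [self.labels[i] for i in train],
                )
                self.fits += 1
                for i in fold:
                    probs[i] = model.predict_proba(self.rows[i])
            self.cache[name] = probs
        return {name: self.cache[name] for name in learners}


def accuracy(probs, labels):
    """Classification accuracy computed from cached class probabilities."""
    predicted = [max(p, key=p.get) for p in probs]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


rows = [[0], [1], [2], [3], [4], [5]]
labels = ["a", "a", "a", "a", "b", "b"]
cv = CachedCrossValidation(rows, labels, k=3)
first = cv.evaluate({"majority": MajorityLearner})
second = cv.evaluate({"majority": MajorityLearner, "majority2": MajorityLearner})
print(cv.fits, accuracy(second["majority"], labels))
```

In this sketch the first call fits three models (one per fold) and the second call fits only three more, for the new learner; the probabilities for `"majority"` come straight from the cache. Because the cache holds probabilities rather than a single score, other statistics (AUC, sensitivity, specificity) could be computed later without refitting anything.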

