Extraction of logical rules and selection of features using neural networks
Włodzisław Duch
Real data from samples of 23 species of mushrooms.
22 attributes and 3 class labels: edible, poisonous, not recommended. 4208 (51.8%) are edible, 3916 (48.2%) are not.
Attributes: cap shape (6, e.g. bell, conical, flat...), cap surface (4), cap color (10), bruises (2), odor (9), gill attachment (4), gill spacing (3), gill size (2), gill color (12), stalk shape (2), stalk root (7, many missing values), stalk surface above the ring (4), stalk surface below the ring (4), stalk color above the ring (9), stalk color below the ring (9), veil type (2), veil color (4), ring number (3), ring type (8), spore print color (9), population (6), habitat (7).
Task: identify edible mushrooms and find the relevant features. The book says there is no rule ...
Data sample:
Mushroom-1 is: edible,convex,fibrous,yellow,bruises,anise,free,crowded,narrow,brown,tapering,bulbous,smooth,smooth,white,white,partial,white,one,pendant,purple,several,woods
Mushroom-2 is: edible,flat,smooth,white,bruises,almond,free,crowded,narrow,pink,tapering,bulbous,smooth,smooth,white,white,partial,white,one,pendant,purple,several,woods
Mushroom-3 is: edible,bell,smooth,white,bruises,almond,free,close,broad,white,enlarging,club,smooth,smooth,white,white,partial,white,one,pendant,black,scattered,meadows
Mushroom-4 is: poisonous,convex,smooth,white,bruises,pungent,free,close,narrow,white,enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,black,scattered,urban
Mushroom-5 is: poisonous,convex,smooth,white,bruises,pungent,free,close,narrow,pink,enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,black,several,urban
Mushroom-8000 is: poisonous,convex,smooth,white,bruises,pungent,free,close,narrow,pink,enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,brown,scattered,urban
Rule for edible:
IF odor=(almond.or.anise.or.none).and.spore_print_color=not.green THEN edible
48 errors, 99.41% correct
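The edible rule can be tried directly on records from the data sample. The following is a minimal Python sketch, assuming the comma-separated field order of the sample lines (class label first, odor in the sixth field, spore print color in the twenty-first). The names `is_edible`, `SAMPLES` and the index constants are illustrative and do not come from the original.

```python
# Field positions assumed from the data sample (0-based):
# 0 = class label, 5 = odor, 20 = spore print color.
CLASS_IDX, ODOR_IDX, SPORE_IDX = 0, 5, 20

# Four records copied from the data sample (Mushroom-1, -3, -4, -8000).
SAMPLES = [
    "edible,convex,fibrous,yellow,bruises,anise,free,crowded,narrow,brown,"
    "tapering,bulbous,smooth,smooth,white,white,partial,white,one,pendant,"
    "purple,several,woods",
    "edible,bell,smooth,white,bruises,almond,free,close,broad,white,"
    "enlarging,club,smooth,smooth,white,white,partial,white,one,pendant,"
    "black,scattered,meadows",
    "poisonous,convex,smooth,white,bruises,pungent,free,close,narrow,white,"
    "enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,"
    "black,scattered,urban",
    "poisonous,convex,smooth,white,bruises,pungent,free,close,narrow,pink,"
    "enlarging,equal,smooth,smooth,white,white,partial,white,one,pendant,"
    "brown,scattered,urban",
]


def is_edible(fields):
    """IF odor in (almond, anise, none) AND spore_print_color != green THEN edible."""
    return (fields[ODOR_IDX] in {"almond", "anise", "none"}
            and fields[SPORE_IDX] != "green")


records = [line.split(",") for line in SAMPLES]
predictions = ["edible" if is_edible(r) else "poisonous" for r in records]
errors = sum(p != r[CLASS_IDX] for p, r in zip(predictions, records))
print(predictions, "errors on these records:", errors)
```

Only two of the 22 attributes are read by the rule, which is the point of the feature selection: the remaining fields are carried along but never inspected.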
Rules for poisonous mushrooms, using 6 attributes only:
R1) IF odor=not(almond.or.anise.or.none) THEN poisonous
120 errors, 98.52% correct
R2) IF spore_print_color=green THEN poisonous
48 errors, 99.41% correct
R3) IF odor=none.and.stalk_surface_below_ring=scaly.and.stalk_color_above_ring=not.brown THEN poisonous
8 errors, 99.90% correct
R4) IF habitat=leaves.and.cap_color=white THEN poisonous
no errors!
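Rules R1 to R4 can be written as four predicates over the six attributes they mention. The sketch below assumes that a mushroom is flagged as poisonous when any one rule fires, and that the quoted percentages are computed over 4208 + 3916 = 8124 records. The dictionary keys and the function names are illustrative assumptions, not part of the original.

```python
TOTAL = 4208 + 3916  # edible + non-edible counts quoted for the dataset


def r1(m):
    return m["odor"] not in {"almond", "anise", "none"}


def r2(m):
    return m["spore_print_color"] == "green"


def r3(m):
    return (m["odor"] == "none"
            and m["stalk_surface_below_ring"] == "scaly"
            and m["stalk_color_above_ring"] != "brown")


def r4(m):
    return m["habitat"] == "leaves" and m["cap_color"] == "white"


RULES = [r1, r2, r3, r4]


def is_poisonous(m):
    # Assumption: the rules are combined disjunctively (any rule fires).
    return any(rule(m) for rule in RULES)


def percent_correct(errors, total=TOTAL):
    return 100.0 * (total - errors) / total


# Mushroom-4 and Mushroom-3 from the data sample, reduced to the six
# attributes that the rules actually read.
mushroom_4 = {"odor": "pungent", "spore_print_color": "black",
              "stalk_surface_below_ring": "smooth",
              "stalk_color_above_ring": "white",
              "habitat": "urban", "cap_color": "white"}
mushroom_3 = {"odor": "almond", "spore_print_color": "black",
              "stalk_surface_below_ring": "smooth",
              "stalk_color_above_ring": "white",
              "habitat": "meadows", "cap_color": "white"}

print(is_poisonous(mushroom_4), is_poisonous(mushroom_3))
for errors in (120, 48, 8, 0):
    print(errors, "errors ->", f"{percent_correct(errors):.2f}% correct")
```

The negation of "R1 or R2" is exactly the rule given for edible mushrooms, which is consistent with both being reported with 48 errors.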
Iris dataset: 150 iris flowers of 3 kinds. Each row lists sepal length, sepal width, petal length and petal width in cm (x1 to x4), followed by the class label.
5.1,3.5,1.4,0.2, Iris-setosa
4.9,3.0,1.4,0.2, Iris-setosa
4.7,3.2,1.3,0.2, Iris-setosa
4.6,3.1,1.5,0.2, Iris-setosa
5.0,3.6,1.4,0.2, Iris-setosa
5.4,3.9,1.7,0.4, Iris-setosa
4.6,3.4,1.4,0.3, Iris-setosa
5.0,3.4,1.5,0.2, Iris-setosa
4.4,2.9,1.4,0.2, Iris-setosa
4.9,3.1,1.5,0.1, Iris-setosa
6.3,3.3,4.7,1.6, Iris-versicolor
4.9,2.4,3.3,1.0, Iris-versicolor
6.6,2.9,4.6,1.3, Iris-versicolor
5.2,2.7,3.9,1.4, Iris-versicolor
5.0,2.0,3.5,1.0, Iris-versicolor
5.9,3.0,4.2,1.5, Iris-versicolor
6.0,2.2,4.0,1.0, Iris-versicolor
6.1,2.9,4.7,1.4, Iris-versicolor
5.6,2.9,3.6,1.3, Iris-versicolor
6.7,3.1,4.4,1.4, Iris-versicolor
5.6,3.0,4.5,1.5, Iris-versicolor
5.8,2.7,4.1,1.0, Iris-versicolor
6.2,2.2,4.5,1.5, Iris-versicolor
5.6,2.5,3.9,1.1, Iris-versicolor
6.3,2.9,5.6,1.8, Iris-virginica
6.5,3.0,5.8,2.2, Iris-virginica
7.6,3.0,6.6,2.1, Iris-virginica
4.9,2.5,4.5,1.7, Iris-virginica
7.3,2.9,6.3,1.8, Iris-virginica
6.7,2.5,5.8,1.8, Iris-virginica
7.2,3.6,6.1,2.5, Iris-virginica
6.5,3.2,5.1,2.0, Iris-virginica
6.4,2.7,5.3,1.9, Iris-virginica
6.8,3.0,5.5,2.1, Iris-virginica
5.7,2.5,5.0,2.0, Iris-virginica
5.8,2.8,5.1,2.4, Iris-virginica
6.4,3.2,5.3,2.3, Iris-virginica
6.5,3.0,5.5,1.8, Iris-virginica
What can we say about such data?
IF (x3 < 2.5) Iris-setosa;
ELSE IF (x3 > 4.8) Iris-virginica
ELSE Iris-versicolor
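The threshold rule on x3 can be run against a few of the rows listed above. This is a minimal Python sketch, assuming x3 is the third value in each row; the function name `classify` and the choice of six rows are illustrative.

```python
# Six rows copied from the listing: (x1, x2, x3, x4, label).
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.4, 2.9, 1.4, 0.2, "Iris-setosa"),
    (6.3, 3.3, 4.7, 1.6, "Iris-versicolor"),
    (5.0, 2.0, 3.5, 1.0, "Iris-versicolor"),
    (6.3, 2.9, 5.6, 1.8, "Iris-virginica"),
    (4.9, 2.5, 4.5, 1.7, "Iris-virginica"),
]


def classify(x3):
    """Single-feature rule: thresholds 2.5 and 4.8 on x3."""
    if x3 < 2.5:
        return "Iris-setosa"
    if x3 > 4.8:
        return "Iris-virginica"
    return "Iris-versicolor"


# Collect the rows where the rule disagrees with the given label.
misclassified = [row for row in ROWS if classify(row[2]) != row[4]]
print(len(misclassified), "misclassified:", misclassified)
```

On these six rows the only disagreement is the Iris-virginica flower with x3 = 4.5, which falls below the 4.8 threshold and is therefore labeled Iris-versicolor.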
Very simple rules for the Iris dataset (3 errors, 98.0% correct):
IF (x3=small) Iris-setosa;
ELSE IF (x3=large.or.x4=large) Iris-virginica
ELSE Iris-versicolor
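The linguistic values "small" and "large" stand for intervals of the continuous features. The source does not state the cut points, so the sketch below treats them as parameters: the defaults 2.5 and 4.8 for x3 are borrowed from the threshold rule given earlier, and 1.6 for x4 is a purely hypothetical placeholder. The names `Cuts` and `classify_linguistic` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cuts:
    # All cut points are assumptions; the original does not give them.
    x3_small_max: float = 2.5   # x3 below this counts as "small"
    x3_large_min: float = 4.8   # x3 above this counts as "large"
    x4_large_min: float = 1.6   # x4 above this counts as "large" (hypothetical)


def classify_linguistic(x3, x4, cuts=Cuts()):
    """IF x3=small setosa; ELSE IF x3=large OR x4=large virginica; ELSE versicolor."""
    if x3 < cuts.x3_small_max:
        return "Iris-setosa"
    if x3 > cuts.x3_large_min or x4 > cuts.x4_large_min:
        return "Iris-virginica"
    return "Iris-versicolor"


# (x3, x4) pairs taken from rows of the Iris listing.
print(classify_linguistic(1.4, 0.2))
print(classify_linguistic(4.7, 1.6))
print(classify_linguistic(4.5, 1.7))
```

With these placeholder cut points, the flower with x3 = 4.5 and x4 = 1.7 is assigned to Iris-virginica through the x4 condition, which illustrates why adding the second feature can reduce the error count. Different cut points would give different error counts on the full dataset.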