|
if (aSample.Rows.Count == 0)
{
    return new TreeNode(new Attribute(getMostCommonValue(aSample, targetAttribute)));
}
In the code above, when getting the most common value, did you mean to pass in the variable "samples" rather than "aSample"? At this point "aSample" has zero rows, so there is nothing for getMostCommonValue to count.
|
Hi,
thank you very much for this code.
I just found a little bug, I think.
There is an error in the gain() method where you calculate -(positives+negatives)/amount*entropy:
it always yields -entropy, because amount currently equals positives+negatives.
It needs to be the size of the main set, not of the subset.
Regards,
Bernhard
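To make the fix concrete: a small language-agnostic sketch in Python (the helper names are mine, not from the article) of information gain where each value's subset entropy is weighted by its share of the parent set, not of the subset itself:

```python
import math

def entropy(pos, neg):
    """Binary entropy -p*log2(p) - n*log2(n); defined as 0 for an empty split."""
    total = pos + neg
    if total == 0:
        return 0.0
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result

def gain(parent_total, parent_entropy, splits):
    """splits: list of (positives, negatives), one pair per attribute value.
    Each subset's entropy is weighted by its share of the PARENT set;
    dividing by the subset's own size (the bug described above) would
    make every weight 1 and the sum collapse to a plain entropy."""
    weighted = sum((p + n) / parent_total * entropy(p, n) for p, n in splits)
    return parent_entropy - weighted
```

With the classic play-tennis data (9 positive, 5 negative; Outlook splits 2/3, 4/0, 3/2) this gives the textbook gain of about 0.247.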
|
Please, how can I get the source code of the ID3 decision tree algorithm applied to malaria fever?
|
Hi,
First of all, thank you for your code; it is really helpful.
I modified my attributes to make your program work on my example:
Attribute industrialRisk = new Attribute("industrialRisk", new string[] { "P", "A", "N" });
Attribute managementRisk = new Attribute("managementRisk", new string[] { "P", "A", "N" });
Attribute financialFlexibility = new Attribute("financialFlexibility", new string[] { "P", "A", "N" });
Attribute credibility = new Attribute("credibility", new string[] { "P", "A", "N" });
Attribute competitiveness = new Attribute("competitiveness", new string[] { "P", "A", "N" });
Attribute operatingRisk = new Attribute("operatingRisk", new string[] { "P", "A", "N" });
Attribute[] attributes = new Attribute[] { industrialRisk, managementRisk, financialFlexibility, credibility, competitiveness, operatingRisk};
DataTable samples = getDataTable();
DecisionTreeID3 id3 = new DecisionTreeID3();
TreeNode root = id3.mountTree(samples, "result", attributes);
printNode(root, "");
Console.ReadKey();
But then I have an issue: the code successfully computes the first two TreeNodes but then crashes...
After looking through your code, I see that my best attribute (the one chosen as a node) is null, so it cannot instantiate my TreeNode with that value. More precisely, it does not seem to count any positives or negatives (NaN).
Do you have any idea where the error comes from? Or is this expected?
Many thanks for your help (and your code...)
Michael
|
Hi, could you please help me develop the decision tree algorithm?
How could I develop a decision tree for 16 classes? If you have sample code or a project, please send it to me, as I am currently working on my final project.
|
Exception in thread "Thread-3" java.lang.NullPointerException
at guiID3.ID3.internalMountTree(ID3.java:225)
at guiID3.ID3.mountTree(ID3.java:286)
at guiID3.ID3.internalMountTree(ID3.java:261)
at guiID3.ID3.mountTree(ID3.java:286)
at guiID3.ID3.internalMountTree(ID3.java:261)
at guiID3.ID3.mountTree(ID3.java:286)
at guiID3.ID3.internalMountTree(ID3.java:261)
at guiID3.ID3.mountTree(ID3.java:286)
at guiID3.ID3.run(ID3.java:50)
at java.lang.Thread.run(Thread.java:722)
|
I have a data set (training samples) stored in a SQL DataTable.
I want to use this table as input to the algorithm to build a decision tree.
Can you set out the steps to add this class to my Windows project?
With appreciation.
|
Hi,
I have seen the code; it is pretty good.
But I have an issue: I want to go to a particular node in the tree, remove the subtree rooted at that node, and put a new leaf node in its place.
This leaf node should hold the majority class of the subset of data that reaches the removed node.
Can you please help with this?
I need your help ASAP because I need to submit my assignment.
Thanks in advance,
Rasna
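The operation described here is essentially subtree-replacement pruning. A minimal Python sketch under assumed names (the Node class and its fields are hypothetical, not the article's TreeNode):

```python
from collections import Counter

# Hypothetical node shape, assumed for illustration only
class Node:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # splitting attribute; None for a leaf
        self.label = label           # class label; set only on leaves
        self.children = {}           # branch value -> child Node

def prune_to_leaf(node, samples, target):
    """Replace node's subtree with a leaf labeled with the majority
    class among the samples that reach this node."""
    majority = Counter(row[target] for row in samples).most_common(1)[0][0]
    node.attribute = None
    node.children = {}
    node.label = majority
    return node
```

In the article's TreeNode this would correspond to swapping the chosen child for a new leaf node via AddTreeNode, with the label taken from getMostCommonValue over the subset.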
|
Hi,
Can you please help me understand how to do post-pruning for decision trees in C#? I need your help ASAP.
Thanks,
Rasna
|
Sadly, this is just for two-class output decisions...
But by modifying the internalMountTree method and the positive/negative counting methods, this code can be used for decisions with more than two output classes.
Thx...
-- modified 10-May-12 21:44pm.
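For reference, the two-class entropy generalizes directly to any number of classes; a hedged Python sketch of the multi-class version (the function name is mine):

```python
import math
from collections import Counter

def entropy(labels):
    """Multi-class entropy: -sum over classes of p_i * log2(p_i),
    where p_i is the fraction of samples carrying class label i."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())
```

The information-gain computation is unchanged; only the per-subset entropy and the positive/negative counters need to become per-class counts.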
|
Hi,
Can you please send me the modified code, if you made those changes?
|
Hi, could you please send me the sample code? Thank you.
|
In this code, how do we call the class to print the result in a web page? For the printNode method, what can we pass as the argument values, given that the root node value comes from another class? I don't know how to display the result. Please, anyone, help me... it's urgent.
|
I am using Roosevelt's code. How can the output class be called in a web page to show the result? I don't know how to do it. Please, anyone, help.
|
This code is great, but I thought the concept deserved a little more configurability.
I've built a new executable and refactored a bit to make the source data configurable: the source data is no longer built into the executable; instead, you create a text file (CSV) with the source data in it. It also uses WinForms instead of console output.
To access the source and executable, check out my blog: http://oopstruggles.blogspot.com/2011/02/id3-decision-tree-in-c.html
|
Hello,
I need this implementation with another splitting criterion, for example the Gini index or misclassification error.
Thanks.
|
Hello,
I just can't understand how you define new attributes in main. Where did you get them from?
Thanks.
|
I define them manually. For example, if you want an attribute named "Income" with the values High, Middle, and Low:
Attribute Income = new Attribute("Income", new string[] { "High", "Middle", "Low" });
...
...
Attribute[] attributes = new Attribute[] { Gender, Age, Income };
|
Thank you for your reply.
I believe this algorithm is C4.5, not really ID3, because this implementation uses information gain, not just entropy. Am I right?
Do you have any source code for ID3 (using just information entropy)?
Thank you for your help. My assignment is due on Monday.
|
I personally believe this version is ID3, without the Gini index or any missing-attribute handling (apart from the getMostCommonValue function).
The code below is partly extracted from my web application, with a little modification of Roosevelt's original.
using System;
using System.Collections.Generic;
using System.IO;
using System.Data;
using System.Diagnostics;
using System.Collections;
namespace TravelSys.DAL.DSS
{
/// Class that represents an attribute used in the decision tree
public class Attribute
{
//string value;
ArrayList mValues;
string mName;
public object mLabel;
/// Initializes a new instance of a class Attribute
public Attribute(string name, string[] values)
{
mName = name;
mValues = new ArrayList(values);
mValues.Sort();
}
public Attribute(object Label)
{
mLabel = Label;
mName = string.Empty;
mValues = null;
}
/// Indicates the attribute name
public string AttributeName
{
get
{
return mName;
}
}
/// Returns an array with the attribute values
public string[] values
{
get
{
if (mValues != null)
return (string[])mValues.ToArray(typeof(string));
else
return null;
}
}
/// Indicates whether a value is allowed for this attribute
public bool isValidValue(string value)
{
return indexValue(value) >= 0;
}
/// Returns the index value
public int indexValue(string value)
{
if (mValues != null)
return mValues.BinarySearch(value);
else
return -1;
}
/// <returns>
public override string ToString()
{
if (mName != string.Empty)
{
return mName;
}
else
{
return mLabel.ToString();
}
}
}
/// Class that represent the decision tree built;
///
public class TreeNode
{
private ArrayList mChilds = null;
private Attribute mAttribute;
private int nData = 0;
private double _percentage = 100.0;
private TreeNode _parent = null;
int j = 1;
/// Initialize a new instance of TreeNode
public TreeNode(Attribute attribute)
{
try
{
if (attribute.values != null)
{
j++;
mChilds = new ArrayList(attribute.values.Length);
for (int i = 0; i < attribute.values.Length; i++)
mChilds.Add(null);
}
else
{
mChilds = new ArrayList(1);
mChilds.Add(null);
}
mAttribute = attribute;
}
catch (Exception)
{
}
}
/// Adds a TreeNode child in this branch name by indicating ValueName
public void AddTreeNode(TreeNode treeNode, string ValueName)
{
int index = mAttribute.indexValue(ValueName);
treeNode._parent = this;
mChilds[index] = treeNode;
}
/// Returns the total number of children node
public int totalChilds
{
get
{
return mChilds.Count;
}
}
/// Returns the child node of a node
public TreeNode getChild(int index)
{
return (TreeNode)mChilds[index];
}
public TreeNode Parent
{
get { return this._parent; }
}
/// Attribute that is connected to Node
public Attribute attribute
{
get
{
return mAttribute;
}
}
public int totalData
{
get { return nData; }
set { nData = value; }
}
public double percentage
{
get { return _percentage; }
set { _percentage = value; }
}
/// Returns the child of a node by the name of the branch leading to it
public TreeNode getChildByBranchName(string branchName)
{
int index = mAttribute.indexValue(branchName);
return (TreeNode)mChilds[index];
}
public static int MaxChildScoreIndex(TreeNode root)
{
double val = ((TreeNode)root.mChilds[0]).percentage;
int ind = 0;
for (int i = 0; i < root.mChilds.Count; i++)
{
if (val < ((TreeNode)root.mChilds[i]).percentage)
{
ind = i;
val = ((TreeNode)root.mChilds[i]).percentage;
}
}
return ind;
}
}
/// <summary>
/// Class that implements a decision tree using ID3 algorithm
/// </summary>
public class DecisionTreeID3
{
private DataTable mSamples;
private int mTotalPositives = 0;
private int mTotal = 0;
private string mTargetAttribute = "Outcome";
private double mEntropySet = 0.0;
/// Returns the total number of positive samples in a table of samples
private int countTotalPositives(DataTable samples)
{
int result = 0;
foreach (DataRow aRow in samples.Rows)
{
if ((string)aRow[mTargetAttribute.Trim()] == "Go")
result++;
}
return result;
}
/// Calculates the entropy given the formula:
/// -p+ * log2(p+) - p- * log2(p-)
private double calcEntropy(int positives, int negatives)
{
int total = positives + negatives;
if (total == 0)
{
return 0;
}
double ratioPositive = (double)positives / total;
double ratioNegative = (double)negatives / total;
if (ratioPositive != 0)
ratioPositive = -(ratioPositive) * System.Math.Log(ratioPositive, 2);
if (ratioNegative != 0)
ratioNegative = -(ratioNegative) * System.Math.Log(ratioNegative, 2);
double result = ratioPositive + ratioNegative;
return result;
}
/// Counts the positive and negative samples for a given value of an attribute
private void getValuesToAttribute(DataTable samples, Attribute attribute, string value, out int positives, out int negatives)
{
positives = 0;
negatives = 0;
foreach (DataRow aRow in samples.Rows)
{
if ((string)aRow[attribute.AttributeName] == value)
{
    // braces make explicit that the else belongs to the inner if
    if ((string)aRow[mTargetAttribute.Trim()] == "Go")
        positives++;
    else
        negatives++;
}
}
}
/// Calculates the gain of an attribute
private double gain(DataTable samples, Attribute attribute)
{
string[] values = attribute.values;
double sum = 0.0;
for (int i = 0; i < values.Length; i++)
{
int positives, negatives;
positives = negatives = 0;
getValuesToAttribute(samples, attribute, values[i], out positives, out negatives);
double entropy = calcEntropy(positives, negatives);
sum += -(double)((positives + negatives) * entropy) / mTotal;
}
return mEntropySet + sum;
}
/// Returns the best attribute
private Attribute getBestAttribute(DataTable samples, Attribute[] attributes)
{
double maxGain = 0.0;
Attribute result = null;
try
{
foreach (Attribute attribute in attributes)
{
double aux = gain(samples, attribute);
if (aux > maxGain)
{
maxGain = aux;
result = attribute;
}
}
}
catch
{ /* ignore attributes whose gain cannot be computed */ }
return result;
}
/// Returns true if all examples are positive sampling
private bool allSamplesPositives(DataTable samples, string targetAttribute)
{
foreach (DataRow row in samples.Rows)
{
if ((string)row[targetAttribute.Trim()] == "NotGo")
return false;
}
return true;
}
/// Returns true if all are negative examples of sampling
private bool allSamplesNegatives(DataTable samples, string targetAttribute)
{
foreach (DataRow row in samples.Rows)
{
if ((string)row[targetAttribute.Trim()] == "Go")
return false;
}
return true;
}
/// Returns a list of all the distinct values from a table of random
private ArrayList getDistinctValues(DataTable samples, string targetAttribute)
{
ArrayList distinctValues = new ArrayList(samples.Rows.Count);
foreach (DataRow row in samples.Rows)
{
if (distinctValues.IndexOf(row[targetAttribute]) == -1)
distinctValues.Add(row[targetAttribute]);
}
return distinctValues;
}
/// Return the most common value within a sampling
private object getMostCommonValue(DataTable samples, string targetAttribute)
{
ArrayList distinctValues = getDistinctValues(samples, targetAttribute);
int[] count = new int[distinctValues.Count];
// An empty sample has no most common value; indexing distinctValues[0] here
// would throw anyway, so fail explicitly and let the caller's catch handle it
if (distinctValues.Count == 0)
    throw new InvalidOperationException("Cannot compute the most common value of an empty sample");
foreach (DataRow row in samples.Rows)
{
int index = distinctValues.IndexOf(row[targetAttribute]);
count[index]++;
}
int MaxIndex = 0;
int MaxCount = 0;
for (int i = 0; i < count.Length; i++)
{
if (count[i] > MaxCount)
{
MaxCount = count[i];
MaxIndex = i;
}
}
return distinctValues[MaxIndex];
}
/// Assemble a decision tree based on samples submitted
private TreeNode internalMountTree(DataTable samples, string targetAttribute, Attribute[] attributes)
{
if (allSamplesPositives(samples, targetAttribute) == true)
{
TreeNode tmpNode = new TreeNode(new Attribute(true));
tmpNode.totalData = samples.Rows.Count;
return tmpNode;
}
if (allSamplesNegatives(samples, targetAttribute) == true)
{
TreeNode tmpNode = new TreeNode(new Attribute(false));
tmpNode.totalData = samples.Rows.Count;
return tmpNode;
//return new TreeNode(new Attribute(false));
}
if (attributes.Length == 0)
return new TreeNode(new Attribute(getMostCommonValue(samples, targetAttribute)));
mTotal = samples.Rows.Count;
mTargetAttribute = targetAttribute;
mTotalPositives = countTotalPositives(samples);
mEntropySet = calcEntropy(mTotalPositives, mTotal - mTotalPositives);
Attribute bestAttribute = getBestAttribute(samples, attributes);
TreeNode root = new TreeNode(bestAttribute);
root.totalData = mTotal;
DataTable aSample = samples.Clone();
try
{
foreach (string value in bestAttribute.values)
{
// Selects all elements with the value of this attribute
aSample.Rows.Clear();
DataRow[] rows = samples.Select(bestAttribute.AttributeName + " = " + "'" + value + "'");
foreach (DataRow row in rows)
{
aSample.Rows.Add(row.ItemArray);
}
// Selects all elements with the value of this attribute
// Create a new list of attributes unless the attribute that is the current best attribute
ArrayList aAttributes = new ArrayList(attributes.Length - 1);
for (int i = 0; i < attributes.Length; i++)
{
if (attributes[i].AttributeName != bestAttribute.AttributeName)
aAttributes.Add(attributes[i]);
}
// Create a new list of attributes unless the attribute that is the current best attribute
if (aSample.Rows.Count == 0)
{
try
{
TreeNode retNode = new TreeNode(new Attribute(getMostCommonValue(aSample, targetAttribute)));
root.AddTreeNode(retNode, value);
}
catch
{
root.AddTreeNode(new TreeNode(new Attribute("")), value);
}
// return retNode;
}
else
{
DecisionTreeID3 dc3 = new DecisionTreeID3();
TreeNode ChildNode = dc3.mountTree(aSample, targetAttribute, (Attribute[])aAttributes.ToArray(typeof(Attribute)));
root.AddTreeNode(ChildNode, value);
if (root.totalData > 0)
{
ChildNode.percentage = (Convert.ToDouble(ChildNode.totalData) / Convert.ToDouble(root.totalData)) * 100;
}
}
}
}
catch (Exception)
{
    // swallow errors from malformed attribute values; the partial tree is returned
}
return root;
}
/// Assemble a decision tree based on samples submitted
public TreeNode mountTree(DataTable samples, string targetAttribute, Attribute[] attributes)
{
mSamples = samples;
return internalMountTree(mSamples, targetAttribute, attributes);
}
}
/// Class which exemplifies the use of ID3
public class ID3Sample
{
public static string _temptext = "";
public static double Treescore = 1;
public string temptext
{
get
{
return _temptext;
}
set
{
_temptext = value;
}
}
/// Print Node and Branch tree
public static void printNode(TreeNode root, int tabs, bool showValue)
{
//Check if it is root or leaf line, if root then try branch and sub-branch next
//Write Root line
if (tabs == 0)
{
_temptext = "";
if (showValue)
{
_temptext += "##" + root.attribute + "## [" + root.totalData + "] (" + root.percentage.ToString("#0.00") + "%)<br>";
}
else
{
_temptext += "##" + root.attribute + "##<br>";
}
}
// Write Leaf line
else
{
if (showValue)
{
_temptext += " --> " + root.attribute + " [" + root.totalData + "] (" + root.percentage.ToString("#0.00") + "%)<br>";
}
else
{
_temptext += " --> " + root.attribute + "<br>";
}
}
try
// Write Branch line
{
if (root.attribute.values != null)
{
tabs += 1;
for (int i = 0; i < root.attribute.values.Length; i++)
{
if (tabs == 1)
{
_temptext += "<br>" + "..++" + root.attribute.values[i];
}
else if (tabs == 2)
{
_temptext += "<br>" + "..........++" + root.attribute.values[i];
}
else if (tabs == 3)
{
_temptext += "<br>" + ".........................++" + root.attribute.values[i];
}
else if (tabs == 4)
{
_temptext += "<br>" + "............................................++" + root.attribute.values[i];
}
else
{
_temptext += "<br>" + "...............................................................++" + root.attribute.values[i];
}
TreeNode childNode = root.getChildByBranchName(root.attribute.values[i]);
printNode(childNode, tabs, showValue);
}
}
else
{ _temptext += "<br>"; }
}
catch (Exception)
{
_temptext += "<br>";
}
}
|
Dear Roosevelt, thank you for your great contribution. I have implemented your decision tree code in my project and found that there was a problem if one of the root attributes contains no data (total == 0), and the tree result differs slightly from WEKA's result, as shown below.
private double calcEntropy(int positives, int negatives)
{
int total = positives + negatives;
double ratioPositive = (double)positives / total;
double ratioNegative = (double)negatives / total;
if (ratioPositive != 0)
ratioPositive = -(ratioPositive) * System.Math.Log(ratioPositive, 2);
if (ratioNegative != 0)
ratioNegative = -(ratioNegative) * System.Math.Log(ratioNegative, 2);
double result = ratioPositive + ratioNegative;
return result;
}
/// Returns the best attribute
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Below is my modification
/// Calculates the entropy given the formula:
/// -p+ * log2(p+) - p- * log2(p-)
private double calcEntropy(int positives, int negatives)
{
int total = positives + negatives;
if (total == 0)
{
return 0;
}
double ratioPositive = (double)positives / total;
double ratioNegative = (double)negatives / total;
if (ratioPositive != 0)
ratioPositive = -(ratioPositive) * System.Math.Log(ratioPositive, 2);
if (ratioNegative != 0)
ratioNegative = -(ratioNegative) * System.Math.Log(ratioNegative, 2);
double result = ratioPositive + ratioNegative;
return result;
}
/// Returns the best attribute
/// Assemble a decision tree based on samples submitted
private TreeNode internalMountTree(DataTable samples, string targetAttribute, Attribute[] attributes)
{
if (allSamplesPositives(samples, targetAttribute) == true)
{
TreeNode tmpNode = new TreeNode(new Attribute(true));
tmpNode.totalData = samples.Rows.Count;
return tmpNode;
}
if (allSamplesNegatives(samples, targetAttribute) == true)
{
TreeNode tmpNode = new TreeNode(new Attribute(false));
tmpNode.totalData = samples.Rows.Count;
return tmpNode;
//return new TreeNode(new Attribute(false));
}
if (attributes.Length == 0)
return new TreeNode(new Attribute(getMostCommonValue(samples, targetAttribute)));
mTotal = samples.Rows.Count;
mTargetAttribute = targetAttribute;
mTotalPositives = countTotalPositives(samples);
mEntropySet = calcEntropy(mTotalPositives, mTotal - mTotalPositives);
Attribute bestAttribute = getBestAttribute(samples, attributes);
TreeNode root = new TreeNode(bestAttribute);
root.totalData = mTotal;
DataTable aSample = samples.Clone();
try
{
foreach (string value in bestAttribute.values)
{
// Selects all elements with the value of this attribute
aSample.Rows.Clear();
DataRow[] rows = samples.Select(bestAttribute.AttributeName + " = " + "'" + value + "'");
foreach (DataRow row in rows)
{
aSample.Rows.Add(row.ItemArray);
}
// Selects all elements with the value of this attribute
// Create a new list of attributes unless the attribute that is the current best attribute
ArrayList aAttributes = new ArrayList(attributes.Length - 1);
for (int i = 0; i < attributes.Length; i++)
{
if (attributes[i].AttributeName != bestAttribute.AttributeName)
aAttributes.Add(attributes[i]);
}
// Create a new list of attributes unless the attribute that is the current best attribute
if (aSample.Rows.Count == 0)
{
try
{
TreeNode retNode = new TreeNode(new Attribute(getMostCommonValue(aSample, targetAttribute)));
root.AddTreeNode(retNode, value);
}
catch
{
root.AddTreeNode(new TreeNode(new Attribute("")), value);
}
// return retNode;
}
else
{
DecisionTreeID3 dc3 = new DecisionTreeID3();
TreeNode ChildNode = dc3.mountTree(aSample, targetAttribute, (Attribute[])aAttributes.ToArray(typeof(Attribute)));
root.AddTreeNode(ChildNode, value);
if (root.totalData > 0)
{
ChildNode.percentage = (Convert.ToDouble(ChildNode.totalData) / Convert.ToDouble(root.totalData)) * 100;
}
}
}
}
catch (Exception)
{
}
return root;
}
/// Assemble a decision tree based on samples submitted
modified on Monday, September 27, 2010 5:05 AM
|
Thanks for the code.
But when I use the AdventureWorks DW to predict buyers, comparing with SSAS I get a different result. Can you explain it?
Le Quyen
|
Hello, there is an error in
if (aSample.Rows.Count == 0)
{
    return (new TreeNode(new Attribute(getMostCommonValue(ASamples, targetAttribute))), value);
}
It should be:
if (aSample.Rows.Count == 0)
{
    root.AddTreeNode(new TreeNode(new Attribute(getMostCommonValue(samples, targetAttribute))), value);
}
because if the count is zero, the program will produce an error.
--
Kevin Tanadi
--
|
Can you give me some code for B-trees in C++?
|