java.lang.Object
  weka.classifiers.Classifier
    weka.classifiers.lazy.IB1_Anon
public class IB1_Anon
An instance-based (1-nearest-neighbor) classification method for anonymized data, as described in the following paper:
@inproceedings{DBLP:conf/icde/InanKB09,
author = {Ali Inan and
Murat Kantarcioglu and
Elisa Bertino},
title = {Using Anonymized Data for Classification},
booktitle = {ICDE},
year = {2009},
pages = {429-440}
}
All quasi-identifier information is read from the configuration file
that was used to anonymize the original dataset. Please make sure that
any parameters passed as program arguments during anonymization are
reflected in the configuration file (especially the outputFormat field).
We assume that the configuration file does not contain any specifications
for identifier attributes. Otherwise, the configuration file should be edited
so that all references to identifier attributes are removed and the indices of
quasi-identifier attributes are reset according to the new schema. For example, if
a QID attribute was at index 4 and an identifier attribute was at index 2, the updated
index of the QID attribute should become 3 to reflect the removal of the identifier attribute.
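The index adjustment described above shifts each QID index down by the number of removed identifier attributes that preceded it. The following minimal sketch shows that bookkeeping. The class `QidIndexRemapper` is hypothetical and is not part of IB1_Anon or the anonymization toolbox. It also assumes that QID and identifier indices use the same numbering base.

```java
/**
 * HYPOTHETICAL helper, not part of IB1_Anon: recomputes quasi-identifier
 * indices after identifier attributes have been removed from the schema.
 */
final class QidIndexRemapper {

    private QidIndexRemapper() {}

    /**
     * @param qidIndices   original QID attribute indices
     * @param removedIdIdx indices of the removed identifier attributes
     *                     (assumed to use the same numbering base)
     * @return the QID indices in the reduced schema
     */
    static int[] remap(int[] qidIndices, int[] removedIdIdx) {
        int[] result = new int[qidIndices.length];
        for (int q = 0; q < qidIndices.length; q++) {
            int shift = 0;
            for (int id : removedIdIdx) {
                if (id == qidIndices[q]) {
                    throw new IllegalArgumentException(
                        "Attribute " + id + " cannot be both an identifier and a QID");
                }
                if (id < qidIndices[q]) {
                    shift++; // one fewer attribute now precedes this QID
                }
            }
            result[q] = qidIndices[q] - shift;
        }
        return result;
    }
}
```

With the example from the text, `remap(new int[]{4}, new int[]{2})` returns `{3}`.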
Generalized attributes are represented as attributes of type String within the
WEKA framework. This representation allows complex generalizations to be handled. For
example, if QI-Statistics are provided instead of generalizations, we can compare the
two distributions. On the other hand, if generalized values are provided, we can use
the mid-point for numeric attributes and treat generalizations as new categories. Another
major advantage is that the user does not have to deal with the tedious work of listing
all possible generalizations in the ARFF file header (i.e., simply declare
"@ATTRIBUTE name string" for a quasi-identifier attribute named "name").
When comparing two generalized values, a dedicated comparison function, gen_distance, is
called. This function assumes that both generalized values were produced by the same output
format of the anonymization toolbox. If this assumption fails, gen_distance prints an
error message and exits with an error code. Therefore, it is important that all input records
are output in the same format.
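The mid-point treatment of generalized numeric values, mentioned above, is the easiest of these strategies to illustrate. The sketch below parses a generalized value and compares two values by the gap between their mid-points.

The following parts are assumptions made for illustration:
- the value syntax (`[lo-hi]`, `lo-hi`, or a plain number);
- the class name `GeneralizedNumeric`.

The toolbox's actual genVals syntax, and whatever checks gen_distance itself performs, may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * HYPOTHETICAL sketch: mid-point handling of generalized numeric values.
 * Assumed syntax: "[lo-hi]", "lo-hi", or a plain (ungeneralized) number.
 */
final class GeneralizedNumeric {
    private static final Pattern RANGE = Pattern.compile(
        "^\\[?\\s*(-?\\d+(?:\\.\\d+)?)\\s*-\\s*(-?\\d+(?:\\.\\d+)?)\\s*\\]?$");
    private static final Pattern NUMBER = Pattern.compile("^-?\\d+(?:\\.\\d+)?$");

    private GeneralizedNumeric() {}

    /** Mid-point of a range, or the number itself for an ungeneralized value. */
    static double midpoint(String value) {
        String v = value.trim();
        if (NUMBER.matcher(v).matches()) {
            return Double.parseDouble(v);
        }
        Matcher m = RANGE.matcher(v);
        if (!m.matches()) {
            // Non-numeric generalizations would be treated as categories instead.
            throw new IllegalArgumentException("Not a numeric generalization: " + value);
        }
        double lo = Double.parseDouble(m.group(1));
        double hi = Double.parseDouble(m.group(2));
        return (lo + hi) / 2.0;
    }

    /** Absolute difference of mid-points (not normalized by the attribute range). */
    static double distance(String val1, String val2) {
        return Math.abs(midpoint(val1) - midpoint(val2));
    }
}
```

For example, `"[20-30]"` maps to 25.0 and `"[40-50]"` to 45.0, so their distance is 20.0.

A nearest-neighbor classifier would normally also scale this difference by the attribute's observed range before summing it with the other attributes.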
Among the output formats supported by the anonymization toolbox, this classifier
can handle only genVals and genValsDist. For the time being, we do not plan to add support
for the anatomy format because of the costly join operation it involves.
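A classifier with this restriction might validate the outputFormat field read from the configuration file before training. The sketch below rejects every format except the two supported ones. Two things in it are assumptions:
- the exact spelling of format names in the configuration file;
- the class `SupportedFormats`.

```java
import java.util.Set;

/** HYPOTHETICAL guard: accept only the output formats this classifier handles. */
final class SupportedFormats {
    // Names as written in the class description; the configuration file's
    // spelling of the outputFormat value is an ASSUMPTION.
    private static final Set<String> SUPPORTED = Set.of("genVals", "genValsDist");

    private SupportedFormats() {}

    static void checkOutputFormat(String outputFormat) {
        if (!SUPPORTED.contains(outputFormat)) {
            throw new IllegalArgumentException(
                "Unsupported output format '" + outputFormat
                + "'; only genVals and genValsDist are handled");
        }
    }
}
```

Failing early with an exception surfaces the problem at configuration time, rather than during the first distance computation.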
| Field Summary | | |
|---|---|---|
| private java.lang.String | configFile | Path to the configuration file. |
| private double[] | m_MaxArray | The maximum values for numeric attributes. |
| private double[] | m_MinArray | The minimum values for numeric attributes. |
| private weka.core.Instances | m_Train | The training instances used for classification. |
| private QIDAttribute[] | qids | Quasi-identifiers for generalized attributes. |
| (package private) static long | serialVersionUID | For serialization. |
| Fields inherited from class weka.classifiers.Classifier |
|---|
| m_Debug |
| Constructor Summary | |
|---|---|
| IB1_Anon() | Class constructor. |
| IB1_Anon(Configuration conf) | Class constructor. |
| Method Summary | | |
|---|---|---|
| void | buildClassifier(weka.core.Instances instances) | Generates the classifier. |
| double | classifyInstance(weka.core.Instance instance) | Classifies the given test instance. |
| java.lang.String | configFileTipText() | Returns the tip text for this property. |
| private double | distance(weka.core.Instance first, weka.core.Instance second) | Calculates the distance between two instances. |
| private double | gen_distance(java.lang.String val1, java.lang.String val2, QIDAttribute qid) | Calculates the generalized distance between any two values. |
| weka.core.Capabilities | getCapabilities() | Returns the default capabilities of the classifier. |
| java.lang.String | getConfigFile() | Gets the anonymization configuration file path. |
| java.lang.String[] | getOptions() | Gets the current settings of IB1_Anon. |
| weka.core.TechnicalInformation | getTechnicalInformation() | Returns bibliographic information about the paper this classifier is based on. |
| java.lang.String | globalInfo() | Returns a string describing this classifier. |
| java.util.Enumeration | listOptions() | Returns an enumeration describing the available options. |
| static void | main(java.lang.String[] argv) | Main method for testing this class. |
| private double | norm(double x, int i) | Normalizes a given value of a numeric attribute. |
| private void | setConfigFile(java.lang.String config) | Sets the anonymization configuration file path. |
| void | setOptions(java.lang.String[] options) | -config <file>: path to the anonymization configuration file (default config.xml). |
| java.lang.String | toString() | Returns a description of this classifier. |
| void | updateClassifier(weka.core.Instance instance) | Updates the classifier. |
| private void | updateMinMax(weka.core.Instance instance) | Updates the minimum and maximum values for all the attributes based on a new instance. |
| Methods inherited from class weka.classifiers.Classifier |
|---|
| debugTipText, distributionForInstance, forName, getDebug, makeCopies, makeCopy, runClassifier, setDebug |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
private java.lang.String configFile
static final long serialVersionUID
private weka.core.Instances m_Train
private double[] m_MinArray
private double[] m_MaxArray
private QIDAttribute[] qids
| Constructor Detail |
|---|
public IB1_Anon()
Class constructor.

public IB1_Anon(Configuration conf)
Class constructor.
Parameters: conf - anonymization configuration

| Method Detail |
|---|
public java.lang.String globalInfo()

public java.util.Enumeration listOptions()
Specified by: listOptions in interface weka.core.OptionHandler
Overrides: listOptions in class weka.classifiers.Classifier

public java.lang.String configFileTipText()

public java.lang.String getConfigFile()

public java.lang.String[] getOptions()
Specified by: getOptions in interface weka.core.OptionHandler
Overrides: getOptions in class weka.classifiers.Classifier

public void setOptions(java.lang.String[] options)
throws java.lang.Exception
-config <file>: path to the anonymization configuration file (default config.xml).
Specified by: setOptions in interface weka.core.OptionHandler
Overrides: setOptions in class weka.classifiers.Classifier
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception
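A WEKA classifier would normally read the -config option with `weka.core.Utils.getOption`. That method returns the flag's value, or an empty string when the flag is absent, and blanks out the consumed entries.

The standalone sketch below mimics that behavior and applies the documented default of config.xml. `ConfigOption` is a hypothetical name, not part of IB1_Anon.

```java
/** HYPOTHETICAL sketch of parsing "-config <file>" with a config.xml default. */
final class ConfigOption {
    static final String DEFAULT_CONFIG = "config.xml";

    private ConfigOption() {}

    /**
     * Returns the value following "-config", or config.xml if the flag is
     * absent. The consumed flag and value are blanked out, as WEKA's
     * Utils.getOption does, so leftover options can be detected later.
     */
    static String parseConfig(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if ("-config".equals(options[i])) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("-config requires a file path");
                }
                String path = options[i + 1];
                options[i] = "";
                options[i + 1] = "";
                return path;
            }
        }
        return DEFAULT_CONFIG;
    }
}
```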

private void setConfigFile(java.lang.String config)
throws java.lang.Exception
Throws: java.lang.Exception

public weka.core.TechnicalInformation getTechnicalInformation()
Specified by: getTechnicalInformation in interface weka.core.TechnicalInformationHandler

public weka.core.Capabilities getCapabilities()
Specified by: getCapabilities in interface weka.core.CapabilitiesHandler
Overrides: getCapabilities in class weka.classifiers.Classifier

public void buildClassifier(weka.core.Instances instances)
throws java.lang.Exception
Specified by: buildClassifier in class weka.classifiers.Classifier
Parameters: instances - set of instances serving as training data
Throws: java.lang.Exception - if the classifier has not been generated successfully

public void updateClassifier(weka.core.Instance instance)
throws java.lang.Exception
Specified by: updateClassifier in interface weka.classifiers.UpdateableClassifier
Parameters: instance - the instance to be put into the classifier
Throws: java.lang.Exception - if the instance could not be included successfully
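In WEKA's own IB1, updating the classifier means two things:
1. storing the new instance;
2. widening the per-attribute minimum and maximum used for normalization.

The fields m_Train, m_MinArray and m_MaxArray suggest that IB1_Anon does the same, but that is an assumption.

The following WEKA-free sketch shows this bookkeeping with plain `double[]` rows, where NaN marks a missing value. `IncrementalStore` is a hypothetical name, and generalized String attributes are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * HYPOTHETICAL, simplified sketch of incremental training for a 1-NN
 * classifier. All attributes are treated as numeric; NaN = missing.
 */
final class IncrementalStore {
    final List<double[]> train = new ArrayList<>();
    final double[] min;
    final double[] max;

    IncrementalStore(int numAttributes) {
        min = new double[numAttributes];
        max = new double[numAttributes];
        Arrays.fill(min, Double.NaN); // NaN = no value seen yet
        Arrays.fill(max, Double.NaN);
    }

    /** Counterpart of updateClassifier: store the instance, then update ranges. */
    void update(double[] instance) {
        if (instance.length != min.length) {
            throw new IllegalArgumentException("Attribute count mismatch");
        }
        train.add(instance.clone());
        updateMinMax(instance);
    }

    /** Counterpart of updateMinMax: widen [min, max] for each non-missing value. */
    private void updateMinMax(double[] instance) {
        for (int j = 0; j < instance.length; j++) {
            double v = instance[j];
            if (Double.isNaN(v)) {
                continue; // missing value: ranges unchanged
            }
            if (Double.isNaN(min[j])) {
                min[j] = v;
                max[j] = v;
            } else if (v < min[j]) {
                min[j] = v;
            } else if (v > max[j]) {
                max[j] = v;
            }
        }
    }
}
```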

public double classifyInstance(weka.core.Instance instance)
throws java.lang.Exception
Overrides: classifyInstance in class weka.classifiers.Classifier
Parameters: instance - the instance to be classified
Throws: java.lang.Exception - if the instance can't be classified

public java.lang.String toString()
Overrides: toString in class java.lang.Object
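As an IB1-style classifier, classifyInstance presumably returns the class of the single nearest training instance. The minimal sketch below performs that search over plain arrays, with the distance passed in as a function. `NearestNeighbor` and its signature are hypothetical and not part of IB1_Anon.

```java
import java.util.function.ToDoubleBiFunction;

/** HYPOTHETICAL sketch of 1-nearest-neighbor classification. */
final class NearestNeighbor {

    private NearestNeighbor() {}

    /**
     * Returns the class value of the training row closest to the query.
     * The strict '<' keeps the earliest of several equally close rows.
     */
    static double classify(double[][] train, double[] classValues, double[] query,
                           ToDoubleBiFunction<double[], double[]> distance) {
        if (train.length == 0) {
            throw new IllegalStateException("No training instances");
        }
        double bestDist = Double.POSITIVE_INFINITY;
        double bestClass = classValues[0];
        for (int k = 0; k < train.length; k++) {
            double d = distance.applyAsDouble(train[k], query);
            if (d < bestDist) {
                bestDist = d;
                bestClass = classValues[k];
            }
        }
        return bestClass;
    }
}
```

Passing the distance as a parameter keeps the search independent of how generalized and ordinary attributes are compared.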

private double distance(weka.core.Instance first,
weka.core.Instance second)
throws java.lang.Exception
Parameters:
first - the first instance
second - the second instance
Throws: java.lang.Exception
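For ordinary (non-generalized) attributes, WEKA's IB1 sums squared per-attribute differences. Numeric values are first normalized by their observed range, and nominal values contribute 0 when equal and 1 otherwise. Missing values are treated pessimistically.

The sketch below is modeled on that scheme, under the assumption that IB1_Anon behaves the same way for attributes that are not generalized. The dispatch of String-typed QID attributes to gen_distance is omitted. `Ib1Distance` is a hypothetical name. Unlike WEKA, which compares min and max with a small tolerance, this sketch uses exact equality.

```java
/**
 * HYPOTHETICAL sketch of an IB1-style distance over numeric and nominal
 * attributes. Missing values are NaN; nominal values are category codes.
 */
final class Ib1Distance {
    private final double[] min;
    private final double[] max;
    private final boolean[] nominal;
    private final int classIndex;

    Ib1Distance(double[] min, double[] max, boolean[] nominal, int classIndex) {
        this.min = min;
        this.max = max;
        this.nominal = nominal;
        this.classIndex = classIndex;
    }

    /** Scales x into [0, 1] using the observed range of attribute i. */
    double norm(double x, int i) {
        if (Double.isNaN(min[i]) || max[i] == min[i]) {
            return 0;
        }
        return (x - min[i]) / (max[i] - min[i]);
    }

    /** Sum of squared per-attribute differences (no square root is taken). */
    double distance(double[] first, double[] second) {
        double total = 0;
        for (int i = 0; i < first.length; i++) {
            if (i == classIndex) {
                continue; // never compare class labels
            }
            boolean m1 = Double.isNaN(first[i]);
            boolean m2 = Double.isNaN(second[i]);
            if (nominal[i]) {
                // 0 if the categories match, 1 if they differ or either is missing
                if (m1 || m2 || (int) first[i] != (int) second[i]) {
                    total += 1;
                }
                continue;
            }
            double diff;
            if (m1 && m2) {
                diff = 1;
            } else if (m1 || m2) {
                // one value missing: assume the larger possible gap to the known value
                diff = norm(m1 ? second[i] : first[i], i);
                if (diff < 0.5) {
                    diff = 1.0 - diff;
                }
            } else {
                diff = norm(first[i], i) - norm(second[i], i);
            }
            total += diff * diff;
        }
        return total;
    }
}
```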

private double gen_distance(java.lang.String val1,
java.lang.String val2,
QIDAttribute qid)
throws java.lang.Exception
Parameters:
val1 - String representation of the first generalized value
val2 - String representation of the second generalized value
qid - Quasi-identifier attribute
Throws: java.lang.Exception
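The class description notes that, when QI-Statistics are provided instead of generalizations, gen_distance can compare two distributions. The measure used by IB1_Anon and the paper is not reproduced here.

The sketch below substitutes total variation distance as one common choice. It assumes each distribution is available as a value-to-probability map. Both the representation and the name `DistributionDistance` are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** HYPOTHETICAL sketch: comparing two QI-statistics distributions. */
final class DistributionDistance {

    private DistributionDistance() {}

    /**
     * Total variation distance, 0.5 * sum over v of |p(v) - q(v)|. Values
     * absent from a map have probability 0. For valid probability
     * distributions the result lies in [0, 1].
     */
    static double totalVariation(Map<String, Double> p, Map<String, Double> q) {
        Set<String> support = new HashSet<>(p.keySet());
        support.addAll(q.keySet());
        double sum = 0;
        for (String v : support) {
            sum += Math.abs(p.getOrDefault(v, 0.0) - q.getOrDefault(v, 0.0));
        }
        return 0.5 * sum;
    }
}
```

Identical distributions score 0, and distributions with disjoint supports score 1. This matches the 0/1 scale of nominal mismatches in an IB1-style distance.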

private double norm(double x,
int i)
Parameters:
x - the value to be normalized
i - the attribute's index
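Assuming norm follows the convention of WEKA's IB1, which the fields m_MinArray and m_MaxArray suggest, the normalized value of x for numeric attribute i would be as follows. Here $\mathit{min}_i$ and $\mathit{max}_i$ stand for m_MinArray[i] and m_MaxArray[i].

```latex
\mathrm{norm}(x, i) =
\begin{cases}
  0, & \text{if no value of attribute } i \text{ has been seen, or } \mathit{max}_i = \mathit{min}_i,\\[4pt]
  \dfrac{x - \mathit{min}_i}{\mathit{max}_i - \mathit{min}_i}, & \text{otherwise.}
\end{cases}
```

For example, with $\mathit{min}_i = 20$, $\mathit{max}_i = 60$ and $x = 30$, the result is $10/40 = 0.25$.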

private void updateMinMax(weka.core.Instance instance)
Parameters: instance - the new instance

public static void main(java.lang.String[] argv)
Parameters: argv - should contain command line arguments for evaluation
(see Evaluation).