weka.classifiers.lazy
Class IB1_Anon

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.lazy.IB1_Anon
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, weka.classifiers.UpdateableClassifier, weka.core.CapabilitiesHandler, weka.core.OptionHandler, weka.core.TechnicalInformationHandler

public class IB1_Anon
extends weka.classifiers.Classifier
implements weka.classifiers.UpdateableClassifier, weka.core.TechnicalInformationHandler

Instance-based classification method for anonymized data as described in the following paper:

 @inproceedings{DBLP:conf/icde/InanKB09,
 author    = {Ali Inan and
             Murat Kantarcioglu and
             Elisa Bertino},
 title     = {Using Anonymized Data for Classification},
 booktitle = {ICDE},
 year      = {2009},
 pages     = {429-440}
 }
 
All quasi-identifier information is read from the configuration file that was used in anonymizing the original dataset. Please make sure that any parameters passed as program arguments during anonymization are reflected in the configuration file (especially the outputFormat field).

We assume that the configuration file does not contain any specifications for identifier attributes. Otherwise, edit the configuration file so that all references to id attributes are removed and the indices of quasi-identifier attributes are reset according to the new schema (e.g., if a QID attribute was at index 4 and an id attribute was at index 2, the updated index of the QID attribute becomes 3 to reflect the removal of the id attribute).
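The index adjustment described above can be sketched as follows. This is only an illustration: QidIndexShift and shiftedIndex are hypothetical names and are not part of IB1_Anon or the anonymization toolbox. The sketch assumes the indices of the removed id attributes are known.

```java
// Hypothetical helper for illustration; not part of IB1_Anon or the toolbox.
final class QidIndexShift {

    /**
     * Returns the index an attribute has after the identifier attributes
     * listed in removedIdIndices have been dropped from the schema.
     */
    static int shiftedIndex(int originalIndex, int[] removedIdIndices) {
        int shift = 0;
        for (int idIndex : removedIdIndices) {
            if (idIndex == originalIndex) {
                throw new IllegalArgumentException(
                        "index " + originalIndex + " belongs to a removed id attribute");
            }
            // Every id attribute that preceded this one moves it one slot left.
            if (idIndex < originalIndex) {
                shift++;
            }
        }
        return originalIndex - shift;
    }

    public static void main(String[] args) {
        // QID at index 4, id attribute at index 2: the QID moves to index 3.
        System.out.println(shiftedIndex(4, new int[] {2}));
    }
}
```

Attributes located before every removed id attribute keep their original index.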

Generalized attributes are represented as attributes of type String within the WEKA framework. This representation allows complex generalizations to be handled. For example, if QI-Statistics are provided instead of generalizations, we can compare the two distributions. If generalized values are provided instead, we can use the mid-point for numeric attributes and treat generalizations as new categories. Another major advantage is that the user does not have to list all possible generalizations in the ARFF file header (i.e., simply input "@ATTRIBUTE name string" for a quasi-identifier attribute with the name "name").
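As a rough illustration of the mid-point idea for numeric attributes, the sketch below parses a generalized value stored as a string. The interval syntax "[low:high]" is an assumption made for this example only; the actual string format produced by the anonymization toolbox is not specified here. GeneralizedMidpoint and midpoint are hypothetical names.

```java
// Hypothetical helper for illustration; the "[low:high]" syntax is assumed.
final class GeneralizedMidpoint {

    /**
     * Returns the mid-point of an interval written as "[low:high]",
     * or the value itself when the string is a plain number.
     */
    static double midpoint(String generalized) {
        String s = generalized.trim();
        if (s.startsWith("[") && s.endsWith("]")) {
            String[] bounds = s.substring(1, s.length() - 1).split(":");
            if (bounds.length != 2) {
                throw new IllegalArgumentException("not an interval: " + generalized);
            }
            double low = Double.parseDouble(bounds[0].trim());
            double high = Double.parseDouble(bounds[1].trim());
            return (low + high) / 2.0;
        }
        // Ungeneralized value: use it directly.
        return Double.parseDouble(s);
    }

    public static void main(String[] args) {
        System.out.println(midpoint("[20:30]")); // prints 25.0
    }
}
```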

When comparing two generalized values, a distinct comparison function gen_distance is called. This function assumes that both generalized values are obtained from the same output format of the anonymization toolbox. If this assumption fails, gen_distance will print an error message and exit with an error code. Therefore, it is important that all input records are produced with the same output format.

Among the different output formats supported by the anonymization toolbox, this classifier can only handle genVals and genValsDist. For the time being, we do not plan to add support for the anatomy format due to the costly join operation involved.

See Also:
Serialized Form

Field Summary
private  java.lang.String configFile
          path to the configuration file
private  double[] m_MaxArray
          The maximum values for numeric attributes.
private  double[] m_MinArray
          The minimum values for numeric attributes.
private  weka.core.Instances m_Train
          The training instances used for classification.
private  QIDAttribute[] qids
          QuasiIdentifiers for generalized attributes.
(package private) static long serialVersionUID
          for serialization
 
Fields inherited from class weka.classifiers.Classifier
m_Debug
 
Constructor Summary
IB1_Anon()
          Class constructor.
IB1_Anon(Configuration conf)
          Class constructor.
 
Method Summary
 void buildClassifier(weka.core.Instances instances)
          Generates the classifier.
 double classifyInstance(weka.core.Instance instance)
          Classifies the given test instance.
 java.lang.String configFileTipText()
          Returns the tip text for this property
private  double distance(weka.core.Instance first, weka.core.Instance second)
          Calculates the distance between two instances
private  double gen_distance(java.lang.String val1, java.lang.String val2, QIDAttribute qid)
          Method of calculating the generalized distance between any two values.
 weka.core.Capabilities getCapabilities()
          Returns default capabilities of the classifier.
 java.lang.String getConfigFile()
          Gets the path to the anonymization configuration file.
 java.lang.String[] getOptions()
          Gets the current settings of IB1_Anon
 weka.core.TechnicalInformation getTechnicalInformation()
           
 java.lang.String globalInfo()
          Returns a string describing the classifier.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
private  double norm(double x, int i)
          Normalizes a given value of a numeric attribute.
private  void setConfigFile(java.lang.String config)
           
 void setOptions(java.lang.String[] options)
           -config <num> path to the anonymization configuration file (default config.xml).
 java.lang.String toString()
          Returns a description of this classifier.
 void updateClassifier(weka.core.Instance instance)
          Updates the classifier.
private  void updateMinMax(weka.core.Instance instance)
          Updates the minimum and maximum values for all the attributes based on a new instance.
 
Methods inherited from class weka.classifiers.Classifier
debugTipText, distributionForInstance, forName, getDebug, makeCopies, makeCopy, runClassifier, setDebug
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

configFile

private java.lang.String configFile
path to the configuration file


serialVersionUID

static final long serialVersionUID
for serialization

See Also:
Constant Field Values

m_Train

private weka.core.Instances m_Train
The training instances used for classification.


m_MinArray

private double[] m_MinArray
The minimum values for numeric attributes.


m_MaxArray

private double[] m_MaxArray
The maximum values for numeric attributes.


qids

private QIDAttribute[] qids
QuasiIdentifiers for generalized attributes.

Constructor Detail

IB1_Anon

public IB1_Anon()
Class constructor. This constructor takes no arguments; the configuration file path is supplied through the -config option (see setOptions).

IB1_Anon

public IB1_Anon(Configuration conf)
Class constructor.

Parameters:
conf - Anonymization configuration
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing the classifier.

Returns:
a description suitable for displaying in the Explorer/Experimenter GUI

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface weka.core.OptionHandler
Overrides:
listOptions in class weka.classifiers.Classifier
Returns:
an enumeration of all the available options.

configFileTipText

public java.lang.String configFileTipText()
Returns the tip text for this property

Returns:
tip text for this property

getConfigFile

public java.lang.String getConfigFile()
Gets the path to the anonymization configuration file.

Returns:
path to the configuration file

getOptions

public java.lang.String[] getOptions()
Gets the current settings of IB1_Anon

Specified by:
getOptions in interface weka.core.OptionHandler
Overrides:
getOptions in class weka.classifiers.Classifier
Returns:
an array of strings suitable for passing to setOptions()

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
 -config <num>
 path to the anonymization configuration file
 (default config.xml).
 

Specified by:
setOptions in interface weka.core.OptionHandler
Overrides:
setOptions in class weka.classifiers.Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception
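The handling of the -config option can be sketched without any WEKA dependency as shown below. ConfigOptionSketch and configPath are hypothetical names, and the lookup logic is an assumption; WEKA classifiers commonly delegate this work to weka.core.Utils.getOption, but the source does not confirm how IB1_Anon parses its options.

```java
// Hypothetical stand-alone sketch; not the parsing code used by IB1_Anon.
final class ConfigOptionSketch {

    /** Default documented for the -config option. */
    static final String DEFAULT_CONFIG = "config.xml";

    /**
     * Returns the value that follows "-config" in the option array,
     * or the documented default when the flag is absent.
     */
    static String configPath(String[] options) {
        for (int i = 0; i < options.length - 1; i++) {
            if ("-config".equals(options[i])) {
                return options[i + 1];
            }
        }
        return DEFAULT_CONFIG;
    }

    public static void main(String[] args) {
        System.out.println(configPath(new String[] {"-config", "anon.xml"})); // prints anon.xml
    }
}
```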

setConfigFile

private void setConfigFile(java.lang.String config)
                    throws java.lang.Exception
Throws:
java.lang.Exception

getTechnicalInformation

public weka.core.TechnicalInformation getTechnicalInformation()
Specified by:
getTechnicalInformation in interface weka.core.TechnicalInformationHandler

getCapabilities

public weka.core.Capabilities getCapabilities()
Returns default capabilities of the classifier.

Specified by:
getCapabilities in interface weka.core.CapabilitiesHandler
Overrides:
getCapabilities in class weka.classifiers.Classifier
Returns:
the capabilities of this classifier

buildClassifier

public void buildClassifier(weka.core.Instances instances)
                     throws java.lang.Exception
Generates the classifier.

Specified by:
buildClassifier in class weka.classifiers.Classifier
Parameters:
instances - set of instances serving as training data
Throws:
java.lang.Exception - if the classifier has not been generated successfully

updateClassifier

public void updateClassifier(weka.core.Instance instance)
                      throws java.lang.Exception
Updates the classifier.

Specified by:
updateClassifier in interface weka.classifiers.UpdateableClassifier
Parameters:
instance - the instance to be put into the classifier
Throws:
java.lang.Exception - if the instance could not be included successfully

classifyInstance

public double classifyInstance(weka.core.Instance instance)
                        throws java.lang.Exception
Classifies the given test instance.

Overrides:
classifyInstance in class weka.classifiers.Classifier
Parameters:
instance - the instance to be classified
Returns:
the predicted class for the instance
Throws:
java.lang.Exception - if the instance can't be classified
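For orientation, the nearest-neighbour rule that instance-based classifiers such as WEKA's IB1 follow can be sketched as below. This is a simplified assumption about the behaviour, not the code of IB1_Anon: it works on plain double arrays, uses squared Euclidean distance, and ignores generalized (String) attributes entirely. NearestNeighborSketch and its methods are hypothetical names.

```java
// Simplified 1-nearest-neighbour sketch on plain arrays; not IB1_Anon's code.
final class NearestNeighborSketch {

    /** Squared Euclidean distance between two equally long vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the label of the training vector closest to the query. */
    static double classify(double[][] train, double[] labels, double[] query) {
        if (train.length == 0) {
            throw new IllegalArgumentException("no training instances");
        }
        double best = Double.POSITIVE_INFINITY;
        double predicted = labels[0];
        for (int i = 0; i < train.length; i++) {
            double d = squaredDistance(train[i], query);
            if (d < best) {
                best = d;
                predicted = labels[i];
            }
        }
        return predicted;
    }

    public static void main(String[] args) {
        double[][] train = {{0.0, 0.0}, {1.0, 1.0}};
        double[] labels = {0.0, 1.0};
        System.out.println(classify(train, labels, new double[] {0.9, 0.8})); // prints 1.0
    }
}
```

The class label is returned as a double, matching the return type of classifyInstance.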

toString

public java.lang.String toString()
Returns a description of this classifier.

Overrides:
toString in class java.lang.Object
Returns:
a description of this classifier as a string.

distance

private double distance(weka.core.Instance first,
                        weka.core.Instance second)
                 throws java.lang.Exception
Calculates the distance between two instances

Parameters:
first - the first instance
second - the second instance
Returns:
the distance between the two given instances
Throws:
java.lang.Exception

gen_distance

private double gen_distance(java.lang.String val1,
                            java.lang.String val2,
                            QIDAttribute qid)
                     throws java.lang.Exception
Method of calculating the generalized distance between any two values.

Parameters:
val1 - String representation of the first generalized value
val2 - String representation of the second generalized value
qid - Quasi-identifier attribute
Returns:
Expected distance between two generalized values
Throws:
java.lang.Exception
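The source states only that an expected distance is returned. One possible reading, sketched below, treats each generalized numeric value as an interval and assumes the original value is uniformly distributed within it; both the uniform assumption and the use of squared distance are assumptions of this sketch, not confirmed behaviour of gen_distance. For independent X uniform on [a, b] and Y uniform on [c, d], E[(X - Y)^2] equals the squared difference of the mid-points plus the two variances (b - a)^2 / 12 and (d - c)^2 / 12. ExpectedDistanceSketch is a hypothetical name.

```java
// Illustrative sketch under a uniform-distribution assumption; not gen_distance itself.
final class ExpectedDistanceSketch {

    /**
     * Expected squared distance between independent values drawn uniformly
     * from [a, b] and [c, d].
     */
    static double expectedSquaredDistance(double a, double b, double c, double d) {
        double meanDiff = (a + b) / 2.0 - (c + d) / 2.0;
        double varX = (b - a) * (b - a) / 12.0; // variance of uniform [a, b]
        double varY = (d - c) * (d - c) / 12.0; // variance of uniform [c, d]
        return meanDiff * meanDiff + varX + varY;
    }

    public static void main(String[] args) {
        // Two point values 0 and 3 reduce to the ordinary squared distance.
        System.out.println(expectedSquaredDistance(0, 0, 3, 3)); // prints 9.0
    }
}
```

Under this reading, two identical wide intervals still have a positive expected distance, because the underlying original values may differ.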

norm

private double norm(double x,
                    int i)
Normalizes a given value of a numeric attribute.

Parameters:
x - the value to be normalized
i - the attribute's index
Returns:
the normalized value

updateMinMax

private void updateMinMax(weka.core.Instance instance)
Updates the minimum and maximum values for all the attributes based on a new instance.

Parameters:
instance - the new instance
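The interplay between updateMinMax and norm can be sketched as a running min-max tracker. The code below is an assumption modelled on conventional min-max normalization, (x - min) / (max - min); it is not taken from IB1_Anon. Returning 0 when no range is known and skipping NaN (missing) values are choices of this sketch. MinMaxSketch is a hypothetical name.

```java
// Illustrative min-max tracker; the names and edge-case handling are assumptions.
final class MinMaxSketch {
    private final double[] min;
    private final double[] max;

    MinMaxSketch(int numAttributes) {
        min = new double[numAttributes];
        max = new double[numAttributes];
        // NaN marks "no value seen yet" for an attribute.
        java.util.Arrays.fill(min, Double.NaN);
        java.util.Arrays.fill(max, Double.NaN);
    }

    /** Widens the per-attribute range to include a new instance. */
    void update(double[] values) {
        for (int i = 0; i < values.length; i++) {
            double v = values[i];
            if (Double.isNaN(v)) {
                continue; // treat NaN as a missing value
            }
            if (Double.isNaN(min[i])) {
                min[i] = v;
                max[i] = v;
            } else {
                min[i] = Math.min(min[i], v);
                max[i] = Math.max(max[i], v);
            }
        }
    }

    /** Maps x into [0, 1] using the range seen so far for attribute i. */
    double norm(double x, int i) {
        if (Double.isNaN(min[i]) || max[i] == min[i]) {
            return 0.0; // no usable range yet
        }
        return (x - min[i]) / (max[i] - min[i]);
    }

    public static void main(String[] args) {
        MinMaxSketch tracker = new MinMaxSketch(1);
        tracker.update(new double[] {2.0});
        tracker.update(new double[] {4.0});
        System.out.println(tracker.norm(3.0, 0)); // prints 0.5
    }
}
```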

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain command line arguments for evaluation (see Evaluation).