java.lang.Object
  weka.classifiers.Classifier
    weka.classifiers.lazy.IB1_Anon
public class IB1_Anon
Instance-based classification method for anonymized data, as described in the following paper:

@inproceedings{DBLP:conf/icde/InanKB09,
  author = {Ali Inan and Murat Kantarcioglu and Elisa Bertino},
  title = {Using Anonymized Data for Classification},
  booktitle = {ICDE},
  year = {2009},
  pages = {429-440}
}

All quasi-identifier information is read from the configuration file that was used in anonymizing the original dataset. Please make sure that any parameters passed as program arguments during anonymization are reflected in the configuration file (especially the outputFormat field).

We assume that the configuration file does not contain any specifications for identifier attributes. Otherwise, the configuration file should be edited so that all references to id attributes are removed and the indices of quasi-identifier attributes are reset according to the new schema (e.g., if the index of a QID attribute was 4 and an id attribute was at index 2, the updated index of the QID attribute becomes 3 to reflect the removal of the id attribute).

Generalized attributes are represented as attributes of type String within the WEKA framework. This representation allows handling of complex generalizations. For example, if QI-Statistics are provided instead of generalizations, we can compare the two distributions. If generalized values are provided, we can use the mid-point for numeric attributes and treat generalizations as new categories. Another major advantage is that the user does not have to list all possible generalizations in the ARFF file header (i.e., simply input "@ATTRIBUTE name string" for a quasi-identifier attribute with the name "name").

When comparing two generalized values, a distinct comparison function, gen_distance, is called. This function assumes that both generalized values are obtained from the same output format of the anonymization toolbox. If this assumption fails, gen_distance prints an error message and exits with an error code. It is therefore important that all input records are output in the same format.

Among the output formats supported by the anonymization toolbox, this classifier can only handle genVals and genValsDist. For the time being, we do not plan to add support for the anatomy format due to the costly join operation involved.
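The index adjustment described above (a QID at index 4 becomes index 3 once an id attribute at index 2 is removed) can be sketched as follows. This is only an illustrative helper: the class name `QidIndexRemap` and the method `remap` are hypothetical and are not part of IB1_Anon or the anonymization toolbox.

```java
// Hypothetical helper, not part of IB1_Anon: computes the index a
// quasi-identifier attribute has after identifier attributes are removed.
final class QidIndexRemap {

    /**
     * @param oldIndex         index of the QID attribute in the original schema
     * @param removedIdIndices indices of the id attributes that were removed
     * @return the index of the QID attribute in the new schema
     */
    static int remap(int oldIndex, int[] removedIdIndices) {
        int shift = 0;
        for (int id : removedIdIndices) {
            if (id == oldIndex) {
                throw new IllegalArgumentException(
                        "index " + oldIndex + " is itself an id attribute");
            }
            // Every removed attribute that preceded the QID shifts it left by one.
            if (id < oldIndex) {
                shift++;
            }
        }
        return oldIndex - shift;
    }

    public static void main(String[] args) {
        // The example from the text: QID at index 4, id attribute at index 2.
        System.out.println(remap(4, new int[] {2})); // prints 3
    }
}
```

Attributes that come before every removed id attribute keep their index, so only the entries after the first removed column need editing in the configuration file.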
Field Summary | |
---|---|
private java.lang.String |
configFile
path to the configuration file |
private double[] |
m_MaxArray
The maximum values for numeric attributes. |
private double[] |
m_MinArray
The minimum values for numeric attributes. |
private weka.core.Instances |
m_Train
The training instances used for classification. |
private QIDAttribute[] |
qids
QuasiIdentifiers for generalized attributes. |
(package private) static long |
serialVersionUID
for serialization |
Fields inherited from class weka.classifiers.Classifier |
---|
m_Debug |
Constructor Summary | |
---|---|
IB1_Anon()
Class constructor. |
|
IB1_Anon(Configuration conf)
Class constructor. |
Method Summary | |
---|---|
void |
buildClassifier(weka.core.Instances instances)
Generates the classifier. |
double |
classifyInstance(weka.core.Instance instance)
Classifies the given test instance. |
java.lang.String |
configFileTipText()
Returns the tip text for this property |
private double |
distance(weka.core.Instance first,
weka.core.Instance second)
Calculates the distance between two instances |
private double |
gen_distance(java.lang.String val1,
java.lang.String val2,
QIDAttribute qid)
Method of calculating the generalized distance between any two values. |
weka.core.Capabilities |
getCapabilities()
Returns default capabilities of the classifier. |
java.lang.String |
getConfigFile()
Get the anonymization configuration file path |
java.lang.String[] |
getOptions()
Gets the current settings of IB1_Anon |
weka.core.TechnicalInformation |
getTechnicalInformation()
|
java.lang.String |
globalInfo()
Returns a string describing the classifier |
java.util.Enumeration |
listOptions()
Returns an enumeration describing the available options. |
static void |
main(java.lang.String[] argv)
Main method for testing this class. |
private double |
norm(double x,
int i)
Normalizes a given value of a numeric attribute. |
private void |
setConfigFile(java.lang.String config)
|
void |
setOptions(java.lang.String[] options)
-config <num> path to the anonymization configuration file (default config.xml). |
java.lang.String |
toString()
Returns a description of this classifier. |
void |
updateClassifier(weka.core.Instance instance)
Updates the classifier. |
private void |
updateMinMax(weka.core.Instance instance)
Updates the minimum and maximum values for all the attributes based on a new instance. |
Methods inherited from class weka.classifiers.Classifier |
---|
debugTipText, distributionForInstance, forName, getDebug, makeCopies, makeCopy, runClassifier, setDebug |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
private java.lang.String configFile
static final long serialVersionUID
private weka.core.Instances m_Train
private double[] m_MinArray
private double[] m_MaxArray
private QIDAttribute[] qids
Constructor Detail |
---|
public IB1_Anon()
confFilename
- name of the configuration file, used for anonymization
public IB1_Anon(Configuration conf)
conf
- Anonymization configuration
Method Detail |
---|
public java.lang.String globalInfo()
public java.util.Enumeration listOptions()
listOptions
in interface weka.core.OptionHandler
listOptions
in class weka.classifiers.Classifier
public java.lang.String configFileTipText()
public java.lang.String getConfigFile()
public java.lang.String[] getOptions()
getOptions
in interface weka.core.OptionHandler
getOptions
in class weka.classifiers.Classifier
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-config <num> path to the anonymization configuration file (default config.xml).
setOptions
in interface weka.core.OptionHandler
setOptions
in class weka.classifiers.Classifier
options
- the list of options as an array of strings
java.lang.Exception
private void setConfigFile(java.lang.String config) throws java.lang.Exception
java.lang.Exception
public weka.core.TechnicalInformation getTechnicalInformation()
getTechnicalInformation
in interface weka.core.TechnicalInformationHandler
public weka.core.Capabilities getCapabilities()
getCapabilities
in interface weka.core.CapabilitiesHandler
getCapabilities
in class weka.classifiers.Classifier
public void buildClassifier(weka.core.Instances instances) throws java.lang.Exception
buildClassifier
in class weka.classifiers.Classifier
instances
- set of instances serving as training data
java.lang.Exception
- if the classifier has not been generated successfully
public void updateClassifier(weka.core.Instance instance) throws java.lang.Exception
updateClassifier
in interface weka.classifiers.UpdateableClassifier
instance
- the instance to be put into the classifier
java.lang.Exception
- if the instance could not be included successfully
public double classifyInstance(weka.core.Instance instance) throws java.lang.Exception
classifyInstance
in class weka.classifiers.Classifier
instance
- the instance to be classified
java.lang.Exception
- if the instance can't be classified
public java.lang.String toString()
toString
in class java.lang.Object
private double distance(weka.core.Instance first, weka.core.Instance second) throws java.lang.Exception
first
- the first instance
second
- the second instance
java.lang.Exception
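As an instance-based (IB1-style) method, classification presumably amounts to finding the training instance closest to the test instance under `distance` and returning its class. The sketch below illustrates that idea on plain numeric vectors with a Euclidean distance. It is not the actual IB1_Anon code: the class `NearestNeighborSketch`, its `Example` type, and the choice of Euclidean distance are assumptions made for illustration.

```java
import java.util.List;

// Illustrative 1-nearest-neighbour sketch; names and distance are assumptions.
final class NearestNeighborSketch {

    /** A labelled training example with already-normalized numeric features. */
    static final class Example {
        final double[] features;
        final int label;

        Example(double[] features, int label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] first, double[] second) {
        double sum = 0.0;
        for (int i = 0; i < first.length; i++) {
            double diff = first[i] - second[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns the label of the training example closest to the query. */
    static int classify(List<Example> train, double[] query) {
        if (train.isEmpty()) {
            throw new IllegalArgumentException("no training examples");
        }
        double bestDistance = Double.MAX_VALUE;
        int bestLabel = -1;
        for (Example example : train) {
            double d = distance(example.features, query);
            if (d < bestDistance) {
                bestDistance = d;
                bestLabel = example.label;
            }
        }
        return bestLabel;
    }
}
```

In the real classifier, the per-attribute difference for a quasi-identifier would come from `gen_distance` rather than from a plain numeric subtraction.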
private double gen_distance(java.lang.String val1, java.lang.String val2, QIDAttribute qid) throws java.lang.Exception
val1
- String representation of the first generalized value
val2
- String representation of the second generalized value
qid
- Quasi-identifier attribute
java.lang.Exception
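The class description mentions using the mid-point of numeric generalizations and treating other generalizations as new categories. A minimal sketch of such a comparison is given below. The interval syntax `[lo:hi)` is a made-up format chosen for illustration and is not claimed to match the toolbox's genVals output, the class `GeneralizedValueSketch` and its methods are hypothetical, and the sketch throws an exception on malformed input instead of exiting as gen_distance is documented to do.

```java
// Hypothetical sketch of comparing generalized values; the "[lo:hi)" syntax
// is an assumed format, not the anonymization toolbox's actual output.
final class GeneralizedValueSketch {

    /** Parses an interval such as "[20:30)" and returns its mid-point. */
    static double midPoint(String generalized) {
        String s = generalized.trim();
        if (s.length() < 2) {
            throw new IllegalArgumentException("not an interval: " + generalized);
        }
        // Drop the opening and closing bracket, then split the two bounds.
        String[] bounds = s.substring(1, s.length() - 1).split(":");
        if (bounds.length != 2) {
            throw new IllegalArgumentException("not an interval: " + generalized);
        }
        double lo = Double.parseDouble(bounds[0].trim());
        double hi = Double.parseDouble(bounds[1].trim());
        return (lo + hi) / 2.0;
    }

    /** Mid-point distance, scaled by the attribute's observed range. */
    static double numericDistance(String val1, String val2, double min, double max) {
        if (max <= min) {
            return 0.0; // degenerate range: all values are treated as equal
        }
        return Math.abs(midPoint(val1) - midPoint(val2)) / (max - min);
    }

    /** Generalizations treated as categories: 0 if identical, 1 otherwise. */
    static double categoricalDistance(String val1, String val2) {
        return val1.equals(val2) ? 0.0 : 1.0;
    }
}
```

For example, under these assumptions `numericDistance("[20:30)", "[40:50)", 0, 100)` compares the mid-points 25 and 45 over a range of 100.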
private double norm(double x, int i)
x
- the value to be normalized
i
- the attribute's index
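The fields m_MinArray and m_MaxArray together with `norm` and `updateMinMax` suggest ordinary min-max scaling of numeric attributes. The following sketch shows one way this could work on plain `double[]` rows. It is written under that assumption and is not the class's actual source; `MinMaxNormalizer` and its method names are hypothetical, and NaN is used here to mark both "no value seen yet" and a missing value.

```java
import java.util.Arrays;

// Hypothetical min-max normalizer mirroring the roles of m_MinArray,
// m_MaxArray, updateMinMax and norm; not the actual IB1_Anon source.
final class MinMaxNormalizer {
    private final double[] min;
    private final double[] max;

    MinMaxNormalizer(int numAttributes) {
        min = new double[numAttributes];
        max = new double[numAttributes];
        // NaN marks attributes for which no value has been observed yet.
        Arrays.fill(min, Double.NaN);
        Arrays.fill(max, Double.NaN);
    }

    /** Widens the per-attribute range to include the values of a new row. */
    void updateMinMax(double[] values) {
        for (int i = 0; i < values.length; i++) {
            double v = values[i];
            if (Double.isNaN(v)) {
                continue; // treat NaN as a missing value
            }
            if (Double.isNaN(min[i])) {
                min[i] = v;
                max[i] = v;
            } else {
                min[i] = Math.min(min[i], v);
                max[i] = Math.max(max[i], v);
            }
        }
    }

    /** Scales x into [0, 1] using the observed range of attribute i. */
    double norm(double x, int i) {
        if (Double.isNaN(min[i]) || max[i] == min[i]) {
            return 0.0; // no usable range yet
        }
        return (x - min[i]) / (max[i] - min[i]);
    }
}
```

Normalizing before computing distances keeps attributes with large numeric ranges from dominating the comparison.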
private void updateMinMax(weka.core.Instance instance)
instance
- the new instance
public static void main(java.lang.String[] argv)
argv
- should contain command line arguments for evaluation (see Evaluation).