featRep
Class FeatureRepresentation

java.lang.Object
  extended by featRep.FeatureRepresentation

public class FeatureRepresentation
extends java.lang.Object

This class converts records in relational format to feature vectors, using the heuristics described in the following paper:

 @inproceedings{DBLP:conf/icde/InanKB09,
 author    = {Ali Inan and
             Murat Kantarcioglu and
             Elisa Bertino},
 title     = {Using Anonymized Data for Classification},
 booktitle = {ICDE},
 year      = {2009},
 pages     = {429-440}
 }
 
The class attribute is assumed to be the last attribute of the input file. If that is not the case, rearrange the input so that it is.

The constructor parameters are listed under Constructor Detail.


Field Summary
 java.lang.Object[] attributes
          Vector of attributes (each either a NumericAtt or a CategoricalAtt object)
private  int catRep
          Categorical representation heuristic of choice.
private  java.lang.String[] classValues
          Vector of class values
private  int[] featAttMapping
          The i^th entry contains the index of the attribute that feature i corresponds to
private  java.util.LinkedList<java.lang.Integer> idAttributes
          List of identifier attribute indices within the original file
 boolean[] isCont
          Entry i is true if the i^th attribute is numerical
private  boolean[] isGeneralized
          Entry i is true if the i^th attribute has been generalized
private  int numRep
          Numeric representation heuristic of choice.
private  QIDAttribute[] qids
          Quasi-identifier attributes (indices should be adjusted according to id attributes)
private  boolean usePDFExp
          Choice of expected distance calculation method.
private  boolean useUniExp
          Choice of expected distance calculation method.
 
Constructor Summary
FeatureRepresentation(java.lang.String descriptorFilename, QIDAttribute[] qids, java.util.LinkedList<java.lang.Integer> idAttributes, int numRep, int catRep, boolean usePDFExp, boolean useUniExp)
          Class constructor
 
Method Summary
 void assignFeatureIndices()
          Assigns feature indices to each attribute
 double dotProduct(svm_node[] x)
          Computes the expected dot product
 double dotProduct(svm_node[] x, svm_node[] y)
          Computes the expected dot product
private  double dotProductPDF(svm_node[] x)
          Computes the expected dot product based on QI-statistics
private  double dotProductPDF(svm_node[] x, svm_node[] y)
          Computes the expected dot product based on QI-statistics
private  double dotProductUni(svm_node[] x)
          Computes the expected dot product based on the assumption that values of a generalization are distributed uniformly
private  double dotProductUni(svm_node[] x, svm_node[] y)
          Computes the expected dot product based on the assumption that values of a generalization are distributed uniformly
 void featurize(java.lang.String inputFile, java.lang.String outputFile)
          Convert the input to a set of feature vectors, which will be written to the output file
 void initialScan(java.lang.String inputFile)
          Sets lower/upper bound for numeric attributes (does nothing for categorical)
private  boolean isIdentifier(int index)
          Checks if the attribute at index is an identifier
private  void readDescriptor(java.lang.String descriptor)
          Read the descriptor file
 double squareDistance(svm_node[] x, svm_node[] y)
          Computes the expected square distance
private  double squareDistancePDF(svm_node[] x, svm_node[] y)
          Computes the expected square distance based on QI-statistics
private  double squareDistanceUni(svm_node[] x, svm_node[] y)
          Computes the expected square distance based on the assumption that values of a generalization are distributed uniformly
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

classValues

private java.lang.String[] classValues
Vector of class values


attributes

public java.lang.Object[] attributes
Vector of attributes (each either a NumericAtt or a CategoricalAtt object)


isCont

public boolean[] isCont
Entry i is true if the i^th attribute is numerical


isGeneralized

private boolean[] isGeneralized
Entry i is true if the i^th attribute has been generalized


numRep

private int numRep
Numeric representation heuristic of choice. See Params for alternatives.


catRep

private int catRep
Categorical representation heuristic of choice. See Params for alternatives.


usePDFExp

private boolean usePDFExp
Choice of expected distance calculation method. This one uses the PDF obtained from an anonymization produced with the genValsDist output format option of the anonymization toolbox.


useUniExp

private boolean useUniExp
Choice of expected distance calculation method. This one assumes all values within a generalization are uniformly distributed.


qids

private QIDAttribute[] qids
Quasi-identifier attributes (indices should be adjusted according to id attributes)


idAttributes

private java.util.LinkedList<java.lang.Integer> idAttributes
List of identifier attribute indices within the original file


featAttMapping

private int[] featAttMapping
The i^th entry contains the index of the attribute that feature i corresponds to

Constructor Detail

FeatureRepresentation

public FeatureRepresentation(java.lang.String descriptorFilename,
                             QIDAttribute[] qids,
                             java.util.LinkedList<java.lang.Integer> idAttributes,
                             int numRep,
                             int catRep,
                             boolean usePDFExp,
                             boolean useUniExp)
Class constructor

Parameters:
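descriptorFilename - A names file that describes attribute types and/or domains
qids - Quasi-identifier attributes (indices should be adjusted according to id attributes)
idAttributes - List of identifier attribute indices within the original file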
numRep - Numeric feature representation heuristic
catRep - Categorical feature representation heuristic
usePDFExp - Flag for calculating expected distance based on QI-statistics
useUniExp - Flag for calculating expected distance based on uniform distribution assumption

Method Detail

isIdentifier

private boolean isIdentifier(int index)
Checks if the attribute at index is an identifier

Parameters:
index - attribute index
Returns:
true if identifier, false otherwise
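
The check can be pictured as a membership test on the identifier index list. The sketch below is not the class's actual code: IdentifierSketch and adjustedIndex are hypothetical names, and the index adjustment shown (subtracting the number of identifier columns that precede an attribute) is only one plausible reading of the note that quasi-identifier indices "should be adjusted according to id attributes".

```java
import java.util.LinkedList;

class IdentifierSketch {
    /** True if the attribute index appears in the identifier list. */
    static boolean isIdentifier(LinkedList<Integer> idAttributes, int index) {
        return idAttributes.contains(index);
    }

    /**
     * Hypothetical adjustment: position of an attribute once all identifier
     * columns with a smaller index have been dropped.
     */
    static int adjustedIndex(LinkedList<Integer> idAttributes, int originalIndex) {
        int shift = 0;
        for (int id : idAttributes) {
            if (id < originalIndex) {
                shift++;
            }
        }
        return originalIndex - shift;
    }
}
```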

readDescriptor

private void readDescriptor(java.lang.String descriptor)
Read the descriptor file

Parameters:
descriptor - filename

initialScan

public void initialScan(java.lang.String inputFile)
Sets lower/upper bound for numeric attributes (does nothing for categorical)

Parameters:
inputFile - Input data filename
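
A bounds scan of this kind might look as follows. This is a sketch under stated assumptions, not the method's real implementation: rows are taken to be already split into fields, numeric fields are taken to hold plain (ungeneralized) numbers, and BoundsScanSketch and scanBounds are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class BoundsScanSketch {
    /** Returns {min, max} per column; categorical columns keep {NaN, NaN}. */
    static double[][] scanBounds(List<String[]> rows, boolean[] isCont) {
        double[][] bounds = new double[isCont.length][2];
        for (double[] b : bounds) {
            Arrays.fill(b, Double.NaN);
        }
        for (String[] row : rows) {
            for (int i = 0; i < isCont.length; i++) {
                if (!isCont[i]) {
                    continue; // nothing to do for categorical attributes
                }
                double v = Double.parseDouble(row[i].trim());
                if (Double.isNaN(bounds[i][0]) || v < bounds[i][0]) {
                    bounds[i][0] = v;
                }
                if (Double.isNaN(bounds[i][1]) || v > bounds[i][1]) {
                    bounds[i][1] = v;
                }
            }
        }
        return bounds;
    }
}
```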

assignFeatureIndices

public void assignFeatureIndices()
Assigns feature indices to each attribute
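
To illustrate what a feature-to-attribute mapping such as featAttMapping could contain, here is a minimal sketch. It assumes one feature per numeric attribute and one feature per domain value for each categorical attribute; the actual layout depends on the numRep and catRep heuristics, which this page does not describe. FeatureIndexSketch, buildFeatAttMapping and domainSize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FeatureIndexSketch {
    /**
     * isCont[i] is true for numeric attributes; domainSize[i] is the number of
     * distinct values of categorical attribute i (ignored for numeric ones).
     */
    static int[] buildFeatAttMapping(boolean[] isCont, int[] domainSize) {
        List<Integer> mapping = new ArrayList<>();
        for (int att = 0; att < isCont.length; att++) {
            int width = isCont[att] ? 1 : domainSize[att];
            for (int k = 0; k < width; k++) {
                mapping.add(att); // feature mapping.size() belongs to attribute att
            }
        }
        int[] result = new int[mapping.size()];
        for (int f = 0; f < result.length; f++) {
            result[f] = mapping.get(f);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] m = buildFeatAttMapping(new boolean[] {true, false, true}, new int[] {0, 3, 0});
        System.out.println(Arrays.toString(m)); // prints [0, 1, 1, 1, 2]
    }
}
```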


featurize

public void featurize(java.lang.String inputFile,
                      java.lang.String outputFile)
Convert the input to a set of feature vectors, which will be written to the output file

Parameters:
inputFile - input filename
outputFile - output filename
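
This page does not state the output file format. Since the distance methods operate on LIBSVM's svm_node arrays, one plausible target is the LIBSVM sparse text format, "label index:value ...". The sketch below formats a single record that way, assuming a numeric class label (for example an index into classValues) and 1-based feature indices; LibsvmLineSketch and formatLine are hypothetical names.

```java
class LibsvmLineSketch {
    /** Formats one record as "label index:value ...", omitting zero features. */
    static String formatLine(int label, double[] features) {
        StringBuilder sb = new StringBuilder();
        sb.append(label);
        for (int f = 0; f < features.length; f++) {
            if (features[f] != 0.0) {
                // LIBSVM feature indices conventionally start at 1
                sb.append(' ').append(f + 1).append(':').append(features[f]);
            }
        }
        return sb.toString();
    }
}
```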

squareDistance

public double squareDistance(svm_node[] x,
                             svm_node[] y)
Computes the expected square distance

Parameters:
x - vector of features
y - vector of features
Returns:
Expected square distance between x and y
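
By linearity of expectation, an expected square distance decomposes into expected dot products: E[|x - y|^2] = E[x.x] - 2 E[x.y] + E[y.y]. The sketch below checks this on per-feature means and variances, assuming the features of x and y are independent. It also suggests why a single-argument dotProduct is useful: E[x.x] includes the variance, so it is not the same as the dot product of the means. All names are hypothetical, and this is not claimed to be how the class computes its result.

```java
class SquareDistanceIdentitySketch {
    /** E[x.y] for independent x and y: sum of products of the means. */
    static double expectedDot(double[] meanX, double[] meanY) {
        double sum = 0.0;
        for (int f = 0; f < meanX.length; f++) {
            sum += meanX[f] * meanY[f];
        }
        return sum;
    }

    /** E[x.x]: sum of second moments, i.e. variance plus squared mean. */
    static double expectedSelfDot(double[] mean, double[] var) {
        double sum = 0.0;
        for (int f = 0; f < mean.length; f++) {
            sum += var[f] + mean[f] * mean[f];
        }
        return sum;
    }

    /** E[|x - y|^2] assembled from the three expected dot products. */
    static double expectedSquareDistance(double[] meanX, double[] varX,
                                         double[] meanY, double[] varY) {
        return expectedSelfDot(meanX, varX)
                - 2.0 * expectedDot(meanX, meanY)
                + expectedSelfDot(meanY, varY);
    }
}
```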

dotProduct

public double dotProduct(svm_node[] x,
                         svm_node[] y)
Computes the expected dot product

Parameters:
x - vector of features
y - vector of features
Returns:
Expected dot product of x and y

dotProduct

public double dotProduct(svm_node[] x)
Computes the expected dot product

Parameters:
x - vector of features
Returns:
Expected dot product of x and x

squareDistancePDF

private double squareDistancePDF(svm_node[] x,
                                 svm_node[] y)
Computes the expected square distance based on QI-statistics

Parameters:
x - vector of features
y - vector of features
Returns:
Expected square distance between x and y
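
When a generalized value comes with a distribution over the original values it covers, the expectation can be taken directly over that distribution. The following single-feature sketch assumes discrete distributions and independence between the two records; PdfDistanceSketch and its parameters are hypothetical and do not reflect the toolbox's genValsDist data structures.

```java
class PdfDistanceSketch {
    /**
     * E[(X - Y)^2] where X takes valuesX[i] with probability probsX[i]
     * and Y takes valuesY[j] with probability probsY[j], independently.
     */
    static double expectedSquareDistance(double[] valuesX, double[] probsX,
                                         double[] valuesY, double[] probsY) {
        double sum = 0.0;
        for (int i = 0; i < valuesX.length; i++) {
            for (int j = 0; j < valuesY.length; j++) {
                double diff = valuesX[i] - valuesY[j];
                sum += probsX[i] * probsY[j] * diff * diff;
            }
        }
        return sum;
    }
}
```

An ungeneralized value is the special case of a single value with probability 1.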

dotProductPDF

private double dotProductPDF(svm_node[] x,
                             svm_node[] y)
Computes the expected dot product based on QI-statistics

Parameters:
x - vector of features
y - vector of features
Returns:
Expected dot product between x and y

dotProductPDF

private double dotProductPDF(svm_node[] x)
Computes the expected dot product based on QI-statistics

Parameters:
x - vector of features
Returns:
Expected dot product between x and x

squareDistanceUni

private double squareDistanceUni(svm_node[] x,
                                 svm_node[] y)
Computes the expected square distance based on the assumption that values of a generalization are distributed uniformly

Parameters:
x - vector of features
y - vector of features
Returns:
Expected square distance between x and y
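
For numeric attributes generalized to intervals, the uniform assumption gives a closed form. If X is uniform on [a, b], its mean is (a + b) / 2 and its variance is (b - a)^2 / 12, and for independent X and Y, E[(X - Y)^2] = Var(X) + Var(Y) + (E[X] - E[Y])^2. The sketch below sums this over features; it assumes interval-valued numeric features only, and UniformIntervalSketch is a hypothetical name rather than part of this package.

```java
class UniformIntervalSketch {
    static double mean(double lo, double hi) {
        return (lo + hi) / 2.0;
    }

    static double variance(double lo, double hi) {
        double width = hi - lo;
        return width * width / 12.0;
    }

    /** Feature f of x is uniform on [loX[f], hiX[f]]; likewise for y. */
    static double expectedSquareDistance(double[] loX, double[] hiX,
                                         double[] loY, double[] hiY) {
        double sum = 0.0;
        for (int f = 0; f < loX.length; f++) {
            double diff = mean(loX[f], hiX[f]) - mean(loY[f], hiY[f]);
            sum += variance(loX[f], hiX[f]) + variance(loY[f], hiY[f]) + diff * diff;
        }
        return sum;
    }
}
```

An exact value is the degenerate interval with lo equal to hi, for which the formula reduces to the ordinary square distance.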

dotProductUni

private double dotProductUni(svm_node[] x,
                             svm_node[] y)
Computes the expected dot product based on the assumption that values of a generalization are distributed uniformly

Parameters:
x - vector of features
y - vector of features
Returns:
Expected dot product of x and y

dotProductUni

private double dotProductUni(svm_node[] x)
Computes the expected dot product based on the assumption that values of a generalization are distributed uniformly

Parameters:
x - vector of features
Returns:
Expected dot product of x and x
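
For categorical attributes the uniform assumption can be illustrated as follows, assuming a one-feature-per-value (one-hot) representation, which may differ from the catRep heuristic actually in use. A generalized value covering k leaf values is then equally likely to be any of them, so for two independent records the expected dot product is the number of shared leaves divided by the product of the two leaf counts, while the dot product of a record with itself is always 1. CategoricalUniformDotSketch and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

class CategoricalUniformDotSketch {
    /** E[x.y] for independent records; both leaf sets must be non-empty. */
    static double expectedDot(Set<String> leavesX, Set<String> leavesY) {
        Set<String> shared = new HashSet<>(leavesX);
        shared.retainAll(leavesY);
        return (double) shared.size() / (leavesX.size() * leavesY.size());
    }

    /** E[x.x]: exactly one one-hot feature is set, whatever the true leaf is. */
    static double expectedSelfDot(Set<String> leaves) {
        return leaves.isEmpty() ? 0.0 : 1.0;
    }
}
```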