java.lang.Object
  featRep.FeatureRepresentation
public class FeatureRepresentation
This class converts records in relational format to feature vectors, using the heuristics described in the following paper:
@inproceedings{DBLP:conf/icde/InanKB09,
  author = {Ali Inan and Murat Kantarcioglu and Elisa Bertino},
  title = {Using Anonymized Data for Classification},
  booktitle = {ICDE},
  year = {2009},
  pages = {429-440}
}
The class attribute is assumed to be the last attribute of the input file. If it is not, reorder the input so that it is. The constructor parameters are described under Constructor Detail.
Field Summary | |
---|---|
java.lang.Object[] |
attributes
Vector of attributes (each either a NumericAtt or a CategoricalAtt object) |
private int |
catRep
Categorical representation heuristic of choice. |
private java.lang.String[] |
classValues
Vector of class values |
private int[] |
featAttMapping
The i^th entry contains the index of the attribute that feature i corresponds to |
private java.util.LinkedList<java.lang.Integer> |
idAttributes
List of identifier attribute indices within the original file |
boolean[] |
isCont
Entry i is true if the i^th attribute is numerical |
private boolean[] |
isGeneralized
Entry i is true if the i^th attribute has been generalized |
private int |
numRep
Numeric representation heuristic of choice. |
private QIDAttribute[] |
qids
Quasi-identifier attributes (indices should be adjusted according to id attributes) |
private boolean |
usePDFExp
If true, expected distances are calculated based on QI-statistics. |
private boolean |
useUniExp
If true, expected distances are calculated based on the uniform distribution assumption. |
Constructor Summary | |
---|---|
FeatureRepresentation(java.lang.String descriptorFilename,
QIDAttribute[] qids,
java.util.LinkedList<java.lang.Integer> idAttributes,
int numRep,
int catRep,
boolean usePDFExp,
boolean useUniExp)
Class constructor |
Method Summary | |
---|---|
void |
assignFeatureIndices()
Assigns feature indices to each attribute |
double |
dotProduct(svm_node[] x)
Computes the expected dot product |
double |
dotProduct(svm_node[] x,
svm_node[] y)
Computes the expected dot product |
private double |
dotProductPDF(svm_node[] x)
Computes the expected dot product based on QI-statistics |
private double |
dotProductPDF(svm_node[] x,
svm_node[] y)
Computes the expected dot product based on QI-statistics |
private double |
dotProductUni(svm_node[] x)
Computes the expected dot product based on the assumption that values of a generalization are distributed uniformly |
private double |
dotProductUni(svm_node[] x,
svm_node[] y)
Computes the expected dot product based on the assumption that values of a generalization are distributed uniformly |
void |
featurize(java.lang.String inputFile,
java.lang.String outputFile)
Converts the input to a set of feature vectors, which are written to the output file |
void |
initialScan(java.lang.String inputFile)
Sets lower/upper bound for numeric attributes (does nothing for categorical) |
private boolean |
isIdentifier(int index)
Checks if the attribute at index is an identifier |
private void |
readDescriptor(java.lang.String descriptor)
Reads the descriptor file |
double |
squareDistance(svm_node[] x,
svm_node[] y)
Computes the expected square distance |
private double |
squareDistancePDF(svm_node[] x,
svm_node[] y)
Computes the expected square distance based on QI-statistics |
private double |
squareDistanceUni(svm_node[] x,
svm_node[] y)
Computes the expected square distance based on the assumption that values of a generalization are distributed uniformly |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private java.lang.String[] classValues
public java.lang.Object[] attributes
public boolean[] isCont
private boolean[] isGeneralized
private int numRep
private int catRep
private boolean usePDFExp
private boolean useUniExp
private QIDAttribute[] qids
private java.util.LinkedList<java.lang.Integer> idAttributes
private int[] featAttMapping
Constructor Detail |
---|
public FeatureRepresentation(java.lang.String descriptorFilename, QIDAttribute[] qids, java.util.LinkedList<java.lang.Integer> idAttributes, int numRep, int catRep, boolean usePDFExp, boolean useUniExp)
descriptorFilename - A names file that describes attribute types and/or domains
anonConfig - Anonymization configuration file
numRep - Numeric feature representation heuristic
catRep - Categorical feature representation heuristic
usePDFExp - Flag for calculating expected distance based on QI-statistics
useUniExp - Flag for calculating expected distance based on uniform distribution assumption
Method Detail |
---|
private boolean isIdentifier(int index)
index - attribute index
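A minimal sketch of the identifier check, assuming idAttributes holds column indices of the original file. The class name IdentifierIndexSketch and the helper adjustedIndex are hypothetical and not part of featRep; adjustedIndex shows one possible reading of "indices should be adjusted according to id attributes", namely the position of a column once identifier columns are dropped.

```java
import java.util.Arrays;
import java.util.LinkedList;

final class IdentifierIndexSketch {
    // True when the column index is listed as an identifier attribute.
    static boolean isIdentifier(LinkedList<Integer> idAttributes, int index) {
        // The int is boxed to Integer, so this resolves to contains(Object).
        return idAttributes.contains(index);
    }

    // Assumed meaning of "adjusted": position of a non-identifier column
    // after all identifier columns have been removed.
    static int adjustedIndex(LinkedList<Integer> idAttributes, int index) {
        int shift = 0;
        for (int id : idAttributes) {
            if (id < index) {
                shift++;
            }
        }
        return index - shift;
    }

    public static void main(String[] args) {
        LinkedList<Integer> ids = new LinkedList<>(Arrays.asList(0, 3));
        System.out.println(isIdentifier(ids, 3));
        System.out.println(adjustedIndex(ids, 4));
    }
}
```

With identifier columns 0 and 3 in a five-column file, the remaining columns 1, 2 and 4 take positions 0, 1 and 2.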
private void readDescriptor(java.lang.String descriptor)
descriptor - filename
public void initialScan(java.lang.String inputFile)
inputFile - Input data filename
public void assignFeatureIndices()
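A minimal sketch of how a feature-to-attribute mapping such as featAttMapping could be built, assuming each attribute expands into a known number of features (for example, several features for one categorical attribute). The class FeatureIndexSketch, the method buildFeatAttMapping and the featureCounts input are hypothetical; the actual expansion depends on the numRep and catRep heuristics, which are not documented here.

```java
import java.util.Arrays;

final class FeatureIndexSketch {
    // featureCounts[a] is the (assumed) number of features attribute a expands into.
    // The result has one entry per feature: the index of its source attribute.
    static int[] buildFeatAttMapping(int[] featureCounts) {
        int total = 0;
        for (int count : featureCounts) {
            total += count;
        }
        int[] mapping = new int[total];
        int next = 0;
        for (int att = 0; att < featureCounts.length; att++) {
            for (int k = 0; k < featureCounts[att]; k++) {
                mapping[next++] = att;
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Attribute 0 -> 1 feature, attribute 1 -> 3 features, attribute 2 -> 2 features.
        int[] mapping = buildFeatAttMapping(new int[] {1, 3, 2});
        System.out.println(Arrays.toString(mapping)); // prints [0, 1, 1, 1, 2, 2]
    }
}
```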
public void featurize(java.lang.String inputFile, java.lang.String outputFile)
inputFile - input filename
outputFile - output filename
public double squareDistance(svm_node[] x, svm_node[] y)
x - vector of features
y - vector of features
public double dotProduct(svm_node[] x, svm_node[] y)
x - vector of features
y - vector of features
public double dotProduct(svm_node[] x)
x - vector of features
private double squareDistancePDF(svm_node[] x, svm_node[] y)
x - vector of features
y - vector of features
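A minimal sketch of an expected square distance computed from summary statistics, assuming the statistics available for each feature are a mean and a variance and that the two records are independent. Under those assumptions E[(X - Y)^2] = Var(X) + Var(Y) + (E[X] - E[Y])^2 for each feature. The class and method names are hypothetical; the statistics that squareDistancePDF actually uses are not documented here.

```java
final class ExpectedDistanceFromStatsSketch {
    // E[(X - Y)^2] for independent X and Y, given their means and variances.
    static double expectedSquareDiff(double meanX, double varX, double meanY, double varY) {
        double diff = meanX - meanY;
        return varX + varY + diff * diff;
    }

    // Sums the per-feature expectations; all arrays are aligned by feature index.
    static double expectedSquareDistance(double[] meanX, double[] varX,
                                         double[] meanY, double[] varY) {
        double sum = 0.0;
        for (int i = 0; i < meanX.length; i++) {
            sum += expectedSquareDiff(meanX[i], varX[i], meanY[i], varY[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] meanX = {1.0, 2.0};
        double[] varX = {0.0, 0.5};
        double[] meanY = {3.0, 2.0};
        double[] varY = {1.0, 0.5};
        // Feature 0 contributes 0 + 1 + 4, feature 1 contributes 0.5 + 0.5 + 0.
        System.out.println(expectedSquareDistance(meanX, varX, meanY, varY));
    }
}
```

A feature with zero variance behaves like an exact, ungeneralized value, so the formula reduces to the ordinary squared difference in that case.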
private double dotProductPDF(svm_node[] x, svm_node[] y)
x - vector of features
y - vector of features
private double dotProductPDF(svm_node[] x)
x - vector of features
private double squareDistanceUni(svm_node[] x, svm_node[] y)
x - vector of features
y - vector of features
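A minimal sketch of the uniform-distribution case for one numeric feature, assuming a generalized value is an interval [lo, hi] and the true value is uniformly distributed over it, independently for the two records. A uniform variable on [lo, hi] has mean (lo + hi) / 2 and variance (hi - lo)^2 / 12. The class and method names are hypothetical, and how squareDistanceUni treats categorical features is not covered.

```java
final class UniformSquareDistanceSketch {
    // Mean of a uniform distribution on [lo, hi].
    static double mean(double lo, double hi) {
        return (lo + hi) / 2.0;
    }

    // Variance of a uniform distribution on [lo, hi].
    static double variance(double lo, double hi) {
        double width = hi - lo;
        return width * width / 12.0;
    }

    // E[(X - Y)^2] for independent X ~ U[loX, hiX] and Y ~ U[loY, hiY].
    static double expectedSquareDiff(double loX, double hiX, double loY, double hiY) {
        double diff = mean(loX, hiX) - mean(loY, hiY);
        return variance(loX, hiX) + variance(loY, hiY) + diff * diff;
    }

    public static void main(String[] args) {
        // X generalized to [0, 2]; Y is the exact value 1, written as [1, 1].
        System.out.println(expectedSquareDiff(0.0, 2.0, 1.0, 1.0));
    }
}
```

Note that two records generalized to the same interval do not get an expected distance of zero under this model, because their true values may still differ.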
private double dotProductUni(svm_node[] x, svm_node[] y)
x - vector of features
y - vector of features
private double dotProductUni(svm_node[] x)
x - vector of features
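A minimal sketch of the expected dot product for one numeric feature under the uniform assumption, again treating a generalized value as an interval [lo, hi]. For independent X and Y, E[XY] = E[X] E[Y]. The single-argument overload is assumed here to be the dot product of a vector with itself, for which E[X^2] = (lo^2 + lo * hi + hi^2) / 3; that reading is an assumption, and the class and method names are hypothetical.

```java
final class UniformDotProductSketch {
    // E[X * Y] for independent X ~ U[loX, hiX] and Y ~ U[loY, hiY]:
    // the product of the two interval midpoints.
    static double expectedProduct(double loX, double hiX, double loY, double hiY) {
        return ((loX + hiX) / 2.0) * ((loY + hiY) / 2.0);
    }

    // E[X^2] for X ~ U[lo, hi]; equals variance plus squared mean.
    static double expectedSquare(double lo, double hi) {
        return (lo * lo + lo * hi + hi * hi) / 3.0;
    }

    public static void main(String[] args) {
        System.out.println(expectedProduct(0.0, 2.0, 2.0, 4.0));
        System.out.println(expectedSquare(0.0, 3.0));
    }
}
```

The self dot product is not the square of the midpoint: E[X^2] exceeds (E[X])^2 by the variance of the interval.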