methods
Class FileIO

java.lang.Object
  extended by methods.FileIO

public class FileIO
extends java.lang.Object

This class contains a variety of file input/output functions that are useful in building experiment data.


Constructor Summary
FileIO()
           
 
Method Summary
static void AddStringToFile(java.lang.String inputFilename, java.lang.String value)
          Adds the second String parameter before the first line of a file
static void AttributeRemoval(java.lang.String inputFile, java.lang.String outputFile, int[] indices)
          Removes the specified attributes from input and outputs the result to the output file
static void AttributeSelection(java.lang.String inputFile, java.lang.String outputFile, int[] indices)
          Outputs inputFile's projection on specified attribute indices to the outputFile
static int CountNonEmptyLines(java.lang.String inputFile)
          Counts and returns the number of non-empty lines
static java.lang.String GetARFFHeader(java.lang.String headerFile)
          Reads the entire header file into a String, except for the first line that contains the number of attributes described in the header.
static java.lang.String GetARFFHeader(java.lang.String headerFile, int[] exclude, int[] string)
          Reads the entire header file into a String, except for the first line that contains the number of attributes described in the header.
static int GetNumAttributes(java.lang.String headerFile)
          Reads the first line of a header file and parses it as an integer
static void Grip(java.lang.String inputFile, java.lang.String outputFile, int numLines)
          Reads the first numLines lines of an input file into outputFile
static void main(java.lang.String[] args)
          Removes whitespace from the dataset (in between attribute values), removes any record with unknown values ("?"), and outputs the number of records in the final version. If numLines is specified as a parameter, shuffles the records and grips the first numLines records.
static void MergeTwoInputs(java.lang.String inputFile1, java.lang.String inputFile2, java.lang.String outputFile)
          Merges inputFile1 and inputFile2 into outputFile.
static void PrepareCrossValidationData(java.lang.String inputFile, java.lang.String outputFile1, java.lang.String outputFile2, int numFolds, int currFold)
          Partitions the input file into training and test datasets according to the number of folds and current fold.
static void RemoveRecWithMissingValues(java.lang.String inputFile, java.lang.String outputFile)
          Removes any records containing '?' from inputFile and writes the result to outputFile.
static void RenameFile(java.lang.String inputFilename, java.lang.String rename)
          Renames an input file to the given name, rename.
static void ReplaceWSWithUnderScore(java.lang.String inputFile, java.lang.String outputFile)
          Replaces all space characters within attribute values with the underscore character
static void ShuffleRecords(java.lang.String inputFile, java.lang.String outputFile)
          Shuffles the records in inputFile and writes the result to outputFile.
static void SplitIntoPartitions(java.lang.String inputFile, java.lang.String[] outputFiles)
          Divides an input file into equal-length output files with the given filenames.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FileIO

public FileIO()
Method Detail

Grip

public static void Grip(java.lang.String inputFile,
                        java.lang.String outputFile,
                        int numLines)
                 throws java.lang.Exception
Reads the first numLines lines of an input file into outputFile

Parameters:
inputFile - input filename
outputFile - output filename
numLines - number of lines to be read into the output file
Throws:
java.lang.Exception
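
The behavior described for Grip can be illustrated with a minimal sketch. GripSketch and its method names are hypothetical stand-ins, not the FileIO source, and the handling of a numLines larger than the file (copy everything) is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FileIO.Grip; not the library's actual source.
final class GripSketch {
    // Returns at most numLines leading lines (fewer if the input is shorter).
    static List<String> firstLines(List<String> lines, int numLines) {
        int end = Math.max(0, Math.min(numLines, lines.size()));
        return new ArrayList<>(lines.subList(0, end));
    }

    // File-based wrapper: reads inputFile and writes its head to outputFile.
    static void grip(Path inputFile, Path outputFile, int numLines) throws IOException {
        Files.write(outputFile, firstLines(Files.readAllLines(inputFile), numLines));
    }
}
```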

AttributeSelection

public static void AttributeSelection(java.lang.String inputFile,
                                      java.lang.String outputFile,
                                      int[] indices)
                               throws java.lang.Exception
Outputs inputFile's projection on specified attribute indices to the outputFile

Parameters:
inputFile - input filename
outputFile - output filename
indices - list of attribute indices to be projected
Throws:
java.lang.Exception
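
A projection on attribute indices might look like the sketch below for a single record. It assumes comma-separated values and zero-based indices; the documentation above does not state the index base, so treat both as assumptions. ProjectionSketch is a hypothetical name.

```java
// Hypothetical illustration of projecting one record onto selected attributes.
final class ProjectionSketch {
    // Keeps only the values at the given (assumed zero-based) indices, in the order listed.
    static String project(String record, int[] indices) {
        String[] values = record.split(",", -1); // -1 keeps trailing empty values
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < indices.length; i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(values[indices[i]]);
        }
        return out.toString();
    }
}
```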

AttributeRemoval

public static void AttributeRemoval(java.lang.String inputFile,
                                    java.lang.String outputFile,
                                    int[] indices)
                             throws java.lang.Exception
Removes the specified attributes from input and outputs the result to the output file

Parameters:
inputFile - input filename
outputFile - output filename
indices - list of attribute indices to be removed
Throws:
java.lang.Exception
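
Attribute removal is the complement of projection: every value survives except those at the listed positions. The sketch below works on one record and assumes comma-separated values and zero-based indices (both assumptions, not confirmed by the documentation). AttributeRemovalSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of dropping selected attributes from one record.
final class AttributeRemovalSketch {
    // Returns the record without the values at the given (assumed zero-based) indices.
    static String remove(String record, int[] indices) {
        Set<Integer> drop = new HashSet<>();
        for (int index : indices) {
            drop.add(index);
        }
        String[] values = record.split(",", -1);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (!drop.contains(i)) {
                kept.add(values[i]);
            }
        }
        return String.join(",", kept);
    }
}
```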

GetNumAttributes

public static int GetNumAttributes(java.lang.String headerFile)
                            throws java.lang.Exception
Reads the first line of a header file and parses it as an integer

Parameters:
headerFile - header file for ARFF databases
Returns:
number of attributes
Throws:
java.lang.Exception

GetARFFHeader

public static java.lang.String GetARFFHeader(java.lang.String headerFile)
                                      throws java.lang.Exception
Reads the entire header file into a String, except for the first line that contains the number of attributes described in the header. Example header file:

    3
    @RELATION relName
    @Attribute numeric NUMERIC
    @Attribute categorical {cat1,cat2}
    @Attribute alpha STRING
    @DATA

Parameters:
headerFile - header file for ARFF databases
Returns:
header information String in ARFF format
Throws:
java.lang.Exception
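
The header file layout described above (an attribute count on the first line, followed by the ARFF header proper) can be split with a short sketch. It operates on the file contents as a String rather than on a filename, and ArffHeaderSketch with its method names is hypothetical, not the FileIO source.

```java
// Hypothetical illustration of separating the attribute count from the ARFF header text.
final class ArffHeaderSketch {
    // Parses the first line as the number of attributes.
    static int numAttributes(String headerText) {
        String firstLine = headerText.split("\\R", 2)[0];
        return Integer.parseInt(firstLine.trim());
    }

    // Returns everything after the first line, unchanged.
    static String headerBody(String headerText) {
        String[] parts = headerText.split("\\R", 2);
        return parts.length > 1 ? parts[1] : "";
    }
}
```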

GetARFFHeader

public static java.lang.String GetARFFHeader(java.lang.String headerFile,
                                             int[] exclude,
                                             int[] string)
                                      throws java.lang.Exception
Reads the entire header file into a String, except for the first line that contains the number of attributes described in the header. Example header file:

    3
    @RELATION relName
    @Attribute numeric NUMERIC
    @Attribute categorical {cat1,cat2}
    @Attribute alpha STRING
    @DATA

Among these attributes, those at the indices specified in exclude are removed, and those at the indices specified in string are converted to attributes of type STRING.

Parameters:
headerFile - header file for ARFF databases
exclude - the attributes at the specified indices will be omitted
string - the attributes at the specified indices will be converted to string type
Returns:
header information String in ARFF format
Throws:
java.lang.Exception
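
The exclude and string parameters can be pictured as a filter over the attribute lines of the header. The sketch below assumes zero-based indices counted over the attribute declarations in order, and attribute names without embedded whitespace; neither is confirmed by the documentation. ArffHeaderRewriteSketch is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of dropping some attributes and retyping others as STRING.
final class ArffHeaderRewriteSketch {
    static String rewrite(List<String> headerLines, int[] exclude, int[] string) {
        Set<Integer> drop = toSet(exclude);
        Set<Integer> asString = toSet(string);
        StringBuilder out = new StringBuilder();
        int attributeIndex = 0; // assumed zero-based position among attribute lines
        for (String line : headerLines) {
            if (line.trim().toLowerCase(Locale.ROOT).startsWith("@attribute")) {
                int index = attributeIndex++;
                if (drop.contains(index)) {
                    continue; // excluded attributes are omitted entirely
                }
                if (asString.contains(index)) {
                    // Keep keyword and name, replace the declared type with STRING.
                    String[] tokens = line.trim().split("\\s+", 3);
                    line = tokens[0] + " " + tokens[1] + " STRING";
                }
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }

    private static Set<Integer> toSet(int[] indices) {
        Set<Integer> set = new HashSet<>();
        for (int index : indices) {
            set.add(index);
        }
        return set;
    }
}
```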

RenameFile

public static void RenameFile(java.lang.String inputFilename,
                              java.lang.String rename)
                       throws java.lang.Exception
Renames an input file to the given name, rename. CAUTION: if a file with that name already exists, it will be deleted!

Parameters:
inputFilename - input filename, to be renamed
rename - new filename
Throws:
java.lang.Exception
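
The overwrite behavior in the caution above matches what java.nio.file.Files.move does with the REPLACE_EXISTING option. Whether FileIO uses this call internally is an assumption; RenameSketch is a hypothetical name used only to show the effect.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of a rename that silently replaces an existing target.
final class RenameSketch {
    // Moves source to target; any file already at target is replaced.
    static void rename(Path source, Path target) throws IOException {
        Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Callers that need to keep an existing target can check Files.exists(target) first.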

AddStringToFile

public static void AddStringToFile(java.lang.String inputFilename,
                                   java.lang.String value)
                            throws java.lang.Exception
Adds the second String parameter before the first line of a file

Parameters:
inputFilename - file to which the String is added
value - String to be inserted before the first line
Throws:
java.lang.Exception
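
Because a file cannot be extended at the front in place, inserting a line before the first line usually means reading the whole file and rewriting it. The sketch below takes that approach; it is an assumption about how such a method could work, and PrependSketch is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of adding a line before the first line of a file.
final class PrependSketch {
    // Returns a new list with value placed ahead of the existing lines.
    static List<String> prepend(List<String> lines, String value) {
        List<String> out = new ArrayList<>(lines.size() + 1);
        out.add(value);
        out.addAll(lines);
        return out;
    }

    // File-based wrapper: reads every line, then rewrites the file with value first.
    static void addStringToFile(Path file, String value) throws IOException {
        Files.write(file, prepend(Files.readAllLines(file), value));
    }
}
```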

MergeTwoInputs

public static void MergeTwoInputs(java.lang.String inputFile1,
                                  java.lang.String inputFile2,
                                  java.lang.String outputFile)
                           throws java.lang.Exception
Merges inputFile1 and inputFile2 into outputFile. Any line starting with "|" is omitted.

Parameters:
inputFile1 - first input filename (will be the first part of outputFile)
inputFile2 - second input filename
outputFile - output filename
Throws:
java.lang.Exception
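
The merge rule (first file, then second file, skipping lines that start with "|") can be sketched on in-memory lines as follows. MergeSketch is a hypothetical name, and treating only a leading "|" at column zero as the marker is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of concatenating two inputs while skipping "|" lines.
final class MergeSketch {
    static List<String> merge(List<String> first, List<String> second) {
        List<String> out = new ArrayList<>();
        for (List<String> source : Arrays.asList(first, second)) {
            for (String line : source) {
                if (!line.startsWith("|")) {
                    out.add(line);
                }
            }
        }
        return out;
    }
}
```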

RemoveRecWithMissingValues

public static void RemoveRecWithMissingValues(java.lang.String inputFile,
                                              java.lang.String outputFile)
                                       throws java.lang.Exception
Removes any records containing '?' from inputFile and writes the result to outputFile. In the process, whitespace between attributes is removed, so each record takes the form (a_1,a_2,...,a_n) for n attributes.

Parameters:
inputFile - input filename
outputFile - output filename
Throws:
java.lang.Exception
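
The two effects described above, dropping records with '?' and normalizing whitespace around values, can be sketched on in-memory records. Treating a '?' anywhere in the line as a missing value is an assumption, and MissingValueSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of filtering records with missing values.
final class MissingValueSketch {
    static List<String> clean(List<String> records) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            if (record.contains("?")) {
                continue; // assumed: any '?' marks the whole record as incomplete
            }
            String[] values = record.split(",", -1);
            for (int i = 0; i < values.length; i++) {
                values[i] = values[i].trim(); // strip whitespace around each value
            }
            out.add(String.join(",", values));
        }
        return out;
    }
}
```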

ShuffleRecords

public static void ShuffleRecords(java.lang.String inputFile,
                                  java.lang.String outputFile)
                           throws java.lang.Exception
Shuffles the records in inputFile and writes the result to outputFile. The seed for the random number generator is fixed to 3345 (easily changeable, but fixed here so that experiments are repeatable).

Parameters:
inputFile - input filename
outputFile - output filename
Throws:
java.lang.Exception
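
A fixed seed makes the shuffle repeatable across runs, which is the point made above. The sketch uses java.util.Collections.shuffle with java.util.Random; whether FileIO relies on the same classes is an assumption, so the resulting order need not match FileIO's output. ShuffleSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of a repeatable shuffle with a fixed seed.
final class ShuffleSketch {
    static final long SEED = 3345L; // seed value quoted in the description above

    // Returns a shuffled copy; the same input always yields the same order.
    static List<String> shuffle(List<String> records) {
        List<String> copy = new ArrayList<>(records);
        Collections.shuffle(copy, new Random(SEED));
        return copy;
    }
}
```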

SplitIntoPartitions

public static void SplitIntoPartitions(java.lang.String inputFile,
                                       java.lang.String[] outputFiles)
                                throws java.lang.Exception
Divides an input file into equal-length output files with the given filenames. The primary purpose is experiments with multiple data holders: no tuples are discarded, and the last output file receives any remainder tuples.

Parameters:
inputFile - input filename
outputFiles - array of output filenames
Throws:
java.lang.Exception
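
The partitioning rule (equal-length parts, remainder in the last one) can be sketched with integer division. The sketch returns in-memory partitions instead of writing files, and PartitionSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting records into equal parts plus a remainder.
final class PartitionSketch {
    static List<List<String>> split(List<String> records, int parts) {
        int size = records.size() / parts; // base length of every partition
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            int from = i * size;
            // The last partition runs to the end, absorbing any remainder.
            int to = (i == parts - 1) ? records.size() : from + size;
            out.add(new ArrayList<>(records.subList(from, to)));
        }
        return out;
    }
}
```

With 7 records and 3 parts, this yields partitions of 2, 2, and 3 records.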

PrepareCrossValidationData

public static void PrepareCrossValidationData(java.lang.String inputFile,
                                              java.lang.String outputFile1,
                                              java.lang.String outputFile2,
                                              int numFolds,
                                              int currFold)
                                       throws java.lang.Exception
Partitions the input file into training and test datasets according to the number of folds and the current fold. For example, suppose the input file contains 101 records and the experiment is 4-fold. When currFold = 1, the test dataset holds the first 25 tuples and the training dataset holds the remaining 76. When currFold = 2, the test dataset holds tuples (25-50], and the training dataset holds the first 25 tuples plus all tuples after the 50th (76 in total).

Parameters:
inputFile - input filename
outputFile1 - training data output filename
outputFile2 - test data output filename
numFolds - number of folds (> 0)
currFold - current fold, in the range [1, numFolds]
Throws:
java.lang.Exception
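
The 101-record, 4-fold example above can be reproduced with the sketch below. It assumes each test fold has exactly floor(n / numFolds) records, so leftover records always stay in the training set; the documentation does not say how the last fold treats the remainder, so that part is an assumption. CrossValidationSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a training/test split for one cross-validation fold.
final class CrossValidationSketch {
    // Returns a two-element list: index 0 is the training set, index 1 the test set.
    static List<List<String>> split(List<String> records, int numFolds, int currFold) {
        int foldSize = records.size() / numFolds;
        int from = (currFold - 1) * foldSize; // currFold is 1-based
        int to = from + foldSize;
        List<String> test = new ArrayList<>(records.subList(from, to));
        List<String> training = new ArrayList<>(records.subList(0, from));
        training.addAll(records.subList(to, records.size()));
        return Arrays.asList(training, test);
    }
}
```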

CountNonEmptyLines

public static int CountNonEmptyLines(java.lang.String inputFile)
                              throws java.lang.Exception
Counts and returns the number of non-empty lines

Parameters:
inputFile - input filename
Returns:
number of non-empty lines in the input file
Throws:
java.lang.Exception
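
Counting non-empty lines is a simple filter. The sketch below treats whitespace-only lines as empty, which is an assumption; the documentation does not say how such lines are counted. LineCountSketch is a hypothetical name.

```java
import java.util.List;

// Hypothetical illustration of counting non-empty lines.
final class LineCountSketch {
    static int countNonEmpty(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) { // assumed: whitespace-only counts as empty
                count++;
            }
        }
        return count;
    }
}
```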

ReplaceWSWithUnderScore

public static void ReplaceWSWithUnderScore(java.lang.String inputFile,
                                           java.lang.String outputFile)
                                    throws java.lang.Exception
Replaces all space characters within attribute values with the underscore character

Parameters:
inputFile - input filename
outputFile - output filename
Throws:
java.lang.Exception
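
Replacing spaces inside attribute values keeps multi-word values intact as single tokens. The sketch below assumes comma-separated records and trims whitespace around each value before replacing the inner spaces; the trimming step is an assumption, not something the documentation states. UnderscoreSketch is a hypothetical name.

```java
// Hypothetical illustration of replacing spaces within attribute values.
final class UnderscoreSketch {
    static String replaceSpaces(String record) {
        String[] values = record.split(",", -1);
        for (int i = 0; i < values.length; i++) {
            // Assumed: surrounding whitespace is dropped, inner spaces become '_'.
            values[i] = values[i].trim().replace(' ', '_');
        }
        return String.join(",", values);
    }
}
```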

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Removes whitespace from the dataset (in between attribute values), removes any record with unknown values ("?"), and outputs the number of records in the final version. If numLines is specified as a parameter, shuffles the records and grips the first numLines records.

Parameters:
inputFilename - input filename
outputFilename - output filename
numLines - (int, optional) number of records to keep after shuffling
Throws:
java.lang.Exception