Read any file efficiently in Java as a String

I'm working on a simple implementation of Huffman coding. It works fine for files that use some form of text encoding, but when I try to read any other format (e.g. .mp4, .png, .exe) it still works but becomes extremely slow (minutes instead of less than a second for a file of the same size).

My question: is there another method I should be using to read these files, so that the read speed depends on the size of the file and not on its format? If so, what is it? Thanks.

This is my IO class. It uses a FileReader wrapped in a BufferedReader to read files based on a path entered in the console.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class IO {
    public String readFile(String path, boolean includeNewLine) {
        String returnString = "";
        try {
            FileReader fileReader = new FileReader(path);

            BufferedReader bufferedReader = new BufferedReader(fileReader);

            String line;
            int nLines = 0;
            while((line = bufferedReader.readLine()) != null) {
                if(nLines > 0 && includeNewLine) {
                    returnString += "\n";
                }
                returnString += line;
                nLines++;
            }   

            bufferedReader.close();         
        } catch(FileNotFoundException e) {
            System.out.println("Unable to open file '" + path + "'");                
        } catch(IOException e) {
            System.out.println("Error reading file '" + path + "'");                  
        }

        return returnString;
    }
}

3 answers

  • answered 2018-04-14 15:13 SMA

    With returnString += line, every iteration creates a new String instance that copies everything read so far before appending the new line. Instead, I would suggest using a StringBuilder, as follows:

    StringBuilder fileContent = new StringBuilder();
    //do your stuff
    fileContent.append(line);
    

    This way you keep reusing the same builder object. Also, if you are reading binary content, it is better to use a class from the InputStream hierarchy.

    There is also the Files class from the java.nio.file package, which you could use to get the lines instead:

    try (Stream<String> stream = Files.lines( Paths.get(filePath), StandardCharsets.UTF_8)) {
        stream.forEach(s -> fileContent.append(s).append("\n"));
    }
    

    Another option is to use the already tested code in the Apache Commons IO API: FileUtils.readFileToString.
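
    Putting the StringBuilder suggestion together with the original readFile method could look roughly like the sketch below. The class name TextIO is made up for illustration, and UTF-8 is an assumed encoding. Note that Files.newBufferedReader throws a MalformedInputException when it meets bytes that are not valid in the chosen charset, so this only suits genuine text files.

    ```java
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Hypothetical helper class; the name TextIO is not from the question.
    class TextIO {
        // Reads a text file into one String, accumulating lines in a single StringBuilder.
        static String readFile(String path, boolean includeNewLine) throws IOException {
            StringBuilder fileContent = new StringBuilder();
            // try-with-resources closes the reader even if an exception is thrown.
            // UTF-8 is an assumption; pass the charset your files actually use.
            try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
                String line;
                boolean first = true;
                while ((line = reader.readLine()) != null) {
                    // Re-insert the separator that readLine() strips, except before the first line.
                    if (!first && includeNewLine) {
                        fileContent.append('\n');
                    }
                    fileContent.append(line);
                    first = false;
                }
            }
            return fileContent.toString();
        }
    }
    ```

    Letting the IOException propagate, instead of printing a message and returning an empty string, is a design choice of this sketch: the caller can then tell a failed read from an empty file.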

  • answered 2018-04-14 15:14 Roni Kurtberg

    Maybe this will help: FileInputStream vs FileReader

    And, of course, change your method to use StringBuilder (but that's another issue).
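
    For a binary file, a minimal sketch of the FileInputStream route might look like this. The class name ByteIO and the 8 KiB chunk size are arbitrary choices for illustration. No character decoding takes place, so every byte arrives unchanged and the running time depends only on the file size.

    ```java
    import java.io.BufferedInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Hypothetical helper class; the name ByteIO is not from the question.
    class ByteIO {
        // Reads the whole file as raw bytes, without interpreting them as characters.
        static byte[] readBytes(String path) throws IOException {
            try (InputStream in = new BufferedInputStream(new FileInputStream(path));
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                byte[] chunk = new byte[8192]; // arbitrary chunk size
                int n;
                // read() returns the number of bytes placed in chunk, or -1 at end of file.
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
                return out.toByteArray();
            }
        }
    }
    ```

    If the file comfortably fits in memory, Files.readAllBytes(Paths.get(path)) from java.nio.file gives the same result in one call.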

  • answered 2018-04-14 15:28 M. le Rutte

    As long as you are trying to interpret the file as a String, you will run into efficiency problems. A binary file may never contain a byte that is interpreted as an end-of-line character ('\n'), so a single "line" can become huge, in the extreme case approaching the maximum length a String can hold (Integer.MAX_VALUE characters).

    You should interpret your file as a sequence of bytes. Use a memory-mapped ByteBuffer for maximum efficiency.
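
    A rough sketch of the memory-mapped approach, applied to the byte-frequency count that Huffman coding needs as its first step, is shown below. The class and method names are invented for illustration. A single mapping cannot exceed Integer.MAX_VALUE bytes, so the sketch maps larger files in several regions.

    ```java
    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    // Hypothetical helper class; the names here are not from the question.
    class MappedFrequencies {
        // Counts how often each byte value (0..255) occurs in the file.
        static long[] countBytes(String path) throws IOException {
            long[] counts = new long[256];
            try (FileChannel channel =
                     FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
                long size = channel.size();
                long position = 0;
                while (position < size) {
                    // One mapped region may be at most Integer.MAX_VALUE bytes long.
                    long length = Math.min(size - position, Integer.MAX_VALUE);
                    MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                    while (buffer.hasRemaining()) {
                        // Mask to 0..255 because Java bytes are signed.
                        counts[buffer.get() & 0xFF]++;
                    }
                    position += length;
                }
            }
            return counts;
        }
    }
    ```

    The resulting counts array can feed the Huffman tree construction directly, with no String involved at any point.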