
Preface

Hi everyone. The 2022 Spring Festival is coming to an end, and work is resuming all over the country. A friend of mine recently did a small project that used Java to read information from PDF files, so this article documents the process.

PDFBox introduction

Apache PDFBox is an open source, Java-based library for working with PDF documents. It can create new PDF documents, modify existing ones, and extract content from them. Apache PDFBox also includes several command-line tools.

PDF file data is a collection of basic objects: arrays, booleans, dictionaries, numbers, strings, and binary streams.
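To make the object model a little more concrete, here is a minimal sketch that renders a few Java values in PDF object syntax. The class `PdfObjectSketch`, its `Name` wrapper, and the `render` method are hypothetical helpers written for illustration and are not part of PDFBox. Name objects (written `/Name`) are included because dictionary keys use them; streams are left out to keep the sketch short.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PdfObjectSketch {

    // Hypothetical wrapper that marks a value as a PDF name object, e.g. /Pages
    static class Name {
        final String text;

        Name(String text) {
            this.text = text;
        }
    }

    // Renders a Java value in PDF object syntax (illustration only)
    static String render(Object value) {
        if (value instanceof Boolean || value instanceof Integer) {
            return String.valueOf(value); // true, false, 42
        }
        if (value instanceof Name) {
            return "/" + ((Name) value).text;
        }
        if (value instanceof String) {
            // Literal strings are wrapped in parentheses; backslashes and
            // parentheses inside the text are escaped with a backslash
            String escaped = ((String) value)
                    .replace("\\", "\\\\")
                    .replace("(", "\\(")
                    .replace(")", "\\)");
            return "(" + escaped + ")";
        }
        if (value instanceof List) {
            // Arrays are written between square brackets
            StringBuilder sb = new StringBuilder("[");
            for (Object item : (List<?>) value) {
                sb.append(' ').append(render(item));
            }
            return sb.append(" ]").toString();
        }
        if (value instanceof Map) {
            // Dictionaries are written between << and >>; each key is a name
            StringBuilder sb = new StringBuilder("<<");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sb.append(" /").append(e.getKey()).append(' ').append(render(e.getValue()));
            }
            return sb.append(" >>").toString();
        }
        throw new IllegalArgumentException("Unsupported type: " + value);
    }

    public static void main(String[] args) {
        Map<String, Object> pages = new LinkedHashMap<String, Object>();
        pages.put("Type", new Name("Pages"));
        pages.put("Count", 3);
        System.out.println(render(pages));                          // << /Type /Pages /Count 3 >>
        System.out.println(render(Arrays.asList(1, true, "a(b)"))); // [ 1 true (a\(b\)) ]
    }
}
```

A library such as PDFBox hides this syntax behind its own object model, so application code rarely needs to write it by hand.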

Development environment

The versions used to read and process PDF files with Java and PDFBox are as follows:

JDK 1.8
Spring Boot 2.3.0.RELEASE
PDFBox 1.8.13

PDFBox dependency

The PDFBox dependency must be added to the project before first use. The dependency used here is:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.8.13</version>
</dependency>

Quick start

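This example reads the text of every PDF file in a specified directory and writes it to a TXT file in that directory. It then reformats the extracted text, which the code treats as multiple-choice questions with options A to D, and writes the result to a second TXT file.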

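import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

class PdfTest {

    public static void main(String[] args) throws Exception {
        // Directory that holds the PDF files (for example cxy1.pdf). It must end with a
        // separator because the output paths are built by string concatenation.
        String basePath = "C:\\Users\\Admin\\Desktop\\";

        List<String> list = getFiles(basePath);
        for (String filePath : list) {
            long ltime = System.currentTimeMillis();
            // File name without the directory and without the extension
            String substring = filePath.substring(filePath.lastIndexOf("\\") + 1, filePath.lastIndexOf("."));
            String project = "(juejin.cn)";
            // Step 1: extract the raw text and write it to "<name>--<timestamp>.txt"
            String textFromPdf = getTextFromPdf(filePath);
            String s = writterTxt(textFromPdf, substring + "--", ltime, basePath);
            // Step 2: reformat the raw text and write it to "<name>-<timestamp>.txt"
            StringBuffer stringBuffer = readerText(s, project);
            writterTxt(stringBuffer.toString(), substring + "-", ltime, basePath);
        }
        System.out.println("******************** end ************************");
    }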

    public static List<String> getFiles(String path) {
        List<String> files = new ArrayList<String>();
        File file = new File(path);
        File[] tempList = file.listFiles();
        // listFiles() returns null when the path is not a readable directory
        if (tempList == null) {
            return files;
        }

        for (int i = 0; i < tempList.length; i++) {
            if (tempList[i].isFile()) {
                // Match the extension case-insensitively, at the end of the name only
                if (tempList[i].getName().toLowerCase().endsWith(".pdf")) {
                    files.add(tempList[i].toString());
                }
                // The file name does not contain the path
                //String fileName = tempList[i].getName();
            }
            if (tempList[i].isDirectory()) {
                // Subdirectories are skipped; there is no recursion
            }
        }
        return files;
    }

    public static String getTextFromPdf(String filePath) throws Exception {
        String result = null;
        FileInputStream is = null;
        PDDocument document = null;
        try {
            is = new FileInputStream(filePath);
            // Parse the PDF stream into a document model
            PDFParser parser = new PDFParser(is);
            parser.parse();
            document = parser.getPDDocument();
            // Extract the text of the whole document
            PDFTextStripper stripper = new PDFTextStripper();
            result = stripper.getText(document);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Release the stream and the document even if parsing failed
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (document != null) {
                try {
                    document.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return result;
    }


    public static String writterTxt(String data, String text, long l, String basePath) {
        String fileName = null;
        try {
            if (text == null) {
                fileName = basePath + "javaio-" + l + ".txt";
            } else {
                fileName = basePath + text + l + ".txt";
            }

            File file = new File(fileName);
            // If the file does not exist, create it
            if (!file.exists()) {
                file.createNewFile();
            }
            // Opened without the append flag, so existing content is overwritten
            OutputStream outputStream = new FileOutputStream(file);
            // Append-mode alternative (true = append to the file):
            // FileWriter fileWritter = new FileWriter(file.getName(), true);
            // fileWritter.write(data);
            // fileWritter.close();
            OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream);
            outputStreamWriter.write(data);
            outputStreamWriter.close();
            outputStream.close();
            System.out.println("Done");
        } catch (IOException e) {
            e.printStackTrace();
        }

        return fileName;
    }

    public static StringBuffer readerText(String name, String project) {
        // Use a StringBuffer to collect the reformatted text
        StringBuffer stringBuffer = new StringBuffer();
        try {
            FileReader fr = new FileReader(name);
            BufferedReader bf = new BufferedReader(fr);
            String str;
            // Reads the file line by line
            while ((str = bf.readLine()) != null) {
                str = replaceAll(str);
                if (str.contains("D、") || str.contains("D.")) {
                    // Last option: end the question and leave room for the answer
                    stringBuffer.append(str);
                    stringBuffer.append("\n");
                    stringBuffer.append("Reference: \n");
                    stringBuffer.append("Reference: \n");
                    stringBuffer.append("\n\n\n\n");
                } else if (str.contains("A、") || str.contains("A.")) {
                    // First option: replace the last character of the question text
                    // with a full stop followed by the project tag
                    if (stringBuffer.length() > 0) {
                        stringBuffer.deleteCharAt(stringBuffer.length() - 1);
                    }
                    stringBuffer.append("。" + project + "\n");
                    stringBuffer.append(str + "\n");
                } else if (str.contains("B、") || str.contains("C、") || str.contains("B.") || str.contains("C.")) {
                    stringBuffer.append(str + "\n");
                } else {
                    // Question text: lines are joined without a line break
                    stringBuffer.append(str);
                }

            }
            bf.close();
            fr.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return stringBuffer;
    }

    public static String replaceAll(String str) {
        // Remove every occurrence of the unwanted word from the line
        return str.replaceAll("Network", "");
    }
}
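The reformatting rules in readerText are the least obvious part of the example, so here is a minimal sketch of the same idea as a pure function that needs no files and no PDFBox. The class `QuizFormatterSketch` and the methods `isOption` and `format` are hypothetical names introduced for illustration. Two deliberate differences from the code above are assumptions rather than the author's method: option lines are matched with `startsWith` instead of `contains`, and only one "Reference:" placeholder is written per question.

```java
import java.util.Arrays;
import java.util.List;

class QuizFormatterSketch {

    // U+3001, the ideographic comma that may follow an option letter
    private static final String IDEOGRAPHIC_COMMA = "\u3001";
    // U+3002, the ideographic full stop placed before the tag
    private static final String IDEOGRAPHIC_FULL_STOP = "\u3002";

    // True when the line starts with an option label such as "A." or "A" + U+3001
    static boolean isOption(String line, char letter) {
        return line.startsWith(letter + ".") || line.startsWith(letter + IDEOGRAPHIC_COMMA);
    }

    // Joins wrapped question text into one line, tags it, and lists options A to D
    static String format(List<String> lines, String tag) {
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            if (isOption(line, 'D')) {
                // Last option: end the question and leave room for the answer
                out.append(line).append("\n");
                out.append("Reference: \n");
                out.append("\n\n\n\n");
            } else if (isOption(line, 'A')) {
                // First option: the question text is complete, so replace its final
                // character with a full stop followed by the tag
                if (out.length() > 0) {
                    out.deleteCharAt(out.length() - 1);
                }
                out.append(IDEOGRAPHIC_FULL_STOP).append(tag).append("\n");
                out.append(line).append("\n");
            } else if (isOption(line, 'B') || isOption(line, 'C')) {
                out.append(line).append("\n");
            } else {
                // Question text: joined without a line break
                out.append(line);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1. Which keyword ",
                "declares a class?",
                "A. class", "B. def", "C. func", "D. type");
        System.out.print(format(lines, "(juejin.cn)"));
    }
}
```

Keeping the rules in a function that takes and returns plain strings makes them easy to check against a few sample lines before running them over a whole directory of PDF files.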

Conclusion

Well, that covers reading PDF files in Java with PDFBox. Thank you for reading. I hope you liked it, and if it helped you, likes and bookmarks are welcome. If anything is lacking, comments and corrections are welcome too. See you next time.

About the author: [Little Ajie] is a programmer who loves tinkering, and a Java developer and enthusiast. He maintains the official account [Java full stack architect]; you are welcome to follow it, read along, and get in touch.