How do you parse a Word document into a rich text editor in Java?

2025-05-19 23:50:04 JAVA 3825

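In Java, parsing a Word document (a .doc or .docx file) and loading its content into a rich text component (such as Swing's JEditorPane or JavaFX's HTMLEditor) usually takes three steps: read the Word document, convert its content to a suitable format (HTML in this article), and load the result into the editor. The steps and code examples follow.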

1. Dependencies

Word documents are usually read with either the Apache POI library or the docx4j library.

Apache POI

Maven dependencies:

xml
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.3</version> <!-- or the latest version -->
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version> <!-- or the latest version -->
</dependency>

Gradle dependencies:

groovy
implementation 'org.apache.poi:poi:5.2.3'
implementation 'org.apache.poi:poi-ooxml:5.2.3'
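
The two artifacts above cover only the .docx object model. As a hedged sketch, the later steps appear to need two more Maven dependencies: poi-scratchpad, which contains HWPFDocument for legacy .doc files, and the XDocReport converter module, which provides XHTMLConverter. The converter version shown is an assumption, so check Maven Central for a release that matches your POI version. Its 2.x line uses the package prefix fr.opensagres.poi.xwpf.converter, while the older 1.x line used org.apache.poi.xwpf.converter.

```xml
<!-- HWPF (.doc) support lives in the scratchpad artifact -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>5.2.3</version>
</dependency>
<!-- XHTMLConverter is provided by XDocReport, not by Apache POI; the version is an assumption -->
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.xhtml</artifactId>
    <version>2.0.4</version>
</dependency>
```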

docx4j

Maven dependency:

xml
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j</artifactId>
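    <version>11.2.7</version> <!-- or the latest version; docx4j 8 and later are split into docx4j-core plus a docx4j-JAXB-* artifact, so verify these coordinates -->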
</dependency>

Gradle dependency:

groovy
implementation 'org.docx4j:docx4j:11.2.7'

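2. Reading the Word document

The example below reads a .docx file with Apache POI and converts it to HTML. Note that XHTMLConverter is not part of Apache POI itself: it comes from the XDocReport project's POI converter module, which has to be added as a separate dependency. For .doc files, use the HWPFDocument class (shipped in the poi-scratchpad artifact) instead of XWPFDocument.

Reading a `.docx` file with Apache POI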

java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
// XHTMLConverter and XHTMLOptions come from the XDocReport converter module
// (package fr.opensagres.poi.xwpf.converter.xhtml in its 2.x line), not from Apache POI.
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class WordToHTMLConverter {

    public static void main(String[] args) {
        // try-with-resources closes both streams and the document, even on failure
        try (InputStream in = new FileInputStream("example.docx");
             XWPFDocument document = new XWPFDocument(in);
             OutputStream os = new FileOutputStream("output.html")) {

            // Convert with the default options and write the HTML to output.html
            XHTMLOptions options = XHTMLOptions.create();
            XHTMLConverter.getInstance().convert(document, os, options);

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
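
Because .doc and .docx need different POI classes, it can help to check which container a file really is before opening it, rather than trusting the extension. The sketch below does this with the JDK alone by looking at the leading bytes: legacy .doc files are OLE2 compound files, and .docx files are ZIP packages. The class, enum, and method names are illustrative, not part of POI. The check identifies only the container, so an .xls or an arbitrary ZIP archive would match as well. POI has its own FileMagic helper for the same purpose.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class WordFormatSniffer {

    /** Container formats relevant to Word files (illustrative names). */
    public enum Format { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature, used by legacy .doc files
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature "PK\3\4", used by .docx packages
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a file by its leading bytes. */
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2)) return Format.DOC;
        if (startsWith(header, ZIP)) return Format.DOCX;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Reads the first 8 bytes of the file and classifies them (Java 11+). */
    public static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    public static void main(String[] args) throws IOException {
        Format format = detect(Path.of(args.length > 0 ? args[0] : "example.docx"));
        switch (format) {
            case DOC:  System.out.println("Use HWPFDocument (poi-scratchpad)"); break;
            case DOCX: System.out.println("Use XWPFDocument (poi-ooxml)"); break;
            default:   System.out.println("Not a recognised Word container");
        }
    }
}
```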

3. Loading the HTML into the rich text editor

Displaying the HTML with `JEditorPane`

java
import javax.swing.*;
import java.io.IOException;

public class HTMLViewer {

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("HTML Viewer");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

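            // JEditorPane's HTML support is limited (HTML 3.2 and a subset of CSS),
            // so complex Word formatting may not render faithfully.
            JEditorPane editorPane = new JEditorPane();
            editorPane.setContentType("text/html");

            try {
                // setPage takes a URL; replace the placeholder with the real location of the HTML file
                editorPane.setPage("file:///path/to/output.html");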
            } catch (IOException e) {
                e.printStackTrace();
            }

            frame.add(new JScrollPane(editorPane));
            frame.setSize(800, 600);
            frame.setVisible(true);
        });
    }
}
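
If the converted HTML is already in memory (for example, because the converter wrote to a ByteArrayOutputStream instead of a file), the intermediate file can be skipped. The following is a minimal sketch of that variant using setText. The class and method names are hypothetical, and plainText exists only to make the result easy to inspect.

```java
import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.SwingUtilities;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

public class HtmlStringLoader {

    /** Builds an editable pane whose content is the given HTML markup. */
    public static JEditorPane createEditor(String html) {
        JEditorPane pane = new JEditorPane();
        pane.setContentType("text/html"); // set before setText so the HTML kit parses the markup
        pane.setText(html);               // parsed immediately, whereas setPage may load asynchronously
        return pane;
    }

    /** Returns the document text without markup, handy for a quick sanity check. */
    public static String plainText(JEditorPane pane) {
        Document doc = pane.getDocument();
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>Word</b></p></body></html>";
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("HTML from a string");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new JScrollPane(createEditor(html)));
            frame.setSize(800, 600);
            frame.setVisible(true);
        });
    }
}
```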

Using JavaFX's `HTMLEditor`

java
import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.layout.BorderPane;
import javafx.scene.web.HTMLEditor; // part of the javafx.web module
import javafx.stage.Stage;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HTMLViewerFX extends Application {

    @Override
    public void start(Stage primaryStage) {
        primaryStage.setTitle("HTML Viewer");

        BorderPane root = new BorderPane();
        HTMLEditor htmlEditor = new HTMLEditor();

        try {
            // Read the whole file with an explicit charset instead of the platform default
            // (Java 11+; assumes the converter wrote UTF-8)
            String htmlContent = Files.readString(Path.of("output.html"), StandardCharsets.UTF_8);
            htmlEditor.setHtmlText(htmlContent);
        } catch (IOException e) {
            e.printStackTrace();
        }

        root.setCenter(htmlEditor);
        primaryStage.setScene(new Scene(root, 800, 600));
        primaryStage.show();
    }

    public static void main(String[] args) {
        launch(args);
    }
}

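4. Summary

Dependencies: use Apache POI or docx4j to read the Word document.
Reading the Word document: convert the .docx file to HTML.
Loading the HTML: display it with JEditorPane or JavaFX's HTMLEditor.
Path configuration: make sure the path to the HTML file is correct.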
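
On the path point, a hand-written file:/// string is easy to get wrong, especially with relative paths or spaces in file names. One option, sketched here with a hypothetical helper name, is to let the JDK build the URL from a Path:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;

public class HtmlPathResolver {

    /** Converts a relative or absolute file path into a file: URL. */
    public static URL toPageUrl(String path) {
        try {
            // toAbsolutePath() resolves against the working directory;
            // toUri() percent-encodes characters such as spaces
            return Path.of(path).toAbsolutePath().toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid file path: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPageUrl("output.html"));
    }
}
```

The resulting URL can be passed to the setPage(URL) overload of JEditorPane. Checking Files.exists on the path first gives a clearer error than a failed page load.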

With these steps, you can parse a Word document and display it in a rich text editor.