Apache POI: Rendering PPT Slides to Images, the Ultimate Fix for Garbled Chinese Text

December 11, 2019

The garbled-text problem

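A recent project required converting PPT slides into image files and writing them into a PDF. All of our company's projects are written in Java, so Apache POI was the natural choice. Along the way I hit a thorny problem: some of the Chinese text came out garbled, as shown in the image below.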

image

Existing solutions

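A Baidu search turned up plenty of solutions, and at first I was excited, thinking the problem would be solved quickly. In the end, though, a small portion of the text still came out garbled, and I could not figure out why.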

So I started analyzing in earnest why those solutions could not fix all of the garbled text. Let's look at what they do.

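import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFShape;
import org.apache.poi.xslf.usermodel.XSLFSlide;
import org.apache.poi.xslf.usermodel.XSLFTextParagraph;
import org.apache.poi.xslf.usermodel.XSLFTextRun;
import org.apache.poi.xslf.usermodel.XSLFTextShape;

// ppt is assumed to be an already loaded XMLSlideShow
for (XSLFSlide slide : ppt.getSlides()) { // iterate over every slide
    for (XSLFShape shape : slide.getShapes()) { // iterate over the shapes on one slide
        if (shape instanceof XSLFTextShape) { // only text shapes are handled
            XSLFTextShape txtshape = (XSLFTextShape) shape;
            // iterate over the paragraphs of the text shape
            for (XSLFTextParagraph textPara : txtshape.getTextParagraphs()) {
                // iterate over the text runs of the paragraph
                for (XSLFTextRun textRun : textPara.getTextRuns()) {
                    // set the font of this run to 宋体 (SimSun)
                    textRun.setFontFamily("宋体");
                }
            }
        }
    }
}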

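I added some comments to the code. Broadly, it walks the elements of each slide and sets 宋体 (SimSun) on every text run, one by one. That is also why it solves the problem for many people.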

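For me, however, this code did not fix all of the garbled text. In the end I had no choice but to step through the POI source in a debugger, and I finally found the culprit. The code above contains this line: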

if (shape instanceof XSLFTextShape) // a text shape; this is the only kind of shape the code handles

It handles only one kind of shape, yet in the POI source I found the full list of shapes:

public Drawable getDrawable(Shape<?, ?> shape) {
    if (shape instanceof TextBox) {
        return this.getDrawable((TextBox)shape);
    } else if (shape instanceof FreeformShape) {
        return this.getDrawable((FreeformShape)shape);
    } else if (shape instanceof TextShape) {
        return this.getDrawable((TextShape)shape);
    } else if (shape instanceof TableShape) {
        return this.getDrawable((TableShape)shape);
    } else if (shape instanceof GroupShape) {
        return this.getDrawable((GroupShape)shape);
    } else if (shape instanceof PictureShape) {
        return this.getDrawable((PictureShape)shape);
    } else if (shape instanceof GraphicalFrame) {
        return this.getDrawable((GraphicalFrame)shape);
    } else if (shape instanceof Background) {
        return this.getDrawable((Background)shape);
    } else if (shape instanceof ConnectorShape) {
        return this.getDrawable((ConnectorShape)shape);
    } else if (shape instanceof Slide) {
        return this.getDrawable((Slide)shape);
    } else if (shape instanceof MasterSheet) {
        return this.getDrawable((MasterSheet)shape);
    } else if (shape instanceof Sheet) {
        return this.getDrawable((Sheet)shape);
    } else if (shape.getClass().isAnnotationPresent(DrawNotImplemented.class)) {
        return new DrawNothing(shape);
    } else {
        throw new IllegalArgumentException("Unsupported shape type: " + shape.getClass());
    }
}

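Judging by the names, text in a PPT should live only in TextShape, but in practice text turned up in other shapes as well. So I realized right away that, to prevent every case of garbled text, I would have to traverse all of these shapes and set 宋体 on each one. That is clearly no small amount of work.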
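To make the gap concrete, here is a toy model of a slide's shape tree. It deliberately does not use POI: `Node`, `TextBox`, `Group` and `Table` are hypothetical stand-ins for things like XSLFTextShape, XSLFGroupShape and XSLFTable, and the real classes expose their text differently. The sketch only shows that text nested inside groups or table cells is skipped unless the walk recurses. The font is written as SimSun, the English family name of 宋体.

```java
import java.util.Arrays;
import java.util.List;

class ShapeWalk {
    // Hypothetical stand-ins for a slide's shape tree; these are NOT POI classes.
    interface Node {}

    static class TextBox implements Node {
        String font;
        TextBox(String font) { this.font = font; }
    }

    static class Group implements Node {
        final List<Node> children;
        Group(Node... children) { this.children = Arrays.asList(children); }
    }

    static class Table implements Node {
        final List<TextBox> cells;
        Table(TextBox... cells) { this.cells = Arrays.asList(cells); }
    }

    // Flat approach: only top-level text boxes receive the new font.
    static int setFontFlat(List<? extends Node> shapes, String font) {
        int changed = 0;
        for (Node node : shapes) {
            if (node instanceof TextBox) {
                ((TextBox) node).font = font;
                changed++;
            }
        }
        return changed;
    }

    // Recursive approach: also descend into groups and table cells.
    static int setFontDeep(List<? extends Node> shapes, String font) {
        int changed = 0;
        for (Node node : shapes) {
            if (node instanceof TextBox) {
                ((TextBox) node).font = font;
                changed++;
            } else if (node instanceof Group) {
                changed += setFontDeep(((Group) node).children, font);
            } else if (node instanceof Table) {
                changed += setFontDeep(((Table) node).cells, font);
            }
        }
        return changed;
    }

    // One top-level text box, one grouped text box, and a table with two cells.
    static List<Node> sampleSlide() {
        return Arrays.<Node>asList(
                new TextBox("Arial"),
                new Group(new TextBox("Arial")),
                new Table(new TextBox("Arial"), new TextBox("Arial")));
    }

    public static void main(String[] args) {
        System.out.println("flat: " + setFontFlat(sampleSlide(), "SimSun")); // flat: 1
        System.out.println("deep: " + setFontDeep(sampleSlide(), "SimSun")); // deep: 4
    }
}
```

In this model the flat loop reaches one of the four text boxes. A real traversal over POI shapes would need a branch for every container type that can hold text, which is the workload described above.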

The final solution

Just as I was about to write a loop that traverses every Shape, I came across something magical in the POI source:

// DrawFontManager: isn't that a font manager?
public DrawFontManager getFontManager(Graphics2D graphics) {
    // it is looked up through graphics.getRenderingHint
    DrawFontManager fontHandler = (DrawFontManager)graphics.getRenderingHint(Drawable.FONT_HANDLER);
    return (DrawFontManager)(fontHandler != null ? fontHandler : new DrawFontManagerDefault());
}

The font manager caught my attention immediately. Better still, it is obtained through graphics.getRenderingHint, which means we can supply our own font manager!
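The mechanism is plain Java2D: a Graphics2D can carry arbitrary objects under custom RenderingHints.Key instances, and the renderer looks them up when it draws. The sketch below reproduces the "use the hint if present, otherwise a default" pattern with the JDK alone. `FontManager` and the local `FONT_HANDLER` key are hypothetical stand-ins, not POI's DrawFontManager or Drawable.FONT_HANDLER, and SimSun is used as the English family name of 宋体.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class HintLookupDemo {
    // Hypothetical stand-in for a font manager; not POI's DrawFontManager.
    interface FontManager {
        String mapFont(String typeface);
    }

    // Default behaviour: leave the typeface unchanged.
    static final FontManager DEFAULT = typeface -> typeface;

    // A custom hint key, playing the role Drawable.FONT_HANDLER plays in POI.
    static final RenderingHints.Key FONT_HANDLER = new RenderingHints.Key(1) {
        @Override
        public boolean isCompatibleValue(Object value) {
            return value instanceof FontManager;
        }
    };

    // Same pattern as getFontManager: prefer the hint, fall back to a default.
    static FontManager fontManager(Graphics2D graphics) {
        Object handler = graphics.getRenderingHint(FONT_HANDLER);
        if (handler != null) {
            return (FontManager) handler;
        }
        return DEFAULT;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = image.createGraphics();
        // No hint set yet, so the default manager answers.
        System.out.println(fontManager(graphics).mapFont("Arial")); // Arial
        // Install a custom manager that maps every typeface to SimSun.
        graphics.setRenderingHint(FONT_HANDLER, (FontManager) typeface -> "SimSun");
        System.out.println(fontManager(graphics).mapFont("Arial")); // SimSun
        graphics.dispose();
    }
}
```

Because the hint travels with the Graphics2D, whatever is installed there applies to everything drawn through it, regardless of which shape the text sits in.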

Next I dug into the DrawFontManager font manager itself and found this in its source:

// font mapping: maps one font to another
public FontInfo getMappedFont(Graphics2D graphics, FontInfo fontInfo) {
    return this.getFontWithFallback(graphics, Drawable.FONT_MAP, fontInfo);
}

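Heaven was on my side: there is a font-mapping feature, so I no longer need to traverse all those shapes! I can simply map every font to 宋体. The code is as follows: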

kotlin

graphics.setRenderingHint(Drawable.FONT_HANDLER, object : DrawFontManagerDefault() {
    override fun getMappedFont(graphics: Graphics2D?, fontInfo: FontInfo?): FontInfo {
        try {
            // map every font to 宋体 (SimSun)
            fontInfo?.typeface = "宋体"
        } catch (e: Exception) {
            // some fonts are read-only and throw on assignment; ignore them
        }
        return super.getMappedFont(graphics, fontInfo)
    }
})

java

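import java.awt.Graphics2D;
import org.apache.poi.common.usermodel.fonts.FontInfo;
import org.apache.poi.sl.draw.DrawFontManagerDefault;
import org.apache.poi.sl.draw.Drawable;

graphics.setRenderingHint(Drawable.FONT_HANDLER, new DrawFontManagerDefault() {
    @Override
    public FontInfo getMappedFont(Graphics2D graphics, FontInfo fontInfo) {
        try {
            // map every font to 宋体 (SimSun)
            fontInfo.setTypeface("宋体");
        } catch (Exception e) {
            // some fonts are read-only and throw on assignment; ignore them
        }
        return super.getMappedFont(graphics, fontInfo);
    }
});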
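One caveat worth checking: mapping everything to 宋体 only helps if the JVM can actually find that font. As far as I know, Java2D silently substitutes its logical Dialog font when a requested family is not installed, which can leave Chinese text unreadable on a server that lacks CJK fonts. The helper below is a minimal sketch for picking a target family that really exists; `FontCheck` and its method names are hypothetical, and only the standard GraphicsEnvironment call is used.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class FontCheck {
    // Font family names Java2D can see on this machine.
    static List<String> installedFamilies() {
        return Arrays.asList(
                GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames());
    }

    // Returns the first wanted family that is installed (case-insensitive),
    // using the spelling reported by the system, or null if none is installed.
    static String firstAvailable(List<String> wanted, Collection<String> installed) {
        for (String family : wanted) {
            for (String candidate : installed) {
                if (candidate.equalsIgnoreCase(family)) {
                    return candidate;
                }
            }
        }
        return null;
    }
}
```

A call such as `FontCheck.firstAvailable(Arrays.asList("宋体", "SimSun"), FontCheck.installedFamilies())` covers both spellings, since the family names are reported localized for the default locale. If it returns null, installing a Chinese font on the machine is probably needed before any mapping can help.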

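With that, the problem is solved completely, and no garbled text is left in any corner! If this code helped you too, remember to give it a like!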

Author: scwang90
Link: https://blog.scwang90.cn/2019/12/11/apache-poi-garbled/index.html
Copyright notice: All articles shared here are licensed under BY-NC-SA. Please credit the source when reposting!