
news 2026/4/16 1:53:35


Tess4J, an OCR Tool Available to Java: Usage Examples

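1. Introduction
- 1.1 Overview
- 1.2 Official Description
2. Usage Examples
- 2.1 Dependency and Language Data Files
- 2.2 Core Code
- 2.3 Recognizing ID Card Information
  - 2.3.1 Core Code
  - 2.3.2 Extracting Specified Tokens
  - 2.3.3 Removing Non-Chinese Characters from a String
  - 2.3.4 Extracting the Date of Birth (Needs Improvement)
  - 2.3.5 Test Run
3. Summary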

1. Introduction

1.1 Overview

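Lept4J and Tess4J are two Java libraries that are commonly used together to recognize text in images:

Lept4J is a Java wrapper for the Leptonica image processing library and provides image loading, processing, and analysis.
Tess4J is a Java wrapper for the Tesseract OCR engine and provides OCR on images, generation of PDF documents, and more.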

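The difference between them is their role: Lept4J mainly handles image preprocessing, while Tess4J performs the recognition itself and produces the output. Their respective features:

Lept4J supports many image formats and can scale, rotate, crop, binarize, and denoise images, which improves image quality and thus the recognition rate.
Tess4J supports recognition in many languages, can write plain text, HTML (hOCR), PDF, and other output formats, and provides several recognition modes and parameter settings for different needs.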

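Depending on the scenario, use Lept4J or Tess4J on its own, or combine the two for the best results. Tess4J itself depends on Lept4J, so adding the Tess4J Maven dependency pulls in Lept4J as well.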
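Binarization, one of the preprocessing steps listed above, turns every pixel into pure black or pure white so that the text stands out from the background. The sketch below illustrates the idea with plain java.awt.image code rather than Lept4J's own API; the class name SimpleBinarizer and the fixed threshold are hypothetical choices for illustration, and Leptonica offers more elaborate thresholding than this single global cutoff.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper (not part of Lept4J or Tess4J): grayscale + fixed-threshold binarization. */
final class SimpleBinarizer {

    private SimpleBinarizer() {
    }

    /** Returns a new image whose pixels are pure black (text) or pure white (background). */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        // TYPE_BYTE_BINARY uses a two-entry black/white palette
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Luma approximation with ITU-R BT.601 weights, in integer arithmetic
                int gray = (299 * r + 587 * g + 114 * b) / 1000;
                // Pixels darker than the threshold become black, the rest white
                out.setRGB(x, y, gray < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

The returned image could then be handed to Tess4J's doOCR(BufferedImage) overload instead of the original file.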

1.2 Official Description

Website: https://tess4j.sourceforge.net/
Description: A Java JNA wrapper for Tesseract OCR API. Tess4J is released and distributed under the Apache License, v2.0 and is also available from Maven Central Repository.
Features: The library provides optical character recognition (OCR) support for:

TIFF, JPEG, GIF, PNG, and BMP image formats
Multi-page TIFF images
PDF document format

2. Usage Examples

2.1 Dependency and Language Data Files

    <!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>5.9.0</version>
    </dependency>

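Language data files can be downloaded from https://github.com/tesseract-ocr/tessdata. Put the files you need (for example chi_sim.traineddata for Simplified Chinese) into the directory passed to setDatapath; the examples below use a relative directory named tessdata.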
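If the data directory does not contain the requested .traineddata file, recognition typically fails at runtime, so it can pay to check up front. Below is a hypothetical helper, TessdataCheck, sketched under two assumptions: each language is stored as <language>.traineddata, and several languages are joined with "+" as in chi_sim+eng.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: checks that language data files exist before running OCR. */
final class TessdataCheck {

    private TessdataCheck() {
    }

    /** True if every language in a "+"-joined list has a <lang>.traineddata file in datapath. */
    static boolean hasLanguages(String datapath, String languages) {
        for (String lang : languages.split("\\+")) {
            if (!Files.isRegularFile(Paths.get(datapath, lang + ".traineddata"))) {
                return false;
            }
        }
        return true;
    }
}
```

For the code in this article, the call would be TessdataCheck.hasLanguages("tessdata", "chi_sim").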

2.2 Core Code

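    import java.io.File;

    import net.sourceforge.tess4j.ITesseract;
    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;

    /**
     * Recognizes the text in an image.
     *
     * @param imagePath path to the image
     * @return the recognized text, or an empty string if recognition fails
     */
    private static String recognitionString(String imagePath) {
        File imageFile = new File(imagePath);
        ITesseract instance = new Tesseract();
        // 1. Directory containing the language data (.traineddata) files
        instance.setDatapath("tessdata");
        // 2. Name of the language data file to load (chi_sim = Simplified Chinese)
        instance.setLanguage("chi_sim");
        String result = "";
        try {
            result = instance.doOCR(imageFile);
        } catch (TesseractException e) {
            e.printStackTrace();
        }
        return result;
    }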
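doOCR returns the whole recognized page as a single string whose lines are separated by line breaks. To pick out individual fields it can help to split that string first; the helper below, OcrText.nonBlankLines (a hypothetical name), is a minimal sketch that trims each line and drops blank ones. Section 2.3 takes a different route and uses getWords instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for post-processing the String returned by doOCR. */
final class OcrText {

    private OcrText() {
    }

    /** Splits OCR output into trimmed lines, dropping blank ones. */
    static List<String> nonBlankLines(String ocrResult) {
        List<String> lines = new ArrayList<>();
        // \R matches any line break sequence, including \r\n as a single break
        for (String line : ocrResult.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }
}
```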

2.3 Recognizing ID Card Information

2.3.1 Core Code

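    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import javax.imageio.ImageIO;

    import net.sourceforge.tess4j.ITesseract;
    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.Word;

    /**
     * Recognizes the information on an ID card.
     *
     * @param imagePath path to the image
     * @return the recognized fields (name, gender, nation, birth, address, no)
     */
    private static Map<String, Object> recognitionIdentityCardInfo(String imagePath) {
        Map<String, Object> res = new HashMap<>();
        // Read the image; give up early if it cannot be decoded
        BufferedImage bufferedImage;
        try {
            bufferedImage = ImageIO.read(new File(imagePath));
        } catch (IOException e) {
            e.printStackTrace();
            return res;
        }
        if (bufferedImage == null) {
            // ImageIO.read returns null when no registered reader supports the format
            return res;
        }
        ITesseract instance = new Tesseract();
        instance.setDatapath("tessdata");
        instance.setLanguage("chi_sim");
        // Level 1 is RIL_PARA (paragraph); each recognized paragraph is treated as one card line
        List<Word> words = instance.getWords(bufferedImage, 1);
        // Name
        int nameLineIndex = 0;
        if (words.size() > nameLineIndex) {
            res.put("name", getStringByIndex(words.get(nameLineIndex).getText(), 2));
        }
        // Gender and ethnic group
        int genderAndNationLineIndex = 1;
        if (words.size() > genderAndNationLineIndex) {
            String line = words.get(genderAndNationLineIndex).getText();
            res.put("gender", getStringByIndex(line, 2, 1));
            res.put("nation", removeNonChinese(getStringByIndex(line, 5, -1)));
        }
        // Date of birth
        int birthLineIndex = 2;
        if (words.size() > birthLineIndex) {
            res.put("birth", extractBirthDate(getStringByIndex(words.get(birthLineIndex).getText(), 2)));
        }
        // Address
        int addressLineIndex = 3;
        if (words.size() > addressLineIndex) {
            res.put("address", getStringByIndex(words.get(addressLineIndex).getText(), 2).replace("/", ""));
        }
        // Citizen ID number
        int noLineIndex = 4;
        if (words.size() > noLineIndex) {
            res.put("no", getStringByIndex(words.get(noLineIndex).getText(), 7));
        }
        return res;
    }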
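The method above reads each field from a fixed position (line 0 for the name, line 1 for gender and ethnic group, and so on), so a single missing or merged line shifts every later field. A possible alternative, sketched under the assumption that printed labels such as 姓名, 出生 and 住址 are themselves recognized correctly, is to look each field up by its label. LabelLookup is a hypothetical helper; its input could be, for instance, the getText() values of the recognized Word objects.

```java
import java.util.List;

/** Hypothetical alternative to fixed line indices: find a field by its printed label. */
final class LabelLookup {

    private LabelLookup() {
    }

    /**
     * Returns the text after the label on the first line whose whitespace-free
     * form starts with that label, or null when no line matches.
     */
    static String valueAfterLabel(List<String> lines, String label) {
        for (String line : lines) {
            // OCR tends to separate Chinese characters with spaces, e.g. "姓 名 代用名"
            String compact = line.replaceAll("\\s+", "");
            if (compact.startsWith(label)) {
                return compact.substring(label.length());
            }
        }
        return null;
    }
}
```

The gender line also carries the ethnic group, so its value (for example 男民族汉) would still need further splitting at 民族.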

2.3.2 Extracting Specified Tokens

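    /**
     * Extracts whitespace-separated tokens from indexStart to the end of the string.
     *
     * @param inputString the string
     * @param indexStart  index of the first token to keep
     * @return the selected tokens concatenated without separators
     */
    private static String getStringByIndex(String inputString, int indexStart) {
        return getStringByIndex(inputString, indexStart, -1);
    }

    /**
     * Extracts whitespace-separated tokens.
     *
     * @param inputString the string
     * @param indexStart  index of the first token to keep
     * @param size        number of tokens to keep; a value <= 0 keeps all remaining tokens
     * @return the selected tokens concatenated without separators
     */
    private static String getStringByIndex(String inputString, int indexStart, int size) {
        // Remove leading and trailing whitespace
        String trimmedString = inputString.trim();
        // Split the string on runs of whitespace
        String[] words = trimmedString.split("\\s+");
        StringBuilder res = new StringBuilder();
        int length = words.length;
        int contentSize = indexStart + size;
        if (length > indexStart) {
            // Stop at indexStart + size when that is within range, otherwise at the last token
            int index = length;
            if (size > 0 && length > contentSize) {
                index = contentSize;
            }
            for (int i = indexStart; i < index; i++) {
                res.append(words[i]);
            }
        }
        return res.toString();
    }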
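The loop in getStringByIndex can also be written more compactly with Arrays.copyOfRange and String.join. The sketch below is intended to behave the same way (size <= 0 keeps every remaining token, too few tokens gives an empty string); TokenSlice.sliceTokens is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical compact variant of getStringByIndex(inputString, indexStart, size). */
final class TokenSlice {

    private TokenSlice() {
    }

    static String sliceTokens(String input, int start, int size) {
        String[] tokens = input.trim().split("\\s+");
        if (tokens.length <= start) {
            return "";
        }
        // size <= 0 means "up to the last token", as in the original method
        int end = size > 0 ? Math.min(tokens.length, start + size) : tokens.length;
        // Concatenate the selected tokens without separators
        return String.join("", Arrays.copyOfRange(tokens, start, end));
    }
}
```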

2.3.3 Removing Non-Chinese Characters from a String

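    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Removes all non-Chinese characters from a string.
     *
     * @param inputString the string
     * @return only the Chinese characters
     */
    private static String removeNonChinese(String inputString) {
        // Regex matching any character outside the range U+4E00..U+9FA5
        String regex = "[^\u4E00-\u9FA5]";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(inputString);
        // Delete (replace with the empty string) every non-Chinese character
        return matcher.replaceAll("");
    }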
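The range U+4E00..U+9FA5 covers only part of the CJK Unified Ideographs, so rarer characters, such as those in CJK Extension A that some names and place names may use, are removed as well. If that matters, Java's regex engine also understands the Unicode script property \p{IsHan}; the hypothetical helper below keeps every character whose script is Han.

```java
/** Hypothetical variant of removeNonChinese based on the Unicode Han script. */
final class HanFilter {

    private HanFilter() {
    }

    static String keepHanOnly(String input) {
        // \P{IsHan} matches any code point whose Unicode script is not Han;
        // extension blocks (e.g. U+3400, Extension A) are therefore kept
        return input.replaceAll("\\P{IsHan}", "");
    }
}
```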

2.3.4 Extracting the Date of Birth (Needs Improvement)

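    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Extracts the date of birth.
     *
     * @param inputString the string
     * @return the date of birth, or "未找到日期" ("date not found") if nothing matches
     */
    private static String extractBirthDate(String inputString) {
        // Regex for a date written as yyyy年MM月dd日 (exactly two digits for month and day)
        String regex = "(\\d{4}年\\d{2}月\\d{2}日)";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(inputString);
        // Return the first matching date
        if (matcher.find()) {
            return matcher.group(1);
        } else {
            return "未找到日期";
        }
    }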
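One possible improvement: the regex above accepts exactly two digits for month and day and no whitespace, while OCR output may contain spaces between tokens or single-digit values such as 8月. The sketch below assumes those two variations, tolerates both, normalizes the result to yyyy年MM月dd日, and uses java.time.LocalDate to reject impossible dates such as a misread 13月. BirthDateParser is a hypothetical name, and it returns an empty Optional instead of the 未找到日期 placeholder.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, more tolerant variant of extractBirthDate (not the author's code). */
final class BirthDateParser {

    // Whitespace is allowed around 年/月/日, and month/day may have one or two digits
    private static final Pattern DATE =
            Pattern.compile("(\\d{4})\\s*年\\s*(\\d{1,2})\\s*月\\s*(\\d{1,2})\\s*日");

    private BirthDateParser() {
    }

    /** Parses the first date-like match; empty if there is none or it is not a real date. */
    static Optional<String> extract(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            // LocalDate.of rejects impossible values such as month 13 or day 40
            LocalDate date = LocalDate.of(Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)));
            return Optional.of(String.format(Locale.ROOT, "%04d年%02d月%02d日",
                    date.getYear(), date.getMonthValue(), date.getDayOfMonth()));
        } catch (DateTimeException e) {
            return Optional.empty();
        }
    }
}
```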

2.3.5 Test Run

Input: a sample ID card image (not reproduced here).

Result:

{name=代用名, gender=男, nation=汉, birth=2013年05月06日, address=湖南省长沙市开福区送道街仪幸福小区居民组, no=30512198908131367}

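Name: correct
Gender: correct
Ethnic group: correct
Date of birth: correct
Address: one character wrong (送 instead of 巡) and one extra character (仪)
Citizen ID number: the first digit (4) is missing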
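The ID number error above could at least be detected automatically: a mainland resident ID number has 18 characters, and the last one is a check character computed from the first 17 using ISO 7064 MOD 11-2, as specified in GB 11643-1999. The sketch below, IdNumberCheck (a hypothetical name), implements that check. It cannot restore a missing digit, but the 17-character OCR result above already fails its length test. Note that specimen cards are not guaranteed to carry checksum-valid numbers.

```java
import java.util.Locale;

/** Hypothetical check of an 18-character resident ID number (GB 11643-1999). */
final class IdNumberCheck {

    // Weight for each of the first 17 positions
    private static final int[] WEIGHTS = {7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2};
    // Expected check character, indexed by (weighted sum mod 11)
    private static final char[] CHECK_CODES = {'1', '0', 'X', '9', '8', '7', '6', '5', '4', '3', '2'};

    private IdNumberCheck() {
    }

    static boolean isValid(String id) {
        if (id == null || id.length() != 18) {
            return false; // e.g. a number with a digit dropped by OCR
        }
        String upper = id.toUpperCase(Locale.ROOT);
        int sum = 0;
        for (int i = 0; i < 17; i++) {
            char c = upper.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
            sum += (c - '0') * WEIGHTS[i];
        }
        return upper.charAt(17) == CHECK_CODES[sum % 11];
    }
}
```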

3. Summary

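Tess4J is fairly convenient to use from Java.
The downside is a somewhat low recognition accuracy: in the test above, both the address and the ID number were misread.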
