How to build an OCR with a 100% recognition rate in Python

Views: 121  Date: 2022-06-18 09:20:08

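Contents: Preface; Tech stack; Approach; Implementation; Reading the image; Binarization; Image dilation; Finding contours; Bounding rectangles; Filtering characters; Character segmentation; Building the dataset; Vector search (classification); Generating the result

Preface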

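Of course, "100%" may be a slight exaggeration. But picture a fixed window in a game: its characters always look exactly the same and never change. The set of characters to recognize may also be small. Chinese has several thousand commonly used characters plus all kinds of symbols, and most of them are simply not needed.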

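The scenario targeted here is simple. The main assumptions are:

Few characters to recognize: only a few dozen common characters are needed, for example the 26 letters, the digits, and a handful of Chinese characters.
Uniform background, consistent font: this is not CAPTCHA recognition; the characters to recognize are all clearly visible.
Characters and background are easy to separate: after converting the image to grayscale, it is typically white text on a black background or black text on a white background.

Tech stack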

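The main tools here are Python and OpenCV.

python3 opencv-python

The environment mainly needs the following libraries:

    pip install opencv-python
    pip install imutils
    pip install matplotlib

The classification step further down also imports sklearn, which is installed with pip install scikit-learn.

Approach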

First, look at the grayscale version of the image.

[Figure: grayscale version of the input image]

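Step 1: Binarization. Convert the grayscale image into one with only two colors, black and white.

[Figure: binarized image]

Step 2: Image dilation. Each character's contour will be located with a contour-finding algorithm and then cut out. That works well for plain characters such as letters, but many Chinese characters are built from separate parts, such as left and right radicals or the three-dot water radical, and would not be cut out as a single unit. Dilation is used here to glue the parts of each Chinese character together.

[Figure: dilated image]

Step 3: Find contours.

[Figure: detected contours]

Step 4: Bounding rectangles. Each character should be a rectangular box rather than an irregular shape.

[Figure: bounding rectangles]

Step 5: Filter characters. Punctuation marks, for example, are of no use to me here, so I filter them out by the size of the bounding box.

[Figure: image after filtering]

Step 6: Character segmentation. Cut out the characters according to the bounding boxes.

[Figure: segmented characters]

Step 7: Build the dataset. One or two images per class is basically enough.

[Figure: the dataset]

Step 8: Vector search and result generation. Run a vector search against the dataset images to obtain the recognized label for each character, then sort the recognized results by the position at which each character was cut from the image.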

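Implementation

Reading the image

First, read the image to be recognized.

    import cv2
    import numpy as np
    from matplotlib import pyplot as plt
    from matplotlib.colors import NoNorm
    import imutils
    from PIL import Image

    img_file = 'test.png'
    im = cv2.imread(img_file, 0)  # flag 0 loads the image as single-channel grayscale

Plotted with matplotlib, the result looks like this:

[Figure: the grayscale image plotted with matplotlib]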

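Binarization

Before binarizing, first analyze the distribution of gray levels.

[Figure: gray-level histogram of the image]

Gray values range from 0 to 255, where 0 is black and 255 is white. The background here is dark, concentrated around gray values 30 to 40, while the characters are light, at roughly 180.

100 is chosen here as the threshold that separates the two.

    # Pixels brighter than 100 become 255 (white); all others become 0 (black)
    thresh = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY)[1]

After binarization the result looks like this:

[Figure: binarized image]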
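To make the histogram analysis and the thresholding rule concrete, here is a minimal NumPy-only sketch. The image is a synthetic stand-in (an assumption, not the article's test.png) with a background near 35 and strokes near 180, matching the values described above. The np.where line mirrors what cv2.THRESH_BINARY computes: values strictly above the threshold become the maximum value, everything else becomes 0.

```python
import numpy as np

# Hypothetical stand-in for the grayscale screenshot: dark background
# between 30 and 40, light "text" pixels near 180.
rng = np.random.default_rng(0)
im = rng.integers(30, 41, size=(20, 60)).astype(np.uint8)
im[5:15, 10:14] = 180   # one vertical stroke, 10 x 4 pixels
im[5:15, 30:34] = 182   # a second stroke, 10 x 4 pixels

# Gray-level histogram: hist[v] is the number of pixels with value v.
hist = np.bincount(im.ravel(), minlength=256)
occupied = np.flatnonzero(hist)
print('gray levels present:', occupied.min(), 'to', occupied.max())

# Same rule as cv2.THRESH_BINARY with thresh=100 and maxval=255:
# strictly greater than the threshold -> 255, otherwise 0.
thresh = np.where(im > 100, 255, 0).astype(np.uint8)
print('values after thresholding:', np.unique(thresh))
```

Any threshold inside the empty stretch of the histogram between the two clusters gives the same binary image, which is why a round number such as 100 is a comfortable choice for this kind of input.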

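Image dilation

Next, dilate the image vertically. A dilation size has to be chosen; 7 is used here.

    # A 7x1 kernel (7 rows, 1 column) dilates in the vertical direction only
    kernel = np.ones((7, 1), np.uint8)
    dilation = cv2.dilate(thresh, kernel, iterations=1)

[Figure: image after vertical dilation]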
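The effect of a 7x1 kernel can be reproduced without OpenCV. The sketch below is an illustration under the assumption that dilation of a binary image is a maximum filter over the kernel window; dilate_vertical and the tiny glyph are hypothetical names and data, not part of the article's code. It shows a dot and a stem that are two separate blobs before dilation and one blob afterwards.

```python
import numpy as np

def dilate_vertical(binary, size=7):
    """Maximum filter over a centered vertical window of `size` rows (odd), one column wide."""
    half = size // 2
    # Pad with black rows so the window can hang over the top and bottom edges.
    padded = np.pad(binary, ((half, half), (0, 0)), constant_values=0)
    rows = binary.shape[0]
    out = np.zeros_like(binary)
    for dy in range(size):
        out = np.maximum(out, padded[dy:dy + rows, :])
    return out

# A dot with a gap above a stem, like the letter "i": two separate blobs.
glyph = np.zeros((12, 5), dtype=np.uint8)
glyph[1, 2] = 255        # the dot
glyph[4:11, 2] = 255     # the stem; rows 2 and 3 are empty

dilated = dilate_vertical(glyph, size=7)
print('gap before:', glyph[2:4, 2], 'gap after:', dilated[2:4, 2])
```

Because the kernel is one column wide, only vertical gaps are bridged. If the left and right parts of a character still come out as separate contours, a kernel with more than one column, for example np.ones((7, 3)), is a common adjustment; that value is an assumption and is not used in the article.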

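Finding contours

Next, call OpenCV to find the contours.

    # Find the outer contours of the dilated image
    cnts = cv2.findContours(dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)  # copes with the different return shapes across OpenCV versions

Then read the original image again and draw the contours on it to see what they look like.

[Figure: contours drawn on the original image]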

Bounding rectangles

A bounding rectangle can be computed for each contour. The result looks like this:

[Figure: bounding rectangles drawn around the contours]
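For readers who want to see where the (x, y, w, h) boxes come from, the following sketch computes them directly from a binary array with a flood fill over 8-connected pixels. It is a simplified stand-in for cv2.findContours followed by cv2.boundingRect, not a description of how OpenCV works internally; for instance, it also reports a blob that sits inside the hole of another blob, which RETR_EXTERNAL would skip. bounding_boxes is a hypothetical helper.

```python
from collections import deque

import numpy as np

def bounding_boxes(binary):
    """Return (x, y, w, h) for each 8-connected blob of non-zero pixels."""
    height, width = binary.shape
    seen = np.zeros((height, width), dtype=bool)
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if binary[y0, x0] == 0 or seen[y0, x0]:
                continue
            # Breadth-first flood fill that tracks the extent of the blob.
            queue = deque([(y0, x0)])
            seen[y0, x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny in range(max(y - 1, 0), min(y + 2, height)):
                    for nx in range(max(x - 1, 0), min(x + 2, width)):
                        if binary[ny, nx] != 0 and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max - x_min + 1, y_max - y_min + 1))
    return boxes

image = np.zeros((10, 12), dtype=np.uint8)
image[1:9, 1:4] = 255     # tall blob: 3 wide, 8 high
image[7:9, 6:8] = 255     # small blob: 2 wide, 2 high
print(bounding_boxes(image))
```

The width and height count pixels inclusively, which matches the convention of cv2.boundingRect, so the boxes can be used directly as slice bounds.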

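Filtering characters

Filtering works by filling the inside of a contour with black. The code below fills every contour whose height is less than 15 with black.

    for i, c in enumerate(cnts):
        x, y, w, h = cv2.boundingRect(c)
        if h < 15:
            cv2.fillPoly(thresh, pts=[c], color=0)

After filling, the punctuation marks are gone.

[Figure: binarized image with the punctuation removed]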
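The height filter can be tried on plain arrays as well. This sketch makes one simplification that should be read as an assumption: it blanks the whole bounding rectangle, whereas the article fills the contour polygon with cv2.fillPoly. For isolated marks the result is the same, but a rectangle could also erase part of a neighbouring character that overlaps the box. drop_short_boxes and the sample data are hypothetical.

```python
import numpy as np

MIN_HEIGHT = 15  # same cut-off as in the article

def drop_short_boxes(binary, boxes, min_height=MIN_HEIGHT):
    """Blank out boxes shorter than min_height; return the cleaned copy and the kept boxes."""
    cleaned = binary.copy()
    kept = []
    for x, y, w, h in boxes:
        if h < min_height:
            cleaned[y:y + h, x:x + w] = 0   # paint the whole box black
        else:
            kept.append((x, y, w, h))
    return cleaned, kept

image = np.zeros((30, 40), dtype=np.uint8)
image[5:25, 5:15] = 255     # a character, 20 pixels high
image[20:25, 20:23] = 255   # a comma-sized mark, 5 pixels high
boxes = [(5, 5, 10, 20), (20, 20, 3, 5)]   # (x, y, w, h), the order cv2.boundingRect uses

cleaned, kept = drop_short_boxes(image, boxes)
print(kept)
```

A height threshold only works when the unwanted marks are clearly shorter than the shortest character of interest, so the value 15 has to be re-checked whenever the font size of the source window changes.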

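Character segmentation

Because the image is a matrix, the final segmentation is done with array slicing.

    for c in cnts:
        x, y, w, h = cv2.boundingRect(c)
        if h < 15:
            continue
        cropImg = thresh[y:y + h, x:x + w]  # rows (y) come first, then columns (x)
        plt.imshow(cropImg)
        plt.show()

Building the dataset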

Finally, create and label the dataset: chain the steps above together, save the segmented character images into a folder, and label them. The training script further down takes the label of each image from the name of the subfolder of ocr_datas that contains it, so labeling amounts to sorting the saved crops into one subfolder per character.

    import os
    import uuid

    import cv2
    import imutils
    import numpy as np


    def split_letters(im):
        # Binarize
        thresh = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY)[1]
        # Vertical dilation
        kernel = np.ones((7, 1), np.uint8)
        dilation = cv2.dilate(thresh, kernel, iterations=1)
        # Find contours
        cnts = cv2.findContours(dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        cnts = imutils.grab_contours(cnts)
        # Black out contours that are too small
        for c in cnts:
            x, y, w, h = cv2.boundingRect(c)
            if h < 15:
                cv2.fillPoly(thresh, pts=[c], color=0)
        # Cut out the remaining characters, remembering each x position
        char_list = []
        for c in cnts:
            x, y, w, h = cv2.boundingRect(c)
            if h < 15:
                continue
            cropImg = thresh[y:y + h, x:x + w]
            char_list.append((x, cropImg))
        return char_list


    os.makedirs('ocr_datas', exist_ok=True)  # cv2.imwrite does not create missing folders
    for i in range(1, 10):
        im = cv2.imread(f'test{i}.png', 0)
        for ch in split_letters(im):
            print(ch[0])
            filename = f'ocr_datas/{uuid.uuid4()}.png'
            cv2.imwrite(filename, ch[1])

Vector search (classification)

Vector search is really just a nearest-neighbor search problem, so KNeighborsClassifier from sklearn can be used.
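What "nearest-neighbor search" means for a single query can be written out in a few lines of NumPy. This is a sketch of the idea only, assuming Euclidean distance (the default metric of KNeighborsClassifier) and a single neighbor; nearest_label and the four-pixel sample vectors are made up for illustration.

```python
import numpy as np

def nearest_label(query, samples, labels):
    """Return the label of the sample with the smallest Euclidean distance to query."""
    distances = np.linalg.norm(samples - query, axis=1)
    return labels[int(np.argmin(distances))]

# Three hypothetical 4-pixel "images", already flattened and scaled to 0..1.
samples = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
])
labels = ['A', 'B', 'C']

query = np.array([0.9, 0.1, 0.0, 1.0])   # a slightly noisy 'A'
print(nearest_label(query, samples, labels))
```

With one or two reference images per class and glyphs that never change, the nearest stored vector is almost always an exact or near-exact match, which is what makes such a small dataset workable.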

The code for training the model is as follows:

    import json
    import os
    import pickle

    import cv2
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    max_height = 30
    max_width = 30


    def make_im_template(im):
        # Paste the crop into the center of a fixed 30x30 black template
        template = np.zeros((max_height, max_width))
        offset_height = int((max_height - im.shape[0]) / 2)
        offset_width = int((max_width - im.shape[1]) / 2)
        template[offset_height:offset_height + im.shape[0], offset_width:offset_width + im.shape[1]] = im
        return template


    label2index = {}
    index2label = {}
    X = []
    y = []
    index = 0
    for _dir in os.listdir('ocr_datas'):
        new_dir = 'ocr_datas/' + _dir
        if os.path.isdir(new_dir):
            # The subfolder name is the label of every image inside it
            label2index[_dir] = index
            index2label[index] = _dir
            for filename in os.listdir(new_dir):
                if filename.endswith('png'):
                    im = cv2.imread(new_dir + '/' + filename, 0)
                    tpl = make_im_template(im)  # place on the fixed-size template
                    tpl = tpl / 255  # normalize to 0..1
                    X.append(tpl.reshape(max_height * max_width))
                    y.append(index)
            index += 1

    print(label2index)
    print(index2label)

    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(X, y)

    with open('simple_ocr.pickle', 'wb') as f:
        pickle.dump(model, f)
    with open('simple_index2label.json', 'w') as f:
        json.dump(index2label, f)

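One point worth mentioning is how the image vectors are built. The segmented images do not have a fixed height and width, so a fixed-size template is used first: each segmented image is placed at the center of the template. The template is then flattened into a one-dimensional vector, and it can also be normalized.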
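The template step can be checked in isolation. The sketch below follows the same idea (center, normalize, flatten) and adds one thing the article does not have: an explicit error when a crop is larger than the template, since the slice assignment would otherwise fail with a less helpful shape error. to_vector and the all-white sample crop are hypothetical.

```python
import numpy as np

MAX_HEIGHT = 30
MAX_WIDTH = 30

def to_vector(crop, height=MAX_HEIGHT, width=MAX_WIDTH):
    """Center crop on a black template, scale to 0..1, and flatten to one dimension."""
    h, w = crop.shape
    if h > height or w > width:
        raise ValueError(f'crop {h}x{w} does not fit the {height}x{width} template')
    template = np.zeros((height, width), dtype=np.float64)
    top = (height - h) // 2
    left = (width - w) // 2
    template[top:top + h, left:left + w] = crop
    return (template / 255.0).reshape(height * width)

crop = np.full((20, 10), 255, dtype=np.uint8)   # hypothetical 20x10 all-white crop
vec = to_vector(crop)
print(vec.shape, vec.sum())
```

Centering matters because the nearest-neighbor comparison is pixel by pixel: the same glyph pasted at two different offsets would produce two very different vectors.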

Generating the result

Generating the final result again starts with segmentation. Each piece is converted to a vector and passed to the KNeighborsClassifier model, and the closest match is taken as the result. That gives the result for a single character, so the characters still have to be sorted by the position at which they were cut out to obtain the final result.

    import json
    import pickle

    import cv2
    import imutils
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    with open('simple_ocr.pickle', 'rb') as f:
        model = pickle.load(f)
    # Same file name that the training script writes
    with open('simple_index2label.json', 'r') as f:
        index2label = json.load(f)

    max_height = 30
    max_width = 30


    def make_im_template(im):
        template = np.zeros((max_height, max_width))
        offset_height = int((max_height - im.shape[0]) / 2)
        offset_width = int((max_width - im.shape[1]) / 2)
        template[offset_height:offset_height + im.shape[0], offset_width:offset_width + im.shape[1]] = im
        # Scale to 0..1 as in training, so query and stored vectors are comparable
        return (template / 255).reshape(max_height * max_width)


    def split_letters(im):
        # Binarize
        thresh = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY)[1]
        # Vertical dilation
        kernel = np.ones((7, 1), np.uint8)
        dilation = cv2.dilate(thresh, kernel, iterations=1)
        # Find contours
        cnts = cv2.findContours(dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        cnts = imutils.grab_contours(cnts)
        # Black out contours that are too small
        for c in cnts:
            x, y, w, h = cv2.boundingRect(c)
            if h < 15:
                cv2.fillPoly(thresh, pts=[c], color=0)
        # Cut out the remaining characters, remembering each x position
        char_list = []
        for c in cnts:
            x, y, w, h = cv2.boundingRect(c)
            if h < 15:
                continue
            cropImg = thresh[y:y + h, x:x + w]
            char_list.append((x, cropImg))
        return char_list


    def ocr_recognize(fname):
        im = cv2.imread(fname, 0)
        char_list = split_letters(im)
        result = []
        for ch in char_list:
            res = model.predict([make_im_template(ch[1])])[0]  # recognize a single character
            # JSON stores the integer keys as strings, hence str(res)
            result.append({'x': ch[0], 'label': index2label[str(res)]})
        # The text is a single line, so sorting by the x coordinate is enough
        result.sort(key=lambda k: k.get('x', 0))
        return ''.join(it['label'] for it in result)


    print(ocr_recognize('test1.png'))
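The ordering step at the end of ocr_recognize can be tested without any images. The sketch below uses made-up per-character results in an arbitrary order, since the order in which contours are returned should not be relied on to match reading order; join_by_x is a hypothetical helper.

```python
def join_by_x(chars):
    """chars: list of {'x': int, 'label': str}; return the labels ordered left to right."""
    ordered = sorted(chars, key=lambda item: item['x'])
    return ''.join(item['label'] for item in ordered)

# Hypothetical per-character results in the order the contours happened to be found.
found = [
    {'x': 52, 'label': 'C'},
    {'x': 8, 'label': 'A'},
    {'x': 30, 'label': 'B'},
]
print(join_by_x(found))
```

Sorting by x alone is only valid for a single line of text, as the article notes. Multi-line input would first need the characters grouped into lines by their y coordinate, which this sketch does not attempt.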

