<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="https://kit.fontawesome.com/674b195f9c.js" crossorigin="anonymous"></script>
<link href="https://fonts.googleapis.com/css2?family=Merriweather:wght@300&display=swap" rel="stylesheet">
<link rel="stylesheet" href="assets/stylesheet/style.css">
<title>Article</title>
</head>
<body>
<div class="progress-bar" id="progress-bar"></div>
<article class="article-container">
<button class="btn-mode-toggle"><i class="fas fa-moon fa-2x"></i></button>
<h1>Introducing Khmer Optical Character Recognition</h1>
<h2>Introduction</h2>
<figure>
<img class="article-img" src="https://images.unsplash.com/photo-1526666923127-b2970f64b422?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1352&q=80" alt="Satellite image">
<figcaption>Satellite by Donald Glover</figcaption>
</figure>
<p>
<a href="#">The Ministry of Posts and Telecommunications</a> is the government ministry that governs the
postal system and the telecommunications systems of Cambodia. As of 2013, the Minister of Posts and
Telecommunications was <a href="#">So Khun</a>; the ministry maintains offices in Phnom Penh.
</p>
<h3>About Optical Character Recognition</h3>
<p>
Optical character recognition or optical character reader (OCR) is the <a
href="https://en.wikipedia.org/wiki/Electronics">electronic</a> or mechanical conversion
of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document,
a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or
from subtitle text superimposed on an image (for example, from a <a href="#">television broadcast</a>).
</p>
<figure>
<img class="article-img"
src="https://images.unsplash.com/photo-1496942299866-9e7ab403e614?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1350&q=80"
alt="Abstract light image">
<figcaption>Light dance by Gertrūda Valasevičiūtė</figcaption>
</figure>
<p>
OCR is widely used as a form of data entry from printed paper records, including passport documents, invoices,
bank statements, <a href="#">computerized receipts</a>, business cards, mail, printouts of static data, and
other suitable documentation. It is a common method of digitizing printed texts so that they can be
electronically edited, searched, stored more compactly, displayed online, and used in machine processes such
as cognitive computing, machine translation, (extracted) text-to-speech, key data and text mining. OCR is a
field of research in pattern recognition,
<a href="https://en.wikipedia.org/wiki/Artificial_intelligence">artificial intelligence</a> and computer vision.
</p>
<p>
Early versions needed to be trained with images of each character, and worked on one font at a time.
Advanced systems that achieve a high degree of recognition accuracy for most fonts are now common,
and they support a variety of digital image file formats as input.[2] Some systems are capable of
reproducing formatted output that closely approximates the original page, including images, columns, and
other non-textual components.
</p>
<h4>History</h4>
<p>
Early optical character recognition may be traced to technologies involving telegraphy and the creation of
reading devices for the blind.[3] In 1914, Emanuel Goldberg developed a machine that read characters and
converted them into standard telegraph code.[4] Concurrently, Edmund Fournier d'Albe developed the Optophone, a
handheld scanner that, when moved across a printed page, produced tones that corresponded to specific letters
or characters.[5]
</p>
<p>
In the late 1920s and into the 1930s, Emanuel Goldberg developed what he called a "Statistical Machine" for
searching microfilm archives using an optical code recognition system. In 1931, he was granted U.S. Patent
number 1,838,389 for the invention. The patent was acquired by IBM.
</p>
<h2>Application</h2>
<p>OCR engines have been developed into many kinds of domain-specific OCR applications, such as receipt OCR,
invoice OCR, check OCR, and legal billing document OCR. They can be used for:</p>
<ul>
<li>Data entry for business documents, e.g. checks, passports, invoices, bank statements and receipts</li>
<li>Automatic number plate recognition</li>
<li>Passport recognition and information extraction in airports</li>
<li>Automatic extraction of key information from insurance documents</li>
<li>Extracting business card information into a contact list</li>
<li>Making textual versions of printed documents more quickly, e.g. book scanning for Project Gutenberg</li>
<li>Making electronic images of printed documents searchable, e.g. Google Books</li>
<li>Converting handwriting in real time to control a computer (pen computing)</li>
</ul>
<figure>
<img class="article-img"
src="https://images.unsplash.com/photo-1526314114033-349ef6f72220?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1396&q=80"
alt="Library image">
<figcaption>Library by The Associated Press</figcaption>
</figure>
<h2>Types</h2>
<ol>
<li>Optical character recognition (OCR) – targets typewritten text, one glyph or character at a time.</li>
<li>Optical word recognition – targets typewritten text, one word at a time (for languages that use a space
as a word divider). (Usually just called "OCR".)</li>
<li>Intelligent character recognition (ICR) – also targets handwritten printscript or cursive text one glyph
or character at a time, usually involving machine learning.</li>
<li>Intelligent word recognition (IWR) – also targets handwritten printscript or cursive text, one word at a
time. This is especially useful for languages where glyphs are not separated in cursive script.</li>
</ol>
<p>OCR is generally an "offline" process, which analyses a static document. There are cloud-based services that
provide an online OCR API. Handwriting movement analysis can be used as input to handwriting
recognition.[14] Instead of merely using the shapes of glyphs and words, this technique is able to capture
motions, such as the order in which segments are drawn, the direction, and the pattern of putting the pen
down and lifting it. This additional information can make the end-to-end process more accurate. This
technology is also known as "on-line character recognition", "dynamic character recognition", "real-time
character recognition", and "intelligent character recognition".</p>
</article>
<script src="assets/scripts/article.js"></script>
</body>
</html>