Kyle Wiggers, writing for VentureBeat: Optical character recognition (OCR), or the conversion of images of handwritten or printed text into machine-readable text, is a science that dates back to the early '70s. But algorithms have long struggled to make out text that doesn't run along a straight horizontal line, which is why researchers at Amazon developed what they call TextTubes. These are detectors for curved text in natural images that model the text as tubes around its medial (middle) axis. In a paper describing their work, the coauthors claim that their approach achieves state-of-the-art results on a popular OCR benchmark.

As the researchers explain, reading scene text is typically broken down into two successive tasks: text detection and text recognition. The first involves localizing characters, words, and lines using contextual clues, while the second aims to transcribe their content. Both are easier said than done: text in the wild is affected not only by deformations, but also by viewpoint changes and arbitrary fonts.

The team's solution is a "tube" representation of the text reference frame that captures most of the variability, taking advantage of the fact that target text is usually a concatenation of characters of similar size. It's formulated as a mathematical function that enables the training of machine learning scene text detectors, in contrast to traditional approaches that capture text with rectangles and quadrilaterals, which are prone to overlap and noise.
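
To make the "tube around a medial axis" idea concrete, here is a minimal geometric sketch. It is not the formulation from the Amazon paper: it simply assumes the medial axis is given as a polyline and the tube has one constant radius, then offsets each axis point along its normal to outline the text region. The names tube_polygon, axis, and radius are hypothetical and chosen only for illustration.

```python
import math


def tube_polygon(axis, radius):
    """Outline a constant-radius tube around a polyline medial axis.

    Illustrative assumption only: `axis` is a list of (x, y) points along
    the middle of a text line and `radius` is roughly half the character
    height. Returns the polygon vertices, one side first, then the other
    side in reverse order so the outline is closed.
    """
    if len(axis) < 2:
        raise ValueError("the medial axis needs at least two points")

    side_a, side_b = [], []
    last = len(axis) - 1
    for i, (x, y) in enumerate(axis):
        # Estimate the tangent from the neighbouring points
        # (one-sided at the two ends of the axis).
        x0, y0 = axis[max(i - 1, 0)]
        x1, y1 = axis[min(i + 1, last)]
        tx, ty = x1 - x0, y1 - y0
        length = math.hypot(tx, ty)
        if length == 0:
            raise ValueError("neighbouring axis points must not coincide")

        # Unit normal: the tangent rotated by 90 degrees.
        nx, ny = -ty / length, tx / length
        side_a.append((x + radius * nx, y + radius * ny))
        side_b.append((x - radius * nx, y - radius * ny))

    return side_a + side_b[::-1]


# A gently curved axis, as for a word printed along an arc.
curved_axis = [(0.0, 0.0), (1.0, 0.4), (2.0, 0.6), (3.0, 0.4), (4.0, 0.0)]
outline = tube_polygon(curved_axis, radius=0.5)
```

Unlike a rectangle or quadrilateral, the resulting outline bends with the text, so two neighbouring curved lines are less likely to produce overlapping regions.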
