Could you tell me how to extract text chunks with coordinates? I'm trying to replace iText, but I'm still using a rect-and-text location strategy.
I was able to do this. All fixed.
I'm not sure I understand exactly what you want to achieve, but have you looked at the available Layout Analysis tools? Maybe have a look at the wiki here... I'd start with the NearestNeighbourWordExtractor.
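A minimal sketch of that suggestion, assuming a recent PdfPig release (namespaces have moved between versions): open the document, run `NearestNeighbourWordExtractor` over each page's letters, and read each word's `BoundingBox`. The `WordBox` record and the `WordBoxes.Extract` helper are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using UglyToad.PdfPig;
using UglyToad.PdfPig.DocumentLayoutAnalysis.WordExtractor;

// Hypothetical result type: one word, the page it is on, and its bounding box.
public record WordBox(int Page, string Text, double Left, double Bottom, double Right, double Top);

public static class WordBoxes
{
    // Extracts every word in the PDF at `path` together with its bounding box.
    // Coordinates are PDF user-space units (1/72 inch by default), with y growing upwards.
    public static List<WordBox> Extract(string path)
    {
        var result = new List<WordBox>();
        using (var document = PdfDocument.Open(path))
        {
            foreach (var page in document.GetPages())
            {
                // Group the page's letters into words using the nearest-neighbour heuristic.
                var words = NearestNeighbourWordExtractor.Instance.GetWords(page.Letters);
                foreach (var word in words)
                {
                    var box = word.BoundingBox;
                    result.Add(new WordBox(page.Number, word.Text, box.Left, box.Bottom, box.Right, box.Top));
                }
            }
        }
        return result;
    }
}
```

Because y grows upwards, sorting the results by `Top` descending and then by `Left` gives a rough top-to-bottom, left-to-right reading order, e.g. `WordBoxes.Extract("input.pdf").OrderByDescending(w => w.Top).ThenBy(w => w.Left)` (with `System.Linq` in scope).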
Hi, this gets me most of the way. I usually use a GAP of 0.3 with OrigRow set to true. GAP is basically used to add space between characters, so you can adjust it depending on the font size. Things get a bit messy if the font size varies a lot.
Let me know if you find a better way.
Oh, and you can switch the output to either spool or lines if you want an XML dump of the text data.
```csharp
using System;
using System.Text;
using System.IO;
using Org.BouncyCastle.Cms;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;
using System.Linq;
using System.Collections;
using UglyToad.PdfPig;
using UglyToad.PdfPig.DocumentLayoutAnalysis.TextExtractor;
namespace PDFTools
{
…
```
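The code in that post is cut off, so what follows is not its implementation. It is a rough sketch of the GAP idea, assuming GAP is a fraction of the font size that the horizontal gap between neighbouring glyphs must exceed before a space is inserted, and that the output is one chunk per line. `Glyph`, `TextChunk` and `GapChunker` are hypothetical names, `OrigRow` is not modelled, and only horizontal text is handled. Scaling the threshold by each glyph's `PointSize` is one possible way to soften the "messy when font sizes change" problem, not a fix for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical input: one glyph with its box and baseline in PDF user space (y grows upwards).
public record Glyph(string Value, double Left, double Right, double Bottom, double Top, double Baseline, double PointSize);

// Hypothetical output: the text of one line and the box enclosing all of its glyphs.
public record TextChunk(string Text, double Left, double Bottom, double Right, double Top);

public static class GapChunker
{
    // Groups glyphs into lines by baseline (within lineTolerance), then joins each line
    // left to right, inserting a space wherever the gap to the previous glyph
    // exceeds gap * PointSize.
    public static List<TextChunk> Chunk(IEnumerable<Glyph> glyphs, double gap = 0.3, double lineTolerance = 2.0)
    {
        var lines = new List<List<Glyph>>();
        // Highest baseline first, i.e. top of the page first.
        foreach (var glyph in glyphs.OrderByDescending(g => g.Baseline))
        {
            if (lines.Count == 0 || Math.Abs(lines[lines.Count - 1][0].Baseline - glyph.Baseline) > lineTolerance)
            {
                lines.Add(new List<Glyph>());
            }
            lines[lines.Count - 1].Add(glyph);
        }

        var chunks = new List<TextChunk>();
        foreach (var line in lines)
        {
            var ordered = line.OrderBy(g => g.Left).ToList();
            var text = new StringBuilder();
            for (int i = 0; i < ordered.Count; i++)
            {
                var current = ordered[i];
                // Treat a gap wider than GAP x font size as a word break.
                if (i > 0 && current.Left - ordered[i - 1].Right > gap * current.PointSize)
                {
                    text.Append(' ');
                }
                text.Append(current.Value);
            }
            chunks.Add(new TextChunk(
                text.ToString(),
                ordered.Min(g => g.Left),
                ordered.Min(g => g.Bottom),
                ordered.Max(g => g.Right),
                ordered.Max(g => g.Top)));
        }
        return chunks;
    }

    // Dumps the chunks as XML: one <chunk> element per line, with its box as attributes.
    public static XElement ToXml(IEnumerable<TextChunk> chunks) =>
        new XElement("chunks",
            chunks.Select(c => new XElement("chunk",
                new XAttribute("left", c.Left),
                new XAttribute("bottom", c.Bottom),
                new XAttribute("right", c.Right),
                new XAttribute("top", c.Top),
                c.Text)));
}
```

To feed it from PdfPig, each `Letter` `l` on a page could be mapped as `new Glyph(l.Value, l.GlyphRectangle.Left, l.GlyphRectangle.Right, l.GlyphRectangle.Bottom, l.GlyphRectangle.Top, l.StartBaseLine.Y, l.PointSize)`. Raising `gap` merges more characters into one word; lowering it splits them more eagerly.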