|
"Hutber's Law". No idea what that meant. Looked it up. Still no idea what it means. I think I've been insulted but I don't know. Play nice.
|
|
|
|
|
My reply was meant for Richard Deeming rather than you. I was alluding to the fact that Microsoft changed the templates so you no longer need to write a Main method in a C# console app. That confuses people, because most of the tutorials you can find were written before this "feature" was introduced. And for the record (not an insult to you): Hutber's law - Wikipedia.
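For anyone who hasn't hit this yet: since .NET 6, the console template relies on top-level statements (a C# 9 feature) plus implicit global usings. A new project's Program.cs is essentially the single line `Console.WriteLine("Hello, World!");`, and the compiler generates the entry point. Older tutorials show an explicit Main, roughly as below. That form still compiles and behaves the same, so either style works. The namespace name is just a placeholder.

```csharp
// Roughly what console templates generated before .NET 6: an explicit Main entry point.
using System;

namespace HelloWorld
{
    class Program
    {
        // With top-level statements, the compiler synthesizes an equivalent method for you.
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
        }
    }
}
```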
|
|
|
|
|
They should never have added that "feature" ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
I have to extract payment data from PDFs. Some have just one or two lines I need, and others have rows and columns of data.
What's the best way to extract this data using C#?
“In theory, theory and practice are the same. But in practice, they never are.”
If it's not broken, fix it until it is.
Everything makes sense in someone's mind.
|
|
|
|
|
Kevin, you've been here long enough, and asked enough questions already, to know that this is far too vague a query to get anything practical in the way of an answer. All we can do is generically point you at something like NuGet Gallery | iTextSharp 5.5.13.3 and suggest you start there!
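As a starting point, here is a minimal sketch of pulling the raw text out of every page with iTextSharp 5.x, assuming the iTextSharp NuGet package is referenced. `PdfTextDump` and `ExtractAllText` are hypothetical names. Note that iTextSharp 5 is AGPL-licensed, and that this only recovers text: it knows nothing about which number is the invoice total, and a scanned (image-only) PDF will come back empty.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfTextDump
{
    // Hypothetical helper: returns the text of every page, in page order.
    public static string ExtractAllText(string path)
    {
        var reader = new PdfReader(path);
        try
        {
            var sb = new StringBuilder();
            // iTextSharp page numbers are 1-based.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // LocationTextExtractionStrategy orders text by its position on the page,
                // which tends to help with column-based layouts such as invoice lines.
                string text = PdfTextExtractor.GetTextFromPage(
                    reader, page, new LocationTextExtractionStrategy());
                sb.AppendLine(text);
            }
            return sb.ToString();
        }
        finally
        {
            reader.Close();
        }
    }
}
```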
|
|
|
|
|
I was hoping for "I've used..." or "Try this API" kind of answers. I don't have enough info to give more detail, because I don't know where to start.
|
|
|
|
|
Most people don't use PDFs to transfer data between applications of any kind, so the pool of people who could answer that question is exceedingly small; quite possibly none of the regulars around here have done it.
|
|
|
|
|
If it's a PDF of an image, there's no text to extract at all.
There is no "general" solution.
"Before entering on an understanding, I have meditated for a long time, and have foreseen what might happen. It is not genius which reveals to me suddenly, secretly, what I have to say or to do in a circumstance unexpected by other people; it is reflection, it is meditation." - Napoleon I
|
|
|
|
|
I know what you mean, but... as Dave says, PDF is not a data transfer format; it's a presentation format for users.
Using it to transfer computer-readable info is like using Word to send a bitmap in an email: you could probably do it, but anyone who saw what you were doing would be wondering "what fool came up with *that* idea?"
|
|
|
|
|
Yeah, I hear ya. My client has invoices in PDF format that need data extracted. I found https://docs.apryse.com/, which looks promising.
|
|
|
|
|
I guess it depends on where the PDFs originate: if it's a single company and they guarantee in writing, signed in blood, never, ever, ever to change the format, I might give it consideration. But invoices? I can see so many ways that could go seriously wrong, with somebody ending up in jail for tax evasion... I'd probably decline to quote on that job myself.
|
|
|
|
|
Yes. Should be using EDI ... or "something".
EDI 810 Invoice: Transactions, Format & Specifications | Astera
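For contrast, an EDI document is already structured data. Below is a minimal sketch of splitting an X12 interchange into segments and elements, assuming `~` as the segment terminator and `*` as the element separator. Those are common choices, but a real file declares its own separators in the ISA header. `X12Reader` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class X12Reader
{
    // Splits an interchange into segments, each an array of elements (element 0 is the segment ID).
    // Assumption: '~' ends a segment and '*' separates elements.
    public static List<string[]> Segments(string x12) =>
        x12.Split('~', StringSplitOptions.RemoveEmptyEntries)
           .Select(s => s.Trim().Split('*'))   // Trim drops line breaks between segments
           .Where(e => e[0].Length > 0)        // skip blank trailing fragments
           .ToList();

    // In an 810, BIG01 is the invoice date and BIG02 the invoice number.
    public static string InvoiceNumber(IEnumerable<string[]> segments) =>
        segments.First(e => e[0] == "BIG")[2];
}
```

With made-up input such as `ST*810*0001~BIG*20230801*INV123~TDS*2000~SE*4*0001~` (far shorter than a valid 810), `InvoiceNumber` returns `INV123`.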
|
|
|
|
|
In some businesses (especially insurance), it was common to use PDFs for data capture. The PDFs would then be sent to companies that processed them and converted the data inside into something that could be used in the office.
|
|
|
|
|
Yes, if I remember correctly, PDF has a forms mode (AcroForms) that limits what users can enter and where?
But you wouldn't use that for invoices!
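If the PDFs do turn out to be fillable forms, the field values can be read by name instead of scraping page text. Here is a minimal sketch with iTextSharp 5.x (`AcroFields`), assuming the package is referenced. `PdfFormReader` is a hypothetical name. Flattened forms, where the fields have been burned into the page content, will return no fields.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class PdfFormReader
{
    // Hypothetical helper: maps each form field's fully qualified name to its current value.
    public static Dictionary<string, string> ReadFields(string path)
    {
        var reader = new PdfReader(path);
        try
        {
            AcroFields form = reader.AcroFields;
            var result = new Dictionary<string, string>();
            foreach (string name in form.Fields.Keys)
                result[name] = form.GetField(name);
            return result;
        }
        finally
        {
            reader.Close();
        }
    }
}
```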
|
|
|
|
|
I've seen this used for something akin to invoices in the past. That's how the commercial insurance industry operates; they do love their PDFs.
|
|
|
|
|
I've used Ghostscript to parse PDF files before. You don't even need to write any code, just use the precompiled tools.
ghostscript extract PDF text
They have a C# wrapper but I've never used it.
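If you'd rather not depend on a wrapper, shelling out to the precompiled tool also works. Here is a minimal sketch, assuming Ghostscript is installed and on the PATH (`gs` on Linux/macOS, typically `gswin64c` on 64-bit Windows). `txtwrite` is Ghostscript's text output device, and `-o -` sends the result to standard output. `GhostscriptText` is a hypothetical name.

```csharp
using System;
using System.Diagnostics;

public static class GhostscriptText
{
    // Runs Ghostscript's txtwrite device on one PDF and returns the text it prints.
    // Assumption: the executable name is passed in (or "gs") and is on the PATH.
    public static string Extract(string pdfPath, string exe = "gs")
    {
        var psi = new ProcessStartInfo(exe)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };
        psi.ArgumentList.Add("-q");                 // suppress startup banner
        psi.ArgumentList.Add("-sDEVICE=txtwrite");  // text output device
        psi.ArgumentList.Add("-o");                 // output file; also implies -dBATCH -dNOPAUSE
        psi.ArgumentList.Add("-");                  // "-" means standard output
        psi.ArgumentList.Add(pdfPath);

        using Process p = Process.Start(psi)!;
        string text = p.StandardOutput.ReadToEnd(); // read before waiting to avoid a full-pipe deadlock
        p.WaitForExit();
        if (p.ExitCode != 0)
            throw new InvalidOperationException($"Ghostscript exited with code {p.ExitCode}");
        return text;
    }
}
```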
|
|
|
|
|
Tika has a PDF parser. Among many others.
Apache Tika – Apache Tika
You would of course still need to code to each individual different format.
(Note that there are even image parsers.)
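"Code to each individual format" usually ends up looking something like the sketch below. It uses one parser per supplier layout, run on text already extracted by Tika, Ghostscript or a PDF library, and it fails loudly when the layout doesn't match instead of guessing. The labels `Invoice No:` and `Total:` and the names `AcmeInvoiceParser` and `InvoiceSummary` are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical result type for one supplier's layout.
public sealed record InvoiceSummary(string Number, decimal Total);

public static class AcmeInvoiceParser
{
    // Each label is assumed to start its own line in the extracted text.
    private static readonly Regex NumberRx =
        new(@"^Invoice No:\s*(?<num>\S+)", RegexOptions.Multiline);
    private static readonly Regex TotalRx =
        new(@"^Total:\s*(?<amt>[\d,]+\.\d{2})", RegexOptions.Multiline);

    public static InvoiceSummary Parse(string text)
    {
        Match num = NumberRx.Match(text);
        Match total = TotalRx.Match(text);
        // Refuse to guess if the supplier has changed the layout.
        if (!num.Success || !total.Success)
            throw new FormatException("Text does not match the expected invoice layout.");

        decimal amount = decimal.Parse(
            total.Groups["amt"].Value,
            NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture);
        return new InvoiceSummary(num.Groups["num"].Value, amount);
    }
}
```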
|
|
|
|
|
The title of this message might look like a joke, but it is actually very serious.
The other day I wanted to implement my own BigInteger class, and I was using bytes as the backing fields. Yet I would only use values from 0 to 9 for each digit.
Bing AI suggested that I create an enum with values ranging from D0 to D9 (I think their actual values are obvious).
Yet using an enum like that doesn't stop users from doing things like (DecimalDigit)56, passing 56 to an enum that was only supposed to support values from 0 to 9.
Of course I can validate the values at run time... but the entire purpose of using an enum was to avoid mistakes like that.
So my solution was to create a class (in fact, a struct, but a class serves the same purpose) that has a private constructor and public static readonly fields ranging from D0 to D9. This way, users outside of the class cannot pass values outside the 0-9 range, unless they really want to mess things up (like using unsafe reflection).
This also reminded me of a job where we had one enum with something like 20 values... and then many, many, many switches to get the different traits of each value.
Wouldn't it be better to just have classes, with all the traits, and use the classes?
Aside from use in switch statements, they work the same in most cases, and are even easier to use in cases where we usually had to write helper methods... and if a new trait is added, there is a single place to fix (where the values are declared), with no chance of "forgetting" a case in a switch somewhere else.
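To make the comparison concrete, here is a small sketch with hypothetical shapes. The enum needs a separate switch-based helper for each trait, and that switch needs a discard arm for values like `(ShapeKind)56`, which also means the compiler won't flag a newly added member that has no arm. The class version supplies every trait where the value is declared.

```csharp
using System;

// Enum + switch: each trait lives in its own helper, away from the declaration.
public enum ShapeKind { Triangle, Square }

public static class ShapeKindTraits
{
    public static int Sides(this ShapeKind kind) => kind switch
    {
        ShapeKind.Triangle => 3,
        ShapeKind.Square => 4,
        _ => throw new ArgumentOutOfRangeException(nameof(kind)), // catches (ShapeKind)56 at run time only
    };
}

// Class with traits: the private constructor forces every trait to be supplied up front.
public sealed class Shape
{
    public static readonly Shape Triangle = new("Triangle", 3);
    public static readonly Shape Square = new("Square", 4);

    private Shape(string name, int sides)
    {
        Name = name;
        Sides = sides;
    }

    public string Name { get; }
    public int Sides { get; }

    public override string ToString() => Name;
}
```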
What do you guys think?
Example:
```csharp
// Version 1: a struct whose only public values are the ten static fields
// (plus default(DecimalDigit), whose value 0 matches D0).
public struct DecimalDigit
{
    public static readonly DecimalDigit D0 = new(0);
    public static readonly DecimalDigit D1 = new(1);
    public static readonly DecimalDigit D2 = new(2);
    public static readonly DecimalDigit D3 = new(3);
    public static readonly DecimalDigit D4 = new(4);
    public static readonly DecimalDigit D5 = new(5);
    public static readonly DecimalDigit D6 = new(6);
    public static readonly DecimalDigit D7 = new(7);
    public static readonly DecimalDigit D8 = new(8);
    public static readonly DecimalDigit D9 = new(9);

    // Private, so code outside the struct cannot create other values.
    private DecimalDigit(byte value)
    {
        _value = value;
    }

    private readonly byte _value;

    public byte ByteValue
    {
        get => _value;
    }
}

// Version 2: the equivalent enum. It is an alternative to Version 1,
// so the two can't coexist in the same namespace under the same name.
public enum DecimalDigit : byte
{
    D0,
    D1,
    D2,
    D3,
    D4,
    D5,
    D6,
    D7,
    D8,
    D9
}
```
Notice that although the enum version is smaller, if we need to add names for the values, in the class we just add a property, while for the real enum we have to create a helper method.
If we need to convert them to numbers, add an emoji or whatever, in the first version it is just a matter of adapting the class, while in the second it is a matter of creating more (and somewhat unrelated) methods.
Edit: Some people asked why I'd create a new decimal class. There is no real need to create one; I just wanted to do it as an exercise. I can tell that .NET's built-in BigInteger is way faster than my class. Yet just by writing the UnsignedDecimalInteger I saw opportunities to use Quadbits (effectively a single hexadecimal digit, or just 4 bits), so in one byte I can store 2 Quadbits. I also saw opportunities for caching the internal buffers I use... and I am just "relearning" how to do math the "old way" using decimal digits. At some point, I will improve it to work on 32 or 64 bits at once.
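Storing two decimal digits per byte this way is usually called packed BCD. Here is a minimal sketch of packing and unpacking two 4-bit values, assuming the high nibble holds the first digit. `Nibbles` is a hypothetical name.

```csharp
using System;

public static class Nibbles
{
    // Packs two values in 0..15 into one byte: 'high' in bits 4-7, 'low' in bits 0-3.
    public static byte Pack(byte high, byte low)
    {
        if (high > 0xF) throw new ArgumentOutOfRangeException(nameof(high));
        if (low > 0xF) throw new ArgumentOutOfRangeException(nameof(low));
        return (byte)((high << 4) | low);
    }

    public static byte High(byte packed) => (byte)(packed >> 4);

    public static byte Low(byte packed) => (byte)(packed & 0x0F);
}
```

For decimal digits only 0..9 are used per nibble, so `Pack(4, 2)` gives `0x42`, which conveniently reads as "42" in a hex dump.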
Also, one of the next steps, be it with BigInteger or my UnsignedDecimalInteger, is to create a BigDecimal or similar class. In fact, to hold a value alone (without caring about operations), I just need the integer digits plus a value telling where the decimal point separating the integer and fractional parts goes. Or I can literally have two BigIntegers (or similar), one for the left side of the decimal point and one for the right side.
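The first option (one big integer plus the position of the decimal point) is the same idea as Java's `BigDecimal`, which stores an unscaled value and a scale. The two-integer option needs extra care, because a separate integer for the fractional part loses leading zeros: 1.05 and 1.5 would both store 5 on the right-hand side unless its length is also kept. Here is a minimal sketch of the first option on top of `System.Numerics.BigInteger`. `BigDecimalSketch` is a hypothetical name, and only formatting is shown, not arithmetic.

```csharp
using System;
using System.Globalization;
using System.Numerics;

// Represents Unscaled / 10^Scale, e.g. 123.45 is (12345, 2).
public readonly struct BigDecimalSketch
{
    public BigDecimalSketch(BigInteger unscaled, int scale)
    {
        if (scale < 0) throw new ArgumentOutOfRangeException(nameof(scale));
        Unscaled = unscaled;
        Scale = scale;
    }

    public BigInteger Unscaled { get; }
    public int Scale { get; }

    public override string ToString()
    {
        if (Scale == 0) return Unscaled.ToString(CultureInfo.InvariantCulture);

        // Pad so there is at least one digit before the point (e.g. 5 with scale 3 -> "0005").
        string digits = BigInteger.Abs(Unscaled)
            .ToString(CultureInfo.InvariantCulture)
            .PadLeft(Scale + 1, '0');
        string sign = Unscaled.Sign < 0 ? "-" : "";
        return $"{sign}{digits[..^Scale]}.{digits[^Scale..]}";
    }
}
```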
modified 1-Aug-23 18:09pm.
|
|
|
|
|
I use char.IsDigit more often than defining what a digit is. Enums make code more readable, and can save storage.
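One caveat if `char.IsDigit` stands in for a hand-written check: it returns true for any Unicode decimal digit (category Nd), such as the Arabic-Indic digit U+0663, not only '0' to '9'. .NET 7 added `char.IsAsciiDigit` for the narrower test, and on older runtimes a range check does the same. `DigitChecks` is a hypothetical name.

```csharp
using System;

public static class DigitChecks
{
    // Accepts only '0'..'9'; equivalent to char.IsAsciiDigit on .NET 7 and later.
    public static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';

    // Converts an ASCII digit to its numeric value, rejecting other Unicode digits.
    public static int ToValue(char c) =>
        IsAsciiDigit(c) ? c - '0' : throw new ArgumentOutOfRangeException(nameof(c));
}
```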
|
|
|
|
|
Your DecimalDigit struct should be marked as readonly, since it will never be mutated.
You could also replace the backing field with a read-only auto property.
You'll probably want to implement IEquatable<T>, and possibly IComparable<T>, and override ToString. At which point, it might be better to use a readonly record struct.
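Putting those suggestions together, here is a minimal sketch of the readonly record struct version (C# 10 or later). The compiler generates Equals, GetHashCode, `==`, `!=` and `IEquatable<DecimalDigit>`, while `IComparable<DecimalDigit>` and the digit-only ToString are written by hand. This is an illustration, not the original poster's code.

```csharp
using System;

public readonly record struct DecimalDigit : IComparable<DecimalDigit>
{
    public static readonly DecimalDigit D0 = new(0);
    public static readonly DecimalDigit D1 = new(1);
    public static readonly DecimalDigit D2 = new(2);
    public static readonly DecimalDigit D3 = new(3);
    public static readonly DecimalDigit D4 = new(4);
    public static readonly DecimalDigit D5 = new(5);
    public static readonly DecimalDigit D6 = new(6);
    public static readonly DecimalDigit D7 = new(7);
    public static readonly DecimalDigit D8 = new(8);
    public static readonly DecimalDigit D9 = new(9);

    // Private, so outside code only gets the fields above (or default, which equals D0).
    private DecimalDigit(byte value) => ByteValue = value;

    // Read-only auto property in place of the explicit backing field.
    public byte ByteValue { get; }

    public int CompareTo(DecimalDigit other) => ByteValue.CompareTo(other.ByteValue);

    // Replaces the generated "DecimalDigit { ByteValue = 5 }" with just "5".
    public override string ToString() => ((char)('0' + ByteValue)).ToString();
}
```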
And after all that, without having to resort to reflection or unsafe code, you can still create an invalid instance:
```csharp
ReadOnlySpan<byte> span = new byte[] { 42 };
DecimalDigit digit = System.Runtime.InteropServices.MemoryMarshal.Read<DecimalDigit>(span);
Console.WriteLine(digit.ByteValue); // prints 42
```
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Thanks, I forgot I can make the entire struct readonly.
Yet I wouldn't get rid of the backing field... I don't like to make a class/struct use property getters instead of using its fields directly... that's really a personal choice.
In any case, I will mark the struct as readonly. Thanks for reminding me of that!
|
|
|
|
|
Question... if I have an array of my struct... the array field itself is readonly... yet I can modify the contents of the array... should I mark the class readonly because I never replace one array with another, or not, since there are mutator methods that change the contents of the inner array?
|
|
|
|
|
You can't mark a class as readonly; only a struct.
And a struct containing an array of structs is probably a bad idea - the entire array would need to be stored inline as part of the outer struct's data.
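A quick way to check how an array field behaves inside a readonly struct: in C# an array is a reference type, so the field holds only a reference and copies of the struct share the same array. `readonly` prevents re-pointing the field, but not writing to its elements. `DigitBuffer` is a hypothetical name.

```csharp
public readonly struct DigitBuffer
{
    // Only the reference is stored in the struct; the array itself lives on the heap.
    private readonly byte[] _digits;

    public DigitBuffer(int length) => _digits = new byte[length];

    // Compiles in a readonly struct: the field is never reassigned, only an element is written.
    public void Set(int index, byte value) => _digits[index] = value;

    public byte Get(int index) => _digits[index];
}
```

Copying a `DigitBuffer` and calling `Set` on one copy changes what `Get` returns on the other, because both copies point at the same array.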
|
|
|
|
|
The real class implements IEquatable and IComparable. I simply didn't want to make the code too long for this discussion.
And even though the MemoryMarshal methods aren't marked as unsafe, they are, by definition, unsafe. In fact, reflection itself isn't called unsafe either, although it lets you create instances of classes that don't define default constructors, and the like...
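To illustrate the point about reflection, here is a minimal sketch using a trimmed-down copy of the `DecimalDigit` struct. It writes an out-of-range value into the private readonly field through `FieldInfo.SetValue`, and it creates a class instance without running its constructor via `RuntimeHelpers.GetUninitializedObject` (available on .NET 5 and later). `ReflectionBypass` and `NeedsArgs` are hypothetical names.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public struct DecimalDigit
{
    public static readonly DecimalDigit D9 = new(9);

    private DecimalDigit(byte value) => _value = value;

    private readonly byte _value;

    public byte ByteValue => _value;
}

// Hypothetical class whose constructor validates its argument.
public sealed class NeedsArgs
{
    public NeedsArgs(int x) => X = x > 0 ? x : throw new ArgumentOutOfRangeException(nameof(x));

    public int X { get; }
}

public static class ReflectionBypass
{
    // Writes any byte into the private readonly field, bypassing the private constructor.
    public static DecimalDigit Forge(byte raw)
    {
        object boxed = default(DecimalDigit);  // box a copy so SetValue can mutate it in place
        FieldInfo field = typeof(DecimalDigit).GetField(
            "_value", BindingFlags.NonPublic | BindingFlags.Instance)!;
        field.SetValue(boxed, raw);            // permitted for instance readonly fields
        return (DecimalDigit)boxed;
    }

    // Skips the constructor entirely, so its validation never runs and X stays 0.
    public static NeedsArgs Uninitialized() =>
        (NeedsArgs)RuntimeHelpers.GetUninitializedObject(typeof(NeedsArgs));
}
```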
|
|
|
|
|
Paulo Zemek wrote: "to implement my own BigInteger class"
Idea: add information to your post about why you want or need to do this.
What functionality does the Microsoft BigInteger struct provide that you need extended, modified, etc.?
«The mind is not a vessel to be filled but a fire to be kindled» Plutarch
|
|
|
|
|