I am trying to read different tag values (like tags 259 (Compression), 33432 (Copyright), 306 (DateTime), 315 (Artist) etc.) from a TIFF image in Java 11.

What I have tried:

I tried ImageIO as follows:

Java
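import java.io.File;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.plugins.tiff.TIFFDirectory;
import javax.imageio.plugins.tiff.TIFFField;
import javax.imageio.stream.ImageInputStream;

File tiffFile = new File(tiffFileName);

ImageInputStream input = ImageIO.createImageInputStream(tiffFile);
ImageReader reader = ImageIO.getImageReaders(input).next();

reader.setInput(input);
IIOMetadata metadata = reader.getImageMetadata(0);

TIFFDirectory ifd = TIFFDirectory.createFromMetadata(metadata);
TIFFField myTag = ifd.getTIFFField(33432); // 33432 = Copyright
String tagString = myTag.getAsString(0);
// problem here

// Replace umlauts and sharp s with ASCII equivalents
//String[][] replacements = { { "ä", "ae" }, { "ü", "ue" }, { "ö", "oe" }};
String[][] replacements = {{"\u00C4", "Ae"}, {"\u00DC", "Ue"}, {"\u00D6", "Oe"},
      {"\u00E4", "ae"}, {"\u00FC", "ue"}, {"\u00F6", "oe"}, {"\u00DF", "ss"} };

for (String[] replacement : replacements) {
    // replace() matches literally; replaceAll() would treat the text as a regex
    tagString = tagString.replace(replacement[0], replacement[1]);
}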


But it does not give the exact value of the tag. For non-ASCII characters (ö, ü, ä etc.), question marks replace the real values. TIFFField.getAsString(0) returns values like Universit�t, but I want Universität.

Can anyone tell me how to get the byte values of the tag and then decode them as UTF-8 to get the exact tag values?

Suggestions for an alternative Java library for reading TIFF images are also welcome. I just need to read the exact tag values, including non-ASCII characters.
Updated 11-Nov-20 2:35am
Comments
Richard MacCutchan 6-Nov-20 7:32am    
The values are correct, it is your display code that is producing the strange characters. You need to know the language that is being used in the text and adjust your display font to match it.
Member 12213239 6-Nov-20 8:58am    
Any idea how to handle the display font?
Richard MacCutchan 6-Nov-20 11:20am    
That depends on how you are displaying the results.
Member 12213239 6-Nov-20 11:45am    
I want to replace the umlauts (ä, ö, and ü) with equivalent characters like ae, oe and ue. My problem here is that TIFFField.getAsString(0) returns values like Universit�t, not the exact value Universität. Can you tell me specifically how to get the exact value, including the umlaut?
Richard MacCutchan 6-Nov-20 12:02pm    
No, they do not return "Universit�t", that is produced by you trying to display a character in a font that has no equivalent for that character's value. You need to examine the character's actual value. It is no use trying to print it and hoping for the best. Look at the Character Map application in the Windows Accessories folder on the start menu. That will show you what characters are equivalent in different language fonts.

1 solution

Quote:
Can anyone tell me how to get the byte values of the tag and then decode them as UTF-8 to get the exact tag values?

First, you need to understand that before Unicode (in the DOS era), byte values between 128 and 255 were used for special characters, with code pages selecting among different character sets.
ASCII Code - The extended ASCII table
One of the reasons TIFF uses this is that TIFF was created before Unicode/UTF existed; at the time, a way to encode non-ASCII characters was still needed.
- To know what was actually read, you need to display it as hexadecimal.
Your read is probably: 55 6E 69 76 65 72 73 69 74 84 74. In CP437, ä is encoded as 84 (hex); in ISO-8859-1 or Windows-1252 it would be E4 instead.
- You need to understand how your data is encoded and then call a function that converts it to the encoding your application uses.
- If you want to update this data, you will need to do the conversion in reverse.
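
To check the first point, the tag bytes can be read straight from the file and printed as hex. What follows is only a minimal sketch, not part of ImageIO: the class `RawTiffTag` and its methods are hypothetical names, and it assumes a classic (non-BigTIFF) file, a tag stored as TIFF type 2 (ASCII), and a tag that sits in the first IFD. It walks the 12-byte IFD entries by hand, so no charset decoding happens before you see the bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class RawTiffTag {

    /** Returns the undecoded bytes of an ASCII tag in the first IFD, or null if it is absent. */
    static byte[] readAsciiTagBytes(byte[] tiff, int wantedTag) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("Too short to be a TIFF file");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Not a TIFF file");
        }
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Not a classic TIFF file");
        }
        int ifdOffset = buf.getInt(4);
        int entryCount = buf.getShort(ifdOffset) & 0xFFFF;
        for (int i = 0; i < entryCount; i++) {
            // Entry layout: tag (2 bytes), type (2), count (4), value or offset (4).
            int entry = ifdOffset + 2 + 12 * i;
            int tag = buf.getShort(entry) & 0xFFFF;
            int type = buf.getShort(entry + 2) & 0xFFFF;
            int count = buf.getInt(entry + 4);
            if (tag != wantedTag || type != 2) {
                continue;
            }
            // Values of up to 4 bytes are stored inline; longer ones are stored at an offset.
            int start = count <= 4 ? entry + 8 : buf.getInt(entry + 8);
            int length = count;
            while (length > 0 && tiff[start + length - 1] == 0) {
                length--; // drop the trailing NUL terminator(s)
            }
            return Arrays.copyOfRange(tiff, start, start + length);
        }
        return null;
    }

    /** Formats bytes as space-separated two-digit uppercase hex values. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the TIFF file; 33432 is the Copyright tag.
        byte[] raw = readAsciiTagBytes(Files.readAllBytes(Paths.get(args[0])), 33432);
        System.out.println(raw == null ? "tag not found" : toHex(raw));
    }
}
```

If the byte in the position of the ä prints as 84, the text is most likely CP437; E4 points to ISO-8859-1 or Windows-1252; the pair C3 A4 means it is already UTF-8.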

In your case, you probably need a conversion from CP437 to UTF-8.
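
Once you have the raw bytes, the conversion itself is short. This is a sketch under assumptions: the class `TagTextDecoder` is a hypothetical name, the sample bytes are the sequence guessed above rather than data read from your file, and the charset name "IBM437" (Java's name for CP437) is only available when the JDK includes the extended charsets, so the code checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TagTextDecoder {

    /** Decodes raw tag bytes with an explicitly chosen charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** Replaces German umlauts and sharp s with ASCII equivalents. */
    static String transliterate(String text) {
        String[][] replacements = {
            {"\u00C4", "Ae"}, {"\u00DC", "Ue"}, {"\u00D6", "Oe"},
            {"\u00E4", "ae"}, {"\u00FC", "ue"}, {"\u00F6", "oe"}, {"\u00DF", "ss"}
        };
        for (String[] replacement : replacements) {
            // replace() matches literally, with no regex parsing
            text = text.replace(replacement[0], replacement[1]);
        }
        return text;
    }

    public static void main(String[] args) {
        // Assumed sample: "Universität" with ä stored as CP437 byte 0x84.
        byte[] raw = {0x55, 0x6E, 0x69, 0x76, 0x65, 0x72, 0x73, 0x69, 0x74, (byte) 0x84, 0x74};

        if (!Charset.isSupported("IBM437")) {
            System.out.println("IBM437 is not available in this JDK");
            return;
        }
        String decoded = decode(raw, Charset.forName("IBM437"));

        // Java strings are Unicode internally; encode to UTF-8 only when writing bytes out.
        byte[] utf8 = decoded.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes as UTF-8");
        System.out.println(transliterate(decoded));
    }
}
```

If the hex dump shows E4 rather than 84, pass `StandardCharsets.ISO_8859_1` to `decode` instead. The transliterated result is printed because it is plain ASCII, so the console encoding cannot distort it.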
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


