Hi,

In our VSTO document-level add-in we have to reliably store the contents of RichText ContentControls in a persistable form, so that they can later be restored exactly as they were.

According to the documentation of both the Interop and VBA APIs, this should be possible by using the content's OpenXML form: retrieve it with ContentControl.Range.WordOpenXML and restore it with ContentControl.Range.InsertXML.

This works in many cases but fails in others (see below).

Does anybody know a better, truly reliable(!) way of saving and restoring the contents? Or can you tell us how to work around the current odd behaviour, which from our point of view is a bug?

Since official Microsoft support told us that they do not deal with Office API issues, I hope that someone from Microsoft's Office development team also reads this...

Thanks in advance,

J.K

What I have tried:

Three examples of unreliability:
(1) A RichText ContentControl ("CC") containing a single paragraph including its terminating paragraph break: the retrieved WordOpenXML is identical to the one taken from a CC containing a single paragraph without the paragraph break. The result is therefore not surprising: in both cases no paragraph break is restored after Range.InsertXML. In other words, WordOpenXML failed to extract the correct XML in the case where the original contained the break.

(2) A CC containing a single image (and nothing else) is restored without the image, i.e. differently from the original. If it contains anything besides the image, the problem does not occur.

(3) Examples 1 and 2 describe the behaviour as long as the page header has never been visited. At least in VBA (I haven't checked it in depth with VSTO/Interop), the behaviour changes drastically once a page header has been visited. Only visited, nothing more! The restored contents then contain an additional final paragraph break. So if you save and restore the contents three times, you end up with three paragraph breaks that were not present in the original. This odd behaviour persists even after saving and reopening the document...
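The following is a minimal sketch of one way to detect such round-trip failures without hand-analyzing the XML. It summarizes the body of the XML returned by WordOpenXML and compares the summary taken when saving with one taken after restoring. The sketch makes these assumptions:

* The stored string is a Flat OPC package (`pkg:package`) with a `/word/document.xml` part.
* `summarize_body` and `round_trip_differences` are hypothetical helper names.
* A `w:p` counts as "empty" when its only children are `w:pPr`, which is a simplification.
* Only `w:drawing` is counted as an image. Legacy VML `w:pict` images are not counted.

The sketch can only flag losses that happen on restore, such as a missing image (provided the saved XML still contains its `w:drawing`) or extra trailing breaks as in example 3. In example 1 the information is already missing from the saved XML, so both summaries would agree.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespaces used by the Flat OPC package that WordOpenXML returns (assumed input).
PKG = "http://schemas.microsoft.com/office/2006/xmlPackage"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


@dataclass
class BodySummary:
    paragraphs: int                 # top-level w:p elements in w:body
    drawings: int                   # w:drawing elements anywhere in w:body
    trailing_empty_paragraphs: int  # empty w:p at the end, ignoring w:sectPr


def _find_body(flat_opc: str) -> ET.Element:
    """Return w:body of the /word/document.xml part of a Flat OPC string."""
    root = ET.fromstring(flat_opc)
    for part in root.iter(f"{{{PKG}}}part"):
        if part.get(f"{{{PKG}}}name") == "/word/document.xml":
            body = part.find(f".//{{{W}}}body")
            if body is not None:
                return body
    raise ValueError("no /word/document.xml part with a w:body found")


def _is_empty_paragraph(elem: ET.Element) -> bool:
    # Simplification: "empty" means a w:p whose only children are w:pPr.
    return elem.tag == f"{{{W}}}p" and all(c.tag == f"{{{W}}}pPr" for c in elem)


def summarize_body(flat_opc: str) -> BodySummary:
    body = _find_body(flat_opc)
    children = list(body)
    trailing = 0
    for child in reversed(children):
        if child.tag == f"{{{W}}}sectPr":
            continue  # section properties usually close the body; skip them
        if not _is_empty_paragraph(child):
            break
        trailing += 1
    return BodySummary(
        paragraphs=sum(1 for c in children if c.tag == f"{{{W}}}p"),
        drawings=sum(1 for _ in body.iter(f"{{{W}}}drawing")),
        trailing_empty_paragraphs=trailing,
    )


def round_trip_differences(saved: str, restored: str) -> list[str]:
    """List the summary fields that differ between saved and restored XML."""
    before, after = summarize_body(saved), summarize_body(restored)
    return [
        f"{name}: {getattr(before, name)} -> {getattr(after, name)}"
        for name in ("paragraphs", "drawings", "trailing_empty_paragraphs")
        if getattr(before, name) != getattr(after, name)
    ]
```

In the add-in, `saved` would be the stored string and `restored` the value of ContentControl.Range.WordOpenXML read back right after InsertXML. A non-empty result flags the control. The same checks could be ported to .NET's LINQ to XML.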
Updated 11-May-21 3:01am
Comments
[no name] 11-May-21 14:59pm    
For every "example", you've identified the issue. You can either wait for the next "version", or do the fixes yourself; e.g. add the "paragraph break" yourself when "reloading". The difference between wolves and dogs.
Juliett Kilo 12-May-21 1:31am    
@Gerry Schmitz: It is not that "easy". Corrections come too late at reload time, because by then I no longer have the original and cannot tell whether, e.g., I have to add a paragraph, or how else to correct the stored contents. So the XML already has to be corrected when it is read. Do you really want to suggest that I (1) analyze the XML returned by WordOpenXML and (2) add, e.g., the missing image (example 2) myself if it is missing? You can imagine that this would almost amount to developing my own version of Word...
[no name] 12-May-21 14:37pm    
It could be easy or hard. Do you know how to fix it manually? (So you can "program" it if you had to.)

You said the image shows if there's other content: so, add "hidden content" or content before you serialize when there is only an image. Then take it out when you deserialize. (Hard?)

What does a "line break" look like? Are you saying you can't find one, or three in a row, and remove them? (Hard?)

How many ways can you serialize / deserialize? .Text property? XML only?

I have no answer as far as "easy" is concerned.

There's good, fast and cheap: you get to pick 2.

(I "generate" book content for rich text controls that includes images, text, hyperlinks and UI elements, and parse it again at run time. I don't consider RTBs a big deal.)
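The suggestion of finding surplus trailing breaks and removing them could be sketched at the XML level as follows. This is an illustration, not a tested fix. The sketch makes these assumptions:

* The input is the Flat OPC string returned by WordOpenXML.
* `trim_trailing_empty_paragraphs` is a hypothetical helper.
* `keep` is meant to be the number of trailing empty paragraphs recorded when the content was saved. This is hypothetical metadata stored next to the XML. It addresses the objection that the original is no longer available at reload time only if that number is actually stored.

Re-serializing with ElementTree drops the `<?mso-application?>` processing instruction. It also drops namespace declarations that no element or attribute uses, which may matter for prefixes listed in `mc:Ignorable`. Whether InsertXML accepts the result is therefore untested here.

```python
import io
import xml.etree.ElementTree as ET

PKG = "http://schemas.microsoft.com/office/2006/xmlPackage"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _reuse_prefixes(xml_text: str) -> None:
    """Register the input's prefixes so output keeps w:, pkg:, ... instead of ns0:."""
    # Note: ElementTree's prefix registry is global to the process.
    for _event, (prefix, uri) in ET.iterparse(io.StringIO(xml_text), events=("start-ns",)):
        if prefix:
            try:
                ET.register_namespace(prefix, uri)
            except ValueError:
                pass  # ElementTree reserves prefixes of the form ns<digits>


def trim_trailing_empty_paragraphs(flat_opc: str, keep: int = 0) -> tuple[str, int]:
    """Remove surplus empty w:p elements at the end of the document body.

    keep: how many trailing empty paragraphs to leave (hypothetical value
    recorded at save time). Returns the re-serialized XML and the number
    of paragraphs removed.
    """
    _reuse_prefixes(flat_opc)
    root = ET.fromstring(flat_opc)
    removed = 0
    # Collect parts first so the tree is not modified while being iterated.
    for part in list(root.iter(f"{{{PKG}}}part")):
        if part.get(f"{{{PKG}}}name") != "/word/document.xml":
            continue
        body = part.find(f".//{{{W}}}body")
        if body is None:
            continue
        paragraphs = [c for c in body if c.tag == f"{{{W}}}p"]
        trailing = []  # empty paragraphs at the end, last one first
        for child in reversed(list(body)):
            if child.tag == f"{{{W}}}sectPr":
                continue
            if child.tag == f"{{{W}}}p" and all(c.tag == f"{{{W}}}pPr" for c in child):
                trailing.append(child)
            else:
                break
        # Never remove the body's last remaining paragraph.
        surplus = max(0, min(len(trailing) - keep, len(paragraphs) - 1))
        for p in trailing[:surplus]:
            body.remove(p)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

If the original content can legitimately end with empty paragraphs, trimming with the default `keep=0` would lose them. This is why the expected count has to be captured at save time.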

Juliett Kilo 14-May-21 1:29am    
@Gerry Schmitz: "Easy" is not the issue. But a solution must be (1) reliable and (2) feasible with reasonable effort.

Serialization: .Text is not an option, because it cannot contain, e.g., the images mentioned. XML is an option, as mentioned, but .WordOpenXML doesn't work reliably. What alternatives can you suggest for serializing arbitrary Word content? [That was part of my original question :-) ]

"And parse them again at run time": In XML or in the object model? The object model is not the problem. Analyzing and modifying the XML, however, is an unreasonable effort from our point of view.

By the way, I don't understand why you mentioned RTBs here; I can't see how they could help in a VSTO document-level customization.
[no name] 24-May-21 0:13am    
"Feasible with reasonable effort". Without digging deeper you won't find it; and I get that digging is not your cup of tea. So, you get to wait until someone else thinks it's worth doing.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)