Hi,
I was asked to convert a text file whose lines have the format

10_1 a;b<>cd<>ef

into XML like this:

<ID> 10_1 </ID>
<name>
<Name1> a </Name1>
<Name2> b </Name2>
</name>
<Def> cd </Def>
<Ven> ef </Ven>

I am having trouble with the first part of the problem: splitting the ID from the names.

I am attaching the code I have written so far. Please do not post suggestions or comments as to how to do it; I have looked into a lot of examples, but I am having continuous error issues.

What I have tried:

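import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class ToXML {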

	BufferedReader in;
	StreamResult out;
	TransformerHandler th;
	AttributesImpl atts;

	public static void main(String args[]) {
		new ToXML().doit();
	}

	public void doit() {
		try {
			in = new BufferedReader(new FileReader("E.txt"));
			out = new StreamResult("E.xml");
			initXML();
			String str;
			while ((str = in.readLine()) != null) {
				process(str);
			}
			in.close();
			closeXML();
		} catch (IOException | ParserConfigurationException | TransformerConfigurationException | SAXException e) {
		}
	}

	public void initXML() throws ParserConfigurationException,
			TransformerConfigurationException, SAXException {
		SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory
				.newInstance();

		th = tf.newTransformerHandler();
		Transformer serializer = th.getTransformer();
		serializer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
		serializer.setOutputProperty(
				"{http://xml.apache.org/xslt}indent-amount", "4");
		serializer.setOutputProperty(OutputKeys.INDENT, "yes");
		th.setResult(out);
		th.startDocument();
		atts = new AttributesImpl();
		th.startElement("", "", "Author", atts);
	}

	public void process(String s) throws SAXException {
		String[] elements = s.split("<>;");
		atts.clear();
		th.startElement("", "", "Data", atts);
		th.startElement("", "", "Bibliofile", atts);
		th.characters(elements[0].toCharArray(), 0, elements[0].length());
		th.endElement("", "", "Bibliofile");
		th.endElement("", "", "Data");
	}

	public void closeXML() throws SAXException {
		th.endElement("", "", "Author");
		th.endDocument();
	}
}
Updated 6-Jun-18 22:01pm
Comments
Maciej Los 6-Jun-18 3:09am    
Improve your question and provide proper xml data structure
Richard MacCutchan 6-Jun-18 3:31am    
"I am having continuous error issues."
Well, I am afraid no one here can guess what they are.
Souvik Bhattacharya 6-Jun-18 7:59am    
I think I have posted my query very clearly above. Do read the entire thing first. You'll understand what I meant.
Richard MacCutchan 6-Jun-18 8:29am    
Sorry, but you still have not explained what these "error issues" are, and we have no way of guessing. If your code does not work then please explain what is going wrong and where the errors occur.
Souvik Bhattacharya 6-Jun-18 8:36am    
I want to convert my text file into the XML format given above. The code I wrote produces XML from the text, but nothing close to the format I have mentioned. What should I do?

Use a regular expression (untested: "(.+?) (.+?);(.+?)<>(.+?)<>(.+)") or split multiple times:
  1. Split by space to get the ID and the remaining part
  2. Split the remaining part by "<>"
  3. Split the first part from the above by ";" to get the two names
  4. The two other parts are the Def and Ven fields

[EDIT]
When not using the regex method, the split by space in the first step should be done "manually" by locating the first space character in the input string rather than by using String.split(). Otherwise the "remaining part" may consist of multiple substrings when the input string contains more spaces.
[/EDIT]
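
A minimal sketch of the regex variant, assuming every line has exactly the form ID, space, Name1;Name2<>Def<>Ven. The class name RegexLineParser and the method parse are made up for illustration; only the pattern itself comes from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question's code.
final class RegexLineParser {
    // Five capture groups: ID, Name1, Name2, Def, Ven.
    private static final Pattern LINE =
            Pattern.compile("(.+?) (.+?);(.+?)<>(.+?)<>(.+)");

    /** Returns {id, name1, name2, def, ven}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null; // malformed line; the caller decides how to report it
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints 10_1|a|b|cd|ef
        System.out.println(String.join("|", parse("10_1 a;b<>cd<>ef")));
    }
}
```

Returning null for a non-matching line is just one option; with 500 input lines it may be more useful to log the line number and skip the line.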
 
Comments
Souvik Bhattacharya 6-Jun-18 7:58am    
Can you tell me where I should apply this function?
Jochen Arndt 6-Jun-18 8:03am    
The code should be in your process() function where you handle a single input line.

While thinking about it:
Splitting by space will work for the ID but might create multiple remaining parts when those contain spaces too. I will update my answer.
Souvik Bhattacharya 6-Jun-18 8:13am    
I did. At present it just reads the ID and shows nothing else.
Jochen Arndt 6-Jun-18 8:39am    
Remember that I can't see your code or your screen, so I don't know what you are doing wrong.

But even in your initial example you only used elements[0] and not the additional substrings.

Try something like this (untested, written from scratch, without error checks):
int pos = s.indexOf(' ');
String sID = s.substring(0, pos);
// All after the first space
String sRemain = s.substring(pos + 1);
// Get Names, Def, and Ven
String[] elements = sRemain.split("<>");
// Get Name1 and Name2
String[] names = elements[0].split(";");
// Name1 in names[0]
// Name2 in names[1]
// Def in elements[1]
// Ven in elements[2]
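
A rough sketch of how those pieces could be written out with the TransformerHandler from the question. The class RecordWriter and the helpers element and toXml are hypothetical names, the "Data" wrapper element is borrowed from the question's code, and there are still no checks for malformed lines.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical class for illustration only.
final class RecordWriter {
    private static final AttributesImpl NO_ATTS = new AttributesImpl();

    // Writes <tag>text</tag> through the handler.
    static void element(TransformerHandler th, String tag, String text) throws SAXException {
        th.startElement("", "", tag, NO_ATTS);
        th.characters(text.toCharArray(), 0, text.length());
        th.endElement("", "", tag);
    }

    // Splits one input line and emits all five fields, not just elements[0].
    static void process(TransformerHandler th, String s) throws SAXException {
        int pos = s.indexOf(' ');                             // first space ends the ID
        String[] elements = s.substring(pos + 1).split("<>"); // names, Def, Ven
        String[] names = elements[0].split(";");              // Name1, Name2
        th.startElement("", "", "Data", NO_ATTS);
        element(th, "ID", s.substring(0, pos));
        th.startElement("", "", "name", NO_ATTS);
        element(th, "Name1", names[0]);
        element(th, "Name2", names[1]);
        th.endElement("", "", "name");
        element(th, "Def", elements[1]);
        element(th, "Ven", elements[2]);
        th.endElement("", "", "Data");
    }

    // Converts a single line to an XML string, mainly to make the sketch easy to try.
    static String toXml(String line) {
        try {
            SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler th = tf.newTransformerHandler();
            th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            th.setResult(new StreamResult(sw));
            th.startDocument();
            process(th, line);
            th.endDocument();
            return sw.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the question's program only process() would change; initXML() and closeXML() can stay as they are.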
Souvik Bhattacharya 6-Jun-18 10:37am    
I have pasted my entire code. Is that what you wanted?
Based on this: How to split a string in Java: String Split with multiple characters using Regex (solution by Ravindra babu), I'd try something like this:

Java
String line = "10_1 a;b<>cd<>ef";
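// Split on whitespace, ';', or the two-character sequence "<>".
// A character class such as "[\\s;<>]" treats '<' and '>' as separate
// delimiters and leaves empty strings between them, which shifts the indexes.
String delimiters = "[\\s;]|<>";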

String[] result = line.split(delimiters);
String xmlContent = "<Root><ID>" + result[0] + "</ID>";
xmlContent += "<Name><Name1>" + result[1] + "</Name1>";
xmlContent += "<Name2>" + result[2] + "</Name2></Name>";
xmlContent += "<Def>" + result[3] + "</Def>";
xmlContent += "<Ven>" + result[4] + "</Ven></Root>";
System.out.println(xmlContent);


Note: not tested, but it should work as well!
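
To handle a whole file instead of one hard-coded line, the same split can be applied to each line in turn. This is only a sketch: the class FileToXml, the per-line wrapper element "Record", and the use of Files.readAllLines are my assumptions, the file name E.txt is taken from the question, and the field values are not XML-escaped. The delimiter here treats "<>" as one separator so that no empty fields appear.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class for illustration only.
final class FileToXml {
    // Converts one line such as "10_1 a;b<>cd<>ef" to an XML fragment.
    static String lineToXml(String line) {
        String[] r = line.split("[\\s;]|<>"); // whitespace, ';' or "<>"
        return "<Record><ID>" + r[0] + "</ID>"
                + "<Name><Name1>" + r[1] + "</Name1>"
                + "<Name2>" + r[2] + "</Name2></Name>"
                + "<Def>" + r[3] + "</Def>"
                + "<Ven>" + r[4] + "</Ven></Record>";
    }

    // Wraps the fragments of all non-blank lines in a single <Root> element.
    static String convert(Iterable<String> lines) {
        StringBuilder xml = new StringBuilder("<Root>");
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                xml.append(lineToXml(line));
            }
        }
        return xml.append("</Root>").toString();
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory, which is fine for a few hundred lines.
        System.out.println(convert(Files.readAllLines(Paths.get("E.txt"))));
    }
}
```

This still assumes that no field contains a space, a semicolon, or "<>"; if fields can contain spaces, splitting in several steps is the safer route.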
 
 
Comments
Souvik Bhattacharya 7-Jun-18 6:17am    
What you posted takes only the first line of the text file. I have 500 more in that exact same pattern.
Maciej Los 7-Jun-18 6:21am    
Yes, because I've used a single line only!
You have to read the lines from the file (line by line) and process every single line as shown in the code above.
Think about it!
Souvik Bhattacharya 7-Jun-18 6:37am    
Okay. Thank you!
Maciej Los 7-Jun-18 6:44am    
You're very welcome.
Please accept my solution (green button), if it was helpful.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


