I need help formatting the output of a script that compares two XML files.

test1.xml


XML
<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Men's">
         <item_number>QWZ5671</item_number>
         <cool_number>QWZ5671</cool_number>
         <price>39.5</price>
         <size description="Medium">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
         </size>
      </catalog_item>
      <catalog_item gender="Women's">
         <item_number>RRX986</item_number>
         <price>42.50</price>
         <size description="Small">
            <color_swatch image="red_cardigan.jpg">Red</color_swatch>
            <color_swatch image="navy_cardigan.jpg">Nay</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burundy</color_swatch>
         </size>
      </catalog_item>
   </product>
</catalog>


test2.xml


XML
<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
   <product description="Cardigan Sweater" product_image="cardigan.jpg">
      <catalog_item gender="Men's">
         <item_number>QWZ5671</item_number>
         <cool_number>QWZ5671</cool_number>
         <price>39.5</price>
         <size description="Medium">
            <color_swatch image="red_cardigan.jpg">pink</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
         </size>
      </catalog_item>
      <catalog_item gender="Women's">
         <item_number>peac</item_number>
         <price>42.50</price>
         <size description="Small">
            <color_swatch image="red_cardigan.jpg">lost</color_swatch>
            <color_swatch image="navy_cardigan.jpg">pet</color_swatch>
            <color_swatch image="burgundy_cardigan.jpg">hey</color_swatch>
         </size>
      </catalog_item>
   </product>
</catalog>


Current output, with no file names and the differences jumbled together:


Python
{'QWZ5671': [{'color_swatch': ['Red', 'pink']}],
 'RRX986': [{'item_number': ['RRX986', 'peac']},
            {'color_swatch': ['hey', 'pet', 'Burundy', 'Nay', 'lost', 'Red']}]}


Expected output, with proper formatting and file names. Any help with this would be appreciated.

Python
{'QWZ5671': [{'color_swatch': [{'test1.xml': 'Red', 'test2.xml': 'pink'}]}],
 'RRX986': [{'item_number': [{'test1.xml': 'RRX986', 'test2.xml': 'peac'}]},
            {'color_swatch': [{'test1.xml': 'Burundy', 'test2.xml': 'hey'},
                              {'test1.xml': 'Nay', 'test2.xml': 'pet'},
                              {'test1.xml': 'Red', 'test2.xml': 'lost'}]}]}


What I have tried:

Python
from collections import defaultdict

from lxml import etree
from pprintpp import pprint as pp


def load_items(path):
    """Return one dict per <catalog_item>, mapping names to lists of values."""
    items = []
    for node in etree.parse(path).getroot().findall('.//catalog_item'):
        item = defaultdict(list)
        for x in node.iter():
            # Record the first attribute (if any) under its own name.
            if x.attrib:
                name, value = next(iter(x.attrib.items()))
                item[name].append(value)
            # x.text is None for empty elements, so guard before stripping.
            text = (x.text or '').strip()
            if text:
                item[x.tag].append(text)
        items.append(dict(item))
    # Sort so that items from both files line up by position.
    return sorted(items, key=lambda d: d['item_number'])


d1 = load_items('test1.xml')
d2 = load_items('test2.xml')

res_dict = defaultdict(list)
for x, y in zip(d1, d2):
    for key in x:
        # Only compare keys that exist in both items.
        if key in y and sorted(x[key]) != sorted(y[key]):
            # Symmetric difference: values found in only one of the two files.
            # Sets are unordered, so the source file and the order are lost here.
            res_dict[x['item_number'][0]].append({key: list(set(x[key]) ^ set(y[key]))})

if not res_dict:
    print('Data is same in both XML files')
else:
    pp(dict(res_dict))
Posted
Updated 19-Jun-20 1:05am
Comments
Richard MacCutchan 19-Jun-20 8:14am    
You need to add the source file names to each item in the lists.
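One way to follow this suggestion is to store each value together with the name of the file it came from while the items are being collected. The sketch below is only an illustration: `extract_items` and `SAMPLE` are hypothetical names, and it uses the standard-library `xml.etree.ElementTree` instead of lxml so that it runs without extra packages. It parses an inline string rather than the real files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_items(xml_text, source_name):
    """Return one dict per <catalog_item>: tag -> [(source_name, text), ...]."""
    items = []
    for node in ET.fromstring(xml_text).iter('catalog_item'):
        item = defaultdict(list)
        for child in node.iter():
            # Skip elements whose text is missing or whitespace only.
            text = (child.text or '').strip()
            if text:
                # Keep the file name next to every value.
                item[child.tag].append((source_name, text))
        items.append(dict(item))
    return items


# Hypothetical cut-down sample, not the full test1.xml.
SAMPLE = """<catalog><product><catalog_item gender="Men's">
<item_number>QWZ5671</item_number>
<size description="Medium">
<color_swatch image="red_cardigan.jpg">Red</color_swatch>
</size>
</catalog_item></product></catalog>"""

items = extract_items(SAMPLE, 'test1.xml')
print(items[0]['color_swatch'])  # [('test1.xml', 'Red')]
```

Because every value carries its file name, a later comparison step can report which file a differing value belongs to.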
Member 14867652 19-Jun-20 9:43am    
It is done. Any suggestions for the jumbled ordering in the current output? For example, it comes out as

{'color_swatch': ['hey', 'pet', 'Burundy', 'Nay', 'lost', 'Red']}]}

with the values from both XML files mixed together.
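The mixing most likely comes from the set symmetric difference (`set(a) ^ set(b)`): a set keeps neither the order of the values nor the file they came from. A possible alternative, sketched here under the assumption that values at the same position in both files belong together, is to pair them by position and keep only the pairs that differ. `diff_values` is a hypothetical helper name, not part of any library.

```python
from itertools import zip_longest


def diff_values(name_a, values_a, name_b, values_b):
    """Pair values by position and keep only the pairs that differ.

    Assumes that the n-th value in one file corresponds to the n-th value
    in the other; a missing value on either side is reported as None.
    """
    return [
        {name_a: a, name_b: b}
        for a, b in zip_longest(values_a, values_b)
        if a != b
    ]


# color_swatch values of the second catalog_item, in document order.
test1_values = ['Red', 'Nay', 'Burundy']
test2_values = ['lost', 'pet', 'hey']

result = diff_values('test1.xml', test1_values, 'test2.xml', test2_values)
for pair in result:
    print(pair)
```

Each entry in `result` is a dict with one value per file, so the pairs stay in document order and identical values are left out.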

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


