I'm parsing the HTML from a "most recent changes" page on a Fandom wiki with this code using BeautifulSoup:

Python
from bs4 import BeautifulSoup

with open("Recent changes Vintergatan Wiki Fandom.htm") as html:
  doc = BeautifulSoup(html, "html.parser")

results = doc.find_all(class_="mw-changeslist-line-inner")

for articles in results:
  name = (articles.find("a")) # Article name
  number = (articles.find(dir="ltr")) # Number of characters added/removed
  user = (articles.find("bdi")) # User who made the edit

  # Skips grouped edits of the same article

  if ("×)" in articles):
    continue
  else:

    # Skips edits of user profiles

    if ("ndare:" in name.string):
      continue
    else:
      print(name.string + ", " + number.string + ', ' + user.string)


The code works fine for the most part, but it raises this error when it reaches one of the articles, and I have no idea why.

print(name.string + ", " + number.string + ', ' + user.string)
                           ^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'string'


Specifically, it's this piece of HTML that's causing the problem:

HTML
<table data-mw-revid="8933" data-mw-ts="20230902201309" 
class="mw-changeslist-line mw-changeslist-edit mw-changeslist-ns0-Aloûgg 
mw-changeslist-line-watched mw-changeslist-ns-0 mw-changeslist-ns-subject 
mw-changeslist-user-registered mw-changeslist-user-experienced 
mw-changeslist-self mw-changeslist-human mw-changeslist-major 
mw-changeslist-last mw-changeslist-src-mw-edit mw-changeslist-watched 
mw-enhanced-rc mw-tag-visualeditor">
	<tr>
		<td>
			<div class="mw-rcfilters-ui-highlights">
				<div class="mw-rcfilters-ui-highlights-color-none" 
                    data-color="none"></div>
				<div class="mw-rcfilters-ui-highlights-color-c1" 
                    data-color="c1"></div>
				<div class="mw-rcfilters-ui-highlights-color-c2" 
                    data-color="c2"></div>
				<div class="mw-rcfilters-ui-highlights-color-c3" 
                    data-color="c3"></div>
				<div class="mw-rcfilters-ui-highlights-color-c4" 
                    data-color="c4"></div>
				<div class="mw-rcfilters-ui-highlights-color-c5" 
                    data-color="c5"></div>
			</div>
		</td>
		<td>
			
		</td>
		<td class="mw-changeslist-line-prefix"></td>
		<td class="mw-enhanced-rc" 
        colspan="2">     22:13 </td>
		<td class="mw-changeslist-line-inner" 
        data-target-page="Aloûgg">			 
				
					<a href="/sv/wiki/Alo%C3%BBgg" 
                    class="mw-changeslist-title" 
                    title="Aloûgg">Aloûgg</a> 
				
					<span>
						<a class="mw-changeslist-diff" 
                        href="/sv/wiki/Alo%C3%BBgg? 
                        curid=646&diff=8933&
                        oldid=7958">diff</a>
					
					<span>
						<a href="/sv/wiki/Alo%C3%BBgg? 
                        action=history&curid=646" 
                        class="mw-changeslist-history" 
                        title="Aloûgg">hist</a>
					
				</span>
			</span>
			
				<span dir="ltr" class="mw-plusminus-pos 
                mw-diff-bytes" title="1,499 bytes after change">+15‎
			</span>
			
				<a href="/sv/wiki/Anv%C3%A4ndare:User" 
                class="mw-userlink" title="Användare:User">
					<bdi>User</bdi>
				</a>				 
				
					<span>
						<a href="/sv/wiki/Message_Wall:User" 
                        class="mw-usertoollinks-wall" 
                        title="Message Wall:User">Message Wall</a>
					 
					<span>
						<a href="/sv/wiki/Special:Bidrag/User" 
                        class="mw-usertoollinks-contribs" 
                        title="Special:Bidrag/User">contribs</a>
					
				</span>
			</span>
				
					<a href="/sv/wiki/Alo%C3%BBgg? 
action=rollback&from=User&token=8bd802368bbc40821f5960e13b8d11b164f48845%2B%5C" 
     title=""Rollback" reverts the last contributor's edit(s) 
     to this page in one click" data-mw="interface">rollback</a>
			 
				<a href="/sv/wiki/Special:M%C3%A4rken" 
                title="Special:Märken">Tag</a>: 
					<span class="mw-tag-marker 
                    mw-tag-marker-visualeditor">Visual edit
				
			</span>
			
		</td>
	</tr>
</table>


And for reference, everything works perfectly fine with this one:

HTML
<table data-mw-revid="9029" 
data-mw-ts="20230903123500" class="mw-changeslist-line 
mw-changeslist-edit mw-changeslist-ns0-Wresnyj 
mw-changeslist-line-watched mw-changeslist-ns-0 
mw-changeslist-ns-subject mw-changeslist-user-registered 
mw-changeslist-user-experienced mw-changeslist-self 
mw-changeslist-human mw-changeslist-major mw-changeslist-last 
mw-changeslist-src-mw-edit mw-changeslist-watched mw-enhanced-rc 
mw-tag-visualeditor">
	<tr>
		<td>
			<div class="mw-rcfilters-ui-highlights">
				<div class="mw-rcfilters-ui-highlights-color-none" 
                data-color="none"></div>
				<div class="mw-rcfilters-ui-highlights-color-c1" 
                data-color="c1"></div>
				<div class="mw-rcfilters-ui-highlights-color-c2" 
                data-color="c2"></div>
				<div class="mw-rcfilters-ui-highlights-color-c3" 
                data-color="c3"></div>
				<div class="mw-rcfilters-ui-highlights-color-c4" 
                data-color="c4"></div>
				<div class="mw-rcfilters-ui-highlights-color-c5" 
                data-color="c5"></div>
			</div>
		</td>
		<td>
			
		</td>
		<td class="mw-changeslist-line-prefix"></td>
		<td class="mw-enhanced-rc" colspan="2">     14:35 </td>
		<td class="mw-changeslist-line-inner" data-target-page="Wresnyj">			 
				
					<a href="/sv/wiki/Wresnyj" 
                    class="mw-changeslist-title" 
                    title="Wresnyj">Wresnyj</a><span>
						<a class="mw-changeslist-diff" 
                              href="/sv/wiki/Wresnyj?                     curid=1879&diff=9029&oldid=7720">diff</a>
					
					<span>
						<a href="/sv/wiki/Wresnyj?action=history&curid=1879" class="mw-changeslist-history" title="Wresnyj">hist</a>
				</span>
			</span>
			
				<span dir="ltr" class="mw-plusminus-neg mw-diff-bytes" 
            title="690 bytes after change">−40‎
			</span>
			
				<a href="/sv/wiki/Anv%C3%A4ndare:User" 
                class="mw-userlink" title="Användare:User">
					<bdi>User</bdi>
				</a>			 
				
					<span>
						<a href="/sv/wiki/Message_Wall:User" 
                        class="mw-usertoollinks-wall" 
                    title="Message Wall:User">Message Wall</a>
					 
					<span>
						<a href="/sv/wiki/Special:Bidrag/User" 
                        class="mw-usertoollinks-contribs" 
                     title="Special:Bidrag/User">contribs</a>
				</span>
			</span>			 
				
					<a href="/sv/wiki/Wresnyj?action=rollback&from=User&token=23da4e19f25fb98b4c9fb4320490e2fe64f48844%2B%5C" title=""Rollback" reverts the last contributor's edit(s) to this page in one click" data-mw="interface">rollback</a>				 
				
					<a href="/sv/wiki/Special:M%C3%A4rken" 
                    title="Special:Märken">Tag</a>: 
					<span class="mw-tag-marker 
                    mw-tag-marker-visualeditor">Visual edit
				
			</span>
			
		</td>
	</tr>
</table>


Thanks in advance!

What I have tried:

I compared the two pieces of HTML and haven't spotted a single difference between them.
Updated 11-Sep-23 10:41am
Comments
Richard MacCutchan 4-Sep-23 3:57am    
The error message is telling you that the variable named number is None. So the code at the line:
number = (articles.find(dir="ltr"))

returned None, meaning no matching element was found in that row. You need to use the debugger to find out why.
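
If a debugger is not handy, one way to narrow this down is to check every lookup for None before touching .string, and print the rows where something is missing. This is only a minimal sketch: the sample HTML and the helper name describe_row are made up for illustration, and it assumes the same class names as the code in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample; the second row has no dir="ltr" element.
SAMPLE = """
<table><tr>
<td class="mw-changeslist-line-inner"><a>Wresnyj</a>
<span dir="ltr">-40</span><bdi>User</bdi></td>
<td class="mw-changeslist-line-inner"><a>Aloûgg</a><bdi>User</bdi></td>
</tr></table>
"""

def describe_row(row):
    """Return (fields, missing): the texts that were found and the lookups that returned None."""
    lookups = {
        "name": row.find("a"),
        "number": row.find(dir="ltr"),
        "user": row.find("bdi"),
    }
    missing = [key for key, tag in lookups.items() if tag is None]
    fields = {key: tag.get_text(strip=True)
              for key, tag in lookups.items() if tag is not None}
    return fields, missing

doc = BeautifulSoup(SAMPLE, "html.parser")
rows = [describe_row(row)
        for row in doc.find_all(class_="mw-changeslist-line-inner")]

for fields, missing in rows:
    if missing:
        # Report the row instead of crashing on None.string
        print("missing", missing, "in row", fields)
    else:
        print(", ".join([fields["name"], fields["number"], fields["user"]]))
```

Running the same check against the real saved page should show which row lacks the element and what the parser actually found there.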
Richard MacCutchan 4-Sep-23 4:17am    
I just tried this code and it did find the number, but it also found some extra characters. It may be that the document itself is corrupted in some way, which is causing the failure.
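
The extra characters are probably the invisible left-to-right mark (U+200E) that appears to follow the byte count inside the span, plus the trailing line break and tabs. That is an assumption based on the HTML as posted, not something confirmed against the saved file. A small sketch of how they could be stripped; clean_number is a hypothetical helper name:

```python
import unicodedata

def clean_number(text):
    """Drop invisible format characters (Unicode category Cf) and surrounding whitespace."""
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return visible.strip()

# What the span in the question appears to contain: "+15", U+200E, then whitespace
raw = "+15\u200e\n\t\t\t"
cleaned = clean_number(raw)
print(repr(raw), "->", repr(cleaned))
```

The minus sign used for negative changes (U+2212) is an ordinary symbol character, so it is kept.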

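One more thing that may or may not be related: the test `"×)" in articles` does not search the text of the row. With a BeautifulSoup Tag, `in` compares the value against the tag's direct children, so a substring never matches and grouped edits are not skipped. A minimal sketch of the difference; the "(3×)" marker in the sample is an assumption about what a grouped row looks like:

```python
from bs4 import BeautifulSoup

# Hypothetical grouped-edit cell; the "(3×)" text is assumed for illustration.
html = ('<td class="mw-changeslist-line-inner">'
        '<a>Aloûgg</a> (3×) <bdi>User</bdi></td>')
cell = BeautifulSoup(html, "html.parser").td

in_tag = "×)" in cell              # compares against the cell's direct children
in_text = "×)" in cell.get_text()  # substring search in the cell's text

print(in_tag, in_text)
```

Here the first check is False and the second is True, so testing against get_text() is the version that does what the comment in the question describes.
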
This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)