No such file or directory in Python 3

Question

I am new to Python. I want to extract the names of the categories and pages (the category tree) under a Wikipedia category by crawling. While doing this I get the following error and have not been able to resolve it. Any help is greatly appreciated.

Downloading
Traceback (most recent call last):
File "C:\Users\SIBA\Desktop\PDF\Code\trialcode.py", line 100, in <module>
printTree(name, 0)
File "C:\Users\SIBA\Desktop\PDF\Code\trialcode.py", line 80, in printTree
content = open("categories/Category:"+catName+".html").readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'categories/Category:Cricket.html'

The code I have tried is shown below. I am using Python 3.6.

What I have tried:

Python

#Imports
import httplib2
from bs4 import BeautifulSoup
import subprocess
import time
import os,sys
os.path.dirname(sys.argv[0])

#declarations
catRoot = "http://en.wikipedia.org/wiki/Category:"
MAX_DEPTH = 100
done = []
ignore = []
# Removes all newline characters and replaces with spaces
def removeNewLines(in_text):
return in_text.replace('\n', ' ')

# Downloads a link into the destination
def download(link, dest):
# print link
if not os.path.exists(dest) or os.path.getsize(dest) == 0:
subprocess.getoutput('wget "' + link + '" -O "' + dest+ '"')
print ("Downloading")

def ensureDir(f):
    if not os.path.exists(f):
    os.makedirs(f)

# Cleans a text by removing tags
def clean(in_text):
s_list = list(in_text)
i,j = 0,0
while i < len(s_list):
    # iterate until a left-angle bracket is found
    if s_list[i] == '<':
        if s_list[i+1] == 'b' and s_list[i+2] == 'r' and s_list[i+3] == '>':
            i=i+1
            print (hello)
            continue
        while s_list[i] != '>':
            # pop everything from the the left-angle bracket until the right-angle bracket
            s_list.pop(i)
        # pops the right-angle bracket, too
        s_list.pop(i)

    elif s_list[i] == '\n':
        s_list.pop(i)
    else:
        i=i+1

# convert the list back into text
join_char=''
return (join_char.join(s_list))#.replace("<br>","\n")

# Gets bullets
def getBullets(content):
    mainSoup = BeautifulSoup(contents)

# Gets empty bullets
def getAllBullets(content):
mainSoup = BeautifulSoup(str(content))
subcategories = mainSoup.findAll('div',attrs={"class" :  "CategoryTreeItem"})
empty = []
full = []
for x in subcategories:
    subSoup = BeautifulSoup(str(x))
    link = str(subSoup.findAll('a')[0])
    if (str(x)).count("CategoryTreeEmptyBullet") > 0:
        empty.append(clean(link).replace(" ","_"))
    elif (str(x)).count("CategoryTreeBullet") > 0:
        full.append(clean(link).replace(" ","_"))

return((empty,full))

def printTree(catName, count):
catName = catName.replace("\\'","'")
if count == MAX_DEPTH: return
   path='trivial'
   download(catRoot+catName, path)
content = ("Category:"+catName+".html")
filepath=open("content")
(emptyBullets,fullBullets) = getAllBullets(content)
f.close()

for x in emptyBullets:
    for i in range(count): print ("  "),
    download(catRoot+x, "categories/Category:"+x+".html")
    print (x)

for x in fullBullets:
    for i in range(count): print ("  "),
    print (x)
    if x in done:
        print ("Done... "+x)
        continue
    done.append(x)
    try: printTree(x, count + 1)
    except: print ("ERROR: " + x)

name = "Cricket"
printTree(name, 0)

Posted 18-Jul-18 0:41am

Member 13916385

Updated 11-Feb-20 19:11pm

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Jochen Arndt · Answer 1 · 2018-07-18T01:32:00

The error message is quite clear: the file it names, 'categories/Category:Cricket.html', does not exist.

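However, the posted code has indentation errors and does not match the line quoted in the error message: line 80 in the traceback opens "categories/Category:"+catName+".html", and no such line appears in the posted printTree. That makes it nearly impossible to pinpoint the cause from the posted code alone.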

In any case, you can check whether the file exists before trying to open it and act accordingly.
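A minimal sketch of such a check, assuming the pages are saved as UTF-8 text. The helper name read_lines_if_present is made up for illustration and is not part of the posted code:

```python
import os

def read_lines_if_present(path):
    """Return the lines of the file at path, or None when it is missing.

    Hypothetical helper for illustration only.
    """
    if not os.path.isfile(path):
        # Print the absolute path so it is obvious where Python looked.
        print("Missing file:", os.path.abspath(path))
        return None
    with open(path, encoding="utf-8") as handle:
        return handle.readlines()
```

The caller can then test for None and skip or re-download the category instead of crashing with FileNotFoundError.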

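Note also that relative paths are error-prone: they are resolved against the current working directory, which is not necessarily the directory that contains the script.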
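One way to avoid this is to build an absolute path from a known base directory and to create the target folder up front. The sketch below is only an illustration: category_path is a made-up helper, and replacing "Category:" with "Category_" is my assumption, made because the colon is a reserved character in Windows file names.

```python
import os

def category_path(base_dir, cat_name):
    """Return an absolute path for the saved page of a category.

    Hypothetical helper; the "Category_" prefix is an assumption that
    avoids ":" in the file name, which Windows does not allow.
    """
    folder = os.path.join(os.path.abspath(base_dir), "categories")
    # Create the folder if needed; exist_ok avoids an error when it exists.
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, "Category_" + cat_name + ".html")

# Possible usage, with the script's own directory as the base:
#   base = os.path.dirname(os.path.abspath(sys.argv[0]))
#   dest = category_path(base, "Cricket")
```

The posted code already computes os.path.dirname(sys.argv[0]) but discards the result; storing it and passing it as the base directory would make the paths independent of where the script is started from.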

Finally, you should check whether the wget tool ran successfully. If it did not, the file is not created.
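A sketch of how that check could look, assuming wget is installed and on the PATH. It uses subprocess.run and inspects the exit code instead of discarding the output as subprocess.getoutput does. The tool parameter is a hypothetical addition so that the executable name can be swapped:

```python
import os
import subprocess

def download(link, dest, tool="wget"):
    """Fetch link into dest; return True only if a non-empty file results.

    Sketch only: the tool parameter is an assumption added for illustration.
    """
    try:
        # Passing a list avoids quoting problems with spaces in link or dest.
        result = subprocess.run([tool, "-q", link, "-O", dest])
    except OSError as err:
        # Raised when the executable cannot be started, e.g. wget is missing.
        print("Could not run", tool, ":", err)
        return False
    if result.returncode != 0:
        print(tool, "failed with exit code", result.returncode)
        return False
    # A successful run should leave a non-empty file behind.
    return os.path.isfile(dest) and os.path.getsize(dest) > 0
```

printTree can then skip a category when download returns False rather than trying to open a file that was never written.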