I'm building a scraper/crawler for Linux directories. In essence, the program takes user input for a start point (e.g. /home/user/Pictures/) and an end point (e.g. /home/user/Pictures/), as well as a file type to scrape for, which is where my question comes in.
I'm storing the acceptable file extensions in a dictionary of lists, like so:
Python
file_types = {'audio': ['mp3', 'mpa'], 'images': ['png', 'jpg']}


If I store the user's input in the variable scrape_for, how can I validate that the string in scrape_for exists in the dictionary file_types?

What I have tried:

This is my current block of code, which does the following:
1. Take user input for the start point
2. Verify that the start point is a valid directory
3. Take user input for the end point
4. Validate that the end point is both a valid directory and a subdirectory of the start point
5. Print the file extension options for the user to choose from
Python
import os
import sys

ftypes = {'audio': ['mp3', 'mpa', 'wpi', 'wav'], 'images': ['png', 'jpg', 'jpeg', 'gif', 'bmp'], 'text': ['txt', 'doc', 'pdf'], 'video': ['mp4', 'avi', '3g2', '3gp', 'mkv', 'm4v', 'mov', 'mpg', 'wmv', 'flv'], 'executable': ['apk', 'bat', 'bin', 'exe', 'py', 'wsf', 'com', 'cgi', 'pl']}

def UserInput():
    # User inputs start point
    Spoint = input('Where to start: \n')
    # Check validity of input
    if not os.path.isdir(Spoint):
        print('Not a valid directory')
        sys.exit(1)
    print('Scraping will begin at: ' + Spoint)
    # User input for end point
    Epoint = input('\n\nWhat directory would you like to stop scraping at? \n')
    # Check that the end point is a valid directory located at or below the start point
    # (comparing path lengths alone does not prove one path is inside the other)
    start = os.path.abspath(Spoint)
    end = os.path.abspath(Epoint)
    if not (os.path.isdir(Epoint) and os.path.commonpath([start, end]) == start):
        print('Error w/ End Point directory, make sure directory is formatted correctly, and is a sub directory of your Starting Point')
        sys.exit(1)
    print('\n\nScraping will end at: ' + Epoint)

    # User input for file type
    for k, v in ftypes.items():
        print(k, v)
    ScrapeType = input("Please enter the extension you'd like scraped: \n")
Updated 13-Jul-17 15:44pm

1 solution

Suppose you are looking for the file type 'png':
True in ['png' in d for d in ftypes.values()]
will return True.

For a non-existent file type:
True in ['mdi' in d for d in ftypes.values()]
will return False.

To check for a key:
'audio' in ftypes.keys()
will return True.

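To check both (with the user's input stored in scrape_for, since a variable named input would shadow the built-in function):
True in [scrape_for in d for d in ftypes.values()] or scrape_for in ftypes.keys()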
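
If you also want to know which category an extension belongs to, the same membership test can be wrapped in a small helper. This is only a sketch: the function name find_category is made up for illustration, and stripping a leading dot and lower-casing the input is an assumption about how users might type an extension, not something the question requires.

```python
def find_category(scrape_for, file_types):
    """Return the category whose list contains the extension, or None."""
    # Assumption: accept input such as ".PNG " and treat it as "png".
    ext = scrape_for.strip().lstrip('.').lower()
    for category, extensions in file_types.items():
        if ext in extensions:
            return category
    return None


file_types = {'audio': ['mp3', 'mpa'], 'images': ['png', 'jpg']}

print(find_category('png', file_types))   # images
print(find_category('.MP3', file_types))  # audio
print(find_category('mdi', file_types))   # None
```

A result of None means the extension is not in any of the lists, so the caller can ask the user again. If only a yes/no answer is needed, any(scrape_for in d for d in ftypes.values()) does the same job as the list comprehension above without building the intermediate list.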
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


