Hi there,

I am using the following code to get a DataFrame output using pandas:

import pandas as pd
import requests
from bs4 import BeautifulSoup
import numpy as np
import datetime as dt
class work:
    def __init__(self,link):
        self.link=link
        self.res=requests.get(self.link)
        self.soup=BeautifulSoup(self.res.content, "lxml")
        self.table = self.soup.find_all('table')[0]
        self.l = pd.read_html(str(self.table))
 
         
    def create(self):
        self.ll=[]
        for i in range(0,6):
            l1=self.l[1][0:1][i]
            l1=list(l1)
            self.ll.extend(l1)
        l2=self.l[1][2:]
        self.date=list(l2[0])
        self.location=list(l2[1])
        self.lancaster=list(l2[2])
        self.spitfire=list(l2[3])
        self.hurricane=list(l2[4])
        self.dakota=list(l2[5])
         
    def month(self):
        mm=self.l[1][1][1]
         
        if mm=='May':
            x=5
        elif mm=='June':
            x=6
        elif mm=='July':
            x=7
        elif mm=='August':
            x=8
        elif mm=='September':
            x=9
        else:
            x=0
        return x
             
 
         
         
         
    def refine(self):
        self.create()
        arr=np.asarray(self.date)
        temp=arr[0]
        for i in range(0,len(arr)):
            if arr[i]=='nan':
                arr[i]=temp
         
            else:
                temp=arr[i]
        self.y=list(arr)
        return self.y
    def convert(self):
        lx=[]
        x=self.refine()
        y=self.month()
        for i in range(0,len(x)):
            lx.append((dt.datetime(2006, y, int(x[i]))).strftime('%d-%b-%Y'))
        return lx
     
    def post(self):
        date=self.convert()
        dff = pd.DataFrame(list(zip(date,self.location,self.lancaster,self.spitfire,self.hurricane,self.dakota)), 
               columns =self.ll)
        return dff
         
         
         
#a=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/may05.html')
#b=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/june05.html')
#c=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/july05.html')
#d=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/august05.html')
#e=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/september05.html')  
 
a=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/may06.html')
b=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/june06.html')
c=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/july06.html')
d=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/august06.html')
e=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/september06.html')  
 
#a=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/may07.html')
#b=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/june07.html')
#c=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/july07.html')
#d=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/august07.html')
#e=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/september07.html')  
 
#a=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/may08.html')
#b=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/june08.html')
#c=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/july08.html')
#d=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/august08.html')
#e=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/september08.html')  
 
dff1=a.post()
dff2=b.post()
dff3=c.post()
dff4=d.post()
dff5=e.post()
 
X = pd.concat([dff1, dff2], axis=0)
Y = pd.concat([X, dff3], axis=0)
Z =  pd.concat([Y, dff4], axis=0)
F =  pd.concat([Z, dff5], axis=0)
F=pd.DataFrame(F)
display = F[(F['Location'].str.contains('- Display')) & (F['Dakota'].str.contains('D')) & (F['Spitfire'].str.contains('S', na=True)) & (F['Lancaster'] != 'L')]  
 
#Months = May Jun Jul Aug Sep
#Months = -05- -06- -07- -08- -09-   #('[a-zA-Z]')) or #('- Display')) or  #('- Display|Win'))
 
#display = F[(F['Location'].str.contains('[a-zA-Z]')) & (F['Date'].str.contains('Jul')) & (F['Dakota'].str.contains('D')) & (F['Spitfire'].str.contains('S', na=True)) & (F['Lancaster'] != 'L')]  
 
pd.options.display.max_rows = 1000   
pd.options.display.max_columns = 1000
display.drop('Lancaster', axis=1, inplace=True)
display=display.dropna(subset=['Spitfire', 'Hurricane'], how='all')
#display=display[['Date','Location','Dakota','Hurricane','Spitfire']]
display=display[['Location','Date','Dakota','Hurricane','Spitfire']]
display=display.fillna('--')
display.loc[86,'Location']='Windermere - Display'   #'Windermere Air Show'
display.reset_index(drop=True, inplace=True)
display.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF-2006-Code (Dakota With Fighters).csv')
display


I now want only the displays in the output DataFrame, so when filtering the rows I use the following line of code:

display = F[(F['Location'].str.contains('- Display'))]


I also changed one row, whose Location says Windermere Air Show, so that it reads Windermere - Display, using the following line of code:

display.loc[86,'Location']='Windermere - Display'


However, when I run my code, the output correctly shows only the - Display rows, but the Windermere - Display row shows as:

Windermere - Display	NaN	NaN	NaN	NaN


Do I need to put inplace=True in the display.loc line for the data in that row to show? If so, how should the line read with that incorporated? If not, what change do I need to make?

I tried moving that .loc line to other positions in the full code, but that made no difference: I still get NaN in the other columns of that row. The index label 86 is correct, so a wrong number there isn't the issue.
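
A minimal sketch of what may be happening here, using a small made-up frame in place of F (the rows and index labels are assumptions for illustration only). If label 86 has already been filtered out of display, assigning through .loc with that label does not edit an existing row; pandas appends a new one ("setting with enlargement") and fills the columns that were not assigned with NaN. Note that .loc has no inplace argument.

```python
import pandas as pd

# Hypothetical stand-in for the combined frame F; labels 85-87 are made up.
F = pd.DataFrame(
    {
        "Location": ["East Fortune - Display", "Windermere Air Show", "Whitby Carnival - Display"],
        "Date": ["29-Jul-2006", "30-Jul-2006", "12-Aug-2006"],
        "Dakota": ["D", "D", "D"],
    },
    index=[85, 86, 87],
)

# The filter drops label 86, because its Location does not contain '- Display'.
display = F[F["Location"].str.contains("- Display")].copy()
label_present_before = 86 in display.index

# Assigning through .loc with a missing label appends a new row at the end;
# the columns that were not assigned are filled with NaN.
display.loc[86, "Location"] = "Windermere - Display"

index_after = list(display.index)
new_row_date_is_nan = pd.isna(display.loc[86, "Date"])
```

In this sketch the new row lands at the bottom of display with NaN in Date and Dakota, which matches the symptom described above.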

If I use the following line of code:

F[(F['Location'].str.contains('- Display|Win'))]


I get the correct DataFrame output, with the Windermere - Display row showing properly in the correct position. But I would like to get that output without including |Win in the filter, if possible. If someone could point me to the change(s) I need to make to achieve that, I would be very grateful.
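
As a hedged illustration of the difference between the two patterns: str.contains treats its pattern as a regular expression by default, so '- Display|Win' means "contains '- Display' or contains 'Win'". The sketch below uses a made-up Series of locations (the values and index labels are assumptions) to show which rows each pattern keeps.

```python
import pandas as pd

# Hypothetical locations; index labels are made up for illustration.
locations = pd.Series(
    ["RAF Odiham - Display", "Windermere Air Show", "Flypast over Lincoln"],
    index=[84, 86, 90],
)

# '|' is regex alternation: a row matches if either side is found.
only_display = locations.str.contains("- Display")
display_or_win = locations.str.contains("- Display|Win")

kept_without_win = list(locations[only_display].index)
kept_with_win = list(locations[display_or_win].index)
```

With '|Win' the Windermere Air Show row survives the filter, so label 86 still exists when the later .loc assignment runs. The same pattern would also keep any other location containing "Win", whether or not it is a display.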

I would also like to know why the line of code with |Win in it shows the Windermere - Display row in the proper position in the output DataFrame. When I use the one with only - Display in it, the Windermere - Display row shows at the bottom of the output DataFrame with NaN in all the other columns, and moving the .loc line a few lines up in the full code doesn't make a difference.
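
One possible way to avoid |Win, sketched under the assumption that the row only needs relabelling: change the Location in the unfiltered frame first, then filter. The frame below is a made-up stand-in for F, and matching on the old Location text (rather than on label 86) is a choice made for this sketch, not something taken from the code above.

```python
import pandas as pd

# Hypothetical stand-in for the combined frame F; labels 85-87 are made up.
F = pd.DataFrame(
    {
        "Location": ["East Fortune - Display", "Windermere Air Show", "Whitby Carnival - Display"],
        "Date": ["29-Jul-2006", "30-Jul-2006", "12-Aug-2006"],
        "Dakota": ["D", "D", "D"],
    },
    index=[85, 86, 87],
)

# Relabel on the unfiltered frame, selecting the row by its current value.
F.loc[F["Location"] == "Windermere Air Show", "Location"] = "Windermere - Display"

# The relabelled row now passes the plain '- Display' filter and keeps its data.
display = F[F["Location"].str.contains("- Display")].reset_index(drop=True)
```

Because the row exists when it is edited, its Date and aircraft columns are kept, and it stays in date order after the filter.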

The following is the DataFrame output I get when I use the |Win line of code, which is the correct output:

    Location                          Date         Dakota  Hurricane  Spitfire
0   Woodspring Wings - Display        01-Jul-2006  D       H          S
1   Duxford Flying Legends - Display  08-Jul-2006  D       H          S
2   RAF Odiham - Display              27-Jul-2006  D       --         S
3   East Fortune - Display            29-Jul-2006  D       H          S
4   Windermere - Display              30-Jul-2006  D       H          S
5   Whitby Carnival - Display         12-Aug-2006  D       --         S
6   Weymouth Carnival - Display       16-Aug-2006  D       H          S
7   Dawlish Carnival - Display        17-Aug-2006  D       H          S
8   Elvington - Display               19-Aug-2006  D       H          S
9   Elvington - Display               20-Aug-2006  D       H          S
10  Twinwoods - Display               27-Aug-2006  D       --         S
11  Bodelwyddan Castle - Display      28-Aug-2006  D       --         S



And the following is the output I get when I use the - Display line of code:

    Location                          Date         Dakota  Hurricane  Spitfire
0   Woodspring Wings - Display        01-Jul-2006  D       H          S
1   Duxford Flying Legends - Display  08-Jul-2006  D       H          S
2   RAF Odiham - Display              27-Jul-2006  D       --         S
3   East Fortune - Display            29-Jul-2006  D       H          S
4   Whitby Carnival - Display         12-Aug-2006  D       --         S
5   Weymouth Carnival - Display       16-Aug-2006  D       H          S
6   Dawlish Carnival - Display        17-Aug-2006  D       H          S
7   Elvington - Display               19-Aug-2006  D       H          S
8   Elvington - Display               20-Aug-2006  D       H          S
9   Twinwoods - Display               27-Aug-2006  D       --         S
10  Bodelwyddan Castle - Display      28-Aug-2006  D       --         S
11  Windermere - Display              NaN          NaN     NaN        NaN



Could a moderator edit my DataFrame outputs, if that is okay? I tidied them up, but they are still not displaying correctly.

Any help would be much appreciated.

Regards

Eddie Winch

What I have tried:

As described in the Describe the Problem section.
Updated 29-Aug-20 12:18pm

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)