Hello,
I have the following assignment:
Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon.
From [DELETED]@uct.ac.za Sat Jan 5 09:14:16 2008
Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.
The data can be found at this link:
https://www.py4e.com/code3/mbox-short.txt
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
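To make the two-step split from the assignment concrete, here is a minimal sketch applied to the sample 'From ' line above. The variable names (words, time_field, hour) are only illustrative and are not part of the assignment.

```python
# Sample 'From ' line from the assignment (address redacted as in the post).
line = "From [DELETED]@uct.ac.za Sat Jan 5 09:14:16 2008"

words = line.split()               # first split: on whitespace
time_field = words[5]              # the time is the sixth word: "09:14:16"
hour = time_field.split(":")[0]    # second split: on the colon, keep the hour

print(hour)
```

The hour stays a two-character string, so it can be used whole as a dictionary key.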
What I have tried:
 1  name = input("Enter file:")
 2  if len(name) < 1:
 3      name = "mbox-short.txt"
 4  handle = open(name)
 5  counts = dict();
 6  for line in handle:
 7      if line.startswith('From:'):
 8          pass
 9      elif line.startswith('From'):
10          x = line.split();
11          time = x[5];
12          t = time.split(':');
13          hour = t[0];
14          for line in hour:
15              hoursorted = sorted(hour);
16              counts[line] = counts.get(line,0) + 1;
17
18              print(hoursorted, counts[line]);
The output I'm getting is:
['0', '9'] 1
['0', '9'] 1
['1', '8'] 1
['1', '8'] 1
['1', '6'] 2
['1', '6'] 1
['1', '5'] 3
['1', '5'] 1
['1', '5'] 4
['1', '5'] 2
['1', '4'] 5
['1', '4'] 1
['1', '1'] 6
['1', '1'] 7
['1', '1'] 8
['1', '1'] 9
['1', '1'] 10
['1', '1'] 11
['1', '1'] 12
['1', '1'] 13
['1', '1'] 14
['1', '1'] 15
['1', '1'] 16
['1', '1'] 17
['0', '1'] 18
['0', '1'] 2
['0', '1'] 19
['0', '1'] 3
['0', '1'] 20
['0', '1'] 4
['0', '9'] 5
['0', '9'] 2
['0', '7'] 6
['0', '7'] 1
['0', '6'] 7
['0', '6'] 2
['0', '4'] 8
['0', '4'] 2
['0', '4'] 9
['0', '4'] 3
['0', '4'] 10
['0', '4'] 4
['1', '9'] 21
['1', '9'] 3
['1', '7'] 22
['1', '7'] 2
['1', '7'] 23
['1', '7'] 3
['1', '6'] 24
['1', '6'] 3
['1', '6'] 25
['1', '6'] 4
['1', '6'] 26
['1', '6'] 5
As you can see, the two digits of each hour are being separated. If I only print(hour), I get a column of unsorted hours, but they are not separated by commas or surrounded by brackets. I'm trying to sort the hours in a column and put the number of times each hour appears, taken from "counts", on the right, as in the expected output above.
I think my problem is in lines 14 and 15; this is clearly not the right way to sort a column. I searched the web and found that pandas can do it with sort_values(), but the environment I'm using doesn't let me install pandas.
Could someone please explain how I could sort these hours without splitting each one into its two digits, and print them without brackets?
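For reference, this kind of sorting seems possible without pandas by using the built-in sorted() on the dictionary's items. The following is only a minimal sketch, assuming the whole two-character hour string is used as the key; the hours list is made-up sample data, not the contents of mbox-short.txt.

```python
# Hypothetical sample of hour strings, as they would come out of the split.
hours = ["09", "18", "16", "09", "04"]

# Count each whole hour string (not its individual characters).
counts = {}
for hour in hours:
    counts[hour] = counts.get(hour, 0) + 1

# sorted() on the (hour, count) pairs orders them by hour.
lines = []
for hour, count in sorted(counts.items()):
    lines.append(f"{hour} {count}")

print("\n".join(lines))
```

Because the hours are zero-padded strings of equal length, sorting them as strings gives the same order as sorting them as numbers.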
Thank you.