Hello,

I have the following assignment:

Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon.
From [DELETED]@uct.ac.za Sat Jan 5 09:14:16 2008
Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.
The data can be found at this link:
https://www.py4e.com/code3/mbox-short.txt

04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1


What I have tried:

Python
1 name = input("Enter file:")
2 if len(name) < 1:
3     name = "mbox-short.txt"
4 handle = open(name)
5 counts = dict();

6 for line in handle:
7     if line.startswith('From:'):
8         pass
9     elif line.startswith('From'):
10         x = line.split();
11         time = x[5];
12         t = time.split(':');
13         hour = t[0];
14         for line in hour:
15             hoursorted = sorted(hour);
16             counts[line] = counts.get(line,0) + 1;
17            
18             print(hoursorted, counts[line]);



The output I'm getting is:

['0', '9'] 1
['0', '9'] 1
['1', '8'] 1
['1', '8'] 1
['1', '6'] 2
['1', '6'] 1
['1', '5'] 3
['1', '5'] 1
['1', '5'] 4
['1', '5'] 2
['1', '4'] 5
['1', '4'] 1
['1', '1'] 6
['1', '1'] 7
['1', '1'] 8
['1', '1'] 9
['1', '1'] 10
['1', '1'] 11
['1', '1'] 12
['1', '1'] 13
['1', '1'] 14
['1', '1'] 15
['1', '1'] 16
['1', '1'] 17
['0', '1'] 18
['0', '1'] 2
['0', '1'] 19
['0', '1'] 3
['0', '1'] 20
['0', '1'] 4
['0', '9'] 5
['0', '9'] 2
['0', '7'] 6
['0', '7'] 1
['0', '6'] 7
['0', '6'] 2
['0', '4'] 8
['0', '4'] 2
['0', '4'] 9
['0', '4'] 3
['0', '4'] 10
['0', '4'] 4
['1', '9'] 21
['1', '9'] 3
['1', '7'] 22
['1', '7'] 2
['1', '7'] 23
['1', '7'] 3
['1', '6'] 24
['1', '6'] 3
['1', '6'] 25
['1', '6'] 4
['1', '6'] 26
['1', '6'] 5




As you can see, the two digits of each hour are being separated. If I only print(hour), I get a column of unsorted numbers, but they are not separated by commas or surrounded by brackets. I'm trying to sort them as a column and put the total number of times each hour appears (from "counts") on the right, as in the expected output above.

I think my problem is with lines 14 and 15; it's clear that this is not the right way to sort a column. I searched the web and found that it is possible to do it with sort_values() using pandas, but the environment I'm using doesn't allow me to install pandas.

Could someone please clarify how I could sort this list without splitting each hour into its two digits and without the brackets?

Thank you.
Updated 28-Jan-23 5:41am

1 solution

Python
14         for line in hour:
15             hoursorted = sorted(hour);
16             counts[line] = counts.get(line,0) + 1;
17            
18             print(hoursorted, counts[line]);

Why are you splitting the hour field into its two digits? All you need at this point is the code to build the dictionary thus:
Python
counts[hour] = counts.get(hour,0) + 1;

And then, once all lines have been processed:
Python
print(sorted(counts.items()))
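
For reference, here is a minimal sketch of how the two pieces above could fit together. It is only an illustration: the sample lines and addresses are made up so it runs without the file, and the function name `count_hours` is hypothetical. It also tests for 'From ' with a trailing space, which is an assumption about how to skip the 'From:' header lines in one check.

```python
# Made-up sample lines standing in for mbox-short.txt (assumption, not real data).
sample_lines = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "From: alice@example.com",
    "From bob@example.com Fri Jan  4 18:10:48 2008",
    "From carol@example.com Fri Jan  4 09:02:11 2008",
]

def count_hours(lines):
    """Return a dict mapping two-digit hour strings to message counts."""
    counts = {}
    for line in lines:
        # 'From ' (with the trailing space) does not match 'From:' header lines.
        if not line.startswith("From "):
            continue
        words = line.split()
        hour = words[5].split(":")[0]  # '09:14:16' -> '09'
        # Count the whole hour string once per message.
        counts[hour] = counts.get(hour, 0) + 1
    return counts

counts = count_hours(sample_lines)
print(sorted(counts.items()))  # [('09', 2), ('18', 1)]
```

Because the hours are zero-padded two-digit strings, sorting them as text gives the same order as sorting them as numbers.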
 
Comments
Guilherme Romero 28-Jan-23 11:19am    
Hello,
Your solution worked partially.
The output I'm getting now is:

[('04', 6), ('06', 2), ('07', 2), ('09', 4), ('10', 6), ('11', 12), ('14', 2), ('15', 4), ('16', 8), ('17', 4), ('18', 2), ('19', 2)]

As you can see, the hours (the first value in each pair of parentheses) are sorted correctly, but each count is double the value it should be. If you compare it to the expected output, you'll see that the count for 04 is 6 where it should be 3, the count for 06 is 2 where it should be 1, and so on for all the other items.
Another issue is that the output is now a horizontal list, where it should be a vertical column, as shown in the expected output.
Could you help me with that?

Thank you.
Richard MacCutchan 28-Jan-23 11:32am    
The code I suggested works correctly; I tested it a number of times. So if you are getting incorrect counts, then either the code you are using is wrong or the input values are not what you think. I cannot see either, so that is purely a guess. Please use the Improve question link above, and add details of the source data that you are reading and the updated code that processes it.
Guilherme Romero 28-Jan-23 11:44am    
I've inserted the link to the data on the statement above.

The code I'm now using is this:

name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
handle = open(name)
counts = dict();

for line in handle:
    if line.startswith('From:'):
        pass
    elif line.startswith('From'):
        x = line.split();
        time = x[5];
        t = time.split(':');
        hour = t[0];
        for line in hour:
            counts[hour] = counts.get(hour,0) + 1;

print(sorted(counts.items()));
Richard MacCutchan 28-Jan-23 11:56am    
That code is incorrect; you are adding the count twice for each hour. The correct code is as follows:
elif line.startswith('From'):
    x = line.split()
    time = x[5]
    t = time.split(':')
    hour = t[0]
    counts[hour] = counts.get(hour,0) + 1 # call this once only for each hour

BTW Python does not require semi-colons at the end of statements.
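
To see why the counts doubled, it may help to run the loop in isolation. The small sketch below assumes a single hour value of "09"; the variable names `ch` and `fixed` are made up for the illustration. Iterating over a string visits one character at a time, so a loop over a two-character hour executes its body twice.

```python
hour = "09"

counts = {}
# A for loop over a string yields each character ('0', then '9'),
# so the increment below runs twice for one message.
for ch in hour:
    counts[hour] = counts.get(hour, 0) + 1
print(counts)  # {'09': 2}

fixed = {}
# Without the loop, the increment runs once per message.
fixed[hour] = fixed.get(hour, 0) + 1
print(fixed)  # {'09': 1}
```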
Guilherme Romero 28-Jan-23 12:12pm    
Thanks very much! Now "counts" is correct:
[('04', 3), ('06', 1), ('07', 1), ('09', 2), ('10', 3), ('11', 6), ('14', 1), ('15', 2), ('16', 4), ('17', 2), ('18', 1), ('19', 1)]

But still I'm getting a horizontal list instead of a vertical column. Do you know how to do it?

Thanks.
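
One possible way to get the vertical column, offered as a sketch and not as the answerer's method: loop over the sorted items and print each pair on its own line. The dictionary below is a hand-typed subset of the counts reported in this thread, and the helper name `format_counts` is hypothetical.

```python
# A few of the counts reported above, typed in by hand for the example.
counts = {"09": 2, "04": 3, "18": 1, "11": 6}

def format_counts(counts):
    """Return one 'hour count' string per hour, in sorted hour order."""
    return [f"{hour} {n}" for hour, n in sorted(counts.items())]

# print() ends each call with a newline, so every pair lands on its own row.
for row in format_counts(counts):
    print(row)
# 04 3
# 09 2
# 11 6
# 18 1
```

Printing the whole list in one call shows Python's list representation, which is where the brackets, quotes and commas come from; printing each pair separately avoids them.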

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


