Click here to Skip to main content
15,891,136 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I need to average the values associated with a date string from a csv file , the format of the csv file is as follows

20/04/2009  -1.271

20/04/2009  0

20/04/2009  0.247

20/04/2009  1.337

20/04/2009  -0.438

21/04/2009  -0.134

21/04/2009  -0.096

21/04/2009  -0.418

21/04/2009  0.362

The value being a float.

The expected result would be a new csv containing

20/04/2009 , 0.025

21/04/2009 , 0.071


The language is java , and I am using the openCsv api.

I would like help in understanding how I would go down this large CSV file , and group the dates in a temporary collections so as to average them, and assign that average with the correct date.

This would be a one time operation.

Java
public class SentimentScore {

private DateTime time;
private double score;


public SentimentScore(DateTime t, double s) {
    time=t;
    score=s;

}

public DateTime getTime() {
    return time;
}

public double getScore() {
    return score;
}


the rows are stored in the above class

Java
public class Processor {

    private String pathToFile = "C:\\Data\\scores.csv";
    private List<SentimentScore> scores;
    CSVReader reader;
    DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy");

    public Processor() {

    }

    public void loadScores() throws FileNotFoundException, IOException, ParseException {

        reader = new CSVReader(new FileReader(pathToFile));

        String[] record = null;

        while ((record = reader.readNext()) != null) {
            String dateString = record[0];
            Date date = formatter.parse(dateString);
            DateTime correctDate = new DateTime(date);
            double value = Double.parseDouble(record[1]);
            SentimentScore score = new SentimentScore(correctDate, value);
            scores.add(score);

        }
    }

    public void calculateAverages()
    {
        List<SentimentScore> newScores = new ArrayList<>();
        Iterator scoreIterator = scores.iterator();
        SentimentScore temp  = (SentimentScore) scoreIterator.next();
        DateTime currentDate = temp.getTime();
        while(scoreIterator.hasNext())
        {
            SentimentScore currentRow  = (SentimentScore) scoreIterator.next();
            if(currentRow.getTime().equals(currentDate))
            {
                List<Double> values = new ArrayList<>();
                values.add(currentRow.getScore());
                double sum = 0;
                for (Double value : values)
                {
                    sum = sum + value;
                }
                double avg = sum/values.size();
                SentimentScore averagedScore = new SentimentScore(currentRow.getTime(),avg);
                newScores.add(averagedScore);

                // then parse to new csv 

            }
        }
    }
}


What I have tried:

I have tried the above code , but i think i am lost and would appreciate some help.
Posted
Updated 22-Apr-16 21:31pm
v2

1 solution

Create a class that will hold the Date and a List<T> of time values. Then create a List<T> which will contain objects of that class. Each time you read a line you check to see if there is an object in the main list containing that date entry. If there is not, then create new object, and add it to the list with its value. If the entry is in the list then append the value to the object's value list. Then when you have read all the CSV entries, you just need to traverse your main List: for each object calculate the average of all the times and write a new CSV record.

[edit]
Alternatively, keep a running total and item count in each object.
[/edit]
 
Share this answer
 
v2

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900