I am trying to parse a rather long string of comma-separated (CSV) values into a vector of doubles. Here is my string:

290, -44.71807341742762, -253.3983378443242, -2830.845102033997, -50.72846875682069, -244.9233159558827, -2828.886659819535, -27.65164366754831, -240.0375809241862, -2835.069523673292, -34.54927094555862, -231.321811649731, -2833.273014042345, 3.106092157848277, -253.4185721035693, -2847.925154903055, -16.84210157722069, -252.3643090610071, -2841.277599676779, 3.481993511648276, -243.7908142090034, -2850.058844625602, -16.96020230259998, -241.8251333302621, -2843.600971511372, -28.13201674096208, -323.3500615612998, -2936.785180664107, -20.53558800796208, -332.2289976974726, -2937.324499932703, -0.5792540616137933, -317.7772185227306, -2937.956001650109, -8.703039290972415, -308.7392134041689, -2937.568015894477, -3.218404154941379, -186.512334994593, -2852.615604795321, 5.825552843372414, -204.3828258646344, -2854.098284280801, -1.621143635389655, -205.3596355569897, -2851.129407159296, 4.443958877693104, -185.8652579997931, -2855.853218446998, 56.18049879403102, -126.1814547834552, -2928.184958411911, 46.25772599846893, -144.2196242760172, -2922.281888133142, 55.87793836919312, -145.6430655117966, -2923.098093177643, 46.14409203363797, -124.3305484377346, -2927.655738146642, -19.54783558019311, -116.778392739131, -2875.534811085721, -10.56613185487242, -116.7234914056516, -2879.302022999776, -11.4776563743, -94.03176348455519, -2880.971196457529, -21.24134900326896, -94.2727400417689, -2876.753942450211, ... and so on.

Yes, it's long.

Here is the code that I tried, which does not work:
C++
// faster than stringstream, at least for reading first element
std::string ReadAndRemoveFirstTokenFromString (const char &separator, std::string& line)
{
    auto found=line.find(separator);
    if (found==std::string::npos)
    {
        string hold=line;
        line.clear();
        return hold;
    }
    else
    {
        std::string out=line.substr(0,found);
        line=line.substr(found+1,line.size());
        while (line[0]==' ') line=line.substr(1,line.size());
        if (out=="") out="-999999.0";
        return out;
    }
}


std::vector<double> StringToVectorOfDoubles(std::string dataline, char separator)
{
    std::vector <double> number;
    while(dataline.size()>0)
    {
        string num=ReadAndRemoveFirstTokenFromString(separator, dataline);
        if ((num=="\0") || (num.empty())) number.emplace_back(MISSING);
        else number.emplace_back(stod(num));
    }
    return number;
}


When I run through this function I get a vector that contains:
2, 9, 0, ,, , -, 4, 4, ., 7, 1, 8, 0, 7, 3, 4, 1, 7, 4, 2, 7, 6, 2, ,, , -, 2, 5, 3, ., 3, 9, 8, 3, 3, 7, 8, 4, 4, 3, 2, 4, 2, ,, , -, 2, 8, 3, 0, ., 8, 4, 5, 1, 0, 2, 0, 3, 3, 9, 9, 7, ,, , -, 5, 0, ., 7, 2, 8, 4, 6, 8, 7, 5, 6, 8, 2, 0, 6, 9, ,, , -, 2, 4, 4, ., 9, 2, 3, 3, 1, 5, 9, 5, 5, 8, 8, 2, 7, ,, , -, 2, 8, 2, 8, ., 8, 8, 6, 6, 5, 9, 8, 1, 9, 5, 3, 5, ,, , -, 2, 7, ., 6, 5, 1, 6, 4, 3, 6, 6, 7, 5, 4, 8, 3, 1, ,, , -, 2, 4, 0, ., 0, 3, 7, 5, 8, 0, 9, 2, 4, 1, 8, 6, 2, ,, , -, 2, 8, 3, 5, ., 0, 6, 9, 5, 2, 3, 6, 7, 3, 2, 9, 2, ,, , -, 3, 4, ., 5, 4, 9, 2, 7, 0, 9, 4, 5, 5, 5, 8, 6, 2, ,, , -, 2, 3, 1, ., 3, 2, 1, 8, 1, 1, 6, 4, 9, 7, 3, 1, ,, , -, 2, 8, 3, 3, ., 2, 7, 3, 0, 1, 4, 0, 4, 2, 3, 4, 5, ,, , 3, ., 1, 0, 6, 0, 9, 2, 1, 5, 7, 8, 4, 8, 2, 7, 7, ,, , -, 2, 5, 3, ., 4, 1, 8, 5, 7, 2, 1, 0, 3, 5, 6, 9, 3, ,, , -, 2, 8, 4, 7, ., 9, 2, 5, 1, 5, 4, 9, 0, 3, 0, 5, 5, ,, , -, 1, 6, ., 8, 4, 2, 1, 0, 1, . . . there's more

What I have tried:

Note that MISSING is a constant equal to -999999 that is declared elsewhere.

Somehow I have split the string so that each character becomes an element of the vector. I don't know how this happened. Even worse, this is supposed to be a vector of doubles, so why do I get elements like '.' and '-'? Those are chars, not doubles.

However, when I rewrite the function as:

C++
void StringToVectorOfDoubles(std::string dataline, char separator, std::vector<double> number)
{
    while (dataline.size() > 0)
    {
        string num = ReadAndRemoveFirstTokenFromString(separator, dataline);
        if ((num == "\0") || (num.empty())) number.emplace_back(MISSING);
        else number.emplace_back(stod(num));
    }

}


The vector number comes out correct:
290, -44.7181, -253.398, -2830.85, -50.7285, -244.923, -2828.89, -27.6516, -240.038, -2835.07, -34.5493, -231.322, -2833.27, 3.10609, -253.419, -2847.93, -16.8421, and so on

So I have a solution to my problem, so to speak, but I don't know why the first version fails while the second seems to work.

Now that I am sending this, I can see that I am checking for missing values twice, which is not very efficient. However, I do not believe that is the problem.

Somewhere I believe that I have entered the realm of "undefined behavior," but I sure cannot figure out where.

Any help will be appreciated.
Posted
Updated 12-Aug-21 14:55pm

1 solution

The reason it works is that you were passing the separator by reference the first time and by value the second. It works when it is passed by value.

My rule is this: if an item is not changed and is plain old data (POD), I always pass it by value. If it will be modified or is more complex, like a string or vector, I pass it by reference. If it is not POD and is not going to be changed, I pass it as a const reference. These are just my habits, and they work well for me.
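
As a rough illustration of those parameter-passing habits, here is a minimal sketch. The function names AppendDoubles and AppendDoublesByValue are hypothetical and do not come from the question. The sketch assumes that blank tokens are simply skipped, which differs from the MISSING handling in the question. The char separator is passed by value, the unmodified string by const reference, and the vector that gets filled by non-const reference.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration only.
// line:      not modified, not POD  -> const reference
// separator: not modified, plain char -> by value
// out:       modified by the function -> non-const reference
void AppendDoubles(const std::string& line, char separator, std::vector<double>& out)
{
    std::size_t start = 0;
    while (start <= line.size())
    {
        std::size_t end = line.find(separator, start);
        if (end == std::string::npos) end = line.size();

        const std::string token = line.substr(start, end - start);

        // Assumption: tokens that are empty or only whitespace are skipped.
        if (token.find_first_not_of(" \t") != std::string::npos)
            out.push_back(std::stod(token)); // std::stod skips leading whitespace

        start = end + 1;
    }
}

// Same work, but the vector is taken by value: the function fills a local
// copy, so the caller's vector is left unchanged.
void AppendDoublesByValue(const std::string& line, char separator, std::vector<double> out)
{
    AppendDoubles(line, separator, out);
}
```

Calling AppendDoubles("290, -44.5, 3.25", ',', v) on an empty vector v leaves three elements in v, whereas the by-value variant leaves the caller's vector empty because only the copy is filled.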
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


