I am trying to parse a rather long string of comma-separated values into a vector of doubles. Here is my string:
290, -44.71807341742762, -253.3983378443242, -2830.845102033997, -50.72846875682069, -244.9233159558827, -2828.886659819535, -27.65164366754831, -240.0375809241862, -2835.069523673292, -34.54927094555862, -231.321811649731, -2833.273014042345, 3.106092157848277, -253.4185721035693, -2847.925154903055, -16.84210157722069, -252.3643090610071, -2841.277599676779, 3.481993511648276, -243.7908142090034, -2850.058844625602, -16.96020230259998, -241.8251333302621, -2843.600971511372, -28.13201674096208, -323.3500615612998, -2936.785180664107, -20.53558800796208, -332.2289976974726, -2937.324499932703, -0.5792540616137933, -317.7772185227306, -2937.956001650109, -8.703039290972415, -308.7392134041689, -2937.568015894477, -3.218404154941379, -186.512334994593, -2852.615604795321, 5.825552843372414, -204.3828258646344, -2854.098284280801, -1.621143635389655, -205.3596355569897, -2851.129407159296, 4.443958877693104, -185.8652579997931, -2855.853218446998, 56.18049879403102, -126.1814547834552, -2928.184958411911, 46.25772599846893, -144.2196242760172, -2922.281888133142, 55.87793836919312, -145.6430655117966, -2923.098093177643, 46.14409203363797, -124.3305484377346, -2927.655738146642, -19.54783558019311, -116.778392739131, -2875.534811085721, -10.56613185487242, -116.7234914056516, -2879.302022999776, -11.4776563743, -94.03176348455519, -2880.971196457529, -21.24134900326896, -94.2727400417689, -2876.753942450211, ... and so on.
Yes, it's long.
Here are the functions that I tried, which do not work:
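std::string ReadAndRemoveFirstTokenFromString(const char& separator, std::string& line)
{
    // Look for the first separator in the remaining text.
    auto found = line.find(separator);
    if (found == std::string::npos)
    {
        // No separator left: the whole line is the last token.
        std::string hold = line;
        line.clear();
        return hold;
    }
    else
    {
        std::string out = line.substr(0, found);
        // Drop the token and its separator, then any leading spaces.
        line = line.substr(found + 1);
        while (!line.empty() && line[0] == ' ') line = line.substr(1);
        // An empty token is reported as the missing-value marker.
        if (out == "") out = "-999999.0";
        return out;
    }
}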
std::vector<double> StringToVectorOfDoubles(std::string dataline, char separator)
{
    std::vector<double> number;
    // Peel tokens off the front of dataline until nothing is left.
    while (dataline.size() > 0)
    {
        std::string num = ReadAndRemoveFirstTokenFromString(separator, dataline);
        // "\0" is read as an empty C string, so both tests check for an empty token.
        if ((num == "\0") || (num.empty())) number.emplace_back(MISSING);
        else number.emplace_back(std::stod(num));
    }
    return number;
}
When I run this function, I get a vector that contains:
2, 9, 0, ,, , -, 4, 4, ., 7, 1, 8, 0, 7, 3, 4, 1, 7, 4, 2, 7, 6, 2, ,, , -, 2, 5, 3, ., 3, 9, 8, 3, 3, 7, 8, 4, 4, 3, 2, 4, 2, ,, , -, 2, 8, 3, 0, ., 8, 4, 5, 1, 0, 2, 0, 3, 3, 9, 9, 7, ,, , -, 5, 0, ., 7, 2, 8, 4, 6, 8, 7, 5, 6, 8, 2, 0, 6, 9, ,, , -, 2, 4, 4, ., 9, 2, 3, 3, 1, 5, 9, 5, 5, 8, 8, 2, 7, ,, , -, 2, 8, 2, 8, ., 8, 8, 6, 6, 5, 9, 8, 1, 9, 5, 3, 5, ,, , -, 2, 7, ., 6, 5, 1, 6, 4, 3, 6, 6, 7, 5, 4, 8, 3, 1, ,, , -, 2, 4, 0, ., 0, 3, 7, 5, 8, 0, 9, 2, 4, 1, 8, 6, 2, ,, , -, 2, 8, 3, 5, ., 0, 6, 9, 5, 2, 3, 6, 7, 3, 2, 9, 2, ,, , -, 3, 4, ., 5, 4, 9, 2, 7, 0, 9, 4, 5, 5, 5, 8, 6, 2, ,, , -, 2, 3, 1, ., 3, 2, 1, 8, 1, 1, 6, 4, 9, 7, 3, 1, ,, , -, 2, 8, 3, 3, ., 2, 7, 3, 0, 1, 4, 0, 4, 2, 3, 4, 5, ,, , 3, ., 1, 0, 6, 0, 9, 2, 1, 5, 7, 8, 4, 8, 2, 7, 7, ,, , -, 2, 5, 3, ., 4, 1, 8, 5, 7, 2, 1, 0, 3, 5, 6, 9, 3, ,, , -, 2, 8, 4, 7, ., 9, 2, 5, 1, 5, 4, 9, 0, 3, 0, 5, 5, ,, , -, 1, 6, ., 8, 4, 2, 1, 0, 1, . . . there's more
What I have tried:
Note that MISSING is a constant equal to -999999 that is declared elsewhere.
Somehow each character of the string has become a separate element of the vector, and I don't know how this happened. Even worse, this is supposed to be a vector of doubles, so why do I get elements like '.' and '-'? Those are chars, not doubles.
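A std::vector<double> cannot store a '.' or a '-', so a listing like the one above may come from the code that prints or inspects the result rather than from the parser. That calling code is not shown, so this is only a guess. As a minimal sketch, the hypothetical helper JoinElements below joins the elements of whatever container it is given. Fed a string instead of the vector, it produces one entry per character.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original code): joins the elements
// of any iterable container with ", " between them.
template <typename Container>
std::string JoinElements(const Container& items)
{
    std::ostringstream out;
    bool first = true;
    for (const auto& item : items)
    {
        if (!first) out << ", ";
        out << item;   // a char prints as one character, a double as a number
        first = false;
    }
    return out.str();
}
```

With this helper, JoinElements(std::string("29,-4")) walks the characters of the string, while JoinElements(std::vector<double>{290.0, -44.5}) walks the parsed numbers. Comparing the two outputs against the listing above may show which object is actually being printed.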
However, when I rewrite the function as:
void StringToVectorOfDoubles(std::string dataline, char separator, std::vector<double> number)
{
    // Peel tokens off the front of dataline until nothing is left.
    while (dataline.size() > 0)
    {
        std::string num = ReadAndRemoveFirstTokenFromString(separator, dataline);
        if ((num == "\0") || (num.empty())) number.emplace_back(MISSING);
        else number.emplace_back(std::stod(num));
    }
}
The vector number comes out correct:
290, -44.7181, -253.398, -2830.85, -50.7285, -244.923, -2828.89, -27.6516, -240.038, -2835.07, -34.5493, -231.322, -2833.27, 3.10609, -253.419, -2847.93, -16.8421, and so on
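One detail about the second signature: as posted, number is passed by value, so the function fills a local copy. The caller's vector would stay unchanged unless the real code declares the parameter as std::vector<double>&. A minimal sketch of the difference, using two hypothetical function names:

```cpp
#include <vector>

// Takes the vector by value: only the function's local copy grows.
void AppendByValue(std::vector<double> numbers, double value)
{
    numbers.push_back(value);
}

// Takes the vector by reference: the caller's vector grows.
void AppendByReference(std::vector<double>& numbers, double value)
{
    numbers.push_back(value);
}
```

After calling AppendByValue(v, 1.0) on an empty vector v, v is still empty. After AppendByReference(v, 1.0), v holds one element.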
So I have a solution to my problem, so to speak, but I don't know why the first version fails while the second seems to work.
Now that I am sending this, I can see that I am checking for missing values twice, which is not very efficient. However, I do not believe that is the problem.
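On the double check for missing values: a minimal sketch of a parser that tests for an empty token in a single place, using std::getline on a std::istringstream instead of repeated substr calls. The names ParseDoubles and kMissing are hypothetical, and kMissing only stands in for the MISSING constant declared elsewhere. This is an alternative technique, not a corrected version of the functions above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the MISSING constant declared elsewhere.
constexpr double kMissing = -999999.0;

// Split dataline on separator and convert each token to a double.
// Empty or all-blank tokens become kMissing.
std::vector<double> ParseDoubles(const std::string& dataline, char separator)
{
    std::vector<double> numbers;
    std::istringstream stream(dataline);
    std::string token;
    while (std::getline(stream, token, separator))
    {
        // Find the first non-blank character; npos means nothing to parse.
        const auto first = token.find_first_not_of(" \t");
        if (first == std::string::npos)
            numbers.push_back(kMissing);
        else
            numbers.push_back(std::stod(token.substr(first)));
    }
    return numbers;
}
```

For example, ParseDoubles("290, -44.5, , 3.25", ',') yields four elements, with kMissing in the third position. Note that std::stod throws std::invalid_argument for a token that does not start with a number, so real input may need a try/catch around the conversion.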
I believe that somewhere I have entered the realm of "undefined behavior," but I cannot figure out where.
Any help will be appreciated.