Let’s build something Outrageous – Part 25: Dates in C++ and Faster Imports 

Back in February, we added the ability to load a CSV file and alter the contents while importing it. We also added Date support to RageDB using a Lua library. This was a masterful job of copy and paste and got us lots of functionality very quickly. When we timed the import for LDBC SNB SF10, it came in at 28 minutes, which wasn’t bad, but wasn’t great. Let’s try to speed that up today.

I ran a little test importing just the People, and I noticed there was a huge difference between these two imports:

for i, person in ftcsv.parseLine("/home/max/CLionProjects/ldbc/sn-sf10/person_0_0.csv", "|") do
   NodeAdd("Person", person.id, "{\"firstName\":".."\""..person.firstName.."\","..
      "\"lastName\":".."\""..person.lastName.."\","..
      "\"gender\":".."\""..person.gender.."\","..
      "\"birthday\":".."\""..person.birthday.."\","..
      "\"creationDate\":"..date(person.creationDate):todouble()..","..
      "\"locationIP\":".."\""..person.locationIP.."\","..
      "\"browserUsed\":".."\""..person.browserUsed.."\"}")
end

The second is the exact same import statement, but instead of parsing the creationDate to a double, it just sets it to the same value every time.

for i, person in ftcsv.parseLine("/home/max/CLionProjects/ldbc/sn-sf10/person_0_0.csv", "|") do
   NodeAdd("Person", person.id, "{\"firstName\":".."\""..person.firstName.."\","..
      "\"lastName\":".."\""..person.lastName.."\","..
      "\"gender\":".."\""..person.gender.."\","..
      "\"birthday\":".."\""..person.birthday.."\","..
      "\"creationDate\":" .. 1234.56 .. ","..
      "\"locationIP\":".."\""..person.locationIP.."\","..
      "\"browserUsed\":".."\""..person.browserUsed.."\"}")
end

The bottom one, which doesn’t calculate the date, was twice as fast as the first one. How do we fix this? Well, we could try to optimize the Lua code, but I doubted it would do much good. Instead I decided to bite the bullet and do the conversion in C++.

I wasn’t 100% sure I wanted to make a custom Date type, so I created a Date class but decided to simply store the “double” representation of a Date instead of storing a new type. The important parts are below:

namespace ragedb {
    class Date {
      ...
      public:
        Date(double _value) : value(_value) {}
        Date(std::string _value)  {
          value = fromString(_value).value;
        }

        static double convert(std::string s);
      ...
    };
}

A Date can be created from either a String or a double. We’ll also have a convert function to turn that String into the double we need. The conversion is a little crazy. First we have to parse the String (more on that later), but once we have our year, month and day values, we need to convert them to a number of days since the Unix epoch (1970-01-01) before adding the time. This code I found online from Howard Hinnant does the trick:

    static int daysFromCivil(int y, unsigned m, unsigned d) noexcept {
      y -= m <= 2;
      const int era = (y >= 0 ? y : y - 399) / 400;
      const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
      const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // [0, 365]
      const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
      return era * 146097 + static_cast<int>(doe) - 719468;
    }
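As a quick sanity check on Howard Hinnant’s algorithm, here is a small sketch that pins down a few known dates. The function body is the same as above, only marked constexpr so the checks can run at compile time; secondsFromCivil is a hypothetical helper I made up for illustration, not something in RageDB.

```cpp
// Same algorithm as above, marked constexpr so it can be checked at compile time.
constexpr int daysFromCivil(int y, unsigned m, unsigned d) noexcept {
  y -= m <= 2;                                   // treat Jan and Feb as part of the previous year
  const int era = (y >= 0 ? y : y - 399) / 400;  // 400-year cycle the date falls in
  const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
  const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
  return era * 146097 + static_cast<int>(doe) - 719468;
}

// Hypothetical helper (not in RageDB): whole days since the epoch, as seconds in a double.
constexpr double secondsFromCivil(int y, unsigned m, unsigned d) noexcept {
  return daysFromCivil(y, m, d) * 86400.0;
}

static_assert(daysFromCivil(1970, 1, 1) == 0, "the Unix epoch is day zero");
static_assert(daysFromCivil(1969, 12, 31) == -1, "the day before the epoch is negative");
static_assert(daysFromCivil(2000, 3, 1) == 11017, "the day after a leap day");
static_assert(daysFromCivil(2010, 1, 1) == 14610, "14610 days is 1262304000 seconds");
```

If any of those assertions were wrong the file would not compile, which is a cheap way to make sure nothing got mangled while copying Hinnant’s code.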

Hinnant is the author of the std::chrono library, so be sure to check out his website for all kinds of date tricks and more. The parsing code consists of trimming whitespace, getting the year, month and day, checking we aren’t out of bounds (day greater than 31, month greater than 12, etc.), then doing the same for the time and the time zone, and finally putting it all together. It’s long so I’m just going to link it here. Since the Lua library stored time as seconds, with the milliseconds to the right of the decimal point, we do the same.

    static double mergeTime(unsigned hour, unsigned minute, unsigned second, unsigned ms) {
      return (ms/1000.0) + second + (60 * minute) + (60 * 60 * hour);
    }

    static unsigned mergeTimeZone(unsigned hour, unsigned minute) {
      return (60 * minute) + (60 * 60 * hour);
    }
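To show how these pieces might fit together, here is a minimal sketch. It is not the actual RageDB parser (that is the long one linked above). It assumes timestamps shaped exactly like 2010-01-01T00:00:00.000+0000, with three millisecond digits and a ±HHMM offset, and it assumes the offset gets subtracted to arrive at UTC. The name convertSketch and the bool-plus-out-parameter signature are made up for this example.

```cpp
#include <cstdio>
#include <string>

static int daysFromCivil(int y, unsigned m, unsigned d) noexcept {
  y -= m <= 2;
  const int era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);
  const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + static_cast<int>(doe) - 719468;
}

static double mergeTime(unsigned hour, unsigned minute, unsigned second, unsigned ms) {
  return (ms / 1000.0) + second + (60 * minute) + (60 * 60 * hour);
}

static unsigned mergeTimeZone(unsigned hour, unsigned minute) {
  return (60 * minute) + (60 * 60 * hour);
}

// Hypothetical sketch: parse one fixed timestamp layout into seconds since the epoch.
// Returns false (and leaves out untouched) when the input does not match.
static bool convertSketch(const std::string& s, double& out) {
  int year = 0;
  unsigned month = 0, day = 0, hour = 0, minute = 0, second = 0, ms = 0;
  unsigned tzHour = 0, tzMinute = 0;
  char sign = '+';
  const int matched = std::sscanf(s.c_str(), "%4d-%2u-%2uT%2u:%2u:%2u.%3u%c%2u%2u",
                                  &year, &month, &day, &hour, &minute, &second, &ms,
                                  &sign, &tzHour, &tzMinute);
  if (matched != 10) return false;
  // Coarse bounds checks only; this does not reject dates like February 31st.
  if (month < 1 || month > 12 || day < 1 || day > 31) return false;
  if (hour > 23 || minute > 59 || second > 59) return false;
  if (tzHour > 23 || tzMinute > 59) return false;
  if (sign != '+' && sign != '-') return false;

  const double local = daysFromCivil(year, month, day) * 86400.0
                     + mergeTime(hour, minute, second, ms);
  const double offset = mergeTimeZone(tzHour, tzMinute);
  // Assumption: a +0100 timestamp is one hour ahead of UTC, so subtract the offset.
  out = (sign == '+') ? local - offset : local + offset;
  return true;
}
```

Calling convertSketch("2010-01-01T00:00:00.000+0000", value) should leave 1262304000 in value, and a string that doesn’t match the layout returns false. A real parser has to cope with optional parts and other layouts, which is where most of the length comes from.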

I should really add a “float” and “int32” data type, maybe merge things together into just 8, 16, 32 and 64 bit blobs… or just use 64 bits and extract the bits I need for each property, but that sounds dirty. Anyway… what did doing our Date conversion in C++ give us?

A huge reduction, from 28 minutes down to just 16 minutes. Just to check I didn’t screw something up, I can ask for a Person and check their creationDate property:

That looks right, and in Lua we can convert the stored double back into a Date and then into its ISO String representation. Pretty nice win. We still don’t have a bulk loader, but this speedup reduces the need for it quite a bit. In our next installment I will be changing up the look of the series a bit by replacing the logo with the new RageDB mascot. Stay tuned.
