My Favorite Paradox

April 3, 2016

Today I’m going to talk about the data for my personal iphone side project and how I edited it. My project is a cocktail recipe book. It has over 1200 mostly unique recipes using almost 400 different ingredients. This is a lot of data to wrangle. There are already plenty of articles discussing how to store data — xml, json, binary, etc. However fewer articles discuss how to edit that data. Xml and json are both human readable, but not easy to edit at scale. A tool could be written from scratch to edit the data, but that seems heavy handed and wasteful for a side project.

A friend recommended using Google Spreadsheets as an editor and I could not be happier with the decision to use it. Typing data into a spreedsheet isn’t a particularly unique concept. What makes Google Spreadsheets cool is how trivially easy it is to write a C# tool to pull that data down. If you aren’t familiar with Google Documents service then I suggest you check it out. Any Google account, such as gmail, provides free access to basic MS Office type functionality. Documents (Word), Spreadsheets (Excel), Presentation (Powerpoint), etc. None of the Google offerings are as fully features or powerful as their retail MS counterparts, but they are all free and “in the cloud” for easy access by any internet enabled computer.

Google provides a .NET library for programmatic access. With three dlls and only a few lines of code you can access a spreadsheet and pull down data. As an added bonus if you load that data at run-time you can modify a spreadsheet and refresh in real time. Because one of my main goals is to provide concrete information I made a quick example project (dead link) and dummy gmail account so anyone can download and run real code. It was compiled in Visual Studio 2010 with .NET 4. It works on my machine and I provide no further guarantees. :)

The demo project took me about 10 minutes to complete start to finish. I decided I would make a spreadsheet containing game developers and games and load it into a shiny tree view. Here’s what the data looks like.

Input data

Pretty simple and straight forward. A pair of column headers, the developer in column A, and a variable number of games to the right. What is the process to parse this data? Here’s a summary:

  1. Create a new spreadsheet service
  2. Set username and password
  3. Find the correct spreadsheet
  4. Find the correct worksheet
  5. Iterate over cells

And here is the actual code.

private void ImportFromGoogle()  
{
  // Login/Password to GMail Account
  string userName = "username_goes_here";
  string password = "password_goes_here";

  // Name of spreadsheet and worksheet to access
  string spreadsheetName = "Example Spreadsheet";
  string worksheetName = "Example Worksheet";

  // Connect to google's service with login/password
  SpreadsheetsService service = new SpreadsheetsService( spreadsheetName );
  service.setUserCredentials( userName, password );

  // Query spreadsheets
  SpreadsheetQuery spreadsheetQuery = new SpreadsheetQuery();
  SpreadsheetFeed spreadsheetFeed = service.Query(spreadsheetQuery);

  // Loop over all spreadsheets
  foreach( SpreadsheetEntry spreadsheet in spreadsheetFeed.Entries )
  {
    // Check if this the spreadsheet we want
    if( !spreadsheetName.Equals(spreadsheet.Title.Text, StringComparison.OrdinalIgnoreCase) )
      continue;

    // Query worksheets
    AtomLink worksheetLink = spreadsheet.Links.FindService(GDataSpreadsheetsNameTable.WorksheetRel, null);
    WorksheetQuery worksheetQuery = new WorksheetQuery(worksheetLink.HRef.ToString());
    WorksheetFeed worksheetFeed = service.Query(worksheetQuery);

    // Loop over worksheets
    foreach( WorksheetEntry worksheet in worksheetFeed.Entries )
    {
      // Check if this is the worksheet we want
      if( !worksheetName.Equals(worksheet.Title.Text, StringComparison.OrdinalIgnoreCase) )
        continue;

      // Get cells
      AtomLink cellLink = worksheet.Links.FindService(GDataSpreadsheetsNameTable.CellRel, null);
      CellQuery cellQuery = new CellQuery(cellLink.HRef.ToString());
      CellFeed cellFeed = service.Query(cellQuery);
      AtomEntryCollection cellEntries = cellFeed.Entries;

      // Need to determine what column is what. Hardcoded for example.
      int devOffset = 0;
      int gameOffset = 1;

      // Loop over all entries finding the first entry on each row
      for( int i = 0; i < cellEntries.Count; ++i )
      {
        // Continue if this entry is not the first entry of a row (and not the first row)
        CellEntry baseEntry = cellEntries[i] as CellEntry;
        if( baseEntry.Row == 1 || baseEntry.Column != 1 )
          continue;

        // Cell containing developer name
        CellEntry devCell = cellEntries[i+devOffset] as CellEntry;

        // Create a node for the tree view for this developer
        TreeNode devNode = new TreeNode(devCell.Value);

        // Loop over all games associated with this developer
        int gameIndex = (i + gameOffset);
        while( true )
        {
          // Get game cell, if it's on a new row we're done with this developer
          CellEntry gameCell = cellEntries[gameIndex] as CellEntry;
          if( gameCell.Row != devCell.Row )
            break;

          // Add game to developer tree node
          devNode.Nodes.Add( new TreeNode(gameCell.Value) );

          // Increment and validate index
          ++gameIndex;
          if( gameIndex >= cellEntries.Count )
            break;
        }

        // Add developer node to list view
        TV_Devs.Nodes.Add(devNode);
      }
    }
  }
}

What do we get after all that? Something like this.

Output

Tada! Spreadsheet data has successfully been transformed to C# data. This new and convenient data can be further transformed to our heart’s content.

A few things of note here. When you get the CellEntries object it contains a flat list of non-empty cells. This requires data to be properly formatted. It’s not a system I would recommend for AAA development, but for a solo project I’ve found it to be perfectly acceptable. If you want to do this in your own project then there are two special steps to take.

Keeping with my focus on the real world, here is how I used this concept in my iphone project. A standard recipe entry looks like this:

Spreadsheet Input

It defines a good bit of information including a variable number of recipe steps. Over 1200 of these recipes are loaded in addition to separate lists of glassware, drink types, measurement types, protips, ingredients, and lite edition recipes. After import the data is validated searching for typos, duplicates, and otherwise invalid entries. The data is then crunched converting most string values into integer ids. Finally, the data can be exported into whatever file format I desire to be loaded on the user platform. Here’s a picture of my tool to handle all this.

Data Editor

That’s what I did for my project and I hope at least somebody found this useful. How do you guys edit data for your side or AAA projects? Discuss in the comments below!