Macros for data cleaning

Macros for data cleaning
Photo by Guille B / Unsplash

Students often ask me, “What can I do with a degree in Political Science?” There are many things, of course, but one of the best ways to give students ideas about what they might do with a Political Science degree is to let them know what former students are currently doing.

So I’m compiling a list of my former students, for the purpose of being able to look them up on LinkedIn and see their career paths.

To compile that list, I have to access a database that contains that information. My access level only allows me to see one class at a time, and the information isn’t in CSV format. The data is in a format that’s easy enough to copy and paste into a spreadsheet, but then it needs to be cleaned.

Here’s a sample row of what the data looks like when I paste it into Google Sheets:

Spreadsheet with raw data
What the raw data looks like when pasted in

Sometimes columns G and H contain data; sometimes they don’t.

Here’s what I want the data to look like:

Spreadsheet with clean data
How I want the data formatted

The goal is to end up with a workbook containing one sheet for each class. Later I can combine the sheets into a master list and remove the duplicates, so I’ll know whom to look for on LinkedIn.

The cleaning process isn’t especially difficult. Only four basic steps are necessary:

  1. Delete unnecessary columns
  2. Split the column containing student names into two columns
  3. Copy and paste the values of those two columns into two new columns, then delete the originals
  4. Add the header row

That isn’t difficult at all. It’s tedious, though, and it takes a bit of time. I’ve taught five to seven courses per year for many years; automating the data cleaning process as much as possible using macros is a huge time saver.

Fortunately, recording and using macros in Google Sheets is very easy, and those macros can be assigned hot keys. As I started this project, I pretty quickly got macros set up for the four steps above — but I was triggering them individually.

Then it occurred to me that combining the macros would save even more time. Figuring out how to do that took a little bit of trial and error. I saw quite quickly that I could go into the Apps Script editor for my macros, add a new function that called the function for each of the four macros, then import that new function as a new macro. But when I tried to run the new macro, I didn’t get the expected result.

A little Googling turned up the problem: I needed to add SpreadsheetApp.flush() between the function calls to make sure that each change to the spreadsheet was fully applied before moving on to the next step.

Here’s the code I ended up with in the Apps Script editor:

function DoAllSteps(){
  Deletecolumns();
  SpreadsheetApp.flush();
  SplittextinColumnA();
  SpreadsheetApp.flush();
  Copyvaluesremovecolumns();
  SpreadsheetApp.flush();
  Addheader();
}

function SplittextinColumnA() {
  var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getRange('B1').activate();
  spreadsheet.getCurrentCell().setFormula('=split(a1,",")');
  spreadsheet.getRange('B2').activate();
  spreadsheet.getRange('B1').activate();
  spreadsheet.getActiveRange().autoFillToNeighbor(SpreadsheetApp.AutoFillSeries.DEFAULT_SERIES);
};


function Copyvaluesremovecolumns() {
  var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getRange('D1').activate();
  spreadsheet.getRange('B1:C12').copyTo(spreadsheet.getActiveRange(), SpreadsheetApp.CopyPasteType.PASTE_VALUES, false);
  spreadsheet.getRange('A:C').activate();
  spreadsheet.setCurrentCell(spreadsheet.getRange('C1'));
  spreadsheet.getActiveSheet().deleteColumns(spreadsheet.getActiveRange().getColumn(), spreadsheet.getActiveRange().getNumColumns());
};

function Addheader() {
  var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getRange('1:1').activate();
  spreadsheet.getActiveSheet().insertRowsBefore(spreadsheet.getActiveRange().getRow(), 1);
  spreadsheet.getActiveRange().offset(0, 0, 1, spreadsheet.getActiveRange().getNumColumns()).activate();
  spreadsheet.getActiveSheet().setFrozenRows(1);
  spreadsheet.getRange('A1').activate();
  spreadsheet.getCurrentCell().setValue('LastName');
  spreadsheet.getRange('B1').activate();
  spreadsheet.getCurrentCell().setValue('FirstName');
  spreadsheet.getRange('C1').activate();
  spreadsheet.getCurrentCell().setValue('LinkedIn');
};

function Deletecolumns() {
  var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getActiveSheet().deleteColumns(1);
  SpreadsheetApp.flush(); // Added
  spreadsheet.getActiveSheet().deleteColumns(2,6);
};

I imported DoAllSteps as a new macro and assigned it a keyboard shortcut. Now all I need to do for each course is copy and paste the information into a tab in my workbook, then type ⌘+Option+Shift+1, and I have good, clean data to work with.