Macros for data cleaning
Students often ask me, “What can I do with a degree in Political Science?” There are many things, of course, but one of the best ways to give students ideas about what they might do with a Political Science degree is to let them know what former students are currently doing.
So I’m compiling a list of my former students, for the purpose of being able to look them up on LinkedIn and see their career paths.
To compile that list, I have to access a database that contains that information. My access level only allows me to see one class at a time, and the information isn’t in CSV format. The data is in a format that’s easy enough to copy and paste into a spreadsheet, but then it needs to be cleaned.
Here’s a sample row of what the data looks like when I paste it into Google Sheets:
Sometimes columns G and H contain data; sometimes they don’t.
Here’s what I want the data to look like:
The goal is to end up with a workbook containing one sheet for each class. Later I can combine the sheets into a master list and remove the duplicates, so I’ll know whom to look for on LinkedIn.
The cleaning process isn’t especially difficult. Only four basic steps are necessary:
- Delete unnecessary columns
- Split the column containing student names into two columns
- Copy and paste the values of those two columns into two new columns, then delete the originals
- Add the header row
That isn’t difficult at all. It’s tedious, though, and it takes a bit of time. I’ve taught five to seven courses per year for many years; automating the data cleaning process as much as possible using macros is a huge time saver.
Fortunately, recording and using macros in Google Sheets is very easy, and those macros can be assigned hot keys. As I started this project, I pretty quickly got macros set up for the four steps above — but I was triggering them individually.
Then it occurred to me that combining the macros would save even more time. Figuring out how to do that took a little bit of trial and error. I saw quite quickly that I could go into the Apps Script editor for my macros, add a new function that called the function for each of the four macros, then import that new function as a new macro. But when I tried to run the new macro, I didn’t get the expected result.
A little Googling turned up the problem: I needed to add SpreadsheetApp.flush() between the function calls to make sure that each change to the spreadsheet was fully applied before moving on to the next step.
Here’s the code I ended up with in the Apps Script editor:
function DoAllSteps(){
Deletecolumns();
SpreadsheetApp.flush();
SplittextinColumnA();
SpreadsheetApp.flush();
Copyvaluesremovecolumns();
SpreadsheetApp.flush();
Addheader();
}
function SplittextinColumnA() {
var spreadsheet = SpreadsheetApp.getActive();
spreadsheet.getRange('B1').activate();
spreadsheet.getCurrentCell().setFormula('=split(a1,",")');
spreadsheet.getRange('B2').activate();
spreadsheet.getRange('B1').activate();
spreadsheet.getActiveRange().autoFillToNeighbor(SpreadsheetApp.AutoFillSeries.DEFAULT_SERIES);
};
function Copyvaluesremovecolumns() {
var spreadsheet = SpreadsheetApp.getActive();
spreadsheet.getRange('D1').activate();
spreadsheet.getRange('B1:C12').copyTo(spreadsheet.getActiveRange(), SpreadsheetApp.CopyPasteType.PASTE_VALUES, false);
spreadsheet.getRange('A:C').activate();
spreadsheet.setCurrentCell(spreadsheet.getRange('C1'));
spreadsheet.getActiveSheet().deleteColumns(spreadsheet.getActiveRange().getColumn(), spreadsheet.getActiveRange().getNumColumns());
};
function Addheader() {
var spreadsheet = SpreadsheetApp.getActive();
spreadsheet.getRange('1:1').activate();
spreadsheet.getActiveSheet().insertRowsBefore(spreadsheet.getActiveRange().getRow(), 1);
spreadsheet.getActiveRange().offset(0, 0, 1, spreadsheet.getActiveRange().getNumColumns()).activate();
spreadsheet.getActiveSheet().setFrozenRows(1);
spreadsheet.getRange('A1').activate();
spreadsheet.getCurrentCell().setValue('LastName');
spreadsheet.getRange('B1').activate();
spreadsheet.getCurrentCell().setValue('FirstName');
spreadsheet.getRange('C1').activate();
spreadsheet.getCurrentCell().setValue('LinkedIn');
};
function Deletecolumns() {
var spreadsheet = SpreadsheetApp.getActive();
spreadsheet.getActiveSheet().deleteColumns(1);
SpreadsheetApp.flush(); // Added
spreadsheet.getActiveSheet().deleteColumns(2,6);
};
I imported DoAllSteps as a new macro and assigned it a keyboard shortcut. Now all I need to do for each course is copy and paste the information into a tab in my workbook, then type ⌘+Option+Shift+1, and I have good, clean data to work with.