AI for extracting data


There are times when using AI isn’t appropriate; for example, I would never want it to write material for me. I’ll do my own thinking and writing, thanks.

But there are things that it’s very good at: tasks that I could do myself, but that don’t engage my creativity or problem-solving skills, and that AI can do more quickly.

I ran into an example today. I had a 16-page PDF with tabular data about books: title, author, and publication year. I wanted to pull that data into a spreadsheet.

I could do this task using Power Query in Excel, but it’s pretty tedious. Instead, I uploaded the file to Claude along with this prompt:

The file I'm uploading is a 16-page PDF containing tabular data. Each page of the PDF has three columns of data. The header row is repeated in the first row of each page. Convert the data to a CSV file that I can import into a spreadsheet. The header row information should appear in the CSV only once.

Claude took a few minutes, then handed me a well-formatted CSV file that I could import into Google Sheets (or any other spreadsheet software). I still needed to split the author column into separate columns for given names and surnames, and I had to do some cleanup: most of the author names in the PDF were in the format LastName, FirstName, but several were listed as FirstName LastName.
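For anyone who would rather script the author-name split than do it by hand in a spreadsheet, here is a minimal Python sketch. It is not what I actually ran. The column names ("Author", "Given names", "Surname") and the function names are assumptions made for illustration, and the rule for names without a comma (treat the last word as the surname) is a guess that will get multi-word surnames wrong.

```python
import csv
import io


def split_author(name: str) -> tuple[str, str]:
    """Return (given_names, surname) for one author string."""
    name = name.strip()
    if "," in name:
        # "LastName, FirstName" form: split on the first comma only.
        surname, given = name.split(",", 1)
        return given.strip(), surname.strip()
    # "FirstName LastName" form: assume the last word is the surname.
    # A single-word name ends up entirely in the surname slot.
    given, _, surname = name.rpartition(" ")
    return given.strip(), surname.strip()


def add_name_columns(csv_text: str, author_column: str = "Author") -> str:
    """Replace the author column with "Given names" and "Surname" columns.

    The column names here are hypothetical; adjust them to match the
    header row of the real CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [f for f in (reader.fieldnames or []) if f != author_column]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept + ["Given names", "Surname"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        given, surname = split_author(row.pop(author_column))
        row["Given names"] = given
        row["Surname"] = surname
        writer.writerow(row)
    return out.getvalue()
```

Calling `add_name_columns` on the text of the CSV returns a new CSV string in which both name formats land in the same two columns. Anything the surname guess gets wrong would still need a manual pass.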

All the cleanup I had to do was the result of underlying problems in the original data, though, and not something that Claude introduced. Letting Claude create the CSV for me saved a considerable amount of time.