Importing notes from Google Docs to Emacs Org Mode
2024-05-24 08:56:41 +0200
If that demo is intriguing to you, read on!
Why
I use the superb denote package for my personal notes repository, with org-mode for formatting.
For each meeting and 1:1, I have a dedicated note file. denote stamps some metadata at the top of the file, and I can easily tag and link notes together.
Taking my own personal notes as a reminder of, e.g., what we decided or disagreed on is great for me.
But notes are also important for building shared context. In my organization, that means a shared Google Doc. So I should take notes in the shared document and then copy them to my local repository after the meeting.
I need an efficient way to copy notes from a Google Doc into my denote repository, so I can take advantage of linking, local file-based search of my notes, etc.
How (manual approach)
My starting point for "convert format A to format B" is always the reliable pandoc. Sadly, one cannot simply pipe clipboard contents from Google Docs into pandoc. You first need to convert them to a format that pandoc knows about, like HTML. I used the gdoctohtml site a few times but wasn't keen to paste document content into a third-party site.
The workflow is also kind of clunky:
- open Google Doc
- select contents or relevant region and copy to clipboard
- visit gdoctohtml site and paste
- invoke pbpaste | pandoc --wrap=none -f html -t org
- open the denote document and paste
There are other options too, like downloading the Google Doc as HTML (and unzipping) or as docx, and passing the path to pandoc. But I wanted a way with fewer steps, ideally one that would integrate with Emacs + denote. After a bit of setup, I have something working.
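For reference, the docx route mentioned above can be sketched as a small shell function. This is only a sketch: docx_to_org is a made-up helper name, and it assumes the document was already downloaded as a .docx file and that pandoc is on the PATH. The pandoc flags are the same as in the clipboard command, with the input format switched to docx.

```shell
# Sketch: convert a .docx downloaded from Google Docs into an Org file
# next to it. docx_to_org is a hypothetical helper name.
docx_to_org() {
  # notes.docx -> notes.org; print the output path on success
  pandoc --wrap=none -f docx "$1" -t org -o "${1%.docx}.org" &&
    printf '%s\n' "${1%.docx}.org"
}
```

Called as docx_to_org ~/Downloads/notes.docx, it writes notes.org next to the download and prints its path.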
Fancier way
First, install glotlabs/gdrive and follow their Create Google API credentials in 50 easy steps guide 😭. Then add this code somewhere in your Emacs config. (The LLM-that-shall-not-be-named wrote most of this after quite a bit of prodding and hallucinating. Still, I'm grateful that I didn't have to cobble this together myself, as I don't like writing Elisp.)
(defun export-google-doc-to-org-buffer ()
  "Export a Google Doc to a new Org buffer.
Prompt for a Google Docs URL, export the document to DOCX using
`gdrive', convert it to Org using `pandoc', and open the result
in a new buffer."
  (interactive)
  (let* ((url (read-string "Enter Google Docs URL: "))
         ;; Stop at ? and # as well, so ".../d/ID?usp=sharing" yields ID.
         (doc-id (if (string-match "document/d/\\([^/?#]+\\)" url)
                     (match-string 1 url)
                   (error "Invalid URL format")))
         (temp-dir (file-name-as-directory temporary-file-directory))
         (docx-file (concat temp-dir doc-id ".docx"))
         (org-file (concat temp-dir doc-id ".org"))
         ;; These strings are run by a shell, so quote every argument.
         (gdrive-info-command
          (concat "gdrive files info " (shell-quote-argument doc-id)))
         (gdrive-export-command
          (concat "gdrive files export --overwrite "
                  (shell-quote-argument doc-id) " "
                  (shell-quote-argument docx-file)))
         (pandoc-command
          (concat "pandoc --wrap=none -f docx "
                  (shell-quote-argument docx-file)
                  " -t org -o " (shell-quote-argument org-file)))
         (gdrive-output-buffer "*GDrive Output*")
         title created updated view-url org-buffer-name)
    ;; Ensure the temp directory exists
    (unless (file-directory-p temp-dir)
      (make-directory temp-dir t))
    ;; Get document info from gdrive.  Search for each field from the top
    ;; of the output so the order in which gdrive prints them is irrelevant.
    (with-temp-buffer
      (shell-command gdrive-info-command t gdrive-output-buffer)
      (let ((field (lambda (name)
                     (goto-char (point-min))
                     (when (re-search-forward
                            (concat "^" name ": \\(.*\\)") nil t)
                       (match-string 1)))))
        (setq title (funcall field "Name")
              updated (funcall field "Modified")
              created (funcall field "Created")
              view-url (funcall field "ViewUrl"))))
    (unless (and title updated)
      (error "Failed to retrieve document information from gdrive"))
    ;; Export the document and take the buffer name from gdrive's output
    (with-temp-buffer
      (shell-command gdrive-export-command t gdrive-output-buffer)
      (message "Export command output: %s" (buffer-string))
      (goto-char (point-min))
      (when (re-search-forward "Exporting document '\\([^']+\\)'" nil t)
        (setq org-buffer-name (concat (match-string 1) ".org"))))
    (unless org-buffer-name
      (error "Failed to export document from gdrive"))
    ;; Delete any Org file left by an earlier run, so the check below
    ;; only passes if this pandoc run actually produced output.
    (when (file-exists-p org-file)
      (delete-file org-file))
    ;; Shell out to pandoc to convert the DOCX to Org
    (shell-command pandoc-command)
    (unless (file-exists-p org-file)
      (error "Org file was not created by pandoc command"))
    (with-current-buffer (get-buffer-create org-buffer-name)
      ;; Start clean if a buffer from an earlier import still exists
      (erase-buffer)
      ;; Properties drawer with the document metadata
      (insert ":PROPERTIES:\n"
              ":TITLE: " title "\n"
              ":CREATED: " (or created "") "\n"
              ":UPDATED: " updated "\n"
              ":URL: " (or view-url url) "\n"
              ":END:\n\n")
      (insert-file-contents org-file)
      ;; Remove empty lines before bullet and numbered list items
      (goto-char (point-min))
      (while (re-search-forward "^\n\\(-\\|[0-9]+\\.\\)" nil t)
        (replace-match "\\1"))
      ;; Open the buffer and set it to Org mode
      (switch-to-buffer (current-buffer))
      (org-mode))))
The script:
- grabs the document identifier from the URL, so I don't have to extract it manually
- downloads the document from Google Drive in docx format
- passes the document to pandoc and converts it to org-mode
- does some cleanup to remove empty lines between list items
- extracts some metadata about the document and adds it as a properties drawer in the Org document
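
To check the gdrive and pandoc steps from a terminal before wiring them into Emacs, the same pipeline can be sketched as shell functions. This is a rough equivalent, not a replacement for the Elisp. The function names (gdoc_id, tighten_lists, gdoc_to_org) are made up for illustration, and the sketch assumes gdrive and pandoc are on the PATH and that gdrive is already authenticated. The gdrive and pandoc invocations are the ones used in the Elisp function. One difference: the awk cleanup also tightens indented list items, which the Elisp regexes do not.

```shell
# Print the document ID embedded in a Google Docs URL; fail if there is none.
gdoc_id() {
  printf '%s\n' "$1" | sed -n 's|.*document/d/\([^/?#]*\).*|\1|p' | grep .
}

# Drop blank lines that directly precede a "- " bullet or "1. " numbered item.
tighten_lists() {
  awk '
    /^$/ { blanks++; next }                 # hold back blank lines
    /^[ \t]*(-|[0-9]+\.) / { blanks = 0 }   # list item: discard held blanks
    { while (blanks > 0) { print ""; blanks-- }   # re-emit held blanks
      print }
    END { while (blanks > 0) { print ""; blanks-- } }
  '
}

# Export a Google Doc and print it as Org text on stdout.
gdoc_to_org() {
  local id tmp
  id=$(gdoc_id "$1") || { echo "Invalid URL format" >&2; return 1; }
  tmp=$(mktemp -d) || return 1
  # gdrive progress output goes to stderr so stdout carries only Org text
  gdrive files export --overwrite "$id" "$tmp/$id.docx" >&2 &&
    pandoc --wrap=none -f docx "$tmp/$id.docx" -t org | tighten_lists
  rm -rf "$tmp"
}
```

Running gdoc_to_org with a document URL and redirecting to a file gives the Org body without the properties drawer; the metadata step (gdrive files info) is left out of this sketch.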