Importing notes from Google Docs to Emacs Org Mode

2024-05-24 08:56:41 +0200

/google-docs-org-mode.gif

If that demo is intriguing to you, read on!

Why

I use the superb denote package for my personal notes repository, with org-mode for formatting.

For each meeting and 1:1, I have a dedicated note file. denote stamps some metadata at the top of the file, and I can easily tag and link notes together.

Taking my own personal notes as a reminder of e.g. what we decided or disagreed on is great–for me.

But notes are also important for building shared context. In my organization, that means a shared Google Doc. It also means I should take notes in the shared document and then copy them to my local repository after the meeting.

I need an efficient way to copy notes from a Google Doc into my denote repository, so I can take advantage of linking, local file-based search of my notes, and so on.

How (manual approach)

My starting point for "convert format A to format B" is always the reliable pandoc. Sadly, one cannot simply pipe clipboard contents from Google Docs into pandoc. You first need to convert them to a format that pandoc knows about, like HTML. I used the gdoctohtml site a few times but wasn't keen on pasting document content into a third-party site.

The workflow is also kind of clunky:

  • open Google Doc
  • select contents or relevant region and copy to clipboard
  • visit gdoctohtml site and paste
  • invoke pbpaste | pandoc --wrap=none -f html -t org
  • open the denote document and paste

There are other options too, like downloading the Google Doc as HTML (and unzipping it) or as docx, and passing the file path to pandoc. But I wanted a way with fewer steps.
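
For the docx route, the pandoc step can be run from inside Emacs. The following is only a minimal sketch, not part of the original setup: `my/docx-to-org-string` is a hypothetical helper name, and it assumes the `pandoc` executable is on `exec-path` and that the document has already been downloaded as docx.

```elisp
(defun my/docx-to-org-string (docx-file)
  "Convert DOCX-FILE to Org markup with pandoc and return it as a string.
Hypothetical helper; assumes the `pandoc' executable is on `exec-path'."
  (with-temp-buffer
    ;; Destination '(t nil): stdout goes into this temp buffer,
    ;; stderr is discarded so warnings do not end up in the notes.
    (let ((status (call-process "pandoc" nil '(t nil) nil
                                "--wrap=none" "-f" "docx" "-t" "org"
                                (expand-file-name docx-file))))
      (unless (eq status 0)
        (error "pandoc exited with status %s" status))
      (buffer-string))))

;; Usage (the path is a placeholder):
;; (insert (my/docx-to-org-string "~/Downloads/meeting-notes.docx"))
```

Passing the arguments as a list to `call-process` avoids going through a shell, so file names with spaces need no extra quoting.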

And ideally one that would integrate with Emacs + denote. After a bit of setup, I have something working.

Fancier way

First, install glotlabs/gdrive and follow their Create Google API credentials in 50 easy steps guide 😭. Then add this code somewhere in your Emacs config. (The LLM-that-shall-not-be-named wrote most of this after quite a bit of prodding and hallucinating. Still, I'm grateful that I didn't have to try to cobble this together myself, as I don't like writing Elisp.)

See the code
(defun export-google-doc-to-org-buffer ()
  "Prompt for a Google Docs URL, export it to DOCX using `gdrive`, convert to Org using `pandoc`, and open the content in a new buffer."
  (interactive)
  (let* ((url (read-string "Enter Google Docs URL: "))
         (doc-id (if (string-match "document/d/\\([^/?#]+\\)" url)
                     (match-string 1 url)
                   (error "Invalid URL format")))
         (temp-dir (file-name-as-directory temporary-file-directory))
         (docx-file (concat temp-dir doc-id ".docx"))
         (org-file (concat temp-dir doc-id ".org"))
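         ;; Quote every interpolated value so the shell treats it as one argument
         (gdrive-info-command (concat "gdrive files info " (shell-quote-argument doc-id)))
         (gdrive-export-command (concat "gdrive files export --overwrite "
                                        (shell-quote-argument doc-id) " "
                                        (shell-quote-argument docx-file)))
         (pandoc-command (concat "pandoc --wrap=none -f docx "
                                 (shell-quote-argument docx-file)
                                 " -t org -o " (shell-quote-argument org-file)))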
         (buffer-name nil)
         (title nil)
         (created nil)
         (updated nil)
         (gdrive-output-buffer "*GDrive Output*")
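         (properties-content nil) ; bound here so the later setq stays local
         (url nil)) ; shadows the prompted URL; set again from gdrive's ViewUrl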

    ;; Ensure the temp directory exists
    (unless (file-directory-p temp-dir)
      (make-directory temp-dir t))

    ;; Get document info from gdrive
    (with-temp-buffer
      (shell-command gdrive-info-command t gdrive-output-buffer)
      ;; Search from the top for each field so their order in the output does not matter
      (goto-char (point-min))
      (when (re-search-forward "^Name: \\(.*\\)" nil t)
        (setq title (match-string 1))
        (message "Title: %s" title))
      (goto-char (point-min))
      (when (re-search-forward "^Modified: \\(.*\\)" nil t)
        (setq updated (match-string 1))
        (message "Updated: %s" updated))
      (goto-char (point-min))
      (when (re-search-forward "^Created: \\(.*\\)" nil t)
        (setq created (match-string 1))
        (message "Created: %s" created))
      (goto-char (point-min))
      (when (re-search-forward "^ViewUrl: \\(.*\\)" nil t)
        (setq url (match-string 1))
        (message "URL: %s" url)))

    ;; Check if title and updated are nil
    (unless (and title updated)
      (error "Failed to retrieve document information from gdrive"))

    ;; Shell out to gdrive to export the document
    (with-temp-buffer
      (shell-command gdrive-export-command t gdrive-output-buffer)
      (message "Export command output: %s" (buffer-string))
      (goto-char (point-min))
      (when (re-search-forward "Exporting document '\\([^']+\\)'" nil t)
        (setq buffer-name (concat (match-string 1) ".org"))
        (message "Buffer Name: %s" buffer-name)))

    ;; Check if buffer-name is nil
    (unless buffer-name
      (error "Failed to export document from gdrive"))

    ;; Shell out to pandoc to convert the DOCX to Org
    (shell-command pandoc-command)

    ;; Check if Org file was created
    (unless (file-exists-p org-file)
      (error "Org file was not created by pandoc command"))

    ;; Create properties drawer content
    (setq properties-content
          (concat ":PROPERTIES:\n"
                  ":TITLE: " title "\n"
                  ":CREATED: " created "\n"
                  ":UPDATED: " updated "\n"
                  ":URL: " url "\n"
                  ":END:\n\n"))

    ;; Create a new buffer and insert the contents of the exported Org file
    (with-current-buffer (get-buffer-create buffer-name)
      (insert properties-content)
      (insert-file-contents org-file)

      ;; Remove empty lines between bullet points
      (goto-char (point-min))
      (while (re-search-forward "^\n\\(-\\)" nil t)
        (replace-match "\\1" nil nil))

      ;; Remove empty lines between numbered list items
      (goto-char (point-min))
      (while (re-search-forward "^\n\\([0-9]+\\.\\)" nil t)
        (replace-match "\\1" nil nil))

      ;; Open the buffer
      (switch-to-buffer (current-buffer))

      ;; Set the buffer to Org mode
      (org-mode))))
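
Once the definition is loaded, the command is available as M-x export-google-doc-to-org-buffer. If you run it often, a key binding may help. This is a small sketch, and the key `C-c n g` is an arbitrary choice made for illustration, not something from my actual config:

```elisp
;; Hypothetical binding: C-c n g is an arbitrary key picked for this example.
(global-set-key (kbd "C-c n g") #'export-google-doc-to-org-buffer)
```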

The script:

  • grabs the document identifier from the URL, so no need to manually extract it myself
  • downloads the document from Google Drive in docx format
  • passes the document to Pandoc and converts to org-mode
  • does some cleanup to remove blank lines between list items
  • extracts some metadata about the document and adds it as a properties drawer in the Org document
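
The pure string-handling parts of those steps can be tried in isolation, for example in `ielm`, without touching gdrive or pandoc. The sketch below restates them as small functions. The `my/` names are hypothetical, the sample gdrive output in the usage comments is made up for illustration, and the "Field: value" line layout is an assumption taken from the regexps in the command above.

```elisp
(defun my/gdoc-id-from-url (url)
  "Return the document ID embedded in a Google Docs URL, or nil."
  (when (string-match "document/d/\\([^/?#]+\\)" url)
    (match-string 1 url)))

(defun my/gdrive-info-field (field output)
  "Return the value of FIELD (e.g. \"Name\") in gdrive info OUTPUT, or nil.
Assumes one \"Field: value\" pair per line."
  (when (string-match (concat "^" (regexp-quote field) ": \\(.*\\)$") output)
    (match-string 1 output)))

(defun my/tighten-org-lists (text)
  "Return TEXT with the blank line before each list item removed.
Handles \"-\" bullets and \"1.\" style numbered items."
  (replace-regexp-in-string "^\n\\(-\\|[0-9]+\\.\\)" "\\1" text))

;; Usage (all inputs are made-up examples):
;; (my/gdoc-id-from-url "https://docs.google.com/document/d/abc123_XYZ/edit")
;;   ⇒ "abc123_XYZ"
;; (my/gdrive-info-field "Name" "Id: abc123_XYZ\nName: Weekly sync\n")
;;   ⇒ "Weekly sync"
;; (my/tighten-org-lists "Intro\n\n- a\n\n- b\n")
;;   ⇒ "Intro\n- a\n- b\n"
```

Because each field is looked up independently, the order in which gdrive prints the fields does not matter here.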