papers+.el

Published on 2024-09-28 (updated on 2024-10-07)

tags: emacs org

The why and the what

Org is a fantastic tool, the office suite for the Emacs OS. I maintain this website with Org, and I even wrote my dissertation with it. Aided by packages such as org-ref, citar, and cdlatex, writing scientific documents in Org is almost a breeze. While writing my dissertation, however, I realized that just as Emacs is not the ideal OS for every circumstance (I have never used it as an OS myself), Org is not the ideal office suite for every circumstance. I prefer writing equation-heavy articles directly in LaTeX rather than in Org, even when Org is beefed up with all those supporting packages, and particularly when I have to collaborate with co-authors who do not use Emacs, let alone Org.

Another issue I faced writing scientific documents in Org is that updates to Org (or to supporting packages) can introduce changes to the exporter that force me to tweak the Org file before it exports properly to PDF. In my experience, this is much rarer when writing directly in LaTeX.

However, Org is still a fantastic tool for several steps that come before you start writing a scientific article. Chief among them are taking notes, making presentations, and collecting references for the article's bibliography. This post is about a small package I wrote, creatively called papers+, to easily add papers to a central list. org-ref also provides a similar ability, but that is just one of the many things it does, and, as mentioned above, its other functionality is not very useful to me. So why fill up my elpa directory with dependencies that I do not need?

papers+ does one thing and one thing only: it adds an entry for a paper to a central Org file. This file acts as a repository for all your papers, and you can easily search it with either the built-in isearch or a tool like consult-ripgrep. Each entry consists of the following:

  1. The title of the paper, as an Org header.
  2. The BibTeX citation for the paper, in a bibtex source block.

An entry in my central Org file looks like the following:

** Nuclear effective field theory: status and perspectives
  #+begin_src bibtex
  @article{Hammer:2019poc,
      author = {Hammer, H. -W. and K\"onig, S. and van Kolck, U.},
      title = "{Nuclear effective field theory: status and perspectives}",
      eprint = "1906.12122",
      archivePrefix = "arXiv",
      primaryClass = "nucl-th",
      doi = "10.1103/RevModPhys.92.025004",
      journal = "Rev. Mod. Phys.",
      volume = "92",
      number = "2",
      pages = "025004",
      year = "2020"
  }
  #+end_src

papers+ writes the header as an Org link of the form [[file:PDF][TITLE]], so it also links to the PDF file corresponding to the paper (in the example above, a nonexistent file). In Emacs, C-c C-o on the header opens the PDF file.

Currently, papers+ has two methods to add papers:

  1. From an arXiv link.
  2. From a local PDF.

When adding from an arXiv link, papers+ will do the following:

  1. Download the PDF of the paper to a central location.
  2. Ask you to enter the title of the paper. You can give the paper’s original title as the title, or something else if you prefer. I sometimes do the latter.
  3. Try to get the BibTeX citation from the InspireHEP database. If that fails, it asks you to enter the BibTeX citation.
  4. Convert all of that into an Org capture that you can edit before adding it to the central Org file, a trick I took from oantolin’s arXiv.el. For this to work, you must add an entry for papers+ to your org-capture-templates.

Adding from a local PDF is very similar, except that papers+ copies the PDF to the central location and always asks you to enter the BibTeX citation.

This is not a fully automatic solution, but it is usually automatic enough for my purposes. There are also edge cases. For example, I have not implemented any check for duplicates, though in practice this is rarely an issue. papers+ is quite extensible, however, so you can make it more automatic if you choose. For example, you could parse the output of the arXiv API to get the title automatically. Personally, I found that implementing an arXiv API parser in Elisp is far more of a time sink than just entering the title manually. You can also easily add other sources for papers and citations, or a check for duplicates.
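For instance, a duplicate check could be as simple as searching the central Org file for the arXiv ID before anything is downloaded. The following is a minimal sketch of that idea. papers--duplicate-p+ is a hypothetical name and is not part of papers+. Because it does a plain substring match, a hit should be treated as a hint rather than as proof of a duplicate.

```elisp
(defun papers--duplicate-p+ (org-file id)
  "Return non-nil if ID already occurs in ORG-FILE.
ID is a string such as an arXiv eprint number or a BibTeX key.
Hypothetical helper: a plain, case-sensitive substring search, so a
match is only a hint that the paper is already there."
  (when (file-exists-p org-file)
    (with-temp-buffer
      (insert-file-contents org-file)
      (goto-char (point-min))
      (let ((case-fold-search nil))
        (search-forward id nil t)))))

;; Hypothetical use near the top of `papers--arXiv-add+':
;; (when (papers--duplicate-p+ papers-org-capture-file+
;;                             (papers--arXiv-id+ url-or-id))
;;   (user-error "Paper already in %s" papers-org-capture-file+))
```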

What does this achieve? Keeping your papers in a central Org file allows you to do (at least) the following:

  1. Tangle all the BibTeX source blocks to easily create a .bib file that you can use in your LaTeX document.
  2. Refile the entries in your central Org file to organize your papers however you like, or even move them to other Org files. If you no longer need a paper, just delete its entry.
  3. Easily add notes to each entry. This is useful for keeping track of which paper has which information.
  4. Search the Org file easily. It will take a long time for the file to grow to an unmanageable size, and even then, searching with tools like consult-ripgrep or org-ql will likely still be a breeze.
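For the first item, the bibtex blocks need a tangle destination, because Org does not tangle source blocks by default. One way to handle this from Elisp is org-babel-tangle-file. Its optional TARGET-FILE argument gives blocks without their own :tangle header a default destination, and its LANG-RE argument limits tangling to bibtex blocks. papers-tangle-bib+ is a hypothetical helper that is not part of papers+. Treat it as a sketch.

```elisp
(require 'ob-tangle)

(defun papers-tangle-bib+ (org-file bib-file)
  "Tangle every bibtex source block in ORG-FILE into BIB-FILE.
Hypothetical helper, not part of papers+.  Interactively, ORG-FILE
is `papers-org-capture-file+'."
  (interactive
   (list papers-org-capture-file+
         (read-file-name "Write BibTeX to: ")))
  ;; TARGET-FILE is the default :tangle value for blocks that do not
  ;; set one; LANG-RE restricts tangling to blocks in bibtex.
  (org-babel-tangle-file org-file
                         (expand-file-name bib-file)
                         "\\`bibtex\\'"))
```

Alternatively, you can add a #+PROPERTY: header-args:bibtex :tangle references.bib line at the top of the central Org file. A plain C-c C-v t then produces the .bib file without any extra code.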

The how

Here is the good part. Feel free to use the code as is or modify it however you want.

Customization options

(require 'org-capture)
(require 'subr-x)   ; for `string-trim'

(defgroup papers+ nil
  "Utility functions for maintaining research papers."
  :group 'applications)

(defcustom papers-dir+ "~/Documents/papers/"
  "Directory where papers are stored."
  :type 'directory
  :group 'papers+)

(defcustom papers-org-capture-file+
  (expand-file-name "reference.org" papers-dir+)
  "Org file containing the list of papers."
  :type 'file
  :group 'papers+)

(defcustom papers-org-capture-header+
  "To be refiled"
  "Header for org-capture."
  :type 'string
  :group 'papers+)

(defcustom papers-org-capture-key+
  "p"
  "Key for org-capture."
  :type 'string
  :group 'papers+)

(defcustom paper-sources+
  '(("arXiv" . papers--arXiv-add+)
    ("local" . papers--local-add+))
  "Sources for papers."
  :type '(alist :key-type string :value-type function)
  :group 'papers+)

arXiv backend

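(defun papers--arXiv-id+ (url-or-id)
  "Extract arXiv ID from URL-OR-ID.
A trailing \".pdf\", as found in older arXiv PDF links, is dropped."
  (when (or (string-match
             "^https?://arxiv\\.org/\\(?:abs\\|pdf\\)/\\([a-z0-9./-]+\\)"
             url-or-id)
            (string-match "\\([a-z0-9./-]+\\)" url-or-id))
    (let ((id (match-string 1 url-or-id)))
      ;; Without this, ".../pdf/1906.12122.pdf" would yield the ID
      ;; "1906.12122.pdf" and break the file name and InspireHEP query.
      (if (string-suffix-p ".pdf" id)
          (substring id 0 -4)
        id))))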

(defun papers--arXiv-url+ (url-or-id)
  "Construct arXiv pdf url from URL-OR-ID."
  (cond
   ((string-match
     "^https?:\/\/arxiv.org/\\(?:abs\\|pdf\\)\/\\([a-z-\/0-9v.]+\\|[0-9v.]+\\)"
     url-or-id 0 t)
    (string-replace "abs" "pdf" url-or-id))
   ((string-match "\\([a-z-\/0-9v.]+\\|[0-9v.]+\\)" url-or-id 0 t)
    (concat "https://arxiv.org/pdf/" url-or-id))))

(defun papers--arXiv-download+ (url-or-id)
  "Download paper from URL-OR-ID."
  (let ((pdf-url (papers--arXiv-url+ url-or-id))
        (pdf-file
         (expand-file-name
          (format "%s.pdf"
           (string-replace "/" "." (papers--arXiv-id+ url-or-id)))
          papers-dir+)))
    (unless (file-directory-p papers-dir+)
      (make-directory papers-dir+ t))
    (unless (file-exists-p pdf-file)
      (url-copy-file pdf-url pdf-file 1))
    pdf-file))

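(defvar url-http-response-status)   ; set by url-http in the response buffer

(defun papers--arXiv-inspirehep-bibtex+ (url-or-id)
  "Get BibTeX data from Inspire HEP for arXiv URL-OR-ID.
Return nil if the record is not found or the request fails."
  (let ((buffer
         (url-retrieve-synchronously
          (format "https://inspirehep.net/api/arxiv/%s?format=bibtex"
                  (papers--arXiv-id+ url-or-id))
          t)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            ;; Check the HTTP status rather than searching the response
            ;; for "404", which could also match e.g. a page number.
            (when (and (eql url-http-response-status 200)
                       ;; The body starts after the first blank line.
                       (progn (goto-char (point-min))
                              (search-forward "\n\n" nil t)))
              (string-trim
               (decode-coding-string
                (buffer-substring-no-properties (point) (point-max))
                'utf-8))))
        ;; Do not leave the response buffer lying around.
        (kill-buffer buffer)))))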

(defun papers--arXiv-add+ ()
  "Add a paper from arXiv; return a list (PDF TITLE BIBTEX).
The BibTeX record comes from InspireHEP, or is entered manually
if the lookup fails."
  (let* ((url-or-id (read-from-minibuffer "Enter arXiv url or id: " nil))
         (title (read-from-minibuffer "Enter title: " nil))
         (pdf (papers--arXiv-download+ url-or-id))
         (bib (papers--arXiv-inspirehep-bibtex+ url-or-id)))
    (unless bib
      (setq bib (read-from-minibuffer "Enter bibtex record: " nil)))
    (list pdf title bib)))

Local PDF backend

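(defun papers--local-add+ ()
  "Copy a local PDF to `papers-dir+'; return a list (PDF TITLE BIBTEX)."
  (let* ((old-pdf (read-file-name "Enter path to pdf: " nil nil t))
         (pdf (expand-file-name (file-name-nondirectory old-pdf) papers-dir+))
         (title (read-from-minibuffer "Enter title: " nil))
         (bib (read-from-minibuffer "Enter bibtex record: " nil)))
    (unless (file-directory-p papers-dir+)
      (make-directory papers-dir+ t))
    ;; Copy rather than move, and ask before overwriting an existing file.
    (unless (file-equal-p old-pdf pdf)
      (copy-file old-pdf pdf 1))
    (list pdf title bib)))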

Frontend

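(defun papers-add+ ()
  "Add a paper from one of `paper-sources+' via `org-capture'.
The new entry is pushed onto the kill ring, where the %c in the
papers+ capture template picks it up."
  (interactive)
  (let* ((source
          (completing-read "Select source: " (mapcar #'car paper-sources+)
                           nil t))
         (params
          (funcall (cdr (assoc source paper-sources+)))))
    ;; PARAMS is (PDF TITLE BIBTEX); the link is relative to `papers-dir+'.
    (kill-new
     (format "[[file:%s][%s]]\n#+begin_src bibtex\n%s\n#+end_src"
             (file-relative-name (nth 0 params) papers-dir+)
             (nth 1 params)
             (nth 2 params)))
    (org-capture nil papers-org-capture-key+)))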

(provide 'papers+)
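papers-add+ treats every entry of paper-sources+ the same way. It calls the function and expects back a list of the PDF path, the title, and the BibTeX record. Adding a source is therefore mostly a matter of writing one such function. The sketch below shows a hypothetical DOI source that is not part of papers+, and all of its function names are made up. It copies a local PDF as papers--local-add+ does. Before falling back to manual entry, it tries to fetch the BibTeX record from doi.org using DOI content negotiation, which means sending an Accept: application/x-bibtex header. Treat it as a starting point rather than a tested backend.

```elisp
(require 'url)      ; makes `url-request-extra-headers' a special variable
(require 'subr-x)   ; for `string-trim'

(defvar url-http-response-status)   ; set by url-http in the response buffer

(defun papers--doi-normalize+ (url-or-doi)
  "Return the bare DOI in URL-OR-DOI.
Strips whitespace and a leading https://doi.org/, https://dx.doi.org/
or doi: prefix."
  (let ((s (string-trim url-or-doi)))
    (if (string-match
         "\\`\\(?:https?://\\(?:dx\\.\\)?doi\\.org/\\|doi:\\)" s)
        (substring s (match-end 0))
      s)))

(defun papers--doi-bibtex+ (url-or-doi)
  "Fetch BibTeX for URL-OR-DOI from doi.org via content negotiation.
Return nil if the request fails."
  (let* ((url-request-extra-headers '(("Accept" . "application/x-bibtex")))
         (buffer (url-retrieve-synchronously
                  (concat "https://doi.org/" (papers--doi-normalize+ url-or-doi))
                  t)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (when (and (eql url-http-response-status 200)
                       (progn (goto-char (point-min))
                              (search-forward "\n\n" nil t)))
              (string-trim
               (decode-coding-string
                (buffer-substring-no-properties (point) (point-max))
                'utf-8))))
        (kill-buffer buffer)))))

(defun papers--doi-add+ ()
  "Copy a local PDF and fetch its BibTeX from doi.org.
Return (PDF TITLE BIBTEX), as `papers-add+' expects."
  (let* ((old-pdf (read-file-name "Enter path to pdf: " nil nil t))
         (pdf (expand-file-name (file-name-nondirectory old-pdf) papers-dir+))
         (title (read-from-minibuffer "Enter title: "))
         (bib (or (papers--doi-bibtex+ (read-from-minibuffer "Enter DOI: "))
                  ;; Fall back to manual entry, like the arXiv backend.
                  (read-from-minibuffer "Enter bibtex record: "))))
    (make-directory papers-dir+ t)
    (unless (file-equal-p old-pdf pdf)
      (copy-file old-pdf pdf 1))
    (list pdf title bib)))

;; Offer the new source in `papers-add+' once papers+ is loaded.
(with-eval-after-load 'papers+
  (add-to-list 'paper-sources+ '("DOI" . papers--doi-add+) t))
```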

init.el

In my init.el, I have the following config for papers+:

(use-package papers+
  :config
  (defun papers-grep+ ()
    (interactive)
    (consult-grep+ papers-dir+))
  (bind-keys :prefix-map papers+-map
             :prefix "C-M-p"
             :prefix-docstring "Key bindings for papers+."
             ("a" . papers-add+)
             ("g" . papers-grep+)))

consult-grep+ is a wrapper function that uses consult-ripgrep if rg is available and falls back to consult-grep otherwise. The bind-keys macro comes from the bind-key package, which ships with use-package.
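consult-grep+ itself is not shown here. Going only by the description above, a minimal version might look like the following sketch. It is not the author's actual code. It assumes that consult is installed and that its commands are autoloaded. Both consult-ripgrep and consult-grep take an optional directory as their first argument.

```elisp
;; Tell the byte compiler these come from consult (assumed installed).
(declare-function consult-ripgrep "consult" (&optional dir initial))
(declare-function consult-grep "consult" (&optional dir initial))

(defun consult-grep+ (&optional dir)
  "Search DIR with `consult-ripgrep' if rg is installed, else `consult-grep'.
A sketch of the wrapper described above, not the author's version."
  (interactive)
  (if (executable-find "rg")
      (consult-ripgrep dir)
    (consult-grep dir)))
```

With a definition like this, papers-grep+ from the init.el block works unchanged.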

I also add the following within my (use-package org ...) block:

(setopt org-capture-templates
        `((,papers-org-capture-key+
           "Papers"
           entry
           (file+headline
            ,papers-org-capture-file+
            ,papers-org-capture-header+)
           "* %c")))
(setopt org-refile-targets
        `((,papers-org-capture-file+ . (:maxlevel . 4))))

Just to keep things simple, I load Org after papers+ in my init.el. This ensures that all the papers+ customization options are already defined when the (use-package org ...) block uses them.