Automating DOI extraction with Python
A few months ago, I was working on the DOI Scraper. This Python script reads a .bib file, finds the articles that lack a DOI (Digital Object Identifier), and fetches the missing DOIs from the Crossref API. It then updates the .bib file with the new data.
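To make the lookup step concrete, here is a minimal sketch of how a DOI could be requested from the Crossref REST API with requests. It is not the code from doi_scraper.py: the function names (build_query_params, extract_doi, fetch_doi) are hypothetical, and searching by title plus first author through the query.bibliographic parameter is an assumption about how the match is made.

```python
CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_query_params(title, first_author=None):
    """Build the query-string parameters for a bibliographic search."""
    query = title if first_author is None else f"{title} {first_author}"
    # rows=1 asks Crossref for the single best-scoring match only.
    return {"query.bibliographic": query, "rows": 1}


def extract_doi(payload):
    """Return the DOI of the first hit in a Crossref response, or None."""
    items = payload.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None


def fetch_doi(title, first_author=None, timeout=10):
    """Query Crossref and return the best-matching DOI, or None."""
    import requests  # imported here so the helpers above also work offline

    response = requests.get(
        CROSSREF_WORKS_URL,
        params=build_query_params(title, first_author),
        timeout=timeout,
    )
    response.raise_for_status()
    return extract_doi(response.json())
```

The top hit is not guaranteed to be the right article, so it is worth comparing the title Crossref returns with the one in the .bib entry before accepting the DOI.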
Why Did I Create This?
As a researcher, reference management is a critical yet often time-consuming task. One thing I find particularly useful when writing my articles and notes in LaTeX is including the DOIs of the cited articles in the manuscript for easy access. However, when I download the .bib file from Google Scholar, the DOIs are often missing, and searching for them manually is a real headache. To save time, I decided to automate the process and share this tool with you all.
Prerequisites
- Python
requests
library
How to Get Started
- Clone the repository or snag the doi_scraper.py file.
- Install the required dependencies with:
pip install requests
How to Use
Place your input .bib file in the script’s directory, tweak a couple of variables in doi_scraper.py to fit your needs
input_file = 'input.bib' # Name of the input .bib file
output_file = 'output.bib' # Name of the output .bib file
INDENT_PRE = 4 # Number of spaces before the field name
INDENT_POST = 16 # Number of spaces after the field name
and run the script:
python doi_scraper.py
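As an illustration of what the two indentation settings might control, the sketch below formats a single BibTeX field. It assumes INDENT_PRE is the number of leading spaces and INDENT_POST is the width the field name is padded to before the equals sign. That reading of INDENT_POST is a guess, and format_field is a hypothetical helper rather than a function from doi_scraper.py.

```python
INDENT_PRE = 4    # spaces before the field name
INDENT_POST = 16  # assumed here: width the field name is padded to


def format_field(name, value, indent_pre=INDENT_PRE, indent_post=INDENT_POST):
    """Return one aligned BibTeX field line, e.g. '    doi   = {...}'."""
    # Left-justify the name so that every '=' lands in the same column.
    return f"{' ' * indent_pre}{name:<{indent_post}}= {{{value}}}"


print(format_field("doi", "10.1017/jfm.2020.651"))
```

Padding every field name to the same width keeps the equals signs aligned, which makes long .bib files easier to scan.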
Example
Before
@article{Cuadra2020,
title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
year = 2020,
journal = {Journal of Fluid Mechanics},
publisher = {Cambridge University Press},
volume = 903,
pages= {A30 1--39}
}
After
@article{Cuadra2020,
title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
year = 2020,
journal = {Journal of Fluid Mechanics},
publisher = {Cambridge University Press},
volume = 903,
pages = {A30 1--39},
doi = {10.1017/jfm.2020.651}
}
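The change between the two entries above can be sketched in a few lines: find the entries that have no doi field, then append one before the closing brace. This is a simplified sketch, not the script's actual parser. The helper names are hypothetical, and it assumes each entry starts with @ at the beginning of a line and ends with a closing brace.

```python
import re


def split_entries(bib_text):
    """Split BibTeX text into entries, each starting with '@' on a new line."""
    return [chunk for chunk in re.split(r"(?m)^(?=@)", bib_text) if chunk.strip()]


def has_doi(entry):
    """Return True if the entry already contains a doi field."""
    return re.search(r"(?im)^\s*doi\s*=", entry) is not None


def add_doi(entry, doi, indent=4):
    """Append a doi field just before the closing brace of an entry."""
    body = entry.rstrip()
    if not body.endswith("}"):
        raise ValueError("entry does not end with a closing brace")
    body = body[:-1].rstrip()  # drop the closing brace of the entry
    if not body.endswith(","):
        body += ","            # the previous last field now needs a comma
    return f"{body}\n{' ' * indent}doi = {{{doi}}}\n}}\n"
```

Note the comma handling: the last field of the original entry has no trailing comma, so one has to be added before the new doi line, just as in the After example.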
License
This project is licensed under the MIT License.