GNRD release (v0.9.0) switched to gnfinder from TaxonFinder/NetiNeti and became 25 times faster.

GNRD release (v0.9.0) switched to gnfinder from TaxonFinder/NetiNeti and became 25 times faster. ∞

GNRD and release

10 Dec 2019

Big change came to GNRD, a program David Shorthouse and Dmitry Mozzherin released back in 2012. GNRD is a web application that is able to find scientific names in UTF-8 encoded plain texts, in PDFs, MS Word and MS Excel documents, and even images.

For a long time it used two name-finding libraries TaxonFinder (developed by Patrick Leary) and NetiNeti (developed by Lakshmi Manohar Akella). Both projects served us well all these years using complementary heuristic and natural language processing algorithms. Biodiversity Heritage Library, BioStor and many others used GNRD for detection of scientific names for many eyars with success. However the speed for large-scale name-finding was not satisfactory. To make large-scale name-detection possible we developed gnfinder that also uses both heuristic and NLP algorithms. With this new release of GNRD we substitute TasonFinder and Netineti engines with gnfinder.

We tried hard to keep API as close as possible to how it was before, however there are a few changes, especially at the name-verification (reconciliation and resolution) part. This change made both name-finding and name-verification much faster with increased quality. For example, it used to take 15 seconds to find names in a 1000-page biological book. Now it takes only 0.5 seconds. GNRD tries to get names with OCR errors as well, as a result you might get false positives. We do recommend to use name-verification option to weed out such false results.

If you need to cite GNRD in a paper, v0.9.0 has a DOI attached: 10.5281/zenodo.3569619