Scientific name parsing determines the canonical form and the authorship of a name, and extracts other meta-information. Canonical forms are crucial for comparing names from different data sources.
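
As a toy illustration of why canonical forms help with comparison, the sketch below reduces two spellings of the same name to a shared form. It is not GNparser's algorithm: `naiveCanonical` and `isEpithet` are hypothetical helpers that keep the genus plus the lowercase epithets that follow it, and stop at the first token that looks like authorship. Hybrids, ranks, subgenera, and lowercase author particles would all defeat it, which is why a real parser is needed.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// naiveCanonical is a hypothetical helper, not part of GNparser.
// It keeps the first word (assumed to be the genus) and every following
// word made only of lowercase letters or hyphens (assumed to be epithets),
// stopping at the first word that does not fit that pattern.
func naiveCanonical(name string) string {
	words := strings.Fields(name)
	if len(words) == 0 {
		return ""
	}
	canonical := []string{words[0]}
	for _, w := range words[1:] {
		if !isEpithet(w) {
			break
		}
		canonical = append(canonical, w)
	}
	return strings.Join(canonical, " ")
}

// isEpithet reports whether w consists only of lowercase letters and hyphens.
func isEpithet(w string) bool {
	for _, r := range w {
		if !unicode.IsLower(r) && r != '-' {
			return false
		}
	}
	return true
}

func main() {
	a := naiveCanonical("Homo sapiens Linnaeus, 1758")
	b := naiveCanonical("Homo sapiens L.")
	// Both spellings reduce to the same string, so they compare as equal.
	fmt.Println(a, "|", b, "|", a == b)
}
```

With the authorship stripped, the two strings can be matched directly even though the original data sources cite the author differently.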
We are releasing GNparser v1.1.0, written in Go. We follow Semantic Versioning, so this is a stable version: the output format, functions, and settings will remain backward compatible for many years (until v2).
This is the third implementation of name parsing for the Global Names Architecture project. The first one, the biodiversity gem written in Ruby, now uses the Go code of GNparser. The second one, written in Scala, is archived and awaits a new maintainer.
GNparser is sophisticated software that is able to parse the most complex scientific names. It is also very fast: it can parse more than 200 million names in an hour. The parser is a core component of many other Global Names Architecture projects.
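
To put the hourly figure in more familiar units, the arithmetic can be sketched as below. The helper name `namesPerSecond` is made up for this illustration, and the result is only a conversion of the number quoted above, not a new benchmark.

```go
package main

import "fmt"

// namesPerSecond converts an hourly throughput into names per second.
func namesPerSecond(namesPerHour float64) float64 {
	const secondsPerHour = 3600
	return namesPerHour / secondsPerHour
}

func main() {
	// 200 million names per hour is the lower bound quoted in the text.
	fmt.Printf("%.0f names/second\n", namesPerSecond(200_000_000))
}
```

So the quoted rate corresponds to more than 55,000 names per second.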
It can be used via:
Improvements since the last Scala-based release of GNparser
- Speed — about 2 times faster than the Scala-based version for CSV output, and about 8 times faster for JSON output.
- Issue #27 — support for `agamosp.`, `agamossp.`, `agamovar.` ranks.
- Issue #28 — support for non-ASCII apostrophes.
- Issue #36 — support `_` as a space for files in Newick format.
- Issue #40 — support names where one of the parentheses is missing.
- Issue #43 — support for
- Issue #45 — support for
- Issue #46 — support for
- Issue #48 — improve transliteration of diacritical characters.
- Issue #49 — support for outdated names with several hyphens in the specific epithet.
- Issue #51 — distinguish between `Aus (Bus) cus` in botany and zoology (author or subgenus).
- Issue #52 — support hyphen in outdated genus names.
- Issue #57 — warn when `f.` might mean either
- Issue #58 — distinguish between `Aus (Bus)` in ICN and ICZN (author or subgenus).
- Issue #63 — normalize
- Issue #60 — allow outdated ranks in the form of Greek letters.
- Issue #61 — support authors’ names with
- Issue #66 — remove HTML tags from names, unless asked otherwise.
- Issue #67 — add name’s authorship to the “root” of JSON structure.
- Issue #68 — provide stemmed canonical form.
- Issue #69 — provide shared C library to bind GNparser to other languages.
- Issue #72 — parse surrogate names from BOLD project.
- Issue #75 — normalize subspecies to subsp.
- Issue #74 — support CSV output.
- Issue #78 — parse virus-like non-virus names correctly.
- Issue #79 — make CSV the default output.
- Issue #80 — add cardinality to output.
- Issue #81 — support year ranges like ‘1778/79’.
- Issue #82 — parse authors with prefix
- Issue #89 — allow `subspec.` as a rank.
- Issue #90 — allow
- Issue #93 — parse `y` from Spanish papers as an author separator.
- Issue #127 — release a stable 1.0.0 version.
- Issue #162 — support bacterial
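
Several items in the list above are input normalizations: non-ASCII apostrophes (#28), `_` as a space in Newick files (#36), and HTML tag removal (#66). The sketch below shows roughly what such a clean-up step could look like using only the Go standard library. It is an assumption-laden illustration, not GNparser's code: `preprocess`, `htmlTag`, and `cleaner` are hypothetical names, the regular expression only approximates HTML, and `Aus o’bus` is a placeholder name in the style of `Aus (Bus) cus`. Unlike GNparser, the sketch applies every rule unconditionally rather than depending on settings or file format.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// htmlTag roughly matches simple opening and closing tags such as <i> or </i>.
// It is an approximation for illustration, not a full HTML parser.
var htmlTag = regexp.MustCompile(`</?[a-zA-Z][^>]*>`)

// cleaner maps typographic apostrophes to ASCII and underscores to spaces.
var cleaner = strings.NewReplacer(
	"\u2019", "'", // right single quotation mark
	"\u2018", "'", // left single quotation mark
	"_", " ", // Newick files often use "_" in place of a space
)

// preprocess is a hypothetical helper, not part of GNparser.
// It strips tags, normalizes characters, and collapses repeated whitespace.
func preprocess(name string) string {
	name = htmlTag.ReplaceAllString(name, "")
	name = cleaner.Replace(name)
	return strings.Join(strings.Fields(name), " ")
}

func main() {
	fmt.Println(preprocess("<i>Homo_sapiens</i>"))
	fmt.Println(preprocess("Aus o\u2019bus"))
}
```

Running the clean-up before parsing keeps the parsing rules themselves simpler, since they only ever see one kind of apostrophe and one kind of word separator.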