To avoid confusion – gnparser is a new project, different from the formerly released biodiversity parser.
We are happy to announce the second public release of Scala-based
Global Names Parser or gnparser
. There are many significant
changes in the v. 0.3.0 release.
-
Speed improvements. Parser is about 50% faster than already quite fast 0.2.0 version. We were able to parse 30 million names per CPU per hour with this release.
-
Compatibility with Scala 2.10.6: it was important for us to make the parser backward compatible with this older version of Scala, because we wanted to support Spark project.
-
Compatibility with Spark v. 1.6.1. Now the parser can be used in BigData projects running on Spark and massively parallelize parsing process using Spark platform. We added documenation describing how to use the parser with either Scala or Python natively on Spark.
-
Simplified parsing output in addition to “Standard output”: It analyzes the name-strings and returns its id, canonical form, canonical form with infraspecific ranks, authorship and a year.
-
Improved and stabilized JSON fields. You can find complete description of the parser JSON output in its JSON schema. We based names of fields on TDWG’s Taxon Concept Schema, and we indend to keep JSON format stable from now on.
-
There were many structural changes, bug-fixes and quality improvements. They are described in the release documenation.