Yesterday I released a new command line tool for name resolution called gn_crossmap. It is designed for people who work with checklists of scientific names using a spreadsheet software (MS Excel, Apple Numbers, Open Office, Libre Office, Google Sheets etc.) and want to compare names that they have with another reference source. The program takes a spreadsheet saved as csv file as input and generates another csv-based spreadsheet with resolution data. Examples of input and output are included into the code. README file describes how to use the project from a command line or as a Ruby library.

This program requires internet connection, Ruby >= 2.1 installed on the machine.

Basic usage is:

$ gem install gn_crossmap
$ crossmap -i input.csv -o ouput.csv -d 1

where


short long attr Description
-i –input checklist’s spreadsheet saved as csv file
-o –output path to the output file. Default is output.csv in the current directory
-d –data-source-id ID of one of the GN Resolver data sources, Catalogue of Life id (1) is default

Web interface to this program is also in works

This project started at the Catalogue of Life workshop in Leiden, which happened in March 2015. The main focus of the hackathon was to figure out how to help national checklist teams to create, maintain and compare data in their data. We determined 3 main approaches

  1. Crossmapping checklists against other checklists and/or reference sources
  2. Annotation of crossmapped data – ability to share metadata, report mistakes
  3. Distribution of species – how to fix occurance errors for a country

A hackathon group which worked on crossmapping produced a code which would compare checklists against Catalogue of Life. The gn_crossmap program I am releasing is based heavily on what we learned during the hackathon. Crossmaping code is mostly based on use cases from Rui Figueira and Wouter Koch. During the hackathon we also determined ways to improve quality of name resolution further by:

  • Using infraspecies’ rank (var., f. subsp. etc) in the matching and penalize score if ranks are different
  • Taking in account if matching authors are basyonym or combination authors
  • Using meta-information attached to names via sensu…, not … etc. to distinguish name usages