HEP: reference curation


Curating HEP records includes curating their reference lists. If a reference in an INSPIRE record points to another INSPIRE record, the two records are connected and the citation is included in the cited record’s citation count.

The first step in curating a reference list is extracting the references from the paper into INSPIRE. The Refextract tutorial explains how to do this.

If the paper has been published, take the references from the publisher’s website. Running Refextract on the record takes the references from the uploaded version of the paper, which is usually not the published version. Instead, copy the references from the publisher’s website and paste them into the Refextract text box.


The publisher’s reference format often includes extra links to CrossRef and loses the reference numbers. Paste the references into a text file first and clean them up: put a reference number followed by a period (for example, 1.) at the start of each reference and delete any unnecessary information. Then paste the cleaned references into Refextract.
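Part of this clean-up can be scripted. The Python sketch below assumes that the pasted text has one reference per line and that the leftover link labels are CrossRef, Google Scholar, ADS and INSPIRE; both are assumptions about what a given publisher page produces, so adjust them to the text you actually copied. `clean_references` is a hypothetical helper, not part of Refextract, and its output should still be checked by eye before it goes into the Refextract text box.

```python
import re

# Assumed link labels left behind when copying from a publisher page.
# Adjust this list to whatever the copied text actually contains.
DEBRIS = re.compile(r"\s*\b(?:CrossRef|Google Scholar|ADS|INSPIRE)\b\s*")
# Existing numbering such as "[12] ", "12. " or "12) " at the start of a line.
LEADING_NUMBER = re.compile(r"^\s*\[?\d+[\].)]?\s+")


def clean_references(raw: str) -> str:
    """Renumber pasted references as '1.', '2.', ... after removing link
    labels and any existing numbering. Assumes one reference per line."""
    cleaned = []
    for line in raw.splitlines():
        line = DEBRIS.sub(" ", line)
        line = LEADING_NUMBER.sub("", line)
        line = " ".join(line.split())  # collapse runs of whitespace
        if line:  # lines that held only link labels are dropped
            cleaned.append(line)
    return "\n".join(f"{i}. {ref}" for i, ref in enumerate(cleaned, start=1))


pasted = """[1] A. Author, Phys.Rev.D 92, 023001 (2015) CrossRef Google Scholar
CrossRef
[2] B. Author, JHEP 07 (2015) 161 ADS"""
print(clean_references(pasted))
# 1. A. Author, Phys.Rev.D 92, 023001 (2015)
# 2. B. Author, JHEP 07 (2015) 161
```

A reference that wraps onto several lines in the copied text would be numbered once per line, so join wrapped lines before running the script.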


Once the references are in the record’s MARC metadata, you’ll need to clean them. References always go in the 999C5 field, which has the following possible subfields:

0 recid
a doi/hdl
h author
i ISBN
m miscellaneous information
o number of reference in text
r report number or arXiv number
s journal reference
t title
u url
y year
9 CURATOR
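To show how these subfields fit together in one reference, the sketch below builds a single 999C5 field as MARCXML. It assumes the usual MARCXML encoding of 999C5 as tag 999 with indicators C and 5; `build_reference_field` and `REFERENCE_SUBFIELDS` are hypothetical names used only to illustrate the structure of the field.

```python
import xml.etree.ElementTree as ET

# Subfield codes allowed in a 999C5 reference field (from the list above).
REFERENCE_SUBFIELDS = {
    "0": "recid",
    "a": "doi/hdl",
    "h": "author",
    "i": "ISBN",
    "m": "miscellaneous information",
    "o": "number of reference in text",
    "r": "report number or arXiv number",
    "s": "journal reference",
    "t": "title",
    "u": "url",
    "y": "year",
    "9": "CURATOR",
}


def build_reference_field(subfields: list[tuple[str, str]]) -> ET.Element:
    """Build one MARCXML datafield for tag 999, indicators C and 5,
    from (code, value) pairs, rejecting codes not listed above."""
    field = ET.Element("datafield", {"tag": "999", "ind1": "C", "ind2": "5"})
    for code, value in subfields:
        if code not in REFERENCE_SUBFIELDS:
            raise ValueError(f"unknown 999C5 subfield: {code!r}")
        ET.SubElement(field, "subfield", {"code": code}).text = value
    return field


field = build_reference_field([("o", "1"), ("s", "Phys.Rev.,D92,023001"), ("y", "2015")])
print(ET.tostring(field, encoding="unicode"))
```

The subfield order in the output follows the order of the pairs passed in.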

Subfield 9 with the value CURATOR is added automatically to references that a curator has edited. It prevents Refextract from overwriting these references.

The most important subfields are 0, a, r, and s, because these are the subfields that connect a reference with the record it cites.

The 0 subfield contains the recid of the cited INSPIRE record.

The a subfield contains resolvable identifiers, such as DOIs and handles (hdl). Prefix each identifier with its type, for example doi:10.1103/PhysRevLett.105.026802 or hdl:11343/55205.
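Identifiers are often copied as resolver URLs or as bare DOIs. The minimal Python sketch below converts them into the prefixed form used in the a subfield. `format_identifier` is a hypothetical helper; it only recognizes the doi.org, dx.doi.org and hdl.handle.net resolvers and bare DOIs (which always start with 10.). A bare handle such as 11343/55205 is returned unchanged, because it cannot be reliably told apart from other text.

```python
DOI_RESOLVERS = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/")
HDL_RESOLVERS = ("https://hdl.handle.net/", "http://hdl.handle.net/")


def format_identifier(identifier: str) -> str:
    """Return a DOI or handle as 'doi:...' or 'hdl:...' for subfield a."""
    value = identifier.strip()
    lower = value.lower()
    for resolver in DOI_RESOLVERS:
        if lower.startswith(resolver):
            return "doi:" + value[len(resolver):]
    for resolver in HDL_RESOLVERS:
        if lower.startswith(resolver):
            return "hdl:" + value[len(resolver):]
    if lower.startswith(("doi:", "hdl:")):
        return lower[:4] + value[4:]  # normalize the prefix to lower case
    if value.startswith("10."):
        return "doi:" + value  # every DOI starts with "10."
    return value  # unknown form: leave it for manual checking


print(format_identifier("https://doi.org/10.1103/PhysRevLett.105.026802"))
# doi:10.1103/PhysRevLett.105.026802
print(format_identifier("http://hdl.handle.net/11343/55205"))
# hdl:11343/55205
```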

The r subfield contains the report number or arXiv number. Report number formats vary, but they usually consist of a form of the host institution’s name followed by numbers that typically encode the year of publication and uniquely identify the paper (for example, FERMILAB-CONF-15-231-AD-ND). A current arXiv number consists of the prefix arXiv: followed by four digits, a period, and another four or five digits. Older arXiv numbers instead start with the field code, followed by a slash and seven digits. Examples include arXiv:1508.02507, arXiv:1305.7513 and hep-ex/9701019.
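The arXiv formats above are regular enough to check mechanically. The sketch below uses a hypothetical helper, `classify_report_number`, to tell current arXiv numbers, older arXiv numbers and other report numbers apart. The patterns are assumptions based on the formats described above: version suffixes such as v2, and current arXiv numbers missing the arXiv: prefix, are not recognized and fall through to "report", which is a useful hint that the value needs a closer look.

```python
import re

# Current arXiv numbers: "arXiv:", four digits, a period, four or five digits.
NEW_ARXIV = re.compile(r"arXiv:\d{4}\.\d{4,5}")
# Older arXiv numbers: field code (optionally with a subject class such as
# "math.GT"), a slash, and seven digits, e.g. "hep-ex/9701019".
OLD_ARXIV = re.compile(r"[a-z-]+(?:\.[A-Za-z-]+)?/\d{7}")


def classify_report_number(value: str) -> str:
    """Return 'arxiv-new', 'arxiv-old' or 'report' for an r subfield value."""
    value = value.strip()
    if NEW_ARXIV.fullmatch(value):
        return "arxiv-new"
    if OLD_ARXIV.fullmatch(value):
        return "arxiv-old"
    return "report"


for value in ("arXiv:1508.02507", "hep-ex/9701019", "FERMILAB-CONF-15-231-AD-ND"):
    print(value, classify_report_number(value))
```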

The s subfield contains the journal reference: the journal’s short name as listed in the INSPIRE Journals database, the volume, and the page range or article ID. The parts are separated by commas with no spaces, unless the journal’s short name itself contains spaces. If a journal is divided into lettered sections, such as Nuclear Instruments and Methods in Physics Research Section A and Section B, place the section letter directly before the volume number, after the comma that follows the journal name. Some journals have specially formed pubnotes. For example, JHEP repeats its volume numbers every year, so its volume is written as the two-digit year followed by the two-digit volume. To check how the volume and page numbers should be formatted for a particular journal, search HEP for papers from that journal using fin j followed by the journal’s short name.

Some examples of s subfield formatting are listed below:

Phys.Rev.,D92,023001

JHEP,1507,161

Nucl.Instrum.Meth.,A789,28-42

Astropart.Phys.,66,39-52

PoS,LATTICE2014,392

JINST,10,C08002

Adv.High Energy Phys.,2015,921757
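The formatting rules above can be summarized in a short Python sketch. `format_pubnote` and `jhep_volume` are hypothetical helpers written for illustration: they join the parts with commas, put an optional section letter directly before the volume, turn an en dash in the page range into a hyphen, and reject spaces outside the journal name. They do not look up the short name in the INSPIRE Journals database, so the journal name passed in must already be correct.

```python
def jhep_volume(year: int, volume: int) -> str:
    """Two-digit year followed by two-digit volume: (2015, 7) -> '1507'."""
    return f"{year % 100:02d}{volume:02d}"


def format_pubnote(journal: str, volume: str, page: str, section: str = "") -> str:
    """Join journal short name, section letter + volume, and page range or
    article ID with commas. Spaces are allowed only in the journal name."""
    volume_part = f"{section.strip()}{volume.strip()}"
    page_part = page.strip().replace("\u2013", "-")  # en dash -> hyphen
    for part in (volume_part, page_part):
        if not part or " " in part:
            raise ValueError(f"empty part or space outside the journal name: {part!r}")
    return f"{journal.strip()},{volume_part},{page_part}"


print(format_pubnote("Phys.Rev.", "92", "023001", section="D"))  # Phys.Rev.,D92,023001
print(format_pubnote("JHEP", jhep_volume(2015, 7), "161"))  # JHEP,1507,161
print(format_pubnote("Nucl.Instrum.Meth.", "789", "28\u201342", section="A"))  # Nucl.Instrum.Meth.,A789,28-42
print(format_pubnote("Adv.High Energy Phys.", "2015", "921757"))  # Adv.High Energy Phys.,2015,921757
```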

A few journals require issue numbers to uniquely identify papers. For these journals, include the issue number in the s subfield between the volume and the page number:

Electron.J.Theor.Phys.,12,31,1

References with issue numbers are currently not recognized as citations in INSPIRE, but including the issue number will make the eventual transition to a new data model easier.
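When checking existing references, it can help to split an s subfield back into its parts. This sketch uses a hypothetical `parse_pubnote` function that accepts either three parts (journal, volume, page) or four (journal, volume, issue, page). It assumes that journal short names never contain commas, and it flags empty parts and spaces outside the journal name.

```python
from typing import NamedTuple, Optional


class Pubnote(NamedTuple):
    journal: str
    volume: str
    issue: Optional[str]
    page: str


def parse_pubnote(s: str) -> Pubnote:
    """Split an s subfield into journal, volume, optional issue and page."""
    parts = s.split(",")
    if len(parts) == 3:
        journal, volume, page = parts
        issue = None
    elif len(parts) == 4:
        journal, volume, issue, page = parts
    else:
        raise ValueError(f"expected 3 or 4 comma-separated parts: {s!r}")
    for part in (volume, issue, page):
        if part is not None and (not part or " " in part):
            raise ValueError(f"empty part or space outside the journal name: {s!r}")
    return Pubnote(journal, volume, issue, page)


print(parse_pubnote("Electron.J.Theor.Phys.,12,31,1"))
# Pubnote(journal='Electron.J.Theor.Phys.', volume='12', issue='31', page='1')
print(parse_pubnote("JHEP,1507,161"))
# Pubnote(journal='JHEP', volume='1507', issue=None, page='161')
```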

 

Below you’ll see references in the 999C5 fields of a record on which Refextract has run.
[Screenshot: 999C5 fields of a record after Refextract has run]

Notice that most of the references are clean, with the recid and journal reference of the cited paper in the 0 and s subfields.

[Screenshots: clean references with the recid in subfield 0 and the journal reference in subfield s]

Open the record preview by clicking the icon with the magnifying glass. You’ll be able to see which references are connected to INSPIRE records and which aren’t.

[Screenshot: record preview]

These references are properly connected and are displayed with blue links.

[Screenshot: connected references displayed as blue links]

Here is a reference that is not connected due to incorrect formatting.

[Screenshot: reference that is not connected because of incorrect formatting]

Find the reference in the MARC record and fix the page range.

[Screenshot: corrected reference in the MARC record]

Now when you refresh the preview, the reference appears as connected.

[Screenshot: the corrected reference connected in the preview]

A reference can only be connected if the cited paper already has a record in INSPIRE. When curating references, make sure that journal references and report numbers are in the correct format even if the references don’t connect. That way, they will connect automatically if the cited paper is added to INSPIRE later.

Spend no more than fifteen minutes curating a record. Because of the amount of new content that must be curated each day, we can’t afford to make every reference list perfect.

Link to Refextract tutorial
