Our First MARC-to-RDA/LRM alignment: setting a precedent

We are beginning to align MARC to RDA/LRM, and we started with the MARC 490 field. As expected, complications arose.

RDA/LRM properties (i.e. elements) that pertain to the MARC 490 (but are not necessarily needed for alignment):

–> “D” represents rdfs:domain and “R” represents rdfs:range

–> “***” represents a property needed to align the MARC 490 with RDA/LRM; all others are present to represent the hierarchies

1. ***series statement, http://rdaregistry.info/Elements/m/P30106, D=Manifestation, R={ }

2. manifestation statement, http://rdaregistry.info/Elements/m/P30292, D=Manifestation, R={ }

***2a. manifestation series statement, http://rdaregistry.info/Elements/m/P30291, D=Manifestation, R={ }

2b. Manifestation: manifestation copyright statement

2c. Manifestation: manifestation designation of sequence statement

2d. Manifestation: manifestation dissertation statement

2e. Manifestation: manifestation distribution statement

2f. Manifestation: manifestation edition statement

2g. Manifestation: manifestation frequency statement

2h. Manifestation: manifestation identifier statement

2i. Manifestation: manifestation manufacture statement

2j. Manifestation: manifestation production statement

2k. Manifestation: manifestation publication statement

2l. Manifestation: manifestation regional encoding statement

2m. Manifestation: manifestation title and responsibility statement

3. Manifestation: related nomen of manifestation

3a. Manifestation: title of manifestation

***3a1. title of series, http://rdaregistry.info/Elements/m/P30157, D=Manifestation, R=Nomen

***3a1-a. parallel title of series, http://rdaregistry.info/Elements/m/P30204 (child of “title of series”), D=Manifestation, R=Nomen

***3b. numbering within sequence, http://rdaregistry.info/Elements/m/P30014, D=Manifestation, R=Nomen

4. Manifestation: other title information

4a. Manifestation: parallel other title information

***4a1. parallel other title information of series, http://rdaregistry.info/Elements/m/P30152, D=Manifestation, R={ }

***5. other title information of series, http://rdaregistry.info/Elements/m/P30143, D=Manifestation, R={ }

***5a. parallel other title information of series, http://rdaregistry.info/Elements/m/P30152, D=Manifestation, R={ }

6. Manifestation: statement of responsibility

6a. Manifestation: statement of responsibility relating to edition

6b. Manifestation: statement of responsibility relating to named revision of edition

***6c. statement of responsibility relating to series, http://rdaregistry.info/Elements/m/P30119, D=Manifestation, R={ }

***6c1. parallel statement of responsibility relating to series, http://rdaregistry.info/Elements/m/P30113, D=Manifestation, R={ }

6d. Manifestation: statement of responsibility relating to title proper

7. Entity: note on RDA entity

7a. Manifestation: note on manifestation

***7a1. note on series statement, http://rdaregistry.info/Elements/m/P30058, D=Manifestation, R={ }

Comments on the RDA properties

–> What’s the difference between #1 and #2a above? Neither the RDA Toolkit nor the RDA Registry clears this up adequately. Our decision: #2a requires a direct, exact transcription, whereas #1 allows the value to be structured, as is done in MARC 490 fields. For example, the series statement uses ISBD punctuation; the manifestation series statement does not. As a result, we decided that “has series statement” (P30106) provides a more exact alignment for MARC 490 than “has manifestation series statement” (P30291).

–> The following is a property in two hierarchies: 4a1 = 5a. Doesn’t it seem like “parallel other title information of series” (P30152) should be only a sub-property of “Manifestation: other title information/Manifestation: parallel other title information”?

Note on alignment procedure: as a convenience, during our first pass, our MARC-to-RDA alignment decisions need to accommodate only examples in https://www.loc.gov/marc/bibliographic/ (in other words, examples external to that source are not needed; no need to comb through OCLC or the local catalog to find examples/use-cases we need to support).

Selected examples of MARC 490 from MARC Bibliographic (https://www.loc.gov/marc/bibliographic/bd490.html):

$aBulletin / U.S. Department of Labor, Bureau of Labor Statistics

$aMPCHT art and anthropological monographs ;$vno. 35

$aDetroit area study, 1971 : social problems and social change in Detroit ;$vno. 19

$aPolicy series / CES ;$v1

$aResearch report / National Education Association Research

$aDepartment of State publication ;$v7846.$aDepartment and Foreign Service series ;$v128

$aAnnual census of manufactures =$aRecensement des manufactures,$x0315-5587

$aPapers and documents of the I.C.I. Series C, Bibliographies ;$vno. 3 =$aTravaux et documents de l’I.C.I. Série C, Bibliographies ;$vno 3

 

First attempt at 490$a:

–> 490 data was determined to be rdac:Manifestation data (it is primarily data transcribed from the manifestation with some structure added).

 

–> We wanted to use all the pertinent properties/sub-properties, just as would be done when creating native RDA; specifically we wanted to have data points for the series title, other title information, etc.

 

–> This first attempt at alignment was rejected.

 

condition 1:
if there is exactly one slash and it appears as space-slash-space:

  • align to RDA “title of series”: <<$a value before the space-slash-space; strip the space-slash-space>>
  • align to RDA “statement of responsibility relating to series”: <<$a value after the space-slash-space; strip the space-slash-space and terminal punctuation>>

condition 2:
if there exists one slash but not space-slash-space:

  • align to RDA “has series statement”: <<$a value; strip terminal punctuation>>
  • [can’t assume it is in fact a title + statement of responsibility; that would be an assumption that there is a typo]

condition 3:
if >1 slash:

  • align to RDA “has series statement”: <<$a value; strip terminal punctuation>>
  • [can’t assume which slash separates title from statement of responsibility; however a new condition could evaluate if one and only one slash has a space-slash-space and, if so, separate the “title of series” and the “statement of responsibility relating to the series”]

condition 4:
if there are no slashes:

  • align to RDA “title of series”: <<$a value; strip terminal punctuation>>

condition 5:
if there is exactly one colon and it appears as space-colon-space:

  • align to RDA “title of series”: <<$a value before the space-colon-space; strip the space-colon-space>>
  • align to RDA “other title information of series”: <<$a value after the space-colon-space; strip the space-colon-space and terminal punctuation>>

condition 6:
if there exists one colon but not space-colon-space:

  • align to RDA “has series statement”: <<$a value; strip terminal punctuation>>
  • [can’t assume the value following the colon is other title information; it may be part of the series title]

condition 7:
if >1 colon:

  • align to RDA “has series statement”: <<$a value; strip terminal punctuation>>
  • [alternatively set up a condition to evaluate if one and only one colon is bounded by spaces, in which case an alignment to “title of series” and “other title information of series” can be established]

condition 8:
if there are no colons:

  • align to RDA “title of series”: <<$a value; strip terminal punctuation>>

condition 9:
if >1 $a

  • each $a value requires a separate property/value pair
    • Note: this requirement was dropped in the accepted alignment (see below)
  • apply conditions 1-8 above to each $a
    • Note: combining conditions 1-8 will create additional conditions!
  • each value following “ / ” and “ : ” requires a separate property/value pair.
  • each $v following a $a requires a separate property/value.
  • it will be difficult to pair titles with the correct statement of responsibility, other title information and numbering sequence.
  • optional condition: create a “note on series statement” that pairs the property/value pairs

–> There would be several more conditions, especially as we attempt to account for $v and $x.
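To make the cost of this approach concrete, here is a minimal sketch of conditions 1-8 for a single $a value, handling one delimiter family (slash or colon) at a time. The function and constant names are hypothetical, and "terminal punctuation" is assumed to mean trailing spaces, commas, semicolons and equals signs; the post does not define it. It deliberately does not combine the slash and colon conditions, which is exactly where the additional conditions mentioned above would start to multiply.

```python
# Hypothetical labels for the RDA properties named in conditions 1-8.
TITLE = "title of series"
RESP = "statement of responsibility relating to series"
OTHER = "other title information of series"
STATEMENT = "series statement"


def strip_terminal(value):
    # Assumption: terminal punctuation = trailing spaces, commas,
    # semicolons and equals signs (not defined in the post).
    return value.rstrip(" ,;=")


def split_once(value, mark, second_property):
    """Apply the none / exactly-one / other logic for one delimiter.

    mark is "/" (conditions 1-4) or ":" (conditions 5-8);
    second_property is the RDA property for the text after the delimiter.
    """
    spaced = f" {mark} "
    total = value.count(mark)
    if total == 0:
        # Conditions 4 and 8: no delimiter, the whole value is the title.
        return [(TITLE, strip_terminal(value))]
    if total == 1 and value.count(spaced) == 1:
        # Conditions 1 and 5: exactly one delimiter, bounded by spaces.
        before, after = value.split(spaced)
        return [(TITLE, before), (second_property, strip_terminal(after))]
    # Conditions 2, 3, 6 and 7: unspaced or repeated delimiter,
    # so fall back to the general property.
    return [(STATEMENT, strip_terminal(value))]


print(split_once("Policy series / CES ;", "/", RESP))
print(split_once("Detroit area study, 1971 : social problems and social change in Detroit ;", ":", OTHER))
```

Even this small sketch ignores $v, $x, repeated $a and the interaction between slashes and colons in the same value.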

Comments:

Writing all these conditions will be time-consuming (i.e. expensive). It also sets a precedent for the treatment of other fields in this alignment. We will not be able to sustain writing code for all the possible conditions, so we will have to make some sort of compromise. Surely there is a design pattern for this: when it proves too costly to tokenize select values for a single property in the source ontology to align with multiple properties in the target ontology, align the source ontology property with a general property in the target ontology, if available, that allows the combination of values.

In our case, we are aligning MARC with RDA/LRM; specifically, we are searching for appropriate general properties in RDA/LRM for full MARC 490 subfield values. Fortunately there are two general properties that apply:

  • series statement, http://rdaregistry.info/Elements/m/P30106, D=Manifestation, R={ }
  • manifestation series statement, http://rdaregistry.info/Elements/m/P30291, D=Manifestation, R={ }

As stated above, we decided that “manifestation series statement” requires a direct, exact transcription, whereas “series statement” allows the value to be structured, as is done in MARC 490 fields. For example, the series statement allows ISBD punctuation; the manifestation series statement does not. We decided that “has series statement” (P30106) provides a more exact alignment for MARC 490 than “has manifestation series statement” (P30291), adding a “note on series statement” (P30058) when needed.

Preferred alignment of the 490

There will be two possible alignments; choose one based on the presence or absence of ISBD punctuation:

  • choose the “a” version when ISBD punctuation is used
  • choose the “b” version when ISBD punctuation is not used
  • how to determine ISBD punctuation:
    • if LDR/18 = a or i, use alignment (a)
    • if LDR/18 = #, c, n or u, use alignment (b)

(1) when the field contains a single $a and no other subfields

  • (a) align full value with RDA “series statement”; strip the $a subfield code
  • (b) [“b” version of alignment not needed]

 

(2) when there is 1 $a with any combination of $a $v $x

  • (a) align full value with RDA “series statement”; retain punctuation; strip the subfield codes, replacing each with a space (except the leading $a, which is simply removed)
  • (b) align full value with RDA “series statement”; retain punctuation and subfield codes

 

(3) when there is >1 $a where the series are either parallel titles or have a main series/subseries relationship

  • (a) align full value with RDA “series statement”; retain punctuation; strip the subfield codes, replacing each with a space (except the leading $a, which is simply removed)
  • (b) align full value with RDA “series statement”; retain punctuation and subfield codes
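The preferred alignment is small enough to sketch in a few lines. The following is an illustration under stated assumptions, not our production code: the function names are hypothetical, a 490 field is represented as a plain list of (code, value) pairs in field order rather than through any particular MARC library, and subfield values are assumed to be joined with a single space in both the (a) and (b) versions.

```python
# LDR/18 values that signal ISBD punctuation, per the rule above.
ISBD_CODES = {"a", "i"}

# Subfields aligned in the first pass; everything else is skipped.
ALIGNED = ("a", "v", "x")


def uses_isbd(ldr18):
    return ldr18 in ISBD_CODES


def series_statement(ldr18, subfields):
    """Build the value for "series statement" from one 490 field.

    subfields is a list of (code, value) pairs, e.g.
    [("a", "Policy series / CES ;"), ("v", "1")]  (assumed input shape).
    """
    kept = [(code, value.strip()) for code, value in subfields if code in ALIGNED]
    only_one_a = [code for code, _ in kept] == ["a"]
    if only_one_a or uses_isbd(ldr18):
        # Case (1) and the (a) versions: drop the subfield codes and
        # separate the values with a space.
        return " ".join(value for _, value in kept)
    # The (b) versions: keep the subfield codes in the value.
    return " ".join(f"${code}{value}" for code, value in kept)


print(series_statement("a", [("a", "Policy series / CES ;"), ("v", "1")]))
# Policy series / CES ; 1
print(series_statement("u", [("a", "Department of State publication"), ("v", "7846")]))
# $aDepartment of State publication $v7846
```

The whole alignment reduces to one branch on LDR/18, which is the point of choosing the general property.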

 

Unaligned in first pass: $l, $y, $z, $3, $6, $8

 

Taking the 8 LOC MARC 490 examples cited above, we get:

example 1

[LDR/18 = any value]

$aBulletin / U.S. Department of Labor, Bureau of Labor Statistics

…becomes the following triple in RDA/LRM/RDF:

<>

rdam:P30106

“Bulletin / U.S. Department of Labor, Bureau of Labor Statistics” .

 

example 2

[LDR/18 = a or i ]

$aMPCHT art and anthropological monographs ;$vno. 35

…becomes the following triple in RDA/LRM/RDF:

<>

rdam:P30106

“MPCHT art and anthropological monographs ; no. 35” .

example 3

[LDR/18 = a or i]

$aDetroit area study, 1971 : social problems and social change in Detroit ;$vno. 19

…becomes the following triple in RDA/LRM/RDF:

<>

rdam:P30106

“Detroit area study, 1971 : social problems and social change in Detroit ; no. 19” .

example 4

[LDR/18 = a or i]

$aPolicy series / CES ;$v1

…becomes the following triple in RDA/LRM/RDF:

<>

rdam:P30106

“Policy series / CES ; 1” .

example 5

[LDR/18 = a or i]

$aResearch report / National Education Association Research

…becomes the following triple in RDA/LRM/RDF:

<>

rdam:P30106

“Research report / National Education Association Research” .

example 6 (altered from the MARC standard’s example to demonstrate non-ISBD punctuation)

[LDR/18 = #, c, n or u]

$aDepartment of State publication $v7846 $aDepartment and Foreign Service series $v128

…becomes the following triple in RDA/LRM/RDF:

<>

rdam:P30106

“$aDepartment of State publication $v7846 $aDepartment and Foreign Service series $v128” .

example 7

[LDR/18 = a or i]

$aAnnual census of manufactures =$aRecensement des manufactures,$x0315-5587

…becomes the following triple in RDA/LRM/RDF:

<>

rdam:P30106

“Annual census of manufactures = Recensement des manufactures, 0315-5587” .

example 8

[LDR/18 = a or i]

$aPapers and documents of the I.C.I. Series C, Bibliographies ;$vno. 3 =$aTravaux et documents de l’I.C.I. Série C, Bibliographies ;$vno 3

…becomes the following triple in RDA/LRM/RDF:

<>

rdam:P30106

“Papers and documents of the I.C.I. Series C, Bibliographies ; no. 3 = Travaux et documents de l’I.C.I. Série C, Bibliographies ; no 3” .
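For readers who want the triples above in a machine-readable form, here is a minimal sketch that writes them as Turtle using plain string formatting. The function names are hypothetical, the rdam prefix is assumed to expand to the namespace of the property URIs listed earlier, and the empty relative IRI `<>` is kept as a placeholder subject, as in the examples.

```python
# Namespace taken from the rdaregistry.info property URIs listed earlier.
RDAM = "http://rdaregistry.info/Elements/m/"


def turtle_literal(value):
    # Escape the characters that would break a double-quoted Turtle string.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def series_triple(value, subject="<>"):
    # One "has series statement" (P30106) triple for one 490 field.
    return f"{subject} rdam:P30106 {turtle_literal(value)} ."


def turtle_document(values):
    lines = [f"@prefix rdam: <{RDAM}> .", ""]
    lines += [series_triple(value) for value in values]
    return "\n".join(lines)


print(turtle_document([
    "Policy series / CES ; 1",
    "MPCHT art and anthropological monographs ; no. 35",
]))
```

In a real transformation the subject would be the IRI of the manifestation being described rather than `<>`.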

 

More comments:

This is an early alignment in our MARC-to-RDA efforts. We are setting a precedent in how we’ll align the MARC fields going forward. If we set up a situation where we accommodate thousands of conditions, we will have difficulty finishing the alignment, as that work could use up all our time. However, the precedent we’re setting is not to avoid accommodating conditions in a complex alignment; rather, we’re setting a precedent on how to handle single MARC values that represent multiple RDA/LRM values and are difficult to parse, either due to a lack of delimiters, or an over-abundance of subfield combinations, possibilities, etc. We can make exceptions if a particular MARC field’s value is important enough to require extensive accommodation of conditions; however, in general, we are recommending against accommodating the processing of complex values. The solution, in this case, is to use generalized alignments even when more specific alignments are possible. It is good to keep in mind that even if we chose not to generalize and accommodated hundreds of conditions, the results would be at best imperfect; thus we are not just avoiding expensive solutions, we’re also avoiding poor solutions.

Last word:

The point of the MARC-to-RDA alignment is to transform our legacy MARC into RDA/LRM/RDF. This RDA/LRM/RDF would live alongside original RDA/LRM/RDF. Information about series transcribed from a manifestation, in native RDA/LRM/RDF, will most certainly distinguish the numbering within sequence, other title information of series, statement of responsibility relating to series, title of series, and note on series statement from each other. Our alignment above, however, puts most of the series/manifestation information into a single seriesStatement. Although seriesStatement is a parent property (note however that it does not constitute a parent node in any hierarchy; the reason is unclear) for the more specific properties about series, it creates an inconsistency (native RDA/LRM/RDF vs. RDA/LRM/RDF derived from MARC data) that may cause difficulties for RDA/LRM/RDF consumers. However, using all the manifestation/series properties while transforming legacy MARC data will be so complicated and labor-intensive that it doesn’t seem worth the effort. Once again it is a data complication that is pushed downstream, for “someone” to deal with.
