Stacy, I only addressed a small part of your question. I have not yet figured out how to efficiently get rid of the non-Springer URLs on these aggregator-neutral records. As for the matter of the incorrect Springer URLs, I think we should be complaining loudly to Springer. It is not to their advantage if folks can't access their books. What a mess. Margaret
-----Original Message-----
From: MarcEdit support in technical and instructional matters [mailto:[log in to unmask]] On Behalf Of Wilson, Margaret
Sent: Thursday, August 19, 2010 5:51 AM
To: [log in to unmask]
Subject: Re: [MARCEDIT-L] How to find unique records
Good grief, Stacy. My library just got the OCLC Springer records yesterday, and I started working on the August file of 403 records, which had 529 856's. I tried to get rid of the extra 856's in a decidedly cumbersome and clumsy manner: I used the Delete Field function for 856's with the starting text of 41$u, 42$u, 41$3, 42$3, etc. That left me with 25 more 856's than there were records, but I think those "extras" were on records with multiple 856 40's.
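For anyone who wants to script the same cleanup and then check where the leftover 856s are, here is a minimal sketch. It assumes the file has been converted to MarcEdit's mnemonic (.mrk) text form, where records are separated by blank lines and an 856 line looks like `=856  41$uhttp://...`; the function names are hypothetical. It removes 856s whose second indicator is 1 or 2 and then lists records that still carry more than one 856.

```python
# Minimal sketch, assuming MarcEdit mnemonic (.mrk) text: one field per line,
# records separated by blank lines, 856 lines like "=856  41$uhttp://...".

def split_records(mrk_text):
    """Split .mrk text into records (lists of lines) on blank lines."""
    records, current = [], []
    for line in mrk_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def drop_856_by_ind2(records, unwanted=("1", "2")):
    """Remove 856 lines whose second indicator is in `unwanted`."""
    cleaned = []
    for rec in records:
        # In "=856  41$u...", index 6 is the first indicator, index 7 the second.
        cleaned.append([
            line for line in rec
            if not (line.startswith("=856  ") and line[7:8] in unwanted)
        ])
    return cleaned


def records_with_extra_856(records):
    """Return (record index, 856 count) for records with more than one 856."""
    counts = ((i, sum(line.startswith("=856  ") for line in rec))
              for i, rec in enumerate(records))
    return [(i, n) for i, n in counts if n > 1]
```

Running `records_with_extra_856` on the cleaned records is one way to confirm whether the surplus comes from records with several 856 40 fields.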
I am disheartened by your report that there were so many dead links. I had only tested one so far and it worked.
Margaret Wilson
Electronic Resources Cataloger
University of Kansas Libraries
-----Original Message-----
From: MarcEdit support in technical and instructional matters on behalf of Stacy Pober
Sent: Thu 8/19/2010 12:42 AM
To: [log in to unmask]
Subject: Re: [MARCEDIT-L] How to find unique records
I just looked more closely at the OCLC records and the Springer-provided MARC records. Both are pretty terrible as access points for ebooks.
The initial load file from OCLC includes hundreds and hundreds of links to other online services in addition to the Springer links. I did a sort in the MarcEdit Extract tool for 856$u. Since the display only shows the first 856$u field, I couldn't find all the bad links, but just in the initial scan, I see there are links to cornell.edu, ohiolink, netlibrary, ebrary, mylibrary, and metapress. None of these will work for our patrons.
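Because the Extract display only shows the first 856$u, a short script can tally the host of every 856$u in the file instead. This is a sketch assuming MarcEdit mnemonic (.mrk) text; the regular expression simply takes each $u value up to the next $ delimiter, and `host_counts` is a hypothetical name.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Each $u value runs until the next "$" subfield delimiter.
SUBFIELD_U = re.compile(r"\$u([^$]+)")


def host_counts(mrk_text):
    """Count the host of every 856 $u URL, not just the first one per record."""
    counts = Counter()
    for line in mrk_text.splitlines():
        if line.startswith("=856  "):
            for url in SUBFIELD_U.findall(line):
                host = urlparse(url.strip()).netloc.lower()
                counts[host or "(no host)"] += 1
    return counts
```

Printing `host_counts(text).most_common()` gives a quick picture of how many links point at each outside service.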
There are only two different types of links that seem to go directly to the Springer books with our subscription.
Those begin with: http://dx.doi.org or http://www.springerlink.com
Can someone advise me of the best way to automate the elimination of all the inappropriate 856 fields?
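One way to automate this outside MarcEdit, sketched under the assumption that the file is in mnemonic (.mrk) form: keep an 856 only if one of its $u values begins with one of the two prefixes above, and report records that end up with no link at all so they can be checked by hand. The function names are hypothetical, and https variants of the prefixes are not handled.

```python
import re

# The two link patterns reported to reach the subscribed books (lowercase).
ALLOWED_PREFIXES = ("http://dx.doi.org", "http://www.springerlink.com")
SUBFIELD_U = re.compile(r"\$u([^$]+)")


def split_records(mrk_text):
    """Split .mrk text into records (lists of lines) on blank lines."""
    records, current = [], []
    for line in mrk_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


def is_wanted_856(line, allowed=ALLOWED_PREFIXES):
    """True if an 856 line has at least one $u starting with an allowed prefix."""
    return any(u.strip().lower().startswith(allowed)
               for u in SUBFIELD_U.findall(line))


def clean_file(mrk_text, allowed=ALLOWED_PREFIXES):
    """Drop unwanted 856 lines; return (cleaned text, indexes of records left with no 856)."""
    cleaned = [
        [line for line in rec
         if not line.startswith("=856  ") or is_wanted_856(line, allowed)]
        for rec in split_records(mrk_text)
    ]
    no_link = [i for i, rec in enumerate(cleaned)
               if not any(line.startswith("=856  ") for line in rec)]
    text = "\n\n".join("\n".join(rec) for rec in cleaned) + "\n"
    return text, no_link
```

The cleaned text can then be compiled back to MARC in MarcEdit; the `no_link` list flags records that would otherwise silently lose all access.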
But wait, there's more!
Given those problems with the OCLC records, I started processing the Springer-provided summer update files, figuring I can remove those if I get the OCLC records straightened out.
And just to be on the safe side, I did some testing of the links in the Springer July update file. I tested 60 links. 36 of them were bad. 32 were dead DOI links. 4 were books that did not give our institution access.
If the rest of the file is like the first 60 records, then more than half of the records have dead links to the books. Yikes!
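Spot-checking links can also be scripted. A minimal sketch using only the Python standard library: it sends a HEAD request, follows redirects, and treats HTTP errors (such as a 404 from the DOI resolver) or connection failures as bad. Two caveats: some servers reject HEAD requests, and a link that resolves can still lead to a book the institution cannot access, as with 4 of the 60 links tested, so a clean result here is not proof of access. The function names are hypothetical.

```python
import urllib.request
from urllib.error import HTTPError, URLError


def link_status(url, timeout=15):
    """Return the final HTTP status after redirects, or None if the request failed."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:        # server answered with an error status
        return err.code
    except (URLError, ValueError, OSError):  # bad URL, DNS failure, timeout
        return None


def check_links(urls, timeout=15):
    """Split URLs into an ok list and a bad list of (url, status) pairs."""
    ok, bad = [], []
    for url in urls:
        status = link_status(url, timeout)
        if status is not None and status < 400:
            ok.append(url)
        else:
            bad.append((url, status))
    return ok, bad
```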
In the past, we found a very small percentage of bad links in the Springer-provided records. We notified them, and after repeated reminders, they finally fixed most of those. But those were a very small percentage of the total records provided. Now, we're getting such poor-quality records that MOST of the links are bad. I am dismayed, to say the least.
Penny Swanson asked which sets we get.
In Springer terms, they call the collection we get the "English/International" collection which has all the Springer ebooks published from 2005 through 2010 except for Dutch and German language books and the Springer Protocols.
When I posted about the missing records, I was comparing the initial load of the Springer records, plus the updates that would have come out and been incorporated into the OCLC set when we received it.
However, in examining the sets and what should be included, it's very difficult to tell which Springer record files would correspond with the OCLC sets. They're released on a different schedule.
Springer:
The Springer-provided records had an initial file of 16,346 records, covering ebooks released through Nov. 2009. Additional records were provided in Dec. 2009 and in Jan., Feb., Apr., Jun., and Jul. 2010, adding another 3,721 records. This means that Springer has provided 20,067 MARC records.
OCLC:
We're getting the "Springer Complete Collection" record sets from OCLC. The initial file from OCLC was 17,173 books, with two updates since then of 274 and 403 records, which adds up to only 17,850 records.
That's a difference of 2,217 records from the Springer files.
It's possible that the August OCLC files do not include the very large July Springer update. If that's the case, then the discrepancy is much less, only 146 records. Still, I can't just clean out the Springer-provided files and load the OCLC MARC records without potentially losing access to over 2,000 books.
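For anyone re-checking the counts, here is a quick re-tally using the figures quoted in the message. The size of the July Springer update is not stated, so the second scenario is not reproduced.

```python
springer_initial = 16346  # ebooks released through Nov. 2009
springer_updates = 3721   # Dec. 2009 through July 2010, combined
oclc_initial = 17173
oclc_updates = [274, 403]

springer_total = springer_initial + springer_updates
oclc_total = oclc_initial + sum(oclc_updates)
difference = springer_total - oclc_total
print(f"Springer: {springer_total:,}  OCLC: {oclc_total:,}  difference: {difference:,}")
```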
This is looking like it's going to be quite an onerous chore!
--
Stacy Pober
Information Alchemist
Manhattan College Library
Riverdale, NY 10471
[log in to unmask]
On Tue, Aug 17, 2010 at 12:22 PM, Penny Swanson <[log in to unmask]> wrote:
> Hi;
>
> The problem I've found when trying to match Springer records with anything
> else is the poor cataloguing means that the titles are not consistent (and
> ISBNs are also pretty hopeless). We've found that the best match is on
> URL. Presumably if the URL is the same, it is pointing to the same
> resource. I have another program that will do this, though not totally
> consistently. Also, our system (III) can be made to do this via the load
> tables.
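A minimal sketch of URL matching, assuming each record set has already been reduced to a mapping from record ID (for example the 001) to its 856$u URLs. The normalization is deliberately light (trim, lowercase scheme and host, drop a trailing slash); a dx.doi.org link and a springerlink.com link for the same book will not match unless one form is rewritten into the other first. The names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Light normalization: trim, lowercase scheme and host, drop trailing '/'."""
    parts = urlsplit(url.strip())
    norm = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"
    if parts.query:
        norm += "?" + parts.query
    return norm


def match_on_url(set_a, set_b):
    """Map IDs in set_a to IDs in set_b sharing a URL; also return unmatched set_a IDs."""
    index = {}
    for rec_id, urls in set_b.items():
        for url in urls:
            index.setdefault(normalize_url(url), rec_id)
    matches, unmatched = {}, []
    for rec_id, urls in set_a.items():
        hit = next((index[n] for n in map(normalize_url, urls) if n in index), None)
        if hit is None:
            unmatched.append(rec_id)
        else:
            matches[rec_id] = hit
    return matches, unmatched
```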
>
> I'm curious about the fact that Stacy only had 119 titles for which
> records could not be found on OCLC. We've been loading these since the
> beginning, so have a lot of anomalies in our database, but I now have 27,000
> records in our database, with 10,000 non-OCLC records. We get the
> international collection, and have done since 2005. Because we originally
> got records from MyiLibrary, there are dups, but I would be interested to
> know what part of the Springer collection you are getting.
>
> Penny
>
> -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
> Penny Swanson, M.L.S. email: [log in to unmask]
> Head, Cataloguing Division voice: 778.782.3184
> Simon Fraser University Library fax: 778.782.3023
> 8888 University Drive
> Burnaby, B.C. V5A 1S6
>
> ----- Original Message -----
> From: "Stacy Pober" <[log in to unmask]>
> To: [log in to unmask]
> Sent: Thursday, 12 August, 2010 12:33:59 GMT -08:00 US/Canada Pacific
> Subject: Re: How to find unique records
>
> Steve McDonald wrote:
>> For general information, though, here is how I would do a project like
>> this. The method I have seen used is to export some fields (probably title,
>> 001, and any other fields you think might be useful to identification, like
>> author and ISBN) from both files into tab-delimited text files. Import both
>> files into a spreadsheet program like Excel. In the table for the OCLC
>> records, add a new column and put "OCLC" into every box in the column. For
>> the Springer table, add a new column and put "Springer" into every box in
>> the column. Combine both tables into a single table, for instance by
>> copying and pasting. Then you sort by the title column (primary) and
>> Springer/OCLC column (secondary). The resulting table should mostly
>> alternate between matching Springer and OCLC titles. Remove those and you
>> are left with only the titles in one or the other.
>
> Steve,
>
> Thank you for the help.
>
> Just to clarify: Are you saying I should find them in the spreadsheet
> by visually scanning for the non-alternating titles, or is there some
> way to automate this? The visual scanning method is going to be
> time-consuming for such a large set of books. I wouldn't mind if it
> were just a few hundred titles, but it's over 17,000, which will make
> it a bit of a chore.
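Rather than scanning the sorted spreadsheet by eye, the comparison can be automated as a set difference on normalized titles. This sketch assumes each file was exported as tab-delimited text with the title in the first column and the 001 in the second (both column positions are assumptions). Because titles are inconsistent between the two sources, as Penny notes, treat the output as a list of candidates to review, not a final answer.

```python
import csv
import io
import re
import unicodedata


def title_key(title):
    """Normalize a title for comparison: strip accents, case, punctuation, extra spaces."""
    t = unicodedata.normalize("NFKD", title)
    t = "".join(ch for ch in t if not unicodedata.combining(ch))
    t = re.sub(r"[^\w\s]", " ", t.lower())
    return " ".join(t.split())


def read_export(tsv_text, title_col=0, id_col=1):
    """Map normalized title -> list of record IDs from a tab-delimited export."""
    keyed = {}
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in rows:
        if len(row) > max(title_col, id_col):
            keyed.setdefault(title_key(row[title_col]), []).append(row[id_col])
    return keyed


def only_in_first(first, second):
    """Record IDs whose normalized title appears in `first` but not in `second`."""
    return sorted(rid for key, ids in first.items() if key not in second for rid in ids)
```

Calling `only_in_first` in both directions gives the two "unmatched" lists that the alternating-row method would otherwise leave to visual inspection.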
>
> Stacy
>
>
> On Thu, Aug 12, 2010 at 1:49 PM, McDonald, Stephen
> <[log in to unmask]> wrote:
>> Stacy Pober said:
>>> We get a large collection of Springer e-books. The MARC records
>>> supplied by Springer are pretty bad. They now offer records via OCLC
>>> which should be better quality.
>>>
>>> Here's the problem: 119 records are missing from the OCLC set.
>>>
>>> If I concatenate our Springer-provided MARC record files into one
>>> file, can I then compare that to the OCLC file and pull out a set of
>>> the books that are unique to the Springer-provided set ?
>>>
>>> Could someone tell me the exact steps I'd need to go through to get
>>> that file of 119 records?
>>
>> I have not done this myself, but I'm about to do something similar.
>> However, when I (briefly) looked into this myself when first loading the
>> WorldCat Collection Set records for Springer, I found that the difference
>> was that Springer had multiple records for separate volumes which were
>> combined in a single record on WCS. As far as I could tell, the WCS set was
>> in fact the complete set of books available from Springer.
>>
>> For general information, though, here is how I would do a project like
>> this. The method I have seen used is to export some fields (probably title,
>> 001, and any other fields you think might be useful to identification, like
>> author and ISBN) from both files into tab-delimited text files. Import both
>> files into a spreadsheet program like Excel. In the table for the OCLC
>> records, add a new column and put "OCLC" into every box in the column. For
>> the Springer table, add a new column and put "Springer" into every box in
>> the column. Combine both tables into a single table, for instance by
>> copying and pasting. Then you sort by the title column (primary) and
>> Springer/OCLC column (secondary). The resulting table should mostly
>> alternate between matching Springer and OCLC titles. Remove those and you
>> are left with only the titles in one or the other. From that you can make a
>> file from which to do a batch search on OCLC. After downloading the batch
>> search results, use the extra information in the table (author, ISBN, etc.)
>> to remove incorrect matches.
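For the final step of weeding out incorrect matches, ISBNs are easier to compare once they are normalized. A sketch, assuming 020$a values like "0306406152 (pbk.)": take the first token, convert ISBN-10 to ISBN-13 with the 978 prefix and a recomputed check digit, and test whether two records share any ISBN. It does not validate incoming check digits, and print and electronic ISBNs for the same book differ, so a miss is not proof of a mismatch. The function names are hypothetical.

```python
import re


def isbn13(raw):
    """Return a bare ISBN-13 for an ISBN-10 or ISBN-13 string, else None."""
    tokens = raw.split()
    digits = re.sub(r"[^0-9Xx]", "", tokens[0]).upper() if tokens else ""
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit() and (digits[9].isdigit() or digits[9] == "X"):
        core = "978" + digits[:9]
        # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def share_isbn(isbns_a, isbns_b):
    """True if the two records have at least one ISBN in common after normalization."""
    a = {x for x in map(isbn13, isbns_a) if x}
    b = {x for x in map(isbn13, isbns_b) if x}
    return bool(a & b)
```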
>>
>> Steve McDonald
>> [log in to unmask]
>>
>> ________________________________________________________________________
>>
>> This message comes to you via MARCEDIT-L, a Listserv(R) list for technical
>> and instructional support in MarcEdit. If you wish to communicate directly
>> with the list owners, write to [log in to unmask] To
>> unsubscribe, send a message "SIGNOFF MARCEDIT-L" to
>> [log in to unmask]
>>
>
> --
> Stacy Pober
> Information Alchemist
> Manhattan College Library
> Riverdale, NY 10471
> [log in to unmask]
>