MARCEDIT-L Archives

August 2010

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
Stacy Pober <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Thu, 19 Aug 2010 01:42:20 -0400
Content-Type:
text/plain
Parts/Attachments:
text/plain (227 lines)
I just looked more closely at the OCLC records and the
Springer-provided MARC records.  Both are pretty terrible as access
points for ebooks.

The initial load file from OCLC includes hundreds and hundreds of
links to other online services in addition to the Springer links. I
sorted on 856$u in the MarcEdit Extract tool. Since the display
only shows the first 856$u field, I couldn't find all the bad links,
but in the initial scan alone I see links to cornell.edu,
ohiolink, netlibrary, ebrary, mylibrary, and metapress. None of these
will work for our patrons.

There are only two types of links that seem to go directly
to the Springer books with our subscription.
Those begin with: http://dx.doi.org or http://www.springerlink.com

Can someone advise me of the best way to automate the elimination of
all the inappropriate 856 fields?
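One way to automate this outside MarcEdit is sketched below in Python. It is only a sketch under assumptions: it works on the mnemonic (.mrk) text that MarcEdit's MarcBreaker produces, where each field is one line such as =856  40$uhttp://dx.doi.org/..., it assumes "$" marks subfields, and the names ALLOWED_PREFIXES, keep_856, and filter_856 are hypothetical. An 856 line is kept only if one of its $u subfields starts with one of the two prefixes above; every other line passes through untouched.

```python
import re

# The two link prefixes that reach the Springer books under our subscription.
ALLOWED_PREFIXES = ("http://dx.doi.org", "http://www.springerlink.com")

# Captures the value of each $u subfield on a mnemonic line (assumes "$" delimiters).
URL_SUBFIELD = re.compile(r"\$u([^$]*)")


def keep_856(line: str) -> bool:
    """Keep non-856 lines; keep an 856 line only if a $u starts with an allowed prefix."""
    if not line.startswith("=856"):
        return True
    urls = URL_SUBFIELD.findall(line)
    return any(url.strip().lower().startswith(ALLOWED_PREFIXES) for url in urls)


def filter_856(mrk_text: str) -> str:
    """Drop every 856 field whose links all point somewhere other than Springer."""
    kept = [line for line in mrk_text.splitlines() if keep_856(line)]
    return "\n".join(kept)
```

The filtered text would then be compiled back to MARC with MarcMaker. It may be worth listing any records left with no 856 at all before loading, since those would have no working link.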

But wait, there's more!

Given those problems with the OCLC records, I started processing the
Springer-provided summer update files, figuring I can remove those if
I get the OCLC records straightened out.

And just to be on the safe side, I did some testing of the links in
the Springer July update file. I tested 60 links, and 36 of them were
bad: 32 were dead DOI links, and 4 went to books that our institution
does not have access to.

If the rest of the file is like the first 60 records, then more than
half of the records have dead links to the books.  Yikes!
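To repeat the spot check on a less biased sample, the 856$u values can be pulled out of the file and a random subset drawn, instead of taking the first 60 records (the file may be ordered by date or subject). This is a sketch under assumptions: it reads MarcBreaker .mrk text with "$" subfield delimiters, and extract_urls, sample_urls, and bad_link_rate are hypothetical helper names. The sampled links still have to be opened and checked by hand, since a dead DOI and a book outside our subscription may not be distinguishable from the server response alone.

```python
import random
import re

# $u subfield on a MarcBreaker mnemonic line; assumes "$" is the delimiter.
URL_SUBFIELD = re.compile(r"\$u([^$]*)")


def extract_urls(mrk_text: str) -> list[str]:
    """Collect every 856 $u value, in file order."""
    urls = []
    for line in mrk_text.splitlines():
        if line.startswith("=856"):
            urls.extend(u.strip() for u in URL_SUBFIELD.findall(line))
    return urls


def sample_urls(urls: list[str], n: int, seed: int = 2010) -> list[str]:
    """Draw up to n URLs at random; a fixed seed makes the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(urls, min(n, len(urls)))


def bad_link_rate(bad: int, tested: int) -> float:
    """Share of the checked links that failed."""
    return bad / tested
```

With the counts above, bad_link_rate(36, 60) returns 0.6, i.e. 60% of the tested links failed.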

In the past, we found a very small percentage of bad links in the
Springer-provided records. We notified them, and after repeated
reminders, they finally fixed most of those. But those were a very
small percentage of the total records provided. Now we're getting
such poor-quality records that MOST of the links are bad. I am
dismayed, to say the least.

Penny Swanson asked which sets we get.
Springer calls the collection we get the "English/International"
collection. It has all the Springer ebooks published from 2005
through 2010 except for Dutch- and German-language books and the
Springer Protocols.

When I posted about the missing records, I was comparing the OCLC set
with the initial load of Springer records plus the updates that
should already have been incorporated into the OCLC set by the time
we received it. However, in examining the sets and what each should
include, it's very difficult to tell which Springer record files
correspond to which OCLC sets, because they're released on different
schedules.

Springer:
The Springer-provided records had an initial file of 16,346 records,
covering ebooks released through Nov. 2009. Additional records were
provided in Dec. 2009 and in Jan., Feb., Apr., June, and July 2010,
adding another 3,721 records. This means that Springer has provided
20,067 MARC records.

OCLC:
We're getting the "Springer Complete Collection" record sets from
OCLC. The initial file from OCLC was 17,173 books, with two updates
since then of 274 and 403 records, which adds up to only 17,840
records.

That's a difference of 2,227 records from the Springer files.

It's possible that the August OCLC files do not include the very large
July Springer update. If that's the case, then the discrepancy is
much smaller, only 146 records. Still, I can't just clean out the
Springer-provided files and load the OCLC MARC records without
potentially losing access to over 2,000 books.
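Penny's suggestion further down (match on URL, since titles and ISBNs are unreliable) can be sketched in Python to pull out the Springer records that have no counterpart in the OCLC file. Assumptions: both files are MarcBreaker .mrk text with records separated by blank lines; a DOI link is compared on the DOI alone (DOIs are case-insensitive), and any other link is compared lowercased with a trailing slash removed. All function names are hypothetical, and a SpringerLink URL and a DOI URL for the same book will not match under this scheme.

```python
import re

URL_SUBFIELD = re.compile(r"\$u([^$]*)")
DOI_PREFIX = "http://dx.doi.org/"


def normalize_url(url: str) -> str:
    """Turn a link into a comparison key (a simple assumption, not a standard)."""
    url = url.strip().lower().rstrip("/")
    if url.startswith(DOI_PREFIX):
        return "doi:" + url[len(DOI_PREFIX):]
    return url


def records(mrk_text: str) -> list[str]:
    """Split mnemonic text into records, assuming blank lines separate them."""
    return [r for r in re.split(r"\n\s*\n", mrk_text.strip()) if r.strip()]


def url_keys(record: str) -> set[str]:
    """All normalized 856 $u keys in one record."""
    keys = set()
    for line in record.splitlines():
        if line.startswith("=856"):
            keys.update(normalize_url(u) for u in URL_SUBFIELD.findall(line))
    return keys


def springer_only(springer_mrk: str, oclc_mrk: str) -> list[str]:
    """Return Springer records that share no link key with any OCLC record."""
    oclc_keys = set()
    for rec in records(oclc_mrk):
        oclc_keys |= url_keys(rec)
    return [rec for rec in records(springer_mrk) if not (url_keys(rec) & oclc_keys)]
```

Records with no 856 at all also come back as Springer-only, since they can't be matched on URL and need a look by hand. Joining the returned records with blank lines gives a .mrk file that MarcMaker can compile into a separate load file.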

This is looking like it's going to be quite an onerous chore!


-- 
Stacy Pober
Information Alchemist
Manhattan College Library
Riverdale, NY 10471
[log in to unmask]


On Tue, Aug 17, 2010 at 12:22 PM, Penny Swanson <[log in to unmask]> wrote:
> Hi;
>
> The problem I've found when trying to match Springer records with anything
> else is the poor cataloguing means that the titles are not consistent (and
> ISBNs are also pretty hopeless).  We've found that the best match is on
> URL.  Presumably if the URL is the same, it is pointing to the same
> resource.  I have another program that will do this, though not totally
> consistently.  Also, our system (III) can be made to do this via the load
> tables.
>
> I'm curious about the fact that Stacy only had 119 titles for which
> records could not be found on OCLC.  We've been loading these since the
> beginning, so have a lot of anomalies in our database, but I now have 27,000
> records in our database, with 10,000 non OCLC records.  We get the
> international collection, and have done since 2005.  Because we originally
> got records from MyILibrary, there are dups, but I would be interested to
> know what part of the Springer collection you are getting.
>
> Penny
>
> -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
> Penny Swanson, M.L.S.                       email:  [log in to unmask]
> Head, Cataloguing Division                  voice:  778.782.3184
> Simon Fraser University Library             fax:    778.782.3023
> 8888 University Drive
> Burnaby, B.C. V5A 1S6
>
> ----- Original Message -----
> From: "Stacy Pober" <[log in to unmask]>
> To: [log in to unmask]
> Sent: Thursday, 12 August, 2010 12:33:59 GMT -08:00 US/Canada Pacific
> Subject: Re: How to find unique records
>
> Steve McDonald wrote:
>> For general information, though, here is how I would do a project like
>> this.  The method I have seen used is to export some fields (probably title,
>> 001, and any other fields you think might be useful to identification, like
>> author and ISBN) from both files into tab-delimited text files.  Import both
>> files into a spreadsheet program like Excel.  In the table for the OCLC
>> records, add a new column and put "OCLC" into every box in the column.  For
>> the Springer table, add a new column and put "Springer" into every box in
>> the column.  Combine both tables into a single table, for instance by
>> copying and pasting.  Then you sort by the title column (primary) and
>> Springer/OCLC column (secondary).  The resulting table should mostly
>> alternate between matching Springer and OCLC titles.  Remove those and you
>> are left with only the titles in one or the other.
>
> Steve,
>
> Thank you for the help.
>
> Just to clarify:  Are you saying I should find them in the spreadsheet
> by visually scanning for the non-alternating titles, or is there some
> way to automate this?  The visual scanning method is going to be
> time-consuming for such a large set of books. I wouldn't mind if it
> were just a few hundred titles, but it's over 17,000, which will make
> it a bit of a chore.
>
> Stacy
>
>
> On Thu, Aug 12, 2010 at 1:49 PM, McDonald, Stephen
> <[log in to unmask]> wrote:
>> Stacy Pober said:
>>> We get a large collection of Springer e-books.  The MARC records
>>> supplied by Springer are pretty bad.  They now offer records via OCLC
>>> which should be better quality.
>>>
>>> Here's the problem:  119 records are missing from the OCLC set.
>>>
>>> If I concatenate our Springer-provided MARC record files into one
>>> file, can I then compare that to the OCLC file and pull out a set of
>>> the books that are unique to the Springer-provided set?
>>>
>>> Could someone tell me the exact steps I'd need to go through to get
>>> that file of 119 records?
>>
>> I have not done this myself, but I'm about to do something similar.
>>  However, when I (briefly) looked into this myself when first loading the
>> WorldCat Collection Set records for Springer, I found that the difference
>> was that Springer had multiple records for separate volumes which were
>> combined in a single record on WCS.  As far as I could tell, the WCS set was
>> in fact the complete set of books available from Springer.
>>
>> For general information, though, here is how I would do a project like
>> this.  The method I have seen used is to export some fields (probably title,
>> 001, and any other fields you think might be useful to identification, like
>> author and ISBN) from both files into tab-delimited text files.  Import both
>> files into a spreadsheet program like Excel.  In the table for the OCLC
>> records, add a new column and put "OCLC" into every box in the column.  For
>> the Springer table, add a new column and put "Springer" into every box in
>> the column.  Combine both tables into a single table, for instance by
>> copying and pasting.  Then you sort by the title column (primary) and
>> Springer/OCLC column (secondary).  The resulting table should mostly
>> alternate between matching Springer and OCLC titles.  Remove those and you
>> are left with only the titles in one or the other.  From that you can make a
>> file from which to do a batch search on OCLC.  After downloading the batch
>> search results, use the extra information in the table (author,
>> ISBN, etc.) to remove incorrect matches.
>>
>>                                        Steve McDonald
>>                                        [log in to unmask]
>>
>> ________________________________________________________________________
>>
>> This message comes to you via MARCEDIT-L, a Listserv(R) list for technical
>> and instructional support in MarcEdit.  If you wish to communicate directly
>> with the list owners, write to [log in to unmask] To
>> unsubscribe, send a message "SIGNOFF MARCEDIT-L" to
>> [log in to unmask]
>>
>
> --
> Stacy Pober
> Information Alchemist
> Manhattan College Library
> Riverdale, NY 10471
> [log in to unmask]
>
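Steve's spreadsheet method quoted above can be automated instead of scanning 17,000 rows by eye. The Python sketch below follows the same steps: it reads two tab-delimited exports (a header row with a title column is assumed), tags every row with its source, combines and sorts them by title and then source, and keeps the titles that appear under only one source. The title normalization (lowercase, punctuation dropped, spaces collapsed) and all function names are assumptions for illustration.

```python
import csv
import io
from itertools import groupby


def normalize_title(title: str) -> str:
    """Crude match key (an assumption): lowercase, punctuation to spaces, spaces collapsed."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title.lower())
    return " ".join(cleaned.split())


def read_tagged(tsv_text: str, source: str) -> list[dict]:
    """Read a tab-delimited export with a header row and add a 'source' column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    for row in rows:
        row["source"] = source
    return rows


def unmatched(oclc_tsv: str, springer_tsv: str, title_col: str = "title") -> list[dict]:
    """Return rows whose normalized title occurs in only one of the two exports."""

    def title_key(row: dict) -> str:
        return normalize_title(row[title_col])

    # Combine both tables, then sort by title (primary) and source (secondary).
    combined = read_tagged(oclc_tsv, "OCLC") + read_tagged(springer_tsv, "Springer")
    combined.sort(key=lambda row: (title_key(row), row["source"]))

    leftovers = []
    for _, group in groupby(combined, key=title_key):
        rows = list(group)
        # A title seen under just one source has no match in the other file.
        if len({row["source"] for row in rows}) == 1:
            leftovers.extend(rows)
    return leftovers
```

One difference from pairing rows by hand: a title present in both files counts as matched even if one file holds more records for it, so multi-volume sets that Springer splits and WorldCat combines may still need a separate look.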
