MARCEDIT-L Archives

October 2009

MARCEDIT-L@LISTSERV.GMU.EDU

Subject:
From:
Shirley Lincicum <[log in to unmask]>
Reply To:
MarcEdit support in technical and instructional matters <[log in to unmask]>
Date:
Thu, 15 Oct 2009 14:36:26 -0400
Content-Type:
text/plain
Terry,

As long as global searching and editing is still supported, I think
breaking large record sets down into pages as you describe would be a
fabulous enhancement to MarcEdit. I can think of lots of applications
for this functionality, such as enabling users to more easily access
sections of a large file for spot-checking purposes.

Shirley

On Thu, Oct 15, 2009 at 1:35 PM, Reese, Terry
<[log in to unmask]> wrote:
> I have a question and I’m hoping that the collective wisdom of the list can
> help me solve it.  I’ve got an update for MarcEdit that I’ve been sitting on
> for about a month because I have a specific issue (usability mostly) that
> I’m trying to solve, and I have an idea how to do it, but it will change the
> way that you edit MARC records in the editor (at least, how they are
> displayed) and before I go forward, I wanted to quickly take the community’s
> pulse on this.
>
>
>
> The problem
>
>
>
> So let’s start with an explanation of the problem.  As folks that have
> worked with both MarcEdit 4.x and MarcEdit 5.x know, the Editor’s ability to
> load a lot of data differs considerably between the two.  In MarcEdit 4.x, the
> application utilized a custom edit control written in assembly for loading
> and editing records in the MarcEditor.  This allowed users to load very
> large files (150 MB or so) into the editor without a noticeable change in
> speed when adding new data to the editor, resizing windows, etc.  In
> MarcEdit 5.x, I made a conscious decision to utilize all .NET components to
> preserve the ability to port MarcEdit to the Linux and Mac platforms (Linux
> will be officially completed at the next release btw) – however, this had
> some implications with the editor in two ways.  1) Loading rich content into
> the editor has a much higher memory cost and 2) this higher memory cost has
> a definite effect on performance (loading and editing).  This is why I
> introduced the preview mode – a read-only mode that allowed users to load a
> snippet of the file and then make their global edits.  For my usage of
> MarcEdit, this worked beautifully – but I’m finding that a number of users
> have workflows that require them to load the entire file and perform single
> record edits, which is, I’ll admit, painful when files start to get close to
> 8-10 MB in size – changes in the editing window are often made with a
> delay (i.e., you type a word – a pause, then the data
> catches up).  This also affects screen resizing, etc.  Tied to this problem
> is the range of character encodings that MarcEdit supports (it goes beyond MARC8
> and UTF8).  This, too, affects memory usage depending on the
> encoding in use – and honestly, is one of the big reasons for the change
> away from the assembly components in MarcEdit 4.x – that component simply
> didn’t do Unicode well and that’s the future of MARC.  The current component
> in MarcEdit does Unicode very well, but certain scripts give Windows some
> fits rendering (performance wise) – so it’s a problem – one that I’d like to
> solve.
>
>
>
>
>
> Solutions
>
>
>
> Anyway, that’s the problem I’m looking to solve.  I’m looking for a solution
> that will let users make individual record changes on large
> datasets within the MarcEditor, and do so in a way that allows the editor to
> gracefully handle memory management and performance.  The present solution,
> the one that is completely untenable, is to load all the data into an edit
> control.  On my test machines, I can load files up to ~150 MB in size into
> the control (your mileage will vary due to virtual memory restrictions and
> available ram) but it comes at a huge cost.  In Windows (and virtual
> languages like .NET especially), rendering content virtually is expensive.
> Memory consumed is roughly 4x the source – so, rendering 150 MB of data
> costs my system ~600 MB of virtual ram.  Painful, and performance shows.
> This is why the preview mode is there.  But let’s say you are dealing with a
> smaller dataset, something in the 8-10 MB range.  You are still consuming
> close to 40 MB to render the data – and performance can suffer depending on
> hardware and memory available.  If you need to make individual record
> changes on a batch in that size range, making these changes may be
> frustrating as you may indeed have to deal with a delay in entering data as
> the system re-buffers available memory to handle the work.  I’m pretty sure
> that everyone that’s had this happen agrees that this needs to change (I’ve
> heard from 3 people recently that have been experiencing this problem and are
> trying to figure out how to make it work within existing workflows) and I’m
> sure there are others that have not spoken up or may still use MarcEdit 4.x
> for very specific tasks simply because the handling of larger files for
> individual record editing was better (which is fair, but becomes less and
> less of a reliable solution as more data becomes available in UTF8).
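
To make the arithmetic above concrete, here is a small Python sketch. The 4x multiplier is only the rough figure quoted in the message, not a measurement, and the function name is hypothetical.

```python
# Rough estimate of the memory needed to render a file in the editor.
# ASSUMPTION: the ~4x overhead is the approximate figure from the message.
RENDER_OVERHEAD = 4


def estimated_render_mb(source_mb: float, overhead: float = RENDER_OVERHEAD) -> float:
    """Return the approximate memory (in MB) consumed when rendering source_mb of data."""
    return source_mb * overhead


if __name__ == "__main__":
    for size_mb in (10, 150):
        print(f"{size_mb} MB file -> roughly {estimated_render_mb(size_mb):.0f} MB rendered")
```
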
>
>
>
> So I’ve been thinking about this a lot over the past month, writing some
> test code, developing some wireframes and I want to present some options and
> get some feedback.  Essentially, there are two ways that I think I can deal
> with this issue.  One is to essentially provide real-time random access to
> large files [not preferred], so that the only data loaded into the editor
> is whatever fits within the memory buffer.  This would likely be the ideal
> solution, but it also is the most difficult to write simply because all data
> would need to be mapped to temporary buffers, tracked, etc.  Also, when
> dealing with really large files, the random access will not be immediate,
> meaning that as you move further down the file, the ability to page down may
> become more labored.  The benefit, however, is that the memory footprint
> would be much, much lower, so performance for general individual-record
> editing should improve greatly.  It also would most closely resemble the
> current way that MarcEdit provides editing within the MarcEditor.  All data
> would appear to be loaded in a Notepad-like interface – you’d page down,
> scroll down just as you do now.  I’m not sure how this would affect Find and
> Replace – but I’m sure we could make it work.
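
One way to picture the random-access option is an index of byte offsets that lets a reader seek straight to a record. The Python sketch below is only an illustration under stated assumptions: it treats the input as binary ISO 2709 MARC, where each record ends with the 0x1D terminator, whereas the MarcEditor itself works on a text view of the records. The function names are hypothetical and nothing here reflects MarcEdit's actual code.

```python
import io
from typing import BinaryIO, List

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def build_offset_index(stream: BinaryIO, chunk_size: int = 64 * 1024) -> List[int]:
    """Scan the stream once and return the starting byte offset of every record."""
    offsets = [0]
    position = 0
    stream.seek(0)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        start = 0
        while True:
            hit = chunk.find(RECORD_TERMINATOR, start)
            if hit == -1:
                break
            offsets.append(position + hit + 1)  # next record starts after the terminator
            start = hit + 1
        position += len(chunk)
    # Drop a trailing offset that points at end-of-file: no record starts there.
    if offsets[-1] >= position:
        offsets.pop()
    return offsets


def read_record(stream: BinaryIO, offsets: List[int], index: int) -> bytes:
    """Seek directly to record `index` (0-based) and return its raw bytes."""
    stream.seek(offsets[index])
    if index + 1 < len(offsets):
        return stream.read(offsets[index + 1] - offsets[index])
    return stream.read()  # last record: read to end of stream


if __name__ == "__main__":
    demo = io.BytesIO(b"rec-one\x1drec-two\x1drec-three\x1d")
    index = build_offset_index(demo)
    print(index, read_record(demo, index, 1))
```

The index costs one full pass over the file up front. Edits that change a record's length would also shift every later offset, which is part of the bookkeeping burden the message describes.
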
>
>
>
> And while the above may be the more ideal, it’s not the one that I’m leaning
> towards (hence this message).  I’ve been thinking a lot about how MARC
> records are represented in MarcEdit, how they are edited, etc. and I’m
> beginning to believe that when working with a large set of MARC records, the
> best solution wouldn’t be to provide simply a complete picture of all loaded
> records, but would be to display groups of records, with the ability to page
> through a recordset.  I’ve included some wireframes illustrating this point
> in the attached PowerPoint.  In slide 1, I’ve provided a demo of how I think
> the editing may look (ignore the menus, icons – these are just part of my
> test code).  Essentially, users would define how many records they want to
> display per “page”.  I’m thinking that the sweet spot would likely be about
> 500 – but I’d make this user defined.  MarcEdit can then, very quickly,
> determine how many records are in the file and then break up the record set
> as pages.  MarcEdit then would only load one page of records at a time.
> This allows users the ability to quickly do individual edits of records,
> reduces memory footprint and greatly improves the overall experience of
> using large data files.  It also takes system memory limitations completely
> out of the equation, as only a small block of records will be displayed at
> any given time.
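
The paging idea described above can be sketched in a few lines of Python. This is a minimal illustration, not MarcEdit's implementation: it assumes a text file in which a blank line ends each record, and the function names and the default page size of 500 (the "sweet spot" mentioned in the message) are placeholders.

```python
import io
import math
from itertools import islice
from typing import Iterator, List, TextIO


def iter_records(stream: TextIO) -> Iterator[str]:
    """Yield records one at a time; ASSUMES a blank line ends each record."""
    lines: List[str] = []
    for line in stream:
        if line.strip():
            lines.append(line.rstrip("\r\n"))
        elif lines:
            yield "\n".join(lines)
            lines = []
    if lines:  # final record without a trailing blank line
        yield "\n".join(lines)


def count_pages(stream: TextIO, page_size: int = 500) -> int:
    """Count the records in one pass and return how many pages they fill."""
    stream.seek(0)
    total = sum(1 for _ in iter_records(stream))
    return math.ceil(total / page_size)


def load_page(stream: TextIO, page: int, page_size: int = 500) -> List[str]:
    """Return only the records on the 1-based `page`; other records are skipped."""
    stream.seek(0)
    start = (page - 1) * page_size
    return list(islice(iter_records(stream), start, start + page_size))


if __name__ == "__main__":
    sample = io.StringIO("=LDR a\n=245 one\n\n=LDR b\n\n=LDR c\n\n=LDR d\n\n=LDR e\n")
    print(count_pages(sample, page_size=2))
    print(load_page(sample, page=3, page_size=2))
```

Only one page of records is ever held in a list here, so memory use depends on the page size and not on the file size. This version does still re-read the file from the top for each page.
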
>
>
>
> Using this system also would let me rethink how we do finds within a
> Recordset.  At present, when you use the find tool, MarcEdit has to
> enumerate over the entire record set and this is, for all intents and
> purposes, a very memory intensive operation.  Slow too if you have a lot of
> records.  In this new model, I’d add a new button to the Find dialog – Find
> All (see slide 2).  When Find All was used, what would be generated is a
> report of all occurrences of the needle found within the record set.  The
> report would show the criteria in context, with the ability to jump to the
> specific page where the text was found.  Personally, I think that this could
> be a big improvement over current find, as users would immediately be able
> to see all the cases in which a criterion occurs without having to jump
> through the entire file.  Additionally, this type of a design would allow me
> to start thinking about the MarcEditor itself, so that record set editing
> could be done with pages (so you could, for example, spawn a new page within a
> new MarcEditor tab so pages could be compared [see slide 3]).  I think that
> this type of design could eventually lead to some fairly interesting
> enhancements – but I also recognize that it will be different.  It
> represents a different way to view and edit records in MarcEdit – though,
> this change really only affects how you edit records individually (since
> global editing is done differently).
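
A Find All report of the kind described above might look something like the following Python sketch. It is an assumption-laden illustration: the `Hit` structure, the `find_all` name, the context width, and the case-sensitive matching are all hypothetical choices, not details taken from the message or from MarcEdit.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    page: int     # 1-based page that holds the record
    record: int   # 1-based record number within the whole record set
    context: str  # text surrounding the match


def find_all(records: Iterable[str], needle: str,
             page_size: int = 500, width: int = 20) -> List[Hit]:
    """Return every occurrence of `needle`, with context and the page to jump to."""
    hits: List[Hit] = []
    if not needle:  # an empty needle would match everywhere; report nothing
        return hits
    for number, text in enumerate(records, start=1):
        page = (number - 1) // page_size + 1
        pos = text.find(needle)
        while pos != -1:
            lo = max(0, pos - width)
            hi = min(len(text), pos + len(needle) + width)
            hits.append(Hit(page=page, record=number, context=text[lo:hi]))
            pos = text.find(needle, pos + len(needle))
    return hits


if __name__ == "__main__":
    sample = ["=245  $aCats", "=245  $aDogs", "=245  $aCats and more Cats"]
    for hit in find_all(sample, "Cats", page_size=2):
        print(f"page {hit.page}, record {hit.record}: {hit.context}")
```

Because `records` can be any iterable, the report can be built by streaming through the file once, without loading the whole record set into the editor.
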
>
>
>
> Finally, implementation – if I move down the above path – I can integrate
> the current test code into the existing MarcEdit application with little
> work.  I could wrap up my update and not have to really worry about
> introducing regression errors.  If I try to implement the first solution,
> all bets are off in terms of when it would be done.  It would represent a
> major change to how data is handled within the program and I’d have to step
> back, re-write a lot of code and then find some willing users to try it
> because there would be a significant chance for regression errors.
>
>
>
> Anyway, that’s my idea.  I think it addresses a known weakness in the
> program and makes individual record editing better, and does so without
> causing too much interruption to the user.  And, if successful, it may allow me
> to slowly remove the preview mode from the MarcEditor, as it would no longer
> be needed.
>
>
>
> How can you help
>
>
>
> If you stayed with me this long and looked at the wireframes, you are
> probably wondering how you can help.  Well, I’m looking for comments and
> ideas on this.  MarcEdit is a very community oriented project.  I’d say that
> over 90% of the work that goes into the program is done at the community’s
> request.  This is an issue that I know has been raised by members of the
> user community, and I’m really wanting to involve the community in the
> decision.  I’m definitely open to other ideas and to suggestions for how
> to tweak the wireframes (since I recognize that there are many places where
> usability could be improved) – but that’s kind of where I’m at right now.
>
>
>
> Thanks everyone who made it this far,
>
>
>
> --TR
>
>
>
>
>
> ********************************
> Terry Reese
> Gray Family Chair
> for Innovative Library Services
> 121 Valley Libraries
> Corvallis, Or 97331
> tel: 541.737.6384
> ********************************
>
>
>
>
>
> ________________________________________________________________________
>
> This message comes to you via MARCEDIT-L, a Listserv(R) list for technical
> and instructional support in MarcEdit. If you wish to communicate directly
> with the list owners, write to [log in to unmask] To
> unsubscribe, send a message "SIGNOFF MARCEDIT-L" to
> [log in to unmask]
