Terry,
As long as global searching and editing is still supported, I think
breaking large record sets down into pages as you describe would be a
fabulous enhancement to MarcEdit. I can think of lots of applications
for this functionality, such as enabling users to more easily access
sections of a large file for spot-checking purposes.
Shirley
On Thu, Oct 15, 2009 at 1:35 PM, Reese, Terry
<[log in to unmask]> wrote:
> I have a question and I’m hoping that the collective wisdom of the list can
> help me solve it. I’ve got an update for MarcEdit that I’ve been sitting on
> for about a month because of a specific issue (mostly usability) that
> I’m trying to solve. I have an idea for how to do it, but it will change the
> way that you edit MARC records in the editor (at least how they are
> displayed), so before I go forward, I wanted to quickly take the community’s
> pulse on this.
>
> The problem
>
> So let’s start with an explanation of the problem. As folks who have
> worked with both MarcEdit 4.x and MarcEdit 5.x know, the Editor’s capacity
> for loading large amounts of data is very different between the two. In MarcEdit 4.x, the
> application utilized a custom edit control written in assembly for loading
> and editing records in the MarcEditor. This allowed users to load very
> large files (150 MB or so) into the editor without a noticeable change in
> speed when adding new data to the editor, resizing windows, etc. In
> MarcEdit 5.x, I made a conscious decision to utilize all .NET components to
> preserve the ability to port MarcEdit to the Linux and Mac platforms (Linux
> will be officially completed at the next release btw) – however, this had
> two implications for the editor: 1) loading rich content into
> the editor has a much higher memory cost and 2) this higher memory cost has
> a definite effect on performance (loading and editing). This is why I
> introduced the preview mode – a read-only mode that allowed users to load a
> snippet of the file and then make their global edits. For my usage of
> MarcEdit, this worked beautifully – but I’m finding that a number of users
> have workflows that require them to load the entire file and perform single
> record edits, which, I’ll admit, is painful once files approach 8-10 MB in
> size: changes in the editing window are applied, but with a delay (i.e., you
> type a word, there is a pause, then the display catches up). This also
> affects window resizing, etc. Tied to this problem
> are the various character encodings that MarcEdit supports (it goes beyond
> MARC8 and UTF8). These also affect memory usage, depending on the
> encoding in use, and honestly, they are one of the big reasons for the change
> away from the assembly components in MarcEdit 4.x – that component simply
> didn’t do Unicode well and that’s the future of MARC. The current component
> in MarcEdit does Unicode very well, but certain scripts give Windows fits
> when rendering (performance-wise), so it’s a problem, and one that I’d like
> to solve.
>
> Solutions
>
> Anyway, that’s the problem I’m looking to solve. I’m looking for a solution
> that will allow users to make individual record changes on large datasets
> within the MarcEditor, in a way that lets the editor gracefully handle
> memory management and performance. The present solution,
> the one that is completely untenable, is to load all the data into an edit
> control. On my test machines, I can load files up to ~150 MB in size into
> the control (your mileage will vary due to virtual memory restrictions and
> available RAM), but it comes at a huge cost. In Windows (and in managed
> runtimes like .NET especially), rendering content virtually is expensive.
> Memory consumed is roughly 4x the source, so rendering 150 MB of data
> costs my system ~600 MB of virtual RAM. Painful, and performance shows.
> This is why the preview mode is there. But let’s say you are dealing with a
> smaller dataset, something in the 8-10 MB range. You are still consuming
> close to 40 MB to render the data – and performance can suffer depending on
> hardware and memory available. If you need to make individual record
> changes on a batch in that size range, the process can be frustrating, since
> you may have to deal with a delay in entering data while the system
> re-buffers available memory to handle the work. I’m pretty sure
> that everyone who’s had this happen agrees that this needs to change (I’ve
> heard from 3 people recently who have been experiencing this problem and are
> trying to figure out how to make it work within existing workflows) and I’m
> sure there are others that have not spoken up or may still use MarcEdit 4.x
> for very specific tasks simply because the handling of larger files for
> individual record editing was better (which is fair, but becomes less and
> less of a reliable solution as more data becomes available in UTF8).
>
> So I’ve been thinking about this a lot over the past month, writing some
> test code and developing some wireframes, and I want to present some options
> and get some feedback. Essentially, there are two ways that I think I can
> deal with this issue. One is to provide real-time random access to
> large files [not preferred], so that the only data loaded into the editor
> would be what is held in the memory buffer. This would likely be the ideal
> solution, but it is also the most difficult to write, simply because all data
> would need to be mapped to temporary buffers, tracked, etc. Also, when
> dealing with really large files, the random access will not be immediate,
> meaning that as you move further down the file, paging down may
> become more labored. The benefit, however, is that the memory footprint
> would be much, much lower, so performance for general, individual record
> editing should improve greatly. It also would most closely resemble the
> current way that MarcEdit provides editing within the MarcEditor. All data
> would appear to be loaded in a Notepad-like interface; you’d page down and
> scroll down just as you do now. I’m not sure how this would affect Find and
> Replace, but I’m sure we could make it work.
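The random-access option depends on knowing where each record starts without holding the records themselves in memory. Below is a minimal Python sketch, assuming the file is binary MARC (ISO 2709), where each record ends with the record terminator byte 0x1D. `build_offset_index` and `read_records` are hypothetical names for illustration, not MarcEdit code, and a real editor would also need to track unsaved edits per record.

```python
import io
from typing import BinaryIO, List

# Assumption: binary MARC (ISO 2709), where every record ends with 0x1D.
RECORD_TERMINATOR = b"\x1d"


def build_offset_index(stream: BinaryIO, chunk_size: int = 1 << 16) -> List[int]:
    """Return the byte offset at which each record starts.

    The file is scanned in fixed-size chunks and only the offsets are kept,
    so memory use grows with the number of records, not the file size.
    """
    offsets = [0]
    position = 0  # absolute file offset of the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hit = chunk.find(RECORD_TERMINATOR)
        while hit != -1:
            # The next record begins immediately after the terminator.
            offsets.append(position + hit + 1)
            hit = chunk.find(RECORD_TERMINATOR, hit + 1)
        position += len(chunk)
    # If the file ends with a terminator, the last offset is end-of-file.
    if offsets[-1] == position:
        offsets.pop()
    return offsets


def read_records(stream: BinaryIO, offsets: List[int], first: int, count: int) -> List[bytes]:
    """Seek directly to record `first` and read at most `count` records."""
    records = []
    for i in range(first, min(first + count, len(offsets))):
        stream.seek(offsets[i])
        # Length is the gap to the next record; -1 reads the last one to EOF.
        length = offsets[i + 1] - offsets[i] if i + 1 < len(offsets) else -1
        records.append(stream.read(length))
    return records


if __name__ == "__main__":
    data = io.BytesIO(b"rec1\x1drec2\x1drec3\x1d")
    index = build_offset_index(data)
    print(index)                            # [0, 5, 10]
    print(read_records(data, index, 1, 2))  # [b'rec2\x1d', b'rec3\x1d']
```

Only the list of integers stays resident; moving to a later record is a `seek` plus a short read, which is why deep positions in very large files cost more to index up front but not to display.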
>
> And while the above may be more ideal, it’s not the one that I’m leaning
> towards (hence this message). I’ve been thinking a lot about how MARC
> records are represented in MarcEdit, how they are edited, etc., and I’m
> beginning to believe that when working with a large set of MARC records, the
> best solution wouldn’t be simply to provide a complete picture of all loaded
> records, but to display groups of records, with the ability to page
> through a record set. I’ve illustrated this point with some wireframes
> in the attached PowerPoint. In slide 1, I’ve provided a demo of how I think
> the editing may look (ignore the menus, icons – these are just part of my
> test code). Essentially, users would define how many records they want to
> display per “page”. I’m thinking that the sweet spot would likely be about
> 500 – but I’d make this user defined. MarcEdit can then, very quickly,
> determine how many records are in the file and then break up the record set
> into pages. MarcEdit then would only load one page of records at a time.
> This lets users quickly make individual edits to records, reduces the
> memory footprint, and greatly improves the overall experience of
> using large data files. It also takes system memory limitations completely
> out of the equation, as only a small block of records will be displayed at
> any given time.
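The paging model itself is mostly arithmetic. As a minimal sketch (Python, with hypothetical function names), the code below computes how many pages a file needs for a given page size, defaulting to the 500 records suggested above, and which records belong to a given page. Only that record range would then need to be read from disk and rendered.

```python
from typing import Tuple

DEFAULT_PAGE_SIZE = 500  # the suggested "sweet spot"; user-definable


def page_count(total_records: int, page_size: int = DEFAULT_PAGE_SIZE) -> int:
    """Number of pages needed to show every record (ceiling division)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total_records + page_size - 1) // page_size


def page_bounds(page: int, total_records: int,
                page_size: int = DEFAULT_PAGE_SIZE) -> Tuple[int, int]:
    """Return the half-open range [first, last) of 0-based record numbers on a 1-based page."""
    pages = page_count(total_records, page_size)
    if not 1 <= page <= pages:
        raise ValueError(f"page must be between 1 and {pages}")
    first = (page - 1) * page_size
    last = min(first + page_size, total_records)
    return first, last


if __name__ == "__main__":
    # 1,234 records at 500 per page split into 3 pages;
    # the last page holds only the remaining 234 records.
    print(page_count(1234))      # 3
    print(page_bounds(3, 1234))  # (1000, 1234)
```

Because the page boundaries depend only on the record count and page size, changing the page size does not require re-reading the file.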
>
> Using this system would also let me rethink how we do finds within a
> record set. At present, when you use the find tool, MarcEdit has to
> enumerate over the entire record set, and this is, for all intents and
> purposes, a very memory-intensive operation. Slow, too, if you have a lot of
> records. In this new model, I’d add a new button to the Find dialog, Find
> All (see slide 2). When Find All was used, MarcEdit would generate a
> report of all occurrences of the needle found within the record set. The
> report would show the criteria in context, with the ability to jump to the
> specific page where the text was found. Personally, I think that this could
> be a big improvement over the current find, as users would immediately be
> able to see every place a criterion occurs without having to jump
> through the entire file. Additionally, this type of design would allow me
> to start thinking about the MarcEditor itself, so that record set editing
> could be done with pages (so you could, for example, spawn a new page within
> a new MarcEditor tab so pages could be compared [see slide 3]). I think that
> this type of design could eventually lead to some fairly interesting
> enhancements, but I also recognize that it will be different. It
> represents a different way to view and edit records in MarcEdit, though
> this change really only affects how you edit records individually (since
> global editing is done differently).
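Find All could work the same way: stream through the records one at a time instead of holding the whole file in the editor, and collect every match with its record number, the page it falls on, and a snippet of surrounding text. A minimal Python sketch follows. The `Hit` and `find_all` names, the plain case-sensitive substring match, and the 20-character context window are assumptions for illustration, and the sample strings are only loosely modeled on MarcEdit's mnemonic format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    record: int   # 0-based record number within the file
    page: int     # 1-based page containing the record
    context: str  # the match plus up to `width` characters on each side


def find_all(records: Iterable[str], needle: str,
             page_size: int = 500, width: int = 20) -> List[Hit]:
    """Report every non-overlapping occurrence of `needle` with its page number."""
    if not needle:
        raise ValueError("needle must not be empty")
    hits = []
    # `records` may be a generator reading from disk, so only one record
    # has to be in memory at a time.
    for index, text in enumerate(records):
        start = text.find(needle)
        while start != -1:
            lo = max(0, start - width)
            hi = min(len(text), start + len(needle) + width)
            hits.append(Hit(index, index // page_size + 1, text[lo:hi]))
            start = text.find(needle, start + len(needle))
    return hits


if __name__ == "__main__":
    sample = [
        "=245  10$aMARC basics",
        "=100  1\\$aSmith, Jane",
        "=650  \\0$aMARC formats.$xMARC",
    ]
    for hit in find_all(sample, "MARC", page_size=2):
        print(f"page {hit.page}, record {hit.record}: ...{hit.context}...")
```

Each report line carries the page number, so a "jump to page" action only has to load that one page.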
>
> Finally, implementation – if I move down the above path – I can integrate
> the current test code into the existing MarcEdit application with little
> work. I could wrap up my update and not have to really worry about
> introducing regression errors. If I try to implement the first solution,
> all bets are off in terms of when it would be done. It would represent a
> major change to how data is handled within the program and I’d have to step
> back, re-write a lot of code and then find some willing users to try it
> because there would be a significant chance for regression errors.
>
> Anyway, that’s my idea. I think it addresses a known weakness in the
> program and makes individual record editing better, and does so without
> causing too much interruption to the user. And, if successful, it may allow me
> to slowly remove the preview mode from the MarcEditor, as it would no longer
> be needed.
>
> How can you help
>
> If you stayed with me this long and looked at the wireframes, you are
> probably wondering how you can help. Well, I’m looking for comments and
> ideas on this. MarcEdit is a very community-oriented project. I’d say that
> over 90% of the work that goes into the program is done at the community’s
> request. This is an issue that I know has been raised by members of the
> user community, and I really want the community to be involved in the
> decision. I’m definitely open to other approaches, as well as suggestions for how
> to tweak the wireframes (since I recognize that there are many places where
> usability could be improved) – but that’s kind of where I’m at right now.
>
> Thanks to everyone who made it this far,
>
> --TR
>
> ********************************
> Terry Reese
> Gray Family Chair
> for Innovative Library Services
> 121 Valley Libraries
> Corvallis, Or 97331
> tel: 541.737.6384
> ********************************
>
> ________________________________________________________________________
>
> This message comes to you via MARCEDIT-L, a Listserv(R) list for technical
> and instructional support in MarcEdit. If you wish to communicate directly
> with the list owners, write to [log in to unmask] To
> unsubscribe, send a message "SIGNOFF MARCEDIT-L" to
> [log in to unmask]