This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: workflow idiom to compare zip/tgz with folder subtree


Eliot Moss wrote:
> There are also various backup tools based on rsync and compression.
> One of these is called duplicity, and it supports encryption as
> well.  But I suspect there are a number of these and that you can
> find one that matches your task ...

Andrey Repin wrote:
> It seems he needs comparison over preservation.  I don't know of any
> backup tools that offer differential view against backup content.
> Not that I know many backup tools, though...

Warren Young suggested: fossil

Thank you all.  I've perused and pondered.  There is a key constraint
that I neglected to mention.  I am shuttling incremental work back and
forth between two locations using disc.  At one of the sites, the only
possible tools are M$ Office and a snapshot of Cygwin.  The full copy
of the working hierarchy exists at the two sites (almost identical).
The more restrictive site is the authoritative home of the historical
snapshots, though I may have mini-snapshots at the alternative site.
The comparison of the working file hierarchy with snapshots lets me
vet what needs to be shuttled back and forth; the majority of the
differences will not be relevant as the hierarchy exists at both sites.
I use the same archival scheme for local snapshots and for shuttling
work between sites, though the content is not the same (I won't take
an entire local snapshot with me on disc most of the time).

Most of the files are not software, though parallels can be drawn:
long SQL scripts, Matlab scripts, images, data files, VBA, other
Matlab files, text files, LaTeX files, and M$ Office files
(Access, Excel, Word, Powerpoint, PST).  This is not a development
environment, it is an analysis environment (with code hackery to that
end).  However, the evolution of files and version control
requirements probably overlap (I can only guess as I've never worked
in a regulated code development environment, relying instead on my own
ad hoc snapshots & incrementals).  One difference from the days when I
wrote "real" (compiled) code is that I'm not just archiving source
code; some of the files are images, databases, etc., and take up a lot
of space.  I end up creating incrementals a lot more, or simply
leaving the big files out of the snapshot routine (relying on very old
snapshots).  My analysis strategy is strongly influenced by this; I
try to avoid computational approaches that rely on intermediately
generated data that need to be archived.  As much as possible,
everything should be quickly generatable from raw client input data
files.  Been able to get away with that so far, with a great deal of
effort.

I rely a lot on bash hackery, even though I'm no graybeard.  "find",
"diff -qr", and "xargs" are indispensable.  Using vim window
splitting, it is very efficient to browse the diff output and warp to
discrepant text files, and even delve into zip files to open their
contents, and then use vimdiff to cruise the discrepancies.  The
synergy between vim & bash is (to me) like magic: scripting up copies
and such and piping them to bash.  For the most part, however, you
need to unpack the snapshot (or rebuild it from incrementals).  Andrey
is right, the main thing causing me to put the question out there is
the desire to avoid this.
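
To make the goal concrete, here is a rough sketch of the kind of
"diff -qr against an archive" that would avoid the unpack step.  It is
not an existing tool, just an illustration.  It assumes Python 3 is
part of the Cygwin snapshot (it may not be), and that the member names
inside the zip/tgz are relative to the same root as the working
folder.  The function names (archive_digests, tree_digests, compare)
are made up for this example.  Archive members are read as streams and
hashed in memory, so nothing is extracted to disk.

```python
import hashlib
import os
import posixpath
import tarfile
import zipfile


def _digest(stream, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a binary file-like object."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def archive_digests(archive_path):
    """Map each regular file in a zip or tar archive to its digest."""
    digests = {}
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                with zf.open(info) as member:
                    digests[posixpath.normpath(info.filename)] = _digest(member)
    else:
        # "r:*" lets tarfile detect gzip/bzip2/xz compression itself.
        with tarfile.open(archive_path, "r:*") as tf:
            for info in tf:
                if info.isfile():
                    member = tf.extractfile(info)
                    digests[posixpath.normpath(info.name)] = _digest(member)
    return digests


def tree_digests(root):
    """Map each file under root (path relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                digests[rel] = _digest(f)
    return digests


def compare(archive_path, root):
    """Return (only_in_archive, only_in_tree, differing) as sorted lists."""
    in_archive = archive_digests(archive_path)
    in_tree = tree_digests(root)
    only_in_archive = sorted(set(in_archive) - set(in_tree))
    only_in_tree = sorted(set(in_tree) - set(in_archive))
    differing = sorted(
        path
        for path in set(in_archive) & set(in_tree)
        if in_archive[path] != in_tree[path]
    )
    return only_in_archive, only_in_tree, differing
```

Calling compare("snapshot.zip", "work") (both names are placeholders)
would give three lists of relative paths, which could be printed one
per line and browsed in vim the same way as "diff -qr" output.  It
only reports which files differ; looking at the actual differences
would still mean opening the archived copy.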

I noticed that fossil & cvs are part of cygwin.  I will have to bite
the bullet & try a few baby steps at some point.


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

