PEPPLER.ORG
Michael Peppler
Sybase Consulting

sybperl-l Archive


From: Stephen dot Sprague at morganstanley dot com
Subject: Re: BULK IN/OUT
Date: Apr 8 2002 2:55PM

Michael-
Okay. Yeah, that makes sense too.

What I should do is run some tests comparing "bcp out | bcp in" and "bcp
out -c | bcp in -c" and the differences should be the
formatting/unformatting overhead (all other things being equal of
course.) That might tell me something useful.

Well, maybe tonight...

Thanks again!
Stephen Sprague




On Mon, 8 Apr 2002 @ 7:35am, an entity claiming to be Michael Peppler scribbled:

mpeppl :Stephen.Sprague@morganstanley.com writes:
mpeppl : > Given that the perl interface to bulk insert is as Michael notes 3 to 4
mpeppl : > times slower than Sybase's command line version I got to thinking why
mpeppl : > that is.
mpeppl : >
mpeppl : > My first thoughts were why isn't there some form of binary (unformatted)
mpeppl : > bcp out and bcp in? I suspect formatting takes up a good deal of time. Say
mpeppl : > I was content to get an 'unpack format' and the binary representation of
mpeppl : > the data, would that shave significant time off the operations of getting
mpeppl : > data in and out of sybase?
mpeppl :
mpeppl :There is some of that - however I suspect most of the time is spent in
mpeppl :the API calls (perl->c->api->c->perl), in particular in moving data
mpeppl :around (in the perl format).
mpeppl :
mpeppl :Also, for each data item perl needs to create (or reuse) an SV (scalar
mpeppl :value) data structure, which means that perl needs to move quite a bit
mpeppl :more data around than a similar C program would have to. (note that
mpeppl :this doesn't mean that there aren't potential optimizations that could
mpeppl :be implemented in the Sybase::CTlib C code - in fact there almost
mpeppl :certainly are!)
mpeppl :
mpeppl :Note also that bcp OUT is essentially the same thing as a select -
mpeppl :there are no particular optimizations on the data fetches at the API
mpeppl :level (AFAIK).
mpeppl :
mpeppl : > this would be similar to not specifying a '-c' option on Sybase's bcp
mpeppl : > utility. My use right now is just schlepping data from one server to
mpeppl : > another - I really don't need to see the data. Why should I pay for
mpeppl : > formatting it and unformatting it?
mpeppl :
mpeppl :You don't have to (as long as both servers use the same binary
mpeppl :representation of the data, of course).
mpeppl :
mpeppl :Note that the fastest way of doing that data shlepping via bcp is to
mpeppl :do something like this:
mpeppl :
mpeppl :1. Create a view on the source server with the appropriate query
mpeppl :   parameters.
mpeppl :2. Create a named pipe (i.e. mknod p ...)
mpeppl :3. Run a bcp out from the view, with the output going to the pipe.
mpeppl :4. Run a second bcp (in) to the target, reading from the pipe.
mpeppl :
mpeppl :This should work on most Unix boxes, removes the need for a temporary
mpeppl :file to hold the intermediate data, and is going to be way faster than
mpeppl :any perl program could be.
mpeppl :
mpeppl :You could automate this in a perl script fairly easily, I think.
mpeppl :
mpeppl :Michael
mpeppl :
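
The named-pipe steps in the quoted message can be sketched roughly as below. This is only an illustration, not a script from the thread: the helper name fifo_copy, the use of mkfifo and mktemp in place of mknod, and every server, login, database, view, and table name in the commented bcp example are assumptions. The -n (native format) flag is shown on the guess that both servers share the same binary representation; use -c on both sides otherwise.

```shell
#!/bin/sh
# fifo_copy WRITER READER   (hypothetical helper, not part of bcp)
# WRITER and READER are shell command strings. Each one is run with the
# path of a freshly created named pipe available as "$1".
fifo_copy() {
    writer=$1
    reader=$2

    # Private temporary directory so the pipe name cannot collide.
    dir=$(mktemp -d) || return 1
    fifo="$dir/bcp.fifo"
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }

    # Start the writer in the background. Opening the pipe for writing
    # blocks until a reader opens the other end.
    sh -c "$writer" sh "$fifo" &
    writer_pid=$!

    # Run the reader in the foreground; it drains the pipe as data arrives.
    sh -c "$reader" sh "$fifo"
    reader_status=$?

    # If the reader failed, the writer may still be blocked on the pipe.
    if [ "$reader_status" -ne 0 ]; then
        kill "$writer_pid" 2>/dev/null
    fi

    wait "$writer_pid"
    writer_status=$?

    rm -rf "$dir"
    [ "$writer_status" -eq 0 ] && [ "$reader_status" -eq 0 ]
}

# Example invocation (commented out; all names are placeholders):
# fifo_copy \
#   'bcp srcdb..my_view out "$1" -n -S SRC_SERVER -U user -P password' \
#   'bcp dstdb..my_table in "$1" -n -S DST_SERVER -U user -P password'
```

No intermediate file is written: the second bcp consumes rows as the first produces them, so disk space for the full extract is never needed.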