sybperl-l Archive
From: Stephen dot Sprague at morganstanley dot com
Subject: Re: BULK IN/OUT
Date: Apr 8 2002 2:55PM
Michael-
Okay. Yeah, that makes sense too.
What I should do is run some tests comparing "bcp out | bcp in" and "bcp
out -c | bcp in -c". The difference between the two should be the
formatting/unformatting overhead (all other things being equal, of
course). That might tell me something useful.
Well, maybe tonight...
Thanks again!
Stephen Sprague
On Mon, 8 Apr 2002 @ 7:35am, an entity claiming to be Michael Peppler scribbled:
mpeppl :Stephen.Sprague@morganstanley.com writes:
mpeppl : > Given that the perl interface to bulk insert is, as Michael notes, 3 to 4
mpeppl : > times slower than Sybase's command line version, I got to thinking why
mpeppl : > that is.
mpeppl : >
mpeppl : > My first thought was: why isn't there some form of binary (unformatted)
mpeppl : > bcp out and bcp in? I suspect formatting takes up a good deal of time. Say
mpeppl : > I was content to get an 'unpack format' and the binary representation of
mpeppl : > the data, would that shave significant time off the operations of getting
mpeppl : > data in and out of Sybase?
mpeppl :
mpeppl :There is some of that - however I suspect most of the time is spent in
mpeppl :the API calls (perl->c->api->c->perl), in particular in moving data
mpeppl :around (in the perl format).
mpeppl :
mpeppl :Also, for each data item perl needs to create (or reuse) an SV (scalar
mpeppl :value) data structure, which means that perl needs to move quite a bit
mpeppl :more data around than a similar C program would have to. (note that
mpeppl :this doesn't mean that there aren't potential optimizations that could
mpeppl :be implemented in the Sybase::CTlib C code - in fact there almost
mpeppl :certainly are!)
mpeppl :
mpeppl :Note also that bcp OUT is essentially the same thing as a select -
mpeppl :there are no particular optimizations on the data fetches at the API
mpeppl :level (AFAIK).
mpeppl :
mpeppl : > This would be similar to not specifying a '-c' option on Sybase's bcp
mpeppl : > utility. My use right now is just shlepping data from one server to
mpeppl : > another - I really don't need to see the data. Why should I pay for
mpeppl : > formatting it and unformatting it?
mpeppl :
mpeppl :You don't have to (as long as both servers use the same binary
mpeppl :representation of the data, of course).
mpeppl :
mpeppl :Note that the fastest way of doing that data shlepping via bcp is to
mpeppl :do something like this:
mpeppl :
mpeppl :  1. Create a view on the source server with the appropriate query
mpeppl :     parameters.
mpeppl :  2. Create a named pipe (i.e. mknod p ...).
mpeppl :  3. Run a bcp out from the view, with the output going to the pipe.
mpeppl :  4. Run a second bcp (in) to the target, reading from the pipe.
mpeppl :
mpeppl :This should work on most Unix boxes, removes the need for a temporary
mpeppl :file to hold the intermediate data, and is going to be way faster than
mpeppl :any perl program could be.
mpeppl :
mpeppl :You could automate this in a perl script fairly easily, I think.
mpeppl :
mpeppl :Michael
mpeppl :
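The named-pipe approach Michael describes could be sketched in shell roughly as follows. This is only an illustration under stated assumptions, not a tested transfer script: pipe_transfer and the PIPE variable are names made up for this example, mkfifo stands in for "mknod ... p", and mktemp -d is assumed to be available. The view, table, login and server names in the usage comment are placeholders, and password handling is left out.

```shell
#!/bin/sh
# Sketch: stream data from a writer command to a reader command through a
# named pipe, so no temporary data file is needed.
# pipe_transfer and PIPE are hypothetical names used only for this example.

pipe_transfer() {
    out_cmd=$1    # command that writes to "$PIPE" (e.g. a bcp out)
    in_cmd=$2     # command that reads from "$PIPE" (e.g. a bcp in)

    # Private scratch directory to hold the pipe.
    dir=$(mktemp -d) || return 1
    PIPE="$dir/transfer.fifo"
    export PIPE

    # mkfifo is the POSIX way to create a named pipe.
    if ! mkfifo "$PIPE"; then
        rm -rf "$dir"
        return 1
    fi

    # Start the writer in the background; it blocks until a reader opens
    # the pipe.
    sh -c "$out_cmd" &
    out_pid=$!

    # Run the reader in the foreground and remember its exit status.
    sh -c "$in_cmd"
    in_status=$?

    # Collect the writer's exit status as well.
    wait "$out_pid"
    out_status=$?

    rm -rf "$dir"

    # Succeed only if both sides succeeded.
    [ "$in_status" -eq 0 ] && [ "$out_status" -eq 0 ]
}

# Hypothetical usage (all names below are placeholders):
# pipe_transfer \
#   'bcp srcdb..copy_view out "$PIPE" -c -Usrc_login -SSRC_SERVER' \
#   'bcp dstdb..target_table in "$PIPE" -c -Udst_login -SDST_SERVER'
```

One caveat with this sketch: if the writer fails before it ever opens the pipe, the reader will block waiting for it, so a real script would probably want a timeout or an explicit check around that case.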