sybperl-l Archive

From: Stephen dot Sprague at morganstanley dot com
Subject: Re: BULK IN/OUT
Date: Apr 8 2002 2:55PM

Okay. Yeah, that makes sense too.

What I should do is run some tests comparing "bcp out | bcp in" and
"bcp out -c | bcp in -c"; the difference should be the
formatting/unformatting overhead (all other things being equal, of
course). That might tell me something useful.

Well, maybe tonight...

Thanks again!
Stephen Sprague

On Mon, 8 Apr 2002 @ 7:35am, an entity claiming to be Michael Peppler scribbled:

mpeppl writes:
mpeppl : > Given that the perl interface to bulk insert is, as Michael notes, 3 to 4
mpeppl : > times slower than Sybase's command line version, I got to thinking why
mpeppl : > that is.
mpeppl : >
mpeppl : > My first thought was: why isn't there some form of binary (unformatted)
mpeppl : > bcp out and bcp in? I suspect formatting takes up a good deal of time. Say
mpeppl : > I was content to get an 'unpack format' and the binary representation of
mpeppl : > the data; would that shave significant time off the operations of getting
mpeppl : > data in and out of Sybase?
mpeppl :
mpeppl :There is some of that - however I suspect most of the time is spent in
mpeppl :the API calls (perl->c->api->c->perl), in particular in moving data
mpeppl :around (in the perl format).
mpeppl :
mpeppl :Also, for each data item perl needs to create (or reuse) an SV (scalar
mpeppl :value) data structure, which means that perl needs to move quite a bit
mpeppl :more data around than a similar C program would have to. (note that
mpeppl :this doesn't mean that there aren't potential optimizations that could
mpeppl :be implemented in the Sybase::CTlib C code - in fact there almost
mpeppl :certainly are!)
mpeppl :
mpeppl :Note also that bcp OUT is essentially the same thing as a select -
mpeppl :there are no particular optimizations on the data fetches at the API
mpeppl :level (AFAIK).
mpeppl :
mpeppl : > This would be similar to not specifying a '-c' option on Sybase's bcp
mpeppl : > utility. My use right now is just shlepping data from one server to
mpeppl : > another - I really don't need to see the data. Why should I pay for
mpeppl : > formatting it and unformatting it?
mpeppl :
mpeppl :You don't have to (as long as both servers use the same binary
mpeppl :representation of the data, of course).
mpeppl :
mpeppl :Note that the fastest way of doing that data shlepping via bcp is to
mpeppl :do something like this:
mpeppl :
mpeppl :1. Create a view on the source server with the appropriate query
mpeppl :   parameters.
mpeppl :2. Create a named pipe (i.e. mknod p ...).
mpeppl :3. Run a bcp out from the view, with the output going to the pipe.
mpeppl :4. Run a second bcp (in) to the target, reading from the pipe.
mpeppl :
mpeppl :This should work on most Unix boxes, removes the need for a temporary
mpeppl :file to hold the intermediate data, and is going to be way faster than
mpeppl :any perl program could be.
mpeppl :
mpeppl :You could automate this in a perl script fairly easily, I think.
mpeppl :
mpeppl :Michael
mpeppl :
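
The named-pipe approach described in the quoted message could be sketched
roughly as follows. This is only an illustration, not anything posted in
the thread: the function name pipe_transfer is made up, mkfifo is used here
in place of mknod, and the server names, logins, database objects and the
-n (native format) flag in the commented usage are placeholders and
assumptions that would need checking against the local bcp documentation.

```shell
#!/bin/sh
# Sketch: connect a writer command and a reader command through a named
# pipe (FIFO), so no temporary file holds the intermediate data.
#
# pipe_transfer WRITER READER   (hypothetical helper, not part of Sybase)
#   WRITER and READER are shell command strings; each receives the FIFO
#   path as "$1".
pipe_transfer() {
    writer=$1
    reader=$2

    # Private scratch directory so the FIFO name cannot collide.
    dir=$(mktemp -d) || return 1
    fifo="$dir/transfer.fifo"
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }

    # Start the writer in the background; it blocks until a reader
    # opens the other end of the FIFO.
    sh -c "$writer" sh "$fifo" &
    writer_pid=$!

    # Run the reader in the foreground and remember its exit status.
    reader_status=0
    sh -c "$reader" sh "$fifo" || reader_status=$?

    # If the reader failed, stop a writer that may still be blocked.
    if [ "$reader_status" -ne 0 ]; then
        kill "$writer_pid" 2>/dev/null || true
    fi
    writer_status=0
    wait "$writer_pid" || writer_status=$?

    rm -rf "$dir"
    [ "$writer_status" -eq 0 ] && [ "$reader_status" -eq 0 ]
}

# Assumed usage with bcp (all names and credentials are placeholders):
#
#   pipe_transfer \
#     'bcp srcdb..my_view   out "$1" -n -S SRC_SERVER -U login -P password' \
#     'bcp dstdb..my_table  in  "$1" -n -S DST_SERVER -U login -P password'
```

The sketch has no timeout: if either command exits before opening the
pipe, the other side stays blocked, so a real script would want to guard
against that.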