PEPPLER.ORG
Michael Peppler
Sybase Consulting
Menu
Home
Sybase on Linux
Install Guide for Sybase on Linux
General Sybase Resources
General Perl Resources
Freeware
Sybperl
Sybase::Simple
DBD::Sybase
BCP Tool
Bug Tracker
Mailing List Archive
Downloads Directory
FAQs
Sybase on Linux FAQ
Sybperl FAQ
Personal
Michael Peppler's resume

sybperl-l Archive

Up    Prev    Next    

From: Michael Peppler <mpeppler at peppler dot org>
Subject: Re: BULK IN/OUT
Date: Apr 8 2002 2:35PM

Stephen.Sprague@morganstanley.com writes:
 > Given that the perl interface to bulk insert is as Michael notes 3 to  4
 > times slower than Sybase's command line version I got  to  thinking  why
 > that is.
 > 
 > My first thoughts were why isn't there some form of binary (unformatted)
 > bcp out and bcp in? I suspect formatting takes up good deal of time. Say
 > I was content to get an 'unpack format' and the binary representation of
 > the data would that shave significant time off the operations of getting
 > data in and out of sybase?

There is some of that - however I suspect most of the time is spent in
the API calls (perl->c->api->c->perl), in particular in moving data
aroung (in the perl format).

Also, for each data item perl needs to create (or reuse) an SV (scalar
value) data structure, which means that perl needs to move quite a bit
more data around than a similar C program would have to. (note that
this doesn't mean that there aren't potential optimizations that could
be implemented in the Sybase::CTlib C code - in fact there almost
certainly are!)

Note also that bcp OUT is essentially the same thing as a select -
there are no particular optimizations on the data fetches at the API
level (AFAIK).

 > this would be similiar to not specifying a '-c' option on  Sybase's  bcp
 > utility. My use right now is just  shelping  data  from  one  server  to
 > another - I really don't need to see the data.  Why  should  I  pay  for
 > formatting it and unformatting it?

You don't have to (as long as both servers use the same binary
representation of the data, of course).

Note that the fastest way of doing that data shlepping via bcp is to
do something like this:

create a view on the source server with the appropriate query
parameters.
Create a named pipe (ie mknod p ...)
run a bcp out from the view, with the output going to the pipe.
run a second bcp (in) to the target, reading from the pipe.

This should work on most Unix boxes, removes the need for a temporary
file to hold the intermediate data, and is going to be way faster than
any perl program could be.

You could automate this in a perl script fairly easily, I think.

Michael
-- 
Michael Peppler                              Data Migrations, Inc.
mpeppler@peppler.org           *or*          mpeppler@mbay.net
http://www.mbay.net/~mpeppler
International Sybase User Group: http://www.isug.com