[prog] perl IPC

Sat Mar 27 04:13:05 EST 2004

On Fri, Mar 26, 2004 at 05:13:51PM +0000, Caroline Johnston wrote:
> Hi,
> 
> I'm trying to get a perl cgi script to use R to do some stats on user
> supplied data. I can do this fine by using backticks and calling R with a
> script to run (I have a template which I'm parsing as appropriate using
> Text::Template). The problem is, I want to run a few scripts, depending on
> what information the user wants. I could do this like:
> 
> (if user wants)
> $res1<-`R < $script1`;
> 
> (if user wants)
> $res2<-`R < $script2`;
> 
> but it takes a reasonable amount of time for R to start up, get the data
> and put it into a suitable format for analysis. I really only want to do
> this once, as all the R scripts work on the same data.

Hi,

Using pipes is heading in the right direction, but for the purpose
at hand, it's not quite going to cut it... ;)

In a nutshell: you need IPC::Open2 and a socket - at least that's what
I would recommend. As an alternative you could use just IPC::Open2 in
combination with FastCGI (www.fastcgi.com - btw, don't let the .com
intimidate you, it's free...). That would be slightly faster, but more
of a hassle to set up. If that nevertheless sounds interesting, let me
know, and I'll elaborate a bit more on that in another post.

Below, I'll discuss the 'IPC::Open2 + socket' approach. Essentially,
you want two things:

(1) a persistently running R program (let's call this the R-server)

(2) some way to communicate with the CGI script, acting as a frontend
    (R-client)

You need IPC::Open2 to persistently run the R interpreter, read its
stdout, and pipe in the commands to execute (you could also use Open3
if you want to handle stderr as well). This R-server is running as a
kind of daemon with the backend functionality supplied by the R
program. But that's only half of the story, as you also need some way
for the CGI script to connect to the server. Typically, one would use
a socket for that, which also handles (or queues up, if necessary)
multiple simultaneous requests.

Note that you _cannot_ directly run the IPC::Open2 wrapper as the CGI
script, since Open2 itself starts the program R. Terminating the
wrapper at the end of the CGI call would terminate R as well, so you'd
have gained absolutely nothing, performance-wise.

Thus, you have (at least) two processes persistently running: the R
program, and the wrapper which connects its input and output to some
rendezvous point that the clients (the CGI script) can connect to, to
send commands/requests and receive the output computed by R.

In the CGI script you'd do the usual CGI-parameter parsing on the input
side, and the templating stuff for the HTML output. The actual content
comes via the socket.

Below, you'll find absolutely bare-bones examples of the various
pieces you need: R-server.pl, R-client.pl and R.pl.

R.pl is a substitute for the real R that you're eventually going to fit
in this place. I've supplied a simple shell wrapper, allowing you to
issue any regular shell command like ls, ps, etc. via this execution
chain -- just for testing purposes, of course.

So, you could start R-server.pl in one terminal, and then run
R-client.pl from another, with commands like

./R-client.pl ls -l
./R-client.pl ps axf

For debugging purposes, you might want to have the R-server.pl stay in
the foreground, allowing you to print any debugging stuff to stdout
connected to that tty.  Eventually, you'll probably want to turn it
into a proper daemon (i.e. disconnecting standard IO from the tty,
chdir to toplevel /, fork, create a new session with setsid()...)
See the perl docs for details and example code (grep for 'daemonize'
in perlipc).
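
For orientation, the daemonizing steps listed above could look roughly
like this. It's only a sketch along the lines of the daemonize example
in perlipc; the optional log file argument is an assumption added for
illustration (it defaults to /dev/null).

```perl
use strict;
use warnings;
use POSIX qw(setsid);

# Detach the current process from its controlling terminal.
# $logfile is a hypothetical extra; by default stdout is discarded.
sub daemonize {
    my $logfile = shift || '/dev/null';

    chdir '/'                      or die "Can't chdir to /: $!";
    open STDIN,  '<',  '/dev/null' or die "Can't read /dev/null: $!";
    open STDOUT, '>>', $logfile    or die "Can't write to $logfile: $!";

    defined(my $pid = fork)        or die "Can't fork: $!";
    exit 0 if $pid;                # parent exits, child carries on

    setsid() != -1                 or die "Can't start a new session: $!";
    open STDERR, '>&', \*STDOUT    or die "Can't dup stdout: $!";
    return $$;                     # pid of the daemonized process
}
```

In R-server.pl you would call daemonize() once at startup, before
entering service_requests().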

In theory, you could also use a plain "/bin/bash" instead of the
R.pl -- but in practice, it wouldn't quite work as intended, as there's
no easy way to tell when the shell has finished sending output. So,
the while-loop will keep reading on and on... the socket will never be
closed, and the client will hang indefinitely. Essentially, this is all
that R.pl is doing: it puts an "___END___" string at the end of the
output, so one can tell in the R-server code when there's no more
output to come. The actual shell command is simply run as backticks,
so it does terminate by itself, here. This is kind of cheating - so
DO NOT copy that strategy for R ;)  (what's run persistently here
is the perl script R.pl, not /bin/bash...)

When you get to plug in R here in place of R.pl, you need to devise
some way to tell when all the output has been sent, so you can
disconnect the CGI client by closing the socket. Depending on how
R's output is structured, this may turn out to become rather tricky.
I don't know...   At least in my experience, getting side aspects
like these to work reliably is often the most challenging and
time-consuming part of the whole thing...
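
One possible approach, as a sketch only (untested against a real R):
after each chunk of R code, send an extra cat() call that makes R
print the marker itself, then read until the marker shows up. The
helper name run_r_command is made up for this example, and it assumes
R is started so that it doesn't echo its input (e.g. with --slave) and
that R's regular output ends with a newline.

```perl
use strict;
use warnings;
use IO::Handle;

# Send a chunk of R code, followed by a cat() call printing a marker
# line, and collect everything R prints before that marker.
# $writer and $reader are the handles obtained from open2().
sub run_r_command {
    my ($writer, $reader, $code, $marker) = @_;
    $marker = '___END___' unless defined $marker;

    chomp $code;
    # R sees:  <code>  then  cat("___END___\n")
    print {$writer} $code, "\n", qq{cat("$marker\\n")\n};
    $writer->flush;

    my @output;
    while (my $line = <$reader>) {
        last if $line =~ /^\Q$marker\E$/;   # marker on a line by itself
        push @output, $line;
    }
    return @output;
}
```

In do_command() of R-server.pl, the lines returned would then be
printed to the client socket. If an R command fails half-way, the
marker may never arrive, so some kind of timeout would still be needed.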

Additionally, we're making the following two assumptions:

(1) R has an option to read its commands from stdin, not only from some
interactive PTY. Probably it does (your `R < $script` calls suggest
so), but I'm not sure... If it really doesn't, you'd need an additional
Perl module that provides a pseudo-terminal, like IO::Pty.

(2) All commands are executed in a simple, strictly sequential order,
i.e. you give R a command, after which it returns some output
(comparable to typing something like 'ls' at the shell prompt). While
output is being returned, no new input can be supplied, and while it's
waiting for input, no output can be read. If you need asynchronous mode
of operation, you'd have to make the IPC::Open2-wrapper fork(), so that
there are two processes - one for handling input, and another for
handling output independently.
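
To illustrate the asynchronous variant mentioned in (2), here's a
minimal sketch of the fork() idea: the parent only writes to the
program, a forked child only reads from it. The name run_async is
invented for this example, and 'cat' merely stands in for R in the
usage line.

```perl
use strict;
use warnings;
use IO::Handle;
use IPC::Open2;

# Run $program, feed it @input from the parent, and let a forked
# child copy the program's output to the filehandle $out.
sub run_async {
    my ($program, $out, @input) = @_;

    my $prog_pid = open2(my $reader, my $writer, $program);

    defined(my $kid = fork) or die "Can't fork: $!";
    if ($kid == 0) {                 # child: handles output only
        close $writer;
        while (my $line = <$reader>) {
            print {$out} $line;
        }
        $out->flush;
        exit 0;
    }

    close $reader;                   # parent: handles input only
    print {$writer} $_ for @input;
    close $writer;                   # EOF tells the program we're done

    waitpid $kid, 0;                 # wait for the reader child
    waitpid $prog_pid, 0;            # reap the program itself
    return;
}
```

For example: run_async('cat', \*STDOUT, "one\n", "two\n");
In the real server the parent wouldn't close the writer after one
batch, of course, as R is supposed to keep running.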

A final note: in the example I'm using unix domain sockets. You could
of course also use regular TCP sockets (those with <IP-address>:<port>).
Using unix domain sockets has the advantage that you get access
control for free (via regular file permissions of the socket file -
which needs to be r/w for the UID that the CGI is running as, btw), and
that they're slightly faster.
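
If you'd rather go with TCP, the socket setup might look something
like the following sketch. The function names and the choice of
binding to 127.0.0.1 are assumptions for illustration; the port number
is whatever free port you pick.

```perl
use strict;
use warnings;
use IO::Socket;
use IO::Socket::INET;

# Server side: listen on the loopback interface only, so that just
# local processes (like the CGI script) can connect.
sub open_tcp_socket {
    my $port = shift;
    my $sock = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => SOMAXCONN,
        ReuseAddr => 1,
    ) or die "$0: cannot listen on port $port: $@\n";
    return $sock;
}

# Client side: connect to the server on the same machine.
sub connect_tcp {
    my $port = shift;
    my $sock = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 5,
    ) or die "$0: cannot connect to port $port: $@\n";
    return $sock;
}
```

These would replace open_socket() in R-server.pl and the
IO::Socket::UNIX->new() call in R-client.pl, respectively. Note that
any local user can connect to such a port, which is exactly the access
control you'd be giving up.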

Okay, I guess that's about it.  I won't comment every single line
of the example code (it's already a little late around here ;) - but
don't hesitate to ask if something is unclear...

---------- R-server.pl ---------------------------- snip ---

#!/usr/bin/perl

use strict;
use warnings;

use IO::Handle;
use IO::Socket;
use IPC::Open2;

my $sockname = "/tmp/R-server.sock";

my $program = "./R.pl";

my $reader = IO::Handle->new;
my $writer = IO::Handle->new;

open2($reader, $writer, $program);

my $socket = open_socket($sockname);
service_requests($socket);             # loop until we get killed

###

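sub open_socket {
    my $sockname = shift;
    unlink $sockname;                  # remove stale socket file, if any
    # note: "return X or die" would never reach the die, as return
    # binds tighter than "or" - so assign first, then return
    my $sock = IO::Socket::UNIX->new(
                                 Local  => $sockname,
                                 Type   => SOCK_STREAM,
                                 Listen => SOMAXCONN,
                             )
           or die "$0: cannot create socket '$sockname': $!\n";
    return $sock;
}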

sub service_requests {
    my $sock = shift;

    while (my $client = $sock->accept()) {

        my $cmd = <$client>;           # read command from socket
        do_command($cmd, $client)      # execute command, unless the
            if defined $cmd;           # client sent nothing at all
        close $client;                 # close socket to client
    }
}

sub do_command {
    my $cmd    = shift;
    my $client = shift;

    print $writer $cmd;                # send command to R

    while (my $line = <$reader>) {     # read output from R
        last if $line =~ /^___END___/; # need some way to exit...
        print $client $line;           # pass output to socket
    }
}

---------- R-client.pl (-> CGI) ------------------- snip ---

#!/usr/bin/perl

use strict;
use warnings;

use IO::Socket;

my $sockname = "/tmp/R-server.sock";

my $server = IO::Socket::UNIX->new(Peer    => $sockname,
                                   Type    => SOCK_STREAM,
                                   Timeout => 5 )
             or die "$0: ERROR: $sockname: $!\n";

print $server "@ARGV\n";        # send command to socket

while (my $line = <$server>) {  # read result from socket
    print $line;                # and write to stdout
}

close $server;

---------- R.pl (substitute for the real R) ------- snip ---

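#!/usr/bin/perl

use strict;
use warnings;

$| = 1; # autoflush

while (my $cmd = <>) {             # read command from stdin
    my $output = `$cmd`;           # execute
    $output = "" unless defined $output;   # command could not be run
    $output .= "\n"                # make sure the marker starts a line
        if length $output && $output !~ /\n\z/;
    print $output, "___END___\n";  # write to stdout
}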