mirror of
https://github.com/MariaDB/server.git
synced 2025-01-18 21:12:26 +01:00
453 lines
24 KiB
HTML
453 lines
24 KiB
HTML
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
|
|
<html>
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<meta name="GENERATOR" content="Mozilla/4.76 [en] (X11; U; FreeBSD 4.3-RELEASE i386) [Netscape]">
|
|
</head>
|
|
<body>
|
|
|
|
<center>
|
|
<h1>
|
|
Client/Server Interface for Berkeley DB</h1></center>
|
|
|
|
<center><i>Susan LoVerso</i>
|
|
<br><i>sue@sleepycat.com</i>
|
|
<br><i>Rev 1.3</i>
|
|
<br><i>1999 Nov 29</i></center>
|
|
|
|
<p>We provide an interface allowing client/server access to Berkeley DB.
|
|
Our goal is to provide a client and server library to allow users to separate
|
|
the functionality of their applications yet still have access to the full
|
|
benefits of Berkeley DB. The goal is to provide a totally seamless
|
|
interface with minimal modification to existing applications as well.
|
|
<p>The client/server interface for Berkeley DB can be broken up into several
|
|
layers. At the lowest level there is the transport mechanism to send
|
|
out the messages over the network. Above that layer is the messaging
|
|
layer to interpret what comes over the wire, and bundle/unbundle message
|
|
contents. The next layer is Berkeley DB itself.
|
|
<p>The transport layer uses ONC RPC (RFC 1831) and XDR (RFC 1832).
|
|
We declare our message types and operations supported by our program and
|
|
the RPC library and utilities pretty much take care of the rest.
|
|
The
|
|
<i>rpcgen</i> program generates all of the low level code needed.
|
|
We need to define both sides of the RPC.
|
|
<br>
|
|
<h2>
|
|
<a NAME="DB Modifications"></a>DB Modifications</h2>
|
|
To achieve the goal of a seamless interface, it is necessary to impose
|
|
a constraint on the application. That constraint is simply that all database
|
|
access must be done through an open environment. I.e. this model
|
|
does not support standalone databases. The reason for this constraint
|
|
is so that we have an environment structure internally to store our connection
|
|
to the server. Imposing this constraint means that we can provide
|
|
the seamless interface just by adding a single environment method: <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>.
|
|
<p>The planned interface for this method is:
|
|
<pre>DBENV->set_rpc_server(dbenv, /* DB_ENV structure */
|
|
hostname /* Host of server */
|
|
cl_timeout, /* Client timeout (sec) */
|
|
srv_timeout,/* Server timeout (sec) */
|
|
flags); /* Flags: unused */</pre>
|
|
This new method takes the hostname of the server, establishes our connection
|
|
and an environment on the server. If a server timeout is specified,
|
|
then we send that to the server as well (and the server may or may not
|
|
choose to use that value). This timeout is how long the server will
|
|
allow the environment to remain idle before declaring it dead and releasing
|
|
resources on the server. The pointer to the connection is stored
|
|
on the client in the DBENV structure and is used by all other methods to
|
|
figure out with whom to communicate. If a client timeout is specified,
|
|
it indicates how long the client is willing to wait for a reply from the
|
|
server. If the values are 0, then defaults are used. Flags
|
|
is currently unused, but exists because we always need to have a placeholder
|
|
for flags and it would be used for specifying authentication desired (were
|
|
we to provide an authentication scheme at some point) or other uses not
|
|
thought of yet!
|
|
<p>This client code is part of the monolithic DB library. The user
|
|
accesses the client functions via a new flag to <a href="../docs/api_c/db_env_create.html">db_env_create()</a>.
|
|
That flag is DB_CLIENT. By using this flag the user indicates they
|
|
want to have the client methods rather than the standard methods for the
|
|
environment. Also by issuing this flag, the user needs to connect
|
|
to the server via the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>
|
|
method.
|
|
<p>We need two new fields in the <i>DB_ENV </i>structure. One is
|
|
the socket descriptor to communicate to the server, the other field is
|
|
the client identifier the server gives to us. The <i>DB, </i>and<i>
|
|
DBC </i>only need one additional field, the client identifier. The
|
|
<i>DB_TXN</i>
|
|
structure does not need modification, we are overloading the <i>txn_id
|
|
</i>field.
|
|
<h2>
|
|
Issues</h2>
|
|
We need to figure out what to do in case of client and server crashes.
|
|
Both the client library and the server program are stateful. They
|
|
both consume local resources during the lifetime of the connection.
|
|
Should one end drop that connection, the other side needs to release those
|
|
resources.
|
|
<p>If the server crashes, then the client will get an error back.
|
|
I have chosen to implement time-outs on the client side, using a default
|
|
or allowing the application to specify one through the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>
|
|
method. Either the current operation will time-out waiting for the
|
|
reply or the next operation called will time out (or get back some other
|
|
kind of error regarding the server's non-existence). In any case,
|
|
if the client application gets back such an error, it should abort any
|
|
open transactions locally, close any databases, and close its environment.
|
|
It may then decide to retry to connect to the server periodically or whenever
|
|
it comes back. If the last operation a client did was a transaction
|
|
commit that did not return or timed out from the server, the client cannot
|
|
determine if the transaction was committed or not but must release the
|
|
local transaction resources. Once the server is back up, recovery must
|
|
be run on the server. If the transaction commit completed on
|
|
the server before the crash, then the operation is redone, if the transaction
|
|
commit did not get to the server, the pieces of the transaction are undone
|
|
on recover. The client can then re-establish its connection and begin
|
|
again. This is effectively like beginning over. The client
|
|
cannot use ID's from its previous connection to the server. However,
|
|
if recovery is run, then consistency is assured.
|
|
<p>If the client crashes, the server needs to somehow figure this out.
|
|
The server is just sitting there waiting for a request to come in.
|
|
A server must be able to time-out a client. Similar to ftpd, if a
|
|
connection is idle for N seconds, then the server decides the client is
|
|
dead and releases that client's resources, aborting any open transactions,
|
|
closing any open databases and environments. The server timing
|
|
out a client is not a trivial issue however. The generated function
|
|
for the server just calls <i>svc_run()</i>. The server code I write
|
|
contains procedures to do specific things. We do not have access
|
|
to the code calling <i>select()</i>. Timing out the select is not
|
|
good enough even if we could do so. We want to time-out idle environments,
|
|
not simply cause a time-out if the server is idle a while. See the
|
|
discussion of the <a href="#The Server Program">server program</a> for
|
|
a description of how we accomplish this.
|
|
<p>Since rpcgen generates the main() function of the server, I do not yet
|
|
know how we are going to have the server multi-threaded or multi-process
|
|
without changing the generated code. The RPC book indicates that
|
|
the only way to accomplish this is through modifying the generated code
|
|
in the server. <b>For the moment we will ignore this issue while
|
|
we get the core server working, as it is only a performance issue.</b>
|
|
<p>We do not do any security or authentication. Someone could get
|
|
the code and modify it to spoof messages, trick the server, etc.
|
|
RPC has some amount of authentication built into it. I haven't yet
|
|
looked into it much to know if we want to use it or just point a user at
|
|
it. The changes to the client code are fairly minor, the changes
|
|
to our server procs are fairly minor. We would have to add code to
|
|
a <i>sed</i> script or <i>awk</i> script to change the generated server
|
|
code (yet again) in the dispatch routine to perform authentication.
|
|
<p>We will need to get an official program number from Sun. We can
|
|
get this by sending mail to <i>rpc@sun.com</i> and presumably at some point
|
|
they will send us back a program number that we will encode into our XDR
|
|
description file. Until we release this we can use a program number
|
|
in the "user defined" number space.
|
|
<br>
|
|
<h2>
|
|
<a NAME="The Server Program"></a>The Server Program</h2>
|
|
The server is a standalone program that the user builds and runs, probably
|
|
as a daemon like process. This program is linked against the Berkeley
|
|
DB library and the RPC library (which is part of the C library on my FreeBSD
|
|
machine, others may have/need <i>-lrpclib</i>). The server basically
|
|
is a slave to the client process. All messages from the client are
|
|
synchronous and two-way. The server handles messages one at a time,
|
|
and sends a reply back before getting another message. There are
|
|
no asynchronous messages generated by the server to the client.
|
|
<p>We have made a choice to modify the generated code for the server.
|
|
The changes will be minimal, generally calling functions we write, that
|
|
are in other source files. The first change is adding a call to our
|
|
time-out function as described below. The second change is changing
|
|
the name of the generated <i>main()</i> function to <i>__dbsrv_main()</i>,
|
|
and adding our own <i>main()</i> function so that we can parse options,
|
|
and set up other initialization we require. I have a <i>sed</i> script
|
|
that is run from the distribution scripts that massages the generated code
|
|
to make these minor changes.
|
|
<p>Primarily the code needed for the server is the collection of the specified
|
|
RPC functions. Each function receives the structure indicated, and
|
|
our code takes out what it needs and passes the information into DB itself.
|
|
The server needs to maintain a translation table for identifiers that we
|
|
pass back to the client for the environment, transaction and database handles.
|
|
<p>The table that the server maintains, assuming one client per server
|
|
process/thread, should contain the handle to the environment, database
|
|
or transaction, a link to maintain parent/child relationships between transactions,
|
|
or databases and cursors, this handle's identifier, a type so that we can
|
|
error if the client passes us a bad id for this call, and a link to this
|
|
handle's environment entry (for time out/activity purposes). The
|
|
table contains, in entries used by environments, a time-out value and an
|
|
activity time stamp. Its use is described below for timing out idle
|
|
clients.
|
|
<p>Here is how we time out clients in the server. We have to modify
|
|
the generated server code, but only to add one line during the dispatch
|
|
function to run the time-out function. The call is made right before
|
|
the return of the dispatch function, after the reply is sent to the client,
|
|
so that client's aren't kept waiting for server bookkeeping activities.
|
|
This time-out function then runs every time the server processes a request.
|
|
In the time-out function we maintain a time-out hint that is the youngest
|
|
environment to time-out. If the current time is less than the hint
|
|
we know we do not need to run through the list of open handles. If
|
|
the hint is expired, then we go through the list of open environment handles,
|
|
and if they are past their expiration, then we close them and clean up.
|
|
If they are not, we set up the hint for the next time.
|
|
<p>Each entry in the open handle table has a pointer back to its environment's
|
|
entry. Every operation within this environment can then update the
|
|
single environment activity record. Every environment can have a
|
|
different time-out. The <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server
|
|
</a>call
|
|
takes a server time-out value. If this value is 0 then a default
|
|
(currently 5 minutes) is used. This time-out value is only a hint
|
|
to the server. It may choose to disregard this value or set the time-out
|
|
based on its own implementation.
|
|
<p>For completeness, the flaws of this time-out implementation should be
|
|
pointed out. First, it is possible that a client could crash with
|
|
open handles, and no other requests come in to the server. Therefore
|
|
the time-out function never gets run and those resources are not released
|
|
(until a request does come in). Similarly, this time-out is not exact.
|
|
The time-out function uses its hint and if it computes a hint on one run,
|
|
an earlier time-out might be created before that time-out expires.
|
|
This issue simply yields a handle that doesn't get released until that
|
|
original hint expires. To illustrate, consider that at the time that
|
|
the time-out function is run, the youngest time-out is 5 minutes in the
|
|
future. Soon after, a new environment is opened that has a time-out
|
|
of 1 minute. If this environment becomes idle (and other operations
|
|
are going on), the time-out function will not release that environment
|
|
until the original 5 minute hint expires. This is not a problem since
|
|
the resources will eventually be released.
|
|
<p>On a similar note, if a client crashes during an RPC, our reply generates
|
|
a SIGPIPE, and our server crashes unless we catch it. Using <i>signal(SIGPIPE,
|
|
SIG_IGN) </i>we can ignore it, and the server will go on. This is
|
|
a call in our <i>main()</i> function that we write. Eventually
|
|
this client's handles would be timed out as described above. We need
|
|
this only for the unfortunate window of a client crashing during the RPC.
|
|
<p>The options below are primarily for control of the program itself,.
|
|
Details relating to databases and environments should be passed from the
|
|
client to the server, since the server can serve many clients, many environments
|
|
and many databases. Therefore it makes more sense for the client
|
|
to set the cache size of its own environment, rather than setting a default
|
|
cachesize on the server that applies as a blanket to any environment it
|
|
may be called upon to open. Options are:
|
|
<ul>
|
|
<li>
|
|
<b>-t </b> to set the default time-out given to an environment.</li>
|
|
|
|
<li>
|
|
<b>-T</b> to set the maximum time-out allowed for the server.</li>
|
|
|
|
<li>
|
|
<b>-L</b> to log the execution of the server process to a specified file.</li>
|
|
|
|
<li>
|
|
<b>-v</b> to run in verbose mode.</li>
|
|
|
|
<li>
|
|
<b>-M</b> to specify the maximum number of outstanding child server
|
|
processes/threads we can have at any given time. The default is 10.
|
|
<b>[We
|
|
are not yet doing multiple threads/processes.]</b></li>
|
|
</ul>
|
|
|
|
<h2>
|
|
The Client Code</h2>
|
|
The client code contains all of the supported functions and methods used
|
|
in this model. There are several methods in the <i>__db_env
|
|
</i>and
|
|
<i>__db</i>
|
|
structures that currently do not apply, such as the callbacks. Those
|
|
fields that are not applicable to the client model point to NULL to notify
|
|
the user of their error. Some method functions remain unchanged,
|
|
as well such as the error calls.
|
|
<p>The client code contains each method function that goes along with the
|
|
<a href="#Remote Procedure Calls">RPC
|
|
calls</a> described elsewhere. The client library also contains its
|
|
own version of <a href="../docs/api_c/env_create.html">db_env_create()</a>,
|
|
which does not result in any messages going over to the server (since we
|
|
do not yet know what server we are talking to). This function sets
|
|
up the pointers to the correct client functions.
|
|
<p>All of the method functions that handle the messaging have a basic flow
|
|
similar to this:
|
|
<ul>
|
|
<li>
|
|
Local arg parsing that may be needed</li>
|
|
|
|
<li>
|
|
Marshalling the message header and the arguments we need to send to the
|
|
server</li>
|
|
|
|
<li>
|
|
Sending the message</li>
|
|
|
|
<li>
|
|
Receiving a reply</li>
|
|
|
|
<li>
|
|
Unmarshalling the reply</li>
|
|
|
|
<li>
|
|
Local results processing that may be needed</li>
|
|
</ul>
|
|
|
|
<h2>
|
|
Generated Code</h2>
|
|
Almost all of the code is generated from a source file describing the interface
|
|
and an <i>awk</i> script. This awk script generates six (6)
|
|
files for us. It also modifies one. The files are:
|
|
<ol>
|
|
<li>
|
|
Client file - The C source file created containing the client code.</li>
|
|
|
|
<li>
|
|
Client template file - The C template source file created containing interfaces
|
|
for handling client-local issues such as resource allocation, but with
|
|
a consistent interface with the client code generated.</li>
|
|
|
|
<li>
|
|
Server file - The C source file created containing the server code.</li>
|
|
|
|
<li>
|
|
Server template file - The C template source file created containing interfaces
|
|
for handling server-local issues such as resource allocation, calling into
|
|
the DB library but with a consistent interface with the server code generated.</li>
|
|
|
|
<li>
|
|
XDR file - The XDR message description file created.</li>
|
|
|
|
<li>
|
|
Server sed file - A sed script that contains commands to apply to the server
|
|
procedure file (i.e. the real source file that the server template file
|
|
becomes) so that minor interface changes can be consistently and easily
|
|
applied to the real code.</li>
|
|
|
|
<li>
|
|
Server procedure file - This is the file that is modified by the sed script
|
|
generated. It originated from the server template file.</li>
|
|
</ol>
|
|
The awk script reads a source file, <i>db_server/rpc.src </i>that describes
|
|
each operation and what sorts of arguments it takes and what it returns
|
|
from the server. The syntax of the source file describes the interface
|
|
to that operation. There are four (4) parts to the syntax:
|
|
<ol>
|
|
<li>
|
|
<b>BEGIN</b> <b><i>function version# codetype</i></b> - begins a new functional
|
|
interface for the given <b><i>function</i></b>. Each function has
|
|
a <b><i>version number</i></b>, currently all of them are at version number
|
|
one (1). The <b><i>code type</i></b> indicates to the awk script
|
|
what kind of code to generate. The choices are:</li>
|
|
|
|
<ul>
|
|
<li>
|
|
<b>CODE </b>- Generate all code, and return a status value. If specified,
|
|
the client code will simply return the status to the user upon completion
|
|
of the RPC call.</li>
|
|
|
|
<li>
|
|
<b>RETCODE </b>- Generate all code and call a return function in the client
|
|
template file to deal with client issues or with other returned items.
|
|
If specified, the client code generated will call a function of the form
|
|
<i>__dbcl_<name>_ret()
|
|
</i>where
|
|
<name> is replaced with the function name given here. This function
|
|
is placed in the template file because this indicates that something special
|
|
must occur on return. The arguments to this function are the same
|
|
as those for the client function, with the addition of the reply message
|
|
structure.</li>
|
|
|
|
<li>
|
|
<b>NOCLNTCODE - </b>Generate XDR and server code, but no corresponding
|
|
client code. (This is used for functions that are not named the same thing
|
|
on both sides. The only use of this at the moment is db_env_create
|
|
and db_create. The environment create call to the server is actually
|
|
called from the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>
|
|
method. The db_create code exists elsewhere in the library and we
|
|
modify that code for the client call.)</li>
|
|
</ul>
|
|
|
|
<li>
|
|
<b>ARG <i>RPC-type C-type varname [list-type]</i></b>- each line of this
|
|
describes an argument to the function. The argument is called <b><i>varname</i></b>.
|
|
The <b><i>C-type</i></b> given is what it should look like in the C code
|
|
generated, such as <b>DB *, u_int32_t, const char *</b>. The
|
|
<b><i>RPC-type</i></b>
|
|
is an indication about how the RPC request message should be constructed.
|
|
The RPC-types allowed are described below.</li>
|
|
|
|
<li>
|
|
<b>RET <i>RPC-type C-type varname [list-type]</i></b>- each line of this
|
|
describes what the server should return from this procedure call (in addition
|
|
to a status, which is always returned and should not be specified).
|
|
The argument is called <b><i>varname</i></b>. The <b><i>C-type</i></b>
|
|
given is what it should look like in the C code generated, such as <b>DB
|
|
*, u_int32_t, const char *</b>. The <b><i>RPC-type</i></b> is an
|
|
indication about how the RPC reply message should be constructed.
|
|
The RPC-types are described below.</li>
|
|
|
|
<li>
|
|
<b>END </b>- End the description of this function. The result is
|
|
that when the awk script encounters the <b>END</b> tag, it now has all
|
|
the information it needs to construct the generated code for this function.</li>
|
|
</ol>
|
|
The <b><i>RPC-type</i></b> must be one of the following:
|
|
<ul>
|
|
<li>
|
|
<b>IGNORE </b>- This argument is not passed to the server and should be
|
|
ignored when constructing the XDR code. <b>Only allowed for an ARG
|
|
specfication.</b></li>
|
|
|
|
<li>
|
|
<b>STRING</b> - This argument is a string.</li>
|
|
|
|
<li>
|
|
<b>INT </b>- This argument is an integer of some sort.</li>
|
|
|
|
<li>
|
|
<b>DBT </b>- This argument is a DBT, resulting in its decomposition into
|
|
the request message.</li>
|
|
|
|
<li>
|
|
<b>LIST</b> - This argument is an opaque list passed to the server (NULL-terminated).
|
|
If an argument of this type is given, it must have a <b><i>list-type</i></b>
|
|
specified that is one of:</li>
|
|
|
|
<ul>
|
|
<li>
|
|
<b>STRING</b></li>
|
|
|
|
<li>
|
|
<b>INT</b></li>
|
|
|
|
<li>
|
|
<b>ID</b>.</li>
|
|
</ul>
|
|
|
|
<li>
|
|
<b>ID</b> - This argument is an identifier.</li>
|
|
</ul>
|
|
So, for example, the source for the DB->join RPC call looks like:
|
|
<pre>BEGIN dbjoin 1 RETCODE
|
|
ARG ID DB * dbp
|
|
ARG LIST DBC ** curs ID
|
|
ARG IGNORE DBC ** dbcpp
|
|
ARG INT u_int32_t flags
|
|
RET ID long dbcid
|
|
END</pre>
|
|
Our first line tells us we are writing the dbjoin function. It requires
|
|
special code on the client so we indicate that with the RETCODE.
|
|
This method takes four arguments. For the RPC request we need the
|
|
database ID from the dbp, we construct a NULL-terminated list of IDs for
|
|
the cursor list, we ignore the argument to return the cursor handle to
|
|
the user, and we pass along the flags. On the return, the reply contains
|
|
a status, by default, and additionally, it contains the ID of the newly
|
|
created cursor.
|
|
<h2>
|
|
Building and Installing</h2>
|
|
I need to verify with Don Anderson, but I believe we should just build
|
|
the server program, just like we do for db_stat, db_checkpoint, etc.
|
|
Basically it can be treated as a utility program from the building and
|
|
installation perspective.
|
|
<p>As mentioned early on, in the section on <a href="#DB Modifications">DB
|
|
Modifications</a>, we have a single library, but allowing the user to access
|
|
the client portion by sending a flag to <a href="../docs/api_c/env_create.html">db_env_create()</a>.
|
|
The Makefile is modified to include the new files.
|
|
<p>Testing is performed in two ways. First I have a new example program,
|
|
that should become part of the example directory. It is basically
|
|
a merging of ex_access.c and ex_env.c. This example is adequate to
|
|
test basic functionality, as it does just does database put/get calls and
|
|
appropriate open and close calls. However, in order to test the full
|
|
set of functions a more generalized scheme is required. For the moment,
|
|
I am going to modify the Tcl interface to accept the server information.
|
|
Nothing else should need to change in Tcl. Then we can either write
|
|
our own test modules or use a subset of the existing ones to test functionality
|
|
on a regular basis.
|
|
</body>
|
|
</html>
|