Lagoon is an XML-based framework for web site maintenance.
Lagoon does not require support for any dynamic content technology, such as Servlets, CGI, ASP, SSI, PHP or JSP, on the web server. It's therefore very useful for sites on cheap web hotels, which give limited (or no) access to dynamic content features. However, Lagoon is useful for larger sites too, and can be used together with other dynamic content technologies.
Lagoon is also useful for building HTML-based documentation bundles for viewing without a web server.
The basic functionality of a web server is to send the content of a regular file stored on disk as the response to a request; this is called static content. The alternative is to start a process that generates the response for each request; this is called dynamic content. The use of dynamic content can be divided into several categories as follows.
It is also possible to have combinations (such as both real-time data and user interaction).
Lagoon produces all your pseudo-dynamic content off-line, and sends the result to the web server as static files. This can give better performance, since processing doesn't have to be done at each request. It also gives you convenient pseudo-dynamic content on a web server without explicit support for it.
Lagoon does not handle user interaction; you have to use the conventional technologies for this. However, while using ASP, JSP (or whatever) for the few pages with user interaction, you can still use Lagoon for the rest of the site.
Lagoon can in some cases be used for real-time data, but this requires a more complicated setup than usual. See the Advanced User Guide.
In addition, Lagoon keeps track of all content for your web site, including static files (HTML, images, etc.) and any files for user interaction (ASP or JSP pages, CGI scripts, or whatever). Lagoon automatically detects if any source file is updated and regenerates the dependent content as necessary (and only when necessary). Lagoon can be seen as a Make tool for web sites: you have the "source code" on your computer, and Lagoon performs "compilation" as necessary and stores the "object code" directly on the web server (with FTP or SSH if the web server is remote). This is especially useful if you have a large web site and update it over a slow dial-up modem connection; if you change only a few pages, only those pages are actually transmitted to the web server.
To make use of Lagoon, you need to know XML (including namespaces). Many usage patterns require knowledge of XSLT too. See http://www.w3.org/xml/ for more information on XML, XSLT and related technologies.
You don't need to know the Java programming language to make use of the basic features of Lagoon.
CLASSPATH, or install them as standard extensions in your JRE.
Add the bin directory (which contains scripts for Windows, OS/2 and UNIX based systems) to your PATH and set an environment variable LAGOON_HOME to the directory where you installed Lagoon. (Alternatively, you can add the .jar files in the Lagoon distribution to your Java CLASSPATH.)
Lagoon is based on a sitemap, which is a file describing the structure of the website. The sitemap is in XML format (see the schema). A sitemap may have a name which uniquely identifies it on the system.
The sitemap has entries for files; each <file> entry describes how that file should be created. The target is the request URL for the web server, and must be specified as a pseudo-absolute URL. The source URL must be specified as an absolute or pseudo-absolute URL. If source is omitted, it's set to the same string as target (including any wildcard).
The pipeline must end with a byte producer. As a shortcut, an empty file element is equivalent to a file element with a single read child.
The sitemap may also contain <part> entries, which define a partial document to be included by another document. A part has a name, which is not a URL. The source URL must be specified as an absolute or pseudo-absolute URL. The pipeline must end with an XML producer. Wildcards may not be used.
The sitemap may also contain <output> entries, which define the end of a pipeline for use by several targets (and especially by the island feature). An <output> entry has a name, which is not a URL. The pipeline must end with a byte producer, and start with an XML consumer.
The sitemap may also contain <delete> entries, which are used to delete the target.
The sitemap may also contain <property> entries, which are used to define sitemap-wide parameters.
In the sitemap, URLs are used to point out files and other resources. See RFC 1738 for more information about URLs. There are three kinds of URLs:
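Absolute URLs contain a colon (:), and have no slash (/) before the first colon. The part before the first colon is called the scheme. E.g. http://www.foo.com/bar/baz.html (the scheme here is http).
Pseudo-absolute URLs begin with a slash (/). E.g. /foo/bar.xml. When used as source, it is searched for relative to the local source root directory.
Relative URLs have neither a scheme nor a leading slash. E.g. pic/photo.jpeg.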
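The three rules above can be sketched in Java as follows. This is an illustration only: UrlKinds, classify and scheme are hypothetical names, not part of Lagoon's API, and the sketch applies nothing beyond the colon and slash tests just described.

```java
// Hypothetical helper illustrating the three URL kinds; not part of Lagoon's API.
final class UrlKinds {
    enum Kind { ABSOLUTE, PSEUDO_ABSOLUTE, RELATIVE }

    static Kind classify(String url) {
        int colon = url.indexOf(':');
        int slash = url.indexOf('/');
        // Absolute: contains a colon, and no slash occurs before the first colon.
        if (colon >= 0 && (slash < 0 || slash > colon)) {
            return Kind.ABSOLUTE;
        }
        // Pseudo-absolute: begins with a slash (resolved against the source root).
        if (url.startsWith("/")) {
            return Kind.PSEUDO_ABSOLUTE;
        }
        // Everything else is relative.
        return Kind.RELATIVE;
    }

    // The scheme is the part before the first colon of an absolute URL.
    static String scheme(String url) {
        if (classify(url) != Kind.ABSOLUTE) {
            throw new IllegalArgumentException("not an absolute URL: " + url);
        }
        return url.substring(0, url.indexOf(':'));
    }
}
```

Note that a Windows-style file:C:\Documents\file.txt is still classified as absolute with scheme file, because the first colon comes before any slash.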
Absolute URLs are handled by the java.net.URLConnection class in Java. It has a pluggable API and it's possible to install custom handlers for any scheme (see the Java documentation for information about this). At least http, https and ftp are included in Java 2 1.4.
However, three schemes are handled specially by Lagoon: file, res and part.
The file scheme is used to point out a file anywhere in the local file system. Lagoon handles this by itself because the support for it in several versions of Java is broken. These URLs must be written in the form file: followed by the native absolute file path, e.g. file:/usr/local/file.txt on a UNIX system or file:C:\Documents\file.txt on a Windows or OS/2 system. (Note that this is not according to the definition of file URLs in RFC 1738.)
The res scheme is used to point out resources included in the Lagoon distribution. These resources are loaded from the Java CLASSPATH; you can add your own resources by adding them to the Java CLASSPATH. See the Resource Guide for information about the included resources. E.g. res:/style/imageindex.xsl
The part scheme is used to point out a <part> defined in the sitemap. E.g. part:thePart.
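A minimal sketch of how the three special schemes map onto plain Java, assuming the remainder after the colon is used as-is as a native path (file:), as a CLASSPATH resource name (res:), or as a part name (part:). SpecialSchemes and its methods are hypothetical; Lagoon's own handling may differ in detail.

```java
import java.io.File;
import java.net.URL;

// Hypothetical helpers for the file:, res: and part: schemes; not Lagoon's API.
final class SpecialSchemes {
    // file: the remainder is a native absolute path (not an RFC 1738 file URL),
    // so it is handed to java.io.File unchanged.
    static File toFile(String url) {
        return new File(stripScheme(url, "file:"));
    }

    // res: the remainder names a resource on the CLASSPATH. ClassLoader resource
    // names have no leading slash, so one is removed if present.
    static String toResourceName(String url) {
        String path = stripScheme(url, "res:");
        return path.startsWith("/") ? path.substring(1) : path;
    }

    // Looks the resource up on the CLASSPATH; returns null if it is not there.
    static URL findResource(String url) {
        return SpecialSchemes.class.getClassLoader().getResource(toResourceName(url));
    }

    // part: the remainder is the name of a <part> entry in the sitemap.
    static String toPartName(String url) {
        return stripScheme(url, "part:");
    }

    private static String stripScheme(String url, String prefix) {
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("expected a " + prefix + " URL: " + url);
        }
        return url.substring(prefix.length());
    }
}
```

For example, toResourceName("res:/style/imageindex.xsl") gives style/imageindex.xsl, the form ClassLoader.getResource expects.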
Dependency checking of sources works for relative and pseudo-absolute URLs, and for absolute URLs with the file or part scheme. It does not work with absolute URLs with other schemes; if such a URL is used as source, it causes rebuilding every time. Absolute URLs with the res scheme never cause rebuilding.
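These rules can be condensed into one decision function. This is a hypothetical sketch (SourceCheck and needsRebuild are not Lagoon names): it assumes the caller already knows the source's last-modified time and the time of the previous build, whereas Lagoon itself keeps dependency information in its working directory.

```java
// Hypothetical sketch of the per-URL rebuild rule; not Lagoon's implementation.
final class SourceCheck {
    /**
     * @param url          the source URL as written in the sitemap
     * @param lastModified modification time (ms) of the source; ignored for
     *                     URLs that cannot be checked
     * @param lastBuild    time (ms) the target was last built
     */
    static boolean needsRebuild(String url, long lastModified, long lastBuild) {
        String scheme = schemeOf(url);
        if (scheme == null || scheme.equals("file") || scheme.equals("part")) {
            // Relative, pseudo-absolute, file: and part: sources can be checked.
            return lastModified > lastBuild;
        }
        if (scheme.equals("res")) {
            return false; // resources from the distribution never trigger a rebuild
        }
        return true; // any other scheme (http, ftp, ...) cannot be checked
    }

    // Scheme of an absolute URL (a colon with no slash before it), else null.
    static String schemeOf(String url) {
        int colon = url.indexOf(':');
        int slash = url.indexOf('/');
        return (colon >= 0 && (slash < 0 || slash > colon)) ? url.substring(0, colon) : null;
    }
}
```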
A file is created with a pipeline of connected producers. A producer is a component which produces a stream of bytes (implemented using a Java OutputStream) or a stream of XML data (implemented using SAX2 events). A producer may additionally take a stream of bytes or XML data as input (if it does not, it is a source producer). A pipeline is a chain of connected producers.
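To make the pipeline idea concrete, here is a sketch of two producers connected through the interfaces the text mentions: SAX2 events for XML and a java.io.OutputStream for bytes. XmlProducer, ByteProducer, StringXmlSource and TextFormatter are hypothetical names invented for this example; the real producer API is described in the Advanced User Guide.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical producer interfaces, for illustration only.
interface XmlProducer { void produce(ContentHandler out) throws Exception; }
interface ByteProducer { void produce(OutputStream out) throws Exception; }

// Source producer: emits SAX2 events for an XML string (it has no upstream producer).
final class StringXmlSource implements XmlProducer {
    private final String xml;
    StringXmlSource(String xml) { this.xml = xml; }

    public void produce(ContentHandler out) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(out);
        reader.parse(new InputSource(new StringReader(xml)));
    }
}

// Byte producer that consumes XML from upstream and writes only the
// character data, roughly what a plain-text formatter does.
final class TextFormatter implements ByteProducer {
    private final XmlProducer upstream;
    TextFormatter(XmlProducer upstream) { this.upstream = upstream; }

    public void produce(OutputStream out) throws Exception {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        upstream.produce(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                try {
                    writer.write(ch, start, length);
                } catch (IOException e) {
                    throw new SAXException(e);
                }
            }
        });
        writer.flush();
    }
}
```

Each producer pulls from its upstream neighbour, so new TextFormatter(new StringXmlSource("<p>Hello <b>world</b></p>")).produce(out) writes Hello world to out.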
Lagoon has defined six types of producers:
Parameters can be passed to a producer, which is useful for e.g. giving the name of the stylesheet to an XSLT processor. Parameters are given as attributes on the producer element in the sitemap. Any character content of a source producer element is taken as a parameter with the name "name". Each file entry may have a main source, which can be read by the source producer (however, a source producer may instead obtain its data from some other source).
When Lagoon is started, the sitemap is parsed and a pipeline is set up for each file entry. Lagoon is now ready to build the website, which can be done several times since the pipelines are reusable.
The website building is performed by processing each file entry in the sitemap. A file entry is only processed if necessary, i.e. if any source data has been updated since the last time it was processed. It's up to each producer to implement this dependency checking.
You can specify similar treatment of several files by using a wildcard in the source filename. Lagoon will enumerate all source files matching the given wildcard pattern, and process all of them in sequence. The target filename must also contain a wildcard, which will be instantiated with the same string used to match the source pattern; i.e. if the source pattern is *.xml and the target is *.html, then the source file book.xml will generate book.html. You cannot use wildcards if the source is an absolute URL.
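A minimal sketch of the wildcard rule, assuming exactly one * per pattern. Wildcard, match and instantiate are hypothetical names; whether Lagoon's * may match across directory separators is not stated here, and this sketch does not restrict it.

```java
// Hypothetical sketch of single-'*' wildcard matching and target instantiation.
final class Wildcard {
    // Returns the text matched by '*', or null if name does not match pattern.
    static String match(String pattern, String name) {
        int star = starIndex(pattern);
        String prefix = pattern.substring(0, star);
        String suffix = pattern.substring(star + 1);
        // Prefix and suffix must both fit in name without overlapping.
        if (name.length() < prefix.length() + suffix.length()
                || !name.startsWith(prefix) || !name.endsWith(suffix)) {
            return null;
        }
        return name.substring(prefix.length(), name.length() - suffix.length());
    }

    // Substitutes the matched text for the '*' in the target pattern.
    static String instantiate(String pattern, String matched) {
        int star = starIndex(pattern);
        return pattern.substring(0, star) + matched + pattern.substring(star + 1);
    }

    private static int starIndex(String pattern) {
        int star = pattern.indexOf('*');
        if (star < 0 || pattern.indexOf('*', star + 1) >= 0) {
            throw new IllegalArgumentException("expected exactly one '*': " + pattern);
        }
        return star;
    }
}
```

For example, Wildcard.match("/books/*.xml", "/books/book.xml") returns "book", and instantiating "/books/*.html" with it gives "/books/book.html".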
A producer may have a special feature called split, which works as a transform but can generate several output files from a single source. The pipeline after a split producer will be executed once for each part the split producer generates (and the split producer has to generate a filename for each part as well).
A producer may have a special feature called island, which works as a transform but can generate several output files from a single source. Parts of the source document are redirected to other output pipelines, using <output> entries in the sitemap (the island producer has to generate a filename for each of them as well).
You can also write your own producers; see the Advanced User Guide.
If nothing else is stated, a producer signals the need for rebuilding when the main source has been updated (for source producers), or asks the next upstream producer (for other producers).
The main source must be a directory (and the source URL must end with '/'), and may not be an absolute URL with a scheme other than file or res. The files and subdirectories in this directory are listed and provided as XML in the following format:
<dirlist>
  <directory filename="somedir" url="/thisdir/somedir" timestamp="987198810097" date="2001-04-13" time="23:53:30"/>
  <file filename="somefile.txt" url="/thisdir/somefile.txt" timestamp="987197358000" date="2001-04-13" time="23:29:18" size="445"/>
</dirlist>
The url attribute contains the same pseudo-absolute URL that would be used to refer to this file in the sitemap. The timestamp attribute contains the number of milliseconds since 1970-01-01 00:00:00 UTC, as a decimal number.
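The date and time attributes are a formatted form of the timestamp. The sample values above are consistent with a local time zone of UTC+2 (987198810097 ms is 21:53:30 UTC on 2001-04-13), so this sketch assumes the listing uses the local time zone. DirTimestamp and dateAndTime are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: derive date and time attributes from a timestamp.
final class DirTimestamp {
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");

    // Returns {date, time} for a millisecond timestamp in the given time zone.
    static String[] dateAndTime(long timestampMillis, ZoneId zone) {
        ZonedDateTime t = Instant.ofEpochMilli(timestampMillis).atZone(zone);
        return new String[] { t.format(DATE), t.format(TIME) };
    }
}
```

In a real listing the zone would typically be ZoneId.systemDefault().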
There is one optional parameter, "pattern", which gives a wildcard pattern to select which files and subdirectories to include. If omitted, all files and subdirectories are included.
This producer signals the need for rebuilding when the timestamp on the directory is updated. Since many operating systems don't update that timestamp, it also checks whether any file has been added, removed or renamed. It does not, however, check whether the content of any file in the directory has been updated.
Read the main source as a byte stream.
Applies an XSLT stylesheet to the XML stream.
The mandatory parameter "stylesheet" specifies the location of the stylesheet, as a URL. If this URL is relative, it's searched for relative to the source file. Any relative URL imported or included from the stylesheet is searched for relative to the stylesheet.
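Resolving a relative URL against the source file works like ordinary URL resolution. This sketch assumes Lagoon's behaviour matches java.net.URI.resolve for pseudo-absolute URLs; RelativeUrls and resolveAgainst are hypothetical names.

```java
import java.net.URI;

// Sketch: resolve a relative URL against the URL of the referring file,
// using the standard resolution implemented by java.net.URI.
final class RelativeUrls {
    static String resolveAgainst(String baseUrl, String relative) {
        return URI.create(baseUrl).resolve(relative).toString();
    }
}
```

So a stylesheet="book_html.xsl" parameter on a file whose source is /books/book.xml would resolve to /books/book_html.xsl, and a file that stylesheet includes as common.xsl would resolve against /style/... only if the stylesheet itself lives there.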
Any relative URL referred to by the document() function is searched for relative to the source file. part: URLs may be used in the document() function.
This producer will check if the stylesheet, or any file imported or included from it, has been updated, and in that case signal the need for rebuilding and also recompile the stylesheet. This producer will check if any file referred to using the document() function has been updated, and in that case signal the need for rebuilding. However, since the document() function can take an expression as argument, this may not always work properly. To remedy this problem, there is a parameter "always": if it's set to any non-empty string, this producer will signal the need for rebuilding (but not recompile the stylesheet) each time.
Any other parameters are passed as parameters to the stylesheet (to be used by top-level <xsl:param> elements in XSLT).
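Extra producer parameters reach the stylesheet as top-level xsl:param values. The stand-alone JAXP sketch below (XsltWithParams is a hypothetical name) shows the same mechanism with Transformer.setParameter. In this sketch xsl:output does take effect, whereas inside Lagoon serialization is left to a format producer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Plain JAXP sketch: pass named values to top-level xsl:param elements.
final class XsltWithParams {
    static String transform(String stylesheet, String xml, Map<String, String> params)
            throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        for (Map.Entry<String, String> p : params.entrySet()) {
            t.setParameter(p.getKey(), p.getValue()); // overrides the xsl:param default
        }
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

A stylesheet declaring <xsl:param name="menu" select="'no'"/> would see "yes" for menu when the sitemap element carries menu="yes", and its default otherwise.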
Any xsl:output elements in the stylesheet have no effect. Specify formatting properties with a format producer instead.
Implements the splitting feature.
The XML stream from the upstream producer is scanned for a specific element, and each occurrence of that element generates one output file. The XML data outside this element is ignored. The main output will be an empty dummy file (which is still needed for dependency checking).
This producer takes three mandatory parameters. "namespace" and "element" specify the element to split on.
"outputname" specifies how to construct the filename for each part. It contains a filename template in which attribute names surrounded by square brackets ([ and ]) are replaced with the value of that attribute on the split element. To include a literal bracket in the filename, use a double bracket.
For example, a sitemap fragment like this:
<transform type="split" namespace="" element="thepart" outputname="[name].xml">
with an input fragment like this:
<thepart name="first">
will result in the file first.xml being created.
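A minimal sketch of the outputname expansion. It assumes [[ and ]] are the escapes for literal brackets and that a missing attribute is an error; both are assumptions, as are the names OutputNameTemplate and expand.

```java
import java.util.Map;

// Hypothetical sketch of expanding an outputname template such as "[name].xml".
final class OutputNameTemplate {
    static String expand(String template, Map<String, String> attrs) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            boolean doubled = i + 1 < template.length() && template.charAt(i + 1) == c;
            if ((c == '[' || c == ']') && doubled) {
                out.append(c); // "[[" or "]]" gives a literal bracket (assumed)
                i += 2;
            } else if (c == '[') {
                int end = template.indexOf(']', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated [ in " + template);
                }
                String name = template.substring(i + 1, end);
                String value = attrs.get(name);
                if (value == null) {
                    throw new IllegalArgumentException("missing attribute: " + name);
                }
                out.append(value); // replace [attr] with the attribute value
                i = end + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With the example above, expand("[name].xml", Map.of("name", "first")) returns "first.xml".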
The output name must be a relative URL, and is relative to the main target file. It must not be pseudo-absolute.
Implements the island feature.
The XML stream from the upstream producer is scanned for elements with specific XML namespaces, and each occurrence of such an element generates one output file. The XML data outside those elements is passed through unchanged. This is typically used to process embedded SVG or MathML in XHTML documents.
For each XML namespace you want to extract, you need to specify the three parameters "namespaceN", "outputN" and "outputExtN", where N is a number starting from 0. "outputN" specifies which <output> entry in the sitemap to use for this namespace. "outputExtN" specifies the file extension to give to the generated file (including the '.').
Filenames for the extracted parts will be the name of the main file + "_image" + a number + the given extension.
Performs LSSI processing.
Any relative URL included from the LSSI page is searched for relative to the source file.
part: URLs may be used when including.
Included parts need not be well-formed.
This producer will check all files (and other resources) the LSSI page depends on, and signal the need for rebuilding if any of them are updated (or always signal the need for rebuilding if any resource cannot be checked, e.g. an absolute URL).
Executes an LSP page.
Any parameters to this producer are used as parameters to the LSP page.
Any relative URL imported from the LSP page is searched for relative to the source file.
part: URLs may be used when importing.
Imported parts must be well-formed (i.e. have a single root element), which is not necessarily the case if they are generated by LSP or XSLT.
This producer always signals the need for rebuilding.
This producer cannot be used together with wildcards.
You need to include the LSP jar files (lsprt.jar and lspc.jar) in your CLASSPATH.
Formats into well-formed XML.
An "encoding" parameter can be used to specify the character encoding to use, default is UTF-8.
An "indent" parameter can be used to specify whether to use indenting for pretty-printing the result. Default is no indenting.
The "doctype-public" and "doctype-system" parameters can be used to specify a specific DTD to use. Default is to not use any DTD.
An "omit-xml-declaration" parameter can be used to specify that the output should not contain any XML declaration. Default is to include an XML declaration.
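The parameters of this producer have direct counterparts among the JAXP output properties in javax.xml.transform.OutputKeys. The sketch below (XmlFormatSketch is a hypothetical name) uses an identity Transformer to show the effect of encoding, indent and omit-xml-declaration; it is an illustration, not how Lagoon implements the producer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: serialize XML with settings analogous to the xml format parameters.
final class XmlFormatSketch {
    static String format(String xml, String encoding, boolean indent, boolean omitDeclaration)
            throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
        // doctype-public / doctype-system would map to OutputKeys.DOCTYPE_PUBLIC / DOCTYPE_SYSTEM.
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```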
Formats into classical HTML 4.01.
An "encoding" parameter can be used to specify the character encoding to use, default is iso-8859-1.
An "indent" parameter can be used to specify whether to use indenting for pretty-printing the result. Default is no indenting.
An "html" parameter can be used to specify which HTML DTD to use, it can take the values "transitional", "strict" or "frameset". Default is "transitional".
The "doctype-public" and "doctype-system" parameters can be used to specify a specific DTD to use. If specified, they will override the setting of the "html" parameter.
Formats into XHTML (well-formed XML which also can be consumed by most non-XML aware HTML browsers).
An "encoding" parameter can be used to specify the character encoding to use, default is UTF-8.
An "indent" parameter can be used to specify whether to use indenting for pretty-printing the result. Default is no indenting.
An "html" parameter can be used to specify which HTML DTD to use, it can take the values "transitional", "strict" or "frameset". Default is "transitional".
The "doctype-public" and "doctype-system" parameters can be used to specify a specific DTD to use. If specified, they will override the setting of the "html" parameter.
An "omit-xml-declaration" parameter can be used to specify that the output should not contain any XML declaration. Default is to include an XML declaration if necessary.
Formats into plain text.
An "encoding" parameter can be used to specify the character encoding to use, default is iso-8859-1.
Formats XSL:FO into PDF using Apache FOP.
Formats SVG into a bitmap image using Apache Batik.
A mandatory parameter "format" specifies the format of the image, it may be "jpeg", "png" or "tiff".
For JPEG images, an additional parameter "quality" may be used to specify the compression rate, it's a floating point number between 0 and 1 where a larger number means better quality but less compression (larger file). The default quality is 0.8.
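Lagoon passes quality on to Apache Batik. To illustrate what the number means without depending on Batik, this sketch swaps in the JDK's own ImageIO JPEG writer, whose compression quality uses the same 0 to 1 scale (larger means better quality and a larger file). JpegQuality, encodedSize and noise are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Illustrates the JPEG quality trade-off with ImageIO (not Batik).
final class JpegQuality {
    // Encodes img as JPEG at the given quality and returns the size in bytes.
    static int encodedSize(BufferedImage img, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality); // 0..1, larger = better quality
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(img, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.size(); // stream is closed (flushed) at this point
    }

    // Random noise compresses poorly, which makes the size difference obvious.
    static BufferedImage noise(int size, long seed) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(seed);
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                img.setRGB(x, y, random.nextInt(0x1000000));
            }
        }
        return img;
    }
}
```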
Parses the byte stream from the upstream producer as XML.
The use of this producer is not recommended. In most cases you can use a <source> instead. Using this producer may affect performance, and can in some situations cause deadlocks.
Copy the content of any file, useful for e.g. images:
<file target="/john.jpeg" source="/img/john.jpeg">
  <read/>
</file>
Same as above:
<file target="/john.jpeg" source="/img/john.jpeg"/>
Format an HTML page. Note that the source HTML file must be XHTML (well-formed XML and all HTML elements in the XHTML namespace):
<file target="/index.html">
  <format type="html">
    <source/>
  </format>
</file>
Transform a bunch of XML files with an XSLT stylesheet into HTML:
<file target="/books/*.html" source="/books/*.xml">
  <format type="html">
    <transform type="xslt" stylesheet="/style/book_html.xsl">
      <source/>
    </transform>
  </format>
</file>
Transform the same XML files with another XSLT stylesheet into PDF:
<file target="/books/*.pdf" source="/books/*.xml">
  <format type="fo">
    <transform type="xslt" stylesheet="/style/book_print.xsl">
      <source/>
    </transform>
  </format>
</file>
Use XSLT to generate an index over all books:
<file target="/books/index.html" source="/books">
  <format type="html">
    <transform type="xslt" stylesheet="/books/index.xsl">
      <source type="dir" pattern="*.xml"/>
    </transform>
  </format>
</file>
Generate an HTML page using LSSI:
<file target="/coolpage.html">
  <format type="html">
    <transform type="lssi">
      <source/>
    </transform>
  </format>
</file>
Define a partial page to be included by another page:
<part name="header" source="/header.lsp">
  <transform type="lsp" menu="yes">
    <source/>
  </transform>
</part>
Render a JPEG image from SVG:
<file target="/picture.jpeg" source="/picture.svg">
  <format type="svg" format="jpeg" quality="0.5">
    <source/>
  </format>
</file>
Build PNG images for SVG and MathML islands in XHTML. (This requires an XSLT stylesheet to transform MathML to SVG; no such stylesheet is included in Lagoon. The SVG part works fine though.):
<output name="svgOutput">
  <format type="svg" format="png"/>
</output>
<output name="mathmlOutput">
  <format type="svg" format="png">
    <transform type="xslt" stylesheet="/mathml2svg.xsl"/>
  </format>
</output>
<file target="/island.html">
  <format type="html">
    <transform type="island"
        namespace1="http://www.w3.org/2000/svg" output1="svgOutput" outputext1=".png"
        namespace2="http://www.w3.org/1998/Math/MathML" output2="mathmlOutput" outputext2=".png">
      <source/>
    </transform>
  </format>
</file>
Delete an old file:
<delete target="/oldstuff/outdated.html"/>
Lagoon is invoked by the application class nu.staldal.lagoon.LagoonCLI. The syntax is one of:
lagoon property_file [how_to_run]
lagoon sitemap_file [how_to_run]
The property_file and sitemap_file are specified using a platform-dependent path (e.g. use '\' as path separator in Windows), not as a URL. They may be absolute or relative (to the current working directory). If the filename ends with ".xml" or ".sitemap", it will be taken as a sitemap file; otherwise it will be taken as a property file.
The how_to_run argument specifies what Lagoon should do after initialization. "build" causes it to perform a normal build (perform dependency checking and rebuild the necessary files) once and then exit. "force" causes it to perform a force build (override dependency checking and unconditionally rebuild every file) once and then exit. An integer n will cause it to perform a normal build every n seconds, forever (until terminated). Leaving this argument out causes it to go into an interactive mode and wait for you to enter a command (type something on the keyboard and press [ENTER]): 'b' will cause a normal build, 'f' will cause a force build and 'q' will cause it to quit.
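The argument handling described above can be sketched as two small decision functions. LagoonArgs is a hypothetical name, the suffix test is assumed to be case-sensitive, and rejecting a non-positive interval is an assumption rather than documented behaviour.

```java
// Hypothetical sketch of how the two command-line arguments are interpreted.
final class LagoonArgs {
    enum Mode { BUILD, FORCE, PERIODIC, INTERACTIVE }

    // Files ending in ".xml" or ".sitemap" are sitemaps; anything else is a property file.
    static boolean isSitemapFile(String filename) {
        return filename.endsWith(".xml") || filename.endsWith(".sitemap");
    }

    // howToRun is null when the argument is left out.
    static Mode mode(String howToRun) {
        if (howToRun == null) return Mode.INTERACTIVE;
        if (howToRun.equals("build")) return Mode.BUILD;
        if (howToRun.equals("force")) return Mode.FORCE;
        int seconds = Integer.parseInt(howToRun); // NumberFormatException for anything else
        if (seconds <= 0) {
            throw new IllegalArgumentException("interval must be positive: " + seconds);
        }
        return Mode.PERIODIC;
    }
}
```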
The property file specifies the sitemap file, the source directory, the target and the password to access the target (if necessary). The file is a standard Java property file, i.e. a text file with one keyword-value pair on each line, separated by ':'; lines beginning with '#' are ignored.
The sitemap file and source directory are specified using platform-dependent paths (e.g. use '\' as path separator in Windows), not as URLs. They may be absolute or relative (to the current working directory). Please note that the Java property file format requires you to escape '\' with '\\'.
If no property file is used (the sitemap file is specified directly on the command line), sourceDir and targetURL will both be set to the current directory. Note: this requires a careful setup of the sitemap, since the default behaviour is to use the same path as source and target (which obviously won't work if sourceDir and targetURL are the same).
Sample property file:
# Lagoon properties
sitemapFile: C:\\joe_files\\webbsite\\sitemap.xml
sourceDir: C:\\joe_files\\webbsite\\src
targetURL: ftp://joe@ftp.acme.com/public_html/
password: secret
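Because the property file is a standard Java property file, java.util.Properties can read it directly, which also shows why backslashes must be doubled. LagoonProperties is a hypothetical wrapper name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Loads Lagoon-style property text with the standard java.util.Properties parser.
final class LagoonProperties {
    static Properties load(String text) throws IOException {
        Properties p = new Properties();
        // '#' lines are comments; ':' (or '=') separates key and value;
        // "\\" in the file becomes a single backslash in the value.
        p.load(new StringReader(text));
        return p;
    }
}
```

With a single backslash, as in C:\joe_files, Properties.load silently drops the backslash and yields C:joe_files.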
Lagoon is capable of storing generated files in a local directory, or on a remote server using FTP or SSH. You can also write your own FileStorage to use some other protocol; see the Advanced User Guide.
To use a local directory, just specify a platform-dependent path. The directory will be created if it doesn't exist.
To use FTP, specify an absolute URL in the form ftp://login@host/path/. The path is relative to your home directory on the remote machine (to start from the root, use e.g. ftp://joe@foo.bar.com//abs/path/). FTP requires you to specify the password in the property file. Note that this will send everything, including your password, in clear text over the network. If security is important, use SSH instead.
To use SSH, specify an absolute URL in the form ssh://login@host/path/. The path is relative to your home directory on the remote machine (to start from the root, use e.g. ssh://joe@foo.bar.com//abs/path/). You need to have a public key properly set up before using this (you should be able to log in without entering any password); do not specify the password in the property file. This requires a UNIX-style shell with access to the commands "mkdir -p", "rm -f" and "cat" on the remote server.
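The FTP and SSH target URL forms can be taken apart with java.net.URI. This sketch (TargetUrl is a hypothetical name, not Lagoon's own parser) shows how the double slash distinguishes a path from the root from a path relative to the home directory.

```java
import java.net.URI;

// Hypothetical sketch of how an ftp:// or ssh:// target URL breaks down.
final class TargetUrl {
    final String scheme;
    final String login;
    final String host;
    final String remotePath;

    TargetUrl(String url) {
        URI uri = URI.create(url);
        scheme = uri.getScheme();
        login = uri.getUserInfo();
        host = uri.getHost();
        // The first slash only separates host and path. What follows is relative
        // to the login's home directory, unless a second slash makes it absolute.
        String path = uri.getRawPath();
        remotePath = (path == null || path.isEmpty()) ? "" : path.substring(1);
    }
}
```

For ftp://joe@ftp.acme.com/public_html/ the remote path is public_html/ (under the home directory), while ssh://joe@foo.bar.com//abs/path/ gives /abs/path/ (from the root).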
Lagoon will create a working directory that is used to store cached data and dependency information. This directory is named ".lagoon" and is created in the user's home directory, as given by the Java system property "user.home". You can change this by modifying the lagoon script to pass "-Duser.home=/some/other/dir" on the java command line.
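A sketch of how the working directory location follows from user.home (WorkingDir is a hypothetical name), which is why overriding that property with -Duser.home moves it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: where the .lagoon working directory ends up.
final class WorkingDir {
    static Path location() {
        // Reflects -Duser.home=/some/other/dir if given on the java command line.
        return Paths.get(System.getProperty("user.home"), ".lagoon");
    }
}
```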
It's safe to remove the working directory when Lagoon is not running; it will be recreated the next time Lagoon is run. If Lagoon suddenly starts behaving unexpectedly, removing the working directory might remedy the problem. However, removing the working directory may cause unnecessary rebuilds the next time Lagoon is run, especially if you use an FTP or SSH target.
Lagoon comes with an Ant task. Define the Lagoon Ant task in the Ant build file like this:
<taskdef name="lagoon" classname="nu.staldal.lagoon.LagoonAntTask">
  <classpath>
    <pathelement location="locationOfLagoonJars/lagoon.jar" />
    <pathelement location="locationOfLagoonJars/xmlutil.jar" />
  </classpath>
</taskdef>
and use one of the following syntaxes:
<lagoon propertyFile="propertyFile"/>
<lagoon sitemapFile="sitemapFile" sourceDir="sourceDir" targetURL="targetURL" password="password" />
The password attribute can be omitted if not needed. Use the optional attribute force to override dependency checking.
Lagoon comes with a simple GUI which you can use instead of the command line tool. The application class is nu.staldal.lagoon.LagoonGUI. The syntax is:
lagoongui [property_file]
Alternatively, you can simply execute lagoon.jar (java -jar lagoon.jar), but that requires you to install the required libraries (Batik and/or FOP) as standard extensions in your JRE.
To make use of the features of Lagoon, you have to ensure that your HTML source files are in XHTML format (well-formed XML and all HTML elements in the XHTML namespace). You might find the tool HTML Tidy useful for this.
Lagoon comes with a tool which checks an XML file for well-formedness, and reports any errors. The syntax for this tool is:
xmlcheck [-v] xml_file
xml_file is the file to check, and can be specified with a platform-dependent path or a URL. Use the -v option to also check for validity (not necessary for Lagoon). If the XML file is OK, nothing will be printed; otherwise error messages will be printed.
To help with the process of adapting an existing website, Lagoon comes with a tool which creates a sitemap from an existing directory structure. The syntax for this tool is:
buildsitemap source_dir sitemap_file
The source_dir directory will be recursively processed, and all files will be added to the sitemap, which is written to the file sitemap_file. BuildSitemap tries to be a bit clever by treating files differently based on their extension. However, don't expect the generated sitemap to work as-is; you will probably have to make manual adjustments. Making use of LSSI, XSLT transformations and other features requires modifications of the sitemap.