written in Perl and C. The archetypal application is website search, but it
can be put to many different uses.
Features
* Extremely fast and scalable - can handle millions of documents
* Full support for 12 Indo-European languages.
* Support for boolean operators AND, OR, and AND NOT; parenthetical
groupings; and prepended +plus and -minus
* Algorithmic selection of relevant excerpts and highlighting of search terms
within excerpts
* Highly customizable query and indexing APIs
* Phrase matching
* Stemming
* Stoplists
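
As a rough illustration of how boolean operators and a stoplist fit together
in a search library of this kind, the Python sketch below builds a toy
inverted index. It does not use KinoSearch's API; STOPLIST, tokenize,
build_index, and the q_* helpers are hypothetical names chosen for this
example.

```python
# Toy inverted index; not KinoSearch's API. All names are illustrative.
STOPLIST = {"the", "a", "of", "and"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stoplisted words."""
    return [w for w in text.lower().split() if w not in STOPLIST]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def q_and(index, a, b):
    """Documents containing both terms."""
    return index.get(a, set()) & index.get(b, set())

def q_or(index, a, b):
    """Documents containing either term."""
    return index.get(a, set()) | index.get(b, set())

def q_and_not(index, a, b):
    """Documents containing a but not b."""
    return index.get(a, set()) - index.get(b, set())

docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "a quick dog",
}
index = build_index(docs)
```

Each operator reduces to a set operation on posting lists, which is why
stoplisted words such as "the" never appear as index keys.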
WWW: http://www.rectangular.com/kinosearch/
PR: ports/96115
Submitted by: Vivek Khera <vivek@khera.org>
XML::RSS::Parser is a lightweight liberal parser of RSS feeds. This parser
is "liberal" in that it does not demand compliance of a specific RSS version
and will attempt to gracefully handle tags it does not expect or understand.
The parser's only requirement is that the file is well-formed XML and
remotely resembles RSS. Roughly speaking, well-formed XML with a channel
element as a direct sibling or the root tag and item elements etc.
There are a number of advantages to using this module rather than just using
a standard parser-tree combination. There are a number of different RSS
formats in use today. In very subtle ways these formats are not entirely
compatible with one another. XML::RSS::Parser makes a couple of assumptions
to "normalize" the parse tree into a more consistent form. For instance,
it forces channel and item into a parent-child relationship.
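
The channel/item normalization described for XML::RSS::Parser can be
sketched in Python as follows. This is not the module's implementation or
API; normalize and local are hypothetical helpers, and the sample feed is
made up. It assumes the input is well-formed XML with a channel element.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def normalize(xml_text):
    """Return the channel element with every item as a direct child."""
    root = ET.fromstring(xml_text)
    channel = next(e for e in root.iter() if local(e.tag) == "channel")
    # Map each child to its parent so items can be moved.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for item in [e for e in root.iter() if local(e.tag) == "item"]:
        parent = parent_of[item]
        if parent is not channel:
            parent.remove(item)
            channel.append(item)
    return channel

# RSS 1.0 style layout: item is a sibling of channel rather than a child.
feed = """<rdf><channel><title>Example</title></channel>
<item><title>First</title></item></rdf>"""
channel = normalize(feed)
titles = [i.findtext("title") for i in channel if local(i.tag) == "item"]
```

After normalization, code that walks channel children sees the same shape
whether the feed nested its items or listed them as siblings.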
WWW: http://search.cpan.org/dist/XML-RSS-Parser/
Google SiteMaps.
The Sitemap Protocol allows you to inform search engine
crawlers about URLs on your Web sites that are available
for crawling.
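
For orientation, a sitemap is an XML document listing URLs. The Python
sketch below builds a minimal one with the standard library. It does not
use WWW::Google::SiteMap; build_sitemap is a hypothetical helper, the URLs
are placeholders, and the sitemaps.org 0.9 namespace is an assumption (the
namespace the module itself emits may differ).

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=NS)
    for u in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about.html",
])
```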
WWW: http://search.cpan.org/dist/WWW-Google-SiteMap/
the excellent Enchant spellchecker available as a Python module.
The bindings are generated using SWIG. It includes all the functionality
of Enchant with the flexibility of Python and a nice 'Pythonic'
object-oriented interface. It also aims to provide some higher-level
functionality than is available in the C API.
Author: Ryan Kelly <ryan@rfk.id.au>
WWW: http://pyenchant.sourceforge.net/
PR: ports/95284
Submitted by: Nicola Vitale <nivit@email.it>
It provides a lexical scanner and LR parser (constructed by PCCTS),
both of which are efficient and offer good error detection and
recovery; a set of functions for traversing the AST (abstract
syntax tree) generated by the parser; and utility functions for
manipulating strings according to BibTeX conventions.
WWW: http://www.gerg.ca/software/btOOL
PR: ports/94686
Submitted by: Kay Lehmann <kay_lehmann@web.de>
simplifies the process of writing documents and publishing them to
various output formats.
Muse consists of two main parts: an enhanced text-mode for authoring
documents and navigating within Muse projects, and a set of publishing
styles for generating different kinds of output.
WWW: http://www.emacswiki.org/cgi-bin/wiki/MuseMode
PR: ports/93716
Submitted by: Dryice Liu <dryice@dryice.name>
xmldiff uses xmlprpr and diff to display meaningful differences in XML
files in an easy-to-read format. Output formats available include HTML,
ANSI colour, and regular diff. The coloured modes are particularly
useful for viewing small differences in context within large XML files.
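
The pretty-print-then-diff approach can be sketched in Python with the
standard library. This is an illustration of the idea, not xmldiff's own
code; pretty_lines and xml_diff are hypothetical names, and only plain
unified-diff output is produced.

```python
import difflib
import xml.dom.minidom

def pretty_lines(xml_text):
    """Re-indent XML so each element sits on its own line."""
    pretty = xml.dom.minidom.parseString(xml_text).toprettyxml(indent="  ")
    return [line for line in pretty.splitlines() if line.strip()]

def xml_diff(old, new):
    """Unified diff of two XML strings after normalizing their layout."""
    return list(difflib.unified_diff(
        pretty_lines(old), pretty_lines(new),
        fromfile="old", tofile="new", lineterm=""))

# The two inputs differ in layout and in one value; only the value shows up.
old = "<cfg><host>a.example.com</host><port>80</port></cfg>"
new = "<cfg><host>a.example.com</host>\n<port>8080</port></cfg>"
diff = xml_diff(old, new)
```

Normalizing layout first keeps whitespace-only changes out of the diff, so
the remaining hunks reflect differences in content.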
WWW: http://software.decisionsoft.com/tools.html
PR: ports/92947
Submitted by: Paul Chvostek <paul+ports@it.ca>
An XML pretty printer created to format XML that doesn't make use of
mixed content. In the default mode each element is put on a separate
line with consistent indentation. It can also separate attributes onto
individual lines, sort attributes in a specified or alphabetic order,
expand self-closing tags, and more.
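
A minimal Python sketch of this style of formatting follows. It is not
xmlprpr's code; pretty is a hypothetical function that assumes no mixed
content, sorts attributes alphabetically, expands self-closing tags, and
for brevity does not escape special characters.

```python
import xml.etree.ElementTree as ET

def pretty(elem, depth=0, indent="  "):
    """Return lines for elem: one tag per line, attributes sorted by name."""
    pad = indent * depth
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
    text = (elem.text or "").strip()
    children = list(elem)
    if not children:
        # Leaf: keep text inline and expand self-closing tags.
        return [f"{pad}<{elem.tag}{attrs}>{text}</{elem.tag}>"]
    lines = [f"{pad}<{elem.tag}{attrs}>"]
    for child in children:
        lines.extend(pretty(child, depth + 1, indent))
    lines.append(f"{pad}</{elem.tag}>")
    return lines

doc = ET.fromstring('<a z="1" b="2"><c/><d>text</d></a>')
output = "\n".join(pretty(doc))
```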
Note that the distribution calls this tool "xmlpp", but it has been
renamed so as not to conflict with an xmlpp already in the ports tree.
WWW: http://software.decisionsoft.com/tools.html
PR: ports/92946
Submitted by: Paul Chvostek <paul+ports@it.ca>
The goal of the po4a (po for anything) project is to ease translations
(and more interestingly, the maintenance of translations) using
gettext tools in areas where they were not expected, like documentation.
This package contains the main libraries of po4a, and the following sub-modules:
- KernelHelp: Help messages of each kernel compilation option.
- Man: Good old manual page format.
- Pod: Perl documentation format.
- Sgml: Either debiandoc or docbook DTD.
- Dia: Uncompressed Dia diagrams.
- LaTeX: Generic TeX or LaTeX format.
WWW: http://packages.debian.org/unstable/text/po4a
PR: ports/91532
Submitted by: Meno Abels <meno.abels@adviser.com>
multibyte characters such as UTF-8, EUC-JP, and GB2312, fullwidth
characters such as east Asian characters, combining characters
such as diacritical marks and Thai, and languages that do not
use whitespace between words such as Chinese and Japanese.
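
The core difficulty with such text is that display width differs from
character count. The Python sketch below shows one way to wrap by display
columns. It is not Text::WrapI18N's algorithm; char_width, display_width,
and wrap are hypothetical helpers, and breaking between any two characters
is a simplification suited to languages written without spaces.

```python
import unicodedata

def char_width(ch):
    """Columns a character occupies on a terminal."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # combining marks sit on the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters
    return 1

def display_width(text):
    """Total display columns for a string."""
    return sum(char_width(ch) for ch in text)

def wrap(text, columns):
    """Break text into lines no wider than `columns` display columns."""
    lines, current, width = [], "", 0
    for ch in text:
        w = char_width(ch)
        if width + w > columns and current:
            lines.append(current)
            current, width = "", 0
        current += ch
        width += w
    if current:
        lines.append(current)
    return lines
```

For example, wrap("日本語のテキスト", 6) yields three lines, because each
of the eight characters occupies two columns.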
WWW: http://packages.debian.org/unstable/perl/libtext-wrapi18n-perl
PR: ports/91532
Submitted by: Meno Abels <meno.abels@adviser.com>
Fakeroot runs a command in an environment where it appears to have
root privileges for file manipulation, by setting LD_PRELOAD to a
library with alternative versions of getuid(), stat(), etc. This
is useful for allowing users to create archives (tar, ar, .deb, .rpm,
etc.) containing files with root permissions/ownership. Without
fakeroot one would have to have root privileges to create the
constituent files of the archives with the correct permissions and
ownership, and then pack them up, or one would have to construct
the archives directly, without using the archiver.
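
To make the last alternative concrete, the Python sketch below constructs
a tar archive directly, writing root ownership into the member headers
without any root privileges. It illustrates what fakeroot saves you from
doing by hand and does not involve fakeroot or LD_PRELOAD; root_owned_tar
and the file name are hypothetical.

```python
import io
import tarfile

def root_owned_tar(files):
    """Build an in-memory tar whose members are recorded as owned by root."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = 0o644
            info.uid = info.gid = 0   # ownership lives in the tar header
            info.uname = "root"
            info.gname = "wheel"      # group 0 is named wheel on FreeBSD
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

archive = root_owned_tar({"etc/example.conf": b"key = value\n"})

# Read the archive back to inspect the recorded ownership.
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    member = tar.getmember("etc/example.conf")
```

This works for a single format, but every archiver would need its own
hand-written equivalent, which is the inconvenience fakeroot avoids.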
WWW: http://freshmeat.net/projects/fakeroot
PR: ports/91532
Submitted by: Meno Abels <meno.abels@adviser.com>