parser. It is implemented using the Xerces C++ API, and it provides
access to most of the C++ API from Perl.
WWW: http://xerces.apache.org/xerces-p/
PR: ports/95296
Submitted by: Ken Menzel <kenm@icarz.com>
written in Python.
It is designed to be easy to adapt and extend for your application.
Stuff you can do with the Reverend:
* classify RSS stories
* classify recipes by cuisine
* who do you write like: Shakespeare, Dickens or Austen?
* detect the language of a document
* is your code more like Guido's or Peter's?
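
The uses listed above all come down to training on labelled text and then
guessing the label of new text. A minimal sketch of that idea follows. It is
a plain multinomial naive Bayes classifier with Laplace smoothing, written
from scratch for illustration; it is not Reverend's implementation, and the
TinyBayes class and the training strings are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes:
    """Minimal multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> training texts seen

    @staticmethod
    def tokenize(text):
        # Lowercase and keep runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, label, text):
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def guess(self, text):
        """Return the known labels, most likely first."""
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in self.tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return sorted(scores, key=scores.get, reverse=True)


guesser = TinyBayes()
guesser.train("french", "le la les du un une je il est et bonjour merci")
guesser.train("english", "the a an of is and it hello thanks you are")
best = guesser.guess("il est un homme et je dis merci")[0]
```

With real training data the same two calls, train and guess, would cover
language detection, author attribution, or cuisine classification; only the
labels and the texts change.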
Author: Amir Bakhtiar <amir@divmod.org>
WWW: http://www.divmod.org/trac/wiki/DivmodReverend
PR: ports/96531
Submitted by: Nicola Vitale <nivit@email.it>
written in Perl and C. The archetypal application is website search, but it
can be put to many different uses.
Features
* Extremely fast and scalable - can handle millions of documents
* Full support for 12 Indo-European languages.
* Support for the boolean operators AND, OR, and AND NOT; parenthetical
groupings; and prepended +plus and -minus
* Algorithmic selection of relevant excerpts and highlighting of search terms
within excerpts
* Highly customizable query and indexing APIs
* Phrase matching
* Stemming
* Stoplists
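
To make the stoplist and +plus/-minus features concrete, here is a small
sketch of an inverted index with a prefix-style query evaluator. It only
illustrates the general technique and does not use or mirror the KinoSearch
API; STOPLIST, build_index and search are hypothetical names, and stemming,
phrase matching and relevance ranking are left out.

```python
import re
from collections import defaultdict

STOPLIST = {"a", "an", "and", "of", "the", "to"}  # hypothetical stoplist


def tokenize(text):
    """Lowercase, split on non-letters, and drop stoplisted words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPLIST]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, all_ids, query):
    """Evaluate a query made of +required, -excluded and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.split():
        if token.startswith("+"):
            required.extend(tokenize(token))
        elif token.startswith("-"):
            excluded.extend(tokenize(token))
        else:
            optional.extend(tokenize(token))
    if required:
        # Every +term must be present; optional terms are ignored here
        # because this sketch does no ranking.
        hits = set(all_ids)
        for term in required:
            hits &= index.get(term, set())
    else:
        # Without +terms, any optional term is enough to match.
        hits = set()
        for term in optional:
            hits |= index.get(term, set())
    for term in excluded:
        hits -= index.get(term, set())
    return sorted(hits)


docs = {1: "The cat sat on the mat", 2: "A dog chased the cat", 3: "The dog slept"}
index = build_index(docs)
only_cats = search(index, docs, "+cat -dog")
```

Here "+cat -dog" keeps documents that mention cat and drops those that
mention dog, which leaves document 1.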
WWW: http://www.rectangular.com/kinosearch/
PR: ports/96115
Submitted by: Vivek Khera <vivek@khera.org>
XML::RSS::Parser is a lightweight liberal parser of RSS feeds. This parser
is "liberal" in that it does not demand compliance with a specific RSS version
and will attempt to gracefully handle tags it does not expect or understand.
The parser's only requirement is that the file is well-formed XML and
remotely resembles RSS. Roughly speaking, this means well-formed XML with a
channel element as a direct sibling or the root tag, together with item
elements and the like.
There are a number of advantages to using this module rather than just using
a standard parser-tree combination. There are a number of different RSS
formats in use today. In very subtle ways these formats are not entirely
compatible with one another. XML::RSS::Parser makes a couple of assumptions
to "normalize" the parse tree into a more consistent form. For instance,
it forces channel and item into a parent-child relationship.
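
The normalization described above can be sketched as follows. This is not
XML::RSS::Parser's code (the module is Perl); it is a Python illustration of
the same idea using the standard library, and the normalize function and
both sample feeds are hypothetical. It accepts items either nested inside
channel (RSS 2.0 style) or placed beside it (RSS 1.0 / RDF style) and returns
them in one consistent shape.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def title_of(element):
    """Return the text of the first direct <title> child, or ''."""
    for child in element:
        if local_name(child.tag) == "title":
            return (child.text or "").strip()
    return ""


def normalize(xml_text):
    """Return (channel_title, [item_titles]) wherever the items sit."""
    root = ET.fromstring(xml_text)
    # Raises StopIteration if the document has no channel element at all.
    channel = next(el for el in root.iter() if local_name(el.tag) == "channel")
    # Collect items anywhere in the tree and treat them as the channel's
    # children, whatever their original position.
    items = [el for el in root.iter() if local_name(el.tag) == "item"]
    return title_of(channel), [title_of(item) for item in items]


RSS2 = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>One</title></item><item><title>Two</title></item>
</channel></rss>"""

RSS1 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/">
<channel><title>Feed B</title></channel>
<item><title>One</title></item><item><title>Two</title></item>
</rdf:RDF>"""

feed_a = normalize(RSS2)
feed_b = normalize(RSS1)
```

Both calls return the same structure, a channel title plus a list of item
titles, even though the two inputs arrange channel and item differently.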
WWW: http://search.cpan.org/dist/XML-RSS-Parser/
Google SiteMaps.
The Sitemap Protocol allows you to inform search engine
crawlers about URLs on your Web sites that are available
for crawling.
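
As an illustration of what such a file looks like, the sketch below builds a
small sitemap with Python's standard library. It does not use the
WWW::Google::SiteMap API; build_sitemap and the example.com URLs are
hypothetical, and the sitemaps.org 0.9 namespace is an assumption about the
protocol version, which may differ from the schema the module targets.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Serialize (loc, lastmod) pairs as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # lastmod is optional; emit it only when a date is given.
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2006-05-01"),
    ("http://www.example.com/about.html", None),
])
```

Each url element carries a required loc and, where known, a lastmod date
that crawlers can use to decide what to fetch again.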
WWW: http://search.cpan.org/dist/WWW-Google-SiteMap/
the excellent Enchant spellchecker available as a Python module.
The bindings are generated using SWIG. It includes all the functionality
of Enchant with the flexibility of Python and a nice 'Pythonic'
object-oriented interface. It also aims to provide some higher-level
functionality than is available in the C API.
Author: Ryan Kelly <ryan@rfk.id.au>
WWW: http://pyenchant.sourceforge.net/
PR: ports/95284
Submitted by: Nicola Vitale <nivit@email.it>