LuceneKit is a class-to-class port of Lucene to GNUstep. It is a technology
suitable for nearly any application that requires full-text search.
WWW: http://www.etoile-project.org/
It uses Oniguruma as its regular expression engine.
This is a GNUstep fork of OgreKit 2.1.2
<http://www8.ocn.ne.jp/~sonoisa/OgreKit/>.
Since it is a fork, the API may differ in the future.
The original OgreKit is distributed under the BSD License.
This fork also uses the BSD license (see the COPYING document).
WWW: http://www.etoile-project.org/
a classic GNU-style ChangeLog from a Subversion repository log. It is built
from several changelog-like scripts, using common XSLT constructs found in
different places.
WWW: http://ch.tudelft.nl/~arthur/svn2cl/
PR: ports/107007
Submitted by: Alexander Logvinov <ports at logvinov.com>
a stack of flashcards, but handles one-to-many and many-to-one word
relationships better, and includes an integrated scheduler for efficient use
of your 'cards'. Popup was written by Bjorn Ghola and Rob Burns.
Features:
* An editor for cardstack files with support for copying and pasting groups
of words, as well as drag and drop.
* Three quiz styles: multiple choice, spelling, and flashcard.
* Supports quizzes and practice
* Graduated time interval scheduler.
* Localized for Thai and German.
WWW: http://popup.sourceforge.net/
software tool that converts plain text formatting to (X)HTML. The
formatting syntax is designed to be easy and intuitive for web authors
and resembles typical email formatting conventions. The resultant
(X)HTML is structurally valid.
WWW: http://www.freewisdom.org/projects/python-markdown
PR: ports/105992
Submitted by: Graham Todd <gtodd at bellanet.org>
technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization".
It was primarily developed for language guessing, a task on which it is known to
perform with near-perfect accuracy.
WWW: http://software.wise-guys.nl/libtextcat/
.strings files must be distributed in ASCII encoding, which generally
isn't a convenient encoding to do translation in. As an example, it's rather
difficult to enter Chinese characters into an ASCII-encoded text file.
Localize will, with any luck, help out with this. Currently it's just a
shell of an application, but sometime in the future I hope to complete it.
WWW: http://www.eskimo.com/~pburns/Localize/
It provides a shared library to parse, generate, manipulate and
validate XML documents from within your own application.
(Linux version)
WWW: http://xml.apache.org/xerces-c/
PR: ports/105275
Submitted by: Alexander Logvinov <ports at logvinov.com>
A commercial license is also available for embedded use.
Generally, it's a standalone search engine, meant to provide fast,
size-efficient and relevant fulltext search functions to other
applications. Sphinx was specially designed to integrate well with SQL
databases and scripting languages. Currently, the built-in data sources
support fetching data either via a direct connection to MySQL or from
an XML pipe.
As for the name, Sphinx is an acronym which is officially decoded as
SQL Phrase Index.
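As a rough illustration (not taken from the port; host, credentials, and
paths are placeholders), a minimal sphinx.conf wiring a MySQL source into
an index could look like:

  source src1
  {
    # fetch documents straight from MySQL
    type      = mysql
    sql_host  = localhost
    sql_user  = test
    sql_pass  =
    sql_db    = test
    sql_query = SELECT id, title, content FROM documents
  }

  index test1
  {
    source = src1
    path   = /var/db/sphinx/test1
  }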
WWW: http://www.sphinxsearch.com/
PR: ports/105649
Submitted by: Matthew Seaman <m.seaman at infracaninophile.co.uk>
Unicode::Unihan - The Unihan Data Base 3.2.0
  use Unicode::Unihan;
  my $db = new Unicode::Unihan;
  print join(",", $db->Mandarin("\x{5c0f}\x{98fc}\x{5f3e}")), "\n";
This module provides a user-friendly interface to the Unicode Unihan
Database 3.2. With this module, using the Unihan database is as easy as
shown above.
WWW: http://search.cpan.org/dist/Unicode-Unihan/
2006-11-05 deskutils/offix-trash: development ceased in 1996
2006-11-04 devel/mingw: use mingw32-* ports instead
2006-11-04 devel/mingw-binutils: use mingw32-* ports instead
2006-11-04 devel/mingw-bin-msvcrt: use mingw32-* ports instead
2006-11-04 devel/mingw-gcc: use mingw32-* ports instead
2006-11-04 devel/mingw-opengl-headers: use mingw32-* ports instead
2006-11-05 editors/offix-editor: development ceased in 1996
2006-11-05 print/offix-printer: development ceased in 1996
2006-11-05 sysutils/wmmon: no longer available from mastersite
2006-11-04 sysutils/xsysinfo: no longer available from mastersite
2006-11-04 textproc/xmlada: no longer available from mastersite; 2.0 is available
2006-11-05 www/p5-CGI-Application-ValidateRM: no longer available from mastersites
2006-11-05 x11/offix-clipboard: development ceased in 1996
2006-11-05 x11/offix-execute: development ceased in 1996
2006-11-05 x11-fm/offix-files: development ceased in 1996
2006-11-05 x11-wm/icepref: is for IceWM version 1.04 (6 years old)
Cocoa libraries. The GNUstep port that can be found here was done by me.
It was very easy to do, primarily requiring only new interface files and
build files.
PR: 104964
Submitted by: Gürkan Sengün
is simple: Using "Text::ExtractWords" and "Lingua::StopWords" from CPAN,
it determines how many of the known stopwords the document contains for
each language supported by "Lingua::StopWords".
Each word in the document recognized as a stopword of a particular
language scores one point for that language.
The "language_guess()" function takes a document as a parameter and
returns the abbreviation of the language that it is most likely written
in.
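A minimal usage sketch (the file name here is a placeholder; the
constructor and method follow the module's documented interface):

  use Text::Language::Guess;
  # guess the language of a text document on disk
  my $guesser = Text::Language::Guess->new();
  my $lang = $guesser->language_guess("document.txt");
  print "Best guess: $lang\n";   # e.g. "en"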
Author: Mike Schilli <cpan@perlmeister.com>
WWW: http://search.cpan.org/~mschilli/Text-Language-Guess-0.02/
PR: ports/103571
Submitted by: Masahiro Teramoto <markun@onohara.to>
ffe is a program for extracting fields from flat file records and
displaying them in different formats. ffe relies on the configuration file
to control input file structure and the output format.
WWW: http://sourceforge.net/projects/ff-extractor/
Author: Timo Savinen <tjsa@iki.fi>
arbitrary text and also allows you to mark up a text as HTML
with the keywords.
A Hatena keyword is an element of the *.hatena.ne.jp suite of web sites,
which includes blogs and social bookmarks, among others.
Please refer to http://d.hatena.ne.jp/keyword/ (in Japanese) for details.
In Hatena Diary, a blog hosting service, a Hatena keyword found in
a posting is linked to the keywords page automatically.
You can implement the same kind of feature outside Hatena using this module.
It queries the Hatena Keyword Link API internally to retrieve the terms.
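A short usage sketch (method names taken from the module's CPAN synopsis;
the sample text is made up, and both calls query the Hatena API over the
network):

  use Hatena::Keyword;
  # extract Hatena keywords found in a piece of text
  my @keywords = Hatena::Keyword->extract("Perl and Ruby love each other.");
  print "$_\n" for @keywords;
  # or mark the text up as HTML with keyword links
  my $html = Hatena::Keyword->markup_as_html("Perl and Ruby love each other.");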
Author: Naoya Ito <naoya@bloghackers.net>
WWW: http://search.cpan.org/~naoya/Hatena-Keyword-0.04/
PR: ports/102794
Submitted by: Masahiro Teramoto <markun(at)onohara.to>
This is a smaller, cheaper, faster SED implementation. Minix uses it. GNU
used to use it, until they built their own sed around an extended (some
would say over-extended) regexp package.
For embedded use we searched for a tiny sed implementation especially for
use with the dietlibc and found Eric S. Raymond's sed implementation quite
handy. However, it suffered from several bugs and was no longer under active
maintenance. After sending a bunch of fixes, we agreed to continue maintaining
this lovely, historic sed implementation.
Along with a lot of fixes and cleanups, further speedups, and some missing
features and POSIX conformance work, we also added a test suite to the
package, so regressions are quickly and easily uncovered.
WWW: http://www.exactcode.de/oss/minised/
Author: ExactCode <info@exactcode.de>
Basically, this package contains:
- Functions to automatically adjust and cycle the section underline
decorations;
- A mode that displays the table of contents and allows you to jump anywhere
from it;
- Functions to insert and automatically update a TOC in your source
document;
- A mode which supports font-lock highlighting of reStructuredText
structures;
- Some other convenience functions.
This package is the result of merging:
- restructuredtext.el
- rst-mode.el
- rst-html.el
Those files are now OBSOLETE and have been replaced by this single
package file (2005-10-30).
WWW: http://docutils.sourceforge.net/docs/user/emacs.html
PR: ports/102384
Submitted by: Denis Shaposhnikov <dsh at vlink.ru>
Perl. Everything is implemented as small plugins, and you can mash them
up together using the Plagger core API and plugin hooks. You can think
of Plagger as a blosxom or qpsmtpd for RSS aggregation.
WWW: http://plagger.org/
WARNING: This port depends on thousands of ports, especially with
full options.
xxdiff is a computer program that allows a user (usually a software
developer of some sort) to easily visualize the differences between
files. The manner and goal for which this process is applied over
multiple files is highly dependent on the application, and most of
the time is driven by custom user scripts.
For example, a configuration management engineer in a company might
provide some kind of merge policing environment that allows software
developers to review changes in files for the purpose of accepting or
rejecting a submitted changeset to a codebase. Another example is
that of a developer wishing to review the changes he made to a
checkout of files from a source-code management system such as CVS,
Subversion, ClearCase, Perforce, etc.
WWW: http://furius.ca/xxdiff/doc/xxdiff-scripts.html
Flex is a tool for generating scanners. A scanner, sometimes called a
tokenizer, is a program which recognizes lexical patterns in text. The
flex program reads user-specified input files, or its standard input
if no file names are given, for a description of a scanner to generate.
The description is in the form of pairs of regular expressions and C
code, called rules. Flex generates a C source file named "lex.yy.c",
which defines the function yylex(). The file "lex.yy.c" can be compiled
and linked to produce an executable. When the executable is run, it
analyzes its input for occurrences of text matching the regular
expressions for each rule. Whenever it finds a match, it executes the
corresponding C code.
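As an illustrative sketch (a made-up input file, not shipped with the
port), a complete scanner description can be this small:

  %option noyywrap
  %%
  [0-9]+       printf("NUMBER: %s\n", yytext);
  [A-Za-z]+    printf("WORD: %s\n", yytext);
  .|\n         ;   /* ignore anything else */
  %%
  int main(void) { return yylex(); }

Running "flex scan.l" and compiling the generated lex.yy.c yields a
program that labels numbers and words found on its standard input.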
WWW: http://flex.sourceforge.net/
Note that there's flex 2.5.4 in the base system. This port provides
a newer version for programs that require it, textproc/xxdiff for one.