Revision history for Perl extension AI::Categorizer.

0.07  Tue May 6 16:15:04 CDT 2003

 - Oops - eg/demo.pl and t/15-knowledge_set.t didn't make it into the
   MANIFEST, so they weren't included in the 0.06 distribution.
   [Spotted by Zoltan Barta]

0.06  Tue Apr 22 10:27:26 CDT 2003

 - Added a relatively simple example script, eg/demo.pl, at the request
   of several people.

 - Forgot to note a dependency on Algorithm::NaiveBayes in version
   0.05. Fixed.

 - The AI::Categorizer class wasn't loading the
   AI::Categorizer::KnowledgeSet class. Fixed.

 - Fixed a bug in which the 'documents' and 'categories' parameters to
   KnowledgeSet objects were never accepted; they were always rejected
   with a claim that they failed the "All are Document objects" or
   "All are Category objects" callbacks. [Spotted by rob@phraud.org]

 - Moved the 'stopword_file' parameter from Categorizer.pm to the
   Collection class.

0.05  Sat Mar 29 00:38:21 CST 2003

 - Feature selection is now handled by an abstract FeatureSelector
   framework class. Currently the only concrete subclass implemented
   is FeatureSelector::DocFrequency. The 'feature_selection' parameter
   has been replaced with a 'feature_selector_class' parameter.

 - Added a k-Nearest-Neighbor machine learner. [First revision
   implemented by David Bell]

 - Added a Rocchio machine learner. [Partially implemented by Xiaobo Li]

 - Added a "Guesser" machine learner, which simply uses overall class
   probabilities to make categorization decisions. It is sometimes
   useful for providing a set of baseline scores against which to
   evaluate other machine learners.

 - The NaiveBayes learner is now a wrapper around my new
   Algorithm::NaiveBayes module, which is simply the old NaiveBayes code
   from this distribution, turned into its own standalone module.

 - Much more extensive regression testing of the code.

 - Added a Document subclass for XML documents. [Implemented by
   Jae-Moon Lee] Its interface is still unstable and may change in
   later releases.

 - Added a 'Build.PL' file for an alternate installation method using
   Module::Build.

 - Fixed a problem in the Hypothesis class's best_category() method that
   would often result in the wrong category being reported. Added a
   regression test to exercise the Hypothesis class. [Spotted by
   Xiaobo Li]

 - The 'categorizer' script now records more useful benchmarking
   information about time and memory in its outfile.

 - The AI::Categorizer->dump_parameters() method now tries to avoid
   showing you its entire list of stopwords.

 - Document objects now use a default 'name' if none is supplied.

 - For some Learner classes, the generated Hypothesis objects had
   non-functioning all_categories() methods. Fixed.

 - The Collection::Files class now uses File::Spec internally to manage
   cross-platform filenames.

 - Added the 'stopword_behavior' parameter for controlling how stopword
   lists and stemming interact. Previously, if stopwords and stemming
   were both used, stopwords were assumed to be pre-stemmed, which often
   isn't the case.

 - parse() is now an instance method of the Document class, not a class
   method. This means it can operate directly on an object; it no longer
   has to return a hash of content. This allows more flexible document
   parsing, but it may cause some backward-compatibility problems for
   people who were overriding the parse() method.

 - Added a parse_handle() method, which can parse a document directly
   from a filehandle.

 - Fixed documentation for add_hypothesis(). [Spotted by Thierry
   Guillotin]

 - Added documentation for the AI::Categorizer::Collection::Files class.

0.04  Thu Nov 7 19:27:15 AEST 2002

 - Added learners for SVMs, decision trees, and a pass-through to Weka.

 - Added a virtual class for binary classifiers.

 - Wrote documentation for many of the previously undocumented classes.

 - Added a PNG file giving an overview diagram of the classes.

 - Added a script, 'categorizer', to provide a simple command-line
   interface to AI::Categorizer.

 - save_state() and restore_state() now save to a directory, not a file.

 - Removed F1(), precision(), recall(), etc. from the Util package,
   since they're in Statistics::Contingency. Added random_elements() to
   Util.

 - Collection::Files now warns when no category information is known
   about a document in the collection (knowing it's in zero categories
   is okay).

 - Added the Collection::InMemory class.

 - Much more thorough testing with 'make test'.

 - Added an add_hypothesis() method to Experiment.

 - Added dot() and value() methods to FeatureVector.

 - Added a 'feature_selection' parameter to KnowledgeSet.

 - Added a document($name) accessor method to KnowledgeSet.

 - In KnowledgeSet, load(), read(), and scan_*() can now accept a
   Collection object.

 - Added document_frequency(), finish(), and weigh_features() methods to
   KnowledgeSet.

 - Added save_features() and restore_features() to KnowledgeSet.

 - Added default categories() and categorize() methods to the Learner
   base class. get_scores() is now abstract.

 - Extended the interface of the ObjectSet class with retrieve(),
   includes(), and includes_name().

 - Moved the 'term_weighting' parameter from Document to KnowledgeSet,
   since the normalized version needs to know the maximum term
   frequency. Also changed its values to 'n', 'l', 'b', and 't', with
   'x' a synonym for 't'.

 - Implemented the full range of TF/IDF term weighting methods (see
   Salton & Buckley, "Term Weighting Approaches in Automatic Text
   Retrieval", in the journal "Information Processing & Management",
   1988 #5).

0.03  Wed Jul 24 01:57:00 AEST 2002

 - First version released to CPAN.

0.01  Wed Apr 17 10:47:21 2002

 - Original version; created by h2xs 1.21 with options
   -XA -n AI::Categorizer