WWW::Search and AutoSearch and WebSearch
========================================


WHAT IS NEW IN WWW::Search 2.01? (1999-07-14)
---------------------------------------------

overview:

    * Makefile now works in Win32!
    * Bug fixes for some backends (as usual)
    * New backends for Google, LookSmart, and OpenDirectory
    * All backends with their own version numbers have been bumped up
      to 2.01

See the file ChangeLog for details.


WHAT IS WWW::Search?
--------------------

WWW::Search is a collection of Perl modules which provide an API to
WWW (and similar) search engines.  Currently WWW::Search includes
back-ends for variations of AltaVista, Dejanews, Excite, HotBot,
Infoseek, Lycos, Magellan, WebCrawler, and Yahoo, among others.  We
include two applications built from this library: AutoSearch (a
program to automate tracking of search results over time), and
WebSearch, a small demonstration program to drive the library.

WWW::Search does NOT try to emulate the search that you would get with
each search engine's GUI.  Instead, WWW::Search performs the search in
a way that is efficient and convenient for text processing.  This
might include getting "text-only" pages, making sure descriptions are
turned on, and increasing the number of hits per page, among other
tricks.

Because WWW::Search depends on parsing the HTML output of web search
engines, it will fail if the search engine operators change their
format (an unfortunately frequent occurrence).  WWW::Search includes a
test suite for most back-ends which verifies that it is functioning
correctly.  As of the day of the release, the current back-end status
is:

    AltaVista                 working
    AltaVista::AdvancedNews   working
    AltaVista::AdvancedWeb    working
    AltaVista::News           working
    AltaVista::Web            working
    AltaVista::Intranet       working
    Crawler                   partially working
    Dejanews                  working
    Excite                    working
    Excite::News              working
    ExciteForWebServers       not working
    Fireball                  not working?
    FolioViews                working
    Google                    working
    Gopher                    not working? (not in test suite)
    HotBot                    working
    HotFiles                  not working? (not in test suite)
    Infoseek                  working
    Infoseek::Companies       working
    Infoseek::Email           not working
    Infoseek::News            working
    Infoseek::Web             working
    Livelink                  not working? (not in test suite)
    LookSmart                 working
    Lycos                     working
    Magellan                  working
    Metacrawler               not working
    Metapedia                 not working? (not in test suite)
    MSIndexServer             not working
    NorthernLight             working
    Null                      working
    OpenDirectory             working
    PLweb                     not working
    Profusion                 working
    Search97                  not working
    SFgate                    working
    Simple                    not working? (not in test suite)
    Snap                      working
    Verity                    not working (not in test suite)
    WebCrawler                working
    Yahoo                     working
    ZDNet                     working

``Partially working'' indicates that some tests passed and some failed.


WHAT IS AutoSearch?
-------------------

WWW::Search's primary client is AutoSearch.  AutoSearch performs a
web-based search and puts the result set in a web page.  It
periodically updates this web page, indicating how the search changes
over time.  Sample output from AutoSearch can be found at .

Output format is configurable.  See the AutoSearch man page for
details, or the DEMONSTRATION section below for quick-start
instructions.


REQUIREMENTS
------------

WWW::Search requires Perl5 and libwww-perl.  For information on Perl5,
see .  For libwww-perl, see .  Both are also available from the
Comprehensive Perl Archive Network (CPAN).  Visit to find a CPAN site
near you.

At the time of this release, primary WWW::Search development and
testing is done under perl version 5.005_03 on Sun Sparc Solaris 7 and
under ActiveState perl build 517 on Windows NT 4.0 with service pack 5.


AVAILABILITY
------------

The latest version of WWW::Search should always be available on CPAN.

Feedback about WWW::Search is encouraged.  If you're using it for a
neat application, please let us know.  If you would like to implement
(or have implemented) a new back-end for WWW::Search, let us know so
we don't duplicate work.


INSTALLATION
------------

In order to use this package you will need Perl version 5.002 or
better.
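Before building, it can help to confirm the two prerequisites named in
REQUIREMENTS and above: Perl 5.002 or better, and libwww-perl.  The
script below is a minimal sketch, not part of the WWW::Search
distribution.  It assumes that successfully loading LWP::UserAgent is
an adequate test for libwww-perl, and the function name
missing_prerequisites is hypothetical.  It sticks to older Perl
constructs (no `use warnings`, no statement modifiers on loops) so that
it should also run on the older perls discussed here.

```perl
#!/usr/bin/perl
# Hypothetical prerequisite check for WWW::Search (a sketch, not shipped
# with the distribution).
use strict;

# Return a list of human-readable problems; an empty list means both
# prerequisites checked here appear to be satisfied.
sub missing_prerequisites {
    my @problems;

    # $] holds the running perl's version as a number, e.g. 5.005003.
    if ($] < 5.002) {
        push @problems, 'perl ' . $] . ' is older than 5.002';
    }

    # Assumption: libwww-perl is present if LWP::UserAgent can be loaded.
    # require is evaluated at run time, so a missing module is caught here.
    unless (eval { require LWP::UserAgent; 1 }) {
        push @problems, 'libwww-perl (LWP::UserAgent) not found';
    }

    return @problems;
}

my @problems = missing_prerequisites();
if (@problems) {
    foreach my $problem (@problems) {
        print 'Missing: ', $problem, "\n";
    }
} else {
    print "Prerequisites look OK\n";
}
```

If libwww-perl is reported missing, install it from CPAN before
continuing.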
You install WWW::Search as you would install any perl module library,
by running these commands:

    perl Makefile.PL
    make
    make test
    make install

See below for a description of what "make test" does.

If you want to install a private copy of WWW::Search in your home
directory, then you should try to produce the initial Makefile with
something like this command:

    perl Makefile.PL PREFIX=/my/perl/lib


TESTING
-------

The "make test" command compares expected output from WWW::Search with
actual output.  It detects two kinds of errors:

- internal parsing: First it checks to make sure that your system
  computes the same results as my system based on some saved Web
  queries.  This test should always pass for working backends; if it
  doesn't, send me mail.

- external queries: Second, it makes real queries against the search
  engines and compares them with some saved results.  External queries
  can fail for several reasons:

    - new pages have been added which match the test queries (not a
      bad thing)
    - changes in the web search engine output which break
      WWW::Search's parsers (a bad thing)

If the external tests fail, please either investigate the error or
send a description of the problem and the output of "make test" to the
maintainer of the back-end for the search engine that fails.


DISCUSSION, BUG REPORTS, AND IMPROVEMENTS
-----------------------------------------

A mailing list for WWW::Search discussion exists.  To subscribe, send
"subscribe info-www-search" as the body of a message to .

Back-end-related bug reports ("search engine ABC doesn't work") should
be sent to the author of the back-end (back-end authors are identified
in the corresponding man page and in the output of ``make test'').
General bugs should be reported to .
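A bug report should give the versions of perl, WWW::Search, and the
back-end involved.  The script below is a minimal sketch for
collecting them; it is not part of the distribution, the helper name
module_version is hypothetical, and WWW::Search::AltaVista is used only
as an example back-end (substitute the one that failed).  It reads
each module's $VERSION through perl's built-in UNIVERSAL::VERSION
method.

```perl
#!/usr/bin/perl
# Hypothetical version report for WWW::Search bug reports (a sketch).
use strict;

# Load a module by name and return its $VERSION, or a short note if the
# module cannot be loaded or does not declare a version.
sub module_version {
    my ($module) = @_;
    # Turn "WWW::Search::AltaVista" into "WWW/Search/AltaVista.pm".
    (my $file = $module . '.pm') =~ s{::}{/}g;
    return 'not installed' unless eval { require $file; 1 };
    # UNIVERSAL::VERSION returns the package's $VERSION, or undef if unset.
    my $version = $module->VERSION;
    return defined $version ? $version : 'no $VERSION declared';
}

# $^O is the operating system name; $] is the perl version number.
my @report = ('operating system: ' . $^O, 'perl version: ' . $]);

# Example back-end only; replace it with the back-end you were using.
foreach my $module ('WWW::Search', 'WWW::Search::AltaVista') {
    push @report, $module . ': ' . module_version($module);
}

foreach my $line (@report) {
    print $line, "\n";
}
```

$^O names the operating system but not its version, so add that by
hand (on Unix, `uname -a` prints it).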
When submitting a bug report, please remember to include

    - your operating system name and version
    - your version of perl
    - your version of WWW::Search
    - your version of the backend
    - the code you ran to produce the error
    - sample output showing the error


DEMONSTRATION
-------------

After installing the client programs, try

    WebSearch '"Your Name Here"'

to see who's talking about you on the web.  Then (in your web page
directory), try

    AutoSearch -n 'me on the web' -s '"Your Name Here"' me

and the web page me/index.html will be created summarizing this
information.  Then add

    0 3 * * 1 AutoSearch /path/to/your/web/pages/me

to your crontab(1) to update this search once a week (at 03:00 every
Monday).


DOCUMENTATION
-------------

See `perldoc WWW::Search` for an overview of the library.  POD-style
documentation is also included in all modules and scripts.


FUTURE PLANS
------------

Some ideas:

- a global option that will force WWW::Search to perform the same
  search as the engine's web GUI (I'm looking for contributions of the
  precise arguments that will produce such a search for each engine,
  i.e. the hash that should be passed as the second argument to
  native_query)
- application-level proxy support (I'm looking for a contribution here
  from someone who uses/needs proxy support)
- more widespread use of new results tags across all back-ends
- a freeze/restore interface to suspend and resume in-progress queries
- more back-ends

Contributions from others are always welcome.  Send me e-mail if you
plan a new back-end or want to discuss architectural changes (to avoid
duplicating work).


SUPPORT AND CREDITS
-------------------

The WWW::Search architecture is by John Heidemann with feedback from
the other contributors.

NOTE: This list is no longer kept up to date; consult the on-line
documentation to find out who is currently maintaining each component.
PLATFORM SUPPORT:
    Unix                  John Heidemann
    Windows               Jim Smyser (see )

APPLICATIONS:
    WebSearch             John Heidemann
    AutoSearch            William Scheding

BACK-ENDS:
    AltaVista             John Heidemann
    Dejanews              Cesare Feroldi de Rosa and Martin Thurn
    Crawler               Andreas Borchert
    Excite                Glen Pringle and Martin Thurn
    ExciteForWebServers   Paul Lindner
    Fireball              Andreas Borchert
    FolioViews            Paul Lindner
    Gopher                Paul Lindner
    HotBot                William Scheding and Martin Thurn
    HotFiles              Jim Smyser
    Infoseek              Cesare Feroldi de Rosa and Martin Thurn
    Livelink              Paul Lindner
    Lycos                 William Scheding, John Heidemann, and Martin Thurn
    Magellan              Martin Thurn
    MSIndexServer         Paul Lindner
    NorthernLight         Jim Smyser
    Null                  Paul Lindner
    OpenDirectory         Jim Smyser
    PLWeb                 Paul Lindner
    Profusion             Jim Smyser
    Search97              Paul Lindner
    SFgate                Paul Lindner
    Simple                Paul Lindner
    Snap                  Jim Smyser
    Verity                Paul Lindner
    WebCrawler            Martin Thurn
    Yahoo                 William Scheding and Martin Thurn
    ZDNet                 Jim Smyser

AutoSearch is based on an earlier implementation by Kedar Jog with
advice from Joe Touch.

Bugs and extensions (to the software and documentation) have been
identified by William Scheding, T. V. Raman (proxy support),
C. Feroldi, Larry Virden, Paul Lindner, Guy Decoux, R Chandrasekar
(Mickey), Martin Thurn, Chris Nandor, Martin Valldeby, Jim Smyser,
Darren Stalder, Neil Bowers, Ave Wrigley, and Andreas Borchert.

Bugs have been reported by Joseph McDonald, Juan Jose Amor, Bowen
Dwelle, Vassilis Papadimos, Vidyut Luther, and Chris P. Acantilado.

Feedback, bug reports and fixes, and new back-ends should be sent to
Martin Thurn.  When sending e-mail, please put [WWW::Search] at the
beginning of the subject line (or I risk losing the message in the
pile).


COPYRIGHT
---------

Copyright (c) 1996 University of Southern California.
All rights reserved.
Redistribution and use in source and binary forms are permitted
provided that the above copyright notice and this paragraph are
duplicated in all such forms and that any documentation, advertising
materials, and other materials related to such distribution and use
acknowledge that the software was developed by the University of
Southern California, Information Sciences Institute.  The name of the
University may not be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Portions of this README are derived from the README for libwww-perl.