Docs: Did You Mean SWSC
author Andrea Buntz Neiman <abneiman@equinoxinitiative.org>
Fri, 9 Apr 2021 14:42:37 +0000 (10:42 -0400)
committer Galen Charlton <gmc@equinoxinitiative.org>
Mon, 12 Apr 2021 15:22:56 +0000 (11:22 -0400)
Signed-off-by: Andrea Buntz Neiman <abneiman@equinoxinitiative.org>
Signed-off-by: Galen Charlton <gmc@equinoxinitiative.org>

docs/modules/admin_initial_setup/assets/images/media/dym_staffcat.png [new file with mode: 0644]
docs/modules/admin_initial_setup/nav.adoc
docs/modules/admin_initial_setup/pages/dym_admin.adoc [new file with mode: 0644]
docs/modules/opac/assets/images/media/dym_kpac.png [new file with mode: 0644]
docs/modules/opac/assets/images/media/dym_tpac.png [new file with mode: 0644]
docs/modules/opac/assets/images/media/dym_tpac_nohits.png [new file with mode: 0644]

diff --git a/docs/modules/admin_initial_setup/assets/images/media/dym_staffcat.png b/docs/modules/admin_initial_setup/assets/images/media/dym_staffcat.png
new file mode 100644 (file)
index 0000000..5c412fc
Binary files /dev/null and b/docs/modules/admin_initial_setup/assets/images/media/dym_staffcat.png differ
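diff --git a/docs/modules/admin_initial_setup/nav.adoc b/docs/modules/admin_initial_setup/nav.adoc
index 7e2c247..c2b4440 100644 (file)
--- a/docs/modules/admin_initial_setup/nav.adoc
+++ b/docs/modules/admin_initial_setup/nav.adoc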
@@ -1,28 +1,30 @@
 * xref:admin_initial_setup:introduction.adoc[System Configuration and Customization]
 ** xref:admin_initial_setup:describing_your_organization.adoc[Describing your organization]
 ** xref:admin_initial_setup:describing_your_people.adoc[Describing your people]
+*** xref:admin:patron_address_by_zip_code.adoc[Patron Address City/State/County Pre-Populate by ZIP Code]
 ** xref:admin_initial_setup:migrating_patron_data.adoc[Migrating Patron Data]
 ** xref:admin_initial_setup:migrating_your_data.adoc[Migrating from a legacy system]
 ** xref:admin_initial_setup:importing_via_staff_client.adoc[Importing materials in the staff client]
 ** xref:admin_initial_setup:ordering_materials.adoc[Ordering materials]
 ** xref:admin_initial_setup:designing_your_catalog.adoc[Designing your catalog]
-** xref:admin:search_interface.adoc[Designing the patron search experience]
-** xref:admin_initial_setup:borrowing_items.adoc[Borrowing items: who, what, for how long]
-** xref:admin:autorenewals.adoc[Autorenewals in Evergreen]
-** xref:admin_initial_setup:hard_due_dates.adoc[Hard due dates]
-** xref:admin:template_toolkit.adoc[TPac Configuration and Customization]
-** xref:admin_initial_setup:carousels.adoc[Carousels]
-** xref:opac:new_skin_customizations.adoc[Creating a New Skin: the Bare Minimum]
-** xref:admin:auto_suggest_search.adoc[Auto Suggest in Catalog Search]
-** xref:admin:authentication_proxy.adoc[Authentication Proxy]
+*** xref:admin:template_toolkit.adoc[TPac Configuration and Customization]
+*** xref:opac:new_skin_customizations.adoc[Creating a New Skin: the Bare Minimum]
+*** xref:admin_initial_setup:carousels.adoc[OPAC Carousels]
 ** xref:admin_initial_setup:KidsOPAC.adoc[Kid's OPAC Configuration]
 ** xref:admin_initial_setup:bootstrap_opac.adoc[Enabling the Experimental Bootstrap OPAC]
-** xref:admin:patron_address_by_zip_code.adoc[Patron Address City/State/County Pre-Populate by ZIP Code]
-** xref:admin:phonelist.adoc[Phonelist.pm Module]
-** xref:admin:sip_server.adoc[SIP Server]
+** xref:admin:search_interface.adoc[Designing the patron search experience]
+*** xref:admin:auto_suggest_search.adoc[Auto Suggest in Catalog Search]
+*** xref:admin_initial_setup:dym_admin.adoc[Did You Mean? Search Suggestions]
+*** xref:admin_initial_setup:geosort_admin.adoc[Configuring Sort by Geographic Proximity]
+** xref:admin_initial_setup:borrowing_items.adoc[Borrowing items: who, what, for how long]
+*** xref:admin:autorenewals.adoc[Autorenewals in Evergreen]
+*** xref:admin_initial_setup:hard_due_dates.adoc[Hard due dates]
 ** xref:admin:apache_rewrite_tricks.adoc[Apache Rewrite Tricks]
 ** xref:admin:apache_access_handler.adoc[Apache Access Handler Perl Module]
+** xref:admin:authentication_proxy.adoc[Authentication Proxy]
+*** xref:admin_initial_setup:single_sign_on.adoc[Single Sign On]
+** xref:admin:backups.adoc[Backing up your Evergreen System]
 ** xref:admin:ebook_api_service.adoc[ebook_api service]
 ** xref:admin:hold_targeter_service.adoc[hold-targeter service]
-** xref:admin:backups.adoc[Backing up your Evergreen System]
-
+** xref:admin:phonelist.adoc[Phonelist.pm Module]
+** xref:admin:sip_server.adoc[SIP Server]
diff --git a/docs/modules/admin_initial_setup/pages/dym_admin.adoc b/docs/modules/admin_initial_setup/pages/dym_admin.adoc
new file mode 100644 (file)
index 0000000..f74ccc6
--- /dev/null
+++ b/docs/modules/admin_initial_setup/pages/dym_admin.adoc
@@ -0,0 +1,151 @@
+= Did You Mean?: Search Suggestions Administration
+:toc:
+
+indexterm:[Searching,Search Suggestions] 
+
+== Introduction
+
+As of Evergreen 3.7, the Did You Mean feature offers search suggestions for searches consisting of a single word within a single search class. For the purposes of suggestions, Evergreen's search classes are keyword, title, author, series, and subject. Search suggestions are available in the public catalog (both the TPAC and Bootstrap versions), the Children's OPAC (KPAC), and the Angular Staff Catalog.
+
+Future iterations of this project plan to add multi-word, cross-class,
+and other search suggestion mechanisms.
+
+Several search suggestion ordering mechanisms have been added; they are
+described in the Library Settings section below. The relative weight of
+each suggestion ordering mechanism can be adjusted to prioritize
+different suggestion routes. Each Evergreen organization will need to
+determine the best configuration of weights and suggestion ordering
+settings.
+
+Search suggestions are based on existing bibliographic data and are
+offered for potentially correctable spelling mistakes. A new set of
+tables has been added to collect bibliographic data and build an
+internal dictionary of potential search suggestions. When a catalog
+search meets the criteria for offering suggestions, this dictionary is
+used to generate the suggestions.
+
+The end user is shown a configurable number of suggestions, each
+hyperlinked to run a new search for that suggestion. Any search
+options, such as Format, that were set on the original search are
+carried over to the new search.
+
+Evergreen’s existing use of search term stemming has not been altered as
+a consequence of this work.
+
+== Search Results Display
+
+In all cases, search suggestions are offered for potentially correctable
+spelling mistakes only when both of these conditions are met: the search
+retrieves no more than a configured number of results, and each suggested
+term appears at least a configured number of times in the bibliographic
+data. Both thresholds are configured via the Library Settings described below.
+
+For examples of where suggestions appear in the various public catalog interfaces, see the xref:opac:using_the_public_access_catalog.adoc#did_you_mean[Did You Mean?] section of the OPAC documentation.
+
+Search suggestions in the Staff Catalog appear at the bottom of the search area.
+
+image::media/dym_staffcat.png[Search suggestions in the Staff Catalog]
+
+== Administration
+
+=== Library Settings
+
+Search suggestions are controlled by several Library Settings. Three
+settings set thresholds for spelling suggestions, and three settings
+control the weighting of different suggestion mechanisms. A lower number
+represents a ‘lighter’ weight. All settings accept a numeric value as
+input. Library settings are inherited unless a value is set at an
+organizationally closer level.
+
+* *Maximum search result count at which spelling suggestions may be offered*
+** Default value is 0, which means suggestions will only be offered if
+there are no results.
+** If a search has this number or fewer results, and there are correctable
+spelling mistakes, a suggested search may be provided.
+** If you want all searches to generate suggestions, you can set this to an
+artificially high number, but it’s possible that this will generate
+less-useful suggestions.
+* *Minimum required uses of a spelling suggestions that may be offered*
+** Default is 1.
+** The number of indexed bibliographic strings in which a spelling
+suggestion must appear in order to be offered to a user. Suggestions
+must appear in the bib data.
+* *Maximum number of spelling suggestions that may be offered*
+** The maximum recommended value for this setting is 3, since suggestions
+become rapidly less useful beyond that point.
+** If this is set to 0, no suggestions will be provided.
+** All values other than 0 only provide suggestions that meet the *Minimum
+required uses* threshold, and only when the *Maximum search result
+count* threshold is not passed.
+** If this is set to -1, the system will provide the best suggestion
+(dependent on the weights of various suggestion mechanisms) if and only
+if the term is considered misspelled based on the *Minimum required
+uses* setting.
+** If this is set to 1 or more, that is the maximum number of suggestions
+that will be provided.
+* *Pg_trgm score weighting in OPAC spelling suggestions*
+** Defaults to 0 for "off".
+** Controls the relative weight of the scaled pg_trgm component.
+** Input can be any positive or negative whole number, but testing
+demonstrates that setting this to 1 can significantly improve
+suggestions for most catalogs.
+* *Soundex score weighting in OPAC spelling suggestions*
+** Defaults to 0 for "off".
+** Controls the relative weight of the scaled soundex component.
+** Input can be any positive or negative whole number, but testing
+demonstrates that setting this to 1 can improve suggestions for catalogs
+that are primarily English.
+* *QWERTY Keyboard similarity score weighting in OPAC spelling
+suggestions*
+** Defaults to 0 for "off".
+** Controls the relative weight of the scaled keyboard distance component.
+** While this option is available, it can have a negative impact on
+suggestions, and a value greater than 0 is not recommended for most
+catalogs.
+** If an administrator decides to use this weighting, it will accept any
+positive or negative whole number value.
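The interaction of the three threshold settings above can be summarized as a small decision function. The Python below is a minimal sketch under stated assumptions, not Evergreen's implementation. The class and function names are hypothetical, candidates are assumed to arrive already ranked best-first together with their usage counts, and the -1 mode is read as "offer the single best suggestion only when the search term itself falls below the *Minimum required uses* threshold".

```python
from dataclasses import dataclass


@dataclass
class SuggestionSettings:
    # Hypothetical names standing in for the three threshold Library Settings.
    max_result_count: int = 0  # Maximum search result count at which suggestions may be offered
    min_uses: int = 1          # Minimum required uses of a suggestion in the bib data
    max_suggestions: int = 0   # 0 = off, -1 = best suggestion only if misspelled, N >= 1 = up to N


def choose_suggestions(result_count, term_uses, ranked_candidates, settings):
    """Return the suggestions to display for one single-word search.

    result_count: number of hits the original search returned
    term_uses: number of bib strings containing the user's own term
    ranked_candidates: (word, uses) pairs, already ordered best-first
    """
    # 0 disables suggestions; values below -1 are undocumented, so treat them as off too.
    if settings.max_suggestions == 0 or settings.max_suggestions < -1:
        return []
    # Suggestions are offered only when the search returned this many results or fewer.
    if result_count > settings.max_result_count:
        return []
    # A candidate must itself appear often enough in the bibliographic data.
    eligible = [word for word, uses in ranked_candidates if uses >= settings.min_uses]
    if settings.max_suggestions == -1:
        # Offer only the single best candidate, and only if the search term
        # is considered misspelled (it falls below the min_uses threshold).
        return eligible[:1] if term_uses < settings.min_uses else []
    return eligible[: settings.max_suggestions]
```

With the default *Maximum search result count* of 0, a search that returns even one hit gets no suggestions, whatever the other two settings say.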
+
+
+The three similarity measures, Pg_trgm (trigram), Soundex, and QWERTY
+Keyboard similarity, are calculated by comparing the user's search input
+to each potential suggestion. The Library Setting values for Pg_trgm,
+Soundex, and QWERTY are multipliers applied to each similarity measure.
+For example, setting the Pg_trgm weight to 2 doubles the score for that
+similarity measure.
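To make the weighting concrete, the sketch below computes two of the three measures for a search term and a candidate, then applies the weights as multipliers. It assumes the documented behaviour of PostgreSQL's pg_trgm similarity(): a lowercased word is padded with two leading spaces and one trailing space, and the number of shared distinct trigrams is divided by the number of distinct trigrams in either word. It also assumes American Soundex compared the way the fuzzystrmatch difference() function does, by counting matching positions in the two four-character codes. Whether Evergreen calls these exact functions is an assumption. Dividing the Soundex match count by 4 is a hypothetical scaling chosen only for illustration, and the QWERTY measure is omitted because its formula is not described here.

```python
def trigrams(word: str) -> set:
    """Distinct trigrams of one word, padded the way pg_trgm pads words."""
    if not word:
        return set()
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# American Soundex digit for each consonant; vowels, h, w and y have no digit.
_SOUNDEX = {ch: digit
            for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                   "4": "l", "5": "mn", "6": "r"}.items()
            for ch in letters}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX.get(c, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        # h and w do not separate identical digits; vowels (and y) do.
        if c not in "hw":
            prev = digit
    return (code + "000")[:4]


def soundex_similarity(a: str, b: str) -> float:
    """Matching Soundex positions (like difference()), scaled to 0.0 to 1.0."""
    sa, sb = soundex(a), soundex(b)
    if not sa or not sb:
        return 0.0
    return sum(x == y for x, y in zip(sa, sb)) / 4


def weighted_score(term, candidate, trgm_weight=0, soundex_weight=0):
    # The Library Setting values act as plain multipliers on each scaled score.
    return (trgm_weight * trigram_similarity(term, candidate)
            + soundex_weight * soundex_similarity(term, candidate))
```

For example, "cat" and "cart" share two of seven distinct trigrams, so their trigram similarity is about 0.29. With both weights left at the default of 0, every candidate scores 0 and the weighted sum has no effect on ordering.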
+
+The final order of a group of potential suggestions is determined first
+by the Damerau-Levenshtein edit distance, and then by the sum of the
+similarity measures, each multiplied by its weight. If suggestions drawn
+from a particular corpus are shown to benefit from giving additional
+consideration to one or more of the measures, the weights for those
+measures can be increased.
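The ordering rule described above amounts to a two-part sort key. This is a hedged sketch. The `Candidate` type and `rank()` function are hypothetical, the scaled similarity scores are assumed to be precomputed, and smaller edit distances and larger weighted sums are assumed to sort first.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    word: str
    edit_distance: int  # Damerau-Levenshtein distance from the search term
    trgm: float         # scaled pg_trgm similarity (assumed 0.0 to 1.0)
    soundex: float      # scaled Soundex similarity (assumed 0.0 to 1.0)
    qwerty: float       # scaled keyboard similarity (assumed 0.0 to 1.0)


def rank(candidates: List[Candidate], trgm_weight=0, soundex_weight=0, qwerty_weight=0):
    """Order candidates: smallest edit distance first, then highest weighted sum."""
    def weighted_sum(c: Candidate) -> float:
        # Each Library Setting weight multiplies its own similarity score.
        return trgm_weight * c.trgm + soundex_weight * c.soundex + qwerty_weight * c.qwerty

    # Negating the weighted sum makes larger sums sort earlier; sorted() is stable,
    # so candidates that tie on both keys keep their incoming order.
    return sorted(candidates, key=lambda c: (c.edit_distance, -weighted_sum(c)))
```

Because edit distance is the primary key, no weighting can lift a candidate two edits away above one that is a single edit away. The weights only break ties within the same distance.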
+
+Empirical testing and existing research show that increasing the weight
+of any similarity measure beyond 1 is not useful for a reasonable,
+representative set of bibliographic records, and that a multiplier of 1
+for Pg_trgm and Soundex is ideal for primarily English catalogs; however,
+all data sets vary.
+
+=== Internal Flags
+
+The suggestion mechanism primarily uses a SymSpell implementation in
+Evergreen’s Postgres database. The SymSpell edit distance and prefix key
+length are controlled by two internal global flags,
+*symspell.prefix_length* and *symspell.max_edit_distance*. A full
+dictionary rebuild is required if either of these flags is changed.
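The role of the two flags is easiest to see in a simplified version of SymSpell's symmetric-delete lookup. This Python sketch is not Evergreen's database implementation, and its function names are hypothetical. Each dictionary word is truncated to `prefix_length` characters, and every string obtainable by deleting up to `max_edit_distance` characters is indexed. A search term generates its own deletes, and any shared delete yields a candidate. Because the stored deletes depend on both values, changing either flag invalidates the index, which is why a full rebuild is needed. Candidates returned here still have to be checked with a real edit distance, the Damerau-Levenshtein step.

```python
from collections import defaultdict


def deletes(word, max_edit_distance, prefix_length):
    """All strings reachable from the word's prefix by up to N single-character deletions."""
    key = word[:prefix_length]
    found = {key}
    frontier = {key}
    for _ in range(max_edit_distance):
        # Delete one more character from every string produced in the previous round.
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        found |= frontier
    return found


def build_index(dictionary, max_edit_distance=3, prefix_length=6):
    """Map every delete to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for d in deletes(word, max_edit_distance, prefix_length):
            index[d].add(word)
    return index


def candidates(term, index, max_edit_distance=3, prefix_length=6):
    """Dictionary words sharing at least one delete with the search term."""
    found = set()
    for d in deletes(term, max_edit_distance, prefix_length):
        found |= index.get(d, set())
    return found
```

For instance, with a dictionary of "harry", "hurry" and "zebra", the misspelling "hary" retrieves "harry" (and possibly "hurry") but never "zebra". Over-retrieval is expected at this stage and is filtered out by the edit-distance check.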
+
+The SymSpell algorithm requires the Damerau-Levenshtein edit distance,
+which counts insertions, deletions, substitutions, and transpositions of
+adjacent characters. While the original plan was to use the built-in
+Postgres implementation of the Levenshtein edit distance algorithm, which
+does not count transpositions, results of partner testing led us to replace
+the built-in option with an external Damerau-Levenshtein implementation.
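The practical difference between the two distances is the transposition case. The sketch below computes the restricted variant of Damerau-Levenshtein distance, known as optimal string alignment, in Python. It is an illustration only; this page does not say which variant Evergreen's external implementation computes. The extra transposition branch is what makes "hte" one edit away from "the" instead of two.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Transposition of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

The restricted and unrestricted variants disagree only in unusual cases. For example, `osa_distance("ca", "abc")` is 3, while unrestricted Damerau-Levenshtein gives 2 (ca, then ac, then abc).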
+
+A recommended set of values for the SymSpell settings is *6* for
+*symspell.prefix_length* and *3* for *symspell.max_edit_distance*.
+
+This set of values is known to provide a very good balance between
+accuracy and resource consumption based on empirical testing of the
+algorithm and analysis of English language texts. For further
+explanation of why these settings are recommended, please see
+https://medium.com/@wolfgarbe/1000x-faster-spelling-correction-algorithm-2012-8701fcd87a5f[this article] and the embedded links to benchmarks and later improvements.
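The resource side of that balance can be estimated by counting deletes. Under the symmetric-delete scheme, a word truncated to p characters with maximum edit distance k produces at most C(p, 0) + C(p, 1) + ... + C(p, k) distinct deletes, and fewer when letters repeat. The Python below does that arithmetic. It is an upper-bound estimate under that assumption, not a measurement of Evergreen's dictionary tables, and `max_deletes` is a hypothetical name.

```python
from math import comb


def max_deletes(word_length, max_edit_distance, prefix_length):
    """Upper bound on the distinct deletes stored for one dictionary word."""
    n = min(word_length, prefix_length)  # the word is truncated to the prefix first
    return sum(comb(n, i) for i in range(max_edit_distance + 1))
```

With the recommended values (prefix length 6, maximum edit distance 3), any word of six or more letters, such as the 13-letter "bibliographic", contributes at most 1 + 6 + 15 + 20 = 42 entries. Without the prefix cap, the same word could contribute up to 1 + 13 + 78 + 286 = 378.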
diff --git a/docs/modules/opac/assets/images/media/dym_kpac.png b/docs/modules/opac/assets/images/media/dym_kpac.png
new file mode 100644 (file)
index 0000000..e2b4dde
Binary files /dev/null and b/docs/modules/opac/assets/images/media/dym_kpac.png differ
diff --git a/docs/modules/opac/assets/images/media/dym_tpac.png b/docs/modules/opac/assets/images/media/dym_tpac.png
new file mode 100644 (file)
index 0000000..2db7275
Binary files /dev/null and b/docs/modules/opac/assets/images/media/dym_tpac.png differ
diff --git a/docs/modules/opac/assets/images/media/dym_tpac_nohits.png b/docs/modules/opac/assets/images/media/dym_tpac_nohits.png
new file mode 100644 (file)
index 0000000..be90f91
Binary files /dev/null and b/docs/modules/opac/assets/images/media/dym_tpac_nohits.png differ