Blog Home  Home Feed your aggregator (RSS 2.0)  
Venexus DotNetNuke Blog - MS SQL Server Full-Text Indexing Info Dump
DotNetNuke Articles, Code Snippets, Errors, and News
 
 Monday, June 19, 2006

The following is a collection of information I have collected, found, and written concerning MS SQL Server Full-Text Indexing. In order to keep it in a single place, I decided to dump it all here and update as needed...

SQL Server 2005 Full-text Indexing came with several new features:

  • Significant performance increases especially with indexing.
  • A dedicated indexing service that works directly with SQL Serrver. This speeds up full-text operations and isolates SQL Server from changes to the search service made by other applications. 
  • Secure by default. All iFilters (the component which extracts the text from the content stored in the rows) must be signed before SQL Server FTS will load them. 
  • The ability to full-text index Indexed Views which simplifies partitioning.
  • Data definition language (DDL) statements for creating and altering full-text catalogs and indexes.
  • Improved Language Support SQL 2005 FTS now supports indexing and searching in 23 different languages. SQL 2005 FTS will respect language tags stored in the content which the iFilter can interpret. You can also override the language settings on a column in your query.
  • Accent Insensitivity-SQL FTS can now be configured to be accent insensitive so searches on resume will match with résumé. This option is enabled by default, but can be disabled.
  • Noise Insensitivity-SQL FTS queries will no longer break when one of the search arguments (SARGs) is a noise word. This option is enabled by default, but can be disabled.
  • Thesaurus support in the Contains and FreeText predicates. 
  • Multi-Column Queries-SQL FTS supports searching a single full-text indexed column, all full-text indexed columns, or a sub-set of the full-text indexed columns in a single query. 
  • Support for linked servers-It is now possible to query full-text catalogs on remote server through a linked server.
  • Replication support If a table is full-text indexed you can replicate the full-text properties to your subscriber if the subscription database is full-text enabled.
  • Backup and restored support for full-text catalogs. You can now backup your full-text catalog and restore it on a different server.
  • Attach/detach full-text catalogs with your databases. You can include your full-text catalog in your detached database files and optionally reattach your full-text catalogs along with your database. 
  • Properties SQL FTS 2005 now supports the indexing and querying of document properties stored in the Image or VarBinary(MAX) data type columns. 
  • Full-text indexing for XML data.
  • Troubleshooting utilities two new utilities ship with SQL 2005 to assist in the troubleshooting efforts: 
    • lrtest - an executable which allows you to see how the word breakers interpret a token at query time and at index time
    • CiDump - a tool which allows you to view the contents of your catalog to determine what is indexed, and how tokens are stored in the index.

    You can find these file in C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn 

  • The indexing logs are now stored as plain text in C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Log
  • Integration with SQL Profiler and logging of index operations.

If you have never used Full-text indexing, it's pretty easy to setup...

Go into the database properties and enable full-text indexing:

Right click on the table and select Full-Text Index > Define Full-Text Index...

 

Wizard will popup, click Next:

 

Select the Primary Key:

 

Select the columns you want to index:

 

Start a full population when complete:

 

Since we already had a few other catalogs, we needed to create a new one. And, this new catalog will now become the default catalong by slecting the option.

 

We also wanted to create a recurring task to populate the catalog daily while we are testing:

 

Just click Finish!

Alternatively you can use the following queries to create a Full-Text Index:

EXEC sp_fulltext_database 'enable'

EXEC sp_fulltext_table 'BrainDump', 'create', 'BrainDumpCatalog', 'PK_BrainDump'

EXEC sp_fulltext_column 'BrainDump', 'IndexWashedContent', 'add'

EXEC sp_fulltext_table 'BrainDump', 'activate'

EXEC sp_fulltext_table 'BrainDump', 'start_full'

 

If you go back and right click on the table and Full-Text Index, you will now see an option to "Stop Population". This means that the catalog is being built. Depending on the size of your table, once it has completed a population of the catalog, you can start performing queries...

There are four additions to SQL Server's version of SQL that allow access to full-text catalogs:

  • CONTAINS is used in the WHERE clause of a query to find matches to exact words and phrases, with other options such as word proximity, weighted terms, and inflection of words.

  • CONTAINSTABLE has the same functionality as CONTAINS but is used in the FROM clause of a query and offers the added ability to incorporate relevancy to the results.

  • FREETEXT is used in the WHERE clause of a query to perform matches on the meaning of the words or phrase.

  • FREETEXTTABLE is similar to FREETEXT except that it's used in the FROM clause and can add relevancy to the results.

Example Queries:

SELECT IndexWashedContent FROM BrainDump WHERE CONTAINS (IndexWashedContent, ' "someword anotherword" ')

SELECT IndexWashedContent FROM BrainDump WHERE FREETEXT(IndexWashedContent, 'someword')

Search using NEAR finds words that are close together:

SELECT IndexWashedContent FROM BrainDump WHERE CONTAINS(IndexWashedContent, '"someword*" NEAR anotherword')


You can also supply a weighted list of terms to CONTAINS, and it will prefer matches with a higher weight:

SELECT IndexWashedContent FROM BrainDump WHERE CONTAINS(IndexWashedContent, 'ISABOUT (someword weight (.8), anotherword weight (.4), andanotherword weight (.2) )' )

Full-text Indexing Performance:

Microsoft recommends these two settings for optimum performance:

The virtual memory (PAGEFILE.SYS file) setting for your operating system should be set to an amount equal to 3 times the amount of physical RAM in the server. If you have a non-dedicated SQL Server (a server running applications in addition to SQL Server) then you will want to add the virtual memory needs of these other applications to the amount calculated above.
 

The SQL Server MAX SERVER MEMORY setting should be set manually (dynamic memory allocation is turned off) so that enough virtual memory is left for the Full-Text Search service to run. To achieve this, select a MAX SERVER MEMORY setting that once set, leaves enough virtual memory so that the Full-Text Search service is able to access an amount of virtual memory equal to 1.5 times the amount of physical RAM in the server. This will take some trial and error to achieve this setting.

To find out how much virtual memory is being used by SQL Server and the Full-Text Search Service, you can use the Task Manager. By default, the Task Manager does not display the amount of virtual memory used by a process. To see this number in Task Manager, you must first go to the "Processes" tab. Once there, select "View", and then "Select Columns". From the "Select Columns" dialog box, click on "Virtual Memory Size", then "OK". Now you will be able to see the amount of virtual memory size used by each process on your server using Task Manager. Use this information to help you tune your server for use with the Full-Text Search service.

More Info:
Extending 2005 FTS

Language Features

Using Full-Text Search Catalogs

Extensions to SQL Server to Support Full-Text Search

Have anything to add? Please let us know.

Monday, June 19, 2006 12:28:44 AM (US Eastern Standard Time, UTC-05:00)  #       |   | 
Copyright © 2010 Venexus, Inc.. All rights reserved.