Venexus DotNetNuke Blog - Differences Between DNN Search Engines
DotNetNuke Articles, Code Snippets, Errors, and News
 
 Sunday, April 08, 2007

 I have been asked to compare our search engine with Open-SearchEngine. This is an important question that deserves an answer, so I put together a comparison of the core DNN Search, Open-SearchEngine, and the Venexus Search Engine. While my opinion of which is best is definitely biased toward our own product, I have tried to provide an in-depth look at how each search engine works, a feature matrix, and a simple analysis of search results. Without further ado, read on...

DotNetNuke Search (core project)
DNN Search is part of the DNN core that is installed and configured out of the box.
 
DotNetNuke Search consists of 4 main pieces:
  • Scheduled Task

The scheduled task starts the module indexing process at the configured interval. It iterates over all modules that implement ISearchable; the text extracted from each module is cleaned, parsed, and added to the search word and search item tables.

  • Search Admin

The search admin lets you set the maximum and minimum word length, and choose whether to include common words and numbers.

  • Search Input Module

A module or skin object can be used to provide the search query form. In the module settings, you can use the default button or an image, but you cannot change that image or the button text within the module. Styles allow some look-and-feel changes, but only to a limited extent. When a search is performed, the user is redirected to the Search Results page.

  • Search Results Module
This module displays the search results. In the settings, you can set the maximum number of search results, results per page, maximum title length, and maximum description length, and choose whether to show descriptions. Only results that exactly match the queried word are returned.
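To make the table-driven approach concrete, here is a minimal sketch of a word table that applies filters like the Search Admin options and answers queries by exact word lookup. It is a hypothetical illustration, not DNN's code: `index_module_text`, `search`, `COMMON_WORDS`, and the default word lengths are assumed names and values, and an in-memory dictionary stands in for the search word and search item tables.

```python
from collections import defaultdict

# Hypothetical stand-in for DNN's common-words list.
COMMON_WORDS = {"the", "and", "of", "a", "to", "in"}

def index_module_text(item_id, text, index, min_len=3, max_len=20,
                      include_common=False, include_numbers=False):
    """Add each qualifying word of a module's text to a word -> item-ids
    table, applying filters like the Search Admin settings."""
    for raw in text.lower().split():
        word = raw.strip('.,;:!?()"')  # apostrophes are kept, e.g. "truman's"
        if not (min_len <= len(word) <= max_len):
            continue
        if not include_common and word in COMMON_WORDS:
            continue
        if not include_numbers and word.isdigit():
            continue
        index[word].add(item_id)

def search(index, query):
    """Exact-word lookup: only items containing the literal word match."""
    return sorted(index.get(query.lower(), set()))

index = defaultdict(set)
index_module_text(1, "The Truman Doctrine of 1947", index)
print(search(index, "truman"))  # [1]
print(search(index, "1947"))    # [] (numbers excluded by default)
print(search(index, "the"))     # [] ("the" is a common word)
```

Because every query is a direct key lookup, any word that was filtered out or stored in a different form at indexing time simply cannot be found.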
 
Oddly enough, there no longer appears to be a DNN forum for search, or a blog dedicated to it on the DotNetNuke website. However, a good place to find out more about the core module is ecktwo’s site. There is a lot of information about how all the pieces work together, as well as the bugs/issues of DotNetNuke Search. There is also a tutorial and report on DNN Search for DNN 4.
 
Open-SearchEngine
Open-SearchEngine is developed by Xepient Solutions. The package can index HTML content as well as PDFs and several Office document formats. Open-SearchEngine uses Lucene.Net, a port of the Java Lucene search engine library, for indexing and querying.
 
Open-SearchEngine consists of 4 main pieces:
  • Scheduled Task

The scheduled task starts the spidering process at the configured interval. Lucene.Net handles the indexing of the crawled data.

  • Search Engine Admin Module

This module provides an interface for configuring the search engine to your preferences. You can add a starting URL for each site, and spidering is enabled by default, which lets you include multiple sites in your search engine. However, unless spidering is disabled, every index update re-crawls all URLs. If the sites you index have many URLs, each crawling and indexing run can take a very long time to complete.

  • Search Input Module 

A module or skin object can be used to provide the form for the search query. In module settings, you can use the default button, or an image. You also have the option to add “Search” as text or image before the textbox.

  • Search Results Module
This module provides the search results. In the settings, you can set which sites are part of the results scope, maximum results per page, maximum title length, title link target, and the option to hide description.
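The spidering described above (start from a URL, follow links, re-crawl everything on each run) can be sketched as a breadth-first traversal. This is a hypothetical illustration, not Open-SearchEngine's code: `crawl` and the `fetch_links` callback are assumed names, the in-memory `site` dictionary stands in for real HTTP requests, and the handoff to Lucene.Net indexing is omitted.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl(start_url, fetch_links, max_pages=1000):
    """Breadth-first spider that follows links from start_url and stays on
    its host. Nothing is cached between runs, so every call fetches every
    reachable page again."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        crawled.append(url)  # a real spider would pass the page to the indexer here
        for href in fetch_links(url):  # hypothetical callback returning raw hrefs
            link = urljoin(url, href)
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled

# In-memory stand-in for a small site's link structure.
site = {
    "http://example.com/": ["/truman-doctrine", "http://other.org/"],
    "http://example.com/truman-doctrine": ["/"],
}
print(crawl("http://example.com/", lambda u: site.get(u, [])))
# ['http://example.com/', 'http://example.com/truman-doctrine']
```

The external link is skipped because each site needs its own starting URL, and calling `crawl` again repeats all of the work, which is why run times grow with the number of URLs.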
 
 
 
Venexus Search Engine
The Venexus Search Engine is quite different from the other 2 solutions. The package includes 2 modules and requires MS SQL Server Full-Text Indexing. Like traditional crawlers, VSE can crawl and index a variety of data, but the real difference is its ability to also “crawl” and index RSS feeds. This is the key to keeping search results up to date while conserving server and bandwidth resources. Rather than re-crawling and re-indexing all content, "smart caching" determines when RSS feeds need to be aggregated and when non-syndicated content on the site needs to be re-crawled.
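The post does not say how smart caching decides what is due. As a minimal sketch under assumed rules, each source could carry a last-fetched time, with feeds polled on a short interval and non-syndicated pages re-crawled on a long one. `Source`, `is_due`, and both intervals are hypothetical, not VSE's actual logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Source:
    url: str
    is_feed: bool
    last_fetched: Optional[datetime] = None

# Hypothetical refresh intervals; the real values are not given in the post.
FEED_INTERVAL = timedelta(hours=1)
PAGE_INTERVAL = timedelta(days=7)

def is_due(source, now):
    """Return True if the source should be fetched again. A feed lists recent
    items, so polling it often reveals new content cheaply; plain pages are
    re-crawled much less frequently."""
    if source.last_fetched is None:
        return True  # never fetched yet
    interval = FEED_INTERVAL if source.is_feed else PAGE_INTERVAL
    return now - source.last_fetched >= interval

now = datetime(2007, 4, 8, 18, 0)
feed = Source("http://example.com/rss.aspx", True, now - timedelta(hours=2))
page = Source("http://example.com/about", False, now - timedelta(days=2))
print(is_due(feed, now), is_due(page, now))  # True False
```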
  
The Venexus Search Engine consists of 2 main pieces:
  • Seamus Module

The Seamus module is the “search engine aggregation module utilizing syndication”. On the first load of the module, Seamus iterates through the core DNN modules on all portals that implement the IPortable interface, and uses this “initial dump” to gather other URLs for the site. You can also add feeds to Seamus, not only for your own site but for any external site. With “global crawler” enabled, any external URLs discovered during crawling are added to the queue as well. With each load of the module, Seamus uses AJAX to crawl 3 feeds and 3 URLs; if the user remains on the page, Seamus continues crawling via AJAX and saves the data to the table for indexing. This reduces server load by spreading crawling and indexing across many user sessions rather than a single scheduled task.

  • Search Module

The Search module provides both the search box and the results. Data crawled and stored by Seamus is indexed using Microsoft SQL Server Full-Text Indexing. In the settings you can specify the search button text or use your own custom image for the button, and set the maximum search length, search box size, maximum results, results per page, and maximum length of the displayed URL. You can also specify a remote connection string (a database other than the DNN one), restrict searches to a specific portal, or let users choose between searching the site and searching the whole web.
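The Seamus crawl loop can be modeled as two queues drained a few items at a time. In this hypothetical sketch, the batch size of 3 feeds and 3 URLs comes from the description above, while `CrawlQueue`, its methods, and the host-matching rule for the global crawler are assumptions; fetching pages and saving them for full-text indexing are left out.

```python
from collections import deque
from urllib.parse import urlparse

class CrawlQueue:
    """Hypothetical model of incremental crawling: each page load (one AJAX
    callback) takes a small batch, so the work is spread over many requests
    instead of one scheduled task."""

    def __init__(self, site_host, global_crawler=False, batch_size=3):
        self.site_host = site_host
        self.global_crawler = global_crawler
        self.batch_size = batch_size
        self.feeds = deque()
        self.urls = deque()
        self.seen = set()

    def enqueue(self, url, is_feed=False):
        # External URLs are only queued when the global crawler is enabled.
        if urlparse(url).netloc != self.site_host and not self.global_crawler:
            return
        if url in self.seen:
            return
        self.seen.add(url)
        (self.feeds if is_feed else self.urls).append(url)

    def on_page_load(self):
        """Take up to batch_size feeds and batch_size URLs off the queues."""
        feeds = [self.feeds.popleft() for _ in range(min(self.batch_size, len(self.feeds)))]
        urls = [self.urls.popleft() for _ in range(min(self.batch_size, len(self.urls)))]
        return feeds, urls

q = CrawlQueue("example.com")
for i in range(5):
    q.enqueue(f"http://example.com/page{i}")
q.enqueue("http://example.com/rss.aspx", is_feed=True)
q.enqueue("http://other.org/")  # ignored: global crawler is off
print(q.on_page_load())
# (['http://example.com/rss.aspx'], ['http://example.com/page0', 'http://example.com/page1', 'http://example.com/page2'])
print(q.on_page_load())
# ([], ['http://example.com/page3', 'http://example.com/page4'])
```

Keeping the batch small bounds the cost of any single request, at the price of crawl progress depending on how much traffic the page receives.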

Feature Comparison Matrix:

Below you will find a list of features for DNN Search, Open-SearchEngine, Venexus Search Engine Standard, and Venexus Search Engine PRO.

| Feature | DNN Search | Open-SearchEngine | Venexus Search Engine Standard | Venexus Search Engine PRO |
|---|---|---|---|---|
| Crawling Method | Module indexer (must implement ISearchable) | Custom URL crawler/spider (must have a starting URL for each site, with crawling enabled) | Custom crawler (uses IPortable interface, traditional URL crawler/spider, and RSS aggregation) | Custom crawler (uses IPortable interface, traditional URL crawler/spider, and RSS aggregation) |
| Crawl and Index Start | Requires DNN Scheduled Task | Requires DNN Scheduled Task | User interactive (AJAX in aggregation module) | User interactive (AJAX in aggregation module) |
| Global Crawler | No | No (requires input of each domain) | No | Yes |
| DNN User Impersonation | No | Yes | No | No (Version 2.0) |
| Windows Authentication | No | Yes | No | No (Version 2.0) |
| Exclude List | No | Yes | Yes | Yes |
| Excel Documents | No | Yes | No | Yes |
| PDF Files | No | Yes | No | Yes |
| PowerPoints | No | Yes | No | Yes |
| RTF Files | No | No | No | Yes |
| Word Docs | No | Yes | No | Yes |
| Index File System | No | Yes | No | No (Version 2.0) |
| Index | Table-driven index | Lucene.Net (flat file) | Full-Text Indexing in SQL Server (flat file) | Full-Text Indexing in SQL Server (flat file) |
| RSS | No | No | No | Yes |
| Enclosure Support (podcast/vodcast) | No | No | No | Yes |
| Feed Discovery | No | No | Yes | Yes |
| Smart Caching | No | No | Yes | Yes |
| Allow users to add feeds | No | No | No | Yes |
| Generates RSS Feed of latest items indexed | No | No | Yes | Yes |
| Blog and Feed Aggregator Pinging | No | No | No | Yes |
| Search Skin Object | Yes | Yes | Yes | Yes |
| Utilize DNN Search Skin | Yes | No | Yes | Yes |
| Modify search box and image | No | Yes | Yes | Yes |
| Use Image or Text for Search button | No | Yes | Yes | Yes |
| Portal (site) or Web search | No | No | Yes | Yes |
| Keyword Highlighting | No | Yes | Yes | Yes |
| Cached Version | No | No | No | No (Version 2.0) |
| User Saved Searches | No | No | No | No (Version 2.0) |
| Social Bookmarking | No | No | No | Yes |
| Price | Free | $49 | Free | $199 |

Performance and Relevancy:

What about performance and the relevancy of the results? I set up a test site with 5 pages of content and installed and configured DNN Search, Open-SearchEngine, and the Venexus Search Engine on separate pages. I also installed the PageGenerated module from Ventrian Systems to show page execution time. I can't vouch for its accuracy as a benchmark, but the results below are the best of 5 consecutive query executions against each search engine, using truman (without quotes) as the search query. Only 2 pages on the site are actually relevant to "truman": the home page contains the contextual link "Truman Doctrine", which leads to the full document about the "Truman Doctrine". Ideally, the document that is all about Truman and his doctrine should be listed first:

DNN Search:

Best Page Execution Time: 0.218531 seconds

Results Returned: 1

Notes:

The only result returned is not the most relevant page on the site. In fact, the "Truman Doctrine" page is not listed at all. This must be because the word "truman" does not actually appear in the content of the text/html module on the Truman Doctrine page. The content does contain "HARRY S. TRUMAN'S ADDRESS", but DNN Search only returns results where the query is spelled EXACTLY like a word in the content.
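The spelling problem comes down to tokenization. In the sketch below, a tokenizer that keeps apostrophes turns "TRUMAN'S" into `truman's`, so an exact lookup for `truman` fails, while a tokenizer that also strips a trailing possessive finds it. Both functions are hypothetical illustrations, not DNN's actual parser; full-text engines handle this with language-aware word breakers and stemmers, for which the regular expression is only a rough stand-in.

```python
import re

def exact_tokens(text):
    # Keeps apostrophes, so "TRUMAN'S" becomes "truman's", not "truman".
    return set(re.findall(r"[\w']+", text.lower()))

def normalized_tokens(text):
    # Hypothetical normalization: additionally drop a trailing possessive "'s".
    return {re.sub(r"'s$", "", t) for t in exact_tokens(text)}

content = "HARRY S. TRUMAN'S ADDRESS"
print("truman" in exact_tokens(content))       # False
print("truman" in normalized_tokens(content))  # True
```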

Open-SearchEngine:
 

Best Page Execution Time: 0.1093155 seconds

Results Returned: 10

Notes:

Notice the poor description, and that the truly most relevant document (the "Truman Doctrine" page) is only the 5th result. Several results are pages that have no information about "Truman" other than the link in the SolPartMenu. While it is good that the search engine can crawl the SolPartMenu, it is unfortunate that it ranks pages that merely have a menu link higher than the most relevant result. The best page execution time was half that of DNN Search, which is excellent.
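One common remedy for navigation-only matches outranking real content is to weight a match by where it occurs on the page. The sketch below is a hypothetical illustration: the field names, the weights, and the `score` function are assumptions, not how Open-SearchEngine or Lucene.Net actually score documents.

```python
# Hypothetical field weights: title and body text count far more than menu text.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0, "menu": 0.1}

def score(page, term):
    """Count the term in each field and multiply by that field's weight."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = page.get(field, "").lower().split()
        total += weight * words.count(term)
    return total

pages = {
    "Home": {"title": "home", "body": "welcome", "menu": "truman doctrine"},
    "Truman Doctrine": {"title": "truman doctrine", "body": "full text of the address"},
}
ranked = sorted(pages, key=lambda name: score(pages[name], "truman"), reverse=True)
print(ranked)  # ['Truman Doctrine', 'Home']
```

A menu link still counts, so such pages can appear as supplemental results, but they no longer outrank the page whose own content is about the query.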

Venexus Search Engine:

Best Page Execution Time: 0.046866 seconds

Results Returned: 3

Notes:

Notice that the first result is the actual document we are looking for (the "Truman Doctrine" page). Also, the page execution time is less than half that of Open-SearchEngine and less than a quarter of that of DNN Search.
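The best-of-5 measurement used in this test can be sketched as a small timing harness. This is an assumption-laden illustration: `best_of` and `fake_search` are hypothetical names, and the harness times only a query function, whereas the figures above come from whole-page execution times reported by the PageGenerated module.

```python
import time

def best_of(run_query, query, repeats=5):
    """Run the same query `repeats` times and return the fastest
    wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical stand-in for a search call; swap in a real query function.
def fake_search(query):
    return [page for page in ("Home", "Truman Doctrine") if query in page.lower()]

print(f"Best execution time: {best_of(fake_search, 'truman'):.6f} seconds")
```

Taking the minimum filters out one-off delays such as a cold cache on the first run, but it hides run-to-run variability; reporting the median alongside it is a common alternative.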

Conclusion:

The built-in DotNetNuke Search implemented by the DNN core team suits the needs of many smaller sites. However, larger sites will quickly run into memory consumption issues due to the way module indexing is performed. The search architecture is limited: the indexing process and the direct SQL queries against the tables that hold the words and index greatly affect both site performance and search results. Most likely this stems from the requirement for database independence rather than from poor design. If your site is small, needs database independence, and treats search as helpful but not an important part of the site, this may be the best tool for you.

If you are looking for a traditional search engine crawler with good scalability, database server independence, and decent search results, Open-SearchEngine may be the solution for you. It is by far better than the core DNN Search, but relies on traditional crawling and indexing methods. Whether it conserves bandwidth and server resources is debatable, since no smart caching is available. Its ability to index directories of files is an important feature that neither DNN Search nor VSE offers. However, the lack of RSS aggregation as a new medium for gathering new and updated data is a major issue that will lead to stale search results unless all URLs are frequently re-indexed. As the simple search results analysis showed, most results are not really relevant, but that is still better than DNN Search, which failed to return the truly relevant page because of a spelling difference. It just means your users will have plenty to click on before finding the document they are looking for. While its execution time is certainly better than DNN Search's, it is still significantly slower than the Venexus Search Engine's.

The Venexus Search Engine comes in 2 versions: Standard (free) and Pro (paid). The Standard version still offers many of the features smaller sites require, including quick and relevant results, but lacks some of the nicer features of the Pro version, such as PDF and MS Office document indexing and the blog and feed aggregator pinging service. Where VSE really shines is its ability to provide and aggregate RSS feeds for inclusion in its index. Smart caching and user-interactive crawling via AJAX distribute the load on the server and bandwidth. VSE's major advantage and its major disadvantage are the same thing: MS SQL Server Full-Text Indexing. The disadvantage is that VSE is NOT database independent and requires a version of MS SQL Server with Full-Text Indexing enabled in order to operate. The advantage is that Full-Text Indexing delivers more relevant and faster search results. We know VSE is scalable because it has been tested against a database of over 2 million indexed pages. The simple search results analysis shows it to be more than 4 times faster than DNN Search and more than 2 times faster than Open-SearchEngine. The actual search results speak for themselves, delivering the most relevant result as #1 and the contextual links from the home page as supplemental results.
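The speed-up figures follow from the best page execution times measured on the test site (DNN Search 0.218531 s, Open-SearchEngine 0.1093155 s, Venexus Search Engine 0.046866 s):

```latex
\frac{t_{\text{DNN}}}{t_{\text{VSE}}} = \frac{0.218531}{0.046866} \approx 4.66, \qquad
\frac{t_{\text{OSE}}}{t_{\text{VSE}}} = \frac{0.1093155}{0.046866} \approx 2.33, \qquad
\frac{t_{\text{DNN}}}{t_{\text{OSE}}} = \frac{0.218531}{0.1093155} \approx 2.00
```

Equivalently, VSE's best time is about 21% of DNN Search's and about 43% of Open-SearchEngine's; these ratios come from a single query on the 5-page test site.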

Picking the right search engine application is important for your website and now you should be armed with the knowledge of how each one operates, the differences in features between them, and the overall performance and relevancy of the search results.

I hope this answers everyone's questions about the differences between the 3 DotNetNuke search engines. Feel free to comment with questions or suggestions on how this post can be improved. If you know of a feature or difference that I missed, please let me know. While this post is quite lengthy, I plan to keep it updated as a resource for anyone who wants to keep track of the differences between the DNN search engines.

Sunday, April 08, 2007 6:37:38 PM (US Eastern Standard Time, UTC-05:00)
Copyright © 2010 Venexus, Inc. All rights reserved.