Blog Home  Home Feed your aggregator (RSS 2.0)  
Venexus DotNetNuke Blog - MS SQL Server
DotNetNuke Articles, Code Snippets, Errors, and News
 
 Saturday, May 26, 2007

We had a DNN 4.4.1 to 4.5.1 upgrade that threw a few SQLDataProvider errors during the installer that were caused by duplicate files in DNN Files table. Not sure how they got there, but here is a SQL statement to check if they exist:

SELECT PortalID, Filename, Folder, COUNT(Filename) AS NumOccurrences

FROM files

GROUP BY Filename, PortalID, Folder

HAVING ( COUNT(Filename) > 1 AND Count(PortalID) > 1 AND Count(Folder) > 1)

Here is the SQL that was provided in the log file from the installer:

/* add unique constraint to Files table */
IF NOT EXISTS (select * from dbo.sysobjects where id = object_id(N'dbo.[IX_FileName]') and OBJECTPROPERTY(id, N'IsConstraint') = 1)
BEGIN
  declare @FolderID int
  declare @FileName nvarchar(100)
  declare @FileID int
  declare @MinFileID int

  select @FolderID = min(FolderID)
  from Folders
  while @FolderID is not null
  begin 
    /* check for duplicate Filenames */
    select @FileName = null
    select @FileName = FileName
    from Files
    where FolderID = @FolderID
    group by FileName
    having COUNT(*) > 1
 
    /* if duplicates exist */
    if @FileName is not null
    begin
      /* iterate through the duplicates */
      select @FileID = min(FileID)
      from Files
      where FolderID = @FolderID
      and FileName = @FileName

      /* save min FileID */
      select @MinFileID = @FileID

      while @FileID is not null
      begin
        if @FileID <> @MinFileID
        begin
          /* remove duplicate file */
          delete
          from Files
          where FileID = @FileID
        end

        select @FileID = min(FileID)
        from Files
        where FolderID = @FolderID
        and FileName = @FileName
        and FileID > @FileID
      end
    end

    select @FolderID = min(FolderID)
    from Folders
    where FolderID > @FolderID
  end
  
  ALTER TABLE dbo.Files ADD CONSTRAINT
    IX_FileName UNIQUE NONCLUSTERED
    (
      FolderID,
      FileName
    ) ON [PRIMARY]
END

 

Saturday, May 26, 2007 12:48:57 PM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Sunday, April 08, 2007

 I have been asked to compare the differences between our search engine and Open-SearchEngine. I agree this is an important question that needs to be answered, so I decided to put together a comparison between the core DNN Search, Open-SearchEngine, and Venexus Search Engine. While my opinion of which is the best, is defintely biased toward our own product, I have tried to provide an in-depth look at the basics of how each search engine works, a feature matrix, and simple search results analysis. Without further ado, read on...

DotNetNuke Search (core project)
DNN Search is part of the DNN core that is installed and configured out of the box.
 
DotNetNuke Search consists of 4 main pieces:
  • Scheduled Task

The scheduled task initiates the process of indexing the modules, at the scheduled time interval. An iteration of all modules that support iSearchable is performed. During this process, text that is extracted from the module is cleaned, parsed, and added to search word and search items tables.

  • Search Admin

                    The search admin is for setting the maximum word length, minimum word length, option to include common words, and the option to include numbers. 

  • Search Input Module

A module or skin object can be used to provide the form for the search query. In module settings, you can use the default button, or an image. You do not have the option to change this image within the module, nor change the text. Styles can be used to make some look and feel changes, but it is limited. When a search is performed, the user is redirected to the Search Results page.

  • Search Results Module
This module provides the search results. In the settings, you can set the maximum search results, results per page, maximum title length, maximum description length, and the option to show description. Results are limited to the exact word queried.
 
Oddly enough, there no longer appears to be a DNN forum for search, or a blog dedicated to it on the DotNetNuke website. However, a good place to find out more about the core module is ecktwo’s site. There is a lot of information about how all the pieces work together, as well as the bugs/issues of DotNetNuke Search. There is also a tutorial and report on DNN Search for DNN 4.
 
Open-SearchEngine
Open-SearchEngine is developed by Xepient Solutions. The package is capable of indexing HTML content as well as PDF’s and several Office documents. Open-SearchEngine uses Lucene.Net, a port of the Java Lucene Search Engine, for indexing and querying.
 
Open-Search Engine consists of 4 main pieces:
  • Scheduled Task

Test The scheduled task initiates the process of spidering, at the scheduled time interval. Lucene.Net handles indexing of the data.

  • Search Engine Admin Module

This module provides an interface for configuring the search engine to your preferences. You can add a starting URL and by default, spidering is enabled. This allows you to offer multiple sites in your search engine. However, unless disabled, each time you run the process to update the index, all URLs are re-crawled. With many URLs on the site(s) you index, it can lead to a very long time between the completion of crawling and indexing runs.

  • Search Input Module 

A module or skin object can be used to provide the form for the search query. In module settings, you can use the default button, or an image. You also have the option to add “Search” as text or image before the textbox.

  • Search Results Module
This module provides the search results. In the settings, you can set which sites are part of the results scope, maximum results per page, maximum title length, title link target, and the option to hide description.
 
 
 
Venexus Search Engine
The Venexus Search Engine is quite different than the other 2 solutions. The package includes 2 modules and requires MS SQL Server Full-Text Indexing. Like traditional crawlers, VSE can crawl and index a variety of data, but where the real difference is seen is in it's ability to also “crawl” and index RSS feeds. This is the key to keeping the search results up-to-date, while conserving server and bandwidth resources. Rather than recrawling and reindexing all content, "smart caching" is used to determine when RSS feeds need to be aggregated, and when non-syndicated content needs to be recrawled on the site.
  
The Venexus Search Engine consists of 2 main pieces:
  • Seamus Module

The Seamus module is the “search engine aggregation module utilizing syndication”. On the first load of the module, Seamus iterates through the core DNN modules on all portals that support the iPortable interface. Seamus uses this “initial dump” to gather other URLs for the site. You also have the ability to add feeds to Seamus, not only for your site, but any external site. With “global crawler” enabled, any external site URLs that are discovered during crawling, are added to the queue as well.  Using AJAX, Seamus performs crawling of 3 feeds and 3 URLs with each load. If the user remains on the page, using AJAX, Seamus will continue to crawl and save the data to the table for indexing.  This decreases the load on the server by spreading the crawling and indexing across several user sessions, rather than a single scheduled task.

  • Search Module

The Search module provides the search box, as well as the results. Using Microsoft SQL Server’s feature of Full-Text Indexing, data is indexed from the crawling and storing provided by Seamus. Within the settings you can specify the search button text or use you own custom image for the button, set maximum search length, set search bx size, maximum results, results per page, set maximum length of display URL, specify remote connection string (database other than DNN), specify portal specific search, or allow user to select between site or all of the web search.

Feature Comparison Matrix:

Below you will find a list of features for DNN Search, Open-SearchEngine, Venexus Search Engine Standard, and Venexus Search Engine PRO.

Feature
DNN Search
Open-SearchEngine
Venexus Search Engine Standard
Venexus Search Engine PRO
Crawling Method
Module Indexer (Must implement iSearchable)
Custom URL crawler/spider (Must have starting URL for each site, with crawling enabled)
Custom Crawler
(Uses iPortable interface, traditional URL crawler/spider, and RSS aggregation)
Custom Crawler
(Uses iPortable interface, traditional URL crawler/spider, and RSS aggregation)
Crawl and Index Start
Requires DNN Scheduled Task
Requires DNN Scheduled Task
User Interactive (AJAX in aggregation module)
User Interactive (AJAX in aggregation module)
Global Crawler
No
No (Requires input of each domain)
No
Yes
DNN User Impersonation
No
Yes
No
No (Version 2.0)
Windows Authentication
No
Yes
No
No (Version 2.0)
Exclude List
No
Yes
Yes
Yes
Excel Documents
No
Yes
No
Yes
PDF Files
No
Yes
No
Yes
PowerPoints
No
Yes
No
Yes
RTF Files
No
No
No
Yes
Word Docs
No
Yes
No
Yes
Index File System
No
Yes
No
No (Version 2.0)
Index
Table Driven Index
Lucene.Net (flat file)
Full-Text Indexing in SQL Server (flat file)
Full-Text Indexing in SQL Server (flat file)
RSS
No
No
No
Yes
Enclosure Support (podcast/vodcast)
No
No
No
Yes
Feed Discovery
No
No
Yes
Yes
Smart Caching
No
No
Yes
Yes
Allow users to add feeds
No
No
No
Yes
Generates RSS Feed of latest items indexed
No
No
Yes
Yes
Blog and Feed Aggregator Pinging
No
No
No
Yes
Search Skin Object
Yes
Yes
Yes
Yes
Utilize DNN Search Skin
Yes
No
Yes
Yes
Modify search box and image
No
Yes
Yes
Yes
Use Image or Text for Search button
No
Yes
Yes
Yes
Portal(site) or Web search
No
No
Yes
Yes
Keyword Highlighting
No
Yes
Yes
Yes
Cached Version
No
No
No
No (Version 2.0)
User Saved Searches
No
No
No
No (Version 2.0)
Social Bookmarking
No
No
No
Yes
Price
Free
$49
Free
$199

Performance and Relevancy:

What about performance and the relevancy of the results? I setup a test site with 5 total pages of content and installed/configured DNN Search, Open-SearchEngine, and Venexus Search Engine on separate pages. I also installed PageGenerated module from Ventrian Systems to show page execution time. I am not sure of any accuracy for a benchmark here, but the following results are the best of 5 consecutive query executions against each search engine using "truman" without quotes as the search query. In reality, there are only 2 relevant pages associated with "truman". There is a link from the home page of the site with the text "Truman Doctrine" as a contextual link that directs the user to the full document about the "Truman Doctrine". Ideally, we should expect the document that is all about "truman" and his doctrine to be listed first:

DNN Search:

Best Execution Time: 0.218531 seconds

Results Returned: 1

Notes:

The only result returned is not the most relevant page on the site. In fact, the "Truman Doctrine" page is not even listed as a result. This must be because the word "truman" does not actually appear in the content of the text/html module on the Truman Doctrine page. There is "HARRY S. TRUMAN'S ADDRESS" in the content, but DNN Search can only return results where the query is spelled EXACTLY like something in the content.

Open-SearchEngine:
 

Best page Execution: 0.1093155 seconds

Returned Results: 10

Notes:

Notice the poor description and the fact that the true most relevant document (the "Truman Doctrine" page) is the 5th result. Also, there are several results of pages that have no information about "Truman" except for the link in the SolPartMenu. While it is good that the search engine is able to crawl the solpartmenu, it is unfortunate that the search engine weights pages that just have links in a menu higher than the most relevant result. The best page execution time was half that of DNN Search, which is excellent.

Venexus Search Engine:

Best Page Execution: 0.046866 seconds

Results Returned: 3

Notes:

Notice the first result is the actual document (the "Truman Doctrine" page)  we are looking for. Also, page execution time is less than half the time than Open-SearchEngine and a quarter of the time compared to DNN Search.

Conclusion:

The implementation provided by the DNN core team for the built-in DotNetNuke Search suits the needs for many smaller sites. However, larger sites will quickly run into issues with memory consumption due to the way the module indexing is performed. The search architecture is limited and greatly impacts the performance of the site and the search results due to the indexing process and the direct SQL table queries that holds the words and index. Most likely this is due to the requirement for database independence, rather than poor design. If your site is small, needs database independence, and search results are helpful, but not really an important piece of your site, then this may be the best tool for you.

If you are looking for a traditional search engine crawler, with good scalability, and you require database server independence, and decent search results, Open-SearchEngine may be the solution for you. It is by far better than the core DNN Search, but relies on tradional crawling and indexing methods. Conservation of bandwidth and server resources are debatable since there is no method of smart caching available. The ability for this engine to index direcories of files is an important feature than neither DNN Search, nor VSE offer. However, the lack of RSS aggregation as the new medium for crawling and gathering new and updated data is a huge issue that will lead to stagnant search results without frequently reindexing all URLs.  As evident from the simple search results analysis performed, most results are not really relevant, but it is better than not returning any true relevant results like DNN Search due to spelling differences. It just means your users will have plenty to click on before finding the correct document they are looking for. While execution time is certainly better than DNN Search, it is still significantly slower than the Venexus Search Engine execution time.

The Venexus Search Engine offers 2 versions, the standard (free version), and the Pro (not free version).  The standard version still offers many of the features smaller sites require, including quick and relevant results, but does not include some of the nicer features of the Pro version like PDF and MS Office document indexing and blog and feed aggregation pinging service. Where VSE really shines is in its ability to provide and aggregate RSS feeds for inclusion in its index. The smart caching and user interactive crawling using AJAX distributes the load on the server and bandwidth. The major advantage and disadvantage of VSE is MS SQL Server Full-Text indexing. The disadvantage is that VSE is NOT database independant and requires Full-Text indexing enabled versions of MS SQL Server in order to operate. The advantage is that it uses Full-Text Indexing from MS SQL Server for more relevant and faster search results. We know VSE is scalable because it has been tested against a database of over 2 million indexed pages. The simple search results analysis shows that it is 4 times faster than DNN Search and 2 times faster than Open-SearchEngine. The actual search results speak for themselves, delivering the most relevant result as #1 and contextual links from the home page as supplemental results.

Picking the right search engine application is important for your website and now you should be armed with the knowledge of how each one operates, the differences in features between them, and the overall performance and relevancy of the search results.

I hope this answers everyone's questions concerning the differences between the 3 DotNetNuke Search Engines. Feel free to comment with questions or suggestion on how this post can be improved. If you know of a feature or difference that I missed, please let me know. While this post is quite lengthy, I plan on keeping it updated as a resource for those who would like to keep track of the differences between each DNN search engine.

Sunday, April 08, 2007 6:37:38 PM (US Eastern Standard Time, UTC-05:00)  #       |  |  |  |   | 
 Thursday, March 22, 2007

There is nothing better to start off the day than having a client running into a 100% CPU utilization issue on their production SQL Server. Every few minutes, the server would spike up and hang there for a variable amount of time (15 seconds to several minutes). You can only imagine the flakiness of a website with SQL Server choking to death. There was nothing of value to point any fingers as to the culprit of this issue in the event logs for DNN (Admin > LogViewer)...none that we saw through a brief spot checking and filtering of event types (this was incredibly slow and seeing timeouts so we abandoned all hope of using DNN Admin/Host tools to find the problem).  And, there was not an alarming number of events actually logged in EventLog table. However, we have seen issues with performance that are usually resolved by clearing the Log Viewer. We have seen cases where clients who have high traffic/usage sites, or a broken/problematic module on all pages, have 5 and 6 figure rows of data for EventLog table, especially if all the default settings are used for the DNN Log Viewer settings. We have seen timeout issues just trying to clear the event log when they get that large ("Delete EventLog" as the sql statement does the trick quickly from SSMS). So, we went ahead and cleared it, but the issue persisted.

For those who have not explored much in SQL Server Management Studio (not in SSMS Express), there is now a Database Engine Tuning  Advisor and SQL Server Profiler (under Tools > SQL Server Profiler). Running the SSP, we performed a trace and caught the offending SQL causing all of the havoc. Just a note...we have run DETA to find recommendations from trace files for several large DNN databases and apply the recommendations (it usually creates new indexes for tables that have 6 and 7 figure rows, helping greatly with performance on databases). But in this case, we just started and stopped the trace in SSP before and after a huge and hanging spike. Going through the rows looking for CPU hits, we found the following 2 villians of resources:

GetSchedule @Server='SERVERNAME'

GetScheduleNextTask @Server='SERVERNAME'

Running these statements showed the huge spike on command, pegging the server hard. Looking in the stored proc it hits Schedule and ScheduleHistory.

ALTER PROCEDURE [dbo].[GetSchedule]

@Server varchar(150)

AS

SELECT S.ScheduleID, S.TypeFullName, S.TimeLapse, S.TimeLapseMeasurement, S.RetryTimeLapse, S.RetryTimeLapseMeasurement, S.ObjectDependencies, S.AttachToEvent, S.RetainHistoryNum, S.CatchUpEnabled, S.Enabled, SH.NextStart, S.Servers

FROM Schedule S

LEFT JOIN ScheduleHistory SH

ON S.ScheduleID = SH.ScheduleID

WHERE (SH.ScheduleHistoryID = (SELECT TOP 1 S1.ScheduleHistoryID FROM ScheduleHistory S1 WHERE S1.ScheduleID = S.ScheduleID ORDER BY S1.NextStart DESC)

OR SH.ScheduleHistoryID IS NULL)

AND (@Server IS NULL or S.Servers LIKE ',%' + @Server + '%,' or S.Servers IS NULL)

GROUP BY S.ScheduleID, S.TypeFullName, S.TimeLapse, S.TimeLapseMeasurement, S.RetryTimeLapse, S.RetryTimeLapseMeasurement, S.ObjectDependencies, S.AttachToEvent, S.RetainHistoryNum, S.CatchUpEnabled, S.Enabled, SH.NextStart, S.Servers

In ScheduleHistory we found a little over 6 thousand rows. You can use the following to check your db:

select count(*) from schedulehistory

6000+ does not seem like that many rows to be causing that much of a peak, but regardless we deleted them all getting desperate at this point:

delete schedulehistory

Executing the 2 sprocs again for the schedule, and cpu barely gets over 3% utilization. The site is again fast and responsive and I was able to get in and check settings without getting timeouts. So, as an interim fix I lowered the defaults in  DotNetNuke.Services.Scheduling.PurgeScheduleHistory under Host > Schedule.

I am concerned about why 6000 rows of data would be taking such a hit on cpu resources. However, that it more records than I believe should be there, so lowering the defaults will help. Nothing in the DNN stored procedure for GetSchedule really stands out at me as being problematic, nor at first glance do I see anything that could be changed that may help, but I will ponder on this some more in my copious spare time.

So, if you are having trouble with SQL Server performance and DNN, check and make sure you keep your EventLog and ScheduleHistory purged.

If you need help, be sure to checkout our DNN Support Packages.

Thursday, March 22, 2007 1:12:18 PM (US Eastern Standard Time, UTC-05:00)  #       |  |   | 
 Tuesday, March 06, 2007

We released the latest version of our search engine module last week. It has all of the features I mentioned in my previous post, plus the ability to add excluded URLs and partial URLs.

Here are the new specs:

Items marked with * are new

Features
Standard Version
Pro Version
Seamus Features    
Maximum # of Pages
500
Unlimited
Install on commercial site
No
Yes
Scheduled Index Updates
Yes
Yes
Announcements Module Support
Yes
Yes
Contacts Module Support
Yes
Yes
Events Module Support
Yes
Yes
FAQ Module Support
Yes
Yes
Links Module Support
Yes
Yes
Text/HTML Module Support
Yes
Yes
Index MS Excel Documents *
No
Yes
Index MS PowerPoint Documents *
No
Yes
Index MS Word Documents *
No
Yes
Index PDF Documents *
No
Yes
Index Rich Text Files *
No
Yes
Global Crawler *
No
Yes
Allows users to add feeds
No
Yes
Custom User Agent
No
Yes
Obeys Robots.txt
Yes
Yes
TTL Support
Yes
Yes
Feed and Queue Aggregation Using AJAX
Yes
Yes
Display Top X Latest Items
Yes
Yes
XSLT Support
Yes
Yes
Latest Items RSS Feed Generation
Yes
Yes
Portal Specific Feed
Yes
Yes
Enclosure/Podcast Support
No
Yes
Pinging Service
No
Yes
Exclude URLs *
Yes
Yes
     
Search Features    
Search Skin Object
Yes
Yes
Use Image or Text for Search button
Yes
Yes
and - (AND and OR) Support
Yes
Yes
Quoted Search Support
Yes
Yes
Keyword Highlighting
Yes
Yes
Obeys DNN Security
Yes
Yes
Social Bookmarking Support *
No
Yes
     
Support    
Issue Tracker
Yes
Yes
Email
No
Yes
Phone
No
1 Call
  Price  
Free
 
$199 Per Year
BuyNowButton.gif
 

You can download the free version here.

Tuesday, March 06, 2007 11:25:17 AM (US Eastern Standard Time, UTC-05:00)  #       |  |  |  |   | 
 Monday, February 26, 2007

The Venexus Search Engine is a DotNetNuke search module, plus a whole lot more. Not only does our DNN search module index your portal, but also external sites. VSE crawls pages on your site, aggregates RSS feeds from other sites, and crawls any links for external websites, making it a full search engine module.  Unlike the core DNN Search module that uses a scheduled task to perform index updates, VSE crawls and indexes content based on user request. Seamus can be configured for several different setups and displays, including the ability to hide the module on every page. When a page is loaded that has the Seamus module on it, Seamus will go out and grab 3 RSS feeds and 3 queued URLs and add any new or updated content to the index. In order to not delay the page loading for the end user, Seamus utilizes AJAX to make aggregation request, providing a seamless integration into your site.

 

Here are the pro features of VSE:

 

1.1 Pro Features

  • Allow users to add their feeds

You can enable users to add feeds to the system.

 

  • Podcast Support

Indexed items that have files associated with them are used as enclosures (aka podcast) in the feeds that Seamus generates.

 

  • Pinging service

When the pinging service is enabled, every time something new is added/updated on your site Seamus will “ping” several XML-RPC  web services for blog and feed aggregation sites to notify them your portal has new content. The aggregators will then come to your site and aggregate your feed and provide links to their users to your site.

 

Here are a few services we ping:

 

  • Custom User-Agent

You can set your own user-agent to specify your own crawler name. The default user-agent is “Seamus/1.1 PRO ( http://search.venexus.com)”.

 

  • Global Crawler

The pro version allows you to be a global crawler. Any links found on your site, from aggregated news feeds, or from external links are crawled and indexed.

 

So, not only are you able to aggregate even more content with the Pro version when compared to the Standard version, but you also get the search engine optimization benefits of pinging all of the major blog and feed aggregation services. This provides you with links directly to your site, generating more web traffic. You can watch your page rank grow very quickly with this feature.

 

Since the release of the 1.1 version, we have steadily been working on the 1.2 version. We are now testing the latest version on our demo site: search.venexus.com.

 

1.2 Pro Features (March 1, 2007 Release)

  • New file formats indexed

You asked for it, so we added support for all of the most common Office document file types as well as PDF documents. We have added a new document-to-text converter to our crawler that is able to parse the actual text from these documents. So not only does Seamus crawl and index HTML, Text, and XML files, but also the following new formats:

1.     Excel files

2.     PDF files

3.     PowerPoint files

4.     Rich text files

5.     Word documents

 

  • Social Bookmarking Support

In the search results you can enable social bookmarking to allow users to easily add bookmarks to their favorite social bookmarking application/service. This allows user to easily find their favorite links to your site. Also, the sites that provide this service will generate a link to your site, giving you more traffic once again.

 

Here is an example of what it looks like:

Here are the supported sites:

 

1.     Digg

2.     del.icio.us

3.     FURL

4.     Reddit

5.     Yahoo

6.     Blinklist

7.     Google

8.     ma.gnolia

9.     Shadows

10.  Technorati

 

Ready for a real search engine for your site? Buy the Pro version here.

 

Stay tuned for more…

Monday, February 26, 2007 5:04:36 AM (US Eastern Standard Time, UTC-05:00)  #       |  |  |  |   | 
 Wednesday, February 21, 2007

We released the Pro version of our DNN search engine module today.

Here is the breakdown of the feature comparison:

Venexus Search Engine Version Matrix

Features
Standard Version
Pro Version
Seamus Features
 
 
Maximum # of Pages
500
Unlimited
Install on commercial site
No
Yes
Scheduled Index Updates
Yes
Yes
Announcements Module Support
Yes
Yes
Contacts Module Support
Yes
Yes
Events Module Support
Yes
Yes
FAQ Module Support
Yes
Yes
Links Module Support
Yes
Yes
Text/HTML Module Support
Yes
Yes
Allows users to add feeds
No
Yes
Custom User Agent
No
Yes
Obeys Robots.txt
Yes
Yes
TTL Support
Yes
Yes
Feed Aggregation Using AJAX
Yes
Yes
Display Top X Latest Items
Yes
Yes
XSLT Support
Yes
Yes
Latest Items RSS Feed Generation
Yes
Yes
Portal Specific Feed
Yes
Yes
Enclosure/Podcast Support
No
Yes
Pinging Service
No
Yes
 
 
 
Search Features
 
 
Search Skin Object
Yes
Yes
Use Image or Text for Search button
Yes
Yes
+ and - (AND and OR) Support
Yes
Yes
Quoted Search Support
Yes
Yes
Keyword Highlighting
Yes
Yes
Obeys DNN Security
Yes
Yes
 
 
 
Support
 
 
Issue Tracker
Yes
Yes
Email
No
Yes
Phone
No
1 Call
 
 
 
Price
Free
$199 Per Year
BuyNowButton.gif

I will be discussing the features of the Pro version in a later post. Stay tuned...

Wednesday, February 21, 2007 2:52:51 PM (US Eastern Standard Time, UTC-05:00)  #       |  |  |  |   | 
 Monday, February 19, 2007

We have released the new version of the Venexus Search EngineVSE Standard Version 1.1.0 has several bug fixes and shows some of the new features of the Pro version.

New standard features and bug fixes:

  • VenexusSeamus - Changed TransformXSL to not create a temporary XML file
  • VenexusSeamus - Modified Response.Charset
  • VenexusSeamus - New Delete Tabs routine for removing deleted and expired tabs
  • VenexusSeamus - Ability to reload default XSLT file
  • VenexusSeamus - Shows total number of aggregated items
  • VenexusSeamus - Gridview pagination
  • VenexusSeamus - Link from Grid to show aggregation errors
  • VenexusSeamus - Guid attribute added
  • VenexusSeamus - application/rss+xml support
  • VenexusSeamus - Automatic creation of fulltext index during installation (works for SQL Server Express too!)
  • VenexusSearch - Support for DNN 4.4.1 and "search" URL parameter
  • VenexusSearch - Non-authenticated postback issue resolved
  • VenexusSearch - Limits URL length for display
  • VenexusSearch - Quoted query support

If you have any issues with installation, configuration, or bugs, pleas post them in our issue tracker.

Monday, February 19, 2007 5:17:05 PM (US Eastern Standard Time, UTC-05:00)  #       |  |  |  |   | 
 Wednesday, February 14, 2007

Here is a video tutorial on setting up SQL Server 2005 Express and Full-Text Indexing. It breaks down the steps for installation of SQL Server Express with Advanced Services. This is a great video that shows alot more than just setting up full-text indexing. It also shows some basic queries.

Key points of interest during installation is when you get to the Registration Information screen, uncheck "Hide advanced configuration options" before clicking Next. Then in the next screen, expand Database Services and select the option to add "entire feature will be installed on local hard drive" for Full-Text Search. After a few more steps, you must uncheck User Instances Enabled. For those who already have Full-Text Search installed, but did not uncheck that option, you can use the following SQL:

sp_configure 'user instances enabled', '0'

If you are using SQL Server Express Management Studio Express, you can go into the database properties and under files, make sure enable full-text indexing is checked. Or, run the following SQL:

sp_fulltext_database 'enable'

Now for creating the catalog and index. The example below is for our search engine module:

Create fulltext catalog VenexusSearchCatalog

Create Unique Index PKVenexusSearchEngine On Venexus_BrainDump(IndexID)

Create fulltext index On Venexus_BrainDump (IndexURL, IndexTitle, IndexWashedContent)
Key Index PKVenexusSearchEngine On VenexusSearchCatalog
With Change_Tracking Auto


 

Wednesday, February 14, 2007 2:25:21 AM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Wednesday, January 24, 2007

The first beta testers of the Venexus Search Engine were the guys from True Lawyers. They created a new portal in their DNN installation for Search.TrueLawyers.com. Search.TrueLawyers.com is a legal search engine. Their instance of VSE has aggregated over 216,000 legal articles, news, and related site pages as of this morning. You can test this site and see that the speed of VSE is still great when considering the amount of data it has already indexed. When using the site, each time a page is loaded that has Seamus on it (and Seamus CAN be hidden on the page), 5 feeds are aggregated. Any new items in the feeds are added to the index immediately. While the user sits on the page, AJAX is used to pull more feeds. Plus, since the site is new and does not have much traffic...yet, they use a RSS reader to call the Seamus RSS feed, which grabs more data every 10 to 15 minutes. As you can imagine, their index is growing FAST! You can see the latest items Seamus has aggregated by visiting the True Lawyers Legal News Room.

So, not only does VSE work as a site search engine and multiple portal search engine, it also works as a full blown search engine, aggregating items from your DNN installation, as well as other sites that provides RSS feeds. One of the features we are working on for the 1.1 Pro version is the ability to index any website, regardless of having a RSS feed. You can now have the ability to build powerful niche websites that provide your users with lots of relevant information.  Plus, with the RSS feed Seamus generates, you can set it to display items for only your website, allowing you to submit the link to many feed directories, providing search engine optimization. The 1.1 Pro version will ping many blog directories, greatly increasing traffic to your website, treating your entire website like a blog. And we all know that the other search engines are just eating up blog content, increasing the page rank of those sites over many traditional websites without feeds. Ready to try it out? You can download the release candidate here.

Stay tuned for more...

UPDATE 2/7/2007:

I just checked the total items indexed for this site again and it is now showing over 246,0000 items. So in 2 weeks, an extra 30k + items were indexed.

Wednesday, January 24, 2007 9:44:34 AM (US Eastern Standard Time, UTC-05:00)  #       |  |  |  |   | 
 Sunday, January 21, 2007

I know many of you have been patiently waiting for the release of the Venexus Search Engine. We have had several beta testers try out previous release candidates, and have several new tweaks in this release.

Seamus Additions:

  • Web.configless (No changes to web.config needed. Beta Testers should remove EntitySpaces web.config entries when installing this version)
  • Object Qualifier support (Thank you Barry White for testing this)
  • Index current tab (Seamus will index the tab it is on. You can add make Seamus invisible on the page by showing 0 items in the feed and unchecking Show Feed. Add to all pages on the site and Seamus will index and update the index when the page is updated)
  • Edit Feed Display (Only show feeds selected in Edit Feeds section. By default, all are show. Selecting feeds will filter news display of only items indexed from the feeds list)

Search Additions:

  • Web.configless
  • Object Qualifier support
  • "query" URL parameter (You can now use your existing default DNN search results page. Simply drop the module on the search results page and remove the default search results module. Utilize the DNN Search textbox in your skin with the power of Full-Text Indexing).
  • Search form skin object support (see instructions)
  • Form post fix (A fix was added that allows you to simply hit the enter key after adding your query, rather than forcing you to click on the button)
  • Allow user selected web or site search (allow your users to select whether their search is against the current portal or for all search results in the database)
  • URL Trim (Used to trim the URL display in the search results. Long URLs would stretch out the skin)
  • Search Query (Saves user queries and the number of "hits" for that query. This will be used in the pro version for "Top Searches" and "Latest Searches".)

Here is some information about our DotNetNuke search engine module. You can test it on our site here. We also have the latest version loaded on our DNN search site for finding DotNetNuke related pages and sites.

As for the official release, we are waiting on FlatBurger to fix an issue with their code protection that causes the module to generate an error after activation. We have been told that this may be fixed by Friday...we will see. In the meantime, please send us your thoughts on this release candidate. If you find any bugs, please post them to our issue tracker. If you have any suggestions for new features, please post them in the issue tracker or in the support forms.

REQUIREMENTS FOR VENEXUS SEARCH ENGINE

  • DotNetNuke 4.3.5 or Higher (Yes it works with the DNN 4.4 release) 
  • SQL Server supporting Full-Text Indexing
  • .Net full trust for EntitySpaces and Reflection usage

Now for the files....

Before installing this, you MUST read the instructions. You CANNOT just install both modules and expect it to work. You MUST configure fulltext indexing manually to get this to work. You will find instructions on performing this action in the Search Instructions and Configuration.

You can download both modules here. The file is also attached as an enclosure.

Please post your links here in a comment to show everyone how you are using the Venexus Search Engine.

UPDATED: Link to module downloads has been updated.

Sunday, January 21, 2007 5:40:30 PM (US Eastern Standard Time, UTC-05:00)  #       |  |  |  |   |  VenexusSearchEngine-v1.1.0-standard.zip (284.32 KB)
 Sunday, December 03, 2006

In case you were wondering from my last post, here is how to get a list of all modules by Portal:

SELECT DISTINCT ModuleDefinitions.ModuleDefID, ModuleDefinitions.FriendlyName, Modules.PortalID
FROM         ModuleDefinitions CROSS JOIN
                      Modules LEFT OUTER JOIN
                      Modules AS Modules_1 ON ModuleDefinitions.ModuleDefID = Modules.ModuleDefID
WHERE     (Modules.ModuleDefID IS NOT NULL) ORDER BY Modules.PortalID

Here is how to specify a specific portal in the installation:

SELECT DISTINCT ModuleDefinitions.ModuleDefID, ModuleDefinitions.FriendlyName, Modules.PortalID
FROM         ModuleDefinitions CROSS JOIN
                      Modules LEFT OUTER JOIN
                      Modules AS Modules_1 ON ModuleDefinitions.ModuleDefID = Modules.ModuleDefID
WHERE     (Modules.ModuleDefID IS NOT NULL) AND (Modules.PortalID = 0)

Sunday, December 03, 2006 1:00:06 AM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Saturday, December 02, 2006

We needed to check which modules were NOT being used on a DNN site. Why? It's a multi-portal DNN 3.1 website with quite a few 3rd party modules on it and we wanted to know which modules were not in use so we could remove them from the DNN installation before performing an upgrade. No need to add extra compilications or search for module updates for modules that were not being used. And if they are not being used, why leave them on there adding to the bloat? Anyway, here is the SQL:

SELECT     ModuleDefinitions.ModuleDefID, ModuleDefinitions.FriendlyName, ModuleDefinitions.DesktopModuleID
FROM         ModuleDefinitions LEFT OUTER JOIN
                      Modules ON ModuleDefinitions.ModuleDefID = Modules.ModuleDefID
WHERE     (Modules.ModuleDefID IS NULL)

Saturday, December 02, 2006 11:34:25 PM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Saturday, November 25, 2006
I know it has been over a week since the last post. Sorry to leave you hanging, but sometimes there are just not enough hours in a day. Anyway, without further ado, here is part two…SEAMUS.

At some point earlier this year, DNN Find became a different mission. We decided to build a full blown search engine for DotNetNuke. Not one that would just index a single DNN site, but one that would allow you to index all portals in a DNN installation AND information from external sites. And how would external site indexing best be handled? …via RSS feed aggregation of course.  

Seamus is the first of the two modules that make up the Venexus Search Engine. SEAMUS = Search Engine Aggregation Module Utilizing Syndication. On a side note, there is also an obscure Pink Floyd song that not many know from the Meddle album, about an old hound dog by the same name. Our hound dog “fetches” data and stores it to a table that has enabled MS SQL Server full-text indexing. But before I go into the specifics, I think it is important to know about the framework.

We started with traditional DotNetNuke module development…until EntitySpaces was released. I’m an old ASP/VB developer and personally, it took me a bit to get my head wrapped around how ES worked, but once I figured it out, I was hooked. ES saves the day by automagically generating all the CRUD (create, read, update, delete). While very similar to the logic of a BusinessController and InfoObject, ES uses Collections and Entities. But, where I found ES the most useful is the Dynamic Queries you can write directly into the business logic.

For example, in Seamus we need to check the domain to see if it matches one we are already indexing:

            Dim colDomains As New VenexusDomainCollection
            colDomains.Query.Select(colDomains.Query.DomainName, colDomains.Query.DomainID)
            colDomains.Query.Where(colDomains.Query.DomainName.Equal(GetDomainName(sURL)))
            colDomains.Query.Load()
            If colDomains.Count > 0 Then
                     ‘a bunch of removed logic goes here..
            End IF

With the colDomains.Query.Select, we are only returning the data we need rather than all columns. With the colDomains.Query.Where, I eliminated the need to:

  1. Write a stored proc just to retrieve by DomainName
  2. Iterate through the entire table, every row of all domains, just to find the one I am looking for.

I won’t even go into the performance gain of not having to loop through those rows of all columns, nor the time (even though it would be simple) to write a stored proc to pass in DomainName and have it return the DomainID.

Here is an example of adding a record to Seamus for a new feed:

         Dim entFeed As New VenexusSeamus
         entFeed.AddNew()
         entFeed.Url = txtURL.Text
         entFeed.Title = txtTitle.Text
         entFeed.Account = txtAccount.Text
         entFeed.Password = txtPassword.Text
         entFeed.CacheTime = txtCacheTime.Text
         entFeed.FeedTimeOut = txtTimeOut.Text
         entFeed.DateAdded = Now()
         entFeed.DateUpdated = "1/1/1901"
           If chkActive.Checked = True Then
              entFeed.IsActive = True
            Else
                entFeed.IsActive = False
            End If
         entFeed.Save()

Easy enough, eh?

And here is an update of a feed for Seamus:

    Dim entFeed As New VenexusSeamus
    entFeed.LoadByPrimaryKey(hidRSSID.Value)
    entFeed.Url = txtURL.Text
    entFeed.Title = txtTitle.Text
    entFeed.Account = txtAccount.Text
    entFeed .Password = txtPassword.Text
    entFeed.CacheTime = txtCacheTime.Text
    entFeed.FeedTimeOut = txtTimeOut.Text
    entFeed.DateAdded = Now()
    entFeed.DateUpdated = "1/1/1901"
          If chkActive.Checked = True Then
              entFeed.IsActive = True
          Else
                entFeed.IsActive = False
          End If
    entFeed .Save()

And a delete example:

    Dim entFeed As New VenexusSeamus
    entFeed.LoadByPrimaryKey(hidRSSID.Value)
    entFeed.MarkAsDeleted()
    entFeed.Save()

Yeah, it’s that easy. Makes you want to fire up your IDE eh?

Sure, I have used DAL Builder Pro, which was a huge time saver, but EntitySpaces made me to never want to develop any other way. Plus, last I checked, DAL Builder Pro was still only for DNN 3 development. The ease of generating the DAL and the ability to easily REgenerate the DAL if the database schema changes, makes ES the tool of choice for all of our module development. I cannot even begin to count the hours I have previously spent hand coding changes in a DAL due to spec changes. Oh how I wish I had all those hours back!

With the new DNN admin grid templates, it is just ridiculous how much code is generated before having to write the first line. The new template will generate an editable grid of the table(s), with sorting, paging, and search. If you are interested in .Net development (this is not just a DNN tool, it works for all .Net 2.0 development and using C# or VB.Net), you must check it out.

NOTE: Just so you know, we do not have any affiliation or partnership with EntitySpaces, we just think their tool rocks.

So, even though we had much of the initial Seamus development completed, we scrapped it and started development with ES. This will make future modifications and additions so much easier, saving time in the long run.

With that said, here is how Seamus works…

After you install Seamus, you can go into the module settings:

So in this example, the display for Seamus should show the top 10 items last indexed, each with a link to the actual item in the Title and using the “…More” link. A feed icon will also be displayed that provides a link to a RSS feed for the top 10 items.

Here is an example of the display:

Now while the above example does not show any local items (tabs or modules from this site), it does have items indexed from other sites. All of these items were from RSS feeds that were aggregated. As a module editor, you have the ability to manage external feeds (or local feeds if so desired, but we will go into more detail about how Seamus works shortly). But, if there were local items visible, they would only be visible if you have the proper permissions. Seamus checks permissions on any local site at the module and tab level for the display and the RSS feed. 

Here is an example of the feeds we are are currently indexing on the Venexus Search Engine:

Here is the interface for adding new feeds:

Now we will get into how Seamus works…

First off, on the first load of Seamus, a dump of data from all modules supporting the IPortable interface (currently limited to DNN Core modules) is performed to ensure that there is data in the index. And every X hours (determined in module settings), the index is checked for new, updated, and deleted pages/modules.

Secondly, any feeds that have been added to Seamus are aggregated 5 at a time, order by last updated.  And, while the user is sitting on the page, every 30 seconds that pass, 5 more feeds are aggregated via AJAX. This user interactive aggregating decreases the load on the server, rather than running as a scheduled task like the core DNN Search.

In order to save bandwidth, and to not tick off the owners of the websites you are aggregating data from, Seamus has what I call “smart caching”.  Each time a feed is requested, if the information in the feed as not been updated, Seamus will increase the cache time. If the feed has been updated, it will request the same feed sooner than it had previously, decreasing the cache time. Over time, and based on the “average” a feed it updated, Seamus learns when to check again for updates, all while obeying TTLs.

Seamus will also index the current page/tab it is sitting on. Now you may be asking why you would index a page that displays items that have already been indexed. Well, Seamus can be setup to not display the top X items and/nor the RSS feed. Here is an example:

With the above Seamus settings and the module settings to display on all pages and set to not display the container or using an “invisible” container, when a user lands on any page of the site, the page is indexed. You can index your entire site by letting the users "crawl" the website. Also, when the page is updated, the index will be updated. Here is the module settings example:

So, not only does Seamus index all portals in the DNN installation by doing a dump of all modules that support the IPortable interface and individual page indexing based on user interaction, it will also aggregate and index data from other sites. This gives you the ability to create a full blown search engine for your niche. For example, let say you have a website about racing. You could have your entire DNN site indexed, along with aggregation of more racing data from the following sites:

http://www.sportsline.com/partners/feeds/rss/auto_news

http://rss.news.yahoo.com/imgrss/events/sp/042103autoformula

http://rss.cnn.com/rss/si_motorsports.rss

Not only are you able to display a list of the last items indexed in order to keep a page from becoming stagnant, you can also provide a RSS feed for your users, giving them a reason to return to your site. I will save a Seamus and SEO discussion for another time, but here is an example site for a legal search engine.

Speaking of time, I am once again out of it. Part III will be a discussion of the second module, the search form module. Stay tuned...

 

Saturday, November 25, 2006 5:04:44 AM (US Eastern Standard Time, UTC-05:00)  #       |  |  |  |   | 
 Wednesday, November 15, 2006

I know a lot of people have been waiting on this and it is literally been over 5 years in the making, but it is now time to tell the story of how the Venexus Search Engine came to be…

 

Bots, Crawlers, and Spiders, Oh My!

Once upon a time, long, long ago, well over 5 years ago anyway, but that’s like ancient history in terms of the web, I wrote a little script to rip down free fonts off of a font directory website, who shall remain nameless since they are still around today. FontGrabber.vbs crawled their entire website saving zip files of free font packages. If I remember correctly, it pulled down almost 5000 font packages in a few hours.  What a time saver! And my crawler addiction began to set in…

 

MediaGrabber

The next crawler I wrote extracted data from an online database of live music recordings. I dumped about 10 to 12 thousand records into a custom media database. My crawling habit had now increased to an hour or 2 a week perfecting the use of HTTPGets using XMLHTTP and making modification to scrape other data from the site based on URL parameters.

 

Many variations of MediaGrabber were developed over the years for aggregating data. Some of the variations include:

  • PhotoGrabber - For consuming one of the stock photography buffett sites. An interesting note, the one we crawled, which will also remain nameless, started limiting the number of photo request per day the following month. I wonder if that had anything to do with what we were doing...hehe.
  • FDAUpdater - For pulling down pharmaceutical data from the FDA to be used on a pharmacy website. Enough said about that one.
  • CategoryDump - For pulling category names from Yahoo and DMOZ.
  • And others...

 

Madhatter

Madhatter was my first bot. It was a VBScript that sat in a Direct Connect P2P Server application. Madhatter started as a trigger bot.  A user would type a message into the chat and if it contained keywords or phrases that matched a list of keywords and response(s), the bot would automatically reply with a random response from the list that was associated with that keyword. Over time, I added around 1000 different responses to about 400 keywords. Madhatter then received search capabilities. You could type +search <band> or +search <date> and it would return a top 100 list of media records from a database of about 20000 records that matched with a link pointing them to the website with the information. I then gave the ability for the Operators to allow Madhatter speak on their behalf. So in addition to Madhatter automatically responding, the operator would make new responses to the user messages via Madhatter. This worked so well, and I guess to some degree could be considered my first AI application, that many DC newbies really thought it was a live person responding to their messages, even when Madhatter was running solo. I even setup the bot so that if a user tried to send Madhatter a private message chat, it would display in the Operators chat. This led to untold hours of entertainment watching people talk to a rude, trash talking bot that would kick them off the hub if they responded in a derogatory manner. Just thinking about it again makes me want to write a DNN Bot, maybe not one as feisty as Madhatter. Or maybe “bot” interactive search anyone?

 

 

Tiny IntRAnet Crawler

I started working for Semiconductor Research Corporation in August 2001 as their Web Administrator/Developer. At that time they only had a website and a forums website. The forums website was using a product called SiteScope which was written in TCL, but we will not even go there in fear of recurring nightmares .The SRC main site was not built using a Content Management System, rather a Staging to Dev push of content. I think it was sometime in early 2002, I began writing my first true crawler that would consume all items in a domain. 

 

The need was simple…with the amount of content we had on the site, there was bound to be broken links, missing images, orphaned files, and God forbid, 500 server errors. We needed something that would crawl the site and search for any issues, compare the file system, and generate a report for the Content Management Team. I was still using XMLHTTP component for grabbing the data until I found ASPTear. ASPTear proved to be faster and was the HTTP component of choice until I found NSoftware. NSoft’s HTTP component was far superior to any of the others for speed and with many more methods/objects that could be utilized.

 

SRC had a pretty big main site and we began developing 2 other websites to fall under the SRC umbrella. This lead to TIC 2.0, which crawled all 3 domains, and would (and probably still does) generate a report of any issues. With TIC now crawling more than one website and doing it dynamically (could jump from one domain and then the other with the FIFO [First In First Out] URL queue/stack), the need came to check the first link offsite. Why? In case the link moved (301 or 302), or was generating a 404. We have no control what some site may do to their content, but we sure wanted to know if our users were going to get an error if it was broken. TIC would find those problem links and let the CM Team know they needed to remove the link, or change the URL to the new redirect. Now comes TIC 3.0...

 

Tiny IntERnet Crawler

One night I was goofing around with TIC and decided to turn off the function that performs the domain or first link offsite check and just let it run…and run. And it did, all night long. When I got up the next morning, it had crawled almost 30,000 pages and had built a queue of over 100,000. Now I was hooked. How could I get more data and faster? Since TIC was a script and utilized a central database for the URL queue, instead of an in memory stack, I was able run multiple instances of the crawler. 10 instances of TIC 3.0 crawling brought my little home router to its knees. In fact, it choked and rolled over tits up. In three hours, over 110,000 pages were crawled, over 500,000 URLs queued, and had sucked down over a gig of data. Whoa…this was getting fun.

 

Over the next year or so I really was tweaking TIC quite a bit. I’d let it run for weeks at a time. I quickly realized I was going to run into a big problem…Disk space. The database was getting bloated and slowing down dramatically after it had indexed over 1 million pages and had over 5 more million queued. While those numbers are a drop in the bucket when compared to the 800 pound gorillas of search, it is still a lot of data for such a small operation. And, TIC would crawl anything, all file types.  So I started curbing back what TIC looked for…all the way down to just XML. TIC, as the last version in use, now looks just for XML files anywhere on the Internet. Of course I added tweaks to check domain importance or linking page importance based on keywords and altered the queueing process so that TIC would not get stuck on a crappy domain. But that is a discussion for another time.

 

Tiny XML Spider

So with TIC crawling the web looking for XML files, TXS was developed to crawl and index the XML files TIC found. TXS runs continuously, iterating through all “approved” RSS feeds (about 2,500 of over 100,000). For each feed it parses through the articles and stores anything new to the database. If the feed has been updated, TXS will return in less time. Feeds that have not been updated will be crawled the next time after a longer duration. I call this “smart caching”, which will be discussed in the features of Seamus later on. TXS has aggregated over 1.7 million articles from only 2500 news feeds. Not bad considering how much other data we have to collect from feeds that have not been approved. We have been stuffing the aggregated data into a combination of DNN websites for SEO reasons.

 

DNNFind

DNNFind = DotNetNuke Fulltext INDexing. At some point about 2 years ago, and with TXS bringing in the data, we decided to build a DNN module that would perform a SQL Server fulltext index query against the aggregated data and return the results. While this is not a bot, crawler, or spider, it is a fundamental step of searching the data, which we will get into when discussing the search module of VSE.

 

DNN Spider

I started developing a standalone VB.Net application for crawling DotNetNuke websites. This was my first multi-threaded application. While similar to TIC, this application would allow 1 to many threads to be used to handle the crawling. What we found is that we can use the application for stress testing DotNetNuke websites by throwing a few hundred or thousand request at it. And, we can use multiple applications running on different servers to really pound away at a box. However, this got me thinking about distributing the load of crawling against the users of the website, which is why we are using AJAX to request more data from Seamus. More on that later on as well.

 

Okay, so you made it this far and you are probably asking why I have not even started to describe what the Venexus Search Engine does. Well, I think it is important to understand the background of the application and how it came to be. It’s not like we just came up with some flimsy half-brain ideas about how a search engine should be done, but rather years of trial and error. And, I want everyone to realize that our product is not going to disappear, but get stronger as we add more functionality from all of the code we have written over the years. With that said, here are the details...

 

Sorry, I am out of time and you will have to wait for Part II of this post.

 

In the meantime, if you want to see Venexus Search Engine in action, go to search.venexus.com. To read more about VSE, go here.

REQUIREMENTS FOR VENEXUS SEARCH ENGINE

  • DotNetNuke 4.3.5
  • SQL Server supporting Full-Text Indexing
  • .Net full trust for EntitySpaces and Reflection usage

If you would like to test our release candidate, please reply in a comment to this post and I will send you the PA's.

 

Wednesday, November 15, 2006 11:12:18 AM (US Eastern Standard Time, UTC-05:00)  #       |  |  |  |   | 
 Sunday, November 05, 2006

Did you know that SQL Server Express now has Full-Text Indexing? With the soon to be release of our DotNetNuke search module, which requires SQL Server Full-Text Indexing, I thought it would be helpful to make a post here for those who did not know the differences between SQL Server Express versions.

SQL Server Express Edition Comparison

You have several specific products to choose from when you install SQL Server Express Edition Use the following table to see how features for SQL Server compare across other Express Edition products.

Express Edition Products for SQL Server Compared

Feature SQL Server 2005 Express Edition SQL Server 2005 Express Edition with Advanced Services SQL Server 2005 Express Edition Toolkit
Database Engine
*
*
 
Client Components
*
*
Full Text Search
 
*
 
Reporting Services
 
*
 
Management Studio Express
 
*
*
Business Intelligence Developer Studio
 
 
*

Each SQL Server Express Edition product has a specific use. Read the following sections to learn how each Express Edition product for SQL Server compares to the others.

SQL Server 2005 Express Edition

How does the Express Edition of SQL Server compare to other SQL Server Express Edition products? SQL Server Express Edition is perfect for use as an embedded database for a desktop application that requires a fully functional SQL Server Database Engine. SQL Server Express offers the smallest package size for faster downloads or to conserve space on deployment media.

SQL Server 2005 Express Edition with Advanced Services

How does the Express Edition with Advanced Services for SQL Server compare to the other SQL Server Express Edition products? SQL Server 2005 Express Edition with Advanced Services is perfect for use as a backend to a small, multiuser application that requires more advanced features such as Web reporting or Full-text Search.

SQL Server 2005 Express Edition Toolkit

How does the Express Edition Toolkit for SQL Server compare to other SQL Server Express Edition products? Install this package if you need the management tools and client components, but do not need the Database Engine.

Source: MSDN

Ready to download SQL Server 2005 Express with Advanced Services (has Full-Text Index)? Its free and here.

Sunday, November 05, 2006 7:31:30 PM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Thursday, October 19, 2006

I was updating a DNN site today and at the same time was migrating the SQL Server 2000 database to SQL Server 2005. I decided to use the Copy Database Wizard since I had never tried it and it worked great. However, the logins did not get updated properly. I created the login in the SQL Server 2005 security, but could not access the database via the old login. I tried doing a generic detach > attach with the same issue. Trying to edit the SQL Server Account through SQL Server Studio Management Studio would generate an error of "Login must be specified", yet it would not give me the ability to update (all grayed out). After doing some digging, I found the following stored procedure that did the trick:

EXEC sp_change_users_login 'Auto_Fix', 'USERNAME', NULL, 'PASSWORD'

After running the above, I was able to use the old login to access SQL Server. Now back to the grind...

Thursday, October 19, 2006 11:18:48 PM (US Eastern Standard Time, UTC-05:00)  #       | 
 Monday, June 19, 2006

The following is a collection of information I have collected, found, and written concerning MS SQL Server Full-Text Indexing. In order to keep it in a single place, I decided to dump it all here and update as needed...

SQL Server 2005 Full-text Indexing came with several new features:

  • Significant performance increases especially with indexing.
  • A dedicated indexing service that works directly with SQL Serrver. This speeds up full-text operations and isolates SQL Server from changes to the search service made by other applications. 
  • Secure by default. All iFilters (the component which extracts the text from the content stored in the rows) must be signed before SQL Server FTS will load them. 
  • The ability to full-text index Indexed Views which simplifies partitioning.
  • Data definition language (DDL) statements for creating and altering full-text catalogs and indexes.
  • Improved Language Support SQL 2005 FTS now supports indexing and searching in 23 different languages. SQL 2005 FTS will respect language tags stored in the content which the iFilter can interpret. You can also override the language settings on a column in your query.
  • Accent Insensitivity-SQL FTS can now be configured to be accent insensitive so searches on resume will match with résumé. This option is enabled by default, but can be disabled.
  • Noise Insensitivity-SQL FTS queries will no longer break when one of the search arguments (SARGs) is a noise word. This option is enabled by default, but can be disabled.
  • Thesaurus support in the Contains and FreeText predicates. 
  • Multi-Column Queries-SQL FTS supports searching a single full-text indexed column, all full-text indexed columns, or a sub-set of the full-text indexed columns in a single query. 
  • Support for linked servers-It is now possible to query full-text catalogs on remote server through a linked server.
  • Replication support If a table is full-text indexed you can replicate the full-text properties to your subscriber if the subscription database is full-text enabled.
  • Backup and restored support for full-text catalogs. You can now backup your full-text catalog and restore it on a different server.
  • Attach/detach full-text catalogs with your databases. You can include your full-text catalog in your detached database files and optionally reattach your full-text catalogs along with your database. 
  • Properties SQL FTS 2005 now supports the indexing and querying of document properties stored in the Image or VarBinary(MAX) data type columns. 
  • Full-text indexing for XML data.
  • Troubleshooting utilities two new utilities ship with SQL 2005 to assist in the troubleshooting efforts: 
    • lrtest - an executable which allows you to see how the word breakers interpret a token at query time and at index time
    • CiDump - a tool which allows you to view the contents of your catalog to determine what is indexed, and how tokens are stored in the index.

    You can find these file in C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn 

  • The indexing logs are now stored as plain text in C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Log
  • Integration with SQL Profiler and logging of index operations.

If you have never used Full-text indexing, it's pretty easy to setup...

Go into the database properties and enable full-text indexing:

Right click on the table and select Full-Text Index > Define Full-Text Index...

 

Wizard will popup, click Next:

 

Select the Primary Key:

 

Select the columns you want to index:

 

Start a full population when complete:

 

Since we already had a few other catalogs, we needed to create a new one. And, this new catalog will now become the default catalong by slecting the option.

 

We also wanted to create a recurring task to populate the catalog daily while we are testing:

 

Just click Finish!

Alternatively you can use the following queries to create a Full-Text Index:

EXEC sp_fulltext_database 'enable'

EXEC sp_fulltext_table 'BrainDump', 'create', 'BrainDumpCatalog', 'PK_BrainDump'

EXEC sp_fulltext_column 'BrainDump', 'IndexWashedContent', 'add'

EXEC sp_fulltext_table 'BrainDump', 'activate'

EXEC sp_fulltext_table 'BrainDump', 'start_full'

 

If you go back and right click on the table and Full-Text Index, you will now see an option to "Stop Population". This means that the catalog is being built. Depending on the size of your table, once it has completed a population of the catalog, you can start performing queries...

There are four additions to SQL Server's version of SQL that allow access to full-text catalogs:

  • CONTAINS is used in the WHERE clause of a query to find matches to exact words and phrases, with other options such as word proximity, weighted terms, and inflection of words.

  • CONTAINSTABLE has the same functionality as CONTAINS but is used in the FROM clause of a query and offers the added ability to incorporate relevancy to the results.

  • FREETEXT is used in the WHERE clause of a query to perform matches on the meaning of the words or phrase.

  • FREETEXTTABLE is similar to FREETEXT except that it's used in the FROM clause and can add relevancy to the results.

Example Queries:

SELECT IndexWashedContent FROM BrainDump WHERE CONTAINS (IndexWashedContent, ' "someword anotherword" ')

SELECT IndexWashedContent FROM BrainDump WHERE FREETEXT(IndexWashedContent, 'someword')

Search using NEAR finds words that are close together:

SELECT IndexWashedContent FROM BrainDump WHERE CONTAINS(IndexWashedContent, '"someword*" NEAR anotherword')


You can also supply a weighted list of terms to CONTAINS, and it will prefer matches with a higher weight:

SELECT IndexWashedContent FROM BrainDump WHERE CONTAINS(IndexWashedContent, 'ISABOUT (someword weight (.8), anotherword weight (.4), andanotherword weight (.2) )' )

Full-text Indexing Performance:

Microsoft recommends these two settings for optimum performance:

The virtual memory (PAGEFILE.SYS file) setting for your operating system should be set to an amount equal to 3 times the amount of physical RAM in the server. If you have a non-dedicated SQL Server (a server running applications in addition to SQL Server) then you will want to add the virtual memory needs of these other applications to the amount calculated above.
 

The SQL Server MAX SERVER MEMORY setting should be set manually (dynamic memory allocation is turned off) so that enough virtual memory is left for the Full-Text Search service to run. To achieve this, select a MAX SERVER MEMORY setting that once set, leaves enough virtual memory so that the Full-Text Search service is able to access an amount of virtual memory equal to 1.5 times the amount of physical RAM in the server. This will take some trial and error to achieve this setting.

To find out how much virtual memory is being used by SQL Server and the Full-Text Search Service, you can use the Task Manager. By default, the Task Manager does not display the amount of virtual memory used by a process. To see this number in Task Manager, you must first go to the "Processes" tab. Once there, select "View", and then "Select Columns". From the "Select Columns" dialog box, click on "Virtual Memory Size", then "OK". Now you will be able to see the amount of virtual memory size used by each process on your server using Task Manager. Use this information to help you tune your server for use with the Full-Text Search service.

More Info:
Extending 2005 FTS

Language Features

Using Full-Text Search Catalogs

Extensions to SQL Server to Support Full-Text Search

Have anything to add? Please let us know.

Monday, June 19, 2006 12:28:44 AM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Monday, May 15, 2006

I always forget the syntax for resetting identity columns. For example, recently on a development DNN installation, we needed to reset all identity columns back to 0 after data testing was complete. So, as a reminder to myself, here's the SQL:

DBCC CHECKIDENT (TableName, RESEED, 0)

If for some reason you need to temporarily allow inserts into the Identity column, then you can use the following:

Set Identity_Insert TableName On

Once you have executed your statements, you can turn it back off:

Set Identity_Insert TableName Off

And for the differences of Scope_Identity and @@Identity: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_sa-ses_6n8p.asp

Monday, May 15, 2006 9:51:36 AM (US Eastern Standard Time, UTC-05:00)  #       | 
 Friday, April 28, 2006

Below you will find a few generic SQL queries for generating reports on DNN Users.

Last Activity:
SELECT     Users.Username, Users.FirstName, Users.LastName, aspnet_Users.LastActivityDate
FROM         aspnet_Users INNER JOIN
                      Users ON aspnet_Users.UserName = Users.Username
order by aspnet_Users.LastActivityDate DESC

Last Login:
SELECT   Users.Username, Users.FirstName, Users.LastName, aspnet_Membership.LastLoginDate
FROM         Users INNER JOIN
                      aspnet_Users ON Users.Username = aspnet_Users.UserName INNER JOIN
                      aspnet_Membership ON aspnet_Users.UserId = aspnet_Membership.UserId
ORDER BY aspnet_Membership.LastLoginDate DESC

New Users:
SELECT   Users.Username, Users.FirstName, Users.LastName, aspnet_Membership.CreateDate
FROM         Users INNER JOIN
                      aspnet_Users ON Users.Username = aspnet_Users.UserName INNER JOIN
                      aspnet_Membership ON aspnet_Users.UserId = aspnet_Membership.UserId
ORDER BY aspnet_Membership.CreateDate DESC

Unverified Users:
select USers.FirstName, Users.LastName, aspnet_Membership.Email, aspnet_Users.UserName from aspnet_Membership, aspnet_Users, Users where aspnet_Membership.IsApproved = 0 AND aspnet_Membership.UserID = aspnet_Users.UserId AND aspnet_Users.UserName = Users.UserName

Users in Role:
SELECT Roles.RoleName AS [Role Name], COUNT(UserRoles.RoleID) AS [Number in Role] FROM Roles INNER JOIN UserRoles ON UserRoles.RoleID = Roles.RoleID GROUP BY UserRoles.RoleID, Roles.RoleName ORDER BY RoleName

Friday, April 28, 2006 3:38:37 PM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Saturday, April 15, 2006

We needed a Proper Case Function that would change the first letter of each word to an uppercase letter.

Example:

some text = Some Text

Here is the function we added:

------------------------------------------------------

CREATE FUNCTION dbo.Proper (@tcString VARCHAR(100))

RETURNS VARCHAR(100) AS

BEGIN

-- Scratch variables used for processing

DECLARE @outputString VARCHAR(100)

DECLARE @stringLength INT

DECLARE @loopCounter INT

DECLARE @charAtPos VARCHAR(1)

DECLARE @wordStart INT

-- If the incoming string is NULL, return an error

IF (@tcString IS NULL)

RETURN ('(no string passed)')

-- Initialize the scratch variables

SET @outputString = ''

SET @stringLength = LEN (@tcString)

SET @loopCounter = 1

SET @wordStart = 1

-- Loop over the string

WHILE (@loopCounter <= @stringLength)

BEGIN

-- Get the single character off the string

SET @charAtPos = SUBSTRING (@tcString, @loopCounter, 1)

-- If we are the start of a word, uppercase the character

-- and reset the work indicator

IF (@wordStart = 1)

BEGIN

SET @charAtPos = UPPER (@charAtPos)

SET @wordStart = 0

END

-- If we encounter a white space, indicate that we

-- are about to start a word

IF (@charAtPos = ' ')

SET @wordStart = 1

-- Form the output string

SET @outputString = @outputString + @charAtPos

SET @loopCounter = @loopCounter + 1

END

-- Return the final output

RETURN (@outputString)

END

------------------------------------------------------

I am not sure where I found this function to give it the proper credit, but it's been around for a while. I had used it on a SQL Server 2000 database to change all uppercase US State names to lowercase, then running them through the Proper function. Recently I had to search several databases to find the function so I could use it on a SQL Server 2005 database. So, in case I need it again, I decided to save it here....

Saturday, April 15, 2006 9:59:31 AM (US Eastern Standard Time, UTC-05:00)  #       | 
 Saturday, February 25, 2006

For a project we are working on, we have a backend process that needed to get the PostalCode from ASPNET_Profile for a UserName. Thanks to David Silverlights post on DotNetSlackers.com we added the following functions to our database:

 --!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CREATE FUNCTION dbo.fn_GetElement
(
@ord AS INT,
@str AS VARCHAR(8000),
@delim AS VARCHAR(1) )
RETURNS INT
AS
BEGIN
  -- If input is invalid, return null.
  IF  @str IS NULL
      OR LEN(@str) = 0
      OR @ord IS NULL
      OR @ord < 1
      -- @ord > [is the] expression that calculates the number of elements.
      OR @ord > LEN(@str) - LEN(REPLACE(@str, @delim, '')) + 1
    RETURN NULL
  DECLARE @pos AS INT, @curord AS INT
  SELECT @pos = 1, @curord = 1
  -- Find next element's start position and increment index.
  WHILE @curord < @ord
    SELECT
      @pos    = CHARINDEX(@delim, @str, @pos) + 1,
      @curord = @curord + 1
  RETURN    CAST(SUBSTRING(@str, @pos, CHARINDEX(@delim, @str + @delim, @pos) - @pos) AS INT)
END

 --!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CREATE FUNCTION dbo.fn_GetProfileElement
(
@fieldName AS NVARCHAR(100),
@fields AS NVARCHAR(4000),
@values AS NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS
BEGIN

  -- If input is invalid, return null.
  IF  @fieldName IS NULL
      OR LEN(@fieldName) = 0
      OR @fields IS NULL
      OR LEN(@fields) = 0
      OR @values IS NULL
      OR LEN(@values) = 0
    RETURN NULL

-- locate FieldName in Fields
DECLARE @fieldNameToken AS NVARCHAR(20)
DECLARE @fieldNameStart AS INTEGER, @valueStart AS INTEGER, @valueLength AS INTEGER

-- Only handle string type fields (:S:)
SET @fieldNameStart = CHARINDEX(@fieldName + ':S',@Fields,0)

-- If field is not found, return null
IF @fieldNameStart = 0 RETURN NULL
SET @fieldNameStart = @fieldNameStart + LEN(@fieldName) + 3

-- Get the field token which I've defined as the start of the field offset to the end of the length
SET @fieldNameToken = SUBSTRING(@Fields,@fieldNameStart,LEN(@Fields)-@fieldNameStart)

-- Get the values for the offset and length
SET @valueStart = dbo.fn_getelement(1,@fieldNameToken,':')
SET @valueLength = dbo.fn_getelement(2,@fieldNameToken,':')

-- Check for sane values, 0 length means the profile item was stored, just no data
IF @valueLength = 0 RETURN ''

-- Return the string
RETURN SUBSTRING(@values, @valueStart+1, @valueLength)
END

 --!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Then a stored procedure was created to retrieve the users PostalCode by passing in the UserName.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE GetPostalCodeFromUserName
 @UserName nvarchar(50)
AS
BEGIN
 SET NOCOUNT ON;

 SELECT dbo.fn_GetProfileElement(N'PostalCode', dbo.aspnet_Profile.PropertyNames, dbo.aspnet_Profile.PropertyValuesString) AS PostalCode FROM dbo.aspnet_Profile INNER JOIN dbo.aspnet_Users ON dbo.aspnet_Profile.UserId = dbo.aspnet_Users.UserId where dbo.aspnet_Users.UserName = @UserName

END
GO

Saturday, February 25, 2006 7:36:06 PM (US Eastern Standard Time, UTC-05:00)  #       | 
 Monday, January 23, 2006

The GetSchedule() timeout error:

Error: Schedule is currently unavailable.
DotNetNuke.Services.Exceptions.ModuleLoadException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection) at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj) at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj) at System.Data.SqlClient.SqlDataReader.ConsumeMetaData() at System.Data.SqlClient.SqlDataReader.get_MetaData() at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior) at Microsoft.ApplicationBlocks.Data.SqlHelper.ExecuteReader(SqlConnection connection, SqlTransaction transaction, CommandType commandType, String commandText, SqlParameter[] commandParameters, SqlConnectionOwnership connectionOwnership) at Microsoft.ApplicationBlocks.Data.SqlHelper.ExecuteReader(String connectionString, CommandType commandType, String commandText, SqlParameter[] commandParameters) at Microsoft.ApplicationBlocks.Data.SqlHelper.ExecuteReader(String connectionString, String spName, Object[] parameterValues) at DotNetNuke.Services.Scheduling.DNNScheduling.SqlDataProvider.GetSchedule() at DotNetNuke.Services.Scheduling.DNNScheduling.SchedulingController.GetSchedule() at DotNetNuke.Services.Scheduling.DNNScheduling.DNNScheduler.GetSchedule() at DotNetNuke.Modules.Admin.Scheduling.ViewSchedule.Page_Load(Object sender, EventArgs e) --- End of inner exception stack trace ---

The temporary fix:

Cleaning out the ScheduleHistory table fixes the problem until it gets too full again...

Delete From ScheduleHistory where ScheduleHistoryID > 1

Make sure you backup your database before executing this SQL statement.

I need to investigate more on how the code for GetSchedule() works and how to cleanup the ScheduleHistory more often. Also, we are working with Microsoft on a different issue, they have voiced concern on the performance of the stored proc GetSchedule. I will post any suggestion here when they surface.

 

Monday, January 23, 2006 1:02:55 PM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Friday, January 13, 2006

While working with Microsoft concerning the DNN SQLDataProvider and 64-Bit SQL Server 2005 Issue, it was suggested that there was a performance issue with GetTab. While this does not help the issue we are having, it did help with performance...

"There are some implicit converts happening even though you have explicit converts in the statements"

Current GetTab SQL:

select TabID,
TabOrder,
Tabs.PortalID,
TabName,
IsVisible,
ParentId,
[Level],
DisableLink,
Title,
Description,
KeyWords,
IsDeleted,
'Url'= case when F2.FileName is null then Tabs.Url else F2.Folder + F2.FileName end,
SkinSrc,
ContainerSrc,
TabPath,
StartDate,
EndDate,
'IconFile' = case when Files.FileName is null then Tabs.IconFile else Files.Folder + Files.FileName end,
'HasChildren' = case when exists (select 1 from Tabs T2 where T2.ParentId = Tabs.TabId) then 'true' else 'false' end,
RefreshInterval,
PageHeadText
from   Tabs
left outer join Files on Tabs.IconFile = 'fileid=' + convert(varchar,Files.FileID)
left outer join Files F2 on Tabs.Url = 'fileid=' + convert(varchar,F2.FileID)
where  TabId = @TabId

 

The FIX:

select TabID,
TabOrder,
Tabs.PortalID,
TabName,
IsVisible,
ParentId,
[Level],
DisableLink,
Title,
Description,
KeyWords,
IsDeleted,
'Url'= case when Files.FileName is null then Tabs.Url else files.Folder + Files.FileName end,
SkinSrc,
ContainerSrc,
TabPath,
StartDate,
EndDate,
'IconFile' = case when Files.FileName is null then Tabs.IconFile else Files.Folder + Files.FileName end,
'HasChildren' = case when exists (select 1 from Tabs T2 where T2.ParentId = Tabs.TabId) then 'true' else 'false' end,
RefreshInterval,
PageHeadText
from   Tabs
left outer join Files on Tabs.IconFile = 'fileid=' + convert(nvarchar,Files.FileID)
and Tabs.Url = 'fileid=' + convert(nvarchar,Files.FileID)
where  TabId = @TabId

Running SQLDiag.exe confirmed the modification helped improve performance. More details later...

I'll post this to the DNN Bug Tracker shortly...

Update 1/25/2006:

Now in Bug Tracker: 2404

Friday, January 13, 2006 2:35:44 PM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Friday, January 06, 2006

I made the following post on the DNN Forums, and decided to post it here in case someone reads it here first...

This is all related to: Upgrading DNN from local SQL Server 2000 to Remote SQL Server 2005

We are experiencing an issue with DNN and a 64-Bit SQL Server 2005 database server that seems to be related to the SQLDataProvider. We are currently running a production web server (Server A) that has an instance of SQL Server 2000 running locally. We have a mixture of DNN3 and NON-DNN sites running on Server A. All DNN sites are fast and run smoothly, only experiencing a few issues during peak traffic times regarding the local SQL Server 2000. Previously we had a separate server for our production SQL Server 2000 databases. However, we recently acquired a very fast 64-Bit server that came from Overture.com (Server C) and had to make room in our rack by decommissioning the old database server after moving all databases to Server A. Now, we are in the process of moving all Server A databases to Server C , so we can go back to using Server A as web server only. We have been testing and planning our database move utilizing our development server (Server B).  Our goal is to move ALL Server A databases to Server C.

Server A:
Dell 2850
Dual 3 Ghz Xeons
2 Gb RAM
32-Bit
Windows 2003 Standard Edition
SQL Server 2000

Server B:
Dell 1550
Dual 1 Ghz Pentiums
2 Gb RAM
32-Bit
Windows 2003 Enterprise Edition
SQL Server 2000 Standard Instance
SQL Server 2005 Enterprise Instance

Server C:
Dell 7150
Quad Itanium Processors
12 Gb RAM
64-Bit Windows 2003 Enterprise Edition
64-Bit SQL Server 2005 Enterprise Edition


We began testing the NON-DNN site and DNN site databases against Server B, all with success. We also setup DNN instances on Server B that used Server C as database server. Testing was successful, but all DNN databases tested were less than 100 Mb. We noticed that the DNN sites on Server B that talked to Server C DNN databases were slower, but dismissed it as being on the slower Server B when compared to Server A.

So, after much testing for NON-DNN sites, we moved all NON-DNN databases to Server C. All NON-DNN sites use the SQLOLEDB Provider. Our largest NON-DNN database is over 3.5 Gb. After the move, the site with the largest database experienced great performance/speed improvements. Some larger data transactions were executed in almost half of the time. We were very pleased with the results.

After successful NON-DNN databases move from Server A to Server C, we began moving the DNN databases to Server C.  The first 3 were very small databases. However, the largest of the 3, we noticed a difference in speed. Concerned, but determined to move on, we moved a 2.5 Gb DNN database to Server C and gave it a try... The speed was horrible taking as much as 15-45 seconds to deliver a page. We quickly backed out of the upgrade and began our investigation…

Doing a little research, I dug into the DNN documentation for Data Access and looked at the code to see DNN uses System.Data.SQLClient as the provider. Why would SQLOLEDB be fast and the DNN SQLDataProvider be so dreadfully slow and only when on a 64-Bit server? I really have no desire to change any of the core DNN code to use the System.Data.OleDb as the provider to test and see if there is a difference, so any recommedations and/or suggestions are appreciated. 

I know there are a few issues with 64-Bit SQL Server 2005 tools running slow: You may experience slow performance when you run 32-bit SQL Server tools on 64-bit operating systems. But, this should not be related to the issue since this concerns just the tools, or does it?

I plan on burning a support call to Microsoft, but would like to gain any insight others might have before I make the call.

Friday, January 06, 2006 12:04:29 AM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Tuesday, January 03, 2006

Yesterday, we attempted to move a few DNN installations from using a local SQL Server 2000 database to our new SQL Server 2005 server. First, we have a few non-DNN sites that are from pre-DNN days and are using the SQLOLEDB provider. After moving the largest database of 3.5 Gb, testing the site showed remarkable improvements of speed and performance. On pages with heavy data interaction, some loaded almost twice as fast. We then moved a few of the databases for smaller DNN sites to the new server. These sites were really small and a huge difference in speed could not really be determined without some page execution code being added. Then we tried moving a 2.5 Gb DNN database. The move went smooth, but the speed was horrible!

So, this had me thinking about incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect error again. Could an issue with the default provider for DNN and SQL Server 2005 be causing trouble? Oddly enough, when monitoring the remote SQL server, there was a huge spike on all 4 processors while processing the page. Well, I am going to find out tonight. I am going to test speed from a remote SQL Server 2000 server and see if there is any difference in speed. I'll post my results here...

UPDATE:

What a surprise. Checkout the results...

 

 

So, it looks like this is a 64 Bit and default DNN data provider issue. As I mentioned above, the old non-DNN sites using SQLOLEDB for the provider were super fast.

Could this be related: You may experience slow performance when you run 32-bit SQL Server tools on 64-bit operating systems

Look like a call to MS Support is in order. I just need to do a little more reseach on the DNN data provider. More details coming soon....

Update 1/4/2005:

After looking through the DNN documentation, I found the following: DotNetNuke Data Access.pdf (428.91 KB)

Tuesday, January 03, 2006 11:50:26 PM (US Eastern Standard Time, UTC-05:00)  #       | 
 Tuesday, December 13, 2005

InnerException: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 1 ("@rawQuery"): Data type 0xE7 has an invalid data length or metadata length.

This is here as a placeholder because I have a feeling this is going to come up agaqin soon...We experieced this issue with a DNN module that called a stored proc on a database that was not located on the same server as where the DNN installation lived and a different database than the DNN database.

0xE7 = 231, which is correct for nvarchar. Running SQL Profiler, no trace of the stored proc being run was found.

**NOTE** The stored procedure that is being called is NOT in the DotNetNuke database.

We tried the following cases with interesting results:

Error Case:

Server A (Windows 2003 Standard running IIS), connecting to Server B (remote SQL Server 2005 database). ERROR = TRUE

Test Case 1:

Server A (Windows 2003 Enterprise running IIS), connecting to local SQL Server 2000  database. ERROR = FALSE

Test Case 2:

Server C (Windows 2003 Standard running IIS), connecting to local SQL Server 2000  database. ERROR = FALSE

Test Case 3:

Server A (Windows 2003 Enterprise running IIS), connecting to Server C (remote SQL Server 2000  database). ERROR = FALSE

Test Case 4:

Server C (Windows 2003 Standard running IIS), connecting to Server B (remote SQL Server 2005 database). ERROR = TRUE

Test Case 5:

Server D (Windows 2003 Standard running IIS), connecting to local SQL Server 2005  database. ERROR = TRUE

See the pattern? So, this definitely points to an SQL Server 2005 issue. This lead us down the route of changing the provider to see if the module would work. And guess what? It did. The error goes away. WTF? No time to run a packet sniffer. I am logging this here as a great mystery to solve some other time.


 Update 1/9/2005:

During validation of a byte-ordered user-defined type passed through RPC, user-defined type validation performs de-serialization/re-serialization of the user-defined type and requires that the resulting bytes be exactly the same as the original. If the validation fails, you will see the error:

"System.Data.SqlClient.SqlException, Incoming TDS RPC protocol stream is incorrect. Parameter 1 ("<ParameterName>"): The supplied value is not a valid instance of data type <TypeName>. Check the source data for invalid values."  - Reference: http://support.microsoft.com/default.aspx?scid=kb;en-us;910228  in section 4.1.4
Tuesday, December 13, 2005 3:35:40 PM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Monday, December 05, 2005

We have a large DNN installation and we are moving everything from DNN 3.1 to DNN 4.0 and needed to do some testing. So, first we need to get the DNN database from the local SQL Server 2000 to a remote SQL Server 2005.

Here is what I did...

  1. Went into DNS and created a new subdomain called dev.somedomainname.com and pointed it to the development server IP address.
  2. Copied the entire DNN directory on the production server to the development server.
  3. Reset permissions on the dev server for the DNN folder I copied from production to allow Network Service full permission.
  4. Created a new website on Dev in IIS and setup a host header for the new subdomain and for localhost, pointing it to the DNN folder I copied from Prod.
  5. Detached the production DNN database form the SQL Server 2000 instance.
  6. Copied the MDF and LDF files for the DNN database from \mssql\MSSQL\Data on Prod and moved them to \Microsoft SQL Server\MSSQL.1\MSSQL\Data  on Dev (actually I just made a copy of the files in the same folder so I could quickly get Prod running again and would not have to wait on the slower network connection to move 3 Gb of data over, then moved them over to Dev and renamed them by removing "Copy of " from the filename).
  7. Attached the database I copied from Prod using SQL Server Management Studio. Part of this process automatically upgrades database to from 2000 to 2005. WARNING! You cannot move a SQL Server 2005 database back to SQL Server 2000 using the same method. Nor can you use replication from SQL Server 2005 to a 2000 instance...or not atleast when I tried it in CTP release a couple of months ago.
  8. Opened web.config and changed the server name, username, and password for the remote SQL Server.
  9. Opened up a browser on the Dev server and went to http://dev.somedomainname.com and confirmed site was assessible.

I will repeat these tasks again when I have done thorough testing on Dev with the exception of copying the DNN root over. I will post any issues I encounter...

Monday, December 05, 2005 6:43:35 PM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Sunday, December 04, 2005

I had installed DotNetNuke 4.0 on a Windows 2003 Standard box using a remote SQL Server 2005 Enterprise database on a development box earlier tonight. I had navigated around doing a few things and paused in the middle of updating the host account. I might have reached the session expiration time before I returned, but I decided to update the host password and clicked update. This is the error that was returned:

Unhandled Error

Error Details

File
Error  

An error has occurred while establishing a connection to the server. When connecting to SQL Server 2005, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)

I believe this means I need to allow named pipes because by default SQL Server 2005 does not allow this type connection. I was not aware that DNN used Named Pipes and I did not see anything concerning this in the Installation help. So, this is how you change it....

  1. Start > All Programs > Microsoft SQL Server 2005 > Configuration Tools > SQL Server Surface Are Configuration
  2. Once opened, use Surface Area Configuration for Services and Connections
  3. Select Using both TCP/IP and named pipes
  4. A notification that the Database Engine must be restarted before it takes effect will be displayed. You can restart the engine from Admin Tools > Services and restart SQL Server (MSSQLSERVER)

I am assuming this is all that is required. I will post a comment here if I encounter this error again.

 

 

Sunday, December 04, 2005 12:05:10 AM (US Eastern Standard Time, UTC-05:00)  #       |   | 
 Friday, December 02, 2005

A DotNetNuke site I have been working on that has some decent traffic, was experiencing an occasional Out of Memory Error. During peak traffic times, IIS would crap out and throw the error. IISReset was neccessary sometimes to alleviate the issue...very frustrating.

Here is the error:

InnerException: Exception of type System.OutOfMemoryException was thrown.
Message: DotNetNuke.Services.Exceptions.PageLoadException: Exception of type System.OutOfMemoryException was thrown. ---> System.OutOfMemoryException: Exception of type System.OutOfMemoryException was thrown. --- End of inner exception stack trace ---

Environment:

  • DotNetNuke 3.1
  • Windows 2003 Server
  • All of the latest patches/service packs
  • MS SQL Server 2000
  • 1.6 Gb Database size
  • Dual 3 Ghz Xeons with 2 Gb RAM Dell 2850

After digging into logs and looking around Host > Schedule,  I noticed that the errors coincided with the DNN Search Indexer schedule, so I stopped it. This helped for a bit, but as the site continued to grow (over 500,000 pages), the error began reappearing. 

I had tried a few other things to get rid of the issue without success. For example, under Host > Schedule I enabled DotNetNuke.Services.Cache.PurgeCache, DOTNETNUKE and was purging every 30 minutes. I thought surely this would be the answer, kicking myself for not enabling it earlier. However, to my disappointment, the error persisted. ARRRRRRRR!

I feel strongly that SQL Server should be on it's own server, but due to a server shuffle in our rack and decommisioning an older server to make room for 3 other servers, we have been using a single server for both IIS and SQL Server for this site. We are working through a few Remote Procedure Call errors with DNN 3.x and 4.0 and SQL Server 2005 (will discuss in later post) and were forced to use this setup. This error would surely go away when we move our production database to our Quad Itanium Processor server with 12 Gb RAM, but unfortunately this was not an option. So, I started digging through MSDN and found the Server Memory Options.

"Use max server memory to prevent SQL Server from using more than the specified amount of memory, thus leaving remaining memory available to start other applications quickly. SQL Server does not immediately allocate the memory specified in max server memory on startup. Memory usage is increased as needed by SQL Server until reaching the value specified in max server memory. SQL Server cannot exceed this memory usage unless the value of max server memory is raised." - MSDN

So, I decided to give it a try...This is what I did:

  1. Opened Enterprise Manager
  2. Expanded the server group
  3. Right-clicked the server
  4. Clicked Properties
  5. Clicked the Memory tab
  6. Under Dynamically configure SQL Server memory, lower the Maximum

This seems to have worked. MS SQL Server is such a pig. By putting it on a diet and setting max server memory, the error has disappeared. I decided to do some testing and was beating up the server pretty bad with several crawlers/spiders I have built and was hitting the site hard. The only error I experienced during my testing was an occasional deadlock victim error, which is expected with the number of transactions taking place on this database. I will live with a temporary error over an IIS crash anyday. Moving everything back to seperate servers for IIS and SQL Server should help greatly for a DNN site of this size. Let's just hope the DNN team can knock out a few crtitical issues with RPC and SQL Direct provider.

Friday, December 02, 2005 2:32:13 AM (US Eastern Standard Time, UTC-05:00)  #       |   | 
Copyright © 2010 Venexus, Inc.. All rights reserved.