
Blog


    July 29

    Why do repeated full Crawls using WSS / MOSS?

    I just saw a good article from Mike Taghizadeh describing reasons to do a full crawl at regular intervals. I knew about the first two reasons and the second-to-last one, but the third one really caught my eye: the crawler does not detect updates to SharePoint ASPX pages. So if you update a page's content or change a view, those changes will only get picked up during a full crawl.

    When planning your search & indexing schedule, take this into account; you will probably have to set up daily full crawls for SharePoint sites (after hours, of course).

    From Mike's article (http://feeds.feedburner.com/~r/sharepointmsblogs/~3/134295959/reasons-for-a-full-crawl.aspx):
    ____________________________________________________________________

    I have been asked a few times why MOSS Search would need to do a full crawl. The following information has been taken from one of the whitepapers on TechNet and does a good job of explaining this:

    Reasons for an SSP administrator to do a full crawl include:

    • One or more QFE or service pack was installed on servers in the farm. See the instructions for the hotfix or service pack for more information.
    • An SSP administrator added a new managed property.
    • To re-index ASPX pages on Windows SharePoint Services 3.0 or Office SharePoint Server 2007 sites.

      Note: The crawler cannot discover when ASPX pages on Windows SharePoint Services 3.0 or Office SharePoint Server 2007 sites have changed. Because of this, incremental crawls do not re-index views or home pages when individual list items are deleted. We recommend that you periodically do full crawls of sites that contain ASPX files to ensure that these pages are re-indexed.

    • To resolve consecutive incremental crawl failures. In rare cases, if an incremental crawl fails one hundred consecutive times at any level in a repository, the index server removes the affected content from the index.
    • One or more crawl rules have been added or modified
    • To repair a corrupted index

    The system does a full crawl even when an incremental crawl is requested under the following circumstances:

    • An SSP administrator stopped the previous crawl.
    • A content database was restored.
    • A full crawl of the site has never been done.
    • To repair a corrupted index. Depending upon the severity of the corruption, the system might attempt to perform a full crawl if corruption is detected in the index

    __________________________________________________________________
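
The manual reasons in Mike's list boil down to a checklist you could run before each scheduled crawl. Here is a minimal Python sketch of that checklist. To be clear, none of this is a SharePoint API: the function name, the keyword arguments and the two thresholds (max_days_between_full, failure_threshold) are all made up for illustration, and you would have to feed in the values yourself.

```python
from typing import List


def full_crawl_reasons(*,
                       patch_installed: bool = False,
                       managed_property_added: bool = False,
                       crawl_rules_changed: bool = False,
                       index_corrupted: bool = False,
                       has_aspx_pages: bool = False,
                       days_since_full_crawl: int = 0,
                       consecutive_incremental_failures: int = 0,
                       max_days_between_full: int = 1,
                       failure_threshold: int = 10) -> List[str]:
    """Return the reasons (if any) to schedule a full crawl.

    Hypothetical helper: it only mirrors the checklist quoted above.
    Both thresholds are assumptions, not values taken from SharePoint.
    """
    reasons = []
    if patch_installed:
        # A QFE or service pack was installed on servers in the farm
        reasons.append("patch installed")
    if managed_property_added:
        reasons.append("managed property added")
    if has_aspx_pages and days_since_full_crawl >= max_days_between_full:
        # Incremental crawls cannot tell that ASPX pages have changed
        reasons.append("ASPX pages due for re-index")
    if consecutive_incremental_failures >= failure_threshold:
        # Act well before the 100-failure limit mentioned in the list
        reasons.append("repeated incremental failures")
    if crawl_rules_changed:
        reasons.append("crawl rules changed")
    if index_corrupted:
        reasons.append("index corrupted")
    return reasons


if __name__ == "__main__":
    print(full_crawl_reasons(has_aspx_pages=True, days_since_full_crawl=1))
```

With the default of one day between full crawls, any site with ASPX pages gets flagged daily, which is the same conclusion as above: plan for a nightly full crawl.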

    There's also a comment at the bottom of the article indicating that you also need to do a full crawl to pull down the new ACLs of a file if the access list was changed but the file was not. Otherwise it's possible that a user would see a search result linking to a file they do not have access to view (so the security trimming fails).

    ***UPDATE - SP1 and Post-SP1 Hotfix rollup*** - The incremental indexing of files now also looks at ACL settings and updates them if necessary, provided you have applied the Post-SP1 hotfix rollup for WSS - KB941422
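
To see why a stale ACL in the index matters, here is a toy Python model of security trimming. It is only a sketch under my own assumptions: the index is a plain dictionary of file name to allowed users, and the acl_aware flag stands in for the behavior described in the update above. The names trim_results and incremental_crawl are invented, not anything from the SharePoint object model.

```python
def trim_results(index, user):
    """Security trimming: return only files whose *indexed* ACL lists the user."""
    return sorted(doc for doc, acl in index.items() if user in acl)


def incremental_crawl(index, live_acls, changed_docs, acl_aware):
    """Toy incremental crawl.

    A file is re-indexed if its content changed. If acl_aware is True
    (assumed to model the post-hotfix behavior), a file whose ACL differs
    from the indexed copy is refreshed as well.
    """
    for doc, acl in live_acls.items():
        acl_changed = index.get(doc) != acl
        if doc in changed_docs or (acl_aware and acl_changed):
            index[doc] = set(acl)


if __name__ == "__main__":
    # bob was removed from the file's ACL after the last crawl
    index = {"budget.xls": {"alice", "bob"}}
    live_acls = {"budget.xls": {"alice"}}

    incremental_crawl(index, live_acls, changed_docs=set(), acl_aware=False)
    print(trim_results(index, "bob"))  # bob still gets the hit

    incremental_crawl(index, live_acls, changed_docs=set(), acl_aware=True)
    print(trim_results(index, "bob"))  # the hit is trimmed
```

In the first run the file itself has not changed, so the crawl skips it and bob keeps seeing a result he can no longer open. In the second run the ACL difference alone is enough to refresh the entry.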

    Bye :)

    July 16

    The Dreaded Double-hop dilemma and its dynamic destroyer, Kerberos

    The "Double-hop" issue: Back in the good old days, Microsoft developed a way of authenticating clients (users) against a common database of User Name / Password pairs. They called it NTLM (for NT LAN Manager, as it managed authentication across a LAN... and they were prefixing everything back in those days with the letters NT... like NTFS, NTDS, NTCR, NTLDR and NTLM... something about corporate branding I suppose.) Back in those days, multi-tiered environments normally used either "process" accounts or an alternate form of authenticating between the first software tier and the second software tier (like a SQL account). Too easy.

    Unfortunately, it wasn't long before developers wanted to start passing the user's authentication details through to a back-end system so that the data was secure as well as the interface to the data. (Kind of makes sense: if an account is compromised, you only have the rights of the compromised account on the target system, rather than a god account *cough* sa, blank pwd *cough*.) However, with NTLM the web server only verifies the user through a challenge/response exchange and never receives credentials it can reuse on the user's behalf, so it has nothing to present to the back-end server (call it a security restriction or perhaps an architectural oversight). That rules out this sort of caper: once you authenticate to a server, it needs to use another account (programmatically) to get to the back-end data.

    So how do you tell you're suffering from double-hop madness? There are a few dead giveaways.

    • First, the web application works the way you intended when you access it from the web server itself, where your access is either user or power user, but from your local machine you get access denied. (When you are already on the web server, the first "hop" is to the back-end database. When you're on your workstation, your first "hop" is to the front-end web server.)
    • Second, you open the web application page successfully, but the back-end part of the application keeps logging requests for anonymous access in the security event log every time you refresh the page.
    • Third, you are constantly prompted to log in when accessing a site, even though you are entering valid credentials (although this can mean other things as well).

    Kerberos was created to be more secure and faster than NTLM, and to fill the double-hop void... but by default, even it does not allow accounts to impersonate other accounts unless you explicitly allow it. That's why you need to configure the web application's service accounts to be trusted for delegation, so they can impersonate the user who is currently logged in when they talk to the back end.
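
If it helps to picture the hops, here is a toy Python model. It does not talk to Active Directory or do any real authentication; it only encodes the rule described above, that the user's identity survives the first hop but reaches later hops only when the protocol is Kerberos and delegation has been allowed. The function name, the tier names and the "ANONYMOUS" marker are all my own inventions.

```python
ANONYMOUS = "ANONYMOUS"


def identity_at_each_tier(user, tiers, protocol, delegation_allowed=False):
    """Return {tier: identity that tier sees} for a chain of hops.

    Toy rule: hop 1 always carries the user's identity. Any later hop
    carries it only with Kerberos plus explicitly allowed delegation.
    """
    seen = {}
    identity = user
    for hop, tier in enumerate(tiers, start=1):
        if hop > 1:
            can_forward = protocol == "kerberos" and delegation_allowed
            if not can_forward:
                identity = ANONYMOUS
        seen[tier] = identity
    return seen


if __name__ == "__main__":
    # From a workstation: hop 1 is the web server, hop 2 is the database
    print(identity_at_each_tier("brad", ["web", "sql"], "ntlm"))
    # Logged on to the web server itself: the only hop is to the database
    print(identity_at_each_tier("brad", ["sql"], "ntlm"))
    # Kerberos with delegation allowed: the identity makes both hops
    print(identity_at_each_tier("brad", ["web", "sql"], "kerberos", True))
```

The first call reproduces the anonymous requests showing up at the back end, the second shows why everything works when you test from the web server, and the third is the state you are aiming for.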

    July 07

    Web Parts - Summary Link Web Part

    This is a kind of cool web part. It's cool because it allows you to have a "Favorites" list within your team site that is collapsible, can be grouped under categories and then categories can be relocated using drag & drop - and the UI for laying out the information is a lot more intuitive than the standard list presentation page.

    It's only "Kind of" because it does not have an associated list driving the content like a standard links list does. You could build your own, but then the layout UI is not as user friendly.

    I see this web part being used mainly in My Sites... but then if you are using a browser to access the site, and you are keeping a list of links that are useful to you in your My Site, where is the benefit over the "Links" bar in IE?

    Anyhoo, at least y'all know it's there.

    Web Parts - Site Aggregator Web Part

    I thought I'd start up a reference list for each of the web parts that come out of the box (OTB) with MOSS 2007 as I investigate them for the project I'm currently working on. First one - Site Aggregator Web Part.

    The site aggregator web part is actually a pretty simple web part - put it on a page and it can display all objects (images, documents, forms, excel spreadsheets) that have been added to any SharePoint site belonging to the same site collection as the one that the web part is deployed upon.

    To configure it, you just enter the URL of the site that you want to display all the documents from, then give the link a name - this name then shows up as the name of the tab you click on to display the content from that site, so make it intuitive, eh! :)

    So... under "Sites", click on the "New Sites Tab" - Enter in the URL and the tab name, then click Create.

    New tab created, check!

    To do: Save the world :)

    Couple of tips - you can't edit a tab you've already created... and you can't move the tabs around... and they are not sorted alphabetically... hmmm, best advice is to make sure the one you've just done is complete before moving onto the next tab.

    July 03

    So you've Decided to use the Content Query Web Part... Now What?

    Here are some articles on how to use the Content Query Web Part, probably the most underutilized web part in MOSS Standard Edition. (Of course, the most underutilized parts in MOSS Enterprise are probably Excel Services and the mighty BDC :) )

    Configuring and Customizing The Content By Query Web Part
    Customizing Content Query RSS Feeds
    Customizing the Content Query Web Part and Custom Item Styles 
    Customizing the Content Query Web Part XSL
    How to: Display Custom Fields in a Content Query Web Part

    Cheers!
    Brad