Forums -> Site Info

Boolean Searches


We now have some boolean search mechanisms and I think this requires a little explanation:

First of all, and this applies to everybody, if you try to do a keyword search on something like space marine it will think you're entering two keywords; namely space and marine. In order to search for space marine you need to enter it within quotes, thus: "space marine".

Okay, the rest of this really only applies to subscribers because they're the only ones who can use multiple keywords on the keyword search mechanism, and the forum search mechanisms are for subscribers only.

The first thing to note is what I've said above as it applies to forum searches as well as the keyword search: a forum search on space marine will return topics or posts that contain either word, or both, so you'll probably want to wrap it in quotes to get what you really want.
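To make the quoting rule concrete, here is a rough Python sketch. It is not the site's actual code (our searches run in SQL); the names split_terms and matches_any, the sample posts, and the use of simple substring matching are all assumptions made for illustration.

```python
import re

def split_terms(query):
    """Split a query into terms, keeping "quoted text" together as one term."""
    pairs = re.findall(r'"([^"]*)"|(\S+)', query)
    return [(phrase or word).lower() for phrase, word in pairs if phrase or word]

def matches_any(query, text):
    """True if the text contains at least one of the query's terms (plain OR)."""
    text = text.lower()
    return any(term in text for term in split_terms(query))

# Hypothetical post titles, purely for illustration.
posts = ["Space Marine tactics", "marine biology", "outer space"]

# Unquoted: two separate keywords, so all three posts match.
print([p for p in posts if matches_any('space marine', p)])

# Quoted: one phrase, so only the first post matches.
print([p for p in posts if matches_any('"space marine"', p)])
```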

If you do want to search for more than one word, or keyword, then you don't need the quotes. Thus a search on trees rocks pools will give you results that contain trees OR rocks OR pools OR any combination of those. Note however that our search results are NOT weighted. A weighted search would put those results that contained 2 of those 3 words higher on the list than those that contain only 1, and anything with all 3 would be right at the top of the list. A weighted search would also regard the first search terms as being more important than the ones at the end, i.e. trees would be more important than rocks, with pools trailing in third place. Alas, we do not currently have weighted searches (see footnote) and the results are merely listed in the sort order specified on the results page (the default is normally newest first).
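For the curious, this is roughly what a weighted ordering would look like. To be clear, the site does NOT do this; the sketch below is a hypothetical scoring scheme in Python, and the function name weighted_score, the tuple score, and the sample posts are all made up for illustration.

```python
def weighted_score(terms, text):
    """Hypothetical weighting: more matching terms first, then earlier terms."""
    words = set(text.lower().split())
    hits = [i for i, term in enumerate(terms) if term in words]
    # First part: how many terms matched.
    # Second part: the first term is worth the most, the last term the least.
    return (len(hits), sum(len(terms) - i for i in hits))

terms = ["trees", "rocks", "pools"]
posts = ["pools only", "trees and rocks", "rocks trees pools", "trees alone"]

ranked = sorted(posts, key=lambda p: weighted_score(terms, p), reverse=True)
print(ranked)
# ['rocks trees pools', 'trees and rocks', 'trees alone', 'pools only']
```

The post with all 3 words comes first, then the one with 2, and of the two single-word matches the one containing trees beats the one containing pools.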

So what if you want the results to contain ALL of your search words?

In that case you need +trees +rocks +pools. In boolean terms the + means and (not having it there means or); however, what it really means is: the data MUST HAVE this word. So the results of this search MUST HAVE trees AND rocks AND pools.

There's also a not operator which means: MUST NOT HAVE. So, for example, +trees +rocks -pools yields results that have both trees and rocks but do not have pools.

Two searches that need a little more thought, and which might yield unexpected results, are shown below:

trees rocks -pools - this means that the results should have trees OR rocks OR both, but must not have pools. See the difference between that and +trees +rocks -pools?

Okay, so here's the one that's going to fool you: +trees rocks -pools.

You've probably figured out that the results must have trees and must not have pools, but what about the rocks? The answer is: they get completely ignored!!!

That probably surprised you, but it happens because our searches are NOT WEIGHTED. Think about it: our results for this search MUST HAVE trees and MUST NOT HAVE pools. That in itself defines what data can be returned. Any or terms have nothing to contribute. They WOULD contribute if the search were weighted: results that had them would be higher on the list. But alas our searches are not weighted.
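If you'd like to see that in action, here is a small Python sketch of the rules above. It is only an illustration under my own assumptions (the real searches are SQL queries); the matches function, its parameter names, and the sample posts are hypothetical, and it uses simple substring tests.

```python
def matches(text, must=(), must_not=(), optional=()):
    """Unweighted boolean match: + terms, - terms and plain (or) terms."""
    text = text.lower()
    # Any - term present rules the text out.
    if any(term in text for term in must_not):
        return False
    if must:
        # With + terms present, plain terms cannot change the outcome.
        return all(term in text for term in must)
    # No + terms: the plain terms are OR-ed together.
    return any(term in text for term in optional)

# Hypothetical posts, purely for illustration.
posts = ["trees and rocks", "trees by the pools", "just trees", "rocks and moss"]

# +trees rocks -pools
with_rocks = [p for p in posts
              if matches(p, must=["trees"], must_not=["pools"], optional=["rocks"])]
# +trees -pools
without_rocks = [p for p in posts
                 if matches(p, must=["trees"], must_not=["pools"])]
print(with_rocks == without_rocks)  # True

# trees rocks -pools
either = [p for p in posts
          if matches(p, must_not=["pools"], optional=["trees", "rocks"])]
print(either)  # ['trees and rocks', 'just trees', 'rocks and moss']
```

Adding or removing rocks from the first search changes nothing, whereas dropping the + from trees lets "rocks and moss" into the results.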

As a rule of thumb: if you put a + in front of one term you probably want it in front of every term other than those with a -.

Two final things to note:

  1. You can put a + or - in front of a quoted phrase e.g. space -"space marine". Hopefully by this point you can figure out for yourself what results that will return. :)

  2. Our search mechanisms are coded to reject searches that have only not words because, for example, a search on -rocks would yield an enormous number of results.
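Both of those points can be sketched in a few lines of Python. Again, this is not the site's code: parse_query, the regular expression, the dictionary keys, and the error message are all assumptions made for illustration.

```python
import re

# A term is an optional + or -, followed by a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Sort a query's terms into must / must_not / optional lists."""
    parsed = {"must": [], "must_not": [], "optional": []}
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if not term:
            continue
        key = {"+": "must", "-": "must_not"}.get(sign, "optional")
        parsed[key].append(term)
    # Reject searches made up only of not terms.
    if parsed["must_not"] and not parsed["must"] and not parsed["optional"]:
        raise ValueError("search needs at least one term without a leading -")
    return parsed

print(parse_query('space -"space marine"'))
# {'must': [], 'must_not': ['space marine'], 'optional': ['space']}

try:
    parse_query('-rocks')
except ValueError as err:
    print("rejected:", err)
```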

Footnote:

I'd love to be able to implement weighted searches but there are no mechanisms for this in the SQL engine that we're using... so it would need multiple SQL queries and a lot of data crunching to produce weighted results. I'm just not sure that it's worth it, and I don't think we can make a decision about that until we see how it goes with what I've now implemented.

