
Proper implementation of search

  • 11 May '14

The way I've seen search implemented in some of the newer forum applications leaves much to be desired.

Judging by the nature of these platforms, I think Stack and Askbot have the best search implementations: they autocomplete matching topics as the user types in a question or a search string.

What will it take to implement something like that here?

nitely (Esteban Castro Borsani)
  • 11 May '14

I think so too; search-as-you-type would be really helpful for avoiding duplicate topics.

Spirit uses an app called haystack, and they even have an example of how to do it, so I think it'll be relatively easy.

  • 11 May '14


  • 11 Jun '16

@nitely Hi there~
For example, I want to find this topic, titled "Proper implementation of search".
I can't search with a substring like "Prop" or "Prope"; only "Proper" works.
Is it possible to search by substring?
Best regards~

nitely (Esteban Castro Borsani)
  • 11 Jun '16

That's not possible with haystack, AFAIK. It's not even possible in search engines like Google. This is likely the reason why.

Edit: there is the autocomplete feature, but it's only good for autocomplete, not for getting actual search results (i.e., it won't return all topics matching "proper", only the first one). If you type "prop", it will match "proper", but if you type "prop implementation" it will match nothing.

Edit 2: Just tried it, and it's possible using EdgeNgramField, yeeey!
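As a rough illustration of why edge n-grams make prefix search work, here is a minimal in-memory sketch in plain Python. It is not haystack's or Whoosh's actual implementation; the function names and sample topics are hypothetical. Every prefix of every title word is indexed, so a partial word like "prop" becomes an exact lookup, and a multi-word query is answered by intersecting the matches for each term:

```python
from collections import defaultdict


def edge_ngrams(word, min_size=1):
    """Return the prefixes of `word`, from min_size letters up to the full word."""
    return [word[:i] for i in range(min_size, len(word) + 1)]


def build_index(topics):
    """Hypothetical index: map every prefix of every title word to topic ids."""
    index = defaultdict(set)
    for topic_id, title in topics.items():
        for word in title.lower().split():
            for gram in edge_ngrams(word):
                index[gram].add(topic_id)
    return index


def search(index, query):
    """Return ids of topics where every query term is a prefix of some title word."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first term's matches so the index itself is never mutated.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


topics = {
    1: "Proper implementation of search",
    2: "Property editor is broken",
    3: "Search results are empty",
}
index = build_index(topics)
print(sorted(search(index, "prop")))                 # [1, 2]
print(sorted(search(index, "prop implementation")))  # [1]
```

Because only prefixes are stored, "prop" finds "Proper" and "Property", but a mid-word fragment such as "oper" finds nothing.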

  • 12 Jun '16

Thanks for your assistance~~
Best regards~~

nitely (Esteban Castro Borsani)
  • 16 Jun '16

NgramField is actually what you want. I was about to make it the default instead of a simple CharField, but the index becomes huge, since it stores every substring of each word. For the word "hello", for example, it stores "hello", "hell", "ello", "hel", "ell", "llo", "he", "el", "ll", "lo", "h", "e", "l", "o".
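To see where the growth comes from, the hypothetical helpers below count the distinct substrings a plain n-gram tokenizer would emit for a single word, next to the prefixes an edge n-gram tokenizer would emit. This sketch assumes a minimum gram size of 1; real backends let you configure minimum and maximum sizes, so actual counts may be lower:

```python
def ngrams(word, min_size=1):
    """All distinct substrings of `word` with at least min_size letters."""
    return {
        word[start:end]
        for start in range(len(word))
        for end in range(start + min_size, len(word) + 1)
    }


def edge_ngrams(word, min_size=1):
    """Only the prefixes of `word` with at least min_size letters."""
    return {word[:end] for end in range(min_size, len(word) + 1)}


word = "hello"
print(len(edge_ngrams(word)), len(ngrams(word)))  # 5 14
```

A word of n letters has n prefixes but up to n(n+1)/2 substrings, so n-gram indexes grow much faster than edge n-gram indexes as words get longer.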

For topic titles it may be OK, but when comments are included the index becomes quite large (~100 times larger in my tests).

Edit: I guess I could make this a setting...
Edit 2: Indexing a topic with 10K comments of about 500 chars each becomes extremely slow with Whoosh; it should be OK on other search engines.
Edit 3: Apparently the problem isn't that Whoosh is slow, but that it loads the entire index into memory, so if you have enough RAM it should be OK. The index is about 1GB on a DB with 1K comments of about 100 unique words each. The index size grows with the number of unique words: a bazillion comments repeating the same words over and over will produce a pretty small index, at least in Whoosh. When using a CharField, the index is about 15MB. In conclusion, don't use NgramField with Whoosh as the search engine backend.
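To illustrate why repeated words don't grow the vocabulary while n-grams multiply it, here is a rough Python sketch that counts only the distinct index terms each approach would produce. The helper names are hypothetical; it assumes a simplified word-only analyzer (ignoring the stemming and stop-word handling a real CharField analyzer may apply) and ignores postings (the per-document entries a real inverted index also stores):

```python
def ngrams(word, min_size=1):
    """All distinct substrings of `word` with at least min_size letters."""
    return {
        word[start:end]
        for start in range(len(word))
        for end in range(start + min_size, len(word) + 1)
    }


def words_only(word):
    """Simplified stand-in for a plain word analyzer: keep the word itself."""
    return {word}


def vocabulary(comments, analyzer):
    """Distinct index terms that `analyzer` produces over all comments."""
    terms = set()
    for comment in comments:
        for word in comment.lower().split():
            terms.update(analyzer(word))
    return terms


one_comment = ["the quick brown fox jumps over the lazy dog"]
repeated = one_comment * 10_000

print(len(vocabulary(one_comment, words_only)))  # 8
print(len(vocabulary(repeated, words_only)))     # 8
print(vocabulary(repeated, ngrams) == vocabulary(one_comment, ngrams))  # True
```

Repeating the same comment 10,000 times leaves the vocabulary unchanged under both analyzers, while the n-gram analyzer produces many more terms for the same words.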