Natural Language Processing (NLP) and Natural Language Understanding (NLU) have become an essential toolset for the search engines of our times. There are many sophisticated models and techniques, such as Seq2Seq, fastText, GloVe, Word2Vec and BERT, for implementing NLP / NLU based systems. Everything starts with a good language / linguistic / meta-linguistic model. In this creative endeavour, we have to decide the parameters (features) and the weights for them.
When it comes to search engines and their information architecture, the fundamental relationships between data, metadata, taxonomy, tokens, topology etc. remain quite significant. Hence I would suggest giving importance to find-ability, link-ability and usability when we encode and enumerate the parameter set (feature space) for a search engine, be it a document, image, face or voice search engine, or any other kind.
The notions of find-ability, link-ability and usability will manifest in numerous dimensions and dynamics across the formats mentioned earlier. In this age of fake news and falsified information, trustability also becomes quite important. If we are factoring the data dimensions into the feature set, we can also take the big data features such as volume, velocity, variety and veracity into consideration. I would say they are not essential for a simple, small-scale search engine.
Find-ability covers the content discovery aspects such as navigation, sitemap, query structure, result set etc. Link-ability summarises the inter-objective and inter-subjective aspects of content, the relationships between data and metadata, the compactness of the taxonomy etc. Usability concerns the accessibility, visibility, experience, cognisance, consumption etc. related to the end-user's engagement with the search engine.
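To make the idea of a feature space with weights a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the `FeatureScores` class, the `relevance` function, the weights and the sample scores are all hypothetical, and each feature group is assumed to be already reduced to a single score between 0 and 1. A real search engine would derive these scores from many underlying signals.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FeatureScores:
    """Hypothetical per-document scores in [0, 1], one per feature group."""
    findability: float   # navigation, sitemap, query structure, result set
    linkability: float   # data/metadata relationships, taxonomy compactness
    usability: float     # accessibility, visibility, experience
    trustability: float  # resistance to fake or falsified information


# Illustrative weights only; in practice they would be tuned or learned.
DEFAULT_WEIGHTS = {
    "findability": 0.4,
    "linkability": 0.3,
    "usability": 0.2,
    "trustability": 0.1,
}


def relevance(scores: FeatureScores, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the feature-group scores."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weights[f.name] * getattr(scores, f.name) for f in fields(scores))
    return weighted / total


# Two made-up documents, ranked by the combined score (highest first).
doc_a = FeatureScores(findability=0.9, linkability=0.5, usability=0.7, trustability=0.8)
doc_b = FeatureScores(findability=0.6, linkability=0.9, usability=0.4, trustability=0.3)
ranked = sorted(
    {"doc_a": doc_a, "doc_b": doc_b}.items(),
    key=lambda item: relevance(item[1]),
    reverse=True,
)
```

A weighted average is chosen here purely for simplicity. The same four groups could equally be fed as inputs to a learned ranking model, in which case the weights would come from training rather than from hand-tuning.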
This note is a brief collection of spontaneous thoughts on the aspects behind a good feature space for an NLP based search engine. Rather than starting with an arbitrary hypothesis or model, it is better to evolve a logical and linguistic framework for NLP based search engines.