Suggestions on full-text search or existing search algorithms
Can someone suggest an easy way to solve the search problem below? Is there an existing algorithm for it, or would full-text search suffice?
The items are classified as follows:
| ItemCategory | ItemCluster | ItemSubCluster | SubCluster | Items |
|---|---|---|---|---|
| Vegetable | Root vegetables | Root | WithOutSkin | potato, sweet potato, yam |
| Vegetable | Root vegetables | Root | WithSkin | onion, garlic, shallot |
| Vegetable | Greens | Leafy green | Leaf | lettuce, spinach, silverbeet |
| Vegetable | Greens | Cruciferous | Flower | cabbage, cauliflower, Brussels sprouts, broccoli |
| Vegetable | Greens | Edible plant stem | Stem | celery, asparagus |
The inputs will be something like:
sweet potato, yam
Yam, Potato
garlik, onion
lettuce, spinach, silverbeet
lettuce, silverbeet
lettuce, silverbeet, spinach
From the input, I want to map each input item to the ItemCategory, ItemCluster, ItemSubCluster, and SubCluster it belongs to.
Any help would be much appreciated.
1 answer
answered 2022-05-07 10:48
Deepak Tatyaji Ahire
You are nearly following the right approach.
You don't need full text searching here.
What you can create here is a kind of inverted index, as follows:
If we take the example of potato, create a map for potato that stores its ItemCategory, ItemCluster, ItemSubCluster, and SubCluster. For example:
"potato": { "ItemCategory": "Vegetable", "ItemCluster": "Root vegetables", "ItemSubcluster": "Root", "Subcluster": "Without Skin" }
Now, storing this kind of data in full for each vegetable would be expensive.
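As a rough illustration of the plain inverted index described above, here is a minimal Python sketch. The rows are copied from the question's table; the names `ROWS`, `FIELDS`, `build_index`, and `classify` are hypothetical, and lower-casing and whitespace stripping of inputs is an assumption about how inputs such as "Yam, Potato" should be normalised.

```python
# Classification rows copied from the question's table.
ROWS = [
    ("Vegetable", "Root vegetables", "Root", "WithOutSkin", ["potato", "sweet potato", "yam"]),
    ("Vegetable", "Root vegetables", "Root", "WithSkin", ["onion", "garlic", "shallot"]),
    ("Vegetable", "Greens", "Leafy green", "Leaf", ["lettuce", "spinach", "silverbeet"]),
    ("Vegetable", "Greens", "Cruciferous", "Flower",
     ["cabbage", "cauliflower", "Brussels sprouts", "broccoli"]),
    ("Vegetable", "Greens", "Edible plant stem", "Stem", ["celery", "asparagus"]),
]
FIELDS = ("ItemCategory", "ItemCluster", "ItemSubCluster", "SubCluster")


def build_index(rows):
    """Invert the table: one entry per item, pointing at its classification."""
    index = {}
    for *labels, items in rows:
        for item in items:
            # Assumption: keys are lower-cased so lookups are case-insensitive.
            index[item.lower()] = dict(zip(FIELDS, labels))
    return index


def classify(line, index):
    """Map each comma-separated item of an input line to its classification.

    Items that are not in the index map to None.
    """
    result = {}
    for raw in line.split(","):
        key = raw.strip().lower()
        result[key] = index.get(key)
    return result


INDEX = build_index(ROWS)
print(classify("Yam, Potato", INDEX))
```

With exact matching, a misspelled input such as "garlik" maps to `None`. If typos need to be tolerated, one option (not part of the approach above) is approximate matching against the index keys, for example with `difflib.get_close_matches` from the standard library.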
You can optimise the storage by using an encoding scheme:
For example, let ItemCategory be denoted by 0, ItemCluster by 1, ItemSubcluster by 2, and Subcluster by 3.
The values can be denoted by a similar encoding scheme: let Vegetable be denoted by 0, Root vegetables by 1, Root by 2, and Without Skin by 3.
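The encoding scheme above could be sketched in Python roughly as follows. The `Codebook` class and the `encode_record` / `decode_record` helpers are hypothetical names; assigning codes in first-seen order is an assumption that happens to reproduce the numbering used in the example.

```python
FIELDS = ["ItemCategory", "ItemCluster", "ItemSubcluster", "Subcluster"]
# Field name -> short code ("0", "1", ...), by position in FIELDS.
FIELD_CODE = {name: str(i) for i, name in enumerate(FIELDS)}


class Codebook:
    """Assigns each distinct label a small code, in first-seen order."""

    def __init__(self):
        self.code_of = {}   # label -> code
        self.label_of = []  # integer code -> label

    def encode(self, label):
        if label not in self.code_of:
            self.code_of[label] = str(len(self.label_of))
            self.label_of.append(label)
        return self.code_of[label]

    def decode(self, code):
        return self.label_of[int(code)]


values = Codebook()


def encode_record(record):
    """Replace field names and values with their short codes."""
    return {FIELD_CODE[f]: values.encode(v) for f, v in record.items()}


def decode_record(encoded):
    """Inverse of encode_record: restore the readable names and values."""
    return {FIELDS[int(f)]: values.decode(v) for f, v in encoded.items()}


potato = {
    "ItemCategory": "Vegetable",
    "ItemCluster": "Root vegetables",
    "ItemSubcluster": "Root",
    "Subcluster": "Without Skin",
}
encoded = encode_record(potato)
print(encoded)  # {'0': '0', '1': '1', '2': '2', '3': '3'}
```

The codebook has to be stored alongside the index, so the saving comes from repeated labels such as Vegetable being written once rather than once per item.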
Now, your mapping becomes:
"potato": { "0": "0", "1": "1", "2": "2", "3": "3" }
To optimise this further, you can also maintain an index of vegetables. For example, potato can be denoted by 0. So your final index becomes:
"0": { "0": "0", "1": "1", "2": "2", "3": "3" }
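A minimal Python sketch of the item index idea, under the assumption that ids are handed out in first-seen order. The names `item_ids`, `item_names`, `item_id`, and `lookup` are hypothetical, and the encoded record for potato is hard-coded from the example in the answer.

```python
item_ids = {}    # item name -> id (as a string, to match the JSON keys)
item_names = []  # integer id -> item name


def item_id(name):
    """Return the id for an item, assigning the next free id if it is new."""
    key = name.strip().lower()
    if key not in item_ids:
        item_ids[key] = str(len(item_names))
        item_names.append(key)
    return item_ids[key]


# Encoded record for potato, taken from the example mapping in the answer.
final_index = {
    item_id("potato"): {"0": "0", "1": "1", "2": "2", "3": "3"},
}


def lookup(name):
    """Resolve an input item to its encoded record, or None if unknown."""
    code = item_ids.get(name.strip().lower())
    return None if code is None else final_index.get(code)


print(final_index)
print(lookup(" Potato "))
```

A name-to-id table is still needed to resolve incoming text, so this step mainly helps when the ids are reused in other structures.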