Which categories should be used and how is the categories hierarchy created? (question)
The categories. At first, I added products to very narrow categories. But after adding several dozen products, I realized that it doesn't make sense to have a lot of categories with only 1, 2 or 3 products in them (for now), so I am grouping the products into a few general categories (I am still regrouping them). I think the best approach is to follow the "Wikipedia way": have general categories and, when there are enough products (15-20?) that can be grouped together, create a subcategory for them. I saw this hierarchical category-subcategory structure in the French version. How can it be done? Or do you have to do it manually? On the other hand, it seems that people are adding categories without unified criteria, and it looks a little chaotic. Are you or other people working to unify the categories so that, when we have more products in the Spanish version, we can copy the scheme? (question from Javichu)
-
Anonymous commented
What can I do when something in the categories hierarchy is wrong? The category liquorice is a subcategory of herbal teas at the moment, but everything in the category (at least in Germany) should rather be found under sweets/candies. As a consequence, no Nutri-Score is computed for liquorice, because herbal teas are excluded by the scoring rules. Liquorice teas exist, but those should be a different category.
-
Aaron Muir Hamilton commented
For English, there is wordnet, which covers a lot of this already.
-
Aitor commented
I agree with the idea of typing the most specific category, but I think an autocomplete feature would be very helpful. It would help avoid entering more than one category for the same thing when a category that is already in the database hierarchy could be used.
-
Aitor commented
Hi, I think it would help to list 2 or 3 sublevels of the existing categories on the form for adding a new product, so that products can be added directly in the right place.
-
A commented
Hi, I just found out about openfoodfacts, great idea!
I come from a very different area (visual effects), but we too have to deal with big databases with loads of assets from the real world, from tires to grass to water to food to movements to skies, etc.
It's always very tempting to put them in hierarchies, but IMHO, based on my own experience, it's almost always better to use tags instead of hierarchies.
So, instead of having to choose which branch a yogurt with orange juice belongs to, just add as many tags as needed, like yogurt, milk, juice, fruit, etc.
And of course a hierarchy can always be derived or inferred from the tags.
Just my two cents.
-
So for Spanish, I can think of 2 different ways to create the categories:
1. same approach as French: I create a file on Google Docs to describe the Spanish hierarchy of categories, and little by little people can build the hierarchy.
or
2. we think of a more global approach, where we have an international hierarchy of categories, the same for every language and country, with category names translated in multiple languages.
Maybe we could start with 1, and then later add translations and possibly unify the categories.
I started a file for the Spanish categories:
https://docs.google.com/document/d/1PVXIGh3GFyPsVqQvwabT2Qdl7mfxJYmDs17n4tKhiU0/edit
-
For the French version, we intentionally started with a chaotic approach to categories: everybody was welcome to put whatever they wanted in the categories field of each product. The idea was to see what kind of categories people put intuitively.
And then we started to bring some order to the chaos, in a very incremental way, by starting to normalize the category names (e.g. we decided to put all of them in the plural form: "Cakes" instead of "Cake") and by creating a sort of hierarchy of categories ("Orange juices" being a sub-category of "Fruit juices", itself a sub-category of "Drinks", etc.).
So now we ask people to put only the most specific product category they can think of, e.g. "Pineapple juices from concentrate". All the parent categories are added automatically.
The parent categories can be added retroactively, so it does not matter if the "Pineapple juices from concentrate" category does not exist yet.
The approach is different from the Wikipedia way: instead of reviewing the products and moving them to more specific categories sometime later, we start with the most specific category, and then we review the hierarchy of categories to add the new categories that are not in the hierarchy yet.
The "hierarchy" is not strictly speaking a hierarchy: categories can have multiple parents. e.g. "Pineapple juices from concentrate" is a child of both "Pineapple juices" and "Fruit juices from concentrate".
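To illustrate how parent categories can be added automatically when a category has several parents, here is a minimal Python sketch. It is not the actual Open Food Facts code: the PARENTS table and the expand_categories function are hypothetical names, and the link from "Pineapple juices" to "Fruit juices" is an assumption added for the example. The other links come from the examples above.

```python
# Hypothetical parent table: each category maps to its direct parents.
# A category may have several parents, so this is a graph, not a tree.
PARENTS = {
    "Pineapple juices from concentrate": [
        "Pineapple juices",
        "Fruit juices from concentrate",
    ],
    "Pineapple juices": ["Fruit juices"],  # assumed link, for illustration
    "Fruit juices from concentrate": ["Fruit juices"],
    "Orange juices": ["Fruit juices"],
    "Fruit juices": ["Drinks"],
}


def expand_categories(categories, parents=PARENTS):
    """Return the given categories plus all of their ancestors."""
    seen = set()
    stack = list(categories)
    while stack:
        category = stack.pop()
        if category in seen:
            continue  # already handled (reached through another parent)
        seen.add(category)
        # Categories missing from the table simply have no parents yet.
        stack.extend(parents.get(category, []))
    return seen


if __name__ == "__main__":
    for name in sorted(expand_categories(["Pineapple juices from concentrate"])):
        print(name)
```

A category that is not in the table yet is kept as entered. Once it is added to the table, running the expansion again fills in its parents, which is what makes the retroactive approach work.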
For French, the "hierarchy" of categories is created collaboratively with this file:
https://docs.google.com/document/d/1FZsIoa223TXkDkSeAOdd4JSX_t81NNN6Il3TK-Z8qds/edit
And I use the file to create a graphical view:
http://fr.openfoodfacts.org/data/fr.categories.svg