How to auto-moderate ads using machine learning?

I have ads that consist of a title, a description, pictures, and some other features. The text is in Russian or Romanian.

There are 3 tasks:

  • determine whether the category is correct
  • determine whether the title is correct
  • make sure nothing illegal is being sold

For the category task I tried the following: I concatenated the title, the body, and the text extracted from the pictures with EfficientNet, encoded the result with TF-IDF, and classified it with an SVC. This gave at most 85% accuracy, but in production it drops to 75%. I also tried fastText embeddings, transliterating the text, and removing or keeping stopwords, but none of these gave a significant improvement.
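For reference, the category setup described above can be sketched roughly like this. It is a minimal sketch, not the original code: it assumes each ad is a dict with hypothetical `title`, `description` and `image_text` keys (`image_text` stands for whatever text was derived from the pictures), and the training ads and labels are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def ad_to_text(ad):
    # Concatenate title, description and the picture-derived text into one
    # document (hypothetical field names).
    return " ".join([ad["title"], ad["description"], ad["image_text"]])


# Made-up toy ads; real data would come from the moderation database.
train_ads = [
    {"title": "Велосипед горный", "description": "Продаю горный велосипед", "image_text": "bicycle"},
    {"title": "Велосипед детский", "description": "Детский велосипед, почти новый", "image_text": "bicycle"},
    {"title": "Bicicletă de munte", "description": "Vând bicicletă de munte", "image_text": "bicycle"},
    {"title": "Телефон Samsung", "description": "Продаю телефон, экран целый", "image_text": "cellphone"},
    {"title": "Смартфон Xiaomi", "description": "Телефон с зарядкой", "image_text": "cellphone"},
    {"title": "Telefon mobil", "description": "Vând telefon mobil", "image_text": "cellphone"},
]
train_labels = ["transport", "transport", "transport",
                "electronics", "electronics", "electronics"]

# Word unigrams + bigrams with TF-IDF; the default token pattern uses Unicode \w,
# so Cyrillic and Romanian diacritics are tokenised. Then a linear-kernel SVC.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    SVC(kernel="linear"),
)
model.fit([ad_to_text(a) for a in train_ads], train_labels)

new_ad = {"title": "Велосипед", "description": "горный велосипед", "image_text": "bicycle"}
print(model.predict([ad_to_text(new_ad)])[0])
```

Keeping the vectorizer inside the pipeline means TF-IDF statistics are refit on each training fold when the pipeline is passed to cross-validation, so validation scores are not inflated by vocabulary learned from held-out ads.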

As for determining whether the title is correct, I'm stuck.

How can I make the algorithm capture the relationship between the title, the description, and the photos? Should I use a multi-input model or FeatureUnion in scikit-learn?
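One way to let a scikit-learn model see the title and the description at the same time is `ColumnTransformer`, which applies a separate transformer to each column and concatenates the results (`FeatureUnion` does the same concatenation, but every branch receives the same whole input). The sketch below treats title correctness as binary classification and adds one explicit "do they agree" feature: the cosine similarity between the TF-IDF vectors of the title and the description. The `TitleDescriptionSimilarity` class, the column layout and the toy labels are all hypothetical.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

TITLE, DESC = 0, 1  # hypothetical column layout: [title, description]


class TitleDescriptionSimilarity(BaseEstimator, TransformerMixin):
    """Hypothetical helper. Expects a 2-column array (title, description) and
    returns one feature per ad: the cosine similarity of their TF-IDF vectors."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=object)
        # One shared vocabulary so titles and descriptions live in the same space.
        self.vectorizer_ = TfidfVectorizer().fit(list(X[:, 0]) + list(X[:, 1]))
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=object)
        titles = self.vectorizer_.transform(X[:, 0])
        descs = self.vectorizer_.transform(X[:, 1])
        # TF-IDF rows are L2-normalised, so the row-wise dot product is the cosine.
        return np.asarray(titles.multiply(descs).sum(axis=1))


features = ColumnTransformer([
    ("title_tfidf", TfidfVectorizer(), TITLE),  # words of the title alone
    ("desc_tfidf", TfidfVectorizer(), DESC),    # words of the description alone
    ("title_desc_sim", TitleDescriptionSimilarity(), [TITLE, DESC]),  # agreement
])
model = Pipeline([("features", features), ("clf", LinearSVC())])

# Made-up training data: label 1 = title matches the ad, 0 = it does not.
X = np.array([
    ["Горный велосипед", "Продаю горный велосипед, почти новый"],
    ["Телефон Samsung", "Продаю телефон Samsung, экран целый"],
    ["Bicicletă de munte", "Vând bicicletă de munte"],
    ["Срочно!!! Дешево", "Продаю горный велосипед, почти новый"],
    ["Звоните", "Продаю телефон Samsung, экран целый"],
    ["Ofertă super", "Vând bicicletă de munte"],
], dtype=object)
y = np.array([1, 1, 1, 0, 0, 0])

model.fit(X, y)
print(model.predict(X))
```

Other per-ad inputs (for example picture features) could be added as further `ColumnTransformer` entries; a multi-input neural network is the deep-learning counterpart of the same idea.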

Please point me in the right direction, or tell me what to google or which tools to use. Thank you.