ValueError: Tokenizer class MarianTokenizer does not exist or is not currently imported
I get this error when trying to run a MarianMT-based NMT model.
```
Traceback (most recent call last):
  File "/home/om/Desktop/Project/nmt-marionmt-api/inference.py", line 45, in <module>
    print(batch_inference(model_path="en-ar-model/Mark2", text=text))
  File "/home/om/Desktop/Project/nmt-marionmt-api/inference.py", line 15, in batch_inference
    tokenizer = AutoTokenizer.from_pretrained(model_path, local_file_only=True)
  File "/home/om/.virtualenvs/marianmt-api/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 525, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class MarianTokenizer does not exist or is not currently imported.
```
1 answer
-
answered 2022-05-04 10:16
Om Rastogi
Installing SentencePiece worked for me. MarianTokenizer needs the sentencepiece package, and transformers raises this ValueError when it is missing:

```
pip install sentencepiece
```
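After installing it, a minimal loading-and-translation sketch (the model path and input text are placeholders based on the question; note the keyword argument is spelled local_files_only, not local_file_only as in the traceback):

```python
from transformers import AutoTokenizer, MarianMTModel

model_path = "en-ar-model/Mark2"  # placeholder path taken from the question

tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
model = MarianMTModel.from_pretrained(model_path, local_files_only=True)

# Tokenize a batch, generate translations, and decode them back to text.
batch = tokenizer(["How are you?"], return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```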
See also questions close to this topic
-
Python File Tagging System does not retrieve nested dictionaries in dictionary
I am building a file tagging system using Python. The idea is simple: given a directory of files (and files within subdirectories), I want to filter them using a filter input and tag the matching files with a word or a phrase.
If I have the following contents in my current directory:

```
data/
    budget.xls
    world_building_budget.txt
a.txt
b.exe
hello_world.dat
world_builder.spec
```
and I execute the following command in the shell:
```
py -3 tag_tool.py -filter=world -tag="World-Building Tool"
```
My output will be:
```
These files were tagged with "World-Building Tool":
data/world_building_budget.txt
hello_world.dat
world_builder.spec
```
My current output isn't exactly like this, but basically I am converting all files (and files within subdirectories) into a single dictionary like this:

```python
def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree
```
Right now, my dictionary maps every file name to an empty string, like key: ''. In the following function, I am turning the empty values ('') into empty lists (to hold my tags):

```python
def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)
```
When I run my entire code, this is my output:

```
hello_world.dat ['World-Building Tool']
world_builder.spec ['World-Building Tool']
```

But it does not see data/world_building_budget.txt. This is the full dictionary:

```python
{'data': {'world_building_budget.txt': []}, 'a.txt': [], 'hello_world.dat': [], 'b.exe': [], 'world_builder.spec': []}
```
This is my full code:
```python
import os, argparse

def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)

parser = argparse.ArgumentParser(description="Just an example",
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--filter", action="store", help="keyword to filter files")
parser.add_argument("--tag", action="store", help="a tag phrase to attach to a file")
parser.add_argument("--get_tagged", action="store", help="retrieve files matching an existing tag")
args = parser.parse_args()

filter = args.filter
tag = args.tag
get_tagged = args.get_tagged

current_dir = os.getcwd()
files_dict = fs_tree_to_dict(current_dir)
empty_str_to_list(files_dict)

for k, v in files_dict.items():
    if filter in k:
        if v == []:
            v.append(tag)
        print(k, v)
    elif isinstance(v, dict):
        empty_str_to_list(v)
        if get_tagged in v:
            print(k, v)
```
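One way to make the loop reach nested entries, sketched as an assumption on top of the code above (the helper name tag_files and the path joining are mine, not part of the question):

```python
import os

def tag_files(d, filter_word, tag_phrase, prefix=""):
    # Recursively walk the nested dictionary so files inside
    # subdirectories such as data/ are matched and tagged too.
    for name, value in d.items():
        path = os.path.join(prefix, name) if prefix else name
        if isinstance(value, dict):
            # A subdirectory: descend into its nested dictionary.
            tag_files(value, filter_word, tag_phrase, path)
        elif filter_word in name:
            # A file whose name matches the filter: attach the tag once.
            if tag_phrase not in value:
                value.append(tag_phrase)
            print(path, value)

# Usage with the variables above: tag_files(files_dict, filter, tag)
```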
-
Actually, I am working on a project and it is showing "No module named pip_internal". Please help me with this.
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\Scripts\pip.exe\__main__.py", line 4, in <module> File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_internal\__init__.py", line 4, in <module> from pip_internal.utils import _log
I am using PyCharm with a conda interpreter.
-
Looping the function if the input is not a string
I'm new to Python (first of all). I have a homework assignment to write a function that checks whether an item exists in a dictionary:

```python
inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}

def check_item():
    x = input("Enter the fruit's name: ")
    if not x.isalpha():
        print("Error! You need to type the name of the fruit")
    elif x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x], "KG")
    else:
        print("Fruit not found")

check_item()
```
I want the function to loop again only if the input is not a valid (alphabetic) string. I've tried putting return under print("Error! You need to type the name of the fruit"), but that didn't work. Help!
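A minimal sketch of one way to do that, assuming "not a string" means "fails isalpha()": keep prompting in a while loop until the input is alphabetic, then run the lookup once.

```python
inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}

def check_item():
    # Re-prompt until the input is purely alphabetic.
    while True:
        x = input("Enter the fruit's name: ")
        if x.isalpha():
            break
        print("Error! You need to type the name of the fruit")
    if x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x], "KG")
    else:
        print("Fruit not found")

check_item()
```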
-
Converting h5 to tflite
I have been trying to convert this zero-shot text classification model, joeddav/xlm-roberta-large-xnli (https://huggingface.co/joeddav/xlm-roberta-large-xnli), from an h5 file to a tflite file, but this error pops up and I can't find it described online:

```
AttributeError: 'T5ForConditionalGeneration' object has no attribute 'call'
```

How is it fixed? If it can't be, is there another zero-shot text classifier I could use that would give similar accuracy even after becoming tflite?
I have been trying a few different tutorials, and my current Google Colab file is an amalgam of a couple of them: https://colab.research.google.com/drive/1sYQJqvhM_KEvMt2IP15d8Ud9L-ApiYv6?usp=sharing
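A hedged guess at the conversion path, not a verified fix: the AttributeError suggests a PyTorch model object (which has no Keras call method) was handed to the converter, so explicitly loading the TensorFlow class may help. The from_pt=True flag and whether a model this large converts cleanly are assumptions to verify.

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# Load the TF/Keras variant; from_pt=True converts PyTorch weights
# when the checkpoint has no native TensorFlow weights.
model = TFAutoModelForSequenceClassification.from_pretrained(
    "joeddav/xlm-roberta-large-xnli", from_pt=True)

# Standard Keras-to-TFLite conversion path.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # built-in TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF ops transformers often need
]
tflite_model = converter.convert()

with open("xlm-roberta-large-xnli.tflite", "wb") as f:
    f.write(tflite_model)
```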
Edit: Clarity
-
Which EC2 instance types can an AWS Sagemaker Huggingface estimator be trained on?
I'm using the Sagemaker Huggingface SDK v2.77 to deploy a Sagemaker training job. The Sagemaker compiler config attribute,
sagemaker.huggingface.TrainingCompilerConfig.SUPPORTED_INSTANCE_CLASS_PREFIXES
suggests that only "p3", "g4dn" and "p4" instance types can be used to train a Huggingface model on Sagemaker. To confirm, which instance types can be used to train a Huggingface model on Sagemaker?
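As a side note, the attribute belongs to the training compiler config, so it may only constrain compiled training jobs rather than Huggingface training in general. A small sketch of how such a prefix list can be checked against a concrete instance type (the helper below is hypothetical, not a Sagemaker API; only the prefix values come from the question):

```python
# Hypothetical helper, not part of the Sagemaker SDK.
SUPPORTED_INSTANCE_CLASS_PREFIXES = ["p3", "g4dn", "p4"]  # values from the question

def compiler_supports(instance_type: str) -> bool:
    # Sagemaker instance types look like "ml.<class>.<size>", e.g. "ml.p3.2xlarge".
    instance_class = instance_type.split(".")[1]
    return any(instance_class.startswith(p) for p in SUPPORTED_INSTANCE_CLASS_PREFIXES)

print(compiler_supports("ml.p3.2xlarge"))  # True
print(compiler_supports("ml.m5.xlarge"))   # False
```
-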
NMT LSTM gives an incorrect response, and a big loss
I am writing a neural network for translating texts from Russian to English, but I ran into the problem that my network gives a big loss and answers that are very far from correct. Below is the LSTM model I build using Keras:

```python
# Imports added for completeness (assuming tf.keras; the question omitted them).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, RepeatVector, Dense
from tensorflow.keras import optimizers

def make_model(in_vocab, out_vocab, in_timesteps, out_timesteps, n):
    model = Sequential()
    model.add(Embedding(in_vocab, n, input_length=in_timesteps, mask_zero=True))
    model.add(LSTM(n))
    model.add(Dropout(0.3))
    model.add(RepeatVector(out_timesteps))
    model.add(LSTM(n, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(Dense(out_vocab, activation='softmax'))
    model.compile(optimizer=optimizers.RMSprop(lr=0.001),
                  loss='sparse_categorical_crossentropy')
    return model
```
The training run looks like this:

```
Epoch 1/10
3/3 [==============================] - 5s 1s/step - loss: 8.3635 - accuracy: 0.0197 - val_loss: 8.0575 - val_accuracy: 0.0563
Epoch 2/10
3/3 [==============================] - 2s 806ms/step - loss: 7.9505 - accuracy: 0.0334 - val_loss: 8.2927 - val_accuracy: 0.0743
Epoch 3/10
3/3 [==============================] - 2s 812ms/step - loss: 7.7977 - accuracy: 0.0349 - val_loss: 8.2959 - val_accuracy: 0.0571
Epoch 4/10
3/3 [==============================] - 3s 825ms/step - loss: 7.6700 - accuracy: 0.0389 - val_loss: 8.5628 - val_accuracy: 0.0751
Epoch 5/10
3/3 [==============================] - 3s 829ms/step - loss: 7.5595 - accuracy: 0.0411 - val_loss: 8.5854 - val_accuracy: 0.0743
Epoch 6/10
3/3 [==============================] - 3s 807ms/step - loss: 7.4604 - accuracy: 0.0406 - val_loss: 8.7633 - val_accuracy: 0.0743
Epoch 7/10
3/3 [==============================] - 2s 815ms/step - loss: 7.3475 - accuracy: 0.0436 - val_loss: 8.9103 - val_accuracy: 0.0743
Epoch 8/10
3/3 [==============================] - 3s 825ms/step - loss: 7.2548 - accuracy: 0.0455 - val_loss: 9.0493 - val_accuracy: 0.0721
Epoch 9/10
3/3 [==============================] - 2s 814ms/step - loss: 7.1751 - accuracy: 0.0449 - val_loss: 9.0740 - val_accuracy: 0.0788
Epoch 10/10
3/3 [==============================] - 3s 831ms/step - loss: 7.1132 - accuracy: 0.0479 - val_loss: 9.2443 - val_accuracy: 0.0773
```
And these are the parameters I pass for training:

```python
model = make_model(
    russian_vocab_size,           # the sizes of the tokenized vocabularies
    english_vocab_size,
    max_russian_sequence_length,  # maximum sentence lengths
    max_english_sequence_length,
    512)
model.fit(preproc_russian_sentences,  # all tokenized Russian sentences, shape (X, Y)
          preproc_english_sentences,  # all tokenized English sentences, shape (X, Y, 1)
          epochs=10,
          batch_size=1024,
          validation_split=0.2,
          callbacks=None,
          verbose=1)
```
Thank you in advance.
-
What are the differences between BLEU score and METEOR?
I am trying to understand the concept of evaluating machine translation with evaluation scores.
I understand what the BLEU score is trying to achieve: it looks at different n-grams (BLEU-1, BLEU-2, BLEU-3, BLEU-4) and tries to match them against the human-written reference translation.
However, I can't really understand what the METEOR score does to evaluate MT quality. I am trying to understand the rationale intuitively. I have already looked at different blog posts but can't really figure it out.
How are these two evaluation metrics different, and how are they relevant to each other?
Can anyone please help?
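A small illustration of the difference, sketched under the assumption of a recent nltk with the WordNet data downloaded: BLEU counts only exact n-gram overlap and is precision-oriented, while METEOR aligns unigrams using stems and synonyms and mixes in recall, so a paraphrased candidate typically scores higher under METEOR than under BLEU.

```python
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")  # METEOR's synonym matching relies on WordNet

reference = "the cat sat on the mat".split()
candidate = "a cat was sitting on the mat".split()

# BLEU: exact n-gram precision (smoothed, since short sentences
# often have no 3- or 4-gram matches at all).
bleu = sentence_bleu([reference], candidate,
                     smoothing_function=SmoothingFunction().method1)

# METEOR: unigram alignment with stem/synonym matching plus a recall term.
meteor = meteor_score([reference], candidate)

print(f"BLEU:   {bleu:.3f}")
print(f"METEOR: {meteor:.3f}")
```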