Make a self-learning custom NER model in spaCy
I have built a custom NER model in spaCy. Although it works for my specific stock-market sector, the model's accuracy is very poor, and training takes around 1.5 hours.
Is there any way I can add a few new datasets to the model each day, so that it keeps on learning? Otherwise it is just a static NER model.
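One option would be to keep the trained pipeline on disk and resume training on it with each day's new annotations instead of retraining from scratch. Below is a minimal sketch for spaCy 3.x; the model path and the example sentence are placeholders, not something from the original project:

import random
import spacy
from spacy.training import Example

nlp = spacy.load("path/to/your/custom_model")       # previously trained pipeline

# Today's new annotations in the usual (text, {"entities": [(start, end, label)]}) form.
new_data = [
    ("Infosys shares rose 3% today.", {"entities": [(0, 7, "ORG")]}),
]

optimizer = nlp.resume_training()                    # keep the existing weights
with nlp.select_pipes(enable="ner"):                 # only update the NER component
    for _ in range(10):                              # a few passes over the small batch
        random.shuffle(new_data)
        losses = {}
        for text, annotations in new_data:
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], sgd=optimizer, losses=losses)
        print(losses)

nlp.to_disk("path/to/your/custom_model")             # tomorrow's run continues from here

Be aware that updating only on the new examples tends to cause catastrophic forgetting, so it usually helps to mix a sample of the older training data into each daily batch.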
See also questions close to this topic
- MDCEV model estimation - all observations have zero probability at starting value for model component
I am running an MDCEV model on a location choice dataset. At first I ran into the error "Log-likelihood calculation fails at values close to the starting values!", so I changed some starting values, and now I am getting this error:
Error in if (all(testL == 0)) stop("All observations have zero probability at starting value for model component "", : missing value where TRUE/FALSE needed
Could anyone give me some suggestions? Much appreciated!
And here is my code:
rm(list = ls())
library(apollo)

apollo_initialise()

apollo_control = list(
  modelName       = "MDCEV_no_outside_good",
  modelDescr      = "MDCEV model on housing supply data, alpha-gamma profile, no outside good and constants only in utilities",
  indivID         = "indivID",
  outputDirectory = "output"
)

database = read.csv("Project_MDCEV.csv", header = TRUE)
alt      = read.csv("alternatives.csv", header = TRUE)
attach(database)
View(database)

apollo_beta  = c(alpha_base = 10, gamma_gen = -1, delta_acar = 100, sig = 1)
apollo_fixed = c("sig")

apollo_inputs = apollo_validateInputs()

apollo_probabilities = function(apollo_beta, apollo_inputs, functionality = "estimate"){
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))

  P = list()

  alternatives = as.character(alt$DAUID)

  avail = list()
  for(i in alt$DAUID){
    avail[paste0(i)] = 1
  }

  continuousChoice = list()
  for(i in alt$DAUID){
    continuousChoice[[paste0(i)]] = get(paste0("X", i))
  }

  V = list()
  for(i in alt$DAUID){
    V[[paste0(i)]] = delta_acar * alt$ACAR[which(alt$DAUID == paste0(i))]
  }

  alpha = list()
  for(i in alt$DAUID){
    alpha[paste0(i)] = 1 / (1 + exp(-alpha_base))
  }

  gamma = list()
  for(i in alt$DAUID){
    gamma[paste0(i)] = gamma_gen
  }

  cost = list()
  for(i in alt$DAUID){
    cost[paste0(i)] = 1
  }

  budget <- budget_cal

  mdcev_settings <- list(alternatives     = alternatives,
                         avail            = avail,
                         continuousChoice = continuousChoice,
                         utilities        = V,
                         alpha            = alpha,
                         gamma            = gamma,
                         sigma            = sig,
                         cost             = cost,
                         budget           = budget)

  P[["model"]] = apollo_mdcev(mdcev_settings, functionality)
  P = apollo_panelProd(P, apollo_inputs, functionality)
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)
- Speed differences between QStandardItemModel and QAbstractTableModel?
Can anyone explain the following? I have two scripts for loading a pandas DataFrame into a table view that has a filter field. The one using the standard model loads the data in the "init" section; with this one everything is blazing fast, including the filtering. The second script works much slower, but with that one I can set the background color of the cells, which I need. These are the scripts:
import timeit
import pandas as pd
from PyQt5 import QtGui
from PyQt5 import QtWidgets
from PyQt5.QtCore import *
from PyQt5.QtCore import QAbstractTableModel
from PyQt5.QtCore import Qt, QSortFilterProxyModel
from PyQt5.QtGui import *
from PyQt5.uic import loadUi


class PandasTableModel(QtGui.QStandardItemModel):
    def __init__(self, data, parent=None):
        QtGui.QStandardItemModel.__init__(self, parent)
        self._data = data
        for col in data.columns:
            data_col = [QtGui.QStandardItem("{}".format(x)) for x in data[col].values]
            self.appendColumn(data_col)
        return

    def rowCount(self, parent=None):
        return len(self._data.values)

    def columnCount(self, parent=None):
        return self._data.columns.size

    def headerData(self, x, orientation, role):
        if orientation == Qt.Horizontal and role == Qt.DisplayRole:
            return self._data.columns[x]
        if orientation == Qt.Vertical and role == Qt.DisplayRole:
            return self._data.index[x]

    def flags(self, index):
        if not index.isValid():
            return Qt.ItemIsEnabled
        return super().flags(index) | Qt.ItemIsEditable  # add editable flag.

    def setData(self, index, value, role):
        if role == Qt.EditRole:
            # Set the value into the frame.
            self._data.iloc[index.row(), index.column()] = value
            return True
        return False


class TableViewer(QtWidgets.QMainWindow):
    def __init__(self):
        super(TableViewer, self).__init__()
        self.ui = loadUi("QTableViewForm.ui", self)
        self.ui.cmdRun1.clicked.connect(self.RunFunction1)
        self.ui.cmdRun2.clicked.connect(self.RunFunction2)
        self.ui.inputFilter.textChanged.connect(self.SetFilteredView)
        self.showdata()

    def showdata(self):
        start = timeit.default_timer()
        print("Start LoadData")
        data = pd.read_pickle("productdata.pkl")
        self.model = PandasTableModel(data)
        self.ui.tableData.setModel(self.model)
        self.proxy_model = QSortFilterProxyModel()
        self.proxy_model.setFilterKeyColumn(-1)  # Search all columns.
        self.proxy_model.setSourceModel(self.model)
        self.proxy_model.sort(0, Qt.AscendingOrder)
        self.proxy_model.setFilterCaseSensitivity(False)
        self.ui.tableData.setModel(self.proxy_model)
        print("Stop LoadData")
        end = timeit.default_timer()
        print("Process Time: ", (end - start))

    def set_cell_color(self, row, column):
        self.model.change_color(row, column, QBrush(Qt.red))

    def RunFunction1(self):
        start = timeit.default_timer()
        print("Start RunFunction1")
        # The whole row in red
        colums = self.proxy_model.columnCount()
        for c in range(colums):
            self.set_cell_color(3, c)
        print("Stop RunFunction1")
        end = timeit.default_timer()
        print("Process Time: ", (end - start))

    def RunFunction2(self):
        start = timeit.default_timer()
        print("Start RunFunction1")
        # The whole row in red
        colums = self.proxy_model.columnCount()
        for c in range(colums):
            self.set_cell_color(3, c)
        print("Stop RunFunction1")
        end = timeit.default_timer()
        print("Process Time: ", (end - start))

    def SetFilteredView(self):
        # print("Start set_filter")
        filter_text = self.ui.inputFilter.text()
        self.proxy_model.setFilterFixedString(filter_text)
        filter_result = self.proxy_model.rowCount()
        self.ui.lblResult.setText("(" + str(filter_result) + " records)")


if __name__ == "__main__":
    import sys
    app = QtWidgets.QApplication(sys.argv)
    win = TableViewer()
    win.show()
    sys.exit(app.exec_())
And the slow one:
import timeit
import pandas as pd
from PyQt5 import QtGui
from PyQt5 import QtWidgets
from PyQt5.QtCore import *
from PyQt5.QtCore import QAbstractTableModel
from PyQt5.QtCore import Qt, QSortFilterProxyModel
from PyQt5.QtGui import *
from PyQt5.uic import loadUi


class PandasTableModel(QAbstractTableModel):
    def __init__(self, data, parent=None):
        QAbstractItemModel.__init__(self, parent)
        self._data = data
        self.colors = dict()

    def rowCount(self, parent=None):
        return self._data.index.size

    def columnCount(self, parent=None):
        return self._data.columns.size

    def setData(self, index, value, role):
        if role == Qt.EditRole:
            # Set the value into the frame.
            self._data.iloc[index.row(), index.column()] = value
            return True

    def data(self, index, role=Qt.DisplayRole):
        if index.isValid():
            if role == Qt.DisplayRole:
                return str(self._data.iloc[index.row(), index.column()])
            if role == Qt.EditRole:
                return str(self._data.iloc[index.row(), index.column()])
            if role == Qt.BackgroundRole:
                color = self.colors.get((index.row(), index.column()))
                if color is not None:
                    return color
        return None

    def headerData(self, rowcol, orientation, role):
        if orientation == Qt.Horizontal and role == Qt.DisplayRole:
            return self._data.columns[rowcol]
        if orientation == Qt.Vertical and role == Qt.DisplayRole:
            return self._data.index[rowcol]
        return None

    def change_color(self, row, column, color):
        ix = self.index(row, column)
        self.colors[(row, column)] = color
        self.dataChanged.emit(ix, ix, (Qt.BackgroundRole,))


class TableViewer(QtWidgets.QMainWindow):
    def __init__(self):
        super(TableViewer, self).__init__()
        self.ui = loadUi("QTableViewForm.ui", self)
        self.ui.cmdRun1.clicked.connect(self.RunFunction1)
        self.ui.cmdRun2.clicked.connect(self.RunFunction2)
        self.ui.inputFilter.textChanged.connect(self.SetFilteredView)
        self.showdata()

    def showdata(self):
        start = timeit.default_timer()
        print("Start LoadData")
        data = pd.read_pickle("productdata.pkl")
        self.model = PandasTableModel(data)
        self.ui.tableData.setModel(self.model)
        self.proxy_model = QSortFilterProxyModel()
        self.proxy_model.setFilterKeyColumn(-1)  # Search all columns.
        self.proxy_model.setSourceModel(self.model)
        self.proxy_model.sort(0, Qt.AscendingOrder)
        self.proxy_model.setFilterCaseSensitivity(False)
        self.ui.tableData.setModel(self.proxy_model)
        print("Stop LoadData")
        end = timeit.default_timer()
        print("Process Time: ", (end - start))

    def set_cell_color(self, row, column):
        self.model.change_color(row, column, QBrush(Qt.red))

    def RunFunction1(self):
        start = timeit.default_timer()
        print("Start RunFunction1")
        # The whole row in red
        colums = self.proxy_model.columnCount()
        for c in range(colums):
            self.set_cell_color(3, c)
        print("Stop RunFunction1")
        end = timeit.default_timer()
        print("Process Time: ", (end - start))

    def RunFunction2(self):
        start = timeit.default_timer()
        print("Start RunFunction1")
        # The whole row in red
        colums = self.proxy_model.columnCount()
        for c in range(colums):
            self.set_cell_color(3, c)
        print("Stop RunFunction1")
        end = timeit.default_timer()
        print("Process Time: ", (end - start))

    def SetFilteredView(self):
        # print("Start set_filter")
        filter_text = self.ui.inputFilter.text()
        self.proxy_model.setFilterFixedString(filter_text)
        filter_result = self.proxy_model.rowCount()
        self.ui.lblResult.setText("(" + str(filter_result) + " records)")


if __name__ == "__main__":
    import sys
    app = QtWidgets.QApplication(sys.argv)
    win = TableViewer()
    win.show()
    sys.exit(app.exec_())
(I'm loading 2,000 rows and 35 columns.)
Can I have a fast one with the background color function in it?
Cheers, Johnson
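In case it helps, here is one possible direction (a sketch under my own assumptions, not code from the scripts above): keep the fast QStandardItemModel variant and colour cells through the items themselves, since QStandardItem already stores its own background brush and no custom data() override is needed.

from PyQt5 import QtGui
from PyQt5.QtCore import Qt


class ColorablePandasTableModel(QtGui.QStandardItemModel):
    def __init__(self, data, parent=None):
        super().__init__(parent)
        self._data = data
        for col in data.columns:
            self.appendColumn([QtGui.QStandardItem(str(x)) for x in data[col].values])

    def change_color(self, row, column, brush):
        # The item stores the brush itself, so the view repaints automatically.
        item = self.item(row, column)
        if item is not None:
            item.setBackground(brush)

With that helper in place, set_cell_color could keep calling self.model.change_color(row, column, QBrush(Qt.red)) exactly as in the slow script.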
- Blender render color different from the actual one
- Issue with correct python package for a pip installation
I want to install spaCy in Jupyter, in WSL. Initially I tried "pip install -U spacy", but got an error message that the spacy module was not found. After doing some research I understood that the Python interpreter used for the pip installation must be the same as the one used by the Jupyter kernel. Hence, as explained below, I tried "python2.7 -m pip install spacy", but now I get the error message "pip module not found", which leaves me confused.
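One common way around this kind of mismatch (a general sketch, not specific to this WSL setup) is to run pip through the exact interpreter the Jupyter kernel is using, for example from inside a notebook cell:

# Install spaCy into the same Python interpreter that the Jupyter kernel runs on;
# sys.executable points at that interpreter, so pip cannot target the wrong one.
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "-U", "spacy"])

The shell form !{sys.executable} -m pip install -U spacy does the same thing in a notebook. If -m pip then complains that pip is missing for that interpreter, pip has to be installed for it first (e.g. with the ensurepip module). Also note that current spaCy releases require Python 3, so pointing at python2.7 will not work even once pip is available.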
- Extracting time dependency/relationships using Spacy NLP
I am currently working on a spaCy NLP project and am stuck. I am trying to determine which aspect of the text a time entity is modifying or relates to. For example:
"I have been playing tennis for three years, and football for five years"
From this sentence I would want to be able to link the "three years" (Time Entity) to tennis (Sport Entity), and the "five years" (Time Entity) to football (Sport Entity).
Is there a functionality that would allow me to do this/determine these relationships?
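There is no single built-in call for this, but the dependency parse is usually enough to prototype it. Below is a rough sketch of one possible approach (my own assumption, and the output depends on how the parser happens to analyse the sentence): climb from each time entity's syntactic head to a content word and, if that word is the verb, prefer its direct object.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I have been playing tennis for three years, and football for five years")

for ent in doc.ents:
    if ent.label_ in ("DATE", "TIME"):
        # Climb the dependency tree from the entity until we reach a content word.
        target = ent.root.head
        while target.pos_ not in ("NOUN", "PROPN", "VERB") and target.head is not target:
            target = target.head
        # If we stopped on the verb, prefer its direct object (the sport being played).
        if target.pos_ == "VERB":
            objs = [child for child in target.children if child.dep_ in ("dobj", "obj")]
            if objs:
                target = objs[0]
        print(ent.text, "->", target.text)

A dedicated relation-extraction component (or a rule layer on top of the parse) would be the more robust version of the same idea.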
- Add custom NER to Spacy 3 pipeline
I am trying to build a custom spaCy pipeline based on the en_core_web_sm pipeline. From what I can tell, the NER component has been added correctly, as it is displayed in the pipe names when printed (see below). For some reason, when the model is tested on text I do not get any results, but when the custom NER is used by itself the correct entities are extracted and labelled. I am using spaCy 3.0.8 and the en_core_web_sm pipeline 3.0.0.
import spacy

crypto_nlp = spacy.load('model-best')
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('ner', source=crypto_nlp, name="crypto_ner", before="ner")
print(nlp.pipe_names)

text = 'Ethereum'
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
Output: '['tok2vec', 'tagger', 'parser', 'crypto_ner', 'ner', 'attribute_ruler', 'lemmatizer']'
But when I use my ner model:
doc = crypto_nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
Output: 'Ethereum ETH'
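One thing to check: a component added with source= that was trained with a tok2vec listener ends up listening to the target pipeline's tok2vec, which it was never trained with, and that can make it predict nothing (spaCy's replace_listeners exists for this situation). As a quick sanity check, one workaround (a sketch of my own, not the asker's code) is to run the two pipelines separately and merge the entity spans afterwards:

import spacy
from spacy.util import filter_spans

crypto_nlp = spacy.load("model-best")        # custom crypto NER
nlp = spacy.load("en_core_web_sm")

text = "Ethereum rallied while Apple stock fell."
doc = nlp(text)
crypto_doc = crypto_nlp(text)

# Copy the crypto entities onto the en_core_web_sm doc, dropping overlaps.
spans = list(doc.ents) + [
    doc.char_span(e.start_char, e.end_char, label=e.label_)
    for e in crypto_doc.ents
]
doc.ents = filter_spans([s for s in spans if s is not None])

for ent in doc.ents:
    print(ent.text, ent.label_)

If the sourced-component route is preferred, it is worth checking whether the custom model was trained with its own internal tok2vec or with a listener, since that determines whether crypto_ner still has working features inside the combined pipeline.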
- How to mask [PAD] and [SEP] tokens to prevent their prediction and loss calculation for NER task on BERT models?
I am trying to fine-tune a BERT model for an NER tagging task using the TensorFlow official NLP toolkit. I found there is already a BERT token-classifier class that I wanted to use. Looking at the code inside, I don't see any masking that prevents tag prediction and loss calculation for padding and the [SEP] token. I think this prevention is possible, I just don't know how. I would like to prevent it for faster training, and one blog also mentioned some odd behaviour when these tokens are not masked.
Does anybody have any idea about this?
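I don't know the Model Garden internals well enough to point at a ready-made flag, but the usual Keras-level trick is to zero the loss at the positions you want ignored. A minimal sketch (my own assumption, using a plain Keras loss rather than the toolkit's classes) that skips every position whose label id is in an ignore set:

import tensorflow as tf


def masked_token_loss(y_true, y_pred, ignore_ids=(0,)):
    # Cross-entropy over (batch, seq_len) token labels, ignoring every position
    # whose label id is in ignore_ids (e.g. the ids assigned to [PAD] and [SEP]).
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
    per_token = loss_fn(y_true, y_pred)                    # shape (batch, seq_len)
    mask = tf.ones_like(per_token)
    for ignore_id in ignore_ids:
        mask *= tf.cast(tf.not_equal(y_true, ignore_id), per_token.dtype)
    return tf.reduce_sum(per_token * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)

The same mask can instead be passed as sample_weight in model.fit / model.compile-style training; either way the [PAD] and [SEP] positions stop contributing to the gradient, which is what prevents the model from learning to predict tags for them.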