What approaches are there for abstractive summarization other than seq2seq?
I'm researching abstractive text summarization and have come across many recent papers. They all seem to focus on sequence-to-sequence models based on RNNs. Apart from RNNs, what other approaches are there for abstractive text summarization? Does ontology-based summarization revolve around the same seq2seq model?
Most of the material I have come across consists of research papers on this subject; what better sources are there for understanding the underlying concepts of abstractive summarization?
See also questions close to this topic
How to remove unwanted phrases (names) before capitalized words (locations)
As a text-analysis preprocessing step, I want to remove all the words (names and noise) that come before capitalized words (e.g., locations). In this case, all phrases before FLORHAM PARK NJ and BROOKLYN NY should be removed.
Input:

Dana Terranova Yannetelli on behalf of Rachael Isola FLORHAM PARK NJ Dear FriendsI am starting this fund on behalf of my best friend's son Christian Isola. Christian was diagnosed with T-Cell Acute Lymphoblastic Leukemia on April 27th just a few days prior to his second birthday.

Cynthia Marie Ocasio BROOKLYN NY My brother Christopher Ocasio was diagnosed with Non-Hodgkin's Lymphoma.

Desired output:

FLORHAM PARK NJ Dear FriendsI am starting this fund on behalf of my best friend's son Christian Isola. Christian was diagnosed with T-Cell Acute Lymphoblastic Leukemia on April 27th just a few days prior to his second birthday.

BROOKLYN NY My brother Christopher Ocasio was diagnosed with Non-Hodgkin's Lymphoma.
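One way to approach this (a hedged sketch, not a general NER solution: it assumes the location is the first run of two or more consecutive ALL-CAPS words, as in the samples above) is a regular expression that finds that run and drops everything before it:

```python
import re

# Sketch: treat the first run of two or more consecutive ALL-CAPS words
# (e.g. "FLORHAM PARK NJ", "BROOKLYN NY") as the location marker and
# drop everything before it. Texts with no such run are left unchanged.
def strip_leading_names(text):
    match = re.search(r'\b(?:[A-Z]{2,}\s+)+[A-Z]{2,}\b', text)
    return text[match.start():] if match else text

print(strip_leading_names(
    "Cynthia Marie Ocasio BROOKLYN NY My brother Christopher Ocasio "
    "was diagnosed with Non-Hodgkin's Lymphoma."))
# → "BROOKLYN NY My brother Christopher Ocasio was diagnosed with Non-Hodgkin's Lymphoma."
```

For messier data, a named-entity recognizer (e.g. spaCy's GPE/PERSON labels) would be more robust than a casing heuristic.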
How to find bi-grams which include pre-defined words?
I know it is possible to find bigrams which contain a particular word, as in the following example:
```
finder = BigramCollocationFinder.from_words(text.split())
word_filter = lambda w1, w2: "man" not in (w1, w2)
finder.apply_ngram_filter(word_filter)
bigram_measures = nltk.collocations.BigramAssocMeasures()
raw_freq_ranking = finder.nbest(bigram_measures.raw_freq, 10)  # top-10
```
But I am not sure how this can be applied when I need bigrams in which both words are pre-defined. For example, given the sentence:

"hello, yesterday I have seen a man walking. On the other side there was another man yelling: "who are you, man?"
Given a list:
```
["yesterday", "I", "other", "side"]
```

how can I get a list of bi-grams made of the given words, i.e.:

```
[("yesterday", "I"), ("other", "side")]
```
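A plain-Python sketch of the desired filtering (no NLTK required): pair each token with its successor and keep only the bigrams whose two words are both in the pre-defined set.

```python
text = ('hello, yesterday I have seen a man walking. On the other side '
        'there was another man yelling: "who are you, man?"')
wanted = {"yesterday", "I", "other", "side"}

tokens = text.split()
# Keep only the bigrams in which BOTH words appear in the pre-defined set.
result = [(w1, w2) for w1, w2 in zip(tokens, tokens[1:])
          if w1 in wanted and w2 in wanted]
print(result)  # → [('yesterday', 'I'), ('other', 'side')]
```

With the NLTK finder above, the same effect should come from `finder.apply_ngram_filter(lambda w1, w2: not (w1 in wanted and w2 in wanted))`, since `apply_ngram_filter` removes the bigrams for which the function returns True.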
Embedding vs inserting word vectors directly to input layer
I used gensim to build a word2vec embedding of my corpus. Currently I'm converting my (padded) input sentences to word vectors with the gensim model. These vectors are used as input for the model.
```
model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(MAX_SEQUENCE_LENGTH, dim)))
model.add(Bidirectional(
    LSTM(num_lstm, dropout=0.5, recurrent_dropout=0.4, return_sequences=True))
)
...
model.fit(training_sentences_vectors, training_labels, validation_data=validation_data)
```
Are there any drawbacks using the word vectors directly without a keras embedding layer?
I'm also currently adding additional (one-hot encoded) tags to the input tokens by concatenating them to each word vector, does this approach make sense?
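For comparison, the usual alternative is an Embedding layer initialized from the gensim vectors. A minimal sketch (names are illustrative; `vectors` stands for a trained gensim model's `model.wv` mapping, `word_index` for a token-to-id dictionary):

```python
import numpy as np

def build_embedding_matrix(vectors, word_index, dim):
    """vectors: word -> vector mapping (e.g. a gensim model's `model.wv`);
    word_index: token -> integer id, with ids starting at 1 so that
    row 0 stays all-zero for the padding index."""
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix

# The Keras layer would then be built roughly as:
# Embedding(input_dim=matrix.shape[0], output_dim=dim,
#           weights=[matrix], mask_zero=True, trainable=False)
```

One commonly cited difference: with an Embedding layer the model stores integer ids rather than full vectors (smaller input arrays), and setting `trainable=True` lets the embeddings be fine-tuned for the task, which is impossible when the vectors are precomputed outside the model.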
Pytorch LSTM not using GPU
I'm trying to train a PyTorch LSTM model connected to a couple of MLP layers. The model is coded as follows:
```
class RNNBlock(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_layer=1, dropout=0):
        super(RNNBlock, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layer = num_layer
        self.lstm = nn.LSTM(in_dim, hidden_dim, num_layer, dropout)

    def forward(self, onehot, length):
        batch_size = onehot.shape
        h_in = nn.Parameter(torch.randn(self.num_layer, batch_size, self.hidden_dim))
        c_in = nn.Parameter(torch.randn(self.num_layer, batch_size, self.hidden_dim))
        packed = nn.utils.rnn.pack_padded_sequence(onehot, length, batch_first=True)
        output, (h_out, c_out) = self.lstm(packed, (h_in, c_in))
        unpacked, unpacked_length = nn.utils.rnn.pad_packed_sequence(output, batch_first=True)
        vectors = list()
        for i, vector in enumerate(unpacked):
            vectors.append(unpacked[i, unpacked_length[i]-1, :].view(1, -1))
        out = torch.cat(vectors, 0)
        return out


class Predictor(nn.Module):
    def __init__(self, in_dim, out_dim, act=None):
        super(Predictor, self).__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        nn.init.xavier_normal_(self.linear.weight)
        self.activation = act

    def forward(self, x):
        out = self.linear(x)
        if self.activation != None:
            out = self.activation(out)
        return out


class RNNNet(nn.Module):
    def __init__(self, args):
        super(RNNNet, self).__init__()
        self.rnnBlock = RNNBlock(args.in_dim, args.hidden_dim, args.num_layer, args.dropout)
        self.pred1 = Predictor(args.hidden_dim, args.pred_dim1, act=nn.ReLU())
        self.pred2 = Predictor(args.pred_dim1, args.pred_dim2, act=nn.ReLU())
        self.pred3 = Predictor(args.pred_dim2, args.out_dim)

    def forward(self, onehot, length):
        out = self.rnnBlock(onehot, length)
        out = self.pred1(out)
        out = self.pred2(out)
        out = self.pred3(out)
        return out
```
and these are my train and experiment functions:
```
def train(model, device, optimizer, criterion, data_train, bar, args):
    epoch_train_loss = 0
    epoch_train_mae = 0
    for i, batch in enumerate(data_train):
        list_onehot = torch.tensor(batch).cuda().float()
        list_length = torch.tensor(batch).cuda()
        list_logP = torch.tensor(batch).cuda().float()
        # Sort onehot tensor with respect to the sequence length.
        list_length, list_index = torch.sort(list_length, descending=True)
        list_length.cuda()
        list_index.cuda()
        list_onehot = torch.Tensor([list_onehot.tolist()[i] for i in list_index]).cuda().float()
        model.train()
        optimizer.zero_grad()
        list_pred_logP = model(list_onehot, list_length).squeeze().cuda()
        list_pred_logP.require_grad = False
        train_loss = criterion(list_pred_logP, list_logP)
        train_mae = mean_absolute_error(list_pred_logP.tolist(), list_logP.tolist())
        epoch_train_loss += train_loss.item()
        epoch_train_mae += train_mae
        train_loss.backward()
        optimizer.step()
        bar.update(len(list_onehot))
    epoch_train_loss /= len(data_train)
    epoch_train_mae /= len(data_train)
    return model, epoch_train_loss, epoch_train_mae


def experiment(dict_partition, device, bar, args):
    time_start = time.time()
    model = RNNNet(args)
    model.cuda()
    if args.optim == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.l2_coef)
    elif args.optim == 'RMSprop':
        optimizer = optim.RMSprop(model.parameters(), lr=args.lr, weight_decay=args.l2_coef)
    elif args.optim == 'SGD':
        optimizer = optim.SGD(model.parameters(), lr=args.lr, weight_decay=args.l2_coef)
    else:
        assert False, 'Undefined Optimizer Type'
    criterion = nn.MSELoss()
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=args.step_size, gamma=args.gamma)
    list_train_loss = list()
    list_val_loss = list()
    list_train_mae = list()
    list_val_mae = list()
    data_train = DataLoader(dict_partition['train'], batch_size=args.batch_size, shuffle=args.shuffle)
    data_val = DataLoader(dict_partition['val'], batch_size=args.batch_size, shuffle=args.shuffle)
    for epoch in range(args.epoch):
        scheduler.step()
        model, train_loss, train_mae = train(model, device, optimizer, criterion, data_train, bar, args)
        list_train_loss.append(train_loss)
        list_train_mae.append(train_mae)
        mode, val_loss, val_mae = validate(model, device, criterion, data_val, bar, args)
        list_val_loss.append(val_loss)
        list_val_mae.append(val_mae)
    data_test = DataLoader(dict_partition['test'], batch_size=args.batch_size, shuffle=args.shuffle)
    mae, std, logP_total, pred_logP_total = test(model, device, data_test, args)
    time_end = time.time()
    time_required = time_end - time_start
    args.list_train_loss = list_train_loss
    args.list_val_loss = list_val_loss
    args.list_train_mae = list_train_mae
    args.list_val_mae = list_val_mae
    args.logP_total = logP_total
    args.pred_logP_total = pred_logP_total
    args.mae = mae
    args.std = std
    args.time_required = time_required
    return args
```
The list_onehot and list_length tensors are loaded from the DataLoader and uploaded to the GPU. Then, to use a packed sequence as input, I've sorted both list_onehot and list_length and uploaded them to the GPU. The model was uploaded to the GPU, and the h_in and c_in tensors and the packed sequence object were also uploaded to the GPU. However, when I run this code, it does not use the GPU, only the CPU. What should I do to train this model on the GPU?
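For reference, a minimal self-contained sketch (not the model above) of the pattern that keeps an LSTM forward pass on a single device: the initial hidden states are created on the same device as the input and model, instead of as fresh CPU tensors inside `forward`.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

lstm = torch.nn.LSTM(input_size=4, hidden_size=8, num_layers=1).to(device)
x = torch.randn(5, 3, 4, device=device)   # (seq_len, batch, features)
# Initial states created on the SAME device as the input and the model:
h0 = torch.zeros(1, 3, 8, device=device)
c0 = torch.zeros(1, 3, 8, device=device)
out, (hn, cn) = lstm(x, (h0, c0))
print(out.device)  # cuda:0 when a GPU is present, otherwise cpu
```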
tensorflow keras sequence models - how to only predict output of last step
I am developing a sequence model using tf.keras.
Assume that I have 5 time steps, and each time step essentially has an output. The number of features in each time step is, say, 4. I am working on a classification problem, and the output can be one of 0, 1, or 2.
For example, following is a train example.
Step 1 input: `[0, 5, 4, 5]` and output = `0`
Step 2 input: `[1, 2, 2, 7]` and output = `1`
Step 3 input: `[7, 5, 3, 4]` and output = `0`
Step 4 input: `[4, 5, 1, 2]` and output = `1`
Step 5 input: `[8, 5, 4, 5]` and output = `2`
When training my model, I want to train it so that in step 1, if the input is `[0, 5, 4, 5]`, the output for this timestep is `0`; in step 2, if the input is `[1, 2, 2, 7]`, the output for this timestep is `1`; and so on.
But later on in production, I will only expect my model to estimate the output of the last timestep. For instance:
Step 1 input: `[0, 5, 4, 5]` and output = `0`
Step 2 input: `[1, 2, 2, 7]` and output = `1`
Step 3 input: `[7, 5, 3, 4]` and output = `0`
Step 4 input: `[4, 5, 1, 2]` and output = `1`
Step 5 input: `[8, 5, 4, 5]` and output = **`?`**
Based on this, I am a bit confused about how I should build and train my model. I am only interested in the output of the last step, but I would still like to help the model during training by providing the outputs of the steps prior to the final one. As far as I understand, if I provide the outputs of all timesteps as expected output, the loss is calculated on all of them: if the output of the 3rd timestep is wrong, the cost increases. That may be expected, but what matters to me is making a correct prediction at the last time step. How do I construct my model with tf.keras for such a case?
(Or, if I still need to train my model to estimate the output of each time step separately, I would at the end still like to compute accuracy based only on the output of the last step.)
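One framework-agnostic way to express this trade-off (an illustrative sketch, not a tf.keras API): keep a loss over all timesteps during training, but weight the final step more heavily. In tf.keras, a per-timestep weight array passed as `sample_weight` with temporal sample weighting should play the same role; `2.0` below is an arbitrary illustrative weight.

```python
import numpy as np

def weighted_timestep_loss(per_step_losses, last_step_weight=2.0):
    """Average per-timestep losses, up-weighting the final step so the
    model is nudged toward getting the last prediction right while still
    learning from the earlier steps."""
    weights = np.ones(len(per_step_losses))
    weights[-1] = last_step_weight
    return float(np.sum(weights * np.asarray(per_step_losses)) / np.sum(weights))

print(weighted_timestep_loss([1.0, 1.0, 1.0, 1.0, 3.0]))  # ≈ 1.667
```

Evaluation can then ignore the weighting entirely and score only the final step's prediction, matching the production setting.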
Any help will be highly appreciated!
Reshape data of a long time sequence for RNN
Suppose I have data of size [m,n]. The data is a continuous recording: it starts at time 0 and ends at time m, and at each time step a feature vector of size n is recorded. The problem is that it is a long recording. An LSTM expects 3D input, i.e., [batch_size, time_steps, feature_size].
My data is a continuous recording where the number of time steps can be, e.g., 4000. Now suppose I have one recording of the following size: [1, 4000, n] (where 4000 >> n; n is small).
How could I reshape this data and feed this to the RNN (this can be LSTM or GRU) so that I don't have to load the entire stream of data as one chunk into memory?
This could get even more problematic if I have multiple recordings r of size [4000,n]. Then I would have data of size [r,4000,n].
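A hedged sketch of the standard workaround: chop the long recording into fixed-length windows and yield them a batch at a time, so the full [r, 4000, n] array never has to be materialized for training (here the single recording is in memory for illustration, but the same generator pattern works reading chunks from disk; with Keras, `stateful=True` would let the LSTM state carry over between consecutive chunks).

```python
import numpy as np

def window_batches(recording, time_steps, batch_size):
    """Split one long (m, n) recording into non-overlapping windows of
    `time_steps` rows and yield them in batches of shape
    (batch_size, time_steps, n). Trailing rows that do not fill a whole
    window are dropped."""
    m, n = recording.shape
    num_windows = m // time_steps
    windows = recording[:num_windows * time_steps].reshape(num_windows, time_steps, n)
    for start in range(0, num_windows, batch_size):
        yield windows[start:start + batch_size]

recording = np.random.randn(4000, 3)           # m=4000 steps, n=3 features
batches = list(window_batches(recording, time_steps=100, batch_size=8))
print(len(batches), batches[0].shape)          # → 5 (8, 100, 3)
```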
multi-document summarization dataset
I want to work on multi-document summarization task.
DUC2007 is a multi-document summarization dataset.
What other datasets are available in the context of multi-document summarization?
Add row to dataframe with sum of within group data
I have an example dataframe below.
```
eg_data <- data.frame(
  time = c("1", "1", "2", "2"),
  type = c("long", "short", "long", "short"),
  size = c(200, 50, 500, 150))
```
I need to create rows which total the values of size, for each time period. I have looked at combinations of aggregate and by, but I cannot get it to work correctly.
An example of what I've tried:
```
rbind(eg_data,
      data.frame(time = "1 + 2", type = "long",
                 size = by(eg_data$size, eg_data$time == "long", sum)))
```
An example of what I want the final dataframe to look like:
```
eg_data <- data.frame(
  time = c("1", "1", "2", "2", "1 + 2", "1 + 2"),
  type = c("long", "short", "long", "short", "long", "short"),
  size = c(200, 50, 500, 150, 700, 200))
```
Any help is appreciated; a base R solution would be ideal.
How to evaluate an automatically generated summary against gold summaries with the ROUGE metric?
I'm working on an automatic summarization system and I want to evaluate my output summary against my gold summaries. I have multiple gold summaries of different lengths for each case, so I'm a little confused here. My question is: how should I evaluate my summary against these gold summaries? Should I evaluate mine against each gold summary and average the results, or treat the union of the gold summaries as a single gold summary and evaluate against that?
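Both conventions appear in practice; as far as I know, the original ROUGE tooling reports the best match over references, so it may be informative to compute both. A toy multi-reference ROUGE-1-recall sketch (illustrative only, not the official ROUGE implementation):

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Unigram overlap divided by reference length (ROUGE-1 recall)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

def multi_ref_rouge1(candidate, references):
    """Score the candidate against each gold summary separately and
    return both the average and the maximum over references."""
    scores = [rouge1_recall(candidate, r) for r in references]
    return sum(scores) / len(scores), max(scores)

avg, best = multi_ref_rouge1("the cat sat on the mat",
                             ["the cat sat on the mat", "a dog ran away"])
print(avg, best)  # the max rewards matching any one reference; the average is stricter
```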
Thank you in advance