What other approaches are there for abstractive summarization, other than seq2seq?
I'm researching abstractive text summarization and have come across many recent papers. They all seem to focus on sequence-to-sequence models based on RNNs. Apart from RNNs, what other approaches are there when it comes to abstractive text summarization? Does ontology-based summarization revolve around the same seq2seq model?
Most of the material I have come across consists of research papers on this subject; what better sources are there for understanding the underlying concepts of abstractive summarization?
See also questions close to this topic

How to remove unwanted phrases (names) before capitalized words (locations)
As a text-analysis preprocessing step, I want to remove all the words (names and noise) that appear before capitalized words (e.g., locations). In this case, all phrases before FLORHAM PARK NJ and BROOKLYN NY should be removed.
Inputs:
Dana Terranova Yannetelli on behalf of Rachael Isola FLORHAM PARK NJ Dear FriendsI am starting this fund on behalf of my best friend's son Christian Isola. Christian was diagnosed with T-Cell Acute Lymphoblastic Leukemia on April 27th just a few days prior to his second birthday.
and
Cynthia Marie Ocasio BROOKLYN NY My brother Christopher Ocasio was diagnosed with Non-Hodgkin's Lymphoma.
Expected results:
FLORHAM PARK NJ Dear FriendsI am starting this fund on behalf of my best friend's son Christian Isola. Christian was diagnosed with T-Cell Acute Lymphoblastic Leukemia on April 27th just a few days prior to his second birthday.
and
BROOKLYN NY My brother Christopher Ocasio was diagnosed with Non-Hodgkin's Lymphoma.
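A minimal sketch of one way to do this (assuming the location is always the first run of two or more consecutive all-caps tokens, which holds for the examples above):

```python
import re

def strip_before_location(text):
    # Assumes the location is the first run of two or more
    # consecutive ALL-CAPS words (e.g. "FLORHAM PARK NJ").
    m = re.search(r'\b[A-Z]{2,}(?:\s+[A-Z]{2,}\b)+', text)
    return text[m.start():] if m else text

s = "Cynthia Marie Ocasio BROOKLYN NY My brother Christopher Ocasio was diagnosed"
print(strip_before_location(s))
# "BROOKLYN NY My brother Christopher Ocasio was diagnosed"
```

Title-case names such as "Cynthia" do not match `[A-Z]{2,}` (two or more consecutive uppercase letters per word), so only the all-caps location run anchors the cut.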

How to find bigrams which include predefined words?
I know it is possible to find bigrams containing a particular word, as in the example from the link below:
```python
finder = BigramCollocationFinder.from_words(text.split())
word_filter = lambda w1, w2: "man" not in (w1, w2)
finder.apply_ngram_filter(word_filter)
bigram_measures = nltk.collocations.BigramAssocMeasures()
raw_freq_ranking = finder.nbest(bigram_measures.raw_freq, 10)  # top 10
```
nltk: how to get bigrams containing a specific word
But I am not sure how this can be applied if I need bigrams whose words both come from a predefined list.
Example:
My Sentence:
"hello, yesterday I have seen a man walking. On the other side there was another man yelling: "who are you, man?"
Given a list:
["yesterday", "I", "other", "side"]
How can I get a list of bigrams made of the given words, i.e. [("yesterday", "I"), ("other", "side")]?
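A minimal sketch without NLTK, keeping only the bigrams both of whose members come from the predefined list (punctuation handling is left to plain `split()`, as in the snippet above):

```python
text = ('hello, yesterday I have seen a man walking. On the other side '
        'there was another man yelling: "who are you, man?"')
allowed = {"yesterday", "I", "other", "side"}

words = text.split()
# Keep a bigram only when BOTH of its words are in the predefined list.
bigrams = [(w1, w2) for w1, w2 in zip(words, words[1:])
           if w1 in allowed and w2 in allowed]
print(bigrams)  # [('yesterday', 'I'), ('other', 'side')]
```

The same predicate, negated, can be passed to `finder.apply_ngram_filter` in NLTK, since that filter removes every bigram for which the function returns `True`.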
Embedding vs inserting word vectors directly to input layer
I used gensim to build a word2vec embedding of my corpus. Currently I'm converting my (padded) input sentences to word vectors using the gensim model. These vectors are used as input to the model.
```python
model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(MAX_SEQUENCE_LENGTH, dim)))
model.add(Bidirectional(
    LSTM(num_lstm, dropout=0.5, recurrent_dropout=0.4, return_sequences=True)
))
...
model.fit(training_sentences_vectors, training_labels, validation_data=validation_data)
```
Are there any drawbacks to using the word vectors directly, without a Keras embedding layer?
I'm also currently adding additional (one-hot encoded) tags to the input tokens by concatenating them to each word vector; does this approach make sense?
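For comparison, the common alternative is to feed integer token ids into a frozen `Embedding` layer that holds the word2vec weights. A sketch of building the weight matrix (the `w2v` dict here is a hypothetical stand-in for gensim's `model.wv`):

```python
import numpy as np

# Hypothetical stand-in for gensim's model.wv (word -> vector).
dim = 4
w2v = {"hello": np.ones(dim), "world": np.full(dim, 2.0)}

vocab = ["<pad>"] + sorted(w2v)                  # index 0 reserved for masking/padding
word_index = {w: i for i, w in enumerate(vocab)}

matrix = np.zeros((len(vocab), dim))             # row i = vector of word i
for w, i in word_index.items():
    if w in w2v:
        matrix[i] = w2v[w]

# The matrix then becomes the frozen weights of an Embedding layer, e.g.:
# Embedding(len(vocab), dim, weights=[matrix], mask_zero=True, trainable=False)
```

One practical difference is input size: integer ids need one int per token, while pre-converted vectors need `dim` floats per token, which inflates the arrays passed to `fit`. The frozen-layer route is otherwise equivalent.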

Pytorch LSTM not using GPU
I'm trying to train a PyTorch LSTM model connected to a couple of MLP layers. The model is coded as follows:
```python
class RNNBlock(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_layer=1, dropout=0):
        super(RNNBlock, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layer = num_layer
        self.lstm = nn.LSTM(in_dim, hidden_dim, num_layer, dropout)

    def forward(self, onehot, length):
        batch_size = onehot.shape[0]
        h_in = nn.Parameter(torch.randn(self.num_layer, batch_size, self.hidden_dim))
        c_in = nn.Parameter(torch.randn(self.num_layer, batch_size, self.hidden_dim))
        packed = nn.utils.rnn.pack_padded_sequence(onehot, length, batch_first=True)
        output, (h_out, c_out) = self.lstm(packed, (h_in, c_in))
        unpacked, unpacked_length = nn.utils.rnn.pad_packed_sequence(output, batch_first=True)
        vectors = list()
        for i, vector in enumerate(unpacked):
            vectors.append(unpacked[i, unpacked_length[i] - 1, :].view(1, -1))
        out = torch.cat(vectors, 0)
        return out


class Predictor(nn.Module):
    def __init__(self, in_dim, out_dim, act=None):
        super(Predictor, self).__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        nn.init.xavier_normal_(self.linear.weight)
        self.activation = act

    def forward(self, x):
        out = self.linear(x)
        if self.activation is not None:
            out = self.activation(out)
        return out


class RNNNet(nn.Module):
    def __init__(self, args):
        super(RNNNet, self).__init__()
        self.rnnBlock = RNNBlock(args.in_dim, args.hidden_dim, args.num_layer, args.dropout)
        self.pred1 = Predictor(args.hidden_dim, args.pred_dim1, act=nn.ReLU())
        self.pred2 = Predictor(args.pred_dim1, args.pred_dim2, act=nn.ReLU())
        self.pred3 = Predictor(args.pred_dim2, args.out_dim)

    def forward(self, onehot, length):
        out = self.rnnBlock(onehot, length)
        out = self.pred1(out)
        out = self.pred2(out)
        out = self.pred3(out)
        return out
```
and these are my train and experiment functions:
```python
def train(model, device, optimizer, criterion, data_train, bar, args):
    epoch_train_loss = 0
    epoch_train_mae = 0
    for i, batch in enumerate(data_train):
        list_onehot = torch.tensor(batch[0]).cuda().float()
        list_length = torch.tensor(batch[1]).cuda()
        list_logP = torch.tensor(batch[2]).cuda().float()
        # Sort onehot tensor with respect to the sequence length.
        list_length, list_index = torch.sort(list_length, descending=True)
        list_length.cuda()
        list_index.cuda()
        list_onehot = torch.Tensor([list_onehot.tolist()[i] for i in list_index]).cuda().float()
        model.train()
        optimizer.zero_grad()
        list_pred_logP = model(list_onehot, list_length).squeeze().cuda()
        list_pred_logP.require_grad = False
        train_loss = criterion(list_pred_logP, list_logP)
        train_mae = mean_absolute_error(list_pred_logP.tolist(), list_logP.tolist())
        epoch_train_loss += train_loss.item()
        epoch_train_mae += train_mae
        train_loss.backward()
        optimizer.step()
        bar.update(len(list_onehot))
    epoch_train_loss /= len(data_train)
    epoch_train_mae /= len(data_train)
    return model, epoch_train_loss, epoch_train_mae


def experiment(dict_partition, device, bar, args):
    time_start = time.time()
    model = RNNNet(args)
    model.cuda()
    if args.optim == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.l2_coef)
    elif args.optim == 'RMSprop':
        optimizer = optim.RMSprop(model.parameters(), lr=args.lr, weight_decay=args.l2_coef)
    elif args.optim == 'SGD':
        optimizer = optim.SGD(model.parameters(), lr=args.lr, weight_decay=args.l2_coef)
    else:
        assert False, 'Undefined Optimizer Type'
    criterion = nn.MSELoss()
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=args.step_size, gamma=args.gamma)
    list_train_loss = list()
    list_val_loss = list()
    list_train_mae = list()
    list_val_mae = list()
    data_train = DataLoader(dict_partition['train'], batch_size=args.batch_size, shuffle=args.shuffle)
    data_val = DataLoader(dict_partition['val'], batch_size=args.batch_size, shuffle=args.shuffle)
    for epoch in range(args.epoch):
        scheduler.step()
        model, train_loss, train_mae = train(model, device, optimizer, criterion, data_train, bar, args)
        list_train_loss.append(train_loss)
        list_train_mae.append(train_mae)
        model, val_loss, val_mae = validate(model, device, criterion, data_val, bar, args)
        list_val_loss.append(val_loss)
        list_val_mae.append(val_mae)
    data_test = DataLoader(dict_partition['test'], batch_size=args.batch_size, shuffle=args.shuffle)
    mae, std, logP_total, pred_logP_total = test(model, device, data_test, args)
    time_end = time.time()
    time_required = time_end - time_start
    args.list_train_loss = list_train_loss
    args.list_val_loss = list_val_loss
    args.list_train_mae = list_train_mae
    args.list_val_mae = list_val_mae
    args.logP_total = logP_total
    args.pred_logP_total = pred_logP_total
    args.mae = mae
    args.std = std
    args.time_required = time_required
    return args
```
The list_onehot and list_length tensors are loaded from the DataLoader and uploaded to the GPU. Then, to use a packed sequence as input, I've sorted both list_onehot and list_length and uploaded them to the GPU. The model was uploaded to the GPU, and the h_in, c_in tensors and the packed sequence object were also uploaded to the GPU. However, when I run this code, it only uses the CPU, not the GPU. What should I do to train this model on the GPU?
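A likely culprit (a hedged sketch, not a confirmed diagnosis of the code above): `nn.Parameter(torch.randn(...))` creates `h_in`/`c_in` on the CPU each forward pass, and `.cuda()` is not in-place, so calls like `list_length.cuda()` without assignment have no effect. A minimal sketch of creating everything on one device:

```python
import torch
import torch.nn as nn

# Pick the GPU when available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True).to(device)

# .to()/.cuda() return a NEW tensor; the result must be assigned back.
x = torch.randn(4, 10, 8).to(device)

# Create the initial hidden state directly on the input's device,
# as plain tensors rather than fresh nn.Parameter objects.
h0 = torch.zeros(1, 4, 16, device=x.device)
c0 = torch.zeros(1, 4, 16, device=x.device)

out, (hn, cn) = lstm(x, (h0, c0))
print(out.device)
```

With all tensors and modules on the same `device`, the LSTM runs on the GPU whenever one is available.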

tensorflow keras sequence models - how to only predict output of last step
I am developing a sequence model using the `tf.keras` library. Assume that I have 5 time steps and each time step has an output. The number of features in each time step is, say, 4. I am working on a classification problem and the output can be one of 0, 1, or 2.
For example, following is a train example.
Step 1 input: `[0, 5, 4, 5]` and output = `0`
Step 2 input: `[1, 2, 2, 7]` and output = `1`
Step 3 input: `[7, 5, 3, 4]` and output = `0`
Step 4 input: `[4, 5, 1, 2]` and output = `1`
Step 5 input: `[8, 5, 4, 5]` and output = `2`
When training my model, I want to train it so that, in step 1, if the input is `[0, 5, 4, 5]`, then the output for this timestep is `0`; in step 2, if the input is `[1, 2, 2, 7]`, then the output for this timestep is `1`; and so on. But later, in production, I will only expect my model to estimate the output of the last timestep. For instance:
Step 1 input: `[0, 5, 4, 5]` and output = `0`
Step 2 input: `[1, 2, 2, 7]` and output = `1`
Step 3 input: `[7, 5, 3, 4]` and output = `0`
Step 4 input: `[4, 5, 1, 2]` and output = `1`
Step 5 input: `[8, 5, 4, 5]` and output = **`?`**
Based on this, I am a bit confused about how I should build and train my model. I am only interested in the output of the last step, but during training I would still like to help the model by providing the outputs of the steps prior to the final one. As far as I understand, if I provide the outputs of all timesteps as expected output, the loss is calculated over all of them; for instance, if the output of the 3rd timestep is predicted wrong, the cost increases. That may be expected, but what matters to me is making a correct prediction at the last time step. How do I construct my model using `tf.keras` for such a case? (Or, if I still need to train my model to estimate the output of each time step separately, I would at the end still like to calculate the accuracy based only on the output of the last step.)
Any help will be highly appreciated!

Reshape data of a long time sequence for RNN
Suppose I have data of size [m, n]. This data is a continuous recording, i.e., it starts at time 0 and ends at time m, and at each time step a feature vector of size n is recorded. The problem is that it is a long recording. An LSTM expects 3D input, i.e., [batch_size, time_steps, feature_size].
My data is a continuous recording where the number of time steps can be, e.g., 4000. Now suppose I have one recording of size [1, 4000, n] (where 4000 >> n; n is small).
How could I reshape this data and feed this to the RNN (this can be LSTM or GRU) so that I don't have to load the entire stream of data as one chunk into memory?
This could get even more problematic if I have multiple recordings r of size [4000,n]. Then I would have data of size [r,4000,n].
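One standard approach (a sketch, assuming truncated backpropagation across windows is acceptable for the task): slice each long recording into fixed-length windows, so a batch holds only a few windows in memory at a time:

```python
import numpy as np

def windows(recording, time_steps, step):
    """Slice a [m, n] recording into a [num_windows, time_steps, n] array."""
    m = recording.shape[0]
    starts = range(0, m - time_steps + 1, step)
    return np.stack([recording[s:s + time_steps] for s in starts])

rec = np.random.randn(4000, 3)          # one long recording, n = 3
batch = windows(rec, time_steps=100, step=100)
print(batch.shape)                      # (40, 100, 3)
```

To carry hidden state across consecutive windows of the same recording, stateful LSTMs in Keras (`stateful=True` with a fixed batch size) or manually passing the state between calls in PyTorch are the usual options; both amount to truncated backpropagation through time.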

multi-document summarization dataset
I want to work on the multi-document summarization task.
DUC 2007 is a multi-document summarization dataset.
What other datasets are available for multi-document summarization?

Add row to dataframe with sum of within group data
I have an example dataframe below.
```r
eg_data <- data.frame(
  time = c("1", "1", "2", "2"),
  type = c("long", "short", "long", "short"),
  size = c(200, 50, 500, 150)
)
```
I need to create rows which total the values of size, for each time period. I have looked at combinations of aggregate and by, but I cannot get it to work correctly.
An example of what I've tried:
```r
rbind(eg_data, data.frame(time = "1 + 2", type = "long",
                          size = by(eg_data$size, eg_data$time == "long", sum)))
```
An example of what I want the final dataframe to look like:
```r
eg_data <- data.frame(
  time = c("1", "1", "2", "2", "1 + 2", "1 + 2"),
  type = c("long", "short", "long", "short", "long", "short"),
  size = c(200, 50, 500, 150, 700, 200)
)
```
Any help is appreciated; a base R solution would be especially welcome.

How to evaluate an auto-generated summary against gold summaries with the ROUGE metric?
I'm working on an auto-summarization system and I want to evaluate my output summaries against my gold summaries. For each case I have multiple gold summaries of different lengths, so I'm a little confused here. My question is: how should I evaluate my summary against these gold summaries? Should I evaluate mine against each gold summary and then average the results, or treat the union of the gold summaries as a single gold summary and evaluate against that?
Thank you in advance
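For context, ROUGE is commonly reported against multiple references by scoring the candidate against each reference separately and then taking either the best match or the average. A minimal ROUGE-1 recall sketch (not the official ROUGE toolkit; the summaries are toy examples):

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Overlapping unigram count divided by reference length (ROUGE-1 recall)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())   # clipped overlap of unigram counts
    return overlap / sum(ref.values())

candidate = "the cat sat on the mat"
references = ["the cat sat on a mat", "a cat was sitting on the mat"]

scores = [rouge1_recall(candidate, r) for r in references]
best = max(scores)                  # "best match" aggregation
avg = sum(scores) / len(scores)     # "average" aggregation
```

Concatenating the gold summaries into one reference is a different (and less standard) choice, since it changes the reference length that recall is normalized by.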