Exclude Japanese Stopwords from File

I am trying to remove Japanese stopwords from a text corpus of tweets. Unfortunately the widely used nltk does not include Japanese stopwords, so I had to figure out a different way.

This is my MWE:

import urllib
from urllib.request import urlopen
import MeCab
import re

# slothlib
slothlib_path = "http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/Japanese.txt"
sloth_file = urllib.request.urlopen(slothlib_path)

# stopwordsiso
iso_path = "https://raw.githubusercontent.com/stopwords-iso/stopwords-ja/master/stopwords-ja.txt"
iso_file = urllib.request.urlopen(iso_path)
stopwords = [line.decode("utf-8").strip() for line in iso_file]

stopwords = [ss for ss in stopwords if not ss==u'']
stopwords = list(set(stopwords))

text = '日本語の自然言語処理は本当にしんどい、と彼は十回言った。'
tagger = MeCab.Tagger("-Owakati")
tok_text = tagger.parse(text)

ws = re.compile(" ")
words = [word for word in ws.split(tok_text)]
if words[-1] == u"\n":
    words = words[:-1]
ws = [w for w in words if w not in stopwords]

print(words)
print(ws)

This runs successfully: it prints the original tokenized text as well as the version without stopwords:

['日本語', 'の', '自然', '言語', '処理', 'は', '本当に', 'しんどい', '、', 'と', '彼', 'は', '十', '回', '言っ', 'た', '。']
['日本語', '自然', '言語', '処理', '本当に', 'しんどい', '、', '十', '回', '言っ', '。']

There are still two issues I am facing though:

a) Is it possible to take two stopword lists into account, namely iso_file and sloth_file, so that a word is removed if it is a stopword in either of them? (I tried replacing the stopwords assignment with stopwords = [line.decode("utf-8").strip() for line in zip('iso_file','sloth_file')] but received an error, because tuples cannot be decoded.)
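To illustrate what I mean, this is roughly what I picture for a), using sloth_file and iso_file from above and taking the union of both lists (I am assuming the slothlib file is also UTF-8 encoded), but I am not sure this is the proper way:

sloth_words = [line.decode("utf-8").strip() for line in sloth_file]  # stopwords from slothlib
iso_words = [line.decode("utf-8").strip() for line in iso_file]      # stopwords from stopwords-iso
stopwords = set(sloth_words + iso_words)                             # union of both lists
stopwords.discard("")                                                # drop empty entries

The filtering step should then stay the same, since the membership check also works on a set.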

b) The ultimate goal would be to generate a new text file in which all stopwords are removed.

For this purpose I created the following MWE:

### first clean twitter csv
import pandas as pd
import re
import emoji

df = pd.read_csv("input.csv")

def cleaner(tweet):
    tweet = re.sub(r"@[^\s]+","",tweet) #Remove @username 
    tweet = re.sub(r"(?:\@|http?\://|https?\://|www)\S+|\\n","", tweet) #Remove http links & \n
    tweet = " ".join(tweet.split())
    tweet = ''.join(c for c in tweet if c not in emoji.UNICODE_EMOJI) #Remove Emojis
    tweet = tweet.replace("#", "").replace("_", " ") #Remove hashtag sign but keep the text
    return tweet
df['text'] = df['text'].map(lambda x: cleaner(x))
df['text'].to_csv(r'cleaned.txt', header=None, index=None, sep='\t', mode='a')

### remove stopwords

import urllib
from urllib.request import urlopen
import MeCab
import re

# slothlib
slothlib_path = "http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/Japanese.txt"
sloth_file = urllib.request.urlopen(slothlib_path)

#stopwordsiso
iso_path = "https://raw.githubusercontent.com/stopwords-iso/stopwords-ja/master/stopwords-ja.txt"
iso_file = urllib.request.urlopen(iso_path)
stopwords = [line.decode("utf-8").strip() for line in iso_file]

stopwords = [ss for ss in stopwords if not ss==u'']
stopwords = list(set(stopwords))

with open("cleaned.txt",encoding='utf8') as f:
    cleanedlist = f.readlines()
    cleanedlist = list(set(cleanedlist))

tagger = MeCab.Tagger("-Owakati")
tok_text = tagger.parse(cleanedlist)

ws = re.compile(" ")
words = [word for word in ws.split(tok_text)]
if words[-1] == u"\n":
    words = words[:-1]
ws = [w for w in words if w not in stopwords]

print(words)
print(ws)

While this works for the simple input text in the first MWE, for the second MWE I get the error

in method 'Tagger_parse', argument 2 of type 'char const *'
Additional information:
Wrong number or type of arguments for overloaded function 'Tagger_parse'.
  Possible C/C++ prototypes are:
    MeCab::Tagger::parse(MeCab::Lattice *) const
    MeCab::Tagger::parse(char const *)

for this line: tok_text = tagger.parse(cleanedlist). So I assume I will need to make some amendments to cleanedlist?
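From the error message I gather that tagger.parse() only accepts a single string, not a list of lines, so I am guessing I would have to parse each line separately, roughly like this (just a sketch of what I have in mind):

tagger = MeCab.Tagger("-Owakati")
tok_lines = [tagger.parse(line).strip() for line in cleanedlist]  # parse one line (a plain string) at a time

But I am not sure whether that is the intended way, or whether I should rather join cleanedlist into one big string first.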

I have uploaded the cleaned.txt on github for reproducing the issue: [txt on github][1]

Also: how would I get the tokenized list with stopwords removed back into a text file like cleaned.txt? Would it be possible to create a DataFrame from ws for this purpose? Or is there an even simpler way?
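What I was picturing is something along these lines: tokenize each cleaned line, drop the stopwords, and write the result back out (cleaned_nostop.txt is just a placeholder name I made up), but maybe a DataFrame is the better route:

tagger = MeCab.Tagger("-Owakati")
with open("cleaned_nostop.txt", "w", encoding="utf8") as out:  # placeholder output file name
    for line in cleanedlist:
        tokens = tagger.parse(line).split()                    # tokenize one cleaned tweet
        kept = [w for w in tokens if w not in stopwords]       # drop the stopwords
        out.write(" ".join(kept) + "\n")                       # write the filtered line back out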

Sorry for the long question; I tried a lot and tried to make it as easy as possible to understand what I am driving at :-)

Thank you very much!

[1]: https://gist.github.com/yin-ori/1756f6236944e458fdbc4a4aa8f85a2c
