How do I reshape a pandas DataFrame into a 3-dimensional object for TensorFlow?
I have a pandas df with 12 rows and 24 columns, but the input shape required for my TensorFlow model is (None, 24, 12). I've tried a few different methods, including using df.values, but I keep getting the error:
Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 12]
How do I reshape my df to be in the required shape?
The (None, 24, 12) comes from the shape I get when calling tf.shape(data) on the data fed into the model during training, so the model requires inputs with the same shape.
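For reference, a minimal sketch of one way to do this, assuming the 24 columns are the time axis and model is the compiled Keras model (adjust the transpose if your axes are the other way around):

import numpy as np

# df.values has shape (12, 24): 12 rows, 24 columns.
# Transpose to (24, 12), then add a leading batch axis so the
# DataFrame becomes a single sample of shape (1, 24, 12).
x = df.values.T[np.newaxis, ...]
print(x.shape)        # (1, 24, 12)
# model.predict(x)    # hypothetical model object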
See also questions close to this topic
-
Eliminate for-loop in Python
I have to compute some quantities in Python using the following formula:

log ρ_t = ψ(v_t) − ψ(Σ_j v_j) + Σ_l [ψ(α_{t, b_l}^{(l)}) − ψ(Σ_b α_{t, b}^{(l)})]

where ψ is simply the digamma function, and the v_j and α_{t, b_l}^{(l)} are floating-point numbers with t ∈ {0, 1, 2}, b_l ∈ {0, 1, 2, 3} and l ∈ {1, 2, ..., 309}.
For now I have solved this by grouping the v_j into a list and the alphas into nested dictionaries (keyed in order: l, t, b_l), iterating the formula with three for loops:
for loop in tqdm(range(500)):
    logr = []
    for t in range(3):
        add1 = digamma(V[t]) - digamma(sum(V))
        add2 = 0
        for b in test.columns[2:-1]:
            add2 += digamma(alphas[b][t][test.iloc[395][b]]) - digamma(sum(alphas[b][t]))
        logrho = add1 + add2
        logr.append(logrho)
    rho = np.exp(logr)
    q = rho/sum(rho)
The problem is that I have to iterate these lines of code hundreds of times, so I am looking for a way to vectorize the problem. My first guess was to convert the "alpha dictionary" into a DataFrame with pd.DataFrame.from_dict(), but I still can't get rid of the for loops. Is there a better way to tackle the problem? Maybe creating a three-dimensional pandas DataFrame?
EDIT
The real problem I am struggling with is the second part of the equation (let me forget about the digamma function). The alpha parameters have 3 dimensions: t = 0, 1, 2; l = 1, 2, ..., N; and b_l = 0, 1, 2, 3. Assume t is fixed, so that we only iterate over t = 0, 1, 2. The alpha parameters can then be represented in a two-dimensional dataframe:
bl    l1      l2      l3     ...   lN
0     a0l1    a0l2    a0l3         a0lN
1     a1l1    a1l2    a1l3         a1lN
2     ...
3     a3l1    ...                  a3lN
I shouldn't subtract the sum over the column from each alpha, but only from one of the four classes b_l = 0, 1, 2, 3. If, in the data, the variable l1 has an outcome of 3, I should subtract the sum of the alphas for column l1 from a3l1; if the outcome in l2 is 0, I should subtract the sum of the alphas of column l2 from a0l2, and so on. Finally, the results are summed over l.
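For reference, a minimal vectorized sketch of that select-then-subtract step, assuming the alphas are stacked into a NumPy array A of shape (3, 4, N) indexed as A[t, bl, l] and the observed outcomes form a vector b of shape (N,) — both names are hypothetical stand-ins for the nested dictionaries:

import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
N = 309
A = rng.gamma(2.0, size=(3, 4, N))   # hypothetical alpha[t, bl, l]
b = rng.integers(0, 4, size=N)       # hypothetical observed outcome b_l per column l

# Pick A[t, b[l], l] for every t and l at once, subtract the digamma of
# each column's sum over bl, then sum over l -- no Python loops.
add2 = (digamma(A[:, b, np.arange(N)]) - digamma(A.sum(axis=1))).sum(axis=1)
print(add2.shape)   # (3,) -- one value per t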
-
Python ServiceNow Semaphore Timeout Error
Trying to connect to a ServiceNow instance and pull information from the tables. I am using the aiosnow module. Here is the code:
import asyncio
import getpass
import aiosnow
from aiosnow.models.table.declared import IncidentModel as Incident

# Get user input
instanceSN = input("Enter the instance name: ")
usernameSN = input("Enter your ServiceNow username: ")
passwordSN = getpass.getpass()

async def main():
    client = aiosnow.Client(instanceSN, basic_auth=(usernameSN, passwordSN))
    async with Incident(client, table_name="incident") as inc:
        query = aiosnow.select(
            Incident.assigned_to.starts_with(<insert username>)
        ).order_asc(Incident.priority)
        for response in await inc.get(query, limit=5):
            print("Attaching [readme.txt] to {number} ({sys_id})".format(**response))
            await inc.upload_file(response["sys_id"], path="./readme.txt")

asyncio.run(main())
When I run this I get the following error message:

aiosnow.exceptions.ClientConnectionError: Cannot connect to host servicenow.example.com:443 ssl:default [The semaphore timeout period has expired]
-
Python Twitter: Check if a tweet exists by url without API
I have a list of tweet URLs from the same account, and I want to check whether these tweets still exist or not.
A tweet may not exist anymore, in which case Twitter responds with errors such as:
This Tweet is from an account that no longer exists. Learn more
or
Sorry that page doesn't exist!
or any such errors.
What I have tried is using the twint library to scrape all the tweets from the given profile, and then checking whether the tweets in my list also appear in the results that twint returns.
And I have used this function to scrape all the tweets using twint:
import twint

def get_tweets(username):
    c = twint.Config()
    c.Username = username
    tweets = []
    c.Store_object = True
    c.Retweets = True
    c.Store_object_tweets_list = tweets
    c.Hide_output = True
    twint.run.Profile(c)
    tweets_links = []
    for tweet in tweets:
        tweets_links.append(tweet.link)
    return tweets_links

get_tweets(username)
This works well, but the problem is that it doesn't scrape all the tweets: it stops at a certain date (for the username I'm testing, 'GideonCRozner', it stops at 24/06/2020), and I have post URLs from before that date. So I'm simply not able to scrape all the posts using the twint library.
My solution right now is to include selenium in the code and fetch the posts that are not scraped yet one by one, but as you know selenium is a slow solution for that. So I hope to get some ideas from you on how to scrape all the user's tweets, or to test whether a tweet exists, without selenium and without the Twitter API.
Thanks a lot for your time!
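For reference, one possible approach is Twitter's public oEmbed endpoint, which needs no API keys. A minimal sketch — the assumption that deleted tweets come back with a non-200 status is worth verifying against a few known-dead URLs first:

import requests

def tweet_exists(tweet_url):
    # publish.twitter.com/oembed returns embed HTML (status 200) for a
    # live tweet and an error status for a deleted/suspended one.
    resp = requests.get("https://publish.twitter.com/oembed",
                        params={"url": tweet_url}, timeout=10)
    return resp.status_code == 200

print(tweet_exists("https://twitter.com/jack/status/20"))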
-
Calculate and apply CAGR where value is 0
I have a df with annual data in a variety of columns. Some of these columns don't have long-term data. I'd like to calculate an n-year CAGR on each column, beginning at the row where the value is 0, and continue to grow the following years at the same CAGR. I have n specified by column in a separate df, as I would like to change these externally. I had thought to perhaps use rolling.apply once the CAGRs were found, but I'm not sure how to isolate the columns and rows where values are zero, or how to pull in the n from the separate df.
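For reference, a minimal sketch of the mechanics on one hypothetical column. The example data, the horizon n, and the rule "fit the CAGR on the last n non-zero years, then roll it forward over the zeros" are all assumptions here:

import pandas as pd

# Hypothetical annual data: observed values, then zero placeholder years.
df = pd.DataFrame({"a": [100.0, 110.0, 121.0, 0.0, 0.0]},
                  index=range(2016, 2021))
n = 2  # hypothetical horizon for column "a", e.g. read from the separate df

obs = df["a"][df["a"] != 0]
# n-year CAGR from the observed history: (end / start)**(1/n) - 1
cagr = (obs.iloc[-1] / obs.iloc[-n - 1]) ** (1 / n) - 1   # 0.10 here

# Grow each zero year forward from the last observed value at that CAGR.
last = obs.iloc[-1]
for year in df.index[df["a"] == 0]:
    last *= 1 + cagr
    df.loc[year, "a"] = last
print(df)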
-
Cannot cast IntervalIndex to dtype int32 error when plotting an interval in python
I have this code:
plt.subplot(121)
df = pd.DataFrame()
df['age'] = [1,2,3,4,5,6,7,8,9,0,2,3,4,5,6,7,8,9,0,4,1,2,3,4,5,6,7,8,9,0]
df['age_cat'] = pd.cut(age, bins=list(np.arange(1, 10, 2)))
df.groupby('age_cat')['age'].count().plot(kind='bar')
but when I run this code, I am getting this error:
TypeError: Cannot cast IntervalIndex to dtype int32
I am trying to plot a histogram of the age values; I simplified the code to show the error. The error seems to be related to the plot call: if I remove it, the code works.
Edit 1
I changed the code to this:
plt.subplot(121)
df = pd.DataFrame()
df['age'] = [1,2,3,4,5,6,7,8,9,0,2,3,4,5,6,7,8,9,0,4,1,2,3,4,5,6,7,8,9,0]
df['age_cat'] = pd.cut(df['age'], bins=list(np.arange(1, 10, 2)))
df.groupby('age_cat')['age'].count().plot(kind='bar')
But I still get the same error message.
My library versions are:
absl-py 0.11.0, astunparse 1.6.3, async-generator 1.10, attrs 20.3.0, backcall 0.2.0, bleach 3.2.1,
cachetools 4.2.0, certifi 2020.12.5, chardet 4.0.0, colorama 0.4.4, cycler 0.10.0, decorator 4.4.2,
defusedxml 0.6.0, entrypoints 0.3, flatbuffers 1.12, gast 0.3.3, google-auth 1.24.0, google-auth-oauthlib 0.4.2,
google-pasta 0.2.0, grpcio 1.32.0, h5py 2.10.0, idna 2.10, ipykernel 5.4.3, ipython 7.19.0,
ipython-genutils 0.2.0, jedi 0.18.0, Jinja2 2.11.2, joblib 1.0.0, jsonschema 3.2.0, jupyter-client 6.1.11,
jupyter-core 4.7.0, jupyterlab-pygments 0.1.2, Keras-Preprocessing 1.1.2, kiwisolver 1.3.1, Markdown 3.3.3,
MarkupSafe 1.1.1, matplotlib 3.3.3, mistune 0.8.4, nbclient 0.5.1, nbconvert 6.0.7, nbformat 5.1.2,
nest-asyncio 1.4.3, numpy 1.19.3, oauthlib 3.1.0, opt-einsum 3.3.0, packaging 20.8, pandas 1.2.0,
pandocfilters 1.4.3, parso 0.8.1, pickleshare 0.7.5, Pillow 8.1.0, pip 20.3.3, prompt-toolkit 3.0.10,
protobuf 3.14.0, pyasn1 0.4.8, pyasn1-modules 0.2.8, Pygments 2.7.4, pyparsing 2.4.7, pyrsistent 0.17.3,
python-dateutil 2.8.1, pytz 2020.5, pywin32 300, pyzmq 21.0.1, requests 2.25.1, requests-oauthlib 1.3.0,
rsa 4.7, scikit-learn 0.24.0, scipy 1.6.0, seaborn 0.11.1, setuptools 49.2.1, six 1.15.0,
tabulate 0.8.7, tensorboard 2.4.1, tensorboard-plugin-wit 1.7.0, tensorflow 2.4.0, tensorflow-estimator 2.4.0,
termcolor 1.1.0, testpath 0.4.4, tf 1.0.0, threadpoolctl 2.1.0, tornado 6.1, traitlets 5.0.5,
typing-extensions 3.7.4.3, urllib3 1.26.2, wcwidth 0.2.5, webencodings 0.5.1, Werkzeug 1.0.1, wheel 0.36.2,
wrapt 1.12.1
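For reference, with the pandas 1.2.0 / matplotlib 3.3.3 combination listed above, one workaround sketch is to render the IntervalIndex as strings before plotting, so matplotlib never tries to cast the intervals to integers:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame()
df['age'] = [1,2,3,4,5,6,7,8,9,0,2,3,4,5,6,7,8,9,0,4,1,2,3,4,5,6,7,8,9,0]
df['age_cat'] = pd.cut(df['age'], bins=list(np.arange(1, 10, 2)))

counts = df.groupby('age_cat')['age'].count()
counts.index = counts.index.astype(str)   # intervals -> labels like '(1, 3]'
counts.plot(kind='bar')
plt.show()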
-
Python Pandas change values from a condition
How are you? I'm pretty new to coding, and I have this question:
I want to iterate through a column and change its values based on a condition. In this case, I want to change every value in column 'a1': if the value contains the word 'Juancito', I want to change it to just 'Juancito'. The for loop runs OK, but the values don't actually change in the end.
What am I doing wrong?
import pandas as pd

inp = [{'a1': 'Juancito 1'}, {'a1': 'Juancito 2'}, {'a1': 'Juancito 3'}]
df = pd.DataFrame(inp)

for i in df['a1']:
    if 'Juancito' in i:
        i = 'Juancito'
    else:
        pass

df.head()
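For reference: the loop rebinds the local name i to a new string on each pass, so the DataFrame itself is never written to. A minimal sketch of a vectorized fix:

import pandas as pd

inp = [{'a1': 'Juancito 1'}, {'a1': 'Juancito 2'}, {'a1': 'Juancito 3'}]
df = pd.DataFrame(inp)

# Write back into the DataFrame instead of rebinding the loop variable.
df.loc[df['a1'].str.contains('Juancito'), 'a1'] = 'Juancito'
print(df.head())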
-
Find the least frequent number in a NumPy array
Suppose I have the following NumPy array:
a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])
How can I find the least frequent number in this array?
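For reference, a minimal sketch of one common approach with np.unique:

import numpy as np

a = np.array([1, 2, 3, 1, 2, 1, 1, 1, 3, 2, 2, 1])

# np.unique returns each distinct value alongside its count; the argmin
# of the counts indexes the least frequent value (ties go to the smallest).
values, counts = np.unique(a, return_counts=True)
print(values[np.argmin(counts)])   # 3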
-
Load a sparse data array in python
-
How does Python's double-sided inequality work? And why doesn't it work for numpy arrays?
In Python you can do the following:
>>> 3 < 4 < 5
True
>>> 3 < 4 < 4
False
How does this work? I would have thought that 4 < 5 would return a boolean, and so 3 < True should return False; or that 3 < 4 should return a boolean, and so True < 4 should maybe return True if True can be cast as the integer 1. And why doesn't it work for numpy arrays?
>>> 1 < np.array([1, 2, 3]) < 3
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Could it be made to work for numpy arrays?
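For reference: Python evaluates a chained comparison x < y < z as x < y and y < z (with y evaluated once), and the implicit `and` needs a single truth value, which a boolean array can't provide. A minimal sketch of the elementwise equivalent:

import numpy as np

a = np.array([1, 2, 3])

# Combine the two elementwise comparisons explicitly with & (or logical_and),
# which returns a boolean array instead of forcing a single truth value.
print((1 < a) & (a < 3))             # [False  True False]
print(np.logical_and(1 < a, a < 3))  # same result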
-
Mlflow and KerasTuner integration
I am trying to integrate KerasTuner and Mlflow. I'd like to record the loss at each epoch of each trial of Keras Tuner. My approach is:
class MlflowCallback(tf.keras.callbacks.Callback):
    # This function will be called after each epoch.
    def on_epoch_end(self, epoch, logs=None):
        if not logs:
            return
        # Log the metrics from Keras to MLflow
        mlflow.log_metric("loss", logs["loss"], step=epoch)

from kerastuner.tuners import RandomSearch

with mlflow.start_run(run_name="myrun", nested=True) as run:
    tuner = RandomSearch(
        train_fn,
        objective='loss',
        max_trials=25,
    )
    tuner.search(train,
                 validation_data=validation,
                 validation_steps=validation_steps,
                 steps_per_epoch=steps_per_epoch,
                 epochs=5,
                 callbacks=[MlflowCallback()])
However, the loss values are reported (sequentially) in one single experiment. Is there a way to record them independently?
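For reference, a minimal sketch of one possible workaround, under the untested assumption that KerasTuner invokes the callback's train hooks once per trial, so a nested MLflow run opened in on_train_begin separates the trials:

import mlflow
import tensorflow as tf

class MlflowPerTrialCallback(tf.keras.callbacks.Callback):
    # Open a fresh nested MLflow run when each trial starts training...
    def on_train_begin(self, logs=None):
        mlflow.start_run(nested=True)

    def on_epoch_end(self, epoch, logs=None):
        if logs:
            mlflow.log_metric("loss", logs["loss"], step=epoch)

    # ...and close it when the trial finishes, so losses land per run.
    def on_train_end(self, logs=None):
        mlflow.end_run()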
-
Save deep learning model with tf, not pickle
Would anyone have a tip for saving a sequential deep learning model made with tf.keras, using the tf.keras save_model method? I have an issue reusing some old code where this used to work:
# save model to file
pickle.dump(model, open("darth-vader-model.pkl", "wb"))

# save all of our data structures
pickle.dump({'words': words, 'classes': classes, 'train_x': train_x, 'train_y': train_y},
            open("darth-vader-data.pkl", "wb"))
Revisiting this code, it now errors out on:
TypeError                                 Traceback (most recent call last)
<ipython-input-13-5b1bb51d3ad3> in <module>
      1 # save model to file
----> 2 pickle.dump(model, open("darth-vader-model.pkl", "wb"))
      3
      4 #model.save('./model')

TypeError: cannot pickle 'weakref' object
So I have been trying to read about a solution using TensorFlow directly to save models instead of the pickle method, but I cannot figure out how to replicate the code above, especially the "save all of our data structures" part. Would anyone have any tips for how I could revise this for tf-only save_model? It's an ML model for nltk chat-bot-related processes.

In tf, should I be using

model.save('/tmp/model')

or

tf.keras.models.save_model(
    model, filepath, overwrite=True, include_optimizer=True,
    save_format=None, signatures=None, options=None, save_traces=True
)

to replicate the pickle dump process? The process is for chat bot purposes, bag of words, nltk use...
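For reference, a minimal sketch of one common split, assuming words, classes, train_x and train_y are plain Python or NumPy objects (those still pickle fine; only the Keras model refuses):

import pickle
import tensorflow as tf

# Save the model in TensorFlow's own format instead of pickling it.
model.save("darth-vader-model")   # equivalent to tf.keras.models.save_model(...)

# The supporting data structures can keep using pickle.
with open("darth-vader-data.pkl", "wb") as f:
    pickle.dump({'words': words, 'classes': classes,
                 'train_x': train_x, 'train_y': train_y}, f)

# Later, restore both halves:
model = tf.keras.models.load_model("darth-vader-model")
with open("darth-vader-data.pkl", "rb") as f:
    data = pickle.load(f)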
-
Fail to find the dnn implementation. [[{{node CudnnRNN}}]]
This error appears in my code.
I was looking for help on the Internet, and I discovered that my CUDA compiler is:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
However, if I run nvidia-smi, the following is shown:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 206...  Off  | 00000000:01:00.0  On |                  N/A |
| 31%   33C    P8    11W / 184W |   7962MiB /  7979MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1291      G   /usr/lib/xorg/Xorg                 40MiB |
|    0   N/A  N/A      1333      G   /usr/bin/gnome-shell               57MiB |
|    0   N/A  N/A      1616      G   /usr/lib/xorg/Xorg                124MiB |
|    0   N/A  N/A      1751      G   /usr/bin/gnome-shell               78MiB |
|    0   N/A  N/A      2096      G   ...gAAAAAAAAA --shared-files       57MiB |
|    0   N/A  N/A      2369      G   ...ulen/anaconda3/bin/python        5MiB |
|    0   N/A  N/A      2492      C   ...ulen/anaconda3/bin/python     7529MiB |
|    0   N/A  N/A      2707      G   ...AAAAAAAA== --shared-files       61MiB |
+-----------------------------------------------------------------------------+
Could this be the problem? I thought I had only installed CUDA 10.0.
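For reference: nvidia-smi reports the highest CUDA version the driver supports, not the installed toolkit, so the 10.0 vs 11.0 mismatch by itself is normal. A more common culprit for this CudnnRNN error is the GPU memory being nearly exhausted (7962/7979 MiB above), which can stop cuDNN from initializing; a minimal sketch of the usual mitigation:

import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of grabbing
# almost all of it up front, which can starve cuDNN's initialization.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)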