pandas to_sql : string literal cannot contain NUL (0x00) characters
I am trying to append data to a PostgreSQL table using pandas.DataFrame.to_sql and SQLAlchemy and receive the error ValueError: A string literal cannot contain NUL (0x00) characters.
I understand this is because PostgreSQL does not allow NUL (0x00) characters in text values.
This is the line of code that raises the error:
df.to_sql(table_name,con,if_exists='append',index=False)
How can I check where this character occurs in my DataFrame and replace it with something that will not trigger this error?
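For reference, one way to locate and strip the offending character is sketched below. This is only a minimal sketch: it inspects object (string) columns, uses a small hypothetical frame in place of the real data, and assumes simply removing the NUL byte is acceptable.

import pandas as pd

# Hypothetical example frame; in practice use the DataFrame passed to to_sql
df = pd.DataFrame({"name": ["ok", "bad\x00value"], "score": [1, 2]})

# Flag string cells that contain the NUL character
text_cols = df.select_dtypes(include="object")
mask = text_cols.apply(lambda col: col.str.contains("\x00", na=False))
print(df[mask.any(axis=1)])   # rows containing the character
print(mask.any())             # columns containing the character

# Remove the NUL characters before writing to the database
df = df.replace("\x00", "", regex=True)
# df.to_sql(table_name, con, if_exists='append', index=False)  # as in the original call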
See also questions close to this topic
-
Error non-linear-regression python curve-fit
Hello, I want to do non-linear regression in Python with curve_fit. This is my code:
# fit a fourth degree polynomial to the economic data
from numpy import arange
from scipy.optimize import curve_fit
from matplotlib import pyplot
import math

x = [17.47, 20.71, 21.08, 18.08, 17.12, 14.16, 14.06, 12.44, 11.86, 11.19, 10.65]
y = [5, 35, 65, 95, 125, 155, 185, 215, 245, 275, 305]

# define the true objective function
def objective(x, a, b, c, d, e):
    return ((a)-((b)*(x/3-5)))+((c)*(x/305)**2)-((d)*(math.log(305))-math.log(x))+((e)*(math.log(305)-(math.log(x))**2))

popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b, c, d, e = popt
# plot input vs output
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 1)
# calculate the output for the range
y_line = objective(x_line, a, b, c, d, e)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
pyplot.show()
This is my error:
Traceback (most recent call last):
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\main.py", line 16, in <module>
    popt, _ = curve_fit(objective, x, y)
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 784, in curve_fit
    res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 410, in leastsq
    shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 24, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 484, in func_wrapped
    return func(xdata, *params) - ydata
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\main.py", line 13, in objective
    return ((a)-((b)*(x/3-5)))+((c)*(x/305)**2)-((d)*(math.log(305))-math.log(x))+((e)*(math.log(305)-(math.log(x))**2))
TypeError: only size-1 arrays can be converted to Python scalars
Thanks in advance.
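For reference, the TypeError is raised because math.log only accepts scalars, while curve_fit passes the whole x array into the objective. A minimal sketch of the same formula, changed only by swapping math.log for NumPy's elementwise log:

import numpy as np

def objective(x, a, b, c, d, e):
    x = np.asarray(x, dtype=float)
    # identical expression to the original, with math.log replaced by np.log
    return ((a) - ((b) * (x / 3 - 5))) + ((c) * (x / 305) ** 2) \
        - ((d) * (np.log(305)) - np.log(x)) + ((e) * (np.log(305) - (np.log(x)) ** 2))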
-
beautifulsoup (webscraping) not updating variables when HTML text has changed
I am new to Python and I can't understand why this isn't working, but I've narrowed down the issue to one line of code.
The purpose of this bot is to scrape HTML from a website (using BeautifulSoup) and post to Discord when the text changes. I use FC2 and FR2 (flightcategory2 and flightrestrictions2) as memory variables for the code to check against every time it runs. If they're the same, the code waits for _ minutes and checks again; if they're different, it posts the update.
However, when running this code, the variables "flightCategory" and "flightRestrictions" are set the first time the code runs, but for some reason stop changing when the HTML text on the website changes. The line in question is in this if block.
if 1 == 1:  # using 1==1 so this loop constantly runs for testing, otherwise I have it set for a time
    flightCategory, flightRestrictions = und.getInfo()
In debugging mode the code IS run, but the variables don't update, and I am confused as to why they would update the first time the code runs but not on subsequent runs. This line is critical to the operation of my code.
Here's an abbreviated version of the code to make it easier to read. I'd appreciate any help.
FC2 = 0
FR2 = 0
flightCategory = ""
flightRestrictions = ""

class UND:
    def __init__(self):
        page = requests.get("http://sof.aero.und.edu")
        self.soup = BeautifulSoup(page.content, "html.parser")

    def getFlightCategory(self):  # Takes the appropriate html text and sets it to a variable
        flightCategoryClass = self.soup.find(class_="auto-style1b")
        return flightCategoryClass.get_text()

    def getRestrictions(self):  # Takes the appropriate html text and sets it to a variable
        flightRestrictionsClass = self.soup.find(class_="auto-style4")
        return flightRestrictionsClass.get_text()

    def getInfo(self):
        return self.getFlightCategory(), self.getRestrictions()

und = UND()

while 1 == 1:
    if 1 == 1:  # using 1==1 so this loop constantly runs for testing, otherwise I have it set for a time
        flightCategory, flightRestrictions = und.getInfo()  # (scrape the html from the web)
        if flightCategory == FC2 and flightRestrictions == FR2:  # if previous check is the same as this check then skip posting
            Do Something
        elif flightCategory != FC2 or flightRestrictions != FR2:  # if any variable has changed since the last time
            FC2 = flightCategory  # set the comparison variable to equal the variable
            FR2 = flightRestrictions
            if flightRestrictions == "Manager on Duty:":  # if this is seen only output category
                Do Something
            elif flightRestrictions != "Manager on Duty:":
                Do Something
    else:
        print("Outside Time")
    time.sleep(5)  # Wait _ seconds. This would be set for 30 min but for testing it is 5 seconds.
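For what it's worth, the page is downloaded only once in __init__, so every later call to getInfo() re-reads the same parsed HTML. A minimal sketch that re-requests the page on each check, reusing the URL and class names from the code above:

import requests
from bs4 import BeautifulSoup

class UND:
    def refresh(self):
        # Fetch and re-parse the page so each check sees the current HTML
        page = requests.get("http://sof.aero.und.edu")
        self.soup = BeautifulSoup(page.content, "html.parser")

    def getInfo(self):
        self.refresh()
        flight_category = self.soup.find(class_="auto-style1b").get_text()
        flight_restrictions = self.soup.find(class_="auto-style4").get_text()
        return flight_category, flight_restrictions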
-
Need to reload vosk model for every transcription?
The vosk model that I'm using is vosk-model-en-us-aspire-0.2 (1.4 GB). It takes quite a lot of time to load every time. Is it necessary to recreate the vosk model object for every transcription? If we only load the model once, it can save at least half of the time.
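For reference, the model can be loaded once and reused; only the recognizer needs to be created per transcription, and that part is cheap. A minimal sketch, assuming 16-bit mono WAV input and the model directory named above:

import json
import wave
from vosk import Model, KaldiRecognizer

# Load the 1.4 GB model a single time at startup
model = Model("vosk-model-en-us-aspire-0.2")

def transcribe(path):
    wf = wave.open(path, "rb")
    rec = KaldiRecognizer(model, wf.getframerate())  # lightweight, created per file
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        rec.AcceptWaveform(data)
    return json.loads(rec.FinalResult())["text"]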
-
Merge/join two dataframes based on index and column
I would like to join (or merge?) two dataframes. They look like the following:
Table 1 ( = df)
index | year | country
----------------------
0     | 1970 | NL
1     | 1970 | UK
2     | 1980 | US
3     | 1990 | NL
4     | 1990 | US
Table 2 (= df_gdp)
cntry | 1970 | 1980 | 1990
--------------------------
NL    | 5    | 3    | 0
UK    | 1    | 7    | 1
US    | 9    | 2    | 0
The result should be Table 1 with an additional column 'GDP'. The values of Table 1's year and country columns should be used to look up the value in Table 2. So the result would be:
index | year | country | GDP
----------------------------
0     | 1970 | NL      | 5
1     | 1970 | UK      | 1
2     | 1980 | US      | 2
3     | 1990 | NL      | 0
4     | 1990 | US      | 0
I already wrote the function with .iterrows(), but as expected, this does not have good performance. Instead, I'm wondering whether the result can also be achieved with either .join() or .merge(). What I do not understand is how to merge/join based on the index (cntry) and a changing column (the year). The .iterrows() code looks like the following:

# Add GDP data
for index, row in df.iterrows():
    gdp_year = str(df.iloc[index].year)
    gdp_country = str(df.iloc[index].country)
    try:
        df.at[index, 'GDP'] = df_gdp.loc[gdp_country][gdp_year]
    except:
        df.at[index, 'GDP'] = 0
df
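For what it's worth, one common approach is to reshape the GDP table from wide to long and then do a single .merge(). A minimal sketch, assuming cntry is the index of df_gdp and that its year column labels need converting to integers to match df.year:

gdp_long = df_gdp.reset_index().melt(id_vars="cntry", var_name="year", value_name="GDP")
gdp_long["year"] = gdp_long["year"].astype(int)  # match the dtype of df.year
df = (df.merge(gdp_long, how="left",
               left_on=["country", "year"], right_on=["cntry", "year"])
        .drop(columns="cntry"))
df["GDP"] = df["GDP"].fillna(0)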
-
count number of a specific string in entire data frame in Pandas and add its value in a new column
I have a 5-column data frame. I need to find how many times each element of the first column (A) is repeated anywhere in the data frame and put that count next to the element in a new column (F). For example, 'a' in column A is repeated five times in the entire data frame, so I need to create column F and put 5 in the corresponding cell of row zero, and so on. I am a newbie to Python and would appreciate your support.
Below is the original data frame:
A  B  C  D  E
a  -
b  a  -
c  a  -
d  b  a  -
e  d  b  a  -
Preferred data frame would be:
A  B  C  D  E  F
a  -              5
b  a  -           3
c  a  -           1
d  b  a  -        2
e  d  b  a  -     1
So far I have written the lines below, but I could not create the new column holding the counts.
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e'},
                   'B': {0: '-', 1: 'a', 2: 'a', 3: 'b', 4: 'd'}})
df['C'] = np.where(df['B'].isin(df['A'].values), df['B'], np.nan)
df['C'] = df['C'].map(dict(zip(df.A.values, df.B.values)))
df['D'] = np.where(df['C'].isin(df['B'].values), df['C'], np.nan)
df['D'] = df['D'].map(dict(zip(df.B.values, df['C'].values)))
df['E'] = np.where(df['D'].isin(df['C'].values), df['D'], np.nan)
df['E'] = df['E'].map(dict(zip(df['C'].values, df['D'].values)))

for cell in df['A']:
    print(cell)
    m = df.eq(cell).sum()
    # pd.DataFrame([m.values], columns=m.index)
    dep = sum(m)
    print(dep)

print(df)
And below is the output of the above code:
a
5
b
3
c
1
d
2
e
1
A  B  C  D  E
a  -
b  a  -
c  a  -
d  b  a  -
e  d  b  a  -
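For reference, a minimal sketch of just the counting step: stack the whole frame into one Series, count every value, and map the counts for column A into a new column F.

# Count each value across the entire frame, then attach the counts as F
counts = df.stack().value_counts()
df['F'] = df['A'].map(counts).fillna(0).astype(int)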
-
Create a table of multiple mini barplots in Python
I am trying to reproduce the table of barplots (created below in Tableau) in Python.
I am struggling to figure out how to do it in Python (matplotlib, seaborn, pandas).
Here is some example data that illustrates the problem:
import pandas as pd
import numpy as np

data = dict(
    correlation=np.random.normal(0, 0.5, 100),
    pvalue=np.random.normal(0.05, 0.02, 100),
    variable=np.tile(np.array([f"variable{i}" for i in range(10)]), 10),
    model=np.repeat(np.array([f"model{i}" for i in range(10)]), 10)
)
data = pd.DataFrame(data)
data["significant"] = data["pvalue"] < 0.05
data["positive"] = data["correlation"] > 0
My attempted plot for ONE MODEL ("model1") illustrates roughly what I am looking for. As I said, I am trying to reproduce the table of barplots (shown above), which would display the results for all of the models.

example_plot_data = data.loc[data.model == "model1"]
example_plot_data.plot(
    kind="bar",
    x="variable",
    y="correlation",
    color=example_plot_data.positive.map({True: "b", False: "r"}),
    rot=70,
)
plt.show()
Ideally, these are the aesthetics I am looking for:
# Aesthetics for the plots:
columns = data["model"]
rows = data["variable"]
bar_size = data["correlation"]          # ideally centred on zero
bar_color = data["correlation"]         # ideally centred on zero (RdBu)
bar_significance = data["significant"]
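For what it's worth, a minimal matplotlib sketch that draws one small bar chart per model in a single row of subplots, using the data generated above; the row-per-variable layout and RdBu colouring would still need to be layered on top:

import matplotlib.pyplot as plt

models = sorted(data["model"].unique())
fig, axes = plt.subplots(1, len(models), figsize=(3 * len(models), 3), sharey=True)
for ax, model in zip(axes, models):
    sub = data[data["model"] == model]
    ax.bar(sub["variable"], sub["correlation"],
           color=sub["positive"].map({True: "b", False: "r"}))
    ax.set_title(model)
    ax.tick_params(axis="x", rotation=70)
plt.tight_layout()
plt.show()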
-
Cannot execute DROP EXTENSION in a read-only transaction (drop extension if exists google_insights)
I'm using Google Cloud SQL (Postgres) and created a read replica for my DB.
Now I see errors like this in the logs:
2021-01-16 12:02:46.393 UTC [93149]: [9-1] db=cloudsqladmin,user=cloudsqladmin ERROR: cannot execute DROP EXTENSION in a read-only transaction
2021-01-16 12:02:46.393 UTC [93149]: [10-1] db=cloudsqladmin,user=cloudsqladmin STATEMENT: drop extension if exists google_insights;
These errors repeat constantly - exactly 120 errors every single hour.
As I understand it, Google Cloud tries to drop one of its custom Postgres extensions and can't, because the replica is read-only.
Does anyone know why it happens and how to fix that?
-
postgresql: time stored as text. how to query with respect to time
I have a table called device with the following column types:
device table:

column   Type
id       integer
created  text
name     text
Here the time is stored as text type instead of timestamp.

E.g. a created value: 12/19/2020 20:40:23
I try to query with this datetime:
SELECT "device"."id", "device"."created", "device"."name", FROM "device" WHERE "device"."created" < '12/19/2020 20:40:23' LIMIT 21
The results are not in datetime order; it seems to be comparing plain text strings.
So what is the best solution in this case to query the data by time, even though it is stored as text?
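For reference, PostgreSQL's to_timestamp(text, format) can parse the stored strings on the fly, so the comparison happens on real timestamps rather than text. A minimal SQLAlchemy sketch with a hypothetical connection string, assuming the MM/DD/YYYY HH24:MI:SS layout shown above:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost/mydb")  # hypothetical connection string
query = text("""
    SELECT id, created, name
    FROM device
    WHERE to_timestamp(created, 'MM/DD/YYYY HH24:MI:SS')
        < to_timestamp(:cutoff, 'MM/DD/YYYY HH24:MI:SS')
    ORDER BY to_timestamp(created, 'MM/DD/YYYY HH24:MI:SS')
    LIMIT 21
""")
with engine.connect() as conn:
    rows = conn.execute(query, {"cutoff": "12/19/2020 20:40:23"}).fetchall()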
-
failed to retrieve server_version_num: closed [PostgreSQL error]
I'm unable to run the migrations. Command:
sudo kong migrations bootstrap -c /etc/kong/kong.conf.default --vv
Error:
/usr/local/share/lua/5.1/kong/cmd/migrations.lua:93: [PostgreSQL error] failed to retrieve server_version_num: closed
stack traceback:
  [C]: in function 'assert'
  /usr/local/share/lua/5.1/kong/cmd/migrations.lua:93: in function 'cmd_exec'
  /usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:87>
  [C]: in function 'xpcall'
  /usr/local/share/lua/5.1/kong/cmd/init.lua:87: in function </usr/local/share/lua/5.1/kong/cmd/init.lua:44>
  /usr/local/bin/kong:7: in function 'file_gen'
  init_worker_by_lua:50: in function <init_worker_by_lua:48>
  [C]: in function 'xpcall'
  init_worker_by_lua:57: in function <init_worker_by_lua:55>
-
UndefinedError: 'getNumberOfItems' is undefined
I'm trying to call a function from views.py in my HTML file using Flask, but it just comes up with an undefined error. This is for my shopping cart: the function returns the number of items in it, and it's meant to be shown on the home page without having to click a button. Is this possible?
HTML line:
<a href="cart" class="single-icon"><i class="ti-bag"></i> <span class="total-count">{{ getNumberOfItemsInCart() }}</span></a>
Python:
@app.route("/index", methods=['GET','POST']) def getNumberOfItemsInCart(): if current_user.is_authenticated: user_ID = session['userID'] num = Cart.query.filter_by(userID=user_ID).all() q = 0 for x in num: numOfItems = q + x.quantity return numOfItems else: numOfItems = 0 return numOfItems return render_template('index.html', num=num)
-
How to backfill an incrementing id using alembic in postgres
I have a Flask app backed by a Postgres database using Flask-SQLAlchemy. I've been using Miguel Grinberg's Flask-Migrate to handle migrations, although I've come to realize that since it is a wrapper on top of Alembic, I'm best served by framing my question in terms of Alembic.
The problem is that I have an association table that I forgot to add a unique id to.
Here is my class for the table with the new column. But I have some records in my database, so trying to run the default migration script of course gives me the "column cannot contain nullable values" error.
class HouseTurns(db.Model):
    __tablename__ = 'house_turns'
    __table_args__ = {'extend_existing': True}
    id = db.Column(db.Integer, primary_key=True)  # the new column I want to add
    user_id = db.Column(db.Integer, db.ForeignKey("users.id"), primary_key=True)
    house_id = db.Column(db.Integer, db.ForeignKey("houses.id"), primary_key=True)
    movie_id = db.Column(db.Integer, db.ForeignKey("movies.id"), primary_key=True)
    created_at = db.Column(db.DateTime, default=db.func.current_timestamp())

    user = db.relationship(User, lazy="joined")
    house = db.relationship(House, lazy="joined")
    movie = db.relationship(Movie, lazy="joined")
And here's the migration script generated by Alembic:
def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('house_turns', sa.Column('id', sa.Integer(), nullable=False))
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_column('house_turns', 'id')
    # ### end Alembic commands ###
I am really at a loss for how to write a migration that backfills the ids for the existing records with unique values. They don't necessarily need to be unique ids, just incrementing integers.
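For what it's worth, a common pattern is to add the column as nullable, backfill it, then tighten the constraint. A minimal sketch of the upgrade step, assuming the existing composite key identifies each row and created_at gives an acceptable order for the incrementing values:

def upgrade():
    # 1. add the column without NOT NULL so existing rows are still valid
    op.add_column('house_turns', sa.Column('id', sa.Integer(), nullable=True))
    # 2. backfill incrementing integers for the existing records
    op.execute("""
        UPDATE house_turns SET id = numbered.rn
        FROM (SELECT user_id, house_id, movie_id,
                     ROW_NUMBER() OVER (ORDER BY created_at) AS rn
              FROM house_turns) AS numbered
        WHERE house_turns.user_id = numbered.user_id
          AND house_turns.house_id = numbered.house_id
          AND house_turns.movie_id = numbered.movie_id
    """)
    # 3. now the column can be made NOT NULL
    op.alter_column('house_turns', 'id', nullable=False)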
-
SqlAlchemy declarative architecture supporting queries for both base class and subclass?
I have a SQLAlchemy declarative base class from which more complex classes are derived, but I also use, and will query for, "plain" instances of the base class itself. My current approach doesn't quite sit right with me; I feel like I'm missing a simpler approach.
What I have now, as below, is a base class (e.g., GenericFoo) from which I derive what, outside the SQLAlchemy context, would be a redundant "wrapper" class (SpecificFoo) that does nothing but put instances in a different table, so that I can query that table for the "plain" instances. The more complex subclass is a straightforward extension, as shown below.

I guess this works, but it feels... wasteful to add a separate table for this. An alternative would be to somehow filter a GenericFoo query to grab only "plain" instances that have no corresponding entry in the ChildOfFoo table, but that feels a little hacky. This seems like a fairly typical situation, but I haven't found anything describing a standard approach to this. Is there one?

from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class GenericFoo(Base):
    __tablename__ = 'generic_foo'
    __mapper_args__ = {'polymorphic_identity': 'generic_foo'}

    def __init__(self, name, color):
        self.name = name
        self.color = color


# This table only exists to query for things that aren't ChildOfFoo.
class SpecificFoo(GenericFoo):
    __tablename__ = 'specific_foo'
    __mapper_args__ = {'polymorphic_identity': 'specific_foo'}


class ChildOfFoo(GenericFoo):
    __tablename__ = 'child_of_foo'
    __mapper_args__ = {'polymorphic_identity': 'child_of_foo'}

    def __init__(self, name, color, age):
        super(ChildOfFoo, self).__init__(name, color)
        self.age = age
Thanks!
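For what it's worth, a common alternative to the extra SpecificFoo table is single-table inheritance with a discriminator column: "plain" rows and subclass rows share one table and are told apart by that column in queries. A minimal sketch (the kind column name is illustrative):

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class GenericFoo(Base):
    __tablename__ = 'generic_foo'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    color = Column(String)
    kind = Column(String)  # discriminator column
    __mapper_args__ = {
        'polymorphic_identity': 'generic_foo',
        'polymorphic_on': kind,
    }

class ChildOfFoo(GenericFoo):
    # no __tablename__, so these rows live in the generic_foo table
    age = Column(Integer)
    __mapper_args__ = {'polymorphic_identity': 'child_of_foo'}

# session.query(GenericFoo) loads every row polymorphically; only the
# "plain" instances:
#   session.query(GenericFoo).filter(GenericFoo.kind == 'generic_foo')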