Pandas bracket assignment getitem/setattr
I'm curious how the actual pandas getattr and setattr functionality works under the hood
import pandas
df = pandas.DataFrame([1,2,3])
df['test'] # throws a KeyError
df['test'] = 1 # no error
It looks like some combination of getitem and setattr is being used in the assignment line df['test'] = 1. Can anyone explain how the flow works here? I don't understand how getitem and setattr could work together to assign a new column. I would have thought df['test'] returns some kind of assignment object with a setattr hook, but that doesn't seem to be the case.
Edit:
From looking at the source code, it looks like the __getattribute__ hook is being used, which is different from __getattr__. I'm still not quite sure of the underlying logic after looking at the source.
Edit2:
I put some print statements in the pandas source code, and it looks like __getitem__ and __setattr__ aren't being called in the df['test'] = 1 line of code. Perhaps the DataFrame class inherits from dict?
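For what it's worth, the hook behind df['test'] = 1 is __setitem__, which would explain why prints in __getitem__ and __setattr__ never fire. Python turns obj[key] = value into a call to type(obj).__setitem__(obj, key, value) and never evaluates obj[key] first, so no dict inheritance is needed. Here is a minimal sketch of that dispatch; TinyFrame is a hypothetical toy class and says nothing about how pandas stores columns internally.

```python
class TinyFrame:
    """Hypothetical toy container; not how pandas stores columns."""

    def __init__(self):
        self._columns = {}
        self.calls = []  # records which hooks Python invokes

    def __getitem__(self, key):
        self.calls.append(("__getitem__", key))
        return self._columns[key]  # raises KeyError for a missing column

    def __setitem__(self, key, value):
        self.calls.append(("__setitem__", key))
        self._columns[key] = value


frame = TinyFrame()
frame["test"] = 1      # dispatches to TinyFrame.__setitem__ only
value = frame["test"]  # dispatches to TinyFrame.__getitem__
print(frame.calls)
```

Reading a missing key goes through __getitem__ and raises KeyError, while assigning to it goes through __setitem__ and creates the entry, which matches the behaviour in the question.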
See also questions close to this topic
-
Sparse Matrix Creation : KeyError: 579 for text datasets
I am trying to use the make_sparse_matrix function to create a sparse matrix for my text dataset, and I get KeyError: 579. Does anyone have any leads on the root cause of the error?
def make_sparse_matrix(df, indexed_words, labels):
    """
    Returns sparse matrix as dataframe.

    df: A dataframe with words in the columns with a document id as an index (X_train or X_test)
    indexed_words: index of words ordered by word id
    labels: category as a series (y_train or y_test)
    """
    nr_rows = df.shape[0]
    nr_cols = df.shape[1]
    word_set = set(indexed_words)
    dict_list = []

    for i in range(nr_rows):
        for j in range(nr_cols):
            word = df.iat[i, j]
            if word in word_set:
                doc_id = df.index[i]
                word_id = indexed_words.get_loc(word)
                category = labels.at[doc_id]

                item = {'LABEL': category, 'DOC_ID': doc_id,
                        'OCCURENCE': 1, 'WORD_ID': word_id}
                dict_list.append(item)

    return pd.DataFrame(dict_list)

make_sparse_matrix(X_train, word_index, y_test)
X_train is a DataFrame that contains a single word in each cell, word_index is the index of all words, and y_test stores the labels.
The KeyError I am facing is:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\New folder\envs\geo_env\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 579

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
in
in make_sparse_matrix(df, indexed_words, labels)
     20                 doc_id = df.index[i]
     21                 word_id = indexed_words.get_loc(word)
---> 22                 category = labels.at[doc_id]
     23
     24                 item = {'LABEL': category, 'DOC_ID': doc_id,

~\New folder\envs\geo_env\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   2154             return self.obj.loc[key]
   2155
-> 2156         return super().__getitem__(key)
   2157
   2158     def __setitem__(self, key, value):

~\New folder\envs\geo_env\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   2101
   2102         key = self._convert_key(key)
-> 2103         return self.obj._get_value(*key, takeable=self._takeable)
   2104
   2105     def __setitem__(self, key, value):

~\New folder\envs\geo_env\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963

~\New folder\envs\geo_env\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083
   3084         if tolerance is not None:

KeyError: 579
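The traceback points at labels.at[doc_id], so one plausible reading is that document id 579 exists in the index of X_train but not in the index of the labels passed in (y_test in the call above). The sketch below checks for that mismatch with Index.difference. The tiny frames are made-up stand-ins for the real data, and y_train is assumed to be the Series that matches X_train.

```python
import pandas as pd

# Made-up stand-ins: training rows carry ids 0 and 579, test labels cover other ids.
X_train = pd.DataFrame({"w0": ["free", "hello"]}, index=[0, 579])
y_train = pd.Series([1, 0], index=[0, 579])
y_test = pd.Series([1], index=[3])

# Ids present in the frame but absent from the labels would make
# labels.at[doc_id] raise KeyError inside make_sparse_matrix.
missing_in_test = X_train.index.difference(y_test.index)
missing_in_train = X_train.index.difference(y_train.index)

print(list(missing_in_test))
print(list(missing_in_train))
```

If the first list is non-empty and the second is empty on the real data, passing the labels that belong to the same split as df would be the thing to try.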
-
Finding part of string in list of strings
GCM = ([519,520,521,522,533],[534,525],[526,527,530,531], [4404])
slice = int(str(df["CGM"][row_count])[:3])
I am looking through a row in a CSV file and taking out the number I want. I want the number that starts with one of the numbers I have in GCM, since they represent info I want in other columns. This has worked fine with slicing because all the numbers I wanted started with 3 digits. Now that I need to look for any number that starts with 4404, and later on will probably need to look for 57052, slicing no longer works. Is there a way that, instead of slicing and comparing to the list, I can take a 5-digit number and see if part of it is in the list, preferably by matching 3 or more leading digits? The real point of that part of the code is finding out which list in GCM the number belongs to. It needs to be able to take the number 44042 and know that the part of it I care about is in GCM[3], but on the other hand I do not want it to say that 32519 is in GCM[0], since I only care about numbers that start with 519, not ones that end with it.
P.S. I am Norwegian and have been learning programming by myself. It's been some long nights, so something here may be lost in translation.
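One way to read the requirement is as a prefix test: convert the number to a string and ask whether it starts with any entry of each list. The sketch below does that; find_group is a hypothetical helper name, and it assumes the first matching list is the one wanted.

```python
GCM = ([519, 520, 521, 522, 533], [534, 525], [526, 527, 530, 531], [4404])


def find_group(number, groups):
    """Return the index of the first group holding a prefix of `number`, else None."""
    digits = str(number)
    for index, group in enumerate(groups):
        # startswith only matches leading digits, so 32519 does not match 519.
        if any(digits.startswith(str(prefix)) for prefix in group):
            return index
    return None


print(find_group(44042, GCM))
print(find_group(32519, GCM))
```

Because the comparison is on strings, prefixes of any length (519, 4404, 57052) can sit in the same lists without changing the function.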
-
How to forecast a time series out-of-sample using an ARIMA model in Python?
I have seen similar questions on Stack Overflow, but either the questions were different enough or, if similar, they have not actually been answered. I gather it is something that modelers run into often and have trouble solving.
In my case I am using two variables, one Y and one X, with 50 sequential time series observations. They are both random numbers representing % changes (they could be anything you want; their true value does not matter. This is just to set up an example of my coding problem). Here is my basic code to build this ARIMAX(1,0,0) model.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_excel('/Users/gaetanlion/Google Drive/Python/Arima/df.xlsx', sheet_name = 'final')

from statsmodels.tsa.arima_model import ARIMA

endo = df['y']
exo = df['x']
Next, I build the ARIMA model, using the first 41 observations
modelho = sm.tsa.arima.ARIMA(endo.loc[0:40], exo.loc[0:40], order =(1,0,0)).fit()
print(modelho.summary())
So far everything works just fine.
Next, I attempt to forecast or predict the next 9 observations out-of-sample. Here I want to use the X values over these 9 observations to predict Y. And, I just can't do it. I am showing below just the one code, that I think gets me the closest to where I need to go.
modelho.predict(exo.loc[41:49], start = 41, end = 49, dynamic = False)

TypeError: predict() got multiple values for argument 'start'
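The TypeError itself is plain Python argument binding: the first positional argument is bound to start, and then start = 41 arrives a second time as a keyword. The sketch below reproduces that with a stand-in function, not the statsmodels method. Its simplified signature is an assumption modelled on the keyword names used in the call above, and the exog keyword reflects how, as far as I know, the statsmodels results objects expect out-of-sample regressors to be passed.

```python
def predict(start=None, end=None, dynamic=False, **kwargs):
    """Hypothetical stand-in with a simplified, predict-like signature."""
    return {"start": start, "end": end, "exog": kwargs.get("exog")}


future_x = [0.1] * 9  # made-up regressor values for the 9 out-of-sample periods

try:
    # The first positional argument binds to `start`, clashing with start=41.
    predict(future_x, start=41, end=49, dynamic=False)
except TypeError as error:
    message = str(error)
    print(message)

# Passing the regressors by keyword avoids the clash.
result = predict(start=41, end=49, dynamic=False, exog=future_x)
```

On the real model, the analogous thing to try would be modelho.predict(start=41, end=49, exog=exo.loc[41:49]); that is untested against the data in the question.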
-
How to extract rows from a dataframe that contain only certain values
I have this data set:
| Country | Languages Spoken |
| Afghanistan | Dari Persian, Pashtu (both official), other Turkic and minor languages |
| Algeria | Arabic (official), French, Berber dialects |
| Andorra | Catalán (official), French, Castilian, Portuguese |
| Angola | Portuguese (official), Bantu and other African languages |
| Antigua and Barbuda | English (official), local dialects |
| Australia | English 79%, native and other languages |
and I want to extract all the English-speaking countries. I think the easiest way would be to extract all the countries that have the word 'English' in the languages. Ideally I want a new dataframe with a column "english speaking" holding True or False values.
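A minimal sketch of that idea, assuming the column is named "Languages Spoken" as in the table and that a plain substring match on 'English' is good enough. Series.str.contains builds the boolean column, and the same mask can then filter the rows. Only four of the rows are reproduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["Afghanistan", "Algeria", "Antigua and Barbuda", "Australia"],
    "Languages Spoken": [
        "Dari Persian, Pashtu (both official), other Turkic and minor languages",
        "Arabic (official), French, Berber dialects",
        "English (official), local dialects",
        "English 79%, native and other languages",
    ],
})

# True where the text mentions English; na=False treats missing values as no match.
df["english speaking"] = df["Languages Spoken"].str.contains("English", na=False)

# The boolean column doubles as a row filter.
english_countries = df.loc[df["english speaking"], "Country"].tolist()
print(english_countries)
```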
-
The pandas value error still shows, but the code is totally correct and it loads normally the visualization
I was tempted to use pd.options.mode.chained_assignment = None, but I want the code to be free of errors.
My starting code:
import datetime
import altair as alt
import operator
import pandas as pd

s = pd.read_csv('../../data/aparecida-small-sample.csv', parse_dates=['date'])
city = s[s['city'] == 'Aparecida']
Based on @dpkandy's code:
city['total_cases'] = city['totalCases']
city['total_deaths'] = city['totalDeaths']
city['total_recovered'] = city['totalRecovered']

tempTotalCases = city[['date','total_cases']]
tempTotalCases["title"] = "Confirmed"

tempTotalDeaths = city[['date','total_deaths']]
tempTotalDeaths["title"] = "Deaths"

tempTotalRecovered = city[['date','total_recovered']]
tempTotalRecovered["title"] = "Recovered"

temp = tempTotalCases.append(tempTotalDeaths)
temp = temp.append(tempTotalRecovered)

totalCases = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_cases:Q', title = None))
totalDeaths = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_deaths:Q', title = None))
totalRecovered = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_recovered:Q', title = None))

(totalCases + totalRecovered + totalDeaths).encode(color=alt.Color('title', scale = alt.Scale(range = ['#106466','#DC143C','#87C232']), legend = alt.Legend(title="Legend colour"))).properties(title = "Cumulative number of confirmed cases, deaths and recovered", width = 800)
This code works perfectly and the visualization loads normally, but pandas still shows the error, asking me to try using .loc[row_indexer,col_indexer] = value instead. I then read the documentation section "Returning a view versus a copy" linked in the message and also tried the code below, but it still shows the same error. Here is the code with loc:
# 1st attempt
tempTotalCases.loc["title"] = "Confirmed"
tempTotalDeaths.loc["title"] = "Deaths"
tempTotalRecovered.loc["title"] = "Recovered"

# 2nd attempt
tempTotalCases["title"].loc = "Confirmed"
tempTotalDeaths["title"].loc = "Deaths"
tempTotalRecovered["title"].loc = "Recovered"
Here is the error message:
<ipython-input-6-f16b79f95b84>:6: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tempTotalCases["title"] = "Confirmed"
<ipython-input-6-f16b79f95b84>:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tempTotalDeaths["title"] = "Deaths"
<ipython-input-6-f16b79f95b84>:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tempTotalRecovered["title"] = "Recovered"
Jupyter and Pandas version:
$ jupyter --version
jupyter core     : 4.7.1
jupyter-notebook : 6.3.0
qtconsole        : 5.0.3
ipython          : 7.22.0
ipykernel        : 5.5.3
jupyter client   : 6.1.12
jupyter lab      : 3.1.0a3
nbconvert        : 6.0.7
ipywidgets       : 7.6.3
nbformat         : 5.1.3
traitlets        : 5.0.5

$ pip show pandas
Name: pandas
Version: 1.2.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/gus/PUC/.env/lib/python3.9/site-packages
Requires: pytz, python-dateutil, numpy
Required-by: ipychart, altair
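The message is a warning rather than an error, and it typically appears because city and the temp frames are themselves slices of another DataFrame, so pandas cannot tell whether the assignment is meant to reach the original. A common way to make the intent explicit is to take a copy when filtering, or to build the new column with assign, which returns a new frame. The sketch below uses made-up rows in place of the CSV and is not the only way to do it.

```python
import pandas as pd

# Made-up rows standing in for the CSV in the question.
s = pd.DataFrame({
    "date": pd.to_datetime(["2021-04-01", "2021-04-02", "2021-04-01"]),
    "city": ["Aparecida", "Aparecida", "Elsewhere"],
    "totalCases": [10, 12, 7],
})

# .copy() makes `city` an independent frame, so adding columns is unambiguous.
city = s[s["city"] == "Aparecida"].copy()
city["total_cases"] = city["totalCases"]

# assign() returns a new frame instead of writing into a slice of `city`.
temp_total_cases = city[["date", "total_cases"]].assign(title="Confirmed")
print(temp_total_cases)
```

The two loc attempts in the question do something different from what the warning suggests: .loc["title"] = ... addresses a row labelled "title", and ["title"].loc = ... rebinds an attribute, so neither adds the column in the recommended way.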
-
python more generic solution for using getattr
Say I have a test file with the following content:
def a():
    print('this is a')

def b(x):
    print(x)
and also a main file:
import test

def do_command(cmd, params):
    try:
        getattr(test, cmd)(params)
    except Exception as error:
        print(error)

while True:
    cmd = input('Enter cmd')
    params = input('Enter params')
    do_command(cmd, params)
The purpose of the code is to call a function from a different file, with the user giving the function name and, if needed, params for it to take. If the value of cmd is 'a' and params is a random string, do_command will not work because function a doesn't take params. However, if cmd is 'b' and params is, say, '5', it will work. How do I get around that without forcing a to take params it never uses?
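One possible approach is to inspect the target function before calling it and pass params only when it declares a parameter. The sketch below uses inspect.signature from the standard library. call_by_name is a hypothetical helper, and a SimpleNamespace stands in for the imported module so the example is self-contained.

```python
import inspect
from types import SimpleNamespace


def a():
    return "this is a"


def b(x):
    return x


# Stand-in for the imported module; getattr works the same way on a real module.
functions = SimpleNamespace(a=a, b=b)


def call_by_name(namespace, cmd, params):
    """Look up `cmd` and pass `params` only if the function accepts an argument."""
    func = getattr(namespace, cmd)
    if inspect.signature(func).parameters:
        return func(params)
    return func()


print(call_by_name(functions, "a", "ignored"))
print(call_by_name(functions, "b", "5"))
```

This only distinguishes zero parameters from at least one; functions with several required parameters would need the input split up before the call.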
-
Can't parse bs4 src attribute using the getattr() function
I've created a script to parse two fields from every movie container from a webpage. The script is doing fine.
I'm trying to use the getattr() function to scrape text and src from two fields, movie_name and image_link. For movie_name, it works. However, it fails when I try to parse image_link.
There is a function, currently commented out, which works when I uncomment it. However, my goal here is to make use of getattr() to parse src.
import requests
from bs4 import BeautifulSoup

url = "https://yts.am/browse-movies"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
}

# def get_information(url):
#     res = requests.get(url,headers=headers)
#     soup = BeautifulSoup(res.text,'lxml')
#     for row in soup.select(".browse-movie-wrap"):
#         movie_name = row.select_one("a.browse-movie-title").text
#         image_link = row.select_one("img.img-responsive").get("src")
#         yield movie_name,image_link

def get_information(url):
    res = requests.get(url,headers=headers)
    soup = BeautifulSoup(res.text,'lxml')
    for row in soup.select(".browse-movie-wrap"):
        movie_name = getattr(row.select_one("a.browse-movie-title"),"text",None)
        image_link = getattr(row.select_one("img.img-responsive"),"src",None)
        yield movie_name,image_link

if __name__ == '__main__':
    for items in get_information(url):
        print(items)
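A note on the getattr() call in the function above: text is a real Python attribute of a bs4 tag, but src is an HTML attribute, and those live in the tag's attrs dictionary. Attribute-style access such as tag.src makes BeautifulSoup look for a child tag named src, which yields None here. The sketch below shows one way to keep getattr() in the picture by fetching attrs with it. FakeTag is a hypothetical stand-in that mimics only this behaviour so the example runs without a network call, and html_attr is a made-up helper name.

```python
class FakeTag:
    """Hypothetical stand-in for a bs4 Tag: HTML attributes live in .attrs."""

    def __init__(self, text, **attrs):
        self.text = text
        self.attrs = attrs

    def __getattr__(self, name):
        # Mimics bs4, where unknown attribute access is a child-tag lookup.
        return None


def html_attr(tag, name, default=None):
    """Read an HTML attribute, tolerating tag being None (no match found)."""
    return getattr(tag, "attrs", {}).get(name, default)


image = FakeTag("", src="poster.jpg")
print(getattr(image, "src", None))  # child-tag lookup, not the HTML attribute
print(html_attr(image, "src"))
print(html_attr(None, "src"))
```

With a real tag the same helper would read row.select_one("img.img-responsive") and return its src, falling back to None when select_one finds nothing.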
How can I scrape src using the getattr() function?
-
How can I get the class name from within a class method - and use this classname in __setattr__ for verification
I have the following code:
class A:
    @classmethod
    def my_class(cls):
        return cls.__name__

    myclass = my_class()
    conc = "_" + str(myclass) + "__x"

    def __init__(self, x):
        self.__x = x

    def __setattr__(self, attr, value):
        if attr == conc:
            self.__dict__[attr] = value
        else:
            raise AttributeError

    def sum(self, y):
        print(self.__x + y)
but I got
TypeError: 'classmethod' object is not callable
How can I combine the setattr validation with the name of the current class so that I can use the method of class A when inheriting?
class B(A):
    def __init__(self, x):
        super().__init__(x)
        self.__x = x

b = B(10)
b.sum(10)
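The class name is not available while the class body is still executing, which is why calling the classmethod there fails. Inside __setattr__, though, type(self) is known at call time. Name mangling also means self.__x is stored as _A__x when written inside A and as _B__x when written inside B, so a check that should survive inheritance has to accept both. The sketch below builds the allowed names from the instance's MRO. It is one possible design and not the only way to validate attributes.

```python
class A:
    def __init__(self, x):
        self.__x = x  # stored as _A__x through name mangling

    def __setattr__(self, attr, value):
        # Accept the mangled name for any class in the instance's MRO.
        allowed = {f"_{cls.__name__}__x" for cls in type(self).__mro__}
        if attr not in allowed:
            raise AttributeError(f"cannot set {attr!r}")
        super().__setattr__(attr, value)

    def sum(self, y):
        return self.__x + y  # reads _A__x, set by A.__init__


class B(A):
    def __init__(self, x):
        super().__init__(x)
        self.__x = x  # stored as _B__x


b = B(10)
print(b.sum(10))
```

Note that sum in A always reads _A__x, so the extra self.__x = x in B creates a second, separate attribute and does not update the value sum uses.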
-
How do I add new attributes to a class per conditional statements and variables in the `__new__` or `__init__` method?
UPDATE, About 3 hours later....
Ok, so, @khelwood pointed out I could make an EMPTY CLASS, (Not necessarily a MetaClass) something like:
class ElBLANKO():
    pass

class Container():
    __drum = ElBLANKO()

    def __init__(self, arg=None):
        if isinstance(arg, (tuple, list, set)) and len(arg) >= 2:
            drumqtynames = ("One", "Two", "Thr", "Four")
            for c in range(0, len(arg)):
                if isinstance(arg[c], SomeOtherClass):
                    if not hasattr(self.__drum, drumqtynames[c]):
                        setattr(self.__drum, drumqtynames[c], arg[c])
        ...
        ...
        ...
And to use that empty class as a basis for self.__drum, upon which to add sub-attribs (again, depending upon how many elements and sub-elements I have in the arg input).
Now, I'm looking at this again, and I'm wondering: Okay, I want self.__drum to be able to take a SomeOtherClass object when there's only 1 element (be it another tuple rep'd by a variable, etc... or a SomeOtherClass object) in the arg, and again, be able to take multiple SUB-attributes, each assigned a SomeOtherClass, depending on the number of elements in the arg.
So... whether it's moving the first mention of __drum to the next line under __init__ and declaring it something ELSE when there's only 1 element in the arg, or maybe leaving it where it is, the objective is still the same:
>>> t = Container((SomeOtherClass(a,b),(c,d),(f,g)))
>>> u = Container((a,b))
>>> t.showdrum.One.attrib1
5
>>> t.showdrum.One.attrib2
69
>>> t.showdrum.Two.attrib1
1
>>> t.showdrum.Two.attrib2
45
>>> t.showdrum.Thr.attrib1
1
>>> t.showdrum.Thr.attrib2
4
>>> u.showdrum.attrib1  # because there was only 1 elem in its arg's tuple
5
>>> u.showdrum.attrib2
69
Anyway, that's my goal, and if I can achieve THAT, then maybe making an EMPTY CLASS would be the ticket. I hope it made this more clear to you, but, if this is STILL clear as mud... then I don't know how to explain it to you, then.
Original Post
I have been trying to write a Python 3.8.8 class that, depending on how many tuples, lists, sets, or custom object classes are in a given argument during the __init__ process, will add that attribute programmatically to the given object being created.
Here's my sample code:
class Container(object):
    __drum = 0

    def __init__(self, arg=None):
        if isinstance(arg, (tuple, list, set)) and len(arg) >=2:
            drumqty = ("1", "2", "3", "4")
            for c in range (0,len(arg)):
                if isinstance(arg[c], someclass):
                    if not hasattr(self.__drum, drumqty[c]):
                        setattr(self.__drum, drumqty[c], arg[c])
                elif isinstance(arg[c], (tuple, list, set)) and len(arg[c]) == 2:
                    if not hasattr(self.__drum, drumqty[c]):
                        setattr(self.__drum, drumqty[c], someclass(arg[c]))
                elif isinstance(arg[c], (int, float, num)):
                    break
        elif (isinstance(arg, (tuple, list, set)) and len(arg) == 2) or isinstance(arg, someclass):
            if isinstance(arg, someclass):
                self.__drum = arg
            else:
                self.__drum = someclass(arg)
Now, I know I can't necessarily do this, just like that. It throws an exception:
TypeError: can't set attributes of built-in/extension type 'object'
So, I know I have to figure out a way to use a MetaClass, somehow. My end goal for this Container class object is:
If it's got just a 2-tuple, 2-element list, or 2-element set, or it's a SomeClass object I'm passing in:
self.__drum.attribute1
self.__drum.attribute2
Or, if "arg" is a tuple, list, or set of whatever length, made up of 2-tuples, etc., or a someclass:
self.__drum.1.attribute1
self.__drum.1.attribute2
self.__drum.2.attribute1
self.__drum.2.attribute2
etc.....
So, do I put .__drum in the metaclass, and init the self in the then-SUBclass Container (under the MetaClass) under its own __init__ method with a super() and then keep going?
Do I define a __new__ method in the subclass, and adjust it from there?
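For what it's worth, neither a metaclass nor __new__ seems necessary for this goal: a plain instance attribute created in __init__ can hold either one wrapped object or a namespace of named sub-objects. The sketch below uses types.SimpleNamespace for the empty-container role. Pair is a hypothetical stand-in for SomeOtherClass, showdrum is taken from the desired session above, and sets are left out because their iteration order is not defined.

```python
from types import SimpleNamespace


class Pair:
    """Hypothetical stand-in for SomeOtherClass: wraps a 2-element sequence."""

    def __init__(self, values):
        self.attrib1, self.attrib2 = values


class Container:
    _names = ("One", "Two", "Thr", "Four")

    def __init__(self, arg):
        if isinstance(arg, Pair):
            self.showdrum = arg  # already wrapped
            return
        items = list(arg)
        if all(isinstance(item, (tuple, list, Pair)) for item in items):
            # Several elements: one named sub-attribute per element.
            self.showdrum = SimpleNamespace()
            for name, item in zip(self._names, items):
                wrapped = item if isinstance(item, Pair) else Pair(item)
                setattr(self.showdrum, name, wrapped)
        else:
            # A single 2-element argument: the drum is the wrapped object itself.
            self.showdrum = Pair(items)


t = Container((Pair((5, 69)), (1, 45), (1, 4)))
u = Container((5, 69))
print(t.showdrum.One.attrib1, t.showdrum.Thr.attrib2, u.showdrum.attrib2)
```

Creating the namespace inside __init__ also avoids a pitfall of the class-level __drum = ElBLANKO() version, where every Container instance would share the same object.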