AzureML Machine Learning Tasks
Please email me about an easy AzureML task. Regression and classification are needed in graphics. I will provide the details and pay you right away once we agree on the budget. Email: burakkdedeoglu gmail.com
Thanks for your help. I am looking forward to starting.
See also questions close to this topic
-
Sparse Matrix Creation : KeyError: 579 for text datasets
I am trying to use the make_sparse_matrix function to create a sparse matrix for my text dataset, and I get KeyError: 579. Does anyone have any leads on the root cause of the error?
def make_sparse_matrix(df, indexed_words, labels):
    """
    Returns sparse matrix as dataframe.

    df: A dataframe with words in the columns with a document id as an index (X_train or X_test)
    indexed_words: index of words ordered by word id
    labels: category as a series (y_train or y_test)
    """
    nr_rows = df.shape[0]
    nr_cols = df.shape[1]
    word_set = set(indexed_words)
    dict_list = []

    for i in range(nr_rows):
        for j in range(nr_cols):
            word = df.iat[i, j]
            if word in word_set:
                doc_id = df.index[i]
                word_id = indexed_words.get_loc(word)
                category = labels.at[doc_id]

                item = {'LABEL': category, 'DOC_ID': doc_id,
                        'OCCURENCE': 1, 'WORD_ID': word_id}
                dict_list.append(item)

    return pd.DataFrame(dict_list)

make_sparse_matrix(X_train, word_index, y_test)
X_train is a DataFrame that contains a single word in each cell, word_index contains the index of all words, and y_test stores all labels.
The KeyError I am facing is:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\New folder\envs\geo_env\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 579

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
in

in make_sparse_matrix(df, indexed_words, labels)
     20                 doc_id = df.index[i]
     21                 word_id = indexed_words.get_loc(word)
---> 22                 category = labels.at[doc_id]
     23
     24                 item = {'LABEL': category, 'DOC_ID': doc_id,

~\New folder\envs\geo_env\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   2154             return self.obj.loc[key]
   2155
-> 2156         return super().__getitem__(key)
   2157
   2158     def __setitem__(self, key, value):

~\New folder\envs\geo_env\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   2101
   2102         key = self._convert_key(key)
-> 2103         return self.obj._get_value(*key, takeable=self._takeable)
   2104
   2105     def __setitem__(self, key, value):

~\New folder\envs\geo_env\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963

~\New folder\envs\geo_env\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083
   3084         if tolerance is not None:

KeyError: 579
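A possible reading of the traceback: the failing line is labels.at[doc_id], and the call passes X_train together with y_test, so document id 579 from the training frame may simply not exist in the test labels. The sketch below reproduces that situation with small made-up data (the frames, ids, and the helper labels_cover_documents are hypothetical, not from the question) and shows one way to check the index alignment before building the matrix.

```python
import pandas as pd

# Made-up stand-ins for the question's data: document ids are the index.
X_train = pd.DataFrame({0: ["free", "hello"], 1: ["money", "world"]},
                       index=[579, 12])
y_train = pd.Series([1, 0], index=[579, 12])
y_test = pd.Series([0], index=[7])


def labels_cover_documents(df, labels):
    """Return True if every document id in df.index has a label (hypothetical helper)."""
    return bool(df.index.isin(labels.index).all())


# Document ids present in X_train but absent from y_test.
missing_in_test = X_train.index.difference(y_test.index)

print(labels_cover_documents(X_train, y_train))  # matching pair
print(labels_cover_documents(X_train, y_test))   # mismatched pair
print(list(missing_in_test))
```

If the check fails for the pair being passed, calling make_sparse_matrix(X_train, word_index, y_train) instead would be the first thing to try.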
-
Finding part of string in list of strings
GCM = ([519,520,521,522,533],[534,525],[526,527,530,531], [4404])
slice = int(str(df["CGM"][row_count])[:3])
I am looping through the rows of a CSV file and extracting the number I want: one that starts with one of the numbers listed in GCM, since those numbers tell me which information I need from other columns. This has worked fine with slicing because every number I wanted started with three digits. Now I need to match any number that starts with 4404, and later I will probably need to match 57052, so slicing no longer works.
Is there a way, instead of slicing and comparing against the list, to take a 5-digit number and check whether its leading digits appear in the list, preferably matching on three or more leading digits? The real goal of this part of the code is to find out which sublist of GCM the number belongs to. Given 44042, it should recognize that the part I care about is in GCM[3]. On the other hand, it should not report that 32519 is in GCM[0], since I only care about numbers that start with 519, not ones that end with it.
P.S. I am Norwegian and have been teaching myself programming. It has been some long nights, so something here may be lost in translation.
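One way to approach this is to compare strings with str.startswith instead of slicing a fixed number of digits. The sketch below is a minimal version under that assumption; find_group is a hypothetical helper name, and GCM is copied from the question.

```python
GCM = ([519, 520, 521, 522, 533], [534, 525], [526, 527, 530, 531], [4404])


def find_group(value, groups=GCM):
    """Return the index of the first group holding a code that `value` starts with.

    Returns None when no code is a prefix of the value.
    """
    text = str(value)
    for group_index, codes in enumerate(groups):
        # startswith only matches leading digits, so 32519 does not match 519.
        if any(text.startswith(str(code)) for code in codes):
            return group_index
    return None


print(find_group(44042))  # 3
print(find_group(32519))  # None
print(find_group(51987))  # 0
```

Because every code is compared at its own length, adding 57052 to one of the lists later needs no other change. If two codes could ever be prefixes of the same number, the first matching group wins in this sketch.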
-
How to forecast a time series out-of-sample using an ARIMA model in Python?
I have seen similar questions on Stack Overflow. But either the questions were different enough or, if similar, they have not actually been answered. I gather it is something that modelers run into often and find challenging to solve.
In my case I am using two variables, one Y and one X, with 50 sequential time series observations. They are both random numbers representing % changes (they could be anything you want; their true value does not matter, this is just to set up an example of my coding problem). Here is my basic code to build this ARIMAX(1,0,0) model.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_excel('/Users/gaetanlion/Google Drive/Python/Arima/df.xlsx', sheet_name = 'final')

from statsmodels.tsa.arima_model import ARIMA

endo = df['y']
exo = df['x']

Next, I build the ARIMA model, using the first 41 observations:
modelho = sm.tsa.arima.ARIMA(endo.loc[0:40], exo.loc[0:40], order = (1,0,0)).fit()
print(modelho.summary())
So far everything works just fine.
Next, I attempt to forecast or predict the next 9 observations out-of-sample. Here I want to use the X values over these 9 observations to predict Y. And, I just can't do it. I am showing below just the one code, that I think gets me the closest to where I need to go.
modelho.predict(exo.loc[41:49], start = 41, end = 49, dynamic = False)

TypeError: predict() got multiple values for argument 'start'
-
AzureML LocalServices can't connect to registry
I have the following setup in Azure twice:
- an Azure ML workspace
- a container registry
- a Linux VM with my code and Docker on it
I want to test-run the Azure ML web services on my VM. With setup one I can run my code and it works. (This setup was done a year ago by someone I can't talk to anymore.) With setup two, where I only changed the resource group and the principal user credentials, I get the error "Unable to login to Docker registry".
I looked into this and found that the registry credentials "Username" and "Password" returned by environment.get_image_details are set to None.
In setup one, the admin user credentials from the container registry are used.
I can't find any place to set these credentials locally. They seem to be retrieved from Azure and are just not returned in setup two. Can anyone help me out with this problem?
-
How to access the dataset transformed by automatic featurization steps in Azure Automated ML
I'm performing a series of experiments with Azure AutoML and I need to see the featurized data. I don't mean just the new feature names retrieved by the method get_engineered_feature_names() or the featurization details retrieved by get_featurization_summary(). I mean the whole transformed dataset, the one obtained after scaling/normalization/featurization that is then used to train the models.
Is it possible to access this dataset or download it as a file?
Thanks.
-
I am getting a blank dict {} after calling an AzureML REST endpoint deployed in ACI
I have deployed a model in ACI and it is in a healthy state, but when I POST to that REST endpoint it returns a blank dict {} (the same script runs fine locally). When I checked the deployment logs, they say:
PandasError: DataFrame constructor not properly called!
I have already accounted for this in my input-reading code, so I don't understand why it still gives me an error. Can anyone help with this? The run function of my entry script is below:
def run(raw_data):
    input_values=json.dumps(raw_data)
    main_data=pd.read_json(input_values)
    predications=model.predict(main_data)
    return predications
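One possible cause: if raw_data arrives as a JSON string (which is how the request body is commonly delivered to run), then json.dumps encodes it a second time and pandas no longer sees tabular data. The sketch below decodes the string with json.loads instead and returns a plain list. The payload shape {"data": [...]} and the _DummyModel class are assumptions made so the example can run on its own; in the real script, model would be loaded in init().

```python
import json

import pandas as pd


class _DummyModel:
    """Hypothetical stand-in for the deployed model, for illustration only."""

    def predict(self, frame):
        return frame.sum(axis=1).to_numpy()


model = _DummyModel()


def run(raw_data):
    try:
        # Decode the request body; json.dumps here would double-encode it.
        payload = json.loads(raw_data)
        # Assumed payload shape: {"data": [ {column: value, ...}, ... ]}
        main_data = pd.DataFrame(payload["data"])
        predictions = model.predict(main_data)
        # NumPy arrays are not JSON serializable, so return a plain list.
        return predictions.tolist()
    except Exception as exc:
        # Surface the error in the response instead of failing silently.
        return {"error": str(exc)}


request_body = json.dumps({"data": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]})
print(run(request_body))  # [3, 7]
```

Returning the exception text in the response is a debugging convenience in this sketch; it makes a malformed request visible at the client rather than only in the deployment logs.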