Do you have to install Python somehow in order to run a script in Azure Data Factory?
Because all the documentation I can find on how to run a Python script doesn't talk about this.
1 answer
-
answered 2022-01-21 11:55
Nandan
ADF doesn't provide a direct platform to execute Python scripts. You can leverage Azure Functions, Azure Batch, Azure Automation, or Azure Databricks with Python via ADF.
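As a hedged illustration of the Azure Functions route (everything below is an assumption for the example, not code from the answer), a small HTTP-triggered function written in Python could hold the script, and the ADF pipeline would call it with an Azure Function or Web activity:

import json
import logging
import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    """HTTP-triggered entry point that an ADF activity can call."""
    logging.info("Invoked from Azure Data Factory")

    # Hypothetical payload sent in the activity's request body
    try:
        body = req.get_json()
    except ValueError:
        body = {}
    name = body.get("name", "pipeline")

    # ... run the actual Python logic here ...
    return func.HttpResponse(
        json.dumps({"status": "done", "caller": name}),
        mimetype="application/json",
        status_code=200,
    )

Azure Batch, Azure Automation, or a Databricks notebook would be the alternatives when the script is heavier than a short-lived function.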
See also questions close to this topic
-
Python File Tagging System does not retrieve nested dictionaries in dictionary
I am building a file tagging system using Python. The idea is simple. Given a directory of files (and files within subdirectories), I want to filter them out using a filter input and tag those files with a word or a phrase.
If I got the following contents in my current directory:
data/
    budget.xls
    world_building_budget.txt
a.txt
b.exe
hello_world.dat
world_builder.spec
and I execute the following command in the shell:
py -3 tag_tool.py -filter=world -tag="World-Building Tool"
My output will be:
These files were tagged with "World-Building Tool":
data/world_building_budget.txt
hello_world.dat
world_builder.spec
My current output isn't exactly like this but basically, I am converting all files and files within subdirectories into a single dictionary like this:
def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree
Right now, my dictionary looks like this:

key: ''

In the following function, I am turning the empty values '' into empty lists (to hold my tags):

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)
When I run my entire code, this is my output:
hello_world.dat ['World-Building Tool']
world_builder.spec ['World-Building Tool']
But it does not see data/world_building_budget.txt. This is the full dictionary:

{'data': {'world_building_budget.txt': []}, 'a.txt': [], 'hello_world.dat': [], 'b.exe': [], 'world_builder.spec': []}
This is my full code:
import os, argparse

def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)

parser = argparse.ArgumentParser(description="Just an example",
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--filter", action="store", help="keyword to filter files")
parser.add_argument("--tag", action="store", help="a tag phrase to attach to a file")
parser.add_argument("--get_tagged", action="store", help="retrieve files matching an existing tag")
args = parser.parse_args()

filter = args.filter
tag = args.tag
get_tagged = args.get_tagged

current_dir = os.getcwd()
files_dict = fs_tree_to_dict(current_dir)
empty_str_to_list(files_dict)

for k, v in files_dict.items():
    if filter in k:
        if v == []:
            v.append(tag)
            print(k, v)
    elif isinstance(v, dict):
        empty_str_to_list(v)
        if get_tagged in v:
            print(k, v)
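For comparison only, a recursive walk over the nested dictionary is one way to reach files such as data/world_building_budget.txt; this is a rough sketch, not a drop-in fix, and the helper name tag_matches is made up:

def tag_matches(tree, keyword, tag_value, prefix=""):
    """Recursively tag files whose names contain keyword, descending into nested dicts."""
    for name, value in tree.items():
        path = prefix + name
        if isinstance(value, dict):
            # Subdirectories such as 'data' are nested dictionaries, so recurse
            tag_matches(value, keyword, tag_value, prefix=path + "/")
        elif keyword in name:
            value.append(tag_value)
            print(path, value)

Called as tag_matches(files_dict, filter, tag) after empty_str_to_list(files_dict), it would visit every level instead of only the top one.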
-
Actually, I am working on a project and it is showing "no module named pip_internal". Please help me with this. I am using PyCharm (conda interpreter).
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\Scripts\pip.exe\__main__.py", line 4, in <module> File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_internal\__init__.py", line 4, in <module> from pip_internal.utils import _log
I am using PyCharm with the conda interpreter.
-
Looping the function if the input is not a string
I'm new to Python (first of all). I have a homework assignment to write a function that checks whether an item exists in a dictionary or not.
inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}

def check_item():
    x = input("Enter the fruit's name: ")
    if not x.isalpha():
        print("Error! You need to type the name of the fruit")
    elif x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x], "KG")
    else:
        print("Fruit not found")

check_item()
I want the function to loop again only if the input is not a string. I've tried putting return under print("Error! You need to type the name of the fruit"), but that didn't work. Help.
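A minimal sketch of one way to repeat the prompt only while the input is not alphabetic (assuming the function should finish after any valid fruit name is typed):

inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}

def check_item():
    # Keep asking until the input passes the isalpha() check
    while True:
        x = input("Enter the fruit's name: ")
        if not x.isalpha():
            print("Error! You need to type the name of the fruit")
            continue
        if x in inventory:
            print("Fruit found:", x)
            print("Inventory available:", inventory[x], "KG")
        else:
            print("Fruit not found")
        return

check_item()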
-
AZURE DATA FACTORY - copy activity to SQL database returning only the first row
I am making two copy activities, one from a REST API and the second from a JSON file. Both of them return only the first row in my SQL database. I tried copying this into a JSON file instead of the SQL database and it worked perfectly; it returned all the rows. I am wondering why the SQL database sink returns just the first row.
Does anyone have an idea about this issue?
Thanks
-
Azure Data Factory - Parameter interpolation
I'm trying to add a pipeline parameter into the body of a post request to an Azure Function app in Azure Data Factory. It appears that the string isn't getting replaced, but the Microsoft documentation suggests I'm doing it the right way. See here: https://docs.microsoft.com/en-gb/azure/data-factory/control-flow-web-activity#request-payload-schema
This is a screenshot of an error I'm getting alongside it:
I'm confused about how to properly interpolate pipeline parameters into this request.
Thanks!
-
What is the best code tool to ETL data from Azure SQL Database to Azure SQL Data Warehouse?
I want to build an Azure SQL Data Warehouse from an Azure SQL Database. I have seen Data Factory, but I wonder if there is a tool that uses only code (Python or SQL) to do the extraction, transformation, and load. Thanks.
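As a rough code-only sketch (not a recommendation over Data Factory), the extract-transform-load step can be written in Python with pandas and SQLAlchemy; every server, database, table, and credential below is a placeholder:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection strings; replace with your own servers and credentials
source = create_engine(
    "mssql+pyodbc://user:password@sourceserver.database.windows.net/SourceDb"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
warehouse = create_engine(
    "mssql+pyodbc://user:password@dwserver.database.windows.net/TargetDw"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Extract a table, apply an example transformation, and load it into the warehouse
df = pd.read_sql("SELECT * FROM dbo.Sales", source)
df["LoadedAt"] = pd.Timestamp.utcnow()
df.to_sql("Sales", warehouse, schema="dbo", if_exists="append", index=False)

For large volumes, a bulk-loading path on the warehouse side is usually preferred over row-by-row inserts, which is part of why Data Factory keeps coming up for this scenario.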
-
Data Factory: How to flatten json hierarchy
I have a JSON file in a blob container in an Azure storage account, and I want to use the "Copy Data" activity in ADF to get the data into a SQL DB. I have also looked into using Data Flows in ADF but haven't succeeded there either.
Right now, when I use the copy data activity, the output only contains the first entry in "lines".
The json-file has the following hierarchy:
And my goal is to have each "line" in "order" in a separate row in the SQL DB.
EDIT 1: I am using Data Flows and data is added to both the blob (sink1) and the SQL DB (sink2) the way I want, i.e. the data is flattened. The problem is that the Data Flow gives errors that I do not understand.
The flow looks like this:
And even though I have specified the file name in the Data Flow the output file is named "part-00000-609332d2-8494-4b68-b481-f237f62cc6c8-c000.json".
The output error details of the pipeline which runs the data flow is as follows:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Sink 'sink1': org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: This operation is not permitted on a non-empty directory.","Details":"org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: This operation is not permitted on a non-empty directory.\n\tat org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.delete(AzureNativeFileSystemStore.java:2607)\n\tat org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.delete(AzureNativeFileSystemStore.java:2617)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem.deleteFile(NativeAzureFileSystem.java:2657)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem$2.execute(NativeAzureFileSystem.java:2391)\n\tat org.apache.hadoop.fs.azure.AzureFileSystemThreadPoolExecutor.executeParallel(AzureFileSystemThreadPoolExecutor.java:223)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem.deleteWithoutAuth(NativeAzureFileSystem.java:2403)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:2453)\n\tat org.apache.hadoop.fs.azure.NativeAzureFileSystem.delete(NativeAzureFileSystem.java:1936)\n\tat org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter."}"
Here is a sample of the JSON data as text:
{ "customerId": 2241, "soidGt": null, "timestampGt": "2022-04-25T00:00:00", "timestampLt": null, "orders": [ { "soid": 68810264, "id": "a4b84f56-c6a4-4b37-bffb-a34d04c513c4", "tableId": 4676, "revenueUnitId": 682, "lines": [ { "solid": 147557444, "articleId": 70949, "quantity": 3, "taxPctValue": 25, "articleName": "Diavola", "netAmount": 516, "grossAmount": 645 }, { "solid": 147557445, "articleId": 70961, "quantity": 1, "taxPctValue": 25, "articleName": "Parma ai pomodori secchi", "netAmount": 183.2, "grossAmount": 229 } ], "payments": [ { "soptid": 70655447, "paymentTypeId": 2, "amount": 874 } ] }, { "soid": 68810622, "id": "1b356f45-7df7-42ba-8d50-8b14cf67180d", "tableId": 4546, "revenueUnitId": 83, "lines": [ { "solid": 147557985, "articleId": 71159, "quantity": 2, "taxPctValue": 25, "articleName": "Hansa 0,4L", "netAmount": 152, "grossAmount": 190 }, { "solid": 147557986, "articleId": 70948, "quantity": 1, "taxPctValue": 25, "articleName": "Parma", "netAmount": 175.2, "grossAmount": 219 }, { "solid": 147557987, "articleId": 70918, "quantity": 1, "taxPctValue": 25, "articleName": "Focaccia sarda", "netAmount": 71.2, "grossAmount": 89 }, { "solid": 147557988, "articleId": 70935, "quantity": 1, "taxPctValue": 25, "articleName": "Pasta di manzo", "netAmount": 196, "grossAmount": 245 } ], "payments": [ { "soptid": 70655798, "paymentTypeId": 2, "amount": 750 } ] }
-
How to rerun ADF from Failed Pipeline Activity within conditional?
I have a Pipeline that has failed to run on an activity that sits inside of an "If Condition".
It seems possible to rerun the pipeline from the If Condition, but since there are multiple activities inside it (prior to the one that failed), ideally I would want to rerun the pipeline from the actual failed activity.
Is this possible?
-
Loop through the next URL in an API in Azure Data Factory
I have an API that contains some data and another API URL named "nextapi". I want to loop through each API under the API and store the data of each API page in an Azure SQL database using the ADF portal. Can anyone please help me with it? My goal is to copy data to Azure SQL from the API I have, then go into the 'nextlink' key present in my API and copy that data too into the same SQL table. I want to keep loading data like this until the final API page does not contain a URL to a next API.
Thanks.
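For illustration only, the looping described in the question, sketched outside ADF in Python with requests (the starting URL and the key holding the records are assumptions; only the 'nextlink' key comes from the question):

import requests

# Hypothetical starting endpoint; in ADF this would be the REST source's base URL
url = "https://example.com/api/orders"

rows = []
while url:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    page = resp.json()

    # Collect this page's records (the "value" key is an assumption)
    rows.extend(page.get("value", []))

    # Follow the next page if the payload points to one, otherwise stop
    url = page.get("nextlink")

print(f"fetched {len(rows)} rows")

Inside ADF itself the same pattern is usually expressed with an Until activity or the copy activity's pagination rules rather than a hand-written loop.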