What is the best code tool to ETL data from Azure SQL Database to Azure SQL Data Warehouse?
I want to build an Azure SQL Data Warehouse from an Azure SQL Database. I have seen Data Factory, but I wonder if there is a tool that uses only code (Python or SQL) to perform the extraction, transformation, and load. Thanks.
1 answer
-
answered 2022-05-05 14:05
Nandan
There are several tools for this, for example:
- Azure Databricks
- Azure Functions
- Azure Automation, etc.
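Beyond picking a tool, if you want the load step itself to be pure SQL, one hedged sketch is the COPY statement of a Synapse dedicated SQL pool. It assumes the extracted data has first been staged as Parquet files in Azure storage (for example by Databricks or a Function); the table name and storage URL below are placeholders, not from the question:

-- Hedged sketch: load staged Parquet files into a dedicated SQL pool table.
-- dbo.FactSales and the storage path are placeholders.
COPY INTO dbo.FactSales
FROM 'https://mystorageaccount.blob.core.windows.net/staging/sales/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);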
See also questions close to this topic
-
Why does this SQL query show the possible changes but not implement them?
So, I want to change the prefix of my tables. The following command shows the changes that would take place, which looks right, but it does not actually apply them.
SELECT CONCAT('RENAME TABLE ', TABLE_NAME, ' TO fan_', SUBSTRING_INDEX(TABLE_NAME, 'pc_', -1), ';')
FROM information_schema.tables
WHERE table_name LIKE 'pc_%' AND table_schema = 'testdbhere'
Moreover, this isn't a write-privilege issue, as renaming the tables individually works perfectly as the same user.
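Note that the SELECT above only generates the RENAME statements as text; nothing executes them. A hedged sketch of one way to actually run the renames in MySQL is to build a single multi-table RENAME statement and execute it with a prepared statement:

-- Hedged sketch: concatenate all renames into one RENAME TABLE statement
-- and execute it (raise group_concat_max_len first if there are many tables).
SELECT CONCAT('RENAME TABLE ',
              GROUP_CONCAT(CONCAT(TABLE_NAME, ' TO fan_',
                                  SUBSTRING_INDEX(TABLE_NAME, 'pc_', -1))
                           SEPARATOR ', '))
INTO   @sql
FROM   information_schema.tables
WHERE  table_name LIKE 'pc_%'
  AND  table_schema = 'testdbhere';

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;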
-
How to get weekly total and yesterday's record in MySQL in one result
Hi everyone, I am trying to write a query that returns weekly and yesterday's data in the same result set; I have shared a dummy output below. If yesterday's record does not exist for an employee_id, it should show 0. In my table the week starts on Monday and ends on Sunday. Please help me write the query that returns weekly_total and yesterday's record in one result.
Table name: dailydata
Sample data:

employee_id  date        total
20           2022-04-25  10
20           2022-04-26  20
20           2022-04-27  20
20           2022-04-28  10
20           2022-04-29  20
20           2022-04-30  30
20           2022-04-31  40
20           2022-05-01  50
40           2022-04-26  20

Expected output:

employee_id  weekly_total  yesterday_record
20           200           40
40           20            0

MySQL query to get weekly data:
select employee_id,sum(total) as week_total from dailydata where date between '2022-04-25' and '2022-05-01'
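A hedged sketch of one query that returns both values, using conditional aggregation; it assumes the week 2022-04-25 to 2022-05-01 from the sample data and that "yesterday" means the current date minus one day:

-- Hedged sketch: weekly total plus yesterday's total per employee.
SELECT employee_id,
       SUM(total) AS weekly_total,
       COALESCE(SUM(CASE WHEN `date` = CURDATE() - INTERVAL 1 DAY
                         THEN total END), 0) AS yesterday_record
FROM   dailydata
WHERE  `date` BETWEEN '2022-04-25' AND '2022-05-01'  -- Monday .. Sunday of the week
GROUP  BY employee_id;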
-
Procedure to display the Employees of a specific Department
Create a stored procedure to display the last names as "Name" from the employee table. The requirement is as below:
The name of the procedure: EmployeesDept
Name of the input argument: DeptNo
Write the code to create the procedure
I wrote the query like this, but I didn't get the expected output:
CREATE PROCEDURE Employeesdept(@Deptno varchar)
AS
BEGIN
    SELECT lastname AS name
    FROM employee
    WHERE workdept = 'D21'
END
Expected output:
Name
---------------
Pulaski
Jefferson
Marino
Smith
Johnson
Perez
Monteverde

Name
---------------
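The procedure above ignores the @Deptno parameter and hard-codes 'D21'; also, in SQL Server a VARCHAR parameter declared without a length holds only a single character. A hedged correction (assuming T-SQL, with the length of 10 as an assumption):

CREATE PROCEDURE EmployeesDept (@DeptNo VARCHAR(10))
AS
BEGIN
    -- Return last names for the requested department only
    SELECT lastname AS Name
    FROM   employee
    WHERE  workdept = @DeptNo;
END;

-- Example call:
EXEC EmployeesDept @DeptNo = 'D21';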
-
Deploying a VueJS + API app to Azure Static Web Apps with GitLab doesn't create the Functions
I've started creating a small application that will use VueJS as the frontend with Azure Functions as the backend. I was looking at using Azure Static Web Apps to host both components of the application and GitLab to store and deploy the app.
Everything works except the creation of the Azure Functions. I'm following https://docs.microsoft.com/en-us/azure/static-web-apps/gitlab?tabs=vue
The output from the deploy step, listed below, is:
App Directory Location: '/builds/*/valhalla/valhalla-client/dist/spa' was found.
Api Directory Location: '/builds/*/valhalla/valhalla-api/dist' was found.
Looking for event info
Could not get event info. Proceeding
Starting to build app with Oryx
Azure Static Web Apps utilizes Oryx to build both static applications and Azure Functions. You can find more details on Oryx here: https://github.com/microsoft/Oryx
---Oryx build logs---
Operation performed by Microsoft Oryx, https://github.com/Microsoft/Oryx
You can report issues at https://github.com/Microsoft/Oryx/issues
Oryx Version: 0.2.20220131.3, Commit: ec344c058843461525ff03b46031553b6e15a47a, ReleaseTagName: 20220131.3
Build Operation ID: |qAffRWArEg8=.deee9498_
Repository Commit : 7cdd5b61f956e6cb8459b13a42af363c4440a97b
Detecting platforms...
Could not detect any platform in the source directory.
Error: Could not detect the language from repo.
---End of Oryx build logs---
Oryx was unable to determine the build steps. Continuing assuming the assets in this folder are already built. If this is an unexpected behavior please contact support.
Finished building app with Oryx
Starting to build function app with Oryx
---Oryx build logs---
Operation performed by Microsoft Oryx, https://github.com/Microsoft/Oryx
You can report issues at https://github.com/Microsoft/Oryx/issues
Oryx Version: 0.2.20220131.3, Commit: ec344c058843461525ff03b46031553b6e15a47a, ReleaseTagName: 20220131.3
Build Operation ID: |NGXLP5bVBRk=.705477f6_
Repository Commit : 7cdd5b61f956e6cb8459b13a42af363c4440a97b
Detecting platforms...
Could not detect any platform in the source directory.
Error: Could not detect the language from repo.
---End of Oryx build logs---
Oryx was unable to determine the build steps. Continuing assuming the assets in this folder are already built. If this is an unexpected behavior please contact support.
[WARNING] The function language could not be detected. The language will be defaulted to node.
Function Runtime Information. OS: linux, Functions Runtime: ~3, node version: 12
Finished building function app with Oryx
Zipping Api Artifacts
Done Zipping Api Artifacts
Zipping App Artifacts
Done Zipping App Artifacts
Uploading build artifacts.
Finished Upload. Polling on deployment.
Status: InProgress. Time: 0.1762737(s)
Status: InProgress. Time: 15.3950401(s)
Status: Succeeded. Time: 30.5043965(s)
Deployment Complete :)
Visit your site at: https://polite-pebble-0dc00000f.1.azurestaticapps.net
Thanks for using Azure Static Web Apps!
Exiting
Cleaning up project directory and file based variables 00:00
Job succeeded
The deploy step appears to have succeeded and the frontend is deployed, but there are no Azure Functions showing up in this Static Web App. Am I missing something here? So far, the Azure Functions I have are just the boilerplate from instantiating a new Azure Functions folder.
image: node:latest
variables:
  API_TOKEN: $DEPLOYMENT_TOKEN
  APP_PATH: '$CI_PROJECT_DIR/valhalla-client/dist/spa'
  API_PATH: '$CI_PROJECT_DIR/valhalla-api/dist'
stages:
  - install_api
  - build_api
  - install_client
  - build_client
  - deploy
install_api:
  stage: install_api
  script:
    - cd valhalla-api
    - npm ci
  artifacts:
    paths:
      - valhalla-api/node_modules/
  cache:
    key: node
    paths:
      - valhalla-api/node_modules/
  only:
    - master
install_client:
  stage: install_client
  script:
    - cd valhalla-client
    - npm ci
  artifacts:
    paths:
      - valhalla-client/node_modules/
  cache:
    key: node
    paths:
      - valhalla-client/node_modules/
  only:
    - master
build_api:
  stage: build_api
  dependencies:
    - install_api
  script:
    - cd valhalla-api
    - npm install -g azure-functions-core-tools@3 --unsafe-perm true
    - npm run build
  artifacts:
    paths:
      - valhalla-api/dist
  cache:
    key: build_api
    paths:
      - valhalla-api/dist
  only:
    - master
  needs:
    - job: install_api
      artifacts: true
      optional: true
build_client:
  stage: build_client
  dependencies:
    - install_client
  script:
    - cd valhalla-client
    - npm i -g @quasar/cli
    - quasar build
  artifacts:
    paths:
      - valhalla-client/dist/spa
  cache:
    key: build_client
    paths:
      - valhalla-client/dist/spa
  only:
    - master
  needs:
    - job: install_client
      artifacts: true
      optional: true
deploy:
  stage: deploy
  dependencies:
    - build_api
    - build_client
  image: registry.gitlab.com/static-web-apps/azure-static-web-apps-deploy
  script:
    - echo "App deployed successfully."
  only:
    - master
-
Azure Synapse Notebooks Vs Azure Databricks notebooks
I was going through the features of Azure Synapse notebooks vs Azure Databricks notebooks.
- Are there any major differences between them, apart from the component they belong to?
- Are there any scenarios where one is more appropriate than the other?
-
How to authorize Azure Container Registry requests from .NET Core C#
I have a web application which creates container instances, and there are specific container registry images I want to use. As a result, I use this code to get my Azure container registry:
IAzure azure = Azure.Authenticate($"{applicationDirectory}/Resources/my.azureauth").WithDefaultSubscription();
IRegistry azureRegistry = azure.ContainerRegistries.GetByResourceGroup("testResourceGroup", "testContainerRegistryName");
I get this error when the second line of code is hit:
The client 'bc8fd78c-2b1b-4596-827e-6a3c918b7c17' with object id 'bc8fd78c-2b1b-4596-827e-6a3c918b7c17' does not have authorization to perform action 'Microsoft.ContainerRegistry/registries/read' over scope '/subscriptions/506b787d-83ef-426a-b7b8-7bfcdd475855/resourceGroups/testapp-live/providers/Microsoft.ContainerRegistry/registries/testapp' or the scope is invalid. If access was recently granted, please refresh your credentials.
I literally have no idea what to do about this. I have seen so many articles talking about Azure AD and giving user roles and stuff. Can someone please walk me step by step how to fix this? I REALLY appreciate the help. Thanks.
I cannot find any client under that object ID, so I am perfectly fine starting from scratch again with a better understanding of what I am doing.
-
SQL query to update and insert in a single statement
Write a single SQL statement to update the table student_marks based on the data in student_marks_staging.
- If the same record is found in both tables, the marks should be updated from the staging table.
- If a new record is found in the staging table, it should be inserted into the student_marks table.
The data in student_marks at the end of the update should look like:

STUDENT_ID  SUBJECT_ID  MARKS
1           11          75
1           12          90
1           13          95
2           11          86
2           12          96
3           11          92
3           13          79
4           11          52

Sample tables:

Student_marks
STUDENT_ID  SUBJECT_ID  MARKS
1           11          67
1           12          90
1           13          95
2           11          78
3           11          82
4           11          52

Student_marks_stagging
STUDENT_ID  SUBJECT_ID  MARKS
1           11          75
2           12          96
3           13          79
2           11          86
3           11          92
UPDATE student_marks s
SET marks = (SELECT marks
             FROM student_marks_stagging r
             WHERE r.student_id = s.student_id
               AND r.subject_id = s.subject_id);
The above query is not giving the proper results. Please help me with how to get the proper results.
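A hedged sketch using MERGE (assuming a database that supports it, such as Oracle or SQL Server), which handles the update and the insert in a single statement:

MERGE INTO student_marks s
USING student_marks_stagging r
   ON (s.student_id = r.student_id AND s.subject_id = r.subject_id)
WHEN MATCHED THEN
    UPDATE SET s.marks = r.marks            -- existing rows: take marks from staging
WHEN NOT MATCHED THEN
    INSERT (student_id, subject_id, marks)  -- new rows: insert from staging
    VALUES (r.student_id, r.subject_id, r.marks);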
-
Best way to compare two large files on multiple columns
I am working on a feature which will allow users to upload two CSV files, write rules to compare the rows, and output the result to a file.
Both files can have any number of columns, and the column names are also not fixed.
Currently, I read the files into two separate arrays and compare the rows based on the condition given in the rule.
This works for smaller files but for large ones, it takes a lot of time and memory to do the comparison.
Is there a better way where a DB can be utilized for storing and querying on schema-less data?
Example Data:
File1
type  id  date        amount
A     1   12/10/2005  500
B     2   12/10/2005  500

File2
type  id  date        amount
A     1   12/10/2005  500
B     2   12/10/2005  500
A     1   12/10/2005  500

Rule1
File1.type == File2.type && File1.amount == File2.amount

Rule2
File1.id == GroupBy(File2.id) && File1.amount == File2.TotalAmount
The match condition will be = Rule1 or Rule2
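One hedged option is to load both CSVs into staging tables in a database (for example with Postgres COPY or SQLite's import) and express each rule as a join, so the database does the matching; the table and column names below mirror the example and are assumptions:

-- Rule1: row-level match on type and amount
SELECT f1.*
FROM   file1 f1
JOIN   file2 f2
  ON   f1.type   = f2.type
 AND   f1.amount = f2.amount;

-- Rule2: match a file1 row against the per-id total of file2
SELECT f1.id, f1.amount
FROM   file1 f1
JOIN  (SELECT id, SUM(amount) AS total_amount
       FROM   file2
       GROUP  BY id) g
  ON   f1.id     = g.id
 AND   f1.amount = g.total_amount;

With indexes on the join columns this stays fast and memory-light even for large files, since the comparison runs inside the database rather than in application arrays.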
-
Informatica PC: restart workflow with a different SQL query
I am using Informatica PC. I have a workflow that contains a SQL query like "select t1, t2, t3 from table where t1 between date '2020-01-01' and date '2020-01-31'". I need to download all data between 2020 and 2022, but I can't put that whole range in the query because I will get an ABORT SESSION from Teradata. I want to write something that restarts the workflow with different dates automatically: the first run takes 01.2020, the second 02.2020, the third 03.2020, and so on. How can I solve this problem?
-
Azure Data Factory: copy activity to SQL database returning only the first row
I am making two copy activities, one from a REST API and the second from a JSON file. Both of them return only the first row into my SQL database. I tried copying into a JSON file instead of the SQL database and it worked perfectly, returning all the rows, so I am wondering why the SQL database only receives the first row.
Does anyone have an idea about this issue?
Thanks
-
Azure Data Factory - Parameter interpolation
I'm trying to add a pipeline parameter into the body of a post request to an Azure Function app in Azure Data Factory. It appears that the string isn't getting replaced, but the Microsoft documentation suggests I'm doing it the right way. See here: https://docs.microsoft.com/en-gb/azure/data-factory/control-flow-web-activity#request-payload-schema
This is a screenshot of an error I'm getting alongside it:
I'm confused as to how to properly interpolate pipeline parameters into this request.
Thanks!
-
Postgres filter on a 'many' table with pagination on the 'one' table
I have a Postgres question/challenge.
There is a table 'products' with a lot of rows (millions).
- Each product is of a certain Class.
- Each product has a number of features of different types:
- A feature could be 'color', with the value from a picklist of all colors.
- A feature could be 'voltage', with a numerical value from (low) 220 to (high) 240.
There can be up to 100 features for each product.
What is done is to put all features of a product in a 'many' table (with the product table as the 'one'). So this table is even bigger (much bigger).
Standard query (no Feature-filters)
A query comes along for all products of that Class. This can result in a lot of products, so pagination is implemented on the SQL query.
I solved this by querying the products table first, then running a separate query on the feature table to gather all features for the products in the first batch and add them to the result (in the NodeJS API application).
Problem with using a Feature-filter
But now a new request comes along: return products of a certain Class that also match the value of a certain feature.
It is not possible to use the same method as before and just filter out all products not matching the value for the specific feature mentioned in the request. Because post-processing the database result and taking out products (not matching the Feature-value) will mess up the pagination (which comes from the database).
Possible Solutions
I have already thought of the following solutions:
Go the MongoDB way
Just put everything about a product in one record, and use Postgres arrays for the features.
The downside is that arrays can become quite large, and I don't know how Postgres will perform on very large records.
(Maybe I should go with MongoDB, filled from Postgres, just to handle these requests.)
Any tips here?
Forget pagination from the database
Just do not do the pagination in the database and handle it in NodeJS. Then I can do the post-processing in JavaScript.
But then I need to fetch everything matching the WHERE clause (no LIMIT/OFFSET), which makes it quite complex and costs a lot of memory in the NodeJS application. This is not the best solution.
Use another technique?
I'm not familiar with data warehousing techniques, but is there a solution lurking in that area?
The current stack is Python, Postgres, and NodeJS for the API. Are there any other tools that can help me?
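One hedged option within the current Postgres stack is to push the feature filter into the paginated query itself with an EXISTS subquery, so LIMIT/OFFSET is applied after the filtering; the table and column names below are assumptions:

-- Hedged sketch: first page of products of a class that have feature color = 'red'
SELECT p.*
FROM   products p
WHERE  p.class_id = 42
  AND  EXISTS (SELECT 1
               FROM   product_features f
               WHERE  f.product_id = p.id
                 AND  f.feature    = 'color'
                 AND  f.value      = 'red')
ORDER  BY p.id
LIMIT  20 OFFSET 0;

Each additional feature filter becomes another EXISTS clause, and a composite index on the feature table (for example on feature, value, product_id) is usually what makes this pattern perform well.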
-
Data Integration Methods
I need the help of a data engineer.
I am wondering, what is the difference between methods of data integration and forms of data integration?
- Methods of data integration: manual, middleware, application based, uniform access, common storage.
- Forms of data integration: data consolidation, data federation, data virtualization, data propagation.
Best regards,
-
Using DimCalendar to update days between two dates
In a DWH (SQL Server) I have two tables:
DWH.Days
DayStart     DayStop      DaysBetween
2022-04-21   2022-04-24   null
2022-03-12   2022-04-27   null
2022-04-21   2022-04-24   null
2022-03-01   2022-04-22   null

and DWH.Calendar

Date         IsHoliday?
2022-05-11   yes
2022-05-12   no
2022-05-13   yes
2022-05-15   no

I need to update DWH.Days.DaysBetween with the number of days between DayStart and DayStop where DWH.DimCalendar.IsHoliday? = 'no'. I don't have permission to change the data model. I don't have any ideas how to do it; any ideas?
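A hedged sketch in T-SQL, counting the non-holiday calendar rows between each row's DayStart and DayStop; it assumes DWH.Calendar has one row per date and that the column really is named [IsHoliday?]:

-- Hedged sketch: correlated count of non-holiday days per DWH.Days row
UPDATE d
SET    DaysBetween = (SELECT COUNT(*)
                      FROM   DWH.Calendar c
                      WHERE  c.[Date] BETWEEN d.DayStart AND d.DayStop
                        AND  c.[IsHoliday?] = 'no')
FROM   DWH.Days AS d;

Whether DayStart and DayStop themselves should be counted may need adjusting to your definition of "days between".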