SQL: How to join tables with 1+ million records
I want to join two tables (the "products" table has 1.5 million records) using the following query, but after 15 minutes the query was still running and my PC (a Lenovo V330-14IKB with 8 GB of RAM) was overheating, so I stopped it.
I am very new to indexes, so I tried creating the following:
- CREATE INDEX customer_id_idx1 ON orders (customer_id)
- CREATE INDEX customer_id_idx2 ON products (customer_id)
- CREATE INDEX customer_id_revenues_idx ON orders(customer_id,revenues)
- CREATE INDEX customer_id_costs_idx ON products(customer_id,costs)
This is the query:
SELECT a.customer_id, (SUM(a.revenues) / SUM(b.costs)::FLOAT) AS roi
FROM orders a
JOIN products b
  ON a.customer_id = b.customer_id
WHERE a.customer_id IN (
    SELECT customer_id
    FROM (
        SELECT customer_id, COUNT(*) AS n_products
        FROM products
        GROUP BY 1
        ORDER BY 2 DESC
        LIMIT 5
    ) x
)
GROUP BY a.customer_id
ORDER BY roi DESC
The output should return the ratio of revenues/costs for the top 5 customers by number of products they bought.
I am using pgAdmin. Can someone explain how to speed this query up so that it completes? Thank you in advance.
1 answer
-
answered 2021-02-27 15:10
a_horse_with_no_name
As far as I can tell, you don't need to aggregate twice:
select customer_id, roi
from (
    select o.customer_id,
           sum(o.revenues) / sum(p.costs)::float as roi,
           count(*) as n_products
    from orders o
    join products p on o.customer_id = p.customer_id
    group by o.customer_id
    order by n_products desc
    limit 5
) t
order by roi desc
Alternatively, try aggregating the two tables separately, then join the results:
select o.customer_id, o.revenues / p.costs::numeric as roi
from (
    select customer_id, sum(revenues) as revenues
    from orders
    group by customer_id
) o
join (
    select customer_id, sum(costs) as costs, count(*) as n_products
    from products
    group by customer_id
) p on p.customer_id = o.customer_id
order by p.n_products desc
limit 5
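Either way, check whether your indexes are actually being used. In pgAdmin you can prefix a statement with EXPLAIN (this is standard PostgreSQL, nothing specific to your schema), for example for the top-5 subquery:

-- Shows the real plan, timings and buffer usage.
-- A sequential scan on products here would mean the
-- customer_id indexes are not helping this step.
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, COUNT(*) AS n_products
FROM products
GROUP BY customer_id
ORDER BY n_products DESC
LIMIT 5;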
See also questions close to this topic
-
How do I denormalize an ER-D into reporting views for end users?
Link to ER-D: D2L ER-D Diagram for Competency
We have this data in an Oracle database. It goes through IBM Cognos Framework Manager, which reflects all of the relationships in the ER-D and adds some security, and is then available to our end users via Cognos, our reporting tool. I've been tasked with de-normalizing the data so that the end users see fewer reporting views/tables. For example, for this specific data set, the user currently sees all 6 competency-related tables, along with 2 others (Users and Organizational Units). The goal is to make it easier for the end user by doing the joining for them: instead of 6 (or 8) tables, have maybe 2 or 3 reporting views. I've never done this before, and I assume that in creating the views, because none of the relationships have zero cardinality (as in zero-to-many, one-to-zero-or-many, etc.), they are all inner joins. So, first question: are these all inner joins? Second question: do I list the columns I want from each table and then just join on the keys, like this (see also the view sketch at the end of this question):
select a.ActivityId, a.OrgUnitId, a.ActivityName, etc.,
       b.UserId, b.LearningObjectId, etc.
from CompetencyActivities a
inner join CompetencyActivityResults b
    on a.ActivityId = b.ActivityId
   and a.OrgUnitId = b.OrgUnitId
Third question: how do I figure out how many views to create? Would creating a single reporting view be an awful idea?
Also, I've done my best googling and have found plenty of advice on how to create ER-Ds and how to normalize, but I'm having a hard time finding anything that explains how to de-normalize data for reporting, so any resources at all would be most appreciated. Thanks so much!
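For the second question, wrapping that join in a view might look something like the sketch below (all table and column names are the made-up ones from the example above, not the real ER-D):

-- Hypothetical reporting view over two of the competency tables;
-- end users would query this view instead of joining the base tables.
CREATE VIEW CompetencyActivityReport AS
SELECT a.ActivityId,
       a.OrgUnitId,
       a.ActivityName,
       b.UserId,
       b.LearningObjectId
FROM CompetencyActivities a
INNER JOIN CompetencyActivityResults b
    ON a.ActivityId = b.ActivityId
   AND a.OrgUnitId = b.OrgUnitId;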
-
How to fetch SQL data using an API and use that data in react-native-svg charts?
I have an API that I want to use to fetch and display data.
I am fetching some data using an API; inside that API, SQL queries are executed. I want to know how I can replace my charts' static data with dynamic data fetched from the API.
Here is my TabDashboardDetail.js, where I set the title for each chart based on the API data:

import React from 'react';
import DefaultScrollView from '../components/default/DefaultScrollView';
import ChartView from '../components/default/ChartView';
import CogniAreaChart from '../components/CogniAreaChart';
import { areaChartData } from '../chartData';

const TabDashboardDetail = ({ navigation, route }) => {
  const tabsConfig = route.params.tabsConfig;
  return (
    <DefaultScrollView>
      {tabsConfig.components.map((comp, index) => {
        return (
          <ChartView key={index} title={comp.name}>
            <CogniAreaChart areaChartData={areaChartData} height={200} />
          </ChartView>
        );
      })}
    </DefaultScrollView>
  );
};

export default TabDashboardDetail;
Here is my CogniAreaChart.js, the chart component that is currently being rendered:

/* eslint-disable react-native/no-inline-styles */
import React from 'react';
import { View } from 'react-native';
import { AreaChart, YAxis, XAxis } from 'react-native-svg-charts';
import * as shape from 'd3-shape';

const CogniAreaChart = ({ areaChartData, visibility, ...props }) => {
  const xAxis = areaChartData.message.map((item) => item[Object.keys(item)[0]]);
  const areaChartY1 = areaChartData.message.map(
    (item) => item[Object.keys(item)[1]],
  );
  return (
    <View
      style={{
        height: props.height,
        flexDirection: 'row',
      }}>
      <YAxis
        data={areaChartY1}
        contentInset={{ marginBottom: 20 }}
        svg={{
          fill: 'grey',
          fontSize: 12,
        }}
      />
      <View style={{ flex: 1 }}>
        <AreaChart
          style={{ flex: 1 }}
          data={areaChartY1}
          contentInset={{ top: 20, bottom: 20 }}
          curve={shape.curveNatural}
          svg={{ fill: 'rgba(134, 65, 244, 0.8)' }}
        />
        <XAxis
          style={{ height: 20 }}
          data={areaChartY1}
          formatLabel={(value, index) => xAxis[index]}
          contentInset={{ left: 30, right: 30 }}
          svg={{
            fill: 'grey',
            fontSize: 12,
            rotation: 35,
            originY: 5,
            y: 15,
          }}
        />
      </View>
    </View>
  );
};

export default CogniAreaChart;
Here is the areaChartData that is currently being used in CogniAreaChart.js:

export const areaChartData = {
  message: [
    {
      year: '2018',
      quantity: 241.01956823922,
      sales: 74834.12976954,
    },
    {
      year: '2019',
      quantity: 288.57247706422,
      sales: 80022.3050176429,
    },
  ],
  status: 'success',
};
I have the API ready, and I will replace the example data with it if anyone can suggest how.
-
How do I store an array in PostgreSQL when it is passed as a parameter ($1) to the db query?
I am passing a one-dimensional array of three strings to the function; it looks like this going in:
[ '#masprofundo', '#massensual', '#eclectic' ]
The data column is declared thus:
tags TEXT []
This is my function:
const query = `INSERT INTO posts (posted_at, message, link, tags)
               VALUES (TO_TIMESTAMP($1, 'DD/MM/YYYY HH24:MI'), $2, $3, ARRAY [$4])
               RETURNING tags;`;
const params = [timestamp, message, link, tags];
Now, Postgres believes I want to insert an array containing one item, which is a string of all the values in my tags array. It looks like this:
{ tags: [ '{"#masprofundo","#massensual","#eclectic"}' ] }
What I want to know is: how do I prevent this behaviour, where Postgres adds an unnecessary extra layer of quotation marks and braces? For further clarification, this is what the row looks like in my terminal:
{"{\"#masprofundo\",\"#massensual\",\"#eclectic\"}"}
I have looked at the docs, and tried a dozen variations on ARRAY[$4]. From what I can see, the docs do not elaborate on inserting arrays as variables. Is there some destructuring that needs to happen? The arrays I pass will be of varying size, and sometimes empty.
Any help is much appreciated.
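One thing worth trying, assuming the driver is node-postgres (which the $n placeholders suggest): it already serializes a JavaScript array into a PostgreSQL array literal on its own, so the ARRAY [...] constructor wraps that literal in a second array, which is exactly the extra nesting shown above. A sketch of the query without the wrapper, using a plain placeholder plus a cast:

-- Hypothetical rewrite: $4 receives the whole JavaScript array
-- and is cast to text[]; no ARRAY[...] wrapper around it.
INSERT INTO posts (posted_at, message, link, tags)
VALUES (TO_TIMESTAMP($1, 'DD/MM/YYYY HH24:MI'), $2, $3, $4::text[])
RETURNING tags;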
-
Migrate SQL Server database to PostgreSQL using pgloader
I am trying to migrate a SQL Server database to PostgreSQL using pgloader, like this:
pgloader --dry-run --debug mssql://USERNAME:'PASS'@127.0.0.1:1433/DB_NAME postgresql://USERNAME@127.0.0.1:5432/DB_NAME -L log_1.log
I got the following error:
sb-impl::default-external-format :UTF-8
tmpdir: #P"/tmp/pgloader/"
KABOOM!
FATAL error: At end of input
^ (Line 1, Column 0, Position 0)
In context KW-LOAD:
While parsing KW-LOAD. Expected:
    the character Tab
    or the character Newline
    or the character Return
    or the character Space
    or the string "--"
    or the string "/*"
    or the string "load"
Date/time: 2021-04-21-20:31
An unhandled error condition has been signalled:
At end of input
^ (Line 1, Column 0, Position 0)
In context KW-LOAD:
While parsing KW-LOAD. Expected:
    the character Tab or the character Newline or the character Return or the character Space or the string "--" or the string "/*" or the string "load"
Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {10090A4B33}>
  0: ((LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX))
  1: (SB-IMPL::CALL-WITH-SANE-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {100940CE9B}>)
  2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN SB-DEBUG::FUNCALL-WITH-DEBUG-IO-SYNTAX) {100940CE6B}>)
  3: (PRINT-BACKTRACE :STREAM #<SB-IMPL::STRING-OUTPUT-STREAM {10092CC473}> :START 0 :FROM :DEBUGGER-FRAME :COUNT 4611686018427387903 :PRINT-THREAD T :PRINT-FRAME-SOURCE NIL :METHOD-FRAME-STYLE NIL :EMERGENCY-BEST-EFFORT NIL)
  4: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE-TO-STREAM #<SB-IMPL::STRING-OUTPUT-STREAM {10092CC473}>)
  5: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE #<ESRAP:ESRAP-PARSE-ERROR #<ESRAP::FAILED-PARSE PGLOADER.PARSER::KW-LOAD @0> @0 {10092CC2F3}> :OUTPUT NIL :IF-EXISTS :APPEND :VERBOSE NIL)
  6: ((FLET #:H0 :IN PGLOADER::MAIN) #<ESRAP:ESRAP-PARSE-ERROR #<ESRAP::FAILED-PARSE PGLOADER.PARSER::KW-LOAD @0> @0 {10092CC2F3}>)
  7: (SIGNAL #<ESRAP:ESRAP-PARSE-ERROR #<ESRAP::FAILED-PARSE PGLOADER.PARSER::KW-LOAD @0> @0 {10092CC2F3}>)
  8: (ERROR ESRAP:ESRAP-PARSE-ERROR :TEXT "" :RESULT #<ESRAP::FAILED-PARSE PGLOADER.PARSER::COMMANDS @0>)
  9: (ESRAP:ESRAP-PARSE-ERROR "" #<ESRAP::FAILED-PARSE PGLOADER.PARSER::COMMANDS @0>)
 10: (PGLOADER.PARSER:PARSE-COMMANDS-FROM-FILE #P"/home/yakout/web_applications/bloovo/ATS/Tabadul/log_1.log")
 11: (PGLOADER:RUN-COMMANDS #P"/home/yakout/web_applications/bloovo/ATS/Tabadul/log_1.log" :START-LOGGER NIL :FLUSH-SUMMARY T :SUMMARY NIL :LOG-FILENAME NIL :LOG-MIN-MESSAGES NIL :CLIENT-MIN-MESSAGES NIL)
 12: (PGLOADER::PROCESS-COMMAND-FILE ("mssql:///CrAdmin:CrPass@dmin123@127.0.0.1:1433/DevCareerDB" "postgresql:///yakout@127.0.0.1:5432/tabadul_local" "-L" "log_1.log") :FLUSH-SUMMARY T)
 13: (PGLOADER::MAIN ("pgloader" "--dry-run" "--debug" "mssql:///CrAdmin:CrPass@dmin123@127.0.0.1:1433/DevCareerDB" "postgresql:///yakout@127.0.0.1:5432/tabadul_local" "-L" "log_1.log"))
 14: ((LAMBDA NIL :IN "/build/pgloader-2tdEH0/pgloader-3.4.1+dfsg/dumper-2SKVI5f7.lisp"))
 15: ((FLET #:WITHOUT-INTERRUPTS-BODY-88 :IN SAVE-LISP-AND-DIE))
 16: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))
2021-04-21T16:31:50.013000Z NOTICE Starting pgloader, log system is ready.
2021-04-21T16:31:50.021000Z INFO Starting monitor
2021-04-21T16:31:50.025000Z INFO Stopping monitor
Waiting for the monitor thread to complete.
What am I doing wrong here?
How can I fix it?
-
SQL: mean of appearances in two columns
I have correlations between pairs of time series stored in a PostgreSQL database in the following way:
first_object_identifier  second_object_identifier  correlation_value
A                        B                         1.0
A                        C                         0.9
A                        D                         0.8
B                        C                         0.7
B                        D                         0.6
C                        D                         0.5

And I would like to get the mean of the correlations in which each identifier appears (in either of the two identifier columns):
object_identifier  mean_correlation
A                  mean_A
B                  mean_B
C                  mean_C
D                  mean_D

Where:
mean_A = (AB + AC + AD) / 3 = (1.0 + 0.9 + 0.8) / 3 = 0.9
mean_B = (AB + BC + BD) / 3 = (1.0 + 0.7 + 0.6) / 3 = 0.766
mean_C = (AC + BC + CD) / 3 = (0.9 + 0.7 + 0.5) / 3 = 0.7
mean_D = (AD + BD + CD) / 3 = (0.8 + 0.6 + 0.5) / 3 = 0.633
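One shape of query that matches this (a sketch, assuming the table is called correlations): unpivot the two identifier columns with UNION ALL so every row is counted once per identifier, then average per identifier.

-- Each correlation row contributes to the mean of both of its identifiers.
SELECT object_identifier,
       AVG(correlation_value) AS mean_correlation
FROM (
    SELECT first_object_identifier AS object_identifier, correlation_value
    FROM correlations
    UNION ALL
    SELECT second_object_identifier, correlation_value
    FROM correlations
) t
GROUP BY object_identifier
ORDER BY object_identifier;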
-
Prevent SQL injection in api call
I have a React project with a PostgreSQL database, and I created the API calls myself. I'm pretty sure the way I have them written makes them susceptible to SQL injection. How can I rewrite them to avoid this, keeping in mind that I have tons of API calls I would have to rewrite?
app.post("/comment/add", function (req, res) {
  let sqlquery = `INSERT INTO dbo.comments(eqid, empid, comment, createddate)
    VALUES ('${req.body.eqid}', '${req.body.empid}', '${req.body.comment}', now())
    RETURNING commid`;
  try {
    console.log(req.body);
    pool.query(sqlquery, (err, result) => {
      console.log(result, err);
      if (result.rowCount) {
        res.json({ eqid: result.rows[0].eqid });
      } else {
        console.log(err);
        res.status(400).json({ error: "Error while adding comment" });
      }
    });
  } catch (error) {
  }
});
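From what I've read, the usual fix with node-postgres (which pool.query and RETURNING suggest is the driver here) is a parameterized query: keep $n placeholders in the SQL text and pass the values separately, e.g. pool.query(sqlquery, [req.body.eqid, req.body.empid, req.body.comment], callback). The statement itself would then look like this sketch:

-- Parameterized version of the INSERT above: $1..$3 are bound by the
-- driver, so user input is never spliced into the SQL string.
INSERT INTO dbo.comments (eqid, empid, comment, createddate)
VALUES ($1, $2, $3, now())
RETURNING commid;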
-
Is there any way to speed up the prediction process for TensorFlow Lattice?
I built my own model with Keras premade models in TensorFlow Lattice, using Python 3.7, and saved the trained model. However, when I use the trained model for prediction, the speed per data point is at the millisecond level, which seems very slow. Is there any way to speed up the prediction process for TFL?
-
How to print "৳" before or after a number in my Java program?
-
React memo saying props and next props are equal
I have a component which receives three props: two data arrays and an object. The object holds the varying data.
import isEqual from 'lodash/isEqual';

function Component({ array1, array2, object }) {
  return ...
}

const condition = (props, nextProps) => {
  console.log(props.object);
  console.log(nextProps.object);
  return isEqual(props.object, nextProps.object);
}

export default React.memo(Component, condition);
It says that props and nextProps have the same values. That is to say, props.name = 'test' and nextProps.name = 'test' when changing name to 'test'. Shouldn't props hold the previous values and nextProps bring the new ones? What am I missing here?
-
Formatting a single dictionary's keys and values as a table
I am looking to print the key/value pairs of a dictionary in a table format.
n = {'abcd':'alpha','efg':'beta'}
My current code
string = ("\n".join("{}==>{}".format(k, v) for k, v in n.items()))
Current output:
abcd ==> alpha
efg ==> beta
But I would like to have something like this:
Key   Value
abcd  alpha
efg   beta
I also tried setting a width, like:
string = ("\n".join("{:<30}==>{:<30}".format(k,v) for k, v in n.items()))
But it’s not helping
-
How to return results from 2 SQL Server tables that have one column in common
I've been reading for about 2 hours this afternoon and trying different things to get the results that I need but so far have failed.
Table: Schedule
- ScheduleID NOT NULL
- EmployeeID NOT NULL
- ItemDate NOT NULL
Table: Holidays
- HolidayID NOT NULL
- EmployeeID NOT NULL
- ItemDate NOT NULL
I want to return a result set that has all of the Schedule dates and all of the Holiday dates for a given EmployeeID
Sample data:
Schedule:
ScheduleID  EmployeeID  ItemDate
------------------------------------
1           1           1/1/2021
2           1           3/1/2021
Holiday:
HolidayID  EmployeeID  ItemDate
-----------------------------------
1          1           2/1/2021
Should return the following result set
ScheduleID 1  EmployeeID 1  ItemDate 1/1/2021
HolidayID 1   EmployeeID 1  ItemDate 2/1/2021
ScheduleID 2  EmployeeID 1  ItemDate 3/1/2021
I have tried all sorts of joins, inner, outer, right, left but I can't seem to find any scenario that works for what I want.
I'm happy to have NULL values for any of the columns in the returned result set as I can handle this in the code.
The closest I've got is this but I need to have the HolidayID (even if NULL) and/or the ScheduleID (even if NULL) in the results.
SELECT ScheduleID, HolidayID, EmployeeID, ItemDate
FROM Schedule
FULL OUTER JOIN Holiday ON Holiday.EmployeeID = Schedule.EmployeeID
WHERE EmployeeID = 1
ORDER BY ItemDate
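Since the goal is every row from both tables rather than a row-by-row match, a UNION ALL with NULL placeholders (a sketch using the table and column names above, with Holidays as declared in the schema) might be closer than any join:

-- Schedule rows get a NULL HolidayID, holiday rows a NULL ScheduleID,
-- and the combined set is sorted by date.
SELECT ScheduleID, NULL AS HolidayID, EmployeeID, ItemDate
FROM Schedule
WHERE EmployeeID = 1
UNION ALL
SELECT NULL, HolidayID, EmployeeID, ItemDate
FROM Holidays
WHERE EmployeeID = 1
ORDER BY ItemDate;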
Thanks
-
When using a parameter for a join clause in Snowflake, I get a wrong result. Why?
I'm using DBeaver connected to a Snowflake database. I want to select data with a join clause, but I need to do it with parameters. My code is:
select count(*) from my_table as a ${join}

var join = 'LEFT JOIN table_b AS b ON a.ID = b.ID AND b.NAME = a.NAME'
When I run the select statement in DBeaver, I get a pop-up asking me to fill in the ${join} value. I put the value in the textbox and the command runs, but I get the WRONG result (1,254,242). When I run the following command directly:
select count(*) from my_table as a LEFT JOIN table_b AS b ON a.ID = b.ID AND b.NAME = a.NAME
I get the correct result (900,254).
Can anybody help, please? Thank you.
-
Boolean indexing in Cython
I am new to Cython and trying to Cythonize one of my functions:
PerHit  %Time  Line
================================
                def lap(grid, ghosts_bool, D, dt):
 88.3    14.9      grid_out = np.copy(grid)
 72.3    12.2      grid_out *= -6
  3.3     0.6      shore = grid[1:, :, :]
  1.5     0.3      wave = grid[:-1, :, :]
 69.6    11.8      shore[ghosts_bool[1:, :, :]] = wave[ghosts_bool[1:, :, :]]
 94.8    16.0      grid_out[:-1, :, :] += grid[1:, :, :]
 10.4     1.8      grid_out[-1, :, :] += grid[-1, :, :]
 37.1     6.3      grid[ghosts_bool] = 0
  2.3     0.4      shore = grid[:-1, :, :]
  1.4     0.2      wave = grid[1:, :, :]
 68.3    11.6      shore[ghosts_bool[:-1, :, :]] = wave[ghosts_bool[:-1, :, :]]
 93.6    15.8      grid_out[1:, :, :] += grid[:-1, :, :]
 10.5     1.8      grid_out[0, :, :] += grid[0, :, :]
 36.6     6.2      grid[ghosts_bool] = 0
It is already optimised in a NumPy manner, but it still remains a bottleneck in my program, because as the number of True cells in ghosts_bool grows, the operations with shore & wave become pretty heavy.
I have worked out how to predefine types, send MemoryViews to the function and do boolean indexing in Cython, but I bumped into the fact that Cython does not fully support in-place operations. grid_out *= -6 just needs an ellipsis, grid_out[...] *= -6, but this one does not work and throws an error: grid_out[1:, :, :] += grid[:-1, :, :].
How can I best implement this operation? Or is looping the only option?
-
What is the impact of a meta tag with content="noindex,nofollow"
One of my clients is discussing a partnership. The potential partner is asking us to insert the following meta tag in the head section of our site to "prevent their competitors from indexing the site":
<meta property="partnername:index" content="noindex,nofollow"/>
I really can't see what this tag does, or whether there is a risk that it prevents standard search engine crawlers (Google, Bing, ...) from indexing the pages.
To be clear: if there were a name attribute with the value "robots", this would be the usual way to tell search engines not to index the page and not to follow its links. But here it is different: there is no name attribute, only a property attribute with the partner's name followed by ":index".
-
Simple SQL query not using an index with an ORDER BY clause and a larger LIMIT in MariaDB
I just came across an issue where the database does not take an index into account when selecting a larger amount of data.
It's a simple select with an ORDER BY clause.
This query (or any other with a limit of less than a million)
SELECT * FROM mon_stat_detail WHERE 1 ORDER BY id DESC LIMIT 500000
properly uses an index on column id (by the way, it's not the primary key but a unique index).
While this query
SELECT * FROM mon_stat_detail WHERE 1 ORDER BY id DESC LIMIT 1000000
is using file sort.
The table is quite large: it has about 60 million records.
With filesort it takes 15 minutes and creates over 20 GB of data on disk.
However, if I force the index on the same query
SELECT * FROM mon_stat_detail FORCE INDEX (id_2) WHERE 1 ORDER BY id DESC LIMIT 1000000
it uses it and takes just seconds, as expected.
Any idea why this is happening? Why do I need to force this index on such a simple query?
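For reference, a first diagnostic (standard MariaDB EXPLAIN, nothing specific to this table) is to compare the plans chosen at the two limits:

-- The 'key' column shows whether the id_2 index is picked;
-- 'Extra' shows 'Using filesort' when it is not.
EXPLAIN SELECT * FROM mon_stat_detail ORDER BY id DESC LIMIT 500000;
EXPLAIN SELECT * FROM mon_stat_detail ORDER BY id DESC LIMIT 1000000;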
Index definition and columns: (screenshots omitted)
Database:
- Server version: 10.1.48-MariaDB-0+deb9u2 - Debian 9.13
- Protocol version: 10