back/front filling values in SQL
I have a table in PrestoSQL that looks like this:
test
     id   value  timestamp
1    foo  blue   2019-10-17 17:42:52
2    foo  <NA>   2019-10-17 17:43:52
3    foo  <NA>   2019-10-17 17:44:52
4    foo  red    2019-10-17 17:45:52
5    foo  <NA>   2019-10-17 17:46:52
6    bar  <NA>   2019-10-17 17:47:52
7    bar  green  2019-10-17 17:48:52
8    bar  <NA>   2019-10-17 17:49:52
9    bar  <NA>   2019-10-17 17:50:52
10   bar  <NA>   2019-10-17 17:51:52
My objective is to fill in the missing values with the last value that appeared in the value column, for example:
output
     id   value  timestamp
1    foo  blue   2019-10-17 17:42:52
2    foo  blue   2019-10-17 17:43:52
3    foo  blue   2019-10-17 17:44:52
4    foo  red    2019-10-17 17:45:52
5    foo  red    2019-10-17 17:46:52
6    bar  <NA>   2019-10-17 17:47:52
7    bar  green  2019-10-17 17:48:52
8    bar  green  2019-10-17 17:49:52
9    bar  green  2019-10-17 17:50:52
10   bar  green  2019-10-17 17:51:52
I understand how to use lead() and lag(), but how would one write a query that fills in the current value from the previous known value (by timestamp, within an id) when it is not NA? Any suggestions would be appreciated.
1 answer
-
answered 2019-10-17 21:55
Piotr Findeisen
You need to use lag() with IGNORE NULLS. Example:

presto:default> SELECT
             ->     a, t, v,
             ->     coalesce(v, lag(v, 1) IGNORE NULLS OVER (PARTITION BY a ORDER BY t))
             -> FROM (VALUES
             ->     ('a', 1, 'red'),
             ->     ('a', 2, NULL),
             ->     ('a', 3, 'blue'),
             ->     ('a', 4, NULL),
             ->     ('a', 5, NULL)
             -> ) t(a, t, v);

 a | t |  v   | _col3
---+---+------+-------
 a | 1 | red  | red
 a | 2 | NULL | red
 a | 3 | blue | blue
 a | 4 | NULL | blue
 a | 5 | NULL | blue
(5 rows)
(tested in Presto 322)
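Applied to the table in the question, the same pattern would look like this (a sketch assuming the table really is named test and its columns are id, value and timestamp as shown; timestamp is quoted because it doubles as a type name):

SELECT
    id,
    coalesce(value, lag(value, 1) IGNORE NULLS OVER (PARTITION BY id ORDER BY "timestamp")) AS value,
    "timestamp"
FROM test;

Note that row 6 (the first bar row) stays <NA>, as in the expected output, because there is no earlier value in its partition.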
See also questions close to this topic
-
SQL query to 'collapse' rows which are close together
We have an application that captures searches that are made by users. Because of the nature of our search (we serve results after a few characters) and the speed at which people type, we are getting a log entry for every search/letter. It looks like an upside-down Christmas tree (see the data as text below).
We need this data internally for counting searches (aka API calls), but for reporting to our customers it is not very nice to report on 'half' queries.
I'm looking for a way to collapse these rows into one that has the longest/last term(s) for a search.
There is a catch: a user (cid) can make more than one search in a session, but we can separate those if we look at the timestamps, I guess.
It has to be something like:
1) Group rows which are no more than 2 seconds apart
2) Order by length (or take the last) query to get the final query
3) Group by terms to get a count of how often a term is used to report back (a sketch along these lines follows the expected result below)
Data as text:
2019-12-09  2019-12-09 12:58:45  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacuum cleaner
2019-12-09  2019-12-09 12:58:45  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacuum cleane
2019-12-09  2019-12-09 12:58:44  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacuum clean
2019-12-09  2019-12-09 12:58:43  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacuum clea
2019-12-09  2019-12-09 12:58:43  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacuum cle
2019-12-09  2019-12-09 12:58:42  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacuum cl
2019-12-09  2019-12-09 12:58:41  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacuum c
2019-12-09  2019-12-09 12:58:40  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacuum
2019-12-09  2019-12-09 12:58:39  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacuu
2019-12-09  2019-12-09 12:58:38  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vacu
2019-12-09  2019-12-09 12:58:37  5dea585477c94502b52c43fb  92cd6cef-3ed8-4416-ac2d-cc347780b976  search  1  search query  vac
2019-12-09  2019-12-09 12:58:15  5dea585477c94502b52c43fb  9b41fb1d-59d2-4a12-8974-b2261b2fe484  search  0  search query  blue widget
2019-12-09  2019-12-09 12:58:14  5dea585477c94502b52c43fb  9b41fb1d-59d2-4a12-8974-b2261b2fe484  search  0  search query  blue widge
2019-12-09  2019-12-09 12:58:13  5dea585477c94502b52c43fb  9b41fb1d-59d2-4a12-8974-b2261b2fe484  search  0  search query  blue widg
2019-12-09  2019-12-09 12:58:12  5dea585477c94502b52c43fb  9b41fb1d-59d2-4a12-8974-b2261b2fe484  search  0  search query  blue wid
2019-12-09  2019-12-09 12:58:12  5dea585477c94502b52c43fb  9b41fb1d-59d2-4a12-8974-b2261b2fe484  search  0  search query  blue wi
2019-12-09  2019-12-09 12:58:11  5dea585477c94502b52c43fb  9b41fb1d-59d2-4a12-8974-b2261b2fe484  search  0  search query  blue w
2019-12-09  2019-12-09 12:58:10  5dea585477c94502b52c43fb  9b41fb1d-59d2-4a12-8974-b2261b2fe484  search  0  search query  blue
2019-12-09  2019-12-09 12:58:09  5dea585477c94502b52c43fb  9b41fb1d-59d2-4a12-8974-b2261b2fe484  search  0  search query  blu
2019-12-09  2019-12-09 12:57:38  5dea585477c94502b52c43fb  f96305d9-590b-4a10-95a2-2d49a9fc63a3  search  1  search query  widget
2019-12-09  2019-12-09 12:57:37  5dea585477c94502b52c43fb  f96305d9-590b-4a10-95a2-2d49a9fc63a3  search  1  search query  widge
2019-12-09  2019-12-09 12:57:36  5dea585477c94502b52c43fb  f96305d9-590b-4a10-95a2-2d49a9fc63a3  search  1  search query  widg
2019-12-09  2019-12-09 12:57:35  5dea585477c94502b52c43fb  f96305d9-590b-4a10-95a2-2d49a9fc63a3  search  1  search query  wid
Expected result:
vacuum cleaner  1
blue widget     1
widget          1
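Following the three steps above, here is a sketch of a sessionize-then-aggregate approach (untested; it assumes a dialect with window functions plus Presto's max_by(), and the names search_log, cid, ts and term are stand-ins for the log table and its columns):

WITH flagged AS (
    SELECT cid, ts, term,
           -- step 1: a new session starts when the gap to the previous row exceeds 2 seconds
           CASE WHEN ts > lag(ts) OVER (PARTITION BY cid ORDER BY ts) + INTERVAL '2' SECOND
                THEN 1 ELSE 0 END AS is_new_session
    FROM search_log
),
sessions AS (
    SELECT cid, ts, term,
           -- a running count of session starts numbers the sessions per user
           sum(is_new_session) OVER (PARTITION BY cid ORDER BY ts) AS session_nr
    FROM flagged
),
finals AS (
    -- step 2: keep the longest query of each session
    SELECT cid, session_nr, max_by(term, length(term)) AS final_term
    FROM sessions
    GROUP BY cid, session_nr
)
-- step 3: count how often each final term occurs
SELECT final_term, count(*) AS searches
FROM finals
GROUP BY final_term;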
-
LEFT function is not working with sqlite3 in python
I am trying to run the following SQL query in Python; I have set up a mock database below.
import sqlite3
from datetime import datetime

conn = sqlite3.connect('database.db')

conn.execute("""CREATE TABLE IF NOT EXISTS books (
    title TEXT,
    author TEXT,
    pages INTEGER,
    published INTEGER
)""")

values = ('Deep Learning', 'Ian Goodfellow et al.', 775,
          datetime(2016, 11, 18).timestamp())
conn.execute("""INSERT INTO books VALUES (?, ?, ?, ?)""", values)

r = conn.execute("SELECT LEFT(title,2) FROM books")
print(r.fetchall())
However I get the error:
File "C:/Users/Daniel/Pictures/AllanGray/Revolut/stack.py", line 18, in <module> r = conn.execute("SELECT LEFT(title,2) FROM books") sqlite3.OperationalError: near "(": syntax error
If I just run a basic SELECT statement, it works. What is going wrong with the LEFT function?
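For reference, SQLite's core string functions do not include LEFT() (LEFT is only a keyword for joins, which is why the parser trips on the opening parenthesis); the equivalent prefix operation uses substr():

SELECT substr(title, 1, 2) FROM books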
-
Fixing Column values in MySQL table
I have a subtitle database website where users can upload subtitle files. To avoid duplicates, a friend made me another table that maps each language to a two-letter code, which is appended to the end of the file name when a subtitle is saved for a specific language. But I have a problem.
Everything works fine for languages where only one language name is written in the table, but where two or three language names are written in the table it does not work at all. Are the language names not written in the table as they should be, or what could be the problem? Here is the table's SQL file. Examples of languages that don't work: ('Gaelic; Scottish Gaelic', 'gd') or ('Spanish; Castilian', 'es') etc.:
-- version 4.8.3
-- https://www.phpmyadmin.net/
--
-- Host: localhost:3306
-- Generation Time: Dec 08, 2019 at 11:22 PM
-- Server version: 10.1.43-MariaDB-cll-lve
-- PHP Version: 7.2.7

SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET AUTOCOMMIT = 0;
START TRANSACTION;
SET time_zone = "+00:00";

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8mb4 */;

--
-- Database: `xxxx_xxx`
--

-- --------------------------------------------------------

--
-- Table structure for table `xxxx_xxxx`
--

CREATE TABLE `xxxxx_xxxxx` (
  `language` varchar(44) NOT NULL,
  `code` varchar(2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

--
-- Dumping data for table `xxxx_xxxx`
--

INSERT INTO `language_code` (`language`, `code`) VALUES
('Abkhazian', 'ab'), ('Afar', 'aa'), ('Afrikaans', 'af'), ('Albanian', 'sq'), ('Amharic', 'am'),
('Arabic', 'ar'), ('Aragonese', 'an'), ('Armenian', 'hy'), ('Assamese', 'as'), ('Avestan', 'ae'),
('Aymara', 'ay'), ('Azerbaijani', 'az'), ('Bashkir', 'ba'), ('Basque', 'eu'), ('Belarusian', 'be'),
('Bengali', 'bn'), ('Bihari', 'bh'), ('Bislama', 'bi'), ('Bosnian', 'bs'), ('Breton', 'br'),
('Bulgarian', 'bg'), ('Burmese', 'my'), ('Catalan', 'ca'), ('Chamorro', 'ch'), ('Chechen', 'ce'),
('Chinese', 'zh'), ('Church Slavic; Slavonic; Old Bulgarian', 'cu'), ('Chuvash', 'cv'), ('Cornish', 'kw'), ('Corsican', 'co'),
('Croatian', 'hr'), ('Czech', 'cs'), ('Danish', 'da'), ('Divehi; Dhivehi; Maldivian', 'dv'), ('Dutch', 'nl'),
('Dzongkha', 'dz'), ('English', 'en'), ('Esperanto', 'eo'), ('Estonian', 'et'), ('Faroese', 'fo'),
('Fijian', 'fj'), ('Finnish', 'fi'), ('French', 'fr'), ('Gaelic; Scottish Gaelic', 'gd'), ('Galician', 'gl'),
('Georgian', 'ka'), ('German', 'de'), ('Greek, Modern (1453-)', 'el'), ('Guarani', 'gn'), ('Gujarati', 'gu'),
('Haitian; Haitian Creole', 'ht'), ('Hausa', 'ha'), ('Hebrew', 'he'), ('Herero', 'hz'), ('Hindi', 'hi'),
('Hiri Motu', 'ho'), ('Hungarian', 'hu'), ('Icelandic', 'is'), ('Ido', 'io'), ('Indonesian', 'id'),
('Interlingua (International Auxiliary Languag', 'ia'), ('Interlingue', 'ie'), ('Inuktitut', 'iu'), ('Inupiaq', 'ik'), ('Irish', 'ga'),
('Italian', 'it'), ('Japanese', 'ja'), ('Javanese', 'jv'), ('Kalaallisut', 'kl'), ('Kannada', 'kn'),
('Kashmiri', 'ks'), ('Kazakh', 'kk'), ('Khmer', 'km'), ('Kikuyu; Gikuyu', 'ki'), ('Kinyarwanda', 'rw'),
('Kirghiz', 'ky'), ('Komi', 'kv'), ('Korean', 'ko'), ('Kuanyama; Kwanyama', 'kj'), ('Kurdish', 'ku'),
('Lao', 'lo'), ('Latin', 'la'), ('Latvian', 'lv'), ('Limburgan; Limburger; Limburgish', 'li'), ('Lingala', 'ln'),
('Lithuanian', 'lt'), ('Luxembourgish; Letzeburgesch', 'lb'), ('Macedonian', 'mk'), ('Malagasy', 'mg'), ('Malay', 'ms'),
('Malayalam', 'ml'), ('Maltese', 'mt'), ('Manx', 'gv'), ('Maori', 'mi'), ('Marathi', 'mr'),
('Marshallese', 'mh'), ('Moldavian', 'mo'), ('Mongolian', 'mn'), ('Nauru', 'na'), ('Navaho, Navajo', 'nv'),
('Ndebele, North', 'nd'), ('Ndebele, South', 'nr'), ('Ndonga', 'ng'), ('Nepali', 'ne'), ('Northern Sami', 'se'),
('Norwegian', 'no'), ('Norwegian Bokmal', 'nb'), ('Norwegian Nynorsk', 'nn'), ('Nyanja; Chichewa; Chewa', 'ny'), ('Occitan (post 1500); Provencal', 'oc'),
('Oriya', 'or'), ('Oromo', 'om'), ('Ossetian; Ossetic', 'os'), ('Pali', 'pi'), ('Panjabi', 'pa'),
('Persian', 'fa'), ('Polish', 'pl'), ('Portuguese', 'pt'), ('Pushto', 'ps'), ('Quechua', 'qu'),
('Raeto-Romance', 'rm'), ('Romanian', 'ro'), ('Rundi', 'rn'), ('Russian', 'ru'), ('Samoan', 'sm'),
('Sango', 'sg'), ('Sanskrit', 'sa'), ('Sardinian', 'sc'), ('Serbian', 'sr'), ('Shona', 'sn'),
('Sichuan Yi', 'ii'), ('Sindhi', 'sd'), ('Sinhala; Sinhalese', 'si'), ('Slovak', 'sk'), ('Slovenian', 'sl'),
('Somali', 'so'), ('Sotho, Southern', 'st'), ('Spanish; Castilian', 'es'), ('Sundanese', 'su'), ('Swahili', 'sw'),
('Swati', 'ss'), ('Swedish', 'sv'), ('Tagalog', 'tl'), ('Tahitian', 'ty'), ('Tajik', 'tg'),
('Tamil', 'ta'), ('Tatar', 'tt'), ('Telugu', 'te'), ('Thai', 'th'), ('Tibetan', 'bo'),
('Tigrinya', 'ti'), ('Tonga (Tonga Islands)', 'to'), ('Tsonga', 'ts'), ('Tswana', 'tn'), ('Turkish', 'tr'),
('Turkmen', 'tk'), ('Twi', 'tw'), ('Uighur', 'ug'), ('Ukrainian', 'uk'), ('Urdu', 'ur'),
('Uzbek', 'uz'), ('Vietnamese', 'vi'), ('Volapuk', 'vo'), ('Walloon', 'wa'), ('Welsh', 'cy'),
('Western Frisian', 'fy'), ('Wolof', 'wo'), ('Xhosa', 'xh'), ('Yiddish', 'yi'), ('Yoruba', 'yo'),
('Zhuang; Chuang', 'za'), ('Zulu', 'zu');
COMMIT;

/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
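The lookup code is not shown, so this is only a guess: if the application looks up the code with an exact match on a single language name, compound rows like 'Spanish; Castilian' will never match. In MySQL/MariaDB a semicolon-aware match works; the search term 'Spanish' here is just an example:

-- exact match finds nothing for compound names:
SELECT code FROM language_code WHERE language = 'Spanish';

-- treating '; ' as a list separator finds 'es':
SELECT code
FROM language_code
WHERE FIND_IN_SET('Spanish', REPLACE(language, '; ', ',')) > 0;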
-
Presto - how to perform correlations between all columns in one query
I have a table in the following format:
A  B  C  D
7  7  2  12
2  2  3  4
2  2  2  4
2  2  2  3
5  5  2  7
I would like to calculate correlations between each pair of columns using the built-in correlation function corr(y, x) → double (https://prestodb.io/docs/current/functions/aggregate.html).
I could iterate over all the column pairs and perform the corr calculation each time with:
select corr(A,B) from table
but I would like to reduce the number of times I access Presto and run it in one query if possible. Would it be possible to get as a result the column names that pass a certain threshold, or at least the correlation scores between all possible combinations, in one query?
Thanks.
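All pairwise correlations can be computed in a single scan by listing them in one SELECT. A minimal sketch, with my_table standing in for the real table name:

SELECT
    corr(A, B) AS corr_ab,
    corr(A, C) AS corr_ac,
    corr(A, D) AS corr_ad,
    corr(B, C) AS corr_bc,
    corr(B, D) AS corr_bd,
    corr(C, D) AS corr_cd
FROM my_table;

Returning only the pairs above a threshold would additionally require unpivoting this single result row, but the above already gets all scores in one query.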
-
How to fill in missing dates and keep running sum of values using SQL SELECT?
There's a table with values for certain dates:
date       | value
-----------+------
2019-01-01 | 50
2019-01-03 | 100
2019-01-06 | 150
2019-01-08 | 20
But what I'm hoping to do is create a time series with a running sum from the first to the last date so:
date       | value
-----------+------------
2019-01-01 | 50
2019-01-02 | 50
2019-01-03 | 150 (+100)
2019-01-04 | 150
2019-01-05 | 150
2019-01-06 | 300 (+150)
2019-01-07 | 300
2019-01-08 | 320 (+20)
The only constraint is that all tables are read-only, so I can only query them and not modify them. Does anyone know if this might be possible?
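A sketch in Presto syntax (the source table name t is assumed, and the date endpoints are hardcoded for clarity; in practice they could come from min/max subqueries): generate one row per day with sequence(), left-join the sparse values, and take a running sum:

WITH days AS (
    SELECT d
    FROM UNNEST(sequence(DATE '2019-01-01', DATE '2019-01-08', INTERVAL '1' DAY)) AS x(d)
)
SELECT
    days.d AS "date",
    -- running total: missing days contribute 0 and so carry the previous sum forward
    sum(coalesce(t.value, 0)) OVER (ORDER BY days.d) AS value
FROM days
LEFT JOIN t ON t."date" = days.d
ORDER BY days.d;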
-
How to query table based on specific rows in another table using SQL SELECT
There's a table with data for several teams that looks like this:
original_dates:

date       | team_id | value
-----------+---------+------
2019-01-01 | 1       | 13
2019-01-01 | 2       | 88
2019-01-02 | 1       | 17
2019-01-02 | 2       | 99
2019-01-03 | 1       | 26
2019-01-03 | 2       | 105
2019-01-04 | 1       | 49
2019-01-04 | 2       | 134
2019-01-04 | 1       | 56
2019-01-04 | 2       | 167
However, on a certain date, we want to reset that day's value to 0, set all previous dates with that ID to 0, and subtract that value from all following dates, with a minimum of 0. Here's a table of dates that need to be reset:
inflection_dates:

date       | team_id | value
-----------+---------+------
2019-01-02 | 2       | 99
2019-01-03 | 1       | 26
And here's the resulting table, which I'm hoping to achieve:
result:

date       | team_id | value
-----------+---------+------
2019-01-01 | 1       | 0
2019-01-01 | 2       | 0
2019-01-02 | 1       | 0
2019-01-02 | 2       | 0     <- row in inflection_dates (value was 99)
2019-01-03 | 1       | 0     <- row in inflection_dates (value was 26)
2019-01-03 | 2       | 6     (-99)
2019-01-04 | 1       | 23    (-26)
2019-01-04 | 2       | 35    (-99)
2019-01-04 | 1       | 30    (-26)
2019-01-04 | 2       | 68    (-99)
The only constraint is that all tables are read-only, so I can only query them and not modify them. Does anyone know if this might be possible?
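A sketch in plain SQL using the names from the question; it assumes every team has exactly one row in inflection_dates, as in the sample data:

SELECT
    o."date",
    o.team_id,
    CASE
        WHEN o."date" <= i."date" THEN 0        -- on or before the inflection date: reset to 0
        ELSE greatest(o.value - i.value, 0)     -- after it: subtract, floored at 0
    END AS value
FROM original_dates o
JOIN inflection_dates i ON i.team_id = o.team_id
ORDER BY o."date", o.team_id;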