Count rows across certain columns in a dataframe if they are greater than another value and groupby another column
I have a dataframe:
df = pd.DataFrame({
'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
'Line_Item': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
'1Q16': [100, 120, 0, 200, 190, 210],
'2Q16': [100, 0, 130, 200, 190, 210],
'3Q16': [200, 250, 0, 120, 0, 190]})
I wish to count the number of rows in 1Q16, 2Q16, 3Q16 that are greater than zero, grouped by "BU". To count such rows in 1Q16, 2Q16, 3Q16 overall, I have already learned that I can use:
cols = ['1Q16','2Q16','3Q16']
df[cols].gt(0).sum()
In addition, I want to group the counts by BU.
1 answer

With your shown samples, please try the following:
cols = ['1Q16','2Q16','3Q16']
df[cols].gt(0).groupby(df['BU']).sum()
Output will be as follows:
       1Q16  2Q16  3Q16
BU
CRS     3.0   3.0   2.0
Total   2.0   2.0   2.0
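Putting the sample data and the answer together, a minimal end-to-end sketch (using the question's own frame) is:

```python
import pandas as pd

df = pd.DataFrame({
    'BU': ['Total', 'Total', 'Total', 'CRS', 'CRS', 'CRS'],
    'Line_Item': ['Revenues', 'EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    '1Q16': [100, 120, 0, 200, 190, 210],
    '2Q16': [100, 0, 130, 200, 190, 210],
    '3Q16': [200, 250, 0, 120, 0, 190]})

cols = ['1Q16', '2Q16', '3Q16']
# Boolean mask of values > 0, grouped by the BU column, then summed per group.
counts = df[cols].gt(0).groupby(df['BU']).sum()
print(counts)
```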
Explanation: the following is a detailed explanation of the above.
- Create the cols list, which holds the column names we want to work on.
- Use the gt function to flag values greater than 0 in the mentioned cols.
- Use groupby, passing df['BU'], to group those flags by the BU column.
- Apply the sum function to count the values greater than 0 in each group.
See also questions close to this topic

how can I filter using pandas dataframes by multiple values in one column
I have an Excel list with 2 columns (a and b) and 502,000 rows. I also have a second list with 1 column (a) and 55 rows.
How can I use pandas dataframes to filter the first list by the 55 values in that one column from the second list, then write the result to a different file?
Thanks
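A minimal sketch of one common approach, using Series.isin; the small in-memory frames stand in for the two Excel files, which you would normally load with pd.read_excel:

```python
import pandas as pd

# Stand-ins for the two Excel sheets (hypothetical data);
# in practice: big = pd.read_excel("list1.xlsx"), keep = pd.read_excel("list2.xlsx")
big = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": ["v", "w", "x", "y", "z"]})
keep = pd.DataFrame({"a": [2, 4]})

# Keep only the rows of the big list whose 'a' value appears in the small list.
filtered = big[big["a"].isin(keep["a"])]
# filtered.to_excel("filtered.xlsx", index=False)  # write the result to a new file
```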

Matching values simply without using loops
I have a table Distribution containing Amounts and Cumulative probabilities like this one :
Amount  Cumulative probability
100     0.25
200     0.50
300     0.75
And a sample of values randomly drawn between 0 and 1 like this one:
Draw
0.75
0.25
0.55
0.30
I would like to get the following table, where I retrieve the Amount corresponding to a Draw by matching loosely the Draw value and the Cumulative probability, without using loops:
Draw  Amount
0.75  300
0.25  100
0.55  200
0.30  100
I've found a solution that uses a pandas DataFrame and the merge_asof function of pandas, but the code seems massive. Besides, to use the merge_asof function, I need to sort the Draws, which I don't want in the end result.
I wonder if there is a much simpler/more straightforward solution (with or without pandas).
Here is my current code :
distribution = pd.DataFrame({"amount": [100, 200, 300],
                             "cumulative_probability": [0.25, 0.50, 0.75]})
draw = pd.DataFrame(np.random.rand(4), columns=["quantile"])
df_final = pd.merge_asof(draw.sort_values("quantile"), distribution,
                         left_on="quantile",
                         right_on="cumulative_probability").sample(frac=1)
Thank you for your help
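One loop-free alternative, sketched with np.searchsorted; it preserves the original order of the draws, assuming every draw is at least as large as the smallest cumulative probability (as in the example):

```python
import numpy as np
import pandas as pd

distribution = pd.DataFrame({"amount": [100, 200, 300],
                             "cumulative_probability": [0.25, 0.50, 0.75]})
draws = pd.Series([0.75, 0.25, 0.55, 0.30], name="draw")

# Index of the largest cumulative_probability <= draw.
# Caveat: a draw below the smallest probability would yield index -1 and wrap around.
idx = np.searchsorted(distribution["cumulative_probability"].values,
                      draws.values, side="right") - 1
result = pd.DataFrame({"draw": draws,
                       "amount": distribution["amount"].values[idx]})
```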

Adding column to pandas dataframe using group name in function when iterating through groupby
I have a set of data which I fitted using a function, this yielded a dict with fitting parameters where the keys correspond to the possible group names.
Imagine I have another dataframe with some of those groups and some corresponding xvalues. What I would like to do is get the yvalues for the xvalues in the second dataset using the fitting parameters from the dict, without merging the parameters onto the second dataset.
Here is a simplified example of what I would like to do. First I have a function using fitting parameters (not the real one):
def func(x, p):
    y = 0
    for i in range(len(p)):
        y += p[i] * x ** i
    return y
A DataFrame with the second dataset consisting of two columns to group on and some corresponding xvalues:
df = pd.DataFrame({'a': np.random.randint(3, size=20),
                   'b': np.random.randint(3, size=20),
                   'x': np.random.randint(10, high=20, size=20)})
A dict with fitting parameters (groups of df are typically a sample of the dict keys):
params = {key: np.random.randint(5,size=3) for key in df.groupby(['a','b']).groups.keys()}
Now I want to calculate a new column 'ycalc', using the group names as selector for params and apply the function. In my head this would look something like:
for name, group in df.groupby(['a','b']):
    df['ycalc'] = func(params[name], group['c'])
But then the whole column is overwritten for each group, yielding NaN for all members outside the group. Another logical solution would be to use transform, but then I cannot use the group name as input (regardless of possible other syntax mistakes):
df['ycalc'] = df.groupby(['a','b'])['x'].transform(func, args=(params[name]))
What would be the best approach to get column ycalc?
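One possible sketch: keep the loop, but assign each group's result back through that group's index with df.loc, so only the group's own rows are written (random sample data as in the question, with a fixed seed added here):

```python
import numpy as np
import pandas as pd

def func(x, p):
    # Polynomial with coefficients p evaluated at x (as in the question).
    y = 0
    for i in range(len(p)):
        y += p[i] * x ** i
    return y

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility
df = pd.DataFrame({'a': rng.integers(3, size=20),
                   'b': rng.integers(3, size=20),
                   'x': rng.integers(10, 20, size=20)})
params = {key: rng.integers(5, size=3)
          for key in df.groupby(['a', 'b']).groups.keys()}

# Write each group's y-values only into that group's rows,
# using the group name (an (a, b) tuple) to look up its parameters.
for name, group in df.groupby(['a', 'b']):
    df.loc[group.index, 'ycalc'] = func(group['x'], params[name])
```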

how to resolve a GROUP BY error in TimescaleDB?
I faced an issue where the time column is not included in the select list; if I add it to the select, it shows an error saying the time column must be added to the GROUP BY.
The concept of the query is:
select "status" from detail_status where "CAId" = 'test1234' group by "status" order by "time" desc limit 2;
If I add the time column to the GROUP BY, the result comes back in the wrong order; I need the result of the above query grouped by the "status" column only.
Please assist with this issue or suggest another idea for this query concept.
Thanks in advance.

Sum where respecting conditions group by
How can I add a condition to my groupby to say: if region == 'EUROPE', do the sum grouped by id and date? That is, for each id and date, we want to know the distance in Europe.
df['distance_europe'] = df.groupby(['id', 'date'])['distance'].transform('sum')
data
df = pd.DataFrame({'id': ['x2', 'x1', 'x1', 'x1'],
                   'date': ['20210103', '20210102', '20210101', '20210101'],
                   'distance': [100, 200, 200, 100],
                   'status': [0, 1, 2, 3],
                   'region': ['USA', 'EUROPE', 'EUROPE', 'EUROPE']})
expected output
df['distance_europe'] = [0, 200, 300, 300]
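One possible sketch: zero out the non-Europe distances with Series.where first, then sum per (id, date) group with transform, which broadcasts each group total back to every row:

```python
import pandas as pd

df = pd.DataFrame({'id': ['x2', 'x1', 'x1', 'x1'],
                   'date': ['20210103', '20210102', '20210101', '20210101'],
                   'distance': [100, 200, 200, 100],
                   'status': [0, 1, 2, 3],
                   'region': ['USA', 'EUROPE', 'EUROPE', 'EUROPE']})

# Replace distances outside Europe with 0, then sum within each (id, date) group.
europe_distance = df['distance'].where(df['region'] == 'EUROPE', 0)
df['distance_europe'] = europe_distance.groupby([df['id'], df['date']]).transform('sum')
```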

Mysql query to get created and resolved defect group by month
The table i am using is like bellow
CREATE TABLE IF NOT EXISTS `tickets` (
  `id` int(6) unsigned NOT NULL,
  `created` timestamp,
  `resolved` timestamp,
  PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;

INSERT INTO `tickets` (`id`, `created`, `resolved`) VALUES
('1', '20210101', '20210112'),
('2', '20210225', '20210115'),
('3', '20210310', '20210322'),
('4', '20210310', '20210322'),
('5', '20210310', '20210322'),
('6', '20210311', '20210322'),
('7', '20210313', '20210322'),
('8', '20210313', '20210322'),
('9', '20210401', '20210312.');
Now I want a query that shows a table with columns like 'Month', NumberOdticketsCreated, NumberOfTicketsResolved.
I am trying this query, which does not display the correct result:
SELECT YEAR(`created`) AS y,
       MONTH(`created`) AS m,
       COUNT(`created`) AS NumberOdticketsCreated,
       COUNT(`resolved`) AS NumberOfTicketsResolved
FROM tickets
GROUP BY y, m

Loop through recordset and write records to specific rows based on condition
I have a recordset looping question for which I did not see a solution here that I can adapt, so I ask this as a question:
I have an Excel template for a list of line items, item 1 to 20, which are listed in rows 1 to 20 of the template. The cells A1 to A20 have the running number values "1" to "20" and this never ever changes.
I have an SQL Server database table that contains all line items of all commissioners. The table contains the commissioner name and the running numbers, among other things.
Initially, if 10 items are commissioned, they live in lines 1 to 10 of the template and have the corresponding running number in the database. What happens then is that some of these 10 items are sold, are taken off the list, and get deleted or archived in the database.
So I end up with a recordset with running numbers 1, 3, 4 and 7, as an example. What I need is that when I populate the Excel list with the remaining records from the database, that the remaining running numbers to sit in "their" row, where cell value in row A matches with running number of record.
The code I have loops though the recordset and just slaps the 4 remaining records into lines 1 to 4.
But what I need is a loop that moves from one line of the template to the next, checks the cell value in column A, checks if there is a corresponding running number value in the recordset, puts the record in that line if true, and moves to the next line.
Maybe I have to make the value of the "row" variable dependent on which running number values I have in my recordset, but that feels awfully complex, so maybe someone with more experience has a simpler solution.
My code (which is a simple "slap everything in" loop) looks like this:
The commissioner is selected in the SQL Query, and the "row" variable is defined elsewhere and just states the row number to start with.
Sub test()
    Do While (Not rs1.EOF)
        Worksheets("Tabelle1").Select
        'I tried at different places (inside and outside of the Do...Loop)
        'If rs1!RunningNumber.Value = ActiveSheet.Range("A" & Row).Value Then
        'but that only had the effect of stopping the loop after the first record.
        ActiveSheet.Range("B" & Row).Value = rs1!ArticleDescription.Value
        ActiveSheet.Range("F" & Row).Value = rs1!Size.Value
        ActiveSheet.Range("G" & Row).Value = rs1!CategoryID.Value
        ActiveSheet.Range("I" & Row).Value = rs1!Price.Value
        ActiveSheet.Range("L" & Row).Value = rs1!AdditionalInfo.Value
        Row = Row + 1
        rs1.MoveNext
    Loop
End Sub

Enable/Disable HashiCorp Terraform condition constraint block inside statement depending on expression value
I have a terraform statement with multiple condition blocks and I need to enable / disable one of them based on condition:
statement {
  sid     = "..."
  effect  = "Deny"
  actions = ["s3:PutObject"]

  condition {
    # ...
  }

  condition {
    test     = "ArnNotEquals"
    variable = "aws:PrincipalArn"
    values   = [var.needed_arn]
    # I need an expression which turns the current condition on/off, like this:
    # enabled = var.environment == "dev" ? true : false
  }
}
Is it possible to do this somehow? If not, maybe there's a way to turn the statements on/off? Thanks in advance!
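A sketch of one common approach, assuming this lives in a context that supports dynamic blocks (such as an aws_iam_policy_document data source): wrap the optional condition in a `dynamic "condition"` block whose `for_each` is an empty list whenever the block should be omitted:

```hcl
statement {
  sid     = "..."
  effect  = "Deny"
  actions = ["s3:PutObject"]

  # Emitted only when the environment is "dev"; otherwise for_each is
  # empty and no condition block is generated at all.
  dynamic "condition" {
    for_each = var.environment == "dev" ? [1] : []
    content {
      test     = "ArnNotEquals"
      variable = "aws:PrincipalArn"
      values   = [var.needed_arn]
    }
  }
}
```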

how to assign new groups to a data column in R?
I have a list of data tables (df1 and df2). I would like to know how to fix the sorting based on the Type and Area information in each data table. For example, df1$Type has 3 rows and df2$Type has 4 rows, where Bg is a common element. To avoid such confusion I am trying to assign each table a predefined group in a new column. The predefined groups G1 and G2 have a higher Area compared to the other available elements in each Type. So I would like to know 1) what would be the best approach to separate the groups if there is an overlap of types, and 2) how I can check whether the Area of the predefined group is higher compared to the other Type elements in each table.
df1 <- data.frame(
  No=c(3, 3, 3),
  Type=c("Bg", "Ea", "Xg"),
  Area=c(19, 5, 4))
df2 <- data.frame(
  No=c(4, 4, 4, 4),
  Type=c("Bg", "Ra", "Rm", "Xg"),
  Area=c(1, 10, 1, 20))

# list of different tables
df.table <- list(df1, df2)

G1 <- c("Bg", "Ea", "Xg")
G2 <- c("Ra", "Xg")
I tried
for (j in 1:length(df.table)) {
  grp <- as.data.frame(df.table[[j]])
  plot <- grp %>%
    mutate(Group = case_when(
      (Type %in% G1) && (Area[(Type == G1) > (Type != G1)]) ~ "G1",
      (Type %in% G2) && (Area[(Type == G2) > (Type != G2)]) ~ "G2",
      TRUE ~ "Other"))
  print(plot)
}
However, I get the output below. For No. 4 the Group should be G2.
No Type Area Group
3  Bg   19   G1
3  Ea   5    G1
3  Xg   4    G1

No Type Area Group
4  Bg   1    G1
4  Ra   10   G1
4  Rm   1    G1
4  Xg   20   G1

Expected output:
No Type Area Group
3  Bg   19   G1
3  Ea   5    G1
3  Xg   4    G1

No Type Area Group
4  Bg   1    G2
4  Ra   10   G2
4  Rm   1    G2
4  Xg   20   G2

Thanks