How to randomly get 20 elements from an np.array and save them to a DataFrame?
I have a DataFrame with the numbers 1 to 80. How can I randomly pick 20 elements and save the result to another DataFrame? I can't save each list as a row; the elements are saved as columns instead. Later I want to try to predict the random elements with sklearn.
a = np.arange(1,81).reshape(8,10)
pd.DataFrame(a)
I need to get 20 unique numbers and write them in one row. For example, in Python:
from random import sample
for x in range(1,20):
    i=sample(range(1,81), k=20)
    i.sort()
    print(x,'',i)
It returns a list such as [1,3,5,8,34,45,12,76,45...] with 20 elements, and I want it to look like:
0 1 2 3 4 5 6 7 8 9 10 11 12 ... 20
0 1 5 10 14 20 55 67 34 ...... 20 elements
1
.
.
3 answers

Use df.sample() to get a sample of rows from a DataFrame:
a = np.arange(1,81).reshape(8,10)
df = pd.DataFrame(a)
df1 = df.sample(frac=.25)
>>df1
    0   1   2   3   4   5   6   7   8   9
5  51  52  53  54  55  56  57  58  59  60
3  31  32  33  34  35  36  37  38  39  40
For a random permutation, use np.random.permutation():
df.iloc[np.random.permutation(len(df))].head(2)
    0   1   2   3   4   5   6   7   8   9
6  61  62  63  64  65  66  67  68  69  70
1  11  12  13  14  15  16  17  18  19  20
EDIT: To get 20 elements in a list use:
import itertools
list(itertools.chain.from_iterable(df.sample(frac=.25).values))
#[71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
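As a hedged alternative to working out a fraction, DataFrame.sample also accepts a row count through its n parameter, and the sampled rows can be flattened with NumPy instead of itertools. A minimal sketch, assuming the same 8x10 frame as in the question (the names picked_rows and values are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 81).reshape(8, 10))

# Ask for 2 of the 8 rows directly instead of passing frac=.25.
picked_rows = df.sample(n=2)

# Flatten the 2x10 block of values into a single list of 20 numbers.
values = picked_rows.to_numpy().ravel().tolist()
print(values)
```

Like the itertools version, this returns whole rows, so the 20 numbers always come as two runs of ten consecutive values.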
frac=.25 means 25% of the rows. Since the DataFrame has 8 rows of 10 (80 elements), 25% gives you 2 rows, i.e. 20 elements. You can adjust the fraction depending on how many elements you have and how many you want.
EDIT1: Further to your edit in the question:
print(df.values) gives you an array:
[[ 1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]
 [31 32 33 34 35 36 37 38 39 40]
 [41 42 43 44 45 46 47 48 49 50]
 [51 52 53 54 55 56 57 58 59 60]
 [61 62 63 64 65 66 67 68 69 70]
 [71 72 73 74 75 76 77 78 79 80]]
You need to shuffle this array using np.random.shuffle. In this case, do it on df.T.values, since you also want to shuffle the columns:
np.random.shuffle(df.T.values)
Then do a reshape:
df1 = pd.DataFrame(np.reshape(df.values,(4,20)))
>>df1
    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
0   4   3  10   2   8   7   1   5   6   9  14  13  20  12  18  17  11  15  16  19
1  24  23  30  22  28  27  21  25  26  29  34  33  40  32  38  37  31  35  36  39
2  44  43  50  42  48  47  41  45  46  49  54  53  60  52  58  57  51  55  56  59
3  64  63  70  62  68  67  61  65  66  69  74  73  80  72  78  77  71  75  76  79
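Shuffling df.T.values in place only changes df if that array is a writable view of the DataFrame's data, which may not hold in every pandas version. A sketch that avoids the in-place step by reordering the columns with np.random.permutation and then reshaping, assuming the same 8x10 frame (the name shuffled is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 81).reshape(8, 10))

# Reorder the columns with a random permutation instead of shuffling in place.
shuffled = df.iloc[:, np.random.permutation(df.shape[1])]

# Reshape the 8x10 values into 4 rows of 20, as in the answer above.
df1 = pd.DataFrame(shuffled.to_numpy().reshape(4, 20))
print(df1)
```

As in the output above, every row still consists of two shuffled blocks of ten consecutive numbers, so this is a column shuffle and not a uniform draw from 1 to 80.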

This is a simple way using existing Stack Overflow answers:
1 Flatten the array so it looks more like a list. This lets you work with a single index instead of two array indexes.
https://docs.scipy.org/doc/numpy1.15.0/reference/generated/numpy.ndarray.flatten.html
aflat = a.flatten()
2 Choose random items from the flattened array using any of the answers here:
How to randomly select an item from a list?
3 With the selected data, build your dataframe
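The three steps above can be sketched as follows. This is only one possible reading: it assumes random.sample from the standard library for step 2 (as in the question's own loop) and a one-row DataFrame for step 3; the names picked and df_row are illustrative.

```python
import random

import numpy as np
import pandas as pd

a = np.arange(1, 81).reshape(8, 10)

# Step 1: flatten the 8x10 array into 80 values with a single index.
aflat = a.flatten()

# Step 2: pick 20 distinct values and sort them, as in the question.
picked = sorted(random.sample(aflat.tolist(), k=20))

# Step 3: wrapping the list in another list makes it one row, not one column.
df_row = pd.DataFrame([picked])
print(df_row.shape)
```

Passing [picked] rather than picked is what puts the 20 numbers across columns 0 to 19 of a single row.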

You can also use numpy.random.choice, which lets you specify exactly how many rows you want in the sample:
In [263]: a = np.arange(1,81).reshape(8,10)
In [265]: b = pd.DataFrame(a)
In [268]: b.iloc[np.random.choice(np.arange(len(b)), 5, False)]
Out[268]:
    0   1   2   3   4   5   6   7   8   9
5  51  52  53  54  55  56  57  58  59  60
7  71  72  73  74  75  76  77  78  79  80
3  31  32  33  34  35  36  37  38  39  40
1  11  12  13  14  15  16  17  18  19  20
4  41  42  43  44  45  46  47  48  49  50
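You can change 5 to the number of rows you want, so you need not work out a fraction. Because the third argument (replace) is False, that number cannot exceed the number of rows (8 here), so passing 20 to this row-wise call would raise a ValueError.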
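To draw 20 individual elements instead of whole rows, the same function can be applied to the flattened values, which gives a population of 80. A minimal sketch under that assumption; the names row_idx, elems and one_row are illustrative:

```python
import numpy as np
import pandas as pd

a = np.arange(1, 81).reshape(8, 10)
b = pd.DataFrame(a)

# Row-wise draw: without replacement, size is capped at len(b) == 8.
row_idx = np.random.choice(len(b), size=5, replace=False)
rows = b.iloc[row_idx]

# Element-wise draw: 20 distinct values out of all 80.
elems = np.random.choice(b.to_numpy().ravel(), size=20, replace=False)

# Sort the values and store them as a single row of 20 columns.
one_row = pd.DataFrame([np.sort(elems)])
print(one_row)
```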