How to randomly get 20 elements from an np.array and save them to a DataFrame?

I have a DataFrame holding the numbers 1 to 80. How can I randomly pick 20 elements and save the result to another DataFrame? I can't save each list as a row; the elements are saved as columns instead. Later I want to try predicting each set of random elements with sklearn.

   import numpy as np
   import pandas as pd
   a = np.arange(1,81).reshape(8,10)
   pd.DataFrame(a)

I need to get 20 unique numbers and write them as one row. For example, in plain Python:

      from random import sample
      for x in range(1,20):
          i=sample(range(1,81), k=20)
          i.sort()
          print(x,'-',i)

It returns a list of 20 elements such as [1,3,5,8,34,45,12,76,45...], and I want it to look like this:

  0 1 2 3 4 5 6 7 8 9 10 11 12 ... 20
0 1 5 10 14 20 55 67 34 ......     20 elements
1
.
.

3 answers

  • answered 2019-01-11 05:19 anky_91

    Use df.sample() to sample rows from a DataFrame:

    a = np.arange(1,81).reshape(8,10)
    df = pd.DataFrame(a)
    df1= df.sample(frac=.25)
    >>df1
    
        0   1   2   3   4   5   6   7   8   9
    5   51  52  53  54  55  56  57  58  59  60
    3   31  32  33  34  35  36  37  38  39  40
    

    For a random permutation of the rows, use np.random.permutation():

    df.iloc[np.random.permutation(len(df))].head(2)
    
        0   1   2   3   4   5   6   7   8   9
    6   61  62  63  64  65  66  67  68  69  70
    1   11  12  13  14  15  16  17  18  19  20
    

    EDIT : To get 20 elements in a list use:

    import itertools
    list(itertools.chain.from_iterable(df.sample(frac=.25).values))
    #[71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    

    frac=.25 means 25% of the rows. Since you have 8 rows of 10 elements (80 in total), 25% gives you 2 rows, i.e. 20 elements. You can adjust the fraction depending on how many elements you have and how many you want.
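
    If working out a fraction is inconvenient, df.sample also accepts a row count through its n parameter. A minimal sketch, assuming the same 8x10 frame as above; the names rows and values and the random_state value are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 81).reshape(8, 10))

# n=2 asks for exactly two rows; random_state only makes the run repeatable
rows = df.sample(n=2, random_state=0)

# Flatten the two sampled rows (2 x 10) into a single list of 20 values
values = rows.to_numpy().ravel().tolist()
print(len(values))  # 20
```

    Note that this still samples whole rows, so the 20 values arrive as two runs of 10 consecutive numbers rather than 20 independently drawn elements.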

    EDIT1: Further to your edit in the question: print(df.values) gives you an array:

    [[ 1  2  3  4  5  6  7  8  9 10]
     [11 12 13 14 15 16 17 18 19 20]
     [21 22 23 24 25 26 27 28 29 30]
     [31 32 33 34 35 36 37 38 39 40]
     [41 42 43 44 45 46 47 48 49 50]
     [51 52 53 54 55 56 57 58 59 60]
     [61 62 63 64 65 66 67 68 69 70]
     [71 72 73 74 75 76 77 78 79 80]]
    

    You would need to shuffle this array using np.random.shuffle. In this case, do it on df.T.values, since you also want to shuffle the columns:

    np.random.shuffle(df.T.values)
    

    Then do a reshape:

    df1 = pd.DataFrame(np.reshape(df.values,(4,20)))
    
    >>df1
    
    
        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19
    0   4   3   10  2   8   7   1   5   6   9   14  13  20  12  18  17  11  15  16  19
    1   24  23  30  22  28  27  21  25  26  29  34  33  40  32  38  37  31  35  36  39
    2   44  43  50  42  48  47  41  45  46  49  54  53  60  52  58  57  51  55  56  59
    3   64  63  70  62  68  67  61  65  66  69  74  73  80  72  78  77  71  75  76  79
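
    In the output above only the column order changes, and the in-place shuffle depends on df.values exposing the frame's own data, which may not hold in every pandas version. As a hedged alternative (not the method used above), here is a sketch that shuffles all 80 values on the NumPy side and then reshapes; the seed and the name shuffled are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# The seed is an arbitrary choice so the example is repeatable
rng = np.random.default_rng(seed=0)

a = np.arange(1, 81).reshape(8, 10)

# permutation returns a shuffled copy of the flattened values; `a` is untouched
shuffled = rng.permutation(a.ravel())

# 80 shuffled values -> 4 rows of 20, each number appearing exactly once overall
df1 = pd.DataFrame(shuffled.reshape(4, 20))
print(df1.shape)  # (4, 20)
```

    Because every number appears once across the whole frame, each row automatically holds 20 unique values.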
    

  • answered 2019-01-11 05:22 jberrio

    This is a simple way using existing Stack Overflow answers:

    1- Flatten the array so it looks more like a list. This lets you deal with a single index instead of two array indexes.

    https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ndarray.flatten.html

    aflat = a.flatten()
    

    2- Choose random items from the flattened array using any of the answers here:

    How to randomly select an item from a list?

    3- With the selected data, build your dataframe
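
    The three steps above could be sketched as follows. This is one possible reading, assuming NumPy's Generator.choice for step 2 (the linked answers use other tools); n_rows, draws and result are hypothetical names:

```python
import numpy as np
import pandas as pd

a = np.arange(1, 81).reshape(8, 10)

# Step 1: flatten to a 1-D array of 80 values
aflat = a.flatten()

# Step 2: draw 20 unique values per row (replace=False), sorted as in the question
rng = np.random.default_rng()
n_rows = 5  # hypothetical number of draws
draws = [np.sort(rng.choice(aflat, size=20, replace=False)) for _ in range(n_rows)]

# Step 3: stack the draws so each one becomes a DataFrame row with columns 0..19
result = pd.DataFrame(np.vstack(draws))
print(result.shape)  # (5, 20)
```

    Each draw is independent, so a number can appear in several rows but never twice within the same row.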

  • answered 2019-01-11 07:07 Mayank Porwal

    You can also use numpy.random.choice, which lets you specify the exact number of rows you want in the sample:

    In [263]: a = np.arange(1,81).reshape(8,10)
    In [265]: b = pd.DataFrame(a)
    
    In [268]: b.iloc[np.random.choice(np.arange(len(b)), 5, False)]
    Out[268]: 
        0   1   2   3   4   5   6   7   8   9
    5  51  52  53  54  55  56  57  58  59  60
    7  71  72  73  74  75  76  77  78  79  80
    3  31  32  33  34  35  36  37  38  39  40
    1  11  12  13  14  15  16  17  18  19  20
    4  41  42  43  44  45  46  47  48  49  50
    

    You can change 5 to any number of rows up to len(b) (8 here), since the third argument, False, means sampling without replacement. You don't need to work out a fraction.