What methods do we have to get a shuffle of a list of items?

I want to be able to produce a shuffle from the same set of values. The values aren't important because they are just placeholders, so let's say they are 1-9. I want to get the numbers 1-9 shuffled, and I want to do that a lot.

My first thought is to generate a semi-random string:
$pool = SetPrecision ( Pi ; 400 ) * Random

Next, I test to ensure that all items are present:

Let ( s = "0123456789" ; 10 = Length ( Filter( s ; $pool ) ) )

Finally, I get the first instance of each item from the chunk.
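For illustration, here's a rough Python sketch of that three-step pipeline. Arbitrary-precision pi isn't in Python's standard library, so the pi-based pool is stood in for by any long pseudo-random digit string; the helper names (`make_pool`, `all_items_present`, `first_instances`) are made up for this sketch.

```python
import random

def make_pool(length=400, seed=None):
    """Stand-in for SetPrecision ( Pi ; 400 ) * Random: a long digit string."""
    rng = random.Random(seed)
    return "".join(rng.choice("0123456789") for _ in range(length))

def all_items_present(pool, items="0123456789"):
    """Mirror of the Filter test: every required digit occurs in the pool."""
    return set(items) <= set(pool)

def first_instances(pool):
    """Keep the first occurrence of each digit, in order of appearance."""
    seen = []
    for ch in pool:
        if ch not in seen:
            seen.append(ch)
    return "".join(seen)

pool = make_pool()
if all_items_present(pool):
    print(first_instances(pool))  # ten digits in a pseudo-random order
```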

Any suggestions to do this in a different fashion?

I'm definitely just playing around and trying to be tricky, and can't claim to have vetted this much:

While([

	_VALUE_COUNT = 2000 ;

	_PADDING = "0000000000" ;

	~i = 0 ;

	~buffer = ""
];
	~i < _VALUE_COUNT ;
[
	// Appending ~i at end of ~next_entry to ensure no duplicate entries

	~next_entry = Right( _PADDING & _PADDING & Random ; 25 ) & "-" & ~i ;

	~buffer = "" & ~buffer & ",\"" & ~next_entry & "\":" & ~i ;

	~i = ~i + 1
];

	Let(
		~json_object = Replace( ~buffer ; 1 ; 1 ; "{" ) & "}" ;

		JSONListValues( ~json_object ; "" )
	)
)

p.s. It's been a while since I've posted a calc at the Soup. Great job with setting up the calculation text formatting feature on this forum! It looks great!

EDIT: I thought about the above calc some more, and, as much as I like it, I realize that it relies on FMP behavior that isn't actually set in stone. That is, AFAIK, Claris has not stated somewhere that their JSON functions will always sort object keys (as they currently do). So, in my eyes, that weakens this approach a lot, because it could just stop working some day with a product update.

Maybe a safer approach would be something like the following, which is based on how I have seen @jwilling handle sorting of JSON arrays:

While([

	_VALUE_COUNT = 300 ;

	_PADDING = "aaaaaaaaaaaaa" ;

	~i = 0 ;

	~buffer = ""
];
	~i < _VALUE_COUNT ;
[
	~random_string = Substitute( Random ;

		[ "." ; "" ]; [ "," ; "" ]; [ 0 ; "a" ]; [ 1 ; "b" ];
		[ 2 ; "c" ]; [ 3 ; "d" ]; [ 4 ; "e" ]; [ 5 ; "f" ];
		[ 6 ; "g" ]; [ 7 ; "h" ]; [ 8 ; "i" ]; [ 9 ; "j" ]
	);

	~next_entry = Right( _PADDING & _PADDING & ~random_string ; 25 ) & "-" & ~i ;

	~buffer = "" & ~buffer & "¶" & ~next_entry ;

	~i = ~i + 1
];

	Let([

		~buffer_no_leading_cr = Replace( ~buffer ; 1 ; 1 ; "" );

		~buffer_sorted = SortValues( ~buffer_no_leading_cr ; 1 /* Text Sort */ );

		~buffer_sans_sort_strings = Substitute( ~buffer_sorted ;

			[ "a" ; "" ]; [ "b" ; "" ]; [ "c" ; "" ]; [ "d" ; "" ];
			[ "e" ; "" ]; [ "f" ; "" ]; [ "g" ; "" ]; [ "h" ; "" ];
			[ "i" ; "" ]; [ "j" ; "" ]; [ "-" ; "" ]
		)
	];

		~buffer_sans_sort_strings	
	)
)
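Both variants above are instances of the same idea: decorate each value with a random sort key, sort, then strip the key. A hedged Python sketch of that technique (not the FileMaker calc itself; the function name is made up):

```python
import random

def shuffle_by_random_keys(values, seed=None):
    """Attach a random sort key to each value, sort by it, discard the keys."""
    rng = random.Random(seed)
    decorated = [(rng.random(), v) for v in values]
    decorated.sort(key=lambda pair: pair[0])
    return [v for _, v in decorated]

print(shuffle_by_random_keys(list(range(1, 10))))
```

Sorting on the random key alone is safe here because collisions from `random()` are vanishingly rare; the FileMaker calcs append `~i` to each key for the same reason, to guarantee every key is unique.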

Nice! That definitely works.

Very different to the methods that I have explored.

While( 
[ 
  ocean = SetPrecision ( Pi ; 400 ) * Random 
; r = 0
; n = 99
; i = 9 
; harvest = ""
]
; i < n

;[
 i = i + 1
; r = Position ( ocean ; i ; 3 ; 1 )
; harvest =  List ( harvest ; If ( r > 0 ; r ) )

]

;  harvest 
)

The idea with this one is that r, the position at which each value i first appears in the digit string, is used as the sort key for i.
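That keying can be sketched in Python as follows. This is a simplification, not the calc itself: it keys single values by their first occurrence in a fixed digit string, whereas the calc above searches for two-digit values of i starting at offset 3; the function name is made up.

```python
def shuffle_by_first_position(values, digit_stream):
    """Order values by where each one first appears in a random digit stream;
    values absent from the stream are dropped (mirroring the r > 0 check)."""
    keyed = [(digit_stream.find(str(v)), v) for v in values
             if str(v) in digit_stream]
    keyed.sort()
    return [v for _, v in keyed]

# With the leading digits of pi as the stream:
print(shuffle_by_first_position(range(1, 10), "314159265358979"))
# → [3, 1, 4, 5, 9, 2, 6, 8, 7]
```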

Another option I looked at, specific to the 0-9 set, is using the random location of i to extract a string:

While( 
[ 
  ocean = SetPrecision ( Pi ; 400 ) * Random 
; r = 0
; n = 99
; i = 9 
; harvest = ""
]
; i < n

;[
 i = i + 1
; r = Position ( ocean ; i ; 2 ; 1 )
; harvest =  List ( harvest ; If ( r > 0 ; Middle ( ocean ; r + 2 ; 40 ) ) )

; r = Position ( ocean ; i ; 2 ; 2 )
; harvest =  List ( harvest ; If ( r > 0 ; Middle ( ocean ; r + 2 ; 40 ) ) )

; r = Position ( ocean ;  i  ; 2 ; 3 )
; harvest =  List ( harvest ; If ( r > 0 ; Middle ( ocean ; r + 2 ; 40 ) ) )

; r = Position ( ocean ;  i  ; 2 ; 4 ) 
; harvest =  List ( harvest ; If ( r > 0 ; Middle ( ocean ; r + 2 ; 40 )  ) )

]

;  harvest 
)

This generates a list that looks like this:

2272770657928204548272652160464949683502
6913059975642646667704575504582672346313
0767485333281952881890918496036038816826
8385879160450388295473228102272770657928

The next step would be to pass those numbers through a filter (not yet written) that keeps only the first instance of each numeral, resulting in a ten-digit string which should exhibit some randomness.

2706598413 
6913057428 
0764853219
8357916042

Of course, setting the range to nine in your function produces the same result as this function. So it's already able to do both of these things.

[ Editing to add the filter mentioned above ]

While ( 
[ 
  s = "77862498454655329094008460731545239524379"
; template = "123456789"
; i = 0 
; x = ""
; y = ""
; len = Length ( s ) 
; test = Filter ( template ; s ) = template /* if this test fails we don't do any processing */
 ] 

;  test and i < len

; [ 
  i = i + 1
; y = Middle ( s ; i ; 1 )
; x = x & If ( PatternCount ( x ; y ) ; "" ; y )
  ] 
; Filter ( x ; template )
)
// --> 786249531
/*
s should be passed in from outside the function
Given a string of numerals, extract the first instance of each integer
Do not return a result unless all required integers are present
*/

Daaaaaang. Bringing Pi into the mix is tasty! I'll need to study that one more.

I also love @steve_ssh 's idea to sort a bunch of random values and shuffle in one swing, taking the original values along for the ride.

Here's my quick'n'dirty stab:

While ([
	~vals = "1¶2¶3¶4¶5¶6¶7¶8¶9" ;
	~len = ValueCount ( ~vals  ) ;
	~res = ""
];
~len > 0 ;
[
	// get a random 1-based idx within remaining range
	~idx = Ceiling ( Random * ~len ) ;

	// pluck out the value
	~val = GetValue ( ~vals ; ~idx ) ;
	~vals = LeftValues ( ~vals ; ~idx - 1 ) & RightValues ( ~vals ; ~len - ~idx ) ;
	~len = ~len - 1 ;

	// append it to the result
	~res = List ( ~res ; ~val )
];
~res
)
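This is the classic "pluck from a dwindling pool" shuffle. A hedged Python sketch of the same idea (not the FileMaker calc itself; the function name is made up):

```python
import random

def shuffle_by_plucking(values, seed=None):
    """Repeatedly remove a uniformly random element from a shrinking pool."""
    rng = random.Random(seed)
    pool = list(values)
    result = []
    while pool:
        idx = rng.randrange(len(pool))  # 0-based index into what's left
        result.append(pool.pop(idx))
    return result

print(shuffle_by_plucking(range(1, 10)))
```

With a uniform draw at every step, each of the n! permutations comes out with equal probability.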

And here's a comparably quick'n'dirty visual of the algo

EDIT: After eyeballing my picture above, I realized you could feasibly shuffle "in place" (though I think FM lacks the ability to do that efficiently under the hood) by appending the plucked value to the end of ~vals and still decrementing ~len on each iteration.

Another similar approach came to mind while at work today. IDK the implications for shuffle "quality", but, instead of plucking out vals randomly from a dwindling pool, you can leave the already-plucked values "in the running" for future swaps. Here's a version of that:

While ([
	~vals = "[1,2,3,4,5,6,7,8,9]" ;
	~len = ValueCount ( JSONListKeys ( ~vals ; "" ) ) ;
	~a = 0
];
~a < ~len ;
[
	// get a random 0-based idx
	~b = Floor ( Random * ~len ) ;

	// get vals
	~aVal = JSONGetElement ( ~vals ; ~a ) ;
	~bVal = JSONGetElement ( ~vals ; ~b ) ;

	// swap them
	~vals = JSONSetElement ( ~vals ;
		[ "[" & ~a & "]" ; ~bVal ; JSONNumber ] ;
		[ "[" & ~b & "]" ; ~aVal ; JSONNumber ]
	) ;

	~a = ~a + 1
];
JSONListValues ( ~vals ; "" )
)
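On the shuffle-"quality" question: drawing the swap partner from the full range at every step is known to be slightly biased, because the n^n equally likely execution paths cannot map evenly onto n! permutations. Restricting the partner to the not-yet-visited suffix gives the unbiased Fisher-Yates shuffle. A hedged Python sketch of both variants for comparison (function names made up):

```python
import random

def swap_full_range(values, seed=None):
    """The variant above: swap position a with a uniformly random position.
    Slightly biased: n**n paths cannot divide evenly into n! permutations."""
    rng = random.Random(seed)
    vals = list(values)
    for a in range(len(vals)):
        b = rng.randrange(len(vals))
        vals[a], vals[b] = vals[b], vals[a]
    return vals

def fisher_yates(values, seed=None):
    """Unbiased: the swap partner is drawn only from positions a..n-1."""
    rng = random.Random(seed)
    vals = list(values)
    for a in range(len(vals)):
        b = rng.randrange(a, len(vals))
        vals[a], vals[b] = vals[b], vals[a]
    return vals
```

For a nine-element set the bias is small in practice, but the Fisher-Yates form costs nothing extra to write.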

I like that coding. I decided to modify my code to output a single number, so that I could test the suggestions for speed and compare apples to apples.

The rounded, averaged run time (over a small number of tests) to generate three thousand shuffled strings was:

JW: 1
MF: 1.5
SSH: 3.8

These are only indicative numbers. I rounded them out and made the lowest number 1 for ease of comparison. @steve_ssh I'm not sure what is going on with your method but it's really slow. In fact I was surprised that all the methods were relatively slow.

@jwilling's is clearly the best for getting a single shuffled set of numbers. However, I do want to generate a lot of numbers. The method I outlined above, using pi to generate a long string and tear it to pieces, is designed to generate many numbers in a single pass. In the way I've implemented it, it gets about 300 strings at a time, so it is much faster for getting bulk numbers.

That's interesting! I've put @jwilling's code to the test here, running the 300 iterations you described.
These are the results:

Tested on MacBook Pro M1, locally hosted file, FMP 19.6.3

@Torsten Nice benchmarks. I'm personally a smidge fonder of the first calc I posted because it doesn't incur the overhead cost of json fns.

Seems a bit faster (lines with "*No_JSON") -- CAVEAT: ran on a different computer!!:

EDIT: I just sent you a pull request on your repo with those CFs added

EDIT: I think I see what is going on now.

I just re-read the original post again and now I understand that the size of the set being shuffled is always small (nine elements) -- but this shuffle operation needs to be done something like 3K times.

It is true, 3K function calls would probably take around 3+ seconds using the second function that I wrote. I think I misunderstood the original post and thought that the size of the set of elements being shuffled could be arbitrarily large, as in "shuffle the numbers from 1 to 3000". The intent was probably clear enough in the original post, but I just managed to twist the meaning around a little bit into my own interpretation.

Hmmm. I would think that the first method I posted, which uses JSON functions, could have some performance issues that would make it undesirable, but with the second version I would not have expected such a large run duration to generate a shuffled list of size 3000.

I can't really explain it. I'm not doing batch tests of running it over and over, but on my individual tests, using the second calc, the runtime for 3000 entries is always under 1/2 second. Doing 6000 entries is always under a second. It definitely starts to get slow, though, after 6000 entries, for example taking around 1.5 seconds to generate a 9000 entry shuffle list.

I should mention that this is using v.18 of FMP on MacOS High Sierra, so definitely an older setup.

Maybe there is something about running it over and over again which causes it to spike in its overhead, or maybe it just takes longer on more modern versions of FMP.

Thank you for mentioning it. This is absolutely relevant when comparing benchmarks. Everyone interested in doing this benchmark can download it from GitHub and do their own testing :slightly_smiling_face:

This is something I observed in FM on many occasions. When looping over a large number of iterations, execution time growth is above linear.

Yes, I agree. I have come to expect performance degradation with a high count of iterations. Looking at the numbers very crudely, the non-linear behavior starts to become easier to notice between 12K and 15K entries.

3000 entries --> .5 seconds
6000 entries --> 1 second
9000 entries --> 1.5 seconds
12000 entries --> 2.0 seconds
15000 entries --> 2.6 seconds

I wonder why FM shows this degradation when the file is locally hosted. There is no recursion involved.

I put it to the test in a hosted situation:
Server: Mac mini M1, 16 GB, connection Gbit Ethernet
Client: MacBook Pro M1, 8 GB, connection WLAN 120 Mbps

This alters performance substantially:

@Torsten @jwilling I believe that each of you has posted a screenshot of metrics where the "Script Name" column includes values that mention writing to records. Does this mean that you are timing a calculation plus a CRUD operation performed numerous times? If so, wouldn't that introduce variance that makes it difficult to assess the performance of the calculation itself, since the CRUD operation is an additional factor coming into play?

Yes, it does involve CRUD operations, in all test cases. The result of each loop needs to be stored for further use. This is a real-world scenario.

Here's the link to the file:

https://github.com/torstenbernhard/FM_ShuffleBenchmark.git

I'll take a look. Thank you very much for the link to the file @Torsten.

Without an idea of what the file is doing, and what the columns mean, it is difficult for me to interpret the screenshots and be assured that I have understood correctly.

EDIT: I see that you had posted the link earlier on in this thread. Sorry! I had overlooked it.

This is a real-world scenario.

That's fair, and serves as a good reminder about perf optimization in general. One might hone their shuffle implementation down to some crazy fast speed, only for it to not matter at all when they introduce CRUD and network into the mix.

In academic settings like this thread it's totally fair to test perf of the CF by itself, but in the real world, there's no substitute for measuring the actual thing you're working on for specific bottlenecks.
</tiny_gratuitous_soapbox>

Part 2: Question the method to write the data to records?! :rofl:

That's so true. There are so many layers operating in any OS, and the application thread is just one of many in the queue. Even in the very small sample below we can see that the same test can run fast and slow.

I use this sort of testing to ascertain the trends, such as @jwilling's code being consistently faster than mine, and mine, unusually, being faster than @steve_ssh's.

Here are my results, using @Torsten's benchmark.

Microsec Seconds Iterations Script
136070 .13607 300 shuffleJW_WriteToExistingRecords_No_JSON
152644 .152644 300 shuffleJW_WriteToExistingRecords_No_JSON
197229 .197229 300 shuffleJW_WriteToNewRecords_No_JSON
213184 .213184 300 shuffleJW_WriteToNewRecords_No_JSON
293153 .293153 300 shuffleJW_WriteToExistingRecords
376300 .3763 300 shuffleJW_WriteToNewRecords
298743 .298743 300 shuffleJW_WriteToExistingRecords
368175 .368175 300 shuffleJW_WriteToNewRecords
400563 .400563 300 shuffle_WriteToExistingRecords_MF
433018 .433018 300 shuffle_WriteToExistingRecords_MF
457993 .457993 300 shuffle_WriteToNewRecords_MF
509322 .509322 300 shuffle_WriteToNewRecords_MF
830037 .830037 300 shuffle_WriteToExistingRecords_MF
872208 .872208 300 shuffle_WriteToNewRecords_MF
6477797 6.477797 300 shuffle_WriteToNewRecords_SSH
6525850 6.52585 300 shuffle_WriteToExistingRecords_SSH
6640577 6.640577 300 shuffle_WriteToNewRecords_SSH
6663094 6.663094 300 shuffle_WriteToExistingRecords_SSH
7387508 7.387508 300 shuffle_WriteToNewRecords_SSH
9143149 9.143149 300 shuffle_WriteToExistingRecords_SSH

Hi @Malcolm.

I was able to run @Torsten's file on a more modern machine this morning, and it has left me with a suspicion that the file that generated the above metrics is using my version of the calculation to generate a shuffled set of size 300, rather than a shuffled set of size 9.

I'm wondering if you could please do me a favor and check this on your end? To generate a shuffled set of size 9, the _VALUE_COUNT variable needs to be initialized to a value of 9, instead of 300. I'm not certain that this is the explanation for the slow performance, but it seems like a fairly plausible thought.

I'll attach the file that I used to test, in case it helps. Basically, what I see is that when set to generate a shuffled set of size 300, the numbers relatively correspond to the large numbers shown above. When set to generate a shuffled set of size 9, they are considerably lower.

Thanks!
Shuffle_Mod_20231106_0821.fmp12.zip (495.5 KB)
