I want to be able to produce a shuffle from the same set of values. The values aren't important because they are just place holders, so let's say they are 1 - 9. I want to get the numbers 1-9 shuffled, and I want to do that a lot.
My first thought is to generate a semi-random string: $pool = SetPrecision ( pi ; 400 ) * Random.
Next, to test to ensure that all items are present.
Let ( s = "0123456789" ; 10 = Length ( Filter( s ; $pool ) ) )
Finally, to get the first instance of each item from the chunk.
Any suggestions to do this in a different fashion?
p.s. It's been a while since I've posted a calc at the Soup. Great job with setting up the calculation text formatting feature on this forum! It looks great!
EDIT: I thought about the above calc some more, and, as much as I like it, I realize that it relies on FMP behavior that isn't actually set in stone. That is, AFAIK, Claris has not stated somewhere that their JSON functions will always sort object keys (as they currently do). So, in my eyes, that weakens this approach a lot, because it could just stop working some day with a product update.
Maybe a safer approach would be something like the following, which is based on how I have seen @jwilling handle sorting of JSON arrays:
Very different to the methods that I have explored.
While(
[
ocean = SetPrecision ( Pi ; 400 ) * Random
; r = 0
; n = 99
; i = 9
; harvest = ""
]
; i < n
;[
i = i + 1
; r = Position ( ocean ; i ; 3 ; 1 )
; harvest = List ( harvest ; If ( r > 0 ; r ) )
]
; harvest
)
The idea with this one is that the r value is used to sort the i value.
Using the random location of i to obtain a string is another option I looked at, which is specifically for the 0-9 set.
While(
[
ocean = SetPrecision ( Pi ; 400 ) * Random
; r = 0
; n = 99
; i = 9
; harvest = ""
]
; i < n
;[
i = i + 1
; r = Position ( ocean ; i ; 2 ; 1 )
; harvest = List ( harvest ; If ( r > 0 ; Middle ( ocean ; r + 2 ; 40 ) ) )
; r = Position ( ocean ; i ; 2 ; 2 )
; harvest = List ( harvest ; If ( r > 0 ; Middle ( ocean ; r + 2 ; 40 ) ) )
; r = Position ( ocean ; i ; 2 ; 3 )
; harvest = List ( harvest ; If ( r > 0 ; Middle ( ocean ; r + 2 ; 40 ) ) )
; r = Position ( ocean ; i ; 2 ; 4 )
; harvest = List ( harvest ; If ( r > 0 ; Middle ( ocean ; r + 2 ; 40 ) ) )
]
; harvest
)
The next step would be to pass those numbers through a filter, (not yet written) that keeps only the first instance of a numeral, resulting in a ten digit string which should exhibit some randomness.
2706598413
6913057428
0764853219
8357916042
Of course, setting the range to nine in your function produces the same result as this function. So it's already able to do both of these things.
[ Editing to add the filter mentioned above ]
While (
[
s = "77862498454655329094008460731545239524379"
; template = "123456789"
; i = 0
; x = ""
; y = ""
; len = length ( s )
; test = filter ( template ; s ) = template /* if this test fails we don't do any processing */
]
; test and i < len
; [
i = i + 1
; y = Middle ( s ; i ; 1 )
; x = x & If ( PatternCount ( x ; y ) ; "" ; y )
]
; filter ( x ; template )
)
// --> 786249531
/*
s should be passed in from outside the function
Given a string of numerals, extract the first instance of each integer
Do not return a result unless all required integers are present
*/
EDIT: After eyeballing my picture above, I realized you could feasibly shuffle "in place" (though I think FM lacks the ability to do that efficiently under the hood), by appending the plucked value to the end of ~vals and still decrement ~len on each iteration.
Another similar approach came to mind while at work today. IDK the implications for shuffle "quality", but, instead of plucking out vals randomly from a dwindling pool, You can leave the already-plucked values "in the running" for future swaps. Here's a version of that:
I like that coding. I decided to modify my code to output a single number, so that I could test the suggestions for speed and compare apples to apples.
The rounded averaged run time (over a small number of tests) to generate three thousand sorted strings was
JW: 1
MF: 1.5
SSH: 3.8
These are only indicative numbers. I rounded them out and made the lowest number 1 for ease of comparison. @steve_ssh I'm not sure what is going on with your method but it's really slow. In fact I was surprised that all the methods were relatively slow.
@jwilling's is clearly the best for getting a single sorted set of numbers. However, I do want to generate a lot of numbers. The method I outlined above using pi to generate a long string and tear it to pieces is designed to generate many numbers in a single pass. In the way I've implemented it, it gets about 300 strings at a time, so it is much faster for getting bulk numbers.
I just re-read the original post again and now I understand that the size of elements being shuffled is always small (nine elements) -- but, this shuffle operation needs to be done something like 3K times.
It is true, 3K function calls would probably take around 3+ seconds using the second function that I wrote. I think I misunderstood the original post and thought that the size of the set of elements being shuffled could be arbitrarily large, as in "shuffle the numbers from 1 to 3000". The intent was probably clear enough in the original post, but I just managed to twist the meaning around a little bit into my own interpretation.
Hmmm. I would think that the first method I posted, which uses JSON functions, could have some performance issues that would make it undesirable, but with the second version I would not have expected such a large run duration to generate a shuffled list of size 3000.
I can't really explain it. I'm not doing batch tests of running it over and over, but on my individual tests, using the second calc, the runtime for 3000 entries is always under 1/2 second. Doing 6000 entries is always under a second. It definitely starts to get slow, though, after 6000 entries, for example taking around 1.5 seconds to generate a 9000 entry shuffle list.
I should mention that this is using v.18 of FMP on MacOS High Sierra, so definitely an older setup.
Maybe there is something about running it over and over again which causes it to spike in its overhead, or maybe it just takes longer on more modern versions of FMP.
Thank you for mentioning it. This is absolutely relevant when comparing benchmarks. Everyone interested in doing this benchmark can download it from GitHub and do their own testing
Yes, I agree. I have come to expect performance degradation with a high count of iterations. Looking at the numbers very crudely, the non-linear behavior starts to become easier to notice between 12K and 15K entries.
@Torsten@jwilling I believe that each of you has posted a screenshot of metrics where the "Script Name" column includes values that mention writing to records. Does this mean that you are collecting metrics about the time it takes to run a calculation and perform a CRUD operation numerous times? If so, wouldn't that introduce some variance that would make it difficult to assess the performance of the calculation because there is an additional factor of a CRUD operation that is coming into play?
I'll take a look. Thank you very much for the link to the file @Torsten.
Without an idea of what the file is doing, and what the columns mean, it is difficult for me to interpret the screenshots and be assured that I have understood correctly.
EDIT: I see that you had posted the link earlier on in this thread. Sorry! I had overlooked it.
That's fair, and serves as a good reminder about perf optimization in general. One might hone their shuffle implementation down to some crazy fast speed, only for it to not matter at all when they introduce CRUD and network into the mix.
In academic settings like this thread it's totally fair to test perf of the CF by itself, but in the real world, there's no substitute for measuring the actual thing you're working on for specific bottlenecks. </tiny_gratuitous_soapbox>
That's so true. There are so many layers operating in any OS, and the application thread is just one of many in the queue. Even in the very small sample below we can see that the same test can run fast and slow.
I use this sort of testing to ascertain the trends, such as @jwilling's code being consistently faster than mine. And mine, unusually, being faster than @steve_ssh.
I was able to run @Torsten's file on a more modern machine this morning, and it has left me with a suspicion that the file that generated the above metrics is using my version of the calculation to generate a shuffled set of size 300, rather than a shuffled set of size 9.
I'm wondering if you could please do me a favor and check this on your end? To generate a shuffled set of size 9, the _VALUE_COUNT variable needs to be initialized to a value of 9, instead of 300. I'm not certain that this is the explanation for the slow performance, but it seems like a fairly plausible thought.
I'll attach the file that I used to test, in case it helps. Basically, what I see is that when set to generate a shuffled set of size 300, the numbers relatively correspond to the large numbers shown above. When set to generate a shuffled set of size 9, they are considerably lower.