I have a database with students. I import a bunch of fields and then have a bunch of calc and summary fields based on that import, so there are a total of 282 fields. Then I have a related table called "Student_Additional" which has another 138 related fields where we add our own internal information about students, and calcs on those. Then I have a third table which is for intaking new students which has another 64 fields. All told that's 484 student-related fields.
One thing I'm finding is that I end up doing a lot of backflips whenever I want to do a calculation or a SQL call which (for example) is based on the student's program (which is in the main Students table) and who their advisor is (which is in Student_Additional)... those calls are just more complicated.
I'm debating the pros and cons of spending a little time merging those two and getting rid of the "additional" table... but that means I'd have over 400 fields in that database, which makes it a little harder to work with.
Are there any technical reasons why I shouldn't merge them? Am I going to have speed problems (or speed improvements) if these 400+ fields are in the same table as opposed to split among two different tables?
There is no technical reason preventing the merge. There is a lot of practical, been-there-done-that experience that suggests combining them into a mega table is the wrong way to go. For speedy results, fewer fields are best. Do a bit of brainstorming and come up with ways to group those 400 fields in more ways than "student". Then create tables for those groupings. Also, ensure that you don't have any fields that are sequentially numbered: "field", "field2", "field3", "field4" is screaming "I want to be a related table".
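In SQL terms (FileMaker's ExecuteSQL follows the same pattern), pulling sequentially numbered fields into a related table looks roughly like the sketch below. The table and field names are hypothetical, and SQLite stands in for the FileMaker data layer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Anti-pattern: sequentially numbered fields on the student record.
cur.execute("""CREATE TABLE students_wide (
    id INTEGER PRIMARY KEY, name TEXT,
    phone1 TEXT, phone2 TEXT, phone3 TEXT, phone4 TEXT)""")

# Normalized alternative: one row per phone number in a related table.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE student_phones (
    student_id INTEGER REFERENCES students(id),
    phone TEXT)""")

cur.execute("INSERT INTO students VALUES (1, 'Ada')")
cur.executemany("INSERT INTO student_phones VALUES (?, ?)",
                [(1, '555-0100'), (1, '555-0101')])

# 'How many numbers does each student have?' is now one query,
# and a fifth phone number is a new row, not a new field.
rows = cur.execute("""SELECT s.name, COUNT(p.phone)
                      FROM students s
                      JOIN student_phones p ON p.student_id = s.id
                      GROUP BY s.id""").fetchall()
print(rows)  # [('Ada', 2)]
```

The wide table caps you at four phone numbers and makes counting them a four-way comparison; the related table removes both limits.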
Ok. Do you have specifics about what sort of problems you ran into? I actually grouped them by "import from the primary data source" and "new data". All 480+ fields are specific to students:
Are they interested in summer school? How many English credits do they have? How many more to graduate? How many math, science, social studies, and elective credits do they have? How many to graduate? What's the total of credits? Who's their coach? Their counselor? Home room? Grade level? Gender? Gender abbreviation? Parent phone number and name? Home address (multiple fields)? Prior schools... etc. etc.
There are no "field 1, field 2" the closest is student passwords which I'm going to spin off into a join table. We started with one that we imported, then 2 more we added, now we're up to 4 different programs that we store passwords and usernames for, so I'm going to spin them off... but that's it.
Those 64 fields are new for intake only, so I spun them off, although I already ran into one case where I wished I could have had them in the primary table because of a calc field I wanted to run on them.
Any input on "what's gone wrong" with many fields in a table would be appreciated so I have some ideas on how to split them up... Totally cool if you don't have time other than to say "you'll hate it" (which I'm hearing)...
I'm sure that every field is related to students, but not every field is specific to students, and I doubt you want to see all of those data points simultaneously.
The contact data is going to consist of names and address info. There might be a few fields that uniquely define a "student." The rest of those fields are definitely not what defines a student.
All of the examples you mention are relationships: "x has something to do with y."
"Interested in summer school" is something to do with summer school. If they did maths, science, and English in the last term, that is something to do with the courses that your school teaches and the courses the student enrolled in. Their coach? Their coach probably has lots of students. Unless every student has their own coach, that's something to do with the coach, not the student.
To take one example: if you have courses, and you track enrolments in those courses, then you automatically generate information such as how many math, science, social studies, and elective credits a student has taken.
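The courses-and-enrolments idea can be sketched in SQL like this (hypothetical table and field names, SQLite standing in for the database): the per-subject credit counts are derived from the join, so they never need to live as fields on the student record.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE courses (
    id INTEGER PRIMARY KEY, subject TEXT, credits REAL)""")
cur.execute("""CREATE TABLE enrollments (
    student_id INTEGER, course_id INTEGER, passed INTEGER)""")

cur.executemany("INSERT INTO courses VALUES (?, ?, ?)",
    [(1, 'math', 1.0), (2, 'math', 1.0),
     (3, 'science', 1.0), (4, 'english', 0.5)])
cur.executemany("INSERT INTO enrollments VALUES (?, ?, ?)",
    [(42, 1, 1), (42, 2, 1), (42, 3, 1), (42, 4, 1)])

# Credits earned per subject fall out of the join; no per-subject
# count fields on the student record are required.
rows = cur.execute("""SELECT c.subject, SUM(c.credits)
                      FROM enrollments e
                      JOIN courses c ON c.id = e.course_id
                      WHERE e.student_id = 42 AND e.passed = 1
                      GROUP BY c.subject
                      ORDER BY c.subject""").fetchall()
print(rows)  # [('english', 0.5), ('math', 2.0), ('science', 1.0)]
```

"How many more math credits to graduate?" then becomes the graduation requirement minus this sum, rather than another stored field to keep in sync.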
OK, thanks. I'm still not sure when/how to split them up. The places where I've split them up so far have mostly caused more pain than joy for me because of calculations. I'll continue to mull it over. I really do appreciate your input.
@Malcolm makes valid points. Not all data points need to "belong" to the same group, many data points will likely be empty for any given student and some data points will be repeated. I suggest focussing on repetitions as a potential starting point.
How many students does a given coach have? How many households (address) have more than one enrolled student? How many parents have more than one enrolled student? How many students were interested in summer school in a given year? All of these questions point to join tables as the most efficient method of getting these answers.
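To illustrate one of those questions with a hypothetical join table (SQLite here as a stand-in, names invented for the sketch), "how many students does a given coach have?" becomes a GROUP BY over the join table instead of a scan across a wide student record:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE coaches (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE coach_assignments (
    coach_id INTEGER, student_id INTEGER)""")

cur.executemany("INSERT INTO coaches VALUES (?, ?)",
                [(1, 'Kim'), (2, 'Lee')])
cur.executemany("INSERT INTO coach_assignments VALUES (?, ?)",
                [(1, 10), (1, 11), (1, 12), (2, 13)])

# One row per coach, with the student count derived from the join table.
rows = cur.execute("""SELECT co.name, COUNT(a.student_id)
                      FROM coaches co
                      JOIN coach_assignments a ON a.coach_id = co.id
                      GROUP BY co.id
                      ORDER BY co.name""").fetchall()
print(rows)  # [('Kim', 3), ('Lee', 1)]
```

The same shape answers the household and summer-school questions: each one is a join table plus a GROUP BY.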
I would add access privileges to the lot. Each program's staff likely needs to be limited to specific program data. Counselors need to see all program data… but not medical information. You can manage access by separating monolithic data into distributed tables and setting access privileges for each table. Otherwise you will need to manage access via scripts… and that is insecure.
You often mention calculation simplicity and performance as a rationale for a monolithic table. Have you considered using scripts instead of calculations in such cases? The calculation results would have the advantage of being stored and indexable while still allowing you to use related tables.
Regarding performance… a well-separated data structure will perform better than a monolithic one if well designed. I used to work on an aviation solution that had 900+ fields in one table. It performed OK… but development was a bear. There was a whole bunch of things we could not do without copious amounts of large calculations. Performance took a hit over WAN. Performance nosedived when it came time to summarize data. Indexes were numerous and large.
Normalization was always too expensive. More fields were added over time… which made any improvements more expensive. It was a vicious circle that my customer was never willing to break… and it allowed the competition to gain an edge.
Yeah this is super helpful. All the things like "counselor" and "summer school enrollment" are already in a separate table.
I do see I have ~25 fields for passwords and usernames which I can/should generalize and move out of the main table.
~10 are global "comment fields" I put in to organize the database since there aren't folders in the database. The name is something like "----- Attendance". Are these bad?
~75 fields are data that I cache nightly and display in tables on most screens. Things like attendance and credits earned by period. Originally these were portals, but they were super slow so I just cache this every night. Is there a value in putting these in separate tables if they're shown on most screens?
I also have a ton of calculations. Some are indexed, but some are left over from before I did the caching. In some cases I just copy the data out of these fields when doing the caching, in others the caching is done by SQL. I should probably try pruning these down, even if they're not displayed they could be slowing things down...
No technical problem with a 'couple of fields' like those 400 (-:
Although it might not be a perfect structure.
We do a lot of work on 'foreign FileMaker solutions' - meaning a customer has built their own solution but now he/she has no more time left to work on it, or the original developer is no longer available (among other reasons).
We came across a solution, created with FM11, with maybe 100-200 tables. One has over 4800 fields... When I saw that solution for the first time, I commented on the structure...
That thing runs very well! It is a mess to find out what is what, and the original developer was tripped up by his own 'design': instead of a line item, the whole record was deleted (a kingdom for anchor-buoy) - but it runs!
We are now on FM18 with that. One problem: opening 'define fields' takes way too much time. This started with FM16; FM15 was really crisp in that regard.
That said, a db with so many fields is - in my experience - always a design fault, so do really avoid that!! The example above tried to create an Excel-like overview of project cost/controlling... per week, for a couple of years back and into the future. 52 weeks a year, countless fields per 'set'. Umm.
(If one tries to simulate horizontal portals...)
A small classic SQL db has maybe 100 primary tables, most of them with half a dozen fields (or less). In FileMaker (older developers still have the 50-table limit in their veins...), we have a few tables with a looot of fields.
Can you please elaborate on scripts? I use some scripts to cache information overnight or on import so that it doesn't have to be a calc (i.e. combining first and last name into one field for easy display), but it's still a field...
I assume the overnight scripts are executed server-side or on a robot of some kind. The scripts I suggest run every time related data used in calculations is edited. Yes… a field is a field, however part of the performance equation is whether a field is stored or unstored, indexed or not.
On paper, calculation fields look great. The issues show up in situations where a calculation field can't be stored. Here is an example: I was working on an old app where the data structure was not very optimal: there was one table with 400+ fields, and on top of that some were repeating fields. I first broke that large table into multiple tables with relations. On paper everything was now nice. But some calculation fields could no longer be stored.
I modified a report because some fields had moved to different tables. That customer was a Mac shop, by the way. When I ran the report, I had a surprise: instantly I got the beach ball for 15-20 seconds. Scrolling the report to the next page brought back the beach ball again. The report was a nightmare, thanks to the unstored calculation fields.
Each time those unstored calculation fields were needed, they had to be recalculated. Since FileMaker is clever, it was recalculating only the fields that were to be displayed, not the others. But scrolling forced the recalculation again.
Calculation fields know when they need to be updated, since they know which other fields they depend on. When the developer uses scripts instead of (automatic) unstored calculations, each time a field is updated, the fields that depend on it are refreshed by a script and the result is stored. Yes, it's more work for the developer, but the performance is much better.
A calculation field's value is the result of a calculation based on some other field(s). When one of the fields this calculation depends on changes, that triggers the recalculation of the calculation field. All of this is fine when the calculation field's value can be stored. But that's not always the case.
For example, Table_C::some_field is a calculation field that depends on Table_A::field_X, Table_B::field_Y, and Table_C::field_Z.
If Table_C::some_field can be stored, and the option to store it is selected, then as soon as one of the dependent field value changes, the value of Table_C::some_field is recalculated and stored.
But if Table_C::some_field's calculation can't be stored - the help file lists the conditions under which a value can't be stored - Table_C::some_field will be recalculated only when it's needed, because it's present on a displayed layout or used in a report. This last-minute recalculation on a large report can take very long.
If instead Table_C::some_field is not a calculation field but a regular number field, and Table_A::field_X, Table_B::field_Y, and Table_C::field_Z each have a script attached to OnObjectSave, then the script that recalculates Table_C::some_field runs right away and the value is saved.
The role of the script is to calculate a value the same way a calculation field would, without the delay of an unstored calculation field's last-minute recalculation. On top of that, you may implement additional logic!