Constrain found set to ignore duplicates

I have a field -- products::parentNo -- that contains duplicate entries. I would like to script a find that returns the first entry and ignores the second, third, fourth, etc. I suspect I'll have to create a new field and loop through the records, setting a value in that field for the first record of each group. Are there any other options that don't require a new field?

If you're going to loop anyhow, you may as well sort by ParentNo and omit each record whose value is identical to that of the record before it; otherwise, go to the next record.
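The sort-then-omit pass described above can be sketched outside FileMaker as well. Here is a minimal Python sketch of the same logic (the record dicts and field names are invented for illustration; in FileMaker you'd use Sort Records, a loop, and Omit Record):

```python
# Sketch of the "sort by ParentNo, omit repeats" loop.
# `records` stands in for the found set; each dict mimics a record.
records = [
    {"parentNo": "P-100", "orderCode": "A1"},
    {"parentNo": "P-100", "orderCode": "B2"},
    {"parentNo": "P-200", "orderCode": "C3"},
]

# 1. Sort by parentNo (FileMaker: Sort Records).
records.sort(key=lambda r: r["parentNo"])

# 2. Walk the sorted records, keeping only the first of each run
#    (FileMaker: Omit Record when the value matches the previous record).
kept = []
previous = None
for record in records:
    if record["parentNo"] != previous:
        kept.append(record)
    previous = record["parentNo"]

print([r["parentNo"] for r in kept])
```

Sorting first is what makes the single pass sufficient: duplicates are guaranteed to be adjacent, so only the previous record needs to be remembered.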


The products table contains duplicate entries because an individual product may be offered in multiple catalogs (e.g., Jul 20, Aug 20, Sep 20), so each time I import the monthly catalog I create duplicate entries. I'm considering creating a new "products" table that would contain unique values, renaming the current "products" table to "catalog", and establishing a relationship between the two tables. I've been avoiding adding another table (trying not to add schema and cause re-work), but I find the lack of a "pure" products table (with no dupes) irritating.

A three-table setup (products, catalogues, and a product/catalogue cross-reference table) would allow each product to exist as a single record in ‘products’. Catalogues reference products through records in the x-ref table.


Can you please expand on the cross-reference table? Would a simple one-to-many relationship between the products and catalog tables not be sufficient?

You could try an SQL SELECT statement and group by parentNo.

Please note: SQL will ignore your found set, so that could be a problem.
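For readers who want to see what that grouped query does, here is the idea against a toy table, shown via Python's sqlite3 (FileMaker's ExecuteSQL accepts a similar SELECT, though its dialect is more limited; the table and field names here are invented):

```python
import sqlite3

# Toy stand-in for the products table (names invented for the example).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (parentNo TEXT, orderCode TEXT)")
con.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("P-100", "A1"), ("P-100", "B2"), ("P-200", "C3")],
)

# GROUP BY collapses each parentNo to a single row, ignoring duplicates.
# (SELECT DISTINCT parentNo would give the same result here.)
rows = con.execute(
    "SELECT parentNo FROM products GROUP BY parentNo ORDER BY parentNo"
).fetchall()
print(rows)
```

Note that this returns a list of values, not a found set of records, which is why the caveat about SQL ignoring the found set matters.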


If you want to avoid adding tables, you can add a field that marks a product as the master product. All duplicates would need to clear the field. When performing your searches, look for the products that have a mark in the field.
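The marker-field approach boils down to a single pass that flags the first occurrence of each value and clears the flag on every later duplicate. A small Python sketch of that logic (field and value names are invented; in FileMaker this would be a looping script setting the field):

```python
# Sketch of the "master marker" approach: flag the first record for each
# parentNo and clear the flag on every later duplicate.
records = [
    {"parentNo": "P-100", "isMaster": ""},
    {"parentNo": "P-200", "isMaster": ""},
    {"parentNo": "P-100", "isMaster": ""},
]

seen = set()
for record in records:
    if record["parentNo"] not in seen:
        record["isMaster"] = "X"   # first occurrence: mark it as master
        seen.add(record["parentNo"])
    else:
        record["isMaster"] = ""    # duplicate: make sure the mark is cleared

# A find for isMaster = "X" now returns one record per parentNo.
masters = [r for r in records if r["isMaster"] == "X"]
```

The catch, as noted above, is maintenance: every import has to re-run the marking pass so new duplicates get their field cleared.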


Do it. You'll need it a zillion times over.

SQL to identify duplicates:

SELECT "field" FROM "table"
GROUP BY "field"
HAVING COUNT(*) > 1

There are tons of SQL examples for most databases that will remove just the duplicates for you with a single SQL DELETE statement. Note, though, that you'd have to run the DELETE through JDBC or a plug-in, since FileMaker's own SQL still only supports SELECT.


For this particular case, I only need to omit the duplicates. (The entire records are not duplicated as each catalog entry has a unique orderCode field.) I’ll investigate the SQL examples, though, for another case where I may need to delete dupes.


Head over to FileMaker Hacks and search for "duplicates"; there are a few articles that handle this scenario and related situations.


My suggestion would be to have a Products table in which you have each product, description, etc. AND a Catalog table (at least) which is related by product_ID. This way you can look at "Catalog #3" for example and see each product record that was included in that catalog (by means of the product_ID_fk (foreign key) entered into each Catalog record). One product can appear in many catalogs while each catalog has only one entry for each product (if that's the desire).

If you just want to see a list of unique values for a given field in a table, there is a way to do this, although it won't be performant in large data sets, so it's only an option if you're not concerned with performance: create a summary field that lists your productNo (this list will show all values in the found set, including duplicates), then create a second, calculation-type field that applies the UniqueValues function to the summary field. This will provide a list of only unique values. AGAIN, this is not a great idea if you're running large tables or expect to place these fields on regularly accessed layouts, etc. It's just an example of a sort of "utility" functionality.
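For readers outside FileMaker, the effect of applying a unique-values function to the return-delimited list that a List summary field produces can be mimicked like this (a sketch only; the sample values are invented):

```python
# Mimic of deduplicating a return-delimited value list, like the one a
# List summary field produces, keeping the first occurrence of each value.
summary_list = "P-100\nP-100\nP-200\nP-100\nP-300"

seen = set()
unique = []
for value in summary_list.split("\n"):
    if value not in seen:
        seen.add(value)
        unique.append(value)

unique_list = "\n".join(unique)
```

Note this yields a list of values, not a found set of records, which is why it works best as a "utility" technique on a hidden layout.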

I have used this latter method in a couple of cases without any problems. In one case I use it on GPS coordinates that correlate to pins on Google Maps, where I only want one pin per GPS location even though the table may contain many entries with the same value. The summary and unstored calculation fields are only on a utility layout that is accessed only during an update script. All layouts in the app load immediately (over LAN) and the GPS update script runs in less than two seconds, so it's fine in this case. EDIT to add: I could collect the list of unique GPS values by SQL as well (SELECT DISTINCT), but in this case I like to occasionally look at both lists of values (together with pin label names, etc.). If you haven't seen it, you may like to visit this post on FileMakerHacks for SQL info. It's several years old, but still relevant.

I generally prefer to create arrays by looping through records rather than using summary fields, but I still occasionally make the exception when the situation warrants it. For example, I have one client who insists on running his inventory application over dodgy WiFi, and the summary field option has proven more bulletproof in his setting. His total inventory data set is fewer than 10,000 records, and the summary field is only accessed as a utility, not on any user-facing layouts.

All that being said, I would still recommend separate tables for products and catalogs (at minimum), provided I understand your intentions.


The cross-ref table is the link between 'products' and 'catalogue', where 'catalogue' records hold the master data for catalogues, like a title. The cross-ref table essentially provides a list of product items related to catalogues.


I agree with @Torsten regarding the third table (hence my "at least" comment above). I would have a Products table, a Catalog table (holding each issue's data, like date of issue, region, language, etc.) and a Catalog_Entries table or similar which contains "the children" of each catalog issue. This is dynamic and scalable.
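The three-table shape recommended here can be sketched in plain SQL; below is a toy version via Python's sqlite3 (table and column names are only illustrative, not the poster's schema). The join table lets one product appear in many catalogs without ever duplicating the product record:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- One record per unique product.
    CREATE TABLE products (
        product_id  INTEGER PRIMARY KEY,
        parentNo    TEXT UNIQUE,
        description TEXT
    );
    -- One record per catalog issue.
    CREATE TABLE catalogs (
        catalog_id INTEGER PRIMARY KEY,
        issue_date TEXT
    );
    -- Join ("catalog entries") table: which products appeared in which issue.
    CREATE TABLE catalog_entries (
        catalog_id INTEGER REFERENCES catalogs(catalog_id),
        product_id INTEGER REFERENCES products(product_id),
        orderCode  TEXT
    );
""")

con.execute("INSERT INTO products VALUES (1, 'P-100', 'Widget')")
con.executemany("INSERT INTO catalogs VALUES (?, ?)",
                [(1, "2020-07"), (2, "2020-08")])
# The same product appears in two catalog issues, stored only once.
con.executemany("INSERT INTO catalog_entries VALUES (?, ?, ?)",
                [(1, 1, "A1"), (2, 1, "B2")])

rows = con.execute("""
    SELECT c.issue_date, p.parentNo
    FROM catalog_entries e
    JOIN catalogs c ON c.catalog_id = e.catalog_id
    JOIN products p ON p.product_id = e.product_id
    ORDER BY c.issue_date
""").fetchall()
```

This also keeps per-issue data (like the unique orderCode mentioned earlier) on the entry record, where it belongs, rather than on the product.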
