I'm not sure what's going on, but ever since I updated my M1 mac mini server (from macOS Ventura 13.5 to 13.6) I've had a few times when the web server goes unresponsive. This is running FMS 19.6.3.
- FileMaker server is fine, one can connect to databases with FileMaker Pro
- Apache / httpd web server is unresponsive. You can not connect to / or to /fmi/webd or any other URL
- you can PING the server
- there are no crash reports on Console.app
My hunch is that apache/httpd was upgraded in 13.6, and there's some bug where it stops responding.
If it happens again I'll do more diagnostic work and report back.
This keeps happening, with the WebDirect service going dark perhaps once per week.
I found some logging data which looks informative:
[Fri Oct 20 11:32:58.464322 2023] [proxy:error] [pid 68260] (61)Connection refused: AH00957: http: attempt to connect to 127.0.0.1:16021 (127.0.0.1:16021) failed
[Fri Oct 20 11:32:58.466240 2023] [proxy_http:error] [pid 68260] [client 10.0.32.1:61661] AH01114: HTTP: failed to make connection to backend: 127.0.0.1, referer: [Redacted for privacy]
[Fri Oct 20 11:34:47.471479 2023] [proxy:error] [pid 459] (61)Connection refused: AH00957: http: attempt to connect to 127.0.0.1:16021 (127.0.0.1:16021) failed
[Fri Oct 20 11:34:47.472357 2023] [proxy_http:error] [pid 459] [client 10.0.32.1:49176] AH01114: HTTP: failed to make connection to backend: 127.0.0.1
[Fri Oct 20 11:34:47.473610 2023] [negotiation:error] [pid 459] [client 10.0.32.1:49176] AH00690: no acceptable variant: /usr/share/httpd/error/HTTP_SERVICE_UNAVAILABLE.html.var
It looks (to my untrained eye) that it's not the HTTPD process which is crashing but rather a webdirect proxy?
Right now, the only solution I've found is to reboot the whole server, but I wonder if there's a way to only kickstart the proxy?
It happened again. More info:
- the web server (httpd) is running, and I can view non-WebDirect pages in the document root
- in the FileMaker Server admin webpage, both the WebDirect and Web Publshing Engine (WPE) show as running.
- turning off WebDirect: I no longer get a 503 error, but get a 404 error.
- turning on WebDirect again: back to 503 error.
- turning off, then back on, the WPE: no change
fmsadmin restart x to restart WPE: no change
I gave up and rebooted the server again. This is very frustrating!
Perhaps having a scheduled restart would avoid the unscheduled restarts.
fmsadmin start httpserver -y
Thanks @Malcolm and @Kirk for suggestions. It's been running for 11 days now, and the only thing I've done so far is... nothing. I'm waiting for it to fail again and will report back when it does.
Still running strong for over 14 days now. I changed nothing on the FileMaker side. Why is it working better now? I have several ideas:
- I increased the server monitoring frequency from once every 60 minutes to every 15 minutes. This is a simple cronJob which runs a curl script looking for a
HTTP/1.1 200 reply. Could this increased access be doing something good, such as preventing the proxy, web server, or web direct process from going to sleep? Possibly. However, the monitor only runs from 0700 through 2300 hours, so it's not running for 8 hours every night, so this theory seems less likely.
- I updated the firmware on my router/firewall. What if there is some "packet of death" type attack that could cause the proxy to die, and the firewall is now blocking this attack?
- Perhaps there was some sort of attack, which has now stopped, and the timing is concidence?
Well of course after I declare "it's fixed" then it breaks. Just went offline again, same 503 error.
Tried this: had zero effect.
Any other ideas other then reebooting the whole macOS server?
- one of our clients sent out a mass email containing a link to the WebDirect website, about 10 minutes before the crash.
- The first evidence of a problem in the log files:
[Thu Nov 09 11:11:47.837883 2023] [proxy_http:error] [pid 53934] (20014)Internal error (specific information not available): [client A.B.C.D:3204] AH01102: error reading status line from remote server 127.0.0.1:16021
- So, something running on port 16021 on localhost appears to have failed. what process is it?
sudo lsof -i -P | grep LISTEN | grep :$PORT | grep 16021
java 1115 fmserver 61u IPv6 0x859b609868babbf5 0t0 TCP localhost:16021 (LISTEN)
Appears to be some Java process crashing.
Fishing around for java logs, I see this:
which appears to show the fault:
09-Nov-2023 11:11:45.892 WARNING [http-nio-127.0.0.1-16021-exec-8] org.apache.catalina.connector.Request.startAsync Unable to start async because the following classes in the processing chain do not support async [com.filemaker.jwpc.filter.JWPCFilter,com.FileMaker.jwpc.filter.PushRequestValidator]
java.lang.IllegalStateException: A filter or servlet of the current chain does not support asynchronous operations.
at com.filemaker.jwpc.iwp.application.AppServlet.service(Unknown Source)
at com.filemaker.jwpc.filter.JWPCFilter.doFilter(Unknown Source)
at com.filemaker.jwpc.filter.PushRequestValidator.doFilter(Unknown Source)
at com.filemaker.tomcat.FMErrorReportValve.invoke(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)ava.lang.Thread.run(Unknown Source)
Thinking that this could be a problem with my installed version of Java:
- to find the Java version installed for FileMaker Server:
cat /Library/FileMaker\ Server/Web\ Publishing/java/release
- which shows I'm running:
- the latest version(s) can be found here: https://adoptium.net/temurin/releases/?version=11
and it appears that I'm a bit behind, as
11.0.21+9 is available.
Might be worth upgrading to see if it fixes the problems?
You might try that fmsadmin command a couple times in a row. I've not had it restart on the first try more than once.
Not that much of anything in the logs are decipherable (at least by me), but the VAANDIN stuff if the WebDirect code, FWIIW
Claris stopped bundling Java. You would have seen the notices to install Java when you were installing FMS. I don’t have the correct version details on hand but they should be on the help/install pages.
Each FMS version is compatible with a specific java version. You should check with which Java version FMS 19.6.3 is compatible and install only that version. You may as well check if more than one version are installed.
FileMaker Server 19.2.1 through FileMaker Server 19.6.3 --> Claris International Inc. recommends the latest version of Java 11, for either OpenJDK or Oracle
There doesn't appear to be an easy way to upgrade in place. Per
the upgrade procedure appears to be:
- Disable Web Publishing
- stop the server
/FileMaker Server/Web Publishing/Java
- start the server
- Enable Web Publishing : this will trigger the initial Java installation process and you can install the newer version.
Very intersting - it happened again (the proxy died). Also, while doing some tests, I noticed that an idle webdirect page was getting the "1x JSON error":
Take note of any unsaved data, and click here to continue. Invalid JSON from server: 1X
As described here: ClarisPKB
What's interesting is that the workaround is to add a timeout to the proxy running on port 16021, which is the same proxy that is crashing.
I wonder if these are two different manifestations of the same core bug?
In any case, I've applied the fix, and will report back in 14 days (or maybe 14 minutes) when it happens again.
More info: I have two nearly identical servers, and am having the issue only on one of them.
- Both are 2020 Mac Minis with the M1 chip
- both have FMS 18.104.22.1682
- the "good" server is on macOS Ventura 13.5
- the "bad" server is on macOS Ventura 13.6
- the "good" server is running Java version Temurin-22.214.171.124 x86_64
- the "bad" server is running Java version Temurin-11.0.17+8 arm64
My recollection is that the problems started about the time I upgraded the "bad" server from 13.5 to 13.6. However, the fact that the two servers are on different Java versions, and the bad server is running the Apple Silicon version of Java could be important.
My luck remains consistenly bad... Literally 15 minutes after I went to bed, the server crashed, but this time it was a different crash:
- FileMaker still running
- entire HTTP server offline
- this time
fmsadmin httpserver start fixed things
Error logs are different:
tail /Library/FileMaker Server/HTTPServer/logs/ssl_error_log
[Fri Nov 10 21:31:46.420819 2023] [ssl:emerg] [pid 23119] AH02580: Init: Pass phrase incorrect for key example.com:443:0
[Fri Nov 10 21:31:46.421727 2023] [ssl:emerg] [pid 23119] SSL Library Error: error:0DFFF0A8:asn1 encoding routines:CRYPTO_internal:wrong tag
[Fri Nov 10 21:31:46.421742 2023] [ssl:emerg] [pid 23119] SSL Library Error: error:0DFFF03A:asn1 encoding routines:CRYPTO_internal:nested asn1 error
[Fri Nov 10 21:31:46.421754 2023] [ssl:emerg] [pid 23119] SSL Library Error: error:0DFFF0A8:asn1 encoding routines:CRYPTO_internal:wrong tag
[Fri Nov 10 21:31:46.421768 2023] [ssl:emerg] [pid 23119] SSL Library Error: error:0DFFF03A:asn1 encoding routines:CRYPTO_internal:nested asn1 error (Type=RSA)
[Fri Nov 10 21:31:46.421781 2023] [ssl:emerg] [pid 23119] SSL Library Error: error:04FFF004:rsa routines:CRYPTO_internal:RSA lib
[Fri Nov 10 21:31:46.421793 2023] [ssl:emerg] [pid 23119] SSL Library Error: error:0DFFF0A8:asn1 encoding routines:CRYPTO_internal:wrong tag
[Fri Nov 10 21:31:46.421806 2023] [ssl:emerg] [pid 23119] SSL Library Error: error:0DFFF03A:asn1 encoding routines:CRYPTO_internal:nested asn1 error (Type=PKCS8_PRIV_KEY_INFO)
[Fri Nov 10 21:31:46.421823 2023] [ssl:emerg] [pid 23119] AH02564: Failed to configure encrypted (?) private key example.com:443:0, check /Library/FileMaker Server/HTTPServer/conf/server.key
- it turns out the client was sending mass emails to several thousand customers with a link to the WebDirect site. I didn't know they were doing this, but once they told me this, the pattern became clear. Each email blast is followed by the proxy server crashing within in a few minutes. Logs show something linke 1500 hits to the
/fmi/webd/ url within under a minute. We've repeated this several times now and it's reproducible.
- It's not actually humans clicking the URL in the email to the WebDirect site, but rather some sort of email malware scanners accessing the URL.
- I don't know whether it's the volume of hits that is crashing the proxy or some other issue (such as a malformed HTTP request, bad headers, etc.) but in any case I'm working with the client to say "don't do that" (don't include WebDirect URLs in the emails).
- one bit of good news. I discovered that I no longer have to reboot the entire server, rather I can issue these commands:
fmsadmin restart SERVER
fmsadmin restart WPE
and the WebDirect comes back up. Still annoying as we have kick offline any FMPro users, but quicker & less disruptive then a full server reboot.