A bit of history . .
Video settings owe much to early movie practice, frame rate in particular. Way back then (nearly 100 years ago, when movies as we now know them were first getting going), the amplifiers were vacuum tubes (aka valves) with heaters to supply the electron source, and their extreme sensitivity to the powerline frequency was demonstrable. Thus a frame rate of 25 or 30 per second, tied to the 50 or 60 Hz power grid frequency, was the way a vertical lock was achieved. National standards defined this and generations of product implemented it. Fast forward to early TV, and this is still applicable: lose lock with the power frequency and the picture rolled up or down. With transistors and integrated circuit electronics the relevance of the powerline frequency lock has gone, but the standard remains.
Frame rates still matter . .
As has been mentioned, the day of analog has passed - in everything but frame rate. Every national standards and broadcast authority still adheres to the old definitions of frame rate for the picture, even though it's now digital. Since the real world is always analog, consider a digital transmission as just a superfast morse key - a powerful transmitter which is either on or off, rather than one which varies in amplitude and frequency. The elements of frames, picture quality and sound are encoded in the bursts of energy that are the ons and offs.
Why do phone cameras vary frame rates . .
Cameras on mobile phones are an interesting beast - the sensor is physically small, the lens likewise, and the focal length is only about 1-1.5. Camera nerds would know what that means, but the normal F-stop is 8. Zooming etc. is always by electronic means, so the sensor always supplies the same quality of image; the zoom just decides whether all of it is used or only a small part of it. And that explains why a high-res sensor (many megapixels) is needed for those features: otherwise, a zoomed image would pixelate quickly.
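To put rough numbers on the zoom point, here is a minimal sketch. It assumes that an electronic zoom of factor z simply crops the sensor to 1/z of its width and 1/z of its height; real phones may combine several lenses or do extra processing, so treat the function name and the model as illustrative only.

```python
def effective_megapixels(sensor_megapixels: float, zoom_factor: float) -> float:
    """Megapixels actually used after an electronic (crop) zoom.

    Assumption: zooming by zoom_factor keeps 1/zoom_factor of the width
    and 1/zoom_factor of the height, so the used area shrinks by the square.
    """
    if zoom_factor < 1:
        raise ValueError("zoom_factor must be 1 or greater")
    return sensor_megapixels / (zoom_factor ** 2)


if __name__ == "__main__":
    for zoom in (1, 2, 4):
        used = effective_megapixels(12, zoom)
        print(f"{zoom}x zoom on a 12 MP sensor uses about {used:.2f} MP")
```

Under that assumption a 12 MP sensor is down to 3 MP at 2x and under 1 MP at 4x, which is why the megapixel count matters for zoom.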
Enter low light situations for taking photos or video: evenings, or inside buildings. To get enough light on the sensor to deliver a usable image, you could open the aperture of the lens - make the opening physically larger by dialling down the f-stop (what a specialist camera would do). Since a phone lens cannot have its aperture changed, the phone camera simply gathers more light on the sensor by slowing down the frame rate. The lower the light level, the lower the frame rate. And that shows in the average frame rate in the clip properties: numbers below 23 for a PAL country or 27 for an NTSC country indicate that a lot of the imagery in the clip was taken in what the phone thought of as low light situations. I've seen some frame rates as low as 16 on clips from phone cameras. Playing the recorded clip back on the same phone isn't a problem - it's designed to allow for the frame rate variations - but exporting the recording (copying the saved file to another store) and using it there doesn't carry that playback tolerance with it. And thus those clips usually have problems playing in programs like video editors.
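As a rough illustration of those numbers, the sketch below works out a clip's average frame rate from its frame count and duration, compares it with the 23 (PAL) and 27 (NTSC) figures mentioned above, and shows how much longer each frame can be exposed at a lower rate. The function names are made up for this example, and the thresholds are my rule of thumb rather than anything from a standard.

```python
# Rule-of-thumb thresholds from the paragraph above, not from any standard.
LOW_LIGHT_THRESHOLD = {"PAL": 23.0, "NTSC": 27.0}


def average_frame_rate(frame_count: int, duration_seconds: float) -> float:
    """Average frames per second over the whole clip."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return frame_count / duration_seconds


def looks_like_low_light(avg_fps: float, standard: str) -> bool:
    """True if the average rate is below the rule-of-thumb threshold."""
    return avg_fps < LOW_LIGHT_THRESHOLD[standard]


def max_exposure_ms(fps: float) -> float:
    """Longest possible exposure per frame: one full frame interval."""
    return 1000.0 / fps


if __name__ == "__main__":
    fps = average_frame_rate(frame_count=480, duration_seconds=30.0)
    print(f"average frame rate: {fps:.2f}")
    print(f"low light suspected (PAL): {looks_like_low_light(fps, 'PAL')}")
    print(f"max exposure per frame: {max_exposure_ms(fps):.1f} ms")
```

A 30 second clip holding only 480 frames averages 16 frames per second, so each frame could be exposed for up to 62.5 ms, against roughly 33 ms at 30 frames per second.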
What to do . . .
Most video editors have difficulty managing source files whose frame rates vary significantly from those specified in the standards (PAL or NTSC mostly, but SECAM also); the error messages vary, but the bottom line has been that the editor crashes trying. Fixing the frame rate fixes the problem. I've seen comment that PD doesn't have this problem - credit the programmers for that. Although that may only be for FHD clips: since this seems to have arisen with 4K, possibly the problem does exist in PD for that sort of clip. Usually, it's lower frame rates rather than higher which cause the problem.
To fix/adjust the frame rate of a clip (meaning, make it the same as a standard), there are various programs available, a lot of them free. Examples: HandBrake, Format Factory and others. Using these programs, open a source file, specify the frame rate for the output file, give it a name, and the program progressively inserts extra frames through the clip to match the desired result.
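For anyone curious what "inserting extra frames" can look like, here is a minimal sketch of the simplest approach: repeating the most recent source frame whenever the output clock ticks and no new frame has arrived. This is an illustration under my own assumptions, not how HandBrake or Format Factory actually do it; they may blend or otherwise process frames. The function name and the timestamp list are hypothetical.

```python
from typing import List


def duplicate_to_constant_rate(source_times: List[float], target_fps: float) -> List[int]:
    """For each output frame, return the index of the source frame to show.

    source_times: capture time of each source frame in seconds, ascending.
    target_fps:   the constant frame rate wanted in the output.
    Each output tick shows the latest source frame captured at or before it,
    so slow stretches of the source get the same frame repeated.
    """
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    if not source_times:
        return []

    tolerance = 1e-9  # guards against floating point rounding at tick edges
    start, end = source_times[0], source_times[-1]
    picks: List[int] = []
    src = 0
    tick = 0
    while True:
        t = start + tick / target_fps
        if t > end + tolerance:
            break
        # Advance to the newest source frame available at time t.
        while src + 1 < len(source_times) and source_times[src + 1] <= t + tolerance:
            src += 1
        picks.append(src)
        tick += 1
    return picks


if __name__ == "__main__":
    # Four frames captured at 12.5 per second, converted to 25 per second.
    print(duplicate_to_constant_rate([0.0, 0.08, 0.16, 0.24], 25.0))
```

Each source frame ends up shown twice, which is all the conversion needs to do to reach the standard rate when the source ran at half speed.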