I'm using the GoogleDriveAPI to check the last time a file was modified, if the file has been modified after I last executed the script it should download the file and parse it, otherwise the local version is the latest one.
I'm testint the python script that perform the calls like this:
HR:M:S | Action | Behaviour |
---|---|---|
00:00:00 | Update Google Sheet | |
00:00:05 | call script | sees changes and modified time gets updated |
00:00:10 | Update Google Sheet | |
00:00:15 | call script | (DOES NOT SEE CHANGES) |
00:00:50 | call script | (DOES NOT SEE CHANGES) last modified time is still 00:00:00 |
00:03:00 | Update Google Sheet | |
00:03:05 | call script | (SEES CHANGES) |
what it looks like to me is that the Google API groups together changes that occur too close in time.
For reference 05 diff in seconds means instantly (just tabbing to terminal and back to GoogleSheet)
the python script I am using:
request = service.files().get(fileId='----------------', fields='*')
value = request.execute()
file_name = value['name'] + ".xlsx"
modified_date = value['modifiedTime'].split("T")[0]
modified_time = value['modifiedTime'].split("T")[1].split(".")[0]
# some checks
## after some checking with current time, which does not matter, since the
## value should come from an HTTP request,
## I still delete the variables where it is stored, both the request return and the values
del value
del modified_date
del modified_time
Is this a GoogleDriveAPI issue, or is some part of my code wrong?
something that python is caching maybe?
Or simply behavior from GoogleSheets itself grouping changes as I mentioned?
Is there any way to Note that the issue is 100% not related to credentials.
Also, the changes are there, if I simply read the document after the call to the API I can see the changes applied a second ago in the version I just downloaded. But this behaviour of polling is not desired
Answer
The issue you're describing is likely related to the way the Google Drive API reports file modifications, and not necessarily something wrong with your Python code. Here's a breakdown of the likely causes and some potential solutions:
Possible Causes:
-
Google Drive's Modified Time Granularity: The Google Drive API returns the
modifiedTime
field, which represents the time when the file was last modified. However, the granularity of this timestamp may not be precise enough to capture very small changes (like updates within a few seconds of each other). It's possible that Google Sheets or Google Drive groups multiple changes that happen in close proximity in time, which could lead to the behavior you're observing — the API reports the samemodifiedTime
even if there were changes within seconds of each other. -
Caching or API Delays: Google Drive may have a delay in propagating the changes or updating the metadata (like
modifiedTime
). This is particularly common in cloud services where changes are processed asynchronously across multiple servers. When you call the API, it might return a cached value of themodifiedTime
that hasn't been updated yet to reflect the most recent changes. -
Google Sheets Versioning vs. File Metadata: Google Sheets may be applying changes internally (e.g., auto-saving), but the file metadata might not be updated immediately. This could explain why you see changes in the document (such as new data in a cell) but the API reports the same
modifiedTime
. Google Sheets does have an auto-save mechanism that might not immediately trigger amodifiedTime
update for each minor change.
Solutions/Approaches:
1. Use revisionId
to Check for Changes
Instead of relying on the modifiedTime
, you can use the revisionId
field to check for updates. The revisionId
uniquely identifies a version of the file. If the revisionId
changes, it means the file has been updated, regardless of whether the modifiedTime
has been updated. You can compare the current revisionId
with the last known one to detect if the file has been modified.
Here’s how you could implement this:
# Retrieve the file metadata, including the revisionId
request = service.files().get(fileId='your_file_id', fields='revisionId')
value = request.execute()
# Store the last known revisionId in a persistent location (e.g., file, database)
last_revision_id = get_last_known_revision_id()
# Compare the current revisionId with the stored one
if value['revisionId'] != last_revision_id:
print("File has been modified.")
# Download and process the file
save_last_known_revision_id(value['revisionId'])
else:
print("No changes detected.")
2. Introduce a Short Delay Between Polling
If you're polling for changes frequently, introducing a small delay between requests might help. This gives enough time for Google Drive to propagate the file metadata updates. Something as simple as a time.sleep(1)
between requests could reduce the frequency of repeated "no changes" results.
3. Check for Updates Using files().list()
If the file is part of a larger collection or folder, you could use files().list()
with a q
parameter that checks for updates on multiple files or within a specific time range (e.g., modifiedTime > 'last_checked_time'
). This allows you to check if any relevant changes have occurred in your file’s directory.
Example:
query = "modifiedTime > 'last_checked_time'"
request = service.files().list(q=query, fields='files(id, name, modifiedTime)')
response = request.execute()
# Then check if your file is in the response
if 'files' in response:
for file in response['files']:
if file['id'] == 'your_file_id':
print("File has been modified.")
4. Use Google Drive Push Notifications
Google Drive supports Push Notifications to notify your application of changes. This is more efficient than polling because you don't need to repeatedly check the API. Instead, you can subscribe to changes on your file or folder and receive notifications whenever there is a modification. This can avoid the issue of polling too frequently and missing changes.
Example (after setting up the push notifications service):
# You'll need to set up a channel and subscription first to get notifications
5. Manual Revision Checking
In addition to the revisionId
, you can also compare the modifiedTime
with the timestamp of when you last checked, and decide if it's worth re-checking based on the time difference.
Final Suggestions:
-
Try Using the
revisionId
Approach: This is a better way to track changes that might not show up immediately with themodifiedTime
. TherevisionId
gives a more reliable indicator of whether a file has been updated, independent of minor timing differences or Google Drive's internal handling of file changes. -
Check with Notifications: If you're looking for more real-time tracking, consider setting up push notifications so you don’t have to poll manually.
-
Implement a Small Delay Between Polls: If polling is necessary, adding a delay between requests might help to avoid frequent "no changes" results.
Let me know if you need help implementing any of these suggestions!