sr_poll posts old files when the timestamp format of the file changes #366
Comments
What is being polled?
In most situations, you cannot select a different format. Which protocol are you polling?
It's using FTP.
So all we have is ASCII. We would need to improve the parser, but the server can produce any format it likes; it is not standardized. On Windows, it is even more entertaining.
That would be a big challenge if we cannot get the details from the remote server.
Should we just have an upper bound on the age of files, and ignore anything older than that? Would such a setting help in this case? If a file is more than one month old, we purposefully ignore it.
It makes sense to ignore files older than a specific number of days. Actually, that would be a good enhancement, because old files usually cause some backlog, and some client applications do not handle old data since they were not expecting it. It would also avoid pulling a lot of old files when the cache is not yet populated, especially when a poll is started for the first time.
Skip file dates older than a certain given time (2 months for ex). #366
…e = 60 days, but can be changed in config file to anything else. (value no longer hardcoded) #366
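A minimal sketch of the age cutoff discussed above, assuming the file's modification time is already available as seconds since the epoch. The function name echoes the _file_date_within_limits mentioned later in the thread, but its signature and the 60-day default expressed in seconds are illustrative assumptions, not the project's code.

```python
import time

# Assumed default cutoff: 60 days expressed in seconds (the thread settles on "60d").
FILE_TIME_LIMIT = 60 * 24 * 60 * 60


def file_date_within_limits(mtime, limit=FILE_TIME_LIMIT, now=None):
    """Return True if a file last modified at `mtime` (epoch seconds) is recent
    enough to be posted; anything older than `limit` seconds is skipped."""
    if now is None:
        now = time.time()
    return (now - mtime) <= limit


now = time.time()
print(file_date_within_limits(now - 3600, now=now))            # True: one hour old
print(file_date_within_limits(now - 90 * 24 * 3600, now=now))  # False: ninety days old
```

Passing `now` explicitly keeps a whole listing compared against a single reference time.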
In the rest of the configuration language, all time intervals are specified in seconds, with an optional unit suffix for minutes, hours, days, and possibly weeks. The code used to have varying units in different places (for example, log rotation in days and expiry in milliseconds), and that did confuse people in practice. There is a to_duration entry_point somewhere to which you just supply a string.
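A sketch of the kind of helper this comment refers to: a value with an optional unit suffix becomes a number of seconds, so "24", "24h" and "1d" are all accepted. The name parse_duration and the exact suffix set (s, m, h, d, w) are assumptions for illustration; the project's own to_duration entry point may behave differently.

```python
import re

# Multipliers for the optional unit suffix; a bare number means seconds.
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(value):
    """Convert a duration such as 3600, "90s", "24h", "1d" or "1.5w" to seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([smhdw]?)\s*", str(value).lower())
    if match is None:
        raise ValueError(f"invalid duration: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]


for setting in ("24", "24h", "1d", "60d"):
    print(setting, parse_duration(setting))
```

Normalizing every option to seconds at parse time means the rest of the code never has to know which unit a user typed.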
…et the default duration from days to seconds instead. Also updated the en and fr doc for sr_poll.1.rst with the updated functionality #366
In the documentation, the summary says the default is 3600 seconds, which is one hour, but the text says it is a month. It would be best to reconcile those. There is also a change to printing in sr.py that I don't think has anything to do with this patch.
The text says it is two months, and yes, I realized that mistake! I already made a patch for it with a default of 60d instead of 3600 (see the next two commits right after). And yes, the modification to sr.py was for a different issue; I accidentally made it on this branch.
Merged, great!
Great for v2. Needs a forward port to v3.
…than a certain time limit are ignored. default value: file_time_limit = "60d". #366
Testing from @junmtl: test for polling of old files when the date format on the remote server changes. file_time_limit only works with an integer: file_time_limit 24 works, file_time_limit 24h failed, file_time_limit 1d failed.
Playing with the flow tests, I found some issues:
…her format. Added it, and always use the logger for messages.
Quick patch, so that tests can still run... but I wonder whether the last-ditch message should change. It's just printing that the date wasn't in the last guessed format; it should say something like "unrecognized date format for: dos de mayo 1814".
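One way to structure that fallback: try a list of candidate formats in order, and when none match, log a message naming the offending text rather than the last format tried. The format list, the function name parse_listing_date and the use of datetime.strptime are assumptions for this sketch; the actual patch relies on dateparser.

```python
import logging
from datetime import datetime

logger = logging.getLogger(__name__)

# Candidate listing formats, tried in order. This list is illustrative only;
# real servers (Unix ls, DOS/Windows style, ...) vary widely.
DATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",   # ISO-like, e.g. 2023-01-15 12:34:56
    "%b %d %Y",            # ls for older files, e.g. Jan 15 2023
    "%m-%d-%y %I:%M%p",    # DOS style, e.g. 01-15-23 12:34PM
)


def parse_listing_date(text):
    """Return a datetime for the first format that matches, else log and return None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    # Last-ditch message names the unparseable text, not the last guessed format.
    logger.warning("unrecognized date format for: %s", text)
    return None
```

Returning None (instead of raising) lets the poll skip one odd line without aborting the whole listing.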
I guess v2 needs a patch similar to the v3 one. |
Added the above with an option for TIMEZONE; CONVERT_TO is always UTC. By default, TIMEZONE is UTC. I accidentally referenced the issue as 3660 instead of 366 in the commit message.
Can you fix this:
Also, you added a 'timezone' option, but there isn't a man page entry for it. I still only get 26/30 tests passing, but I'd rather look into that when there is something that can be merged.
I confirmed the TZ conversion works perfectly... that's exciting!
You also need to add dateparser to install_requires in setup.py, since it is a new dependency.
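The time zone handling described above (a TIMEZONE option, with conversion always to UTC) is implemented in the thread with dateparser. The sketch below swaps in the standard library's zoneinfo module (Python 3.9+, which needs the IANA time zone database on the host or the tzdata package) to show the same conversion; the function name and its parameter are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def listing_time_to_utc(naive, server_tz="UTC"):
    """Interpret a naive listing timestamp in the server's zone and return it in UTC."""
    # zoneinfo applies the correct UTC offset for that date, including DST.
    return naive.replace(tzinfo=ZoneInfo(server_tz)).astimezone(timezone.utc)


print(listing_time_to_utc(datetime(2023, 1, 15, 12, 0), "America/Toronto"))
```

This prints 2023-01-15 17:00:00+00:00; the same wall-clock time in July converts to 16:00 UTC because of daylight saving time.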
…having conflicts and could not be merged. #366
The commit above includes
What is the new branch called, and has it been pushed?
Never mind... found it.
Great! Ran static/flakey/dynamic... and everything was clean.
Significant changes were made in v3... I suggest starting a new branch there too, or at least make sure you git pull before you start.
Merged 863f042 for the v2 man pages.
For v3, there is a significant interrelationship with #394, the refactor of poll to use the duplicate suppression cache. We don't just need to compare dates; we also need to use them to store checksum data, in order to fill the 'mtime' field of the message for cache-check purposes. Instead of a _file_date_within_limits function, we need to figure out the mtime of the file.
I'm wondering whether the output format from line_date should just be an mtime string... one field...
The "standard" date-time format within Sarracenia v3 is YYYYMMDD "T" HHMMSS "." ... you can see it in the pubTime, mtime, and atime fields. That is probably the format to normalize to. I've mocked in the point where 'mtime' needs to be known, so you can adjust the date-parsing code as necessary... maybe make the desc line have a standard date-time? Patch above...
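A sketch of normalizing a parsed time to that layout, assuming UTC and epoch seconds as input. Only the whole-second part (YYYYMMDD "T" HHMMSS) is produced, because the digits after "." are not spelled out here; the function names are hypothetical.

```python
from datetime import datetime, timezone


def to_v3_timestr(epoch_seconds):
    """Render epoch seconds as YYYYMMDDTHHMMSS in UTC.

    Only the whole-second part of the v3 layout is produced; the fractional
    part after "." is deliberately left out of this sketch.
    """
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y%m%dT%H%M%S")


def from_v3_timestr(text):
    """Parse the first 15 characters (YYYYMMDDTHHMMSS) back into epoch seconds, UTC."""
    stamp = datetime.strptime(text[:15], "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)
    return stamp.timestamp()


print(to_v3_timestr(1673786096))  # 20230115T123456
```

Reducing every listing style to one string gives the duplicate suppression cache a single value to compare, regardless of how the server printed the date.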
… the SFTPattributes object in the poll - the old function can be deleted now. #366
…g os.stat() for SFTPattributes. #366
… in poll are now in SFTPattributes format. #366
…object for v3. #366 Now calculating how old a file is by accessing the mtime attribute of the SFTPattributes object in the poll - the old function can be deleted now. #366 Modified logic of plugin. Manually setting attributes instead of using os.stat() for SFTPattributes. #366 improved previous patch and added options for time zone and nodupe_file_time_limit. #366 Updated documentation and removed unnecessary ifs. Assuming all lines in poll are now in SFTPattributes format. #366
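paramiko's SFTPAttributes carries fields such as filename, st_size and st_mtime, and the commits above describe filling them in by hand from each listing line instead of calling os.stat(). The sketch below uses a SimpleNamespace stand-in with the same attribute names, so it runs without paramiko, and then filters entries by age; the helper names and the 60-day cutoff are hypothetical.

```python
import time
from types import SimpleNamespace


def make_entry(filename, size, mtime):
    """Stand-in for an SFTPAttributes object filled in by hand from a listing line."""
    return SimpleNamespace(filename=filename, st_size=size, st_mtime=mtime)


def recent_entries(entries, limit_seconds, now=None):
    """Keep only entries whose st_mtime is no older than limit_seconds."""
    if now is None:
        now = time.time()
    return [e for e in entries if now - e.st_mtime <= limit_seconds]


now = time.time()
listing = [
    make_entry("today.txt", 120, now - 600),
    make_entry("ancient.txt", 4096, now - 200 * 86400),
]
print([e.filename for e in recent_entries(listing, 60 * 86400, now)])  # ['today.txt']
```

Because every entry exposes the same st_mtime attribute, the age check no longer depends on how the server formatted the date.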
OK, I rebased the above patches into a single one, after running static and flakey tests successfully (having merged in from v03_wip first). Dynamic gets 20/30 now, as it does for everyone. Merged.
Looking at the code... still some weirdness... discussing offline.
a4016c1
If a file's timestamp is within the past six months, ls displays the date and the time. If it is more than six months old, or in the future, ls displays the date and the year instead, as documented in:
http://www.gnu.org/software/coreutils/manual/coreutils.html#Formatting-file-timestamps
As a result, a file's timestamp format can change partway through a day, once the file becomes older than six months. When that happens, sr_poll considers the file new and publishes it again, even though it was already published.
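To see why the switch causes a re-post, here is a sketch that parses the three date fields of an ls -l line in both styles. The year inference for the HH:MM form (take the most recent year that does not put the date in the future) and the function name are assumptions; month names assume the default C/English locale.

```python
from datetime import datetime


def parse_ls_date(month, day, time_or_year, now):
    """Turn the three date fields of an `ls -l` line into a datetime.

    Recent files show HH:MM and an implied year; older (or future) files show
    the year instead, so their time of day is lost.
    """
    if ":" not in time_or_year:
        return datetime.strptime(f"{time_or_year} {month} {day}", "%Y %b %d")
    # Year is implied: try this year, then last year, keeping the first date
    # that is not in the future (ValueError covers e.g. Feb 29 in a non-leap year).
    for year in (now.year, now.year - 1):
        try:
            stamp = datetime.strptime(f"{year} {month} {day} {time_or_year}", "%Y %b %d %H:%M")
        except ValueError:
            continue
        if stamp <= now:
            return stamp
    raise ValueError(f"cannot place {month} {day} {time_or_year} before {now}")


# The same file, listed in March and again in August of 2023:
march = parse_ls_date("Jan", "15", "12:34", now=datetime(2023, 3, 1))
august = parse_ls_date("Jan", "15", "2023", now=datetime(2023, 8, 1))
print(march, august, march == august)  # 2023-01-15 12:34:00 2023-01-15 00:00:00 False
```

The two parses of one unchanged file disagree, so any cache keyed on the parsed date treats the second listing as a new file.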
Here is an example where the timestamp format changed and the file was considered new and published.
It could eventually be fixed if sr_poll used ls with the --full-time option, which keeps the same timestamp format regardless of the file's age and so avoids posting old files.
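A sketch of parsing GNU ls -l --full-time output, whose full-iso timestamp (date, time with nanoseconds, UTC offset) keeps the same layout however old the file is. It assumes the server's listing really is GNU ls output with that option applied, which an FTP server may not allow; the function name and sample lines are illustrative.

```python
from datetime import datetime, timezone


def parse_full_time_line(line):
    """Split one `ls -l --full-time` line into (filename, size, UTC datetime)."""
    # perms, links, owner, group, size, date, time, offset, name (name may hold spaces)
    fields = line.split(maxsplit=8)
    size = int(fields[4])
    date, clock, offset, name = fields[5], fields[6], fields[7], fields[8]
    # ls prints nanoseconds; strptime's %f accepts at most 6 digits, so truncate.
    whole, _, frac = clock.partition(".")
    clock = f"{whole}.{(frac or '0')[:6]}"
    stamp = datetime.strptime(f"{date} {clock} {offset}", "%Y-%m-%d %H:%M:%S.%f %z")
    return name, size, stamp.astimezone(timezone.utc)


print(parse_full_time_line(
    "-rw-r--r-- 1 user group 1234 2023-01-15 12:34:56.123456789 +0000 file.txt"))
```

Since the timestamp carries the full date, time and offset in every case, the parsed mtime of an unchanged file stays stable across the six-month boundary.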