-- if mogilefsd runs out of file descriptors, it shouldn't crash. it might now.
   needs a test.

-- change debug level at runtime from mgmt port, propagate to children.

-- MogileFS::Device->make_directory on lighttpd isn't exactly right, as WebDAV
   spec says MKCOL on an existing directory should return 405 (Method Not
   Allowed), which I think is the same as a server without WebDAV enabled...
   we need to distinguish between those two cases. (perhaps in Monitor job?)

      405 (Method Not Allowed) - MKCOL can only be executed on a
      deleted/non-existent resource.

-- mogdbsetup should use /etc/mogilefs/mogilefsd.conf for upgrade dsn info

-- fix 5/second in mogadm fsck status. hard-coded to 5. whoops.

-- in MogileFS::Device,

+    # FIXME: don't use local machine's time() for this. time sync
+    # issues! instead, the monitor process should track this,
+    # noting the difference in relative time between the server's
+    # time (in Date: response header) and time in the usage.txt
+    # file.

-- make mogilefsd trackers speak FUSE, so we could mount all of mogilefs
   using, say:

      http://noedler.de/projekte/wdfs/index.html

   things like paths could be exposed as extended attributes, or as
   pseudo files:

      cat /mnt/mogile//paths/
      cat /mnt/mogile//contents/

-- if create_open is called before monitor has run yet, block and wait for a
   round? better than 'no_devices'. might need some marker for 'end of
   monitoring'? or just wait until we have 1 or 3? or if they asked for 3,
   only return 1 in that rare case, preferring latency to redundancy. plus,
   we'd know that 1 is writable in last few seconds!

-- update the 'repl' command for new file_to_replicate table

-- replication policy error storm when a device is known to be observably down:

   [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197214
   [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197237
   [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197219
   [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197256
   [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197226

-- telnetting to 7001 and sending "!" will crash.

-- fsck job/command. need 'last fsck' table per fid? or column?

-- fix haphazard CapitalStyle vs capital_style in ProcManager for class methods

-- 'every' func should select on psock, to process parent-sent commands
   during worker's breaks

-- create close could wake a replicate process.

-- optional 'wait_until_replicated=1' flag to create close, so client doesn't
   get success until file is everywhere.

-- redo/reevaluate the 'unreachable_fids' logic: unreachable should only mean
   host/device are up, but file is 404.

-- test database failures

-- identify idempotent commands and replay them 'n' times if query worker dies
   during processing.

-- have query workers be able to broadcast back up to parent "can't parse
   this", at which point parent parses it (e.g. "help" command), so admins
   don't need to remember the "!" prefix. of course, "!" prefix can always be
   used to reach parent faster.

-- mb_asof handling in find_deviceid seems broken. less than max age? wrong
   units.

-- make generic script to write out usage files for people not using mogstored
   -- or, let mogstored be run in 'usage' file writing only mode

-- wake up deleter process? totally overkill, but why not?

* 404 storms during replicating: (1.5 year old email, might be fixed, verify)

:: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693821
:: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693819
:: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693844
:: [replicate(12646)] Copier failed replicating 15693846
:: [replicate(12646)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404
:: [replicate(12646)] Copier failed replicating 15693821
:: [replicate(12646)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404
:: [replicate(12646)] Copier failed replicating 15693844
:: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev3/0/015/693/0015693848.fid failed: HTTP 404
:: [replicate(12650)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693848
......

-- fsck should catch weird state where file exists in 'file' table with
   devcount>0 but does not exist in file_on. Or count does not match.

-- fsck for case where row from file_to_replicate(fid,fromdevid) does not
   exist in file_on(fid,devid). This is a byproduct of a failed inject.
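
The two fsck checks just described could be prototyped as plain SQL before
being wired into a real fsck job. What follows is only a sketch: it assumes
simplified tables holding just the columns named in these notes (the real
schema has more), and it runs against an in-memory SQLite database with
made-up rows. The names fsck_report and demo are hypothetical, not part of
mogilefsd.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns mentioned in the notes.
SCHEMA = """
CREATE TABLE file (fid INTEGER PRIMARY KEY, devcount INTEGER NOT NULL);
CREATE TABLE file_on (fid INTEGER NOT NULL, devid INTEGER NOT NULL,
                      PRIMARY KEY (fid, devid));
CREATE TABLE file_to_replicate (fid INTEGER PRIMARY KEY, fromdevid INTEGER);
"""

# fids whose devcount disagrees with the number of file_on rows, including
# the case of devcount > 0 with no file_on rows at all.
DEVCOUNT_MISMATCH = """
SELECT f.fid, f.devcount, COUNT(o.devid) AS actual
FROM file f
LEFT JOIN file_on o ON o.fid = f.fid
GROUP BY f.fid, f.devcount
HAVING f.devcount <> COUNT(o.devid)
ORDER BY f.fid
"""

# file_to_replicate rows whose source device holds no copy per file_on.
MISSING_SOURCE = """
SELECT r.fid, r.fromdevid
FROM file_to_replicate r
LEFT JOIN file_on o ON o.fid = r.fid AND o.devid = r.fromdevid
WHERE r.fromdevid IS NOT NULL AND o.fid IS NULL
ORDER BY r.fid
"""


def fsck_report(conn):
    """Return the rows flagged by each consistency check."""
    return {
        "devcount_mismatch": conn.execute(DEVCOUNT_MISMATCH).fetchall(),
        "missing_source": conn.execute(MISSING_SOURCE).fetchall(),
    }


def demo():
    """Build a tiny made-up data set and run both checks on it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # fid 1 is consistent; fid 2 claims one copy but has none;
    # fid 3 claims two copies but has one.
    conn.executemany("INSERT INTO file VALUES (?, ?)",
                     [(1, 2), (2, 1), (3, 2)])
    conn.executemany("INSERT INTO file_on VALUES (?, ?)",
                     [(1, 10), (1, 11), (3, 10)])
    # fid 2 is queued to replicate from a device that has no copy of it.
    conn.executemany("INSERT INTO file_to_replicate VALUES (?, ?)",
                     [(1, 10), (2, 12)])
    return fsck_report(conn)


if __name__ == "__main__":
    for check, rows in demo().items():
        print(check, rows)
```

Both queries use a LEFT JOIN so that a fid with no file_on rows at all still
shows up, rather than silently dropping out of the result. On a table of real
size these would need to be walked in fid ranges instead of run in one pass.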