snippet


22
Dec 08

Just sitting here watching the logs go by.

Sometimes while working on remote servers I need to watch various log files. Typically for something like this I’ll create an extremely simple script that watches all the logs at once, since the next time I’m on that machine I’ll probably have forgotten the paths in question.

Something like this:

tail -f /var/www/apache/access.log /var/www/apache/error.log

The problem with that approach is that really long lines wrap, and I usually only care about the far left of each line, so I’ve been looking for a way to turn off wrapping in tail. Unfortunately that seems to be impossible. I tried messing with my terminal to kill the extra characters instead of wrapping them, with this:

echo -e "\e[?7l\c"

But that was messing up other things. The best solution I’ve found so far is to use less with these options:

less +F -S /var/log/apache/access.log

+F puts less in a tail-like follow mode and -S chops each line at the screen width. The only drawback is that it doesn’t intersperse the two files the way tail does. I was hoping to pipe tail’s output into less in this fashion, but that didn’t seem to work right either.
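One way to get both behaviours at once is a few lines of Python that follow several files and cut each line to the terminal width before printing it. This is only a rough sketch for Python 3: the names chopLine, readNewLines and followFiles are made up for illustration, it polls the files on a timer, and it does not cope with log rotation.

```python
import os
import shutil
import sys
import time

def chopLine(line, width):
    """Return the line without its newline, cut to at most width characters."""
    return line.rstrip("\n")[:width]

def readNewLines(handle):
    """Return the complete lines appended to the file since the last read."""
    lines = []
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line.endswith("\n"):
            # Nothing new, or a half-written line: rewind and try again later
            handle.seek(pos)
            break
        lines.append(line)
    return lines

def followFiles(paths, pollSeconds=0.5):
    """Print new lines from every file, prefixed with the file name and chopped."""
    handles = {}
    for path in paths:
        handle = open(path, "r", errors="replace")
        handle.seek(0, os.SEEK_END)  # Start at the end, like tail -f
        handles[path] = handle
    while True:
        width = shutil.get_terminal_size().columns
        for path, handle in handles.items():
            for line in readNewLines(handle):
                print(chopLine("%s: %s" % (os.path.basename(path), line), width))
        time.sleep(pollSeconds)

# Only start following when log paths were given on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    followFiles(sys.argv[1:])
```

Saved under any name you like, it would be run with the log paths as arguments and stopped with Ctrl-C.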


21
Jan 08

Duplicate music file finder code.

I’ve been consolidating my music collection and found that there were lots of duplicate files.

Most of the dupes were named something like “Happy Birthday 1.mp3”, with “Happy Birthday.mp3” sitting in the same directory. I’m not sure which program added these dupes, but removing 2500 or so of ’em by hand would not be fun.

Without further ado, here’s some python code that takes care of that problem for you. It only examines the filename, not the date or the bitrate or the actual file contents, etc. But you could of course extend it to do all those things.

Enjoy.

#!/usr/bin/env python
#-------------------------------------------------------------------------------
# Name:        cleanDuplicateMusicFiles.py
# Purpose:     Loops over a directory structure looking for 'duplicate' music
#              files and moving them to a safe directory for deletion.
#
# Author:      Joshua Bloom
#
# Created:     01/18/2008
#-------------------------------------------------------------------------------
 
import os
 
dupList = []
rootDirectory = "/Users/joshbloom/Music/iTunes/iTunes Music"
sequesteredFilesDirectory = "/Users/joshbloom"
dupSuffix = " 1.mp3"  # What the copies look like: 'Happy Birthday 1.mp3'
 
def main():
    print("Starting search ...")
    checkDir(rootDirectory)
    print("Found %s dupes" % len(dupList))
 
def checkDir(path):
    print("Checking path '%s' for duplicates" % os.path.basename(path))
    for item in [os.path.join(path, x) for x in os.listdir(path)]:
        if os.path.isdir(item):
            checkDir(item)
        else:
            checkForDupe(item)
 
def checkForDupe(fName):
    '''Example: if we find 'Happy Birthday 1.mp3' and 'Happy Birthday.mp3' exists in the
    same directory we consider this a duplicate and send it for re-education. '''
    folder, fileName = os.path.split(fName)
    if fileName.endswith(dupSuffix):
        # Strip the ' 1.mp3' suffix and look for the exact original name, so
        # 'Track 11.mp3' or 'Happy Birthday Remix.mp3' can't trigger a match
        originalName = fileName[:-len(dupSuffix)] + ".mp3"
        if os.path.isfile(os.path.join(folder, originalName)):
            # This is a duplicate
            dupList.append(fName)
            sequesterDup(fName)
 
def sequesterDup(fName):
    ''' Move em to a new folder, if you were confident you could change
    this function to delete the file. '''
    try:
        print("Moving file: %s" % fName)
        os.rename(fName, os.path.join(sequesteredFilesDirectory, os.path.basename(fName)))
    except Exception as e:
        print(e)
 
if __name__ == '__main__':
    main()
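
The script only looks at filenames. If you wanted to check the actual file contents before moving anything, one possible extension is to compare a hash of the two files. This is a minimal sketch, not part of the script above: fileDigest and sameContents are made-up names, and two copies of the same song can still differ byte for byte (different tags, for example), in which case this check would report them as different.

```python
import hashlib
import os

def fileDigest(path, chunkSize=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunkSize), b""):
            digest.update(chunk)
    return digest.hexdigest()

def sameContents(pathA, pathB):
    """Return True when both files hold exactly the same bytes."""
    # Different sizes can never match, so skip the hashing in that case
    if os.path.getsize(pathA) != os.path.getsize(pathB):
        return False
    return fileDigest(pathA) == fileDigest(pathB)
```

A call like sameContents(fName, originalPath) could then sit in front of the move, where originalPath stands for the path of the file without the " 1" in its name.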

6
Jun 07

Convert m4a to mp3

I’ve been moving a bunch of my brother’s music out of iTunes for him so he can use it with portable players besides the iPod. Unfortunately he encoded a lot of his CDs in .m4a format. I found a decent utility for converting to mp3 (and other formats): http://www.bonkenc.org/

Unfortunately, when you point Bonk at a directory full of m4a files, it crashes on certain files for some reason (probably encoding issues). After the crash you need to set up all of your settings and add the files again, which is really time consuming and annoying.

To make this easier I whipped up a short python script that calls Bonk for you on each file. If it runs into a bad file, it renames the file so you don’t try to process it again.

Here’s the python code:

import os
import pprint
import subprocess
curDir = os.getcwd() # The current directory. This should contain your .m4a files
pathToBonk = "C:\\Program Files\\BonkEnc\\becmd.exe" #Where the becmd.exe file lives
problemFiles = [] #A list of files that failed conversion
#
for item in os.listdir(curDir):
    if item.upper().endswith('.M4A'):
        fullPath = os.path.join(curDir, item)
        #The command to convert a single file, passed as an argument list
        #so that paths containing spaces need no manual quoting
        cmd = [pathToBonk, '-e', 'LAME', '-d', curDir, fullPath]
        val = subprocess.call(cmd)
        if val == 0: #Successful conversion, delete the original
            os.remove(fullPath)
        else:
            problemFiles.append(fullPath)
            print('Problem converting %s' % item)
            os.rename(fullPath, fullPath + ".BAD")
print('These files had problems converting and have been renamed with .BAD extensions:')
pprint.pprint(problemFiles)

NOTES: This will delete the .m4a file after converting it. If you want to keep your old files for some reason, make sure to run this on a copy of your files, i.e. in a different directory.
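
If you’d rather not make that copy by hand, something along these lines could do it first. It’s just a sketch: copyM4aFiles is a made-up helper, and the working directory is whatever you choose to pass in.

```python
import os
import shutil

def copyM4aFiles(sourceDir, workDir):
    """Copy every .m4a file in sourceDir into workDir and return the new paths."""
    if not os.path.isdir(workDir):
        os.makedirs(workDir)
    copied = []
    for item in sorted(os.listdir(sourceDir)):
        if item.upper().endswith('.M4A'):
            target = os.path.join(workDir, item)
            # copy2 keeps the timestamps along with the contents
            shutil.copy2(os.path.join(sourceDir, item), target)
            copied.append(target)
    return copied
```

You would then run the conversion script from inside the working directory, leaving the originals untouched.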


4
Apr 07

Count the duplicates in a Python List

Here’s a nice little function I’ve written to report the number of duplicates in a python list.

# The built-in set type replaces the old sets.Set class
#
def countDuplicatesInList(dupedList):
   uniqueSet = set(dupedList)
   return [(item, dupedList.count(item)) for item in uniqueSet]
#
lst = ['I1','I2','I1','I3','I4','I4','I7','I7','I7','I7','I7']
print(countDuplicatesInList(lst))

A set is an unordered collection that doesn’t allow duplicates, so the first line in the function puts every item from the original list into a set. The set automatically throws out duplicates, so we end up with just the unique items.
The next line builds a list of tuples, each holding a unique item and its count in the original list.

The output of the function will look something like this (a set is unordered, so the order of the pairs can vary):

[('I1', 2), ('I3', 1), ('I2', 1), ('I4', 2), ('I7', 5)]
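
One thing to be aware of: calling count once per unique item rescans the whole list each time. For long lists, a single pass with collections.Counter from the standard library (Python 2.7 or later) may be a better fit. A minimal sketch; the function name is made up, and sorting the result is just a choice to get a stable order:

```python
from collections import Counter

def countDuplicatesWithCounter(dupedList):
    """Return sorted (item, count) pairs, counting every item in one pass."""
    return sorted(Counter(dupedList).items())

lst = ['I1','I2','I1','I3','I4','I4','I7','I7','I7','I7','I7']
print(countDuplicatesWithCounter(lst))
# prints [('I1', 2), ('I2', 1), ('I3', 1), ('I4', 2), ('I7', 5)]
```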