Sunday, November 8, 2015

Photo Processing Toy

These days when we travel, we end up with photos from our phone, our camera, a friend's phone, and a friend's camera. Each device has its own naming scheme, so the photos from all the devices cannot be sorted by file name. Luckily, picture files carry metadata that records when the picture was taken. I dug into the metadata and extracted both the time and the place each picture was taken. With the date and place written into the file names, the pictures not only sort easily but also show clearly where I have traveled. While organizing my pictures I got a surprise: one camping place that I thought was in Capitola, Santa Cruz, is in fact in Aptos. So if you would like to know exactly where a picture was taken, this is really helpful.

While I was searching for a good app to mark pictures on a map, I found that Instagram claims to have this functionality. However, when I tried it, it turned out to be a lame product. Instagram doesn't use the pictures' location information at all; instead it uses the location where the user shares the pictures. If you take a bunch of pictures on a trip and share them after you get home, your home location is used to mark the pictures on the map :(

In this toy, I also mark the places on a map by scanning the pictures' GPS information. At the end, a nice trip journal map is generated. Clicking on a marker pops up the picture taken there.
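EXIF GPS data stores latitude and longitude as degrees, minutes, and seconds plus a hemisphere letter (N/S/E/W), while map markers usually want signed decimal degrees. The conversion can be sketched as below. This is only an illustration: the function name dms_to_decimal is made up here, it is not taken from photo.py, and it assumes the three values have already been read from the metadata as plain numbers.

```python
def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter
    into signed decimal degrees.

    Hypothetical helper for illustration, not the code in photo.py.
    Assumes dms holds three values that float() can convert.
    """
    degrees, minutes, seconds = (float(x) for x in dms)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative in decimal form.
    return -value if ref in ("S", "W") else value
```

For example, dms_to_decimal((121, 54, 0), "W") gives a negative longitude, which is what a marker west of Greenwich needs.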

To use the software, first install Python and the PIL library.
Run the program with the absolute path of the album.
The program will rename all pictures in the album with date and place information if that information is available in the metadata.
At the end, a tripmap.html file is generated that can be viewed in a browser. All the places where pictures were taken are marked on the map, and clicking a marker pops up the picture taken at that place.
Example:
/* B:\test is the absolute path of the directory of pictures */
/* -rTrue renames all the files; with -rFalse the files are not renamed, but a map of the trip with picture markers is still generated */
python photo.py -iB:\test -rTrue
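The renaming idea can be sketched as follows. EXIF timestamps use the form "YYYY:MM:DD HH:MM:SS", so turning one into a sortable prefix is a small step. The function build_name and the exact naming pattern are assumptions made for this sketch; photo.py may name files differently.

```python
import os
from datetime import datetime


def build_name(original_name, exif_datetime, place=None):
    """Build a sortable file name from an EXIF timestamp and a place.

    Hypothetical helper: the naming pattern is an assumption for
    illustration, not necessarily the one photo.py uses.
    """
    # EXIF stores timestamps as "YYYY:MM:DD HH:MM:SS".
    taken = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    stem, ext = os.path.splitext(original_name)
    parts = [taken.strftime("%Y%m%d_%H%M%S")]
    if place:
        # Drop spaces so the place stays one token in the name.
        parts.append(place.replace(" ", ""))
    # Keep the original stem so names from different devices stay unique.
    parts.append(stem)
    return "_".join(parts) + ext
```

With names built this way, a plain alphabetical sort of the album is also a chronological sort, whichever device took the picture.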

The code is located at:
https://github.com/xiaoqin2012/release_photo

This is an example output from a trip in Europe:
This is for the sailing trip from Victoria to Desolation Sound in Canada:

Sunday, October 11, 2015

Visiting British Columbia Day 4: Sailing to Bowen Island

Yes, there is a Bowen Island. Bowen says, "It is a giant island!" Believe it or not, go check it out yourself. It is a pretty island. If Bowen wants to visit it again, I will too.














Friday, October 9, 2015

Be Cautious about the Storage used by Backup

The files on my PC are massive and unorganized, and it takes some effort to manage them. To do some cleanup, I wrote a simple program to find all the duplicate files and tested it on the Camera Roll folder under One Drive/pictures. I thought it would be a trivial test. What surprised me was the following result:
total number of files: 33347, space occupied: 15.56GB
total number of duplicates: 613, space occupied: 1.62GB

About 10% of the space (1.62GB of 15.56GB) is taken up by duplicates, even in the backup of the camera roll! My camera cannot take two pictures that are exactly the same. I inspected both the album on my phone and the Camera Roll folder on One Drive. The duplicates don't exist in the album on my phone at all, but the Camera Roll backup on One Drive does show a lot of duplicates. There must be something wrong with One Drive's backup protocol, or a bug in its code. These duplicates are all picture files. I don't have an enterprise account, but people should be careful about backups to enterprise storage too.

Out of curiosity, I ran the program on the gdrive photo folder too. The gdrive photo folder doesn't have duplicates. But I could not open or read gdocs and gsheets from the PC at all. If one day you don't have internet access, the local gdrive is of no use for those files!

The duplicate picture files in Camera Roll have similar names, like:
one is DSC06220 1, the other is DSC06220_1. In my case I guess it is caused by an intermittent network connection, as the affected files are random. It doesn't happen to all the pictures taken at the same time or on the same day.

There is a reason to keep files that happen to have the same name. For example, if one changes the SD card of the phone and starts taking new pictures, the names will be reused. However, things get worse if someone just takes out the SD card and plugs it back in: one ends up with all the files duplicated on One Drive. It should do something smarter, such as inspecting the contents or computing a checksum, to avoid storing exactly the same files/pictures twice.
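A checksum-based duplicate check can be sketched as below. This is a minimal illustration of the idea, not the program mentioned in this post: the function names, the choice of SHA-256, and the trick of grouping by file size first (so only files with the same size get hashed) are all assumptions of this sketch.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # Pass 1: group by size, since files of different size cannot match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Because the comparison is on contents, two files that merely share a name (the SD card case) are kept apart, while DSC06220 1 and DSC06220_1 would be reported together if their bytes match.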

If you are paying for backup storage, watch out for the space occupied by duplicates. You can download my code from GitHub to check for duplicates on Windows.


Tuesday, October 6, 2015

Revisiting Quicksort for Big Data

Quicksort is one of the most used subroutines in applications. While browsing its implementations online, one thing troubled me. The following example from Rosettacode is a neat implementation. But what if all the numbers are the same? It doesn't do anything wrong, but it keeps your CPU busy doing useless work :P.

#include <stdio.h>
 
void quick_sort (int *a, int n) {
    int i, j, p, t;
    if (n < 2)
        return;
    p = a[n / 2];
    for (i = 0, j = n - 1;; i++, j--) {
        while (a[i] < p)
            i++;
        while (p < a[j])
            j--;
        if (i >= j)
            break;
        t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
    quick_sort(a, i);
    quick_sort(a + i, n - i);
}

It is not hard to fix the problem by counting how many elements equal the pivot value: just return when every number in the partition is the same.
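One way to sketch that fix is below. It is a Python port of the C routine above with an early return added; the port itself, the lo/hi parameters, and the use of all() as the "count" are choices made for this sketch, not part of the Rosettacode listing.

```python
def quick_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place; return early if the range is all one value."""
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n < 2:
        return
    p = a[lo + n // 2]
    # The fix: if every element equals the pivot, there is nothing to do.
    # all() stops at the first different element, so the check is cheap
    # on mixed data and one linear scan on uniform data.
    if all(a[k] == p for k in range(lo, hi)):
        return
    i, j = lo, hi - 1
    while True:
        while a[i] < p:
            i += 1
        while p < a[j]:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    quick_sort(a, lo, i)
    quick_sort(a, i, hi)
```

On an array of identical values this version does a single scan and returns, instead of swapping equal elements at every level of the recursion.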

When the data set is huge and the partitions get smaller after a few calls of quick_sort, there is a high probability that all the values in a small partition are the same. Some situations are even worse, like sorting all the people in the USA by age. If the input is an array of records in a database system, the useless memory accesses and copies, on the order of n log(n), are really costly!

Human ages are values from 1 to 100-something. Blindly using a quick_sort algorithm just eats CPU and memory for no good. In many cases the keys to be sorted have only a few unique values, like age. In these cases, instead of an n log(n) quicksort, a hybrid hash/count sort can do a much better job with one or a few scans. First extract the unique keys from the input and sort/hash them, then do a final scan that counts or rearranges the inputs. Depending on the situation, this can be optimized down to a single scan.
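The hybrid idea can be sketched like this: one scan hashes the records into buckets by key, only the handful of unique keys gets sorted, and the buckets are then written out in key order. The function name sort_by_few_keys and the (name, age) record layout are made up for the example.

```python
from collections import defaultdict


def sort_by_few_keys(records, key):
    """Sort records whose keys take only a few distinct values.

    Illustrative sketch: one scan to bucket the records, a sort over
    the unique keys only, then one pass to emit the buckets in order.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[key(rec)].append(rec)

    out = []
    # Sorting touches only the unique keys (about a hundred for ages).
    for k in sorted(buckets):
        out.extend(buckets[k])
    return out
```

For n records with u unique keys the cost is roughly n plus u log(u), and records with the same key keep their input order.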




Friday, October 2, 2015

Visiting British Columbia Day 3: Getting on the Boat at Vancouver

After a long day on the bus, ferry, and water taxi, we finally got on the boat.

Having fun on the ferry.


Vancouver harbor view from the water taxi.


Giant Cargo ship


Getting on the boat finally.

Thursday, October 1, 2015

Visiting British Columbia Day 2: Exploring Victoria

We started off from the place we stayed and walked to Beacon Hill Park close by. 



At the south side of Beacon Hill Park, we continued on to explore the trail along the coast, called the Dallas Road Waterfront trail. The vista points on the trail have stunning views.







Starting from the south point of the Dallas Road Waterfront trail, we walked to the harbor and piers. Around the harbor there are a lot of beautiful classic Victorian mansions.








Tour de Victoria was happening on the day we were visiting. It looked like a fun event.





Strolling around the downtown area.








In the late afternoon we had high tea at the Empress hotel. It is worth trying! The pier looks grand at night too. Don't miss it.








Visiting British Columbia Day 1: Taking seaplane to Victoria

Flew from San Jose to Seattle.
Flew from Seattle to Victoria by seaplane. It was a great experience. The view of the islands, mountains, harbor, and ocean from the seaplane is grand.


We arrived in Victoria in the late afternoon. By the time we visited Butchart Gardens, it was getting dark. I wish we had had more daylight to explore it, as it is really pretty. The fireworks show, which only happens during the summer season, is great and beautiful too. Don't miss it if you come during summer, and check the times on the website.

















Sunday, August 16, 2015

Half Moon Bay Day Trip

Devil's Slide is a must-see. I love the Devil’s Slide trail. It is a short coastal hiking trail with great views of the Pacific Ocean. In spring, lots of wildflowers make the trail even prettier.







If you like strenuous hikes, you can visit the trails at McNee Ranch State Park. It has gorgeous views of the Pacific Ocean too. If the timing is right, you can see the whale migration, which usually happens around January.
Overview from McNee Ranch State Park




Monster Chef is a great place for Japanese food. I like the sashimi and sushi rolls there. It is a pity that it is not open for lunch on weekdays.


Monster Chef
The alternative is Gerkin’s Sandwich close by, or the restaurants at the pier. The pier is across the street from Monster Chef. Don't forget to buy live crabs and live fish at the pier.

Gerkin's Sandwich place
If you have more time, you can drive down to Pigeon Point Lighthouse. It is a scenic route. The Pigeon Point hostel is a cheap and fun place to stay overnight. Don’t miss watching the sunset.



The alternative plan could be:
Visit Pigeon Point Lighthouse first.
Have lunch or a picnic at Norm's Market: the artichoke bread is really good. We usually bring some home too.
Drive down to visit Devil’s Slide.
Visit the pier and have dinner at Monster Chef.

To avoid the crowds, it is best to visit Devil’s Slide early in the morning or on weekdays.