All posts by ted

How to Manage Django Settings for Production and Development

When I started developing with Django and deploying Django applications on a server, I faced the problem of how to manage different Django settings for different environments with ease. I found many questions on Stack Overflow like this and blog posts like this on the topic. They are informative but just don’t fit my style very well, so after summing up their solutions, I came up with my own idea.
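
As a rough illustration of the problem, and not necessarily the approach the full post arrives at, one common pattern is to keep shared defaults in one place and layer environment-specific overrides on top, chosen by an environment variable. Every name here (`DJANGO_ENV`, `BASE`, `OVERRIDES`, `build_settings`, `example.com`) is hypothetical:

```python
import os

# Shared defaults; every name and value here is a placeholder for illustration.
BASE = {
    "DEBUG": False,
    "ALLOWED_HOSTS": [],
    "DATABASE_NAME": "app",
}

# Per-environment overrides layered on top of BASE.
OVERRIDES = {
    "development": {"DEBUG": True, "ALLOWED_HOSTS": ["localhost", "127.0.0.1"]},
    "production": {"DEBUG": False, "ALLOWED_HOSTS": ["example.com"]},
}


def build_settings(env=None):
    """Return the settings dict for the given environment.

    If env is not passed, it is read from the hypothetical DJANGO_ENV
    environment variable, falling back to "development".
    """
    env = env or os.environ.get("DJANGO_ENV", "development")
    if env not in OVERRIDES:
        raise ValueError(f"unknown environment: {env!r}")
    settings = dict(BASE)            # copy so BASE itself is never mutated
    settings.update(OVERRIDES[env])  # environment-specific values win
    return settings
```

In a real settings.py the result could be applied with `globals().update(build_settings())`, so that Django sees the merged values as ordinary module-level settings.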

How to Secure Website Admin Page by Nginx Setting Using VPN

I have a VPS hosting this WordPress blog and recently launched a Django application on another server. When I checked the nginx logs, I found a lot of malicious requests from a wide variety of sources all over the globe, all going to the /wp-login.php page, which is where the admin panel resides. There are a number of WordPress plugins out there to prevent unauthorized people from logging into the admin panel. They are great but just don’t provide the level of security I want, so I turned to looking for ways of setting up nginx to block unwanted traffic. My idea is to set up a VPN server on a VPS and tell nginx to only allow requests to the admin page from that VPN server.

The VPN solution I’m using is Shadowsocks, which is an open-source, lightweight SOCKS5 proxy. It is easy to set up and use, and it runs fast. Unfortunately, the original repo has been emptied by its author, clowwindy, but there are some backup repos.

Then add a block like the one below to the nginx virtual host config file, usually found under /etc/nginx/sites-available/:

 

Then, every time I want to publish a new post, I connect to that VPN server with a Shadowsocks client and log in from there; otherwise I get a 403 Forbidden error. There are Shadowsocks clients for different platforms, such as iOS, macOS, Android, and Linux. I think the same idea applies to a Django application's /admin page, too.

MySQL ERROR! The server quit without updating PID file

I got this error when starting the MySQL server on my MacBook Pro. The simple solution is to put a line in the ~/.my.cnf file under the [mysqld] block:

PS: I created a symlink under /usr/local/var, which I believe is Homebrew's default directory, pointing to an external drive. This may be the cause of the above error, but I still need to verify it.

Inspired by: https://lalitvc.wordpress.com/2016/12/15/mysql-unable-to-setup-unix-socket-lock-file/

But there is still an error:

So use a TCP connection by specifying the address with the -h option: mysql -h '127.0.0.1' -u root -p, or create a symlink to the aforementioned sock file: ln -s /usr/local/var/mysql/mysql.sock /tmp/mysql.sock

Hope this helps some others!!!

How to Push to GitHub with SSH Configuration

Recently I came across a ProxyPool repo, which is a dynamic proxy pool used for web crawling. It is useful to me, so I thought I could contribute something back to benefit others. After adding some other free proxy service providers I had found, I needed to push my changes to my forked repo on GitHub.com.

I prefer using SSH for convenience and security, so I don’t have to type in my username and password every time. I read GitHub's tutorial on Connecting to GitHub with SSH, and some other sources like Push to github without password using ssh-key and Automatically use correct SSH key for remote Git repo. All the tutorials say to change the repo’s remote URL to this format: git remote set-url origin git@github.com:<Username>/<Project>.git, but my push failed, and ssh -T git@github.com gave a Permission denied (publickey) error. Thankfully, ssh -vvv git@github.com showed some interesting hints:

It seems that with the aforementioned configuration, ssh does not apply any entry from my /Users/username/.ssh/config file (the hostname github.com is not one of my Host aliases) and only looks for ssh keys with default names, i.e. id_*. I have multiple ssh keys for logging in to my VPS servers, and I renamed all the keys accordingly, including this GitHub key pair. I also created an entry for each server in the /Users/username/.ssh/config file, like

so a simple command like ssh digitalocean logs me into the server. So how do I tell git to look for a Host entry instead? Here is the remote URL format that works on my machine: git remote set-url origin <Host>:<Username>/<Project>.git. The ssh authentication test command changes accordingly to ssh -T github

How to Manipulate Huge CSV File in Python – One Thought

Problem

I started learning web scraping about a month ago. Now I have some data stored in a CSV file, a.csv, containing 9 columns and about 11 million rows in total, including all the URLs I want to extract data from. I have already visited some of the URLs and stored their data in another CSV file, b.csv, which contains 2 columns and about 4.5 million rows in total. For reasons such as 403 and 404 errors, there are still about 6.4 million URLs I want to visit, but they are buried in the huge a.csv file. So now I have a problem: how do I remove the rows of a.csv whose values also appear in the 2 columns of b.csv?
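
One way to approach this, offered as a minimal sketch and not necessarily the solution the full post reaches, is to load the 2 columns of b.csv into a set and then stream a.csv row by row, keeping only rows whose key is not in that set. The sketch assumes both files have a header row and that the matching columns are the first two columns of each file; the function names and the `key_cols` parameters are hypothetical:

```python
import csv


def load_keys(path, key_cols=(0, 1)):
    """Read a CSV file and return the set of key tuples found in key_cols."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (assumed to exist)
        return {tuple(row[i] for i in key_cols) for row in reader}


def write_unvisited(a_path, b_path, out_path, a_key_cols=(0, 1)):
    """Copy rows of a_path whose key is absent from b_path into out_path.

    Returns the number of data rows written. a_path is streamed, so only
    the key set from b_path is held in memory.
    """
    visited = load_keys(b_path)
    kept = 0
    with open(a_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)  # keep the original header
        for row in reader:
            if tuple(row[i] for i in a_key_cols) not in visited:
                writer.writerow(row)
                kept += 1
    return kept
```

Set membership tests take roughly constant time, so the whole job is one pass over each file. Whether a set of about 4.5 million key tuples fits comfortably in memory depends on the machine and on how long the values are.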