AWS Lightsail is awesome

AWS has recently introduced a new service called Lightsail, which I think makes a great replacement for my home server.

I use my home server(s) extensively for my work as a big data developer. My home setup is Ubuntu with a LAMP stack on top, which is perfect for hosting presentations and websites, sharing files and, most importantly, acting as my stepping stone when I need free Internet access bypassing corporate firewalls. My home server runs day and night, every day, consuming power and requiring regular updates and maintenance from time to time.

Summary of pros and cons of Lightsail vs home-server

Lightsail: always online, from 3.50 USD a month, no hardware or power costs, no maintenance, and independent of my home Internet connection and semi-static DNS.
Home server: far more storage (8 TB) and RAM (32 GB), full control over the hardware, but it draws power around the clock and needs regular updates and maintenance.

In contrast, AWS Lightsail lives in the AWS cloud and is always online for as little as 3.50 USD a month. The service is available as Linux or Windows and comes ready to be configured as an application, where you can choose from WordPress, a LAMP stack, Node.js, GitLab and a few more.

The instance size can be selected from a very small 512 MB RAM, 1 vCPU, 20 GB disk and 1 TB data transfer up to 8 vCPUs, 32 GB RAM and 640 GB of disk space.

And then the best thing is the price, starting at 3.50 USD a month, which is a bargain compared to the running costs of my home server once you factor in the initial hardware cost, the power used and the time spent on updates and maintenance. Just as important, I no longer depend on my home Internet connection, which sometimes goes down, or on my semi-static DNS address, which can change from time to time.

Power consumption considerations

[number of hours' use] x [number of days' use] x ([capacity of appliance in watts] / 1,000) = number of kWh

Given that I have two desktop computers running 24/7 all year, each consuming on average 100 watts, the electricity alone costs around 440 EUR a year (2 x 24 x 365 x (100/1000) = 1,752 kWh). The cheapest Lightsail option costs 3.50 x 12 = 42 USD per year. Okay, it's not fair to compare a home desktop with 8 TB of storage and 32 GB RAM to a Lightsail instance comparable to a Raspberry Pi; for the record, a Raspberry Pi running all year costs only about 5 EUR in electricity. But even if I only turned the desktops on when needed, which at the moment (mostly working from the office) is about 3 hours a day, each would still cost about 3 x 365 x (100/1000) x 0.24 = 26.3 EUR a year.

CockroachDB

Start a single instance

cockroach start --insecure --host=localhost --http-port=8181 --cache=.25 --max-sql-memory=.25

or

sudo cockroach start --insecure --host=0.0.0.0 --max-sql-memory=25% --cache=50%

Create database

cockroach sql --insecure -e 'CREATE DATABASE nifi'

Create table

cockroach sql --insecure --database=nifi -e 'CREATE TABLE Kafka
(
JSONAttributes TEXT,
filename TEXT,
hoedanigheden TEXT,
kafkaOffset INT,
kafkaPartition INT,
kafkaTopic TEXT,
path TEXT,
registratieIdentificatie TEXT,
size INT,
uuid TEXT,
voorvalIdentificatie TEXT
)'

 

Start SQL client

cockroach sql --insecure --port=26257
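
Once connected, a quick sanity check against the Kafka table; the values below are made up, purely to confirm the schema accepts writes:

-- insert a hypothetical sample row (all values invented)
INSERT INTO nifi.Kafka (filename, kafkaTopic, kafkaPartition, kafkaOffset, size, uuid)
VALUES ('sample.json', 'nifi-input', 0, 42, 1024, 'abc-123');

-- read it back, newest offsets first
SELECT kafkaTopic, kafkaPartition, kafkaOffset, filename
FROM nifi.Kafka
ORDER BY kafkaOffset DESC
LIMIT 10;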

 

Chaos

Partly due to my work, but mostly out of personal interest, I have read (well, listened to) this fascinating book about the patterns found in chaos. Professionally I am involved in road traffic management in The Netherlands. Road traffic is a good representation of chaos: many patterns have already been discovered, but there is still a lot of work to be done.
This book gave me good insight into a newly developing science, and I found it very engaging. I would highly recommend it to anyone interested in mathematics, physics, population biology and other diverse fields.
 

 

Numerical analyses and Gephi

Recently, after listening to a few great books about number crunching, the latest being Linked…, I have become fascinated with chaos theory. I have started running my own web spider to collect data for later analysis. I was looking for a visualization tool to produce nice graphs with PHP, but I found something really amazing and free (open source) instead. Check it out: Gephi.

Watch the video: Introducing Gephi 0.7 from gephi on Vimeo.

 

PHP Pear – Mail

After installing the Mail package together with its dependencies, we can run our first e-mail sending scripts.
A basic example:

<?php
// Include the PEAR Mail package
include('Mail.php');
$mail = Mail::factory('mail');

// Sender address and message subject go in the headers
$headers = array('From' => 'ja@przyklad.pl', 'Subject' => 'Test Mail');

// Message body
$body = 'This is a test!';

// Send to the recipient and check the result
$result = $mail->send('dkrysmann@gmail.com', $headers, $body);
if (PEAR::isError($result)) {
    echo $result->getMessage();
}
?>

PHP Pear – Installation

In several of my web applications I have used PEAR libraries. One of them is Mail, for sending e-mails. One advantage of the PEAR solution is, for example, that the mail headers are well composed. In the past I used PHP's mail() and noticed that some mail servers rejected e-mails sent with PHP mail().

Installation

I assume you have a working LAMP server; the examples here are based on Ubuntu – Apache – PHP – MySQL.
Installing PEAR:

sudo apt-get install php-pear

After installation we can check whether PEAR is available with the following command:

pear

… which gives us a list of the available options.

Now we can install the individual components (classes) we want to use. With the search command we can look for a class, for example:

pear search mail

…result:

Retrieving data…0%
….50%..Matched packages, channel pear.php.net:
=======================================
Package         Stable/(Latest) Local
Mail            1.2.0 (stable)  1.2.0 Class that provides multiple interfaces for sending emails
Mail_IMAP       1.1.0RC2 (beta)       Provides a c-client backend for webmail.
Mail_IMAPv2     0.2.1 (beta)          Provides a c-client backend for webmail.
Mail_Mbox       0.6.3 (beta)          Read and modify Unix MBOXes
Mail_Mime       1.8.0 (stable)  1.8.0 Mail_Mime provides classes to create MIME messages.
Mail_mimeDecode 1.5.4 (stable)  1.5.4 Provides a class to decode mime messages.
Mail_Queue      1.2.6 (stable)        Class for put mails in queue and send them later in background.
Net_Vpopmaild   0.3.2 (beta)          Class for accessing Vpopmail's vpopmaild daemon

Note here that some packages are marked (stable) while others are (beta). By default only stable packages can be installed. Beta packages have, or may have, some rough edges and are still under development. If we nevertheless want to use a beta package, we need to set pear's preference accordingly with the command:

pear config-set preferred_state beta

Besides the stable and beta states there are also packages with alpha and devel status.

Installing any package is then done with:

sudo pear install --alldeps Mail

… where the --alldeps option ensures that all packages required by the one we are installing get installed as well.

Source: http://pear.php.net

An example of a simple e-mail sending script can be found in the PHP Pear – Mail post above.

MySQL performance tips

Source: forge.mysql.com

1. Use EXPLAIN to profile the query execution plan (see the sketch after this list)

2. Use Slow Query Log (always have it on!)

3. Don’t use DISTINCT when you have or could use GROUP BY

4. Insert performance

1. Batch INSERT and REPLACE

2. Use LOAD DATA instead of INSERT

5. LIMIT m,n may not be as fast as it sounds. Learn how to improve it (if possible): http://www.facebook.com/note.php?note_id=206034210932

6. Don’t use ORDER BY RAND() if you have > ~2K records

7. Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of data

8. Avoid wildcards at the start of LIKE queries

9. Avoid correlated subqueries in SELECT and WHERE clauses (and try to avoid IN)

10. No calculated comparisons — isolate indexed columns

11. ORDER BY and LIMIT work best with equalities and covered indexes

12. Separate text/blobs from metadata, don’t put text/blobs in results if you don’t need them

13. Derived tables (subqueries in the FROM clause) can be useful for retrieving BLOBs without sorting them. (A self-join can speed up a query if the first part finds the IDs and uses them to fetch the rest)

14. ALTER TABLE…ORDER BY can take data sorted chronologically and re-order it by a different field — this can make queries on that field run faster

15. Know when to split a complex query and join smaller ones

16. Delete small amounts at a time if you can

17. Make similar queries consistent so cache is used

18. Have good SQL query standards

19. Don’t use deprecated features

20. Turning OR on multiple index fields (<5.0) into UNION may speed things up (with LIMIT), after 5.0 the index_merge should pick stuff up.

21. Don't run COUNT(*) on InnoDB tables for every search; do it only occasionally and/or use summary tables, or if you need the total number of rows, use SQL_CALC_FOUND_ROWS and SELECT FOUND_ROWS()

22. Use INSERT … ON DUPLICATE KEY UPDATE (or INSERT IGNORE) to avoid having to SELECT first

23. use groupwise maximum instead of subqueries

24. Avoid using IN(…) when selecting on indexed fields; it can kill the performance of the SELECT query.
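
A minimal sketch of tips 1, 4.1 and 22 above; the visits and visit_counts tables and their columns are invented for illustration (visit_counts is assumed to have a primary key on user_id):

-- tip 1: profile the plan before optimizing
EXPLAIN SELECT user_id, COUNT(*) FROM visits
WHERE visit_date >= '2017-01-01' GROUP BY user_id;

-- tip 4.1: one batched INSERT instead of three round-trips
INSERT INTO visits (user_id, visit_date)
VALUES (1, '2017-01-01'), (2, '2017-01-01'), (3, '2017-01-02');

-- tip 22: upsert instead of SELECT-then-INSERT/UPDATE
INSERT INTO visit_counts (user_id, visits) VALUES (1, 1)
ON DUPLICATE KEY UPDATE visits = visits + 1;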


Scaling Performance Tips:

1. Use benchmarking

2. Isolate workloads: don't let administrative work (i.e. backups) interfere with customer performance.

3. Debugging sucks, testing rocks!

4. As your data grows, indexing may change (cardinality and selectivity change), and the structure may need to change too. Make your schema as modular as your code. Make your code able to scale. Plan for and embrace change, and get developers to do the same.

Network Performance Tips:

1. Minimize traffic by fetching only what you need (see the sketch after this list).

1. Use paging/chunked data retrieval to limit result sizes

2. Don’t use SELECT *

3. Be wary of lots of small quick queries if a longer query can be more efficient

2. Use multi_query if appropriate to reduce round-trips

3. Use stored procedures to avoid bandwidth wastage
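
A short sketch of tips 1.1 and 1.2 above, again on the hypothetical visits table: fetch only the columns you need, one page at a time.

-- page 3 with 20 rows per page (LIMIT offset, count), explicit column list instead of SELECT *
SELECT user_id, visit_date FROM visits
ORDER BY visit_date, user_id
LIMIT 40, 20;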

OS Performance Tips:

1. Use proper data partitions

1. For MySQL Cluster: start thinking about Cluster *before* you need it

2. Keep the database host as clean as possible. Do you really need a windowing system on that server?

3. Utilize the strengths of the OS

4. pare down cron scripts

5. create a test environment

6. source control schema and config files

7. for LVM innodb backups, restore to a different instance of MySQL so Innodb can roll forward

8. partition appropriately

9. partition your database when you have real data — do not assume you know your dataset until you have real data


MySQL Server Overall Tips:

1. innodb_flush_log_at_trx_commit=0 can help slave lag

2. Optimize for data types, use consistent data types. Use PROCEDURE ANALYSE() to help determine the smallest data type for your needs (see the sketch after this list).

3. use optimistic locking, not pessimistic locking; try to use shared locks, not exclusive locks (LOCK IN SHARE MODE vs. FOR UPDATE)

4. if you can, compress text/blobs

5. compress static data

6. don’t back up static data as often

7. enable and increase the query and buffer caches if appropriate

8. config params — http://docs.cellblue.nl/2007/03/17/easy-mysql-performance-tweaks/ is a good reference

9. Config variables & tips:

1. use one of the supplied config files

2. key_buffer, unix cache (leave some RAM free), per-connection variables, innodb memory variables

3. be aware of global vs. per-connection variables

4. check SHOW STATUS and SHOW VARIABLES (GLOBAL|SESSION in 5.0 and up)

5. be aware of swapping esp. with Linux, “swappiness” (bypass OS filecache for innodb data files, innodb_flush_method=O_DIRECT if possible (this is also OS specific))

6. defragment tables, rebuild indexes, do table maintenance

7. If you use innodb_flush_log_at_trx_commit=1, use a battery-backed hardware write cache controller

8. more RAM is good, as is faster disk speed

9. use 64-bit architectures

10. --skip-name-resolve

11. increase myisam_sort_buffer_size to optimize large inserts (this is a per-connection variable)

12. look up memory tuning parameter for on-insert caching

13. increase temp table size in a data warehousing environment (default is 32 MB) so it doesn't write to disk (also constrained by max_heap_table_size, default 16 MB)

14. Run in SQL_MODE=STRICT to help identify warnings

15. /tmp dir on battery-backed write cache

16. consider battery-backed RAM for innodb logfiles

17. use --safe-updates for the client

18. Redundant data is redundant
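
A small sketch of tips 2 and 9.4 above; the table name is made up, and PROCEDURE ANALYSE is available up to MySQL 5.7:

-- tip 2: let MySQL suggest the smallest data type per column
SELECT * FROM visits PROCEDURE ANALYSE();

-- tip 9.4: inspect server counters and settings
SHOW GLOBAL STATUS LIKE 'Threads%';
SHOW GLOBAL VARIABLES LIKE 'max_connections';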


Storage Engine Performance Tips:

1. InnoDB ALWAYS keeps the primary key as part of each index, so do not make the primary key very large

2. Utilize different storage engines on master/slave, e.g. if you need fulltext indexing on a table.

3. BLACKHOLE engine and replication is much faster than FEDERATED tables for things like logs.

4. Know your storage engines and what performs best for your needs, know that different ones exist.

1. e.g., use MERGE tables or ARCHIVE tables for logs (see the sketch after this list)

2. Archive old data — don’t be a pack-rat! 2 common engines for this are ARCHIVE tables and MERGE tables

5. use row-level instead of table-level locking for OLTP workloads

6. try out a few schemas and storage engines in your test environment before picking one.
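
A minimal sketch of tip 4: moving old rows of a hypothetical access_log into a compact ARCHIVE table (the engine only supports INSERT and SELECT, which is fine for logs):

CREATE TABLE access_log_2009 (
  logged_at DATETIME NOT NULL,
  message TEXT NOT NULL
) ENGINE=ARCHIVE;

INSERT INTO access_log_2009
SELECT logged_at, message FROM access_log
WHERE logged_at < '2010-01-01';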

Database Design Performance Tips:

1. Design sane query schemas; don't be afraid of table joins, which are often faster than denormalization

2. Don’t use boolean flags

3. Use Indexes

4. Don’t Index Everything

5. Do not duplicate indexes

6. Do not use large columns in indexes if the ratio of SELECTs:INSERTs is low.

7. be careful of redundant columns in an index or across indexes

8. Use a clever key and ORDER BY instead of MAX

9. Normalize first, and denormalize where appropriate.

10. Databases are not spreadsheets, even though Access really really looks like one. Then again, Access isn’t a real database

11. use INET_ATON and INET_NTOA for IP addresses, not char or varchar (see "Storing IP addresses in MySQL" below)

12. make it a habit to REVERSE() email addresses, so you can easily search domains (this avoids wildcards at the start of LIKE queries when you want to find everyone whose e-mail is in a certain domain; see the sketch after this list)

13. A NULLable column can take more room to store than a NOT NULL one

14. Choose appropriate character sets & collations — UTF16 will store each character in 2 bytes, whether it needs it or not, latin1 is faster than UTF8.

15. Use Triggers wisely

16. use min_rows and max_rows to specify approximate data size so space can be pre-allocated and reference points can be calculated.

17. Use HASH indexing for indexing across columns with similar data prefixes

18. Use myisam_pack_keys for int data

19. be able to change your schema without ruining functionality of your code

20. segregate tables/databases that benefit from different configuration variables
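
A minimal sketch of tip 12, with a hypothetical users table: store the address REVERSE()d, so a search by domain becomes a prefix match that can use the index.

CREATE TABLE users (
  email_rev VARCHAR(255) NOT NULL,  -- REVERSE()d e-mail address
  KEY (email_rev)
);

INSERT INTO users VALUES (REVERSE('jan@example.com'));

-- everyone at example.com, no leading wildcard needed
SELECT REVERSE(email_rev) AS email FROM users
WHERE email_rev LIKE CONCAT(REVERSE('@example.com'), '%');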

Other:

1. Hire a MySQL ™ Certified DBA

2. Know that there are many consulting companies out there that can help, as well as MySQL’s Professional Services.

3. Read and post to MySQL Planet at http://www.planetmysql.org

4. Attend the yearly MySQL Conference and Expo or other conferences with MySQL tracks

5. Support your local User Group

Authored by

Jay Pipes, Sheeri Kritzer, Bill Karwin, Ronald Bradford, Farhan "Frank Mash" Mashraqi, Taso Du Val, Ron Hu, Klinton Lee, Rick James, Alan Kasindorf, Eric Bergen, Kaj Arno, Joel Seligstein, Amy Lee, Sameer Joshi

Retrieved from “http://forge.mysql.com/wiki/Top10SQLPerformanceTips”

Storing IP addresses in MySQL

IP addresses can be stored efficiently in a MySQL database using the INET_ATON() function.

INET_ATON() returns the numeric value of an IP address:
mysql> SELECT INET_ATON('209.207.224.40');
-> 3520061480

The resulting numeric value should be stored as an INT(10) UNSIGNED.

To recover the IP address from the numeric value, use INET_NTOA().

INET_NTOA() returns the IP address for a numeric value:
mysql> SELECT INET_NTOA(3520061480);
-> '209.207.224.40'

http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html
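
A minimal end-to-end sketch (the table and column names are made up):

CREATE TABLE visitors (
  ip INT(10) UNSIGNED NOT NULL  -- numeric form of the IPv4 address
);

INSERT INTO visitors (ip) VALUES (INET_ATON('209.207.224.40'));

SELECT INET_NTOA(ip) FROM visitors;
-- -> 209.207.224.40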