
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cluster/plot_mini_batch_kmeans.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_cluster_plot_mini_batch_kmeans.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py:


====================================================================
Comparison of the K-Means and MiniBatchKMeans clustering algorithms
====================================================================

We want to compare the performance of the MiniBatchKMeans and KMeans:
the MiniBatchKMeans is faster, but gives slightly different results (see
:ref:`mini_batch_kmeans`).

We will cluster a set of data, first with KMeans and then with
MiniBatchKMeans, and plot the results.
We will also plot the points that are labelled differently between the two
algorithms.

.. GENERATED FROM PYTHON SOURCE LINES 18-22

Generate the data
-----------------

We start by generating the blobs of data to be clustered.

.. GENERATED FROM PYTHON SOURCE LINES 22-34

.. code-block:: default


    import numpy as np

    from sklearn.datasets import make_blobs

    np.random.seed(0)

    batch_size = 45
    centers = [[1, 1], [-1, -1], [1, -1]]
    n_clusters = len(centers)
    X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7)








.. GENERATED FROM PYTHON SOURCE LINES 35-37

Compute clustering with KMeans
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 37-47

.. code-block:: default


    import time

    from sklearn.cluster import KMeans

    k_means = KMeans(init="k-means++", n_clusters=3, n_init=10)
    t0 = time.time()
    k_means.fit(X)
    t_batch = time.time() - t0








.. GENERATED FROM PYTHON SOURCE LINES 48-50

Compute clustering with MiniBatchKMeans
---------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 50-65

.. code-block:: default


    from sklearn.cluster import MiniBatchKMeans

    mbk = MiniBatchKMeans(
        init="k-means++",
        n_clusters=3,
        batch_size=batch_size,
        n_init=10,
        max_no_improvement=10,
        verbose=0,
    )
    t0 = time.time()
    mbk.fit(X)
    t_mini_batch = time.time() - t0








.. GENERATED FROM PYTHON SOURCE LINES 66-72

Establishing parity between clusters
------------------------------------

We want to have the same color for the same cluster from both the
MiniBatchKMeans and the KMeans algorithm. Let's pair the cluster centers per
closest one.

.. GENERATED FROM PYTHON SOURCE LINES 72-82

.. code-block:: default


    from sklearn.metrics.pairwise import pairwise_distances_argmin

    k_means_cluster_centers = k_means.cluster_centers_
    order = pairwise_distances_argmin(k_means.cluster_centers_, mbk.cluster_centers_)
    mbk_means_cluster_centers = mbk.cluster_centers_[order]

    k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers)
    mbk_means_labels = pairwise_distances_argmin(X, mbk_means_cluster_centers)








.. GENERATED FROM PYTHON SOURCE LINES 83-85

Plotting the results
--------------------

.. GENERATED FROM PYTHON SOURCE LINES 85-145

.. code-block:: default


    import matplotlib.pyplot as plt

    fig = plt.figure(figsize=(8, 3))
    fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9)
    colors = ["#4EACC5", "#FF9C34", "#4E9A06"]

    # KMeans
    ax = fig.add_subplot(1, 3, 1)
    for k, col in zip(range(n_clusters), colors):
        my_members = k_means_labels == k
        cluster_center = k_means_cluster_centers[k]
        ax.plot(X[my_members, 0], X[my_members, 1], "w", markerfacecolor=col, marker=".")
        ax.plot(
            cluster_center[0],
            cluster_center[1],
            "o",
            markerfacecolor=col,
            markeredgecolor="k",
            markersize=6,
        )
    ax.set_title("KMeans")
    ax.set_xticks(())
    ax.set_yticks(())
    plt.text(-3.5, 1.8, "train time: %.2fs\ninertia: %f" % (t_batch, k_means.inertia_))

    # MiniBatchKMeans
    ax = fig.add_subplot(1, 3, 2)
    for k, col in zip(range(n_clusters), colors):
        my_members = mbk_means_labels == k
        cluster_center = mbk_means_cluster_centers[k]
        ax.plot(X[my_members, 0], X[my_members, 1], "w", markerfacecolor=col, marker=".")
        ax.plot(
            cluster_center[0],
            cluster_center[1],
            "o",
            markerfacecolor=col,
            markeredgecolor="k",
            markersize=6,
        )
    ax.set_title("MiniBatchKMeans")
    ax.set_xticks(())
    ax.set_yticks(())
    plt.text(-3.5, 1.8, "train time: %.2fs\ninertia: %f" % (t_mini_batch, mbk.inertia_))

    # Initialize the different array to all False
    different = mbk_means_labels == 4
    ax = fig.add_subplot(1, 3, 3)

    for k in range(n_clusters):
        different += (k_means_labels == k) != (mbk_means_labels == k)

    identical = np.logical_not(different)
    ax.plot(X[identical, 0], X[identical, 1], "w", markerfacecolor="#bbbbbb", marker=".")
    ax.plot(X[different, 0], X[different, 1], "w", markerfacecolor="m", marker=".")
    ax.set_title("Difference")
    ax.set_xticks(())
    ax.set_yticks(())

    plt.show()



.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_mini_batch_kmeans_001.png
   :alt: KMeans, MiniBatchKMeans, Difference
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_mini_batch_kmeans_001.png
   :class: sphx-glr-single-img






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.175 seconds)


.. _sphx_glr_download_auto_examples_cluster_plot_mini_batch_kmeans.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_mini_batch_kmeans.py <plot_mini_batch_kmeans.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_mini_batch_kmeans.ipynb <plot_mini_batch_kmeans.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
